Discrete Mathematics A Concise Introduction


Synthesis Lectures on Mathematics & Statistics

George Tourlakis

Discrete Mathematics
A Concise Introduction

Series Editor
Steven G. Krantz, Department of Mathematics, Washington University, Saint Louis, MO,
USA
This series includes titles in applied mathematics and statistics for cross-disciplinary
STEM professionals, educators, researchers, and students. The series focuses on new and
traditional techniques to develop mathematical knowledge and skills, an understanding of
core mathematical reasoning, and the ability to utilize data in specific applications.
George Tourlakis
Department of Electrical Engineering and Computer Science
York University
Toronto, ON, Canada

ISSN 1938-1743 ISSN 1938-1751 (electronic)


Synthesis Lectures on Mathematics & Statistics
ISBN 978-3-031-30487-3 ISBN 978-3-031-30488-0 (eBook)
https://doi.org/10.1007/978-3-031-30488-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give
a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that
may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
για την Δέσποινα
Preface

This volume is an introduction to discrete mathematics, an area of study that is required by most computer science curricula, as well as by computer, software, and electrical engineering curricula. It is also often prescribed for mathematics majors as the first course where you “do mathematics with proofs”.
Discrete mathematics studies properties of “discrete sets”, finite or infinite, and their objects. The major representatives of such sets are all the finite sets and, in the infinite case, not only the set of natural numbers, N, but also the set of all integers, Z = {…, −1, 0, 1, …}.
What makes N (or Z, or finite sets) “discrete” is that the standard order “<” on natural numbers (or on Z), and any order on finite sets, has “gaps” between consecutive members. By contrast, the standard order, also denoted by “<”, on the set of reals (R) has the property that if a and b are any two reals such that a < b, then there is a real c between the two: Just take c = (a + b)/2 and you have a < c < b. This is not so with natural numbers. For example, there is no c in N that lies between 1 and 2: For a c in N, 1 < c < 2 is false.
A looser definition of a “discrete” set might allow Q, the set of rationals (fractions with integers as numerator and denominator), as a set of interest in the discipline of discrete mathematics.1 Allowing this set in is due to another of its attributes (not shared by the set of reals, which is the domain of study in calculus and analysis), namely that the members of Q can be enumerated using natural number indices; it is enumerable, as we say.
We will not study the properties of Q in this volume.
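The enumerability claim can be made concrete. Below is a small sketch (ours, not the book's) in Python that walks the grid of positive fractions p/q along the diagonals p + q = 2, 3, …, skipping reducible duplicates —the classic Cantor-style enumeration:

```python
from math import gcd

def enumerate_rationals(n):
    """Return the first n positive rationals (in lowest terms) as (p, q)
    pairs, by walking the diagonals p + q = 2, 3, ... of the p/q grid."""
    out = []
    s = 2                                # current diagonal: p + q = s
    while len(out) < n:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:           # skip duplicates such as 2/4
                out.append((p, q))
                if len(out) == n:
                    break
        s += 1
    return out

print(enumerate_rationals(6))  # [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4)]
```

Every positive rational in lowest terms appears at exactly one finite index; interleaving the negatives and 0 extends the enumeration to all of Q.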
All books on discrete mathematics, big or small (this one is deliberately small), include
mostly topics on set theory.
The area of study denoted by the term set theory is vast and non-elementary; entire volumes have been written to cover just set theory and nothing else (e.g., cf. Jech (1978), Levy (1979), Kunen (1980), Tourlakis (2003b)). As it turns out, most discrete mathematics texts have the majority of their topics in the area of set theory, doing so under various chapter titles.

1 Rational numbers have no gaps with respect to the order <. If a, b are rationals, there is always a
rational c between the two: For example, take c = (a + b)/2.

The present volume devotes all but two of its chapters to set theory topics —the two exceptions being Chaps. 4 and 7. In particular, we include operations on sets, functions, and relations, finite and infinite sets, uncountable sets, diagonalisation (a must-have tool in the theory of computation), equivalence relations, orders, induction and inductive (or recursive) definitions, inductively defined sets, and “structural induction”.
I omit advanced topics such as ordinal and cardinal numbers (these are omitted also in
every other book on discrete mathematics that I know of).
But I do include several non-elementary —yet not utterly esoteric— topics from set theory in this volume. For example, the chapter on induction includes induction along arbitrary relations that are not orders, and the recursive definition topics in this volume are introduced thoroughly and completely —not as is commonly done, namely “here are some recursive definition examples”— to the extent that we prove that recursive definitions do (each) define a function and that said function is unique. Our recursive definitions include (with proofs) the case of recursion along a non-order relation. As an illustration, we give the example defining the so-called support function of set theory by recursion along the non-order relation ∈. The support is a function that returns the set of atoms that were used to build the function’s set input. For example, the support of {{{3}}, {{{1}}}} is {1, 3}.
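As an illustration only (not the book's formal treatment), the support function can be sketched in Python, modelling sets as `frozenset`s and any non-set value as an atom:

```python
def support(x):
    """Support of x, by recursion along the membership relation:
    an atom's support is itself; a set's support is the union of
    the supports of its members."""
    if not isinstance(x, frozenset):     # x is an atom
        return {x}
    if not x:                            # the empty set has empty support
        return set()
    return set().union(*(support(m) for m in x))

# the set {{{3}}, {{{1}}}} from the text, encoded with frozensets
s = frozenset({
    frozenset({frozenset({3})}),
    frozenset({frozenset({frozenset({1})})}),
})
print(support(s))  # {1, 3}
```

The recursion terminates because ∈ has no infinite descending chains on such hereditarily finite inputs —the same property (MC) that licenses the recursion in the text.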
Notably, we also include a brief introduction to set operators (not to be confused with set operations), through which we prove the so-called Schröder-Bernstein theorem —which actually is due to Dedekind— establishing the difficult, but “obvious”, statement “if the infinite set A has at most as many elements as the infinite set B, and if also B has at most as many members as A, then the two sets have the same number of elements”.
Above all, I note that the set theory in this volume —the small part that we cover— is founded so that it is “safe”, as I call it, meaning that I build sets by stages following the idea of Bertrand Russell, in order to avoid the obvious contradictions (known as paradoxes or antinomies) of Cantor’s set theory. For example, the universe of all sets and atoms, U (see Wilder (1963) for extensive historical commentary), is not a “self-contradictory set” (ibid.) in this book —in fact it is (easily, cf. Sect. 2.2) provably not a set, that is, it is a proper class. No harm done!
Other discrete mathematics texts contain much less coverage of set theory, and normally they omit uncountable sets and diagonalisation, proofs that inductive definitions work, and the generalisation of induction along non-orders.
Outside set theory topics, other discrete mathematics texts are usually deficient in their presentation of predicate logic (reduced to recipes), and they normally stay away from solving recurrence equations, with or without generating functions. This is (in our opinion) a must-have topic that supports sequel courses on the theory of algorithms.
Invariably, I see other texts on discrete mathematics defining functions wrongly: in those definitions, they require all functions to be total2 by something like “[the function f from S to T ] … for every x in S returns a unique element y = f (x) in T as output …”.

2 That is, totally defined.



But such a function is total on its “left field” S (that is, its supply of inputs), unlike many of the functions used in the theory of computation —for example, this one: “for all x, y return x/y”. This is undefined for all inputs (x, y) of the form (a, 0) —so is it not a “function”?
The reader will find here a rigorous, correct, and simple chapter on predicate (first-order) mathematical logic for the user. I have to admit a shortcut I took in my logic chapter (maybe two), made in the interest of saving space and of minimising formalism-fatigue. It is usual —given that most books on algebra, calculus, or discrete mathematics do not offer this definition either— to adopt the belief that “one learns the syntax of formulas via practise”.
Thus, I do not define the syntax of mathematical formulas in its full abstract generality within predicate logic (the reader who wishes to see this definition may consult Tourlakis (2008)), but the reader will quickly learn by use from Chaps. 1–3 that, e.g., x ∈ A → x ∈ A ∪ B, A ∈ F → F ⊆ A, and A ⊆ A ∪ B are formulas of interest in set theory, while 1 + 2 + 3 + ... + n = n(n + 1)/2 is also a formula of number theory (over N).
I note that the precise “shape” (syntax) of Boolean formulas —but not of predicate
logic formulas— is defined in Exercise 6.4.5 and several syntactic properties of Boolean
formulas are proved (by the reader) using structural induction.
The related second shortcut is motivated by the lack of a formal definition of formula syntax! “Perhaps we can do away with the introduction of formal counterparts of the truth values false and true, namely the Boolean logic constants ⊥ and ⊤”, I thought! This thought is behind my elevating the metasymbols of (Boolean) logic t (true) and f (false) to atomic Boolean formula status in the logic chapter.
This is no more odd —or more unusual— than using the symbol 3 in algebra, say, to denote both the name and the value of the constant we pronounce “three”.
The logic chapter explains fully, and gives several examples of, the application of generalisation and specialisation, the use of the “ping-pong” theorem used to prove equivalences, the deduction theorem and proof-by-contradiction techniques —which are introduced, and proved to be valid metatheorems, in the Exercises section with the help of extensive Hints— and the (complicated) technique of the elimination of an ∃-prefix of a formula. This too is proved in the Exercises section with my help (Hints). The variant theorem (4.1.32) —or bound variable renaming theorem— is also proved.

What to Include? What to Omit?

My undertaking, by agreement with the editor, was for a volume of small length. This makes the above two questions very important.
I hold that absolutely central to any discrete mathematics volume —indeed, any course on the subject— are induction and inductive definitions (the latter nowadays being increasingly called “recursive definitions”). Then I need material to support the proper
introduction of these topics, and I need to do it correctly without reproducing/re-bumping into the contradictions (cutely called paradoxes3 and antinomies back then) of Cantor’s set theory. Enter “safe set theory” in the style of Russell, where sets are defined by stages —only a short step away, this, from modern axiomatic set theory.
Then I must include enough set theory —to do a good job on induction— e.g., I must
cover well the topics on relations and functions that lead to induction (along relations
that are “well-orderings”4 or at the other extreme might not even be orders but do have,
as we technically say in this volume, “MC”) and recursive definitions (again along well-
orderings but also along non-order relations that have MC, in the general case).
Incidentally, as this volume aspires to serve, among others, courses on limitations of
computing at the 2nd, 3rd, or 4th year level, it chooses to steer away from the practice
prevalent in other discrete mathematics books where functions are introduced in a manner
that makes them defined everywhere. Neither in practice nor in theory (of computation)
are all functions and relations defined on all possible “legal” inputs.5 Thus, our functions
and relations are defined to be “partial”, a term that allows both totally defined (“total”)
but also otherwise (“nontotal”) functions and relations.
Our topics on cardinality are just about the right amount; however, we do not cover the advanced topic of cardinal and ordinal numbers. We include mathematical definitions and uses of finite and infinite sets and of countable and uncountable sets. It makes sense to include the topic of diagonalisation —invented by Cantor— whose application computer science students should be able to see and understand before they get into a course on the theory of computation, where the concept is used extensively.
Computer science (and computer and software engineering) students also need to take
courses on the analysis of algorithms, where recurrence equations are set up to compute
the run times (usually worst-case upper bounds) of algorithms.
This motivates the chapter on recurrence equations —and their closed-form solutions— generating functions, and trees, the latter not as data structures of interest (they are that too!) but rather as a tool towards computing some interesting but scary sums that are of use in the solution of recurrence relations. The justification for including these solution techniques here —rather than doing nothing and hoping they will be covered in the analysis of algorithms course— is that they are numerous and involved, and we do not believe that they can be easily embedded as teaching topics in the course that uses them.
The informed reader of this preface will notice that I omit combinatorics, graphs, and
automata.

3 The catastrophic failure of a theory that leads to a contradiction —an inconsistent theory, as we say— is sugar-coated if we call it a “paradox”, from the Greek παρά, that is, against, and δοκώ, that is, I believe or I know. If it is something that only betrays our beliefs, then probably it is not that bad?
4 This ungrammatical terminology is imposed on us by the literature.
5 In a theory about functions with natural number inputs and natural number outputs, all natural
numbers are “legal” inputs. But for some functions, not all legal inputs cause an output.

Some discrete mathematics books include these topics. However, automata theory does not fit the purposes of a book on discrete mathematics tools. Third-year courses in software engineering develop in situ what material they need from automata theory, and so do courses on compilers. Automata theory is not a discrete math tools topic on par with induction, recursive definitions, diagonalisation, relations and functions, logic, and recurrence equation solving. The topics included here are here because students need preparatory practice in them before using them in a course that follows.
But what about graph theory? In practice, graphs —as opposed to graph theory which
is a very extensive subject— are normally introduced quickly where they are needed,
whether it is a course in data structures (no more than the definition of graphs is usually
needed in such a course) or analysis of algorithms where some topics on graph theory
might be needed (paths, cycles, spanning trees).
Regarding combinatorics, if one needs to “count”, then techniques other than the
sophisticated one via generating functions (covered extensively in the present volume
but almost in no other discrete mathematics book) will be covered in situ in a course
where they are called for, that is, typically an analysis of algorithms course.
Under the above heading “what to include”, I felt obliged to introduce the so-called
Axiom of Choice, which guarantees that I can fit in a (finite!) proof infinitely many choices
of elements from a set —even if there is no obvious methodology to describe the sequence
of my choices in a finite manner. Cf. Remark 3.5.28.6
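Incidentally, the finitely describable choice sequence of footnote 6 below —repeatedly choose the smallest element not yet chosen— can be sketched in code. This Python fragment (the name `choose` is ours) is necessarily cut off after n steps and restricted to a finite A, so that it can actually run:

```python
def choose(A, n):
    """First n terms of the choice sequence a0 = min A,
    a_{k+1} = min(A - {a0, ..., ak}), for a finite set A of naturals."""
    chosen = []
    rest = set(A)                 # elements not yet chosen
    for _ in range(min(n, len(rest))):
        a = min(rest)             # the canonical, description-free choice
        chosen.append(a)
        rest.remove(a)
    return chosen

print(choose({7, 2, 9, 4}, 3))  # [2, 4, 7]
```

No Axiom of Choice is needed here precisely because “take the smallest remaining element” is itself a finite description of all the choices.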
How much mathematical rigour and how much intuition make a good mix in a book like this? We favour both!
Intuition helps us conceptualise and formulate the elements (rough details) of solutions to mathematical problems —“napkining it”, as it were, that is, just as we would do a rough calculation on a napkin. Rigour is the expression and discipline of being mathematically careful; the extra care that rigorous arguments entail helps us avoid errors.

The Chapters

Chapter 1 is very short and retells the story of the Russell paradox. This is our motivation to practise safe set theory, whose exposition and foundation begin with Chap. 2.
Chapter 2 —endowed with the experience of Chap. 1— states at the outset that we have two types of collections: sets and non-sets. All collections we shall call classes, but the non-set ones we call proper classes. To prove that a class is a set, we use three Principles, 0, 1, and 2, of set formation by stages. Further, the chapter introduces the class
6 I should note that I can make such a finite description without the Axiom of Choice helping, if I wanted to finitely describe infinitely many choices from an infinite set of natural numbers A: Just choose the smallest a0 in A, then, for all n ≥ 0, choose the smallest an+1 in A − {a0, a1, …, an}. This two-line recursive definition is good for infinitely many choices.

(unordered) pair {A, B} and proves that it is indeed a set, then proceeds to define union, intersection, difference, power set, ordered pair —the latter as a set, not as a new strange object— and Cartesian product. All the italicised terms represent operations on sets that provably produce sets.
Chapter 3 is on relations and functions. The transitive closure of a relation is included and, for relations on finite sets, algorithms for its computation are proposed, including the well-known Warshall’s algorithm. Equivalence relations and order relations are included. The former (in an illustrative example) leads to our first acquaintance with the “least principle” on the set of natural numbers. The latter starts our study of orders, which culminates in induction and inductive (recursive) definitions in a later chapter.
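For concreteness, Warshall's algorithm admits a very short sketch (ours, not the book's presentation) over a boolean adjacency matrix representing a relation on {0, …, n − 1}:

```python
def warshall(adj):
    """Transitive closure of a relation on {0, ..., n-1}, given as a
    boolean n-by-n adjacency matrix (Warshall's algorithm, O(n^3)):
    after round k, t[i][j] is True iff j is reachable from i using
    intermediate points drawn from {0, ..., k}."""
    n = len(adj)
    t = [row[:] for row in adj]          # work on a copy
    for k in range(n):
        for i in range(n):
            if t[i][k]:
                for j in range(n):
                    if t[k][j]:
                        t[i][j] = True
    return t

# the relation {(0, 1), (1, 2)} on {0, 1, 2}; its closure adds (0, 2)
m = [[False, True, False],
     [False, False, True],
     [False, False, False]]
print(warshall(m)[0][2])  # True
```
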
This chapter also introduces us to the concepts of “finite” and “infinite” sets, diagonalisation, operators on sets, and the Schröder-Bernstein (or Cantor-Bernstein) theorem —which is actually Dedekind’s.
Chapter 4 is a roughly 20-page chapter that outlines the elements of predicate logic for the user, including the techniques of adding/removing ∀ and ∃ prefixes of formulas in proofs. It contains a good number of illustrative examples.
Chapter 5 introduces the concept of the “inductiveness condition” (IC) of a relation and proves its equivalence to the “minimal condition” (MC) of a relation. A special case of this is that induction on the natural number set N is equivalent to the least principle on N.
The statement that “< has IC” is the expression of the fact that we can do induction
along this < to prove mathematical statements about the members of the class where
< acts.
The formula that expresses the “strong” or “course-of-values” induction proof principle
is derived. There are many illustrative examples of induction as well as several end-
of-chapter exercises. The chapter proves the theorem that recursive definitions lead to
functions that uniquely exist. Both induction and recursion are extended to apply to non-
order relations, as long as the latter have MC (equivalently IC). One such non-order
relation is ∈ and thus we not only can prove properties of sets by induction along the
relation ∈ but also can make recursive definitions along ∈. As an illustration of the latter,
the support function is discussed.
Chapter 6 introduces a generalisation of the definitions by induction (recursion) of Chap. 5. According to this generalisation, a set is defined from a given set of operations —that is, relations R(x1, …, xn, y)— where the xi are where the inputs are “read” in and the y is where the outputs appear. The definition requires the set —it turns out it is a set— to be the ⊆-smallest set that is closed7 under all the given operations R. We note that if n = 0 for some such operation R, then the outputs of R can be thought of as given initial objects.

7 A set T is closed under a relation R(x1, …, xn, y) —by definition— iff for all specific x1, …, xn in T, all the produced y are also in T.

If the operations are all sets, then the ⊆-smallest set so formed is unique and is called the closure S = Cl({…, R(x1, …, xn, y), …}) of these relations. We also say that S is inductively defined by said operations.
The associated proof tool —induction over a closure, also termed structural induction—
proves properties of inductively defined sets. We validate that this induction “works” in
this chapter.
We also connect the inductive definition of sets with an appropriate iterative construction by stages, and we also connect it (in the chapter’s Exercises section) with the definition of sets as monotone operator fixpoints (monotone operators were introduced in 3.8.1).
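The construction by stages can be sketched in code. The following Python fragment (the names, and the restriction to unary operations, are ours, chosen for brevity) grows a set until it is closed under every given operation; the result is the ⊆-smallest set containing the initial objects and closed under the operations:

```python
def closure(initial, ops):
    """Smallest set containing `initial` and closed under every op in
    `ops` (each op: member -> member, or None for "no output"),
    computed by stages: keep applying the ops until nothing new appears."""
    s = set(initial)
    changed = True
    while changed:
        changed = False
        for op in ops:
            # outputs of op on the current stage that are genuinely new
            new = {y for x in list(s)
                   if (y := op(x)) is not None and y not in s}
            if new:
                s |= new
                changed = True
    return s

# stages of the closure of {0} under "add 2, but stay below 10"
evens = closure({0}, [lambda x: x + 2 if x + 2 < 10 else None])
print(sorted(evens))  # [0, 2, 4, 6, 8]
```

The bound inside the operation keeps the example finite; for genuinely infinite closures the same iteration describes the stages without ever terminating.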
Chapter 7. Here, we discuss many classes of recurrence equations and the various techniques to obtain “closed form” solutions —that is, solutions in terms of known functions such as λn.n², λn.2ⁿ, λn.n log₂ n, etc.
The technique of generating functions is also outlined and demonstrated with several examples (the most nontrivial of which appears in the following Chap. 8).
Chapter 8. In the area of mathematics known as graph theory, trees —in particular binary trees— play a central role as special cases of the so-called directed graphs. While trees are studied for their own merit in modelling important data structures in computing practise, they also have unexpected applications to discrete mathematics, such as the one we will demonstrate in this chapter —using trees to compute a scary sum in closed form.
The chapter concludes with an application of generating functions that yields a simple expression for the number of all extended trees that have n internal nodes.
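For extended binary trees —the standard setting— that count is the Catalan number Cₙ = C(2n, n)/(n + 1); whether the book's “simple expression” takes exactly this form is our assumption. The following Python sketch (ours) checks the defining recurrence T(n) = Σₖ T(k)·T(n − 1 − k), which splits the internal nodes between the two subtrees, against that closed form:

```python
from math import comb

def trees(n, memo={0: 1}):
    """Number of extended binary trees with n internal nodes, via the
    recurrence T(n) = sum_{k=0}^{n-1} T(k) * T(n - 1 - k): the root is
    one internal node, and k of the rest go to the left subtree."""
    if n not in memo:
        memo[n] = sum(trees(k) * trees(n - 1 - k) for k in range(n))
    return memo[n]

for n in range(6):
    assert trees(n) == comb(2 * n, n) // (n + 1)   # Catalan closed form

print([trees(n) for n in range(6)])  # [1, 1, 2, 5, 14, 42]
```
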
There exists a direct graph-theoretic definition of a tree —which is beyond the design of this volume— but it is arguably more convenient to offer the direct, graph-independent, recursive definition (as in Knuth (1973), but see Example 6.3.10) that we took in Chap. 6, if for no other reason, then at least for the fact that such a definition enables us to prove tree properties by structural induction.
If I used this book in a first-year undergraduate course on discrete mathematics —which I am approximately doing now, and have been for the past few years, using a simplified prequel of this volume that I wrote— I would cover almost everything, except the most difficult recurrence equations (with or without the help of generating functions), and I would also skip induction and recursion along a non-order relation with MC. If by a(n unexpected) miracle discrete mathematics were taught in 2nd year (just as a course in logic, essentially, is), then I would want to cover everything. This is the advantage of a short volume; you can cover everything if the audience is prepared.
The reader will forgive, I hope, the many footnotes, which the “style police” may assess
as “bad style”! However, there is always a story within a story that is best delegated to
footnotes not to disrupt the flow of exposition. Incidentally, the book by Wilder (1963)
on the foundations of mathematics would lose most of its effectiveness if it were robbed
of its superbly informative footnotes!

My footnotes (unlike many in Wilder’s book) are almost never of historical import, but do support the understanding of those who may be bewildered by long displayed formulas. To the rescue, I often include a very local footnote inside the display to explain a potentially puzzling spot —and I do so on the spot! Consider the display below as an example.

x ≤ k, where k is a natural number, implies x ≤ ⌈x⌉ ≤8 k    (1)

The style of exposition that I prefer is informal and conversational and is expected
to serve well not only the readers who have the guidance of an instructor but also those
readers who wish to learn discrete mathematics on their own. I use several devices to
promote understanding, such as frequent “pauses” that anticipate questions and encourage
the reader to rethink an issue that might be misunderstood if read but not studied and
reflected upon. All pauses start with “Pause.” and end with “”.
Apropos quotes and punctuation, we follow the “logical approach” (as Gries and
Schneider (1994) call it) where punctuation is put inside the quotation marks if and only
if it is a logical part of the quoted text; never otherwise. So we would never write

The relation “is a member of” is fundamental in set theory. It is denoted by “∈.”

No. “.” is not part of the symbol! We should write instead

The relation “is a member of” is fundamental in set theory. It is denoted by “∈”.

Another feature of the above reference that I have adopted is the logical use of the em-dash “—” as a parenthesis. As such, we have a left version and a right version, to avoid ambiguities. The left version is contiguous with the following word but not with the preceding word. The right version is the reverse of this. For example, “discrete mathematics is easy —as long as one studies— and is useful towards preparing you for courses in algorithms and logic”.
I have included numerous remarks, examples, and embedded exercises (the latter in
addition to the end-of-chapter exercises) that reflect on a preceding definition or theorem.
Influenced by my teaching —where I love emphasising things— but originally by the books of Bourbaki, I use in my books the stylised “winding road ahead” warning sign that I first saw in Bourbaki (1966).
It delimits a passage that is too important to skim over. Another such sign delimits non-elementary passages that I could not resist including.
There are 202 end-of-chapter exercises and several embedded ones in the text. Many
have hints and thus I refrained from (subjectively) flagging them for “level of difficulty”.

8 ⌈x⌉ is the smallest natural number ≥ x.



After all, as one of my mentors, Allan Borodin, used to say to us (when I was a graduate
student at the University of Toronto), “Attempt all exercises. The ones you can do, don’t
do; do the ones you cannot do”.

Toronto, Canada George Tourlakis


February 2023

Acknowledgments I wish to thank all those who taught me, including my parents. In particular, I
thank the many mentors I have had at the University of Toronto as a graduate student9 and all those
in my prehistory, in chronological order Andreas Katsaros, Yiannis Ioannidis, and Pan. Ladopoulos
—all three of whom taught me geometry.

9 I will avoid a name listing since every permutation unintentionally —but inevitably— implies a
ranking.
Contents

1 Some Elementary Informal Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Russell’s “Paradox” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Safe Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 The “Real Sets” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 What Caused Russell’s Paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Some Useful Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Operations on Classes and Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 The Powerset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.6 The Ordered Pair and Finite Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7 The Cartesian Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7.1 Strings or Expressions Over an Alphabet . . . . . . . . . . . . . . . . . . . . . 36
2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 Relations and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.1 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Transitive Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.1 Computing the Transitive Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2.2 The Special Cases of Reflexive Relations on Finite Sets . . . . . . . . 62
3.2.3 Warshall’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Partial Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.5 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.5.1 Lambda Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.5.2 Kleene Extended Equality for Function Calls . . . . . . . . . . . . . . . . . . 89
3.5.3 Function Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.6 Finite and Infinite Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.7 Diagonalisation and Uncountable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.8 Operators and the Cantor-Bernstein Theorem . . . . . . . . . . . . . . . . . . . . . . . . 108
3.8.1 An Application of Operators to Cardinality . . . . . . . . . . . . . . . . . . . 110
3.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112


4 A Tiny Bit of Informal Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117


4.1 Enriching Our Proofs to Manipulate Quantifiers . . . . . . . . . . . . . . . . . . . . . . 117
4.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5 Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.1 Inductiveness Condition (IC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.2 IC Over N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
5.2.1 Well-Foundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.2.2 Induction Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.3 Inductive Definitions of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
5.3.1 Examples on Inductive Function Definitions . . . . . . . . . . . . . . . . . . . 174
5.3.2 Fibonacci-like Inductive Definitions; Course-of-Values
Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
5.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6 Inductively Defined Sets; Structural Induction . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.1 Set Closures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.2 Induction Over a Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
6.3 Closure Versus Definition by Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
6.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7 Recurrence Equations and Their Closed-Form Solutions . . . . . . . . . . . . . . . . . 207
7.1 Big-O, Small-o, and the “Other” ∼ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
7.2 Solving Recurrences; the Additive Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
7.3 Solving Recurrences; the Multiplicative Case . . . . . . . . . . . . . . . . . . . . . . . . 213
7.4 Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8 An Addendum to Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.1 Trees: More Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.2 A Few Provable Facts About Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8.3 An Application to Summations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.4 How Many Trees? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
8.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
1 Some Elementary Informal Set Theory

Overview

Set theory is due to Georg Cantor. “Elementary” in the title above does not apply to the body
of his work, since he went into considerable technical depth and mathematical sophistication
in this, his new theory. It applies however to our coverage in this volume as we are going to
restrict ourselves to elementary topics only.
Cantor’s Set Theory contains quite a few contradictions widely referred to in the literature
as paradoxes1 or antinomies,2 some of considerable consequence. The next section is about
the least technical, hence the most elementary of all to describe, and most fundamental of
these antinomies contained in Cantorian set theory and was discovered by Bertrand Russell.
What caused these contradictions or inconsistencies as logicians call them?
The reason is that Cantor’s set theory was not based on axioms nor rigid rules of reasoning —
a state of affairs for a theory that we loosely characterise as “informal”.
At the opposite end of “informal” we have the formal theories that are based on the
form of the mathematical statements under consideration and utilise axioms and the rules
of mathematical logic to formulate proofs.
As such the latter theories are “safer” to develop; they do not lead to obvious con-
tradictions.
One cannot fault Cantor for not using logic in arguing his theorems —that process for
“doing mathematics” was not yet invented when he built his theory— but then, a fortiori,
mathematical logic was not invented in Euclid’s time either, and yet he did use axioms that

1 From the Greek παράδοξο. παρά means “against” while δοκώ means “I believe” or “I know”. A
paradox thus is against one’s belief or knowledge.
2 From the Greek αντινομία. αντί also means “against” and νόμος means “the law”. So an antinomy
is against the (mathematical or logical) law.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_1

stated how his building blocks, points, lines and planes interacted with each other and how
they behaved.

Incidentally, Euclidean Geometry (provably) is free of contradiction —or, as we say dif-


ferently, it is a consistent theory.

The problem with Cantor’s set theory is that anything goes regarding what sets are. A set
is just a synonym of the dictionary terms “aggregate”, “collection”, “class”, etc. Definition-
by-dictionary of synonyms as it were.
Moreover, Cantorian set theory does not deal with the most fundamental question: “How
are sets formed? This question has a remarkably simple answer (Russell’s) that can be
credited for the particular choice of the axioms of modern axiomatic formal ZF Set Theory.3
We sample in the next section the kind of logical “trouble” this extremely informal
approach entails (Russell’s paradox).
It must be stated at the outset that to have a set theory that has no obvious inconsistencies
does not necessitate to work formally within the axiomatic method. Our first chapter, based
on Russell’s approach allows us to do “safe” set theory —a nickname we gave to informal
set theory that does not have any of the known inconsistencies of Cantor’s set theory—
within an informal (non axiomatic) setting.
Following Russell we will ask and answer how sets are built first, and then derive from
our answer some principles that will guide (and protect!) the theory’s development —and,
in particular, will guide us in “safely” building “large” sets; indeed any sets!

1.1 Russell’s “Paradox”

Cantor’s naïve (this adjective is not derogatory but is synonymous in the literature with
informal and non axiomatic) set theory was plagued by paradoxes, the most famous of which
(and the least “technical”) was pointed out by Bertrand Russell and was thus nicknamed
“Russell’s paradox”.
Cantor’s theory is the theory of collections (synonymously, sets) of objects, as we men-
tioned above, terms that were not defined and moreover it was not indicated how these sets
were built.4

3 “ZF” for Zermelo and Fraenkel, the designers of the most commonly used axiomatic set theory.
4 This is not a problem in itself. Euclid too did not say what points and lines were; but his axioms
did characterise their nature and interrelationships: For example, he started from these (among a few
others) a priori truths (axioms): a unique line passes through two distinct points; also, on any plane,
a unique line l can be drawn parallel to another line k on the plane if we want l to pass through a
given point A that is not on k.
The point is:

This theory studies operations on sets, properties of sets, and aims to use set theory as
the foundation of all mathematics. Naturally, mathematicians “do” set theory of mathemat-
ical object collections —not collections of birds and other beasts. We have learnt some
elementary aspects of set theory in high school. We will learn more from this book.

1. Variables. Like any theory, informal or not, informal set theory —a “safe”5 variety of
which we will develop here— uses variables just as algebra does. There is only one
type of variable that varies over set and over atomic objects too, the latter being objects
that have no set structure; for example, integers. We use the names A, B, C, . . . and
a, b, c, . . . for such variables, sometimes with primes (e.g., A′) or subscripts (e.g., x23),
or both (e.g., x′22, Y′42).
2. Notation. Sets given by listing. For example, {1, 2} is a set that contains precisely the
objects 1 and 2, while {1, {5, 6}} is a set that contains precisely the objects 1 and {5, 6}.
The braces { and } are used to delimit at the left and right the indicated collection/set of
objects by outright listing.
3. Notation. Sets given by “defining property”. But what if we cannot (or will not) explicitly
list all the members of a set? Then we may define what objects x get in the set/collection
by having them pass an entrance requirement, P(x):

An object x gets in the set iff (if and only if ) P(x) is true of said object.

Let us parse “iff”:

a. The IF: So, IF P(x) is true, then x gets in the set (it passed the “admission require-
ment”).
b. The ONLY IF: So, IF x gets in the set, then the only way for this to happen is for it
to pass the “admission requirement”; that is, P(x) is true.

In other words, “iff” (as we probably learnt in high school or some previous university
course such as calculus) is the same thing as “is equivalent”:
“x is in the set” is equivalent to “P(x) is true”.

 You cannot leave out both what the nature of your objects of study is and how they behave/interrelate
and get away with it! Euclid omitted the former but provided the latter, so all worked out. 
5 Safe from contradictions, that is.

We denote the collection/set6 defined by the entrance condition P(x) by

{x : P(x)} (1)

but also as
{x | P(x)} (1′)
reading it “the set of all x such that (this “such that” is the “:” or “|”) P(x) is true [or
holds]”
4. “x ∈ A” is the assertion that “object x is in the set A”. Of course, this assertion may be
true or false or “it depends”, just like the assertions of algebra 2 = 2, 3 = 2 and x = y
are so (respectively).
5. x ∉ A is the negation of the assertion x ∈ A.
6. Properties

• Sets are named by letters of the Latin alphabet (cf. Variables, above). Naming is
pervasive in mathematics as in, e.g., “let x = 5” in algebra.
So we can write “let A = {1, 2}” and let “c = {1, {5, 6}}” to give the names A and c
to the two example sets above, ostensibly because we are going to discuss these sets,
and refer to them often, and it is cumbersome to keep writing things like {1, {5, 6}}.
Names are not permanent;7 they are local to a discussion (argument).
• Equality of sets (repetition and permutation do not matter!)
Two sets A and B are equal iff they have the same members. Thus order and multi-
plicity do not matter! E.g., {1} = {1, 1, 1}, {1, 2, 1} = {2, 1, 1, 1, 1, 2}.
• The fundamental equivalence pertaining to definition of sets by “defining property”:
So, if we name the set in (1) above, S, that is,

if we say “let S = {x : P(x)}”, then “x ∈ S iff P(x) is true” (1)

 Incidentally, we almost never say “is true” unless we want to emphasise this fact. We
would say instead: “x ∈ S iff P(x)”.
Equipped with the knowledge of the previous bullet, we see that the symbol {x : P(x)}
defines a unique set/collection: Well, say A and B are so defined, that is, A = {x :
P(x)} and B = {x : P(x)}. Thus
x ∈ A iff P(x) iff x ∈ B
where the first “iff” holds because A = {x : P(x)} and the second because B = {x : P(x)}.

6 We have not yet reached Russell’s result, so keeping an open mind and humouring Cantor we still
allow ourselves to call said collection a “set”.
7 OK, there are exceptions: ∅ is the permanent name for the empty set —the set with no elements at
all— and for that set only; N is the permanent name of the set of all natural numbers.

thus
x ∈ A iff x ∈ B
and therefore A = B by the way equality of sets/collections is defined. 
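Restricted to a small finite universe of discourse, the uniqueness argument can be mirrored with Python set comprehensions, which play the role of {x : P(x)}. This is a toy sketch; the ambient universe and the sample property are assumptions of the sketch (genuine set theory has no such finite universe):

```python
universe = range(-10, 11)

def P(x):
    # a sample "entrance requirement" (an assumption of this sketch)
    return x % 2 == 0

A = {x for x in universe if P(x)}
B = {x for x in universe if P(x)}
# x in A iff P(x) iff x in B, hence A == B: the property names a unique set.
assert A == B
assert all((x in A) == P(x) for x in universe)
```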

Let us pursue, as Russell did, the point made in the boxed statement (last bullet) above. Take
P(x) to be specifically the assertion x ∉ x. He then gave a name to

{x : x ∉ x}

say, R. But then, by the referred to statement above,

x ∈ R iff x ∉ x (2)

If we now believe,8 as Cantor, the father of set theory who did not question and went ahead
with it, that every P(x) defines a set, then R is a set.
 What is wrong with that? 
Well, if R is a set then this object has the proper type to be substituted into the variable
of type “math object”, namely, x, throughout the equivalence (2) above. But this yields the
contradiction
R ∈ R iff R ∉ R (3)
This contradiction is called Russell’s Paradox.
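The contradiction (3) and the way R escapes any proposed universe can be illustrated mechanically. The Python sketch below is a toy illustration only; Python, the finite universe U, and `frozenset` are assumptions of this sketch, not part of the theory:

```python
# (3) says "R ∈ R iff R ∉ R": no truth value satisfies a statement
# equivalent to its own negation.
for r_in_r in (True, False):
    assert (r_in_r == (not r_in_r)) is False

# Over a finite universe U of (hereditarily finite) sets,
# R = {x in U : x not in x} collects everything, since no frozenset
# contains itself; yet R itself lies outside U.
U = {frozenset(), frozenset({1}), frozenset({1, 2})}
R = frozenset(x for x in U if x not in x)
assert R == frozenset(U)   # every member of U passes the entrance requirement
assert R not in U          # the would-be "set of all sets" escapes the universe
```

Whatever finite universe one starts from, the same two assertions go through; this mirrors, in miniature, why no universe of sets can contain its own Russell collection.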
This and similar paradoxes motivated mathematicians to develop formal symbolic logic
and to invent axiomatic set theory9 as a means to avoid paradoxes like the above.
Other mathematicians who did not want to use mathematical logic and axiomatic theories
found a way to do set theory informally, yet safely.
What they did was to ask and answer “how are sets formed?”10
We will look into the details of the founding this “safe” set theory in the next chapter.

8 Informal mathematics often relies on “I know so” or “I believe” or “it is ‘obviously’ true”. Some
people call “proofs” like this —that is, baseless arguments— “proofs by intimidation”. Nowadays,
with the ubiquitousness of the qualifier “fake”, one could also call them “fake proofs”.
9 There are many flavours or axiomatisations of set theory, the most frequently used being the “ZF”
set theory, due to Zermelo and Fraenkel.
10 Actually, axiomatic set theory —in particular, its axioms— is built upon the answers this group
came up with. This story is told at an advanced level in Tourlakis (2003b).
2 Safe Set Theory

Overview

This chapter introduces Russell’s idea that sets are built by stages. This avoids the obvious
contradictions of Cantor’s naïve set theory that stem from situations where it allows
some collections, such as {x : x ∉ x}, to be “self contradictory sets” as mathematicians
referred to them back then when set theory was new and contradictions were first discovered
(cf. Wilder 1963).
Once we introduce the what and the how of Russell’s Principles of set formation by
stages and demonstrate their application by examples, we let the chapter proceed with the
development of the elementary theory of sets. This theory —as introduced here— recognises
that we have two types of collections, sets and non sets, the latter called proper classes in
the modern literature, and we have tools to tell them apart. The “self contradictory sets” of
naïve set theory go away in this setting.
We begin in this chapter the development of elementary set theory by introducing the
usual operations on sets such as ∪, ∩, × that create new sets from given ones.

2.0.1 Definition (Classes and sets) We call all collections classes.


Definitions by defining property “Let A = {x : P(x)}” always define a class, but as we
saw (Sect. 1.1), sometimes —e.g., when “P(x)” is specifically “x ∉ x”— that A is not a
set.
Classes that are not sets are called proper classes.
The above is captured by a simple picture: among all classes, some are sets and the rest are proper classes.


We will normally use what is known as “blackboard bold” notation and capital Latin letters
to denote classes by names such as A, B, X. If we determine that some class A is a set, we
would rather write it as A, but we make an exception for the “real sets” of our mathematical
intuition, the sets mathematicians use in their everyday practice, like the set of natural
numbers, N (also denoted by ω), the integers Z, the rationals Q and the reals R. 

 In forming the class {x : P(x)} for any property P(x) we say that we apply comprehension.
It was Frege and Cantor who believed (explicitly or implicitly) that comprehension was safe
—i.e., always produced what they understood to be a “set”.
But as we saw in Sect. 1.1 Russell proved that this was not the case.

Mind you, Cantor never said what a “set” really is. He just relied on dictionary-derived
synonyms —which, alas, does not settle ita — such as “collection” and “aggregate”.
Nevertheless, very precisely, Russell proved that whatever you might want to call
“collections” (or “sets” for that matter) of objects that are namable by “{x : P(x)}”-
type names it is a mathematical fact that there is a choice of at least one “P(x)” that
names a “collection” that cannot possibly be of the same type as any of those collections
that you just believe you are “defining” via names such as “{x : P(x)}”!
a “Natural language” is neither a substitute nor an aid for the precision of mathematics.


It is a widely held tenet that set theory, using as primitives the notions of set, atom (an
object that is not sub-divisible; not a collection of objects), and the relation belongs to (∈),
is sufficiently strong to serve as the foundation of all mathematics. Mathematicians use
notation and results from set theory in their everyday practice.
  In Definition 2.0.1 we said that {x : P(x)} always defines a class, say, A.
Is there a converse in this observation? That is, if A names a class, is there always a
“property” P(x) —whose expression does not use the letter A— such that A = {x : P(x)}?
If P(x) can refer to the letter A then, yes: A = {x : x ∈ A} since this simply says “x ∈ A
iff x ∈ A” (cf. (1) on p. 4).

If on the other hand we heed the restriction in italics immediately above, then this converse
is false. Here is why:
The term “property” is in our context, mathematically speaking, a “formula (of logic) in
which the only set theory symbol that need occur is ∈”.1
We will later learn that we can only have enumerably many such formulas (properties) —
meaning that we can enumerate all of them in a straight (infinitely long) line where unique
positive integers are associated with —they index— each formula that we place on said line,
and no such integer repeats in our straight line.
On the other hand we will also learn that if we consider all the sets of integers, then
we cannot enumerate them in a similar way on a straight line. We have “more” such
sets of integers S than we have properties T (x) to name them without cheating2 as S =
{x : T (x)}. 

2.1 The “Real Sets”

So, how can we tell, or indeed guarantee, that a certain class is a set?
Russell proposed the following “recovery” from his Paradox:
 Make sure that sets are built by stages, where at stage 0 all atoms are available. Atoms are
also called urelements in the literature from the German Urelemente, which in analogy with
the word “urtext” —meaning the earliest text— would mean that they are the “earliest”
mathematical objects available. Witness that they are available at stage 0! 
We may then collect atoms to form all sorts of “first level” sets. We may also proceed to
collect any mix of atoms and first-level sets to build new collections —second-level sets—
and so on. Much of what set theory does is attempting to remove the ambiguity from this
“and so on”. See below, Principles 0–2.
Thus, at the beginning we have all the level-0, or type-0, objects available to us. For
example, atoms such as 1, 2, 13, √2, π are available. At the next level we can include any
number of such atoms (from none at all, to all) to build a set, that is, a new mathematical
object. Allowing the usual notation, i.e., listing of what is included within braces, we may
cite a few examples of level-1 sets:

L1-1. {1}.
L1-2. {1, 1}.
L1-3. {1, 2}.

1 The multitude of symbols we use in set theory, “∅, ∩, ⊆, ∪, ” are all derived symbols —“macros”
if you will— that are expressed using variables and “∈” only.
2 “Cheating” would be to write S = {x : x ∈ S}. You see, the informal name “S” is not a permissible
symbol to use in writing down set or atom “properties”. It is neither a symbol of logic, nor is it the
permissible symbol ∈.


L1-4. {√2, 1}.
L1-5. {π, 3, π}.

We already can identify a few level-2 objects, using what (we already know) is available:

L2-1. {{√2, 1}}.
L2-2. {{0}, π}.

 Note how the level of nesting of { }-brackets matches the level or stage of the formation of
these objects! 
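The observation that the nesting depth of braces matches the formation stage can be turned into a small function over hereditarily finite objects. This is a toy model in which stages are natural numbers and “atom” means anything that is not a `frozenset`; Python and those modelling choices are assumptions of the sketch, not the theory itself:

```python
def stage(obj):
    """Earliest stage at which obj is available: atoms at stage 0,
    a set one stage after the latest stage among its members."""
    if not isinstance(obj, frozenset):
        return 0  # atoms are given outright at stage 0
    return 1 + max((stage(m) for m in obj), default=0)

assert stage(1) == 0                                   # an atom
assert stage(frozenset({1})) == 1                      # a level-1 set, e.g. {1}
assert stage(frozenset({frozenset({5, 6}), 1})) == 2   # level-2, e.g. {{5, 6}, 1}
```

Note that the function returns a value one greater than the deepest member, exactly as Principle 0 demands: a set is built only after all its members are available.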

2.1.1 Definition (Class and set equality) This definition applies to any classes, hence, in
particular, to any sets as well.
Two classes A and B are equal —written A = B— means

x ∈ A iff x ∈ B (1)

That is, an object is in A iff it is also in B.


A is a subclass of B —written A ⊆ B— means that every element of the first class occurs
also in the second, or
If x ∈ A, then x ∈ B
If A is a set and A ⊆ B, then we say it is a subset of B.
If we have A ⊆ B but A ≠ B, then we write A ⫋ B (some of the literature uses A ⊊ B
or even A ⊂ B instead) and say that A is a proper subclass of B.
Caution. In the terminology “proper subclass” the “proper” refers to the fact that A is
not all of B. It does not say that A is not a set! It may be a set and then we say that it is
a proper subset of B. 

 2.1.2 Remark

(1) Thus our definition of how classes (or sets) compare with respect to equality is chosen
to be 2.1.1 —yes, this is our choice about what is the important factor for two classes to
be equal; other choices are possible but not taken in the standard set theory literature.
For example, {1} = {1, 1, 1}. Why? Because any object I see in the class to the left of
“=” I also see in the class to the right, and vice versa. Similarly, {1, 2} = {2, 1}, for I
see just “1” and “2” in the left class and I note these objects are also in the right class,
and vice versa.

These two observations related to the representation of classes by listing, obtained from
Definition 2.1.1, are often stated as “in a class or set depicted by listing its elements
within braces, neither the order (of listing) the elements nor their multiplicity matter”.
Thus one will usually write {1, 2} rather than {1, 2, 2, 1, 1}.
(2) If n is an integer-valued variable, then what do we understand by the statement “2n is
even”? The normal understanding is that “no matter what the value of n is, 2n is even”,
or “for all values of n, 2n is even”.
When we get into our logic topic later on we will see that we can write “for all values
of n, 2n is even” with less English as “(∀n)(2n is even)”. The expression “(∀n)” says
“for all (values of) n”.
Mathematicians often prefer to have statements like “2n is even” with the “for all”
implied.3 You can write a whole math book without writing ∀ even once, and without
overdoing the English.
(3) Definition 2.1.1 is called “extensionality” because it is the extension —that is, what
members are in the two classes— that determines equality; not the intension —i.e., how
the members of the two classes were selected.
For example, the two classes {x : x² − 2x + 1 = 0} and {1} are equal. Both contain just
“1”.
(4) Definition 2.1.1, more economically, could be stated

if we have “x ∈ A iff x ∈ B”, then we have A = B

The converse follows from logic needing no help from set theory concepts.
How? Well, in
x ∈A (†)
A is a name of a mathematical object. Therefore, if the name B stands for the same
object (i.e., A = B) then x ∈ B means exactly the same thing as (†).
But see also the  -passage below.  

  Is Definition 2.1.1 a “definition” or is it an “axiom”?


It depends:

• In a formal approach to set theory, said theory is an extension of predicate logic, obtained
by adding the theory-specific symbol “∈” and adding a number of set theory-specific

3 An exception occurs in Induction that we will study later, where you fix an n (but keep it as a variable
of an unspecified fixed value, not as 5 or 42) and assume the “induction hypothesis” P(n). But do not
worry about this now!

axioms, one of which is extensionality.4 The axioms governing the behaviour of equality
“=” in logic are inherited by any theory that we base on logic, that is, a theory whose
theorems we are proving syntactically using logic.
• Thus in such theories we do not “redefine” or “amend” what equality is and what its
axiomatically postulated properties —in logic— are.5
• Here is an analogy from Peano Arithmetic (PA) —an axiomatic theory of natural numbers
based on logic. It contains among others the axiom “x + 1 = y + 1 implies x = y”. This
axiom evidently is not a “definition of =” between numbers, but rather is an axiom about
the behaviour of the function “+1”6 in the presence of equality.
Another property of the successor is captured by the PA axiom “x + 1 ≠ 0”, and again it
clearly does not “define” equality or its negation “≠” —it is rather about the successor’s
behaviour around “=” and “0”.
Entirely analogously, extensionality is not about logic’s “=”, but rather is about how sets7
behave around “=”.
An axiom about sets!
• However, our exposition of the elements of “safe set theory” is not axiomatic —so we do
not rely on preexisting axioms for “=” from logic— thus we will side with the excellent
informal but mathematically rigorous discussion of the foundations of set theory in Wilder
(1963, p. 58) and take extensionality as a definition with no harmful side-effects. This
choice is convenient as at once we “define” equality for all classes —that may or may
not be sets.


2.1.3 Remark Since “iff” between two statements S1 and S2 means that we have both
directions
If S1 , then S2
and
If S2 , then S1
we have that “A = B” is logically the same as (equivalent to) “A ⊆ B and B ⊆ A”. 
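The same equivalence is visible in Python’s frozensets, where `==` holds exactly when both inclusions `<=` do. A finite illustration of Remark 2.1.3, not a proof; Python and the sample pool are assumptions of this sketch:

```python
import itertools

pool = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({2, 1})]
for A, B in itertools.product(pool, repeat=2):
    # A == B exactly when A ⊆ B and B ⊆ A
    assert (A == B) == (A <= B and B <= A)
```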

4 In formal set theory if one ever speaks of classes (e.g., Levy 1979; Tourlakis 2003b) then one does
so informally and only for convenience. Non set classes have no status within the theory. Within the
theory we have only sets and atoms, and the axioms are about sets and atoms only.
5 One such postulated property of “=” in logic is one of the usual axioms of equality, namely,
“x = x”.
6 Called the successor function.
7 A symbolic formulation within logic of the relationship between “ A = B” (for sets) and “x ∈ A iff
x ∈ B” is the “axiom of extensionality” of axiomatic set theory.

2.1.4 Example In the context of the “A = {x : P(x)}” notation we should remark that
notation-by-listing can be simulated by notation-by-defining-property: For example, {a} =
{x : x = a} —here “P(x)” is x = a.
Also {A, B} = {x : x = A or x = B}. Let us verify the latter: Say x ∈ lhs.8 Then x = A
or x = B. Thus x must be A or B. But then the entrance requirement of the rhs9 is met, so
x ∈ rhs.
Conversely, say x ∈ rhs. Then the entrance requirement is met so we have (at least) one
of x = A or x = B. Trivially, in the first case x ∈ lhs and ditto for the second case. 
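Example 2.1.4’s simulation of listing by a defining property also goes through in a finite toy model. The universe below, and Python itself, are assumptions of this sketch:

```python
universe = [0, 1, 2, "a", "b"]

a = "a"
assert {x for x in universe if x == a} == {a}              # {a} = {x : x = a}

A, B = 1, "b"
assert {x for x in universe if x == A or x == B} == {A, B} # {A, B} by property
```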

We now postulate the principles of formation of sets!

Sets and atoms are the mathematical objects of our (safe) set theory.
Sets are formed by stages. At stage 0 we acknowledge the presence of atoms. They are
given outright, they are not built.

Principle 0. At any stage Σ we may build a set, collecting together other mathematical
objects (sets or atoms) provided these (mathematical) objects that we put into our set
were available at stages before Σ.

Principle 1. Every set is built at some stage.

 So there is no set in our approach that “just happens”. 


Principle 2. If Σ is a stage of set construction, then there is a stage Σ′ after it.

 2.1.5 Remark (Assumed properties of stages) The reader would be surprised by this
remark: Do we need to say more about stages? The concept of building something by stages
is intuitively clear: At stage 0 we do this; at stage 1 we do that; at stage 2 we do something
else, etc.
Note however that this impatient observation is based on stages that are (named by)
natural numbers. Natural numbers have nice and well understood order properties. For
example you cannot have n < n if n is a natural number. In fact you cannot have an increasing
chain that starts with n and ends with n. So, we do not have n < n. On the other hand we
have that n < m < k implies n < k.
But there are far too many sets. More than natural numbers, which is easy to readily
agree with since we also have real numbers, a much “larger” set that contains the natural
numbers.
 Ergo, we need many more stages of set formation than just (those named by) natural numbers. 

8 Left Hand Side.


9 Right Hand Side.

So we postulate for our stages “reasonable” and “intuitively desirable” properties below,
which imitate the order properties of natural numbers, without attempting to identify stages
with such numbers as this would be unnecessarily restrictive as we noted above.
Below we depict stages by the letters Σ or T with or without primes or subscripts and
postulate as true a few intuitively pleasing properties they will have with respect to the “before”
and “after” relation.
We accept that the stages of set formation ordered by “before” (or “after”) share the
following properties with the natural numbers, the latter ordered by “less than”.
Namely, let us write Σ <s Σ′ for “stage Σ is before stage Σ′”. Then we have

1. Σ <s Σ is false. That is, “before” and “after” mean what we expect them to. No
event or stage can occur before (or after) itself.
2. If Σ <s Σ′ <s Σ″, then Σ <s Σ″. No surprises here either: the expected transi-
tivity of the before and after relations.
3. If Σ, Σ′ are stages, then we have one of Σ <s Σ′, Σ = Σ′, Σ′ <s Σ. We expect to
be able to tell if a stage is before another (or after, or the same), else how will
we be able to assert that a class that we just built was built after all its members?
4. If Σ is any stage, then there is a stage Σ′ after it: Σ <s Σ′ (this repeats Principle 2).

Principle 2 (equivalently, 4. above) makes it clear that we have infinitely many stages of
set formation in our toolbox. Indeed, starting with any Σ, by repeated application of said
Principle we can build an infinite ascending sequence

Σ <s Σ′ <s Σ″ <s . . . (1)

All members in (1) are distinct, else one, say Σa, repeats. We then have

Σa <s T <s T′ <s . . . <s Σa

By repeated application of 2. we get Σa <s Σa, which contradicts 1.  
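The four postulated properties can be checked exhaustively in a toy model where the stages are (named by) natural numbers and <s is the ordinary <. Keep in mind the remark above that the real stages outrun the naturals; Python and the finite range are assumptions of this sketch:

```python
# Toy model: take stages to be the natural numbers 0..9, "before" to be "<".
S = range(10)

# 1. irreflexivity: no stage is before itself
assert all(not (s < s) for s in S)
# 2. transitivity of "before"
assert all(a < c for a in S for b in S for c in S if a < b and b < c)
# 3. trichotomy: exactly one of "before", "equal", "after" holds
assert all((a < b) + (a == b) + (b < a) == 1 for a in S for b in S)
# 4. in the full model N every stage s has a later stage, e.g. s + 1
assert all(s < s + 1 for s in S)
```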

2.1.6 Remark If some set is definable (“buildable”) at some stage Σ, then it is also definable
at any later stage as well, as Principle 0 makes clear.
The informal set-formation-by-stages will guide us to build, safely, all the sets we may
need in order to do mathematics. 

  In axiomatic set theory ZFC10 —just as in “small” tasks where we use natural numbers
as “stages”— one defines stages beyond natural numbers to be certain “infinite numbers”
called ordinals. See, for example, Tourlakis (2003b). 

2.2 What Caused Russell’s Paradox

How would the set-building-by-stages doctrine avoid Russell’s paradox?

 Recall that à la Cantor we get a paradox (contradiction) because we insisted on believing that
all classes are sets, that is, following Cantor we “believed” Russell’s “R” was a set. 
Principles 0–2 allow us to know a priori that R is a proper class. No contradiction!

How so?

OK, is x ∈ x true or false? Is there any mathematical object x —say, A— for which the
following is true?

A ∈ A? (1)
Well, for atom A, (1) is false since atoms have no set structure, that is, they are
not collections of objects. An atom A cannot contain anything, in particular it cannot
contain A.
What if A is a set and A ∈ A? Then in order to build A, the set, we have to wait until
after its member, A, is built (Principle 0 says “provided”). So, we need (the left) A to be
built before we can build (the right) A in (1) as a set. In short, since the left and right A are
the same, we want A built before A is built. Preposterous!

So (1) is false. A being arbitrary, we demonstrated that

x ∈ x is false (for all x) (2)

thus x ∉ x is true (for all x), therefore the R of Sect. 1.1 is U, the universe of all sets and
atoms; the class of everything.
atoms; the class of everything.
 “Everything” with restrictions in the modern literature. Our classes are allowed to contain
only atoms and sets. Not proper classes.
Of course, Cantorian set theory had no such restriction since it did not distinguish between
set and non-set classes to begin with. 
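The stage argument above can be mimicked with Python’s hereditarily finite frozensets, where a set can only be formed out of already completed objects. A quick sanity check, offered as a toy illustration and not a proof:

```python
samples = [
    frozenset(),
    frozenset({1}),
    frozenset({1, frozenset({5, 6})}),
]
# No frozenset can be a member of itself: it must be fully built from
# previously built objects before membership in it can even be tested.
assert all(s not in s for s in samples)
```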

10 As founded by Zermelo and Fraenkel, with the axiom of Choice.



Thus
R=U
Here is now an exact reason why U is not a set. Well, assume for a moment that it is.
Then

• U ∈ U since the rhs contains all sets and we temporarily assumed the lhs to be a set.
• But we just saw that the above is false taking x to be U in (2) above.

 2.2.1 Remark The immediate reaction of the mathematical community to Russell’s paradox was to blame “size”: “R is too big to be a set”. Well, define “too big”!
They did not. The discussion of the panic that ensued is outlined in Wilder (1963) in a
very illuminating manner. He points out (loc. cit.) that even the phrase “all sets” was viewed
with suspicion, not only the dangerous act of collecting “all sets”!
But why did Russell bother to define his R? Why did he not use

U = {x : x = x}

to collect all sets and prove directly, as we did above, that U cannot be a set, thus demon-
strating in this alternative way that not all “defining properties” lead to sets?
Because his idea that sets should be built by stages was suggested later. Incidentally,
the “too big” U —if any collection qualifies for the label “too big” surely the one that
contains everything does!— was also discovered (without showing R = U) to not be a set
in a roundabout longish manner that I cannot reproduce this early in our development.
I promise to come back to this “paradox of the powerset” —to be defined later. You see,
U being an omni-container contains its powerset as an element and as a subset. A so-called
cardinality argument then derives a contradiction to the claim that U is a set.  

So U, and R, are proper classes. Thus, the fact that R is not a set is neither a surprise, nor paradoxical. It is just a proper class, as we have just recognised.

2.3 Some Useful Sets

2.3.1 Example (Pair) By Principle 0, if A and B are sets or atoms, then let A be available at stage Σ and B at stage Σ′. Without loss of generality say Σ is not later than Σ′ —recall postulate 3. about the relative positions of two stages.
Let us then pick a stage Σ″ after Σ′ (Principle 2). This will be after both (postulate 2. on p.14) Σ, Σ′.
At stage Σ″ we can build
{A, B} (1)

as a set (cf. Principle 0).


We call (1) the (unordered) pair set.
Pause. Why “unordered”? See Remark 2.1.2, item 1. 

We have just proved a theorem above:

2.3.2 Theorem If A, B are sets or atoms, then {A, B} is a set.

2.3.3 Exercise Without referring to stages in your proof, prove that if A is a set or atom,
then {A} is a set.
Incidentally, a set that contains exactly one element is called a singleton. 

 2.3.4 Remark A very short digression into Boolean Logic —for now. It will be con-
venient to use truth tables to handle many simple situations that we will encounter where
“logical connectives” such as “not”, “and”, “or”, “implies” and “is equivalent” enter into
our arguments.
We will put on record here how to handle things such as “S1 and S2 ”, “S1 implies S2 ”,
etc., where S1 and S2 stand for two arbitrary statements of mathematics. In the process we
will introduce the mathematical symbols for “and”, “implies”, etc.

The symbol translation table from English to symbol (and back) is:

NOT ¬
AND ∧
OR ∨
IMPLIES (IF…,THEN) →
IS EQUIVALENT ≡

The truth table below has a simple reading. For all possible truth values —true/false, in
short t/f— of the “simpler” statements S1 and S2 we indicate the computed truth value of the
compound (or “more complex”) statement that we obtain when we apply one or the other
Boolean connective of the previous table.

S1 S2 ¬S1 S1 ∧ S2 S1 ∨ S2 S1 → S2 S1 ≡ S2 S2 → S1
f f t f f t t t
f t t f t t f f
t f f f t f f t
t t f t t t t t

Comment. All the computations of truth values satisfy our intuition, except perhaps that
for “→”:
¬ flips the truth value as it should, ∧ is eminently consistent with common sense as it
applies to “and”, ∨ is the “inclusive or” of the mathematician, and ≡ is just equality on the
set {f, t}, as it should be.
The “uneasiness” with this so-called “classical” → is that there is no causality from left
to right. The only “easy to understand” entry is for t → f. The outcome should be false, that
is, indicating a “bad implication”: You see, we have a true hypothesis but a false conclusion
while, intuitively, a “good” implication ought to preserve truth. This implication must be
“broken”, so we entered f.
But what I just said about the case t → f indicates that → is meant to preserve truth from left to right. And that is precisely what it does, as per the table!
Here is the full picture for →:

• Row one is the “no counterexample” case. That is, I claim that truth was preserved since
there was no truth (left of →) to preserve anyway! You have no counterexample to what
I said. For that you need a t to the left and an f to the right of →.
• In the second row we are good! We got t without lifting a finger!
• In the last row truth is preserved left to right!
• As for row three, we made our case already.

Practical considerations. Thus

1. If you want to demonstrate that S1 ∨ S2 is true, for any component statements S1 , S2 ,


then show that at least one of the S1 and S2 is true.
2. If you want to demonstrate that S1 ∧ S2 is true, then show that both of the S1 and S2 are
true.

Note, incidentally, if we know that S1 ∧ S2 is true, then the truth table guarantees that
each of S1 and S2 must be true.

3. If now you want to show the implication S1 → S2 is true, then the only real work is to
show that if we assume S1 is true, then S2 is true too.
If S1 is known to be false, then no work is required to prove the implication because of
the first two lines of the truth table!
4. If you want to show S1 ≡ S2, then —because the last three columns show that this is equivalent to (same truth values as) (S1 → S2) ∧ (S2 → S1)— you just prove each of the two implications S1 → S2 and S2 → S1.  
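The practical considerations above can be checked mechanically. Below is a small sketch in Python (my choice of language here, not the book’s; the names `implies` and `equiv` are ours) that enumerates the truth table and confirms item 4: the ≡ column agrees with the conjunction of the two → columns.

```python
from itertools import product

# "implies" is not a Python built-in; we transcribe its truth-table column:
# the only false row is t -> f.
def implies(p, q):
    return (not p) or q

# "equiv" is just equality on {f, t}, as the text observes.
def equiv(p, q):
    return p == q

# Enumerate all rows (f/f, f/t, t/f, t/t) of the truth table.
for s1, s2 in product([False, True], repeat=2):
    # Item 4: S1 = S2 holds exactly when both implications hold.
    assert equiv(s1, s2) == (implies(s1, s2) and implies(s2, s1))
print("all rows check out")
```

Running it prints the confirmation; any discrepancy with the truth table would trip an assert.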

From the truth table we see that we have one unary (takes one argument) and four binary
(they take two arguments each) Boolean connectives. We can cascade the operations the
connectives indicate to obtain more complex expressions, such as (S1 ∨ S2 ) ∧ S3 , (S1 ∧
S2 ) ∨ S3 , (S1 → S2 ) → S3 .
Do we always have to carry as many brackets as in the examples immediately above?

Well, as a rule never remove brackets that you or someone else that understands logic
cannot restore correctly.

By agreeing on the “strengths” or “priorities” of the connectives we often can get away
with fewer brackets iff we have an algorithm using which we can restore the ones we
remove to their original positions. The usual agreement is that the unary “¬” is strongest
(has highest priority) and the binary connectives follow (see below) from left to right in
order of decreasing priority.
¬, ∧, ∨, →, ≡ (†)
Equipped with these priorities we can reinsert brackets correctly if anyone has removed
them correctly.

2.3.5 Example

1. Consider (S1 ∨ S2 ) ∧ S3 . We cannot remove the brackets we see in this example, for if
we did, then the strengths of the connectives would suggest we reinsert them this way

S1 ∨ (S2 ∧ S3 ) (‡)

because, in the contest between ∨ and ∧ to win over S2 , ∧ wins in (‡) while ∨ wins in
the original (as the brackets override the priorities).
But (‡) is not correct. How do we determine incorrectness? By finding truth values for
the Si —can you find them?— that lead to distinct results in the original versus (‡).
2. (S1 ∧ S2 ) ∨ S3 . This simplifies to S1 ∧ S2 ∨ S3 in a reversible manner, since the priority
of ∧ (vs. ∨) allows us to reinsert the missing brackets.
3. (S1 → S2 ) → S3 . Can we simplify this, by, say, removing the brackets? This example
amplifies the fact that the priorities are chosen by agreement.
So if we did remove the brackets, how would we reinsert them? Well, the standard
agreement when the same connective fights to win an Si , as in

S1 → S2 → S3 (¶)

is to let the one to the right always win. That is, if we have a chain connected using the
same connective throughout we insert brackets from right to left. Thus here we would
say that brackets would have to be inserted this way

S1 → (S2 → S3) (♦)

So? Is the above different from the formula in the beginning of 3?

Yes! Find truth values for the Si so that the overall truth values of the formula in the beginning of 3 and (♦) are different. Thus, the bracket removal we contemplated in the 2nd sentence of 3 is incorrect. 
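The truth values asked for in items 1 and 3 can be found by brute force over all eight assignments; a Python sketch (the variable names are ours):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# Item 1: (S1 v S2) ^ S3  versus the incorrect reinsertion  S1 v (S2 ^ S3).
witnesses_item1 = [(s1, s2, s3)
                   for s1, s2, s3 in product([False, True], repeat=3)
                   if ((s1 or s2) and s3) != (s1 or (s2 and s3))]

# Item 3: (S1 -> S2) -> S3  versus  S1 -> (S2 -> S3).
witnesses_item3 = [(s1, s2, s3)
                   for s1, s2, s3 in product([False, True], repeat=3)
                   if implies(implies(s1, s2), s3) != implies(s1, implies(s2, s3))]

print(witnesses_item1)  # [(True, False, False), (True, True, False)]
print(witnesses_item3)  # [(False, False, False), (False, True, False)]
```

Each listed triple is a witness: the two bracketings compute different truth values there, so neither bracket removal is reversible.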

 An important variant of → and ≡. Pay attention to this point since almost everybody
gets it wrong! In the literature and in the interest of creating a usable shorthand many
practitioners of mathematical writing use notation like

S1 → S2 → S3 (1)

attempting to convey the meaning

(S1 → S2 ) ∧ (S2 → S3 ) (2)

Alas, (2) is not the same as (1)!11


Back to →-chains like (1) versus chains like (2): Take S1 to be t (true), S2 to be f and
S3 to be t. Then (1) is true because in a chain using the same Boolean connective we put
brackets from right to left: (1) is S1 → (S2 → S3 ) and evaluates to t, while (2) evaluates
clearly to false (f) since S1 → S2 = f and S2 → S3 = t.
So we need a special symbol to denote (2) correctly. We need a conjunctional implies!
Most people use =⇒ for that:
S1 =⇒ S2 =⇒ S3 (3)
that means, by definition, (2) above. Incidentally, a conjunctional implication “=⇒” is, we
say, an implication used conjunctionally.
Similarly, ≡ is not conjunctional, it is associative. That is,

S1 ≡ S2 ≡ S3 (4)

means equivalently, as one can check from the truth tables,

11 Logic does not have the sole privilege of being abused. So does plain arithmetic, from High School
onwards: One often writes a < b < c but they mean a < b ∧ b < c! This is wrong!
An amusing example from PL/1 —“Programming Language One”— an old programming lan-
guage that incorporates Algol and Cobol (!) and SNOBOL (!) among others is based on the flexibility
of this language in its handling of different data types. It converts from one data type to the other
readily, without error messages. In particular, the logical “true” constant (what we call “t” in this
book) is essentially —I am avoiding tedious details that are immaterial here— the number “1”. Thus
it allows, say, 6 > 5 > 3 as a condition. PL/1 evaluates expressions from left to right and 6 > 5 is
evaluated first and returns 1 (true). Then 1 > 3 is evaluated and returns false.
Try this in your familiar programming language and see what happens!
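For what it is worth, Python happens to make the opposite choice from PL/1: it reads a comparison chain conjunctionally, exactly in the spirit of the conjunctional connectives above. A quick check:

```python
# Python chains comparisons: a > b > c means (a > b) and (b > c).
chained = 6 > 5 > 3
# Forcing the PL/1-style left-to-right evaluation: True is treated as 1.
pl1_style = (6 > 5) > 3  # i.e. True > 3, i.e. 1 > 3
print(chained, pl1_style)  # True False
```

So the “wrong” High School habit a < b < c is actually legal, conjunctional syntax in Python, while the explicit bracketing reproduces the PL/1 behaviour described above.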

(S1 ≡ S2) ≡ S3 (4′)

or
S1 ≡ (S2 ≡ S3) (4″)
where the brackets indicate “which ≡ applies first”.
On the many occasions that we may want to chain two or more equivalences with the intention that the chain means that all the equivalences are true, we use a conjunctional “≡” denoted by ⇐⇒.
Thus
S1 ⇐⇒ S2 ⇐⇒ S3 (5)
means that all equivalences Si ≡ Si+1 —for i = 1, 2— are true. “⇐⇒” is the conjunctional
“≡”.
Note that the notation ⇐⇒ is not offered just for the sake of notation.
The two notations do have distinct meanings. For example, if the truth values of S1, S2 and S3 are f, f and t, respectively, then (4) computed either as (4′) or the equivalent (4″) yields the value t.
On the other hand, evaluating (5), that is, (5′) below for the same truth values of the Si yields the value f.
(S1 ≡ S2) ∧ (S2 ≡ S3) (5′)
So how do we denote (5) correctly without repeating the consecutive S2’s and omitting the implied “∧”? This way:
S1 ⇐⇒ S2 ⇐⇒ S3 (5)
By definition, “⇐⇒” is conjunctional: It applies to two statements —Si and Si+1 — only
and implies an ∧ before the adjoining next similar equivalence. 
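The difference between the associative chain (4) and the conjunctional chain (5) is easy to confirm at the truth values f, f, t used in the text (a Python sketch; `equiv` is our name):

```python
def equiv(p, q):
    return p == q  # the truth table of the connective "is equivalent"

s1, s2, s3 = False, False, True

# (4) read associatively: (S1 = S2) = S3; bracketing either way gives the same.
associative = equiv(equiv(s1, s2), s3)
# (5) read conjunctionally: (S1 = S2) ^ (S2 = S3), i.e. (5') in the text.
conjunctional = equiv(s1, s2) and equiv(s2, s3)

print(associative, conjunctional)  # True False
```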

2.3.6 Theorem (The subclass theorem) Let A ⊆ B (B a set). Then A is a set.

Proof Well, B being a set there is a stage Σ when it is built (Principle 1). By Principle 0, all members of B are available or built before stage Σ.
But by A ⊆ B, all the members of A are among those of B.
Hey! By Principle 0 we can build A at stage Σ, so it is a set. 
Some corollaries are useful:

2.3.7 Corollary (Modified comprehension I) If for all x we have

P(x) → x ∈ A (1)

for some set A, then B = {x : P(x)} is a set.



Proof I will show that B ⊆ A, that is,

x ∈ B → x ∈ A

Indeed (see 3. under Practical considerations in Remark 2.3.4), let x ∈ B. Then P(x) is true, hence x ∈ A by (1). Now invoke Theorem 2.3.6. 

2.3.8 Corollary (Modified comprehension II) If A is a set, then so is B = {x : x ∈ A ∧ P(x)} for any property P(x).

Proof The defining property here is “x ∈ A ∧ P(x)”. This implies x ∈ A —by 2. in Remark
2.3.4— that is, we have
(x ∈ A ∧ P(x)) → x ∈ A
Now invoke Corollary 2.3.7. 

 2.3.9 Remark (The empty set) The class E = {x : x ≠ x} has no members at all; it is empty. Why? Because
x ∈ E ≡ x ≠ x
but the condition x ≠ x is always false, therefore so is the statement

x ∈ E (1)

Nothing is permitted to enter E.


Is the class E a set?
Well, take A = {1}. This is a set as the atom 1 is given at stage 0, and thus we can construct
the set A at stage 1.
Note that, by (1) and 3. in Remark 2.3.4 we have that

x ∈ E → x ∈ {1}

is true (for all x). That is, E ⊆ {1}.

By Theorem 2.3.6, E is a set.

But is it unique, so we can justify the use of the definite article “the”? Yes. The specification of the empty set is “a class with no members”. So if D is another empty set, then we will have
x ∈ D always being false. But then

x ∈ E ≡ x ∈ D (both sides of ≡ are false)



and we have E = D by Definition 2.1.1.


The unique empty set is denoted by the symbol ∅ in the literature.  

2.4 Operations on Classes and Sets

The reader probably has seen before (perhaps in calculus) the operations on sets denoted by
∩, ∪, − and others. We will look into them in this section.

2.4.1 Definition (Intersection of two classes) We define for any classes A and B

A ∩ B =Def {x : x ∈ A ∧ x ∈ B}

We call the operator ∩ intersection and the result A ∩ B the intersection of A and B.
If A ∩ B = ∅ —which happens precisely when the two classes have no common elements— we call the classes disjoint.
It is meaningless to have ∩ operate on atoms.12 

We have the easy theorem below:

2.4.2 Theorem If B is a set, as its notation suggests, then A ∩ B is a set.

Proof I will prove A ∩ B ⊆ B which will rest the case by Theorem 2.3.6. So, I want

x ∈ A ∩ B → x ∈ B

To this end, let then x ∈ A ∩ B (cf. 3. in 2.3.4). This says that x ∈ A ∧ x ∈ B is true, so x ∈ B is true. 

2.4.3 Corollary For sets A and B, A ∩ B is a set.

2.4.4 Definition (Union of two classes) We define for any classes A and B

A ∪ B =Def {x : x ∈ A ∨ x ∈ B}

We call the operator ∪ union and the result A ∪ B the union of A and B.
It is meaningless to have ∪ operate on atoms. 

12 The definition expects ∩ to operate on classes. As we know, atoms (by definition) have no set/class
structure thus no class and no set is an atom.

2.4.5 Theorem For any sets A and B, A ∪ B is a set.

Proof By assumption say A is built at stage Σ while B is built at stage Σ′. Without loss of generality (in short, “wlg”) say Σ is no later than Σ′, that is, Σ ≤ Σ′.
By Principle 2 I can pick a stage Σ″ > Σ′, thus

Σ″ > Σ′ (1)

and

Σ″ > Σ (2)

Let us pick any item x ∈ A ∪ B:
I have two (not necessarily mutually exclusive) cases (by Definition 2.4.4):

• x ∈ A. Then x was available or built13 at a stage < Σ,

hence, by (2), x is available before Σ″ (3)

• x ∈ B. Then x was available or built at a stage < Σ′,

hence, by (1), x is available before Σ″ (4)

In either case, (3) or (4), the arbitrary x from A ∪ B is built before Σ″, so we can collect all those x-values at stage Σ″ in order to form a set: A ∪ B. 

2.4.6 Definition (Difference of two classes) We define for any classes A and B

A − B =Def {x : x ∈ A ∧ x ∉ B}

We call the operator “−” (set-theoretic) difference and the result A − B the difference of A and B, in that order.
It is meaningless to have − operate on atoms. 

2.4.7 Theorem For any set A and class B, A − B is a set.

Proof The reader is asked to verify that A − B ⊆ A. We are done by Theorem 2.3.6. 

 Notation. The definitions of ∩ and − suggest a shorter notation for the rhs of A ∩ B and A − B. That is, respectively, it is common to write instead

{x ∈ A : x ∈ B}

and

{x ∈ A : x ∉ B}

13 As x may be an atom, we allow the possibility that it was available with no building involved, hence we said “available or built”. For A and B though we are told they are sets, so they were built at some stage, by Principle 1!
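For finite sets, the definitions of ∩, ∪ and − and the shorter {x ∈ A : . . .} notation translate directly into set comprehensions. A Python sketch with two sample sets:

```python
A = {1, 2, 3}
B = {2, 3, 4}

# {x in A : x in B} -- Definition 2.4.1 (intersection)
intersection = {x for x in A if x in B}
# {x in A : x not in B} -- Definition 2.4.6 (difference)
difference = {x for x in A if x not in B}
# x contributed by A or by B -- Definition 2.4.4 (union)
union = {x for x in [*A, *B]}

# The comprehensions agree with Python's built-in set operators.
assert intersection == A & B
assert difference == A - B
assert union == A | B
print(intersection, difference, union)
```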

2.4.8 Exercise Demonstrate —using Definition 2.4.1— that for any A and B we have
A ∩ B = B ∩ A.
Hint. There are two parts in the proof:

1. Assume that x ∈ A ∩ B. Prove that x ∈ B ∩ A.


2. Assume that x ∈ B ∩ A. Prove that x ∈ A ∩ B.

For 1. and 2. use Remark 2.3.4, practical considerations, 3. 

2.4.9 Exercise Demonstrate —using Definition 2.4.4— that for any A and B we have
A ∪ B = B ∪ A. 

2.4.10 Exercise By picking two particular very small sets A and B show that A − B =
B − A is not true for all sets A and B.
Is it true of all classes? 

Let us generalise unions and intersections next. First a definition:

2.4.11 Definition (Family of sets) A class F is called a family of sets iff it contains no
atoms. The letter F is here used generically, and a family may be given any name, usually
capital. 

2.4.12 Example Thus, ∅ is a family of sets; the empty family.


So are {{2}, {2, {3}}} and V, the latter given by

V =Def {x : x is a set}

Incidentally, as V contains all sets (but no atoms!) it is a proper class! Why? Well, if it is a set, then it is equal to one of the x-values that we are collecting, thus V ∈ V. But we saw that this statement is false for sets!

2.4.13 Exercise Is A ∈ A also false for proper classes A? Why? 



Here are some classes that are not families: {1}, {2, {{2}}} and U, the latter being the
universe of all objects —sets and atoms— and equals Russell’s “R” as we saw in Sect. 2.2.
These all are disqualified from being “families” as they contain atoms. 

2.4.14 Definition (Intersection and union of families) Let F be a family of sets. Then

(i) the symbol ⋂F denotes the class that contains all the objects that are common to all A ∈ F.
In symbols the definition reads:

⋂F =Def {x : for all A, A ∈ F → x ∈ A} (1)

(ii) the symbol ⋃F denotes the class that contains all the objects that are found among the various A ∈ F. That is, imagine that the members of each A ∈ F are “emptied” into a single —originally empty— container {. . .}. The class we get this way is what we denote by ⋃F.
In symbols the definition reads (and arguably is clearer):

⋃F =Def {x : for some A, A ∈ F ∧ x ∈ A} (2)

2.4.15 Example Let F = {{1}, {1, {2}}}. Then emptying all the contents of the members of F into some (originally) empty container we get

{1, 1, {2}} (3)

This is ⋃F.
Would we get the same answer from the mathematical definition (2)? Of course:
1 is in some member of F, indeed in both of the members {1} and {1, {2}}, and in order to emphasise this I wrote two copies of 1 in (3) —it is emptied/contributed twice. Then {2} is the member that only {1, {2}} of F contributes.
What is ⋂F? Well, only 1 is common between the two sets —{1} and {1, {2}}— that are in F. So, ⋂F = {1}. 
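For a finite family the two clauses of Definition 2.4.14 can be run as code. A Python sketch (the family is a set of frozensets, since Python set members must be immutable; the names are ours):

```python
# A sample family of sets, in the spirit of Example 2.4.15.
F = {frozenset({1}), frozenset({1, 2})}

# Union of F: the x belonging to SOME member A of F (Definition 2.4.14 (ii)).
family_union = {x for A in F for x in A}
# Intersection of F: the x belonging to ALL members A of F (Definition 2.4.14 (i)).
family_intersection = {x for x in family_union if all(x in A for A in F)}

print(family_union, family_intersection)
assert family_union == {1, 2}
assert family_intersection == {1}
```

Note that for F = ∅ this computational sketch returns ∅ for the intersection, while set-theoretically ⋂∅ is the proper class U (Remark 2.4.19 below): a program can only range over objects it is handed, never over “everything”.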

2.4.16 Exercise

1. Prove that ⋃{A, B} = A ∪ B.
2. Prove that ⋂{A, B} = A ∩ B.

Hint. In each of part 1. and 2. show that lhs ⊆ rhs and rhs ⊆ lhs (cf. Remark 2.3.4, practical
considerations, 3. and 4.). For that analyse membership, i.e., “assume x ∈ lhs and prove
x ∈ rhs”, and conversely (cf. Definition 2.1.1 and Remark 2.1.3.) 


2.4.17 Theorem If the set F is a family of sets, then ⋃F is a set.

Proof Let F be built at stage Σ. Now,

x ∈ ⋃F ≡ x ∈ A, for some A ∈ F

Thus x is available or built before A, which is built before stage Σ since that is when F was built. x being arbitrary, all members of ⋃F are available/built before Σ, so we can build ⋃F as a set at stage Σ. 


2.4.18 Theorem If the class F ≠ ∅ is a family of sets, then ⋂F is a set.

Proof By assumption there is some set in F. Fix one such and call it D.
First note that

x ∈ ⋂F → x ∈ D (∗)

Why? Because (i) of Definition 2.4.14 says that

x ∈ ⋂F ≡ for all A ∈ F we have x ∈ A

Well, D is one of those “A” sets in F, so if x ∈ ⋂F then x ∈ D. We established (∗) and thus we established

⋂F ⊆ D

by Definition 2.1.1. We are done by Theorem 2.3.6. 

 2.4.19 Remark What if F = ∅? Does it affect Theorem 2.4.18? Yes, drastically!

In Definition 2.4.14 we read

⋂F =Def {x : for all A, A ∈ F → x ∈ A} (∗∗)

However, as the hypothesis (i.e., lhs) of the implication in (∗∗) is false, the implication itself is true. Thus the entrance condition “for all A, A ∈ F → x ∈ A” is true for all x and thus allows all objects x to get into ⋂F.
Therefore ⋂F = U, the universe of all objects which we saw (cf. Sect. 2.2) is a proper class.  


2.4.20 Exercise What is ⋃F if F = ∅? Set or proper class? Can you “compute” which class it is precisely? 

 2.4.21 Remark (More notation)

Suppose the family of sets Q is a set of sets Ai, for i = 1, 2, . . . , n where n ≥ 3.

Q = {A1, A2, . . . , An}

Then we have a few alternative notations for ⋂Q:

(a) A1 ∩ A2 ∩ . . . ∩ An

or, more elegantly,

(b) ⋂_{i=1}^{n} Ai

or also

(c) ∩_{i=1}^{n} Ai

Similarly for ⋃Q:

(i) A1 ∪ A2 ∪ . . . ∪ An

or, more elegantly,

(ii) ⋃_{i=1}^{n} Ai

or also

(iii) ∪_{i=1}^{n} Ai

If the family has so many elements that all the natural numbers are needed to index the sets in the set family Q we will write

⋂_{i=0}^{∞} Ai or ⋂_{i≥0} Ai

for ⋂Q and

⋃_{i=0}^{∞} Ai or ⋃_{i≥0} Ai

for ⋃Q  

2.4.22 Example Thus, for example, A ∪ B ∪ C ∪ D can be seen —just changing the notation— as A1 ∪ A2 ∪ A3 ∪ A4, therefore it means ⋃{A1, A2, A3, A4}, or ⋃{A, B, C, D}.
Same comment for ∩. 

Pause. How come for the case n = 2 we proved14 ⋃{A, B} = A ∪ B (2.4.16) but here we say (n ≥ 3) that something like the content of the previous remark and example are just notation (definitions)?
Well, we had independent definitions (and associated theorems re set status for each, Theorems 2.4.5 and 2.4.17) for A ∪ B and ⋃{A, B}, so it makes sense to compare the two definitions after the fact and see if we can prove that they say the same thing. For n ≥ 3 we opted to not give a definition for A1 ∪ . . . ∪ An that is independent of ⋃{A1, . . . , An}; rather we gave the definition of the former in terms of the latter. No independent definitions, no theorem to compare the two!

2.5 The Powerset

2.5.1 Definition For any set A the symbol P(A) —pronounced the powerset of A— is defined to be the class

P(A) =Def {x : x ⊆ A}

Thus we collect all the subsets x of A to form P(A).

The literature most frequently uses the symbol 2^A in place of P(A). 

14 Well, you proved! Same thing :-)

 (1) The term “powerset” is slightly premature, but it is apt. Under the conditions of the definition —A a set— 2^A is also a set as we prove immediately below.
(2) We said “all the subsets x of A” in the definition. This is correct. As we know from Theorem 2.3.6, if X ⊆ Y and Y is a set, then so is X. 

2.5.2 Theorem For any set A, its powerset P(A) is a set.

Proof Let A be built at stage Σ. Then each of its members y is given or built before Σ. Thus, since every subset x of A is a set of y-values, every such subset x can be built at stage Σ.
But then, just take any Σ′ > Σ. Since all x-values (such that x ⊆ A) are built before Σ′, at stage Σ′ we can collect them all and build the set 2^A. 

2.5.3 Example Let A = {1, 2, 3}. Then

P(A) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {3, 2}, {1, 2, 3}}

Thus the powerset of A has 8 elements.

We will later see that if A has n elements, for any n ≥ 0, then 2^A has 2^n elements. This observation is at the root of the notation “2^A”. 
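For a finite set the powerset, and the count 2^n, can be sketched with the standard library (Python; the helper name `powerset` is ours):

```python
from itertools import combinations

def powerset(A):
    """All subsets of A, each as a frozenset -- a finite stand-in for P(A)."""
    elems = list(A)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

P = powerset({1, 2, 3})
assert frozenset() in P            # the empty set is a subset of every set
assert frozenset({1, 2, 3}) in P   # A itself is a subset (cf. Remark 2.5.4)
assert len(P) == 2 ** 3            # 8 elements, as in Example 2.5.3
print(len(P))  # 8
```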

2.5.4 Remark For any set A it is trivial (verify!) that we have ∅ ⊆ A and A ⊆ A. Thus, for any A, {∅, A} ⊆ 2^A. 

2.6 The Ordered Pair and Finite Sequences

To introduce the concept of cartesian product —so that, for example, plane analytic geometry can be developed within set theory— we need an object “(A, B)” that is like the set pair (2.3.1) in that it contains two objects, A and B (A = B is a possibility), but in (A, B) order and length (in (A, B) it is two) matter!

We want (A, B) = (A′, B′) to imply A = A′ and B = B′. Moreover, note that (A, A) is not {A}! It is still an ordered pair but it so happens that the first and second components of the ordered pair are equal in this example.

 So, are we going to accept a new type of object in set theory? Not at all! We will build
(A, B) so that it is a set! 

2.6.1 Definition (Ordered pair) By definition, (A, B) is the abbreviation (short name) given below:

(A, B) =Def {A, {A, B}} (1)

We call “(A, B)” an ordered pair, and A its first component, while B is its second component. 

 2.6.2 Remark

1. Note that A ≠ {A, B} and A ≠ {A, A}, because in either case we would otherwise get A ∈ A, which is false for sets or atoms A. Thus (A, B) does contain exactly two members, or has length two: A and {A, B}.
Pause. We have not said in Definition 2.6.1 that A and B are sets or atoms. So what right do we have in the paragraph above to so declare?
2. What about the desired property that

(A, B) = (X , Y ) → A = X ∧ B = Y (2)

Well, assume the lhs of “→” in (2) and prove the rhs, “A = X ∧ B = Y ”. From our
truth table we know that we do the latter by proving each of A = X and B = Y true
(separately).

The lhs that we assume translates to


   
{A, {A, B}} = {X, {X, Y}} (3)

By the remark 1. above there are two distinct members in each of the two sets that we
equate in (3).
So since (3) is true (by assumption) we have (by definition of set equality) one of:

a. A = {X , Y } and {A, B} = X , that is, 1st listed element in lhs of “=” equals the 2nd
listed in rhs; and 2nd listed element in lhs of “=” equals the 1st listed in rhs.
b. A = X and {A, B} = {X , Y }.

Now case (a) above cannot hold, for it leads to A = {{A, B}, Y }. This in turn leads to

{A, B} ∈ A

and thus the set {A, B} is built before one of its members A, which contradicts Principle 0.

Let us then work with case (b).


We have
{A, B} = {A, Y } (4)
Well, all the members on the lhs must also be on the rhs. I note that A is. We have two cases.

• What if B is also equal to A? Then we have {B} = {A, Y } and thus Y ∈ {B} (why?).
Hence Y = B. We showed so far A = X (listed in case (b)) and B = Y (proved here);
great!
• Here B is not equal to A. But B must be in the rhs of (4), so the only way for that is
B = Y . All Done!  

Worth noting as a theorem what we have just proved above:

2.6.3 Theorem If (A, B) = (X , Y ), then A = X and B = Y .

But is (A, B) a set? (atom it is not, of course!) Yes!

2.6.4 Theorem (A, B) is a set.


 
Proof (A, B) = {A, {A, B}}. By Example 2.3.1, {A, B} is a set. Applying Example 2.3.1 once more, {A, {A, B}} is a set. 

2.6.5 Example So, (1, 2) = {1, {1, 2}}, (1, 1) = {1, {1}}, and ({a}, {b}) = {{a}, {{a}, {b}}}. 
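Definition 2.6.1 can be simulated with frozensets, which lets one check concretely that order now matters (a sketch; the helper `pair` is our name, and the components are kept to hashable atoms):

```python
def pair(a, b):
    """The ordered pair (a, b) encoded as the set {a, {a, b}} of Definition 2.6.1."""
    return frozenset({a, frozenset({a, b})})

# (1, 2) and (2, 1) are different sets, unlike the unordered pair {1, 2}:
assert pair(1, 2) != pair(2, 1)
assert frozenset({1, 2}) == frozenset({2, 1})

# (1, 1) is {1, {1}} -- an ordered pair with equal components, not just {1}:
assert pair(1, 1) == frozenset({1, frozenset({1})})
print(pair(1, 2))
```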

 2.6.6 Remark We can extend the ordered pair to ordered triple, ordered quadruple, and
beyond!
We take this approach in these notes:
(A, B, C) =Def ((A, B), C) (1)
(A, B, C, D) =Def ((A, B, C), D) (2)
(A, B, C, D, E) =Def ((A, B, C, D), E) (3)
etc.
So suppose we defined what an n-tuple is, for some fixed unspecified n, and denote it by
(A1 , A2 , . . . , An ) for convenience. Then we define
2.6 The Ordered Pair and Finite Sequences 33

(A1, A2, . . . , An, An+1) =Def ((A1, A2, . . . , An), An+1) (∗)

This is an “inductive” or “recursive” definition, defining a concept (n + 1-tuple) in terms of


a smaller instance of itself, namely, in terms of the concept for an n-tuple, and in terms of
the case n = 2 that we dealt with by direct definition (not in terms of the concept itself) in
Definition 2.6.1.
Suffice it to say this “case of n + 1 in terms of case of n” provides just shorthand notation
to take the mystery out of the “etc.” above. We condense/codify infinitely many definitions
(1), (2), (3), … into just two:

• Definition 2.6.1
and
• (∗)

The reader has probably seen such recursive definitions before (likely in calculus and/or
high school).
The most frequent example that occurs is to define, for any natural number n and any
real number a > 0, what a^n means. One goes like this:

a^0 =Def 1
a^(n+1) =Def a · a^n (1)

The pair of definitions above condenses infinitely many definitions such as

a^0 = 1
a^1 = a · a^0 = a
a^2 = a · a^1 = a · a
a^3 = a · a^2 = a · a · a
a^4 = a · a^3 = a · a · a · a
…

into just two!
We will study inductive definitions and induction soon!

2.6.7 Exercise What would happen if we defined (in (1)) a^0 =Def 42?

Caution. Should we not? Why not? Because then a^1 = a × a^0 = a × 42. Hardly the intended and expected value for a^1!
A correct answer to these two questions does not prove that a^0 = 1! This expression is not provable by logic using some axioms I forgot to mention. It is just a judicious renaming of “1” as “a^0”. 
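The recursive definition (1) transcribes directly into code, and making the base case a parameter shows exactly what the exercise warns about: changing a^0 changes every subsequent value (a Python sketch; `power` is our name):

```python
def power(a, n, base=1):
    """a**n via the recursion of (1): a^0 = base, a^(n+1) = a * a^n."""
    return base if n == 0 else a * power(a, n - 1, base)

assert power(2, 5) == 32                  # with the intended base case a^0 = 1
assert power(2, 3, base=42) == 8 * 42     # a^0 = 42 "infects" every a^n
print(power(2, 5), power(2, 3, base=42))  # 32 336
```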

Before we exit this remark note that (A, B, C) = (A′, B′, C′) implies A = A′, B = B′, C = C′ because it implies

C = C′ and (A, B) = (A′, B′)

That is, (A, B, C) is an ordered triple (3-tuple).

We can also prove that (A1, A2, . . . , An, An+1) is an ordered n + 1-tuple, i.e.,

(A1, A2, . . . , An+1) = (A′1, A′2, . . . , A′n+1) → A1 = A′1 ∧ . . . ∧ An+1 = A′n+1 (2)

if we have followed (proved) the “etc.” all the way to the case of (A1, A2, . . . , Ak) —for k = 1, 2, 3, . . . , n. Then, by (∗), the case for k = n + 1 —(2) above— is a straightforward application of Theorem 2.6.3 where X = (A1, . . . , An) and Y = An+1.
We will do the “etc.”-argument elegantly once we learn induction!  

2.6.8 Definition (Finite sequences) An n-tuple for n ≥ 1 is called a finite sequence of length n, where we extend the concept to a one-element sequence —by definition— to be

(A) =Def A

 Note that now we can redefine all sequences of lengths n ≥ 1 using again (∗) above, but this time with the starting condition being that of Definition 2.6.8. Indeed, for n = 2 we rediscover (A1, A2):

the “new” 2-tuple pair: (A1, A2) = ((A1), A2) (by (∗)) = (A1, A2) (by 2.6.8)

The big brackets are applications of the ordered pair defined in Definition 2.6.1, just as it was in the general definition (∗). 

2.7 The Cartesian Product

We are ready to define classes of ordered pairs.

2.7.1 Definition (Cartesian product of classes) Let A and B be classes. Then we define
A × B =Def {(x, y) : x ∈ A ∧ y ∈ B}

called the Cartesian product of A and B in that order. The definition requires both sides of
× to be classes. It makes no sense if one or both are atoms. 

2.7.2 Theorem If A and B are sets, then so is A × B.

Proof By Definitions 2.7.1 and 2.6.1

A × B = { {x, {x, y}} : x ∈ A ∧ y ∈ B } (1)

So, for each {x, {x, y}} ∈ A × B we have x ∈ A and {x, y} ⊆ A ∪ B, or x ∈ A and {x, y} ∈ 2^(A∪B). Thus {x, {x, y}} ⊆ A ∪ 2^(A∪B) and hence (changing notation) (x, y) ∈ 2^(A ∪ 2^(A∪B)).
We have established that

A × B ⊆ 2^(A ∪ 2^(A∪B))

thus A × B is a set by Theorems 2.3.6, 2.4.5 and 2.5.2. 
   
2.7.3 Definition Mindful of the Remark 2.6.6 where (A, B), C , (A, B, C), D , etc.
were defined, we define here A1 × . . . × An for any n ≥ 3 as follows:

De f
A× B ×C = (A × B) × C
De f
A× B ×C × D = (A × B × C) × D
..
.
De f
A1 × A2 × . . . × An × An+1 = (A1 × A2 × . . . × An ) × An+1
..
.

ą
n
We may write Ai for A1 × A2 × . . . × An
i=1

If A1 = . . . = An = B we may write B n for A1 × A2 × . . . × An . 

2.7.4 Remark Thus, what we learnt in Definition 2.7.3 is, in other words,

⨉_{i=1}^{n} Ai =Def {(x1, . . . , xn) : xi ∈ Ai, for i = 1, 2, . . . , n}

and

B^n =Def {(x1, . . . , xn) : xi ∈ B}

2.7.5 Theorem If Ai, for i = 1, 2, . . . , n, is a set, then so is ⨉_{i=1}^{n} Ai.

Proof A × B is a set by Theorem 2.7.2. By Definition 2.7.3, and in this order, we verify
that so is A × B × C and A × B × C × D and . . . and A1 × A2 × . . . × An and . . . 

 If we had inductive definitions available already, then Definition 2.7.3 would simply read

A1 × A2 =Def {(x1, x2) : x1 ∈ A1 ∧ x2 ∈ A2}

and, for n ≥ 2,

A1 × A2 × . . . × An × An+1 =Def (A1 × A2 × . . . × An) × An+1

Correspondingly, the proof of 2.7.5 would be far more elegant, via induction. 
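The inductive definition can also be transcribed directly into code: build (A1 × . . . × An) × An+1 step by step, flattening the left-nested pairs into n-tuples, and compare with the standard library’s product (a Python sketch; the helper `cartesian` is our name):

```python
from itertools import product

def cartesian(*classes):
    """A1 x ... x An by the recursion of Definition 2.7.3, with flat tuples."""
    result = [(x,) for x in classes[0]]  # base: 1-tuples, as in Definition 2.6.8
    for B in classes[1:]:
        # The step (A1 x ... x Ak) x A(k+1): extend each tuple by one component.
        result = [t + (y,) for t in result for y in B]
    return result

A, B, C = {1, 2}, {"a"}, {3, 4}
assert set(cartesian(A, B, C)) == set(product(A, B, C))
assert len(cartesian(A, B, C)) == 2 * 1 * 2
print(len(cartesian(A, B, C)))  # 4
```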

2.7.1 Strings or Expressions Over an Alphabet

2.7.6 Definition (Strings over an Alphabet) A string x (or expression or word or vector)
over an alphabet A is just an n-tuple, all of whose components come from the same set, A.
That is, we say that “x is a string of length n over the alphabet A” meaning x ∈ An , for
some n > 0 —or we simply say “x is over A”. 

Traditionally, strings are written down in "string notation" without separating commas
or spaces, omitting the enclosing brackets. So if A = {a, b} we will write aababa rather than
(a, a, b, a, b, a).

Thus all strings over A that we spoke of already are members of ⋃_{i>0} A^i.
What is the advantage of the notation aababa over that of (a, a, b, a, b, a)? Well, it is more
natural! We write words (strings) like this "words" instead of like this "(w,o,r,d,s)", and a formula
(∃x)x = y is written like this "(∃x)x = y" rather than like this "((, ∃, x, ), x, =, y)". How-
ever we must be careful with the bracket-less and comma-less notation: Let us start with the
alphabet A = {1, 11}. Which string is denoted by "111"? Unfortunately, we can answer this
in many different ways, so the notation is ill-defined. It is ambiguous, as we say. In n-tuple
notation we depict the possible meanings of "111" below:

    (1, 1, 1) (length 3) or (11, 1) (length 2) or (1, 11) (length 2).
We avoid this ambiguity in notation if we choose our alphabet members so that each
is a symbol of length one. Thus, in applications such as assembly programming, where one
uses integers base-16, rather than employing the digits

    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

which are mathematically necessary and sufficient for the job, one avoids ambiguities by
renaming the last 6 digits:

    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f

Thus the decimal-notation number 11 is denoted by "b" base-16, since the two-symbol
base-16 string "11" translates (in decimal notation) to 1 × 16 + 1 = 17.

We can exercise the remedy of length-1-symbols easily in practice —just as in the example
above— because we normally deal with finite alphabets. 
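The ambiguity of "111" over {1, 11} can also be checked by brute force. The helper below is our own hypothetical sketch (the text defines no such algorithm); it enumerates every way to read a string as a tuple over a given alphabet:

```python
def parses(s, alphabet):
    """Return every way to read string s as a tuple of alphabet symbols."""
    if s == "":
        return [()]                      # one way to read the empty string
    result = []
    for symbol in alphabet:
        if s.startswith(symbol):
            # peel off this symbol, then parse the remainder recursively
            for rest in parses(s[len(symbol):], alphabet):
                result.append((symbol,) + rest)
    return result

# Over {1, 11} the notation "111" is ambiguous: three distinct tuples.
assert sorted(parses("111", ["1", "11"])) == [
    ("1", "1", "1"), ("1", "11"), ("11", "1"),
]

# With length-1 symbols only, every string has exactly one reading.
assert parses("111", ["1"]) == [("1", "1", "1")]
```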
Concatenation of the strings (a1 , . . . , am ) and (b1 , . . . , bn ) in that order, denoted as

(a1 , . . . , am ) ∗ (b1 , . . . , bn )

is the string of length m + n


(a1 , . . . , am , b1 , . . . , bn )
Clearly, concatenation as defined above is associative, that is, for any strings x, y and z we
have (x ∗ y) ∗ z = x ∗ (y ∗ z).
 It is convenient to include an empty vector or empty string —also known as the null string—
“( )” as a vector with no components and define that it is the only member of A0 . It has zero
length.
The symbols prevalent in the literature used to denote the empty string are ε or λ. We
will choose λ since ε might be confused with "∈".
Note that the empty string is an ordered empty set, so cannot be identified, nor confused,
with the empty unordered set, ∅. 
At the intuitive level, and given how concatenation was defined, we see that x ∗ λ =
λ ∗ x = x for any string x.
The set of all strings of nonzero length over A is denoted by A+. If we want to include
λ we must include A^0 = {λ}. That is, all strings over A, including λ, form the set

    ⋃_{i=0}^∞ A^i     (1)

We use the symbol A∗ for the set in (1) and A+ for the set ⋃_{i=1}^∞ A^i.
Thus A∗ relates to A+ by the relation

    A∗ = A+ ∪ {λ}

A∗ is often called the Kleene closure or Kleene star of A.


A string A is a prefix of a string B if there is a string C such that B = A ∗ C. It is a suffix
of B if for some D, we have B = D ∗ A. The prefix (suffix) is proper if it is not equal to B.
 Just as we use implied multiplication, ab for a × b or a · b, we also use implied concatena-
tion, x y for x ∗ y —leaving it up to the context to fend off ambiguities. 
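Over a length-1-symbol alphabet, Python's built-in strings behave exactly like string notation: "" plays the role of λ, and + is the concatenation ∗. A small sketch (the helper names are ours) of concatenation, the null string, and the prefix/suffix definitions above:

```python
# "" plays the role of the null string lambda; + is concatenation *.
lam = ""
x, y, z = "aab", "ab", "a"

# Concatenation is associative, and lambda is its identity.
assert (x + y) + z == x + (y + z)
assert x + lam == lam + x == x

def is_prefix(a, b):
    # a is a prefix of b iff b = a * c for some string c
    return b.startswith(a)

def is_suffix(a, b):
    # a is a suffix of b iff b = d * a for some string d
    return b.endswith(a)

assert is_prefix("aa", x) and is_suffix("ab", x)
assert not is_prefix("ab", x)   # "ab" is a suffix of "aab" but not a prefix
```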

2.7.7 Definition (Languages) A language, L, over an alphabet A is just a subset
of A∗. □

The “interesting” languages are those that are finitely definable. Automata and language
theory studies the properties of such finitely definable languages and of the “machinery”
that effects these finite definitions. The language of Logic is also finitely definable.

2.7.8 Definition (Concatenation of Languages) If L and M are two languages over an
alphabet A, then the symbol L ∗ M, or simply L M (implied concatenation), means the set
{x y : x ∈ L ∧ y ∈ M}. □

 One can learn to live with ∗ as both a unary (one-argument) operation, A∗ , and as a binary
one, L ∗ M, much the same way we can see no ambiguity in uses of minus as −x and y − z. 
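Definition 2.7.8, together with a finite slice of the Kleene star, transcribes directly to code. A sketch (the function names are our own):

```python
def concat_languages(L, M):
    """L * M = {x y : x in L and y in M}  (Definition 2.7.8)."""
    return {x + y for x in L for y in M}

def kleene_star(A, max_len):
    """All strings over alphabet A of length <= max_len: a finite slice of A*."""
    result = {""}                 # A^0 = {lambda}
    layer = {""}
    for _ in range(max_len):
        layer = {w + a for w in layer for a in A}   # next power A^(i+1)
        result |= layer
    return result

L = {"a", "ab"}
M = {"b", ""}
assert concat_languages(L, M) == {"ab", "a", "abb"}

# A* restricted to length <= 2 over {a, b}:
assert kleene_star({"a", "b"}, 2) == {"", "a", "b", "aa", "ab", "ba", "bb"}
```

Only a length-bounded slice is computable here, of course: A∗ itself is infinite for any nonempty A.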

2.8 Exercises

1. An argument towards showing that U, the class of all sets and atoms, is a set might go
like this:
Let Σ be a stage after all atoms and all member sets of U were built. At stage Σ we
can build U as a set.
Do you accept the preceding argument? Why?
2. Let a be a set, and consider the class b = {x ∈ a : x ∉ x}.
Show that, despite similarities with the Russell class R, b is a set.
Moreover, show that b ∉ a.
3. Show that R (the Russell class) = U.
4. Show that if a class A satisfies A ⊆ X for all X, then A = ∅.
5. Without using set-formation-by-stages Principles, show that ∅ ≠ {∅}.
6. Without using set-formation-by-stages Principles, show that ∅ ∉ ∅.
7. Without using set-formation-by-stages Principles, show that 1 ∉ 1.
8. Let us prove that {A} —where A is a set— is a set. Argument. Well, {A} ⊆ {A, B}
and {A, B} has been proved to be a set. We conclude by the subclass theorem. 
What exactly is wrong with this argument?
Hint. What exactly are we given?
9. Now prove that {A} is a set correctly! Do not argue via Principles 0, 1, 2.
10. Prove that the class {{x} : x = x} which includes only one-element sets {x} —but
includes all of them— is a proper class. Incidentally, the literature calls one-element
sets singletons.
11. Prove that the class {{x} : x is an atom} is a set.
12. How about the class {x : x is an atom}? Set or proper class?

13. ZF (Zermelo-Fraenkel) axiomatic set theory contains the following axiom, here
expressed in terms of sets:
 
    ∅ ≠ S → (∃x)(x ∈ S ∧ ¬(∃z)(z ∈ S ∧ z ∈ x))

Prove that this axiom is true if we accept the Principles of set-formation-by-stages.


Hint. Pick an x ∈ S that is not built later than any z ∈ S.
14. Suppose that A and B have intuitively n and m members respectively. That is, A =
{a1 , a2 , . . . , an } and B = {b1 , b2 , . . . , bm } where all the ai and b j are pairwise distinct.
Prove that A × B has nm members.

15. What is ⋃∅ (and why)?
16. What is ⋂∅ (and why)?
17. Show that
(1) A ∪ B = B ∪ A and
(2) A ∪ (B ∪ C) = A ∪ B ∪ C (ensure that you translate the left hand side correctly).
18. Show that
(1) A ∩ B = B ∩ A and
(2) A ∩ (B ∩ C) = A ∩ B ∩ C (ensure that you translate the left hand side correctly).
19. Show that A ∪ (A ∩ B) = A.
20. Show that A ∩ (A ∪ B) = A.
21. For any set A, show that U − A is a proper class.
22. Show for any classes A, B, that A − B = A − A ∩ B.
23. For any classes A, B show that A ∪ B = A iff B ⊆ A.
24. For any classes A, B show that A ∩ B = A iff A ⊆ B.
25. For any classes A, B show that A − (A − B) = B iff B ⊆ A.
26. (1) Express A ∩ B using class difference as the only operation.
(2) Express A ∪ B using class difference as the only operation.
27. (Generalized de Morgan's laws). Prove for any class A and indexed family (Bi)_{i∈F},
that

    (1) A − ⋃_{i∈F} Bi = ⋂_{i∈F} (A − Bi)

    (2) A − ⋂_{i∈F} Bi = ⋃_{i∈F} (A − Bi)

28. (Distributive laws for ∪, ∩). For any classes A, B, D show

(1) A ∩ (B ∪ D) = (A ∩ B) ∪ (A ∩ D)

(2) A ∪ (B ∩ D) = (A ∪ B) ∩ (A ∪ D)

29. (Generalized distributive laws for ∪, ∩). Prove for any class A and indexed family
(Bi)_{i∈F}, that

    (1) A ∩ ⋃_{i∈F} Bi = ⋃_{i∈F} (A ∩ Bi)

    (2) A ∪ ⋂_{i∈F} Bi = ⋂_{i∈F} (A ∪ Bi)

30. Show that the Principles of set formation by stages disallow the truth of a ∈ a.
31. Show that the axiom of foundation (Exercise 2.8.13) disallows the truth of a ∈ a.
32. Show that the Principles of set formation by stages disallow the truth of a ∈ b ∈ c ∈
· · · ∈ a.
33. Show that the axiom of foundation (Exercise 2.8.13) disallows the truth of a ∈ b ∈ c ∈
· · · ∈ a.
34. Show that V = {x : x is a set} is a proper class.
35. Show that for any class (not just set) A, A ∈ A is false.
36. Somebody once said (cf. Wilder 1963) “Consider the class A of all abstract ideas. But
that is an abstract idea, so it is a member of itself.”
Discuss.
37. (1) Show that A =“the class of all sets that contain at least one element” can be defined
by a defining property.
(2) Show that A is a proper class.
38. Expand (i.e., show the set in by-listing notation) 2^{1,2,3}.
39. This exercise will reappear after we have covered Induction over N.
Attach the intuitive meaning to the statement that the set A has n distinct elements.
Show that if A has n elements then P(A) has 2^n elements.
Hint. Imagine that you arranged the members of A in a straight line in any fixed order
you please. So they occupy position 1, position 2, position 3, . . ., position n in an array.
Now any subset of A can be marked off by a checkmark against each of its members
in the above mentioned array. No checkmark against an A-member means it is not in
the subset under consideration.
Well, we can have as many subsets as we can have ways to mark some entries of the
array and leave the rest unmarked! How many such marking schemes do we have?
40. Show (without the use of the Principles of set formation) that {{a}, {a, b}} = {{a′},
{a′, b′}} implies a = a′ and b = b′.
41. For any sets x, y show that x ∪ {x} = y ∪ {y} → x = y.
Hint: Use principles of set formation, or even foundation (2.8.13).

42. For any A, B show that ∅ = A × B iff A = ∅ or B = ∅.


43. (Distributive law for ×) Show for any A, B and D that

D × (A ∪ B) = (D × A) ∪ (D × B)
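The marking-scheme hint of Exercise 39 above is, in effect, binary counting: each subset of an n-element array of positions corresponds to exactly one n-bit mask, which gives the 2^n count. A Python sketch of that correspondence (our own illustration, not a solution key):

```python
def power_set(A):
    """Enumerate P(A) via the marking scheme of Exercise 39: each subset
    corresponds to one 0/1 marking (bit mask) of the n array positions."""
    elems = list(A)
    n = len(elems)
    subsets = []
    for mask in range(2 ** n):           # 2^n marking schemes in all
        subsets.append({elems[i] for i in range(n) if mask & (1 << i)})
    return subsets

P = power_set({1, 2, 3})
assert len(P) == 2 ** 3                  # P(A) has 2^n members
assert set() in P and {1, 2, 3} in P     # both extremes are marked off
```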
3 Relations and Functions

Overview

The topic of relations and functions is central in all mathematics and computing. In the
former, whether it is calculus, algebra or theory of computation, one deals with relations
(notably equivalence relations, order) and all sorts of functions while in the latter one com-
putes relations and functions (among other related endeavours1 ), in that, one writes programs
that, given an input to a relation, compute the response (true or false) or, given an input to
a function, compute a response which is some object (number, graph, tree, matrix, other)
or nothing, in case there is no response for said input (for example, there is no response to
input "x, y" if what we are computing is x/y but y = 0).
We are taking mostly an “extensional” point of view of relations and functions in this
course, as is customary in set theory, that is, we view them as sets of (input, output) ordered
pairs. It is also possible to take an intensional point of view, especially in theory of compu-
tation and some specific areas of mathematics, viewing relations and functions as methods
to compute outputs from given inputs.
The topics in this chapter include an introduction to equivalence and order relations,
finite and infinite sets, uncountable sets and diagonalisation, and contain the proof of the
nontrivial Cantor-Bernstein theorem.

1 Cf. Tourlakis (2022).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_3

3.1 Relations

3.1.1 Definition (Binary relation) A binary relation is a class R2 of ordered pairs.


The statements (x, y) ∈ R, xRy and R(x, y) are equivalent. xRy is the infix notation
—imitating notation such as A ⊂ B, x < y, x = y— and has notational advantages. □

 3.1.2 Remark R contains just pairs (x, y), that is, just sets {x, {x, y}}, in other words, it is
a family of sets.  

3.1.3 Example Examples of relations:

(i) ∅
(ii) {(1, 1)}
(iii) {(1, 1), (1, 2)}
(iv) N², that is {(x, y) : x ∈ N ∧ y ∈ N}. This is a set by the fact that N is (Why?) and thus
so is N × N by 2.7.2.
(v) < on N, that is {(x, y) : x < y ∧ x ∈ N ∧ y ∈ N}. This is a set since < ⊆ N².
(vi) ∈, that is,

    {(x, y) : x ∈ y ∧ x ∈ U ∧ y ∈ V}     (∗)

This is a proper class (nonSet). Why? Well, if ∈ is a set, then it is built at some stage Σ.
Now examine the arbitrary (x, y) in ∈. This is {x, {x, y}}, so it is built before Σ, but
then so is its member x (available before Σ). Thus we can collect all such x into a set
at stage Σ. But this "set" contains all x ∈ U due to the middle conjunct in the entrance
condition in (∗).3 That is, this "set" is U. This is absurd! □


Here is another way to argue that the relation ∈ is not a set: If it is, so is ⋃∈. Any (x, y) ∈ ∈
is of the form {x, {x, y}}. Thus all x for which there is a y such that x ∈ y are in ⋃∈. As
we said in the footnote, taking y = {x} makes clear that "x ∈ y" does not restrict the x's we
can get. We get them all: thus they form the proper class U. I argued U ⊆ ⋃∈, thus ⋃∈
cannot be a set. So, neither can ∈ (2.4.17). □
So, a binary relation R is a table of pairs:

1. Thus one way to view R is as a device that for inputs x, valued a, a′, ..., u, ..., one gets
the outputs y, valued b, b′, ..., u′, ..., respectively. It is all right that a given input may
yield multiple outputs (e.g., case (iii) in the previous example).

2 I write “R” or “R” for a relation, generically, but P, Q, S and T are available to use as well.
3 Hmm. Doesn’t the first conjunct “x ∈ y” constrain and reduce the number of x-values? No: For
every x out there take y = {x} thus the conjunct x ∈ y is fulfilled for all x-values, as I just showed
how to find a y that works.

2. Another point of view is to see both x and y as inputs and the outputs are true or false (t
or f) according as (x, y) is in the table —that is, xRy is true— or not. For example, (a, b)
is in the table (that is, aRb) hence if the relation receives it as input, then it outputs t.

    input: x    output: y
    a           b
    a′          b′
    ...         ...
    u           u′
    ...         ...

Most of the time we will take the point of view in 1 above. This point of view compels
us to define domain and range of a relation R, that is, the class of all inputs that cause an
output and the class of all caused outputs respectively.

3.1.4 Definition (Domain and range) For any relation R we define domain, in symbols
"dom", by

    dom(R) =Def {x : (∃y)xRy}

where we have introduced the notation "(∃y)" as short for "there exists some y such that",
or "for some y,".
Range, in symbols "ran", is defined also in the obvious way:

    ran(R) =Def {x : (∃y)yRx}

Notation 1. For a relation P, the symbol (a)P means the class of all outputs caused by a:

    (a)P =Def {x : aPx}

If (a)P ≠ ∅, and therefore a ∈ dom(P), we may also write (a)P↓ and say
"(a)P is defined". Otherwise —(a)P = ∅— we write (a)P↑ and say "(a)P is
undefined".
We sometimes want to restrict a relation S to a class A. There are two main
ways to want to do this:
Notation 2. Restrict both inputs and outputs to be in A: This is the way we restrict relations
to obtain a relational restriction. We obtain

    S | A =Def S ∩ A²

Notation 3. For functions (to be introduced shortly) one prefers to restrict only inputs of S
to be in A: We obtain a functional restriction

    S ↾ A =Def S ∩ (A × ran(S))

Notation 4. "Notation 1" above becomes (a)(S | A) and (a)(S ↾ A) in the context of
Notations 2 and 3 (note the brackets to help readability). □
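For finite set relations, Definition 3.1.4 and the notations above can be transcribed directly. A Python sketch (the helper names below are ours, not the text's):

```python
def dom(R):
    """dom(R) = {x : x R y for some y}."""
    return {x for (x, y) in R}

def ran(R):
    """ran(R) = {y : x R y for some x}."""
    return {y for (x, y) in R}

def outputs(a, P):
    """(a)P: the class of all outputs caused by input a."""
    return {y for (x, y) in P if x == a}

def relational_restriction(S, A):
    """S | A = S intersected with A x A (Notation 2)."""
    return {(x, y) for (x, y) in S if x in A and y in A}

def functional_restriction(S, A):
    """Inputs restricted to A: S intersected with A x ran(S) (Notation 3)."""
    return {(x, y) for (x, y) in S if x in A}

S = {(1, 2), (2, 3), (3, 1)}
assert dom(S) == {1, 2, 3} and ran(S) == {1, 2, 3}
assert outputs(1, S) == {2}              # (1)S is defined
assert outputs(4, S) == set()            # (4)S is undefined (empty)
assert relational_restriction(S, {1, 2}) == {(1, 2)}
assert functional_restriction(S, {1, 2}) == {(1, 2), (2, 3)}
```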

We settle the following, before other things:

3.1.5 Theorem For a set relation R, both dom(R) and ran(R) are sets.

Proof For domain we collect all the x such that xRy, for some y, that is, all the x such that

    {x, {x, y}} ∈ R     (1)

for some y. Since R is a family of sets, we have that ⋃R is a set. But then each x in the set
{x, {x, y}} in (1) is in ⋃R. But the set of these x is dom(R) (3.1.4). Thus dom(R) ⊆ ⋃R.
This settles the domain case.

Now ⋃R may contain atoms, as some of the x in the {x, {x, y}} may indeed be such.
Let then A be the set of all atoms in ⋃R and define

    S =Def (⋃R) − A

We know that S is a set (cf. 2.4.7).

So, S is a family of sets that contains all the {x, y}, as ⋃R does, and no member of S is
an atom. Thus, ⋃S contains all the y. That is, ran(R) ⊆ ⋃S, and that settles the range
case. □

3.1.6 Definition In practice we often have an a priori decision about what are in principle
“legal” inputs for a relation R, and where its outputs go. Thus we have two classes, A and B
for the class of legal inputs and possible outputs respectively. Clearly we have R ⊆ A × B.
We call A and B left field and right field respectively, and instead of R ⊆ A × B we often
write
R:A→B
and also

    A —R→ B

pronounced "R is a relation from A to B".

The term field —without left/right qualifiers— for R : A → B refers to A ∪ B.

If A = B then we have
R:A→A
but rather than pronouncing this as “R is a relation from A to A” we prefer 4 to say “R is on
A”. 

3.1.7 Remark Trivially, for any R : A → B, we have dom(R) ⊆ A and ran(R) ⊆ B (give
a quick proof of each of these inclusions).
Also, for any relation P with no a priori specified left/right fields, P is a relation from
dom(P) to ran(P). Naturally, we say that dom(P) ∪ ran(P) is the field of P. □

 3.1.8 Example As an example, consider the divisibility relation on all integers (their set
denoted by Z) that is usually named “|”:

x|y means x divides y with 0 remainder

thus, for x = 0 and all y, the division is illegal, therefore

The input x = 0 to the relation “ |” produces no output, in other words, “for input x = 0 the
relation is undefined.”

We walk away with two things from this example:

1. It does make sense for some relations to a priori choose left and right fields, here

|:Z→Z

You would not have divisibility on real numbers!


2. dom( | ) is the set of all inputs that produce some output. Thus, it is not the case for
all relations that their domain is the same as the left field chosen! Note the case in this
example! And, incidentally, ignore the term “codomain” that may appear —erroneously,
instead of the correct “right field”— in some of the elementary discrete mathematics
literature!  

3.1.9 Example Next consider the relation < with left/right fields restricted to N.
Then dom(<) = N, but ran(<) ⊊ N. Indeed, 0 ∈ N − ran(<). □

Let us extract some terminology from the above examples:

4 Both ways of saying it are correct.



3.1.10 Definition Given

    R : A → B

If dom(R) = A, then we call R total or totally defined. If dom(R) ⊊ A, then we say that R
is nontotal.
If ran(R) = B, then we call R onto. If ran(R) ⊊ B, then we say that R is not onto. □

So, | above is nontotal, and < is not onto.

3.1.11 Example Let A = {1, 2}.

• The relation {(1, 1)} on A is neither total nor onto.


• The relation {(1, 1), (1, 2)} on A is onto but not total.
• The relation {(1, 1), (2, 1)} on A is total but not onto.
• The relation {(1, 1), (2, 2)} on A is total and onto. 
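The four relations of Example 3.1.11 can be checked mechanically. A small sketch (the predicate names are ours):

```python
def is_total(R, A):
    """R on A is total iff dom(R) = A."""
    return {x for (x, y) in R} == set(A)

def is_onto(R, B):
    """R with right field B is onto iff ran(R) = B."""
    return {y for (x, y) in R} == set(B)

A = {1, 2}
# The four relations on A from Example 3.1.11:
assert not is_total({(1, 1)}, A) and not is_onto({(1, 1)}, A)
assert not is_total({(1, 1), (1, 2)}, A) and is_onto({(1, 1), (1, 2)}, A)
assert is_total({(1, 1), (2, 1)}, A) and not is_onto({(1, 1), (2, 1)}, A)
assert is_total({(1, 1), (2, 2)}, A) and is_onto({(1, 1), (2, 2)}, A)
```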

3.1.12 Definition The relation Δ_A on the set A is given by

    Δ_A =Def {(x, x) : x ∈ A}

We call it the diagonal ("Δ" for "diagonal") or identity relation on A.

Consistent with the second terminology, we may also use the symbol 1_A for this relation.


3.1.13 Definition A relation R (not a priori restricted to have predetermined left or right
fields) is

1. Transitive: Iff xRy ∧ yRz implies xRz.
2. Symmetric: Iff xRy implies yRx.
3. Antisymmetric: Iff xRy ∧ yRx implies x = y.
4. Irreflexive: Iff xRy implies x ≠ y.

Now assume R is on a set A. Then we call it reflexive iff Δ_A ⊆ R. □

3.1.14 Example

(i) Transitive examples: ∅, {(1, 1)}, {(1, 2), (2, 3), (1, 3)}, <, ≤, =, N².
(ii) Symmetric examples: ∅, {(1, 1)}, {(1, 2), (2, 1)}, =, N².
(iii) Antisymmetric examples: ∅, {(1, 1)}, =, ≤, ⊆.
(iv) Irreflexive examples: ∅, {(1, 2)}, <, ⊊, the relation ≠ on N.
(v) Reflexive examples: 1_A on A, {(1, 1)} on {1}, {(1, 2), (2, 1), (1, 1), (2, 2)} on {1, 2}, =
on N, ≤ on N. □
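For relations given as finite sets of pairs, each property of Definition 3.1.13 is a finite check. A Python transcription (ours), run against some of the examples above:

```python
def is_transitive(R):
    # x R y and y R z must imply x R z
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_irreflexive(R):
    return all(x != y for (x, y) in R)

def is_reflexive(R, A):
    # Delta_A must be a subset of R
    return all((x, x) in R for x in A)

assert is_transitive({(1, 2), (2, 3), (1, 3)})
assert is_symmetric({(1, 2), (2, 1)})
assert is_antisymmetric({(1, 1)})
assert is_irreflexive({(1, 2)})
assert is_reflexive({(1, 2), (2, 1), (1, 1), (2, 2)}, {1, 2})
```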

3.1.15 Exercise Show that R is symmetric iff x Ry ≡ y Rx. 

We can compose relations:

3.1.16 Definition (Relational Composition) Let R and S be (set) relations. Then, their
composition, in that order, denoted by R ◦ S, is defined for all x and y by:

    x (R ◦ S) y ≡Def (∃z)(x R z ∧ z S y)

It is customary to abuse notation and write "x R z S y" for "x R z ∧ z S y", just as one writes
x < y < z for x < y ∧ y < z.

The definition, unchanged, applies to any class relations R and S as well. □

3.1.17 Example Here is whence the emphasis "in that order" above. Say, R = {(1, 2)}
and S = {(2, 1)}. Thus, R ◦ S = {(1, 1)} while S ◦ R = {(2, 2)}. Thus, R ◦ S ≠ S ◦ R in
general. □
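Definition 3.1.16 is a direct set comprehension over a witness z. A sketch (the function name is ours), replaying Example 3.1.17 and spot-checking associativity (Theorem 3.1.21 below) on one instance:

```python
def compose(R, S):
    """R o S: x (R o S) y iff x R z and z S y for some witness z."""
    return {(x, y) for (x, z1) in R for (z2, y) in S if z1 == z2}

R = {(1, 2)}
S = {(2, 1)}
assert compose(R, S) == {(1, 1)}
assert compose(S, R) == {(2, 2)}
assert compose(R, S) != compose(S, R)     # order matters

# Associativity on a small instance:
T = {(1, 3)}
assert compose(compose(R, S), T) == compose(R, compose(S, T))
```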

3.1.18 Example For any R, we diagrammatically indicate xRy by

    x —R→ y

Thus, the situation where we have that x (R ◦ S) y means, for some z, xRzSy, and is depicted
as:

    x —R→ z —S→ y
3.1.19 Theorem The composition of two (set) relations R and S in that order is also a set.

Proof Trivially, R ◦ S ⊆ dom(R) × ran(S).


Indeed, if x R ◦ Sy then x RzSy, for some z. Hence the x is in dom(R) (by x Rz) and the
y is in ran(S) (by zSy). Moreover, we proved in 3.1.5 that dom(R) and ran(S) are sets. Thus
so is dom(R) × ran(S) (2.7.2). 

3.1.20 Corollary If we have R : A → B and S : B → C, then R ◦ S : A → C.

Proof This is a trivial modification of the argument above. 

The result of the corollary is depicted diagrammatically as

    A —R→ B —S→ C,   that is, A —(R ◦ S)→ C
3.1.21 Theorem (Associativity of composition) For any relations R, S and T, we have

(R ◦ S) ◦ T = R ◦ (S ◦ T)

We state and prove this central result for any class relations.

Proof We have two directions:


→: Fix x and y and let x(R ◦ S) ◦ Ty.
Then, for some z, we have x(R ◦ S)zTy and hence for some w, the above becomes

xRwSzTy (1)

But wSzTy means wS ◦ Ty, hence we rewrite (1) as

xRw(S ◦ T)y

Finally, the above says xR ◦ (S ◦ T)y.


←: Fix x and y and let xR ◦ (S ◦ T)y.
Then, for some z, we have xRz(S ◦ T)y and hence for some u, the above becomes

xRzSuTy (2)

But xRzSu means xR ◦ Su, hence we rewrite (2) as

x(R ◦ S)uTy

Finally, the above says x(R ◦ S) ◦ Ty. 


The following is almost unnecessary, but offered for emphasis:

3.1.22 Corollary If R, S and T are (set) relations, all on some set A,5 then “R ◦ S ◦ T ”
has a meaning that is independent of how brackets are inserted.

5 Recall that “R is on a set A” means R ⊆ A2 , which is the same as R : A → A.



 The corollary allows us to just omit brackets in a chain of compositions, even longer than
the above. It also leads to the definition of relational exponentiation, below: 

3.1.23 Definition (Powers of a binary relation) Let R be a (set) relation. We define R^n,
for n > 0, as

    R ◦ R ◦ ··· ◦ R     (n copies of R)     (1)

Note that the resulting relation in (1) is independent of how brackets are inserted (3.1.22).

If moreover we have defined R to be on a set A, then we also define the 0-th power: R^0
stands for Δ_A or 1_A. □

3.1.24 Exercise Let R be a relation on A. Then for all n ≥ 0, R^n is a set.

Hint. You do not need to do induction. An "and so on" argument will be all right. □

3.1.25 Example Let R = {(1, 2), (2, 3)}. What is R²?

Well, when can we have x R² y? Precisely if/when we can find x, y, z that satisfy xRzRy.
The values x = 1, y = 3 and z = 2 are the only ones that satisfy xRzRy.
Thus 1 R² 3, or (1, 3) ∈ R². We conclude R² = {(1, 3)} due to the "only ones" above. □
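Definition 3.1.23 computes by folding composition. A sketch (function names ours) that replays Example 3.1.25:

```python
def compose(R, S):
    return {(x, y) for (x, z1) in R for (z2, y) in S if z1 == z2}

def power(R, n):
    """R^n = R o R o ... o R (n copies), for n >= 1 (Definition 3.1.23)."""
    result = R
    for _ in range(n - 1):
        result = compose(result, R)
    return result

R = {(1, 2), (2, 3)}
assert power(R, 2) == {(1, 3)}     # as computed in Example 3.1.25
assert power(R, 3) == set()        # no chain x R z R w R y exists
```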

3.1.26 Exercise Show that if for a relation R we know that R² ⊆ R, then R is transitive,
and conversely. □

3.2 Transitive Closure

3.2.1 Definition (Transitive closure of R) A transitive closure of a relation R —if it


exists— is a ⊆-smallest transitive T that contains R as a subset.
More precisely,

1. T is transitive, and R ⊆ T .
2. If S is also transitive and R ⊆ S, then T ⊆ S. This makes the term “⊆-smallest” precise.


Note that we hedged twice in the definition, because at this point we do not know yet:

• If every relation has a transitive closure; hence the “if it exists”.


• We do not know if it is unique, hence the emphasised indefinite article “A”.

3.2.2 Remark Uniqueness can be settled immediately from the definition above: Suppose
T and T′ fulfil Definition 3.2.1, that is,

1. R ⊆ T
and
2. R ⊆ T′

since both are closures. But now think of T as a closure and T′ as the "S" of 3.2.1 (it includes
R all right!)
Hence T ⊆ T′.
Now reverse the role playing and think of T′ as a closure, while T plays the role of "S".
We get T′ ⊆ T. Hence, T = T′. □

3.2.3 Definition The unique transitive closure, if it exists, is denoted by R + . 

3.2.4 Exercise If R is transitive, then R + exists. In fact, R + = R. 

The above exercise is hardly exciting, but learning that R + exists for every R and also
learning how to “compute” R + is exciting. We do this next.

3.2.5 Lemma Given a (set) relation R. Then ⋃_{n=1}^∞ R^n is a transitive (set) relation.

Proof We have two things to do.


1. ⋃_{n=1}^∞ R^n is a set.
2. ⋃_{n=1}^∞ R^n is a transitive relation.

Proof of 1. Note that all positive powers of R, R^{n+1} for n ≥ 0, are sets. Indeed, they all
are subsets of the same set!
Here is why:
Firstly, R ⊆ dom(R) × ran(R) by Definition 3.1.4.
Let now n > 0: We have

    R^{n+1} = R ◦ R ◦ ... ◦ R (n + 1 copies of R) = (R ◦ R ◦ ... ◦ R) ◦ R = R^n ◦ R

similarly, observing that

    R ◦ R ◦ ... ◦ R (n + 1 copies of R) = R ◦ (R ◦ R ◦ ... ◦ R)

we have R^{n+1} = R ◦ R^n. Thus, we established

    R^{n+1} = R ◦ R^n     (1)

and

    R^{n+1} = R^n ◦ R     (2)

Applying 3.1.19 to (1) we get

    R^{n+1} ⊆ dom(R) × ...     (1′)

and applying 3.1.19 to (2) we get

    R^{n+1} ⊆ ... × ran(R)     (2′)

Thus

    R^{n+1} ⊆ dom(R) × ran(R)

for n ≥ 0.
Therefore,

    X ∈ F = {R^i : i = 1, 2, 3, ...} ⇒ X ⊆ dom(R) × ran(R) ⇒ X ∈ 2^(dom(R)×ran(R))     (3)

Thus F —being a subclass of 2^(dom(R)×ran(R))— is a set, and so is (in the notation of 2.4.21)

    ⋃F = ⋃_{i=1}^∞ R^i

Proof of 2. Next, let

    x (⋃_{i=1}^∞ R^i) y (⋃_{i=1}^∞ R^i) z

Thus for some n and m we have

    x R^n y R^m z

this says the same thing as

    x (R ◦ R ◦ ··· ◦ R) y (R ◦ R ◦ ··· ◦ R) z     (n copies, then m copies of R)

or

    x (R ◦ R ◦ ··· ◦ R) z     (n + m copies of R)

that is,

    x (⋃_{i=1}^∞ R^i) z     □

 3.2.6 Remark Why all this work for Part 1 of the proof above? Why not just use 2.4.21
right away? Because 2.4.21 offers only notation once we know that

F = {A0 , A1 , A2 , A3 , . . .} (3)

is a set! Cf. "Suppose the family of sets Q is a set of sets" — the opening statement of the
passage 2.4.21 on notation.

Here we do not know (yet) if every family of sets like (3) is indeed a set —but in this
case it turns out that we do not care because every member of F = {R i : i = 1, 2, 3, . . .} is
included (as a subset) in dom(R) × ran(R) (a set), which allows us to sidestep the issue!
Whether every family of sets like F in (3) is a set will be answered affirmatively in 3.3.6.
For now note that we cannot recklessly say that after any sequence of construction by stages
steps there is a stage after all those stages. Why? Well, take all the objects in set theory.
Each is given outright (atom; stage 0) or is constructed at some stage (set). If we could prove
there is a stage after all these stages then we could also prove that U is a set, a claim we
refuted with two methods so far!  
Since R ⊆ ⋃_{i=1}^∞ R^i, due to R = R^1, all that remains in showing that ⋃_{i=1}^∞ R^i is the
transitive closure of R is to show that

3.2.7 Lemma If R ⊆ S and S is transitive, then ⋃_{i=1}^∞ R^i ⊆ S.

Proof I will just show that for all n ≥ 1, R^n ⊆ S. OK, R ⊆ S is our assumption, thus R^1 ⊆ S
is true.
For R² ⊆ S let x R² y, thus (for some z), xRzRy hence xSzSy. As S is transitive, the
latter gives xSy. Done.
For R³ ⊆ S let x R³ y, thus (for some z), x R² z R y hence xSzSy. As S is transitive, the
latter gives xSy. Done.
You see the pattern: Assume now that we proved, up to some fixed but unspecified n, that
(1) below holds, and we want to prove for n + 1 that R^{n+1} ⊆ S as well, using the same
value for n as in our assumption above.

    So, we have R^n ⊆ S.     (1)

Thus,

    x R^{n+1} y ⇐⇒ x (R^n ◦ R) y ⇐⇒ x R^n z R y (some z) ⇒ (by (1)) xSzSy ⇒ xSy (S transitive)     □

We have proved:

3.2.8 Theorem (The transitive closure exists) For any relation R, its transitive closure
R+ exists and is unique. Indeed we have that R+ = ⋃_{i=1}^∞ R^i.

An interesting corollary that will lend a computational flavour to 3.2.8 is the following.

3.2.9 Corollary If R is on the set S = {a1, a2, ..., an}, where for all i, j, i ≠ j
implies ai ≠ aj, then R+ = ⋃_{i=1}^n R^i.

Proof By 3.2.8, all we have to do is prove

    ⋃_{i=1}^∞ R^i ⊆ ⋃_{i=1}^n R^i     (1)

since the ⊇ part is obvious.

So let x (⋃_{i=1}^∞ R^i) y. This means that

    x R^q y, for some q ≥ 1     (2)

Thus, I have two cases for (2):

Case 1. q ≤ n. Then x (⋃_{i=1}^n R^i) y since R^q ⊆ ⋃_{i=1}^n R^i, R^q being one of the "R^i" with i in
the 1 ≤ i ≤ n range.
Case 2. q > n. In this case I will show that there is also a k ≤ n such that x R^k y, which
sends me back to the "easy Case 1".
Well, if there is one q > n that satisfies (2) there are probably more. Let us choose
our q to be the smallest > n that gives us (2).

Wait! Why is there a smallest q such that

    x R^q y and q > n ?     (3)

Because among those "q" that fit (3)6 imagine we fix attention to one such.
Now, if it is not the smallest such, then go down to the next smaller one that still satisfies
(3), call it q′.
Now go down to the next smaller, q′′ > n, if q′ is not smallest.

6 There is at least one, else we would not be in Case 2.



Continue like this. Can I do this forever? That is, can we have the following being an
infinitely long sequence of distinct numbers?

    n < ... < q^(k)7 < ... < q′′′ < q′′ < q′ < q     (4)

If yes, then I will have an infinite "descending" chain of distinct natural numbers between
q and n. Absurd!8
Back to the proof. So let the q we are working with be the smallest that satisfies (3). Then
we have the configuration

    x R z1 R z2 R z3 ... z_{i−1} R [ z_i R z_{i+1} ... z_r ] R z_{r+1} ... z_{q−1} R y     (5)

The above accounts for q copies of R, as needed for

    R^q = R ◦ ... ◦ R     (q copies of R)

Now the sequence

    z1, z2, z3, ..., z_i, z_{i+1}, ..., z_r, z_{r+1}, ..., z_{q−1}, y

in (5) above contains q > n members. As they all come from S, not all are distinct. So
let z_i = z_r (the z_r could be as late in the sequence as y, i.e., equal to y).
Now omit the bracketed part in (5). We obtain

    x R z1 R z2 R z3 ... z_{i−1} R z_r R z_{r+1} ... z_{q−1} R y     (6)   (here z_r = z_i)

which contains at least one "R" fewer than the sequence (5) does —the entry "z_i R z_{i+1}"
(and everything else in the "..." part of the bracket) being removed. That is, (6) states

    x R^{q′} y

with q′ < q. Since the q in (3) was smallest > n, we must have q′ ≤ n, which sends us
to Case 1 and we are done. □

7 By "q^(k)" I mean q with k primes.

8 Including n and q there are exactly q − n + 1 distinct numbers in (4).
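Corollary 3.2.9 is directly programmable: for a relation R on an n-element set, union the first n powers. A sketch (ours), checked on a 3-cycle whose closure is all of S × S:

```python
def compose(R, S):
    return {(x, y) for (x, z1) in R for (z2, y) in S if z1 == z2}

def transitive_closure(R, n):
    """R+ = R^1 u R^2 u ... u R^n for a relation R on an n-element set
    (Corollary 3.2.9)."""
    closure = set(R)
    power = set(R)
    for _ in range(n - 1):
        power = compose(power, R)    # next power R^(i+1)
        closure |= power
    return closure

R = {(1, 2), (2, 3), (3, 1)}         # a relation on S = {1, 2, 3}, so n = 3
Rplus = transitive_closure(R, 3)
assert Rplus == {(i, j) for i in (1, 2, 3) for j in (1, 2, 3)}

# R+ contains R and is transitive:
assert R <= Rplus
assert all((x, z) in Rplus for (x, y) in Rplus for (w, z) in Rplus if y == w)
```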

3.2.1 Computing the Transitive Closure

The result from 3.2.9 permits the computation of the transitive closure of relations on finite
sets. We will give a general definition of “finite” later but for this subsection we mean sets
like S in 3.2.9. We will introduce a matrix representation of relations R on finite sets (in
the sense agreed to for this subsection), the matrix rows and columns being indexed by the
entries of the set S. Since matrix coordinates are pairs of natural numbers,
much will be gained in clarity and nothing lost in mathematical generality if the names of
the entries of finite sets like S of 3.2.9 are natural numbers rather than a generic letter "a"
indexed by natural numbers. Our S-sets therefore are precisely the

S = {1, 2, 3, . . . , n} (1)

A relation R on a set S (as in (1)) can be represented by a matrix.

3.2.10 Definition (Matrices and the Adjacency matrix) A matrix is the term used in
mathematics for the programming term "two dimensional array". An "m × k" matrix has m
rows and k columns. The address or location of an item in a matrix M, just as in programming,
is given by two coordinates, i (row number) and j (column number), with this notation:
M(i, j). Two matrices M and N are equal iff

• They are both k × r, for some k and r,

and

• for all i, j satisfying 1 ≤ i ≤ k and 1 ≤ j ≤ r we have M(i, j) = N(i, j).

A special case of matrices are the so-called adjacency matrices. Given a relation R on a
finite set S, its adjacency matrix M_R (or just M if R is understood) is an n × n matrix
of 0-1 entries given by

    M_R(i, j) =_{Def} \begin{cases} 1 & \text{if } i R j \\ 0 & \text{otherwise} \end{cases}
For computational purposes the entries "1" and "0" are taken to be "Boolean" with respect
to addition; that is, their arithmetic is not the normal one on natural numbers but is governed
by the "addition table" below.

Table 3.1 Addition table

    x  y  x + y
    0  0    0
    0  1    1
    1  0    1
    1  1    1
58 3 Relations and Functions

Multiplication is the same as for numbers:

Table 3.2 Multiplication table

    x  y  x × y
    0  0    0
    0  1    0
    1  0    0
    1  1    1

3.2.11 Example (Matrix addition) Addition of two n × n matrices is done by adding all
the entries with the same coordinates in the two matrices. That is, if M and N are two n × n
matrices, then (M + N)(i, j) =_{Def} M(i, j) + N(i, j), for all i, j.
Thus

• \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}

• \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}    □

3.2.12 Example (Matrix multiplication) Multiplication of two n × n matrices is done
according to the following formula:

Say M and N are two n × n matrices. Then

    (M × N)(i, j) =_{Def} \sum_{k=1}^{n} M(i, k) × N(k, j), for all i, j

The notation \sum_{k=1}^{n} f(k) means: take all f(k), for 1 ≤ k ≤ n, and add them.

Thus \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} × \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} because

1. Address (1, 1) of the product holds 0 × 0 + 1 × 1 = 1
2. (1, 2) holds 0 × 1 + 1 × 1 = 1
3. (2, 1) holds 1 × 0 + 1 × 1 = 1
4. (2, 2) holds 1 × 1 + 1 × 1 = 1 + 1 = 1



 Incidentally, an n × n matrix is called a “square” matrix. 
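The Boolean matrix operations of 3.2.11 and 3.2.12 are easy to mimic in code. Here is a minimal Python sketch (not from the text; the function names are mine); "+" on entries is the Boolean addition of Table 3.1 and "×" is the 0-1 multiplication of Table 3.2:

```python
# Boolean 0-1 matrix arithmetic following Tables 3.1 and 3.2:
# entry-wise "+" is Boolean (1 + 1 = 1), entry-wise "x" is as for numbers.
def bool_add(M, N):
    n = len(M)
    return [[M[i][j] | N[i][j] for j in range(n)] for i in range(n)]

def bool_mult(M, N):
    # (M x N)(i, j) = Boolean sum over k of M(i, k) x N(k, j)  (3.2.12)
    n = len(M)
    return [[int(any(M[i][k] and N[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]
```

The worked examples above check out: `bool_add([[0,1],[1,1]], [[0,1],[1,0]])` gives `[[0,1],[1,1]]`, and `bool_mult([[0,1],[1,1]], [[0,1],[1,1]])` gives `[[1,1],[1,1]]`.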

3.2.13 Example (The Identity Matrix) For each n > 0 we have an n × n identity matrix
I_n (or I, if the n is understood from the context) whose entries are as follows:

1. The diagonal entries are all equal to 1: That is, I_n(i, i) = 1, for all 1 ≤ i ≤ n.
2. All non-diagonal entries are zero: That is, I_n(i, j) = 0, for all 1 ≤ i, j ≤ n such that
i ≠ j.

3.2.14 Example Given two relations R and Q on A = {1, 2, ..., n − 1, n} for some n.
We can calculate the entries of M_{R◦Q} in terms of the entries of M_R and M_Q that we have
outright; after all, that is how R and Q are "given" to a computer: via M_R and M_Q.
Indeed, pick i and j. By definition of "◦",

    i R ◦ Q j iff, for some k, it is i R k and k Q j    (1)

By 3.2.10, (1) is equivalent to

    i R ◦ Q j iff, for some k, it is M_R(i, k) × M_Q(k, j) = 1    (1′)

The above in turn can be written without the "wordy" part "for some k" as follows (see also
Tables 3.1 and 3.2)

    i R ◦ Q j iff \sum_{k=1}^{n} M_R(i, k) × M_Q(k, j) = 1    (1″)

Indeed, the part \sum_{k=1}^{n} M_R(i, k) × M_Q(k, j) is 1 iff, for at least one k-value ("for some k"
as we also say), we have M_R(i, k) × M_Q(k, j) = 1.

One last observation and we are done:

By 3.2.12 we have (M_R × M_Q)(i, j) = \sum_{k=1}^{n} M_R(i, k) × M_Q(k, j). Factoring in (1″) we
have now

    i R ◦ Q j iff (M_R × M_Q)(i, j) = 1    (1‴)

In other words,

    M_{R◦Q}(i, j) = (M_R × M_Q)(i, j), for all i, j    (2)

and thus

    M_{R◦Q} = M_R × M_Q    (3)

Pause. Elaborate the “In other words”. 
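One way to convince ourselves of the identity concretely is to compute both sides on a small example. A Python sketch (the relations R and Q below are hypothetical choices of mine, not from the text):

```python
# Verify M_{R◦Q} = M_R x M_Q on a small hypothetical example.
n = 3
R = {(1, 2), (2, 3)}
Q = {(2, 2), (3, 1)}

# i (R◦Q) j iff, for some k, i R k and k Q j
comp = {(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
        if any((i, k) in R and (k, j) in Q for k in range(1, n + 1))}

def adj(S):  # adjacency matrix; entry (i, j) is stored at [i-1][j-1]
    return [[int((i, j) in S) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def bool_mult(M, N):  # Boolean matrix product (3.2.12)
    return [[int(any(M[i][k] and N[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

assert bool_mult(adj(R), adj(Q)) == adj(comp)   # equation (3)
```

Here `comp` works out to {(1, 2), (2, 1)}, and its adjacency matrix coincides with the Boolean product of the two given matrices.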

3.2.15 Remark In particular, if R is on A = {1, 2, ..., n}, then M_{R^2} = M_R × M_R or, as we
usually write,

1.
    M_{R^2} = (M_R)^2    (4)
2.
    M_{R^3} = M_{R^2 ◦ R} = M_{R^2} × M_R = (M_R)^2 × M_R = (M_R)^3    (using (3), then (4))
3. Suppose now that we have had enough perseverance to progress sufficiently far to a
number m that we will not disclose, and obtained

    M_{R^m} = (M_R)^m    (5)

4. To show that we can obtain identities like (5) as far as we like we show how to stand on
the shoulders of (5) and obtain (6) below for m + 1 ("m" is still the fixed undisclosed
number we used in (5)):

    M_{R^{m+1}} = M_{R^m ◦ R} = M_{R^m} × M_R = (M_R)^m × M_R = (M_R)^{m+1}    (using (3), then (5))    (6)

3.2.16 Exercise Give an example of two 2 × 2 matrices M and N such that M × N ≠
N × M. □

3.2.17 Exercise Prove that for any n × n adjacency matrix M we have

    M × I_n = I_n × M = M

We have just seen that a computation of R^+ can be based on (Boolean) matrix multipli-
cation (due to 3.2.9 and the results immediately above). We want to compute

    R ∪ R^2 ∪ R^3 ∪ ··· ∪ R^n

as

    M_R + M_{R^2} + M_{R^3} + ··· + M_{R^n}
Here is then the most obvious algorithm to do so:

3.2.18 Example (A crude algorithm for computing R^+) Let R be on A = {1, 2, ..., n}.
We can compute M_{R^+} and hence R^+ as follows:

    T ← I_n
    for k = 1 to n do
        T ← M_R + T × M_R
    end

In fact, on the k-th iteration of the loop (1 ≤ k ≤ n) T holds

    M_R + M_{R^2} + ··· + M_{R^k}

since the successive contents of T at the end of the k-th iteration are

    k = 1:  T = M_R + I_n × M_R = M_R + M_R = M_R    (Why?)
    k = 2:  T = M_R + T × M_R = M_R + M_{R^2}
    k = 3:  T = M_R + T × M_R = M_R + (M_R + M_{R^2}) × M_R = M_R + M_{R^2} + M_{R^3}

Guess and postulate the pattern for k = m below:

    k = m:  T = M_R + M_{R^2} + ··· + M_{R^m}

thus (we are right! The pattern is preserved below!)

    k = m + 1:  T = M_R + (M_R + M_{R^2} + ··· + M_{R^m}) × M_R
                  = M_R + M_{R^2} + M_{R^3} + ··· + M_{R^{m+1}}

thus we validated the form of T also for iteration k = m + 1, and therefore our "guess" of
the form of T at iteration k = m is correct for all m-values. In particular, it follows that
when the program exits (k = n iterations) we have

    T = M_R + M_{R^2} + M_{R^3} + ··· + M_{R^n}

and therefore

    T = M_{R^+}

If we ignore a multiplicative constant, the algorithm's run time is n^3 Boolean additions and
multiplications every time it goes through a loop iteration. That is, it is K·n^4 such operations
overall (over all n passages through the loop), for all large n, where K is a constant that in
the analysis of algorithms domain we "normally" (read "most of the time") do not care to
specify exactly.
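The crude algorithm translates directly into code. A Python sketch (the names are mine; the Boolean matrix operations of 3.2.11 and 3.2.12 are inlined so the block is self-contained):

```python
def transitive_closure_crude(M):
    """M: n x n 0-1 adjacency matrix of R on {1,...,n}; returns M_{R+}."""
    n = len(M)

    def bmul(A, B):  # Boolean matrix product (3.2.12)
        return [[int(any(A[i][k] and B[k][j] for k in range(n)))
                 for j in range(n)] for i in range(n)]

    def badd(A, B):  # Boolean matrix sum (3.2.11)
        return [[A[i][j] | B[i][j] for j in range(n)] for i in range(n)]

    T = [[int(i == j) for j in range(n)] for i in range(n)]  # T <- I_n
    for _ in range(n):                                       # n iterations
        T = badd(M, bmul(T, M))                              # T <- M_R + T x M_R
    return T
```

For instance, for R = {(1, 2), (2, 3)} on {1, 2, 3} the result is the matrix of R^+ = {(1, 2), (2, 3), (1, 3)}.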
 This “do not care” attitude has led to the so-called “big-O” notation that we will develop
in some detail in Section 7.1. Put simply, for now, for two non negative expressions f (n)

and g(n), we can express the English “the expression f (n) is bounded (or majorised) by a
constant times g(n), for all large n” —or also the less verbose “ f (n) ≤ K × g(n), for all
n ≥ N0 ”— by the very brief notation

f (n) = O(g(n))


Thus, overall the run time is dominated by n times n^3 (the program loops n times), that
is, the algorithm's run time is O(n^4) (Boolean operations) in big-O notation. □

3.2.2 The Special Cases of Reflexive Relations on Finite Sets

We can compute R + on a finite set A faster if we know that R is reflexive on A. This better
algorithm is based on the following theorem and its corollaries.

3.2.19 Theorem If R is reflexive on A = {1, 2, 3, ..., n} and m ≥ n, then

    \bigcup_{i=1}^{m} R^i = R^{m−1}    (1)

Proof Our proof relies on the techniques in 3.2.9 (ibid. Case 2 of the proof). Towards (1)
we have two directions.

⊇-direction. We want \bigcup_{i=1}^{m} R^i ⊇ R^{m−1}, but this is trivial.

⊆-direction. We want \bigcup_{i=1}^{m} R^i ⊆ R^{m−1}. This needs some work. So let x \bigcup_{i=1}^{m} R^i y.
Then x R^i y, for some i among 1 ≤ i ≤ m.

We have three cases for the above:

1. i = m − 1 works for the assumption (left hand side of ⊆). But then x R^{m−1} y which is
our conclusion.
2. i = k < m − 1 works for the assumption. From x R^k y and y R y (reflexivity) we get

    x R^k y R y ... y R y    (with m − 1 − k occurrences of "R" following x R^k y)

which trivially implies x R^{m−1} y.
3. i = m works for the assumption. We must reduce i so we end up in one of the above two
cases. We have

    x R a_1 R a_2 R a_3 ··· a_{m−1} R y    (with m occurrences of "R")    (2)

We will partly use the technique of proof of 3.2.9, Case 2. Now we named m + 1 points
in (2) but we have only n < m + 1 distinct ones in A. So two names in (2) must name
the same point. We have cases:

Case 1. a_s = a_t for some 1 ≤ s, t ≤ m − 1. Argue as in (proof of) 3.2.9, Case 2, to
remove at least one R from (2), thus ending up with x R^q y, where q ≤ m − 1.
Case 2. x = a_r for some 1 ≤ r ≤ m − 1. Argue as in (proof of) 3.2.9, Case 2, to remove
at least one R from (2), thus ending up with x R^q y, where q ≤ m − 1.
Case 3. y = a_r for some 1 ≤ r ≤ m − 1. Argue as in (proof of) 3.2.9, Case 2, to remove
at least one R from (2), thus ending up with x R^q y, where q ≤ m − 1.
Case 4. x = y. By reflexivity we have x R y, that is, x R^1 y. But 1 ≤ m − 1. □

3.2.20 Corollary If R is reflexive on A = {1, 2, ..., n} and m ≥ n, then R^+ = R^{m−1}.

Proof We know that

    R^+ = \bigcup_{i=1}^{n} R^i (by 3.2.9) = \bigcup_{i≥1} R^i (by 3.2.8)

Thus adding powers R^i (via the union operation "∪") beyond the n-th does not alter the
expression \bigcup_{i=1}^{n} R^i. So, if m ≥ n, then

    R^+ = \bigcup_{i=1}^{n} R^i = \bigcup_{i=1}^{m} R^i = R^{m−1}    (the last equality by 3.2.19)

We can now compute faster: Let R be reflexive on A = {1, 2, ..., n} and let p be smallest⁹
such that

    n − 1 ≤ 2^p    (3)

That is, p = ⌈log_2(n − 1)⌉.¹⁰ Set m = 2^p + 1. Thus n ≤ m and by 3.2.20

    R^+ = R^{m−1} = R^{2^p}    (4)

We can now compute with jumps, by starting with M_R and using repeated squaring (p
times)

9 As the expression 2^p increases without bound and n − 1 is fixed, there are infinitely many p for
which n − 1 ≤ 2^p is true. We can thus pick the smallest p that works.
10 Let x be a real number and t an integer such that t − 1 < x ≤ t. Then we call t the ceiling of x
and write t = ⌈x⌉.

    T ← M_R
    for k = 1 to p do
        T ← T^2
    end

By (4) T, at the end of the algorithm above, holds the adjacency matrix M_{R^{2^p}} of R^+.

The computation of T takes¹¹ a constant times p · n^3 = n^3 · ⌈log_2(n − 1)⌉ Boolean
+, × operations to conclude the above indicated computation. In big-O notation that is
O(n^3 · log_2(n − 1)) Boolean operations.
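Under the assumption that R is reflexive, the repeated-squaring computation is a short script. A Python sketch (names mine; the small-n cases, where log_2(n − 1) is not meaningful, are handled by taking p = 0, which is consistent with (4)):

```python
import math

def transitive_closure_by_squaring(M):
    """M: n x n 0-1 adjacency matrix of a REFLEXIVE R; returns M_{R+}."""
    n = len(M)

    def bmul(A, B):  # Boolean matrix product (3.2.12)
        return [[int(any(A[i][k] and B[k][j] for k in range(n)))
                 for j in range(n)] for i in range(n)]

    # smallest p with n - 1 <= 2^p, i.e., p = ceil(log2(n - 1)) for n > 2
    p = math.ceil(math.log2(n - 1)) if n > 2 else 0
    T = [row[:] for row in M]        # T <- M_R
    for _ in range(p):               # square p times
        T = bmul(T, T)
    return T                         # T = M_{R^(2^p)} = M_{R+} by (4)
```

On the reflexive "chain" relation with pairs (i, i) and (i, i + 1) on {1, 2, 3, 4}, two squarings produce the full upper-triangular closure.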

3.2.3 Warshall’s Algorithm

There is an even faster way to compute R^+ due to Warshall. The algorithm relies on the
visualisation that x R^+ y means that there is a path from x to y that passes through points
(members) of A which are connected by arrows labelled "R" and all (arrows) point in the
direction from the "start" x toward the "end" y.

    x −R→ a_1 −R→ a_2 −R→ a_3 −R→ ··· −R→ a_j −R→ a_{j+1} −R→ ··· −R→ a_{r−1} −R→ y    (1)

Thus M_{R^+}(x, y) = 1 iff x and y are connected by a path as depicted in (1) above, that is,
x R^r y holds. What the algorithm below does is whenever it detects (1) and (1′) below
(manifested as T(x, y) = 1 and T(y, z) = 1) it adds an "edge" from x to z, that is, makes
T(x, z) = 1 too.

    y −R→ b_1 −R→ b_2 −R→ b_3 −R→ ··· −R→ b_j −R→ b_{j+1} −R→ ··· −R→ b_{q−1} −R→ z    (1′)

The algorithm is simple:

    T ← M_R
    for j = 1 to n
        for i = 1 to n
            for k = 1 to n
                T(i, k) ← T(i, k) + T(i, j) × T(j, k)

The command in the last loop says “if there is a path from i to j and one from j to k then
acknowledge a path (edge) from i to k by making T (i, k) equal to 1.”
This is the correct behaviour for the algorithm but the $1M question is:
Does the algorithm add all the edges (paths) needed for R + ?

11 “It takes” in the analysis of an algorithm’s run time is rarely exact. Usually, as is the case here, “it
takes” is short for “it takes up to”; an upper bound on steps/time.

Is it possible that T(i, k), for some i, k, will incorrectly stay 0 because, when we come
to perform T(i, j) × T(j, k), the entry T(j, k) is not 1 yet?
No, not possible.
We will prove the correctness of Warshall’s algorithm employing a “trick”. Well, not a
trick really, but the methodology of “dynamic programming” taught in courses on algorithms
but also appearing in the proof of Kleene’s theorem that expresses sets recognised by finite
automata (FA) as regular expressions (cf. for example, Tourlakis (2012)). Namely, we add
notation to help the reasoning about the correctness of the program above.

We add a superscript to T on the right and left of "←" in the innermost loop:

    T^{(j)}(i, k) ← T^{(j−1)}(i, k) + T^{(j−1)}(i, j) × T^{(j−1)}(j, k)

The meaning of T^{(q)}(x, y) is that this entry is 1 precisely if there is a "path" from
x to y (such as (1)) that does not use intermediate points a_r that are outside the set
{1, 2, ..., q}.
Correspondingly, the initialisation T ← M_R should be viewed as T^{(0)} ← M_R since
all paths depicted by M_R are direct (single "edges" (arrows) labelled by R); no
intermediate points on any of them.

Thus, not only the initialisation is correct but also the innermost loop behaves correctly:
The right hand side (before the execution of the assignment instruction inside the loop)
holds recorded paths (if such were recorded) i → k, i → j and j → k that have no
inner points outside {1, 2, ..., j − 1}, the "record" being a 1 or 0 in the corresponding
matrix entries T^{(j−1)}(i, k), T^{(j−1)}(i, j), and T^{(j−1)}(j, k) according as the foregoing paths
were detected or not.
The left hand side T^{(j)}(i, k) records paths i → k that either have no inner points outside
{1, 2, ..., j − 1} (term T^{(j−1)}(i, k) to the right of "←") OR, by virtue of the concatenation
of i → j and j → k, they have inner points from 1 up to and including j; justifying us to
place a "j" superscript on T to the left of "←".
Given the semantics of

    T^{(j)}(i, k) ← T^{(j−1)}(i, k) + T^{(j−1)}(i, j) × T^{(j−1)}(j, k)    (2)

as noted in the above two paragraphs, and since T^{(0)} on line 1 is initialised correctly as
already noted in the preceding boxed remark, assignment (2) is correct for all j (and all
i, k). In particular, for j = n, we have T^{(n)}(i, k) = 1 iff there is a path from i to k that uses
no internal nodes outside {1, 2, ..., n}. In short, "T^{(n)}(i, k) = 1 iff there is a path from i to
k" is correct, period; without the preceding italicised qualification.
Do we need to use the superscripts in T^{(q)} and to introduce new matrices T^{(j)}, one for
each j = 1, 2, ..., n, beyond their use notionally in the justification of correctness?
No.
No.

We can record the T^{(q)} entries into T (that is, we store T^{(q)} into T at each step that
the former is updated) without altering the analysis above: Namely, if the T^{(j−1)}(i, k),
T^{(j−1)}(i, j) and T^{(j−1)}(j, k) in (2) have already been stored in T, then if paths (1) and
(1′) have already been recorded as T^{(j−1)}(i, j) = 1 and T^{(j−1)}(j, k) = 1, according to the
previous analysis, then they are stored as T(i, j) = 1 and T(j, k) = 1 in the suggested
algorithm, without superscripts. Thus using the "T(i, k) ← T(i, k) + T(i, j) × T(j, k)"
in the innermost loop (no "(j)" superscript on the leftmost T!) the algorithm above
correctly updates the left hand side T(i, k) since the right hand side is assumed correct and
the assignment statement can be viewed as having two steps: One, obtaining T^{(j)}(i, k) as an
"intermediate step" and Two, copying this result into the left hand side of "←" as T(i, k).
The latter entry is yes/no (1/0) without reference to inner nodes.
Before we turn to the timing assessment of the algorithm we give it the form that is
prevalent in the literature.

    T ← M_R
    for j = 1 to n
        for i = 1 to n
            if T(i, j) = 1 then
                for k = 1 to n
                    T(i, k) ← T(i, k) + T(j, k)

By inspection of the above program and the presence of the three nested loops it is trivial
that the algorithm's run time is bounded by O(n^3) Boolean + operations (no ×!).

3.3 Equivalence Relations

Equivalence relations must be on some set A, since we require reflexivity. They play a
significant role in many branches of mathematics and even in computer science. For example,
the minimisation process of finite automata (a topic that we will not cover) relies on the
concept of equivalence relations.

3.3.1 Definition A relation R on A is an equivalence relation, provided it is all of

1. Reflexive
2. Symmetric
3. Transitive 

3.3.2 Example The following are equivalence relations

• {(1, 1)} on A = {1}.
• = (or 1_A or Δ_A) on A.
• Let A = {1, 2, 3}. Then R = {(1, 2), (1, 3), (2, 3), (2, 1), (3, 1), (3, 2), (1, 1), (2, 2),
(3, 3)} is an equivalence relation on A.
• N² (i.e., N × N) is an equivalence relation on N. □

Here is a longish, more sophisticated example, that is central in number theory. We will
have another instalment of it after a few definitions and results.

3.3.3 Example (Congruences) Fix an m ≥ 2. We define the relation ≡m on Z by



x ≡m y iff m | (x − y)

Recall that “|” is the “divides with zero remainder” relation.


A notation that is very widespread in the literature is to split the symbol “≡m ” into two
and write
x ≡ y (mod m) instead of x ≡m y
“x ≡ y (mod m)” and x ≡m y are read “x is congruent to y modulo m (or just ‘mod m’)”.
Thus “≡m ” is the congruence (mod m) short symbol, while “≡ . . . (mod m)” is the long
two-piece symbol. We will be using the short symbol.
We verify the required properties for ≡m to be an equivalence relation.

1. Reflexivity: Indeed, m | (x − x), hence x ≡m x.


2. Symmetry: Clearly, if m | (x − y), then m | (y − x). I translate: If x ≡m y, then y ≡m x.
3. Transitivity: Let m | (x − y) and m | (y − z). The first says that, for some k, x − y = km.
Similarly the second says, for some n, y − z = nm. Thus, adding these two equations I
get x − z = (k + n)m, that is, m | (x − z). I translate: If x ≡m y and y ≡m z, then also
x ≡m z.  
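The three verifications can also be replayed mechanically on a finite sample of Z. A small Python check (the modulus and the sample range are hypothetical choices of mine):

```python
def cong(x, y, m):
    return (x - y) % m == 0        # m | (x - y), i.e., x is congruent to y mod m

m = 5
sample = range(-20, 21)
assert all(cong(x, x, m) for x in sample)                 # reflexivity
assert all(cong(y, x, m)                                  # symmetry
           for x in sample for y in sample if cong(x, y, m))
assert all(cong(x, z, m)                                  # transitivity
           for x in sample for y in sample for z in sample
           if cong(x, y, m) and cong(y, z, m))
```

Python's % returns a remainder in {0, ..., m − 1} even for negative left operands, so `(x - y) % m == 0` captures "m divides x − y with zero remainder" exactly.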

3.3.4 Definition (Equivalence classes) Given an equivalence relation R on A. The equiv-


alence class of an element x ∈ A is {y ∈ A : x Ry}. We use the symbol [x] R , or just [x] if
R is understood, for the equivalence class.

3.3.5 Remark Suppose an equivalence relation R on A is given.


By reflexivity, x Rx, for any x. Thus x ∈ [x] R , hence all equivalence classes are nonempty.


Be careful to distinguish the brackets {...} from these [...]. It is not a priori obvious that
x ∈ [x]_R until you look at the definition 3.3.4! In general, [x]_R ≠ {x}.

The symbol A/R denotes the quotient class of A with respect to R, that is,

    A/R =_{Def} {[x]_R : x ∈ A}


This is the time to introduce “Principle 3”12 of set formation.

3.3.6 Remark (Principle 3) Suppose that the class F is indexed by some (or all) members
of a set A. Then F is a set.

Being indexed by (some) members of a set A means that —for every x ∈ F— we have
attached to it as “label(s)” (each often depicted as a subscript or superscript) some mem-
ber(s) of A.

We must ensure that once a label is used it is not used again for another y ∈ F.

Thus, if F = {a, b, c}, then {a_1, b_{13,19,0}, c_{42}} is a valid labelling with members from N.¹³
{a_{1,13}, b_{13}, c_{19}} is not correctly labelled (same label, 13, twice), while the labelling
{a_{1,42}, b_{13}, C} is also invalid (C was not labelled).

In sum, we can label an object in F with many labels, but we may not use the same
label twice to label two objects of F and we may not leave any object of F unlabelled.
Note that in 3.3.4 we have labelled every X ∈ A/R by a member of A, by virtue of the
fact that any X is an [a]_R. We can use a, or any (or all) x ∈ [a]_R, to label X.

Two things:

1. The presence of a valid (correct) labelling from a set A ensures that the labelled class is
a set as it —intuitively!— has no more members than the set of labels (I can spend many
—or even all— of available labels on one set of F, but I may not reuse a label, so I have
at least as many labels as there are members in F).
Thus F is as “small” as a set, and thus is a set itself. Some people call Principle 3 the
size limitation doctrine.14
2. Why can’t I use the Principles 0–2 to argue that F, labelled by A, is a set? Well, because
these principles are notorious in not telling me when a stage exists after infinitely many
stages of construction that I might have if, say, I were to build one set for each natural
number:

12 This is the last Principle, I promise!


13 b has three labels attached to it.
14 Practitioners on the foundations of set theory felt that paradoxes occurred in connection with
enormous classes.

    A_0, A_1, ..., A_n, ...

Suppose the nature of each A_i (for each i ≥ 0) is such that each A_{i+1} is built at a stage
that is astronomically later than the stage at which A_i was built.
Thus we get an infinite sequence of stages, wildly apart! How can I justify, just on the
basis of Principles 0–2, the existence of a stage that is after all the infinitely many earlier
stages, in order to build the class {A_0, A_1, ..., A_n, ...} as a set? □

We can now state the obvious:

3.3.7 Theorem A/R is a set for any set A and equivalence relation R on A.

Proof A provides labels for all members of A/R. Now invoke Principle 3. 

Now that we have had an excuse to introduce Principle 3 early, and applied it to the easy
example above let us do the following exercise:

3.3.8 Exercise Show that it was not necessary to apply the new Principle to prove 3.3.7.
Specifically show that the theorem follows by Principles 0–2 implicitly via 2.3.6.
Hint. You will need, of course, to find a superset of A/R, that is, a class X that demon-
strably is a set, and satisfies A/R ⊆ X .  

3.3.9 Lemma Let P be an equivalence relation on A. Then [x] = [y] iff x P y —where we
have omitted the subscript P from the [. . .]-notation.

Proof (→) part. By reflexivity, x ∈ [x] (3.3.5). The assumption then yields x ∈ [y] and
therefore y P x by 3.3.4. Symmetry gives us x P y now.
(←) part. Let z ∈ [x]. Then x P z. The assumption yields y P x (by symmetry), thus, transi-
tivity yields y P z. That is, z ∈ [y], proving

[x] ⊆ [y]

By swapping letters we have proved above that y P x implies [y] ⊆ [x]. Now (by sym-
metry) our original assumption, namely x P y, implies y P x, hence also [y] ⊆ [x]. All in all,
[x] = [y]. 

3.3.10 Lemma Let R be an equivalence relation on A. Then

(i) [x] ≠ ∅, for all x ∈ A.
(ii) [x] ∩ [y] ≠ ∅ implies [x] = [y], for all x, y in A.
(iii) \bigcup_{x∈A} [x] = A.

Proof

(i) 3.3.5.
(ii) Let z ∈ [x] ∩ [y]. Then x R z and y R z, therefore x R z and z R y (the latter by symmetry),
hence x R y (transitivity). Thus, [x] = [y] by Lemma 3.3.9.
(iii) The ⊆-part is obvious from [x] ⊆ A. The ⊇-part follows from \bigcup_{x∈A} {x} = A and
{x} ⊆ [x]. □
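Properties (i)–(iii) can be observed concretely. Here is a Python sketch computing A/R for congruence mod 3 on a small A (the instance is a hypothetical choice of mine):

```python
A = range(12)

def equiv(x, y):                     # x R y: congruence mod 3
    return (x - y) % 3 == 0

def eq_class(x):                     # [x] = {y in A : x R y}
    return frozenset(y for y in A if equiv(x, y))

quotient = {eq_class(x) for x in A}  # A/R = {[x] : x in A}

assert all(C for C in quotient)                   # (i): every class nonempty
assert all(C == D or not (C & D)                  # (ii): disjoint or equal
           for C in quotient for D in quotient)
assert set().union(*quotient) == set(A)           # (iii): classes cover A
```

`frozenset` is used because the classes themselves must be hashable to be collected into the set `quotient`; the deduplication there mirrors the fact that [x] = [y] whenever x R y.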

The properties (i)–(iii) are characteristic of the notion of a partition of a set.

3.3.11 Definition (Partitions) Let F be a family of subsets of A. It is a partition of A iff
all of the following hold:

(i) For all X ∈ F we have that X ≠ ∅.
(ii) If {X, Y} ⊆ F and X ∩ Y ≠ ∅, then X = Y.
(iii) \bigcup F = A. □

3.3.12 Remark Often a partition F is given as an indexed family of sets denoted by (F_a)_{a∈I},
where I is the indexing set.
Less informatively we may write (F_a)_{a∈I} as

    {F_a, F_b, F_c, ...}

where the F_a are the X, Y, ... of the definition above. □

There is a natural affinity between equivalence relations and partitions on a set A. In fact,

3.3.13 Theorem Given a partition F on a set A. This leads to the definition of an equiv-
alence relation P whose equivalence classes are precisely the sets of the partition, that is
F = A/P.

Proof First we define P:

    x P y iff_{Def} (∃X ∈ F) {x, y} ⊆ X    (1)

Observe that

(i) P is reflexive: Take any x ∈ A. By 3.3.11(iii), there is an X ∈ F such that x ∈ X, hence
{x, x} ⊆ X. Thus x P x.
(ii) P is, trivially, symmetric since there is no order in {x, y} to make a difference in
definition (1).

(iii) P is transitive: Indeed, let x P y P z. Then {x, y} ⊆ X and {y, z} ⊆ Y for some X , Y in
F.
Thus, y ∈ X ∩ Y hence X = Y by 3.3.11(ii). Hence {x, z} ⊆ X , therefore x P z.

So P is an equivalence relation. Let us compare its equivalence classes with the various
X ∈ F.
Now [x]_P (denoted without the subscript P in the remaining proof) is

    {y : x P y}    (2)

Let us compare [x] with the unique X ∈ F that contains x (why unique? By 3.3.11(ii)).
Thus,

    y ∈ [x] ⇔ x P y ⇔ x ∈ X ∧ y ∈ X ⇔ y ∈ X

where the three equivalences hold by (2), by (1), and because "x ∈ X" is true, respectively.
Thus [x] = X. □

3.3.14 Example (Another look at congruences) Euclid’s theorem for the division of inte-
gers states, where Z is the set of all integers, negative, positive and 0:
If a ∈ Z and 0 < m ∈ Z, then there are unique q and r such that

a = mq + r and 0 ≤ r < m (1)

There are many proofs, but here is one: The set

    T = {x : 0 ≤ x = a − mz, for some z}

is not empty. For example, if a > 0, then take z = 0 to obtain x = a > 0 in T. If a = 0,
then take z = 0 to obtain x = 0. Finally, if a < 0, then take z = −2|a|¹⁵ to obtain x =
−|a| + 2m|a| = |a|(2m − 1) > 0. Since m ≥ 1 we have 2m ≥ 2.
Let then r be the smallest x ≥ 0 in T. If there is one x that works (as we just showed), then
possibly there are more. But we cannot have an infinite descending sequence of nonnegative
integers

    ··· < x‴ < x″ < x′ < x

thus, in particular, we cannot have such a sequence in T.
There are just x + 1 numbers from 0 to x inclusive! So a smallest x ∈ T that works exists.
The corresponding “z” to the smallest x = r let us call q. So we have

a = mq + r

Can r ≥ m? If so, then write r = k + m, where k = r − m ≥ 0 and k < r (recall that
m > 0). I got

    a = m(q + 1) + k

As k < r I have contradicted the minimality of r.

15 Absolute value.

This proves that r < m (that r ≥ 0 is trivial; why?)

We have proved existence of at least one pair q and r that works for (1). How about
uniqueness? Well, the worst thing that can happen is to have two representations (1). Here
is another one:

    a = mq′ + r′ and 0 ≤ r′ < m    (2)

As both r and r′ are < m, their "distance" (absolute difference) is also < m.
Now, from (1) and (2) we get

    m|q − q′| = |r − r′|    (3)

This cannot be unless q = q′ (in which case r = r′, therefore uniqueness is proved).

Wait: Why "it cannot be" if q ≠ q′? Because then |q − q′| ≥ 1, thus the lhs of "=" in (3)
is ≥ m but the rhs is < m.

We now take a deep breath!

Now, back to congruences! The above was just a preamble!


Fix an m > 1¹⁶ and consider the congruences x ≡_m y. What are the equivalence classes?
A better question is: what representative members are convenient to use for each such class?
Given that a ≡_m r by (1), and using Lemma 3.3.9, we have [a]_m = [r]_m.
r is a far better representative than a for the class [a]_m as it is "normalised".
Thus, we have just m equivalence classes [0], [1], ..., [m − 1].
Wait! Are they distinct? Yes! Since [i] = [j] is the same as i ≡_m j (3.3.9) and, since
0 < |i − j| < m, m cannot divide i − j with 0 remainder, we cannot have [i] = [j].
OK. How about missing some? We are not, for any a is uniquely expressible as a =
m · q + r, where 0 ≤ r < m. Since m | (a − r), we have a ≡_m r, i.e., (by 3.3.4) a ∈ [r]. □

3.3.15 Example (A practical example) Say I chose m = 5. Where does a = −110987
belong? I.e., in which [...]_5 class out of [0]_5, [1]_5, [2]_5, [3]_5, [4]_5?
Well, let's do primary-school-learnt long division of −a divided by 5 and find quotient
q and remainder r. We find, in this case, q = 22197 and r = 2. These satisfy

    −a = 22197 × 5 + 2

Thus,

    a = −22197 × 5 − 2    (1)

(1) can be rephrased as

    a ≡_5 −2    (2)

But easily we check that −2 ≡_5 3 (since 3 − (−2) = 5). Thus,

    a ∈ [−2]_5 = [3]_5    □

16 Congruences modulo m = 1 are trivial and not worth considering.
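The arithmetic of this example can be double-checked in Python, whose % operator happens to return the "normalised" representative 0 ≤ r < m even for negative a:

```python
a, m = -110987, 5
assert -a == 22197 * m + 2      # the long division of -a by 5
assert (a - (-2)) % m == 0      # hence a is congruent to -2 mod 5, as in (2)
assert (-2 - 3) % m == 0        # and -2 is congruent to 3 mod 5
assert a % m == 3               # so a lies in [-2]_5 = [3]_5
```

This is a design quirk worth knowing: unlike some languages, Python's a % m always has the sign of m, so it directly names the class representative in {0, ..., m − 1}.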

3.3.16 Exercise Can you now easily write the same a above as

a = Q × 5 + R, with 0 ≤ R < 5?

Show all your work. 

3.4 Partial Orders

This section introduces one of the most important kind of binary relations in set theory and
mathematics in general: The partial order relations.

3.4.1 Definition (Converse or inverse relation of P) For any relation P, the symbol P^{−1}
stands for the converse or inverse relation of P and is defined as

    P^{−1} =_{Def} {(x, y) : y P x}    (1)

Equivalently to (1), we may define x P^{−1} y iff y P x. □

3.4.2 Definition ("(a)P" notation) For any relation P we write "(a)P" to indicate the class
(it might fail to be a set) of all outputs of P on (caused by) input a. That is,

    (a)P =_{Def} {y : a P y}

If (a)P = ∅, then P is undefined at a; that is, a ∉ dom(P). This undefinedness statement
is often denoted simply by "(a)P ↑" and is naturally read as "P is undefined at a".
If (a)P ≠ ∅, then P is defined at a; that is, a ∈ dom(P). This definedness statement is
often denoted simply by "(a)P ↓" and is naturally read as "P is defined at a". □

3.4.3 Exercise Give an example of a specific relation P and one specific object (set or atom)
a such that (a)P is a proper class. 

3.4.4 Remark We note that for any P and a,

    (a)P^{−1} = {y : a P^{−1} y} = {y : y P a}

Thus,

    (a)P^{−1} ↑ iff a ∉ ran(P)

and

    (a)P^{−1} ↓ iff a ∈ ran(P)    □

3.4.5 Exercise Show that (P | A)^{−1} = P^{−1} | A. □

3.4.6 Definition (Partial order) A relation P is called a partial order or just an order, iff
it is

(1) irreflexive (i.e., x P y → x ≠ y for all x, y), and
(2) transitive.

It is emphasised that in the interest of generality (for much of this section, until we say
otherwise) P need not be a set.
Some people call this a strict order as it imitates the "<" on, say, the natural numbers.

 3.4.7 Remark (1) We will normally use the symbol “<” in the abstract setting to denote
any unspecified order P, and it will be pronounced “less than”.
It is hoped that the context will not allow confusion with any concrete use of the symbol
< on numbers (say, on the reals, natural numbers, etc.).
(2) If the order < is a subclass of A × A —i.e., it is <: A → A— then we say that < is an
order on A.
(3) It is easy to check and verify that, for any order < and any class B, we have that
< ∩(B × B) is an order on B.  

3.4.8 Exercise Do (3) above with a simple, short proof. 

3.4.9 Example The concrete "less than", <, on N is an order, but ≤ is not (it is not irreflex-
ive). The "greater than" relation, >, on N is also an order, but ≥ is not. Of course, > = <^{−1}.
In general, it is trivial to verify that P is an order iff P^{−1} is an order. Exercise! □

3.4.10 Example ∅ is an order. Moreover for any A, ∅ ⊆ A × A, thus ∅ is also an order on


A for the arbitrary A. 

3.4.11 Example The relation ∈ is irreflexive by the well known A ∉ A, for all A. It is not
transitive though. For example, if a is a set (or atom), then a ∈ {a} ∈ {{a}} but a ∉ {{a}}.
So it is not an order.
Let M = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}. The relation ε = ∈ ∩ (M × M) is transitive
and irreflexive, hence it is an order (on M). Verify! □
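The claim about ε can be machine-checked, modelling the four sets of M with Python frozensets (an illustration of mine, not from the text):

```python
e = frozenset()                                  # the empty set
s1 = frozenset({e})                              # {∅}
s2 = frozenset({e, s1})                          # {∅, {∅}}
s3 = frozenset({e, s1, s2})                      # {∅, {∅}, {∅, {∅}}}
M = [e, s1, s2, s3]

eps = {(x, y) for x in M for y in M if x in y}   # membership on M x M

assert all(x != y for (x, y) in eps)             # irreflexive
assert all((x, z) in eps                         # transitive
           for (x, y) in eps for (w, z) in eps if y == w)
```

`frozenset` is needed (rather than `set`) so that sets can be members of other sets, exactly as in the nested M above.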

3.4.12 Example ⊂ is an order, ⊆—failing irreflexivity— is not. 

3.4.13 Example Consider the order ⊂ again. In this case we have none of {∅} ⊂ {{∅}},
{{∅}} ⊂ {∅} or {{∅}} = {∅}. That is, {∅} and {{∅}} are non-comparable items. This justifies
the qualification partial for orders in general (Definition 3.4.18).
On the other hand, the “natural” < on N is such that one of x = y, x < y, y < x always
holds for any x, y. That is, all (unordered) pairs x, y of N are comparable under <. This is
a concrete example of a total order (see the “official definition” below: 3.4.19).
While all orders are “partial”, some are total (< above) and others are nontotal (⊂ above).
 

3.4.14 Definition Let < be a partial order on A. We set

    ≤ =_{Def} 1_A ∪ <

We pronounce ≤ "less than or equal". 1_A ∪ > is denoted by ≥ and is pronounced "greater
than or equal".
Let us call ≤ a reflexive order. □

(1) In plain English, given < on A, we define x ≤ y to mean

    x < y ∨ x = y

(the "x = y" disjunct is the contribution of 1_A) for all x, y in A.
(2) The definition of ≤ depends on A due to the presence of 1_A. There is no such dependency
on a "reference" class in the case of <.
(3) We remind ourselves once more here that the symbols < and ≤ (and their
pronunciations) do not imply that we are talking about the specific ones on numbers.
It is just a harmless (I hope) notational device, but unless said explicitly otherwise, "<" and
"≤" are any orders. □

3.4.15 Lemma For any <: A → A, the associated relation ≤ on A is reflexive, antisym-
metric and transitive.

Proof (1) Reflexivity is trivial.
(2) For antisymmetry, let x ≤ y and y ≤ x. If x = y then we are done, so assume the
remaining case x ≠ y (i.e., (x, y) ∉ 1_A). Then the hypothesis becomes x < y and y < x,
therefore x < x by transitivity, contradicting the irreflexivity of <.
(3) As for transitivity let x ≤ y and y ≤ z.

(a) If x = z, then x ≤ z (see the remark after 3.4.14) and we are done.
(b) The remaining case is x ≠ z. Now, if it is x = y or y = z (but not both (why?)), then
we are done again. So it remains to consider x < y and y < z. By transitivity of < we
get x < z, hence x ≤ z, since < ⊆ ≤. □

3.4.16 Lemma Let P on A be reflexive, antisymmetric and transitive.

Then P − Δ_A is an order on A.

Proof Since
      P − Δ_A ⊆ P (1)
it is clear that P − Δ_A is on A. It is also clear that it is irreflexive. We only need verify that
it is transitive.
So let
      (x, y) and (y, z) be in P − Δ_A (2)
By (1) (or (2))
      (x, y) and (y, z) are in P (3)
hence
      (x, z) ∈ P
by transitivity of P.
Can (x, z) ∈ Δ_A, i.e., can x = z? No, for antisymmetry of P and (3) would imply x = y,
i.e., (x, y) ∈ Δ_A contrary to (2).
So, (x, z) ∈ P − Δ_A. 

 3.4.17 Remark Often in the literature, but decreasingly so, it is the “reflexive order”
≤: A → A that is defined as a “partial order” by the requirements that it is reflexive, antisymmetric
and transitive. Then < is obtained as in Lemma 3.4.16, namely, as “≤ − Δ_A”.
Lemmas 3.4.15 and 3.4.16 show that the two approaches are interchangeable, but the “mod-
ern” approach of Definition 3.4.6 avoids the nuisance of having to tie the notion of order to
some particular “field” A (3.1.6).
For us “≤” is the derived notion defined in 3.4.14.  
3.4 Partial Orders 77

3.4.18 Definition (PO Class) If < is an order on a class A, we call the informal pair
(A, <)17 a partially ordered class, or PO class.
If < is an order on a set A, we call the pair (A, <) a partially ordered set or PO set.
Often, if the order < is understood as being on A or A, one says that “A is a PO class” or
“A is a PO set” respectively. 

3.4.19 Definition (Linear order) A relation < on A is a total or linear order on A iff it is
(1) An order, and
(2) For any x, y in A one of x = y, x < y, y < x holds —this is the so-called “tri-
chotomy” property.
If A is a class, then the informal pair (A, <) is a linearly ordered class —in short, a LO
class.
If A is a set, then the pair (A, <) is a linearly ordered set —in short, a LO set.
One often calls just A a LO class or LO set (as the case warrants) when < is understood
from the context. 

3.4.20 Example The standard <: N → N is a total order, hence (N, <) is a LO set.

3.4.21 Definition (Minimal and minimum elements) Let < be an order and A some class.

We are not postulating that < is on A.

An element a ∈ A is a <-minimal element in A, or a <-minimal element of A, iff ¬(∃x ∈


A)x < a —in words, there is nothing below a in A.
m ∈ A is a <-minimum element in A iff (∀x ∈ A)m ≤ x, in words, all x in A satisfy
m ≤ x.
We also use the terminology minimal or minimum with respect to <, instead of <-minimal
or <-minimum.
If a ∈ A is >-minimal in A, that is ¬(∃x ∈ A)x > a, we call a a <-maximal element in
A. Similarly, a >-minimum element —(∀x ∈ A)m ≥ x— is called a <-maximum.
If the order < is understood, then the qualification “<-” is omitted. 

17 Formally, (A, <) is not an ordered pair since A may be a proper class and we do not allow class
members —e.g., in {A, {A, <}}— to be proper classes. We may think then of “(A, <)” as informal
notation that simply “ties” A and < together. Alternatively, if we are really determined to have
class pairs (we are not!), we can define pairing with proper classes as components, for example as
(A, B) =Def (A × {0}) ∪ (B × {1}). For our part we will have no use for such formality, and will
consider (A, <) in only the informal sense.

3.4.22 Remark In particular, if a (∈ A) is not in the field dom(<) ∪ ran(<) (cf. 3.1.7) of
<, then a is both <-minimal and <-maximal in A. For example, (∃x ∈ A)x < a is false in
this case since if, for some x, we have x ∈ A and also x < a, then a ∈ ran(<); impossible.
Because of the duality between the notions of minimal/maximal and minimum/maximum,
we will mostly deal with the <-notions whose results can be trivially translated for the >-
notions.
Note how the notation learnt from 3.4.2 and 3.4.1 and 3.4.4 can simplify

¬(∃x ∈ A)x < a (1)

Note that (1) says that no x is in both A and (a) >.18 That is, a is <-minimal in A iff

A ∩ (a) > = ∅ (2)

 

 3.4.23 Example 0 is minimal, also minimum, in N with respect to the “standard” ordering.
In P (N), ∅ is both ⊂-minimal and ⊂-minimum. On the other hand, all of {0}, {1}, {2}
are ⊂-minimal in P (N) − {∅} but none are ⊂-minimum in that set.
Observe from this last example that minimal elements in a class are not unique.  
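These finite examples are easy to check mechanically. Below is a small Python sketch —the helper names `is_minimal`, `is_minimum` and `proper_subset` are ours, not notation from the text— that tests Definition 3.4.21 on a finite fragment of P(N) − {∅}:

```python
def is_minimal(a, family, less):
    """a is <-minimal in the collection iff no x there has x < a (3.4.21)."""
    return all(not less(x, a) for x in family)

def is_minimum(m, family, less):
    """m is <-minimum iff m <= x (i.e. m = x or m < x) for all x there."""
    return all(m == x or less(m, x) for x in family)

# frozenset's "<" is proper inclusion, i.e. the order of Example 3.4.23
proper_subset = lambda s, t: s < t

# a finite piece of P(N) - {emptyset}
family = [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({0, 1})]

minimal = [s for s in family if is_minimal(s, family, proper_subset)]
minima  = [s for s in family if is_minimum(s, family, proper_subset)]
```

As the assertions in the test record, {0}, {1}, {2} are all ⊂-minimal in this collection —minimal elements need not be unique— yet no ⊂-minimum exists.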

3.4.24 Remark (Hasse diagrams) There is a neat pictorial way to depict orders on finite
sets known as “Hasse diagrams”. To do so one creates a so-called “graph” of the finite PO
set (A, <) where A = {a1 , a2 , . . . , an }.
How? The graph consists of n nodes —which are drawn as points— each labeled by one
ai . The graph also contains 0 or more arrows that connect nodes. These arrows are called
edges.
When we depict an arbitrary R on a finite set like A we draw one arrow (edge) from ai
to a j iff the two relate: ai Ra j .
In Hasse diagrams for PO sets (A, <) we are more selective: We say that b covers a iff
a < b, but there is no c such that a < c < b. In a Hasse diagram we will

1. draw an edge from ai to a j iff a j covers ai .


2. by convention we will draw b higher than a on the page if b covers a.
3. given the convention above, using “arrow-heads” is superfluous: our edges are plain line
segments.

So, let us have A = {1, 2, 3} and < = {(1, 2), (1, 3), (2, 3)}.

18 (a)> = {x : a > x} = {x : x < a} (cf. also 3.4.4).



Its Hasse diagram is a single rising chain with edges 1–2 and 2–3 only (3 does not cover 1, since 1 < 2 < 3). The above has a minimum (1) and a maximum (3) and is clearly a linear order.
A slightly more complex one is this (A, <), where A = {1, 2, 3, 4} and

< = {(1, 2), (4, 2), (2, 3), (1, 3), (4, 3)}

This one has a maximum (3), two minimal elements (1 and 4) but no minimum, and is not
a linear order: 1 and 4 are not comparable. 
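For a finite PO set the covering relation —and hence the edge set of the Hasse diagram— is mechanically computable. A Python sketch (the helper name `covers` is ours), run on the second example above:

```python
def covers(b, a, less_pairs):
    """b covers a iff a < b and there is no c with a < c < b (3.4.24)."""
    universe = {x for pair in less_pairs for x in pair}
    return (a, b) in less_pairs and not any(
        (a, c) in less_pairs and (c, b) in less_pairs for c in universe)

A = {1, 2, 3, 4}
less = {(1, 2), (4, 2), (2, 3), (1, 3), (4, 3)}

# Hasse edges: draw a line from a up to b exactly when b covers a.
edges = sorted((a, b) for a in A for b in A if covers(b, a, less))
```

Note that (1, 3) ∈ < produces no edge: since 1 < 2 < 3, the element 3 does not cover 1, and similarly (4, 3) produces none.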

3.4.25 Lemma Given an order < and a class A.


(1) If m is a minimum in A, then it is also minimal.
(2) If m is a minimum in A, then it is unique.

Proof (1) Let m be minimum in A. Then

m ≤ x, that is, m = x ∨ m < x (i)

for all x ∈ A. Now, prove that there is no x ∈ A such that x < m.

Let us argue by way of contradiction:

Let
      a ∈ A ∧ a < m (ii)

By (i) I also have


      m = a ∨ m < a (iii)
Now, by irreflexivity, (ii) rules out a = m. So, (iii) nets m < a. (ii) and (iii) and transitivity
yield a < a; contradiction (< is irreflexive). Done.
(2) Let m and n both be minima in A. Then m ≤ n (with m posing as minimum) and
n ≤ m (now n is so posing), hence m = n by antisymmetry of “≤” (Lemma 3.4.15). 

3.4.26 Example Let m be <-minimal in A.



Let us attempt to “show” that it is also <-minimum (this is, of course, doomed to fail
due to 3.4.23 and 3.4.25(2) —but the “faulty proof” below is interesting):
By 3.4.21 we have that there is no x in A such that x < m.

Another way to say this is:

For all x ∈ A, I have the negation of “x < m”, that is, I have ¬x < m. (1)

But from “our previous math” (high school? university?) ¬x < m is equivalent to m ≤ x.
Thus (1) says (∀x ∈ A)m ≤ x, in other words, m is the minimum in A.

Do you believe this? (Don’t!) If the order is not total, then the disjunction of x < m, x =
m, m < x may fail to be true, and thus ¬m < x and x < m ∨ x = m are not necessarily
equivalent. See also the counterexample to such expectation in 3.4.13 and also 3.4.23.  

3.4.27 Lemma If < is a linear order on A, then every minimal element is also minimum.

Proof The “false proof” of the previous example is valid under the present circumstances.


The following type of relation has fundamental importance for set theory, and mathemat-
ics in general.

3.4.28 Definition 1. An order < satisfies the minimal condition, in short it has MC, iff
every nonempty A has <-minimal elements.
2. If a total order <: B → B has MC, then it is called a well-ordering19 on (or of) the class
B.
3. If (B, <) is a LO class (or set) with MC, then it is a well-ordered class (or set), or WO
class (or WO set).


19 The term “well-ordering” is ungrammatical, but it is the terminology established in the literature!

 3.4.29 Remark

1. What Definition 3.4.28 says in case 1. is —see (2) in 3.4.22— “if, for some fixed order
< the following statement

∅ ≠ A → (∃a ∈ A)A ∩ (a)> = ∅ (1)

is true in set theory, for any A, then we say that < has MC ”.
The following observation is very important for future reference:
If A is given via a defining property F(x), as
      A =Def {x : F(x)}

then (1) translates —in terms of F(x)— into


  
(∃a)F(a) → (∃a)(F(a) ∧ ¬(∃y)(y < a ∧ F(y))) (2)

Conversely, for each formula F(x) we get a class A = {x : F(x)} and thus if < has MC,
then we may express this fact as in (2) above.
2. Much is to be gained in applications by allowing slightly more generality to the concept
of MC by not requiring the relation that is so equipped to be an order. To this end we
will define the counterpart concepts for <-minimal (of Definition 3.4.21) and will also
generalise Definition 3.4.28 by Definition 3.4.32 below.  

3.4.30 Definition This time we are not postulating that P is on A nor that it is an order.

An element a ∈ A is a P-minimal element in A, or a P-minimal element of A, iff ¬(∃x ∈


A)xPa —in words, there is nothing below a in A if you “walk backwards along P”. 

 3.4.31 Remark

1. The defining condition for P-minimal —¬(∃x ∈ A)xPa— can be simplified.


Noting that xPa iff aP−1 x, we have

¬(∃x ∈ A)xPa iff A ∩ {x : aP−1 x} = ∅ iff A ∩ (a)P−1 = ∅ (cf. 3.4.22)

2. The following observation is useful:

      (a)(P | A) = ∅, if a ∉ A;   (a)(P | A) = A ∩ (a)P, otherwise.

Indeed, in the first case we cannot have an x such that a (P | A) x, since this requires
(a, x) ∈ A², which is untenable under the condition for a. If, on the other hand, a ∈ A, then
we will include in (a)(P | A) all those x that are in both (a)P = {x : aPx} and A (cf.
3.1.4, Notation 2).  

3.4.32 Definition A relation T —that is not necessarily an order— satisfies the minimal
condition, in short it has MC, iff every nonempty A has T-minimal elements in the sense
that a t ∈ A exists such that there is no t′ ∈ A satisfying t′ T t. 

3.4.33 Remark Definition 3.4.32 has a formulation identical to (1) of 3.4.29, although it is
here for the general relation with MC —T— as opposed to an order < with MC:

      A ≠ ∅ → (∃a ∈ A)A ∩ (a)T−1 = ∅ (1′)

Of course, (a)T−1 = {x : aT−1 x} = {x : xTa}.
If we set A = {x : F(x)}, for some formula F, then (1′) becomes the analogue (2′) of
3.4.29 (2) below.

      (∃a)F(a) → (∃a)(F(a) ∧ ¬(∃y)(yTa ∧ F(y))) (2′)

 

Often one works in a class A other than the class of everything, U (A might still be a proper
class). It is then useful to “relativise” a relation P to A and perhaps even require this restriction
—because a relational restriction is what we have in mind— to have MC even if, perhaps, the
unrelativised P does not. Thus we have

3.4.34 Definition (Relations with MC over, or relative to a class) We say that a relation
P has MC over (or on, or in, or relative to) a class A if P | A does. 

 The proof of the following proposition relies on Exercise 3.4.5. 

3.4.35 Proposition (MC over a Class Test) A relation P has MC over a class A iff the
schema
∅ ≠ B ⊆ A → (∃b ∈ B)B ∩ (b)P−1 = ∅ (1)
is true in set theory.

Proof That P has MC over A means that P | A has MC, that is, the schema
 
∅ ≠ B → (∃b ∈ B)B ∩ (b)(P−1 | A) = ∅ (2)

is true in set theory. We will prove that we have (1) iff we have (2). There are two directions
to verify this “iff”.

(I) Assume (1) and prove (2): Towards proving (2) start with the assumption for (2): B ≠ ∅.

 Note that we do not adopt the assumption of (1) that includes B ⊆ A. 

We consider two subcases for B vs. A on which P has MC.

• B ∩ A ≠ ∅. Using (1) we deduce from the truth of

      ∅ ≠ B ∩ A ⊆ A (2′)

the truth of (∃b ∈ B ∩ A)B ∩ A ∩ (b)P−1 = ∅. Since b ∈ B ∩ A implies b ∈ B we
further conclude (∃b ∈ B)B ∩ A ∩ (b)P−1 = ∅ and further obtain —given b ∈ A
and 3.4.31 2.—
      (∃b ∈ B)B ∩ (b)(P−1 | A) = ∅ (2′′)

Having derived (2′′) from B ≠ ∅, we established (2) in this subcase.

• B ∩ A = ∅. The assumption B ≠ ∅ is still the active primary assumption in this
subcase. However the additional subcase assumption means that if b ∈ B then b ∉ A
and by 3.4.31 we have (b)(P−1 | A) = ∅ which again implies (2′′). 

(II) Assume (2) and prove (1): So let ∅ ≠ B ⊆ A. By (2) —and only the part ∅ ≠ B of
the assumption— we obtain (2′′) above. Noting that (b)(P−1 | A) = A ∩ (b)P−1 by
3.4.31 2., and mindful of B ∩ A = B, the right hand side of “→” in (2) —that is, (2′′)—
becomes (∃b ∈ B)B ∩ (b)P−1 = ∅. This proves (1) due to the “let” at the onset of this
(2) → (1) case. 

 3.4.36 Remark In practice, the minimal condition (MC) of an order or, indeed, of an arbi-
trary relation P is usually taken relative to a class A, often a set class.
Thus it is important to reformulate (1) of Proposition 3.4.35, which succinctly states that a
relation P (not necessarily an order) has MC over a class A.  

3.4.37 Corollary Let P be a relation with MC over A. (1) of 3.4.35 is equivalent to the
truth of the schema below —where A and F[x] are arbitrary, but A in any one application
of MC is fixed as the class inside which we do mathematics.
  
(∃b ∈ A)F[b] → (∃b ∈ A)(F[b] ∧ ¬(∃x ∈ A)(F[x] ∧ xPb)) (†)

Proof (I) Assume (1) in 3.4.35 and prove (†). To this end assume the hypothesis

(∃b ∈ A)F[b] (‡)

of (†) and let us define a class B by



      B =Def A ∩ {x : F[x]} (¶)
By (‡) and (¶) we have ∅ ≠ B ⊆ A, thus, by (1) quoted above20 we get (∃b ∈ B)B ∩
(b)P−1 = ∅. This translates (in terms of F[x]) into (∃b ∈ A)(F[b] ∧ ¬(∃x)(x ∈ B ∧
bP−1 x)), which after further translation (replacing “x ∈ B” and “bP−1 x”) becomes

      (∃b ∈ A)(F[b] ∧ ¬(∃x ∈ A)(F[x] ∧ xPb)) (¶¶)

The conclusion part of (†) is proved.


(II) Next assume (†) and prove (1) of 3.4.35. To this end, assume hypothesis of (1) for some
B, namely,
      ∅ ≠ B ⊆ A (††)
Let us express B as a class defined by a property, that is, set
      B =Def {x ∈ A : F[x]}, for some formula F[x]

(††) implies —in terms of F[x]— (∃b ∈ A)F[b], which is the same as the hypothesis of
the implication in (†). Since (†) is assumed, we have its conclusion part (see (¶¶) above)
—i.e., it is true under assumption (††). Let us express it without using the notation that
employs “F[x]”. We observe that in

      (∃b ∈ A)(F[b] ∧ ¬(∃x ∈ A)(F[x] ∧ xPb))

the part “(∃b ∈ A)(F[b] ∧ . . .)” says “(∃b ∈ B) . . .”, the part “(∃x ∈ A)(F[x] ∧ . . .)”
says “(∃x ∈ B) . . .”, and “xPb” says “x ∈ (b)P−1”;
hence the following is true (∃b ∈ B)¬(∃x ∈ B)x ∈ (b)P−1 .

In short, (∃b ∈ B)B ∩ (b)P−1 = ∅. We have just shown the truth of the conclusion part
of the implication (1) in 3.4.35 as we had set out to do. 

3.5 Functions

At last! We consider here a special case of relations that we know as “functions”. Many
of you know already that a function is a relation with some special properties.

Let’s make this official:

20 That is, (∃b)(b ∈ B ∧ . . .) or, equivalently, (∃b)(b ∈ A ∧ F[b] ∧ . . .) or, equivalently, (∃b ∈
A)(F[b] ∧ . . .) the “…” part being, before further translation, “B ∩ (b)P−1 = ∅”.

3.5.1 Definition A function R is a single-valued relation. That is, whenever we have both
x Ry and x Rz, we will also have y = z.
It is traditional to use, generically, lower case letters from among f , g, h, k to denote
functions but this is by no means a requirement. 

 Another way of putting it, using the notation from 3.4.2, is: A relation R is a function iff
for each a, (a)R is either empty or contains exactly one element. 

3.5.2 Example The empty set is a relation of course, the empty set of pairs. It is also a
function since
(x, y) ∈ ∅ ∧ (x, z) ∈ ∅ → y = z
vacuously, by virtue of the left hand side of → being false. 
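If one represents a relation as a set of pairs, single-valuedness is a short mechanical check. A Python sketch (the name `is_function` is ours):

```python
def is_function(R):
    """R (a set of pairs) is a function iff it is single-valued:
    x R y and x R z imply y = z (Definition 3.5.1)."""
    seen = {}
    for x, y in R:
        if x in seen and seen[x] != y:
            return False        # x relates to two different outputs
        seen[x] = y
    return True

# The empty relation is a function, vacuously (Example 3.5.2).
assert is_function(set())
assert is_function({(1, 2), (2, 7)})
assert not is_function({(1, 2), (1, 3)})
```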

We now turn to notation and concepts specific to functions.

3.5.3 Definition (Function-specific notations, terminology) Let f be a function. First off,


the concepts of domain, range, and —in case of a function f : A → B— total and onto are
inherited from those for relations without change. Even the notations “a Rb” and “(a, b) ∈
R” transfer over to functions. Moreover, the notation “(a) f ↓” (correspondingly “(a) f ↑”)
meaning a ∈ dom( f ) (correspondingly a ∉ dom( f )) and terminology “ f is defined at a”
(correspondingly “ f is undefined at a”) are extended to functions.
In particular, the “relational” notation (a) f for {y : a f y} can always be used.

We noted in 3.5.1 that for a function f and input a we have

      (a) f = {y : a f y} = {b}, for some b,   or
      (a) f = ∅, if {y : a f y} is empty

But we also have an annoying difference in notation that is used extremely widely:
Mathematicians normally prefer to write f (a) = b instead of (a) f = {b} and f (a) ↑
(undefined at a) if (a) f = ∅.

The qualifier “normally” indicates frequency, but also allows some authors to differ:
Notably, Kurosh (1963) writes “a f ” for relations and functions, even omitting the brackets
around the input a.

We will follow the “normally preferred” notation for functions — f (a)— in this work and
will give reasons for this “preference” —notation “ f (a)” over “(a) f ”— when we consider
the composition of functions below.

 Worth recording: If b is such that a f b or (a, b) ∈ f and f is a function, then seeing that
b is unique we have (a) f = {b}.
The relationship between “functional notation” vs. “relational notation” is summarised
below:
      f (a) = b (functional notation) iff (a) f = {b} (relational notation)
and
      f (a) ↑ (functional notation) iff (a) f = ∅ (relational notation)
 

3.5.4 Definition (Images and Inverse Images) The set of all outputs of a function, when
the inputs come from a particular set X , is called the image of X under f and is denoted
by f [X ]. Thus,
De f
f [X ] = { f (x) : x ∈ X } (1)
 Note that careless notation (in many discrete mathematics texts) like f (X ) will not do. This
means the input is X . If I want the inputs to be from inside X , then we must not use the
round brackets notation. 
The inverse image of a set Y under a function is useful as well, that is, the set of all inputs
that generate f -outputs exclusively in Y . It is denoted by f −1 [Y ] and is defined as
De f
f −1 [Y ] = {x : f (x) ∈ Y } (2)

Pause. So far we have been giving definitions regarding functions of one variable. Or have
we?
Not really: We have already said that the multiple-input case is subsumed by our notation.
If f : A → B and A is a set of n-tuples, then f is a function of “n-variables”, essentially. The
binary relation that is the alias of f contains pairs like ((x⃗n), xn+1). However, we usually
abuse notation and write (x⃗n) f instead of ((x⃗n)) f and f (x⃗n) instead of f ((x⃗n)).  

 3.5.5 Remark Regarding, say, the definition of f [X ]:


What if f (a) ↑? How do you “collect” an “undefined value” into a set?
Well, you don’t. Both (1) and (2) have a rendering that is independent of the notation “ f (a)”.
Let us never forget that a function is no mystery; it is a relation and we have access to
relational notation. Thus,
      f [X ] = {y : (∃x ∈ X )x f y} (1′)
      f −1 [Y ] = {x : (∃y ∈ Y )x f y} (2′)

 

3.5.6 Example Thus, f [{a}] = { f (x) : x ∈ {a}} = { f (x) : x = a} = { f (a)}.

Let now g = {(1, 2), ({1, 2}, 2), (2, 7)}, clearly a function. Thus, g({1, 2}) = 2, but
g[{1, 2}] = {2, 7}. Also, g(5) ↑ and thus g[{5}] = ∅.

On the other hand, g −1 [{2, 7}] = {1, {1, 2}, 2} and g −1 [{2}] = {1, {1, 2}}, while
g −1 [{8}] = ∅ since no input causes output 8. 
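The relational renderings of f [X ] and f −1 [Y ] in 3.5.5 translate directly into code. A Python sketch reproducing Example 3.5.6 —`image` and `preimage` are our names, and `frozenset` stands in for the set {1, 2} used as an input:

```python
def image(f, X):
    """f[X] = {y : (exists x in X) x f y}, with f a set of pairs."""
    return {y for (x, y) in f if x in X}

def preimage(f, Y):
    """f^{-1}[Y] = {x : (exists y in Y) x f y}."""
    return {x for (x, y) in f if y in Y}

g = {(1, 2), (frozenset({1, 2}), 2), (2, 7)}   # the g of Example 3.5.6

assert image(g, {1, 2}) == {2, 7}              # g[{1, 2}]
assert image(g, {5}) == set()                  # g(5) undefined, so g[{5}] is empty
assert preimage(g, {2}) == {1, frozenset({1, 2})}
assert preimage(g, {8}) == set()               # no input causes output 8
```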

 3.5.7 Example We saw that (3.5.3) F(a) = ∅ means (a)F = {∅}, that is, (a, ∅) ∈ F or aF∅
—not what one might hastily conclude it means.

We have F(a) ↓ here, with output the object “∅”, that is, it is not the case that F(a) ↑.
 

The following is quite useful in set theory and even has a nickname, “(the Principle of)
Replacement”.21

3.5.8 Theorem If F is a function (possibly a proper class of pairs), and A is a set, then
F[A] is a set.

Proof Let
      ∅ ≠ Y = F[A] (†)
Thus, for every y, y ∈ Y iff for some x ∈ A, F(x) = y.

In short, each y ∈ Y is labelled —in the sense of 3.3.6— by all the x ∈ A with the property
F(x) = y.

Note that the described label-set is valid according to 3.3.6 since

• No member of Y is without an A-label (by (†)). These labels are in A and in F−1 [{y}],
thus
      A ∩ F−1 [{y}] is nonempty (1)
• The set A ∩ F−1 [{y}] has no repeated members (being a set) thus the labels assigned to
y are distinct, and more importantly
• If y ≠ y′ are both in Y, then they receive non-overlapping labels because F−1 [{y}] ∩
F−1 [{y′}] = ∅.

21 Said “Principle” is given as the Axiom of Replacement in axiomatic set theory.



Why? Because if z ∈ F−1 [{y}] ∩ F−1 [{y′}], then F(z) = y and F(z) = y′; impossible
for a function.

By Principle 3, Y —being labelled by the members of A— is a set too. 

3.5.9 Corollary If the domain of a function F is a set, then so is F.

Proof Exercise! 

3.5.1 Lambda Notation

Some texts in discrete mathematics and also in calculus will say “let f (x) be a function …”
Well, “ f (x)” is not a function. Correctly it is known variously as a function call, or
function application or function invocation. “ f ” is the function here; a set or “table” —
possibly infinite— of input/output pairs. Thus f is a set and f (x) is an output value (when
the input is x).
Computer programmers are very much aware of the distinction between a function call
f (x) and a function definition for f , the latter being defined intensionally (by behaviour)
rather than extensionally (by explicit listing as a set or table of input/output pairs).
This intensional definition of the input/output behaviour of a function f is done by a
program. Luckily, unlike tables, all programs, being finite in length, can fit into a computer!
In mathematics we often want to say “let f be a function of input variables x and z …”
but we are not excused to say it incorrectly as “let f (x, z) be a function”; it is not!
We can say instead “let (or consider) λx z. f (x, z)”. This names both the function f and
its input variables x, z. This is known as Church’s λ-notation and is a by-product of his
foundation of computability via “λ calculus”.

3.5.10 Definition (λ-notation) The expression

      λx1 x2 . . . xn . rule (1)

—“λ” begins the input list x1 x2 . . . xn , the dot ends it, and “rule” describes the outputs—
denotes a function with input variables x1 , x2 , . . . , xn and output computed according to the
“rule” following the end-of-input dot. We can use “vector notation” for the input list and
write x⃗n or just x⃗, if n is understood or is unimportant, for x1 , x2 , . . . , xn . Then (1) morphs
into
      λx⃗n .rule (1′)

Examples:

1. λx.x + 1 but also λy.y + 1 and λu.u + 1. The successor function over the natural numbers.
The variables “x, y” and “u” are not able to accept substitutions —unlike the “x”
in “x + 1” or “x² − 30x + 5”.
That is, they are “bound” or “dummy” variables just as x is in the expression Σ_{x=1}^{100} x².
2. λxw.w². This function inputs x and w, then ignores x and returns w² as output.
3. We can give a short (letter) name to a function as always. Thus, we can say
“Let f = λxw.w²”. Then f (x, w) = w² and f (5, 2) = 2² = 4. 
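Church's λ-notation survives almost verbatim in many programming languages; Python's `lambda` mirrors the three examples above:

```python
succ = lambda x: x + 1           # corresponds to  "lambda x . x + 1"
also_succ = lambda u: u + 1      # the bound variable's name is immaterial ...
f = lambda x, w: w ** 2          # "lambda x w . w^2": ignores x, returns w^2

assert succ(4) == 5
assert succ(4) == also_succ(4)   # ... so these denote the same function
assert f(5, 2) == 4              # f(5, 2) = 2^2 = 4
```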

3.5.2 Kleene Extended Equality for Function Calls

When f (a) ↓, then f (a) = f (a) as is naturally expected. What happens when f (a) ↑?
This begs a more general question that we settle as follows:

3.5.11 Definition (Kleene Equality) Kleene (Kleene (1943)) extended equality to include
the case when the two sides of “=” are calls f (a⃗) and g(b⃗) that are both undefined. For such
cases validate the equality. In symbols

      f (a⃗) = g(b⃗) ≡ ( f (a⃗) ↑ ∧ g(b⃗) ↑) ∨ (∃z)( f (a⃗) = z ∧ g(b⃗) = z)

—the first disjunct is Case 1 (both calls undefined), the second is Case 2 (both calls defined,
with the same value).
There is no universal agreement in the literature as to whether or not to use a new symbol
for the extended equality. We will not do so, but those (publications, not individuals)
who do, use “≃” as in f (a⃗) ≃ g(b⃗). 

3.5.12 Example Let g = {(1, 2), ({1, 2}, 2), (2, 7)}. Then, g(1) = g({1, 2}) and g(1) ≠
g(2). Moreover, g(3) = g(4) (both undefined). 
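Nontotal functions can be modelled by Python dictionaries, a missing key playing the role of “↑”; Kleene equality is then the two-case test of 3.5.11. A sketch (the names `undefined_at` and `kleene_eq` are ours) on the g of Example 3.5.12:

```python
g = {1: 2, frozenset({1, 2}): 2, 2: 7}      # the g of Example 3.5.12

def undefined_at(f, a):
    """Models f(a) "up-arrow": a is not in dom(f)."""
    return a not in f

def kleene_eq(f, a, h, b):
    """f(a) = h(b) in the Kleene sense: both calls undefined, or both
    defined with the same value (Definition 3.5.11)."""
    if undefined_at(f, a) or undefined_at(h, b):
        return undefined_at(f, a) and undefined_at(h, b)
    return f[a] == h[b]

assert kleene_eq(g, 1, g, frozenset({1, 2}))   # g(1) = g({1, 2}) = 2
assert not kleene_eq(g, 1, g, 2)               # g(1) = 2 but g(2) = 7
assert kleene_eq(g, 3, g, 4)                   # both undefined
```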

3.5.13 Definition A function f is 1-1 if for all x, y and z where f (x) = f (y) = z we
obtain x = y. We can also say f is 1-1 iff x f z and y f z imply x = y. 

 The presence of z (the definedness at x and y) in 3.5.13 ensures that we will not expect
anything unreasonable, like 3 = 4, in the context of a 1-1 function f where f (3) ↑ and f (4) ↑. 

3.5.14 Example {(1, 1)} and {(1, 1), (2, 7)} are 1-1. {(1, 0), (2, 0)} is not. ∅ is 1-1 vacu-
ously. 

3.5.15 Exercise Prove that if f is a 1-1 function, then the relation converse f −1 is a function
(that is, single-valued). 

3.5.16 Definition (1-1 Correspondence) A function f : A → B is called a 1-1 correspondence
iff it is all three: 1-1, total and onto.
Often we say that A and B are in 1-1 correspondence writing A ∼ B, often omitting
mention of the function that is the 1-1 correspondence. 

The terminology is derived from the fact that every element of A is paired with precisely
one element of B and vice versa.

3.5.17 Exercise Show that ∼ is a symmetric and transitive relation on sets. 

3.5.3 Function Composition

 3.5.18 Remark Composition of functions is inherited from the composition of relations.


Thus, f ◦ g for two functions naturally means

x f ◦ g y iff, for some z, x f z g y (1)

In particular,

f ◦ g is also a function.

Indeed, if we have
      x f ◦ g y and x f ◦ g y′
then
      for some z, x f z g y (2)
and
      for some w, x f w g y′ (3)
As f is a function, (2) and (3) give z = w. In turn, this (g is a function too) gives y = y′.  

The notation (as in 3.4.2) “(a) f ” for relations is awkward when applied to functions in
the presence of composition. In something like

x→ f →z→ g →y

that represents (1) above, note that f acts first. Its result or output z = f (x) is then input
to g —that is, we perform

      g(z) = g( f (x))

to obtain output y. Thus the first acting function f is called first with argument x and after
that g is called with argument f (x). “Everyday math” notation places the two calls as in
the displayed formula above: The first call to the right of the 2nd call —order reversal
vis-à-vis relational notation!
So, set theory heeds these observations and defines:

3.5.19 Definition (Composition of Functions; Notation) We just learnt (3.5.18) that the
composition of two functions produces a function. The present definition is about notation
only.
Let f : A → B and g : B → C be two functions. The relation f ◦ g : A → C, their
relational composition is given in 3.1.16.
For composition of functions, we have the alternative —so-called functional notation
for composition: If f , g are functions then we may use “g f ” to stand for “ f ◦ g”. Note the
order reversal and the absence of “ ◦”, the composition symbol.
In particular we write (g f )(a) for (a)( f ◦ g) —cf. 3.5.3.
 Above we said “alternative”, not exclusive. For functions we have two possible notations
for composition: relational and functional. 
Thus
      a(g f )y ⇐⇒Def a f ◦ g y ⇐⇒ (∃z)(a f z ∧ z g y)22
also
      a(g f )y ⇐⇒Def a f ◦ g y ⇐⇒Def 3.4.2 (a)( f ◦ g) = {y}
 
 In particular, we have that (a)( f ◦ g) of 3.4.2 is the same as (g f )(a) = g( f (a)) as seen
through the “computation”

      (a)( f ◦ g) = {y} ⇐⇒ (by 3.5.18) for some z, a f z ∧ z g y
                  ⇐⇒ (by 3.5.3) for some z, f (a) = z ∧ g(z) = y
                  ⇐⇒ (substituting f (a) for z) g( f (a)) = y (1)

Conclusion:
      (g f )(a) = (a)( f ◦ g) = g( f (a)) —the first equality by 3.5.19 via 3.5.3, the second by (1)

Thus the “reversal” g f = f ◦ g now makes sense! So does (g f )(a) = g( f (a)). 
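The order reversal between the relational f ◦ g and the functional g f can be watched on concrete (small, made-up) sets of pairs; the helper name `rel_compose` is ours:

```python
def rel_compose(f, g):
    """Relational composition f o g: x (f o g) y iff x f z and z g y for some z."""
    return {(x, y) for (x, z) in f for (z2, y) in g if z == z2}

f = {(1, 10), (2, 20)}
g = {(10, 'a'), (20, 'b')}

# Relationally, f acts first ...
fog = rel_compose(f, g)
assert fog == {(1, 'a'), (2, 'b')}

# ... while functionally we write (g f)(x) = g(f(x)): same answer, reversed notation.
f_dict, g_dict = dict(f), dict(g)
assert g_dict[f_dict[1]] == 'a'
```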

3.5.20 Theorem Functional composition is associative, that is, (g f )h = g( f h).

Proof Exercise!

22 “⇐⇒” is an equivalence that is different from “≡”. When I write A ⇐⇒ B ⇐⇒ C ⇐⇒ D it


means (A ≡ B) ∧ (B ≡ C) ∧ (C ≡ D). Hmm. So what if I wrote the immediately previous “lazily”
as A ≡ B ≡ C ≡ D? Well, then it means something else: A ≡ (B ≡ (C ≡ D)), which is evaluated
as indicated from right to left. We say that “≡” is associative while ⇐⇒ is conjunctional. More in
our chapter on Logic!
our chapter on Logic!

Hint. Note that by 3.5.19, (g f )h = h ◦ ( f ◦ g). Take it from here. 

3.5.21 Example The identity relation on a set A is a function since (a)1 A is the singleton
{a}. 

The following interesting result connects the notions of ontoness and 1-1ness with the
“algebra” of composition.

3.5.22 Theorem Let f : A → B and g : B → A be functions. If

g f = 1A (1)

then g is onto while f is total and 1-1.

 We say that g is a left inverse of f and f is a right inverse of g. “A” because these are not
in general unique! Stay tuned on this! 

Proof About g: Our goal, ontoness, means that, for each x ∈ A, I can “solve the equation
g(y) = x for y”. Indeed I can: By definition of 1 A ,

      g( f (x)) = (g f )(x) = 1 A (x) = x —the first equality by 3.5.19, the second by (1)

So to solve, take y = f (x).


About f : As seen above, x = g( f (x)), for each x ∈ A. Since this is the same as “x f ◦
g x is true”, there must be a z such that x f z and z g x. The first of these implies f (x) ↓.
This, along with “for all x ∈ A”, settles totalness.
For the 1-1ness, let f (a) = f (b). Applying g to both sides we get g( f (a)) = g( f (b)).
But this says a = b, by g f = 1 A , and we are done.
Pause. Should we not “translate” 1-1ness above (per 3.5.13) starting with “let f (a) =
f (b) = c, for some c, then etc.”?  

 3.5.23 Example The above is as much as can be proved. For example, say A = {1, 2} and
B = {3, 4, 5, 6}. Let f : A → B be {(1, 4), (2, 3)} and g : B → A be {(4, 1), (3, 2), (6, 1)},
or in friendlier notation

f (1)= 4
f (2)= 3
and
g(3) = 2
g(4) = 1
g(5) ↑

g(6) = 1

Clearly, g f = 1 A holds, but note:


(1) f is not onto.
(2) g is neither 1-1 nor total.  
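Example 3.5.23 can be checked directly; dictionaries model the finite functions, a missing key again meaning “undefined”:

```python
A = {1, 2}
B = {3, 4, 5, 6}
f = {1: 4, 2: 3}                  # f : A -> B
g = {4: 1, 3: 2, 6: 1}            # g : B -> A, with g(5) undefined

# g f = 1_A: g(f(x)) = x for every x in A ...
assert all(g[f[x]] == x for x in A)

# ... yet f is not onto, and g is neither total nor 1-1 (Theorem 3.5.22
# promises no more than: g onto, f total and 1-1).
assert set(f.values()) != B       # f misses 5 and 6
assert set(g.keys()) != B         # g not total: g(5) undefined
assert g[4] == g[6]               # g not 1-1
```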

 3.5.24 Example With A = {1, 2}, B = {3, 4, 5, 6} and f : A → B and g : B → A as in


the previous example, consider also the functions f˜ and g̃ given by

f˜(1)= 6
f˜(2)= 3
and
g̃(3) = 2
g̃(4) = 1
g̃(5) = 1
g̃(6) = 1

Clearly, g̃ f = 1 A and g f˜ = 1 A hold, but note:

(1) f ≠ f˜.
(2) g ≠ g̃.
Thus, neither left nor right inverses need be unique. The article “a” in the definition of
said inverses was well-chosen.  

The following two partial converses of 3.5.22 are useful.

3.5.25 Theorem Let f : A → B be total and 1-1. Then there is an onto g : B → A such
that g f = 1 A .

Proof Consider the converse (also called “inverse”) relation (3.4.1) of f —that is, the
relation f −1 — and call it g:
Def
x g y iff y f x (1)
By Exercise 3.5.15, g : B → A is a (possibly nontotal) function so we can write (1) as
g(x) = y iff f (y) = x, from which, substituting f (y) for x in g(x) we get g( f (y)) = y,
for all y ∈ A, that is g f = 1 A , hence g is onto by 3.5.22. We got both statements that we
needed to prove. 

Pause. Why “for all y ∈ A”? 



3.5.26 Remark By (1) above, dom(g) = {x : (∃y)g(x) = y} = {x : (∃y) f (y) = x} =
ran( f ).  

3.5.27 Theorem Let f : A → B be onto. Then there is a total and 1-1 g : B → A such
that f g = 1 B .

Proof By assumption, ∅ ≠ f −1 [{b}] ⊆ A, for all b ∈ B. To define g(b) choose one c ∈


f −1 [{b}] and set g(b) = c. Since f (c) = b, we get f (g(b)) = b for all b ∈ B, and hence g
is 1-1 and total by 3.5.22. 
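For a finite onto f the construction in the proof of 3.5.27 can be carried out explicitly: choose one element of each (nonempty) preimage —here, simply the smallest, which sidesteps the choice issue discussed next. The data f below is a made-up example:

```python
f = {1: 'a', 2: 'a', 3: 'b'}     # onto B = {'a', 'b'}, not 1-1
B = set(f.values())

# g(b): choose one c from f^{-1}[{b}] — here the smallest such c.
g = {b: min(c for c in f if f[c] == b) for b in B}

# f g = 1_B, so g is total and 1-1 (Theorem 3.5.27).
assert all(f[g[b]] == b for b in B)
assert len(set(g.values())) == len(g)        # g is 1-1
```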

3.5.28 Remark (The Axiom of Choice) Strictly speaking, the argument in 3.5.27 is flawed.

“Choose one c ∈ f −1 [{b}], for each b”? How?
Well, “for each b, there is at least one c ∈ f −1 [{b}].23 For each b, write one such down!”
Hmm! If that process were finite I’d be willing to go along, and say, “oh well! A proof is
a finite sequence of statements. Like the one above, in quotes. So I accept it”.
But then I thought: “wait a minute!” If B is infinite, intuitively speaking, then this “proof”
never ends! But there is no such thing as a never-ending proof!
Is there a way out of this difficulty? Answer: Only if we could somehow describe these
infinitely many choices in a finite proof !
For example, if all the sets f −1 [{b}] are subsets of N, and since they are nonempty, we
could say “in each f −1 [{b}] pick the smallest natural number therein!” This simple phrase
well-defines exactly what to do and how to effect each choice, and describes so in a finite
way, avoiding an “infinite proof”.
Russell once illustrated this problem by contrasting choosing one sock from each pair
of an infinite set of pairs of socks, on one hand, while, on the other hand, choosing one shoe
from each pair of a similarly infinite set of pairs of shoes.
For shoes the method is simply described: “pick the left shoe in each pair”. This totally
defines in a finite manner how each of the infinitely many choices can be effected; precisely.
In the absence of a method (back then24 ) of identifying a left from a right sock there is
no obvious finite method to avoid the obvious infinite “proof/construction”: “Pick a sock
pair; pick a sock from it. Pick another sock pair; pick a sock from it. Etc., forever!” Set
theorists (reluctantly at first! for a discussion see Wilder (1963)) adopted an axiom —called
the Axiom of Choice— that postulates

If S is an infinite family of nonempty sets, then there exists a “choice function” C that
“chooses” one element from each set A ∈ S, in the precise technical sense: C(A) ∈ A,
for all A ∈ S.

This axiom is applicable to our { f −1 [{b}] : b ∈ B} since all f −1 [{b}] are nonempty
by ontoness of f . Thus the “construction” of g(b) in the proof above —“To define g(b)
choose one c ∈ f −1 [{b}] and set g(b) = c”— is legitimised by the Axiom of Choice by

23 Since f is onto.
24 The illustration with the socks would not be valid with certain branded socks nowadays that have
the brand insignia so positioned as to identify left and right socks in a pair.

saying, “let C be a choice function for the family { f −1 [{b}] : b ∈ B}. For each b, define
g(b) =Def C( f −1 [{b}])”.

3.6 Finite and Infinite Sets

Broadly speaking (that is, with very little detail contained in what I will say next) we have
sets that are finite —intuitively meaning that we can count all their elements in a finite
amount of time (but see the -remark 3.6.3 below)— and those that are not, naturally
called infinite!
What is a mathematical way to say all this?
Any counting process of the elements of a finite set A will have us say out loud —every
time we pick or point at an element of A— “0th”, “1st”, “2nd”, etc., and, once we reach and
pick the last element of the set, we finally pronounce “nth”, for some appropriate n that we
reached in our counting (again, see 3.6.3).
Thus, mathematically, we are pairing each member of the set with a member from
{0, . . . , n}.
So we propose,

3.6.1 Definition (Finite and infinite sets) A set A is finite iff it is either empty, or is in 1-1
correspondence with {x ∈ N : x ≤ n} for some n ∈ N. This “normalised” small set of natural numbers we
usually denote by {0, 1, 2, . . . , n}.
If a set is not finite, then it is, by definition, infinite. 

3.6.2 Example For any n, {0, . . . , n} is finite since, trivially, {0, . . . , n} ∼ {0, . . . , n} using
the identity function on the set {0, . . . , n}.
3.6.3 Remark One must be careful when one attempts to explain finiteness via counting

by a human.
For example, Achilles25 could count infinitely many objects by constantly accelerating
his counting process as follows:
He procrastinated for a full second, and then counted the first element. Then, he counted
the second object exactly after 1/2 a second from the first. Then he got to the third element
1/2^2 seconds after the previous, …, he counted the n th item at exactly 1/2^(n−1) seconds after
the previous, and so on forever.

Hmm! It was not “forever”, was it? After a total of 2 seconds he was done!

25 OK, he was a demigod; but only “demi”.



You see (as you can easily verify from your calculus knowledge (limits)),26
1 + 1/2 + 1/2^2 + . . . + 1/2^(n−1) + . . . = 1/(1 − 1/2) = 2

So “time” is not a good determinant of finiteness!  
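The 2-second total can be confirmed with exact rational arithmetic; a quick sketch of the partial sums:

```python
from fractions import Fraction

def elapsed(n):
    # exact partial sum 1 + 1/2 + 1/2^2 + ... + 1/2^(n-1)
    return sum(Fraction(1, 2**k) for k in range(n))

assert elapsed(1) == 1
assert all(elapsed(n) < 2 for n in range(1, 200))   # never actually reaches 2 ...
assert elapsed(60) == 2 - Fraction(1, 2**59)        # ... since it equals 2 - 1/2^(n-1)
```

Using Fraction rather than floating point keeps the sums exact, so the strict inequality elapsed(n) < 2 is genuinely verified rather than an artifact of rounding.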


3.6.4 Theorem If X ⫋ {0, . . . , n}, then there is no onto function f : X → {0, . . . , n}.

 I am saying, no such f —whether total or not— exists; totalness is immaterial. 


Proof First off, the claim holds if X = ∅, since then any such f equals ∅ and its range is
empty.
Let us otherwise proceed by way of contradiction, and assume that the theorem is wrong:
That is, assume that it is possible to have such onto functions, for some n and well chosen
X.
Since I assume there are such n > 0 values, suppose then that the smallest n that allows
this to happen is, say, n 0 , and let X 0 be a corresponding set “X ” that works, that is,

Assume that we have an onto f : X 0 → {0, . . . , n 0 } (1)

Thus X 0 ≠ ∅, by the preceding remark, and therefore n 0 > 0, since otherwise X 0 = ∅.


Pause. Why “otherwise X 0 = ∅”? 
Let H be the set of all x such that f (x) = n 0 ; in short, H = f −1 [{n 0 }]. We have
∅ ≠ H ⊆ X 0 ; the ≠ by ontoness.

Case 1. n 0 ∈ H . Then removing all pairs (a, n 0 ) from f —all these have a ∈ H — we
get a new function f ′ : X 0 − H → {0, 1, . . . , n 0 − 1}, which is still onto as we
only removed inputs that cause output n 0 . Moreover, X 0 − H ⫋ {0, 1, . . . , n 0 − 1}.
(Why?)
This contradicts minimality of n 0 since n 0 − 1 works too!

n
26 1 + 1 + 1 + . . . + 1 = 1−1/2 . Now let n go to infinity at the limit.
2 22 2n−1 1−1/2

Case 2. n 0 ∉ H .
If n 0 ∉ X 0 , then we argue exactly as in Case 1: we just remove H from X 0 .
Otherwise, we have two subcases:

• f (n 0 ) ↑. Then we (almost) act as in Case 1: The new “X 0 ” is (X 0 − H ) − {n 0 }, since
if we leave n 0 in, then the new “X 0 ” will not be a subset of {0, 1, . . . , n 0 − 1}. We get
a contradiction per Case 1.
• Otherwise, f (n 0 ) = m, for some m ≠ n 0 . In that case pick an a ∈ H (so f (a) = n 0 ) and
“correct” f to have f (a) = m and f (n 0 ) = n 0 , that is, define a new “ f ” that we will call f ′ by

f ′ = ( f − {(n 0 , m), (a, n 0 )}) ∪ {(n 0 , n 0 ), (a, m)}

We get a contradiction per Case 1. 

3.6.5 Corollary (Pigeon-Hole Principle) If m < n, then {0, . . . , m} ≁ {0, . . . , n}.

Proof If the conclusion fails then we have an onto f : {0, . . . , m} → {0, . . . , n}, contra-
dicting 3.6.4. 

 Important!

3.6.6 Theorem If A is finite due to A ∼ {0, 1, 2, . . . , n}, then there can be no justification of
finiteness via another canonical set {0, 1, 2, . . . , m} with n ≠ m.

Proof If {0, 1, 2, . . . , n} ∼ A ∼ {0, 1, 2, . . . , m}, then {0, 1, 2, . . . , n} ∼ {0, 1, 2, . . . , m} by
3.5.17, hence n = m, since otherwise we contradict 3.6.5.

3.6.7 Definition Let A ∼ {0, . . . , n}. Since n is uniquely determined by A we say that A
has n + 1 elements and write |A| = n + 1. 


3.6.8 Corollary For any choice of n, there is no onto function from {0, . . . , n} to N.

Proof Fix an n. By way of contradiction, let g : {0, . . . , n} → N be onto. Let


De f
Y = {x ≤ n : g(x) > n + 1}

Now let
De f
X = {0, . . . , n} − Y
and
De f
g ′ = g − Y × N

 The “g − Y × N” above is an easy way to say “remove all pairs from g that have their first
component in Y ”. 
Thus, g ′ : X → {0, . . . , n, n + 1} is onto (Why?), contradicting 3.6.4 because X ⊆
{0, . . . , n} ⫋ {0, . . . , n, n + 1}.

3.6.9 Corollary N is infinite.

Proof By 3.6.1 the opposite case requires that there is an n and a function f :
{0, 1, 2, . . . , n} → N that is a 1-1 correspondence. Impossible, since any such f will
fail to be onto. 

 Our mathematical definitions have led to what we hoped they would: That N is infinite as
we intuitively understand, notwithstanding Achilles’s accelerated counting! 
N is a “canonical” infinite set that we can use to index the members of many infinite sets.
Sets that can be indexed using natural number indices

a0 , a1 , . . .
are called countable.
In the interest of technical flexibility, we do not insist that all members of N be used as
indices. We might enumerate with gaps:

b5 , b9 , b13 , b42 , . . .

Thus, informally, a set A is countable if it is empty or (in the opposite case) if there is a way
to index, hence enumerate, all its members in an array, utilising indices from N. Cf. 3.3.6.
It is allowed to repeatedly list any element of A, so that finite sets are countable. For
example, the set {42}:
42, 42, 42, . . . (42 forever)
We may think that the enumeration above is done by assigning to “42” all of the members
of N as indices, in other words, the enumeration is effected, for example, by the constant
function f : N → {42} given by f (n) = 42 for all n ∈ N. This is consistent with our earlier
definition of indexing (3.3.6).
Now, mathematically,

3.6.10 Definition (Countable Sets) We call a set A countable if A = ∅, or there is an onto


function f : N → A. We do not require f to be total. This means that some or many indices
from N need not be used in the enumeration.
If f (n) ↓, then we say that f (n) is the nth element of A in the enumeration f . We often
write f n instead of f (n) and then call n a “subscript” or “index”. 

 Thus a nonempty set is countable iff it is the range of some function that has N as its left
field.
Incidentally, since we allow f to be non total, the hedging “nonempty” (in 3.6.10 above
and in this remark) is unnecessary: ∅ is the range of the empty function that has N as its left
field. 
We said that the f that proves countability of a set A need not be total. But such an f
can always be “completed”, by adding pairs to it, to get an f  such that f  : N → A is onto
and total. Here is how:

3.6.11 Proposition Let f : N → A ≠ ∅27 be onto. Then we can extend f to f ′ so that
f ′ : N → A is onto and total.

27 Since we are constructing a total onto function to A we need to exclude the case A = ∅, as we
cannot put any outputs into ∅.

Proof Pick an a ∈ A —possible since A ≠ ∅— and keep it fixed. Now, our sought f ′ is
given for all n ∈ N by cases as below:

f ′ (n) = f (n), if f (n) ↓
f ′ (n) = a, if f (n) ↑
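The completion f ′ is easy to realise concretely when the nontotal f is given, say, as a finite table; a Python sketch (all names illustrative):

```python
def totalise(f, a):
    """Extend a nontotal enumeration f : N -> A (a dict, possibly with gaps)
    to a total one, sending every undefined input to the fixed element a.
    The range is unchanged when a already lies in ran(f)."""
    def f_prime(n):
        return f[n] if n in f else a   # f(n) if f(n) is defined; a otherwise
    return f_prime

f = {0: 'p', 2: 'q', 5: 'p'}           # f(1), f(3), f(4) undefined
f_prime = totalise(f, 'p')
assert [f_prime(n) for n in range(6)] == ['p', 'p', 'q', 'p', 'p', 'p']
```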


Some set theorists also define sets that can be enumerated using all the elements of N as
indices without repetitions.

3.6.12 Definition (Enumerable or denumerable sets) A set A is enumerable or denumer-


able28 iff A ∼ N. 

 3.6.13 Example Every enumerable set is countable, but the converse fails. For example, {1}
is countable but not enumerable due to 3.6.8. {2n : n ∈ N} is enumerable, with f (n) = 2n
effecting the 1-1 correspondence f : N → {2n : n ∈ N}.  

3.6.14 Theorem If A is an infinite subset of N, then A ∼ N.

Proof We will build a 1-1 and total enumeration of A, presented in a finite manner as a
(pseudo) program below, which enumerates all the members of A in strict ascending order
and arranges them in an array
a(0), a(1), a(2), . . . (1)
n ← 0
while A ≠ ∅
    a(n) ← min A    Comment. Inside the loop ∅ ≠ A ⊆ N, hence min A exists.
    A ← A − {a(n)}
    n ← n + 1
end while
 Note that the sequence {a(0), a(1), . . . , a(m)} is strictly increasing for any m, since a(0) is
smallest in A, a(1) is smallest in A − {a(0)} and hence the next higher than a(0) in A, etc. 

Will this loop ever exit? Suppose that it exits when it starts (but does not complete) the
k-th pass through the loop. Thus A became empty when we did A ← A − {a(k − 1)} in the
previous pass, that is A = {a(0), a(1), . . . , a(k − 1)} and thus, since

a(0) < a(1) < . . . < a(k − 1)

we have that the function f : {0, . . . , k − 1} → A given by

28 We will not use this term in this work.



f = {(0, a(0)), (1, a(1)), . . . (k − 1, a(k − 1))}

is total, 1-1 and onto, hence A ∼ {0, . . . , k − 1} contradicting that A is infinite!

Thus, we never exit the loop!


Hence, by the remark in the paragraph above, (1) enumerates A in strict ascending
order, that is,
if we define f : N → A by f (n) = a(n), for all n
then f is 1-1 (by strict increasing property: distinct inputs cause distinct outputs), and is
trivially total, and onto. Why the latter? Every a ∈ A is reached in ascending order, and
assigned an “n” from N. 
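The pseudo-program of this proof becomes a generator when the infinite set A ⊆ N is presented by a membership test; a Python sketch (the particular predicate used is only an example):

```python
from itertools import count, islice

def enumerate_ascending(in_A):
    """Enumerate {n in N : in_A(n)} in strict ascending order, mirroring the
    loop  a(n) <- min A;  A <- A - {a(n)}  from the proof of 3.6.14."""
    for n in count():            # scan N upward; each member is met exactly once
        if in_A(n):
            yield n

# A = the set of perfect squares, an infinite subset of N
squares = enumerate_ascending(lambda n: int(n ** 0.5) ** 2 == n)
assert list(islice(squares, 6)) == [0, 1, 4, 9, 16, 25]
```

Like the loop in the proof, the generator never terminates as a whole (A is infinite) yet reaches every member of A after finitely many steps.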

3.6.15 Theorem Every infinite countable set is enumerable.

Proof Let f : N → A be onto and total (cf. 3.6.11), where A is infinite. Let g : A → N such
that f g = 1 A (3.5.27). Thus, if we let B = ran(g), we have that g is onto B, and by 3.5.22
is also 1-1 and total. Thus it is a 1-1 correspondence g : A → B, that is,

A∼B (1)

B must be infinite, otherwise (3.6.1), for some n, B ∼ {0, . . . , n} and by (1) via Exercise
3.5.17 we have A ∼ {0, . . . , n}, contradicting that A is infinite. Thus, by 3.6.14, B ∼ N,
hence (again, Exercise 3.5.17 and (1)) A ∼ N. That is, A is enumerable. 

 So, if we can enumerate an infinite set at all, then we can enumerate it without repetitions. 

We can linearise an infinite square matrix of elements in each location (i, j) by devising
a traversal that will go through each (i, j) entry once, and will not miss any entry!
In the literature one often sees the method diagrammatically, see below, where arrows
clearly indicate the sequence of traversing, with the understanding that we use the arrows
by picking the first unused chain of arrows from left to right.

(0, 0)   (0, 1)   (0, 2)   (0, 3) . . .
        ↙        ↙        ↙
(1, 0)   (1, 1)   (1, 2)
        ↙        ↙
(2, 0)   (2, 1)
        ↙
(3, 0)
  .
  .
  .

So the linearisation induces a 1-1 correspondence between N and the linearised sequence
of matrix entries, that is, it shows that N × N ∼ N. In short,

3.6.16 Theorem The set N × N is countable. In fact, it is enumerable.

Is there a “mathematical” way to do this? Well, the above is mathematical, don’t get me
wrong, but is given in outline. It is like an argument in geometry, where we rely on drawings
(figures).
Here are the algebraic details:

Proof (of 3.6.16 with an algebraic argument). Let us call i + j + 1 the “weight” of a pair
(i, j). The weight is the number of elements in the group:

(i + j, 0), (i + j − 1, 1), (i + j − 2, 2), . . . , (i, j), . . . , (0, i + j)

Thus the diagrammatic enumeration proceeds by enumerating groups by increasing weight

1, 2, 3, 4, 5, . . .

and in each group of weight k we enumerate in ascending order of the second component.
Thus the (i, j) th entry occupies position j in its group —the first position in the group
being the 0 th, e.g., in the group of (3, 0) the first position is the 0 th— and this position
globally is the number of elements in all groups before group i + j + 1, plus j. Thus the first
available position for the first entry of group (i, j) members is just after this many occupied
positions:
1 + 2 + 3 + . . . + (i + j) = (i + j)(i + j + 1)/2

That is,

global position of (i, j) is this: (i + j)(i + j + 1)/2 + j

The function f which for all i, j is given by

f (i, j) = (i + j)(i + j + 1)/2 + j
is the algebraic form of the above enumeration. 
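The function f above is the classical Cantor pairing function; the following sketch checks, on an initial segment, that it enumerates N × N without gaps or repetitions:

```python
def pair(i, j):
    # position of (i, j) in the diagonal enumeration of N x N
    return (i + j) * (i + j + 1) // 2 + j

# Over the first 50 weights, pair() hits 0, 1, 2, ... exactly once each:
values = sorted(pair(i, j) for i in range(50) for j in range(50) if i + j < 50)
assert values == list(range(50 * 51 // 2))
```

Integer division // keeps the computation exact: one of i + j and i + j + 1 is always even, so no rounding ever occurs.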

 There is an easier way to show that N × N ∼ N without diagrams:


By the unique factorisation of numbers into products of primes (Euclid) the function
g : N × N → N given for all m, n by g(m, n) = 2^m 3^n is 1-1, since Euclid proved that
2^m 3^n = 2^m′ 3^n′ implies m = m ′ and n = n ′ . It is not onto as it never outputs, say, 5, but
ran(g) is an infinite subset of N (Exercise!).
Thus, trivially, N × N ∼ ran(g) ∼ N, the latter “∼” by 3.6.14. 

3.6.17 Exercise If A and B are enumerable, so is A × B.
Hint. So, N ∼ A and N ∼ B. Can you show now that N × N ∼ A × B?
With little additional effort one can generalise to the case of A1 × A2 × · · · × An .

3.6.18 Remark

1. Let us collect a few more remarks on countable sets here. Suppose now that we start with
a countable set A. Is every subset of A countable? Yes, because the composition of onto
functions is onto.
2. 3.6.19 Exercise What does composition of onto functions have to do with this? Well,
if B ⊆ A then there is a natural onto function g : A → B. Which one? Think “natural”!
Get a natural total and 1-1 function f : B → A and then use f to get g. 
3. As a special case, if A is countable, then so is A ∩ B for any B, since A ∩ B ⊆ A.
4. How about A ∪ B? If both A and B are countable, then so is A ∪ B. Indeed, and without
inventing a new technique, let
a0 , a1 , . . .
be an enumeration of A and
b0 , b1 , . . .
for B. Now form an infinite matrix with the A-enumeration as the 1st row, while each
remaining row is the same as the B-enumeration. Now linearise this matrix!
Of course, we may alternatively adapt the unfolding technique to an infinite matrix of
just two rows. How?
5. 3.6.20 Exercise Let A be enumerable and an enumeration of A

a0 , a1 , a2 , . . . (1)

is given.
So, this is an enumeration without repetitions.
Use techniques we employed in this section to propose a new enumeration in which every
ai is listed infinitely many times (this is useful in some applications of logic). 
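The two-row “matrix” of item 4 amounts to interleaving the two enumerations; a Python sketch with two hypothetical enumerations (repetitions, as noted, do no harm):

```python
from itertools import count, islice

def union_enumeration(a, b):
    """Enumerate A ∪ B given total enumerations a, b : N -> A, B by
    interleaving: a0, b0, a1, b1, ...  Repeats are harmless for countability."""
    for n in count():
        yield a(n)
        yield b(n)

evens_and_squares = union_enumeration(lambda n: 2 * n, lambda n: n * n)
assert list(islice(evens_and_squares, 8)) == [0, 0, 2, 1, 4, 4, 6, 9]
```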

3.7 Diagonalisation and Uncountable Sets

3.7.1 Example Suppose we have a 3 × 3 matrix

1 1 0
1 0 1
0 1 1

and we are asked: Find a sequence of three numbers, using only 0 or 1, that does not fit as
a row of the above matrix —i.e., is different from all rows.
Sure, you reply: Take 1 1 1. Or, take 0 0 0.
That is correct. But what if the matrix were big, say, 10^350000 × 10^350000 , or even infinite?
Is there a finitely describable technique that can produce an “unfit” row for any square
matrix, even an infinite one? Yes, it is Cantor’s diagonal method or technique.
He noticed that any row that fits in the matrix as the, say, i-th row, intersects the main
diagonal at the same spot that the i-th column does.

That is, at entry (i, i).

Thus if we take the main diagonal —a sequence that has the same length as any row—
and change every one of its entries, then it will not fit anywhere as a row! Because no row
can have an entry that is different from the entry at the location where it intersects the main
diagonal!
This idea would give the answer 0 1 0 to our original question. While 1000, 11, 3 also
follows the principle “change all the entries of the diagonal” and works, we are constrained
here to “use only 0 or 1” as entries. More seriously, in a case of a very large or infinite matrix
it is best to have a simple technique that is finitely describable and works even if we do not
know much about the elements of the matrix. Read on! 

3.7.2 Example We have an infinite matrix of 0-1 entries. Can we produce an infinite
sequence of 0-1 entries that does not match any row in the matrix? Yes, take the main
diagonal and flip every entry (0 to 1; 1 to 0).
If we assume that it fits as row i, then we get a contradiction:
Say the original row i has a as entry (i, i). But, by our construction, the new row has
1 − a as entry (i, i), so it will not fit as row i after all. So it fits nowhere, i being arbitrary.


 The technique of obtaining a modified copy of the main diagonal of an infinite matrix so
that it does not match any row of the matrix is due to Cantor and is called diagonal method,
or diagonalisation. 
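A minimal executable rendering of the diagonal method on a finite 0-1 matrix (reusing the matrix of Example 3.7.1):

```python
def diagonal_flip(matrix):
    """Cantor's diagonal: flip entry (i, i) of each row i.  The resulting
    sequence differs from row i at position i, so it matches no row."""
    return [1 - matrix[i][i] for i in range(len(matrix))]

M = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
d = diagonal_flip(M)
assert d == [0, 1, 0]                  # the answer from Example 3.7.1
assert all(d != row for row in M)      # fits as no row of M
```

The same one-line rule works for a matrix of any size; its finite description is the whole point of the method.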

3.7.3 Example (Cantor) Let S denote the set of all infinite sequences of 0s and 1s.
Pause. What is an infinite sequence? Our intuitive understanding of the term is captured
mathematically by the concept of a total function f with left field (and hence domain) N.
The n-th member of the sequence is f (n).
Can we arrange all of S in an infinite matrix —one element per row? No, since the
preceding example shows that we would miss at least one infinite sequence (i.e., we would
fail to list it as a row), because a sequence of infinitely many 0s and/or 1s can be found, as
indicated above, that does not match any row!

But arranging all members of S as an infinite matrix —one element per row— is tan-
tamount to saying that we can enumerate all the members of S using members of N as
indices.
So we cannot do that. S is not countable!  

3.7.4 Definition (Uncountable Sets) A set that is not countable is called uncountable. 

So, an uncountable set is neither finite, nor enumerable. The first observation makes it
infinite, the second makes it “more infinite” than the set of natural numbers since it is not
in 1-1 correspondence with N (else it would be enumerable, hence countable) nor with a
subset of N: indeed, if the latter holds, then our uncountable set would be finite or enumerable
(which is absurd) according as it would be in 1-1 correspondence with a finite subset or an
infinite subset of N (cf. 3.6.14 and Exercise 3.5.17).
Example 3.7.3 shows that uncountable sets exist. Here is a more interesting one.

3.7.5 Example (Cantor) The set of real numbers in the interval
Def
(0, 1) = {x ∈ R : 0 < x < 1}

is uncountable. This is done via an elaboration of the argument in 3.7.3.


Think of a member of (0, 1), in form, as an infinite sequence of numbers from the set

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

prefixed with a dot; that is, think of the number’s decimal notation.
Some numbers have representations that end in 0s after a certain point. We call these
representations finite. Every such number has also an “infinite representation” since the non
zero digit d immediately to the left of the infinite tail of 0s can be converted to d − 1, and
the infinite tail into 9s, without changing the value of the number.
Allow only infinite representations.
Assume now by way of contradiction that a listing of all members of (0, 1) exists, listing
them via their infinite representations

.a00 a01 a02 a03 a04 . . .


.a10 a11 a12 a13 a14 . . .
.a20 a21 a22 a23 a24 . . .
.a30 a31 a32 a33 a34 . . .
..
.

The argument from 3.7.3 can be easily modified to get a “row that does not fit”, that is, a
representation
.d0 d1 d2 · · ·
not in the listing.
Well, just let
di = 2, if aii = 0 ∨ aii = 1
di = 1, otherwise
Clearly .d0 d1 d2 · · · does not fit in any row i as it differs from the expected digit at the
i-th decimal place: it should be aii , but di ≠ aii . It is, on the other hand, an infinite decimal
expansion, being devoid of zeros, and thus should be listed. This contradiction settles the
issue.  
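The digit-flipping rule for the di is mechanical; a small sketch on a few hypothetical rows of decimal digits:

```python
def diagonal_real(rows):
    """Given (finitely many sampled) decimal-digit rows a_i, build the digits
    d_i of Cantor's missing real: d_i = 2 if a_ii is 0 or 1, else d_i = 1."""
    return [2 if rows[i][i] in (0, 1) else 1 for i in range(len(rows))]

rows = [[1, 4, 1, 5],      # digits of .1415...
        [7, 1, 8, 2],      # .7182...
        [1, 4, 1, 4],      # .1414...
        [3, 3, 3, 3]]      # .3333...
d = diagonal_real(rows)
assert d == [2, 2, 2, 1]
assert all(d[i] != rows[i][i] for i in range(4))
```

Note that the produced digits are always 1 or 2, so the resulting expansion is automatically “infinite” in the sense of the example: it contains no tail of 0s or 9s.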

3.7.6 Example (3.7.3 Revisited) Consider the set of all total functions from N to {0, 1}.
Is this countable?
Well, if there is an enumeration of these one-variable functions

f0 , f1 , f2 , f3, . . . (1)

consider the function g : N → {0, 1} given by g(x) = 1 − f x (x). Clearly, this must appear
in the listing (1) since it has the correct left and right fields, and is total.
Too bad! If g = f i then g(i) = f i (i). By definition, it is also 1 − f i (i). A contradiction.
This is just a version of 3.7.3; as already noted there, an infinite sequence of 0s and 1s is
just a total function from N to {0, 1}. 

The same argument as above shows that the set of all functions from N to itself is
uncountable. Taking g(x) = f x (x) + 1 also works in this case to “systematically change
the diagonal” f 0 (0), f 1 (1), . . . since we are not constrained to keep the function values in
{0, 1}.

3.7.7 Remark Worth Emphasizing. Here is how we constructed g: We have a list of in
principle available f -indices for g. We want to make sure that none of them applies.
A convenient method to do that is to inspect each available index, i, and using the diagonal
method do this: Ensure that g differs from f i at input i, by setting g(i) = 1 − f i (i).
This ensures that g ≠ f i ; period. We say that we cancelled the index i as a possible “ f -index”
of g.
Since the process is applied for each i, we have cancelled all possible indices for g: For
no i can we have g = f i .  

 3.7.8 Example (Cantor) What about the set of all subsets of N —P (N) or 2N ?
Cantor showed that this is uncountable as well: If not, we have an enumeration of its
members as
S0 , S1 , S2 , . . . (1)

Define the set


Def
D = {x ∈ N : x ∉ Sx } (2)
So, D ⊆ N, thus it must appear in the list (1) as an Si . But then i ∈ D iff i ∈ Si by virtue of
D = Si . However, also i ∈ D iff i ∉ Si by (2). This contradiction establishes that a legitimate
subset of N, namely D, is not an Si . That is, 2N cannot be so enumerated; it is uncountable.
 

3.7.9 Example (Characteristic functions) Let S ⊆ N. We can represent S as an infinite



0/1 array:

array position:  . . .   i   . . .   j   . . .
array content:   . . .   0   . . .   1   . . .
meaning:         . . . i ∉ S . . . j ∈ S . . .

This array faithfully represents S —tells all we need to know about what S contains— since
it contains a “1” in location x iff x ∈ S; contains “0” otherwise.
The array viewed as a total function from N to {0, 1} is called the characteristic function
of S, denoted by c S :
c S (x) = 1, if x ∈ S
c S (x) = 0, if x ∈ N − S
Note that there is a 1-1 correspondence, let’s call it F, between subsets of N and the total
0-1-valued functions from N simply given by F(S) = c S . (Exercise!)
Thus
{ f : f : N → {0, 1} and f is total} ∼ 2N
In particular, the concept of characteristic functions shows that Example 3.7.8 fits the diag-
onalization methodology. Indeed, the argument in 3.7.8 sets c D (x) = 1 − c Sx (x), for all x,
because

c D (x) = 1 iff x ∈ D iff x ∉ Sx iff c Sx (x) = 0 iff 1 − c Sx (x) = 1

But then, the argument in 3.7.8 essentially applies the diagonal method to the list of 0/1
functions c Sx , for x = 0, 1, 2, . . ., to show that some 0/1 function, namely, c D cannot be in
the list.  
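The correspondence F(S) = c S is easy to check exhaustively on a finite universe; a sketch over {0, 1, 2}, representing each characteristic function by its length-3 array of values:

```python
def char_fn(S, n):
    # the characteristic function of S restricted to {0, ..., n-1}, as a 0/1 tuple
    return tuple(1 if x in S else 0 for x in range(n))

subsets = [frozenset(s) for s in
           ([], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2])]
arrays = [char_fn(S, 3) for S in subsets]
assert len(set(arrays)) == 8                       # F is 1-1: distinct sets, distinct arrays
assert set(arrays) == {(a, b, c) for a in (0, 1)
                       for b in (0, 1) for c in (0, 1)}   # and onto the 0/1 arrays
```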

3.7.10 Remark (Cantor) Cantor offered also a generalisation of 3.7.8: For any set X , we

have X ≁ 2^X = P (X ).
Assume otherwise, and let f : X → 2 X be onto, and let us write “Wx ” for “ f (x)” (the
latter, hence also the former) being the subset of X enumerated by f at “position” x (∈ X ).
Define
De f
D = {x ∈ X : x ∉ Wx } (1)
We show that D (⊆ X ) is not a “Wx ”, that is, it is not enumerated by f contradicting ontoness
of the latter. Thus, indeed X ≁ 2^X .

The details: Suppose D = Wa for some a ∈ X . Then a ∈ D ≡ a ∈ Wa . But also (by (1)),
a∈D≡a∈ / Wa . A contradiction.
Is this diagonalisation? Of course: Let c D and cWx be the characteristic functions of D
and Wx (any x) respectively.
We have arranged so that (by (1)) c D (x) = 1 iff cWx (x) = 0. So from the infinite 0/1
matrix with entries cWi ( j) we obtained D by flipping the main diagonal entries cWi (i).  

3.7.11 Remark (Russell Paradox and Diagonalisation) It should be mentioned that the

argument in Russell’s paradox is a diagonalisation in the model of the above.
In 3.7.10 we show that {x : x ∉ Wx } is “not a Wx ”.
The Wx above are enumerated using indices x from X .
Well, consider here an (attempted) enumeration of all sets using as indices the sets
themselves —that is, the enumerating function is the identity —λx.x : U → U— so while
we might imagine that a set a is enumerated as a “Wa ” we actually enumerate it as just “a”
without the unnecessary burden of the “W -notation”.
As in 3.7.10, we consider the question: Have we enumerated all sets? Or, for example,

Is

{x : x ∉ x }    (in the simplified “Wx ” notation)

a “Wa ” —or, in our simplified notation, an “a”?

Well, if yes then a = {x : x ∉ x}, hence a ∈ a ≡ a ∉ a, a contradiction.
It is somewhat ironic that Cantor’s famous tool of diagonalisation was used to find a
contradiction in his set theory.  

3.8 Operators and the Cantor-Bernstein Theorem

3.8.1 Definition (Operators) An operator Γ on a set X is a total function Γ : P (X ) →
P (X ).
It is called monotone iff S ⊆ T ⊆ X implies Γ (S) ⊆ Γ (T ). A set Z ⊆ X is Γ -closed
means that Γ (Z ) ⊆ Z .
The most popular general symbols that name arbitrary operators in the literature are capital
Greek letters such as Γ and Δ.

Note that Γ acts on points in P (X ) so the notation “Γ (S)” (as opposed to “Γ [S]”) is
correct.

3.8.2 Example Let Γ : P (X ) → P (X ) be as in 3.8.1. Then Γ (X ) ⊆ X . Indeed, by
definition, Γ (X ) ∈ P (X ), therefore Γ (X ) ⊆ X .

3.8.3 Theorem (Fixpoint Theorem) Given a monotone operator Γ : 2^X → 2^X . It provably has a (⊆-)least fixpoint S, that is,

• Γ (S) = S and
• If Γ (Z ) = Z , then S ⊆ Z .

Proof Consider the family of sets

F = { Z : Z ⊆ X ∧ Γ (Z ) ⊆ Z }

By 3.8.2 and the trivial X ⊆ X we have X ∈ F so this class is nonempty and it has,
therefore, an intersection S that is a subset of X :

S = ⋂ F = ⋂_{Z ∈ F} Z ⊆ X   (1)

For Z ∈ F, it is ⋂_{T ∈ F} T ⊆ Z , thus we have Γ (⋂_{T ∈ F} T ) ⊆ Γ (Z ) by monotonicity and
hence also

Γ (⋂_{T ∈ F} T ) ⊆ ⋂_{Z ∈ F} Γ (Z ) ⊆ ⋂_{Z ∈ F} Z = S   (2)

where the rightmost “⊆” is due to the condition “Γ (Z ) ⊆ Z ” in F.
We have shown (in (2)) that

Γ (S) ⊆ S   (3)

The converse inclusion is also true. For suppose not; then there is an x ∈ S − Γ (S). By
monotonicity, Γ (S − {x}) ⊆ Γ (S) ⊆ S. The left hand side of the ⊆-chain cannot include x
since Γ (S) does not. We conclude that

Γ (S − {x}) ⊆ S − {x}

hence S − {x} is in F, implying by definition of S that S ⊆ S − {x}. We reached a contradiction, as we set out to do.
That is, Γ (S) = S is proved.
And a final observation: Say a subset Z of X satisfies Γ (Z ) = Z . Then Z ∈ F and
therefore S ⊆ Z . Thus, S as defined in (1) is indeed the ⊆-smallest subset Z of X among
all that satisfy Γ (Z ) = Z .

3.8.4 Definition (Least Fixpoint) Let Γ : 2^X → 2^X be a monotone operator. A Z ∈ 2^X
such that Γ (Z ) = Z is called a fixpoint (also fixed point) of Γ .
The ⊆-smallest fixpoint S among all fixpoints of Γ is called the least fixpoint of Γ . It is
denoted by Γ ∞ .

Note the italics in “The” above. The uniqueness follows trivially from the ⊆-smallest
property.29 
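On a finite X the least fixpoint can also be computed by iterating the operator upward from ∅ (∅ ⊆ Γ (∅) ⊆ Γ (Γ (∅)) ⊆ · · ·), an alternative to the intersection construction used in the proof of 3.8.3. A Python sketch under these finiteness assumptions (all names illustrative):

```python
def least_fixpoint(gamma):
    """Least fixpoint of a monotone operator gamma on P(X), X finite,
    computed by iterating from the bottom: the increasing chain
    ∅, Γ(∅), Γ(Γ(∅)), ... must stabilise, and it stabilises at Γ∞."""
    S = frozenset()
    while gamma(S) != S:
        S = gamma(S)
    return S

# Γ(Z) = {0} ∪ {z + 1 : z ∈ Z, z + 1 < 5} is monotone on P({0,...,4});
# its least fixpoint is all of {0,...,4}.
gamma = lambda Z: frozenset({0} | {z + 1 for z in Z if z + 1 < 5})
S = least_fixpoint(gamma)
assert S == frozenset(range(5))
assert gamma(S) == S
```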

3.8.1 An Application of Operators to Cardinality

3.8.5 Definition Let f : A → B be total and 1-1. We indicate this situation by the symbol
A ≼ B, and also A ≼ B via f if the role of f is important.

3.8.6 Example Thus

1. A ∼ B implies A ≼ B since we have a 1-1 correspondence f : A → B which in particular is 1-1 and total.
2. For any A we have A ≼ 2^A since the f that maps a ∈ A to {a} ∈ 2^A is total and 1-1. By
3.7.10, this example establishes the distinctness of the concepts captured by “∼” and
“≼”. This item establishes a counterexample to a possible conjecture that item 1. has a
converse.
3. If A ⊆ B, then A ≼ B. Indeed, the so-called inclusion map i : A → B given by i(x) = x,
for all x ∈ A, is 1-1 and total on A. Sometimes we write i : A ⊆ B.
4. A ≼ B via f iff for some C ⊆ B we have A ∼ C. Indeed, for the “if” let the function g : A →
C ⊆ B be 1-1, total on A and be onto C. Trivially, g : A → B is total and 1-1 (not
necessarily onto). For the “only if” define C =Def ran( f ) and verify the claim.

3.8.6 2. motivates the introduction of a new symbol:

3.8.7 Definition If A ≼ B but A ≁ B, then we write A ≺ B.

The following theorem that variously goes under the pair of names Cantor and Bern-
stein or, alternatively, Schroeder and Bernstein and even only Bernstein is remarkable in
establishing the connection between ≼ and ∼.30

3.8.8 Theorem (Dedekind) Let A ≼ B via f and B ≼ A via g. Then A ∼ B.

29 If T is also a least fixpoint, then T ⊆ S, and since S is also least, we have S ⊆ T .


30 As a historical footnote, we observe (cf. Levy (1979), Wilder (1963)) that the theorem was actually
proved by Dedekind in 1887 (cf. (Dedekind, 1888, p.447)), then was only conjectured by Cantor 8
years later, in 1895, and was re-proved by F. Bernstein in 1898 (cf. Borel (1928)). E. Schröder offered a
proof independently as well, which supports also the attribution “Schröder-Bernstein” to the theorem.

Proof This proof follows an idea from Dieudonné (1960) but here we use operators (loc.
cit. does not). The assumption says that f : A → B and g : B → A are each total and 1-1.
Define the operator  : 2 A → 2 A by
De f
For all Z ⊆ A, (Z ) = (A − g[B]) ∪ g f [Z ] (1)

 is monotone since Z ⊆ Z  implies g f [Z ] ⊆ g f [Z  ]. Let then X be the ⊆-least fixpoint


 ∞ of , so
(X ) = X ⊆ A (2)
Define
(B ⊇ )X  = f [X ] (3)
Y = A−X (4)
 
Y =B−X (5)
Thus
A = X ∪ Y and X ∩ Y = ∅ (4 )
and
B = X  ∪ Y  and X  ∩ Y  = ∅ (5 )
We next derive the counterpart —g[Y  ] = Y — of (3) that we will call (3 ).

g[Y  ] = g[B − X  ] by (5)


= g[B] − g[X  ] by Exercise 3.9.15
= g[B] − g f [X ] by (3)
= g[B] ∩ (A − g f [X ])
  
= A − A − g[B] ∩ (A − g f [X ]) double negation relative to A
= A − (A − g[B]) ∪ g f [X ] de Morgan relative to A
= Y (2) followed by (4) (3 )

It is clear now that h : A → B given below (cf. (4′))

    h(x) =Def  f(x)     if x ∈ X
               g⁻¹(x)   if x ∈ Y

is a 1-1 correspondence A ∼ B. Of course, g⁻¹ is a function —since g : Y′ → Y is 1-1— and it is total on Y and onto Y′ (by (3′)). □
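The fixpoint construction in the proof can be run on a concrete instance. The sketch below is ours, not the book's: it takes A = B = N with the sample choice f(n) = 2n and g(n) = 2n + 1 (both total and 1-1, neither onto), so that A − g[B] is the set of evens and g(f(n)) = 4n + 1. Membership in the least fixpoint X is decided by running g ∘ f backwards.

```python
# Concrete instance of the proof's construction: A = B = N, f(n) = 2n,
# g(n) = 2n + 1 (our sample choice). Then A - g[B] = the evens and
# g(f(n)) = 4n + 1, so the least fixpoint of Gamma is
#   X = { (g∘f)^k(e) : e even, k >= 0 }.
def in_X(x):
    # Decide x ∈ X by running g∘f backwards until we land in A - g[B]
    # (an even number: yes) or get stuck (no: then x ∈ Y = A - X).
    while x % 2 == 1:
        if (x - 1) % 4 != 0:
            return False            # x is odd but not of the form 4n + 1
        x = (x - 1) // 4            # undo one application of g∘f
    return True

def h(x):
    # h = f on X and g^{-1} on Y, exactly as in the proof
    return 2 * x if in_X(x) else (x - 1) // 2

vals = [h(x) for x in range(300)]
assert len(set(vals)) == len(vals)   # h is 1-1 on the sample
assert set(range(60)) <= set(vals)   # every small y is hit by some x < 300
```

Note that h is a 1-1 correspondence N ∼ N even though neither f nor g is onto, which is precisely what the theorem promises.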

3.8.9 Example (The self-contradictory set of all sets) The quoted term in the Example
caption goes back to the time when mathematicians were still shocked from the discovery of
set theory paradoxes by Russell, Burali-Forti and others (cf. Wilder (1963)) that invariably
112 3 Relations and Functions

pointed to huge or enormously inclusive sets as the culprits. Yet, until such time as Russell
introduced the remedy —of requiring sets to be built by stages and not to just “happen”—
they still considered all collections as sets and employed the nickname “self contradictory
sets” for those examples that were proper classes as we call them in the current terminology.

This example outlines the discovery that U was a “self contradictory set”.
That U cannot be a set —and “yet was back then, a set” (all collections were; hence it
turned out to be a “self contradictory set”)— was actually proved early on in the development
of set theory. This was not deduced as a result of “the Russell non-set set” being a subset of
U (true the latter contains all “sets”).
Besides, such an avenue (via 2.3.6) would not necessarily lead one to conclude that U is
not a set too, considering that the subclass theorem was not yet known. Nor was it known
yet that x ∈ x is false for all x —a fact that would entail U to equal the Russell non-set set
(See also 2.2.1) and thus be a non-set set itself.

But still there was a clear contradiction to “U being a set” that was argued, essentially,
in the following less elementary way:

De f
1. Since U contains everything —U = {x : x = x}— in particular every member of any
set S is in U and thus S ⊆ U. In particular, P(U) ⊆ U and thus (3.8.6.3) P(U) ≼ U.
2. On the other hand, by 3.8.6.2 we have U ≼ P(U).

By 3.8.8 we have P(U) ∼ U, contradicting 3.7.10. □

3.9 Exercises

1. Give an example of two equivalence relations R and S on the set A = {1, 2, 3} such that
R ∪ S is not an equivalence relation.
2. Draw the Hasse diagram of the order ⊂ defined on the set 2{1,2,3} .
3. Given the left field A = {0, 1, 2} and right field B = {4, 5}, which of the following functions from A to B are partial, total, nontotal, 1-1, onto?

(i) {(0, 4)}


(ii) {(0, 4), (1, 4), (2, 4)}
(iii) {(0, 4), (2, 5)}.

4. Prove that if R and S are any equivalence relations on any set A, then (R ∪ S)+ , the
transitive closure of their union, is also an equivalence relation.

5. Let P be a reflexive relation on A that satisfies aPb ∧ aPc → bPc. Prove that P is an equivalence relation on A.
Caution. This aPb ∧ aPc → bPc is not exactly transitivity!
6. ("False theorem") Let me "prove" that every symmetric and transitive relation P on some set A is an equivalence relation. OK, aPb implies bPa by symmetry and thus we have aPa by transitivity. Reflexivity proved. Done.

a. Actually give a counterexample to the above “theorem”: Propose a small set A and
construct a symmetric and transitive relation on it which is not reflexive.
b. So the theorem is false, but if you were to grade the “proof” you would have to find
where it misstepped. Where exactly is the error in the proof?

7. Find two right inverses of f : A → B, where A = {1, 2, 3}, B = {a, b} and f = {(1, b), (2, b), (3, a)}.
8. Let F : X → Y be a function, and A ⊆ Y, B ⊆ Y. Prove
(a) F⁻¹[A ∪ B] = F⁻¹[A] ∪ F⁻¹[B]
(b) F⁻¹[A ∩ B] = F⁻¹[A] ∩ F⁻¹[B]
(c) if A ⊆ B, then F⁻¹[B − A] = F⁻¹[B] − F⁻¹[A].
Is this last equality true if A ⊈ B? Why?
9. Let F : X → Y be a function, and A ⊆ X, B ⊆ X. Prove
(a) F[A ∪ B] = F[A] ∪ F[B]
(b) F[A ∩ B] ⊆ F[A] ∩ F[B]
(c) if A ⊆ B, then F[B − A] ⊇ F[B] − F[A].
Can the above inclusions be sharpened to equalities? Why?
10. Which parts, if any, of the above two problems generalize to the case that F is just a
relation?
11. Let G be a function and F a family of sets. Prove
(a) G⁻¹[⋃F] = ⋃{G⁻¹[S] : S ∈ F}
(b) G⁻¹[⋂F] = ⋂{G⁻¹[S] : S ∈ F}
(c) G[⋃F] = ⋃{G[S] : S ∈ F}
(d) G[⋂F] ⊆ ⋂{G[S] : S ∈ F} (can ⊆ be replaced by =? Why?)
12. Let F be a function, and A a class. Prove
(a) F[F⁻¹[A]] ⊆ A
(b) F⁻¹[F[A]] ⊇ A, provided that A ⊆ dom(F).
Show by appropriate concrete examples that the above inclusions cannot be sharpened,
in general, to equalities.
13. Let the function F be 1-1, while A ⊆ dom(F) is an arbitrary class. Show that
F⁻¹[F[A]] = A.
State and prove an appropriate converse.
14. Let B ⊆ ran(G). Prove G[G⁻¹[B]] = B.
State and prove an appropriate converse.

15. Let F be a 1-1 function and A ⊆ B ⊆ dom(F).


(a) Prove F[B − A] = F[B] − F[A]
(b) Prove a suitable converse
Is the restriction A ⊆ B ⊆ dom(F) necessary? Why?
16. For any relations S, T prove
(1) (S⁻¹)⁻¹ = S
(2) dom(S) = ran(S⁻¹)
(3) ran(S) = dom(S⁻¹)
(4) (S ∪ T)⁻¹ = S⁻¹ ∪ T⁻¹.
17. Show that if a function F is a set, then so are both dom(F) and ran(F).
18. Show for a relation S that if both the range and the domain are sets, then S is a set.
19. Show that if a relation S is transitive, then so is S⁻¹.
20. Show that for any relations P and Q, (P ◦ Q)⁻¹ = Q⁻¹ ◦ P⁻¹.
21. Show that (S⁻¹)⁺ = (S⁺)⁻¹.
Hint. a(S⁻¹)⁺b means that for some finite sequence t₁, …, tₖ we have

    a S⁻¹ t₁ S⁻¹ t₂ … tₖ S⁻¹ b

and a(S⁺)⁻¹b means b S⁺ a, that is, for some finite sequence r₁, …, rₘ we have

    b S r₁ S r₂ … rₘ S a

22. If F : A → B is a 1-1 function, show that F⁻¹ : B → A is also a function (single-valued).
23. If F : A → B is a 1-1 correspondence, show that so is F⁻¹ : B → A.
24. Let A be an enumerable set. Prove that we can enumerate it so that every one of its
members is enumerated infinitely many times.
25. Prove that if A is infinite and B is finite, then A ∪ B ∼ A.
26. Prove that every infinite set (in the sense of Definition 3.6.1) has an enumerable subset.
27. Prove that if A is infinite and B is enumerable, then A ∪ B ∼ A.
28. (Dedekind Infinite) Dedekind gave this alternative definition of infinite set, namely
A is infinite iff for some proper subset of A —let’s call it S— we have A ∼ S.

Prove that his definition is equivalent (two directions!) to the one we introduced in
this chapter (3.6.1).

29. Suppose someone defined certain subsets of N —one for each natural number x, which
thus is naming one such subset (possibly several x names are given to the same subset)
and they called them Wₓ.
Is the following subset of N,

{x ∈ N : x ∉ Wₓ}

a Wₓ? Why?31

30. Prove that the set of all finite subsets of N is countable.


Hint. Refer to 3.7.9 and use it to uniquely associate an integer with each finite subset of
N. Note that there is a last “1” in the array of outputs of the characteristic function of a
finite subset of N. Now read that array of outputs backwards and think “binary notation”.
31. Cantor proved, using diagonalisation, that (0, 1) = {x ∈ R : 0 < x < 1} is uncountable.
Prove that if a < b, then (0, 1) ∼ (a, b) = {x ∈ R : a < x < b}.
Hint. You need to define a function f : (0, 1) → (a, b) that is total, 1-1 and onto. You
have to “stretch” 1 (the length of (0, 1)) to b − a (length of (a, b)).
32. Next prove that R is not “larger” than an interval. To this end show that (−1, 1) ∼ R.
Hint. Try the function f on R defined by
    f(x) = x/(1 + |x|)
Clearly, f is total. Next, we see that it is 1-1. Indeed, let
    a/(1 + |a|) = b/(1 + |b|)    (1)
where a and b are in R. This leads to

a − b = b|a| − a|b| (2)

Since by (1) ab ≥ 0, (2) has only two solutions. Both lead to 1-1-ness (a = b).
For ontoness consider three cases of −1 < c < 1 and find a b ∈ R such that f (b) = c:
The cases are c = 0, −1 < c < 0 and 0 < c < 1.
33. Let A and C be sets. Take for granted that {X , Y } is a set (2.3.1) for any sets or atoms
X, Y .
Without using Principles 0, 1, 2, but using Principle 3 prove that A × {C} is a set.
34. Let A and B be sets. Take for granted that {X , Y } is a set (2.3.1) for any sets or atoms
X, Y .
Without using Principles 0, 1, 2, but using Principle 3 prove that A × B is a set.
Hint. Use the preceding exercise.

31 Wₓ sets are actually defined and used in computability and were introduced by Rogers. Cf. Rogers (1967), Tourlakis (2022).
4 A Tiny Bit of Informal Logic

Overview

We have become somewhat proficient in using informal logic in our arguments about aspects
of discrete mathematics.
Although we have already used quantifiers, ∃ and ∀, we did so mostly viewing them as
symbolic abbreviations of English texts about mathematics. In this chapter we will expand
our techniques in logic, extending them to include the manipulation of quantifiers, such as
formal —i.e., syntax-based— techniques towards adding ∀, ∃ to the left of a formula, and
also removing them when they are prefixes.

4.1 Enriching Our Proofs to Manipulate Quantifiers

Manipulation of quantifiers boils down, mostly, to “how can I remove a quantifier from the
very beginning of a formula?” and “how can I add a quantifier at the very beginning of a
formula?” Once we learn this technique we will be able to reason within mathematics with
ease.
But first let us define once and for all what a mathematical proof looks like: its correct,
expected syntax or form, that is.
We will need some concepts to begin with.

1. The alphabet and structure of formulas. Formulas are strings. The alphabet —that is
set— of symbols that we use to write down formulas contains, at a minimum,

=, ¬, ∧, ∨, →, ≡, (, ), ∀, ∃, object variables1

1 That is, variables that denote objects such as numbers, arrays, matrices, sets, trees, etc.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 117
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_4

We finitely generate the infinite set of object variables using single letters, if necessary with primes and/or subscripts: A, x, y′, w₂₃, u₅₀₁.
2. One normally works in a mathematical area of interest, or mathematical theory —such as
Geometry, Set Theory, Number Theory, Algebra, Calculus— where one needs additional
symbols to write down formulas. That is, symbols like

0, ∅, ∈, ⊂, ◦, +, ×

and many others.


3. Mathematicians as a rule get —after a lot of practise— to recognise the formulas and
terms in the math areas of their interest without being necessarily taught the recursive
definition of the syntax of these. We will not give the syntax in these notes either (but
see Tourlakis (2008) if you want to know). Thus one learns to be content with getting to
know formulas and terms by their behaviour and through use, rather than by their exact
definition of syntax.

• Terms are “function calls”, in the jargon of the computer savvy person. These calls
take math objects as inputs and return math objects as outputs. Examples of calls or
terms —all drawn from our familiar arithmetic— are: x + y, x × 3, 0 × x + 1 (one is
told that × is stronger than +, so, notwithstanding the bracket-parsimonious notation
“0 × x + 1”, we know it means “(0 × x) + 1”, so this call returns 1, no matter what
we plugged into x).
• Formulas are also function calls, but their output is restricted (by their syntax that
I will not define carefully!) to be one or the other of the truth values true or false
(denoted in this book by t or f) but nothing else! Their inputs, just as in the case for
terms, are any math objects. Examples are: 2 < 3 (which is true, t), (∀x)x = x (t),
(∀x)x = 0 (f over, say, the reals R), (∃x)x = 0 (t over the reals and natural numbers),
x = 0 the latter being neither true nor false; the answer depends on the input we put
in the “input variable” x.
More: x = x (t), an answer that is independent of input; x = 0 → x = 0 (t), an answer
that is independent of input; x = 0 → (∀x)x = 0, which is neither true nor false; the
answer depends on the input in x. The input variable is the leftmost x; the other two
occurrences of “x” are bound —as we say— and unavailable to accept inputs. See
also below.
• If an occurrence of a variable in a formula is available to accept inputs, then non
logicians would normally call it an occurrence as an input variable. Logicians (in
their classrooms and in the literature they author) would rather call such occurrences
free occurrences.
At the expense of writing style, “occurrence” occurred no less than four times in the
short passage above. The aim is emphasis: It is not a variable x itself that is free

(input) or bound (not available for input) in a formula, but it is the occurrences of said
variable that we are speaking of, as the immediately preceding example makes clear.

4. In (∀x)x = 0 the variable x is non-input; it is "bound", we say. Just like this: Σ_{i=1}^4 i, which means 1 + 2 + 3 + 4, and "i" is not available for input: something like Σ_{3=1}^4 3 is, of course, nonsense! Similar comment for ∃.


5. We call ∀, ∃, ¬, ∧, ∨, →, ≡ the “logical connectives”, the last five of them being called
Boolean connectives. Logicians avoid cluttering notation with a lot of brackets, agreeing
that the first three connectives have the same “strength” or “priority”; the highest. To the
remaining connectives they assign priorities that are decreasing as we walk towards the
right. As a habit, logicians omit outermost brackets outright unless a formula is used as
part of another formula —as a subformula. For example, in A ∧ (B ∨ C) the subformula
“(B ∨ C)” is equipped with outermost brackets which protect the “weaker” ∨ from the
“stronger” ∧ and allow the former to bind B forcing ∧ to wait and act on the entire
B ∨ C instead.
As other examples, if A and B are —denote, is meant by “are”— formulas, then ¬A ∨ B
means (¬A) ∨ B because ¬ wins the competition with ∨ as to who binds with A.
Similarly, if we want (∀x) to apply to the entire A → B we must write (∀x)(A → B).
What about A → B → C and A ≡ B ≡ C? Brackets are implied from right to left:
A → (B → C) and A ≡ (B ≡ C). And this? (∃y)(∀x)¬A. Brackets are implied, again, from right to left: (∃y)((∀x)(¬A)).
The expression (or string of symbols) from the left bracket of the indicated “(∀x)”
(respectively, “(∃x)”) to the end of the formula A on which the quantifier acts (see
below)

(∀x)A —respectively, (∃x)A

is called the scope of the quantifier (∀x) (respectively, (∃x)).


For example, in (∀x)A → B the scope of the (∀x) is the entire expression “(∀x)A”, that
is, no part of the string “→ B” belongs to the scope of the displayed (∀x).
6. Boolean deconstruction. A formula like (∀x)A → B can be deconstructed in the
Boolean sense into the formulas (∀x)A and B. If I knew more about B —say, I knew
that it is “x = 3 → x = 7”, then I can deconstruct further.
So, now I have got

(∀x)A, x = 3, x = 7

The last two have no Boolean structure so deconstructing stops with them. How about
(∀x)A? This cannot be deconstructed either, even if A has Boolean structure! Such
structure is locked up and hidden in the scope of (∀x).
We call the formulas where deconstruction stops “prime”. A prime formula is one with
no Boolean structure, e.g., x < 8, or one of the form (∀x)A or (∃x)A.

Every formula is either prime or can be deconstructed into prime components. A prime formula is one with no explicit Boolean connectives. Such connectives are either totally absent in it —e.g., x < y— or are buried in the scope of a quantifier —e.g., (∃x)(x = 0 ∨ x > 5).
Thus prime formulas are “atomic” —no further deconstructible— as far as Boolean
structure is concerned.

4.1.1 Remark (Tautologies) A formula A is a tautology iff it is true due to its Boolean structure, according to truth tables (Remark 2.3.4), no matter what the values of its prime formulas into which it is deconstructed are postulated to be. Postulated to be: this signifies that we do not (attempt to) compute the intrinsic truth value of a prime formula when we check whether A is a tautology or not.2
For example, x = x is a prime formula and thus its postulated value could be any one of t or f. Thus it is not a tautology, even though, intrinsically, it is true, no matter what the value of x may be. □

4.1.2 Example

1. (∀x)A is not a tautology as it has two possible truth values (being a prime formula) in
principle.
2. x = 0 → x = 0 is a tautology. Which are its prime (sub) formulas?
3. (∀x)x = 0 → x = 0 is not a tautology. As noted, to determine tautologyhood we do not
evaluate prime formulas; we just consider each of the two scenarios, t or f, for each
prime formula and use truth tables to compute the overall truth value.

If we did evaluate (∀x)x = 0 we would see that (say over the natural numbers, or reals, or complex numbers) it is false.3 So the implication is true. However, it is not true as a Boolean formula. □

So, how do we show that (∀x)A is true (if it is)? Well, in easy cases we try to see if A is true for all values of x —no matter where these values come from! That failing, we will use a proof (see 4.1.11).
Similarly for (∃x)A. To show it is true (if it is) we try to see if A is true for some value of x. Often we just guess one such value that works, say c, and then verify the truth of A when x = c. That failing, we will use a proof. □

2 After all, not all prime formulas have intrinsic values; x = y does not. It depends on assumed values
of x and y.
3 If we are doing our mathematics restricted to the set {0}, then, in this “theory” the formula is true!

4.1.3 Definition (Tautological implication) We say that the formulas A₁, A₂, …, Aₙ tautologically imply a formula B, in symbols

    A₁, A₂, …, Aₙ |=taut B

meaning

    "the truth of A₁ ∧ A₂ ∧ … ∧ Aₙ implies the truth of B"

that is, that

    A₁ ∧ A₂ ∧ … ∧ Aₙ → B is a tautology    (1)


4.1.4 Remark Note that we do not care to check, or even state, what happens if A₁ ∧ A₂ ∧ … ∧ Aₙ is false, because the formula in (1) is then trivially true.
So, a tautological implication A₁, A₂, …, Aₙ |=taut B says that B is true provided we proved (or accepted) that the lhs of |=taut is true.

|=taut propagates truth from left to right. □

4.1.5 Example Here are some easy and some involved tautological implications. They can
all be verified using truth tables, either building the tables in full, or taking shortcuts.

1. A |=taut A
2. A |=taut A ∨ B
3. A |=taut B → A
4. A, ¬A |=taut B —any B. Because I do work only if A ∧ ¬A is true! See above.
5. f |=taut B —any B. Because I do work only if lhs is true! See above.
6. Is this a valid tautological implication? B, A → B |=taut A, where A and B are distinct.
No, for if A is false and B is true, then the lhs is true, but the rhs is false!
7. Is this a valid tautological implication? A, A → B |=taut B? Yes! Say A = t and (A →
B) = t. Then, from the truth table of →, it must be B = t.
Statements such as "B = t" are shorthand for "B evaluates as t".
8. How about this? A, A ≡ B |=taut B? Yes! Verify!
9. How about this? A ∨ B ≡ B |=taut A → B? Yes! I verify:
First off, assume lhs of |=taut —that is, A ∨ B ≡ B— is true.

Two cases:

• B = f. Then I need the lhs of ≡ to be false (f) to satisfy the italicised “assume”. So
A = f as well and clearly the rhs of |=taut is true with these values.
• B = t. Then I need not worry about A on the lhs. The rhs of |=taut is true by truth
table of →.

10. A ∧ (f ≡ A) |=taut B, for any B. Well, just note that the lhs of |=taut is f so we need to
do no work with B to conclude that the implication is valid.
11.

A → B, C → B |=taut A ∨ C → B

This is nicknamed “proof by cases” for the obvious reasons. Verify this tautological
implication! □
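Tautological implications such as the ones above can be checked mechanically: treat each prime formula as an independent truth-valued variable and try every assignment, i.e., "build the tables in full". A small sketch (our own encoding, with formulas as Python predicates over the prime-formula values):

```python
from itertools import product

# Brute-force check of A1, ..., An |=taut B: each prime formula is an
# independent True/False variable; try every assignment.
def taut_implies(hyps, concl, nvars):
    for vals in product([True, False], repeat=nvars):
        if all(h(*vals) for h in hyps) and not concl(*vals):
            return False    # lhs true but rhs false: implication fails
    return True

# 4.1.5(7): A, A -> B |=taut B (modus ponens) -- holds
assert taut_implies([lambda a, b: a, lambda a, b: (not a) or b],
                    lambda a, b: b, 2)
# 4.1.5(6): B, A -> B |=taut A -- fails (take A = f, B = t)
assert not taut_implies([lambda a, b: b, lambda a, b: (not a) or b],
                        lambda a, b: a, 2)
# 4.1.5(9): A ∨ B ≡ B |=taut A -> B -- holds
assert taut_implies([lambda a, b: (a or b) == b],
                    lambda a, b: (not a) or b, 2)
```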

Before we describe what a logical proof is, we need some discussion and a definition.
4.1.6 Example (A Cautionary Tale) Consider the formula

(∃y)¬x = y

written more simply


(∃y)x ≠ y    (1)

(1) says "for any value of x there is a value of y that is different".
If the set where we do math contains two or more (distinct) elements —which is, in
practice, always the case— then (1) is true.
The same will be observed —they are true statements— if we substitute z or w for x to
obtain
(∃y)z ≠ y    "for any value of z there is a value of y that is different".    (1′)
and
(∃y)w ≠ y    "for any value of w there is a value of y that is different".    (1″)
However suppose I substitute y for x in (1). I obtain

(∃y)y ≠ y    (2)

which says “there is a value of y that is different from itself” —obviously a false statement.
What caused this distortion of (original) meaning is that an object that we substituted
into a free variable occurrence x contained a variable y that was “captured” by a quantifier,
as we say.

So we always disallow substitutions that cause capture! We say they are illegal or impossible. □

4.1.7 Definition Let A be a formula and x a variable. The symbol A[x] indicates our interest
in the possibly input variable x of A. If y and z are actually the only input (free) variables
of A I can indicate this without words by writing A(y, z).
I explain “possibly”. For example,

1. If A is y = z then x does not even occur in A. But I said possibly! I can still write A[x].
I can also write A(y, z).
2. In the case where A is (∀x)x = 1, A cannot receive inputs in the so-called bound variable
x. Even though I may write A[x], this is just wishful thinking and x does not occur free
in A or as we variously say x is not an input variable of A or A does not depend on x.
3. A is (∀x)x = y. I can write A[x] but x is actually not free in A. I actually have A(y).
4. Let now t be any term —a constant or variable or a function call.
Having declared interest in a (possibly) free variable of A by writing down A[x], I can
next write A[t] in the same context meaning the substitution of t into x —that is, a search
and replace operation: find all free occurrences of x in A and replace them all by t; but do
abort the entire substitution (illegal) operation if any replacement caused some variable
in t to become bound (capture).
In the first illustration above, and assuming t is g(w), we get A[t] is y = z.
If now B[x] is (∀w)w = x then B[t] is illegal since it means (∀w)w = g(w).
5. If I wrote A(y, z), then A(t, z) means A(g(w), z), which is legal if A is as in item 1 above. □
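The "abort on capture" clause of the definition can be modelled concretely. In this sketch (our own miniature encoding of formulas as nested tuples, not the book's notation), substitution returns None exactly when a variable of t would be captured:

```python
# Miniature model of Definition 4.1.7: atomic formulas are ('=', s, t);
# quantified ones are ('all', v, body). Terms are variable names (strings)
# or a tuple like ('g', 'w') standing for the call g(w). subst(A, x, t)
# performs A[t] and returns None when it must abort because of capture.
def term_vars(t):
    return {t} if isinstance(t, str) else set(t[1:])

def free_in(x, A):
    if A[0] == '=':
        return x in A[1:]
    return A[1] != x and free_in(x, A[2])       # ('all', v, body)

def subst(A, x, t):
    if A[0] == '=':
        return ('=', *(t if a == x else a for a in A[1:]))
    v, body = A[1], A[2]
    if v == x:
        return A                                # x is bound here: nothing to do
    if v in term_vars(t) and free_in(x, body):
        return None                             # capture! substitution is illegal
    new = subst(body, x, t)
    return None if new is None else ('all', v, new)

B = ('all', 'w', ('=', 'w', 'x'))               # B[x] is (∀w)w = x
assert subst(B, 'x', ('g', 'w')) is None        # B[g(w)] aborts: w is captured
assert subst(B, 'x', 'y') == ('all', 'w', ('=', 'w', 'y'))
```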

The job of a mathematical proof is to start from established truths (previous theorems) or assumed truths (axioms) and unfailingly preserve truth in all the proof's steps as we develop it.
This description is word-parsimonious and sounds circular: no chicken-and-egg dilemma here, however. "Previous theorems" can be used only if we have any of those at any given moment. Else we use just axioms.
In fact, the concept of proof is defined in terms of axioms (and rules of inference) alone. □
Thus, by the truth-preservation property, we will have produced, in particular, a truth at
the very last step of a proof. This is what we call a proved theorem.
What are our axioms, our starting assumptions, when we do proofs?

4.1.8 Definition First off, in any proof that we will write in math there are axioms that are
independent of the type of math that we do, whether it is set theory, number theory, algebra,
calculus, etc. Such axioms are called logical (logic-specific, that is).

Our logical axioms are

1. All tautologies; these need no defence as “start-up truths”.


2. Formulas of the form (∀x)A[x] → A[t], for any formula A, variable x and "object" t.
This object can be as simple as a variable y (might also be the same as x) or a constant c, or as complex as a "function call" f(t₁, t₂, …, tₙ), where f accepts n inputs, and the inputs shown here are already available objects.
A couple of comments: This is a bona fide start-up truth as it says "if A[x] is true for all x-values,4 then it is true also if we plug a specific value/object into x".
Refer also to Definition 4.1.7, which leads to one more comment: If A[t] is aborted due to capture, then axiom form 2 —or axiom schema 2, as we properly call such forms in writings on logic— does not contribute an axiom for the specific A, x and t that participated in the capture phenomenon. □
3. Formulas of the form A[x] → (∀x)A[x], for any formula A where x is not an input
variable (not free).
For example say A is 3 = 3. This axiom says then, “if 3 = 3 is true, then so is (∀x)3 = 3”.
Sure! 3 = 3 does not depend on x. So saying “for all values of x we have 3 = 3” is the
same as saying just “we have 3 = 3”.
4. Formulas of the form (∀x)(A[x] → B[x]) → (∀x)A[x] → (∀x)B[x], for any formulas
A, B, and variable x.
This is a useful start-up truth. We verify it informally (semantically) for the special case
that A, B have no other free variables besides (possibly) x: Since by truth tables we have
that X ∧ Y → Z and X → Y → Z are equivalent, we view this axiom as

(∀x)(A[x] → B[x]) ∧ (∀x)A[x] → (∀x)B[x]

Now the hypothesis of the last “→” says that

A[x] is true for all x (†)

and
A[x] → B[x] is true for all x (‡)
By (‡), every x that makes A true makes B true. But that is all values of x by (†). So
B[x] is true for all values of x as we wanted to verify.
5. For any choice of variable (here I use “x”) x = x is the identity axiom, no matter what
(is the value of) “x”. Note that “For any choice of variable” means that y = y and w = w
are also instances of the axiom.
6. x = y → y = x and x = y ∧ y = z → x = z are the equality axioms.
They can be expressed equally well using variables other than x and y (e.g., u, v and w).

4 Practitioners usually say "for all x", meaning for all values of x.

7. The ∃ vs. ∀ axiom. For any formula A and any choice of quantified variable, (∃x)A[x] ≡
¬(∀x)¬A[x] is an axiom.
Note that the right hand side of the "≡" says "it is not the case that all values of x make A[x] false". □


The “rules of proving”, or rules of inference. These are two rules provided up in front and
as such they are essential to start up the proof mechanism. To indicate this “start-up role”
we often call these rules primary or primitive.
Incidentally you will find I am grossly miscounting the rules in item 2 below:

4.1.9 Definition (Rules)

1. From A[x] I may infer (∀x)A[x]. Logicians write the up-in-front ("primary") rules as fractions without words:

        A[x]
    ──────────
     (∀x)A[x]

This rule we call generalisation, and we use the nickname "Gen".
2. I may construct (and use), using any tautological implication that I have verified, say,
one of this shape (form)

A₁, A₂, …, Aₙ |=taut B

the rule

    A₁, A₂, …, Aₙ
    ─────────────
          B

For example, seeing readily that A, A → B |=taut B, we have the rule

    A, A → B
    ────────  (MP)
        B

This is a very popular rule, known as modus ponens, in short MP.

It turns out that MP and Gen are all you need to prove theorems; thus, officially, they are THE two primary rules. However, additional tautological implications from 2 help in the practice of proofs.



1. Rule Use: How do we use rules? See also Definition 4.1.11 below. If in a proof that we
are writing we have already written all the formulas of the numerator of some rule, then
it is correct to write next (or at any later step) the denominator of the rule.

We say that we have applied the rule in order to obtain and write the denominator.

2. The second “rule” above is a rule constructor. Any tautological implication we come
up with is fair game: It leads to a valid rule since the name of the game (in a proof) is
preservation/propagation of truth.
This is not an invitation to learn and memorise infinitely many rules (!) but is rather a
license to build your own rules as you go, as long as you endeavoured to verify first the
validity of the tautological implication they are based on.
3. Gen is a rule that indeed propagates truth: If A[x] is true, that means that it is so for all
values of x and all values of any other variables on which A depends, which variables
I did not show in the [. . .] notation. But then so is (∀x)A[x] true, as it says precisely
the same thing: “A[x] is true, for all values of x and all values of any other variables on
which A depends but I did not show in the [. . .] notation”.
The only difference between the two notations is that I added some notational emphasis
in the second —(∀x).
For example, if I know that B has just two variables, u and v, I can write it as B(u, v).
Then

B(u, v) = t iff (∀u)B(u, v) = t iff (∀v)B(u, v) = t iff (∀u)(∀v)B(u, v) = t

4. Hmm. So is (∀x) redundant? Yes, but only as a formula prefix. In something like this

x = 0 → (∀x)x = 0 (1)

it is not redundant!
Dropping ∀ we change the meaning of (1).
As is, (1) is not a true statement (for all values of x, that is). For example, if we set x
to be 0, then (1) becomes the false statement 0 = 0 → (∀x)x = 0 since 0 = 0 is true
but (over the integers, say) “(∀x)x = 0” is f. However dropping (∀x), (1) morphs into
“x = 0 → x = 0” which is a tautology; always true.
5. The axioms in 4.1.8 are indispensable to do just logic; that is why we call them logical
axioms.
We also use them in all math reasoning no matter what type of math it is. However,
mathematical theories have their own additional axioms! These are called special axioms
but most often “mathematical axioms”.

We are not going to list them. Why? Because every math branch, or “theory” as we say,
has different axioms! Unless we do, say, (axiomatic) set theory there is absolutely no
need to list all the set theory axioms.


4.1.10 Example Here is only a sample of axioms from math (theories):

1. Number theory for N:

• x < y ∨ x = y ∨ x > y (trichotomy)


• ¬x < 0; this axiom indicates that 0 is minimal in N. Adding the previous one makes < a total order, so 0 is also minimum.
• Many other axioms that we omit.

2. Euclidean geometry:

• From two distinct points passes one and only one line.
• (“Axiom of parallels”) From a point A off a line named k —both A and k being on
the same plane— passes a unique line on said plane that is parallel to k.
• Many others that we omit.

3. Axiomatic set theory:

• For any set A,


 
(∃y)y ∈ A → (∃x)(x ∈ A ∧ ¬(∃z)(z ∈ x ∧ z ∈ A))

This is the axiom of “foundation” from which one can prove things like A ∈ A is
always false.
It says that IF there is any element in A at all —this is the hypothesis part “(∃y)y ∈
A”— THEN there is some element —this is the part “(∃x) x ∈ A”— below which, if
you follow “∈” backwards from it, you will not find a z (“¬(∃z)”) that is both below
x along ∈ backwards, and also a member of A —this part is “(z ∈ x ∧ z ∈ A)”.

4. And a few others that we omit. □

So what is the shape of proofs?



4.1.11 Definition (Proofs and theorems) A proof is a finite sequence of formulas

    F₁, F₂, …, Fᵢ, …, Fₙ    (1)

such that, for each i = 1, 2, …, n, Fᵢ is obtained as one of:

1. It is an axiom from among the ones we listed in 4.1.8.


2. It is an axiom of the theory (area of Mathematics) that we are working in.
3. It is a non-axiom hypothesis that we find convenient to assume (see examples below for
when such hypotheses become applicable).
In annotations of proofs we denote that a formula written down is a hypothesis by
labelling it “hyp” (no quotes).
4. It is the result of "Gen" (nickname for "generalisation") applied to a previous formula Fⱼ. That is, Fᵢ = (∀x)Fⱼ, for some x and j < i.
5. It is the result of "|=taut" applied to previous formulas Fⱼₖ, k = 1, 2, …, m. That is, Fⱼ₁, Fⱼ₂, Fⱼ₃, …, Fⱼₘ |=taut Fᵢ, and all jᵣ for r = 1, 2, …, m are < i.

Such proofs are known as “Hilbert-style proofs”. We write them vertically, one formula
per line, every formula consecutively numbered, with annotation to the right of formulas
(the “why did I write this?”). Like this

1) F₁    because …
2) F₂    because …
⋮
n) Fₙ    because …

Every Fn in (1) is called a theorem. Thus we define

A theorem A from Γ in theory T is a formula that appears in a proof in said theory
with hypotheses Γ. We may call A a Γ-theorem in T. We often omit the “in T”.

Often one writes ⊢ A to symbolically say that A is a theorem. If we must indicate that we
worked in some specific theory, say ZFC (set theory), then we may indicate this as

⊢_ZFC A

If moreover we have had some “non-axiom hypotheses” (read on to see when this happens!)
that form a set Γ, then we may indicate so by writing

Γ ⊢_ZFC A


4.1 Enriching Our Proofs to Manipulate Quantifiers 129

Why Γ for a set of (non-axiom) assumptions? Because we reserve upper case Latin letters
for formulas. For sets of formulas we use a distinguishable capital letter, so we chose
distinguishable Greek capital letters, such as Γ, Δ, Θ, Λ, Ξ, Π, Σ. Obviously, Greek capital
letters like A, B, E, Z will not do!
Before we do a few example proofs, some easy and some more complex, let us establish
a few properties of proofs.
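The bookkeeping in Definition 4.1.11 can be mimicked by a small proof checker. The Python sketch below (all names are ours) handles only hypothesis and axiom justifications plus MP, omitting Gen and tautological implication; implications are represented structurally as tuples ("->", A, B).

```python
def check(proof, axioms, hyps):
    """proof: a list of (formula, justification) pairs.

    A formula is a string (atom) or a tuple ("->", premise, conclusion).
    Justifications: ("axiom",), ("hyp",), or ("mp", i, j), where lines i and j
    (1-based, earlier in the proof) hold A and ("->", A, current formula).
    """
    lines = []  # formulas accepted so far
    for formula, just in proof:
        if just[0] == "axiom":
            ok = formula in axioms
        elif just[0] == "hyp":
            ok = formula in hyps
        elif just[0] == "mp":
            i, j = just[1], just[2]
            ok = (1 <= i <= len(lines) and 1 <= j <= len(lines)
                  and lines[j - 1] == ("->", lines[i - 1], formula))
        else:
            ok = False
        if not ok:
            return False
        lines.append(formula)
    return True


# A three-line proof of B from the hypotheses {A, A -> B} via MP:
proof = [
    ("A", ("hyp",)),
    (("->", "A", "B"), ("hyp",)),
    ("B", ("mp", 1, 2)),
]
hyps = {"A", ("->", "A", "B")}
```

check(proof, set(), hyps) accepts this proof; and since every line is validated against earlier lines only, every prefix proof[:k] is accepted as well — compare 4.1.12 below.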

4.1.12 Proposition If A1 , A2 , . . . , Ak , Ak+1 , . . . , An is a proof then so is A1 , A2 , . . . , Ak .

Proof The syntactic fitness for a non-axiom, non-hypothesis Ai for inclusion in a proof
depends on formulas to its left, not to its right. Thus dropping the “tail” Ak , Ak+1 , . . . , An
will leave a shorter proof A1 , . . . Ak . 

4.1.13 Corollary A theorem is precisely a formula at the end of some proof.

Proof Let A be a formula.


If it is at the end of a proof, then it is also “in” the proof, hence it is a theorem by 4.1.11.
Let then A appear as Ak in a proof of length n > k. Is there a proof with A, that is, Ak ,
at its end?
Yes. Just chop the tail Ak+1 , . . . , An . Now A finds itself at the end of a proof! 

4.1.14 Proposition (Proof Concatenation) If A1 , . . . , An and B1 , B2 , . . . , Bm are proofs,


then so is A1 , . . . , An , B1 , B2 , . . . , Bm .

Proof Exercise 4.2.4 

4.1.15 Corollary The concatenation of any finite number of proofs is a proof.

4.1.16 Proposition (Hypothesis Strengthening) If  ⊆  and   A, then also   A.

Proof Exercise 4.2.5 

4.1.17 Proposition (Quoting Theorems in a Proof) Proved theorems can be quoted


(included without proof) in a proof.

Proof Suppose we have proved A as Γ ⊢ A. With this fact in hand, a proof from Γ as
hypotheses is legitimate if along with quoting members of Γ, axioms of the theory at
hand, and logical axioms we also allow ourselves to quote A.

Let such a proof be


B1 , B2 , . . . , Bk , A, Bk+1 , . . . , Bn (1)
where for simplicity we quoted the Γ-theorem A only once in the above proof.
We view A as a “macro” whose expansion (its proof) we omitted from (1). The proof
(1) is legitimised by noting that it is only an abbreviation that omits A’s proof and that we
can add the proof at any time (see details below).
The benefit from just quoting A is that the part from Bk+1 to the end may refer to A in
applications of rules of inference but need not know or care what A’s proof is. Nothing from
the proof of A is made available (in (1)) to quote/use by B j , where j > k.
(1) would transform into the following sequence if we included the Γ-proof of A:

B1, B2, …, Bk, (…, A), Bk+1, …, Bn (2)

where “(…, A)” stands for the entire proof of A, ending with A.
As before, the part from Bk+1 to the end may continue to refer to A and again the formulas
other than A in the proof box for A are not used by proof (2).
Now consider: B1, B2, …, Bk and …, A are Γ-proofs, thus so is their concatenation
(4.2.4)

B1, B2, …, Bk, …, A

Finally, the tail part Bk+1 , . . . , Bn has all its B j pass the proof test (1.–5. in 4.1.11) as
these depend ONLY on the sequence B1 , B2 , . . . , Bk , A.
 In short, (2) IS a proof according to 4.1.11 

The whole point is: By the device of including not only A but also its (inaccessible)
proof we rendered (2) formally correct, unlike (1). We legitimised (1) as a practical
shorthand.

The following is related:

4.1.18 Proposition (Derived Rules of Inference) A derived rule of inference is usually


written as A1, A2, …, An ⊢ B rather than
A1 , A2 , . . . , An
B
They are applicable exactly as the primary rules are, that is, if we proved all of

1. Γ ⊢ A1
2. Γ ⊢ A2
3. Γ ⊢ A3
..
.
n. Γ ⊢ An

then we have Γ ⊢ B.

Proof We are given that we have proofs

A1 , A2 , . . . , An , . . . , B (1)

. . . , A1

. . . , A2

..
.

. . . , An

Concatenate the previous n + 1 derivations in this order

(…, A1), (…, A2), …, (…, An), A1, A2, …, An, …, B (2)

The first n proofs, as concatenated together, form a proof from Γ (4.1.14, corollary). Now
concatenating box (1) at the right end of the foregoing sequence we get the sequence (2).
Since all the Ai that are hypotheses in the proof (1) are copies of those proved from Γ
in the previous n boxes —and since in a proof we may repeat a formula that we proved
earlier as many times, and anywhere after its first occurrence, as we please— it follows that
the sequence (2) is a Γ-proof.

4.1.19 Example (New (derived) rules) A derived rule is one we were not given as primitive
—in 4.1.9— to bootstrap logic, but we can still prove that it propagates truth.

1. We have a new (derived) rule: (∀x)A[x] ⊢ A[t].


This is called Specialisation, or Spec.

Aha! We used a non-axiom assumption (hypothesis) here! I write a Hilbert proof to show
that A[t] is a theorem if (∀x)A[x] is a (non-axiom) hypothesis (assumption) —shortened
to “hyp”.

1) (∀x)A[x] hyp
2) (∀x)A[x] → A[t] axiom 2
3) A[t] 1 + 2 + MP

2. Taking t to be x we have (∀x)A[x] ⊢ A[x], simply written as (∀x)A ⊢ A.


3. The Dual Spec derived rule: A[t] ⊢ (∃x)A[x]. We prove it:

1) A[t] hyp
2) (∀x)¬A[x] → ¬A[t] axiom 2
3) A[t] → ¬(∀x)¬A[x] 2 + Post
4) ¬(∀x)¬A[x] 3 + MP

Line 4 contains (∃x)A[x] by axiom 7, or, if you prefer, “(∃x)A[x] is obtained from axiom 7
and an application of Post”.
Instead of “tautological implication” we may give as reason just “Post” (no quotes) —see line 3
above— since it is Post’s completeness theorem for Boolean logic that is at play when we
invoke “tautological implication” as a rule of inference (see 4.1.9, 2.)
Taking t to be x we have A[x] ⊢ (∃x)A[x], simply written as A ⊢ (∃x)A.
There are two principles of proof that we state without proof here (but you should try to
prove them as Exercises (Sect. 4.2) where several helpful hints are included.).
 4.1.20 Remark (Deduction theorem and proof by contradiction)

1. The deduction theorem (also known as “proof by assuming the antecedent”) states: if

Γ, A ⊢ B (1)

then also Γ ⊢ A → B, provided that in the proof of (1) all free variables of A were
treated as constants: that is, we neither used them to do a Gen, nor substituted objects
into them.

The notations “Γ, A” and “Γ + A” are standard for the more cumbersome Γ ∪ {A}.

In practice, this principle is applied to prove Γ ⊢ A → B by doing instead the “easier”
(1). Why easier? We are helped by an extra hypothesis, A, and the formula to prove,
B, is less complex than A → B.

2. Proof by contradiction. To prove Γ ⊢ A —where A has no free variables, that is, A is
closed, or a sentence— is equivalent to proving the “constant formula” f from the
hypotheses Γ, ¬A.
3. Why the burden of the non-axiom hypotheses Γ? Because in applying the deduction
theorem we usually start with a task like “do ⊢ A → B → C → D”.
So we go like this:
• By DThm, it suffices to prove A ⊢ B → C → D instead (here “Γ” was ∅).
• Again, by DThm, it suffices to prove A, B ⊢ C → D instead (here “Γ” was A).
• Again, by DThm, it suffices to prove A, B, C ⊢ D instead (here “Γ” was A, B).
 

4.1.21 Remark (Ping-Pong) For any formulas A and B, the formula —where I am using
way more brackets than I have to, ironically, to improve readability—
 
(A ≡ B) ≡ (A → B) ∧ (B → A)

is a tautology (draw up a truth table with one row for each of the possible values of A and
B and verify that the equivalence is always t).
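That truth table can be drawn up mechanically; a quick Python check of the ping-pong tautology over all four rows:

```python
from itertools import product

def imp(a, b):   # Boolean implication a -> b
    return (not a) or b

def iff(a, b):   # Boolean equivalence a ≡ b
    return a == b

# (A ≡ B) ≡ ((A → B) ∧ (B → A)) evaluates to t on every row:
assert all(
    iff(iff(A, B), imp(A, B) and imp(B, A))
    for A, B in product([False, True], repeat=2)
)
```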
Thus, to prove the lhs of the second ≡ it suffices to prove the rhs:

.. ..
. .
1) (A → B) ∧ (B → A)  suppose I proved this
2) (A ≡ B) ≡ (A → B) ∧ (B → A) tautology, hence also axiom
3) A≡B 1 + 2 + tautological implication

In turn, to prove the rhs it suffices to prove each of A → B and B → A separately. This last
idea encapsulates the ping-pong approach to proving equivalences.
Here are a few applications. 

4.1.22 Example 1. Establish ⊢ (∀x)(A ∧ B) ≡ (∀x)A ∧ (∀x)B.


By ping-pong.

• Prove ⊢ (∀x)(A ∧ B) → (∀x)A ∧ (∀x)B. By DThm it suffices to do (∀x)(A ∧ B) ⊢
(∀x)A ∧ (∀x)B instead.

1) (∀x)(A ∧ B) hyp
2) A∧B 1 + Spec
3) A 2 + tautological implication
4) B 2 + tautological implication
5) (∀x)A 3 + Gen; OK : x is not free in line 1
6) (∀x)B 4 + Gen; OK : x is not free in line 1
7) (∀x)A ∧ (∀x)B 5 + 6 + tautological implication

• Prove ⊢ (∀x)A ∧ (∀x)B → (∀x)(A ∧ B). By DThm it suffices to do (∀x)A ∧ (∀x)B ⊢
(∀x)(A ∧ B) instead.

1) (∀x)A ∧ (∀x)B hyp


2) (∀x)A 1 + tautological implication
3) (∀x)B 1 + tautological implication

Complete the above proof!

2. Prove ⊢ (∀x)(∀y)A ≡ (∀y)(∀x)A. By ping-pong.

• Prove ⊢ (∀x)(∀y)A → (∀y)(∀x)A.

By DThm it suffices to do (∀x)(∀y)A ⊢ (∀y)(∀x)A instead.

1) (∀x)(∀y)A hyp
2) (∀y)A 1 + Spec
3) A 2 + Spec
4) (∀x)A 3 + Gen; OK, no free x in line 1
5) (∀y)(∀x)A 4 + Gen; OK, no free y in line 1

• Prove ⊢ (∀y)(∀x)A → (∀x)(∀y)A.


Exercise! 

 We have seen how to add an (∃x) in front of a formula (4.1.19 3.).


How about removing an (∃x)-prefix? This is much more complex than removing a (∀x)-
prefix:
The technique can be proved to be correct (e.g., Tourlakis (2003a)) but I will omit the
proof here —albeit I will ask you to prove it in the Exercises Sect. 4.2 with hints.
Note that I also omitted the proof of the deduction theorem technique. This I will help
you prove in the Exercises section.

In preparation for the removal of an ∃-prefix proof we will need an important and very
useful result, that of renaming the bound variable:
 4.1.23 Definition (Substitution Again) Recall Definition 4.1.7. We indicate there that once
we declared our interest in the (possibly) free variable x of A by writing A[x], we can in
the same context write A[t] to indicate that all the free occurrences of x (if any) in A are
everywhere replaced by the object (term) t. Recall the caution needed in such substitutions
(4.1.6).
Here we add an explicit notation for the process “find and replace by the term t all the
free occurrences of x in A”: The symbol is A[x ← t].
The substitution operation compound symbol [x ← t] is viewed as an operator or
connective and as such it has the highest priority of all connectives.

Thus, if we write A ∧ B[x ← t] then we mean A ∧ (B[x ← t]). Also, (∀y)A[x ← t]
means (∀y)(A[x ← t]), thus there is no capture of y (if it appears in t) since the substitution
took effect before the quantifier was applied. For example, in obtaining (∀x)x = 0 from x = 0
we do not speak of capture of x! It is just the process of formula formation!

 4.1.24 Theorem Technique of removing an ∃-prefix: Suppose I have that (∃x)A[x] is


true —either as an assumption or a theorem that I proved earlier— and I want to use this
and prove B.
Then I assume that A[z] is true —for some new variable z that does not occur in B nor
in (∃x)A —we call such a variable “fresh”.

Logicians annotate this step in a proof as “aux. hyp. associated with (∃x)A[x]”.

Now proceed to prove B using all that is known to you —that is, the axioms of the theory
T that you work in, perhaps some non-axiom hypotheses Γ, and (∃x)A[x], and the new
non-axiom hypothesis A[z].

Do so by using all free (input-) variables of A[z] as constants in your proof!a


a This is a side-effect of using the deduction theorem in the proof of correctness of this ∃-
elimination technique. See 4.2.11.

The technique of removing an ∃-prefix guarantees that you did better than

Γ, A[z] ⊢_T B

In fact, you actually achieved

Γ ⊢_T B

as if you never assumed nor used A[z]!


That is why they call it “auxiliary hypothesis”. Once it helps you prove B it drops out;
it does not stay around to get credit!

4.1.25 Example In practice we often have an assumption (∃x)Q from which we want to
eliminate (∃x) to benefit from the (possibly) uncovered Boolean structure of Q. This fits
with the theorem above taking Γ = {(∃x)Q}.
For example, prove ⊢ (∃y)(∀x)A[x, y] → (∀x)(∃y)A[x, y].
By the DThm it suffices to prove (∃y)(∀x)A[x, y] ⊢ (∀x)(∃y)A[x, y] instead.

1) (∃y)(∀x)A[x, y] hyp
2) (∀x)A[x, z] aux. hyp. caused by 1; z is some fresh variable, not in the conclusion
3) A[x, z] 2 + Spec
4) (∃y)A[x, y] 3 + Dual Spec
5) (∀x)(∃y)A[x, y] 4 + Gen; OK, no free x in lines 1 and 2

 I said in line 2, “z is some fresh variable, not in the conclusion”. Doesn’t “fresh” cover the
“not in the conclusion?” NO! “Fresh” ensures that none of the lines before the introduction
of z contain it. Freshness is not global to the proof! So, non-occurrence in B must be added
explicitly.


4.1.26 Example Can I also prove the converse of the above? That is, ⊢ (∀x)(∃y)A[x, y] →
(∃y)(∀x)A[x, y]?
I will try.

By the DThm it suffices to prove (∀x)(∃y)A[x, y] ⊢ (∃y)(∀x)A[x, y] instead.

1) (∀x)(∃y)A[x, y] hyp
2) (∃y)A[x, y] 1 + Spec
3) A[x, z] aux. hyp. for 2; z is fresh and not in the conclusion
4) (∀x)A[x, z] 3 + Gen; Hmm!
Illegal: I should treat the free x of aux. hyp. as a constant!

Still, can anyone prove this even if I cannot?

A question like this, if you are to answer “no”, must be resolved by offering a coun-
terexample. That is, a special case of A for which I can clearly see that the claim is not
true.
Here is one such special case:

(∀x)(∃y)x = y → (∃y)(∀x)x = y (1)

Say we work in N. The lhs of → is true, but the rhs is false as it claims that there is a number
such that all numbers are equal to it. So the implication fails in the special case, invalidating
also the general case (if the general case were valid, so would be any of its special cases!).
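The counterexample can be checked concretely over a small finite domain standing in for N, with Python's all and any playing the roles of ∀ and ∃:

```python
D = range(5)  # a finite stand-in for N

# lhs: (∀x)(∃y) x = y — every x equals *some* y (namely itself): true.
lhs = all(any(x == y for y in D) for x in D)

# rhs: (∃y)(∀x) x = y — some single y equals *every* x: false here.
rhs = any(all(x == y for x in D) for y in D)

assert lhs and not rhs  # the implication lhs → rhs fails on this domain
```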

 Another useful principle that can be proved, but we will not do so, is that one can replace
equivalents-by-equivalents. That is, if C is some formula, and if I have




1. A ≡ B, via proof, or via assumption, and also


2. A is a subformula of C

then I can replace one (or more) occurrence(s) of A in C (as subformula(s)) by B and call
the resulting formula C′, and be guaranteed the conclusion C ≡ C′. That is, from A ≡ B, I
can prove C ≡ C′.
This principle is called the equivalence theorem. 
Let’s do a couple of ad hoc additional examples before we move to the section on Induc-
tion.

4.1.27 Example A → B ⊢ (∀x)A → (∀x)B.

By the DThm it suffices to prove A → B, (∀x)A ⊢ (∀x)B instead.

1) A→B hyp
2) (∀x)A hyp
3) A 2 + Spec
4) B 1 + 3 + MP
5) (∀x)B 4 + Gen; OK as the DThm hypothesis (line 2) has no free x 

4.1.28 Example Refer to 4.1.8(7). Let us apply it to ¬A for arbitrary A. We get

⊢ (∃x)¬A ≡ ¬(∀x)¬¬A (1)

We apply the equivalence theorem above. To this end, note that A ≡ ¬¬A is a tautology,
hence an axiom in group 1, and thus a theorem: ⊢ A ≡ ¬¬A. Applying the equivalence
theorem to (1) we thus obtain:
⊢ (∃x)¬A ≡ ¬(∀x)A (2)
Applying another tautological implication to (2) we get

⊢ (∀x)A ≡ ¬(∃x)¬A

which is of the same form as 4.1.8(7) with the roles of ∃ and ∀ reversed.
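Over a finite domain this duality is precisely the familiar relationship between Python's all and any; a quick check on a few sample predicates:

```python
D = range(10)

# (∀x)A ≡ ¬(∃x)¬A, checked pointwise for several predicates A:
for A in (lambda x: x >= 0, lambda x: x % 2 == 0, lambda x: x < 5):
    forall = all(A(x) for x in D)          # (∀x)A
    dual = not any(not A(x) for x in D)    # ¬(∃x)¬A
    assert forall == dual
```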

4.1.29 Example A ≡ B ⊢ (∀x)A ≡ (∀x)B.


True due to the equivalence theorem! “C” is “(∀x)A”. We replaced (one occurrence of)
A by B in C, and we have assumed as starting point that A ≡ B. 

4.1.30 Exercise Prove A ≡ B ⊢ (∀x)A ≡ (∀x)B without relying on the equivalence
theorem. Rather, use 4.1.27 in your proof, remembering the ping-pong tautology (4.1.21).

4.1.31 Example (A-Intro, or ∀-Intro) We establish here the theorem “A → B ⊢ A →
(∀x)B, provided A contains no free x”, or, as one often says, “there is no free x in the
conclusion” (right hand side of ⊢).
This is proved without the Deduction theorem as we will use this result in the exercises
section (4.2) towards proving the Deduction theorem.

1) A→B hyp
2) (∀x)(A → B) 1 + Gen
3) (∀x)(A → B) → (∀x)A → (∀x)B axiom 2
4) (∀x)A → (∀x)B 2 + 3 + MP
5) A → (∀x)A axiom 3
6) A → (∀x)B 4 + 5 + Post 

 4.1.32 Example (Variant Theorem for ∀) Another useful result that practitioners use
without quoting and without notice is the “bound variable renaming”, which some people,
uncharitably, call the “dummy renaming” theorem. In Shoenfield (1967), Tourlakis (2003a)
it goes as the variant theorem. Suppose that the variable z is fresh for (∀x)A[x]. Then we
have the theorem

⊢ (∀x)A[x] ≡ (∀z)A[z]

The proof uses ping-pong and is straightforward, except that in the “← Direction” below it
requires some combinatorial thinking that is not part of logic.

→ Direction. Note that step 1 below has a legal substitution [x ← z] since freshness of
z entails that no part of A is “(∀z)(. . . x . . .)” to give trouble when we do [x ← z].

1) (∀x)A[x] → A[z] axiom 2


2) (∀x)A[x] → (∀z)A[z] 1 + A-Intro (4.1.31); recall that z is fresh for (∀x)A

← Direction. Here we start the two-line proof with “(∀z)A[z] → A[z][z ← x]”. We
need to argue that “A[z][z ← x]” is the same as “A[x]” or just “A” in simpler notation.

First off, the rightmost substitution in A[z][z ← x] is legal. Why? Because there is NO
(∀x)(. . . z . . .)-part in A[x ← z] to capture x. Note that

There are NO preexisting z in A since z is fresh for (∀x)A.

Thus a z could only appear in (∀x)(. . . ∗ . . .) in the spot ∗ iff “∗” were a free x
—impossible when such an x is in the scope of (∀x).

Indeed, we see that A[z][z ← x] is A[x ← z][z ← x]. Now, we already noted that
A[x ← z] is legal. At the end of this operation we introduce the symbol “z” in precisely
those spots where A held originally free occurrences of x.
But then, A[x ← z][z ← x] will change back to x precisely those z that were originally
free x. Seeing that there were no preexisting z, all z change back to x. A is restored.
Now the ← Direction proof.

1) (∀z)A[z] → A[z][z ← x] axiom 2
1′) (∀z)A[z] → A[x] discussion above: “A[x ← z][z ← x]” is “A[x]”
2) (∀z)A[z] → (∀x)A[x] 1′ + A-Intro (4.1.31); recall that z is fresh hence not the same as x

 

4.2 Exercises

1. Define: The formula A is true over the real numbers R.


2. Let, non standardly, A ∨ B be defined to be true over some domain of interest just in
case A is true, or B is true, or both.
Let now A stand for “x is even” and B stand for “x is odd”, x varying over N. Is A ∨ B
true according to the above non standard definition?
3. a. Show through a general (syntactic) proof that x < y ⊢ y < x (< is an uninterpreted
relation; the choice of symbol here is meant to provoke).
b. Show that ⊬ x < y → y < x. (Hint: Show that x < y → y < x cannot possibly be
true over, say, the real numbers.)
c. Does this invalidate the deduction theorem? Explain.
4. Prove Proposition 4.1.14.
5. Prove Proposition 4.1.16.
6. (∃-Introduction) Prove that if x is not free in the conclusion, then A → B ⊢ (∃x)A →
B. This derived rule is also called “L∃”, since ∃ is introduced to the left. Hint. Use
axiom 7 in conjunction with 4.1.31.
7. (Variant theorem for ∃) Suppose that the variable z is fresh for (∃x)A[x]. Then we have
the theorem

⊢ (∃x)A[x] ≡ (∃z)A[z]

8. (The Deduction Theorem) The reader is asked here to prove the Deduction Theorem:

If A is a sentence, or closed —meaning it has no free variables— then from

Γ, A ⊢ B (1)

follows

Γ ⊢ A → B

Hints. For the proof.


Do induction on the length of the proof indicated by (1). (You may want to come
back to this after you study the next chapter; but only if induction intimidates you.)
For the Basis we have a proof of length one, so it contains only B, in which case we
have subcases

(i) B is an axiom. Then Γ ⊢ B (justify!) and hence Γ ⊢ A → B (justify “hence”!).
(ii) B is in Γ. As in item (i).
(iii) B is A (“Γ, A” is an alias of Γ ∪ {A} or “Γ + A”. Thus it is the “whole hyp”
in (1)). A → B is thus the tautology A → A, hence an axiom in group 1. Why
should you conclude Γ ⊢ A → A, that is, Γ ⊢ A → B?

For the Induction Hypothesis (I.H.) fix an n and assume all Γ + A-proofs of lengths
≤ n satisfy the theorem.
We now embark on discussing a Γ + A-proof of length n + 1.

A1 , A2 , . . . , A j , . . . , Ak → B, . . . , An , B (1)

Cases for B:

a. B is placed because it is in Γ, or is A, or is a logical axiom. No work needed as
all these cases were discussed in the Basis.
b. B was obtained by MP from two previous formulas in (1), say Ak and Ak → B.
By the I.H. we have
Γ ⊢ A → Ak (2)
Γ ⊢ A → (Ak → B) (3)
See if you can now prove

Γ ⊢ A → B

c. B was obtained by Gen from one previous formula in (1), say Aj, that is, B is
(∀x)Aj. By the I.H. we have

Γ ⊢ A → Aj (4)

See if you can prove that Γ ⊢ A → B, which is the same as Γ ⊢ A → (∀x)Aj.

End of Hints.
9. (The Deduction Theorem version 2) The reader is asked here to prove this version of
the Deduction Theorem:

Suppose that
Γ, A ⊢ B (1)
and that there is a proof of this fact that treated all the free variables of A as
constants, meaning, if x is such a variable, then we never used it in said proof with
(∀x) nor with [x ← t]. Under these conditions prove that we have

Γ ⊢ A → B

10. (Proof by Contradiction) Let us (somewhat informally) consider, as we did before, the
truth values f and t as “constant” atomic (i.e., devoid of connectives hence of Boolean
structure) formulas.
We then state the principle of proof by contradiction as “to prove Γ ⊢ A, where A has
no free variable, is the same as proving a falsehood, such as f, from premises Γ + ¬A”.
Thus, prove that, for a sentence A, we have Γ ⊢ A iff Γ, ¬A ⊢ f.
Hint. In the if-direction use the deduction theorem to obtain Γ ⊢ ¬A → f. Follow up
with an application of Post.
In the only-if-direction use 4.1.16 to show Γ, ¬A ⊢ A and then the definition of proof
to also show Γ, ¬A ⊢ ¬A. Follow up these two with an application of Post.
11. (Proof by Auxiliary Hypothesis, or ∃-Elimination) See also 4.1.24. Suppose that Γ ⊢
(∃x)A and let z be fresh for (∃x)A and B. Then

If Γ, A[x ← z] ⊢ B, we will also have Γ ⊢ B

Hint. The assumption is that we have a proof with the help of the auxiliary hypothesis
A[x ← z]:

Γ, A[x ← z] ⊢ B

Treating all the variables (incl. z) of the auxiliary hypothesis as constants we apply the
deduction theorem to get Γ ⊢ A[x ← z] → B.
By 4.2.6 obtain Γ ⊢ (∃z)A[z] → B. Then ⊢ (∃x)A[x] ≡ (∃z)A[z] and the previous yield
Γ ⊢ (∃x)A[x] → B by Post. Now use one of the assumptions and Post to get Γ ⊢ B.
12. Prove by ∃-elimination that ⊢ (∃x)((A ∨ B) → C) → (∃x)(A → C) ∧ (∃x)(B → C).
13. Find a proof other than via ∃-elimination for the above.
14. Prove by ∃-elimination that ⊢ (∃x)((A ∨ B) → C) → (∃x)((A → C) ∧ (B → C)).
15. Find a proof other than via ∃-elimination for the above.
16. Prove by ∃-elimination that ⊢ (∃x)((A → B) ∧ (A → C)) → (∃x)(A → B ∧ C).
17. Find a proof other than via ∃-elimination for the above.
18. Prove by ∃-elimination that ⊢ (∀x)A → (∃x)(A → B) → (∃x)B.
19. Prove by ∃-elimination that ⊢ (∃x)A → (∀x)(A → B) → (∃x)B.
20. Find a proof other than via ∃-elimination for the above.
21. Let φ stand for an unspecified relation of two variables.
This could be anything like: <, >, ≤, ≥, ∈, ∉, ⊂ and many others!
Prove (1) within pure logic, that is, logic without any theory-specific axioms, no math
hypotheses, and no special meaning for symbols. Anyway, “symbols” —other than logical
symbols, that is, connectives, brackets and =— never have any inherent meaning
relating to their shape if there are no axioms about said symbols!

⊢ ¬(∃y)(∀x)(φ(x, y) ≡ ¬φ(x, x)) (1)

Hint. Prove (1) via a proof by contradiction. In the initial setup you end up with a formula,
which has a leading (∃y) and it can prove f. Now apply ∃-elimination to construct the
latter proof.
22. What just happened in 21 above? Contextualise within set theory and discuss.
5 Induction

Overview

This chapter is about two of the most important topics in a course on discrete mathematics
—induction and inductive definitions. Nowadays most authors prefer to call “inductive”
definitions “recursive”. These topics are called upon in numerous sequel courses such as
logic, data structures, theory of computation, design and analysis of algorithms.
In this chapter we introduce the induction (proof) principle on N as an equivalent principle
to the least (integer) principle on N.
But we also generalise induction in two important directions making this tool sophis-
ticated enough to be applicable to advanced readings, for example (axiomatic) set theory,
which is relevant to mathematics students: One direction is to recognise that the induction
principle (equivalently, the minimal condition or principle, MC, which is a generalisation of
the least principle on N) which on N is an attribute of the “natural” order <, can be extended
to arbitrary orders on arbitrary classes. This opens applicability of induction to any classes
that are equipped with an order that has MC.
The other direction of our generalisation is to recognise that a relation —whether we
denote it by “P” or “<”— does not have to be an order for us to do induction along it. All
it needs is to satisfy MC. For example, we can do induction along ∈ —this is not an order
on all sets (it fails transitivity)— to prove properties of classes!
The chapter concludes with the important topic of inductive or recursive definitions of
functions, such as, for example, the recursive definition of the factorial function 0! = 1 and
(n + 1)! = (n + 1) × (n!), for n ≥ 0. Such inductive definitions are central in the theory of
computation, and in practical computation in fact, since it turns out that in practical
computation we cannot compute functions beyond the so-called primitive recursive functions;
we cannot even compute all primitive recursive functions (except only “in principle”) due to

the fact that “most of them” have astronomical outputs —like the function that outputs the
“ladder” of x 2s on input x below1— and equally astronomical run times needed for their
computation. For example, it can trivially be proved that the function that with input x outputs
the following number is primitive recursive:

2^2^···^2 (a ladder of x 2s, associated to the right)
What is the connection between primitive recursive functions and recursive definitions?
Primitive recursive functions (cf. e.g., Tourlakis (2022)) are formed by starting from trivial
functions, such as the function that for all inputs returns zero, using repeatedly compositions
and so-called “primitive” recursive definitions. A primitive recursive definition is a special
simple form of a general inductive definition where f is defined from functions h and g by
the two equations below, valid for all x, y from N,

f (0, y) = h(y)
f (x + 1, y) = g(x, y, f (x, y))

Definitions just as the above are based on the fact that N supports induction along the
standard “<”. We will not omit a generalisation of recursive definitions along any relation P
that may have MC without being an order. As an example, taking P = ∈ we give an inductive
definition over the class U of the so-called “support” function that for any set A as input
returns the set of all the atoms used to build A. For example, if A = {{{2}}, {1}}, it returns
{2, 1}.
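The primitive recursion schema displayed in this overview is directly executable. A Python sketch (the names prim_rec and fact are ours), with the factorial as the instance h(y) = 1 and g(x, y, r) = (x + 1) · r:

```python
def prim_rec(h, g):
    """Return f with f(0, y) = h(y) and f(x + 1, y) = g(x, y, f(x, y))."""
    def f(x, y):
        acc = h(y)
        for i in range(x):       # unfold the recursion bottom-up
            acc = g(i, y, acc)
        return acc
    return f


# factorial: 0! = 1 and (x + 1)! = (x + 1) * x!  (y is a dummy parameter here)
fact = prim_rec(lambda y: 1, lambda x, y, r: (x + 1) * r)
```

For example, fact(5, None) unfolds the recursion five times and yields 5! = 120.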

5.1 Inductiveness Condition (IC)

In Remark 3.4.29 we concluded with a formulation of the minimal condition (MC) for any
order <, for which fields (left/right) have not been specified, an unrelativised order, that is.
We did this as follows:
That an “order < has MC” is captured by —i.e., is equivalent to— the statement
For any “property”, that is, formula F[x]2 we have that the following is true
  
(∃a)F[a] → (∃a)(F[a] ∧ ¬(∃y)(y < a ∧ F[y])) (†)

More generally, for a non-order relation P we saw that “P has MC” (see also 3.4.30 for
the concept “a is P-minimal in A”) is expressed in terms of classes (3.4.32 , (see (1 ))) as

1 2^2^2 = 16 but 2^2^2^2 = 65536 while 2^2^2^2^2 is astronomical.
2 Recall that this notation, square brackets, indicates our interest in one among the, possibly many,
free variables of F.

A ≠ ∅ → (∃a ∈ A) A ∩ (a)P−1 = ∅ (1′)

while in terms of “properties” F[x] it is expressed by 3.4.32 (see (2′)), reproduced below.

(∃a)F(a) → (∃a)(F(a) ∧ ¬(∃y)(yPa ∧ F(y))) (2′)

which formally looks exactly like (†) above but using the symbol “P” instead of “<”.

Thus, (†) and (2′) are formally (in form!) identical, but semantically we will need to
recall that “<” represents any order with MC in (†) while “P” represents any relation
with MC in (2′) (and (1′)).

Let us logically manipulate (2′) to bring it into an equivalent form that goes under the
nickname “Inductiveness Condition” —in short IC— or, alternatively, is called the “Principle
of Induction.”
Let us rewrite (2′) replacing F[x] by ¬G[x] everywhere, where G[x] is arbitrary. We
get the theorem

(∃a)¬G[a] → (∃a)(¬G[a] ∧ ¬(∃y)(yPa ∧ ¬G[y])) (2)

Using the equivalence theorem (p. 137) and Axiom 7 (p. 125), we obtain from (2)
  
¬(∀a)G[a] → ¬(∀a)¬(¬G[a] ∧ (∀y)¬(yPa ∧ ¬G[y]))

and then —the tautology (X → Y) ≡ (¬Y → ¬X), known as “contrapositive”, is used—
also

(∀a)¬(¬G[a] ∧ (∀y)¬(yPa ∧ ¬G[y])) → (∀a)G[a]
Using the tautology
¬(A ∧ B) ≡ ¬A ∨ ¬B
and the equivalence theorem, we transform the above to this theorem:
  
(∀a)(G[a] ∨ ¬(∀y)(¬yPa ∨ G[y])) → (∀a)G[a]

Again, this time using the tautology

¬A ∨ B ≡ A → B

(twice) and the equivalence theorem, we transform the above to this theorem:
   
(∀a)((∀y)(yPa → G[y]) → G[a]) → (∀a)G[a] (3)

Display (3) above expresses the Inductiveness Condition (IC) for P, or, as we usually say,
expresses the principle of strong induction, or complete induction, or course-of-values
induction for the —not necessarily an order— relation P.
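Computationally, this principle corresponds to course-of-values recursion: the value at b may draw on the values at all its predecessors. A Python sketch over N with P = < (the names cov and catalan are ours; step receives n together with the list of all earlier values):

```python
def cov(step):
    """Return f where f(n) = step(n, [f(0), ..., f(n-1)])."""
    cache = []
    def f(n):
        while len(cache) <= n:
            k = len(cache)
            cache.append(step(k, cache))  # cache holds f(0), ..., f(k-1)
        return cache[n]
    return f


# Catalan numbers: C(0) = 1 and C(n) = sum of C(i) * C(n-1-i) over i < n —
# each value genuinely uses *all* of its <-predecessors.
catalan = cov(lambda n, prev: 1 if n == 0 else
              sum(prev[i] * prev[n - 1 - i] for i in range(n)))
```

For instance, catalan(5) is computed from catalan(0) through catalan(4) and equals 42.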

5.1.1 Remark The above method of showing the equivalence between MC and IC is not
mentioned much in the literature (see however Barwise (1975) who applies it in the case
where P is ∈ on U). 

We should state an obvious and trivial corollary.

5.1.2 Corollary An order < has MC iff it has IC.

Proof We never relied on whether or not P is an order, so the preceding proof of the
equivalence of MC and IC holds if P is, in particular, an order <. In fact, a proof for an
order < is obtained from the above simply by replacing P, everywhere in the reasoning,
by <.

If we replace, everywhere in (3), the formula G[y] by the class B =Def {y : G[y]}, then we
directly obtain (3′) below from (3):

(∀a)((a)P−1 ⊆ B → a ∈ B) → (∀a)a ∈ B (3′)

Note that the y in (a)P−1 are the P-predecessors of a in the sense that they are precisely
those y that satisfy yPa —“y is before a along P”.
It is extremely useful to state (3′) in words.

If we want to prove that all a are in some class B and we have a relation P with IC
(equivalently MC), then it suffices to prove, for any arbitrary unspecified a, that a is
in B provided all its P-predecessors are.

The part “for any arbitrary unspecified a” is English for the part (∀a). For such an a we
prove a ∈ B with the help of the condition (assumption) (a)P−1 ⊆ B. The last implication
in (3′) says that “for any arbitrary unspecified a, a is in B, unconditionally”.
The boxed formula above is called the Induction Hypothesis, or I.H., for a —and so is the
corresponding part “(∀y)(yPa → G[y])” in (3).
The essence of the I.H. in either formulation —(3) or (3′)— is that it assists in the proof
of the leftmost (conditional) “a ∈ B” (or “G[a]”) for the “arbitrary unspecified a”. Having
proved a ∈ B under the I.H., the fact that our P has IC also implies (the last implication in
(3) or (3′)) the unconditional truth of a ∈ B for the arbitrary a.

Thus the rightmost (unconditional) “a ∈ B” is established for all a only from the axioms
and assumptions of the theory we are working in (e.g., set theory, number theory, etc)
and the I.H. drops out from the hypotheses list.

Of course, (∀a)a ∈ B implies U ⊆ B, hence B = U, the set theoretic class of “all things”.
This is fine for (informal) set theory, indeed useful, but we often work within much smaller
than U classes A.
For example in number theory we work in N. Then the classes B of interest will be subsets
of N. Therefore let us formulate IC relativised to a class or set A before we move on to
practical considerations and examples.

We reproduce here the formula (†) that says “P has MC relative to a class A” (cf. 3.4.37)
—which is the same as “P | A has MC” (Definition 3.4.34):

    (∃b ∈ A)F[b] → (∃b ∈ A)( F[b] ∧ ¬(∃x ∈ A)( F[x] ∧ xPb ) )    (†)

Expressing (†) in terms of the formula ¬G[x] instead of F[x] we obtain

    (∃b ∈ A)¬G[b] → (∃b ∈ A)( ¬G[b] ∧ ¬(∃x ∈ A)( ¬G[x] ∧ xPb ) )    (†′)

Applying the very same transformations we introduced on p. 145 to (†′) we obtain the
equivalent formula (‡) below

    (∀b ∈ A)( (∀x ∈ A)( xPb → G[x] ) → G[b] ) → (∀b ∈ A)G[b]    (‡)

If we replace, everywhere in (‡), the formula G[y] by the class B =Def {y ∈ A : G[y]}, then
we directly obtain (‡′) below from (‡):

    (∀b ∈ A)( (b)(P−1 | A) ⊆ B → b ∈ B ) → (∀b ∈ A) b ∈ B    (‡′)

or, by 3.4.31 2. (cf. also 3.4.5),

    (∀b ∈ A)( A ∩ (b)P−1 ⊆ B → b ∈ B ) → (∀b ∈ A) b ∈ B    (¶)

Let us render (‡) more recognisable: By applying MP (modus ponens, cf. rule (MP) on
p. 125) I can transform (‡) into “rule of inference form”; indeed I will write it like a rule that
says, like all rules do, “if you proved my numerator, then my denominator is also proved!”

    (∀b ∈ A)( (∀x ∈ A)( xPb → G[x] ) → G[b] )
    ─────────────────────────────────────────
                (∀b ∈ A)G[b]

Dropping the (∀b ∈ A)-prefix³ we have the rule in the form:

    (∀x ∈ A)( xPb → G[x] ) → G[b]
    ─────────────────────────────  (CVI)
               G[b]

where the antecedent “(∀x ∈ A)( xPb → G[x] )” is labelled I.H., and proving the numerator's
consequent G[b] is the I.S.
“CVI” for Course-of-Values Induction. CVI says

To prove G[b] (for all b ∈ A is implied!) do as follows:


 
Step (a) Fix an arbitrary b-value. Now, assume G[y] for all y ∈ A that satisfy yPb. We call
the assumption the Induction Hypothesis (for y), in short, I.H.
Step (b) Next, using the I.H. —and the axioms / assumptions of the theory you are working
in— prove G[b], for the same fixed unspecified b. This proof step we call the
Induction Step or I.S.
 Note that what is described by (a) and (b) is precisely an application of the Deduction
theorem towards proving “If, for all xPb,⁴ G[x] is true, then G[b] is true”, that is,
proving the implication on the numerator of CVI for any given b. 
Step (c) If you have done Step (a) and Step (b) above, then you have proved G[x] (for all
x ∈ A is implied!)

 Important. Step (a) above says “arbitrary b”. So, I should not leave any b-value out of
the proof!
The case where b is P-minimal in A is singular (no pun intended). How do I prove the
I.S. when there is no x below b along P in A? There is no I.H. to rely on. No problem: The
numerator implication in CVI now reads

    (∀y ∈ A)( yPb → G[y] ) → G[b]

where yPb is now false (f) for every y, so each implication yPb → G[y] —and hence the
quantified antecedent— is true (t).
The lhs of the second “→” is true. Thus, to certify the truth of that implication I must prove
G[b] without I.H. help.
This step was hidden in Steps (a) – (b) above. It is called the Basis of the induction. 

³ In Chapter 4 we noted that A[x] is true iff (∀x)A[x] is true.
⁴ x in the class of interest A.

5.2 IC Over N

With the above general considerations in hand, the present section focuses on some practice
with induction over N.
Taking P here to be the order < restricted to N, and taking for granted that < | N has
MC (as we argued informally in 3.3.14; but see also the counterpoint in the -delimited
comments on p. 151!), we conclude that < | N also has IC.

Thus we have, for some arbitrary property P[y] of natural numbers, the special form of
(‡) below:

    (∀n ∈ N)( (∀k ∈ N)( k < n → P[k] ) → P[n] ) → (∀n ∈ N)P[n]

Dropping the ∀-prefix we have the above in “rule form”:

    (∀k ∈ N)( k < n → P[k] ) → P[n]
    ───────────────────────────────  (CVI on N)
                P[n]

where, again, the antecedent “(∀k ∈ N)( k < n → P[k] )” is the I.H., and proving the
numerator's consequent P[n] is the I.S.

5.2.1 Remark Of course, we have the proof technique we called CVI (after Kleene) for
any POset (A, <) where < has IC (equivalently MC) over A, that is, < | A has IC. 

There is another, simpler, induction principle over N that we call, well, simple induction:

    P[0],   P[x] → P[x + 1]
    ───────────────────────  (SI)
            P[x]
“SI” above stands for Simple Induction. That is, to prove P[x] for all x (denominator) do
three things:
Step 1. Prove/verify P[0]
Step 2. Assume P[x] for fixed (“frozen”) x (unspecified!).
Step 3. Prove P[x + 1] for that same x. The assumption in Step 2. is the I.H. for simple
induction. The I.S. is the step that proves P[x + 1].
 Note that what is described here is precisely an application of the Deduction theorem
towards proving “P[x] → P[x + 1]”, that is, proving the implication for any
given x. 
Step 4. If you have done Step 1. through Step 3. above, then you have proved P[x] (for
all x in N is implied!)

Is the principle SI correct? I.e., if I do all that the numerator of SI asks me to do (or Steps
1. – 3.), then do I really get that the denominator is true (for all x implied)?

5.2.2 Theorem The validity of SI is a consequence of MC on N.

Proof Suppose SI is not correct. Then, for some property P[x], despite having completed
Steps 1. – 3., still, P[x] is not true for all x!
Well, if so, then by MC let n ∈ N be smallest such that P[n] is false. Now, n > 0
since I did verify the truth of P[0] (Step 1.). Thus, n − 1 ≥ 0. But then, when I proved
“P[x] → P[x + 1] for all x (in N)” —in Steps 2. and 3.— this includes proving the case

P[n − 1] → P[n] (4)

Now, by the smallest-ness of n, P[n − 1] is true, hence P[n] is true by (4) and the truth
table of “→”. I have just got a contradiction! I conclude that no such smallest n exists, i.e.,
P[x] is true (for all x ∈ N).
We conclude that SI works —if MC does (cf. discussion in the  -passage on
p. 151). 

How do the simple and course-of-values induction relate? They are equivalent tools, or,
they have the same (proof) power as we say. Here is why:

5.2.3 Theorem From the validity of SI I can obtain the validity of MC.

Proof Suppose < | N has SI.

We prove it also has MC. Suppose not.

Then there is a set


∅ ≠ S ⊆ N    (1)
that has no minimal (same as least, since < | N is a total order) element.
Let us call T the set N − S. I will use SI and prove that T = N. The property I am proving
for all n ∈ N using SI is
{0, 1, . . . , n} ⊆ T (2)
Basis. {0} ⊆ T , that is, 0 ∈ T . This is so, because otherwise 0 ∈ S contradicting that S has
no least element.

Now fix an unspecified n and take (2) as the I.H.

I prove next the I.S. that



{0, 1, . . . , n, n + 1} ⊆ T (3)
Towards (3) —given the I.H.— I need to show n + 1 ∈ T . Suppose this is not true. But then
n + 1 ∈ S and the I.H. implies that none of 0, 1, . . . , n is in S. This means n + 1 is minimal
in S, a contradiction.
Having shown (2), for all n ∈ N, we have N ⊆ T hence T = N and thus S = ∅. A
contradiction to (1). Done. 

5.2.4 Corollary All three of SI, CVI and MC are equivalent.

Proof For < | N we have CVI iff we have MC iff we have SI. 

 5.2.5 Remark

1. When do I use CVI and when SI? SI is best to use when to prove P[x] (in the I.S.) I only
need to know P[x − 1] is true. CVI is used when we need a more flexible I.H. that P[n]
is true for all n < x. See the examples below!
2. “0” is the boundary case if the claim we are proving is valid “for all n ∈ N”, or simply put,
“for n ≥ 0”. If the claim is “for all n ≥ a, P[n] is true” then usually P[n] is meaningless
for n < a and thus the Basis is for n = a.  

  Having established that MC and CVI (and SI) are equivalent for the “standard” order <, it
follows that over N we have both or we have none. Which one is it?

We have informally argued earlier (e.g., in the section on congruences, 3.3.14) that <
does have MC over N but the informal argument we gave there does not have the force of a
proof.
A mathematical proof requires that established properties of natural numbers and the
set N be known and used. We only offered a tentative argument within informal set theory
where we took for granted that N is one of this theory’s sets. If so, what properties does this
set have?
The proper way to prove within set theory that N has CVI or equivalently MC is to build
a copy of N within set theory and prove such properties as theorems.
Indeed this can be done (e.g., in Tourlakis (2003b)) but we did not do it here. One builds
a counterpart of N and gives it an alternative name —ω— as follows: “0” is defined to be
the object ∅. If we defined the number n as a set, then its successor, n + 1, is defined as
the set n ∪ {n}. Thus n + 1 stands for {0, 1, 2, . . . , n}. We then prove that the class of all so
constructed natural numbers is a set, and call it ω.
Next we prove that < on ω, defined by

    n < m iff, by definition, n ∈ m

satisfies MC and hence also both CVI and SI.



More elegantly, one may axiomatise N outside set theory, via Peano's axioms. One
such axiom schema states that the “<” on the set of natural numbers —with its basic
properties axiomatically postulated— satisfies simple induction.

Hm. But we have seen arguments that directly “prove” simple induction “works” employ-
ing a “falling dominoes” argument. Haven’t we? It goes like this:

We are equipped with a proof that

P[n] implies P[n + 1] (1)

for the arbitrary n. We also verify that P[0] is true.

Thus the argument

P[0] and P[0] → P[1] yield P[1]; next P[1] and P[1] → P[2] yield P[2]; next
. . . next P[n] and P[n]→P[n + 1] yield P[n + 1]; . . .
(2)
proves P[n + 1] is true for no matter what n.
That is, we can prove P[x] for any natural number x, the argument x being reachable by
adding one repeatedly. However, this does not say that we proved (∀x)P[x].
One, a proof has finite length and we cannot extend the proof (2) by just repeating the
parts “P[n] and P[n] → P[n + 1] yield P[n + 1]” an infinite number of times.
Two, nor can we be sure that all Informal natural numbers can be reached from 0 by
just repeatedly adding one. What are the natural numbers? Unless we know more about the
natural numbers we cannot be sure that, for example, there are no “infinite natural numbers”
after the end of the sequence 0, 1, 2, 3, . . . , n, . . . The subclass I of N consisting of only
“infinite numbers” has no least member, since if X is an infinite natural number, then so is
X − 1.
In particular, such an observation invalidates the argument in support of the thesis that <
on N has MC that we offered in 3.3.14, namely,

But we cannot have an infinite descending sequence of nonnegative integers

    · · · < x′′′ < x′′ < x′ < x


The displayed statement is false if your descent downwards along natural numbers starts
from an infinite integer. We need to be able to prove that infinite walks downwards are
impossible —that is, infinite natural numbers do not exist— and accepting IC as an axiom
(as we do in Peano arithmetic) or, equivalently, postulating MC, is one elegant way to show
such an impossibility.
So, we accept one of MC, IC (CVI) or SI axiomatically. 

The discussion above motivates connecting the non-existence of infinite downward walks
with the presence of MC (or IC).

5.2.1 Well-Foundedness

5.2.6 Definition For any relation P, an infinite descending P-chain is a function f with the
properties

(1) dom( f ) = N, and


(2) (∀n ∈ N) f (n + 1) P f (n). 

Intuitively, an infinite descending P-chain is an infinite sequence a0 , a1 , . . . such that


. . . a3 Pa2 Pa1 Pa0 (that is, an+1 Pan , for all n ≥ 0).

5.2.7 Definition A relation P is well-founded iff it has no infinite descending chains.


P is well-founded over A iff P | A is well-founded. 

 Intuitively, P is well-founded if the universe of all sets and atoms U cannot contain an infinite
descending chain, while it is well-founded over A if A cannot contain an infinite descending
P-chain. Clearly, no infinite descending P-chain can “start” anywhere outside ran(P) in any
case.
There is some disagreement on the term “well-founded” in the literature. In some of the
literature it applies definitionally to what we have called relations “with MC”. However, in
the presence of AC well-founded relations are precisely those that have MC, so the slight
confusion —if any— is harmless. 

5.2.8 Theorem If a relation P has MC over A, then P is well-founded over A.

Proof Suppose instead that f is an infinite descending P|A-chain.
Then ∅ ≠ ran( f ) ⊆ A, hence there is an a ∈ ran( f ) which is P|A-minimal.
Now, a = f (n) for some n ∈ N, but f (n + 1)(P|A) f (n), contradicting the P|A-
minimality of a. 

5.2.9 Corollary Let A be a set. Then the following are equivalent:


(1) P has MC over A.
(2) P has IC over A.
(3) P is well-founded over A.

Proof The equivalence of (1) and (2) as well as (1)=⇒(3) have already been proved. Thus
we only need to prove that (3) implies (1). So assume (3) and let (1) fail. Let ∅ ≠ B ⊆ A
such that B has no P-minimal elements. Pick an a ∈ B. Since it cannot be P-minimal, pick
an a1 ∈ B such that a1 Pa. Since a1 cannot be P-minimal, pick an a2 ∈ B such that a2 Pa1 .
This process can continue ad infinitum to yield an infinite descending chain
. . . a3 Pa2 Pa1 Pa in A, contradicting (3). Done.

This argument used AC, and more precisely it goes like this:
  Let g be a choice function for 2^B − {∅}, that is, for each S ∈ 2^B − {∅}, we have g(S) ∈ S
(cf. 3.5.28). Define now f on N by

    f (n) = a                             if n = 0
    f (n) = g( B ∩ ( f (n − 1))P−1 )      if n > 0

f is total on N, for B ∩ ( f (n − 1))P−1 ≠ ∅ for all n > 0, by assumption. By g(x) ∈ x, for
all x ∈ 2^B − {∅}, we get f (n) ∈ B ∩ ( f (n − 1))P−1, i.e., f (n)P f (n − 1) and f (n) ∈ B,
for all n > 0; thus f is an infinite descending chain in B ⊆ A. 



5.2.10 Remark The implication (3)=⇒(1), and hence the entire corollary, goes through
for any classes A and ∅ ≠ B ⊆ A as long as P is left-narrow, as we say, that is, the class
{x : xPa} = (a)P−1 is a set for all a ∈ A.
Indeed, the part “let a ∈ B” needs no elaboration, and moreover all B ∩ ( f (n − 1))P−1
are sets by left-narrowness.  

5.2.11 Definition (λ-notation) λ-notation is very useful in both discrete mathematics and
in the theory of computation (Tourlakis (2012, 2022)). It easily allows us to separate the
intensional notation of a function —i.e., what it does on any given input— as opposed to its
extensional notation, that is, the function as a possibly infinite table of input/output pairs.
The format of λ-notation is

    λ list of inputs . rule for how to obtain the output

where “λ” marks the beginning of the input list and the dot “.” marks its end.

Examples:

1. λx.x + 1.
2. λx y.x − y.
3. λx y.x + 42. In this example the input y is ignored. Its value is not used to compute the
output. 
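For readers who program: λ-notation survives almost verbatim in modern languages. Below is a sketch of the three examples above in Python, whose `lambda` keyword plays the role of “λ” and whose colon plays the role of the dot (the names succ, diff and shift are our own, for illustration only).

```python
# λx.x + 1 — one input; the output rule is "add one"
succ = lambda x: x + 1

# λxy.x − y — two inputs
diff = lambda x, y: x - y

# λxy.x + 42 — the input y is accepted but never used
shift = lambda x, y: x + 42

print(succ(5), diff(7, 3), shift(1, 999))  # 6 4 43
```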
5.2 IC Over N 155

5.2.12 Example If P is well-founded, then it is irreflexive.


Indeed, if aPa for some a, then the function λn.a on N is an infinite descending chain
(. . . aPaPaPa).
By Corollary 5.2.9, if P has IC (equivalently MC) then it is irreflexive.
If P is irreflexive but not well-founded, is then P+ a partial order? (A legitimate question
since P+ is transitive.)
Well, no, for consider R = {⟨1, 2⟩, ⟨2, 3⟩, ⟨3, 1⟩}, which is irreflexive. It is not well-
founded: for example, we have an infinite descending chain

    . . . R1R2R3R1R2R3R1R2R3R1

Now R+ = {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨1, 2⟩, ⟨2, 3⟩, ⟨3, 1⟩, ⟨1, 3⟩, ⟨2, 1⟩, ⟨3, 2⟩}, which is not
a partial order (it is reflexive), nor is it a “reflexive” order, since it is not antisymmetric
(e.g., 1R+3 ∧ 3R+1 would require 1 = 3).
It turns out that if P has MC, then so does P+ and hence, in particular, it is a partial order,
being irreflexive. 
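The finite relation R above is small enough to examine mechanically. The Python sketch below (the function names are ours) computes R+ by repeated relational composition and tests MC over a finite domain by repeatedly deleting minimal elements; for a finite relation this strips everything exactly when there is no descending cycle.

```python
def transitive_closure(R):
    """Smallest transitive relation containing R (a set of pairs (x, y) with xRy)."""
    Rp = set(R)
    while True:
        new = {(x, w) for (x, y) in Rp for (z, w) in Rp if y == z}
        if new <= Rp:
            return Rp
        Rp |= new

def has_mc(R, domain):
    """MC over a finite domain: peel off R-minimal elements until none are left.
    Getting stuck on a nonempty remainder exhibits a subset with no minimal element."""
    remaining = set(domain)
    while remaining:
        minimal = {b for b in remaining
                   if not any((x, b) in R for x in remaining)}
        if not minimal:
            return False
        remaining -= minimal
    return True

R = {(1, 2), (2, 3), (3, 1)}
print(sorted(transitive_closure(R)))  # all nine pairs on {1, 2, 3}
print(has_mc(R, {1, 2, 3}))           # False: the chain ...R1R2R3R1 never stops
```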

5.2.13 Theorem If P has MC (IC), then so does P+ .

Proof Let ∅ ≠ A. Let a ∈ A be P-minimal, i.e.,

    (a)P−1 ∩ A = ∅    (1)
Suppose now that bP+ a for some b. Then, for some b1 , b2 , . . . , bk we have

bPb1 Pb2 P . . . Pbk Pa

But bk Pa contradicts (1). Therefore a is also P+ -minimal. 

5.2.14 Corollary If P has MC (IC) over A, then (P|A)+ has MC (IC).

Proof It is given that P|A has MC (IC). By 5.2.13 (P|A)+ has MC (IC). 

  We cannot sharpen the above to “P+ has MC (IC) over A”, for that means that P+ |A has
MC. This is not true though: Let O be the odd natural numbers, and R be defined on N by
xRy iff x = y + 1; thus R+ = >.
Now, R has MC over O (for R|O = ∅), yet R+ does not, for R+ |O has an infinite
descending chain in O:

    . . . > 7 > 5 > 3 > 1

In particular, we note from this example that (P|A)+ ≠ P+ |A in general. 

5.2.15 Example Let ≺ on N be defined by n ≺ m iff m = n + 1. It is obvious that ≺ is


well-founded, hence it has MC and IC by 5.2.9.
What is ≺-induction? For notational convenience let “(∀x)” stand for “(∀x ∈ N)”. Thus, for
any formula F(x),

    (∀n)( (∀x ≺ n)F(x) → F(n) ) → (∀n)F(n)    (≺-IC)

holds.
In other words, if F(0) is proved —this is (∀x ≺ 0)F(x) → F(0)— and if also
F(n − 1) → F(n) is proved for all n > 0, then (∀n)F(n) holds.

This is just our familiar (from N) “simple” (as opposed to “course-of-values”) induction
SI over N.
The “natural” < on N is ≺+ . <-induction over N coincides with the “usual” CVI over N
displayed at the top of Section 5.2. 

  5.2.16 Example The Axiom of Foundation of axiomatic set theory ZFC⁵ is

    (∃y)F[y] → (∃y)( F[y] ∧ ¬(∃z ∈ y)F[z] )

It says that ∈ has MC. Therefore properties of sets can be proved by ∈-IC (∈-CVI) over U.
 

5.2.2 Induction Examples

5.2.17 Example This is the “classical first example of induction use” in the discrete math
bibliography! Prove that

    0 + 1 + 2 + · · · + n = n(n + 1)/2    (1)

So, the property to prove is the entire expression (1). One must learn not to have to rename
the “properties to use” as “P[n]”.
I will use SI. So let us do the Basis. The boundary case is n = 0. We verify: lhs = 0, rhs =
(0 × 1)/2 = 0. Good!
Now fix n and take the expression (1) as I.H.

⁵ According to Zermelo and Fraenkel, with the Axiom of Choice.



Do the I.S. Prove:

    0 + 1 + 2 + · · · + n + (n + 1) = (n + 1)(n + 2)/2

Here it goes:

    0 + 1 + 2 + · · · + n + (n + 1)
      = n(n + 1)/2 + (n + 1)        ⟨using I.H.⟩
      = (n + 1)(n/2 + 1)            ⟨arithmetic⟩
      = (n + 1)(n + 2)/2            ⟨arithmetic⟩
I will write more concisely in the examples that follow.
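A quick mechanical spot-check of identity (1) — numerical evidence only, of course, not a substitute for the induction proof:

```python
# verify 0 + 1 + 2 + ... + n == n(n + 1)/2 for the first few hundred n
for n in range(300):
    assert sum(range(n + 1)) == n * (n + 1) // 2
print("identity (1) checked for n = 0..299")
```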

5.2.18 Example Same as above but doing away with the “0 +”. Again, I use SI.

    1 + 2 + · · · + n = n(n + 1)/2    (1)

• Basis. n = 1: (1) becomes 1 = (1 × 2)/2. True.
• Take (1) as I.H. with fixed n.
• I.S.:

    1 + 2 + · · · + n + (n + 1)
      = n(n + 1)/2 + (n + 1)        ⟨using I.H.⟩
      = (n + 1)(n/2 + 1)            ⟨arithmetic⟩
      = (n + 1)(n + 2)/2            ⟨arithmetic⟩

5.2.19 Example Prove

    1 + 2 + 2^2 + · · · + 2^n = 2^(n+1) − 1    (1)

By SI.

• Basis. n = 0: 1 = 2^0 = 2^1 − 1. True.
• As I.H. take (1) for fixed n.
• I.S.:

    1 + 2 + 2^2 + · · · + 2^n + 2^(n+1)
      = 2^(n+1) − 1 + 2^(n+1)        ⟨using I.H.⟩
      = 2 · 2^(n+1) − 1              ⟨arithmetic⟩
      = 2^(n+2) − 1                  ⟨arithmetic⟩

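The geometric-sum identity of 5.2.19 can likewise be spot-checked numerically (it is used again in Example 5.2.23 below):

```python
# verify 1 + 2 + 2^2 + ... + 2^n == 2^(n+1) - 1
for n in range(64):
    assert sum(2 ** i for i in range(n + 1)) == 2 ** (n + 1) - 1
print("5.2.19 checked for n = 0..63")
```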

5.2.20 Example (Euclid) Every natural number n ≥ 2 has a prime factor.


I do CVI (as you will see why!)

• Basis: For n = 2 we are done since 2 is a prime and 2 = 2 × 1.⁶


• I.H. Fix an n and assume the claim for all k, such that 2 ≤ k < n.
• I.S.: Prove for n: Two subcases:

1. If n is prime, then OK! n divides n.


2. If not, then n = a · b, where a ≥ 2 and b ≥ 2. By I.H.,⁷ a has a prime factor, thus so
does n = a · b. 
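Euclid's CVI argument is literally a recursive algorithm: the induction step's appeal to the I.H. (“a has a prime factor”) becomes a recursive call on the smaller number a. A sketch (the function name is ours):

```python
def a_prime_factor(n):
    """Return a prime factor of n >= 2, mirroring the CVI proof of 5.2.20."""
    for a in range(2, n):
        if n % a == 0:                # subcase 2: n = a * b with 2 <= a, b < n
            return a_prime_factor(a)  # the I.H. applies, since 2 <= a < n
    return n                          # subcase 1: no proper factor, n is prime

print(a_prime_factor(97), a_prime_factor(60))  # 97 2
```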

5.2.21 Example (Euclid) Every natural number n ≥ 0 is expressible base-10 as an
expression

    n = a_m 10^m + a_{m−1} 10^{m−1} + · · · + a_1 10 + a_0    (1)

where each a_i satisfies

    0 ≤ a_i < 10    (2)
Proof by CVI again. You will see why.

• Basis. For n = 0 the expression “0” has the form of the rhs of (1) and satisfies inequality
(2).
• Fix an n > 0 and assume (I.H.) that if k < n, then k can be expressed as in (1) and (2).
• For the I.S. express the n of the I.H. using Euclid's theorem (3.3.14) as

    n = 10q + r

where 0 ≤ r < 10. By the I.H. —since q < n— let

    q = b_t 10^t + b_{t−1} 10^{t−1} + · · · + b_1 10 + b_0

with 0 ≤ b_j < 10. Then

    n = 10q + r
      = 10( b_t 10^t + b_{t−1} 10^{t−1} + · · · + b_1 10 + b_0 ) + r
      = b_t 10^{t+1} + b_{t−1} 10^t + · · · + b_1 10^2 + b_0 10 + r

We see that n has the right form since 0 ≤ r < 10. 

⁶ You will recall that a number N ∋ n > 1 is a prime iff —by definition— its only factors are 1
and n.
⁷ You see? Do you know many natural numbers n such that n − 1 divides n?! Only 2 has this property,
but 2 is just our Basis!
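The proof of 5.2.21 is also an algorithm in disguise: divide by 10, recurse on the quotient (legitimate, since q < n), and append the remainder. A sketch:

```python
def digits_base10(n):
    """Digits of n >= 0, most significant first, by the CVI recursion of 5.2.21:
    write n = 10q + r with 0 <= r < 10 and recurse on q."""
    if n < 10:            # basis: a one-digit number already has the form (1)
        return [n]
    q, r = divmod(n, 10)
    return digits_base10(q) + [r]   # I.H. applies to q, because q < n

print(digits_base10(50207))  # [5, 0, 2, 0, 7]
```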

5.2.22 Example An inequality this time. Prove n < 2^n, for n ≥ 0. We do SI.

Basis. n = 0. We prove 0 < 2^0. This is true since 2^0 = 1.
Take as I.H.

    n < 2^n    (1)

for fixed unspecified n.
The I.S. requires n + 1 < 2^(n+1). Well, 1 ≤ 2^n.⁸ Adding this to the assumed (1), term by
term, we get

    n + 1 < 2^n + 2^n = 2^(n+1)

5.2.23 Example Another inequality. Let p_n denote the n-th prime number, for n ≥ 0. Thus
p_0 = 2, p_1 = 3, p_2 = 5, etc.
We prove that

    p_n ≤ 2^(2^n)    (1)

I use CVI on n. This is a bit of a rabbit out of a hat if you never read Euclid's proof that
there are infinitely many primes.

• Basis. p_0 = 2 ≤ 2^(2^0) = 2^1 = 2.
• Fix n > 0 and take (1) as I.H.
• The I.S.: I will work with the fixed n above and the expression (product of primes, plus
1; this is inspired from Euclid's proof quoted above)

    p_0 p_1 p_2 · · · p_n + 1

I have

    p_0 p_1 p_2 · · · p_n + 1
      ≤ 2^(2^0) 2^(2^1) 2^(2^2) · · · 2^(2^n) + 1    ⟨by I.H.⟩
      = 2^(2^0 + 2^1 + 2^2 + · · · + 2^n) + 1        ⟨algebra⟩
      = 2^(2^(n+1) − 1) + 1                          ⟨by 5.2.19⟩
      < 2^(2^(n+1) − 1) + 2^(2^(n+1) − 1)            ⟨smallest n possible is 0⟩
      = 2^1 · 2^(2^(n+1) − 1)
      = 2^(2^(n+1))
⁸ The “=” is attained for n = 0.



Now we have two cases on q = p_0 p_1 p_2 · · · p_n + 1.

1. q is a prime. Because of the “+1”, q is different from all the p_i in the product, so q is

    p_{n+1} or p_{n+2} or p_{n+3} or …

Since the sequence of primes is strictly increasing, p_{n+1} is the least that q can be.
Thus

    p_{n+1} ≤ p_0 p_1 p_2 · · · p_n + 1 ≤ 2^(2^(n+1))

in this case.
2. q is composite. By 5.2.20 some prime r divides q. Now, none of the

    p_0, p_1, p_2, · · · , p_n

divides q, because of the “+1”. Thus r is different from all of them, so it must be
one of p_{n+1} or p_{n+2} or p_{n+3} or …
Thus,

    p_{n+1} ≤ r < q = p_0 p_1 p_2 · · · p_n + 1 ≤ 2^(2^(n+1))

Done! 
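Since 2^(2^n) grows so fast, only tiny cases of (1) can be confirmed directly, but those few cases make a reassuring sanity check (the helper below is ours; it generates primes by trial division against the primes already found):

```python
def first_primes(k):
    """The first k primes, by trial division."""
    ps = []
    candidate = 2
    while len(ps) < k:
        if all(candidate % p != 0 for p in ps):
            ps.append(candidate)
        candidate += 1
    return ps

ps = first_primes(6)
print(ps)                                            # [2, 3, 5, 7, 11, 13]
print(all(ps[n] <= 2 ** 2 ** n for n in range(6)))   # True
```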

5.2.24 Example Let

    b_1 = 3,  b_2 = 6
    b_k = b_{k−1} + b_{k−2}, for k ≥ 3

Prove by induction that b_n is divisible by 3 for n ≥ 1. (Be careful to distinguish between
what is the basis and what are cases arising from the induction step! As you know, many texts
are careless about this.)

Proof So the boundary condition is (from the italicised part above) n = 1. This is the Basis.

1. Basis: For n = 1, I have b_1 = 3 and this is divisible by 3. We are good.


2. I.H. Fix n and assume claim for all k < n.
3. I.S. Prove the claim for the above fixed n. There are two cases, as the I.H. is not usable
for n = 2. Why? Because it would require entries b_0 and b_1. The b_0 entry does not exist
since the sequence starts with b_1. So,

Case 1. n = 2. Then I am OK as b2 = 6; it is divisible by 3.



Case 2. n > 2. Is bn divisible by 3? Well, bn = bn−1 + bn−2 in this case. By I.H. (valid
for all k < n) I have that bn−1 = 3t and bn−2 = 3r , for some integers t, r . Thus,
bn = 3(t + r ). Done!
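The recurrence of 5.2.24 is immediate to transcribe and test (the transcription is ours):

```python
def b(n):
    """The sequence of 5.2.24: b_1 = 3, b_2 = 6, b_k = b_{k-1} + b_{k-2} for k >= 3."""
    if n == 1:
        return 3
    if n == 2:
        return 6
    return b(n - 1) + b(n - 2)

print([b(n) for n in range(1, 8)])               # [3, 6, 9, 15, 24, 39, 63]
print(all(b(n) % 3 == 0 for n in range(1, 20)))  # True
```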


 5.2.25 Example (The Binomial Theorem) We prove in this example the so-called binomial
theorem, for any N ∋ n > 0 and any real or complex numbers a and b (we write C(n, i) for
the binomial coefficient defined below):

    (a + b)^n = Σ_{i=0}^{n} C(n, i) a^(n−i) b^i    (1)

First let us take care of notation.

5.2.26 Definition (Binomial Coefficients) The notation C(n, m) —traditionally displayed
with n written over m inside big parentheses— is called a binomial coefficient and stands for

    n! / ( (n − m)! m! )

where in turn n! stands for 1 × 2 × 3 × · · · × n, that is, it is inductively defined as

    0! = 1
    (n + 1)! = (n + 1) × n!

We call “n!” “n-factorial”. 

 Suppose we have n objects. In how many ways can we choose m among them (m < n)
ignoring repetitions? Well, the first of the m elements I can choose in n ways. The second
of the m I can choose in n − 1 ways after I chose and removed the first. Clearly then I can
choose the 3rd in n − 2 ways after I removed the 2nd; etc.
All in all, I can choose all m members in n(n − 1)(n − 2) · · · (n − (m − 1)) ways.
Wait! I have m! repetitions in my choices of m elements if I do nothing else. So the final
answer is the above divided by m!:

    n(n − 1)(n − 2) · · · (n − (m − 1)) / m!
      = n(n − 1)(n − 2) · · · (n − (m − 1)) (n − m)(n − (m + 1)) · · · 3 · 2 · 1 / ( m!(n − m)! )
      = n! / ( m!(n − m)! )

Before we embark on the proof of the binomial theorem here are some properties of the
symbol C(n, m) that we will use:

I. C(n, 0) = 1. Indeed, C(n, 0) = n! / ( (n − 0)! 0! ), but 0! = 1 by definition.
II. C(n, n) = 1. Indeed, C(n, n) = n! / ( n!(n − n)! ).
III. C(n, m) + C(n, m − 1) = C(n + 1, m).

Indeed, we work from left to right using the definition of C(n, m):

    C(n, m) + C(n, m − 1)
      = n! / ( (n − m)! m! ) + n! / ( (n − (m − 1))! (m − 1)! )
      = ( n! / ( (n − m)! (m − 1)! ) ) ( 1/m + 1/(n − (m − 1)) )
      = ( n! / ( (n − m)! (m − 1)! ) ) ( (n + 1) / ( m(n − m + 1) ) )
      = (n + 1)! / ( (n + 1 − m)! m! )
      = C(n + 1, m)
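Python's standard library provides the binomial coefficient as `math.comb`, so properties I–III can be sanity-checked numerically before we put them to work:

```python
import math

for n in range(1, 25):
    assert math.comb(n, 0) == 1                  # property I
    assert math.comb(n, n) == 1                  # property II
    for m in range(1, n + 1):                    # property III (Pascal's identity)
        assert math.comb(n, m) + math.comb(n, m - 1) == math.comb(n + 1, m)
print("properties I, II, III hold for n = 1..24")
```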

The proof of the binomial theorem now. By induction on n.

Basis. n = 1. We have

    (a + b)^1 = Σ_{i=0}^{1} C(1, i) a^(1−i) b^i = C(1, 0) a^1 b^0 + C(1, 1) a^0 b^1 = a + b,

since C(1, 0) = 1 = C(1, 1).
We now fix n and take as I.H. (1) at the head of this example.
For the same fixed n we next probe n + 1:

    (a + b)^(n+1) = (a + b)(a + b)^n
      = (a + b)( C(n, 0) a^n + C(n, 1) a^(n−1) b^1 + C(n, 2) a^(n−2) b^2 + · · · + C(n, n) b^n )    ⟨I.H.⟩
      = C(n, 0) a^(n+1) + C(n, 1) a^n b^1 + C(n, 2) a^(n−1) b^2 + C(n, 3) a^(n−2) b^3 + · · · + C(n, n) a^1 b^n
        + C(n, 0) a^n b + C(n, 1) a^(n−1) b^2 + C(n, 2) a^(n−2) b^3 + · · · + C(n, n − 1) a^1 b^n + b^(n+1)    ⟨multiply⟩
      = C(n + 1, 0) a^(n+1) + C(n + 1, 1) a^n b^1 + C(n + 1, 2) a^(n−1) b^2 + C(n + 1, 3) a^(n−2) b^3
        + · · · + C(n + 1, n) a^1 b^n + C(n + 1, n + 1) b^(n+1)    ⟨by I, II, III⟩
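The whole theorem, too, can be spot-checked for small exponents and concrete numbers (a numerical check over our own choice of test values, not a proof):

```python
import math

def rhs(a, b, n):
    """The right-hand side of (1): the sum of C(n, i) a^(n-i) b^i, i = 0..n."""
    return sum(math.comb(n, i) * a ** (n - i) * b ** i for i in range(n + 1))

for n in range(1, 12):
    for a, b in [(2, 3), (5, -1), (1, 1), (-4, 7)]:
        assert rhs(a, b, n) == (a + b) ** n
print("binomial theorem checked for n = 1..11")
```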

 
Here are a few additional exercises for you to try.

5.2.27 Exercise

1. Prove that 2^(2n+1) + 3^(2n+1) is divisible by 5 for all n ≥ 0.
2. Using induction prove that 1^3 + 2^3 + · · · + n^3 = ( n(n + 1)/2 )^2, for n ≥ 1.
3. Using induction prove that Σ_{i=1}^{n+1} i 2^i = n 2^(n+2) + 2, for n ≥ 0.
4. Using induction prove that √n < 1/√1 + 1/√2 + · · · + 1/√n, for n ≥ 2.
5. Let

    b_0 = 1,  b_1 = 2,  b_2 = 3
    b_k = b_{k−1} + b_{k−2} + b_{k−3}, for k ≥ 3

Prove by induction that b_n ≤ 3^n for n ≥ 0. (Once again, be careful to distinguish between
what is the basis and what are cases arising from the induction step!) 

 As a postscript to our examples of induction proofs we offer this comment. It is clear that
since sets such as N − {0, 1, 2, 3, 4, 5} and N ∪ {−3, −2, −1} are well-ordered (by <) we can
carry out induction proofs over them. In the former case the “basis” case is at 6, in the latter
case it is at −3. In fact, in the preceding problem 4. the basis is at n = 2. 

5.3 Inductive Definitions of Functions

Inductive definitions are increasingly being renamed to “recursive definitions” in the modern
literature, thus using “recursive” for definitions, and “induction” for proofs. I will not go
out of my way to use this dichotomy of nomenclature. Here are some familiar examples of
inductive definitions of functions.
5.3.1 Example For any integer a > 0 we define

    a^0 = 1
    a^(n+1) = a · a^n    (†)

This is an example of an inductive (recursive) definition of the non-negative integer powers
of a nonzero number a.
One can use SI to prove, for n ≥ 1, that the above definition ensures that

    a^n = a × a × a × · · · × a  (n factors)    (1)
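The inductive definition (†) transcribes directly into a recursive program (the transcription is ours):

```python
def power(a, n):
    """Non-negative integer powers of a, exactly as specified by (†)."""
    if n == 0:
        return 1                 # a^0 = 1
    return a * power(a, n - 1)   # a^(n+1) = a · a^n

print(power(3, 4))  # 81
```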

5.3.2 Example Another example is the Fibonacci sequence,⁹ given by

    F_0 = 0
    F_1 = 1, and, for n ≥ 1,    (‡)
    F_{n+1} = F_n + F_{n−1}

Unlike the function (sequence) a^0, a^1, a^2, a^3, . . ., for which we only need the value at n
to compute the value at n + 1, the Fibonacci function needs two previous values, at n − 1
and at n, to compute the value at n + 1. 
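In code, the contrast shows up as one recursive call for (†) versus two for (‡) (the transcription is ours):

```python
def fib(n):
    """The Fibonacci sequence (‡); note the TWO recursive calls."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print([fib(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```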

 The question is: Given an inductive definition of a function, can we prove that a function
f exists —that is, a potentially infinitely long table of input/output pairs— that satisfies the
“inductive specification”?
This translates, in the first example above, into “is there a realisation f —as a function,
an infinite table in this case— of what the definition (†) specifies as the behaviour of the
function?”
Such a function must obey the two equations below:

    f (0) = 1
    f (n + 1) = a · f (n)    (†′)

⁹ The “sequence” F_0, F_1, F_2, . . . is, of course, a total function F : N → N.

How NOT to answer this: “Of course it exists. This f satisfies f (0) = 1, so we got an
output for input x = 0. If we now assume that we do have an output f (n) at input x = n
(I.H.), then at input x = n + 1 we have the output a × f (n).”

What just happened here? We proved that IF a function f that satisfies (†′) exists, then
it is total. We never proved that the infinite table, which the function f is supposed to be,
exists and has the stated property (i.e., obeys the inductive definition).
In fact we took for granted that f exists and satisfies the recurrence equations (†′) and
proceeded to prove that then it will be total!
The above (non) “proof” of function existence has actually appeared in print in a Discrete
Math text! 

This section looks into inductive definitions in general, and proves that a function defined
inductively as in, for example, (†′) above exists and is unique.
generalising the Fibonacci example above in several directions, as follows:

1. We will have the second equation depend on several recursive calls (more than the two
we used in the Fibonacci definition), as we name them in computer programming.
2. The defined function will not need to be total nor will the functions —which make
the recursive calls (such as the function “+” in the Fibonacci example and “×” in the
exponential example)— need be total.
3. The inductive definition in the Fibonacci example defines Fn in terms of two recursive
calls on the two immediately preceding (along <) arguments n − 1 and n − 2 of n. We
generalise this in two directions:

• The order P we use in an inductive definition to “sort the inputs” in the general case
(equation number two) is not necessarily < on N, nor is it necessarily total.
• The 2nd equation defines some function g at an argument a using one recursive call
for each predecessor of a along the order P.

Thus, to motivate the general inductive definition of a function F over any class equipped
with an order P that has IC, we first sketch the case of an inductive definition of a function
K over N equipped with the standard order <.

5.3.3 Tentative Definition We consider in this section a general recursive definition of a
function K : N → A, for a given set A.
The function G : N × 2^A → A is given and performs the needed recursive calls. A typical
call to G is G(n, X) where n ∈ N and X ⊆ A, that is, X ∈ 2^A. Let us also fix a C ∈ A.
This inductive definition of K has the form below.

    K(0) = C, and, for n > 0,
    K(n) = G( n, {K(0), K(1), . . . , K(n − 1)} )    (1)


5.3.4 Remark The notation of the set-argument

    {K(0), K(1), . . . , K(n − 1)}    (2)

in the definition (1) above is significantly less informative than the notation implies! Its
members —listed again in (2)— are just members of the set A, and the marking of the inputs
responsible for the various K(i) is not embedded in these output values! So neither we, nor
G, knows which is which if we are just given the values in the set (2).
Can we modify the right hand side of K(n) to G( n, K(0), K(1), . . . , K(n − 1) )?
No, because a function G cannot have a variable number of arguments (n + 1 arguments
in all) that increases or decreases with the value of n!
This final idea however works: Tag along the input values that cause the K(i), that is,
use

    K(0) = C, and, for n > 0,
    K(n) = G( n, {⟨0, K(0)⟩, ⟨1, K(1)⟩, . . . , ⟨n − 1, K(n − 1)⟩} )    (3)

A more elegant way to write down

    {⟨0, K(0)⟩, ⟨1, K(1)⟩, . . . , ⟨n − 1, K(n − 1)⟩}

is (cf. 3.1.4)

    K ↾ (n)>

which we will use in all that follows.  
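The “tagged history” of (3) is exactly what one implements in code: hand G the table of all pairs ⟨i, K(i)⟩ computed so far. A sketch over N, representing the history {⟨i, K(i)⟩ : i < n} as a Python dict (the names and packaging are ours):

```python
def define_by_recursion(C, G):
    """Build the K of (3): K(0) = C and, for n > 0,
    K(n) = G(n, history), where history is the dict {i: K(i) for i < n}."""
    def K(n):
        history = {}
        for i in range(n + 1):
            history[i] = C if i == 0 else G(i, history)
        return history[n]
    return K

# A Fibonacci-style instance: C = 0, and G consults the tagged history.
G = lambda n, h: 1 if n == 1 else h[n - 1] + h[n - 2]
fib = define_by_recursion(0, G)
print([fib(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```

Note that G receives the inputs 0, 1, . . . , n − 1 as dict keys, so —unlike with the bare set (2)— it can tell which output came from which input.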

Our theorem of the existence and uniqueness of inductively defined functions will be for
recursions along an arbitrary partial order P with IC and then we will obtain as a corollary
the case where P has IC but is not necessarily an order. Of course, a trivial corollary to all
that will be the existence and uniqueness of functions K defined as in (3) above.

5.3.5 Definition (Levy (1979)) A relation P is left-narrow iff (x)P⁻¹ is a set for all x. It is left-narrow over A iff P restricted to A is left-narrow.

For example, ∈ is left-narrow by the foundation axiom (5.2.16 and p. 127), while  is
not.

5.3.6 Definition (Initial Segments) If < is an order on A and a ∈ A, then the class {x : x < a} = (a)> is called the (initial) open segment defined by a, while the class (a)≥ = {x : x ≤ a} is called the closed segment defined by a.

≤ is < ∪ =, of course, so that (a)≥ = (a)> ∪ {a}. Segments of left-narrow relations are sets.

5.3.7 Theorem (Recursive or inductive definitions) Let < : A → A be a left-narrow order with IC, and G a (not necessarily total) function G : A × U → X, for some class X. Then there exists a unique function F : A → X satisfying:

    (∀a ∈ A) F(a) = G(a, F ↾ (a)>)        (1)

The requirement of left-narrowness guarantees (via Principle 3, 3.3.6) that the second argument of G in (1) is a set. This restriction does not adversely affect the applicability of the theorem, as the reader will be able to observe.
In (1) "=" is Kleene's extended equality, so that in the recurrence either both sides are defined and equal (as sets or atoms), or both are undefined (see 3.5.11).

Proof We prove uniqueness first, so let H : A → X also satisfy (1). Let a ∈ A and adopt the I.H. that

    (∀b < a) F(b) = H(b)

that is, for all b < a, (∀y)((b, y) ∈ F ↔ (b, y) ∈ H), and therefore

    F ↾ (a)> = H ↾ (a)>

It follows that

    F(a) = G(a, F ↾ (a)>)
         = G(a, H ↾ (a)>)
         = H(a)

This settles the claim of uniqueness: (∀a ∈ A) F(a) = H(a), that is, F = H. Define now

    𝓕 = { f : (∃a ∈ A)( f : (a)≥ → X ∧ (∀x ∈ (a)≥) f(x) = G(x, f ↾ (x)>)) }        (2)

Note that 𝓕 ≠ ∅. For example, if a ∈ A is <-minimal,10 then (a)> = ∅ and hence f ↾ (a)> = ∅ for any f; thus 𝓕 contains {(a, G(a, ∅))} if G(a, ∅) ↓, else it contains the empty function ∅ : (a)≥ → X.
For the latter we clearly have ∅(a) = G(a, ∅ ↾ (a)>), where both sides are undefined.
A trivial adaptation of the uniqueness argument to the case that A is a closed segment (a)≥ shows that if f : (a)≥ → X and g : (a)≥ → X are in 𝓕, then f = g. We use "f_a" to denote the unique f : (a)≥ → X for each a ∈ A, if it exists.
To remove the hedging, fix a and assume (I.H.) that, for each b < a, an f_b : (b)≥ → X satisfying (1) (where here A = (b)≥) exists.
Let us argue that so does f_a. Indeed, define h : (a)≥ → X from the existing (by the I.H.) f_b and G by

    h = {(a, G(a, ⋃_{b<a} f_b))} ∪ ⋃_{b<a} f_b      if G(a, ⋃_{b<a} f_b) ↓
        ⋃_{b<a} f_b                                  otherwise        (3)

Observe next that, by transitivity of <,11 we have (c)≥ ⊆ (b)≥ whenever c ≤ b,12 therefore f_c ⊆ f_b, due to f_c = f_b ↾ (c)≥ (by uniqueness).
We draw two conclusions:

First, to retire the induction, note that ⋃_{b<a} f_b in (3) is single-valued (a function) and is equal to h ↾ (a)>. Thus h satisfies the recurrence (1) at a outright, and also at each b < a because

    h(b) = f_b(b)
         = G(b, f_b ↾ (b)>)
         = G(b, h ↾ (b)>),    since f_b ⊆ h

It follows that h = f_a, and hence by CVI f_a exists for all a ∈ A.


Second,

    f_a ∈ 𝓕 ∧ f_b ∈ 𝓕 ∧ x ∈ dom(f_a) ∩ dom(f_b) → f_a(x) = f_b(x)        (4)

because (x)≥ ⊆ (a)≥ ∩ (b)≥, hence f_a ↾ (x)≥ = f_b ↾ (x)≥ by uniqueness on (x)≥.
By (4),

    F = ⋃𝓕    is a function F : A → X

F satisfies the recurrence (1) of the theorem. Indeed, let a ∈ A. Then

10 Since < has MC on A, it does have minimal elements.


11 We have just used the assumption that < is an order.
12 Let x ∈ (c)≥. Then x ≤ c and c ≤ b, hence x ≤ b.

    F(a) = f_a(a),    since f_a ⊆ F
         = G(a, f_a ↾ (a)>)
         = G(a, F ↾ (a)>),    since f_a ⊆ F    □

 5.3.8 Remark

(1) Since (a)≥ is a set for each a ∈ A (by left-narrowness), so is dom(f) for each f ∈ 𝓕, and hence each f itself is a set, by Principle 3, so forming the class 𝓕 is legitimate.13
(2) The simple recursion on the natural numbers, where g is total,

    f(0) = a
    for n ≥ 0, f(n + 1) = g(n, f(n))        (1)

is a special case of 5.3.7: Indeed, we can rewrite (1) as

    (∀n ∈ N) f(n) = G(n, f ↾ (n)>)        (2)

where

    G(n, h) = a                     if n = 0
              g(n − 1, h(n − 1))    if h is a function ∧ dom(h) = {x : x < n}
              ↑                     otherwise

Note that G on N × U is nontotal. In particular, if the second argument is not of the correct type (middle case above), G will be undefined. We can still prove that f(n) ↓ for all n ∈ N.
Indeed, assume the claim for m < n (I.H.). For n = 0, f(0) = G(0, ∅) = a: defined. Let next n > 0. Now f(n) = G(n, f ↾ (n)>) and dom(f ↾ (n)>) = (n)> by the I.H., hence

    f(n) = G(n, f ↾ (n)>) = g(n − 1, (f ↾ (n)>)(n − 1)) = g(n − 1, f(n − 1)):

defined (g total).
(3) In view of the above, it is worth noting that a recursive definition à la 5.3.7 can still
define a total function, even if G is nontotal.  

5.3.9 Corollary (Inductive Definition with Respect to Any P with IC; Montague (1955), Tarski (1955)) Let P : A → A be a left-narrow relation —not necessarily an order— with IC, and G a (not necessarily total) function G : A × U → X, for some class X. Then there exists a unique function F : A → X satisfying:

    (∀a ∈ A) F(a) = G(a, F ↾ (a)P⁻¹)

13 A class must contain only sets or atoms.

Proof Define G̃ : A × U → X by

    G̃(a, f) = ↑                   if f is not a function
              G(a, f ↾ (a)P⁻¹)    othw

Let < stand for P⁺ and hence > is (P⁻¹)⁺ (cf. Exercise 3.9.21). Now < is an order on A that has IC, and is left-narrow since

    (a)(P⁻¹)⁺ = ⋃{(a)(P⁻¹)ⁿ : n > 0}

and an easy argument shows that each (a)(P⁻¹)ⁿ is a set (Exercise 5.4.27). Thus, by 5.3.7, there is a unique F : A → X such that

    (∀a ∈ A) F(a) = G̃(a, F ↾ (a)>)        (1)
                  = G(a, (F ↾ (a)>) ↾ (a)P⁻¹)

Now, since (a)P⁻¹ ⊆ (a)>, we have (F ↾ (a)>) ↾ (a)P⁻¹ = F ↾ (a)P⁻¹, hence (1) becomes

    (∀a ∈ A) F(a) = G(a, F ↾ (a)P⁻¹)    □


5.3.10 Corollary (Recursion with a total G) Let P : A → A be a left-narrow relation —not necessarily an order— with IC, and G a total function G : A × U → X, for some class X. Then there exists a unique total function F : A → X satisfying

    (∀a ∈ A) F(a) = G(a, F ↾ (a)P⁻¹)

Proof We only need to show that dom(F) = A. By 5.3.9, there is a unique F satisfying

    (∀a ∈ A) F(a) = G(a, F ↾ (a)P⁻¹)

Clearly the right hand side of = is defined for all a ∈ A. □

5.3.11 Remark (Notation; Moschovakis (1969)) In the following corollaries we use some notation introduced by Moschovakis: Define the functions π and δ by

    π(z) is the x such that (∃y) z = (x, y)
    δ(z) is the y such that (∃x) z = (x, y)

Incidentally, π is for prôto (first) and δ for deútero (second).

5.3.12 Corollary (Recursive definition with parameters I) Let P : A → A be a left-narrow relation —not necessarily an order— with IC, and G a (not necessarily total) function G : S × A × U → X, for some classes S and X. Then there exists a unique function F : S × A → X satisfying:

    (∀(s, a) ∈ S × A) F(s, a) = G(s, a, {(s, x, F(s, x)) : xPa})        (1)

 In equation (1) s persists throughout (unchanged), hence it is called a “parameter”. 

Proof Define the relation P̃ on S × A by

    (u, a) P̃ (v, b)  iff  u = v ∧ aPb

It is clear that P̃ has MC. Now, (1) can be rewritten as

    (∀(s, a) ∈ S × A) F(s, a) = G(s, a, {((s, x), F(s, x)) : (s, x) P̃ (s, a)})
                              = G(s, a, F ↾ (s, a)P̃⁻¹)

The result follows from 5.3.9 by using the J given below as the "G-function":

    J(g, f) = ↑                   if g ∉ S × A
              G(π(g), δ(g), f)    othw

Thus,

    (∀g ∈ S × A) F(g) = G(π(g), δ(g), F ↾ (g)P̃⁻¹)
                      = J(g, F ↾ (g)P̃⁻¹)    □

5.3.13 Corollary (Recursive Definition with Parameters II) Let all assumptions be as in Corollary 5.3.12, except that the recurrence now reads

    (∀(s, a) ∈ S × A) F(s, a) = G(s, a, {(x, F(s, x)) : xPa})        (1)

Then there exists a unique function F : S × A → X satisfying (1).



Proof Apply Corollary 5.3.12 with P̃ as above and a "G-function" J given by

    J(s, a, f) = G(s, a, p₂₃(f))

where p₂₃ : U → U —introduced to get the right hand side of (2) to be the same as that in (1)— is

    p₂₃(f) = ↑                               if f is not a class of 3-tuples
             {(δ(π(z)), δ(z)) : z ∈ f}       othw

Thus, (1) takes the format of 5.3.12,

    (∀(s, a) ∈ S × A) F(s, a) = J(s, a, F ↾ (s, a)P̃⁻¹)        (2)

Note that

    F ↾ (s, a)P̃⁻¹ = {((s, x), F(s, x)) : xPa}

thus, setting z = ((s, x), F(s, x)), we have δ(π(z)) = x and δ(z) = F(s, x), as needed by (1). □

5.3.14 Corollary (Pure Recursion Along a Well-Ordering with a Partial G) Let < : A → A be a left-narrow well-ordering, and G a (not necessarily total) function G : U → X, for some class X.
Then there exists a unique function F : A → X satisfying (1)–(2) below:

(1) (∀a ∈ A) F(a) = G(F ↾ (a)>),
(2) dom(F) is either A, or (a)> for some a ∈ A.

"Pure recursion" refers to the fact that G has only one argument, the "history" of F on the open segment (a)>.

Proof In view of Theorem 5.3.7, we only need to prove (2). So let dom(F) ≠ A. Let a in A be <-minimal (also minimum here, since < is total) such that

    F(a) ↑, i.e., G(F ↾ (a)>) ↑        (3)

Thus (a)> ⊆ dom(F). We will argue that dom(F) = (a)>. Well, let instead b ∈ dom(F) − (a)> be minimal such that F(b) ↓.
By (3) and totalness of <, it is a < b. By choice of b,

    (∀x)(a ≤ x < b → F(x) ↑)

Thus,

    F ↾ (b)> = F ↾ (a)>        (4)

therefore

    F(b) = G(F ↾ (b)>)
         = G(F ↾ (a)>)    (by (4))
         = F(a)

contradicting (3), since F(b) ↓. □


5.3.15 Example Let G : {0, 1} × U → {0, 1} be given as

    G(x, f) = 1    if x = 1 ∧ f = ∅
              ↑    othw

and let {0, 1} be equipped with the standard order < of N. Then the recursive definition

    (∀a ∈ {0, 1}) F(a) = G(a, F ↾ (a)>)

yields the function F = {(1, 1)}, whose domain is neither {0, 1} nor a segment of {0, 1}. Thus the requirement of pure recursion in 5.3.14 is essential.14

5.3.16 Remark In "practice", recursive definitions with respect to a P that has MC (IC) often have the form

    F(s, x) = H(s)                                 if x is P-minimal
              G(s, x, {(s, y, F(s, y)) : yPx})     othw

This reduces to the case considered in 5.3.12 with a G-function, G̃, given by

    G̃(s, x, f) = H(s)          if x is P-minimal
                 G(s, x, f)    othw

A similar remark holds —regarding making the "basis" of the recursion explicit— for all the forms of recursion that we have considered.

14 It was tacitly taken advantage of in the last step of the proof. Imagine what would happen if F's argument were explicitly present in G: We would get G(b, F ↾ (b)>) = G(b, F ↾ (a)>), but not necessarily G(b, F ↾ (a)>) = G(a, F ↾ (a)>).

5.3.17 Example (The support function) The support function sp : U → U gives the set of all atoms, sp(x), that took part in the formation of some set x.
For example,

    sp(∅) = ∅
    sp({{{∅}}}) = ∅
    sp({2, {#, !, {1}}}) = {2, #, 1, !}    for atoms 2, #, 1, !

The existence and uniqueness of sp is established by the following recursive definition:

    sp(x) = {x}                    if x is an atom
            ⋃{sp(y) : y ∈ x}       othw        (1)

That (1) is an appropriate recursion can be seen as follows:

First, ∈ is left-narrow and has MC.
Next, (1) can be put in "standard" form (Theorem 5.3.9 in this case)

    (∀x ∈ dom(sp)) sp(x) = G(x, sp ↾ (x)∈⁻¹)        (2)

(of course, for a set x, (x)∈⁻¹ = x) where G : U × U → U is given by

    G(x, f) = {x}                if x is an atom
              ↑                  othw, if f is not a relation
              ⋃_{y∈f} δ(y)       in all other cases

In (2) the middle case for G above never applies. Note that, for a set x, ⋃_{y ∈ sp↾x} δ(y) = ⋃{sp(y) : y ∈ x}.
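The recursion (1) is easy to experiment with on nested Python data; below is a minimal sketch (ours, not the book's), where a frozenset plays the role of a set and any other object counts as an atom:

```python
def sp(x):
    """Support of x, per recursion (1): sp(x) = {x} if x is an atom,
    else the union of sp(y) over the members y of x.
    A 'set' is modelled by a frozenset; anything else is an atom."""
    if not isinstance(x, frozenset):          # x is an atom
        return frozenset({x})
    out = frozenset()
    for y in x:                               # union over the members of x
        out |= sp(y)
    return out

empty = frozenset()
print(sp(empty))                              # sp(emptyset) = emptyset
print(sp(frozenset({frozenset({empty})})))    # a pure set: empty support
print(sp(frozenset({2, frozenset({'#', '!', frozenset({1})})}))
      == frozenset({2, '#', '!', 1}))         # True
```

Termination is exactly the MC of ∈ on such finite nested objects: each recursive call descends to a member.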

5.3.18 Definition A set with empty support is called a pure set. 

5.3.1 Examples on Inductive Function Definitions

5.3.19 Lemma Let n ≥ 1. If we define the relation ≺ on N^(n+1) by

    (a, b⃗) ≺ (a′, b⃗′)  iff  a < a′ and b⃗ = b⃗′,

then ≺ is an order that has MC on N^(n+1).

Proof 1. ≺ is an order:

• Indeed, if (a, b⃗) ≺ (a, b⃗), then a < a, which is absurd.
• If (a, b⃗) ≺ (a′, b⃗′) ≺ (a″, b⃗″), then b⃗ = b⃗′ = b⃗″ and a < a′ < a″. Thus a < a″ and hence (a, b⃗) ≺ (a″, b⃗″).

2. ≺ has MC: So let ∅ ≠ A ⊆ N^(n+1). Let a be <-minimal in S = {x : (∃b⃗)(x, b⃗) ∈ A} ⊆ N.
Pause. Why is S ≠ ∅?
Let c⃗ be such that (a, c⃗) ∈ A. This (a, c⃗) is ≺-minimal in A. Otherwise, for some d, A ∋ (d, c⃗) ≺ (a, c⃗). Hence d < a, but this is a contradiction since d ∈ S (why?). □

The minimal elements of ≺ in N^(n+1) are of the form (0, b⃗), (0, b⃗′), (0, b⃗″), …, which are not comparable if they have distinct "b⃗-parts". Thus there are infinitely many of them.
We can now state the following, which is important for computability (cf., e.g., Tourlakis (2012, 2022)).

5.3.20 Definition (Primitive Recursive Schema) The following inductive definition is the schema of primitive recursion, due to Dedekind. Define f : N^(n+1) → N via given functions h : N^n → N and g : N^(n+2) → N by

    f(0, y⃗) = h(y⃗)
    f(x + 1, y⃗) = g(x, y⃗, f(x, y⃗))        (1)

5.3.21 Theorem The schema (1) of 5.3.20 defines inductively a unique function f : N^(n+1) → N.

Proof Using the relation ≺ of 5.3.19 that has MC (and thus IC) on N^(n+1), we rewrite (1) of 5.3.20 as follows:

• First, let G be given by

    G(x, y⃗, ψ) =Def  h(y⃗)                         if x = 0
                     g(x − 1, y⃗, ψ(x − 1, y⃗))     if x > 0 and ψ is a function N^(n+1) → N
                     ↑                             othw

• Thus we can rewrite (1) as

    (∀(x, y⃗) ∈ N^(n+1)) f(x, y⃗) = G(x, y⃗, f ↾ (x, y⃗)≻)        (2)

• Noting that

    f ↾ (x, y⃗)≻ = {((x−1, y⃗), f(x−1, y⃗)), ((x−2, y⃗), f(x−2, y⃗)), …, ((0, y⃗), f(0, y⃗))}        (3)

we see that the function in (3) applied to input (x − 1, y⃗) yields f(x − 1, y⃗), as needed. □
5.3.22 Exercise Prove by induction on x (and using y⃗ as a parameter) that the f defined by (1) is total, provided h and g are.
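Schema (1) of 5.3.20 is exactly the pattern computed by a for-loop, which is one way to see the totality claim of Exercise 5.3.22. A Python sketch for the case n = 1 —so y⃗ is a single number; the names primitive_recursion, add, and mul are ours:

```python
def primitive_recursion(h, g):
    """Return the unique f with f(0, y) = h(y) and f(x+1, y) = g(x, y, f(x, y)),
    computed bottom-up instead of by literal recursion."""
    def f(x, y):
        acc = h(y)                 # acc == f(0, y)
        for i in range(x):         # invariant: acc == f(i, y)
            acc = g(i, y, acc)     # step: acc becomes f(i+1, y)
        return acc
    return f

# Addition defined primitively from successor: add(0, y) = y, add(x+1, y) = add(x, y) + 1
add = primitive_recursion(lambda y: y, lambda x, y, prev: prev + 1)

# Multiplication defined primitively from addition: mul(0, y) = 0, mul(x+1, y) = mul(x, y) + y
mul = primitive_recursion(lambda y: 0, lambda x, y, prev: prev + y)

print(add(3, 4))   # 7
print(mul(3, 4))   # 12
```

If h and g always return a value, the loop clearly always terminates, which mirrors the induction asked for in the exercise.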

Let us see some examples of primitive recursions:

5.3.23 Example We know that 2^n means

    2 × 2 × 2 × ⋯ × 2    (n 2s)

But "⋯", or "etc.", is not mathematics! That is why we gave at the outset of this section the definition 5.3.1.
Applied to the case a = 2 we have

    2^0 = 1
    2^(n+1) = 2 × 2^n        (1)

From 5.3.21 we have at once that 5.3.1 —and in particular (1) above— defines a unique function satisfying its defining equations.
For the function that for each n outputs 2^n we can give an alternative definition that uses "+" rather than "×" in the "g-function" part of the definition:

    2^0 = 1
    2^(n+1) = 2^n + 2^n
m
5.3.24 Example Let f : N^(n+1) → N be given. How can I define Σ_{i=0}^m f(i, b⃗) —for any b⃗ ∈ N^n— other than by the sloppy

    f(0, b⃗) + f(1, b⃗) + f(2, b⃗) + ⋯ + f(i, b⃗) + ⋯ + f(m, b⃗)?

By induction/recursion, of course:

    Σ_{i=0}^0 f(i, b⃗) = f(0, b⃗)
    Σ_{i=0}^{m+1} f(i, b⃗) = (Σ_{i=0}^m f(i, b⃗)) + f(m + 1, b⃗)        (1)



5.3.25 Example Let f : N^(n+1) → N be given. How can I define Π_{i=0}^n f(i, b⃗) —for any b⃗ ∈ N^n— other than by the sloppy

    f(0, b⃗) × f(1, b⃗) × f(2, b⃗) × ⋯ × f(i, b⃗) × ⋯ × f(n, b⃗)?

By induction/recursion:

    Π_{i=0}^0 f(i, b⃗) = f(0, b⃗)
    Π_{i=0}^{n+1} f(i, b⃗) = (Π_{i=0}^n f(i, b⃗)) × f(n + 1, b⃗)        (2)

Again, by 5.3.21, (2) defines a unique function, named λn b⃗. Π_{i=0}^n f(i, b⃗), that behaves as required.
required. 
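The recurrences (1) and (2) of the last two examples translate verbatim into code; a sketch (function names are ours):

```python
def rec_sum(f, m, b):
    """Sum of f(i, b) for i = 0..m, per recurrence (1)."""
    if m == 0:
        return f(0, b)                        # basis
    return rec_sum(f, m - 1, b) + f(m, b)     # previous sum plus the new term

def rec_prod(f, n, b):
    """Product of f(i, b) for i = 0..n, per recurrence (2)."""
    if n == 0:
        return f(0, b)
    return rec_prod(f, n - 1, b) * f(n, b)

# With f(i, b) = i + b these are ordinary sums and products:
print(rec_sum(lambda i, b: i + b, 10, 0))     # 0 + 1 + ... + 10 = 55
print(rec_prod(lambda i, b: i + b, 4, 1))     # 1 * 2 * 3 * 4 * 5 = 120
```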

5.3.26 Example Here is a function with huge output! Define f : N → N by

    f(0) = 1
    f(n + 1) = 2^(f(n))        (3)

What does (3) look like in the notation of 5.3.21? It is

    f(0) = 1
    f(n + 1) = G(n, f(n))        (3′)

where, for all n and z, G(n, z) = 2^z.
What does the output f(n) look like in mathematical notation? Well,

    f(0) = 1,  f(1) = 2^(f(0)) = 2,  f(2) = 2^(f(1)) = 2^2,  f(3) = 2^(f(2)) = 2^(2^2)

Hmm! Is the guess that f(n) is a ladder of n 2s correct? Yes! Let's verify by induction:

1. Basis. f(0) = 1. A ladder of zero 2s. Correct!
2. I.H. Fix n and assume that

    f(n) = 2^(2^(⋯^2))    (a ladder of n 2s)

3. I.S. Thus f(n + 1) = 2^(f(n)), so we put the ladder of n 2s of the I.H. as the exponent of 2 —forming a ladder of n + 1 2s— to obtain f(n + 1). Done!
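Recursion (3) runs as written, though its outputs explode quickly; a sketch (the name ladder is ours):

```python
def ladder(n):
    """f(0) = 1, f(n+1) = 2**f(n): a 'ladder' (tower) of n 2s."""
    if n == 0:
        return 1
    return 2 ** ladder(n - 1)

for n in range(5):
    print(n, ladder(n))            # 1, 2, 4, 16, 65536
# ladder(5) = 2**65536 already has 19729 decimal digits:
print(len(str(ladder(5))))
```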

5.3.2 Fibonacci-like Inductive Definitions; Course-of-Values Recursion

5.3.27 Definition (Course-of-Values Recursion) The general case of Fibonacci-like recursive definitions is based on 5.3.9 or 5.3.12 and uses the order ≺ (and ≻ = ≺⁻¹):

    f(0, y⃗) = h(y⃗)
    f(n, y⃗) = if n > 0, then g(n − 1, y⃗, f ↾ (n, y⃗)≻)        (Fibonacci-like)

One often refers to the part "f ↾ (n, y⃗)≻", that is,15

    {((n−1, y⃗), f(n−1, y⃗)), ((n−2, y⃗), f(n−2, y⃗)), …, ((0, y⃗), f(0, y⃗))}

as the history of f at (n − 1, y⃗).
In computability theory, Fibonacci-like recursions are called course-of-values recursions (CVR), as they depend for their recursive calls, in general, on the entire history of the function under definition.
The above CVR has the form of 5.3.9 or 5.3.12.

5.3.28 Example (Fibonacci again; with a comment re the Basis case) Thus, if we want to fit the Fibonacci definition into the general schemata of 5.3.9 or 5.3.27 —without a parameter "y⃗"— we would choose a "g" like this:

    g(n, f) = if f ⊈ N² then ↑
              else if n = 0 then 0
              else if n = 1 then 1
              else if n > 1 then f(n − 1) + f(n − 2)        (1)

Thus the recurrence

    F(n) = g(n, F ↾ (n)>)        (2)

shows the uniqueness and existence of the Fibonacci definition, via Theorem 5.3.7 or 5.3.9.
In (1) above, "f(1) = 1" is not a "Basis case", because 1 is not minimal in N. ("f(0) = 0" is the Basis case, since 0 is <-minimal, indeed least, in N.)
So what is "f(1) = 1"? It is a boundary case of the g-definition, since n − 2 makes no sense in the Fibonacci recurrence if n = 1. Display (2) (via (1)) yields F(0) = 0, F(1) = 1 and, for n > 1, F(n) = F(n − 1) + F(n − 2).

15 In the expanded version below it is understood that the tuple ((x, y⃗), f(x, y⃗)) is missing if f(x, y⃗) ↑.
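The course-of-values shape of Example 5.3.28 can be run as written; a Python sketch in which the history F ↾ (n)> is passed to g as a dict (that representation, and the names, are our choices):

```python
def g(n, hist):
    """The 'g' of (1): hist maps each m < n to the already computed F(m)."""
    if n == 0:
        return 0                          # basis: 0 is <-minimal
    if n == 1:
        return 1                          # boundary case: n - 2 is meaningless
    return hist[n - 1] + hist[n - 2]      # genuine course-of-values call

def F(n):
    """Compute F(n) by building the history stage by stage, per (2)."""
    hist = {}
    for m in range(n + 1):
        hist[m] = g(m, dict(hist))        # g only sees F restricted below m
    return hist[n]

print([F(n) for n in range(10)])          # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```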

5.4 Exercises

1. Use induction to prove that 1 + 2 + 2^2 + ⋯ + 2^n = 2^(n+1) − 1.
2. Use induction to prove that Σ_{i=1}^n i² = n(n + 1)(2n + 1)/6.
3. Use induction to prove that Σ_{i=1}^n i³ = (Σ_{i=1}^n i)².
   Hint. You may use the well-known Σ_{i=1}^n i = n(n + 1)/2.
4. Use induction to prove that 5^n − 3^n is even, for n ≥ 0.
5. Use a direct proof (no induction!) to prove that 5^n − 3^n is even, for n ≥ 0.
6. Use induction to prove that 11^n − 4^n is divisible by 7, for n ≥ 1.
7. Use induction to prove that 5^(n+1) + 2 × 3^n + 1 is divisible by 8, for n ≥ 0.
8. Use induction to prove that n³ − 4n + 6 is divisible by 3, for n ≥ 0.
9. Use induction to prove that n² − n is divisible by 2, for n ≥ 0.
10. Prove that n³ − n is divisible by 3, for n ≥ 0.
11. Using induction prove that 1/√1 + 1/√2 + ⋯ + 1/√n ≤ 2√n − 1, for n ≥ 1.
12. Use induction to prove that n! ≥ 2^(2n), for n ≥ 9.
13. Prove Exercise 9 without induction, using a one-(short)line direct proof.
14. Use induction to prove that Σ_{i=1}^n (3i − 2) = (3n² − n)/2.
15. This time do not use induction. Prove directly —using the well-known Σ_{i=1}^n i = n(n + 1)/2— that Σ_{i=1}^n (3i − 2) = (3n² − n)/2.

16. Use induction to prove that

    Σ_{i=1}^n 1/((4i − 3)(4i + 1)) = n/(4n + 1)

17. Can you prove that

    Σ_{i=1}^n 1/(i · (i + 1)) = n/(n + 1)

without induction?
    Hint. 1/(i(i + 1)) = 1/i − 1/(i + 1).
18. Use induction to prove that

    Σ_{i=1}^n 1/(i · (i + 1)) = n/(n + 1)

19. Can you prove that

    Σ_{i=1}^n 1/((4i − 3)(4i + 1)) = n/(4n + 1)

without induction?

20. Let

    b₁ = 3,  b₂ = 6
    b_k = b_{k−1} + b_{k−2},  for k ≥ 3

Prove by induction that b_n is divisible by 3 for n ≥ 1. (Be careful to distinguish between what is the basis and what are cases arising from the induction step!)
21. Prove that

    Σ_{0≤k≤n} (−2)^k = (1/3)(1 − 2^(n+1))

for all odd positive n.


22. Prove that 2^(2n+1) + 3^(2n+1) is divisible by 5 for all n ≥ 0.
23. Let

    F₀ = 0,  F₁ = 1
    F_k = F_{k−1} + F_{k−2},  for k ≥ 2

Let φ stand for the number (1 + √5)/2. Prove by induction that F_n > φ^(n−2) for all n ≥ 3.
24. Let A be a set of n elements. Prove that 2^A has 2^n elements, using the binomial theorem.
    Hint. By the binomial theorem (5.2.25), 2^n = (1 + 1)^n = Σ_{i=0}^n C(n, i), where C(n, i) is the binomial coefficient "n choose i". Do not mix this up with the methodology suggested in Exercise 25 below.
25. Use induction on n to prove that if A has n elements —that is, A ∼ {0, 1, …, n − 1} if n ≥ 1, that is, A has the form {a₀, a₁, …, a_{n−1}}, or A = ∅— then 2^A has 2^n elements.
    Hint. For the induction step —going from A = {a₀, …, a_{n−1}} to A′ = {a₀, …, a_{n−1}, a_n}— argue that the added member a_n is in as many new subsets (of A′) as A has subsets in total.
26. Show, for any 0 < n ∈ N, that (Pⁿ)⁻¹ = (P⁻¹)ⁿ.
27. Let P on A be left-narrow. Show that, for any a ∈ A, (a)(P⁻¹)ⁿ is a set.
    Hint. (y)P⁻¹ = {x : xPy} is a set for any y, by left-narrowness. What values go into x in an expression like y (P⁻¹) ∘ (P⁻¹) ∘ ⋯ ∘ (P⁻¹) x (n factors of P⁻¹), for any y?
28. Prove that every natural number ≥ 2 is a product of primes, where 2 is the “trivial”
product of one factor.
Hint. Use CVI in conjunction with 5.2.20.
29. Supplement the above problem to add a proof that every natural number ≥ 2 is a
product of primes in a unique way, if we disregard permutations of the factors, which
is permissible by the commutativity of ×.

30. Code any finite set S = {a₀, a₁, …, a_n} by a code we will name c_S ("c" for "code") given by

    c_S =Def Π_{a∈S} p_a

where "p_a" is the "a-th prime" in the sequence of primes

    position    = 0   1   2   3   4  …
    prime name  = p₀  p₁  p₂  p₃  p₄ …
    prime value = 2   3   5   7   11 …

Prove

a. The function that assigns to every finite S ≠ ∅ the number c_S —and assigns 1 to ∅— is a 1-1 correspondence onto its range.
b. Use that fact to show that the set of all finite subsets of N is enumerable.

31. Reprove the previous problem with a different coding: To every set S = {a₀, a₁, …, a_n} —where the a_i are distinct— this time we assign the natural number

    bc_S = 2^(a₀) + 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_n)

(we assign 0 to ∅). "bc" stands for "binary code".16 That is, show

a. The function that assigns to every finite S ≠ ∅ the number bc_S —and assigns 0 to ∅— is a 1-1 correspondence onto its range.
b. If we are given the binary code of a set, we easily find the set if we convert the number to binary notation. The positions of the 1-bits (terminology for binary digits) in the code are the values of the members of the set.
c. Now argue again that the set of finite subsets of N is enumerable.
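Both codings (Exercises 30 and 31) are easy to compute, and the binary code is easy to invert; a sketch, where nth_prime is our own naive trial-division helper:

```python
def nth_prime(a):
    """p_a: the a-th prime, with p_0 = 2 (naive trial division)."""
    count, cand = -1, 1
    while count < a:
        cand += 1
        if all(cand % d for d in range(2, int(cand ** 0.5) + 1)):
            count += 1
    return cand

def c(S):
    """Prime code of Exercise 30: product of p_a over a in S (1 for the empty set)."""
    out = 1
    for a in S:
        out *= nth_prime(a)
    return out

def bc(S):
    """Binary code of Exercise 31: sum of 2**a over a in S (0 for the empty set)."""
    return sum(2 ** a for a in S)

def bc_decode(n):
    """Invert bc: the members are the positions of the 1-bits of n."""
    return {i for i in range(n.bit_length()) if (n >> i) & 1}

S = {0, 2, 3}
print(c(S))           # p_0 * p_2 * p_3 = 2 * 5 * 7 = 70
print(bc(S))          # 1 + 4 + 8 = 13
print(bc_decode(13))  # {0, 2, 3}
```

That bc_decode recovers S exactly is part b. of Exercise 31 in executable form.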

32. Prove that if A ⊆ B and A and C are enumerable, then B ∪ C ∼ B.


Hint. You can actually construct the 1-1 correspondence.
33. Prove that if B is infinite then it has an enumerable subset.
Hint. Construct a sequence of infinitely many distinct members of B,

    b₀, b₁, b₂, b₃, …

and argue that the set {b0 , b1 , b2 , b3 , . . .} is enumerable.


In more detail,

16 The literature on computability refers to this finite set code as the “canonical index” (Rogers
(1967), Tourlakis (2022)).

a. “Define” the sequence above by induction (recursion) by

Pick (and fix) any b in B and call it b0


Pick (and fix) any b in B − {b0 , . . . , bn } and call it bn+1

b. Well, "define" is a bit of an exaggeration, as this implies an infinite-length proof, since we have/give no clue on how these infinitely many choices are made. Let's use the axiom of choice, which guarantees that a function C exists such that for each S ∈ 2^B − {∅} it is C(S) ∈ S: We now really define

    b₀ = C(B)
    b_{n+1} = C(B − {b₀, …, b_n})

From our work on inductive definitions, the sequence (function of n, right?) exists.
Next
c. Prove by induction on n that the function λn.bn 17 is total (on N), 1-1 and onto
the set
T = {b0 , b1 , . . .} (1)
(the onto part is trivial; why?).

34. Prove that if B is infinite and C is enumerable, then B ∪ C ∼ B.


Hint. Use 33 and 32.
35. (Dedekind Infinite) Dedekind gave this alternative definition of infinite set, namely,

A is infinite iff for some proper subset of A —let’s call it S— we have A ∼ S.

Prove that his definition is equivalent to the one we introduced in Definition 3.6.1.
There are two directions in the equivalence!
Hint. Use 33 and 34. Note that if A ⊆ B is enumerable, then B = (B − A) ∪ A.
The following few exercises expand our topics.
36. (Upper bounds in POsets) Let (A, <) be some POset, and let ∅ ≠ B ⊆ A.
We call a u ∈ A an upper bound of B iff, for all x ∈ B, we have x ≤ u —where you will recall that x ≤ u means x < u ∨ x = u.
We say that an upper bound u of B is the least upper bound of B (in A) —in symbols, u = lub(B)— or also the supremum or "the sup", in symbols, u = sup(B), iff for all upper bounds u′ of B we have that u ≤ u′.
Determine upper bounds and the lub (if any) for the following two sets in the POset

    ({1, 2, 3}, {(1, 2), (1, 3)})
17 λ-notation was defined in 3.5.10.



• {1} and
• {1, 2, 3}.

37. Prove that √2 is not rational. We say that a real number that is not rational is irrational.
    Hint. The statement means that √2 cannot equal m/n for any integers m, n (n ≠ 0, of course). Well, assume that √2 = m/n, for some m, n, where we also assume that we have reduced the fraction m/n to the lowest possible numerator and denominator.
    Now 2n² = m² says that 2 divides m². Does it also divide m? Can you now reach a contradiction to the assumption √2 = m/n?
38. (A non-Constructive Proof!) Prove that there are irrational numbers a and b such that a^b is rational (a fraction of two integers).
    Hint. Logic tells us that A ∨ ¬A is true for any sentence A —see Exercises 4.2.8 for a definition of "sentence".
    So, consider cases.

    a. Case where (√2)^(√2) is rational. Done!
    b. Negation of the case above: (√2)^(√2) is irrational. Take it from here.

39. Recall the notation "(a, b)" for the open interval of reals or rationals, as the case may be; that is, (a, b) =Def {x ∈ R : a < x < b} or (a, b) =Def {x ∈ Q : a < x < b} respectively.18 A real —such as √2— that is not rational is called irrational.
40. Start with the POset (R, <). Consider next the sets of rationals S = {x ∈ Q : x < √2} and T = {x ∈ Q : x > √2}. Prove two things:

a. T has no smallest (rational, of course) element.
   Hint. Use the "calculus 101" fact that every interval of reals (a, b) contains some rational number (hence, infinitely many; why "hence"?).
b. While √2 is trivially the lub of S if we include irrational numbers in the set of upper bounds, on the other hand if we insist on rational upper bounds only (all those are in T), then there is NO least one.

41. Prove that if (A, <) is a LOset (linearly ordered set), then any pair a, b of members of
A has a least upper bound. Explain how such an lub can be found/constructed in each
case.
42. (Knaster-Tarski Fixpoint theorem) Let (A, <) be a POset with a minimum element m
and f : A → A be a total continuous function, which in the context of POsets means
that lub and function applications (calls) commute, that is, whenever lub(X ) exists for
∅ ≠ X ⊆ A, then so does lub(f[X]), and f(lub(X)) = lub(f[X]).

18 We are reminded that R stands for the set of all real numbers and Q for the set of all rational
numbers. Of course, Q  R as we know from “calculus 101”.

Assume now that every nonempty totally ordered (by "<") subset of A —such a subset is often called a "chain"— has a lub.
Prove that there is an a ∈ A such that f (a) = a. Such an a is called a fixed point or
fixpoint of f .
Hints.

• Prove that f is increasing on A, that is, if f(x) ↓ and f(y) ↓ and x ≤ y, then f(x) ≤ f(y). (Hint. What is lub({x, y})?)
• Define inductively the sequence —that is, function λn.an from N to A— below:

    a₀ = m
    a_{n+1} = f(a_n)

• Prove by induction that λn.a_n is total and increasing, that is, a_n ≤ a_{n+1}, for n ≥ 0.
• Let a be defined to stand for lub({a₀, a₁, a₂, …}), that is, lub(ran(λn.a_n)).
  Prove f(a) = a. Hint for this bullet. Show that lub({a₀, a₁, a₂, …}) = lub({a₁, a₂, …}).
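On a finite POset the iteration a₀ = m, a_{n+1} = f(aₙ) of the hints reaches the least fixpoint after finitely many steps. A sketch on the powerset of {0, …, 4} ordered by ⊆, with minimum ∅ and a monotone, continuous operator F of our own invention (so F is an assumption, not part of the exercise):

```python
def F(X):
    """A continuous operator on subsets of {0,...,4}:
    throw in 0 and the successor of anything already present."""
    return X | {0} | {x + 1 for x in X if x + 1 < 5}

# Iterate from the minimum element (the empty set) until a fixpoint appears.
a = frozenset()
while F(a) != a:
    a = frozenset(F(a))
print(sorted(a))   # [0, 1, 2, 3, 4]: the least fixpoint of F
```

Each pass computes the next aₙ of the hint's chain; the loop stops exactly when f(a) = a.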

43. Prove that the fixpoint you found above is least. That is, if c is any other member of A
such that f (c) = c, then a ≤ c.
44. (An Application of 42 to Computability) Let P(N : N) denote the set of all 1-argument
functions from N to N.
Let F be a total function F : P(N : N) → P(N : N). Such a function is called an
operator.
Now equip P(N : N) with the order ⊂ to obtain the POset of unary functions under the
inclusion (subset) order.

Prove
 
a. (P(N : N), ⊂) has the properties given abstractly to (A, <) in Exercise 42 above.
b. Assume that the total operator F : P(N : N) → P(N : N) is continuous. Prove that
F has a least fixpoint α ∈ P(N : N).
Note. In computability theory F is assumed to be computable (in some mathemat-
ically appropriate sense). Then, provably, so is its least fixpoint. This result, due to
Kleene, known as (a special case of his) first recursion theorem (cf. Tourlakis (2022))
has extensive applications in computability, but also in the area of program semantics.

45. The greatest common divisor —acronym gcd— let us call it “d”, of two nonzero
integers a and b is the largest positive common divisor of the two. We write d =
gcd(a, b).
Prove that if d = gcd(a, b), then for some integers (members of Z = {. . . , −1, 0, 1, . . .})
x, y we have d = ax + by.
Hint. Prove that the set S = {ax + by : x ∈ Z ∧ y ∈ Z} has positive members. Call d
the smallest such positive member and prove d = gcd(a, b). To this end,

a. Prove that we may write d = ax + by for the smallest positive member d of S. (Trivial)
b. Every common divisor of a and b divides d. (Trivial)
c. d divides a and b. If not, it will be, say, a = dq + r for some q and 0 < r < d. Derive a contradiction by showing that r = aX + bY for some X and Y in Z. Similarly for b.
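The hint above produces d = ax + by non-constructively, as a least element of S. The extended Euclidean algorithm —a different, constructive route, not the one the hint asks for— actually computes such x and y; a sketch (names ours):

```python
def ext_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) and d == a*x + b*y (a, b >= 0)."""
    if b == 0:
        return a, 1, 0                 # gcd(a, 0) = a = a*1 + 0*0
    d, x1, y1 = ext_gcd(b, a % b)      # d = b*x1 + (a % b)*y1
    # Substitute a % b = a - (a // b)*b to express d in terms of a and b:
    return d, y1, x1 - (a // b) * y1

d, x, y = ext_gcd(240, 46)
print(d, x, y)                         # 2 -9 47
print(240 * x + 46 * y == d)           # True
```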

46. If ab ≠ 0 and 1 = gcd(a, b), then we say that a and b are relatively prime.
Prove that if 1 = gcd(a, b) and a | bc —where as before, x | y means that y = xq for
some q— then a | c.
Hint. Use 45 above.
47. Refer to Example 5.2.21. Generalise said example to any base. Thus, let 1 < b ∈ N.
Prove that every natural number n ≥ 0 is expressible base-b as an expression

    n = a_m b^m + a_{m−1} b^{m−1} + ⋯ + a₁ b + a₀        (1)

where each a_i satisfies 0 ≤ a_i < b        (2)


Hint. Use CVI.

48. Write an algorithm as a, say, pseudo-C program,19 which will convert a number n ≥ 0 given base-10 to base-b.
    Hint. Due to (1) above,

    n = a_m b^m + a_{m−1} b^{m−1} + ⋯ + a₁ b + a₀
      = (a_m b^{m−1} + a_{m−1} b^{m−2} + ⋯ + a₁) b + a₀

thus we can obtain a₀ by noting the remainder of the division of n by b. Your algorithm ought to work from right to left to get the sequence a₀, a₁, …, a_m by repeating the preceding observation.
the preceding observation.

49. Demonstrate your algorithm above by converting 131 (this is expressed in decimal or
base-10 notation) to binary (or base-2) notation.

19 “Pseudo” means to not be too faithful to programming language syntax, and shortcuts in notation
are allowed if they do not introduce ambiguities.
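The hint of Exercise 48 is already the whole algorithm: divide by b, record the remainder, repeat. A sketch in Python rather than the pseudo-C the exercise requests (function names are ours):

```python
def to_base(n, b):
    """Digits a_0, a_1, ..., a_m of n in base b, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % b)   # a_0 is the remainder of n by b, per the hint
        n //= b                # continue with the bracketed quotient
    return digits

def show(n, b):
    """Conventional left-to-right (most significant first) rendering, b <= 10."""
    return ''.join(str(d) for d in reversed(to_base(n, b)))

print(show(131, 2))            # 10000011  (cf. Exercise 49)
print(show(131, 8))            # 203
```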

50. (An ancient theorem20 ) Prove that there are infinitely many primes. Note that this will
be a proof by contradiction. Induction is not relevant.

Hint. Suppose instead that there are only finitely many primes, exactly these n + 1:

p 0 , p 1 , . . . , pn

Consider as Euclid did their product plus 1:

Q = p 0 × p 1 × . . . × pn + 1

We have two cases: One, Q is prime; and Two, Q is not prime.


Trivially show that the first case implies a contradiction right away and then invoke
5.2.20 for the second case to get, again, a contradiction. Done. Right? Fill in all the
“blanks”.
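Euclid's construction is easy to test numerically; note that Q need not itself be prime —the point is that any prime factor of Q lies outside the assumed list. A sketch (the helper's name is ours):

```python
def least_prime_factor(n):
    """Smallest prime dividing n > 1 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n                        # n itself is prime

primes = [2, 3, 5, 7, 11, 13]       # pretend these were ALL the primes
Q = 1
for p in primes:
    Q *= p
Q += 1                              # Euclid's Q = p_0 * ... * p_n + 1
print(Q)                            # 30031 = 59 * 509: not prime here...
print(least_prime_factor(Q))        # ...but its prime factor 59 is new
```

No p in the list divides Q (each leaves remainder 1), which is the contradiction the hint asks you to extract.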

20 Euclid.
6 Inductively Defined Sets; Structural Induction

Overview

This chapter introduces a generalisation of the definitions by induction (recursion) of the previous chapter. Here we define sets inductively, not functions. The associated proof tool —induction along an inductive definition, or structural induction— for properties of inductively defined sets is introduced and validated.
We also connect the inductive definition of sets with an appropriate iterative construction by stages, and (in the chapter's Exercises section) with the definition of sets as monotone operator fixpoints (see 3.8.1).

6.1 Set Closures

An example of an inductively defined set is the following.


Suppose you want to define by finite means, and do so precisely, the set of all “simple”
arithmetical expressions that use the numbers 1, 2, 3, the operations + and ×, and round
brackets (but nothing else). Then you would do it like this:
The set of said simple arithmetical expressions is the smallest set (⊆-smallest) that

1. Contains each of 1, 2 and 3.


2. If it contains expressions E and E′, then it also contains (E + E′) and (E × E′).

Some folks would add a 3rd requirement “nothing else is in the set unless so demonstrated
using 1. 2. above” and omit “smallest”. Really?

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 187
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_6

How exactly would you so “demonstrate”? In a recursive definition you ought to be able
to make your recursive calls and not have to trace back why the object you constructed
exists!
We will prove in Sect. 6.3.5 that indeed there is an iterative way to show that a particular
simple arithmetic expression was formed correctly by our recursion, but that defeats the
beauty of recursion.

Besides, until we reach said section we don’t even know what “nothing else is in the
set unless so demonstrated using 1. 2. above” means or how to “use” 1. and 2. to do it!

So it is nonsense to stick such a statement at the bottom of the definition as a (redundant) afterthought.
Before we get to the general definitions, let us finesse our construction and propose some
terminology.

(a) First off, in step 1. above we say that 1, 2 and 3 are the initial objects of our recursive/inductive definition.
(b) In step 2. we say that (E + E′ ) is obtained by an operation (on strings) that is available to us, depicted as a “blackbox” below, which we named “+”.

E  −→
        +    −→ (E + E′ )
E′ −→

In words, the operation concatenates from left to right the strings

“(”, the string named by “E”, “+”, the string named by “E′ ”, and “)”

Similar comments for the operation “×”.

E  −→
        ×    −→ (E × E′ )
E′ −→

(c) Both operations in this example are single-valued, that is, functions. It is preferable to
be slightly more general and allow operations that are just relations, but not necessarily
functions. Such an operation O(x1 , . . . , xn , y) is n-ary —n inputs, x1 , . . . , xn — with
output variable y.
(d) We say that a set of objects S is closed under a relation (operation) —it could also be a
function— O(x1 , . . . , xn , y) meaning that for all input values x1 , . . . , xn in S, all the
obtained values y are also in S.

We are ready for the general definition:

6.1.1 Definition Given a set of initial objects I and a set of operations O = {O1 , O2 , O3 , . . .}, the object Cl(I , O) is called the closure of I under O —or the set inductively defined by the pair (I , O)— and denotes the ⊆-smallest set1 S that satisfies

1. I ⊆ S.
2. S is closed under all operations in O, or simply, closed under O or even O-closed.
3. The “smallest” part: Any set T that satisfies 1. and 2. also satisfies S ⊆ T .

The set O may be infinite. Each operation Oi is a set. 

Nice definition, but does the set Cl(I , O) exist given any I and O? Yes. But first,

6.1.2 Theorem For any choice of I and O, if Cl(I , O) exists, then it is unique.

Proof Suppose the definition of Cl(I , O) is ambiguous —i.e., it may have more than one value— and leads to two classes, S and T .
Then, letting S pose as closure, we get S ⊆ T from 6.1.1, 3.
Then, letting T pose as closure, we get T ⊆ S, again from 6.1.1, 3. Thus S = T . 

6.1.3 Theorem For any choice of I and O with the restrictions of Definition 6.1.1 the set
Cl(I , O) exists.

Proof We have to check and note a few things.

1. By 3.1.5, for each Oi , ran(Oi ) is a set (because the Oi is).


2. The class F = {ran(Oi ) : i = 1, 2, 3 . . .} is a set. This is so by Principle 3, since I can
index all members of F by assigning unique indices from N to each of its members (and
N is a set by Principle 0).
3. By 2. above and 2.4.17, ⋃F is a set, and so is T = I ∪ ⋃F.
4. T contains I as a subset (by the way T was defined) and is O-closed since any Oi -output —no matter where the inputs come from— is in ran(Oi ) ⊆ ⋃F.
5. The family G = {S : I ⊆ S ∧ S is O-closed} contains the set T as a member. Thus (cf. 2.4.18)

C =Def ⋂G ⊆ T

is a set by the subclass theorem (2.3.6).

1 We will learn that it is actually a set.



Since all sets S in G contain I and are O-closed, so is C (Verify). That is, C satisfies
1.–2. of 6.1.1. But also C ⊆ S for all such sets S the way it is defined. So it satisfies
6.1.1, 3. as well; it is ⊆-smallest.

We proved existence: C = Cl(I , O). 
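The proof above obtains C top-down, as the intersection of all O-closed supersets of I . For closures that are finite, or that we cap at a finite size, there is also a bottom-up computation, previewing Sect. 6.3. The Python sketch below is ours, not the text's; `closure`, `arities`, and the `limit` cap are illustrative devices.

```python
from itertools import product

def closure(initial, operations, arities, limit=None):
    """Bottom-up sketch of Cl(I, O): start from I and keep applying
    every operation until no new objects appear.  Operations are
    modelled as single-valued Python functions; `limit` caps the size
    so the loop terminates even when the closure is infinite."""
    S = set(initial)
    changed = True
    while changed:
        changed = False
        for op, n in zip(operations, arities):
            # iterate over a snapshot of S; new elements go into S itself
            for args in product(list(S), repeat=n):
                y = op(*args)
                if y not in S:
                    if limit is not None and len(S) >= limit:
                        return S
                    S.add(y)
                    changed = True
    return S

# Cl({0}, {x + 1}) capped at 5 elements gives an initial segment of N:
print(closure({0}, [lambda x: x + 1], [1], limit=5))  # {0, 1, 2, 3, 4}
```

The cap is what makes the sketch terminate; Cl({0}, {x ↦ x + 1}) itself is the infinite set N of Example 6.2.4 below.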

6.2 Induction Over a Closure

6.2.1 Definition Let a pair (I , O) be given as above.

We say that a property P[x] propagates with O iff for each Oi (x1 , . . . , xn , y) ∈ O, whenever all the inputs x1 , . . . , xn satisfy P[x] (i.e., P[xi ] is true for each i), then all output values returned by y —for said inputs— satisfy P[x] as well. Recall that for each assignment of values to the inputs x1 , . . . , xn we may have more than one output value in y; for all such values P[y] is true. 

6.2.2 Lemma For all (I , O) and a property P[x], if the latter propagates with O, then the
class A = {x : P[x]} is closed under O (is O-closed).

Proof So let Oi (x1 , . . . , xn , y) ∈ O. Let a1 , . . . , an be all in A. Thus

P[ai ], for all i = 1, . . . , n

By assumption, if Oi (a1 , . . . , an , b), then P[b] is true, hence b ∈ A. 

6.2.3 Theorem (Induction Over a Closure Principle) Let Cl(I , O) and a property P[x]
be given. Suppose we have done the following steps:

1. We showed that for each a ∈ I , P[a] is true.


2. We showed that P[x] propagates with O.

Then every a ∈ Cl(I , O) has property P[x].

 Naturally, the technique encapsulated by 1. and 2. of 6.2.3 is called “induction over Cl(I , O)”
or “structural induction” over Cl(I , O).
Note that for each Oi ∈ O the “propagation of property P[x]” will take the form of an
I.H. followed by an I.S.:

• Assume for the unspecified fixed inputs a1 , . . . , an of Oi that all satisfy P[x]. This is
the I.H. for Oi .

• Then prove that any output b of Oi caused by said input also satisfies the property.

Proof (of 6.2.3) Let us write

A =Def {x : P[x]}
Thus, 1. in 6.2.3 translates to
I⊆A (∗)
2. in 6.2.3 yields by the Lemma
A is O-closed (∗∗)
Now we cannot directly apply 6.1.1 and say “by (∗) and (∗∗) we have”

Cl(I , O) ⊆ A

because in 6.1.1 the “sets T ” that fulfil “1. and 2.” must be, well, sets; not proper classes.
Here is the workaround: Cl(I , O) contains I and is O-closed. By (∗) and (∗∗) so does

T = Cl(I , O) ∩ A (∗ ∗ ∗)

But T is a set by 2.3.6 and thus
Cl(I , O) ⊆ T ⊆ A    (by 6.1.1 and (∗ ∗ ∗))

The last inclusion immediately translates to

x ∈ Cl(I , O) implies P[x] is true 

6.2.4 Example Let S = Cl(I , O) where I = {0} and O contains just one operation,
x + 1 = y, where y is the output variable. That is,

n −→ x + 1 = y −→ n + 1 (1)

is our only operation. By induction over S, I can show S ⊆ N.


The “P[x]” here is “x ∈ N”.
So P[0] is true. I verified the property for I . That the property propagates with our operation is captured by (1) above (if n ∈ N, then n + 1 ∈ N). Done!

Can we show also N ⊆ Cl(I , O)? Yes: In this direction I do SI over N on variable n. The
property, let’s call it Q[x], now is “x ∈ Cl(I , O)”.
For n = 0, n ∈ Cl(I , O) since 0 ∈ I ⊆ Cl(I , O) by 6.1.1.

Now, say (I.H.) n ∈ Cl(I , O). Since Cl(I , O) is closed under the operation x + 1 = y,
we have n + 1 ∈ Cl(I , O) by 6.1.1.
So,
Cl(I , O) = N 

 Thus the induction over a closure generalises SI. The direction N ⊆ Cl(I , O) can also
proved directly by a result in the new section. 

6.3 Closure Versus Definition by Stages

We will see in this section that there is also a by-stages or by-steps way to obtain Cl(I , O).

6.3.1 Definition (Derivations) An (I , O)-derivation —or just derivation if we know which


(I , O) we are talking about— is a finite sequence of objects

d1 , d2 , d3 , . . . , di , . . . , dn (1)

satisfying:
Each di is

1. A member of I ,
or
2. For some j, one of the results of O j (x1 , . . . , xk , y) with inputs a1 , . . . , ak that are found
in the derivation (1) to the left of di .

n is called the length of the derivation. Every di in (1) is called an (I , O)-derived object,
or just derived, if the (I , O) is understood. 

 Clearly, the concept of a derivation abstracts, thus generalises, the concept of proof, while
a derived object abstracts the concept of a theorem. 

6.3.2 Example For the (I , O) of 6.2.4, here are some derivations:

0, 0, 0
0, 1, 0, 1, 0, 1, 1, 1, 1, 0
Nothing says we cannot repeat a di in a derivation! Lastly here is an “efficient” derivation
with no redundant steps: 0, 1, 2, 3, 4, 5. 
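Definition 6.3.1 can be checked mechanically. In the Python sketch below (our names, not the text's) an operation is modelled as a pair of an arity and a predicate rel(args, y) saying that y is one of the results of the operation on the inputs args; the checker validates each entry either as a member of I or from entries to its left.

```python
from itertools import product

def is_derivation(seq, initial, ops):
    """Check that seq is an (I, O)-derivation per Definition 6.3.1."""
    for i, d in enumerate(seq):
        if d in initial:
            continue                      # case 1: d is a member of I
        earlier = seq[:i]                 # case 2: d is produced from the left
        if not any(rel(args, d)
                   for k, rel in ops
                   for args in product(earlier, repeat=k)):
            return False
    return True

# the single operation x + 1 = y of Example 6.2.4:
succ = [(1, lambda args, y: y == args[0] + 1)]
print(is_derivation([0, 1, 0, 1, 1, 2], {0}, succ))  # True: repetitions allowed
print(is_derivation([0, 2], {0}, succ))              # False: 2 is not justified
```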

6.3.3 Proposition If
d1 , d2 , d3 , . . . , di , . . . , dn , dn+1 , . . . , dm
is an (I , O)-derivation, then so is

d1 , d2 , d3 , . . . , di , . . . , dn

Proof Each di is validated in a derivation either outright (i.e., is in I ) or by looking to the


left! What we may want to remove to the right of di does not affect the validity of that entry.


6.3.4 Proposition If d1 , d2 , . . . , dn and e1 , e2 , . . . , em are (I , O)-derivations, then so is

d1 , d 2 , . . . , d n , e 1 , e 2 , . . . , e m

Proof Traversing d1 , d2 , . . . , dn and e1 , e2 , . . . , em in

d1 , d2 , . . . , dn , e1 , e2 , . . . , em

from left to right we validate each di and each e j giving precisely the same validation reason
as we would in each sequence d1 , d2 , . . . , dn and e1 , e2 , . . . , em separately. These reasons
are local to each sequence. 

We now prove that defining a set S as an (I , O)-closure is equivalent to defining S as the set of all (I , O)-derived objects.

6.3.5 Theorem For any initial sets of objects and operations on objects (I and O) we have
that Cl(I , O) = {x : x is (I , O)-derived}.

Proof Let us write D = {x : x is (I , O)-derived} and prove that Cl(I , O) = D. We have


two directions:

1. Cl(I , O) ⊆ D: By induction over Cl(I , O). The property to prove is “x ∈ D”.

• Let x ∈ I . Then x is derived via the one-member derivation

x

So x ∈ D. Thus all x ∈ I have the property.


• The property “x ∈ D” propagates with each Ok (x1 , . . . , xn , y) ∈ O: So let each of the xi have a derivation . . . , xi . We show that so does y.
Concatenating all these derivations we get a derivation (6.3.4)

. . . , x1 , . . . , . . . , xi , . . . , . . . , xn (1)

But then so is
. . . , x1 , . . . , . . . , xi , . . . , . . . , xn , y (2)
by 6.3.1, case 2. That is, y is derived, hence y ∈ D is proved (I.S.).

2. Conversely, prove that D ⊆ Cl(I , O): Let x ∈ D. This time we do good old-fashioned
CVI over N on the length n of a derivation of x, toward showing that x ∈ Cl(I , O) —this
is the “property of x” that we prove.
Basis. n = 1. The only way to have a 1-element derivation is that x ∈ I .
Thus, x ∈ I ⊆ Cl(I , O) by 6.1.1.
I.H. Assume the claim for x derived with length k < n.
I.S. Prove that the claim holds when x has a derivation of length n.
Consider such a derivation

a1 , . . . , ai , . . . , ak , . . . , an , where an = x
If x ∈ I , then we are done by the Basis. Otherwise, say x is the result of an operation
(relation) Or ∈ O, applied on entries to the left of x, that is, say that Or (. . . , x) is true
—where we did not (have to) specify the inputs.
By the I.H., the inputs of Or are all in Cl(I , O). Now, since this closure is closed
under Or (. . . , x), we have that the output x is in Cl(I , O) too. 

 6.3.6 Remark So now we have two equivalent (6.3.5) approaches to defining inductively
defined sets S: As S = Cl(I , O) or as S = {x : x is (I , O)-derived}.
The first approach is best when you want to prove properties of all members of the set S.
The second is best when you want to show x ∈ S, for some specific x.  

 6.3.7 Example Let us revisit Example 6.2.4, second half of the proof. To prove N ⊆ Cl(I , O) we prove that each n ∈ N has an (I , O)-derivation.
Indeed, such a derivation for n is

0, 1, 2, . . . , n − 1, n (1)

where the entries in (1) are precisely the members of {x ∈ N : x ≤ n}, in ascending order and without repetitions.  

6.3.8 Example Let A = {a, b}. We call A an “alphabet”.


Let I = {λ}, λ being (the name of) the empty string. Let us denote string concatenations
by putting the strings we want to concatenate next to each other. E.g., concatenate aaa and
bbbaa to obtain aaabbbaa. Also, if X denotes a string, and so does Y , then X Y denotes
the concatenation of the strings (denoted by) X and Y in that order. Similarly, Xa means

the result of concatenating string named X with the (length-1) string a, in that order. The
length of a string over A is the number of occurrences in the string (counting repetitions) of
a and b.
We denote by A+ the set of all strings of nonzero length formed using the symbols a and b. A∗ is defined to be A+ ∪ {λ}. Let O consist of the operations Oa and Ob :

X −→ Oa −→ Xa (1)

and
X −→ Ob −→ X b (2)
We claim that Cl(I , O) = A∗ .

1. For Cl(I , O) ⊆ A∗ we do induction over the closure to prove that any x ∈ Cl(I , O)
satisfies x ∈ A∗ (“the property”).

• Well, if x ∈ I then x = λ. But λ ∈ A∗ .


• The property propagates with each of Oa and Ob . For example, if X ∈ A∗ , then since
Xa is also a string over the alphabet A, we have Xa ∈ A∗ . Similarly for Ob . Done.

2. For Cl(I , O) ⊇ A∗ we do induction over N on n = |Y | —the length of Y — to prove


that any Y ∈ A∗ satisfies Y ∈ Cl(I , O) (“the property”).

• Basis. n = 0. Then Y = λ ∈ I ⊆ Cl(I , O). Done.


• I.H. Assume claim for fixed n.
• I.S. Prove for n + 1. If |Y | = n + 1 then Y = Xa or Y = X  b for some X or X  of
length n. Say, it is Y = Xa. By I.H. X ∈ Cl(I , O). But since Cl(I , O) is O-closed,
we have Y = Xa ∈ Cl(I , O) by (1). The Y = X  b case is entirely similar. 
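The stage-by-stage reading of this example is easy to program. In the sketch below (our code, not the text's) each pass applies Oa and Ob to the previous stage, so after n passes we have collected exactly the strings of A∗ of length at most n:

```python
def strings_up_to(n):
    """All strings over {a, b} of length <= n, built by the operations
    O_a : X -> Xa and O_b : X -> Xb starting from the empty string."""
    stage = {""}              # stage 0: the initial object, the empty string
    result = set(stage)
    for _ in range(n):
        stage = {x + c for x in stage for c in "ab"}
        result |= stage
    return result

print(sorted(strings_up_to(2)))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```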

6.3.9 Example Let A = {a, b} again.


Let I = {λ}, let O consist of one operation R:

X −→ R −→ a X b (3)

We claim that Cl(I , O) = {a n bn : n ≥ 0}, where for any string X ,

X n =Def X X · · · X (n copies of X )

If n = 0, “0 copies of X ” means λ.

Let us write S = {a n bn : n ≥ 0}.



1. For Cl(I , O) ⊆ S we do induction over the closure to prove that any x ∈ Cl(I , O)
satisfies x ∈ S (“the property”).

• Well, if x ∈ I then x = λ = a 0 b0 . Done.


• The property propagates with R. For example, say x = a n bn ∈ S. Using (3) we see that the output, axb, is a n+1 bn+1 ∈ S. The property does propagate! Done.

2. For Cl(I , O) ⊇ S we do induction over N on n of x = a n bn (arbitrary member of S) to


prove that any x ∈ S satisfies x ∈ Cl(I , O) (“the property”).

• Basis. n = 0. Then x = λ ∈ I ⊆ Cl(I , O). Done.


• I.H. Assume claim for fixed n.
• I.S. Prove for n + 1. Thus x = a n+1 bn+1 = aa n bn b. By the I.H., a n bn ∈ Cl(I , O).
By (3) —recall that Cl(I , O) is O-closed— we get the output aa n bn b = a n+1 bn+1 ∈
Cl(I , O). 
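A derivation witnessing a n bn ∈ Cl(I , O) can likewise be generated by iterating the single rule X ↦ aXb; the sketch below (our code) returns the whole derivation λ, ab, aabb, . . . , a n bn :

```python
def anbn_derivation(n):
    """The (I, O)-derivation of a^n b^n obtained by applying the rule
    X -> aXb exactly n times, starting from the empty string."""
    d = [""]
    for _ in range(n):
        d.append("a" + d[-1] + "b")
    return d

print(anbn_derivation(3))  # ['', 'ab', 'aabb', 'aaabbb']
```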

6.3.10 Example (Extended Binary Trees) This is a longish example with some preliminary discussion up in front. We want to define the mathematical (and computer science)
term known as “Tree”.
This term refers to a structure, which uses as building blocks —called nodes— the
members of the enumerable set below

A = {◯0 , ◯1 , ◯2 , . . . ; □0 , □1 , □2 , . . .}

Trees look something like this:

The qualifier “extended” is due to the presence of square nodes. We will not define simple
trees (they have round nodes only).
These nodes are made distinct by the use of subscripts. The symbols in the set A are distinguished by their type, “round” versus “square”, and within each type by their natural number index. Thus, ◯i = ◯j iff i = j, □i = □j iff i = j, and ◯i ≠ □j , for all i, j.

One feature in both of the above drawings is essential to note:

Circular or square nodes are connected by line segments. Walking in the vertical direction
from the top of the page towards the bottom, no nodes are ever shared. In particular, in all
the examples above where we have more than one node, you will notice that

the two sets of nodes that “hang below” the top node (left and right of it) are disjoint.

We need to include this requirement in our definition.


But clearly these sets of nodes have “geometric structure” (position: left/right; and connections: via line segments)! They are not “flat” sets like {◯5 , ◯11 }.
And yet, in the mathematical definition below we will need to state the boxed condition:
the left and right, when you “forget” the lines and positions, become disjoint flat sets. This
observation is what imposes some complexity in the definition, which defines the “structure”
and the “flat” set that supports the structure (the set of nodes in the tree) simultaneously.
We define an extended binary tree as a member of the inductively defined set of e-trees.
It is intended that each e-tree of the inductively defined set of all trees is an ordered pair:

(flat set of its nodes, geometric tree structure)

The “geometric tree structure” can be mathematically pictured in a one-dimensional depiction of the trees.
For example, the first tree in the figure above is linearly represented by the (ordered) triple below, whose first and third components are also ordered triples.

((□1 , ◯2 , □2 ), ◯1 , (□3 , ◯3 , □4 ))

The “flat set” of (round) nodes of the above is {◯1 , ◯2 , ◯3 }.

Thus our definition below builds the flat set —called the support of the tree— of nodes
of a tree at the same time as it builds the structure of the tree.

6.3.11 Definition We define the set of all extended trees —or just trees— E T , as Cl(I , O)
where:

1. First, choose as the set of initial objects

I = {(∅, □0 ), (∅, □1 ), (∅, □2 ), . . .}


2. O has just one rule, with a constraint on the input: If FX ∩ FY = ∅ and ◯i ∉ FX ∪ FY , then

(FX , X ) −→
      ◯i −→   form tree   −→ (FX ∪ FY ∪ {◯i }, (X , ◯i , Y ))
(FY , Y ) −→
3. For each (S, T ) ∈ Cl(I , O) we say that T is an extended tree, and S is its support, that is, the “flat” set of round nodes used to build T .2
We indicate this relationship by

S = sup(T )3

 If T = (X , ◯i , Y ), then we say that ◯i is the root of T , while X is its left and Y is its right subtree. 


Some examples of trees are

We verify the example above: Using 6.3.5, the leftmost example is a tree since it is the right component of the pair (∅, □1 ). The next tree is built via the derivation —written linearly,

(∅, □1 ), (∅, □2 ), ({◯2 }, (□1 , ◯2 , □2 ))

The next derivation builds both the 2nd and 3rd trees:

(∅, □1 ), (∅, □2 ), ({◯2 }, (□1 , ◯2 , □2 )), (∅, □3 ), (∅, □4 ), ({◯3 }, (□3 , ◯3 , □4 ))

The 4th tree has this4 as a derivation:

(∅, □1 ), (∅, □2 ), ({◯2 }, (□1 , ◯2 , □2 )), (∅, □3 ), (∅, □4 ), ({◯3 }, (□3 , ◯3 , □4 )),
({◯1 , ◯2 , ◯3 }, ((□1 , ◯2 , □2 ), ◯1 , (□3 , ◯3 , □4 )))

The support of the 4th tree is the flat set {◯1 , ◯2 , ◯3 }. 


2 We may fix a priori a supply (set) A of acceptable round nodes.
3 As for many other symbols, “sup” means something else in the context of POsets.
4 Derivations are not unique as is clear from Example 6.3.2.

6.3.12 Example (Trees —continued) Hmm! Seems like we are not including square nodes in the support. See how the support component of each entry in I is ∅. Why so?
In the words of Knuth (Knuth (1973)), trees are “the most important nonlinear structure arising in computing algorithms”. The extended tree is an abstraction of the trees that we implement with computer programs, where round nodes are the only ones that can carry data.
The lines are (implicitly) pointing downwards. They are pointers, in computer jargon. For example, the topmost leftmost line in the fourth tree above points to the node ◯2 . Practically it means that if your program is processing node ◯1 , then it can transfer to and process node ◯2 if it wishes. It knows the address of ◯2 . The pointer holds this address as value.
Which brings me to square nodes! Together with the line planted on them, they are
notation for null pointers! They point nowhere. So square nodes cannot hold information,
that is why they do not contribute to the support of the tree.
The computer scientist calls round nodes “internal” and calls square nodes “external”.
Finally, how do the lines —called edges— get inserted? We defined “root” for trees, as well as “left subtree” and “right subtree”. So, to draw the lines of a tree that is given mathematically as (X , ◯r , Y ), we recursively call the process that does the “drawing” on (inputs) X and Y .
Then add two more edges: One from ◯r to the root of X and one from ◯r to the root of Y .
How does the recursion terminate? Well, if your tree is just □j , then there is nothing to draw. □j is the root. This is the basis of the recursive procedure: do nothing. 

Here is something interesting about all extended trees:

6.3.13 Proposition In any extended tree, the number of square nodes exceeds by one the
number of round nodes.

Proof Induction over the set of all trees (6.3.11) Cl(I , O).

1. Basis. For any (∅, □i ), the tree-part (structure-part) is just □i . One square node, 0 round nodes. Done.
2. The property propagates with the only tree-builder operation:

(FX , X ) −→
      ◯i −→   form tree   −→ (FX ∪ FY ∪ {◯i }, (X , ◯i , Y ))
(FY , Y ) −→

Indeed, suppose that X has φ internal (round) and ε external (square) nodes. Let also Y have φ′ internal and ε′ external nodes.
The assumption on the input side is then (I.H.) that

φ + 1 = ε    (1)

and

φ′ + 1 = ε′    (2)

The output side of the operation has the tree (X , ◯i , Y ). This has Φ = φ + φ′ + 1 internal nodes and E = ε + ε′ external ones. Using (1) and (2) we have

Φ = ε + ε′ − 1 = E − 1

Seeing that this is the property we want to prove on the output side, indeed the property propagates with the rule. Done. 
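Proposition 6.3.13 is easy to confirm experimentally. In the Python sketch below (our encoding, not the text's) a square node is a string such as "□1" and a tree with round root ◯i is a triple (X, root, Y); counts returns the pair (round, square), and its recursion mirrors the inductive proof above:

```python
def counts(t):
    """Return (round_nodes, square_nodes) of an extended tree."""
    if isinstance(t, str):            # a square (external) node
        return (0, 1)
    left, _root, right = t            # (X, round root, Y)
    rl, sl = counts(left)
    rr, sr = counts(right)
    return (rl + rr + 1, sl + sr)

t2 = ("□1", "◯2", "□2")               # the 2nd tree of the examples
t4 = (t2, "◯1", ("□3", "◯3", "□4"))   # the 4th tree of the examples
for t in ("□1", t2, t4):
    r, s = counts(t)
    assert s == r + 1                 # squares exceed rounds by one
print(counts(t4))  # (3, 4)
```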

We will have more to say about trees in Chap. 8.

6.4 Exercises

1. (Long but Easy Exercise) Below we simultaneously define the syntax of a set of
names of certain sets of strings and the semantics of said names —that is, what sets
they name.
The set of names is given as a closure while the semantics of those names is given
along the definition of the closure, inductively. See below.
For the names we need an alphabet of symbols. As such we take the alphabet Σ = {0, 1, (, ), ·, +, ∗}. The names and their semantics are given by the inductive definition:
Name      Semantics (as subsets of {0, 1}∗ )

∅         ∅
0         {0}
1         {1}

If α and β are strings among those we are defining here, with meanings A ⊆ {0, 1}∗ and B ⊆ {0, 1}∗ respectively, then so are

(α + β)   A ∪ B
(α · β)   A · B (= {x · y : x ∈ A ∧ y ∈ B})
(α ∗ )    A∗
These strings are called regular expressions, and the sets they are “naming” are called
regular sets. For example, (0 + 1) is a regular expression for the regular set {0, 1}, while
(∅∗ ) is a regular expression for {λ}, where λ denotes the empty string (in this context “λ”
is not related to λ-notation). We informally omit brackets (so that we can write 0 + 1, ∅∗ ),
by the rules:

a. Omit outermost brackets


b. The strength of operations is (from strongest to weakest), ∗, ·, +.

Prove by induction on the definition of regular expressions that if A is a regular set, then so is {x : x · y ∈ A for some y ∈ {0, 1}∗ }.
Hint.
You need to show (by induction on regular expressions) that any such set of prefixes, pref(A) to use a name for convenience, can be named by a regular expression.
Basis. ∅ names the set ∅. Clearly, pref(∅) = ∅, hence the prefix set has a “name”, ∅. 0 names {0}. pref({0}) = {λ, 0}. This has a name (using simplified notation, without all brackets) ∅∗ + 0, so is a regular set. Similarly for the regular set named “1”.
I.H. Assume that if α, β name A, B respectively, then pref(A), pref(B) have names (i.e., are regular) γ , δ respectively.
I.S. What about α + β, which names A ∪ B? Well, pref(A ∪ B) = pref(A) ∪ pref(B), which has name γ + δ, hence is regular.
Your work starts here:

 Now work out the cases where α · β names A · B and α ∗ names A∗ .
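On finite samples one can test the identities that the remaining cases hinge on: pref(A ∪ B) = pref(A) ∪ pref(B) and, provided B is nonempty, pref(A · B) = pref(A) ∪ A · pref(B). The brute-force Python sketch below is ours and is only a finite-set check, not a proof:

```python
def pref(A):
    """All prefixes (including the empty string) of members of A."""
    return {x[:i] for x in A for i in range(len(x) + 1)}

def cat(A, B):
    """The concatenation A . B = {x + y : x in A, y in B}."""
    return {x + y for x in A for y in B}

A, B = {"01", "1"}, {"0", "10"}
assert pref(A | B) == pref(A) | pref(B)
assert pref(cat(A, B)) == pref(A) | cat(A, pref(B))  # needs B nonempty
print(sorted(pref(A)))  # ['', '0', '01', '1']
```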


2. Prove that for each choice of I , O we can define a monotone operator Γ (cf. 3.8.1) such that Γ ∞ = Cl(I , O). The notation “Γ ∞ ” was introduced in Definition 3.8.4.
3. Show that the transitive closure P + of a set relation P is Cl(I , O) for appropriate
I , O. Specify I and O and prove they work for P.
4. Show that, for each I and O, the part I can be absorbed by O —its members viewed as zero-ary operations without premises (inputs). Thus an O′ exists such that Cl(I , O) = ⋂{S : S is O′ -closed}.
Note. Authors adopting 0-premise rules thus write their closures as “Cl(O)” where O
contains 0-premise rules.
5. (The Syntax of Boolean Formulas, #1) Boolean formulas are defined as Cl(I , O) where
I = {⊥, ⊤, p, p′ , p′′ , p′′′ , . . .} and { p, p′ , p′′ , . . .} is the enumerable set of Boolean variables while ⊥, ⊤ are the two Boolean constants which, for simplicity, we have identified in the body of this book (in Chap. 4) with their intended values in the metatheory, namely, f, t.
The members of I are called atomic Boolean formulas.
There are two operations on strings in O, namely,

A −→   ¬   −→ (∗¬ ∗ A∗)

A −→
         ∨   −→ (∗A ∗ ∨ ∗ B∗)
B −→

where “∗” denotes string concatenation.

a. How many brackets do we need to write ⊥ or p′ correctly?


b. Prove by induction over Cl(I , O) that every Boolean formula has as many left as it
has right brackets.
6. (The Syntax of Boolean Formulas, #2) Prove by induction over Cl(I , O) (5 above) that
every Boolean formula has as many left brackets as Boolean connectives.
7. (The Syntax of Boolean Formulas, #3) Prove by induction over Cl(I , O) (5 above) that
every Boolean formula contains an atomic one as a substring.
8. (The Syntax of Boolean Formulas, #4) A prefix of a string of symbols X is a string U for which we have a string V such that X = U ∗ V (or, simply, X = U V ). A prefix U of X is proper by definition iff X ≠ U .
Prove by induction over Cl(I , O) (5 above) that, for every Boolean formula A, every one of its nonempty (≠ λ) proper prefixes contains an excess of left brackets.
9. (Lack of Ambiguity or “Unique Readability”; Immediate Predecessors) A Boolean
formula A is either atomic or (¬B) or (B ∨ C) where B, C are formulas (cf. 5 above).
In the first of the latter two cases we call B an immediate predecessor (or i.p.) of A,
while in the second case we say that both B and C are i.p. of A. On the other hand, an
atomic formula has no i.p.
If for every formula A the i.p. are uniquely determined, then we say that the pair (I , O)
is unambiguous, else it is ambiguous.
Prove that the rule set in 5 is unambiguous. We say that this is the “unique readability”
metatheorem for Boolean formulas.
Hint. Use Problem 8.
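For experimenting with Exercises 5-9 it helps to generate formulas by stages. The Python sketch below is ours; a finite sample of atomic formulas stands in for the infinite I , and the two string operations build formulas whose bracket counts (Exercises 5b and 6) we then check:

```python
def formulas(stages):
    """Boolean formulas buildable from a sample of atoms in the given
    number of stages, using A -> (¬A) and A, B -> (A ∨ B)."""
    S = {"⊥", "⊤", "p", "p'"}         # a finite sample of the atomic formulas
    for _ in range(stages):
        S = S | {f"(¬{A})" for A in S} | {f"({A}∨{B})" for A in S for B in S}
    return S

for F in formulas(2):
    assert F.count("(") == F.count(")")                  # Exercise 5b
    assert F.count("(") == F.count("¬") + F.count("∨")   # Exercise 6
print(len(formulas(1)))  # 24: 4 atoms, 4 negations, 16 disjunctions
```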
10. Let S = Cl(I , O) where I = {2} and O contains only λx y.x + y.
Prove that S = {2n : n ≥ 1}. There are two directions: ⊆ and ⊇.
11. Let T = Cl(I , O) where I = {2} and O contains only λx y.x + y and λx y.x − y.
Prove that T = {2n : n ∈ Z}. There are two directions: ⊆ and ⊇.
12. Consider Cl(I , O) where I = {(0, 0)} and O contains only λ(x, y).(x + 3, y + 2)
where both x and y are in N.
Prove that for all (n, m) ∈ Cl(I , O), 5 divides n + m.
13. Consider Cl(I , O) where I = {(0, 0)} and O contains only the two rules λ(x, y).(x +
1, y) and λ(x, y).(x, y + 1) where both x and y are in N.
Prove that N × N = Cl(I , O).
Caution. Do not forget the ⊆ direction.
14. Use Simple Induction to prove that if h and g are total and if f is defined from them
by primitive recursion, that is,
For all x, y,

f (0, y) = h( y)
f (x + 1, y) = g(x, y, f (x, y))

then f is total as well. Incidentally, we use the notation f = prim(h, g) for the f
defined above and call h the basis function and g the iteration function.
Hint. Prove by simple induction on x that, for all y, we have f (x, y) ↓.

  Very Important! Some of the variables in f , g and h above may be missing! This is
alright. Permutation of the variables is also alright. We have indicated all the variables as
place-holders in the general case. The following function (Exercise 15) has a primitive
recursive definition where the last variable of “g” is missing so there is no “recursive
call”! 
15. (The switch function) Prove that λx yz. if x = 0 then y else z is in PR.
16. Let f = prim(h, g). Imagine a programming language that allows the assignment
statements z ← h( y) and z ← g(x, y, w). Program in this programming language,
using a single do loop, the function λx y. f (x, y) given by the primitive recursion in
14.
Hint. You will obviously use pseudo-programming, as details of the programming
language are not essential. The crucial part is that it supports the above mentioned
assignment statement and it can do “loops”:
do i = 0 to n

⋮ 

17. True or false? In the schema defining f as f = prim(h, g) the recursive call can be
eliminated.
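The hinted do-loop program of Exercise 16 (and the spirit of 17) can be sketched in Python; prim, add and mul are our illustrative names, and the recursive call is replaced by a single accumulator updated in a loop:

```python
def prim(h, g):
    """f = prim(h, g), computed with one loop instead of recursion."""
    def f(x, *y):
        w = h(*y)                 # w holds f(0, y) = h(y)
        for i in range(x):        # do i = 0 to x - 1
            w = g(i, *y, w)       # w becomes f(i + 1, y) = g(i, y, f(i, y))
        return w
    return f

add = prim(lambda y: y, lambda x, y, w: w + 1)      # add(x, y) = x + y
mul = prim(lambda y: 0, lambda x, y, w: add(w, y))  # mul(x, y) = x * y
print(add(3, 4), mul(3, 4))  # 7 12
```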
18. Define the set of all primitive recursive functions, that we denote by PR, as a closure
Cl(I , O) where

a. I is the set of initial functions. These are precisely S = λx.x + 1, Z = λx.0, and Uin , for 1 ≤ i ≤ n (n > 0), given by Uin = λx1 . . . xn .xi .
b. There are just two operations in O:

i. From functions λx z y. f (x, z, y) and λw.g(w) obtain —as we say, by substitution—

H = λx w y. f (x, g(w), y)

ii. From functions h, g obtain f = prim(h, g)

Prove that all the functions of PR are total.


Hint. Towards proving “ f ∈ PR implies that f is total” (by induction over the
closure), the case that an application of the operation prim propagates totalness
relies on the previous exercise.

19. (Easy) True or false and why?

a. If h and g are in PR then so is f = prim(h, g)


b. If λx z y. f (x, z, y) and λw.g(w) are in PR, then so is H = λx w y. f (x, g(w), y).

20. Prove that


a. λx.x + 3 ∈ PR.
b. λx y.x + y ∈ PR.
c. λx.x + x ∈ PR.
d. λx.2x ∈ PR.
e. λx.2^2^···^2 (a tower of x 2’s)
is in PR.
Hint. The last function is obtained trivially as prim(h, g) where h is the constant
function that outputs 1 and g is one of the previous functions in this exercise (of
course, you must provide justification).
21. (The Formal Natural Numbers) Let Cl(I , O) be such that I = {∅} and O contains
only the following operation on sets5

λx.x ∪ {x}

Modern set theorists call this closure ω, the set of formal natural numbers.
Correspondingly, they call the members of ω formal natural numbers.
We will see (actually you will prove) below (30) that ∅ ∈ ω is the counterpart of 0 ∈ N and n ∪ {n} ∈ ω is the counterpart of n + 1 ∈ N. Naturally, “n ∪ {n}” is called the (formal) successor of n ∈ ω.
Let us now discover properties of the formal natural numbers —the first three of which are the sets ∅, {∅}, {∅, {∅}}, etc.—
Pause. Can you better specify the “etc.” above? 
that mirror exactly those of the members of N.
Now, prove (easy) that we can do induction over ω on the variable n, namely:
If P(n) is a property of sets n, then if one verifies P(∅) and also, for the arbitrary n, verifies that P(n) implies P(n ∪ {n}), then one has proved that P(n) is true for all n ∈ ω.
22. Prove that n ∪ {n} ≠ ∅ for all n ∈ ω.

5 λ-notation was introduced in 3.5.10.



 This corresponds to “n + 1 ≠ 0” for N. 

23. Prove that n ∪ {n} = m ∪ {m} implies m = n for all n, m in ω.
Hint. Say instead that m, n exist such that m ≠ n and yet n ∪ {n} = m ∪ {m}.

 This corresponds to “n + 1 = m + 1 implies n = m” on N.


The reader will note that a proof along the Hint is valid for all sets n, m not just those
in ω. 
24. Prove if n ∈ ω, then either n = ∅ or n = m ∪ {m} for some m ∈ ω. We say, in the latter
case, that n is a successor.
Hint. Do induction over ω on n ∈ ω to prove (∀n)P(n), where P(n) stands for n =
∅ ∨ (∃m ∈ ω)(n = m ∪ {m}).
25. (Transitive classes) A class A is transitive iff x ∈ y ∈ A implies x ∈ A for all x, y.
This can be also said as y ∈ A implies y ⊆ A. Prove that every member of ω is a
transitive set.
Hint. Use induction over ω on the variable z to prove (∀z)(∀x)(∀y)(x ∈ y ∈ z → x ∈
z).
26. Prove that ω is a transitive set.
Hint. Use induction over ω on the variable y to prove (∀y)(∀x)(x ∈ y ∈ ω → x ∈ ω).
27. Prove that ω is not a successor.
28. Prove that if n is a formal natural number, then every one of its members is a formal natural number.
29. Prove that the (proper) subset relation n ⊂ m on formal natural numbers is a well
ordering on ω.
Hint. Organise your thoughts on this by listing first what exactly “well ordering” entails
as a property of ⊂.
30. Prove that N ∼ ω.
Hint. Define by induction (recursion) f : N → ω by f (0) = ∅ and f (n + 1) = f (n) ∪
{ f (n)}. Show that f is total, 1-1 and onto.
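The f of the hint can be animated in Python, with frozensets standing in for the formal natural numbers (an illustration only; the exercise, of course, asks for a proof):

```python
# f(0) = emptyset and f(n+1) = f(n) ∪ {f(n)}: the formal naturals as nested frozensets.
def f(n):
    s = frozenset()
    for _ in range(n):
        s = s | {s}          # successor step: n ∪ {n}
    return s

# the first few formal naturals: emptyset, {emptyset}, {emptyset, {emptyset}}, ...
zero, one, two = f(0), f(1), f(2)
```

Note that f(n) has exactly n members, namely f(0), . . . , f(n − 1), which also illustrates the transitivity proved in Exercise 25.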
7 Recurrence Equations and Their Closed-Form Solutions

Overview

In so-called “divide and conquer” algorithms one usually ends up with a recurrence relation
(i.e., inductive or recursive definition!) that defines the “timing function”, T (n) —such
timing indicating worst case upper bound on run time or average run time as the case may
be. For example, the recurrence might look like

    T(n) = 1               if n = 1
    T(n) = T(n/2) + 1      otherwise

In order to assess the "goodness" of the proposed algorithm by comparison to either our
expectations or to another algorithm, we need to know T(n) in "closed" form, that is, in
terms of known functions, for example, n^r for r > 0, c^n for c > 1, log_b n for some integer
b > 1.
Often, a preliminary analysis need only worry about the “asymptotic behaviour” of the
algorithm, i.e., the behaviour for large inputs (n is the input size).
What is input size? Since many algorithms of interest —e.g., they may manipulate trees—
are non-numerical, "size" is normally not the numerical value of the input. Moreover, even
numerical algorithms are often expressed in terms of the digit-structure of the inputs; thus it
makes sense to assess their "goodness" with respect to the number of digits in the input, or
the length of the input, not its value. Does it matter? It does in the context of the so-called
efficient (or "feasible") algorithms, which are defined to be those whose run time is bounded
by a polynomial function of the input size!
It turns out that —due to the exponential relation between value and length of a natural
number— an algorithm that runs in polynomial time with respect to the input numerical value


will run in exponential time with respect to input length and thus be termed "inefficient" or
worse: "unfeasible".
“Big-O” notation —introduced in this chapter— is an excellent tool in gauging upper
bounds of run times of algorithms, therefore the solution of recurrences is often sought in
such notation. On occasion one requires an “exact” solution (this is much harder to achieve
in general).
There is a big variety of recurrence relations and an equally big variety of solution
techniques. Some restricted cases are handled well by packages such as Mathematica or
Maple.
In this chapter we restrict attention to simple classes of recurrences taken from both the
“additive” and “multiplicative” cases. These characterisations in quotes refer to the manner
of handling the argument of the recurrence. E.g., the recurrence above is multiplicative as
the recursive call is to an argument obtained by halving the original argument n.
For the solution of the Fibonacci recurrence and other “Fibonacci-like” recurrences in
closed form we introduce the topic of generating functions.

7.1 Big-O, Small-o, and the “Other” ∼

This notation is due to the mathematician E. Landau and is in wide use in number theory,
but also in computer science in the context of measuring (bounding above) computational
complexity of algorithms for all “very large inputs”.

7.1.1 Definition Let f and g be two total functions of one variable, where g(x) > 0, for
all x. Then

1. f = O(g) —also written as f(x) = O(g(x))— read "f is big-oh g", means that there
are positive constants C and K in N such that

    x > K implies |f(x)| ≤ C·g(x)

2. f = o(g) —also written as f(x) = o(g(x))— read "f is small-oh g", means that

    lim_{x→∞} f(x)/g(x) = 0

3. f ∼ g —also written as f(x) ∼ g(x)— read "f is of the same order as g", means that

    lim_{x→∞} f(x)/g(x) = 1


 “∼” between two sets A and B, as in A ∼ B, means that there is a 1-1 correspondence
f : A → B. Obviously, the context will protect us from confusing this ∼ with the one
introduced just now, in 7.1.1.
Both definitions 2. and 3. require some elementary understanding of differential calculus.
Case 2. says, intuitively, that as x gets extremely large, then the fraction f (x)/g(x) gets
extremely small, infinitesimally close to 0. Case 3. says, intuitively, that as x gets extremely
large, then the fraction f (x)/g(x) gets infinitesimally close to 1; that is, the function outputs
are infinitesimally close to each other. 

7.1.2 Example 1. x = O(x) since x ≤ 1 · x for x ≥ 0.
2. x ∼ x, since x/x = 1, and stays 1 as x gets very large.
3. x = o(x²) since x/x² = 1/x, which trivially goes to 0 as x goes to infinity.
4. 2x² + 1000^1000·x + 10^350000 = O(x²). Indeed

    (2x² + 1000^1000·x + 10^350000) / 3x² = 2/3 + 1000^1000/3x + 10^350000/3x² < 1

for x > K for some well chosen K. Note that 1000^1000/3x and 10^350000/3x² will
each be < 1/6 for all sufficiently large x-values: we will have 2/3 + 1000^1000/3x +
10^350000/3x² < 2/3 + 1/6 + 1/6 = 1 for all such x-values. Thus 2x² + 1000^1000·x +
10^350000 < 3x² for x > K as claimed.
In many words, in a polynomial, the order of magnitude is determined by the highest
power term.
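Since Python has exact big-integer arithmetic, the claim in item 4 can be spot-checked mechanically; the witnesses C = 3 and K = 10^350000 below are one valid choice (an assumption made for the check, not the only possibility):

```python
# Spot-check item 4: 2x^2 + 1000^1000*x + 10^350000 <= 3*x^2 for x > K,
# using exact integer arithmetic with the witnesses C = 3, K = 10**350000.
C = 3
K = 10**350000

def f(x):
    return 2 * x**2 + 1000**1000 * x + 10**350000

def g(x):
    return x**2

# a few sample points beyond K; here |f(x)| = f(x) since f(x) > 0
big_o_ok = all(f(x) <= C * g(x) for x in (K + 1, 2 * K, 10 * K))
```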

The last example motivates

7.1.3 Proposition Suppose that g is as in 7.1.1 and f(x) ≥ 0 for all x > L, hence |f(x)| =
f(x) for all x > L. Now, if f(x) ∼ g(x), then f(x) = O(g(x)).

Proof The assumption says that

    lim_{x→∞} f(x)/g(x) = 1

From "calculus 1" (1st year differential calculus) we learn that this implies that for some K,
x > K entails

    | f(x)/g(x) − 1 | < 1

hence

    −1 < f(x)/g(x) − 1 < 1

therefore, x > max(K, L) implies f(x) < 2g(x). □

7.1.4 Proposition Suppose that g is as in 7.1.1 and f(x) ≥ 0 for all x > L, hence |f(x)| =
f(x) for all x > L. Now, if f(x) = o(g(x)), then f(x) = O(g(x)).

Proof The assumption says that

    lim_{x→∞} f(x)/g(x) = 0

From calculus 1 we learn that this implies that for some K, x > K entails

    | f(x)/g(x) | < 1

hence

    −1 < f(x)/g(x) < 1

therefore, x > max(K, L) implies f(x) < g(x). □

These two propositions enrich our toolbox:

7.1.5 Example 1. ln x = o(x^r) for any positive real r. Here "ln" stands for log_e where e
is the Euler constant

    2.7182818284590452353602874713526624977572470937 . . .

Seeing that both numerator and denominator in

    lim_{x→∞} (ln x)/x^r

go to ∞, we have here (if we do not do anything to mitigate) an impasse: we have a
"limit" that is "indeterminate".
So, we will use "l'Hôpital's rule" (the limit of the fraction is equal to the limit of the
fraction of the derivatives):

    lim_{x→∞} (ln x)/x^r = lim_{x→∞} (1/x)/(r·x^{r−1}) = lim_{x→∞} 1/(r·x^r) = 0
2. ln x = O(log10 (x)). In fact, you can go from one log-base to the other:

    loge(x) = log10(x)/log10(e)

The claim follows from 7.1.3 since trivially ln x ∼ log10 (x)/ log10 (e). For that reason
—and since multiplicative constants are hidden in big-O notation— complexity- and

algorithms-practitioners omit the base of the logarithm and write things like O(log n)
and O(n log n).

7.2 Solving Recurrences; the Additive Case

The general case here is of the form¹

    T_0 = k
    s_n T_n = v_n T_{n−1} + f(n)    if n > 0

a recurrence defining the sequence Tn , or equivalently, the function T (n) (both jargons and
notations spell out the same thing), in terms of the known functions (sequences) sn , vn , f (n).
For the general case see Knuth (1973). Here we will restrict attention to the case sn = 1,
for all n, and also vn = a (a constant), for all n.
Subcase 1. (a = 1) Solve

    T_0 = k
    T_n = T_{n−1} + f(n)    if n > 0        (1)
From (1), T_n − T_{n−1} = f(n), thus

    Σ_{i=1}^{n} (T_i − T_{i−1}) = Σ_{i=1}^{n} f(i)

the lower summation value dictated by the lowest valid value of i − 1 according to (1).

7.2.1 Remark The summation in the lhs above is called a "telescoping (finite) series"
because the terms T_1, T_2, . . . , T_{n−1} appear both positively and negatively and pairwise
cancel. Thus the series "contracts" into T_n − T_0 like a (hand held) telescope.

Therefore

    T_n = T_0 + Σ_{i=1}^{n} f(i)
        = k + Σ_{i=1}^{n} f(i)        (2)
If we know how to get the sum in (2) in closed form, then we solved the problem!
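The telescoping argument is easy to sanity-check numerically. The sketch below (with an arbitrary choice of k and f, assumed only for the check) compares unrolling the recurrence (1) against the closed form (2):

```python
# Subcase 1 check: T_0 = k, T_n = T_{n-1} + f(n)  versus  T_n = k + sum_{i=1}^n f(i).
def T_rec(k, f, n):
    T = k
    for i in range(1, n + 1):
        T = T + f(i)                                   # apply T_i = T_{i-1} + f(i)
    return T

def T_closed(k, f, n):
    return k + sum(f(i) for i in range(1, n + 1))      # eq. (2)

# arbitrary test data: k = 5, f(n) = 3n + 1
telescope_ok = all(
    T_rec(5, lambda i: 3 * i + 1, n) == T_closed(5, lambda i: 3 * i + 1, n)
    for n in range(0, 25)
)
```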

7.2.2 Example Solve

    p_n = 2                if n = 1
    p_n = p_{n−1} + n      otherwise        (3)
Here

1 Note the “additivity” in the relation between indices/arguments: n versus n − 1.




    Σ_{i=2}^{n} (p_i − p_{i−1}) = Σ_{i=2}^{n} i
Note the lower bound of the summation: it is here 2, to allow for the lowest i − 1 value
possible. That is 1 according to (3), hence i = 2.
Thus,

    p_n = 2 + (n + 2)(n − 1)/2
(Where did I get the (n + 2)(n − 1)/2 from?) The above answer is the same as (verify!)

    p_n = 1 + (n + 1)n/2

obtained by writing

    2 + Σ_{i=2}^{n} i = 1 + Σ_{i=1}^{n} i
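A quick check of Example 7.2.2 with exact integers:

```python
# p_1 = 2, p_n = p_{n-1} + n  should agree with the closed form 1 + (n+1)n/2.
def p(n):
    val = 2
    for i in range(2, n + 1):
        val += i
    return val

ex722_ok = all(p(n) == 1 + (n + 1) * n // 2 for n in range(1, 60))
```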

Subcase 2. (a ≠ 1) Solve

    T_0 = k
    T_n = aT_{n−1} + f(n)    if n > 0        (4)

(4) is the same as

    T_n/a^n = T_{n−1}/a^{n−1} + f(n)/a^n

To simplify notation, set

    t_n =Def T_n/a^n

thus the recurrence (4) becomes

    t_0 = k
    t_n = t_{n−1} + f(n)/a^n    if n > 0        (5)

By subcase 1, this yields

    t_n = k + Σ_{i=1}^{n} f(i)/a^i

from which

    T_n = k·a^n + a^n Σ_{i=1}^{n} f(i)/a^i        (6)
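Formula (6) can be verified with exact rational arithmetic; the parameter choices below (k = 2, a = 3, f(n) = n) are arbitrary, made only for the check:

```python
from fractions import Fraction

# Check eq. (6): T_0 = k, T_n = a*T_{n-1} + f(n)  gives
# T_n = k*a^n + a^n * sum_{i=1}^n f(i)/a^i.
def T_rec(k, a, f, n):
    T = k
    for i in range(1, n + 1):
        T = a * T + f(i)
    return T

def T_closed(k, a, f, n):
    return k * a**n + a**n * sum(Fraction(f(i), a**i) for i in range(1, n + 1))

eq6_ok = all(
    T_rec(2, 3, lambda i: i, n) == T_closed(2, 3, lambda i: i, n)
    for n in range(0, 15)
)
```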

7.2.3 Example As an illustration solve the recurrence below.




1 if n = 1
Tn = (7)
2Tn−1 + 1 otherwise

To avoid trouble, note that the lowest term here is T_1, hence its "translation" to follow the
above methodology will be "t_1 = T_1/2^1 = 1/2". So, the right hand side of (6) applied here
will have "k·a^{n−1}" instead of "k·a^n" (Why?) and the indexing in the summation will start at
i = 2 (Why?)
Thus, by (6),

    T_n = 2^n (1/2) + 2^n Σ_{i=2}^{n} 1/2^i
        = 2^{n−1} + 2^n ( ((2^{−1})^{n+1} − 1)/(2^{−1} − 1) − 1 − 1/2 )
        = 2^{n−1} + 2^n ( 2 − 2^{−n} − 1 − 1/2 )
        = 2^n − 1

In the end you will probably agree that it is easier to redo the work with (7) directly, first
translating it to

    t_n = 1/2                  if n = 1
    t_n = t_{n−1} + 1/2^n      if n > 1        (8)

rather than applying (6)!

We immediately get from (8)

    T_n = 2^n t_n = 2^n ( 1/2 + Σ_{i=2}^{n} 1/2^i )
        = 2^n ( 1/2 + ((2^{−1})^{n+1} − 1)/(2^{−1} − 1) − 1 − 1/2 )

etc.

The terms 1 and 1/2 are subtracted as they are missing from our sum, which starts at i = 2;
the closed-form formula used is the one for

    Σ_{i=0}^{n} 1/2^i
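Either route should, of course, produce the same closed form; a direct check that (7) gives T_n = 2^n − 1:

```python
# T_1 = 1, T_n = 2*T_{n-1} + 1  should equal  2^n - 1.
def T(n):
    t = 1
    for _ in range(2, n + 1):
        t = 2 * t + 1
    return t

ex723_ok = all(T(n) == 2**n - 1 for n in range(1, 40))
```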

7.3 Solving Recurrences; the Multiplicative Case

Subcase 1.

    T(n) = k                if n = 1
    T(n) = aT(n/b) + c      if n > 1        (1)

where a, b are positive integer constants (b > 1) and k, c are any constants. Recurrences
like (1) above arise in divide and conquer solutions to problems. For example, binary search
has timing governed by the above recurrence with b = 2, a = c = k = 1.
Why does (1) with the above-mentioned parameters —b = 2, a = c = k = 1— capture the
run time of binary search? First off, regarding "run time" let us be specific: we mean the number
of comparisons.
OK, to do such a search on a sorted (in ascending order, say) array of length n, you first
check the mid point (for a match with what you are searching for). If you found what you
want, exit. If not, you know (due to the ordering) whether you should next search the left
half or the right half.
So you call the procedure recursively on an array of length about n/2.
This decision and call took T(n/2) + 1 comparisons. This equals T(n). If the array has
length 1, then you spend just one comparison: T(1) = 1.
We seek a general solution to (1) in big-O notation.
First convert to an “additive case” problem: To this end, seek a solution in the restricted
set {n ∈ N : n = b^m for some m ∈ N}. Next, set

    t(m) = T(b^m)        (2)

so that the recurrence becomes



    t(m) = k                   if m = 0
    t(m) = a·t(m − 1) + c      if m > 0        (3)

hence, from the work in the previous section,


    Σ_{i=1}^{m} ( t(i)/a^i − t(i−1)/a^{i−1} ) = c Σ_{i=1}^{m} a^{−i}

therefore

    t(m) = a^m k + c·a^m · m                                        if a = 1
    t(m) = a^m k + c·a^m · a^{−1}((a^{−1})^m − 1)/(a^{−1} − 1)      if a ≠ 1

or, more simply,

    t(m) = k + cm                             if a = 1
    t(m) = a^m k + c (a^m − 1)/(a − 1)        if a ≠ 1
Using O-notation, and going back to T we get:


    T(b^m) = O(m)        if a = 1
    T(b^m) = O(a^m)      if a ≠ 1        (4)

or, provided we remember that this solution relies on the assumption that n has the form b^m:

    T(n) = O(log n)                            if a = 1
    T(n) = O(a^{log_b n}) = O(n^{log_b a})     if a ≠ 1        (5)

If a > b then we get a slower than linear "run time" O(n^{log_b a}) with log_b a > 1. If on the other
hand b > a > 1 then we get a sublinear run time, since then log_b a < 1.

Now a very important observation. For functions T(n) that are increasing,² i.e., T(i) ≤ T(j)
if i < j, the restriction of n to have form b^m proves to be irrelevant in obtaining the solution.
The solution is still given by (5) for all n. Here's why:
In the general case, n satisfies

    b^{m−1} < n ≤ b^m    for some m ≥ 0        (6)

Suppose now that a = 1 (upper case in (4)). We want to establish that T(n) = O(log n) for
the general n (of (6)). By monotonicity of T and the second inequality of (6) we get

    T(n) ≤ T(b^m)        by (6) right
         = O(m)          by (4)
         = O(log n)      by (6) left

The case where a > 1 is handled similarly. Here we found an answer O(n^r) (where
r = log_b a > 0) provided n = b^m (some m). Relax this proviso, and assume (6).
Now

    T(n) ≤ T(b^m)           by (6) right
         = O(a^m)           by (4)
         = O((b^m)^r)       since a = b^r
         = O((b^{m−1})^r)   since (b^m)^r = b^r·(b^{m−1})^r, a constant multiple
         = O(n^r)           by (6) left
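For the binary-search instance (b = 2, a = c = k = 1) the O(log n) bound of (5) can be observed directly. The exact value T(n) = ⌊log2 n⌋ + 1 asserted in the comment is a property of this particular instance (reading n/2 as integer division), an observation added here, not a claim from the text:

```python
import math
from functools import lru_cache

# T(1) = 1, T(n) = T(n // 2) + 1  -- the binary-search recurrence with floors.
@lru_cache(maxsize=None)
def T(n):
    return 1 if n == 1 else T(n // 2) + 1

# For this instance T(n) = floor(log2(n)) + 1, i.e. O(log n), as (5) predicts.
bsearch_ok = all(T(n) == math.floor(math.log2(n)) + 1 for n in range(1, 5000))
```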


Subcase 2.

    T(n) = k                 if n = 1
    T(n) = aT(n/b) + cn      if n > 1        (1′)
where a, b are positive integer constants (b > 1) and k, c any constants. Recurrences like (1′)
above also occur in divide and conquer solutions to problems. For example, two-way merge
sort has timing governed by the above recurrence with a = b = 2 and c = 1/2. Quicksort
has average run time governed, essentially, by the above with a = b = 2 and c = 1. Both
lead to O(n log n) solutions. Also, Karatsuba’s “fast” integer multiplication has a run time
recurrence as above with a = 3, b = 2.

2 Such are the “complexity” or “timing” functions of algorithms.



These examples are named for easy look up, in case they trigger your interest or curiosity. It
is not in the design of this course to expand on them. Merge Sort and Quicksort you might
see in a course on data structures, while Karatsuba's "fast multiplication" of natural numbers
might appear in a course on algorithms.
Setting at first (our famous initial restriction on n) n = b^m for some m ∈ N and using (2)
above we end up with a variation on (3):

    t(m) = k                         if m = 0
    t(m) = a·t(m − 1) + c·b^m        if m > 0        (3′)

thus we need to compute

    Σ_{i=1}^{m} ( t(i)/a^i − t(i−1)/a^{i−1} ) = c Σ_{i=1}^{m} (b/a)^i

therefore

    t(m) = a^m k + c·a^m · m                                   if a = b
    t(m) = a^m k + c·a^m · (b/a)((b/a)^m − 1)/(b/a − 1)        if a ≠ b
Using O-notation, and using cases according as a < b or a > b, we get:

    t(m) = O(b^m · m)                    if a = b
    t(m) = a^m O(1) = O(a^m)             if b < a    /* (b/a)^m → 0 as m → ∞ */
    t(m) = O(b^m − a^m) = O(b^m)         if b > a

or, in terms of T and n, which is restricted to form b^m (using the same calculational "tricks" as
before):

    T(n) = O(n log n)          if a = b
    T(n) = O(n^{log_b a})      if b < a        (4′)
    T(n) = O(n)                if b > a
The above solution is valid for any n without restriction, provided T is increasing. The proof
is as before, so we will not redo it (you may wish to check the “new case” O(n log n) as an
exercise).
In terms of complexity of algorithms, the above solution says that in a divide and conquer
algorithm (governed by (1 )) we have the following cases:

• The total size of all subproblems we solve (recursively) is equal to the original problem’s
size. Then we have a O(n log n) algorithm (e.g., merge sort).
• The total size of all subproblems we solve is more than the original problem’s size. Then
we go worse than linear (logb a > 1 in this case). An example is Karatsuba multiplication

that runs in O(n^{log_2 3}) time —still better than the O(n²) so-called "school method" integer
multiplication, which justifies the nickname "fast" for Karatsuba's multiplication.³
• The total size of all subproblems we solve is less than the original problem’s size. Then
we go in linear time (e.g., the problem of finding the k-th smallest in a set of n elements).
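As a concrete check of the a = b case of (4′), here is the merge-sort-style recurrence (a = b = 2, c = 1/2, k = 1, per the text) on powers of two; the exact value T(2^m) = 2^m(1 + m/2) used below follows from t(m) = k + cm and is stated here only as part of the check:

```python
# T(1) = 1, T(n) = 2*T(n/2) + n/2 on n = 2^m; expect T(2^m) = 2^m * (1 + m/2),
# which is O(n log n) in agreement with (4').
def T(n):                      # n is assumed to be a power of two
    if n == 1:
        return 1
    return 2 * T(n // 2) + n // 2

case_ab_ok = all(T(2**m) == 2**m * (m + 2) // 2 for m in range(0, 20))
```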

7.4 Generating Functions

We saw some simple cases of recurrence relations with additive and multiplicative index
structure (we reduced the latter to the former). Now we turn to a wider class of additive
index structure problems where our previous technique of utilizing a “telescoping sum”


    Σ_{i=1}^{n} (t(i) − t(i−1))

does not apply because the right hand side still refers to t(i) for some i < n. Such is the
case of the well known Fibonacci sequence Fn given by


    F_n = 0                        if n = 0
    F_n = 1                        if n = 1
    F_n = F_{n−1} + F_{n−2}        if n > 1

The method of generating functions that solves this harder problem also solves the previous
problems we saw.
Here’s the method in outline. We will then embark on a number of fully worked out
examples.
Suppose we are given a recurrence relation

    t_n = . . . t_{n−1} . . . t_{n−2} . . . t_{n−3} . . .        (1)

with the appropriate "starting" (initial) conditions, and we want t_n in "closed form" in terms of
known functions. Here are the steps:

1. Define a generating function of the sequence t0 , t1 , . . . , tn , . . .


    G(z) = Σ_{i=0}^{∞} t_i z^i
         = t_0 + t_1 z + t_2 z² + · · · + t_n z^n + · · ·        (2)

3 But there are even faster integer multiplication algorithms!



(2) is a formal power series, where formal means that we only are interested in the form
of the “infinite sum” and not in any issues of convergence4 (therefore “meaning”) of
the sum. It is stressed that our disinterest in convergence matters is not a simplifying
convenience but it is due to the fact that convergence issues are irrelevant to the problem
in hand!
In particular, we will never have to consider values of z or make substitutions into z.
2. Using the recurrence (1), find a closed form of G(z) as a function of z (this can be done
prior to knowing the tn in closed form!)
3. Expand the closed form G(z) back into a power series
    G(z) = Σ_{i=0}^{∞} a_i z^i
         = a_0 + a_1 z + a_2 z² + · · · + a_n z^n + · · ·        (3)

But now we do have the an ’s in terms of known functions, because we know G(z) in
closed form! We only need to compare (2) and (3) and proclaim

tn = an for n = 0, 1, . . .

The problem has been solved.

Steps 2. and 3. embody all the real work. We will illustrate by examples how this is done
in practice, but first we need some “tools”:

Let us concentrate below on the “boxed” results, which we will be employing —not being
too interested in the arithmetic needed to obtain them!

The Binomial Expansion. For our purposes in this volume we will be content with just one
tool, the "binomial expansion theorem" of calculus (the finite version of it we proved by
induction in Example 5.2.25):
For any m ∈ R (where R is the set of real numbers) we have

    (1 + z)^m = Σ_{r=0}^{∞} (m choose r) z^r
              = · · · + (m choose r) z^r + · · ·        (4)

where for any r ∈ N and m ∈ R

where for any r ∈ N and m ∈ R

4 In Calculus one learns that power series converge in an interval like |z| < r for some real r ≥ 0.
The r = 0 case means the series diverges for all z.



    (m choose r) =def 1                                       if r = 0
    (m choose r) =def m(m − 1) · · · (m − [r − 1]) / r!       otherwise        (5)
The expansion (4) terminates with last term

    (m choose m) z^m = z^m        (by (5))

as the “binomial theorem of Algebra” says, and that is so iff m is a positive integer. In all
other cases (4) is non-terminating (infinitely many terms) and the formula is then situated
in Calculus. As we remarked before, we will not be concerned with when (4) converges.
Note that (5) gives the familiar

    (m choose r) = m(m − 1) · · · (m − [r − 1]) / r!
                 = m(m − 1) · · · (m − [r − 1])(m − r) · · · 2 · 1 / (r!(m − r)!)
                 = m! / (r!(m − r)!)
when m ∈ N. In all other cases we use (5) because if m ∉ N, then "m!" is meaningless.
Let us record the very useful special case when m is a negative integer, −n (n > 0).

    (1 + z)^{−n} = · · · + [−n(−n − 1) · · · (−n − [r − 1]) / r!] z^r + · · ·
                 = · · · + (−1)^r [n(n + 1) · · · (n + [r − 1]) / r!] z^r + · · ·
                 = · · · + (−1)^r [(n + [r − 1]) · · · (n + 1)n / r!] z^r + · · ·
                 = · · · + (−1)^r (n + r − 1 choose r) z^r + · · ·               (6)

    (1 − z)^{−n} = · · · + (n + r − 1 choose r) z^r + · · ·                      (7)

Finally, let us record in "boxes" some important special cases of (6) and (7)

    (1 − z)^{−1} = 1/(1 − z) = · · · + (r choose r) z^r + · · ·
                             = · · · + z^r + · · ·                               (8)

The above is the familiar “converging geometric progression” (converging for |z| < 1, that
is, but this is the last time I’ll raise irrelevant convergence issues). Two more special cases
of (6) will be helpful:


    (1 − z)^{−2} = 1/(1 − z)² = · · · + (r + 1 choose r) z^r + · · ·
                              = 1 + 2z + · · · + (r + 1) z^r + · · ·             (9)

and

    (1 − z)^{−3} = 1/(1 − z)³ = · · · + (r + 2 choose r) z^r + · · ·
                              = 1 + 3z + · · · + [(r + 2)(r + 1)/2] z^r + · · ·  (10)


7.4.1 Example Solve the recurrence

    a_0 = 1
    a_n = 2a_{n−1} + 1      if n > 0        (i)

Write (i) as
an − 2an−1 = 1 (ii)
Next, form the generating function for an , and a “shifted” copy of it (multiplied by 2z; z
does the shifting) underneath it (this was “inspired” by (ii)):

    G(z)   = a_0 + a_1 z + a_2 z² + · · · + a_n z^n + · · ·
    2zG(z) =       2a_0 z + 2a_1 z² + · · · + 2a_{n−1} z^n + · · ·

Subtract the above term-by-term to get

    G(z)(1 − 2z) = 1 + z + z² + z³ + · · ·
                 = 1/(1 − z)
Hence
    G(z) = 1/((1 − 2z)(1 − z))        (iii)
(iii) is G(z) in closed form. To expand it back to a (known) power series we first use the
“partial fractions” method (familiar to students of calculus) to write G(z) as the sum of two
fractions with linear denominators. I.e., find constants A and B such that (iv) below is true

for all z:

    1/((1 − 2z)(1 − z)) = A/(1 − 2z) + B/(1 − z)

or

    1 = A(1 − z) + B(1 − 2z)        (iv)

Setting in turn z ← 1 and z ← 1/2 we find B = −1 and A = 2, hence

    G(z) = 2/(1 − 2z) − 1/(1 − z)
         = 2(· · · (2z)^n · · ·) − (· · · z^n · · ·)
         = · · · (2^{n+1} − 1) z^n · · ·

Comparing this known expansion with the original power series above, we conclude that

    a_n = 2^{n+1} − 1

Of course, we solved this problem much more easily in Sect. 7.2. However due to its
simplicity it was worked out here again to illustrate this new method. Normally, you
apply the method of generating functions when there is no other obviously simpler way to
do it.
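Step 3 can also be carried out mechanically: since G(z)(1 − 3z + 2z²) = 1 (note that (1 − 2z)(1 − z) = 1 − 3z + 2z²), formal power-series long division forces a recurrence on the coefficients, which should reproduce a_n = 2^{n+1} − 1:

```python
# Coefficients of 1/((1-2z)(1-z)) = 1/(1 - 3z + 2z^2) by long division:
# (1 - 3z + 2z^2)(c_0 + c_1 z + ...) = 1 forces c_0 = 1, c_1 = 3*c_0, and
# c_n = 3*c_{n-1} - 2*c_{n-2} thereafter.
N = 40
c = [1, 3]
for n in range(2, N):
    c.append(3 * c[-1] - 2 * c[-2])

gf_ok = all(c[n] == 2**(n + 1) - 1 for n in range(N))
```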

7.4.2 Example Solve


    p_1 = 2
    p_n = p_{n−1} + n      if n > 1        (i)
Write (i) as
pn − pn−1 = n (ii)
Next, form the generating function for pn , and a “shifted” copy of it underneath it (this was
“inspired” by (ii)).
Note how this sequence starts with p1 (rather than p0 ). Correspondingly, the constant term
of the generating function is p1 .

    G(z)  = p_1 + p_2 z + p_3 z² + · · · + p_{n+1} z^n + · · ·
    zG(z) =       p_1 z + p_2 z²  + · · · + p_n z^n + · · ·

Subtract the above term-by-term to get

    G(z)(1 − z) = 2 + 2z + 3z² + 4z³ + · · · + (n + 1) z^n + · · ·
                = 1 + 1/(1 − z)²        by (9)
Hence

    G(z) = 1/(1 − z) + 1/(1 − z)³
         = (· · · z^n · · ·) + (· · · [(n + 2)(n + 1)/2] z^n · · ·)        by (10)
         = · · · [1 + (n + 2)(n + 1)/2] z^n · · ·

Comparing this known expansion with the original power series above, we conclude that

    p_{n+1} = 1 + (n + 2)(n + 1)/2,    the coefficient of z^n

or

    p_n = 1 + (n + 1)n/2

7.4.3 Example Here is one that cannot be handled by the techniques of Sect. 7.2.

    s_0 = 1
    s_1 = 1                                    (i)
    s_n = 4s_{n−1} − 4s_{n−2}      if n > 1

Write (i) as
sn − 4sn−1 + 4sn−2 = 0 (ii)
to “inspire”
    G(z)    = s_0 + s_1 z + s_2 z² + · · · + s_n z^n + · · ·
    4zG(z)  =       4s_0 z + 4s_1 z² + · · · + 4s_{n−1} z^n + · · ·
    4z²G(z) =               4s_0 z²  + · · · + 4s_{n−2} z^n + · · ·
By (ii),
    G(z)(1 − 4z + 4z²) = 1 + (1 − 4)z
                       = 1 − 3z

Since 1 − 4z + 4z² = (1 − 2z)² we get

    G(z) = 1/(1 − 2z)² − 3z · 1/(1 − 2z)²
         = (· · · (n + 1)(2z)^n · · ·) − 3z(· · · (n + 1)(2z)^n · · ·)
         = · · · [(n + 1)2^n − 3n·2^{n−1}] z^n · · ·

Thus,

    s_n = (n + 1)2^n − 3n·2^{n−1}
        = 2^{n−1}(2n + 2 − 3n)
        = 2^n (1 − n/2)
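A numeric check of Example 7.4.3 (the n = 0 case is checked separately so the arithmetic stays in integers):

```python
# s_0 = s_1 = 1, s_n = 4 s_{n-1} - 4 s_{n-2}  versus  s_n = (n+1) 2^n - 3n 2^{n-1}.
s = [1, 1]
for _ in range(2, 30):
    s.append(4 * s[-1] - 4 * s[-2])

ex743_ok = s[0] == 1 and all(
    s[n] == (n + 1) * 2**n - 3 * n * 2**(n - 1) for n in range(1, 30)
)
```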


7.4.4 Example Here is another one that cannot be handled by the techniques of Sect. 7.2.

    s_0 = 0
    s_1 = 8                                    (i)
    s_n = 2s_{n−1} + 3s_{n−2}      if n > 1


Write (i) as
sn − 2sn−1 − 3sn−2 = 0 (ii)
Next,
    G(z)    = s_0 + s_1 z + s_2 z² + · · · + s_n z^n + · · ·
    2zG(z)  =       2s_0 z + 2s_1 z² + · · · + 2s_{n−1} z^n + · · ·
    3z²G(z) =               3s_0 z²  + · · · + 3s_{n−2} z^n + · · ·
By (ii),
    G(z)(1 − 2z − 3z²) = 8z

The roots of 1 − 2z − 3z² = 0 are

    z = (−2 ± √(4 + 12))/6 = (−2 ± 4)/6 = −1 or 1/3

hence 1 − 2z − 3z² = −3(z + 1)(z − 1/3) = (1 − 3z)(1 + z), therefore

    G(z) = 8z/((1 − 3z)(1 + z)) = A/(1 − 3z) + B/(1 + z)        splitting into partial fractions

By a calculation as in Example 7.4.1, A = 2 and B = −2, so

    G(z) = 2/(1 − 3z) − 2/(1 + z)
         = 2(· · · (3z)^n · · ·) − 2(· · · (−z)^n · · ·)
         = (· · · [2·3^n − 2(−1)^n] z^n · · ·)

hence s_n = 2·3^n − 2(−1)^n
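And the analogous check for Example 7.4.4:

```python
# s_0 = 0, s_1 = 8, s_n = 2 s_{n-1} + 3 s_{n-2}  versus  s_n = 2*3^n - 2(-1)^n.
s = [0, 8]
for _ in range(2, 30):
    s.append(2 * s[-1] + 3 * s[-2])

ex744_ok = all(s[n] == 2 * 3**n - 2 * (-1)**n for n in range(30))
```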

7.4.5 Example The Fibonacci recurrence.

    F_0 = 0
    F_1 = 1                                    (i)
    F_n = F_{n−1} + F_{n−2}        if n > 1

Write (i) as
Fn − Fn−1 − Fn−2 = 0 (ii)

Next,
    G(z)   = F_0 + F_1 z + F_2 z² + · · · + F_n z^n + · · ·
    zG(z)  =       F_0 z + F_1 z²  + · · · + F_{n−1} z^n + · · ·
    z²G(z) =               F_0 z²  + · · · + F_{n−2} z^n + · · ·
By (ii),
    G(z)(1 − z − z²) = z

The roots of 1 − z − z² = 0 are

    z = (−1 ± √(1 + 4))/2 = (−1 + √5)/2  or  (−1 − √5)/2

For convenience of notation, set

    φ_1 = (−1 + √5)/2,    φ_2 = (−1 − √5)/2        (iii)

Hence

    1 − z − z² = −(z − φ_1)(z − φ_2)
               = −(φ_1 − z)(φ_2 − z)               (iv)

therefore

    G(z) = z/(1 − z − z²) = A/(φ_1 − z) + B/(φ_2 − z)        splitting into partial fractions

from which (after some arithmetic that I will not display),

    A = φ_1/(φ_1 − φ_2),    B = φ_2/(φ_2 − φ_1)

so

    G(z) = [1/(φ_1 − φ_2)] ( φ_1/(φ_1 − z) − φ_2/(φ_2 − z) )
         = [1/(φ_1 − φ_2)] ( 1/(1 − z/φ_1) − 1/(1 − z/φ_2) )
         = [1/(φ_1 − φ_2)] ( (· · · (z/φ_1)^n · · ·) − (· · · (z/φ_2)^n · · ·) )

therefore

    F_n = [1/(φ_1 − φ_2)] ( 1/φ_1^n − 1/φ_2^n )        (v)
Let’s simplify (v):
First, by brute force calculation, or by using the “known” relations between the roots of
a 2nd degree equation, we find


    φ_1 φ_2 = −1,    φ_1 − φ_2 = √5

so that (v) gives

    F_n = (1/√5) ( φ_2^n/(φ_1 φ_2)^n − φ_1^n/(φ_1 φ_2)^n )
        = (1/√5) ( (−1)^n ((1 + √5)/2)^n/(−1)^n − (−1)^n ((1 − √5)/2)^n/(−1)^n )
        = (1/√5) ( ((1 + √5)/2)^n − ((1 − √5)/2)^n )

In particular, we find that

    F_n = O( ((1 + √5)/2)^n )

since

    ((1 − √5)/2)^n → 0    as n → ∞

due to (1 − √5)/2 being about −0.62.
That is, F_n grows exponentially with n, since (1 + √5)/2 = |φ_2| > 1.
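The closed form can be checked in floating point (rounding repairs the small error introduced by √5, and the range is kept modest so that double precision suffices):

```python
import math

# Binet-style closed form F_n = (phi^n - psi^n)/sqrt(5) versus the recurrence.
sqrt5 = math.sqrt(5)
phi = (1 + sqrt5) / 2
psi = (1 - sqrt5) / 2

def fib_closed(n):
    return round((phi**n - psi**n) / sqrt5)

F = [0, 1]
for _ in range(2, 60):
    F.append(F[-1] + F[-2])

binet_ok = all(fib_closed(n) == F[n] for n in range(60))
```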

7.5 Exercises

1. Given the recurrence below.



    T(n) ≤ T(n/9) + T(63n/72) + Cn      if n ≥ 90
    T(n) ≤ Cn                           if n < 90

Prove that T (n) ≤ 72Cn, for n ≥ 1.


2. Not using generating functions, solve in closed form employing O-notation, the fol-
lowing recurrence. State clearly what assumptions you need to make on T in order to
have a solution that is valid for all n.

    T(1) = 1
    T(n) = 2T(n/2) + n²

3. Solve in closed form the following recurrence, and express the answer in Big-O notation.
Do not use generating functions.

T (1) = a
T (n) = 3T (n/2) + n

4. In this exercise you are asked to use the method of generating functions —the tele-
scoping method is not acceptable.
Solve in closed form the following recurrence.

a0 = 0
an = an−1 + 1, for n ≥ 1

5. Design a divide-and-conquer recursive function procedure F(n) (give the pseudo-code


in pseudo-C) which returns the n-th Fibonacci number, Fn . Ensure that it runs in
O(log n) arithmetic operations (multiplications/additions).
In particular, (a) prove the correctness of your algorithm, and (b) prove that indeed it
runs in “time” O(log n), by setting and solving the appropriate recurrence relation that
defines the run-time.
Hint. It is useful to approach the problem by proving first that

    [ F_n     ]   [ 1  1 ] [ F_{n−1} ]
    [ F_{n−1} ] = [ 1  0 ] [ F_{n−2} ]

and then conclude that

    [ F_n     ]   [ 1  1 ]^{n−1} [ F_1 ]
    [ F_{n−1} ] = [ 1  0 ]       [ F_0 ]
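One way to realise the hint (a sketch only; the exercise asks for pseudo-code plus proofs) is to compute the matrix power by repeated squaring, which costs O(log n) 2×2 matrix multiplications:

```python
# Fibonacci via [[1,1],[1,0]]^(n-1): [F_n, F_{n-1}]^T = M^(n-1) [F_1, F_0]^T.
def mat_mul(A, B):
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0],
             A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0],
             A[1][0] * B[0][1] + A[1][1] * B[1][1]]]

def mat_pow(A, e):
    R = [[1, 0], [0, 1]]             # 2x2 identity
    while e > 0:                     # repeated squaring: O(log e) multiplications
        if e & 1:
            R = mat_mul(R, A)
        A = mat_mul(A, A)
        e >>= 1
    return R

def fib(n):
    if n == 0:
        return 0
    return mat_pow([[1, 1], [1, 0]], n - 1)[0][0]    # since F_1 = 1, F_0 = 0
```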
6. The Euclidean algorithm towards finding the greatest common divisor (gcd) of two nat-
ural numbers a > b > 0 —denoted as gcd(a, b)— notes that gcd(a, b) = gcd(b, r ),
where a = bq + r with 0 ≤ r < b. Argue that the process of shifting the answer —
finding gcd(a, b), that is— from the pair (a, b) to the pair (b, r ) is terminating.
Estimate the number of steps of that process. Then gauge an upper bound of this roughly
implied algorithm in big-O notation in terms of the digit-length of a.
Hint. Relate this problem to the generation of the Fibonacci sequence.
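A sketch of the process being analysed (only the step count; the bound itself is what the exercise asks you to derive):

```python
# Euclid: repeatedly replace (a, b) by (b, a mod b); count the division steps.
def gcd_steps(a, b):
    steps = 0
    while b > 0:
        a, b = b, a % b
        steps += 1
    return a, steps
```

For instance, gcd_steps(13, 8) takes 5 steps; 13 and 8 are consecutive Fibonacci numbers, which is the connection the hint alludes to.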
7. Using generating functions solve the following recurrence exactly in closed form

a0 = 1
a1 = 2
for n ≥ 2, an = 2an−1 − an−2

8. a. Prove that if G(z) is the generating function for the sequence (an ), for n = 0, 1, . . .,
that is,
G(z) = a0 + a1 z + a2 z 2 + a3 z 3 + · · ·
then G(z)/(1 − z) is the generating function of the sequence ( Σ_{i=0}^{n} a_i ), for n = 0, 1, . . ..

b. Now using generating functions prove that

    Σ_{i=0}^{n} i = n(n + 1)/2

9. Using generating functions solve the following recurrence in closed form.

a0 = 0
a1 = 1
a2 = 2
for n ≥ 3, an = 3an−1 − 3an−2 + an−3

10. Using generating functions solve the following recurrence in closed form.

a0 = 0
a1 = 1
for n ≥ 2, an = −2an−1 − an−2

11. Consider the recurrence

a0 = 1
a1 = 1
a2 = 1
for n ≥ 3, an = an−1 + an−3

By induction or in any other manner, prove that a_n ≥ 2a_{n−2}, for n ≥ 3, and a_n ≥
2^{(n−2)/2}, for n ≥ 2.
12. This is a tooling exercise for Exercises 13, 14 below and Exercise 8.5.3 in the next
chapter. Prove these facts about floors and ceilings.

• ⌊n/2⌋ + ⌈n/2⌉ = n, for all n. Hint. Argue the two cases separately, n even and n
odd.
• ⌈n/2⌉ − 1 = ⌊(n − 1)/2⌋, for all n. Hint. Argue the two cases separately, n even
and n odd.
• ⌈n/2⌉ = ⌊(n + 1)/2⌋, for all n. Hint. Argue the two cases separately, n even and n
odd.
• ⌈n/2⌉ ≥ n/2 ≥ ⌊n/2⌋ (trivial).
• ⌊n/2⌋ + 1 ≥ ⌈n/2⌉. Hint. Directly from the definitions of ⌊. . .⌋ and ⌈. . .⌉.
• ⌊n/4⌋ = ⌊⌊n/2⌋/2⌋. Hint. If l = ⌊n/4⌋, then by definition, l ≤ n/4 < l + 1 hence
2l ≤ n/2 < 2l + 2 thus 2l ≤ ⌊n/2⌋ < 2l + 2 (explain "≤" but the "<" is trivial).
It follows that l ≤ ⌊n/2⌋/2 < l + 1. Checkmate in one very short sentence.

• ⌈n/4⌉ = ⌈⌈n/2⌉/2⌉. Hint. If l = ⌈n/4⌉, then by definition, l − 1 < n/4 ≤ l hence
2l − 2 < n/2 ≤ 2l thus 2l − 2 < ⌈n/2⌉ ≤ 2l (explain "≤" but the "<" is trivial).
It follows that l − 1 < ⌈n/2⌉/2 ≤ l. Checkmate in one very short sentence.
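All of these identities are easy to check numerically before proving them (Python's // is floor division, and ⌈x/d⌉ = −((−x)//d)):

```python
# Numeric sanity check of the floor/ceiling identities of Exercise 12.
def fl(x, d):
    return x // d            # floor division

def ce(x, d):
    return -((-x) // d)      # ceiling via floor

ids_ok = all(
    fl(n, 2) + ce(n, 2) == n
    and ce(n, 2) - 1 == fl(n - 1, 2)
    and ce(n, 2) == fl(n + 1, 2)
    and fl(n, 2) + 1 >= ce(n, 2)
    and fl(n, 4) == fl(fl(n, 2), 2)
    and ce(n, 4) == ce(ce(n, 2), 2)
    for n in range(0, 2000)
)
```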

13. Consider standard binary search, where an array A[1 . . . n] with entries in ascending
order is recursively searched for the possible occurrence of K as an A[i] as follows:

a. Check if K matches A[⌊(n + 1)/2⌋]. If yes, exit successfully. If not
b. Recursively call the search algorithm to search for K in the array

    A[1 . . . ⌊(n + 1)/2⌋ − 1]

if K < A[⌊(n + 1)/2⌋]
c. Recursively call the search algorithm to search for K in the array

    A[⌊(n + 1)/2⌋ + 1 . . . n]

if K > A[⌊(n + 1)/2⌋].

a. Set up the recurrence (upper bound, worst case) for the run time λn.T(n) —assumed
to be a non-decreasing function of n— of the algorithm. Preserve ⌊. . .⌋ in the recurrence
equations for T, i.e., do not approximate with non-integer expressions such as
n/2, (n + 1)/2.
Hint. Trivially, the worst case T(n) is 1 (comparing K with the middle entry) plus
the maximum of the worst case time for the left call —T(⌊(n + 1)/2⌋ − 1)— and the
right call —T(n − ⌊(n + 1)/2⌋). Decide which of these two calls is always worst and
arrive at T(n) = 1 + T(. . .). Note the "="! You are equating two sides with the
worst case run time in each.
b. Now solve for T(n) in exact closed form by the telescoping trick and using the
appropriate tools from 12. Hint. Extend the corresponding bullet from 12 with the result

    ⌊⌊n/2^m⌋/2⌋ = ⌊n/2^{m+1}⌋.

14. Consider a modified "binary search" where the "middle" entry of the array that we
compare with the key⁵ we are searching for is the one stored in location ⌊n/2⌋ rather
than ⌊(n + 1)/2⌋.

a. Formulate the recurrence that expresses the worst case number of comparisons T (n)
in this modified binary search.

5 In storing data —for example in an array— we often store them accompanied by short aliases that
we call “keys”. Thus instead of repeatedly comparing a possibly unwieldy and large record during
our search, we instead compare repeatedly its key against the keys of the stored (in the array) records.

b. Solve your recurrence exactly, that is, do not “simplify” the floors, and do not answer
in O-notation.

15. Solve the following recurrence and express the solution in O-notation, where it is given
that the function T that expresses the run time is increasing.

    T(n) = 0                   if n ≤ 2
    T(n) = √n · T(√n) + n      if n > 2

Hint. Solve for f (n) = T (n)/n instead. The simplified equations for f must be solved
first for the special case of n = 2^(2^m). Then adapt to any n.
Caution! Show that your solution is valid for all n rather than only for n of restricted
forms.
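To see the shape of the answer before proving it, the recurrence can be evaluated exactly at the special values n = 2^(2^m); a sketch (at these n every intermediate value is again of that form, so integer isqrt is the exact square root):

```python
from math import isqrt

def T(n):
    # exact evaluation of the recurrence at n = 2**(2**m)
    if n <= 2:
        return 0
    s = isqrt(n)
    return s * T(s) + n

for m in range(1, 5):
    n = 2 ** (2 ** m)
    print(n, T(n), T(n) // n)   # the ratio T(n)/n grows by 1 each step
```

The ratio column is the f (n) = T (n)/n of the hint; turning the observed pattern into a proof valid for every n is the exercise.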
16. Solve the following recurrence in O-notation, where we are told that T (n) is increasing.

    T (n) = 1                  if n = 1
    T (n) = a·T (n/b) + n^2    if n > 1

Caution!

a. You need to consider cases according to the values of the natural numbers a and b.
b. Show that your closed-form solution is valid for all n rather than only for those of
some restricted forms.

17. Consider the algorithm described below to search a sorted (ascending order) array of
length n:

/ ∗ We are searching the array for an element equal to m ∗ /


if n ≤ 7 then do a linear search of A[1 . . . n]
else if A[7] = m then successful exit
else if A[7] > m then do a linear search of A[1 . . . 7]
else call recursively on segment A[8 . . . n]

Formulate and solve exactly (not in O-notation) the recurrence relation that defines the
worst case number of comparisons T (n).
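The description translates almost line for line into code. A sketch with 0-indexed arrays (so the text's A[7] is A[6] here) that also counts the comparisons made against m:

```python
def search(A, m, count=0):
    """Return (found, comparisons_with_m) for the block-of-7 search."""
    n = len(A)
    if n <= 7:                        # linear search of A[1 .. n]
        for x in A:
            count += 1
            if x == m:
                return True, count
        return False, count
    count += 1
    if A[6] == m:                     # the text's A[7] = m
        return True, count
    count += 1
    if A[6] > m:                      # linear search of A[1 .. 7]
        for x in A[:7]:
            count += 1
            if x == m:
                return True, count
        return False, count
    return search(A[7:], m, count)    # recurse on A[8 .. n]

print(search(list(range(1, 101)), 95))
```

Running it over a range of keys gives the data against which to check the recurrence you formulate.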
18. Let the symbol P(n) stand for the number of ways we can write down the sum a1 +
· · · + an with all possible brackets inserted.
Examples:

The trivial “sum” a1 offers only one way to become “fully parenthesised”, namely, (a1 ).

Then, a1 + a2 + a3 allows two ways to be fully parenthesised, namely, ((a1 ) + ((a2 ) +
(a3 ))) and (((a1 ) + (a2 )) + (a3 )).

In terms of P we have P(1) = 1, P(3) = 2.

a. Find the correct recurrence that expresses P(n).


b. Find the generating function G(z) of the sequence P(n), n = 1, 2, . . . in closed
form; you are not required to find P(n) itself in closed form.
8 An Addendum to Trees

Overview

This short Addendum uses trees to calculate, in exact (i.e., not in O-notation) closed form,
a sum that arises in the analysis of algorithms. The difficulty with this sum is that its terms
involve the ceiling function —in something forbidding like λx.⌈log2 x⌉. In the area of
discrete mathematics known as graph theory, trees —in particular binary trees— play a
central role as special cases of the so-called directed graphs. While trees are studied for
their own merit in modelling important data structures in computing practice, they also have
unexpected applications to discrete mathematics, such as the one we will demonstrate in this
chapter. The chapter concludes with an application of generating functions used to compute
a simple expression for the number of all extended trees that have n internal nodes.
While there is a direct graph-theoretic definition of a tree —one that is beyond the design of this
volume— it is arguably more convenient to go the graph-independent route to a
definition (Example 6.3.10), as we did in Chapter 6, if for no other reason than that
such a definition enables us to prove tree properties by structural induction. Our definitive
definition has been given in Example 6.3.10.

8.1 Trees: More Terminology


8.1.1 Example This example supplements the discussion started at 6.3.10.
So here are some trees ∅, (∅, 1, ∅), and ((∅, 1, ∅), 2, ∅) where we wrote 1 and 2 for 1
and 2 respectively.


In the figure below the notation shows only the structure part (not the support) and the
first example is what we may call “the most general tree” as there are no specific assumptions
regarding the left and right subtrees of the root r (which we can, by abuse of notation and
language just call “root r ”). They are “any” trees drawn as triangles with names “T1 ” and
“T2 ” respectively.
The last tree below has both subtrees of its root empty, while the second tree has an empty
right subtree. Thus, the “general tree” is drawn as one of the following, where r is the root.

The leftmost drawing uses the notation of a “triangle” to denote a tree. The other three cases
are used when we want to draw attention to the fact that the right (respectively, left, both)
subtree(s) is (are) empty.  

8.1.2 Definition (Simple and Extended Trees) We agree that we have two types of tree-
notations (abusing language we say that we have two types of trees).
Simple Trees are those drawn only with “round” nodes (i.e., we do not draw the empty
subtrees).
Extended Trees are those in which all empty subtrees are drawn as “square nodes” (as in 6.3.10).
We call, in this case, the round nodes internal and the square nodes external. 

 Clearly, the “external nodes” of an extended tree cannot hold any information since they are
(notations for) empty subtrees.
Alternatively, we may think of them (in a programming-implementation sense) as nota-
tions for null pointers. That is, in the case of simple trees we do not draw any null links,
while in the case of extended trees we draw all null links as square nodes. 

8.1.3 Definition (Graph-Theoretic Terminology) We introduce some standard graph the-


ory terminology:
If node a points to node b by an edge, then b is a child of a and a is the parent of b. If
two nodes are the children of the same node, then they are siblings.
A sequence of nodes a1 , a2 , . . . , an in a tree is a path or chain iff for all i = 1, 2, . . . , n −
1, ai is the parent of ai+1 . We say that this is a chain from a1 to an . We say that an is a
descendant of a1 and that a1 is an ancestor of an .
A node is a leaf iff it has no children. 

 In an extended tree the only leaves are the external (square) nodes. 

8.1.4 Definition We define levels of nodes in a tree recursively (inductively):


The root has level 0 (sometimes we assign level 1 to the root, as it may prove convenient).
If b is any child of a and a has level i, then b has level i + 1.
The highest level in a tree is called the height of the tree. 

8.1.5 Example (Assignment of levels)

8.1.6 Definition A non-leaf node is fertile iff it has two children. A tree is fertile iff all its
non-leaf nodes are fertile.
A tree is full iff it is fertile and all the leaves are at the same level. 

 An extended tree is always fertile. The last sentence above then simplifies, if restricted to
such trees, to “All square nodes are at the same level”. 

8.1.7 Example (Full Trees). A “full tree” has all the possible nodes it deserves.

8.1.8 Definition A tree is complete iff it is fertile and all the leaves occupy at most two
consecutive levels (obviously, one is going to be the last (highest) level). 

 Again, for extended trees we need only ask that all square nodes occupy “at most two
consecutive levels”. 

8.1.9 Example (Complete Trees)

Redraw the above so that all the square nodes are “rounded” and you get examples of
complete Simple Trees.


 8.1.10 Example There is a variety of complete trees, the general case having the nodes
in the highest level scattered about in any manner. In practice we like to deal mostly with
complete trees whose highest level nodes are left-justified (left-complete) or right-justified
(right-complete). See the following, where we drew (abstractly) a full tree (special case of
complete!), a left complete, a right complete, and a “general” complete tree.

Example 8.1.9 provides a number of more concrete examples. 

8.2 A Few Provable Facts About Trees

8.2.1 Theorem An extended tree with a total number of nodes equal to n (this accounts for
internal and external nodes) has n − 1 edges.

Proof We do induction with respect to the definition of trees, or as we say in short, induction
on trees. Cf. 6.2.3.

Basis. The smallest tree is ∅, i.e., exactly one “square” node. It has no edges, so the
theorem holds in this case.

I.H. Assume the claim for “small” trees.


I.S. Consider the “big” tree below composed of two “small” trees of k and l nodes and a
root. Say the total number of nodes is n + 1.¹

By I.H., the left and right (small) subtrees have k − 1 and l − 1 edges respectively. Thus the
total number of edges is k − 1 + l − 1 + 2 = k + l (Note that in an extended tree all round
nodes are fertile, so both edges emanating from the root are indeed present).
On the other hand, the total number of nodes n + 1 is k + l + 1. We rest our case. 
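Theorem 8.2.1 is easy to confirm mechanically on randomly generated extended trees. A sketch (the representation is ours: None stands for a square node, and an internal node is a pair of subtrees):

```python
import random

def random_tree(n):
    """A random extended tree with n internal (round) nodes;
    None stands for a square (external) node."""
    if n == 0:
        return None
    m = random.randrange(n)          # internal nodes sent to the left subtree
    return (random_tree(m), random_tree(n - 1 - m))

def nodes_and_edges(t):
    """(total number of nodes, number of edges) of an extended tree."""
    if t is None:
        return 1, 0                  # a lone square node: one node, no edges
    nl, el = nodes_and_edges(t[0])
    nr, er = nodes_and_edges(t[1])
    return nl + nr + 1, el + er + 2  # the root adds one node and two edges

random.seed(1)
for n in range(30):
    total, edges = nodes_and_edges(random_tree(n))
    assert edges == total - 1        # Theorem 8.2.1
```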

8.2.2 Corollary An extended tree of n internal nodes has n + 1 external nodes.

Proof This was proved in 6.3.13. Here is another proof.

Let us say we have φ internal and ε external nodes, where φ = n.


By 8.2.1 the tree has ε + φ − 1 edges. That is, accounting differently, 2φ edges, since
all round nodes are fertile, and the square nodes are all leaves. Thus,

ε + φ − 1 = 2φ

from which, φ = ε − 1. Thus there are n + 1 square nodes as claimed. 

8.2.3 Corollary A simple tree of n ≥ 1 nodes has n − 1 edges.

Proof Let E be the original number of edges, still to be computed in terms of n. Add external
nodes (two for each “original” leaf). What this does is:

¹ No magic with n + 1. We could have called the total n, but then we would have to add “where
n ≥ 1” to account for the presence of the root. The “≥ 1” part is built-in if you use n + 1 instead.

It adds n + 1 square nodes, by the previous corollary.


It adds n + 1 new edges (one per square node). Thus,

Total Nodes = 2n + 1
Total Edges = E + n + 1

By Theorem 8.2.1, E + n + 1 = 2n, hence E = n − 1 as claimed. 

8.2.4 Theorem In any nonempty fertile simple tree we have



    ∑_{l is a leaf's level} 2^(−l) = 1

where we assigned level 0 to the root.

Proof Induction on trees.

Basis The smallest tree is one round node. Its level is 0 and 2^(−0) = 1, so we are OK.
I.H. Assume for small trees, and go to the “big” case.
I.S. The big case (recall that the tree is fertile, so even though simple, the root has two
children).

Since each of T1 and T2 are “small”, I.H. applies to give



    ∑_{l is a leaf's level in T1} 2^(−l) = 1     (1)

and

    ∑_{l is a leaf's level in T2} 2^(−l) = 1     (2)

It is understood that (1) and (2) are valid for T1 and T2 “free-standing” (i.e., root level is 0
in each). When they are incorporated in the overall tree, call it T , then their roots obtain a
level value of 1, so that formulas (1) and (2) need adjustment: All levels now in T1 , T2 are
by one larger than the previous values. Thus,
    ∑_{l is a leaf's level in T} 2^(−l) = ∑_{l is a leaf's level in T1, free-standing} 2^(−(l+1)) + ∑_{l is a leaf's level in T2, free-standing} 2^(−(l+1)) = 1/2 + 1/2 = 1   (by the I.H.)

8.2.5 Corollary In an extended tree



    ∑_{l is a leaf's level} 2^(−l) = 1

where we assigned level 0 to the root.

8.2.6 Corollary In both 8.2.4 and 8.2.5, if the root is assigned level 1, then

    ∑_{l is a leaf's level} 2^(−l) = 1/2
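Both corollaries can be confirmed on random extended trees; a sketch (None again stands for a square node, and Fraction keeps the sum exact):

```python
import random
from fractions import Fraction

def rand_tree(n):                    # n internal nodes; None = square node
    if n == 0:
        return None
    m = random.randrange(n)
    return (rand_tree(m), rand_tree(n - 1 - m))

def leaf_sum(t, level=0):
    """Sum of 2**(-l) over the leaves (square nodes) of an extended tree."""
    if t is None:
        return Fraction(1, 2 ** level)
    return leaf_sum(t[0], level + 1) + leaf_sum(t[1], level + 1)

random.seed(2)
for n in range(25):
    assert leaf_sum(rand_tree(n)) == 1                         # 8.2.5: root at level 0
    assert leaf_sum(rand_tree(n), level=1) == Fraction(1, 2)   # 8.2.6: root at level 1
```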

Next we address the relation between n, the number of nodes in a simple complete tree
(8.1.8), with its height l (8.1.4).

Clearly,
    n = 2^0 + 2^1 + · · · + 2^(l−2) + k ≤ 2^l − 1     (A)

where k, with 1 ≤ k ≤ 2^(l−1), is the number of nodes at the last level, l.

thus
    2^(l−1) − 1 < n ≤ 2^l − 1
From this follows
    2^(l−1) < n + 1 ≤ 2^l
or
    2^(l−1) ≤ n < 2^l
leading to
    l = ⌈log₂(n + 1)⌉ = ⌊log₂ n⌋ + 1     (∗)
a good formula to remember.
 Of course, all this holds when counting levels from 1 up. Check to see what happens if the
root level is 0. 
How does k, the number of nodes at level l, relate to n, the number of nodes in the tree?
From (A),
    k = n + 1 − 2^(l−1)     (∗∗)

another very important formula to remember, which can also be written (because of (∗)) as

    k = n + 1 − 2^(⌈log₂(n+1)⌉−1)
      = n + 1 − 2^(⌊log₂ n⌋)     (B)

 Note that (∗∗) or (B) hold even if some or all nodes at level l − 1 have no more than one
child (in which case the tree fails to be complete, or fertile for that matter). 
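Formulas (∗), (∗∗) and (B) can be spot-checked over a range of n; a sketch (levels counted from 1, as in the derivation):

```python
from math import ceil, floor, log2

for n in range(1, 2049):
    l = ceil(log2(n + 1))                    # height, by (*)
    assert l == floor(log2(n)) + 1           # the second form of (*)
    k = n + 1 - 2 ** (l - 1)                 # (**)
    assert k == n + 1 - 2 ** floor(log2(n))  # (B)
    assert 1 <= k <= 2 ** (l - 1)            # level l is nonempty and not overfull
```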

8.3 An Application to Summations

Let us see next what happens if we label the nodes of a left-complete tree by numbers
successively, starting with label 2 for the root.

An easy induction shows that at level i we have the labels

    2^(i−1) + 1, 2^(i−1) + 2, . . . , 2^i     (1)

Note that if t is any of the numbers in (1), then 2^(i−1) < t ≤ 2^i, hence

    ⌈log₂ t⌉ = i

 In words, the ceiling of log2 of any node-label at level i equals i. 


We are in a position now to evaluate the sum


    A = ∑_{i=2}^{n+1} ⌈log₂ i⌉     (2)

which arises in the analysis of certain algorithms. The figure below helps to group terms
appropriately:

Clearly,

    A = ∑_{i=1}^{l−1} i·2^(i−1) + k·⌈log₂(n + 1)⌉     (3)

To compute (3) we need to find k as a function of n, and to evaluate

    B = ∑_{i=1}^{l−1} i·2^(i−1)     (4)

There are two cases at level l as in the previous figure. Regardless, k is given in (∗∗) of the
previous section (p. 238) as
    k = n + 1 − 2^(l−1)
Thus, we only have to compute (4). Now,

    B = ∑_{i=1}^{l−1} i·2^(i−1)
      = ∑_{i=0}^{l−2} (i + 1)·2^i
      = ∑_{i=0}^{l−2} i·2^i + 2^(l−1) − 1
      = 2·∑_{i=1}^{l−2} i·2^(i−1) + 2^(l−1) − 1
      = 2·∑_{i=1}^{l−1} i·2^(i−1) − (l − 2)·2^(l−1) − 1
      = 2B − (l − 2)·2^(l−1) − 1

Thus, B = (l − 2)·2^(l−1) + 1, and the original becomes (recall (∗) and (∗∗)!)

    A = (l − 2)·2^(l−1) + 1 + k·l
      = l·2^(l−1) − 2^l + 1 + (n + 1 − 2^(l−1))·l
      = (n + 1)·l − 2^l + 1
      = (n + 1)·⌈log₂(n + 1)⌉ − 2^(⌈log₂(n+1)⌉) + 1

Note. A rough analysis of A would go like this: each term of the sum is O(log(n + 1))
and we have n terms. Therefore, A = O(n·log(n + 1)). However we often need the exact
answer . . .
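The exact closed form —and the fact that it is not merely an O-estimate— is easy to test numerically; a sketch:

```python
from math import ceil, log2

def A_sum(n):        # the sum (2), term by term
    return sum(ceil(log2(i)) for i in range(2, n + 2))

def A_closed(n):     # the closed form just derived
    l = ceil(log2(n + 1))
    return (n + 1) * l - 2 ** l + 1

for n in range(1, 600):
    assert A_sum(n) == A_closed(n)
```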

8.4 How Many Trees?

We want to find the number of all extended binary trees of n internal nodes.

Let the sought quantity be called xn .


Refer to the following figure, where the tree has n internal nodes, while subtree T1 has
m internal nodes and subtree T2 has r internal nodes.

Thus, n = m + r + 1. We can choose T1 in xm different ways, and for each way, we can
have xr different versions of T2 . And that is true for each size of T1 . Thus, the recurrence
equations for xn are

    x0 = 1 (there is only one, the empty tree)     (1)

    xn = ∑_{n−1=m+r} xm·xr, for n > 0     (2)

We recognise in (2) the so-called convolution resulting from G(z)^2, where

    G(z) = x0 + x1·z + x2·z^2 + · · · + xn·z^n + · · ·     (3)

Indeed,

    G(z)^2 = x0^2 + (x0·x1 + x1·x0)·z + · · · + (∑_{n−1=m+r} xm·xr)·z^(n−1) + · · ·
           = x1 + x2·z + · · · + xn·z^(n−1) + · · ·     (4)

Thus, z·G(z)^2 + x0 = G(z), or

    z·G(z)^2 − G(z) + 1 = 0     (5)

We solve (5) for G(z) to find

    G(z) = (1 + √(1 − 4z)) / (2z)   or   G(z) = (1 − √(1 − 4z)) / (2z)     (6)

equivalently,

    z·G(z) = (1 + √(1 − 4z)) / 2   or   z·G(z) = (1 − √(1 − 4z)) / 2     (6′)

The first of (6′) is false for z = 0, so we keep and develop the second solution. To this end
we expand √(1 − 4z) = (1 − 4z)^(1/2) by the binomial expansion.

    (1 − 4z)^(1/2) = 1 + · · · + C(1/2, n)·(−4z)^n + · · ·

where C(a, n) denotes the binomial coefficient “a choose n”. Let us work with the coefficient
C(1/2, n)·(−4)^n.
n

    C(1/2, n)·(−4)^n = [(1/2)(1/2 − 1)(1/2 − 2) . . . (1/2 − [n − 1]) / n!]·(−4)^n
      = (−1)^n·2^(2n)·[1·(1 − 2·1)(1 − 2·2) . . . (1 − 2·[n − 1])] / (2^n·n!)
      = (−1)^n·(−1)^(n−1)·2^n·[(2·1 − 1)(2·2 − 1) . . . (2·[n − 1] − 1)] / n!     (7)
      = −2·[(2·1 − 1)[2·1](2·2 − 1)[2·2] . . . (2·[n − 1] − 1)[2·[n − 1]]] / (n!·(n − 1)!)     (8)
      = −2·(2n − 2)! / (n!·(n − 1)!)
      = −(2/n)·C(2n − 2, n − 1)

 Going from (7) to (8) above we introduced the factors [2·1], [2·2], . . . , [2·[n − 1]] in order
to “close the gaps” and make the numerator a factorial. This spent n − 1 of the n
2-factors in 2^n, but introduced (n − 1)! in the numerator, hence we balanced it out in the
denominator. 
It follows that, according to the second case of (6),

    xn = G(z)[z^n] = [z^n] (1 − [1 − 2z − · · · − (2/n)·C(2n − 2, n − 1)·z^n − · · ·]) / (2z)     (9)

where for any generating function G(z), G(z)[z^n] denotes the coefficient of z^n.
In short,

    xn = (1/(n + 1))·C(2n, n)
 Don’t forget that we want G(z)[z n ]; we have adjusted for the division by z. 

8.5 Exercises

1. Prove the identity (due to Vandermonde)

       C(n + m, r) = ∑_{i+j=r} C(n, i)·C(m, j)

   Hint. Multiply the generating functions (1 + z)^n and (1 + z)^m and look at the r-th
   coefficient.
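For natural numbers the identity is a one-liner to check (Exercise 2 below asks whether naturality is needed); a sketch:

```python
from math import comb

def vandermonde(n, m, r):
    # comb(a, k) is 0 when k > a, so the full range of i is harmless
    return sum(comb(n, i) * comb(m, r - i) for i in range(r + 1))

assert all(vandermonde(n, m, r) == comb(n + m, r)
           for n in range(8) for m in range(8) for r in range(n + m + 1))
```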
2. Do n, m above have to be natural numbers?

3. This exercise is asking you to 1) formulate the recurrence equations and 2) solve them
in exact closed form (not in O-notation) for the worst case run time of the “linear merge
sort” algorithm.
Just like the binary search algorithm, linear merge sort uses a divide and conquer
technique —as practitioners in the analysis of algorithms call it— to devise a fast
algorithm.
In this case we want to sort an array A[1 . . . n] in ascending order. Thus we divide the
problem into two almost equal size problems —to sort each of A[1 . . . ⌊(n + 1)/2⌋] and
A[⌊(n + 1)/2⌋ + 1 . . . n]— and we do so by calling the procedure recursively for each
half-size array.
We then do a linear —essentially n-comparisons— merge of the two sorted half-size
arrays.
Hints and directions.

a. Prove that the recurrence equations for the worst case are

    T (0) = 0
    T (1) = 0
    T (n) = T (⌈n/2⌉) + T (⌊n/2⌋) + n   (case of n > 1)

Be sure to prove that the equations above are correct.
b. You will use now the tools from Exercise 7.5.12 and also what we have learnt from
our work (your work) on recurrence equations solving. Exercise 7.5.13 will be helpful
in the final stages of your solution of the present exercise. Indeed let me show how
you can solve the above recurrence by reducing it, essentially, to the binary search
case recurrence.

Step 1: Towards the telescoping trick. We are also trying to get all divisions by 2 in
⌊·⌋/⌈·⌉-notation. So,

    T (n) − T (n − 1) = T (⌈n/2⌉) + T (⌊n/2⌋) − T (⌈(n − 1)/2⌉) − T (⌊(n − 1)/2⌋) + 1
                      = T (⌈n/2⌉) + T (⌊n/2⌋) − T (⌊n/2⌋) − T (⌊(n − 1)/2⌋) + 1
                      = T (⌈n/2⌉) − T (⌈n/2⌉ − 1) + 1

Step 2: Rename: B(n) =Def T (n) − T (n − 1), so that B(n) = B(⌈n/2⌉) + 1 and B(n) =
0, for n ∈ {0, 1}.

Step 3: Solve for B(n) similarly to Exercise 7.5.13 and note that 1 < n/2^m ≤ 2 iff
2^m < n ≤ 2^(m+1) iff m < log₂ n ≤ m + 1 iff ⌈log₂ n⌉ = m + 1. Thus,

    B(n) = B(⌈n/2⌉) + 1
         = B(⌈n/2^2⌉) + 2
         = B(⌈n/2^3⌉) + 3
         ...
         = B(⌈n/2^(m+1)⌉) + m + 1

So (use the initial (boundary) conditions for B and provide details!) conclude
that B(n) = ⌈log n⌉, where I wrote “log” for “log₂”, and
Step 4: Compute the sum of the telescoping T (n) − T (n − 1) = ⌈log n⌉ using
Section 8.3.
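Before telescoping, it helps to know what to expect: the part a recurrence can be evaluated mechanically, and at powers of two it collapses to the familiar n·log₂ n. A sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Exact evaluation of the worst-case recurrence of part a."""
    if n <= 1:
        return 0
    return T((n + 1) // 2) + T(n // 2) + n   # T(ceil(n/2)) + T(floor(n/2)) + n

for k in range(1, 6):
    print(2 ** k, T(2 ** k))                 # T(2**k) = k * 2**k
```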

4. Given an extended tree, traverse it and mark its round nodes in the order you encounter
them.
Program —as a pseudo-program— this traversal so that you first process the left subtree,
then the root, and then the right subtree. This is called the inorder traversal.
Suppose now that the given tree contains one number in each round node. Moreover
assume the tree has the property that for every round node the number contained therein
is greater than all numbers contained in the node’s left subtree and less than all the
numbers in the right subtree of the node.
Prove that the inorder traversal of such a tree will visit all these numbers in ascending
order.
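A sketch of the traversal, using the (left, label, right) triples of Example 6.3.10 with None for ∅; on a tree with the stated order property the visit order comes out sorted:

```python
def inorder(t, visit):
    """Inorder traversal: left subtree, then root, then right subtree."""
    if t is not None:
        left, label, right = t
        inorder(left, visit)
        visit(label)
        inorder(right, visit)

# a small tree with the stated property (every label exceeds all labels
# in its left subtree and precedes all labels in its right subtree):
t = ((None, 1, (None, 2, None)), 3, ((None, 4, None), 5, None))
out = []
inorder(t, out.append)
print(out)   # [1, 2, 3, 4, 5]
```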
5. We have introduced edges to trees. Prove that no node (round or square) in a tree has
more than one edge coming into it (from above).
