Discrete Mathematics A Concise Introduction
George Tourlakis
Synthesis Lectures on Mathematics &
Statistics
Series Editor
Steven G. Krantz, Department of Mathematics, Washington University, Saint Louis, MO,
USA
This series includes titles in applied mathematics and statistics for cross-disciplinary
STEM professionals, educators, researchers, and students. The series focuses on new and
traditional techniques to develop mathematical knowledge and skills, an understanding of
core mathematical reasoning, and the ability to utilize data in specific applications.
George Tourlakis
Department of Electrical Engineering
and Computer Science
York University
Toronto, ON, Canada
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give
a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that
may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
για την δέσποινα
Preface
1 Rational numbers have no gaps with respect to the order <. If a, b are rationals, there is always a
rational c between the two: For example, take c = (a + b)/2.
The present volume engages all but two of its chapters on set theory topics —the
two exceptions being Chaps. 4 and 7. In particular, we include operations on sets, func-
tions, and relations, finite and infinite sets, uncountable sets, and diagonalisation that is
a must-have tool in theory of computation, equivalence relations, orders, induction and
inductive (or recursive) definitions, inductively defined sets, and “structural induction”.
I omit advanced topics such as ordinal and cardinal numbers (these are omitted also in
every other book on discrete mathematics that I know of).
But I do include several non-elementary —yet not utterly esoteric— topics from set
theory in this volume. For example, the chapter on induction includes induction along
arbitrary relations that are not orders, and also the recursive definition topics in this
volume are introduced thoroughly and completely —not as it is commonly done, namely
“here are some recursive definition examples”— to the extent that we prove that recursive
definitions do (each) define a function and that said function is unique. Our recursive
definitions include (with proofs) the case of recursion along a non-order relation. As an
illustration, we give the example defining the so-called support function of set theory by
induction along the non-order relation ∈. The support is a function that returns the set
of atoms that were used to build the function’s set input. For example, the support of
{{{3}}, {{{1}}}} is {1,3}.
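The support example can be sketched in code. This is a minimal illustration, not the definition given in the book: it assumes that atoms are Python integers, that sets are modelled as `frozenset` values, and that the support of an atom `a` is taken to be `{a}`. The name `support` and these modelling choices are assumptions made for the sketch.

```python
def support(x):
    """Return the atoms used to build x, by recursion along membership."""
    if not isinstance(x, frozenset):
        # Assumption of this sketch: an atom contributes itself.
        return frozenset({x})
    atoms = frozenset()
    for member in x:
        # The recursive calls are on members of x, i.e. "along" the relation ∈.
        atoms |= support(member)
    return atoms

# The set {{{3}}, {{{1}}}} from the text.
s = frozenset({
    frozenset({frozenset({3})}),
    frozenset({frozenset({frozenset({1})})}),
})
print(sorted(support(s)))  # [1, 3]
```

The recursion terminates because every chain of memberships starting from a set built in this way is finite.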
Notably, we also include a brief introduction to set operators (not to be confused
with set operations) through which we prove the so-called Schröder-Bernstein theorem
—which actually is due to Dedekind— that proves the difficult, but “obvious”, statement
“if the infinite set A has at most as many elements as the infinite set B, and if also B has
at most as many members as A, then the two sets have the same number of elements”.
Above all, I note that set theory —the small part that we cover— in this volume is
founded so that is “safe”, as I call it, meaning that I build sets by stages following the idea
of Bertrand Russell in order to avoid the obvious contradictions (known as paradoxes or
antinomies) of Cantor’s set theory. For example, the universe of all sets and atoms, U
(see Wilder (1963) for extensive historical commentary) is not a “self-contradictory set”
(ibid.) in this book —in fact it is (easily, cf. Sect. 2.2) provably not a set, that is, it is a
proper class. No harm done!
Other discrete mathematics texts contain much less coverage on set theory, and nor-
mally they omit uncountable sets and diagonalisation, proofs that induction definitions
work, and the generalisation of induction along non-orders.
Outside set theory topics, other discrete mathematics texts are usually deficient in their
presentation of predicate logic (reduced to recipes), and they normally stay away from
solving recurrence equations with generating functions or without. This is a must-have (in
our opinion) topic that supports sequel courses on the theory of algorithms.
Invariably, I see other texts on discrete mathematics defining functions wrongly: in
those definitions, they require all functions to be total 2 by something like “[the function
f from S to T ] … for every x in S returns a unique element y = f (x) in T as output …”.
But this function is total on its “left field” S (that is, supply of inputs) unlike many of the
functions used in theory of computation, for example, this one: “for all x, y return x/y”.
This is undefined for all inputs (x, y) of the form (a, 0) —so is it not a “function”?
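A small sketch of such a nontotal function follows. It assumes that "no output" is modelled by returning `None`; the names `divide` and `is_defined_at` are hypothetical and chosen only for this illustration.

```python
from typing import Optional

def divide(x: float, y: float) -> Optional[float]:
    """The function "for all x, y return x/y", undefined on inputs (a, 0)."""
    if y == 0:
        return None  # a legal input that produces no output
    return x / y

def is_defined_at(x: float, y: float) -> bool:
    """True iff (x, y) belongs to the domain of divide."""
    return divide(x, y) is not None

print(divide(6, 3))         # 2.0
print(is_defined_at(1, 0))  # False
```

Here the "left field" is all pairs of numbers, while the domain (the inputs that actually yield an output) is a proper subset of it.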
The reader will find here a rigorous, correct, and simple chapter on predicate (first-
order) mathematical logic for the user. I have to admit a shortcut I took in my logic
chapter (maybe two), which is (are) made in the interest of saving space and minimising
formalism-fatigue. It is usual —given that most books on algebra or calculus or discrete
mathematics do not offer this definition either— for one to adopt the belief that “one
learns the syntax of formulas via practice”.
Thus, I do not define the syntax of mathematical formulas in its full abstract generality
within predicate logic (the reader that wishes to see this definition may consult Tourlakis
(2008)), but the reader will quickly learn by use from Chaps. 1–3 that, e.g., x ∈ A → x ∈
A ∪ B, A ∈ F → F ⊆ A, and A ⊆ A ∪ B are formulas of interest in set theory, while
1 + 2 + 3 + ... + n = n(n + 1)/2 is also a formula of number theory (over N).
I note that the precise “shape” (syntax) of Boolean formulas —but not of predicate
logic formulas— is defined in Exercise 6.4.5 and several syntactic properties of Boolean
formulas are proved (by the reader) using structural induction.
The related second shortcut is motivated by the lack of a formal definition of formula
syntax! “Perhaps we can do away with the introduction of formal counterparts of the
truth values false and true, namely the Boolean logic constants ⊥ and ⊤” I thought! This
thought is behind my elevating the metasymbols of (Boolean) logic t (true) and f (false)
to atomic Boolean formula status in the logic chapter.
This is not more odd —or more unusual— than using the symbol 3 in algebra, say, to
denote both the name and the value of the constant we pronounce “three”.
The logic chapter explains fully, and gives several examples of, the application of
generalisation and specialisation, the use of the “ping-pong” theorem used to prove
equivalences, the deduction theorem and proof by contradiction techniques, which are
introduced (and proved to be valid —metatheorems— in the Exercises section with the
help of extensive Hints) and the (complicated) technique of the elimination of an ∃-prefix
of a formula. This also is proved in the Exercises section with my help (Hints). The
variant theorem (4.1.32) —or bound variable renaming theorem— is also proved.
My undertaking, by agreement with the editor, was for a small-length volume. This makes
the above two questions very important.
I hold that absolutely central to any discrete mathematics volume —indeed, any
course on the subject— is induction and inductive definitions (the latter nowadays being
increasingly called “recursive definitions”). Then I need material to support the proper
introduction of these topics, and I need to do it correctly without reproducing/re-bumping
into the contradictions (cutely called paradoxes3 and antinomies back then) of Cantor’s
set theory. Enter “safe set theory” in the style of Russell where sets are defined by stages
—only a short step away, this, from modern axiomatic set theory.
Then I must include enough set theory —to do a good job on induction— e.g., I must
cover well the topics on relations and functions that lead to induction (along relations
that are “well-orderings”4 or at the other extreme might not even be orders but do have,
as we technically say in this volume, “MC”) and recursive definitions (again along well-
orderings but also along non-order relations that have MC, in the general case).
Incidentally, as this volume aspires to serve, among others, courses on limitations of
computing at the 2nd, 3rd, or 4th year level, it chooses to steer away from the practice
prevalent in other discrete mathematics books where functions are introduced in a manner
that makes them defined everywhere. Neither in practice nor in theory (of computation)
are all functions and relations defined on all possible “legal” inputs.5 Thus, our functions
and relations are defined to be “partial”, a term that allows both totally defined (“total”)
but also otherwise (“nontotal”) functions and relations.
Our topics on cardinality are just about the right amount; however, we do not cover
the advanced topic of cardinal and ordinal numbers. We include mathematical definitions
and the use of finite and infinite sets and countable and uncountable sets. It makes sense
to include the topic of diagonalisation —invented by Cantor— whose application computer
science students should be able to see and understand before they get into a course
on the theory of computation where the concept is extensively used.
Computer science (and computer and software engineering) students also need to take
courses on the analysis of algorithms, where recurrence equations are set up to compute
the run times (usually worst-case upper bounds) of algorithms.
This motivates the chapter on recurrence equations —and their closed form solutions—
generating functions, and trees, the latter not as data structures of interest (they are that
too!) but rather as a tool towards computing some interesting but scary sums that are
of use in the solution of recurrence relations. The justification for including these here
—rather than hoping they will be covered in the analysis of algorithms course and do
nothing— is that these solution techniques are numerous and involved, and we do not
believe that they can be easily embedded as teaching topics in the course that uses them.
The informed reader of this preface will notice that I omit combinatorics, graphs, and
automata.
3 The catastrophic failure of a theory that leads to a contradiction —an inconsistent theory, as we
say— is sugar-coated if we call it a “paradox” from the Greek παρά, that is, against, and δοκώ, that
is, I believe or I know. If it is something that only betrays our beliefs, then probably it is not that bad?
4 This ungrammatical terminology is imposed on us by the literature.
5 In a theory about functions with natural number inputs and natural number outputs, all natural
numbers are “legal” inputs. But for some functions, not all legal inputs cause an output.
Some discrete mathematics books include these topics. However, automata does not fit
the purposes of a book on discrete mathematics tools. Third year courses in software engi-
neering develop in situ what material they need from automata theory and so do courses
on compilers. Automata is not a discrete math tools topic on par with induction, recursive
definitions, diagonalisation, relations and functions, logic, and recurrence equation solv-
ing. The topics included here are here because students need preparatory practice in these
before one uses them in a course that follows.
But what about graph theory? In practice, graphs —as opposed to graph theory which
is a very extensive subject— are normally introduced quickly where they are needed,
whether it is a course in data structures (no more than the definition of graphs is usually
needed in such a course) or analysis of algorithms where some topics on graph theory
might be needed (paths, cycles, spanning trees).
Regarding combinatorics, if one needs to “count”, then techniques other than the
sophisticated one via generating functions (covered extensively in the present volume
but almost in no other discrete mathematics book) will be covered in situ in a course
where they are called for, that is, typically an analysis of algorithms course.
Under the above heading “what to include”, I felt obliged to introduce the so-called
Axiom of Choice, which guarantees that I can fit in a (finite!) proof infinitely many choices
of elements from a set —even if there is no obvious methodology to describe the sequence
of my choices in a finite manner. Cf. Remark 3.5.28.6
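By contrast, when the set is an infinite set A of natural numbers, infinitely many choices can be described finitely without the axiom: always pick the smallest member of A not chosen so far. The sketch below assumes that A is given by a membership test `in_A`; the names `choices` and `in_A` are hypothetical, and the generator only makes progress at every step if A really is infinite.

```python
from itertools import count, islice

def choices(in_A):
    """Yield a_0, a_1, ...: a_{n+1} is the smallest member of A - {a_0, ..., a_n}."""
    chosen = set()
    while True:
        # Scan 0, 1, 2, ... for the smallest member of A not yet chosen.
        a = next(k for k in count() if in_A(k) and k not in chosen)
        chosen.add(a)
        yield a

def is_multiple_of_3(k):
    return k % 3 == 0

print(list(islice(choices(is_multiple_of_3), 5)))  # [0, 3, 6, 9, 12]
```

The two-line rule is what makes this work: the order < on the natural numbers always supplies a smallest candidate, so no arbitrary choice is ever needed.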
How much mathematical rigour and how much intuition is a good mix in a book like
this? We favour both!
Intuition helps us conceptualise and formulate the elements (rough details) of solutions
to mathematical problems —“napkining it”, as it were, that is, just as we would do a rough
calculation on a napkin. Rigour is the discipline of being mathematically
careful; the extra care that rigorous arguments entail helps us avoid
errors.
The Chapters
Chapter 1 is very short and retells the story of the Russell paradox. This is our motivation
to practise safe set theory, of which the exposition and foundation begin with Chap. 2.
Chapter 2 —endowed with the experience of Chap. 1— states at the outset that we
have two types of collections: sets and non sets. All collections we shall call classes but
the non-set ones we call proper classes. To prove that a class is a set, we use three
Principles, 0, 1, and 2, of set formation by stages. Further, the chapter introduces the class
6 I should note that I can make such a finite description without the Axiom of Choice helping, if I
wanted to finitely describe infinitely many choices from an infinite set of natural numbers A: Just
choose the smallest a_0 in A, then, for all n ≥ 0, choose the smallest a_{n+1} in A − {a_0, a_1, ..., a_n}.
This two-line recursive definition is good for infinitely many choices.
(unordered) pair {A, B} and proves that indeed it is a set, then proceeds to defining union,
intersection, difference, power set, ordered pair —the latter as a set not as a new strange
object— and Cartesian product. All the italicised terms represent operations on sets that
provably produce sets.
Chapter 3 is on relations and functions. The transitive closure of a relation is included
and for relations on finite sets, algorithms for its computation are proposed including the
well-known Warshall’s algorithm. Equivalence relations and order relations are included.
The former (in an illustrative example) leads to our first acquaintance with the “least principle”
on the set of natural numbers. The latter starts our study of orders that culminates
in induction and inductive (recursive) definitions in a later chapter.
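For orientation, here is a compact sketch of Warshall's algorithm for the transitive closure of a relation on a finite set. It assumes the set is {0, ..., n-1} and the relation is a Python set of pairs; the function name and this representation are choices made for the sketch and need not match the presentation in Chapter 3.

```python
def transitive_closure(n, pairs):
    """Warshall's algorithm on a relation over {0, ..., n-1} given as a set of pairs."""
    reach = [[(i, j) in pairs for j in range(n)] for i in range(n)]
    for k in range(n):              # allow k as an intermediate point
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {(i, j) for i in range(n) for j in range(n) if reach[i][j]}

R = {(0, 1), (1, 2), (2, 3)}
print(sorted(transitive_closure(4, R)))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

After the pass for a given k, `reach[i][j]` records whether j can be reached from i using only intermediate points among 0, ..., k, which is why three nested loops suffice.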
This chapter also introduces us to the concepts “finite” and “infinite” (set), diago-
nalisation, operators on sets, and the Schröder-Bernstein (or Cantor-Bernstein) theorem
—which is actually Dedekind’s.
Chapter 4, about 20 pages long, outlines the elements of predicate
logic for the user, including the techniques of adding/removing ∀ and ∃ prefixes of
formulas in proofs. It contains a good number of illustrative examples.
Chapter 5 introduces the concept of “inductiveness condition” (IC) of a relation and
proves its equivalence to the “minimal condition” (MC) of a relation. This has as a special
case that induction on the natural number set N is equivalent to the least principle
on N.
The statement that “< has IC” is the expression of the fact that we can do induction
along this < to prove mathematical statements about the members of the class where
< acts.
The formula that expresses the “strong” or “course-of-values” induction proof principle
is derived. There are many illustrative examples of induction as well as several end-
of-chapter exercises. The chapter proves the theorem that recursive definitions lead to
functions that uniquely exist. Both induction and recursion are extended to apply to non-
order relations, as long as the latter have MC (equivalently IC). One such non-order
relation is ∈ and thus we not only can prove properties of sets by induction along the
relation ∈ but also can make recursive definitions along ∈. As an illustration of the latter,
the support function is discussed.
Chapter 6 introduces a generalisation of definitions by induction (recursion) of Chap. 5.
According to this generalisation, a set is defined from a given set of operations —that is,
relations R(x_1, ..., x_n, y)— where the x_i are where the inputs are “read” in and the y is
where the outputs appear. The definition requires the set —it turns out it is a set— to be
the ⊆-smallest set that is closed 7 under all given operations R. We note that if n = 0, for
some such operation R, then the outputs of R can be thought of as given initial objects.
7 A set T is closed under a relation R(x_1, ..., x_n, y) —by definition— iff for all specific x_1, ..., x_n
in T, all the produced y are also in T.
If the operations are all sets, then the ⊆-smallest set so formed is unique and is called
the closure S = Cl({..., R(x_1, ..., x_n, y), ...}) of these relations. We also say that S is
inductively defined by said operations.
The associated proof tool —induction over a closure, also termed structural induction—
proves properties of inductively defined sets. We validate that this induction “works” in
this chapter.
We also connect the inductive definition of sets with an appropriate iterative con-
struction by stages, and we also connect it (in the chapter’s Exercises section) with the
definition of sets as monotone operator fixpoints (monotone operators were introduced in
3.8.1).
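The construction by stages can be sketched for the special case where the closure is finite. In this illustration (an assumption-laden sketch, not the book's construction) each operation is a pair `(arity, function)`, and the function returns an iterable of outputs so that an operation may produce zero or several y for given inputs. The name `closure` and this encoding are hypothetical.

```python
from itertools import product

def closure(initial, operations):
    """Smallest set containing `initial` and closed under `operations`, built by stages.

    Terminates only when that smallest set is finite.
    """
    stage = set(initial)
    while True:
        new = set()
        for arity, op in operations:
            # Feed every tuple of already-built objects to the operation.
            for args in product(stage, repeat=arity):
                new.update(y for y in op(*args) if y not in stage)
        if not new:
            return stage  # nothing new: the current stage is closed
        stage |= new

def add3_mod12(x):
    return [(x + 3) % 12]

def zero():
    return [0]

print(sorted(closure({0}, [(1, add3_mod12)])))              # [0, 3, 6, 9]
print(sorted(closure(set(), [(0, zero), (1, add3_mod12)])))  # [0, 3, 6, 9]
```

The second call shows the remark about n = 0: a nullary operation simply supplies the initial objects, so no separate `initial` set is needed.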
Chapter 7. Here, we discuss many classes of recurrence equations and the various
techniques to obtain “closed form” solutions —that is, in terms of known functions such
as λn.n^2, λn.2^n, λn.n log_2 n, etc.
The technique of generating functions is also outlined and demonstrated with several
examples (the most nontrivial of which appears in the following Chap. 8).
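To make "closed form" concrete, here is a tiny illustration that is not taken from the book: the recurrence T(0) = 0, T(n) = 2T(n−1) + 1 has the closed-form solution 2^n − 1, since T(n) + 1 = 2(T(n−1) + 1). The function names are chosen for this sketch only.

```python
def T(n):
    """Illustrative recurrence: T(0) = 0, T(n) = 2*T(n-1) + 1."""
    value = 0
    for _ in range(n):
        value = 2 * value + 1
    return value

def closed_form(n):
    """Candidate closed-form solution of the recurrence above."""
    return 2 ** n - 1

# A numerical check on small n; it supports the formula but does not prove it.
print(all(T(n) == closed_form(n) for n in range(20)))  # True
```

The actual proof that the two agree for every n is by induction on n.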
Chapter 8. In the area of mathematics known as graph theory, trees —in particular
binary trees— play a central role as special cases of the so-called directed graphs. While
trees are studied for their own merit in modelling important data structures in computing
practice, they also have unexpected applications to discrete mathematics such as the one
we will demonstrate in this chapter —using trees to compute a scary sum in closed form.
The chapter concludes with an application of generating functions used to compute a
simple expression that computes the number of all extended trees that have n internal
nodes.
There exists a direct graph-theoretic definition of a tree —that is beyond the design of
this volume— but it is arguably more convenient to offer the direct, graph-independent,
recursive definition (as in Knuth (1973), but see Example 6.3.10) that we took in Chap. 6
if for no other reason, then at least for the fact that such a definition enables us to prove
tree properties by structural induction.
If I used this book in a first year undergraduate course on discrete mathematics —which
I have been doing, approximately, for the past few years, using a simplified prequel of
this volume that I wrote— I would cover almost everything, except the most difficult
recurrence equations (with or without the help of generating functions) and I would also
skip induction and recursion along a non-order relation with MC. If by a(n unexpected)
miracle discrete mathematics were taught in 2nd year (just as a course in logic, essentially,
is), then I would want to cover everything. This is the advantage of a short volume; you
can cover everything if the audience is prepared.
The reader will forgive, I hope, the many footnotes, which the “style police” may assess
as “bad style”! However, there is always a story within a story that is best delegated to
footnotes not to disrupt the flow of exposition. Incidentally, the book by Wilder (1963)
on the foundations of mathematics would lose most of its effectiveness if it were robbed
of its superbly informative footnotes!
My footnotes (unlike many in Wilder’s book) are almost never of historical import,
but do support the understanding of those who may be bewildered by long displayed
formulas. To the rescue I often include a very local footnote inside the display to explain
a potentially puzzling spot —and I do so on the spot! Consider the display below as an
example.
The style of exposition that I prefer is informal and conversational and is expected
to serve well not only the readers who have the guidance of an instructor but also those
readers who wish to learn discrete mathematics on their own. I use several devices to
promote understanding, such as frequent “pauses” that anticipate questions and encourage
the reader to rethink an issue that might be misunderstood if read but not studied and
reflected upon. All pauses start with “Pause.” and end with “”.
Apropos quotes and punctuation, we follow the “logical approach” (as Gries and
Schneider (1994) call it) where punctuation is put inside the quotation marks if and only
if it is a logical part of the quoted text; never otherwise. So we would never write
The relation “is a member of” is fundamental in set theory. It is denoted by “∈.”
but rather
The relation “is a member of” is fundamental in set theory. It is denoted by “∈”.
Another feature of the above reference that I have adopted is the logical use of the em-
dash “—” as a parenthesis. As such we have a left version and a right version to avoid
ambiguities. The left version is contiguous with the following word but not with the pre-
ceding word. The right version is the reverse of this. For example, “discrete mathematics
is easy —as long as one studies— and is useful towards preparing you for courses in
algorithms and logic”.
I have included numerous remarks, examples, and embedded exercises (the latter in
addition to the end-of-chapter exercises) that reflect on a preceding definition or theorem.
Influenced by my teaching —where I love emphasising things— but originally by the
books of Bourbaki, I use in my books the stylised “winding road ahead” warning, ,
that I first saw in Bourbaki (1966).
It delimits a passage that is too important to skim over.
delimits non-elementary passages that I could not resist including.
There are 202 end-of-chapter exercises and several embedded ones in the text. Many
have hints and thus I refrained from (subjectively) flagging them for “level of difficulty”.
After all, as one of my mentors, Allan Borodin, used to say to us (when I was a graduate
student at the University of Toronto), “Attempt all exercises. The ones you can do, don’t
do; do the ones you cannot do”.
Acknowledgments I wish to thank all those who taught me, including my parents. In particular, I
thank the many mentors I have had at the University of Toronto as a graduate student9 and all those
in my prehistory, in chronological order Andreas Katsaros, Yiannis Ioannidis, and Pan. Ladopoulos
—all three of whom taught me geometry.
9 I will avoid a name listing since every permutation unintentionally —but inevitably— implies a
ranking.
1 Some Elementary Informal Set Theory
Overview
Set theory is due to Georg Cantor. “Elementary” in the title above does not apply to the body
of his work, since he went into considerable technical depth and mathematical sophistication
in this, his new theory. It applies however to our coverage in this volume as we are going to
restrict ourselves to elementary topics only.
Cantor’s Set Theory contains quite a few contradictions widely referred to in the literature
as paradoxes1 or antinomies,2 some of considerable consequence. The next section is about
the least technical (hence the most elementary to describe) and most fundamental of
these antinomies of Cantorian set theory, the one discovered by Bertrand Russell.
What caused these contradictions, or inconsistencies as logicians call them?
The reason is that Cantor’s set theory was based neither on axioms nor on rigid rules of reasoning —
a state of affairs for a theory that we loosely characterise as “informal”.
At the opposite end of “informal” we have the formal theories that are based on the
form of the mathematical statements under consideration and utilise axioms and the rules
of mathematical logic to formulate proofs.
As such the latter theories are “safer” to develop; they do not lead to obvious con-
tradictions.
One cannot fault Cantor for not using logic in arguing his theorems —that process for
“doing mathematics” was not yet invented when he built his theory— but then, a fortiori,
mathematical logic was not invented in Euclid’s time either, and yet he did use axioms that
1 From the Greek παράδοξο. Παρά means “against” while δοκώ means “I believe” or “I know”. A
paradox thus is against one’s belief or knowledge.
2 From the Greek αντινομία. Αντί also means “against” and νόμος means “the law”. So an antinomy
is against the (mathematical or logical) law.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_1
stated how his building blocks, points, lines and planes interacted with each other and how
they behaved.
The problem with Cantor’s set theory is that anything goes regarding what sets are. A set
is just a synonym of the dictionary terms “aggregate”, “collection”, “class”, etc. Definition-
by-dictionary of synonyms as it were.
Moreover, Cantorian set theory does not deal with the most fundamental question: “How
are sets formed?” This question has a remarkably simple answer (Russell’s) that can be
credited for the particular choice of the axioms of modern axiomatic formal ZF Set Theory.3
We sample in the next section the kind of logical “trouble” this extremely informal
approach entails (Russell’s paradox).
It must be stated at the outset that having a set theory with no obvious inconsistencies
does not require working formally within the axiomatic method. Our first chapter, based
on Russell’s approach, allows us to do “safe” set theory —a nickname we gave to informal
set theory that does not have any of the known inconsistencies of Cantor’s set theory—
within an informal (non axiomatic) setting.
Following Russell we will ask and answer how sets are built first, and then derive from
our answer some principles that will guide (and protect!) the theory’s development —and,
in particular, will guide us in “safely” building “large” sets; indeed any sets!
Cantor’s naïve (this adjective is not derogatory but is synonymous in the literature with
informal and non axiomatic) set theory was plagued by paradoxes, the most famous of which
(and the least “technical”) was pointed out by Bertrand Russell and was thus nicknamed
“Russell’s paradox”.
Cantor’s theory is the theory of collections (synonymously, sets) of objects, as we men-
tioned above, terms that were not defined and moreover it was not indicated how these sets
were built.4
3 “ZF” for Zermelo and Fraenkel, the designers of the most commonly used axiomatic set theory.
4 This is not a problem in itself. Euclid too did not say what points and lines were; but his axioms
did characterise their nature and interrelationships: For example, he started from these (among a few
others) a priori truths (axioms): a unique line passes through two distinct points; also, on any plane,
a unique line l can be drawn parallel to another line k on the plane if we want l to pass through a
given point A that is not on k.
The point is:
1.1 Russell’s “Paradox”
This theory studies operations on sets, properties of sets, and aims to use set theory as
the foundation of all mathematics. Naturally, mathematicians “do” set theory of mathemat-
ical object collections —not collections of birds and other beasts. We have learnt some
elementary aspects of set theory in high school. We will learn more from this book.
1. Variables. Like any theory, informal or not, informal set theory —a “safe”5 variety of
which we will develop here— uses variables just as algebra does. There is only one
type of variable that varies over sets and over atomic objects too, the latter being objects
that have no set structure, for example, integers. We use the names A, B, C, . . . and
a, b, c, . . . for such variables, sometimes with primes (e.g., A′) or subscripts (e.g., x_23),
or both (e.g., x′_22, Y′_42).
2. Notation. Sets given by listing. For example, {1, 2} is a set that contains precisely the
objects 1 and 2, while {1, {5, 6}} is a set that contains precisely the objects 1 and {5, 6}.
The braces { and } are used to delimit at the left and right the indicated collection/set of
objects by outright listing.
3. Notation. Sets given by “defining property”. But what if we cannot (or will not) explicitly
list all the members of a set? Then we may define what objects x get in the set/collection
by having them pass an entrance requirement, P(x):
An object x gets in the set iff (if and only if ) P(x) is true of said object.
a. The IF: So, IF P(x) is true, then x gets in the set (it passed the “admission require-
ment”).
b. The ONLY IF: So, IF x gets in the set, then the only way for this to happen is for it
to pass the “admission requirement”; that is, P(x) is true.
In other words, “iff” (as we probably learnt in high school or some previous university
course such as calculus) is the same thing as “is equivalent to”:
“x is in the set” is equivalent to “P(x) is true”.
You cannot leave out both what the nature of your objects of study is and how they behave/interrelate
and get away with it! Euclid omitted the former but provided the latter, so all worked out.
5 Safe from contradictions, that is.
We write the set so defined as
{x : P(x)} (1)
but also as
{x | P(x)} (1′)
reading it “the set of all x such that (this “such that” is the “:” or “|”) P(x) is true [or
holds]”.
4. “x ∈ A” is the assertion that “object x is in the set A”. Of course, this assertion may be
true or false or “it depends”, just like the assertions of algebra 2 = 2, 3 = 2 and x = y
are so (respectively).
5. x ∉ A is the negation of the assertion x ∈ A.
6. Properties
• Sets are named by letters of the Latin alphabet (cf. Variables, above). Naming is
pervasive in mathematics as in, e.g., “let x = 5” in algebra.
So we can write “let A = {1, 2}” and “let c = {1, {5, 6}}” to give the names A and c
to the two example sets above, ostensibly because we are going to discuss these sets,
and refer to them often, and it is cumbersome to keep writing things like {1, {5, 6}}.
Names are not permanent;7 they are local to a discussion (argument).
• Equality of sets (repetition and permutation do not matter!)
Two sets A and B are equal iff they have the same members. Thus order and multi-
plicity do not matter! E.g., {1} = {1, 1, 1}, {1, 2, 1} = {2, 1, 1, 1, 1, 2}.
• The fundamental equivalence pertaining to definition of sets by “defining property”:
So, if we name the set in (1) above, S, that is, S = {x : P(x)}, then
x ∈ S iff P(x) is true
Incidentally, we almost never say “is true” unless we want to emphasise this fact. We
would say instead: “x ∈ S iff P(x)”.
Equipped with the knowledge of the previous bullet, we see that the symbol {x : P(x)}
defines a unique set/collection: Well, say A and B are so defined, that is, A = {x :
P(x)} and B = {x : P(x)}. Thus
x ∈ A iff P(x) iff x ∈ B
where the first “iff” holds because A = {x : P(x)} and the second because B = {x : P(x)},
6 We have not yet reached Russell’s result, so keeping an open mind and humouring Cantor we still
allow ourselves to call said collection a “set”.
7 OK, there are exceptions: ∅ is the permanent name for the empty set —the set with no elements at
all— and for that set only; N is the permanent name of the set of all natural numbers.
thus
x ∈ A iff x ∈ B
and therefore A = B by the way equality of sets/collections is defined.
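The "defining property" notation can be tried out on a small scale. The following is a minimal Python sketch, assuming a finite universe of candidate objects (a program cannot range over all mathematical objects); the names UNIVERSE and is_even are hypothetical and chosen only for this illustration.

```python
# Hypothetical finite universe; in the text x ranges over all objects.
UNIVERSE = range(10)

def is_even(x):
    """The entrance requirement P(x) used in this example."""
    return x % 2 == 0

# {x : P(x)}: an object x gets in iff P(x) holds.
S = {x for x in UNIVERSE if is_even(x)}

# Two collections defined by the same property have the same members,
# even if the candidates are examined in a different order.
A = {x for x in UNIVERSE if is_even(x)}
B = {x for x in reversed(UNIVERSE) if is_even(x)}

# "x in S iff P(x)" checked for every x of the universe.
agrees = all((x in S) == is_even(x) for x in UNIVERSE)
```

Here A == B evaluates to True, which mirrors the uniqueness argument: the symbol {x : P(x)} names one collection only.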
Let us pursue, as Russell did, the point made in the boxed statement (last bullet) above. Take
P(x) to be specifically the assertion x ∉ x. He then gave a name, R, to
{x : x ∉ x}
so that, by the fundamental equivalence,
x ∈ R iff x ∉ x (2)
If we now believe,8 as did Cantor, the father of set theory, who never questioned this belief
and went ahead with it, that every P(x) defines a set, then R is a set.
What is wrong with that?
Well, if R is a set then this object has the proper type to be substituted into the variable
of type “math object”, namely, x, throughout the equivalence (2) above. But this yields the
contradiction
R ∈ R iff R ∉ R (3)
This contradiction is called Russell’s Paradox.
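The contradiction can be felt in a program as well. The sketch below is only an analogy, under the assumption that a "collection" is modelled by its membership predicate (a Python function, so that "y ∈ x" is written x(y)); the name russell is hypothetical.

```python
def russell(x):
    """Membership test for R: x is in R iff x is not in x.

    Assumption: a collection is modelled by its membership predicate,
    so 'y in x' is written x(y).
    """
    return not x(x)

# Asking whether R is in R requires the answer to its own negation,
# so the evaluation never settles on a truth value.
try:
    russell(russell)
    outcome = "got a truth value"
except RecursionError:
    outcome = "no truth value"

# No truth value b satisfies b == (not b), mirroring the equivalence (3).
consistent_values = [b for b in (True, False) if b == (not b)]
```

The list consistent_values comes out empty: neither "true" nor "false" can be assigned to R ∈ R.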
This and similar paradoxes motivated mathematicians to develop formal symbolic logic
and to invent axiomatic set theory9 as a means to avoid paradoxes like the above.
Other mathematicians who did not want to use mathematical logic and axiomatic theories
found a way to do set theory informally, yet safely.
What they did was to ask and answer “how are sets formed?”10
We will look into the details of founding this “safe” set theory in the next chapter.
8 Informal mathematics often relies on “I know so” or “I believe” or “it is ‘obviously’ true”. Some
people call “proofs” like this —that is, baseless arguments— “proofs by intimidation”. Nowadays,
with the ubiquitousness of the qualifier “fake”, one could also call them “fake proofs”.
9 There are many flavours or axiomatisations of set theory, the most frequently used being the “ZF”
set theory, due to Zermelo and Fraenkel.
10 Actually, axiomatic set theory (in particular, its axioms) is built upon the answers this group
came up with. This story is told at an advanced level in Tourlakis (2003b).
2 Safe Set Theory
Overview
This chapter introduces Russell’s idea that sets are built by stages. This avoids the obvious
contradictions of Cantor’s naïve set theory, which stem from situations where it allows
some collections, such as {x : x ∉ x}, to be “self contradictory sets”, as mathematicians
referred to them back when set theory was new and contradictions were first discovered
(cf. Wilder 1963).
Once we introduce the what and the how of Russell’s Principles of set formation by
stages and demonstrate their application by examples, we let the chapter proceed with the
development of the elementary theory of sets. This theory —as introduced here— recognises
that we have two types of collections, sets and non sets, the latter called proper classes in
the modern literature, and we have tools to tell them apart. The “self contradictory sets” of
naïve set theory go away in this setting.
We begin in this chapter the development of elementary set theory by introducing the
usual operations on sets such as ∪, ∩, × that create new sets from given ones.
We will normally use what is known as “blackboard bold” notation and capital Latin letters
to denote classes by names such as 𝔸, 𝔹, 𝕏. If we determine that some class 𝔸 is a set, we
would rather write it as A, but we make an exception for the following sets: Mathematicians
use notation and results from set theory in their everyday practice. We call the sets that
mathematicians use the “real sets” of our mathematical intuition, like the set of natural
numbers, ℕ (also denoted by ω), integers ℤ, rationals ℚ and reals ℝ.
In forming the class {x : P(x)} for any property P(x) we say that we apply comprehension.
It was Frege and Cantor who believed (explicitly or implicitly) that comprehension was safe
—i.e., always produced what they understood to be a “set”.
But as we saw in Sect. 1.1 Russell proved that this was not the case.
Mind you, Cantor never said what a “set” really is. He just relied on dictionary-derived
synonyms (which, alas, does not settle it^a) such as “collection” and “aggregate”.
Nevertheless, very precisely, Russell proved that whatever you might want to call
“collections” (or “sets” for that matter) of objects that are namable by “{x : P(x)}”-
type names it is a mathematical fact that there is a choice of at least one “P(x)” that
names a “collection” that cannot possibly be of the same type as any of those collections
that you just believe you are “defining” via names such as “{x : P(x)}”!
a “Natural language” is neither a substitute nor an aid for the precision of mathematics.
It is a widely held tenet that set theory, using as primitives the notions of set, atom (an
object that is not sub-divisible; not a collection of objects), and the relation belongs to (∈),
is sufficiently strong to serve as the foundation of all mathematics. Mathematicians use
notation and results from set theory in their everyday practice.
In Definition 2.0.1 we said that {x : P(x)} always defines a class, say, A.
Is there a converse in this observation? That is, if A names a class, is there always a
“property” P(x) —whose expression does not use the letter A— such that A = {x : P(x)}?
If P(x) can refer to the letter A then, yes: A = {x : x ∈ A} since this simply says “x ∈ A
iff x ∈ A” (cf. (1) on p. 4).
2.1 The “Real Sets” 9
If on the other hand we heed the restriction in italics immediately above, then this converse
is false. Here is why:
The term “property” is in our context, mathematically speaking, a “formula (of logic) in
which the only set theory symbol that need occur is ∈”.1
We will later learn that we can only have enumerably many such formulas (properties) —
meaning that we can enumerate all of them in a straight (infinitely long) line where unique
positive integers are associated with —they index— each formula that we place on said line,
and no such integer repeats in our straight line.
On the other hand we will also learn that if we consider all the sets of integers, then
we cannot enumerate them in a similar way on a straight line. We have “more” such
sets of integers S than we have properties T (x) to name them without cheating2 as S =
{x : T (x)}.
So, how can we tell, or indeed guarantee, that a certain class is a set?
Russell proposed the following “recovery” from his Paradox:
Make sure that sets are built by stages, where at stage 0 all atoms are available. Atoms are
also called urelements in the literature from the German Urelemente, which in analogy with
the word “urtext” —meaning the earliest text— would mean that they are the “earliest”
mathematical objects available. Witness that they are available at stage 0!
We may then collect atoms to form all sorts of “first level” sets. We may also proceed to
collect any mix of atoms and first-level sets to build new collections —second-level sets—
and so on. Much of what set theory does is attempting to remove the ambiguity from this
“and so on”. See below, Principles 0–2.
Thus, at the beginning we have all the level-0, or type-0, objects available to us. For
example, atoms such as 1, 2, 13, √2, π are available. At the next level we can include any
number of such atoms (from none at all, to all) to build a set, that is, a new mathematical
object. Allowing the usual notation, i.e., listing of what is included within braces, we may
cite a few examples of level-1 sets:
L1-1. {1}.
L1-2. {1, 1}.
L1-3. {1, √2}.
1 The multitude of symbols we use in set theory, “∅, ∩, ⊆, ∪, ” are all derived symbols —“macros”
if you will— that are expressed using variables and “∈” only.
2 “Cheating” would be to write S = {x : x ∈ S}. You see, the informal name “S” is not a permissible
symbol to use in writing down set or atom “properties”. It is neither a symbol of logic, nor is it the
permissible symbol ∈.
L1-4. {√2, 1}.
L1-5. {π, √3, π}.
We already can identify a few level-2 objects, using what (we already know) is available:
L2-1. {{√2, 1}}.
L2-2. {{0}, π}.
Note how the level of nesting of { }-brackets matches the level or stage of the formation of
these objects!
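The remark on nesting can be made concrete. What follows is a minimal Python sketch, assuming that sets are modelled by frozenset values and that anything else counts as an atom; the function name level is hypothetical. It computes the earliest level at which an object is available, counting the empty set as level 1 in line with the text's "from none at all".

```python
import math

def level(obj):
    """Level at which obj becomes available: 0 for atoms, otherwise one
    more than the highest level among its members (1 for the empty set)."""
    if not isinstance(obj, frozenset):
        return 0  # atoms are given outright at stage 0
    return 1 + max((level(member) for member in obj), default=0)

fs = frozenset  # shorthand; a frozenset may be a member of another set

level_one = fs({math.sqrt(2), 1})        # {√2, 1}
level_two_a = fs({level_one})            # {{√2, 1}}
level_two_b = fs({fs({0}), math.pi})     # {{0}, π}
```

With these definitions, level(level_one) is 1 and both level_two_a and level_two_b have level 2, matching the depth of brace nesting.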
2.1.1 Definition (Class and set equality) This definition applies to any classes, hence, in
particular, to any sets as well.
That two classes A and B are equal, written A = B, means
x ∈ A iff x ∈ B (1)
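Python's built-in set type happens to compare by members only, so it gives a quick way to experiment with this definition. The sketch below assumes a finite universe for the explicit check; the helper name equal_by_members is hypothetical.

```python
# Listings with repeats or in a different order denote the same set.
same_despite_repeats = {1} == {1, 1, 1}
same_despite_order = {1, 2} == {2, 1}

def equal_by_members(a, b, universe):
    """Definition 2.1.1 checked over a finite universe (an assumption):
    x is in a iff x is in b, for every x considered."""
    return all((x in a) == (x in b) for x in universe)

# Python lists compare by order and multiplicity, so they do not model classes.
lists_differ = [1, 2] != [2, 1]
```

For instance, equal_by_members({1, 2}, {2, 1, 1}, range(5)) returns True, while the two lists above are reported as different.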
2.1.2 Remark
(1) Thus our definition of how classes (or sets) compare with respect to equality is chosen
to be 2.1.1 —yes, this is our choice about what is the important factor for two classes to
be equal; other choices are possible but not taken in the standard set theory literature.
For example, {1} = {1, 1, 1}. Why? Because any object I see in the class to the left of
“=” I also see in the class to the right, and vice versa. Similarly, {1, 2} = {2, 1}, for I
see just “1” and “2” in the left class and I note these objects are also in the right class,
and vice versa.
These two observations related to the representation of classes by listing, obtained from
Definition 2.1.1, are often stated as “in a class or set depicted by listing its elements
within braces, neither the order (of listing) the elements nor their multiplicity matter”.
Thus one will usually write {1, 2} rather than {1, 2, 2, 1, 1}.
(2) If n is an integer-valued variable, then what do we understand by the statement “2n is
even”? The normal understanding is that “no matter what the value of n is, 2n is even”,
or “for all values of n, 2n is even”.
When we get into our logic topic later on we will see that we can write “for all values
of n, 2n is even” with less English as “(∀n)(2n is even)”. The expression “(∀n)” says
“for all (values of) n”.
Mathematicians often prefer to have statements like “2n is even” with the “for all”
implied.3 You can write a whole math book without writing ∀ even once, and without
overdoing the English.
(3) Definition 2.1.1 is called “extensionality” because it is the extension (that is, what
members are in the two classes) that determines equality; not the intension, i.e., how
the members of the two classes were selected.
For example, the two classes {x : x² − 2x + 1 = 0} and {1} are equal. Both contain just
“1”.
(4) Definition 2.1.1, more economically, could be stated
(x ∈ A iff x ∈ B) implies A = B
The converse follows from logic needing no help from set theory concepts.
How? Well, in
x ∈ A (†)
A is a name of a mathematical object. Therefore, if the name B stands for the same
object (i.e., A = B) then x ∈ B means exactly the same thing as (†).
But see also the passage below.
• In a formal approach to set theory, said theory is an extension of predicate logic, obtained
by adding the theory-specific symbol “∈” and adding a number of set theory-specific
axioms, one of which is extensionality.4 The axioms governing the behaviour of equality
“=” in logic are inherited by any theory that we base on logic, that is, a theory whose
theorems we are proving syntactically using logic.
3 An exception occurs in Induction that we will study later, where you fix an n (but keep it as a variable
of an unspecified fixed value, not as 5 or 42) and assume the “induction hypothesis” P(n). But do not
worry about this now!
• Thus in such theories we do not “redefine” or “amend” what equality is and what its
axiomatically postulated properties —in logic— are.5
• Here is an analogy from Peano Arithmetic (PA), an axiomatic theory of natural numbers
based on logic. It contains among others the axiom “x + 1 = y + 1 implies x = y”. This
axiom evidently is not a “definition of =” between numbers, but rather is an axiom about
the behaviour of the function “+1”6 in the presence of equality.
Another property of the successor is captured by the PA axiom “x + 1 ≠ 0”, and again it
clearly does not “define” equality or its negation “≠”; it is rather about the successor’s
behaviour around “=” and “0”.
Entirely analogously, extensionality is not about logic’s “=”, but rather is about how sets7
behave around “=”.
An axiom about sets!
• However, our exposition of the elements of “safe set theory” is not axiomatic —so we do
not rely on preexisting axioms for “=” from logic— thus we will side with the excellent
informal but mathematically rigorous discussion of the foundations of set theory in Wilder
(1963, p. 58) and take extensionality as a definition with no harmful side-effects. This
choice is convenient as at once we “define” equality for all classes —that may or may
not be sets.
2.1.3 Remark Since “iff” between two statements S1 and S2 means that we have both
directions
If S1 , then S2
and
If S2 , then S1
we have that “A = B” is logically the same as (equivalent to) “A ⊆ B and B ⊆ A”.
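The equivalence in this remark can be checked exhaustively on a small example. The following Python sketch assumes a three-element universe and uses Python's `<=` on sets as ⊆; the helper name subsets is hypothetical.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a small finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# For every pair of subsets, equality coincides with inclusion both ways.
all_sets = subsets({1, 2, 3})
remark_holds = all((a == b) == (a <= b and b <= a)
                   for a in all_sets for b in all_sets)
```

This is evidence on one finite universe only, not a proof; the proof is the two-directions reading of "iff" given above.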
4 In formal set theory if one ever speaks of classes (e.g., Levy 1979; Tourlakis 2003b) then one does
so informally and only for convenience. Non set classes have no status within the theory. Within the
theory we have only sets and atoms, and the axioms are about sets and atoms only.
5 One such postulated property of “=” in logic is one of the usual axioms of equality, namely,
“x = x”.
6 Called the successor function
7 A symbolic formulation within logic of the relationship between “ A = B” (for sets) and “x ∈ A iff
x ∈ B” is the “axiom of extensionality” of axiomatic set theory.
2.1.4 Example In the context of the “A = {x : P(x)}” notation we should remark that
notation-by-listing can be simulated by notation-by-defining-property: For example, {a} =
{x : x = a} —here “P(x)” is x = a.
Also {A, B} = {x : x = A or x = B}. Let us verify the latter: Say x ∈ lhs.8 Then x = A
or x = B. Thus x must be A or B. But then the entrance requirement of the rhs9 is met, so
x ∈ rhs.
Conversely, say x ∈ rhs. Then the entrance requirement is met so we have (at least) one
of x = A or x = B. Trivially, in the first case x ∈ lhs and ditto for the second case.
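The simulation of listing by a defining property can be replayed in Python. This is a small sketch, assuming a finite universe of candidates; the names UNIVERSE, A and B below are illustrative values only.

```python
# Hypothetical finite universe of candidate objects.
UNIVERSE = ["a", "b", "c", 1, 2]
A, B = "a", 2

by_listing = {A, B}
# {x : x = A or x = B}: the entrance requirement is "x = A or x = B".
by_property = {x for x in UNIVERSE if x == A or x == B}

# {x : x = A}: the singleton case.
singleton_by_property = {x for x in UNIVERSE if x == A}
```

Both by_listing == by_property and singleton_by_property == {A} evaluate to True.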
Sets and atoms are the mathematical objects of our (safe) set theory.
Sets are formed by stages. At stage 0 we acknowledge the presence of atoms. They are
given outright, they are not built.
Principle 0. At any stage Σ we may build a set, collecting together other mathematical
objects (sets or atoms) provided these (mathematical) objects that we put into our set
were available at stages before Σ.
Principle 1. Every set is built at some stage.
Principle 2. If Σ is a stage of set formation, then there is a stage after it.
2.1.5 Remark (Assumed properties of stages) The reader may be surprised by this
remark: Do we need to say more about stages? The concept of building something by stages
is intuitively clear: At stage 0 we do this; at stage 1 we do that; at stage 2 we do something
else, etc.
Note however that this impatient observation is based on stages that are (named by)
natural numbers. Natural numbers have nice and well understood order properties. For
example you cannot have n < n if n is a natural number. In fact you cannot have an increasing
chain that starts with n and ends with n. So, we do not have n < n. On the other hand we
have that n < m < k implies n < k.
But there are far too many sets, more than there are natural numbers. This is easy to agree
with, since we also have the real numbers, a much “larger” set that contains the natural
numbers.
Ergo, we need many more stages of set formation than just (those named by) natural numbers.
So we postulate for our stages “reasonable” and “intuitively desirable” properties below,
which imitate the order properties of natural numbers, without attempting to identify stages
with such numbers as this would be unnecessarily restrictive as we noted above.
Below we depict stages by the letters Σ or T, with or without primes or subscripts, and
postulate as true a few intuitively pleasing properties they will have with respect to the “before”
and “after” relations.
We accept that the stages of set formation ordered by “before” (or “after”) share the
following properties with the natural numbers, the latter ordered by “less than”.
Namely, let us write Σ <_s Σ′ for “stage Σ is before stage Σ′”. Then we have
1. Σ <_s Σ is false. That is, “before” and “after” mean what we expect them to. No
event or stage can occur before (or after) itself.
2. If Σ <_s Σ′ <_s Σ″, then Σ <_s Σ″. No surprises here either: the expected transitivity
of the before and after relations.
3. If Σ, Σ′ are stages, then we have one of Σ <_s Σ′, Σ = Σ′, Σ′ <_s Σ. We expect to
be able to tell if a stage is before another (or after, or the same), else how will
we be able to assert that a class that we just built was built after all its members?
4. If Σ is any stage, then there is a stage Σ′ after it: Σ <_s Σ′ (this repeats Principle 2).
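The four properties are modelled on the natural numbers under "less than". The sketch below merely confirms them on a finite sample of natural numbers; it assumes stages named by 0 to 5 (the name STAGES is hypothetical), which is only one familiar model and, as the remark warns, too small to serve as the full supply of stages.

```python
from itertools import product

STAGES = range(6)  # a finite sample of stages named by natural numbers

# 1. No stage is before itself.
irreflexive = all(not (s < s) for s in STAGES)
# 2. Transitivity: s < t < u implies s < u.
transitive = all(s < u for s, t, u in product(STAGES, repeat=3) if s < t < u)
# 3. Exactly one of s < t, s == t, t < s holds.
trichotomy = all((s < t) + (s == t) + (t < s) == 1
                 for s, t in product(STAGES, repeat=2))
# 4. Every stage has a later one (s + 1 may lie outside the sample).
has_later_stage = all(s < s + 1 for s in STAGES)
```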
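Principle 2 (equivalently, 4. above) makes it clear that we have infinitely many stages of
set formation in our toolbox. Indeed, starting with any Σ, by repeated application of said
Principle we can build an infinite ascending sequence
Σ <_s Σ′ <_s Σ″ <_s ⋯ (1)
All members in (1) are distinct, else one of them repeats, giving an ascending chain that
starts and ends with the same stage, say T. We then have T <_s ⋯ <_s T, hence T <_s T
by 2. above, which contradicts 1.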
2.1.6 Remark If some set is definable (“buildable”) at some stage Σ, then it is also definable
at any later stage, as Principle 0 makes clear.
The informal set-formation-by-stages will guide us to build, safely, all the sets we may
need in order to do mathematics.
2.2 What Caused Russell’s Paradox 15
In axiomatic set theory ZFC10 —just as in “small” tasks where we use natural numbers
as “stages”— one defines stages beyond natural numbers to be certain “infinite numbers”
called ordinals. See, for example, Tourlakis (2003b).
Recall that à la Cantor we get a paradox (contradiction) because we insisted on believing that
all classes are sets, that is, following Cantor we “believed” Russell’s “R” was a set.
Principles 0–2 allow us to know a priori that R is a proper class. No contradiction!
How so?
OK, is x ∈ x true or false? Is there any mathematical object x —say, A— for which the
following is true?
A ∈ A? (1)
Well, for atom A, (1) is false since atoms have no set structure, that is, they are
not collections of objects. An atom A cannot contain anything, in particular it cannot
contain A.
What if A is a set and A ∈ A? Then in order to build A, the set, we have to wait until
after its member, A, is built (Principle 0 says “provided”). So, we need (the left) A to be
built before we can build (the right) A in (1) as a set. In short, since the left and right A are
the same, we want A built before A is built. Preposterous!
Thus x ∉ x is true (for all x), therefore the R of Sect. 1.1 is U, the universe of all sets and
atoms; the class of everything.
“Everything” with restrictions in the modern literature. Our classes are allowed to contain
only atoms and sets. Not proper classes.
Of course, Cantorian set theory had no such restriction since it did not distinguish between
set and non-set classes to begin with.
Thus
R=U
Here is now an exact reason why U is not a set. Well, assume for a moment that it is.
Then
• U ∈ U since the rhs contains all sets and we temporarily assumed the lhs to be a set.
• But we just saw that the above is false taking x to be U in (2) above.
2.2.1 Remark The immediate reaction of the mathematical community to Russell’s paradox
was to blame “size”: “R is too big to be a set”. Well, define “too big”!
They did not. The discussion of the panic that ensued is outlined in Wilder (1963) in a
very illuminating manner. He points out (loc. cit.) that even the phrase “all sets” was viewed
with suspicion, not only the dangerous act of collecting “all sets”!
But why did Russell bother to define his R? Why did he not use
U = {x : x = x}
to collect all sets and prove directly, as we did above, that U cannot be a set, thus demon-
strating in this alternative way that not all “defining properties” lead to sets?
Because his idea that sets should be built by stages was suggested later. Incidentally,
the “too big” U —if any collection qualifies for the label “too big” surely the one that
contains everything does!— was also discovered (without showing R = U) to not be a set
in a roundabout longish manner that I cannot reproduce this early in our development.
I promise to come back to this “paradox of the powerset” —to be defined later. You see,
U being an omni-container contains its powerset as an element and as a subset. A so-called
cardinality argument then derives a contradiction to the claim that U is a set.
So U, and R, are proper classes. Thus, the fact that R is not a set is neither a surprise,
nor paradoxical. It is just a proper class as we just have recognised.
2.3.1 Example (Pair) By Principle 0, if A and B are sets or atoms, then let A be available
at stage Σ and B at stage Σ′. Without loss of generality say Σ is not later than Σ′ (recall
postulate 3. about the relative positions of two stages).
Let us then pick a stage Σ″ after Σ′ (Principle 2). This will be after both Σ and Σ′ (postulate 2.
in Remark 2.1.5).
At stage Σ″ we can build
{A, B} (1)
2.3 Some Useful Sets 17
2.3.3 Exercise Without referring to stages in your proof, prove that if A is a set or atom,
then {A} is a set.
Incidentally, a set that contains exactly one element is called a singleton.
2.3.4 Remark A very short digression into Boolean Logic —for now. It will be con-
venient to use truth tables to handle many simple situations that we will encounter where
“logical connectives” such as “not”, “and”, “or”, “implies” and “is equivalent” enter into
our arguments.
We will put on record here how to handle things such as “S1 and S2 ”, “S1 implies S2 ”,
etc., where S1 and S2 stand for two arbitrary statements of mathematics. In the process we
will introduce the mathematical symbols for “and”, “implies”, etc.
The symbol translation table from English to symbol (and back) is:
NOT ¬
AND ∧
OR ∨
IMPLIES (IF…,THEN) →
IS EQUIVALENT ≡
The truth table below has a simple reading. For all possible truth values (true/false, in
short t/f) of the “simpler” statements S1 and S2 we indicate the computed truth value of the
compound (or “more complex”) statement that we obtain when we apply one or the other
Boolean connective of the previous table.
S1  S2  ¬S1  S1 ∧ S2  S1 ∨ S2  S1 → S2  S1 ≡ S2  S2 → S1
f   f   t    f        f        t        t        t
f   t   t    f        t        t        f        f
t   f   f    f        t        f        f        t
t   t   f    t        t        t        t        t
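The table can be regenerated mechanically. Below is a minimal Python sketch, assuming that t and f are represented by True and False; the helper names implies, row and show are hypothetical. Classical implication is coded as "not p, or q", which is false exactly when the hypothesis is true and the conclusion false.

```python
from itertools import product

def implies(p, q):
    """Classical implication: false only for a true hypothesis and a false conclusion."""
    return (not p) or q

def row(s1, s2):
    """Truth values of ¬S1, S1∧S2, S1∨S2, S1→S2, S1≡S2, S2→S1, in that order."""
    return (not s1, s1 and s2, s1 or s2, implies(s1, s2), s1 == s2, implies(s2, s1))

def show(value):
    """Render a truth value in the book's t/f notation."""
    return "t" if value else "f"

# One entry per assignment of truth values to S1 and S2.
table = {(s1, s2): row(s1, s2) for s1, s2 in product([False, True], repeat=2)}

for (s1, s2), values in table.items():
    print(show(s1), show(s2), *map(show, values))
```

The rows come out in the same order as in the table above, since product lists False before True.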
Comment. All the computations of truth values satisfy our intuition, except perhaps that
for “→”:
¬ flips the truth value as it should, ∧ is eminently consistent with common sense as it
applies to “and”, ∨ is the “inclusive or” of the mathematician, and ≡ is just equality on the
set {f, t}, as it should be.
The “uneasiness” with this so-called “classical” → is that there is no causality from left
to right. The only “easy to understand” entry is for t → f. The outcome should be false, that
is, indicating a “bad implication”: You see, we have a true hypothesis but a false conclusion
while, intuitively, a “good” implication ought to preserve truth. This implication must be
“broken”, so we entered f.
But what I just said about the case t → f indicates that → is meant to preserve truth from
left to right. And that is precisely what it does, as per the table!
Here is the full picture for →:
• Row one is the “no counterexample” case. That is, I claim that truth was preserved since
there was no truth (left of →) to preserve anyway! You have no counterexample to what
I said. For that you need a t to the left and a f to the right of →.
• In the second row we are good! We got t without lifting a finger!
• In the last row truth is preserved left to right!
• As for row three, we made our case already.
Note, incidentally, if we know that S1 ∧ S2 is true, then the truth table guarantees that
each of S1 and S2 must be true.
3. If now you want to show the implication S1 → S2 is true, then the only real work is to
show that if we assume S1 is true, then S2 is true too.
If S1 is known to be false, then no work is required to prove the implication because of
the first two lines of the truth table!
4. If you want to show S1 ≡ S2, then (because the last three columns show that this is
equivalent to, i.e., has the same truth values as, (S1 → S2) ∧ (S2 → S1)) you just prove each of
the two implications S1 → S2 and S2 → S1.
From the truth table we see that we have one unary (takes one argument) and four binary
(they take two arguments each) Boolean connectives. We can cascade the operations the
connectives indicate to obtain more complex expressions, such as (S1 ∨ S2 ) ∧ S3 , (S1 ∧
S2 ) ∨ S3 , (S1 → S2 ) → S3 .
Do we always have to carry as many brackets as in the examples immediately above?
Well, as a rule never remove brackets that you or someone else that understands logic
cannot restore correctly.
By agreeing on the “strengths” or “priorities” of the connectives we often can get away
with fewer brackets, provided we have an algorithm by which we can restore the ones we
remove to their original positions. The usual agreement is that the unary “¬” is strongest
(has highest priority) and the binary connectives follow (see below) from left to right in
order of decreasing priority.
¬, ∧, ∨, →, ≡ (†)
Equipped with these priorities we can reinsert brackets correctly if anyone has removed
them correctly.
2.3.5 Example
1. Consider (S1 ∨ S2 ) ∧ S3 . We cannot remove the brackets we see in this example, for if
we did, then the strengths of the connectives would suggest we reinsert them this way
S1 ∨ (S2 ∧ S3 ) (‡)
because, in the contest between ∨ and ∧ to win over S2 , ∧ wins in (‡) while ∨ wins in
the original (as the brackets override the priorities).
But (‡) is not correct. How do we determine incorrectness? By finding truth values for
the Si —can you find them?— that lead to distinct results in the original versus (‡).
2. (S1 ∧ S2 ) ∨ S3 . This simplifies to S1 ∧ S2 ∨ S3 in a reversible manner, since the priority
of ∧ (vs. ∨) allows us to reinsert the missing brackets.
3. (S1 → S2 ) → S3 . Can we simplify this, by, say, removing the brackets? This example
amplifies the fact that the priorities are chosen by agreement.
So if we did remove the brackets, how would we reinsert them? Well, the standard
agreement when the same connective fights to win an Si , as in
S1 → S2 → S3 (¶)
is to let the one to the right always win. That is, if we have a chain connected using the
same connective throughout we insert brackets from right to left. Thus here we would
say that brackets would have to be inserted this way
S1 → (S2 → S3)
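Whether two bracketings mean the same thing can be settled by brute force over all truth values. The following Python sketch assumes True/False for t/f; the names implies, or_and_witnesses and implies_witnesses are hypothetical. It collects the assignments on which two bracketings disagree, without printing them, so the search for a witness remains an exercise.

```python
from itertools import product

def implies(p, q):
    """Classical implication."""
    return (not p) or q

VALUES = [False, True]

# Assignments where (S1 ∨ S2) ∧ S3 and S1 ∨ (S2 ∧ S3) disagree.
or_and_witnesses = [(s1, s2, s3) for s1, s2, s3 in product(VALUES, repeat=3)
                    if ((s1 or s2) and s3) != (s1 or (s2 and s3))]

# Assignments where (S1 → S2) → S3 and S1 → (S2 → S3) disagree.
implies_witnesses = [(s1, s2, s3) for s1, s2, s3 in product(VALUES, repeat=3)
                     if implies(implies(s1, s2), s3) != implies(s1, implies(s2, s3))]

# Python itself gives "and" priority over "or", as the text does for ∧ over ∨.
python_follows_priority = (True or False and False) != ((True or False) and False)
```

Both witness lists are non-empty, so neither pair of bracketings is interchangeable.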
An important variant of → and ≡. Pay attention to this point since almost everybody
gets it wrong! In the literature and in the interest of creating a usable shorthand many
practitioners of mathematical writing use notation like
S1 → S2 → S3 (1)
S1 ≡ S2 ≡ S3 (4)
11 Logic does not have the sole privilege of being abused. So does plain arithmetic, from High School
onwards: One often writes a < b < c but they mean a < b ∧ b < c! This is wrong!
An amusing example from PL/1 —“Programming Language One”— an old programming lan-
guage that incorporates Algol and Cobol (!) and SNOBOL (!) among others is based on the flexibility
of this language in its handling of different data types. It converts from one data type to the other
readily, without error messages. In particular, the logical “true” constant (what we call “t” in this
book) is essentially —I am avoiding tedious details that are immaterial here— the number “1”. Thus
it allows, say, 6 > 5 > 3 as a condition. PL/1 evaluates expressions from left to right and 6 > 5 is
evaluated first and returns 1 (true). Then 1 > 3 is evaluated and returns false.
Try this in your familiar programming language and see what happens!
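Taking up the invitation with Python as the "familiar language": Python reads a chain of comparisons conjunctionally, while forcing left-to-right evaluation with brackets reproduces the behaviour described for PL/1, since Python also treats True as the number 1 in arithmetic comparisons. A minimal sketch:

```python
# Python reads a chain of comparisons conjunctionally.
chained = 6 > 5 > 3             # same as (6 > 5) and (5 > 3)

# Forced left-to-right evaluation: (6 > 5) is True, and True > 3
# compares the number 1 with 3.
left_to_right = (6 > 5) > 3
```

Here chained is True and left_to_right is False.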
(S1 ≡ S2) ≡ S3 (4′)
or
S1 ≡ (S2 ≡ S3) (4″)
where the brackets indicate “which ≡ applies first”.
On the many occasions that we may want to chain two or more equivalences with the
intention that the chain means that all the equivalences are true, we use a conjunctional “≡”
denoted by ⇐⇒.
Thus
S1 ⇐⇒ S2 ⇐⇒ S3 (5)
means that all equivalences Si ≡ Si+1 —for i = 1, 2— are true. “⇐⇒” is the conjunctional
“≡”.
Note that the notation ⇐⇒ is not offered just for the sake of notation.
The two notations do have distinct meanings. For example, if the truth values of S1, S2
and S3 are f, f and t, respectively, then (4) computed either as (4′) or the equivalent (4″)
yields the value t.
On the other hand, evaluating (5), that is, (5′) below for the same truth values of the Si
yields the value f.
(S1 ≡ S2) ∧ (S2 ≡ S3) (5′)
So how do we denote (5′) correctly without repeating the consecutive S2’s and omitting
the implied “∧”? This way:
S1 ⇐⇒ S2 ⇐⇒ S3 (5)
By definition, “⇐⇒” is conjunctional: It applies to two statements —Si and Si+1 — only
and implies an ∧ before the adjoining next similar equivalence.
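The difference between the two readings can be computed directly for the truth values f, f, t. This is a small Python sketch, assuming True/False for t/f; the helper name equiv is hypothetical.

```python
def equiv(p, q):
    """The connective ≡ on truth values."""
    return p == q

s1, s2, s3 = False, False, True     # the values f, f, t

left_bracketed = equiv(equiv(s1, s2), s3)        # (S1 ≡ S2) ≡ S3
right_bracketed = equiv(s1, equiv(s2, s3))       # S1 ≡ (S2 ≡ S3)
conjunctional = equiv(s1, s2) and equiv(s2, s3)  # (S1 ≡ S2) ∧ (S2 ≡ S3)

# Python's chained == is itself read conjunctionally.
python_chain = s1 == s2 == s3
```

Both bracketed forms evaluate to True, while conjunctional and python_chain are False, so Python's chained == behaves like ⇐⇒ rather than like an iterated ≡.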
Proof Well, B being a set there is a stage Σ when it is built (Principle 1). By Principle 0,
all members of B are available or built before stage Σ.
But by A ⊆ B, all the members of A are among those of B.
Hey! By Principle 0 we can build A at stage Σ, so it is a set.
Some corollaries are useful:
P(x) → x ∈ A (1)
x ∈ B → x ∈ A
Indeed (see 3. under Practical considerations in Remark 2.3.4), let x ∈ B. Then P(x) is
true, hence x ∈ A by (1). Now invoke Theorem 2.3.6.
Proof The defining property here is “x ∈ A ∧ P(x)”. This implies x ∈ A —by 2. in Remark
2.3.4— that is, we have
(x ∈ A ∧ P(x)) → x ∈ A
Now invoke Corollary 2.3.7.
2.3.9 Remark (The empty set) The class E = {x : x ≠ x} has no members at all; it is
empty. Why? Because
x ∈ E ≡ x ≠ x
but the condition x ≠ x is always false, therefore so is the statement
x ∈ E (1)
x ∈ E → x ∈ {1}
But is it unique so we can justify the use of the definite article “the”? Yes. The specification
of the empty set is a class with no members. So if D is another empty set, then we will have
x ∈ D always being false. But then
The reader probably has seen before (perhaps in calculus) the operations on sets denoted by
∩, ∪, − and others. We will look into them in this section.
2.4.1 Definition (Intersection of two classes) We define for any classes A and B
A ∩ B ≝ {x : x ∈ A ∧ x ∈ B}
We call the operator ∩ intersection and the result A ∩ B the intersection of A and B.
If A ∩ B = ∅ (which happens precisely when the two classes have no common
elements) we call the classes disjoint.
It is meaningless to have ∩ operate on atoms.12
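For finite sets the definition can be compared with Python's built-in `&` operator. The sketch below assumes a finite stand-in for "all objects"; the sets A, B, C and the name UNIVERSE are illustrative values only.

```python
A = {1, 2, 3}
B = {3, 4}
C = {7, 8}
UNIVERSE = A | B | C  # finite stand-in for "all objects" (an assumption)

# Definition 2.4.1 as a comprehension, next to the built-in operator.
intersection = {x for x in UNIVERSE if x in A and x in B}
matches_builtin = intersection == (A & B)

# Disjoint: no common elements, i.e., the intersection is empty.
a_c_disjoint = (A & C) == set() and A.isdisjoint(C)
```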
Proof I will prove A ∩ B ⊆ B which will rest the case by Theorem 2.3.6. So, I want
x ∈ A ∩ B → x ∈ B
To this end, let then x ∈ A ∩ B (cf. 3. in 2.3.4). This says that x ∈ A ∧ x ∈ B is true, so
x ∈ B is true.
2.4.4 Definition (Union of two classes) We define for any classes A and B
A ∪ B ≝ {x : x ∈ A ∨ x ∈ B}
We call the operator ∪ union and the result A ∪ B the union of A and B.
It is meaningless to have ∪ operate on atoms.
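The same kind of check works for union, where Python's `or` plays the role of the inclusive ∨ and `|` is the built-in operator. A minimal sketch, assuming a finite universe; the names are illustrative only.

```python
A = {1, 2, 3}
B = {3, 4}
UNIVERSE = set(range(10))  # finite stand-in for "all objects" (an assumption)

# Definition 2.4.4 as a comprehension; 3 satisfies both conditions
# and still appears once in the result.
union = {x for x in UNIVERSE if x in A or x in B}
matches_builtin = union == (A | B)
```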
12 The definition expects ∩ to operate on classes. As we know, atoms (by definition) have no set/class
structure thus no class and no set is an atom.
Proof By assumption say A is built at stage Σ while B is built at stage Σ′. Without loss of
generality (in short, “wlg”) say Σ is no later than Σ′, that is, Σ ≤ Σ′.
By Principle 2 I can pick a stage Σ″ > Σ′, thus
Σ″ > Σ (1)
and
Σ″ > Σ′ (2)
Let us pick any item x ∈ A ∪ B:
I have two (not necessarily mutually exclusive) cases (by Definition 2.4.4):
• x ∈ A. Then x was available or built13 before Σ, hence before Σ″ by (1). (3)
• x ∈ B. Then x was available or built before Σ′, hence before Σ″ by (2). (4)
In either case, (3) or (4), the arbitrary x from A ∪ B is built before Σ″, so we can collect all
those x-values at stage Σ″ in order to form a set: A ∪ B.
2.4.6 Definition (Difference of two classes) We define for any classes A and B
A − B ≝ {x : x ∈ A ∧ x ∉ B}
We call the operator “−” (set-theoretic) difference and the result A − B the difference of A
and B, in that order.
It is meaningless to have − operate on atoms.
Proof The reader is asked to verify that A − B ⊆ A. We are done by Theorem 2.3.6.
Notation. The definitions of ∩ and − suggest a shorter notation for the rhs for A ∩ B and
A − B. That is, respectively, it is common to write instead
13 As x may be an atom, we allow the possibility that it was available with no building involved,
hence we said “available or built”. For A and B though we are told they are sets, so they were built
at some stage, by Principle 1!
2.4 Operations on Classes and Sets 25
{x ∈ A : x ∈ B}
and
{x ∈ A : x ∉ B}
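The three operations can be tried out on small finite sets. The sketch below is only an illustration under assumptions that are not in the text: classes are modelled by Python frozensets, atoms by integers, and the function names are made up for the occasion.

```python
# Finite stand-ins for classes: frozensets whose members are atoms (ints here).

def intersection(a, b):
    # {x : x ∈ A ∧ x ∈ B}, written in the shorter form {x ∈ A : x ∈ B}
    return frozenset(x for x in a if x in b)

def union(a, b):
    # {x : x ∈ A ∨ x ∈ B}
    return frozenset(list(a) + list(b))

def difference(a, b):
    # {x ∈ A : x ∉ B}
    return frozenset(x for x in a if x not in b)

A = frozenset({1, 2, 3})
B = frozenset({3, 4})

print(intersection(A, B))
print(union(A, B))
print(difference(A, B), difference(B, A))
```

Note that the order of the arguments matters only for the difference.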
2.4.8 Exercise Demonstrate —using Definition 2.4.1— that for any A and B we have
A ∩ B = B ∩ A.
Hint. There are two parts in the proof:
2.4.9 Exercise Demonstrate —using Definition 2.4.4— that for any A and B we have
A ∪ B = B ∪ A.
2.4.10 Exercise By picking two particular very small sets A and B show that A − B =
B − A is not true for all sets A and B.
Is it true of all classes?
2.4.11 Definition (Family of sets) A class F is called a family of sets iff it contains no
atoms. The letter F is here used generically, and a family may be given any name, usually
capital.
Incidentally, as V contains all sets (but no atoms!) it is a proper class! Why? Well, if it is a
set, then it is equal to one of the x-values that we are collecting, thus V ∈ V. But we saw
that this statement is false for sets!
Here are some classes that are not families: {1}, {2, {{2}}} and U, the latter being the
universe of all objects —sets and atoms— and equals Russell’s “R” as we saw in Sect. 2.2.
These all are disqualified from being “families” as they contain atoms.
2.4.14 Definition (Intersection and union of families) Let F be a family of sets. Then
(i) the symbol ⋂F denotes the class that contains all the objects that are common to all
A ∈ F.
In symbols the definition reads:
⋂F ≝ {x : for all A, A ∈ F → x ∈ A} (1)
(ii) the symbol ⋃F denotes the class that contains all the objects that are found among the
various A ∈ F. That is, imagine that the members of each A ∈ F are “emptied” into
a single, originally empty, container {. . .}. The class we get this way is what we
denote by ⋃F.
In symbols the definition reads (and arguably is clearer):
⋃F ≝ {x : for some A, A ∈ F ∧ x ∈ A} (2)
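For a finite family, both definitions can be computed directly. The following is a minimal sketch, assuming families and their members are modelled by nested frozensets and atoms by integers; the names big_union and big_intersection are hypothetical. The empty family is rejected for ⋂ because a finite data structure cannot represent the class that results in that case.

```python
def big_union(family):
    # ⋃F = {x : for some A, A ∈ F ∧ x ∈ A}
    return frozenset(x for member in family for x in member)

def big_intersection(family):
    # ⋂F = {x : for all A, A ∈ F → x ∈ A}; computed only for a nonempty family
    if not family:
        raise ValueError("intersection of the empty family is not a finite set")
    first, *rest = family
    return frozenset(x for x in first if all(x in member for member in rest))

# A family with two members: {1} and {1, {2}}
F = frozenset({frozenset({1}), frozenset({1, frozenset({2})})})

print(big_union(F))
print(big_intersection(F))
```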
2.4.15 Example Let F = {{1}, {1, {2}}}. Then emptying all the contents of the members of
F into some (originally) empty container we get
2.4.16 Exercise
1. Prove that ⋃{A, B} = A ∪ B.
2. Prove that ⋂{A, B} = A ∩ B.
Hint. In each of part 1. and 2. show that lhs ⊆ rhs and rhs ⊆ lhs (cf. Remark 2.3.4, practical
considerations, 3. and 4.). For that analyse membership, i.e., “assume x ∈ lhs and prove
x ∈ rhs”, and conversely (cf. Definition 2.1.1 and Remark 2.1.3.)
2.4.17 Theorem If the set F is a family of sets, then ⋃F is a set.
Proof Let F be built at stage Σ. We have
x ∈ ⋃F ≡ x ∈ (some) A ∈ F
Thus x is available or built before A, which is built before stage Σ since that is when F was
built. As x is arbitrary, all members of ⋃F are available/built before Σ, so we can build
⋃F as a set at stage Σ.
2.4.18 Theorem If the class F ≠ ∅ is a family of sets, then ⋂F is a set.
Proof By assumption there is some set in F. Fix one such and call it D.
First note that
x ∈ ⋂F → x ∈ D (∗)
Why? Because (i) of Definition 2.4.14 says that
x ∈ ⋂F ≡ for all A ∈ F we have x ∈ A
Well, D is one of those “A” sets in F, so if x ∈ ⋂F then x ∈ D. We established (∗) and
thus we established
⋂F ⊆ D
by Definition 2.1.1. We are done by Theorem 2.3.6.
However, as the hypothesis (i.e., lhs) of the implication in (∗∗) is false, the implication itself
is true. Thus the entrance condition “for all A, A ∈ F → x ∈ A” is true for all x and thus
allows all objects x to get into ⋂F.
Therefore ⋂F = U, the universe of all objects which we saw (cf. Sect. 2.2) is a proper
class.
2.4.20 Exercise What is ⋃F if F = ∅? Set or proper class? Can you “compute” which
class it is precisely?
Q = {A1 , A2 , . . . , An }
Then we have a few alternative notations for ⋂Q:
(a) A1 ∩ A2 ∩ . . . ∩ An
or, more elegantly,
(b) ⋂_{i=1}^{n} A_i
or also
(c) ⋂_{i=1}^{n} A_i, set inline with the limits beside the symbol rather than below and above it.
Similarly for ⋃Q:
(i) A1 ∪ A2 ∪ . . . ∪ An
or, more elegantly,
(ii) ⋃_{i=1}^{n} A_i
or also
(iii) ⋃_{i=1}^{n} A_i, set inline with the limits beside the symbol.
If the family has so many elements that all the natural numbers are needed to index the sets
in the set family Q we will write
⋂_{i=0}^{∞} A_i (in displayed or inline form)
2.5 The Powerset 29
or
⋂_{i≥0} A_i (in displayed or inline form)
for ⋂Q and
⋃_{i=0}^{∞} A_i
or
⋃_{i≥0} A_i
(again in displayed or inline form) for ⋃Q.
2.4.22 Example Thus, for example, A ∪ B ∪ C ∪ D can be seen, just changing the
notation, as A1 ∪ A2 ∪ A3 ∪ A4 , therefore it means ⋃{A1 , A2 , A3 , A4 }, or ⋃{A, B,
C, D}.
Same comment for ∩.
Pause. How come for the case n = 2 we proved 14 A ∪ B = ⋃{A, B} (2.4.16) but
here (n ≥ 3) we say that something like the content of the previous remark and example is
just notation (definitions)?
Well, we had independent definitions (and associated theorems re set status for each,
Theorems 2.4.5 and 2.4.17) for A ∪ B and ⋃{A, B} so it makes sense to compare the two
definitions after the fact and see if we can prove that they say the same thing. For n ≥ 3 we
opted to not give a definition for A1 ∪ . . . ∪ An that is independent of ⋃{A1 , . . . , An };
rather we gave the definition of the former in terms of the latter. No independent definitions,
no theorem to compare the two!
2.5.1 Definition For any set A the symbol P (A) —pronounced the powerset of A— is
defined to be the class
P (A) ≝ {x : x ⊆ A}
(1) The term “powerset” is slightly premature, but it is apt. Under the conditions of the
definition (A a set) 2^A is also a set as we prove immediately below.
(2) We said “all the subsets x of A” in the definition. This is correct. As we know from
Theorem 2.3.6, if X ⊆ Y and Y is a set, then so is X.
Proof Let A be built at stage Σ. Then each of its members y is given or built before Σ.
Thus, since every subset x of A is a set of y-values, every such subset x can be built at
stage Σ.
But then, just take any Σ′ > Σ. Since all x-values (such that x ⊆ A) are built before Σ′,
at stage Σ′ we can collect them all and build the set 2^A.
2.5.4 Remark For any set A it is trivial (verify!) that we have ∅ ⊆ A and A ⊆ A. Thus, for
any A, {∅, A} ⊆ 2^A.
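For a small finite set the powerset can be listed outright. The sketch below assumes sets are modelled by frozensets; the function name powerset is a choice made here, and the enumeration by subset size is just one convenient way to visit every subset.

```python
from itertools import combinations

def powerset(a):
    # P(A) = {x : x ⊆ A}: collect the subsets of every size 0..len(A).
    members = list(a)
    return frozenset(
        frozenset(chosen)
        for size in range(len(members) + 1)
        for chosen in combinations(members, size)
    )

A = frozenset({1, 2, 3})
P = powerset(A)

print(len(P))                      # number of subsets of a 3-element set
print(frozenset() in P, A in P)    # ∅ and A are both members
```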
To introduce the concept of cartesian product (so that, for example, plane analytic geometry
can be developed within set theory) we need an object “(A, B)” that is like the set pair
(2.3.1) in that it contains two objects, A and B (A = B is a possibility), but in (A, B) order
and length (in (A, B) it is two) matter!
So, are we going to accept a new type of object in set theory? Not at all! We will build
(A, B) so that it is a set!
2.6.1 Definition (Ordered pair) By definition, (A, B) is the abbreviation (short name)
given below:
(A, B) ≝ {A, {A, B}} (1)
We call “(A, B)” an ordered pair, and A its first component, while B is its second
component.
2.6.2 Remark
1. Note that A ≠ {A, B} and A ≠ {A, A}, because in either case we would otherwise get
A ∈ A, which is false for sets or atoms A. Thus (A, B) does contain exactly two members
or has length two: A and {A, B}.
Pause. We have not said in Definition 2.6.1 that A and B are sets or atoms. So what right
do we have in the paragraph above to so declare?
2. What about the desired property that
(A, B) = (X , Y ) → A = X ∧ B = Y (2)
Well, assume the lhs of “→” in (2) and prove the rhs, “A = X ∧ B = Y ”. From our
truth table we know that we do the latter by proving each of A = X and B = Y true
(separately).
By the remark 1. above there are two distinct members in each of the two sets that we
equate in (3).
So since (3) is true (by assumption) we have (by definition of set equality) one of:
a. A = {X , Y } and {A, B} = X , that is, 1st listed element in lhs of “=” equals the 2nd
listed in rhs; and 2nd listed element in lhs of “=” equals the 1st listed in rhs.
b. A = X and {A, B} = {X , Y }.
Now case (a) above cannot hold, for it leads to A = {{A, B}, Y }. This in turn leads to
{A, B} ∈ A
and thus the set {A, B} is built before one of its members A, which contradicts Principle 0.
• What if B is also equal to A? Then we have {B} = {A, Y } and thus Y ∈ {B} (why?).
Hence Y = B. We showed so far A = X (listed in case (b)) and B = Y (proved here);
great!
• Here B is not equal to A. But B must be in the rhs of (4), so the only way for that is
B = Y . All Done!
2.6.5 Example So, (1, 2) = {1, {1, 2}}, (1, 1) = {1, {1}}, and ({a}, {b}) = {{a}, {{a},
{b}}}.
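The encoding (A, B) = {A, {A, B}} can be checked mechanically on small examples. This is a sketch under assumptions of our own: sets are frozensets, atoms are integers, and the helper components (a hypothetical name) recovers the two components of a pair produced by pair. The triple follows the convention (A, B, C) = ((A, B), C).

```python
def pair(a, b):
    # (A, B) = {A, {A, B}}
    return frozenset({a, frozenset({a, b})})

def components(p):
    # Recover (first, second) from a value built by pair().
    x, y = tuple(p)                      # a pair has exactly two members
    if isinstance(y, frozenset) and x in y:
        first, rest = x, y               # y plays the role of {A, B}
    else:
        first, rest = y, x               # otherwise x does
    others = rest - {first}
    second = next(iter(others)) if others else first   # B = A when {A, B} = {A}
    return first, second

def triple(a, b, c):
    # (A, B, C) = ((A, B), C)
    return pair(pair(a, b), c)

print(pair(1, 2) == pair(2, 1))          # order matters
print(components(pair(1, 1)))
```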
2.6.6 Remark We can extend the ordered pair to ordered triple, ordered quadruple, and
beyond!
We take this approach in these notes:
(A, B, C) ≝ ((A, B), C) (1)
(A, B, C, D) ≝ ((A, B, C), D) (2)
(A, B, C, D, E) ≝ ((A, B, C, D), E) (3)
etc.
So suppose we defined what an n-tuple is, for some fixed unspecified n, and denote it by
(A1 , A2 , . . . , An ) for convenience. Then we define
2.6 The Ordered Pair and Finite Sequences 33
(A1 , A2 , . . . , An , An+1 ) ≝ ((A1 , A2 , . . . , An ), An+1 ) (∗)
• Definition 2.6.1
and
• (∗)
The reader has probably seen such recursive definitions before (likely in calculus and/or
high school).
The most frequent example that occurs is to define, for any natural number n and any
real number a > 0, what a^n means. One goes like this:
a^0 ≝ 1
a^(n+1) ≝ a · a^n (1)
The pair of definitions above condenses infinitely many definitions such as
a^0 = 1
a^1 = a · a^0 = a
a^2 = a · a^1 = a · a
a^3 = a · a^2 = a · a · a
a^4 = a · a^3 = a · a · a · a
⋮
into just two!
We will study inductive definitions and induction soon!
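The two-line definition of a^n translates almost word for word into a recursive function. A minimal sketch, with the function name power chosen here for illustration:

```python
def power(a, n):
    # a^0 = 1
    if n == 0:
        return 1
    # a^(n+1) = a · a^n, read as: a^n = a · a^(n-1) for n > 0
    return a * power(a, n - 1)

print(power(2, 10))
print(power(5, 1))
```

Each call unwinds one of the "infinitely many definitions" until it reaches the base case a^0.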
2.6.7 Exercise What would happen if we defined (in (1)) a^0 ≝ 42?
Caution. Should we not? Why not? Because then a^1 = a × a^0 = a × 42. Hardly the
intended and expected value for a^1!
A correct answer to these two questions does not prove that a^0 = 1! This expression is
not provable by logic using some axioms I forgot to mention. It is just a judicious renaming
of “1” as “a^0”.
C = C′ and (A, B) = (A′, B′)
if we have followed (proved) the “etc.” all the way to the case of (A1 , A2 , . . . , An ) —for
k = 1, 2, 3, . . . , n. Then, by (∗), the case for k = n + 1 —(2) above— is a straightforward
application of the case for Theorem 2.6.3 where X = (A1 , . . . , An ) and Y = An+1 .
We will do the “etc.”-argument elegantly once we learn induction!
Note that now we can redefine all sequences of lengths n ≥ 1 using again (∗) above, but
this time with starting condition that of Definition 2.6.8. Indeed, for n = 2 we rediscover
(A1 , A2 ):
the “new” 2-tuple pair: (A1 , A2 ) = ((A1 ), A2 ) = (A1 , A2 ), where the first equality is by (∗) and the second by 2.6.8.
The big brackets are applications of the ordered pair defined in Definition 2.6.1, just as it
was in the general definition (∗).
2.7.1 Definition (Cartesian product of classes) Let A and B be classes. Then we define
A × B ≝ {(x, y) : x ∈ A ∧ y ∈ B}
called the Cartesian product of A and B in that order. The definition requires both sides of
× to be classes. It makes no sense if one or both are atoms.
2.7 The Cartesian Product 35
A × B × C ≝ (A × B) × C
A × B × C × D ≝ (A × B × C) × D
⋮
A1 × A2 × . . . × An × An+1 ≝ (A1 × A2 × . . . × An ) × An+1
⋮
We may write ⨉_{i=1}^{n} A_i for A1 × A2 × . . . × An
2.7.4 Remark Thus, what we learnt in Definition 2.7.3 is, in other words,
⨉_{i=1}^{n} A_i ≝ {(x1 , . . . , xn ) : xi ∈ Ai , for i = 1, 2, . . . , n}
and
B^n ≝ {(x1 , . . . , xn ) : xi ∈ B}
2.7.5 Theorem If Ai , for i = 1, 2, . . . , n is a set, then so is ⨉_{i=1}^{n} A_i.
Proof A × B is a set by Theorem 2.7.2. By Definition 2.7.3, and in this order, we verify
that so is A × B × C and A × B × C × D and . . . and A1 × A2 × . . . × An and . . .
If we had inductive definitions available already, then Definition 2.7.3 would simply read
A1 × A2 ≝ {(x1 , x2 ) : x1 ∈ A1 ∧ x2 ∈ A2 }
and, for n ≥ 2,
A1 × A2 × . . . × An × An+1 ≝ (A1 × A2 × . . . × An ) × An+1
Correspondingly, the proof of 2.7.5 would be far more elegant, via induction.
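The inductive reading of the n-fold product can be sketched for finite sets as follows. The assumptions here are ours: sets are frozensets, ordered pairs are modelled by Python 2-tuples rather than by the set encoding, and the names product and product_n are hypothetical. The n-fold product is nested to the left, exactly as (A1 × . . . × An) × An+1 prescribes.

```python
from functools import reduce

def product(a, b):
    # A × B = {(x, y) : x ∈ A ∧ y ∈ B}
    return frozenset((x, y) for x in a for y in b)

def product_n(*sets):
    # A1 × ... × An+1 = (A1 × ... × An) × An+1, folding from the left
    return reduce(product, sets)

A = frozenset({1, 2})
B = frozenset({"a", "b"})
C = frozenset({"x"})

ABC = product_n(A, B, C)
print(len(ABC))
print(((1, "a"), "x") in ABC)   # a triple is the pair ((x1, x2), x3)
```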
2.7.6 Definition (Strings over an Alphabet) A string x (or expression or word or vector)
over an alphabet A is just an n-tuple, all of whose components come from the same set, A.
That is, we say that “x is a string of length n over the alphabet A” meaning x ∈ A^n, for
some n > 0, or we simply say “x is over A”.
Traditionally, strings are written down in “string notation” without separating commas
or spaces, omitting enclosing brackets. So if A = {a, b} we will write aababa rather than
(a, a, b, a, b, a).
Thus all strings over A that we spoke of already are members of ⋃_{i>0} A^i.
What is the advantage of the notation aababa over that of (a, a, b, a, b, a)? Well, it is more
natural!
We write words (strings) like this “words” instead of like this “(w,o,r,d,s)” and a formula
(∃x)x = y is written like this “(∃x)x = y” rather than like this “ (, ∃, x, ), x, =, y ”. However
we must be careful with the bracket-less and comma-less notation: Let us start with the
alphabet A = {1, 11}. Which string is denoted by “111”? Unfortunately, we can answer this
in many different ways, so the notation is ill-defined. It is ambiguous as we say. In n-tuple
notation we depict the possible meanings of “111” below:
(1, 1, 1) (length 3) or (11, 1) (length 2) or (1, 11) (length 2).
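The ambiguity can be made concrete by enumerating every way of cutting a written string into alphabet symbols. This is an illustrative sketch only; the function parses is a hypothetical helper, and it assumes every alphabet symbol is a nonempty Python string.

```python
def parses(text, alphabet):
    # Return every tuple of alphabet symbols whose concatenation is `text`.
    if text == "":
        return [()]
    results = []
    for symbol in alphabet:
        if text.startswith(symbol):
            for rest in parses(text[len(symbol):], alphabet):
                results.append((symbol,) + rest)
    return results

print(parses("111", ["1", "11"]))       # more than one reading: ambiguous
print(parses("aababa", ["a", "b"]))     # exactly one reading
```

With symbols of length one, as in the second call, there is never more than one way to cut the string.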
We avoid this ambiguity in notation if we choose our alphabet members so that each
is a symbol of length one. Thus, in application in assembly programming where one uses
integers base-16, rather than employing the digits
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
which are mathematically necessary and sufficient for the job, one avoids ambiguities by
renaming the last 6 digits:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f
Thus the decimal-notation number 11 is denoted by “b” base-16, since the string “11” read
base-16 translates (in decimal notation) as 1 × 16 + 1 = 17.
We can exercise the remedy of length-1-symbols easily in practice —just as in the example
above— because we normally deal with finite alphabets.
Concatenation of the strings (a1 , . . . , am ) and (b1 , . . . , bn ) in that order, denoted as
(a1 , . . . , am ) ∗ (b1 , . . . , bn )
A∗ = A+ ∪ {λ}
The “interesting” languages are those that are finitely definable. Automata and language
theory studies the properties of such finitely definable languages and of the “machinery”
that effects these finite definitions. The language of Logic is also finitely definable.
One can learn to live with ∗ as both a unary (one-argument) operation, A∗ , and as a binary
one, L ∗ M, much the same way we can see no ambiguity in uses of minus as −x and y − z.
2.8 Exercises
1. An argument towards showing that U, the class of all sets and atoms, is a set might go
like this:
Let Σ be a stage after all atoms and all member sets of U were built. At stage Σ we
can build U as a set.
Do you accept the preceding argument? Why?
2. Let a be a set, and consider the class b = {x ∈ a : x ∉ x}.
Show that, despite similarities with the Russell class R, b is a set.
Moreover, show that b ∉ a.
3. Show that R (the Russell class) = U.
4. Show that if a class A satisfies A ⊆ X for all X, then A = ∅.
5. Without using set-formation-by-stages Principles, show that ∅ ≠ {∅}.
6. Without using set-formation-by-stages Principles, show that ∅ ∉ ∅.
7. Without using set-formation-by-stages Principles, show that 1 ∉ 1.
8. Let us prove that {A} —where A is a set— is a set. Argument. Well, {A} ⊆ {A, B}
and {A, B} has been proved to be a set. We conclude by the subclass theorem.
What exactly is wrong with this argument?
Hint. What exactly are we given?
9. Now prove that {A} is a set correctly! Do not argue via Principles 0, 1, 2.
10. Prove that the class {{x} : x = x} which includes only one-element sets {x} —but
includes all of them— is a proper class. Incidentally, the literature calls one-element
sets singletons.
11. Prove that the class {{x} : x is an atom} is a set.
12. How about the class {x : x is an atom}? Set or proper class?
13. ZF (Zermelo-Fraenkel) axiomatic set theory contains the following axiom, here
expressed in terms of sets:
∅ ≠ S → (∃x)(x ∈ S ∧ ¬(∃z)(z ∈ S ∧ z ∈ x))
(2) A − ⋂_{i∈F} B_i = ⋃_{i∈F} (A − B_i)
(1) A ∩ (B ∪ D) = (A ∩ B) ∪ (A ∩ D)
(2) A ∪ (B ∩ D) = (A ∪ B) ∩ (A ∪ D)
29. (Generalized distributive laws for ∪, ∩). Prove for any class A and indexed family
(Bi )i∈F , that
(1) A ∩ ⋃_{i∈F} B_i = ⋃_{i∈F} (A ∩ B_i)
(2) A ∪ ⋂_{i∈F} B_i = ⋂_{i∈F} (A ∪ B_i)
30. Show that the Principles of set formation by stages disallow the truth of a ∈ a.
31. Show that the axiom of foundation (Exercise 2.8.13) disallows the truth of a ∈ a.
32. Show that the Principles of set formation by stages disallow the truth of a ∈ b ∈ c ∈
· · · ∈ a.
33. Show that the axiom of foundation (Exercise 2.8.13) disallows the truth of a ∈ b ∈ c ∈
· · · ∈ a.
34. Show that V = {x : x is a set} is a proper class.
35. Show that for any class (not just set) A, A ∈ A is false.
36. Somebody once said (cf. Wilder 1963) “Consider the class A of all abstract ideas. But
that is an abstract idea, so it is a member of itself.”
Discuss.
37. (1) Show that A =“the class of all sets that contain at least one element” can be defined
by a defining property.
(2) Show that A is a proper class.
38. Expand (i.e., show the set in by-listing notation) 2^{1,2,3}.
39. This exercise will reappear after we covered Induction over N.
Attach the intuitive meaning to the statement that the set A has n distinct elements.
Show that if A has n elements then P (A) has 2^n elements.
Hint. Imagine that you arranged the members of A in a straight line in any fixed order
you please. So they occupy position 1, position 2, position 3, . . ., position n in an array.
Now any subset of A can be marked off by a checkmark against each of its members
in the above mentioned array. No checkmark against an A-member means it is not in
the subset under consideration.
Well, we can have as many subsets as we can have ways to mark some entries of the
array and leave the rest unmarked! How many such marking schemes do we have?
40. Show (without the use of the Principles of set formation) that {{a}, {a, b}} = {{a′},
{a′, b′}} implies a = a′ and b = b′.
41. For any sets x, y show that x ∪ {x} = y ∪ {y} → x = y.
Hint: Use principles of set formation, or even foundation (2.8.13).
D × (A ∪ B) = (D × A) ∪ (D × B)
3 Relations and Functions
Overview
The topic of relations and functions is central in all mathematics and computing. In the
former, whether it is calculus, algebra or theory of computation, one deals with relations
(notably equivalence relations, order) and all sorts of functions, while in the latter one
computes relations and functions (among other related endeavours1). That is, one writes programs
that, given an input to a relation, compute the response (true or false) or, given an input to
a function, compute a response which is some object (number, graph, tree, matrix, other)
or nothing, in case there is no response for said input (for example, there is no response to
input “x, y” if what we are computing is x/y but y = 0).
We are taking mostly an “extensional” point of view of relations and functions in this
course, as is customary in set theory, that is, we view them as sets of (input, output) ordered
pairs. It is also possible to take an intensional point of view, especially in theory of
computation and some specific areas of mathematics, viewing relations and functions as methods
to compute outputs from given inputs.
The topics in this chapter include an introduction to equivalence and order relations,
finite and infinite sets, to uncountable sets and diagonalisation, and contain the proof of the
nontrivial Cantor-Bernstein theorem.
3.1 Relations
3.1.2 Remark R contains just pairs (x, y), that is, just sets {x, {x, y}}, in other words, it is
a family of sets.
(i) ∅
(ii) {(1, 1)}
(iii) {(1, 1), (1, 2)}
(iv) N², that is {(x, y) : x ∈ N ∧ y ∈ N}. This is a set by the fact that N is (Why?) and thus
so is N × N by 2.7.2.
(v) < on N, that is {(x, y) : x < y ∧ x ∈ N ∧ y ∈ N}. This is a set since < ⊆ N².
(vi) ∈, that is,
{(x, y) : x ∈ y ∧ x ∈ U ∧ y ∈ V} (∗)
This is a proper class (nonSet). Why? Well, if ∈ is a set, then it is built at some stage Σ.
Now examine the arbitrary (x, y) in ∈. This is {x, {x, y}} so it is built before Σ, but
then so is its member x (available before Σ). Thus we can collect all such x into a set
at stage Σ. But this “set” contains all x ∈ U due to the middle conjunct in the entrance
condition in (∗).3 That is, this “set” is U. This is absurd!
Here is another way to argue that the relation ∈ is not a set: If it is, so is ⋃∈. Any (x, y) ∈ ∈
is of the form {x, {x, y}}. Thus all x for which there is a y such that x ∈ y are in ⋃∈. As
we said in the footnote, taking y = {x} makes clear that “x ∈ y” does not restrict the x’s we
can get. We get them all: thus they form the proper class U. I argued U ⊆ ⋃∈, thus ⋃∈
cannot be a set. So, neither can ∈ (2.4.17).
So, a binary relation R is a table of pairs:
1. Thus one way to view R is as a device that for inputs x, valued a, a′, . . . , u, . . . one gets
the outputs y, valued b, b′, . . . , u′, . . . respectively. It is all right that a given input may
yield multiple outputs (e.g., case (iii) in the previous example).
2 I write “R” or “R” for a relation, generically, but P, Q, S and T are available to use as well.
3 Hmm. Doesn’t the first conjunct “x ∈ y” constrain and reduce the number of x-values? No: For
every x out there take y = {x} thus the conjunct x ∈ y is fulfilled for all x-values, as I just showed
how to find a y that works.
2. Another point of view is to see both x and y as inputs and the outputs are true or false (t
or f) according as (x, y) is in the table —that is, xRy is true— or not. For example, (a, b)
is in the table (that is, aRb) hence if the relation receives it as input, then it outputs t.
input: x | output: y
a | b
a′ | b′
⋮ | ⋮
u | u′
⋮ | ⋮
Most of the time we will take the point of view in 1 above. This point of view compels
us to define domain and range of a relation R, that is, the class of all inputs that cause an
output and the class of all caused outputs respectively.
3.1.4 Definition (Domain and range) For any relation R we define domain, in symbols
“dom” by
dom(R) ≝ {x : (∃y)xRy}
where we have introduced the notation “(∃y)” as short for “there exists some y such that”,
or “for some y,”
Range, in symbols “ran”, is defined also in the obvious way:
ran(R) ≝ {x : (∃y)yRx}
Notation 1. For a relation P, the symbol (a)P means the class of all outputs caused by a:
(a)P ≝ {x : aPx}
If (a)P ≠ ∅ and therefore a ∈ dom(P) we may also write (a)P ↓ and say
“(a)P is defined”. Otherwise, when (a)P = ∅, we write (a)P ↑ and say “(a)P is
undefined”.
We sometimes want to restrict a relation S to a class A. There are two main
ways to want to do this:
Notation 2. Restrict both inputs and outputs to be in A: This is the way we restrict relations
to obtain a relational restriction. We obtain
S | A ≝ S ∩ A²
Notation 3. For functions (to be introduced shortly) one prefers to restrict only inputs of S
to be in A: We obtain a functional restriction
S ↾ A ≝ S ∩ (A × ran(S))
Notation 4. “Notation 1” above becomes (a)(S | A) and (a)(S ↾ A) in the context of
Notations 2 and 3 (note the brackets to help readability).
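For a finite relation given as a set of pairs, the notions of Definition 3.1.4 and Notations 1 to 3 can be computed directly. The sketch below rests on assumptions of our own: pairs are Python 2-tuples, relations are frozensets of such tuples, and all function names are hypothetical.

```python
def dom(r):
    # dom(R) = {x : (∃y) xRy}
    return frozenset(x for (x, y) in r)

def ran(r):
    # ran(R) = {x : (∃y) yRx}
    return frozenset(y for (x, y) in r)

def outputs(a, p):
    # (a)P = {x : aPx}; empty exactly when (a)P is undefined
    return frozenset(x for (u, x) in p if u == a)

def restrict_relational(s, a):
    # S | A = S ∩ A²: both inputs and outputs must lie in A
    return frozenset((x, y) for (x, y) in s if x in a and y in a)

def restrict_functional(s, a):
    # S ∩ (A × ran(S)): only the inputs must lie in A
    return frozenset((x, y) for (x, y) in s if x in a)

R = frozenset({(1, 1), (1, 2), (3, 1)})

print(dom(R), ran(R))
print(outputs(1, R), outputs(2, R))
print(restrict_relational(R, {3}), restrict_functional(R, {3}))
```

The last line shows the difference between the two restrictions: the pair (3, 1) survives only the functional one.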
3.1.5 Theorem For a set relation R, both dom(R) and ran(R) are sets.
Proof For domain we collect all the x such that x Ry, for some y, that is, all the x such that
3.1.6 Definition In practice we often have an a priori decision about what are in principle
“legal” inputs for a relation R, and where its outputs go. Thus we have two classes, A and B
for the class of legal inputs and possible outputs respectively. Clearly we have R ⊆ A × B.
We call A and B left field and right field respectively, and instead of R ⊆ A × B we often
write
R:A→B
and also
A −R→ B
pronounced “R is a relation from A to B”.
If A = B then we have
R:A→A
but rather than pronouncing this as “R is a relation from A to A” we prefer 4 to say “R is on
A”.
3.1.7 Remark Trivially, for any R : A → B, we have dom(R) ⊆ A and ran(R) ⊆ B (give
a quick proof of each of these inclusions).
Also, for any relation P with no a priori specified left/right fields, P is a relation from
dom(P) → ran(P). Naturally, we say that dom(P) ∪ ran(P) is the field of P.
3.1.8 Example As an example, consider the divisibility relation on all integers (their set
denoted by Z) that is usually named “|”:
The input x = 0 to the relation “ |” produces no output, in other words, “for input x = 0 the
relation is undefined.”
1. It does make sense for some relations to a priori choose left and right fields, here
|:Z→Z
3.1.9 Example Next consider the relation < with left/right fields restricted to N.
Then dom(<) = N, but ran(<) ⊊ N. Indeed, 0 ∈ N − ran(<).
3.1.13 Definition A relation R (not a priori restricted to have predetermined left or right
fields) is
3.1.14 Example
(i) Transitive examples: ∅, {(1, 1)}, {(1, 2), (2, 3), (1, 3)}, <, ≤, =, N².
(ii) Symmetric examples: ∅, {(1, 1)}, {(1, 2), (2, 1)}, =, N².
(iii) Antisymmetric examples: ∅, {(1, 1)}, =, ≤, ⊆.
(iv) Irreflexive examples: ∅, {(1, 2)}, <, ⊊, the relation “≠” on N.
(v) Reflexive examples: 1_A on A, {(1, 1)} on {1}, {(1, 2), (2, 1), (1, 1), (2, 2)} on {1, 2}, =
on N, ≤ on N.
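The finite examples in the list can be checked by brute force. The sketch assumes the standard textbook meanings of the five properties (the wording of Definition 3.1.13 is not reproduced here), models relations as sets of 2-tuples, and uses function names invented for this illustration.

```python
def is_transitive(r):
    # xRy ∧ yRz → xRz
    return all((x, z) in r for (x, y) in r for (w, z) in r if y == w)

def is_symmetric(r):
    # xRy → yRx
    return all((y, x) in r for (x, y) in r)

def is_antisymmetric(r):
    # xRy ∧ yRx → x = y
    return all(x == y for (x, y) in r if (y, x) in r)

def is_irreflexive(r):
    # xRy → x ≠ y
    return all(x != y for (x, y) in r)

def is_reflexive_on(r, a):
    # xRx for every x in A
    return all((x, x) in r for x in a)

print(is_transitive({(1, 2), (2, 3), (1, 3)}))
print(is_symmetric({(1, 2), (2, 1)}), is_transitive({(1, 2), (2, 1)}))
```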
3.1.16 Definition (Relational Composition) Let R and S be (set) relations. Then, their
composition, in that order, denoted by R ◦ S is defined for all x and y by:
x R ◦ S y ≡ (∃z)(x R z ∧ z S y) (by definition)
It is customary to abuse notation and write “x R z S y” for “x R z ∧ z S y” just as one writes
x < y < z for x < y ∧ y < z.
3.1.17 Example Here is whence the emphasis “in that order” above. Say, R = {(1, 2)}
and S = {(2, 1)}. Thus, R ◦ S = {(1, 1)} while S ◦ R = {(2, 2)}. Thus, R ◦ S ≠ S ◦ R in
general.
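Composition of finite relations is easy to compute, and doing so reproduces the example. A minimal sketch, assuming relations are sets of 2-tuples and using the text's order convention (R acts first, then S); the name compose and the extra relation T are ours.

```python
def compose(r, s):
    # x (R ∘ S) y  iff  x R z and z S y for some z
    return frozenset((x, y) for (x, z) in r for (w, y) in s if z == w)

R = frozenset({(1, 2)})
S = frozenset({(2, 1)})
T = frozenset({(1, 3), (2, 3)})   # an extra relation, used only to try associativity

print(compose(R, S))   # R ∘ S
print(compose(S, R))   # S ∘ R
print(compose(compose(R, S), T) == compose(R, compose(S, T)))
```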
Thus, the situation where we have that x R ◦ Sy means, for some z, x RzSy and is depicted
as:
3.1.19 Theorem The composition of two (set) relations R and S in that order is also a set.
(R ◦ S) ◦ T = R ◦ (S ◦ T)
We state and prove this central result for any class relations.
xRwSzTy (1)
xRw(S ◦ T)y
xRzSuTy (2)
x(R ◦ S)uTy
3.1.22 Corollary If R, S and T are (set) relations, all on some set A,5 then “R ◦ S ◦ T ”
has a meaning that is independent of how brackets are inserted.
The corollary allows us to just omit brackets in a chain of compositions, even longer than
the above. It also leads to the definition of relational exponentiation, below:
If moreover we have defined R to be on a set A, then we also define the 0-th power: R^0
stands for Δ_A or 1_A.
3.1.26 Exercise Show that if for a relation R we know that R^2 ⊆ R, then R is transitive
and conversely.
1. T is transitive, and R ⊆ T .
2. If S is also transitive and R ⊆ S, then T ⊆ S. This makes the term “⊆-smallest” precise.
Note that we hedged twice in the definition, because at this point we do not know yet:
3.2.2 Remark Uniqueness can be settled immediately from the definition above: Suppose
T and T′ fulfil Definition 3.2.1, that is,
1. R ⊆ T
and
2. R ⊆ T′
since both are closures. But now think of T as a closure and T′ as the “S” of 3.2.1 (it includes
R all right!)
Hence T ⊆ T′.
Now reverse the role playing and think of T′ as a closure, while T plays the role of “S”.
We get T′ ⊆ T. Hence, T = T′.
The above exercise is hardly exciting, but learning that R⁺ exists for every R and also
learning how to “compute” R⁺ is exciting. We do this next.
3.2.5 Lemma Given a (set) relation R. Then ⋃_{n=1}^{∞} R^n is a transitive (set) relation.
Proof of 1. Note that all positive powers of R, R^(n+1), for n ≥ 0, are sets. Indeed, they all
are subsets of the same set!
Here is why:
Firstly, R ⊆ dom(R) × ran(R) by Definition 3.1.4.
Let now n > 0: We have
R^(n+1) = R ◦ R ◦ . . . ◦ R (n + 1 factors) = (R ◦ R ◦ . . . ◦ R) ◦ R (n factors in the brackets) = R^n ◦ R
R^(n+1) = R ◦ R^n (1)
and
R^(n+1) = R^n ◦ R (2)
Applying 3.1.19 to (1) we get
Thus
R^(n+1) ⊆ dom(R) × ran(R)
for n ≥ 0.
Therefore,
x R^n y R^m z
or
x (R ◦ R ◦ · · · ◦ R) ◦ (R ◦ R ◦ · · · ◦ R) z, with n factors in the first bracket and m in the second,
or
x (R ◦ R ◦ · · · ◦ R) z, with n + m factors,
that is,
x ⋃_{i=1}^{∞} R^i z
3.2.6 Remark Why all this work for Part 1 of the proof above? Why not just use 2.4.21
right away? Because 2.4.21 offers only notation once we know that
F = {A0 , A1 , A2 , A3 , . . .} (3)
is a set! Cf. “Suppose the family of sets Q is a set of sets”, the opening statement in the
passage 2.4.21 on notation states.
Here we do not know (yet) if every family of sets like (3) is indeed a set —but in this
case it turns out that we do not care because every member of F = {R^i : i = 1, 2, 3, . . .} is
included (as a subset) in dom(R) × ran(R) (a set), which allows us to sidestep the issue!
Whether every family of sets like F in (3) is a set will be answered affirmatively in 3.3.6.
For now note that we cannot recklessly say that after any sequence of construction by stages
steps there is a stage after all those stages. Why? Well, take all the objects in set theory.
Each is given outright (atom; stage 0) or is constructed at some stage (set). If we could prove
there is a stage after all these stages then we could also prove that U is a set, a claim we
refuted with two methods so far!
Since R ⊆ ⋃_{i=1}^{∞} R^i due to R = R^1, all that remains in order to show that ⋃_{i=1}^{∞} R^i is a transitive
closure of R is to show that
3.2.7 Lemma If R ⊆ S and S is transitive, then ⋃_{i=1}^{∞} R^i ⊆ S.
Proof I will just show that for all n ≥ 1, R^n ⊆ S. OK, R ⊆ S is our assumption, thus R^1 ⊆ S
is true.
For R^2 ⊆ S let x R^2 y, thus (for some z), x R z R y hence x S z S y. As S is transitive, the
latter gives x S y. Done.
For R^3 ⊆ S let x R^3 y, thus (for some z), x R^2 z R y hence x S z S y. As S is transitive, the
latter gives x S y. Done.
You see the pattern: Assume now that we proved up to some fixed but unspecified n that
(1) below holds and we want to prove for n + 1 that R^(n+1) ⊆ S as well using the same value
for n, as in our assumption above.
Thus,
3.2 Transitive Closure 55
x R^(n+1) y ⇐⇒ x R^n ◦ R y ⇐⇒ x R^n z R y (some z) =⇒ x S z S y (by (1)) =⇒ x S y (S transitive)
We have proved:
3.2.8 Theorem (The transitive closure exists) For any relation R, its transitive closure
R⁺ exists and is unique. Indeed we have that R⁺ = ⋃_{i=1}^{∞} R^i.
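For a finite relation the union of the powers can actually be computed, because the powers eventually stop contributing new pairs. The sketch below is one possible way to do it and is not taken from the text: it accumulates R^1, R^2, . . . and stops as soon as a power adds nothing new. That stopping rule is our assumption; it is safe because once R^(i+1) is contained in the union so far, so is every later power. Relations are sets of 2-tuples and the function names are hypothetical.

```python
def compose(r, s):
    # x (R ∘ S) y  iff  x R z and z S y for some z
    return frozenset((x, y) for (x, z) in r for (w, y) in s if z == w)

def transitive_closure(r):
    # Accumulate R^1 ∪ R^2 ∪ ... until the next power brings nothing new.
    r = frozenset(r)
    closure = r
    power = r                          # R^1
    while True:
        power = compose(power, r)      # R^(i+1) = R^i ∘ R
        if power <= closure:
            return closure
        closure = closure | power

R = frozenset({(1, 2), (2, 3), (3, 4)})
print(sorted(transitive_closure(R)))
```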
An interesting corollary that will lend a computational flavour to 3.2.8 is the following.
Case 1. q ≤ n. Then x ⋃_{i=1}^{n} R^i y since R^q ⊆ ⋃_{i=1}^{n} R^i, R^q being one of the “R^i” with i in
the 1 ≤ i ≤ n range.
Case 2. q > n. In this case I will show that there is also a k ≤ n such that x R^k y, which
sends me back to the “easy Case 1”.
Well, if there is one q > n that satisfies (2) there are probably more. Let us choose
our q to be the smallest > n that gives us (2).
Because among those “q” that fit (3)6 imagine we fix attention to one such.
Now, if it is not the smallest such, then go down to the next smaller one that still satisfies
(3), call it q′.
Now go down to the next smaller, q″ > n, if q′ is not smallest.
Continue like this. Can I do this forever? That is, can we have the following being an
infinitely long sequence of distinct numbers?
n < . . . < q^(k) 7 < . . . < q‴ < q″ < q′ < q (4)
If yes, then I will have an infinite “descending” chain of distinct natural numbers between
q and n. Absurd!8
Back to the proof. So let the q we are working with be the smallest that satisfies (3). Then
we have the configuration
z 1 , z 2 , z 3 . . . z i , z i+1 , . . . zr , zr +1 , . . . , z q−1 , y
in (5) above contains q > n members. As they all come from A, not all are distinct. So
let z i = zr (the zr could be as late in the sequence as y, i.e., equal to y).
Now omit the boxed part in (5). We obtain
which contains at least one “R” fewer than the sequence (5) does, the entry “$z_i\,R\,z_{i+1}$”
(and everything else in the “. . .” part in the box) being removed. That is, (6) states
$x\,R^{q'}\,y$
with $q' < q$. Since the q in (3) was smallest > n, we must have $q' \le n$, which sends us
to Case 1 and we are done.
The result from 3.2.9 permits the computation of the transitive closure of relations on finite
sets. We will give a general definition of “finite” later but for this subsection we mean sets
like S in 3.2.9. We will introduce a matrix representation of relations R on finite sets (in
the sense agreed to for this subsection) the matrix rows and columns being indexed by the
entries of the set S. Since matrices like their coordinates to be pairs of natural numbers,
much will be gained in clarity and nothing lost in mathematical generality if the names of
the entries of finite sets like S of 3.2.9 are natural numbers rather than a generic letter “a”
indexed by natural numbers. Our S-sets therefore are precisely the
S = {1, 2, 3, . . . , n} (1)
3.2.10 Definition (Matrices and the Adjacency matrix) A matrix is the term used in
mathematics for the programming term “two dimensional array”. An “m × k” matrix has m
rows and k columns. The address or location of an item in a matrix M, just as in programming,
is given by two coordinates, i (row number) and j (column number) with this notation
M(i, j). Two matrices M and N are equal iff
A special case of matrices are the so-called adjacency matrices. Given a relation R on a
finite set S, its adjacency matrix $M_R$ (or just M if R is understood) is an n × n matrix
of 0-1 entries given by
$$M_R(i, j) \overset{\text{Def}}{=} \begin{cases} 1 & \text{if } i\,R\,j \\ 0 & \text{otherwise} \end{cases}$$
For computational purposes the entries “1” and “0” are taken to be “Boolean” with respect
to addition, that is, their arithmetic is not the normal one on natural numbers but is governed
by the “addition table” below.
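3.2.11 Example (Matrix addition) Addition of two n × n matrices is done by adding all
the entries with the same coordinates in the two matrices. That is, if M and N are two n × n
matrices, then $(M + N)(i, j) \overset{\text{Def}}{=} M(i, j) + N(i, j)$, for all i, j.
Thus
• $\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$
• $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$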
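Thus $\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \times \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ because, with Boolean arithmetic, the four entries are
$0 \times 0 + 1 \times 1 = 1$, $0 \times 1 + 1 \times 1 = 1$, $1 \times 0 + 1 \times 1 = 1$ and $1 \times 1 + 1 \times 1 = 1$.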
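The Boolean sums and the product in the examples above can be checked mechanically. The following is a minimal Python sketch, not part of the text: the helper names `bool_add` and `bool_mul` are hypothetical, matrices are represented as lists of rows, and the product is assumed to be the usual row-by-column product evaluated with Boolean arithmetic (1 + 1 = 1).

```python
def bool_add(M, N):
    """Entrywise Boolean sum of two equally sized 0-1 matrices (1 + 1 = 1)."""
    return [[a | b for a, b in zip(row_m, row_n)] for row_m, row_n in zip(M, N)]


def bool_mul(M, N):
    """Boolean product: entry (i, j) is 1 iff M[i][k] = N[k][j] = 1 for some k."""
    return [[int(any(M[i][k] & N[k][j] for k in range(len(N))))
             for j in range(len(N[0]))]
            for i in range(len(M))]


# The matrices used in the examples above.
A = [[0, 1], [1, 1]]
B = [[0, 1], [1, 0]]
C = [[0, 1], [0, 0]]

print(bool_add(A, B))  # first bullet
print(bool_add(C, B))  # second bullet
print(bool_mul(A, A))  # the product example
```

Using the bitwise operators `|` and `&` on 0-1 integers is one convenient way to obtain Boolean arithmetic; any encoding of "or" and "and" would do.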
3.2.13 Example (The Identity Matrix) For each n > 0 we have an n × n identity matrix
$I_n$ (or I, if the n is understood from the context) whose entries are as follows:
1. The diagonal entries are all equal to 1: That is, $I_n(i, i) = 1$, for all 1 ≤ i ≤ n.
2. All non-diagonal entries are zero: That is, $I_n(i, j) = 0$, for all 1 ≤ i, j ≤ n such that
i ≠ j.
The above in turn can be written without the “wordy” part “for some k” as follows (see also
Tables 3.1 and 3.2)
$$i\,(R \circ Q)\,j \ \text{ iff } \ \sum_{k=1}^{n} M_R(i, k) \times M_Q(k, j) = 1 \tag{$1'$}$$
Indeed, the part $\sum_{k=1}^{n} M_R(i, k) \times M_Q(k, j)$ is 1 iff, for at least one k-value (“for some k”
as we also say), we have $M_R(i, k) \times M_Q(k, j) = 1$.
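The correspondence between composition and the Boolean product can be tried out on a small case. This is an illustrative Python sketch under stated assumptions: relations are sets of pairs on {1, ..., n}, composition is read left to right (first R, then Q, as in the text's use of "x R z Q y"), and the names `adjacency`, `compose` and `bool_mul` are hypothetical helpers, not notation from the text.

```python
def adjacency(rel, n):
    """0-1 matrix M with M[i-1][j-1] = 1 iff (i, j) is in rel, on {1, ..., n}."""
    return [[1 if (i, j) in rel else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def compose(R, Q):
    """i (R o Q) j iff i R k and k Q j for some k (first R, then Q)."""
    return {(i, j) for (i, k) in R for (k2, j) in Q if k == k2}


def bool_mul(M, N):
    """Boolean matrix product of two n x n 0-1 matrices."""
    n = len(M)
    return [[int(any(M[i][k] & N[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


# A small made-up example on {1, 2, 3}.
R = {(1, 2), (2, 3)}
Q = {(2, 2), (3, 1)}
n = 3

lhs = adjacency(compose(R, Q), n)                # matrix of the composed relation
rhs = bool_mul(adjacency(R, n), adjacency(Q, n))  # product of the two matrices
print(lhs == rhs)
```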
1. $M_{R^2} = M_R^2$ (4)
2. $M_{R^3} = M_{R^2 \circ R} \overset{(3)}{=} M_{R^2} \times M_R \overset{(4)}{=} M_R^2 \times M_R = M_R^3$
3. Suppose now that we have had enough perseverance to progress sufficiently far to a
number m that we will not disclose, and obtained
$M_{R^m} = M_R^m$ (5)
4. To show that we can obtain identities like (5) as far as we like we show how to stand on
the shoulders of (5) and obtain (6) below for m + 1 (“m” is still the fixed undisclosed
number we used in (5))
$M_{R^{m+1}} = M_{R^m \circ R} \overset{(3)}{=} M_{R^m} \times M_R \overset{(5)}{=} M_R^m \times M_R = M_R^{m+1}$ (6)
$M \times I_n = I_n \times M = M$
We have just seen that a computation of $R^+$ can be based on (Boolean) matrix multiplication
(due to 3.2.9 and the results immediately above). We want to compute
$R \cup R^2 \cup R^3 \cup \cdots \cup R^n$
as
$M_R + M_R^2 + M_R^3 + \cdots + M_R^n$
Here is then the most obvious algorithm to do so:
T ← I_n
for k = 1 to n do
    T ← M_R + T × M_R
end
$M_R + M_R^2 + \cdots + M_R^k$
since the successive contents of T at the end of the k-th iteration are
k = 1, $T = M_R + I_n \times M_R = M_R + M_R \overset{\text{Why?}}{=} M_R$
k = 2, $T = M_R + T \times M_R = M_R + M_R^2$
k = 3, $T = M_R + T \times M_R = M_R + (M_R + M_R^2) \times M_R = M_R + M_R^2 + M_R^3$
Guess and postulate the pattern for k = m below:
k = m, $T = M_R + M_R^2 + \cdots + M_R^m$, thus (we are right! The pattern is preserved
below!)
k = m + 1, $T = M_R + (M_R + M_R^2 + \cdots + M_R^m) \times M_R = M_R + M_R^2 + M_R^3 + \cdots + M_R^{m+1}$
thus we validated the form of T also for iteration k = m + 1 and therefore
our “guess” of the form of T at iteration k = m is correct for all m-values. In
particular,
$T = M_R + M_R^2 + M_R^3 + \cdots + M_R^n$
and therefore
$T = M_{R^+}$
If we ignore a multiplicative constant, the algorithm’s run time is $n^3$ Boolean additions and
multiplications every time it goes through a loop iteration. That is, it is $K n^4$ such operations
overall (over all n passages through the loop), for all large n, where K is a constant that in
the analysis of algorithms domain we “normally” (read “most of the time”) do not care to
specify exactly.
This “do not care” attitude has led to the so-called “big-O” notation that we will develop
in some detail in Section 7.1. Put simply, for now, for two non negative expressions f (n)
and g(n), we can express the English “the expression f (n) is bounded (or majorised) by a
constant times g(n), for all large n” —or also the less verbose “ f (n) ≤ K × g(n), for all
n ≥ N0 ”— by the very brief notation
f (n) = O(g(n))
Thus, overall the run time is dominated by n times $n^3$ (the program loops n times), that
is, the algorithm’s run time is $O(n^4)$ (Boolean operations) in big-O notation.
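The "most obvious" algorithm analysed above translates almost line by line into code. The following is a sketch in Python under the assumption that matrices are lists of 0-1 rows; the function names (`identity`, `bool_add`, `bool_mul`, `transitive_closure_naive`) are hypothetical. Each pass performs one Boolean matrix product (about n^3 elementary operations), and there are n passes, which matches the O(n^4) bound.

```python
def identity(n):
    """The n x n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def bool_add(M, N):
    """Entrywise Boolean sum."""
    return [[a | b for a, b in zip(rm, rn)] for rm, rn in zip(M, N)]


def bool_mul(M, N):
    """Boolean matrix product of two n x n 0-1 matrices."""
    n = len(M)
    return [[int(any(M[i][k] & N[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def transitive_closure_naive(M):
    """T <- I_n; then n times T <- M + T x M. Returns the matrix of R^+."""
    n = len(M)
    T = identity(n)
    for _ in range(n):
        T = bool_add(M, bool_mul(T, M))
    return T


# Made-up example: the "chain" 1 R 2, 2 R 3, 3 R 4 on {1, 2, 3, 4}.
chain = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
closure = transitive_closure_naive(chain)
for row in closure:
    print(row)
```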
We can compute R + on a finite set A faster if we know that R is reflexive on A. This better
algorithm is based on the following theorem and its corollaries.
$$\bigcup_{i=1}^{m} R^i = R^{m-1} \tag{1}$$
Proof Our proof relies on the techniques in 3.2.9 (ibid. Case 2 of the proof). Towards (1)
we have two directions.
⊇-direction. We want $\bigcup_{i=1}^{m} R^i \supseteq R^{m-1}$, but this is trivial.
⊆-direction. We want $\bigcup_{i=1}^{m} R^i \subseteq R^{m-1}$. This needs some work. So let $x \big(\bigcup_{i=1}^{m} R^i\big) y$.
1. i = m − 1 works for the assumption (left hand side of ⊆). But then $x\,R^{m-1}\,y$ which is
our conclusion.
2. i = k < m − 1 works for the assumption.
From $x\,R^k\,y$ and $y\,R\,y$ (reflexivity) we get $x\,R^k\,\overbrace{y\,R\,y \cdots y\,R\,y}^{m-1-k\ R}$ which trivially implies
$x\,R^{m-1}\,y$.
3. i = m works for the assumption. We must reduce i so we end up in one of the above two
cases. We have
$$\overbrace{x\,R\,a_1\,R\,a_2\,R\,a_3 \cdots a_{m-1}\,R\,y}^{m\ R} \tag{2}$$
We will partly use the technique of proof of 3.2.9, Case 2. Now we named m + 1 points
in (2) but we have only n < m + 1 distinct ones in A. So two names in (2) must name
the same point. We have cases:
$$R^+ = \bigcup_{i=1}^{n} R^i = \bigcup_{i=1}^{m} R^i \overset{3.2.19}{=} R^{m-1}$$
We can now compute faster: Let R be reflexive on A = {1, 2, . . . , n} and let p be smallest⁹
such that
$$n - 1 \le 2^p \tag{3}$$
That is, $p = \lceil \log_2(n - 1) \rceil$.¹⁰ Set $m = 2^p + 1$. Thus n ≤ m and by 3.2.20
$$R^+ = R^{m-1} = R^{2^p} \tag{4}$$
We can now compute with jumps, by starting with M R and using repeated squaring ( p
times)
9 As the expression $2^p$ increases without bound and n − 1 is fixed, there are infinitely many p for
which $n - 1 \le 2^p$ is true. We can thus pick the smallest p that works.
10 Let x be a real number and t an integer such that t − 1 < x ≤ t. Then we call t the ceiling of x
and write $t = \lceil x \rceil$.
T ← M_R
for k = 1 to p do
    T ← T²
end
By (4), T, at the end of the algorithm above, holds the adjacency matrix $M_R^{2^p}$ of $R^+$.
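The repeated-squaring idea for a reflexive R can be sketched as follows. This is an illustrative Python version, not the text's own program: `bool_square` and `closure_reflexive` are hypothetical names, matrices are lists of 0-1 rows, and p is found by a simple loop as the smallest natural number with n − 1 ≤ 2^p. The reflexivity check is an added safeguard, since the method relies on that hypothesis.

```python
def bool_square(T):
    """Boolean matrix product T x T for an n x n 0-1 matrix."""
    n = len(T)
    return [[int(any(T[i][k] & T[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def closure_reflexive(M):
    """Matrix of R^+ for a reflexive R, obtained by squaring M exactly p times."""
    n = len(M)
    if any(M[i][i] != 1 for i in range(n)):
        raise ValueError("the relation must be reflexive")
    p = 0
    while 2 ** p < n - 1:  # smallest p with n - 1 <= 2^p
        p += 1
    T = [row[:] for row in M]
    for _ in range(p):
        T = bool_square(T)
    return T


# Made-up reflexive example on {1, 2, 3, 4}: the diagonal plus 1 R 2, 2 R 3, 3 R 4.
M = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
for row in closure_reflexive(M):
    print(row)
```

Here n − 1 = 3, so p = 2 and only two squarings are performed, compared with four passes of the previous loop.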
There is an even faster way to compute R + due to Warshall. The algorithm relies on the
visualisation that x R + y means that there is a path from x to y that passes through points
(members) of A which are connected by arrows labelled “R” and all (arrows) point in the
direction from the “start” x toward the “end” y.
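$$x \xrightarrow{R} a_1 \xrightarrow{R} a_2 \xrightarrow{R} a_3 \xrightarrow{R} \cdots \xrightarrow{R} a_j \xrightarrow{R} a_{j+1} \xrightarrow{R} \cdots \xrightarrow{R} a_{r-1} \xrightarrow{R} y \tag{1}$$
Thus $M_{R^+}(x, y) = 1$ iff x and y are connected by a path as depicted in (1) above, that is,
$x\,R^r\,y$ holds. What the algorithm below does is whenever it detects (1) and (1′) below,
manifested as T(x, y) = 1 and T(y, z) = 1, it adds an “edge” from x to z, that is, makes
T(x, z) = 1 too.
$$y \xrightarrow{R} b_1 \xrightarrow{R} b_2 \xrightarrow{R} b_3 \xrightarrow{R} \cdots \xrightarrow{R} b_j \xrightarrow{R} b_{j+1} \xrightarrow{R} \cdots \xrightarrow{R} b_{q-1} \xrightarrow{R} z \tag{$1'$}$$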
T ← M_R
for j = 1 to n
    for i = 1 to n
        for k = 1 to n
            T(i, k) ← T(i, k) + T(i, j) × T(j, k)
The command in the last loop says “if there is a path from i to j and one from j to k then
acknowledge a path (edge) from i to k by making T (i, k) equal to 1.”
This is the correct behaviour for the algorithm but the $1M question is:
Does the algorithm add all the edges (paths) needed for R + ?
11 “It takes” in the analysis of an algorithm’s run time is rarely exact. Usually, as is the case here, “it
takes” is short for “it takes up to”; an upper bound on steps/time.
Is it possible that T (i, k), for some i, k, will incorrectly stay 0 because, when we come
to perform T (i, j) × T ( j, k), the entry T ( j, k) is not 1 yet?
No, not possible.
We will prove the correctness of Warshall’s algorithm employing a “trick”. Well, not a
trick really, but the methodology of “dynamic programming” taught in courses on algorithms
but also appearing in the proof of Kleene’s theorem that expresses sets recognised by finite
automata (FA) as regular expressions (cf. for example, Tourlakis (2012)). Namely, we add
notation to help the reasoning about the correctness of the program above.
We add a superscript to T on the right and left of “←” in the innermost loop:
$$T^{(j)}(i, k) \leftarrow T^{(j-1)}(i, k) + T^{(j-1)}(i, j) \times T^{(j-1)}(j, k) \tag{2}$$
The meaning of $T^{(q)}(x, y)$ is that this entry is 1 precisely if there is a “path” from
x to y (such as (1)) that does not use intermediate points $a_r$ that are outside the set
{1, 2, . . . , q}.
Correspondingly, the initialisation T ← M_R should be viewed as $T^{(0)} \leftarrow M_R$ since
all paths depicted by $M_R$ are direct (single “edges” (arrows) labelled by R), with no
intermediate points on any of them.
Thus, not only the initialisation is correct but also the innermost loop behaves correctly:
The right hand side (before the execution of the assignment instruction inside the loop)
holds recorded paths (if such were recorded) i → k, i → j and j → k that have no
inner points outside {1, 2, . . . , j − 1}, the “record” being a 1 or 0 in the corresponding
matrix entries $T^{(j-1)}(i, k)$, $T^{(j-1)}(i, j)$, and $T^{(j-1)}(j, k)$ according as the foregoing paths
were detected or not.
The left hand side $T^{(j)}(i, k)$ records paths i → k that either have no inner points outside
{1, 2, . . . , j − 1} (term $T^{(j-1)}(i, k)$ to the right of “←”) OR by virtue of the concatenation
of i → j and j → k they have inner points from 1 up to and including j; justifying us to
place a “j” superscript on T to the left of “←”.
Given the semantics of
as noted in the above two paragraphs, and since $T^{(0)}$ on line 1 is initialised correctly as
already noted in the preceding boxed remark, assignment (2) is correct for all j (and all
i, k). In particular, for j = n, we have $T^{(n)}(i, k) = 1$ iff there is a path from i to k that uses
no internal nodes outside {1, 2, . . . , n}. In short, “$T^{(n)}(i, k) = 1$ iff there is a path from i to
k” is correct, period; without the preceding italicised qualification.
Do we need to use the superscripts in $T^{(q)}$ and to introduce new matrices $T^{(j)}$, one for
each j = 1, 2, . . . , n, beyond their use notionally in the justification of correctness?
No.
We can record the $T^{(q)}$ entries into T (that is, we store $T^{(q)}$ into T at each step that
the former is updated) without altering the analysis above: Namely, if the $T^{(j-1)}(i, k)$,
$T^{(j-1)}(i, j)$ and $T^{(j-1)}(j, k)$ in (2) have already been stored in T, then if paths (1) and
(1′) have already been recorded as $T^{(j-1)}(i, j) = 1$ and $T^{(j-1)}(j, k) = 1$, according to the
previous analysis, then they are stored as T(i, j) = 1 and T(j, k) = 1 in the suggested
algorithm, without superscripts. Thus using the “T(i, k) ← T(i, k) + T(i, j) × T(j, k)”
in the innermost loop (no “(j)” superscript on the leftmost T!) the algorithm above
correctly updates the left hand side T(i, k) since the right hand side is assumed correct and
the assignment statement can be viewed as having two steps: One, obtaining $T^{(j)}(i, k)$ as an
“intermediate step” and Two, copying this result into the left hand side of “←” as T(i, k).
The latter entry is yes/no (1/0) without reference to inner nodes.
Before we turn to the timing assessment of the algorithm we give it the form that is
prevalent in the literature.
T ← M_R
for j = 1 to n
    for i = 1 to n
        if T(i, j) = 1 then
            for k = 1 to n
                T(i, k) ← T(i, k) + T(j, k)
By inspection of the above program and the presence of the three nested loops it is trivial
that the algorithm’s run time is bounded by $O(n^3)$ Boolean + operations (no ×!).
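The form of Warshall's algorithm given last can be written down directly. The following is a minimal Python sketch of it, with 0-based indices in place of the text's 1, ..., n and a hypothetical function name `warshall`; the input matrix is copied so that the caller's matrix is left untouched.

```python
def warshall(M):
    """Adjacency matrix of R^+ computed in place on a copy of M (0-1 rows)."""
    n = len(M)
    T = [row[:] for row in M]
    for j in range(n):              # j: the newly allowed intermediate point
        for i in range(n):
            if T[i][j]:             # a recorded path i -> j exists
                for k in range(n):
                    T[i][k] |= T[j][k]  # Boolean "+": add i -> k if j -> k is recorded
    return T


# Made-up example: the 3-cycle 1 R 2, 2 R 3, 3 R 1; every point reaches every point.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
for row in warshall(cycle):
    print(row)
```

The test `if T[i][j]` is what removes the Boolean multiplications: the innermost loop only runs when the factor T(i, j) is 1.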
Equivalence relations must be on some set A, since we require reflexivity. They play a
significant role in many branches of mathematics and even in computer science. For example,
the minimisation process of finite automata (a topic that we will not cover) relies on the
concept of equivalence relations.
1. Reflexive
2. Symmetric
3. Transitive
3.3 Equivalence Relations 67
Here is a longish, more sophisticated example, that is central in number theory. We will
have another instalment of it after a few definitions and results.
Be careful to distinguish the brackets {. . .} from these [. . .]. It is not a priori obvious that
$x \in [x]_R$ until you look at the definition 3.3.4! $[x]_R \ne \{x\}$.
The symbol A/R denotes the quotient class of A with respect to R, that is,
$$A/R \overset{\text{Def}}{=} \{[x]_R : x \in A\}$$
This is the time to introduce “Principle 3”12 of set formation.
3.3.6 Remark (Principle 3) Suppose that the class F is indexed by some (or all) members
of a set A. Then F is a set.
Being indexed by (some) members of a set A means that —for every x ∈ F— we have
attached to it as “label(s)” (each often depicted as a subscript or superscript) some mem-
ber(s) of A.
We must ensure that once a label is used it is not used again for another y ∈ F.
Thus, if F = {a, b, c}, then $\{a_1, b_{13,19,0}, c_{42}\}$ is a valid labelling with members from N.¹³
$\{a_{1,13}, b_{13}, c_{19}\}$ is not correctly labelled (same label, 13, twice), while the labelling
$\{a_{1,42}, b_{13}, c\}$ is also invalid (c was not labelled):
In sum, we can label an object in F with many labels, but we may not use the same
label twice to label two objects of F and we may not leave any object of F unlabelled.
Note that in 3.3.4 we have labelled every X ∈ A/R by a member of A by virtue of the
fact that any X is an $[a]_R$. We can use a or any (or all) $x \in [a]_R$ to label X.
Two things:
1. The presence of a valid (correct) labelling from a set A ensures that the labelled class is
a set as it —intuitively!— has no more members than the set of labels (I can spend many
—or even all— of available labels on one set of F, but I may not reuse a label, so I have
at least as many labels as there are members in F).
Thus F is as “small” as a set, and thus is a set itself. Some people call Principle 3 the
size limitation doctrine.14
2. Why can’t I use the Principles 0–2 to argue that F, labelled by A, is a set? Well, because
these principles are notorious in not telling me when a stage exists after infinitely many
stages of construction that I might have if, say, I were to build one set for each natural
number:
A0 , A1 , . . . , An , . . .
Suppose the nature of each $A_i$ (for each i ≥ 0) is such that each $A_{i+1}$ is built at a stage
i+1 that is astronomically later than the stage i at which $A_i$ was built.
Thus we get an infinite sequence of stages, wildly apart! How can I justify, just on the
basis of Principles 0–2, the existence of a stage that is after all the stages i, in order to
build the class $\{A_0, A_1, \ldots, A_n, \ldots\}$ as a set?
3.3.7 Theorem A/R is a set for any set A and equivalence relation R on A.
Proof A provides labels for all members of A/R. Now invoke Principle 3.
Now that we have had an excuse to introduce Principle 3 early, and applied it to the easy
example above let us do the following exercise:
3.3.8 Exercise Show that it was not necessary to apply the new Principle to prove 3.3.7.
Specifically show that the Lemma follows by Principles 0–2 implicitly via 2.3.6.
Hint. You will need, of course, to find a superset of A/R, that is, a class X that demon-
strably is a set, and satisfies A/R ⊆ X .
3.3.9 Lemma Let P be an equivalence relation on A. Then [x] = [y] iff x P y —where we
have omitted the subscript P from the [. . .]-notation.
Proof (→) part. By reflexivity, x ∈ [x] (3.3.5). The assumption then yields x ∈ [y] and
therefore y P x by 3.3.4. Symmetry gives us x P y now.
(←) part. Let z ∈ [x]. Then x P z. The assumption yields y P x (by symmetry), thus, transi-
tivity yields y P z. That is, z ∈ [y], proving
[x] ⊆ [y]
By swapping letters we have proved above that y P x implies [y] ⊆ [x]. Now (by sym-
metry) our original assumption, namely x P y, implies y P x, hence also [y] ⊆ [x]. All in all,
[x] = [y].
Proof
(i) 3.3.5.
(ii) Let z ∈ [x] ∩ [y]. Then x Rz and y Rz, therefore x Rz and z Ry (the latter by symmetry)
hence x Ry (transitivity). Thus, [x] = [y] by Lemma 3.3.9.
(iii) The ⊆-part is obvious from [x] ⊆ A. The ⊇-part follows from $\bigcup_{x \in A} \{x\} = A$ and
{x} ⊆ [x].
3.3.12 Remark Often a partition F is given as an indexed family of sets denoted by $(F_a)_{a \in I}$,
where I is the indexing set.
Less informatively we may write $(F_a)_{a \in I}$ as
$\{F_a, F_b, F_c, \ldots\}$
There is a natural affinity between equivalence relations and partitions on a set A. In fact,
3.3.13 Theorem A partition F on a set A leads to the definition of an equivalence
relation P whose equivalence classes are precisely the sets of the partition, that is,
F = A/P.
(iii) P is transitive: Indeed, let x P y P z. Then {x, y} ⊆ X and {y, z} ⊆ Y for some X , Y in
F.
Thus, y ∈ X ∩ Y hence X = Y by 3.3.11(ii). Hence {x, z} ⊆ X , therefore x P z.
So P is an equivalence relation. Let us compare its equivalence classes with the various
X ∈ F.
Now [x] P (denoted without the subscript P in the remaining proof) is
{y : x P y} (2)
Let us compare [x] with the unique X ∈ F that contains x —why unique? By 3.3.11(ii).
Thus,
(2) (1) x∈X is t
y ∈ [x] ⇐⇒ x P y ⇐⇒ x ∈ X ∧ y ∈ X ⇐⇒ y ∈ X
Thus [x] = X .
3.3.14 Example (Another look at congruences) Euclid’s theorem for the division of inte-
gers states, where Z is the set of all integers, negative, positive and 0:
If a ∈ Z and 0 < m ∈ Z, then there are unique q and r such that
a = mq + r
15 Absolute value.
a = m(q + 1) + k
As k < r I have contradicted the minimality of r .
We have proved existence of at least one pair q and r that works for (1). How about
uniqueness? Well, the worst thing that can happen is to have two representations (1). Here
is another one:
$$a = mq' + r' \ \text{ and } \ 0 \le r' < m \tag{2}$$
As both r and r′ are < m, their “distance” (absolute difference) is also < m.
Now, from (1) and (2) we get
$$m\,|q - q'| = |r - r'| \tag{3}$$
OK. How about missing some? We are not, for any a is uniquely expressible as a =
m · q + r, where 0 ≤ r < m. Since m | (a − r), we have $a \equiv_m r$, i.e., (by 3.3.4) a ∈ [r].
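The division theorem and the resulting labelling of congruence classes by remainders can be tried out numerically. The sketch below is in Python and relies on the fact that the built-in `divmod` uses floor division, so that for m > 0 the remainder satisfies 0 ≤ r < m even when a is negative; the function names `euclid_divide` and `congruent` are hypothetical.

```python
def euclid_divide(a, m):
    """Return the unique (q, r) with a == m*q + r and 0 <= r < m, for m > 0."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    return divmod(a, m)  # floor division gives 0 <= r < m when m > 0


def congruent(a, b, m):
    """a and b are congruent modulo m iff m divides a - b."""
    return (a - b) % m == 0


print(euclid_divide(17, 5))   # a positive a
print(euclid_divide(-7, 5))   # a negative a: the remainder is still in 0..4
print(congruent(-2, 3, 5))    # -2 and 3 name the same class modulo 5
```

In languages whose integer division truncates toward zero the remainder of a negative a may be negative, and an adjustment step would then be needed to land in the range 0 ≤ r < m.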
−a = 22197 × 5 + 2
Thus,
a = −22197 × 5 − 2 (1)
(1) can be rephrased as
$a \equiv_5 -2$ (2)
But easily we check that $-2 \equiv_5 3$ (since 3 − (−2) = 5). Thus,
$a \in [-2]_5 = [3]_5$
3.3.16 Exercise Can you now easily write the same a above as
a = Q × 5 + R, with 0 ≤ R < 5?
This section introduces one of the most important kinds of binary relations in set theory and
mathematics in general: The partial order relations.
3.4.1 Definition (Converse or inverse relation of P) For any relation P, the symbol $P^{-1}$
stands for the converse or inverse relation of P and is defined as
$$P^{-1} \overset{\text{Def}}{=} \{(x, y) : y\,P\,x\} \tag{1}$$
3.4.2 Definition (“(a)P” notation) For any relation P we write “(a)P” to indicate the class
(which might fail to be a set) of all outputs of P on (caused by) input a. That is,
$$(a)P \overset{\text{Def}}{=} \{y : a\,P\,y\}$$
3.4.3 Exercise Give an example of a specific relation P and one specific object (set or atom)
a such that (a)P is a proper class.
Thus,
$(a)P^{-1}\uparrow$ iff $a \notin \mathrm{ran}(P)$
and
$(a)P^{-1}\downarrow$ iff $a \in \mathrm{ran}(P)$
3.4.6 Definition (Partial order) A relation P is called a partial order or just an order, iff
it is
(1) irreflexive (i.e., xPy → x ≠ y for all x, y), and
(2) transitive.
It is emphasised that in the interest of generality —for much of this section (until we say
otherwise)— P need not be a set.
Some people call this a strict order as it imitates the “<” on, say, the natural numbers.
3.4.7 Remark (1) We will normally use the symbol “<” in the abstract setting to denote
any unspecified order P, and it will be pronounced “less than”.
It is hoped that the context will not allow confusion with any concrete use of the symbol
< on numbers (say, on the reals, natural numbers, etc.).
(2) If the order < is a subclass of A × A —i.e., it is <: A → A— then we say that < is an
order on A.
(3) It is easy to check and verify that, for any order < and any class B, we have that
< ∩(B × B) is an order on B.
3.4.9 Example The concrete “less than”, <, on N is an order, but ≤ is not (it is not irreflexive).
The “greater than” relation, >, on N is also an order, but ≥ is not. Of course, $> \;=\; <^{-1}$.
In general, it is trivial to verify that P is an order iff $P^{-1}$ is an order. Exercise!
3.4.11 Example The relation ∈ is irreflexive by the well known A ∉ A, for all A. It is not
transitive though. For example, if a is a set (or atom), then a ∈ {a} ∈ {{a}} but a ∉ {{a}}.
So it is not an order.
Let $M = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}\}$. The relation $\varepsilon = {\in} \cap (M \times M)$ is transitive
and irreflexive, hence it is an order (on M). Verify!
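For finite relations the two defining properties of an order can be tested by brute force. The following is an illustrative Python sketch, with relations as sets of pairs and hereditarily finite sets modelled by `frozenset`; the helper names `is_irreflexive`, `is_transitive` and `is_order` are hypothetical. It checks the relation ε on M from the example, and also the failure of transitivity for ∈ on a, {a}, {{a}} with the assumed choice a = ∅.

```python
def is_irreflexive(P):
    """x P y implies x != y."""
    return all(x != y for (x, y) in P)


def is_transitive(P):
    """x P y and y P z imply x P z."""
    return all((x, z) in P for (x, y) in P for (y2, z) in P if y == y2)


def is_order(P):
    """An order is an irreflexive and transitive relation."""
    return is_irreflexive(P) and is_transitive(P)


# The four members of M, built bottom-up.
e = frozenset()
s1 = frozenset({e})
s2 = frozenset({e, s1})
s3 = frozenset({e, s1, s2})
M = {e, s1, s2, s3}

eps = {(x, y) for x in M for y in M if x in y}  # membership restricted to M
print(is_order(eps))

# Membership on {a, {a}, {{a}}} with a taken to be the empty set.
a = e
N = {a, frozenset({a}), frozenset({frozenset({a})})}
mem = {(x, y) for x in N for y in N if x in y}
print(is_transitive(mem))
```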
3.4.13 Example Consider the order ⊂ again. In this case we have none of {∅} ⊂ {{∅}},
{{∅}} ⊂ {∅} or {{∅}} = {∅}. That is, {∅} and {{∅}} are non comparable items. This justifies
the qualification partial for orders in general (Definition 3.4.18).
On the other hand, the “natural” < on N is such that one of x = y, x < y, y < x always
holds for any x, y. That is, all (unordered) pairs x, y of N are comparable under <. This is
a concrete example of a total order (see the “official definition” below: 3.4.19).
While all orders are “partial”, some are total (< above) and others are nontotal (⊂ above).
for all x, y in A.
(2) The definition of ≤ depends on A due to the presence of $\Delta_A$ (the diagonal, i.e., the identity relation, on A). There is no such dependency
on a “reference” class in the case of <.
(3) We remind ourselves once more here that the symbols < and ≤ (and their
pronunciations) do not imply that we are talking about the specific ones on numbers.
It is just a harmless (I hope) notational device, but unless said explicitly otherwise, “<” and
“≤” are any orders.
3.4.15 Lemma For any <: A → A, the associated relation ≤ on A is reflexive, antisym-
metric and transitive.
(a) If x = z, then x ≤ z (see the remark after 3.4.14) and we are done.
(b) The remaining case is x ≠ z. Now, if it is x = y or y = z (but not both (why?)), then
we are done again. So it remains to consider x < y and y < z. By transitivity of < we
get x < z, hence x ≤ z, since < ⊆ ≤.
Proof Since
$$P - \Delta_A \subseteq P \tag{1}$$
(where $\Delta_A$ denotes the diagonal, i.e., the identity relation, on A) it is clear that $P - \Delta_A$ is on A. It is also clear that it is irreflexive. We only need verify that
it is transitive.
So let
(x, y) and (y, z) be in $P - \Delta_A$ (2)
By (1) (or (2))
(x, y) and (y, z) are in P (3)
hence
(x, z) ∈ P
by transitivity of P.
Can $(x, z) \in \Delta_A$, i.e., can x = z? No, for antisymmetry of P and (3) would imply x = y,
i.e., $(x, y) \in \Delta_A$ contrary to (2).
So, $(x, z) \in P - \Delta_A$.
3.4.17 Remark Often in the literature, but decreasingly so, it is the “reflexive order”
≤: A → A that is defined as a “partial order” by the requirements that it is reflexive,
antisymmetric and transitive. Then < is obtained as in Lemma 3.4.16, namely, as “$\le - \Delta_A$” (that is, ≤ with the diagonal of A removed).
Lemmas 3.4.15 and 3.4.16 show that the two approaches are interchangeable, but the “mod-
ern” approach of Definition 3.4.6 avoids the nuisance of having to tie the notion of order to
some particular “field” A (3.1.6).
For us “≤” is the derived notion defined in 3.4.14.
3.4 Partial Orders 77
3.4.18 Definition (PO Class) If < is an order on a class A, we call the informal pair
(A, <)17 a partially ordered class, or PO class.
If < is an order on a set A, we call the pair (A, <) a partially ordered set or PO set.
Often, if the order < is understood as being on the class A or the set A, one says that “A is a PO class” or
“A is a PO set” respectively.
3.4.19 Definition (Linear order) A relation < on A is a total or linear order on A iff it is
(1) An order, and
(2) For any x, y in A one of x = y, x < y, y < x holds —this is the so-called “tri-
chotomy” property.
If A is a class, then the informal pair (A, <) is a linearly ordered class —in short, a LO
class.
If A is a set, then the pair (A, <) is a linearly ordered set —in short, a LO set.
One often calls just A a LO class or LO set (as the case warrants) when < is understood
from the context.
3.4.20 Example The standard <: N → N is a total order, hence (N, <) is a LO set.
3.4.21 Definition (Minimal and minimum elements) Let < be an order and A some class.
17 Formally, (A, <) is not an ordered pair since A may be a proper class and we do not allow class
members (e.g., in {A, {A, <}}) to be proper classes. We may think then of “(A, <)” as informal
notation that simply “ties” A and < together. Alternatively, if we are really determined to have
class pairs (we are not!), we can define pairing with proper classes as components, for example as
$(A, B) \overset{\text{Def}}{=} (A \times \{0\}) \cup (B \times \{1\})$. For our part we will have no use for such formality, and will
consider (A, <) in only the informal sense.
3.4.22 Remark In particular, if a (∈ A) is not in the field dom(<) ∪ ran(<) (cf. 3.1.7) of
<, then a is both <-minimal and <-maximal in A. For example, (∃x ∈ A)x < a is false in
this case since if, for some x, we have x ∈ A and also x < a, then a ∈ ran(<); impossible.
Because of the duality between the notions of minimal/maximal and minimum/maximum,
we will mostly deal with the <-notions whose results can be trivially translated for the >-
notions.
Note how the notation learnt from 3.4.2 and 3.4.1 and 3.4.4 can simplify
Note that (1) says that no x is in both A and (a) >.18 That is, a is <-minimal in A iff
3.4.23 Example 0 is minimal, also minimum, in N with respect to the “standard” ordering.
In P (N), ∅ is both ⊂-minimal and ⊂-minimum. On the other hand, all of {0}, {1}, {2}
are ⊂-minimal in P (N) − {∅} but none are ⊂-minimum in that set.
Observe from this last example that minimal elements in a class are not unique.
3.4.24 Remark (Hasse diagrams) There is a neat pictorial way to depict orders on finite
sets known as “Hasse diagrams”. To do so one creates a so-called “graph” of the finite PO
set (A, <) where A = {a1 , a2 , . . . , an }.
How? The graph consists of n nodes —which are drawn as points— each labeled by one
ai . The graph also contains 0 or more arrows that connect nodes. These arrows are called
edges.
When we depict an arbitrary R on a finite set like A we draw one arrow (edge) from ai
to a j iff the two relate: ai Ra j .
In Hasse diagrams for PO sets (A, <) we are more selective: We say that b covers a iff
a < b, but there is no c such that a < c < b. In a Hasse diagram we will
So, let us have A = {1, 2, 3} and < = {(1, 2), (1, 3), (2, 3)}.
The above has a minimum (1) and a maximum (3) and is clearly a linear order.
A slightly more complex one is this (A, <), where A = {1, 2, 3, 4} and
< = {(1, 2), (4, 2), (2, 3), (1, 3), (4, 3)}
This one has a maximum (3), two minimal elements (1 and 4) but no minimum, and is not
a linear order: 1 and 4 are not comparable.
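The covering relation and the minimal, minimum and maximum elements of the two small examples can be computed directly. This is a hedged Python sketch: it assumes that the edges of a Hasse diagram are exactly the covering pairs, that m is minimum in A when m ≤ x for every x in A, and that a is minimal when no x in A satisfies x < a; the function names are hypothetical.

```python
def covers(A, lt):
    """Pairs (a, b) with a < b and no c in A such that a < c < b."""
    return {(a, b) for (a, b) in lt
            if not any((a, c) in lt and (c, b) in lt for c in A)}


def minimal_elements(A, lt):
    """Members a of A with no x in A such that x < a."""
    return {a for a in A if not any((x, a) in lt for x in A)}


def minimum(A, lt):
    """The m in A with m <= x for all x in A, or None if there is none."""
    for m in A:
        if all(m == x or (m, x) in lt for x in A):
            return m
    return None


def converse(lt):
    """The relation > obtained by reversing every pair of <."""
    return {(y, x) for (x, y) in lt}


# The second example of the text.
A = {1, 2, 3, 4}
lt = {(1, 2), (4, 2), (2, 3), (1, 3), (4, 3)}

print(covers(A, lt))             # the pairs joined by an edge under the stated assumption
print(minimal_elements(A, lt))   # two minimal elements
print(minimum(A, lt))            # no minimum
print(minimum(A, converse(lt)))  # the maximum, found as the minimum of the converse
```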
Let
Aa<m (ii)
For all x ∈ A, I have the negation of “x < m”, that is, I have ¬x < m. (1)
But from “our previous math” (high school? university?) ¬x < m is equivalent to m ≤ x.
Thus (1) says (∀x ∈ A)m ≤ x, in other words, m is the minimum in A.
Do you believe this? (Don’t!) If the order is not total, then the disjunction of x < m, x =
m, m < x may fail to be true, and thus ¬m < x and x < m ∨ x = m are not necessarily
equivalent. See also the counterexample to such expectation in 3.4.13 and also 3.4.23.
3.4.27 Lemma If < is a linear order on A, then every minimal element is also minimum.
Proof The “false proof” of the previous example is valid under the present circumstances.
The following type of relation has fundamental importance for set theory, and mathemat-
ics in general.
3.4.28 Definition 1. An order < satisfies the minimal condition, in short it has MC, iff
every nonempty A has <-minimal elements.
2. If a total order <: B → B has MC, then it is called a well-ordering19 on (or of) the class
B.
3. If (B, <) is a LO class (or set) with MC, then it is a well-ordered class (or set), or WO
class (or WO set).
19 The term “well-ordering” is ungrammatical, but it is the terminology established in the literature!
3.4.29 Remark
1. What Definition 3.4.28 says in case 1. is —see (2) in 3.4.22— “if, for some fixed order
< the following statement
is true in set theory, for any A, then we say that < has MC ”.
The following observation is very important for future reference:
If A is given via a defining property F(x), as
$$A \overset{\text{Def}}{=} \{x : F(x)\}$$
Conversely, for each formula F(x) we get a class A = {x : F(x)} and thus if < has MC,
then we may express this fact as in (2) above.
2. Much is to be gained in applications by allowing slightly more generality to the concept
of MC by not requiring the relation that is so equipped to be an order. To this end we
will define the counterpart concepts for <-minimal (of Definition 3.4.21) and will also
generalise Definition 3.4.28 below by Definition 3.4.32 below.
3.4.30 Definition This time we are not postulating that P is on A nor that it is an order.
3.4.31 Remark
$$(a)\,P\,|\,A = \begin{cases} \emptyset & \text{if } a \notin A \\ A \cap (a)P & \text{otherwise} \end{cases}$$
Indeed, in the first case we cannot have an x such that a P | A x since this requires
$(a, x) \in A^2$, which is untenable under the condition for a. If, on the other hand, a ∈ A, then
we will include in (a) P | A all those x that are in both (a)P = {x : aPx} and A (cf.
3.1.4, Notation 2).
3.4.32 Definition A relation T (that is not necessarily an order) satisfies the minimal
condition, in short it has MC, iff every nonempty A has T-minimal elements in the sense
that a t ∈ A exists such that there is no t′ ∈ A satisfying t′ T t.
3.4.33 Remark Definition 3.4.32 has a formulation identical to (1) of 3.4.29, although it is
here for the general relation with MC —T— as opposed to an order < with MC:
Often one works in a class A other than the class of everything, U (A might still be a proper
class). It is then useful to “relativise” a relation P to A and perhaps even have this restriction
—because a relational restriction is what we have in mind— have MC even if, perhaps, the
unrelativised P does not. Thus we have
3.4.34 Definition (Relations with MC over, or relative to a class) We say that a relation
P has MC over (or on, or in, or relative to) a class A if P | A does.
3.4.35 Proposition (MC over a Class Test) A relation P has MC over a class A iff the
schema
$$\emptyset \ne B \subseteq A \to (\exists b \in B)\, B \cap (b)P^{-1} = \emptyset \tag{1}$$
is true in set theory.
Proof That P has MC over A means that P | A has MC, that is, the schema
$$\emptyset \ne B \to (\exists b \in B)\, B \cap (b)\,P^{-1}\,|\,A = \emptyset \tag{2}$$
is true in set theory. We will prove that we have (1) iff we have (2). There are two directions
to verify this “iff”.
(I) Assume (1) and prove (2): Towards proving (2) start with the assumption for (2): B ≠ ∅.
∅ ≠ B ∩ A ⊆ A (2′)
3.4.31 2., and mindful of B ∩ A = B, the right hand side of “→” in (2) —that is, (2 )—
becomes (∃b ∈ B)B ∩ (b)P−1 = ∅. This proves (1) due to the “let” at the onset of this
(2) → (1) case.
3.4.36 Remark In practice, the minimal condition (MC) of an order or, indeed, of an arbi-
trary relation P is usually taken relative to a class A, often a set class.
Thus it is important to reformulate (1) of Proposition 3.4.35 that succinctly states that a
relation P (not necessarily an order) has MC over a class A.
3.4.37 Corollary Let P be a relation with MC over A. (1) of 3.4.35 is equivalent to the
truth of the schema below —where A and F[x] are arbitrary, but A in any one application
of MC is fixed as the class inside which we do mathematics.
(∃b ∈ A)F[b] → (∃b ∈ A)(F[b] ∧ ¬(∃x ∈ A)(F[x] ∧ xPb)) (†)
Proof (I) Assume (1) in 3.4.35 and prove (†). To this end assume the hypothesis
(∃b ∈ A)F[b] (‡)
and let
B ≝ A ∩ {x : F[x]} (¶)
By (‡) and (¶) we have ∅ ≠ B ⊆ A, thus, by (1) quoted above we get (∃b ∈ B)²⁰ B ∩
(b)P⁻¹ = ∅. This translates (in terms of F[x]) into (∃b ∈ A)(F[b] ∧ ¬(∃x)(x ∈ B ∧
bP⁻¹x)), which after further translation (replacing “x ∈ B” and “bP⁻¹x”) becomes
(∃b ∈ A)(F[b] ∧ ¬(∃x ∈ A)(F[x] ∧ xPb)) (¶¶)
(††) implies —in terms of F[x]— (∃b ∈ A)F[b], which is the same as the hypothesis of
the implication in (†). Since (†) is assumed, we have its conclusion part (see (¶¶) above)
—i.e., it is true under assumption (††). Let us express it without using the notation that
employs “F[x]”. We observe
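$$\overbrace{(\exists b \in A)\,F[b]\,\land}^{(\exists b \in B)}\;\neg\,\overbrace{(\exists x \in A)\,F[x]\,\land}^{(\exists x \in B)}\;\underbrace{xPb}_{x \in (b)P^{-1}}$$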
In short, (∃b ∈ B)B ∩ (b)P−1 = ∅. We have just shown the truth of the conclusion part
of the implication (1) in 3.4.35 as we had set out to do.
3.5 Functions
At last! We consider here a special case of relations, the ones we know as “functions”. Many
of you know already that a function is a relation with some special properties.
20 That is, (∃b)(b ∈ B ∧ . . .) or, equivalently, (∃b)(b ∈ A ∧ F[b] ∧ . . .) or, equivalently, (∃b ∈
A)(F[b] ∧ . . .) the “…” part being, before further translation, “B ∩ (b)P−1 = ∅”.
3.5.1 Definition A function R is a single-valued relation. That is, whenever we have both
x Ry and x Rz, we will also have y = z.
It is traditional to use, generically, lower case letters from among f , g, h, k to denote
functions but this is by no means a requirement.
Another way of putting it, using the notation from 3.4.2, is: A relation R is a function iff
(a)R is either empty or contains exactly one element.
3.5.2 Example The empty set is a relation of course, the empty set of pairs. It is also a
function since
(x, y) ∈ ∅ ∧ (x, z) ∈ ∅ → y = z
vacuously, by virtue of the left hand side of → being false.
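Definition 3.5.1 can be tested mechanically on finite relations. The following is a minimal Python sketch under our own assumptions: a relation is a finite set of pairs, and the name `is_function` is ours, not the text's.

```python
def is_function(relation):
    """Return True iff the finite relation (a set of pairs) is single-valued."""
    outputs = {}
    for x, y in relation:
        # A second pair (x, z) with z != y violates single-valuedness.
        if x in outputs and outputs[x] != y:
            return False
        outputs[x] = y
    return True


print(is_function({(1, 2), (2, 7)}))   # single-valued
print(is_function({(1, 2), (1, 3)}))   # 1 is related to both 2 and 3
print(is_function(set()))              # the empty relation, vacuously a function
```

The empty set passes because the loop body never runs, which mirrors the vacuous argument of Example 3.5.2.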
But we also have an annoying difference in notation that is used extremely widely:
Mathematicians normally prefer to write f (a) = b instead of (a) f = {b} and f (a) ↑
(undefined at a) if (a) f = ∅.
The qualifier “normally” indicates frequency, but also allows some authors to differ:
Notably, Kurosh (1963) writes “a f ” for relations and functions, even omitting the brackets
around the input a.
We will follow the “normally preferred” notation for functions — f (a)— in this work and
will give reasons for this “preference” —notation “ f (a)” over “(a) f ”— when we consider
the composition of functions below.
Worth recording: If b is such that a f b or (a, b) ∈ f and f is a function, then seeing that
b is unique we have (a) f = {b}.
The relationship between “functional notation” vs. “relational notation” is summarised
below.
f(a) = b (functional notation) iff (a)f = {b} (relational notation)
and
f(a) ↑ (functional notation) iff (a)f = ∅ (relational notation)
3.5.4 Definition (Images and Inverse Images) The set of all outputs of a function, when
the inputs come from a particular set X , is called the image of X under f and is denoted
by f [X ]. Thus,
f[X] ≝ {f(x) : x ∈ X} (1)
Note that careless notation like f(X), found in many discrete mathematics texts, will not do:
it means that the input is X itself. If we want the inputs to come from inside X, then we must
not use the round-bracket notation.
The inverse image of a set Y under a function is useful as well, that is, the set of all inputs
that generate f -outputs exclusively in Y . It is denoted by f −1 [Y ] and is defined as
f⁻¹[Y] ≝ {x : f(x) ∈ Y} (2)
Pause. So far we have been giving definitions regarding functions of one variable. Or have
we?
Not really: We have already said that the multiple-input case is subsumed by our notation.
If f : A → B and A is a set of n-tuples, then f is a function of “n-variables”, essentially. The
binary relation that is the alias of f contains pairs like $((\vec{x}_n), x_{n+1})$. However, we usually
abuse notation and write $(\vec{x}_n)f$ instead of $((\vec{x}_n))f$ and $f(\vec{x}_n)$ instead of $f((\vec{x}_n))$.
Let now g = {(1, 2), ({1, 2}, 2), (2, 7)}, clearly a function. Thus, g({1, 2}) = 2, but
g[{1, 2}] = {2, 7}. Also, g(5) ↑ and thus g[{5}] = ∅.
On the other hand, g −1 [{2, 7}] = {1, {1, 2}, 2} and g −1 [{2}] = {1, {1, 2}}, while
g −1 [{8}] = ∅ since no input causes output 8.
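The values computed for g above can be rechecked with a short Python sketch. It assumes a finite function stored as a dict, with the set {1, 2} represented by a `frozenset` so that it can serve as an input; the helper names `image` and `inverse_image` are ours.

```python
def image(f, X):
    """f[X]: the outputs f(x) for those x in X at which f is defined."""
    return {f[x] for x in X if x in f}


def inverse_image(f, Y):
    """f^{-1}[Y]: all inputs x with f(x) in Y."""
    return {x for x, y in f.items() if y in Y}


one_two = frozenset({1, 2})            # the set {1, 2} used as a single input
g = {1: 2, one_two: 2, 2: 7}           # g = {(1, 2), ({1, 2}, 2), (2, 7)}

print(g[one_two])                      # g({1, 2}): the input is the set itself
print(image(g, {1, 2}))                # g[{1, 2}]: inputs taken from inside the set
print(image(g, {5}))                   # g(5) is undefined, so the image is empty
print(inverse_image(g, {2, 7}))
print(inverse_image(g, {8}))
```

Round brackets correspond to a dict lookup on the set as one object, while square brackets range over its members, which is exactly the distinction between g({1, 2}) and g[{1, 2}].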
3.5.7 Example We saw that (3.5.3) F(a) = ∅ means (a)F = {∅}, that is, (a, ∅) ∈ F or aF∅
—not what one might hastily conclude it means.
We have F(a) ↓ here, with output the object “∅”, that is, it is not the case that F(a) ↑.
The following is quite useful in set theory and even has a nickname, “(the Principle of)
Replacement”.21
3.5.8 Theorem If F is a function (possibly a proper class of pairs), and A is a set, then
F[A] is a set.
Proof Let
∅ ≠ Y = F[A] (†)
Thus, for every y, y ∈ Y iff for some x ∈ A, F(x) = y.
In short, each y ∈ Y is labelled, in the sense of 3.3.6, by all the x ∈ A with the
property F(x) = y.
• No member of Y is without an A-label (by (†)). These labels are in A and in F⁻¹[{y}],
thus
A ∩ F⁻¹[{y}] is nonempty (1)
• The set A ∩ F⁻¹[{y}] has no repeated members (being a set), thus the labels assigned to
y are distinct, and more importantly
• If y ≠ y′ are both in Y, then they receive non-overlapping labels because F⁻¹[{y}] ∩
F⁻¹[{y′}] = ∅.
Why? Because if z ∈ F⁻¹[{y}] ∩ F⁻¹[{y′}], then F(z) = y and F(z) = y′; impossible
for a function.
Proof Exercise!
Some texts in discrete mathematics and also in calculus will say “let f (x) be a function …”
Well, “ f (x)” is not a function. Correctly it is known variously as a function call, or
function application or function invocation. “ f ” is the function here; a set or “table” —
possibly infinite— of input/output pairs. Thus f is a set and f (x) is an output value (when
the input is x).
Computer programmers are very much aware of the distinction between a function call
f(x) and a function definition for f, the latter being defined intensionally (by behaviour)
rather than extensionally (by explicit listing as a set or table of input/output pairs).
This intensional definition of the input/output behaviour of a function f is done by a
program. Luckily, unlike tables, all programs are finite in length and so can fit into a computer!
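As an illustration only, here is how the two kinds of description might look in Python. The names `successor_table` and `successor` are ours, and the table is necessarily a finite fragment of the successor function.

```python
# Extensional: the function as an explicit (necessarily finite) table of pairs.
successor_table = {0: 1, 1: 2, 2: 3}


# Intensional: the function given by a finite rule that covers every natural number.
def successor(x):
    return x + 1


# On the inputs that the table lists, the two descriptions agree.
print(all(successor(x) == y for x, y in successor_table.items()))
# successor(10**6) is a call (an output value); `successor` itself is the function.
print(successor(10**6))
```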
In mathematics we often want to say “let f be a function of input variables x and z …”,
but this does not excuse saying it incorrectly as “let f(x, z) be a function”; it is not!
We can say instead “let (or consider) λx z. f (x, z)”. This names both the function f and
its input variables x, z. This is known as Church’s λ-notation and is a by-product of his
foundation of computability via “λ calculus”.
In general, the notation
λx₁x₂…xₙ.rule (1)
denotes a function with input variables x₁, x₂, …, xₙ and output computed according to the
“rule” following the end-of-input dot. We can use “vector notation” for the input list and
write $\vec{x}_n$ or just $\vec{x}$, if n is understood or is unimportant, for x₁, x₂, …, xₙ. Then (1) morphs
into
$\lambda\vec{x}_n.\textit{rule}$ (1′)
Examples:
1. λx.x + 1 but also λy.y + 1 and λu.u + 1. The successor function over the natural
numbers. The variables “x”, “y” and “u” are not able to accept substitutions, unlike the “x”
in “x + 1” or “x² − 30x + 5”.
That is, they are “bound” or “dummy” variables just as x is in the expression $\sum_{x=1}^{100} x^2$.
2. λxw.w². This function inputs x and w, then ignores x and returns w² as output.
3. We can give a short (letter) name to a function as always. Thus, we can say
“Let f = λxw.w²”. Then f(x, w) = w² and f(5, 2) = 2² = 4.
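Python's `lambda` is a direct descendant of Church's notation, so the three examples can be transcribed almost verbatim. This is a sketch; the names `succ_x`, `succ_y` and `f` are ours.

```python
succ_x = lambda x: x + 1        # λx.x + 1
succ_y = lambda y: y + 1        # λy.y + 1: the same function, bound variable renamed
f = lambda x, w: w ** 2         # λxw.w²: inputs x and w, ignores x

# Renaming the bound variable does not change the input/output behaviour.
print(all(succ_x(n) == succ_y(n) for n in range(100)))
print(f(5, 2))
```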
When f (a) ↓, then f (a) = f (a) as is naturally expected. What happens when f (a) ↑?
This begs a more general question that we settle as follows:
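3.5.11 Definition (Kleene Equality) Kleene (Kleene (1943)) extended equality to include
the case when the two sides of “=” are calls $f(\vec{a})$ and $g(\vec{b})$ that are both undefined. For such
cases we validate the equality. In symbols
$$f(\vec{a}) = g(\vec{b}) \;\equiv\; \underbrace{f(\vec{a})\uparrow \,\land\, g(\vec{b})\uparrow}_{\text{Case 1}} \;\lor\; \underbrace{(\exists z)\big(f(\vec{a}) = z \,\land\, g(\vec{b}) = z\big)}_{\text{Case 2}}$$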
There is no universal agreement in the literature as to whether or not to use a new symbol
for the extended equality. We will not do so, but those (publications, not individuals)
who do, use “≃” as in $f(\vec{a}) \simeq g(\vec{b})$.
3.5.12 Example Let g = {(1, 2), ({1, 2}, 2), (2, 7)}. Then, g(1) = g({1, 2}) and g(1) ≠
g(2). Moreover, g(3) = g(4) (both undefined).
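The two cases of Definition 3.5.11 translate into a small test. The sketch below assumes partial functions stored as dicts, with “undefined at a” meaning that a is not a key; the name `kleene_equal` is ours.

```python
def kleene_equal(f, a, g, b):
    """Kleene equality of the calls f(a) and g(b) for partial functions given as dicts."""
    f_defined, g_defined = a in f, b in g
    if not f_defined and not g_defined:
        return True                    # Case 1: both calls are undefined
    if f_defined and g_defined:
        return f[a] == g[b]            # Case 2: both defined, with the same value z
    return False                       # exactly one call is undefined


one_two = frozenset({1, 2})
g = {1: 2, one_two: 2, 2: 7}           # the g of Example 3.5.12

print(kleene_equal(g, 1, g, one_two))  # g(1) = g({1, 2})
print(kleene_equal(g, 1, g, 2))        # g(1) and g(2) differ
print(kleene_equal(g, 3, g, 4))        # both undefined
print(kleene_equal(g, 1, g, 3))        # defined versus undefined
```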
3.5.13 Definition A function f is 1-1 if for all x, y and z where f (x) = f (y) = z we
obtain x = y. We can also say f is 1-1 iff x f z and y f z imply x = y.
The presence of z (the definedness at x and y) in 3.5.13 ensures that we will not expect
anything unreasonable, like 3 = 4, in the context of a 1-1 function f where f (3) ↑= f (4) ↑.
3.5.14 Example {(1, 1)} and {(1, 1), (2, 7)} are 1-1. {(1, 0), (2, 0)} is not. ∅ is 1-1 vacu-
ously.
3.5.15 Exercise Prove that if f is a 1-1 function, then the relation converse f −1 is a function
(that is, single-valued).
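Before attempting the proof, one can experiment on finite examples. The sketch below is an experiment, not a proof: it assumes functions stored as dicts, and the helper names are ours.

```python
def is_one_to_one(f):
    """f is a finite function given as a dict; 1-1 means no output is hit twice."""
    outputs = list(f.values())
    return len(outputs) == len(set(outputs))


def converse(f):
    """The converse relation of f, as a set of pairs (y, x)."""
    return {(y, x) for x, y in f.items()}


def is_single_valued(relation):
    """A set of pairs is single-valued iff no first component occurs in two pairs."""
    firsts = [x for x, _ in relation]
    return len(firsts) == len(set(firsts))


for f in ({1: 1}, {1: 1, 2: 7}, {1: 0, 2: 0}, {}):
    print(f, is_one_to_one(f), is_single_valued(converse(f)))
```

On the four functions of Example 3.5.14 the two tests give the same answer in every case.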
The terminology is derived from the fact that every element of A is paired with precisely
one element of B and vice versa.
In particular,
f ◦ g is also a function.
Indeed, if we have
x f ◦ g y and x f ◦ g y′
then
for some z, x f z g y (2)
and
for some w, x f w g y′ (3)
As f is a function, (2) and (3) give z = w. In turn, this (g is a function too) gives y = y′.
The notation (as in 3.4.2) “(a) f ” for relations is awkward when applied to functions in
the presence of composition. In something like
x→ f →z→ g →y
that represents (1) above, note that f acts first. Its result or output z = f(x) is then input
to g, that is, we perform
g(z) = g(f(x))
to obtain output y. Thus the first acting function f is called first with argument x and after
that g is called with argument f(x). “Everyday math” notation places the two calls as in
the displayed formula above: the first call to the right of the second call, an order reversal
vis-à-vis relational notation!
So, set theory heeds these observations and defines:
3.5.19 Definition (Composition of Functions; Notation) We just learnt (3.5.18) that the
composition of two functions produces a function. The present definition is about notation
only.
Let f : A → B and g : B → C be two functions. The relation f ◦ g : A → C, their
relational composition is given in 3.1.16.
For composition of functions, we have the alternative —so-called functional notation
for composition: If f , g are functions then we may use “g f ” to stand for “ f ◦ g”. Note the
order reversal and the absence of “ ◦”, the composition symbol.
In particular we write (g f )(a) for (a)( f ◦ g) —cf. 3.5.3.
Above we said “alternative”, not exclusive. For functions we have two possible notations
for composition: relational and functional.
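Thus
$$a\,(gf)\,y \;\overset{\text{Def}}{\Longleftrightarrow}\; a\;f\circ g\;y \;\Longleftrightarrow^{\,22}\; (\exists z)(a\,f\,z \land z\,g\,y)$$
also
$$a\,(gf)\,y \;\overset{\text{Def}}{\Longleftrightarrow}\; a\;f\circ g\;y \;\overset{\text{Def 3.4.2}}{\Longleftrightarrow}\; (a)(f\circ g) = \{y\}$$
In particular, we have that (a)(f ◦ g) of 3.4.2 is the same as (gf)(a) = g(f(a)), as seen
through the “computation”
Conclusion:
$$(gf)(a) \;\overset{\text{3.5.19 via 3.5.3}}{=}\; (a)(f\circ g) \;\overset{(1)}{=}\; g(f(a))$$
Thus the “reversal” gf = f ◦ g now makes sense! So does (gf)(a) = g(f(a)).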
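The order reversal can be made concrete in a few lines of Python. This is a sketch under our own assumptions: functions are dicts, and the sample f and g are hypothetical.

```python
def relational_compose(f, g):
    """f ◦ g in relational notation: f acts first, then g. Functions are dicts."""
    return {x: g[f[x]] for x in f if f[x] in g}


def functional_compose(g, f):
    """gf in functional notation: (gf)(a) = g(f(a)); the same object as f ◦ g."""
    return relational_compose(f, g)


f = {1: 'a', 2: 'b', 3: 'c'}           # hypothetical f : A → B
g = {'a': 10, 'b': 20}                 # hypothetical g : B → C, undefined at 'c'

gf = functional_compose(g, f)
print(gf)
print(all(gf[a] == g[f[a]] for a in gf))
```

The composite is undefined at 3 because g is undefined at f(3), so gf need not be total even when f is.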
Proof Exercise!
3.5.21 Example The identity relation on a set A is a function since (a)1 A is the singleton
{a}.
The following interesting result connects the notions of ontoness and 1-1ness with the
“algebra” of composition.
g f = 1A (1)
We say that g is a left inverse of f and f is a right inverse of g. “A” because these are not
in general unique! Stay tuned on this!
Proof About g: Our goal, ontoness, means that, for each x ∈ A, I can “solve the equation
g(y) = x for y”. Indeed I can: By definition of 1 A ,
$$g(f(x)) \;\overset{3.5.19}{=}\; (gf)(x) \;\overset{(1)}{=}\; 1_A(x) = x$$
3.5.23 Example The above is as much as can be proved. For example, say A = {1, 2} and
B = {3, 4, 5, 6}. Let f : A → B be {(1, 4), (2, 3)} and g : B → A be {(4, 1), (3, 2), (6, 1)},
or in friendlier notation
f(1) = 4
f(2) = 3
and
g(3) = 2
g(4) = 1
g(5) ↑
g(6) = 1
f˜(1)= 6
f˜(2)= 3
and
g̃(3) = 2
g̃(4) = 1
g̃(5) = 1
g̃(6) = 1
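The four tables above can be checked mechanically. The following sketch assumes dicts for the functions and uses our own names `f_tilde`, `g_tilde` and `is_left_inverse`; it simply tests the equation gf = 1_A for each combination and reports what it finds.

```python
A = {1, 2}
B = {3, 4, 5, 6}

f = {1: 4, 2: 3}
g = {3: 2, 4: 1, 6: 1}                  # g(5) is undefined
f_tilde = {1: 6, 2: 3}
g_tilde = {3: 2, 4: 1, 5: 1, 6: 1}


def is_left_inverse(g, f, A):
    """True iff gf = 1_A, that is, g(f(a)) = a for every a in A."""
    return all(a in f and f[a] in g and g[f[a]] == a for a in A)


for left in (g, g_tilde):
    for right in (f, f_tilde):
        print(is_left_inverse(left, right, A))

print(set(g) == B)                      # is g total on B?
print(set(f.values()) == B)             # is f onto B?
```

All four combinations satisfy the equation, which illustrates that left and right inverses are not unique; at the same time g is neither total nor 1-1 and f is not onto.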
3.5.25 Theorem Let f : A → B be total and 1-1. Then there is an onto g : B → A such
that g f = 1 A .
Proof Consider the converse (also called “inverse”) relation (3.4.1) of f —that is, the
relation f −1 — and call it g:
x g y iff (by Def) y f x (1)
By Exercise 3.5.15, g : B → A is a (possibly nontotal) function so we can write (1) as
g(x) = y iff f(y) = x, from which, substituting f(y) for x in g(x) we get g(f(y)) = y,
for all y ∈ A, that is gf = 1_A, hence g is onto by 3.5.22. We got both statements that we
needed to prove.
3.5.27 Theorem Let f : A → B be onto. Then there is a total and 1-1 g : B → A such
that f g = 1 B .
3.5.28 Remark (The Axiom of Choice) Strictly speaking, the argument in 3.5.27 is flawed.
“Choose one c ∈ f −1 [{b}], for each b”? How?
Well, “for each b, there is at least one c ∈ f −1 [{b}].23 For each b, write one such down!”
Hmm! If that process were finite I’d be willing to go along, and say, “oh well! A proof is
a finite sequence of statements. Like the one above, in quotes. So I accept it”.
But then I thought: “wait a minute!” If B is infinite, intuitively speaking, then this “proof”
never ends! But there is no such thing as a never-ending proof!
Is there a way out of this difficulty? Answer: Only if we could somehow describe these
infinitely many choices in a finite proof !
For example, if all the sets f −1 [{b}] are subsets of N, and since they are nonempty, we
could say “in each f −1 [{b}] pick the smallest natural number therein!” This simple phrase
well-defines exactly what to do and how to effect each choice, and describes so in a finite
way, avoiding an “infinite proof”.
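The “pick the smallest natural number” rule is a finite description, so it can be written as a program. The sketch below works on a finite window of N only, and both the sample f and the name `right_inverse_by_least_element` are our assumptions.

```python
def right_inverse_by_least_element(f, domain, B):
    """Define g(b) = min f^{-1}[{b}]; needs f onto B and natural-number inputs."""
    g = {}
    for b in B:
        preimage = [n for n in domain if f(n) == b]   # f^{-1}[{b}], nonempty by ontoness
        g[b] = min(preimage)                          # the finitely described choice
    return g


f = lambda n: n % 3                    # hypothetical onto function to B = {0, 1, 2}
B = {0, 1, 2}
g = right_inverse_by_least_element(f, range(12), B)

print(g)
print(all(f(g[b]) == b for b in B))    # fg = 1_B
```

No appeal to choice is needed here because `min` names one specific element of each nonempty set of naturals.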
Russell once illustrated this problem by contrasting two tasks: choosing one sock from each
pair of an infinite set of pairs of socks, and choosing one shoe from each pair of a similarly
infinite set of pairs of shoes.
For shoes the method is simply described: “pick the left shoe in each pair”. This totally
defines in a finite manner how each of the infinitely many choices can be effected; precisely.
In the absence of a method (back then24 ) of identifying a left from a right sock there is
no obvious finite method to avoid the obvious infinite “proof/construction”: “Pick a sock
pair; pick a sock from it. Pick another sock pair; pick a sock from it. Etc., forever!” Set
theorists (reluctantly at first! for a discussion see Wilder (1963)) adopted an axiom —called
the Axiom of Choice— that postulates
If S is an infinite family of nonempty sets, then there exists a “choice function” C that
“chooses” one element from each set A ∈ S, in the precise technical sense: C(A) ∈ A,
for all A ∈ S.
This axiom is applicable to our { f −1 [{b}] : b ∈ B} since all f −1 [{b}] are nonempty
by ontoness of f . Thus the “construction” of g(b) in the proof above —“To define g(b)
choose one c ∈ f −1 [{b}] and set g(b) = c”— is legitimised by the Axiom of Choice by
23 Since f is onto.
24 The illustration with the socks would not be valid with certain branded socks nowadays that have
the brand insignia so positioned as to identify left and right socks in a pair.
3.6 Finite and Infinite Sets 95
Broadly speaking (that is, with very little detail contained in what I will say next) we have
sets that are finite, intuitively meaning that we can count all their elements in a finite
amount of time (but see Remark 3.6.3 below), and those that are not, naturally
called infinite!
What is a mathematical way to say all this?
Any counting process of the elements of a finite set A will have us say out loud —every
time we pick or point at an element of A— “0th”, “1st”, “2nd”, etc., and, once we reach and
pick the last element of the set, we finally pronounce “nth”, for some appropriate n that we
reached in our counting (again, see 3.6.3).
Thus, mathematically, we are pairing each member of the set with a member from
{0, . . . , n}.
So we propose,
3.6.1 Definition (Finite and infinite sets) A set A is finite iff it is either empty, or is in 1-1
correspondence with {x ∈ N : x ≤ n}. This “normalised” small set of natural numbers we
usually denote by {0, 1, 2, . . . , n}.
If a set is not finite, then it is, by definition, infinite.
3.6.2 Example For any n, {0, . . . , n} is finite since, trivially, {0, . . . , n} ∼ {0, . . . , n} using
the identity function on the set {0, . . . , n}.
3.6.3 Remark One must be careful when one attempts to explain finiteness via counting
by a human.
For example, Achilles25 could count infinitely many objects by constantly accelerating
his counting process as follows:
He procrastinated for a full second, and then counted the first element. Then, he counted
the second object exactly 1/2 a second after the first. Then he got to the third element
1/2² seconds after the previous, …, he counted the n-th item at exactly 1/2ⁿ⁻¹ seconds after
the previous, and so on forever.
Hmm! It was not “forever”, was it? After a total of 2 seconds he was done!
You see (as you can easily verify from your calculus knowledge (limits)),²⁶
$$1 + \frac{1}{2} + \frac{1}{2^2} + \ldots + \frac{1}{2^{n-1}} + \ldots = \frac{1}{1 - 1/2} = 2$$
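The partial sums can be computed exactly, which shows how quickly Achilles approaches the 2-second mark. This is a small sketch with exact rational arithmetic; the function name `elapsed` is ours.

```python
from fractions import Fraction


def elapsed(n):
    """Exact time in seconds at which the n-th item is counted: 1 + 1/2 + ... + 1/2^(n-1)."""
    return sum(Fraction(1, 2 ** k) for k in range(n))


for n in (1, 2, 3, 10):
    # The remaining gap to 2 seconds halves with each item counted.
    print(n, elapsed(n), 2 - elapsed(n))
```

Each partial sum equals 2 − 1/2ⁿ⁻¹, so every item is counted strictly before the 2-second mark and the gap tends to 0.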
Case 1. n₀ ∈ H. Then removing all pairs (a, n₀) from f (all these have a ∈ H) we
get a new function f′ : X₀ − H → {0, 1, . . . , n₀ − 1}, which is still onto as we
only removed inputs that cause output n₀. Moreover, X₀ − H ⊊ {0, 1, . . . , n₀ − 1}.
(Why?)
This contradicts minimality of n₀ since n₀ − 1 works too!
26 $1 + \frac{1}{2} + \frac{1}{2^2} + \ldots + \frac{1}{2^{n-1}} = \frac{1 - 1/2^n}{1 - 1/2}$. Now let n go to infinity at the limit.
Case 2. n₀ ∉ H.
If n₀ ∉ X₀, then we argue exactly as in Case 1 and we just remove the base “H”
of the cone (in the picture) from X₀.
Otherwise, we have two subcases:
We simply transform the picture to the one below, “correcting” f to have f(a) = m
and f(n₀) = n₀, that is, defining a new “f” that we will call f′ by
f′ = (f − {(n₀, m), (a, n₀)}) ∪ {(n₀, n₀), (a, m)}
Proof If the conclusion fails then we have an onto f : {0, . . . , m} → {0, . . . , n}, contra-
dicting 3.6.4.
Important!
3.6.7 Definition Let A ∼ {0, . . . , n}. Since n is uniquely determined by A we say that A
has n + 1 elements and write |A| = n + 1.
3.6.8 Corollary For any choice of n, there is no onto function from {0, . . . , n} to N.
Now let
X ≝ {0, . . . , n} − Y
and
g′ ≝ g − Y × N
The “g − Y × N” above is an easy way to say “remove all pairs from g that have their first
component in Y”.
Thus, g′ : X → {0, . . . , n, n + 1} is onto (Why?), contradicting 3.6.4 because X ⊆
{0, . . . , n} ⊊ {0, . . . , n, n + 1}.
Proof By 3.6.1 the opposite case requires that there is an n and a function f :
{0, 1, 2, . . . , n} → N that is a 1-1 correspondence. Impossible, since any such f will
fail to be onto.
Our mathematical definitions have led to what we hoped they would: That N is infinite as
we intuitively understand, notwithstanding Achilles’s accelerated counting!
N is a “canonical” infinite set that we can use to index the members of many infinite sets.
Sets that can be indexed using natural number indices
a₀, a₁, . . .
are called countable.
In the interest of technical flexibility, we do not insist that all members of N be used as
indices. We might enumerate with gaps:
b5 , b9 , b13 , b42 , . . .
Thus, informally, a set A is countable if it is empty or (in the opposite case) if there is a way
to index, hence enumerate, all its members in an array, utilising indices from N. Cf. 3.3.6.
It is allowed to repeatedly list any element of A, so that finite sets are countable. For
example, the set {42}:
42, 42, 42, . . . (42 forever)
We may think that the enumeration above is done by assigning to “42” all of the members
of N as indices, in other words, the enumeration is effected, for example, by the constant
function f : N → {42} given by f (n) = 42 for all n ∈ N. This is consistent with our earlier
definition of indexing (3.3.6).
Now, mathematically,
Thus a nonempty set is countable iff it is the range of some function that has N as its left
field.
Incidentally, since we allow f to be non total, the hedging “nonempty” (in 3.6.10 above
and in this remark) is unnecessary: ∅ is the range of the empty function that has N as its left
field.
We said that the f that proves countability of a set A need not be total. But such an f
can always be “completed”, by adding pairs to it, to get an f′ such that f′ : N → A is onto
and total. Here is how:
27 Since we are constructing a total onto function to A we need to assume the case A ≠ ∅ as we
cannot put any outputs into ∅.
Proof Pick an a ∈ A (possible since A ≠ ∅) and keep it fixed. Now, our sought f′ is
given for all n ∈ N by cases as below:
$$f'(n) = \begin{cases} f(n) & \text{if } f(n)\downarrow \\ a & \text{if } f(n)\uparrow \end{cases}$$
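The definition by cases is easy to transcribe. In this sketch the partial f is a dict on natural numbers (a hypothetical enumeration “with gaps”), and the name `complete` is ours.

```python
def complete(f, a):
    """Return the total function n ↦ f(n) if f(n)↓, else a. f is a dict on naturals."""
    return lambda n: f[n] if n in f else a


f = {5: 'x', 9: 'y', 13: 'z'}          # hypothetical enumeration with gaps
f_total = complete(f, 'x')             # 'x' plays the role of the fixed a in A

print([f_total(n) for n in range(6)])
print({f_total(n) for n in range(20)} == set(f.values()))
```

The completed function has the same range as f, because the only new output it ever produces is an element that was already in A.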
Some set theorists also define sets that can be enumerated using all the elements of N as
indices without repetitions.
3.6.13 Example Every enumerable set is countable, but the converse fails. For example, {1}
is countable but not enumerable due to 3.6.8. {2n : n ∈ N} is enumerable, with f (n) = 2n
effecting the 1-1 correspondence f : N → {2n : n ∈ N}.
Proof We will build a 1-1 and total enumeration of A, presented in a finite manner as a
(pseudo) program below, which enumerates all the members of A in strict ascending order
and arranges them in an array
a(0), a(1), a(2), . . . (1)
n ← 0
while A ≠ ∅
    a(n) ← min A        Comment. We are inside the loop, so ∅ ≠ A ⊆ N, hence min exists.
    A ← A − {a(n)}
    n ← n + 1
end while
Note that the sequence {a(0), a(1), . . . , a(m)} is strictly increasing for any m, since a(0) is
smallest in A, a(1) is smallest in A − {a(0)} and hence the next higher than a(0) in A, etc.
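For experimentation, the pseudo program can be approximated in Python. This is a sketch under an assumption of ours: A is given by a membership test `in_A` rather than as a set object, so “min A” followed by “A ← A − {a(n)}” becomes a single upward scan that visits the same elements in the same order. Unlike the pseudo program, the generator does not exit when A is finite; it simply stops producing values.

```python
from itertools import count, islice


def enumerate_ascending(in_A):
    """Yield a(0), a(1), ...: the members of A ⊆ N in strictly increasing order."""
    for candidate in count(0):
        if in_A(candidate):
            yield candidate


is_even = lambda n: n % 2 == 0         # hypothetical A = {2n : n ∈ N}
print(list(islice(enumerate_ascending(is_even), 5)))
```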
Will this loop ever exit? Suppose that it exits when it starts (but does not complete) the
k-th pass through the loop. Thus A became empty when we did A ← A − {a(k − 1)} in the
previous pass, that is A = {a(0), a(1), . . . , a(k − 1)} and thus, since
Proof Let f : N → A be onto and total (cf. 3.6.11), where A is infinite. Let g : A → N such
that f g = 1 A (3.5.27). Thus, if we let B = ran(g), we have that g is onto B, and by 3.5.22
is also 1-1 and total. Thus it is a 1-1 correspondence g : A → B, that is,
A∼B (1)
B must be infinite, otherwise (3.6.1), for some n, B ∼ {0, . . . , n} and by (1) via Exercise
3.5.17 we have A ∼ {0, . . . , n}, contradicting that A is infinite. Thus, by 3.6.14, B ∼ N,
hence (again, Exercise 3.5.17 and (1)) A ∼ N. That is, A is enumerable.
So, if we can enumerate an infinite set at all, then we can enumerate it without repetitions.
We can linearise an infinite square matrix of elements in each location (i, j) by devising
a traversal that will go through each (i, j) entry once, and will not miss any entry!
In the literature one often sees the method diagrammatically, see below, where arrows
clearly indicate the sequence of traversing, with the understanding that we use the arrows
by picking the first unused chain of arrows from left to right.
So the linearisation induces a 1-1 correspondence between N and the linearised sequence
of matrix entries, that is, it shows that N × N ∼ N. In short,
Is there a “mathematical” way to do this? Well, the above is mathematical, don’t get me
wrong, but is given in outline. It is like an argument in geometry, where we rely on drawings
(figures).
Here are the algebraic details:
Proof (of 3.6.16 with an algebraic argument). Let us call i + j + 1 the “weight” of a pair
(i, j). The weight is the number of elements in the group:
1, 2, 3, 4, 5, . . .
and in each group of weight k we enumerate in ascending order of the second component.
Thus the (i, j) th entry occupies position j in its group —the first position in the group
being the 0 th, e.g., in the group of (3, 0) the first position is the 0 th— and this position
globally is the number of elements in all groups before group i + j + 1, plus j. Thus the first
available position for the first entry of group (i, j) members is just after this many occupied
positions:
$$1 + 2 + 3 + \ldots + (i + j) = \frac{(i + j)(i + j + 1)}{2}$$
That is,
$$\text{global position of } (i, j) \text{ is this: } \frac{(i + j)(i + j + 1)}{2} + j$$
The function f which for all i, j is given by
$$f(i, j) = \frac{(i + j)(i + j + 1)}{2} + j$$
is the algebraic form of the above enumeration.
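A quick machine check of the formula on an initial part of the matrix can build confidence before proving that f is a 1-1 correspondence. The sketch below uses our own name `position` and an arbitrary bound k = 30.

```python
def position(i, j):
    """Global position of entry (i, j): (i + j)(i + j + 1)/2 + j."""
    return (i + j) * (i + j + 1) // 2 + j


# All pairs of weight at most k, that is, with i + j + 1 <= k.
k = 30
pairs = [(i, j) for i in range(k) for j in range(k) if i + j + 1 <= k]
positions = sorted(position(i, j) for i, j in pairs)

# They should fill positions 0, 1, ..., k(k+1)/2 - 1 exactly once each.
print(positions == list(range(k * (k + 1) // 2)))
print(position(3, 0), position(2, 1), position(0, 3))
```

The group of (3, 0) has weight 4 and starts at position 1 + 2 + 3 = 6, in agreement with the count of occupied positions before it.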
3.6.18 Remark
1. Let us collect a few more remarks on countable sets here. Suppose now that we start with
a countable set A. Is every subset of A countable? Yes, because the composition of onto
functions is onto.
2. 3.6.19 Exercise What does composition of onto functions have to do with this? Well,
if B ⊆ A then there is a natural onto function g : A → B. Which one? Think “natural”!
Get a natural total and 1-1 function f : B → A and then use f to get g.
3. As a special case, if A is countable, then so is A ∩ B for any B, since A ∩ B ⊆ A.
4. How about A ∪ B? If both A and B are countable, then so is A ∪ B. Indeed, and without
inventing a new technique, let
a0 , a1 , . . .
be an enumeration of A and
b0 , b1 , . . .
for B. Now form an infinite matrix with the A-enumeration as the 1st row, while each
remaining row is the same as the B-enumeration. Now linearise this matrix!
Of course, we may alternatively adapt the unfolding technique to an infinite matrix of
just two rows. How?
5. 3.6.20 Exercise Let A be enumerable and an enumeration of A
a0 , a1 , a2 , . . . (1)
is given.
So, this is an enumeration without repetitions.
Use techniques we employed in this section to propose a new enumeration in which every
ai is listed infinitely many times (this is useful in some applications of logic).
110
101
011
and we are asked: Find a sequence of three numbers, using only 0 or 1, that does not fit as
a row of the above matrix —i.e., is different from all rows.
Sure, you reply: Take 1 1 1. Or, take 0 0 0.
That is correct. But what if the matrix were big, say, 10³⁵⁰⁰⁰⁰ × 10³⁵⁰⁰⁰⁰, or even infinite?
Is there a finitely describable technique that can produce an “unfit” row for any square
matrix, even an infinite one? Yes, it is Cantor’s diagonal method or technique.
He noticed that any row that fits in the matrix as the, say, i-th row, intersects the main
diagonal at the same spot that the i-th column does.
Thus if we take the main diagonal —a sequence that has the same length as any row—
and change every one of its entries, then it will not fit anywhere as a row! Because no row
can have an entry that is different from the entry at the location where it intersects the main
diagonal!
This idea would give the answer 0 1 0 to our original question. While 1000 11 3 also
follows the principle “change all the entries of the diagonal” and works, we are constrained
here to “use only 0 or 1” as entries. More seriously, in a case of a very large or infinite matrix
it is best to have a simple technique that is finitely describable and works even if we do not
know much about the elements of the matrix. Read on!
3.7.2 Example We have an infinite matrix of 0-1 entries. Can we produce an infinite
sequence of 0-1 entries that does not match any row in the matrix? Yes, take the main
diagonal and flip every entry (0 to 1; 1 to 0).
If we assume that it fits as row i, then we get a contradiction:
Say the original row has an a as entry (i, i). But, by our construction, the new row has
1 − a as entry (i, i), so it will not fit as row i after all. So it fits nowhere, i being arbitrary.
The technique of obtaining a modified copy of the main diagonal of an infinite matrix so
that it does not match any row of the matrix is due to Cantor and is called the diagonal method,
or diagonalisation.
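On a finite square matrix the method is a one-liner. The sketch below applies it to the 3 × 3 matrix with rows 110, 101, 011 from the opening question of this section; the name `flipped_diagonal` is ours.

```python
def flipped_diagonal(matrix):
    """Flip each entry of the main diagonal of a square 0-1 matrix (0 to 1, 1 to 0)."""
    return [1 - matrix[i][i] for i in range(len(matrix))]


matrix = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]
new_row = flipped_diagonal(matrix)
print(new_row)
# The new row differs from row i at position i, hence from every row.
print(all(new_row != row for row in matrix))
```

The function never inspects anything off the diagonal, which is why the same description works unchanged for a very large or an infinite matrix.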
3.7.3 Example (Cantor) Let S denote the set of all infinite sequences of 0s and 1s.
Pause. What is an infinite sequence? Our intuitive understanding of the term is captured
mathematically by the concept of a total function f with left field (and hence domain) N.
The n-th member of the sequence is f (n).
Can we arrange all of S in an infinite matrix —one element per row? No, since the
preceding example shows that we would miss at least one infinite sequence (i.e., we would
fail to list it as a row), because a sequence of infinitely many 0s and/or 1s can be found, as
indicated above, that does not match any row!
3.7 Diagonalisation and Uncountable Sets 105
But arranging all members of S as an infinite matrix —one element per row— is tan-
tamount to saying that we can enumerate all the members of S using members of N as
indices.
So we cannot do that. S is not countable!
3.7.4 Definition (Uncountable Sets) A set that is not countable is called uncountable.
So, an uncountable set is neither finite, nor enumerable. The first observation makes it
infinite, the second makes it “more infinite” than the set of natural numbers since it is not
in 1-1 correspondence with N (else it would be enumerable, hence countable) nor with a
subset of N: indeed, if the latter holds, then our uncountable set would be finite or enumerable
(which is absurd) according as it would be in 1-1 correspondence with a finite subset or an
infinite subset of N (cf. 3.6.14 and Exercise 3.5.17).
Example 3.7.3 shows that uncountable sets exist. Here is a more interesting one.
3.7.5 Example (Cantor) The set of real numbers in the interval
(0, 1) ≝ {x ∈ R : 0 < x < 1}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
prefixed with a dot; that is, think of the number’s decimal notation.
Some numbers have representations that end in 0s after a certain point. We call these
representations finite. Every such number has also an “infinite representation” since the non
zero digit d immediately to the left of the infinite tail of 0s can be converted to d − 1, and
the infinite tail into 9s, without changing the value of the number.
Allow only infinite representations.
Assume now by way of contradiction that a listing of all members of (0, 1) exists, listing
them via their infinite representations
The argument from 3.7.3 can be easily modified to get a “row that does not fit”, that is, a
representation
.d₀d₁d₂ · · ·
not in the listing.
Well, just let
$$d_i = \begin{cases} 2 & \text{if } a_{ii} = 0 \lor a_{ii} = 1 \\ 1 & \text{otherwise} \end{cases}$$
Clearly .d₀d₁d₂ · · · does not fit in any row i as it differs from the expected digit at the
i-th decimal place: should be aᵢᵢ, but dᵢ ≠ aᵢᵢ. It is, on the other hand, an infinite decimal
expansion, being devoid of zeros, and thus should be listed. This contradiction settles the
issue.
3.7.6 Example (3.7.3 Revisited) Consider the set of all total functions from N to {0, 1}.
Is this countable?
Well, if there is an enumeration of these one-variable functions
f₀, f₁, f₂, f₃, . . . (1)
consider the function g : N → {0, 1} given by g(x) = 1 − fₓ(x). Clearly, this must appear
in the listing (1) since it has the correct left and right fields, and is total.
Too bad! If g = fᵢ then g(i) = fᵢ(i). By definition, it is also 1 − fᵢ(i). A contradiction.
This is just a version of 3.7.3; as already noted there, an infinite sequence of 0s and 1s is
just a total function from N to {0, 1}.
The same argument as above shows that the set of all functions from N to itself is
uncountable. Taking g(x) = fₓ(x) + 1 also works in this case to “systematically change
the diagonal” f₀(0), f₁(1), . . . since we are not constrained to keep the function values in
{0, 1}.
3.7.7 Remark Worth Emphasizing. Here is how we constructed $g$: We have a list of, in principle, available $f$-indices for $g$. We want to make sure that none of them applies.
A convenient method to do that is to inspect each available index $i$ and, using the diagonal method, ensure that $g$ differs from $f_i$ at input $i$ by setting $g(i) = 1 - f_i(i)$.
This ensures that $g \neq f_i$; period. We say that we cancelled the index $i$ as a possible “$f$-index” of $g$.
Since the process is applied for each $i$, we have cancelled all possible indices for $g$: For no $i$ can we have $g = f_i$.
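The cancellation of indices can be tried out on a finite stand-in for the listing. The following Python sketch is only an illustration: the four functions in `fs` are hypothetical choices, and a finite list of course proves nothing about countability. It merely shows that the $g$ built by the diagonal recipe differs from each listed $f_i$ at input $i$.

```python
from typing import Callable, List

Func = Callable[[int], int]

def diagonal(fs: List[Func]) -> Func:
    """Return g with g(i) = 1 - fs[i](i) for every index i of the list."""
    def g(x: int) -> int:
        return 1 - fs[x](x)
    return g

# Hypothetical finite stand-in for an enumeration f_0, f_1, f_2, f_3
# of total 0/1-valued functions.
fs = [
    lambda x: 0,                  # f_0: constant 0
    lambda x: 1,                  # f_1: constant 1
    lambda x: x % 2,              # f_2: parity
    lambda x: 1 if x < 2 else 0,  # f_3: 1 on inputs 0 and 1 only
]

g = diagonal(fs)

# Index i is "cancelled" when g and f_i disagree at input i.
cancelled = [g(i) != fs[i](i) for i in range(len(fs))]
diagonal_values = [g(i) for i in range(len(fs))]
```

Every entry of `cancelled` is true by construction, since `1 - b` differs from `b` for any `b` in {0, 1}.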
3.7.8 Example (Cantor) What about the set of all subsets of $\mathbb{N}$, that is, $P(\mathbb{N})$ or $2^{\mathbb{N}}$?
Cantor showed that this is uncountable as well: If not, we have an enumeration of its members as
$$S_0, S_1, S_2, \ldots \tag{1}$$
This array faithfully represents $S$ (it tells all we need to know about what $S$ contains), since it contains a “1” in location $x$ iff $x \in S$, and contains “0” otherwise.
The array, viewed as a total function from $\mathbb{N}$ to $\{0, 1\}$, is called the characteristic function of $S$, denoted by $c_S$:
$$c_S(x) = \begin{cases} 1 & \text{if } x \in S \\ 0 & \text{if } x \in \mathbb{N} - S \end{cases}$$
Note that there is a 1-1 correspondence, let’s call it $F$, between subsets of $\mathbb{N}$ and the total 0-1-valued functions from $\mathbb{N}$, simply given by $F(S) = c_S$. (Exercise!)
Thus
$$\{f : f : \mathbb{N} \to \{0, 1\} \text{ and } f \text{ is total}\} \sim 2^{\mathbb{N}}$$
In particular, the concept of characteristic functions shows that Example 3.7.8 fits the diagonalization methodology. Indeed, the argument in 3.7.8 sets $c_D(x) = 1 - c_{S_x}(x)$, for all $x$, because
But then, the argument in 3.7.8 essentially applies the diagonal method to the list of 0/1 functions $c_{S_x}$, for $x = 0, 1, 2, \ldots$, to show that some 0/1 function, namely $c_D$, cannot be in the list.
3.7.10 Remark (Cantor) Cantor also offered a generalisation of 3.7.8: For any set $X$, we have $X \not\sim 2^X = P(X)$.
Assume otherwise, and let $f : X \to 2^X$ be onto, and let us write “$W_x$” for “$f(x)$”, the latter (hence also the former) being the subset of $X$ enumerated by $f$ at “position” $x$ ($\in X$).
Define
$$D \stackrel{\text{Def}}{=} \{x \in X : x \notin W_x\} \tag{1}$$
We show that $D$ ($\subseteq X$) is not a “$W_x$”, that is, it is not enumerated by $f$, contradicting ontoness of the latter. Thus, indeed, $X \not\sim 2^X$.
The details: Suppose $D = W_a$ for some $a \in X$. Then $a \in D \equiv a \in W_a$. But also (by (1)), $a \in D \equiv a \notin W_a$. A contradiction.
Is this diagonalisation? Of course: Let $c_D$ and $c_{W_x}$ be the characteristic functions of $D$ and $W_x$ (any $x$) respectively.
We have arranged so that (by (1)) $c_D(x) = 1$ iff $c_{W_x}(x) = 0$. So from the infinite 0/1 matrix with entries $c_{W_i}(j)$ we obtained $D$ by flipping the main diagonal entries $c_{W_i}(i)$.
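For a small finite $X$ the claim can be checked exhaustively. The sketch below is an illustration only, with $X = \{0, 1, 2\}$ chosen arbitrarily: it runs through every function $W : X \to P(X)$ (there are $8^3 = 512$ of them) and confirms that the set $D = \{x \in X : x \notin W_x\}$ is never among the enumerated sets. The names `powerset` and `diagonal_set` are mine, not the text's.

```python
from itertools import combinations, product

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def diagonal_set(X, W):
    """D = {x in X : x not in W[x]}, where W maps each x to a subset of X."""
    return frozenset(x for x in X if x not in W[x])

X = [0, 1, 2]
subsets = powerset(X)  # the 8 members of P(X)

# Try every W : X -> P(X) and record whether D ever equals some W[x].
missed_every_time = True
for images in product(subsets, repeat=len(X)):
    W = dict(zip(X, images))
    if diagonal_set(X, W) in W.values():
        missed_every_time = False
```

The loop never finds a counterexample, which is exactly what the argument with $a \in D \equiv a \notin W_a$ predicts.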
3.7.11 Remark (Russell Paradox and Diagonalisation) It should be mentioned that the
argument in Russell’s paradox is a diagonalisation in the model of the above.
In 3.7.10 we show that $\{x : x \notin W_x\}$ is “not a $W_x$”.
The $W_x$ above are enumerated using indices $x$ from $X$.
Well, consider here an (attempted) enumeration of all sets using as indices the sets themselves; that is, the enumerating function is the identity, $\lambda x.x : \mathbb{U} \to \mathbb{U}$. So, while we might imagine that a set $a$ is enumerated as a “$W_a$”, we actually enumerate it as just “$a$”, without the unnecessary burden of the “$W$-notation”.
As in 3.7.10, we consider the question: Have we enumerated all sets? Or, for example, is
$$\{x : x \notin x\}$$
(that is, $\{x : x \notin W_x\}$ in the simplified “$W_x$” notation) a “$W_a$” or, in our simplified notation, an “$a$”?
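Note that $\Phi$ acts on points in $P(X)$, so the notation “$\Phi(S)$” (as opposed to “$\Phi[S]$”) is correct.
• $\Phi(S) = S$ and
• If $\Phi(X) = X$, then $S \subseteq X$.
By 3.8.2 and the trivial $X \subseteq X$ we have $X \in F$, so this class is nonempty and it has, therefore, an intersection $S$ that is a subset of $X$:
$$S = \bigcap F = \bigcap_{Z \in F} Z \subseteq X \tag{1}$$
For $Z \in F$, it is $\bigcap_{T \in F} T \subseteq Z$, thus we have $\Phi\bigl(\bigcap_{T \in F} T\bigr) \subseteq \Phi(Z)$ by monotonicity, and hence also
$$\Phi\Bigl(\bigcap_{T \in F} T\Bigr) \subseteq \bigcap_{Z \in F} \Phi(Z) \subseteq \bigcap_{Z \in F} Z = S \tag{2}$$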
Note the italics in “The” above. The uniqueness follows trivially from the ⊆-smallest
property.29
3.8.5 Definition Let $f : A \to B$ be total and 1-1. We indicate this situation by the symbol $A \preceq B$, and also $A \stackrel{f}{\preceq} B$ if the role of $f$ is important.
The following theorem, which variously goes under the pair of names Cantor and Bernstein or, alternatively, Schroeder and Bernstein, and even only Bernstein, is remarkable in establishing the connection between $\preceq$ and $\sim$.30
3.8.8 Theorem (Dedekind) Let $A \stackrel{f}{\preceq} B$ and $B \stackrel{g}{\preceq} A$. Then $A \sim B$.
Proof This proof follows an idea from Dieudonné (1960) but here we use operators (loc.
cit. does not). The assumption says that f : A → B and g : B → A are each total and 1-1.
Define the operator $\Phi : 2^A \to 2^A$ by
$$\text{For all } Z \subseteq A,\quad \Phi(Z) \stackrel{\text{Def}}{=} (A - g[B]) \cup gf[Z] \tag{1}$$
$$h(x) \stackrel{\text{Def}}{=} \begin{cases} f(x) & \text{if } x \in X \\ g^{-1}(x) & \text{if } x \in Y \end{cases}$$
3.8.9 Example (The self-contradictory set of all sets) The quoted term in the Example
caption goes back to the time when mathematicians were still shocked from the discovery of
set theory paradoxes by Russell, Burali-Forti and others (cf. Wilder (1963)) that invariably
pointed to huge or enormously inclusive sets as the culprits. Yet, until such time as Russell
introduced the remedy —of requiring sets to be built by stages and not to just “happen”—
they still considered all collections as sets and employed the nickname “self contradictory
sets” for those examples that were proper classes as we call them in the current terminology.
This example outlines the discovery that U was a “self contradictory set”.
That U cannot be a set —and “yet was back then, a set” (all collections were; hence it
turned out to be a “self contradictory set”)— was actually proved early on in the development
of set theory. This was not deduced as a result of “the Russell non-set set” being a subset of
U (true the latter contains all “sets”).
Besides, such an avenue (via 2.3.6) would not necessarily lead one to conclude that U is
not a set too, considering that the subclass theorem was not yet known. Nor was it known
yet that x ∈ x is false for all x —a fact that would entail U to equal the Russell non-set set
(See also 2.2.1) and thus be a non-set set itself.
But still there was a clear contradiction to “U being a set” that was argued, essentially,
in the following less elementary way:
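1. Since $\mathbb{U}$ contains everything (that is, $\mathbb{U} \stackrel{\text{Def}}{=} \{x : x = x\}$), in particular every member of any set $S$ is in $\mathbb{U}$ and thus $S \subseteq \mathbb{U}$. In particular, $P(\mathbb{U}) \subseteq \mathbb{U}$ and thus (3.8.6.3) $P(\mathbb{U}) \preceq \mathbb{U}$.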
2. On the other hand, by 3.8.6.2 we have U P (U).
3.9 Exercises
1. Give an example of two equivalence relations R and S on the set A = {1, 2, 3} such that
R ∪ S is not an equivalence relation.
2. Draw the Hasse diagram of the order ⊂ defined on the set $2^{\{1,2,3\}}$.
3. Given the left field A = {0, 1, 2} and right field B = {4, 5}. Which of the following
functions from A to B is partial, total, nontotal, 1-1, onto?
4. Prove that if R and S are any equivalence relations on any set A, then (R ∪ S)+ , the
transitive closure of their union, is also an equivalence relation.
a. Actually give a counterexample to the above “theorem”: Propose a small set A and
construct a symmetric and transitive relation on it which is not reflexive.
b. So the theorem is false, but if you were to grade the “proof” you would have to find
where it misstepped. Where exactly is the error in the proof?
and $a\,(S^+)^{-1}\,b$ means $b\,S^+\,a$, that is, for some finite sequence $r_1, \ldots, r_m$ we have
$$b\,S\,r_1\,S\,r_2 \ldots r_m\,S\,a$$
Prove that his definition is equivalent (two directions!) to the one we introduced in
this chapter (3.6.1).
29. Suppose someone defined certain subsets of N —one for each natural number x, which
thus is naming one such subset (possibly several x names are given to the same subset)
and they called them Wx .
Is the following subset of $\mathbb{N}$,
$$\{x \in \mathbb{N} : x \notin W_x\}$$
a $W_x$? Why?31
Since by (1) ab ≥ 0, (2) has only two solutions. Both lead to 1-1-ness (a = b).
For ontoness consider three cases of −1 < c < 1 and find a b ∈ R such that f (b) = c:
The cases are c = 0, −1 < c < 0 and 0 < c < 1.
33. Let A and C be sets. Take for granted that {X , Y } is a set (2.3.1) for any sets or atoms
X, Y .
Without using Principles 0, 1, 2, but using Principle 3 prove that A × {C} is a set.
34. Let A and B be sets. Take for granted that {X , Y } is a set (2.3.1) for any sets or atoms
X, Y .
Without using Principles 0, 1, 2, but using Principle 3 prove that A × B is a set.
Hint. Use the preceding exercise.
31 $W_x$ sets are actually defined and used in computability and were introduced by Rogers. Cf. Rogers (1967), Tourlakis (2022).
4 A Tiny Bit of Informal Logic
Overview
We have become somewhat proficient in using informal logic in our arguments about aspects
of discrete mathematics.
Although we have already used quantifiers, ∃ and ∀, we did so mostly viewing them as
symbolic abbreviations of English texts about mathematics. In this chapter we will expand
our techniques in logic, extending them to include the manipulation of quantifiers, such as
formal —i.e., syntax-based— techniques towards adding ∀, ∃ to the left of a formula, and
also removing them when they are prefixes.
Manipulation of quantifiers boils down, mostly, to “how can I remove a quantifier from the
very beginning of a formula?” and “how can I add a quantifier at the very beginning of a
formula?” Once we learn this technique we will be able to reason within mathematics with
ease.
But first let us define once and for all what a mathematical proof looks like: its correct,
expected syntax or form, that is.
We will need some concepts to begin with.
1. The alphabet and structure of formulas. Formulas are strings. The alphabet —that is
set— of symbols that we use to write down formulas contains, at a minimum,
=, ¬, ∧, ∨, →, ≡, (, ), ∀, ∃, object variables1
1 That is, variables that denote objects such as numbers, arrays, matrices, sets, trees, etc.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 117
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_4
We finitely generate the infinite set of object variables using single letters, if necessary with primes and/or subscripts: $A, x, y', w_{23}, u_{501}$.
2. One normally works in a mathematical area of interest, or mathematical theory —such as
Geometry, Set Theory, Number Theory, Algebra, Calculus— where one needs additional
symbols to write down formulas. That is, symbols like
0, ∅, ∈, ⊂, , ◦, +, ×
• Terms are “function calls”, in the jargon of the computer savvy person. These calls
take math objects as inputs and return math objects as outputs. Examples of calls or
terms —all drawn from our familiar arithmetic— are: x + y, x × 3, 0 × x + 1 (one is
told that × is stronger than +, so, notwithstanding the bracket-parsimonious notation
“0 × x + 1”, we know it means “(0 × x) + 1”, so this call returns 1, no matter what
we plugged into x).
• Formulas are also function calls, but their output is restricted (by their syntax that
I will not define carefully!) to be one or the other of the truth values true or false
(denoted in this book by t or f) but nothing else! Their inputs, just as in the case for
terms, are any math objects. Examples are: 2 < 3 (which is true, t), (∀x)x = x (t),
(∀x)x = 0 (f over, say, the reals R), (∃x)x = 0 (t over the reals and natural numbers),
x = 0 the latter being neither true nor false; the answer depends on the input we put
in the “input variable” x.
More: x = x (t), an answer that is independent of input; x = 0 → x = 0 (t), an answer
that is independent of input; x = 0 → (∀x)x = 0, which is neither true nor false; the
answer depends on the input in x. The input variable is the leftmost x; the other two
occurrences of “x” are bound —as we say— and unavailable to accept inputs. See
also below.
• If an occurrence of a variable in a formula is available to accept inputs, then non
logicians would normally call it an occurrence as an input variable. Logicians (in
their classrooms and in the literature they author) would rather call such occurrences
free occurrences.
At the expense of writing style, “occurrence” occurred no less than four times in the
short passage above. The aim is emphasis: It is not a variable x itself that is free
(input) or bound (not available for input) in a formula, but it is the occurrences of said
variable that we are speaking of, as the immediately preceding example makes clear.
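The “function call” reading of terms and formulas can be mimicked directly in Python. This is a sketch under a loud assumption: quantifiers are evaluated over a small finite `DOMAIN` (my choice, standing in for “all math objects”), so it illustrates free versus bound occurrences rather than truth over the reals or the natural numbers. All names below are hypothetical.

```python
# Assumed finite stand-in for the objects that variables range over.
DOMAIN = range(-3, 4)

def term(x):
    """The term 0 × x + 1: a call that returns an object, always 1."""
    return 0 * x + 1

def forall(pred):
    """(∀x)pred: the quantified variable is bound, not an input."""
    return all(pred(v) for v in DOMAIN)

def exists(pred):
    """(∃x)pred over the assumed finite domain."""
    return any(pred(v) for v in DOMAIN)

# Formulas with no free occurrences: they take no input at all.
all_self_equal = forall(lambda x: x == x)   # (∀x) x = x
all_zero = forall(lambda x: x == 0)         # (∀x) x = 0
some_zero = exists(lambda x: x == 0)        # (∃x) x = 0

def mixed(x):
    """x = 0 → (∀x) x = 0: only the leftmost occurrence of x is free.

    The parameter x feeds the free occurrence; the lambda's own x is a
    different, bound variable that the argument cannot reach.
    """
    return (not (x == 0)) or forall(lambda x: x == 0)

# Bound index of a sum: i is not available for input either.
bound_sum = sum(i for i in range(1, 5))     # 1 + 2 + 3 + 4
```

Calling `mixed(0)` and `mixed(1)` gives different answers, which is the sense in which the truth value of that formula depends on the input to its one free occurrence.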
4. In (∀x)x = 0 the variable x is non-input; it is “bound”, we say. Just like this: $\sum_{i=1}^{4} i$, which means 1 + 2 + 3 + 4, and “i” is not available for input: Something like $\sum_{3=1}^{4} 3$ is,
(∀x)A, x = 3, x = 7
The last two have no Boolean structure so deconstructing stops with them. How about
(∀x)A? This cannot be deconstructed either, even if A has Boolean structure! Such
structure is locked up and hidden in the scope of (∀x).
We call the formulas where deconstruction stops “prime”. A prime formula is one with
no Boolean structure, e.g., x < 8, or one of the form (∀x)A or (∃x)A.
Every formula is either prime or can be deconstructed into prime components. A prime
formula is one with no explicit Boolean connectives. Such connectives are either totally
absent in it —e.g., x < y— or are buried in the scope of a quantifier —e.g., (∃x)(x =
0 ∨ x > 5).
Thus prime formulas are “atomic” —no further deconstructible— as far as Boolean
structure is concerned.
4.1.1 Remark (Tautologies) A formula A is a tautology iff it is true due to its Boolean
structure, according to truth tables (Remark 2.3.4) no matter what the values of its prime
formulas into which it is deconstructed are postulated to be. Postulated to be: This signifies
that we do not (attempt to) compute the intrinsic truth value of a prime formula when we
check whether A is a tautology or not.2
For example, x = x is a prime formula and thus its postulated value could be any one of t or f. Thus it is not a tautology, even though it is intrinsically true, no matter what the value of x may be.
4.1.2 Example
1. (∀x)A is not a tautology as it has two possible truth values (being a prime formula) in
principle.
2. x = 0 → x = 0 is a tautology. Which are its prime (sub) formulas?
3. (∀x)x = 0 → x = 0 is not a tautology. As noted, to determine tautologyhood we do not
evaluate prime formulas; we just consider each of the two scenarios, t or f, for each
prime formula and use truth tables to compute the overall truth value.
If we did evaluate (∀x)x = 0 we would see that (say over the natural numbers, or reals,
or complex numbers) it is false.3 So the implication is true. However it is not true as a
Boolean formula.
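The “do not evaluate prime formulas” recipe can be sketched as a brute-force truth-table check. In the Python below, which is only an illustration, a formula is a nested tuple and every prime formula is an opaque label whose value is postulated, never computed. The tuple encoding and the function names are assumptions of this sketch.

```python
from itertools import product

# Formulas: ("prime", label), ("not", A), or (op, A, B) with op in
# "and", "or", "imp", "iff". Prime formulas are opaque labels.

def primes(f):
    """The set of prime-formula labels that f deconstructs into."""
    if f[0] == "prime":
        return {f[1]}
    return set().union(*(primes(sub) for sub in f[1:]))

def evaluate(f, v):
    """Truth value of f when v postulates a value for each prime label."""
    op = f[0]
    if op == "prime":
        return v[f[1]]
    if op == "not":
        return not evaluate(f[1], v)
    a, b = evaluate(f[1], v), evaluate(f[2], v)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "imp":
        return (not a) or b
    if op == "iff":
        return a == b
    raise ValueError(f"unknown connective: {op}")

def is_tautology(f):
    """True iff f is true under every postulated valuation of its primes."""
    names = sorted(primes(f))
    return all(evaluate(f, dict(zip(names, vals)))
               for vals in product([True, False], repeat=len(names)))

p = ("prime", "x = 0")
q = ("prime", "(∀x)x = 0")

item_2 = is_tautology(("imp", p, p))   # x = 0 → x = 0
item_3 = is_tautology(("imp", q, p))   # (∀x)x = 0 → x = 0
lone_prime = is_tautology(("prime", "x = x"))
```

Note that `item_3` comes out false even though the implication is true over the natural numbers: the checker sees two unrelated prime formulas and tries the row where the first is t and the second is f.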
So, how do we show that (∀x)A is true (if it is)? Well, in easy cases we try to see if A is true
for all values of x —no matter where these values come from! That failing, we will use a
proof (see 4.1.11).
Similarly for (∃x)A. To show it is true (if it is) we try to see if A is true for some value
of x. Often we just guess one such value that works, say c, and then verify the truth of A
when x = c. That failing, we will use a proof.
2 After all, not all prime formulas have intrinsic values; x = y does not. It depends on assumed values
of x and y.
3 If we are doing our mathematics restricted to the set {0}, then, in this “theory” the formula is true!
A1 , A2 , . . . , An |=taut B
meaning
4.1.4 Remark Note that we do not care to check, or even state, what happens if $A_1 \land A_2 \land \ldots \land A_n$ is false, because the formula in (1) is then trivially true.
So, a tautological implication A1 , A2 , . . . , An |=taut B says that B is true provided we
proved (or accepted) that the lhs of |=taut is true.
4.1.5 Example Here are some easy and some involved tautological implications. They can
all be verified using truth tables, either building the tables in full, or taking shortcuts.
1. A |=taut A
2. A |=taut A ∨ B
3. A |=taut B → A
4. A, ¬A |=taut B —any B. Because I do work only if A ∧ ¬A is true! See above.
5. f |=taut B —any B. Because I do work only if lhs is true! See above.
6. Is this a valid tautological implication? B, A → B |=taut A, where A and B are distinct.
No, for if A is false and B is true, then the lhs is true, but the rhs is false!
7. Is this a valid tautological implication? A, A → B |=taut B? Yes! Say A = t and (A →
B) = t. Then, from the truth table of →, it must be B = t.
Statements such as “B = t” are shorthand for “B evaluates as t”.
8. How about this? A, A ≡ B |=taut B? Yes! Verify!
9. How about this? A ∨ B ≡ B |=taut A → B? Yes! I verify:
First off, assume lhs of |=taut —that is, A ∨ B ≡ B— is true.
Two cases:
• B = f. Then I need the lhs of ≡ to be false (f) to satisfy the italicised “assume”. So
A = f as well and clearly the rhs of |=taut is true with these values.
• B = t. Then I need not worry about A on the lhs. The rhs of |=taut is true by truth
table of →.
10. A ∧ (f ≡ A) |=taut B, for any B. Well, just note that the lhs of |=taut is f so we need to
do no work with B to conclude that the implication is valid.
11.
A → B, C → B |=taut A ∨ C → B
This is nicknamed “proof by cases” for the obvious reasons. Verify this tautological
implication!
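Several of the items above can be verified mechanically. The sketch below, offered as an illustration rather than as the book's method, treats A, B, C as opaque Boolean placeholders and checks only the rows in which all premises are true, exactly as Remark 4.1.4 suggests. The helper names `taut_implies` and `imp` are mine.

```python
from itertools import product

def taut_implies(premises, conclusion, names=("A", "B", "C")):
    """True iff every valuation making all premises true makes conclusion true."""
    for values in product([True, False], repeat=len(names)):
        v = dict(zip(names, values))
        # Rows where some premise is false need no work at all.
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True

def imp(p, q):
    """Truth table of →."""
    return (not p) or q

# Item 7 (modus ponens): A, A → B |=taut B
item_7 = taut_implies([lambda v: v["A"], lambda v: imp(v["A"], v["B"])],
                      lambda v: v["B"])
# Item 6: B, A → B |=taut A ?  (fails when A is false and B is true)
item_6 = taut_implies([lambda v: v["B"], lambda v: imp(v["A"], v["B"])],
                      lambda v: v["A"])
# Item 4: A, ¬A |=taut B  (no row makes both premises true)
item_4 = taut_implies([lambda v: v["A"], lambda v: not v["A"]],
                      lambda v: v["B"])
# Item 9: A ∨ B ≡ B |=taut A → B
item_9 = taut_implies([lambda v: (v["A"] or v["B"]) == v["B"]],
                      lambda v: imp(v["A"], v["B"]))
# Item 11 (proof by cases): A → B, C → B |=taut A ∨ C → B
item_11 = taut_implies([lambda v: imp(v["A"], v["B"]),
                        lambda v: imp(v["C"], v["B"])],
                       lambda v: imp(v["A"] or v["C"], v["B"]))
```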
Before we describe what a logical proof is, we need some discussion and a definition.
4.1.6 Example (A Cautionary Tale) Consider the formula
(∃y)¬x = y
(∃y)¬y = y (2)
which says “there is a value of y that is different from itself” —obviously a false statement.
What caused this distortion of (original) meaning is that an object that we substituted
into a free variable occurrence x contained a variable y that was “captured” by a quantifier,
as we say.
So we always disallow substitutions that cause capture! We say they are illegal or
impossible.
4.1.7 Definition Let A be a formula and x a variable. The symbol A[x] indicates our interest
in the possibly input variable x of A. If y and z are actually the only input (free) variables
of A I can indicate this without words by writing A(y, z).
I explain “possibly”. For example,
1. If A is y = z then x does not even occur in A. But I said possibly! I can still write A[x].
I can also write A(y, z).
2. In the case where A is (∀x)x = 1, A cannot receive inputs in the so-called bound variable
x. Even though I may write A[x], this is just wishful thinking and x does not occur free
in A or as we variously say x is not an input variable of A or A does not depend on x.
3. A is (∀x)x = y. I can write A[x] but x is actually not free in A. I actually have A(y).
4. Let now t be any term —a constant or variable or a function call.
Having declared interest in a (possibly) free variable of A by writing down A[x], I can
next write A[t] in the same context meaning the substitution of t into x —that is, a search
and replace operation: find all free occurrences of x in A and replace them all by t; but do
abort the entire substitution (illegal) operation if any replacement caused some variable
in t to become bound (capture).
In the first illustration above, and assuming t is g(w), we get A[t] is y = z.
If now B[x] is (∀w)w = x then B[t] is illegal since it means (∀w)w = g(w).
5. If I wrote A(y, z), then A(t, z) means A(g(w), z) which is legal if A is as in item 1.
above.
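The “search and replace, but abort on capture” operation of item 4 can be sketched in a few lines of Python. Everything about the encoding is an assumption made for illustration: variables are strings, a function call such as g(w) is the tuple `("g", "w")`, and formulas are nested tuples. The sketch covers only the connectives listed in the code.

```python
class Capture(Exception):
    """Raised when a substitution would capture a variable of the term."""

def term_vars(t):
    """Variables of a term: a string, or (function_name, arg1, arg2, ...)."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))

def subst_term(t, x, s):
    """Replace every occurrence of variable x in term t by term s."""
    if isinstance(t, str):
        return s if t == x else t
    return (t[0],) + tuple(subst_term(a, x, s) for a in t[1:])

def subst(A, x, s, bound=frozenset()):
    """Replace the free occurrences of x in A by s; abort on capture.

    Formulas: ("eq", t1, t2), ("not", A), (op, A, B) for op in
    "and", "or", "imp", "iff", and ("forall" | "exists", var, A).
    """
    op = A[0]
    if op == "eq":
        occurs = x in term_vars(A[1]) | term_vars(A[2])
        # A replacement here would put s inside the scope of `bound`.
        if occurs and term_vars(s) & bound:
            raise Capture(f"a variable of {s!r} would become bound")
        return ("eq", subst_term(A[1], x, s), subst_term(A[2], x, s))
    if op == "not":
        return ("not", subst(A[1], x, s, bound))
    if op in ("and", "or", "imp", "iff"):
        return (op, subst(A[1], x, s, bound), subst(A[2], x, s, bound))
    if op in ("forall", "exists"):
        y, body = A[1], A[2]
        if y == x:  # occurrences of x below are bound: nothing to replace
            return A
        return (op, y, subst(body, x, s, bound | {y}))
    raise ValueError(f"unknown formula shape: {op}")

def is_legal(A, x, s):
    """True iff the substitution of s into x in A causes no capture."""
    try:
        subst(A, x, s)
        return True
    except Capture:
        return False

t = ("g", "w")                              # the term g(w)
A1 = ("eq", "y", "z")                       # y = z: x does not occur
B1 = ("forall", "w", ("eq", "w", "x"))      # (∀w) w = x
tale = ("exists", "y", ("not", ("eq", "x", "y")))  # (∃y)¬x = y
```

With these definitions, substituting `t` into x leaves `A1` unchanged, is illegal for `B1`, and substituting y into x in `tale` is rejected, matching the cautionary tale of 4.1.6.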
The job of a mathematical proof is to start from established (previous theorems) truths,
or assumed truths (axioms) and unfailingly preserve truths in all the proof’s steps as we
develop it.
This description is word-parsimonious and sounds circular: No chicken and egg dilemma
here, however. “Previous theorems” can be used only if we have any of those at any given
moment. Else we use just axioms.
In fact, the concept of proof is defined in terms of axioms (and rules of inference) alone.
Thus, by the truth-preservation property, we will have produced, in particular, a truth at
the very last step of a proof. This is what we call a proved theorem.
What are our axioms, our starting assumptions, when we do proofs?
4.1.8 Definition First off, in any proof that we will write in math there are axioms that are
independent of the type of math that we do, whether it is set theory, number theory, algebra,
calculus, etc. Such axioms are called logical (logic-specific, that is).
and
A[x] → B[x] is true for all x (‡)
By (‡), every x that makes A true makes B true. But that is all values of x by (†). So
B[x] is true for all values of x as we wanted to verify.
5. For any choice of variable (here I use “x”) x = x is the identity axiom, no matter what
(is the value of) “x”. Note that “For any choice of variable” means that y = y and w = w
are also instances of the axiom.
6. x = y → y = x and x = y ∧ y = z → x = z are the equality axioms.
They can be expressed equally well using variables other than x and y (e.g., u, v and w).
4 Practitioners usually say “for all x”, meaning for all values of x.
7. The ∃ vs. ∀ axiom. For any formula A and any choice of quantified variable, (∃x)A[x] ≡
¬(∀x)¬A[x] is an axiom.
Note that the right hand side of the “≡” says “it is not the case that all values of x make
A[x] false”.
The “rules of proving”, or rules of inference. These are two rules provided up in front and
as such they are essential to start up the proof mechanism. To indicate this “start-up role”
we often call these rules primary or primitive.
Incidentally you will find I am grossly miscounting the rules in item 2 below:
1. From A[x] I may infer (∀x)A[x]. Logicians write the up-in-front (“primary”) rules as fractions without words:
$$\frac{A[x]}{(\forall x)A[x]}$$
We call this rule generalisation, or use the nickname “Gen”.
2. I may construct (and use), using any tautological implication that I have verified, say, one of this shape (form)
$$A_1, A_2, \ldots, A_n \models_{taut} B$$
the rule
$$\frac{A_1, A_2, \ldots, A_n}{B}$$
For example, seeing readily that $A, A \to B \models_{taut} B$, we have the rule
$$\frac{A,\; A \to B}{B} \tag{MP}$$
This is a very popular rule, known as modus ponens, in short MP.
It turns out that MP and Gen are all you need to prove theorems; thus, officially, they are THE two primary rules. However, additional tautological implications from 2. help in the practice of proofs.
1. Rule Use: How do we use rules? See also Definition 4.1.11 below. If in a proof that we
are writing we have already written all the formulas of the numerator of some rule, then
it is correct to write next (or at any later step) the denominator of the rule.
We say that we have applied the rule in order to obtain and write the denominator.
2. The second “rule” above is a rule constructor. Any tautological implication we come
up with is fair game: It leads to a valid rule since the name of the game (in a proof) is
preservation/propagation of truth.
This is not an invitation to learn and memorise infinitely many rules (!) but is rather a
license to build your own rules as you go, as long as you endeavoured to verify first the
validity of the tautological implication they are based on.
3. Gen is a rule that indeed propagates truth: If A[x] is true, that means that it is so for all
values of x and all values of any other variables on which A depends, which variables
I did not show in the [. . .] notation. But then so is (∀x)A[x] true, as it says precisely
the same thing: “A[x] is true, for all values of x and all values of any other variables on
which A depends but I did not show in the [. . .] notation”.
The only difference between the two notations is that I added some notational emphasis
in the second —(∀x).
For example, if I know that B has just two variables, u and v, I can write it as B(u, v).
Then
4. Hmm. So is (∀x) redundant? Yes, but only as a formula prefix. In something like this
x = 0 → (∀x)x = 0 (1)
it is not redundant!
Dropping ∀ we change the meaning of (1).
As is, (1) is not a true statement (for all values of x, that is). For example, if we set x
to be 0, then (1) becomes the false statement 0 = 0 → (∀x)x = 0 since 0 = 0 is true
but (over the integers, say) “(∀x)x = 0” is f. However dropping (∀x), (1) morphs into
“x = 0 → x = 0” which is a tautology; always true.
5. The axioms in 4.1.8 are indispensable to do just logic; that is why we call them logical
axioms.
We also use them in all math reasoning no matter what type of math it is. However,
mathematical theories have their own additional axioms! These are called special axioms
but most often “mathematical axioms”.
We are not going to list them. Why? Because every math branch, or “theory” as we say,
has different axioms! Unless we do, say, (axiomatic) set theory there is absolutely no
need to list all the set theory axioms.
2. Euclidean geometry:
• From two distinct points passes one and only one line.
• (“Axiom of parallels”) From a point A off a line named k —both A and k being on
the same plane— passes a unique line on said plane that is parallel to k.
• Many others that we omit.
This is the axiom of “foundation” from which one can prove things like A ∈ A is
always false.
It says that IF there is any element in A at all —this is the hypothesis part “(∃y)y ∈
A”— THEN there is some element —this is the part “(∃x) x ∈ A”— below which, if
you follow “∈” backwards from it, you will not find a z (“¬(∃z)”) that is both below
x along ∈ backwards, and also a member of A —this part is “(z ∈ x ∧ z ∈ A)”.
F1 , F2 , . . . , Fi , . . . , Fn (1)
Such proofs are known as “Hilbert-style proofs”. We write them vertically, one formula per line, every formula consecutively numbered, with annotation to the right of formulas (the “why did I write this?”). Like this:
1) $F_1$   because …
2) $F_2$   because …
⋮
n) $F_n$   because …
Often one writes $\vdash A$ to symbolically say that A is a theorem. If we must indicate that we worked in some specific theory, say ZFC (set theory), then we may indicate this as
$$\vdash_{ZFC} A$$
If moreover we have had some “non-axiom hypotheses” (read on to see when this happens!) that form a set $\Gamma$, then we may indicate so by writing
$$\Gamma \vdash_{ZFC} A$$
Why $\Gamma$ for a set of (non-axiom) assumptions? Because we reserve upper case Latin letters for formulas. For sets of formulas we use a distinguishable capital letter, so we chose distinguishable Greek capital letters, such as $\Gamma, \Delta, \Phi, \Psi, \Omega, \Pi, \Sigma$. Obviously, Greek capital letters like A, B, E, Z will not do!
Before we do a few example proofs, some easy and some more complex, let us establish
a few properties of proofs.
Proof The syntactic fitness for a non-axiom, non-hypothesis Ai for inclusion in a proof
depends on formulas to its left, not to its right. Thus dropping the “tail” Ak , Ak+1 , . . . , An
will leave a shorter proof A1 , . . . Ak .
Proof Suppose we have proved A as $\Gamma \vdash A$. With this fact in hand, a proof from $\Gamma$ as hypotheses is legitimate if, along with quoting members of $\Gamma$, members of the theory in hand, and logical axioms, we also allow ourselves to quote A.
B1 , B2 , . . . , Bk , ... A
Finally, the tail part Bk+1 , . . . , Bn has all its B j pass the proof test (1.–5. in 4.1.11) as
these depend ONLY on the sequence B1 , B2 , . . . , Bk , A.
In short, (2) IS a proof according to 4.1.11
The whole point is: By the device of including not only A but also its (inaccessible) proof we rendered (2) formally correct, unlike (1). We legitimised (1) as a practical shorthand.
The following is related:
1. A1
2. A2
3. A3
..
.
n. An
then we have B.
A1 , A2 , . . . , An , . . . , B (1)
. . . , A1
. . . , A2
..
.
. . . , An
. . . , A1 . . . , A2 ... . . . , An A1 , A2 , . . . , An , . . . , B (2)
The first n proofs, as concatenated together, form a proof from $\Gamma$ (4.1.14, corollary). Now concatenating box (1) at the right end of the foregoing sequence we get the sequence (2).
Since all the $A_i$ that are hypotheses in the proof (1) are copies of those proved from $\Gamma$ in the previous n boxes, and since in a proof we may repeat a formula that we proved earlier as many times as we please, anywhere after its first occurrence, it follows that the sequence (2) is a $\Gamma$-proof.
4.1.19 Example (New (derived) rules) A derived rule is one we were not given as primitive
—in 4.1.9— to bootstrap logic, but we can still prove that it propagates truth.
Aha! We used a non-axiom assumption (hypothesis) here! I write a Hilbert proof to show
that A[t] is a theorem if (∀x)A[x] is a (non-axiom) hypothesis (assumption) —shortened
to “hyp”.
1) (∀x)A[x] hyp
2) (∀x)A[x] → A[t] axiom 2
3) A[t] 1 + 2 + MP
1) A[t] hyp
2) (∀x)¬A[x] → ¬A[t] axiom 2
3) A[t] → ¬(∀x)¬A[x] 2 + Post
4) ¬(∀x)¬A[x] 1 + 3 + MP
Line 4 contains (∃x)A[x] by axiom 7, or, if you prefer, “(∃x)A[x] is obtained from axiom 7
and an application of Post”.
Instead of “tautological implication” we may give as reason just “Post” (no quotes) see line 3
above since it is Post’s completeness theorem for Boolean Logic that is at play when we
invoke “tautological implication” as a rule of inference (see 4.1.9, 2.)
Taking t to be x we have $A[x] \vdash (\exists x)A[x]$, simply written as $A \vdash (\exists x)A$.
There are two principles of proof that we state without proof here (but you should try to prove them as Exercises (Sect. 4.2), where several helpful hints are included).
4.1.20 Remark (Deduction theorem and proof by contradiction)
1. The deduction theorem (also known as “proof by assuming the antecedent”) states: if
$$\Gamma, A \vdash B \tag{1}$$
then also $\Gamma \vdash A \to B$, provided that in the proof of (1) all free variables of A were treated as constants; that is, we neither used them to do a Gen, nor substituted objects into them.
The notations “$\Gamma, A$” and “$\Gamma + A$” are standard for the more cumbersome $\Gamma \cup \{A\}$.
4.1.21 Remark (Ping-Pong) For any formulas A and B, the formula —where I am using
way more brackets than I have to, ironically, to improve readability—
(A ≡ B) ≡ (A → B) ∧ (B → A)
is a tautology (draw up a truth table with one row for each of the possible values of A and
B and verify that the equivalence is always t).
Thus, to prove the lhs of the second ≡ it suffices to prove the rhs:
⋮
1) (A → B) ∧ (B → A)              suppose I proved this
2) (A ≡ B) ≡ (A → B) ∧ (B → A)    tautology, hence also axiom
3) A ≡ B                          1 + 2 + tautological implication
In turn, to prove the rhs it suffices to prove each of A → B and B → A separately. This last
idea encapsulates the ping-pong approach to proving equivalences.
Here are a few applications.
1) (∀x)(A ∧ B) hyp
2) A∧B 1 + Spec
3) A 2 + tautological implication
4) B 2 + tautological implication
5) (∀x)A 3 + Gen; OK : x is not free in line 1
6) (∀x)B 4 + Gen; OK : x is not free in line 1
7) (∀x)A ∧ (∀x)B 5 + 6 + tautological implication
1) (∀x)(∀y)A hyp
2) (∀y)A 1 + Spec
3) A 2 + Spec
4) (∀x)A 3 + Gen; OK, no free x in line 1
5) (∀y)(∀x)A 4 + Gen; OK, no free y in line 1
In preparation for the removal of an ∃-prefix proof we will need an important and very
useful result, that of renaming the bound variable:
4.1.23 Definition (Substitution Again) Recall Definition 4.1.7. We indicate there that once
we declared our interest in the (possibly) free variable x of A by writing A[x], we can in
the same context write A[t] to indicate that all the free occurrences of x (if any) in A are
everywhere replaced by the object (term) t. Recall the caution needed in such substitutions
(4.1.6).
Here we add an explicit notation for the process “find and replace by the term t all the
free occurrences of x in A”: The symbol is A[x ← t].
The substitution operation compound symbol [x ← t] is viewed as an operator or con-
nective and as such it has the highest priority of all connectives.
Thus, if we write A ∧ B[x ← t] then we mean A ∧ (B[x ← t]). Also, (∀y)A[x ← t] means (∀y)(A[x ← t]); thus there is no capture of y (if it appears in t), since the substitution took effect before the quantifier was applied. For example, in obtaining (∀x)x = 0 from x = 0 we do not speak of capture of x! It is just the process of formula formation!
Logicians annotate this step in a proof as “aux. hyp. associated with (∃x)A[x]”.
Now proceed to prove B using all that is known to you, that is, the axioms of the theory T that you work in, perhaps some non-axiom hypotheses $\Gamma$, and (∃x)A[x], and the new non-axiom hypothesis A[z].
The technique of removing an ∃-prefix guarantees that you did better than
$$\Gamma, A[z] \vdash_T B$$
$$\Gamma \vdash_T B$$
4.1.25 Example In practice we often have an assumption (∃x)Q from which we want to
eliminate (∃x) to benefit from the (possibly) uncovered Boolean structure of Q. This fits
with the theorem above taking Γ = {(∃x)Q}.
For example, prove ⊢ (∃y)(∀x)A[x, y] → (∀x)(∃y)A[x, y].
By the DThm it suffices to prove (∃y)(∀x)A[x, y] ⊢ (∀x)(∃y)A[x, y] instead.
1) (∃y)(∀x)A[x, y] hyp
2) (∀x)A[x, z] aux. hyp. caused by 1; z is some fresh variable, not in the conclusion
3) A[x, z] 2 + Spec
4) (∃y)A[x, y] 3 + Dual Spec
5) (∀x)(∃y)A[x, y] 4 + Gen; OK, no free x in lines 1 and 2
I said in line 2, “z is some fresh variable, not in the conclusion”. Doesn’t “fresh” cover the
“not in the conclusion?” NO! “Fresh” ensures that none of the lines before the introduction
of z contain it. Freshness is not global to the proof! So, non occurrence in B must be added
explicitly.
4.1.26 Example Can I also prove the converse of the above? That is (∀x)(∃y)A[x, y] →
(∃y)(∀x)A[x, y].
I will try.
1) (∀x)(∃y)A[x, y] hyp
2) (∃y)A[x, y] 1 + Spec
3) A[x, z] aux. hyp. for 2; z is fresh and not in the conclusion
4) (∀x)A[x, z] 3 + Gen; Hmm!
Illegal: I should treat the free x of aux. hyp. as a constant!
A question like this, if you are to answer “no”, must be resolved by offering a coun-
terexample. That is, a special case of A for which I can clearly see that the claim is not
true.
Here is one such special case: take A[x, y] to be x = y.
Say we work in N. The lhs of → is true, but the rhs is false as it claims that there is a number
such that all numbers are equal to it. So the implication fails in the special case invalidating
also the general case.5
Another useful principle that can be proved, but we will not do so, is that one can replace
equivalents-by-equivalents. That is, if C is some formula, and if I have
⊢ A ≡ B
then I can replace one (or more) occurrence(s) of A in C (as subformula(s)) by B and call
the resulting formula C′, and be guaranteed the conclusion ⊢ C ≡ C′. That is, from ⊢ A ≡ B, I
can prove C ≡ C′.
This principle is called the equivalence theorem.
Let’s do a couple of ad hoc additional examples before we move to the section on
Induction.
1) A→B hyp
2) (∀x)A hyp
3) A 2 + Spec
4) B 1 + 3 + MP
5) (∀x)B 4 + Gen; OK as the DThm hypothesis (line 2) has no free x
We apply the equivalence theorem above. To this end, note that A ≡ ¬¬A is a tautology,
hence an axiom in group 1, and thus a theorem: ⊢ A ≡ ¬¬A. Applying the equivalence
theorem to (1) we thus obtain:
(∃x)¬A ≡ ¬(∀x)A (2)
Applying another tautological implication to (2) we get
(∀x)A ≡ ¬(∃x)¬A
which is of the same form as 4.1.8(7) with the roles of ∃ and ∀ reversed.
4.1.30 Exercise Prove A ≡ B ⊢ (∀x)A ≡ (∀x)B without relying on the equivalence
theorem. Rather use 4.1.27 in your proof, remembering the ping-pong tautology (4.1.21).
1) A→B hyp
2) (∀x)(A → B) 1 + Gen
3) (∀x)(A → B) → (∀x)A → (∀x)B axiom 2
4) (∀x)A → (∀x)B 2 + 3 + MP
5) A → (∀x)A axiom 3
6) A → (∀x)B 4 + 5 + Post
4.1.32 Example (Variant Theorem for ∀) Another useful result that practitioners use
without quoting and without notice is the “bound variable renaming”, which some people,
uncharitably, call the “dummy renaming” theorem. In Shoenfield (1967), Tourlakis (2003a)
it goes as the variant theorem. Suppose that the variable z is fresh for (∀x)A[x]. Then we
have the theorem
(∀x)A[x] ≡ (∀z)A[z]
The proof uses ping-pong and is straightforward, except that in the “← Direction” below it
requires some combinatorial thinking that is not part of logic.
→ Direction. Note that step 1 below has a legal substitution [x ← z] since freshness of
z entails that no part of A is “(∀z)(. . . x . . .)” to give trouble when we do [x ← z].
← Direction. Here we start the two-line proof with “(∀z)A[z] → A[z][z ← x]”. We
need to argue that “A[z][z ← x]” is the same as “A[x]” or just “A” in simpler notation.
First off, the rightmost substitution in A[z][z ← x] is legal. Why? Because there is NO
(∀x)(. . . z . . .)-part in A[z ← x] to capture x. Note that
Thus a z could only appear in (∀x)(. . . . . .) in the spot iff “” were a free x
—impossible when such an x is in the scope of (∀x).
Indeed, we see that A[z][z ← x] is A[x ← z][z ← x]. Now, we already noted that
A[x ← z] is legal. At the end of this operation we introduce the symbol “z” in precisely
those spots where A originally held free occurrences of x.
But then, A[x ← z] [z ← x] will change back to x precisely those z that were originally
free x. Seeing that there were no preexisting z, all z change back to x. A is restored.
Now the ← Direction proof.
4.2 Exercises
(∃x)A[x] ≡ (∃z)A[z]
8. (The Deduction Theorem) The reader is asked here to prove the Deduction Theorem:
Γ, A ⊢ B (1)
follows
Γ ⊢ A → B
For (Induction Hypothesis) I.H. fix an n and assume all Γ + A-proofs of lengths
≤ n satisfy the theorem.
We now embark on discussing a Γ + A-proof of length n + 1.
A_1, A_2, . . . , A_j, . . . , A_k → B, . . . , A_n, B (1)
Cases for B:
Γ ⊢ A → B
c. B was obtained by Gen from one previous formula in (1), say A_j, that is, B is
(∀x)A_j. By the I.H. we have
Γ ⊢ A → A_j (4)
End of Hints.
9. (The Deduction Theorem version 2) The reader is asked here to prove this version of
the Deduction Theorem:
Suppose that
Γ, A ⊢ B (1)
and that there is a proof of this fact that treated all the free variables of A as
constants, meaning, if x is such a variable, then we never used it in said proof with
(∀x) nor with [x ← t]. Under these conditions prove that we have
Γ ⊢ A → B
10. (Proof by Contradiction) Let us (somewhat informally) consider, as we did before, the
truth values f and t as “constant” atomic (i.e., devoid of connectives hence of Boolean
structure) formulas.
We then state the principle of proof by contradiction as “to prove A, where A has
no free variable, is the same as proving a falsehood, such as f, from premises Γ + ¬A”.
Thus, prove that, for a sentence A, we have Γ ⊢ A iff Γ, ¬A ⊢ f.
Hint. In the if-direction use the deduction theorem to obtain Γ ⊢ ¬A → f. Follow up
with an application of Post.
In the only if-direction use 4.1.16 to show Γ, ¬A ⊢ A and then the definition of proof
to also show Γ, ¬A ⊢ ¬A. Follow up these two with an application of Post.
11. (Proof by Auxiliary Hypothesis, or ∃-Elimination) See also 4.1.24. Suppose that
Γ ⊢ (∃x)A and let z be fresh for (∃x)A and B. Then Γ, A[x ← z] ⊢ B implies Γ ⊢ B.
Hint. The assumption is that we have a proof with the help of the auxiliary hypothesis
(aux. hyp.),
Γ, A[x ← z] ⊢ B
Treating all the variables (incl. z) of the auxiliary hypothesis as constants we apply the
deduction theorem to get Γ ⊢ A[x ← z] → B.
By 4.2.6 obtain Γ ⊢ (∃z)A[z] → B. Then ⊢ (∃x)A[x] ≡ (∃z)A[z] and the previous yield
Γ ⊢ (∃x)A[x] → B by Post. Now use one of the assumptions, Γ ⊢ (∃x)A,
and Post to get Γ ⊢ B.
12. Prove by ∃-elimination that (∃x)((A ∨ B) → C) → (∃x)(A → C) ∧ (∃x)(B → C).
13. Find a proof other than via ∃-elimination for the above.
14. Prove by ∃-elimination that (∃x)((A ∨ B) → C) → (∃x)((A → C) ∧ (B → C)).
15. Find a proof other than via ∃-elimination for the above.
16. Prove by ∃-elimination that (∃x)((A → B) ∧ (A → C)) → (∃x)(A → B ∧ C).
17. Find a proof other than via ∃-elimination for the above.
18. Prove by ∃-elimination that (∀x)A → (∃x)(A → B) → (∃x)B.
19. Prove by ∃-elimination that (∃x)A → (∀x)(A → B) → (∃x)B.
20. Find a proof other than via ∃-elimination for the above.
21. Let φ stand for an unspecified relation of two variables.
This could be anything like: <, >, ≤, ≥, ∈, , ⊂ and many others!
Prove (1) within pure logic, that is, logic without any theory-specific axioms, no math
hypotheses, and no special meaning for symbols. Anyway, “symbols” (other than logical
symbols, that is, connectives, brackets and =) never have any inherent meaning
relating to their shape if there are no axioms about said symbols!
¬(∃y)(∀x)(φ(x, y) ≡ ¬φ(x, x)) (1)
Hint. Prove (1) via a proof by contradiction. In the initial setup you end up with a formula,
which has a leading (∃y) and it can prove f. Now apply ∃-elimination to construct the
latter proof.
22. What just happened in 21 above? Contextualise within set theory and discuss.
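As a sanity check on exercise 21 (not a proof, and no substitute for the exercise), here is a minimal Python sketch that tries every binary relation φ on a small finite domain and looks for a y with φ(x, y) ≡ ¬φ(x, x) for all x. The function names and the choice of domain sizes are my own assumptions for illustration.

```python
from itertools import product

def has_witness(domain, phi):
    """True iff some y satisfies: for all x, phi(x, y) <-> not phi(x, x)."""
    return any(
        all(phi(x, y) == (not phi(x, x)) for x in domain)
        for y in domain
    )

def no_relation_has_witness(size):
    """Brute-force every binary relation on {0, ..., size - 1}."""
    domain = range(size)
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(pairs)):
        table = dict(zip(pairs, bits))
        if has_witness(domain, lambda x, y: table[(x, y)]):
            return False  # a counterexample to (1) would land here
    return True

print(no_relation_has_witness(3))
```

The search always fails for the same reason the proof works: at x = y the witness would need φ(y, y) ≡ ¬φ(y, y).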
5 Induction
Overview
This chapter is about two of the most important topics in a course on discrete mathematics
—induction and inductive definitions. Nowadays most authors prefer to call “inductive”
definitions “recursive”. These topics are called upon in numerous sequel courses such as
logic, data structures, theory of computation, design and analysis of algorithms.
In this chapter we introduce the induction (proof) principle on N as an equivalent principle
to the least (integer) principle on N.
But we also generalise induction in two important directions making this tool sophisticated
enough to be applicable to advanced readings, for example (axiomatic) set theory,
which is relevant to mathematics students: One direction is to recognise that the induction
principle (equivalently, the minimal condition or principle, MC, which is a generalisation of
the least principle on N) which on N is an attribute of the “natural” order <, can be extended
to arbitrary orders on arbitrary classes. This opens applicability of induction to any classes
that are equipped with an order that has MC.
The other direction of our generalisation is to recognise that a relation —whether we
denote it by “P” or “<”— does not have to be an order for us to do induction along it. All
it needs is to satisfy MC. For example, we can do induction along ∈ —this is not an order
on all sets (it fails transitivity)— to prove properties of classes!
The chapter concludes with the important topic of inductive or recursive definitions of
functions, such as, for example, the recursive definition of the factorial function 0! = 1 and
(n + 1)! = (n + 1) × (n!), for n ≥ 0. Such inductive definitions are central in the theory of
computation, and in practical computation in fact, since it turns out that in practical
computation we cannot compute functions beyond the so-called primitive recursive functions;
cannot even compute all primitive recursive functions (except only “in principle”) due to
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 143
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_5
the fact that “most of them” have astronomical outputs —like the function that outputs the
“ladder” of x 2s on input x below1 — and equally astronomical run times needed for their
computation. For example it can trivially be proved that the function that with input x outputs
the following number is primitive recursive.
\[ \underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{x\ \text{2s}} \]
What is the connection between primitive recursive functions and recursive definitions?
Primitive recursive functions (cf. e.g., Tourlakis (2022)) are formed by starting from trivial
functions, such as the function that for all inputs returns zero, using repeatedly compositions
and so-called “primitive” recursive definitions. A primitive recursive definition is a special
simple form of a general inductive definition where f is defined from functions h and g by
the two equations below, valid for all x, y from N,
f (0, y) = h(y)
f (x + 1, y) = g(x, y, f (x, y))
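The two equations above can be sketched in Python as follows. This is an illustration under my own assumptions: the helper name `primitive_recursion` is hypothetical, and the step function for the ladder uses Python's built-in `**` as a shortcut instead of building exponentiation from the trivial functions.

```python
def primitive_recursion(h, g):
    """Return f with f(0, y) = h(y) and f(x + 1, y) = g(x, y, f(x, y))."""
    def f(x, y):
        value = h(y)
        for i in range(x):          # unfold the recursion bottom-up
            value = g(i, y, value)
        return value
    return f

# add(0, y) = y, add(x + 1, y) = add(x, y) + 1
add = primitive_recursion(lambda y: y, lambda x, y, prev: prev + 1)

# ladder of x 2s (y is ignored): tower(0, y) = 1, tower(x + 1, y) = 2 ** tower(x, y)
tower = primitive_recursion(lambda y: 1, lambda x, y, prev: 2 ** prev)

print(add(3, 4))                         # 7
print([tower(x, 0) for x in range(5)])   # [1, 2, 4, 16, 65536]
```

Already tower(5, 0) has 19729 decimal digits, which is the "astronomical output" phenomenon mentioned in the overview.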
Definitions just as the above are based on the fact that N supports induction along the
standard “<”. We will not omit a generalisation of recursive definitions along any relation P
that may have MC without being an order. As an example, taking P = ∈ we give an inductive
definition over the class U of the so-called “support” function that for any set A as input
returns the set of all the atoms used to build A. For example, if A = {{{2}}, {1}}, it returns
{2, 1}.
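To make the example concrete, here is a hedged Python sketch of a support function, recursing along ∈. It assumes sets are modelled as `frozenset` values and everything else is an atom; how the text's own definition treats an atom given directly as input is not shown here, so returning the singleton of the atom is my assumption.

```python
def support(a):
    """Atoms used to build `a`; sets are frozensets, anything else is an atom."""
    if not isinstance(a, frozenset):
        return frozenset([a])      # assumption: an atom contributes itself
    result = frozenset()
    for member in a:               # recursion along the membership relation
        result |= support(member)
    return result

A = frozenset([frozenset([frozenset([2])]), frozenset([1])])   # {{{2}}, {1}}
print(sorted(support(A)))   # [1, 2]
```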
In Remark 3.4.29 we concluded with a formulation of the minimal condition (MC) for any
order <, for which fields (left/right) have not been specified, an unrelativised order, that is.
We did this as follows:
That an “order < has MC” is captured by —i.e., is equivalent to— the statement
For any “property”, that is, formula F[x]2 we have that the following is true
(∃a)F[a] → (∃a)(F[a] ∧ ¬(∃y)(y < a ∧ F[y])) (†)
More generally, for a non-order relation P we saw that “P has MC” (see also 3.4.30 for
the concept “a is P-minimal in A”) is expressed in terms of classes (3.4.32, (see (1′))) as
1 $2^{2^2} = 16$ but $2^{2^{2^2}} = 65536$ while $2^{2^{2^{2^2}}}$ is astronomical.
2 Recall that this notation, square brackets, indicates our interest in one among the, possibly many,
free variables of F.
5.1 Inductiveness Condition (IC) 145
while in terms of “properties” F[x] it is expressed by 3.4.32, (see (2′)), reproduced below.
(∃a)F(a) → (∃a)(F(a) ∧ ¬(∃y)(yPa ∧ F(y))) (2′)
which formally looks exactly like (†) above but using the symbol “P” instead of “<”.
Thus, (†) and (2′) are formally (in form!) identical, but semantically we will need to
recall that “<” represents any order with MC in (†) while “P” represents any relation
with MC in (2′) (and (1′)).
Let us logically manipulate (2′) to bring it into an equivalent form that goes under the
nickname “Inductiveness Condition” (in short, IC) or, alternatively, is called the “Principle
of Induction.”
Let us rewrite (2′) replacing F[x] by ¬G[x] everywhere, where G[x] is arbitrary. We
get the theorem
(∃a)¬G[a] → (∃a)(¬G[a] ∧ ¬(∃y)(yPa ∧ ¬G[y])) (2)
Using the equivalence theorem (p. 137) and Axiom 7 (p. 125), we obtain from (2)
¬(∀a)G[a] → ¬(∀a)¬(¬G[a] ∧ (∀y)¬(yPa ∧ ¬G[y]))
Using contraposition, De Morgan’s law, the tautology
¬A ∨ B ≡ A → B
(twice) and the equivalence theorem, we transform the above to this theorem:
(∀a)((∀y)(yPa → G[y]) → G[a]) → (∀a)G[a] (3)
Display (3) above expresses the Inductiveness Condition (IC) for P, or, as we usually say,
expresses the principle of strong induction, or complete induction, or course-of-values
induction for the relation P, which is not necessarily an order.
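The equivalence of MC and IC can be observed experimentally on a tiny universe. The sketch below is only an illustration under my own assumptions: quantifiers range over a finite domain {0, ..., size − 1}, a "property" G is modelled as a subset of that domain, and the helper names are hypothetical.

```python
from itertools import product, combinations

def subsets(domain):
    items = list(domain)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)

def has_mc(domain, P):
    """Every nonempty subset S has an a in S with no y in S such that y P a."""
    return all(
        any(not any((y, a) in P for y in S) for a in S)
        for S in subsets(domain) if S
    )

def has_ic(domain, P):
    """Display (3), with G ranging over all subsets of the domain."""
    for G in subsets(domain):
        progressive = all(
            a in G or not all(y in G for y in domain if (y, a) in P)
            for a in domain
        )
        if progressive and G != set(domain):
            return False
    return True

def mc_iff_ic_on(size):
    """Compare MC and IC for every binary relation on {0, ..., size - 1}."""
    domain = range(size)
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(pairs)):
        P = {p for p, b in zip(pairs, bits) if b}
        if has_mc(domain, P) != has_ic(domain, P):
            return False
    return True

print(mc_iff_ic_on(3))
```

The two tests agree on every relation because taking G to be the complement of S turns one condition into the other, which is exactly the substitution F := ¬G used above.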
5.1.1 Remark The above method of showing the equivalence between MC and IC is not
mentioned much in the literature (see however Barwise (1975) who applies it in the case
where P is ∈ on U).
If we replace, everywhere in (3), the formula G[y] by the class B = {y : G[y]} (this equality
being a definition), then we directly obtain (3′) below from (3):
(∀a)((a)P⁻¹ ⊆ B → a ∈ B) → (∀a)a ∈ B (3′)
Note that the y in (a)P⁻¹ are the P-predecessors of a in the sense that they are precisely
those y that satisfy yPa, that is, “y is before a along P”.
It is extremely useful to state (3′) in words.
If we want to prove that all a are in some class B and we have a relation P with IC
(equivalently MC), then it suffices to prove, for any arbitrary unspecified a, that a is
in B provided all its P-predecessors are.
The part “for any arbitrary unspecified a” is English for the part (∀a). For such an a we
prove a ∈ B with the help of the condition (assumption) (a)P⁻¹ ⊆ B. The last implication
in (3′) says that “for any arbitrary unspecified a, a is in B, unconditionally”.
The formula (a)P⁻¹ ⊆ B above is called the Induction Hypothesis or I.H. for a, and so is the
corresponding part “(∀y)(yPa → G[y])” in (3).
The essence of the I.H. in either formulation, (3) or (3′), is that it assists in the proof
of the leftmost (conditional) “a ∈ B” (or “G[a]”) for the “arbitrary unspecified a”. Having
proved a ∈ B under the I.H., the fact that our P has IC also implies (the last implication in
(3) or (3′)) the unconditional truth of a ∈ B for the arbitrary a.
Thus the rightmost (unconditional) “a ∈ B” is established for all a only from the axioms
and assumptions of the theory we are working in (e.g., set theory, number theory, etc)
and the I.H. drops out from the hypotheses list.
Of course, (∀a)a ∈ B implies U ⊆ B, hence B = U, the set theoretic class of “all things”.
This is fine for (informal) set theory, indeed useful, but we often work within classes A
that are much smaller than U.
For example in number theory we work in N. Then the classes B of interest will be subsets
of N. Therefore let us formulate IC relativised to a class or set A before we move on to
practical considerations and examples.
We reproduce here the formula (†) that says “P has MC relative to a class A” (cf. 3.4.37)
—which is the same as “P | A has MC” (Definition 3.4.34):
(∃b ∈ A)F[b] → (∃b ∈ A)(F[b] ∧ ¬(∃x ∈ A)(F[x] ∧ xPb)) (†)
Applying the very same transformations we introduced on p. 145 to († ) we obtain the
equivalent formula (‡) below
(∀b ∈ A)((∀x ∈ A)(xPb → G[x]) → G[b]) → (∀b ∈ A)G[b] (‡)
If we replace, everywhere in (‡), the formula G[y] by the class B = {y ∈ A : G[y]} (this
equality being a definition), then we directly obtain (‡′) below from (‡):
(∀b ∈ A)((b)(P⁻¹ | A) ⊆ B → b ∈ B) → (∀b ∈ A)b ∈ B (‡′)
Let us render (‡) more recognisable: By applying MP (modus ponens, cf. rule (MP) on
p. 125) I can transform (‡) into “rule of inference form”; indeed I will write it like a rule that
says, like all rules do, “if you proved my numerator, then my denominator is also proved!”
(∀b ∈ A)((∀x ∈ A)(xPb → G[x]) → G[b])
──────────────────────────────────────
(∀b ∈ A)G[b]
The lhs of the second “→” is true. Thus, to certify the truth of that implication I must prove
G[b] without I.H. help.
This step was hidden in Steps (a) – (b) above. It is called the Basis of the induction.
5.2 IC Over N
With the above general considerations in hand, the present section focuses on some practice
with induction over N.
Taking P here to be the order < restricted to N and taking for granted that < | N has
MC (as we argued informally in 3.3.14; but see also the counterpoint in the
comments on p. 151!), we conclude that < | N also has IC.
Thus we have, for some arbitrary property P[y] of natural numbers, the special form of
(‡) below:
(∀n ∈ N)((∀k ∈ N)(k < n → P[k]) → P[n]) → (∀n ∈ N)P[n]
5.2.1 Remark Of course, we have the proof technique we called CVI (after Kleene) for
any POset (A, <) where < has IC (equivalently MC) over A, that is, < | A has IC.
There is another simpler induction principle over N that we call, well, simple induction:
P[0], P[x] → P[x + 1]
───────────────────── (SI)
P[x]
“SI” above stands for Simple Induction. That is, to prove P[x] for all x (denominator) do
three things:
Step 1. Prove/verify P[0]
Step 2. Assume P[x] for fixed (“frozen”) x (unspecified!).
Step 3. prove P[x + 1] for that same x. The assumption is the I.H. for simple induction.
The I.S. is the step that proves P[x + 1].
Note that what is described here is precisely an application of the Deduction theorem
towards proving “P[x] → P[x + 1]”, that is, proving the implication for any
given x.
Step 4. If you have done Step 1. through Step 3. above, then you have proved P[x] (for
all x in N is implied!)
Is the principle SI correct? I.e., if I do all that the numerator of SI asks me to do (or Steps
1. – 3.), then do I really get that the denominator is true (for all x implied)?
Proof Suppose SI is not correct. Then, for some property P[x], despite having completed
Steps 1. – 3., still, P[x] is not true for all x!
Well, if so, then by MC let n ∈ N be smallest such that P[n] is false. Now, n > 0
since I did verify the truth of P[0] (Step 1.). Thus, n − 1 ≥ 0. But then, when I proved
“P[x] → P[x + 1] for all x (in N)” (in Steps 2. and 3.) this includes proving the case
P[n − 1] → P[n] (4)
Now, by the smallest-ness of n, P[n − 1] is true, hence P[n] is true by (4) and the truth
table of “→”. I have just got a contradiction! I conclude that no such smallest n exists, i.e.,
P[x] is true (for all x ∈ N).
We conclude that SI works —if MC does (cf. discussion in the -passage on
p. 151).
How do the simple and course-of-values induction relate? They are equivalent tools, or,
they have the same (proof) power as we say. Here is why:
5.2.3 Theorem From the validity of SI I can obtain the validity of MC.
{0, 1, . . . , n, n + 1} ⊆ T (3)
Towards (3) —given the I.H.— I need to show n + 1 ∈ T . Suppose this is not true. But then
n + 1 ∈ S and the I.H. implies that none of 0, 1, . . . , n is in S. This means n + 1 is minimal
in S, a contradiction.
Having shown (2), for all n ∈ N, we have N ⊆ T hence T = N and thus S = ∅. A
contradiction to (1). Done.
Proof For < | N we have CVI iff we have MC iff we have SI.
5.2.5 Remark
1. When do I use CVI and when SI? SI is best to use when to prove P[x] (in the I.S.) I only
need to know P[x − 1] is true. CVI is used when we need a more flexible I.H. that P[n]
is true for all n < x. See the examples below!
2. “0” is the boundary case if the claim we are proving is valid “for all n ∈ N”, or simply put,
“for n ≥ 0”. If the claim is “for all n ≥ a, P[n] is true” then usually P[n] is meaningless
for n < a and thus the Basis is for n = a.
Having established that MC and CVI (and SI) are equivalent for the “standard” order <, it
follows that over N we have both or we have none. Which one is it?
We have informally argued earlier (e.g., in the section on congruences, 3.3.14) that <
does have MC over N but the informal argument we gave there does not have the force of a
proof.
A mathematical proof requires that established properties of natural numbers and the
set N be known and used. We only offered a tentative argument within informal set theory
where we took for granted that N is one of this theory’s sets. If so, what properties does this
set have?
The proper way to prove within set theory that N has CVI or equivalently MC is to build
a copy of N within set theory and prove such properties as theorems.
Indeed this can be done (e.g., in Tourlakis (2003b)) but we did not do it here. One builds
a counterpart of N and gives it an alternative name —ω— as follows: “0” is defined to be
the object ∅. If we defined the number n as a set, then its successor, n + 1, is defined as
the set n ∪ {n}. Thus n + 1 stands for {0, 1, 2, . . . , n}. We then prove that the class of all so
constructed natural numbers is a set, and call it ω.
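The construction just described is easy to imitate with Python's `frozenset`. This is only a finite illustration (the function name `von_neumann` is mine), not a construction of ω itself.

```python
def von_neumann(n):
    """The set standing for n: 0 is the empty set, n + 1 is n ∪ {n}."""
    current = frozenset()
    for _ in range(n):
        current = current | frozenset([current])
    return current

three = von_neumann(3)
print(three == frozenset(von_neumann(k) for k in range(3)))  # True: 3 = {0, 1, 2}
print(len(von_neumann(5)))                                   # 5
print(von_neumann(2) in von_neumann(4))                      # True
```

The last line shows membership playing the role of "less than" between two of these sets.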
Next we prove that < on ω defined by
n < m iff n ∈ m (by definition)
More elegantly, one may axiomatise N outside set theory, via Peano’s axioms. One
such axiom schema states that the “<” on the set of natural numbers (with its basic
properties axiomatically postulated) satisfies simple induction.
Hm. But we have seen arguments that directly “prove” simple induction “works” employing
a “falling dominoes” argument. Haven’t we? It goes like this:
P[0] and P[0] → P[1] yield P[1]; next P[1] and P[1] → P[2] yield P[2]; next
. . . next P[n] and P[n]→P[n + 1] yield P[n + 1]; . . .
(2)
proves P[n + 1] is true for no matter what n.
That is, we can prove P[x] for any natural number x, the argument x being reachable by
adding one repeatedly. However, this does not say that we proved (∀x)P[x].
One, a proof has finite length and we cannot extend the proof (2) by just repeating the
parts “P[n] and P[n] → P[n + 1] yield P[n + 1]” an infinite number of times.
Two, nor can we be sure that all Informal natural numbers can be reached from 0 by
just repeatedly adding one. What are the natural numbers? Unless we know more about the
natural numbers we cannot be sure that, for example, there are no “infinite natural numbers”
after the end of the sequence 0, 1, 2, 3, . . . , n, . . . The subclass I of N consisting of only
“infinite numbers” has no least member, since if X is an infinite natural number, then so is
X − 1.
In particular, such an observation invalidates the argument in support of the thesis that <
on N has MC that we offered in 3.3.14, namely,
The discussion above triggers the motivation to connect the non existence of infinite
downwards walks with the presence of MC (or IC).
5.2.1 Well-Foundedness
5.2.6 Definition For any relation P, an infinite descending P-chain is a function f with the
properties
Intuitively, P is well-founded if the universe of all sets and atoms U cannot contain an infinite
descending chain, while it is well-founded over A if A cannot contain an infinite descending
P-chain. Clearly, no infinite descending P-chain can “start” anywhere outside ran(P) in any
case.
There is some disagreement on the term “well-founded” in the literature. In some of the
literature it applies definitionally to what we have called relations “with MC”. However, in
the presence of AC well-founded relations are precisely those that have MC, so the slight
confusion —if any— is harmless.
Proof The equivalence of (1) and (2) as well as (1)=⇒(3) have already been proved. Thus
we only need to prove that (3) implies (1). So assume (3) and let (1) fail. Let ∅ ≠ B ⊆ A
such that B has no P-minimal elements. Pick an a ∈ B. Since it cannot be P-minimal, pick
an a1 ∈ B such that a1 Pa. Since a1 cannot be P-minimal, pick an a2 ∈ B such that a2 Pa1 .
This process can continue ad infinitum to yield an infinite descending chain
. . . a3 Pa2 Pa1 Pa in A, contradicting (3). Done.
This argument used AC, and more precisely it goes like this:
Let g be a choice function for 2^B − {∅}, that is, for each S ∈ 2^B − {∅}, we have g(S) ∈ S
(cf. 3.5.28).
Define now f on N as
f(n) = a if n = 0
f(n) = g(B ∩ (f(n − 1))P⁻¹) if n > 0
5.2.10 Remark The implication (3)=⇒(1) and hence the entire corollary goes through
for any classes A and ∅ ≠ B ⊆ A as long as P is left-narrow as we say, that is, the class
{x : xPa} = (a)P−1 is a set for all a ∈ A.
Indeed, the part “let a ∈ B” needs no elaboration, and moreover all B ∩ ( f (n − 1))P−1
are sets by left-narrowness.
5.2.11 Definition (λ-notation) λ-notation is very useful in both discrete mathematics and
in the theory of computation (Tourlakis (2012, 2022)). It easily allows us to separate the
intensional notation of a function —i.e., what it does on any given input— as opposed to its
extensional notation, that is, the function as a possibly infinite table of input/output pairs.
The format of λ-notation is
begin input        end input
↓                  ↓
λ  list of inputs  . rule for how to obtain the output
Examples:
1. λx.x + 1.
2. λx y.x − y.
3. λx y.x + 42. In this example the input y is ignored. Its value is not used to compute the
output.
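Python's `lambda` expressions follow the same pattern (inputs, a separator, then the rule), so the three examples can be transcribed almost verbatim. The variable names below are mine, chosen only for illustration.

```python
successor = lambda x: x + 1          # λx.x + 1
difference = lambda x, y: x - y      # λx y.x − y
ignore_y = lambda x, y: x + 42       # λx y.x + 42 (the input y is never used)

print(successor(3), difference(7, 2), ignore_y(1, 999))  # 4 5 43
```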
. . . R1R2R3R1R2R3R1R2R3R1
Now R⁺ = {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨1, 2⟩, ⟨2, 3⟩, ⟨3, 1⟩, ⟨1, 3⟩, ⟨2, 1⟩, ⟨3, 2⟩} which is not
a partial order (it is reflexive), nor is it a “reflexive” order, since it is not antisymmetric
(e.g., 1R⁺3 ∧ 3R⁺1 would require 1 = 3).
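The listing of the transitive closure can be recomputed mechanically. In the sketch below I assume R is the three-element cycle 1R2, 2R3, 3R1, which is what the listed pairs suggest; the function name is hypothetical.

```python
def transitive_closure(relation):
    """Smallest transitive relation containing `relation` (a set of pairs)."""
    closure = set(relation)
    while True:
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

R = {(1, 2), (2, 3), (3, 1)}          # assumed from the listing above
R_plus = transitive_closure(R)
print(len(R_plus))                                  # 9
print(all((x, x) in R_plus for x in (1, 2, 3)))     # True: the closure is reflexive here
print((1, 3) in R_plus and (3, 1) in R_plus)        # True: antisymmetry fails
```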
It turns out that if P has MC, then so does P+ and hence, in particular, it is a partial order,
being irreflexive.
(a)P−1 ↑ (1)
Suppose now that bP+ a for some b. Then, for some b1 , b2 , . . . , bk we have
Proof It is given that P|A has MC (IC). By 5.2.13 (P|A)+ has MC (IC).
We cannot sharpen the above to “P⁺ has MC (IC) over A”, for that means that P⁺|A has
MC. This is not true though: Let O be the odd natural numbers, and R be defined on N by
x R y iff x = y + 1, thus R⁺ = >.
Now, R has MC over O (for R|O = ∅), yet R⁺ does not, for R⁺|O has an infinite
descending chain in O:
... > 7 > 5 > 3 > 1
In particular, we note from this example that (P|A)⁺ ≠ P⁺|A in general.
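The difference between "restrict, then close" and "close, then restrict" can be seen on a finite truncation of this example. The cut-off `BOUND` and the helper names are my own assumptions; the infinite chain itself obviously cannot be exhibited this way.

```python
def transitive_closure(relation):
    closure = set(relation)
    while True:
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def restrict(relation, A):
    """Keep only the pairs with both components in A."""
    return {(x, y) for (x, y) in relation if x in A and y in A}

BOUND = 12                                        # finite stand-in for N
N = range(BOUND)
R = {(y + 1, y) for y in N if y + 1 < BOUND}      # x R y iff x = y + 1
O = {n for n in N if n % 2 == 1}                  # the odd numbers below BOUND

closure_of_restriction = transitive_closure(restrict(R, O))
restriction_of_closure = restrict(transitive_closure(R), O)
print(closure_of_restriction)              # set()
print((7, 5) in restriction_of_closure)    # True
```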
This is just our familiar (from N) “simple” (as opposed to “course-of-values”) induction
SI over N.
The “natural” < on N is ≺+ . <-induction over N coincides with the “usual” CVI over N
displayed at the top of Section 5.2.
It says that ∈ has MC. Therefore properties of sets can be proved by ∈-IC (∈-CVI) over U.
5.2.17 Example This is the “classical first example of induction use” in the discrete math
bibliography! Prove that
\[ 0 + 1 + 2 + \dots + n = \frac{n(n+1)}{2} \tag{1} \]
So, the property to prove is the entire expression (1). One must learn to not have to rename
the “properties to use” as “P[n]”.
I will use SI. So let us do the Basis. Boundary case is n = 0. We verify: lhs = 0. rhs =
(0 × 1)/2 = 0. Good!
Now fix n and take the expression (1) as I.H.
\begin{align*}
0 + 1 + 2 + \dots + n + (n + 1) &= \frac{n(n+1)}{2} + (n + 1) && \text{using I.H.} \\
&= (n + 1)(n/2 + 1) && \text{arithmetic} \\
&= \frac{(n+1)(n+2)}{2} && \text{arithmetic}
\end{align*}
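A numerical spot check never replaces the induction, but it is a cheap way to catch a slip in the algebra. The sketch below (function names are mine) tests both the closed form (1) and the arithmetic of the induction step for the first thousand values of n.

```python
def closed_form_holds(n):
    """Compare 0 + 1 + ... + n with n(n + 1)/2."""
    return sum(range(n + 1)) == n * (n + 1) // 2

def induction_step_holds(n):
    """I.H. value at n, plus (n + 1), must equal the closed form at n + 1."""
    return n * (n + 1) // 2 + (n + 1) == (n + 1) * (n + 2) // 2

print(all(closed_form_holds(n) and induction_step_holds(n) for n in range(1000)))
```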
5.2.18 Example Same as above but doing away with the “0+”. Again, I use SI.
\[ 1 + 2 + \dots + n = \frac{n(n+1)}{2} \tag{1} \]
\begin{align*}
1 + 2 + \dots + n + (n + 1) &= \frac{n(n+1)}{2} + (n + 1) && \text{using I.H.} \\
&= (n + 1)(n/2 + 1) && \text{arithmetic} \\
&= \frac{(n+1)(n+2)}{2} && \text{arithmetic}
\end{align*}
• Basis. n = 0: $1 = 2^0 = 2^1 - 1$. True.
• As I.H. take (1) for fixed n.
• I.S.
\begin{align*}
1 + 2 + 2^2 + \dots + 2^n + 2^{n+1} &= 2^{n+1} - 1 + 2^{n+1} && \text{using I.H.} \\
&= 2 \cdot 2^{n+1} - 1 && \text{arithmetic} \\
&= 2^{n+2} - 1 && \text{arithmetic}
\end{align*}
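Reading the claim off the Basis and the I.S. above as 1 + 2 + 2² + ... + 2ⁿ = 2ⁿ⁺¹ − 1 (my reading of the steps shown here), a quick check in Python looks like this; the function name is hypothetical.

```python
def geometric_sum_holds(n):
    """Compare 1 + 2 + 2^2 + ... + 2^n with 2^(n + 1) - 1."""
    return sum(2 ** i for i in range(n + 1)) == 2 ** (n + 1) - 1

print(all(geometric_sum_holds(n) for n in range(200)))
```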
• Basis. For n = 0 the expression “0” has the form of the rhs of (1) and satisfies inequality
(2).
• Fix an n > 0 and assume (I.H.) that if k < n, then k can be expressed as in (1) and (2).
• For the I.S. express the n of the I.H. using Euclid’s theorem (3.3.14) as
\[ n = 10q + r \]
Then
\begin{align*}
n &= 10q + r \\
&= 10\,(b_t 10^t + b_{t-1} 10^{t-1} + \dots + b_1 10 + b_0) + r \\
&= b_t 10^{t+1} + b_{t-1} 10^t + \dots + b_1 10^2 + b_0 10 + r
\end{align*}
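The I.S. is really a recursive algorithm: write n = 10q + r, obtain the digits of the smaller number q, and append r. A minimal sketch follows, assuming a slightly different basis from the text (any n < 10 is its own single digit, which avoids a leading zero); the function name is mine.

```python
def decimal_digits(n):
    """Digits of n, most significant first, found by recursing on q = n // 10."""
    if n < 10:                 # basis: a single digit represents itself
        return [n]
    q, r = divmod(n, 10)       # n = 10q + r with 0 <= r < 10
    return decimal_digits(q) + [r]

print(decimal_digits(2024))    # [2, 0, 2, 4]
```

The recursion terminates for the same reason CVI applies: q < n whenever n ≥ 10.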
6 You will recall that a number N ∋ n > 1 is a prime iff (by definition) its only factors are 1
and n.
7 You see? Do you know many natural numbers n such that n − 1 divides n?! Only 2 has this property,
but 2 is just our Basis!
5.2.23 Example Another inequality. Let $p_n$ denote the n-th prime number, for n ≥ 0. Thus
$p_0 = 2$, $p_1 = 3$, $p_2 = 5$, etc.
We prove that
\[ p_n \le 2^{2^n} \tag{1} \]
I use CVI on n. This is a bit of a rabbit out of a hat if you never read Euclid’s proof that
there are infinitely many primes.
• Basis. $p_0 = 2 \le 2^{2^0} = 2^1 = 2$.
• Fix n > 0 and take (1) as I.H.
• The I.S.: I will work with the fixed n above and the expression (product of primes, plus
1; this is inspired from Euclid’s proof quoted above).
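\[ p_0 p_1 p_2 \cdots p_n + 1 \]
I have
\begin{align*}
p_0 p_1 p_2 \cdots p_n + 1 &\le 2^{2^0} 2^{2^1} 2^{2^2} \cdots 2^{2^n} + 1 && \text{by I.H.} \\
&= 2^{2^0 + 2^1 + 2^2 + \cdots + 2^n} + 1 && \text{algebra} \\
&= 2^{2^{n+1} - 1} + 1 && \text{by 5.2.19} \\
&< 2^{2^{n+1} - 1} + 2^{2^{n+1} - 1} && \text{smallest $n$ possible is 0} \\
&= 2^1 \cdot 2^{2^{n+1} - 1} \\
&= 2^{2^{n+1}}
\end{align*}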
Since the sequence of primes is strictly increasing, $p_{n+1}$ is the least that q can be.
Thus
\[ p_{n+1} \le p_0 p_1 p_2 \cdots p_n + 1 \le 2^{2^{n+1}} \]
in this case.
2. q is composite. By 5.2.20 some prime r divides q. Now, none of the
\[ p_0, p_1, p_2, \dots, p_n \]
divides q because of the “+ 1”. Thus r is different from all of them, so it must be
one of $p_{n+1}$ or $p_{n+2}$ or $p_{n+3}$ or …
Thus,
\[ p_{n+1} \le r < q = p_0 p_1 p_2 \cdots p_n + 1 \le 2^{2^{n+1}} \]
Done!
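Both the bound (1) and the Euclid-style step can be spot-checked for the first few primes. The sketch below uses plain trial division; the function names and the ranges tested are my own choices, and passing the check is of course not a proof.

```python
def first_primes(count):
    """The first `count` primes p_0, p_1, ... by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def smallest_prime_factor(q):
    d = 2
    while d * d <= q:
        if q % d == 0:
            return d
        d += 1
    return q

def euclid_step_holds(primes, n):
    """Some prime factor r of p_0 ... p_n + 1 is new, and p_{n+1} <= r."""
    q = 1
    for p in primes[: n + 1]:
        q *= p
    q += 1
    r = smallest_prime_factor(q)
    return r not in primes[: n + 1] and primes[n + 1] <= r

primes = first_primes(12)
print(all(p <= 2 ** (2 ** n) for n, p in enumerate(primes)))   # True
print(all(euclid_step_holds(primes, n) for n in range(6)))     # True
```

For n = 5 the number q = 30031 = 59 × 509 is composite, so that instance exercises case 2 of the proof.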
$b_1 = 3$, $b_2 = 6$
$b_k = b_{k-1} + b_{k-2}$, for $k \ge 3$
Proof So the boundary condition is (from the italicised part above) n = 1. This is the Basis.
Case 2. n > 2. Is $b_n$ divisible by 3? Well, $b_n = b_{n-1} + b_{n-2}$ in this case. By I.H. (valid
for all k < n) I have that $b_{n-1} = 3t$ and $b_{n-2} = 3r$, for some integers t, r. Thus,
$b_n = 3(t + r)$. Done!
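The sequence is easy to tabulate, which makes the divisibility claim visible before one proves it. A small sketch, with a function name of my own choosing:

```python
def b(n):
    """b_1 = 3, b_2 = 6, b_k = b_(k-1) + b_(k-2) for k >= 3."""
    values = {1: 3, 2: 6}
    for k in range(3, n + 1):
        values[k] = values[k - 1] + values[k - 2]
    return values[n]

print([b(n) for n in range(1, 8)])               # [3, 6, 9, 15, 24, 39, 63]
print(all(b(n) % 3 == 0 for n in range(1, 200)))
```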
5.2.25 Example (The Binomial Theorem) We prove in this example the so-called binomial
theorem, for any N ∋ n > 0 and any real or complex numbers a and b.
\[ (a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^i \tag{1} \]
where $\binom{n}{m}$ stands for
\[ \frac{n!}{(n-m)!\,m!} \]
where in turn n! stands for 1 × 2 × 3 × · · · × n, that is, it is inductively defined as
0! = 1
(n + 1)! = (n + 1) × n!
Suppose we have n objects. In how many ways can we choose m among them (m < n)
ignoring repetitions? Well, the first of the m elements I can choose in n ways. The second
of the m I can choose in n − 1 ways after I chose and removed the first. Clearly then I can
choose the 3rd in n − 2 ways after I removed the 2nd; etc.
All in all, I can choose all m members in $n(n-1)(n-2)\cdots(n-(m-1))$ ways.
Wait! I have m! repetitions in my choices of m elements if I do nothing else. So the final
answer is the above divided by m!
\begin{align*}
\frac{n(n-1)(n-2)\cdots(n-(m-1))}{m!}
&= \frac{n(n-1)(n-2)\cdots(n-(m-1))\,\overbrace{(n-m)(n-(m+1))\cdots 3\cdot 2\cdot 1}^{(n-m)!}}{m!\,(n-m)!} \\
&= \frac{n!}{m!\,(n-m)!}
\end{align*}
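The counting argument can be compared with a direct enumeration. The sketch below uses `itertools.combinations` to list the unordered choices and checks that their number agrees with n!/(m!(n − m)!); the function name `choose` is mine.

```python
from itertools import combinations
from math import factorial

def choose(n, m):
    """n! / (m! (n - m)!), computed with exact integer arithmetic."""
    return factorial(n) // (factorial(m) * factorial(n - m))

print(choose(5, 2), len(list(combinations(range(5), 2))))   # 10 10
print(all(
    choose(n, m) == len(list(combinations(range(n), m)))
    for n in range(9) for m in range(n + 1)
))
```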
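Before we embark on the proof of the binomial theorem here are some properties of the
symbol $\binom{n}{m}$ that we will use:
I. $\binom{n}{0} = 1$. Indeed, $\binom{n}{0} = \frac{n!}{(n-0)!\,0!}$, but 0! = 1 by definition.
II. $\binom{n}{n} = 1$. Indeed, $\binom{n}{n} = \frac{n!}{n!\,(n-n)!}$.
III.
\[ \binom{n}{m} + \binom{n}{m-1} = \binom{n+1}{m} \]
Indeed we work from left to right using the definition of $\binom{n}{m}$.
\begin{align*}
\binom{n}{m} + \binom{n}{m-1} &= \frac{n!}{(n-m)!\,m!} + \frac{n!}{(n-(m-1))!\,(m-1)!} \\
&= \frac{n!}{(n-m)!\,(m-1)!} \left( \frac{1}{m} + \frac{1}{n-(m-1)} \right) \\
&= \frac{n!}{(n-m)!\,(m-1)!} \cdot \frac{n+1}{m(n-m+1)} \\
&= \frac{(n+1)!}{(n+1-m)!\,m!} \\
&= \binom{n+1}{m}
\end{align*}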
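\[
\begin{aligned}
(a+b)^{n+1} &= (a+b)(a+b)^n \\
&\overset{\text{I.H.}}{=} (a+b)\left[\binom{n}{0}a^{n} + \binom{n}{1}a^{n-1}b^{1} + \binom{n}{2}a^{n-2}b^{2} + \cdots + \binom{n}{n}b^{n}\right] \\
&\overset{\text{multiply}}{=} \binom{n}{0}a^{n+1} + \binom{n}{1}a^{n}b^{1} + \binom{n}{2}a^{n-1}b^{2} + \binom{n}{3}a^{n-2}b^{3} + \cdots + \binom{n}{n}a^{1}b^{n} \\
&\qquad + \binom{n}{0}a^{n}b + \binom{n}{1}a^{n-1}b^{2} + \binom{n}{2}a^{n-2}b^{3} + \cdots + \binom{n}{n-1}a^{1}b^{n} + b^{n+1} \\
&\overset{\text{I, II, III}}{=} \binom{n+1}{0}a^{n+1} + \binom{n+1}{1}a^{n}b^{1} + \binom{n+1}{2}a^{n-1}b^{2} + \binom{n+1}{3}a^{n-2}b^{3} + \cdots + \binom{n+1}{n}a^{1}b^{n} + \binom{n+1}{n+1}b^{n+1}
\end{aligned}
\]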
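A numerical sanity check is no substitute for the induction proof, but it can catch a slip in the algebra. The following is a minimal Python sketch; the function names `binom` and `binomial_expansion` are mine, chosen for illustration, and the symbol $\binom{n}{m}$ is computed straight from its factorial definition.

```python
from math import factorial


def binom(n, m):
    """The symbol (n choose m), computed as n! / ((n - m)! m!)."""
    return factorial(n) // (factorial(n - m) * factorial(m))


def binomial_expansion(a, b, n):
    """Right hand side of (1): the sum over i of (n choose i) a^(n-i) b^i."""
    return sum(binom(n, i) * a ** (n - i) * b ** i for i in range(n + 1))


# Spot-check properties I, II, III and the theorem itself for small n.
for n in range(1, 10):
    assert binom(n, 0) == 1                      # property I
    assert binom(n, n) == 1                      # property II
    for m in range(1, n + 1):
        assert binom(n, m) + binom(n, m - 1) == binom(n + 1, m)  # property III
    assert binomial_expansion(3, -2, n) == (3 + (-2)) ** n       # theorem (1)
```

The checks use integers only, so the comparisons are exact.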
Here are a few additional exercises for you to try.
5.2.27 Exercise
$b_0 = 1$, $b_1 = 2$, $b_2 = 3$
$b_k = b_{k-1} + b_{k-2} + b_{k-3}$, for k ≥ 3
As a postscript to our examples of induction proofs we offer this comment. It is clear that
since sets such as N − {0, 1, 3, 4, 5} and N ∪ {−3, −2, −1} are well-ordered (by <) we can
carry induction proofs over them. In the former case the “basis” case is at 6, in the latter
case it is at −3. In fact, in the preceding problem 4. the basis is at n = 2.
Inductive definitions are increasingly being renamed to “recursive definitions” in the modern
literature, thus using “recursive” for definitions, and “induction” for proofs. I will not go
out of my way to use this dichotomy of nomenclature. Here are some familiar examples of
inductive definitions of functions.
5.3.1 Example For any integer a > 0 we define
$a^0 = 1$
$a^{n+1} = a \cdot a^n$    (†)
$F_0 = 0$
$F_1 = 1$, and for n ≥ 1    (‡)
$F_{n+1} = F_n + F_{n-1}$
Unlike the function (sequence) $a^0, a^1, a^2, a^3, \ldots$, for which we only need the value at n
to compute the value at n + 1, the Fibonacci function needs two previous values, at n − 1
and at n, to compute the value at n + 1.
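The two inductive definitions read almost verbatim as recursive programs. The sketch below is only an illustration (the names `power` and `fib` are mine); it shows the contrast between one recursive call and two.

```python
def power(a, n):
    """a^n by (†): a^0 = 1 and a^(n+1) = a * a^n."""
    if n == 0:
        return 1
    return a * power(a, n - 1)   # one recursive call, at the predecessor


def fib(n):
    """F_n by (‡): F_0 = 0, F_1 = 1, F_(n+1) = F_n + F_(n-1)."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)   # two recursive calls


print(power(2, 10))
print([fib(n) for n in range(10)])
```

That such a program runs says nothing yet about whether the function it is meant to compute exists as a mathematical object; that is the question taken up next.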
The question is: Given an inductive definition of a function, can we prove that a function
f exists —that is, a potentially infinitely long table of input/output pairs— that satisfies the
“inductive specification”?
This translates, in the first example above, into “is there a realisation f —as a function,
an infinite table in this case— of what the definition (†) specifies as the behaviour of the
function?”
Such a function must obey the two equations below:
f(0) = 1
f(n + 1) = a · f(n)    (†′)
How NOT to answer this: “Of course it exists. This f satisfies f(0) = 1, so we got an
output for input x = 0. If we now assume that we do have an output f(n) at input x = n
(I.H.), then at input x = n + 1 we have the output a × f(n).”
What just happened here? We proved that IF a function f that satisfies (†′) exists, then
it is total. We never proved that the infinite table, which the function f is supposed to be,
exists and has the stated property (i.e., obeys the inductive definition).
In fact we took for granted that f exists and satisfies the recurrence equations (†′) and
proceeded to prove that then it will be total!
The above (non) “proof” of function existence has actually appeared in print in a Discrete
Math text!
This section looks into inductive definitions in general, and proves that a function defined
inductively as in, for example, (†′) above exists and is unique.
We said “in general” above. So we will present the existence and uniqueness theorem by
generalising the Fibonacci example above in several directions, as follows:
1. Will have the second equation depend on several (more than just two that we used in the
Fibonacci definition) recursive calls, as we name them in computer programming.
2. The defined function will not need to be total nor will the functions —which make
the recursive calls (such as the function “+” in the Fibonacci example and “×” in the
exponential example)— need be total.
3. The inductive definition in the Fibonacci example defines Fn in terms of two recursive
calls on the two immediately preceding (along <) arguments n − 1 and n − 2 of n. We
generalise this in two directions:
• The order P we use in an inductive definition to “sort the inputs” in the general case
(equation number two) is not necessarily < on N, nor is necessarily total.
• The 2nd equation defines some function g at an argument a using one recursive call
for each predecessor of a along the order P.
Thus, to motivate the general inductive definition of a function F over any class equipped
with an order P that has IC, we first sketch the case of an inductive definition of a function
K over N equipped with the standard order <.
5.3.4 Remark The notation of the set-argument
{K(0), K(1), . . . , K(n − 1)}    (2)
in the definition (1) above is significantly less informative than the notation implies! Its
members —listed again in (2)— are just members of the set A and the marking of the inputs
responsible for the various K (i) is not embedded in these output values! So neither we, nor
G, knows which is which if we are just given the values in the set (2).
Can we modify the right hand side of K(n) to G(n, K(0), K(1), . . . , K(n − 1))?
No, because a function G cannot have a variable number of arguments (n + 1 arguments
in all) that increases or decreases with the value of n.
This final idea however works: Tag along the input values that cause the K(i), that is,
use
K(0) = C, and, for n > 0
K(n) = G(n, {⟨0, K(0)⟩, ⟨1, K(1)⟩, . . . , ⟨n − 1, K(n − 1)⟩})    (3)
The set argument of G in (3) is (cf. 3.1.4)
K ↾ (n)>
which we will use in all that follows.
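To see why the tags matter, here is a small Python sketch of the tagged recursion (3). Everything named in it is hypothetical: `course_of_values` is my helper, and `G_weighted` is an arbitrary choice of G that needs to know which input produced which value.

```python
def course_of_values(G, C, n):
    """K(n), where K(0) = C and K(k) = G(k, {(i, K(i)) : i < k}) for k > 0."""
    tagged = set()                       # the set {(i, K(i)) : i < k}
    for k in range(n + 1):
        value = C if k == 0 else G(k, frozenset(tagged))
        tagged.add((k, value))
    return value


def G_weighted(n, tagged):
    """Hypothetical G: the sum of (i + 1) * K(i) over all tagged pairs."""
    return sum((i + 1) * k_i for i, k_i in tagged)


values = [course_of_values(G_weighted, 1, n) for n in range(5)]
print(values)

# Here K(0) and K(1) coincide, so the untagged set {K(0), K(1)} has ONE member:
assert len({values[0], values[1]}) == 1
```

With the untagged set (2) this G could not even be stated, since equal outputs collapse and the inputs that caused them are lost.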
Our theorem of the existence and uniqueness of inductively defined functions will be for
recursions along an arbitrary partial order P with IC and then we will obtain as a corollary
the case where P has IC but is not necessarily an order. Of course, a trivial corollary to all
that will be the existence and uniqueness of functions K defined as in (3) above.
5.3.5 Definition (Levy (1979)) A relation P is left-narrow iff $(x)P^{-1}$ is a set for all x. It is
left-narrow over A iff P | A is left-narrow.
For example, ∈ is left-narrow by the foundation axiom (5.2.16 and p. 127), while is
not.
5.3 Inductive Definitions of Functions 167
5.3.6 Definition (Initial Segments) If < is an order on A and a ∈ A, then the class
{x : x < a} = (a)> is called the (initial) open segment defined by a, while the class
(a)≥ = {x : x ≤ a} is called the closed segment defined by a.
≤ is < ∪ =, of course, so that (a)≥ = (a)> ∪ {a}. Segments of left-narrow relations are
sets.
The requirement of left-narrowness guarantees (via Principle 3, 3.3.6) that the second
argument of G in (1) is a set. This restriction does not adversely affect applicability of the
theorem as the reader will be able to observe.
In (1) above “=” is Kleene’s extended equality, so that in the recurrence (1) above we have
either both sides are defined and equal (as sets or atoms), or both are undefined (see 3.5.11).
Proof We prove uniqueness first, so let H : A → X also satisfy (1). Let a ∈ A and adopt
the I.H. that
(∀b < a)F(b) = H(b)
that is, for all b < a, (∀y)(b, y ∈ F ↔ b, y ∈ H), and therefore
It follows that
This settles the claim of uniqueness: (∀a ∈ A)F(a) = H(a), that is, F = H. Define now,
F = { f : (∃a ∈ A)( f : (a)≥ → X ∧ (∀x ∈ (a)≥) f(x) = G(x, f ↾ (x)>) ) }    (2)
Note that F ≠ ∅. For example, if a ∈ A is <-minimal,¹⁰ then (a)> = ∅ and hence
f ↾ (a)> = ∅ for any f, thus F contains {(a, G(a, ∅))}, if G(a, ∅)↓, else it contains the
empty function ∅ : (a)≥ → X.
For the latter we clearly have ∅(a) = G(a, ∅ ↾ (a)>), where both sides are undefined.
A trivial adaptation of the uniqueness argument to the case that A is a closed segment
(a)≥, shows that if f : (a)≥ → X and g : (a)≥ → X are in F, then f = g. We use “$f_a$”
to denote the unique f : (a)≥ → X for each a ∈ A, if it exists.
To remove the hedging, fix a and assume (I.H.) that, for each b < a, $f_b$ : (b)≥ → X
satisfying (1) (where here A = (b)≥) exists.
Let us argue that so does $f_a$. Indeed, define h : (a)≥ → X from the existing (by the I.H.)
$f_b$ and G by
\[
h = \begin{cases}
\bigl\{\langle a, G(a, \bigcup_{b<a} f_b)\rangle\bigr\} \cup \bigcup_{b<a} f_b & \text{if } G(a, \bigcup_{b<a} f_b)\downarrow \\[2pt]
\bigcup_{b<a} f_b & \text{otherwise}
\end{cases} \tag{3}
\]
Observe next that, by transitivity of <,¹¹ we have (c)≥ ⊆ (b)≥ whenever c ≤ b,¹² therefore
$f_c \subseteq f_b$, due to $f_c = f_b \upharpoonright (c){\geq}$ (by uniqueness).
We draw two conclusions:
First, to retire the induction, note that $\bigcup_{b<a} f_b$ in (3) is single-valued (a function) and is
equal to h ↾ (a)>. Thus h satisfies the recurrence (1) at a outright, and also at b < a because
\[
\begin{aligned}
h(b) &= f_b(b) \\
&= G(b, f_b \upharpoonright (b){>}) \\
&= G(b, h \upharpoonright (b){>}), \text{ since } f_b \subseteq h
\end{aligned}
\]
5.3.8 Remark
(1) Since (a) ≥ is a set for each a ∈ A (by left-narrowness), so is dom( f ) for each f ∈ F
and hence each f itself is a set, by Principle 3, so forming the class F is legitimate.13
(2) The simple recursion on the natural numbers, where g is total
f(0) = a
for n ≥ 0, f(n + 1) = g(n, f(n))    (1)
where
\[
G(n, h) = \begin{cases}
a & \text{if } n = 0 \\
g(n-1, h(n-1)) & \text{if } h \text{ is a function} \land \operatorname{dom}(h) = \{x : x < n\} \\
\uparrow & \text{otherwise}
\end{cases}
\]
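The passage from g to G can be tried out concretely. In the sketch below (an illustration under my own conventions, not the text's formal apparatus) a finite function is a Python `dict`, "undefined" (↑) is modelled by `None`, and the names `make_G` and `solve` are hypothetical.

```python
def make_G(a, g):
    """Build G(n, h) from the basis value a and the total function g."""
    def G(n, h):
        if n == 0:
            return a
        # h must be a function whose domain is exactly {x : x < n}
        if isinstance(h, dict) and set(h) == set(range(n)):
            return g(n - 1, h[n - 1])
        return None                      # models "undefined"
    return G


def solve(G, n):
    """Tabulate f on {0, ..., n}, where f(k) = G(k, f restricted to {x : x < k})."""
    f = {}
    for k in range(n + 1):
        value = G(k, dict(f))
        if value is None:
            break
        f[k] = value
    return f


# Example: f(0) = 1, f(n + 1) = (n + 1) * f(n), i.e. the factorial.
G_fact = make_G(1, lambda n, y: (n + 1) * y)
print(solve(G_fact, 5))
```

Note that `G_fact` looks only at the last entry of the history it is handed, which is exactly what makes the simple recursion a special case of the general one.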
5.3.9 Corollary (Inductive Definition with Respect to Any P with IC) Montague (1955),
Tarski (1955)
Let P : A → A be a left-narrow relation —not necessarily an order— with IC, and G
a (not necessarily total) function G : A × U → X, for some class X. Then there exists a
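Proof Define $\widetilde{G} : A \times U \to X$ by
\[
\widetilde{G}(a, f) = \begin{cases}
\uparrow & \text{if } f \text{ is not a function} \\
G(a, f \upharpoonright (a)P^{-1}) & \text{othw}
\end{cases}
\]
Let < stand for $P^{+}$ and hence > is $(P^{-1})^{+}$ (cf. Exercise 3.9.21). Now < is an order on A
that has IC, and is left-narrow since
\[
(a)(P^{-1})^{+} = \bigcup \bigl\{ (a)(P^{-1})^{n} : n > 0 \bigr\}
\]
and an easy argument shows that each $(a)(P^{-1})^{n}$ is a set (Exercise 5.4.27). Thus, by 5.3.7,
there is a unique F : A → X such that
\[
\begin{aligned}
(\forall a \in A)\ F(a) &= \widetilde{G}(a, F \upharpoonright (a){>}) \\
&= G\bigl(a, (F \upharpoonright (a){>}) \upharpoonright (a)P^{-1}\bigr)
\end{aligned} \tag{1}
\]
Now, since $(a)P^{-1} \subseteq (a){>}$ we have $(F \upharpoonright (a){>}) \upharpoonright (a)P^{-1} = F \upharpoonright (a)P^{-1}$, hence (1)
becomes
\[
(\forall a \in A)\ F(a) = G(a, F \upharpoonright (a)P^{-1})
\]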
Proof We only need to show that dom(F) = A. By 5.3.9, there is a unique F satisfying
5.3.11 Remark (Notation Moschovakis (1969)) In the following corollaries we use some
notation introduced by Moschovakis:
Define, the functions π and δ by
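$(u, a)\,\widehat{P}\,(v, b)$ iff u = v ∧ aPb
It is clear that $\widehat{P}$ has MC. Now, (1) can be rewritten as
\[
\begin{aligned}
(\forall (s, a) \in S \times A)\ F(s, a) &= G\bigl(s, a, \{\langle s, x, F(s, x)\rangle : (s, x)\,\widehat{P}\,(s, a)\}\bigr) \\
&= G\bigl(s, a, F \upharpoonright (s, a)\widehat{P}^{-1}\bigr)
\end{aligned}
\]
The result follows from 5.3.9 by using the J given below as the “G-function”
\[
J(g, f) = \begin{cases}
\uparrow & \text{if } g \notin S \times A \\
G(\pi(g), \delta(g), f) & \text{othw}
\end{cases}
\]
Thus,
\[
\begin{aligned}
(\forall g \in S \times A)\ F(g) &= G\bigl(\pi(g), \delta(g), F \upharpoonright (g)\widehat{P}^{-1}\bigr) \\
&= J\bigl(g, F \upharpoonright (g)\widehat{P}^{-1}\bigr)
\end{aligned}
\]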
5.3.13 Corollary (Recursive Definition with Parameters II) Let all assumptions be as in
Corollary 5.3.12, except that the recurrence now reads
(∀(s, a) ∈ S × A) F(s, a) = G(s, a, {⟨x, F(s, x)⟩ : xPa})    (1)
where p23 : U → U —to get the right hand side of (2) to be the same as that in (1)— is
\[
p23(f) = \begin{cases}
\uparrow & \text{if } f \text{ is not a class of 3-tuples} \\
\{\langle \delta(\pi(z)), \delta(z)\rangle : z \in f\} & \text{othw}
\end{cases}
\]
Note that,
\[
F \upharpoonright (s, a)\widehat{P}^{-1} = \{\langle (s, x), F(s, x)\rangle : xPa\}
\]
thus, setting z = ⟨(s, x), F(s, x)⟩, we have δ(π(z)) = x and δ(z) = F(s, x) as needed by
(1).
5.3.14 Corollary (Pure Recursion Along a Well-Ordering with a Partial G) Let <: A →
A be a left-narrow well-ordering, and G a (not necessarily total) function G : U → X, for
some class X.
Then there exists a unique function F : A → X satisfying (1)–(2) below:
(1) (∀a ∈ A) F(a) = G(F ↾ (a)>),
(2) dom(F) is either A, or (a) > for some a ∈ A.
“Pure recursion” refers to the fact that G has only one argument, the “history” of F on the
open segment (a) >.
Proof In view of Theorem 5.3.7, we only need prove (2). So let dom(F) ≠ A. Let a in A
be <-minimal (also minimum here, since < is total) such that
Thus (a)> ⊆ dom(F). We will argue that dom(F) = (a)>. Well, let instead
b ∈ dom(F) − (a)> be minimal such that F(b)↓.
Thus,
F ↾ (b)> = F ↾ (a)>    (4)
therefore
5.3.15 Example Let G : {0, 1} × U → {0, 1} be given as
\[
G(x, f) = \begin{cases}
1 & \text{if } x = 1 \land f = \emptyset \\
\uparrow & \text{othw}
\end{cases}
\]
and {0, 1} be equipped with the standard order < on N. Then the recursive definition
yields the function F = {⟨1, 1⟩} whose domain is neither {0, 1} nor a segment of {0, 1}.
Thus the requirement of pure recursion in 5.3.14 is essential.¹⁴
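The example is small enough to compute by hand, but a mechanical run makes the point vivid. The sketch below assumes the recurrence in question is the general one, F(a) = G(a, F ↾ (a)>), which is my reading of the context; `None` stands in for ↑ and the helper `recurse` is hypothetical.

```python
UNDEFINED = None   # stands in for "undefined"


def G(x, f):
    """G of Example 5.3.15: defined (with value 1) only at x = 1 and f = empty."""
    return 1 if (x == 1 and f == {}) else UNDEFINED


def recurse(domain, G):
    """Tabulate F(a) = G(a, F restricted to {b : b < a}), in increasing order of a."""
    F = {}
    for a in sorted(domain):
        history = {b: F[b] for b in F if b < a}   # F restricted to the open segment
        value = G(a, history)
        if value is not UNDEFINED:
            F[a] = value
    return F


F = recurse([0, 1], G)
print(F, "with domain", set(F))
```

F(0) is undefined because x ≠ 1, so the history at 1 is empty and F(1) = 1: the domain {1} skips over 0.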
5.3.16 Remark In “practice”, recursive definitions with respect to a P that has MC (IC)
have often the form
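\[
F(s, x) = \begin{cases}
H(s) & \text{if } x \text{ is } P\text{-minimal} \\
G\bigl(s, x, \{\langle s, y, F(s, y)\rangle : yPx\}\bigr) & \text{othw}
\end{cases}
\]
This reduces to the case considered in 5.3.12 with a G-function, $\widetilde{G}$, given by
\[
\widetilde{G}(s, x, f) = \begin{cases}
H(s) & \text{if } x \text{ is } P\text{-minimal} \\
G(s, x, f) & \text{othw}
\end{cases}
\]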
A similar remark holds —regarding making the “basis” of the recursion explicit— for all
the forms of recursion that we have considered.
14 It was tacitly taken advantage of in the last step of the proof. Imagine what would happen if F’s
argument were explicitly present in G: We would get G(b, F ↾ (b)>) = G(b, F ↾ (a)>) but not
necessarily G(b, F ↾ (a)>) = G(a, F ↾ (a)>).
5.3.17 Example (The support function) The support function sp : U → U gives the set
of all atoms, sp(x), that took part in the formation of some set x.
For example,
sp(∅) = ∅
sp({{{∅}}}) = ∅
sp({2, {#, !, {1}}}) = {2, #, 1, !} for atoms 2, #, 1, !
\[
sp(x) = \begin{cases}
\{x\} & \text{if } x \text{ is an atom} \\
\bigcup \{sp(y) : y \in x\} & \text{othw}
\end{cases} \tag{1}
\]
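The recursion (1) runs along ∈ and can be imitated on hereditarily finite sets. In the sketch below I model sets as Python `frozenset` values and treat anything that is not a `frozenset` as an atom; both conventions are assumptions made for illustration only.

```python
def sp(x):
    """Support of x: {x} for an atom, else the union of sp(y) over y in x."""
    if not isinstance(x, frozenset):     # x is an atom
        return frozenset({x})
    support = frozenset()
    for y in x:
        support |= sp(y)                 # union of the supports of the members
    return support


empty = frozenset()
print(sp(empty))
print(sp(frozenset({frozenset({frozenset({empty})})})))
print(sp(frozenset({2, frozenset({"#", "!", frozenset({1})})})))
```

The three calls correspond to the three sample values listed in the example.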
Proof 1. ≺ is an order:
• Indeed, if $(a, \vec{b}) \prec (a, \vec{b})$, then a < a which is absurd.
• If $(a, \vec{b}) \prec (a', \vec{b}') \prec (a'', \vec{b}'')$, then $\vec{b} = \vec{b}' = \vec{b}''$ and a < a′ < a″. Thus a < a″ and
hence $(a, \vec{b}) \prec (a'', \vec{b}'')$.
2. ≺ has MC: So let $\emptyset \neq A \subseteq \mathbb{N}^{n+1}$. Let a be <-minimal in $S = \{x : (\exists \vec{b})(x, \vec{b}) \in A\} \subseteq \mathbb{N}$.
Pause. Why is S ≠ ∅?
Let $\vec{c}$ be such that $(a, \vec{c}) \in A$. This $(a, \vec{c})$ is ≺-minimal in A. Otherwise for some d,
$A \ni (d, \vec{c}) \prec (a, \vec{c})$. Hence d < a, but this is a contradiction since d ∈ S (why?).
The minimal elements of ≺ in $\mathbb{N}^{n+1}$ are of the form $(0, \vec{b}), (0, \vec{b}'), (0, \vec{b}''), \ldots$, which are
not comparable if they have distinct “$\vec{b}$-parts”. Thus they are infinitely many.
We can now state the important (for computability, e.g., cf. Tourlakis (2012, 2022)).
5.3.20 Definition (Primitive Recursive Schema) The following inductive definition is the
schema of primitive recursion due to Dedekind. Define $f : \mathbb{N}^{n+1} \to \mathbb{N}$ via given functions
$h : \mathbb{N}^{n} \to \mathbb{N}$ and $g : \mathbb{N}^{n+2} \to \mathbb{N}$ by
5.3.21 Theorem The schema (1) of 5.3.20 defines inductively a unique function
$f : \mathbb{N}^{n+1} \to \mathbb{N}$.
Proof Using the relation ≺ of 5.3.19 that has MC (and thus IC) on $\mathbb{N}^{n+1}$, we rewrite (1) of
5.3.20 as follows:
• Noting that
f x, y = x−1, y, f (x − 1, y) , x − 2, y, f (x − 2, y) , . . . , 0, y, f (0, y)
(3)
we see that the function in (3) applied to input x − 1, y yields f (x − 1, y) as needed.
5.3.22 Exercise Prove by induction on x (and using y as a parameter) that the f defined
by (1) is total provided h and g are.
But “. . .”, or “etc.”, is not mathematics! That is why we gave at the outset of this section
the definition 5.3.1.
Applied to the case a = 2 we have
$2^0 = 1$
$2^{n+1} = 2 \times 2^n$    (1)
From 5.3.21 we have at once that 5.3.1 and in particular 5.3.23 defines a unique function,
each satisfying its defining equations.
For the function that for each n outputs $2^n$ we can give an alternative definition that uses
“+” rather than “×” in the “g-function” part of the definition:
$2^0 = 1$
$2^{n+1} = 2^n + 2^n$
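Both definitions of the powers of 2 are instances of primitive recursion with different “g-functions”. The sketch below assumes the usual shape of the schema, f(0, y) = h(y) and f(x + 1, y) = g(x, y, f(x, y)), with y standing for the parameters; the combinator name `primitive_recursion` is hypothetical.

```python
def primitive_recursion(h, g):
    """Return f with f(0, *y) = h(*y) and f(x + 1, *y) = g(x, *y, f(x, *y))."""
    def f(x, *y):
        value = h(*y)                    # basis: f(0, y)
        for k in range(x):
            value = g(k, *y, value)      # step: f(k + 1, y) from f(k, y)
        return value
    return f


# 2^n with "×" in the g-function, and again with "+" in the g-function.
pow2_times = primitive_recursion(lambda: 1, lambda n, prev: 2 * prev)
pow2_plus = primitive_recursion(lambda: 1, lambda n, prev: prev + prev)

# An example with one parameter: x + y defined from the successor.
add = primitive_recursion(lambda y: y, lambda x, y, prev: prev + 1)

print([pow2_times(n) for n in range(8)])
print([pow2_plus(n) for n in range(8)])
print(add(3, 4))
```

The loop is only one way to evaluate the schema; the existence and uniqueness of f is what the theorem provides, not the program.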
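5.3.24 Example Let $f : \mathbb{N}^{n+1} \to \mathbb{N}$ be given. How can I define $\sum_{i=0}^{m} f(i, \vec{b})$ —for any
$\vec{b} \in \mathbb{N}^{n}$— other than by the sloppy
$f(0, \vec{b}) + f(1, \vec{b}) + f(2, \vec{b}) + \ldots + f(i, \vec{b}) + \ldots + f(m, \vec{b})$?
By induction/recursion, of course:
\[
\begin{aligned}
\sum_{i=0}^{0} f(i, \vec{b}) &= f(0, \vec{b}) \\
\sum_{i=0}^{m+1} f(i, \vec{b}) &= \sum_{i=0}^{m} f(i, \vec{b}) + f(m+1, \vec{b})
\end{aligned} \tag{1}
\]
5.3.25 Example Let $f : \mathbb{N}^{n+1} \to \mathbb{N}$ be given. How can I define $\prod_{i=0}^{n} f(i, \vec{b})$ —for any
$\vec{b} \in \mathbb{N}^{n}$— other than by the sloppy
$f(0, \vec{b}) \times f(1, \vec{b}) \times f(2, \vec{b}) \times \ldots \times f(i, \vec{b}) \times \ldots \times f(n, \vec{b})$?
By induction/recursion:
\[
\begin{aligned}
\prod_{i=0}^{0} f(i, \vec{b}) &= f(0, \vec{b}) \\
\prod_{i=0}^{n+1} f(i, \vec{b}) &= \Bigl(\prod_{i=0}^{n} f(i, \vec{b})\Bigr) \times f(n+1, \vec{b})
\end{aligned} \tag{2}
\]
Again, by 5.3.21, (2) defines a unique function named $\lambda n \vec{b}.\prod_{i=0}^{n} f(i, \vec{b})$ that behaves as
required.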
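The two recursive definitions translate line for line into code, with no “. . .” anywhere. A minimal sketch follows; the names `bounded_sum` and `bounded_product` and the sample f are mine.

```python
def bounded_sum(f, m, *b):
    """Sum of f(i, *b) for i = 0..m, by the recursion on m."""
    if m == 0:
        return f(0, *b)
    return bounded_sum(f, m - 1, *b) + f(m, *b)


def bounded_product(f, n, *b):
    """Product of f(i, *b) for i = 0..n, by the recursion on n."""
    if n == 0:
        return f(0, *b)
    return bounded_product(f, n - 1, *b) * f(n, *b)


def sample(i, b):
    """Hypothetical f with one parameter b."""
    return i + b


print(bounded_sum(sample, 3, 2))       # (0+2) + (1+2) + (2+2) + (3+2)
print(bounded_product(sample, 3, 2))   # (0+2) * (1+2) * (2+2) * (3+2)
```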
f(0) = 1
$f(n + 1) = 2^{f(n)}$    (3)
Hmm! Is the guess that f(n) is a ladder of n 2s correct? Yes! Let’s verify by induction:
One often refers to the part “ f (n, y) ”, that is,15
n − 1, y, f (n − 1, y) , n − 2, y, f (n − 2, y) , . . . , 0, y, f (0, y)
5.3.28 Example (Fibonacci again; with a comment re Basis case) Thus if we want to fit
the Fibonacci definition into the general schemata of 5.3.9 or 5.3.27 —without a parameter
“$\vec{y}$”— we would choose a “g” like this
\[
g(n, f) = \begin{cases}
\text{if } f \not\subseteq \mathbb{N}^{2} \text{ then } \uparrow \\
\text{else if } n = 0 \text{ then } 0 \\
\text{else if } n = 1 \text{ then } 1 \\
\text{else if } n > 1 \text{ then } f(n-1) + f(n-2)
\end{cases} \tag{1}
\]
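Here is how such a “g” might look when the history f is a finite table. As before this is only a sketch under my own conventions: a Python `dict` of natural numbers plays the role of a subset of N × N that is a function, `None` plays the role of ↑, and `fib_table` is a hypothetical driver.

```python
def g(n, f):
    """The g of (1): basis cases are folded into g itself."""
    is_table = isinstance(f, dict) and all(
        isinstance(k, int) and isinstance(v, int) and k >= 0 and v >= 0
        for k, v in f.items()
    )
    if not is_table:
        return None                      # models "undefined"
    if n == 0:
        return 0
    if n == 1:
        return 1
    return f[n - 1] + f[n - 2]


def fib_table(n):
    """Tabulate F(k) = g(k, F restricted to {x : x < k}) for k = 0..n."""
    F = {}
    for k in range(n + 1):
        F[k] = g(k, dict(F))
    return F


print(fib_table(10))
```

Notice that the two basis values are not special equations here; they are simply branches of g.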
15 In the expanded version below it is understood that the tuple (x, y, f (x, y)) is missing if f (x, y) ↑.
5.4 Exercises
20. Let
$b_1 = 3$, $b_2 = 6$
$b_k = b_{k-1} + b_{k-2}$, for k ≥ 3
$F_0 = 0$, $F_1 = 1$
$F_k = F_{k-1} + F_{k-2}$, for k ≥ 2
Let φ stand for the number $\frac{1+\sqrt{5}}{2}$. Prove by induction that $F_n > \varphi^{\,n-2}$ for all n ≥ 3.
24. Let A be a set of n elements. Prove that $2^A$ has $2^n$ elements using the binomial theorem.
Hint. By the binomial theorem (5.2.25) $2^n = (1 + 1)^n = \sum_{i=0}^{n} \binom{n}{i}$. Do not mix this
up with the methodology suggested in Exercise 25 below.
25. Use induction on n to prove that if A has n elements, that is, A ∼ {0, 1, . . . , n − 1}
if n ≥ 1 —that is, A has the form $\{a_0, a_1, \ldots, a_{n-1}\}$— or A = ∅, then $2^A$ has $2^n$
elements.
Hint. For the induction step —going from $A = \{a_0, \ldots, a_{n-1}\}$ to $A' = \{a_0, \ldots, a_{n-1}, a_n\}$—
argue that the added member $a_n$ is in as many new subsets (of A′) as
A has in total.
26. Show, for any 0 < n ∈ N, that $(P^n)^{-1} = (P^{-1})^n$.
27. Let P on A be left-narrow. Show that, for any a ∈ A, $(a)(P^{-1})^n$ is a set.
Hint. $(y)P^{-1} = \{x : xPy\}$ is a set for any y by left narrowness. What values go into x
in an expression like $y\,\underbrace{(P^{-1}) \circ (P^{-1}) \circ \cdots \circ (P^{-1})}_{n}\,x$ for any y?
28. Prove that every natural number ≥ 2 is a product of primes, where 2 is the “trivial”
product of one factor.
Hint. Use CVI in conjunction with 5.2.20.
29. Supplement the above problem to add a proof that every natural number ≥ 2 is a
product of primes in a unique way, if we disregard permutations of the factors, which
is permissible by the commutativity of ×.
30. Code any finite set $S = \{a_0, a_1, \ldots, a_n\}$ by a code we will name $c_S$ (“c” for “code”)
given by
\[
c_S \overset{\text{Def}}{=} \prod_{a \in S} p_a
\]
where “$p_a$” is the “a-th prime” in the sequence of primes
position    = 0   1   2   3   4   ...
prime name  = p0  p1  p2  p3  p4  ...
prime value = 2   3   5   7   11  ...
Prove
a. The function that assigns to every finite ∅ ≠ S the number $c_S$ —and assigns 1 to
∅— is a 1-1 correspondence onto its range.
b. Use that fact to show that the set of all finite subsets of N is enumerable.
31. Reprove the previous problem with a different coding: To every set $S = \{a_0, a_1, \ldots, a_n\}$
—where the $a_i$ are distinct— this time we assign the natural number (we assign
0 to ∅).
$bc_S = 2^{a_0} + 2^{a_1} + 2^{a_2} + \cdots + 2^{a_n}$
“bc” stands for “binary code”.¹⁶ That is, show
a. The function that assigns to every finite ∅ ≠ S the number $bc_S$ —and assigns 0
to ∅— is a 1-1 correspondence onto its range.
b. If we are given a binary code of a set, we easily find the set if we convert the
number in binary notation. The positions of the 1-bits (terminology for binary
digits) in the code are the values of the members of the set.
c. Now argue again that the set of finite subsets of N is enumerable.
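Before attempting the proofs in Exercises 30 and 31 it may help to experiment with the two codings. The sketch below is an illustration only; all function names are mine, and `nth_prime` uses naive trial division, which is adequate for small inputs.

```python
def nth_prime(k):
    """The k-th prime, counting from p_0 = 2 (naive trial division)."""
    count, candidate = -1, 1
    while count < k:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate


def prime_code(S):
    """c_S: the product of p_a over a in S (the empty set gets code 1)."""
    code = 1
    for a in set(S):
        code *= nth_prime(a)
    return code


def binary_code(S):
    """bc_S: the sum of 2^a over a in S (the empty set gets code 0)."""
    return sum(2 ** a for a in set(S))


def decode_binary(code):
    """Recover S from bc_S: the positions of the 1-bits of the code."""
    return {i for i in range(code.bit_length()) if (code >> i) & 1}


S = {0, 2, 3}
print(prime_code(S), binary_code(S), decode_binary(binary_code(S)))
```

Decoding a prime code would amount to factoring, which is where uniqueness of prime factorisation enters the argument.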
b 0 , b1 , b2 , b3 , . . .
16 The literature on computability refers to this finite set code as the “canonical index” (Rogers
(1967), Tourlakis (2022)).
$b_0 = C(B)$
$b_{n+1} = C\bigl(B - \{b_0, \ldots, b_n\}\bigr)$
From our work on inductive definitions, the sequence (function of n, right?) exists.
Next
c. Prove by induction on n that the function λn.bn 17 is total (on N), 1-1 and onto
the set
T = {b0 , b1 , . . .} (1)
(the onto part is trivial; why?).
Prove that his definition is equivalent to the one we introduced in Definition 3.6.1.
There are two directions in the equivalence!
Hint. Use 33 and 34. Note that if A ⊆ B is enumerable, then B = (B − A) ∪ A.
The following few exercises expand our topics.
36. (Upper bounds in POsets) Let (A, <) be some POset, and let ∅ ≠ B ⊆ A.
We call a u ∈ A an upper bound of B iff, for all x ∈ B, we have x ≤ u —where you
will recall that x ≤ u means x < u ∨ x = u.
We say that u is the least upper bound of B (in A) —in symbols, u = lub(B)— or also
the supremum or “the sup”, in symbols, u = sup(B), iff for all upper bounds u′ of B
we have that u ≤ u′.
Determine upper bounds and the lub (if any) for the following two sets in the POset
({1, 2, 3}, {(1, 2), (1, 3)})
• {1} and
• {1, 2, 3}.
37. Prove that √2 is not rational. We say that a real number that is not rational is irrational.
Hint. The statement means that √2 cannot equal m/n for any integers m, n (n ≠ 0, of
course). Well, assume that √2 = m/n, for some m, n, where we also assume that we
have reduced the fraction m/n to the lowest possible numerator and denominator.
Now 2n² = m² says that 2 divides m². Does it also divide m? Can you now reach a
contradiction to the assumption √2 = m/n?
38. (A non Constructive Proof!) Prove that there are irrational numbers a and b such that
$a^b$ is rational (fraction of two integers).
Hint. Logic tells us that A ∨ ¬A is true for any sentence A —see Exercises 4.2.8 for a
definition of “sentence”.
So, consider cases.
a. Case where $\sqrt{2}^{\sqrt{2}}$ is rational. Done!
b. Negation of case above: $\sqrt{2}^{\sqrt{2}}$ is irrational. Take it from here.
39. Recall the notation “(a, b)” for the open interval of reals or rationals as the case may
be, that is, $(a, b) \overset{\text{Def}}{=} \{x \in \mathbb{R} : a < x < b\}$ or $(a, b) \overset{\text{Def}}{=} \{x \in \mathbb{Q} : a < x < b\}$ respectively.¹⁸
A real —such as √2— that is not rational is called irrational.
40. Start with the POset (R, <). Consider next the sets of rationals S = {x ∈ Q : x < √2}
and T = {x ∈ Q : x > √2}. Prove two things:
41. Prove that if (A, <) is a LOset (linearly ordered set), then any pair a, b of members of
A has a least upper bound. Explain how such an lub can be found/constructed in each
case.
42. (Knaster-Tarski Fixpoint theorem) Let (A, <) be a POset with a minimum element m
and f : A → A be a total continuous function, which in the context of POsets means
that lub and function applications (calls) commute, that is, whenever lub(X ) exists for
∅ ≠ X ⊆ A, then so does lub(f[X]) and f(lub(X)) = lub(f[X]).
18 We are reminded that R stands for the set of all real numbers and Q for the set of all rational
numbers. Of course, Q ⊂ R as we know from “calculus 101”.
Assume now that every nonempty totally ordered by “<” subset of A —such a subset
is often called a “chain”— has a lub.
Prove that there is an a ∈ A such that f (a) = a. Such an a is called a fixed point or
fixpoint of f .
Hints.
• Prove that f is increasing on A, that is, if f (x) ↓ and f (y) ↓ and x ≤ y, then
f (x) ≤ f (y) (Hint. What is lub({x, y})?).
• Define inductively the sequence —that is, function λn.an from N to A— below:
a0 = m
an+1 = f (an )
• Prove by induction that λn.an is total and increasing, that is, an ≤ an+1 , for n ≥ 0.
• Let a be defined to stand for lub({a0, a1, a2, . . .}), that is, lub(ran(λn.an)).
Prove f(a) = a. Hint for this bullet. Show that lub({a0, a1, a2, . . .}) =
lub({a1, a2, . . .}).
43. Prove that the fixpoint you found above is least. That is, if c is any other member of A
such that f (c) = c, then a ≤ c.
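On a finite POset the chain a0 ≤ a1 ≤ a2 ≤ . . . of Exercise 42 must become stationary, so its lub can be found by plain iteration. The sketch below illustrates this special case only; the POset (subsets of {0, . . . , 9} under inclusion, with minimum ∅), the operator `step`, and the helper `least_fixpoint` are all choices of mine.

```python
def least_fixpoint(f, bottom, max_steps=10_000):
    """Iterate a0 = bottom, a(n+1) = f(an) until f(a) = a (finite case only)."""
    a = bottom
    for _ in range(max_steps):
        nxt = f(a)
        if nxt == a:
            return a
        a = nxt
    raise RuntimeError("no fixed point reached within max_steps")


def step(X):
    """A monotone operator on subsets of {0,...,9}: add 0, and x + 2 for x in X."""
    return frozenset({0}) | frozenset(x + 2 for x in X if x + 2 < 10)


a = least_fixpoint(step, frozenset())
print(sorted(a))
assert step(a) == a
```

In an infinite POset the iteration need not stop after finitely many steps; that is precisely why the exercise asks for the lub of the whole sequence.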
44. (An Application of 42 to Computability) Let P(N : N) denote the set of all 1-argument
functions from N to N.
Let F be a total function F : P(N : N) → P(N : N). Such a function is called an
operator.
Now equip P(N : N) with the order ⊂ to obtain the POset of unary functions under the
inclusion (subset) order.
Prove
a. P(N : N), ⊂ has the properties given abstractly to (A, <) in Exercise 42 above.
b. Assume that the total operator F : P(N : N) → P(N : N) is continuous. Prove that
F has a least fixpoint α ∈ P(N : N).
Note. In computability theory F is assumed to be computable (in some mathemat-
ically appropriate sense). Then, provably, so is its least fixpoint. This result, due to
Kleene, known as (a special case of his) first recursion theorem (cf. Tourlakis (2022))
has extensive applications in computability, but also in the area of program semantics.
45. The greatest common divisor —acronym gcd— let us call it “d”, of two nonzero
integers a and b is the largest positive common divisor of the two. We write d =
gcd(a, b).
Prove that if d = gcd(a, b), then for some integers (members of Z = {. . . , −1, 0, 1, . . .})
x, y we have d = ax + by.
Hint. Prove that the set S = {ax + by : x ∈ Z ∧ y ∈ Z} has positive members. Call d
the smallest such positive member and prove d = gcd(a, b). To this end,
a. Prove that we may write d = ax + by for the smallest positive member of S. (Trivial)
b. Every common divisor of a and b divides d (Trivial)
c. d divides a and b. If not it will be, say, a = dq + r for some q and 0 < r < d. Derive
a contradiction by showing that r = a X + bY for some X and Y in Z. Similarly for b.
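The hint proves that x and y exist without saying how to find them. A different technique, the extended Euclidean algorithm, does compute them; the sketch below uses it (it is not the argument of the hint) so that the claim d = ax + by can be tested on examples. The function name `extended_gcd` is mine.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) > 0 and d = a*x + b*y, for nonzero a, b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    # Invariants: old_r = a*old_x + b*old_y and r = a*x + b*y.
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                        # normalise so that d is positive
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


d, x, y = extended_gcd(240, 46)
print(d, x, y)
assert d == 240 * x + 46 * y
```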
46. If ab ≠ 0 and 1 = gcd(a, b), then we say that a and b are relatively prime.
Prove that if 1 = gcd(a, b) and a | bc —where as before, x | y means that y = xq for
some q— then a | c.
Hint. Use 45 above.
47. Refer to Example 5.2.21. Generalise said example to any base. Thus, let 1 < b ∈ N.
Prove that every natural number n ≥ 0 is expressible base-b as an expression
48. Write an algorithm as a, say, pseudo C program,19 which will convert a number n ≥ 0
given base-10 to base-b.
Hint. Due to (1) above,
\[
\begin{aligned}
n &= a_m b^m + a_{m-1} b^{m-1} + \cdots + a_1 b + a_0 \\
&= \bigl(a_m b^{m-1} + a_{m-1} b^{m-2} + \cdots + a_1\bigr)\, b + a_0
\end{aligned}
\]
49. Demonstrate your algorithm above by converting 131 (this is expressed in decimal or
base-10 notation) to binary (or base-2) notation.
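One possible shape for such an algorithm is sketched below, in Python rather than pseudo C. It relies on the observation in the hint that $a_0$ is the remainder of dividing n by b, and that the quotient carries the remaining digits. The function name `to_base` and the choice to return a list of digits are mine.

```python
def to_base(n, b):
    """Digits of n >= 0 in base b > 1, most significant digit first."""
    if b < 2 or n < 0:
        raise ValueError("need n >= 0 and b > 1")
    digits = []
    while True:
        n, remainder = divmod(n, b)      # remainder is the next digit a_0
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]


print(to_base(131, 2))
```

Checking the result the other way around: 128 + 2 + 1 = 131.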
19 “Pseudo” means to not be too faithful to programming language syntax, and shortcuts in notation
are allowed if they do not introduce ambiguities.
50. (An ancient theorem²⁰) Prove that there are infinitely many primes. Note that this will
be a proof by contradiction. Induction is not relevant.
Hint. Suppose instead that there are only finitely many primes, exactly these n + 1:
$p_0, p_1, \ldots, p_n$
$Q = p_0 \times p_1 \times \cdots \times p_n + 1$
20 Euclid.
6 Inductively Defined Sets; Structural Induction
Overview
This chapter introduces a generalisation of the definitions by induction (recursion) of the last
section. Here we define sets inductively, not functions. The associated proof tool —induction
along an inductive definition, or structural induction — of properties of inductively defined
sets is introduced and validated.
We also connect the inductive definition of sets with an appropriate iterative construction
by stages and also we connect it (in the chapter’s Exercises section) with the definition of
sets as monotone operator fixpoints (see 3.8.1).
Some folks would add a 3rd requirement “nothing else is in the set unless so demonstrated
using 1. 2. above” and omit “smallest”. Really?
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 187
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_6
How exactly would you so “demonstrate”? In a recursive definition you ought to be able
to make your recursive calls and not have to trace back why the object you constructed
exists!
We will prove in Sect. 6.3.5 that indeed there is an iterative way to show that a particular
simple arithmetic expression was formed correctly by our recursion, but that defeats the
beauty of recursion.
Besides, until we reach said section we don’t even know what “nothing else is in the
set unless so demonstrated using 1. 2. above” means or how to “use” 1. and 2. to do it!
(a) First off, in step 1. above we say that 1, 2 and 3 are the initial objects of our recur-
sive/inductive definition.
(b) In step 2. we say that (E + E′) is obtained by an operation (on strings) that is available
to us, depicted as a “blackbox” below, which we named “+”.
E  −→
        +  −→ (E + E′)
E′ −→
“(”, the string named by “E”, “+”, the string named by “E′”, and “)”
E  −→
        ×  −→ (E × E′)
E′ −→
(c) Both operations in this example are single-valued, that is, functions. It is preferable to
be slightly more general and allow operations that are just relations, but not necessarily
functions. Such an operation O(x1 , . . . , xn , y) is n-ary —n inputs, x1 , . . . , xn — with
output variable y.
(d) We say that a set of objects S is closed under a relation (operation) —it could also be a
function— O(x1 , . . . , xn , y) meaning that for all input values x1 , . . . , xn in S, all the
obtained values y are also in S.
6.1 Set Closures 189
6.1.1 Definition Given a set of initial objects I and a set of operations O = {O1, O2, O3,
. . .}, the object Cl(I, O) is called the closure of I under O —or the set inductively defined
by the pair (I, O)— and denotes the ⊆-smallest set¹ S that satisfies
1. I ⊆ S.
2. S is closed under all operations in O, or simply, closed under O or even O-closed.
3. The “smallest” part: Any set T that satisfies 1. and 2. also satisfies S ⊆ T .
Nice definition, but does the set Cl(I , O) exist given any I and O? Yes. But first,
6.1.2 Theorem For any choice of I and O, if Cl(I , O) exists, then it is unique.
Proof Say the definition of Cl(I , O) ambiguously —i.e., may have more than one value—
leads to two classes, S and T .
Then, letting S pose as closure, we get S ⊆ T from 6.1.1, 3.
Then, letting T pose as closure, we get T ⊆ S, again from 6.1.1, 3. Thus S = T .
6.1.3 Theorem For any choice of I and O with the restrictions of Definition 6.1.1 the set
Cl(I , O) exists.
Since all sets S in G contain I and are O-closed, so is C (Verify). That is, C satisfies
1.–2. of 6.1.1. But also C ⊆ S for all such sets S the way it is defined. So it satisfies
6.1.1, 3. as well; it is ⊆-smallest.
We say that a property P[x] propagates with O iff for each Oi(x1, . . . , xn, y) ∈ O,
whenever all the inputs xi satisfy P[x] (i.e., P[xi] is true for each argument xi), then
all output values returned in y —for said inputs— satisfy P[x] as well. Recall that for each
assignment of values to the inputs x1, . . . , xn we may have more than one output value in
y; for all such values P[y] is true.
6.2.2 Lemma For all (I , O) and a property P[x], if the latter propagates with O, then the
class A = {x : P[x]} is closed under O (is O-closed).
6.2.3 Theorem (Induction Over a Closure Principle) Let Cl(I , O) and a property P[x]
be given. Suppose we have done the following steps:
Naturally, the technique encapsulated by 1. and 2. of 6.2.3 is called “induction over Cl(I , O)”
or “structural induction” over Cl(I , O).
Note that for each Oi ∈ O the “propagation of property P[x]” will take the form of an
I.H. followed by an I.S.:
• Assume for the unspecified fixed inputs a1 , . . . , an of Oi that all satisfy P[x]. This is
the I.H. for Oi .
6.2 Induction Over a Closure 191
• Then prove that any output b of Oi caused by said input also satisfies the property.
Cl(I , O) ⊆ A
because in 6.1.1 the “sets T ” that fulfil “1. and 2.” must be, well, sets; not proper classes.
Here is the workaround: Cl(I , O) contains I and is O-closed. By (∗) and (∗∗) so does
T = Cl(I , O) ∩ A (∗ ∗ ∗)
\[
Cl(I, O) \overset{6.1.1}{\subseteq} T \overset{(\ast\ast\ast)}{\subseteq} A
\]
6.2.4 Example Let S = Cl(I , O) where I = {0} and O contains just one operation,
x + 1 = y, where y is the output variable. That is,
n −→ x + 1 = y −→ n + 1 (1)
Can we show also N ⊆ Cl(I , O)? Yes: In this direction I do SI over N on variable n. The
property, let’s call it Q[x], now is “x ∈ Cl(I , O)”.
For n = 0, n ∈ Cl(I , O) since 0 ∈ I ⊆ Cl(I , O) by 6.1.1.
Now, say (I.H.) n ∈ Cl(I , O). Since Cl(I , O) is closed under the operation x + 1 = y,
we have n + 1 ∈ Cl(I , O) by 6.1.1.
So,
Cl(I , O) = N
Thus the induction over a closure generalises SI. The direction N ⊆ Cl(I , O) can be also
proved directly by a result in the new section.
We will see in this section that there is also a by-stages or by-steps way to obtain Cl(I , O).
d1 , d2 , d3 , . . . , di , . . . , dn (1)
satisfying:
Each di is
1. A member of I ,
or
2. For some j, one of the results of O j (x1 , . . . , xk , y) with inputs a1 , . . . , ak that are found
in the derivation (1) to the left of di .
n is called the length of the derivation. Every di in (1) is called an (I , O)-derived object,
or just derived, if the (I , O) is understood.
Clearly, the concept of a derivation abstracts, thus generalises, the concept of proof, while
a derived object abstracts the concept of a theorem.
0, 0, 0
0, 1, 0, 1, 0, 1, 1, 1, 1, 0
Nothing says we cannot repeat a di in a derivation! Lastly here is an “efficient” derivation
with no redundant steps: 0, 1, 2, 3, 4, 5.
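Whether a given sequence is a derivation can be checked mechanically, entry by entry, exactly as the definition reads. The sketch below assumes the setting of Example 6.2.4 (I = {0} and the single operation x + 1 = y) for the sample sequences; the representation of an operation as a pair (arity, relation) and the name `is_derivation` are my own conventions.

```python
from itertools import product


def is_derivation(seq, initial, operations):
    """True iff each entry is initial, or is an output of some operation
    whose inputs all occur strictly to the left of that entry."""
    for i, d in enumerate(seq):
        earlier = seq[:i]
        justified = d in initial or any(
            relation(args, d)
            for arity, relation in operations
            for args in product(earlier, repeat=arity)
        )
        if not justified:
            return False
    return True


# The operation x + 1 = y as a relation with one input and output variable y.
successor = (1, lambda args, y: y == args[0] + 1)

for seq in ([0, 0, 0], [0, 1, 0, 1, 0, 1, 1, 1, 1, 0], [0, 1, 2, 3, 4, 5], [0, 2]):
    print(seq, is_derivation(seq, {0}, [successor]))
```

Repetitions are harmless because every entry is validated independently of the others, using only what lies to its left.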
6.3 Closure Versus Definition by Stages 193
6.3.3 Proposition If
d1 , d2 , d3 , . . . , di , . . . , dn , dn+1 , . . . , dm
is a (I , O)-derivation, then so is
d1 , d2 , d3 , . . . , di , . . . , dn
d1 , d 2 , . . . , d n , e 1 , e 2 , . . . , e m
d1 , d2 , . . . , dn , e1 , e2 , . . . , em
from left to right we validate each di and each e j giving precisely the same validation reason
as we would in each sequence d1 , d2 , . . . , dn and e1 , e2 , . . . , em separately. These reasons
are local to each sequence.
6.3.5 Theorem For any initial sets of objects and operations on objects (I and O) we have
that Cl(I , O) = {x : x is (I , O)-derived}.
. . . , x1 , . . . , . . . , xi , . . . , . . . , xn (1)
But then so is
. . . , x1 , . . . , . . . , xi , . . . , . . . , xn , y (2)
by 6.3.1, case 2. That is, y is derived, hence y ∈ D is proved (I.S.).
2. Conversely, prove that D ⊆ Cl(I , O): Let x ∈ D. This time we do good old-fashioned
CVI over N on the length n of a derivation of x, toward showing that x ∈ Cl(I , O) —this
is the “property of x” that we prove.
Basis. n = 1. The only way to have a 1-element derivation is that x ∈ I .
Thus, x ∈ I ⊆ Cl(I , O) by 6.1.1.
I.H. Assume the claim for x derived with length k < n.
I.S. Prove that the claim holds when x has a derivation of length n.
Consider such a derivation
a1 , . . . , ai , . . . , ak , . . . , x (where x is the n-th entry, an )
If x ∈ I , then we are done by the Basis. Otherwise, say x is the result of an operation
(relation) Or ∈ O, applied on entries to the left of x, that is, say that Or (. . . , x) is true
—where we did not (have to) specify the inputs.
By the I.H. the inputs of Or are all in Cl(I , O). Now, since this closure is closed
under Or (. . . , x), we have that the output x is in Cl(I , O) too.
6.3.6 Remark So now we have two equivalent (6.3.5) approaches to defining inductively
defined sets S: As S = Cl(I , O) or as S = {x : x is (I , O)-derived}.
The first approach is best when you want to prove properties of all members of the set S.
The second is best when you want to show x ∈ S, for some specific x.
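The by-stages view can be sketched in code. This is only an illustration under simplifying assumptions: every operation is unary and is given as a Python function, and we stop after a fixed number of stages since the full closure is usually infinite. The name `closure_by_stages` is hypothetical.

```python
def closure_by_stages(initial, operations, stages):
    """Collect every object derivable within the given number of stages.

    operations is a list of unary functions here (an assumption made to
    keep the sketch short); stage 0 is the set I itself.
    """
    derived = set(initial)
    for _ in range(stages):
        # Apply every operation to everything derived so far.
        new = {op(x) for op in operations for x in derived}
        derived |= new
    return derived

# I = {0} with the successor operation, as in Example 6.2.4.
first_six = closure_by_stages({0}, [lambda x: x + 1], 5)
```

To show that a specific x belongs to the set, it is enough to find a stage at which x appears, which is the second approach of the remark above.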
6.3.7 Example Let us revisit Example 6.2.4, second half of the proof. To prove N ⊆
Cl(I , O) we prove that each n ∈ N has a (I , O)-derivation.
Indeed, such a derivation for n is
0, 1, 2, . . . , n − 1, n (1)
where the above is (n) ≥= {x ∈ N : x ≤ n} where all entries in (1) are in ascending order
without repetitions.
the result of concatenating string named X with the (length-1) string a, in that order. The
length of a string over A is the number of occurrences in the string (counting repetitions) of
a and b.
We denote by A+ the set of all strings of non zero length formed using the symbols a
and b. A∗ is defined to be A+ ∪ {λ}. Let O consist of the operations Oa and Ob :
X −→ Oa −→ Xa (1)
and
X −→ Ob −→ X b (2)
We claim that Cl(I , O) = A∗ .
1. For Cl(I , O) ⊆ A∗ we do induction over the closure to prove that any x ∈ Cl(I , O)
satisfies x ∈ A∗ (“the property”).
X −→ R −→ a X b (3)
If n = 0, “0 copies of X ” means λ.
1. For Cl(I , O) ⊆ S we do induction over the closure to prove that any x ∈ Cl(I , O)
satisfies x ∈ S (“the property”).
6.3.10 Example (Extended Binary Trees) This is a longish example with some prelim-
inary discussion up in front. We want to define the mathematical (and computer science)
term known as “Tree”.
This term refers to a structure, which uses as building blocks —called nodes— the
members of the enumerable set below
A = {0 , 1 , 2 , . . . ; 0 , 1 , 2 , . . .}
The qualifier “extended” is due to the presence of square nodes. We will not define simple
trees (they have round nodes only).
These nodes are made distinct by the use of subscripts. The symbols in the set A are
distinguished by their type, “round” versus “square”, and within each type by their natural
Circular or square nodes are connected by line segments. Walking in the vertical direction
from the top of the page towards the bottom, no nodes are ever shared. In particular, in all
the examples above where we have more than one node, you will notice that
the two sets of nodes that “hang below” the top node (left and right of it) are disjoint.
Thus our definition below builds the flat set —called the support of the tree— of nodes
of a tree at the same time as it builds the structure of the tree.
6.3.11 Definition We define the set of all extended trees —or just trees— E T , as Cl(I , O)
where:
If T = (X , i , Y ), then we say that i is the root of T , while X is its left and Y is its
right subtree.
We verify the example above: Using 6.3.5, the leftmost example is a tree since it is the right
component of the pair (∅, 1 ). The next tree is built via the derivation —written linearly,
(∅, 1 ), (∅, 2 ), {2 }, (1 , 2 , 2 )
The next derivation builds both the 2nd and 3rd trees:
(∅, 1 ), (∅, 2 ), {2 }, (1 , 2 , 2 ) , (∅, 3 ), (∅, 4 ), {3 }, (3 , 3 , 4 )
6.3.12 Example (Trees —continued) Hmm! Seems like we are not including square nodes
in the support. See how the support of all nodes in I is ∅ for each entry. Why so?
In the words of Knuth (Knuth (1973)) “trees is the most important nonlinear structure
arising in computing algorithms”. The extended tree is an abstraction of trees that we imple-
ment with computer programs, where round nodes are the only ones that can carry data.
The lines are (implicitly) pointing downwards. They are pointers, in computer jargon. For
example, the topmost leftmost line in the fourth tree above points to the node 2 . Practically
it means that if your program is processing node 1 , then it can transfer to and process node
2 if it wishes. It knows the address of 2 . The pointer holds this address as value.
Which brings me to square nodes! Together with the line planted on them, they are
notation for null pointers! They point nowhere. So square nodes cannot hold information,
that is why they do not contribute to the support of the tree.
The computer scientist calls round nodes “internal” and calls square nodes “external”.
Finally, how do the lines —called edges— get inserted? We defined “root” for trees, as
well as “left subtree” and “right subtree”. So, to draw lines and draw a tree that is given
mathematically as (X , r , Y ), we call recursively the process that does the “drawing” on
(inputs) X and Y .
Then add two more edges: One from r to the root of X and one from r to the root
of Y .
How does the recursion terminate? Well, if your tree is just j , then there is nothing to
draw. j is the root. This is the basis of the recursive procedure: do nothing.
6.3.13 Proposition In any extended tree, the number of square nodes exceeds by one the
number of round nodes.
Proof Induction over the set of all trees (6.3.11) Cl(I , O).
1. Basis. For any (∅, i ), the tree-part (structure-part) is just i . One square node, 0 round
nodes. Done.
2. The property propagates with the only tree-builder operation:
(FX , X ), i , (FY , Y ) −→ form tree −→ (FX ∪ FY ∪ {i }, (X , i , Y ))
Indeed, suppose that X has φ internal (round) and ε external (square) nodes. Let also Y
have φ′ internal and ε′ external nodes.
The assumption on the input side is then (I.H.) that
φ + 1 = ε (1)
and
φ′ + 1 = ε′ (2)
The output side of the operation has the tree (X , i , Y ). This has Φ = φ + φ′ + 1
internal nodes and E = ε + ε′ external ones. Using (1) and (2) we have
Φ = ε + ε′ − 1 = E − 1
Seeing that this is the property we want to prove on the output side, indeed the property
propagates with the rule. Done.
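Proposition 6.3.13 can be spot-checked on randomly built trees. The sketch below assumes a hypothetical encoding that is not the book's: a square node is the pair ("sq", j) and a tree (X, i, Y) is a Python triple whose middle entry is ("rd", i). It tracks only the structure, not the support sets.

```python
import itertools
import random

def square(j):
    """An external (square) node; by itself it is a tree."""
    return ("sq", j)

def form_tree(left, i, right):
    """Tree-builder operation: round node i on top of two trees."""
    return (left, ("rd", i), right)

def count_nodes(tree):
    """Return (round_count, square_count) for a tree."""
    if tree[0] == "sq":
        return (0, 1)  # basis: one square node, no round nodes
    left, _root, right = tree
    r1, s1 = count_nodes(left)
    r2, s2 = count_nodes(right)
    return (r1 + r2 + 1, s1 + s2)

def random_tree(rng, labels, depth):
    """Build a random tree of bounded depth with distinct node labels."""
    if depth == 0 or rng.random() < 0.3:
        return square(next(labels))
    left = random_tree(rng, labels, depth - 1)
    right = random_tree(rng, labels, depth - 1)
    return form_tree(left, next(labels), right)
```

The recursion in `count_nodes` mirrors the induction: the basis handles a lone square node and the recursive case adds one round node to the counts of the two subtrees.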
6.4 Exercises
1. (Long but Easy Exercise) Below we simultaneously define the syntax of a set of
names of certain sets of strings and the semantics of said names —that is, what sets
they name.
The set of names is given as a closure while the semantics of those names is given
along the definition of the closure, inductively. See below.
For the names we need an alphabet of symbols. As such we take the alphabet =
{0, 1, (, ), ·, +, ∗} by the inductive definition:
(α + β) A∪B
(α · β) A · B( = {x · y : x ∈ A ∧ y ∈ B})
(α ∗ ) A∗
These strings are called regular expressions, and the sets they are “naming” are called
regular sets. For example, (0 + 1) is a regular expression for the regular set {0, 1}, while
(∅∗ ) is a regular expression for {λ}, where λ denotes the empty string (in this context “λ”
is not related to λ-notation). We informally omit brackets (so that we can write 0 + 1, ∅∗ ),
by the rules:
A −→ ¬ −→ (∗¬ ∗ A∗)
A −→
∨ −→ (∗A ∗ ∨ ∗ B∗)
B −→
where “∗” denotes string concatenation.
f (0, y) = h( y)
f (x + 1, y) = g(x, y, f (x, y))
then f is total as well. Incidentally, we use the notation f = prim(h, g) for the f
defined above and call h the basis function and g the iteration function.
Hint. Prove by simple induction on x that, for all y, we have f (x, y) ↓.
Very Important! Some of the variables in f , g and h above may be missing! This is
alright. Permutation of the variables is also alright. We have indicated all the variables as
place-holders in the general case. The following function (Exercise 15) has a primitive
recursive definition where the last variable of “g” is missing so there is no “recursive
call”!
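For orientation, the schema f = prim(h, g) can be transcribed directly as a recursive program. This is a sketch only: the Python names `prim`, `add` and `mult` are illustrative, and the recursive transcription is not the single-loop program asked for in Exercise 16.

```python
def prim(h, g):
    """Return f with f(0, y) = h(y) and f(x + 1, y) = g(x, y, f(x, y))."""
    def f(x, y):
        if x == 0:
            return h(y)  # basis function
        # iteration function applied to the "previous" value f(x - 1, y)
        return g(x - 1, y, f(x - 1, y))
    return f

# Addition: add(0, y) = y and add(x + 1, y) = add(x, y) + 1.
add = prim(lambda y: y, lambda x, y, w: w + 1)

# Multiplication: mult(0, y) = 0 and mult(x + 1, y) = mult(x, y) + y.
mult = prim(lambda y: 0, lambda x, y, w: add(w, y))
```

Here g for `add` ignores its first two variables, a small instance of the remark that some variables may be missing.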
15. (The switch function) Prove that λx yz. if x = 0 then y else z is in PR.
16. Let f = prim(h, g). Imagine a programming language that allows the assignment
statements z ← h( y) and z ← g(x, y, w). Program in this programming language,
using a single do loop, the function λx y. f (x, y) given by the primitive recursion in
14.
Hint. You will obviously use pseudo-programming, as details of the programming
language are not essential. The crucial part is that it supports the above mentioned
assignment statement and it can do “loops”:
do i = 0 to n
    ⋮
17. True or false? In the schema defining f as f = prim(h, g) the recursive call can be
eliminated.
18. Define the set of all primitive recursive functions, that we denote by PR, as a closure
Cl(I , O) where
a. I is the set of initial functions. These are precisely S = λx.x + 1, Z = λx.0, and
$U^n_i$, for 1 ≤ i ≤ n > 0, given by $U^n_i = \lambda x_1 \ldots x_n . x_i$.
b. There are just two operations in O:
H = λx w y.F(x, g(w), y)
Modern set theorists call this closure ω, the set of formal natural numbers.
Correspondingly, they call the members of ω formal natural numbers.
We will see (actually you will prove) below (30) that ∅ ∈ ω is the counterpart of 0 ∈ N
and n ∪ {n} ∈ ω is the counterpart of n + 1 ∈ N. Naturally, “n ∪ {n}” is called the
(formal) successor of n ∈ ω.
Let us now discover properties of the formal natural numbers (the first three of which
are the sets ∅, {∅}, {∅, {∅}}, etc.)
Pause. Can you better specify the “etc.” above?
that mirror exactly those of the members of N.
Now prove (easy) that we can do induction over ω on the variable n, namely:
If P(n) is a property of sets n, then if one verifies P(∅) and also, for the arbitrary n,
verifies that P(n) implies P(n ∪ {n}), then one has proved that P(n) is true for all
n ∈ ω.
22. Prove that n ∪ {n} ≠ ∅ for all n ∈ ω.
Overview
In so-called “divide and conquer” algorithms one usually ends up with a recurrence relation
(i.e., inductive or recursive definition!) that defines the “timing function”, T (n) —such
timing indicating worst case upper bound on run time or average run time as the case may
be. For example, the recurrence might look like
$$T(n) = \begin{cases} 1 & \text{if } n = 1 \\ T(n/2) + 1 & \text{otherwise} \end{cases}$$
In order to assess the “goodness” of the proposed algorithm by comparison to either our
expectations or to another algorithm, we need to know T (n) in “closed” form, that is, in
terms of known functions, for example, $n^r$ for r > 0, $c^n$ for c > 1, $\log_b n$ for some integer
b > 1.
Often, a preliminary analysis need only worry about the “asymptotic behaviour” of the
algorithm, i.e., the behaviour for large inputs (n is the input size).
What is input size? Since many algorithms of interest —e.g., they may manipulate trees—
are non numerical, “size” is not the numerical value of the input normally. Moreover, even
numerical algorithms are often expressed in terms of the digit-structure of the inputs thus it
makes sense to assess their “goodness” with respect to the number of digits in the input or
the length of the input, not its value. Does it matter? It does in the context of the so-called
efficient (or “feasible”) algorithms, which are defined as those whose run time is bounded
by a polynomial function of the input size!
It turns out that —due to the exponential relation between value and length of a natural
number— an algorithm that runs in polynomial time with respect to the input numerical value
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 207
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_7
208 7 Recurrence Equations and Their Closed-Form Solutions
will run in exponential time with respect to input length and thus be termed “inefficient” or
worse: “unfeasible”.
“Big-O” notation —introduced in this chapter— is an excellent tool in gauging upper
bounds of run times of algorithms, therefore the solution of recurrences is often sought in
such notation. On occasion one requires an “exact” solution (this is much harder to achieve
in general).
There is a big variety of recurrence relations and an equally big variety of solution
techniques. Some restricted cases are handled well by packages such as Mathematica or
Maple.
In this chapter we restrict attention to simple classes of recurrences taken from both the
“additive” and “multiplicative” cases. These characterisations in quotes refer to the manner
of handling the argument of the recurrence. E.g., the recurrence above is multiplicative as
the recursive call is to an argument obtained by halving the original argument n.
For the solution of the Fibonacci recurrence and other “Fibonacci-like” recurrences in
closed form we introduce the topic of generating functions.
This notation is due to the mathematician E. Landau and is in wide use in number theory,
but also in computer science in the context of measuring (bounding above) computational
complexity of algorithms for all “very large inputs”.
7.1.1 Definition Let f and g be two total functions of one variable, where g(x) > 0, for
all x. Then
1. f = O(g) —also written as f (x) = O(g(x))— read “ f is big-oh g”, means that there
are positive constants C and K in N such that | f (x)| ≤ Cg(x) for all x > K .
2. f = o(g) —also written as f (x) = o(g(x))— read “ f is small-oh g”, means that
$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = 0$$
3. f ∼ g —also written as f (x) ∼ g(x)— read “ f is of the same order as g”, means that
$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = 1$$
7.1 Big-O, Small-o, and the “Other” ∼ 209
“∼” between two sets A and B, as in A ∼ B, means that there is a 1-1 correspondence
f : A → B. Obviously, the context will protect us from confusing this ∼ with the one
introduced just now, in 7.1.1.
Both definitions 2. and 3. require some elementary understanding of differential calculus.
Case 2. says, intuitively, that as x gets extremely large, then the fraction f (x)/g(x) gets
extremely small, infinitesimally close to 0. Case 3. says, intuitively, that as x gets extremely
large, then the fraction f (x)/g(x) gets infinitesimally close to 1; that is, the function outputs
are infinitesimally close to each other.
$$\frac{2x^2 + 1000^{1000}x + 10^{350000}}{3x^2} = \frac{2}{3} + \frac{1000^{1000}}{3x} + \frac{10^{350000}}{3x^2} < 1$$
for x > K for some well chosen K . Note that $1000^{1000}/(3x)$ and $10^{350000}/(3x^2)$ will
each be < 1/6 for all sufficiently large x-values: we will have
$2/3 + 1000^{1000}/(3x) + 10^{350000}/(3x^2) < 2/3 + 1/6 + 1/6 = 1$ for all such x-values.
Thus $2x^2 + 1000^{1000}x + 10^{350000} < 3x^2$ for x > K as claimed.
In many words, in a polynomial, the order of magnitude is determined by the highest
power term.
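The inequality can be checked with exact integer arithmetic. The sketch below reads the two large constants as 1000^1000 and 10^350000, and the value of K is one admissible choice made for this illustration (any x > K satisfies both 1/6 bounds), not a value given in the text.

```python
B = 1000 ** 1000      # coefficient of x
C = 10 ** 350000      # constant term

def below_3x2(x):
    """True when 2x^2 + B*x + C < 3x^2, i.e. when B*x + C < x^2."""
    return 2 * x * x + B * x + C < 3 * x * x

# One admissible K (an assumption of this sketch): for x > K we have
# x > 2*B and x*x > 2*C, so each of B/(3x) and C/(3x^2) is below 1/6.
K = 2 * 10 ** 175000
```

For small inputs the inequality fails badly, which is why big-O only speaks about x beyond some threshold K.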
7.1.3 Proposition Suppose that g is as in 7.1.1 and f (x) ≥ 0 for all x > L, hence | f (x)| =
f (x) for all x > L. Now, if f (x) ∼ g(x), then f (x) = O(g(x)).
7.1.4 Proposition Suppose that g is as in 7.1.1 and f (x) ≥ 0 for all x > L, hence | f (x)| =
f (x) for all x > L. Now, if f (x) = o(g(x)), then f (x) = O(g(x)).
hence
$$-1 < \frac{f(x)}{g(x)} < 1$$
therefore, x > max(K , L) implies f (x) < g(x).
7.1.5 Example 1. $\ln x = o(x^r)$ for any positive real r . Here “ln” stands for $\log_e$ where e
is the Euler constant
2.7182818284590452353602874713526624977572470937 . . .
$$\lim_{x\to\infty} \frac{\ln x}{x^r} = \lim_{x\to\infty} \frac{1/x}{r x^{r-1}} = \lim_{x\to\infty} \frac{1}{r x^r} = 0$$
2. ln x = O(log10 (x)). In fact, you can go from one log-base to the other:
$$\log_e(x) = \frac{\log_{10}(x)}{\log_{10}(e)}$$
The claim follows from 7.1.3 since trivially ln x ∼ log10 (x)/ log10 (e). For that reason
(and since multiplicative constants are hidden in big-O notation) complexity- and
algorithms-practitioners omit the base of the logarithm and write things like O(log n)
and O(n log n).
7.2 Solving Recurrences; the Additive Case
$$\begin{aligned} T_0 &= k \\ s_n T_n &= v_n T_{n-1} + f(n) \quad \text{if } n > 0 \end{aligned}$$
a recurrence defining the sequence $T_n$, or equivalently, the function T (n) (both jargons and
notations spell out the same thing), in terms of the known functions (sequences) $s_n$, $v_n$, f (n).
For the general case see Knuth (1973). Here we will restrict attention to the case $s_n = 1$,
for all n, and also $v_n = a$ (a constant), for all n.
Subcase 1. (a = 1) Solve
$$\begin{aligned} T_0 &= k \\ T_n &= T_{n-1} + f(n) \quad \text{if } n > 0 \end{aligned} \tag{1}$$
From (1), $T_n - T_{n-1} = f(n)$, thus
$$\sum_{i=1}^{n} (T_i - T_{i-1}) = \sum_{i=1}^{n} f(i)$$
the lower summation value dictated by the lowest valid value of i − 1 according to (1).
7.2.1 Remark The summation in the lhs above is called a “telescoping (finite) series”
because the terms T1 , T2 , . . . , Tn−1 appear both positively and negatively and pairwise can-
cel. Thus the series “contracts” into Tn − T0 like a (hand held) telescope.
Therefore
$$T_n = T_0 + \sum_{i=1}^{n} f(i) = k + \sum_{i=1}^{n} f(i) \tag{2}$$
If we know how to get the sum in (2) in closed form, then we solved the problem!
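As a small worked instance of (2), take f(i) = i (a choice made here purely for illustration), so that the sum has a familiar closed form:

```latex
T_n = k + \sum_{i=1}^{n} i = k + \frac{n(n+1)}{2},
\qquad\text{e.g. } T_1 = k + 1,\quad T_2 = k + 3,\quad T_3 = k + 6 .
```

These values agree with unwinding the recurrence by hand: T_1 = T_0 + 1, T_2 = T_1 + 2, T_3 = T_2 + 3.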
$$\sum_{i=2}^{n} (p_i - p_{i-1}) = \sum_{i=2}^{n} i$$
Note the lower bound of the summation: It is here 2, to allow for the lowest i − 1 value
possible. That is 1 according to (3), hence i = 2.
Thus,
$$p_n = 2 + \frac{(n+2)(n-1)}{2}$$
(Where did I get the (n + 2)(n − 1)/2 from?) The above answer is the same as (verify!)
$$p_n = 1 + \frac{(n+1)n}{2}$$
obtained by writing
$$2 + \sum_{i=2}^{n} i = 1 + \sum_{i=1}^{n} i$$
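Subcase 2. (a ≠ 1) Solve
$$\begin{aligned} T_0 &= k \\ T_n &= aT_{n-1} + f(n) \quad \text{if } n > 0 \end{aligned} \tag{4}$$
(4) is the same as
$$\frac{T_n}{a^n} = \frac{T_{n-1}}{a^{n-1}} + \frac{f(n)}{a^n}$$
To simplify notation, set
$$t_n \overset{\text{Def}}{=} \frac{T_n}{a^n}$$
thus the recurrence (4) becomes
$$\begin{aligned} t_0 &= k \\ t_n &= t_{n-1} + \frac{f(n)}{a^n} \quad \text{if } n > 0 \end{aligned} \tag{5}$$
By subcase 1, this yields
$$t_n = k + \sum_{i=1}^{n} \frac{f(i)}{a^i}$$
from which
$$T_n = ka^n + a^n \sum_{i=1}^{n} \frac{f(i)}{a^i} \tag{6}$$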
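Formula (6) can be compared against direct iteration of the recurrence (4). The sketch below uses exact rational arithmetic; the function names and the sample choice of f are assumptions made for this illustration.

```python
from fractions import Fraction

def solve_by_recurrence(a, k, f, n):
    """Iterate T_0 = k, T_i = a*T_{i-1} + f(i) up to i = n."""
    t = Fraction(k)
    for i in range(1, n + 1):
        t = a * t + f(i)
    return t

def solve_by_formula(a, k, f, n):
    """Evaluate (6): T_n = k*a^n + a^n * sum_{i=1}^{n} f(i)/a^i."""
    a = Fraction(a)
    return k * a ** n + a ** n * sum(f(i) / a ** i for i in range(1, n + 1))
```

Using `Fraction` avoids rounding in the terms f(i)/a^i, so the two results can be compared for exact equality.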
$$T_n = \begin{cases} 1 & \text{if } n = 1 \\ 2T_{n-1} + 1 & \text{otherwise} \end{cases} \tag{7}$$
To avoid trouble, note that the lowest term here is $T_1$, hence its “translation” to follow the
above methodology will be “$t_1 = T_1/2^1 = 1/2$”. So, the right hand side of (6) applied here
will have “$ka^{n-1}$” instead of “$ka^n$” (Why?) and the indexing in the summation will start at
i = 2 (Why?)
Thus, by (6),
$$\begin{aligned} T_n &= 2^n(1/2) + 2^n \sum_{i=2}^{n} \frac{1}{2^i} \\ &= 2^{n-1} + 2^n\left(\frac{(2^{-1})^{n+1} - 1}{2^{-1} - 1} - 1 - \frac{1}{2}\right) \\ &= 2^{n-1} + 2^n\left(2 - 2^{-n} - 1 - \frac{1}{2}\right) \\ &= 2^n - 1 \end{aligned}$$
In the end you will probably agree that it is easier to redo the work with (7) directly, first
translating it to
$$t_n = \begin{cases} 1/2 & \text{if } n = 1 \\ t_{n-1} + 1/2^n & \text{if } n > 1 \end{cases} \tag{8}$$
etc.
The terms 1 and 1/2 are subtracted as they are missing from our $\sum_{i=2}^{n}$. The fraction
formula used is for
$$\sum_{i=0}^{n} 1/2^i$$
Subcase 1.
$$T(n) = \begin{cases} k & \text{if } n = 1 \\ aT(n/b) + c & \text{if } n > 1 \end{cases} \tag{1}$$
where a, b are positive integer constants (b > 1) and k, c are any constants. Recurrences
like (1) above arise in divide and conquer solutions to problems. For example, binary search
has timing governed by the above recurrence with b = 2, a = c = k = 1.
Why does (1) with the above-mentioned parameters —b = 2, a = c = k = 1— capture the
run time of binary search? First off, regarding “run time” let us be specific: we mean number
of comparisons.
OK, to do such a search on a sorted (in ascending order, say) array of length n, you first
check the mid point (for a match with what you are searching for). If you found what you
want, exit. If not, you know (due to the ordering) whether you should next search the left
half or the right half.
So you call the procedure recursively on an array of length about n/2.
This decision and call took T (n/2) + 1 comparisons. This equals T (n). If the array has
length 1, then you spend just one comparison, T (1) = 1.
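The comparison count can be observed experimentally. The sketch below is an iterative variant of the search just described (the text describes it recursively), and it counts each check of the mid point as one comparison; both function names are illustrative.

```python
def binary_search(a, target):
    """Search sorted list a; return (found, probes), where probes counts
    the mid point checks (each three-way comparison counted once)."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if a[mid] == target:
            return True, probes
        if a[mid] < target:
            lo = mid + 1   # continue in the right half
        else:
            hi = mid - 1   # continue in the left half
    return False, probes

def worst_case_probes(n):
    """Largest probe count over present and absent targets."""
    a = list(range(n))
    return max(binary_search(a, t)[1] for t in range(-1, n + 1))
```

For an array of length 2^m the worst case is m + 1 probes, consistent with a recurrence of the form T(n) = T(n/2) + 1, T(1) = 1.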
We seek a general solution to (1) in big-O notation.
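First convert to an “additive case” problem: To this end, seek a solution in the restricted
set {n ∈ N : n = $b^m$ for some m ∈ N}. Next, set
$$t(m) = T(b^m) \tag{2}$$
so that (1) becomes
$$t(m) = \begin{cases} k & \text{if } m = 0 \\ a\,t(m-1) + c & \text{if } m > 0 \end{cases} \tag{3}$$
therefore
$$t(m) = a^m k + c\,a^m \begin{cases} m & \text{if } a = 1 \\ a^{-1}\,\dfrac{(a^{-1})^m - 1}{a^{-1} - 1} & \text{if } a \neq 1 \end{cases}$$
or, more simply,
$$t(m) = \begin{cases} k + cm & \text{if } a = 1 \\ a^m k + c\,\dfrac{a^m - 1}{a - 1} & \text{if } a \neq 1 \end{cases}$$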
Using O-notation, and going back to T we get:
7.3 Solving Recurrences; the Multiplicative Case 215
$$T(b^m) = \begin{cases} O(m) & \text{if } a = 1 \\ O(a^m) & \text{if } a \neq 1 \end{cases} \tag{4}$$
or, provided we remember that this solution relies on the assumption that n has the form $b^m$:
$$T(n) = \begin{cases} O(\log n) & \text{if } a = 1 \\ O(a^{\log_b n}) & \text{if } a \neq 1 \end{cases} = \begin{cases} O(\log n) & \text{if } a = 1 \\ O(n^{\log_b a}) & \text{if } a \neq 1 \end{cases} \tag{5}$$
If a > b then we get slower than linear “run time” $O(n^{\log_b a})$ with $\log_b a > 1$. If on the other
hand b > a > 1 then we get a sublinear run time, since then $\log_b a < 1$.
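The exact solution behind (4) can be compared with the recurrence on powers of b. This is a sketch under the stated restriction n = b^m; the function names and the sample constants are assumptions of the illustration.

```python
from fractions import Fraction

def T(n, a, b, c, k):
    """Recurrence T(1) = k, T(n) = a*T(n/b) + c, for n a power of b."""
    if n == 1:
        return k
    return a * T(n // b, a, b, c, k) + c

def closed_form(m, a, c, k):
    """T(b^m): k + c*m if a = 1, else a^m*k + c*(a^m - 1)/(a - 1)."""
    if a == 1:
        return k + c * m
    return a ** m * k + c * Fraction(a ** m - 1, a - 1)
```

Note that b does not appear in the closed form for T(b^m); it enters only when m is rewritten as the logarithm of n to base b.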
Now a very important observation. For functions T (n) that are increasing, i.e., T (i) ≤ T ( j)
if i < j, the restriction of n to have form $b^m$ proves to be irrelevant in obtaining the solution.
The solution is still given by (5) for all n. Here’s why:
In the general case, n satisfies
$$b^{m-1} < n \le b^m \quad \text{for some } m \tag{6}$$
Suppose now that a = 1 (upper case in (4)). We want to establish that T (n) = O(log n) for
the general n (of (6)). By monotonicity of T and the second inequality of (6) we get
$$T(n) \overset{\text{by (6) right}}{\le} T(b^m) \overset{\text{by (4)}}{=} O(m) \overset{\text{by (6) left}}{=} O(\log n)$$
The case where a > 1 is handled similarly. Here we found an answer $O(n^r)$ (where
$r = \log_b a > 0$) provided n = $b^m$ (some m). Relax this proviso, and assume (6).
Now
$$T(n) \overset{\text{by (6) right}}{\le} T(b^m) \overset{\text{by (4)}}{=} O(a^m) = O((b^m)^r) \overset{\text{Why?}}{=} O((b^{m-1})^r) \overset{\text{by (6) left}}{=} O(n^r)$$
Subcase 2.
$$T(n) = \begin{cases} k & \text{if } n = 1 \\ aT(n/b) + cn & \text{if } n > 1 \end{cases} \tag{1$'$}$$
where a, b are positive integer constants (b > 1) and k, c any constants. Recurrences like (1′)
above also occur in divide and conquer solutions to problems. For example, two-way merge
sort has timing governed by the above recurrence with a = b = 2 and c = 1/2. Quicksort
has average run time governed, essentially, by the above with a = b = 2 and c = 1. Both
lead to O(n log n) solutions. Also, Karatsuba’s “fast” integer multiplication has a run time
recurrence as above with a = 3, b = 2.
These examples are named for easy look up, in case they trigger your interest or curiosity. It
is not in the design of this course to expand on them. Merge Sort and Quicksort you might
see in a course on data structures while Karatsuba’s “fast multiplication” of natural numbers
might appear in a course on algorithms.
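Setting at first (our famous initial restriction on n) n = $b^m$ for some m ∈ N and using (2)
above we end up with a variation on (3):
$$t(m) = \begin{cases} k & \text{if } m = 0 \\ a\,t(m-1) + cb^m & \text{if } m > 0 \end{cases} \tag{3$'$}$$
thus we need do
$$\sum_{i=1}^{m} \left( \frac{t(i)}{a^i} - \frac{t(i-1)}{a^{i-1}} \right) = c \sum_{i=1}^{m} (b/a)^i$$
therefore
$$t(m) = a^m k + c\,a^m \begin{cases} m & \text{if } a = b \\ (b/a)\,\dfrac{(b/a)^m - 1}{b/a - 1} & \text{if } a \neq b \end{cases}$$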
Using O-notation, and using cases according as to a < b or a > b we get:
$$t(m) = \begin{cases} O(b^m m) & \text{if } a = b \\ a^m O(1) = O(a^m) & \text{if } b < a \quad /*\ (b/a)^m \to 0 \text{ as } m \to \infty\ */ \\ O(b^m - a^m) = O(b^m) & \text{if } b > a \end{cases}$$
or, in terms of T and n, which is restricted to form $b^m$ (using same calculational “tricks” as
before):
$$T(n) = \begin{cases} O(n \log n) & \text{if } a = b \\ O(n^{\log_b a}) & \text{if } b < a \\ O(n) & \text{if } b > a \end{cases} \tag{4$'$}$$
The above solution is valid for any n without restriction, provided T is increasing. The proof
is as before, so we will not redo it (you may wish to check the “new case” O(n log n) as an
exercise).
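As a worked instance of the case a = b, here is a sketch that takes a = b = 2 (the merge sort parameters) and leaves k and c symbolic. Substituting into the formula for t(m) and writing n = 2^m gives:

```latex
t(m) = 2^m k + c\,2^m m
\quad\Longrightarrow\quad
T(n) = kn + c\,n\log_2 n = O(n \log n).
```

A quick check with m = 2: the recurrence gives T(4) = 2T(2) + 4c = 2(2k + 2c) + 4c = 4k + 8c, which matches kn + cn log_2 n at n = 4.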
In terms of complexity of algorithms, the above solution says that in a divide and conquer
algorithm (governed by (1′)) we have the following cases:
• The total size of all subproblems we solve (recursively) is equal to the original problem’s
size. Then we have a O(n log n) algorithm (e.g., merge sort).
• The total size of all subproblems we solve is more than the original problem’s size. Then
we go worse than linear ($\log_b a > 1$ in this case). An example is Karatsuba multiplication
that runs in $O(n^{\log_2 3})$ time, still better than the $O(n^2)$ so-called “school method” integer
multiplication, which justifies the nickname “fast” for Karatsuba’s multiplication.
• The total size of all subproblems we solve is less than the original problem’s size. Then
we go in linear time (e.g., the problem of finding the k-th smallest in a set of n elements).
7.4 Generating Functions
We saw some simple cases of recurrence relations with additive and multiplicative index
structure (we reduced the latter to the former). Now we turn to a wider class of additive
index structure problems where our previous technique of utilizing a “telescoping sum”
$$\sum_{i=1}^{n} (t(i) - t(i-1))$$
does not apply because the right hand side still refers to t(i) for some i < n. Such is the
case of the well known Fibonacci sequence Fn given by
$$F_n = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ F_{n-1} + F_{n-2} & \text{if } n > 1 \end{cases}$$
The method of generating functions that solves this harder problem also solves the previous
problems we saw.
Here’s the method in outline. We will then embark on a number of fully worked out
examples.
Given a recurrence relation
with the appropriate “starting” (initial) conditions. We want tn in “closed form” in terms of
known functions. Here are the steps:
(2) is a formal power series, where formal means that we only are interested in the form
of the “infinite sum” and not in any issues of convergence4 (therefore “meaning”) of
the sum. It is stressed that our disinterest in convergence matters is not a simplifying
convenience but it is due to the fact that convergence issues are irrelevant to the problem
in hand!
In particular, we will never have to consider values of z or make substitutions into z.
2. Using the recurrence (1), find a closed form of G(z) as a function of z (this can be done
prior to knowing the tn in closed form!)
3. Expand the closed form G(z) back into a power series
$$G(z) = \sum_{i=0}^{\infty} a_i z^i = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n + \cdots \tag{3}$$
But now we do have the $a_n$’s in terms of known functions, because we know G(z) in
closed form! We only need to compare (2) and (3) and proclaim
$$t_n = a_n \quad \text{for } n = 0, 1, \ldots$$
Steps 2. and 3. embody all the real work. We will illustrate by examples how this is done
in practice, but first we need some “tools”:
Let us concentrate below on the “boxed” results, which we will be employing —not being
too interested in the arithmetic needed to obtain them!
The Binomial Expansion. For our purposes in this volume we will be content with just one
tool, the “binomial expansion theorem” of calculus (the finite version of it we proved by
induction here Example 5.2.25):
For any m ∈ R (where R is the set of real numbers) we have
$$(1 + z)^m = \sum_{r=0}^{\infty} \binom{m}{r} z^r = \cdots + \binom{m}{r} z^r + \cdots \tag{4}$$
4 In Calculus one learns that power series converge in an interval like |z| < r for some real r ≥ 0.
The r = 0 case means the series diverges for all z.
$$\binom{m}{r} \overset{\text{def}}{=} \begin{cases} 1 & \text{if } r = 0 \\ \dfrac{m(m-1)\cdots(m-[r-1])}{r!} & \text{otherwise} \end{cases} \tag{5}$$
The expansion (4) terminates with last term
$$\binom{m}{m} z^m \overset{\text{by (5)}}{=} z^m$$
as the “binomial theorem of Algebra” says, and that is so iff m is a positive integer. In all
other cases (4) is non-terminating (infinitely many terms) and the formula is then situated
in Calculus. As we remarked before, we will not be concerned with when (4) converges.
Note that (5) gives the familiar
$$\binom{m}{r} = \frac{m(m-1)\cdots(m-[r-1])}{r!} = \frac{m(m-1)\cdots(m-[r-1])(m-r)\cdots 2\cdot 1}{r!\,(m-r)!} = \frac{m!}{r!\,(m-r)!}$$
when m ∈ N. In all other cases we use (5) because if m ∉ N, then “m!” is meaningless.
Let us record the very useful special case when m is a negative integer, −n (n > 0).
$$(1 - z)^{-1} = \frac{1}{1-z} = \cdots + \binom{r}{r} z^r + \cdots = \cdots + z^r + \cdots \tag{8}$$
The above is the familiar “converging geometric progression” (converging for |z| < 1, that
is, but this is the last time I’ll raise irrelevant convergence issues). Two more special cases
of (6) will be helpful:
$$(1 - z)^{-2} = \frac{1}{(1-z)^2} = \cdots + \binom{r+1}{r} z^r + \cdots = 1 + 2z + \cdots + (r+1)z^r + \cdots \tag{9}$$
and
$$(1 - z)^{-3} = \frac{1}{(1-z)^3} = \cdots + \binom{r+2}{r} z^r + \cdots = 1 + 3z + \cdots + \frac{(r+2)(r+1)}{2} z^r + \cdots \tag{10}$$
$$\begin{aligned} a_0 &= 1 \\ a_n &= 2a_{n-1} + 1 \quad \text{if } n > 0 \end{aligned} \tag{i}$$
Write (i) as
$$a_n - 2a_{n-1} = 1 \tag{ii}$$
Next, form the generating function for an , and a “shifted” copy of it (multiplied by 2z; z
does the shifting) underneath it (this was “inspired” by (ii)):
$$G(z)(1 - 2z) = 1 + z + z^2 + z^3 + \cdots = \frac{1}{1-z}$$
Hence
$$G(z) = \frac{1}{(1-2z)(1-z)} \tag{iii}$$
(iii) is G(z) in closed form. To expand it back to a (known) power series we first use the
“partial fractions” method (familiar to students of calculus) to write G(z) as the sum of two
fractions with linear denominators. I.e., find constants A and B such that (iv) below is true
for all z:
$$\frac{1}{(1-2z)(1-z)} = \frac{A}{1-2z} + \frac{B}{1-z}$$
or
$$1 = A(1 - z) + B(1 - 2z) \tag{iv}$$
Setting in turn z ← 1 and z ← 1/2 we find B = −1 and A = 2, hence
$$\begin{aligned} G(z) &= \frac{2}{1-2z} - \frac{1}{1-z} \\ &= 2\left(\cdots (2z)^n \cdots\right) - \left(\cdots z^n \cdots\right) \\ &= \cdots (2^{n+1} - 1) z^n \cdots \end{aligned}$$
Comparing this known expansion with the original power series above, we conclude that
$$a_n = 2^{n+1} - 1$$
Of course, we solved this problem much more easily in Sect. 7.2. However due to its
simplicity it was worked out here again to illustrate this new method. Normally, you
apply the method of generating functions when there is no other obviously simpler way to
do it.
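Closed forms found this way can be cross-checked by expanding G(z) = P(z)/Q(z) mechanically. The sketch below works purely with coefficient lists (no value is ever substituted for z) and assumes the denominator has constant term 1; `series_coefficients` is a hypothetical helper, not something defined in the text.

```python
def series_coefficients(numerator, denominator, count):
    """First `count` power-series coefficients of P(z)/Q(z).

    numerator and denominator list polynomial coefficients in order of
    increasing degree; the sketch assumes denominator[0] == 1.
    """
    coeffs = []
    for n in range(count):
        p_n = numerator[n] if n < len(numerator) else 0
        # From Q(z) * G(z) = P(z): sum_j q_j * g_{n-j} = p_n.
        acc = p_n - sum(
            denominator[j] * coeffs[n - j]
            for j in range(1, min(n, len(denominator) - 1) + 1)
        )
        coeffs.append(acc)
    return coeffs

# G(z) = 1 / ((1 - 2z)(1 - z)) = 1 / (1 - 3z + 2z^2), as in (iii).
first_terms = series_coefficients([1], [1, -3, 2], 8)
```

The same helper applies to other rational generating functions, for example z/(1 − z − z^2) with numerator [0, 1] and denominator [1, -1, -1].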
$$\begin{aligned} G(z) &= p_1 + p_2 z + p_3 z^2 + \cdots + p_{n+1} z^n + \cdots \\ zG(z) &= p_1 z + p_2 z^2 + \cdots + p_n z^n + \cdots \end{aligned}$$
$$G(z)(1 - z) = 2 + 2z + 3z^2 + 4z^3 + \cdots + (n+1)z^n + \cdots = 1 + \frac{1}{(1-z)^2} \quad \text{by (9)}$$
Hence
$$\begin{aligned} G(z) &= \frac{1}{1-z} + \frac{1}{(1-z)^3} \\ &= \cdots z^n \cdots + \cdots \frac{(n+2)(n+1)}{2} z^n \cdots \quad \text{by (10)} \\ &= \cdots \left(1 + \frac{(n+2)(n+1)}{2}\right) z^n \cdots \end{aligned}$$
Comparing this known expansion with the original power series above, we conclude that
$$p_{n+1} = 1 + \frac{(n+2)(n+1)}{2}, \quad \text{the coefficient of } z^n$$
or
$$p_n = 1 + \frac{(n+1)n}{2}$$
7.4.3 Example Here is one that cannot be handled by the techniques of Sect. 7.2.
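$$\begin{aligned} s_0 &= 1 \\ s_1 &= 1 \\ s_n &= 4s_{n-1} - 4s_{n-2} \quad \text{if } n > 1 \end{aligned} \tag{i}$$
Write (i) as
$$s_n - 4s_{n-1} + 4s_{n-2} = 0 \tag{ii}$$
to “inspire”
$$\begin{aligned} G(z) &= s_0 + s_1 z + s_2 z^2 + \cdots + s_n z^n + \cdots \\ 4zG(z) &= 4s_0 z + 4s_1 z^2 + \cdots + 4s_{n-1} z^n + \cdots \\ 4z^2 G(z) &= 4s_0 z^2 + \cdots + 4s_{n-2} z^n + \cdots \end{aligned}$$
By (ii),
$$G(z)(1 - 4z + 4z^2) = 1 + (1 - 4)z = 1 - 3z$$
Since $1 - 4z + 4z^2 = (1 - 2z)^2$ we get
$$\begin{aligned} G(z) &= \frac{1}{(1-2z)^2} - 3z\,\frac{1}{(1-2z)^2} \\ &= \cdots (n+1)(2z)^n \cdots - 3z\left(\cdots (n+1)(2z)^n \cdots\right) \\ &= \cdots \left((n+1)2^n - 3n2^{n-1}\right) z^n \cdots \end{aligned}$$
Thus,
$$\begin{aligned} s_n &= (n+1)2^n - 3n2^{n-1} \\ &= 2^{n-1}(2n + 2 - 3n) \\ &= 2^n(1 - n/2) \end{aligned}$$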
7.4.4 Example Here is another one that cannot be handled by the techniques of Sect. 7.2.
s0 = 0
s1 = 8
hence $s_n = 2 \cdot 3^n - 2(-1)^n$
$$\begin{aligned} F_0 &= 0 \\ F_1 &= 1 \\ F_n &= F_{n-1} + F_{n-2} \quad \text{if } n > 1 \end{aligned} \tag{i}$$
Write (i) as
$$F_n - F_{n-1} - F_{n-2} = 0 \tag{ii}$$
Next,
$$\begin{aligned} G(z) &= F_0 + F_1 z + F_2 z^2 + \cdots + F_n z^n + \cdots \\ zG(z) &= F_0 z + F_1 z^2 + \cdots + F_{n-1} z^n + \cdots \\ z^2 G(z) &= F_0 z^2 + \cdots + F_{n-2} z^n + \cdots \end{aligned}$$
By (ii),
$$G(z)(1 - z - z^2) = z$$
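The roots of $1 - z - z^2 = 0$ are
$$z = \frac{-1 \pm \sqrt{1 + 4}}{2} = \begin{cases} \dfrac{-1 + \sqrt{5}}{2} \\[2ex] \dfrac{-1 - \sqrt{5}}{2} \end{cases}$$
For convenience of notation, set
$$\phi_1 = \frac{-1 + \sqrt{5}}{2}, \qquad \phi_2 = \frac{-1 - \sqrt{5}}{2} \tag{iii}$$
Hence
$$1 - z - z^2 = -(z - \phi_1)(z - \phi_2) = -(\phi_1 - z)(\phi_2 - z) \tag{iv}$$
therefore, splitting into partial fractions,
$$G(z) = \frac{z}{1 - z - z^2} = \frac{A}{\phi_1 - z} + \frac{B}{\phi_2 - z}$$
from which (after some arithmetic that I will not display),
$$A = \frac{\phi_1}{\phi_1 - \phi_2}, \qquad B = \frac{\phi_2}{\phi_2 - \phi_1}$$
so
$$\begin{aligned} G(z) &= \frac{1}{\phi_1 - \phi_2}\left(\frac{\phi_1}{\phi_1 - z} - \frac{\phi_2}{\phi_2 - z}\right) \\ &= \frac{1}{\phi_1 - \phi_2}\left(\frac{1}{1 - z/\phi_1} - \frac{1}{1 - z/\phi_2}\right) \\ &= \frac{1}{\phi_1 - \phi_2}\left(\cdots \left(\frac{z}{\phi_1}\right)^n \cdots - \cdots \left(\frac{z}{\phi_2}\right)^n \cdots\right) \end{aligned}$$
therefore
$$F_n = \frac{1}{\phi_1 - \phi_2}\left(\frac{1}{\phi_1^n} - \frac{1}{\phi_2^n}\right) \tag{v}$$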
Let’s simplify (v):
First, by brute force calculation, or by using the “known” relations between the roots of
a 2nd degree equation, we find
$$\phi_1 \phi_2 = -1, \qquad \phi_1 - \phi_2 = \sqrt{5}$$
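Formula (v) can be sanity-checked numerically against the recurrence. The sketch below evaluates (v) in floating point and rounds, so it is only trustworthy for moderate n; the names `PHI1`, `PHI2`, `fib_closed` and `fib_recurrence` are illustrative.

```python
from math import sqrt

PHI1 = (-1 + sqrt(5)) / 2
PHI2 = (-1 - sqrt(5)) / 2

def fib_closed(n):
    """Evaluate (v) in floating point and round to the nearest integer."""
    return round((1 / PHI1 ** n - 1 / PHI2 ** n) / (PHI1 - PHI2))

def fib_recurrence(n):
    """Iterate F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

The two relations between the roots stated above, a product of −1 and a difference of the square root of 5, can be confirmed the same way up to rounding error.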
7.5 Exercises
T (1) = 1
T (n) = 2T (n/2) + $n^2$
3. Solve in closed form the following recurrence, and express the answer in Big-O notation.
Do not use generating functions.
T (1) = a
T (n) = 3T (n/2) + n
4. In this exercise you are asked to use the method of generating functions —the tele-
scoping method is not acceptable.
Solve in closed form the following recurrence.
$a_0 = 0$
$a_n = a_{n-1} + 1$, for $n \ge 1$
$a_0 = 1$
$a_1 = 2$
for $n \ge 2$, $a_n = 2a_{n-1} - a_{n-2}$
8. a. Prove that if $G(z)$ is the generating function for the sequence $(a_n)$, for $n = 0, 1, \ldots$,
that is,
$$G(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + \cdots$$
then $G(z)/(1 - z)$ is the generating function of the sequence $\bigl(\sum_{i=0}^{n} a_i\bigr)$, for $n = 0, 1, \ldots$.
$$\sum_{i=0}^{n} i = \frac{n(n + 1)}{2}$$
$a_0 = 0$
$a_1 = 1$
$a_2 = 2$
for $n \ge 3$, $a_n = 3a_{n-1} - 3a_{n-2} + a_{n-3}$
10. Using generating functions solve the following recurrence in closed form.
$a_0 = 0$
$a_1 = 1$
for $n \ge 2$, $a_n = -2a_{n-1} - a_{n-2}$
$a_0 = 1$
$a_1 = 1$
$a_2 = 1$
for $n \ge 3$, $a_n = a_{n-1} + a_{n-3}$
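• $\lfloor n/2 \rfloor + \lceil n/2 \rceil = n$, for all $n$. Hint. Argue the two cases separately, $n$ even and $n$
odd.
• $\lceil n/2 \rceil - 1 = \lfloor (n - 1)/2 \rfloor$, for all $n$. Hint. Argue the two cases separately, $n$ even
and $n$ odd.
• $\lceil n/2 \rceil = \lfloor (n + 1)/2 \rfloor$, for all $n$. Hint. Argue the two cases separately, $n$ even and $n$
odd.
• $\lceil n/2 \rceil \ge n/2 \ge \lfloor n/2 \rfloor$ (trivial).
• $\lfloor n/2 \rfloor + 1 \ge \lceil n/2 \rceil$. Hint. Directly from the definitions of $\lfloor \ldots \rfloor$ and $\lceil \ldots \rceil$.
• $\lfloor n/4 \rfloor = \lfloor \lfloor n/2 \rfloor / 2 \rfloor$. Hint. If $l = \lfloor n/4 \rfloor$, then by definition, $l \le n/4 < l + 1$ hence
$2l \le n/2 < 2l + 2$ thus $2l \le \lfloor n/2 \rfloor < 2l + 2$ (explain “≤” but the “<” is trivial).
It follows that $l \le \lfloor n/2 \rfloor / 2 < l + 1$. Checkmate in one very short sentence.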
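Before proving identities of this kind it can help to test them by brute force. The sketch below assumes the floor and ceiling readings $\lfloor n/2 \rfloor + \lceil n/2 \rceil = n$, $\lceil n/2 \rceil - 1 = \lfloor (n-1)/2 \rfloor$, $\lceil n/2 \rceil = \lfloor (n+1)/2 \rfloor$, $\lfloor n/2 \rfloor + 1 \ge \lceil n/2 \rceil$ and $\lfloor n/4 \rfloor = \lfloor \lfloor n/2 \rfloor / 2 \rfloor$ (the bracket placement is an assumption recovered from context), and checks them for a range of natural numbers using exact integer arithmetic. A finite check is evidence, not a proof; the helper names are hypothetical.

```python
def floor_div(a, b):
    """Floor of a/b for integers, b > 0."""
    return a // b


def ceil_div(a, b):
    """Ceiling of a/b for integers, b > 0."""
    return -(-a // b)


def identities_hold(n):
    """True iff all the assumed floor/ceiling identities hold for this n."""
    fl, ce = floor_div(n, 2), ceil_div(n, 2)
    return (
        fl + ce == n
        and ce - 1 == floor_div(n - 1, 2)
        and ce == floor_div(n + 1, 2)
        and 2 * ce >= n >= 2 * fl          # ceil(n/2) >= n/2 >= floor(n/2)
        and fl + 1 >= ce
        and floor_div(n, 4) == floor_div(fl, 2)
    )


def check_identities(limit=1000):
    """Check every n in 0..limit."""
    return all(identities_hold(n) for n in range(limit + 1))


if __name__ == "__main__":
    print(check_identities())
```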
13. Consider standard binary search, where an array A[1 . . . n] with entries in ascending
order is recursively searched for the possible occurrence of K as an A[i] as follows:
$A[\lfloor (n + 1)/2 \rfloor + 1 \ldots n]$
a. Set up the recurrence for the worst-case (upper bound) run time $\lambda n.T(n)$ of the
algorithm, where $T$ is assumed to be a non-decreasing function of $n$. Preserve
$\lfloor \ldots \rfloor$ in the recurrence equations for $T$, i.e., do not approximate with
non-integer expressions such as $n/2$, $(n + 1)/2$.
Hint. Trivially, the worst case $T(n)$ is 1 (comparing $K$ with the middle entry) plus the
maximum of the worst-case times for the left call, $T(\lfloor (n + 1)/2 \rfloor - 1)$, and the
right call, $T(n - \lfloor (n + 1)/2 \rfloor)$. Decide which of these two calls is always worst and
arrive at $T(n) = 1 + T(\ldots)$. Note the “=”! You are equating two sides with the
worst case run time in each.
b. Now solve for $T(n)$ in exact closed form by the telescoping trick and using the
appropriate tools from 12. Hint. Extend the last bullet from 12 with the result
$\lfloor \lfloor n/2^m \rfloor / 2 \rfloor = \lfloor n/2^{m+1} \rfloor$.
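To experiment with the hint in part a, one can tabulate the worst case directly from its description: one comparison plus the larger of the two recursive worst cases. The sketch below does only that. It assumes that the middle position is $\lfloor (n+1)/2 \rfloor$ and that searching an empty range costs 0 comparisons (an assumption made here for illustration); it deliberately does not state the closed form asked for in part b.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def worst_case_comparisons(n):
    """Worst-case number of comparisons for an array of length n.

    Assumption: an empty range (n == 0) costs no comparisons.
    """
    if n == 0:
        return 0
    mid = (n + 1) // 2   # position of the middle entry, floor((n+1)/2)
    left = mid - 1       # size of the left part  A[1 .. mid-1]
    right = n - mid      # size of the right part A[mid+1 .. n]
    return 1 + max(worst_case_comparisons(left), worst_case_comparisons(right))


if __name__ == "__main__":
    for n in range(1, 17):
        print(n, worst_case_comparisons(n))
```

Printing the table for small $n$ makes it easy to guess which of the two calls dominates and to test a conjectured closed form before proving it.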
14. Consider a modified “binary search” where the “middle” entry of the array that we
compare with the key⁵ we are searching for is the one stored in location $\lfloor n/2 \rfloor$ rather
than $\lfloor (n + 1)/2 \rfloor$.
a. Formulate the recurrence that expresses the worst case number of comparisons T (n)
in this modified binary search.
5 In storing data —for example in an array— we often store them accompanied by short aliases that
we call “keys”. Thus instead of repeatedly comparing a possibly unwieldy and large record during
our search, we instead compare repeatedly its key against the keys of the stored (in the array) records.
b. Solve your recurrence exactly, that is, do not “simplify” the floors, and do not answer
in O-notation.
15. Solve the following recurrence and express the solution in O-notation, where it is given
that the function T that expresses the run time is increasing.
$$T(n) = \begin{cases} 0 & \text{if } n \le 2\\ \sqrt{n}\,T(\sqrt{n}) + n & \text{if } n > 2 \end{cases}$$
Hint. Solve for $f(n) = T(n)/n$ instead. The simplified equations for $f$ must be solved
first for the special case of $n = 2^{2^m}$. Then adapt to any $n$.
Caution! Show that your solution is valid for all n rather than only for n of restricted
forms.
16. Solve the following recurrence in O-notation, where we are told that T (n) is increasing.
1 if n = 1
T (n) =
aT (n/b) + n if n > 2
2
Caution!
a. You need to consider cases according to the values of the natural numbers a and b.
b. Show that your closed-form solution is valid for all n rather than only for those of
some restricted forms.
17. Consider the algorithm described below to search a sorted (ascending order) array of
length n:
Formulate and solve exactly (not in O-notation) the recurrence relation that defines the
worst case number of comparisons T (n).
18. Let the symbol (n) stand for the number of ways we can write down the sum a1 +
· · · + an with all possible brackets inserted.
Examples:
The trivial “sum” a1 offers only one way to become “fully parenthesised”, namely, (a1 ).
Overview
This short Addendum uses trees to calculate, in exact (i.e., not in O-notation) closed form,
a sum that arises in the analysis of algorithms. The difficulty with this sum is that its terms
involve the ceiling function, in something forbidding like $\lambda x.\lceil \log_2 x \rceil$. In the area of
discrete mathematics known as graph theory, trees (in particular binary trees) play a
central role as special cases of the so-called directed graphs. While trees are studied for
their own merit in modelling important data structures in computing practice, they also have
unexpected applications to discrete mathematics such as the one we will demonstrate in this
chapter. The chapter concludes with an application of generating functions that yields
a simple expression for the number of all extended trees that have n internal
nodes.
There is a direct graph-theoretic definition of a tree, but it is beyond the design of this
volume. It is arguably more convenient to go the graph-independent route to a
definition that we took in Chapter 6 (Example 6.3.10), if for no other reason than that
such a definition enables us to prove tree properties by structural induction. Our definitive
definition is the one given in Example 6.3.10.
8.1.1 Example This example supplements the discussion started at 6.3.10.
So here are some trees ∅, (∅, 1, ∅), and ((∅, 1, ∅), 2, ∅) where we wrote 1 and 2 for 1
and 2 respectively.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 231
G. Tourlakis, Discrete Mathematics, Synthesis Lectures on Mathematics & Statistics,
https://doi.org/10.1007/978-3-031-30488-0_8
In the figure below the notation shows only the structure part (not the support) and the
first example is what we may call “the most general tree” as there are no specific assumptions
regarding the left and right subtrees of the root r (which we can, by abuse of notation and
language just call “root r ”). They are “any” trees drawn as triangles with names “T1 ” and
“T2 ” respectively.
The last tree below has both subtrees of its root empty, while the second tree has an empty
right subtree. Thus, the “general tree” is drawn as one of the following, where r is the root.
The leftmost drawing uses the notation of a “triangle” to denote a tree. The other three cases
are used when we want to draw attention to the fact that the right (respectively, left, both)
subtree(s) is (are) empty.
8.1.2 Definition (Simple and Extended Trees) We agree that we have two types of tree-
notations (abusing language we say that we have two types of trees).
Simple Trees are those drawn only with “round” nodes (i.e., we do not draw the empty
subtrees).
Extended Trees are those in which all empty subtrees are drawn as “square nodes” (as in 6.3.10).
We call, in this case, the round nodes internal and the square nodes external.
Clearly, the “external nodes” of an extended tree cannot hold any information since they are
(notations for) empty subtrees.
Alternatively, we may think of them (in a programming-implementation sense) as nota-
tions for null pointers. That is, in the case of simple trees we do not draw any null links,
while in the case of extended trees we draw all null links as square nodes.
In an extended tree the only leaves are the external (square) nodes.
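The null-pointer reading of external nodes can be made concrete with a small sketch. The encoding below is an assumption chosen for illustration, not the book's definition: `None` plays the role of the empty tree ∅ (a square node), and a non-empty tree is a tuple `(left, label, right)`, mirroring the notation (∅, 1, ∅) of Example 8.1.1.

```python
EMPTY = None  # assumed encoding of the empty tree (a square / external node)


def internal_nodes(tree):
    """Number of round (internal) nodes of an extended tree."""
    if tree is EMPTY:
        return 0
    left, _label, right = tree
    return 1 + internal_nodes(left) + internal_nodes(right)


def external_nodes(tree):
    """Number of square (external) nodes, i.e. of null links."""
    if tree is EMPTY:
        return 1
    left, _label, right = tree
    return external_nodes(left) + external_nodes(right)


if __name__ == "__main__":
    # The three trees of Example 8.1.1 in this encoding.
    t0 = EMPTY
    t1 = (EMPTY, 1, EMPTY)
    t2 = ((EMPTY, 1, EMPTY), 2, EMPTY)
    for t in (t0, t1, t2):
        print(internal_nodes(t), external_nodes(t))
```

In this encoding a simple tree is what remains when the `None` entries are not drawn, while the extended tree draws each of them as a square node.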
8.1.6 Definition A non leaf node is fertile iff it has two children. A tree is fertile iff all its
non leaves are fertile.
A tree is full iff it is fertile and all the leaves are at the same level.
An extended tree is always fertile. The last sentence above then simplifies, if restricted to
such trees, to “All square nodes are at the same level”.
8.1.7 Example (Full Trees). A “full tree” has all the possible nodes it deserves.
8.1.8 Definition A tree is complete iff it is fertile and all the leaves occupy at most two
consecutive levels (obviously, one is going to be the last (highest) level).
Again, for extended trees we need only ask that all square nodes occupy “at most two
consecutive levels”.
Redraw the above so that all the square nodes are “rounded” and you get examples of
complete Simple Trees.
8.1.10 Example There is a variety of complete trees, the general case having the nodes
in the highest level scattered about in any manner. In practice we like to deal mostly with
complete trees whose highest level nodes are left justified (left-complete) or right-justified
(right-complete). See the following, where we drew (abstractly) a full tree (special case of
complete!), a left complete, a right complete, and a “general” complete tree.
8.2.1 Theorem An extended tree with a total number of nodes equal to n (this accounts for
internal and external nodes) has n − 1 edges.
Proof We do induction with respect to the definition of trees, or as we say in short, induction
on trees. Cf. 6.2.3.
Basis. The smallest tree is ∅, i.e., exactly one “square” node. It has no edges, so the
theorem holds in this case.
By I.H., the left and right (small) subtrees have k − 1 and l − 1 edges respectively. Thus the
total number of edges is k − 1 + l − 1 + 2 = k + l (Note that in an extended tree all round
nodes are fertile, so both edges emanating from the root are indeed present).
On the other hand, the total number of nodes n + 1 is k + l + 1. We rest our case.
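Theorem 8.2.1 can be spot-checked on randomly generated extended trees. The sketch below is illustrative only: it assumes the encoding `None` for a square node and `(left, label, right)` for a round node, and it counts two edges per round node, since in an extended tree both children of a round node are present. All function names are hypothetical.

```python
import random


def random_extended_tree(internal, rng):
    """Build an extended tree with the given number of round nodes."""
    if internal == 0:
        return None  # a square (external) node
    left_size = rng.randrange(internal)  # 0 .. internal-1 round nodes go left
    return (
        random_extended_tree(left_size, rng),
        internal,  # label; its value is irrelevant here
        random_extended_tree(internal - 1 - left_size, rng),
    )


def count_nodes(tree):
    """Total number of nodes, round and square."""
    if tree is None:
        return 1
    return 1 + count_nodes(tree[0]) + count_nodes(tree[2])


def count_edges(tree):
    """Every round node has exactly two outgoing edges."""
    if tree is None:
        return 0
    return 2 + count_edges(tree[0]) + count_edges(tree[2])


if __name__ == "__main__":
    rng = random.Random(2024)
    for size in range(6):
        t = random_extended_tree(size, rng)
        print(size, count_nodes(t), count_edges(t))
```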
ε + φ − 1 = 2φ
Proof Let E be the original number of edges, still to be computed in terms of n. Add external
nodes (two for each “original” leaf). What this does is:
1 No magic with n + 1. We could have called the total n, but then we would have to add “where
n ≥ 1” to account for the presence of the root. The “ ≥ 1” part is built-in if you use n + 1 instead.
Total Nodes= 2n + 1
Total Edges = E + n + 1
Basis The smallest tree is one round node. Its level is 0 and $2^{-0} = 1$, so we are OK.
I.H. Assume for small trees, and go to the “big” case.
I.S. The big case (recall that the tree is fertile, so even though simple, the root has two
children).
and
$$\sum_{l \text{ is a leaf's level in } T_2} 2^{-l} = 1 \tag{2}$$
It is understood that (1) and (2) are valid for T1 and T2 “free-standing” (i.e., root level is 0
in each). When they are incorporated in the overall tree, call it T , then their roots obtain a
level value of 1, so that formulas (1) and (2) need adjustment: All levels now in T1 , T2 are
by one larger than the previous values. Thus,
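$$\sum_{l \text{ is a leaf's level in } T} 2^{-l} = \sum_{l \text{ is a leaf's level in } T_1} 2^{-(l+1)} + \sum_{l \text{ is a leaf's level in } T_2} 2^{-(l+1)} \overset{\text{I.H.}}{=} 1/2 + 1/2 = 1$$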
8.2.6 Corollary In both 8.2.4 and 8.2.5, if the root is assigned level 1, then
$$\sum_{l \text{ is a leaf's level}} 2^{-l} = 1/2$$
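The identity behind the induction above and Corollary 8.2.6, namely that the sum of $2^{-l}$ over the leaf levels $l$ of a fertile tree equals 1 when the root has level 0 and 1/2 when it has level 1, can be tested on random fertile trees with exact rational arithmetic. The encoding (a leaf is `None`, a non-leaf is a pair `(left, right)`) and the function names are assumptions made for this sketch.

```python
import random
from fractions import Fraction

LEAF = None  # assumed encoding of a leaf in a fertile simple tree


def random_fertile_tree(leaves, rng):
    """Random tree with the given number of leaves; every non-leaf has two children."""
    if leaves == 1:
        return LEAF
    left_leaves = rng.randrange(1, leaves)  # 1 .. leaves-1 leaves go left
    return (
        random_fertile_tree(left_leaves, rng),
        random_fertile_tree(leaves - left_leaves, rng),
    )


def leaf_level_sum(tree, root_level=0):
    """Sum of 2^(-l) over all leaves, where l is the leaf's level."""
    if tree is LEAF:
        return Fraction(1, 2**root_level)
    left, right = tree
    return leaf_level_sum(left, root_level + 1) + leaf_level_sum(right, root_level + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    t = random_fertile_tree(10, rng)
    print(leaf_level_sum(t, 0), leaf_level_sum(t, 1))
```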
Next we address the relation between n, the number of nodes in a simple complete tree
(8.1.8), with its height l (8.1.4).
Clearly,
$$n = 2^0 + 2^1 + \cdots + 2^{l-2} + k \le 2^l - 1 \tag{A}$$
thus
$$2^{l-1} - 1 < n \le 2^l - 1$$
From this follows
$$2^{l-1} < n + 1 \le 2^l$$
or
$$2^{l-1} \le n < 2^l$$
leading to
$$l = \lceil \log_2(n + 1) \rceil = \lfloor \log_2 n \rfloor + 1 \tag{$*$}$$
a good formula to remember.
Of course, all this holds when counting levels from 1 up. Check to see what happens if the
root level is 0.
How does k, the number of nodes at level l, relate to n, the number of nodes in the tree?
From (A),
$$k = n + 1 - 2^{l-1} \tag{$**$}$$
another very important formula to remember, which can also be written (because of (∗))
as
$$k = n + 1 - 2^{\lceil \log_2(n+1) \rceil - 1} = n + 1 - 2^{\lfloor \log_2 n \rfloor} \tag{B}$$
Note that (∗∗) or (B) hold even if some or all nodes at level l − 1 have no more than one
child (in which case the tree fails to be complete, or fertile for that matter).
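Formulas (∗), (∗∗) and (B) are easy to test by filling levels greedily. The sketch below assumes the root is at level 1, that levels $1, \ldots, l-1$ hold $1, 2, 4, \ldots$ nodes, and that level $l$ holds the remaining $k \ge 1$ nodes. The integer logarithms are computed with `int.bit_length` to avoid floating-point error; the helper names are hypothetical.

```python
def floor_log2(n):
    """floor(log2 n) for an integer n >= 1."""
    return n.bit_length() - 1


def ceil_log2(m):
    """ceil(log2 m) for an integer m >= 1."""
    return (m - 1).bit_length()


def height_and_last_level(n):
    """Fill levels 1, 2, ... with 1, 2, 4, ... nodes; return (l, k) for n >= 1."""
    level, remaining, capacity, placed = 0, n, 1, 0
    while remaining > 0:
        level += 1
        placed = min(capacity, remaining)  # nodes that land on this level
        remaining -= placed
        capacity *= 2
    return level, placed


if __name__ == "__main__":
    for n in range(1, 17):
        l, k = height_and_last_level(n)
        print(n, l, k, ceil_log2(n + 1), n + 1 - 2 ** floor_log2(n))
```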
Let us see next what happens if we label the nodes of a left-complete tree by numbers
successively, starting with label 2 for the root.
Note that if $t$ is any of the numbers in (1), then $2^{i-1} < t \le 2^i$, hence
$$\lceil \log_2 t \rceil = i$$
$$A = \sum_{i=2}^{n+1} \lceil \log_2 i \rceil \tag{2}$$
which arises in the analysis of certain algorithms. The figure below helps to group terms
appropriately:
Clearly,
$$A = \sum_{i=1}^{l-1} i2^{i-1} + k\lceil \log_2(n + 1) \rceil \tag{3}$$
To compute (3) we need to find $k$ as a function of $n$, and to evaluate
$$B = \sum_{i=1}^{l-1} i2^{i-1} \tag{4}$$
There are two cases at level $l$ as in the previous figure. Regardless, $k$ is given in (∗∗) of the
previous section as
$$k = n + 1 - 2^{l-1}$$
Thus, we only have to compute (4). Now,
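$$\begin{aligned}
B &= \sum_{i=1}^{l-1} i2^{i-1}\\
&= \sum_{i=0}^{l-2} (i + 1)2^{i}\\
&= \sum_{i=0}^{l-2} i2^{i} + 2^{l-1} - 1\\
&= 2\sum_{i=1}^{l-2} i2^{i-1} + 2^{l-1} - 1\\
&= 2\sum_{i=1}^{l-1} i2^{i-1} - (l - 2)2^{l-1} - 1\\
&= 2B - (l - 2)2^{l-1} - 1
\end{aligned}$$
Thus, $B = (l - 2)2^{l-1} + 1$, and the original becomes (recall (∗) and (∗∗)!)
$$\begin{aligned}
A &= (l - 2)2^{l-1} + 1 + kl\\
&= l2^{l-1} - 2^{l} + 1 + (n + 1 - 2^{l-1})l\\
&= (n + 1)l - 2^{l} + 1\\
&= (n + 1)\lceil \log_2(n + 1) \rceil - 2^{\lceil \log_2(n+1) \rceil} + 1
\end{aligned}$$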
Note. A rough analysis of $A$ would go like this: Each term of the sum is $O(\log(n + 1))$
and we have $n$ terms. Therefore, $A = O(n \log(n + 1))$. However we often need the exact
answer . . .
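The exact closed form for $A = \sum_{i=2}^{n+1} \lceil \log_2 i \rceil$ can be compared with a direct summation. The sketch below is a numeric check only; it computes $\lceil \log_2 m \rceil$ exactly through `int.bit_length` rather than floating-point logarithms, and the function names are illustrative.

```python
def ceil_log2(m):
    """ceil(log2 m) for an integer m >= 1, computed exactly."""
    return (m - 1).bit_length()


def A_direct(n):
    """A = sum of ceil(log2 i) for i = 2 .. n+1, term by term."""
    return sum(ceil_log2(i) for i in range(2, n + 2))


def A_closed(n):
    """A = (n+1) * l - 2^l + 1 with l = ceil(log2(n+1))."""
    l = ceil_log2(n + 1)
    return (n + 1) * l - 2**l + 1


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, A_direct(n), A_closed(n))
```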
We want to find the number of all extended binary trees of n internal nodes.
Thus, n = m + r + 1. We can choose T1 in xm different ways, and for each way, we can
have xr different versions of T2 . And that is true for each size of T1 . Thus, the recurrence
equations for xn are
$$G(z) = x_0 + x_1 z + x_2 z^2 + \cdots + x_n z^n + \cdots \tag{3}$$
Indeed,
$$\begin{aligned}
G(z)^2 &= x_0^2 + (x_0 x_1 + x_1 x_0)z + \cdots + \Bigl(\sum_{n-1=m+r} x_m x_r\Bigr)z^{n-1} + \cdots\\
&= x_1 + x_2 z + \cdots + x_n z^{n-1} + \cdots
\end{aligned} \tag{4}$$
$$(1 - 4z)^{1/2} = 1 + \cdots + \binom{1/2}{n}(-4z)^n + \cdots$$
Let us work with the coefficient $\binom{1/2}{n}(-4)^n$.
Going from (7) to (8) above we introduced factors [2 · 1], [2 · 2], . . . , [2 · [n − 1]] in order
to “close the gaps” and make the numerator be a factorial. This has spent n − 1 of the n
2-factors in 2n , but introduced (n − 1)! on the numerator, hence we balanced it out in the
denominator.
It follows that, according to the second case of (6),
$$x_n = G(z)[z^n] = \frac{1 - \Bigl(1 - 2z - \cdots - \dfrac{2}{n}\dbinom{2n-2}{n-1}z^n - \cdots\Bigr)}{2z}\,[z^n] \tag{9}$$
where for any generating function G(z), G(z)[z n ] denotes the coefficient of z n .
In short,
$$x_n = \frac{1}{n + 1}\binom{2n}{n}$$
Don’t forget that we want G(z)[z n ]; we have adjusted for the division by z.
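The closed form can be compared with a direct count. The sketch below assumes the recurrence suggested by (4), namely $x_n = \sum_{m+r=n-1} x_m x_r$ for $n \ge 1$, together with $x_0 = 1$ (the empty tree is taken to be the only extended tree with no internal nodes; this base case is an assumption of the sketch). It relies on `math.comb`, available in Python 3.8 and later.

```python
from math import comb


def count_extended_trees(n_max):
    """x_0 .. x_{n_max} from x_n = sum over m + r = n - 1 of x_m * x_r."""
    x = [1]  # assumed base case: x_0 = 1
    for n in range(1, n_max + 1):
        x.append(sum(x[m] * x[n - 1 - m] for m in range(n)))
    return x


def closed_form(n):
    """x_n = C(2n, n) / (n + 1); the division is exact."""
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    counts = count_extended_trees(10)
    for n, value in enumerate(counts):
        print(n, value, closed_form(n))
```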
8.5 Exercises
$$\binom{n + m}{r} = \sum_{i+j=r} \binom{n}{i}\binom{m}{j}$$
Hint. Multiply the generating functions (1 + z)n and (1 + z)m and look at the r -th coef-
ficient.
2. Do n, m above have to be natural numbers?
3. This exercise is asking you to 1) formulate the recurrence equations and 2) solve them
in exact closed form (not in O-notation) for the worst case run time of the “linear merge
sort” algorithm.
Just like the binary search algorithm, linear merge sort uses a divide and conquer —
as practitioners in the analysis of algorithms call them— technique to devise a fast
algorithm.
In this case we want to sort an array $A[1 \ldots n]$ in ascending order. Thus we divide the
problem into two almost equal size problems, to sort each of $A[1 \ldots \lfloor (n + 1)/2 \rfloor]$ and
$A[\lfloor (n + 1)/2 \rfloor + 1 \ldots n]$, and we do so by calling the procedure recursively for each
half-size array.
We then do a linear —essentially n-comparisons— merge of the two sorted half-size
arrays.
Hints and directions.
a. Prove that the recurrence equations for the worst case are
$T(0) = 0$
$T(1) = 0$
Case of $n > 1$
$T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + n$
Be sure to prove that the equations above are correct.
b. You will use now the tools from Exercise 7.5.12 and also what we have learnt from
our work (your work) on recurrence equations solving. Exercise 7.5.13 will be helpful
in the final stages of your solution of the present exercise. Indeed let me show how
you can solve the above recurrence by reducing it, essentially, to the binary search
case recurrence.
Step 1: Towards the telescoping trick. We are also trying to get all divisions by 2 in
. . .-notation. So,
Step 2: Rename: $B(n) \overset{\text{Def}}{=} T(n) - T(n - 1)$ so that $B(n) = B(\lceil n/2 \rceil) + 1$ and $B(n) =
0$, for $n \in \{0, 1\}$.
Step 3: Solve for $B(n)$ similarly to Exercise 7.5.13 and note that $1 < n/2^m \le 2$ iff
$2^m < n \le 2^{m+1}$ iff $m < \log_2 n \le m + 1$ iff $\lceil \log_2 n \rceil = m + 1$. Thus,
$$\begin{aligned}
B(n) &= B(\lceil n/2 \rceil) + 1\\
&= B(\lceil n/2^2 \rceil) + 2\\
&= B(\lceil n/2^3 \rceil) + 3\\
&\;\;\vdots\\
&= B(\lceil n/2^{m+1} \rceil) + m + 1
\end{aligned}$$
So (use the initial (boundary) conditions for $B$ and provide details!) conclude
that $B(n) = \lceil \log n \rceil$, where I wrote “log” for “$\log_2$”, and
Step 4: Compute the sum of the telescoping $T(n) - T(n - 1) = \lceil \log n \rceil$ using
Section 8.3.
4. Given an extended tree, traverse it and mark its round nodes in the order you encounter
them.
Program —as a pseudo program— this traversal so that you first process the left subtree,
then the root and then the right subtree. This is called the inorder traversal.
Suppose now that the given tree contains one number in each round node. Moreover
assume the tree has the property that for every round node the number contained therein
is greater than all numbers contained in the node’s left subtree and less than all the
numbers in the right subtree of the node.
Prove that the inorder traversal of such a tree will visit all these numbers in ascending
order.
5. We have introduced edges to trees. Prove that no node (round or square) in a tree has
more than one edge coming into it (from above).
Index
A Binary, 186
Adjacency matrix, 57 Binary notation, 186
A-introduction, 138 Binary relation, 44
Algorithm Binary search, 228
analysis of , 243 Binomial coefficient, 161
efficient, 207 Binomial theorem, 161
Euclidean, 226 Boolean, 57
feasible, 207 Boolean connectives, 204
Alphabet, 117 Boolean formulas, 203
Ambiguity, 204 Bound, 118, 123
Ambiguous, 36, 204 Bound variable, 89
string notation, 36 changing, 138
Analysis of algorithms, 243
Ancestor, 232
Antinomy, 1 C
Atomic Boolean formulas, 204 Call
Atomic formula, 204 to a function, 88
constant, 141 Cancelling indices, 106
Axiom Canonical index, 181
of foundation, 127, 156, 166 Captured, 122
mathematical, 126 Cartesian product, 34
schema, 124 Ceiling, 63
special, 126 Chain, 184, 232
Axiom of choice, 182 in a graph, 232
Axiom of replacement, 87 Child, 232
Axiom schema, 124 Child node, 232
Choice, 94
axiom, 94
B function, 94
Basis function, 205 Class, 7
Interval square, 59
open, 183 MC
Inverse, 73, 93 for non orders, 82
Inverse image, 86 Minimal
Inverse relation, 73 <-minimal, 77
Irrational, 183 for non orders, 82
Irrational number, 183 Minimal condition, 83, 143
Irrational real, 183 for non orders, 82
Iteration function, 205 Monotone operator, 108, 109
K N
Key, 228 Natural number
Kleene closure, 37 formal, 186
Kleene’s extended equality, 89 n-factorial, 161
Kleene star, 37 Node, 199, 233
ancestor of, 232
circular, 199
L descendant of, 232
λ calculus, 88 fertile, 233
λ-notation, 88 non leaf, 233
Language, 38 square, 199
finitely definable, 38 Node level, 233
Language concatenation, 38 Non-axiom
Leaf, 232 hypothesis, 132
Leaf node, 232 Non comparable elements, 75
Least fixpoint, 110, 184 Nontotal, 47
Least upper bound Notation
the, 182 infix, 44
Left field, 46 Null pointer, 232
Left inverse, 92 Null string, 37
Left-narrow, 154 Number
Left-narrow over a class, 166 irrational, 183
Length, 34 rational, 183
Level, 233
Linear merge sort, 243
Linear order, 77 O
Lub, 182 Object variable, 118
Occurrence, 118
free, 118
M of variable, 118
Majorised, 62 1-1 correspondence, 90
Map Onto, 47
inclusion, 110 Open interval, 183
Mathematical axioms, 126 Open segment, 167
Matrix, 57 Operation, 190
adjacency, 57 Operator, 108, 134
diagonal entries, 59 monotone, 108, 109
identity, 59 Order, 74