Science of Concurrent Programs
Leslie Lamport
1 Introduction 1
1.1 Who Am I? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Who Are You? . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 The Origin of the Science . . . . . . . . . . . . . . . . . . . . 2
1.3.1 The Origin of the Theory . . . . . . . . . . . . . . . . 2
1.3.2 The Origin of the Practice . . . . . . . . . . . . . . . . 3
1.4 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 A Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Why Math? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Ordinary Math 12
2.1 Arithmetic and Logic . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.1 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.2 Elementary Algebra . . . . . . . . . . . . . . . . . . . 14
2.1.3 An Introduction to Mathglish . . . . . . . . . . . . . . 16
2.1.4 Proofs in Elementary Algebra . . . . . . . . . . . . . . 17
2.1.4.1 An Example . . . . . . . . . . . . . . . . . . 18
2.1.4.2 Longer Proofs . . . . . . . . . . . . . . . . . 21
2.1.5 The Semantics of Elementary Algebra . . . . . . . . . 23
2.1.6 Arithmetic Logic . . . . . . . . . . . . . . . . . . . . . 27
2.1.7 Propositional Logic . . . . . . . . . . . . . . . . . . . . 29
2.1.8 The Propositional Logic of Arithmetic . . . . . . . . . 31
2.1.8.1 The Logic . . . . . . . . . . . . . . . . . . . . 31
2.1.8.2 More About Proofs . . . . . . . . . . . . . . 33
2.1.9 Predicate Logic . . . . . . . . . . . . . . . . . . . . . . 36
2.1.9.1 Quantifiers . . . . . . . . . . . . . . . . . . . 36
2.1.9.2 Variables and Their Scopes . . . . . . . . . . 38
5 Refinement 159
5.1 A Sequential Algorithm . . . . . . . . . . . . . . . . . . . . . 160
5.1.1 A One-Step Program . . . . . . . . . . . . . . . . . . . 161
5.1.2 Two Views of Refinement Mappings . . . . . . . . . . 162
5.1.3 A Step and Data Refinement . . . . . . . . . . . . . . 164
5.2 Invariance Under Refinement . . . . . . . . . . . . . . . . . . 167
5.3 An Example: The Paxos Algorithm . . . . . . . . . . . . . . . 168
5.3.1 The Consensus Problem . . . . . . . . . . . . . . . . . 169
5.3.2 The Paxos Consensus Algorithm . . . . . . . . . . . . 171
5.3.2.1 The Specification of Consensus . . . . . . . . 171
5.3.2.2 The Voting Algorithm . . . . . . . . . . . . . 172
5.3.2.3 The Paxos Abstract Program . . . . . . . . . 174
5.3.3 Implementing Paxos . . . . . . . . . . . . . . . . . . . 176
5.4 Proving Refinement . . . . . . . . . . . . . . . . . . . . . . . 178
5.4.1 The Refinement Mapping . . . . . . . . . . . . . . . . 179
5.4.2 Refinement of Safety . . . . . . . . . . . . . . . . . . . 180
5.4.3 Refinement of Fairness . . . . . . . . . . . . . . . . . . 183
5.4.4 A Closer Look at E . . . . . . . . . . . . . . . . . . . 186
5.4.4.1 A Syntactic View . . . . . . . . . . . . . . . 186
5.4.4.2 Computing E . . . . . . . . . . . . . . . . . 187
5.4.4.3 The Trouble With E . . . . . . . . . . . . . 189
5.5 A Warning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Appendix 295
A Digressions 295
A.1 Why Not All Mappings Are Sets . . . . . . . . . . . . . . . . 295
A.2 Recursive Definitions of Mappings . . . . . . . . . . . . . . . 296
A.3 How Not to Write x′′′ . . . . . . . . . . . . . . . . . . . . . . 298
A.4 Hoare Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
A.5 Another Way to Look at Safety and Liveness . . . . . . . . . 301
A.5.1 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . 301
A.5.2 The Metric Space of Behaviors . . . . . . . . . . . . . 304
B Proofs 306
B.1 Invariance Proof of Increment . . . . . . . . . . . . . . . . . . 306
B.2 Proof of Theorem 4.2 . . . . . . . . . . . . . . . . . . . . . . . 309
B.3 Proof of Theorem 4.3 . . . . . . . . . . . . . . . . . . . . . . . 311
B.4 Proof of Theorem 4.4 . . . . . . . . . . . . . . . . . . . . . . . 312
B.5 Proof of Theorem 4.5 . . . . . . . . . . . . . . . . . . . . . . . 313
B.6 Proof of Theorem 4.6 . . . . . . . . . . . . . . . . . . . . . . . 313
B.7 Proof of Theorem 4.7 . . . . . . . . . . . . . . . . . . . . . . . 315
B.8 Proof Sketch of Theorem 4.8 . . . . . . . . . . . . . . . . . . 317
B.9 Proof of Formula (5.4) . . . . . . . . . . . . . . . . . . . . . . 318
B.10 Proof of Theorem 6.2 . . . . . . . . . . . . . . . . . . . . . . . 318
Bibliography 327
Index 332
• The pdf version contains links to other places in the book. They
are colored like this. I have not decided in what color links should
be. Besides links to numbered things like equations, there are a few
other links in the text from terms to the places where they are defined
or introduced. I have not decided whether there should be more or none
of those other links.
• Almost all the examples are tiny, so the reader can concentrate on the
principles without being distracted by complications in the examples.
These tiny examples have been written in TLA+ and verified by model
checking; any errors in what is asserted about them are the result of
incorrect transcriptions of what has actually been verified. The TLA+
versions of the examples will be made available on the Web.
• The book contains a few marginal notes. The marginal notes in this
version colored like this are notes to myself for things to be checked
when preparing the published version.
Acknowledgments (unfinished)
Because the ideas in this book were developed over several decades, it’s im-
possible for me to give credit to everyone who influenced their development.
I will therefore restrict this section to acknowledging only people who were
coauthors of papers of which I’ve been an author. They will be listed alpha-
betically. I haven’t decided whether to simply list them, or to indicate what
their influence has been. I currently expect there to be between 16 and 24
people listed, but I may change my mind on who might be included.
I will also add acknowledgments to people who read preliminary versions
and reported errors or sent me comments that led to significant changes.
Chapter 1
Introduction
1.1 Who Am I?
Dear Reader. I am inviting you to spend many pages with me. Before
deciding whether to accept my invitation, you may want to know who I am.
I was educated as a mathematician; my doctoral thesis was on partial
differential equations. While a student, I worked part-time and summers as a
programmer. At that time, almost all programs were what I call traditional
programs—ones with a single thread of control that take input, produce
output, and stop.
After obtaining my doctorate, I began working on concurrent algorithms—
ones comprising multiple independent threads of control, called processes,
that are executed concurrently. The first concurrent algorithms were meant
to be executed on a single computer, and processes communicated through a
shared memory. Later came distributed algorithms—concurrent algorithms
designed to be executed on multiple computers in which processes commu-
nicate by message passing.
This is not the place for modesty. I was very good at concurrency—both
writing concurrent algorithms and developing the theory underlying them.
The first concurrent algorithm I wrote, published in 1974, is still taught at
universities. In 1978 I published what is probably the first paper on the
theory of distributed computing. I have received many awards and honors
for this work, including a Turing award (generally considered the Nobel
prize of computer science) for “fundamental contributions to the theory and
practice of distributed and concurrent systems.”
1.4 Correctness
Thus far, our science has been described as helping to build concurrent
programs that work correctly. Working correctly is a vague concept. Here
is precisely what we take it to mean.
We define a behavioral property to be a condition that is or is not sat-
isfied by an individual execution of a program. For example, termination
is a behavioral property. An execution either terminates or else it doesn’t
terminate, meaning that it keeps executing forever. We say that a program
satisfies a behavioral property if every possible execution of the program sat-
isfies it. A program is considered to work correctly, or simply to be correct,
if it satisfies its desired behavioral properties.
That every possible execution of a program satisfies its behavioral prop-
erties may seem like an unreasonably strong requirement. I would be happy
if a program that I use does the right thing 99.99% of the times I run it. For
many programs, extensive testing can ensure that it does. But it can’t for
most concurrent programs. What a concurrent program does can depend on
For example, the greatest common divisor of 12 and
16, written GCD(12, 16), equals 4 because 4 is the largest integer such that
12 and 16 are both multiples of that integer. The algorithm is an abstract
program containing two variables that we name x and y. Here is its prose
description.
I believe most engineers and many scientists can’t explain why an execution
of Euclid’s algorithm computes GCD(M , N ), which means that they don’t
understand the algorithm. Here is the explanation provided by our science,
beginning with how we view executions.
We consider an execution to be a sequence of states. For Euclid’s algo-
rithm, a state is an assignment of values to the program variables x and y.
We write the state that assigns 7 to x and 42 to y as [x :: 7, y :: 42]. Here is
the sequence of states that is the execution of Euclid’s algorithm for M = 12
and N = 16.
" # " # " # " #
x :: 12 x :: 12 x :: 8 x :: 4
→ → →
y :: 16 y :: 4 y :: 4 y :: 4
The states in the sequence are separated with arrows because we naturally
think of an execution going from one state to the next. But in terms of our
science, the algorithm and its execution just are; they don’t go anywhere.
What an algorithm does in the future depends on its current state, not
on what happened in the past. This means that in the final state of the
execution, in which x and y are equal, they equal GCD(M , N ) because of
some property that is true of every state of the execution. To understand
Euclid’s algorithm, we must know what that property is.
That property is GCD(x , y) = GCD(M , N ). (Chapter 3 explains how
we show that every state satisfies this property.) Because an execution stops
only when x and y are equal, and GCD(i , i ) equals i for any positive integer
i , this property implies that x and y equal GCD(M , N ) in the final state of
the execution.
2 You may have seen a more efficient modern version of Euclid's algorithm that replaces
the larger of x and y by the remainder when it is divided by the smaller. For the purpose
of this example, it makes little difference which we use.
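To see the algorithm and its invariant concretely, here is a minimal Python sketch (a plain-code illustration, not one of the book's TLA+ examples; the function name euclid is hypothetical). It asserts the invariant GCD(x, y) = GCD(M, N) explained above in every state of the execution.

```python
# A sketch of the subtractive Euclid's algorithm, checking in every state
# the invariant GCD(x, y) = GCD(M, N) explained above.
from math import gcd

def euclid(M, N):
    x, y = M, N
    while x != y:
        assert gcd(x, y) == gcd(M, N)   # the property true of every state
        if x > y:
            x = x - y                   # replace the larger by the difference
        else:
            y = y - x
    assert x == gcd(M, N)               # final state: x = y = GCD(M, N)
    return x

print(euclid(12, 16))   # 4, via the states (12,16) -> (12,4) -> (8,4) -> (4,4)
```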
are special cases of the following two classes of behavioral properties that
can be required of a concurrent program:
• Safety properties, which assert that something bad never happens.
• Liveness properties, which assert that something good eventually happens.
These two classes of properties are defined precisely in Section 4.1. Termina-
tion is the only liveness property required of a traditional program. There
are many kinds of liveness properties that can be required of concurrent
programs.
Euclid’s algorithm satisfies its safety requirement (being allowed to ter-
minate only if it has produced the correct output) because the only thing
it is allowed to do is start with x = M and y = N and execute its action.
That is, it satisfies its safety requirement because it is assumed to satisfy the
safety property of doing only what the description of the algorithm allows
it to do.
Euclid’s algorithm satisfies its liveness requirement (eventually terminat-
ing) because it is assumed to satisfy the liveness property of eventually per-
forming any action that its description allows it to perform. (Section 3.4.2.8
shows how we prove that the algorithm terminates.)
I have found it best to describe and reason about safety and liveness
in different ways. In our science, temporal logic plays almost no role in
handling safety, but it is central to handling liveness. The TLA formula for
an abstract program is the logical conjunction of a safety property and a
liveness property.
Math has been developed over thousands of years to be simple and ex-
pressive. An abstract program ignores many implementation details, which
often means allowing multiple possible implementations. This is simple to
express in math. Code is designed to describe one way of computing some-
thing. It can be hard or even impossible to write code that allows all those
possibilities. Being based on concepts from coding languages, pseudocode
also lacks the simplicity and expressiveness of math.
One place we want to allow many possible implementations is in describ-
ing what the environment can do. A program can’t work in an arbitrary
environment. An implementation of Euclid’s algorithm will not produce the
correct answer if the operating system can modify the variables x and y. A
concurrent program can interact with its environment in complicated ways,
and we have to state explicitly what the program assumes about its environ-
ment to know if it's correct. We usually want to assume as little as necessary
about the environment, which means the abstract program should allow it
to have many different behaviors.
Unanticipated behavior of the environment is a serious source of errors
in concurrent programs. Part of the environment of a program is likely to
be another program, such as an operating system. Avoiding errors may
require finding answers to subtle questions about exactly what that other
program does. This is often difficult, because the only description of what
it does other than its code is likely to be imprecise prose. When writing the
abstract program to describe what our concrete program does, describing
what the environment can do will tell us what questions we have to ask.
The expressiveness of math, embodied in TLA, provides a practical
method of writing and checking the correctness of high-level designs of sys-
tems. Such checking can catch errors early, when they are easier to correct.
TLA+ is used by a number of companies, including Amazon [40], Microsoft,
and Oracle. Math also provides a new way of thinking about programs that
can lead to better programming. There is usually no way to quantify the
result of better thinking, but it was possible in the following instance.
Virtuoso was a real-time operating system. It controlled some instru-
ments on the European Space Agency’s Rosetta spacecraft that explored a
comet. Its creators decided to build the next version from scratch, starting
with a high-level design written in TLA+. They described their experience
in a book [47]. The head of the project, Eric Verhulst, wrote this in an email
to me:
than in [Virtuoso].3
This result was unusual. It was possible only because the design of the
entire system was described with TLA+. Usually, TLA+ is used to describe
only critical aspects of a system that involve concurrency, which represent
a small part of the system’s code. But this example dramatically illustrates
that describing abstract programs with mathematics can produce better
programs.
3 The book states the reduction in code size to be a factor of 5–10. Verhulst explained
to me that it was impossible to measure the reduction precisely, so the book gave a
conservative estimate; but he believes it was actually a factor of 10.
Chapter 2
Ordinary Math
calculations. What we will use are the properties of numbers that you should
have learned—for example:
(2.1) 3 ∗ (√3 + π) = (3 ∗ √3) + (3 ∗ π)
• Ones like x ∗ (y + 3), whose value after substituting numbers for the
variables is a number. We call such an expression a numeric expres-
sion. A number is a numeric expression that has no variables.
Most of the expressions you wrote in elementary algebra were either formulas
or parts of formulas; and most of the formulas you wrote were equations.
Most of what you did in elementary algebra consisted of solving equations.
That meant finding a single value for each variable such that substituting
those values for the variables in the equations made the equations equal
true.
We’re not interested in solving equations. We will describe programs
with formulas, so we need to understand those formulas. This requires
understanding some basic concepts of formulas. The formulas of elementary
algebra are used to explain these concepts because you’re familiar with them.
An important class of formulas are ones that equal true no matter what
values are substituted for their variables. Such a formula is said to be valid ;
and the assertion that F is valid is written |= F . For example, the truth of
formula (2.1) is a special case of:
(2.2) |= p ∗ (q + r ) = p ∗ q + p ∗ r
distinct variables and exp1, exp2, and exp3 are numeric expressions. For
example
( p ∗ (q + r ) with q ← r , r ← q + s ) = p ∗ (r + (q + s))
As in this example, the with expression is usually enclosed in parentheses
when it appears in a formula, otherwise the formula would be difficult to
parse.
In that case, we have to look at the context in which the sentence appears.
The formula x ≥ y + 1 can be true only in a context in which some assump-
tions have been made about the values of x and y + 1—assumptions that
are expressed by formulas that are assumed to be true. Sentence 3 asserts
that x ≥ y + 1 is true if and only if formula (42) has either been assumed or
shown to be implied by assumptions made about x and y. I have tried to
make it clear by grammar or context what it means when a formula appears
in a sentence in this book.
There’s another source of ambiguity in most mathematical writing that
I have tried to avoid in this book. Almost no mathematicians other than
logicians write the meta-formula |= F differently from the formula F . In
most written math, you have to tell from the context which is meant.
2.1.4.1 An Example
In school, you learned to solve this pair of equations:
(2.3) 3 ∗ x − 2 ∗ y = 7 and 7 ∗ x + 3 ∗ y = 1
If you can still do it, you will find that the solution is x = 1 and y = −2. In
any case, you can easily calculate that substituting those values for x and y
makes both of these equations true. But is that the only solution? Are there
other values of x and y for which the equations are true? The procedure
you would have followed to find the solution actually proves that it is the
only one. As an example, we will write that procedure as a proof.
The method of solving equations (2.3) is based on some rules. One of
them is Rule EqAdd, which allows us to add two equations. For example,
from the two equations
(2.4) 42 = x + y and 2 ∗ x = y + 1
it lets us deduce
(2.5) 42 + 2 ∗ x = (x + y) + (y + 1)
For example, I presume you can calculate with elementary algebra expres-
sions, so you know that 42 + 2 ∗ x equals 2 ∗ x + 42 and (x + y) + (y + 1)
equals x + 2 ∗ y + 1. We can then apply the substitution rule twice, the first
time to rewrite (2.5) as
2 ∗ x + 42 = (x + y) + (y + 1)
and the second time to rewrite that as
2 ∗ x + 42 = x + 2 ∗ y + 1
Theorem Assume: 1. 3 ∗ x − 2 ∗ y = 7,
2. 7 ∗ x + 3 ∗ y = 1
Prove: x = 1 and y = −2
1. 9 ∗ x − 6 ∗ y = 21
Proof: By assumption 1 and Rule EqMult with p ← 3,
since 3 ∗ (3 ∗ x − 2 ∗ y) = 9 ∗ x − 6 ∗ y.
2. 14 ∗ x + 6 ∗ y = 2
Proof: By assumption 2 and Rule EqMult with p ← 2.
3. 23 ∗ x = 23
Proof: By steps 1 and 2 and Rule EqAdd.
4. x = 1
Proof: By step 3 and Rule EqMult, substituting 1/23 for p.
5. 7 ∗ 1 + 3 ∗ y = 1
Proof: By step 4 and assumption 2 with 1 substituted for x .
6. 3 ∗ y = −6
Proof: By step 5 and Rule EqAdd, substituting −7 for m and n.
7. y = −2
Proof: By step 6 and Rule EqMult, substituting 1/3 for p.
8. Q.E.D.
Proof: By steps 4 and 7.
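As a quick mechanical check of the theorem (an added illustration, not a replacement for the proof), the sympy library can confirm that x = 1, y = −2 is the only solution of equations (2.3); the variable names are arbitrary.

```python
# Solve the two equations of (2.3); a unique solution confirms the theorem.
from sympy import symbols, solve, Eq

x, y = symbols("x y")
solutions = solve([Eq(3*x - 2*y, 7), Eq(7*x + 3*y, 1)], [x, y])
print(solutions)   # {x: 1, y: -2} -- the only solution
```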
These proofs also implicitly use the Substitution Rule. That rule is such
a basic part of mathematics that it is taken for granted and never explicitly
mentioned in a proof.
Finally, we come to the last statement. “Q.E.D.” is an abbreviation for
the goal of the proof, which is the Prove clause of the theorem. In this case,
the goal is to prove the two formulas x = 1 and y = −2. The proof simply
points to the steps in which those formulas are proved. A proof always ends
with a Q.E.D. step, so we’re sure that we’ve actually proved what we were
supposed to.
How we write a proof depends on how hard the proof is and how so-
phisticated we expect the reader of the proof to be. This proof was written
for someone less sophisticated than I expect most readers of this book to
be, since I wanted you to concentrate on the proof style rather than on the
math. A single-paragraph prose proof would probably be fine for a reader
who hasn’t forgotten elementary algebra.
1. x = 1
1.1. 9 ∗ x − 6 ∗ y = 21
1.2. 14 ∗ x + 6 ∗ y = 2
1.3. 23 ∗ x = 23
1.4. Q.E.D.
2. y = −2
2.1. 7 ∗ 1 + 3 ∗ y = 1
2.2. 3 ∗ y = −6
2.3. Q.E.D.
3. Q.E.D.
Proof: By steps 1 and 2.
It's a good idea to make a Q.E.D. step a simple paragraph that you write first, so
you don’t waste time proving steps that don’t imply the proof’s goal.
At the bottom of the hierarchical structure are steps whose proof is
written in prose. That prose should be easy to understand, so the reader
can be sure that it’s correct. How easy that has to be depends on the
reader, who may just be you. If you find that the proof isn’t easy enough to
understand, you should decompose it another level. I’ve found that the way
to avoid errors is to decompose a proof down to the level where the prose
proof is obviously correct, and then go one level deeper. For machine-checked
proofs, the bottom-level proofs are instructions for the proof checker. If the
checker fails to check the proof and you believe the step is correct, then keep
decomposing until either the checker says it’s correct or you see why it’s not.
Long proofs, especially correctness proofs of programs, can be quite deep.
For proofs more than three or four levels deep, we use a compact numbering
system explained in Section 2.1.10.2 below.
In this example, the theorem’s assumptions and goals were mathemati-
cal formulas, as were the assertions made by the steps. This should be the
case for theorems asserting correctness of programs—except perhaps in some
cases at the deepest levels of the proof. In most mathematical proofs, includ-
ing proofs about the math underlying our science of concurrent programs,
the theorem and the assertions of the steps consist of prose statements, such
as “x is a prime number.” Those statements may be a few sentences long.
The prose describes mathematical formulas, but getting the details exactly
right isn’t as important for those theorems as it is for programs. Hierarchi-
cally structured prose proofs are reliable enough for them.
We can’t define precisely what a collection is. I could say that a collection
is a bunch of things, but I would then have to define what a bunch is. I
assume you know what a collection is. The things that a collection is a
collection of are usually called values. For example, the values in the theory
of elementary algebra are numbers.
A mapping M from a collection C of values to a collection D of values
is something that assigns to each value v in C a value M (v ) in D. (The
collections C and D can be the same.) We say that such a mapping M is
a mapping on C. For example, we can define a mapping LengthOfName
from a collection of people to the collection of natural numbers by defining
LengthOfName(p) to equal the number of letters in the name of p, for every
person p in a collection D of people. If Jane is in D, then this defines
LengthOfName(Jane) to equal 4.
A predicate is a Boolean-valued mapping—that is, a mapping M on a
collection C such that M (v ) equals true or false for each value v in C.
The meaning of an elementary algebra formula is a predicate on interpreta-
tions, where an interpretation is a mapping from variables to numbers. We
define the meaning [[F ]] of a formula F to be the predicate that maps an
interpretation to the Boolean value obtained by replacing each variable in
F by the value assigned to that variable by the interpretation. For example,
if Υ is an interpretation, then [[x + y > 42]](Υ) equals Υ(x ) + Υ(y) > 42. If
Υ(x ) = 3 and Υ(y) = 27, then [[x + y > 42]](Υ) equals 3 + 27 > 42, which
equals false.
(If you're not used to reading formulas with Greek letters, go now to
Figure 2.3.)
This definition of [[x + y > 42]] makes no sense. Here's why. A semantics
assigns a meaning to a formula. A formula is a string of characters. Its
meaning is a mathematical object. Let’s write the string of characters that is
the formula in a font like this: x + y > 42; and let’s write mathematical objects
like numbers or variables in the font used throughout this book. I claimed
that [[x + y > 42]](Υ) equals 3 + 27 > 42, which is a meaningless combination
of the numbers 3 and 27 and the four characters + > 4 2 .
Elementary algebra has a grammar. The grammar tells us that x + y is an
expression but + x + isn’t, that the + in x + y is an operator with the subex-
pressions x and y as its arguments, and that x + y * z is parsed as x + (y * z).
I assume you know this grammar, so you understand how the expressions
that appear in examples are parsed. To deduce that [[x+y>42]](Υ) equals
3+27 > 42 for the particular interpretation Υ, I replaced x by [[x]](Υ), which
equals 3, and y by [[y]](Υ), which equals 27. I then replaced the syntactic
tokens +, >, and 42 of elementary algebra by the tokens of arithmetic that
are spelled the same. This “punning” works for elementary algebra because
it is so closely related to arithmetic. It won’t work for TLA, so we use a
I have been told that many engineers freak out when they see a
Greek letter like Υ in a formula. If you’re one of them, now is the
time to get over it. You had no trouble dealing with π as a child;
you can now handle a few more Greek letters. They’re used
sparingly in this book, but sometimes representing a particular
kind of object with Greek letters makes the text easier to read.
Here are all the Greek letters used in the book, along with their
English names. You don’t have to remember their names; you
just need to distinguish them from one another.
Lowercase
α alpha λ lambda π pi τ tau
β beta µ mu ρ rho φ phi
δ delta ν nu σ sigma ψ psi
ϵ epsilon (also written ε)
Uppercase
Λ Lambda Υ Upsilon Π Pi Φ Phi
∆ Delta
Figure 2.3: Greek letters used in this book.
from formulas to parse trees and apply the rules to the parse tree. But we
won’t be that compulsive.
Implicit in this exposition is that an interpretation assigns values to all
possible variables, not just the ones in any particular expression. The value
of [[F ]](Υ) for a formula F depends only on the values the interpretation Υ
assigns to variables that occur in F . But letting an interpretation assign
values to all variables simplifies things, because it means we don’t have to
keep track of which variables an interpretation is assigning values to.
We assume that there are infinitely many variables. We do this for the
same reason we assume there are infinitely many integers even though we
only ever use relatively few of them: it makes things simpler not to have to
worry about running out of them.
Let’s review what we have done. An expression is a string of characters.
We define the meaning [[exp]] of an expression exp to be a mapping that
assigns to every interpretation Υ a value [[exp]](Υ) that is either a number
or a Boolean, where an interpretation Υ is a mapping that assigns to each
variable v a value Υ(v ) that is a number. We define [[exp]] by defining
[[exp]](Υ) as follows. An expression is either (i) a variable or (ii) a string
of characters that represents an operator op applied to its arguments. In
case (i), [[exp]](Υ) equals Υ(exp). In case (ii), we define for each operator
op the value of [[exp]] in terms of the values [[arg]] for each argument arg of
op in the expression exp. This defines the meaning of the operator op. For
the operator +, we define [[arg1 + arg2]] to be the expression [[arg1]] + [[arg2]]
of arithmetic. The + in [[arg1 + arg2]] is a one-character string of characters,
and the + in [[arg1]] + [[arg2]] is an operation of arithmetic. We consider an
expression like 42 that represents a number to be an operator that takes no
arguments, where [[42]](Υ) equals 42 for every interpretation Υ.
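The definition of [[exp]] can be made concrete with a small Python sketch (an illustration under assumed encodings, not part of the book): expressions are nested tuples, and an interpretation is a dictionary from variable names to numbers.

```python
# meaning(exp) returns [[exp]]: a mapping from interpretations to values.
def meaning(exp):
    kind = exp[0]
    if kind == "var":          # case (i): a variable
        _, name = exp
        return lambda interp: interp[name]
    if kind == "const":        # e.g. 42: an operator taking no arguments
        _, value = exp
        return lambda interp: value
    if kind == "+":            # [[a + b]](Y) = [[a]](Y) + [[b]](Y)
        _, a, b = exp
        return lambda interp: meaning(a)(interp) + meaning(b)(interp)
    if kind == ">":            # [[a > b]](Y) = [[a]](Y) > [[b]](Y)
        _, a, b = exp
        return lambda interp: meaning(a)(interp) > meaning(b)(interp)
    raise ValueError("unknown operator")

# [[x + y > 42]] applied to an interpretation with x = 3 and y = 27:
formula = (">", ("+", ("var", "x"), ("var", "y")), ("const", 42))
print(meaning(formula)({"x": 3, "y": 27}))   # False, since 3 + 27 > 42 is false
```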
The similarity in the way we write the operator + of elementary algebra
and the + of arithmetic is obviously not accidental. Syntactically, elemen-
tary algebra is an extension of arithmetic to include variables. However, the
formula 2 + 3 has a different meaning in elementary algebra than in arith-
metic. If we were to give a semantics to a language of arithmetic, then
[[2 + 3]] would equal the number 5. In elementary algebra, [[2 + 3]] is a map-
ping such that [[2 + 3]](Υ) equals 5 for every interpretation Υ. Elementary
algebra is an extension of arithmetic in the sense that if exp 1 and exp 2 are
two expressions of arithmetic, such as 2 + 3 and 5, then [[exp1 ]] and [[exp2 ]]
are equal in the semantics of arithmetic if and only if they’re equal in the
semantics of elementary algebra.
The purpose of this over-analyzing of elementary algebra was to explain
A ∧ B asserts that A is true and B is true, where and has the same meaning
in both languages.
A ∨ B asserts that A is true or B is true (or both A and B are true). Unlike
or in English, or in Mathglish always allows the possibility that both
formulas are true.
A ⇒ B asserts that A implies B ; for example, (x > 20) implies (x > 10).
Rule EqMult can be written as
|= (m = n) ⇒ (p ∗ m = p ∗ n)
since |= means true for any values of the variables. Rule EqAdd can be
written as
|= (m = n) ∧ (p = q) ⇒ (m + p = n + q)
Theorem (3 ∗ x − 2 ∗ y = 7) ∧ (7 ∗ x + 3 ∗ y = 1) ⇒ (x = 1) ∧ (y = −2)
We don’t write the |= because it’s implied by stating the formula as a the-
orem. However, we’ll see below that if we write the theorem this way, then
its proof has to be rewritten.
Observe that ∧ has higher precedence (binds more tightly) than ⇒.
The operator ¬ has higher precedence than ∧ and ∨, which have higher
precedence than ⇒ and ≡. Thus
¬A ∧ B ⇒ C ∨ D equals ((¬A) ∧ B ) ⇒ (C ∨ D)
I don’t know how the following expressions should be parsed, so it’s best
not to write them:
A ∧ B ∨ C        A ≡ B ⇒ C
Transitivity: |= (A ⇒ B ) ∧ (B ⇒ C ) ⇒ (A ⇒ C )
A ⇒ B     Proof of A ⇒ B .
  ⇒ C     Proof of B ⇒ C .
    ...
  ⇒ Q     Proof of P ⇒ Q.
This works well if the proof of each implication is short. It’s my favorite way
of writing a lowest-level prose proof of a hierarchically structured proof.
Another useful tautology is
|= (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ false)
An equivalent form of it is
|= (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ Q)
It shows that to prove P ⇒ Q, we can assume not just P but also ¬Q when
proving Q. This gives us an additional hypothesis. Moreover, it’s a very
strong hypothesis. If P ⇒ Q is true, then P implies that ¬Q is equivalent
to false, which is the strongest possible hypothesis (since false implies
anything). If you wind up not using the additional hypothesis, you can just
delete it.
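Because these are propositional tautologies, they can be checked by brute force over all truth-value assignments. Here is a minimal Python check (an added illustration, not a proof method used in the book):

```python
# Check both tautologies above for all four Boolean cases.
from itertools import product

def implies(a, b):
    return (not a) or b

for P, Q in product([True, False], repeat=2):
    assert implies(P, Q) == implies(P and not Q, Q)
    assert implies(P, Q) == implies(P and not Q, False)
print("both tautologies hold in all four cases")
```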
means that [[A ⇒ P ]](Υ) is true for every interpretation Υ, which is true
iff [[P ]](Υ) is true for every interpretation Υ for which [[A]](Υ) is true. In
other words, we can prove |= A ⇒ P by assuming that Υ is an arbitrary
interpretation such that [[A]](Υ) is true and proving [[P ]](Υ) is true for that
interpretation. (Remember that iff means if and only if.) Writing
Theorem Assume: A
Prove: P
asserts that |= A ⇒ P is true, but the goal of the proof is to show that
[[P ]](Υ) is true for an interpretation Υ, assuming [[A]](Υ) is true for that
interpretation. In other words, the goal of the proof is to show that P is
true when we can assume that A is true throughout the proof. Therefore,
the Q.E.D. step asserts that P is true (which completes the proof of A ⇒ P ),
not that A ⇒ P is true.
In plain Mathglish, if the theorem asserts A ⇒ P , then the goal of the
proof is to prove A ⇒ P , with no additional assumption. If we write the
theorem as an Assume/Prove, then the goal of the proof is to prove P ,
using the assumption that A is true. Either way, we’re proving the same
thing: |= A ⇒ P .
In our example, formula A equals B ∧ C . Assuming that B ∧ C is true
is the same as assuming that B is true and C is true. Writing B ∧ C as
two separate assumptions allows us to give them each a number, so we can
indicate in the proof which of the two conjuncts is being used in proving
an individual step. This makes the proof easier to read. The goal P is
also a conjunction, but there is seldom any reason to number the individual
conjuncts of a goal.
The Assume/Prove construct is not limited to the statement of the
theorem. It can be used as any step of a proof. The formulas in the Assume
clause as well as any assumptions in effect for that statement can be assumed
in the statement’s proof.
A formula P is often proved by showing that to prove P it suffices to
prove some other formula Q, and then to prove Q. If P is a statement in a
hierarchically structured proof, this proof can be written as:
2.3. P
2.3.1. Q ⇒ P
Proof of Q ⇒ P
2.3.2. Q
Proof of Q
2.3.3. Q.E.D.
Proof: By steps 2.3.1 and 2.3.2.
The problem with this structure is that the proof of Q, which is likely to be
the main part of the proof of P , is one level deeper than the proof of P . (It
starts with statement 2.3.2.1.) That extra level of depth serves no purpose
and makes the proof harder to read. Instead, we write the proof of P like
this:
2.3. P
2.3.1. Suffices: Q
Proof of P , assuming that Q is true.
2.3.2. . . .
...
2.3.7. Q.E.D.
Proof that steps 2.3.2–2.3.6 prove Q.
Starting with step 2.3.2, the Suffices statement changes the goal of the
proof of step 2.3 from P to Q. The proof of step 2.3.1 has P as its goal and
Q as an additional assumption that can be used. In other words, the proof
of 2.3.1 is the same as if the statement were:
2.3.1. Assume: Q
Prove: P
The Suffices construct can be used with Assume/Prove too, as in:
2.3. P
2.3.1. Suffices: Assume: A
Prove: Q
Proof of P , assuming that A ⇒ Q is true.
In addition to changing the goal of the proof of step 2.3 from P to Q, this
Suffices statement also adds A to the current assumptions of that proof.
The proof of 2.3.1 is then the same as the proof of:
2.3.1. Assume: A ⇒ Q
Prove: P
If A is a conjunction, then its conjuncts can be listed and given numbers
in the Assume clause of a Suffices statement, the same as in an ordinary
Assume/Prove statement.
2.1.9.1 Quantifiers
The symbols, names, and Mathglish pronunciations of quantifiers are:
∀ universal quantification for all
∃ existential quantification there exists
They have the following meanings, where v is any numeric variable and F
is any formula of the predicate logic of arithmetic:
Let
Υ except v ↦ r
be the mapping that's the same as Υ except it assigns to the variable v the
number r . For any interpretation Υ:
[[∀ v : F ]](Υ) equals true iff [[F ]](Υ except v ↦ r ) equals true for
every number r .
[[∃ v : F ]](Υ) equals true iff [[F ]](Υ except v ↦ r ) equals true for
some number r .
For example, the following formula asserts that y ∗ x² ≥ x² holds for every
number x :
(2.7) ∀ x : y ∗ x² ≥ x²
Since (i) x² ≥ 0 for any number x and (ii) y ∗ r ≥ r for all r ≥ 0 iff y ≥ 1,
this formula equals true iff y ≥ 1. Thus, (2.7) is equivalent to y ≥ 1.
The following formula asserts that there exists a (real) number whose
square equals y:
(2.8) ∃ x : y = x²
Since a real number y has a square root (that's a real number) iff y ≥ 0,
this formula is equivalent to y ≥ 0.
The quantifiers ∀ and ∃ are related by the following rules. The box
around F in these formulas is explained in Section 2.1.9.2 below. For now,
ignore the boxes and pretend that each F is simply F .
(2.9) |= (∀ v : F ) ≡ (¬ ∃ v : ¬ F ) |= (∃ v : F ) ≡ (¬ ∀ v : ¬ F )
You should be able to check that these tautologies follow from the definitions
of [[∀ v : F ]] and [[∃ v : F ]]. We can take either of those tautologies to be the
definition of one of the quantifiers in terms of the other.
If we consider ∀ v and ∃ v to be the operators, and ∀ and ∃ (and the
colon “:”) to be pieces of syntax, then these two operators are dual to each
other, in the following sense:
(2.10) |= (¬ ∀ v : F ) ≡ (∃ v : ¬ F ) |= (¬ ∃ v : F ) ≡ (∀ v : ¬ F )
They follow from (2.9) by negating both sides of its equivalences.
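For a finite domain, these duality rules can be checked directly, since ∀ becomes a finite conjunction and ∃ a finite disjunction. A small Python check (an added illustration with an arbitrary predicate):

```python
# Check not-forall == exists-not, and not-exists == forall-not.
domain = range(-3, 4)
F = lambda v: v * v > 2   # an arbitrary predicate on the domain

assert (not all(F(v) for v in domain)) == any(not F(v) for v in domain)
assert (not any(F(v) for v in domain)) == all(not F(v) for v in domain)
print("duality rules hold on this domain")
```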
In formulas (2.7) and (2.8), x is called a bound variable. The variable
y in those formulas is called a free variable and is said to occur free in the
formulas. In a certain sense, the bound variable x doesn’t really occur in
those predicates. If we replace x by another variable z in (2.8), we get
∃ z : y = z² — a formula that is not just equivalent to (2.8), but is really just
a different syntax for the same formula.
Bound variables introduce subtle problems in mathematical reasoning.
If not understood, those problems can lead to making incorrect deductions.
Understanding them requires a more careful examination of variables, which
we now undertake.
variables with the same name. In code, we can usually think of a variable
as a location in memory. Here, we don’t know what a variable is (just as we
don’t know what a number is); we only know that an interpretation assigns
a value to a variable, and not to its name. An expression contains
variables that have names; it doesn't just contain variable names.
Before we introduced quantifiers, the declarations and scopes of variables
were all implicit. A formula ∀ v : F explicitly declares a new variable named
v whose scope is the formula F . Scopes can be nested. Formula ∀ v : F could
occur within the scope of a variable named v . That variable is a different
variable than the one named v that occurs in formula F . For example, in the
formula ∃ v : G ∧ (∀ v : F ), a variable named v that occurs in G is different
from a variable named v that occurs in F .
Variable Capture Having different variables with the same name raises
issues when substituting an expression for a variable. Those issues don’t
arise in coding languages, because those languages don’t allow such substi-
tution. You can’t substitute x+1 for y in the assignment statement y = 2*y.
But such substitution is needed to define what it means for one program to
implement another.
Here is an issue that arises with substitution that mathematicians call
variable capture. Consider the formula ∃ x : x > y of the predicate logic of
arithmetic. It’s a valid formula, since there exists a number x greater than
any number y. By the Elementary Algebra Instantiation Rule, we obtain a
valid formula when we substitute an expression for the variable y. Substituting
x + 1 for y apparently yields the formula ∃ x : x > x + 1 which equals false,
because there is no number x that is greater than x + 1.
A mathematician would probably say that we’re not allowed to perform
that substitution because the bound variable x captures the x in x +1. How-
ever, the substitution produces no logical problem because the expression
x + 1 we’re substituting for y is not in the formula ∃ x : x > y, so the x in
x + 1 is a different variable than the bound variable x in that formula. So,
the substitution produces a perfectly sound formula. The problem is that
mathematicians have provided no way to write a formula x > x + 1 when
the two “x ”s are names of two different variables. Variable capture is not
a logical problem, it’s a problem in the way we write formulas. However,
it poses a real problem for us because we have to write formulas that way
when we write proofs. Variable capture when writing proofs can result in
incorrect proofs, leading us to believe things that aren’t true.
We can avoid variable capture by changing the name of the bound vari-
able that's doing the capturing. Variable names serve only to tell whether or
not two variables that occur in a formula are the same variable. They have
no mathematical significance. A formula doesn’t depend on the names of its
variables. We can’t write a formula that asserts that the name of a variable
contains a vowel. So, if the scope of a variable named x contains no formula
within the scope of any variable named z , then we can change the name
of that variable named x from x to z . This means replacing every variable
name x in that scope with z . Doing this replacement makes no change to
the mathematics, just to the way we have written that mathematics. We
can therefore write the formula obtained by substituting x + 1 for y in the
formula ∃ x : x > y by changing the name of the bound variable from x to z
to get the correct formula ∃ z : z > x + 1.
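Renaming to avoid capture is mechanical enough to program. Here is a small Python sketch of capture-avoiding substitution (an added illustration; the tuple encoding and names like subst are assumptions, not from the book):

```python
# Formulas are tuples: ("var", name), ("const", n), ("exists", v, body),
# or (op, arg1, arg2).  subst(t, y, e) substitutes e for free y in t,
# renaming a bound variable when it would capture a variable of e.
import itertools

def free_vars(t):
    kind = t[0]
    if kind == "const":
        return set()
    if kind == "var":
        return {t[1]}
    if kind == "exists":
        return free_vars(t[2]) - {t[1]}
    return set().union(*(free_vars(a) for a in t[1:]))

def subst(t, y, e):
    kind = t[0]
    if kind in ("const", "var"):
        return e if t == ("var", y) else t
    if kind == "exists":
        v, body = t[1], t[2]
        if v == y:                       # y is bound here; nothing free
            return t
        if v in free_vars(e):            # rename v so it can't capture
            used = free_vars(body) | free_vars(e)
            v2 = next(f"z{i}" for i in itertools.count()
                      if f"z{i}" not in used)
            body, v = subst(body, v, ("var", v2)), v2
        return ("exists", v, subst(body, y, e))
    return (kind,) + tuple(subst(a, y, e) for a in t[1:])

f = ("exists", "x", (">", ("var", "x"), ("var", "y")))
print(subst(f, "y", ("+", ("var", "x"), ("const", 1))))
# ('exists', 'z0', ('>', ('var', 'z0'), ('+', ('var', 'x'), ('const', 1))))
```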
There’s an easy way to avoid variable capture without having to rename
bound variables. We just follow this rule:
Safe Scoping Rule Never declare a bound variable within the scope
of a variable with the same name.
To substitute x + 1 for y in a formula F , the formula F must be in the
scope of a variable named x . The rule implies that F can’t declare a
bound variable named x . Therefore, you will never have to substitute
x + 1 for y in ∃ x : x > y. This rule not only eliminates the problem of vari-
able capture, it also makes formulas easier to understand. While a formula
like ∃ v : F ∧ (∃ v : G) is mathematically fine, having two unrelated variables
named v is likely to confuse a reader.
is the mapping defined by M (x ) ≜ x > y. If we write this mapping M as
x ↦ x > y, then ∃ x : x > y is just a way of writing ∃ (x ↦ x > y). The
first meta-formula of (2.9) becomes |= ∀ (F ) ≡ ¬ ∃ (¬F ), where F is a map-
ping and ¬F is the mapping defined by (¬F )(v ) = ¬F (v ) for all values v .
Substituting x ↦ x > y for F in |= ∀ (F ) ≡ ¬∃ (¬F ) yields:
(2.14) |= (∀ v : F ∧ G ) ≡ ((∀ v : F ) ∧ (∀ v : G ))
|= (∃ v : F ∨ G ) ≡ ((∃ v : F ) ∨ (∃ v : G ))
You should be able to find examples to show that these assertions become
false if ∀ and ∃ are interchanged.
Another simple but useful rule, which was already mentioned in Sec-
tion 2.1.9.2, is that if the variable v does not occur free in formula F , then
∀ v : F and ∃ v : F are both equivalent to F . For example, if v does not occur
free in F , then the first meta-theorem of (2.14) implies that ∀ v : (F ∧ G) is
equivalent to F ∧ (∀ v : G).
There are four more rules that are often used. Two of them are for
proving the two kinds of quantified formulas, the other two are for using each
kind of quantified formula to prove something else. The first two are called
quantifier introduction rules, the second two are called quantifier elimination
rules. The first uses a new kind of assumption in a proof. The others
are simple implications, which can be proved with an Assume/Prove as
described in Section 2.1.8.2.
3.2. ∀ v : F
3.2.1. Suffices: Assume: new v
Prove: F
Proof: Obvious (because the Assume/Prove asserts ∀ v : F ).
...
3.2.7. Q.E.D.
Proof that steps 3.2.2–3.2.6 imply F .
We could also eliminate one level of proof and the Suffices step by having
step 3.2 simply assert the Assume/Prove.
In the view of quantification without bound variables, where F is a
mapping, the ∀ introduction rule is that ∀ (F ) is proved by proving
Assume: new v
Prove: F (v )
The ∀ elimination rule lets us use a formula ∀ v : F that has been proved
or assumed, by instantiating the bound variable:
|= (∀ v : F ) ⇒ ( F with v ← exp)
We then use the formula F with v ← exp to prove our goal. In the view
of quantification without bound variables, this rule is: |= ∀ (F ) ⇒ F (exp) .
Another of the rules is:
|= F ⇒ G implies |= (∃ v : F ) ⇒ (∃ v : G )
In the view of quantification without bound variables, it is written:
|= ∀ (F ⇒ G) ⇒ (∃ (F ) ⇒ ∃ (G))
2 By this definition, v does not occur free in the formula v = v , which makes sense
mathematically.
|= (G ≡ (v ↦ G(0))) ⇒ (∃ (G) ≡ G)
Then ASqrt(4) might equal 2 and ASqrt(9) might equal −3. Since this is
math, |= ASqrt(4) = ASqrt(4) is true. The value of ASqrt(4) may be 2 or
−2. But whichever value it equals, like every mathematical expression with
no free variable, it always equals the same value.
Formally, choose is defined by the following rules:
If there is more than one value of x for which F equals true, then
choose x : F can equal any of those values. But it always equals the same
value.
No matter how often I repeat that the choose operator always chooses
the same value, there are engineers who think that choose is nondeter-
ministic, possibly choosing a different value each time it’s evaluated, and
they try to use it to describe nondeterminism in a program. I’ve also heard
computer scientists talk about “nondeterministic functions”.3 There’s no
such thing. There’s no nondeterminism in mathematics. Nondeterminism
is important in concurrent programs, and we’ll see that it’s easy to describe
mathematically. Adding nondeterminism to math for describing nondeter-
minism in a program makes as much sense as adding water to math for
describing fluid dynamics.
An expression choose v : F is most often used when there is only a single
choice of v that makes F true, as in the definition of √r above. Sometimes,
it appears within an expression whose value doesn't depend on which value
of v satisfying F is chosen.
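The determinism of choose can be mimicked in ordinary code by fixing one repeatable choice. A Python sketch (an added illustration over a finite candidate set; the names choose and ASqrt are hypothetical):

```python
# choose returns some fixed value satisfying F -- always the same one.
def choose(candidates, F):
    sat = [v for v in sorted(candidates) if F(v)]
    if not sat:
        raise ValueError("no satisfying value")  # a meaningless expression
    return sat[0]                                # the same value every time

ASqrt = lambda y: choose(range(-10, 11), lambda x: x * x == y)
assert ASqrt(4) == ASqrt(4)   # deterministic: repeated evaluation agrees
print(ASqrt(4))               # -2 with this particular (arbitrary) choice
```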
Just as with quantifiers, we can eliminate the bound variable in the
choose construct by writing choose (F ), where F is a mapping. The
expression choose v : F is then an abbreviation of choose (v ↦ F (v )).
The rules (2.16) are written:
(a) |= ∃ (F ) ⇒ F (choose (F ))
(b) |= ∀ (F ≡ G) ⇒ (choose (F ) = choose (G))
|= (A ∧ B ⇒ P ) ≡ (A ⇒ (B ⇒ P ))
A ⇒ (B ⇒ (∀ v : C ⇒ (D ⇒ P )))
The proof of the Assume/Prove statement has P as the goal, and the
variable v is declared by the new v clause to be a new variable whose scope
contains formulas C , D, and P and the statement’s proof. It would be
almost impossible to use formula A or formula B in the proof if it had a free
variable named v . The Safe Scoping Rule makes this impossible because it
forbids the new v clause to occur within the scope of a variable named v .
As should be clear from the example, the general form of an Assume/
Prove statement is
Assume: A1 , . . . , An Prove: P
which asserts a formula of the form B 1 ⇒ (B 2 ⇒ ( . . . ⇒ (B n ⇒ P ) . . . ))
For example, step 6.2.7.3 can be used only in the proofs of steps 6.2.7.4,
6.2.7.5, etc.
This rule allows us to solve the problem of keeping track of long step
numbers. Step number 6.2.7.3, 6.2.7.4, etc. can be replaced by the numbers
⟨4⟩3, ⟨4⟩4, etc., where step ⟨n⟩i is the i th step of a level-n proof. There can
be many steps numbered ⟨4⟩3 in a proof. However, in any paragraph proof,
there is at most one step numbered ⟨4⟩3 that the Step Reference Rule allows
to be used. That step, if it exists, is the most recent step numbered ⟨4⟩3.
2.1.11.1 if/then/else
A programmer who read enough math would notice that mathematicians
lack anything corresponding to the if /then/else statement of coding lan-
guages. Instead, they use either prose or a very awkward typographical
convention. We let the expression
if P then e else f
2.2 Sets
You’ve probably come across the mathematical concept of a set. A set is
a collection. However, we will see that not all collections can be sets. The
fundamental operator in terms of which sets are defined is ∈, which is read is
an element of or simply in. For every collection S that is a set, the formula
exp ∈ S equals true iff the value of the expression exp is one of the things
the collection S is a collection of. We call the things in a set S the elements
of S .
We need to be able to have sets of sets—that is, sets whose elements are
sets. Therefore, a set has to be a “thing”. So, we need to know what kinds
of things there are besides sets. The simplest way I know to make the math
we need completely rigorous is to base it on what is called ZF set theory or
simply ZF, where Z and F stand for the mathematicians Ernst Zermelo and
Abraham Fraenkel. One thing that makes ZF simple is that every thing is
a set. In other words, every mathematical value is a set. We will use ZF to
describe our math, so the terms set and value will mean exactly the same
thing. Sometimes I will write set/value instead of set or value to remind you
that the two words are synonyms. We add to ZF the operators of predicate
logic, so we should say we’re using the predicate logic of ZF, but we won’t.
We’ll just call it ZF.
Logicians have shown how to build mathematics, including the set of real
numbers, from ZF. There’s no need for us to do that; we just assume the
real numbers exist and the arithmetic operators on them satisfy the usual
properties. This means that 42 and √2 are sets, but we don't specify what
their elements are. We know that √2 ∈ 42 equals either true or false,
but we don't know which. We assume nothing about what the elements of
the set 42 are. We'll generally use the term value for a set/value like 42 for
which we don't know what its elements are.
We define the semantics of ZF the same way we defined the semantics
of the predicate logic of arithmetic. The meaning [[F ]] of a formula F of ZF
is a predicate on (the collection of) interpretations, where an interpretation
is an assignment of a set to each variable. There is one difference between
the predicate logic of arithmetic and ZF: We defined the predicate logic
of arithmetic to have two kinds of variables, numeric-valued and Boolean-
valued. ZF has just set-valued variables. In ZF, the values true and false
are sets.4 (We don’t know what their elements are.)
Including the operators of arithmetic in ZF implies that we have to
say what the meaning of the expression x + y is if x or y is a set that
isn’t a number. It also implies that we can write weird expressions like
(x + y) ∧ z , so we have to explain what they mean. We’ll deal with this issue
in Section 2.2.7.
Two sets are equal iff they have the same elements, which is asserted by this
rule of ZF:
|= ( S = T ) ≡ ∀ e : (e ∈ S ) ≡ (e ∈ T )
The subset relation ⊆ is defined by:
|= S ⊆ T ≡ ∀ e : (e ∈ S ) ⇒ (e ∈ T )
Imaging {exp : v ∈ S } is the set of all values of the expression exp obtained
by substituting for v an element of S . For example, {2 ∗ n + 1 : n ∈ N}
is the set of all odd natural numbers. Since S is not in the scope of
the bound variable v , the following definition is correct only if S is a
variable or is a formula not containing a variable named v .
Definition: |= e ∈ {exp : v ∈ S } ≡ ∃ v : (v ∈ S ) ∧ (e = exp)
|= S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U )
|= S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U )
Moreover, if all the sets are subsets of a set W, then a propositional logic
tautology containing ¬ becomes a theorem of set theory when, in addition
to substituting operators of set theory for propositional logic operators, each
subexpression ¬S is replaced with W \ S .
This correspondence between theorems of propositional logic and set
theory is the result of a close connection between predicates and sets. For a
set W, there is a natural 1-1 correspondence between predicates on W and
subsets of W. The predicate P and the corresponding set S_P are related as
follows:
S_P = {v ∈ W : P (v )}      P (v ) = (v ∈ S_P )
|= F ∨ G = F ∪ G |= F ⇒ G = (F ⊆ G)
|= F ≡ G = (F = G) |= ¬F = W \ F
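This predicate/set correspondence is easy to see on a finite universe. A Python sketch (an added illustration with an arbitrary universe W and predicates):

```python
# Each predicate P on W corresponds to the subset S_P = {v in W : P(v)};
# conjunction corresponds to intersection, negation to complement in W.
W = set(range(10))
P = lambda v: v % 2 == 0
Q = lambda v: v > 4

S_P = {v for v in W if P(v)}
S_Q = {v for v in W if Q(v)}

assert {v for v in W if P(v) and Q(v)} == S_P & S_Q
assert {v for v in W if not P(v)} == W - S_P
print(sorted(S_P & S_Q))   # [6, 8]
```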
|= (∀ v ∈ {} : F ) ≡ true |= (∃ v ∈ {} : F ) ≡ false
Some obvious abbreviations are used for nested ∀ and nested ∃ expres-
sions. For example, ∀ v ∈ S : (∀ w ∈ T : F ) is written ∀ v ∈ S , w ∈ T : F and
∀ v ∈ S , w ∈ S : F is written ∀ v , w ∈ S : F . Definitions (2.20) and the re-
quirement that the bound variable cannot appear in the expression S imply:
|= (∀ v , w ∈ S : F ) ≡ (∀ w , v ∈ S : F )
|= (∃ v , w ∈ S : F ) ≡ (∃ w , v ∈ S : F )
There are no such abbreviations for choose. (It’s not clear what they would
mean.)
If S is a finite set containing n elements, then ∀ v ∈ S : F equals the
conjunction of n formulas, each obtained by substituting an element of S
for v in F . Similarly, ∃ v ∈ S : F equals the disjunction of those n formulas.
If S is an infinite set, we can think of ∀ v ∈ S : F and ∃ v ∈ S : F as the
conjunction and disjunction of the infinitely many formulas obtained by
substituting elements of S for v in F . When we say that a formula is a
conjunction or disjunction, we sometimes include the case when it is such
an infinite conjunction or disjunction. It should be clear from the context
when we’re doing this.
The rules for reasoning about unbounded quantifiers and unbounded
choose can be applied to the bounded versions of these operators by using
the bounded operators’ definitions. For example, consider the ∀ introduc-
tion rule of Section 2.1.9.3. We prove ∀ v ∈ S : F by applying the rule to
∀ v : (v ∈ S ) ⇒ F , which means proving:
This abbreviation also makes it clear that the bound variable v can’t appear
in the expression S .
that adding one element to an infinite set doesn’t change its size. Here is a
1-1 correspondence showing that the set of integers is the same size as the
set of natural numbers:
I : 0  1  −1  2  −2  3  −3  4  −4  . . .
    ↕  ↕   ↕  ↕   ↕  ↕   ↕  ↕   ↕
N : 0  1   2  3   4  5   6  7   8  . . .
We say that a set S is smaller than a set T iff S is the same size as a subset
of T , but T is not the same size as a subset of S . In the 19th century, Georg
Cantor upset many mathematicians by showing that the set of integers is
smaller than the set of real numbers. He also showed that P(S ) is bigger
than S , for every set S . A theorem of ZF called the Schröder-Bernstein
Theorem states that if S and T each is the same size as a subset of the
other, then S and T are the same size. The definition of size and this
theorem generalize to arbitrary collections.
A set is called countable iff it is either finite or has the same number of
elements as N (and hence the same number of elements as I). No infinite
set is smaller than N, so countably infinite sets are the smallest infinite sets.
The following theorem asserts that a countable union of countable sets is a
countable set.
Theorem 2.1 If T is a countable set all of whose elements are countable
sets, then the union of all the elements of T is countable.
Proof sketch: First, assume T and all its elements are infinite sets. Since
T is countable, we can enumerate its elements as A, B , C , D, etc. Since all
of these sets are countable, we can list their elements as:
A = {a 0 , a 1 , a 2 , a 3 , a 4 , . . .}
B = {b 0 , b 1 , b 2 , b 3 , b 4 , . . .}
C = {c 0 , c 1 , c 2 , c 3 , c 4 , . . .}
D = {d 0 , d 1 , d 2 , d 3 , d 4 , . . .}
...
N :                     0   1   2   3   4   5   6   7   8   9  . . .
                        ↕   ↕   ↕   ↕   ↕   ↕   ↕   ↕   ↕   ↕
A ∪ B ∪ C ∪ D ∪ . . . : a0  a1  b0  a2  b1  c0  a3  b2  c1  d0  . . .
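The zigzag enumeration pictured above is easy to express as a program. A Python sketch (an added illustration using finite lists in place of infinite sets):

```python
# Interleave the sequences along anti-diagonals, so every element of
# every sequence appears at some finite position in the enumeration.
from itertools import count, islice

def diagonal(seqs):
    for n in count(1):                  # n-th anti-diagonal
        for i in range(n):
            yield seqs[i][n - 1 - i]

A = [f"a{i}" for i in range(10)]
B = [f"b{i}" for i in range(10)]
C = [f"c{i}" for i in range(10)]
D = [f"d{i}" for i in range(10)]
print(list(islice(diagonal([A, B, C, D]), 10)))
# ['a0', 'a1', 'b0', 'a2', 'b1', 'c0', 'a3', 'b2', 'c1', 'd0']
```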
Writing induction like this provides a new way of thinking about it. Instead
of starting from 0 and going up to bigger numbers, we think of starting
from an arbitrary number n and going down to smaller numbers. That is,
to prove F is true for n, we assume it’s true for numbers m smaller than
n. We can then prove F is true for each of those numbers m by assuming
it’s true for numbers p smaller than m. And so on. We can’t keep finding
smaller and smaller numbers forever. Therefore, we must eventually prove
that F is true for some number or numbers without using any assumptions.
The only property of N necessary for (2.22) to be a sound proof of
∀ n ∈ N : F is that there is no infinite sequence m 0 , m 1 , m 2 , . . . of elements
in N such that m 0 > m 1 > m 2 > . . . is true. Statement (2.22) proves
∀ n ∈ S : F for any set S with a greater-than relation > satisfying this con-
dition.
For later use, we define the property we need S to satisfy for an arbitrary
collection S , not just a set. A relation ≻ on a collection S is defined to be a
Boolean-valued mapping on pairs of values in S , where we write n ≻ m
instead of ≻(n, m). We define ≻ to be well-founded on S iff there does
not exist a function f with domain N such that f (i ) is in S for all i ∈ N
and ∀ i ∈ N : f (i ) ≻ f (i + 1). The following theorem can be generalized to
an arbitrary collection S , but we state it only for sets.
Theorem 2.2 If ≻ is a well-founded relation on the set S , then proving
the following statement proves ∀ n ∈ S : F .
Assume: new n ∈ S ,
        ∀ m ∈ {i ∈ S : n ≻ i } : F with n ← m
Prove: F
This kind of proof is called a proof by well-founded induction. An example
is proving a property F of subsets of a finite set T by induction on the size
of the subset. We prove F is true of the empty set and that it is true of a
nonempty subset U of T if it is true of some proper subsets of U . The set
S of the theorem is P(T ) and U ≻ V is defined to be true iff V is a proper
subset of U .
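Well-founded recursion is the computational twin of well-founded induction: a recursive definition terminates if every recursive call is on a ≻-smaller value. A Python sketch (an added illustration; num_subsets is a hypothetical name):

```python
# Recursion on the proper-subset relation, which is well-founded on
# finite sets: each call is on a strictly smaller subset, so it terminates.
def num_subsets(U):
    if not U:
        return 1                      # the empty set's only subset is itself
    x = next(iter(U))
    smaller = U - {x}                 # a proper subset of U
    return 2 * num_subsets(smaller)   # subsets containing x, and those not

print(num_subsets(frozenset({1, 2, 3})))   # 8 = 2**3
```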
2.2.6.3 Collections
ZF has mind-bogglingly big sets. Not only is P(R) bigger than R, and
P(P(R)) bigger than P(R), and P(P(P(R))) bigger than . . . , but the
union of all these huge sets is also a set. However, there are collections of
sets that are too big to be a set. The biggest such collection is the collection
of all sets, which we call V.
If we used rules that were developed for reasoning about sets to deduce a
result about this collection, we could in theory obtain an invalid result.
this book is, from a mathematical viewpoint, so simple that such a mistake
is unlikely. (The one conceivable exception is in Appendix Section A.5.)
The ability to make V a ZF set means that even if we did deduce something
that was incorrect because V is not a set, then a counterexample could not
arise in practice.
   |= ∀ x ∈ R : (x ≠ 0) ⇒ (x ∗ (1/x) = 1)
Substituting 0 for x in the formula (x ≠ 0) ⇒ (x ∗ (1/x) = 1) yields:
(2.23)   (0 ≠ 0) ⇒ (0 ∗ (1/0) = 1)
This is a true formula containing the expression 1/0 that children aren’t
supposed to write.
So, what does 1/0 equal? Some think that it should be an evil value
that can corrupt expressions in which it appears, causing them to equal
that evil value. This leads to a three-valued logic, in which the value of a
predicate can equal not only true or false but also the evil value. This is
unnecessarily complicated.
The simple answer to “What does 1/0 equal?” is “We don’t know.” It’s
a value, but our definition of “/” doesn’t tell us what value. It might equal
√2; it might equal the set N. Formula (2.23) is true because 0 ≠ 0 equals
false, and false ⇒ P equals true for any predicate P. A formula like
1/0 is meaningless, meaning that all we know about it is that it’s a value.
Meaningless expressions don’t concern us because we shouldn’t write
them. Writing x ∪ 1 when we meant to write x + 1 is an easy error to detect.
It will be found with any method that can find subtle errors in abstract
Since there is no real number q for which q ∗ 0 = 1, this defines 1/0 to equal
choose q ∈ R : false, which is a meaningless expression. However, 2/0
equals this same meaningless expression, so this definition implies 1/0 = 2/0.
I don’t like that, so I prefer a definition such as:
   r/s ≜ if s ≠ 0 then choose q ∈ R : q ∗ s = r
                   else {r} + {s}
This tells nothing about r /0, except that the Substitution Rule implies that
it equals u/0 if r = u.
2.3 Functions
2.3.1 Functions as Mappings
Mathematicians usually define a pair ⟨x, y⟩ to be a set (usually the set
{{x}, {x, y}}) and define a function f to be a set of ordered pairs, where the
pair ⟨x, y⟩ in f means that f(x) equals y. But they seldom use that definition
to define a particular function, instead defining the squaring function sq on
real numbers by writing something like:
   sq ≜ (x ∈ R ↦ x²)
where, for a set D, the expression (v ∈ D ↦ exp) denotes the function that assigns to each element v of D the value of the expression exp.
The set D is called the domain of the function, and the domain of a function
f is written domain(f ). For a function f , the value of f (e) is specified only
if e is in its domain. Two functions f and g are equal iff they have the
same domain and f (v ) = g(v ) for all v in their domain. More precisely, the
function constructor satisfies this rule:
(2.24)   |= ((v ∈ D ↦ exp_1) = (v ∈ E ↦ exp_2)) ≡
              (D = E) ∧ (∀ v ∈ D : exp_1 = exp_2)
We define D → S to be the set of all functions f with domain D such that
f (x ) ∈ S for all x ∈ D. A value f is a function with domain D iff f equals
v ∈ D 7→ f (v ). We can therefore define the set D → S by:
   |= f ∈ (D → S) ≡ ∧ f = (v ∈ D ↦ f(v))
                    ∧ ∀ v ∈ D : f(v) ∈ S
(Like all our definitions of set-forming operators, this asserts that D → S is a
set if D and S are sets.) It follows from (2.24) that there is a unique function
whose domain is the empty set. That function can be written v ∈ {} ↦ 42,
where we can replace 42 with any value.
An array in modern coding languages is described mathematically as
a function, where the expression f [x ] in the language means f (x ). For a
variable f whose value is an array/function, assigning the value 4.2 to f [14]
changes the value of f to a new array/function that we can write as
   x ∈ domain(f) ↦ if x = 14 then 4.2 else f(x)
Mathematicians have little need to write such a function, but it occurs of-
ten when math is used to describe programs, so we need a more compact
notation for it. We write it like this:
   f except 14 ↦ 4.2
This notation has been used above when f is an interpretation Υ. An
interpretation is a function whose domain is the set of all variables.
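When math like this is turned into code, a function with a finite domain is naturally modeled as a dictionary. Here is a minimal Python sketch of the function constructor and the except operator; the names fn and excep are ours:

def fn(domain, exp):
    # The function (v ∈ domain ↦ exp(v)), modeled as a dict.
    return {v: exp(v) for v in domain}

def excep(f, x, val):
    # f except x ↦ val: a new function agreeing with f elsewhere.
    g = dict(f)
    g[x] = val
    return g

f = fn(range(20), lambda v: 0.0)   # v ∈ 0..19 ↦ 0.0
g = excep(f, 14, 4.2)
print(f[14], g[14])                # 0.0 4.2 -- f itself is unchanged

By rule (2.24), two such functions are equal iff they have the same domain and assign equal values, which is exactly Python’s == on dictionaries.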
We have defined what are usually called functions of a single argument.
Mathematicians also define functions of multiple arguments. For example,
+ can be considered to be a function of two arguments, where x + y is an
abbreviation of +(x , y). A function with n arguments can be considered to
be a function whose domain is a set of n-tuples, so f(x, y) is an abbreviation
of f(⟨x, y⟩).
A function is a special kind of mapping that is a value. A mapping
can be defined to assign a specified value to elements of any collection. For
Ordinal These are lists whose items are naturally named with the ordinal
numbers first, second, third, etc. For example, in a list of people
waiting to be served, the second person to be served is naturally named
person number 2.
I think mathematicians call such lists sequences, so we will call them ordinal
and cardinal sequences.
In most of today’s coding languages, array elements are numbered start-
ing with 0. They make it convenient to describe cardinal sequences and less
convenient to describe ordinal sequences. When I started describing abstract
concurrent programs with mathematical formulas, I discovered that the for-
mulas were usually simpler if I described finite lists as ordinal sequences.
However, the meaning and properties of the formulas are defined in terms
of infinite sequences and finite prefixes of those sequences; and the math is
simpler if those are cardinal sequences. So, we use both kinds of sequences.
The obvious way to represent a sequence mathematically is as a function
whose domain is the set of numbers of the sequence’s items. An ordinal
sequence of length n (one containing n items) is a function whose domain is
1 . . n, the set of all integers from 1 through n; a cardinal sequence of length
n is one whose domain is 0 . . (n − 1), and an infinite cardinal sequence is
one whose domain is N.
We define Seq(S ) to be the set of finite ordinal sequences whose items are
elements of the set S :
(2.25)   Seq(S) ≜ ∪ { (1 . . n → S) : n ∈ N }
The empty sequence ⟨ ⟩ has domain 1 . . 0, which is the empty set. It’s a
simple way to write the (unique) function whose domain is the empty set.
(It is the one sequence that is both an ordinal and a cardinal sequence.)
We now define some operators on both ordinal and cardinal sequences.
To make it easier to write definitions that apply to both kinds of sequence,
for a nonempty sequence s we define 1st (s) to equal 1 if s is an ordinal
sequence and 0 if it is a cardinal sequence:
   1st(s) ≜ if 0 ∈ domain(s) then 0 else 1
Head(s)  The first item of a nonempty sequence s. It equals s(1st(s)).
Tail(s)  The remainder of the nonempty sequence s after its first item is
removed. It equals:
   i ∈ { j − 1 : j ∈ (domain(s) \ {1st(s)}) } ↦ s(i + 1)
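For finite sequences, these definitions translate directly into code. Here is a minimal Python sketch, modeling a finite sequence as a dictionary from its (ordinal or cardinal) index set to its items; the function names are ours:

def first_index(s):
    # 1st(s): 0 for a cardinal sequence, 1 for an ordinal one.
    return 0 if 0 in s else 1

def head(s):
    return s[first_index(s)]

def tail(s):
    # i ∈ {j-1 : j ∈ domain(s) \ {1st(s)}} ↦ s(i+1), as a dict.
    return {j - 1: s[j] for j in s if j != first_index(s)}

ordinal = {1: 'a', 2: 'b', 3: 'c'}      # the ordinal sequence ⟨a, b, c⟩
print(head(ordinal), tail(ordinal))     # a {1: 'b', 2: 'c'}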
2.4 Definitions
Definitions are omnipresent in mathematics. Despite their importance,
mathematicians seem to give little thought to what definitions actually
mean. What they mean is a practical concern because, written without
any new definitions, the formula that describes a typical abstract program
would be hundreds of lines long. We can understand such a large formula
only by using definitions to decompose it into understandable pieces. A
precise understanding of the abstract programs we write requires a precise
understanding of what those definitions mean.
To understand a defined symbol, we need to understand what the symbol
is defined in terms of. If we expand all definitions used in its definition, we
obtain a formula containing only variables and operators that we assume
to be understood by the readers of the formula. All those operators are
mappings on collections of values or of tuples (usually pairs) of values. The
operators of arithmetic can be viewed as functions whose domains are sets of
numbers or pairs of numbers. Operators like ∈, ∪, and domain are mappings
on values/sets or pairs of values/sets. Operator just means mapping.
A definition assigning a meaning to a symbol F has the form:
(2.26)   F(v_1, . . . , v_n) ≜ exp
where exp is an expression. The v_i are called parameters, but they are
essentially bound variables whose scope is the expression exp. Like variable
declarations, definitions have scopes. And as with non-bound variables, the
scopes of ordinary definitions are usually implicit.
We don’t allow circular definitions. That is, it must be possible to order
all definitions so that in (2.26), the definitions of all defined symbols that
occur in exp precede this definition of F . For an ordinary definition, F
cannot appear in exp. In Section 2.4.2, we describe recursive definitions in
which F can appear in exp. But we never allow exp to contain a symbol
whose definition follows the definition of F . (We have no need for mutual
recursion, in which several symbols can be defined in terms of one another.)
We first consider definitions with no parameters, in which we omit the
parentheses and just write F ≜ exp.
For example, the higher-order definition Double(Op, v) ≜ Op(Op(v))
defines Double(P, {a, b}) to equal P(P({a, b})). (This set contains 16 ele-
ments, one of which is {{}, {b}}.) Definitions of higher-order mappings are
used only in Appendix Section A.2, where their meanings should be obvious.
They are therefore not discussed further here.
Definitions with parameters provide another source of variable capture
if a parameter occurs inside the scope of a bound variable in the definition.
For example, if the mapping F is defined by F(v) ≜ ∃ y ∈ R : y > v, then
F(y + 1) does not equal ∃ y ∈ R : y > y + 1. This kind of variable capture can
be made impossible by requiring that if a definition parameter occurs within
the scope of a bound variable in the definition, then a variable with the same
name as that bound variable cannot appear in the scope of the definition.
In practice, definition parameters seldom occur in the scope of a bound
variable. I believe it happens three times in this book.
See Appendix Section A.2 for how to prove such an assertion. Most of the
time, it’s obvious that the definition defines what we expect it to, in which
case step 4.2 and its proof can be omitted in a hand proof.
Chapter 3

Describing Abstract Programs with Math
3.1 The Behavior of Physical Systems
Programs are meant to be executed on physical computers. I have been
guided by the principle that any statement I make about a program should
be understandable as a statement about its execution on one or more com-
puters. I believe this was the principle that guided Turing in defining the
Turing machine as an abstraction of a physical computing device.
The description of our science of concurrent programs begins by exam-
ining the physics of computing devices. We don’t care about the actual
details of how transistors and digital circuits work. We are just interested
in how scientists describe physical systems. As a simple example, we look
at a planet orbiting a star the way an astronomer might.
We consider the one-planet system’s behavior starting at some time t_0,
after the star and planet have been formed and the planet has settled into
its current orbit. Let R≥ be the set {r ∈ R : r ≥ t_0} of all real numbers r
with r ≥ t_0. The behavior of the one-planet system is described by its state
at each instant of time. We assume the star is much more massive than the
planet, so we can assume that it doesn’t move. We also assume that there
are no other objects massive enough to influence the orbit of the planet, so
the state of the system is described by the values of six state variables: three
describing the three spatial coordinates of the planet’s position and three
describing the direction and magnitude of its momentum. Let’s call those
state variables v_1, . . . , v_6; we won’t worry about which of the six values
each represents. The quantities these variables represent change with time,
so the value of each variable v_i is a function, where v_i(t) represents the value
at time t. The behavior of the system is described mathematically by the
function σ with domain R≥ such that σ(t) is the tuple ⟨v_1(t), . . . , v_6(t)⟩ of
numbers, for every t ∈ R≥. Physicists call σ(t) the state of the system at
time t.
In this description, the planet is modeled as a point mass. Real plan-
ets are more complicated, composed of things like mountains, oceans, and
atmospheres. For simplicity, the model ignores those details. This limits
the model’s usefulness. For example, it’s no good for predicting a planet’s
weather. But models of planets as point masses are sometimes used to plan
the trajectory of a real spacecraft. It’s also not quite correct to say that the
model ignores details like mountains and oceans. The mass of the model’s
point mass is the total mass of the planet, including its mountains and
oceans, and its position is the planet’s center of mass. The model abstracts
those details, it doesn’t ignore them.
The laws that determine the point-mass planet’s behavior σ are ex-
pressed by six differential equations of this form:
(3.1)   dv_i/dt (t) = f_i(t)
where t ∈ R≥ and each f_i is a function with domain R≥ such that f_i(t)
is a formula containing the expressions v_1(t), . . . , v_6(t). Don’t worry if you
haven’t studied calculus and don’t know what equation (3.1) means. All
you need to know is that it asserts the following approximate equality for
small non-negative values of dt:
   v_i(t + dt) ≈ v_i(t) + f_i(t) ∗ dt
and the approximation gets better as dt gets smaller, reaching equality when
dt = 0. The differential equations (3.1) have the property that for any time
t > t_0 and any time r > t, the values of the six numbers v_i(t) and the
functions f_i completely determine the six values v_i(r) and hence the value
of σ(r). That is, the equations imply:
History Independence For any time t ∈ R≥ , the state σ(r ) of the system
at any time r > t depends only on its state σ(t) at time t, not on
anything that happened before time t.
by discretely changing variables whose values are not just bits but may be
any data structure provided by the language—for example, 128-bit integers.
An abstract program is the same, except the value of a variable may be
any value—for example, a real number such as √2, not just a finite-precision
approximation like 1.414213562. Modeling a science of programs on the
science of physical systems ensures that it can address real problems, and
we are not just creating a science of angels dancing on the head of a pin.
(However, the science should be able to describe any discretely behaving
angels, wherever they might be dancing.)
We are seldom interested in the actual times t_j at which state variables
can change. To simplify things, we consider only the sequence of states
through which the system passes, ignoring the times at which it enters and
leaves those states. We call the state created at time t_j state number j.
Instead of letting a state variable v be a function that assumes the value
v (t) at time t, we consider it to be a function that assumes the value v (j ) in
state number j . In other words, the value of a state variable v is a sequence
of values. A behavior σ of a program is also a sequence, where σ(j ), its state
number j , describes the values of the device’s variables in that state.
If a program or a digital device runs forever, then the sequence of times
t_j is infinite and therefore so is the sequence σ of its states. But if a
program terminates, then those sequences can be finite. Other than parallel
programs, in which concurrency is added to a traditional program so it can
run faster by using multiple processors, most concurrent programs are not
supposed to stop. A concrete concurrent program will not really run for-
ever, but we describe it as running forever for the same reason there are an
infinite number of integers even though we only use a finite number of them:
it makes things simpler.
Still, some concurrent programs are supposed to stop, so we have to
describe them. For simplicity, we describe those programs as well with
infinite state sequences. Exceptionally observant readers will have noticed
that while the times t_j had to be chosen so we can pretend that the state
changes only at those times, we did not require that the state had to change
at each of those times. There can be times t_j at which none of the program
variables’ values change. In particular, if the program stops, we can add an
infinite number of times t_j after it has stopped. This leads to an infinite
sequence of states such that, for some k, the values of the program’s variables
after state number k are the same. We call a pair ⟨σ(j), σ(j + 1)⟩ of successive
states in a behavior σ a step of σ. A step in which the values of the program’s
variables do not change is called a stuttering step of the program.
We call a behavior ending in infinitely many stuttering steps a halting
behavior.
variables x = 1, y = 1;
while true do
   a: x := x + y + 2;
      y := y + 2
end while
(3.3)   (x(0) = 1) ∧ (y(0) = 1)
(3.4) ∀ j ∈ N : ∧ x (j + 1) = x (j ) + y(j ) + 2
∧ y(j + 1) = y(j ) + 2
We call (3.3) the initial predicate. It determines the initial state. Formula
(3.4) is called the step predicate. It’s the discrete analog of the differential
equations (3.1) that describe the orbiting planet. Instead of describing how
the values of the variables change in the continuous behavior when time
increases by the infinitesimal amount dt, the step predicate (3.4) describes
how they change when the state number of the discrete behavior increases
by one.
DESCRIBING ABSTRACT PROGRAMS 82
You can check that (3.3) and (3.4) define a behavior that begins as
follows where, for example, [x :: 16, y :: 7]_3 indicates that state number 3
assigns the values 16 to x and 7 to y, and the arrows are purely decorative.
" # " # " # " # " #
x :: 1 x :: 4 x :: 9 x :: 16 x :: 25
→ → → → → ···
y :: 1 y :: 3 y :: 5 y :: 7 y :: 9
0 1 2 3 4
These first few states of the behavior suggest that in the complete behavior,
x and y equal the following functions:
(3.5)   x = (j ∈ N ↦ (j + 1)²)
        y = (j ∈ N ↦ 2 ∗ j + 1)
To prove that (3.3) and (3.4) imply (3.5), we must prove they imply:
(3.6)   ∀ j ∈ N : (x(j) = (j + 1)²) ∧ (y(j) = 2 ∗ j + 1)
A proof by mathematical induction that (3.3) and (3.4) imply (3.6) is a nice
exercise in algebraic calculation.
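Though the proof is left as an exercise, checking (3.6) on a finite prefix of the behavior takes only a few lines of Python; the bound 1000 is arbitrary:

# Simulate the behavior defined by the initial predicate (3.3) and
# step predicate (3.4), and check the solution (3.5)/(3.6) on a
# finite prefix.  (A real behavior is infinite; we stop at 1000.)
x, y = 1, 1                      # (3.3)
for j in range(1000):
    assert x == (j + 1) ** 2 and y == 2 * j + 1   # (3.6)
    x, y = x + y + 2, y + 2      # (3.4)
print("(3.6) holds for the first 1000 states")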
We can think of (3.5) as the solution of (3.3) and (3.4), just as the for-
mulas describing the position and momentum of the planet at each time t
are solutions of the differential equations (3.1). It is mathematically im-
possible to find solutions to the differential equations describing arbitrary
multi-planet systems. It is mathematically possible to write explicit descrip-
tions of variables as functions of the state number like (3.5) for the abstract
programs written in practice, but those descriptions are almost always much
too complicated to be of any use. Instead, we reason about the initial pred-
icate and the step predicate, though in Section 3.4.1 we’ll see how to write
them in a more convenient way.
The interesting thing about program Sqrs is that the sequence of values
assumed by x in an execution of the program is the sequence of all posi-
tive integers that are perfect squares, and this is accomplished using only
addition. This is obvious from (3.5), but for nontrivial examples we won’t
have such an explicit description of each state of a behavior. Remember
that history independence implies that, at any point in a behavior, what
the program does in the future depends only on its current state. What is
it about the current state that ensures that if x is a perfect square in that
state, then it will equal all greater perfect squares in the future? There is
a large body of work on reasoning about traditional programs, initiated by
Robert Floyd in 1967 [14], that shows how to answer this question. If you’re
familiar with that work, the answer may seem obvious. If not, it may seem
like it was pulled out of a magician’s hat. Obvious or magic, the answer is
that the following formula is true for every state number j in the behavior
of Sqrs:
(3.7)   ∧ (x(j) ∈ N) ∧ (y(j) ∈ N)
        ∧ y(j) % 2 = 1
        ∧ x(j) = ((y(j) + 1)/2)²
This formula implies that x (j ) is a perfect square, since the first two con-
juncts imply that y(j ) is an odd natural number. Moreover, since y(j + 1) =
y(j ) + 2, the last conjunct implies that x (j + 1) is the next larger perfect
square after x (j ). So, the truth of (3.7) for every state number j explains
why the algorithm sets x to all perfect squares in increasing order.
A predicate like (3.7) that is true for every state number j of a behavior
is called an invariant of the behavior. By mathematical induction, we can
prove that a predicate is an invariant by proving these two conditions:
I1. The predicate is true for j = 0.
I2. For any k ∈ N, if the predicate is true for j = k then it’s true for
j = k + 1.
For (3.7), I1 follows from the initial predicate (3.3), and I2 follows from the
step predicate (3.4). (You should have no trouble writing the proof if you’re
used to writing proofs; otherwise, it might be challenging.)
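The content of I2 is worth internalizing: it quantifies over every state satisfying the predicate, not just the reachable ones. Here is a brute-force Python sketch of I1 and I2 for (3.7), sampling states up to an arbitrary bound; this is a sanity check, not a proof:

def inv(x, y):                       # the inductive invariant (3.7)
    return (isinstance(x, int) and isinstance(y, int) and
            x >= 0 and y >= 0 and
            y % 2 == 1 and x == ((y + 1) // 2) ** 2)

# I1: the initial state (3.3) satisfies the invariant.
assert inv(1, 1)

# I2: any state satisfying the invariant -- not just a reachable
# one -- is taken by the step predicate (3.4) to a state that
# satisfies it.  We sample all such states with y < 2001.
for y in range(1, 2001, 2):
    x = ((y + 1) // 2) ** 2
    assert inv(x + y + 2, y + 2)
print("I1 and I2 hold on all sampled states")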
A predicate that can be proved to be an invariant by proving I1 from an
initial predicate and I2 from a step predicate is called an inductive invariant.
Model checkers can check whether a state predicate is an invariant of small
instances of an abstract program. But the only way to prove it is an invariant
is to prove that it either is or is implied by an inductive invariant. For
any invariant P , there is an inductive invariant that implies P . However,
writing an inductive invariant for which we can prove I1 and I2 is a skill
that can be acquired only with practice. Tools to find it for you have been
developed [15, 38], but I don’t know how well they would work on industrial
examples.
The first conjunct of the invariant (3.7) asserts the two invariants
x (j ) ∈ N and y(j ) ∈ N. An invariant of the form v (j ) ∈ S for a variable v
is called a type invariant for v . An inductive invariant almost always must
imply a type invariant for each of its variables. For example, without the
hypotheses that x (j ) and y(j ) are numbers, we can deduce nothing about
the values of x (j + 1) and y(j + 1) from the step predicate (3.4).
variables x = 1, y = 1, pc = a;
while true do
   a: x := x + y + 2;
   b: y := y + 2
end while
Step Predicate
   ∀ j ∈ N : if pc(j) = a
               then ∧ x(j + 1) = x(j) + y(j) + 2
                    ∧ y(j + 1) = y(j)
                    ∧ pc(j + 1) = b
               else ∧ x(j + 1) = x(j)
                    ∧ y(j + 1) = y(j) + 2
                    ∧ pc(j + 1) = a
When they see this step predicate, most programmers and many computer
scientists think that the conjuncts y(j + 1) = y(j ) and x (j + 1) = x (j ) are
unnecessary. They think that not saying what the new value of a variable
equals should mean that it equals its previous value. But if that were the
case, then what we wrote wouldn’t be math. We would be giving up the
benefits of centuries of mathematical development—the benefits that are
the reason science is based on math. An essential aspect of math is that
a formula means exactly what it says—nothing more and nothing less. If
the step predicate didn’t say what y(j + 1) equals when pc(j) = a is true,
then there would be no more reason for it to equal y(j) than for it to equal
i ∈ N ↦ √−42.
You may find it discouraging that the mathematical description of
FGSqrs is more complicated than its pseudocode in Figure 3.2. Please
be patient. You will see in Section 3.4.1 how a little notation can simplify
it. We can always write an abstract program more compactly in pseudocode
than in math, as long as we don’t have to explain precisely what the pseu-
docode means. But science is precise, and a science of abstract programs
must explain exactly what they mean. Moreover, tools can’t check an im-
precise description of a program. Math is the simplest way to explain things
precisely.
PlusCal is a precise language for describing abstract programs in what
looks like pseudocode. (However, it’s infinitely more expressive than ordi-
nary pseudocode because its expressions can be any mathematical expres-
sions—even uncomputable ones.) A PlusCal program is translated to a
mathematical description of the program in TLA+. I often find it easier to
write an abstract program in PlusCal than directly in TLA+. However, I
reason about the TLA+ translation, not the PlusCal code. And for many
3.3 Nondeterminism
The laws of classical physics, such as the laws of planetary motion, are
deterministic. Given the initial values of all the variables, their values at any
later time are completely determined. Causes of nondeterminism are either
negligible because they have an insignificant effect—for example, meteor
showers—or are simply assumed not to happen—for example, cataclysmic
collisions with errant asteroids.
A program is nondeterministic if the initial state of a behavior doesn’t
determine the complete behavior. Nondeterminism is the norm in programs—
especially concurrent ones—even when they are executed on supposedly
deterministic digital systems. Here are some sources of nondeterminism in
programs:
User Input The user giving a value to the program is usually described as
an action of the program that nondeterministically chooses the value
provided by the user. The user can also be described as a separate
process that nondeterministically chooses the value to provide.
Faults Physical devices don’t always behave the way they’re supposed to.
In particular, they can fail in various ways. Programs that tolerate
failures describe a failure as an operation that may or may not be
executed.
variables x = 0;
process p ∈ Procs
   variables t = 0, pc = a;
   a: t := x;
   b: x := t + 1
end process
Figure 3.3: The Increment abstract program for a set Procs of processes.
but for convenience we call its elements processes.) The only assumption
we make about this set is that it is finite and nonempty. The process
statement declares that there is a process for every element of Procs, and
it gives the code for an arbitrary process p in Procs. The variables t and
pc are local to process p, each process having its own copy of these two
variables. Variable x is global, accessed by all the processes. Process p
saves the result of reading x in its variable t. The initial value of t doesn’t
matter, but letting all variables have reasonable initial values makes a type
invariant simpler, so we let t initially equal 0.
The mathematical description of the abstract program Increment is in
Figure 3.4. The process-local variables t and pc are represented by mathe-
matical variables whose values in each state are functions with domain Procs,
where t(p) and pc(p) are the values of those variables for process p. The
initial predicate, describing the values of the variables in state number 0,
is simple. The possible steps in a behavior are described by a predicate
that, for each j , gives the values of x (j + 1), t(j + 1), and pc(j + 1) for
any assignment of values to x (j ), t(j ), and pc(j ). It asserts that there are
two possibilities, described by formulas PgmStep(j ) and Stutter (j ), that are
explained below.
PgmStep(j ) describes the possible result of some process executing one step
starting in state j . The predicate equals true iff there exists a process
p for which aStep(p, j ) or bStep(p, j ) is true, where:
Initial Predicate
   ∧ x(0) = 0
   ∧ t(0) = (p ∈ Procs ↦ 0)
   ∧ pc(0) = (p ∈ Procs ↦ a)
Step Predicate
   ∀ j ∈ N : PgmStep(j) ∨ Stutter(j)
where
   PgmStep(j) ≜ ∃ p ∈ Procs : aStep(p, j) ∨ bStep(p, j)
   aStep(p, j) ≜ ∧ pc(j)(p) = a
                 ∧ x(j + 1) = x(j)
                 ∧ t(j + 1) = (t(j) except p ↦ x(j))
                 ∧ pc(j + 1) = (pc(j) except p ↦ b)
   bStep(p, j) ≜ ∧ pc(j)(p) = b
                 ∧ x(j + 1) = t(j)(p) + 1
                 ∧ t(j + 1) = t(j)
                 ∧ pc(j + 1) = (pc(j) except p ↦ done)
   Stutter(j) ≜ ∧ ∀ p ∈ Procs : pc(j)(p) = done
                ∧ ⟨x(j + 1), t(j + 1), pc(j + 1)⟩ = ⟨x(j), t(j), pc(j)⟩
• t(j)(p) ∈ N
Initial Predicate
   Init ≜ ∧ x = 0
          ∧ t = (p ∈ Procs ↦ 0)
          ∧ pc = (p ∈ Procs ↦ a)
Step Predicate
   Next ≜ PgmStep ∨ Stutter
where
   PgmStep ≜ ∃ p ∈ Procs : aStep(p) ∨ bStep(p)
   aStep(p) ≜ ∧ pc(p) = a
              ∧ x′ = x
              ∧ t′ = (t except p ↦ x)
              ∧ pc′ = (pc except p ↦ b)
   bStep(p) ≜ ∧ pc(p) = b
              ∧ x′ = t(p) + 1
              ∧ t′ = t
              ∧ pc′ = (pc except p ↦ done)
   Stutter ≜ ∧ ∀ p ∈ Procs : pc(p) = done
             ∧ ⟨x′, t′, pc′⟩ = ⟨x, t, pc⟩
Inductive Invariant
   Inv ≜ ∧ TypeOK
         ∧ ∀ p ∈ Procs : (pc(p) = b) ⇒ (t(p) ≤ NumberDone)
         ∧ x ≤ NumberDone
Figure 3.5: Abstract program Increment and its invariant Inv in simpler
math.
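To see the nondeterminism in action, here is a minimal Python sketch that runs Increment of Figure 3.5 under randomly chosen interleavings; the seed and the choice of five processes are arbitrary:

import random

# Simulate program Increment, choosing nondeterministically which
# enabled process steps next.  Names follow Figure 3.5.
def run_increment(procs, rng):
    x = 0
    t = {p: 0 for p in procs}
    pc = {p: 'a' for p in procs}
    while any(pc[p] != 'done' for p in procs):
        p = rng.choice([q for q in procs if pc[q] != 'done'])
        if pc[p] == 'a':            # aStep(p): read x into t(p)
            t[p], pc[p] = x, 'b'
        else:                       # bStep(p): write t(p)+1 to x
            x, pc[p] = t[p] + 1, 'done'
    return x

rng = random.Random(42)
results = {run_increment(range(5), rng) for _ in range(1000)}
print(sorted(results))   # values < 5 show the classic lost update

With five processes the final value of x can be anything from 1 to 5; a final value below 5 arises when a process writes t(p) + 1 after x has already grown past it.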
variables like Procs, whose values are the same in every state of a behavior,
and program variables like x that are implicit functions of the state. Pro-
gram variables like x look weird to mathematicians. In math, the value of a
variable x is fixed. We’ve seen in Chapter 2 that when a mathematician does
something else and introduces a variable x , it’s really a completely different
variable that happens also to be written “x ”. Of course, you’re familiar with
program variables because they’re the variables of coding languages, whose
values change in the course of a computation.
Since this book is about a science of programs, we will henceforth use the
name variable for program variables. Mathematical variables like Procs will
be called constants. When describing a program mathematically, variables
correspond to what we normally think of as program variables. Constants
are parameters of the program, such as a fixed set of processes. Early coding
languages had constants as well as variables. In modern coding languages,
constants are buried in the code, where they are called static final variables
of an object.
In this book, the variables in pseudocode are explicitly declared, and un-
declared identifiers like Procs are constants. For formulas, the text indicates
which identifiers are variables and which are constants.
In addition to having both variables and constants, the formulas in Fig-
ure 3.5 have primed variables, like x′. An expression that may contain
primed and unprimed variables, constants, and the operators and values of
ordinary math (which means everything described in Chapter 2) is called a
step expression. A Boolean-valued step expression is called an action. The
math whose formulas are actions is called the Logic of Actions, or LA for
short.
For an unprimed variable v, we define [[v]](s → t) to equal s(v), the value
assigned to variable v by state s. For a primed variable v′, we define
[[v′]](s → t) to equal t(v).
We call an LA expression a step expression and an LA formula an action.
For an action A and step s → t, we say that s → t satisfies A or is an A
step iff [[A]](s → t) equals true.
A state expression is an LA expression that contains no primed vari-
ables, and a state formula is a Boolean-valued state expression. For a state
expression exp, the value of [[exp]](s → t) depends only on s, so we can write
it as [[exp]](s).
Because the meaning of an LA expression assigns different values to v
and v′, we can treat v and v′ as two unrelated variables. This means that
we can reason about LA formulas as if constants, unprimed variables, and
primed variables were all different mathematical variables. Thus, for LA as
defined so far, we can regard LA as ordinary math with some mathematical
variables having names like v′ ending with ′.
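It can help to prototype this semantics. Here is a minimal Python sketch, with a state modeled as a dictionary assigning values to (the relevant) variables and an action as a Boolean function of a step; the encoding is ours:

# A step s → t is a pair of states; a state assigns a value to
# every variable (here: just the variables we care about).
# An action is then a Boolean function of the pair.
def b_step(p):
    # bStep(p) of Figure 3.5 as a predicate on steps.
    def action(s, t):
        return (s['pc'][p] == 'b'
                and t['x'] == s['t'][p] + 1
                and t['t'] == s['t']
                and t['pc'] == {**s['pc'], p: 'done'})
    return action

s = {'x': 0, 't': {0: 0}, 'pc': {0: 'b'}}
t = {'x': 1, 't': {0: 0}, 'pc': {0: 'done'}}
print(b_step(0)(s, t))   # True: s → t is a bStep(0) step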
I1. The program’s initial predicate implies that Inv is true in the first state
of a behavior.
I2. If Inv is true in a state, then the program’s next-state predicate implies
that it is true in the next state.
For program Increment, whose initial predicate is Init and whose next-state
action is Next, these two conditions can be expressed in LA as:
(3.10)   |= Init ⇒ Inv
         |= Inv ∧ Next ⇒ Inv′
The proof of these conditions for program Increment is discussed in Ap-
pendix Section B.1.
Thus far, the correctness properties of programs that have concerned us
have been invariance properties. All the reasoning we have done to verify
that a program satisfies an invariance property is naturally expressed in
LA. The safety property usually proved of a traditional program is that it
cannot produce a wrong answer—which is expressed as the invariance of
the property asserting that the program has not terminated with a wrong
answer. The most popular way of proving such a property is Hoare logic [21].
Appendix Section A.4 explains Hoare logic and its relation to the Logic of
Actions.
The TLA in RTLA stands for Temporal Logic of Actions. The R stands
for Raw, in the sense of unrefined. We’ll see later that RTLA allows us
to write formulas that we shouldn’t write. TLA is the logic obtained by
restricting RTLA to make it impossible to write those formulas. But that’s
a complication we don’t need to worry about now, so we’ll start with the
simpler “raw” logic.
In temporal logic formulas, the operator □ binds more tightly than the
operators of propositional logic. For example, □F ∨ G is parsed as (□F) ∨ G.
From now on, [[F]] means [[F]]_RTLA for all RTLA formulas, including actions.
We will explicitly write [[A]]_LA to denote the meaning of A as an LA formula.
For an action A, we define □A to be the temporal formula that is true
of a behavior iff A is true of all steps of the behavior. In other words, we
define the meaning [[□A]] of the RTLA formula □A by:
(3.12)   [[□A]](σ) ≜ ∀ n ∈ N : [[A]]_LA(σ(n) → σ(n + 1))
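Definition (3.12) is easy to prototype for a finite prefix of a behavior; of course, a finite check can refute □A but never establish it. A minimal Python sketch:

# [[□A]](σ) restricted to a finite prefix of a behavior: every
# step of the prefix satisfies the action A.
def box(action, prefix):
    return all(action(prefix[n], prefix[n + 1])
               for n in range(len(prefix) - 1))

# The steps of the Sqrs behavior satisfy its step predicate (3.4):
step = lambda s, t: t['x'] == s['x'] + s['y'] + 2 and t['y'] == s['y'] + 2
sigma = [{'x': (j + 1) ** 2, 'y': 2 * j + 1} for j in range(100)]
print(box(step, sigma))   # True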
Like most logics, RTLA contains the operators of propositional and predicate
logic, where they have their standard meanings. For example, [[F ∧ G]](σ)
equals [[F ]](σ)∧[[G]](σ), and [[∃ i : F ]](σ) equals ∃ i : [[F ]](σ). As in LA, bound
identifiers are called bound constants and they act like constants, having the
same value in all states of a behavior. Bounded quantification is defined
Remember that in (3.10) and (3.13), when Init, Next, and Inv are the formu-
las defined in Figure 3.5, |= F means that F is true for all interpretations
satisfying the assumptions we made about the constants of Increment—
namely, that Procs is a finite set and the implicit assumption that the values
of a, b, and done are different from one another.
In general, the conditions I1 and I2 for showing that a state predicate Inv
is an invariant of a program Init ∧ □Next are expressed in LA by conditions
(3.10). It is an RTLA proof rule that these conditions imply (3.13). When we
prove a safety property like (3.13), the major part of the reasoning depends
on the definitions of the formulas Init, Next, and Inv . That reasoning is
reasoning about actions, which is formalized by LA. The temporal logic
reasoning, which is done in RTLA, is trivial. Describing the program with a
single formula is elegant. But it is really useful only when verifying liveness
properties, which requires nontrivial temporal reasoning.
Because it’s often forgotten, it is worth repeating that a state is any
assignment of values to variables, and a behavior is any infinite sequence of
states. Even when we are discussing program Increment, “state” means any
state, including states in which x has the value ⟨√2, 1 . . 147⟩. In (3.13), |=
means true for any behavior, even behaviors in which the initial value of x
is ⟨√2, 1 . . 147⟩. It is true for those behaviors because Init equals false for
them (unless ⟨√2, 1 . . 147⟩ happens to equal 0, which it might).
about the present and future. The state σ(n) of a behavior σ is the state at
time n, and the behavior σ^{+n} is the part of the behavior σ that begins at
time n. We can then think of [[F]](σ^{+n}) as asserting that F is true at time
n of behavior σ. Thus, [[F]](σ) asserts that F is true at time 0 of σ, and
[[□F]](σ) asserts that F is true at all times of σ. The formula □F therefore
asserts that F is true at all times—that is, F is always true. (Remember
that time n is just some instant of time; it is not n time units after time 0.)
We now drop the [[ ]] and think about temporal logic the same way we
think about ordinary math, conflating a formula with its meaning. So, we’ll
think of a temporal formula F as a Boolean-valued function on behaviors.
However, we will still turn to the formal meaning (3.15) of □ when it is
useful. Sections 3.4.2.3–3.4.2.8 below examine □ and temporal operators
defined in terms of □. They present quite a few tautologies. Understanding
intuitively why those tautologies are true will make you comfortable reading
temporal logic formulas and thinking in terms of them.
Perhaps less obvious is this proof rule, which is sort of a converse of (3.16):
(3.20)   |= F implies |= □F
(3.21)   |= F ⇒ G implies |= □F ⇒ □G
(3.22)   ◇F ≜ ¬□¬F
Like □, the operator ◇ binds more tightly than the operators of propositional
logic, so ◇F ∧ G is parsed as (◇F) ∧ G. To understand ◇, we derive the
meaning [[◇F]] of a formula ◇F from (3.15):
Make sure you understand why they are true from the meaning of ◇ as
eventually. These two tautologies can be derived from (3.16) and (3.17).
For example, |= F ⇒ ◇F holds because F ⇒ ¬□¬F is equivalent to
□¬F ⇒ ¬F, and |= □¬F ⇒ ¬F follows from (3.16). You should convince
yourself that ◇(F ∧ G) and (◇F) ∧ (◇G) need not be equivalent. The
equivalence of ◇(F ∨ G) and ◇F ∨ ◇G generalizes to arbitrary disjunctions:
   |= ◇(∃ i ∈ S : F_i) ≡ (∃ i ∈ S : ◇F_i)
Here are three tautologies relating ◇ and □. The first is obtained by negating
◇F and its definition; the second by substituting ¬F for F in that definition;
and the third by substituting ¬F for F in the first:
It asserts that if F is true from now on and G is true at some time in the
future, then at some time in the future F is true from then on and G is true
then.
We can express liveness properties with ◇. For example, the assertion
that some state predicate P is eventually true is a liveness property. The
assertion that the program whose formula is F satisfies this property is
|= F ⇒ ◇P. Since the assertion that something eventually happens is a
liveness property, most of the formulas we write that contain ◇ express
liveness.
The equivalence of ¬◇P and □¬P for any formula P implies the tau-
tology |= ◇P ∨ □¬P. This tautology is central to many liveness proofs. To
4. Q.E.D.
Proof: Steps 1–3 are of the form A iff B , B iff C , and C iff D, and the
theorem asserts A iff D.
The rules for moving ¬ over □ and ◇ that are implied by the first two
tautologies of (3.25) yield the following two tautologies. For example, the
first comes from ¬□◇F ≡ ◇¬◇F ≡ ◇□¬F :
   ◇□◇F ≡ □◇F        □◇□F ≡ ◇□F
The first one is obvious if we read ◇□◇ as eventually infinitely often, because
F is true at infinitely many times iff it is true at infinitely many times after
some time has passed. You can convince yourself that the second is true
by realizing that infinitely often F always true is equivalent to F being
always true starting at some time. Alternatively, you can show that the
first tautology implies the second by figuring out why each of the following
equivalences is true:
   □◇□F ≡ ¬◇¬¬□¬¬◇¬F ≡ ¬◇□◇¬F ≡ ¬□◇¬F ≡ ¬¬◇¬¬□¬¬F ≡ ◇□F
3.4.2.9 Warning
Although elegant and useful, temporal logic is weird. It’s not ordinary math.
In ordinary math, any operator Op we can define satisfies the condition,
sometimes called substitutivity, that the value of an expression Op(e_1, . . . , e_n)
is unchanged if we replace any e_i by an expression equal to e_i. If Op takes
a single argument, substitutivity means that
(3.34)   |= (exp_1 = exp_2) ⇒ (Op(exp_1) = Op(exp_2))
is true for any expressions exp_1 and exp_2. For example, (3.34) is true
for the operator Tail. However, the temporal operator □ is not substi-
tutive. For example let exp_1 and exp_2 be the state predicates x = 0 and
affects all temporal logics and makes temporal logic reasoning tricky.
3.5 TLA
3.5.1 The Problem
There is something terribly wrong with our RTLA descriptions of abstract
programs, because there is something terribly wrong with the descriptions
like the one in Figure 3.4 that we wrote using explicit step numbers. To
see why, let’s return to the discussion in Section 3.1 of how astronomers
describe a planet orbiting a star. As explained there, the mathematical
description of the orbiting planet is best thought of as describing a universe
containing the planet, saying nothing about what else is or is not in the
universe. In particular, that description applies just as well to a universe
in which there is a spacecraft close to the star that orbits it very fast—
perhaps going around the star 60 times every time the planet goes around
it once. Since the spacecraft is too small to affect the motion of the planet,
we would obtain a description of the system composed of the planet and the
spacecraft by adding (conjoining) a description of the spacecraft’s motion
to the description of the planet’s motion. The description of the planet’s
motion remains an accurate description of that planet in the presence of
the spacecraft. It would be crazy if we had to write different formulas to
describe the planet because of the spacecraft that has no effect on it.
Now consider the descriptions of abstract programs we’ve been writing.
In particular, consider an RTLA formula HM describing how the values of
the hour and minute displays of a 24-hour clock change. Using the variables
hr and min to describe the current hour and minute being displayed, we
might define HM to equal Init ∧ □Next, where:
(3.35)   Init ≜ (hr = 0) ∧ (min = 0)
         Next ≜ ∧ min′ = (min + 1) % 60
                ∧ hr′ = if min = 59 then (hr + 1) % 24 else hr
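As a sanity check, here is a minimal Python sketch of the first few states of a behavior satisfying HM; min is renamed mn because min is a Python builtin:

# Simulate a finite prefix of a behavior of the hour-minute
# clock HM of (3.35), starting from Init.
hr, mn = 0, 0                      # Init
states = []
for _ in range(5):
    states.append((hr, mn))
    hr, mn = ((hr + 1) % 24 if mn == 59 else hr), (mn + 1) % 60   # Next
print(states)   # [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]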
But suppose that the clock also displays seconds. The RTLA formula HMS
that also describes the second display might use a variable sec to describe
that display. A behavior σ allowed by HMS would not be allowed by HM
because HM requires every step to change the value of min, while σ must
change the value of sec in every step and the value of min in only every 60th
step.
It is just as crazy for an abstract program describing an hour-minute
clock not to describe a clock that also displays seconds as it is for a descrip-
tion of a planet’s motion no longer to describe that motion because of a
spacecraft that doesn’t affect the planet. It means that anything we’ve said
about the hour and minute display might be invalid if there’s also a second
display. And it doesn’t matter if the minute display is on a digital clock on
my desk and the second display is on a phone in my pocket. More generally,
it means if we’ve proved things about completely separate digital devices
and we look at those two devices at the same time, nothing we’ve proved
about them remains true unless those devices are somehow synchronized to
run in lock step. The more you think about it, the crazier it seems.
because ⟨hr, min⟩′ equals ⟨hr′, min′⟩, and two tuples are equal iff their
corresponding components are equal.
We can similarly fix every other example we’ve seen so far by changing
the next-state action Next in its RTLA description to Next ∨ (v′ = v), where
v is the tuple of all variables that appear in the RTLA formula. Since this
will have to be done all the time, we abbreviate A ∨ (v′ = v) as [A]_v for any
action A and state expression v.
We can add stuttering steps to a pseudocode description of an algorithm
by adding a separate process that just takes stuttering steps. However,
we won’t bother to do this. We will just consider all pseudocode to allow
stuttering steps.
When HM is defined by (3.36), if HMS is true of a behavior then HM
is also true of the behavior. This remains true when HMS is modified to
allow stuttering steps. Thus, HMS implements HM , and |= HMS ⇒ HM is
true. Implementation is implication. How elegant!
add or remove steps that leave the program’s variables unchanged. Since the
assertion depends only on the values a behavior assigns to the program’s vari-
ables, this condition is satisfied iff the assertion does not depend on whether
we add or remove steps that leave all variables unchanged. We’ve used the
term stuttering step to mean a step that leaves a program’s variables un-
changed. We will now call such a step a stuttering step of the program. We
define a stuttering step to be a step that leaves all variables unchanged.
A sensible predicate F on behaviors should satisfy the condition that the
value of [[F ]](σ) is not changed by adding stuttering steps to, or removing
them from, a behavior σ. This means that the value of [[F ]](σ) is not changed
even if an infinite number of stuttering steps are added and an infinite
number removed. (However, the behavior must still be infinite, so if σ ends
in an infinite number of stuttering steps, those steps can’t be removed.) A
predicate on behaviors satisfying this condition for all behaviors σ is called
stuttering insensitive, or SI for short. When describing abstract programs
or the properties they satisfy, we should use only SI predicates on behaviors.
To define SI precisely, we first define \(σ) to be the behavior obtained by
removing from the behavior σ all stuttering steps except those belonging to
an infinite sequence of stuttering steps at the end. Recall that Tail (σ) is the
behavior obtained from σ by removing its first state and ◦ is concatenation
of sequences. For a state s, let’s define (s →) to equal i ∈ {0} ↦ s, the
cardinal sequence of length 1 whose single item is s. Here is the recursive
definition of \.
(3.37)   \(σ) ≜ if ∀ i ∈ N : σ(i) = σ(0)
                  then σ
                  else if σ(0) = σ(1) then \(Tail(σ))
                                      else (σ(0) →) ◦ \(Tail(σ))
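Restricted to a finite prefix of a behavior, the operator \ simply drops every state that the next step repeats. A minimal Python sketch; on an infinite behavior the trailing stuttering steps would have to be kept, as (3.37) does:

def remove_stuttering(prefix):
    # Keep each state whose following step changes it, plus the
    # last state of the prefix.
    return [s for s, t in zip(prefix, prefix[1:]) if s != t] + prefix[-1:]

sigma = ['s0', 's0', 's1', 's1', 's1', 's2']
print(remove_stuttering(sigma))   # ['s0', 's1', 's2']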
A predicate on behaviors is defined to be SI iff, for any behavior σ, the
predicate is true of σ iff it is true of \(σ). SI is a semantic condition—that
is, a condition on the meanings of formulas. Since we are conflating formulas
and their meanings, saying that a formula F is SI means that [[F ]] is SI.
We have been using the term property informally to mean some condition
on the behaviors of a system or abstract program. We now define it to mean
an SI predicate on behaviors. Behavior predicate still means any predicate
on behaviors, not just SI ones.
This equivalence follows from the definition of ◇ and the following assertion,
which can be proved by propositional logic from the definitions of [. . .]_v
and ⟨. . .⟩_v:
   |= ⟨A⟩_v ≡ ¬[¬A]_v
From (3.38) we see that ¬◇⟨A⟩_v is equivalent to □[¬A]_v. This means that
an ⟨A⟩_v step never occurs in a behavior iff every step of the behavior is a
[¬A]_v step. This fact is used in proofs by contradiction of formulas of the
form ◇⟨A⟩_v.
The only temporal operator we have defined besides □ and ◇ is ⇝. We
define F ⇝ ◇⟨A⟩_v to be a TLA formula if F is one, A is an action, and v
is a state expression. This formula is SI, since expanding the definition of
⇝ in it produces what we have already defined to be a TLA formula.
Combining all this, we see that a TLA formula is one of the following:
• A state predicate.
Abstract programs and the properties they satisfy should be TLA for-
mulas. However, we can use RTLA proof rules and even RTLA formulas
when reasoning about TLA formulas. For example, we can prove that Inv
is an invariant of Init ∧ □[Next]_v by substituting [Next]_v for Next in the
RTLA proof rule that (3.10) implies (3.13). This yields the following rule:
In this rule, the first |= means validity in LA while the second |= means
validity in TLA. A feature of TLA is that as much reasoning as possible is
done in LA, which becomes ordinary mathematical reasoning when the nec-
essary definitions are expanded and primes are distributed across operators
so only variables are primed.
Chapter 4

Safety, Liveness, and Fairness
This characterization was turned into precise definitions of safety and live-
ness for arbitrary behavior predicates by Alpern and Schneider [3]. Since
we’re interested only in properties (which are SI), we will use a somewhat
simpler definition of safety. But first, we need a few preliminary definitions.
We call a finite, nonempty cardinal sequence of states a finite behavior.
(A behavior, without the adjective finite, still means an infinite cardinal
sequence.)
1 Remember that a behavior means any infinite sequence of states, not just one that
satisfies some program. If we know that a behavior satisfies the program, we can often tell
that ◇(x = 42) is false by looking at the behavior predicate that describes the program,
without looking at the behavior at all.
It also follows easily from the definition of safety that the conjunction of
safety properties is a safety property. Therefore, as expected, the formula
Init ∧ □[Next]_v that we have been calling the safety property of a program
is indeed a safety property.
The property that asserts that a program halts is a liveness property.
That property is true of a behavior σ iff σ ends with infinitely many steps
that leave the program’s variables unchanged. It’s a liveness property be-
cause every finite behavior ρ is a prefix of its completion ρ↑, which satisfies
the property.
Safety and liveness are conditions on properties, which are SI behavior
predicates. When we say that a TLA formula □[A]_v is a safety property, we
are conflating the formula with its meaning. We should remember that it’s
actually [[□[A]_v]] that is the safety property.
(4.2)   S_12 ≜ Init ∧ □[Next]_{⟨x,y⟩}
        Init ≜ (x ≠ 2) ∧ (y = (x = 1))
        Next ≜ ∧ (x′ = 2) ⇒ y
               ∧ y′ = (y ∨ (x = 1))
It’s not obvious in what sense formula S_12 expresses property F_12, since
S_12 contains the variables x and y while F_12 describes only the values of x.
Intuitively, S_12 makes the same assertion as F_12 if we ignore the value of y.
Section 6.1 describes a TLA operator ∃ such that ∃ y : S_12 means S_12 if we
ignore the value of y. We’ll then see that [[∃ y : S_12]] equals F_12. However,
programs, we should use it. There are other temporal logics that can express
the simple property F_12 with a formula that’s easier to understand than S_12.
However, S_12 is not hard to understand, and abstract programs are the only
practical way I know to express all the properties of concrete concurrent
programs that we need to describe.2
example, we can rule out behaviors of program Increment that halt
prematurely with the liveness property
   ◇(∀ p ∈ Procs : pc(p) = done)
However, we’ll see later why that’s not a good liveness property to use.
There’s another method of describing safety and liveness that helps me
understand them intuitively. It’s based on topology. The method and the
necessary topology are explained in Appendix Section A.5.
4.2 Fairness
Expressing mathematically the way computer scientists and engineers de-
scribed their algorithms and programs led us to describe the safety property
In the common case when A has the form ⟨B⟩_v, we omit the parentheses
and write simply E⟨B⟩_v. The liveness property assumed of a traditional
program whose safety property is described by the formula Init ∧ □[Next]_v
is E⟨Next⟩_v ⇝ ⟨Next⟩_v.
occasionally print output on the same printer, and two processes printing
at the same time would produce an unreadable mixture of the two outputs.
To prevent that, the processes execute a mutual exclusion algorithm, and a
process prints only when in its critical section.
The outline of a mutual exclusion algorithm is shown in Figure 4.1,
where Procs is the set of processes. We don’t care what the processes do
in their noncritical and critical sections, so we represent them by atomic
skip statements labeled ncs and cs that do nothing when executed except
change the value of pc. The nontrivial part of the algorithm consists of the
two sections of code, the waiting and exiting sections, that begin with the
labels wait and exit. Each of those sections can contain multiple labeled
statements, using variables declared in the two variables statements.
The safety property that a mutual exclusion algorithm must satisfy is
that no two processes are executing their critical sections at the same time—
meaning that pc(p) and pc(q) cannot both equal cs for two different pro-
cesses p and q. This is an invariance property. A cute way of expressing it
compactly is:
(4.5)   □( ∀ p, q ∈ Procs : (p ≠ q) ⇒ ({pc(p), pc(q)} ≠ {cs}) )
We will not yet state a precise liveness condition a mutual exclusion algo-
rithm should satisfy. All we need to know for now is that if some processes
enter the waiting section, they can’t all wait forever without entering the
critical section.
Most people viewing the outline in Figure 4.1 will think this is an un-
realistic description of a mutual exclusion algorithm because, by describing
the execution of the critical section with a single skip step, we are assuming
that the entire critical section is executed as a single step. Of course, we
realize that this isn’t the case. It no more says that the critical section is
executed as a single step than our description of an hour-minute clock says
that nothing else happens between the step that changes the clock’s display
to 7:29 and the step that changes it to 7:30. Just as 59 changes to a seconds
display can occur between those two steps, process p can print the entire
Bhagavad Gita while pc(p) equals cs. A mutual exclusion algorithm simply
describes all that printing as stuttering steps of the algorithm.
Figure 4.2 describes a program named UM , which is an abbreviation of
Unacceptable Mutual exclusion algorithm. Technically, it’s a mutual exclu-
sion algorithm because it satisfies property (4.5) with Procs equal to the
set {0, 1} of processes. But for reasons that will be discussed later, it isn’t
considered to be an acceptable algorithm.
This pseudocode program is the first one we’ve seen with an await state-
ment. For a state predicate P , the statement await P can be executed only
when control is at the statement and P equals true. We could write the
statement a : await P as:
a: if ¬P then goto a end if
Executing this statement in a state with P equal to true just moves control
to the next statement. Executing it in a state with P equal to false does
not change the value of any program variable, so it’s a stuttering step of the
program. Since a stuttering step is always allowed, executing the statement
await P when P equals false is the same as not executing it. So, while we
can think of the statement await P continually evaluating the expression
P and moving to the next statement iff it finds P equal to true, mathe-
matically that’s equivalent to describing it as an action A such that E(A)
equals (pc = a) ∧ P .
This is also the first pseudocode we’ve seen with explicit array variables.
An array variable x is an array-valued variable, where an array is a function
and x [p] just means x (p). We’ve already seen implicit array variables—
namely, the local variables t and pc of program Increment are represented
by function-valued variables in Figure 3.5. I have decided to write x [p] in-
stead of x (p) in pseudocode to make the pseudocode look more like real code.
However, the value of an array variable can be any function, not just a finite
ordinal sequence; and we write x (p) instead of x [p] when discussing the pro-
gram mathematically. As we’ve seen in Figure 3.5, an assignment statement
x[p] := . . . is described mathematically as x′ = (x except p ↦ . . .).
Algorithm UM is quite simple. The processes communicate through the
variable x , with process p writing to x (p). The initial value of x (p) for each
process p is false. To enter the critical section, process p sets x (p) to true
and then enters its critical section when x (1 − p) (the array element written
by the other process) equals false.
It’s easy to see that the two processes cannot be in their critical sections
at the same time. If they were, the last process p to enter its critical section
would have read x(1 − p) equal to true when executing statement w2,
so it couldn’t have entered its critical section. Since mutual exclusion is an
invariance property, it can be proved mathematically by finding an inductive
invariant that implies mutual exclusion. You can check that the following
formula is such an inductive invariant of UM :
(4.6)   ∧ TypeOK
        ∧ ∀ p ∈ {0, 1} : ∧ (pc(p) ∈ {w2, cs}) ⇒ x(p)
                         ∧ (pc(p) = cs) ⇒ (pc(1 − p) ≠ cs)
where TypeOK is the type-correctness invariant:
   TypeOK ≜ ∧ x ∈ ({0, 1} → {true, false})
            ∧ pc ∈ ({0, 1} → {ncs, wait, w2, cs, exit})
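Since Procs equals {0, 1}, the reachable states of UM are few enough to enumerate, so (4.5) can also be checked by brute force. Here is a minimal Python sketch; the transitions are our reading of the description above (Figure 4.2 itself is not reproduced here), including the assumption that the exiting section resets x[p]:

from collections import deque

# Explicit-state exploration of UM's reachable states, checking
# the mutual exclusion invariant (4.5).
def steps(state):
    x, pc = state
    for p in (0, 1):
        if pc[p] == 'ncs':
            yield x, pc[:p] + ('wait',) + pc[p+1:]
        elif pc[p] == 'wait':       # set x[p] to true
            yield x[:p] + (True,) + x[p+1:], pc[:p] + ('w2',) + pc[p+1:]
        elif pc[p] == 'w2' and not x[1 - p]:
            yield x, pc[:p] + ('cs',) + pc[p+1:]
        elif pc[p] == 'cs':
            yield x, pc[:p] + ('exit',) + pc[p+1:]
        elif pc[p] == 'exit':       # reset x[p] to false
            yield x[:p] + (False,) + x[p+1:], pc[:p] + ('ncs',) + pc[p+1:]

init = ((False, False), ('ncs', 'ncs'))
seen, queue = {init}, deque([init])
while queue:
    s = queue.popleft()
    assert s[1] != ('cs', 'cs')          # (4.5) for Procs = {0, 1}
    for t in steps(s):
        if t not in seen:
            seen.add(t)
            queue.append(t)
print(len(seen), "reachable states, mutual exclusion holds")

The same search also visits a state with both processes at w2 and both array elements true, from which neither process can ever enter its critical section.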
Let UMSafe be the safety property described by the pseudocode. We
want to conjoin a property UMLive to UMSafe to state a fairness require-
ment of the program’s behaviors. Let’s make the obvious choice of defining
UMLive to be formula (4.4) with Procs equal to {0, 1} and v equal to ⟨x, pc⟩.
This implies that both processes keep taking steps forever, executing their
critical sections infinitely often, which makes it seem like a good choice.
Actually, that makes it a bad choice.
worked fine for program Increment. It failed for program UM . The reason it
worked for program Increment is that when PNext(p) is enabled, it remains
enabled until a PNext(p) step occurs. In program UM , process p can reach
a state in which pc(p) equals w2 and PNext(p) is enabled, but process 1 − p
can then take a step that disables action PNext(p).
To obtain a machine-closed condition, we have to weaken (4.8) so it rules
out fewer behaviors. The obvious way to do that is by requiring a PNext(p)
step to occur not if PNext(p) just becomes enabled, but only if it remains
It makes weak fairness look stronger than the definition because □◇⟨A⟩ᵥ
is a stronger property than ◇⟨A⟩ᵥ. Here's an informal proof of (4.14).
The definition of WFᵥ(A) implies the right-hand side of the equivalence
because ◇□E⟨A⟩ᵥ implies that eventually ⟨A⟩ᵥ is always enabled, whereupon
WFᵥ(A) keeps forever implying that an ⟨A⟩ᵥ step occurs, so there must be
infinitely many ⟨A⟩ᵥ steps, making □◇⟨A⟩ᵥ true. The opposite implication
is true because □E⟨A⟩ᵥ implies ◇□E⟨A⟩ᵥ, so the right-hand side of
the equivalence implies that □◇⟨A⟩ᵥ is true and hence ◇⟨A⟩ᵥ is true. A
rigorous proof of (4.14) is by the following RTLA reasoning, substituting
E⟨A⟩ᵥ for F and ⟨A⟩ᵥ for G:

4.2.7.  G ↝ H
4.2.7.  □(G ↝ H)
false. In fact, that OB satisfies mutual exclusion can be proved with the
same inductive invariant (4.6) as UM except that the type invariant TypeOK
must be modified because pc(1) can now also equal w3 or w4. For algorithm
OB, we define:⁶

(4.17)  TypeOK ≜ ∧ x ∈ ({0, 1} → {true, false})
                 ∧ pc ∈ ({0, 1} → {ncs, wait, w2, w3, w4, cs, exit})
                 ∧ pc(0) ∉ {w3, w4}
However, the resulting inductive invariant isn’t strong enough for proving
liveness. We now consider liveness.
Let OBSafe, the safety property of OB described by the pseudocode, be
the formula Init ∧ □[Next]ᵥ, where v equals ⟨x, pc⟩ and

    Next ≜ ∃ p ∈ {0, 1} : PNext(p)
The fairness condition we want OB to satisfy is weak fairness of each pro-
cess’s next-state action except when the process is in its noncritical section.
A process p remaining forever in its noncritical section is represented in our
abstract program by no PNext(p) step occurring when pc(p) equals ncs.
The fairness condition we assume of program OB is therefore:

    OBFair ≜ ∀ p ∈ {0, 1} : WFᵥ((pc(p) ≠ ncs) ∧ PNext(p))
The formula OBSafe ∧ OBFair , which we call OB , satisfies the liveness
property that if process 0 is in its waiting section, then it will eventually
enter its critical section. That is, OB implies:
(4.18)  (pc(0) ∈ {wait, w2}) ↝ (pc(0) = cs)
This implies deadlock freedom, because if process 0 stops entering and leav-
ing its critical section, then it eventually stays forever in its noncritical
section. If process 1 is then in its waiting section, it will read x [0] equal to
false and enter its critical section.
The inductive invariant obtained from the inductive invariant of UM
isn't strong enough because it doesn't assert that x[p] = false when process
p is in its noncritical section, which is at the heart of why OB is deadlock free. For
that we need this stronger invariant, where TypeOK is defined by (4.17):

(4.19)  ∧ TypeOK
        ∧ x[0] ≡ (pc[0] ∈ {"w2", "cs", "exit"})
        ∧ x[1] ≡ (pc[1] ∈ {"w2", "w3", "cs", "exit"})
        ∧ ∀ p ∈ {0, 1} : (pc[p] = "cs") ⇒ (pc[1 − p] ≠ "cs")
⁶For any infix predicate symbol like = or ∈, putting a slash through the symbol negates
it, so e ∉ S means ¬(e ∈ S).
By the meaning of leads to, the property asserted by each formula F in the
graph means that if the program is ever in a state for which F is true, then
it will eventually be in a state satisfying a formula pointed to by one of the
outgoing edges from F. The graph has a single sink node (one having no
outgoing edge). Every path in the graph, if continued far enough, leads to
the sink node. By transitivity of the ↝ relation, this means that if all the
properties asserted by the diagram are true of a behavior, then the behavior
satisfies the property F ↝ H, where H is the sink-node formula and F
is any formula in the lattice. In particular, the properties asserted by the
diagram imply formula (4.18). By (3.31), that every formula in the graph
leads to the sink-node formula means that the disjunction of all the formulas
in the graph leads to the sink-node formula.
Now to explain the box. Let Λ equal □Inv ∧ □[Next]ᵥ ∧ OBFair, the
formula that labels the box. Formula Λ is implicitly conjoined to each of the
formulas in the graph. It is a □ formula, since the conjunction of □ formulas
is a □ formula, and OBFair is the conjunction of WF formulas, which are
□ formulas.

Since Λ is conjoined to every formula in the lattice, the leads-to lattice makes
assertions of the form

    Λ ∧ G ↝ (Λ ∧ H₁) ∨ … ∨ (Λ ∧ Hⱼ)

Since Λ equals □Λ, and once □Λ is true it is true forever, this formula is
equivalent to Λ ∧ G ↝ H₁ ∨ … ∨ Hⱼ. (This follows from (3.32c) and
propositional logic.)

If H is the unique sink node of the lattice, then proving the assertions
made by the lattice proves |= Λ ∧ G ↝ H for every node G of the lattice.
Since Λ equals □Λ, it's easy to see that a behavior that satisfies
Λ ∧ G ↝ H must satisfy Λ ⇒ (G ↝ H), so proving |= Λ ∧ G ↝ H
proves |= Λ ⇒ (G ↝ H). All this is true only because the formula Λ is a
□ formula. In general, we label a box in a leads-to lattice only with a □
formula.
Remember what the proof lattice of Figure 4.4 is for. We want to prove
that OB implies (4.18). By proving the assertions made by the proof lattice,
we show that the formula Λ labeling the box implies (4.18). By definition of
OB and because OB implies □Inv, formula Λ is implied by OB. Therefore,
by proving the leads-to properties asserted by the proof lattice, we prove that
OB implies (4.18). Note how we had to use the □ formula □Inv ∧ □[Next]ᵥ
instead of OBSafe, whose conjunct Init is an assertion only about the initial state.

To complete the proof that OB implies (4.18), we now prove the leads-to
properties asserted by Figure 4.4. The leads-to property asserted by the
edges numbered 1 is:

    Λ ∧ (pc(0) ∈ {wait, w2}) ↝ ((pc(0) = wait) ∨ (pc(0) = w2))

It is trivially true, since pc(0) ∈ {wait, w2} implies that pc(0) equals wait
or w2, and □(F ⇒ G) implies F ↝ G.
The formula Λ ∧ (pc(0) = wait) ↝ (pc(0) = w2) asserted by edge number
2 is true because Λ implies □Inv ∧ □[Next]ᵥ and the weak fairness
assumption of process 0, which imply (pc(0) = wait) ↝ (pc(0) = w2).
The formula □[Next2]ᵥ is implied by □Inv and □[Next]ᵥ and the conjunct
□(pc(0) ≠ cs) of the formula at the tail of the edge 2 arrow. (Note that the
prime in this formula is valid because if pc(0) ≠ cs is always true, then
it's always true in the next state.) We are using an invariance property of
one program to prove a liveness property of another program. This would
seem strange if we were thinking in terms of code. But we’re thinking
mathematically, and a mathematical proof contains lots of formulas. It’s
not surprising if one of those formulas looks like the formula that describes
a program.
The edges numbered 3 enter a box whose label is the same formula from
which those edges come. In general, an edge can enter a box with a label
□F if it comes from a formula that implies □F. This is because a box
labeled □F is equivalent to conjoining □F to all the formulas in the box,
and □F ↝ (G₁ ∨ … ∨ Gₙ) implies □F ↝ ((□F ∧ G₁) ∨ … ∨ (□F ∧ Gₙ)).
An arrow can always leave a box, since removing the formula it points to
from the box just weakens that formula.
Proofs of the assertions represented by the rest of the lattice’s edges are
sketched below.
edges 3  The formula represented by these edges is true because the disjunction
of the formulas they point to asserts that pc(1) is in the set
{ncs, wait, w2, w3, w4, cs, exit}, which is implied by □Inv.

edges 4  If pc(1) equals cs or exit, then □Inv ∧ □[Next]ᵥ and the fairness
condition for process 1 imply that it will eventually be at ncs. Either
pc(1) equals ncs forever or eventually it will not equal ncs. In the
latter case, □[Next]ᵥ implies that the step that makes pc(1) = ncs
false must make pc(1) = wait true.

edge 8  □¬x(1), □(pc(0) = w2) (implied by the inner box's label), and
OBFair imply that a process 0 step that makes pc(0) equal to cs must
eventually occur. (Equivalently, these three formulas are contradictory,
so they imply false, which implies anything.)
The proof sketches of the properties asserted by edges 4 and edge 6 skim
over more details than the proofs of the other properties asserted by the
lattice. A more detailed proof would be described by a lattice in which each
of the formulas pointed to by the edges numbered 3 was split into multiple
formulas; for example, the formula pc(1) ∈ {cs, exit, ncs} would be split
into the formulas pc(1) = cs, pc(1) = exit, and pc(1) = ncs. A good check
of your understanding is to draw the more detailed lattice and write proof
sketches for its new edges.
    P(sem):  await sem = 1 ;        V(sem):  sem := 1
             sem := 0
Locks were originally implemented with operating system calls. Modern
multiprocessor computers provide machine instructions to implement them.
variables sem = 1 ;
process p ∈ Procs
    while true do
        ncs:  skip ;
        wait: P(sem) ;
        cs:   skip ;
        exit: V(sem)
    end while
end process

Using a lock, mutual exclusion for any set Procs of processes can be implemented
with the trivial algorithm LM of Figure 4.6.
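As a concrete illustration, here is one plausible TLA+ rendering of the three non-ncs actions of a process of LM; the action names and the string encoding of the labels are mine, not the book's:

    Wait(p) == /\ pc[p] = "wait" /\ sem = 1        \* P(sem): await sem = 1,
               /\ sem' = 0                         \*         then sem := 0
               /\ pc' = [pc EXCEPT ![p] = "cs"]
    Cs(p)   == /\ pc[p] = "cs"
               /\ pc' = [pc EXCEPT ![p] = "exit"] /\ sem' = sem
    Exit(p) == /\ pc[p] = "exit"
               /\ sem' = 1                         \* V(sem): sem := 1
               /\ pc' = [pc EXCEPT ![p] = "ncs"]

Note that Wait(p) is enabled iff control is at wait and sem = 1, which is why executing P(sem) amounts to an await followed by an assignment.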
Let PNext(p) now be the next-state action of process p of program LM.
With weak fairness of (pc(p) ≠ ncs) ∧ PNext(p) for each process p as its
fairness property, algorithm LM satisfies the deadlock freedom condition
(4.16). However, deadlock freedom allows individual processes to be starved,
remaining forever in the waiting section.
Let Wait(p), Cs(p), and Exit(p) be the actions described by the statements
in process p with the corresponding labels wait, cs, and exit. Weak
fairness of (pc(p) ≠ ncs) ∧ PNext(p) is equivalent to the conjunction of weak
fairness of the actions Wait(p), Cs(p), and Exit(p). Program LM allows
starvation of individual processes because weak fairness of these actions
ensures only that if processes are waiting to execute their Wait actions,
then some process will execute its Wait action. But if processes continually reach
the wait statement, some individual process p may never get to execute
Wait(p).
It's reasonable to require the stronger condition of starvation freedom,
which asserts that no process starves. This is the property

(4.21)  ∀ p ∈ Procs : (pc(p) = wait) ↝ (pc(p) = cs)

which asserts that any process reaching wait must eventually enter its critical
section. For LM to satisfy this property, it needs a stronger fairness property
than weak fairness of the Wait(p) actions.
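For reference, (4.14) and its strong-fairness analog (4.23) give the two fairness operators these equivalent forms (the original displays fall in spans not reproduced here, so this is the standard formulation rather than a quotation):

    WFᵥ(A) ≡ (◇□E⟨A⟩ᵥ ⇒ □◇⟨A⟩ᵥ)
    SFᵥ(A) ≡ (□◇E⟨A⟩ᵥ ⇒ □◇⟨A⟩ᵥ)

Weak fairness requires an ⟨A⟩ᵥ step if A is eventually enabled forever; strong fairness requires one if A is enabled infinitely often, even if it is also disabled infinitely often.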
The informal justification and the proof of (4.23) are similar to the ones for
(4.14). The proof of (4.24) is essentially the same as that of (4.15).
Exit(p). This is because the action is enabled iff pc(p) has the appropriate
value, so it remains enabled until a step of that action occurs to change
pc(p). Thus, when the action is enabled, it is continuously enabled until it
is executed. We can therefore write LMFair as the conjunction of strong
fairness of the three actions Wait(p), Cs(p), and Exit(p).

The same sort of reasoning that led to (4.13) of Section 4.2.3, as well as
Theorem 4.7 of Section 4.2.7, imply that the conjunction of strong fairness
of these three actions is equivalent to strong fairness of their disjunction.
Therefore, we can write LMFair as strong fairness of their disjunction, which
equals (pc(p) ≠ ncs) ∧ PNext(p).
While SFᵥ((pc(p) ≠ ncs) ∧ PNext(p)) is compact, I prefer not to define
LMPFair(p) this way because it suggests to a reader of the formula that
strong fairness of Cs(p) and Exit(p) is required, although only weak fairness
is. Usually, the process's next-state action will be the disjunction of many
actions, and strong fairness is required of only a few of them. I would define
LMFair to equal

    ∀ p ∈ Procs : WFᵥ((pc(p) ≠ ncs) ∧ PNext(p)) ∧ SFᵥ(Wait(p))

This is redundant because the first conjunct implies weak fairness of Wait(p)
and the second conjunct asserts strong fairness of it. But a little redundancy
doesn't hurt, and its redundancy should be obvious because strong fairness
implies weak fairness.
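In TLA+ ASCII syntax, this per-process condition and the program's full fairness property might read as follows (the operator names and the choice of the tuple of variables are mine, mirroring the definition just given):

    vars == <<sem, pc>>
    LMPFair(p) == /\ WF_vars((pc[p] # "ncs") /\ PNext(p))
                  /\ SF_vars(Wait(p))
    LMFair == \A p \in Procs : LMPFair(p)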
We now state the theorem asserting that the conjunction of fairness prop-
erties produces a machine-closed specification. Its proof is in the Appendix.
Theorem 4.6 Let Init be a state predicate, Next an action, and v a tuple
of all variables occurring in Init and Next. If Aᵢ is a subaction of Next for
all i in a countable set I, then the pair
where U is the until operator with which we first defined weak fairness
of PNext(p) as (4.8). Similarly to what we did for weak fairness, we can
remove the U by observing that F U G implies that if G is never true,
then F must remain true forever. That ⟨Aᵢ⟩ᵥ is never true is asserted by
¬◇⟨Aᵢ⟩ᵥ, which is equivalent to □[¬Aᵢ]ᵥ. Therefore (4.25) implies
While (4.25) implies (4.26), the formulas are not equivalent. Formula (4.26)
is strictly weaker than (4.25). However, it's strong enough to imply that
strong or weak fairness of all the Aᵢ is equivalent to strong or weak fairness
of Q, assuming that Q is the disjunction of the Aᵢ. Here is the precise
theorem; its proof is in the Appendix.
Theorem 4.7  Let Aᵢ be an action for each i ∈ I, let Q ≜ ∃ i ∈ I : Aᵢ,
and let XF be either WF or SF. Then

    |= ( ∀ i ∈ I : □( E⟨Aᵢ⟩ᵥ ∧ □[¬Aᵢ]ᵥ ⇒ □[¬Q]ᵥ ∧ □(E⟨Q⟩ᵥ ⇒ E⟨Aᵢ⟩ᵥ) ) )
         ⇒ ( XFᵥ(Q) ≡ ∀ i ∈ I : XFᵥ(Aᵢ) )
With TLA, fairness was generalized to weak and strong fairness of arbitrary
actions. We have considered a fairness property for a safety property
S to be a formula L that is the conjunction of weak and strong fairness conditions
on actions such that ⟨S, L⟩ is machine closed. However, weak and
strong fairness of an action are defined in terms of how the action is written,
not in terms of its semantics. While we have given semantic definitions of
safety, liveness, and machine closure, we have not done so for fairness.
I only recently learned that a semantic definition of fairness was pub-
lished in 2012 by Völzer and Varacca [48]. Their definition of what it means
for a property L to be a fairness property for a safety property S can be
stated in terms of the following infinite two-player game. Starting with seq
equal to the empty sequence, the two players forever alternately take steps
that append a finite number of states to seq. The only requirement on the
steps is that after each one, seq must satisfy S . The second player wins
the game if she makes seq an infinite sequence that satisfies L. (Since S
is a safety property, seq must satisfy S .) They defined L to be a fairness
property for S iff the second player can always win, regardless of what the
first player does (as long as he follows the rules).
It is mathematically meaningless to say that a definition is correct. How-
ever, this seems to be the only reasonable definition that includes weak and
strong fairness such that fairness implies machine closure and the conjunc-
tion of countably many fairness properties is a fairness property. This defi-
nition also encompasses other fairness properties that have been proposed,
including one called hyperfairness [33].
I believe that weak and strong fairness of actions are the only fairness
properties that are relevant to abstract programs. However, this general
definition is interesting because it provides another way to think about fair-
ness. More importantly, it’s interesting because concepts we are led to by
mathematics often turn out to be useful.
that the Input action is enabled in every reachable state of the program,
which is asserted by
    |= S ⇒ □E(Input)
However, "always possible" might instead mean that from any reachable
state, there is a sequence of possible steps that reaches a state with E(Input)
true, a condition we will call "always eventually possible". To express this
and other possibility conditions in TLA, we can use the action composition
operator defined in Section 3.4.1.4. Recall that for any action A, the action
A⁺ is true of a step s → t iff t is reachable from s by a sequence of one or
more A steps.
Now consider an abstract program Init ∧ □[Next]ᵥ where Init is a state
predicate, Next is an action, and v is the tuple of all variables that appear
in Init or Next. Let's abbreviate ([Next]ᵥ)⁺ as [Next]⁺ᵥ. If s is a reachable
state of the program, then s → t is a [Next]⁺ᵥ step iff it is possible for an
execution of the program to go from state s to state t. (Since [Next]ᵥ allows
stuttering steps, t can equal s. In fact, [Next]⁺ᵥ is equivalent to [Next⁺]ᵥ.)
A state t is a reachable state of the program iff there is a state s satisfying
Init such that s → t is a [Next]⁺ᵥ step. In other words, t is a reachable state
of the program iff there is a state s such that s → t is an Init ∧ [Next]⁺ᵥ
step.
We can now express the condition that it is always eventually possible for
the user to enter input, meaning that from any reachable state, it is possible
to reach a state in which E(Input) is true. We generalize this condition by
replacing E(Input) with an arbitrary state predicate P. For the abstract
program Init ∧ □[Next]ᵥ, that P is always eventually possible is expressed
as:

(4.27)  |= Init ∧ □[Next]ᵥ ⇒ □E([Next]⁺ᵥ ∧ P′)
if the program satisfies the property □◇P. Let S equal Init ∧ □[Next]ᵥ.
The safety property S will not imply the liveness property □◇P unless P
is true in all reachable states of S, that is, unless S implies □P. However,
if F is a fairness property for S, so ⟨S, F⟩ is machine closed, then S ∧ F
has the same set of reachable states as S. So, any state satisfying P can
be reached from a reachable state of S iff it can be reached from a state
satisfying S ∧ F. Therefore, it suffices to verify that a state satisfying P
can be reached from every reachable state of S ∧ F, which is true if S ∧ F
implies □◇P. Therefore, we can verify that P is always eventually possible
by verifying
true. For example, if a model checker reports that your mutual exclusion
algorithm satisfies mutual exclusion, you should check that it’s possible for a
process to enter its critical section. This is especially true if you did not have
to make many corrections to reach that point. Remember that a program
that takes no non-stuttering steps satisfies most safety properties.
Tools can provide other ways of checking the accuracy of a program. For
example, if Input is a subaction of the program’s next-state action, a TLA+
model checker called TLC reports how many different steps satisfying Input
occur in behaviors of the program. If it finds no such steps, then it is not
always possible for an Input step to occur with either definition of “always
possible”. Finding too few such steps can also be an indication that the
program is not accurate.
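One way to carry out such a sanity check with TLC is to add the fairness conjunct and state the possibility condition as a liveness property; this is a sketch, in which LiveSpec, Fairness, and CanInput are names I have made up for the fair specification and for a state predicate equal to E(Input):

    LiveSpec == Init /\ [][Next]_vars /\ Fairness
    AlwaysPossible == []<>CanInput

    \* In the TLC configuration file:
    \*   SPECIFICATION LiveSpec
    \*   PROPERTY AlwaysPossible

If TLC verifies AlwaysPossible of LiveSpec, then by the machine-closure argument above, E(Input) is always eventually possible for the unfair program Init ∧ □[Next]ᵥ.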
Accuracy of an abstract program cannot be formally defined. It means
that a program really is correct if it implements the abstract program. In
other words, an abstract program is accurate iff it means what we want it
to mean, and our desires can’t be formally defined. That accuracy can’t
be formally defined does not mean it’s unimportant. There are quite a few
important aspects of programs that lie outside the scope of our science of
correctness.
I believe that most work on the correctness of real-time programs has
considered only safety properties. Instead of requiring that something even-
tually happens, it requires the stronger property that it happens within
some fixed amount of time, which is a safety property. Fischer’s Algorithm
is more general because in addition to using real-time to ensure mutual ex-
clusion, a safety property, it uses fairness to ensure deadlock freedom, a
liveness property.
In the past 40 years, I have had essentially no contact with engineers who
build real-time systems. I know of only one case in which TLA was used to
check that a commercial system satisfied a real-time property [5].¹⁰ From
the point of view of our science, there is nothing special about real-time
programs. However, how well tools work can depend on the application
domain. The TLA+ tools were not developed with real-time programs in
mind, and it’s unclear how useful they are in that domain.
variables x = none ;
process p ∈ Procs
    variables pc = ncs ;
    while true do
        ncs:  skip ;                    noncritical section
        wait: await x = none ;
        w1:   x := p ;
        w2:   if x ≠ p then goto wait end if ;
        cs:   skip ;                    critical section
        exit: x := none
    end while
end process
∀ p ∈ Procs :
   ∧ (pc(p) = w1) ⇒ (rt(p) ≤ now ≤ rt(p) + δ)
   ∧ (pc(p) ∈ {cs, exit}) ⇒ (x = p) ∧ (∀ q ∈ Procs : pc(q) ≠ w1)
   ∧ (pc(p) = w2) ∧ (x = p) ⇒
        ∀ q ∈ Procs : (pc(q) = w1) ⇒ ((rt(q) + δ) < (rt(p) + ε))
You should understand why the three conjuncts in this formula are the three
assertions expressed informally above. Adding the type-correctness part and
proving that it is an inductive invariant is a good exercise if you want to
learn how to write proofs.
Under suitable fairness assumptions, Fischer’s Algorithm is deadlock
free. Recall that deadlock freedom for a mutual exclusion algorithm means
it’s always true that if some process is trying to enter the critical section,
then some process (not necessarily the same process) will eventually do so.
Deadlock freedom of Fischer’s Algorithm follows from the algorithm having
this additional invariant:
and so on. Process p must wait forever at w2 because now is always less
than t + ε and the w2 action is enabled only when now ≥ t + ε. Such a
behavior, in which time remains bounded, is called a Zeno behavior.
The most natural way to avoid the problem of Zeno behaviors is to
make the abstract program describing Fischer’s Algorithm disallow them.
The obvious way to do that is to conjoin this liveness property:
(4.29)  ∀ t ∈ R : ◇(now > t)
which asserts that the value of time is unbounded. However, this isn't necessarily
a fairness property. It's easy to write an abstract program that allows
only Zeno behaviors, so conjoining the liveness property (4.29) produces a
program that allows no behaviors. For example, we can add timing constraints
to the program of Figure 4.7 that require a process both to execute
statement w1 within δ seconds after executing statement wait and to wait
at least ε seconds after executing wait before executing w1, with δ < ε. If a
process executes wait at time t, then now ≤ t + δ must remain true forever.
If we added fairness properties that required processes eventually to reach
the wait statement and execute it if it's enabled, then the program would
allow only Zeno behaviors.
We can ensure that Fischer's Algorithm satisfies (4.29) by having it require
an appropriate fairness condition on the advancing of time. The condition
we need is strong fairness of the action timeStep ∧ (now′ = exp), where
exp is the largest value of now′ permitted by the values of rt(p) for processes
p with control at w1, or now + 1 if there is no such process. More precisely:

    exp ≜ let T ≜ {rt(p) + δ : p ∈ {q ∈ Procs : pc(q) = w1}}
          in  if T = {} then now + 1 else Min(T)

where Min(T) is the minimum of the nonempty set T of real numbers.
With this fairness condition on advancing time and the conjunction of the
fairness conditions for the processes in Procs, Fischer’s Algorithm satisfies
(4.29) and the proof sketch that the algorithm is deadlock free can be made
rigorous.
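A sketch of these definitions in TLA+, assuming delta is a constant, rt and now are variables, and timeStep is the action that advances now (all the names here are mine):

    Min(T) == CHOOSE t \in T : \A u \in T : t =< u
    exp == LET T == {rt[p] + delta : p \in {q \in Procs : pc[q] = "w1"}}
           IN  IF T = {} THEN now + 1 ELSE Min(T)
    \* Strong fairness of advancing time to the latest permitted value:
    TimeFairness == SF_now(timeStep /\ (now' = exp))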
If we are interested only in safety properties, there is no need for an
abstract program to rule out Zeno behaviors. A program satisfies a safety
property iff all finite behaviors allowed by the program satisfy it, and a Zeno
behavior is an infinite behavior. In many real-time programs, liveness prop-
erties are of no interest. Correctness means not that something eventually
happens but that it happens within a certain length of time, which is a
safety property. Zeno behaviors then make no difference, and there is no
reason to disallow them.
Other than the differences implied by the use of continuous math, such
as the calculus in (4.31), rather than discrete math, proving properties of
hybrid programs is the same as proving properties of other real-time ab-
stract programs. Automatic tools like model checkers for ordinary abstract
programs seem to be unsuitable for checking abstract programs in which
variables represent continuously varying quantities. Methods have been de-
veloped for checking such programs [11].
Chapter 5
Refinement
Data Refinement  A program refining another program can also refine the
    representation of data used by the higher-level program. This will be
    illustrated by refining a higher-level program that uses numbers with
    a program that implements a number by a sequence of digits.

Refinement usually involves both step and data refinement, with step refinement
manifest as operations on the lower-level data requiring more non-stuttering
steps than the corresponding operations on the higher-level program's
data.
When a program S is refined by a program T , the variables in formula
T are usually different from the variables in formula S , and the two sets of
variables have non-overlapping scopes. Showing that T refines S involves
means that their new values are unspecified. We are assuming that the final
values of x and y don't matter; we care only about the final value of z.

    fin :: false            fin :: true
    x   :: 321              x   :: 5
    y   :: 23        →      y   :: ?
    z   :: ?                z   :: 344
steps taken by the algorithm you learned in school is usually equal to or one
greater than the length of the longer number. For simplicity, the number
of steps taken by AddSeq always equals one plus the length of the longer
number. To simplify the description of what happens when the algorithm
runs out of digits in one of the numbers, it uses the operator Fix defined as
follows to replace the empty sequence by ⟨0⟩:

    Fix(seq) ≜ if seq = ⟨ ⟩ then ⟨0⟩ else seq

The algorithm's define statement defines digit to equal the indicated expression
within that statement. The value of ⌊n/10⌋ is the greatest integer
less than or equal to n/10. To simplify the invariant, AddSeq specifies the
initial value of carry to equal 0 and ensures that it equals 0 at the end. Since
the low-order digit of a two-digit number n is n % 10 and its high-order digit
is ⌊n/10⌋, it should be clear that AddSeq describes an algorithm for adding
two decimal numbers. (If it's not, execute it by hand on an example.)
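As a sketch of the digit-at-a-time step in TLA+, assuming (as the description above suggests) that u and v hold the remaining digits low-order first, sum accumulates the computed digits, and Head, Tail, and Append come from the Sequences module; the action name and its enabling condition are my assumptions:

    Step == /\ (u # << >>) \/ (v # << >>) \/ (carry # 0)
            /\ LET d == Head(Fix(u)) + Head(Fix(v)) + carry
               IN  /\ sum'   = Append(sum, d % 10)   \* low-order digit of d
                   /\ carry' = d \div 10             \* high-order digit of d
                   /\ u' = Tail(Fix(u))
                   /\ v' = Tail(Fix(v))

Each step consumes one (possibly padded) digit of each number, so the number of steps is essentially the length of the longer number, matching the count described above.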
The usual way to express correctness of a program that computes a value
sum and stops is with an invariant asserting that if the program has stopped
then sum has the correct value. We can’t do that with AddSeq because the
correct value of sum is the initial value of u ⊕ v , and those initial values
have disappeared by the time the program stops. To express correctness, we
can add a constant ans that equals the initial value of u ⊕ v . Since stopping
means pc equals done for our pseudocode, correctness means:
The key part of an inductive invariant to prove (5.6) is the assertion that
ans equals the final value of sum. A first approximation to the final value
of sum is:

    sum ◦ (⟨carry⟩ ⊕ (u ⊕ v))
refine stuttering steps of Add, the values of x and y must not change. Zero
seems like a nice value to let x and y equal when their value no longer
matters, so we let the refinement mapping include:

    x ← if sum ≠ ⟨ ⟩ then 0 else Val(u) ,
    y ← if sum ≠ ⟨ ⟩ then 0 else Val(v)
Finally, we must decide what value the refinement mapping assigns to z.
If we add to AddSeq the constant ans that always equals the result the algorithm
finally computes, then we can substitute Val(ans) for z. But we
don't have to add it because the invariant (5.7) tells us what expression containing
only the variables of AddSeq always equals ans. We could therefore
substitute for z the expression obtained by applying Val to the right-hand
side of equation (5.7). However, there's a simpler expression that we can
use. Convince yourself that the following substitution works:

    z ← Val(sum) + 10^Len(sum) ∗ (carry + Val(u) + Val(v))

This completes the refinement mapping. That AddSeq implements Add
under the refinement mapping means that this theorem is true:

    |= AddSeq ⇒ (Add with done ← sum ≠ ⟨ ⟩ ,
                          x ← if sum ≠ ⟨ ⟩ then 0 else Val(u) ,
                          y ← if sum ≠ ⟨ ⟩ then 0 else Val(v) ,
                          z ← Val(sum) + 10^Len(sum) ∗ (carry + Val(u) + Val(v)))
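Here Val(seq) is the number represented by the digit sequence seq. Since digits are stored low-order first (which is what makes the 10^Len(sum) factor correct), a recursive TLA+ definition consistent with its use above would be:

    RECURSIVE Val(_)
    Val(seq) == IF seq = << >> THEN 0
                ELSE Head(seq) + 10 * Val(Tail(seq))

For example, Val(⟨4, 4, 3⟩) = 4 + 10 ∗ (4 + 10 ∗ 3) = 344, the sum computed in the state display above.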
This algorithm works fine, and the system keeps choosing a sequence of
inputs, until the leader fails. At that point, a new leader is selected. The
new leader sends a message to all the acceptors asking them what they’ve
done. In particular, the new leader finds out from the acceptors if inputs
were chosen that it was unaware of. It also finds out if the previous leader
had begun trying to choose an input but failed before the input was chosen.
If it had, then the new leader completes the choice of that input. When the
new leader has received this information from a majority of acceptors, it
can complete any uncompleted choices of an input and begin choosing new
inputs. Let’s call this algorithm the naive consensus algorithm.
There’s one problem with the naive algorithm: How is the new leader
chosen? Choosing a single leader is just as hard as choosing a single input.
The naive consensus algorithm thus assumes the existence of a consensus
algorithm. However, because leader failures should be rare, choosing a leader
does not have to be done efficiently. So, programmers would probably have
approached the problem of choosing a leader the way they approached most
programming problems. They would have found a plausible solution and
then debugged it. Debugging usually means thinking of all the things that
could go wrong and adding code to handle them.
Let’s pause and look at the science of consensus. Before Paxos, there
were consensus algorithms that worked no matter what a failed process
could do [45]. However, they were synchronous algorithms, meaning that
they assumed known bounds on the time required for messages sent by one
process to be received and acted upon by another process. They were not
practical for the loosely coupled computers that had become the norm by the
1980s. Although asynchronous algorithms were required, they had to solve a
simpler problem because sufficiently reliable systems could be based on the
assumption that a process failed by stopping and could not perform incorrect
actions. However, the FLP theorem, named after Michael Fischer, Nancy
Lynch, and Michael Paterson who discovered and proved it, states that no
asynchronous algorithm can implement consensus if even a single process can
fail in this benign way [13]. More precisely, any algorithm that ensures the
safety property that two processes never choose different values must allow
behaviors that violate the liveness property that requires a value eventually
to be chosen if enough processes are working and can communicate with one
another. Asynchronous algorithms that ensure liveness must allow behaviors
in which processes disagree about what input is chosen.
The leader-selection code programmers would have written therefore had
to allow either behaviors in which two processes thought they were the
leader, probably with serious consequences, or else behaviors in which no
leader is selected, causing the system to stop choosing values. With a prop-
erly designed algorithm, the probability of never choosing the leader is zero,
and a leader will be chosen fairly quickly if enough of the system is working
properly. The system my colleagues built ran for several years with about
60 single-user computers, and I don’t think their consensus code caused any
system error or noticeable stalling. There is no way to know if it had errors
that would have appeared in today’s systems with thousands of computers
and many thousands of users.
the constraint that if any process has learned that the value v was chosen,
then p must also learn that v was chosen.

We take a different approach and let the abstract program describe only
the choosing of a value, without mentioning processes that learn the chosen
value. This abstract program has a single variable chosen that represents
the set of values that have been chosen. (In any behavior allowed by the
program, that set always has at most one value.) The initial predicate is
chosen = {}, and the next-state action is:

    ∧ chosen = {}
    ∧ ∃ v ∈ Value : chosen′ = {v}
records all the votes that each acceptor has cast. This is described by a
variable votes whose value is a function that assigns to each acceptor a a
set votes(a) of pairs ⟨b, v⟩ where b ∈ N and v ∈ Value. The pair ⟨b, v⟩ in
votes(a) means that a has voted for v in ballot number b.
Choosing a leader is the weak point in the naive algorithm. The Voting
algorithm abstracts away the leaders. A leader serves two functions. The
first is to ensure that in any ballot, acceptors can cast votes only for the
value proposed by the leader. The Voting algorithm's next-state action
takes care of that by not letting an acceptor cast a vote for a value v in
ballot b if a vote has already been cast in ballot b for a different value. The
second function of the leader is to learn that a value has been chosen, which
it does when it has received enough OK messages. The Voting algorithm
does away with that function by declaring that the value has been chosen
when the requisite number of OK messages have been sent, that is, when
there are enough votes cast for the value in the ballot. More precisely, we
define ChosenAt(b, v) to be true iff a majority of acceptors has voted for v in
ballot b. The Voting algorithm implements the Consensus abstract program
under the refinement mapping

    chosen ← {v ∈ Value : ∃ b ∈ N : ChosenAt(b, v)}
In addition to votes, the algorithm has one other variable, maxBal, whose
value is a function that assigns to each acceptor a a number maxBal(a).
The significance of this number is that a will never in the future cast a vote
in any ballot numbered less than maxBal(a). The value of maxBal(a) is
initially 0 and is never decreased. The algorithm can increase maxBal(a) at
any time.
It may seem strange that the state does not contain any information
about what processes have failed. We are assuming that a failed process does
nothing. Since we are describing only safety, a process is never required to
do anything, so there is no need to tell it to do nothing. A failed process that
has been repaired can differ from a process that hasn’t failed because it may
have forgotten its prior state when it resumes running. A useful property
of a consensus algorithm is that, even if all processes fail, the algorithm can
resume its normal operation when enough processes are repaired. To achieve
this, we require that a process maintains its state in stable storage, so it is
restored when a failed process restarts. A process failing and restarting is
then no different from a process simply pausing.
number b. More precisely, the ballot b leader orchestrates the voting by the
acceptors in ballot b of the Voting algorithm.
The next-state action of the algorithm could be (but isn't literally) written
in the form ∃ b ∈ N : BA(b), where BA(b) describes how ballot b is performed.
The ballot consists of two phases. In phase 1, the ballot b leader
sends a message to the acceptors containing only the ballot number b. An
acceptor a ignores the message unless b > maxBal(a), in which case it sets
maxBal(a) to b and replies with a message containing a, b, maxVBal(a),
and maxVal(a). When the ballot b leader receives those messages from a
majority of the acceptors, it can pick a value v to be chosen, where v is either
a value picked by the leader of a lower-numbered ballot or an arbitrary
value. The complete algorithm describes how it picks v. Phase 2 begins
with the leader sending a message to the acceptors asking them to vote for
v in ballot b. An acceptor a ignores the message unless b ≥ maxBal(a), in
which case a sets maxBal(a) to b and replies with a message saying that it
has voted for v in ballot b.
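A sketch of how these phases look as TLA+ actions over a set msgs of sent messages, modeled loosely on the standard Paxos specification (the field names, the Send operator, and the omission of UNCHANGED conjuncts are my simplifications):

    Send(m) == msgs' = msgs \cup {m}

    Phase1a(b) == Send([type |-> "1a", bal |-> b])

    Phase1b(a) == \E m \in msgs :
      /\ m.type = "1a"
      /\ m.bal > maxBal[a]
      /\ maxBal' = [maxBal EXCEPT ![a] = m.bal]
      /\ Send([type |-> "1b", acc |-> a, bal |-> m.bal,
               mbal |-> maxVBal[a], mval |-> maxVal[a]])

    Phase2b(a) == \E m \in msgs :
      /\ m.type = "2a"
      /\ m.bal >= maxBal[a]
      /\ maxBal'  = [maxBal  EXCEPT ![a] = m.bal]
      /\ maxVBal' = [maxVBal EXCEPT ![a] = m.bal]
      /\ maxVal'  = [maxVal  EXCEPT ![a] = m.val]
      /\ Send([type |-> "2b", acc |-> a, bal |-> m.bal, val |-> m.val])

Phase2a, in which the leader picks the value v from the 1b messages it has received, is where the algorithm's subtlety lies; how v is picked is described by the complete algorithm.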
The Paxos algorithm implements the Voting algorithm under a refine-
ment mapping in which the variable votes of Voting is implemented by the
expression defined in the obvious way from the set of votes reported by
acceptors’ phase 2 messages in msgs, and in which the variable maxBal
of Voting is implemented by the variable of the same name in the Paxos
abstract program.
The values of maxVBal and maxVal can be described as functions of
the value of votes. For any acceptor a, the pair ⟨maxVBal(a), maxVal(a)⟩
equals the pair ⟨b, v⟩ in the set votes(a) with the largest value of b. (Initially,
when votes(a) is the empty set, it equals ⟨−1, None⟩ for some special
value None.) Making maxVBal and maxVal variables rather than state
expressions makes it clear that they are the only information about what
messages have been sent that needs to be part of the acceptors' states.
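That functional description can be written down directly; here is a sketch (the operator name is mine), using the special value None mentioned above:

    MaxVote(a) == IF votes[a] = {}
                  THEN <<-1, None>>
                  ELSE CHOOSE p \in votes[a] :
                         \A q \in votes[a] : p[1] >= q[1]

so that ⟨maxVBal(a), maxVal(a)⟩ always equals MaxVote(a). The CHOOSE is well defined because an acceptor casts at most one vote per ballot.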
coordinator. However, the naive algorithm can fail to maintain its safety requirement
if two different computers believe they are the coordinator. If that happens
with Paxos, safety is preserved; the algorithm just fails to make progress.
An algorithm for choosing a coordinator in Paxos needs to work only most of
the time, a much easier problem to solve. One solution uses a synchronous
algorithm that implements consensus assuming known bounds on the times
needed to transmit and process messages. That algorithm chooses the coor-
dinator assuming values for those bounds that will be satisfied most of the
time.
imply actions of the form ⟨ALM⟩vLM. For proving that OB implies LMSafe,
we need only the weaker assertions obtained by replacing such an action by
ALM. However, we will need the stronger assertions later for proving that
OB implies LMLive.

R1. |= InvOB ∧ NcsOB(p) ⇒ NcsLM(p)
R2. |= InvOB ∧ WaitOB(p) ⇒ (vLM′ = vLM)
R3. |= InvOB ∧ W2OB(p) ⇒ if p = 0 then ⟨WaitLM(0)⟩vLM
                          else if xOB(0) then vLM′ = vLM
                          else ⟨WaitLM(1)⟩vLM
R4. |= InvOB ∧ W3OB(p) ⇒ (vLM′ = vLM)
R5. |= InvOB ∧ W4OB(p) ⇒ (vLM′ = vLM)
R6. |= InvOB ∧ CsOB(p) ⇒ ⟨CsLM(p)⟩vLM
R7. |= InvOB ∧ ExitOB(p) ⇒ ⟨ExitLM(p)⟩vLM

Assertion R3 is equivalent to these three assertions:

R3a. |= InvOB ∧ W2OB(0) ⇒ ⟨WaitLM(0)⟩vLM
R3b. |= InvOB ∧ W2OB(1) ∧ xOB(0) ⇒ (vLM′ = vLM)
R3c. |= InvOB ∧ W2OB(1) ∧ ¬xOB(0) ⇒ ⟨WaitLM(1)⟩vLM
All these assertions are proved by expanding the definitions of the actions
and of the refinement mapping. To see how this works, we consider R3a.
We haven't written the definitions of the actions corresponding to the pseudocode
statements of algorithms OB and LM. The definitions of W2OB(0)
and WaitLM(0), as well as the other relevant definitions, are in Figure 5.2.
Here is the proof of R3a.

definition of W2OB(0) and InvOB (which implies pcOB is a function with
domain {0, 1}) imply pcOB′(0) = cs. Hence semBarOB′ = 0, so semLM′ = 0.

4. Q.E.D.
Proof: Steps 2 and 3 and the definition of WaitLM(0) imply WaitLM(0).
Step 3 implies semLM′ ≠ semLM, which implies vLM′ ≠ vLM, proving the goal
⟨WaitLM(0)⟩vLM introduced by step 1.
How we decomposed the proof that OBSafe refines LMSafe into proving
R1–R7 was determined by the structure of NextOB as a disjunction of seven
subactions and knowing which disjuncts of NextLM each of those subactions
implements, which followed directly from the definition of the refinement
mapping. The decomposition of R3 into R3a–R3c followed from the structure
of R3. As illustrated by the proof of R3a, the proof of each of the
resulting nine formulas is reduced to ordinary mathematical reasoning by
expanding the appropriate definitions. The only place where not understanding
the algorithms could result in an error is in the definition of the invariant
InvOB or of the refinement mapping. Catching such an error requires
only careful reasoning about simple set theory and a tiny bit of arithmetic,
using elementary logic. Someday, computers should be very good at such
reasoning.
We prove (5.17) by finding an action BOB and state predicates POB and QOB
satisfying the following conditions:

To show that these conditions imply (5.17), we have to show that they imply
that in any behavior σ satisfying OBB, if □E⟨ALM⟩vLM is true of σ+m, then
σ(n) → σ(n + 1) is an ⟨ALM⟩vLM step for some n ≥ m. Condition A1.1
implies □QOB is true of σ+m, which by A1.2 implies □POB is true of σ+q for
some q ≥ m. By the definition of WF, conditions A2 imply σ(n) → σ(n + 1)
is a ⟨BOB⟩vOB step for some n ≥ q, and A3 implies that ⟨BOB⟩vOB step is
an ⟨ALM⟩vLM step.
Figure 5.3: Formulas BOB, POB, and QOB for the actions ALM, with p ∈ {0, 1}.
(For ExitLM(p), the figure gives BOB = ExitOB(p) and POB = QOB = (pcOB(p) = exit).)
The formulas BOB, POB, and QOB used for the six actions ALM are shown
in Figure 5.3. Condition A2.1 for the actions ALM follows easily from the
definitions of BOB and POB. To show that A2.2 is satisfied, we apply Theorem
4.7 to write OBFair as the conjunction of weak fairness of the actions
described by each process's statements other than its ncs statement. That
A3 is satisfied for the four actions ALM in Figure 5.3 follows from conditions
R3a, R3c, R6, and R7 of Section 5.4.2.

This leaves condition A1 for the actions. A1.1 is proved by using
the type-correctness invariant implied by InvLM to show that E⟨ALM⟩vLM
equals E(ALM), and then substituting pcBarOB for pcLM and semBarOB
for semLM in E(ALM). For our example, this actually shows that InvLM
implies E⟨ALM⟩vLM ≡ QOB for all the actions ALM. A1.2 is trivially satisfied
for CsLM(p) and ExitLM(p), since QOB and POB are equal. The interesting
conditions are A1.2 for WaitLM(0) and WaitLM(1). They are the kind of
leads-to property we saw how to prove in Section 4.2.5. In fact, we now
obtain a proof of A1.2 for WaitLM(0) by a simple modification of the proof
in Section 4.2.5.3 that OB implies:
Let's drop the subscript OB, so the variables in any formula whose name
has no subscript are the variables of OB. The proof of (5.18) is described
by the proof lattice of Figures 4.4 and 4.5. A □ formula in a label on a
box in a proof lattice means that the formula is conjoined to each formula
inside the box. Since F ↝ G implies (□H ∧ F) ↝ (□H ∧ G) for any F,
G, and H, we obtain a valid proof lattice (one whose leads-to assertions are
all true) by conjoining □InvLM ∧ OBFair ∧ □Q to the labels of the outer
boxes in the lattices of Figures 4.4 and 4.5. This makes those labels equal
to OBB ∧ □Q. Since Q implies pc(0) ∈ {wait, w2}, we obtain a valid
proof lattice by replacing the source node of the lattice in Figure 4.4 by
□Q. Moreover, since the new label's conjunct □Q implies □(pc(0) ≠ cs),
so it's impossible for pc(0) ever to equal cs, we can remove the sink node
pc(0) = cs and the edges to and from it from the lattice of Figure 4.5.³
Since the label on the inner box containing □¬x(1), which is the new sink
node, implies □(pc(0) = w2), we now have a valid proof lattice that shows:
1. □Q ⇒ □¬x(0)

1.1. □Q ⇒ □(pc(0) ∉ {wait, w2})
Proof: We proved in Section 4.2.5.3 that pc(0) ∈ {wait, w2} leads to
pc(0) = cs, and □Q implies □(pc(0) ≠ cs).

1.2. □Q ∧ □(pc(0) ∉ {wait, w2}) ⇒ □(pc(0) = ncs)
Proof: Q implies pc(0) ∉ {cs, exit}, which by Inv and pc(0) ∉ {wait, w2}
implies pc(0) = ncs.

1.3. Q.E.D.
Proof: By steps 1.1 and 1.2, since Inv ∧ Q imply pc(0) = ncs, and Inv
and pc(0) = ncs imply ¬x(0).

2. □Q ∧ □¬x(0) ↝ □P

2.1. □Q ∧ □¬x(0) ↝ (pc(1) = w2)
Proof: Q implies pc(1) ∈ {wait, w2, w3, w4}, and a straightforward
proof using fairness of PNext(1) and □¬x(0) shows

    (pc(1) ∈ {wait, w2, w3, w4}) ↝ (pc(1) = w2)

2.2. □Q ∧ □¬x(0) ∧ (pc(1) = w2) ⇒ □(pc(1) = w2)
Proof: □Q implies □(pc(1) ≠ cs), and (pc(1) = w2) ∧ □[Next]ᵥ ∧
□(pc(1) ≠ cs) implies □(pc(1) = w2).
³Equivalently, we can remove edge 8 and add an edge from pc(0) = cs to false and
an edge from false to □¬x(1), since false implies anything.
2.3. Q.E.D.
Proof: Steps 2.1 and 2.2 imply □Q ∧ □¬x(0) ↝ □(pc(1) = w2), and
□P equals □(pc(1) = w2) ∧ □¬x(0).

3. Q.E.D.
Proof: By steps 1 and 2,
equals

    (y − z) = x + (sym awith x′ ← y − z)

If sym ≜ √(2 ∗ x′), then this equals

    (y − z) = x + √(2 ∗ (y − z))
Now let A be an action and let x₁, …, xₙ be all the variables that
appear in A. We can then write E as:

(5.19)  E(A) ≜ ∃ c₁, …, cₙ : (A awith x₁′ ← c₁, …, xₙ′ ← cₙ)
5.4.4.2 Computing E

The syntactic definition (5.19) of E immediately provides rules for writing
E(A) in terms of formulas E(Bᵢ), for Bᵢ subactions of A. From the rule

    |= (∃ c : A ∨ B) ≡ (∃ c : A) ∨ (∃ c : B)

we have

E1. |= E(A ∨ B) ≡ E(A) ∨ E(B)

For example, in program LM defined in Section 4.2.6.1, the next-state action
PNext(p) is the disjunction of actions Ncs(p), Wait(p), Cs(p), and Exit(p).
Therefore, rule E1 implies

    E(PNext(p)) ≡ E(Ncs(p)) ∨ E(Wait(p)) ∨ E(Cs(p)) ∨ E(Exit(p))

The generalization of E1 is:

E2. |= E(∃ i ∈ S : Aᵢ) ≡ ∃ i ∈ S : E(Aᵢ)

where S is a constant or state expression.

Another rule of existential quantification is that if the constant c does
not occur in A, then ∃ c : (A ∧ B) is equivalent to A ∧ (∃ c : B). From this
we deduce:

E3. If no variable appears primed in both A and B, then |= E(A ∧ B) ≡
E(A) ∧ E(B).
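For example, applying E3 and (5.19) to the action Wait(p) of LM sketched earlier, whose only primed conjuncts are sem′ = 0 and the pc′ conjunct, one might compute its enabling predicate as follows (this worked instance is mine, not the book's):

    E(Wait(p)) ≡ E(pc(p) = wait) ∧ E(sem = 1) ∧ E(sem′ = 0) ∧ E(pc′ = …)
               ≡ (pc(p) = wait) ∧ (sem = 1) ∧ true ∧ true
               ≡ (pc(p) = wait) ∧ (sem = 1)

since E of a predicate containing no primed variables is the predicate itself, and by (5.19) E(sem′ = 0) equals ∃ c : (c = 0), which equals true.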
The first asserts that substitution distributes over ∨; the second asserts
that substitution distributes over □; and the third asserts that substitution
distributes over the construct [ … ]… .

We expect substitution to distribute in this way over all mathematical
operators. So, writing Ā for the action obtained by applying the substitution
to A, we would expect the result of substituting in E(A) to equal E(Ā),
for any action A. In fact, they are equal for most actions encountered in practice.
But here's an action A for which they aren't, for the refinement mapping of (5.21):

    A ≜ ∧ pc′ = (p ∈ {0, 1} ↦ wait)
        ∧ sem′ = 0

Rules E3 and E5 imply that E(A) equals true, so the result of substituting
in E(A) equals true. By definition of the refinement mapping:

    Ā ≜ ∧ pcBar′ = (p ∈ {0, 1} ↦ wait)
        ∧ semBar′ = 0

Ā implies pcBar′(p) = wait for p ∈ {0, 1}. By definition of pcBar, this
implies:

Both (1) and (2) can't be true, so Ā must equal false and thus E(Ā)
equals false. Therefore, E(Ā) does not equal the result of substituting in
E(A), so substitution does not always distribute over E.

The reason substitution doesn't distribute over E is that E(Ā) performs
the substitutions pc ← pcBar and sem ← semBar for the primed variables
pc′ and sem′. However, as we see from (5.19), those primed variables do not
occur in E(A); they are replaced by bound constants. The substitutions
should be performed only on the unprimed variables. Therefore:
Instead, it equals
5.5 A Warning
We have defined correctness of a program S to mean |= S ⇒ P for some
property P . We have to be careful to make sure that we have chosen P so
that this implies what we really want correctness of the program to mean.
As discussed in Section 4.3, we have to be concerned with the accuracy of P .
When correctness asserts that S refines a program T , the property P
is (T with . . . ) for a refinement mapping “. . .”. That refinement mapping
is as important a part of the property as the program T , and it must be
examined just as carefully to be sure that proving refinement means what
you want it to. As an extreme example, OB also implements LM under this
refinement mapping:
occur. In such a case, it’s a good idea to make sure that S refines T when
fairness requirements are added to those actions in both programs. This is
an application of the general idea of adding fairness to verify possibility that
was introduced in Section 4.3.2.
Chapter 6
Auxiliary Variables
    Sₓ ≜ (x = 0) ∧ □[x′ = x + 1]ₓ
    |= F ⇒ G  implies  |= (∃ v : F) ⇒ G

    |= (∃ y₁, …, yₖ : F) ⇒ G

(6.2)  |= G ⇒ ∃ y₁, …, yₖ : F

Combining (1) and (2), we see that to prove that one program of the
form (6.1) implements another program of that form, we have to prove an
assertion of the form

(6.3)  |= T ⇒ (S with y₁ ← exp₁, …, yₖ ← expₖ)
where S and T have the standard form Init ∧ □[Next]ᵥ ∧ L of an abstract
program and the yᵢ are the internal variables in S. For every interface
variable x of S, which in practice must also occur in T, the with clause
includes an implicit substitution x ← x that substitutes the variable x of
T for the variable x of S. Thus, the with clause describes a refinement
mapping under which T refines S.

This raises a question: If (6.2) is true, do there always exist expressions
expᵢ for which (6.3) is true? The answer is no, if we can use only the variables
that appear in T to define the refinement mapping. If S has the form
Init ∧ □[Next]ᵥ ∧ L, then the answer is yes if we're allowed to add auxiliary
variables to T. Adding an auxiliary variable a (which does not occur in T)
to T means writing a formula Tᵃ such that ∃ a : Tᵃ is equivalent to T. By
this equivalence, we can verify |= T ⇒ S by verifying |= (∃ a : Tᵃ) ⇒ S. By
the ∃ Elimination rule, we do this by verifying |= Tᵃ ⇒ S. And to verify
this, we can use a as well as the variables of T to define the refinement
mapping. Auxiliary variables are the main topic of this chapter and are
discussed below.
property even if F is. For example, let F be the following formula, where
⌊r⌋ is the largest integer less than or equal to r:

(6.7)  ∧ (x = 0) ∧ (y ∈ N)
       ∧ □[(y > 0) ∧ (x′ = x + 1) ∧ (y′ = y − 1)]⟨x,y⟩

In a behavior satisfying this formula, x cannot be incremented forever because
eventually y would equal 0, making any further non-stuttering steps
impossible. Therefore, formula ∃ y : F is equivalent to

To see that this is not a safety property, remember that a behavior σ satisfies
a safety property iff every finite prefix of σ satisfies that property. Consider
a behavior σ in which x does keep being incremented forever. Every finite
prefix of σ satisfies (6.8), since completing the prefix with stuttering steps
makes the behavior satisfy the liveness property ◇□[x′ = x]ₓ. However, σ
doesn't satisfy (6.8) because it doesn't satisfy this liveness property. Therefore,
even though formula F, defined to equal (6.7), is a safety property,
formula ∃ y : F, which is equivalent to (6.8), is not a safety property.
    InitS ≜ (inp = rdy) ∧ (avg = 0) ∧ (seq = ⟨ ⟩)

    User ≜ ∧ inp = rdy
           ∧ inp′ ∈ R
           ∧ (avg′ = avg) ∧ (seq′ = seq)

    Syst ≜ ∧ inp ∈ R
           ∧ seq′ = Append(seq, inp)
           ∧ avg′ = SeqSum(seq′) / Len(seq′)
           ∧ inp′ = rdy

    NextS ≜ User ∨ Syst

    IS ≜ InitS ∧ □[NextS]⟨inp,avg,seq⟩

    S ≜ ∃ seq : IS
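The operator SeqSum used in Syst is defined outside this excerpt; a recursive TLA+ definition matching its evident meaning (the sum of the elements of a numeric sequence) would be:

    RECURSIVE SeqSum(_)
    SeqSum(s) == IF s = << >> THEN 0
                 ELSE Head(s) + SeqSum(Tail(s))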
where Initʰ and Nextʰ are obtained by augmenting Init and Next to describe,
respectively, the initial value of h and how h can change; and vₕ is
the tuple v ◦ ⟨h⟩ of the variables of v and the variable h. Since h does not
appear in L, the formula ∃ h : Tʰ equals
Using the internal variable seq to write the behavior predicate S is arguably
the clearest way to describe the values assumed by the interface variables inp
and avg. It’s a natural way to explain that the value of avg is the average of
the values that have been input. However, it’s not a good way to describe
how to implement the system. There’s no need for an implementation to re-
member the entire sequence of past inputs; it can just remember the number
of inputs and their sum. In fact, it doesn’t even need an internal variable to
remember the sum. We can implement it with an abstract program T that
implements S using only a single internal variable num whose value is the
number of inputs that the user has entered.
We first describe T in pseudocode and construct Tʰ by adding a history
variable h to the code. The TLA translations of the pseudocode show how
to add a history variable to an abstract program described in TLA.
It’s natural to think of the user and the system in this example as two
separate processes. However, the abstract programs S and T are predicates
on behaviors, which are mathematical objects. Process is not a mathemat-
ical concept; it’s a way in which we interpret predicates on behaviors. For
simplicity, we write T as a single-process program.
The pseudocode for program T is in Figure 6.2. It uses the operator :∈
introduced in Figure 4.8, so statement usr sets inp to an arbitrary element
of R. Since we’re not concerned with implementing T , there’s no reason to
hide its internal variable num.
Because the sum of n numbers whose average is a is n ∗ a, it should be
clear that program T implements program S . But showing that T imple-
ments S requires defining a refinement mapping under which T implements
IS (program S without variable seq hidden). And that requires adding an
auxiliary variable that records the sequence of input values. Adding the
required auxiliary variable h is simple and obvious. We just add the two
pieces of code shown in black in Figure 6.3.
It is a straightforward exercise to prove

    T ≜ Init ∧ □[Next]⟨inp,avg,num⟩   where Next ≜ Usr ∨ Sys

Actions Usr and Sys are the actions executed from control points usr and
sys, respectively. The TLA translation of the code in Figure 6.3 is

    Tʰ ≜ Initʰ ∧ □[Nextʰ]⟨inp,avg,num,h⟩

    where Initʰ ≜ Init ∧ (h = ⟨ ⟩)
          Nextʰ ≜ Usrʰ ∨ Sysʰ
          Usrʰ  ≜ Usr ∧ (h′ = h)
          Sysʰ  ≜ Sys ∧ (h′ = Append(h, inp))
Here is the general result that describes how to add a history variable to a
program. Its proof is a simple generalization of the proof for our example.

• exp is a state expression that does not contain the variable h, and the
expᵢ are step expressions that do not contain h′,

then |= T ≡ ∃ h : Tʰ.
As explained in Section 4.2.7, the standard form for the liveness condition
of a program is the conjunction of weak and/or strong fairness conditions
of subactions of its next-state action. Even if T ∧ L has this form, Tʰ ∧ L
will not, because a subaction of Next will not be a subaction of Nextʰ. (An
action that does not mention h cannot imply Nextʰ.) This means that we
can't apply Theorem 4.6 to show that ⟨Tʰ, L⟩ is machine closed. However,
we can show as follows that if ⟨T, L⟩ is machine closed, then ⟨Tʰ, L⟩ is also
machine closed. By definition of machine closure, this means showing that
any finite behavior ρ satisfying Tʰ can be extended to an infinite behavior
satisfying Tʰ ∧ L. Since Tʰ implies T, machine closure of ⟨T, L⟩ implies ρ
can be extended to a behavior ρ ◦ σ satisfying T ∧ L. By definition of Tʰ,
we can modify the values of h in the states of σ to obtain a behavior τ such
that ρ ◦ τ satisfies Tʰ. Since the truth of L does not depend on the values
of h, the behavior ρ ◦ τ also satisfies L, as required.
When using TLA, the fact that L will contain fairness conditions on
actions that are not subactions of Nextʰ makes no difference. However, not
everyone uses TLA. In some approaches, abstract programs are described
in something like a coding language, and they define fairness only in terms
of weak and strong fairness of subactions of the next-state action. So, it is
interesting to know if we can replace a fairness condition on a subaction Bᵢ
of T with the same fairness condition on a corresponding subaction Bᵢʰ of
Tʰ. We can, under the following condition, which is likely to be satisfied
by programs written in those other languages: the next-state action of T
must be the disjunction of actions Aᵢ, and each Bᵢ must be a subaction of
Aᵢ such that a Bᵢ step is not an Aⱼ step for j ≠ i. The precise result is
the following, whose proof is in the Appendix. In this theorem, letting Bᵢ
equal false is equivalent to omitting that fairness condition, because weak
and strong fairness of false are trivially true. (The action false is never
enabled, so (4.23) implies SFᵥ(false) equals □◇false ⇒ □◇false, which
equals true.)

Theorem 6.2  With the assumptions of Theorem 6.1, for all i ∈ I let Bᵢ
be a subaction of Aᵢ such that T ∧ (i ≠ j) ⇒ □[¬(Bᵢ ∧ Aⱼ)]ᵥ for all j in
I; and let Bᵢʰ ≜ ⟨Bᵢ⟩ᵥ ∧ (h′ = expᵢ). Then
Theorem 6.3  Let T equal Init ∧ □[Next]⟨x⟩ where x is the list of all variables
of S; let F be a safety property such that F(σ) depends only on the
values of the variables x in σ, for any behavior σ; and let h be a variable not
one of the variables x. We can add h as a history variable to T to obtain
Tʰ and define a state predicate I_F in terms of F such that |= [[T]] ⇒ F is
true iff I_F is an invariant of Tʰ.

Theorem 6.3 assumes only that F is a safety property. This might suggest
we can show that one program satisfies the safety part of another program by
verifying an invariance property. However, I have never seen this done, and
in practice it seems unlikely to be possible to describe any but the simplest
abstract programs with an invariant.
Cen1 ≜ ∃ aw : ICen1
ICen1 ≜ Init ∧ □[Next1]v

v ≜ ⟨inp, disp, aw⟩

Init ≜ ∧ inp = NotArt
       ∧ aw = ⟨ ⟩
       ∧ disp ∈ Art × {0, 1}

Next1 ≜ Input ∨ DispOrNot ∨ Ack

Input ≜ ∧ (inp = NotArt) ∧ (aw = ⟨ ⟩)
        ∧ inp′ ∈ Art
        ∧ aw′ = ⟨inp′⟩
        ∧ disp′ = disp

DispOrNot ≜ ∧ aw ≠ ⟨ ⟩
            ∧ ∨ disp′ = ⟨aw(1), 1 − disp(2)⟩
              ∨ disp′ = disp
            ∧ aw′ = ⟨ ⟩
            ∧ inp′ = inp

Ack ≜ ∧ (inp ∈ Art) ∧ (aw = ⟨ ⟩)
      ∧ inp′ = NotArt
      ∧ (aw′ = aw) ∧ (disp′ = disp)

Figure 6.4: The program Cen1.
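For readers who want to experiment, Figure 6.4 might be transcribed into
TLA+ roughly as follows, so that ICen1 can be checked with the TLC model
checker. The module framing and the ASSUME are ours; Art and NotArt
must be instantiated in a model.

   ---- MODULE Cen1 ----
   EXTENDS Naturals, Sequences
   CONSTANTS Art, NotArt
   ASSUME NotArt \notin Art
   VARIABLES inp, disp, aw

   v == <<inp, disp, aw>>

   Init == /\ inp = NotArt
           /\ aw = << >>
           /\ disp \in Art \X {0, 1}

   Input == /\ (inp = NotArt) /\ (aw = << >>)
            /\ inp' \in Art
            /\ aw' = <<inp'>>
            /\ disp' = disp

   DispOrNot == /\ aw # << >>
                /\ \/ disp' = <<aw[1], 1 - disp[2]>>
                   \/ disp' = disp
                /\ aw' = << >>
                /\ inp' = inp

   Ack == /\ (inp \in Art) /\ (aw = << >>)
          /\ inp' = NotArt
          /\ (aw' = aw) /\ (disp' = disp)

   Next1 == Input \/ DispOrNot \/ Ack
   ICen1 == Init /\ [][Next1]_v
   ====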
Cen2 ≜ ∃ aw : ICen2
ICen2 ≜ Init ∧ □[Next2]v

Next2 ≜ InpOrNot ∨ Display ∨ Ack

InpOrNot ≜ ∧ (inp = NotArt) ∧ (aw = ⟨ ⟩)
           ∧ inp′ ∈ Art
           ∧ ∨ aw′ = ⟨inp′⟩
             ∨ aw′ = aw
           ∧ disp′ = disp

Display ≜ ∧ aw ≠ ⟨ ⟩
          ∧ disp′ = ⟨aw(1), 1 − disp(2)⟩
          ∧ aw′ = ⟨ ⟩
          ∧ inp′ = inp

Figure 6.5: The program Cen2.
that the formulas Cen1 and Cen2 are equivalent, so they describe the same
abstract program.
To show that the two definitions are equivalent, we have to show that
ICen1 and ICen2 each implement the other under a suitable refinement map-
ping. We will see here how to define the refinement mapping under which
ICen2 implements ICen1. Section 6.4 shows how to define the refinement
mapping under which ICen1 implements ICen2.
Ignoring the value of s, the behaviors satisfying T^s are the same as behaviors
satisfying T, except each A step in a behavior of T is followed in T^s by a
finite number (possibly 0) of steps that leave the variables of T unchanged.
Therefore, by stuttering insensitivity, T and ∃ s : T^s are satisfied by the
same sets of behaviors, so they are equivalent.

To show that ICen2 implements ICen1, we define ICen2^s in this way,
where A equals InpOrNot and the Bi are Ack and Display. In the definition
of InpOrNot^s, we let:

   exp ≜ if aw′ = ⟨ ⟩ then 1 else 0
The proof of (6.11) is similar to, but simpler than, the refinement proof
sketched in Section 5.4.2. Here, we give only the briefest outline of a proof,
to present results that will be used below when discussing liveness.

Let's abbreviate (F with aw ← awBar) by F̄ for any formula F, so we
must prove |= ICen2^s ⇒ ĪCen1. The proof of |= Init^s ⇒ Īnit is trivial, since
s = 0 implies v̄ = v by definition of awBar, so Init^s implies Īnit = Init.

The main part of the proof is proving (6.12). This is proved by proving
assertions C1–C4 below, which are the analogs of assertions R1–R7 of the
proof in Section 5.4.2. Again, assertions containing actions of the form ⟨Ā⟩v̄
are proved for use in reasoning about liveness when a weaker assertion
containing Ā suffices to prove (6.12). Two of the assertions require an
invariant Inv2^s of ICen2^s. That invariant needs to assert type correctness
of disp (for C3) and that s = 1 implies aw = ⟨ ⟩ (for C2).
if s = ⟨ ⟩ then aw else s
In general, we can let s assume values in any set with a well-founded relation,
defined in Section 2.2.6.2 to be a set with an ordering relation in which any
sequence of decreasing elements must reach a minimal element. We just
require that every added stuttering step decreases the value of s.

One use of this generality is for adding stuttering steps after multiple
actions. To do this, we let the value of s be a pair ⟨m, i⟩, where m is the
number of remaining stuttering steps and i identifies the action. We define
the well-founded ordering on this set of pairs by letting ⟨m, i⟩ ≻ ⟨n, j⟩
iff m > n. We can use this same trick to let the value of s be a tuple
with additional components. Information in those other components can
be used in defining the refinement mapping so it makes the stuttering steps
implement the appropriate steps of the higher-level program. For simplicity,
we state our theorem just for this particular use of a well-founded order.
However, the conjunct s(2) = i in the definition of Ai^s is added to ensure
that only Ai^s performs stuttering steps added after Ai, although that matters
only if s contains additional components that depend on i.
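As a small illustration, the pair-valued stuttering variable might look as
follows in TLA+; the operator names A(i), Steps(i), and vars are our
assumptions, not the book's definitions.

   (* A sketch: s = <<m, i>> holds the number m of stuttering steps     *)
   (* still to be added after an A(i) step; every added stuttering step *)
   (* decreases s in the ordering  <<m, i>> \succ <<n, j>>  iff  m > n. *)
   AS(i)      == /\ s[1] = 0                \* no stuttering steps pending
                 /\ A(i)                    \* a step of the original action
                 /\ s' = <<Steps(i), i>>    \* schedule Steps(i) added steps
   Stutter(i) == /\ s[2] = i                \* only A_i's stuttering steps
                 /\ s[1] > 0
                 /\ s' = <<s[1] - 1, i>>    \* the counter decreases
                 /\ UNCHANGED vars          \* variables of T unchanged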
Theorem 6.4's conclusion is that ∃ s : T^s equals T.

The theorem does not assume that the actions Ai and B are mutually dis-
joint. A step could be both an Ai and an Aj step for i ≠ j, or both an Ai
and a B step. That should rarely be the case when applying the theorem,
since it allows a nondeterministic choice of how many stuttering steps (if
any) are added in some states. The action B will usually be the disjunc-
tion of actions Bj. In that case, B^s equals the disjunction of the actions
(s(1) = 0) ∧ Bj ∧ (s′ = s).
is rejected or after a Display step that WFv(Display) implies must occur.
When Ack is enabled, WFv(Ack) implies that an Ack step must occur.
For IC2^s to implement IC1 under a refinement mapping, it should ensure
that an input step is eventually followed by an Ack^s step. In IC2^s, an input
is entered by an (s = 0) ∧ InpOrNot^s step. We must show that such a step
is eventually followed by an Ack^s step. This appears problematic because if
the step rejects the input, then it sets s to 1, in which case the only enabled
action of Next2^s is (s = 1) ∧ InpOrNot^s; and L2 asserts no fairness condition
for that action. To show that the (s = 0) ∧ InpOrNot^s step must be followed
by an Ack^s step, we first show as follows that ∃ s : IC2^s is equivalent to IC2:

   ∃ s : IC2^s ≡ ∃ s : ICen2^s ∧ L2    By definition of IC2^s.
              ≡ (∃ s : ICen2^s) ∧ L2   Because s does not occur in L2.
              ≡ ICen2 ∧ L2             By Theorem 6.4.
              ≡ IC2                    By definition of IC2.
4. Case: ◇□(aw ≠ ⟨ ⟩)
   Proof: Since aw ≠ ⟨ ⟩ equals E⟨Display⟩v, the case assumption and
   WFv(Display) imply that, when □(aw ≠ ⟨ ⟩) becomes true, a ⟨Display⟩v
   step eventually occurs, and IIS^s implies that this step must be a Display^s
   step. By C3, this Display^s step is a ⟨DispOrNot̄⟩v̄ step, which implies the
   goal introduced by step 1.
5. Case: □(s ≠ 0)
   Proof: The case assumption and the assumption □Inv2^s imply □(s = 1).
   As shown above in the explanation of why a behavior of IC2^s can't halt
   in a state with s = 1, the property WFv(Ack) implies that, in such a
   state, an (s = 1) ∧ InpOrNot^s step must eventually occur. By C2, that is
   the ⟨DispOrNot̄⟩v̄ step that proves the step 1 goal.
6. Q.E.D.
   Proof: Step 3 implies that the step 4 and 5 cases are exhaustive.
The proof of (6.14b) is similar but simpler, since it doesn't have the
complication of deducing from fairness of one action (Ack^s) that a step of
another action (DispOrNot^s) of the same program must occur.
Theorem 6.2 shows how, after adding a history variable to a program,
we can rewrite the program’s fairness properties as fairness conditions of
subactions of the modified program’s next-state action. I don’t know if
there is a similar result for stuttering variables. Theorem 6.2 is relevant
to methods other than TLA for describing abstract programs. Those other
methods that I’m aware of do not assume stuttering insensitivity, so a similar
result for stuttering variables seems to be of no interest.
However, this is impossible for the following reason. Because the refinement
mapping substitutes the variables inp and disp of ICen1 for the correspond-
ing variables of ICen2, an Input step of ICen1 must implement an InpOrNot
step of ICen2. Besides choosing the input, the InpOrNot action of ICen2
also decides whether or not that input is to be displayed, recording its de-
cision in the value of aw. However, that decision is made by ICen1 later,
when executing the DispOrNot action. Immediately after the Input action,
there's no information in the state of ICen1 to determine what the value of
variable aw of ICen2 should be.

The solution to this problem is to have the Input action guess what
DispOrNot will do, indicating its guess by setting a prophecy variable p to
a value that predicts whether the input will be displayed or rejected by the
DispOrNot step.
To make the generalization from this example more obvious, let's write
action DispOrNot of ICen1 as the disjunction of two actions: DorN_Yes that
displays the input and DorN_No that doesn't. Remember that:

   DispOrNot ≜ ∧ . . .
               ∧ ∨ disp′ = ⟨aw(1), 1 − disp(2)⟩
                 ∨ disp′ = disp
               ⋮

We can define DorN_i, for i = Yes and i = No, by modifying the definition
of DispOrNot to get:

   DorN_i ≜ ∧ . . .
            ∧ ∨ (i = Yes) ∧ (disp′ = ⟨aw(1), 1 − disp(2)⟩)
              ∨ (i = No) ∧ (disp′ = disp)
            ⋮
We then replace DispOrNot in ICen1 by ∃ i ∈ Π : DorN_i, where Π equals
{Yes, No}. We can then add to ICen2 an auxiliary variable p, called a
prophecy variable, to obtain a formula ICen2^p in which the Input action is
replaced by

   Input^p ≜ Input ∧ (p′ ∈ Π)

   v^p ≜ v ◦ ⟨p⟩ ,

   Init^p ≜ Init ∧ (p ∈ Π) ,

   Next^p ≜ (A^p ∧ (p′ ∈ Π)) ∨ (∃ j ∈ J : Bj ∧ Cj) ,

   (p = None) ⇒ ¬E(∃ i ∈ Π : Ai)
the concepts. A very general definition has been described elsewhere for
expert TLA+ users [37].
CenSeq1 ≜ ∃ aw : ICenSeq1
ICenSeq1 ≜ InitSeq ∧ □[NextSeq1]v

v ≜ ⟨inp, disp, aw⟩

InitSeq ≜ ∧ inp = NotArt
          ∧ aw = ⟨ ⟩
          ∧ disp ∈ Art × {0, 1}

NextSeq1 ≜ InputSeq ∨ DispOrNotSeq ∨ AckSeq

InputSeq ≜ ∧ inp = NotArt
           ∧ inp′ ∈ Art
           ∧ aw′ = Append(aw, inp′)
           ∧ disp′ = disp

DispOrNotSeq ≜ ∧ aw ≠ ⟨ ⟩
               ∧ ∨ disp′ = ⟨aw(1), 1 − disp(2)⟩
                 ∨ disp′ = disp
               ∧ aw′ = Tail(aw)
               ∧ inp′ = inp

AckSeq ≜ ∧ inp ∈ Art
         ∧ inp′ = NotArt
         ∧ (aw′ = aw) ∧ (disp′ = disp)

Figure 6.6: The program CenSeq1.
CenSeq2 ≜ ∃ aw : ICenSeq2
ICenSeq2 ≜ InitSeq ∧ □[NextSeq2]v

NextSeq2 ≜ InpOrNotSeq ∨ DisplaySeq ∨ AckSeq

InpOrNotSeq ≜ ∧ inp = NotArt
              ∧ inp′ ∈ Art
              ∧ ∨ aw′ = Append(aw, inp′)
                ∨ aw′ = aw
              ∧ disp′ = disp

DisplaySeq ≜ ∧ aw ≠ ⟨ ⟩
             ∧ disp′ = ⟨aw(1), 1 − disp(2)⟩
             ∧ aw′ = Tail(aw)
             ∧ inp′ = inp

Figure 6.7: The program CenSeq2.
where DorNSeq_Yes displays the input and DorNSeq_No rejects it. The defi-
nition of DorNSeq_i is obtained by modifying DispOrNotSeq the same way
we modified DispOrNot to obtain DorN_i for ICen1. We can then define:

Note that having DispOrNotSeq^p set p′ to Tail(p) ensures that every pre-
diction is used only once. Since AckSeq^p neither makes nor satisfies a pre-
diction, we define:

   AckSeq^p ≜ AckSeq ∧ (p′ = p)

where

   NextSeq1^p ≜ InputSeq^p ∨ DispOrNotSeq^p ∨ AckSeq^p
assumes that wsq and ysq are sequences with the same length. Here is the
recursive definition. (Recall that “◦” is concatenation of sequences.)

   OnlyYes(wsq, ysq) ≜
      if wsq = ⟨ ⟩ then ⟨ ⟩
      else ( if Head(ysq) = Yes then ⟨Head(wsq)⟩
             else ⟨ ⟩ )
           ◦ OnlyYes(Tail(wsq), Tail(ysq))
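In TLA+, where recursive operators must be declared, this definition might
be transcribed as follows (the transcription is ours):

   EXTENDS Sequences
   CONSTANTS Yes, No

   RECURSIVE OnlyYes(_, _)
   (* The subsequence of wsq whose corresponding prediction in ysq is Yes. *)
   OnlyYes(wsq, ysq) ==
     IF wsq = << >>
       THEN << >>
       ELSE (IF Head(ysq) = Yes THEN <<Head(wsq)>> ELSE << >>)
              \o OnlyYes(Tail(wsq), Tail(ysq))

For example, OnlyYes(⟨u, v, w⟩, ⟨Yes, No, Yes⟩) equals ⟨u, w⟩.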
(p = ⟨ ⟩) ⇒ ¬E(∃ i ∈ Π : Ai)
CenSet1 ≜ ∃ aw : ICenSet1
ICenSet1 ≜ InitSet ∧ □[NextSet1]v

v ≜ ⟨inp, disp, aw, old⟩

InitSet ≜ ∧ inp = NotArt
          ∧ aw = { }
          ∧ disp ∈ Art × {0, 1}
          ∧ old = { }

NextSet1 ≜ InputSet ∨ DispOrNotSet ∨ AckSet

InputSet ≜ ∧ inp = NotArt
           ∧ inp′ ∈ Art \ old
           ∧ aw′ = aw ∪ {inp′}
           ∧ (disp′ = disp) ∧ (old′ = old ∪ {inp′})

DispOrNotSet ≜ ∃ w ∈ aw :
                 ∧ ∨ disp′ = ⟨w, 1 − disp(2)⟩
                   ∨ disp′ = disp
                 ∧ aw′ = aw \ {w}
                 ∧ (inp′ = inp) ∧ (old′ = old)

AckSet ≜ ∧ inp ∈ Art
         ∧ inp′ = NotArt
         ∧ (aw′ = aw) ∧ (disp′ = disp) ∧ (old′ = old)

Figure 6.8: The program CenSet1.
The InputSet^p action must add a prediction of whether or not the picture
inp′ that it adds to aw will be displayed. Thus, it must assert that p′ is
the function obtained from p by adding inp′ to its domain and letting the
value of p′(inp′) be either element of Π. To write that action, let's define
FcnPlus(f, w, d) to be the function obtained from a function f by adding an
element w to its domain and letting that function map w to d. The domain
of f is written domain(f), so the definition is:

   FcnPlus(f, w, d) ≜
      x ∈ {w} ∪ domain(f) ↦ if x = w then d else f(x)

We can then define InputSet^p using FcnPlus.
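In TLA+, where domain(f) is written DOMAIN f, FcnPlus is a direct
transcription; the operator InputSetP below is only our guess at the shape
of the resulting action, since the book's definition is not reproduced here
(Pi stands for the set Π of predictions).

   FcnPlus(f, w, d) ==
     [x \in {w} \cup DOMAIN f |-> IF x = w THEN d ELSE f[x]]

   (* A guessed sketch: InputSet plus a prediction for the new input inp'. *)
   InputSetP == /\ InputSet
                /\ \E d \in Pi : p' = FcnPlus(p, inp', d)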
Theorem 6.6 Let x, y, and z be lists of variables, all distinct from one
another; let the variables of T be x and z and the variables of IS be x and
y; and let T equal Init ∧ □[Next]⟨x,z⟩ ∧ L. Let the operator Φ map behaviors
satisfying T to behaviors satisfying IS such that Φ(σ) ∼y σ. By adding
history, stuttering, and prophecy variables to T, we can define a formula
T^a such that ∃ a : T^a is equivalent to T and a list exp of expressions defined
in terms of Φ and the variables of T^a such that

   |= T^a ⇒ (IS with y ← exp)
values of all the tuples ⟨x, z, t⟩ in all the states reached thus far, including
the current one. We then add a prophecy sequence variable p that predicts
the infinite sequence of all future values of ⟨x, z, t⟩. This means that in all
states of the behavior, the value of h ◦ p is the entire sequence of values
of ⟨x, z, t⟩ in the complete (infinite) behavior. Moreover, the length of h
indicates the position of the current state in that behavior. The values of h
and p and the mapping Φ provide all the information needed to determine
the values to substitute for y to obtain a refinement mapping under which
IS is simulated. The proof in the Appendix sketches the details.

The theorem shows that these auxiliary variables are, in principle, all
we need to define a refinement mapping. It and its proof do not tell us how
refinement mappings are defined in practice.
Fifo ≜ ∃ queue, enqInner, deqInner : IFifo
IFifo ≜ Init ∧ □[Next]v

v ≜ ⟨enq, deq, queue, enqInner, deqInner⟩

Init ≜ ∧ enq = (e ∈ EnQers ↦ Done)
       ∧ deq ∈ (DeQers → Data)
       ∧ queue = ⟨ ⟩
       ∧ enqInner = (e ∈ EnQers ↦ Done)
       ∧ deqInner = deq

Next ≜ ∨ ∃ e ∈ EnQers : BeginEnq(e) ∨ DoEnq(e) ∨ EndEnq(e)
       ∨ ∃ d ∈ DeQers : BeginDeq(d) ∨ DoDeq(d) ∨ EndDeq(d)

BeginEnq(e) ≜ ∧ enq(e) = Done
              ∧ ∃ D ∈ Data : enq′ = (enq except e ↦ D)
              ∧ enqInner′ = (enqInner except e ↦ Busy)
              ∧ unchanged ⟨deq, queue, deqInner⟩

DoEnq(e) ≜ ∧ enqInner(e) = Busy
           ∧ queue′ = Append(queue, enq(e))
           ∧ enqInner′ = (enqInner except e ↦ Done)
           ∧ unchanged ⟨deq, enq, deqInner⟩

EndEnq(e) ≜ ∧ enq(e) ≠ Done
            ∧ enqInner(e) = Done
            ∧ enq′ = (enq except e ↦ Done)
            ∧ unchanged ⟨deq, queue, enqInner, deqInner⟩

BeginDeq(d) ≜ ∧ deq(d) ≠ Busy
              ∧ deq′ = (deq except d ↦ Busy)
              ∧ deqInner′ = (deqInner except d ↦ NoData)
              ∧ unchanged ⟨enq, queue, enqInner⟩

DoDeq(d) ≜ ∧ deq(d) = Busy
           ∧ deqInner(d) = NoData
           ∧ queue ≠ ⟨ ⟩
           ∧ deqInner′ = (deqInner except d ↦ Head(queue))
           ∧ queue′ = Tail(queue)
           ∧ unchanged ⟨enq, deq, enqInner⟩

EndDeq(d) ≜ ∧ deq(d) = Busy
            ∧ deqInner(d) ≠ NoData
            ∧ deq′ = (deq except d ↦ deqInner(d))
            ∧ unchanged ⟨enq, queue, enqInner, deqInner⟩

Figure 6.9: The program Fifo.
implements Fifo requires adding a prophecy variable that predicts the order
in which data items enqueued by concurrent enqueue operations will be
dequeued.
What is encoded in the state of their algorithm is not a linearly ordered
queue of enqueued data values, but rather a partial order on the set of
enqueued values that indicates the possible orders in which the values can
be returned by dequeue operations. A partial order on a set S is a relation ≺
on S that is transitive and has no cycles (which implies a ⊀ a for any a ∈ S).
For the partial ordering ≺ on the set of enqueued values, the relation u ≺ w
means that value u must be dequeued before value w. Program IFifo is the
special case in which that partial order is a total order, meaning that either
u ≺ w or w ≺ u for any two distinct enqueued values u and w.
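In TLA+ the "transitive and acyclic" condition can be written as a predicate
on the set-of-pairs representation of ≺ introduced below; this operator is our
illustration, not part of the book's programs.

   (* R is a strict partial order on S: transitive and irreflexive,     *)
   (* and for a transitive relation irreflexivity rules out cycles.     *)
   IsPartialOrder(S, R) ==
     /\ R \subseteq S \X S
     /\ \A a, b, c \in S :
           (<<a, b>> \in R /\ <<b, c>> \in R) => <<a, c>> \in R
     /\ \A a \in S : <<a, a>> \notin R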
Presented here is a program POFifo that is equivalent to Fifo, but which
is obtained by hiding internal variables in a program IPOFifo that main-
tains a partially ordered set of enqueued values rather than a queue. The
Herlihy-Wing algorithm can be shown to implement IPOFifo under a refine-
ment mapping defined in terms of its variables, without adding a prophecy
variable.
F1. Each dequeued value has been enqueued, and an enqueued value is
dequeued at most once.
We can now write the program IPOFifo. It will have the same constants
as IFifo plus the set Ids of identifiers; and it will have the same interface
variables enq and deq. It will have the internal variable elts whose value is
the set of currently enqueued datums.
IPOFifo will need an internal variable to describe the partial order ≺ on
the set elts. Mathematicians describe a relation ≺ on a set S as a subset
of S × S, where u ≺ v is an abbreviation for ⟨u, v⟩ ∈ ≺. We'll do the same
thing, except it would be confusing to use the symbol ≺ as a variable. We
will therefore let before be the variable whose value is a subset of elts × elts
such that u ≺ v means ⟨u, v⟩ ∈ before.
Finally, when enqueueing a datum w , the BeginPOEnq step must add
to ≺ the relation u ≺ w for a datum u in elts iff the enqueue operation that
added u has completed. That information is not contained in the interface
variable enq because enq(e) contains only the data value that an uncom-
pleted enqueue operation is enqueueing, not which datum the operation put
in elts. Therefore, we add to IPOFifo an internal variable adding such that
adding(e) equals the datum in elts that enqueuer e put in elts, and equals a
value NonElt that is not a datum if e is not currently performing an enqueue
operation.
We use adding to define the following state expression, whose value is
the set of datums enqueued by operations whose executions have not yet
completed:

   beingAdded ≜ {adding(e) : e ∈ EnQers} \ {NonElt}
The set beingAdded need not be a subset of elts because it can contain
datums that were removed from elts by dequeue operations before the op-
erations that enqueued them have completed.
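In TLA+, with function application written with square brackets, the state
expression is a direct transcription (ours):

   beingAdded == {adding[e] : e \in EnQers} \ {NonElt}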
The program POFifo is defined in Figure 6.10. Here are explanations of
the four disjuncts of the next-state action PONext.
POFifo ≜ ∃ elts, before, adding : IPOFifo
IPOFifo ≜ POInit ∧ □[PONext]POv

POv ≜ ⟨enq, deq, elts, before, adding⟩

POInit ≜ ∧ enq = (e ∈ EnQers ↦ Done)
         ∧ deq ∈ (DeQers → Data)
         ∧ elts = { }
         ∧ before = { }
         ∧ adding = (e ∈ EnQers ↦ NonElt)

PONext ≜ ∨ ∃ e ∈ EnQers : BeginPOEnq(e) ∨ EndPOEnq(e)
         ∨ ∃ d ∈ DeQers : BeginPODeq(d) ∨ EndPODeq(d)

BeginPOEnq(e) ≜
   ∧ enq(e) = Done
   ∧ ∃ D ∈ Data : ∃ id ∈ {i ∈ Ids : ⟨D, i⟩ ∉ (elts ∪ beingAdded)} :
        ∧ enq′ = (enq except e ↦ D)
        ∧ elts′ = elts ∪ {⟨D, id⟩}
        ∧ before′ = before ∪ {⟨el, ⟨D, id⟩⟩ : el ∈ (elts \ beingAdded)}
        ∧ adding′ = (adding except e ↦ ⟨D, id⟩)
   ∧ deq′ = deq

EndPOEnq(e) ≜ ∧ enq(e) ≠ Done
              ∧ enq′ = (enq except e ↦ Done)
              ∧ adding′ = (adding except e ↦ NonElt)
              ∧ unchanged ⟨deq, elts, before⟩

BeginPODeq(d) ≜ ∧ deq(d) ≠ Busy
                ∧ deq′ = (deq except d ↦ Busy)
                ∧ unchanged ⟨enq, elts, before, adding⟩

EndPODeq(d) ≜ ∧ deq(d) = Busy
              ∧ ∃ el ∈ elts :
                   ∧ ∀ el2 ∈ elts : ¬(el2 ≺ el)
                   ∧ elts′ = elts \ {el}
                   ∧ deq′ = (deq except d ↦ el(1))
              ∧ before′ = before ∩ (elts′ × elts′)
              ∧ unchanged ⟨enq, adding⟩

Figure 6.10: The program POFifo.
EndPODeq(d) Enabled when deq(d) equals Busy and elts is not empty,
which implies that elts contains at least one minimal datum (a datum
not preceded in the ≺ relation by any other datum in elts), since the
datum most recently added to elts must be a minimal datum. The
action chooses an arbitrary minimal datum el of elts, removes it from
elts, sets deq(d) to its data value component, and modifies before to
remove all relations el ≺ el2 for elements el2 of elts.
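The "arbitrary minimal datum" chosen by EndPODeq can be described with
a small operator; this transcription, using the set-of-pairs representation R
of ≺ (the variable before), is ours:

   (* The datums in S not preceded, according to R, by any other datum *)
   (* in S.  EndPODeq(d) may remove any element of this set.           *)
   MinimalElts(S, R) ==
     {el \in S : \A el2 \in S : <<el2, el>> \notin R}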
equal to pg for as long as possible. To see how to do that, let's see how pg
can change.

The sequence pg can become shorter only when an EndPODeq^p step
occurs, in which case p is not the empty sequence and pg is a nonempty
prefix of p. The step removes the first element of p and pg, so p′ = Tail(p),
pg′ = Tail(pg), and qBar′ = Tail(qBar).

The sequence pg can be made longer by a BeginPOEnq^p step as follows.
Suppose the step appends the prediction u to p and adds the datum w to
elts. The value of pg at the beginning of the step is a proper prefix of p ◦ ⟨u⟩.
If w equals the prediction in p ◦ ⟨u⟩ immediately after pg, then w will be
appended to pg iff doing so would not violate Q3. (We'll see in a moment
when it would violate Q3.) If w can be appended to pg and the prediction
following w in p is already in elts, then it might be possible to append that
datum to pg as well. And so on. Thus, it's possible for the BeginPOEnq^p
step to append several datums to pg. If our strategy has been successful
thus far and qBar = pg at the beginning of the step, then a BeginPOEnq^pq
step implies qBar′ = pg′. This makes qBar a prefix of qBar′, as it should
be because stuttering steps to be added after a BeginPOEnq^p step should
change queueBar only by appending datums to it.
There is one situation in which it is impossible for any further datum to
be appended to pg. One or more datums can be appended to pg only by
a BeginPOEnq^p step that adds a datum w to elts that can be appended to
pg. However, if there is a datum u in elts that is neither in the sequence pg
nor in the set beingAdded, then adding w to elts also adds the relation
u ≺ w. This relation means that w can't be appended to pg, because that
would violate condition Q3. Thus, if there is a datum u in elts that is
neither in pg nor in beingAdded, then no datums can be added to pg.
Moreover, the datum u can never be removed from elts, because it is not in
pg and can never be in pg, because no more datums can be added to pg.
(The datum u can't be added to beingAdded because a BeginPOEnq step
can't add a datum to elts that is already in elts.) Let's call a state in which
there is a datum in elts that is not in beingAdded or pg a blocked state. In a
blocked state, datums can be removed from the head of pg by EndPODeq^p
steps, but no new datums can be appended to pg. So, if and when enough
EndPODeq^p steps have occurred to remove all the datums from pg, then
no more EndPODeq^p steps can occur. That means that any further dequeue
operations that are begun with a BeginPODeq^p step must block, never able
to complete.
Let’s consider the first step that caused a blocked state—that is, causing
there to be an element u in elts that is neither in pg nor beingAdded . Since
u was added to elts by a BeginPOEnq p step that put u in beingAdded , it
CHAPTER 6. AUXILIARY VARIABLES 244
must be the EndPOEnq p step of the enqueue operation that added u to elts
that caused the blocked state by removing u from beingAdded . Until that
blocked state was reached, qBar equaled pg. However, since u has not been
dequeued, it must be in queueBar after that EndPOEnq p step because that
step must implement the EndEnq step of IFifo. Thus that EndPOEnq p step
must append u to qBar . Therefore, the first blocked state is the first state
in which qBar 6= pg. In that state, qBar equals pg ◦ hu i.
From that first blocked state on, no new datums can be added to pg, so
the datum u can never be dequeued. Therefore, whenever an EndPOEnq p
step occurs for an operation that enqueued a datum w , if w is in elts (so it
hasn’t been dequeued) and is not in pg, then that EndPOEnq p step must
append w to qBar .
To recapitulate, here is how we add the history variable qBar to IPOFifo^p
to obtain the program IPOFifo^pq. These rules imply that, at any point in the
behavior, qBar will equal pg ◦ eb, where pg is the state function of IPOFifo^p
defined above and eb is a sequence of datums in elts that are not in pg.
Initially, pg and eb equal ⟨ ⟩.

These rules imply that a datum can never be removed from eb, so once eb
is nonempty no new datums can be added to pg, and only datums currently
in pg can ever be dequeued.
Observe that the sequence pg and the set of datums in eb can be defined
in terms of the variables of IPOFifo^p. A history variable is needed only
to remember the order in which datums have been appended to eb. This
suggests that it's a little simpler to make eb the history variable and define
qBar to be the state expression pg ◦ eb. However, I could not have discovered
this without first understanding how qBar should be defined.

Writing a complete definition of IPOFifo^pq is straightforward, once we
have solved the problem of writing a precise mathematical definition of the
state function qBar in terms of the variables of IPOFifo^p. (It took me a few
tries to get that definition right, using a model checker to find my mistakes.)
That definition is omitted.
Encoding in the value of the stuttering variable s which of the three
cases the variable is being added for, and in case 2 for which enqueuer e the
step is an EndPOEnq^pq(e) step, allows the value of queueBar to be defined
in terms of the values of s, qBar, and (for case 2) adding.
We still have to define the state functions enqInnerBar and deqInnerBar
that are substituted for enqInner and deqInner in the refinement mapping
under which IPOFifo^pqs implements IFifo. The value of enqInnerBar(e) for
an enqueuer e should equal Done except when adding(e) equals the datum
that e is enqueueing, and that datum is not yet in queueBar. This means
that enqInnerBar can be defined in terms of adding and queueBar.
The value of deqInnerBar(d) for a dequeuer d should equal the value of
deq(d) except between when d has removed the first element of queueBar
(by executing the stuttering step added in case 1) and before the subse-
quent EndPODeq^pqs(d) step has occurred. In that case, deqInnerBar(d)
should equal qBar(1). It's therefore easy to define deqInnerBar as a state
function of IPOFifo^pqs if the value of the stuttering variable s added in
case 1 contains the value of d for which the following EndPODeq^pqs(d)
step is to be performed.
This completes the sketch of how auxiliary variables are added to IPOFifo
to define a refinement mapping under which it implements IFifo, showing
that POFifo refines Fifo. The intellectually challenging part was discovering
how to define qBar. It took me quite a bit of thinking to find the definition.
This was not surprising. The example of the fifo had been studied for at
least 15 years before Herlihy and Wing discovered that it could be imple-
mented without maintaining a totally ordered queue. Given the definition
of qBar, constructing the refinement mapping required the ability to write
abstract programs mathematically—an ability that comes with practice.
(6.17) |= T ⇒ ∃ n ∈ ℕ : ◇□(x = n)

This implies:

   |= T ≡ ∃ n ∈ ℕ : T ∧ ◇□(x = n)

(6.18) |= (n ∈ ℕ) ∧ ◇□(x = n) ∧ T ⇒ IS

   |= T^h ⇒ (IS with y ← h)

The key step in what we've done is the use of (6.17) to reduce showing
|= T ⇒ IS to showing (6.18). This procedure is called adding the prophecy
constant n to T. This is perhaps a silly name, since n is just an ordinary
bound constant. However, it serves to predict something that is eventually
going to happen—in this case, the final value x will have.

In general, we add a prophecy constant c to a program T by showing
|= T ⇒ ∃ c ∈ C : L, where the constant c does not occur in T, which implies

   |= T ≡ ∃ c ∈ C : T ∧ L

   |= (c ∈ C) ∧ T ∧ L ⇒ IC
Loose Ends
This chapter covers two topics that, to my knowledge, have not yet seen any
industrial application. However, they might in the future become useful.
The first topic is reduction, which is about verifying that a program satisfies
a property by verifying that a coarser-grained version of the program satisfies
it. Even if you never use it, understanding the principles behind reduction
can help you choose the appropriate grain of atomicity for abstract programs.
For that purpose, skimming through sections 7.1.1–7.1.3 should suffice.
The second topic is about representing a program as the composition of
component programs. We have been representing the components that make
up a program, such as the individual processes in a multiprocess program, as
disjuncts of the next-state action. Section 7.2 explains how the components
that form a program can be described as programs. How this is done depends
on why it is done. Two reasons for doing it and the methods they lead to
are presented.
7.1 Reduction
7.1.1 Introduction
7.1.1.1 What Reduction Is
When writing an abstract program to describe some aspect of a concrete
one, we must decide what constitutes a single step of a behavior. Stated
another way, we must describe what the grain of atomicity of the next-state
action should be. The only advice provided thus far is that we should use
the coarsest grain of atomicity (the fewest steps) that is a sufficiently ac-
curate representation of that aspect of the concrete program. “Sufficiently
accurate” means that either we believe it is easy to make the concrete pro-
gram implement that grain of atomicity, or we are deferring the problem of
how those atomic actions are implemented.
Some work has addressed the problem of formalizing what makes an ab-
stract program “sufficiently accurate”, starting with a 1975 paper by Richard
Lipton [39]. This work used the approach called reduction, which replaces
a program S with an “equivalent” coarser-grained program S^R called the
reduced version of S. Certain properties of S are verified by showing that
S^R satisfies them. The program S^R is obtained from S by replacing certain
nonatomic operations with atomic actions, each atomic action producing
the same effect as executing all the steps of the nonatomic operation it re-
places one after another, with no intervening steps of other operations. The
reduced program S^R is therefore simpler and easier to reason about than
program S.

It was never clear in exactly what sense S^R was equivalent to S, and the
results were restricted to particular classes of programs and of the properties
that could be verified. TLA enabled a new way of viewing reduction. In
that view, the variables of S are replaced in S^R by “virtual” variables, and
S implements S^R under a refinement mapping. The refinement mapping
is not made explicit, but the relation between the values of the actual and
the virtual variables is described by an invariant. I believe that this math-
ematical view encompasses all previous work on reduction for concurrent
programs.
Our basic approach to writing a correct concrete program is showing
that it refines an abstract program. There are two aspects to refining one
program with another: data refinement and step refinement. Modern coding
languages have made data refinement easier by providing higher-level, more
abstract data structures. It is now almost as easy to write a program
that manipulates integers as one that manipulates bit strings representing
a finite set of integers. There has been much less progress in making step
refinement easier. As explained in Section 6.5.1, a linearizable object allows
a coarse grain of atomicity in descriptions of the code that executes oper-
ations on the object. However, the only general method of implementing
a linearizable object still seems to be the one invented by Dijkstra in the
1960s: putting the code that reads and/or modifies the state of the object
in a critical section.
I believe that better ways of describing the grain of atomicity will be
needed if rigorous verification that concrete concurrent programs implement
abstract ones is to become common practice. Reduction may provide the
key to doing this. This section provides a mathematical foundation for
understanding reduction. The theorems presented here are not the most
general ones possible; some generalizations can be found elsewhere [8]. Also
omitted are rigorous proofs. I know of no industrial use of reduction or of
tools to support it; and I have no experience using the results described here
in practice. The problem it addresses is real, but I don’t know if reduction
is the solution.
R1. |= S ∧ T ⇒ ∃ X : S^R ∧ □I^R
R2. |= S^R ⇒ (P with x ← X)
R3. |= (P with x ← X) ∧ □I^R ⇒ P

   |= S^R ⇒ P̄  and  |= P̄ ∧ □I^R ⇒ P
We first consider the case in which S is the usual TLA safety property
Init ∧ □[Next]⟨x⟩ for an abstract program. We then consider the program
described by S ∧ F, where F is the conjunction of fairness properties for S.
Conditions R1–R3 will then have S replaced by S ∧ F, the reduced program
(S ∧ F)^R being defined to equal S^R ∧ F^R, where F^R is obtained by replacing
7.1.2 An Example
To explain reduction, we start by examining this commonly assumed rule:
When reasoning about a multiprocess program in which interprocess com-
     ⋮
r1:  R1 ;
     ⋮
rk:  Rk ;
c:   C ;
l1:  L1 ;
     ⋮
lm:  Lm ;
o:   . . .

Figure 7.1: The operation RCL.
Define R to be the state predicate that is true of a state iff that state occurs
during an execution of RCL before the C action. In the part of a behavior
shown in (7.1), R is true only in states s42 and s43. Define L to be the state
predicate asserting that the process is currently executing operation RCL
after the C action, so in (7.1) it is true in states s44–s47. In general, if p is the
process executing RCL of Figure 7.1, then R equals pc(p) ∈ {r2, . . . , rk, c}
and L equals pc(p) ∈ {l1, . . . , lm}. The behavior is currently executing
RCL iff the state predicate R ∨ L is true. Thus, the operation is not being
executed iff ¬(R ∨ L) is true.
Because Ri and Lj actions access only process-local state, they commute
with actions from other processes, where two actions commute iff executing
them in either order has the same effect. Recall that A · B is defined in
Section 3.4.1.4 to be the action that is satisfied by a step s → t iff there
is a state u such that s → u is an A step and u → t is a B step. Actions
A and B commute iff A · B equals B · A. If A and B commute, then for
any states s, t, and u such that s −A→ u −B→ t, there exists a state v such
that s −B→ v −A→ t. By commuting Ri and Lj actions with actions of other
processes, moving Ri actions to the right and Lj actions to the left, we
obtain a behavior in which every execution of RCL is reduced, with no steps
of other processes interleaved. For example, commuting actions in this way
converts the original behavior into the reduced behavior in which the portion
of the original behavior shown in (7.1) is converted to:

(7.2) · · · s41 −E1→ u42 −R1→ s43 −C→ s44 −L1→ u45 −L2→ u46 −E2→ u47 −E3→ s48 · · ·

[Figure 7.2, not reproducible here, shows the sequence of interchanges that
converts the original behavior (7.1), drawn as its top row, into the reduced
behavior (7.2), drawn as its bottom row; each successive row is obtained from
the one above it by commuting a single pair of adjacent steps, with arrows
connecting each state to a corresponding state in the next row.]
The arrows in the picture are drawn according to the following rules.
There is a (thin) downward pointing arrow from each non-C state that is
unchanged by the interchange that yields the next behavior. From the one
state in each behavior that is changed by the interchange, there is a (thick)
diagonal arrow. If that state satisfies R (is before the C action), then the
arrow points one state to the left of the changed state. If the state satisfies
L, then the arrow points one state to the right.

These arrows define a unique path from every non-C state s_i of the
original behavior to a state in the reduced behavior. Define φ(s_i) to be that
state in the reduced behavior. For the example in Figure 7.2, φ(s45) = u47
because the sequence of states in the path from s45 in the top behavior to
the bottom behavior is:

(7.3) s45 → s45 → s45 → r46 → r46 → u47

Figure 7.3 contains an arrow pointing from each non-C state s_i in the orig-
inal behavior to the state φ(s_i) in the reduced behavior. Observe that for
every non-C state s_i, the state φ(s_i) is a state in which operation RCL is
not being executed—that is, a state satisfying ¬(R ∨ L).
We define φ(s) for the C states so that if the C step is s_i −C→ s_{i+1}, then
φ(s_i) is the first state to the left of s_i for which ¬(R ∨ L) is true, and φ(s_{i+1})
is the first state to the right of s_{i+1} for which ¬(R ∨ L) is true. In other
words, φ(s_i) and φ(s_{i+1}) are the states of the reduced behavior in which the
execution of RCL begins and ends, respectively. The complete mapping φ
for our example is shown in Figure 7.4, where the mapping for non-C states
is in gray.

[Figures 7.3 and 7.4, not reproducible here, each draw the original behavior
(7.1) above the reduced behavior (7.2), with an arrow from each state s_i to
φ(s_i); Figure 7.3 shows the arrows for the non-C states, and Figure 7.4 shows
the complete mapping.]
Formula S ⊗ S^R is defined so that it is satisfied by a behavior σ iff σ
satisfies S (which describes the values of variables x) and the values of the
variables X in any state sk of σ equal the values of the variables x in the state
φ(sk) of the reduced behavior Φ(σ). Since this assigns values of variables X
to every state of every behavior σ so that the behavior satisfies S ⊗ S^R, we
see that |= S ⇒ ∃ X : S ⊗ S^R is true, so R1a is satisfied.
From Figure 7.4, we see that for any k:

φ1. If sk → sk+1 is an Eh step (so Eh is an action of another process),
    then φ(sk) → φ(sk+1) is also an Eh step.

φ2. If sk → sk+1 is an Ri or Lj step, then φ(sk) = φ(sk+1).

φ3. If sk → sk+1 is a C step, then φ(sk) and φ(sk+1) are the first and
    last states of an execution of operation RCL in the reduced behavior
    Φ(σ), which is an execution with no interleaved steps of other process
    actions.
Recall that Ḡ equals G with x ← X for any formula G. A step of
S^R is either an Ēh step, a D̄ step, where D is an action that performs
an execution of operation RCL as a single step, or a stuttering step that
leaves the variables X unchanged. From φ1–φ3, we see that if behavior
(7.1) satisfies S ⊗ S^R (as well as S), then each step of that behavior is either
an Ēh step (by φ1), a D̄ step (by φ3), or a step that leaves the variables X
unchanged (by φ2). Hence each step of S ⊗ S^R satisfies the next-state action
of S^R. The initial-state predicate of S^R is Īnit (remember that Init is the
initial-state predicate of S). Since operation RCL is not being executed in
the initial state s0 of (7.1), φ(s0) equals s0, which implies that s0 satisfies
Īnit. Thus, if (7.1) satisfies S ⊗ S^R, then it satisfies S^R. Therefore,
S ⊗ S^R ⇒ S^R is satisfied—or so it seems.
The reasoning works in this example because the behavior (7.1) contains
a complete execution of operation RCL. However, S is a safety property,
which means that the process can stop executing actions at any point during
the execution of the operation. The general definition of the reduced version
of a behavior, which includes a possibly incomplete execution of RCL, is that
actions performing steps of an RCL execution are made to occur together
by commuting Ri actions to the right, Lj actions to the left, and leaving a
C action unmoved. (If there is no C step, then the last Ri action can be
left unmoved.) If s is the last state of a C step, φ(s) is defined to be the
state after the last step of the RCL operation being executed. That state
will satisfy ¬(R ∨ L) iff the behavior contains a complete execution of the
operation.
The one part of what we've done that's not correct in the presence of
a partial execution is φ3. Statement φ3 is vacuously true if the partial
execution doesn't contain a C step. In that case there are only Ri steps,
which correspond to stuttering steps of S^R, so the behavior of S ⊗ S^R is a
behavior of S^R in which that RCL action doesn't occur. The problem in φ3
arises in our example if the execution contains a C step but doesn't contain
both the L1 and L2 steps.

The solution is to rule out such behaviors. That's what the hypothesis
T in R1 and R1b does. Formula T must assert that any execution of RCL
that includes the C step must complete. The operation has performed the C
step but has not completed iff L is true. So we want to allow only behaviors
in which it is not the case that L eventually becomes true and remains true
forever. Such behaviors are the ones satisfying ¬◇□L, which is equivalent to
□◇¬L. So, we can restate R1b as this assumption:
   |= S ⊗ S^R ∧ (□◇¬L) ⇒ S^R ∧ □I^R

The argument showing S ⊗ S^R implies S^R for the behavior (7.1) applies to
all behaviors of the program of Figure 7.1 satisfying □◇¬L. That is, we
have shown that

   |= S ⊗ S^R ∧ (□◇¬L) ⇒ S^R

is satisfied by this program S. To complete the proof of R1b, we must define
I^R and show that it is an invariant of S ⊗ S^R. Since we have seen that S
satisfies R1a, this will show that it satisfies R1 with T equal to □◇¬L. The
assumption □◇¬L is discussed in Section 7.1.4.
For the next property, look at the path (7.3) from the state s45 to the state
φ(s45), which equals u47. Follow that path in Figure 7.2. Observe that each
step in the path either leaves the state unchanged (is a stuttering step) or
else is an L1 or L2 step. (To see this, look at the horizontal arrows just
above the diagonal arrows.) We can see from the figure that this is true
of the path from state s to state φ(s) for the states s46 and s47 as well.
The rule for drawing the arrows implies that this is true in general, for all
executions of the program of Figure 7.1. The path from every non-C state
s for which L is true to φ(s) consists of a sequence of steps each of which is
either a stuttering step or an Lj step for some j. From Figure 7.4, it's clear
that this is also true for the C state for which L is true.

Let's define L to equal L1 ∨ . . . ∨ Lm, so any Lj step is an L step. We have
seen that we can get from a state s in which L is true to the state φ(s) by
a sequence of stuttering steps and/or L steps. Recall that in Section 4.3.2,
for any action A we defined A⁺ to equal A ∨ (A · A) ∨ (A · A · A) ∨ . . . .
Therefore, a step s → t satisfies ([L]⟨x⟩)⁺, which we abbreviate [L]⁺⟨x⟩, iff
we can get from state s to state t by a sequence of L steps or steps that
leave the variables x unchanged. Let's write the subscript ⟨x⟩ as simply x,
so [A]x and ⟨A⟩x mean [A]⟨x⟩ and ⟨A⟩⟨x⟩ for any action A.
We have thus seen:

φ5. If L is true in state s, then s → φ(s) is an [L]⁺x step.

Following the path from s to φ(s) backwards for states s in which R is true
similarly leads to the following, where R equals R1 ∨ . . . ∨ Rk.

φ6. If R is true in state s, then φ(s) → s is an [R]⁺x step.

Finally, Figure 7.4 shows that for any state s of the original behavior, φ(s)
is always a state in the reduced behavior in which the RCL operation is not
being executed, so ¬(R ∨ L) is true for φ(s). Because the rule for drawing
the arrows in Figure 7.2 creates a leftward pointing arrow whenever an R
step is moved to the right and a rightward pointing arrow whenever an L
step is moved to the left, this is true in general. Therefore, we have:
φ7. For any state s, the state predicate ¬(R ∨ L) is true in state φ(s).

Statements φ4–φ7 give us relations between s and φ(s) for all states s of a
behavior of our example program. We now have to turn them into relations
between the values of the variables x and the variables X in any state s in
a behavior of S ⊗ S^R.

It's easy to do this for φ4. The values of the variables X in state s are
the values of x in φ(s). Since φ(s) = s if s satisfies ¬(R ∨ L), this means
that x = X is true for any reachable state of S ⊗ S^R satisfying ¬(R ∨ L). In
other words:

(7.4) ¬(R ∨ L) ⇒ (X = x) is an invariant of S ⊗ S^R

so formula φ6 implies:

(7.5) R ⇒ ([R]⁺x awith x ← X, x′ ← x) is an invariant of S ⊗ S^R

Remember that in this formula, the definition of [R]⁺x must be fully expanded
before the awith substitution is performed.
We can obtain a similar relation between x and X from φ5. The state-
ment that s → φ(s) is an [L]⁺x step is equivalent to the statement that state
s satisfies the formula obtained from [L]⁺x by substituting for the variables
x their values in state s and substituting for x′ the values of x in state
φ(s), the latter being the values of X in state s. Therefore, φ5 implies the
following, where x ← x has been eliminated from the awith formula, since
it just states that x is substituted for itself:

(7.6) L ⇒ ([L]⁺x awith x′ ← X) is an invariant of S ⊗ S^R

Finally, since the values of X in state s equal the values of x in state φ(s),
from φ7 we obtain:

(7.7) ¬(R ∨ L) with x ← X is an invariant of S ⊗ S^R
The invariant I^R of S ⊗ S^R relating the values of x and X is the con-
junction of the invariants (7.4)–(7.7):

   I^R ≜ ∧ ¬(R ∨ L) ⇒ (X = x)
         ∧ R ⇒ ([R]⁺x awith x ← X, x′ ← x)
         ∧ L ⇒ ([L]⁺x awith x′ ← X)
         ∧ ¬(R ∨ L) with x ← X
s −R→ u −E→ t there is a pair of steps s −E→ v −R→ t for some state v. This
condition can be stated as the requirement R · E ⇒ E · R. When this formula
holds, we say that R right-commutes with E, and that E left-commutes with
R. Similarly, to move L steps to the left, we don't need E · L to equal L · E;
we need only require E · L ⇒ L · E, which asserts that L left-commutes with
E (and E right-commutes with L).
Finally, suppose that A is another action of the same process containing
the RCL operation, so A does not allow steps that implement the RCL
operation. Since the next step of the process after an R step can only be
an R step or a C step, an A step cannot follow an R step. This implies
that R · A must equal false, which means that R · A ⇒ A · R is trivially
true. Similarly, the only step of the process that can immediately precede
an L step is an L step or a C step. Therefore, A · L must equal false,
so A · L ⇒ L · A must be true. Therefore, other actions of the process
containing RCL satisfy the same commutativity requirements as actions of
other processes. This implies that we can completely forget about processes.
We just assume that the program's next-state action equals E ∨ R ∨ C ∨ L,
where R right-commutes with E and L left-commutes with E. If we represent
the program as a collection of processes, there is no need for R, C, and L
steps to all be steps of the same process.

What we need to require is that execution of an RCL operation consists
of a sequence of R steps followed by a C step followed by a sequence of L
steps. To express this requirement, we generalize R and L from assertions
about a process's control state to arbitrary state predicates satisfying certain
conditions. We take as primitives the state predicates R and L and the
actions E and M such that the program's next-state relation equals E ∨ M,
where M describes the operation to be reduced. The actions R, C, and L
will be defined in terms of R, L, and M. We therefore assume the original
program S is defined by:

   S ≜ Init ∧ □[E ∨ M]x
∆
M3. An M step can’t go from the second phase to the first phase, expressed
by ¬(L ∧ M ∧ R0 ).
R = M ∧ R0 C = (¬L) ∧ M ∧ (¬R0 )
∆ ∆ ∆
L = L∧M
We can therefore define S R as follows, where Init R and E R are Init and E
with the variables x replaced by the variables X:
S R = Init R ∧ 2[E R ∨ M R ]X
∆
Theorem 7.1 Assume Init, L, and R are state predicates, M and E are
actions, x is a list of all variables appearing in these formulas, and X is a
list of the same number of variables different from the variables x. Define

   S ≜ Init ∧ □[E ∨ M]x

   R ≜ M ∧ R′        L ≜ L ∧ M

   Init^R ≜ Init with x ← X        E^R ≜ E with x ← X

   M^R ≜ (¬(R ∨ L) ∧ M⁺ ∧ ¬(R ∨ L)′) with x ← X

   S^R ≜ Init^R ∧ □[E^R ∨ M^R]X

   I^R ≜ ∧ ¬(R ∨ L) ⇒ (X = x)
         ∧ R ⇒ ([R]⁺x awith x ← X, x′ ← x)
         ∧ L ⇒ ([L]⁺x awith x′ ← X)
         ∧ ¬(R ∨ L) with x ← X

and assume:

(1) |= Init ⇒ ¬(R ∨ L)

(2) |= S ⇒ □[ ∧ E ⇒ (R′ ≡ R) ∧ (L′ ≡ L)
              ∧ ¬(L ∧ M ∧ R′)
              ∧ ¬(R ∧ L)
              ∧ R · E ⇒ E · R
              ∧ E · L ⇒ L · E ]x

Then |= S ∧ □◇¬L ⇒ ∃ X : S^R ∧ □I^R.
Assumption (1) and the first three conjuncts in the action of assumption
(2) are the conditions M1–M4, which assert that an execution of the opera-
tion described by the action M consists of a sequence of R steps followed by
a C step followed by a sequence of L steps. The final two conjuncts in the
action of assumption (2) are the assumptions that R right-commutes with
E and L left-commutes with E.
In practice, R, L, and E will be defined to be the disjunction of sub-
actions. This allows us to decompose the proofs of those commutativity
conditions by using the following theorem, which is proved in the Appendix.

   |= (∀ i ∈ I, j ∈ J : Ai · Bj ⇒ Bj · Ai) ⇒ (A · B ⇒ B · A)

implies |= S ⇒ P.
(7.9) |= S ⊗ S^R ∧ F ∧ □◇¬L ⇒ S^R ∧ □I^R ∧ F^R

   XF_x(A) ≡ ( (◇□/□◇) E⟨A⟩x ⇒ □◇⟨A⟩x )
   XF_X(A^R) ≡ ( (◇□/□◇) E⟨A^R⟩X ⇒ □◇⟨A^R⟩X )

where (◇□/□◇) stands for ◇□ if XF is WF and for □◇ if XF is SF. These
formulas and a little temporal logic imply that to prove (7.10) it suffices to
prove these two theorems:

Since ⟨A^R⟩X equals ⟨Ā⟩x, and the formulas Ē⟨A⟩x and E⟨Ā⟩x contain only
the variables X, (7.14a) is equivalent to:

(7.15) |= S^R ⇒ □( E⟨Ā⟩x ⇒ Ē⟨A⟩x )

If E were not a weird operator (see Section 5.4.4.3), Ē⟨A⟩x would be equiv-
alent to E⟨Ā⟩x; and we expect that equivalence to be true for most actions
⟨A⟩x. However, because it is not always true, we have to add (7.15) as an
assumption.
assumption.
To see what is required to make (7.14b) true, we consider what assump-
tion is required to ensure that P ⇒ P is true for an arbitrary state predicate
P with free variables x. The free variables of P are X, and the relation be-
tween the values of x and X is described by the invariant I R of S ⊗S R .
CHAPTER 7. LOOSE ENDS 268
Let’s review what we have shown. We can deduce (7.10) from (7.11) and
(7.12). If (7.13) is true, then we can choose S R of Theorem 7.1 to make
(7.11) true for a single subaction A of E . We can deduce (7.12) from (7.14a)
and (7.14b). We can deduce (7.14a) from (7.15). And finally, we can deduce
(7.14b) from the conditions obtained above for proving P ⇒ P , substituting
EhAix for P . Putting all this together, we have shown that the program
S R of Theorem 7.1 can be chosen to make (7.10) true, for a single subaction
A of E , if the following two conditions are satisfied:
   |= S ⇒ □[ (⟨A⟩x)ρ ⇒ (x′ ≠ x) ]x

There is seldom any reason for a program's next-state action to allow stut-
tering steps, and modifying it to disallow stuttering steps does not change
the program. An A step of the program will usually be an ⟨A⟩x step; and
if it isn't, A can be replaced by ⟨A⟩x. So for simplicity, we strengthen this
assumption to:

   |= S ⇒ □[ Aρ ⇒ (x′ ≠ x) ]x
This may seem wrong, because we have E⟨Aρ⟩x in (7.20a) and Ē⟨Aρ⟩x
in (7.20b) when the two formulas should be equal. However, the following
reasoning shows that they are equal. The definition of Aρ and conditions E3
and E4 of Section 5.4.4.2 imply that E⟨Aρ⟩x equals ¬(R ∨ L) ∧ E⟨Aρ⟩x.
The invariant I^R implies that ¬(R ∨ L) ⇒ (x = X) and ¬(R ∨ L) with
x ← X are true, so S ⊗ S^R implies that E⟨Aρ⟩x always equals Ē⟨Aρ⟩x.
We make S^R implying (7.20a) one of our requirements for deducing that
WF_X(A^R) is satisfied. We now consider (7.20b).

By (3.32b) of Section 3.4.2.8 and the tautology |= ¬⟨A⟩x ≡ [¬A]x, to
prove (7.20b) it suffices to prove:
We have seen that to deduce WF_X(A^R) from SF_x(A), it suffices to show
(7.18) and:

For the other three possible pairs of fairness conditions on A^R and A, the
same argument shows that we can deduce SF_X(A^R) instead of WF_X(A^R)
by replacing ◇□ with □◇ in (7.23b); and we can assume WF_x(A) instead
of SF_x(A) by replacing □◇ with ◇□ in (7.23b).
Theorem 7.4 With the definitions and assumptions (1) and (2) of Theo-
rem 7.1, let C ≜ (¬L) ∧ M ∧ (¬R′) and let

   |= F ⇒ ∀ i ∈ I : YF^i_x(Ai)        F^R ≜ ∀ i ∈ I : ZF^i_X(Ai^R)

where I is a countable set and YF^i and ZF^i are WF or SF for each i ∈ I;
and assume either:

• Ai is a subaction of E, Ai^R ≜ Āi, and YF^i equals ZF^i, or

• Ai^R ≜ Ai^ρ,

     |= S ⇒ □[ Ai^ρ ⇒ (x′ ≠ x) ]x ,

     |= S^R ⇒ □( E⟨Ai^R⟩X ⇒ Ē⟨Ai^ρ⟩x ) , and

     |= S ∧ F ⇒ ( (◇□/□◇)_Z E⟨Ai^ρ⟩x ∧ □[¬Ai]x ⇝ (◇□/□◇)_Y E⟨Ai⟩x )

  where for Q either Y or Z, (◇□/□◇)_Q is ◇□ if QF is WF, and
  it is □◇ if QF is SF.

Then |= S ∧ F ∧ □◇¬L ⇒ ∃ X : S^R ∧ □I^R ∧ F^R.
unchanged formulas assert that all program variables other than sem and
pc are left unchanged:

   Pp ≜ ∧ pc(p) = . . .
        ∧ (sem = 1) ∧ (sem′ = 0)
        ∧ pc′ = (pc except p ↦ . . .)
        ∧ unchanged . . .

   Vp ≜ ∧ pc(p) = . . .
        ∧ sem′ = 1
        ∧ pc′ = (pc except p ↦ . . .)
        ∧ unchanged . . .
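For concreteness, here is a TLA+ sketch of these two actions; the control
labels "p1", "cs", "p2" and the name otherVars are our inventions, since the
book elides them.

   (* P acquires the binary semaphore; V releases it. *)
   P(p) == /\ pc[p] = "p1"
           /\ (sem = 1) /\ (sem' = 0)
           /\ pc' = [pc EXCEPT ![p] = "cs"]
           /\ UNCHANGED otherVars
   V(p) == /\ pc[p] = "cs"
           /\ sem' = 1
           /\ pc' = [pc EXCEPT ![p] = "p2"]
           /\ UNCHANGED otherVars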
CSq,j: CSq,j · Vp and CSq,j · CSp,i both equal false, because a CSq,j step
leaves process q inside its critical section, which by the mutual exclu-
sion algorithm implies process p is outside its critical section, so neither
CSp,i nor Vp is enabled.

   |= S^R ⇒ □( E⟨M^R⟩X ⇒ Ē⟨M^ρ⟩x )
obtain a partial result that it appends to the end of a fifo queue. Process 2
removes the partial result from the head of the queue and completes the
computation. Process 1 can therefore get ahead of process 2, performing
its part of the i-th computation while process 2 is still performing its part
of the j-th computation, for i > j. The reduced program S^R replaces these
two processes with a single process that performs each computation as a
single atomic action. The property we want to prove by reduction presum-
ably involves how the computed values are used after they are computed,
when they have the same values in the original and reduced programs, so
condition R3 is satisfied.
We describe steps of process 1 by an action Cmp1 ∨ Send , where that
process’s part of a computation consists of a finite sequence of Cmp1 steps
followed by a single Send step that appends the partial result to the tail
of the queue. Steps of process 2 are described by an action Rcv ∨ Cmp2,
where that process’s part of the computation consists of a single Rcv step
that removes the partial result from the head of the queue followed by a
finite sequence of Cmp2 steps. The contents of the queue are described by
the variable qBar , which is accessed only by the Send and Rcv actions. We
assume that the two processes communicate only through the fifo qBar ,
an assumption expressed by these conditions: Cmp1 commutes with the
process 2 actions Rcv and Cmp2, and Cmp2 commutes with the process 1
actions Cmp1 and Send . Since qBar is the only shared variable accessed
by Rcv and Send , it doesn’t matter in which order these two actions are
executed in a state where the queue is nonempty. Thus, we have:
The program may contain other processes that can interact in some way
with processes 1 and 2. For example, process 1 may obtain its input from a
third process and process 2 may send its output to a fourth process.
The program’s next-state action is M ∨ O, where M describes processes
1 and 2 and O describes any other processes. We rewrite M in the form
∃ n ∈ N+ : M n , where N+ is the set of positive integers and M n is an ac-
tion whose steps describe a complete execution of the n th computation. To
do this, we assume state functions snum and rnum whose values are the
numbers of Send and Rcv actions, respectively, that have been executed.
Initially, snum = rnum = 0. The Send action increments snum by 1 and
the Rcv action increments rnum by 1. We can then define:
∆
(7.25) M n = ∨ (snum = n − 1) ∧ (Cmp1 ∨ Send )
∨ ((rnum = n − 1) ∧ Rcv ) ∨ ((rnum = n) ∧ Cmp2)
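In TLA+, (7.25) is a direct transcription, assuming the actions Cmp1, Send,
Rcv, Cmp2 and the counters snum and rnum described above:

   Mn(n) == \/ (snum = n - 1) /\ (Cmp1 \/ Send)
            \/ (rnum = n - 1) /\ Rcv
            \/ (rnum = n) /\ Cmp2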
Again, with multiple reductions we let the reduced program have the same
variables as the original program, so the n-th reduction replaces the action
Mn with Mn^ρ.

The remaining action E for this reduction is the disjunction of these
actions: the action O describing the other processes, the already reduced
actions Mk^ρ for k < n, and the subactions of Mk for k > n. To apply
Theorem 7.1, we must show that Rn right-commutes with these actions and
Ln left-commutes with them.

That Rn right-commutes and Ln left-commutes with O must be as-
sumed. The commutativity relations hold for Mk^ρ with k < n because an
Rn step is enabled only after an Mk^ρ step, which implies Rn · Mk^ρ equals
false (so Rn right-commutes with Mk^ρ), and which also implies that Ln
cannot be enabled immediately after an Mk^ρ step, so Mk^ρ · Ln also equals
false.
What remains to be shown is that Cmp1_n (the action Rn) right-com-
mutes with Mk, and that Rcv_n and Cmp2_n (whose disjunction equals Ln)
left-commute with Mk, for k > n. For that, we have to show that each
of the four actions whose disjunction equals Mk satisfies those commutativ-
ity conditions. We will use the commutativity relations we assumed above:
that Cmp1 commutes with Cmp2 and Rcv, and that Cmp2 commutes with
Send. The assumption that Cmp2 commutes with Send implies that Cmp2_i
commutes with Send_j for all i and j. This follows from the definitions of
Cmp2_i and Send_j, because Cmp2 does not depend on or modify snum and
Send does not depend on or modify rnum. Similarly, Cmp1_i commutes
with Rcv_j and Cmp2_j for all i and j. These assumptions are called the
commutativity assumptions in the following proof sketches of the required
commutativity relations. Recall that we are assuming k > n.
other processes, to ensure that the operation will complete once process 1's
Send action occurs. The obvious fairness condition we want the reduced
program to satisfy is fairness of M^ρ. If an Mn^ρ action is enabled, then no
Mi^ρ action with i ≠ n can be enabled until an Mn^ρ step occurs. This implies
that (weak or strong) fairness of M^ρ is equivalent to fairness of Mn^ρ for all
n. For each n, ensuring fairness of Mn^ρ is the second case in Theorem 7.4,
with Ai equal to Cn, which equals Send_n. The assumption |= S ∧ F ⇒ . . .
in that case of the theorem will have to be implied by fairness conditions on
subactions of Cmp1.
steps of those three parts. But math provides many ways to structure a
proof, and deciding in advance to structure it by decomposition might rule
out better proofs.
The one good reason to decompose the verification of a program in this
way is that it may make it easier to use a tool to verify correctness. For
example, a model checker might be able to verify correctness of individual
components but not correctness of the complete program. Decomposition
would allow using model checking to perform part of the verification, and
then using the results presented here to prove that correctness of the com-
ponents implies correctness of the entire program. This approach has been
applied to a nontrivial example [24], but I don’t know of any case in which
it has been used in industry.
Composition is useful if an engineer wants to verify correctness of a pro-
gram that describes a system built using an existing component whose be-
havior is specified by an abstract program. Up until now, we have described
a program by a formula that is satisfied by behaviors in which the program
to be implemented, which I will here call the actual program, and its envi-
ronment are both acting correctly. There was no need for the mathematical
description to separate the actual program and its environment, since it
makes no difference if an execution is incorrect because the programmer
didn’t understand what the code would do or what the environment would
do. However, if a program is implemented using a component purchased
elsewhere, it is important to know if an incorrect behavior is due to an in-
correct implementation of the actual program or of the component, which
is part of the environment.
For composition, we therefore describe a program with two formulas,
formula M describing the correct behavior of the actual program and a
formula E describing correct behavior of its environment. These formulas
are combined into a single formula, written E -+-> M , that can be thought of
as being true of a behavior iff M is true as long as E is (so M is always true
if E is always true). Formula E -+-> M is what is called a rely/guarantee
description of the program [22].
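TLA+ includes the ASCII symbol -+-> for this operator, though TLC does not check it directly. For safety properties, one standard way to make "M is true as long as E is" precise is that M must hold one step longer than every prefix on which E holds. Here is a sketch of that reading on finite behaviors modeled as sequences of states; SatE and SatM are hypothetical predicates, not definitions from this book.

---- MODULE PlusArrowSketch ----
EXTENDS Naturals, Sequences

Prefix(b, k) == SubSeq(b, 1, k)

\* E -+-> M on a finite behavior b, for safety properties: if E holds
\* of the first k states, then M holds of the first k + 1 states.
PlusArrow(SatE(_), SatM(_), b) ==
    \A k \in 0 .. (Len(b) - 1) :
        SatE(Prefix(b, k)) => SatM(Prefix(b, k + 1))
====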
Currently, implementing actual programs with precisely specified exist-
ing components seems likely to arise in practice only for components that
are traditional programs that perform a computation and stop; and where
execution of the component can be considered to be a single step of the
complete program. In that case, there is no need for TLA. As explained in
Appendix Section A.4, the safety property of the component can be speci-
fied by a Hoare triple; and termination is the only required liveness property.
Composition in TLA is needed only if the existing component interacts with
its environment in a more complex way that must be described with a more
general abstract program. Such reusable, precisely specified components do
not seem to exist now. Perhaps someday they will.
The results presented here come from a single paper [2]. The reader is
referred to that paper for the proofs. To make reading it easier, much of
the notation used here—including the identifiers in formulas—is taken from
that paper.
Init M ≜ Init a ∧ Init b
Next M ≜ Next a ∨ Next b
LM ≜ La ∧ Lb
Formula M is equivalent to the conjunction of M a and M b , defined by:
M a ≜ Init a ∧ 2[Next a ]a ∧ La
M b ≜ Init b ∧ 2[Next b ]b ∧ Lb
This result follows from the equivalence of 2(F ∧ G) and 2F ∧ 2G, for any
formulas F and G, and from
Theorem 7.5 If m 1 , . . . , m n are each lists of variables, with all the
variables in all the lists distinct, N ≜ 1 . . n, and

   m ≜ m 1 , . . . , m n
   M i ≜ Init i ∧ 2[Next i ]⟨m i ⟩ ∧ Li
   M ≜ ∀ i ∈ N : M i
   |= M ⇒ 2[ ∀ i , j ∈ N : Next i ∧ (i ≠ j ) ⇒ (⟨m j ⟩′ = ⟨m j ⟩) ]m

then
|= M lc ∧ M ld ⇒ M c and |= M lc ∧ M ld ⇒ M d
but that doesn’t reduce the amount of work involved. However, suppose that
correctness of M lc doesn’t depend on its environment being the component
M ld , but just requires its environment to satisfy the correctness condition
M d of that component, and similarly correctness of M ld just requires that
the other component satisfies M c . We would then like to reduce verification
of (7.27) to verifying:
(7.28) |= M d ∧ M lc ⇒ M c and |= M c ∧ M ld ⇒ M d
This would reduce the amount of work because M c and M d are probably
significantly simpler than M lc and M ld . Can we do that?
Let's consider the following trivial example, where each component
initializes its variable to 0 and keeps setting its variable's value to
the value of the other component's variable:

(7.29)  M lc ≜ (c = 0) ∧ 2[(c′ = d ) ∧ (d′ = d )]⟨c,d ⟩ ∧ WF⟨c,d ⟩((c′ = d ) ∧ (d′ = d ))
        M ld ≜ (d = 0) ∧ 2[(d′ = c) ∧ (c′ = c)]⟨c,d ⟩ ∧ WF⟨c,d ⟩((d′ = c) ∧ (c′ = c))

With the correctness conditions M c ≜ 2(c = 0) and M d ≜ 2(d = 0), both
(7.27) and (7.28) hold. Now suppose we change the correctness conditions to

        M c ≜ 3(c = 1)    M d ≜ 3(d = 1)
while keeping M lc and M ld the same. Condition (7.28) is still satisfied be-
cause each component eventually sets its variable to 1 if the other component
sets its variable to 1. However, (7.27) is not satisfied. Changing the cor-
rectness conditions doesn’t change the behavior of the program, which is to
take nothing but stuttering steps.
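The example is small enough to check mechanically. In the following TLA+ sketch (module and identifier names are ours), conjoining the two specifications into a single Init/Next form is sound because 2[A]v ∧ 2[B]v equals 2[A ∧ B]v ; TLC should confirm the first pair of correctness conditions and produce a counterexample for the changed pair.

---- MODULE TwoComponents ----
VARIABLES c, d

vars == <<c, d>>

CopyD == (c' = d) /\ (d' = d)   \* component c's action
CopyC == (d' = c) /\ (c' = c)   \* component d's action

Init == (c = 0) /\ (d = 0)
Spec == Init /\ [][CopyD /\ CopyC]_vars
             /\ WF_vars(CopyD) /\ WF_vars(CopyC)

\* Expected to hold of Spec:
AlwaysZero == [](c = 0) /\ [](d = 0)

\* Expected to fail, since Spec allows nothing but stuttering steps:
EventuallyOne == <>(c = 1) /\ <>(d = 1)
====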
We might ask why we can’t deduce (7.27) from (7.28) in this example.
However, the real question is why we can deduce it in the first example. De-
ducing (7.27) from (7.28) is deducing, from the assumption that correctness
of each component implies correctness of the other, that both components
are correct. This is circular reasoning, and letting M c = M d = false shows
that it allows us to deduce that any program implies false, from which we
can deduce that the program satisfies any property.
So, why does (7.28) imply (7.27) in the first case? Why can we deduce
that both components leave their variables equal to 0 from the assumption
that each component leaves its variable equal to 0 if the other process leaves
its variable equal to 0? The reason is that neither process can set its variable
to a value other than 0 until the other one does. Stated more generally,
we can deduce that both components in a two-component program satisfy
their correctness properties if neither component can violate its correctness
property until after the other does. So we want to replace (7.28) by:
(7.30) |= ∀ k ∈ N :
(M d true through state k − 1)
∧ (M lc true through state k ) ⇒ (M c true through state k )
plus the same condition with c and d interchanged, where F true through
state −1 is taken to be true for any property F .
To express (7.30) precisely, we have to say what it means for a property
F to be true through state k . If F is a safety property, it means that F is
true of the finite behavior σ(0) → . . . → σ(k ), which means it’s true of the
(infinite) behavior obtained by repeating the state σ(k ) forever. It follows
from Theorems 4.3 and 4.4 that any property F equals C(F ) ∧ L where L is
a liveness property such that hC(F ), Li is machine closed. By the definition
of machine closure (in Section 4.2.2.2), any finite behavior that satisfies
C(F ) can be completed to a behavior satisfying C(F ) ∧ L, which equals F .
Therefore, the only way a behavior can fail to satisfy F through state k is
for it not to satisfy C(F ) through state k , so F is true through state k means
that C(F ) is true through state k . We should therefore replace M d , M lc ,
and M c by C(M d ), C(M lc ), and C(M c ) in (7.30). For a safety property, true
through state k means true if all states i with i > k equal state k , so we
can restate (7.30) as:
(7.31) |= ∀ k ∈ N :
(every state after state k equals state k ) ⇒
( (C(M d ) true through state k − 1) ∧ C(M lc ) ⇒ C(M c ) )
Next, let v be the tuple of all variables in these formulas. We can then
replace the assertion “every state . . . state k ” in (7.31) with “v ′ = v from
state k on”; call the result (7.32). By predicate logic, if k does not appear
in R or S , then

(∀ k : P ⇒ (Q ∧ R ⇒ S )) ≡ ((∃ k : P ∧ Q) ∧ R ⇒ S )

so we can write (7.32) and the condition obtained from it by interchanging
c and d as:
Let F+v^old be the formula that we have been calling F+v . We now define
F+v to equal F+v^old ∨ F . With this definition, (7.33) implies that its two
conditions also hold with the “+v ” removed. If F is a safety property, then
F+v is a safety property but F+v^old usually isn't. In fact, if F is a safety
property then F+v equals C(F+v^old ). In practice, the change should seldom
make a difference in (7.34) because we don't expect liveness properties to
be useful for proving safety properties, so we wouldn't expect F ∧ G ⇒ H
to be true for safety properties G and H without C(F ) ∧ G ⇒ H also being
true.
The formula F+v has been defined semantically. However, to verify (7.33)
directly, we have to write C(F )+v as a formula for a given formula F . It’s
easy to write C(F ) if F has the usual form Init ∧ 2[Next]w ∧ L , where w is
the tuple of variables in the formulas and L is the conjunction of fairness
properties of subactions of Next. In that case, the definition of machine
closure (Section 4.2.2.2) and Theorem 4.6 (Section 4.2.7) imply C(F ) equals
Init ∧ 2[Next]w . We can then write F+v as follows:

F+v ≜ ∃ h : Înit ∧ 2[N̂ext]w ◦v ◦⟨h ⟩

where  Înit ≜ (Init ∧ (h = 0)) ∨ (h = 1)

       N̂ext ≜ ∨ (h = 0) ∧ ∨ (h′ = 0) ∧ [Next]w
                           ∨ h′ = 1
              ∨ (h = 1) ∧ (h′ = h) ∧ (v ′ = v )
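For a tiny hypothetical F ≜ Init ∧ 2[Next]w , this definition transcribes into TLA+ directly; every name in the following sketch is a placeholder.

---- MODULE PlusVSketch ----
EXTENDS Naturals

VARIABLES x, y, h   \* w = <<x>>, v = <<y>>, h the added flag variable

Init == (x = 0) /\ (y = 0)
Next == x' = x + 1                     \* placeholder next-state action

InitHat == (Init /\ (h = 0)) \/ (h = 1)

NextHat == \/ /\ h = 0
              /\ \/ (h' = 0) /\ [Next]_<<x>>
                 \/ h' = 1
           \/ (h = 1) /\ (h' = h) /\ (y' = y)

\* F+v == \EE h : InitHat /\ [][NextHat]_<<x, y, h>>
\* (the temporal \EE hides h; TLC cannot check such a formula)
====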
While writing F+v is easy enough, we usually don’t have to for the same
reason that we didn’t have to use the +v subscripts in (7.27). Our example
has one feature that we didn’t use in our generalization—namely, that no
single program step can make both M c and M d false. Here’s how to use that
feature in general. For safety properties F and G, define F ⊥ G to be true of
a behavior σ iff for every k ∈ N, if F ∧ G is true of σ(0) → . . . → σ(k ) then
F ∨ G is true of σ(0) → . . . → σ(k + 1). Understanding why the following
theorem is true is a good test that you understand the definition of F+v .
• No step is both a c step and a d step. This condition means that a step
in a behavior of the program consists of a step of a single component.
1. |= ∀ j ∈ 1 . . n : C(M j ) ⇒ E i
then |= (∀ i ∈ 1 . . n : M li ) ⇒ (∀ i ∈ 1 . . n : M i )
The theorem does not make any assumption about v . That’s because if w
is the tuple of all variables appearing in the formulas (including in v ), then
F+w implies F+v . Thus, if hypothesis 2(a) is satisfied for any state function
v , then it’s satisfied with v equal to the tuple of all variables in the formulas.
Letting v equal that tuple produces the weakest (hence easiest to satisfy)
hypothesis.
when run in an environment that satisfies 2(y = 0). So, we decide to write
our program as M lc ∧ M ld where:

M lc ≜ (M lx with x ← c, y ← d )
M ld ≜ (M lx with x ← d , y ← c)
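The definition of M lx lies in elided text, but reversing the substitutions in (7.29) suggests what it presumably looks like; in TLA+ (module and identifier names are ours):

---- MODULE ComponentX ----
VARIABLES x, y

\* Copy the environment's variable, leaving it unchanged.
CopyY == (x' = y) /\ (y' = y)

Mlx == (x = 0) /\ [][CopyY]_<<x, y>> /\ WF_<<x, y>>(CopyY)
====

Instantiating the module twice, once with x <- c, y <- d and once with the substitution reversed, then yields M lc and M ld .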
This silly example captures the most important aspect of specifying com-
ponents: No real device will satisfy a specification such as 2(c = 0) when
executed in an arbitrary environment. For example, a process will not be
able to compute the GCD of two numbers if other processes can at any time
arbitrarily change the values of its variables.
We want to deduce that M lc ∧ M ld implies 2(c = 0) ∧ 2(d = 0)
from the properties that components c and d satisfy, without knowing
what M lc and M ld are. The property that the c component satisfies is
that if its environment satisfies 2(d = 0) then the component satisfies
2(c = 0); and d satisfies the same condition with d and c interchanged.
The obvious way to express these two properties is 2(d = 0) ⇒ 2(c = 0)
and 2(c = 0) ⇒ 2(d = 0), but those two properties obviously don’t imply
2(c = 0) ∧ 2(d = 0). We need to find the right way to express mathemati-
cally the condition that a component satisfies the property M if its environ-
ment satisfies the property E . We do this by assuming that the condition is
expressed by a formula E -+-> M and figuring out what the definition of -+->
should be, given the assumption that the definition should make this true:

(7.36)  |= 2(d = 0) -+-> 2(c = 0) and |= 2(c = 0) -+-> 2(d = 0)
        implies |= 2(c = 0) ∧ 2(d = 0)
It’s instructive to compare Theorems 7.7 and 7.8. They both make no
assumption about v , since letting it equal the tuple of all variables in the
formulas yields the weakest hypothesis 2(a). Hypothesis 1 differs only in
Theorem 7.8 having the additional conjunct C(E ). This conjunct (which
weakens the hypothesis) is expected because, if M is the conjunction of the
M i , then the M in the conclusion of Theorem 7.7 is replaced in Theorem 7.8
by E -+-> M .
As we observed for Theorem 7.7, hypothesis 1 of Theorem 7.8 pretty
much requires the E i to be safety properties. However, when applying The-
orem 7.8, we can choose to make them safety properties by moving the
liveness property of E i into the liveness property of M i . More precisely,
suppose we write E i as E Si ∧ E Li , where E Si is a safety property and E Li
a liveness property such that ⟨E Si , E Li ⟩ is machine closed; and we
similarly write M i as M Si ∧ M Li . We can then replace E i by E Si and M i by
M Si ∧ (E Li ⇒ M Li ). (By definition of machine closure, ⟨M Si , M Li ⟩ machine
closed implies ⟨M Si , E Li ⇒ M Li ⟩ is also machine closed, because M Li implies
E Li ⇒ M Li .) This replaces the property E i -+-> M i by the stronger property:

(7.39)  E Si -+-> (M Si ∧ (E Li ⇒ M Li ))

It is stronger because if the environment doesn't satisfy its liveness property
E Li , then E i -+-> M i is satisfied no matter what the component does; but in
that case, (7.39) still requires the component to satisfy its safety property
M Si if the environment satisfies its safety property E Si . The two formulas
should be equivalent in practice because machine closure of hE Si , E Li i implies
that, as long as the environment satisfies its safety property, the component
can’t know that the environment’s entire infinite behavior will violate its
liveness property.
Theorem 7.8 has been explained in terms of M i being the property sat-
isfied by a component whose description M li we don’t know, with M a
property we want the composition of the components to satisfy. It can also
be applied by letting M i be the actual component M li and letting M be
the composition ∀ i ∈ 1 . . n : M li of those components. The theorem then
tells us under what environment assumption E the composition will behave
properly if each M li behaves properly under the environment assumption
E i . However, there is a problem when using it in this way. To explain
the problem, we return to our two components c and d whose composition
satisfies 2(c = 0) ∧ 2(d = 0).
The definitions M lc and M ld in (7.29) were written for components c and
d intended to be composed with one another. They were not written to de-
scribe a component that satisfies its desired property only if the environment
satisfies its property. We now want to define them and their environment
assumptions E c and E d so that:
|= (E c -+-> M lc ) ⇒ 2(c = 0)
|= (E d -+-> M ld ) ⇒ 2(d = 0)
The definition of M lc asserts that the value of d cannot change when the
value of c changes (because of the conjunct d 0 = d in the next-state relation)
and d cannot change when c doesn’t change (because of the subscript hc, d i).
That’s a property of its environment. If we want d to satisfy that property,
we should state it in E c , not inside the definition of M lc . So, the definition
of M lc should be

M lc ≜ (c = 0) ∧ 2[c′ = d ]c ∧ WFc (c′ = d )

[ ∨ (c′ = d ) ∧ (d′ = d )
  ∨ (d′ = c) ∧ (c′ = c)
  ∨ (c′ = d ) ∧ (d′ = c) ]⟨c,d ⟩
⁶An interleaving description is often taken to mean any description of a program's
executions as sequences of states and/or events, so by that meaning all TLA program
descriptions are interleaving descriptions.
Appendix A
Digressions
3. Russell is a mapping such that Russell (S ) ≠ M (S )(S ) for all sets S that
are mappings.
Proof: The value of any syntactically correct formula is a set, even if its
elements are unspecified. Therefore, M (S )(S ) is a set, and for any set T
there exists a set U such that U ≠ T . Thus, Russell is a mapping such
that Russell (S ) ≠ M (S )(S ) for every mapping S .
4. Russell (S ) ≠ S (S ) for all sets S that are mappings.
Proof: Substituting S for U in step 2 shows S (S ) equals M (S )(S ), which
by step 3 is unequal to Russell (S ).
5. Q.E.D.
Proof: Since Russell is a mapping, and all mappings are assumed to
be sets, substituting Russell for S in step 4 proves Russell (Russell ) ≠
Russell (Russell ), which equals false.
For example, the definition (2.29) of the mapping # has this form, where
Def is defined by:
Def (x , M ) ≜ if x = {} then 0
              else 1 + M (x \ {choose e : e ∈ x })
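In TLA+ the same pattern appears as a recursive function definition; restricting # to subsets of a fixed finite set Vals is our assumption (the book's # applies more generally).

---- MODULE CardSketch ----
EXTENDS Naturals

CONSTANT Vals

\* Cardinality, defined by removing one chosen element at a time.
Card[S \in SUBSET Vals] ==
    IF S = {} THEN 0 ELSE 1 + Card[S \ {CHOOSE e : e \in S}]
====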
To make sure that what we do is mathematically sound, we should define
what an arbitrary definition of the form (A.1) actually defines the mapping
M to be.
If we want the mapping M to be a function with domain D, the example
of the factorial function fact in Section 2.4.2 shows that we can define M to
equal
choose M : M = (x ∈ D ↦ Def (x , M ))
This suggests that (A.1) should define M (x ) to equal the value of f (x ) for
some function f , whose domain contains x , that satisfies f (y) = Def (y, f )
for all y in its domain. To do that, let's first define

fdef (D, f ) ≜ x ∈ D ↦ Def (x , f )
(Note that the definition of fix depends on fdef , whose definition depends
on Def .) We can then define (A.1) to mean:
(A.3)  M (x ) ≜ (choose f : (x ∈ domain(f )) ∧ fix (f )) (x )
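Putting the pieces together, (A.3) transcribes into TLA+ as follows. The definition of fix is elided in the text, so the evident one, f = fdef (domain(f ), f ), is an assumption here; the unbounded choose also means TLC cannot evaluate this, so the sketch is definitional rather than executable. Def is the cardinality example from above.

---- MODULE RecursiveMapping ----
EXTENDS Naturals

Def(x, M) == IF x = {} THEN 0 ELSE 1 + M[x \ {CHOOSE e : e \in x}]

fdef(D, f) == [x \in D |-> Def(x, f)]

fix(f) == f = fdef(DOMAIN f, f)   \* assumed definition of fix

M(x) == (CHOOSE f : (x \in DOMAIN f) /\ fix(f))[x]
====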
It apparently defines F (3) to equal x ′′′. It doesn't. To see why not, let's
simplify things by defining F to be a function with domain N:

F ≜ choose f : f = (n ∈ N ↦ if n = 0 then x else f (n − 1)′ )
There are also rules for deriving a Hoare triple for a program from Hoare
triples of its components. Such rules decompose the proof of a Hoare triple
for any program to proofs of Hoare triples for elementary statements of the
language, such as assignment statements.
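For instance, the familiar rule for sequential composition (whether it is among the rules elided here is our assumption) combines triples as follows:

{P} S {R} and {R} T {Q} imply {P} S ; T {Q}

Its Logic of Actions counterpart appears later in this section.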
It was quickly realized that pre- and postconditions are not adequate to
describe what a program should do. For example, suppose S is a program
to sort an array x of numbers. The obvious Hoare triple for it to satisfy has
a precondition asserting that x is an array of numbers and a postcondition
asserting that x is sorted. But this Hoare triple is true of a program that
simply sets all the elements of the array x to 0. A postcondition needs to
be able to state a relation between the final values of the variables and their
initial values. Various ways were proposed for doing this, one of them being
to allow formulas P and Q to contain constants whose values are the same
in the initial and final states. For example, the precondition for a sorting
program could assert that the constant x₀ equals x , and the postcondition
could assert that the elements of the array x are a sorted permutation of
the elements of x₀ .
Viewing a program as a relation between initial and final states means
that it can be described mathematically as a formula of the Logic of Actions.
|= P ∧ S ⇒ R′ and |= R ∧ T ⇒ Q′ imply |= P ∧ (S · T ) ⇒ Q′

|= (P ∧ R ∧ S ⇒ Q′ ) ∧ (P ∧ ¬R ∧ S ⇒ Q′ ) ⇒
   (P ∧ ((R ∧ S ) ∨ (¬R ∧ S )) ⇒ Q′ )

((R ∧ S )+ ∧ ¬R′ ) ∨ (¬R ∧ (v ′ = v ))

where v is the tuple of all program variables and (. . .)+ is defined in Sec-
tion 3.4.1.4. With this representation of the while statement, (A.8) can be
derived from the following rule of LA, where I is any state predicate and A
any action:

(A.9)  |= I ∧ A ⇒ I ′ implies |= I ∧ A+ ⇒ I ′
The LA definition of a Hoare triple implies that the validity of rule (A.8) is
proved by the following theorem:
elements p, q, and r of M :

M1. δ(p, q) = 0 iff p = q
M2. δ(p, q) = δ(q, p)
M3. δ(p, r ) ≤ δ(p, q) + δ(q, r )

Do you see why these conditions imply δ(p, q) ≥ 0 for all p and q in M ?
The set R of real numbers is a metric space with δ(p, q) equal to |p − q|,
where |r | is the absolute value of the number r , defined by

|r | ≜ if r ≥ 0 then r else −r
Theorem A.2 For any subset S of a metric space, S ⊆ C(S ) and C(S ) =
C(C(S )).
¹To handle the uninteresting case of the empty set, we can define δ̂(p, {}) to equal ∞,
which is a value satisfying r < ∞ for all r ∈ R.
Proof: The definition of C and property M1 imply S ⊆ C(S ) for any set S ,
which implies C(S ) ⊆ C(C(S )) for any S . Therefore, to show C(S ) = C(C(S )),
it suffices to assume p ∈ C(C(S )) and show p ∈ C(S ). By definition of C
and δ̂, we do this by assuming e > 0 and showing there exists q ∈ S with
δ(q, p) < e. Because p ∈ C(C(S )), there exists u ∈ C(S ) with δ(p, u) < e/2;
and u ∈ C(S ) implies there exists q ∈ S with δ(q, u) < e/2. By M2 and M3,
this implies δ(p, q) < e. End Proof
As you will have guessed by its name, the operator C on behavior predicates
is a special case of the closure operator C on metric spaces. But for now,
forget about behavior predicates and just think about metric spaces.
A set S that, like CD, equals its closure is said to be closed. The following
result shows that for any set S , its closure C(S ) is the smallest closed set
that contains S .
Theorem A.4 Any subset S of a metric space equals C(S ) ∩ D for a dense
set D.
Proof: Let M be the metric space and let D equal S ∪ (M \ C(S )). The set
D consists of all elements of M except those elements in the boundary of S
that are not in S . It follows from this that C(S ) ∩ D = S . Since elements
in the boundary of S are a distance 0 from S , which is a subset of D, they
are a distance 0 from D. Therefore all elements in M are a distance 0 from
D, so D is dense. End Proof
What we’re interested in is not the distance function δ, but the closure op-
erator C. Imagine that the plane was an infinite sheet of rubber that was
then stretched and shrunk unevenly in some way. Define the distance be-
tween two points on the original plane to be the distance between them
after the plane was deformed. For example, if the plane was stretched
to make everything twice as far apart in the y direction but the same
distance apart in the x direction, then δ(⟨x 1 , y 1 ⟩, ⟨x 2 , y 2 ⟩) would equal
√((x 1 − x 2 )² + (2 ∗ (y 1 − y 2 ))²). As long as the stretching and shrinking is
continuous, meaning that the rubber sheet is not torn, the boundary of a
set S in the plane after it is deformed is the set obtained by deforming the
boundary of S . This implies that the new distance function produces the
same closure operator as the ordinary distance function on the plane.
Topology is the study of properties of objects that depend only on a
closure operation, which need not be generated by a metric space. But we
are interested in a closure operator that is generated by a particular kind of
metric space, and it helps me to think in terms of its distance function.
Appendix B

Proofs
If the program were described in TLA instead of RTLA, the disjunct Stutter
would be removed from the definition of Next; and Next in the theorem
would be replaced by [Next]v , where v is the tuple ⟨x , t, pc⟩ of variables.
The proof of the theorem would be essentially the same, the only difference
being that the action Stutter would be replaced everywhere by its second
conjunct, which is v 0 = v .
The proof of the theorem is decomposed hierarchically. The first two
levels are determined by the logical structure of the theorem. There are two
standard ways to decompose the proof of a formula of the form F ⇒ G:
Steps 3 and 4 are simple enough that there is no need to decompose their
proofs. You should try to understand why these steps, and the others whose
proofs are given here, follow from the facts and definitions mentioned in
their proofs. To help you, a little bit of explanation has been added to some
of the proofs.
We now have to prove steps 1 and 2. They can both be decomposed using
the definition of Inv as a conjunction. We consider the proof of step 1. Here
is the first level of its decomposition.
1.1. TypeOK ′
1.2. ∀ i ∈ Procs : (pc′(i ) = b) ⇒ (t′(i ) ≤ NumberDone′ )
1.3. x ′ ≤ NumberDone′
1.4. Q.E.D.
Proof: By steps 1.1–1.3 and the definition of Inv .
Step 1.2 is the most difficult one to prove, so we examine its proof. The
standard way to prove a formula of this form is to assume i ∈ Procs and
pc′(i ) = b and prove t′(i ) ≤ NumberDone′. So, the first step of the proof
should be a Suffices step asserting that it suffices to make those assump-
tions and prove t′(i ) ≤ NumberDone′. Thus far, we have used only the
logical structure of the formulas, without thinking about what the formulas
mean. We can go no further that way. To write the rest of the proof of
step 1.2, we have to ask ourselves why an aStep(p) step starting in a state
with Inv true produces a state with t′(i ) ≤ NumberDone′ true.
When I asked myself that question, I realized that the answer depends
on whether or not i is the process p executing the step. That suggested
proving the two cases i ≠ p and i = p separately, asserting them as Case
statements. In figuring out how to write those two proofs, I found that both
of them required proving NumberDone′ = NumberDone. Moreover, this was
true for the same reason in both cases—namely, that an aStep step of any
process leaves NumberDone unchanged. Therefore, I could prove it once in
a single step that precedes the two Case statements. This produced the
following level-3 proof:
1.2.1. Suffices: Assume: i ∈ Procs and pc′(i ) = b
                 Prove: t′(i ) ≤ NumberDone′
1.2.2. NumberDone′ = NumberDone
1.2.3. Case: i = p
1.2.4. Case: i ≠ p
1.2.5. Q.E.D.
Proof: By steps 1.2.3 and 1.2.4.
This leaves three steps to prove. Here is the proof of step 1.2.4, which I
think is the most interesting one.
which together imply that the values of t(i ) and pc(i ) are unchanged.
(The definition of TypeOK is needed because type correctness is
required to deduce this.)
1.2.4.2. pc(i ) = b
Proof: By step 1.2.4.1 and the step 1.2.1 assumption pc′(i ) = b.
1.2.4.3. t(i ) ≤ NumberDone
Proof: By step 1.2.4.2, the step 1 assumption (which implies Inv ),
and the second conjunct in the definition of Inv .
1.2.4.4. Q.E.D.
Proof: Steps 1.2.4.1, 1.2.4.3, and 1.2.2 imply t′(i ) ≤ NumberDone′,
which is the current goal (introduced in step 1.2.1).
3. Q.E.D.
Proof: By steps 1 and 2.
Theorem 4.6 Let Init be a state predicate, Next an action, and v a tuple
of all variables occurring in Init and Next. If Ai is a subaction of Next for
all i in a countable set I , then the pair
⟨ Init ∧ 2[Next]v , ∀ i ∈ I : XFiv (Ai ) ⟩
is machine closed, where each XFiv may be either WF or SF.
imply 23E⟨Q⟩v ⇒ 23Q, which by the step 3.1 assumption 23E⟨Q⟩v
implies the step 3.1 goal 23Q.
4. Q.E.D.
Proof: By steps 1–3.
Proof sketch: For any behavior σ, let σ|x be the infinite sequence of n-
tuples of values such that σ|x (i ) equals the value of ⟨x⟩ in state σ(i ). The
basic idea is to define S so that the value of y in any state i of a behavior
of S always equals (σ|x )+i for some behavior σ satisfying F , and x always
equals y(0). (Remember that τ is the infinite sequence τ (0) → τ (1) → · · · ,
and τ +i equals τ (i ) → τ (i + 1) → · · · .)
To do this, for any infinite sequence τ of n-tuples of values, we define
F̃ (τ ) to equal F (σ) for any behavior σ such that σ|x equals τ . This uniquely
defines F̃ because, by hypothesis, the value of F (σ) depends only on the
values of the variables x in the behavior σ. Define IsTupleSeq to be the
mapping such that IsTupleSeq(τ ) is true iff τ is an infinite cardinal sequence
of n-tuples of arbitrary values. We then define S by letting:

Init ≜ ∃ τ : ∧ IsTupleSeq(τ ) ∧ F̃ (τ )
             ∧ (y = τ ) ∧ (⟨x⟩ = τ (0))

Next ≜ (y′ = Tail (y)) ∧ (⟨x⟩′ = y′(0))
With this definition, F (σ) equals true for a behavior σ iff there is a behavior
satisfying S in which the initial value of y is σ|x . Notice that σ is a halting
behavior iff τ ends with an infinite sequence of identical n-tuples. When
y equals that repeating value, Tail (y) = y, so ⟨Next⟩⟨x,y ⟩ equals false and
S allows only stuttering steps from that point on.
Eliminating the conjunct WF⟨x,y ⟩ (Next) allows S to halt even if y
initially equals σ|x for a non-halting behavior σ that satisfies F .
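Transcribed into TLA+, with infinite sequences modeled as functions on Nat, the definition of S looks as follows. Vals, n, and the stand-in Ftilde are placeholders, and TLC cannot enumerate such states, so this sketch is purely illustrative.

---- MODULE HistoryVariableS ----
EXTENDS Naturals

CONSTANTS Vals, n
VARIABLES x, y      \* x stands here for the tuple <<x_1, ..., x_n>>

IsTupleSeq(tau) == tau \in [Nat -> [1 .. n -> Vals]]

Ftilde(tau) == TRUE                 \* hypothetical stand-in for F~

Tail(tau) == [i \in Nat |-> tau[i + 1]]   \* tail of a Nat-indexed sequence

Init == \E tau : /\ IsTupleSeq(tau) /\ Ftilde(tau)
                 /\ (y = tau) /\ (x = tau[0])

Next == (y' = Tail(y)) /\ (x' = y'[0])
====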
[[z′ = x + y]](f̂ (σ))
≡ f̂ (σ)(1)(z ) = f̂ (σ)(0)(x ) + f̂ (σ)(0)(y)   By definition of [[. . .]].
≡ f (σ(1))(z ) = f (σ(0))(x ) + f (σ(0))(y)    By definition of f̂ .
≡ σ(1)(w ) = σ(0)(u) + σ(0)(v )               By definition of f .
1. Assume: i ∈ I
   Prove: ⟨B hi ⟩vh ≡ ⟨B i ∧ (h′ = exp i )⟩v
1.1. ⟨B hi ⟩vh ≡ B i ∧ (v ′ ≠ v ) ∧ (h′ = exp i ) ∧ (vh′ ≠ vh)
   Proof: By the definitions of B hi , Next i , and ⟨. . .⟩. . . .
1.2. ⟨B hi ⟩vh ≡ B i ∧ (v ′ ≠ v ) ∧ (h′ = exp i )
   Proof: By step 1.1, since vh = v ◦ ⟨h⟩ implies (v ′ ≠ v ) ∧ (vh′ ≠ vh) ≡
   (v ′ ≠ v ).
1.3. Q.E.D.
We now define ρ|x and F̃ to be the same as in the proof of Theorem 4.8,
except for finite behaviors ρ. We define ρ|x to be the finite cardinal sequence
of n-tuples of the same length as ρ such that ρ|x (i ) equals the value of ⟨x⟩
in state ρ(i ); and we define F̃ to be the predicate on finite sequences of
n-tuples of values such that F̃ (ρ|x ) equals F (ρ) (which by definition equals
F (ρ↑ )). We then define IF to equal F̃ (h).
Much as in the proof of Theorem 4.8, for any behavior τ , a finite prefix
ρ of τ satisfies F iff F̃ (ρ|x ) is true, which is true iff F̃ (h) is true of the last
state of ρ. Every state of τ is the last state of some finite prefix of τ , and
the safety property F is true of τ iff it is true of every finite prefix of τ , so
F is true of τ iff IF is true of every state of τ . This proves that IF is an
invariant of T h iff T h satisfies F ; and T satisfies F iff T h does, because F
depends only on the variables x. End Proof Sketch
Ai ≜ (i = ⟨x, z, t⟩′ ) ∧ Next th
A Next thp step removes the first element from p and appends that element
to the end of h. Therefore, the value of h ◦ p remains unchanged throughout
any behavior that satisfies T thp . The value of h ◦ p during a behavior σ
satisfying T thp equals the sequence of values of ⟨x, z, t⟩ in the entire behavior
σ, except that σ may have additional (stuttering) steps that leave ⟨x, z, t⟩
unchanged.
In any state of a behavior satisfying T thp , the value of (h ◦ p)(Len(h) − 1)
(the last element in the sequence h of m-tuples of values) is the current value
of ⟨x, z, t⟩. The variables h and p, together with the mapping Φ, contain all
the information needed to define a refinement mapping under which T thp
implements IS . To see how this is done, we need some notation.
For any behavior σ and state expression exp, define σ|exp to be the infinite
sequence of values such that (σ|exp )(i ) equals the value of exp in state σ(i ),
for all i ∈ N. Thus σ|⟨x,z,t ⟩ is the sequence of m-tuples of values of ⟨x, z, t⟩
in the states of σ. Define the mapping Φ̃ from infinite sequences of m-tuples
of values to behaviors so that Φ̃ (ρ) equals Φ(σ) for some behavior σ such
that σ|⟨x,z,t ⟩ = ρ. (It doesn't matter what values the states of σ assign to
variables other than those in x, z, and t since they don't affect whether or
not σ satisfies T .)
We are assuming that Φ(σ) satisfies IS and Φ(σ) ∼y σ. Therefore, for
any behavior satisfying T thp , for the value of h ◦ p in any state of that
behavior, Φ̃ (h ◦ p) satisfies IS and Φ̃ (h ◦ p) ∼y σ for some behavior σ such
that σ|⟨x,z,t ⟩ = h ◦ p.
To understand how to construct the needed refinement mapping, we con-
sider a simpler version of the theorem that would be true if we were using
A · B ≡ (∃ i ∈ I , j ∈ J : Ai · B j )   by (B.2)
      ⇒ (∃ i ∈ I , j ∈ J : B j · Ai )   since we assume Ai · B j ⇒ B j · Ai for all i and j
      ≡ B · A                          by (B.2), substituting I ← J , J ← I ,
                                       Ai ← B j , and B j ← Ai .
End Proof
implies |= S ⇒ P .
2. S = C(S ∧ 23Q)
Proof: By step 1 and Theorem 4.4.
3. Q.E.D.
Proof: By step 2, the assumption that P is a safety property, and The-
orem 4.2.
Bibliography
[1] Martín Abadi and Leslie Lamport. An old-fashioned recipe for real
time. ACM Transactions on Programming Languages and Systems,
16(5):1543–1571, September 1994. This paper has an appendix pub-
lished by ACM only online that contains proofs. Other online versions
of the paper might not contain the appendix.
[5] Selma Azaiez, Damien Doligez, Matthieu Lemerre, Tomer Libal, and
Stephan Merz. Proving determinacy of the PharOS real-time operating
system. In Michael Butler, Klaus-Dieter Schewe, Atif Mashkoor, and
Miklós Biró, editors, 5th Intl. Conf. Abstract State Machines, Alloy, B,
TLA, VDM, and Z (ABZ 2016), volume 9675 of LNCS, pages 70–85.
Springer, 2016.
[6] Arthur Bernstein and Paul K. Harter, Jr. Proving real time properties of
programs with temporal logic. In Proceedings of the Eighth Symposium
on Operating Systems Principles, pages 1–11, New York, 1981. ACM.
Operating Systems Review 15, 5.
[7] James E. Burns and Nancy A. Lynch. Bounds on shared memory for
mutual exclusion. Inf. Comput., 107(2):171–184, 1993.
[8] Ernie Cohen and Leslie Lamport. Reduction in TLA. In David San-
giorgi and Robert de Simone, editors, CONCUR’98 Concurrency The-
ory, volume 1466 of Lecture Notes in Computer Science, pages 317–331.
Springer-Verlag, 1998.
[11] Laurent Doyen, Goran Frehse, George J. Pappas, and André Platzer.
Verification of hybrid systems. In Edmund M. Clarke, Thomas A. Hen-
zinger, Helmut Veith, and Roderick Bloem, editors, Handbook of Model
Checking, pages 1047–1110. Springer, 2018.
[12] Michael Fischer. Re: Where are you? Email message to Leslie Lam-
port. Arpanet message sent on June 25, 1985 18:56:29 EDT, num-
ber 8506252257.AA07636@YALE-BULLDOG.YALE.ARPA (47 lines),
1985.
[15] Aman Goel, Stephan Merz, and Karem A. Sakallah. Towards an auto-
matic proof of the bakery algorithm. In Marieke Huisman and António
Ravara, editors, Formal Techniques for Distributed Objects, Compo-
nents, and Systems, volume 13910 of Lecture Notes in Computer Sci-
ence, pages 21–28. Springer, 2023.
[17] Thomas A. Henzinger, Zohar Manna, and Amir Pnueli. What good
are digital clocks? In Werner Kuich, editor, Automata, Languages
and Programming, 19th International Colloquium, ICALP92, Vienna,
[25] Peter Ladkin, Leslie Lamport, Bryan Olivier, and Denis Roegel. Lazy
caching in TLA. Distributed Computing, 12(2/3):151–174, 1999.
[40] Chris Newcombe, Tim Rath, Fan Zhang, Bogdan Munteanu, Marc
Brooker, and Michael Deardeuff. How amazon web services uses formal
methods. Communications of the ACM, 58(4):66–73, April 2015.
[44] Susan Owicki and Leslie Lamport. Proving liveness properties of con-
current programs. ACM Transactions on Programming Languages and
Systems, 4(3):455–495, July 1982.
[45] Marshall Pease, Robert Shostak, and Leslie Lamport. Reaching agree-
ment in the presence of faults. Journal of the ACM, 27(2):228–234,
April 1980.
[46] Amir Pnueli. The temporal logic of programs. In Proceedings of the 18th
Annual Symposium on the Foundations of Computer Science, pages 46–
57. IEEE, November 1977.
[47] Eric Verhulst, Raymond T. Boute, José Miguel Sampaio Faria, Bernard
H. C. Sputh, and Vitaliy Mezhuyev. Formal Development of a Network-
Centric RTOS. Springer, New York, 2011.
[48] Hagen Völzer and Daniele Varacca. Defining fairness in reactive and
concurrent systems. J. ACM, 59(3):13:1–13:37, 2012.