
A Science of Concurrent Programs

Leslie Lamport

version of 30 January 2024

This version contains a major change from the version of
6 January in the explanation of bound variables. The major
change is to Section 2.1.9, but it also affects Sections 2.2.3,
2.2.5, 2.4.1, and 6.1.2. Changes to other sections are minor.
However, I recommend ignoring these changes because
that material will be removed in the next version. I have
decided that the current explanation of ordinary math is too
long and too formal. I intend to make Chapter 2 very short
and informal, describing nothing not needed for Chapter 3.
Other math will be introduced later, when it is needed.
If you have any errors to report, comments to make, or
questions to ask about anything not involving bound vari-
ables, please contact me by email. But when you do, include
the version date. Anyone who is the first to report any error
will be thanked in the book. If you find an error, please in-
clude in your email your name as you wish it to appear, and
be sure also to include the version date.
Contents

About This Book (unfinished) . . . . . . . . . . . . . . . . . . . . . viii


Acknowledgments (unfinished) . . . . . . . . . . . . . . . . . . . . ix

1 Introduction 1
1.1 Who Am I? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Who Are You? . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 The Origin of the Science . . . . . . . . . . . . . . . . . . . . 2
1.3.1 The Origin of the Theory . . . . . . . . . . . . . . . . 2
1.3.2 The Origin of the Practice . . . . . . . . . . . . . . . . 3
1.4 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 A Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Why Math? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Ordinary Math 12
2.1 Arithmetic and Logic . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.1 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.2 Elementary Algebra . . . . . . . . . . . . . . . . . . . 14
2.1.3 An Introduction to Mathglish . . . . . . . . . . . . . . 16
2.1.4 Proofs in Elementary Algebra . . . . . . . . . . . . . . 17
2.1.4.1 An Example . . . . . . . . . . . . . . . . . . 18
2.1.4.2 Longer Proofs . . . . . . . . . . . . . . . . . 21
2.1.5 The Semantics of Elementary Algebra . . . . . . . . . 23
2.1.6 Arithmetic Logic . . . . . . . . . . . . . . . . . . . . . 27
2.1.7 Propositional Logic . . . . . . . . . . . . . . . . . . . . 29
2.1.8 The Propositional Logic of Arithmetic . . . . . . . . . 31
2.1.8.1 The Logic . . . . . . . . . . . . . . . . . . . . 31
2.1.8.2 More About Proofs . . . . . . . . . . . . . . 33
2.1.9 Predicate Logic . . . . . . . . . . . . . . . . . . . . . . 36
2.1.9.1 Quantifiers . . . . . . . . . . . . . . . . . . . 36
2.1.9.2 Variables and Their Scopes . . . . . . . . . . 38


2.1.9.3 Predicate Logic Reasoning . . . . . . . . . . 42


2.1.9.4 The choose Operator . . . . . . . . . . . . . 45
2.1.10 More About Proofs . . . . . . . . . . . . . . . . . . . . 46
2.1.10.1 Assume/Prove in General . . . . . . . . . . 46
2.1.10.2 Hierarchically Structured Proofs . . . . . . . 48
2.1.11 Some Useful Notation . . . . . . . . . . . . . . . . . . 50
2.1.11.1 if/then/else . . . . . . . . . . . . . . . . 50
2.1.11.2 Conjunction and Disjunction Lists . . . . . . 51
2.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.2.1 Simple Operators on Sets . . . . . . . . . . . . . . . . 52
2.2.2 More Sets and Set Operators . . . . . . . . . . . . . . 54
2.2.3 Two Set Constructs . . . . . . . . . . . . . . . . . . . 55
2.2.4 Venn Diagrams . . . . . . . . . . . . . . . . . . . . . . 56
2.2.5 Bounded Quantification . . . . . . . . . . . . . . . . . 57
2.2.6 Infinite Sets and Collections . . . . . . . . . . . . . . . 59
2.2.6.1 Infinite Sets . . . . . . . . . . . . . . . . . . 59
2.2.6.2 Mathematical Induction . . . . . . . . . . . . 61
2.2.6.3 Collections . . . . . . . . . . . . . . . . . . . 62
2.2.6.4 Eliminating Collections . . . . . . . . . . . . 63
2.2.7 Meaningless Expressions . . . . . . . . . . . . . . . . . 64
2.3 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.1 Functions as Mappings . . . . . . . . . . . . . . . . . . 65
2.3.2 Sequences and Tuples . . . . . . . . . . . . . . . . . . 67
2.4 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4.1 Ordinary Definitions . . . . . . . . . . . . . . . . . . . 70
2.4.1.1 Definitions with no Parameters . . . . . . . . 70
2.4.1.2 Definitions with Parameters . . . . . . . . . 71
2.4.2 Recursive Definitions . . . . . . . . . . . . . . . . . . . 72
2.4.3 Local Definitions . . . . . . . . . . . . . . . . . . . . . 73
2.4.3.1 Definitions in Proofs . . . . . . . . . . . . . . 74
2.4.3.2 Definitions in Formulas . . . . . . . . . . . . 74

3 Describing Abstract Programs with Math 75


3.1 The Behavior of Physical Systems . . . . . . . . . . . . . . . 75
3.2 Behaviors of Digital Systems . . . . . . . . . . . . . . . . . . 78
3.2.1 From Continuous to Discrete Time . . . . . . . . . . . 78
3.2.2 An Example: Sqrs . . . . . . . . . . . . . . . . . . . . 80
3.2.3 A Finer-Grained Example: FGSqrs . . . . . . . . . . . 84
3.3 Nondeterminism . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.4 Temporal Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 91

3.4.1 The Logic of Actions . . . . . . . . . . . . . . . . . . . 91


3.4.1.1 Eliminating State Numbers . . . . . . . . . . 91
3.4.1.2 The Semantics of the Logic of Actions . . . . 93
3.4.1.3 The Prime Operator . . . . . . . . . . . . . . 95
3.4.1.4 Action Composition . . . . . . . . . . . . . . 96
3.4.2 The Temporal Logic RTLA . . . . . . . . . . . . . . . 97
3.4.2.1 Simple RTLA . . . . . . . . . . . . . . . . . 98
3.4.2.2 The Complete RTLA . . . . . . . . . . . . . 100
3.4.2.3 The □ Operator . . . . . . . . . . . . . . 101
3.4.2.4 Eventually (◇) . . . . . . . . . . . . . . 102
3.4.2.5 Eventually Always (◇□) . . . . . . . . . 104
3.4.2.6 Infinitely Often (□◇) . . . . . . . . . . . 104
3.4.2.7 The End of the Line . . . . . . . . . . . . 105
3.4.2.8 Leads To (⇝) . . . . . . . . . . . . . . . 106
3.4.2.9 Warning . . . . . . . . . . . . . . . . . . . . 107
3.5 TLA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.5.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . 108
3.5.2 The Solution . . . . . . . . . . . . . . . . . . . . . . . 109
3.5.3 Stuttering Insensitivity . . . . . . . . . . . . . . . . . 111
3.5.4 The Definition of TLA . . . . . . . . . . . . . . . . . . 113

4 Safety, Liveness, and Fairness 115


4.1 Safety and Liveness . . . . . . . . . . . . . . . . . . . . . . . . 115
4.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . 115
4.1.2 A Completeness Theorem . . . . . . . . . . . . . . . . 117
4.1.3 The Operator C . . . . . . . . . . . . . . . . . . . . . . 119
4.1.4 What Good is Liveness? . . . . . . . . . . . . . . . . . 120
4.2 Fairness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.2.1 Traditional Programs and Enabled . . . . . . . . . . . 121
4.2.2 Concurrent Programs . . . . . . . . . . . . . . . . . . 121
4.2.2.1 Mutual Exclusion . . . . . . . . . . . . . . . 122
4.2.2.2 Machine Closure . . . . . . . . . . . . . . . . 126
4.2.3 Weak Fairness . . . . . . . . . . . . . . . . . . . . . . 127
4.2.4 Temporal Logic Reasoning . . . . . . . . . . . . . . . . 130
4.2.5 Reasoning With Weak Fairness . . . . . . . . . . . . . 131
4.2.5.1 Liveness for Mutual Exclusion . . . . . . . . 131
4.2.5.2 The One-Bit Algorithm . . . . . . . . . . . . 132
4.2.5.3 Proving Liveness . . . . . . . . . . . . . . . . 134
4.2.6 Strong Fairness . . . . . . . . . . . . . . . . . . . . . . 138
4.2.6.1 Starvation Free Mutual Exclusion . . . . . . 138

4.2.6.2 The Definition of Strong Fairness . . . . . . 139


4.2.6.3 Using a Strongly Fair Semaphore . . . . . . . 140
4.2.7 Properties of WF and SF . . . . . . . . . . . . . . . . 141
4.2.8 What is Fairness? . . . . . . . . . . . . . . . . . . . . 143
4.3 Possibility and Accuracy . . . . . . . . . . . . . . . . . . . . . 144
4.3.1 Possibility Conditions . . . . . . . . . . . . . . . . . . 144
4.3.2 Expressing Possibility in TLA . . . . . . . . . . . . . . 145
4.3.3 Checking Accuracy . . . . . . . . . . . . . . . . . . . . 147
4.4 Real-Time Programs . . . . . . . . . . . . . . . . . . . . . . . 148
4.4.1 Fischer’s Algorithm . . . . . . . . . . . . . . . . . . . 149
4.4.2 Correctness of Fischer’s Algorithm . . . . . . . . . . . 152
4.4.3 Fairness and Zeno Behaviors . . . . . . . . . . . . . . 153
4.4.4 Discrete Time . . . . . . . . . . . . . . . . . . . . . . . 155
4.4.5 Hybrid Systems . . . . . . . . . . . . . . . . . . . . . . 157

5 Refinement 159
5.1 A Sequential Algorithm . . . . . . . . . . . . . . . . . . . . . 160
5.1.1 A One-Step Program . . . . . . . . . . . . . . . . . . . 161
5.1.2 Two Views of Refinement Mappings . . . . . . . . . . 162
5.1.3 A Step and Data Refinement . . . . . . . . . . . . . . 164
5.2 Invariance Under Refinement . . . . . . . . . . . . . . . . . . 167
5.3 An Example: The Paxos Algorithm . . . . . . . . . . . . . . . 168
5.3.1 The Consensus Problem . . . . . . . . . . . . . . . . . 169
5.3.2 The Paxos Consensus Algorithm . . . . . . . . . . . . 171
5.3.2.1 The Specification of Consensus . . . . . . . . 171
5.3.2.2 The Voting Algorithm . . . . . . . . . . . . . 172
5.3.2.3 The Paxos Abstract Program . . . . . . . . . 174
5.3.3 Implementing Paxos . . . . . . . . . . . . . . . . . . . 176
5.4 Proving Refinement . . . . . . . . . . . . . . . . . . . . . . . 178
5.4.1 The Refinement Mapping . . . . . . . . . . . . . . . . 179
5.4.2 Refinement of Safety . . . . . . . . . . . . . . . . . . . 180
5.4.3 Refinement of Fairness . . . . . . . . . . . . . . . . . . 183
5.4.4 A Closer Look at E . . . . . . . . . . . . . . . . . . . 186
5.4.4.1 A Syntactic View . . . . . . . . . . . . . . . 186
5.4.4.2 Computing E . . . . . . . . . . . . . . . . . 187
5.4.4.3 The Trouble With E . . . . . . . . . . . . . 189
5.5 A Warning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

6 Auxiliary Variables 194


6.1 Variable Hiding . . . . . . . . . . . . . . . . . . . . . . . . . . 194
6.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 194
6.1.2 Reasoning About ∃ . . . . . . . . . . . . . . . . . . . 196
6.1.3 The Definition . . . . . . . . . . . . . . . . . . . . . . 197
6.2 History Variables . . . . . . . . . . . . . . . . . . . . . . . . . 199
6.2.1 How to Add a History Variable . . . . . . . . . . . . . 199
6.2.2 History Variables and Fairness . . . . . . . . . . . . . 203
6.2.3 A Completeness Result for History Variables . . . . . 205
6.3 Stuttering Variables . . . . . . . . . . . . . . . . . . . . . . . 205
6.3.1 The Example . . . . . . . . . . . . . . . . . . . . . . . 206
6.3.2 Adding Stuttering Steps After an Action . . . . . . . 208
6.3.3 Adding Stuttering Steps Before an Action . . . . . . . 212
6.3.4 Fairness and Stuttering Variables . . . . . . . . . . . . 213
6.3.5 Infinite-Stuttering Variables . . . . . . . . . . . . . . . 216
6.4 Prophecy Variables . . . . . . . . . . . . . . . . . . . . . . . . 217
6.4.1 Simple Prophecy Variables . . . . . . . . . . . . . . . 217
6.4.2 Predicting the Impossible and Liveness . . . . . . . . . 221
6.4.3 General Prophecy Variables . . . . . . . . . . . . . . . 222
6.4.3.1 A Sequence of Prophecies . . . . . . . . . . . 223
6.4.3.2 A Set of Prophecies . . . . . . . . . . . . . . 226
6.4.3.3 Further Generalizations . . . . . . . . . . . . 229
6.4.4 The Existence of Refinement Mappings . . . . . . . . 231
6.5 The FIFO Queue . . . . . . . . . . . . . . . . . . . . . . . . . 232
6.5.1 Fifo – A Linearizable Specification . . . . . . . . . . . 232
6.5.2 POFifo – A More General Specification . . . . . . . . 234
6.5.2.1 The Background . . . . . . . . . . . . . . . . 234
6.5.2.2 Program POFifo . . . . . . . . . . . . . . . . 236
6.5.3 Showing IPOFifo Implements Fifo . . . . . . . . . . . 240
6.5.3.1 The Prophecy Variable . . . . . . . . . . . . 240
6.5.3.2 The History Variable qBar . . . . . . . . . . 241
6.5.3.3 Stuttering and the Refinement Mapping . . . 245
6.6 Prophecy Constants . . . . . . . . . . . . . . . . . . . . . . . 246

7 Loose Ends 249


7.1 Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 249
7.1.1.1 What Reduction Is . . . . . . . . . . . . . . 249
7.1.1.2 The TLA Approach . . . . . . . . . . . . . . 251
7.1.2 An Example . . . . . . . . . . . . . . . . . . . . . . . 252

7.1.2.1 The Reduced Behaviors . . . . . . . . . . . . 254


7.1.2.2 The Program S ⊗ R . . . . . . . . . . . . 255
7.1.2.3 The Invariant . . . . . . . . . . . . . . . . . . 259
7.1.3 Reduction In General . . . . . . . . . . . . . . . . . . 261
7.1.4 The Hypothesis □◇¬L . . . . . . . . . . . . . . . . 265
7.1.5 Adding Liveness . . . . . . . . . . . . . . . . . . . . . 265
7.1.5.1 Fairness of Subactions of E R . . . . . . . . . 266
7.1.5.2 Fairness of Subactions of M R . . . . . . . . . 269
7.1.5.3 The Reduction Theorem with Fairness . . . . 272
7.1.6 An Example: Making Critical Sections Atomic . . . . 273
7.1.7 Another Example: Pipelining . . . . . . . . . . . . . . 276
7.2 Decomposing and Composing Programs . . . . . . . . . . . . 280
7.2.1 Decomposing Programs . . . . . . . . . . . . . . . . . 282
7.2.1.1 Writing a Program as a Conjunction . . . . . 282
7.2.1.2 Decomposing Proofs . . . . . . . . . . . . . . 284
7.2.2 Composing Components . . . . . . . . . . . . . . . . . 290

Appendix 295

A Digressions 295
A.1 Why Not All Mappings Are Sets . . . . . . . . . . . . . . . . 295
A.2 Recursive Definitions of Mappings . . . . . . . . . . . . . . . 296
A.3 How Not to Write x ′′′ . . . . . . . . . . . . . . . . . . . . . 298
A.4 Hoare Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
A.5 Another Way to Look at Safety and Liveness . . . . . . . . . 301
A.5.1 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . 301
A.5.2 The Metric Space of Behaviors . . . . . . . . . . . . . 304

B Proofs 306
B.1 Invariance Proof of Increment . . . . . . . . . . . . . . . . . . 306
B.2 Proof of Theorem 4.2 . . . . . . . . . . . . . . . . . . . . . . . 309
B.3 Proof of Theorem 4.3 . . . . . . . . . . . . . . . . . . . . . . . 311
B.4 Proof of Theorem 4.4 . . . . . . . . . . . . . . . . . . . . . . . 312
B.5 Proof of Theorem 4.5 . . . . . . . . . . . . . . . . . . . . . . . 313
B.6 Proof of Theorem 4.6 . . . . . . . . . . . . . . . . . . . . . . . 313
B.7 Proof of Theorem 4.7 . . . . . . . . . . . . . . . . . . . . . . . 315
B.8 Proof Sketch of Theorem 4.8 . . . . . . . . . . . . . . . . . . 317
B.9 Proof of Formula (5.4) . . . . . . . . . . . . . . . . . . . . . . 318
B.10 Proof of Theorem 6.2 . . . . . . . . . . . . . . . . . . . . . . . 318

B.11 Proof Sketch of Theorem 6.3 . . . . . . . . . . . . . . . . . . 320


B.12 Proof Sketch of Theorem 6.6 . . . . . . . . . . . . . . . . . . 321
B.13 Proof of Theorem 7.2 . . . . . . . . . . . . . . . . . . . . . . . 324
B.14 Proof of Theorem 7.3 . . . . . . . . . . . . . . . . . . . . . . . 325

Bibliography 327

Index 332

About This Book (unfinished)


I hope that this book will be published, but that the pdf version will be
free and only the printed version will be sold. In the published version, this
section will contain practical information about reading the book. Here is
some practical information for reading the current version.

• The pdf version contains links to other places in the book. They
are colored like this. I have not decided in what color links should
be. Besides links to numbered things like equations, there are a few
other links in the text from terms to where they are defined or
introduced. I have not decided whether there should be more or none
of those other links.

• There will be an index in the published version, but there is none in
this preliminary version. In the pdf version, page numbers in the index
will be uncolored links to those pages.

• The Appendix contains some hopefully interesting digressions and the
proofs of most theorems.

• Almost all the examples are tiny, so the reader can concentrate on the
principles without being distracted by complications in the examples.
These tiny examples have been written in TLA+ and verified by model
checking; any errors in what is asserted about them are the result of
incorrect transcriptions of what has actually been verified. The TLA+
versions of the examples will be made available on the Web.

• The book contains a few marginal notes. The marginal notes in this
version, colored like this, are notes to myself for things to be checked
when preparing the published version.

Acknowledgments (unfinished)
Because the ideas in this book were developed over several decades, it’s im-
possible for me to give credit to everyone who influenced their development.
I will therefore restrict this section to acknowledging only people who were
coauthors of papers of which I’ve been an author. They will be listed alpha-
betically. I haven’t decided whether to simply list them, or to indicate what
their influence has been. I currently expect there to be between 16 and 24
people listed, but I may change my mind on who might be included.
I will also add acknowledgments to people who read preliminary versions
and reported errors or sent me comments that led to significant changes.
Chapter 1

Introduction

1.1 Who Am I?
Dear Reader. I am inviting you to spend many pages with me. Before
deciding whether to accept my invitation, you may want to know who I am.
I was educated as a mathematician; my doctoral thesis was on partial
differential equations. While a student, I worked part-time and summers as a
programmer. At that time, almost all programs were what I call traditional
programs—ones with a single thread of control that take input, produce
output, and stop.
After obtaining my doctorate, I began working on concurrent algorithms—
ones comprising multiple independent threads of control, called processes,
that are executed concurrently. The first concurrent algorithms were meant
to be executed on a single computer, and processes communicated through a
shared memory. Later came distributed algorithms—concurrent algorithms
designed to be executed on multiple computers in which processes commu-
nicate by message passing.
This is not the place for modesty. I was very good at concurrency—both
writing concurrent algorithms and developing the theory underlying them.
The first concurrent algorithm I wrote, published in 1974, is still taught at
universities. In 1978 I published what is probably the first paper on the
theory of distributed computing. I have received many awards and honors
for this work, including a Turing award (generally considered the Nobel
prize of computer science) for “fundamental contributions to the theory and
practice of distributed and concurrent systems.”


1.2 Who Are You?


You probably belong to one of two classes of people who I will call scien-
tists and engineers. Scientists are computer scientists who are interested in
concurrent computing. If you are a scientist, you should be well-prepared
to decide if this book interests you and to read it if it does.
Engineers are people involved in building concurrent programs. If you
are an engineer, you might have a job title such as programmer, software
engineer, or hardware designer. I need to warn you that this book is about
a science, not about its practical application. Practice is discussed only to
explain the motivation for the science. If you are interested just in using
the science, you should read about the language TLA+ and its tools, which
are the practical embodiment of the science [27, 34]. But if you want to
understand the underlying science, then this book may be for you.
Like many sciences, the book’s science of concurrent programs is based
on mathematics. The book assumes only that you know the math one learns
before entering a university. All other necessary math not introduced along
with the science is explained in Chapter 2. Most of that math is taught at
universities in an introductory math course for computer science students,
though not as rigorously as it is presented here. Scientists should be used to
reading math. You may find the math hard if you’re an engineer. But unless
miseducation has burdened you with an insurmountable fear of mathematics,
I encourage you to give the book a try. Learning the math will improve your
thinking.

1.3 The Origin of the Science


The science that is the subject of this book, which I will call our science,
is a mathematical theory with a practical goal. That goal is to help build
concurrent programs that work correctly. Exactly what “working correctly”
means and why it’s an important goal are explained in Section 1.4. The
origin of our science explains how I came to believe it’s a good foundation
for trying to achieve that goal.

1.3.1 The Origin of the Theory


The first concurrent algorithm was published in 1965 by Edsger Dijkstra [9].
I started writing concurrent algorithms around 1973, and I quickly learned
that they were hard to get right. The many possible orders in which the op-
erations of different processes can be executed lead to an enormous number
of possible executions that have to be considered. The only way to ensure
that the algorithm worked correctly was to prove that it did.
By the 1970s, a standard approach had been developed for proving cor-
rectness of traditional programs. Around 1975, I and a few other computer
scientists began extending that approach to concurrent algorithms [4, 23, 28,
43]. Concurrent algorithms were usually written in pseudocode plus some
informal explanation of what the pseudocode meant. I came to realize that
all these methods for proving correctness could be explained by describing
a concurrent algorithm as what I am now calling an abstract program; and
an abstract program could be described mathematically.
Correctness of an algorithm was expressed by properties required of its
executions. I came to realize that correctness can also be expressed by
an abstract program—a more abstract, higher-level one than describes the
algorithm. Proving correctness means showing that the abstract program
describing the algorithm implements the abstract program describing its
correctness, and I developed a method for doing that.
This work culminated around 1990 with a way to write an abstract pro-
gram as a single formula [31]. The formula is written in an obscure form
of math called temporal logic. The particular temporal logic is called TLA
(for the Temporal Logic of Actions). Most of the TLA formula for an ab-
stract program consists of ordinary math that expresses essentially what
was described by pseudocode. Temporal logic replaces the informal expla-
nation of the pseudocode. That one abstract program implements another
is expressed as logical implication together with mathematical substitution.
Throughout this period, I was writing correctness proofs of the algo-
rithms I was inventing. This showed me that my way of reasoning with
abstract programs worked in practice. However, I discovered that as my
algorithms got more complicated and the formulas describing them became
larger, the method of writing proofs used by mathematicians became unre-
liable. It could not ensure that all the details were correct. I had to devise
a method of hierarchically structuring proofs to keep track of those details.

1.3.2 The Origin of the Practice


I have spent most of my career as a member of industrial research labs. The
computer science I have done has been motivated by the problems facing
system builders—sometimes before they were aware of those problems. I
have devoted the last part of my career to developing tools to help them—
both intellectual tools to help them think better and programs to help them
detect errors before they are implemented in code. These tools are based on
what I learned by writing and reasoning about concurrent algorithms.


Programming is not just coding. It requires thinking before we code.
Writing algorithms taught me that there are two things we need to decide
before writing and debugging the code: what the program should do and
how the program should do it. Most programmers think that the code
itself adequately describes “how the program should do it”, but I learned
that we need a higher-level, more abstract description of what the program
does. To emphasize that programming is more than just coding, I now
use the name coding language for what are commonly called programming
languages. That name is used in this book.
An algorithm is an example of a description of how a program should do
something. Concurrent algorithms are hard to understand. To invent them,
I had to be able to write them in the simplest way possible. Algorithms were
usually written in pseudocode to avoid the complexity that real code requires
to permit efficient execution. I developed a way to describe concurrent
algorithms in math that was more precise and no harder to understand
than pseudocode.
Engineers who build complex systems usually recognize the need for
describing what their programs do in a simpler, more abstract way than
with code. I decided that abstract programs written in math provided such
a way for describing the aspects of a system that involve concurrency. By
about 1995, I had designed a complete language called TLA+ that engineers
could use to write abstract programs in TLA.
The abstract programs I know of that have been written by engineers to
describe what a system should do generally consist of about 200–2000 lines
of TLA+. All but a few of those lines are ordinary math, not temporal logic.
As with code, those formulas are made easy to understand by decomposing
them into smaller pieces. This is done using simple definitions, rather than
the more complex constructs of coding languages.
To formalize mathematics and make it easier to write long formulas, I
had to add to TLA+ some concepts and syntax not present in the math com-
monly used by mathematicians—for example, variable declarations, group-
ing definitions into modules, and notation for substitution. This book uses
TLA, but not TLA+, because the examples with which it illustrates our
science are short and simple.
The kind of hierarchically structured proofs I devised can also be writ-
ten in TLA+, and there is a program for checking the correctness of those
proofs. However, with today’s proof-checking technology, writing machine-
checked proofs takes more time than engineers generally have. By the time
I designed TLA+, model checking had become a practical tool for checking
the correctness of abstract programs. A model checker can essentially check
correctness of all possible executions of a very small instance of an abstract
program. This turns out to be very effective at detecting errors. There
are two model checkers for abstract programs written in TLA+, using two
complementary approaches. Model checking is the standard way engineers
check those programs.
A program’s code can, in principle, be described by a (concrete) abstract
program and could, in principle, be written as a TLA+ formula. For a simple
program (or part of a program), the code can be hand-translated to TLA+
and checked with the TLA+ tools. Usually, the length of the program and
the complexity of the coding language makes this impractical.
From the point of view of our science, it makes no difference how long a
formula an abstract program is. We therefore consider a program written in
a coding language to be an abstract program. And since we are considering
only abstract programs, we will let program mean abstract program. We
will call an (abstract) program written in code a concrete program.
Although we don’t write them as formulas, viewing concrete programs as
abstract programs provides a new way of thinking about them. One benefit
of this way of thinking is that understanding what it means for a concrete
program to implement a higher-level abstract program can help avoid coding
errors.

1.4 Correctness
Thus far, our science has been described as helping to build concurrent
programs that work correctly. Working correctly is a vague concept. Here
is precisely what we take it to mean.
We define a behavioral property to be a condition that is or is not sat-
isfied by an individual execution of a program. For example, termination
is a behavioral property. An execution either terminates or else it doesn’t
terminate, meaning that it keeps executing forever. We say that a program
satisfies a behavioral property if every possible execution of the program sat-
isfies it. A program is considered to work correctly, or simply to be correct,
if it satisfies its desired behavioral properties.
That every possible execution of a program satisfies its behavioral prop-
erties may seem like an unreasonably strong requirement. I would be happy
if a program that I use does the right thing 99.99% of the times I run it. For
many programs, extensive testing can ensure that it does. But it can’t for
most concurrent programs. What a concurrent program does can depend on
the relative order in which operations by different processes are executed.
This makes the program nondeterministic, meaning that different executions
can do different things, even if the program receives identical inputs. This
can result in an enormous number of possible executions, and testing can
examine only a tiny fraction of them. Moreover, a concurrent program that
has run correctly for years can start producing frequent errors because a
small change to the computer hardware, the operating system, or even the
other programs running at the same time causes incorrect executions that
have never occurred before. The only way to prevent this is to ensure that
every possible execution satisfies the behavioral properties.
Model checking is more effective at finding errors in concurrent programs
than ordinary testing because it checks all possible executions. However,
it does this only on a few small instances of the program—for example,
an instance with few processes or one that allows only a small number of
messages to be in transit at any time.1 Engineering judgment is required
to decide if correctness of those instances provides enough confidence in the
correctness of the program.
There is one way testing could find errors in concrete programs. When
building a concurrent system, an abstract program is often used to model
how the processes interact with one another, and correctness of that pro-
gram is checked. The concrete program is then coded by implementing each
process of the more abstract program by a separate process in the code.
Since there is no concurrency within an individual process, testing that the
concrete program implements the more abstract program has a good chance
of finding coding errors. Research on this approach is in progress.

    1 There are techniques for proving the correctness of a program by model checking a
simpler program, but they have not been implemented for abstract programs written in
TLA+.
[Marginal note: Is there a citation for this?]
1.5 A Preview
To give you an idea of what our science is like, this section describes in-
formally a simple abstract program for Euclid’s algorithm—a traditional
algorithm that computes a value and stops. It’s a very simple concurrent
program in which the number of processes equals 1. Our science applies to
such programs, although there are simpler sciences that work quite well for
them.
Euclid’s algorithm computes the greatest common divisor (GCD) of two
positive integers that we will call M and N . For example, the GCD of 12 and
16, written GCD(12, 16), equals 4 because 4 is the largest integer such that
12 and 16 are both multiples of that integer. The algorithm is an abstract
program containing two variables that we name x and y. Here is its prose
description.

Start with x equal to M and y equal to N and repeatedly perform the
following action until the program stops:
If the values of variables x and y are equal, then stop; otherwise,
subtract from the variable having the larger value the value of the
other variable.2
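
Although TLA is not introduced until Chapter 3, readers who already know
TLA+ may find it helpful to see one possible way of turning this description
into a formula. The following module is only an illustrative sketch, not the
formalization developed later in the book; the names Euclid, Init, Next, and
Spec are illustrative choices.

    ---- MODULE Euclid ----
    (* A sketch of Euclid's algorithm, assuming the constants M and N
       are positive integers. *)
    EXTENDS Integers
    CONSTANTS M, N
    VARIABLES x, y

    Init == (x = M) /\ (y = N)

    (* A step subtracts the smaller value from the larger one.  No step
       is possible when x = y, so the program then stops. *)
    Next == \/ (x > y) /\ (x' = x - y) /\ (y' = y)
            \/ (y > x) /\ (y' = y - x) /\ (x' = x)

    Spec == Init /\ [][Next]_<<x, y>>
    ====

For M = 12 and N = 16, this describes the four-state execution shown later
in this section.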

I believe most engineers and many scientists can’t explain why an execution
of Euclid’s algorithm computes GCD(M , N ), which means that they don’t
understand the algorithm. Here is the explanation provided by our science,
beginning with how we view executions.
We consider an execution to be a sequence of states. For Euclid’s algo-
rithm, a state is an assignment of values to the program variables x and y.
We write the state that assigns 7 to x and 42 to y as [x :: 7, y :: 42]. Here is
the sequence of states that is the execution of Euclid’s algorithm for M = 12
and N = 16.

    [x :: 12, y :: 16] → [x :: 12, y :: 4] → [x :: 8, y :: 4] → [x :: 4, y :: 4]

The states in the sequence are separated with arrows because we naturally
think of an execution going from one state to the next. But in terms of our
science, the algorithm and its execution just are; they don’t go anywhere.
What an algorithm does in the future depends on its current state, not
on what happened in the past. This means that in the final state of the
execution, in which x and y are equal, they equal GCD(M , N ) because of
some property that is true of every state of the execution. To understand
Euclid’s algorithm, we must know what that property is.
That property is GCD(x , y) = GCD(M , N ). (Chapter 3 explains how
we show that every state satisfies this property.) Because an execution stops
only when x and y are equal, and GCD(i , i ) equals i for any positive integer
i , this property implies that x and y equal GCD(M , N ) in the final state of
the execution.
    2 You may have seen a more efficient modern version of Euclid's algorithm that replaces
the larger of x and y by the remainder when it is divided by the smaller. For the purpose
of this example, it makes little difference which we use.
That the formula GCD(x , y) = GCD(M , N ) is true in every state of a
program’s execution is a behavioral property. A behavioral property that
asserts a formula is true in all states of an execution is called an invariance
property, and the formula is called an invariant of the program. Correctness
of any concurrent program depends on it satisfying an invariance property.
To understand why the program is correct, we have to know the invariant
of the program that explains its correctness.
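For readers who know TLA+, here is one possible way of writing this
invariant so that a model checker can verify it for small values of M and
N. It is only an illustrative sketch; in particular, the definition of GCD
below is a hypothetical one, chosen for clarity rather than efficiency.

    (* A hypothetical definition: GCD(m, n) is the largest integer that
       divides both m and n, assuming m and n are positive integers. *)
    GCD(m, n) ==
      CHOOSE d \in 1..m :
        /\ m % d = 0
        /\ n % d = 0
        /\ \A e \in 1..m : ((m % e = 0) /\ (n % e = 0)) => (e =< d)

    (* The invariant that explains Euclid's algorithm. *)
    Inv == GCD(x, y) = GCD(M, N)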
The invariant GCD(x , y) = GCD(M , N ) shows that, if Euclid’s algo-
rithm terminates, then it produces the correct output. A traditional pro-
gram must also satisfy the behavioral property of termination. The two
behavioral properties

• The program produces correct output if it terminates.

• The program terminates.

are special cases of the following two classes of behavioral properties that
can be required of a concurrent program:

Safety What the program is allowed to do.

Liveness What the program must eventually do.

These two classes of properties are defined precisely in Section 4.1. Termina-
tion is the only liveness property required of a traditional program. There
are many kinds of liveness properties that can be required of concurrent
programs.
Euclid’s algorithm satisfies its safety requirement (being allowed to ter-
minate only if it has produced the correct output) because the only thing
it is allowed to do is start with x = M and y = N and execute its action.
That is, it satisfies its safety requirement because it is assumed to satisfy the
safety property of doing only what the description of the algorithm allows
it to do.
Euclid’s algorithm satisfies its liveness requirement (eventually terminat-
ing) because it is assumed to satisfy the liveness property of eventually per-
forming any action that its description allows it to perform. (Section 3.4.2.8
shows how we prove that the algorithm terminates.)
I have found it best to describe and reason about safety and liveness
in different ways. In our science, temporal logic plays almost no role in
handling safety, but it is central to handling liveness. The TLA formula for
an abstract program is the logical conjunction of a safety property and a
liveness property.
A single-process algorithm that computes a value and stops doesn't seem
to be a good example for a science of concurrent programs. So, let’s consider
a concurrent version of Euclid’s algorithm. It’s a two-process version of the
algorithm, suitable for execution on a single computer with the processes
communicating through shared variables. We call the two processes the x
process and the y process. The algorithm uses the same program variables
as the one-process version of Euclid’s algorithm and it begins in the same
starting state, with x = M and y = N . If the x process hasn’t stopped, it
is allowed to execute the following action whenever the when condition is
true:

When x ≥ y, stop if x = y, else subtract the value of y from x .

Process y is the same as process x , except with x and y interchanged.
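
In the same illustrative sketch notation as before, each process's action is
one of the two disjuncts of the one-process version's Next action, which is
why the two versions have the same executions:

    (* The x process's action: it subtracts y from x when x > y.  When
       x = y, the process stops, meaning it takes no further steps. *)
    XNext == (x > y) /\ (x' = x - y) /\ (y' = y)

    (* The y process's action is the same with x and y interchanged. *)
    YNext == (y > x) /\ (y' = y - x) /\ (x' = x)

    (* The next-state action of the two-process version is the
       disjunction of the two processes' actions, which equals the Next
       action of the one-process version. *)
    TwoProcessNext == XNext \/ YNext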


Written in pseudocode, this version of Euclid’s algorithm looks differ-
ent from the one-process version. However, if we consider the executions
of the two versions, we see that they are the same. That is, they have the
same sequence of states, where a state is an assignment of values to x and y.
Whether we view Euclid’s algorithm as a one- or two-process algorithm may
affect how we implement it with a concrete program. An implementation
of the two-process algorithm with a two-process concrete program would
probably be less efficient than a single-process implementation of the one-
process algorithm. Since these two versions of the algorithm have the same
executions, from the point of view of correctness they are the same algo-
rithm. Both versions are written as the same TLA formula. More precisely,
their formulas are equivalent. There are many equivalent ways to write a
mathematical formula. How we choose to write the TLA formula for Eu-
clid’s algorithm can depend on whether we view it as a one- or two-process
algorithm.

1.6 Why Math?


The science of bridge building has a mathematical basis, but bridge design-
ers don’t represent a bridge by a mathematical formula. Why should we
describe an abstract program with one? The simple answer is, because we
can. A concrete program is not a physical object; it’s a concept. Code is
just one representation of that concept. While possible in theory, writing a
mathematical representation of a concrete program is not practical. How-
ever, for simpler abstract programs, it is possible; and I’ve found it to be a
good way to represent them.

Math has been developed over thousands of years to be simple and ex-
pressive. An abstract program ignores many implementation details, which
often means allowing multiple possible implementations. This is simple to
express in math. Code is designed to describe one way of computing some-
thing. It can be hard or even impossible to write code that allows all those
possibilities. Being based on concepts from coding languages, pseudocode
also lacks the simplicity and expressiveness of math.
One place we want to allow many possible implementations is in describ-
ing what the environment can do. A program can’t work in an arbitrary
environment. An implementation of Euclid’s algorithm will not produce the
correct answer if the operating system can modify the variables x and y. A
concurrent program can interact with its environment in complicated ways,
and we have to state explicitly what the program assumes about its environ-
ment to know if it's correct. We usually want to assume as little as necessary
about the environment, which means the abstract program should allow it
to have many different behaviors.
Unanticipated behavior of the environment is a serious source of errors
in concurrent programs. Part of the environment of a program is likely to
be another program, such as an operating system. Avoiding errors may
require finding answers to subtle questions about exactly what that other
program does. This is often difficult, because the only description of what
it does other than its code is likely to be imprecise prose. When writing the
abstract program to describe what our concrete program does, describing
what the environment can do will tell us what questions we have to ask.
The expressiveness of math, embodied in TLA, provides a practical
method of writing and checking the correctness of high-level designs of sys-
tems. Such checking can catch errors early, when they are easier to correct.
TLA+ is used by a number of companies, including Amazon [40], Microsoft,
and Oracle. Math also provides a new way of thinking about programs that
can lead to better programming. There is usually no way to quantify the
result of better thinking, but it was possible in the following instance.
Virtuoso was a real-time operating system. It controlled some instru-
ments on the European Space Agency’s Rosetta spacecraft that explored a
comet. Its creators decided to build the next version from scratch, starting
with a high-level design written in TLA+. They described their experience
in a book [47]. The head of the project, Eric Verhulst, wrote this in an email
to me:

The [TLA+] abstraction helped a lot in coming to a much cleaner
architecture. One of the results was that the code size is about 10× less
than in [Virtuoso].3

This result was unusual. It was possible only because the design of the
entire system was described with TLA+. Usually, TLA+ is used to describe
only critical aspects of a system that involve concurrency, which represent
a small part of the system’s code. But this example dramatically illustrates
that describing abstract programs with mathematics can produce better
programs.

    3 The book states the reduction in code size to be a factor of 5–10. Verhulst explained
to me that it was impossible to measure the reduction precisely, so the book gave a
conservative estimate; but he believes it was actually a factor of 10.
Chapter 2

Ordinary Math

We will write an abstract program as a mathematical formula. The program
could be quite complex, leading to a long formula. A long formula with a
lot of esoteric mathematics would be impossible to understand. Almost all
the math used in our formulas is ordinary math, consisting of arithmetic,
simple logic, sets, and functions. You should know it already if you took
an introductory university math course for computer science or engineering
students. The ordinary math used in this book is explained in this chapter,
but from a more sophisticated viewpoint than in introductory math courses.
Longer and easier to understand intuitive explanations of most of the math
discussed here can be found on the Web.
We will write an abstract program as a TLA formula. While most of that
formula consists of ordinary math, TLA is a temporal logic, and temporal
logic is not ordinary math. The meaning of TLA formulas will be explained
in terms of ordinary math. However, although not hard to understand,
temporal logic doesn’t satisfy all the properties of ordinary math. Avoiding
mistakes when using it requires a good understanding of precisely what its
formulas mean. Giving a meaning to the formulas of a logic is called defining
a semantics of the logic. This chapter explains how to define a semantics
by defining the meaning of the formulas of ordinary math. Even if you’re
familiar with the math presented here, the explanation of its semantics may
be new to you. Therefore, you should at least skim this chapter carefully.
The chapter also describes the method of writing proofs used in this
book, which is quite different from how mathematicians write proofs. This
method makes the proofs easier to understand, which makes them more
reliable. I hope that, some day, engineers will be able to write proofs of
correctness of their abstract programs. The proofs mathematicians write
are too unreliable to provide confidence in the correctness of programs. En-


gineers will have to use the proof method presented here, or something like
it. Scientists would also benefit from using it.
The description of Euclid’s algorithm in Chapter 1 used the program
variables x and y. Program variables are different from the variables of
ordinary math, such as the ones used in elementary algebra. The values of
the program variables x and y in Euclid’s algorithm change during execution
of the algorithm. The values of the variables x and y of elementary algebra
remain the same throughout a calculation. They are like the constants M
and N of Euclid’s algorithm. For now, we will use the term variable to mean
a variable of ordinary math, like the ones of elementary algebra.
We begin with what I believe is the hardest math that you will have to
know—a branch of math that takes people years to learn. It’s much too
difficult to explain here, so I will have to assume that you’ve already learned
it. It’s called arithmetic.

2.1 Arithmetic and Logic


2.1.1 Numbers
Arithmetic is about numbers. The first numbers you learned about are the
positive integers 1, 2, 3, etc. You then learned about more and more kinds of
numbers until eventually you learned about the real numbers, which include
integers, rational numbers like 3/4, and lots of other numbers like −√2
and π (which equals 3.14159. . . ). Although the numbers we actually use
are almost always integers, most of our discussion here applies to all real
numbers, so we’ll let number mean real number.
We’ll use the same notation for the operations of arithmetic that you
learned in school—for example +, / (division), and ≥ ; except that multipli-
cation is written “∗” because mathematicians use × to mean something else.
We’ll also use one operator of arithmetic that you probably didn’t learn as
a child. That operator is the modulo operator % , written mod by mathe-
maticians. For any integer n greater than 0 and any integer m, the value of
m % n is the remainder when m is divided by n. The precise definition is
that m % n equals the unique integer r satisfying
m = d ∗ n + r and 0 ≤ r < n
for some integer d .
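For example, 17 % 5 equals 2, since 17 = 3 ∗ 5 + 2 and 0 ≤ 2 < 5; and
(−3) % 5 also equals 2, since −3 = (−1) ∗ 5 + 2.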
When studying arithmetic, you probably spent most of your time learn-
ing to calculate with numbers. We’ll do only a few very simple numerical
calculations. What we will use are the properties of numbers that you should
have learned—for example:
(2.1) 3 ∗ (√3 + π) = (3 ∗ √3) + (3 ∗ π)

2.1.2 Elementary Algebra


After learning about numbers, you probably continued your study of arith-
metic with a course in what was called elementary algebra, or something sim-
ilar in another language. That course introduced the concept of variables.
A variable represented an unspecified number. You wrote expressions like
x ∗ (y + 3) whose value depends on the values of x and y. For example, if x
equals 2 and y equals −4, then the value of that expression is 2 ∗ (−4 + 3),
which equals −2. You learned how to calculate with expressions containing
variables. For example, you learned that x ∗ (y + 3) equals x ∗ y + x ∗ 3,
regardless of the values of x and y.
There are two kinds of expressions in elementary algebra:

• Ones like x ∗ (y + 3), whose value after substituting numbers for the
variables is a number. We call such an expression a numeric expres-
sion. A number is a numeric expression that has no variables.

• Ones like x ∗ (y + 3) ≥ 7, that are either true or false after substituting


numbers for the variables. We call such an expression a formula, and
we say that the value of a formula after substituting numbers for the
variables is either true or false, which are two distinct values called
Booleans. We often say that a formula is true (or false) for some values
of its variables, but that’s just a short way of saying that the value
of the formula obtained by substituting those values for the variables
equals true (or false).
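For example, if x equals 2 and y equals −4, then the value of the
formula x ∗ (y + 3) ≥ 7 is the value of −2 ≥ 7, which equals false.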

Most of the expressions you wrote in elementary algebra were either formulas
or parts of formulas; and most of the formulas you wrote were equations.
Most of what you did in elementary algebra consisted of solving equations.
That meant finding a single value for each variable such that substituting
those values for the variables in the equations made the equations equal
true.
We’re not interested in solving equations. We will describe programs
with formulas, so we need to understand those formulas. This requires
understanding some basic concepts of formulas. The formulas of elementary
algebra are used to explain these concepts because you’re familiar with them.

An important class of formulas are ones that equal true no matter what
values are substituted for their variables. Such a formula is said to be valid ;
and the assertion that F is valid is written |= F . For example, the truth of
formula (2.1) is a special case of:

(2.2) |= p ∗ (q + r ) = p ∗ q + p ∗ r

The thing we write as |= F is not a formula of elementary algebra. It’s an as-


sertion about the formula F . Formally, |= F is a metamathematical formula
or meta-formula. We usually call it something else like an assertion or a
condition. When |= F is true for an interesting formula F , mathematicians
generally call F a theorem. But I will sometimes call |= F a meta-formula
to remind you that it’s not a formula.
Logicians write ⊢ F to mean that the formula F can be proved from
axioms and proof rules (also called inference rules). They call F a theorem
if ⊢ F is true. Like most mathematicians, we won't worry about proving
formulas from axioms. We prove something by showing that it follows from
ordinary math, and we reason about the ordinary math as rigorously as
necessary. This book explains how proof rules are used to reduce the proof
of correctness of a program to proofs of simpler mathematical properties.
Most often used are the rules of ordinary math, which are explained in this
chapter. If you need a proof to be completely rigorous, you will have to
learn to use a theorem-proving program; you will never prove things from
axioms. I will call a formula an axiom when we are assuming it to be true
and it can’t be deduced from formulas we have already assumed to be true.
We can deduce (2.1) from (2.2) by the following rule:

Elementary Algebra Instantiation Rule Substituting any numeric
expressions for (some or all of the) variables in a valid formula yields
a valid formula.

For example, applying the rule to (2.2), substituting 3 for p, √3 for q, and
π for r produces the assertion that (2.1) is a valid formula.
Mathematicians have no standard notation for describing substitution,
and the notation I’ve seen used by computer scientists is impractical for the
formulas that arise in describing programs. The notation used in this book,
illustrated with substitution for three variables, is that

E with v 1 ← exp1, v 2 ← exp2, v 3 ← exp3

is the expression obtained from the expression E by simultaneously substi-


tuting exp1 for v 1, exp2 for v 2, and exp3 for v 3; where v 1, v 2, and v 3 are
distinct variables and exp1, exp2, and exp3 are numeric expressions. For
example
( p ∗ (q + r ) with q ← r , r ← q + s ) = p ∗ (r + (q + s))
As in this example, the with expression is usually enclosed in parentheses
when it appears in a formula, otherwise the formula would be difficult to
parse.

2.1.3 An Introduction to Mathglish


Math is precise, but this book isn’t written in math. It’s written in English
that explains math. Explaining the precise meaning of math in the imprecise
language of English is not easy. To help them do this, English-speaking
mathematicians speak and write in a dialect of English I call Mathglish. (I
expect mathematicians use similar dialects of other languages.) Mathglish
differs from English in two ways: It eliminates some of the imprecision of
English by giving a precise meaning to some imprecise English words or to
newly invented words, and it makes the written language more compact by
using mathematical formulas to replace English phrases.
This book is written in Mathglish. This chapter explains the Mathglish
you need to know to read the book. This section discusses the second feature
of Mathglish—the use of formulas to replace prose.
Consider these two sentences:
1. Substituting y + 1 for z in formula (42) yields x ≥ y + 1.
2. Formula (42) shows us that x ≥ y + 1.
Grammatically, we can see that the two uses of “x ≥ y + 1” are different.
In sentence 1 it’s a noun, while in sentence 2 it’s a complete clause. In
sentence 1, “x ≥ y + 1” is a formula; in sentence 2 it’s an abbreviation for
“x is greater than or equal to y + 1”. This grammatical difference tells us
that the two sentences have very different meanings. Sentence 2 asserts that
the formula x ≥ y + 1 is true. The first doesn’t tell us whether it is true or
false. For example, sentence 1 could be followed by:
Since (41) implies x < y + 1, this proves that (42) is false.
It isn’t always possible to tell just from the sentence whether or not it’s
asserting that a formula is true. For example, we don’t know which the
following sentence is doing.
3. From (42) we deduce x ≥ y + 1.

In that case, we have to look at the context in which the sentence appears.
The formula x ≥ y + 1 can be true only in a context in which some assump-
tions have been made about the values of x and y + 1—assumptions that
are expressed by formulas that are assumed to be true. Sentence 3 asserts
that x ≥ y + 1 is true if and only if formula (42) has either been assumed or
shown to be implied by assumptions made about x and y. I have tried to
make it clear by grammar or context what it means when a formula appears
in a sentence in this book.
There’s another source of ambiguity in most mathematical writing that
I have tried to avoid in this book. Almost no mathematicians other than
logicians write the meta-formula |= F differently from the formula F . In
most written math, you have to tell from the context which is meant.

2.1.4 Proofs in Elementary Algebra


Engineers now seldom write proofs of correctness of abstract programs.
Hand proofs are not reliable, and computer-checked proofs are usually too
much work because the proofs must be carried out to a very low level of
detail for a computer to check them. However, proof checkers are getting
better. Although we don’t know when it will happen, machine learning
should make it possible for programs to check proofs written at a higher
level—proofs that are much easier to write. The proof method described
here is the best way I know to write those proofs.
Even before proving correctness becomes easier, engineers as well as
computer scientists should learn how to write proofs. Learning how to reason
about a class of formulas is part of understanding those formulas. Learning
how to prove properties of programs teaches you about programs and their
properties.
A mathematician’s proof is a sequence of paragraphs written in Math-
glish. This way of writing proofs is adequate for very simple proofs. How-
ever, it is not reliable for complicated proofs. It is particularly unsuited to
handling the many details in a proof of correctness of a program. This book
teaches you how to write proofs that are easier to read than the proofs writ-
ten by mathematicians. Such proofs make it much harder to prove things
that aren’t true. We will use ordinary paragraph proofs only to prove simple
results and to serve as proof sketches that aren’t meant to be a proof.

2.1.4.1 An Example
In school, you learned to solve this pair of equations:

(2.3) 3 ∗ x − 2 ∗ y = 7 and 7 ∗ x + 3 ∗ y = 1

If you can still do it, you will find that the solution is x = 1 and y = −2. In
any case, you can easily calculate that substituting those values for x and y
makes both of these equations true. But is that the only solution? Are there
other values of x and y for which the equations are true? The procedure
you would have followed to find the solution actually proves that it is the
only one. As an example, we will write that procedure as a proof.
The method of solving equations (2.3) is based on some rules. One is:

Rule EqAdd    Assume: m = n, p = q
              Prove:  m + p = n + q
This rule states that if the values of the variables m, n, p, and q make
both the formulas m = n and p = q true, then they make the formula
m + p = n + q true. If the values of those variables make m = n or p = q (or
both) false, then the rule tells us nothing about the value of m + p = n + q.
For example, suppose we know that these two formulas are true:

(2.4) 42 = x + y and 2 ∗ x = y + 1

We can apply the Elementary Algebra Instantiation Rule by substituting
m ← 42, n ← x + y, p ← 2 ∗ x , q ← y + 1 in rule EqAdd to deduce this
from (2.4):

(2.5) 42 + 2 ∗ x = (x + y) + (y + 1)

The second rule used in solving (2.3) is:

Rule EqMult   Assume: m = n
              Prove:  p ∗ m = p ∗ n
Another rule that is ubiquitous in mathematics is:

Substitution Rule   If e1 equals e2 and exp is an expression containing
e1 as a subexpression, then exp equals the expression obtained from it
by replacing e1 with e2.

For example, I presume you can calculate with elementary algebra expres-
sions, so you know that 42 + 2 ∗ x equals 2 ∗ x + 42 and (x + y) + (y + 1)
equals x + 2 ∗ y + 1. We can then apply the Substitution Rule twice, the first
time substituting (2.5) for exp, 42 + 2 ∗ x for e1, and 2 ∗ x + 42 for e2 to
deduce:

2 ∗ x + 42 = (x + y) + (y + 1)

A second application of the Substitution Rule, substituting (x + y) + (y + 1)
for e1 and x + 2 ∗ y + 1 for e2, then implies the truth of:

2 ∗ x + 42 = x + 2 ∗ y + 1

We now come to the proof that x = 1 and y = −2 is the only solution
to the equations (2.3). The theorem that asserts this and its proof are in
Figure 2.1. The theorem asserts that if the values of x and y make true our
two equations, which are numbered as assumptions 1 and 2, then the two
equations of the Prove clause must be true for those values of x and y.
This is a precise way to say that 1 and −2 are the only values of x and
y that make equations (2.3) true, without having to give solving a precise
meaning.
The proof consists of a sequence of numbered steps, each but the last one
consisting of a formula and a proof. Each of those first seven steps asserts
that its formula is true if the theorem’s two assumptions are true. That
is, each formula is true for all values of x and y for which the assumptions
are true. A step’s proof explains why the step’s formula is true under those
assumptions.
I think most people would say that the step 1 formula is obtained by
multiplying assumption 1 (our first equation) by 3. The proof gives a more
detailed explanation, saying that truth of the formula follows from the truth
of assumption 1 and the EqMult rule with 3 substituted for p. I didn’t say
what expressions were substituted for m and n because I thought it was
obvious that the left- and right-hand sides of the equation were substituted
for them. But to give you a hint, and in case you’ve completely forgotten
elementary algebra, I reminded you how to multiply 3 ∗ x − 2 ∗ y by 3. (I
assumed it was obvious that 3 ∗ 7 equals 21.)
Step 2 is similar, multiplying the second equation by 2, but the proof
leaves the multiplying to you. Step 3 adds the formulas of step 1 and step 2,
which means applying Rule EqAdd. I thought it was obvious what was
being substituted for the variables in that rule. Steps 4, 5, and 7 are similar,
except that, for no particular reason, I used words instead of the symbol ←
to describe the substitutions in the rules. Step 5 is applying the Substitution
Rule, with assumption 2 substituted for exp and with x and 1 substituted for
e1 and e2. I indicated what was being substituted where, but didn’t mention

Theorem Assume: 1. 3 ∗ x − 2 ∗ y = 7,
2. 7 ∗ x + 3 ∗ y = 1
Prove: x = 1 and y = −2

1. 9 ∗ x − 6 ∗ y = 21
Proof: By assumption 1 and Rule EqMult with p ← 3,
since 3 ∗ (3 ∗ x − 2 ∗ y) = 9 ∗ x − 6 ∗ y.
2. 14 ∗ x + 6 ∗ y = 2
Proof: By assumption 2 and Rule EqMult with p ← 2.
3. 23 ∗ x = 23
Proof: By steps 1 and 2 and Rule EqAdd.
4. x = 1
Proof: By step 3 and Rule EqMult, substituting 1/23 for p.
5. 7 ∗ 1 + 3 ∗ y = 1
Proof: By step 4 and assumption 2 with 1 substituted for x .
6. 3 ∗ y = −6
Proof: By step 5 and Rule EqAdd, substituting −7 for m and n.
7. y = −2
Proof: By step 6 and Rule EqMult, substituting 1/3 for p.
8. Q.E.D.
Proof: By steps 4 and 7.

Figure 2.1: Proof of uniqueness of the solution to (2.3).



the Substitution Rule. That rule is such a basic part of mathematics that
it is taken for granted and never explicitly mentioned in a proof.
Finally, we come to the last statement. “Q.E.D.” is an abbreviation for
the goal of the proof, which is the Prove clause of the theorem. In this case,
the goal is to prove the two formulas x = 1 and y = −2. The proof simply
points to the steps in which those formulas are proved. A proof always ends
with a Q.E.D. step, so we’re sure that we’ve actually proved what we were
supposed to.
How we write a proof depends on how hard the proof is and how so-
phisticated we expect the reader of the proof to be. This proof was written
for someone less sophisticated than I expect most readers of this book to
be, since I wanted you to concentrate on the proof style rather than on the
math. A single-paragraph prose proof would probably be fine for a reader
who hasn’t forgotten elementary algebra.
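
If you want a machine to replay this reasoning, the rules used in the proof
are easy to mimic in a few lines of code. Here is a minimal sketch in Python;
representing an equation a ∗ x + b ∗ y = c by the coefficient triple (a, b, c)
is my choice, not anything defined in this book.

   from fractions import Fraction

   # An equation a*x + b*y = c is represented by the triple (a, b, c).
   eq1 = (Fraction(3), Fraction(-2), Fraction(7))   # assumption 1
   eq2 = (Fraction(7), Fraction(3), Fraction(1))    # assumption 2

   def eq_mult(eq, p):    # Rule EqMult: multiply both sides by p
       return tuple(p * t for t in eq)

   def eq_add(e, f):      # Rule EqAdd: add corresponding sides
       return tuple(s + t for s, t in zip(e, f))

   step3 = eq_add(eq_mult(eq1, 3), eq_mult(eq2, 2))   # step 3: 23*x = 23
   x = step3[2] / step3[0]                            # step 4: x = 1
   y = (eq2[2] - eq2[0] * x) / eq2[1]                 # steps 5-7: y = -2
   assert (x, y) == (1, -2)
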

2.1.4.2 Longer Proofs


The kind of proof illustrated by Figure 2.1 is more reliable and easier to
read than a prose proof. It makes clear the sequence of intermediate results
that are being proved and exactly what is being used to prove each of those
results. However, a sequence doesn’t work for the long proofs needed to
prove complex results—such as the correctness of the abstract programs
that engineers write.
The method of handling complexity that’s obvious to an engineer is
hierarchical structuring. Figure 2.2 shows how the proof of Figure 2.1 can
be structured. The high-level proof consists of steps 1, 2, and 3. This level-
1 proof is suggested by the goal of the theorem, which consists of the two
formulas asserted by steps 1 and 2. The proof of its Q.E.D. step (step 3) is
the same as the proof of the Q.E.D. step of the Figure 2.1 proof, except the
steps it refers to have been renumbered.
The proof of step 1 consists of steps 1.1–1.4. Those steps are the same as
steps 1–4 of Figure 2.1, the Q.E.D. of step 1.4 being the goal of that proof,
which is x = 1. The proofs of steps 1.1–1.4, which have been omitted to save
space, are the same as the proofs of the corresponding steps of Figure 2.1.
Similarly, the proof of step 2 consists of steps 2.1–2.3, which are the same
as steps 5–7 of Figure 2.1.
Note that the proof of step 3 is a prose paragraph, while the proofs of
steps 1 and 2 are sequences of steps each with a prose proof. In general,
proofs of some or all of those level-2 steps can be further decomposed. Dif-
ferent parts of the proof can be decomposed down to different levels. It’s a

1. x = 1
1.1. 9 ∗ x − 6 ∗ y = 21

1.2. 14 ∗ x + 6 ∗ y = 2

1.3. 23 ∗ x = 23

1.4. Q.E.D.
2. y = −2
2.1. 7 ∗ 1 + 3 ∗ y = 1

2.2. 3 ∗ y = −6

2.3. Q.E.D.
3. Q.E.D.
Proof: By steps 1 and 2.

Figure 2.2: Structured version of the proof in Figure 2.1.

good idea to make a Q.E.D. step a simple paragraph that you write first, so
you don’t waste time proving steps that don’t imply the proof’s goal.
At the bottom of the hierarchical structure are steps whose proof is
written in prose. That prose should be easy to understand, so the reader
can be sure that it’s correct. How easy that has to be depends on the
reader, who may just be you. If you find that the proof isn’t easy enough to
understand, you should decompose it another level. I’ve found that the way
to avoid errors is to decompose a proof down to the level where the prose
proof is obviously correct, and then go one level deeper. For machine-checked
proofs, the bottom-level proofs are instructions for the proof checker. If the
checker fails to check the proof and you believe the step is correct, then keep
decomposing until either the checker says it’s correct or you see why it’s not.
Long proofs, especially correctness proofs of programs, can be quite deep.
For proofs more than three or four levels deep, we use a compact numbering
system explained in Section 2.1.10.2 below.
In this example, the theorem’s assumptions and goals were mathemati-
cal formulas, as were the assertions made by the steps. This should be the
case for theorems asserting correctness of programs—except perhaps in some
cases at the deepest levels of the proof. In most mathematical proofs, includ-
ing proofs about the math underlying our science of concurrent programs,

the theorem and the assertions of the steps consist of prose statements, such
as “x is a prime number.” Those statements may be a few sentences long.
The prose describes mathematical formulas, but getting the details exactly
right isn’t as important for those theorems as it is for programs. Hierarchi-
cally structured prose proofs are reliable enough for them.

2.1.5 The Semantics of Elementary Algebra


We are now going to over-analyze elementary algebra. We’ll perform the
kind of hairsplitting that may delight logicians and philosophers, but is of no
interest to scientists and engineers who use the math. But in later chapters
we will use temporal logic. While not difficult, temporal logic is different
from ordinary math in subtle ways. It’s easy even for mathematicians to
make mistakes when using it. Our over-analysis of elementary algebra will
help us avoid those mistakes.
The formulas of elementary algebra are literally strings of characters. To
know if how we reason about them makes sense, we need to know what those
character strings mean. We must give a semantics to elementary algebra.
There’s a philosophical question of how we can do that because we have to
write the meaning in some language, and how do we know what the words
or symbols of that language mean? We’re not going to go there.
I believe that I understand arithmetic. I can’t tell you what the number
2 or the number 4 is,1 but I understand that 2 + 2 equals 4. That kind of
understanding is good enough. If I can express the meaning of any formula
of elementary algebra in terms of arithmetic, then I am confident that I
understand elementary algebra. I assume your knowledge of arithmetic is
also good enough.
The meaning of a mathematical theory can be expressed in terms of
collections and mappings. Mathematicians generally call a collection a set
and call a mapping a function. For a particular mathematical theory, such as
elementary algebra, we can take collections and mappings to be the same as
sets and functions. However, we don’t want to restrict the values of variables
in abstract programs to be values in some small collection of mathematical
theories. We want to allow the value of a variable to be any mathematical
object. When we do that, we find that a set is a particular kind of collection,
and a function is a particular kind of mapping. Mathematicians sometimes
call a collection a class.
1 Mathematicians have defined these numbers, but the explanation of what the things
in those definitions are is no better than our explanation of what a number is.

We can’t define precisely what a collection is. I could say that a collection
is a bunch of things, but I would then have to define what a bunch is. I
assume you know what a collection is. The things that a collection is a
collection of are usually called values. For example, the values in the theory
of elementary algebra are numbers.
A mapping M from a collection C of values to a collection D of values
is something that assigns to each value v in C a value M (v ) in D. (The
collections C and D can be the same.) We say that such a mapping M is
a mapping on C. For example, we can define a mapping LengthOfName
from a collection of people to the collection of natural numbers by defining
LengthOfName(p) to equal the number of letters in the name of p, for every
person p in a collection D of people. If Jane is in D, then this defines
LengthOfName(Jane) to equal 4.
A predicate is a Boolean-valued mapping—that is, a mapping M on a
collection C such that M (v ) equals true or false for each value v in C.
The meaning of an elementary algebra formula is a predicate on interpreta-
tions, where an interpretation is a mapping from variables to numbers. We
define the meaning [[F ]] of a formula F to be the predicate that maps an
interpretation to the Boolean value obtained by replacing each variable in
F by the value assigned to that variable by the interpretation. (If you’re
not used to reading formulas with Greek letters, go now to Figure 2.3.) For
example, if Υ is an interpretation, then [[x + y > 42]](Υ) equals
Υ(x ) + Υ(y) > 42. If Υ(x ) = 3 and Υ(y) = 27, then [[x + y > 42]](Υ)
equals 3 + 27 > 42, which equals false.
This definition of [[x + y > 42]] makes no sense. Here’s why. A semantics
assigns a meaning to a formula. A formula is a string of characters. Its
meaning is a mathematical object. Let’s write the string of characters that is
the formula in a font like this: x + y > 42; and let’s write mathematical objects
like numbers or variables in the font used throughout this book. I claimed
that [[x + y > 42]](Υ) equals 3 + 27 > 42, which is a meaningless combination
of the numbers 3 and 27 and the four characters + > 4 2 .
Elementary algebra has a grammar. The grammar tells us that x + y is an
expression but + x + isn’t, that the + in x + y is an operator with the subex-
pressions x and y as its arguments, and that x + y * z is parsed as x + (y * z).
I assume you know this grammar, so you understand how the expressions
that appear in examples are parsed. To deduce that [[x+y>42]](Υ) equals
3+27 > 42 for the particular interpretation Υ, I replaced x by [[x]](Υ), which
equals 3, and y by [[y]](Υ), which equals 27. I then replaced the syntactic
tokens +, >, and 42 of elementary algebra by the tokens of arithmetic that
are spelled the same. This “punning” works for elementary algebra because
it is so closely related to arithmetic. It won’t work for TLA, so we use a

I have been told that many engineers freak out when they see a
Greek letter like Υ in a formula. If you’re one of them, now is the
time to get over it. You had no trouble dealing with π as a child;
you can now handle a few more Greek letters. They’re used
sparingly in this book, but sometimes representing a particular
kind of object with Greek letters makes the text easier to read.
Here are all the Greek letters used in the book, along with their
English names. You don’t have to remember their names; you
just need to distinguish them from one another.
Lowercase
α alpha λ lambda π pi τ tau
β beta µ mu ρ rho φ phi
δ delta ν nu σ sigma ψ psi
ϵ epsilon (also written ε)
Uppercase
Λ Lambda Υ Upsilon Π Pi Φ Phi
∆ Delta
Figure 2.3: Greek letters used in this book.

more general approach.


We define the meanings of formulas by defining the meanings of all ex-
pressions, which for elementary algebra means formulas and numeric expres-
sions. We define rules for operators that can appear in an expression. For
example, the rule for the operator symbol + in a formula is that for any
character strings exp1 and exp2 describing numeric expressions and any in-
terpretation Υ, we define [[exp1 + exp2 ]](Υ) to equal [[exp1 ]](Υ) + [[exp2 ]](Υ).
We also define [[v]](Υ) to equal Υ(v ) for a variable v ; and for any string dstr
of digits, we define [[dstr ]](Υ) to be the number represented by that string
of digits. Thus, if Υ(x ) = 3 and Υ(y) = 27, then:
[[x + y > 42]](Υ) = [[x + y]](Υ) > [[42]](Υ) by the rule for >
= [[x]](Υ) + [[y]](Υ) > 42 by the rules for + and numbers
= 3 + 27 > 42 by the rule for variables
= false
It follows immediately from our definition of validity that a formula F is
valid if and only if [[F ]](Υ) equals true for all interpretations Υ.
If we were going to do all this very rigorously—for example, to write a
program to calculate the meanings of formulas—we would define a mapping

from formulas to parse trees and apply the rules to the parse tree. But we
won’t be that compulsive.
Implicit in this exposition is that an interpretation assigns values to all
possible variables, not just the ones in any particular expression. The value
of [[F ]](Υ) for a formula F depends only on the values the interpretation Υ
assigns to variables that occur in F . But letting an interpretation assign
values to all variables simplifies things, because it means we don’t have to
keep track of which variables an interpretation is assigning values to.
We assume that there are infinitely many variables. We do this for the
same reason we assume there are infinitely many integers even though we
only ever use relatively few of them: it makes things simpler not to have to
worry about running out of them.
Let’s review what we have done. An expression is a string of characters.
We define the meaning [[exp]] of an expression exp to be a mapping that
assigns to every interpretation Υ a value [[exp]](Υ) that is either a number
or a Boolean, where an interpretation Υ is a mapping that assigns to each
variable v a value Υ(v ) that is a number. We define [[exp]] by defining
[[exp]](Υ) as follows. An expression is either (i) a variable or (ii) a string
of characters that represents an operator op applied to its arguments. In
case (i), [[exp]](Υ) equals Υ(exp). In case (ii), we define for each operator
op the value of [[exp]] in terms of the values [[arg]] for each argument arg of
op in the expression exp. This defines the meaning of the operator op. For
the operator +, we define [[arg1 + arg2 ]] to be the expression [[arg 1 ]] + [[arg 2 ]]
of arithmetic. The + in [[arg1 + arg2 ]] is a one-character string of characters,
and the + in [[arg 1 ]] + [[arg 2 ]] is an operation of arithmetic. We consider an
expression like 42 that represents a number to be an operator that takes no
arguments, where [[42]](Υ) equals 42 for every interpretation Υ.
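
This recursive definition of [[exp]] is easy to turn into a program. Here is
a minimal sketch in Python; the tagged-tuple representation of expressions
and the use of a dictionary as an interpretation are my choices, not part of
the semantics being defined.

   # An expression is a tagged tuple; an interpretation is a dict from
   # variable names to numbers.
   def meaning(exp, interp):
       tag = exp[0]
       if tag == 'var':                  # [[v]](Y) = Y(v)
           return interp[exp[1]]
       if tag == 'num':                  # [[42]](Y) = 42
           return exp[1]
       a = meaning(exp[1], interp)       # the meanings of the operator's
       b = meaning(exp[2], interp)       # two arguments
       if tag == '+':                    # rule for +
           return a + b
       if tag == '>':                    # rule for >
           return a > b
       raise ValueError('unknown operator: ' + tag)

   formula = ('>', ('+', ('var', 'x'), ('var', 'y')), ('num', 42))  # x + y > 42
   assert meaning(formula, {'x': 3, 'y': 27}) == False
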
The similarity in the way we write the operator + of elementary algebra
and the + of arithmetic is obviously not accidental. Syntactically, elemen-
tary algebra is an extension of arithmetic to include variables. However, the
formula 2 + 3 has a different meaning in elementary algebra than in arith-
metic. If we were to give a semantics to a language of arithmetic, then
[[2 + 3]] would equal the number 5. In elementary algebra, [[2 + 3]] is a map-
ping such that [[2 + 3]](Υ) equals 5 for every interpretation Υ. Elementary
algebra is an extension of arithmetic in the sense that if exp 1 and exp 2 are
two expressions of arithmetic, such as 2 + 3 and 5, then [[exp1 ]] and [[exp2 ]]
are equal in the semantics of arithmetic if and only if they’re equal in the
semantics of elementary algebra.
The purpose of this over-analyzing of elementary algebra was to explain

what it means to define the meaning of mathematical operators such as +.


From now on, we’ll be less formal—especially when explaining operators of
ordinary math in this chapter.

2.1.6 Arithmetic Logic


In elementary algebra, we can combine numeric expressions using the opera-
tors of arithmetic, but we have no operators for combining formulas. What
we will call arithmetic logic is obtained by adding such operators to ele-
mentary algebra. They are the operators for combining the Boolean values
true and false. Arithmetic is the study of numbers and the operators on
them. So for now, let’s call the study of Boolean values and their operators
Boolean arithmetic. Here are the operators of Boolean arithmetic, and how
they are read in Mathglish:
¬   negation      not             ⇒   implication    implies
∧   conjunction   and             ≡   equivalence    if and only if
∨   disjunction   or
Implication is sometimes written → or ⊃, and ≡ is sometimes written ⇔
or ↔.
You’ve probably already learned about these operators, so they are just
defined briefly here as follows. Negation and conjunction are defined by
¬ true  = false          true  ∧ true  = true
¬ false = true           true  ∧ false = false
                         false ∧ true  = false
                         false ∧ false = false

The other operators can be defined in terms of ¬ and ∧ as follows, where ≜
means equals by definition:

   A ∨ B ≜ ¬(¬A ∧ ¬B )

   A ⇒ B ≜ ¬A ∨ B

   A ≡ B ≜ (A ⇒ B ) ∧ (B ⇒ A)
If you’re not familiar with Boolean arithmetic, you should write out the
complete definitions of ∨, ⇒, and ≡, the way it’s done above for ¬ and ∧.
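
One mechanical way to do that exercise is to let a short program enumerate
all four cases. Here is a minimal sketch in Python (the capitalized function
names are mine, chosen to avoid Python’s own keywords):

   from itertools import product

   def Not(a):        return not a
   def And(a, b):     return a and b
   def Or(a, b):      return Not(And(Not(a), Not(b)))   # A ∨ B ≜ ¬(¬A ∧ ¬B)
   def Implies(a, b): return Or(Not(a), b)              # A ⇒ B ≜ ¬A ∨ B
   def Iff(a, b):     return And(Implies(a, b), Implies(b, a))

   for a, b in product([True, False], repeat=2):
       print(a, b, '|', Or(a, b), Implies(a, b), Iff(a, b))
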
Here is a brief explanation of what these operators and their Mathglish
counterparts mean.
¬A asserts that A is not true, where not means the same thing in English
and Mathglish.

A ∧ B asserts that A is true and B is true, where and has the same meaning
in both languages.

A ∨ B asserts that A is true or B is true (or both A and B are true). Unlike
or in English, or in Mathglish always allows the possibility that both
formulas are true.

A ⇒ B asserts that A is true implies B is true. This means that B must


be true if A is true, but says nothing about B if A is false. Thus,
false ⇒ true and false ⇒ false both equal true. Reading ⇒
as implies is confusing because A implies B in English means that A
being true causes B to be true, while implies in Mathglish does not.
Only in Mathglish would we say that 2 + 2 = 5 implies 2 + 2 = 4. A
good way to understand ⇒ and the Mathglish implies is that we want


(x > 20) implies (x > 10)

to be true for all numbers x , and substituting different numbers for


x shows that we want A ⇒ B to equal true except when A = true
and B = false.

A ≡ B asserts that A is true if and only if B is true. In other words, ≡ is


the equality relation for Boolean values. We read ≡ as is equivalent
to. Because we often want to express equivalence, written Mathglish
has the abbreviation iff for if and only if. We sometimes write equals
instead of is equivalent to because it’s shorter.

Having the operators of propositional logic allows arithmetic logic to


express some of the rules and theorems of elementary algebra as the validity
of formulas. Rule EqMult asserts that if m = n is true then p ∗ m = p ∗ n
is true, for any values of m, n, and p. Therefore, it can be written as

|= (m = n) ⇒ (p ∗ m = p ∗ n)

since |= means true for any values of the variables. Rule EqAdd can be
written as

|= (m = n) ∧ (p = q) ⇒ (m + p = n + q)

We can write the theorem of Figure 2.1 as:

Theorem (3 ∗ x − 2 ∗ y = 7) ∧ (7 ∗ x + 3 ∗ y = 1) ⇒ (x = 1) ∧ (y = −2)

We don’t write the |= because it’s implied by stating the formula as a the-
orem. However, we’ll see below that if we write the theorem this way, then
its proof has to be rewritten.
Observe that ∧ has higher precedence (binds more tightly) than ⇒.
The operator ¬ has higher precedence than ∧ and ∨, which have higher
precedence than ⇒ and ≡. Thus

¬A ∧ B ⇒ C ∨ D equals ((¬A) ∧ B ) ⇒ (C ∨ D)

I don’t know how the following expressions should be parsed, so it’s best
not to write them:

A ∧ B ∨ C                A ≡ B ⇒ C

Boolean operators have lower precedence than other operators, including


the operators of arithmetic. The parentheses used above in the statements
of the EqMult and EqAdd rules and in the theorem are needed only to make
the formulas easier to read.

2.1.7 Propositional Logic


Consider these true assertions about arithmetic logic:

(2.6) (a) |= (x > 1) ⇒ (x > 0)


(b) |= (x > 1) ∧ (y ≥ 3) ⇒ (x > 1)
The first expresses a fact about arithmetic. The second tells us nothing
about arithmetic. It remains true if we replace x > 1 and y ≥ 3 by any two
formulas. It’s really a fact about Boolean arithmetic. To study such facts,
we temporarily abandon arithmetic logic and introduce a logic of Boolean
arithmetic—a logic called propositional logic.
A formula of propositional logic is a syntactically correct sequence of vari-
ables, Boolean operators, and the Boolean values true and false, where the
syntax has been informally described above. Every expression is Boolean-
valued, so all the expressions are formulas. The meaning [[F ]] of a formula
of propositional logic is defined in the same way we defined the meaning
of a formula of elementary algebra. We just replace numbers with Boolean
values and the operators of arithmetic with the Boolean operators. The
meaning of a propositional logic formula is a predicate on interpretations,
where an interpretation is a predicate on variables. Thus [[A ∧ B ⇒ A]](Υ)
equals Υ(A) ∧ Υ(B) ⇒ Υ(A), where ∧ and ⇒ in the formula are characters,
while those same symbols in the meaning are the corresponding operators

of Boolean arithmetic. As in arithmetic logic, |= A for a propositional logic


formula A asserts that [[A]](Υ) equals true for all interpretations Υ.
A valid formula of propositional logic is often called a tautology. Here
are some tautologies that are easy to check by substituting the two Booleans
for A; but you should find them obvious from the explanation of the oper-
ators given above. (Remember that ≡ is the equality operator of Boolean
arithmetic.)
|= ¬¬A ≡ A          |= true ∧ A ≡ A      |= false ∧ A ≡ false
|= A ∨ ¬A ≡ true    |= A ∧ ¬A ≡ false    |= (true ⇒ A) ≡ A
|= A ∧ A ≡ A        |= A ∨ A ≡ A         |= (false ⇒ A) ≡ true
The following six tautologies have names. You should find the first five easy
to remember because they can be obtained from familiar laws of arithmetic
by substituting ∧, ∨, and ≡ for ∗, +, and =. The sixth is a bonus that has
no counterpart in ordinary arithmetic.
Commutativity: |= A ∧ B ≡ B ∧ A
|= A ∨ B ≡ B ∨ A
Associativity: |= (A ∧ B ) ∧ C ≡ A ∧ (B ∧ C )
|= (A ∨ B ) ∨ C ≡ A ∨ (B ∨ C )
Distributivity: |= A ∧ (B ∨ C ) ≡ (A ∧ B ) ∨ (A ∧ C )
|= A ∨ (B ∧ C ) ≡ (A ∨ B ) ∧ (A ∨ C )
Just as associativity of ∗ implies that we can write a ∗ b ∗ c ∗ d without
parentheses because it doesn’t matter in which order we compute the prod-
ucts, associativity of ∧ implies that we can write A ∧ B ∧ C ∧ D. Similarly,
associativity of ∨ means we can write A ∨ B ∨ C ∨ D without parentheses.
And as in arithmetic, commutativity implies we can write conjunctions or
disjunctions in any order.
The following two tautologies, which are called De Morgan’s laws, show
how to move negation over disjunction and conjunction when calculating
with Boolean arithmetic.
|= ¬(A ∧ B ) ≡ ¬A ∨ ¬B |= ¬(A ∨ B ) ≡ ¬A ∧ ¬B
They are important for understanding and manipulating formulas, and you
should internalize them—for example by understanding why:
¬((x = 1) ∧ (y = 2)) is equivalent to (x ≠ 1) ∨ (y ≠ 2)
An important property of implication is:

Transitivity: |= (A ⇒ B ) ∧ (B ⇒ C ) ⇒ (A ⇒ C )

When A ⇒ B is true, we sometimes say that A is stronger than B , or B is


weaker than A. These are Mathglish abbreviations of “stronger/weaker than
or equivalent to”, since A ≡ B implies A ⇒ B and B ⇒ A. Stronger means
implying at least as many formulas because, by transitivity, A stronger than
B means that if B ⇒ C is true then so is A ⇒ C .
We usually manipulate propositional logic formulas to see if one formula
implies or is equivalent to another. Besides the tautologies given above, all
you need to know to do that are the definition of implication and that ≡
satisfies the usual properties of equality. When manipulating a formula, if
you can’t use transitivity of ⇒, it’s usually best to expand the definition of
⇒ so the formula contains only the operators ¬, ∧, and ∨.
When we describe programs with formulas, the operators of propositional
logic are used more often in those formulas than any other mathematical
operators, except perhaps “=”. It’s therefore important to understand them
well and to be comfortable using propositional logic.
Fortunately, propositional logic is really simple because it’s based on
Boolean arithmetic; and Boolean arithmetic is as simple as math can get
because it’s the arithmetic of just two values. However, you may not be
comfortable using propositional logic, either because you never learned it or
because you haven’t spent as much time learning it as you spent learning
elementary algebra. In that case, you may want to supplement what you
learn here by reading about it on the Web. Propositional logic is also called
Boolean algebra, and true and false are also sometimes written as ⊤ and
⊥ or as 1 and 0.
Even though propositional logic is simple, long formulas can be difficult
to understand. I sometimes find it hard to see if two propositional logic
formulas are equivalent. Instead of spending time trying to use tautologies
to calculate if they’re the same, I ask a computer. There are propositional
logic calculators on the Web that can be used for that. They can also be
helpful learning tools.
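
Such a calculator is little more than a loop over all interpretations. Here
is a minimal sketch in Python of a brute-force tautology checker; representing
a formula as a Python function of its variables is my shortcut, not standard
notation.

   from itertools import product

   def is_tautology(f, nvars):
       # True iff f is true under all 2**nvars interpretations.
       return all(f(*vals) for vals in product([True, False], repeat=nvars))

   # One of De Morgan's laws:  ¬(A ∧ B) ≡ ¬A ∨ ¬B
   assert is_tautology(lambda a, b: (not (a and b)) == ((not a) or (not b)), 2)

   # A ⇒ B (written here as ¬A ∨ B) is not a tautology.
   assert not is_tautology(lambda a, b: (not a) or b, 2)
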

2.1.8 The Propositional Logic of Arithmetic


2.1.8.1 The Logic
The meanings of [[F ]] and |= F depend on the logic we’re talking about.
So far, we’ve used it for formulas of arithmetic logic and of propositional
logic. There’s an important relation between those two logics: if F is a valid
formula of propositional logic and we substitute formulas of arithmetic logic

for the variables of F , then we obtain a valid formula of arithmetic logic.


For example, the assertion (2.6b) is obtained by the obvious substitutions
in the tautology |= A ∧ B ⇒ A. This relation between propositional logic
and the logic of arithmetic underlies much of the reasoning in proofs about
elementary algebra formulas.
Instead of having two separate logics, we modify arithmetic logic so it ex-
tends propositional logic in much the way that elementary algebra extends
arithmetic. We do this by adding Boolean-valued variables to arithmetic
logic. This means that we have two different kinds of variables: numeric-
valued and Boolean-valued. We’ll eliminate the “valued” and call them nu-
meric and Boolean variables. We call the resulting logic the propositional
logic of arithmetic.
In this logic, an interpretation assigns numbers to numeric variables and
Booleans to Boolean variables. I have adopted the convention of using lower-
case letters for numeric variables and upper-case ones for Boolean variables.
But this is just a convention, and in principle there are two different variables
that we call x . We can assume that those two variables are distinguishable
in some way—perhaps by being printed in different colors. Whenever we
use a variable it will be clear in which color it should be printed, so we don’t
bother coloring it. And of course, we’ll never use the same name with both
colors in the same formula. Later in this chapter, we’ll see that we don’t
need these two different kinds of variables. But in Chapter 3, when we start
using temporal logic, we’ll again need two different kinds of variables.
Having incorporated propositional logic into arithmetic logic, we don’t
have to “import” tautologies from propositional logic to reason about ele-
mentary algebra. Those tautologies have become tautologies of the propo-
sitional logic of arithmetic. To use them, we extend the Elementary Algebra
Instantiation Rule to:
Propositional Logic of Arithmetic Instantiation Rule Substi-
tuting any numeric expressions for (some or all) numeric variables and
any formulas for (some or all) Boolean variables in a valid formula
yields a valid formula.

One propositional logic tautology often used in proofs is the transitivity of


⇒. Just as we write x < y < z as an abbreviation for (x < y) ∧ (y < z ), we
sometimes write A ⇒ B ⇒ C for (A ⇒ B ) ∧ (B ⇒ C ). A nice way to write
a proof of A ⇒ Q is:

A ⇒ B     Proof of A ⇒ B .
  ⇒ C     Proof of B ⇒ C .

  ⋮
  ⇒ Q     Proof of P ⇒ Q.

This works well if the proof of each implication is short. It’s my favorite way
of writing a lowest-level prose proof of a hierarchically structured proof.
Another useful tautology is

|= (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ false)

This is the basis of a proof by contradiction, in which we prove that P implies


Q by assuming P and ¬Q and obtaining a contradiction (which implies
false). Many mathematicians dislike proofs by contradiction, which they
find inelegant. If you write a proof to make sure that what you’re trying to
prove is true, then you should always write a proof by contradiction. It’s
never harder and can make it easier to write the proof. I like to view proofs
by contradiction in terms of this tautology:

|= (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ Q)

It shows that to prove P ⇒ Q, we can assume not just P but also ¬Q when
proving Q. This gives us an additional hypothesis. Moreover, it’s a very
strong hypothesis. If P ⇒ Q is true, then P implies that ¬Q is equivalent
to false, which is the strongest possible hypothesis (since false implies
anything). If you wind up not using the additional hypothesis, you can just
delete it.
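
Both tautologies are easy to confirm by enumerating the four interpretations
of P and Q, for example with this small Python check (a sketch; imp is my
name for the ⇒ of Boolean arithmetic):

   from itertools import product

   imp = lambda a, b: (not a) or b    # the definition of ⇒

   for p, q in product([True, False], repeat=2):
       assert imp(p, q) == imp(p and not q, False)   # (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ false)
       assert imp(p, q) == imp(p and not q, q)       # (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ Q)
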

2.1.8.2 More About Proofs


Consider the theorem of Figure 2.1 of Section 2.1.4.1. As we saw in Sec-
tion 2.1.6, it makes an assertion of the form |= T , for a formula T of arith-
metic logic containing the variables x and y. To prove it, we had to prove
that T is true when any numbers are substituted for x and y. We do this by
assuming that x and y equal some unspecified numbers and showing that
T is true for those unspecified numbers. Expressed semantically, |= T as-
serts that [[T ]](Υ) is true for all interpretations Υ. We prove this by proving
[[T ]](Υ) is true for some particular unspecified interpretation Υ. The goal
of the proof of
Theorem T
is to show that [[T ]](Υ) is true for the unspecified interpretation Υ.
The formula T in our example is of the form A ⇒ P for formulas A and
P . Most of the theorems we prove have this form. The assertion |= A ⇒ P

means that [[A ⇒ P ]](Υ) is true for every interpretation Υ, which is true
iff [[P ]](Υ) is true for every interpretation Υ for which [[A]](Υ) is true.
(Remember that iff means if and only if.) In other words, we can prove
|= A ⇒ P by assuming that Υ is an arbitrary interpretation such that
[[A]](Υ) is true and proving [[P ]](Υ) is true for that interpretation. Writing

Theorem Assume: A
Prove: P

asserts that |= A ⇒ P is true, but the goal of the proof is to show that
[[P ]](Υ) is true for an interpretation Υ, assuming [[A]](Υ) is true for that
interpretation. In other words, the goal of the proof is to show that P is
true when we can assume that A is true throughout the proof. Therefore,
the Q.E.D. step asserts that P is true (which completes the proof of A ⇒ P ),
not that A ⇒ P is true.
In plain Mathglish, if the theorem asserts A ⇒ P , then the goal of the
proof is to prove A ⇒ P , with no additional assumption. If we write the
theorem as an Assume/Prove, then the goal of the proof is to prove P ,
using the assumption that A is true. Either way, we’re proving the same
thing: |= A ⇒ P .
In our example, formula A equals B ∧ C . Assuming that B ∧ C is true
is the same as assuming that B is true and C is true. Writing B ∧ C as
two separate assumptions allows us to give them each a number, so we can
indicate in the proof which of the two conjuncts is being used in proving
an individual step. This makes the proof easier to read. The goal P is
also a conjunction, but there is seldom any reason to number the individual
conjuncts of a goal.
The Assume/Prove construct is not limited to the statement of the
theorem. It can be used as any step of a proof. The formulas in the Assume
clause as well as any assumptions in effect for that statement can be assumed
in the statement’s proof.
A formula P is often proved by showing that to prove P it suffices to
prove some other formula Q, and then to prove Q. If P is a statement in a
hierarchically structured proof, this proof can be written as:

2.3. P
2.3.1. Q ⇒ P
Proof of Q ⇒ P
2.3.2. Q
Proof of Q

2.3.3. Q.E.D.
Proof: By steps 2.3.1 and 2.3.2.
The problem with this structure is that the proof of Q, which is likely to be
the main part of the proof of P , is one level deeper than the proof of P . (It
starts with statement 2.3.2.1.) That extra level of depth serves no purpose
and makes the proof harder to read. Instead, we write the proof of P like
this:
2.3. P
2.3.1. Suffices: Q
Proof of P , assuming that Q is true.
2.3.2. . . .
       ⋮
2.3.7. Q.E.D.
Proof that steps 2.3.2–2.3.6 prove Q.
Starting with step 2.3.2, the Suffices statement changes the goal of the
proof of step 2.3 from P to Q. The proof of step 2.3.1 has P as its goal and
Q as an additional assumption that can be used. In other words, the proof
of 2.3.1 is the same as if the statement were:
2.3.1. Assume: Q
Prove: P
The Suffices construct can be used with Assume/Prove too, as in:

2.3. P
2.3.1. Suffices: Assume: A
Prove: Q
Proof of P , assuming that A ⇒ Q is true.
In addition to changing the goal of the proof of step 2.3 from P to Q, this
Suffices statement also adds A to the current assumptions of that proof.
The proof of 2.3.1 is then the same as the proof of:
2.3.1. Assume: A ⇒ Q
Prove: P
If A is a conjunction, then its conjuncts can be listed and given numbers
in the Assume clause of a Suffices statement, the same as in an ordinary
Assume/Prove statement.

A common proof strategy is to decompose a proof into cases. For exam-


ple, the proof that each step of Euclid’s algorithm maintains the truth of
GCD(x , y) = GCD(M , N ) might be split into the three cases x > y, y > x ,
and x = y. This is done with the Case statement. If G is the current goal
of the proof, then the statement Case: A is an abbreviation of:
Assume: A Prove: G
In a proof by case splitting, a sequence of case steps is followed by a Q.E.D.
step whose proof shows that the cases cover all possibilities.
We have now seen all the kinds of statements needed to write a proof,
except for one: a statement for making definitions local to the proof. It’s
described in Section 2.4.3.1.

2.1.9 Predicate Logic


Predicate logic is an extension of propositional logic. It includes the oper-
ators of propositional logic, and all propositional logic tautologies are valid
formulas of predicate logic. It extends propositional logic because the value
of a formula is not either true or false, but instead is a predicate on some
collection of values. (Propositional logic is equivalent to the special case in
which that collection contains just a single value.)
Predicate logic is normally described as adding to propositional logic
two additional operators called quantifiers. There is a third mathematical
operator that seems to be considered an add-on to predicate logic, if it is
considered at all, but that I consider part of the logic. It’s described after
the descriptions of the quantifiers.
We won’t bother studying predicate logic in general, just the special
case in which predicates are predicates on the set R of real numbers. This
means that we will consider the logic obtained by adding the operators of
predicate logic to arithmetic logic—a logic we will call the predicate logic of
arithmetic. All the properties of this logic that don’t depend on the laws of
arithmetic are true of predicate logic in general.

2.1.9.1 Quantifiers
The symbols, names, and Mathglish pronunciations of quantifiers are:
∀ universal quantification for all
∃ existential quantification there exists
They have the following meanings, where v is any numeric variable and F
is any formula of the predicate logic of arithmetic:

∀ v : F is true iff F is true when any number is substituted for v .


∃ v : F is true iff there is some number that, when substituted for v ,
makes F true.

To be a bit more precise, for any interpretation Υ, let

Υ except v ↦ r

be the mapping that’s the same as Υ except it assigns to the variable v the
number r . For any interpretation Υ:

[[∀ v : F ]](Υ) equals true iff [[F ]](Υ except v ↦ r ) equals true for
every number r .
[[∃ v : F ]](Υ) equals true iff [[F ]](Υ except v ↦ r ) equals true for
some number r .
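
Over all of R these definitions describe an infinite check, but over a finite
collection of numbers they can be executed directly. Here is a sketch in
Python, with an interpretation represented as a dictionary from variable
names to numbers (my representation, not the book’s):

   def updated(interp, v, r):          # the interpretation "Υ except v ↦ r"
       copy = dict(interp)
       copy[v] = r
       return copy

   def forall(v, F, interp, domain):   # [[∀ v : F]](Υ) over a finite domain
       return all(F(updated(interp, v, r)) for r in domain)

   def exists(v, F, interp, domain):   # [[∃ v : F]](Υ) over a finite domain
       return any(F(updated(interp, v, r)) for r in domain)

   # Formula (2.8) over the domain 0..9:  ∃ x : y = x²
   F = lambda i: i['y'] == i['x'] ** 2
   assert exists('x', F, {'y': 9}, range(10))
   assert not exists('x', F, {'y': 7}, range(10))
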

When parsing a formula, the scope of a quantifier extends as far as possible—


for example, until terminated by the end of the formula or by a right paren-
thesis whose matching left parenthesis precedes the quantifier.
The following formula is an example of universal quantification:

(2.7) ∀ x : y ∗ x ² ≥ x ²

Since (i) x ² ≥ 0 for any number x and (ii) y ∗ r ≥ r for all r ≥ 0 iff y ≥ 1,
this formula equals true iff y ≥ 1. Thus, (2.7) is equivalent to y ≥ 1.
The following formula asserts that there exists a (real) number whose
square equals y:

(2.8) ∃ x : y = x ²

Since a real number y has a square root (that’s a real number) iff y ≥ 0,
this formula is equivalent to y ≥ 0.
The quantifiers ∀ and ∃ are related by the following rules. The brackets
around F in these formulas are explained in Section 2.1.9.2 below. For now,
ignore the brackets and pretend that each [F ] is simply F .

(2.9)   |= (∀ v : [F ]) ≡ (¬ ∃ v : ¬ [F ])        |= (∃ v : [F ]) ≡ (¬ ∀ v : ¬ [F ])

You should be able to check that these tautologies follow from the definitions
of [[∀ v : F ]] and [[∃ v : F ]]. We can take either of those tautologies to be the
definition of one of the quantifiers in terms of the other.
If we consider ∀ v and ∃ v to be the operators, and ∀ and ∃ (and the
colon “:”) to be pieces of syntax, then these two operators are dual to each

other. This means that to negate the application of either of them to a
formula F , we replace the operator by its dual and replace F by ¬F . This
duality is expressed by these theorems:

(2.10)   |= (¬ ∀ v : [F ]) ≡ (∃ v : ¬ [F ])       |= (¬ ∃ v : [F ]) ≡ (∀ v : ¬ [F ])

They follow from (2.9) by negating both sides of its equivalence relations.
In formulas (2.7) and (2.8), x is called a bound variable. The variable
y in those formulas is called a free variable and is said to occur free in the
formulas. In a certain sense, the bound variable x doesn’t really occur in
those predicates. If we replace x by another variable z in (2.8), we get
∃ z : y = z ² — a formula that is not just equivalent to (2.8), but is really just
a different syntax for the same formula.
Bound variables introduce subtle problems in mathematical reasoning.
If not understood, those problems can lead to making incorrect deductions.
Understanding them requires a more careful examination of variables, which
we now undertake.

2.1.9.2 Variables and Their Scopes


Declarations and Scopes Variable y appears in Section 2.1.4.1 and in
formula (2.7). In Section 2.1.4.1 we asserted that formulas (2.3) are true
and deduced y = −2. But (2.7) isn’t true if y = −2. What’s the math-
ematical explanation of why this is not an inconsistency in the book? I
think most mathematicians would say that there’s no inconsistency because
the assumptions under which we deduced y = −2 in Section 2.1.4.1 are not
assumed to hold where (2.7) appears. This is a reasonable explanation, but
I think there’s one that helps understand bound variables better—namely,
that the variable called y in Section 2.1.4.1 and the variable called y in (2.7)
are two different variables that have the same name. The assumptions we
made about the variable named y in Section 2.1.4.1 don’t apply to the vari-
able named y in (2.7) because those are simply two unrelated variables that
happen to have the same name.
There are implicit declarations of new variables named x and y before the
first uses of those variables in Section 2.1.4.1. The implicit scopes of those
declarations extend until the end of Section 2.1.4.2. Inside those scopes, x
and y are the names of those variables. Outside their scopes, each of them
is either the name of another variable or else is meaningless because it isn’t
in the scope of any variable with that name.
Most readers will find this to be a natural way of thinking about variables
because with most coding languages, a program can contain many different

variables with the same name. In code, we can usually think of a variable
as a location in memory. Here, we don’t know what a variable is (just as we
don’t know what a number is); we only know that an interpretation assigns
a value to a variable, and not to its name. And an expression contains
variables that have names; it doesn’t just contain variable names.
Before we introduced quantifiers, the declarations and scopes of variables
were all implicit. A formula ∀ v : F explicitly declares a new variable named
v whose scope is the formula F . Scopes can be nested. Formula ∀ v : F could
occur within the scope of a variable named v . That variable is a different
variable than the one named v that occurs in formula F . For example, in the
formula ∃ v : G ∧ (∀ v : F ), a variable named v that occurs in G is different
from a variable named v that occurs in F .

Variable Capture Having different variables with the same name raises
issues when substituting an expression for a variable. Those issues don’t
arise in coding languages, because those languages don’t allow such substi-
tution. You can’t substitute x+1 for y in the assignment statement y = 2*y.
But such substitution is needed to define what it means for one program to
implement another.
Here is an issue that arises with substitution that mathematicians call
variable capture. Consider the formula ∃ x : x > y of the predicate logic of
arithmetic. It’s a valid formula, since there exists a number x greater than
any number y. By the Elementary Algebra Instantiation Rule, we obtain a
valid rule when we substitute an expression for the variable y. Substituting
x + 1 for y apparently yields the formula ∃ x : x > x + 1 which equals false,
because there is no number x that is greater than x + 1.
A mathematician would probably say that we’re not allowed to perform
that substitution because the bound variable x captures the x in x +1. How-
ever, the substitution produces no logical problem because the expression
x + 1 we’re substituting for y is not in the formula ∃ x : x > y, so the x in
x + 1 is a different variable than the bound variable x in that formula. So,
the substitution produces a perfectly sound formula. The problem is that
mathematicians have provided no way to write a formula x > x + 1 when
the two “x ”s are names of two different variables. Variable capture is not
a logical problem, it’s a problem in the way we write formulas. However,
it poses a real problem for us because we have to write formulas that way
when we write proofs. Variable capture when writing proofs can result in
incorrect proofs, leading us to believe things that aren’t true.
We can avoid variable capture by changing the name of the bound vari-

able that’s doing the capturing. Variable names serve only to tell whether or
not two variables that occur in a formula are the same variable. They have
no mathematical significance. A formula doesn’t depend on the names of its
variables. We can’t write a formula that asserts that the name of a variable
contains a vowel. So, if the scope of a variable named x contains no formula
within the scope of any variable named z , then we can change the name
of that variable named x from x to z . This means replacing every variable
name x in that scope with z . Doing this replacement makes no change to
the mathematics, just to the way we have written that mathematics. We
can therefore write the formula obtained by substituting x + 1 for y in the
formula ∃ x : x > y by changing the name of the bound variable from x to z
to get the correct formula ∃ z : z > x + 1.
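
Renaming before substituting is mechanical enough to code. Here is a sketch
in Python on a toy representation of formulas; the tagged tuples and the
fresh-name generator are mine, not part of the logic:

   import itertools
   fresh_names = (f"v{i}" for i in itertools.count())

   def subst(f, name, repl):
       # Substitute repl for the free variable called name in formula f.
       tag = f[0]
       if tag == 'num':
           return f
       if tag == 'var':
           return repl if f[1] == name else f
       if tag == 'exists':                  # ('exists', v, body)
           v, body = f[1], f[2]
           z = next(fresh_names)            # rename the bound variable first
           body = subst(body, v, ('var', z))
           return ('exists', z, subst(body, name, repl))
       return (tag,) + tuple(subst(g, name, repl) for g in f[1:])

   f = ('exists', 'x', ('>', ('var', 'x'), ('var', 'y')))    # ∃ x : x > y
   print(subst(f, 'y', ('+', ('var', 'x'), ('num', 1))))
   # ('exists', 'v0', ('>', ('var', 'v0'), ('+', ('var', 'x'), ('num', 1))))

Renaming at every quantifier is wasteful but always safe; it is the coded
analogue of never letting a bound variable capture anything.
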
There’s an easy way to avoid variable capture without having to rename
bound variables. We just follow this rule:
Safe Scoping Rule Never declare a bound variable within the scope
of a variable with the same name.
To substitute x + 1 for y in a formula F , the formula F must be in the
scope of a variable named x . The rule implies that F can’t declare a
bound variable named x . Therefore, you could never have to substitute
x + 1 for y in ∃ x : x > y. This rule not only eliminates the problem of vari-
able capture, it also makes formulas easier to understand. While a formula
like ∃ v : F ∧ (∃ v : G) is mathematically fine, having two unrelated variables
named v is likely to confuse a reader.

Placeholders In Section 2.1.8.1, we introduced rules for reasoning about


propositional logic. One of them was that ¬¬A ≡ A is a tautology, for any
formula A. We can apply that rule to deduce |= ¬¬(x > y) ≡ (x > y)
by substituting the expression (x > y) for the formula A. We can formally
express the rule as asserting the truth of |= ¬¬A ≡ A where A is a variable.
We can then deduce |= ¬¬(x > y) ≡ (x > y) from |= ¬¬A ≡ A by
the Propositional Logic of Arithmetic Instantiation Rule of Section 2.1.8.1,
substituting x > y for the variable A.
We asserted in (2.9) the predicate logic rule that (∃ v : [F ]) ≡ (¬ ∀ v : ¬[F ])
is a tautology for any formula F . We might expect that this rule could be
expressed formally as the tautology
(2.11) |= (∃ v : F ) ≡ (¬ ∀ v : ¬F )
where F is a variable. We would then expect to deduce
|= ∃ x : (x > y) ≡ ¬(∀ x : ¬(x > y))

by first renaming the bound variables v to x in (2.11) to obtain


(2.12) |= (∃ x : F ) ≡ (¬ ∀ x : ¬F )
and then substituting the formula x > y for the variable F in (2.12).
However, this wouldn’t work because the variable x in x > y and the two
bound variables x in (2.12) are three different variables. To avoid variable
capture, we would have to rename the bound variables x , writing something
like:
(2.13) |= (∃ z : x > y) ≡ (¬ ∀ z : ¬(x > y))
For any formula G not containing the bound variable z , formulas ∃ z : G
and ∀ z : G both equal G. Therefore, (2.13) asserts |= (x > y) ≡ ¬¬(x > y),
which is true but not what we wanted to prove. In fact, if F is a variable
(which must be different from the bound variables named v ), then (2.11)
just asserts the propositional logic tautology |= F ≡ ¬¬F .
We write [F ] instead of F in (2.9) and (2.10) to indicate that [F ] is not a
variable but is something else. The symbol [F ] is what logicians sometimes
call a syntactic variable, but which I call a placeholder because it’s quite
different from a variable. The simplest way to think of a placeholder is that
it’s intended to be replaced by a sequence of symbols that produce a syntac-
tically correct formula. Syntactically correct includes the requirement that,
for every variable name in the formula, the resulting formula occurs in the
scope of a variable having that name. Of course, when a formula is sub-
stituted for [F ] in ∀ v : [F ], that formula is then in the scope of the bound
variable v . The formulas in (2.9) and (2.10) are actually meta-formulas
asserting that all syntactically correct formulas obtained by substituting a
sequence of symbols for [F ] in these meta-formulas are tautologies. Equiv-
alently, we can think of substituting a formula rather than a sequence of
symbols for [F ] in ∀ v : [F ], except that any variable named v is replaced by
the bound variable v . In other words, substituting for [F ] requires capture of
any variable named v .
This all works, but I find an explanation in terms of strings of symbols or
variable capture inelegant; and inelegance bothers me because I worry that it
might indicate a potential problem. There is what I consider a more elegant
way of viewing quantification that justifies what we do. It eliminates the
issues involving bound variables by defining quantification without bound
variables.
We can explain quantification by interpreting ∀ v : F and ∃ v : F to
mean ∀ (F ) and ∃ (F ), respectively, where F is not a formula but a map-
ping. The formula ∃ x : x > y is viewed as the formula ∃ (M ), where M
is the mapping defined by M (x ) = x > y. If we write this mapping M as
x ↦ x > y, then ∃ x : x > y is just a way of writing ∃ (x ↦ x > y). The
first meta-formula of (2.9) becomes |= ∀ (F ) ≡ ¬ ∃ (¬F ), where F is a map-
ping and ¬F is the mapping defined by (¬F )(v ) = ¬F (v ) for all values v .
Substituting x ↦ x > y for F in |= ∀ (F ) ≡ ¬ ∃ (¬F ) yields:

   |= ∀ (x ↦ x > y) ≡ ¬ ∃ (¬(x ↦ x > y))

Using the definition of ¬F for a mapping F to write ¬(x ↦ . . .) as x ↦ ¬ . . . ,
this theorem is what we usually write as

   |= (∀ x : x > y) ≡ ¬ (∃ x : ¬(x > y))

This view eliminates placeholders, but it introduces higher-order mappings.


An ordinary mapping M maps values to values; if v is a value then M (v )
is a value. But in this view, ∀ and ∃ are higher-order mappings—ones that
map mappings to values. If M is a mapping, then ∀ (M ) and ∃ (M ) are
values. The view also requires extending Boolean operators to operators
on mappings—for example the conjunction F ∧ G of mappings F and G
is defined by (F ∧ G)(v ) = (F (v ) ∧ G(v )). This is standard practice in
mathematics. If you’ve studied calculus, you may have seen the sum f + g
of two functions defined by (f + g)(x ) = f (x ) + g(x ).
We will use the standard notation for quantifiers. However, this alter-
native view can help us understand what we’re doing.
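
This higher-order view is also the most direct one to program, since mappings
are just functions. Here is a sketch in Python, restricted to a finite domain
so that the quantifiers terminate (the domain and the names are mine):

   DOMAIN = range(-100, 100)   # a finite stand-in for the collection of values

   def Forall(M): return all(M(v) for v in DOMAIN)   # ∀ (M)
   def Exists(M): return any(M(v) for v in DOMAIN)   # ∃ (M)
   def Not(M):    return lambda v: not M(v)          # (¬F)(v) = ¬F(v)

   y = 5
   M = lambda x: x > y                        # the mapping x ↦ x > y
   assert Exists(M)                           # ∃ x : x > y
   assert Forall(M) == (not Exists(Not(M)))   # the first rule of (2.9), on this domain
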

2.1.9.3 Predicate Logic Reasoning


Reasoning about formulas of the predicate logic of arithmetic requires rules
of predicate logic. The assertions of (2.9) and (2.10) provide such rules. Two
more rules are:

(2.14)   |= (∀ v : [F ] ∧ [G ]) ≡ ((∀ v : [F ]) ∧ (∀ v : [G ]))
         |= (∃ v : [F ] ∨ [G ]) ≡ ((∃ v : [F ]) ∨ (∃ v : [G ]))
You should be able to find examples to show that these assertions become
false if ∀ and ∃ are interchanged.
Another simple but useful rule, which was already mentioned in Sec-
tion 2.1.9.2, is that if the variable v does not occur free in formula F , then
∀ v : F and ∃ v : F are both equivalent to F . For example, if v does not occur
free in F , then the first meta-theorem of (2.14) implies that ∀ v : (F ∧ G) is
equivalent to F ∧ (∀ v : G).

There are four more rules that are often used. Two of them are for
proving the two kinds of quantified formulas, the other two are for using each
kind of quantified formula to prove something else. The first two are called
quantifier introduction rules, the second two are called quantifier elimination
rules. The first uses a new kind of assumption in a proof. The others
are simple implications, which can be proved with an Assume/Prove as
described in Section 2.1.8.2.

∀ Introduction To prove ∀ v : F , we need to show that F is true for any


value of v . We do that by proving F under the assumption that v is a brand-
new variable about which we know nothing. The following Assume/Prove
in a theorem or in a proof statement is equivalent to ∀ v : F .
Assume: new v
Prove: F
However, its proof has F as its goal, and declares v to be a new variable
whose scope is the formula F and the proof of the statement. We could
write a proof of ∀ v : F like this:

3.2. ∀ v : F
3.2.1. Suffices: Assume: new v
Prove: F
Proof: Obvious (because the Assume/Prove asserts ∀ v : F ).
   ⋮
3.2.7. Q.E.D.
Proof that steps 3.2.2–3.2.6 imply F .
We could also eliminate one level of proof and the Suffices step by having
step 3.2 simply assert the Assume/Prove.
In the view of quantification without bound variables, where F is a
mapping, the ∀ introduction rule is that ∀ (F ) is proved by proving
Assume: new v
Prove: F (v )

∃ Introduction To prove ∃ v : F , we have to show that there is a value of


v that makes F true. We do that by explicitly describing that value. This
is asserted as follows, where exp is a numeric expression:
|= ([F ] with v ← exp) ⇒ ∃ v : [F ]
The rule is expressed without bound variables as |= F (exp) ⇒ ∃ (F ) .

∀ Elimination The formula ∀ v : F asserts that F is true for all values


of v . We deduce from this formula that F is true when we substitute a
particular expression for v . This is asserted for the numeric expression exp
by:

|= (∀ v : F ) ⇒ ( F with v ← exp)

We then use the formula F with v ← exp to prove our goal. In the view
of quantification without bound variables, this rule is: |= ∀ (F ) ⇒ F (exp) .

∃ Elimination Suppose ∃ v : F is true and F ⇒ G is true when v has any


value. This implies that G is true for the particular value of v that makes
F true, so ∃ v : G is true. Stated precisely, this rule is a metamathematical
formula asserting that one meta-formula implies another. To write it, we
need a metamathematical operator that corresponds to the operator ⇒ of
mathematics. Instead of introducing a symbol for that operator, we simply
use the Mathglish word implies. (We will also use the word and to denote
the metamathematical conjunction operator.) The precise statement of the
rule is:

|= F ⇒ G implies |= (∃ v : F ) ⇒ (∃ v : G )

This doesn’t look like an ∃ elimination rule because we use ∃ v : F to prove


another existentially quantified formula ∃ v : G, so we haven’t eliminated the
∃ . It becomes an elimination rule if v does not occur free in G, so ∃ v : G
equals G. The rule is usually stated with ∃ v : G replaced by G and the
side condition that v does not occur free in G. This side condition can be
written mathematically as the assertion that replacing v by 0 (or any other
value) leaves G unchanged.² The rule that ∃ x : G equals G if x does not
occur free in G can therefore be written:

(2.15) |= G ≡ (G with v ← 0) implies |= (∃ v : G) ≡ G

Of course, replacing ∃ with ∀ produces a valid rule too.


In terms of quantification without bound variables, the ∃ Elimination
rule becomes:

|= ∀ (F ⇒ G) ⇒ (∃ (F ) ⇒ ∃ (G))
   ² By this definition, v does not occur free in the formula v = v , which makes sense
mathematically.

The condition that v doesn’t occur free in the formula G becomes the con-
dition that G is a constant mapping, so G(v ) equals G(0) (or G(√2)) for
all v . Rule (2.15) therefore becomes:

   |= (G ≡ (v ↦ G(0))) ⇒ (∃ (G) ≡ G)

In the ∃ Introduction and ∀ Elimination rules, exp is an ordinary vari-


able. We must be careful when using the with constructs in those rules.
If F is the formula (v > 0) ⇒ (∃ v : G), then F with v ← exp equals
(exp > 0) ⇒ (∃ v : G) . No substitution is performed in G because, within
G, the bound variable v is not the same variable as the v in v > 0.

2.1.9.4 The choose Operator


Mathematicians often define something in terms of its properties. For exam-
ple, they might define √r for a real number r to be the positive real number
such that (√r )² = r . We can express such a definition using an operator
invented by the mathematician David Hilbert in the 1920s. I didn’t learn
about this operator until about 25 years after I completed my studies; I
suspect it’s still unknown to most mathematicians. Hilbert called it ε , but
I think it’s better to call it choose. We can use this operator to define the
square root as follows (since we are assuming all predicates are predicates
on R):
   √r  ≜  choose s : (s ≥ 0) ∧ (s² = r )

In general, the expression choose v : F equals a value e that makes F


true when e is substituted for v . If there is no such e, then the value of
the expression is unspecified. If there is more than one such value e, then
the expression can equal any one of those values. For example, define the
mapping ASqrt by:

   ASqrt(r ) ≜ choose s : s² = r

Then ASqrt(4) might equal 2 and ASqrt(9) might equal −3. Since this is
math, |= ASqrt(4) = ASqrt(4) is true. The value of ASqrt(4) may be 2 or
−2. But whichever value it equals, like every mathematical expression with
no free variable, it always equals the same value.
Formally, choose is defined by the following rules:

(2.16) (a) |= (∃ v : F ) ⇒ ( F with v ← (choose v : F ))


(b) |= (∀ v : F ≡ G ) ⇒ ((choose v : F ) = (choose v : G ))

If there is more than one value of x for which F equals true, then
choose x : F can equal any of those values. But it always equals the same
value.
No matter how often I repeat that the choose operator always chooses
the same value, there are engineers who think that choose is nondeter-
ministic, possibly choosing a different value each time it’s evaluated, and
they try to use it to describe nondeterminism in a program. I’ve also heard
computer scientists talk about “nondeterministic functions”.³ There’s no
such thing. There’s no nondeterminism in mathematics. Nondeterminism
is important in concurrent programs, and we’ll see that it’s easy to describe
mathematically. Adding nondeterminism to math for describing nondeter-
minism in a program makes as much sense as adding water to math for
describing fluid dynamics.
An expression choose v : F is most often used when there is only a single

choice of v that makes F true, as in the definition of √r above. Sometimes,
it appears within an expression whose value doesn’t depend on which value
of v satisfying F is chosen.
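
One way to see that this determinism costs nothing is to implement choose
for a finite domain. In the following Python sketch (an illustration of mine,
not part of ZF), choose scans a fixed enumeration of the domain, so it
always returns the same value for equivalent predicates, as rules (2.16)
require:

   # A deterministic choose over a fixed finite domain: it returns the
   # first value, in a fixed enumeration, that satisfies F.  If no value
   # satisfies F, the result is unspecified but always the same.
   D = list(range(-10, 11))

   def choose(F):
       for v in D:
           if F(v):
               return v
       return D[0]          # arbitrary but fixed

   # ASqrt(9) happens to be -3 rather than 3, but repeated evaluation
   # always yields that same value:
   assert choose(lambda s: s * s == 9) == choose(lambda s: s * s == 9) == -3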
Just as with quantifiers, we can eliminate the bound variable in the
choose construct by writing choose (F ), where F is a mapping. The
expression choose v : F is then an abbreviation of choose (v ↦ F (v )).
The rules (2.16) are written:

(a) |= ∃ (F ) ⇒ F (choose (F ))
(b) |= ∀ (F ≡ G) ⇒ (choose (F ) = choose (G))

2.1.10 More About Proofs


2.1.10.1 Assume/Prove in General
We have seen two kinds of Assume/Prove constructs: one in which the
Assume clause contains one or more assumptions, and one in which it con-
tains just a new declaration. We now examine the precise meaning of this
construct.
The statement of the theorem of Figure 2.1 has the form:

(2.17) Assume: A, B Prove: P

for formulas A, B , and P . (The numbers attached to A and B are just


labels that don’t affect the meaning.) This statement asserts the formula
   ³ I must confess that, many years ago, I used that term in a paper [28].

A ∧ B ⇒ P . You can check that the following is a tautology of propositional


logic:

|= (A ∧ B ⇒ P ) ≡ (A ⇒ (B ⇒ P ))

We can therefore rewrite the formula asserted by (2.17) as A ⇒ (B ⇒ P ).


Multiple applications of the tautology show that A ∧ B ∧ C ∧ D ⇒ P is
equivalent to A ⇒ (B ⇒ (C ⇒ (D ⇒ P ))), and so on for any number of
conjuncts.
In general, an Assume clause is a comma-separated list of assumptions,
each of which is either a formula or a new declaration. Recall that the
statement

Assume: new v Prove: P

makes the assertion ∀ v : P . The statement

(2.18) Assume: A, B , new v , C , D Prove: P

asserts the formula

A ⇒ (B ⇒ (∀ v : C ⇒ (D ⇒ P )))

The proof of the Assume/Prove statement has P as the goal, and the
variable v is declared by the new v clause to be a new variable whose scope
contains formulas C , D, and P and the statement’s proof. It would be
almost impossible to use formula A or formula B in the proof if it had a free
variable named v . The Safe Scoping Rule makes this impossible because it
forbids the new v clause to occur within the scope of a variable named v .
As should be clear from the example, the general form of an Assume/
Prove statement is

Assume: A1 , . . . , An Prove: P

where each Aᵢ is either a formula or a new declaration. This statement
asserts the formula

   B₁ B₂ . . . Bₙ P ) . . . )

where Bᵢ is “Aᵢ ⇒ (” if Aᵢ is a formula and is “(∀ v :” if Aᵢ is new v .
There can be more than one new declaration among the Aᵢ. If α is a list
of assumptions and P is a formula, we define AP (α, P ) to be the formula
asserted by the statement Assume: α Prove: P . If α is the empty list, we
define AP (α, P ) to equal P .

2.1.10.2 Hierarchically Structured Proofs


We now define the precise meaning of a hierarchically structured proof. Let
a statement be either the statement of the theorem or a statement in the
theorem’s proof. For each statement and each lowest-level paragraph proof,
we define a current goal G and a list β of current assumptions that can be
used to prove that goal. For a proof consisting of a list of statements, we
consider the current goal and list of assumptions of the first statement of
the proof to be the goal and assumptions of the proof.
We recursively define the list of assumptions and the current goal for
every statement and paragraph proof in the theorem’s proof as follows. We
first define them for the theorem. We then define, for every statement, the
goal and list of assumptions for the statement’s proof and, except for the
theorem or a Q.E.D. statement, we also define the goal and list of assump-
tions for the next statement at the same level. (The next statement at the
same level as step number 1.2.3 is step number 1.2.4.) We use the notation
that if β and α are lists of assumptions, then β, α is the obvious concate-
nation of the two lists. If P is a formula, then β, P is the list obtained by
appending formula P to the end of β.
We can consider a formula P that is a statement or appears in a Suf-
fices: P statement to be an Assume/Prove with an empty list of assump-
tions. There are then three kinds of proof statements, where α is a (possibly
empty) list of assumptions and P is a formula:
PS1. Assume: α Prove: P
PS2. Suffices: Assume: α Prove: P
PS3. Q.E.D.
We now define the goal and list of assumptions for all statements and para-
graph proofs, starting with the theorem. The list of assumptions that can
be used to prove the theorem is the list of all formulas that have already
been proved or are taken as axioms, along with new v assumptions for any
variables v that have been introduced—usually implicitly. For example, x
and y are such variables in the theorem of Figure 2.1. To simplify the defini-
tion, we define the current goal where the theorem is stated to equal false.
The statement of the theorem must be a formula of type PS1. Here are
the inductive definitions for the three kinds of statements, where β is the
current list of assumptions and G is the current goal at the statement.
PS1. At the beginning of its proof, the current list of assumptions is β, α,
and the current goal is P ∨ G. (Why that’s P ∨ G and not P is

explained below.) At the next statement, the list of assumptions is


β, AP (α, P ) and the current goal is G.
PS2. At the beginning of the proof, the current list of assumptions is
β, AP (α, P ) and the current goal is G. At the next statement, the
list of assumptions is β, α and the current goal is P ∨ G.
PS3. At the beginning of the proof, the current list of assumptions is β and
the current goal is G. (There is no next statement.)
In PS1 and PS2, you probably expected that the stated goals P ∨ G should
have been P . In fact, P is what is usually proved. However, we can take
P ∨ G as the goal because the statement that introduces the new goal P lies
within a proof of G. If we prove G instead of P , then the rest of the current
proof is irrelevant because its purpose is to prove G. Another way to look
at it is that we are actually performing a proof by contradiction by adding
¬G as an assumption. If we prove G, then the assumption ¬G allows us to
deduce false, which implies the goal P .
The lowest-level steps of a proof have a paragraph proof that mentions
all the previous steps needed to prove the step. However, not all previous
steps can be used to prove a step. Every step is proved under a list of
assumptions, and we know it to be true only if those assumptions hold.
Suppose S is the current step and T is a previous step, and suppose β is
the list of current assumptions under which S is being proved and α is the
list of assumptions under which T has been proved. For T to be used in
the proof of S , it must be true under the assumptions of β. Since it was
proved under the assumptions of α, we ensure this by requiring that all the
assumptions in α must be in β.
Let’s suppose for a moment that the proof contains no assumptions in
any of the statements of form PS1 or PS2, so all steps in the proof were
proved under the same list of assumptions. We could then use any previous
statement in the proof of a statement. The proof of step 4.3. could use
step 1.1.7.4.5. However, if we allowed that, we would lose the benefit of
hierarchical structuring. Hierarchical structuring simplifies understanding
because once we’ve finished proving statements 1, 2, and 3, we can forget
about any statements in their proofs when proving statement 4. We just
need to understand those three statements. And similarly, when proving
statement 4.3, we should not care about the proofs of statements 4.1 and
4.2; we just need to understand statements 1, 2, 3, 4.1, and 4.2. Changes
to the proofs of those five statements should make no difference, as long as
the statements don’t change.

Now suppose there can be assumptions in Assumes. The lists of assump-


tions under which each of the statements 1, 2, 3, 4.1, and 4.2 were proved
are each a prefix of the list of assumptions under which every paragraph
proof in the proof of 4.3 is proved. (A list is considered a prefix of the same
list.) Respecting the hierarchical structure ensures that only steps that logic
allows to be used in a proof can be used. The rule for what statements can
then be used in a paragraph proof is:

Step Reference Rule A step in a level-n proof can be used only in


the proofs of the steps that follow it in the same level-n proof.

For example, step 6.2.7.3 can be used only in the proofs of steps 6.2.7.4,
6.2.7.5, etc.
This rule allows us to solve the problem of keeping track of long step
numbers. Step number 6.2.7.3, 6.2.7.4, etc. can be replaced by the numbers
⟨4⟩3, ⟨4⟩4, etc., where step ⟨n⟩i is the i-th step of a level-n proof. There can
be many steps numbered ⟨4⟩3 in a proof. However, in any paragraph proof,
there is at most one step numbered ⟨4⟩3 that the Step Reference Rule allows
to be used. That step, if it exists, is the most recent step numbered ⟨4⟩3.

2.1.11 Some Useful Notation


Here are two pieces of notation that mathematicians don’t seem to need,
but that I find essential for writing formulas that describe programs.

2.1.11.1 if/then/else
A programmer who read enough math would notice that mathematicians
lack anything corresponding to the if/then/else statement of coding lan-
guages. Instead, they use either prose or a very awkward typographical
convention. We let the expression

if P then e else f

equal e if the predicate P is true and f if P is false. It is defined by:



(2.19)  if P then e else f  ≜
           choose v : (P ∧ (v = e)) ∨ (¬P ∧ (v = f ))
While it’s inspired by the if/then/else coding language statement, you
should not think of an if/then/else expression as instructions for com-
puting something. It is a mathematical expression defined by (2.19). It’s
more like the expression written in the C language and its descendants as

P ? e : f . However, coding languages usually specify the order in which the


expressions P , e, and f are evaluated. Although we sometimes say that
evaluating an expression exp yields a value v , that just means exp = v .
Formally, there is no concept of evaluation in mathematics.
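
Python's conditional expression is a close coding analogue; a one-line
illustration (mine, not part of the math):

   # "e if P else f" is an expression, like C's P ? e : f.
   def abs_val(x):
       return x if x >= 0 else -x    # if x >= 0 then x else -x

   assert abs_val(-3) == abs_val(3) == 3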

2.1.11.2 Conjunction and Disjunction Lists


Abstract mathematical descriptions of real systems can be quite long. Def-
initions are used to decompose them into shorter formulas that are easier
to understand. However, those shorter formulas can still be a few dozen
lines long. They are understandable because mathematical formulas have
a natural hierarchical structure. To take full advantage of that structure,
we use a simple bit of notation that mathematicians and many computer
scientists find heretical, but that engineers appreciate.
There are two simple ideas: a list of formulas bulleted by ∧ or ∨ rep-
resents the conjunction or disjunction, respectively, of those formulas; and
indentation is used to replace parentheses. For example, if A, B , . . . , J are
formulas, then:
   ∧ ∨ A ∧ B                ( (A ∧ B )
     ∨ C                      ∨ C )
   ∧ D ⇒ E      equals      ∧ (D ⇒ E )
   ∧ ∨ ∃ x : F              ∧ ( (∃ x : F )
     ∨ ∧ G ≡ H                ∨ ( (G ≡ H )
       ∧ J                        ∧ J ) )
Note how the implicit parentheses in the bulleted lists delimit the scope of
the ⇒ and ∃ x operators in this formula.
Making indentation significant is a feature of the currently popular Python
coding language, but it works even better in this notation because the use
of ∧ and ∨ as “bullets” makes the logical structure easier to see.

2.2 Sets
You’ve probably come across the mathematical concept of a set. A set is
a collection. However, we will see that not all collections can be sets. The
fundamental operator in terms of which sets are defined is ∈, which is read is
an element of or simply in. For every collection S that is a set, the formula
exp ∈ S equals true iff the value of the expression exp is one of the things
the collection S is a collection of. We call the things in a set S the elements
of S .

We need to be able to have sets of sets—that is, sets whose elements are
sets. Therefore, a set has to be a “thing”. So, we need to know what kinds
of things there are besides sets. The simplest way I know to make the math
we need completely rigorous is to base it on what is called ZF set theory or
simply ZF, where Z and F stand for the mathematicians Ernst Zermelo and
Abraham Fraenkel. One thing that makes ZF simple is that every thing is
a set. In other words, every mathematical value is a set. We will use ZF to
describe our math, so the terms set and value will mean exactly the same
thing. Sometimes I will write set/value instead of set or value to remind you
that the two words are synonyms. We add to ZF the operators of predicate
logic, so we should say we’re using the predicate logic of ZF, but we won’t.
We’ll just call it ZF.
Logicians have shown how to build mathematics, including the set of real
numbers, from ZF. There’s no need for us to do that; we just assume the
real numbers exist and the arithmetic operators on them satisfy the usual
properties. This means that 42 and √2 are sets, but we don’t specify what
their elements are. We know that √2 ∈ 42 equals either true or false,
but we don’t know which. We assume nothing about what the elements of
the set 42 are. We’ll generally use the term value for a set/value like 42 for
which we don’t know what its elements are.
We define the semantics of ZF the same way we defined the semantics
of the predicate logic of arithmetic. The meaning [[F ]] of a formula F of ZF
is a predicate on (the collection of) interpretations, where an interpretation
is an assignment of a set to each variable. There is one difference between
the predicate logic of arithmetic and ZF: We defined the predicate logic
of arithmetic to have two kinds of variables, numeric-valued and Boolean-
valued. ZF has just set-valued variables. In ZF, the values true and false
are sets.⁴ (We don’t know what their elements are.)
Including the operators of arithmetic in ZF implies that we have to
say what the meaning of the expression x + y is if x or y is a set that
isn’t a number. It also implies that we can write weird expressions like
(x + y) ∧ z , so we have to explain what they mean. We’ll deal with this issue
in Section 2.2.7.

2.2.1 Simple Operators on Sets


A set S is completely specified by what its elements are—that is, by the
values of the formula e ∈ S for all sets e. This is asserted by the following
   ⁴ As usually defined, ZF does not consider true and false to be sets. We will see that
making them sets allows the value of a program variable to be a Boolean.

rule of ZF:

|= ( S = T ) ≡ ∀ e : (e ∈ S ) ≡ (e ∈ T )

As in the rules of predicate logic, S and T are placeholders, not variables.


Almost all the rules of ZF are stated in terms of placeholders. Putting
boxes around all the placeholders would be distracting, so they won’t be
used any more. From now on, we will simply assume that any name that
appears within the scope of a bound variable in a rule is a placeholder, not
a variable. In particular, the S and T that appear in all the formulas in this
section (Section 2.2.1) are placeholders.
Besides equality, set theory has one Boolean-valued operator: the subset
operator ⊆. For sets S and T , the formula S ⊆ T , read S is a subset of T ,
is true iff every element of S is an element of T . It’s defined precisely by
this rule:

|= S ⊆ T ≡ ∀ e : (e ∈ S ) ⇒ (e ∈ T )

We say that S is a proper subset of T iff S ⊆ T and S ≠ T .


To define a set, we must define what its elements are. The most direct
way to do this is by enumerating the elements. If e 1 , . . . , e n are any values,
then they are the (only) elements of the set {e₁, . . . , eₙ}. This set need not
have n elements. For example, the set {3, √2, 3, 2+1, 42, 3} contains only
the three elements √2, 3, and 42. It is equal to the set {42, 42, 3, √2}. (It is
as silly to say that a set has two copies of the number 42 as it is to say that
a football team has two copies of one of its players.) For n = 0, this defines
{} to be the empty set, which has no elements. For a finite set S (one with
only finitely many elements), we let #(S ) be the number of elements of S .
For example, #({1, 2, 4, 2, 1}) = 3.
Another way to define a set S is to define what e ∈ S equals, for a variable
e. We do this by writing a formula and asserting that e ∈ S equals that
formula. Here are the names, descriptions, and definitions of the set-valued
operators ∪, ∩, and \ (sometimes written “−” by computer scientists):

S ∩ T (intersection) The set of all elements in both S and T .


Definition: |= e ∈ S ∩ T ≡ (e ∈ S ) ∧ (e ∈ T )

S ∪ T (union) The set of all elements in S or T .


Definition: |= e ∈ S ∪ T ≡ (e ∈ S ) ∨ (e ∈ T )

S \T (set difference) The set of all elements in S and not in T .


Definition: |= e ∈ S \ T ≡ (e ∈ S ) ∧ ¬(e ∈ T )
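
For finite sets, these operators (and ⊆) are Python's built-in set operations,
which make it easy to experiment with the definitions. A small illustration
(the sets are my own examples):

   # The set operators on small finite sets, using Python's set type.
   S, T = {1, 2, 3}, {2, 3, 4}

   assert S & T == {2, 3}          # S ∩ T : elements in both S and T
   assert S | T == {1, 2, 3, 4}    # S ∪ T : elements in S or T
   assert S - T == {1}             # S \ T : elements in S and not in T
   assert {1, 2} <= S              # {1, 2} ⊆ S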

Our definition of S ∩ T is not just a definition. Besides defining S ∩ T to be


the collection of values that are in both S and T , it asserts that if S and T
are sets, then this collection is also a set. The same is true of the definitions
of ∪ and \ and all the other set-forming operators to be defined. Some
of those set-forming operators can be defined in terms of others. For the
rest, the assertion that they form a set is an axiom. We don’t care which
operators are which, but we should be aware that each such definition is
making an assertion that some value is a set.

2.2.2 More Sets and Set Operators


We’ve defined the set R of all real numbers. Most sets of numbers manip-
ulated by programs are sets of integers. Here are definitions of some sets of
integers that we will use.
N The set of natural numbers, which consists of all the non-negative inte-
gers 0, 1, 2, etc.
I The set of all integers, which includes positive and negative integers
and 0.
m . . n If m and n are integers, then this is the set of all integers i such that
m ≤ i ≤ n. This means that if m > n, then m . . n is the empty set {}.
We will need two more operators on sets. Even if you’ve studied sets,
you may not be familiar with them:
P(S ) (power set) The set of all subsets of S . For example P({1, 2, 3}) equals
{ { }, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3} } .
Definition: |= e ∈ P(S ) ≡ e ⊆ S .
∪S  The union of all the elements of S . For example, ∪{T , U , V } =
      T ∪ U ∪ V . In the following definition, e and S are not considered
      to be in the scope of the bound variable s. Thus they both can be
      considered to be variables.

      Definition: |= e ∈ ∪S ≡ ∃ s : (e ∈ s) ∧ (s ∈ S )
These operators can be confusing, and when using them it can be hard to
keep track of exactly what kind of elements belong to each set. Figuring out
why the following two assertions are true may help you understand them.
   |= ∪ P(S ) = S        |= S ⊆ P(∪ S )

The definitions imply that P({}) equals {{}}, the set whose single element
is the empty set, and ∪{} equals {}.
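
Computing both operators for small finite sets is a good way to check such
assertions. A Python sketch (mine; elements that are themselves sets must
be frozensets, because Python's mutable sets cannot be set elements):

   from itertools import chain, combinations

   def powerset(S):      # P(S): the set of all subsets of S
       return {frozenset(c) for c in
               chain.from_iterable(combinations(S, n)
                                   for n in range(len(S) + 1))}

   def union_all(S):     # the union of all the elements of S
       return {e for s in S for e in s}

   assert union_all(powerset({1, 2})) == {1, 2}    # ∪ P(S) = S
   S = {frozenset({1}), frozenset({1, 2})}         # a set of sets
   assert S <= powerset(union_all(S))              # S ⊆ P(∪ S)
   assert powerset(set()) == {frozenset()}         # P({}) = {{}}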

2.2.3 Two Set Constructs


ZF has the following two set-forming constructs. They both introduce a
bound variable, named v here. The formula P in the first and the ex-
pression exp in the second are in the scope of v and can be replaced by
expressions containing the bound variable v . Therefore, they are placehold-
ers in the constructs’ definitions. However, an expression containing the
bound variable v cannot be substituted for S . We can therefore consider S
either a placeholder that is outside the scope of v , or a variable that occurs
in the scope of v . As we’ve seen, we can’t substitute for such a variable an
expression containing the bound variable v . I think it’s best to think of S
as being a placeholder outside the scope of the bound variable. Logicians
call the first construct set comprehension, a name I don’t like. I don’t think
the second has a commonly used name.

Subsetting {v ∈ S : P } is the subset of the set S consisting of all ele-


ments v in S for which formula P is true. For example, i . . j equals
{n ∈ I : i ≤ n ≤ j }.
Definition: |= e ∈ {v ∈ S : P } ≡ (e ∈ S ) ∧ (P with v ← e)

Imaging {exp : v ∈ S } is the set of all values of the expression exp obtained
by substituting for v an element of S . For example, {2 ∗ n + 1 : n ∈ N}
is the set of all odd natural numbers. Since S is not in the scope of
the bound variable v , the following definition is correct only if S is a
variable or is a formula not containing a variable named v .
Definition: |= e ∈ {exp : v ∈ S } ≡ ∃ v : (v ∈ S ) ∧ (e = exp)

As with quantifiers, there is an alternative view of these constructs that


doesn’t involve bound variables. For example, we can view the imaging
construct {exp : v ∈ S } for an expression exp as Imaging(exp, S ), where exp
is a mapping and S is a value. We can then write {2 ∗ n + 1 : n ∈ N} as
Imaging(n ↦ 2 ∗ n + 1, N). I chose the name Imaging for this construct
because {2 ∗ n + 1 : n ∈ N} is the set that mathematicians call the image of
the set N under the mapping n ↦ 2 ∗ n + 1.
There is a reason why, in both constructs, S is not in the scope of the
bound variable v . If it were, then {v : v ∈ {v }} and {v ∈ {v } : true} would
both equal a set that contains all values; and we will see in Section 2.2.6.3
that allowing the collection of all values to be a set would lead to a contra-
diction, making ZF unsound.
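
Python's set comprehensions were modeled on this notation, and for finite
sets the two constructs translate directly (an illustration, with a predicate
and an expression of my choosing):

   # Subsetting and imaging over a finite set.
   S = set(range(10))

   subset = {v for v in S if v % 2 == 0}    # {v ∈ S : P}
   image  = {2 * v + 1 for v in S}          # {exp : v ∈ S}

   assert subset == {0, 2, 4, 6, 8}
   assert image == {1, 3, 5, 7, 9, 11, 13, 15, 17, 19}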

2.2.4 Venn Diagrams


The operators ∩, ∪, ⊆, and = satisfy the same properties as the operators
∧, ∨, ⇒, and ≡ of propositional logic. If a propositional logic tautology
contains only these four operators, then substituting ∩ for ∧, ∪ for ∨, ⊆ for
⇒, and = for ≡ yields a theorem of set theory. Thus ⊆ is transitive; and
∩ and ∪ are commutative and associative and satisfy these distributivity
properties:

|= S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U )
|= S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U )
Moreover, if all the sets are subsets of a set W, then a propositional logic
tautology containing ¬ becomes a theorem of set theory when, in addition
to substituting operators of set theory for propositional logic operators, each
subexpression ¬S is replaced with W \ S .
This correspondence between theorems of propositional logic and set
theory is the result of a close connection between predicates and sets. For a
set W, there is a natural 1-1 correspondence between predicates on W and
subsets of W. The predicate P and the corresponding set S P are related as
follows:

S P = {v ∈ W : P (v )} P (v ) = (v ∈ S P )

Suppose that instead of letting a value be a ZF set, we let a value be any
element of W and we let a set be any subset of W. The meaning [[F ]] of a
predicate logic formula F would then be a predicate on W. Define F̅ to be
the subset of W that corresponds to [[F ]], so

   F̅ = {v ∈ W : [[F ]](v )}

The following calculation shows that |= (F ∧ G)̅ = F̅ ∩ G̅ :

   (F ∧ G)̅ = {v ∈ W : [[F ∧ G]](v )}                       by definition of ·̅
            = {v ∈ W : [[F ]](v ) ∧ [[G]](v )}              by the meaning of ∧
            = {v ∈ W : [[F ]](v )} ∩ {v ∈ W : [[G]](v )}    by definition of ∩
            = F̅ ∩ G̅                                        by definition of ·̅

Similar calculations show

   |= (F ∨ G)̅ = F̅ ∪ G̅        |= (F ⇒ G)̅ = (F̅ ⊆ G̅)
   |= (F ≡ G)̅ = (F̅ = G̅)      |= (¬F )̅ = W \ F̅

If we take W to be the set of points in a plane, then subsets of W can


be represented by pictures. When used to illustrate predicate logic, such
pictures are called Venn diagrams. For example, if predicate symbols F and
G are represented by sets of shaded points describing F and G, then F ∧ G
is represented by the points of F ∩ G, where F and G overlap. You can
find on the Web numerous explanations of the operators of propositional
logic using Venn diagrams. Those diagrams provide a good way to become
familiar with propositional logic.

2.2.5 Bounded Quantification


In ZF, the formula ∀ v : F asserts that F is true when any set/value is
substituted for v . We usually want to assert only that a formula F is true
for all values of v in some set S . This assertion is written ∀ v ∈ S : F . A
similar notation applies to ∃ and to the choose operator. In all these
constructs, the formula F is in the scope of the bound variable but the set S
is not. That means the following definitions of these constructs are correct
if F is a placeholder and S is either a placeholder considered to lie outside
the scope of v or a variable.

(2.20)  ∀ v ∈ S : F  ≜  ∀ v : (v ∈ S ) ⇒ F
        ∃ v ∈ S : F  ≜  ∃ v : (v ∈ S ) ∧ F
        choose v ∈ S : F  ≜  choose v : (v ∈ S ) ∧ F
We can write the definition of the Imaging set construct in Section 2.2.3 as:
|= e ∈ {exp : v ∈ S } ≡ ∃ v ∈ S : (e = exp)
We call the constructs ∀ v ∈ S : F and ∃ v ∈ S : F bounded quantifica-
tion. Just like the unbounded quantification constructs we’ve seen before,
we can view bounded quantification without bound variables by consider-
ing F to be a mapping rather than a formula. We can write ∃ v ∈ S : F as
Forall (S , F ), so ∃ x ∈ R : x > y is an abbreviation for Forall ( R, x 7→ x > y).
Stating the rules for bounded quantification in this view is left as an exercise.
We can view ∀ v ∈ S and ∃ v ∈ S as operators that are the duals of each
other. This means that the following assertions are true:
(2.21) |= ¬ (∀ v ∈ S : F ) ≡ ( ∃ v ∈ S : ¬F )
|= ¬ (∃ v ∈ S : F ) ≡ ( ∀ v ∈ S : ¬F )
You should convince yourself that they follow from the intuitive meanings of
∀ v ∈ S and ∃ v ∈ S . It’s a good exercise to derive them from the definitions

(2.20), the duality of ∀ v and ∃ v expressed in (2.10), and propositional logic.


You should also convince yourself of these properties of quantification over
the empty set {}:

   |= (∀ v ∈ {} : F ) ≡ true        |= (∃ v ∈ {} : F ) ≡ false
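
For a finite set S, bounded quantification is what Python's all and any
compute, and they agree with these rules for the empty set (a small
illustration, with predicates of my choosing):

   # Bounded quantifiers over finite sets.
   S = {1, 2, 3}

   assert all(v > 0 for v in S)          # ∀ v ∈ S : v > 0
   assert any(v > 2 for v in S)          # ∃ v ∈ S : v > 2
   # The duality (2.21):
   assert (not all(v > 2 for v in S)) == any(not (v > 2) for v in S)
   # Quantification over the empty set:
   assert all(v > 0 for v in set())      # ∀ v ∈ {} : F  equals  true
   assert not any(v > 0 for v in set())  # ∃ v ∈ {} : F  equals  false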

Some obvious abbreviations are used for nested ∀ and nested ∃ expres-
sions. For example, ∀ v ∈ S : (∀ w ∈ T : F ) is written ∀ v ∈ S , w ∈ T : F and
∀ v ∈ S , w ∈ S : F is written ∀ v , w ∈ S : F . Definitions (2.20) and the re-
quirement that the bound variable cannot appear in the expression S imply:

|= (∀ v , w ∈ S : F ) ≡ (∀ w , v ∈ S : F )
|= (∃ v , w ∈ S : F ) ≡ (∃ w , v ∈ S : F )

There are no such abbreviations for choose. (It’s not clear what they would
mean.)
If S is a finite set containing n elements, then ∀ v ∈ S : F equals the
conjunction of n formulas, each obtained by substituting an element of S
for v in F . Similarly, ∃ v ∈ S : F equals the disjunction of those n formulas.
If S is an infinite set, we can think of ∀ v ∈ S : F and ∃ v ∈ S : F as the
conjunction and disjunction of the infinitely many formulas obtained by
substituting elements of S for v in F . When we say that a formula is a
conjunction or disjunction, we sometimes include the case when it is such
an infinite conjunction or disjunction. It should be clear from the context
when we’re doing this.
The rules for reasoning about unbounded quantifiers and unbounded
choose can be applied to the bounded versions of these operators by using
the bounded operators’ definitions. For example, consider the ∀ introduc-
tion rule of Section 2.1.9.3. We prove ∀ v ∈ S : F by applying the rule to
∀ v : (v ∈ S ) ⇒ F , which means proving:

Assume: new v , v ∈ S Prove: F

Since it occurs frequently, we abbreviate new v , v ∈ S as new v ∈ S and


write this as:

Assume: new v ∈ S Prove: F

This abbreviation also makes it clear that the bound variable v can’t appear
in the expression S .

Applying (2.16) to the definition of bounded choose, using the tautol-


ogy |= (R ∧ (F ≡ G)) ⇒ (R ∧ F ≡ R ∧ G), proves:
(a) |= (∃ v ∈ S : F ) ⇒
((F ∧ (v ∈ S )) with v ← (choose v ∈ S : F ))
(b) |= (∀ v ∈ S : F ≡ G) ⇒
((choose v ∈ S : F ) = (choose v ∈ S : G))

2.2.6 Infinite Sets and Collections


2.2.6.1 Infinite Sets
In an abstract program, the value of a variable might assume any value in
an infinite set of possible values. I suspect most people think infinite sets
make math more complicated. They’re wrong. Math uses infinite sets to
make things simpler. One reason coding languages are complicated is they
require you to do arithmetic with a finite set of numbers, rather than the
infinite set of numbers you used as a child. The simple rules of arithmetic
you learned as a child, such as
(x + y) + z = x + (y + z )
aren’t valid for arithmetic with a finite set of integers.
Infinite sets are needed to make math simple, but they have properties
that you may find surprising. An amusing example is called Hilbert’s hotel,
which is a hotel with an infinite set of rooms numbered 1, 2, 3, etc. Even
when Hilbert’s hotel is full, there is room for another guest. When a new
guest arrives, we just move all guests from their room numbered i to the
room numbered i + 1, and we give the new guest room number 1.
We say that two finite sets are the same size iff they have the same
number of elements. Equivalently, they are the same size iff we can find
a 1-1 correspondence between their elements. For example, here is a 1-1
correspondence between the sets 0 . . 4 and {2, 5, 8, 12, 17} that shows they
have the same number of elements.
   0 . . 4 :             0   1   2   3    4
                         ↕   ↕   ↕   ↕    ↕
   {2, 5, 8, 12, 17} :   2   5   8   12   17
We can extend the concept of size of a set to infinite sets by defining any
two sets to have the same size iff there exists a 1-1 correspondence between
them. One difference between infinite and finite sets is that an infinite set
can be the same size as a proper subset of itself. Hilbert’s hotel illustrates

that adding one element to an infinite set doesn’t change its size. Here is a
1-1 correspondence showing that the set of integers is the same size as the
set of natural numbers:
   I :   0   1   −1   2   −2   3   −3   4   −4   . . .
         ↕   ↕    ↕   ↕    ↕   ↕    ↕   ↕    ↕
   N :   0   1    2   3    4   5    6   7    8   . . .
We say that a set S is smaller than a set T iff S is the same size as a subset
of T , but T is not the same size as a subset of S . In the 19th century, Georg
Cantor upset many mathematicians by showing that the set of integers is
smaller than the set of real numbers. He also showed that P(S ) is bigger
than S , for every set S . A theorem of ZF called the Schröder-Bernstein
Theorem states that if S and T each is the same size as a subset of the
other, then S and T are the same size. The definition of size and this
theorem generalize to arbitrary collections.
A set is called countable iff it is either finite or has the same number of
elements as N (and hence the same number of elements as I). No infinite
set is smaller than N, so countably infinite sets are the smallest infinite sets.
The following theorem asserts that a countable union of countable sets is a
countable set.

Theorem 2.1 If T is a countable set and every element of T is a countable
set, then ∪T is countable.

Proof sketch: First, assume T and all its elements are infinite sets. Since
T is countable, we can enumerate its elements as A, B , C , D, etc. Since all
of these sets are countable, we can list their elements as:

   A = {a₀, a₁, a₂, a₃, a₄, . . .}
   B = {b₀, b₁, b₂, b₃, b₄, . . .}
   C = {c₀, c₁, c₂, c₃, c₄, . . .}
   D = {d₀, d₁, d₂, d₃, d₄, . . .}
   ...

Here is the required 1-1 correspondence. (The elements of A ∪ B ∪ . . . are


grouped to make the pattern clearer.)

   N :                       0    1    2    3    4    5    6    7    8    9    . . .
                             ↕    ↕    ↕    ↕    ↕    ↕    ↕    ↕    ↕    ↕
   A ∪ B ∪ C ∪ D ∪ . . . :   a₀   a₁   b₀   a₂   b₁   c₀   a₃   b₂   c₁   d₀   . . .

If T or any of its elements are finite, we can add elements to them to


make them all countably infinite, so we have proved that ∪T is a subset of a
countable set. Since N is the smallest infinite set, any subset of a countable
set is countable. End Proof Sketch
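
The interleaving used in the proof sketch is easy to program. In the
following Python sketch (mine), the pair (r, i) stands for the i-th element
of the r-th set, and the generator yields every pair exactly once, in the
order a₀, a₁, b₀, a₂, b₁, c₀, . . . :

   from itertools import count, islice

   def diagonal():
       for g in count(1):        # the g-th group contains g pairs
           for r in range(g):
               yield (r, g - 1 - r)

   assert list(islice(diagonal(), 6)) == \
       [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]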

2.2.6.2 Mathematical Induction


Any natural number n can be written as (. . . (0 + 1) + 1) + . . .) + 1, where
there are n ones. This means that we can prove ∀ n ∈ N : F by proving two
things:
1. F is true when we substitute 0 for n.
2. If F is true, then it is true when we substitute n + 1 for n.
Such a proof is called a proof by mathematical induction. We can write
these two proof steps as:
1. F with n ← 0
2. Assume: new n ∈ N, F
Prove: F with n ← n + 1
A little thought shows that we can strengthen the assumption in step 2 to
assert not just that F is true for n, but that F is true when any number in
0 . . n is substituted for n. In other words, we can replace step 2 by:
2. Assume: new n ∈ N, ∀ m ∈ 0 . . n : F with n ← m
Prove: F with n ← n + 1
This proof is sometimes called a proof by generalized mathematical induc-
tion. We can combine these two steps into one by observing that 0 . . (−1) is
the empty set, so ∀ n ∈ 0 . . (−1) : F equals true. This observation and a lit-
tle thought shows that generalized mathematical induction can be expressed
as proving:
Assume: new n ∈ N, ∀ m ∈ 0 . . (n − 1) : F with n ← m
Prove: F
Combining the two steps into one saves no work because the proof will be
broken into the two cases n = 0 and n > 0, which are the same as the two
steps. However, because 0 . . (n − 1) is the set of natural numbers less than
n, we can rewrite this statement as:
(2.22) Assume: new n ∈ N,
∀ m ∈ {i ∈ N : n > i} : F with n ← m
Prove: F

Writing induction like this provides a new way of thinking about it. Instead
of starting from 0 and going up to bigger numbers, we think of starting
from an arbitrary number n and going down to smaller numbers. That is,
to prove F is true for n, we assume it’s true for numbers m smaller than
n. We can then prove F is true for each of those numbers m by assuming
it’s true for numbers p smaller than m. And so on. We can’t keep finding
smaller and smaller numbers forever. Therefore, we must eventually prove
that F is true for some number or numbers without using any assumptions.
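
This downward view of induction is exactly the argument that a recursive
program terminates. For example (a Python sketch of mine), the following
function must reach its base case because each recursive call is on a strictly
smaller natural number:

   def binary(n):
       """The binary representation of a natural number n."""
       if n <= 1:                          # proved outright, with no assumption
           return str(n)
       return binary(n // 2) + str(n % 2)  # n // 2 < n, so the descent stops

   assert binary(6) == "110"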
The only property of N necessary for (2.22) to be a sound proof of
∀ n ∈ N : F is that there is no infinite sequence m 0 , m 1 , m 2 , . . . of elements
in N such that m 0 > m 1 > m 2 > . . . is true. Statement (2.22) proves
∀ n ∈ S : F for any set S with a greater-than relation > satisfying this con-
dition.
For later use, we define the property we need S to satisfy for an arbitrary
collection S , not just a set. A relation on a collection S is defined to be a
Boolean-valued mapping ≻ on pairs of values in S , where we write n ≻ m
instead of ≻ (n, m). We define ≻ to be well-founded on S iff there does
not exist a function f with domain N such that f (i ) is in S for all i ∈ N
and ∀ i ∈ N : f (i ) ≻ f (i + 1). The following theorem can be generalized to
an arbitrary collection S , but we state it only for sets.
Theorem 2.2 If ≻ is a well-founded relation on the set S , then proving
the following statement proves ∀ n ∈ S : F .
   Assume: new n ∈ S ,
           ∀ m ∈ {i ∈ S : n ≻ i } : F with n ← m
   Prove:  F
This kind of proof is called a proof by well-founded induction. An example
is proving a property F of subsets of a finite set T by induction on the size
of the subset. We prove F is true of the empty set and that it is true of a
nonempty subset U of T if it is true of some proper subsets of U . The set
S of the theorem is P(T ) and U ≻ V is defined to be true iff V is a proper
subset of U .

2.2.6.3 Collections
ZF has mind-bogglingly big sets. Not only is P( R) bigger than R, and
P(P( R)) bigger than P( R), and P(P(P( R))) bigger than . . . , but the
union of all these huge sets is also a set. However, there are collections of
sets that are too big to be a set. The biggest such collection is the collection
of all sets, which we call V.

Here’s how we show that V can’t be a set. If V were a set, then


{v ∈ V : ¬(v ∈ v )} would be a set; let’s call that set S . Consider the formula
S ∈ S . If S were a set, then S ∈ V would be true, so the definition of the
subsetting construct would imply S ∈ S is equivalent to ¬(S ∈ S ), which is
impossible. Therefore, V can’t be a set. This argument is called Russell’s
paradox, because it was discovered by the mathematician and philosopher
Bertrand Russell.
Since V can’t be a set, any collection C the same size as V also can’t
be a set because if it were, the 1-1 correspondence between the elements
of V and C would, by the imaging construct, imply that V is a set. In
general, if there is a 1-1 correspondence between the elements of V and a
collection of elements contained in C, then C can’t be a set. (This follows
from Russell’s paradox and the generalized Schröder-Bernstein Theorem.)
For example, the collection of all finite sets isn’t a set because S ↔ {S }
is a 1-1 correspondence between the values in V and the collection of sets
having one element, all of which are finite sets.

2.2.6.4 Eliminating Collections


There’s a way we could use only sets, without mentioning collections that
aren’t sets. Instead of letting V be the collection of all sets, we could define
it to be a really big set. In particular, we could define it to be big enough
to include the real numbers, the Booleans, and all the sets we can express
using them and the operators and constructs of ZF. Instead of letting a
value be any ZF set, we could let it be any element of V.
The argument of Russell’s paradox still applies when V is defined in this
way. It shows that V can’t be an element of itself—in other words, that V
can’t be a value. Instead of having collections that are not sets, we would
have subsets of V that are not values.
Changing our definition of V like this would make no practical difference,
because it would make V big enough to contain every set we would ever want
to describe. All it would do is change the terminology, replacing collection
and set with subset of V and value (element of V). I think it’s less confusing
to think about sets versus collections rather than sets that are values versus
sets that aren’t values. We will therefore keep defining V to be the collection
of all sets.
The possibility of making V a ZF set is interesting for the following
reason. We use math to describe and reason about the collection of all
possible executions of a program, and we will see that this collection isn’t
a set/value. Most math is based on sets, not collections. If we apply math

that was developed for reasoning about sets to deduce a result about this
collection, we could in theory obtain an invalid result. The math used in
this book is, from a mathematical viewpoint, so simple that such a mistake
is unlikely. (The one conceivable exception is in Appendix Section A.5.)
The ability to make V a ZF set means that even if we did deduce something
that was incorrect because V is not a set, then a counterexample could not
arise in practice.

2.2.7 Meaningless Expressions


We now address the issue, raised above, of what an expression like (x +y)∧z
means. To do that, we answer a question that might have arisen when you
studied arithmetic: What does 1/0 equal?
As a child I was taught that I wasn’t supposed to write 1/0. But I later
realized that sometimes it’s a perfectly natural thing to write. For example,
this is a theorem of arithmetic:

   |= ∀ x ∈ R : (x ≠ 0) ⇒ (x ∗ (1/x ) = 1)

Therefore, the ∀ Elimination law of predicate logic implies that

   (x ≠ 0) ⇒ (x ∗ (1/x ) = 1)

is true for all x ∈ R. In particular, this formula is true:

(2.23)  (0 ≠ 0) ⇒ (0 ∗ (1/0) = 1)

This is a true formula containing the expression 1/0 that children aren’t
supposed to write.
So, what does 1/0 equal? Some think that it should be an evil value
that can corrupt expressions in which it appears, causing them to equal
that evil value. This leads to a three-valued logic, in which the value of a
predicate can equal not only true or false but also the evil value. This is
unnecessarily complicated.
The simple answer to “What does 1/0 equal?” is “We don’t know.” It’s
a value, but our definition of “/” doesn’t tell us what value. It might equal
√2; it might equal the set N. Formula (2.23) is true because 0 ≠ 0 equals
false, and false ⇒ P equals true for any predicate P . A formula like
1/0 is meaningless, meaning that all we know about it is that it’s a value.
Meaningless expressions don’t concern us because we shouldn’t write
them. Writing x ∪ 1 when we meant to write x + 1 is an easy error to detect.
It will be found with any method that can find subtle errors in abstract

programs. There is no reason to complicate the math that we use by trying


to make meaningless expressions illegal.
A natural way to define division of real numbers is:

   r /s  ≜  choose q ∈ R : q ∗ s = r

Since there is no real number q for which q ∗ 0 = 1, this defines 1/0 to equal
choose q ∈ R : false, which is a meaningless expression. However, 2/0
equals this same meaningless expression, so this definition implies 1/0 = 2/0.
I don’t like that, so I prefer a definition such as:

   r /s  ≜  if s ≠ 0 then choose q ∈ R : q ∗ s = r
                     else {r } + {s}

This tells nothing about r /0, except that the Substitution Rule implies that
it equals u/0 if r = u.
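
Over a finite set of rationals, this style of definition can be mimicked with
the deterministic choose sketched in Section 2.1.9.4 (again an illustration;
the finite domain Q and the fallback value are my own choices):

   from fractions import Fraction

   Q = sorted(Fraction(n, d) for n in range(-6, 7) for d in range(1, 7))

   def choose(F):
       for q in Q:
           if F(q):
               return q
       return Q[0]                  # unspecified but fixed

   def div(r, s):                   # r/s defined as choose q : q * s = r
       return choose(lambda q: q * s == r)

   assert div(1, 2) == Fraction(1, 2)
   assert div(1, 0) == div(1, 0)    # meaningless, but always the same value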

2.3 Functions
2.3.1 Functions as Mappings
Mathematicians usually define a pair⁵ ⟨x , y⟩ to be a set (usually the set
{{x }, {x , y}}) and define a function f to be a set of ordered pairs, where the
pair ⟨x , y⟩ in f means that f (x ) equals y. But they seldom use that definition
to define a particular function, instead defining the squaring function sq on
real numbers by writing something like:

Let sq be the function on R such that sq(x ) = x².

I prefer to define a function to be a kind of mapping that is a set/value,


without defining what its elements are—in part because it’s convenient to
define a pair to be a function. The function sq is written as: x ∈ R ↦ x².
In general, the meaning of the function construct v ∈ D ↦ exp is defined
by the following rule, which declares v to be a variable whose scope is the
expression exp, so exp is a placeholder and this variable v cannot appear
in D:

   |= ∀ e ∈ D : (v ∈ D ↦ exp)(e) = (exp with v ← e)


   ⁵ Mathematicians generally enclose pairs in parentheses, but parentheses are used for
lots of other things, so we make formulas containing pairs easier to read by using angle
brackets ⟨ and ⟩ instead.

The set D is called the domain of the function, and the domain of a function
f is written domain(f ). For a function f , the value of f (e) is specified only
if e is in its domain. Two functions f and g are equal iff they have the
same domain and f (v ) = g(v ) for all v in their domain. More precisely, the
function constructor satisfies this rule:
(2.24)  |= ((v ∈ D ↦ exp₁) = (v ∈ E ↦ exp₂)) ≡
             (D = E ) ∧ (∀ v ∈ D : exp₁ = exp₂)
We define D → S to be the set of all functions f with domain D such that
f (x ) ∈ S for all x ∈ D. A value f is a function with domain D iff f equals
v ∈ D ↦ f (v ). We can therefore define the set D → S by:

   |= f ∈ (D → S ) ≡ ∧ f = (v ∈ D ↦ f (v ))
                     ∧ ∀ v ∈ D : f (v ) ∈ S
(Like all our definitions of set-forming operators, this asserts that D → S is a
set if D and S are sets.) It follows from (2.24) that there is a unique function
whose domain is the empty set. That function can be written v ∈ {} ↦ 42,
where we can replace 42 with any value.
An array in modern coding languages is described mathematically as
a function, where the expression f [x ] in the language means f (x ). For a
variable f whose value is an array/function, assigning the value 4.2 to f [14]
changes the value of f to a new array/function that we can write as
   x ∈ domain(f ) ↦ if x = 14 then 4.2 else f (x )
Mathematicians have little need to write such a function, but it occurs of-
ten when math is used to describe programs, so we need a more compact
notation for it. We write it like this:
   f except 14 ↦ 4.2
This notation has been used above when f is an interpretation Υ. An
interpretation is a function whose domain is the set of all variables.
We have defined what are usually called functions of a single argument.
Mathematicians also define functions of multiple arguments. For example,
+ can be considered to be a function of two arguments, where x + y is an
abbreviation of +(x , y). A function with n arguments can be considered to
be a function whose domain is a set of n-tuples, so f (x , y) is an abbreviation
of f (⟨x , y⟩).
A function is a special kind of mapping that is a value. A mapping
can be defined to assign a specified value to elements of any collection. For

example, the mapping P, where P(S ) is the set of subsets of S , is defined


on the collection of all sets. However, P is not a value, so it can’t be the
value of a variable. More precisely, as explained in Appendix Section A.1, we
can’t let P be the value of a variable because we don’t know if it’s a set. A
function is a value, no different than values like √2 and N. But the value of
f (v ) for a function f can be specified only for values v in domain(f ), which
has to be a set. When describing a program mathematically, we often have
to decide whether or not to define a particular mapping to be a function. If
we want to allow it to be the value of a (mathematical or program) variable,
then it must be a function.

2.3.2 Sequences and Tuples


We often number the items in a list, which means giving each item a name
like item number 2. We will use two different kinds of lists that are numbered
in different ways:

Ordinal These are lists whose items are naturally named with the ordinal
numbers first, second, third, etc. For example, in a list of people
waiting to be served, the second person to be served is naturally named
person number 2.

Cardinal These are lists in which it is natural to name an item by its


distance from an item numbered 0. For example, it’s natural to number
the floors of a building by their distance from the ground floor. The
ground floor is floor number 0 and floor number 2 is two stories above
the ground floor.

I think mathematicians call such lists sequences, so we will call them ordinal
and cardinal sequences.
In most of today’s coding languages, array elements are numbered start-
ing with 0. They make it convenient to describe cardinal sequences and less
convenient to describe ordinal sequences. When I started describing abstract
concurrent programs with mathematical formulas, I discovered that the for-
mulas were usually simpler if I described finite lists as ordinal sequences.
However, the meaning and properties of the formulas are defined in terms
of infinite sequences and finite prefixes of those sequences; and the math is
simpler if those are cardinal sequences. So, we use both kinds of sequences.
The obvious way to represent a sequence mathematically is as a function
whose domain is the set of numbers of the sequence’s items. An ordinal
sequence of length n (one containing n items) is a function whose domain is

1 . . n. We write ordinal sequences as lists enclosed in angle brackets. Thus, if
s is the 4-item ordinal sequence ⟨√2, {1, 2}, √2, ⟨7⟩⟩ then domain(s) = 1 . . 4,
s(1) and s(3) equal √2, s(2) equals the set {1, 2}, and s(4) equals the 1-item
sequence ⟨7⟩, which equals i ∈ {1} ↦ 7.
A pair like ⟨42, −3⟩ is therefore a 2-item sequence. There is no difference
between tuples and finite ordinal sequences. An n-tuple is a function with
domain 1 . . n. This provides a simple, natural way of referring to the items
of a tuple: the i-th item of a tuple t is t(i ).
The Cartesian product × of sets is defined so that S × T × U is the set
of all triples ⟨s, t, u⟩ with s ∈ S , t ∈ T , u ∈ U . Deciphering this definition
of the general Cartesian product S₁ × · · · × Sₙ of n sets Sᵢ, as well as the
rest of the definitions in this section, is a good way to become familiar with
the math that is the topic of this chapter:

   S₁ × · · · × Sₙ  ≜
      { f ∈ (1 . . n → ∪{Sᵢ : i ∈ 1 . . n}) : (∀ i ∈ 1 . . n : f (i ) ∈ Sᵢ) }

We define Seq(S ) to be the set of finite ordinal sequences whose items are
elements of the set S :

(2.25)  Seq(S ) ≜ ∪{ (1 . . n → S ) : n ∈ N }

The empty sequence ⟨ ⟩ has domain 1 . . 0, which is the empty set. It’s a
simple way to write the (unique) function whose domain is the empty set.
(It is the one sequence that is both an ordinal and a cardinal sequence.)
We now define some operators on both ordinal and cardinal sequences.
To make it easier to write definitions that apply to both kinds of sequence,
for a nonempty sequence s we define 1st (s) to equal 1 if s is an ordinal
sequence and 0 if it is a cardinal sequence:

   1st (s) ≜ if 0 ∈ domain(s) then 0 else 1

Recall that #(S ) is the number of elements of a finite set S .

Len(s) The length of the finite sequence s. It equals #(domain(s)).

Head (s) The first item of a nonempty sequence s. It equals s(1st (s)).

Tail (s) The remainder of the nonempty sequence s after its first item is
removed. It equals:
   i ∈ { j − 1 : j ∈ (domain(s) \ {1st (s)}) } ↦ s(i + 1)

s ◦ t The concatenation of the finite sequence s with the finite or infinite


sequence t, both of them the same kind of sequence—either ordinal
or cardinal. It equals:
   i ∈ domain(s) ∪ { j + Len(s) : j ∈ domain(t) }
      ↦ if i ∈ domain(s) then s(i ) else t(i − Len(s))
It would be nice to define Append (s, v ) to be the sequence obtained by
appending the item v to the end of the finite sequence s. However, there is
no way to decide if Append (⟨ ⟩, v ) is an ordinal or cardinal sequence, since
the empty sequence ⟨ ⟩ is both. We will need to use Append only for ordinal
sequences, so we define it by:

   Append (s, v ) ≜ s ◦ ⟨v ⟩
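
All of these operators are easy to prototype for finite sequences represented,
as above, by dicts whose keys form the domain (a sketch of mine; it handles
only the finite case):

   def seq(*items):                   # the ordinal sequence <e1, ..., en>
       return {i + 1: e for i, e in enumerate(items)}

   def head(s):  return s[min(s)]     # s(1st(s)), since min(s) is 0 or 1
   def tail(s):  return {i: s[i + 1] for i in s if i + 1 in s}

   def concat(s, t):                  # s ◦ t for finite s and t
       n = len(s)
       return {**s, **{i + n: t[i] for i in t}}

   def append(s, v):
       return concat(s, seq(v))

   s = seq("a", "b")
   assert head(s) == "a" and tail(s) == seq("b")
   assert append(s, "c") == seq("a", "b", "c")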

In this book, a list means a piece of syntax consisting of expressions


separated by commas. For example:
   Let x be the list x₁, . . . , xₙ of variables.
means that the symbol x is an abbreviation for x₁, . . . , xₙ, where each xᵢ
is a variable. The expression ⟨x⟩ is then an abbreviation for the n-tuple
⟨x₁, . . . , xₙ⟩ of variables.

2.4 Definitions
Definitions are omnipresent in mathematics. Despite their importance,
mathematicians seem to give little thought to what definitions actually
mean. What they mean is a practical concern because if it contained no new
definitions, the formula that describes a typical abstract program would be
hundreds of lines long. We can understand such a large formula only by us-
ing definitions to decompose it into understandable pieces. A precise under-
standing of the abstract programs we write requires a precise understanding
of what those definitions mean.
To understand a defined symbol, we need to understand what the symbol
is defined in terms of. If we expand all definitions used in its definition, we
obtain a formula containing only variables and operators that we assume
to be understood by the readers of the formula. All those operators are
mappings on collections of values or of tuples (usually pairs) of values. The
operators of arithmetic can be viewed as functions whose domains are sets of
numbers or pairs of numbers. Operators like ∈, ∪, and domain are mappings
on values/sets or pairs of values/sets. Operator just means mapping.

The meaning of constructs that introduce bound variables is not hard to


define in terms of the view that eliminates bound variables by considering
those constructs to be operators on mappings. However, I assume you un-
derstand the constructs of ordinary math already defined, and there is no
need to define any new ones. So we need only consider definitions of values
and operators.
This section first explains the meaning of ordinary definitions, then dis-
cusses recursive definitions, and then describes definitions that are local to
proofs and to expressions.

2.4.1 Ordinary Definitions


A definition of a symbol F has the form

(2.26)  F (v₁, . . . , vₙ) ≜ exp

where exp is an expression. The vᵢ are called parameters, but they are
essentially bound variables whose scope is the expression exp. Like variable
declarations, definitions have scopes. And as with non-bound variables, the
scopes of ordinary definitions are usually implicit.
We don’t allow circular definitions. That is, it must be possible to order
all definitions so that in (2.26), the definitions of all defined symbols that
occur in exp precede this definition of F . For an ordinary definition, F
cannot appear in exp. In Section 2.4.2, we describe recursive definitions in
which F can appear in exp. But we never allow exp to contain a symbol
whose definition follows the definition of F . (We have no need for mutual
recursion, in which several symbols can be defined in terms of one another.)
We first consider definitions with no parameters, in which we omit the
parentheses and just write F ≜ exp .

2.4.1.1 Definitions with no Parameters

Let’s consider the definition F ≜ x + 1. It seems perfectly simple, but it’s
not clear what it means. In particular, with that definition of F , what
does the formula ∃ x : x > F mean? One possibility is that the formula
equals ∃ x : x > x + 1, which equals false. The other possibility is that it
equals the formula that it seemed impossible to write—the valid formula
∃ x : x > x + 1 in which the x in x + 1 is the name of a variable different
from the bound variable x in the formula.
The question we have to answer is: does the definition F ≜ x + 1 define
F to be an abbreviation for the 3-symbol sequence “x + 1”, or does it define
F to equal the expression denoted by that sequence where the definition
occurs? I find it more useful to think in terms of expressions than in terms
of symbol sequences. Therefore, we will consider it to have the second
meaning.
This choice allows us to write formulas that we couldn’t write before.
However, it makes it harder to use definitions. We can’t replace a defined
symbol by its definition without making sure that doing so doesn’t result
in variable capture. The Safe Scoping Rule will make such variable capture
impossible if we add the requirement that the scope of a definition must be
contained within the scope of every variable that occurs in the definition.
This will allow us to replace a defined symbol by the text of its definition
wherever the symbol appears.
However, sometimes we will want to use a defined symbol F outside the
scope of its definition. We will do that with this expression
(2.27) F with v 1 ← exp 1 , . . . , v m ← exp m
where the v i are all the variables in F whose scope does not include the ex-
pression (2.27), and the exp i are expressions that contain only variables
whose scope includes (2.27). In particular, we’ll see that proving that
one program implements another requires proving an assertion of the form
|= S ⇒ (T with . . . ), where T is defined outside the scope of the definition
of S .
2.4.1.2 Definitions with Parameters

Extending the meaning of a definition to definitions with a parameter is
straightforward. The definition Op(v ) ≜ exp defines the mapping Op by
defining Op(e) for an expression e to equal:

(2.28) exp with v ← e
The variable v in (2.28) is a bound variable whose scope is the expression
exp.
Generalizing to definitions with multiple arguments is also easy. For
example, if we define an infix operator ] by w ] v ≜ exp, then for any
expressions e1 and e2, the expression e1 ] e2 equals:
exp with w ← e1, v ← e2
We can also define higher-order mappings—ones that have mappings as
arguments. For example, this
Double(F , x ) ≜ F (F (x ))
defines Double(P, {a, b}) to equal P(P({a, b})). (This set contains 16 ele-
ments, one of which is {{}, {b}}.) Definitions of higher-order mappings are
used only in Appendix Section A.2, where their meanings should be obvious.
They are therefore not discussed further here.
Definitions with parameters provide another source of variable capture
if a parameter occurs inside the scope of a bound variable in the definition.
For example, if the mapping F is defined by F (v ) ≜ ∃ y ∈ R : y > v , then
F (y + 1) does not equal ∃ y ∈ R : y > y + 1. This kind of variable capture can
be made impossible by requiring that if a definition parameter occurs within
the scope of a bound variable in the definition, then a variable with the same
name as that bound variable cannot appear in the scope of the definition.
In practice, definition parameters seldom occur in the scope of a bound
variable. I believe it happens three times in this book.
2.4.2 Recursive Definitions
A recursive definition is one in which the symbol being defined is used in its
definition. Allowing recursive definitions of mappings in ZF without intro-
ducing logical inconsistency is tricky. A method of doing this by translating
a recursive definition to a non-recursive one was apparently first given in
this century [16]. It is explained in Appendix Section A.2. Here we describe
recursive definitions informally, showing the translation to a non-recursive
definition only for recursive function definitions.
Here is a recursive definition of the operator #, where we have defined #
informally to be the mapping such that #(S ) equals the number of elements
in S if S is a finite set. (Its value is unspecified if S is an infinite set.)
(2.29) #(S ) ≜ if S = {} then 0
else 1 + #(S \ {choose e : e ∈ S })
The same kind of reasoning that justifies proof by mathematical induction
shows that this defines #(S ) for any finite set S .
In general, for a recursive definition of an operator F on a collection C of
values, the recursion “terminates” and uniquely defines F (v ) for all v in C
if there is a well-founded relation ≻ on C for which the following condition
is satisfied:
(2.30) For any v in C, the value of F (v) can depend only on the values of
F (w) for some values w in C with v ≻ w.
Definition (2.29) satisfies condition (2.30), where C is the collection of all
finite sets and S ≻ T is defined to equal true iff T is a proper subset of S .
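For example, here is how (2.29) might be transcribed as a Python sketch (illustrative only); next(iter(S)) plays the role of the choose expression, and each recursive call receives a proper subset, so the recursion terminates:

def size(S):
    # (2.29): #(S) = if S = {} then 0 else 1 + #(S \ {choose e : e ∈ S})
    if not S:
        return 0
    e = next(iter(S))        # an arbitrary element, playing the role of choose
    return 1 + size(S - {e})

assert size({3, 1, 4, 5}) == 4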
Condition (2.30) allows us to define operators on collections. If we want
to define an operator F recursively on a set, then there’s a simple way to
define F to be a function, if the recursion always terminates. For example,
the factorial function fact is the function with domain N such that fact(n)
equals 1 if n = 0 and n ∗ (n − 1) ∗ . . . ∗ 1 if n > 0. (fact(n) is usually written
n ! .) Its recursive definition is:
(2.31) fact ≜ n ∈ N 7→ if n = 0 then 1 else n ∗ fact(n − 1)

We define this definition to be equivalent to the following non-recursive
definition.

fact ≜ choose f :
f = ( n ∈ N 7→ if n = 0 then 1 else n ∗ f (n − 1) )
Remember that choose f : P equals a value f satisfying P only if |= ∃ f : P
is true. It is true of the choose expression in the definition of fact, so fact
does equal the right-hand side of its definition (2.31), because the operator
definition
f (n) ≜ if n = 0 then 1 else n ∗ f (n − 1)
satisfies (2.30), where C is the set N and ≻ is > .
The generalization from this example to arbitrary recursive function def-
initions should be clear. In practice, it is almost always obvious what a
recursive definition of an operator on a collection defines because it’s clear
that the recursion terminates for all values in the collection. For a function,
that collection is its domain, which is a set.
2.4.3 Local Definitions
Thus far, we have considered definitions with no explicit scope. Our def-
inition of Tail implicitly remains in effect for the rest of this book. The
implicit scope of our definition of Double in Section 2.4.1.2 is that section.
We now introduce definitions with two explicitly limited scopes.
These local definitions allow the possibility of a symbol’s definition oc-
curring in the scope of a definition of the same symbol, or in the scope of
a bound variable having the same name. This raises the same problems as
multiple variables with the same name, and it can be handled the same way.
But in practice, it never occurs for defined symbols.
2.4.3.1 Definitions in Proofs
It’s often useful to make definitions local to a proof—more precisely, local
to a single proof within a larger hierarchically structured proof. We allow a
proof to contain steps that just make one or more definitions. To make the
proof easier to read, we begin such a step with the keyword Define. There’s
no need to number a definition step, since we can refer to the definition by
the name of the symbol being defined. As with the assertion of any proof
step, the scope of each of the definitions in a Define statement is the rest
of the current proof (including the rest of that statement).
A definition needs no proof. However, we may need a proof that a recur-
sive definition defines what we want it to. In that case, the definition step
would be followed by a statement asserting this. For example, a recursive
function definition would appear in a proof like this:
Define f ≜ v ∈ S 7→ FDef (v , f )
4.2. f = (v ∈ S 7→ FDef (v , f ))
4.2.1. . . .
See Appendix Section A.2 for how to prove such an assertion. Most of the
time, it’s obvious that the definition defines what we expect it to, in which
case step 4.2 and its proof can be omitted in a hand proof.
2.4.3.2 Definitions in Formulas
We can also make definitions that are local to an individual formula. We
do this with a let/in construct, the let clause making definitions that are
local to the in clause. For example, here is the definition of the operator
SetSum, where SetSum(S ) equals the sum of the elements of any finite set
S of numbers:
SetSum(S ) ≜ if S = {} then 0
else let n ≜ choose m : m ∈ S
in n + SetSum(S \ {n})
The symbol n defined in the let clause is an abbreviation for the choose
expression; its scope is the in clause. The definition would be equivalent,
but harder to read, if the two occurrences of n in the in clause were replaced
by two copies of that choose expression. A let clause can contain multiple
definitions.
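Transcribed into Python (an illustrative sketch, with next(iter(S)) standing in for the choose expression), the let clause becomes an ordinary local binding:

def set_sum(S):
    # SetSum(S): the sum of the elements of a finite set S of numbers
    if not S:
        return 0
    n = next(iter(S))              # let n = choose m : m ∈ S
    return n + set_sum(S - {n})    # in n + SetSum(S \ {n})

assert set_sum({1, 2, 3}) == 6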
Chapter 3

Describing Abstract Programs with Math
3.1 The Behavior of Physical Systems
Programs are meant to be executed on physical computers. I have been
guided by the principle that any statement I make about a program should
be understandable as a statement about its execution on one or more com-
puters. I believe this was the principle that guided Turing in defining the
Turing machine as an abstraction of a physical computing device.
The description of our science of concurrent programs begins by exam-
ining the physics of computing devices. We don’t care about the actual
details of how transistors and digital circuits work. We are just interested
in how scientists describe physical systems. As a simple example, we look
at a planet orbiting a star the way an astronomer might.
We consider the one-planet system’s behavior starting at some time t 0 ,
after the star and planet have been formed and the planet has settled into
its current orbit. Let R≥ be the set {r ∈ R : r ≥ t 0 } of all real numbers r
with r ≥ t 0 . The behavior of the one-planet system is described by its state
at each instant of time. We assume the star is much more massive than the
planet, so we can assume that it doesn’t move. We also assume that there
are no other objects massive enough to influence the orbit of the planet, so
the state of the system is described by the values of six state variables: three
describing the three spatial coordinates of the planet’s position and three
describing the direction and magnitude of its momentum. Let’s call those
state variables v 1 , . . . , v 6 ; we won’t worry about which of the six values
each represents. The quantities these variables represent change with time,
so the value of each variable v i is a function, where v i (t) represents the value
at time t. The behavior of the system is described mathematically by the
function σ with domain R≥ such that σ(t) is the tuple hv 1 (t), . . . , v 6 (t)i of
numbers, for every t ∈ R≥ . Physicists call σ(t) the state of the system at
time t.
In this description, the planet is modeled as a point mass. Real plan-
ets are more complicated, composed of things like mountains, oceans, and
atmospheres. For simplicity, the model ignores those details. This limits
the model’s usefulness. For example, it’s no good for predicting a planet’s
weather. But models of planets as point masses are sometimes used to plan
the trajectories of real spacecraft. It’s also not quite correct to say that the
model ignores details like mountains and oceans. The mass of the model’s
point mass is the total mass of the planet, including its mountains and
oceans, and its position is the planet’s center of mass. The model abstracts
those details, it doesn’t ignore them.
The laws that determine the point-mass planet’s behavior σ are ex-
pressed by six differential equations of this form:
(3.1) dv i /dt (t) = f i (t)
where t ∈ R≥ and each f i is a function with domain R≥ such that f i (t)
is a formula containing the expressions v 1 (t), . . . , v 6 (t). Don’t worry if you
haven’t studied calculus and don’t know what equation (3.1) means. All
you need to know is that it asserts the following approximate equality for
small non-negative values of dt:
(3.2) v i (t + dt) ≈ v i (t) + f i (t) ∗ dt
and the approximation gets better as dt gets smaller, reaching equality when
dt = 0. The differential equations (3.1) have the property that for any time
t > t 0 and any time r > t, the values of the six numbers v i (t) and the
functions f i completely determine the six values v i (r ) and hence the value
of σ(r ). That is, the equations imply:
History Independence For any time t ∈ R≥ , the state σ(r ) of the system
at any time r > t depends only on its state σ(t) at time t, not on
anything that happened before time t.
The generalization from a planetary system to an arbitrary physical system
starting at time t 0 is straightforward. The system is described by state
variables v 1 , . . . , v n , and its behavior σ is described mathematically as the
function with domain R≥ such that σ(t) equals hv 1 (t), . . . , v n (t)i. History
independence is satisfied by any isolated physical system—that is, by any
system that is assumed not to be influenced by anything outside the system.1
There is one way our one-planet system differs from most systems. For
this system, it is possible to solve the differential equations (3.1) to write
the functions v i as formulas in terms of ordinary mathematical operations.
Even for two planets around a star that is not much heavier than them, it
is impossible to write such a solution. The functions v i can be proved to
exist and be unique, but the best we can do in general is find very close
approximations to those functions for some finite interval of time.
Physics describes systems with math. Remember that in math, there are
infinitely many variables. A description of any particular system contains
only a finite number of them—the system variables. The description (3.1) of
a planet orbiting a star contains only the six system variables describing the
planet’s state. It doesn’t say that there is nothing else in the universe. It just
says nothing about any other planets. Instead of thinking of (3.1) and (3.2)
as describing a planet orbiting a star, it’s more accurate to think of them
as describing a universe in which the planet is orbiting the star. They also
describe a universe containing both the planet and a spacecraft that orbits
the star; they just say nothing about the spacecraft, since the spacecraft is
too small to affect the planet’s motion. (The spacecraft’s motion could be
affected by the planet.) Formulas (3.1) and (3.2) just say nothing about the
spacecraft.
Physical science is descriptive. The laws of physics describe how a planet
moves; they don’t instruct the planet. Programs are prescriptive; they tell a
computer what to do. This may make it seem strange to use physical science
as a guide to a science of programs. But being descriptive or prescriptive
is not a property of the math. It’s just how we choose to view that math.
We can view the equations of planetary motion not only as a description of
how a planet moves, but also as commands given to the planet by nature.
The math is agnostic. Our science views a program as a description of what
behaviors it allows, not as commands for producing those behaviors. This
view allows much more freedom in describing programs.
1. In classical physics, the state at time t0 uniquely determines the system’s subsequent
behavior. The situation is less clear in quantum physics where multiple subsequent be-
haviors seem possible, but the set of those behaviors is completely determined by the
state.
3.2 Behaviors of Digital Systems
3.2.1 From Continuous to Discrete Time
Digital systems are physical systems, usually electromagnetic, in which cer-
tain stable states represent a collection of one-bit values. For example, at
a certain point in a circuit, 0 volts may represent a 0 and 3.3 volts may
represent a 1. Classical physics describes the behavior of physical systems
as continuous.2 If the voltage at some point in a circuit changes from 0 volts
to 3.3 volts, it must pass through 1 and √2 volts.
Computers and other digital systems are designed so that each bit can
be thought of as passing instantaneously from one stable state to the next.
This means that we can think of there being a sequence of discrete times t 0 ,
t 1 , t 2 , . . . that are the only times at which the value of a bit can change.
(We assume there is an event at time t 0 that initializes all the bits of the
device.) We pretend that between times t j and t j +1 , the part of the circuit
representing each bit is in a stable state. Moreover, whether a bit changes
its value at time t j +1 depends on the (stable) values of the bits immediately
before time t j +1 .3 Thus, the system is history independent.
Although built from one-bit registers, digital systems are designed
so that larger components can also be viewed as changing their state
instantaneously—for example, a 128-bit register or even all the bits in a
chip controlled by a single clock. We can pretend that the entire component
changes its value in discrete steps that can occur only at the times t j . Thus,
we can view a digital system as one whose components are represented by
state variables that can have more than two values.
When a digital system is executing a program, the state of the program
does not correspond directly to the state of the system. The value of a
program variable might be represented by different parts of the system at
different times. For example, its value may at some times be stored in a
memory chip, at some times it may be in a register of a processor chip, and
at some times it might be stored on a disk. Later, we’ll see what it means
mathematically for a digital system to implement a program in this way.
For now, consider a concrete program to be just a digital system described
2. Here, classical physics includes relativity but not quantum mechanics. I believe that
quantum mechanics also describes a continuous universe, but a discussion of that would
take us too far afield.
3. The tj are pretend times, not exact physical times. Two bits that change at the
same pretend time may change at different physical times because the clock pulse that
generates the change may reach one of them a fraction of a nanosecond before the other.
Chip designers must ensure that we can pretend that they change at the same time.
by discretely changing variables whose values are not just bits but may be
any data structure provided by the language—for example, 128-bit integers.
An abstract program is the same, except the value of a variable may be
any value—for example a real number such as √2, not just a finite-precision
approximation like 1.414213562. Modeling a science of programs on the
science of physical systems ensures that it can address real problems, and
we are not just creating a science of angels dancing on the head of a pin.
(However, the science should be able to describe any discretely behaving
angels, wherever they might be dancing.)
We are seldom interested in the actual times t j at which state variables
can change. To simplify things, we consider only the sequence of states
through which the system passes, ignoring the times at which it enters and
leaves those states. We call the state created at time t j state number j .
Instead of letting a state variable v be a function that assumes the value
v (t) at time t, we consider it to be a function that assumes the value v (j ) in
state number j . In other words, the value of a state variable v is a sequence
of values. A behavior σ of a program is also a sequence, where σ(j ), its state
number j , describes the values of the device’s variables in that state.
If a program or a digital device runs forever, then the sequence of times
t j is infinite and therefore so is the sequence σ of its states. But if a pro-
gram terminates, then those sequences can be finite. Other than parallel
programs, in which concurrency is added to a traditional program so it can
run faster by using multiple processors, most concurrent programs are not
supposed to stop. A concrete concurrent program will not really run for-
ever, but we describe it as running forever for the same reason there are an
infinite number of integers even though we only use a finite number of them:
it makes things simpler.
Still, some concurrent programs are supposed to stop, so we have to
describe them. For simplicity, we describe those programs as well with
infinite state sequences. Exceptionally observant readers will have noticed
that while the times t j had to be chosen so we can pretend that the state
changes only at those times, we did not require that the state had to change
at each of those times. There can be times t j at which none of the program
variables’ values change. In particular, if the program stops, we can add an
infinite number of times t j after it has stopped. This leads to an infinite
sequence of states such that, for some k , the values of the program’s variables
after state number k are the same. We call a pair hσ(j ), σ(j +1)i of successive
states in a behavior σ a step of σ. A step in which the values of the program’s
variables do not change is called a stuttering step of the program.
We call a behavior ending in infinitely many stuttering steps a halting
behavior of the program. It describes an execution in which the program
stops. There are many reasons a program might stop—for example, an error
might cause it to abort. If the program stops because it has completed what
it was supposed to do, we say that it terminates. The term halting covers
all cases when the program stops.
Mathematically, a behavior of a digital system or an abstract program
is an infinite cardinal sequence of states, where each state is an assignment
of values/sets to variables. There is a natural tendency to think of state
number j of a behavior as occurring at time j on some clock that ticks at
a constant rate. Don’t think of it like that. A microsecond might elapse
between when the system reaches state number j and when it reaches state
number j +1, and a day or a femtosecond might then elapse before it reaches
state number j + 2. All we know is that the system can’t reach state number
j + 1 before it reaches state number j .
By removing any information about the physical time at which things
happen, it may appear that we have eliminated the possibility of describing
how much actual time it takes for something to happen. That’s not the case,
and Section 4.4 explains how it’s done. However, correctness of few programs
depends on exactly how long it takes the program to do something, and I
know of no commonly used coding language that allows us to write such
programs. To my knowledge, nothing in the definition of the Java coding
language assures us that executing the statement x = x+1 takes less than a
century.
3.2.2 An Example: Sqrs
Our first example is a very simple abstract program called Sqrs that is
described in Figure 3.1 with pseudocode. The variables statement describes
the program variables and their initial values (their values in state 0). In
this example, the program variables are x and y, and their initial values are
both 1. The program’s code consists of a while true loop, which means
that the body of the loop is repeatedly executed forever. Program Sqrs is
an abstract program because it runs forever, producing a behavior with an
infinite number of states, unlike a concrete program that would halt with
an error when x became too big.
In a science, it would be crazy to let “=” mean anything other than what
it has meant in mathematics for several centuries, so we use “:=” to mean
assignment. Except for the label a that you can ignore for the moment,
it should be obvious what an execution of the loop body does. What’s
not obvious in most pseudocode and in virtually all real code is how to
variables x = 1, y = 1;
while true do
a: x : = x + y + 2 ;
y := y + 2
end while
Figure 3.1: The simple abstract program Sqrs.
represent the execution in a behavior—which means as a sequence of states.
In particular, how many different steps in the behavior describe a single
execution of the loop body?
We would expect to describe execution of the loop body of Sqrs with at
least one step. But should there be more? For example, should evaluating
x +y in the first assignment statement be a separate step? Coding languages
seldom answer this question because it makes no difference to the result
computed by a traditional program. However, it can make a big difference
for concurrent programs.
We will adopt the PlusCal algorithm language’s [36] convention of using
labels to indicate what the separate steps of a behavior are. The rule is that
execution from one label to the next constitutes a single step. This means
that a step begins and ends with program execution at a label. For program
Sqrs, this implies that execution of the entire loop body, starting from label
a and finishing when the program reaches a again, is a single step. With
this choice of what constitutes a step in the behavior, the values x (j ) and
y(j ) of the variables in each state j of the behavior are determined by two
(3.3) (x (0) = 1) ∧ (y(0) = 1)
(3.4) ∀ j ∈ N : ∧ x (j + 1) = x (j ) + y(j ) + 2
∧ y(j + 1) = y(j ) + 2
We call (3.3) the initial predicate. It determines the initial state. Formula
(3.4) is called the step predicate. It’s the discrete analog of the differential
equations (3.1) that describe the orbiting planet. Instead of describing how
the values of the variables change in the continuous behavior when time
increases by the infinitesimal amount dt, the step predicate (3.4) describes
how they change when the state number of the discrete behavior increases
by one.
You can check that (3.3) and (3.4) define a behavior that begins as
follows where, for example, [x :: 16, y :: 7]3 indicates that state number 3
assigns the values 16 to x and 7 to y, and the arrows are purely decorative.
" # " # " # " # " #
x :: 1 x :: 4 x :: 9 x :: 16 x :: 25
→ → → → → ···
y :: 1 y :: 3 y :: 5 y :: 7 y :: 9
0 1 2 3 4

These first few states of the behavior suggest that in the complete behavior,
x and y equal the following functions:
(3.5) x = (j ∈ N 7→ (j + 1)2 )
y = (j ∈ N 7→ 2 ∗ j + 1)
To prove that (3.3) and (3.4) imply (3.5), we must prove they imply:
(3.6) ∀ j ∈ N : (x (j ) = (j + 1)2 ) ∧ (y(j ) = 2 ∗ j + 1)
A proof by mathematical induction that (3.3) and (3.4) imply (3.6) is a nice
exercise in algebraic calculation.
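If you would rather test (3.6) than prove it, here is a minimal Python sketch (ours, not part of the book) that generates the behavior defined by (3.3) and (3.4) and checks (3.6) for the first thousand states:

x, y = 1, 1                      # the initial predicate (3.3)
for j in range(1000):
    assert x == (j + 1) ** 2     # (3.6): x(j) = (j + 1)^2
    assert y == 2 * j + 1        # (3.6): y(j) = 2*j + 1
    x, y = x + y + 2, y + 2      # the step predicate (3.4)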
We can think of (3.5) as the solution of (3.3) and (3.4), just as the for-
mulas describing the position and momentum of the planet at each time t
are solutions of the differential equations (3.1). It is mathematically im-
possible to find solutions to the differential equations describing arbitrary
multi-planet systems. It is mathematically possible to write explicit descrip-
tions of variables as functions of the state number like (3.5) for the abstract
programs written in practice, but those descriptions are almost always much
too complicated to be of any use. Instead, we reason about the initial pred-
icate and the step predicate, though in Section 3.4.1 we’ll see how to write
them in a more convenient way.
The interesting thing about program Sqrs is that the sequence of values
assumed by x in an execution of the program is the sequence of all posi-
tive integers that are perfect squares, and this is accomplished using only
addition. This is obvious from (3.5), but for nontrivial examples we won’t
have such an explicit description of each state of a behavior. Remember
that history independence implies that, at any point in a behavior, what
the program does in the future depends only on its current state. What is
it about the current state that ensures that if x is a perfect square in that
state, then it will equal all greater perfect squares in the future? There is
a large body of work on reasoning about traditional programs, initiated by
Robert Floyd in 1967 [14], that shows how to answer this question. If you’re
familiar with that work, the answer may seem obvious. If not, it may seem
like it was pulled out of a magician’s hat. Obvious or magic, the answer is
that the following formula is true for every state number j in the behavior
of Sqrs:
(3.7) ∧ (x (j ) ∈ N) ∧ (y(j ) ∈ N)
∧ y(j ) % 2 = 1
∧ x (j ) = ((y(j ) + 1)/2)2
This formula implies that x (j ) is a perfect square, since the first two con-
juncts imply that y(j ) is an odd natural number. Moreover, since y(j + 1) =
y(j ) + 2, the last conjunct implies that x (j + 1) is the next larger perfect
square after x (j ). So, the truth of (3.7) for every state number j explains
why the algorithm sets x to all perfect squares in increasing order.
A predicate like (3.7) that is true for every state number j of a behavior
is called an invariant of the behavior. By mathematical induction, we can
prove that a predicate is an invariant by proving these two conditions:
I1. The predicate is true for j = 0.

I2. For any k ∈ N, if the predicate is true for j = k then it’s true for
j = k + 1.
For (3.7), I1 follows from the initial predicate (3.3), and I2 follows from the
step predicate (3.4). (You should have no trouble writing the proof if you’re
used to writing proofs; otherwise, it might be challenging.)
A predicate that can be proved to be an invariant by proving I1 from an
initial predicate and I2 from a step predicate is called an inductive invariant.
Model checkers can check whether a state predicate is an invariant of small
instances of an abstract program. But the only way to prove it is an invariant
is to prove that it either is or is implied by an inductive invariant. For
any invariant P , there is an inductive invariant that implies P . However,
writing an inductive invariant for which we can prove I1 and I2 is a skill
that can be acquired only with practice. Tools to find it for you have been
developed [15, 38], but I don’t know how well they would work on industrial
examples.
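For a program as small as Sqrs, conditions I1 and I2 can be checked by brute force over a bounded region of states. The following Python sketch (illustrative only) checks I1 and then checks I2 for every state in that region satisfying (3.7)—including unreachable ones, which is what makes the invariant inductive:

def inv(x, y):                       # the invariant (3.7), for x, y in N
    return y % 2 == 1 and x == ((y + 1) // 2) ** 2

assert inv(1, 1)                     # I1: true in the initial state
for y in range(200):                 # I2: every step from a state
    for x in range(10000):           # satisfying inv leads to a state
        if inv(x, y):                # satisfying inv
            assert inv(x + y + 2, y + 2)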
The first conjunct of the invariant (3.7) asserts the two invariants
x (j ) ∈ N and y(j ) ∈ N. An invariant of the form v (j ) ∈ S for a variable v
is called a type invariant for v . An inductive invariant almost always must
imply a type invariant for each of its variables. For example, without the
hypotheses that x (j ) and y(j ) are numbers, we can deduce nothing about
the values of x (j + 1) and y(j + 1) from the step predicate (3.4).
variables x = 1, y = 1, pc = a;
while true do
a: x : = x + y + 2 ;
b: y : = y + 2
end while
Figure 3.2: The finer-grained abstract program FGSqrs.
Most mathematicians would not bother to write the first conjunct of
(3.7), simply assuming it to be obvious. However, mathematicians aren’t
good at getting things exactly right. They can easily omit some uninterest-
ing corner case—for example, the assumption that a set is nonempty. Those
“uninteresting corner cases” are the source of many errors in programs. To
avoid such errors, we need to state explicitly all necessary requirements,
including type invariants.

3.2.3 A Finer-Grained Example: FGSqrs
Now consider a modified version of our abstract program Sqrs in which
the execution of each assignment statement in the body of the while loop
is represented as a separate step of the behavior. This is specified in the
pseudocode by adding a label right before the second assignment statement.
The label is b and the program is called FGSqrs.
The natural way to describe the state of FGSqrs is with the variables x
and y and an additional variable to specify which assignment statement is
the next one to be executed.4 Such a variable isn’t needed in Sqrs because
that program has just a single label. The variable we add is traditionally
called pc (for program counter). We will let its value equal the label from
which the execution described by the next step begins. (That execution
ends when it reaches the following label.) We assume that a and b are two
arbitrary distinct values.
The pseudocode for FGSqrs is in Figure 3.2. The variables declaration
contains pc and its initial value, even though we know pc is needed because
there’s more than one label, and program execution is normally assumed to
start at the beginning of the code. But a little redundancy doesn’t hurt.
4. In program FGSqrs, an additional variable isn’t needed because which statement
should be executed next can be deduced from the values of the variables x and y, but
that’s not the case in most programs written in pseudocode.
Here is the mathematical description of the behavior of program FGSqrs.
As with Sqrs, it consists of an initial predicate and a step predicate.
Initial Predicate (x (0) = 1) ∧ (y(0) = 1) ∧ (pc(0) = a)
Step Predicate
∀ j ∈ N : if pc(j ) = a
then ∧ x (j + 1) = x (j ) + y(j ) + 2
∧ y(j + 1) = y(j )
∧ pc(j + 1) = b
else ∧ x (j + 1) = x (j )
∧ y(j + 1) = y(j ) + 2
∧ pc(j + 1) = a
When they see this step predicate, most programmers and many computer
scientists think that the conjuncts y(j + 1) = y(j ) and x (j + 1) = x (j ) are
unnecessary. They think that not saying what the new value of a variable
equals should mean that it equals its previous value. But if that were the
case, then what we wrote wouldn’t be math. We would be giving up the
benefits of centuries of mathematical development—the benefits that are
the reason science is based on math. An essential aspect of math is that
a formula means exactly what it says—nothing more and nothing less. If
the step predicate didn’t say what y(j + 1) equals when pc(j ) = a is true,
then there would be no more reason for it to equal y(j ) than for it to equal
i ∈ N 7→ √−42.
You may find it discouraging that the mathematical description of
FGSqrs is more complicated than its pseudocode in Figure 3.2. Please
be patient. You will see in Section 3.4.1 how a little notation can simplify
it. We can always write an abstract program more compactly in pseudocode
than in math, as long as we don’t have to explain precisely what the pseu-
docode means. But science is precise, and a science of abstract programs
must explain exactly what they mean. Moreover, tools can’t check an im-
precise description of a program. Math is the simplest way to explain things
precisely.
PlusCal is a precise language for describing abstract programs in what
looks like pseudocode. (However, it’s infinitely more expressive than ordi-
nary pseudocode because its expressions can be any mathematical expres-
sions—even uncomputable ones.) A PlusCal program is translated to a
mathematical description of the program in TLA+. I often find it easier to
write an abstract program in PlusCal than directly in TLA+. However, I
reason about the TLA+ translation, not the PlusCal code. And for many
abstract programs, including most distributed algorithms, it’s easier to write
the program directly in TLA+ than in PlusCal.
The code whose execution is described as a single step of the behavior is
called an atomic operation. Because a single step in a behavior describing
the execution of Sqrs is replaced by two steps in the behavior describing
the execution of FGSqrs, we say that FGSqrs has a finer grain of atomicity
than Sqrs. Having a finer grain of atomicity implies that the step predicate
is more complicated.
Having a finer grain of atomicity also implies that the inductive invariant
that explains why the abstract program works will be more complicated.
However, there is a trick for obtaining the invariant for FGSqrs from the
invariant (3.7) of Sqrs. Define yy(j ) to equal y(j ) if execution FGSqrs is at
label a, and to equal the value y(j ) will have after executing statement b if
execution is at b. The mathematical definition is:
yy(j ) ≜ if pc(j ) = a then y(j ) else y(j ) + 2
Observe that x and yy are changed at the same time by statement a of
FGSqrs exactly the same way that the loop body of Sqrs changes x and
y. Statement b of FGSqrs leaves x and yy unchanged. This implies that,
because (3.7) is an inductive invariant of Sqrs, the formula obtained from
(3.7) by substituting yy for y satisfies condition I2 for FGSqrs. It’s easy
to check that this formula also satisfies I1, so it is an inductive invariant of
FGSqrs. This trick of finding an expression (such as yy(j )) that is changed
by the fine grained program the way the coarse-grained program changes
a variable (such as y) can often be used to obtain an invariant for a finer-
grained abstract program from an invariant of a coarser-grained one. It is
also at the heart of program refinement, the subject of Chapter 5.
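Here is a Python sketch of the trick (again ours, purely illustrative): run FGSqrs, compute yy as defined above, and check in every state that (3.7) holds with yy substituted for y:

x, y, pc = 1, 1, 'a'
for _ in range(1000):
    yy = y if pc == 'a' else y + 2                      # the definition of yy
    assert yy % 2 == 1 and x == ((yy + 1) // 2) ** 2    # (3.7), with yy for y
    if pc == 'a':
        x, pc = x + y + 2, 'b'                          # statement a
    else:
        y, pc = y + 2, 'a'                              # statement b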
3.3 Nondeterminism
The laws of classical physics, such as the laws of planetary motion, are
deterministic. Given the initial values of all the variables, their values at any
later time are completely determined. Causes of nondeterminism are either
negligible because they have an insignificant effect—for example, meteor
showers—or are simply assumed not to happen—for example, cataclysmic
collisions with errant asteroids.
A program is nondeterministic if the initial state of a behavior doesn’t
determine the complete behavior. Even when programs are executed on sup-
posedly deterministic digital systems, nondeterminism is the norm—especially
in concurrent ones. Here are some sources of nondeterminism in programs:
User Input The user giving a value to the program is usually described as
an action of the program that nondeterministically chooses the value
provided by the user. The user can also be described as a separate
process that nondeterministically chooses the value to provide.
Random Algorithms Some algorithms can achieve better average perfor-
mance by making random choices. The science of programs presented
here is not meant for describing average properties of possible behav-
iors, so it can’t distinguish this case from one in which random choices
are the result of user input.
Generality We may want an abstract program to allow multiple possible
implementations. Those possibilities appear as nondeterminism in the
abstract program.
Faults Physical devices don’t always behave the way they’re supposed to.
In particular, they can fail in various ways. Programs that tolerate
failures describe a failure as an operation that may or may not be
executed.
Timing Uncertainty The time taken to perform operations in an individ-
ual process can vary from one execution to another for several reasons,
including (i) being run on different hardware and (ii) competition for
resources with other processes in the same program or in concurrently
executed programs. This results in multiple behaviors in which op-
erations in different processes are executed in different orders. Those
different orders can lead to very different behaviors.
Timing uncertainty is the most important source of errors due to nondeter-
minism that affects all concurrent programs (not just fault-tolerant ones).
Let’s examine a simple example of it.
The example is a trivial abstract multiprocess program called Increment.
It has a variable x that initially equals 0, and each process just increments
x by 1 and terminates. A process does this in two steps: the first step reads
the current value of x , and the second step sets x to one plus the value it
read. You should convince yourself that with N processes, an execution can
terminate with x having any value from 1 through N . The final value of x
will be N if each process executes its two steps with no intervening step by
any other process. The final value will be 1 if all processes read x before
any process sets the value of x .
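One way to convince yourself is exhaustive enumeration. The following Python sketch (hypothetical code, with processes numbered 0 through N − 1) tries every interleaving of the read and write steps and collects the possible final values of x :

from itertools import permutations

def final_values(N):
    results = set()
    steps = [(p, op) for p in range(N) for op in ('read', 'write')]
    for order in permutations(steps):
        # keep only orders in which each process reads x before it writes x
        if any(order.index((p, 'read')) > order.index((p, 'write'))
               for p in range(N)):
            continue
        x, t = 0, {}
        for p, op in order:
            if op == 'read':
                t[p] = x             # first step: read the current value of x
            else:
                x = t[p] + 1         # second step: set x to one plus the value read
        results.add(x)
    return results

assert final_values(3) == {1, 2, 3}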
This abstract program is described with pseudocode in Figure 3.3, where
Procs is the set of processes. (Procs is really a set of process identifiers,
variables x = 0 ;
process p ∈ Procs
variables t = 0, pc = a ;
a: t : = x ;
b: x : = t + 1
end process
Figure 3.3: The Increment abstract program for a set Procs of processes.
but for convenience we call its elements processes.) The only assumption
we make about this set is that it is finite and nonempty. The process
statement declares that there is a process for every element of Procs, and
it gives the code for an arbitrary process p in Procs. The variables t and
pc are local to process p, each process having its own copy of these two
variables. Variable x is global, accessed by all the processes. Process p
saves the result of reading x in its variable t. The initial value of t doesn’t
matter, but letting all variables have reasonable initial values makes a type
invariant simpler, so we let t initially equal 0.
The mathematical description of the abstract program Increment is in
Figure (3.4). The process-local variables t and pc are represented by mathe-
matical variables whose values in each state are functions with domain Procs,
where t(p) and pc(p) are the values of those variables for process p. The
initial predicate, describing the values of the variables in state number 0,
is simple. The possible steps in a behavior are described by a predicate
that, for each j , gives the values of x (j + 1), t(j + 1), and pc(j + 1) for
any assignment of values to x (j ), t(j ), and pc(j ). It asserts that there are
two possibilities, described by formulas PgmStep(j ) and Stutter (j ), that are
explained below.
PgmStep(j ) describes the possible result of some process executing one step
starting in state j . The predicate equals true iff there exists a process
p for which aStep(p, j ) or bStep(p, j ) is true, where:
aStep(p, j ) describes a step in which process p executes its statement
labeled a in state number j . Its last three conjuncts describe the
values of the three variables x , t, and pc in state j +1. Many people
are tempted to write t(j + 1)(p) = x (j ) and pc(j + 1)(p) = b
instead of the third and fourth conjuncts. But that would permit
t(j +1)(q) and pc(j +1)(q) to equal any values for q ≠ p. Instead
we must use the except operator defined in Section 2.3.1. The
Initial Predicate
∧ x (0) = 0
∧ t(0) = (p ∈ Procs 7→ 0)
∧ pc(0) = (p ∈ Procs 7→ a)

Step Predicate
∀ j ∈ N : PgmStep(j ) ∨ Stutter (j )
where

PgmStep(j ) ≜ ∃ p ∈ Procs : aStep(p, j ) ∨ bStep(p, j )

aStep(p, j ) ≜ ∧ pc(j )(p) = a
∧ x (j + 1) = x (j )
∧ t(j + 1) = (t(j ) except p 7→ x (j ))
∧ pc(j + 1) = (pc(j ) except p 7→ b)

bStep(p, j ) ≜ ∧ pc(j )(p) = b
∧ x (j + 1) = t(j )(p) + 1
∧ t(j + 1) = t(j )
∧ pc(j + 1) = (pc(j ) except p 7→ done)

Stutter (j ) ≜ ∧ ∀ p ∈ Procs : pc(j )(p) = done
∧ hx (j + 1), t(j + 1), pc(j + 1)i = hx (j ), t(j ), pc(j )i

Figure 3.4: The Increment abstract program in math.
first conjunct is a predicate that is true or false of state j . It is an
enabling condition, allowing the step described by the following
three conjuncts to occur iff that condition is true.
bStep(p, j ) describes a step in which process p executes its statement
labeled b in state number j . It is similar to aStep(p, j ). Its
enabling condition is pc(j )(p) = b. The step sets pc(j + 1)(p)
to done, which is a value indicating that the process has reached
the end of its code and terminated.
Stutter (j ) describes a stuttering step starting in state j . It is enabled iff
pc(j )(p) equals done for all p ∈ Procs, so all processes have termi-
nated. At that point, PgmStep(j ) is not enabled, so only an infinite
sequence of stuttering steps can occur, as required for a terminated
abstract program. Note that the second conjunct in the definition of
Stutter (j ) uses the fact that two tuples are equal iff their correspond-
ing elements are equal to write the following formula more compactly:
(x (j + 1) = x (j )) ∧ (t(j + 1) = t(j )) ∧ (pc(j + 1) = pc(j ))

A property we might like to prove about abstract program Increment is that,
when it has terminated, the value of x lies between 1 and the number of
processes. Let’s define N to equal #(Procs), the number of processes. Since
a process has terminated iff its local pc variable equals done, the property
we want to prove is that this formula is an invariant of Increment—that is,
true for every j ∈ N:
(3.8) (∀ p ∈ Procs : pc(j )(p) = done) ⇒ (x (j ) ∈ 1 . . N )
This is not an inductive invariant because condition I2 is not satisfied. For
example, suppose the following is true:
• pc(j )(p) = b and pc(j )(q) = done for all q ≠ p

• t(j )(p) = N

Then (3.8) is true in state number j , but false in state number j + 1.
To show that (3.8) is an invariant of Increment, we must find an inductive
invariant that implies it. Stopping now and trying to find that inductive
invariant by yourself is a good exercise. But it’s not easy if you don’t have
practice finding inductive invariants and don’t have a tool to check if what
you think is an inductive invariant actually is one. So, I will write one for
you.
An inductive invariant almost always requires a type invariant for each
variable. We start by defining TypeOK to assert a type invariant for each
of the three variables:
TypeOK (j ) ≜ ∧ x (j ) ∈ 0 . . N
∧ t(j ) ∈ (Procs → 0 . . N )
∧ pc(j ) ∈ (Procs → {a, b, done})
TypeOK is an invariant, but not an inductive invariant. For example, if
x (j ) = 1, t(j )(p) = N , and pc(j )(p) = b, then TypeOK (j ) is true but a step
satisfying bStep(p, j ) makes TypeOK (j + 1) false. We can make TypeOK an
inductive invariant by weakening it, replacing the two occurrences of 0 . . N
with N. However, I prefer a stronger, more informative type invariant.
To write the rest of the inductive invariant, we define NumberDone(j )
to be the number of processes that have terminated in state j . The precise
definition is:
NumberDone(j ) ≜ #({p ∈ Procs : pc(j )(p) = done})
The complete inductive invariant, which we call Inv , is defined by:
(3.9) Inv (j ) ≜ ∧ TypeOK (j )
∧ ∀ p ∈ Procs :
(pc(j )(p) = b) ⇒ (t(j )(p) ≤ NumberDone(j ))
∧ x (j ) ≤ NumberDone(j )
To prove that Inv is an inductive invariant of program Increment, we must
prove I1 and I2. I1 asserts that the initial predicate implies Inv (0), and I2
asserts that the step predicate implies Inv (j ) ⇒ Inv (j + 1). We will not
consider how these conditions are proved until we have a more convenient
way of writing them.
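Even without that more convenient notation, I1 and I2 can be checked by brute force for a small instance. This Python sketch (illustrative, for N = 3) enumerates every type-correct state and checks that Inv holds initially and is preserved by every aStep and bStep; TypeOK is enforced by the enumeration itself:

from itertools import product

N = 3
def number_done(pc):
    return sum(1 for q in pc if pc[q] == 'done')

def inv(x, t, pc):                   # (3.9), minus TypeOK, which the
    return (x <= number_done(pc)     # enumeration below enforces
            and all(t[p] <= number_done(pc) for p in pc if pc[p] == 'b'))

def steps(x, t, pc):                 # all states reachable in one program step
    for p in pc:
        if pc[p] == 'a':
            yield x, {**t, p: x}, {**pc, p: 'b'}         # aStep(p)
        elif pc[p] == 'b':
            yield t[p] + 1, t, {**pc, p: 'done'}         # bStep(p)

assert inv(0, {p: 0 for p in range(N)}, {p: 'a' for p in range(N)})   # I1
for x in range(N + 1):                                                # I2
    for tv in product(range(N + 1), repeat=N):
        for pcv in product(('a', 'b', 'done'), repeat=N):
            t, pc = dict(enumerate(tv)), dict(enumerate(pcv))
            if inv(x, t, pc):
                for x2, t2, pc2 in steps(x, t, pc):
                    assert inv(x2, t2, pc2)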
3.4 Temporal Logic
3.4.1 The Logic of Actions
3.4.1.1 Eliminating State Numbers
There’s an easy way to simplify initial predicates, step predicates, and in-
variants: remove the explicit state numbers. It’s obvious that an initial
predicate is about state number 0, so we can eliminate every “(0)” in it. An
invariant is true for all states, so we don’t have to say which states it’s about.
For a step predicate, we just have to distinguish between v (j ) and v (j + 1)
for a variable v . A notation for doing this that dates back at least to the
early 1980s is to replace v (j ) by v and v (j + 1) by v 0 . The initial and step
predicates of program Increment have been rewritten this way in Figure 3.5,
where they’ve been given the names Init and Next. The inductive invariant
(3.9) is also rewritten without the “(j )” and named Inv . (The “(j )” has
been implicitly removed from the definition of NumberDone.) Make sure
that you understand Figure 3.5 by comparing it with Figure 3.4.
Mathematically, the big leap from Figure 3.4 to Figure 3.5 is removing
the explicit mention of state numbers—for example, writing x instead of
x (j ). In Figure 3.4, Procs and x are both ordinary mathematical variables.
The value of Procs is a set of processes and the value of x is a function whose
domain is N. In Figure 3.5, the value of Procs is a set of processes—the
same set throughout a behavior of the program. However, the value of x
depends on the state of the behavior.
The price of removing explicit state numbers from our formulas is leaving
the domain of ordinary math, with a single kind of variable, and entering a
new kind of math in which there are two kinds of variables: mathematical
Initial Predicate

Init ≜ ∧ x = 0
∧ t = (p ∈ Procs 7→ 0)
∧ pc = (p ∈ Procs 7→ a)

Step Predicate

Next ≜ PgmStep ∨ Stutter
where

PgmStep ≜ ∃ p ∈ Procs : aStep(p) ∨ bStep(p)

aStep(p) ≜ ∧ pc(p) = a
∧ x 0 = x
∧ t 0 = (t except p 7→ x )
∧ pc 0 = (pc except p 7→ b)

bStep(p) ≜ ∧ pc(p) = b
∧ x 0 = t(p) + 1
∧ t0 = t
∧ pc 0 = (pc except p 7→ done)

Stutter ≜ ∧ ∀ p ∈ Procs : pc(p) = done
∧ hx 0 , t 0 , pc 0 i = hx , t, pc i

Inductive Invariant

Inv ≜ ∧ TypeOK
∧ ∀ p ∈ Procs : (pc(p) = b) ⇒ (t(p) ≤ NumberDone)
∧ x ≤ NumberDone

Figure 3.5: Abstract program Increment and its invariant Inv in simpler
math.
variables like Procs, whose values are the same in every state of a behavior,
and program variables like x that are implicit functions of the state. Pro-
gram variables like x look weird to mathematicians. In math, the value of a
variable x is fixed. We’ve seen in Chapter 2 that when a mathematician does
something else and introduces a variable x , it’s really a completely different
variable that happens also to be written “x ”. Of course, you’re familiar with
program variables because they’re the variables of coding languages, whose
values change in the course of a computation.
Since this book is about a science of programs, we will henceforth use the
name variable for program variables. Mathematical variables like Procs will
be called constants. When describing a program mathematically, variables
correspond to what we normally think of as program variables. Constants
are parameters of the program, such as a fixed set of processes. Early coding
languages had constants as well as variables. In modern coding languages,
constants are buried in the code, where they are called static final variables
of an object.
In this book, the variables in pseudocode are explicitly declared, and un-
declared identifiers like Procs are constants. For formulas, the text indicates
which identifiers are variables and which are constants.
In addition to having both variables and constants, the formulas in Fig-
ure 3.5 have primed variables, like x 0 . An expression that may contain
primed and unprimed variables, constants, and the operators and values of
ordinary math (which means everything described in Chapter 2) is called a
step expression. A Boolean-valued step expression is called an action. The
math whose formulas are actions is called the Logic of Actions, or LA for
short.
3.4.1.2 The Semantics of the Logic of Actions
As we did in defining the semantics of elementary algebra in Section 2.1.5,
we define the meaning [[exp]] of an expression of LA to be a mapping on
interpretations. An interpretation assigns values to variables. Since LA has
both constants and variables, there are two parts to an interpretation: an
assignment of values to constants and an assignment of values to variables.
Since constants are ordinary mathematical values, and we have already
discussed the semantics of ordinary math, we will ignore the part of an inter-
pretation for LA that assigns values to them. When discussing a formula of
LA, we assume that there is some fixed interpretation Υ that assigns values
to the constants. Constants are usually assumed to satisfy some conditions.
For example, the constants M and N of Euclid’s algorithm in Section 1.5
are assumed to be positive integers, and the constant Procs of program
Increment in Section 3.3 is assumed to be a finite set. We assume that
the fixed interpretation Υ satisfies those assumptions. We define |= F for a
formula F of LA to mean that [[F ]] is true for all interpretations in which
the assignment of values to the constants satisfies the assumptions.
In LA, there are effectively two kinds of variables: unprimed and primed.
An interpretation of LA assigns values to each of those kinds of variables,
where the values assigned to v and v 0 are independent of one another.
We have defined a state to be an assignment of values to program vari-
ables. So, since we’re neglecting constants, an interpretation for an LA
formula is a pair of states—the first assigning values to the unprimed vari-
ables and the second assigning values to the primed variables. We have used
the term step to mean a pair of successive states in a behavior. We now let
it mean any pair of states. We will write the step consisting of the states s
and t as s → t rather than hs, t i because that makes it clear that s and t
are states.
To define the semantics of LA, we therefore have to define [[exp]](s → t)
for any states s and t. We have not yet defined any operators for LA,
so the only operators that can appear in an LA expression are ordinary
mathematical operators like + and Tail . They have the usual semantics in
LA. For example

[[exp1 + exp2]](s → t) = [[exp1]](s → t) + [[exp2]](s → t)

For an unprimed variable v , we define [[v ]](s → t) to equal s(v ), the value
assigned to variable v by state s. For a primed variable v 0 , we define
[[v 0 ]](s → t) to equal t(v ).
We call an LA expression a step expression and an LA formula an action.
For an action A and step s → t, we say that s → t satisfies A or is an A
step iff [[A]](s → t) equals true.
A state expression is an LA expression that contains no primed vari-
ables, and a state formula is a Boolean-valued state expression. For a state
expression exp, the value of [[exp]](s → t) depends only on s, so we can write
it as [[exp]](s).
Because the meaning of an LA expression assigns different values to v
and v 0 , we can treat v and v 0 as two unrelated variables. This means that
we can reason about LA formulas as if constants, unprimed variables, and
primed variables were all different mathematical variables. Thus, for LA as
defined so far, we can regard LA as ordinary math with some mathematical
variables having names like v 0 ending with 0 .
3.4.1.3 The Prime Operator (0 )
In Figure 3.5, only variables are primed. In the Logic of Actions, we can
prime not just a variable but any state expression—that is, any expression
containing no primes. For a state expression exp, the value of the step
expression exp 0 on a step s → t is the value of exp on t. More precisely, the
meaning of exp 0 is defined by [[exp 0 ]](s → t) = [[exp]](t). This means that
exp 0 is equivalent to the step expression obtained by priming all the variables
in exp. The priming operator (0 ) can be applied only to state expressions.
In LA, priming an expression that contains a prime is a syntax error. That
means that it is illegal to prime an expression containing a defined symbol
whose definition contains a prime. For example, if e is defined to equal
x 0 + 1, then e 0 is syntactically illegal.
A constant has the same value in both states of a step. Therefore,
|= c 0 = c is true for any constant c. More generally, a constant expression is
an expression with no (primed or unprimed) variable; and |= exp 0 = exp is
true for any constant expression exp. The bound identifiers of predicate logic
are like ordinary mathematical variables, which means they are treated like
constants in the Logic of Actions. For example, (∃ i ∈ N : y = x + i )0 equals
∃ i ∈ N : y 0 = x 0 + i . We therefore call bound identifiers bound constants.
Appendix Section A.3 gives an example of how you can get into trouble by
forgetting that bound identifiers are constants.
The semantics of LA imply that the prime operator distributes over
the operators and constructs of ordinary math—for example, that (S ∪ T )0
equals S 0 ∪ T 0 . By expanding all definitions and distributing primes in this
way, we obtain a formula in which the prime operator is applied only to vari-
ables. We don’t have to expand all definitions to obtain such a formula. We
need only expand definitions that contain a prime or that appear within a
primed expression and contain a variable. Once we have reached an expres-
sion in which only variables are primed, we can reason about the resulting
expression as if constants, variables, and primed variables were all ordinary
mathematical variables. We therefore need no additional rules for reasoning
about LA formulas.
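
Distributing primes is a purely mechanical syntactic transformation. The
following Python sketch (an illustration of mine, with expressions encoded as
nested tuples) pushes a prime inward until only variables are primed.

  # An expression is a variable name (a string), a constant (a
  # number), or a tuple (op, arg1, ..., argN).
  def prime(expr):
      """Distribute priming inward until only variables are primed."""
      if isinstance(expr, str):            # a variable v becomes v'
          return expr + "'"
      if isinstance(expr, (int, float)):   # constants are unchanged
          return expr
      op, *args = expr
      return (op, *(prime(a) for a in args))

  # ((x + y) + 2)' equals (x' + y') + 2:
  print(prime(("+", ("+", "x", "y"), 2)))
  # ('+', ('+', "x'", "y'"), 2)
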
Section 3.2.2 defined an inductive invariant Inv of a program to be a
state predicate satisfying conditions I1 and I2, which we can restate as:

I1. Inv is implied by the program’s initial state.

I2. If Inv is true in a state, then the program’s next-state predicate implies
that it is true in the next state.

For program Increment, whose initial predicate is Init and whose next-state
action is Next, these two conditions can be expressed in LA as:
(3.10) |= Init ⇒ Inv
|= Inv ∧ Next ⇒ Inv 0
The proof of these conditions for program Increment is discussed in Ap-
pendix Section B.1.
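
Over a finite set of states, conditions (3.10) can be checked by brute force.
Here is a Python sketch of such a check; it uses a toy program (a counter
that runs from 0 to 3), not program Increment itself, and all of the names
are my own.

  states = [{"x": n} for n in range(10)]          # a small state space

  init = lambda s: s["x"] == 0
  next_action = lambda s, t: s["x"] < 3 and t["x"] == s["x"] + 1
  inv = lambda s: 0 <= s["x"] <= 3

  # I1: Init implies Inv.
  assert all(inv(s) for s in states if init(s))
  # I2: Inv and Next imply Inv', that is, Inv in the next state.
  assert all(inv(t) for s in states for t in states
             if inv(s) and next_action(s, t))
  print("Inv is an inductive invariant of the toy program")
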
Thus far, the correctness properties of programs that have concerned us
have been invariance properties. All the reasoning we have done to verify
that a program satisfies an invariance property is naturally expressed in
LA. The safety property usually proved of a traditional program is that it
cannot produce a wrong answer—which is expressed as the invariance of
the property asserting that the program has not terminated with a wrong
answer. The most popular way of proving such a property is Hoare logic [21].
Appendix Section A.4 explains Hoare logic and its relation to the Logic of
Actions.

3.4.1.4 Action Composition


The Logic of Actions contains another operator that is almost never used
in describing abstract programs and will not play a major role for us until
Section 7.1. However, it does make brief appearances in Sections 4.3.2 and
5.4.4.3, so it is explained here.
For actions A and B , the action A · B is defined to be true of a step
s → t iff there is a state u such that s → u is an A step and u → t is
a B step. If actions A and B describe two pseudocode statements S a and
S b , then A · B describes the statement S a ; S b executed by executing S a
followed by S b . For example, the statements labeled a and b in process p of
program Increment shown in Figure 3.3 are described by actions aStep(p)
and bStep(p) of Figure 3.5, and:5
aStep(p) · bStep(p) ≡ ∧ pc(p) = a
∧ x 0 = t(p) + 1
∧ t 0 = (t except p 7→ x )
∧ pc 0 = (pc except p 7→ done)

5 If you believe that the second and third conjuncts in this formula are in the wrong
order, then you’re thinking in terms of coding languages, not math. Remember that F ∧ G
is equivalent to G ∧ F .

Replacing the actions aStep(p) and bStep(p) with aStep(p) · bStep(p) in the
definition of the next-state action Next of program Increment produces a
program with a coarser grain of atomicity. Choosing the grain of atomicity
of an abstract program involves a tradeoff between making the program
detailed enough to be useful and simple enough to be usable. Section 7.1
addresses this tradeoff using action composition.
The operator “·” is associative, meaning (A · B ) · C = A · (B · C ) for any
actions A, B , and C . We can therefore omit parentheses and simply write
A·B ·C .
For any action A, we define the action A+ to be satisfied by a step s → t
iff state t can be reached from state s by a sequence of one or more A steps.
In other words:

A+ = A ∨ (A · A) ∨ (A · A · A) ∨ (A · A · A · A) ∨ . . .
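
Viewed over an explicitly enumerated state space, “·” is just composition of
step relations, and A+ is reachability by one or more A steps. A Python
sketch of both (illustrative names, tiny state space):

  def compose(A, B, states):
      """s -> t is an A . B step iff some u gives an A step s -> u
         and a B step u -> t."""
      return lambda s, t: any(A(s, u) and B(u, t) for u in states)

  def plus(A, states):
      """s -> t is an A+ step iff t is reachable from s by one or
         more A steps."""
      def holds(s, t):
          frontier, seen = [s], {s}
          while frontier:
              u = frontier.pop()
              for v in states:
                  if A(u, v):
                      if v == t:
                          return True
                      if v not in seen:
                          seen.add(v)
                          frontier.append(v)
          return False
      return holds

  states = [0, 1, 2, 3]
  A = lambda s, t: t == s + 1           # increment by one
  print(compose(A, A, states)(0, 2))    # True: 0 -> 1 -> 2
  print(plus(A, states)(0, 3))          # True: three A steps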

3.4.2 The Temporal Logic RTLA


In 1977, Amir Pnueli [46] had the idea of using an obscure branch of math-
ematics called temporal logic to express time-dependent properties without
explicitly mentioning times or state numbers. He used a temporal logic
containing the single temporal operator 2 and operators defined in terms
of 2. You can read 2 as always, but when you become more familiar with
it you’ll probably just call it box. Intuitively, the formula 2F asserts that
the formula F is true at all times. For example, if P is a state predicate,
then 2P asserts that P is true in all states of a behavior.
From now on, we will be discussing and using temporal logic. We will
continue to ignore assignments of values to constants, assuming some fixed
interpretation satisfying the assumptions made about those constants.
In the kind of temporal logic Pnueli used, called linear-time temporal
logic, the meaning of a formula is a predicate on behaviors, where a behavior
is an infinite cardinal sequence of states. In other words, a behavior σ is
a function from N to states. We think of σ(n) as the state at time n, so
the first state of σ is σ(0). But remember, the only resemblance of the
state number n to a time is that state σ(n) does not occur later than state
σ(n + 1). (In Section 4.4, we’ll see that they could both occur at the same
time.) We’ll sometimes write σ as σ(0) → σ(1) → · · · .
RTLA is the temporal logic containing the same temporal operators as
Pnueli’s original logic, all defined in terms of 2, but having the formulas of
LA as the basic formulas. The formulas of RTLA can all be written as LA
formulas and formulas obtained from them using the operator 2 and the
usual operators of propositional and predicate logic. The only expressions
of RTLA are formulas. Prime (0 ) can appear only in the basic LA formulas.
It’s illegal to prime a formula containing a 2.

The TLA in RTLA stands for Temporal Logic of Actions. The R stands
for Raw, in the sense of unrefined. We’ll see later that RTLA allows us
to write formulas that we shouldn’t write. TLA is the logic obtained by
restricting RTLA to make it impossible to write those formulas. But that’s
a complication we don’t need to worry about now, so we’ll start with the
simpler “raw” logic.
In temporal logic formulas, the operator 2 binds more tightly than the
operators of propositional logic. For example, 2F ∨G is parsed as (2F )∨G.

3.4.2.1 Simple RTLA


In RTLA, the operator 2 can be applied to any RTLA formula, so we can
write formulas like 2(A ⇒ 2B ) where A and B are actions. We will begin
by considering simple RTLA, in which the operator 2 is applied only to
actions, not to formulas containing 2.
A formula of a temporal logic is called a temporal formula. For any
assignment of values to constants, the meaning [[F ]] of a temporal formula
is a behavior predicate—that is, a mapping that assigns Boolean values to
behaviors. An action A is a formula of RTLA. As a formula of LA, we’ve
viewed A as a step predicate; as a formula of RTLA, we view it as a behavior
predicate that is true of a behavior iff, viewed as a step predicate, it is true
of the behavior’s first step.
To state that precisely, let [[A]]LA be the meaning of A as an LA formula.
We define its meaning [[A]]RTLA as an RTLA formula as follows. For any
behavior σ, which equals σ(0) → σ(1) → σ(2) → · · ·, we define

(3.11) [[A]]RTLA (σ) = [[A]]LA (σ(0) → σ(1))

From now on, [[F ]] means [[F ]]RTLA for all RTLA formulas, including actions.
We will explicitly write [[A]]LA to denote the meaning of A as an LA formula.
For an action A, we define 2A to be the temporal formula that is true
of a behavior iff A is true of all steps of the behavior. In other words, we
define the meaning [[2A]] of the RTLA formula 2A by

(3.12) [[2A]](σ) = ∀ n ∈ N : [[A]]LA (σ(n) → σ(n + 1))

Like most logics, RTLA contains the operators of propositional and predicate
logic, where they have their standard meanings. For example, [[F ∧ G]](σ)
equals [[F ]](σ)∧[[G]](σ), and [[∃ i : F ]](σ) equals ∃ i : [[F ]](σ). As in LA, bound
identifiers are called bound constants and they act like constants, having the
same value in all states of a behavior. Bounded quantification is defined
as in ordinary math, with ∀ i ∈ S : F equal to ∀ i : (i ∈ S ) ⇒ F , and S not
in the scope of the bound constant i . In temporal logic, we almost always
write such a formula only when S is a constant expression.
It’s important to remember that a behavior is any cardinal sequence of
states. It doesn’t have to be a behavior of any particular program. Since
any step is the first step of lots of behaviors, it’s obvious that if A is an LA
formula, then |= A is true when A is viewed as an RTLA formula iff it’s true
when A is viewed as an LA formula.
Now let’s return to the description of program Increment in Figure 3.5.
It tells us that a behavior σ is a behavior of the program iff (i) the initial
predicate Init is true of its first state σ(0) and (ii) the step predicate Next
is true for every step σ(n) → σ(n + 1) of σ. Condition (i) is expressed
by [[Init]], since (3.11) tells us that [[Init]](σ) equals [[Init]]LA (σ(0) → σ(1));
and since Init is a state predicate, it’s true of a step iff it’s true of the first
state of the step. By (3.12), condition (ii) is expressed as [[2Next]]. Thus
(the meaning of) the formula Init ∧ 2Next is true of a behavior σ iff σ is a
behavior of program Increment.
Of course, this is true for an arbitrary program. The behaviors that
satisfy a program with initial predicate Init and next-state action Next
are described by the simple RTLA formula Init ∧ 2Next. Any program is
described by an RTLA formula of this form. As promised, we can write any
program as a mathematical formula. It’s an RTLA formula rather than a
TLA formula, and we’ll see that it needs to be modified. But for now, it’s
close enough to the final TLA formula.
By (3.12), the state predicate Inv is true in all states of a behavior iff
2Inv is true of that behavior. That Inv is an invariant of Increment means
that, for any behavior σ, if σ is a behavior of Increment then Inv is true in
all states of σ. Thus, that Inv is an invariant of Increment is expressed by
this condition:

(3.13) |= Init ∧ 2Next ⇒ 2Inv

Remember that in (3.10) and (3.13), when Init, Next, and Inv are the formu-
las defined in Figure 3.5, |= F means that F is true for all interpretations
satisfying the assumptions we made about the constants of Increment—
namely, that Procs is a finite set and the implicit assumption that the values
of a, b, and done are different from one another.
In general, the conditions I1 and I2 for showing that a state predicate Inv
is an invariant of a program Init ∧ 2Next are expressed in LA by conditions
(3.10). It is an RTLA proof rule that these conditions imply (3.13). When we
prove a safety property like (3.13), the major part of the reasoning depends
on the definitions of the formulas Init, Next, and Inv . That reasoning is
reasoning about actions, which is formalized by LA. The temporal logic
reasoning, which is done in RTLA, is trivial. Describing the program with a
single formula is elegant. But it is really useful only when verifying liveness
properties, which requires nontrivial temporal reasoning.
Because it’s often forgotten, it is worth repeating that a state is any
assignment of values to variables, and a behavior is any infinite sequence of
states. Even when we are discussing program Increment, “state” means any
state, including states in which x has the value h√2, 1 . . 147i. In (3.13), |=
means true for any behavior, even behaviors in which the initial value of x
is h√2, 1 . . 147i. It is true for those behaviors because Init equals false for
them (unless h√2, 1 . . 147i happens to equal 0, which it might).

3.4.2.2 The Complete RTLA


Simple RTLA suffices for reasoning about safety properties, but not for
liveness properties. For example, we want to express the liveness property
of program Increment that a process p eventually terminates. Termination
of p means that pc(p) eventually equals done and remains equal to done
forever. In terms of explicit state numbers, where pc(n) is the value of pc
in state number n, this property can be written:
(3.14) ∃ j ∈ N : ∀ k ∈ N : pc(j + k )(p) = done
To write it without explicit state numbers, we need full RTLA, in which
2 can be applied to any RTLA formula, not just to an action. To define
the meaning of all RTLA formulas, we first define σ +n, for any behavior σ
and natural number n, to be the behavior obtained by removing the first n
states from the sequence σ. That is, σ +n is the behavior
σ(n) → σ(n + 1) → σ(n + 2) → · · ·
so σ +n equals i ∈ N 7→ σ(i + n) .
For any RTLA formula F , the RTLA formula 2F is true of a behavior
σ iff it is true of the behaviors σ +n for all n ∈ N. In other words:

(3.15) [[2F ]](σ) = ∀ n ∈ N : [[F ]](σ +n )
for any behavior σ. When F is an action A, this is the same definition as
(3.12) because the first step σ +n (0) → σ +n (1) of σ +n is σ(n) → σ(n + 1) .
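
Behaviors are infinite, but an eventually periodic behavior (a finite prefix
followed by a cycle repeated forever) can be represented finitely, and
definition (3.15) can then be evaluated exactly, because such a “lasso” has
only finitely many distinct suffixes. A Python sketch, with class and
function names of my own choosing:

  class Lasso:
      """A behavior consisting of finitely many prefix states
         followed by a cycle repeated forever."""
      def __init__(self, prefix, cycle):
          self.prefix, self.cycle = prefix, cycle
      def state(self, n):
          if n < len(self.prefix):
              return self.prefix[n]
          return self.cycle[(n - len(self.prefix)) % len(self.cycle)]
      def suffix(self, n):
          """The behavior sigma+n."""
          p, c = self.prefix, self.cycle
          if n < len(p):
              return Lasso(p[n:], c)
          k = (n - len(p)) % len(c)
          return Lasso([], c[k:] + c[:k])

  # Every suffix of a lasso equals one of its first
  # len(prefix) + len(cycle) suffixes, so quantifying over all
  # n in N reduces to a finite check.
  def always(F):
      return lambda b: all(F(b.suffix(n))
                           for n in range(len(b.prefix) + len(b.cycle)))
  def eventually(F):
      return lambda b: any(F(b.suffix(n))
                           for n in range(len(b.prefix) + len(b.cycle)))

  # pc eventually equals "done" and stays there.
  beh = Lasso([{"pc": "a"}, {"pc": "b"}], [{"pc": "done"}])
  pc_done = lambda b: b.state(0)["pc"] == "done"
  print(eventually(always(pc_done))(beh))    # True
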
Although 2 has a simple definition, temporal formulas can be hard to
understand at first. It helps to think of a temporal formula as an assertion
about the present and future. The state σ(n) of a behavior σ is the state at
time n, and the behavior σ +n is the part of the behavior σ that begins at
time n. We can then think of [[F ]](σ +n ) as asserting that F is true at time
n of behavior σ. Thus, [[F ]](σ) asserts that F is true at time 0 of σ, and
[[2F ]](σ) asserts that F is true at all times of σ. The formula 2F therefore
asserts that F is true at all times—that is, F is always true. (Remember
that time n is just some instant of time; it is not n time units after time 0.)
We now drop the [[ ]] and think about temporal logic the same way we
think about ordinary math, conflating a formula with its meaning. So, we’ll
think of a temporal formula F as a Boolean-valued function on behaviors.
However, we will still turn to the formal meaning (3.15) of 2 when it is
useful. Sections 3.4.2.3–3.4.2.8 below examine 2 and temporal operators
defined in terms of 2. They present quite a few tautologies. Understanding
intuitively why those tautologies are true will make you comfortable reading
temporal logic formulas and thinking in terms of them.

3.4.2.3 The 2 Operator


We first consider some temporal logic tautologies—theorems about arbitrary
temporal formulas. If they are not obvious, rewrite them in terms of the
meanings of the formulas. The first tautology asserts the obvious fact that
if F is always true, then it is true now:
(3.16) |= 2F ⇒ F
The next tautology asserts that F ∧ G true at all times is equivalent to F
true at all times and G true at all times:
(3.17) |= 2(F ∧ G) ≡ (2F ) ∧ (2G)
You should check that (3.17) follows from the definition (3.15) of 2 and the
predicate-logic tautology:
|= (∀ n ∈ S : P ∧ Q) ≡ (∀ n ∈ S : P ) ∧ (∀ n ∈ S : Q)
Observe that 2(F ∨ G) and (2F ) ∨ (2G) are not equivalent. For example,
2(F ∨ G) is true of a behavior in which F is true only in the initial state and
G is false in the initial state and true in all other states. However, neither
2F nor 2G is true of that behavior.
We can generalize (3.17) to any conjunction, including an infinite con-
junction—that is, quantification over an infinite set of formulas. If F i is a
temporal formula for all i ∈ S , then:
(3.18) |= 2(∀ i ∈ S : F i ) ≡ (∀ i ∈ S : 2F i )

The next tautology should also be obvious. It asserts that if F implies G is
true at all times, then F is true at all times implies G is true at all times:

(3.19) |= 2(F ⇒ G) ⇒ (2F ⇒ 2G)

Perhaps less obvious is this proof rule, which is sort of a converse of (3.16):

(3.20) |= F implies |= 2F

The assertion |= F means that F is true of all behaviors. The assertion
|= 2F asserts that for any behavior, F is true of the part of that behavior
starting at any time. But that part of the behavior is itself a behavior, so
|= F implies that F is true of it. If this is not obvious to you, then you
may be thinking of a behavior as a behavior of some program. A behavior
is any infinite sequence of states, so if you remove the first n states of any
behavior, you get a behavior.
From (3.19) and (3.20) we easily derive:

(3.21) |= F ⇒ G implies |= 2F ⇒ 2G

This rule lies at the heart of much temporal logic reasoning.


We now examine some additional temporal assertions that can be defined
using 2.

3.4.2.4 Eventually (3)


The operator 3 is defined to be the dual of 2 :

(3.22) 3F = ¬2¬F

Like 2, the operator 3 binds more tightly than the operators of propositional
logic, so 3F ∧ G is parsed as (3F ) ∧ G. To understand 3, we derive the
meaning [[3F ]] of a formula 3F from (3.15):

[[3F ]](σ) ≡ [[¬2¬F ]](σ) by definition of 3


≡ ¬ [[2¬F ]](σ) by the meaning of ¬
≡ ¬∀n ∈ N : [[¬F ]](σ +n ) by (3.15)
≡ ¬∀n ∈ N : ¬ [[F ]](σ +n ) by the meaning of ¬
≡ ∃n ∈ N : [[F ]](σ +n ) by the duality relation (2.21)

Hence, 3F asserts that F is true at some time—either now or in the future.


We read 3 as eventually, where by being eventually true we include the
possibility of being true only now. Corresponding to the tautologies (3.16)
and (3.17) for 2 are these tautologies for 3:

(3.23) |= F ⇒ 3F |= 3(F ∨ G) ≡ (3F ∨ 3G)

Make sure you understand why they are true from the meaning of 3 as
eventually. These two tautologies can be derived from (3.16) and (3.17).
For example:

(3.24) (F ⇒ 3F ) ≡ (¬(¬F ) ⇒ ¬(2¬F )) By logic and the definition of 3.


≡ (2¬F ⇒ ¬F ) By propositional logic.

and |= 2¬F ⇒ ¬F follows from (3.16). You should convince yourself that
3(F ∧ G) and (3F ) ∧ (3G) need not be equivalent. The equivalence of
3(F ∨ G) and 3F ∨ 3G generalizes to arbitrary disjunctions:

|= 3(∃ i ∈ S : F i ) ≡ (∃ i ∈ S : 3F i )

Here are three tautologies relating 3 and 2. The first is obtained by negating
3F and its definition; the second by substituting ¬F for F in that definition;
and the third by substituting ¬F for F in the first:

(3.25) |= ¬3F ≡ 2¬F |= ¬2F ≡ 3¬F |= 2F ≡ ¬3¬F

They should be obvious from thinking of 2 as always and 3 as eventually.
The first two tell us that moving ¬ over a temporal operator 2 or 3 changes
2 to 3 and 3 to 2. Note the similarity between 2/3 and ∀ /∃ , a similarity
that arises from the meanings of 2 and 3. Here is another tautology that
follows from the intuitive meanings of 2 and 3 :

(3.26) |= (2F ∧ 3G) ⇒ 3(2F ∧ G)

It asserts that if F is true from now on and G is true at some time in the
future, then at some time in the future F is true from then on and G is true
then.
We can express liveness properties with 3. For example, the assertion
that some state predicate P is eventually true is a liveness property. The
assertion that the program whose formula is F satisfies this property is
|= F ⇒ 3P . Since the assertion that something eventually happens is a
liveness property, most of the formulas we write that contain 3 express
liveness.
The equivalence of ¬3P and 2¬P for any formula P implies the tau-
tology |= 3P ∨ 2¬P . This tautology is central to many liveness proofs. To
prove that a program described by formula F satisfies the liveness property
3P , we must prove |= F ⇒ 3P . The tautology |= 3P ∨ 2¬P and propo-
sitional logic imply that we can prove this by proving |= F ∧ 2¬P ⇒ 3P .
This is a proof by contradiction, allowing us to use the additional hypothe-
sis 2¬P , which should be false. For example, if P is a state predicate, this
gives us an invariant ¬P that can be used to prove other invariants.

3.4.2.5 Eventually Always (32)


Recall that termination of process p of program Increment means that pc(p)
eventually equals done and remains forever equal to done, a property ex-
pressed with explicit state numbers by (3.14). It can be stated as: It’s even-
tually true that pc(p) = done is always true. This is expressed in RTLA as
32(pc(p) = done). You should check that [[32(pc(p) = done)]](σ) is equiv-
alent to (3.14) because (σ +j )+k equals σ +(j +k ) , so pc(j + k )(p) in (3.14)
equals [[pc(p)]](σ +(j +k ) ).
We can think of 32 as a temporal operator meaning eventually always.
Convince yourself that this is a tautology:

|= 32(F ∧ G) ≡ (32F ) ∧ (32G)

3.4.2.6 Infinitely Often (23)


It should now seem natural to think of 23F as meaning always eventually
F is true. If you’re not used to thinking about infinite sequences, it may
not be obvious that always eventually is equivalent to infinitely often. So,
let’s prove it.

Theorem 3.1 F is infinitely often true iff it is always eventually true.

Define S σ to be the set of times at which F is true of a behavior σ.


1. F is infinitely often true of σ iff S σ is an infinite set.
Proof: By definition of infinitely often.
2. S σ is an infinite set iff for every time n, there is a time m ≥ n such that
m ∈ Sσ .
Proof: A set of natural numbers is infinite iff it is unbounded, that is, iff
for every number n it contains a number m ≥ n.
3. The statement that for every time n there exists a time m ≥ n such that
m ∈ S σ is equivalent to the statement that F is always eventually true.
Proof: By the definitions of S σ and always eventually.

4. Q.E.D.
Proof: Steps 1–3 are of the form A iff B , B iff C , and C iff D, and the
theorem asserts A iff D.

It is usually most helpful to think of 23 as meaning infinitely often rather
than always eventually. For example, consider the formula 23(F ∨ G). It
asserts that F ∨ G is true infinitely often, which means that F or G is true
infinitely often. But F or G is true infinitely often iff at least one of them
is true infinitely often. This yields the following tautology:

(3.27) |= 23(F ∨ G) ≡ (23F ) ∨ (23G)

The rules for moving ¬ over 2 and 3 that are implied by the first two
tautologies of (3.25) yield the following two tautologies. For example, the
first comes from ¬23F ≡ 3¬3F ≡ 32¬F .

(3.28) |= ¬23F ≡ 32¬F |= ¬32F ≡ 23¬F

3.4.2.7 The End of the Line


You might expect that we can keep constructing more and more compli-
cated operators like 233232 with sequences of 2 and 3. We can’t. Any
such sequence is equivalent to 2, 3, 23 or 32. To see this, first observe
that always always is the same as always. That is, 22F is equivalent to
2F . That’s because ∀ i , j ∈ N : P (i + j ) is equivalent to ∀ k ∈ N : P (k ), for
any P .
Similarly, eventually eventually is the same as eventually, so 33F is
equivalent to 3F . The equivalence of 33 and 3 also follows from the
definition of 3 and the equivalence of 22 and 2 by:

33F ≡ ¬2¬¬2¬F ≡ ¬22¬F ≡ ¬2¬F ≡ 3F

So, we can only get a new operator by alternating 2 and 3. However, 23
and 32 are as far as we can go because of the following tautologies:

(3.29) |= 323F ≡ 23F |= 232F ≡ 32F

The first one is obvious if we read 323 as eventually infinitely often, because
F is true at infinitely many times iff it is true at infinitely many times after
some time has passed. You can convince yourself that the second is true
by realizing that infinitely often F always true is equivalent to F being
always true starting at some time. Alternatively, you can show that the
first tautology implies the second by figuring out why each of the following
equivalences is true:
232F ≡ ¬3¬¬2¬¬3¬F ≡ ¬323¬F ≡ ¬23¬F
≡ ¬¬3¬¬2¬¬F ≡ 32F

3.4.2.8 Leads To (;)


Although there are no more operators to be defined by directly stacking 2
and 3, there is another useful temporal operator defined in terms of them:
the operator ;, read leads to, defined by:
F ; G = 2(F ⇒ 3G)

The operator ; is parsed like ⇒.


Formula F ; G asserts of a behavior that, whenever F is true, G is true
then or later. You should convince yourself that ; is transitive, meaning:
|= (F ; G) ∧ (G ; H ) ⇒ (F ; H )
Here are two additional tautologies that should be obvious:
(3.30) |= ((F ∨ G) ; H ) ≡ (F ; H ) ∧ (G ; H )
|= (F ; G) ∧ 2(G ⇒ H ) ⇒ (F ; H )
The first of these tautologies generalizes to:
(3.31) |= ((∃ i ∈ S : F i ) ; H ) ≡ (∀ i ∈ S : (F i ; H ))
Here are three more tautologies involving ;; try to understand why they’re
true.
(3.32) (a) |= 2F ∧ (F ; G) ⇒ 23G
(b) |= (F ; G) ≡ (F ∧ 2¬G ; G)
(c) |= (F ∧ 2G ; H ) ≡ (F ∧ 2G ; H ∧ 2G)
Here’s how I understand them:
(a) F ; G implies that whenever F is true, G is true then or later; and
2F implies that F is always true. Therefore, 2F ∧ (F ; G) implies
G is true infinitely often.
(b) F ;G ≡ 2(F ⇒ 3G) Definition of ; .
≡ 2(F ∧ ¬3G ⇒ 3G) Propositional logic.
≡ 2(F ∧ 2¬G ⇒ 3G) By (3.25).
≡ F ∧ 2¬G ; G Definition of ; .

(c) F ∧ 2G ; H asserts that, for all t, if F ∧ 2G is true at time t, then
H is true at some time u ≥ t; and 2G true at time t implies it is still
true at time u, so H ∧ 2G is true at time u.
Observe that tautology (b) justifies a proof by contradiction: to prove that
F true implies G is eventually true, we can assume that G is never true.
Proving ; properties is at the heart of liveness proofs. For example,
here’s how we prove termination of Euclid’s algorithm, discussed in Sec-
tion 1.5. The algorithm terminates because while x 6= y is true, the sum
of x and y keeps decreasing, which can’t continue forever because the algo-
rithm satisfies the invariant that x and y are positive integers. Therefore,
eventually x = y and the algorithm terminates. This argument is formalized
in RTLA as follows.
To prove termination, we must prove that every behavior of the algorithm
satisfies 3(x = y). The proof uses this tautology, which follows from the
meanings of the operators 2, ;, and 0 :
(3.33) |= 2(P ⇒ Q 0 ) ⇒ (P ; Q)
The tautology |= 3P ∨ 2¬P allows us to prove 3(x = y) by assuming that
a behavior of the algorithm satisfies 2(x 6= y) and obtaining a contradiction.
Let R i be the state predicate x + y ≤ i . We prove that 2(x 6= y) and the
invariant that x and y are positive integers imply that, for all i > 0, the
program satisfies 2(R i ⇒ (R i−1 )0 ). By (3.33), this implies R i ; R i−1 . By
the transitivity of ; and mathematical induction, this implies R M +N ; R 0 .
Since the program implies that R M +N is true in the initial state, this im-
plies that 3R 0 is true, contradicting the invariant that x and y are always
positive.
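
The decreasing-sum argument is easy to watch in action. Here is a Python
sketch of Euclid’s algorithm by repeated subtraction, with M and N chosen
arbitrarily, checking on every step the two facts the proof relies on:

  M, N = 12, 18
  x, y = M, N
  while x != y:
      old_sum = x + y
      if x > y:
          x = x - y
      else:
          y = y - x
      assert x + y < old_sum     # R_i implies (R_{i-1})' on every step
      assert x > 0 and y > 0     # the invariant: x and y stay positive
  print(x)                       # 6, the GCD of 12 and 18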

3.4.2.9 Warning
Although elegant and useful, temporal logic is weird. It’s not ordinary math.
In ordinary math, any operator Op we can define satisfies the condition,
sometimes called substitutivity, that the value of an expression Op(e 1 , . . . , e n )
is unchanged if we replace any e i by an expression equal to e i . If Op takes
a single argument, substitutivity means that
(3.34) |= (exp 1 = exp 2 ) ⇒ (Op(exp 1 ) = Op(exp 2 ))
is true for any expressions exp 1 and exp 2 . For example, (3.34) is true
for the operator Tail . However, the temporal operator 2 is not substi-
tutive. For example let exp 1 and exp 2 be the state predicates x = 0 and
y = 0, respectively; and let σ be a behavior such that for each j ∈ N, the
state σ(j ) assigns the value 0 to x and the value j to y. Then exp 1 and
exp 2 both equal true for σ because they are both true for σ(0). Formula
2exp 1 is true for σ because x = 0 is true in all its states, but 2exp 2 is
false for σ because y = 0 is true only in the first state σ(0). Hence, the
value of the formula (exp 1 = exp 2 ) ⇒ (2exp 1 = 2exp 2 ) for this behavior
is (true = true) ⇒ (true = false), which equals false. The operator
0 (prime) is similarly not substitutive, so it too is weird. This weirdness
affects all temporal logics and makes temporal logic reasoning tricky.

3.5 TLA
3.5.1 The Problem
There is something terribly wrong with our RTLA descriptions of abstract
programs, because there is something terribly wrong with the descriptions
like the one in Figure 3.4 that we wrote using explicit step numbers. To
see why, let’s return to the discussion in Section 3.1 of how astronomers
describe a planet orbiting a star. As explained there, the mathematical
description of the orbiting planet is best thought of as describing a universe
containing the planet, saying nothing about what else is or is not in the
universe. In particular, that description applies just as well to a universe
in which there is a spacecraft close to the star that orbits it very fast—
perhaps going around the star 60 times every time the planet goes around
it once. Since the spacecraft is too small to affect the motion of the planet,
we would obtain a description of the system composed of the planet and the
spacecraft by adding (conjoining) a description of the spacecraft’s motion
to the description of the planet’s motion. The description of the planet’s
motion remains an accurate description of that planet in the presence of
the spacecraft. It would be crazy if we had to write different formulas to
describe the planet because of the spacecraft that has no effect on it.
Now consider the descriptions of abstract programs we’ve been writing.
In particular, consider an RTLA formula HM describing how the values of
the hour and minute displays of a 24-hour clock change. Using the variables
hr and min to describe the current hour and minute being displayed, we
might define HM to equal Init ∧ 2Next, where:

(3.35) Init = (hr = 0) ∧ (min = 0)

Next = ∧ min 0 = (min + 1) % 60
∧ hr 0 = if min = 59 then (hr + 1) % 24 else hr

But suppose that the clock also displays seconds. The RTLA formula HMS
that also describes the second display might use a variable sec to describe
that display. A behavior σ allowed by HMS would not be allowed by HM
because HM requires every step to change the value of min, while σ must
change the value of sec in every step and the value of min in only every 60th
step.
It is just as crazy for an abstract program describing an hour-minute
clock not to describe a clock that also displays seconds as it is for a descrip-
tion of a planet’s motion no longer to describe that motion because of a
spacecraft that doesn’t affect the planet. It means that anything we’ve said
about the hour and minute display might be invalid if there’s also a second
display. And it doesn’t matter if the minute display is on a digital clock on
my desk and the second display is on a phone in my pocket. More generally,
it means that if we’ve proved things about two completely separate digital
devices and we look at those devices at the same time, nothing we’ve proved
about them remains true unless those devices are somehow synchronized to
run in lock step. The more you think about it, the crazier it seems.

3.5.2 The Solution


To figure out how to fix this problem, let’s first see where we went wrong.
It happened in Section 3.2.1, when we went from a sequence t 0 , t 1 , t 2 , . . .
of times to a sequence 0, 1, 2, . . . of state numbers. We were writing a
description of a particular system. But math and science don’t describe a
system; they describe a universe containing that system. And that universe
can contain many systems. A different system might lead to a different
sequence u 0 , u 1 , u 2 , . . . of times, with only t 0 and u 0 equal. Our error
was converting two possibly different times t i and u i into the same state
number. The result was that when we thought we were writing a description
of a particular system, we were actually writing a description of a universe
in which the values of all variables, including ones describing other systems,
could change only when the variables of that particular system changed.
You might think that because the error occurred when we were throwing
away times, we need to represent the time at which a state holds, not just
a state number. Fortunately, there is a simpler solution. It’s the one we
used to eliminate finite behaviors and consider only infinite behaviors. We
observed that we could do that by adding stuttering steps at the end of a
finite sequence of states—steps that just repeat the previous state of the
program. Eliminating finite behaviors was not simply a matter of conve-
nience. The real reason to do it was to eliminate one source of craziness.
Since a behavior is not just a behavior of a particular program but a behav-
ior of the entire universe, a finite behavior is one in which everything in the
universe stops changing. The description of a halting program execution as
a finite behavior therefore asserts that the entire universe halts when the
program does. Those infinitely many stuttering steps, in which the value
of no variable of the program changes, allow other programs’ variables to
keep changing.
We can add those stuttering steps because of the observation that the
conversion from times to state numbers requires that a program variable
be allowed to change only at time t i for some i . It does not require that
any variable does change at that time. The mistake was writing descriptions
that, until the program halts, requires some variable to change value at each
time t i . Instead, we should have added to the sequence of times t i times at
which no program variable changes. Adding such a time adds a step in which
other variables describing other programs can change while the program’s
variables remain unchanged. Thus, if the description allows a behavior σ,
then it should allow the behavior obtained by inserting stuttering steps of
the program in σ. This is easy to do. For the description of the hour/minute
display, we just change the definition of HM to

HM = Init ∧ 2(Next ∨ ((hr 0 = hr ) ∧ (min 0 = min)))


We can write this formula more compactly as

(3.36) HM = Init ∧ 2(Next ∨ (hhr , min i0 = hhr , min i))


because hhr , min i0 equals hhr 0 , min 0 i, and two tuples are equal iff their
corresponding components are equal.
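
Spelled out executably, the bracketed action of (3.36) is just a disjunction.
Here is a Python sketch for the hour-minute clock (the function names are
mine):

  # The clock's next-state action.
  def clock_next(s, t):
      return (t["min"] == (s["min"] + 1) % 60 and
              t["hr"] == ((s["hr"] + 1) % 24 if s["min"] == 59
                          else s["hr"]))

  # [A]_v : an A step or a step leaving the listed variables unchanged.
  def bracket(action, variables):
      return lambda s, t: (action(s, t) or
                           all(s[v] == t[v] for v in variables))

  step_ok = bracket(clock_next, ("hr", "min"))
  print(step_ok({"hr": 7, "min": 59}, {"hr": 8, "min": 0}))   # True
  print(step_ok({"hr": 7, "min": 59}, {"hr": 7, "min": 59}))  # True (stuttering)
  print(step_ok({"hr": 7, "min": 59}, {"hr": 7, "min": 3}))   # False
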
We can similarly fix every other example we’ve seen so far by changing
the next-state action Next in its RTLA description to Next ∨ (v 0 = v ), where
v is the tuple of all variables that appear in the RTLA formula. Since this
will have to be done all the time, we abbreviate A ∨ (v 0 = v ) as [A]v for any
action A and state expression v .
We can add stuttering steps to a pseudocode description of an algorithm
by adding a separate process that just takes stuttering steps. However,
we won’t bother to do this. We will just consider all pseudocode to allow
stuttering steps.
When HM is defined by (3.36), if HMS is true of a behavior then HM
is also true of the behavior. This remains true when HMS is modified to
allow stuttering steps. Thus, HMS implements HM , and |= HMS ⇒ HM is
true. Implementation is implication. How elegant!

There is an apparent problem with formula HM of (3.36). It allows
behaviors in which the program takes a finite number of steps (possibly zero
steps) and then takes nothing but stuttering steps. In other words, it allows
behaviors in which the clock stops. Most computer scientists will say that
we should never allow behaviors in which an abstract program stops when
it is possible for it to continue executing. This is because they are used
to thinking about traditional programs. In many cases, we don’t want to
require a concurrent abstract program to do something just because it can.
Never stopping is a liveness property. Taking only steps satisfying [Next]v
is a safety property. My experience has taught me that we should describe
safety properties separately from liveness properties because we reason about
them differently and we should think about them differently. Formula HM
describes the safety property that the hour-minute clock should satisfy. We
will see later how we conjoin a liveness property to HM if we want to require
the clock to run forever. It is a feature, not a problem, that this definition of
HM asserts only what the clock may do and not what it must do.
In general, the safety property of an abstract program is written in the
form Init ∧ 2[Next]v , where Init is the initial predicate and [Next]v is the
next-state action. The formula 2[Next]v always allows stuttering steps be-
cause [Next]v has the form . . .∨(v 0 = v ) , and v 0 = v allows stuttering steps.
However, v 0 = v allows lots of non-stuttering steps. In particular, it allows
steps in which any variable that does not appear in v can have any values
in the two states of the step. To describe an abstract program, the state ex-
pression v in 2[Next]v must ensure that v 0 = v allows only steps that do not
change any of the program’s variables. Therefore, unless stated otherwise,
in a formula of the form 2[Next]v where Next is the next-state action of a
program, the subscript v is assumed to be the tuple of all program variables.
(However, that subscript need not be called v .)

3.5.3 Stuttering Insensitivity


We have seen that the safety property of an abstract program should have
the form Init ∧ 2[Next]v , so it allows stuttering steps. But what can we say
in general about formulas for describing systems or abstract programs?
Stuttering steps are created by adding extra times t i at which we report
the values of a program’s variables. A stuttering step does not represent the
program doing anything. It’s just a mathematical way to allow the descrip-
tions of the universe with which we describe different abstract programs to
be made consistent with one another. Therefore, any assertion we make
about a behavior of an abstract program should not depend on whether we
add or remove steps that leave the program’s variables unchanged. Since the
assertion depends only on the values a behavior assigns to the program’s vari-
ables, this condition is satisfied iff the assertion does not depend on whether
we add or remove steps that leave all variables unchanged. We’ve used the
term stuttering step to mean a step that leaves a program’s variables un-
changed. We will now call such a step a stuttering step of the program. We
define a stuttering step to be a step that leaves all variables unchanged.
A sensible predicate F on behaviors should satisfy the condition that the
value of [[F ]](σ) is not changed by adding stuttering steps to, or removing
them from, a behavior σ. This means that the value of [[F ]](σ) is not changed
even if an infinite number of stuttering steps are added and an infinite
number removed. (However, the behavior must still be infinite, so if σ ends
in an infinite number of stuttering steps, those steps can’t be removed.) A
predicate on behaviors satisfying this condition for all behaviors σ is called
stuttering insensitive, or SI for short. When describing abstract programs
or the properties they satisfy, we should use only SI predicates on behaviors.
To define SI precisely, we first define ♮(σ) to be the behavior obtained by
removing from the behavior σ all stuttering steps except those belonging to
an infinite sequence of stuttering steps at the end. Recall that Tail (σ) is the
behavior obtained from σ by removing its first state and ◦ is concatenation
of sequences. For a state s, let’s define (s →) to equal i ∈ {0} 7→ s, the
cardinal sequence of length 1 whose single item is s. Here is the recursive
definition of ♮.

(3.37) ♮(σ) = if ∀ i ∈ N : σ(i ) = σ(0)
              then σ
              else if σ(0) = σ(1) then ♮(Tail (σ))
                   else (σ(0) →) ◦ ♮(Tail (σ))
A predicate on behaviors is defined to be SI iff, for any behavior σ, the
predicate is true of σ iff it is true of ♮(σ). SI is a semantic condition—that
is, a condition on the meanings of formulas. Since we are conflating formulas
and their meanings, saying that a formula F is SI means that [[F ]] is SI.
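
On a finite prefix of a behavior, the idea of (3.37) reduces to deleting each
state that equals its predecessor. A Python sketch of that finite analogue
(the infinite, end-of-behavior case of the definition has no finite
counterpart):

  def destutter(prefix):
      """Remove stuttering steps from a finite behavior."""
      result = prefix[:1]
      for state in prefix[1:]:
          if state != result[-1]:
              result.append(state)
      return result

  beh = [{"x": 0}, {"x": 0}, {"x": 1}, {"x": 1}, {"x": 2}]
  print(destutter(beh))    # [{'x': 0}, {'x': 1}, {'x': 2}]
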
We have been using the term property informally to mean some condition
on the behaviors of a system or abstract program. We now define it to mean
an SI predicate on behaviors. Behavior predicate still means any predicate
on behaviors, not just SI ones.

3.5.4 The Definition of TLA


We now define TLA to be a language that is a sublanguage of RTLA in which
every formula is a property—that is, an SI formula. Defining a language
means giving syntactic rules for a formula to belong to the language.
We begin by defining a state predicate to be a TLA formula. Viewed as a
temporal formula, a state predicate is SI because it depends only on the first
state of a behavior, which isn’t changed by adding or removing stuttering
steps.
The operators of propositional logic applied to SI formulas produce
SI formulas. For example, if adding or removing stuttering steps doesn’t
change whether formulas F and G satisfy a behavior, then they don’t change
whether F ∧ G satisfies the behavior. So we let TLA include all formulas
obtained by applying propositional logic operators to TLA formulas. Simi-
larly, we can let TLA include all formulas obtained by applying unbounded
quantifiers and quantifiers bounded by a state expression to a TLA formula.
For example, ∃ x ∈ S : F is a TLA formula if F is one and S is a state expres-
sion. In practice, we write a formula ∃ x ∈ S : F only when S is a constant
expression.
It’s easy to find actions A that, when viewed as a temporal formula,
are not SI. For example, x 0 6= x is not SI because if σ is a behavior that
satisfies x 0 6= x , then the behavior obtained by adding a stuttering step to
the beginning of σ doesn’t satisfy it. However, the formula 2[A]v is SI for
any action A and state expression v , so we let TLA contain all such formulas.
The formula 2[A]v asserts that an action, namely one of the form [A]v ,
is true of all steps of a behavior. For reasoning about liveness, we will need
to assert that an action is eventually true in some step of a behavior. The
formula 3A is not SI for an arbitrary action A because if A is true on
some stuttering step, then 3A might be false on a behavior σ and true on
a behavior obtained by adding such a stuttering step to σ. However, if A
does not allow stuttering steps, then 3A is SI. In particular, the formula
3(A ∧ (v 0 6= v )) is SI for any state expression v . We define hAiv to equal
A ∧ (v 0 6= v ); and we let TLA contain all formulas 3hAiv , for any action A
and state expression v .
You can convince yourself that adding or removing stuttering steps
from a behavior doesn’t change whether it satisfies 3hAiv . However, since
¬2[¬A]v is SI, the following theorem shows that 3hAiv is SI:

(3.38) |= 3hAiv ≡ ¬2[¬A]v

This equivalence follows from the definition of 3 and the following assertion,
which can be proved by propositional logic from the definitions of [. . .]v and
h. . .iv :

|= hAiv ≡ ¬[¬A]v

From (3.38) we see that ¬3hAiv is equivalent to 2[¬A]v . This means that
an hAiv step never occurs in a behavior iff every step of the behavior is a
[¬A]v step. This fact is used in proofs by contradiction of formulas of the
form 3hAiv .
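
Since the duality of [. . .]v and h. . .iv is propositional, it can be confirmed
mechanically on any sample of steps. A Python sketch, simplified to a
subscript that is a single variable:

  def angle(A, v):               # <A>_v : an A step that changes v
      return lambda s, t: A(s, t) and s[v] != t[v]
  def bracket(A, v):             # [A]_v : an A step or no change to v
      return lambda s, t: A(s, t) or s[v] == t[v]

  A = lambda s, t: t["x"] == s["x"] + 1
  not_A = lambda s, t: not A(s, t)
  states = [{"x": n} for n in range(4)]
  assert all(angle(A, "x")(s, t) == (not bracket(not_A, "x")(s, t))
             for s in states for t in states)
  print("<A>_x is equivalent to ~[~A]_x on all sampled steps")
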
The only temporal operator we have defined besides 2 and 3 is ;. We
define F ; 3hAiv to be a TLA formula if F is one, A is an action, and v
is a state expression. This formula is SI, since expanding the definition of
; in it produces what we have already defined to be a TLA formula.
Combining all this, we see that a TLA formula is one of the following:

• A state predicate.

• Obtained by applying propositional logic operators to TLA formulas.

• ∀∃ c : F or ∀∃ c ∈ S : F where ∀∃ is ∀ or ∃ , for a constant c, a TLA
formula F , and a state expression S .

• 2[A]v , 3hAiv , or F ; 3hAiv , for an action A, a state expression
v , and a TLA formula F .

Abstract programs and the properties they satisfy should be TLA for-
mulas. However, we can use RTLA proof rules and even RTLA formulas
when reasoning about TLA formulas. For example, we can prove that Inv
is an invariant of Init ∧ 2[Next]v by substituting [Next]v for Next in the
RTLA proof rule that (3.10) implies (3.13). This yields the following rule:

|= (Init ⇒ Inv ) ∧ (Inv ∧ [Next]v ⇒ Inv 0 )


implies |= Init ∧ 2[Next]v ⇒ 2Inv

In this rule, the first |= means validity in LA while the second |= means
validity in TLA. A feature of TLA is that as much reasoning as possible is
done in LA, which becomes ordinary mathematical reasoning when the nec-
essary definitions are expanded and primes are distributed across operators
so only variables are primed.
Chapter 4

Safety, Liveness, and Fairness

4.1 Safety and Liveness


4.1.1 Definitions
Safety and liveness properties have been described intuitively as specifying
what the program is allowed to do and what it must do. To define them
precisely, we begin by observing that they have these characteristics:

Safety If a behavior doesn’t satisfy a safety property, then we can point to
the place in the behavior where it violates the property. For example,
if a behavior doesn’t satisfy an invariance property, it violates the
property in the first state in which the invariant is false.

Liveness We have to look at an entire infinite behavior to see that it doesn’t
satisfy a liveness property. For example, we can’t see that the property
x eventually equals 42 is violated by looking at a finite part of the
behavior.1

This characterization was turned into precise definitions of safety and live-
ness for arbitrary behavior predicates by Alpern and Schneider [3]. Since
we’re interested only in properties (which are SI), we will use a somewhat
simpler definition of safety. But first, we need a few preliminary definitions.
We call a finite, nonempty cardinal sequence of states a finite behavior.
(A behavior, without the adjective finite, still means an infinite cardinal
sequence of states.) We’ll write a finite behavior ρ as ρ(0) → · · · → ρ(n).
A nonempty finite prefix of a behavior is a finite behavior. We define the
completion ρ↑ of a finite behavior to be the behavior obtained by repeating
the last state of ρ infinitely many times—that is, adding infinitely many
stuttering steps. A finite behavior ρ is defined to satisfy a behavior predicate
iff its completion ρ↑ satisfies it. We can now precisely define safety and
liveness.

1 Remember that a behavior means any infinite sequence of states, not just one that
satisfies some program. If we know that a behavior satisfies the program, we can often tell
that 3(x = 42) is false by looking at the behavior predicate that describes the program,
without looking at the behavior at all.

Safety A property F is a safety property iff it satisfies the following con-
dition: A behavior satisfies F iff every nonempty finite prefix of the
behavior satisfies F .

Liveness A property F is a liveness property iff every finite behavior is the
prefix of a behavior that satisfies F .

A state predicate is a safety property because it is satisfied by a behavior
iff the state predicate is true on the initial state, and a behavior and all its
nonempty prefixes have the same initial state. The formula 2[A]v is a safety
property for any action A and state expression v ; here is the proof that every
nonempty finite prefix of a behavior σ satisfies 2[A]v iff σ satisfies 2[A]v .

1. Assume: Every nonempty finite prefix of σ satisfies 2[A]v .
Prove: σ satisfies 2[A]v
Proof: σ satisfies 2[A]v iff every step of σ satisfies [A]v , and every step of
σ is a step of some finite prefix of σ, so it satisfies 2[A]v by the assumption.
2. Assume: σ satisfies 2[A]v
Prove: Every nonempty finite prefix of σ satisfies 2[A]v .
Proof: Every step of a nonempty finite prefix of σ is either a step of
σ, so it satisfies [A]v by the assumption, or it is a stuttering step, which
satisfies [A]v by definition of [. . .]v .
3. Q.E.D.
Proof: Obvious, by steps 1 and 2.

It also follows easily from the definition of safety that the conjunction of
safety properties is a safety property. Therefore, as expected, the formula
Init ∧ 2[Next]v that we have been calling the safety property of a program
is indeed a safety property.
The property that asserts that a program halts is a liveness property.
That property is true of a behavior σ iff σ ends with infinitely many steps
that leave the program’s variables unchanged. It’s a liveness property be-
cause every finite behavior ρ is a prefix of its completion ρ↑ , which satisfies
the property.
Safety and liveness are conditions on properties, which are SI behavior
predicates. When we say that a TLA formula 2[A]v is a safety property, we
are conflating the formula with its meaning. We should remember that it’s
actually [[2[A]v ]] that is the safety property.

4.1.2 A Completeness Theorem


TLA is quite simple, adding only the two operators 0 (prime) and 2 to
ordinary math. In theory, this simplicity makes it quite inexpressive. For
example, here is a property F12 that neither TLA nor RTLA can express:
the value of x must equal 1 before it can equal 2. It’s expressed in terms of
explicit states as:

(4.1) F12 = ∀ j ∈ N : (x (j ) = 2) ⇒ ∃ k ∈ N : (k < j ) ∧ (x (k ) = 1)

This behavior predicate is a property because it’s SI; adding or removing
stuttering steps doesn’t affect whether a behavior satisfies it. Property F12
is a safety property because it’s not satisfied by a behavior σ iff there’s a
point in σ at which it’s violated—namely, a state σ(j ) in which x = 2 and
x 6= 1 in all the states σ(0), . . . , σ(j − 1).
If TLA is inexpressive, how can we describe programs with it? The
answer is, by using variables. We can express F12 as an abstract program
described by a TLA formula S 12 if we add a Boolean-valued variable, let’s
call it y, whose value is true iff x equals 1 or has previously equaled 1. We
let the initial predicate Init of S 12 assert that x 6= 2 and that y = true iff
x = 1. The next-state relation Next allows x 0 to equal 2 only if y = true,
and it sets y to true if x = 1. Here are the definitions:

(4.2) S 12 = Init ∧ 2[Next]hx ,y i

      Init = (x 6= 2) ∧ (y = (x = 1))

      Next = ∧ (x 0 = 2) ⇒ y
             ∧ y 0 = (y ∨ (x = 1))
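
One way to see that the history variable y does its job is to enumerate the
finite prefixes allowed by S 12 over a small range of x values and check F12
on each of them. A Python sketch (the range of values and the prefix length
are arbitrary choices of mine):

  from itertools import product

  X = range(4)
  states = [{"x": x, "y": y} for x in X for y in (False, True)]

  init = lambda s: s["x"] != 2 and s["y"] == (s["x"] == 1)
  def next_ok(s, t):
      # [Next]_<<x,y>> : a Next step or a stuttering step.
      return (s == t or
              ((s["y"] or t["x"] != 2) and
               t["y"] == (s["y"] or s["x"] == 1)))

  def f12(xs):
      """x equals 2 only if x equaled 1 earlier."""
      return all(1 in xs[:j] for j, x in enumerate(xs) if x == 2)

  # Every length-4 state sequence allowed by S12 satisfies F12.
  for trace in product(states, repeat=4):
      if init(trace[0]) and all(next_ok(s, t)
                                for s, t in zip(trace, trace[1:])):
          assert f12([s["x"] for s in trace])
  print("every sampled S12 prefix satisfies F12")
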
It’s not obvious in what sense formula S 12 expresses property F12 , since
S 12 contains the variables x and y while F12 describes only the values of x .
Intuitively, S 12 makes the same assertion as F12 if we ignore the value of y.
Section 6.1 describes a TLA operator ∃ such that ∃ y : S 12 means S 12 if we
ignore the value of y. We’ll then see that [[∃ y : S 12 ]] equals F12 . However,
there’s no need to introduce ∃ here. The relevant condition that S 12 satisfies
is that if G is any TLA formula that does not contain the variable y, then

|= F12 ⇒ [[G]] iff |= S 12 ⇒ G

The idea of adding a variable to express a property works in general.
We state it now only for safety properties. We can’t express every safety
property as a TLA formula. A formula is a finite string of finitely many
symbols, and there are only a countable number of such strings; but there are
uncountably many safety properties. (For example, there are uncountably
many real numbers, so there are uncountably many properties asserting that
the initial value of x is a particular real number.) What we can show is that
any safety property (which is a predicate on behaviors) that can be described
by a mathematical formula—that is, by a formula of ZF—can be expressed
as a TLA formula. We do that by showing that if the mapping F from
behaviors to Boolean values is a safety property, then we can use F to write
a TLA formula that describes it the way S 12 describes F12 in our example.
One condition satisfied by a mathematical formula is that it contains only
a finite number of variables, and its value depends only on the values of
those variables. Remember that we are assuming that the language LA for
writing actions contains all the operators of ZF, including the operators on
sequences described in Section 2.3.2.
The theorem is expressed with the convention of letting a boldface iden-
tifier like x be the list x 1 , . . . , x n of subscripted non-bold versions of the
identifier, for some n. Thus, hxi is the tuple of those identifiers. The theo-
rem is a special case of Theorem 4.8 in Section 4.2.7 below, so the proof is
omitted.

Theorem 4.1 Let x be the list x 1 , . . . , x n of variables and let F be a safety
property such that F (σ) depends only on the values of the variables x in a
behavior σ. There exists a formula S equal to Init ∧ 2[Next]hx,y i , where Init
and Next are defined in terms of F , y is a variable not among the variables
x, and the variables of S are x and y, such that |= F ⇒ [[G]] iff |= S ⇒ G,
for any property G.

This theorem is a completeness result, showing that TLA can express as an
abstract program any safety property that can be expressed semantically.
While this shows that there is no fundamental lack of expressiveness in
TLA, it is of little practical significance. The proof assumes a description
of the property F and uses it to write F as an abstract program. If there
were a better way to describe properties mathematically than with abstract
programs, we should use it. There are other temporal logics that can express
the simple property F12 with a formula that’s easier to understand than S 12 .
However, S 12 is not hard to understand, and abstract programs are the only
practical way I know to express all the properties of concrete concurrent
programs that we need to describe.2

4.1.3 The Operator C


We now define the operator C so that C(F ) is the strongest safety property
implied by F , for any property F . Remember that property G stronger
than property H means every behavior satisfying G satisfies H —that is,
|= G ⇒ H . The operator C is not part of the TLA language; we do not
use it to write abstract programs. What we do is show that under certain
conditions some G equals C(F ) for some other program F . It would be
better to write that [[G]] equals C([[F ]]), but we won’t bother because we
regularly conflate a formula and its meaning.
If we want C(F ) to be the strongest safety property implied by F , it
should be satisfied by the behaviors satisfying F plus the fewest additional
behaviors needed to make it a safety property. The appropriate definition
is: A behavior σ satisfies C(F ) iff every finite prefix of σ is a prefix of a
behavior that satisfies F . The following theorem shows that this is the
correct definition. Its proof is in the Appendix.

Theorem 4.2 If F is a property, then C(F ) is a safety property such that
|= F ⇒ C(F ) and, for any safety property G, if |= F ⇒ G then |= C(F ) ⇒ G.

Alpern and Schneider proved that every property is the conjunction of a
safety property and a liveness property.3 They actually proved this stronger
result, whose proof is in the Appendix.
Theorem 4.3 Every property F is equivalent to C(F ) ∧ L for a liveness
property L.
We have been describing abstract programs by formulas of the form
Init ∧ 2[Next]v , which are safety properties. As we’ve observed, like any
safety property, this formula allows behaviors that halt at any point in the
behavior. We usually don’t want to allow such behaviors, so we must conjoin
a liveness property to this formula to describe most abstract programs. For
example, we can rule out behaviors of program Increment that halt prema-
turely with the liveness property

∀ p ∈ Procs : 3(pc(p) = done)

However, we’ll see later why that’s not a good liveness property to use.

2 Remember that property means predicate on behaviors. There are many conditions
we want programs to satisfy besides properties.
3 Their result was stated for arbitrary behavior predicates, not properties.
There’s another method of describing safety and liveness that helps me
understand them intuitively. It’s based on topology. The method and the
necessary topology are explained in Appendix Section A.5.

4.1.4 What Good is Liveness?


Safety predicates constrain the finite behavior of a system. They describe
what must be true of finite prefixes of a behavior. Liveness properties say
nothing about finite prefixes; they describe what must be true if the system
runs forever. Since we don’t live forever, why should we care about liveness
properties?
Liveness is useless in theory but useful in practice. Consider the liveness
property required of a traditional program: it eventually terminates. In
theory, that’s useless because it might not terminate in a billion years. In
practice, proving that a program will terminate within a given amount of
time isn’t easy. Proving that it eventually terminates is easier, and it is useful
because the program is certainly not going to terminate soon enough if it
never does. But proving liveness provides more than that. Understanding
why a program eventually terminates requires understanding what it must
do in order to finish. That understanding helps you decide if it will terminate
soon enough. This applies to other liveness properties as well.
Using a model checker doesn’t give you the understanding that you get
from writing a proof. However, checking liveness properties is a good way to
detect errors—both in the program you intended to write and in what you
actually wrote. A program that does nothing satisfies most safety properties,
and an error in translating your intention into mathematics might disallow
behaviors in which the program fails to satisfy a safety property. Checking
that the program satisfies liveness properties that it should can catch such
errors, as well as errors in the program you wanted to write.

4.2 Fairness
Expressing mathematically the way computer scientists and engineers de-
scribed their algorithms and programs led us to describe the safety property
satisfied by an abstract program with the formula Init ∧ 2[Next]v , where v
is the tuple of all the program’s variables. We must conjoin to that formula
another formula to describe the program’s liveness property. To see how
this should be done, we first examine how scientists and engineers expressed
liveness.

4.2.1 Traditional Programs and the Enabled Operator E


We start with traditional programs. It was assumed, usually without need-
ing to be stated explicitly, that a program kept executing statements until it
terminated. If termination is expressed by pc = done, then this assumption
can be stated as the requirement that when pc 6= done, a Next step must
eventually occur. This is expressed by the TLA formula

(4.3) (pc 6= done) ; hNext iv

The h. . .iv is a bit of a nuisance, but it’s required by TLA to prevent a


liveness property from being satisfied by a stuttering step, which would make
no sense. Usually, the next-state action Next does not permit stuttering
steps, so (4.3) is equivalent to the RTLA formula (pc 6= done) ; Next .
Formula (4.3) assumes a particular way of expressing termination. In
general, termination of a traditional program means it’s no longer possible
for the program to take steps. We should replace (4.3) by E ; hNext iv ,
where E is a state predicate that is true in a state iff it’s possible to take a
hNext iv step in that state. We write that state predicate E as EhNext iv ,
where E is read enabled.
In general, for any action A, we define E(A) to be the state predicate
that is true in a state s iff there exists a state t such that s → t is an A
step. In other words, E is an LA operator, where for any action A the state
predicate E(A) is defined by:

[[E(A)]](s) ≜ ∃ t : [[A]](s → t)

In the common case when A has the form hB iv , we omit the parentheses
and write simply EhB iv . The liveness property assumed of a traditional
program whose safety property is described by the formula Init ∧ 2[Next]v
is EhNext iv ; hNext iv .
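
As a concrete example of computing E, consider this hypothetical action
of a program with variables x and pc, where a and b are different labels:

   A ≜ (pc = a) ∧ (x > 0) ∧ (x ′ = x − 1) ∧ (pc ′ = b)

By the definition of E, a state s satisfies E(A) iff there is a state t making
s → t an A step, which is the case iff s satisfies (pc = a) ∧ (x > 0). Since
every A step changes pc, taking v to equal hx , pc i, the predicate EhAiv
also equals (pc = a) ∧ (x > 0).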

4.2.2 Concurrent Programs


In traditional programs, when the program hasn’t terminated there is just
one program statement that can be executed. In multiprocess programs,
it is usually possible for there to be multiple statements that can be exe-
cuted, each in a different process. Moreover, a process can stop not just
because it has terminated, but because it is waiting for another process to
do something. The liveness property EhNext iv ; hNext iv ensures that the
program keeps executing statements as long as some process hasn’t halted.
It is satisfied if one process keeps executing statements. It allows other pro-
cesses to halt, even if they could keep executing statements. Those other
processes are said to be starved.
It was generally accepted that processes should be treated “fairly”. Mul-
tiprocess programs were usually executed on computers having fewer proces-
sors than there were processes—for many years, usually just a single proces-
sor. It was sometimes proposed that fairness should guarantee the stronger
condition that each process gets a fair share of processor time. However,
it came to be generally accepted that fairness should not specify how long
(in terms of program steps) a process that can execute a statement might
wait before executing it. Therefore, fairness came to mean simply that no
process should be starved.
In a program with a set Procs of processes, the next-state action is
defined by

Next ≜ ∃ p ∈ Procs : PNext(p)
where PNext(p) is the next-state action of process p. The obvious general-
ization of the liveness requirement for a traditional program suggests that
fairness for all the processes in a multiprocess program should mean:
(4.4) ∀ p ∈ Procs : EhPNext(p)iv ; hPNext(p)iv
However, this is not the way fairness should be expressed, and it is not an
appropriate liveness property for multiprocess programs. To see why, we
consider mutual exclusion algorithms.

4.2.2.1 Mutual Exclusion


The concept of fairness in a concurrent program first appeared (though not
by that name) in Edsger Dijkstra’s seminal 1965 paper that launched the
study of concurrent algorithms [9]. That paper defined mutual exclusion
and presented the first algorithm that implemented it.
In mutual exclusion, we assume a set of processes that each alternately
executes two sections of code called the noncritical and critical sections. A
mutual exclusion algorithm must ensure that no two processes can be execut-
ing their critical sections at the same time. For example, the processes may
variables . . . ;                         global variables
process p ∈ Procs
   variables pc = ncs, . . . ;            process-local variables
   while true do
      ncs:  skip ;                        noncritical section
      wait: . . .                         waiting section
      cs:   skip ;                        critical section
      exit: . . .                         exiting section
   end while
end process

Figure 4.1: The outline of a mutual exclusion algorithm.

occasionally print output on the same printer, and two processes printing
at the same time would produce an unreadable mixture of the two outputs.
To prevent that, the processes execute a mutual exclusion algorithm, and a
process prints only when in its critical section.
The outline of a mutual exclusion algorithm is shown in Figure 4.1,
where Procs is the set of processes. We don’t care what the processes do
in their noncritical and critical sections, so we represent them by atomic
skip statements labeled ncs and cs that do nothing when executed except
change the value of pc. The nontrivial part of the algorithm consists of the
two sections of code, the waiting and exiting sections, that begin with the
labels wait and exit. Each of those sections can contain multiple labeled
statements, using variables declared in the two variables statements.
The safety property that a mutual exclusion algorithm must satisfy is
that no two processes are executing their critical sections at the same time—
meaning that pc(p) and pc(q) cannot both equal cs for two different pro-
cesses p and q. This is an invariance property. A cute way of expressing it
compactly is:
(4.5) 2 ( ∀ p, q ∈ Procs : (p 6= q) ⇒ ({pc(p), pc(q)} 6= {cs}) )
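
This works because {pc(p), pc(q)} equals {cs} iff pc(p) and pc(q) both
equal cs. For the set {0, 1} of processes of the algorithms below, (4.5) is
therefore equivalent to 2¬((pc(0) = cs) ∧ (pc(1) = cs)).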
We will not yet state a precise liveness condition a mutual exclusion algo-
rithm should satisfy. All we need to know for now is that if some processes
enter the waiting section, they can’t all wait forever without entering the
critical section.
Most people viewing the outline in Figure 4.1 will think this is an un-
realistic description of a mutual exclusion algorithm because, by describing
variables x = (p ∈ {0, 1} 7→ false) ;
process p ∈ {0, 1}
   variables pc = ncs ;
   while true do
      ncs:  skip ;
      wait: x [p] : = true ;
      w 2:  await ¬ x [1 − p] ;
      cs:   skip ;
      exit: x [p] : = false
   end while
end process

Figure 4.2: The unacceptable mutual exclusion algorithm UM.

the execution of the critical section with a single skip step, we are assuming
that the entire critical section is executed as a single step. Of course, we
realize that this isn’t the case. It no more says that the critical section is
executed as a single step than our description of an hour-minute clock says
that nothing else happens between the step that changes the clock’s display
to 7:29 and the step that changes it to 7:30. Just as 59 changes to a seconds
display can occur between those two steps, process p can print the entire
Bhagavad Gita while pc(p) equals cs. A mutual exclusion algorithm simply
describes all that printing as stuttering steps of the algorithm.
Figure 4.2 describes a program named UM , which is an abbreviation of
Unacceptable Mutual exclusion algorithm. Technically, it’s a mutual exclu-
sion algorithm because it satisfies property (4.5) with Procs equal to the
set {0, 1} of processes. But for reasons that will be discussed later, it isn’t
considered to be an acceptable algorithm.
This pseudocode program is the first one we’ve seen with an await state-
ment. For a state predicate P , the statement await P can be executed only
when control is at the statement and P equals true. We could write the
statement a : await P as:
a: if ¬P then goto a end if
Executing this statement in a state with P equal to true just moves control
to the next statement. Executing it in a state with P equal to false does
not change the value of any program variable, so it’s a stuttering step of the
program. Since a stuttering step is always allowed, executing the statement
await P when P equals false is the same as not executing it. So, while we
can think of the statement await P continually evaluating the expression
P and moving to the next statement iff it finds P equal to true, mathe-
matically that’s equivalent to describing it as an action A such that E(A)
equals (pc = a) ∧ P .
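
For example, statement w 2 of the program of Figure 4.2 can be described
by an action like the following sketch (the name W 2 is my choice):

   W 2(p) ≜ (pc(p) = w 2) ∧ ¬x (1 − p)
              ∧ (pc ′ = (pc except p 7→ cs)) ∧ (x ′ = x )

and E(W 2(p)) equals (pc(p) = w 2) ∧ ¬x (1 − p), as the description of
await just given requires.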
This is also the first pseudocode we’ve seen with explicit array variables.
An array variable x is an array-valued variable, where an array is a function
and x [p] just means x (p). We’ve already seen implicit array variables—
namely, the local variables t and pc of program Increment are represented
by function-valued variables in Figure 3.5. I have decided to write x [p] in-
stead of x (p) in pseudocode to make the pseudocode look more like real code.
However, the value of an array variable can be any function, not just a finite
ordinal sequence; and we write x (p) instead of x [p] when discussing the pro-
gram mathematically. As we’ve seen in Figure 3.5, an assignment statement
x [p] := . . . is described mathematically as x ′ = (x except p 7→ . . .).
Algorithm UM is quite simple. The processes communicate through the
variable x , with process p writing to x (p). The initial value of x (p) for each
process p is false. To enter the critical section, process p sets x (p) to true
and then enters its critical section when x (1 − p) (the array element written
by the other process) equals false.
It’s easy to see that the two processes cannot be in their critical sections
at the same time. If they were, the last process p to enter its critical section
would have read x (1 − p) equal to true when executing statement w 2,
so it couldn’t have entered its critical section. Since mutual exclusion is an
invariance property, it can be proved mathematically by finding an inductive
invariant that implies mutual exclusion. You can check that the following
formula is such an inductive invariant of UM :
(4.6) ∧ TypeOK
∧ ∀ p ∈ {0, 1} : ∧ (pc(p) ∈ {w 2, cs}) ⇒ x (p)
∧ (pc(p) = cs) ⇒ (pc(1 − p) 6= cs)
where TypeOK is the type-correctness invariant:

TypeOK ≜ ∧ x ∈ ({0, 1} → {true, false})
         ∧ pc ∈ ({0, 1} → {ncs, wait, w 2, cs, exit})
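
Calling formula (4.6) Inv , checking that it is an inductive invariant of UM
means proving the two LA formulas |= Init ⇒ Inv and
|= Inv ∧ [Next]v ⇒ Inv ′, where Init and Next are the initial predicate and
next-state action described by the pseudocode and v equals hx , pc i. Mutual
exclusion then follows because Inv implies (4.5).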
Let UMSafe be the safety property described by the pseudocode. We
want to conjoin a property UMLive to UMSafe to state a fairness require-
ment of the program’s behaviors. Let’s make the obvious choice of defining
UMLive to be formula (4.4) with Procs equal to {0, 1} and v equal to hx , pc i.
This implies that both processes keep taking steps forever, executing their
critical sections infinitely often, which makes it seem like a good choice.
Actually, that makes it a bad choice.
Algorithm UM is unacceptable because formula UMSafe, which de-
scribes the pseudocode, permits deadlock. If both processes execute state-
ment wait before either executes w 2, the algorithm reaches the deadlocked
state in which neither await statement is enabled. Conjoining UMLive to
UMSafe produces a formula asserting that such a deadlocked state cannot
occur. It ensures the liveness property we want, that processes keep execut-
ing their critical sections. However, it does this not by requiring only that
processes keep taking steps, but also by preventing them from taking some
steps—namely, ones that produce a deadlocked state. A fairness property
shouldn’t do that.
Before going further, let’s see why UMSafe ∧ UMLive doesn’t allow such
a deadlocked state to be reached. The reason is that the formula satisfies
this invariant:

(4.7) ¬(pc(0) = pc(1) = w 2)

This is an invariant of UMSafe ∧ UMLive because it is true initially and
it can be made false only by a step in which a process p executes its wait
statement in a state s with pc(1 − p) = w 2; and we now show that such a
step cannot occur.
It’s an invariant of UMSafe, and hence of UMSafe ∧ UMLive, that
pc(p) = wait implies x (p) = false. Hence, pc(1−p) = w 2 and x (p) = false
in state s, which implies EhPNext(1 − p)iv is true. Therefore, UMLive im-
plies that 3hPNext(1 − p)iv must be true at state s of the behavior. This
implies that the process p step can’t occur, because it would lead to dead-
lock which would make such a hPNext(1 − p)iv step impossible. Therefore,
no step of the program can make (4.7) false. Since it is true in an initial
state, (4.7) is an invariant of UMSafe ∧ UMLive.
Thus, UMLive should not be the fairness property for algorithm UM ,
because it disallows a program step allowed by UMSafe. Before determining
what the fairness property should be, let’s characterize exactly what’s wrong
with property UMLive.

4.2.2.2 Machine Closure


The general principle illustrated by program UM is that fairness for a pro-
gram should require only that something eventually happens, so it should
rule out only infinite behaviors in which that thing never happens. It should
not rule out finite behaviors.
A liveness property by itself does not rule out any finite behaviors. A
liveness property L by definition allows any finite behavior to be completed
to a behavior that satisfies L. Property UMLive is a liveness property, so the
finite behavior in which the program reaches a deadlocked state can be com-
pleted to a behavior satisfying UMLive. For example, we can concatenate
to that finite behavior a complete behavior satisfying UMSafe ∧ UMLive.
(There are many behaviors of UMSafe that don’t deadlock.) Since that con-
catenation contains infinitely many steps of each process, it satisfies UMLive.
However, it doesn’t satisfy UMSafe because the step between the last state
of the deadlocked finite behavior (which is a deadlocked state) and the first
state of the appended complete behavior does not satisfy the program's next-state action.
It is impossible to complete that deadlocked behavior to a behavior satis-
fying both UMLive and UMSafe because the next-state action of UMSafe
does not allow any non-stuttering step from a deadlocked state.
For a liveness property L to be a fairness property for UMSafe, it should
not just require that any finite behavior can be completed to a behavior
satisfying L; it should require that any finite behavior that satisfies UMSafe
can be completed to a behavior that satisfies UMSafe ∧ L.
In general, a pair hS , Li, where S is a safety property and L a liveness
property, is defined to be machine closed iff every behavior satisfying S can
be completed to a behavior satisfying S ∧ L. We require that a fairness
property for a program having the safety property S be a liveness property
L such that hS , Li is machine closed. The following theorem, proved in the
Appendix, provides a nice mathematical characterization of machine closure.

Theorem 4.4 If S is a safety property and L a liveness property, then
hS , Li is machine closed iff |= C(S ∧ L) ≡ S .
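
For example, the pair hUMSafe, UMLive i is not machine closed: as shown
above, a finite behavior of UMSafe ending in a deadlocked state cannot be
completed to a behavior satisfying UMSafe ∧ UMLive. By Theorem 4.4, this
means that C(UMSafe ∧ UMLive) is strictly stronger than UMSafe.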

4.2.3 Weak Fairness


Let’s now see how to describe fairness with a machine-closed liveness prop-
erty. Using the requirement

(4.8) EhPNext(p)iv ; hPNext(p)iv

worked fine for program Increment. It failed for program UM . The reason it
worked for program Increment is that when PNext(p) is enabled, it remains
enabled until a PNext(p) step occurs. In program UM , process p can reach
a state in which pc(p) equals w 2 and PNext(p) is enabled, but process 1 − p
can then take a step that disables action PNext(p).
To obtain a machine-closed condition, we have to weaken (4.8) so it rules
out fewer behaviors. The obvious way to do that is by requiring a PNext(p)
step to occur not if PNext(p) just becomes enabled, but only if it remains
enabled until a PNext(p) step occurs. Let F U G be the temporal formula
asserting that F is true until G is true. (We’ll discuss later exactly what U
means.) As our choice of fairness property, we will replace property (4.8)
by:
(4.9) ( EhPNext(p)iv U hPNext(p)iv ) ; hPNext(p)iv
However, rather surprisingly, (4.9) is equivalent to:
(4.10) 2 EhPNext(p)iv ; hPNext(p)iv
To prove this, we first have to examine the definition of U. When we say
that E is true until P is true, we usually mean that P is eventually true
and E is true until P becomes true. But if E U P implies that 3P is true,
then (4.9) would be trivially true and thus useless, since it would assert that
hPNext(p)iv conjoined with some other condition leads to hPNext(p)iv . So
we have to interpret E U P to mean that either P is eventually true and E
is true until it is, or P is not eventually true and E is true forever. Thus,
whatever the precise meaning of U is, we have:
(4.11) |= (E U P ) ≡ (3P ∧ E U P ) ∨ (¬3P ∧ 2E )
The equivalence of (4.9) and (4.10) follows from this RTLA theorem, which
is proved in the Appendix.
Theorem 4.5 (4.11) implies |= ((E U P ) ; P ) ≡ (2E ; P ) for any
formulas E and P .
A liveness property commonly assumed of multiprocess algorithms, called
weak fairness, is that (4.10) is true for every process p. We generalize this
concept to define weak fairness of an arbitrary action A to be the formula
WFv (A) defined by:

(4.12) WFv (A) ≜ 2 EhAiv ; hAiv


Another form of fairness called strong fairness that is sometimes assumed
is discussed later. We will see how weak and strong fairness are used to
write machine-closed descriptions of abstract programs. A special case
of the general result is that if S is the formula Init ∧ 2[Next]v and Next
equals ∃ p ∈ Procs : PNext(p), then hS , ∀ p ∈ Procs : WFv (PNext(p))i is ma-
chine closed. But before we get to all that, let’s examine weak fairness.
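
For example, this special case implies that conjoining

   ∀ p ∈ Procs : WFv (PNext(p))

to the formula Init ∧ 2[Next]v describing program Increment produces a
machine-closed description. Because a process of Increment that has not
reached done remains enabled until it takes a step, this weak fairness
property implies 3(pc(p) = done) for every process p, the liveness property
discussed above.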
The first thing we observe is that for a multiprocess program described
with pseudocode, weak fairness of a process’s next-state action is equiva-
lent to the conjunction of weak fairness of all the actions described by the
process's statements. That is, if PNext(p) equals ∃ i ∈ I : Ai (p) for a set
I , action Ai (p) describes the statement with label i , and v is the tuple of
program variables, then:

(4.13) |= WFv (PNext(p)) ≡ ∀ i ∈ I : WFv (Ai (p))

This is because hPNext(p)iv is enabled iff hApc(p) (p)iv is enabled, in which
case pc(p) can be changed only by an hApc(p) (p)iv step. Thus, in any state
of a behavior with pc(p) = i , the formula EhPNext(p)iv ⇒ 3hPNext(p)iv
is equivalent to EhAi (p)iv ⇒ 3hAi (p)iv . A rigorous justification of (4.13)
is that it is a special case of Theorem 4.7 in Section 4.2.7 below.
The following tautology is useful for deducing properties from weak fair-
ness assumptions:

(4.14) |= WFv (A) ≡ (32 EhAiv ⇒ 23hAiv )

It makes weak fairness look stronger than the definition because 23hAiv
is a stronger property than 3hAiv . Here’s an informal proof of (4.14).
The definition of WFv (A) implies the right-hand side of the equivalence be-
cause 32 EhAiv implies that eventually hAiv is always enabled, whereupon
WFv (A) keeps forever implying that an hAiv step occurs, so there must be
infinitely many hAiv steps, making 23hAiv true. The opposite implica-
tion is true because 2 EhAiv implies 32 EhAiv , so the right-hand side of
the equivalence implies that 23hAiv is true and hence 3hAiv is true. A
rigorous proof of (4.14) is by the following RTLA reasoning (with these
definitions of F and G, the formula 23(¬F ∨ G) is not a TLA formula),
substituting EhAiv for F and hAiv for G:

2F ; G ≡ 2(2F ⇒ 3G)          by definition of ;
        ≡ 2(¬2F ∨ 3G)          by propositional logic
        ≡ 2(3¬F ∨ 3G)          by ¬2F ≡ 3¬F (3.25)
        ≡ 23(¬F ∨ G)           by (3.23)
        ≡ 23¬F ∨ 23G          by (3.27)
        ≡ ¬23¬F ⇒ 23G        by propositional logic
        ≡ 32F ⇒ 23G           since ¬23¬F ≡ 3¬3¬F ≡ 32¬¬F ≡ 32F by (3.25)

The following tautology is useful for proving a weak-fairness formula, be-
cause it has an additional hypothesis in the implication:

(4.15) |= WFv (A) ≡ (2 EhAiv ∧ 2[¬A]v ; hAiv )


It is proved by expanding the definition of ; and applying the proposi-
tional logic tautology |= (F ⇒ G) ≡ (F ∧ ¬G ⇒ G) and the TLA tautol-
ogy |= ¬3hAiv ≡ 2[¬A]v . Using (4.15) to prove WFv (A) is essentially a
proof by contradiction.
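
Written out, that proof is just:

   WFv (A) ≡ 2(2 EhAiv ⇒ 3hAiv )             by (4.12) and the definition of ;
           ≡ 2(2 EhAiv ∧ ¬3hAiv ⇒ 3hAiv )   by the propositional logic tautology
           ≡ 2(2 EhAiv ∧ 2[¬A]v ⇒ 3hAiv )   by the TLA tautology
           ≡ (2 EhAiv ∧ 2[¬A]v ) ; hAiv      by the definition of ;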
Proving these kinds of temporal logic tautologies is a good exercise.
However, there are temporal logic theorem provers that can do it for you.

4.2.4 Temporal Logic Reasoning


Thus far, the only properties we’ve verified that programs satisfy have been
invariance properties. Proving invariance requires no temporal logic rea-
soning. To prove |= Init ∧ 2[Next]v ⇒ 2Inv we prove the LA formulas
|= Init ⇒ Inv and |= Inv ∧ [Next]v ⇒ Inv ′ and then apply a single temporal
logic proof rule.
Nontrivial temporal logic reasoning is required for proving that programs
satisfy liveness properties. We often prove liveness properties of the form
P ; Q. This property asserts that something is true at all “times” in a
behavior—namely, whenever P is true, Q is eventually true. The description
Init ∧ 2[Next]v of a program cannot be used directly to prove P ; Q
because it asserts only that something is true initially. For that reason, the
first thing we do when proving P ; Q is to prove that some formula Inv is
an invariant of the program, so the program implies 2Inv ∧ 2[Next]v . We
then use 2Inv ∧ 2[Next]v to prove P ; Q.
A formula F that asserts something is true at all times is called a 2
formula. The formula 2Inv ∧ 2[Next]v is a 2 formula because it’s equiv-
alent to the RTLA formula 2(Inv ∧ [Next]v ). Because of the tautology
|= 22F ≡ 2F , we can define F to be a 2 formula iff it’s equivalent to
2F . By (3.17), the conjunction of 2 formulas is a 2 formula. In general,
(3.18) implies ∀ i ∈ S : F i is a 2 formula if each F i is a 2 formula.
In a proof, we almost always want every temporal logic formula asserted
by a statement to be a 2 formula. A theorem with statement F asserts
|= F . The proof rule (3.16) tells us that |= F implies |= 2F . This means
that whenever we prove F , we have proved 2F . However, that does not
mean that when we have proved a step in a proof that asserts F , we have
proved 2F . When we prove a step

4.2.7. G ; H

we have not proved |= G ; H . We have proved |= Asp ⇒ (G ; H ), where
Asp is the conjunction of all the assumptions in effect at statement 4.2.7. By
the proof rule (3.21), |= Asp ⇒ (G ; H ) implies |= 2Asp ⇒ 2(G ; H ).
If Asp is a 2 formula, so it's equivalent to 2Asp, then proving 4.2.7 proves
|= Asp ⇒ 2(G ; H ). Therefore proving 4.2.7 is equivalent to proving

4.2.7. 2(G ; H )

if Asp is a 2 formula. Since the conjunction of 2 formulas is a 2 formula,
Asp is a 2 formula if all the assumptions in effect at statement 4.2.7 are 2
formulas.
We ensure that every statement that asserts a temporal formula F asserts
2F by making every temporal formula in an assume clause be a 2 formula.
Any temporal formula F asserted by a statement can then be considered to
assert 2F . The following are 2 formulas for all temporal formulas F and
G, state expressions v , and actions A: 23F (obviously), 32F by (3.29),
F ; G by definition of ;, and WFv (A) by the definition (4.12) of WF.

4.2.5 Reasoning With Weak Fairness


We now see how to show that an abstract program with weak fairness con-
ditions satisfies a liveness property. We will do this with a modification
of algorithm UM called the One-Bit algorithm that is an acceptable mu-
tual exclusion algorithm. But first, we examine what liveness condition the
algorithm should satisfy.

4.2.5.1 Liveness for Mutual Exclusion


The liveness condition Dijkstra required of the mutual exclusion algorithm
outlined in Figure 4.1 was that if some process is at statement wait, then
eventually some process enters its critical section—expressed by:

(4.16) (∃ p ∈ Procs : pc(p) = wait) ; (∃ p ∈ Procs : pc(p) = cs)

This condition is usually called deadlock freedom. That's a misuse of the
term, because deadlock freedom is actually the safety property asserting that
the program never reaches a deadlocked state—one in which no process
can take a step. Property (4.16) also rules out what is called livelock, in
which no process enters its critical section although some processes keep
executing statements in their waiting sections. However, when discussing
mutual exclusion, we will use deadlock freedom to mean property (4.16).
This condition allows one or more processes to be starved—that is, to remain
forever in their waiting section—while other processes enter and leave the
critical section.
variables x = (p ∈ {0, 1} 7→ false) ;
process p ∈ {0, 1}
   variables pc = ncs ;
   while true do
      ncs:  skip ;
      wait: x [p] : = true ;
      w 2:  if p = 0 then await ¬ x [1]
            else if x [0] then w 3: x [1] : = false ;
                               w 4: await ¬ x [0] ;
                               goto wait
                 end if
            end if ;
      cs:   skip ;
      exit: x [p] : = false
   end while
end process

Figure 4.3: Algorithm OB.

Dijkstra also required that processes be allowed to remain forever in
their noncritical sections. Just because a process might send output to the
printer, we don’t want to insist that it does. This requirement rules out
simple algorithms in which processes take turns entering the critical section.
A process that does not want to enter its critical section cannot be required
to do anything to allow other processes to enter their critical sections.

4.2.5.2 The One-Bit Algorithm


The basic idea of the One-Bit algorithm is to modify algorithm UM to pre-
vent deadlock by having process 1 wait when both processes are concurrently
trying to enter the critical section. This is done by modifying process 1 so
that if it sees that x [0] equals true in statement w 2, then it sets x [1] to
false and waits until x [0] equals false (so process 0 has exited its crit-
ical section) before going back to statement wait and trying again. The
pseudocode for the algorithm, which we call OB , is in Figure 4.3. (Algorithm
OB is the two-process case of an N -process algorithm that was discovered
independently by James E. Burns and me in the 1970s, but not published
until later [7, 29].)
Algorithm OB satisfies mutual exclusion because processes use the same
protocol to enter the critical section as algorithm UM : Each process p sets
x [p] to true and can then enter the critical section only if x [1 − p] equals
false. In fact, that OB satisfies mutual exclusion can be proved with the
same inductive invariant (4.6) as UM except that the type invariant TypeOK
must be modified because pc(1) can now also equal w 3 or w 4. For algorithm
OB , we define:

(4.17) TypeOK ≜ ∧ x ∈ ({0, 1} → {true, false})
                ∧ pc ∈ ({0, 1} → {ncs, wait, w 2, w 3, w 4, cs, exit})
                ∧ pc(0) ∈/ {w 3, w 4}

(For any infix predicate symbol like = or ∈, putting a slash through the
symbol negates it, so e ∈/ S means ¬(e ∈ S ).)
However, the resulting inductive invariant isn’t strong enough for proving
liveness. We now consider liveness.
Let OBSafe, the safety property of OB described by the pseudocode, be
the formula Init ∧ 2[Next]v , where v equals hx , pc i and

Next ≜ ∃ p ∈ {0, 1} : PNext(p)
The fairness condition we want OB to satisfy is weak fairness of each pro-
cess’s next-state action except when the process is in its noncritical section.
A process p remaining forever in its noncritical section is represented in our
abstract program by no PNext(p) step occurring when pc(p) equals ncs.
The fairness condition we assume of program OB is therefore:

OBFair ≜ ∀ p ∈ {0, 1} : WFv ((pc(p) 6= ncs) ∧ PNext(p))
The formula OBSafe ∧ OBFair , which we call OB , satisfies the liveness
property that if process 0 is in its waiting section, then it will eventually
enter its critical section. That is, OB implies:
(4.18) (pc(0) ∈ {wait, w 2}) ; (pc(0) = cs)
This implies deadlock freedom, because if process 0 stops entering and leav-
ing its critical section, then it eventually stays forever in its noncritical
section. If process 1 is then in its waiting section, it will read x [0] equal to
false and enter its critical section.
The inductive invariant obtained from the inductive invariant of UM
isn’t strong enough because it doesn’t assert that x [p] = false when process
p is in its noncritical, which is at the heart of why OB is deadlock free. For
that we need this stronger invariant, where TypeOK is defined by (4.17):
(4.19) ∧ TypeOK
∧ x [0] ≡ (pc[0] ∈ {“w2”, “cs”, “exit”})
∧ x [1] ≡ (pc[1] ∈ {“w2”, “w3”, “cs”, “exit”})
∧ ∀ p ∈ {0, 1} : (pc[p] = “cs”) ⇒ (pc[1 − p] 6= “cs”)
Figure 4.4: Leads-to lattice for the proof of (4.18). [Diagram: inside a box
labeled 2Inv ∧ 2[Next]v ∧ OBFair , two edges numbered 1 lead from the
node pc(0) ∈ {wait, w 2} to the nodes pc(0) = wait and pc(0) = w 2; an
edge numbered 2 leads from pc(0) = wait to pc(0) = w 2; and an edge
numbered 3 leads from pc(0) = w 2 to the sink node pc(0) = cs.]

4.2.5.3 Proving Liveness


We will now see how to reason more rigorously about liveness. Even if
you never write rigorous correctness proofs, learning how to reason about
liveness will help you better understand liveness properties.
There are two kinds of liveness properties that we prove: that a program
implies leads-to properties such as (4.16), and that a program implies the
fairness properties of a more abstract program. Here we consider leads-to
properties. Proving fairness properties is discussed in Section 5.4.
The proof of a leads-to formula is usually decomposed into proving sim-
pler leads-to formulas. Figure 4.4 shows how we decompose the proof of
formula (4.18) using what we call a leads-to lattice.
First, let’s pretend that the box and the formula labeling it aren’t there.
We then have just a directed graph whose nodes are formulas. A formula F
and its outgoing edges represent the assertion that F leads to the disjunction
of the formulas those edges point to. Thus, the two edges numbered 1 assert
the formula:

(pc(0) ∈ {wait, w 2}) ; ((pc(0) = wait) ∨ (pc(0) = w 2))

By the meaning of leads to, the property asserted by each formula F in the
graph means that if the program is ever in a state for which F is true, then
it will eventually be in a state satisfying a formula pointed to by one of the
outgoing edges from F . The graph has a single sink node (one having no
outgoing edge). Every path in the graph, if continued far enough, leads to
the sink node. By transitivity of the ; relation, this means that if all the
properties asserted by the diagram are true of a behavior, then the behavior
satisfies the property F ; H , where H is the sink-node formula and F
is any formula in the lattice. In particular, the properties asserted by the
diagram imply formula (4.18). By (3.31), that every formula in the graph
leads to the sink-node formula means that the disjunction of all the formulas
in the graph leads to the sink-node formula.
Now to explain the box. Let Λ equal 2Inv ∧ 2[Next]v ∧ OBFair , the
formula that labels the box. Formula Λ is implicitly conjoined to each of the
formulas in the graph. It is a 2 formula, since the conjunction of 2 formulas
is a 2 formula, and OBFair is the conjunction of WF formulas, which are
2 formulas.
Since Λ is conjoined to every formula in it, the leads-to lattice makes
assertions of the form

Λ ∧ G ; (Λ ∧ H 1 ) ∨ . . . ∨ (Λ ∧ H j )

Since Λ equals 2Λ, and once 2Λ is true it is true forever, this formula is
equivalent to Λ ∧ G ; H 1 ∨ . . . ∨ H j . (This follows from (3.32c) and
propositional logic.)
If H is the unique sink node of the lattice, then proving the assertions
made by the lattice proves |= Λ ∧ G ; H for every node G of the lat-
tice. Since Λ equals 2Λ, it’s easy to see that a behavior that satisfies
Λ ∧ G ; H must satisfy Λ ⇒ (G ; H ), so proving |= Λ ∧ G ; H
proves |= Λ ⇒ (G ; H ). All this is true only because the formula Λ is a
2 formula. In general, we label a box in a leads-to lattice only with a 2
formula.
Remember what the proof lattice of Figure 4.4 is for. We want to prove
that OB implies (4.18). By proving the assertions made by the proof lattice,
we show that the formula Λ labeling the box implies (4.18). By definition of
OB and because OB implies 2Inv , formula Λ is implied by OB . Therefore,
by proving the leads-to properties asserted by the proof lattice, we prove that
OB implies (4.18). Note how we had to use the 2 formula 2Inv ∧ 2[Next]v
instead of OBSafe, which is true only initially.
To complete the proof that OB implies (4.18), we now prove the leads-
to properties asserted by Figure 4.4. The leads-to property asserted by the
edges numbered 1 is:

Λ ∧ (pc(0) ∈ {wait, w 2}) ; ((pc(0) = wait) ∨ (pc(0) = w 2))

It is trivially true, since pc(0) ∈ {wait, w 2} implies that pc(0) equals wait
or w 2, and 2(F ⇒ G) implies F ; G.
The formula Λ ∧ (pc(0) = wait) ; (pc(0) = w 2) asserted by edge num-
ber 2 is true because Λ implies 2Inv ∧ 2[Next]v and the weak fairness
assumption of process 0, which imply (pc(0) = wait) ; (pc(0) = w 2).
The formula

(4.20) Λ ∧ (pc(0) = w 2) ; (pc(0) = cs)


Figure 4.5: Leads-to lattice for the proof of (4.20). [Diagram: inside a box
labeled 2Inv ∧ 2[Next]v ∧ OBFair , two edges numbered 1 lead from the
node (pc(0) = w 2) to the sink node pc(0) = cs and to the node
(pc(0) = w 2) ∧ 2(pc(0) 6= cs); an edge numbered 2 leads from that node
to 2((pc(0) = w 2) ∧ x (0)), which is also the label of an inner box; two
edges numbered 3 lead from that formula into the inner box, to the nodes
pc(1) ∈ {cs, exit, ncs} and pc(1) ∈ {wait, w 2, w 3, w 4}; two edges
numbered 4 lead from pc(1) ∈ {cs, exit, ncs} to 2(pc(1) = ncs) and to
pc(1) ∈ {wait, w 2, w 3, w 4}; an edge numbered 6 leads from
pc(1) ∈ {wait, w 2, w 3, w 4} to 2(pc(1) = w 4); edges numbered 5 and 7
lead from 2(pc(1) = ncs) and 2(pc(1) = w 4) to 2¬x (1); and an edge
numbered 8 leads from 2¬x (1) to the sink node pc(0) = cs.]

asserted by edge number 3 is the interesting one. Its proof is decomposed
with the proof lattice of Figure 4.5.
The property asserted by the edges numbered 1 in this leads-to lattice has
the form Λ ∧ F ; (G ∨ (F ∧ 2¬G)). This formula is a tautology. Intuitively,
it’s true because if F is true now, then either G is true eventually or F is
true now and ¬G is true from now on. It’s proved by:
Λ ∧ F ⇒ F ∧ (3G ∨ 2¬G)        since 3G ∨ 2¬G equals true
       ⇒ 3G ∨ (F ∧ 2¬G)        by propositional logic
       ⇒ 3G ∨ 3(F ∧ 2¬G)       by (3.23)
       ≡ 3(G ∨ (F ∧ 2¬G))      by (3.23)

which shows |= Λ ∧ F ⇒ 3(G ∨ (F ∧ 2¬G)), from which we can deduce
|= Λ ∧ F ; (G ∨ (F ∧ 2¬G)) by (3.20) and the definition of ;.
The leads-to formula asserted by edge 2 is an implication. It’s true be-
cause 2Inv ∧2[Next]v implies that, if pc(0) = w 2 and 2(pc(0) 6= cs) are ever
true, then pc(0) = w 2 must remain true forever, which by 2Inv implies x (0)
must equal true forever. It is proved by proving that (pc(0) = w 2) ∧ x (0)
is an invariant of an abstract program. The initial predicate of this program
is Inv ∧ (pc(0) = w 2). Its next-state relation is:

Next2 ≜ Inv ∧ Next ∧ (pc(0) 6= cs)′


The formula 2[Next2]v is implied by 2Inv and 2[Next]v and the conjunct
2(pc(0) 6= cs) of the formula at the tail of the edge 2 arrow. (Note that the
prime in this formula is valid because pc(0) 6= cs always true implies that
it’s always true in the next state.) We are using an invariance property of
one program to prove a liveness property of another program. This would
seem strange if we were thinking in terms of code. But we’re thinking
mathematically, and a mathematical proof contains lots of formulas. It’s
not surprising if one of those formulas looks like the formula that describes
a program.
The edges numbered 3 enter a box whose label is the same formula from
which those edges come. In general, an edge can enter a box with a label
2F if it comes from a formula that implies 2F . This is because a box
labeled 2F is equivalent to conjoining 2F to all the formulas in the box,
and 2F ; (G 1 ∨ . . . ∨ G n ) implies 2F ; ((2F ∧ G 1 ) ∨ . . . ∨ (2F ∧ G n )).
An arrow can always leave a box, since removing the formula it points to
from the box just weakens that formula.
Proofs of the assertions represented by the rest of the lattice’s edges are
sketched below.

edges 3 The formula represented by these edges is true because the dis-
junction of the formulas they point to asserts that pc(1) is in the set
{ncs, wait, w 2, w 3, w 4, cs, exit}, which is implied by 2Inv .

edges 4 If pc(1) equals cs or exit, then 2Inv ∧ 2[Next]v and the fairness
condition for process 1 imply that it will eventually be at ncs. Either
pc(1) equals ncs forever or eventually it will not equal ncs. In the
latter case, 2[Next]v implies that the step that makes pc(1) = ncs
false must make pc(1) = wait true.

edge 5 This is an implication since 2Inv implies that if process 1 is forever
at ncs, then x (1) is forever false.

edge 6 If process 1 is at wait, w 2, or w 3, then its weak fairness condition
implies it is eventually at w 4. When process 1 is at w 4, formulas
2[Next]v and 2x (0) (from the label of the inner box) imply that it
must remain forever at w 4.
edge 7 This is an implication, because Inv and pc(1) = w 4 imply ¬x (1).

edge 8 2¬x (1), 2(pc(0) = w 2) (implied by the inner box’s label), and
OBFair imply that a process 0 step that makes pc(0) equal to cs must
eventually occur. (Equivalently, these three formulas are contradic-
tory, so they imply false which implies anything.)

The proof sketches of the properties asserted by edges 4 and edge 6 skim
over more details than the proofs of the other properties asserted by the
lattice. A more detailed proof would be described by a lattice in which each
of the formulas pointed to by the edges numbered 3 was split into multiple
formulas—for example, the formula pc(1) ∈ {cs, exit, ncs} would be split
into the formulas pc(1) = cs, pc(1) = exit, and pc(1) = ncs. A good check
of your understanding is to draw the more detailed lattice and write proof
sketches for its new edges.

4.2.6 Strong Fairness


4.2.6.1 Starvation Free Mutual Exclusion
Mutual exclusion was not motivated by sharing a printer. It’s needed when
multiple processes perform operations on the same data. As we saw from
the Increment example of Section 3.3, even sharing a simple counter without
synchronization can result in increment operations being lost. An easy way
to synchronize data sharing is to put every operation to the shared data in
a critical section.
The One-Bit Algorithm OB implements mutual exclusion with processes
that communicate using only simple reads and writes of shared variables.
Synchronizing processes in this way is inefficient. Dijkstra proposed the
communication mechanism called a binary semaphore or lock. A lock is a
variable that can have two values, traditionally taken to be 0 and 1. Let’s
call that variable sem. Initially sem equals 1. A process can execute two
atomic operations, P (sem) and V (sem), to the lock. These operations can
be described in pseudocode as:

   P (sem):  await sem = 1 ;
             sem : = 0

   V (sem):  sem : = 1
Locks were originally implemented with operating system calls. Modern
multiprocessor computers provide machine instructions to implement them.
variables sem = 1 ;
process p ∈ Procs
   while true do
      ncs:  skip ;
      wait: P (sem) ;
      cs:   skip ;
      exit: V (sem)
   end while
end process

Figure 4.6: Program LM : mutual exclusion with a semaphore.

Using a lock, mutual exclusion for any set Procs of processes can be imple-
mented with the trivial algorithm LM of Figure 4.6.
Let PNext(p) now be the next-state action of process p of program LM .
With weak fairness of (pc(p) 6= ncs) ∧ PNext(p) for each process p as its
fairness property, algorithm LM satisfies the deadlock freedom condition
(4.16). However, deadlock freedom allows individual processes to be starved,
remaining forever in the waiting section.
Let Wait(p), Cs(p), and Exit(p) be the actions described by the state-
ments in process p with the corresponding labels wait, cs, and exit. Weak
fairness of (pc(p) 6= ncs)∧PNext(p) is equivalent to the conjunction of weak
fairness of the actions Wait(p), Cs(p), and Exit(p). Program LM allows
starvation of individual processes because weak fairness of the actions of
each process ensures only that if multiple processes are waiting to execute
an action, then some process will execute it. But if processes continually
reach the wait statement, an individual process p may never get to execute
Wait(p).
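
For example, the actions described by the statements labeled wait and
exit can be written as in this sketch, where v is the tuple hsem, pc i:

   Wait(p) ≜ (pc(p) = wait) ∧ (sem = 1)
               ∧ (sem ′ = 0) ∧ (pc ′ = (pc except p 7→ cs))
   Exit(p) ≜ (pc(p) = exit) ∧ (sem ′ = 1)
               ∧ (pc ′ = (pc except p 7→ ncs))

so EhWait(p)iv equals (pc(p) = wait) ∧ (sem = 1), a predicate that other
processes make true and false by executing their Exit and Wait actions.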
It’s reasonable to require the stronger condition of starvation freedom,
which asserts that no process starves. This is the property
(4.21) ∀ p ∈ Procs : (pc(p) = wait) ; (pc(p) = cs)
which asserts that any process reaching wait must eventually enter its critical
section. For LM to satisfy this property, it needs a stronger fairness property
than weak fairness of the Wait(p) actions.

4.2.6.2 The Definition of Strong Fairness


By the usual meaning of fair, the fairest lock would be one in which processes
execute their Wait(p) actions in the order in which they set pc(p) to wait.
Some implementations of locks ensure this property. However, we don't
consider it to be a fairness property because it produces a description of
program LM that is not machine closed. It rules out finite behaviors in which
processes execute their Wait(p) actions in the wrong order—executions in
which a process p reaches the wait statement after process q, but p enters
the critical section before q does. Machine closure means that the liveness
condition does not forbid any finite behaviors allowed by the program’s
safety property described by the pseudocode.
There is a standard way to strengthen weak fairness that produces ma-
chine-closed program descriptions. Weak fairness of an action A asserts
that if ever A is always enabled, then an A action must eventually occur.
To strengthen this condition, we replace the requirement that A be always
enabled by the weaker requirement that it be infinitely often enabled. We
therefore define strong fairness of A by:
(4.22) SFv (A) ≜ 23 EhAiv ; hAiv

Analogous to formulas (4.14) and (4.15) for weak fairness are:


(4.23) |= SFv (A) ≡ (23 EhAiv ⇒ 23hAiv )

(4.24) |= SFv (A) ≡ (23 EhAiv ∧ 2[¬A]v ; hAiv )

The informal justification and the proof of (4.23) are similar to the ones for
(4.14). The proof of (4.24) is essentially the same as that of (4.15).

4.2.6.3 Using a Strongly Fair Semaphore


To make program LM starvation free, meaning that it satisfies (4.21), we
conjoin to the safety property LMSafe defined by the pseudocode the fairness
property LMFair equal to ∀ p ∈ Procs : LMPFair (p), where LMPFair (p) is
the fairness requirement for process p. If we let LMPFair (p) be the conjunc-
tion of strong fairness of Wait(p) and weak fairness of Cs(p) and Exit(p),
then the following argument shows LMSafe ∧LMFair implies starvation free-
dom. Starvation means that a process p waits forever with pc(p) = wait.
That is possible only if other processes keep entering and leaving their crit-
ical sections. But whenever a process executes the Exit action, it sets sem
to 1, which makes EhWait(p)iv true. Thus EhWait(p)iv must be true in-
finitely often, which by strong fairness of Wait(p) implies that Wait(p) is
eventually executed, so p must enter its critical section.
There are several ways of writing formula LMPFair (p). First, we ob-
serve that weak and strong fairness are equivalent for the actions Cs(p) and
Exit(p). This is because the action is enabled iff pc(p) has the appropriate
value, so it remains enabled until a step of that action occurs to change
pc(p). Thus, when the action is enabled, it is continuously enabled until it
is executed. We can therefore write LMPFair (p) as the conjunction of strong
fairness of the three actions Wait(p), Cs(p), and Exit(p).
The same sort of reasoning that led to (4.13) of Section 4.2.3, as well as
Theorem 4.7 of Section 4.2.7, implies that the conjunction of strong fairness
of these three actions is equivalent to strong fairness of their disjunction.
Therefore, we can write LMPFair (p) as strong fairness of their disjunction,
which equals (pc(p) 6= ncs) ∧ PNext(p).
While SFv ((pc(p) 6= ncs) ∧ PNext(p)) is compact, I prefer not to define
LMPFair (p) this way because it suggests to a reader of the formula that
strong fairness of Cs(p) and Exit(p) is required, although only weak fairness
is. Usually, the process’s next-state action will be the disjunction of many
actions, and strong fairness is required of only a few of them. I would define
LMPFair (p) to equal

WFv ((pc(p) 6= ncs) ∧ PNext(p)) ∧ SFv (Wait(p))

This is redundant because the first conjunct implies weak fairness of Wait(p)
and the second conjunct asserts strong fairness of it. But a little redundancy
doesn’t hurt, and its redundancy should be obvious because strong fairness
implies weak fairness.

4.2.7 Properties of WF and SF


Several assertions were made above about weak and strong fairness. We now
assert the general theorems from which those assertions follow. The first is
that taking weak and strong fairness conditions as the liveness property
produces a machine-closed description of a program. This is true not just
for a single weak or strong fairness property or the conjunction of a finite
number of them. It is true for a conjunction of a countable set of weak
and/or strong fairness conditions, if each of those conditions asserts fairness
of a subaction of the program’s next-state action.
An action A is defined to be a subaction of an action Next iff |= A ⇒ Next
is true. The subactions of the next-state action Next for which fairness is as-
serted are usually disjuncts in the definition of Next—for example, Wait(p)
is a subaction of the next-state action Next of algorithm OB . Often, we as-
sert fairness of an action P ∧A where A is a disjunct in the definition of Next
and P is a state predicate—for example, the action (pc(p) 6= ncs) ∧ PNext(p)
of OB . (All the results that will be stated are satisfied if we allow a subaction
A to be one satisfying |= Inv ∧ A ⇒ Next, where Inv is an invariant of the
program. However, this generalization does not seem to be needed in
practice.)
We now state the theorem asserting that the conjunction of fairness prop-
erties produces a machine-closed specification. Its proof is in the Appendix.

Theorem 4.6 Let Init be a state predicate, Next an action, and v a tuple
of all variables occurring in Init and Next. If Ai is a subaction of Next for
all i in a countable set I , then the pair

h Init ∧ 2[Next]v , ∀ i ∈ I : XFiv (Ai ) i

is machine closed, where each XFiv may be either WFv or SFv .
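
For example, Theorem 4.6 implies that OBSafe ∧ OBFair is machine
closed, since each action (pc(p) 6= ncs) ∧ PNext(p) whose weak fairness
OBFair asserts is a subaction of the next-state action of OB .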

Writing an infinite conjunction of fairness properties may not seem to be
something we would do in practice. However, the next-state action of an
abstract program sometimes does contain an infinite disjunction—that is,
existential quantification over an infinite set of subactions—and we might
want a fairness condition for each of those subactions. For example, a pro-
gram that dynamically creates processes may be described as having an
infinite number of processes, only a finite number of which have their next-
state action enabled at any time. We might want a fairness condition for
each of those processes.
It was stated above that, for program LM , weak or strong fairness of
(pc(p) 6= ncs) ∧ PNext(p) is equivalent to weak or strong fairness of its
three subactions Wait(p), Cs(p), and Exit(p). That’s true because when
any one of these subactions is enabled, no step of either of the other two
subactions can occur until a step of that subaction occurs. Let Q equal
(pc(p) 6= ncs) ∧ PNext(p) and let’s call its three subactions A1 , A2 , and A3 .
This condition can then be asserted as:

(4.25) ∀ i ∈ 1 . . 3 : EhAi iv ⇒ ( EhQ iv ⇒ EhAi iv ) U hAi iv

where U is the until operator with which we first defined weak fairness
of PNext(p) as (4.8). Similarly to what we did for weak fairness, we can
remove the U by observing that F U G implies that if G is never true,
then F must remain true forever. That hAi iv is never true is asserted by
¬3hAi iv , which is equivalent to 2[¬Ai ]v . Therefore (4.25) implies

(4.26) ∀ i ∈ 1 . . 3 : EhAi iv ∧ 2[¬Ai ]v ⇒ 2( EhQ iv ⇒ EhAi iv )


While (4.25) implies (4.26), the formulas are not equivalent. Formula (4.26)
is strictly weaker than (4.25). However, it’s strong enough to imply that
strong or weak fairness of all the Ai is equivalent to strong or weak fairness
of Q—assuming that Q is the disjunction of the Ai . Here is the precise
theorem; its proof is in the Appendix.

Theorem 4.7 Let Ai be an action for each i ∈ I , let Q = ∃ i ∈ I : Ai ,
and let XF be either WF or SF. Then

|= ( ∀ i ∈ I : 2( EhAi iv ∧ 2[¬Ai ]v ⇒
                   2[¬Q]v ∧ 2( EhQ iv ⇒ EhAi iv ) ) )
   ⇒ ( XFv (Q) ≡ ∀ i ∈ I : XFv (Ai ) )

It is perhaps interesting, but of no practical significance, that the theorem
is valid even if the set of actions Ai is uncountably infinite.
The completeness result for safety properties in Theorem 4.1 can be
extended as follows to arbitrary properties. A proof is sketched in the Ap-
pendix.

Theorem 4.8 Let x be the list x 1 , . . . , x n of variables and let F be a
property such that F (σ) depends only on the values of the variables x in σ,
for any behavior σ. There exists a formula S equal to Init ∧ 2[Next]hx,y i ∧
WFhx,y i (Next), where Init and Next are defined in terms of F , y is a variable
not among the variables x, and the variables of S are x and y, such that
|= F ⇒ [[G]] iff |= S ⇒ G, for any property G. If F is a safety property,
then the conjunct WFhx,y i (Next) is not needed.

Like Theorem 4.1, this result is of theoretical interest only.

4.2.8 What is Fairness?


Before TLA, concurrent abstract programs were generally written in some-
thing like a coding language. Fairness meant that each process had to exe-
cute its next atomic statement when it could. Viewed in terms of TLA, each
atomic statement was described by an action, and fairness meant fairness
of those actions. Usually that meant weak fairness of the action, but when
the statement was a synchronization primitive, it sometimes meant strong
fairness. For rigorous reasoning, those fairness requirements were expressed
as requirements on when control in the process had to move from one control
point to another [44].
With TLA, fairness was generalized to weak and strong fairness of arbi-
trary actions. We have considered a fairness property for a safety property
S to be a formula L that is the conjunction of weak and strong fairness con-
ditions on actions such that hS , Li is machine closed. However, weak and
strong fairness of an action are defined in terms of how the action is written,
not in terms of its semantics. While we have given semantic definitions of
safety, liveness, and machine closure, we have not done so for fairness.
I only recently learned that a semantic definition of fairness was pub-
lished in 2012 by Völzer and Varacca [48]. Their definition of what it means
for a property L to be a fairness property for a safety property S can be
stated in terms of the following infinite two-player game. Starting with seq
equal to the empty sequence, the two players forever alternately take steps
that append a finite number of states to seq. The only requirement on the
steps is that after each one, seq must satisfy S . The second player wins
the game if she makes seq an infinite sequence that satisfies L. (Since S
is a safety property, seq must satisfy S .) They defined L to be a fairness
property for S iff the second player can always win, regardless of what the
first player does (as long as he follows the rules).
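
For example, UMLive is not a fairness property for UMSafe by this
definition: the first player can extend seq to a finite behavior of UMSafe
ending in the deadlocked state, and we saw in Section 4.2.2.2 that no
completion of such a behavior satisfies UMSafe ∧ UMLive, so the second
player cannot win.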
It is mathematically meaningless to say that a definition is correct. How-
ever, this seems to be the only reasonable definition that includes weak and
strong fairness such that fairness implies machine closure and the conjunc-
tion of countably many fairness properties is a fairness property. This defi-
nition also encompasses other fairness properties that have been proposed,
including one called hyperfairness [33].
I believe that weak and strong fairness of actions are the only fairness
properties that are relevant to abstract programs. However, this general
definition is interesting because it provides another way to think about fair-
ness. More importantly, it’s interesting because concepts we are led to by
mathematics often turn out to be useful.

4.3 Possibility and Accuracy


4.3.1 Possibility Conditions
Informally, a safety property states what an abstract program is allowed to
do and a liveness property states what it must do. If a behavior violates a
safety property, then it does so at a particular step in the behavior. There-
fore, we can also view a safety property as stating what a program must not
do—that is, it must not take a step that violates the property. So, liveness
says what must happen and safety says what must not happen. That a
program eventually sets x to 0 is a liveness property; that it never sets x to
0 is a safety property.
Possibility says what might happen. That a program might set x to 0 is
a possibility condition. It is not a property, because it is not a predicate on
behaviors. It is satisfied by an abstract program iff there is some behavior
of the program in which x is set to 0. We can tell that the program satisfies
it if we see such a behavior. But seeing one behavior that doesn’t satisfy it
doesn’t tell us whether or not some other behavior might satisfy it. However,
we will see that possibility conditions can be expressed as properties that
explicitly mention the program’s actions.
Knowing that something might be true of a system, but knowing noth-
ing about the probability of its being true, is of almost no practical use.
(Section 7.1 explains one way in which a possibility condition can be used
to verify a property of a system and could therefore be of practical use.)
The only way I know of calculating such probabilities is to view the abstract
program as a state-transition system, attach probabilities to the various
transitions, and mathematically analyze that system—for example, using
Markov analysis. Usually, the state-transition system would be a more ab-
stract program implemented by the program of interest.
While possibility conditions of systems are of little interest, we don’t
reason about systems; we reason about abstract programs that describe sys-
tems. Verifying that a program satisfies a possibility condition can be a way
of checking what we will call here the accuracy of an abstract program—that
it accurately describes the system it is supposed to describe. For example,
if the system doesn’t control when users send it input, a program that ac-
curately describes the system and its users should satisfy the condition that
it’s always possible for users to enter input.

4.3.2 Expressing Possibility in TLA


There exist logics for expressing possibility properties and tools for checking
them. Such tools could be built to check those properties for abstract TLA
programs, and it might be worthwhile to do so. But we will see how that
can be avoided.
Even though a possibility condition is not a property, that an abstract
program satisfies a possibility condition can be expressed by a TLA formula
that depends on the program and the possibility condition. For example,
suppose S is a TLA description of an abstract program and the action Input
is a subaction of its next-state action that describes users entering input.
That it is always possible for users to enter input could be considered to mean
that the Input action is enabled in every reachable state of the program,
which is asserted by
|= S ⇒ 2 E(Input)
However, “always possible” might instead mean that from any reachable
state, there is a sequence of possible steps that reach a state with E(Input)
true—a condition we will call “always eventually possible”. To express this
and other possibility conditions in TLA, we can use the action composition
operator defined in Section 3.4.1.4. Recall that for any action A, the action
A+ is true of a step s → t iff t is reachable from s by a sequence of one or
more A steps.
Now consider an abstract program Init ∧ 2[Next]v where Init is a state
predicate, Next is an action, and v is the tuple of all variables that appear
in Init or Next. Let’s abbreviate ([Next]v )+ as [Next]+v . If s is a reachable
state of the program, then s → t is a [Next]+v step iff it is possible for an
execution of the program to go from state s to state t. (Since [Next]v allows
stuttering steps, t can equal s. In fact, [Next]+v is equivalent to [Next +]v .)
A state t is a reachable state of the program iff there is a state s satisfying
Init such that s → t is a [Next]+v step. In other words, t is a reachable state
of the program iff there is a state s such that s → t is an Init ∧ [Next]+v
step.
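For example, if Next is the action x ′ = x + 1 and x is the only variable
(so v is x ), then a [Next]v step either increments x by 1 or leaves x
unchanged; hence s → t is a [Next]+v step iff the value of x in t equals its
value in s plus a natural number.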
We can now express the condition that it is always eventually possible for
the user to enter input, meaning that from any reachable state, it is possible
to reach a state in which E(Input) is true. We generalize this condition by
replacing E(Input) with an arbitrary state predicate P . For the abstract
program Init ∧ 2[Next]v , that P is always eventually possible is expressed
as:
(4.27) |= Init ∧ 2[Next]v ⇒ 2 E([Next]+v ∧ P ′ )

This example indicates that it should be possible to express possibility
conditions in TLA using Next + and the E operator. However, like (4.27),
the resulting TLA formulas are quite different from the ones that arise in
checking that an abstract program satisfies a property. Different tools would
be needed to verify that a program satisfies a possibility condition expressed
in this way. It would be nice to be able to verify possibility conditions by
verifying the same kind of properties that arise in verifying that an abstract
program satisfies a liveness property. Here is how it can be done for the
condition that it is always possible to reach a state satisfying P .
This condition obviously holds if the program satisfies the property that
in any reachable state, a state satisfying P must eventually occur—that is,
if the program satisfies the property 23P . Let S equal Init ∧ 2[Next]v .
The safety property S will not imply the liveness property 23P unless P
is true in all reachable states of S —that is, unless S implies 2P . However,
if F is a fairness property for S , so ⟨S , F ⟩ is machine closed, then S ∧ F
has the same set of reachable states as S . So, any state satisfying P can
be reached from a reachable state of S iff it can be reached from a state
satisfying S ∧ F . Therefore, it suffices to verify that a state satisfying P
can be reached from every reachable state of S ∧ F , which is true if S ∧ F
implies 23P . Therefore, we can verify that P is always eventually possible
by verifying

|= Init ∧ 2[Next]v ∧ F ⇒ 23P

for a fairness property F of Init ∧ 2[Next]v . By Theorem 4.6, we can ensure
that F is a fairness property for S by writing it as the conjunction of strong
fairness properties. (Since strong fairness implies weak fairness, there is no
need to use weak fairness properties.) There is a completeness result that
essentially says that such a fairness property always exists for any state
predicate P if S has the standard form Init ∧ 2[Next]v [32].
I suspect that TLA can in a similar way express possibility for a more
general class of possibility conditions than the two possible interpretations
of “always possible”. However, the only other possibility condition I have
found to be useful for checking the accuracy of an abstract program is a
very simple one, discussed in the following section.

4.3.3 Checking Accuracy


Using TLA to check that a state predicate P is always eventually possible
may not be easy, since it requires finding a fairness condition that implies P
is true infinitely often. There’s a simpler condition that is easier to check:
it’s possible for P to be true (at least once). It would be more useful to
check the stronger condition of always eventually possible. However, I have
found that most people, including me, don’t spend enough time checking
the accuracy of their abstract programs. The weaker check is likely to be
more helpful in practice because it’s more likely to be done than one that
requires more effort.
That it’s possible for P to be true means that P is true in some reachable
state. This is equivalent to the assertion that ¬P is not true in all reachable
states—in other words, that ¬P is not an invariant of the program. We
can check this by asking a tool for checking invariance to check if ¬P is
an invariant. If the tool reports that it isn’t, then it’s possible for P to be
true. For example, if a model checker reports that your mutual exclusion
algorithm satisfies mutual exclusion, you should check that it’s possible for a
process to enter its critical section. This is especially true if you did not have
to make many corrections to reach that point. Remember that a program
that takes no non-stuttering steps satisfies most safety properties.
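Concretely, to check that a state predicate P can be true, we can define ¬P
and ask the model checker to verify it as an invariant; a reported “violation”
is exactly a behavior reaching a state satisfying P . Here is a minimal,
hypothetical TLA+ sketch of the idea—the module name, the program, and
the predicate x = 5 are invented for this example:

   ---- MODULE PossibilityCheck ----
   EXTENDS Naturals
   VARIABLE x
   Init == x = 0
   Next == (x < 10) /\ (x' = x + 1)
   Spec == Init /\ [][Next]_x
   \* To check that x = 5 is possible, have TLC check NotP as an invariant
   \* (INVARIANT NotP in the configuration file). A reported violation is a
   \* behavior reaching a state with x = 5, showing that x = 5 is possible.
   NotP == x # 5
   ====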
Tools can provide other ways of checking the accuracy of a program. For
example, if Input is a subaction of the program’s next-state action, a TLA+
model checker called TLC reports how many different steps satisfying Input
occur in behaviors of the program. If it finds no such steps, then it is not
always possible for an Input step to occur with either definition of “always
possible”. Finding too few such steps can also be an indication that the
program is not accurate.
Accuracy of an abstract program cannot be formally defined. It means
that a program really is correct if it implements the abstract program. In
other words, an abstract program is accurate iff it means what we want it
to mean, and our desires can’t be formally defined. That accuracy can’t
be formally defined does not mean it’s unimportant. There are quite a few
important aspects of programs that lie outside the scope of our science of
correctness.

4.4 Real-Time Programs


Real-time abstract programs are ones in which timing constraints ensure
that safety properties hold. Real time is most often used in concrete pro-
grams to ensure not safety but liveness. It appears in timeouts that guaran-
tee something eventually happens. For example, to guarantee that a message
is eventually delivered despite possible message loss, a timeout occurs if an
acknowledgement of the message is not received soon enough after it is sent,
and the message is then resent. A program that uses timeouts only in this
way is not a real-time program. The actual time at which a timeout occurs
affects only performance, not correctness. Therefore, timeout can be mod-
eled in an abstract program as an event that must eventually occur but can
occur at any time. The program can use liveness to abstract away time.
We need to write a real-time abstract program only if timing constraints are
used to ensure safety properties.
Scientists have been dealing mathematically with real-time systems for
centuries by simply representing time as the value of a variable. It has
been known for decades that this works for real-time programs too [6]. I
will illustrate how it is done with a mutual exclusion algorithm of Michael
Fischer. (Fischer sent this algorithm in an email to me [12]; I believe Martín
Abadi and I were the first to describe it in print [1].)
I believe that most work on the correctness of real-time programs has
considered only safety properties. Instead of requiring that something even-
tually happens, it requires the stronger property that it happens within
some fixed amount of time, which is a safety property. Fischer’s Algorithm
is more general because in addition to using real time to ensure mutual ex-
clusion, a safety property, it uses fairness to ensure deadlock freedom, a
liveness property.
In the past 40 years, I have had essentially no contact with engineers who
build real-time systems. I know of only one case in which TLA was used to
check that a commercial system satisfied a real-time property [5]. (The TLA+
design of the real-time operating system mentioned in Section 1.6 was not
used to check its real-time properties.) From
the point of view of our science, there is nothing special about real-time
programs. However, how well tools work can depend on the application
domain. The TLA+ tools were not developed with real-time programs in
mind, and it’s unclear how useful they are in that domain.

4.4.1 Fischer’s Algorithm


The algorithm without its timing constraints is described by the pseudocode
in Figure 4.7. The constant Procs is the set of processes, and none is some
constant that is not in Procs. The global variable x is read and written by
all the processes and, as usual, the value of pc(p) is the label of the next
statement to be executed by process p. As explained in Section 4.2.2.1, the
statement await P is a synchronization primitive that allows the program to
continue only when the state predicate P is true. Thus, the wait statement
of process p is described by the action:
∧ pc(p) = wait
∧ x = none
∧ pc ′ = (pc except p ↦ w 1)
∧ x ′ = x
With no time constraints, mutual exclusion is easily violated. Two processes
can execute the wait statement when x equals none, then statements w 1 and
w 2 can both be executed by the first process and then by the second one,
putting both processes in the critical section. Mutual exclusion is ensured
by timing constraints.

variables x = none ;
process p ∈ Procs
variables pc = ncs ;
while true do
ncs: skip ; noncritical section
wait: await x = none ;
w 1: x : = p ;
w 2: if x ≠ p then goto wait end if ;
cs: skip ; critical section
exit: x : = none
end while
end process

Figure 4.7: Fischer’s Algorithm.

We assume that each step is executed instantaneously at a certain time,
and that each process executes w 1 at most δ seconds after it executes wait
and executes w 2 at least ε seconds after it executes w 1, for constants δ and
ε with δ < ε. (The algorithm doesn’t specify what the time units are; we
will call them seconds for convenience.) It’s a nice exercise to show that this
ensures mutual exclusion by assuming that two processes are in their critical
sections and showing that the necessary reads and writes of x that allowed
them both to enter the critical section must have occurred in an order that
violates the timing constraints if δ < ε. While it may be good enough for
such a simple algorithm, this kind of behavioral reasoning is unreliable for
more complicated programs.
To verify mutual exclusion more rigorously, we describe Fischer’s Al-
gorithm with its timing constraints as an abstract program. This requires
adding a variable whose value represents the current time. Scientists usu-
ally call that variable t, but I like to call it now . We also add a variable rt,
where the value of rt(p) records the time at which certain actions of process
p were executed. The program also contains an additional process called
Time that advances time.
The algorithm is written in pseudocode in Figure 4.8. The initial value
of now can be any number in the set R of real numbers, and the initial value
of rt can be any function from the set of processes to R. Let’s now examine
the code of process p. Two assignments to rt(p) have been added. Each
sets rt(p) to the time at which the action in which it appears is executed,
the actions being the ones performed when the program is at control points
variables x = none, now ∈ R, rt ∈ (Procs → R) ;


process p ∈ Procs
variables pc = ncs ;
while true do
ncs: skip ; noncritical section
wait: await x = none ;
rt(p) : = now ;
w 1: x : = p ;
rt(p) : = now ;
w 2: await now − rt(p) ≥ ε ;
if x ≠ p then goto wait end if ;
cs: skip ; critical section
exit: x : = none
end while
end process
process Time
while true do
now :∈ {t ∈ R :
∧ t > now
∧ ∀ p ∈ Procs : (pc(p) = w 1) ⇒ (t ≤ rt(p) + δ) }
end while
end process

Figure 4.8: Fischer’s Algorithm with explicit time.

wait and w 1. Also added is an await statement at w 2 that allows the
action to be performed only when now − rt(p) ≥ ε. This await enforces
the requirement that the w 2 action must not be executed until at least ε
seconds after execution of the w 1 action.
Let’s now examine the Time process. It repeatedly performs a single
atomic action, so it has just one control point that needs no label. That
action increases the value of now . The :∈ operation is like the assignment
operation : = except with = replaced by ∈. That is, a step of the Time
process’s action assigns to now an arbitrary element of the set on the right-
hand side of the :∈ . The action can assign to now any value t greater than
its current value subject to the condition that t ≤ rt(p) + δ for every process
p at control point w 1. It is this condition that enforces the requirement that
a process must execute statement w 1 within δ seconds of when it executes
the wait statement.
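For illustration only, here is one plausible TLA+ rendering of this action.
It assumes the variables are x , pc, rt, and now , that labels are represented
as strings, that Delta is a constant, and that Real comes from the standard
Reals module. (TLC cannot enumerate the reals, so model checking this
requires the discrete-time approach of Section 4.4.4.)

   TimeStep == /\ now' \in { t \in Real :
                               /\ t > now
                               /\ \A p \in Procs :
                                     (pc[p] = "w1") => (t <= rt[p] + Delta) }
               /\ UNCHANGED <<x, pc, rt>>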
Fischer’s Algorithm illustrates the basic method of representing real-
time constraints in an abstract program. Lower bounds on how long it
must take to do something are described by enabling conditions on the
algorithm’s actions. Upper bounds are described by enabling conditions on
the action that advances time. There are a number of ways of enforcing these
bounds. The use of the variable rt in Fischer’s algorithm shows one way.
Another is to use variables whose values are the number of seconds remaining
before an action must be executed (lower bounds) or can be executed (upper
bounds)—variables whose values are decremented by the time-advancing
action.
The idea of an abstract program constraining the advance of time is
mind-boggling to most people, since they view a program as a set of in-
structions. They see it as the program stopping time. You should by now
realize that an abstract program is a description, not a set of instructions.
It describes a universe in which the algorithm is behaving correctly. That
description may constrain the algorithm’s environment, which is the part
of the universe that the algorithm doesn’t control—for example, its users.
Time is an important part of that environment if the amount of time it takes
to perform the algorithm’s actions is relevant to its correctness.

4.4.2 Correctness of Fischer’s Algorithm


Having written Fischer’s algorithm as an abstract program, we know how
to verify its correctness. Mutual exclusion is an invariance property, and
to understand why the algorithm satisfies it we need to find the inductive
invariant that explains why the algorithm satisfies that property. As usual,
the inductive invariant asserts type-correctness of all the variables. The
interesting part of the invariant makes the following assertions about each
process p:

• If control is at w 1, then the current time is at most δ seconds after
  the time at which p just executed the wait statement.

• If control is at cs or exit, then x = p and in no process is control at
  w 1. (This condition implies mutual exclusion.)

• If p is at w 2 and x = p, then any process with control at w 1 must
  execute statement w 1 before p can execute statement w 2.
These assertions about every process p are expressed mathematically as:

∀ p ∈ Procs :
  ∧ (pc(p) = w 1) ⇒ (rt(p) ≤ now ≤ (rt(p) + δ))
  ∧ (pc(p) ∈ {cs, exit}) ⇒ (x = p) ∧ (∀ q ∈ Procs : pc(q) ≠ w 1)
  ∧ (pc(p) = w 2) ∧ (x = p) ⇒
       ∀ q ∈ Procs : (pc(q) = w 1) ⇒ ((rt(q) + δ) < (rt(p) + ε))
You should understand why the three conjuncts in this formula are the three
assertions expressed informally above. Adding the type-correctness part and
proving that it is an inductive invariant is a good exercise if you want to
learn how to write proofs.
Under suitable fairness assumptions, Fischer’s Algorithm is deadlock
free. Recall that deadlock freedom for a mutual exclusion algorithm means
it’s always true that if some process is trying to enter the critical section,
then some process (not necessarily the same process) will eventually do so.
Deadlock freedom of Fischer’s Algorithm follows from the algorithm having
this additional invariant:

(4.28) (x ≠ none) ⇒ (pc(x ) ∈ {w 2, cs, exit})

Here’s a sketch of a proof by contradiction that the algorithm is deadlock
free. Suppose some process is at wait and no process is ever in its critical
section. Eventually, some set of processes will be forever in their noncritical
sections, and one or more processes will forever have control at wait, w 1,
or w 2. Eventually the latter processes will all wind up waiting at the wait
statement with x ≠ none. But that contradicts the invariant (4.28), which
implies that process x cannot be at wait.

4.4.3 Fairness and Zeno Behaviors


What fairness requirements of the abstract program of Figure 4.8 are as-
sumed in the informal argument that the program is deadlock free? If
procStep(p) is the next-state action of a process p in Procs, then we nat-
urally assume weak fairness of the action procStep(p) ∧ (pc(p) ≠ ncs). What
about fairness of the Time process? Let’s call that process’s action timeStep.
The obvious choice is to let it be strong fairness of timeStep. However, that
allows the following behavior: While all other processes in Procs remain in
their noncritical sections, a process p executes the wait statement and then,
at time t, executes statement w 1 that sets rt(p) to t. Repeated executions
of action timeStep then set now to t + ε/2, then t + 2 ∗ ε/3, then t + 3 ∗ ε/4,
and so on. Process p must wait forever at w 2 because now is always less
than t + ε and the w 2 action is enabled only when now ≥ t + ε. Such a
behavior, in which time remains bounded, is called a Zeno behavior.
The most natural way to avoid the problem of Zeno behaviors is to
make the abstract program describing Fischer’s Algorithm disallow them.
The obvious way to do that is to conjoin this liveness property:
(4.29) ∀ t ∈ R : 3(now > t)
which asserts that the value of time is unbounded. However, this isn’t neces-
sarily a fairness property. It’s easy to write an abstract program that allows
only Zeno behaviors, so conjoining the liveness property (4.29) produces a
program that allows no behaviors. For example, we can add timing con-
straints to the program of Figure 4.7 that require a process both to execute
statement w 1 within δ seconds after executing statement wait and to wait
at least ε seconds after executing wait before executing w 1, with δ < ε. If a
process executes wait at time t, then now ≤ t + δ must remain true forever.
If we added fairness properties that required processes eventually to reach
the wait statement and execute it if it’s enabled, then the program would
allow only Zeno behaviors.
We can ensure that Fischer’s Algorithm satisfies (4.29) by having it re-
quire an appropriate fairness condition on the advancing of time. The condi-
tion we need is strong fairness of the action timeStep ∧ (now 0 = exp), where
exp is the largest value of now 0 permitted by the values of rt(p) for processes
p with control at w 1, or now + 1 if there is no such process. More precisely:
∆ ∆
exp = let T = {rt(p) + δ : p ∈ {q ∈ Procs : pc(q) = w 1}}
in if T = {} then now + 1 else Min(T )
where Min(T ) is the minimum of the nonempty set T of real numbers.
With this fairness condition on advancing time and the conjunction of the
fairness conditions for the processes in Procs, Fischer’s Algorithm satisfies
(4.29) and the proof sketch that the algorithm is deadlock free can be made
rigorous.
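For illustration, assuming procStep(p), timeStep, and exp are defined as
above and writing vars for the tuple of all the program’s variables, the
complete fairness condition might be rendered in TLA+ as this sketch:

   Fairness == /\ \A p \in Procs : WF_vars(procStep(p) /\ (pc[p] # "ncs"))
               /\ SF_vars(timeStep /\ (now' = exp))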
If we are interested only in safety properties, there is no need for an
abstract program to rule out Zeno behaviors. A program satisfies a safety
property iff all finite behaviors allowed by the program satisfy it, and a Zeno
behavior is an infinite behavior. In many real-time programs, liveness prop-
erties are of no interest. Correctness means not that something eventually
happens but that it happens within a certain length of time, which is a
safety property. Zeno behaviors then make no difference, and there is no
reason to disallow them.
Even if Zeno behaviors don’t matter, the absence of non-Zeno behaviors
can be a problem. Since real time really does increase without bound,
an abstract program in which it is not always possible for time to become
arbitrarily large is unlikely to be accurate. Therefore, we almost always
want to ensure that a real-time program satisfies the condition that for any
t ∈ R, it is always possible for now > t to be true. This is true iff, for
any t ∈ R, from any reachable state of the program it is always possible
for now > t to be true. This is the kind of possibility condition considered
in Section 4.3. We saw there that if the program Safe is a safety property
that satisfies this condition, then we can verify that it does so by finding a
conjunction F of fairness properties for Safe and verifying:

(4.30) |= Safe ∧ F ⇒ 23(now > t)

for all t ∈ R. (Since a real-time program never allows now to decrease, it
suffices to verify that Safe ∧ F implies (4.29).)

4.4.4 Discrete Time


Verifying properties of real-time programs is easier if we assume time is dis-
crete and now always equals a multiple of some time unit. It may seem
obvious that, since concrete programs run on real computers reading the
current time from a clock that advances in discrete steps, we can always
assume discrete time. However, different processes can be executed on dif-
ferent computers whose clocks can run at slightly different rates. Still, it
seems likely that an abstract program will be sufficiently accurate if it as-
sumes time changes only in one yoctosecond (10^−24 second) increments. So,
in practice we should be able to assume that the values of now and any time
constants are integers, leaving unspecified how long one time unit is.
When writing proofs, there doesn’t seem to be much reason to use dis-
crete time. The main advantage of discrete time is that tools for automat-
ically verifying properties of ordinary abstract programs can, in principle,
handle discrete real-time programs. For example, I didn’t prove that what I
claimed in Section 4.4.2 to be an inductive invariant of Fischer’s Algorithm
actually is one. Instead, I used a model checker to check that it is, which
gave me enough confidence to make the claim.
Many model checkers are based on enumerating reachable states, usually
on a small instance of the program—for example, with a small number of
processes. This is impossible with continuous time, in which a single state
can have possible next states with uncountably many values of now . There
are still infinitely many reachable states with discrete time because the values
of now are unbounded, but counterexamples to incorrect safety properties
and to (4.29) for a particular time t can be found by examining all reachable
states with now less than some value.
The number of reachable states that must be examined can be reduced
for real-time programs that satisfy a condition called symmetry under time
translation. This condition asserts that for every d ∈ R there is a time-
translation mapping T d from states to states such that: For every state s,
the value of now in state T d (s) equals d plus its value in s, and any step
s → t satisfies the program’s next-state action iff T d (s) → T d (t) does. For
example, Fischer’s Algorithm is symmetric under the time translations T d
defined by letting the values of now and rt(p) in T d (s) equal d plus their
values in s, and letting the values of x and pc be the same in s and T d (s).
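As an illustration of the idea (not notation used elsewhere in this book), if
a state of Fischer’s Algorithm is modeled as a record with fields x , pc, rt,
and now , its time translations could be written in TLA+ as:

   Td(d, s) == [s EXCEPT !.now = @ + d,
                         !.rt  = [p \in DOMAIN @ |-> @[p] + d]]

The x and pc fields of Td(d, s) equal those of s, as required.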
Suppose S is a program symmetric under the time-translation functions
T d , and for simplicity assume that the program’s initial predicate asserts
now = 0. Let’s call states s and t translation equivalent iff t = T d (s) for
some d . If P is a safety property containing only variables whose values
are left unchanged by the functions T d , then to verify that P is satisfied by
S we can verify that it is satisfied by the program Sb obtained from S by
considering two translation equivalent states to be the same state. Often,
there will be some time λ such that every reachable state of S is translation
equivalent to a state in which now ≤ λ, in which case now ≤ λ in every
state of Sb . This implies we can verify that S satisfies (4.29) by checking
that it satisfies 3(now ≥ λ + 1), which requires examining only reachable
states with now ≤ λ + 1. The details can be found elsewhere [35].
Being able to reduce verification of a discrete-time program to examining
reachable states with now less than some value would still leave an enormous
number of states to consider if now advanced in yoctosecond steps. Hen-
zinger, Manna, and Pnueli [17] proved that for a class of programs called
timed transition systems, certain properties can be verified with discrete
time in which now advances only in reasonably sized steps. Timed tran-
sition systems are essentially programs in which, like Fischer’s Algorithm,
time is used only to require minimum and maximum delays between when
a program action becomes enabled and when it either must be executed
(maximum delay) or may be executed (minimum delay). If those delays are
constants that are all multiples of some time unit ∆, then a certain class of
properties can be verified by letting time always be a multiple of ∆. That
class consists of properties for which replacing each state in a behavior by
its time translation by d , with −∆ < d ≤ 0, does not change whether the
behavior satisfies the property. It includes properties that depend only on
variables whose values are unchanged by time translation. It also includes
the property (4.29).


One reason that has been given for preferring continuous time is that
it is necessary for composing programs. It would be difficult to compose a
program in which a clock tick represents a nanosecond with one in which a
clock tick represents a millisecond. However, we can easily describe a pro-
gram in terms of a clock that ticks at an unspecified rate and an unspecified
constant that equals the number of clock ticks in a second.
Program composition is discussed in Section 7.2, where it is shown how
the composition of two abstract programs can be represented as the con-
junction of the TLA formulas that describe them. Moreover, a real-time
abstract program can be written as the conjunction of two formulas, one
describing an ordinary, untimed program, the other specifying the required
timing constraints [1]. However, this kind of abstract program composition
is not yet usable in practice.

4.4.5 Hybrid Systems


A hybrid computer system is one that controls physical processes—for exam-
ple, one that flies an airplane or runs a chemical plant. An abstract program
that describes such a system is a real-time program that describes not only
the passage of time but also other physical quantities like altitude or pres-
sure. There is no fundamental difference between programs that describe
hybrid systems and other real-time programs. The current altitude or pres-
sure is represented by a variable like any other variable. Abstract programs
describing hybrid systems differ from other real-time abstract programs only
in the math used to describe them. If the variable prs describes the current
pressure, then the time-advancing subaction of the next-state action might
contain a subformula like:
(4.31) prs ′ = prs + ∫_{now}^{now ′} exp dt
for some expression exp containing the bound variable t and other vari-
ables [30].
It may seem that a representation of the behavior of a continuous process
by a sequence of discrete states would not be sufficiently accurate. For
example, if it is required that the pressure not be too high, violation of that
requirement would not be found if it occurred during the time between two
successive states of the behavior. This is not a problem because correctness
means that a property is true of all possible behaviors, and the possibility of
the pressure being too high at some time is revealed by a behavior containing
a state in which now equals that time.
Other than the differences implied by the use of continuous math, such
as the calculus in (4.31), rather than discrete math, proving properties of
hybrid programs is the same as proving properties of other real-time ab-
stract programs. Automatic tools like model checkers for ordinary abstract
programs seem to be unsuitable for checking abstract programs in which
variables represent continuously varying quantities. Methods have been de-
veloped for checking such programs [11].
Chapter 5

Refinement

We have discussed one abstract program implementing another. We now
consider more carefully what that means. We write abstract programs with
TLA formulas, and it is rather weird to talk about one formula implementing
another. Computer scientists who view programs mathematically generally
use the term refinement rather than implementation. Henceforth, we will
use the two terms interchangeably. There are two aspects to refinement:

Step Refinement The refining program has a finer grain of atomicity.
This means that a non-stuttering step of the high-level program can
correspond to multiple steps of the refining program, all but one of
them implementing stuttering steps of the high-level program. In the
example of Section 3.5.1, the hour-minute-second clock HMS refines
the hour-minute clock HM . Every non-stuttering step of HM , which
advances the minute, corresponds to 60 steps of HMS , each changing
the second and one of them changing the minute.

Data Refinement A program refining another program can also refine the
representation of data used by the higher-level program. This will be
illustrated by refining a higher-level program that uses numbers with
a program that implements a number by a sequence of digits.

Refinement usually involves both step and data refinement, with step re-
finement manifest as operations on the lower-level data requiring more non-
stuttering steps than the corresponding operations on the higher-level pro-
gram’s data.
When a program S is refined by a program T , the variables in formula
T are usually different from the variables in formula S , and the two sets of
variables have non-overlapping scopes. Showing that T refines S involves

reasoning in the scope of T about formulas containing the variables of S .
Section 2.4.1 describes how that’s done: We import an expression E con-
taining variables of S into the scope of the variables of T as the expression
E with v 1 ← exp 1 , . . . , v m ← exp m
where the v i are the variables of S and the exp i are expressions containing
the variables of T . The same thing applies to the constants of T and S ,
so the v i are both the variables and constants of S , and if v i is a constant
then exp i is a constant expression. We adopt the convention that if the
expression exp i is a constant or variable of T with the same name as the
constant or variable v i of S , then we omit the substitution v i ← v i . (This is
convenient because it can be unclear whether the two variables or constants
are the same in both programs or just have the same name.)

5.1 A Sequential Algorithm


In step refinement, the additional steps taken by the lower-level program
correspond to stuttering steps in the higher-level one. We consider an ex-
treme example of this: a program that terminates after taking a single
non-stuttering step that is refined by a traditional sequential program that
computes a value and stops.
This example is used because it’s simple and nicely illustrates data re-
finement. Its use does not imply that this way of looking at refinement is
the best one to use for traditional programs. There are other methods for
reasoning about refinement of traditional programs that are probably better
than our science of more general programs [20]. We consider only safety in
this example; liveness is straightforward.
The high-level abstract program Add begins with variables x and y equal
to arbitrary natural numbers, sets the variable z to their sum, and termi-
nates. Termination is indicated by changing the value of a Boolean-valued
variable end to true. Here is the definition:

InitA ≜ (end = false) ∧ (x ∈ N) ∧ (y ∈ N)
NextA ≜ ¬end ∧ (z ′ = x + y) ∧ (end ′ = true)
vA ≜ ⟨x , y, z , end⟩
Add ≜ InitA ∧ 2[NextA]vA
Note that InitA does not specify the value of z . Its initial value doesn’t matter.
Note also that action NextA does not specify the values of x ′ or y ′ . You
should realize by now that this doesn’t mean those values are unchanged; it
means that their new values are unspecified. We are assuming that the final
values of x and y don’t matter; we care only about the final value of z .

5.1.1 A One-Step Program


We refine Add by a program AddS in which a natural number is represented
by a finite sequence of decimal digits—that is, by an element of the
set Seq(0 . . 9). For convenience, we number the digits from right to left, so
the sequence ⟨1, 2, 3⟩ represents the number 321. Thus a sequence seq of
digits represents the number Val (seq) defined as follows, where the empty
sequence is defined to represent 0.

Val (seq) ≜ if seq = ⟨⟩ then 0
            else seq(1) + 10 ∗ Val (Tail (seq))
Let ⊕ represent addition of numbers represented in this way as sequences
of digits. In other words, ⊕ satisfies
Val (s ⊕ t) = Val (s) + Val (t)
for all sequences s and t of digits. We would expect Add to be refined by
the program AddS obtained by replacing + by ⊕ and N by Seq(0 . . 9) in
the definition of Add . To avoid confusing the variables of the two programs,
we’ll also replace x , y, z , and end by u, v , w , and fin, so AddS is defined
by:

InitS ≜ (fin = false) ∧ (u ∈ Seq(0 . . 9)) ∧ (v ∈ Seq(0 . . 9))
NextS ≜ ¬fin ∧ (w ′ = u ⊕ v ) ∧ (fin ′ = true)
vS ≜ ⟨u, v , w , fin⟩
AddS ≜ InitS ∧ 2[NextS ]vS
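Incidentally, the definition of ⊕ is left abstract here; it is constrained only
by the requirement Val (s ⊕ t) = Val (s) + Val (t). As an illustration, here is
one possible TLA+ definition satisfying that requirement (assuming the
Naturals and Sequences modules); it computes the sum as a number and
converts it back to a digit sequence, rather than mimicking digit-by-digit
addition:

   RECURSIVE Val(_), NumToSeq(_)
   \* Val is the operator defined above; Head(seq) equals seq(1).
   Val(seq) == IF seq = << >> THEN 0
               ELSE Head(seq) + 10 * Val(Tail(seq))
   \* NumToSeq(n) is a digit sequence representing n; NumToSeq(0) = << >>.
   NumToSeq(n) == IF n = 0 THEN << >>
                  ELSE <<n % 10>> \o NumToSeq(n \div 10)
   s \oplus t == NumToSeq(Val(s) + Val(t))

Since Val (NumToSeq(n)) = n for every natural number n, this definition
satisfies the requirement.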
Exactly what does it mean for AddS to refine Add ? I believe the natural
definition is: If we look at any behavior of AddS and interpret the numbers
represented by the sequences u, v , and w of digits to be the values of x ,
y, and z and we interpret the value of fin to be the value of end , then we
get a behavior of Add . More precisely, let “←” mean “is represented by”.
That AddS refines Add means that a behavior satisfying AddS represents a
behavior satisfying Add with this representation of the variables of Add in
terms of the variables of AddS :
(5.1) x ← Val (u) y ← Val (v ) z ← Val (w ) end ← fin
Here’s an example to illustrate this, where the first two-state sequence is
a finite behavior satisfying AddS and the second two-state sequence is the
finite behavior it represents. Remember that AddS leaves unspecified the
value of w in an initial state and the values of u and v in a halting state.
A “?” in the second behavior means the value is unspecified because, as
explained in Section 2.2.7, Val (seq) is a meaningless expression if seq isn’t
a sequence of numbers.
   
   u   :: ⟨1, 2, 3⟩           u   :: ⟨5⟩
   v   :: ⟨3, 2⟩              v   :: −27
   w   :: √2           →      w   :: ⟨4, 4, 3⟩
   fin :: false               fin :: true

   x   :: 321                 x   :: 5
   y   :: 23                  y   :: ?
   z   :: ?            →      z   :: 344
   end :: false               end :: true


As you can see, the second finite behavior satisfies Add , since 321+23 equals
344 and end has the values the abstract program Add says it should.
Let’s look closely at this example. What it shows is that when we per-
form the substitutions (5.1) for the variables of Add in a behavior satisfying
formula AddS , we get a behavior that satisfies formula Add . In other words,
in any behavior: if the behavior satisfies AddS , then it satisfies the formula
Add when we perform the substitutions (5.1). This means that the following
formula is true:
(5.2) |= AddS ⇒
(Add with x ← Val (u), y ← Val (v ), z ← Val (w ), end ← fin)
This is what it means for AddS to refine Add under the representation de-
fined by (5.1). That representation is called a refinement mapping. Formula
(5.2) asserts that AddS implements Add under this refinement mapping.
I find (5.2) beautiful. We’ve already seen that, viewed in terms of
TLA, step refinement is implication. Now we see that data refinement is
substitution—the ordinary mathematical operation of substituting expres-
sions for variables in a formula. How beautifully simple! In science, beauty
is not an end in itself. It’s a sign that we’re doing something right.

5.1.2 Two Views of Refinement Mappings


There are two ways to view the refinement mapping (5.1) that appears
in (5.2). To understand them, let’s simplify things by ignoring irrelevant
variables and letting state mean program state—an assignment of values
to the program’s variables. Let an S -state be a state of AddS , which is


an assignment of values to the variables u, v , w , and fin of AddS ; and let
an A-state be an assignment of values to the variables x , y, z , and end of
Add . Let an S -behavior or A-behavior be a sequence of S -states or A-states,
respectively.
The first way to view the refinement mapping is as a mapping f that
maps S -states to A-states. We can define the mapping f̂ from S -behaviors to
A-behaviors in terms of f by f̂ (σ)(i ) ≜ f (σ(i )) for any S -behavior σ. That
AddS implements Add under the refinement mapping f then means that
[[AddS ]](σ) ⇒ [[Add ]](f̂ (σ)) for every S -behavior σ.
The second way is expressed in (5.2): that AddS implements Add under
the refinement mapping means that [[AddS ]](σ) ⇒ [[Add with . . . ]](σ) for
every S -behavior σ.
In the first view, the refinement mapping maps low-level behaviors to
high-level behaviors—that is, behaviors of AddS to behaviors of Add . In
the second view, it maps the high-level formula Add to the low-level formula
(Add with . . . ) . It may not be obvious, but these two views are equivalent.
They’re equivalent because

(5.3) [[Add with . . . ]](σ) ≡ [[Add ]](f̂ (σ))

is true for every S -behavior σ. To understand why (5.3) is true, remember
that the meaning [[F ]] of a formula F is defined in terms of the meanings of
its subformulas. Consider the subformula z 0 = x + y of Add , which appears
in its next-state action NextA. You should understand why (5.3) is true if
you understand why this is true:

(5.4) [[(z ′ = x + y) with . . . ]](σ) ≡ [[z ′ = x + y]](f̂ (σ))

A proof of (5.4) is given in the Appendix. However, I recommend that you
try figuring out by yourself why it’s true.
The bidirectional nature of refinement mappings—from low-level be-
haviors to high-level behaviors and from high-level formulas to low-level
formulas—is an example of a general mathematical principle. To explain it,
we need the concept of function composition. For any functions g and h,
their composition g • h is defined to be the function whose domain is the
domain of h such that (g • h)(d ) ≜ g(h(d )) for any value d in the domain
of h. (For any d in the domain of h, the expression (g • h)(d ) is meaningful
only if h(d ) is in the domain of g. Mathematicians generally use “·” or “◦”
to denote function composition, but we have defined those symbols to mean
other things.)
Let’s suppose that the collections of S -behaviors and A-behaviors are
sets. For example, we can let them be behaviors that satisfy suitable type
invariants. Formula (5.3) can then be written:

(5.5) [[Add with . . . ]] ≡ [[Add ]] • f̂

In general, suppose L and H are sets and ψ ∈ (L → H ), so the function
ψ maps elements of L to elements of H . For any set V , the function ψ
determines a function ψ⃖ in (H → V ) → (L → V ), so ψ⃖ maps functions
from H to V to functions from L to V . The function ψ⃖ is defined by
ψ⃖(g) ≜ g • ψ. Pictorially, we have:

                ψ
          L    −→    H

                     ψ⃖
   (L → V )    ←−    (H → V )

Refinement mappings are the special case in which L is the set of S -behaviors,
H is the set of A-behaviors, V is the set {true, false} of Booleans, and ψ
is f̂ .

5.1.3 A Step and Data Refinement


Section 3.5.1 illustrated step refinement by showing that an hour-minute-
second clock refines an hour-minute clock. We have just illustrated data
refinement by showing that AddS implements Add under a refinement map-
ping. We now show an example that involves both step and data refinement.
The example involves an algorithm AddSeq that adds numbers the way
you probably learned to add them as a child. However, we represent those
numbers as sequences of digits in the reverse order, as they are in program
AddS , with the low-order digit being the first one in the sequence. It sets
sum equal to u ⊕ v , where u and v are numbers represented by strings of
digits. The algorithm computes sum digit by digit, keeping it equal to the
right-most digits of the sum computed so far. Each step removes the first
(right-most) digit from u and v and appends the next (left-most) digit to
sum, setting carry equal to the value 0 or 1 “carried over” from that sum.
A pseudocode description of the algorithm is in Figure 5.1, but it uses
some notation that requires explanation. Recall that Append (seq, val ) is
defined in Section 2.3.2 to equal seq ◦ ⟨val ⟩, the sequence obtained by ap-
pending the value val to the end of the sequence seq of values. The code as-
sumes that DigitSeq is the set Seq(0 . . 9) \ {⟨⟩} of nonempty finite sequences
of the digits 0 through 9. If u and v are of unequal length, then the number of
variables u ∈ DigitSeq, v ∈ DigitSeq, sum = ⟨⟩, carry = 0, pc = a ;

while a: (u ≠ ⟨⟩) ∨ (v ≠ ⟨⟩) ∨ (carry ≠ 0) do
   define digit = Fix (u)(1) + Fix (v )(1) + carry ;
      sum : = Append (sum, digit % 10) ;
      carry : = ⌊digit / 10⌋
   end define ;
   u : = Tail (Fix (u)) ;
   v : = Tail (Fix (v ))
end while ;
carry : = 0

Figure 5.1: Algorithm AddSeq.

steps taken by the algorithm you learned in school is usually equal to or one
greater than the length of the longer number. For simplicity, the number
of steps taken by AddSeq always equals one plus the length of the longer
number. To simplify the description of what happens when the algorithm
runs out of digits in one of the numbers, it uses the operator Fix defined as
follows to replace the empty sequence by ⟨0⟩:

Fix (seq) ≜ if seq = ⟨⟩ then ⟨0⟩ else seq

The algorithm’s define statement defines digit to equal the indicated ex-
pression within that statement. The value of ⌊n/10⌋ is the greatest integer
less than or equal to n/10. To simplify the invariant, AddSeq specifies the
initial value of carry to equal 0 and ensures that it equals 0 at the end. Since
the low-order digit of a two-digit number n is n % 10 and its high-order digit
is ⌊n/10⌋, it should be clear that AddSeq describes an algorithm for adding
two decimal numbers. (If it’s not, execute it by hand on an example.)
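For instance, here is a hand execution for u = ⟨7, 4⟩ (representing 47) and
v = ⟨6, 8, 1⟩ (representing 186), starting with sum = ⟨⟩ and carry = 0:

   step 1:  digit = 7 + 6 + 0 = 13   sum = ⟨3⟩        carry = 1   u = ⟨4⟩   v = ⟨8, 1⟩
   step 2:  digit = 4 + 8 + 1 = 13   sum = ⟨3, 3⟩     carry = 1   u = ⟨⟩    v = ⟨1⟩
   step 3:  digit = 0 + 1 + 1 = 2    sum = ⟨3, 3, 2⟩  carry = 0   u = ⟨⟩    v = ⟨⟩

The loop then exits, and sum = ⟨3, 3, 2⟩ represents 233 = 47 + 186.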
The usual way to express correctness of a program that computes a value
sum and stops is with an invariant asserting that if the program has stopped
then sum has the correct value. We can’t do that with AddSeq because the
correct value of sum is the initial value of u ⊕ v , and those initial values
have disappeared by the time the program stops. To express correctness, we
can add a constant ans that equals the initial value of u ⊕ v . Since stopping
means pc equals done for our pseudocode, correctness means:

(5.6) |= AddSeq ⇒ 2((pc = done) ⇒ (ans = sum))

The key part of an inductive invariant to prove (5.6) is the assertion that
ans equals the final value of sum. A first approximation to the final value
of sum is:

sum ◦ (⟨carry⟩ ⊕ (u ⊕ v ))

We haven’t said what s ⊕ t means if s or t is the empty sequence, but it’s
clear that we should define the empty sequence to represent 0. However, a
close examination of the algorithm indicates that if u and v both equal ⟨⟩
and carry = 0, then this expression equals a sequence with an extra 0 at the
end. The correct assertion that is the key to the inductive invariant is

(5.7) ans = if (u ◦ v = ⟨⟩) ∧ (carry = 0)
            then sum
            else sum ◦ (⟨carry⟩ ⊕ (u ⊕ v ))
The remainder of the inductive invariant asserts type correctness and that
pc = done implies that u, v , and carry have their correct final values.
However, the point of this example is not that AddSeq implements a
particular procedure for adding sequences of digits; it’s that it refines the
abstract program Add that adds two integers in a single step. This is a
dramatic example of step refinement, in which a program that can take ar-
bitrarily many non-stuttering steps to finish refines one that always finishes
in one non-stuttering step. And we don’t have to add the constant ans to
do it.
Under the refinement mapping, one step in an execution of AddSeq must
refine a NextA step of Add ; all the other steps must refine stuttering steps
of Add . The initial values of the variables x and y of Add should equal the
initial values of Val (u) and Val (v ). The initial values of u and v are no
longer deducible from the state after AddSeq takes its first step. This tells us
that the NextA step of Add must be refined by the first non-stuttering step
of AddSeq.
An Add step changes the value of its variable end from false to true.
So, the refinement mapping must assign to end an expression whose value
is changed from false to true by the first non-stuttering step of AddSeq.
Since further steps of AddSeq refine stuttering steps of Add , the expression
assigned to end must remain true for the rest of the execution of Add . A
suitable expression is sum ≠ ⟨⟩, so we let the refinement mapping include
end ← sum ≠ ⟨⟩.
In the initial state of AddSeq, the refinement mapping should assign to x
and y the values Val (u) and Val (v ). Since Add allows x and y to have any values in
its final state, it doesn’t matter what values the refinement mapping assigns
to x and y after the first step of AddSeq. However, since later steps must
refine stuttering steps of Add , the values of x and y must not change. Zero
seems like a nice value to let x and y equal when their value no longer
matters, so we let the refinement mapping include:
x ← if sum ≠ ⟨⟩ then 0 else Val (u) ,
y ← if sum ≠ ⟨⟩ then 0 else Val (v )
Finally, we must decide what value the refinement mapping assigns to z .
If we add to AddSeq the constant ans that always equals the result the al-
gorithm finally computes, then we can substitute Val (ans) for z . But we
don’t have to add it because the invariant (5.7) tells us what expression con-
taining only the variables of AddSeq always equals ans. We could therefore
substitute for z the expression obtained by applying Val to the right-hand
side of equation (5.7). However, there’s a simpler expression that we can
use. Convince yourself that the following substitution works:
z ← Val (sum) + 10^Len(sum) ∗ (carry + Val (u) + Val (v ))
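As a quick sanity check, this expression equals 233 in every state of the
hand execution of 47 + 186 shown above: initially it equals
0 + 10^0 ∗ (0 + 47 + 186); after the first step, 3 + 10^1 ∗ (1 + 4 + 18); after
the second, 33 + 10^2 ∗ (1 + 0 + 1); and at the end, 233 + 10^3 ∗ 0.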
This completes the refinement mapping. That AddSeq implements Add
under the refinement mapping means that this theorem is true:
|= AddSeq ⇒ (Add with
      end ← sum ≠ ⟨⟩ ,
      x ← if sum ≠ ⟨⟩ then 0 else Val (u) ,
      y ← if sum ≠ ⟨⟩ then 0 else Val (v ) ,
      z ← Val (sum) + 10^Len(sum) ∗ (carry + Val (u) + Val (v )))

5.2 Invariance Under Refinement


If an abstract program T implements an abstract program S under a refine-
ment mapping, and Inv is an invariant of S , then the refinement mapping
maps Inv to an invariant of T . The precise statement of this is the following
theorem, where “. . . ” is any refinement mapping.
Theorem 5.1 |= T ⇒ (S with . . .) and |= S ⇒ 2Inv imply
|= T ⇒ 2(Inv with . . .) .
The proof is simple:
1. |= S ⇒ 2Inv implies |= (S ⇒ 2Inv ) with . . . .
Proof: Substitution in a true formula produces a true formula.
2. |= (S ⇒ 2Inv ) with . . . equals |= (S with . . .) ⇒ 2(Inv with . . .) .
Proof: By definition of what substitution means.


3. Q.E.D.
Proof: The theorem follows from steps 1 and 2 by propositional logic.

Recall the trick used in Section 3.2.3 to obtain an invariant of FGSqrs
from an invariant of the coarser-grained algorithm Sqrs. We replaced the
variable y by the expression yy in the invariant of Sqrs. That trick was
an application of the theorem, because FGSqrs implements Sqrs under the
refinement mapping x ← x , y ← yy.

5.3 An Example: The Paxos Algorithm


We’ve seen step and data refinement for a sequential abstract program. Con-
currency adds nothing new. Refinement works exactly the same for concur-
rent programs. This is illustrated with the Paxos consensus algorithm, an
example chosen for the following reasons:
• It’s a distributed algorithm. Quite a few researchers used to believe
that different techniques are needed to reason about correctness of dis-
tributed programs; perhaps some still do. Paxos illustrates that there
is no mathematical difference between distributed and non-distributed
concurrent programs. In fact, Paxos is obtained as a refinement of a
non-distributed algorithm.

• It’s a widely used algorithm. If you perform any commercial transac-
tion on the Web, there is a good chance that Paxos or an algorithm
inspired by it is being executed by a program running on the computers
that perform the transaction.

• It illustrates the importance of abstraction. Thinking scientifically
means thinking abstractly. The abstract programs in this example are
more abstract than ones most computer scientists and engineers would
think of. Learning to think more abstractly is the key to building
better complex computer systems.

• The complete TLA+ specifications of the abstract programs, as well
as videos of a pair of lectures that explain them, are available on the
Web [26]. Therefore, the abstract programs are only sketched here.
The Paxos algorithm was invented (at least) twice, first by Barbara Liskov
and Brian Oki [41] and then by me.
5.3.1 The Consensus Problem


One reason for building distributed systems is fault tolerance. Systems
implemented by multiple computers are often required to operate normally
even if one or more of the computers fail. What a system should do can be
described as a single-process abstract program that executes a sequence of
commands it receives as inputs. Correct execution of the system by multiple
computers requires that all the computers agree on what that sequence of
inputs is. This is achieved by having all the computers agree on what the i th
input is for every i . Ensuring that all the computers agree on a single input
is called consensus. A fault-tolerant system repeatedly executes a consensus
algorithm to choose a sequence of inputs.
I was inspired to invent the Paxos algorithm because colleagues were
building a distributed fault-tolerant system. I realized that the system
had to implement consensus, so it should implement a consensus algorithm.
However, my colleagues were writing code; they didn’t have an algorithm. I
never found out how their program implemented consensus. But based on
the state of the art of programming at the time, here is what their program
might have done.
A process called the leader, running on a single computer, receives all
input requests and decides what input should be chosen next. A new leader
will have to be selected if the initial leader fails, but we’ll worry about that
later. (Failure of a process usually means failure of the computer executing
the process.) For the system to keep running despite the failure of individual
computers, a set of processes called acceptors, each running on a different
computer, have to know what value was chosen. Moreover, only a subset of
the acceptors should have to be working (that is, not failed) for an input to
be chosen. If an input v is chosen by a leader and a set of acceptors, and
the leader and those acceptors fail, then a different leader and a different set
of acceptors must not choose an input different from v . The obvious way
to ensure that is to require a majority of the acceptors to agree upon the
input v in order for that input to be chosen. Any two majorities have at
least one acceptor in common, and that acceptor will know that it agreed
to the choice of v .
This reasoning leads to the following algorithm: The leader decides what
input v should be chosen. It sends a message to the acceptors saying that
they should agree to the choice of v . Any working acceptor that receives the
message replies to the leader with a message saying “v is OK”. When the
leader receives such an OK message from a majority of acceptors, it sends
a message to all the acceptors telling them that v has been chosen.
This algorithm works fine, and the system keeps choosing a sequence of
inputs, until the leader fails. At that point, a new leader is selected. The
new leader sends a message to all the acceptors asking them what they’ve
done. In particular, the new leader finds out from the acceptors if inputs
were chosen that it was unaware of. It also finds out if the previous leader
had begun trying to choose an input but failed before the input was chosen.
If it had, then the new leader completes the choice of that input. When the
new leader has received this information from a majority of acceptors, it
can complete any uncompleted choices of an input and begin choosing new
inputs. Let’s call this algorithm the naive consensus algorithm.
There’s one problem with the naive algorithm: How is the new leader
chosen? Choosing a single leader is just as hard as choosing a single input.
The naive consensus algorithm thus assumes the existence of a consensus
algorithm. However, because leader failures should be rare, choosing a leader
does not have to be done efficiently. So, programmers would probably have
approached the problem of choosing a leader the way they approached most
programming problems. They would have found a plausible solution and
then debugged it. Debugging usually means thinking of all the things that
could go wrong and adding code to handle them.
Let’s pause and look at the science of consensus. Before Paxos, there
were consensus algorithms that worked no matter what a failed process
could do [45]. However, they were synchronous algorithms, meaning that
they assumed known bounds on the time required for messages sent by one
process to be received and acted upon by another process. They were not
practical for the loosely coupled computers that had become the norm by the
1980s. Although asynchronous algorithms were required, they had to solve a
simpler problem because sufficiently reliable systems could be based on the
assumption that a process failed by stopping and could not perform incorrect
actions. However, the FLP theorem, named after Michael Fischer, Nancy
Lynch, and Michael Paterson who discovered and proved it, states that no
asynchronous algorithm can implement consensus if even a single process can
fail in this benign way [13]. More precisely, any algorithm that ensures the
safety property that two processes never choose different values must allow
behaviors that violate the liveness property that requires a value eventually
to be chosen if enough processes are working and can communicate with one
another. Asynchronous algorithms that ensure liveness must allow behaviors
in which processes disagree about what input is chosen.
The leader-selection code programmers would have written therefore had
to allow either behaviors in which two processes thought they were the
leader, probably with serious consequences, or else behaviors in which no
leader is selected, causing the system to stop choosing values. With a prop-
erly designed algorithm, the probability of never choosing the leader is zero,
and a leader will be chosen fairly quickly if enough of the system is working
properly. The system my colleagues built ran for several years with about
60 single-user computers, and I don’t think their consensus code caused any
system error or noticeable stalling. There is no way to know if it had errors
that would have appeared in today’s systems with thousands of computers
and many thousands of users.

5.3.2 The Paxos Consensus Algorithm


We develop the Paxos consensus algorithm as a series of three abstract
programs: a trivial specification of the problem the algorithm solves, which
is refined by a non-distributed multiprocess algorithm, which is refined by
the Paxos algorithm. I believe that this description—in particular, its view
of Paxos as a refinement of the non-distributed algorithm—mirrors how I
actually found the algorithm.
Only the safety properties of these abstract programs are described. In
most applications, violation of safety in a consensus algorithm can be quite
serious—for example, causing money deposited to a client’s bank account
to disappear. We will see later how the algorithm can be implemented to
almost always achieve liveness while never violating safety. As mentioned
above, the abstract programs are just sketched; complete descriptions are
available on the Web [26].

5.3.2.1 The Specification of Consensus


Instead of talking about inputs, we define consensus as choosing an element
of some set Value of values. Most correctness proofs of consensus algorithms
prove only that they satisfy the invariance property that two processes never
choose different values. A consensus algorithm must also not allow a value
to be unchosen and a different value then chosen. Proving the invariance
property is usually sufficient because it’s obvious that the algorithm doesn’t
allow a value to be unchosen. But to illustrate refinement, we write a high-
level abstract program that rules out such a possibility.
There are a number of reasonable ways to describe consensus as an ab-
stract program, and it makes little difference which one is used. Perhaps
the most obvious way is with a multiprocess abstract program in which
each process independently learns what value is chosen. The next-state ac-
tion would allow a process p that has not learned a value to learn one, with
the constraint that if any process has learned that the value v was chosen,
then p must also learn that v was chosen.
We take a different approach and let the abstract program describe only
the choosing of a value, without mentioning processes that learn the chosen
value. This abstract program has a single variable chosen that represents
the set of values that have been chosen. (In any behavior allowed by the
program, that set always has at most one value.) The initial predicate is
chosen = {}, and the next-state action is:

    (chosen = {}) ∧ (∃ v ∈ Value : chosen′ = {v})

As explained above, there is no fairness condition.
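Assembled into the standard form of an abstract program, this specification
(call it Consensus, the name used below) is a sketch built from the pieces
just described; the complete version on the Web [26] may differ in details:

    Init ≜ chosen = {}
    Next ≜ (chosen = {}) ∧ (∃ v ∈ Value : chosen′ = {v})
    Consensus ≜ Init ∧ □[Next]chosen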

5.3.2.2 The Voting Algorithm


In the naive algorithm, leader and acceptor processes communicate by send-
ing messages. It’s natural to think about a consensus algorithm in terms
of messages being sent. However, remember that we reason about an ab-
stract program in terms of its state, so we should be thinking about states,
not about sending messages. And the important part of the state is the
state of the acceptors. So, we refine the Consensus program with an ab-
stract program called the Voting algorithm whose state is just the state of
the acceptors. This is not just a nice way to describe the Paxos algorithm.
Viewing Paxos as a refinement of the non-distributed voting algorithm is
what enabled me to discover Paxos.
In good programming, we begin by abstracting away lower-level details
and getting the high-level design right. There’s a kind of bad programming
that sounds similar: We begin writing something that handles the normal
behavior, and we then modify it to handle non-normal situations. That’s
the way the naive consensus algorithm was described, and it’s a recipe for
creating incorrect programs—both abstract and concrete ones. We should
start thinking about the general case, not the normal case.
The general state of acceptors in the naive algorithm is one that is
reached after a number of leaders have begun trying to get a value cho-
sen, and some of them may have succeeded. When a leader tries to get a
particular value chosen, we say that the leader has begun a ballot. When
an acceptor has sent an OK message for a value v in that ballot, we say
that the acceptor has voted for v in that ballot. The algorithm will assign
a unique natural number to each ballot.² The state of the Voting algorithm
records all the votes that each acceptor has cast. This is described by a
variable votes whose value is a function that assigns to each acceptor a a
set votes(a) of pairs ⟨b, v⟩ where b ∈ N and v ∈ Value. The pair ⟨b, v⟩ in
votes(a) means that a has voted for v in ballot number b.

    ²Don't confuse different ballots with the different instances of the consensus algorithm
being executed. Execution of an instance of the consensus algorithm can consist of
multiple ballots.
Choosing a leader is the weak point in the naive algorithm. The Voting
algorithm abstracts away the leaders. A leader serves two functions. The
first is to ensure that in any ballot, acceptors can cast votes only for the
value proposed by the leader. The Voting algorithm’s next-state action
takes care of that by not letting an acceptor cast a vote for a value v in
ballot b if a vote has already been cast in ballot b for a different value. The
second function of the leader is to learn that a value has been chosen, which
it does when it has received enough OK messages. The Voting algorithm
does away with that function by declaring that the value has been chosen
when the requisite number of OK messages have been sent—that is, when
there are enough votes cast for the value in the ballot. More precisely, we
define ChosenAt(b, v ) to be true iff a majority of acceptors has voted for v in
ballot b. The Voting algorithm implements the Consensus abstract program
under the refinement mapping

(5.8)  chosen ← {v ∈ Value : ∃ b ∈ N : ChosenAt(b, v)}
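Written out, ChosenAt can be defined as follows, where Majority is assumed
here to name the set of all majority subsets of the set Acceptor of acceptors
(the complete descriptions [26] generalize majorities to an arbitrary set of
quorums, any two of which intersect):

    VotedFor(a, b, v) ≜ ⟨b, v⟩ ∈ votes(a)

    ChosenAt(b, v) ≜ ∃ Q ∈ Majority : ∀ a ∈ Q : VotedFor(a, b, v)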

In addition to votes, the algorithm has one other variable maxBal whose
value is a function that assigns to each acceptor a a number maxBal (a).
The significance of this number is that a will never in the future cast a vote
in any ballot numbered less than maxBal (a). The value of maxBal (a) is
initially 0 and is never decreased. The algorithm can increase maxBal (a) at
any time.
It may seem strange that the state does not contain any information
about what processes have failed. We are assuming that a failed process does
nothing. Since we are describing only safety, a process is never required to
do anything, so there is no need to tell it to do nothing. A failed process that
has been repaired can differ from a process that hasn’t failed because it may
have forgotten its prior state when it resumes running. A useful property
of a consensus algorithm is that, even if all processes fail, the algorithm can
resume its normal operation when enough processes are repaired. To achieve
this, we require that a process maintains its state in stable storage, so it is
restored when a failed process restarts. A process failing and restarting is
then no different from a process simply pausing.
The heart of the Voting algorithm is a state expression SafeAt(b, v) that
is true iff ChosenAt(c, w) is false and will remain false forever for any c < b
and w ≠ v. That it will remain false forever can be deduced from the current
state, because the next-state action implies both that a process a will not
cast a vote in ballot c when c < maxBal(a) and that maxBal(a) can never
decrease. The key invariant maintained by the algorithm is

(5.9)  ∀ a ∈ Acceptor, b ∈ N, v ∈ Value :
          (⟨b, v⟩ ∈ votes(a)) ⇒ SafeAt(b, v)
where Acceptor is the set of acceptors. The next-state action allows a process
a to perform either of two actions:
• Increase maxBal (a). This action is always enabled.
• Vote for a value v in a ballot numbered b. As already explained, this
action is enabled only if no process has voted for a value other than
v in ballot b and b ≥ maxBal (a). An additional enabling condition is
required to maintain the invariance of (5.9).
I have given you all the information you need to figure out the definition
of SafeAt(b, v ) and the enabling condition on acceptors needed to maintain
the invariance of (5.9). Can you do it? Few people can. I was able to
only because I had simplified the problem to finding an abstract program
whose only processes are the acceptors and whose state consists only of the
set of votes cast and the value of maxBal . I had abstracted away leaders,
messages, and failures.
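For readers who want to check an answer, here is a sketch of one possible
definition, consistent with the description above though not necessarily
identical to the complete one [26]; the name CannotVoteAt is illustrative:

    CannotVoteAt(a, c) ≜ ∧ maxBal(a) > c
                         ∧ ∀ v ∈ Value : ¬VotedFor(a, c, v)

    SafeAt(b, v) ≜ ∀ c ∈ 0 . . (b − 1) :
                     ∃ Q ∈ Majority :
                       ∀ a ∈ Q : VotedFor(a, c, v) ∨ CannotVoteAt(a, c)

The additional enabling condition on voting for v in ballot b is then that
SafeAt(b, v) holds, which is just what maintaining the invariance of (5.9)
requires.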
The Voting algorithm requires an acceptor to know the current state of
other acceptors to decide what vote it can cast. How can this lead to a
distributed consensus algorithm? I abstracted away leaders and messages; I
didn’t ignore them. I knew that an acceptor didn’t have to directly observe
the state of other acceptors to know that they hadn’t voted for some value
other than v in a ballot. The acceptor could know that because of a message
it received from a leader. I also knew that it could deduce that the other
enabling conditions were satisfied from messages it received. Abstracting
away leaders and messages enabled me to concentrate on the core problem
of achieving consensus. The solution to that problem told me what the
leaders should do and what messages needed to be sent.

5.3.2.3 The Paxos Abstract Program


The Voting algorithm told me what messages needed to be sent. But I had to
decide how to represent message passing in an abstract program. Languages
expressly designed for describing distributed algorithms usually don't require
us to make that decision because they provide built-in message-passing
primitives. However, different distributed algorithms and distributed sys-
tems have different requirements for message passing. They may or may not
tolerate lost messages; they may or may not require messages to be deliv-
ered in the order they are sent; they may or may not require that the same
message not be received twice; and so on. Our abstract programs require
that we choose how to represent message passing, but they make it easy to
represent any form of message passing we want.
I have found that most computer scientists and engineers are constrained
by thinking in terms of how messages are transmitted in actual systems.
They think of messages being sent on communication channels between pro-
cesses. Few of them would come up with the simple representation of mes-
sage passing I used in the Paxos abstract program—a representation that is
obvious if one thinks mathematically.
Paxos doesn’t require that messages be delivered in the order in which
they are sent, so there is no need for message channels. The receiver of
a message can be inferred from the message, so we can just have a set of
messages. Paxos tolerates the same message being received multiple times
by a process, so there is no need to remove a message when it is received.
This means that if the same message is sent to multiple recipients, there
is no need for multiple copies of the message. There is also no need for a
separate action of receiving a message. An action that should be taken upon
receipt of a message simply has the existence of that message in the set of
sent messages as an enabling condition. Paxos tolerates message loss. But
since we are describing safety, there’s no difference between a lost message
and a message that is sent but never received. So, there is no need ever to
remove messages that have been sent.
We can therefore represent message passing with a variable msgs whose
value is the set of all messages that have been sent. A message is sent by
adding it to the set msgs. The presence of a message in msgs enables an
action that should be triggered by the receipt of the message. The algorithm
has a variable maxBal that implements the variable of the same name in
the Voting algorithm. It also has two other variables maxVBal and maxVal
whose values are functions with domain the set of acceptors. They are
explained below.
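Sending a message m is therefore described by the single conjunct below,
which appears in essentially this form in the complete specification [26];
an action triggered by receiving a message m simply has m ∈ msgs as an
enabling condition:

    Send(m) ≜ msgs′ = msgs ∪ {m}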
The Paxos consensus algorithm can be viewed as a multiprocess algo-
rithm containing two sets of processes: the acceptors that implement the
acceptors of the Voting algorithm, and an infinite set of processes, one for
each natural number, where process number b is the leader of ballot num-
ber b. More precisely, the ballot b leader orchestrates the voting by the
acceptors in ballot b of the Voting algorithm.
The next-state action of the algorithm could be (but isn’t literally) writ-
ten in the form ∃ b ∈ N : BA(b) where BA(b) describes how ballot b is per-
formed. The ballot consists of two phases. In phase 1, the ballot b leader
sends a message to the acceptors containing only the ballot number b. An
acceptor a ignores the message unless b > maxBal (a), in which case it sets
maxBal(a) to b and replies with a message containing a, b, maxVBal(a),
and maxVal(a). When the ballot b leader receives those messages from a
majority of the acceptors, it can pick a value v to be chosen, where v is ei-
ther a value picked by the leader of a lower-numbered ballot or an arbitrary
value. The complete algorithm describes how it picks v . Phase 2 begins
with the leader sending a message to the acceptors asking them to vote for
v in ballot b. An acceptor a ignores the message unless b ≥ maxBal(a), in
which case a sets maxBal(a) to b, records the vote by setting maxVBal(a)
to b and maxVal(a) to v, and replies with a message saying that it has voted
for v in ballot b.
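Here is a sketch of phase 1 written with the Send operator above; the record
notation and field names are illustrative rather than exactly those of the
complete description [26], and the conjuncts asserting which variables are
left unchanged are omitted:

    Phase1a(b) ≜ Send([type ↦ "1a", bal ↦ b])

    Phase1b(a) ≜ ∃ m ∈ msgs :
                   ∧ (m.type = "1a") ∧ (m.bal > maxBal(a))
                   ∧ maxBal′ = (maxBal except a ↦ m.bal)
                   ∧ Send([type ↦ "1b", acc ↦ a, bal ↦ m.bal,
                           mbal ↦ maxVBal(a), mval ↦ maxVal(a)])

Phase 2 is similar, with the ballot b leader's phase 2a message carrying b
and the value v it has picked.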
The Paxos algorithm implements the Voting algorithm under a refine-
ment mapping in which the variable votes of Voting is implemented by the
expression defined in the obvious way from the set of votes reported by
acceptors’ phase 2 messages in msgs, and in which the variable maxBal
of Voting is implemented by the variable of the same name in the Paxos
abstract program.
The values of maxVBal and maxVal can be described as functions of
the value of votes. For any acceptor a, the pair ⟨maxVBal(a), maxVal(a)⟩
equals the pair ⟨b, v⟩ in the set votes(a) with the largest value of b. (Ini-
tially, when votes(a) is the empty set, it equals ⟨−1, None⟩ for some special
value None.) Making maxVBal and maxVal variables rather than state
expressions makes it clear that they are the only information about what
messages have been sent that needs to be part of the acceptors’ states.

5.3.3 Implementing Paxos


An implementation of the Paxos consensus algorithm would add a third
phase to each ballot in which the leader sends a message announcing that a
value v had been chosen after it receives phase 2 messages telling it that a
majority of acceptors had voted for v in the ballot. With that addition, a
ballot of the Paxos algorithm looks like what the naive algorithm does when
a new leader has been selected. Phase 1 of a Paxos ballot corresponds to
the new leader finding out if it needs to complete the choosing of a value
proposed by the failed leader. Phase 2 corresponds to the leader completing
the choosing of a previously proposed value or choosing a new value.


In Paxos, a leader performs phase 1 in every instance of the consensus al-
gorithm, while in the naive algorithm it performs the corresponding actions
only once when it is selected. This makes Paxos seem much less efficient,
since leaders are infrequently replaced. However, the value to be chosen isn’t
selected until phase 2. This means that phase 1 can be executed simulta-
neously for a ballot numbered b in all instances of the consensus algorithm
the first time ballot b is executed for any instance. A single message can
serve as the leader’s phase 1 message for all instances. A single message
can also contain the phase 1 responses of a particular acceptor for all the
instances, since there is information to be transmitted only for consensus
instances that have begun but not yet chosen a value. Thus Paxos uses the
same number of messages to choose a value as the naive algorithm.
The Paxos consensus algorithm is an efficient algorithm that has been
proved to satisfy the safety requirement of consensus. We still have to see
how to get it to satisfy the liveness requirement of actually choosing a value.
To solve a problem, we need to understand it, and it’s easy to understand
what prevents Paxos from choosing a value. An acceptor a participating in
ballot number b sets maxBal (a) to b, preventing it from responding to any
message from the leader of any ballot numbered less than b. Even in the
absence of failures or message loss, no value will ever be chosen if higher and
higher numbered ballots are begun before any ballot chooses a value.
Conversely, if ballot number b is started and no higher-numbered ballot
is begun, and if the ballot b leader and a majority of acceptors are work-
ing, then liveness assumptions that require working processes eventually to
perform enabled actions (which implicitly assume that messages sent are
eventually delivered) imply that a value is eventually chosen. This observa-
tion can be stated mathematically as a temporal logic formula and proved.
However, it is so obviously true that, to my knowledge, no one has ever
bothered doing it.
How do we assure that a ballot numbered b is started and no higher-
numbered ballots are? Paxos uses an infinite number of leader processes—
one for each ballot number. Those infinitely many processes are executed
by a finite number of computers, with each ballot number pre-assigned to
a single computer that executes the leader of the corresponding ballot. A
single computer, called the coordinator, is selected to be the only one that
executes leader processes, and it is easy to add messages that allow it to
find a ballot number higher than the values of maxBal(a) for a majority of
acceptors a.
Like the naive algorithm, Paxos depends on selecting a single coordina-
tor. However, the naive algorithm can fail to maintain its safety requirement
if two different computers believe they are the coordinator. If that happens
with Paxos, safety is preserved; the algorithm just fails to make progress.
An algorithm for choosing a coordinator in Paxos needs to work only most of
the time, a much easier problem to solve. One solution uses a synchronous
algorithm that implements consensus assuming known bounds on the times
needed to transmit and process messages. That algorithm chooses the coor-
dinator assuming values for those bounds that will be satisfied most of the
time.

5.4 Proving Refinement


This section sketches how to prove that one abstract program refines an-
other. We use as an example the proof that the One-Bit mutual exclusion
algorithm OB of Section 4.2.5.2 refines program LM of Figure 4.6, assuming
a weakly fair semaphore.
The proof uses identifiers from the definition of OB and ones from the
definition of LM . To avoid confusion, we indicate to which program an
identifier belongs with a subscript. We defined OB to equal:

    Init ∧ □[Next]v ∧ Fair

where  Next ≜ ∃ p ∈ {0, 1} : PNext(p)
       Fair ≜ ∀ p ∈ {0, 1} : WFv(PNext(p))

We now add subscripts to that definition, so OB equals:

    InitOB ∧ □[NextOB]vOB ∧ FairOB

We assume LM has the same definition, except with the subscripts LM .


We sometimes use subscripts even when they aren’t necessary—for example
writing xOB even though LM has no variable named x .
We define OBSafe and OBFair as before, so OB equals OBSafe ∧ OBFair.
We define LMSafe and LMFair similarly. As expected, safety and liveness
are proved separately. We first show that OBSafe refines LMSafe. (By
machine closure of ⟨OBSafe, OBFair⟩ and Theorems 4.2 and 4.4, OBFair
isn't needed to prove that OB refines LMSafe.) We then show that OB
implies LMFair. However, first we must define the refinement mapping
under which OB implements LM.
5.4.1 The Refinement Mapping


To define the refinement mapping, it’s helpful to think of a single behavior
in which the variables xOB and pcOB describe a behavior of program OB and
the variables semLM and pcLM describe the corresponding behavior of LM
that is implemented by OB under the refinement mapping. To determine
what the refinement mapping should be, for each possible step in such a
behavior that changes the values of the variables of LM , we decide how that
step should change the values of the variables of OB.
For example, if a step of the behavior describes the execution of state-
ment cs by a process p of LM , then it should describe the execution of cs by
process p of OB . Thus, when the value of pcLM (p) changes from cs to exit,
the value of pcOB(p) should also change from cs to exit. Reasoning in this
way, we see that the values of pcLM(p) and pcOB(p) should be equal except
that when pcLM(p) equals wait, the value of pcOB(p) can be wait, w2, w3,
or w4. This tells us that the refinement mapping must substitute for pcLM
the value pcBarOB defined by:

    pcBarOB ≜ p ∈ {0, 1} ↦ if pcOB(p) ∈ {w2, w3, w4} then wait
                           else pcOB(p)
In a behavior satisfying LM , the value of semLM can be deduced from the
value of pcLM . In particular, semLM equals 0 iff pcLM (p) equals cs or exit for
one of the processes p. From the definition of pcBarOB , this means that the
refinement mapping must substitute for semLM the value semBarOB defined
by:

    semBarOB ≜ if (∃ p ∈ {0, 1} : pcOB(p) ∈ {cs, exit}) then 0 else 1
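For example, in a state of OB with pcOB(0) = w2 and pcOB(1) = ncs, these
definitions yield pcBarOB(0) = wait, pcBarOB(1) = ncs, and semBarOB = 1:
process 0's multi-statement busy-waiting in OB appears in LM simply as
waiting for a semaphore that no process holds.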

That OB refines LM under this refinement mapping means:

(5.10) |= OB ⇒ (LM with pcLM ← pcBarOB , semLM ← semBarOB )

We'll be dealing with a lot of formulas obtained from a formula FLM by
making the substitutions defined by the refinement mapping for the variables
of LM. To keep from having lots of withs, we use this abbreviation, for
any formula FLM:

    F̄LM ≜ (FLM with pcLM ← pcBarOB, semLM ← semBarOB)

Thus, (5.10) can be written |= OB ⇒ L̄M. Also, p̄cLM equals pcBarOB and
s̄emLM equals semBarOB (which explains the suffix Bar in the names pcBar
and semBar).
5.4.2 Refinement of Safety


We now sketch the proof that OBSafe refines LMSafe, which means the
proof of

(5.11)  |= OBSafe ⇒ L̄MSafe

By the definitions of these formulas, this requires proving:

(5.12)  (a) |= InitOB ⇒ ĪnitLM
        (b) |= InitOB ∧ □[NextOB]vOB ⇒ □[N̄extLM]v̄LM
The proof of (a) is simple. To prove (b), we use the invariant InvOB of OB ,
which is defined by (4.19), where TypeOK is defined by (4.17). That is, we
assume:

(5.13)  |= OBSafe ⇒ □InvOB

To prove (5.12b), it suffices to prove

(5.14)  |= InvOB ∧ [NextOB]vOB ⇒ [N̄extLM]v̄LM

By definition of [. . .]v , we can prove (5.14) by proving:

(5.15)  (a) |= InvOB ∧ NextOB ⇒ N̄extLM ∨ (v̄LM′ = v̄LM)
        (b) |= (vOB′ = vOB) ⇒ (v̄LM′ = v̄LM)

Part (b) is trivial, since v̄LM is defined in terms of the variables of vOB. For
part (a), propositional logic tells us that we prove F ∧ (G1 ∨ . . . ∨ Gn) ⇒ H
by proving F ∧ Gi ⇒ H for each i. So, we decompose the proof of part (a)
by writing NextOB as the disjunction of subactions.
We use the notation introduced in Section 4.2.6 of naming the action
described by a labeled statement with the capitalized label. For example,
CsOB (p) is the action described by statement cs of process p of program
OB . We decompose the proof of (5.15a) into proving:

(5.16)  |= InvOB ∧ LblOB(p) ⇒ N̄extLM ∨ (v̄LM′ = v̄LM)

for each label lbl in Figure 4.3 and p in {0, 1}.


Condition (5.16) asserts that a step of OB described by the statement
labeled lbl implements a step of LM under the refinement mapping. We
defined the refinement mapping to make that true, so we should be able
to prove this assertion. We prove it by showing that each action LblOB (p)
implements some particular subaction of NextLM . In particular, we prove the
following seven assertions R1–R7. Three of them assert that actions of OB
imply actions of the form ⟨ĀLM⟩v̄LM. For proving that OB implies L̄MSafe,
we need only the weaker assertions obtained by replacing such an action by
ĀLM. However, we will need the stronger assertions later for proving that
OB implies L̄MFair.
R1. |= InvOB ∧ NcsOB(p) ⇒ N̄csLM(p)
R2. |= InvOB ∧ WaitOB(p) ⇒ (v̄LM′ = v̄LM)
R3. |= InvOB ∧ W2OB(p) ⇒
        if p = 0 then ⟨W̄aitLM(0)⟩v̄LM
        else if xOB(0) then v̄LM′ = v̄LM
        else ⟨W̄aitLM(1)⟩v̄LM
R4. |= InvOB ∧ W3OB(p) ⇒ (v̄LM′ = v̄LM)
R5. |= InvOB ∧ W4OB(p) ⇒ (v̄LM′ = v̄LM)
R6. |= InvOB ∧ CsOB(p) ⇒ ⟨C̄sLM(p)⟩v̄LM
R7. |= InvOB ∧ ExitOB(p) ⇒ ⟨ĒxitLM(p)⟩v̄LM

Assertion R3 is equivalent to these three assertions:

R3a. |= InvOB ∧ W2OB(0) ⇒ ⟨W̄aitLM(0)⟩v̄LM
R3b. |= InvOB ∧ W2OB(1) ∧ xOB(0) ⇒ (v̄LM′ = v̄LM)
R3c. |= InvOB ∧ W2OB(1) ∧ ¬xOB(0) ⇒ ⟨W̄aitLM(1)⟩v̄LM
All these assertions are proved by expanding the definitions of the actions
and of the refinement mapping. To see how this works, we consider R3a.
We haven’t written the definitions of the actions corresponding to the pseu-
docode statements of algorithms OB and LM. The definitions of W2OB(0)
and WaitLM(0) as well as the other relevant definitions are in Figure 5.2.

    W2OB(0) ≜ ∧ pcOB(0) = w2
              ∧ ¬xOB(1)
              ∧ pcOB′ = (pcOB except 0 ↦ cs)
              ∧ xOB′ = xOB

    WaitLM(0) ≜ ∧ pcLM(0) = wait
                ∧ semLM = 1
                ∧ pcLM′ = (pcLM except 0 ↦ cs)
                ∧ semLM′ = 0

    InvOB ≜ ∧ TypeOKOB
            ∧ ∀ p ∈ {0, 1} : ∧ (pcOB(p) ∈ {w2, cs}) ⇒ xOB(p)
                             ∧ (pcOB(p) = cs) ⇒ (pcOB(1 − p) ≠ cs)

    TypeOKOB ≜ ∧ xOB ∈ ({0, 1} → {true, false})
               ∧ pcOB ∈ ({0, 1} → {ncs, wait, w2, w3, w4, cs, exit})
               ∧ pcOB(0) ∉ {w3, w4}

    pcBarOB ≜ p ∈ {0, 1} ↦ if pcOB(p) ∈ {w2, w3, w4} then wait
                           else pcOB(p)

    semBarOB ≜ if ∃ p ∈ {0, 1} : pcOB(p) ∈ {cs, exit} then 0 else 1

    p̄cLM ≜ pcBarOB        s̄emLM ≜ semBarOB

            Figure 5.2: Definitions used in the proofs.

Here is the proof of R3a.

1. Suffices: Assume: InvOB ∧ W2OB(0)
             Prove:  ⟨W̄aitLM(0)⟩v̄LM
   Proof: Obvious.
2. (p̄cLM(0) = wait) ∧ (p̄cLM′ = (p̄cLM except 0 ↦ cs))
   Proof: By the step 1 assumption and the definitions of W2OB(0) and
   pcBarOB(0), since p̄cLM equals pcBarOB.
3. (s̄emLM = 1) ∧ (s̄emLM′ = 0)
   Proof: W2OB(0) implies (pcOB(0) = w2) ∧ ¬xOB(1), and InvOB and ¬xOB(1)
   imply pcOB(1) ∉ {cs, exit}. Hence, semBarOB = 1, so s̄emLM = 1. The
   definition of W2OB(0) and InvOB (which implies pcOB is a function with
   domain {0, 1}) imply pcOB′(0) = cs. Hence semBarOB′ = 0, so s̄emLM′ = 0.
4. Q.E.D.
   Proof: Steps 2 and 3 and the definition of WaitLM(0) imply W̄aitLM(0).
   Step 3 implies s̄emLM′ ≠ s̄emLM, which implies v̄LM′ ≠ v̄LM, proving the goal
   ⟨W̄aitLM(0)⟩v̄LM introduced by step 1.

How we decomposed the proof that OBSafe refines LMSafe into proving
R1–R7 was determined by the structure of NextOB as a disjunction of seven
subactions and knowing which disjuncts of N̄extLM each of those subactions
implements, which followed directly from the definition of the refinement
mapping. The decomposition of R3 into R3a–R3c followed from the struc-
ture of R3. As illustrated by the proof of R3a, the proof of each of the
resulting nine formulas is reduced to ordinary mathematical reasoning by
expanding the appropriate definitions. The only place where not under-
standing the algorithms could result in an error is in the definition of the in-
variant InvOB or of the refinement mapping. Catching such an error requires
only careful reasoning about simple set theory and a tiny bit of arithmetic,
using elementary logic. Someday, computers should be very good at such
reasoning.

5.4.3 Refinement of Fairness


This section shows how to prove that a program refines the fairness property
of another program by sketching the proof of one example: OB implies
L̄MFair. Define

    OBB ≜ □(InvOB ∧ ĪnvLM) ∧ □[NextOB]vOB ∧ OBFair

where InvOB is the invariant satisfied by OB defined by (4.19) and (4.17),
and InvLM is an invariant of LM. For our example, we just require that
InvLM implies type correctness of LM. Formula OBB is a □ formula that is
implied by OB. (We have proved that OB implies L̄MSafe, which implies
that ĪnvLM is an invariant of OB.) We prove |= OBB ⇒ L̄MFair.
We use Theorem 4.7 to write LMFair as the conjunction of weak fairness
of WaitLM(p), CsLM(p), and ExitLM(p), for p ∈ {0, 1}. So, we have to prove
|= OBB ⇒ (WFvLM(ALM))¯ for ALM equal to each of those six actions. By
(4.12), we can do this by proving:

(5.17)  |= OBB ⇒ (□(E⟨ALM⟩vLM)¯ ⇝ ⟨ĀLM⟩v̄LM)

We prove (5.17) by finding an action BOB and state predicates POB and QOB
satisfying the following conditions:

A1. 1. |= InvOB ∧ ĪnvLM ∧ (E⟨ALM⟩vLM)¯ ⇒ QOB
    2. |= OBB ⇒ (□QOB ⇝ □POB)

A2. 1. |= InvOB ∧ ĪnvLM ∧ POB ⇒ E⟨BOB⟩vOB
    2. |= OBB ⇒ WFvOB(BOB)

A3. |= InvOB ∧ ĪnvLM ∧ POB ∧ ⟨BOB⟩vOB ⇒ ⟨ĀLM⟩v̄LM

To show that these conditions imply (5.17), we have to show that they imply
that in any behavior σ satisfying OBB, if □(E⟨ALM⟩vLM)¯ is true of σ+m, then
σ(n) → σ(n + 1) is an ⟨ĀLM⟩v̄LM step for some n ≥ m. Condition A1.1
implies □QOB is true of σ+m, which by A1.2 implies □POB is true of σ+q for
some q ≥ m. By the definition of WF, conditions A2 imply σ(n) → σ(n + 1)
is a ⟨BOB⟩vOB step for some n ≥ q, and A3 implies that ⟨BOB⟩vOB step is
an ⟨ĀLM⟩v̄LM step.
  ALM         QOB                               BOB         POB

  WaitLM(0)   ∧ pcOB(0) ∈ {wait, w2}            W2OB(0)     ∧ pcOB(0) = w2
              ∧ pcOB(1) ∉ {cs, exit}                        ∧ ¬x(1)

  WaitLM(1)   ∧ pcOB(1) ∈ {wait, w2, w3, w4}    W2OB(1)     ∧ pcOB(1) = w2
              ∧ pcOB(0) ∉ {cs, exit}                        ∧ ¬x(0)

  CsLM(p)     pcOB(p) = cs                      CsOB(p)     pcOB(p) = cs

  ExitLM(p)   pcOB(p) = exit                    ExitOB(p)   pcOB(p) = exit

Figure 5.3: Formulas BOB, POB, and QOB for the actions ALM, with p ∈ {0, 1}.

The formulas BOB , POB , and QOB used for the six actions ALM are shown
in Figure 5.3. Condition A2.1 for the actions ALM follows easily from the
definitions of BOB and POB . To show that A2.2 is satisfied, we apply Theo-
rem 4.7 to write OBFair as the conjunction of weak fairness of the actions
described by each process’s statements other than its ncs statement. That
A3 is satisfied for the four actions ALM in Figure 5.3 follows from conditions
R3a, R3c, R6, and R7 of Section 5.4.2.
This leaves condition A1 for the actions. A1.1 is proved by using
the type correctness invariant implied by InvLM to show that (E⟨ALM⟩vLM)¯
equals (E(ALM))¯, and then substituting pcBarOB for pcLM and semBarOB
for semLM in E(ALM). For our example, this actually shows that ĪnvLM
implies (E⟨ALM⟩vLM)¯ ≡ QOB for all the actions ALM. A1.2 is trivially satisfied
for CsLM(p) and ExitLM(p), since QOB and POB are equal. The interesting
conditions are A1.2 for WaitLM(0) and WaitLM(1). They are the kind of
leads-to property we saw how to prove in Section 4.2.5. In fact, we now
obtain a proof of A1.2 for WaitLM(0) by a simple modification of the proof
in Section 4.2.5.3 that OB implies:

(5.18)  (pc(0) ∈ {wait, w2}) ⇝ (pc(0) = cs)

Let's drop the subscript OB, so the variables in any formula whose name
has no subscript are the variables of OB. The proof of (5.18) is described
by the proof lattice of Figures 4.4 and 4.5. A □ formula in a label on a
box in a proof lattice means that the formula is conjoined to each formula
inside the box. Since F ⇝ G implies (□H ∧ F) ⇝ (□H ∧ G) for any F,
G, and H, we obtain a valid proof lattice (one whose leads-to assertions are
all true) by conjoining □ĪnvLM ∧ OBFair ∧ □Q to the labels of the outer
boxes in the lattices of Figures 4.4 and 4.5. This makes those labels equal
to OBB ∧ □Q. Since Q implies pc(0) ∈ {wait, w2}, we obtain a valid
proof lattice by replacing the source node of the lattice in Figure 4.4 by
□Q. Moreover, the new label's conjunct □Q implies □(pc(0) ≠ cs), making
it impossible for pc(0) ever to equal cs, so we can remove the sink node
pc(0) = cs and the edges to and from it from the lattice of Figure 4.5.³
Since the label on the inner box containing □¬x(1), which is the new sink
node, implies □(pc(0) = w2), we now have a valid proof lattice that shows:

    |= OBB ⇒ (□Q ⇝ □((pc(0) = w2) ∧ ¬x(1)))

This proves A1.2 for the action WaitLM(0).

    ³Equivalently, we can remove edge 8 and add an edge from pc(0) = cs to false and
an edge from false to □¬x(1), since false implies anything.


To prove A1.2 for action WaitLM(1), it suffices to assume OBB and prove
□Q ⇝ □P for the formulas P and Q given in Figure 5.3 for the action. Here
is the proof sketch, which uses without mention some simple temporal logic,
including transitivity of ⇝.

1. □Q ⇒ □¬x(0)
   1.1. □Q ⇒ □(pc(0) ∉ {wait, w2})
        Proof: We proved in Section 4.2.5.3 that pc(0) ∈ {wait, w2} leads to
        pc(0) = cs, and □Q implies □(pc(0) ≠ cs).
   1.2. □Q ∧ □(pc(0) ∉ {wait, w2}) ⇒ □(pc(0) = ncs)
        Proof: Q implies pc(0) ∉ {cs, exit}, which by Inv and pc(0) ∉ {wait, w2}
        implies pc(0) = ncs.
   1.3. Q.E.D.
        Proof: By steps 1.1 and 1.2, since Inv and pc(0) = ncs imply ¬x(0).
2. □Q ∧ □¬x(0) ⇝ □P
   2.1. □Q ∧ □¬x(0) ⇝ (pc(1) = w2)
        Proof: Q implies pc(1) ∈ {wait, w2, w3, w4}, and a straightforward
        proof using fairness of PNext(1) and □¬x(0) shows
            (pc(1) ∈ {wait, w2, w3, w4}) ⇝ (pc(1) = w2)
   2.2. □Q ∧ □¬x(0) ∧ (pc(1) = w2) ⇒ □(pc(1) = w2)
        Proof: □Q implies □(pc(1) ≠ cs), and (pc(1) = w2) ∧ □[Next]v ∧
        □(pc(1) ≠ cs) implies □(pc(1) = w2).
   2.3. Q.E.D.
        Proof: Steps 2.1 and 2.2 imply □Q ∧ □¬x(0) ⇝ □(pc(1) = w2), and
        □P equals □(pc(1) = w2) ∧ □¬x(0).
3. Q.E.D.
   Proof: By steps 1 and 2.

5.4.4 A Closer Look at E


5.4.4.1 A Syntactic View
Section 4.2.1 explained E semantically, defining E(A) to be true of a state
s iff there exists a state t such that action A is true of the step s → t. We
now translate this semantic definition into a syntactic definition of E(A). A
state is an assignment of values to variables, so we can restate that definition
as:
E1. E(A) is true for an assignment of values to the unprimed variables
iff there exists an assignment of values to the primed variables that
makes A true.
A state predicate is true of a state iff it is true when its variables have the
values assigned to them by the state. We can restate E1 as:
E2. E(A) is true (of a state) iff there exist values of the primed variables
for which A is true.
We will now translate E2 into a precise syntactic definition of E.
To do this, for any variable x, we now regard x and x′ as two unrelated
symbols. For an expression exp, we take exp′ to be the expression obtained
by priming all the variables in exp. If exp contains a defined symbol whose
definition contains variables, then all the variables in that definition are
primed in exp′.
We now define awith to be substitution like with, except regarding x′
as being a different variable from x. For example, if x, y, and z are variables,
then:

    (x′ = x + 1) with  x ← y − z   equals  (y − z)′ = (y − z) + 1
    (x′ = x + 1) awith x ← y − z   equals  x′ = (y − z) + 1
    (x′ = x + 1) awith x′ ← y − z  equals  (y − z) = x + 1
If sym is a defined symbol, then

    (x′ = x + sym) awith x′ ← y − z

equals

    (y − z) = x + (sym awith x′ ← y − z)

If sym ≜ 2 ∗ x′, then this equals

    (y − z) = x + 2 ∗ (y − z)

Now let A be an action and let x1, . . . , xn be all the variables that
appear in A. We can then write E2 as:

(5.19)  E(A) ≜ ∃ c1, . . . , cn : (A awith x′1 ← c1, . . . , x′n ← cn)

Thus, we obtain E(A) from A by replacing the primed variables by bound
constants that are existentially quantified. We informally describe this def-
inition by saying that E(A) is obtained from A by existentially quantifying
its primed variables.
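For example, if x is the only variable appearing in the action
(x > 0) ∧ (x′ = x + 1), then (5.19) gives

    E((x > 0) ∧ (x′ = x + 1)) = ∃ c : (x > 0) ∧ (c = x + 1)

which equals the state predicate x > 0, since choosing c to equal x + 1 makes
the second conjunct true.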

5.4.4.2 Computing E
The syntactic definition (5.19) of E immediately provides rules for writing
E(A) in terms of formulas E(B i ), for B i subactions of A. From the rule
|= (∃ c : A ∨ B ) ≡ (∃ c : A) ∨ (∃ c : B )
we have
E1. |= E(A ∨ B ) ≡ E(A) ∨ E(B )
For example, in program LM defined in Section 4.2.6.1, the next-state action
PNext(p) is the disjunction of actions Ncs(p), Wait(p), Cs(p), and Exit(p).
Therefore, rule E1 implies
E(PNext(p)) ≡ E(Ncs(p)) ∨ E(Wait(p)) ∨ E(Cs(p)) ∨ E(Exit(p))
The generalization of E1 is:
E2. |= E(∃ i ∈ S : Ai ) ≡ ∃ i ∈ S : E(Ai )
where S is a constant or state expression.
Another rule of existential quantification is that if the constant c does
not occur in A, then ∃ c : (A ∧ B ) is equivalent to A ∧ (∃ c : B ). From this
we deduce:
E3. If no variable appears primed in both A and B , then |= E(A ∧ B ) ≡
E(A) ∧ E(B ).
For example, in program LM we have:

    Wait(p) ≜ ∧ (sem = 1) ∧ (pc(p) = wait)
              ∧ sem′ = 0
              ∧ pc′ = (pc except p ↦ cs)

Therefore, rule E3 implies

(5.20)  E(Wait(p)) ≡ ∧ E((sem = 1) ∧ (pc(p) = wait))
                     ∧ E(sem′ = 0)
                     ∧ E(pc′ = (pc except p ↦ cs))
The following two rules also follow easily from (5.19) and properties of ex-
istential quantification:
E4. If P is a state predicate, then |= E(P ) ≡ P .

E5. If x is a variable and exp is a state expression, then |= E(x′ = exp) ≡
    true.

From (5.20), E4, and E5, we deduce that E(Wait(p)) equals (sem = 1) ∧
(pc(p) = wait). Here is another obvious rule, which can be considered a
generalization of E5, since c = exp is equivalent to c ∈ {exp}:

E6. If x is a variable and exp is a state expression, then |= E(x′ ∈ exp) ≡
    (exp ≠ {}).
Rules E1– E6 are sufficient for computing E(A) for almost all subactions
A that, like PNext(p), appear in the definition of a program’s next-state
action. However, the definition of fairness does not contain such formu-
las E(A). Instead, it contains formulas of the form E⟨A⟩v, which equals
E(A ∧ (v′ ≠ v)). None of those rules apply to such a formula. In particular,
E3 does not apply because v is the tuple of all the program's variables.
Most of the time, a subaction A in the definition of a program's next-
state action does not allow stuttering steps. Therefore, ⟨A⟩v equals A, so
E⟨A⟩v equals E(A) and we can apply the rules. For example, E⟨PNext(p)⟩v
equals E(PNext(p)) because a PNext(p) step changes the value of pc(p), so
it can't be a stuttering step. We are using the substitutivity rule (3.34) of
ordinary math to deduce

    |= (A ≡ B) ⇒ (E(A) ≡ E(B))

(Even though substitutivity is not valid for TLA or the Logic of Actions, we
can apply it to the syntactic definition (5.19) of E, which treats x and x′ as
two different variables of ordinary math—that is, two different constants.)
However, we can't deduce PNext(p) ≡ ⟨PNext(p)⟩v from the definition
of PNext(p). For example, if pc(p) = cs, then the definition of PNext(p)
asserts

    pc′ = (pc except p ↦ exit)

If p is not in the domain of pc, then pc′(p) = pc(p). If pc is not a function,
then we have no idea what pc′(p) equals, so it could equal pc(p). Fortu-
nately, we care what E⟨PNext(p)⟩v equals only in reachable states of LM.
So, we just have to prove that Inv implies E⟨PNext(p)⟩v ≡ E(PNext(p))
for an invariant Inv of LM that asserts type correctness. To do this, we
observe that for any action A and state predicate P, rules E3 and E4 imply
P ∧ E(A) ≡ E(P ∧ A). So, to prove that Inv implies E⟨PNext(p)⟩v ≡
E(PNext(p)), it suffices to prove

    |= Inv ⇒ (⟨PNext(p)⟩v ≡ PNext(p))

which is straightforward. In general, we reason about liveness under the
assumption that the program's safety property is satisfied, so we can assume
□Inv is true for an invariant Inv of the program.
Since the formula □[Next]v always allows stuttering steps, there is no
need for a next-state action Next to allow them. Usually, it doesn't. How-
ever, there is no reason for Next not to allow stuttering steps, and sometimes
it's more convenient to write a subaction A that allows them. In that case,
we have to use the definition (5.19) to compute E⟨A⟩v. However, we apply
the definition to E(Inv ∧ ⟨A⟩v), which equals E⟨Inv ∧ A⟩v, for a program
invariant Inv.

5.4.4.3 The Trouble With E


Refinement is based on substitution. Program OB refines LM means:

(5.21)  |= OB ⇒ (LM with pc ← pcBar, sem ← semBar)

We no longer need the subscripts that were added to help us understand
which program an identifier refers to. We continue using the abbreviation
that, for any formula F:

    F̄ ≜ (F with pc ← pcBar, sem ← semBar)
Almost without thinking, we replaced (Init ∧ □[Next]v)¯ with the equivalent
property Īnit ∧ □[N̄ext]v̄. We were actually using these three rules:

• (F ∧ G)¯ ≡ F̄ ∧ Ḡ for any formulas F and G.

• (□F)¯ ≡ □F̄ for any formula F.

• ([A]v)¯ ≡ [Ā]v̄ for any action A and state expression v.

The first asserts that substitution distributes over ∧; the second asserts
that substitution distributes over □; and the third asserts that substitution
distributes over the construct [. . .]... .
We expect substitution to distribute in this way over all mathematical
operators, so we would expect (E(A))¯ and E(Ā) to be equal for any action A.
In fact, they are equal for most actions encountered in practice. But here's
an action A for which they aren't for the refinement mapping of (5.21):

    A ≜ ∧ pc′ = (p ∈ {0, 1} ↦ wait)
        ∧ sem′ = 0

Rules E3 and E5 imply that E(A) equals true, so (E(A))¯ equals true. By
definition of the refinement mapping:

    Ā = ∧ pcBar′ = (p ∈ {0, 1} ↦ wait)
        ∧ semBar′ = 0

Ā implies pcBar′(p) = wait for p ∈ {0, 1}. By definition of pcBar, this
implies:

(1) pc′(p) ∈ {w2, w3, w4, wait} for each p ∈ {0, 1}.

But Ā also implies semBar′ = 0, which by the definition of semBar implies:

(2) pc′(p) ∈ {cs, exit} for some p ∈ {0, 1}.

Both (1) and (2) can't be true, so Ā must equal false and thus E(Ā)
equals false. Therefore, E(Ā) does not equal (E(A))¯, so substitution does
not always distribute over E.
The reason substitution doesn't distribute over E is that E(Ā) performs
the substitutions pc ← pcBar and sem ← semBar for the primed variables
pc′ and sem′. However, as we see from (5.19), those primed variables do not
occur in E(A); they are replaced by bound constants. The substitutions
should be performed only on the unprimed variables. Therefore:

    E(A) with pc ← . . . , sem ← . . .

does not equal

    E(A with pc ← . . . , sem ← . . .)


Instead, it equals

E(A awith pc ← . . . , sem ← . . .)

which substitutes only for unprimed variables.


Since WF and SF are defined in terms of E, substitution does not dis-
tribute over them either. We proved that OB refines LM by proving that OB
implies (WFv(A))¯ for six actions A. To evaluate (WFv(A))¯, we expanded the
definition of WF. Since substitution distributes over all the operators other
than E in the definition of WFv(A), including in the definition of A, we
could perform the substitutions everywhere in the resulting formula except
in E⟨A⟩v. We could then have used (5.19) to expand the definition
of E and perform the substitution on the resulting formula, which contains
no primed variables. (This is equivalent to performing the substitution in
E⟨A⟩v, except using awith instead of with.)
While expanding the definition of E in this way would have allowed
(E⟨A⟩v)¯ to be evaluated, it would have required applying E to an action
that was more complicated than ⟨A⟩v. That's not what we did in the proof
sketch in Section 5.4.3. Instead we showed that ⟨A⟩v equals A and performed
the substitution on E(A). Showing ⟨A⟩v ≡ A required an invariant Inv of
LM, but because OB refines LMSafe, the formula Īnv is an invariant of OB,
allowing us to deduce that (E⟨A⟩v)¯ equals (E(A))¯.
Substitution not distributing over E makes E mathematically weird.
You should be suspicious of such weird things. The operators □ and ′
(prime) that TLA adds to ordinary math are weird because they are not
substitutive. But substitution does distribute over them. Moreover, tempo-
ral logic is a well-studied field of math. I find E weirder than the temporal
logic operators.
However, fairness is an important concept in concurrent programs. The
WF and SF operators are the mathematical expressions of what fairness has
meant since Dijkstra introduced the assumption of weak fairness in 1965 [9].
There seems to be no good way to express it mathematically without the
operator E.
A similarly weird operator has been at the heart of traditional programs
since the earliest coding languages—namely, the action composition operator
“·” introduced in Section 3.4.1.4. If x1, . . . , xn are all the variables that
appear in actions A and B, then A · B can be defined syntactically by:

    A · B ≜ ∃ c1, . . . , cn : ∧ (A awith x′1 ← c1, . . . , x′n ← cn)
                              ∧ (B awith x1 ← c1, . . . , xn ← cn)
The primed variables of A and the unprimed variables of B are replaced by
bound constants, and substitution does not distribute over “·” for the same
reason it doesn't distribute over E.
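For example, if x is the only variable appearing in the two actions, then
(x′ = x + 1) · (x′ = 2 ∗ x) equals

    ∃ c : (c = x + 1) ∧ (x′ = 2 ∗ c)

which equals x′ = 2 ∗ (x + 1), the action describing an x′ = x + 1 step
followed by an x′ = 2 ∗ x step.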
The common methods for reasoning about traditional programs written
in an imperative language can be viewed as a form of Hoare logic. As ex-
plained in Appendix Section A.4, such a logic can be viewed mathematically
as defining the meaning of a statement S to be an action AS . If the mean-
ings of statements S and T are the actions AS and AT , then the meaning
of S ; T is the action AS · AT .
With this way of reasoning, the semicolon of imperative coding languages
therefore has the same weirdness as the E operator. I suspect this was never
discovered because people thought of programs in terms of conventional
code, and it makes no sense to implement a variable x by an expression
when x can appear in an assignment statement x := . . . .

5.5 A Warning
We have defined correctness of a program S to mean |= S ⇒ P for some
property P . We have to be careful to make sure that we have chosen P so
that this implies what we really want correctness of the program to mean.
As discussed in Section 4.3, we have to be concerned with the accuracy of P .
When correctness asserts that S refines a program T , the property P
is (T with . . . ) for a refinement mapping “. . .”. That refinement mapping
is as important a part of the property as the program T , and it must be
examined just as carefully to be sure that proving refinement means what
you want it to. As an extreme example, OB also implements LM under this
refinement mapping:

    pcLM ← (p ∈ {0, 1} ↦ ncs),   semLM ← 1

Implementation under this refinement mapping tells us nothing about OB ,


because under it, every behavior of OB implements a behavior in which all
processes remain forever in their noncritical sections. The program obtained
by replacing the next-state action of OB by false also implements LM
under this refinement mapping.
Such an egregiously useless refinement mapping can often be detected
because, under a refinement mapping that implements behaviors of a pro-
gram T by behaviors of program S that do nothing, S won’t implement
the fairness properties of T . However, programs often don’t require that
actions representing the initiation of an operation by the environment ever
occur. In such a case, it’s a good idea to make sure that S refines T when
fairness requirements are added to those actions in both programs. This is
an application of the general idea of adding fairness to verify possibility that
was introduced in Section 4.3.2.
Chapter 6

Auxiliary Variables

An auxiliary variable is a variable that is added to an abstract program


without altering the values assumed by the program’s regular variables. It’s
sometimes necessary to add auxiliary variables to a program in order to prove
that it refines another program. Sections 6.2, 6.3, and 6.4 define the three
kinds of auxiliary variables that may be needed, illustrating them with silly
little examples. Section 6.5 describes a realistic example that makes use of
all three kinds of auxiliary variables. We begin with a section that explains
variable hiding, which is the basis for auxiliary variables and is also used in
Chapter 7.

6.1 Variable Hiding


6.1.1 Introduction
Recall the behavior predicate F12 , discussed in Section 4.1.2, that is true of
a behavior iff the value of x can equal 2 in a state only if x equaled 1 in a
previous state. We gave a semantic definition of F12 ; it can’t be expressed
in RTLA or TLA as those languages have been defined so far. We observed
that F12 can be expressed as the abstract program S12, defined in (4.2), by
introducing an additional variable y.
The variable x that we’re interested in is called an interface variable.
The variable y that’s used only to describe how the values of x can change
is called an internal variable. There’s a problem with using the internal
variable y to describe F12. Consider the abstract program Sx that starts
with x = 0 and can keep incrementing x by one:

    Sx ≜ (x = 0) ∧ □[x′ = x + 1]x

Since Sx allows x to equal 2 only after it has equaled 1, it satisfies property
F12. However, Sx doesn't imply S12 because S12 describes how the values of
x and y change, while Sx allows behaviors in which y can have any values.
We want to express F12 by a formula that asserts of a behavior σ that
there is some way to assign values to y that makes S12 true, but says nothing
about the actual values of y. As mentioned in Section 4.1.2, that formula is
written ∃∃ y : S12. The operator ∃∃ is explained here.
In ordinary math, the formula ∃ y : x ∗ y² = 36 asserts that there is some
value y for which x ∗ y² equals 36, but says nothing about the actual value of
y. The variables of ordinary math correspond to the constants of temporal
logic. The y in ∃ y : S12 is a constant, so that formula asserts that there is a
constant value y that satisfies S12; and that value equals true if the initial
value of x is 1, otherwise it equals false. Formula ∃ y : S12 asserts that x
can never equal 2 unless the initial value of x is 1, which is not what F12
asserts.
The formula ∃∃ y : S12 is true of a behavior iff the values of x in that
behavior are the same as its values in a behavior satisfying S12, where y is a
variable rather than a constant; but it says nothing about the actual values
assumed by y. Thus, y is not a (free) variable of ∃∃ y : S12. The precise
definition of ∃∃ is subtle and is given below. For now, we just need to know
that ⟦∃∃ y : S12⟧ equals F12. I like to say that ∃∃ y : S12 is formula S12 with
y hidden, because ∃∃ corresponds to what hiding seems to mean in coding
languages.
We can now generalize abstract programs to allow quantification over
variables. As with the operator ∃, we let ∃∃ y1, . . . , yn : F be an abbreviation
for ∃∃ y1 : . . . ∃∃ yn : F. The general form of an abstract program with hidden
variables is:

(6.1)  ∃∃ y1, . . . , yk : Init ∧ □[Next]v ∧ L

with internal (bound) variables y1, . . . , yk. (The interface variables are the
free variables of the formula.) Theorem 4.8 shows that any property that
can be described mathematically can be written in this form, with a single
bound variable. However, ∃∃ is of little use in practice. The only role it
plays is telling us that, when implementing the program, it doesn't matter
how the internal variables are refined. That can be stated just as well in a
comment; we don't need to introduce a new operator just for that. In fact,
although ∃∃ is an operator of TLA+ and is recognized by the parser, none of
the current tools handle it. Model checking formulas containing ∃∃ seems to
be computationally infeasible. I don't know of any engineer wanting to use
it.
The reason to understand the temporal existential quantification oper-
ator ∃∃ is that it is the logical underpinning of important concepts such as
the auxiliary variables discussed in this chapter.

6.1.2 Reasoning About ∃∃


Allowing an abstract program to be described with a formula of the form
(6.1) raises the question of how to reason about such formulas. The answer
is that the operator ∃∃ obeys the rules for the quantifier ∃ of predicate logic
given in Section 2.1.9.3, except with program variables (now called variables)
replacing the mathematical variables (now called constants). In principle,
∃∃ has the same problem of variable capture as ∃, and its proof rules require
placeholders. In practice, ∃∃ is used in such a restricted way that we can
ignore variable capture and don't have to worry about placeholders. There
are two ways we want to reason about a formula ∃∃ y1, . . . , yk : F:

(1) Show that it implies a formula G.

(2) Show that it is implied by a formula G.

For (1), we use the ∃∃ elimination rule (the analog of the ∃ elimination rule
of predicate logic) when the bound variable does not occur in the formula
being proved. If v does not occur (free) in G, then that rule asserts

    |= F ⇒ G  implies  |= (∃∃ v : F) ⇒ G

Therefore, if none of the variables yi occur in G, by applying the rule mul-
tiple times we prove

    |= (∃∃ y1, . . . , yk : F) ⇒ G

by proving |= F ⇒ G, treating the bound variables yi and the free variables
the same. If yi occurs free in G, then we can replace it in the quantified
formula by any variable that does not occur in F or G.
For (2), applying the ∃∃ Introduction rule multiple times, we prove

(6.2)  |= G ⇒ ∃∃ y1, . . . , yk : F

for a formula G by proving:

(6.3)  |= G ⇒ (F with y1 ← exp1, . . . , yk ← expk)

for expressions expi.


Combining (1) and (2), we see that to prove that one program of the
form (6.1) implements another program of that form, we have to prove an
assertion of the form

    |= T ⇒ (S with y1 ← exp1, . . . , yk ← expk)

where S and T have the standard form Init ∧ □[Next]v ∧ L of an abstract
program and the yi are the internal variables in S. For every interface
variable x of S, which in practice must also occur in T, the with clause
includes an implicit substitution x ← x that substitutes the variable x of
T for the variable x of S. Thus, the with clause describes a refinement
mapping under which T refines S.
This raises a question: If (6.2) is true, do there always exist expressions
expi for which (6.3) is true? The answer is no, if we can use only the variables
that appear in T to define the refinement mapping. If S has the form
Init ∧ □[Next]v ∧ L, then the answer is yes if we're allowed to add auxiliary
variables to T. Adding an auxiliary variable a (which does not occur in T)
to T means writing a formula Ta such that ∃∃ a : Ta is equivalent to T. By
this equivalence, we can verify |= T ⇒ S by verifying |= (∃∃ a : Ta) ⇒ S. By
the ∃∃ Elimination rule, we do this by verifying |= Ta ⇒ S. And to verify
this, we can use a as well as the variables of T to define the refinement
mapping. Auxiliary variables are the main topic of this chapter and are
discussed below.

6.1.3 The Definition


The standard way temporal existential quantification is defined in most tem-
poral logics is not suitable for TLA because it does not preserve stuttering
insensitivity (SI), defined in Section 3.5.3. It's the natural way to define it
for RTLA, so we will call the operator defined in that way ∃∃rtla.
To define ∃∃rtla, we first define s =y t to be true for states s and t iff the
values of all variables except y are the same in states s and t. Remembering
that σ(i) is state number i of a behavior σ, we define the relation ≈y on
behaviors by:

    σ ≈y τ ≜ ∀ i ∈ N : σ(i) =y τ(i)

Therefore, σ ≈y τ asserts that behaviors σ and τ are the same except for
the values assigned to y by their states. We then define ∃∃rtla y : F to be
satisfied by a behavior σ iff it is satisfied by some behavior τ with σ ≈y τ.
The operator ∃∃rtla is not a suitable hiding operator for properties, and
hence not suitable for TLA, because the formula ∃∃rtla y : F need not be a
property even if F is. For example, let F be the following formula, where
⌊r⌋ is the largest integer less than or equal to r:

(6.4)  (x = y = 0) ∧ □[(y′ = y + 1) ∧ (x′ = ⌊y′/2⌋)]⟨x,y⟩

Ignoring the values of other variables, the property F is satisfied by this
non-halting behavior with no stuttering steps:

    [x :: 0, y :: 0] → [x :: 0, y :: 1] → [x :: 1, y :: 2] → [x :: 1, y :: 3] → [x :: 2, y :: 4] → · · ·

The non-halting behaviors of ∃∃rtla y : F consist of this behavior:

(6.5)  [x :: 0] → [x :: 0] → [x :: 1] → [x :: 1] → [x :: 2] → · · ·

and behaviors obtained from it by adding stuttering steps. An SI formula
containing the one free variable x that allows behavior (6.5) should also
allow this behavior:

(6.6)  [x :: 0] → [x :: 1] → [x :: 2] → [x :: 3] → [x :: 4] → · · ·

which ∃∃rtla y : F does not allow, so it is not a property.


To obtain the proper quantifier ∃ for TLA, we modify the definition of
∃ rtla so ∃ y : F is satisfied by (6.6). The definition of ∃ is the same as that
of ∃ rtla except with the relation ' y on behaviors replaced by a relation
∼ y . This relation is defined so σ ∼ y τ means approximately that σ can be
obtained from τ by adding and removing stuttering steps and then changing
the values of y. The precise definition of ∼ y is a bit subtle. (In fact, its
definition in [34] is wrong.) To define ∼ y , we first define the operator \y
on behaviors. This operator is the same as the operator \ defined in (3.37)
except with “=” replaced by “= y ”. Thus, \y removes “almost stuttering”
steps instead of just stuttering steps, where a step s → t is almost stuttering
if s =y t. Here’s the precise definition:

\y (σ) = if ∀ i ∈ N : σ(i ) =y σ(0)
then σ
else if σ(0) =y σ(1) then \y (Tail (σ))
else hσ(0)i ◦ \y (Tail (σ))
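For example, applying \y to the behavior displayed above for (6.4) removes
the steps that change only y, yielding (a quick check of the recursive
definition):

   [x: 0, y: 1] → [x: 1, y: 3] → [x: 2, y: 5] → · · ·

Note that when σ(0) =y σ(1) holds, the definition keeps the later state σ(1)
rather than σ(0), which is why this behavior starts with y = 1.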

We now define σ ∼y τ to equal \y (σ) 'y \y (τ ), and we define ∃ y : F to be
satisfied by a behavior σ iff there is a behavior τ satisfying F such that
σ ∼y τ . Observe that σ 'y τ implies σ ∼y τ . Therefore, ∃ rtla y : F
implies ∃ y : F for any behavior predicate F .

One reason not to use ∃ is that if S is a safety property, then ∃ y : S


need not be a safety property. Temporal quantification destroys the nice
separation of safety and liveness provided by our way of describing abstract
programs. For example, let F be this safety property for an abstract pro-
gram:

(6.7) ∧ (x = 0) ∧ (y ∈ N)
∧ 2 [(y > 0) ∧ (x 0 = x + 1) ∧ (y 0 = y − 1)]hx ,y i
In a behavior satisfying this formula, x cannot be incremented forever be-
cause eventually y would equal 0, making any further non-stuttering steps
impossible. Therefore, formula ∃ y : F is equivalent to

(6.8) (x = 0) ∧ 2[x 0 = x + 1]x ∧ 32[x 0 = x ]x

To see that this is not a safety property, remember that a behavior σ satisfies
a safety property iff every finite prefix of σ satisfies that property. Consider
a behavior σ in which x does keep being incremented forever. Every finite
prefix of σ satisfies (6.8), since completing the prefix with stuttering steps
makes the behavior satisfy the liveness property 32[x 0 = x ]x . However, σ
doesn’t satisfy (6.8) because it doesn’t satisfy this liveness property. There-
fore, even though formula F , defined to equal (6.7), is a safety property,
formula ∃ y : F , which is equivalent to (6.8), is not a safety property.

6.2 History Variables


The simplest kind of auxiliary variable is a history variable. As the name
implies, a history variable is used to remember things that happened in the
past and can’t be deduced from the current state. We may need to add a
history variable to T to prove |= T ⇒ S when the internal state of S records
information about past events that isn’t needed to describe the behavior of
its interface variables.

6.2.1 How to Add a History Variable


Except in one unusual case described in Section 6.3.5, we add an auxiliary
variable to an abstract program by adding it to the safety part of the pro-
gram. Thus, if T equals Init ∧ 2[Next]v ∧ L for a liveness property L, then
the formula T h obtained by adding a history variable h will equal

Init h ∧ 2[Next h ]vh ∧ L




InitS = (inp = rdy) ∧ (avg = 0) ∧ seq = h i

User = ∧ inp = rdy
∧ inp 0 ∈ R
∧ (avg 0 = avg) ∧ (seq 0 = seq)

Syst = ∧ inp ∈ R
∧ seq 0 = Append (seq, inp)
∧ avg 0 = SeqSum(seq 0 ) / Len(seq 0 )
∧ inp 0 = rdy

NextS = User ∨ Syst
IS = InitS ∧ 2[NextS ]hinp,avg,seq i


S = ∃ seq : IS

Figure 6.1: The abstract averaging program S.

where Init h and Next h are obtained by augmenting Init and Next to de-
scribe, respectively, the initial value of h and how h can change; and vh is
the tuple v ◦ hh i of the variables of v and the variable h. Since h does not
appear in L, the formula ∃ h : T h equals

(∃ h : Init h ∧ 2[Next h ]vh ) ∧ L

We can therefore ignore L for now, so we assume T equals Init ∧ 2[Next]v


and show how to define Init h and Next h .
We use a tiny example to illustrate history variables. There is an abstract
program in which a user inputs a sequence of real numbers and the system
displays the average of the numbers entered thus far. The interface variables
are inp and avg. Initially, inp equals a value rdy that is not a number and
avg = 0. The user’s input action sets inp to a real number and leaves avg
unchanged. The system’s output action sets avg to the new average of the
inputs and resets inp to rdy.
This abstract program is described by formula S of Figure 6.1. It uses
an internal variable seq whose value is the sequence of numbers
input so far. Recall that R is the set of real numbers, and Section 2.3.2
defines these operators on sequences seq: Append (seq, inp) is the sequence
obtained by appending inp to the end of seq; Len(seq) is the length of seq;
and Tail (seq) is the sequence obtained by removing the first element of
seq if seq is nonempty. The operator SeqSum is defined as follows so that

variables inp = rdy, avg = 0, num = 0 ;


while true do
usr : inp :∈ R ;
sys: avg : = (avg ∗ num + inp) / (num + 1) ;
num : = num + 1 ;
inp : = rdy
end while

Figure 6.2: Abstract program T in pseudocode.

SeqSum(sq) is the sum of the elements of a finite sequence sq of numbers:



SeqSum(sq) = if sq = h i then 0 else sq(1) + SeqSum(Tail (sq))
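For example, unwinding the recursion (a check of the definition, not part
of the program):

   SeqSum(h3, 5i) = 3 + SeqSum(h5i) = 3 + (5 + SeqSum(h i)) = 3 + (5 + 0) = 8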

Using the internal variable seq to write the behavior predicate S is arguably
the clearest way to describe the values assumed by the interface variables inp
and avg. It’s a natural way to explain that the value of avg is the average of
the values that have been input. However, it’s not a good way to describe
how to implement the system. There’s no need for an implementation to re-
member the entire sequence of past inputs; it can just remember the number
of inputs and their sum. In fact, it doesn’t even need an internal variable to
remember the sum. We can implement it with an abstract program T that
implements S using only a single internal variable num whose value is the
number of inputs that the user has entered.
We first describe T in pseudocode and construct T h by adding a history
variable h to the code. The TLA translations of the pseudocode show how
to add a history variable to an abstract program described in TLA.
It’s natural to think of the user and the system in this example as two
separate processes. However, the abstract programs S and T are predicates
on behaviors, which are mathematical objects. Process is not a mathemat-
ical concept; it’s a way in which we interpret predicates on behaviors. For
simplicity, we write T as a single-process program.
The pseudocode for program T is in Figure 6.2. It uses the operator :∈
introduced in Figure 4.8, so statement usr sets inp to an arbitrary element
of R. Since we’re not concerned with implementing T , there’s no reason to
hide its internal variable num.
Because the sum of n numbers whose average is a is n ∗ a, it should be
clear that program T implements program S . But showing that T imple-
ments S requires defining a refinement mapping under which T implements
IS (program S without variable seq hidden). And that requires adding an

variables inp = rdy, avg = 0, num = 0, h = h i ;


while true do
usr : inp :∈ R ;
sys: avg : = (avg ∗ num + inp) / (num + 1) ;
num : = num + 1 ;
h : = Append (h, inp) ;
inp : = rdy
end while

Figure 6.3: Abstract program T h in pseudocode.

auxiliary variable that records the sequence of input values. Adding the
required auxiliary variable h is simple and obvious. We just add the two
pieces of code shown in black in Figure 6.3.
It is a straightforward exercise to prove

|= T h ⇒ (IS with inp ← inp, avg ← avg, seq ← h)

using the fact that

avg ≡ if h = h i then 0 else SeqSum(h) / Len(h)

is an invariant of T h . To show that this proves |= T ⇒ S , we have to show


that T h actually is obtained by adding the auxiliary variable h to T —that
is, we have to show that T is equivalent to ∃ h : T h . This requires showing
(i) |= (∃ h : T h ) ⇒ T and (ii) |= T ⇒ (∃ h : T h ).
To show (i) we have to show |= T h ⇒ T , which is obvious because it’s
easy to see that the initial predicate and next-state action of T h imply the
initial predicate and next-state action of T . To show (ii), we have to show
that for any behavior σ satisfying T there is a behavior τ satisfying T h with
τ ∼ h σ. From the code for T h , it’s easy to obtain a recursive definition of
the value of h in each state τ (i ) of τ . The declaration of h provides the
value of h in state τ (0), and the rest of the code defines the value of h in
state τ (i + 1) as a function of its value and the value of pc in state τ (i ).
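Here is one way to write that definition (a sketch; it assumes the control
variable pc of the TLA translation, which alternates between the label
values usr and sys, and it writes τ (i )(x ) for the value of a variable x in
state τ (i )):

   τ (0)(h) = h i
   τ (i + 1)(h) = if (τ (i )(pc) = sys) ∧ (τ (i + 1)(pc) = usr )
                  then Append (τ (i )(h), τ (i )(inp))
                  else τ (i )(h)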
It’s pretty obvious how to generalize from this example to adding a
history variable h to any abstract program T described by pseudocode. We
let the initial value of h be any expression that can contain the variables of
T . We modify the pseudocode by adding at most one statement assigning a
value to h to any action of the code. The right-hand side of the assignment
can contain h as well as the variables of T . Making this precise would

require making pseudocode precise, which we don’t want to do. When we


want to be precise, we use math.
So, let’s now see how we add a history variable when the abstract pro-
gram T is written in TLA. The translation of the code in Figure 6.2 to TLA
defines

T ≜ Init ∧ 2[Next]hinp,avg,num i
where Next ≜ Usr ∨ Sys

Actions Usr and Sys are the actions executed from control points usr and
sys, respectively. The TLA translation of the code in Figure 6.3 is

(6.9) T h ≜ Init h ∧ 2[Next h ]hinp,avg,num,h i

      where Init h ≜ Init ∧ (h = h i)
            Next h ≜ Usr h ∨ Sys h
            Usr h ≜ Usr ∧ (h 0 = h)
            Sys h ≜ Sys ∧ (h 0 = Append (h, inp))
Here is the general result that describes how to add a history variable to a
program. Its proof is a simple generalization of the proof for our example.

Theorem 6.1 (History Variable) Let T equal Init ∧ 2[Next]v , where
Next equals ∃ i ∈ I : Ai and v is the tuple of variables in T , and assume h
is not one of those variables. If T h equals Init h ∧ 2[Next h ]vh , where

• Init h = Init ∧ (h = exp)

• Next h = ∃ i ∈ I : Ai ∧ (h 0 = exp i )

• vh = v ◦ hh i

• exp is a state expression that does not contain the variable h, and the
exp i are step expressions that do not contain h 0 ,

then |= T ≡ ∃ h : T h .
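For example, definition (6.9) above is the instance of this theorem obtained
by taking (a correspondence check, not a new definition):

   I = {1, 2} ,  A1 = Usr ,  A2 = Sys ,
   exp = h i ,  exp 1 = h ,  exp 2 = Append (h, inp)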

6.2.2 History Variables and Fairness


We add a history variable h to a safety property T of the form Init ∧2[Next]v
to obtain a formula T h such that ∃ h : T h is equivalent to T . If a program
also contains a liveness condition L, this gives us the program T h ∧ L. Since
the variable h does not occur in L, the formula ∃ h : T h ∧ L is equivalent to
(∃ h : T h ) ∧ L which equals T ∧ L. Therefore the history variable h is an
auxiliary variable for T h ∧ L.

As explained in Section 4.2.7, the standard form for the liveness condition
of a program is the conjunction of weak and/or strong fairness conditions
of subactions of its next-state action. Even if T ∧ L has this form, T h ∧ L
will not because a subaction of Next will not be a subaction of Next h . (An
action that does not mention h cannot imply Next h .) This means that we
can’t apply Theorem 4.6 to show that hT h , Li is machine closed. However,
we can show as follows that if hT , Li is machine closed, then hT h , Li is also
machine closed. By definition of machine closure, this means showing that
any finite behavior ρ satisfying T h can be extended to an infinite behavior
satisfying T h ∧ L. Since T h implies T , machine closure of hT , Li implies ρ
can be extended to a behavior ρ ◦ σ satisfying T ∧ L. By definition of T h ,
we can modify the values of h in the states of σ to obtain a behavior τ such
that ρ ◦ τ satisfies T h . Since the truth of L does not depend on the values
of h, the behavior ρ ◦ τ also satisfies L, as required.
When using TLA, the fact that L will contain fairness conditions on
actions that are not subactions of Next h makes no difference. However, not
everyone uses TLA. In some approaches, abstract programs are described
in something like a coding language, and they define fairness only in terms
of weak and strong fairness of subactions of the next-state action. So, it is
interesting to know if we can replace a fairness condition on a subaction B i
of T with the same fairness condition on a corresponding subaction B hi of
T h . We can, under the following condition, which is likely to be satisfied
by programs written in those other languages: The next-state action of T
must be the disjunction of actions Ai , and each B i must be a subaction of
Ai such that a B i step is not an Aj step for j 6= i . The precise result is
the following, whose proof is in the Appendix. In this theorem, letting B i
equal false is equivalent to omitting that fairness condition because weak
and strong fairness of false are trivially true. (The action false is never
enabled, so (4.23) implies SFv (false) equals 23false ⇒ 23false, which
equals true.)

Theorem 6.2 With the assumptions of Theorem 6.1, for all i ∈ I let B i
be a subaction of Ai such that T ∧ (i 6= j ) ⇒ 2[¬(B i ∧ Aj )]v for all j in
I ; and let B hi ≜ hB i iv ∧ (h 0 = exp i ). Then

   T ∧ (∀ i ∈ I : XFiv (B i )) ≡ ∃ h : T h ∧ (∀ i ∈ I : XFivh (B hi ))

where each XFi is either WF or SF.



6.2.3 A Completeness Result for History Variables


A popular approach to proving safety properties of concurrent programs,
derived from work by Owicki and Gries [42], can prove only invariance
properties. We can, in theory, reduce proving safety properties to prov-
ing invariance. We do this by adding a history variable h to a program T to
obtain a program T h . For any safety property F , we can then define a state
predicate I F that is an invariant of T h iff (every behavior of) T satisfies F .
The idea is simple: We define the value of h to be the sequence of system
states in the current behavior up to and including the current state. We
then define I F to be true iff the value of h satisfies F . The result is stated
in the following theorem, whose proof is sketched in the Appendix.

Theorem 6.3 Let T equal Init ∧ 2[Next]hxi where x is the list of all vari-
ables of T ; let F be a safety property such that F (σ) depends only on the
values of the variables x in σ, for any behavior σ; and let h be a variable not
one of the variables x. We can add h as a history variable to T to obtain
T h and define a state predicate I F in terms of F such that |= [[T ]] ⇒ F is
true iff I F is an invariant of T h .

A simple example of the theorem is when F is the safety property F 12


defined semantically by (4.1) of Section 4.1.2. That property asserts x must
equal 1 before it can equal 2. A program Init ∧ 2[Next]v satisfies F 12 iff the
formula (x = 2) ⇒ h is an invariant of the program obtained by adding the
history variable h to that program as follows:

(Init ∧ (h = false)) ∧ 2[Next ∧ (h 0 = h ∨ (x = 1))]v ◦hh i
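In general, a minimal sketch of the construction behind Theorem 6.3 (the
details are in the Appendix) is to let h record the sequence of all states of
the behavior so far:

   Init h ≜ Init ∧ (h = h hxi i)
   Next h ≜ Next ∧ (h 0 = Append (h, hxi 0 ))

and to define I F to assert that the finite behavior recorded by h, extended
by infinitely many stuttering steps, satisfies F . Since a behavior satisfies
the safety property F iff every finite prefix of it does, I F is an invariant of
T h iff T satisfies F .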

Theorem 6.3 assumes only that F is a safety property. This might suggest
we can show that one program satisfies the safety part of another program by
verifying an invariance property. However, I have never seen this done, and
in practice it seems unlikely to be possible to describe any but the simplest
abstract programs with an invariant.

6.3 Stuttering Variables


Typically, when a lower-level abstract program T implements a higher-level
abstract program S , program T takes more steps than S does to perform an
operation. Under the refinement mapping, the extra steps of T implement
stuttering steps of S . It’s also possible for S to take more steps than T . In
that case, defining a refinement mapping requires adding steps to behaviors

of T that implement those extra steps of S . This is done by adding a


stuttering variable s to T . The extra steps are ones that change only s, so
when s is hidden, those steps become stuttering steps of T .
There are two kinds of stuttering variables used in practice: ones that
add stuttering steps immediately after steps of an action, and ones that add
stuttering steps immediately before steps of an action. They are described
in Sections 6.3.2 and 6.3.3. Multiple such variables can be combined into a
single stuttering variable. Section 6.3.5 explains another kind of stuttering
variable that is never needed in practice but could, in theory, be required.
This section talks about adding stuttering steps, which literally makes no
sense because it’s impossible to require or forbid stuttering steps in a TLA
formula. Here, adding stuttering steps to an abstract program T means
writing a formula T s containing s and the variables of T by adding steps
that change s and leave the variables of T unchanged, so that ∃ s : T s is
equivalent to T . In this section, a stuttering step usually means one of those
additional steps that leave the variables of T unchanged and change s.

6.3.1 The Example


Stuttering variables are explained with the silly example of a tiny censoring
system. An artist paints pictures and submits them to a censor, who decides
either to display or reject each picture. This system is described by the
abstract program Cen1 defined as follows.
There are two interface variables inp and disp. The artist submits a
picture w , which is an element of the set Art of all possible pictures, by
setting the value of the variable inp to w . The censor then either displays w
by setting the value of the variable disp to hw , i i, where i is set alternately
to 0 and 1, or else rejects w . (The second component of disp is needed
so displaying the same picture twice isn’t a stuttering step, which would
needlessly complicate the example.) The censor then acknowledges receipt
of the picture by setting the value of inp to a special value NotArt that is
not an element of Art.
There is also an internal variable aw that is hidden. The value of aw is
initially the empty sequence h i. It is set to hw i when the artist submits a
picture w , and it is reset to h i when w is either displayed or rejected. The
value of aw records whether or not the display/reject decision has been made.
That information is encoded in aw this way so the example is more easily
modified to obtain an example in Section 6.4. The complete description of
the abstract program is formula Cen1 in Figure 6.4, where ICen1 is the
program without aw hidden. (Initially, any painting may be displayed.)


Cen1 = ∃ aw : ICen1
ICen1 = Init ∧ 2[Next1]v


v = hinp, disp, aw i

Init = ∧ inp = NotArt
∧ aw = h i
∧ disp ∈ Art × {0, 1}

Next1 = Input ∨ DispOrNot ∨ Ack

Input = ∧ (inp = NotArt) ∧ (aw = h i)
∧ inp 0 ∈ Art
∧ aw 0 = hinp 0 i
∧ disp 0 = disp

DispOrNot = ∧ aw 6= h i
∧ ∨ disp 0 = haw (1), 1 − disp(2)i
∨ disp 0 = disp
∧ aw 0 = h i
∧ inp 0 = inp

Ack = ∧ (inp ∈ Art) ∧ (aw = h i)
∧ inp 0 = NotArt
∧ (aw 0 = aw ) ∧ (disp 0 = disp)
Figure 6.4: The program Cen1.

There is another way to describe the artist/censor system as an ab-


stract program. In ICen1, submission of a picture w by the artist is de-
scribed by an input action that sets inp to w and aw to hw i. A separate
action DispOrNot either displays or rejects w . We define Cen2 to equal
∃ aw : ICen2 where ICen2 describes a program in which it is the input ac-
tion that decides whether to display or reject w , setting aw to hw i iff it
decides to display w . The displaying action always displays w if aw equals
hw i. The program Cen2 is defined in Figure 6.5, where v , Init, and Ack
are the same as in Cen1 and are defined in Figure 6.4.
If we ignore the values of aw , the only difference between behaviors
of ICen1 and ICen2 is that, when a picture is rejected, the behavior of
ICen1 takes one more step than the corresponding behavior of ICen2—a
step that leaves inp and disp unchanged. Since inp and disp are the only
free variables in the two definitions of Cen, stuttering insensitivity implies


Cen2 = ∃ aw : ICen2
ICen2 = Init ∧ 2[Next2]v


Next2 = InpOrNot ∨ Display ∨ Ack

InpOrNot = ∧ (inp = NotArt) ∧ (aw = h i)
∧ inp 0 ∈ Art
∧ ∨ aw 0 = hinp 0 i
∨ aw 0 = aw
∧ disp 0 = disp

Display = ∧ aw 6= h i
∧ disp 0 = haw (1), 1 − disp(2)i
∧ aw 0 = h i
∧ inp 0 = inp
Figure 6.5: The program Cen2.

that the formulas Cen1 and Cen2 are equivalent, so they describe the same
abstract program.
To show that the two definitions are equivalent, we have to show that
ICen1 and ICen2 each implement the other under a suitable refinement map-
ping. We will see here how to define the refinement mapping under which
ICen2 implements ICen1. Section 6.4 shows how to define the refinement
mapping under which ICen1 implements ICen2.

6.3.2 Adding Stuttering Steps After an Action


To define the refinement mapping that shows ICen2 implements ICen1, we
have to add a stuttering step to an execution of ICen2 for each operation of
receiving an input and rejecting it. We do that by adding a stuttering vari-
able that adds a stuttering step after each InpOrNot step of the execution
that rejects the input—that is, after InpOrNot steps that set aw to h i.
The simplest stuttering variable s is one whose value is a natural number
that equals 0 except when it is adding stuttering steps (steps that change
only s), in which case s equals the number of such steps it has yet to take.
Here’s how we add such a variable that adds stuttering steps after a subac-
tion A of the next-state action.
Let T equal Init ∧ 2[Next]v , where Next equals A ∨ (∃ j ∈ J : B j ) for
actions A and B j . We define T s to equal Init s ∧ 2[Next s ]vs , where Next s
equals As ∨ (∃ j ∈ J : B sj ) and Init s , As , B sj , and vs are defined as follows:

S1. vs is the tuple of variables obtained by appending s to the tuple v of


variables.

S2. Init s = Init ∧ (s = 0).

S3. As = ∨ (s = 0) ∧ A ∧ (s 0 = exp)
∨ (s > 0) ∧ (v 0 = v ) ∧ (s 0 = s − 1)
where exp is an expression whose value is a natural number; it can
contain the original variables primed or unprimed.

S4. B sj = (s = 0) ∧ B j ∧ (s 0 = 0), for j ∈ J .

Ignoring the value of s, the behaviors satisfying T s are the same as behaviors
satisfying T , except each A step in a behavior of T is followed in T s by a
finite number (possibly 0) of steps that leave the variables of T unchanged.
Therefore, by stuttering insensitivity, T and ∃ s : T s are satisfied by the
same sets of behaviors, so they are equivalent.
To show that ICen2 implements ICen1, we define ICen2s in this way,
where A equals InpOrNot and the B i are Ack and Display. In the definition
of InpOrNot s , we let:

exp = if aw 0 = h i then 1 else 0
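Written out, instantiating S3 with this exp gives (just the expansion,
nothing new):

   InpOrNot s ≜ ∨ (s = 0) ∧ InpOrNot ∧ (s 0 = if aw 0 = h i then 1 else 0)
                ∨ (s > 0) ∧ (v 0 = v ) ∧ (s 0 = s − 1)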

This adds a stuttering step to a behavior of ICen2s after an InpOrNot step


that rejects the input.
Programs ICen1 and ICen2s take the same number of steps to process
an input. A stuttering step of ICen2s corresponds to a DispOrNot step of
ICen1 that rejects the input. If we compare behaviors of these two programs,
we find that corresponding states have the same values of the variables inp,
disp, and aw except when ICen2s is about to take a stuttering step. In that
state, s = 1 in ICen2s , and the value of aw for an input w is h i in ICen2s
and hw i in ICen1. This means that the value of aw in a behavior of ICen1
always equals the value of the following state function in the corresponding
behavior of ICen2s :

(6.10) awBar = if s = 0 then aw else hinp i

Therefore, ICen2s implements ICen1 under the refinement mapping that


substitutes awBar for aw . In other words:

(6.11) |= ICen2s ⇒ (ICen1 with aw ← awBar )



The proof of (6.11) is similar to, but simpler than, the refinement proof
sketched in Section 5.4.2. Here, we give only the briefest outline of a proof
to present results that will be used below when discussing liveness.
For any formula F , let's abbreviate (F with aw ← awBar ) by FBar , so
we must prove |= ICen2s ⇒ ICen1Bar . The proof of |= Init s ⇒ InitBar is
trivial, since s = 0 implies awBar = aw by definition of awBar , so Init s
implies InitBar = Init.
The main part of the proof is proving:

(6.12) |= ICen2s ⇒ 2[Next1Bar ]vBar

This is proved by proving assertions C1–C4 below, which are the analogs of
assertions R1–R7 of the proof in Section 5.4.2. Again, assertions containing
actions of the form hAiv are proved for use in reasoning about liveness when a
weaker assertion containing A suffices to prove (6.12). Two of the assertions
require an invariant Inv 2s of ICen2s . That invariant needs to assert type
correctness of disp (for C3) and that s = 1 implies aw = h i (for C2).

C1. |= (s = 0) ∧ InpOrNot s ⇒ InputBar

C2. |= Inv 2s ∧ (s = 1) ∧ InpOrNot s ⇒ hDispOrNotBar ivBar
C3. |= Inv 2s ∧ Display s ⇒ hDispOrNotBar ivBar
C4. |= Ack s ⇒ hAckBar ivBar

Proving these assertions is a good way to start learning to write proofs.


Often, when showing that one program implements another, after adding
a simple stuttering variable it’s necessary to add a history variable to be able
to define the refinement mapping. For example, suppose the input actions
of the censor programs set inp to some value Busy instead of letting it be
unchanged. We could then not define awBar to make (6.11) true because the
input value would be forgotten when s equals 1. To define a state function
awBar to make (6.11) true, we would have to add a history variable that
remembers what value was input.
We can avoid having to add a history variable by letting the stuttering
variable carry additional information. This is done by generalizing the way
stuttering steps are counted. In the censor example, instead of setting s
to 1 when adding a stuttering step and to 0 otherwise, we can set it to hw i
when adding the step, where w is the value being input, and to h i when not
adding the step. The number of stuttering steps to be taken at any point in
the execution is then the length Len(s) of the sequence s. We would define
awBar to equal:

if s = h i then aw else s
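Concretely, only the first disjunct of the definition of InpOrNot s changes
(a sketch of the modification; inp 0 is the value being input):

   (s = h i) ∧ InpOrNot ∧ (s 0 = if aw 0 = h i then hinp 0 i else h i)

and the second disjunct now changes only s, setting s 0 to Tail (s) when
s 6= h i.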

In general, we can let s assume values in any set with a well-founded relation,
defined in Section 2.2.6.2 to be a set with an ordering relation in which any
sequence of decreasing elements must reach a minimal element. We just
require that every added stuttering step decreases the value of s.
One use of this generality is for adding stuttering steps after multiple
actions. To do this, we let the value of s be a pair hm, i i, where m is the
number of remaining stuttering steps and i identifies the action. We define
the well-founded ordering  on this set of pairs by letting hm, i i  hn, j i
iff m > n. We can use this same trick to let the value of s be a tuple
with additional components. Information in those other components can
be used in defining the refinement mapping so it makes the stuttering steps
implement the appropriate steps of the higher-level program. For simplicity,
we state our theorem just for this particular use of a well-founded order.
However, the conjunct s(2) = i in the definition of Asi is added to ensure
that only Asi performs stuttering steps added after Ai , although that matters
only if s contains additional components that depend on i .

Theorem 6.4 (Post-Action Stuttering Variable)


Let T equal Init ∧ 2[Next]v , where Next equals (∃ i ∈ I : Ai ) ∨ B for a
constant set I , and v is a tuple of all the variables of T . If T s equals
Init s ∧ 2[Next s ]vs where

• s is not a variable of T and vs = v ◦ hs i.

• Init s = Init ∧ (s = h0, i 0 i) for some i 0 in I .

• Next s = (∃ i ∈ I : Asi ) ∨ B s

• Asi = ∨ (s(1) = 0) ∧ (s(2) = i ) ∧ Ai ∧ (s 0 = hexp i , i i)
∨ (s(1) > 0) ∧ (v 0 = v ) ∧ (s 0 = hs(1) − 1, s(2)i)
where |= T ⇒ 2[exp i ∈ N]v and exp i is a step expression not contain-
ing s.

• B s = (s(1) = 0) ∧ B ∧ (s 0 = s)

then ∃ s : T s equals T .

The theorem does not assume that the actions Ai and B are mutually dis-
joint. A step could be both an Ai and an Aj step for i 6= j , or both an Ai
and a B step. That should rarely be the case when applying the theorem,
since it allows a nondeterministic choice of how many stuttering steps (if
any) are added in some states. The action B will usually be the disjunc-
tion of actions B j . In that case, B s equals the disjunction of the actions
(s(1) = 0) ∧ B j ∧ (s 0 = s).

6.3.3 Adding Stuttering Steps Before an Action


Suppose that instead of adding stuttering steps after InpOrNot steps of
ICen2, we want to add them before Ack steps. That’s a silly thing to do,
but it’s a silly example anyway. One thing that makes it silly is that when
the Ack action is enabled, nothing in the state tells us whether a stuttering
step is necessary. The value of aw is h i regardless of whether or not a
Display step has occurred. So we’ll have to add the stuttering step whether
or not it’s needed. But that’s not a problem, since an unnecessary stuttering
step can simply implement a stuttering step of ICen1.
For a simple stuttering variable that counts down to 0, we add stuttering
steps before an action A the way we added them after A, except instead of
As executing A in the first step when s equals 0, it executes A in the last
step, when s equals 1 (unless it adds 0 stuttering steps). However, to ensure
that an As step can be taken after those stuttering steps, the stuttering
steps can begin only when A is enabled. (Once A is enabled, stuttering steps
leave it enabled.) To add exp stuttering steps before an A step, we define:

As = ∧ ∨ E(A) ∧ (s = 0) ∧ (s 0 = exp)
∨ (s > 0) ∧ (s 0 = s − 1)
∧ if s 0 = 0 then A else v 0 = v

Since A is enabled when s 0 = 0 is true, any enabling condition (conjunct


with no primed variable) can be removed from A in the last line of the
definition.
We could define Ack s this way in ICen2s with exp = 1 to add a stut-
tering step before every Ack step. However, there’s nothing in the state to
indicate whether that stuttering step should implement a Display step or
a stuttering step of ICen1. To define the refinement mapping that shows
ICen2s implements ICen1, we would have to add a history variable that
records whether or not the InpOrNot s step decided to display the input.
Alternatively, we could add the history variable before adding the stutter-
ing variable. We could then define Ack s so it adds a stuttering step iff the
InpOrNot step chose not to display the input.
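Concretely, taking A to equal Ack and exp to equal 1 in the definition
above gives (a sketch):

   Ack s ≜ ∧ ∨ E(Ack ) ∧ (s = 0) ∧ (s 0 = 1)
             ∨ (s > 0) ∧ (s 0 = s − 1)
           ∧ if s 0 = 0 then Ack else (v 0 = v )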
To obtain the general result for adding stuttering steps before actions

Ai , we modify Theorem 6.4 by changing the definition of Asi to:



Asi = ∧ ∨ E(Ai ) ∧ (s(1) = 0) ∧ (s 0 = hexp i , i i)
        ∨ (s(1) > 0) ∧ (s(2) = i ) ∧ (s 0 = hs(1) − 1, s(2)i)
      ∧ if s 0 (1) = 0 then Ai else (v 0 = v )

where exp i is a step expression. (Although allowed, there is usually no point


having primed variables in exp i , because they equal the unprimed variables
unless exp i equals 0.)
We can also add stuttering steps both before and after Ai steps. We add
a third component to s to indicate whether the next stuttering steps to be
added for Ai are ones that precede or follow the Ai step. Writing a precise
definition is left as an exercise for motivated readers.

6.3.4 Fairness and Stuttering Variables


As with other auxiliary variables, we add a stuttering variable to a safety
property T of the form Init ∧ 2[Next]v . If a program is described by the
property T ∧ L for a liveness property L, then the program with the added
stuttering variable s is T s ∧ L .
To see how stuttering variables work with liveness conditions, we add
fairness requirements L1 and L2 to our two censor programs to define:
IC 1 ≜ ICen1 ∧ L1        IC 2 ≜ ICen2 ∧ L2

We’ve shown that ICen2s implements ICen1 under a refinement mapping.


We show here that IC 2s implements IC 1 under that same refinement map-
ping. That is, we show:

(6.13) |= IC 2s ⇒ (IC 1 with aw ← awBar )

The fairness requirements are:



L1 ≜ WFv (DispOrNot) ∧ WFv (Ack )
L2 ≜ WFv (Display) ∧ WFv (Ack )

(Theorem 4.7 implies that L1 and L2 are equivalent to weak fairness of


DispOrNot ∨ Ack and Display ∨ Ack respectively, but it’s more convenient
to write them this way.) It's clear that L1 and L2 are the appropriate
fairness requirements for IC 1 and IC 2, ensuring that an Ack step occurs
after each input step. In particular, an input step of IC 2 is an InpOrNot
step, after which eventually Ack is enabled—either immediately if the input
is rejected or after a Display step that WFv (Display) implies must occur.
When Ack is enabled, WFv (Ack ) implies that an Ack step must occur.
For IC 2s to implement IC 1 under a refinement mapping, it should ensure
that an input step is eventually followed by an Ack s step. In IC 2s , an input
is entered by an (s = 0) ∧ InpOrNot s step. We must show that such a step
is eventually followed by an Ack s step. This appears problematic because if
the step rejects the input, then it sets s to 1, in which case the only enabled
action of Next2s is (s = 1) ∧ InpOrNot s ; and L2 asserts no fairness condition
for that action. To show that the (s = 0) ∧ InpOrNot s step must be followed
by an Ack s step, we first show as follows that ∃ s : IC 2s is equivalent to IC 2:

∃ s : IC 2s ≡ ∃ s : ICen2s ∧ L2      By definition of IC 2s .
            ≡ (∃ s : ICen2s ) ∧ L2   Because s does not occur in L2.
            ≡ ICen2 ∧ L2             By Theorem 6.4.
            ≡ IC 2                   By definition of IC 2.

Any behavior that satisfies IC 2s satisfies ∃ s : IC 2s , so it satisfies IC 2. An


(s = 0) ∧ InpOrNot s step is an InpOrNot step, which by IC 2 must eventu-
ally be followed by an Ack step, which by definition of ICen2s must be an
Ack s step. Thus, IC 2s implies that any input step is eventually followed by
an Ack s step.
In the case of the input being rejected, the necessary (s = 1)∧InpOrNot s
step must occur to satisfy the fairness requirement WFv (Ack ) on the action
Ack in IC 2. If you think of the abstract program IC 2s as instructions
to a computer for generating behaviors, then this makes no sense. How
can a fairness condition on Ack tell the computer to take an InpOrNot s
step? But by now, you should understand that an abstract program is
a predicate on behaviors, not instructions for generating them. Formula
ICen2s ∧ WFv (Ack ) implies that if a state with s = 1 has been reached,
then there must be another InpOrNot s step and then an Ack s step in the
behavior.
This may seem weird. The source of the apparent weirdness is that
IC 2s contains a fairness condition on the action Ack , which is not a subac-
tion of the next-state action Next2s . Fairness conditions on actions that are
not subactions of the next-state action can lead to weirdness, including program
descriptions that are not machine closed. However, in this case, we still get
a machine-closed program description. In fact, this is true in general. If
hT , Li is machine closed and T s is obtained from T by adding a stuttering
variable, then hT s , Li is also machine closed. The proof is the same as the
one for history variables sketched in Section 6.2.2, except in defining the

behavior τ , we may have to add stuttering steps to σ as well as changing


the values of the variable s. Stuttering insensitivity of L implies that ρ ◦ τ
still satisfies L.
We now explain the proof of (6.13). As before, define FBar to equal
(F with aw ← awBar ) for any formula F , so (6.13) asserts |= IC 2s ⇒ IC 1Bar .
The proof of |= ICen2s ⇒ ICen1Bar is discussed in Section 6.3.2, so we
consider only the proof of |= IC 2s ⇒ L1Bar , which requires proving:

(6.14) (a) |= IC 2s ⇒ WFvBar (DispOrNotBar )
       (b) |= IC 2s ⇒ WFvBar (AckBar )

We now sketch a proof of (6.14a). As usual when proving temporal proper-


ties, instead of assuming IC 2s , which is true only for a behavior starting in
a state satisfying Init2s , we assume this 2 formula implied by IC 2s :

   IIC 2s ≜ 2Inv 2 ∧ 2[Next2s ]vs ∧ L2

where Inv 2 is an invariant of ICen2s that asserts some obvious invariants


such as type correctness. Here is the proof sketch.

1. Suffices: Assume: IIC 2s ∧ 2 EhDispOrNotBar ivBar ∧ 2[¬DispOrNotBar ]vBar
   Prove: 3hDispOrNotBar ivBar
   Proof: By (4.15).
2. 2((aw 6= h i) ∨ (s 6= 0))
   Proof: The definition of DispOrNot implies that EhDispOrNotBar ivBar
   equals awBar 6= h i, so the step 1 assumption 2 EhDispOrNotBar ivBar
   implies 2(awBar 6= h i); and the definition of awBar implies awBar 6= h i
   equals (aw 6= h i) ∨ (s 6= 0).
3. 32(aw 6= h i) ∨ 32(s 6= 0)
   3.1. 2 ( (aw 6= h i) ⇒ 2(aw 6= h i) )
        Proof: The assumption IIC 2s implies that aw 6= h i can be made
        false only by a Display s step, which by C3 is a hDispOrNotBar ivBar
        step. The assumption 2[¬DispOrNotBar ]vBar implies that such a step
        can't occur. Therefore, if aw 6= h i ever becomes true, then it must
        remain true forever.
   3.2. Q.E.D.
        Proof: By steps 2 and 3.1 and the temporal logic tautology:
        |= 2(F ∨ G) ∧ 2(F ⇒ 2F ) ⇒ (32F ∨ 32G)

4. Case: 32(aw 6= h i)
   Proof: Since aw 6= h i equals EhDisplay iv , the case assumption and
   WFv (Display) imply that, when 2(aw 6= h i) becomes true, a hDisplay iv
   step eventually occurs, and IIC 2s implies that this step must be a Display s
   step. By C3, this Display s step is a hDispOrNotBar ivBar step, which implies
   the goal introduced by step 1.
5. Case: 2(s 6= 0)
   Proof: The case assumption and the assumption 2Inv 2 imply 2(s = 1).
   As shown above in the explanation of why a behavior of IC 2s can't halt
   in a state with s = 1, the property WFv (Ack ) implies that, in such a
   state, an (s = 1) ∧ InpOrNot s step must eventually occur. By C2, that is
   the hDispOrNotBar ivBar step that proves the step 1 goal.
6. Q.E.D.
Proof: Step 3 implies that the step 4 and 5 cases are exhaustive.

The proof of (6.14b) is similar but simpler, since it doesn't have the
complication of deducing from fairness of one action (Ack ) that a step of
another action (InpOrNot s ) of the same program must occur.
Theorem 6.2 shows how, after adding a history variable to a program,
we can rewrite the program’s fairness properties as fairness conditions of
subactions of the modified program’s next-state action. I don’t know if
there is a similar result for stuttering variables. Theorem 6.2 is relevant
to methods other than TLA for describing abstract programs. Those other
methods that I’m aware of do not assume stuttering insensitivity, so a similar
result for stuttering variables seems to be of no interest.

6.3.5 Infinite-Stuttering Variables


Suppose a terminating program is described by a formula ∃ y : IS , where
IS implies that the value of y keeps changing forever. (IS implies that at
some point, the values of all its other variables stop changing.) Suppose
also that program ∃ y : IS is refined by a terminating program T with no
internal variables, so all its variables eventually stop changing. The methods
of adding stuttering steps to a program described so far add a finite number
of stuttering steps to non-stuttering steps of the program. They can’t define
a state function that keeps changing forever, so they can’t be used to define
a refinement mapping to show that T implements ∃ y : IS .
It’s easy to construct an example of such programs IS and T , but I can’t

imagine one occurring in practice. We consider them only for completeness—


and in particular, to prove the completeness theorem in Section 6.4.4 stating
that if |= T ⇒ ∃ y : IS is true for some tuple y of variables, then we can add
auxiliary variables to T to obtain a program T a that implements IS under
a refinement mapping. For that theorem to be true, we need to define an
infinite-stuttering variable whose value keeps changing forever to handle this
situation that never occurs in practice.
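For instance, here is one such pair (a minimal sketch, in which x and y are
the only variables):

   IS ≜ (x = 0) ∧ (y = 0) ∧ 2[(x 0 = x ) ∧ (y 0 = y + 1)]hx ,y i ∧ 23hy 0 6= y iy
   T ≜ (x = 0) ∧ 2[false]x

Every behavior of IS leaves x forever equal to 0 while changing y infinitely
often, so T and ∃ y : IS are satisfied by exactly the same behaviors—those
in which x always equals 0. But a refinement mapping would have to
substitute for y a state function that keeps changing after the variables
of T have stopped changing, which is impossible without an infinite-
stuttering variable.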
There are lots of ways to define an infinite-stuttering variable. Here is
the definition used in the proof of Theorem 6.6. Let T equal Init ∧2[Next]v ,
where v is the tuple of all variables that appear in T , and let s not be one
of those variables. We then define T s to equal:

Init ∧ 2[(Next ∧ (v 0 6= v )) ∨ ((s 0 6= s) ∧ (v 0 = v ))]v ◦hs i ∧ 23hs 0 6= s is

6.4 Prophecy Variables


6.4.1 Simple Prophecy Variables
We observed in Section 6.3.1 that the descriptions Cen1 and Cen2 of the
censor system were equivalent. We showed that Cen2 implies Cen1, which
required adding a stuttering variable to ICen2. We now complete the demon-
stration of equivalence by showing that Cen1 implies Cen2. This requires
defining a state function awBar such that ICen1 implies:

(ICen2 with aw ← awBar )

However, this is impossible for the following reason. Because the refinement
mapping substitutes the variables inp and disp of ICen1 for the correspond-
ing variables of ICen2, an Input step of ICen1 must implement an InpOrNot
step of ICen2. Besides choosing the input, the InpOrNot action of ICen2
also decides whether or not that input is to be displayed, recording its de-
cision in the value of aw . However, that decision is made by ICen1 later,
when executing the DispOrNot action. Immediately after the Input action,
there’s no information in the state of ICen1 to determine what the value of
variable aw of ICen2 should be.
The solution to this problem is to have the Input action guess what
DispOrNot will do, indicating its guess by setting a prophecy variable p to
a value that predicts whether the input will be displayed or rejected by the
DispOrNot step.

To make the generalization from this example more obvious, let’s write
action DispOrNot of ICen1 as the disjunction of two actions: DorN Yes that
displays the input and DorN No that doesn’t. Remember that:

DispOrNot = ∧ . . .
            ∧ ∨ disp 0 = haw (1), 1 − disp(2)i
              ∨ disp 0 = disp
            ∧ . . .

We can define DorN i , for i = Yes and i = No, by modifying the definition
of DispOrNot to get:

DorN i = ∧ . . .
         ∧ ∨ (i = Yes) ∧ (disp 0 = haw (1), 1 − disp(2)i)
           ∨ (i = No) ∧ (disp 0 = disp)
         ∧ . . .
We then replace DispOrNot in ICen1 by ∃ i ∈ Π : DorN i , where Π equals
{Yes, No}. We can then add to ICen1 an auxiliary variable p called a
prophecy variable to obtain a formula ICen1p in which the Input action is
replaced by
      Input p = Input ∧ (p 0 ∈ Π)

and the DispOrNot action is replaced by:



DispOrNot p = DorN p
Thus the Input p action predicts what the DispOrNot action will do, and
DispOrNot p is modified to ensure that the prediction comes true. To com-
plete the definition of ICen1p , we can let Init p equal Init and Ack p equal
Ack , since the value of p matters only after an Input p step and before the
following DispOrNot p step.
In ICen2, the value of aw is h i except after an InpOrNot step that chose
to display the input. This implies
(6.15) |= ICen1p ⇒ (ICen2 with aw ← awBar )
where awBar is defined by:

awBar = if aw 6= h i ∧ (p = Yes) then aw else h i
To show that (6.15) implies |= ICen1 ⇒ ICen2, we have to show that p is an
auxiliary variable—that is, we have to show that ∃ p : ICen1p is equivalent
to ICen1. To do that, we prove two things:

1. Every behavior satisfying ICen1p satisfies ICen1


Proof: It’s clear that Init p equals Init, Act p equals Act, and Input p im-
plies Input. To complete the proof, we must show that every DispOrNot p
step is a DispOrNot step. It’s easy to see that
E(DispOrNot p ) ⇒ (p ∈ {Yes, No})
is an invariant of ICen1p , and to check that DispOrNot p implies DispOrNot
for each of those two values of p.
2. For any behavior σ satisfying ICen1 there is a behavior τ satisfying
ICen1p such that σ 'p τ .
Proof: We let σ be a behavior satisfying ICen1 and construct the states
of τ from the states of σ by specifying the value of p in each of those
states, so obviously σ ' p τ . The behavior τ will satisfy ICen1p if the
values chosen for p satisfy these three conditions:
1. The value of p in the second state of a stuttering step of σ (one
leaving the variables of ICen1 unchanged) is the same as its value in
the first state of the step.
2. After an Input p step, the value of p is either Yes or No.
3. In the first state of a DispOrNot step of σ, the value of p must make
that step a DispOrNot p step of τ .
We define the values of p in all states of τ as follows. We let p have any
value in the initial state. In any other state of τ , we let the value of p be
the same as its value in the previous state except if the state is the second
state of an Input step of σ. In that case, we let the value of p equal Yes
if the next DispOrNot step of σ changes disp; otherwise we let it equal
No. (If there is no next DispOrNot step, so there remain only stuttering
steps, we can let p have either value.) It’s easy to check that this way of
defining p makes it satisfy the three conditions. End Proof

Let’s now generalize from this example. We want an action B to predict


the result of the next execution of an action A. We do this by writing A as
∃ i ∈ Π : Ai for a constant set Π of possible predictions and having B predict
for which value of i the next A step will be an Ai step. Action B makes the
prediction by setting the variable p to equal its prediction, so we define B p
to equal B ∧ (p 0 ∈ Π). The prediction is made to come true by defining
A p to equal Ap , the action selected by the current value of p.
One way our example was special is that the prediction made by an
Input p step is used by DispOrNot p in the first non-stuttering step after it

is made. This allowed ICen1p to leave the new value of p unspecified by


other actions. Usually, there can be steps of other actions between when the
prediction is made and when it is fulfilled. Those other actions should leave
the value of p unchanged. For any subaction B of the next-state action other
than A, we let B p equal either B ∧ (p 0 = p) if it doesn’t make a prediction
or B ∧ (p 0 ∈ Π) if it makes one. It doesn’t matter if multiple predictions are
made for the same A step; only the most recent one counts.
There is no reason not to let p always equal an element of Π, so we will
do that. This means we let Init p equal Init ∧ (p ∈ Π). This can represent
an initial prediction, or it can be overridden by a subsequent prediction. In
either case, it means that we have the type invariant 2(p ∈ Π).
It doesn’t matter if a prediction is never used—either because it is over-
ridden by another prediction or an A step never occurs. What does matter is
that a prediction must be used at most once. Our ability to choose the right
prediction in the proof that ∃ p : ICen1p is equivalent to ICen1 depended
on this. To make sure that this is true, we require that Ap makes a predic-
tion, so we define Ap to equal Ap ∧ (p 0 ∈ Π). That prediction can always
be overridden by a subsequent prediction made by a different action. The
argument above that the variable p in our example was a prophecy variable
then generalizes to prove:

Theorem 6.5 (Simple Prophecy Variable) Let T = Init ∧ 2[Next]v ,
where v is the tuple of variables in T , let

   Next = (∃ i ∈ Π : Ai ) ∨ (∃ j ∈ J : B j )

and let Π be a constant set. If p is not a variable of T ,

   T p = Init p ∧ 2[Next p ]vp ,
   vp = v ◦ hp i ,
   Init p = Init ∧ (p ∈ Π) ,
   Next p = (Ap ∧ (p 0 ∈ Π)) ∨ (∃ j ∈ J : B j ∧ C j ) ,

and each C j equals p 0 = p or p 0 ∈ Π, then |= (∃ p : T p ) ≡ T .

The theorem makes no disjointness assumption about the actions Ai and


B j , but in most applications of the theorem they will be mutually disjoint.
It is inelegant and possibly confusing to have a program make predic-
tions that are never used—for example, by having the DispOrNot p action of
ICen1p make a prediction that is always replaced by the prediction made
by the Input p action. If the prediction will never be used, we can replace

p 0 ∈ Π by p 0 = None (or p ∈ Π by p = None for an initial prediction),


where None is a value not in Π. The assertion that the prediction is never
used means that the following state predicate is an invariant of T p :

(p = None) ⇒ ¬ E(∃ i ∈ Π : Ai )

We could also modify Next p to allow a special value of p to indicate that no


prediction is being made, but there is no reason to do that.

6.4.2 Predicting the Impossible and Liveness


What if a prophecy variable makes a prediction that can’t be fulfilled? A
prophecy variable predicts, for an action A equal to ∃ i ∈ Π : Ai , the value
of i for which the next A step is an Ai step. The prediction that the next A
step will be an Ap step can’t be fulfilled if action Ap can’t be enabled until
an Aj step occurs for some j 6= p.
Let’s look at the worst case: a prediction that predicts that the next A
step will be an Ai step, where Ai equals false. We can write any next-state
action Next as

∃ i ∈ {0, 1} : ((i = 1) ∧ Next) ∨ ((i = 0) ∧ false)

(If i = 0, then a [Next]v step leaves the variables of v unchanged.) The


observation that ∃ i ∈ {0, 1} : F i equals F 0 ∨ F 1 and a bit of propositional
logic show that

∃ i ∈ {0, 1} : (p = i ) ∧ (((i = 1) ∧ Next) ∨ ((i = 0) ∧ false))

equals (p = 1) ∧ Next. Theorem 6.5 therefore implies that if T equals


Init ∧ 2[Next]v , then T equals ∃ p : T p where

T p = ∧ Init ∧ (p ∈ {0, 1})
∧ 2 [ (p = 1) ∧ Next ∧ (p 0 ∈ {0, 1}) ]vp
In other words, if p ever becomes equal to 0, then the next-state relation of
T p is never again enabled, so the behavior halts with an infinite sequence
of stuttering steps—ones that leave p and the variables of T unchanged.
But that’s perfectly OK. T is a safety property; it allows behaviors
that terminate at any point. The prophecy variable p is simply predicting
whether the behavior will terminate before the next Next step.
If we are describing an abstract program in which hNext iv is always
enabled and its execution is never supposed to stop, then we must conjoin
to T some fairness property, such as WFv (Next). If hNext iv is enabled

in every reachable state of T , then it is enabled in every reachable state


of T p , since the reachable states of T p are reachable states of T because
T equals ∃ p : T p . In that case, conjoining WFv (Next) to T p adds the
requirement that in every behavior, an infinite number of non-stuttering
Next steps must occur. In our worst-case example, p = 0 implies Next p =
false, so T p ∧ WFv (Next) is satisfied only by behaviors in which infinitely
many hNext iv steps occur, and hence in which p never equals 0.
Conjoining WFv (Next) to T p rules out finite behaviors allowed by T p —
ones in which p equals 0. Hence, the pair hT p , WFv (Next)i is not machine
closed, so WFv (Next) is not a fairness property for T p . This doesn’t con-
tradict Theorem 4.6, because Next, which is a trivial subaction of Next, is
not a subaction of Next p . In general, if predicting that the next A step is
an Ap step is a nontrivial prediction, then not every possible A step can be
a Next p step, so |= A ⇒ Next p can't be true—which by definition means A is
not a subaction of Next p .
As dramatically illustrated by this example, adding a prophecy variable
that can make impossible predictions to a description of an abstract program
with a fairness property produces a hsafety, livenessi pair that is not machine
closed. Although this is not a typical example, in practice prophecy variables
often do make impossible predictions. This is usually because it’s easier not
to eliminate them. That’s the case for the example in Section 6.5.
Programs that are not machine closed are weird, and unintentional weird-
ness usually indicates an error. An abstract program that describes how a
concrete program works should be machine closed, because coding languages
have no way of expressing liveness properties that are not fairness proper-
ties. Abstract programs that are not machine closed should almost always
be avoided because they’re hard to understand. However, there are excep-
tions [25, Section 3.2]. On the other hand, prophecy variables are added
to a program only for verifying that it implements another program. There
is no reason adding a prophecy variable should produce a machine closed
program.

6.4.3 General Prophecy Variables


A simple prophecy variable makes a single prediction. General prophecy
variables can make multiple predictions. Those multiple predictions can be
successive predictions about a single action or separate predictions about
different actions. These two possibilities are illustrated with variants of
the censor system. There can also be multiple predictions about multiple
actions, but we won’t try to be that general. The two examples illustrate

the concepts. A very general definition has been described elsewhere for
expert TLA+ users [37].

6.4.3.1 A Sequence of Prophecies


Let’s now modify the censor programs to allow the artist to submit a new
picture before the censor has either displayed or rejected the previous sub-
mission. At any time, there may be a queue of submissions being processed
by the censor. In the modified version of ICen1, called ICenSeq1, the censor
has not yet decided whether to display or reject any of the submissions in
that queue. In ICenSeq2, the modified version of ICen2, the censor decides
immediately whether to accept or reject a submission and maintains only a
queue of submissions to be displayed.
Formula CenSeq1 is defined in Figure 6.6, where everything is in gray
except for parts that differ from the corresponding parts of the definition
of Cen1 in Figure 6.4 other than by adding “Seq” to names. Because of
the way we defined Cen1, with aw equal to a sequence of 0 or 1 pictures,
the changes are minimal. (Recall the definitions of Tail and Append from
Section 2.3.2.)
Similarly, Figure 6.7 shows the definition of CenSeq2, using formulas
defined in Figure 6.6. Shown in black are the parts that differ from the
corresponding parts in the definition of Cen2 in Figure 6.5 by more than a
name change.
In both ICenSeq1 and ICenSeq2, the value of the variable aw is the
queue being maintained by the censor. As with ICen1 and ICen2, when aw
is hidden by ∃ , the two formulas are equivalent. As in the previous example,
ICenSeq2 decides whether to display or reject an input before ICenSeq1 does.
To define a refinement mapping to show ICenSeq1 implements ICenSeq2, we
need to add a prophecy variable p to ICenSeq1 that is set by the Input action
and predicts the decisions that will be made by the DispOrNotSeq action.
However, this time there are multiple predictions to be remembered—one
for every picture in aw .
You have probably figured out that this will be done by letting the
value of p be a sequence of Yes or No values, each element of p predicting
whether the corresponding input in the sequence aw will be displayed or
rejected by the DispOrNot action. Here’s how we define ICenSeq1p , the
formula obtained by adding the prophecy sequence variable p to ICenSeq1.
Let Π be the set {Yes, No} of predictions. The value of p should always
be a sequence of elements of Π having the same length as the value of the


CenSeq1 = ∃ aw : ICenSeq1
ICenSeq1 = InitSeq ∧ 2[NextSeq1]v


v = hinp, disp, aw i

InitSeq = ∧ inp = NotArt
∧ aw = h i
∧ disp ∈ Art × {0, 1}

NextSeq1 = InputSeq ∨ DispOrNotSeq ∨ AckSeq

InputSeq = ∧ inp = NotArt
∧ inp 0 ∈ Art
∧ aw 0 = Append (aw , inp 0 )
∧ disp 0 = disp

DispOrNotSeq = ∧ aw 6= h i
∧ ∨ disp 0 = haw (1), 1 − disp(2)i
∨ disp 0 = disp
∧ aw 0 = Tail (aw )
∧ inp 0 = inp

AckSeq = ∧ inp ∈ Art
∧ inp 0 = NotArt
∧ (aw 0 = aw ) ∧ (disp 0 = disp)
Figure 6.6: The program CenSeq1.

CenSeq2 = ∃ aw : ICenSeq2
ICenSeq2 = InitSeq ∧ 2[NextSeq2]v


NextSeq2 = InpOrNotSeq ∨ DisplaySeq ∨ AckSeq

InpOrNotSeq = ∧ inp = NotArt
∧ inp 0 ∈ Art
∧ ∨ aw 0 = Append (aw , inp 0 )
∨ aw 0 = aw
∧ disp 0 = disp

DisplaySeq = ∧ aw 6= h i
∧ disp 0 = haw (1), 1 − disp(2)i
∧ aw 0 = Tail (aw )
∧ inp 0 = inp
Figure 6.7: The program CenSeq2.

variable aw of ICenSeq1. The initial predicate of ICenSeq1p is:



InitSeq p = InitSeq ∧ (p = h i)

In addition to appending the input to aw , the action InputSeq p must append


to p the prediction of whether or not that input will be displayed:

InputSeq p = InputSeq ∧ (∃ i ∈ Π : p 0 = Append (p, i ))


As in ICen1p , to make DispOrNotSeq p display the input iff p predicts that


it will, we define DorNSeq i so that

DispOrNotSeq = ∃ i ∈ Π : DorNSeq i

where DorNSeq Yes displays the input and DorNSeq No rejects it. The defi-
nition of DorNSeq i is obtained by modifying DispOrNotSeq the same way
we modified DispOrNot to obtain DorN i for ICen1. We can then define:

DispOrNotSeq p = DorNSeq p(1) ∧ (p 0 = Tail (p))


Note that having DispOrNotSeq p set p 0 to Tail (p) ensures that every pre-
diction is used only once. Since AckSeq p neither makes nor satisfies a pre-
diction, we define:

AckSeq p = AckSeq ∧ (p 0 = p)

Putting this all together we get:

ICenSeq1p = InitSeq p ∧ 2[NextSeq1p ]vp


where

NextSeq1p = InputSeq p ∨ DispOrNotSeq p ∨ AckSeq p

and vp equals hinp, disp, aw , p i.


We can now show that CenSeq1 implements CenSeq2 by showing

(6.16) |= ICenSeq1p ⇒ (ICenSeq2 with aw ← awBar )

where awBar is the subsequence of aw containing only the pictures that p


predicts will be displayed. To define awBar , we first define OnlyYes(wsq, ysq)
to be the subsequence of the sequence wsq consisting of all elements for which
the corresponding element of the sequence ysq equals Yes. The definition

assumes that wsq and ysq are sequences with the same length. Here is the
recursive definition. (Recall that “◦” is concatenation of sequences.)

OnlyYes(wsq, ysq) =
if wsq = h i then h i
else ( if Head (ysq) = Yes then hHead (wsq)i
else h i )
◦ OnlyYes(Tail (wsq), Tail (ysq))
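For example (a quick check of the definition, where w 1 , w 2 , and w 3 are any
pictures):

   OnlyYes(hw 1 , w 2 , w 3 i, hYes, No, Yesi) = hw 1 , w 3 i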

Defining awBar to equal OnlyYes(aw , p) makes (6.16) true.
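
For readers who find executable notation helpful, here is a minimal Python
sketch of OnlyYes, representing sequences as lists and the predictions as the
strings "Yes" and "No". These encodings are assumptions made for illus-
tration, not the book’s definitions.

   def only_yes(wsq, ysq):
       """Subsequence of wsq whose corresponding element of ysq is "Yes"."""
       if not wsq:
           return []
       head = [wsq[0]] if ysq[0] == "Yes" else []
       return head + only_yes(wsq[1:], ysq[1:])  # "◦" is list concatenation

   # awBar = only_yes(aw, p): with predictions Yes, No, Yes, only the
   # first and third pictures survive.
   assert only_yes(["pic1", "pic2", "pic3"],
                   ["Yes", "No", "Yes"]) == ["pic1", "pic3"]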

It’s straightforward to modify Theorem 6.5 to describe an arbitrary
prophecy variable p that makes a sequence of predictions. We replace the
definition of Next^p in the hypothesis of the theorem by:

   Next^p  ≜  (A_{p(1)} ∧ D) ∨ (∃ j ∈ J : B_j ∧ C_j) , where

      D equals p′ = Tail(p) or ∃ i ∈ Π : p′ = Append(Tail(p), i)
      C_j equals p′ = p or ∃ i ∈ Π : p′ = Append(p, i)

However, there’s one problem: The empty sequence ⟨ ⟩ is the value of p
indicating that no prediction is being made. When p = ⟨ ⟩, the value of the
subscript p(1) = i in this definition is undefined. That doesn’t matter in
our example because p and aw are sequences of the same length, so p = ⟨ ⟩
implies aw = ⟨ ⟩, which implies that DorNSeq_i equals false for i ∈ Π.
Therefore, the value of the undefined subformula makes no difference. In
general, to make the modified theorem valid, we need to add to its hypothesis
the requirement that the following is an invariant of T^p:

   (p = ⟨ ⟩) ⇒ ¬ E (∃ i ∈ Π : A_i)
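
To make the sequence discipline concrete, here is a hedged Python simula-
tion of how ICenSeq1^p uses p: each input appends an arbitrary prediction,
and each display-or-reject step obeys and then consumes the active predic-
tion p(1). Running all inputs before all display steps is a simplification for
illustration; the real program can interleave them.

   import random

   def run(pictures):
       p, aw, displayed = [], [], []
       for pic in pictures:          # InputSeq^p: append the input to aw
           aw.append(pic)            # and an arbitrary prediction to p
           p.append(random.choice(["Yes", "No"]))
       while aw:                     # DispOrNotSeq^p: the step must agree
           pic = aw.pop(0)           # with the active prediction p(1),
           decision = p.pop(0)       # which is then used exactly once
           if decision == "Yes":
               displayed.append(pic)
       return displayed              # equals OnlyYes of the inputs and p

   print(run(["pic1", "pic2", "pic3"]))   # e.g. ['pic1', 'pic3']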
6.4.3.2 A Set of Prophecies
To illustrate a prophecy variable that makes a set of concurrent predictions,
we now modify the censor programs CenSeq1 and CenSeq2 so that instead
of displaying pictures in the order in which they were submitted, the censor
can display them in any order. This is represented by letting aw be a set
rather than a sequence of pictures. It is done in the two programs ICenSet1
and ICenSet2, where the first lets aw be the set of all unprocessed inputs
and the second lets aw contain just the ones that will be displayed. Letting
CenSet1 and CenSet2 be the programs obtained from these two programs
by hiding aw , we want to show that CenSet1 implements CenSet2. As you
probably realize, defining a refinement mapping to show that this is true

CenSet1  ≜  ∃ aw : ICenSet1

ICenSet1  ≜  InitSet ∧ □[NextSet1]_v

v  ≜  ⟨inp, disp, aw, old⟩

InitSet  ≜  ∧ inp = NotArt
            ∧ aw = { }
            ∧ disp ∈ Art × {0, 1}
            ∧ old = { }

NextSet1  ≜  InputSet ∨ DispOrNotSet ∨ AckSet

InputSet  ≜  ∧ inp = NotArt
             ∧ inp′ ∈ Art \ old
             ∧ aw′ = aw ∪ {inp′}
             ∧ (disp′ = disp) ∧ (old′ = old ∪ {inp′})

DispOrNotSet  ≜  ∃ w ∈ aw :
                   ∧ ∨ disp′ = ⟨w, 1 − disp(2)⟩
                     ∨ disp′ = disp
                   ∧ aw′ = aw \ {w}
                   ∧ (inp′ = inp) ∧ (old′ = old)

AckSet  ≜  ∧ inp ∈ Art
           ∧ inp′ = NotArt
           ∧ (aw′ = aw) ∧ (disp′ = disp) ∧ (old′ = old)

Figure 6.8: The program CenSet1.

requires adding a prophecy variable p to ICenSet1 that predicts which of
the inputs in aw will be displayed.
Writing these two censor programs poses a problem. What if the artist
submits the same picture twice? If we want the picture to be displayed twice,
we would need to have two copies of it in aw , which means aw couldn’t
simply be a set. In the example of Section 6.5, you’ll see one way of keeping
multiple copies of a value in a set. But for simplicity, we’ll modify the censor
programs not to allow the artist to submit the same picture twice. This will
be done by adding an interface variable old whose value is the set of all
previously submitted pictures.
The definition of CenSet1 is in Figure 6.8, with the changes from the
definition of CenSeq1 (Figure 6.6) in black. You should be able to write the
definition of CenSet2 yourself.

To define a refinement mapping under which ICenSet1 implements
ICenSet2, we need to add a prophecy variable p to ICenSet1 that predicts,
for each picture in aw, whether or not that picture will be displayed. The
obvious way to do that is to let the value of p be a function in aw → Π, the
set of functions from aw to Π. As before, we let Π equal the set {Yes, No}.

Since aw initially equals the empty set, the initial value of p should be
the function whose domain is the empty set. There is just a single such
function, and the easiest way to write it is as the empty sequence ⟨ ⟩, which
is a (and hence the) function whose domain is the empty set. So, we define:

   InitSet^p  ≜  InitSet ∧ (p = ⟨ ⟩)

The InputSet^p action must add a prediction of whether or not the picture
inp′ that it adds to aw will be displayed. Thus, it must assert that p′ is
the function obtained from p by adding inp′ to its domain and letting the
value of p′(inp′) be either element of Π. To write that action, let’s define
FcnPlus(f, w, d) to be the function obtained from a function f by adding an
element w to its domain and letting that function map w to d. The domain
of f is written domain(f), so the definition is:

   FcnPlus(f, w, d)  ≜
      x ∈ {w} ∪ domain(f) ↦ if x = w then d else f(x)

We can then define

   InputSet^p  ≜  InputSet ∧ (∃ i ∈ Π : p′ = FcnPlus(p, inp′, i))

To define DispOrNotSet^p, we define DorNSet_i(w) as follows, so that
DispOrNotSet equals ∃ w ∈ aw, i ∈ Π : DorNSet_i(w).

   DorNSet_i(w)  ≜  ∧ ∨ (i = Yes) ∧ (disp′ = ⟨w, 1 − disp(2)⟩)
                      ∨ (i = No) ∧ (disp′ = disp)
                    ∧ aw′ = aw \ {w}
                    ∧ (inp′ = inp) ∧ (old′ = old)

The DispOrNotSet^p action will have to erase the prediction by removing
from the domain of p the picture being displayed or rejected. So let’s define
FcnMinus(f, w) to equal the restriction of f to its domain minus the
element w:

   FcnMinus(f, w)  ≜  x ∈ (domain(f) \ {w}) ↦ f(x)

We can now define:

   DispOrNotSet^p  ≜
      ∃ w ∈ aw : DorNSet_{p(w)}(w) ∧ (p′ = FcnMinus(p, w))

Since AckSet^p neither makes nor satisfies a prediction, its definition is
simply:

   AckSet^p  ≜  AckSet ∧ (p′ = p)

The rest of the definition of ICenSet1^p should be clear.

We can then show that CenSet1 implements CenSet2 by showing

   |= ICenSet1^p ⇒ (ICenSet2 with aw ← awBar)

where awBar equals {w ∈ aw : p(w) = Yes}, the set of elements in aw
that p predicts will be displayed.
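
Here is a hedged Python sketch of FcnPlus, FcnMinus, and awBar, modeling
a function as a Python dict whose key set is its domain. The encodings are
assumptions made for illustration.

   def fcn_plus(f, w, d):
       """FcnPlus(f, w, d): f with w added to its domain, mapped to d."""
       g = dict(f)
       g[w] = d
       return g

   def fcn_minus(f, w):
       """FcnMinus(f, w): the restriction of f to domain(f) minus {w}."""
       return {x: v for x, v in f.items() if x != w}

   p = {}                                    # the function with empty domain
   p = fcn_plus(p, "pic1", "Yes")            # InputSet^p adds a prediction
   p = fcn_plus(p, "pic2", "No")
   p = fcn_minus(p, "pic2")                  # DispOrNotSet^p erases one
   aw_bar = {w for w in p if p[w] == "Yes"}  # {w ∈ aw : p(w) = Yes}
   assert aw_bar == {"pic1"}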
We won’t bother writing the generalization of Theorem 6.5 for a prophecy
variable p that makes a set of predictions.
6.4.3.3 Further Generalizations

We now extract from our examples a more general formulation of prophecy
variables. To construct a prophecy variable, we start with certain sets of ac-
tions we’ll call action sets. For CenSeq1 there is one action set consisting of
two actions: DispOrNotSeq_Yes, a DispOrNotSeq action that displays the pic-
ture, and DispOrNotSeq_No, a DispOrNotSeq action that doesn’t display it.
For CenSet1, for every w ∈ Art, there is an action set consisting of two ac-
tions: DispOrNotSet_{w,Yes} and DispOrNotSet_{w,No}, which are DispOrNotSet
actions that either display or don’t display the input w. Thus, there is a set
of action sets, one action set defined for each w in Art.
In general, for a program T, we have a set of action sets, each of which
can be written as {A_i : i ∈ Π}. (The set Π can be different for different
action sets.) That set of action sets is a constant; it doesn’t change during
a behavior of T.

A prophecy is a prediction about one of the program’s action sets. That
is, it is an element of the set Π defining the action set {A_i : i ∈ Π}. It predicts
the value of i for which the next A_i step occurs. In any state, the value of
a prophecy variable describes a set of predictions, some of which are active.
For CenSeq1, the prophecy variable p equals a sequence of predictions, all
for the same action set, only the first element of the sequence being active.
For CenSet1, the prophecy variable p equals a set of prophecies, one for each
action set defined by an element of aw, all of its prophecies being active.

In general, the program T^p is defined by adding the variable p in such
a way that every behavior satisfying T^p satisfies T. Moreover, it ensures
that no active prediction of p is ever violated. When an action predicted
by an active prediction of p occurs, we say that the prediction is fulfilled.
The initial value of p can contain prophecies. Prophecies can be added to
or removed from p and/or made active or inactive by any action of T^p, so
long as the following conditions are satisfied:

• No two active prophecies can be predictions for the same action set.

• For any action of T^p that adds a prophecy to p, it must be possible
  for the action to add a prophecy for any element of the set Π for that
  action set.

• An action of T^p that fulfills a prediction must remove that prediction
  from p. (It may also add one or more new predictions.)
The prophecy variables of CenSeq1 and CenSet1 made only predictions that
were likely to be fulfilled. We could instead have used prophecy variables,
ones that a mathematician might consider simpler, that make a lot more
predictions. For CenSeq1, instead of having each InputSeq step add a prediction
to p, we could have let the initial value of p be an infinite sequence of predic-
tions. The first element of the sequence would be the active one, and each
DispOrNotSeq action would remove that element from p. For CenSet1, we
could have let the initial value of p be any element of Art → {Yes, No} ,
predicting for each picture w whether or not it will be displayed if it is in-
put. Since the same picture can’t be input twice, the value of p(inp) could
be set by a DispOrNotSet action to a value indicating that its prediction is
inactive.
We could have used an even more extravagant prophecy variable for
CenSet1—one that predicts not only whether each picture will be displayed
or rejected, but in which order they will be input. The initial value of
p would be an infinite sequence of predictions ⟨w, d⟩, for w ∈ Art and
d ∈ {Yes, No}, predicting not just if the next DispOrNotSet step will display
or reject the input, but that it must occur with inp equal to w. Almost all
of those predictions will be impossible to fulfill because inp will not equal
w . But as we’ve seen, impossible predictions don’t matter because they
just require the behavior to halt, which is either allowed or is ruled out by
a liveness hypothesis. This may seem silly, but a prophecy variable that
makes predictions that are almost all impossible is used in the example of
Section 6.5 because it seems to provide the simplest way to define the needed
refinement mapping.

6.4.4 The Existence of Refinement Mappings

We now state a completeness result for the auxiliary variables that we’ve
described. Completeness means that if |= T ⇒ ∃ y : IS is true, where y is
a list of variables, then we can successively add a list a of these auxiliary
variables to T to obtain a formula T^a that implies IS under a refinement
mapping that substitutes state expressions of T^a for the variables y.

Like most such completeness results, it assumes that there is a mathe-
matical proof of the result based on the semantics of the formulas. That
is, we assume not only the truth of |= T ⇒ ∃ y : IS, but that there exists
a mathematical proof of its truth. Such a proof consists of an operator Φ
such that for every behavior σ satisfying T, there is a behavior Φ(σ) sat-
isfying IS that shows σ satisfies ∃ y : IS. This means approximately that
we can obtain Φ(σ) from σ by adding and/or removing stuttering steps and
changing the values of the variables of y. The precise statement of this
condition is Φ(σ) ∼_y σ, where the definition of ∼_y is the same as that of
∼ in Section 6.1.3 with = replaced by =_y, and s =_y t defined to mean
that states s and t are equal except perhaps for the values they assign to
the variables of y. The assumption that there exists a mathematical proof
of |= T ⇒ ∃ y : IS is embodied in the use of Φ to construct the refinement
mapping. Here is the precise statement of the theorem. It uses the notation
that if y is the list y_1, . . . , y_n of variables and exp is the list exp_1, . . . , exp_n of
expressions, then y ← exp is an abbreviation for y_1 ← exp_1, . . . , y_n ← exp_n.

Theorem 6.6 Let x, y, and z be lists of variables, all distinct from one
another; let the variables of T be x and z and the variables of IS be x and
y; and let T equal Init ∧ □[Next]_⟨x,z⟩ ∧ L. Let the operator Φ map behaviors
satisfying T to behaviors satisfying IS such that Φ(σ) ∼_y σ. By adding
history, stuttering, and prophecy variables to T, we can define a formula
T^a such that ∃ a : T^a is equivalent to T and a list exp of expressions defined
in terms of Φ and the variables of T^a such that

   |= T^a ⇒ (IS with y ← exp)
The theorem makes no assumption about L other than that it contains
no variables besides those of x and z. It doesn’t even have to be a liveness
property.

Here is the idea behind the theorem’s proof. We first add an infinite-
stuttering variable t to avoid having to worry about terminating behaviors.
We then add a history variable h that remembers the entire sequence of
values of all the tuples ⟨x, z, t⟩ in all the states reached thus far, including
the current one. We then add a prophecy sequence variable p that predicts
the infinite sequence of all future values of ⟨x, z, t⟩. This means that in all
states of the behavior, the value of h ◦ p is the entire sequence of values
of ⟨x, z, t⟩ in the complete (infinite) behavior. Moreover, the length of h
indicates the position of the current state in that behavior. The values of h
and p and the mapping Φ provide all the information needed to determine
the values to substitute for y to obtain a refinement mapping under which
IS is simulated. The proof in the Appendix sketches the details.

The theorem shows that these auxiliary variables are, in principle, all
we need to define a refinement mapping. It and its proof do not tell us how
refinement mappings are defined in practice.

6.5 The FIFO Queue

This section presents a more realistic example of the use of auxiliary vari-
ables to show that one abstract program implements another. That makes
it rather long, but it’s included for two reasons. The first is that it describes
linearizability, which is an important concept for designing concurrent pro-
grams. The second is that stuttering and prophecy variables are not as
intuitive as history variables, and a more realistic example may provide
some insight into how they can be used in practice.

6.5.1 Fifo – A Linearizable Specification

Popular coding languages provide a small number of built-in data types
such as finite-precision integers. Other data types must be implemented as
objects. An object has a state and methods with which the program can
read and modify parts of the state. We will ignore how objects are created
and destroyed.

A simple example of such an object is a first in, first out queue, called
a fifo. We can think of the state of a fifo as a sequence queue of elements
from some set Data. A fifo provides two methods, usually described as
follows.

enqueue Takes an element of Data as an argument, and appends it to the
   end of queue. It returns no value.¹

dequeue If queue is nonempty, it removes the first element of queue and
   returns it as the result. If queue is the empty sequence, it returns some
   special value.

¹ Sometimes the queue can hold only some maximum number of elements, but for
  simplicity we assume that there is no such limit.
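
Viewed as ordinary sequential code, the two methods might look like the
following hedged Python sketch. The class name and the NO_DATA marker
are assumptions made for illustration; the fifo specified below instead makes
an empty dequeue wait.

   from collections import deque

   NO_DATA = object()   # assumed special value returned by an empty dequeue

   class Fifo:
       """Sequential sketch of the two fifo methods; ignores concurrency."""
       def __init__(self):
           self.queue = deque()

       def enqueue(self, d):
           self.queue.append(d)    # append to the end of queue; no result

       def dequeue(self):
           # remove and return the first element, or the special value
           return self.queue.popleft() if self.queue else NO_DATA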

An object is accessed only by executing its methods. For the purpose of
correctness, the programmer needs to know nothing about how these two
methods are implemented.
This kind of description is adequate for a method in a traditional pro-
gram. It is inadequate for concurrent programs because it says nothing
about what happens if two processes concurrently access the object. The
call of a method and the return are usually described as single steps, but
execution of the operation may consist of steps that occur between those
two steps.
Often, it is considered an error if two processes concurrently access the
same object. The object must either be accessed by only one process, or
else accesses by different processes must be inside the critical section of a
mutual exclusion algorithm. We’re interested in objects that are meant to be
accessed concurrently by multiple processes—for example, a critical section
object for implementing mutual exclusion with enter and exit methods.
Maurice Herlihy and Jeannette Wing defined an object to be linearizable
iff it acts as if the execution of a method consists of three steps: the call,
the return, and between them a single step that performs the actual reading
and/or modifying of the object’s state [18]. The state of a linearizable object
is described by internal variables. Only the call and return steps change
interface variables. Linearizability has become a standard requirement for
shared objects in concurrent systems.
We describe a linearizable fifo as an abstract program. We require that
execution of a dequeue operation when queue is empty waits for an element
to be enqueued rather than returning a special value. This makes the fifo
more interesting because it involves process synchronization—making one
process wait for another process to do something. Since the purpose of the
example is to illustrate the use of auxiliary variables, which are added only
to the safety property of a program, we consider only the safety property of
a fifo.
We assume there is a set EnQers of processes that perform enqueue
operations. Execution of an enqueue operation by process e consists of
three steps: a BeginEnq(e) step that describes the call of the method, a
DoEnq(e) step that modifies the variable queue, and an EndEnq(e) step that
describes the return. The enqueuers communicate with the object through
the interface variable enq, whose value is a function with domain EnQers.

The value of enq(e) equals Done when enqueuer e is not performing an
enqueue operation, and it equals the data value it is appending to queue
when e is performing the operation, where Done is some constant not in
Data. There is also an internal variable enqInner , where enqInner (e) is set
to Busy by the BeginEnq(e) action and is set to Done by the DoEnq(e)
action.
Similarly, there is a set DeQers of dequeuer processes, each d ∈ DeQers
performing BeginDeq(d ), DoDeq(d ), and EndDeq(d ) steps. Dequeuers com-
municate with the object through the interface variable deq, where deq(d ) is
set to Busy by the BeginDeq(d ) action and to the value that was dequeued
by the EndDeq(d ) action. There is an internal variable deqInner , where
deqInner (d ) is set to Busy by the BeginDeq(d ) action and set by DoDeq(d )
to the value dequeued by the dequeue operation. The complete definition of
the abstract program is formula Fifo in Figure 6.9. It uses the unchanged
operator, where unchanged exp equals exp′ = exp. Thus, if v is a tuple
⟨v_1, . . . , v_n⟩ of variables, unchanged v asserts that v′_i = v_i for all i in
1 . . n.

6.5.2 POFifo – A More General Specification

6.5.2.1 The Background

Many people, myself included, used to believe that any implementation of
a fifo had to be a more concrete version of program IFifo, with the value
of the variable queue encoded in the program’s state. This implied that
any implementation of a fifo should implement IFifo under a refinement
mapping, without having to add auxiliary variables.
We were wrong. Suppose two processes concurrently execute enqueue
operations. When the two operations’ EndEnq steps have occurred, program
IFifo has appended both values to queue in some order, determining the
order in which they will be dequeued. However, in a behavior of program
Fifo, where queue is hidden, there is no way to know in which order the two
values appear in queue until that order is revealed by dequeue operations. In
their paper defining linearizability, Herlihy and Wing gave an algorithm that
implements a fifo in which, from a state immediately after both enqueue
operations have completed, it is possible for the two values to be dequeued in
either order by two successive non-concurrent dequeue operations. There is
no queue encoded in the algorithm. While their algorithm implements Fifo,
there is no refinement mapping under which it implements IFifo without the
addition of auxiliary variables. In particular, showing that their algorithm

Fifo  ≜  ∃ queue, enqInner, deqInner : IFifo

IFifo  ≜  Init ∧ □[Next]_v

v  ≜  ⟨enq, deq, queue, enqInner, deqInner⟩

Init  ≜  ∧ enq = (e ∈ EnQers ↦ Done)
         ∧ deq ∈ (DeQers → Data)
         ∧ queue = ⟨ ⟩
         ∧ enqInner = (e ∈ EnQers ↦ Done)
         ∧ deqInner = deq

Next  ≜  ∨ ∃ e ∈ EnQers : BeginEnq(e) ∨ DoEnq(e) ∨ EndEnq(e)
         ∨ ∃ d ∈ DeQers : BeginDeq(d) ∨ DoDeq(d) ∨ EndDeq(d)

BeginEnq(e)  ≜  ∧ enq(e) = Done
                ∧ ∃ D ∈ Data : enq′ = (enq except e ↦ D)
                ∧ enqInner′ = (enqInner except e ↦ Busy)
                ∧ unchanged ⟨deq, queue, deqInner⟩

DoEnq(e)  ≜  ∧ enqInner(e) = Busy
             ∧ queue′ = Append(queue, enq(e))
             ∧ enqInner′ = (enqInner except e ↦ Done)
             ∧ unchanged ⟨deq, enq, deqInner⟩

EndEnq(e)  ≜  ∧ enq(e) ≠ Done
              ∧ enqInner(e) = Done
              ∧ enq′ = (enq except e ↦ Done)
              ∧ unchanged ⟨deq, queue, enqInner, deqInner⟩

BeginDeq(d)  ≜  ∧ deq(d) ≠ Busy
                ∧ deq′ = (deq except d ↦ Busy)
                ∧ deqInner′ = (deqInner except d ↦ NoData)
                ∧ unchanged ⟨enq, queue, enqInner⟩

DoDeq(d)  ≜  ∧ deq(d) = Busy
             ∧ deqInner(d) = NoData
             ∧ queue ≠ ⟨ ⟩
             ∧ deqInner′ = (deqInner except d ↦ Head(queue))
             ∧ queue′ = Tail(queue)
             ∧ unchanged ⟨enq, deq, enqInner⟩

EndDeq(d)  ≜  ∧ deq(d) = Busy
              ∧ deqInner(d) ≠ NoData
              ∧ deq′ = (deq except d ↦ deqInner(d))
              ∧ unchanged ⟨enq, queue, enqInner, deqInner⟩

Figure 6.9: The program Fifo.
implements Fifo requires adding a prophecy variable that predicts the order
in which data items enqueued by concurrent enqueue operations will be
dequeued.
What is encoded in the state of their algorithm is not a linearly ordered
queue of enqueued data values, but rather a partial order on the set of
enqueued values that indicates the possible orders in which the values can
be returned by dequeue operations. A partial order on a set S is a relation ≺
on S that is transitive and has no cycles (which implies a ⊀ a for any a ∈ S).
For the partial ordering ≺ on the set of enqueued values, the relation u ≺ w
means that value u must be dequeued before value w. Program IFifo is the
special case in which that partial order is a total order, meaning that either
u ≺ w or w ≺ u for any two distinct enqueued values u and w.
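
As a hedged Python sketch, representing ≺ as a finite set of ⟨u, w⟩ pairs
(the same encoding the variable before uses below), the two defining
conditions can be checked like this:

   def is_partial_order(rel):
       """rel: set of (u, w) pairs meaning u ≺ w. A partial order is
       transitive and cycle-free; for a transitive relation, cycle-free
       is equivalent to (a, a) never being in rel."""
       transitive = all((a, d) in rel
                        for (a, b) in rel
                        for (c, d) in rel if b == c)
       irreflexive = all(a != b for (a, b) in rel)
       return transitive and irreflexive

   assert is_partial_order({("u", "w"), ("w", "x"), ("u", "x")})
   assert not is_partial_order({("u", "w"), ("w", "u")})  # a cycle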
Presented here is a program POFifo that is equivalent to Fifo, but which
is obtained by hiding internal variables in a program IPOFifo that main-
tains a partially ordered set of enqueued values rather than a queue. The
Herlihy-Wing algorithm can be shown to implement IPOFifo under a refine-
ment mapping defined in terms of its variables, without adding a prophecy
variable.

6.5.2.2 Program POFifo

Execution of an operation is described by a sequence of steps. In a lineariz-
able description, there are three steps: a step of the Begin action, a step
of the Do action, and a step of the End action. One operation execution
is defined to precede another if its End step precedes the other execution’s
Begin step. If neither of two operation executions precedes the other, then
the executions are said to be concurrent.
Since Begin and End steps change interface variables, they appear in a
behavior of Fifo. However, a Do step changes only internal variables, so
it is not visible in a behavior of Fifo. If the executions of two enqueues
are concurrent, then a behavior of Fifo does not show in which order their
Do steps occurred in IFifo. Only if one enqueue execution precedes the
other do we know from a behavior of Fifo the order in which the enqueued
values appear in queue. Therefore, a behavior satisfies Fifo iff it satisfies the
following two safety properties. They are stated informally, where a value
is taken to mean a particular enqueueing of a data value.

F1. Each dequeued value has been enqueued, and an enqueued value is
dequeued at most once.

F2. If an enqueue of a value v precedes an enqueue of a value w, then the
    dequeue of value w cannot precede the dequeue of value v.

The values enqueued by two concurrent enqueue executions may be de-
queued in either order.
Program IPOFifo must maintain a set of enqueued items and some or-
dering relation among them. The first thing to observe is that the same
data item might be enqueued twice before any item is dequeued. Since this
is not a silly example, that possibility should be handled. An easy way to
do that is to maintain a set of pairs ⟨d, i⟩ where d is the enqueued data
value and i is an element of a set Ids of identifiers that serve to distinguish
between different “copies” of an enqueued value. Let’s call such a ⟨data
value, identifier pair⟩ a datum. (We will use datums as the plural of datum
because data suggests elements of Data rather than of Data × Ids. We will
continue to call an element of Data a data value.)
IPOFifo will use an internal variable elts whose value is the set of cur-
rently enqueued datums. It will also have an internal variable whose value
is an ordering relation ≺ on the set elts where u ≺ w means that datum u
must be dequeued before datum w is. For IPOFifo to describe a linearizable
object, execution of an enqueue operation must consist of a BeginPOEnq
step, followed by a DoPOEnq step that puts the datum in elts, followed by
an EndPOEnq step. Condition F2 is satisfied if the DoPOEnq step that
puts a datum w in elts adds a relation u ≺ w for every datum u put in elts
by an enqueue operation that has completed. Of course, F1 is satisfied if
every DoPODeq step obtains its data item from a datum that it removes
from elts.
Given any behavior σ satisfying IPOFifo, we can obtain a behavior τ
satisfying IPOFifo by moving every DoPOEnq step earlier in the behavior
so it occurs immediately after its operation’s BeginPOEnq step, and moving
every DoPODeq step later in the behavior so it occurs immediately before its
operation’s EndPODeq step. Moreover, since the Do . . . steps change only
internal variables, the values of the interface variables are the same in each
state of τ as in σ. Therefore, without eliminating any possible behaviors of
POFifo (which are obtained by hiding the internal variables of IPOFifo),
we can require that a BeginPOEnq step be immediately followed by the
operation’s DoPOEnq step, and that an EndPODeq step be immediately
preceded by the operation’s DoPODeq step. This means that there’s no
need for the Do . . . actions; the BeginPOEnq and DoPOEnq actions can be
combined into a single action, as can the DoPODeq and EndPODeq actions.
We will therefore simplify IPOFifo by eliminating the Do . . . actions and
having only Begin . . . and End . . . actions.

We can now write the program IPOFifo. It will have the same constants
as IFifo plus the set Ids of identifiers; and it will have the same interface
variables enq and deq. It will have the internal variable elts whose value is
the set of currently enqueued datums.
IPOFifo will need an internal variable to describe the partial order ≺ on
the set elts. Mathematicians describe a relation ≺ on a set S as a subset
of S × S, where u ≺ v is an abbreviation for ⟨u, v⟩ ∈ ≺. We’ll do the same
thing, except it would be confusing to use the symbol ≺ as a variable. We
will therefore let before be the variable whose value is a subset of elts × elts
such that u ≺ v means ⟨u, v⟩ ∈ before.
Finally, when enqueueing a datum w , the BeginPOEnq step must add
to ≺ the relation u ≺ w for a datum u in elts iff the enqueue operation that
added u has completed. That information is not contained in the interface
variable enq because enq(e) contains only the data value that an uncom-
pleted enqueue operation is enqueueing, not which datum the operation put
in elts. Therefore, we add to IPOFifo an internal variable adding such that
adding(e) equals the datum in elts that enqueuer e put in elts, and equals a
value NonElt that is not a datum if e is not currently performing an enqueue
operation.
We use adding to define the following state expression, whose value is
the set of datums enqueued by operations whose executions have not yet
completed:

   beingAdded  ≜  {adding(e) : e ∈ EnQers} \ {NonElt}

The set beingAdded need not be a subset of elts because it can contain
datums that were removed from elts by dequeue operations before the op-
erations that enqueued them have completed.
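
In a hedged Python rendering, with adding a dict from enqueuers to datums
and NON_ELT a sentinel standing for NonElt (both assumed encodings):

   NON_ELT = object()   # stands for the NonElt value

   def being_added(adding):
       """beingAdded = {adding(e) : e in EnQers} minus {NonElt}."""
       return set(adding.values()) - {NON_ELT}

   # a datum is a (data value, identifier) pair
   adding = {"e1": ("picA", 0), "e2": NON_ELT}
   assert being_added(adding) == {("picA", 0)}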
The program POFifo is defined in Figure 6.10. Here are explanations of
the four disjuncts of the next-state action PONext.

BeginPOEnq(e) Enabled when enq(e) = Done, it:

   • Sets enq(e) to the data value D that e is enqueuing.

   • Adds a datum ⟨D, id⟩ to elts for some id ∈ Ids for which ⟨D, id⟩
     is not already in elts or beingAdded.

   • Modifies before to add the relations el ≺ ⟨D, id⟩ for every el in
     elts that is not in beingAdded.

   • Sets adding(e) to ⟨D, id⟩, thereby adding ⟨D, id⟩ to beingAdded.

POFifo  ≜  ∃ elts, before, adding : IPOFifo

IPOFifo  ≜  POInit ∧ □[PONext]_POv

POv  ≜  ⟨enq, deq, elts, before, adding⟩

POInit  ≜  ∧ enq = (e ∈ EnQers ↦ Done)
           ∧ deq ∈ (DeQers → Data)
           ∧ elts = { }
           ∧ before = { }
           ∧ adding = (e ∈ EnQers ↦ NonElt)

PONext  ≜  ∨ ∃ e ∈ EnQers : BeginPOEnq(e) ∨ EndPOEnq(e)
           ∨ ∃ d ∈ DeQers : BeginPODeq(d) ∨ EndPODeq(d)

BeginPOEnq(e)  ≜
   ∧ enq(e) = Done
   ∧ ∃ D ∈ Data : ∃ id ∈ {i ∈ Ids : ⟨D, i⟩ ∉ (elts ∪ beingAdded)} :
        ∧ enq′ = (enq except e ↦ D)
        ∧ elts′ = elts ∪ {⟨D, id⟩}
        ∧ before′ = before ∪ {⟨el, ⟨D, id⟩⟩ : el ∈ (elts \ beingAdded)}
        ∧ adding′ = (adding except e ↦ ⟨D, id⟩)
   ∧ deq′ = deq

EndPOEnq(e)  ≜  ∧ enq(e) ≠ Done
                ∧ enq′ = (enq except e ↦ Done)
                ∧ adding′ = (adding except e ↦ NonElt)
                ∧ unchanged ⟨deq, elts, before⟩

BeginPODeq(d)  ≜  ∧ deq(d) ≠ Busy
                  ∧ deq′ = (deq except d ↦ Busy)
                  ∧ unchanged ⟨enq, elts, before, adding⟩

EndPODeq(d)  ≜  ∧ deq(d) = Busy
                ∧ ∃ el ∈ elts :
                     ∧ ∀ el2 ∈ elts : ¬(el2 ≺ el)
                     ∧ elts′ = elts \ {el}
                     ∧ deq′ = (deq except d ↦ el(1))
                     ∧ before′ = before ∩ (elts′ × elts′)
                ∧ unchanged ⟨enq, adding⟩

Figure 6.10: The program POFifo.

EndPOEnq(e) Enabled when enq(e) is a data value, it sets enq(e) to Done
   and sets adding(e) to NonElt, thereby removing from beingAdded the
   datum that e had enqueued.

BeginPODeq(d) Enabled when deq(d) is a data value, it sets deq(d) to
   Busy.

EndPODeq(d) Enabled when deq(d) equals Busy and elts is not empty,
   which implies that elts contains at least one minimal datum (a datum
   not preceded in the ≺ relation by any other datum in elts), since the
   datum most recently added to elts must be a minimal datum. The
   action chooses an arbitrary minimal datum el of elts, removes it from
   elts, sets deq(d) to its data value component, and modifies before to
   remove all relations el ≺ el2 for elements el2 of elts, as sketched below.
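
A hedged Python sketch of this minimal-datum choice, with elts a set of
datums and before a set of (u, w) pairs (assumed encodings as above):

   def minimal_datums(elts, before):
       """Datums in elts not preceded, via before, by another datum in elts."""
       return {el for el in elts
               if not any((u, el) in before for u in elts if u != el)}

   def end_podeq(elts, before, el):
       """Effect of EndPODeq on elts and before for a chosen minimal el."""
       assert el in minimal_datums(elts, before)
       new_elts = elts - {el}
       new_before = {(u, w) for (u, w) in before
                     if u in new_elts and w in new_elts}
       return new_elts, new_before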

6.5.3 Showing IPOFifo Implements Fifo

Formulas POFifo and Fifo are equivalent. However, more important and
more interesting than showing that Fifo refines POFifo is showing that
POFifo refines Fifo. It’s more important because Fifo is the generally ac-
cepted description of a fifo. By showing that POFifo implements Fifo, we
can show that an algorithm that doesn’t maintain a totally ordered queue
implements Fifo by showing that it implements POFifo. It’s more interest-
ing because showing that POFifo implements Fifo requires all three kinds
of auxiliary variables. So, the problem of showing that Fifo refines POFifo
is not discussed.
To show that POFifo refines Fifo, we have to define a refinement map-
ping under which IPOFifo implements IFifo. This requires adding to
IPOFifo first a prophecy variable, then a history variable, then a stutter-
ing variable. They are added in the three following subsections. Adding the
prophecy variable is straightforward and the necessary details are presented.
How to add the other two variables and define the refinement mapping are
then sketched. Rigorously defining those two programs and the refinement
mapping would be a marvelous learning experience. However, the chance of
doing it without making any error is small unless you use a tool to check
what you have done.

6.5.3.1 The Prophecy Variable

The most significant problem in showing that POFifo implements Fifo is
that IFifo decides the order in which concurrently enqueued values will be

dequeued before IPOFifo does. This tells us that to define a refinement
mapping under which IFifo is implemented by POFifo, we need to add a
prophecy variable p to IPOFifo.
The simplest way I know of making the necessary predictions is with
a prophecy sequence variable p that predicts the sequence of datums that
will be dequeued next. The first item in the sequence predicts the datum
that the next EndPODeq step removes from elts, and that step of course
removes the prediction from p. The natural step to append a prediction
to the sequence p is the BeginPOEnq step that adds a datum to elts. The
length of p therefore always equals the number of datums in elts. The
predictions are completely arbitrary datums, so in almost all behaviors they
will be impossible to fulfill, at some point not allowing any more datums to
be dequeued. But as we’ve seen, unfulfillable predictions are no problem.
Since no predictions are made about enqueue operations, they can keep
being performed even if the datums they enqueue can never be dequeued.
The set Π of possible predictions equals the set Data × Ids of all possible
datums. The BeginPOEnq(e) action should append a prediction to p, and
the EndPOEnq(e) and BeginPODeq(d) actions should leave p unchanged.
We can therefore define three of the four subactions of the next-state action
PONext^p of IPOFifo^p by:

   BeginPOEnq^p(e)  ≜  ∧ BeginPOEnq(e)
                       ∧ ∃ el ∈ Data × Ids : p′ = Append(p, el)

   EndPOEnq^p(e)  ≜  EndPOEnq(e) ∧ (p′ = p)

   BeginPODeq^p(d)  ≜  BeginPODeq(d) ∧ (p′ = p)

The prediction made by the first item p(1) of the sequence p is the datum
that the next EndPODeq(d) step will remove from elts. The datum p(1)
is removed by this step iff elts′ = elts \ {p(1)} is true of the step. Since the
step must remove the prediction, we can define:

   EndPODeq^p(d)  ≜
      EndPODeq(d) ∧ (elts′ = elts \ {p(1)}) ∧ (p′ = Tail(p))

These four action definitions, and the observation that POInit^p should equal
POInit ∧ (p = ⟨ ⟩), give us all the pieces of the definition of IPOFifo^p. I won’t
bother to put them together.
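
The enabling condition of EndPODeq^p(d) can be sketched in hedged Python
(encodings as before): the active prediction p[0] must be a minimal datum
of elts, so an unfulfillable prediction simply disables all further dequeuing.

   def can_end_podeq(p, elts, before):
       """EndPODeq^p is enabled only if p is nonempty and p[0] is a
       minimal datum of elts; otherwise dequeue steps are blocked."""
       return (bool(p) and p[0] in elts and
               not any((u, p[0]) in before for u in elts if u != p[0]))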

6.5.3.2 The History Variable qBar

We now must decide how to define queueBar, the state expression that is
substituted for queue in the refinement mapping. It’s easiest to think of
defining queueBar as the sequence not of the data values to be substituted
for the variable queue of IFifo, but of the sequence of datums being dequeued
from elts by POFifo. We can then define the refinement mapping so it
substitutes for the variable queue of IFifo the sequence of data values in the
datums of queueBar.

Since there are no Do . . . steps in IPOFifo, we will have to add stutter-
ing steps to append datums to queueBar. To do that, we will first add to
IPOFifo^p a history variable qBar to produce a program we’ll call IPOFifo^pq.
The value of qBar will be the sequence of datums that will appear in
queueBar. However, datums will be appended to qBar by Begin . . . and
End . . . steps, so stuttering steps will then have to be added to IPOFifo^pq
that append the datums in qBar one at a time to queueBar, since IFifo
appends data values to queue one at a time. How those stuttering steps are
added is explained in Section 6.5.3.3.
We will see that a single BeginPOEnq^pq step may append multiple da-
tums to qBar, so letting IPOFifo have Do . . . steps wouldn’t have eliminated
the need to add stuttering steps. The following definition of the history
variable qBar is subtle. To help you understand it, I suggest you check how
the definition works on the first few steps of several different behaviors of
IPOFifo^p.

Suppose that program IPOFifo^p is in a state in which it’s possible
for a dequeue operation to remove a datum from elts. For qBar to describe
the datums that are in queueBar, the datum removed by that dequeue op-
eration must be in qBar. An EndPODeq^p step can remove only the datum
p(1) from elts, and it can remove p(1) iff p(1) is a minimal element for the
relation ≺ of elts. Therefore, p(1) must be the first element of qBar iff p(1)
is a minimal element of elts. If p(1) can be dequeued, then p(2) can be
dequeued after it iff p(2) is a minimal element of elts \ {p(1)}. And so on.
Define the state function pg to be the longest prefix of p all of whose
datums can be dequeued from elts if no further datums are enqueued in elts.
It follows from the definition of IPOFifo^p that pg equals the longest prefix
of p satisfying the following conditions, sketched in Python below:

Q1. Every datum in pg is in elts.

Q2. No datum appears twice in pg.

Q3. For each i ∈ 1 . . Len(pg) and each datum u ∈ elts, if u ≺ pg(i) then
    u = pg(j) for some j ∈ 1 . . (i − 1).
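
Here is a hedged Python sketch of pg, with p a list of datums, elts a set,
and before a set of (u, w) pairs meaning u ≺ w (assumed encodings):

   def pg(p, elts, before):
       """Longest prefix of p satisfying Q1-Q3."""
       prefix = []
       for d in p:
           ok = (d in elts and                     # Q1
                 d not in prefix and               # Q2
                 all(u in prefix                   # Q3: every predecessor
                     for (u, w) in before          # of d that is in elts
                     if w == d and u in elts))     # appears earlier
           if not ok:
               break
           prefix.append(d)
       return prefix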
We have shown that pg must be a prefix of qBar. Our strategy for defining
IPOFifo^pq by adding the history variable qBar to IPOFifo^p is to keep qBar
equal to pg for as long as possible. To see how to do that, let’s see how pg
can change.

The sequence pg can become shorter only when an EndPODeq^p step
occurs, in which case p is not the empty sequence and pg is a nonempty
prefix of p. The step removes the first element of p and pg, so p′ = Tail(p),
pg′ = Tail(pg), and qBar′ = Tail(qBar).
The sequence pg can be made longer by a BeginPOEnq^p step as follows.
Suppose the step appends the prediction u to p and adds the datum w to
elts. The value of pg at the beginning of the step is a proper prefix of p ◦ ⟨u⟩.
If w equals the prediction in p ◦ ⟨u⟩ immediately after pg, then w will be
appended to pg iff doing so would not violate Q3. (We’ll see in a moment
when it would violate Q3.) If w can be appended to pg and the prediction
following w in p is already in elts, then it might be possible to append that
datum to pg as well. And so on. Thus, it’s possible for the BeginPOEnq^p
step to append several datums to pg. If our strategy has been successful
thus far and qBar = pg at the beginning of the step, then a BeginPOEnq^pq
step implies qBar′ = pg′. This makes qBar a prefix of qBar′, as it should
be because stuttering steps to be added after a BeginPOEnq^p step should
change queueBar only by appending datums to it.
There is one situation in which it is impossible for any further datum to
be appended to pg. One or more datums can be appended to pg only by
a BeginPOEnq^p step that adds a datum w to elts that can be appended to
pg. However, if there is a datum u in elts that is neither in the sequence pg
nor in the set beingAdded, then adding w to elts also adds the relation
u ≺ w. This relation means that w can’t be appended to pg because that
would violate condition Q3. Thus, if there is a datum u in elts that is neither
in pg nor beingAdded, then no datums can be added to pg. Moreover, the
datum u can never be removed from elts because it is not in pg and can
never be in pg because no more datums can be added to pg. (The datum u
can’t be added to beingAdded because a BeginPOEnq step can’t add a
datum to elts that is already in elts.) Let’s call a state in which there is a
datum in elts that is not in beingAdded or pg a blocked state. In a blocked
state, datums can be removed from the head of pg by EndPODeq^p steps,
but no new datums can be appended to pg. So, if and when enough
EndPODeq^p steps have occurred to remove all the datums from pg, then no
more EndPODeq^p steps can occur. That means that any further dequeue
operations that are begun with a BeginPODeq^p step must block, never
able to complete.
Let’s consider the first step that caused a blocked state—that is, causing
there to be an element u in elts that is neither in pg nor beingAdded. Since
u was added to elts by a BeginPOEnq^p step that put u in beingAdded, it
must be the EndPOEnq^p step of the enqueue operation that added u to elts
that caused the blocked state by removing u from beingAdded. Until that
blocked state was reached, qBar equaled pg. However, since u has not been
dequeued, it must be in queueBar after that EndPOEnq^p step because that
step must implement the EndEnq step of IFifo. Thus that EndPOEnq^p step
must append u to qBar. Therefore, the first blocked state is the first state
in which qBar ≠ pg. In that state, qBar equals pg ◦ ⟨u⟩.

From that first blocked state on, no new datums can be added to pg, so
the datum u can never be dequeued. Therefore, whenever an EndPOEnq^p
step occurs for an operation that enqueued a datum w, if w is in elts (so it
hasn’t been dequeued) and is not in pg, then that EndPOEnq^p step must
append w to qBar.
To recapitulate, here is how we add the history variable qBar to IPOFifo^p
to obtain the program IPOFifo^pq. These rules imply that, at any point in the
behavior, qBar will equal pg ◦ eb, where pg is the state function of IPOFifo^p
defined above and eb is a sequence of datums in elts that are not in pg.
Initially, pg and eb equal ⟨ ⟩.

• An EndPODeq^p step can occur only if pg ≠ ⟨ ⟩. Such a step satisfies
  pg′ = Tail(pg) and qBar′ = Tail(qBar).

• An EndPOEnq^p step that removes from beingAdded a datum w that
  is in elts but not in pg appends w to eb and hence to qBar.

• A BeginPOEnq^p step that occurs when eb = ⟨ ⟩ (so qBar = pg) sets
  qBar′ equal to pg′.

These rules imply that a datum can never be removed from eb, so once eb
is nonempty no new datums can be added to pg and only datums currently
in pg can ever be dequeued.
Observe that the sequence pg and the set of datums in eb can be defined
in terms of the variables of IPOFifo p . A history variable is needed only
to remember the order in which datums have been appended to eb. This
suggests that it’s a little simpler to make eb the history variable and define
qbar to be the state expression pg ◦ eb. However, I could not have discovered
this without first understanding how qBar should be defined.
Writing a complete definition of IPOFifo pq is straightforward, once we
have solved the problem of writing a precise mathematical definition of the
state function qBar in terms of the variables of IPOFifo p . (It took me a few
tries to get that definition right, using a model checker to find my mistakes.)
That definition is omitted.

6.5.3.3 Stuttering and the Refinement Mapping

Having defined qBar for IPOFifo^pq, adding a stuttering variable s and defin-
ing the refinement mapping under which the resulting program IPOFifo^pqs
implements IFifo are comparatively straightforward. First, s has to add
stuttering steps that allow us to define queueBar. Recall that qBar is de-
fined so it is changed by interface actions (the only ones that IPOFifo^pq
has) the way internal Do . . . actions of IFifo change the value of the inter-
nal variable queue of IFifo. Since queueBar should implement queue under
the refinement mapping (except that queueBar contains datums, not just the
data values), it needs to be changed by stuttering steps added to IPOFifo^pq.

We define the program IPOFifo^pqs by adding a stuttering variable s that
adds steps in the following three cases. We define queueBar to be the state
expression that equals qBar except as noted below.

1. s adds a single stuttering step before each EndPODeq^pq step. The
   value of queueBar equals Tail(qBar) immediately after that stuttering
   step.

2. s adds a stuttering step before each EndPOEnq^pq step that appends
   an element w to eb (and hence to qBar). Immediately after that
   stuttering step, queueBar equals qBar ◦ ⟨w⟩.

3. Following each BeginPOEnq^pq step such that Len(pg′) > Len(pg)
   (which implies eb = ⟨ ⟩), s adds Len(pg′) − Len(pg) stuttering steps.
   While there are k more of those stuttering steps left to be executed,
   queueBar equals the sequence obtained by removing the last k ele-
   ments of qBar.

Encoding in the value of the stuttering variable s for which of the three
cases the variable is being added, and in case 2 for which enqueuer e the
step is an EndPOEnq pq (e) step, allows the value of queueBar to be defined
in terms of the values of s, qBar , and (for case 2) adding.
We still have to define the state functions enqInnerBar and deqInnerBar
that are substituted for enqInner and deqInner in the refinement mapping
under which IPOFifo^pqs implements IFifo. The value of enqInnerBar(e) for
an enqueuer e should equal Done except when adding(e) equals the datum
that e is enqueueing, and that datum is not yet in queueBar. This means
that enqInnerBar can be defined in terms of adding and queueBar.

The value of deqInnerBar(d) for a dequeuer d should equal the value of
deq(d) except between when d has removed the first element of queueBar
(by executing the stuttering step added in case 1) and before the subse-
quent EndPODeq^pqs(d) step has occurred. In that case, deqInnerBar(d)
should equal qBar(1). It’s therefore easy to define deqInnerBar as a state
function of IPOFifo^pqs if the value of the stuttering variable s added in
case 1 contains the value of d for which the following EndPODeq^pqs(d)
step is to be performed.
This completes the sketch of how auxiliary variables are added to IPOFifo
to define a refinement mapping under which it implies IFifo, showing that
POFifo refines Fifo. The intellectually challenging part was discovering how
to define qBar . It took me quite a bit of thinking to find the definition. This
was not surprising. The example of the fifo had been studied for at least
15 years before Herlihy and Wing discovered that it could be implemented
without maintaining a totally ordered queue. Given the definition of qBar ,
constructing the refinement mapping required the ability to write abstract
programs mathematically—an ability that comes with practice.

6.6 Prophecy Constants

We have seen examples of showing |= T ⇒ S, where S equals ∃ . . . : IS, by
adding an auxiliary variable a to T and showing T^a implies IS under a
refinement mapping. In these examples, IS and S were safety properties.
Liveness was shown by showing that T^a conjoined with the lower-level pro-
gram’s liveness property implies the higher-level program’s liveness property.

Let’s return to an example we considered in Section 6.1.3. Define IS and
T to be the two formulas defined in (6.7) and (6.8):

   IS  ≜  ∧ (x = 0) ∧ (y ∈ N)
          ∧ □[(y > 0) ∧ (x′ = x + 1) ∧ (y′ = y − 1)]_⟨x,y⟩

   T  ≜  (x = 0) ∧ □[x′ = x + 1]_x ∧ ◇□[x′ = x]_x

and let S equal ∃ y : IS. We observed that formulas S and T are equiva-
lent, so |= T ⇒ S is true. To define a refinement mapping under which T
implies IS, we have to add one or more auxiliary variables to T to obtain
an expression to substitute for y.
The liveness property ◇□[x′ = x]_x implies that a behavior of T eventu-
ally stops incrementing x and terminates. To define the refinement mapping,
we need a variable that can predict the value of x when the behavior termi-
nates. So, we need a prophecy variable.

Because we have to predict something that is implied by a liveness prop-
erty, it appears that the only kind of prophecy variable that will work is one
that predicts the entire future—namely an infinite prophecy variable that
predicts all the future values of x. This means adding auxiliary variables,
including a prophecy variable, the way it is done in the completeness proof
of Section 6.4.4. This is disturbingly complicated for such a simple example.

Fortunately, there is a simple way to construct the refinement mapping
under which T implies IS without using a prophecy variable. We first
observe that T implies that eventually x is forever equal to some natural
number—more precisely:

(6.17)  |= T ⇒ ∃ n ∈ N : ◇□(x = n)

This implies:

   |= T ≡ ∃ n ∈ N : T ∧ ◇□(x = n)

For any formulas F and G and constants c and C, to prove (∃ c ∈ C : F) ⇒ G
it suffices to prove (c ∈ C) ∧ F ⇒ G. Therefore, to prove T ⇒ IS, it suffices
to prove

(6.18)  |= (n ∈ N) ∧ ◇□(x = n) ∧ T ⇒ IS

We can do this by adding the history variable h as follows to obtain T^h:

   T^h  ≜  ∧ (n ∈ N) ∧ (x = 0) ∧ (h = n)
           ∧ □[(x′ = x + 1) ∧ (h′ = h − 1)]_⟨x,h⟩
           ∧ ◇□[x′ = x]_x ∧ ◇□(x = n)

We can then use the refinement mapping that substitutes h for y, and we
show:

   |= T^h ⇒ (IS with y ← h)
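
As a hedged sanity check, here is a Python sketch of a behavior of T^h for a
given prophecy constant n: x counts up while h counts down, and with the
substitution y ← h every step satisfies the next-state relation of IS. The
function name is illustrative.

   def behavior_Th(n):
       """One terminating behavior of T^h; states are (x, h) pairs."""
       x, h = 0, n                  # initially x = 0 and h = n
       states = [(x, h)]
       while h > 0:                 # each step: x' = x + 1, h' = h - 1
           x, h = x + 1, h - 1
           states.append((x, h))
       return states                # x ends equal to n, then stutters

   # With y <- h, each step satisfies (y > 0), x' = x + 1, y' = y - 1.
   assert behavior_Th(3) == [(0, 3), (1, 2), (2, 1), (3, 0)]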
The key step in what we’ve done is the use of (6.17) to reduce showing
|= T ⇒ IS to showing (6.18). This procedure is called adding the prophecy
constant n to T. This is perhaps a silly name, since n is just an ordinary
bound constant. However, it serves to predict something that is eventually
going to happen—in this case, the final value x will have.

In general, we add a prophecy constant c to a program T by showing
|= T ⇒ ∃ c ∈ C : L, where the constant c does not occur in T, which implies

   |= T ≡ ∃ c ∈ C : T ∧ L

and showing |= T ⇒ IS by showing

   |= (c ∈ C) ∧ T ∧ L ⇒ IS

It can be shown that prophecy constants are in principle as powerful as
prophecy variables. In particular, Theorem 6.6 of Section 6.4.4 is true with
the prophecy variable replaced by a prophecy constant. This is proved by
modifying the proof of Theorem 6.6 as described in Appendix Section B.12,
after the proof of that theorem.
Although in theory equivalent, prophecy variables and prophecy con-
stants are quite different in practice. It appears that a prophecy variable
is best for predicting what a safety property implies may happen, while a
prophecy constant is best for predicting what a liveness property implies
must eventually happen.
Prophecy constants were introduced by Wim Hesselink, who called them
“eternity variables” [19]. He did not represent programs mathematically, so
he had to invent a rule for adding them to programs.
Chapter 7

Loose Ends

This chapter covers two topics that, to my knowledge, have not yet seen any
industrial application. However, they might in the future become useful.
The first topic is reduction, which is about verifying that a program satisfies
a property by verifying that a coarser-grained version of the program satisfies
it. Even if you never use it, understanding the principles behind reduction
can help you choose the appropriate grain of atomicity for abstract programs.
For that purpose, skimming through sections 7.1.1–7.1.3 should suffice.
The second topic is about representing a program as the composition of
component programs. We have been representing the components that make
up a program, such as the individual processes in a multiprocess program, as
disjuncts of the next-state action. Section 7.2 explains how the components
that form a program can be described as programs. How this is done depends
on why it is done. Two reasons for doing it and the methods they lead to
are presented.

7.1 Reduction
7.1.1 Introduction
7.1.1.1 What Reduction Is
When writing an abstract program to describe some aspect of a concrete
one, we must decide what constitutes a single step of a behavior. Stated
another way, we must describe what the grain of atomicity of the next-state
action should be. The only advice provided thus far is that we should use
the coarsest grain of atomicity (the fewest steps) that is a sufficiently ac-
curate representation of that aspect of the concrete program. “Sufficiently

accurate” means that either we believe it is easy to make the concrete pro-
gram implement that grain of atomicity, or we are deferring the problem of
how those atomic actions are implemented.
Some work has addressed the problem of formalizing what makes an ab-
stract program “sufficiently accurate”, starting with a 1975 paper by Richard
Lipton [39]. This work used the approach called reduction, which replaces
a program S with an “equivalent” coarser-grained program S^R called the
reduced version of S. Certain properties of S are verified by showing that
S^R satisfies them. The program S^R is obtained from S by replacing certain
nonatomic operations with atomic actions, each atomic action producing
the same effect as executing all the steps of the nonatomic operation it re-
places one after another, with no intervening steps of other operations. The
reduced program S^R is therefore simpler and easier to reason about than
program S.
It was never clear in exactly what sense S^R was equivalent to S, and the
results were restricted to particular classes of programs and of the properties
that could be verified. TLA enabled a new way of viewing reduction. In
that view, the variables of S are replaced in S^R by “virtual” variables, and
S implements S^R under a refinement mapping. The refinement mapping
is not made explicit, but the relation between the values of the actual and
the virtual variables is described by an invariant. I believe that this math-
ematical view encompasses all previous work on reduction for concurrent
programs.
Our basic approach to writing a correct concrete program is
showing that it refines an abstract program. There are two aspects to refin-
ing one program with another: data refinement and step refinement. Modern
coding languages have made data refinement easier by providing higher-level,
more abstract data structures. It is now almost as easy to write a program
that manipulates integers as one that manipulates bit strings representing
a finite set of integers. There has been much less progress in making step
refinement easier. As explained in Section 6.5.1, a linearizable object allows
a coarse grain of atomicity in descriptions of the code that executes oper-
ations on the object. However, the only general method of implementing
a linearizable object still seems to be the one invented by Dijkstra in the
1960s: putting the code that reads and/or modifies the state of the object
in a critical section.
I believe that better ways of describing the grain of atomicity will be
needed if rigorous verification that concrete concurrent programs implement
abstract ones is to become common practice. Reduction may provide the
key to doing this. This section provides a mathematical foundation for
understanding reduction. The theorems presented here are not the most
general ones possible; some generalizations can be found elsewhere [8]. Also
omitted are rigorous proofs. I know of no industrial use of reduction or of
tools to support it; and I have no experience using the results described here
in practice. The problem it addresses is real, but I don’t know if reduction
is the solution.

7.1.1.2 The TLA Approach

Here is how reduction can be described in TLA. Let x be the variables x_1,
. . . , x_n of S. Reduction is usually described by letting x also be the variables
of S^R. However, I find that reduction is easier to understand by letting the
variables of S^R be different from those of S. We will let those variables be
the list X of variables X_1, . . . , X_n. The goal is then to verify that S satisfies
a property P by verifying that S^R satisfies the property P with x ← X.

To deduce |= S ⇒ P, where S and P contain the variables x, from
|= S^R ⇒ (P with x ← X), we need a relation between the variables x
and the variables X. That relation is expressed by a state predicate I^R
containing the variables x and X. We then deduce that S satisfies P from
the following three conditions. The formula T will be explained later; ignore
it for the moment.

R1. |= S ∧ T ⇒ ∃ X : S^R ∧ □I^R

R2. |= S^R ⇒ (P with x ← X)

R3. |= (P with x ← X) ∧ □I^R ⇒ P

Condition R1 is implied by theorems whose hypotheses are the conditions
necessary for reduction to be possible. Condition R2 asserts that P with
its variables x replaced by X is satisfied by the reduced program S^R, which
should be easier to verify than that P is satisfied by the finer-grained pro-
gram S.

Throughout this section on reduction, we abbreviate F with x ← X as
F̄, for any formula F. Thus R2 and R3 can be written as

   |= S^R ⇒ P̄   and   |= P̄ ∧ □I^R ⇒ P

We first consider the case in which S is the usual TLA safety property
Init ∧ □[Next]_⟨x⟩ for an abstract program. We then consider the program
described by S ∧ F, where F is the conjunction of fairness properties for S.
Conditions R1–R3 will then have S replaced by S ∧ F, the reduced program
(S ∧ F)^R being defined to equal S^R ∧ F^R, where F^R is obtained by replacing
each fairness condition for an action A of S with fairness of a corresponding
action A^R of S^R.
The formula T in the hypothesis of R1 is a liveness assumption that
will be defined below. It turns out that even when S is a safety property,
so S^R is also a safety property, R1 requires a liveness assumption about S.
For a program with a fairness property F, we expect S ∧ F to imply T. If
the program describes only safety, then it can satisfy only safety properties,
so it would be strange if we required S to satisfy a liveness property to use
reduction. We will see that, to show S satisfies a safety property P, showing
that S satisfies a possibility condition allows us to assume that T is satisfied.
To prove the theorem that asserts R1, we construct an abstract program
S ⊗S R , with variables x and X, that describes the programs S and S R
running simultaneously in parallel. (S ⊗S R is just an identifier that names
a formula, not an expression with formulas S and S R and an operator ⊗.
This unusual identifier is meant to remind us what the formula means.) We
then show:
R1a. |= S ⇒ ∃ X : S ⊗S R
R1b. |= S ⊗S R ∧ T ⇒ S R ∧ 2I R
R1b implies |= (∃ X : S ⊗S R ∧ T ) ⇒ (∃ X : S R ∧ 2I R ); and the variables X
don’t appear in T , so ∃ X : S ⊗S R ∧ T equals (∃ X : S ⊗S R ) ∧ T . Therefore,
R1a and R1b imply R1.
One property asserted by I R is that, if the values of x describe a state in
which none of the nonatomic operations of S being reduced is currently being
executed, then x = X. The values of x and X are synchronized like this,
despite the reduced program taking fewer steps than the original program,
by having S ⊗S R take X-stuttering steps—steps that leave the values of
the variables X unchanged. R1b is satisfied because these X-stuttering
steps implement stuttering steps of S R . R3 was satisfied in Lipton’s original
paper because he considered terminating programs and properties P that
depend only on the initial and final states of the program. In those states,
none of the nonatomic operations of S being reduced are being executed, so
x = X. In what may have been the second paper published on reduction,
Doeppner [10] satisfied R3 by proving only that an invariant is true when
none of the reduced operations are being executed.

7.1.2 An Example
To explain reduction, we start by examining this commonly assumed rule:
When reasoning about a multiprocess program in which interprocess communication is performed by atomic operations to shared data objects, the program can be represented with any grain of atomicity in which each atomic action accesses at most one shared data object.1 The following is a statement of the rule in our science and the argument generally used to justify it.

          .
          .
          .
   r 1:   R1 ;
          .
          .
          .
   r k:   Rk ;
   c:     C ;
   l 1:   L1 ;
          .
          .
          .
   l m:   Lm ;
   o:     . . .
          .
          .
          .

   Figure 7.1: The nonatomic operation RCL.
Suppose S is a multiprocess program with a process that executes a
nonatomic operation, which we call RCL, described by the statements shown
in Figure 7.1. We assume this is “straight line” code, so an execution of RCL
consists of k +1+m steps. For now, we let S be the safety property described
by the code; liveness is discussed later. We assume statements Ri and Lj
can access only data local to the process, while statement C can also access
shared data. The rule asserts that we can replace S with its reduced version
S R obtained by removing all the labels between (but not including) r 1 and
o, so those k + 1 + m statements are executed as a single step, and replacing
the variables x with the variables X. It is usually claimed that we can do
this because other processes can’t observe when the statements Ri and Lj
are executed, so we can pretend that they are executed in the same step as
statement C.
We can reduce other operations of the same form to atomic actions as
well, reducing the operations one at a time. So, it suffices to see how it’s
done for just this single operation RCL, which may be executed multiple
times.
1. The rule was stated in print independently by Owicki and Gries [42] and by me [28], but I think it was well known at the time.
7.1.2.1 The Reduced Behaviors


Let R i , C , and Lj be the TLA actions described by statements Ri , C, and
Lj . An execution of RCL consists of a sequence of R i steps, for i ∈ 1 . . k ,
followed by a C step, followed by a sequence of Lj steps, for j ∈ 1 . . m. In
an execution of the operation during a behavior of the program, interleaved
between those steps may be steps performed by other processes. (There can
also be stuttering steps that leave the values of variables x unchanged, but
they are irrelevant and we can ignore them.) A reduced execution of the
operation execution is one in which all the steps of RCL occur one after the
other, with no interleaved steps executed by other processes.
We transform a behavior to a reduced behavior, in which only reduced
executions of RCL occur, by a procedure illustrated with the following por-
tion of a behavior containing an execution of RCL, where RCL has one R i
statement (k = 1) and two Lj statements (m = 2). We show the name of
the action that each step satisfies, where E 1 , E 2 , and E 3 are actions of other
processes.
(7.1) · · · s 41 −R1→ s 42 −E1→ s 43 −C→ s 44 −E2→ s 45 −E3→ s 46 −L1→ s 47 −L2→ s 48 · · ·

Define R to be the state predicate that is true of a state iff that state occurs
during an execution of RCL before the C action. In the part of a behavior
shown in (7.1), R is true only in states s 42 and s 43 . Define L to be the state
predicate asserting that the process is currently executing operation RCL
after the C action, so in (7.1) it is true in states s 44 –s 47 . In general, if p is the
process executing RCL of Figure 7.1, then R equals pc(p) ∈ {r 2, . . . , r k , c}
and L equals pc(p) ∈ {l 1 , . . . , l m }. The behavior is currently executing
RCL iff the state predicate R ∨ L is true. Thus, the operation is not being
executed iff ¬(R ∨ L) is true.
Because R i and Lj actions access only process-local state, they commute
with actions from other processes, where two actions commute iff executing
them in either order has the same effect. Recall that A · B is defined in
Section 3.4.1.4 to be the action that is satisfied by a step s → t iff there
is a state u such that s → u is an A step and u → t is a B step. Actions
A and B commute iff A · B equals B · A. If A and B commute, then for any states s, t, and u such that s −A→ u −B→ t, there exists a state v such that s −B→ v −A→ t. By commuting R i and Lj actions with actions of other processes, moving R i actions to the right and Lj actions to the left, we obtain a behavior in which every execution of RCL is reduced, with no steps
of other processes interleaved. For example, commuting actions in this way
converts the original behavior into the reduced behavior in which the portion
of the original behavior shown in (7.1) is converted to:
(7.2) · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −L1→ u 45 −L2→ u 46 −E2→ u 47 −E3→ s 48 · · ·
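As a tiny illustration of commuting (two hypothetical actions, not actions of our program): if A ∆= (y ′ = y + 1) ∧ (z ′ = z ) and B ∆= (z ′ = z + 1) ∧ (y ′ = y), then A · B and B · A both equal (y ′ = y + 1) ∧ (z ′ = z + 1), so A and B commute. Actions of different processes that read and write disjoint parts of the state commute in just this way, which is why the R i and Lj actions commute with the actions of other processes.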

A state s i is changed to a possibly different state u i if commutativity was used to commute actions A and B in · · · −A→ s i −B→ · · · . For example, state
s 46 had to be (possibly) changed twice to arrive at u 46 because both the E 2
and E 3 actions had to be “moved across” state number 46 of the behavior
to get from (7.1) to (7.2). If σ is the original behavior (7.1), we define Φ(σ)
to be its reduced version (7.2), obtained by transforming all executions of
RCL in this way.
Note that the states s 41 and s 48 are the same in σ and Φ(σ) because
RCL is not being executed in those states. States s 43 and s 44 , which form
the C step, are also the same in the original and reduced behaviors because
the C action is not commuted with any action. The reduced behavior (7.2)
is also a behavior of program S because it has the same initial state as (7.1),
which satisfies Init, and every step satisfies a subaction of the next-state
action Next.

7.1.2.2 The Program S ⊗S R


We define S ⊗S R in terms of a mapping φ from states in the original behavior
σ to states in the reduced behavior Φ(σ)—in our example, from states of
(7.1) to states of (7.2). First, we define φ(s) for all states except the ones in
a C step, which in (7.1) are all states except for s 43 and s 44 . We call those
two states “C states” and the other states “non-C states”.
We transform a behavior containing executions of RCL to a reduced be-
havior by a sequence of action interchanges based on commutativity. There
are two kinds of interchange, both involving an E h action of a different pro-
cess: one that moves an R i step to the right of an E h step, and one that
moves an Lj step to the left of an E h step. To go from behavior (7.1) to
behavior (7.2) requires five interchanges. Figure 7.2 shows the sequence of
behaviors created by performing these interchanges, where each behavior is
obtained from the preceding one by interchanging the left-most R i or Lj
action that can be moved closer to the C action. The top behavior is the
original behavior (7.1) and the bottom one is the reduced behavior (7.2).
Observe that every state in one behavior equals the corresponding state
in the next behavior except for the one state across which the actions are
interchanged.
(7.1) · · · s 41 −R1→ s 42 −E1→ s 43 −C→ s 44 −E2→ s 45 −E3→ s 46 −L1→ s 47 −L2→ s 48 · · ·

      · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −E2→ s 45 −E3→ s 46 −L1→ s 47 −L2→ s 48 · · ·

      · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −E2→ s 45 −L1→ r 46 −E3→ s 47 −L2→ s 48 · · ·

      · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −L1→ u 45 −E2→ r 46 −E3→ s 47 −L2→ s 48 · · ·

      · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −L1→ u 45 −E2→ r 46 −L2→ u 47 −E3→ s 48 · · ·

(7.2) · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −L1→ u 45 −L2→ u 46 −E2→ u 47 −E3→ s 48 · · ·

[In the original figure, a thin downward arrow joins each state that an interchange leaves unchanged to the equal state in the next behavior, and a thick diagonal arrow is drawn from the one state that the interchange changes.]

Figure 7.2: Constructing (7.2) from (7.1).

The arrows in the picture are drawn according to the following rules.
There is a (thin) downward pointing arrow from each non-C state that is
unchanged by the interchange that yields the next behavior. From the one
state in each behavior that is changed by the interchange, there is a (thick)
diagonal arrow. If that state satisfies R (is before the C action), then the
arrow points one state to the left of the changed state. If the state satisfies
L, then the arrow points one state to the right.
These arrows define a unique path from every non-C state s i of the
original behavior to a state in the reduced behavior. Define φ(s i ) to be that
state in the reduced behavior. For the example in Figure 7.2, φ(s 45 ) = u 47
because the sequence of states in the path from s 45 in the top behavior to
the bottom behavior is:

(7.3) s 45 → s 45 → s 45 → r 46 → r 46 → u 47

Figure 7.3 contains an arrow pointing from each non-C state s i in the orig-
inal behavior to the state φ(s i ) in the reduced behavior. Observe that for
every non-C state s i , the state φ(s i ) is a state in which operation RCL is
not being executed—that is, a state satisfying ¬(R ∨ L).
C
We define φ(s) for the C states so that if the C step is s i −→ s i+1 , then
φ(s i ) is the first state to the left of s i for which ¬(R∨L) is true, and φ(s i+1 )
is the first state to the right of s i+1 for which ¬(R ∨ L) is true. In other
words, φ(s i ) and φ(s i+1 ) are the states of the reduced behavior in which the
execution of RCL begins and ends, respectively. The complete mapping φ
for our example is shown in Figure 7.4, where the mapping for non-C states is in gray.

(7.1) · · · s 41 −R1→ s 42 −E1→ s 43 −C→ s 44 −E2→ s 45 −E3→ s 46 −L1→ s 47 −L2→ s 48 · · ·

(7.2) · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −L1→ u 45 −L2→ u 46 −E2→ u 47 −E3→ s 48 · · ·

[An arrow goes from each non-C state s i of the top behavior to the state φ(s i ) of the bottom one; for example, φ(s 41 ) = s 41 , φ(s 45 ) = u 47 , and φ(s 48 ) = s 48 .]

Figure 7.3: The mapping s i → φ(s i ) for non-C states.

(7.1) · · · s 41 −R1→ s 42 −E1→ s 43 −C→ s 44 −E2→ s 45 −E3→ s 46 −L1→ s 47 −L2→ s 48 · · ·

(7.2) · · · s 41 −E1→ u 42 −R1→ s 43 −C→ s 44 −L1→ u 45 −L2→ u 46 −E2→ u 47 −E3→ s 48 · · ·

[This figure repeats Figure 7.3 and adds the arrows for the two C states: φ(s 43 ) = u 42 and φ(s 44 ) = u 46 ; the arrows for the non-C states are drawn in gray.]

Figure 7.4: The complete mapping s i → φ(s i ).
Formula S ⊗S R is defined so that it is satisfied by a behavior σ iff σ
satisfies S (which describes the values of variables x) and the values of the
variables X in any state s k of σ equal the values of the variables x in the state
φ(s k ) of the reduced behavior Φ(σ). Since this assigns values of variables X
to every state of every behavior σ so that the behavior satisfies S ⊗S R , we
see that |= S ⇒ ∃ X : S ⊗S R is true, so R1a is satisfied.
From Figure 7.4, we see that for any k :
φ1. If s k → s k +1 is an E h step (so E h is an action of another process)
then φ(s k ) → φ(s k +1 ) is also an E h step.
φ2. If s k → s k +1 is an R i or Lj step, then φ(s k ) = φ(s k +1 ).
φ3. If s k → s k +1 is a C step, then φ(s k ) and φ(s k +1 ) are the first and
last states of an execution of operation RCL in the reduced behavior
Φ(σ), which is an execution with no interleaved steps of other process
actions.
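For example, consider the E 1 step s 42 → s 43 of (7.1): condition φ1 asserts that φ(s 42 ) → φ(s 43 ) is an E 1 step, and indeed φ(s 42 ) = s 41 , φ(s 43 ) = u 42 , and s 41 → u 42 is the E 1 step of the reduced behavior (7.2). Condition φ3 asserts that the C step s 43 → s 44 maps to the pair φ(s 43 ) = u 42 and φ(s 44 ) = u 46 , the states in which the reduced execution of RCL begins and ends.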
Recall that Ḡ equals G with x ← X for any formula G. A step of S R is either an Ē h step, a D step, where D is an action that performs an execution of operation RCL as a single step, or a stuttering step that leaves the variables X unchanged. From φ1–φ3, we see that if behavior (7.1) satisfies S ⊗S R (as well as S ), then each step of that behavior is either an Ē h step (by φ1), a D step (by φ3), or leaves the variables X unchanged (by φ2). Hence each step of S ⊗S R satisfies the next-state action of S R .
The initial-state predicate of S R is Init with x ← X (remember that Init is the initial-state predicate of S ). Since operation RCL is not being executed in the initial state s 0 of (7.1), φ(s 0 ) equals s 0 , which implies that s 0 satisfies Init with x ← X. Thus, if (7.1) satisfies S ⊗S R , then it satisfies S R . Therefore, S ⊗S R ⇒ S R is satisfied—or so it seems.
The reasoning works in this example because the behavior (7.1) contains
a complete execution of operation RCL. However, S is a safety property,
which means that the process can stop executing actions at any point during
the execution of the operation. The general definition of the reduced version
of a behavior, which includes a possibly incomplete execution of RCL, is that
actions performing steps of an RCL execution are made to occur together
by commuting R i actions to the right, Lj actions to the left, and leaving a
C action unmoved. (If there is no C step, then the last R i action can be
left unmoved.) If s is the last state of a C step, φ(s) is defined to be the
state after the last step of the RCL operation being executed. That state
will satisfy ¬(R ∨ L) iff the behavior contains a complete execution of the
operation.
The one part of what we’ve done that’s not correct in the presence of
a partial execution is φ3. Statement φ3 is vacuously true if the partial
execution doesn’t contain a C step. In that case there are only R i steps
which correspond to stuttering steps of S R , so the behavior of S ⊗S R is a
behavior of S R in which that RCL action doesn’t occur. The problem in φ3
arises in our example if the execution contains a C step but doesn’t contain
both the L1 and L2 steps.
The solution is to rule out such behaviors. That’s what the hypothesis
T in R1 and R1b does. Formula T must assert that any execution of RCL
that includes the C step must complete. The operation has performed the C
step but has not completed iff L is true. So we want to allow only behaviors
in which it is not the case that L eventually becomes true and remains true
forever. Such behaviors are ones satisfying ¬32L, which is equivalent to
23¬L. So, we can restate R1b as this assumption:
|= S ⊗S R ∧ (23¬L) ⇒ S R ∧ 2I R
The argument showing S ⊗S R implies S R for the behavior (7.1) applies to
all behaviors of the program of Figure 7.1 satisfying 23¬L. That is, we
have shown:
|= S ⊗S R ∧ (23¬L) ⇒ S R
is satisfied by this program S . To complete the proof of R1b, we must define
I R and show that it is an invariant of S ⊗S R . Since we have seen that S
satisfies R1a, this will show that it satisfies R1 with T equal to 23¬L. The
assumption 23¬L is discussed in Section 7.1.4.
7.1.2.3 The Invariant I R


The invariant I R of S ⊗S R that relates the values of the variables x to those
of the variables X follows from three additional properties of the mapping
φ. We can see why those properties hold from Figures 7.2–7.4.
The most obvious of these properties follows from the observation that
if ¬(R ∨ L) is true in a state s, then only downward pointing arrows are
drawn from it in the figures. This shows:

φ4. For any state s in which ¬(R ∨ L) is true, φ(s) = s.

For the next property, look at the path (7.3) from the state s 45 to the state
φ(s 45 ), which equals u 47 . Follow that path in Figure 7.2. Observe that each
step in the path either leaves the state unchanged (is a stuttering step) or
else is an L1 or L2 step. (To see this, look at the horizontal arrows just
above the diagonal arrows.) We can see from the figure that this is true
of the path from state s to state φ(s) for the states s 46 and s 47 as well.
The rule for drawing the arrows implies that this is true in general, for all
executions of the program of Figure 7.1. The path from every non-C state
s for which L is true to φ(s) consists of a sequence of steps each of which is
either a stuttering step or an Lj step for some j . From Figure 7.4, it’s clear
that this is also true for the C state for which L is true.
Let’s define L to equal L1 ∨ . . . ∨ Lm , so any Lj step is an L step. We have
seen that we can get from a state s in which L is true to the state φ(s) by
a sequence of stuttering steps and/or L steps. Recall that in Section 4.3.2,
for any action A we defined A+ to equal A ∨ (A · A) ∨ (A · A · A) ∨ . . . .
Therefore, a step s → t satisfies ([L]hxi )+ , which we abbreviate [L]+ hxi , iff
we can get from state s to state t by a sequence of L steps or steps that
leave the variables x unchanged. Let’s write the subscript hxi as simply x,
so [A]x and hAix mean [A]hxi and hAihxi for any action A.
We have thus seen:

φ5. For any state s in which L is true, s → φ(s) is an [L]+x step.
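For example, (7.3) exhibits s 45 → φ(s 45 ) as an [L]+x step: two of the steps in that path are L steps (s 45 → r 46 and r 46 → u 47 ), and the other three leave the state, and hence the variables x, unchanged.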

Following the path from s to φ(s) backwards for states s in which R is true
similarly leads to the following, where R equals R 1 ∨ . . . ∨ R k .2

φ6. For any state s in which R is true, φ(s) → s is an [R]+x step.

2. The action R bears no relation to the superscript R in S R . It is traditional to name the action R for Right-mover because of the way the reduced behavior is constructed; and there seems to be no better superscript than R to signify reduced.
Finally, Figure 7.4 shows that for any state s of the original behavior, φ(s)
is always a state in the reduced behavior in which the RCL operation is not
being executed, so ¬(R ∨ L) is true for φ(s). Because the rule for drawing
the arrows in Figure 7.2 creates a leftward pointing arrow whenever an R
step is moved to the right and a rightward pointing arrow whenever an L
step is moved to the left, this is true in general. Therefore, we have:

φ7. For any state s, the state predicate ¬(R ∨ L) is true in state φ(s).

Statements φ4–φ7 give us relations between s and φ(s) for all states s of a
behavior of our example program. We now have to turn them into relations
between the values of the variables x and the variables X in any state s in
a behavior of S ⊗S R .
It’s easy to do this for φ4. The values of the variables X in state s are
the values of x in φ(s). Since φ(s) = s if s satisfies ¬(R ∨ L), this means
that x = X is true for any reachable state of S ⊗S R satisfying ¬(R ∨ L). In
other words:

(7.4) ¬(R ∨ L) ⇒ (X = x) is an invariant of S ⊗S R

To express the relations between x and X implied by φ5 and φ6, we need


a bit of notation. For any action A containing only the variables x, a step
s → t satisfies A iff formula A is true when we substitute for the variables x
their values in state s and for x′ their values in state t. Therefore, φ(s) → s is an [R]+x step iff [R]+x is true when we substitute the values of x in φ(s) for the variables x and the values of x in s for the primed variables x′ . But the values of x in φ(s) are by definition the values of X in state s. So φ(s) → s is an [R]+x step iff s satisfies the formula we get by substituting X for x and x for x′ in [R]+x . Using the construct awith defined in Section 5.4.4.1, we write the formula produced by these substitutions as:

   [R]+x awith x ← X, x′ ← x

so formula φ6 implies:

(7.5) R ⇒ ([R]+x awith x ← X, x′ ← x) is an invariant of S ⊗S R

Remember that in this formula, the definition of [R]+x must be fully expanded
before the awith substitution is performed.
We can obtain a similar relation between x and X from φ5. The statement that s → φ(s) is an [L]+x step is equivalent to the statement that state s satisfies the formula obtained from [L]+x by substituting for the variables x their values in state s and substituting for x′ the values of x in state φ(s), the latter being the values of X in state s. Therefore, φ5 implies the following, where x ← x has been eliminated from the awith formula, since it just states that x is substituted for itself:

(7.6) L ⇒ ([L]+x awith x′ ← X) is an invariant of S ⊗S R
Finally, since the values of X in state s equal the values of x in state φ(s),
from φ7 we obtain:
(7.7) ¬(R ∨ L) with x ← X is an invariant of S ⊗S R
The invariant I R of S ⊗S R relating the values of x and X is the con-
junction of the invariants (7.4)–(7.7):

I R ∆= ∧ ¬(R ∨ L) ⇒ (X = x)
       ∧ R ⇒ ([R]+x awith x ← X, x′ ← x)
       ∧ L ⇒ ([L]+x awith x′ ← X)
       ∧ ¬(R ∨ L) with x ← X
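As a sanity check, consider state s 45 of a behavior of S ⊗S R obtained from (7.1), in which L is true and the values of X equal the values of x in φ(s 45 ) = u 47 . The first two conjuncts of I R are vacuously true in that state; the third conjunct asserts precisely that s 45 → u 47 is an [L]+x step, which is φ5; and the fourth conjunct asserts that RCL is not being executed in state u 47 , which is φ7.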

7.1.3 Reduction In General


So far, we have been considering reduction of an operation RCL of one
process of a multiprocess program having the form shown in Figure 7.1. We
now extend this to a more general situation. First, some observations about
reducing RCL.
The invariant I R is described in terms of the actions R and L, which are
the disjunction of the actions R i and Lj , respectively. The fact that R and
L were defined in this particular way was irrelevant. All we required was
that an execution of the operation RCL consists of a sequence of R steps
followed by a C step followed by a sequence of L steps. We didn’t need
actions E h of other processes to commute with each of the actions R i and
Lj . In drawing the diagrams illustrated by Figures 7.2 and 7.4, we didn’t
have to identify the subactions of R and L that are involved. We could have
replaced each R i by R and each Li by L. We only needed to assume that
each action of another process commutes with R and L, not with each R i
and Lj individually. (For example, an E h · L1 step could be an L2 · E h step.)
Similarly, we didn’t have to require that R and L commute with particular
actions of other processes. We just required R and L to commute with the
disjunction of the next-state relations of all the other processes—an action
we will call E .
Furthermore, we didn’t need to require that R and L commute with E .
To move R steps to the right, we just required that for any pair of steps
s −R→ u −E→ t there is a pair of steps s −E→ v −R→ t for some state v . This
condition can be stated as the requirement R ·E ⇒ E ·R. When this formula
holds, we say that R right-commutes with E , and that E left-commutes with
R. Similarly, to move L steps to the left, we don’t need E · L to equal L · E ;
we need only require E · L ⇒ L · E , which asserts that L left-commutes with
E (and E right-commutes with L).
Finally, suppose that A is another action of the same process containing
the RCL operation, so A does not allow steps that implement the RCL
operation. Since the next step of the process after an R step can only be
an R step or a C step, an A step cannot follow an R step. This implies
that R · A must equal false, which means that R · A ⇒ A · R is trivially
true. Similarly, the only step of the process that can immediately precede
an L step is an L step or a C step. Therefore, A · L must equal false,
so A · L ⇒ L · A must be true. Therefore, other actions of the process
containing RCL satisfy the same commutativity requirements as actions of
other processes. This implies that we can completely forget about processes.
We just assume that the program’s next-state action equals E ∨ R ∨ C ∨ L,
where R right-commutes with E and L left-commutes with E . If we represent
the program as a collection of processes, there is no need for R, C , and L
steps to all be steps of the same process.
What we need to require is that execution of an RCL operation consists
of a sequence of R steps followed by a C step followed by a sequence of L
steps. To express this requirement, we generalize R and L from assertions
about a process’s control state to arbitrary state predicates satisfying certain
conditions. We take as primitives the state predicates R and L and the
actions E and M such that the program’s next state relation equals E ∨ M ,
where M describes the operation to be reduced. The actions R, C , and L
will be defined in terms of R, L, and M . We therefore assume the original
program S is defined by:

S ∆= Init ∧ 2[E ∨ M ]x

As in our example, we assume that R is true when execution of the operation described by M is in its first phase, where execution has begun but the
tion described by M is in its first phase, where execution has begun but the
C action has not yet been executed; and L is true when execution is in its
second phase, where the C action has been executed but the operation exe-
cution has not yet terminated. This is implied by the following assumptions
on executions of an operation described by the action M :

M1. In the initial state, M is not in the middle of an execution, expressed by Init ⇒ ¬(R ∨ L).
M2. An E step can’t change the current phase of an execution of M , expressed by E ⇒ (R′ ≡ R) ∧ (L′ ≡ L).

M3. An M step can’t go from the second phase to the first phase, expressed by ¬(L ∧ M ∧ R′ ).

M4. The two phases are disjoint, expressed by ¬(R ∧ L).

We can define the actions R, L, and C in terms of M , R, and L by:

R ∆= M ∧ R′        C ∆= (¬L) ∧ M ∧ (¬R′ )        L ∆= L ∧ M
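As a check, these definitions agree with the RCL example, where R equals pc(p) ∈ {r 2, . . . , r k , c} and L equals pc(p) ∈ {l 1 , . . . , l m }: an M ∧ R′ step is an M step that ends in the first phase, which must be an R 1 ∨ . . . ∨ R k step; a (¬L) ∧ M ∧ (¬R′ ) step is an M step that starts outside the second phase and ends outside the first phase, which must be a C step; and an L ∧ M step must be an L1 ∨ . . . ∨ Lm step.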

A complete execution of the operation described by M consists of a sequence of M steps beginning and ending in a state in which the M operation is not being executed—in other words, in a state satisfying ¬(R ∨ L). Therefore, the action M R that executes the complete operation as a single step for the variables X is:

   M R ∆= (¬(R ∨ L) ∧ M + ∧ ¬(R ∨ L)′ ) with x ← X
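For the RCL operation, a ¬(R ∨ L) ∧ M + ∧ ¬(R ∨ L)′ step starts with control at r 1, ends with control at o, and therefore is an R 1 · . . . · R k · C · L1 · . . . · Lm step; so M R executes all of RCL as a single step of the variables X.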


We can therefore define S R as follows, where Init R and E R are Init and E
with the variables x replaced by the variables X:

S R ∆= Init R ∧ 2[E R ∨ M R ]X

If R always equals false, then R equals false so there is no R action.


Similarly, there is no L action if L always equals false. If there is neither
an R nor an L action, then M equals C and S R equals (S with x ← X) so
reduction accomplishes nothing.
Ignoring liveness, reduction is described in TLA by a theorem asserting
that R1 (with T equal to 23¬L) is implied by the definitions of S , R, L,
M , and S R above, assumptions M1–M4, and the commutativity relations
assumed of R, L, and E .
M1–M4 and the commutativity relations are action formulas. (Remem-
ber that a state predicate is an action whose value depends only on the first
state of a step.) Those action formulas don’t have to be true of all possible
steps, just on steps in behaviors satisfying S . (The action formulas are usu-
ally meaningless for states not satisfying some type-correctness predicate.)
If x is the list of all variables that can appear in S and A, then the assertion
that an action formula A is true for all steps of behaviors satisfying S can
be written as |= S ⇒ 2[A]x . Here is reduction, without liveness, expressed
as a theorem.
Theorem 7.1 Assume Init, L, and R are state predicates, M and E are
actions, x is a list of all variables appearing in these formulas, and X is a
list of the same number of variables different from the variables x. Define

S ∆= Init ∧ 2[E ∨ M ]x
R ∆= M ∧ R′        L ∆= L ∧ M
Init R ∆= Init with x ← X        E R ∆= E with x ← X
M R ∆= (¬(R ∨ L) ∧ M + ∧ ¬(R ∨ L)′ ) with x ← X
S R ∆= Init R ∧ 2[E R ∨ M R ]X

I R ∆= ∧ ¬(R ∨ L) ⇒ (X = x)
       ∧ R ⇒ ([R]+x awith x ← X, x′ ← x)
       ∧ L ⇒ ([L]+x awith x′ ← X)
       ∧ ¬(R ∨ L) with x ← X

and assume:
(1) |= Init ⇒ ¬(R ∨ L)
(2) |= S ⇒ 2 [ ∧ E ⇒ (R′ ≡ R) ∧ (L′ ≡ L)
               ∧ ¬(L ∧ M ∧ R′ )
               ∧ ¬(R ∧ L)
               ∧ R·E ⇒ E ·R
               ∧ E · L ⇒ L · E ]x
Then |= S ∧ 23¬L ⇒ ∃ X : S R ∧ 2I R .

Assumption (1) and the first three conjuncts in the action of assumption (2) are the conditions M1–M4, which assert that an execution of the operation described by the action M consists of a sequence of R steps followed by a C step followed by a sequence of L steps. The final two conjuncts in the
action of assumption (2) are the assumptions that R right-commutes with
E and L left-commutes with E .
In practice, R, L, and E will be defined to be the disjunction of sub-
actions. This allows us to decompose the proofs of those commutativity
conditions by using the following theorem that is proved in the Appendix.

Theorem 7.2 If A ≡ ∃ i ∈ I : Ai and B ≡ ∃ j ∈ J : B j for actions Ai and B j , then:

|= (∀ i ∈ I , j ∈ J : Ai · B j ⇒ B j · Ai ) ⇒ (A · B ⇒ B · A)

By this theorem, to show that R right-commutes with E , it suffices to show that each subaction in the definition of R right-commutes with each subaction in the definition of E . Similarly, L left-commutes with E if each subaction of L left-commutes with each subaction of E .
7.1.4 The Hypothesis 23¬L


Formula 23¬L is a liveness property. When we add liveness conditions to
S and hope to use reduction to prove that it satisfies a liveness property
P , we would expect S to imply 23¬L. But we wouldn’t expect to have
to add a liveness property to S in order to verify that it satisfies a safety
property. In fact, to verify a safety property, we don’t have to prove that
¬L is always eventually true. The following theorem implies that, to verify
S satisfies a safety property, we can simply assume that 23¬L is true if
it is always possible for ¬L to eventually become true. To understand the
theorem, recall that Section 4.3.2 defines what it means for it to be always
possible for a state predicate Q to eventually become true—namely, that
the program Init ∧ 2[Next]x implies 2 E([Next]+x ∧ Q ′ ). (See (4.27).) This
theorem is proved in the Appendix:

Theorem 7.3 If S equals Init ∧ 2[Next]x , P is a safety property, and Q is a state predicate such that |= S ⇒ 2 E([Next]+x ∧ Q ′ ), then |= S ∧ 23Q ⇒ P implies |= S ⇒ P .

The theorem allows us to replace the formula T in R1b (which we later defined to equal 23¬L) with 2 E([Next]+x ∧ ¬L′ ). Section 4.3.2 explains
that we can verify this possibility condition by finding a fairness condition
F for S and verifying |= S ∧ F ⇒ 23¬L. This is the only case I know of in
which a possibility condition is used to verify a correctness property.

7.1.5 Adding Liveness


Theorem 7.1 allows us to deduce safety properties of the program S from
safety properties of the coarser-grain program S R . We also want to deduce
liveness properties of S from liveness properties of S R . We deduce liveness
properties of a program from fairness properties of program actions. To
deduce liveness properties of S by proving liveness properties of S R , we
extend Theorem 7.1 so its conclusion is
(7.8) |= S ∧ F ∧ 23¬L ⇒ ∃ X : S R ∧ 2I R ∧ F R
where F is the conjunction of fairness properties of subactions of the next-
state action E ∨ M of S and F R is the conjunction of fairness properties of
subactions of the next-state action E R ∨ M R of S R .
To understand how this is done, it helps to think in terms of the program
S ⊗S R, a behavior of which is a behavior both of S (described by the values
of variables x) and of S R (described by the values of variables X) whose
existence is asserted by (7.8). We expect S to satisfy (7.8) because S ⊗S R


satisfies:

(7.9) |= S ⊗S R ∧ F ∧ 23¬L ⇒ S R ∧ 2I R ∧ F R

Formula F R should assert fairness of subactions AR of E R ∨ M R . For sim-


plicity, we consider only actions AR that are subactions of either E R or
M R , which I expect will usually be the case. When it is not the case, the
requirements for deducing fairness of AR include requirements for deducing
fairness of both AR ∧ E R and AR ∧ M R [8].

7.1.5.1 Fairness of Subactions of E R


We defined S R so that E R equals Ē. (Recall that we defined Ḡ to equal G with x ← X for any formula G.) Property φ1 of Section 7.1.2.2 for our example generalizes to show that, in any behavior of S ⊗S R, an E step is also an Ē step.
Because each E R step corresponds to a single E step in a behavior of
S ⊗S R , we expect fairness of a subaction AR of E R to be implied by fairness
of a single subaction A of E . In other words, we expect this to be true:

(7.10) |= S ⊗S R ∧ XFx (A) ⇒ XFX (AR )

where XF is either WF or SF. By (4.14) and (4.23) we have

XFx (A) ≡ (32/23 EhAix ⇒ 23hAix )
XFX (AR ) ≡ (32/23 EhAR iX ⇒ 23hAR iX )

where 32/23 means 32 if XF is WF and 23 if XF is SF. These formulas and
a little temporal logic imply that to prove (7.10) it suffices to prove these
two theorems:

(7.11) |= S ⊗S R ⇒ (23hAix ⇒ 23hAR iX )

(7.12) |= S ⊗S R ⇒ 2( EhAR iX ⇒ EhAix )

We make (7.11) true by requiring every hAix step in a behavior of S ⊗S R to be an hAR iX step. Every E step in a behavior of S ⊗S R is an E R step because E R equals Ē. This suggests that we want AR to equal Ā. (Note that hAix with x ← X equals hĀiX because x̄ equals X.)
Every E step in a behavior of S ⊗S R is an E R step because of property
φ1, and that property holds because of the commutativity assumptions of
action E with respect to actions R and L. Therefore, we must require that


hAix satisfies the same commutativity relation as E , namely:
(7.13) |= S ⇒ 2[ (R · hAix ⇒ hAix · R) ∧ (hAix · L ⇒ L · hAix ) ]x
With this assumption, defining AR to equal Ā will make (7.11) true.
The commutativity relations satisfied by E were used to define the map-
ping φ, which in turn was used to define S R . For example, in Figures 7.2–
7.4, the state u 42 , which equals φ(s 43 ), was chosen when right-commuting
R 1 with E 1 so that s 41 → u 42 is an E 1 step and u 42 → s 43 is an R 1 step.
Suppose that A is a subaction of E 1 and s 42 → s 43 is an hAix step. Nothing in the construction in those figures, which describes the choice of φ used to define S R in Theorem 7.1, ensures that the E 1 step s 41 → u 42 is also an hAix step. However, the assumption (7.13) ensures that we could have chosen u 42 to make s 41 → u 42 an hAix step.
In general, we can use (7.13) to ensure that every E step in the original
behavior that is an hAix step remains an hAix step whenever that step is
commuted with an R or L step, so it remains an hAix step in the reduced
behavior. We can therefore define S R so that (7.12) is true, so this is not
a problem when we want to ensure that (7.10) holds for a single action A.
However, we may want (7.10) to hold for multiple subactions A of E , and a
single E step can be an hAix step for more than one of those subactions A.
We will return to this problem in Section 7.1.5.3. Now, we consider making
(7.12) true.
We can deduce (7.12) from these two theorems:
(7.14) (a) |= S ⊗S R ⇒ 2( EhAR iX ⇒ ( EhAix with x ← X) )
       (b) |= S ⊗S R ⇒ 2( ( EhAix with x ← X) ⇒ EhAix )

Since hAR iX equals hAix with x ← X, and the formulas EhAR iX and ( EhAix with x ← X) contain only the variables X, (7.14a) is equivalent to:

(7.15) |= S R ⇒ 2( EhAR iX ⇒ ( EhAix with x ← X) )

If E were not a weird operator (see Section 5.4.4.3), EhAR iX would be equivalent to ( EhAix with x ← X); and we expect that equivalence to be true for most actions hAix . However, because it is not always true, we have to add (7.15) as an assumption.
To see what is required to make (7.14b) true, we consider what assump-
tion is required to ensure that P̄ ⇒ P is true for an arbitrary state predicate P with free variables x. The free variables of P̄ are X, and the relation between the values of x and X is described by the invariant I R of S ⊗S R .
There are three cases, depending on whether R is true, L is true, or neither


is true:

R true: The conjunct

      R ⇒ ([R]+x awith x ← X, x′ ← x)

   of I R asserts that, in this case, we can arrive at the values of x in the current state by starting in a state in which the values of x equal the current values of X and executing a sequence of R steps. This means that P̄ ⇒ P is true if, starting in a state satisfying P and executing a sequence of R steps, we reach a state in which P is true. This is true iff, starting in a state satisfying P and repeatedly executing single R steps, we keep reaching states satisfying P . In other words, if P ∧ R ⇒ P ′ is true in any state satisfying R of a behavior of S ⊗S R , then P̄ ⇒ P is true.

L true: A similar argument based on the conjunct

      L ⇒ ([L]+x awith x′ ← X)

   of I R shows that if L ∧ P ′ ⇒ P is true in any state satisfying L in a behavior of S ⊗S R , then P̄ ⇒ P is true.

Neither R nor L true: The conjunct ¬(R ∨ L) ⇒ (X = x) of I R shows that P̄ ≡ P is true for any state satisfying ¬(R ∨ L) in a behavior of S ⊗S R .

Let’s review what we have shown. We can deduce (7.10) from (7.11) and
(7.12). If (7.13) is true, then we can choose S R of Theorem 7.1 to make
(7.11) true for a single subaction A of E . We can deduce (7.12) from (7.14a)
and (7.14b). We can deduce (7.14a) from (7.15). And finally, we can deduce
(7.14b) from the conditions obtained above for proving P̄ ⇒ P , substituting EhAix for P . Putting all this together, we have shown that the program S R of Theorem 7.1 can be chosen to make (7.10) true, for a single subaction A of E , if the following two conditions are satisfied:

(7.16) |= S ⇒ 2 [ ∧ (R · hAix ⇒ hAix · R) ∧ (hAix · L ⇒ L · hAix )
                  ∧ EhAix ∧ R ⇒ ( EhAix )′
                  ∧ L ∧ ( EhAix )′ ⇒ EhAix ]x
       |= S R ⇒ 2 ( EhĀiX ⇒ ( EhAix with x ← X) )
7.1.5.2 Fairness of Subactions of M R


Deducing fairness properties of subactions of M R from fairness properties of
actions of S is more complicated than for subactions of E R because an M R
step is the result of executing multiple actions of S as a single step. The
simplest and probably most common case is when we want the reduced pro-
gram to satisfy fairness of the M R action itself. So, we begin by examining
this case.
Action M R is the action obtained by executing the entire operation con-
sisting of a sequence of R steps, followed by a C step, followed by a complete
sequence of L steps, with the variables X substituted for the variables x.
For our example program of Figure 7.1, where the values of variables X in
state s were defined to be the values of x in state φ(s), condition φ3 implies
that M R equals C . This means that in any behavior of S ⊗S R, a C step is
an M R step. It suggests that we should be able to deduce fairness of the
M R action of S R from fairness of the C action of S .
Now suppose A is a subaction of C and AR is the action obtained by
executing the entire operation consisting of a sequence of R steps, followed
by an A step, followed by a complete sequence of L steps, with the variables
X substituted for the variables x. The same argument shows that AR equals Ā, and that in any behavior of S ⊗S R, an A step is an AR step. Moreover,
any AR step is also an M R step, so AR is a subaction of M R . So, it is
reasonable to consider deriving fairness of a subaction AR of M R when AR
is obtained in this way from a subaction A of C . Since our goal is not to
obtain the most general results, we consider only this case.
First, we must define AR precisely for a subaction A of C . An AR step is
obtained by combining a sequence of R steps followed by an A step followed
by a sequence of L steps into a single step, and then substituting X for x.
The definitions of R, C , and L in terms of R and L and assumptions M1–M4 imply that AR equals Aρ with x ← X, where Aρ is defined by

(7.17) Aρ ∆= ¬(R ∨ L) ∧ ([R]+x · A · [L]+x ) ∧ ¬(R ∨ L)′

We want condition (7.10) to be true for this choice of A and AR . As before,


we do this by making (7.11) and (7.12) true, starting with (7.11). In a
behavior satisfying S ⊗S R , every A step is an AR step. However, (7.11)
requires every hAix step to be an hAR iX step. This is not true for A an
arbitrary subaction of C . It’s not even necessarily true for A = C , for the
following reason.
Recall that Figure 7.4 shows two behaviors that satisfy S , the bottom
one being the reduced version of the top one. The action labels describe the
changes of the values of the variables x. The top behavior satisfies S ⊗S R ,


where the values of variables X in a state s i of that behavior are the values
of x in the corresponding state φ(s i ) of the bottom sequence. The step
s 43 → s 44 of the top behavior is a C step iff u 42 → u 46 is a C step. If one of
the variables x has different values in states s 43 and s 44 , there is no reason
why its value should differ in states u 42 and u 46 . Step s 43 → s 44 would be
a hC ix step but not a hC iX step if the values of all the variables x are the
same in states u 42 and u 46 . To be able to deduce (7.11) when A is C and
AR is M R , we need the assumption that if s 43 → s 44 is a hC ix step, then
x′ ≠ x is true of step u 42 → u 46 .
Since every A step in a behavior of S ⊗S R is an AR step, we can deduce
that every hAix step is an hAR iX step from this assumption:

|= S ⇒ 2[ (hAix )ρ ⇒ (x′ ≠ x) ]x

There is seldom any reason for a program’s next-state action to allow stut-
tering steps, and modifying it to disallow stuttering steps does not change
the program. An A step of the program will usually be an hAix step; and
if it isn’t, A can be replaced by hAix . So for simplicity, we strengthen this
assumption to:

(7.18) |= S ⇒ 2[Aρ ⇒ (x′ ≠ x)]x

We have shown that (7.18) implies that in a behavior of S ⊗S R , every hAix


step is an hAR iX step. In other words, we have shown that it implies:

(7.19) |= S ⊗S R ⇒ 2[ hAix ⇒ hAR iX ]x,X

This assertion implies (7.11).


The assumption (7.18) makes (7.11) true, so we now have to make (7.12)
true. But we can’t expect (7.12) to hold in general for the following reason.
Since AR equals Aρ , we expect EhAR iX to imply EhAρ ix . By (7.17),
EhAρ ix equals Eh¬(R ∨ L) ∧ [R]+ x · A · . . .ix , which implies E(¬(R ∨ L) ∧
(R ∨ A)) (since E(U · V ) implies E(U ) for any actions U and V ). We can
therefore expect EhAR iX to imply EhAix , as required by (7.12), only when
there is no R action.
A sequence of R steps may have to occur between when Aρ becomes
enabled and when A becomes enabled, so fairness assumptions for R as well
as a fairness assumption for A may be required to imply fairness of Aρ . In-
stead of assuming a fairness condition on L actions to ensure that operation
M completes after a C step occurs, we simply assumed that the operation
completes by adding the assumption 23¬L. Similarly, instead of assuming
a fairness condition on R actions to ensure that the necessary enabling con-


dition of C occurs, we simply assume that the enabling condition eventually
occurs.
As an example, suppose we want to deduce weak fairness of AR from
strong fairness of A. (Because fairness of AR requires fairness of more than
just A, there’s no reason not to have different kinds of fairness for the two
actions.) By (7.19) and the definition (4.12) of WF, to prove WFX (AR ) it
suffices to prove 32 EhAR iX ; hAix . Just as we split the proof of (7.12)
into the two conditions of (7.14), we split the proof of 32 EhAR iX ; hAix
into proving:

(7.20) (a) 2( EhAR iX ⇒ ( EhAρ ix with x ← X) )
       (b) 32 EhAρ ix ; hAix

This may seem wrong because we have ( EhAρ ix with x ← X) in (7.20a) and EhAρ ix in (7.20b), when the two formulas should be equal. However, the following reasoning shows that they are equal. The definition of Aρ and conditions E3 and E4 of Section 5.4.4.2 imply that EhAρ ix equals ¬(R ∨ L) ∧ EhAρ ix . The invariant I R implies that ¬(R ∨ L) ⇒ (x = X) and ¬(R ∨ L) with x ← X are true, so S ⊗S R implies that ( EhAρ ix with x ← X) always equals EhAρ ix . We make S R implying (7.20a) one of our requirements for deducing that WFX (AR ) is satisfied. We now consider (7.20b).
By (3.32b) of Section 3.4.2.8 and the tautology |= ¬hAix ≡ [¬A]x , to
prove (7.20b) it suffices to prove:

(7.21) 32 EhAρ ix ∧ 2[¬A]x ; hAix

By the definition (4.22) of SF and transitivity of ;, to deduce (7.21) from


SFx (A), it suffices to prove that S implies:

(7.22) 32 EhAρ ix ∧ 2[¬A]x ; 23 EhAix

We have seen that to deduce WFX (AR ) from SFx (A), it suffices to show
(7.18) and:

(7.23) (a) |= S R ⇒ 2( EhAR iX ⇒ ( EhAρ ix with x ← X) )


(b) |= S ⇒ (32 EhAρ ix ∧ 2[¬A]x ; 23 EhAix )

For the other three possible pairs of fairness conditions on AR and A, the
same argument shows that we can deduce SFX (AR ) instead of WFX (AR )
by replacing 32 with 23 in (7.23b); and we can assume WFx (A) instead
of SFx (A) by replacing 23 with 32 in (7.23b).
7.1.5.3 The Reduction Theorem with Fairness


We have described the assumption needed to infer that the reduced program
satisfies a fairness condition on a single action AR if the original program
satisfies a fairness condition on a single action A. We now combine this into
a theorem for inferring that the reduced program satisfies the conjunction
of countably many fairness conditions AR i . This is simple, except for one
problem mentioned above for the case in which AR is a subaction of E R .
Recall that to satisfy (7.11), we constrained the construction of the func-
tion φ illustrated in Figures 7.2–7.4 so that if an E step is an A step in the
original behavior, then the corresponding E step in the reduced behavior
is also an A step. With multiple actions Ai , if the E step in the original
behavior is both an Ai and an Aj step for j ≠ i , it might be impossible to
make the E step in the transformed behavior both an Ai and an Aj step.
However, to satisfy (7.11), it’s not necessary for every E step that’s an
A step in the original behavior to remain an A step in the reduced behavior.
It’s only necessary to ensure that if there are infinitely many A steps in the
original behavior, then infinitely many of them are A steps in the reduced
behavior. With countably many such actions Ai , it’s possible to construct
the reduced behavior so that for every i for which there are infinitely many
Ai steps in the original behavior, there are infinitely many Ai steps in the
reduced behavior. This is done using Lemma B.1 in Appendix Section B.6
the same way it is used there in the proof of Theorem 4.6.
We can now put together the assumptions we derived above for deducing
fairness of an action AR from fairness of an action A for individual actions
A into a theorem for deducing fairness of a countable number of actions
AR i from fairness of actions Ai . The requirements for Ai a subaction of E (and AR i a subaction of E R ) come from (7.16). The requirements for Ai a subaction of C (and AR i a subaction of M R ) come from (7.18) and (7.23) plus the modification of (7.23) for additional fairness conditions of A and AR .

Theorem 7.4 With the definitions and assumptions (1) and (2) of Theorem 7.1, let C ∆= (¬L) ∧ M ∧ (¬R′ ) and let

   |= F ⇒ ∀ i ∈ I : YFix (Ai )        F R ∆= ∀ i ∈ I : ZFiX (AR i )

where I is a countable set and YFi and ZFi are WF or SF for each i ∈ I ; and assume, for each i ∈ I , either:

• Ai is a subaction of E , AR i ∆= Āi , YFi equals ZFi ,

     |= S ⇒ 2 [ ∧ (R · hAi ix ⇒ hAi ix · R) ∧ (hAi ix · L ⇒ L · hAi ix )
                ∧ EhAi ix ∧ R ⇒ ( EhAi ix )′
                ∧ L ∧ ( EhAi ix )′ ⇒ EhAi ix ]x , and

     |= S R ⇒ 2 ( EhĀi iX ⇒ ( EhAi ix with x ← X) )

or

• Ai is a subaction of C ,

     Aρi ∆= ¬(R ∨ L) ∧ ([R]+x · Ai · [L]+x ) ∧ ¬(R ∨ L)′ ,
     AR i ∆= Aρi with x ← X ,
     |= S ⇒ 2[Aρi ⇒ (x′ ≠ x)]x ,
     |= S R ⇒ 2( EhAR i iX ⇒ ( EhAρi ix with x ← X) ) , and
     |= S ∧ F ⇒ (32/23Z EhAρi ix ∧ 2[¬Ai ]x ; 32/23Y EhAi ix )

  where for Q either Y or Z, 32/23Q is 32 if QF is WF, and it is 23 if QF is SF.3

Then |= S ∧ F ∧ 23¬L ⇒ ∃ X : S R ∧ 2I R ∧ F R .

7.1.6 An Example: Making Critical Sections Atomic


A standard concurrent coding practice is to “protect” accesses to shared
data by putting them in critical sections. Recall that Section 4.2.2.1 defined
a critical section to be a piece of code in a process such that no two processes
can be executing their critical sections at the same time. We can consider
this coding practice to be an application of reduction in which the reduced
program executes the entire critical section, including its waiting and exiting
code, as a single action.
We assume that critical sections are implemented with the trivial mutual
exclusion algorithm LM described in Figure 4.6 of Section 4.2.6.1 that uses
Dijkstra’s P and V semaphore operations. (Correctness of a mutual exclu-
sion algorithm can be expressed as the requirement that it implements LM
under a suitable refinement mapping.) We assume the program is described
with pseudocode, using a semaphore variable sem and a variable pc to de-
scribe the control state. The variable sem initially equals 0 and is accessed
by a process p only by P p and V p actions that execute the P (sem) and
V (sem) operations. These actions are written in TLA as follows, where the
3. For example, if YFix (Ai ) is SFx (Ai ) and ZFiX (AR i ) is WFX (AR i ), this condition is:

   |= S ∧ F ⇒ (32 EhAρi ix ∧ 2[¬Ai ]x ; 23 EhAi ix )


unchanged formulas assert that all program variables other than sem and
pc are left unchanged:

P p ∆= ∧ pc(p) = . . .
       ∧ (sem = 1) ∧ (sem′ = 0)
       ∧ pc′ = (pc except p ↦ . . .)
       ∧ unchanged . . .

V p ∆= ∧ pc(p) = . . .
       ∧ sem′ = 1
       ∧ pc′ = (pc except p ↦ . . .)
       ∧ unchanged . . .
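As a concrete (and purely hypothetical) instance, for a process p that waits at a label w and enters its critical section at a label cs 1 , the P p action could be:

   P p ∆= ∧ pc(p) = w
          ∧ (sem = 1) ∧ (sem′ = 0)
          ∧ pc′ = (pc except p ↦ cs 1 )
          ∧ unchanged (all program variables other than sem and pc)

This action is enabled only when sem = 1, so process p waits at w while another process holds the semaphore.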

The value of sem is therefore always either 0 or 1. An execution of a crit-


ical section by process p consists of a P p step, followed by steps satisfying
actions CS p,1 , . . . , CS p,k p that represent executions of statements in the
critical section, followed by a V p step. The assumption that shared data is
accessed only within a critical section means that if an action E q describes
a statement of process q outside a critical section (meaning not a P q , CS q,j ,
or V q action), then each action CS p,i commutes with E q if p ≠ q.
We can reduce a program using critical sections in this way by making
execution of each critical section a single step. This is done by a sequence of
applications of our reduction theorems, each one reducing one critical section
of a single process. When doing multiple reductions, it would get confusing
if we introduced new variables for each reduction. We therefore consider the
reduced version S R of a program S to be the program S R described in our
theorems, except with x substituted for X. In our theorems, the M R that
executes the operation described by action M as a single step is defined to
equal M ρ with x ← X . When we use the variables x instead of X for the
reduced program, M R becomes M ρ .
To apply the theorems to a critical section of a process p, we let M p
equal P p ∨ CS p,1 ∨ . . . ∨ CS p,k p ∨ V p , so M p is replaced in the reduced
program by M ρp . We could let any of those actions be the action C , but it
is most convenient to let C be the P p action, so there is no action R and
action L is CS p,1 ∨ . . . ∨ CS p,k p ∨ V p . By Theorem 7.2, it suffices to show
that V p and each CS p,i left-commutes with every action of every process
q 6= p. (Action L trivially left-commutes with any action of process p not
a subaction of M p .) The possible actions of process q are P q , V q , CS q,j
for some j , an action E q not in a critical section, and M ρq if the operation
described by M q has already been reduced. Here is why V p and CS p,i
commute with each of those actions of process q ≠ p:
Pq : V p left-commutes with P q because the mutual exclusion algorithm im-


plies that V p is not enabled when a process q ≠ p is in its critical
section, so P q · V p equals false; CS p,i commutes with P q because it
does not access sem or pc(q) (meaning that it does not depend on or
modify sem or pc(q)).

Vq : V p commutes with V q because any two V (sem) operations commute;


CS p,i commutes with V q because it does not access sem or pc(q).

CSq,j : CS q,j · V p and CS q,j · CS p,i both equal false because a CS q,j step
leaves process q inside its critical section, which by the mutual exclu-
sion algorithm implies process p is outside its critical section so neither
CS p,i nor V p is enabled.

Eq : V p commutes with E q because E q does not access sem or pc(p); and


CS p,i commutes with E q because of the assumption that actions de-
scribing a statement inside a critical section commute with all actions
describing statements not in another process’s critical section.

Mqρ : Both M ρq · V p and M ρq · CS p,i equal false because an M ρq step leaves


sem = 1, while the mutual exclusion algorithm implies that V p and
CS p,i are enabled only when sem = 0.
This shows that S implies E · L ⇒ L · E , so since there is no R action, we
can apply Theorem 7.1. There is still the formula 23¬L to deal with. If
the program has fairness assumptions, then we expect 23¬L to be implied
by fairness assumptions of the L actions. If not, then by Theorem 7.3 we
can verify safety properties by showing that, after executing process p’s
P p statement, it is always possible for execution of the critical section to
complete. It’s hard to imagine an application of mutual exclusion that would
allow a behavior in which it is impossible for some process ever to exit its
critical section.
We now consider deducing fairness properties of the reduced program.
First, let action A describe a statement outside any critical section. If the
statement is in process p, then Aρ equals A so fairness of Aρ is equivalent to
fairness of A. If A is an action of a process other than p, the argument above
shows that L commutes with hAix . If hAix is a subaction of the next-state
action of S , then an L ∧ ( EhAix )′ step ends in a state in which an hAix step can occur, which by commutativity of L and hAix implies it begins in a state in which an hAix step can occur, so S implies 2(L ∧ ( EhAix )′ ⇒ EhAix ). We expect 2 ( EhĀiX ⇒ ( EhAix with x ← X) ) to be true, since substitution
has been found to distribute over E for most actions. (However, there is
little experience with the substitution x ← X that occurs in reduction.) If


this formula is true, then by Theorem 7.4 we can deduce fairness of Aρ from
fairness of A.
We expect the most common fairness property for a subaction of M ρp to
be fairness of M ρp itself. In this case C is P p , and we expect to deduce a
fairness condition of M ρp from the same fairness condition of the P p action.
There is no R action, so the definitions of P p and M ρp imply M ρp ⇒ (x0 6= x)
and EhM ρp ix ⇒ EhP p ix . (In the unlikely case that process p does nothing
but repeatedly execute the critical section, so M ρp leaves pc(p) unchanged,
we need to add the assumption that M ρp changes some other variable.) The
only remaining requirement to deduce from Theorem 7.4 that fairness of P p
implies fairness of M ρp is:

|= S R ⇒ 2( EhM R iX ⇒ ( EhM ρ ix with x ← X) )

Since hM R iX equals hM ρ ix with x ← X, this is true if substitution distributes over E


for the action hM ρ ix .
We have described the mathematics underlying the use of mutual exclu-
sion to implement atomic operations. We have ignored the question of what
that achieves. We use mutual exclusion to view execution of the critical sec-
tion as a single step, but is that a correct view? The answer lies in condition
R3 of Section 7.1.1.2, which tells us when we can deduce that the original
program satisfies a property that the reduced program satisfies. The only
part of the invariant I R that seems useful in condition R3 is the conjunct
¬(R ∨ L) ⇒ (X = x), which asserts that the variables of the original pro-
gram and the reduced one have equal values when no process is executing
its critical section.

7.1.7 Another Example: Pipelining


Here is a sketch of a simple example of reduction that is interesting in part
because each operation being reduced contains actions performed by two
different processes. The example is a very abstract view of one stage in a
pipelined computation—a view that tells us nothing about what is being
computed.
The program S performs a sequence of computations. Each computation
presumably obtains some input, does some computation, and then produces
some output—but that is irrelevant. What concerns us is that the com-
putations are pipelined as follows so they can be performed concurrently
by two processes. Process 1 performs the first part of the computation to
obtain a partial result that it appends to the end of a fifo queue. Process 2
removes the partial result from the head of the queue and completes the
computation. Process 1 can therefore get ahead of process 2, performing
its part of the i th computation while process 2 is still performing its part
of the j th computation, for i > j . The reduced program S R replaces these
two processes with a single process that performs each computation as a
single atomic action. The property we want to prove by reduction presum-
ably involves how the computed values are used after they are computed,
when they have the same values in the original and reduced programs, so
condition R3 is satisfied.
We describe steps of process 1 by an action Cmp1 ∨ Send , where that
process’s part of a computation consists of a finite sequence of Cmp1 steps
followed by a single Send step that appends the partial result to the tail
of the queue. Steps of process 2 are described by an action Rcv ∨ Cmp2,
where that process’s part of the computation consists of a single Rcv step
that removes the partial result from the head of the queue followed by a
finite sequence of Cmp2 steps. The contents of the queue are described by
the variable qBar , which is accessed only by the Send and Rcv actions. We
assume that the two processes communicate only through the fifo qBar ,
an assumption expressed by these conditions: Cmp1 commutes with the
process 2 actions Rcv and Cmp2, and Cmp2 commutes with the process 1
actions Cmp1 and Send . Since qBar is the only shared variable accessed
by Rcv and Send , it doesn’t matter in which order these two actions are
executed in a state where the queue is nonempty. Thus, we have:

(7.24) |= (qBar 6= h i) ⇒ (Send · Rcv ≡ Rcv · Send )

The program may contain other processes that can interact in some way
with processes 1 and 2. For example, process 1 may obtain its input from a
third process and process 2 may send its output to a fourth process.
The program’s next-state action is M ∨ O, where M describes processes
1 and 2 and O describes any other processes. We rewrite M in the form
∃ n ∈ N+ : M n , where N+ is the set of positive integers and M n is an ac-
tion whose steps describe a complete execution of the n th computation. To
do this, we assume state functions snum and rnum whose values are the
numbers of Send and Rcv actions, respectively, that have been executed.
Initially, snum = rnum = 0. The Send action increments snum by 1 and
the Rcv action increments rnum by 1. We can then define:

(7.25) M n = ∨ (snum = n − 1) ∧ (Cmp1 ∨ Send )
∨ ((rnum = n − 1) ∧ Rcv ) ∨ ((rnum = n) ∧ Cmp2)
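To make this concrete, here is a sketch of how the queue and the two counters
might be written in TLA+. Everything except qBar , snum, and rnum is
invented for the illustration: the module name, the variable val standing for
a partial result, and the omission of the Cmp1 and Cmp2 actions.

   ---- MODULE PipelineSketch ----
   EXTENDS Naturals, Sequences
   VARIABLES qBar, snum, rnum, val
     \* val is a hypothetical partial result; Cmp1 and Cmp2 are elided

   Init == (qBar = << >>) /\ (snum = 0) /\ (rnum = 0) /\ (val = 0)

   Send == /\ qBar' = Append(qBar, val)   \* append to the tail of the queue
           /\ snum' = snum + 1            \* snum counts Send steps
           /\ UNCHANGED <<rnum, val>>

   Rcv  == /\ qBar # << >>                \* enabled only if the queue is nonempty
           /\ qBar' = Tail(qBar)          \* remove the head of the queue
           /\ rnum' = rnum + 1            \* rnum counts Rcv steps
           /\ UNCHANGED <<snum, val>>
   ====

Written this way, a Send step and an Rcv step taken from a state with
qBar 6= h i act on different ends of the queue and on different counters, which
is why they commute as (7.24) asserts.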

We recursively define the n th reduction of the program to be the one obtained


by reducing the operations M 1 , . . . , M n in that order. To define the n th
reduction, define:

Cmp1n = (snum = n − 1) ∧ Cmp1

Send n = (snum = n − 1) ∧ Send

Rcv n = (rnum = n − 1) ∧ Rcv

Cmp2n = (rnum = n) ∧ Cmp2
so M n equals Cmp1n ∨ Send n ∨ Rcv n ∨ Cmp2n . The actions R, C , and L
for the n th reduction are:
R n ∆= Cmp1n      C n ∆= Send n      Ln ∆= Rcv n ∨ Cmp2n

Again, with multiple reductions we let the reduced program have the same
variables as the original program, so the n th reduction replaces the action
M n with M ρn .
The remaining action E for this reduction is the disjunction of these
actions: the action O describing the other processes, the already reduced
actions M ρk for k < n, and the subactions of M k for k > n. To apply
Theorem 7.1, we must show that R n right-commutes with these actions and
Ln left-commutes with them.
That R n right-commutes and Ln left-commutes with O must be as-
sumed. The commutativity relations hold for M ρk with k < n because an
R n step is enabled only after an M ρk step, which implies R n · M ρk equals
false (so R n right commutes with M ρk ), and which also implies that Ln
cannot be enabled immediately after an M ρk step, so M ρk · Ln also equals
false.
What remains to be shown is that Cmp1n (the action R n ) right-commutes
with M k , and that Rcv n and Cmp2n (whose disjunction equals Ln ) left-
commute with M k , for k > n. For that, we have to show that each
of the four actions whose disjunction equals M k satisfies those commutativ-
ity conditions. We will use the commutativity relations we assumed above:
that Cmp1 commutes with Cmp2 and Rcv , and that Cmp2 commutes with
Send . The assumption that Cmp2 commutes with Send implies that Cmp2i
commutes with Send j for all i and j . This follows from the definitions of
Cmp2i and Send j , because Cmp2 does not depend on or modify snum and
Send j does not depend on or modify rnum. Similarly, Cmp1i commutes
with Rcv j and Cmp2j for all i and j . These assumptions are called the
commutativity assumptions in the following proof sketches of the required
commutativity relations. Recall that we are assuming k > n.

Cmp1n right-commutes with Cmp1k and Send k


Cmp1n ·Cmp1k and Cmp1n ·Send k equal false because snum = n −1
after a Cmp1n step; Cmp1k and Send k are enabled iff snum = k − 1;
and k > n.
Cmp1n right-commutes with Rcv k and Cmp2k
By the commutativity assumptions.
Rcv n left-commutes with Cmp1k
By the commutativity assumptions.
Rcv n left-commutes with Send k
A Send k step is enabled only if snum = k − 1; the step leaves rnum
unchanged; and Rcv n is enabled only if rnum = n − 1. Therefore,
Send k · Rcv n enabled implies snum = k − 1 and rnum = n − 1, so
k > n implies snum > rnum which implies qBar 6= h i. By (7.24), this
implies Send k · Rcv n ≡ Rcv n · Send k .
Rcv n left-commutes with Rcv k and Cmp2k
Rcv k · Rcv n and Cmp2k · Rcv n equal false because a Rcv k or Cmp2k
step ends in a state with rnum = k which by k > n implies rnum 6=
n − 1, so Rcv n is not enabled in that state.
Cmp2n left-commutes with Cmp1k and Send k
By the commutativity assumptions.
Cmp2n left-commutes with Rcv k and Cmp2k
A Rcv k or Cmp2k step ends with rnum = k , and Cmp2n is enabled
only if rnum = n, which is false because k > n. Therefore a Cmp2n
step cannot follow a Rcv k or Cmp2k step, so Rcv k ·Cmp2n and Cmp2k ·
Cmp2n equal false.
This handles the commutativity assumptions of Theorem 7.1. We still have
the hypothesis 23¬L in the theorem’s conclusion to deal with. By Theo-
rem 7.3, to use reduction for verifying safety properties, we don’t need that
hypothesis. We only have to show that any finite behavior satisfying the
safety property S can be extended to a behavior that completes each M n
operation in which the C n action has occurred. This means showing that
process 2 cannot deadlock. This should be easy to show unless process 2
may have to wait for another process to do something—for example, until
another process is ready to receive process 2’s output.
For reasoning about liveness, we would expect the hypothesis 23¬L to
be satisfied by adding fairness conditions to process 2 actions, and perhaps to

other processes, to ensure that the operation will complete once process 1’s
Send action occurs. The obvious fairness condition we want the reduced
program to satisfy is fairness of M ρ . If an M ρn action is enabled, then no
M ρi action with i 6= n can be enabled until an M ρn step occurs. This implies
that (weak or strong) fairness of M ρ is equivalent to fairness of M ρn for all
n. For each n, ensuring fairness of M ρn is the second case in Theorem 7.4,
with Ai equal to C n , which equals Send n . The assumption |= S ∧ F ⇒ . . .
in that case of the theorem will have to be implied by fairness conditions on
subactions of Cmp1.

7.2 Decomposing and Composing Programs


We think of a program as consisting of multiple components. Most often
those components are processes. However, a process of an abstract program
need not correspond to a process (or thread) in a coding language. As
we saw in Section 1.5, Euclid’s algorithm can be viewed as a multiprocess
program. In general, any disjunct of a program’s next-state action can be
thought of as a process, and we may be able to write the next-state action
in more than one way as the disjunction of subactions. We now use the
term component instead of process for what is described by a disjunct of the
program’s next-state action.
In this section, we describe a program as the conjunction of separate
programs, each describing one of the program’s components. There are two
reasons for doing this. The first is that we have the program and want to
decompose the problem of verifying its correctness into the simpler tasks
of verifying correctness of each of its components. We call this procedure
decomposition and consider it in Section 7.2.1. The second reason is because
the program is implemented using existing components whose correctness
has been verified. We want to deduce correctness of the program from
correctness of the components, without knowing how the components are
implemented. This procedure is called composition and is considered in
Section 7.2.2.
There is little reason to decompose a program if we are proving its cor-
rectness. Decomposition structures the proof in three parts: (1) showing
that the program is equivalent to the conjunction of programs describing the
components, (2) showing that each of those programs satisfies a property,
and (3) showing that the conjunction of those properties implies correctness
of the original program. Such a proof can be rewritten as an ordinary cor-
rectness proof of the original algorithm by a simple rearrangement of the

steps of those three parts. But math provides many ways to structure a
proof, and deciding in advance to structure it by decomposition might rule
out better proofs.
The one good reason to decompose the verification of a program in this
way is that it may make it easier to use a tool to verify correctness. For
example, a model checker might be able to verify correctness of individual
components but not correctness of the complete program. Decomposition
would allow using model checking to perform part of the verification, and
then using the results presented here to prove that correctness of the com-
ponents implies correctness of the entire program. This approach has been
applied to a nontrivial example [24], but I don’t know of any case in which
it has been used in industry.
Composition is useful if an engineer wants to verify correctness of a pro-
gram that describes a system built using an existing component whose be-
havior is specified by an abstract program. Up until now, we have described
a program by a formula that is satisfied by behaviors in which the program
to be implemented, which I will here call the actual program, and its envi-
ronment are both acting correctly. There was no need for the mathematical
description to separate the actual program and its environment, since it
makes no difference if an execution is incorrect because the programmer
didn’t understand what the code would do or what the environment would
do. However, if a program is implemented using a component purchased
elsewhere, it is important to know if an incorrect behavior is due to an in-
correct implementation of the actual program or of the component, which
is part of the environment.
For composition, we therefore describe a program with two formulas,
formula M describing the correct behavior of the actual program and a
formula E describing correct behavior of its environment. These formulas
are combined into a single formula, written E -+-> M , that can be thought of
as being true of a behavior iff M is true as long as E is (so M is always true
if E is always true). Formula E -+-> M is what is called a rely/guarantee
description of the program [22].
Currently, implementing actual programs with precisely specified exist-
ing components seems likely to arise in practice only for components that
are traditional programs that perform a computation and stop; and where
execution of the component can be considered to be a single step of the
complete program. In that case, there is no need for TLA. As explained in
Appendix Section A.4, the safety property of the component can be speci-
fied by a Hoare triple; and termination is the only required liveness property.
Composition in TLA is needed only if the existing component interacts with

its environment in a more complex way that must be described with a more
general abstract program. Such reusable, precisely specified components do
not seem to exist now. Perhaps someday they will.
The results presented here come from a single paper [2]. The reader is
referred to that paper for the proofs. To make reading it easier, much of
the notation used here—including the identifiers in formulas—is taken from
that paper.

7.2.1 Decomposing Programs


7.2.1.1 Writing a Program as a Conjunction
As an example, we take Euclid’s algorithm, described in Section 1.5. Instead
of x and y, let’s name the variables a and b. Also, instead of having the
algorithm compute the GCD of two particular numbers, we’ll let it nonde-
terministically choose the initial values of a and b and compute their GCD.
To write the algorithm in TLA, let’s first define formulas that describe
the initial values of each of the variables, how they are changed in the next-
state action, and fairness of the actions that change them, where N+ is the
set of positive integers. Here are those formulas for the variable a:

Init a = a ∈ N+

Next a = (a > b) ∧ (a 0 = a − b) ∧ (b 0 = b)

La = WFha i (Next a )
The formulas Init b , Next b , and Lb are obtained from these formulas by
interchanging a and b. Note that La is equivalent to WFha,b i (Next a ) because
Next a implies b 0 = b, and similarly, Lb is equivalent to WFha,b i (Next b ).
We can describe Euclid’s algorithm with the following TLA formula M :
M = Init M ∧ 2[Next M ]ha,b i ∧ LM


Init M = Init a ∧ Init b

Next M = Next a ∨ Next b

LM = La ∧ Lb
Formula M is equivalent to the conjunction of M a and M b , defined by:
M a = Init a ∧ 2[Next a ]a ∧ La

M b = Init b ∧ 2[Next b ]b ∧ Lb

The equivalence of M and M a ∧ M b follows by simple logic from:


|= 2[Next a ]a ∧ 2[Next b ]b ≡ 2[Next a ∨ Next b ]ha,b i

This result follows from the equivalence of 2(F ∧ G) and 2F ∧ 2G, for any
formulas F and G, and from

[Next a ]a ∧ [Next b ]b ≡ [Next a ∨ Next b ]ha,b i

which follows from the definition of [. . .]... , the equivalence of
(a 0 = a) ∧ (b 0 = b) and ha, b i0 = ha, b i, and:

(7.26) |= Next a ⇒ (b 0 = b) |= Next b ⇒ (a 0 = a)

That M is equivalent to M a ∧ M b depends only on (7.26), not on any other


properties of Next a and Next b , and not on the definitions of Init a , Init b ,
La , or Lb . Moreover, it remains true if each variable a and b is replaced
by a tuple of variables, as long as those two tuples have no variable in
common and (7.26) is satisfied. In other words, if a program consists of
two components, each modifying different variables than the other, then
the program can be decomposed as the conjunction of two programs, each
describing one of the components.
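For readers who want to check this with a tool, here is a sketch of the
decomposition in TLA+. The module name and the constant Max , added only
to give TLC a finite set of initial states, are invented; the text's Init a allows
a to be any positive integer.

   ---- MODULE EuclidDecomposition ----
   EXTENDS Naturals
   CONSTANT Max    \* hypothetical bound, for model checking only
   VARIABLES a, b

   Inita == a \in 1..Max
   Initb == b \in 1..Max
   Nexta == (a > b) /\ (a' = a - b) /\ (b' = b)
   Nextb == (b > a) /\ (b' = b - a) /\ (a' = a)

   Ma == Inita /\ [][Nexta]_a /\ WF_a(Nexta)   \* the component modifying a
   Mb == Initb /\ [][Nextb]_b /\ WF_b(Nextb)   \* the component modifying b

   Spec == Ma /\ Mb   \* equivalent to the program M of the text
   ====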
The following theorem generalizes this example from two to n compo-
nents. It replaces M a and M b by processes M i for i ∈ 1 . . n, replaces a and
b by the lists m i of variables modified by each component i , and replaces
(7.26) by the requirement that for each i , the next-state action Next i of M i
implies hm j i0 = hm j i for j 6= i .4

Theorem 7.5 If m 1 , . . . , m n are each lists of variables, with all the vari-
ables in all the lists distinct, N = 1 . . n, and

   m = m 1 , . . . , m n
   M i = Init i ∧ 2[Next i ]hm i i ∧ Li
   M = ∀ i ∈ N : M i
   |= M ⇒ 2[ ∀ i , j ∈ N : Next i ∧ (i 6= j ) ⇒ (hm j i0 = hm j i) ]m

then

   |= M ≡ (∀ i ∈ N : Init i ) ∧ 2[∃ i ∈ N : Next i ]hm i ∧ (∀ i ∈ N : Li )

We can further generalize this theorem to allow hiding of variables local


to components. Suppose that for each component i , we might want to hide
4 We are abandoning our convention of naming a list of variables with a boldface letter
and adopting the notation of [2], where each m i is a list of variables that we could write
m i,1 , . . . , m i,ni but won’t.

a sublist y i of the variable list m i , where the variables y i do not appear in


any M j with j 6= i . We can then define M by

M = ∀ i ∈ N : (∃ y i : M i )
where if y i is the empty tuple, then ∃ y i : M i equals M i . We must modify
the hypothesis |= M ⇒ 2[. . .]m by replacing m j with the variables of m j
not in y j , and make this the conclusion:
|= M ≡ ∃ y 1 , . . . , y n :
         (∀ i ∈ N : Init i ) ∧ 2[∃ i ∈ N : N̂ext i ]hm i ∧ (∀ i ∈ N : Li )

where N̂ext i = Next i ∧ ∀ j ∈ N \ {i } : y j 0 = y j .
Another generalization is to require only that different components mod-
ify different parts of the state, not necessarily different variables. For ex-
ample, suppose the components are processes and process i modifies pc(i ),
where the value of the variable pc is always a function whose domain is the
set N . In the hypothesis |= M ⇒ 2[. . .]m , we can replace the expression
hm j i0 = hm j i by a step predicate ν j , where the predicates ν i must satisfy
only the additional hypothesis:
|= M ⇒ 2[(∀ i ∈ N : ν i ) ⇒ (m 0 = m)]hm i
For the example in which process i modifies pc(i ), we can define ν i so it
implies (∃ S : pc ∈ (N → S )) ∧ (pc 0 (i ) = pc(i )).
For simplicity, we consider only decomposing programs as described by
Theorem 7.5. The rest of what we say about program decomposition can
be generalized to these more general ways to decompose programs.

7.2.1.2 Decomposing Proofs


We now consider decomposing the verification that a program is correct into
the verification that its components are correct. Let’s start with a program
with two variables c and d and two components, each component modifying
one of the variables. Let’s suppose we have decomposed the program into
the conjunction M lc ∧ M ld , and we have decomposed its correctness property
into M c ∧ M d , where M c and M d are correctness conditions for the two
components. We have to verify:
(7.27) |= M lc ∧ M ld ⇒ M c ∧ M d
and we’d like to do this by verifying that M lc satisfies M c and that M ld
satisfies M d . We can obviously do that if we can verify
|= M lc ⇒ M c and |= M ld ⇒ M d

However, those conditions are unlikely to be true. The component program


M lc is unlikely to satisfy M c when used as a component of an arbitrary
program. Its correctness will depend upon some property of its environment
M ld . Similarly, correctness of M ld will depend upon some property of M lc .
We can obviously reduce verification of (7.27) to verifying

|= M lc ∧ M ld ⇒ M c and |= M lc ∧ M ld ⇒ M d

but that doesn’t reduce the amount of work involved. However, suppose that
correctness of M lc doesn’t depend on its environment being the component
M ld , but just requires its environment to satisfy the correctness condition
M d of that component, and similarly correctness of M ld just requires that
the other component satisfies M c . We would then like to reduce verification
of (7.27) to verifying:

(7.28) |= M d ∧ M lc ⇒ M c and |= M c ∧ M ld ⇒ M d

This would reduce the amount of work because M c and M d are probably
significantly simpler than M lc and M ld . Can we do that?
Let’s consider the following trivial example, where each component ini-
tializes its variable to 0 and keeps setting its variable’s value to the value of
the other component’s variable.

(7.29) M lc = (c = 0) ∧ 2[(c 0 = d ) ∧ (d 0 = d )]hc,d i
              ∧ WFc ((c 0 = d ) ∧ (d 0 = d ))

       M ld = (d = 0) ∧ 2[(d 0 = c) ∧ (c 0 = c)]hc,d i
              ∧ WFd ((d 0 = c) ∧ (c 0 = c))


We take as the correctness condition of each component that its variable
always equals 0:

M c ∆= 2(c = 0) and M d ∆= 2(d = 0)

Condition (7.28) is satisfied because each component keeps setting


its variable to 0 (that is, it can take nothing but stuttering steps) if the other
component’s variable always equals 0. As we would hope, the correctness
condition (7.27) is also satisfied because the program consisting of those two
components keeps both c and d always equal to 0.
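Here is a sketch of (7.29) in TLA+ (the module name is invented), which
lets TLC check the invariant (c = 0) ∧ (d = 0) of the composed program:

   ---- MODULE TwoZeros ----
   VARIABLES c, d

   CAct == (c' = d) /\ (d' = d)   \* component c sets c to d's value
   DAct == (d' = c) /\ (c' = c)   \* component d sets d to c's value

   Mlc == (c = 0) /\ [][CAct]_<<c, d>> /\ WF_c(CAct)
   Mld == (d = 0) /\ [][DAct]_<<c, d>> /\ WF_d(DAct)

   Spec == Mlc /\ Mld             \* the composed program
   Inv  == (c = 0) /\ (d = 0)     \* the correctness condition Mc /\ Md
   ====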
Now let’s replace M c and M d by the properties asserting that c and d
eventually equal 1:

M c ∆= 3(c = 1)      M d ∆= 3(d = 1)

while keeping M lc and M ld the same. Condition (7.28) is still satisfied be-
cause each component eventually sets its variable to 1 if the other component
sets its variable to 1. However, (7.27) is not satisfied. Changing the cor-
rectness conditions doesn’t change the behavior of the program, which is to
take nothing but stuttering steps.
We might ask why we can’t deduce (7.27) from (7.28) in this example.
However, the real question is why we can deduce it in the first example. De-
ducing (7.27) from (7.28) is deducing, from the assumption that correctness
of each component implies correctness of the other, that both components
are correct. This is circular reasoning, and letting M c = M d = false shows
that it allows us to deduce that any program implies false, from which we
can deduce that the program satisfies any property.
So, why does (7.28) imply (7.27) in the first case? Why can we deduce
that both components leave their variables equal to 0 from the assumption
that each component leaves its variable equal to 0 if the other process leaves
its variable equal to 0? The reason is that neither process can set its variable
to a value other than 0 until the other one does. Stated more generally,
we can deduce that both components in a two-component program satisfy
their correctness properties if neither component can violate its correctness
property until after the other does. So we want to replace (7.28) by:

(7.30) |= ∀ k ∈ N :
(M d true through state k − 1)
∧ (M lc true through state k ) ⇒ (M c true through state k )
plus the same condition with c and d interchanged, where F true through
state −1 is taken to be true for any property F .
To express (7.30) precisely, we have to say what it means for a property
F to be true through state k . If F is a safety property, it means that F is
true of the finite behavior σ(0) → . . . → σ(k ), which means it’s true of the
(infinite) behavior obtained by repeating the state σ(k ) forever. It follows
from Theorems 4.3 and 4.4 that any property F equals C(F ) ∧ L where L is
a liveness property such that hC(F ), Li is machine closed. By the definition
of machine closure (in Section 4.2.2.2), any finite behavior that satisfies
C(F ) can be completed to a behavior satisfying C(F ) ∧ L, which equals F .
Therefore, the only way a behavior can fail to satisfy F through state k is
for it not to satisfy C(F ) through state k , so F is true through state k means
that C(F ) is true through state k . We should therefore replace M d , M lc ,
and M c by C(M d ), C(M lc ), and C(M c ) in (7.30). For a safety property, true
through state k means true if all states i with i > k equal state k , so we

can rewrite the resulting condition as:

(7.31) |= ∀ k ∈ N :
(every state after state k equals state k ) ⇒
( (C(M d ) true through state k − 1) ∧ C(M lc ) ⇒ C(M c ) )
Next, let v be the tuple of all variables in these formulas. We can then
replace the assertion “every state . . . state k ” in (7.31) with “v 0 = v from
state k on”. By predicate logic, if k does not appear in R or S , then

(∀ k : P ⇒ (Q ∧ R ⇒ S )) ≡ ((∃ k : P ∧ Q) ∧ R ⇒ S )

We can therefore rewrite (7.31) as follows, abbreviating “true through state”


as “tts”:

(7.32) (∃ k ∈ N : (v 0 = v from state k on) ∧ (C(M d ) tts k − 1))


∧ C(M lc ) ⇒ C(M c )
If we define F+v to equal

∃ k ∈ N : (v 0 = v from state k on) ∧ (C(F ) tts k − 1)

we can then write (7.32) and the condition obtained from it by interchanging
c and d as:

(7.33) |= C(M d )+v ∧ C(M lc ) ⇒ C(M c )


|= C(M c )+v ∧ C(M ld ) ⇒ C(M d )
From (7.33), we can infer:

(7.34) |= C(M lc ) ∧ C(M ld ) ⇒ C(M c ) ∧ C(M d )

The theorems to be stated require a slightly weaker definition of F+v , which
makes the conditions (7.33) stronger (so they still imply (7.34)). Let F+v^old
be the formula that we have been calling F+v . We now define F+v to equal
F+v^old ∨ F . With this definition, (7.33) implies its two conditions also hold
with the “+v ” removed. If F is a safety property, then F+v is a safety
property but F+v^old usually isn’t. In fact, if F is a safety property then F+v
equals C(F+v^old ). In practice, the change should seldom make a difference in
(7.34) because we don’t expect liveness properties to be useful for proving
safety properties, so we wouldn’t expect F ∧ G ⇒ H to be true for safety
properties G and H without C(F ) ∧ G ⇒ H also being true.
The formula F+v has been defined semantically. However, to verify (7.33)
directly, we have to write C(F )+v as a formula for a given formula F . It’s

easy to write C(F ) if F has the usual form Init ∧ 2[Next]w ∧ L , where w is
the tuple of variables in the formulas and L is the conjunction of fairness
properties of subactions of Next. In that case, the definition of machine
closure (Section 4.2.2.2) and Theorem 4.6 (Section 4.2.7) imply C(F ) equals
Init ∧ 2[Next]w . We can then write F+v as follows:

F+v = ∃ h : Înit ∧ 2[N̂ext]w ◦v ◦hh i

where Înit = (Init ∧ (h = 0)) ∨ (h = 1)

      N̂ext = ∨ (h = 0) ∧ ∨ (h 0 = 0) ∧ [Next]w
                         ∨ h0 = 1
             ∨ (h = 1) ∧ (h 0 = h) ∧ (v 0 = v )
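As a concrete instance (invented for illustration), take F to be
(x = 0) ∧ 2[x 0 = x + 1]x and v to be hx , y i, so w is hx i. Then F+v can be
written in TLA+ , with \EE denoting temporal quantification, as:

   F == (x = 0) /\ [][x' = x + 1]_x

   FPlusV ==
     \EE h :   \* h = 1 marks the point from which the behavior stutters
       /\ ((x = 0) /\ (h = 0)) \/ (h = 1)
       /\ [][ \/ /\ h = 0
                 /\ \/ (h' = 0) /\ [x' = x + 1]_x
                    \/ h' = 1
              \/ (h = 1) /\ (h' = h) /\ (<<x, y>>' = <<x, y>>)
            ]_<<x, x, y, h>>   \* the tuple w ◦ v ◦ <<h>>, spelled out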
While writing F+v is easy enough, we usually don’t have to for the same
reason that we didn’t have to use the +v subscripts in (7.27). Our example
has one feature that we didn’t use in our generalization—namely, that no
single program step can make both M c and M d false. Here’s how to use that
feature in general. For safety properties F and G, define F ⊥ G to be true of
a behavior σ iff for every k ∈ N, if F ∧ G is true of σ(0) → . . . → σ(k ) then
F ∨ G is true of σ(0) → . . . → σ(k + 1). Understanding why the following
theorem is true is a good test that you understand the definition of F+v .

Theorem 7.6 If F , G, and H are safety properties, v is a tuple of variables


containing all variables of F , and |= H ⇒ (F ⊥ G), then |= F ∧ H ⇒ G
implies |= F+v ∧ H ⇒ G.

For our example, in which M c and M d are safety properties, |= C(M lc ) ⇒


(M c ⊥ M d ) and |= C(M ld ) ⇒ (M c ⊥ M d ) are true and allow us to re-
move +v from the conditions (7.33). These properties are true because the
example satisfies these two conditions:

• We can express correctness of the program as M c ∧ M d , where only


a step of component c can violate M c and only a step of component
d can violate M d . This seems to be a requirement for decomposing
verification of the program into verification of its components to make
sense. For example, mutual exclusion can’t be expressed as the con-
junction of invariance properties that can each be violated by only one
process. I therefore expect that attempting to decompose verification
of a mutual exclusion algorithm in this way would complicate the task.

• No step is both a c step and a d step. This condition means that a step
in a behavior of the program consists of a step of a single component.

It is the case if the components are processes in a program written in


the kind of pseudocode we have been using, which is modeled after
the most popular coding languages. I believe that all the engineers
I have worked with find this to be the most natural way to describe
concurrent programs.

We have obtained the assumptions we need to deduce (7.34). But we


want to verify (7.27), which is (7.34) with the C operators removed. Since F
implies C(F ) for any property F , we can remove the Cs from the left-hand
side of the implication (7.34). But we need additional assumptions to be
able to infer that M lc ∧ M ld implies the liveness parts of M c and M d . Since
we know that M lc ∧ M ld implies C(M c ) ∧ C(M d ), we can use the following
assumptions to deduce (7.27).

(7.35) |= C(M d ) ∧ M lc ⇒ M c and |= C(M c ) ∧ M ld ⇒ M d

Condition (7.33) allows us to assume that other components satisfy their


safety conditions when showing that a component satisfies its safety con-
dition. However, (7.35) allows us to use the liveness property of only that
component when showing that the component satisfies its liveness require-
ment. It would be circular reasoning to assume M d when verifying M c and
assume M c when verifying M d . However, we can do it for one of the com-
ponents. For example, if we show that C(M c )∧M ld implies M d , we can then
assume M d when showing that M lc implies M c . We can therefore replace
C(M d ) by M d in (7.35), since C(M d ) ∧ M d equals M d . (But we then can’t
also replace C(M c ) by M c .)
For any decomposition of a program into two components M lc and M ld
with correctness properties M c and M d , we have deduced (7.27) from (7.33)
and (7.35). The following theorem, which is Theorem 1 of [2], generalizes
this to a program with n components M li , each with correctness property
M i . The theorem is first stated, then explained.

Theorem 7.7 (Decomposition Theorem) If for all i ∈ 1 . . n:

1. |= (∀ j ∈ 1 . . n : C(M j )) ⇒ E i

2. (a) |= C(E i )+v ∧ C(M li ) ⇒ C(M i )


(b) |= E i ∧ M li ∧ (∀ j ∈ 1 . . (i − 1) : M j ) ⇒ M i

then |= (∀ i ∈ 1 . . n : M li ) ⇒ (∀ i ∈ 1 . . n : M i )

The theorem’s conclusion is the obvious generalization of (7.27). There are


two hypotheses for each component i . Let’s first ignore hypothesis 1 and
take E i to be the conjunction of all the C(M j ) for j 6= i . Hypothesis 2(a) for
component M li then generalizes the first condition of (7.33) for component
M lc , replacing the correctness condition C(M d ) of the other component with
the conjunction C(M j ) of all the other components. Hypothesis 2(b) makes
the similar generalization of (7.35), where allowing the use of all M j with
j < i in the proof of M i generalizes the observation we made about being
able to weaken one of the conditions of (7.35). Hypothesis 1 generalizes
what we did for two components in two ways:

• It allows E i to be the conjunction of all formulas C(M j ), including


j = i . This can obviously be done for hypothesis 2(b). It might seem
to turn hypothesis 2(a) into a tautology by conjoining the right-hand
side C(M i ) of the implication to the left-hand side, resulting in circular
reasoning. However, the subscript +v turns it from circular reasoning
into induction on the number of steps in a finite behavior.

• Instead of using the conjunction of the closures of all the components’


correctness properties in hypothesis 2, it allows E i to be any property
implied by that conjunction that is strong enough to satisfy hypothe-
sis 2. I expect E i will always be a safety property, but it’s conceivable
that it might not be.

The theorem does not make any assumption about v . That’s because if w
is the tuple of all variables appearing in the formulas (including in v ), then
F+w implies F+v . Thus, if hypothesis 2(a) is satisfied for any state function
v , then it’s satisfied with v equal to the tuple of all variables in the formulas.
Letting v equal that tuple produces the weakest (hence easiest to satisfy)
hypothesis.

7.2.2 Composing Components


In Section 7.2.1, we decomposed a given program as the conjunction of
components. We now assume we are given a collection of components and
define the program to be the conjunction of those components. As a tiny
example, assume we want to write a program that satisfies the property
2(c = 0) ∧ 2(d = 0), and we want to do it by conjoining a c component
that implements 2(c = 0) and a d component that implements 2(d = 0).
We find that someone has written a program M lx that satisfies 2(x = 0)

when run in an environment that satisfies 2(y = 0). So, we decide to write
our program as M lc ∧ M ld where:

M lc = (M lx with x ← c, y ← d )

M ld = (M lx with x ← d , y ← c)
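In TLA+, substitutions like these are performed by instantiation. Here is a
sketch, assuming M lx is defined in a module named GenComponent whose
variables are x and y (the module name and its contents are invented for
the illustration):

   ---- MODULE TwoInstances ----
   VARIABLES c, d

   CInst == INSTANCE GenComponent WITH x <- c, y <- d
   DInst == INSTANCE GenComponent WITH x <- d, y <- c

   Mlc == CInst!Mlx   \* M lx with x <- c, y <- d
   Mld == DInst!Mlx   \* M lx with x <- d, y <- c

   Spec == Mlc /\ Mld
   ====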

This silly example captures the most important aspect of specifying com-
ponents: No real device will satisfy a specification such as 2(c = 0) when
executed in an arbitrary environment. For example, a process will not be
able to compute the GCD of two numbers if other processes can at any time
arbitrarily change the values of its variables.
We want to deduce that M lc ∧ M ld implies 2(c = 0) ∧ 2(d = 0)
from the properties that components c and d satisfy, without knowing
what M lc and M ld are. The property that the c component satisfies is
that if its environment satisfies 2(d = 0) then the component satisfies
2(c = 0); and d satisfies the same condition with d and c interchanged.
The obvious way to express these two properties is 2(d = 0) ⇒ 2(c = 0)
and 2(c = 0) ⇒ 2(d = 0), but those two properties obviously don’t imply
2(c = 0) ∧ 2(d = 0). We need to find the right way to express mathemati-
cally the condition that a component satisfies the property M if its environ-
ment satisfies the property E . We do this by assuming that the condition is
expressed by a formula E -+-> M and figuring out what the definition of -+->
should be, given the assumption that the definition should make this true:

(7.36) |= 2(d = 0) -+-> 2(c = 0) and |= 2(c = 0) -+-> 2(d = 0)
       implies |= 2(c = 0) ∧ 2(d = 0)

We first ask when we can deduce this:


(7.37) |= E 1 -+-> M 1 and |= E 2 -+-> M 2 implies |= M 1 ∧ M 2
The answer lies in Theorem 7.7. Since (7.37) doesn’t mention the programs
M l1 and M l2 , it should be true if we let those programs equal true. Substi-
tuting true for them, the conclusion of Theorem 7.7 for n = 2 is M 1 ∧ M 2 .
We define -+-> so that hypotheses 2(a) and 2(b) are equivalent to E i -+-> M i .
The definition is:
E -+-> M ∆= (C(E )+v ⇒ C(M )) ∧ (E ⇒ M )
where v is the tuple of all variables in E and M . The theorem then implies
that (7.37) is true if hypothesis 1 is satisfied, that hypothesis asserting:
(7.38) |= C(M 1 ) ∧ C(M 2 ) ⇒ E 1 and |= C(M 1 ) ∧ C(M 2 ) ⇒ E 2

These conditions are true for our example, so (7.36) is true.


The conclusion of Theorem 7.7 is |= M 1 ∧ M 2 , which asserts that the
composition of the components satisfies M 1 and M 2 assuming nothing about
its environment. We need a more general theorem whose conclusion is
|= E -+-> M 1 ∧ M 2 , asserting that the composition satisfies M 1 ∧ M 2 if its en-
vironment satisfies E . There is actually a stronger result, asserting that the
composition satisfies |= E -+-> M for any property M implied by E ∧ M 1 ∧ M 2 .
Here is the theorem, generalized from two to n components. It is Theorem 3
of [2].

Theorem 7.8 (Composition Theorem) If


1. |= ∀ i ∈ 1 . . n : C(E ) ∧ (∀ j ∈ 1 . . n : C(M j )) ⇒ E i

2. (a) |= C(E )+v ∧ (∀ j ∈ 1 . . n : C(M j )) ⇒ C(M )


(b) |= E ∧ (∀ j ∈ 1 . . n : M j ) ⇒ M
then |= (∀ j ∈ 1 . . n : E j -+-> M j ) ⇒ (E -+-> M )

It’s instructive to compare Theorems 7.7 and 7.8. They both make no
assumption about v , since letting it equal the tuple of all variables in the
formulas yields the weakest hypothesis 2(a). Hypothesis 1 differs only in
Theorem 7.8 having the additional conjunct C(E ). This conjunct (which
weakens the hypothesis) is expected because, if M is the conjunction of the
M i , then the M in the conclusion of Theorem 7.7 is replaced in Theorem 7.8
by E -+-> M .
As we observed for Theorem 7.7, hypothesis 1 of Theorem 7.8 pretty
much requires the E i to be safety properties. However, when applying The-
orem 7.8, we can choose to make them safety properties by moving the
liveness property of E i into the liveness property of M i . More precisely,
suppose we write E i as E Si ∧ E Li , where E Si is a safety property and E Li
a liveness property such that hE Si , E Li i is machine closed; and we simi-
larly write M i as M Si ∧ M Li . We can then replace E i by E Si and M i by
M Si ∧ (E Li ⇒ M Li ).5 This replaces the property E i -+-> M i by the stronger
property:

(7.39) E Si -+-> (M Si ∧ (E Li ⇒ M Li ))
It is stronger because if the environment doesn’t satisfy its liveness property
E Li , then E i -+-> M i is satisfied no matter what the component does; but in
5 By definition of machine closure, hM Si , M Li i machine closed implies hM Si , E Li ⇒ M Li i
is also machine closed, because M Li implies E Li ⇒ M Li .

that case, (7.39) still requires the component to satisfy its safety property
M Si if the environment satisfies its safety property E Si . The two formulas
should be equivalent in practice because machine closure of hE Si , E Li i implies
that, as long as the environment satisfies its safety property, the component
can’t know that the environment’s entire infinite behavior will violate its
liveness property.
Theorem 7.8 has been explained in terms of M i being the property sat-
isfied by a component whose description M li we don’t know, with M a
property we want the composition of the components to satisfy. It can also
be applied by letting M i be the actual component M li and letting M be
the composition ∀ i ∈ 1 . . n : M li of those components. The theorem then
tells us under what environment assumption E the composition will behave
properly if each M li behaves properly under the environment assumption
E i . However, there is a problem when using it in this way. To explain
the problem, we return to our two components c and d whose composition
satisfies 2(c = 0) ∧ 2(d = 0).
The definitions M lc and M ld in (7.29) were written for components c and
d intended to be composed with one another. They were not written to de-
scribe a component that satisfies its desired property only if the environment
satisfies its property. We now want to define them and their environment
assumptions E c and E d so that:
|= (E c -+-> M lc ) ⇒ 2(c = 0)
|= (E d -+-> M ld ) ⇒ 2(d = 0)

The definition of M lc asserts that the value of d cannot change when the
value of c changes (because of the conjunct d 0 = d in the next-state relation)
and d cannot change when c doesn’t change (because of the subscript hc, d i).
That’s a property of its environment. If we want d to satisfy that property,
we should state it in E c , not inside the definition of M lc . So, the definition
of M lc should be

M lc = (c = 0) ∧ 2[c 0 = d ]c ∧ WFc (c 0 = d )

and the definition of M ld should be similarly changed.


If you recall how we decomposed programs in Section 7.2.1.1, expressed
in Theorem 7.5, you will realize that the conjuncts d 0 = d and c 0 = c in the
original definitions of M lc and M ld were there so that the next-state action of
the conjunction of M lc and M ld would be the disjunction of their next-state

actions. With these new definitions, the next-state action of M lc ∧M ld equals

[ ∨ (c 0 = d ) ∧ (d 0 = d )
∨ (d 0 = c) ∧ (c 0 = c)
∨ (c 0 = d ) ∧ (d 0 = c) ]hc,d i

The additional disjunct (c 0 = d )∧(d 0 = c) describes a step that is performed


jointly by the two components. If we want the composition of the two com-
ponents to allow such steps, then there is no problem. However, components
are often processes, and in all the examples we’ve considered, each step is
a step of exactly one process. Program descriptions in which each step is
performed by a single component are called interleaving descriptions.6
Suppose we want to consider only interleaving program descriptions. We
would take the approach used in Theorem 7.5 that for each i , there is a list
m i of variables that can be modified only by component M li . It would be
nice to let E i assert that any step that changes a variable other than one of
the variables of m i must leave all the variables of m i unchanged. However,
this is impossible because there are infinitely many variables other than the
ones in m i , and a formula E i can mention only finitely many of them.
Instead, we modify Theorem 7.8 so its conclusion is:
(7.40) |= G ∧ (∀ j ∈ 1 . . n : E j -+-> M j ) ⇒ (E -+-> G ∧ M )

for a property G. We can then apply the theorem with M i equal to M li , M


equal to ∀ i ∈ 1 . . n : M li , and G the property asserting that if i 6= j , then a
step can’t change both hm i i and hm j i. Formula G ∧ M is the interleaving
description that is presumably what we intended the composition of the
components M li to mean.
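For example, in the notation of Theorem 7.5, one plausible way (a sketch,
not a formula from [2]) to write G is:

   G ∆= 2[ ∀ i , j ∈ 1 . . n : (i 6= j ) ⇒ ((hm i i0 = hm i i) ∨ (hm j i0 = hm j i)) ]m

so a step that changes m must leave the variables of all but one component
unchanged.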
In the same way as we could generalize Theorem 7.5, we can replace
hm j i0 = hm j i in the definition of G by a step predicate ν j and have G
assert that every step must satisfy ν i ∨ ν j for i 6= j .
You can write the theorem whose conclusion is (7.40) by yourself. All you
have to do is apply Theorem 7.8 substituting n + 1 for n, letting E n+1 equal
true, letting M n+1 equal G, and replacing M with G ∧ M . The hypotheses
of Theorem 7.8 after making these substitutions are the hypotheses of the
theorem.

6 An interleaving description is often taken to mean any description of a program’s
executions as sequences of states and/or events, so by that meaning all TLA program
descriptions are interleaving descriptions.
Appendix A

Digressions

A.1 Why Not All Mappings Are Sets


Section 2.3.1 states that a function is a special kind of mapping that is
assumed to be a value—meaning that it’s a set, although we don’t know
what its elements are. This raises the question of why we can’t simply
assume that all mappings are sets. The answer is provided by the following
theorem, whose proof is due to Stephan Merz. It asserts that there has to be
a mapping that is not a set. Although we could assume that some mappings
other than functions are sets, the theorem means that we can’t assume all
mappings are sets. For simplicity, we let functions be the only mappings
that we assume to be sets.
Theorem There exists a mapping that is not a set.
1. Suffices: Assume: Every mapping is a set.
Prove: false
Proof: Obvious.
Define M to be the mapping such that M (S ) = S if S is a set that is a
mapping.
2. M (S )(U ) = S (U ) for every mapping S and every set U .
Proof: Since we are assuming that every mapping S is a set, M is a
mapping on the collection of all mappings, and by definition M (S ) = S
for every mapping S , so M (S )(U ) = S (U ) for every set U .
Define Russell to be the mapping on the collection of all sets that are

mappings such that Russell (S ) = choose U : U 6= M (S )(S ).


3. Russell is a mapping such that Russell (S ) 6= M (S )(S ) for all sets S that
are mappings.
Proof: The value of any syntactically correct formula is a set, even if its
elements are unspecified. Therefore, M (S )(S ) is a set, and for any set T
there exists a set U such that U 6= T . Thus, Russell is a mapping such
that Russell (S ) 6= M (S )(S ) for every mapping S .
4. Russell (S ) 6= S (S ) for all sets S that are mappings.
Proof: Substituting S for U in step 2 shows S (S ) equals M (S )(S ), which
by step 3 is unequal to Russell (S ).
5. Q.E.D.
Proof: Since Russell is a mapping, and all mappings are assumed to
be sets, substituting Russell for S in step 4 proves Russell (Russell ) 6=
Russell (Russell ), which equals false.

A.2 Recursive Definitions of Mappings


The general form of a recursive definition of a mapping M is

(A.1) M (x ) = Def (x , M )

For example, the definition (2.29) of the mapping # has this form, where
Def is defined by:

Def (x , M ) = if x = {} then 0
else 1 + M (x \ {choose e : e ∈ x })
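In TLA+, a definition of this form is written with a RECURSIVE declaration.
Here is a sketch for the mapping #, under the invented name Card since #
is not a legal TLA+ identifier (a tool evaluates it only on finite sets):

   EXTENDS Naturals

   RECURSIVE Card(_)
   Card(S) == IF S = {} THEN 0
              ELSE 1 + Card(S \ {CHOOSE e \in S : TRUE})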
To make sure that what we do is mathematically sound, we should define
what an arbitrary definition of the form (A.1) actually defines the mapping
M to be.
If we want the mapping M to be a function with domain D, the example
of the factorial function fact in Section 2.4.2 shows that we can define M to
equal

choose M : M = (x ∈ D 7→ Def (x , M ))

This suggests that (A.1) should define M (x ) to equal the value of f (x ) for
some function f , whose domain contains x , that satisfies f (y) = Def (y, f )
for all y in its domain. To do that, let’s first define

fdef (D, f ) = x ∈ D 7→ Def (x , f )

We could then define M (x ) to equal:

(A.2) (choose f : f = fdef (domain(f ), f )) (x )

However, this doesn’t work. For example, consider:



MFact(x ) = if x = 0 then 1 else x ∗ MFact(x − 1)

We would expect MFact(x ) to equal fact(x ) for all x ∈ N. However, sup-


pose x = 13 and f is a function whose domain is {13}. Then f (13) equals
13 ∗ f (12). But the value of f (12) is unspecified, so it might equal 1. If
f (12) did equal 1 and the choose operator chose that particular function f
in (A.2), then this would define MFact(13) to equal 13 rather than fact(13).
The way to ensure that the function f is chosen so that f (13) equals
fact(13) is to require that the domain of f includes all numbers in 0 . . 13.
In general, we must choose the domain of f to be large enough so that
f (y) = Def (y, f ) for all y in its domain uniquely determines f . If that’s the
case, we say that fix (f ) is true. The precise definition of fix (f ) is:

∀ g : (∀ y ∈ domain(f ) : g(y) = f (y)) ⇒ (f = fdef (domain(f ), g))

(Note that the definition of fix depends on fdef , whose definition depends
on Def .) We can then define (A.1) to mean:

(A.3) M (x ) = (choose f : (x ∈ domain(f )) ∧ fix (f )) (x )

This defines M (x ) for all values x . However, it doesn’t necessarily define


it to equal Def (x , M ). We write a definition of the form (A.1) because we
want M (x ) to equal Def (x , M ) for some values of x , so we have to be able
to show that it does. We can do that with the following theorem, which is
proved in [16].
Theorem

Assume: 1. M is defined by M (x ) = Def (x , M ).
        2. ≻ is a well-founded relation on a set S .
        3. For all x ∈ S , the value of Def (x , M ) depends only on the
           values of M (w ) for w ∈ S .
        4. For all x ∈ S and functions f and g, if f (w ) = g(w ) for all
           w ∈ {v ∈ S : x ≻ v }, then Def (x , f ) = Def (x , g).
Prove:  ∀ x ∈ S : M (x ) = Def (x , M )

A.3 How Not to Write x ′′′


Here is an amusing paradox. It’s illegal to prime a primed expression, so it’s
illegal to write x ′′ or x ′′′ if x is a variable. However, consider this definition:

(A.4) F (n) = if n = 0 then x else F (n − 1)′

It apparently defines F (3) to equal x ′′′ . It doesn’t. To see why not, let’s
simplify things by defining F to be a function with domain N:

F = choose f : f = (n ∈ N 7→ if n = 0 then x else f (n − 1)′ )

In this definition f and n are bound constants, so f (n − 1) is a constant


expression; and exp ′ = exp for any constant expression exp. Therefore, this
definition of F is equivalent to

F = choose f : f = (n ∈ N 7→ if n = 0 then x else f (n − 1))

which defines F (n) to equal x for all n ∈ N.


If we defined F as a mapping by (A.4), we would take (A.3) as the
meaning of that definition and expand the definition of fix . This would
show that F (3) equals an expression (choose f : . . . ) (3) where the primed
expression in “. . . ” is a constant. Again, we would obtain F (n) = x for all
n ∈ N.
This example illustrates why we should not write a recursive defini-
tion that contains an expression that isn’t a constant or a state expression.
There’s no problem applying a recursively defined mapping to a step expres-
sion. It should also be all right to apply one to a temporal logic formula,
though I can’t imagine why we would want to do that. Recursively defined
mappings are used to define the meaning of temporal logic operators, but
those are mappings of ordinary math whose definitions contain no primes
or temporal operators.

A.4 Hoare Logic


Hoare logic is a science of traditional programs developed by C. A. R. (Tony)
Hoare [21]. Programs are described in a coding language, and the logic’s
goal is to prove properties of concrete programs. However, Hoare intended it
also to be applied to abstract programs written in code and used in verifying
a concrete program.
In Hoare logic, a program is viewed as a relation between the initial
and the final states of its execution. A formula of the logic has the form

{P }S {Q}, where S is a program (written in code) and P and Q are state


predicates. This formula asserts that if program S is executed starting in a
state in which P is true and the execution terminates, then Q is true in the
final state of the execution. The formula {P }S {Q} is called a Hoare triple,
P is called its precondition, and Q is called its postcondition. Hoare logic
provides a way of showing that a program S satisfies a Hoare triple.
The following is the Hoare logic rule for a program consisting of the
single assignment statement x : = exp, where x is a variable and exp is an
expression:

(A.5) |= P ⇒ (Q with x ← exp) implies {P } x := exp {Q}

There are also rules for deriving a Hoare triple for a program from Hoare
triples of its components. Here are three such rules:

(A.6) {P }S {R} and {R}T {Q} imply {P } S ;T {Q}

(A.7) {P ∧ R}S {Q} and {P ∧ ¬R}T {Q} imply


{P } if R then S else T end if {Q}

(A.8) |= P ⇒ I and {I ∧ R}S {I } and |= I ∧ ¬R ⇒ Q imply


{P } while R do S end while {Q}

Such rules decompose the proof of a Hoare triple for any program to proofs of
Hoare triples for elementary statements of the language, such as assignment
statements.
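As a tiny worked example (not from the text), rule (A.5) proves
{x = 7} x := x + 1 {x = 8}: substituting x + 1 for x in the postcondition
gives x + 1 = 8, and |= (x = 7) ⇒ (x + 1 = 8) is valid. Applying (A.6) with
R equal to x = 8 then proves {x = 7} x := x + 1 ; x := x + 1 {x = 9}.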
It was quickly realized that pre- and postconditions are not adequate to
describe what a program should do. For example, suppose S is a program
to sort an array x of numbers. The obvious Hoare triple for it to satisfy has
a precondition asserting that x is an array of numbers and a postcondition
asserting that x is sorted. But this Hoare triple is true of a program that
simply sets all the elements of the array x to 0. A postcondition needs to
be able to state a relation between the final values of the variables and their
initial values. Various ways were proposed for doing this, one of them being
to allow formulas P and Q to contain constants whose values are the same
in the initial and final states. For example, the precondition for a sorting
program could assert that the constant x 0 equals x , and the postcondition
could assert that the elements of the array x are a sorted permutation of
the elements of x 0.
Viewing a program as a relation between initial and final states means
that it can be described mathematically as a formula of the Logic of Actions.

If we represent the program S as an LA formula, then {P }S {Q} is the


assertion |= P ∧ S ⇒ Q 0 ; and the Hoare logic rules follow from rules of LA.
For example, when programs are described in LA, S ; T is S · T , where “·” is
the action composition operator defined in Section 3.4.1.4. The Hoare Logic
rule (A.6) is equivalent to this LA rule:

|= P ∧ S ⇒ R 0 and |= R ∧ T ⇒ Q 0 imply |= P ∧ (S · T ) ⇒ Q 0

The program if R then S else T end if is represented by the LA formula


(R ∧S )∨(¬R ∧T ), and rule (A.7) becomes the propositional-logic tautology:

|= (P ∧ R ∧ S ⇒ Q 0 ) ∧ (P ∧ ¬R ∧ T ⇒ Q 0 ) ⇒
   (P ∧ ((R ∧ S ) ∨ (¬R ∧ T )) ⇒ Q 0 )

Hoare’s rule (A.5) for assignment statements is obtained from LA by rep-


resenting the statement x := exp as (x 0 = exp) ∧ ((v x̃ )0 = v x̃ ) , where
v x̃ is the tuple of all program variables other than x . It is valid because
|= P ∧ (x 0 = exp) ∧ ((v x̃ )0 = v x̃ ) ⇒ Q 0 equals |= P ⇒ (Q with x ← exp)
if v x̃ is a tuple containing all variables other than x that appear in Q or exp.
Rule (A.8) is a bit tricky because, when executed in a state in which R
equals false, the while statement leaves all variables unchanged. We can
represent that while statement by

((R ∧ S )+ ∧ ¬R 0 ) ∨ (¬R ∧ (v 0 = v ))

where v is the tuple of all program variables and (. . .)+ is defined in Sec-
tion 3.4.1.4. With this representation of the while statement, (A.8) can be
derived from the following rule of LA, where I is any state predicate and A
any action:

(A.9) |= I ∧ A ⇒ I 0 implies |= I ∧ A+ ⇒ I 0

The LA definition of a Hoare triple implies that the validity of rule (A.8) is
proved by the following theorem:

Theorem A.1 Assume: 1. |= P ⇒ I


2. |= I ∧ R ∧ S ⇒ I 0
3. |= I ∧ ¬R ⇒ Q
4. v is the tuple of all variables occurring in Q.
Prove: |= P ∧ (((R ∧ S )+ ∧ ¬R 0 ) ∨ (¬R ∧ (v 0 = v ))) ⇒ Q 0

1. Suffices: Assume: P ∧ (((R ∧ S )+ ∧ ¬R 0 ) ∨ (¬R ∧ (v 0 = v )))


Prove: Q 0
Proof: Because assumptions 1–3 have the form |= . . . , proving that they
imply a formula F ⇒ G proves that they imply |= (F ⇒ G), and F ⇒ G
is proved by assuming F and proving G.
2. Case: P ∧ (R ∧ S )+ ∧ ¬R 0
2.1. I ∧ (R ∧ S )+ ⇒ I 0
Proof: By assumption 2 and (A.9), with R ∧ S substituted for A.
2.2. P ∧ (R ∧ S )+ ⇒ I 0
Proof: By 2.1 and assumption 1.
2.3. I 0
Proof: By 2.2 and the step 2 case assumption.
2.4. ¬R 0
Proof: By the step 2 case assumption.
2.5. Q.E.D.
Proof: By 2.3, 2.4, and assumption 3 (since |= F implies |= F 0 for any
state predicate F ).
3. Case: P ∧ ¬R ∧ (v 0 = v )
Proof: By assumptions 1 and 3, P ∧ ¬R implies Q, which by v 0 = v and
assumption 4 implies Q 0 .
4. Q.E.D.
Proof: By the step 1 assumption, steps 2 and 3 cover all possibilities.

A.5 Another Way to Look at Safety and Liveness


This section provides a different view of safety and liveness based on viewing
behavior predicates as sets of behaviors. This view was first recognized by
Gordon Plotkin. I find that it helps me understand safety and liveness. To
understand it, we first need some more math.

A.5.1 Metric Spaces


A metric space M is a set with a distance function δ that assigns a non-
negative real number δ(p, q), called the distance between p and q, to all
elements p and q of M . The function δ must satisfy these conditions for all

elements p, q, and r of M :

M1. δ(p, q) = 0 iff p = q.

M2. δ(p, q) = δ(q, p)

M3. δ(p, q) ≤ δ(p, r ) + δ(r , q)

Do you see why these conditions imply δ(p, q) ≥ 0 for all p and q in M ?
The set R of real numbers is a metric space with δ(p, q) equal to |p − q|,
where |r | is the absolute value of the number r , defined by

|r | = if r ≥ 0 then r else − r

An infinite plane, represented as in analytic geometry by the set R × R of


pairs of real numbers, is a metric space with δ(hx 1 , y 1 i, hx 2 , y 2 i) defined to
equal √((x 1 − x 2 )2 + (y 1 − y 2 )2 ). I find that thinking of a metric space M as
the set of points in a plane is a good way to visualize the concepts presented
here.
For a metric space M , the distance δ̂(p, S ) from p ∈ M to a nonempty
subset S of M is defined to be the largest number r such that r ≤ δ(p, q)
for all q ∈ S .1 For example, if S is the set of all points hx , y i in the plane
such that x < 3, then δ̂(h4, 7i, S ) equals 1 because δ(h4, 7i, q) > 1 for all q
in S and there are elements q of S such that δ(h4, 7i, q) is arbitrarily close
to 1.
For any metric space M and subset S of M , if p ∈ S then δ̂(p, S ) = 0
because condition M1 implies δ(p, p) = 0. In general, δ̂(p, S ) = 0 for p ∈ M
iff for every e > 0 there exists q ∈ S such that δ(p, q) < e.
The closure operation C on subsets of a metric space M is defined by
letting C(S ) be the set {p ∈ M : δ̂(p, S ) = 0} of all elements of M that are a
distance 0 from S . For example, if M is the plane, let OD and CD be the
open and closed disks of radius 1 centered at the origin, defined by:

OD = {p ∈ M : δ(p, h0, 0i) < 1}

CD = {p ∈ M : δ(p, h0, 0i) ≤ 1}

Both C(OD) and C(CD) equal CD.

Theorem A.2 For any subset S of a metric space, S ⊆ C(S ) and C(S ) =
C(C(S )).
1 To handle the uninteresting case of the empty set, we can define δ̂(p, {}) to equal ∞,
which is a value satisfying r < ∞ for all r ∈ R.

Proof: The definition of C and property M1 imply S ⊆ C(S ) for any set S ,
which implies C(S ) ⊆ C(C(S )) for any S . Therefore, to show C(S ) = C(C(S )),
it suffices to assume p ∈ C(C(S )) and show p ∈ C(S ). By definition of C
and δ̂, we do this by assuming e > 0 and showing there exists q ∈ S with
δ(q, p) < e. Because p ∈ C(C(S )), there exists u ∈ C(S ) with δ(p, u) < e/2;
and u ∈ C(S ) implies there exists q ∈ S with δ(q, u) < e/2. By M2 and M3,
this implies δ(p, q) < e. End Proof

As you will have guessed by its name, the operator C on behavior predicates
is a special case of the closure operator C on metric spaces. But for now,
forget about behavior predicates and just think about metric spaces.
A set S that, like CD, equals its closure is said to be closed. The following
result shows that for any set S , its closure C(S ) is the smallest closed set
that contains S .

Theorem A.3 For any subsets S and T of a metric space, if T is a closed


set and S ⊆ T then C(S ) ⊆ T .

Proof: It follows from the definition of C that S ⊆ T implies C(S ) ⊆ C(T ),


and the definition of a closed set implies T = C(T ). End Proof

For any subset S of a metric space M , the boundary of S is defined to be
the set of all p ∈ M with δ̂(p, S ) = 0 and δ̂(p, M \ S ) = 0. The boundary
of both disks OD and CD is {p ∈ M : δ(p, h0, 0i) = 1}, the circle of radius
1 centered at the origin. For any metric space M and S ⊆ M , any element
p of M with δ̂(p, S ) = 0 that is not in S must be in M \ S and therefore
must satisfy δ̂(p, M \ S ) = 0. This shows that the closure of any set S is the
union of S and the boundary of S .
A subset S of a metric space M is said to be dense iff C(S ) = M . A
dense set is one that, for any element p of M , contains p or elements of M
arbitrarily close to p. As an example, let’s call a finite-digit real number one
that can be written in decimal notation with a finite number of digits—for
example, 123.5432. The set of all pairs of finite-digit numbers is dense in
the plane because any real number can be approximated arbitrarily closely
with a finite-digit number. Thus, for any pair of real numbers ⟨x, y⟩ and
any e > 0, we can find a pair of finite-digit numbers ⟨p, q⟩ within a distance
e of ⟨x, y⟩ by choosing p and q such that |x − p| and |y − q| are both less
than e/√2.
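
As a quick numerical check of that argument, here is a Python sketch (an illustration of mine, using decimal rounding as the finite-digit approximation):

    import math

    def finite_digit_approx(x, y, e):
        # Round each coordinate to k decimal digits, with k chosen so that
        # the rounding error 10**-k is at most e / sqrt(2).
        k = 0
        while 10**(-k) > e / math.sqrt(2):
            k += 1
        p, q = round(x, k), round(y, k)
        assert math.dist((x, y), (p, q)) < e   # within distance e, as claimed
        return p, q

    print(finite_digit_approx(math.pi, math.e, 0.001))   # (3.1416, 2.7183)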

Theorem A.4 Any subset S of a metric space equals C(S ) ∩ D for a dense
set D.

Proof: Let M be the metric space and let D equal S ∪ (M \ C(S )). The set
D consists of all elements of M except those elements in the boundary of S
that are not in S . It follows from this that C(S ) ∩ D = S . Since elements
in the boundary of S are a distance 0 from S , which is a subset of D, they
are a distance 0 from D. Therefore all elements in M are a distance 0 from
D, so D is dense. End Proof

What we're interested in is not the distance function δ, but the closure op-
erator C. Imagine that the plane were an infinite sheet of rubber that was
then stretched and shrunk unevenly in some way. Define the distance be-
tween two points on the original plane to be the distance between them
after the plane was deformed. For example, if the plane was stretched
to make everything twice as far apart in the y direction but the same
distance apart in the x direction, then δ(⟨x_1, y_1⟩, ⟨x_2, y_2⟩) would equal
√((x_1 − x_2)² + (2 ∗ (y_1 − y_2))²). As long as the stretching and shrinking is
continuous, meaning that the rubber sheet is not torn, the boundary of a
set S in the plane after it is deformed is the set obtained by deforming the
boundary of S. This implies that the new distance function produces the
same closure operator as the ordinary distance function on the plane.
Topology is the study of properties of objects that depend only on a
closure operation, which need not be generated by a metric space. But we
are interested in a closure operator that is generated by a particular kind of
metric space, and it helps me to think in terms of its distance function.

A.5.2 The Metric Space of Behaviors


Recall the correspondence between predicates and sets satisfying those pred-
icates described in Section 2.2.4 as the basis of Venn diagrams. It defines a
correspondence between behavior predicates and sets of behaviors. Compar-
ing Sections 4.1.3 and A.5.1 should make you expect there to be a definition
of the distance between two behaviors that makes the set of all behaviors
a metric space in which, under the correspondence between behavior predi-
cates and sets of behaviors, we have:

• The operator C on behavior predicates corresponds to the closure op-


erator C on sets of behaviors.

• Safety predicates correspond to closed sets of behaviors.

• Liveness predicates correspond to dense sets of behaviors.



We now face a problem. We want to describe the collection of all behaviors


as a metric space. However, mathematicians describe metric spaces as sets,
and the collection of all behaviors isn’t a set. I believe that all the results
about metric spaces that we need would remain true if metric spaces were
arbitrary collections rather than sets, but I haven’t checked this. So for the
rest of this section, we assume that the collection V of all values is made a
set, as explained in Section 2.2.6.4. This makes the collection of all behaviors
also a set.
We’re interested in the closure operator on sets of behaviors, which can
be the same for many different distance functions. The property of the dis-
tance function that provides the closure operator we want is that behaviors
with a long prefix in common are close together. More precisely, for two dif-
ferent behaviors σ and τ, define o(σ, τ) to be the largest n such that σ and
τ have the same prefix of length n—that is, the largest value n such that
∀ i ∈ 0 . . (n − 1) : σ(i) = τ(i). Thus, o(σ, τ) equals 1 iff σ(0) = τ(0) and
σ(1) ≠ τ(1). (There is no largest such n iff σ = τ, in which case we let
o(σ, τ) = ∞, where ∞ > i for all i ∈ N.) We get the right closure operator
on sets of behaviors if δ satisfies this property:

For any e > 0, there is an n ∈ N such that o(σ, τ) > n implies
δ(σ, τ) < e, for any behaviors σ and τ.

The simplest choice of δ satisfying this and the properties of a distance
function for a metric space is δ(σ, τ) = 1/(1 + o(σ, τ)) for σ ≠ τ. (Of
course, δ(σ, τ) = 0 if σ = τ.)
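
Here is a small Python sketch of these definitions (my own illustration, with behaviors truncated to finite lists of states):

    def o(sigma, tau):
        # Length of the longest common prefix of two distinct behaviors.
        n = 0
        while n < min(len(sigma), len(tau)) and sigma[n] == tau[n]:
            n += 1
        return n

    def delta(sigma, tau):
        # delta(sigma, tau) = 1/(1 + o(sigma, tau)), and 0 if sigma = tau.
        return 0 if sigma == tau else 1 / (1 + o(sigma, tau))

    # Behaviors that agree on a longer prefix are closer together.
    print(delta([0, 1, 2, 3], [0, 9, 9, 9]))   # o = 1, so distance 1/2
    print(delta([0, 1, 2, 3], [0, 1, 2, 9]))   # o = 3, so distance 1/4
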
With the correspondence between the operator C on behavior predicates
and the closure operator on this metric space, Theorems 4.2 and 4.3 of Sec-
tion 4.1.3 are immediate consequences of Theorems A.2–A.4 of Section A.5.1.
I find it more elegant to deduce the results about behavior predicates from
the corresponding results about metric spaces than to prove them directly.
But that might be because I was educated as a mathematician. Whichever
you prefer, I hope that having an alternative way of thinking about safety
and liveness helps you understand those concepts.
Appendix B

Proofs

B.1 Invariance Proof of Increment


Figure 3.5 in Section 3.4.1.1 defines the initial predicate Init and next-state
action Next for the RTLA formula describing a program having the inductive
invariant Inv . As described in Section 3.4.1.3, to prove the invariance of Inv
we had to prove the two conditions of (3.10). As with most programs, the
proof of the first condition is simple. Here, we describe the proof of the
second condition, which is:

Theorem Next ∧ Inv ⇒ Inv′

If the program were described in TLA instead of RTLA, the disjunct Stutter
would be removed from the definition of Next; and Next in the theorem
would be replaced by [Next]_v, where v is the tuple ⟨x, t, pc⟩ of variables.
The proof of the theorem would be essentially the same, the only difference
being that the action Stutter would be replaced everywhere by its second
conjunct, which is v′ = v.
The proof of the theorem is decomposed hierarchically. The first two
levels are determined by the logical structure of the theorem. There are two
standard ways to decompose the proof of a formula of the form F ⇒ G:

• Write F in the form F_1 ∨ . . . ∨ F_m and prove F_i ⇒ G for all i.

• Write G in the form G_1 ∧ . . . ∧ G_n and prove F ⇒ G_j for all j.

In this case, we can do both, proving F ⇒ G by proving F_i ⇒ G_j for all i
and j, performing the two decompositions in either order. We can do the first
decomposition by writing Next as a disjunction (since (P ∨ Q ∨ . . . ) ∧ Inv


equals (P ∧ Inv ) ∨ (Q ∧ Inv ) ∨ . . . ), and we can do the second decomposition


because Inv is defined to be the conjunction of three formulas. We do the
first decomposition first.
Expanding the definitions of Next and PgmStep, using the rule that ∃
distributes over ∨ ((2.14) of Section 2.1.9.3), we see that Next ∧ Inv is
equivalent to:

(∃ p ∈ Procs : aStep(p) ∧ Inv) ∨ (∃ p ∈ Procs : bStep(p) ∧ Inv)
   ∨ (Stutter ∧ Inv)

Writing the proof obligation for each of these disjuncts as an Assume/Prove,
the top level of the proof is:
1. Assume: new p ∈ Procs, aStep(p), Inv
   Prove: Inv′

2. Assume: new p ∈ Procs, bStep(p), Inv
   Prove: Inv′

3. Assume: Stutter, Inv
   Prove: Inv′
   Proof: By the definitions of Stutter, Inv, TypeOK, and NumberDone,
   since a Stutter step leaves the three variables unchanged, which by
   definition of Inv implies that the value of Inv is unchanged.

4. Q.E.D.
   Proof: By steps 1–3 and the definition of Next.

Steps 3 and 4 are simple enough that there is no need to decompose their
proofs. You should try to understand why these steps, and the others whose
proofs are given here, follow from the facts and definitions mentioned in
their proofs. To help you, a little bit of explanation has been added to some
of the proofs.
We now have to prove steps 1 and 2. They can both be decomposed using
the definition of Inv as a conjunction. We consider the proof of step 1. Here
is the first level of its decomposition.

1.1. TypeOK′

1.2. ∀ i ∈ Procs : (pc′(i) = b) ⇒ (t′(i) ≤ NumberDone′)

1.3. x′ ≤ NumberDone′

1.4. Q.E.D.
Proof: By steps 1.1–1.3 and the definition of Inv .

Step 1.2 is the most difficult one to prove, so we examine its proof. The
standard way to prove a formula of this form is to assume i ∈ Procs and
pc′(i) = b and prove t′(i) ≤ NumberDone′. So, the first step of the proof
should be a Suffices step asserting that it suffices to make those assump-
tions and prove t′(i) ≤ NumberDone′. Thus far, we have used only the
logical structure of the formulas, without thinking about what the formulas
mean. We can go no further that way. To write the rest of the proof of
step 1.2, we have to ask ourselves why an aStep(p) step starting in a state
with Inv true produces a state with t′(i) ≤ NumberDone′ true.
When I asked myself that question, I realized that the answer depends
on whether or not i is the process p executing the step. That suggested
proving the two cases i ≠ p and i = p separately, asserting them as Case
statements. In figuring out how to write those two proofs, I found that both
of them required proving NumberDone′ = NumberDone. Moreover, this was
true for the same reason in both cases—namely, that an aStep step of any
process leaves NumberDone unchanged. Therefore, I could prove it once in
a single step that precedes the two Case statements. This produced the
following level-3 proof:

1.2.1. Suffices: Assume: new i ∈ Procs, pc′(i) = b
       Prove: t′(i) ≤ NumberDone′
   Proof: Obvious.
1.2.2. NumberDone′ = NumberDone

1.2.3. Case: i = p

1.2.4. Case: i ≠ p

1.2.5. Q.E.D.
   Proof: By steps 1.2.3 and 1.2.4.

This leaves three steps to prove. Here is the proof of step 1.2.4, which I
think is the most interesting one.

1.2.4.1. (t′(i) = t(i)) ∧ (pc′(i) = pc(i))
   Proof: By step 1 (which implies aStep(p) and Inv), the step 1.2.4
   case assumption, and the definitions of aStep, Inv, and TypeOK,
   which together imply that the values of t(i) and pc(i) are unchanged.
   (The definition of TypeOK is needed because type correctness is
   required to deduce this.)
1.2.4.2. pc(i) = b
   Proof: By step 1.2.4.1 and the step 1.2.1 assumption pc′(i) = b.
1.2.4.3. t(i) ≤ NumberDone
   Proof: By step 1.2.4.2, the step 1 assumption (which implies Inv),
   and the second conjunct in the definition of Inv.
1.2.4.4. Q.E.D.
   Proof: Steps 1.2.4.1, 1.2.4.3, and 1.2.2 imply t′(i) ≤ NumberDone′,
   which is the current goal (introduced in step 1.2.1).
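
The mechanical character of these leaf obligations can be conveyed by a few lines of Python. The following sketch is a toy of my own (it is not the book's Increment program, and every name in it is hypothetical): it checks an implication of the form Next ∧ Inv ⇒ Inv′ by brute force over a small state space, which is exactly the kind of fact the leaf steps above assert.

    from itertools import product

    # A toy state is a pair (x, pc); the invariant and next-state relation
    # below are hypothetical stand-ins for Inv and Next.
    States = list(product(range(4), ["a", "b"]))

    def inv(s):
        # Toy invariant: once control reaches "b", x is positive.
        x, pc = s
        return pc != "b" or x >= 1

    def next_rel(s, t):
        # Toy next-state action: an "a" step increments x and moves
        # control to "b"; any state may also take a stuttering step.
        (x, pc), (x2, pc2) = s, t
        return (pc == "a" and pc2 == "b" and x2 == x + 1) or s == t

    # Check Next /\ Inv => Inv' over all pairs of states.
    assert all(inv(t) for s, t in product(States, States)
               if next_rel(s, t) and inv(s))
    print("Next /\\ Inv => Inv' holds for the toy system")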

The purpose of this example is to illustrate the science of proving correct-
ness of concurrent programs. The program and its proof are very simple.¹
The example shows how proving that a program satisfies a property can
be hierarchically decomposed into proving simple mathematical assertions
whose proofs require no understanding of why the program works or what
it's supposed to do. How this can be done for real abstract programs is an
engineering problem that is outside the scope of this book.

¹To check the level-1 proof, the TLA⁺ proof checker requires only that step 2 be
decomposed into a two-step proof, and that it be told to use two simple facts about the
cardinality of finite sets that it easily deduces from a standard library of such facts.

B.2 Proof of Theorem 4.2


Theorem 4.2 If F is a property, then C(F ) is a safety property such
that |= F ⇒ C(F ) and, for any safety property G, if |= F ⇒ G then
|= C(F ) ⇒ G.

Proof: Let F be a property. Extend the definition (3.37) of \ in the obvious


way to finite behaviors ρ so that \ρ equals ρ with stuttering steps removed.
1. Assume: F is a property
Prove: C(F ) is a property
1.1. Suffices: Assume: σ is a behavior
     Prove: σ satisfies C(F) iff \σ does
   Proof: By definition of a property, it suffices to show that C(F) is SI.
   By definition of SI, it suffices to assume σ is a behavior and show σ

satisfies C(F ) iff \σ does.


1.2. Assume: σ satisfies C(F )
Prove: \σ satisfies C(F )
Proof: By definition of C, it suffices to assume ρ is a nonempty finite
prefix of \σ and show it is a prefix of a behavior satisfying F. Since
ρ is a prefix of \σ, it equals \τ for some prefix τ of σ, so σ satisfies C(F)
implies τ ◦ ν satisfies F for some behavior ν. Since F is SI and ρ is
obtained from τ by removing stuttering steps, ρ ◦ ν also satisfies F, so
ρ is a prefix of a behavior satisfying F.
1.3. Assume: \σ satisfies C(F )
Prove: σ satisfies C(F )
Proof: By definition of C, it suffices to show that any finite nonempty
prefix ρ of σ satisfies F , which means showing that ρ is the prefix of a
behavior satisfying F . Since \ρ is a prefix of \σ, by hypothesis there is
a behavior τ such that (\ρ) ◦ τ satisfies F . Since F is SI and ρ ◦ τ differs
from (\ρ) ◦ τ only by stuttering steps, ρ ◦ τ too satisfies F . Thus ρ is
the prefix of a behavior satisfying F .
1.4. Q.E.D.
Proof: By steps 1.1–1.3.
2. C(F ) is a safety predicate.
2.1. Suffices: Assume: ρ is a prefix of a behavior that satisfies C(F ).
Prove: ρ↑ satisfies C(F )
Proof: By definition of safety.
2.2. Let σ be a behavior such that ρ ◦ σ satisfies F , and let φ(n) be the
sequence of states consisting of n “copies” of the final state of ρ, for
any n ∈ N, and let τ (n) equal ρ ◦ φ(n) ◦ σ. Then τ (n) satisfies F for
all n ∈ N.
Proof: A behavior σ such that ρ ◦ σ satisfies F exists by the step 2.1
assumption and the definition of C. That τ (n) satisfies F follows from:
(i) \τ (n) equals \(ρ ◦ σ) by definition of φ(n) and τ (n), (ii) ρ ◦ σ satisfies
F , and (iii) F is SI.
2.3. Every finite prefix of ρ↑ is a finite prefix of τ (n), for some n.
Proof: By the definitions of τ (n) and of ρ↑ .
2.4. Q.E.D.
Proof: By steps 2.2 and 2.3, every finite prefix of ρ↑ is a prefix of a
behavior satisfying F. By definition of C, this implies ρ↑ satisfies C(F),

which by step 2.1 is what we had to prove.


3. Assume: G a safety property and |= F ⇒ G
Prove: |= C(F ) ⇒ G
Proof: It suffices to assume σ is a behavior satisfying C(F ) and prove it
satisfies G. Since G is a safety property, it suffices to show that any finite
prefix ρ of σ satisfies G. By definition of C, ρ is a prefix of a behavior
satisfying F , and therefore by hypothesis satisfying G. Since G is a safety
property, this implies ρ satisfies G.
4. Q.E.D.
Proof: Steps 1–3 are the assertions of the theorem.

B.3 Proof of Theorem 4.3


Theorem 4.3 Every property F is equivalent to C(F ) ∧ L for a liveness
property L.

Define L = F ∨ ¬C(F )
1. F is equivalent to C(F ) ∧ L .
Proof: By |= (F ⇒ C(F )) (from Theorem 4.2) and propositional logic.
2. L is a liveness property.
2.1. Suffices: Assume: ρ is a finite behavior.
Prove: ρ is a prefix of a behavior τ satisfying L.
Proof: By definition of liveness, since L is a property because the
operators of propositional logic preserve stuttering insensitivity, and
C(F ) is a property by Theorem 4.2.
2.2. Case: ρ is the prefix of a behavior τ satisfying F .
Proof: By definition of L, if τ satisfies F then it satisfies L.
2.3. Case: ρ is not the prefix of any behavior satisfying F .
Proof: By definition of C(F ), if ρ were the prefix of a behavior satis-
fying C(F ), then it would be the prefix of a behavior satisfying F . The
case assumption therefore implies that any behavior τ having ρ as a
prefix does not satisfy C(F ), so it satisfies ¬C(F ) and therefore satisfies
L by definition of L.
2.4. Q.E.D.
Proof: Steps 2.2 and 2.3 cover all possibilities.

3. Q.E.D.
Proof: By steps 1 and 2.
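
Step 1 of this proof rests on a propositional fact: if F ⇒ C(F) is valid, then F is equivalent to C(F) ∧ (F ∨ ¬C(F)). Treating F and C(F) as atoms, such tautologies can be checked by enumerating truth values, as in this Python sketch (an illustration only, not part of the proof):

    from itertools import product

    def implies(a, b):
        return (not a) or b

    # Check (f => c)  =>  (f <=> (c and (f or not c)))
    # for all truth assignments to the atoms f and c.
    assert all(implies(implies(f, c), f == (c and (f or not c)))
               for f, c in product([False, True], repeat=2))
    print("tautology confirmed")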

B.4 Proof of Theorem 4.4


Theorem 4.4 Assume: S a safety property and L a liveness property.
Prove: ⟨S, L⟩ is machine closed iff C(S ∧ L) ≡ S.
1. Assume: ⟨S, L⟩ is machine closed.
Prove: C(S ∧ L) ≡ S
1.1. C(S ∧ L) ⇒ S
Proof: Theorem 4.2 implies |= S ⇒ C(S ), so |= S ∧ L ⇒ C(S ).
That theorem also implies C(S ) is a safety property and therefore
|= C(S ∧ L) ⇒ C(S ). Since S is a safety property, it equals C(S ), so
|= C(S ∧ L) ⇒ C(S ) implies C(S ∧ L) ⇒ S .
1.2. S ⇒ C(S ∧ L)
Proof: It suffices to assume a behavior σ satisfies S and prove σ satis-
fies C(S ∧ L). By definition of machine closure, every finite prefix of σ
can be completed to a behavior of S ∧ L. By definition of C, this implies
σ satisfies C(S ∧ L).
1.3. Q.E.D.
Proof: By steps 1.1 and 1.2.
2. Assume: C(S ∧ L) ≡ S
Prove: ⟨S, L⟩ is machine closed.
Proof: By definition of machine closure, it suffices to assume ρ is a
finite behavior satisfying S and prove ρ can be completed to a behavior
satisfying S ∧ L. By definition of what it means for a finite behavior
to satisfy a property, ρ↑ satisfies S . Since S ≡ C(S ∧ L), behavior ρ↑
satisfies C(S ∧ L). By definition of C, this implies every prefix of ρ↑ can
be completed to a behavior satisfying S ∧ L, and ρ is a prefix of ρ↑ .
3. Q.E.D.
Proof: By steps 1 and 2.

B.5 Proof of Theorem 4.5


Theorem 4.5 Assume: |= (E U P) ≡ (◇P ∧ E U P) ∨ (¬◇P ∧ □E)
Prove: |= ((E U P) ⇝ P) ≡ (□E ⇝ P).
1. Suffices: |= ((E U P) ⇒ ◇P) ≡ (□E ⇒ ◇P)
   Proof: F ⇝ G is defined to equal □(F ⇒ ◇G).
2. |= ((E U P) ⇒ ◇P) ≡ ∧ (◇P ∧ E U P) ⇒ ◇P
                        ∧ (¬◇P ∧ □E) ⇒ ◇P
   Proof: By the theorem's assumption and the propositional logic tautol-
   ogy |= ((F ∨ G) ⇒ H) ≡ (F ⇒ H) ∧ (G ⇒ H).
3. Q.E.D.
   Proof: |= ◇P ∧ E U P ⇒ ◇P is a propositional logic tautology, which
   by step 2 implies |= ((E U P) ⇒ ◇P) ≡ (¬◇P ∧ □E ⇒ ◇P), which with
   the propositional logic tautology |= (¬◇P ∧ □E ⇒ ◇P) ≡ (□E ⇒ ◇P)
   proves the step 1 goal.

B.6 Proof of Theorem 4.6


We will need this lemma to prove the theorem:
Lemma B.1 For any countable set I, there is a function f in N → I such
that for every i ∈ I, there are infinitely many n ∈ N with f(n) = i.

Proof: Let S_n equal {⟨n, i⟩ : i ∈ I}, which is countable because I is. Since
N × I equals ∪{S_n : n ∈ N}, Theorem 2.1 of Section 2.2.6.1 implies N × I
is countable. Choose g in N → N × I such that n ↔ g(n) is a 1–1 corre-
spondence between N and N × I. For every i ∈ I, the pair ⟨m, i⟩ is in N × I
for all m ∈ N, so there are infinitely many n ∈ N such that g(n)(2) = i.
Therefore, n ∈ N ↦ g(n)(2) is the required function f.
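
For I = N, such a function f can be written down explicitly by enumerating N × N along its diagonals (Cantor's pairing) and projecting each pair onto its second component. Here is a Python sketch of that construction (my own illustration of the lemma):

    import math

    def f(n):
        # Enumerate N x N along diagonals and return the second component
        # of the n-th pair; every i in N then equals f(n) for infinitely
        # many n, since <m, i> occurs for every m.
        w = (math.isqrt(8 * n + 1) - 1) // 2   # index of the diagonal
        return n - w * (w + 1) // 2            # position on that diagonal

    print([f(n) for n in range(15)])   # 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...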

Theorem 4.6 Let Init be a state predicate, Next an action, and v a tuple
of all variables occurring in Init and Next. If A_i is a subaction of Next for
all i in a countable set I, then the pair

⟨Init ∧ □[Next]_v, ∀ i ∈ I : XF^i_v(A_i)⟩

is machine closed, where each XF^i_v may be either WF or SF.

Define S = Init ∧ □[Next]_v


1. Suffices: Assume: ρ is a finite behavior satisfying S.
   Prove: There exists a behavior σ having ρ as a prefix that
   satisfies S and SF_v(A_i), for all i ∈ I.
   Proof: By definition of machine closure, it suffices to show that any
   finite behavior ρ satisfying S is the prefix of a behavior σ satisfying
   S ∧ ∀ i ∈ I : XF^i_v(A_i). Since SF_v(A_i) implies WF_v(A_i), it suffices to show
   that σ satisfies SF_v(A_i) for all i ∈ I.
2. Choose f ∈ (N → I) such that each i ∈ I equals f(n) for infinitely many
   n ∈ N. Define τ_j for each j ∈ N as follows. Let τ_0 = ρ, and for j > 0,
   define τ_j such that:
   if τ_{j−1} is a prefix of a finite behavior µ satisfying S and ending in a
      state in which ⟨A_f(j−1)⟩_v is enabled
   then τ_j equals the finite behavior obtained by appending to µ a
      state that makes the last step of τ_j an ⟨A_f(j−1)⟩_v step.
   else τ_j is obtained from τ_{j−1} by adding a stuttering step.
   For all j ∈ N, τ_j is a prefix of and shorter than τ_{j+1}, and τ_j satisfies S.
   Proof: Lemma B.1 shows the existence of f. By construction each τ_j
   is a prefix of and shorter than τ_{j+1}. We must just show that τ_j satisfies
   S. The proof is by induction. It is true for j = 0 since τ_0 equals ρ,
   which satisfies S by the step 1 assumption. So we complete the proof
   by assuming j > 0 and τ_{j−1} satisfies S and proving as follows that τ_j
   satisfies S.
   If the if condition in the definition of τ_j is true, then τ_j satisfies S
   because µ does and τ_j is obtained by appending to µ an ⟨A_f(j−1)⟩_v
   step, which is a Next step by the hypothesis that |= A_i ⇒ Next for all
   i ∈ I. If the if condition is false, then τ_j satisfies S because S is stuttering
   insensitive, τ_{j−1} satisfies S, and τ_j is obtained by adding a stuttering step
   to τ_{j−1}.
3. Let σ be the behavior having every τ_j as a prefix. Then σ is a behavior
   satisfying S having the prefix ρ.
   Proof: The behavior σ exists (and is unique) because each τ_j is a prefix
   of and shorter than τ_{j+1}. Since ρ equals τ_0, it is a prefix of σ. Step 2
   asserts that every prefix τ_j of σ satisfies S. By definition of S, this implies
   σ satisfies S.
4. σ satisfies SF_v(A_i) for all i ∈ I.
   Proof: Step 2 asserts that for any i ∈ I, there are infinitely many j ∈ N
   such that i = f(j − 1). If the if condition in the definition of τ_j is true
   for all of those values of j, then σ contains an ⟨A_i⟩_v step for each of those
   values of j, so σ satisfies □◇⟨A_i⟩_v. If the if condition is false for any
   such j, then τ_{j−1} cannot be extended to any finite behavior containing
   an ⟨A_i⟩_v step. Hence, □¬E⟨A_i⟩_v is true for the suffix of σ obtained
   by removing the prefix τ_{j−1}. Therefore, σ satisfies ◇□¬E⟨A_i⟩_v, which
   equals ¬□◇E⟨A_i⟩_v. By (4.14), in either case σ satisfies SF_v(A_i).
5. Q.E.D.
Proof: The theorem follows from steps 1, 3, and 4.
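
The construction in step 2 is essentially a fair scheduler: it repeatedly consults f to choose an action, extends the behavior with a step of that action when it can, and otherwise stutters. The following Python sketch mimics the construction under a simplifying assumption of my own: that an action can be appended exactly when it is enabled in the last state, with no auxiliary behavior µ needed.

    def fair_behavior(rho, actions, f, steps):
        # Build tau_0, tau_1, ... as in step 2: at stage j, try to extend
        # with action number f(j); if it is not enabled, add a stuttering step.
        tau = list(rho)
        for j in range(steps):
            nxt = actions[f(j)](tau[-1])   # None means "not enabled here"
            tau.append(tau[-1] if nxt is None else nxt)
        return tau

    # Two toy actions on integer states: one increments even states,
    # the other increments odd states; f alternates between them.
    actions = [lambda s: s + 1 if s % 2 == 0 else None,
               lambda s: s + 1 if s % 2 == 1 else None]
    print(fair_behavior([0], actions, lambda j: j % 2, 8))
    # [0, 1, 2, ..., 8]: each action is taken again and again in the limit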

B.7 Proof of Theorem 4.7



Theorem 4.7 Let A_i be an action for each i ∈ I, let Q = ∃ i ∈ I : A_i,
and let XF be either WF or SF. Then
|= (∀ i ∈ I : □(E⟨A_i⟩_v ∧ □[¬A_i]_v ⇒
                □[¬Q]_v ∧ □(E⟨Q⟩_v ⇒ E⟨A_i⟩_v)))
   ⇒ (XF_v(Q) ≡ ∀ i ∈ I : XF_v(A_i))

Define ⊡ to equal ◇□ if XF is WF, and □◇ if XF is SF.

1. Suffices: Assume: ∀ i ∈ I : □(E⟨A_i⟩_v ∧ □[¬A_i]_v ⇒
                                  □[¬Q]_v ∧ □(E⟨Q⟩_v ⇒ E⟨A_i⟩_v))
   Prove: XF_v(Q) ≡ ∀ i ∈ I : XF_v(A_i)
   Proof: Obvious.
2. XF_v(Q) ⇒ ∀ i ∈ I : XF_v(A_i)
2.1. Suffices: Assume: XF_v(Q), new i ∈ I, ⊡E⟨A_i⟩_v ∧ ◇□[¬A_i]_v
     Prove: false
     Proof: By (4.14) and (4.23), since F ⇒ G is equivalent to F ∧ ¬G ⇒
     false and ¬□◇⟨A_i⟩_v equals ◇□[¬A_i]_v.
2.2. ◇(□[¬Q]_v ∧ □(E⟨Q⟩_v ⇒ E⟨A_i⟩_v))
     Proof: ⊡E⟨A_i⟩_v implies □◇E⟨A_i⟩_v, and □◇E⟨A_i⟩_v ∧ ◇□[¬A_i]_v
     implies ◇(E⟨A_i⟩_v ∧ □[¬A_i]_v) by temporal reasoning. Therefore, the
     step 2.1 assumption implies ◇(E⟨A_i⟩_v ∧ □[¬A_i]_v), which by the step 1
     assumption implies 2.2.
2.3. ◇□[¬Q]_v ∧ ◇□(E⟨Q⟩_v ≡ E⟨A_i⟩_v)
     Proof: By step 2.2, because ◇(□F ∧ □G) equals ◇□F ∧ ◇□G for any
     F and G, and □(E⟨Q⟩_v ⇒ E⟨A_i⟩_v) implies □(E⟨Q⟩_v ≡ E⟨A_i⟩_v)
     because the definition of Q implies |= A_i ⇒ Q, which by the definition
     of E in Section 4.2.1 implies |= E⟨A_i⟩_v ⇒ E⟨Q⟩_v.
2.4. □◇⟨Q⟩_v
     Proof: ⊡F and ◇□(F ≡ G) imply ⊡G, for any F and G.
     Therefore step 2.3 and the step 2.1 assumption ⊡E⟨A_i⟩_v imply
     ⊡E⟨Q⟩_v. By definition of XF, the step 2.1 assumption XF_v(Q)
     and ⊡E⟨Q⟩_v imply □◇⟨Q⟩_v.
2.5. Q.E.D.
     Proof: Since ◇□[¬Q]_v is equivalent to ¬□◇⟨Q⟩_v, steps 2.3 and 2.4
     imply false, the goal introduced in step 2.1.
3. (∀ i ∈ I : XF_v(A_i)) ⇒ XF_v(Q)
3.1. Suffices: Assume: (∀ i ∈ I : XF_v(A_i)) ∧ ⊡E⟨Q⟩_v
     Prove: □◇⟨Q⟩_v
     Proof: By (4.14) and (4.23).
3.2. E⟨Q⟩_v ⇒ ◇⟨Q⟩_v
3.2.1. Suffices: Assume: (i ∈ I) ∧ E⟨A_i⟩_v
       Prove: ◇⟨Q⟩_v
       Proof: By predicate logic (the ∃ Elimination rule), since Q equals
       ∃ i ∈ I : A_i, so E⟨Q⟩_v equals ∃ i ∈ I : E⟨A_i⟩_v by rule E2 of Sec-
       tion 5.4.4.2.
3.2.2. Case: ◇⟨A_i⟩_v
       Proof: ◇⟨Q⟩_v follows from the case assumption and |= A_i ⇒ Q.
3.2.3. Case: □[¬A_i]_v
       Proof: Assumption E⟨A_i⟩_v from step 3.2.1, the case assumption,
       and the step 1 assumption imply □(E⟨Q⟩_v ⇒ E⟨A_i⟩_v). This, the
       temporal logic tautology |= □(F ⇒ G) ⇒ (⊡F ⇒ ⊡G), and the
       step 3.1 assumption ⊡E⟨Q⟩_v imply ⊡E⟨A_i⟩_v. Since i ∈ I by
       the step 3.2.1 assumption, the step 3.1 assumption implies XF_v(A_i),
       which by ⊡E⟨A_i⟩_v and the definition of XF implies ◇⟨A_i⟩_v, which
       by |= A_i ⇒ Q implies ◇⟨Q⟩_v.
3.3. Q.E.D.
     Proof: All assumptions in effect at step 3.2 are □ formulas so, as ex-
     plained in Section 4.2.4, we can deduce □(E⟨Q⟩_v ⇒ ◇⟨Q⟩_v) from 3.2.
     This and the temporal logic tautology
     |= □(F ⇒ ◇G) ⇒ (⊡F ⇒ □◇G)
     imply ⊡E⟨Q⟩_v ⇒ □◇⟨Q⟩_v, which by the step 3.1 assumption ⊡E⟨Q⟩_v
     implies the step 3.1 goal □◇⟨Q⟩_v.
4. Q.E.D.
   Proof: By steps 1–3.

B.8 Proof Sketch of Theorem 4.8


Theorem 4.8 Let x be the list x_1, . . . , x_n of variables and let F be a
property such that F(σ) depends only on the values of the variables x in σ,
for any behavior σ. There exists a formula S equal to Init ∧ □[Next]_⟨x,y⟩ ∧
WF_⟨x,y⟩(Next), where Init and Next are defined in terms of F, y is a variable
not among the variables x, and the variables of S are x and y, such that
|= F ⇒ [[G]] iff |= S ⇒ G, for any property G. If F is a safety property,
then the conjunct WF_⟨x,y⟩(Next) is not needed.

Proof sketch: For any behavior σ, let σ|_x be the infinite sequence of n-
tuples of values such that σ|_x(i) equals the value of ⟨x⟩ in state σ(i). The
basic idea is to define S so that the value of y in any state i of a behavior
of S always equals (σ|_x)⁺ⁱ for some behavior σ satisfying F, and x always
equals y(0). (Remember that τ is the infinite sequence τ(0) → τ(1) → · · · ,
and τ⁺ⁱ equals τ(i) → τ(i + 1) → · · · .)
To do this, for any infinite sequence τ of n-tuples of values, we define
F̃(τ) to equal F(σ) for any behavior σ such that σ|_x equals τ. This uniquely
defines F̃ because, by hypothesis, the value of F(σ) depends only on the
values of the variables x in the behavior σ. Define IsTupleSeq to be the
mapping such that IsTupleSeq(τ) is true iff τ is an infinite cardinal sequence
of n-tuples of arbitrary values. We then define S by letting:

Init = ∃ τ : ∧ IsTupleSeq(τ) ∧ F̃(τ)
             ∧ (y = τ) ∧ (⟨x⟩ = τ(0))

Next = (y′ = Tail(y)) ∧ (⟨x⟩′ = y′(0))

With this definition, F(σ) equals true for a behavior σ iff there is a behavior
satisfying S in which the initial value of y is σ|_x. Notice that σ is a halting
behavior iff σ|_x ends with an infinite sequence of identical n-tuples. When
y equals that value, Tail(y) = y, so ⟨Next⟩_⟨x,y⟩ equals false and S allows
only stuttering steps from that point on.
Eliminating the conjunct WF_⟨x,y⟩(Next) allows S to halt even if y
initially equals σ|_x for a non-halting behavior σ that satisfies F.
That makes no difference if F is a safety property, since in that case every
finite prefix of σ also satisfies F. End Proof Sketch

B.9 Proof of Formula (5.4)


For any variable such as x and any state s, let s_x be the value of x in the
state s. For any S-behavior σ:

[[(z′ = x + y) with . . .]](σ)
  ≡ [[w′ = u + v]](σ)                        By definition of with . . . .
  ≡ σ(1)_w = σ(0)_u + σ(0)_v                 By definition of [[. . .]].

[[z′ = x + y]](f̂(σ))
  ≡ f̂(σ)(1)_z = f̂(σ)(0)_x + f̂(σ)(0)_y       By definition of [[. . .]].
  ≡ f(σ(1))_z = f(σ(0))_x + f(σ(0))_y        By definition of f̂.
  ≡ σ(1)_w = σ(0)_u + σ(0)_v                 By definition of f.
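
The calculation can be replayed concretely in Python. In this sketch (my own illustration), a state is a dictionary and a behavior a list of states; I am assuming the elided substitution maps z, x, y to w, u, v, which is what the second line of the calculation indicates:

    def f(s):
        # The state mapping: the values of z, x, y are read from w, u, v.
        return {"z": s["w"], "x": s["u"], "y": s["v"]}

    def action(s0, s1):           # meaning of  z' = x + y  on a step
        return s1["z"] == s0["x"] + s0["y"]

    def renamed_action(s0, s1):   # meaning of  w' = u + v  on a step
        return s1["w"] == s0["u"] + s0["v"]

    sigma = [{"u": 1, "v": 2, "w": 0}, {"u": 5, "v": 6, "w": 3}]
    fhat_sigma = [f(s) for s in sigma]   # f applied statewise, like f-hat

    # The two sides of (5.4) agree on this step: both are True.
    print(renamed_action(sigma[0], sigma[1]),
          action(fhat_sigma[0], fhat_sigma[1]))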

B.10 Proof of Theorem 6.2


Theorem 6.2 With the assumptions of Theorem 6.1, for all i ∈ I let B_i
be a subaction of A_i such that

(*) T ∧ (i ≠ j) ⇒ □[¬(B_i ∧ A_j)]_v

for all j in I; and let B^h_i = ⟨B_i⟩_v ∧ (h′ = exp_i). Then

T ∧ (∀ i ∈ I : XF^i_v(B_i)) ≡ ∃ h : T^h ∧ (∀ i ∈ I : XF^i_vh(B^h_i))

where each XF^i is either WF or SF.

1. Assume: i ∈ I
   Prove: ⟨B^h_i⟩_vh ≡ ⟨B_i ∧ (h′ = exp_i)⟩_v
1.1. ⟨B^h_i⟩_vh ≡ B_i ∧ (v′ ≠ v) ∧ (h′ = exp_i) ∧ (vh′ ≠ vh)
     Proof: By the definitions of B^h_i, Next_i, and ⟨. . .⟩_… .
1.2. ⟨B^h_i⟩_vh ≡ B_i ∧ (v′ ≠ v) ∧ (h′ = exp_i)
     Proof: By step 1.1, since vh = v ◦ ⟨h⟩ implies (v′ ≠ v) ∧ (vh′ ≠ vh) ≡
     (v′ ≠ v).
1.3. Q.E.D.
     Proof: By step 1.2 and the definition of ⟨. . .⟩_v.
2. Assume: i ∈ I
   Prove: E⟨B^h_i⟩_vh ≡ E⟨B_i⟩_v
   Proof: By step 1 because exp is assumed not to contain h′, so rules E3
   and E5 of Section 5.4.4.2 imply E⟨B_i ∧ (h′ = exp_i)⟩_v equals E⟨B_i⟩_v.

Define ⊡_i to equal ◇□ if XF^i is WF and to equal □◇ if XF^i is SF.

3. Assume: T^h ∧ ∀ i ∈ I : XF^i_vh(B^h_i)
   Prove: T ∧ ∀ i ∈ I : XF^i_v(B_i)
3.1. Suffices: Assume: i ∈ I
     Prove: XF^i_v(B_i)
     Proof: By Theorem 6.1, T^h implies T. Therefore, it suffices to prove
     ∀ i ∈ I : XF^i_v(B_i) to prove step 3.
3.2. Suffices: Assume: ⊡_i E⟨B_i⟩_v
     Prove: □◇⟨B_i⟩_v
     Proof: By (4.14) and (4.23).
3.3. ⊡_i E⟨B^h_i⟩_vh
     Proof: By the step 3.2 assumption and step 2. (Since step 2 is not in
     the scope of any assumptions, it implies □(E⟨B^h_i⟩_vh ≡ E⟨B_i⟩_v).)
3.4. □◇⟨B^h_i⟩_vh
     Proof: The step 3 assumption implies XF^i_vh(B^h_i), which by step 3.3,
     (4.14), and (4.23) implies □◇⟨B^h_i⟩_vh.
3.5. Q.E.D.
     Proof: Step 3.4 and step 1 imply □◇⟨B_i ∧ (h′ = exp_i)⟩_v, which implies
     the goal introduced in step 3.2.
4. Assume: T ∧ ∀ i ∈ I : XF^i_v(B_i)
   Prove: ∃ h : T^h ∧ ∀ i ∈ I : XF^i_vh(B^h_i)
4.1. Suffices: Assume: T^h ∧ (i ∈ I)
     Prove: XF^i_vh(B^h_i)
     Proof: Theorem 6.1 shows that T implies ∃ h : T^h. This implies that
     to prove T ∧ F implies ∃ h : (T^h ∧ G) for any F and G, it suffices
     to prove that T ∧ F ∧ T^h implies G. Thus, the step 4 assumption
     shows that to prove the step 4 goal, it suffices to prove T^h implies
     ∀ i ∈ I : XF^i_vh(B^h_i), which is asserted by this step's Assume/Prove.
4.2. Suffices: Assume: ⊡_i E⟨B^h_i⟩_vh
     Prove: □◇⟨B^h_i⟩_vh
     Proof: By (4.14) and (4.23).
4.3. ⊡_i E⟨B_i⟩_v
     Proof: By the step 4.2 assumption and step 2.
4.4. □◇⟨B_i⟩_v
     Proof: The step 4 assumption implies XF^i_v(B_i), which by step 4.3,
     (4.14), and (4.23) implies □◇⟨B_i⟩_v.
4.5. Q.E.D.
     Proof: By hypothesis, B_i is a subaction of A_i. By hypothesis (∗) and
     the step 4 assumption (which implies T), B_i ∧ A_j equals false if i ≠ j.
     Hence a B_i step must be an A^h_i step. By definition of A^h_i and the step
     4.1 assumption (which implies T^h), every B_i step is a B_i ∧ (h′ = exp_i)
     step. Therefore, step 4.4 and step 1 imply □◇⟨B^h_i⟩_vh, which is the goal
     introduced by step 4.2.
5. Q.E.D.
   Proof: Steps 3 and 4 imply that T ∧ (∀ i ∈ I : XF^i_v(B_i)) is equivalent
   to ∃ h : T^h ∧ (∀ i ∈ I : XF^i_vh(B^h_i)).

B.11 Proof Sketch of Theorem 6.3


Theorem 6.3 Let T equal Init ∧ □[Next]_⟨x⟩ where x is the list of all
variables of S; let F be a safety property such that F(σ) depends only on
the values of the variables x in σ, for any behavior σ; and let h be a variable
not one of the variables x. We can add h as a history variable to T to obtain
T^h and define a state predicate I_F in terms of F such that |= [[T]] ⇒ F is
true iff I_F is an invariant of T^h.

Proof sketch: Define T^h as follows:

T^h    = Init^h ∧ □[Next^h]_⟨x,h⟩

Init^h = Init ∧ (h = ⟨x⟩)

Next^h = Next ∧ (h′ = Append(h, ⟨x⟩′))

Since h is a history variable, so that ∃ h : T^h is equivalent to T, to every
behavior σ satisfying T there is a corresponding behavior satisfying T^h that
has the same values of the variables x as σ. Since F does not depend on h,
this means that |= [[T]] ⇒ F iff |= [[T^h]] ⇒ F.

We now define ρ|_x and F̃ to be the same as in the proof of Theorem 4.8,
except for finite behaviors ρ. We define ρ|_x to be the finite cardinal sequence
of n-tuples of the same length as ρ such that ρ|_x(i) equals the value of ⟨x⟩
in state ρ(i); and we define F̃ to be the predicate on finite sequences of
n-tuples of values such that F̃(ρ|_x) equals F(ρ) (which by definition equals
F(ρ↑)). We then define I_F to equal F̃(h).
Much as in the proof of Theorem 4.8, for any behavior τ, a finite prefix
ρ of τ satisfies F iff F̃(ρ|_x) is true, which is true iff F̃(h) is true of the last
state of ρ. Every state of τ is the last state of some finite prefix of τ, and
the safety property F is true of τ iff it is true of every finite prefix of τ, so
F is true of τ iff I_F is true of every state of τ. This proves that I_F is an
invariant of T^h iff T^h satisfies F; and T satisfies F iff T^h does, because F
depends only on the variables x. End Proof Sketch
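
The idea of the theorem is easy to see in running code. The following Python sketch is a toy of my own (not the book's T): the history variable h records the trace of x just as Init^h and Next^h prescribe, and the state predicate I_F = F̃(h) is checked as an invariant, for the sample safety property "x never decreases".

    import random

    def tilde_F(seq):
        # F-tilde on finite traces: the sample safety property
        # "x never decreases".
        return all(a <= b for a, b in zip(seq, seq[1:]))

    def run(steps):
        # Toy program: x starts at 0 and each step adds 0 or 1.
        x = 0
        h = [x]                       # Init^h:  h = <<x>>
        for _ in range(steps):
            x += random.choice([0, 1])
            h.append(x)               # Next^h:  h' = Append(h, x')
            assert tilde_F(h)         # I_F = F-tilde(h) holds in every state
        return h

    print(run(10))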

B.12 Proof Sketch of Theorem 6.6


Theorem 6.6 Let x, y, and z be lists of variables, all distinct from one
another; let the variables of T be x and z and the variables of IS be x and y;
and let T equal Init ∧ □[Next]_⟨x,z⟩ ∧ L. Let the operator Φ map behaviors
satisfying T to behaviors satisfying IS such that Φ(σ) ∼_y σ. By adding
history, stuttering, and prophecy variables to T, we can define a formula
T^a such that ∃ a : T^a is equivalent to T and a tuple exp of expressions
defined in terms of Φ and the variables of T^a such that
|= T^a ⇒ (IS with y ← exp)

Proof sketch: We first add an infinite-stuttering variable t to T,
defining T^t to equal
∧ Init ∧ □[(Next ∧ (v′ ≠ v)) ∨ ((t′ ≠ t) ∧ (⟨x, z⟩′ = ⟨x, z⟩))]_⟨x,z,t⟩
∧ L ∧ □◇⟨t′ ≠ t⟩_t
In addition to handling the weird case described in Section 6.3.5, this sim-
plifies the proof by not having to consider terminating behaviors, since T^t
doesn't allow them.
Let m be the number of variables in x, z, and t. We next define T^th by
adding to T^t a history variable h whose value in any state is a cardinal se-
quence of m-tuples. The initial value of h is the one-element sequence whose
single element is the value of ⟨x, z, t⟩ in the initial state. Each step that
does not leave ⟨x, z, t⟩ unchanged appends to h the new value of ⟨x, z, t⟩.
For a behavior σ satisfying T^th that has no stuttering steps (steps leaving

x, z, and t unchanged), the value of variable h always equals the sequence


h(0), . . . , h(Len(h) − 1) of m-tuples such that each h(i ) equals the value of
hx, z, t i in state σ(i ) for all i < Len(h).
We now add to T th a prophecy variable p that always makes an infinite
sequence of prophecies. The value of p in state σ(i ) of a behavior σ satisfying
T thp is the infinite sequence of m-tuples such that in state σ(i ), the value of
p(0), the next prediction made by p, is the m-tuple that equals the next value
of hx, z, t i different from its current value. This is a generalized prophecy
variable with predictions in the collection (not a set) of m-tuple of values.
It is defined by writing the next-state action Next th as ∃ i : Ai , where

Ai = (i = hx, z, t i0 ) ∧ Next th

and defining the next-state action Next thp of T thp by

Next thp = Ap(0) ∧ Next th ∧ (p 0 = Tail (p))


A Next thp step removes the first element from p and appends that element
to the end of h. Therefore, the value of h ◦ p remains unchanged throughout
any behavior that satisfies T thp . The value of h ◦ p during a behavior σ
satisfying T thp equals the sequence of values of hx, z, t i in the entire behavior
σ, except that σ may have additional (stuttering) steps that leave hx, z, t i
unchanged.
In any state of a behavior satisfying T^thp, the value of (h ◦ p)(Len(h) − 1)
(the last element in the sequence h of m-tuples of values) is the current value
of ⟨x, z, t⟩. The variables h and p, together with the mapping Φ, contain all
the information needed to define a refinement mapping under which T^thp
implements IS. To see how this is done, we need some notation.
For any behavior σ and state expression exp, define σ|_exp to be the infinite
sequence of values such that (σ|_exp)(i) equals the value of exp in state σ(i),
for all i ∈ N. Thus σ|_⟨x,z,t⟩ is the sequence of m-tuples of values of ⟨x, z, t⟩
in the states of σ. Define the mapping Φ̃ from infinite sequences of m-tuples
of values to behaviors so that Φ̃(ρ) equals Φ(σ) for some behavior σ such
that σ|_⟨x,z,t⟩ = ρ. (It doesn't matter what values the states of σ assign to
variables other than those in x, z, and t since they don't affect whether or
not σ satisfies T.)
We are assuming that Φ(σ) satisfies IS and Φ(σ) ∼_y σ. Therefore, for
any behavior satisfying T^thp, for the value of h ◦ p in any state of that
behavior, Φ̃(h ◦ p) satisfies IS and Φ̃(h ◦ p) ∼_y σ for some behavior σ such
that σ|_⟨x,z,t⟩ = h ◦ p.
To understand how to construct the needed refinement mapping, we con-
sider a simpler version of the theorem that would be true if we were using

RTLA rather than TLA, so we didn't have the complication introduced by
stuttering insensitivity. In that case, Φ̃(h ◦ p) would satisfy Φ̃(h ◦ p) ≃_y σ
instead of Φ̃(h ◦ p) ∼_y σ. This means that the behavior Φ(σ) satisfying IS is
constructed from a behavior σ with σ|_⟨x,z,t⟩ equal to h ◦ p by just changing
the value of the variables y in each state of σ (without adding or removing
stuttering steps). For any state s, let s_y be the values of the variables y
in that state. For σ to satisfy IS, the values of y in any state σ(i) of the
behavior σ should equal the values of Φ(σ)(i)_y, the values of y in the corre-
sponding state of Φ(σ).² Remember that state number i of σ corresponds
to the m-tuple (h ◦ p)(i) with i = Len(h) − 1. Therefore, the values assigned
to y by the refinement mapping under which T^thp implements IS should in
each state equal the values of y in state number Len(h) − 1 of Φ̃(h ◦ p). In
other words, we have:

(B.1) |= T^thp ⇒ (IS with y ← (Φ̃(h ◦ p)(Len(h) − 1))_y)

²This means that for each variable y_j of y, the value of y_j in any state σ(i) should
equal the value of y_j in Φ(σ)(i).

Let's now return to the actual situation in which Φ̃(h ◦ p) ∼_y σ (rather
than Φ̃(h ◦ p) ≃_y σ). The behavior Φ̃(h ◦ p) is constructed from the behavior
σ, where σ|_⟨x,z,t⟩ equals the value of h ◦ p in every state of σ, by possibly
adding stuttering steps to σ and then modifying the values of y in its states.
(There are no stuttering steps to remove from σ because we defined T^t and
the history variable h so that h is a sequence of m-tuples no two successive
elements of which are equal.)
To fix (B.1) to handle these stuttering steps, we have to add a stuttering
variable s to T^thp to obtain the program T^thps that adds to the behavior
σ satisfying T^thp the stuttering steps needed to produce a behavior τ such
that the state τ(i) corresponds to the state Φ̃(h ◦ p)(i) for every i ∈ N. The
behavior Φ̃(h ◦ p) tells us where those stuttering steps must be added: each
step Φ̃(h ◦ p)(i) → Φ̃(h ◦ p)(i + 1) that leaves the value of ⟨x, z, t⟩ unchanged
corresponds to a stuttering step added to the behavior σ satisfying T^thp to
form τ.
Define the function f in (N → N) in terms of h ◦ p as follows. Let
f(0) = 0, and for all i ∈ N, let f(i + 1) be the smallest number greater than
f(i) such that the value of ⟨x, z, t⟩ in state Φ̃(h ◦ p)(f(i + 1)) is different from
its value in state Φ̃(h ◦ p)(f(i)). (Such a number f(i + 1) exists for the value
of h ◦ p in any reachable state of T^thp because adding the infinite-stuttering
variable t ensured that □◇(t′ ≠ t) is true for every behavior of T^thp.)
To construct Φ̃(h ◦ p) from a behavior σ with σ|_⟨x,z,t⟩ = h ◦ p, we have
to add stuttering steps to make each state number i of σ correspond to
state number f(i) of Φ̃(h ◦ p). The stuttering variable s added to T^thp
to define T^thps has to add f(i + 1) − f(i) − 1 stuttering steps between
states number i and i + 1 of a behavior of T^thp. That means adding
f(Len(h)) − f(Len(h) − 1) − 1 stuttering steps after every step of the next-
state action of T^thp. (Of course no stuttering step is added if f(Len(h)) −
f(Len(h) − 1) = 1.) We then define k to be a state expression whose value
in any behavior satisfying T^thps is the number of steps of the next-state
action of T^thps that have occurred so far. We can define k in terms of the
values of f and s, or we can simply add k as a history variable to T^thps.
However we define it, we can express the corrected version of (B.1) that
handles stuttering insensitivity by:

|= T^thps ⇒ (IS with y ← (Φ̃(h ◦ p)(k))_y)

which completes the proof. End Proof Sketch

Replacing the Prophecy Variable with a Prophecy Constant


Theorem 6.6 is true if we use a prophecy constant, as defined in Section 6.6,
instead of a prophecy variable. This is proved by modifying the proof sketch
of Theorem 6.6 as follows. After adding the auxiliary variables t and h, we
add to T^th a prophecy constant c whose value is the infinite sequence of
m-tuples that are the values of ⟨x, z, t⟩ in the entire behavior. In other
words,

T^thc = ∃ c : P(h, c) ∧ T^th

where P(h, c) asserts that c is an infinite sequence of m-tuples such that,
in every state, h equals the sequence of the first Len(h) elements of c. We
then define p to be the state function such that, in any state, h ◦ p equals
c. The proof is then the same as the rest of the proof of Theorem 6.6, using
this as the definition of p.

B.13 Proof of Theorem 7.2


The proof uses the following assertion, which the definition of the action
composition operator "·" implies is true for all actions A_i and B_j:
(B.2) |= (∃ i ∈ I : A_i) · (∃ j ∈ J : B_j) ≡ (∃ i ∈ I, j ∈ J : A_i · B_j)
Theorem 7.2 If A ≡ ∃ i ∈ I : A_i and B ≡ ∃ j ∈ J : B_j for actions A_i and
B_j, then:
|= (∀ i ∈ I, j ∈ J : A_i · B_j ⇒ B_j · A_i) ⇒ (A · B ⇒ B · A)

Proof: It suffices to assume A_i · B_j ⇒ B_j · A_i for all i ∈ I and j ∈ J and
prove A · B ⇒ B · A. Here is the proof:

A · B ≡ (∃ i ∈ I, j ∈ J : A_i · B_j)   by (B.2)
      ⇒ (∃ i ∈ I, j ∈ J : B_j · A_i)   we assume A_i · B_j ⇒ B_j · A_i for all i and j
      ≡ B · A                          by (B.2), substituting I ← J, J ← I,
                                       A_i ← B_j, and B_j ← A_i.

End Proof
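
Actions can be modeled as relations, and the theorem can then be checked on small examples. In this Python sketch (a toy illustration of mine; the actions are hypothetical), an action is a set of state pairs, "·" is relational composition, and ⇒ is set inclusion:

    from itertools import product

    def compose(A, B):
        # A . B relates s to u iff some t has (s, t) in A and (t, u) in B.
        return {(s, u) for (s, t) in A for (t2, u) in B if t == t2}

    S = range(12)
    A_parts = [{(s, s + i) for s in S if s + i in S} for i in (1, 2)]
    B_parts = [{(s, s + 3 * j) for s in S if s + 3 * j in S} for j in (1, 2)]

    # Hypothesis: each A_i . B_j step is a B_j . A_i step (addition commutes).
    assert all(compose(Ai, Bj) <= compose(Bj, Ai)
               for Ai, Bj in product(A_parts, B_parts))

    # Conclusion: (exists i : A_i) . (exists j : B_j) implies
    # (exists j : B_j) . (exists i : A_i), where an existentially
    # quantified action is a union of relations.
    A = set().union(*A_parts)
    B = set().union(*B_parts)
    assert compose(A, B) <= compose(B, A)
    print("A . B => B . A confirmed for the toy actions")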

B.14 Proof of Theorem 7.3


Theorem 7.3 If S equals Init ∧ □[Next]_x, P is a safety property, and Q is a
state predicate such that |= S ⇒ □E([Next]⁺_x ∧ Q′), then |= S ∧ □◇Q ⇒ P
implies |= S ⇒ P.

1. ⟨S, □◇Q⟩ is machine closed.
1.1. Suffices: Assume: ρ a finite behavior satisfying S
     Prove: ρ is a prefix of a behavior satisfying S ∧ □◇Q
     Proof: By definition of machine closure.
1.2. There is a mapping Φ such that if µ is any finite behavior satisfying
     S, then Φ(µ) is a finite behavior ending in a state satisfying Q such
     that µ ◦ Φ(µ) satisfies S.
     Proof: The hypothesis |= S ⇒ □E([Next]⁺_x ∧ Q′) implies that the ac-
     tion [Next]⁺_x ∧ Q′ is enabled in the last state of µ. Therefore, there is
     a finite behavior ψ beginning with the last state of µ and containing at
     least two states such that the last state of ψ satisfies Q and every step
     of ψ satisfies [Next]⁺_x. Let Φ(µ) equal Tail(ψ). Then µ ◦ Φ(µ) satisfies
     S by definition of S, because µ satisfies Init and □[Next]_x and every
     step of ψ is a [Next]⁺_x step; and the last state of µ ◦ Φ(µ) satisfies Q.

Define τ_i for i ∈ N by:  τ_0 = ρ
                          τ_{i+1} = τ_i ◦ Φ(τ_i) for all i ∈ N
1.3. Q.E.D.
     Proof: Let σ be the (unique) behavior such that each τ_i is a prefix of
     σ, so ρ (which equals τ_0) is a prefix of σ. Each finite prefix of σ is a
     prefix of some τ_i and therefore satisfies S. Since S is a safety property,
     this implies σ satisfies S. For all i ∈ N, τ_{i+1} has at least one more
     state satisfying Q than τ_i does, so σ satisfies □◇Q.

2. S ≡ C(S ∧ □◇Q)
   Proof: By step 1 and Theorem 4.4.
3. Q.E.D.
   Proof: By step 2, the assumption that P is a safety property, and The-
   orem 4.2.
Index

The index obviously needs to be added.

