Semantics of Programming Languages
Peter Sewell
Computer Laboratory
University of Cambridge
Schedule:
Lectures 1–8: LT1, MWF 11am, 26 Jan – 11 Feb
Lectures 9–12: LT1, MWF 11am, 27 Feb – 6 March
Contents
Syllabus 3
Learning Guide 4
Summary of Notation 5
1 Introduction 8
2 A First Imperative Language 12
2.1 Operational Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Typing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 L1: Collected Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3 Induction 42
3.1 Abstract Syntax and Structural Induction . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Inductive Definitions and Rule Induction . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Example Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Inductive Definitions, More Formally (optional) . . . . . . . . . . . . . . . . . . . . . 61
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4 Functions 63
4.1 Function Preliminaries: Abstract Syntax up to Alpha Conversion, and Substitution . 65
4.2 Function Behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3 Function Typing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.4 Local Definitions and Recursive Functions . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.6 L2: Collected Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5 Data 86
5.1 Products, Sums, and Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 Mutable Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.3 Evaluation Contexts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.4 L3: Collected Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6 Subtyping and Objects 100
6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7 Semantic Equivalence 107
7.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8 Concurrency 113
8.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
9 Low-level semantics 122
10 Epilogue 122
Syllabus
This course is a prerequisite for Types (Part II), Denotational Semantics (Part II), and
Topics in Concurrency (Part II).
Aims
The aim of this course is to introduce the structural, operational approach to program-
ming language semantics. It will show how to specify the meaning of typical programming
language constructs, in the context of language design, and how to reason formally about
semantic properties of programs.
Lectures
• Introduction. Transition systems. The idea of structural operational semantics.
Transition semantics of a simple imperative language. Language design options.
• Types. Introduction to formal type systems. Typing for the simple imperative lan-
guage. Statements of desirable properties.
• Induction. Review of mathematical induction. Abstract syntax trees and struc-
tural induction. Rule-based inductive definitions and proofs. Proofs of type safety
properties.
• Functions. Call-by-name and call-by-value function application, semantics and typ-
ing. Local recursive definitions.
• Data. Semantics and typing for products, sums, records, references.
• Subtyping. Record subtyping and simple object encoding.
• Semantic equivalence. Semantic equivalence of phrases in a simple imperative lan-
guage, including the congruence property. Examples of equivalence and non-equivalence.
• Concurrency. Shared variable interleaving. Semantics for simple mutexes; a serial-
izability property.
• Low-level semantics. Monomorphic typed assembly language.
Objectives
At the end of the course students should
• be familiar with rule-based presentations of the operational semantics and type systems
for some simple imperative, functional and interactive program constructs
• be able to prove properties of an operational semantics using various forms of induction
(mathematical, structural, and rule-based)
• be familiar with some operationally-based notions of semantic equivalence of program
phrases and their basic properties
Recommended reading
Hennessy, M. (1990). The semantics of programming languages. Wiley. Out of print, but
available on the web at http://www.cogs.susx.ac.uk/users/matthewh/semnotes.ps.gz
* Pierce, B.C. (2002). Types and programming languages. MIT Press.
Winskel, G. (1993). The formal semantics of programming languages. MIT Press.
Learning Guide
Books:
• Hennessy, M. (1990). The Semantics of Programming Languages. Wiley. Out of
print, but available on the web at http://www.cogs.susx.ac.uk/users/matthewh/
semnotes.ps.gz.
Introduces many of the key topics of the course.
• Pierce, B. C. (2002) Types and Programming Languages. MIT Press.
This is a graduate-level text, covering a great deal of material on programming language
semantics. The first half (through to Chapter 15) is relevant to this course, and some of the
later material relevant to the Part II Types course.
• Pierce, B. C. (ed) (2005) Advanced Topics in Types and Programming Languages. MIT
Press.
This is a collection of articles by experts on a range of programming-language semantics topics.
Most of the details are beyond the scope of this course, but it gives a good overview of the
state of the art. The contents are listed at http://www.cis.upenn.edu/~bcpierce/attapl/.
• Winskel, G. (1993). The Formal Semantics of Programming Languages. MIT Press.
An introduction to both operational and denotational semantics; recommended for the Part
II Denotational Semantics course.
Further reading:
• Plotkin, G. D. (1981). A structural approach to operational semantics. Technical
Report DAIMI FN-19, Aarhus University.
These notes first popularised the ‘structural’ approach to operational semantics—the ap-
proach emphasised in this course—but couched solely in terms of transition relations (‘small-
step’ semantics), rather than evaluation relations (‘big-step’, ‘natural’, or ‘relational’ seman-
tics). Although somewhat dated and hard to get hold of (the Computer Laboratory Library
has a copy), they are still a mine of interesting examples.
• The two essays:
Hoare, C. A. R. Algebra and Models.
Milner, R. Semantic Ideas in Computing.
In: Wand, I. and R. Milner (Eds) (1996). Computing Tomorrow. CUP.
Two accessible essays giving somewhat different perspectives on the semantics of computation
and programming languages.
Tripos questions: This version of the course was first given in 2002–2003. The questions
since then are directly relevant, and there is an additional mock question on the course web
page. The previous version of the course (by Andrew Pitts) used a slightly different form
of operational semantics, ‘big-step’ instead of ‘small-step’ (see Page 82 of these notes), and
different example languages, so the notation in most earlier questions may seem unfamiliar
at first sight.
These questions use only small-step and should be accessible: 1998 Paper 6 Question 12,
1997 Paper 5 Question 12, and 1996 Paper 5 Question 12.
These questions use big-step, but apart from that should be ok: 2002 Paper 5 Question 9,
2002 Paper 6 Question 9, 2001 Paper 5 Question 9, 2000 Paper 5 Question 9, 1999 Paper 6
Question 9 (first two parts only), 1999 Paper 5 Question 9, 1998 Paper 5 Question 12, 1995
Paper 6 Question 12, 1994 Paper 7 Question 13, 1993 Paper 7 Question 10.
These questions depend on material which is no longer in this course (complete partial
orders, continuations, or bisimulation – see the Part II Denotational Semantics and Topics
in Concurrency courses): 2001 Paper 6 Question 9, 2000 Paper 6 Question 9, 1997 Paper 6
Question 12, 1996 Paper 6 Question 12, 1995 Paper 5 Question 12, 1994 Paper 8 Question
12, 1994 Paper 9 Question 12, 1993 Paper 8 Question 10, 1993 Paper 9 Question 10.
Feedback: Please do complete the on-line feedback form at the end of the course, and let
me know during it if you discover errors in the notes or if the pace is too fast or slow. A list
of corrections will be on the course web page.
Acknowledgements: These notes draw, with thanks, on earlier courses by Andrew Pitts,
on Benjamin Pierce’s book, and many other sources. Any errors are, of course, newly
introduced by me.
Summary of Notation
Each section is roughly in the order that notation is introduced. The grammars of the
languages are not included here, but are in the Collected Definitions of L1, L2 and L3 later
in this document.
Logic and Set Theory
Φ ∧ Φ′ and
Φ ∨ Φ′ or
Φ ⇒ Φ′ implies
¬Φ not
∀ x .Φ(x ) for all
∃ x .Φ(x ) exists
a ∈ A element of
{a1 , ..., an } the set with elements a1 , ..., an
A1 ∪ A2 union
A1 ∩ A2 intersection
A1 ⊆ A2 subset or equal
Particular sets
B = {true, false} the set of booleans
L = {l , l1 , l2 , ...} the set of locations
Z = {.., −1, 0, 1, ...} the set of integers
N = {0, 1, ...} the set of natural numbers
X = {x, y, ...} the set of L2 and L3 variables
LAB = {p, q, ...} the set of record labels
M = {m, m0 , m1 , ...} the set of mutex names
T the set of all types (in whichever language)
Tloc the set of all location types (in whichever language)
L1 the set of all L1 expressions
TypeEnv the set of all L1 type environments, finite partial functions from L to Tloc
TypeEnv2 the set of all L2 type environments, the finite partial functions from L ∪ X to Tloc ∪ T
such that ∀ ℓ ∈ dom(Γ).Γ(ℓ) ∈ Tloc and ∀ x ∈ dom(Γ).Γ(x ) ∈ T
A thread actions
Metavariables
b ∈ B boolean
n ∈ Z integer
ℓ ∈ L location
op binary operation
e, f expression (of whichever language)
v value (of whichever language)
s store (of whichever language)
T ∈ T type (of whichever language)
Tloc ∈ Tloc location type (of whichever language)
Γ type environment (also, set of propositional assumptions)
i, k, y natural numbers
c configuration (or state), typically ⟨e, s⟩ with expression e and store s
Φ formula
c tree constructor
R set of rules
(H , c) a rule with hypotheses H ⊆ A and conclusion c ∈ A for some set A
SR a subset inductively defined by the set of rules R
x ∈ X variable
σ substitution
lab ∈ LAB record label
E evaluation context
C arbitrary context
π permutation of natural numbers
m ∈ M mutex name
M state of all mutexes (a function M :M −→ B)
a thread action, for a ∈ A
Other
hole in a context
C [e] context C with e replacing the hole
1 Introduction
Peter Sewell
1B, 12 lectures
2008–9
In this course we will take a close look at programming languages. We will focus on how
one can define precisely what a programming language is – i.e., how the programs of the
language behave, or, more generally, what their meaning, or semantics, is.
Many programming languages that you meet are described only in natural language, e.g.
the English standards documents for C, Java, XML, etc. These are reasonably accessible
(though often written in ‘standardsese’), but there are some major problems. It is very
hard, if not impossible, to write really precise definitions in informal prose. The standards
often end up being ambiguous or incomplete, or just too large and hard to understand.
That leads to differing implementations and flaky systems, as the language implementors
and users do not have a common understanding of what it is. More fundamentally, natural
language standards obscure the real structure of languages – it’s all too easy to add a feature
and a quick paragraph of text without thinking about how it interacts with the rest of the
language.
Instead, as we shall see in this course, one can develop mathematical definitions of how
programs behave, using logic and set theory (e.g. the definition of Standard ML, the .NET
CLR, recent work on XQuery, etc.). These require a little more background to understand
and use, but for many purposes they are a much better tool than informal standards.
Semantics complements the study of language implementation (cf. Compiler Construction
and Optimising Compilers). We need languages to be both clearly understandable, with
precise definitions, and have good implementations.
This is true not just for the major programming languages, but also for intermediate lan-
guages (JVM, CLR), and the many, many scripting and command languages, that have
often been invented on-the-fly without sufficient thought. How many of you will do lan-
guage design? lots!
More broadly, while in this course we will look mostly at semantics for conventional pro-
gramming languages, similar techniques can be used for hardware description languages,
verification of distributed algorithms, security protocols, and so on – all manner of subtle
systems for which relying on informal intuition alone leads to error. Some of these are
explored in Specification and Verification and Topics in Concurrency.
Warmup
C♯
delegate int IntThunk();   // the delegate type used below, needed for the example to compile

class M {
  public static void Main() {
    IntThunk[] funcs = new IntThunk[11];
    for (int i = 0; i <= 10; i++)
    {
      funcs[i] = delegate() { return i; };
    }
    foreach (IntThunk f in funcs)
    {
      System.Console.WriteLine(f());
    }
  }
}
Ruby (expected)
[Ruby example from the slides]
Output:
x is 123
14
x is 123
Ruby (unexpected)
[Ruby example from the slides]
Various approaches have been used for expressing semantics.
• Operational semantics
• Denotational semantics
• Axiomatic, or Logical, semantics
...Static and dynamic semantics...
Operational: define the meaning of a program in terms of the computation steps it takes in
an idealised execution. Some definitions use structural operational semantics, in which the
intermediate states are described using the language itself; others use abstract machines,
which use more ad-hoc mathematical constructions.
Denotational: define the meaning of a program as elements of some abstract mathematical
structure, e.g. regarding programming-language functions as certain mathematical functions.
cf. the Denotational Semantics course.
Axiomatic or Logical: define the meaning of a program indirectly, by giving the axioms of
a logic of program properties. cf. Specification and Verification.
All these are dynamic semantics, describing behaviour in one way or another. In contrast
the static semantics of a language describes its compile-time typechecking.
‘Toy’ languages
Real programming languages are large, with many features and, often,
with redundant constructs – things that can be expressed in the rest of the
language.
Core
[Course-structure diagram. The core develops three languages in sequence: L1, assignment and while (Lectures 1–4); L2, functions and recursive definitions (Lectures 5–6); and L3, products, sums, records, and references (Lecture 8) – covering, along the way, operational semantics, type systems, implementations, language design choices, inductive definitions, inductive proof (structural; rule), and abstract syntax up to alpha. Branching off the core: Subtyping and Objects (Lecture 9), Semantic Equivalence (Lecture 11), TAL (Lecture 10), and Concurrency (Lecture 12).]
In the core we will develop enough techniques to deal with the semantics of a non-trivial
small language, showing some language-design pitfalls and alternatives along the way. It
will end up with the semantics of a decent fragment of ML. The second part will cover a
selection of more advanced topics.
Admin
• Not all previous Tripos questions are relevant (see the notes)
• Exercises in the notes.
• Implementations on web.
• Books (Hennessy, Pierce, Winskel)
2 A First Imperative Language
L1
L1 – Example
l2 := 0;
while !l1 ≥ 1 do (
  l2 := !l2 + !l1;
  l1 := !l1 + −1)
L1 – Syntax
Booleans b ∈ B = {true, false}
Integers n ∈ Z = {..., −1, 0, 1, ...}
Locations ℓ ∈ L = {l, l0, l1, l2, ...}
Operations op ::= + | ≥
Expressions
e ::= n | b | e1 op e2 | if e1 then e2 else e3 |
      ℓ := e | !ℓ |
      skip | e1; e2 |
      while e1 do e2
Points to note:
• we’ll return later to exactly what the set L1 is when we talk about abstract syntax
• unbounded integers
• abstract locations – can’t do pointer arithmetic on them
• untyped, so have nonsensical expressions like 3 + true
• what kind of grammar is that?
• don’t have expression/command distinction
• doesn’t much matter what basic operators we have
• carefully distinguish metavariables b, n, ℓ, op, e etc. from program locations l etc.
In order to describe the behaviour of L1 programs we will use structural operational seman-
tics to define various forms of automata:
Transition systems
To compare with the automata you saw in Regular Languages and Finite Automata: a
transition system is like an NFAε with an empty alphabet (so only ε transitions) except (a)
it can have infinitely many states, and (b) we don’t specify a start state or accepting states.
Sometimes one adds labels (e.g. to represent IO) but mostly we’ll just look at the values of
terminated states, those that cannot do any transitions.
Some handy auxiliary notation:
• −→∗ is the reflexive transitive closure of −→, so c −→∗ c′ iff there exist k ≥ 0 and
c0, ..., ck such that c = c0 −→ c1 ... −→ ck = c′.
• ↛ is a unary predicate (a subset of Config) defined by c ↛ iff ¬∃ c′. c −→ c′.
• The transition relation is deterministic if for all states c there is at most one c′ such
that c −→ c′, i.e. if ∀ c. ∀ c′, c′′. (c −→ c′ ∧ c −→ c′′) ⇒ c′ = c′′.
The particular transition systems we use for L1 are as follows.
L1 Semantics (1 of 4) – Configurations
Configurations are pairs ⟨e, s⟩ of an expression e and a store s – for example, the store
{l1 ↦ 7, l3 ↦ 23}. Transitions have the form ⟨e, s⟩ −→ ⟨e′, s′⟩.
A finite partial function f from a set A to a set B is a set containing a finite number n ≥ 0
of pairs {(a1, b1), ..., (an, bn)}, often written {a1 ↦ b1, ..., an ↦ bn}, for which
• ∀ i ∈ {1, .., n}. ai ∈ A (the domain is a subset of A)
• ∀ i ∈ {1, .., n}. bi ∈ B (the range is a subset of B)
• ∀ i ∈ {1, .., n}, j ∈ {1, .., n}. i ≠ j ⇒ ai ≠ aj (f is functional, i.e. each element of A is
mapped to at most one element of B)
For a partial function f, we write dom(f) for the set of elements in the domain of f (things
that f maps to something) and ran(f) for the set of elements in the range of f (things that
something is mapped to by f). For example, for the s above we have dom(s) = {l1, l3} and
ran(s) = {7, 23}. Note that a finite partial function can be empty, just {}.
We write store for the set of all stores.
Transitions are single computation steps. For example we will have:
⟨l := 2 + !l, {l ↦ 3}⟩
−→ ⟨l := 2 + 3, {l ↦ 3}⟩
−→ ⟨l := 5, {l ↦ 3}⟩
−→ ⟨skip, {l ↦ 5}⟩
↛
We want to keep on until we get to a value v, an expression in
V = B ∪ Z ∪ {skip}.
Say ⟨e, s⟩ is stuck if e is not a value and ⟨e, s⟩ ↛. For example
2 + true will be stuck.
We could define the values in a different, but equivalent, style: Say values v are expressions
from the grammar v ::= b | n | skip.
Now define the behaviour for each construct of L1 by giving some rules that (together)
define a transition relation −→.
L1 Semantics (2 of 4) – Rules (basic operations)
(op+) ⟨n1 + n2, s⟩ −→ ⟨n, s⟩ if n = n1 + n2
(op≥) ⟨n1 ≥ n2, s⟩ −→ ⟨b, s⟩ if b = (n1 ≥ n2)

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(op1)
⟨e1 op e2, s⟩ −→ ⟨e1′ op e2, s′⟩

⟨e2, s⟩ −→ ⟨e2′, s′⟩
(op2)
⟨v op e2, s⟩ −→ ⟨v op e2′, s′⟩
How to read these? The rule (op +) says that for any instantiation of the metavariables
n, n1 and n2 (i.e. any choice of three integers) that satisfies the side condition, there is a
transition from the instantiated configuration on the left to the one on the right.
We use a strict naming convention for metavariables: n can only be instantiated by integers,
not by arbitrary expressions, cabbages, or what-have-you.
The rule (op1) says that for any instantiation of e1 , e1′ , e2 , s, s ′ (i.e. any three expressions
and two stores), if a transition of the form above the line can be deduced then we can
deduce the transition below the line. We’ll be more precise about this later.
Observe that – as you would expect – none of these first rules introduce changes in the store
part of configurations.
Example
⟨(2 + 3) + (6 + 7), ∅⟩ −→ ⟨5 + (6 + 7), ∅⟩ by (op1), with premise ⟨2 + 3, ∅⟩ −→ ⟨5, ∅⟩ by (op+);
⟨5 + (6 + 7), ∅⟩ −→ ⟨5 + 13, ∅⟩ by (op2), with premise ⟨6 + 7, ∅⟩ −→ ⟨13, ∅⟩ by (op+);
⟨5 + 13, ∅⟩ −→ ⟨18, ∅⟩ by (op+).
⟨e, s⟩ −→ ⟨e′, s′⟩
(assign2)
⟨ℓ := e, s⟩ −→ ⟨ℓ := e′, s′⟩

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(seq2)
⟨e1; e2, s⟩ −→ ⟨e1′; e2, s′⟩
Example
⟨l := 3; l := !l, {l ↦ 0}⟩ −→ ?
⟨15 + !l, ∅⟩ −→ ?
L1 Semantics (4 of 4) – The rest (conditionals and while)
(if1) ⟨if true then e2 else e3, s⟩ −→ ⟨e2, s⟩
(if2) ⟨if false then e2 else e3, s⟩ −→ ⟨e3, s⟩

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(if3)
⟨if e1 then e2 else e3, s⟩ −→ ⟨if e1′ then e2 else e3, s′⟩

(while)
⟨while e1 do e2, s⟩ −→ ⟨if e1 then (e2; while e1 do e2) else skip, s⟩

Example
If e is the while example from before, ⟨e, s⟩ −→∗ ?
Determinacy
Theorem 1 (Determinacy) If ⟨e, s⟩ −→ ⟨e1, s1⟩ and ⟨e, s⟩ −→ ⟨e2, s2⟩ then ⟨e1, s1⟩ = ⟨e2, s2⟩.
Note that top-level universal quantifiers are usually left out – the theorem really says “For
all e, s, e1, s1, e2, s2, if ⟨e, s⟩ −→ ⟨e1, s1⟩ and ⟨e, s⟩ −→ ⟨e2, s2⟩ then ⟨e1, s1⟩ = ⟨e2, s2⟩”.
L1 Implementation
Will implement an interpreter for L1, following the definition. Use mosml
(Moscow ML) as the implementation language, as datatypes and pattern
matching are good for this kind of thing.
We’ve chosen to represent locations as strings, rather arbitrarily (really, so they pretty-print
trivially). A lower-level implementation would use ML references or, even lower, machine
pointers.
In the semantics, a store is a finite partial function from locations to integers. In the
implementation, we represent a store as a list of loc*int pairs containing, for each ℓ in the
domain of the store and mapped to n, exactly one element of the form (l,n). The order of
the list will not be important. This is not a very efficient implementation, but it is simple.
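For example, looking up a location in such a store can be done with a simple recursive function – a minimal sketch, which the actual l1.ml may phrase differently:

(* store lookup: NONE if l is not in the domain of s *)
fun lookup ([], l) = NONE
  | lookup ((l',n') :: pairs, l) =
      if l = l' then SOME n' else lookup (pairs, l)

Since this is polymorphic in the range type, the same function can later be reused for type environments represented the same way.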
datatype expr =
Integer of int
| Boolean of bool
| Op of expr * oper * expr
| If of expr * expr * expr
| Assign of loc * expr
| Deref of loc
| Skip
| Seq of expr * expr
| While of expr * expr
The expression and operation datatypes have essentially the same form as the abstract
grammar. Note, though, that it does not exactly match the semantics, as that allowed
arbitrary integers whereas here we use the bounded Moscow ML integers – so not every
term of the abstract syntax is representable as an element of type expr, and the interpreter
will fail with an overflow exception if + overflows.
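For this datatype to compile we also need the loc and oper declarations, which are not shown at this point; presumably (a sketch, with names as used by the rest of the code) they are:

type loc = string                (* locations represented as strings *)
datatype oper = Plus | GTEQ      (* the two L1 operations, + and >= *)

(In the real file these must of course precede the expr datatype.)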
18
Store operations
(you might think it would be better ML style to use exceptions instead of these options;
that would be fine).
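The reduce code below also uses a test for values; a minimal version, following the grammar v ::= b | n | skip:

fun is_value (Integer _) = true
  | is_value (Boolean _) = true
  | is_value Skip        = true
  | is_value _           = false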
(op1), (op2)
...
if (is_value e1) then
  case reduce (e2,s) of
    SOME (e2',s') =>
      SOME (Op (e1,opr,e2'), s')
  | NONE => NONE
else
  case reduce (e1,s) of
    SOME (e1',s') =>
      SOME (Op (e1',opr,e2), s')
  | NONE => NONE )
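The fragment above shows only the context rules (op1) and (op2); the computation rules (op+) and (op≥) must also be implemented. As a sketch – using a hypothetical helper name doop; the real l1.ml may inline this in the Op case of reduce:

(* (op+), (op>=): combine two integer values; NONE models stuck
   configurations such as 3 + true *)
fun doop (Integer n1, Plus, Integer n2) = SOME (Integer (n1 + n2))
  | doop (Integer n1, GTEQ, Integer n2) = SOME (Boolean (n1 >= n2))
  | doop _ = NONE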
Note that the code depends on global properties of the semantics, including the fact that it
defines a deterministic transition system, so the comments indicating that particular lines
of code implement particular semantic rules are not the whole story.
(assign1), (assign2)
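The code for these rules is not reproduced here; a sketch of the corresponding clauses of reduce, assuming the update function over stores shown later:

| reduce (Assign (l,e), s) =
    if is_value e then
      (case e of
         Integer n =>
           (case update (s, (l,n)) of
              SOME s' => SOME (Skip, s')        (* (assign1): l in dom(s) *)
            | NONE => NONE)                     (* stuck: l not in dom(s) *)
       | _ => NONE)                             (* stuck: only integers are storable *)
    else
      (case reduce (e,s) of
         SOME (e',s') => SOME (Assign (l,e'), s')   (* (assign2) *)
       | NONE => NONE)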
Demo
The full interpreter code is available on the web, in the file l1.ml, together with a pretty-
printer and the type-checker we will come to soon. You should make it go...
(* 2002-11-08 -- Time-stamp: <2004-01-03 16:17:04 pes20> -*-SML-*- *)
(* Peter Sewell *)
That will give you a Moscow ML top level in which these definitions
are present. You can then type
doit ();
to run the interpreter on a simple example, and
doit2 ();
to run the type-checker on the same example; you can try
other examples analogously. This file doesn’t have a parser for
l1, so you’ll have to enter the abstract syntax directly, eg
This has been tested with Moscow ML version 2.00 (June 2000), but
should work with any other implementation of Standard ML. *)
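For instance – an illustrative term I am adding here, not one from l1.ml – to run l1 := 3; !l1 from a store where l1 holds 0, you might type:

evaluate (Seq (Assign ("l1", Integer 3), Deref "l1"), [("l1", 0)]);

which should reduce to the value 3, with l1 updated to 3 in the final store.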
(* *********************)
(* the abstract syntax *)
(* *********************)
datatype expr =
Integer of int
| Boolean of bool
| Op of expr * oper * expr
| If of expr * expr * expr
| Assign of loc * expr
| Deref of loc
| Skip
| Seq of expr * expr
| While of expr * expr
(* **********************************)
(* an interpreter for the semantics *)
(* **********************************)
fun update (s, (l,n)) = update’ [] s (l,n)
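The helper update' is not shown; a sketch consistent with the store representation above, returning NONE when l is not already in the domain (matching the side condition of (assign1)):

fun update' front [] (l,n) = NONE
  | update' front ((l',n') :: back) (l,n) =
      if l = l' then SOME (front @ ((l,n) :: back))
      else update' ((l',n') :: front) back (l,n)

(The accumulated front list comes back in reverse order, which is harmless since the order of the list is not important.)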
evaluate : expr * store -> (expr * store) option
• the ML groups together all the parts of each algorithm, into the
reduce, infertype, and prettyprint functions;
• the Java groups together everything to do with each clause of the
abstract syntax, in the IfThenElse, Assign, etc. classes.
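Given the one-step reduce, the evaluate function with the type shown above can be a simple loop that applies reduce until no further step is possible – a sketch, on one plausible reading of the option type (NONE for stuck non-values):

fun evaluate (e,s) =
    if is_value e then SOME (e,s)
    else case reduce (e,s) of
           NONE => NONE                     (* stuck configuration *)
         | SOME (e',s') => evaluate (e',s')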
For comparison, here is a Java implementation – with thanks to Matthew Parkinson. This
includes code for type inference (the ML code for which is on Page 37) and pretty-printing
(in l1.ml but not shown above).
Note the different code organisation between the ML and Java versions: the ML has a
datatype with a constructor for each clause of the abstract syntax grammar, and reduce
and infertype function definitions that each have a case for each of those constructors; the
Java has a subclass of Expression for each clause of the abstract syntax, each of which
defines smallStep and typecheck methods.
public class L1 {
Expression e =
new Seq(new While(new GTeq(new Deref(l2),new Deref(l1)),
new Seq(new Assign(l3, new Plus(new Deref(l1),new Deref(l3))),
new Assign(l1,new Plus(new Deref(l1),new Int(1))))
),
new Deref(l3))
;
try{
//Type check
Type t= e.typeCheck(env);
System.out.println("Program has type: " + t);
//Evaluate program
System.out.println(e + "\n \n");
while(!(e instanceof Value) ){
e = e.smallStep(s1);
//Display each step of reduction
System.out.println(e + "\n \n");
}
//Give some output
System.out.println("Program has type: " + t);
System.out.println("Result has type: " + e.typeCheck(env));
System.out.println("Result: " + e);
System.out.println("Terminating State: " + s1);
} catch (TypeError te) {
System.out.println("Error:\n" + te);
System.out.println("From code:\n" + e);
} catch (CanNotReduce cnr) {
System.out.println("Caught Following exception" + cnr);
System.out.println("While trying to execute:\n " + e);
System.out.println("In state: \n " + s1);
}
}
}
class Location {
String name;
Location(String n) {
this.name = n;
}
public String toString() {return name;}
}
class State {
java.util.HashMap store = new java.util.HashMap();
//Used for setting the initial store for testing not used by
//semantics of L1
State add(Location l, Value v) {
store.put(l,v);
return this;
}
class Environment {
java.util.HashSet env = new java.util.HashSet();
boolean contains(Location l) {
return env.contains(l);
}
}
class Type {
int type;
Type(int t) {type = t;}
public static final Type BOOL = new Type(1);
public static final Type INT = new Type(2);
public static final Type UNIT = new Type(3);
public String toString() {
switch(type) {
case 1: return "BOOL";
case 2: return "INT";
case 3: return "UNIT";
}
return "???";
}
}
Bool(boolean b) {
value = b;
}
class Seq extends Expression {
Expression exp1,exp2;
Seq(Expression e1, Expression e2) {
exp1 = e1;
exp2 = e2;
}
if(!( exp1 instanceof Value)) {
return new Plus(exp1.smallStep(state),exp2);
} else if (!( exp2 instanceof Value)) {
return new Plus(exp1, exp2.smallStep(state));
} else {
if( exp1 instanceof Int && exp2 instanceof Int ) {
return new Int(((Int)exp1).value + ((Int)exp2).value);
}
else throw new CanNotReduce("Operands are not both integers.");
}
}
public String toString(){return exp1 + " + " + exp2;}
Assign(Location l, Expression exp1) {
this.l = l;
this.exp1 = exp1;
}
Deref(Location l) {
this.l = l;
}
public String toString(){return "WHILE " + exp1 + " DO {" + exp2 +"}";}
L1 is a simple language, but it nonetheless involves several language design choices.
⟨e2, s⟩ −→ ⟨e2′, s′⟩
(op1b)
⟨e1 op e2, s⟩ −→ ⟨e1 op e2′, s′⟩

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(op2b)
⟨e1 op v, s⟩ −→ ⟨e1′ op v, s′⟩
In this language (call it L1b), operations are evaluated right-to-left instead of left-to-right.
For programmers whose first language has left-to-right reading order, left-to-right evaluation
is arguably more intuitive than right-to-left. Nonetheless, some languages are right-to-left
for efficiency reasons (e.g. OCaml bytecode).
It is important to have the same order for all operations, otherwise we certainly have a
counter-intuitive language.
One could also underspecify, taking both (op1) and (op1b) rules. That language doesn’t
have the Determinacy property.
Sometimes ordering really is not guaranteed – say for two writes l := 1; l := 2. In L1
it is defined, but if we were talking about a setting with a cache (either processors, or disk
block writes, or something) we might have to do something additional to force ordering.
Similarly if you have concurrency l := 1 | l := 2. Work on redesigning the Java Memory
Model by Doug Lea and Bill Pugh, which involves this kind of question, can be found at
http://www.cs.umd.edu/~pugh/java/memoryModel/.
One could also underspecify in a language definition but require each implementation to
use a consistent order, or require each implementation to use a consistent order for each
operator occurrence in the program source code. A great encouragement to the bugs...
Recall
Matter of taste?
Another possibility: return the old value, e.g. as in ANSI C signal handler installation
signal(n,h). Atomicity?
Recall that ⟨!ℓ, s⟩ and ⟨ℓ := n, s⟩ are stuck if ℓ ∉ dom(s). One could instead:
1. have locations implicitly hold some default value (say 0), so that dereferencing an ℓ ∉ dom(s) succeeds, or
2. allow assignment to an ℓ ∉ dom(s) to initialise that ℓ.
These would both be bad design decisions, liable to lead to ghastly bugs, with locations
initialised on some code path but not others. Option 1 would be particularly awkward in
a richer language where values other than integers can be stored, where there may not be
any sensible value to default-initialise to.
Looking ahead, any reasonable type system will rule out, at compile-time, any program that
could reach a stuck expression of these forms.
⟨e, s⟩ −→ ⟨e′, s′⟩
(assign2)
⟨ℓ := e, s⟩ −→ ⟨ℓ := e′, s′⟩
How many operators? Obviously we want more than just + and ≥, but this is
semantically dull – in a full language we would add many more, in standard
libraries.
(Beware: it’s not completely dull – e.g. floating point specs! Even the L1 implementation
and semantics aren’t in step.)
L1: Collected Definition
Syntax
Booleans b ∈ B = {true, false}
Integers n ∈ Z = {..., −1, 0, 1, ...}
Locations ℓ ∈ L = {l, l0, l1, l2, ...}
Operations op ::= + | ≥
Expressions
e ::= n | b | e1 op e2 | if e1 then e2 else e3 |
      ℓ := e | !ℓ |
      skip | e1; e2 |
      while e1 do e2

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(op1)
⟨e1 op e2, s⟩ −→ ⟨e1′ op e2, s′⟩

⟨e2, s⟩ −→ ⟨e2′, s′⟩
(op2)
⟨v op e2, s⟩ −→ ⟨v op e2′, s′⟩

(deref) ⟨!ℓ, s⟩ −→ ⟨n, s⟩ if ℓ ∈ dom(s) and s(ℓ) = n

(assign1) ⟨ℓ := n, s⟩ −→ ⟨skip, s + {ℓ ↦ n}⟩ if ℓ ∈ dom(s)

⟨e, s⟩ −→ ⟨e′, s′⟩
(assign2)
⟨ℓ := e, s⟩ −→ ⟨ℓ := e′, s′⟩

(seq1) ⟨skip; e2, s⟩ −→ ⟨e2, s⟩

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(seq2)
⟨e1; e2, s⟩ −→ ⟨e1′; e2, s′⟩
Expressiveness
Is L1 expressive enough to write interesting programs?
• yes, in principle (it is Turing-powerful)
• no: there’s no support for gadgets like functions, objects, lists, trees,
modules, .....
2.2 Typing
L1 Typing
Type systems are used, among other things, for preventing certain kinds of run-time errors.
Type systems are also used to provide information to compiler optimisers; to enforce security
properties, from simple absence of buffer overflows to sophisticated information-flow policies;
and (in research languages) for many subtle properties, e.g. type systems that allow only
polynomial-time computation. There are rich connections with logic, which we’ll return to
later.
Run-time errors
Trapped errors. Cause execution to halt immediately (e.g. jumping to an
illegal address, raising a top-level exception, etc.). Innocuous?
Untrapped errors. May go unnoticed for a while and later cause arbitrary
behaviour (e.g. accessing data past the end of an array, security
loopholes in Java abstract machines, etc.). Insidious!
We cannot expect to exclude all trapped errors, e.g. arithmetic overflows or out-of-memory
errors, but we certainly want to exclude all untrapped errors.
So, how to do so? We can use runtime checks and compile-time checks – and we want
compile-time checks where possible.
Note that the last is excluded despite the fact that when you execute the program you will
always get an int – type systems define approximations to the behaviour of programs, often
quite crude – and this has to be so, as we generally would like them to be decidable, so that
compilation is guaranteed to terminate.
Types for L1
Types of expressions:
T ::= int | bool | unit
Types of locations:
Tloc ::= intref
Write T and Tloc for the sets of all terms of these grammars.
Let Γ range over TypeEnv, the finite partial functions from locations L
to Tloc. Notation: write a Γ as l1:intref, ..., lk:intref instead of
{l1 ↦ intref, ..., lk ↦ intref}.
• concretely, T = {int, bool, unit} and Tloc = {intref}.
• in this (very small!) language, there is only one type in Tloc, so a Γ is (up to isomor-
phism) just a set of locations. Later, Tloc will be more interesting...
• our semantics only lets you store integers, so we have stratified types into T and Tloc.
If you wanted to store other values, you’d need more location types (as we will have
later, for L3).
Γ ⊢ e1:int    Γ ⊢ e2:int
(op+)
Γ ⊢ e1 + e2:int

Γ ⊢ e1:int    Γ ⊢ e2:int
(op≥)
Γ ⊢ e1 ≥ e2:bool
Γ ⊢ e1 :bool
Γ ⊢ e2 :T
Γ ⊢ e3 :T
(if)
Γ ⊢ if e1 then e2 else e3 :T
Note that in (if) the T is arbitrary, so long as both premises have the same T .
In some rules we arrange the premises vertically, e.g.
Γ ⊢ e1 :int
Γ ⊢ e2 :int
(op +)
Γ ⊢ e1 + e2 :int
but this is merely visual layout, equivalent to the horizontal layout below. Derivations using
such a rule should be written as if it was in the horizontal form.
Example
{} ⊢ false:bool (bool)    {} ⊢ 2:int (int)    ∇
(if)
{} ⊢ if false then 2 else 3 + 4:int
where ∇ is
{} ⊢ 3:int (int)    {} ⊢ 4:int (int)
(op+)
{} ⊢ 3 + 4:int
Γ(ℓ) = intref
Γ ⊢ e:int
(assign)
Γ ⊢ ℓ := e:unit
(deref)
Γ(ℓ) = intref
Γ ⊢!ℓ:int
(skip) Γ ⊢ skip:unit
Γ ⊢ e1 :unit
Γ ⊢ e2 :T
(seq)
Γ ⊢ e1 ; e2 :T
Γ ⊢ e1 :bool
Γ ⊢ e2 :unit
(while)
Γ ⊢ while e1 do e2 :unit
Note that the typing rules are syntax-directed – for each clause of the abstract syntax for
expressions there is exactly one rule with a conclusion of that form.
Properties
From these two properties – Progress and Type Preservation, stated precisely below – we
have that well-typed programs don’t get stuck:
(we’ll discuss how to prove these results soon)
Semantic style: one could make an explicit definition of which configurations are runtime
errors. Here, instead, those configurations are just stuck.
For L1 we don’t need to type the range of the store, as by definition all stored things are
integers.
The second problem is usually harder than the first. Solving it usually results
in a type inference algorithm: computing a type T for a phrase e, given a
type environment Γ (or failing, if there is none).
More Properties
In the semantics, type environments Γ are partial functions from locations to the singleton
set {intref}. Here, just as we did for stores, we represent them as a list of (loc * type_loc)
pairs containing, for each ℓ in the domain of the type environment, exactly one element of
the form (l,intref).
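The infertype code below pattern-matches on constructors int, bool, unit, and intref, so it presumably relies on datatype declarations along these lines (a sketch; the names type_L1 and type_loc are my guesses):

datatype type_L1  = int | bool | unit    (* expression types T *)
datatype type_loc = intref               (* location types Tloc *)

and on the same association-list lookup function as used for stores.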
The Type Inference Algorithm
fun infertype gamma (Integer n) = SOME int
  | infertype gamma (Boolean b) = SOME bool
  | infertype gamma (Op (e1,opr,e2))
    = (case (infertype gamma e1, opr, infertype gamma e2) of
         (SOME int, Plus, SOME int) => SOME int
       | (SOME int, GTEQ, SOME int) => SOME bool
       | _ => NONE)
  | infertype gamma (If (e1,e2,e3))
    = (case (infertype gamma e1, infertype gamma e2, infertype gamma e3) of
         (SOME bool, SOME t2, SOME t3) =>
           if t2 = t3 then SOME t2 else NONE
       | _ => NONE)
  | infertype gamma (Deref l)
    = (case lookup (gamma,l) of
         SOME intref => SOME int
       | NONE => NONE)
  | infertype gamma (Assign (l,e))
    = (case (lookup (gamma,l), infertype gamma e) of
         (SOME intref, SOME int) => SOME unit
       | _ => NONE)
  | infertype gamma (Skip) = SOME unit
  | infertype gamma (Seq (e1,e2))
    = (case (infertype gamma e1, infertype gamma e2) of
         (SOME unit, SOME t2) => SOME t2
       | _ => NONE)
  | infertype gamma (While (e1,e2))
    = (case (infertype gamma e1, infertype gamma e2) of
         (SOME bool, SOME unit) => SOME unit )
ahem.
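The ‘ahem’ flags a problem: the While case has a non-exhaustive match, so an ill-typed loop makes infertype raise a Match exception rather than return NONE. The fix is one more line, in the same style as the other cases:

| infertype gamma (While (e1,e2))
  = (case (infertype gamma e1, infertype gamma e2) of
       (SOME bool, SOME unit) => SOME unit
     | _ => NONE)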
The Type Inference Algorithm – If
...
| infertype gamma (If (e1,e2,e3))
= (case (infertype gamma e1,
infertype gamma e2,
infertype gamma e3) of
(SOME bool, SOME t2, SOME t3) =>
if t2=t3 then SOME t2 else NONE
| _ => NONE)
Γ ⊢ e1 :bool
Γ ⊢ e2 :T
Γ ⊢ e3 :T
(if)
Γ ⊢ if e1 then e2 else e3 :T
...
| infertype gamma (Deref l)
= (case lookup (gamma,l) of
SOME intref => SOME int
| NONE => NONE)
...
(deref)
Γ(ℓ) = intref
Γ ⊢!ℓ:int
Again, the code depends on a uniqueness property (Theorem 7), without which we would
have to have infertype return a list of all the possible types.
Demo
Executing L1 in Moscow ML
where s is the store {l1 7→ n1 , ..., lk 7→ nk } and all locations that occur
in e are contained in {l1 , ..., lk }.
Some languages build the type system into the syntax. Original FORTRAN, BASIC etc.
had typing built into variable names (e.g. those beginning with I or J storing integers).
Sometimes one has typing built into the grammar, with e.g. separate grammatical classes
of expressions and commands. As the type systems become more expressive, however, they
quickly go beyond what can be captured in context-free grammars. They must then be
separated from lexing and parsing, both conceptually and in implementations.
2.3 L1: Collected Definition
Syntax
Booleans b ∈ B = {true, false}
Integers n ∈ Z = {..., −1, 0, 1, ...}
Locations ℓ ∈ L = {l, l0, l1, l2, ...}
Operations op ::= + | ≥
Expressions
e ::= n | b | e1 op e2 | if e1 then e2 else e3 |
      ℓ := e | !ℓ |
      skip | e1; e2 |
      while e1 do e2
Operational Semantics
Note that for each construct there are some computation rules, doing ‘real work’, and some
context (or congruence) rules, allowing subcomputations and specifying their order.
Say stores s are finite partial functions from L to Z. Say values v are expressions from the
grammar v ::= b | n | skip.

(op+) ⟨n1 + n2, s⟩ −→ ⟨n, s⟩ if n = n1 + n2
(op≥) ⟨n1 ≥ n2, s⟩ −→ ⟨b, s⟩ if b = (n1 ≥ n2)

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(op1)
⟨e1 op e2, s⟩ −→ ⟨e1′ op e2, s′⟩

⟨e2, s⟩ −→ ⟨e2′, s′⟩
(op2)
⟨v op e2, s⟩ −→ ⟨v op e2′, s′⟩

(deref) ⟨!ℓ, s⟩ −→ ⟨n, s⟩ if ℓ ∈ dom(s) and s(ℓ) = n

(assign1) ⟨ℓ := n, s⟩ −→ ⟨skip, s + {ℓ ↦ n}⟩ if ℓ ∈ dom(s)

⟨e, s⟩ −→ ⟨e′, s′⟩
(assign2)
⟨ℓ := e, s⟩ −→ ⟨ℓ := e′, s′⟩

(seq1) ⟨skip; e2, s⟩ −→ ⟨e2, s⟩

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(seq2)
⟨e1; e2, s⟩ −→ ⟨e1′; e2, s′⟩

(if1) ⟨if true then e2 else e3, s⟩ −→ ⟨e2, s⟩
(if2) ⟨if false then e2 else e3, s⟩ −→ ⟨e3, s⟩

⟨e1, s⟩ −→ ⟨e1′, s′⟩
(if3)
⟨if e1 then e2 else e3, s⟩ −→ ⟨if e1′ then e2 else e3, s′⟩

(while)
⟨while e1 do e2, s⟩ −→ ⟨if e1 then (e2; while e1 do e2) else skip, s⟩
Typing
Types of expressions:
T ::= int | bool | unit
Types of locations:
Tloc ::= intref
Write T and Tloc for the sets of all terms of these grammars.
Let Γ range over TypeEnv, the finite partial functions from locations L to Tloc .
Γ ⊢ e1:int    Γ ⊢ e2:int
(op+)
Γ ⊢ e1 + e2:int

Γ ⊢ e1:int    Γ ⊢ e2:int
(op≥)
Γ ⊢ e1 ≥ e2:bool
Γ ⊢ e1 :bool
Γ ⊢ e2 :T
Γ ⊢ e3 :T
(if)
Γ ⊢ if e1 then e2 else e3 :T
Γ(ℓ) = intref
Γ ⊢ e:int
(assign)
Γ ⊢ ℓ := e:unit
Γ(ℓ) = intref
(deref)
Γ ⊢!ℓ:int
(skip) Γ ⊢ skip:unit
Γ ⊢ e1 :unit
Γ ⊢ e2 :T
(seq)
Γ ⊢ e1 ; e2 :T
Γ ⊢ e1 :bool
Γ ⊢ e2 :unit
(while)
Γ ⊢ while e1 do e2 :unit
2.4 Exercises
Exercise 1 ⋆Write a program to compute the factorial of the integer initially in location
l1 . Take care to ensure that your program really is an expression in L1.
Exercise 2 ⋆Give full derivations of all the reduction steps of h(l0 := 7); (l1 := (!l0 +
2)), {l0 7→ 0, l1 7→ 0}i.
Exercise 3 ⋆Give full derivations of the first four reduction steps of the he, si of the first
L1 example.
Exercise 4 ⋆Adapt the implementation code to correspond to the two rules (op1b) and
(op2b). Give some test cases that distinguish between the original and the new semantics.
Exercise 5 ⋆Adapt the implementation code to correspond to the two rules (assign1’) and
(seq1’). Give some test cases that distinguish between the original and the new semantics.
Exercise 6 ⋆⋆Fix the L1 implementation to match the semantics, taking care with the
representation of integers.
Exercise 7 ⋆⋆Fix the L1 semantics to match the implementation, taking care with the
representation of integers.
Exercise 8 ⋆Give a type derivation for (l0 := 7); (l1 := (!l0 +2)) with Γ = l0 :intref, l1 :intref.
Exercise 9 ⋆Give a type derivation for the e on Page 17 with Γ = l1 :intref, l2 :intref, l3 :intref
.
Exercise 10 ⋆Does Type Preservation hold for the variant language with rules (assign1’)
and (seq1’)? If not, give an example, and show how the type rules could be adjusted to make
it true.
Exercise 11 ⋆Adapt the type inference implementation to match your revised type system
from Exercise 10.
Exercise 12 ⋆Check whether mosml, the L1 implementation and the L1 semantics agree
on the order of evaluation for operators and sequencing.
Exercise 13 ⋆ (just for fun) Adapt the implementation to output derivation trees in
ASCII (or to show where proof search gets stuck) for −→ or ⊢.
3 Induction
Induction
We’ve stated several ‘theorems’, but how do we know they are true?
We also use the proof process for strengthening our intuition about subtle
language features, and for debugging definitions – it helps you examine all
the various cases.
We prove facts about all elements of a relation defined by rules (e.g. the L1
transition relation, or the L1 typing relation) by rule induction.
We shall see that all three kinds of induction – mathematical, structural,
and rule – boil down to induction over certain trees.
Principle of Mathematical Induction: for any property Φ(x) of natural numbers, to prove
∀ x ∈ N.Φ(x)
it’s enough to prove Φ(0) and ∀ x ∈ N.Φ(x) ⇒ Φ(x + 1), since
(Φ(0) ∧ (∀ x ∈ N.Φ(x) ⇒ Φ(x + 1))) ⇒ ∀ x ∈ N.Φ(x)
For example, to prove
Theorem 8 1 + 2 + ... + x = 1/2 ∗ x ∗ (x + 1) .
(state Φ explicitly)
Proof We prove ∀ x .Φ(x ), where
def
Φ(x ) = (1 + 2 + ... + x = 1/2 ∗ x ∗ (x + 1))
(state the induction principle you’re using)
by mathematical induction.
(Now show each conjunct of the premise of the induction principle)
(instantiate Φ)
Φ(0) is (1 + ... + 0 = 1/2 ∗ 0 ∗ (0 + 1)), which holds as both sides are equal to 0.
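For the inductive step, assume Φ(x), i.e. 1 + 2 + ... + x = 1/2 ∗ x ∗ (x + 1). Then
1 + 2 + ... + x + (x + 1) = 1/2 ∗ x ∗ (x + 1) + (x + 1) = 1/2 ∗ (x + 1) ∗ (x + 2)
which is exactly Φ(x + 1), completing the induction.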
3.1 Abstract Syntax and Structural Induction
How to prove facts about all expressions, e.g. Determinacy for L1?
Abstract Syntax
Recall we said:
[abstract syntax tree diagram: an if then else node whose three children are a ≥ node (over !l and 0), a skip leaf, and a ; node (over skip and an assignment l := (...) whose right-hand side is itself a small + tree)]
1 + 2 + 3 – ambiguous
(1 + 2) + 3 ≠ 1 + (2 + 3)

      +            +
     / \          / \
    +   3        1   +
   / \              / \
  1   2            2   3

Parentheses are only used for disambiguation – they are not part of the
grammar. 1 + 2 = (1 + 2) = ((1 + 2)) = (((((1)))) + ((2)))
All those are (sometimes) useful ways of looking at expressions (for lexing and parsing you
start with (1) and (2)), but for semantics we don’t want to be distracted by concrete syntax
– it’s easiest to work with abstract syntax trees, which for this grammar are finite trees,
with ordered branches, labelled as follows:
• leaves (nullary nodes) labelled by B ∪ Z ∪ ({!} ∗ L) ∪ {skip} = {true, false, skip} ∪
{..., −1, 0, 1, ...} ∪ {!l , !l1 , !l2 , ...}.
• unary nodes labelled by {l :=, l1 :=, l2 :=, ...}
• binary nodes labelled by {+, ≥, :=, ; , while do }
• ternary nodes labelled by {if then else }
Abstract grammar suggests a concrete syntax – we write expressions as strings just for
convenience, using parentheses to disambiguate where required and infix/mixfix notation,
but really mean trees. Arguments about exactly what concrete syntax a language should
have – beloved amongst computer scientists everywhere – do not belong in a semantics
course.
Just as for natural numbers to prove ∀ x ∈ N.Φ(x ) it was enough to prove Φ(0) and all
the implications Φ(x ) ⇒ Φ(x + 1) (for arbitrary x ∈ N), here to prove ∀ e ∈ L1 .Φ(e)
it is enough to prove Φ(c) for each nullary tree constructor c and all the implications
(Φ(e1 ) ∧ ... ∧ Φ(ek )) ⇒ Φ(c(e1 , .., ek )) for each tree constructor of arity k ≥ 1 (and for
arbitrary e1 ∈ L1 , .., ek ∈ L1 ).
∀ e ∈ L1 .Φ(e)
it’s enough to prove for each tree constructor c (taking k ≥ 0 arguments)
that if Φ holds for the subtrees e1 , .., ek then Φ holds for the tree
c(e1 , .., ek ). i.e.
∀ c.∀ e1 , .., ek .(Φ(e1 ) ∧ ... ∧ Φ(ek )) ⇒ Φ(c(e1 , .., ek )) ⇒ ∀ e.Φ(e)
where the tree constructors (or node labels) c are n , true, false, !l , skip,
l :=, while do , if then else , etc.
nullary: Φ(skip)
∀ b ∈ {true, false}.Φ(b)
∀ n ∈ Z.Φ(n)
∀ ℓ ∈ L.Φ(!ℓ)
unary: ∀ ℓ ∈ L.∀ e.Φ(e) ⇒ Φ(ℓ := e)
binary: ∀ op .∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(e1 op e2 )
∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(e1 ; e2 )
∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(while e1 do e2 )
ternary: ∀ e1 , e2 , e3 .(Φ(e1 ) ∧ Φ(e2 ) ∧ Φ(e3 )) ⇒ Φ(if e1 then e2 else e3 )
(See how this comes directly from the grammar)
If you think of the natural numbers as the abstract syntax trees of the grammar n ::= zero |
succ (n) then Structural Induction for that grammar is exactly the same as the Principle
of Mathematical Induction.
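To see the correspondence concretely, here is a structural recursion over the interpreter’s expr datatype (a sketch I am adding, not from l1.ml), computing the locations occurring in an expression – one clause per tree constructor, exactly mirroring the case structure of a structural-induction proof:

fun locs (Integer _)     = []
  | locs (Boolean _)     = []
  | locs Skip            = []
  | locs (Deref l)       = [l]
  | locs (Assign (l,e))  = l :: locs e
  | locs (Op (e1,_,e2))  = locs e1 @ locs e2
  | locs (Seq (e1,e2))   = locs e1 @ locs e2
  | locs (While (e1,e2)) = locs e1 @ locs e2
  | locs (If (e1,e2,e3)) = locs e1 @ locs e2 @ locs e3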
Proving Determinacy (Outline)
def
Φ(e) = ∀ s, e ′ , s ′ , e ′′ , s ′′ .
(he, si −→ he ′ , s ′ i ∧ he, si −→ he ′′ , s ′′ i)
⇒ he ′ , s ′ i = he ′′ , s ′′ i
nullary: Φ(skip)
∀ b ∈ {true, false}.Φ(b)
∀ n ∈ Z.Φ(n)
∀ ℓ ∈ L.Φ(!ℓ)
unary: ∀ ℓ ∈ L.∀ e.Φ(e) ⇒ Φ(ℓ := e)
binary: ∀ op .∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(e1 op e2 )
∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(e1 ; e2 )
∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(while e1 do e2 )
ternary: ∀ e1 , e2 , e3 .(Φ(e1 ) ∧ Φ(e2 ) ∧ Φ(e3 )) ⇒ Φ(if e1 then e2 else e3 )
How to prove facts about all elements of the L1 typing relation or the L1
reduction relation, e.g. Progress or Type Preservation?
Have to pay attention to what the elements of these relations really are...
Inductive Definitions
⟨e1, s⟩ −→ ⟨e1′, s′⟩
(op1)
⟨e1 op e2, s⟩ −→ ⟨e1′ op e2, s′⟩
For each rule we can construct the set of all concrete rule instances,
taking all values of the metavariables that satisfy the side condition. For
example, for (op + ) and (op1) we take all values of n1 , n2 , s, n
(satisfying n = n1 + n2 ) and of e1 , e2 , s, e1′ , s ′ .
(op+) ⟨2 + 2, {}⟩ −→ ⟨4, {}⟩,    (op+) ⟨2 + 3, {}⟩ −→ ⟨5, {}⟩,    ...
Note the last has a premise that is not itself derivable, but nonetheless this is a legitimate
instance of (op1).
(op+)
⟨2 + 2, {}⟩ −→ ⟨4, {}⟩
(op1)
⟨(2 + 2) + 3, {}⟩ −→ ⟨4 + 3, {}⟩
(op1)
⟨(2 + 2) + 3 ≥ 5, {}⟩ −→ ⟨4 + 3 ≥ 5, {}⟩
Γ ⊢ !l:int (deref)    Γ ⊢ 2:int (int)
(op+)
Γ ⊢ (!l + 2):int              Γ ⊢ 3:int (int)
(op+)
Γ ⊢ (!l + 2) + 3:int
and he, si −→ he ′ , s ′ i is an element of the reduction relation
(resp. Γ ⊢ e:T is an element of the typing relation) iff there is a
derivation with that as the root node.
Now, to prove something about an inductively-defined set...
For any property Φ(a) of elements a of A, and any set of rules which
define a subset SR of A, to prove
∀ a ∈ SR .Φ(a)
it’s enough to prove that {a | Φ(a)} is closed under the rules, ie for each
concrete rule instance
h1 .. hk
c
if Φ(h1 ) ∧ ... ∧ Φ(hk ) then Φ(c).
For some proofs a slightly different principle is useful – this variant allows you to assume
each of the hi are themselves members of SR .
For any property Φ(a) of elements a of A, and any set of rules which
inductively define the set SR , to prove
∀ a ∈ SR .Φ(a)
Principle of Rule Induction (variant form): to prove Φ(a) for all a in the
set SR , it’s enough to prove that for each concrete rule instance
h1 .. hk
c
if Φ(h1 ) ∧ ... ∧ Φ(hk ) ∧ h1 ∈ SR ∧ ... ∧ hk ∈ SR then Φ(c).
Γ ⊢ !l:int (deref)    Γ ⊢ 2:int (int)
(op+)
Γ ⊢ (!l + 2):int              Γ ⊢ 3:int (int)
(op+)
Γ ⊢ (!l + 2) + 3:int
Example Proofs
In the notes there are detailed example proofs for Determinacy (structural
induction), Progress (rule induction on type derivations), and Type
Preservation (rule induction on reduction derivations).
When is a proof a proof?
What’s a proof?
Remember – the point is to use the mathematics to help you think about things that are too
complex to keep in your head all at once: to keep track of all the cases etc. To do that, and
to communicate with other people, it’s important to write down the reasoning and proof
structure as clearly as possible. After you’ve done a proof you should give it to someone
(your supervision partner first, perhaps) to see if they (a) can understand what you’ve said,
and (b) if they believe it.
1. proof lets you see (and explain) why they are obvious
Theorem 1 (Determinacy) If he, si −→ he1 , s1 i and he, si −→ he2 , s2 i then he1 , s1 i =
he2 , s2 i .
Proof Take
def
Φ(e) = ∀ s, e ′ , s ′ , e ′′ , s ′′ .(he, si −→ he ′ , s ′ i ∧ he, si −→ he ′′ , s ′′ i) ⇒ he ′ , s ′ i = he ′′ , s ′′ i
We show ∀ e.Φ(e) by structural induction.
Cases skip, true, false, n. Values have no transitions (see the Lemma below), so Φ
holds vacuously.
Case !ℓ. Both transitions must be instances of (deref), which determines he ′ , s ′ i
from ℓ and s, so e ′ = e ′′ and s ′ = s ′′ .
Case ℓ := e. Suppose Φ(e) (then we have to show Φ(ℓ := e)).
Take arbitrary s, e ′ , s ′ , e ′′ , s ′′ such that hℓ := e, si −→ he ′ , s ′ i ∧ hℓ := e, si −→
he ′′ , s ′′ i.
It’s handy to have this lemma:
Lemma 1 For all e ∈ L1 , if e is a value then ∀ s.¬ ∃e ′ , s ′ .he, si −→
he ′ , s ′ i.
Proof By defn e is a value if it is of one of the forms n, b, skip. By
examination of the rules on slides ..., there is no rule with conclusion
of the form he, si −→ he ′ , s ′ i for e one of n, b, skip.
The only rules which could be applicable, for each of the two transitions, are
(assign1) and (assign2).
case hℓ := e, si −→ he ′ , s ′ i is an instance of (assign1). Then for some n we have
e = n and ℓ ∈ dom(s) and e ′ = skip and s ′ = s + {ℓ 7→ n}.
case hℓ := n, si −→ he ′′ , s ′′ i is an instance of (assign1) (note we are using
the fact that e = n here). Then e ′′ = skip and s ′′ = s + {ℓ 7→ n} so
he ′ , s ′ i = he ′′ , s ′′ i as required.
case hℓ := e, si −→ he ′′ , s ′′ i is an instance of (assign2). Then hn, si −→
he ′′ , s ′′ i, which contradicts the lemma, so this case cannot arise.
case hℓ := e, si −→ he ′ , s ′ i is an instance of (assign2). Then for some e1′ we have
he, si −→ he1′ , s ′ i (*) and e ′ = (ℓ := e1′ ).
case hℓ := e, si −→ he ′′ , s ′′ i is an instance of (assign1). Then for some n we
have e = n, which contradicts the lemma, so this case cannot arise.
case hℓ := e, si −→ he ′′ , s ′′ i is an instance of (assign2). Then for some
e1′′ we have he, si −→ he1′′ , s ′′ i(**) and e ′′ = (ℓ := e1′′ ). Now, by the
induction hypothesis Φ(e), (*) and (**) we have he1′ , s ′ i = he1′′ , s ′′ i, so
he ′ , s ′ i = hℓ := e1′ , s ′ i = hℓ := e1′′ , s ′′ i = he ′′ , s ′′ i as required.
Case e1 op e2 . Suppose Φ(e1 ) and Φ(e2 ).
Take arbitrary s, e ′ , s ′ , e ′′ , s ′′ such that he1 op e2 , si −→ he ′ , s ′ i∧he1 op e2 , si −→
he ′′ , s ′′ i.
By examining the expressions in the left-hand-sides of the conclusions of the rules,
and using the lemma above, the only possibilities are those below (you should
check why this is so for yourself).
case op = + and he1 + e2 , si −→ he ′ , s ′ i is an instance of (op+) and he1 +
e2 , si −→ he ′′ , s ′′ i is an instance of (op+ ).
Then for some n1 , n2 we have e1 = n1 , e2 = n2 , e ′ = n3 = e ′′ for n3 = n1 +n2 ,
and s ′ = s = s ′′ .
case op =≥ and he1 ≥ e2 , si −→ he ′ , s ′ i is an instance of (op≥) and he1 ≥
e2 , si −→ he ′′ , s ′′ i is an instance of (op≥).
Then for some n1 , n2 we have e1 = n1 , e2 = n2 , e ′ = b = e ′′ for b = (n1 ≥ n2 ),
and s ′ = s = s ′′ .
case he1 op e2 , si −→ he ′ , s ′ i is an instance of (op1) and he1 op e2 , si −→
he ′′ , s ′′ i is an instance of (op1).
Then for some e1′ and e1′′ we have he1 , si −→ he1′ , s ′ i (*), he1 , si −→ he1′′ , s ′′ i
(**), e ′ = e1′ op e2 , and e ′′ = e1′′ op e2 . Now, by the induction hypothesis
Φ(e1 ), (*) and (**) we have he1′ , s ′ i = he1′′ , s ′′ i, so he ′ , s ′ i = he1′ op e2 , s ′ i =
he1′′ op e2 , s ′′ i = he ′′ , s ′′ i as required.
case he1 op e2 , si −→ he ′ , s ′ i is an instance of (op2) and he1 op e2 , si −→
he ′′ , s ′′ i is an instance of (op2).
Similar, save that we use the induction hypothesis Φ(e2 ).
Case e1 ; e2 . Suppose Φ(e1 ) and Φ(e2 ).
Take arbitrary s, e ′ , s ′ , e ′′ , s ′′ such that he1 ; e2 , si −→ he ′ , s ′ i ∧ he1 ; e2 , si −→
he ′′ , s ′′ i.
By examining the expressions in the left-hand-sides of the conclusions of the rules,
and using the lemma above, the only possibilities are those below.
case e1 = skip and both transitions are instances of (seq1).
Then he ′ , s ′ i = he2 , si = he ′′ , s ′′ i.
case e1 is not a value and both transitions are instances of (seq2). Then for some
e1′ and e1′′ we have he1 , si −→ he1′ , s ′ i (*), he1 , si −→ he1′′ , s ′′ i (**), e ′ = e1′ ; e2 ,
and e ′′ = e1′′ ; e2
Then by the induction hypothesis Φ(e1 ) we have he1′ , s ′ i = he1′′ , s ′′ i, so
he ′ , s ′ i = he1′ ; e2 , s ′ i = he1′′ ; e2 , s ′′ i = he ′′ , s ′′ i as required.
Case while e1 do e2 . Suppose Φ(e1 ) and Φ(e2 ).
Take arbitrary s, e ′ , s ′ , e ′′ , s ′′ such that hwhile e1 do e2 , si −→ he ′ , s ′ i ∧
hwhile e1 do e2 , si −→ he ′′ , s ′′ i.
By examining the expressions in the left-hand-sides of the conclusions of the rules
both must be instances of (while), so he ′ , s ′ i = hif e1 then (e2 ; while e1 do e2 ) else skip, si =
he ′′ , s ′′ i.
Case if e1 then e2 else e3 . Suppose Φ(e1 ), Φ(e2 ) and Φ(e3 ).
Take arbitrary s, e ′ , s ′ , e ′′ , s ′′ such that hif e1 then e2 else e3 , si −→ he ′ , s ′ i ∧
hif e1 then e2 else e3 , si −→ he ′′ , s ′′ i.
By examining the expressions in the left-hand-sides of the conclusions of the rules,
and using the lemma above, the only possibilities are those below.
case e1 = true and both transitions are instances of (if1).
case e1 = false and both transitions are instances of (if2).
case e1 is not a value and both transitions are instances of (if3).
The first two cases are immediate; the last uses Φ(e1 ).
(check we’ve done all the cases!)
(note that the level of written detail can vary, as here – if you and the reader agree – but you
must do all the steps in your head. If in any doubt, write it down, as an aid to thought...!)
Theorem 2 (Progress) If Γ ⊢ e:T and dom(Γ) ⊆ dom(s) then either e is a value or there
exist e ′ , s ′ such that he, si −→ he ′ , s ′ i.
Proof Take
def
Φ(Γ, e, T ) = ∀ s.dom(Γ) ⊆ dom(s) ⇒ value(e) ∨ (∃ e ′ , s ′ .he, si −→ he ′ , s ′ i)
We show that for all Γ, e, T , if Γ ⊢ e:T then Φ(Γ, e, T ), by rule induction on the
definition of ⊢.
Case (int). Recall the rule scheme: (int) Γ ⊢ n:int for n ∈ Z.
It has no premises, so we have to show that for all instances Γ, e, T of the con-
clusion we have Φ(Γ, e, T ).
For any such instance, there must be an n ∈ Z for which e = n.
Now Φ is of the form ∀ s.dom(Γ) ⊆ dom(s) ⇒ ..., so consider an arbitrary s and
assume dom(Γ) ⊆ dom(s).
We have to show value(e) ∨ (∃ e ′ , s ′ .he, si −→ he ′ , s ′ i). But the first disjunct is
true as integers are values (according to the definition).
Case (bool) similar.
Case (op+ ). Recall the rule
Γ ⊢ e1 :int
Γ ⊢ e2 :int
(op +)
Γ ⊢ e1 + e2 :int
We have to show that for all Γ, e1 , e2 , if Φ(Γ, e1 , int) and Φ(Γ, e2 , int) then Φ(Γ, e1 +
e2 , int).
Suppose Φ(Γ, e1 , int) (*), Φ(Γ, e2 , int) (**), Γ ⊢ e1 :int (***), and Γ ⊢ e2 :int (****)
(note that we’re using the variant form of rule induction here).
Consider an arbitrary s. Assume dom(Γ) ⊆ dom(s).
We have to show value(e1 + e2 ) ∨ (∃ e ′ , s ′ .he1 + e2 , si −→ he ′ , s ′ i).
Now the first disjunct is false (e1 + e2 is not a value), so we have to show the
second, i.e.∃he ′ , s ′ i.he1 + e2 , si −→ he ′ , s ′ i.
By (*) one of the following holds.
case ∃ e1′ , s ′ .he1 , si −→ he1′ , s ′ i.
Then by (op1) we have he1 + e2 , si −→ he1′ + e2 , s ′ i, so we are done.
case e1 is a value. By (**) one of the following holds.
case ∃ e2′ , s ′ .he2 , si −→ he2′ , s ′ i.
Then by (op2) he1 + e2 , si −→ he1 + e2′ , s ′ i, so we are done.
case e2 is a value.
(Now want to use (op+ ), but need to know that e1 and e2 are really
integers. )
Lemma 2 for all Γ, e, T , if Γ ⊢ e:T , e is a value and T = int then for
some n ∈ Z we have e = n.
Proof By rule induction. Take Φ′ (Γ, e, T ) = ((value(e) ∧ T = int) ⇒
∃ n ∈ Z.e = n).
Case (int). ok
Case (bool),(skip). In instances of these rules the conclusion is a
value but the type is not int, so ok.
Case otherwise. In instances of all other rules the conclusion is
not a value, so ok.
(a rather trivial use of rule induction – we never needed to use the
induction hypothesis, just to do case analysis of the last rule that
might have been used in a derivation of Γ ⊢ e:T ).
Using the Lemma, (***) and (****) there exist n1 ∈ Z and n2 ∈ Z
such that e1 = n1 and e2 = n2 . Then by (op+) he1 + e2 , si −→ hn, si
where n = n1 + n2 , so we are done.
Case (op ≥ ). Similar to (op + ).
Case (if). Recall the rule
Γ ⊢ e1 :bool
Γ ⊢ e2 :T
Γ ⊢ e3 :T
(if)
Γ ⊢ if e1 then e2 else e3 :T
Suppose Φ(Γ, e1 , bool) (*1), Φ(Γ, e2 , T ) (*2), Φ(Γ, e3 , T ) (*3), Γ ⊢ e1 :bool (*4),
Γ ⊢ e2 :T (*5) and Γ ⊢ e3 :T (*6).
Consider an arbitrary s. Assume dom(Γ) ⊆ dom(s). Write e for if e1 then e2 else e3 .
This e is not a value, so we have to show he, si has a transition.
case ∃ e1′ , s ′ .he1 , si −→ he1′ , s ′ i.
Then by (if3) he, si −→ hif e1′ then e2 else e3 , si, so we are done.
case e1 is a value.
(Now want to use (if1) or (if2), but need to know that e1 ∈ {true, false}.
Realise should have proved a stronger Lemma above).
Lemma 3 For all Γ, e, T . if Γ ⊢ e:T and e is a value, then T = int ⇒
∃ n ∈ Z.e = n, T = bool ⇒ ∃ b ∈ {true, false}.e = b, and T = unit ⇒
e = skip.
Proof By rule induction – details omitted.
Using the Lemma and (*4) we have ∃ b ∈ {true, false}.e1 = b.
case b = true. Use (if1).
case b = false. Use (if2).
Case (deref). Recall the rule
Γ(ℓ) = intref
(deref)
Γ ⊢!ℓ:int
Cases (assign), (skip), (seq), (while). Left as an exercise.
Theorem 3 (Type Preservation) If Γ ⊢ e:T and dom(Γ) ⊆ dom(s) and he, si −→ he ′ , s ′ i
then Γ ⊢ e ′ :T and dom(Γ) ⊆ dom(s ′ ).
Proof First show the second part, using the following lemma.
Lemma 4 If he, si −→ he ′ , s ′ i then dom(s ′ ) = dom(s).
Proof Rule induction on derivations of he, si −→ he ′ , s ′ i. Take Φ(e, s, e ′ , s ′ ) =
(dom(s) = dom(s ′ )).
All rules are immediate uses of the induction hypothesis except (assign1),
for which we note that if ℓ ∈ dom(s) then dom(s + (ℓ 7→ n)) = dom(s).
Now prove the first part, i.e. if Γ ⊢ e:T and dom(Γ) ⊆ dom(s) and he, si −→ he ′ , s ′ i
then Γ ⊢ e ′ :T .
Prove by rule induction on derivations of he, si −→ he ′ , s ′ i.
Take Φ(e, s, e ′ , s ′ ) = ∀ Γ, T .(Γ ⊢ e:T ∧ dom(Γ) ⊆ dom(s)) ⇒ Γ ⊢ e ′ :T .
Case (op1). Recall
he1 , si −→ he1′ , s ′ i
(op1)
he1 op e2 , si −→ he1′ op e2 , s ′ i
Suppose Φ(e1 , s, e1′ , s ′ ) (*) and he1 , si −→ he1′ , s ′ i. Have to show Φ(e1 op e2 , s, e1′ op e2 , s ′ ).
Take arbitrary Γ, T . Suppose Γ ⊢ e1 op e2 :T and dom(Γ) ⊆ dom(s) (**).
case op = +. The last rule in the derivation of Γ ⊢ e1 + e2 :T must have been
(op+), so must have T = int, Γ ⊢ e1 :int (***) and Γ ⊢ e2 :int (****). By the
induction hypothesis (*), (**), and (***) we have Γ ⊢ e1′ :int. By the (op+)
rule Γ ⊢ e1′ + e2 :T .
case op =≥. Similar.
Cases (op +), (op ≥), (op2), (deref), (assign1), (assign2), (seq1), (seq2), (if1), (if2),
(if3), (while). Left as exercises.
Theorem 4 (Safety) If Γ ⊢ e:T , dom(Γ) ⊆ dom(s), and he, si −→∗ he ′ , s ′ i then either e ′
is a value or there exist e ′′ , s ′′ such that he ′ , s ′ i −→ he ′′ , s ′′ i.
Proof Hint: induction along −→∗ using the previous results.
Theorem 7 (Uniqueness of typing) If Γ ⊢ e:T and Γ ⊢ e:T ′ then T = T ′ . The proof
is left as Exercise 19.
Theorem 5 (Decidability of typeability) Given Γ, e, one can decide ∃ T .Γ ⊢ e:T .
Theorem 6 (Decidability of type checking) Given Γ, e, T , one can decide Γ ⊢ e:T .
57
Proof The implementation gives a type inference algorithm which, if correct, together
with Uniqueness implies both of these results.
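To make the decidability claim concrete, here is a minimal sketch in Moscow ML of such a type inference algorithm, for just the operator fragment of L1. The datatype constructors here are hypothetical illustrations, not those of the course implementation; decidability comes from infertype being a total function, by structural recursion.

(* Sketch: type inference for a fragment of L1 (hypothetical constructors) *)
datatype type_expr = TyInt | TyBool | TyUnit
datatype expr = Int of int | Bool of bool
              | Plus of expr * expr | GtEq of expr * expr

(* infertype e = SOME t iff e has (unique) type t, NONE if untypable *)
fun infertype (Int _) = SOME TyInt
  | infertype (Bool _) = SOME TyBool
  | infertype (Plus (e1, e2)) =
      (case (infertype e1, infertype e2) of
           (SOME TyInt, SOME TyInt) => SOME TyInt
         | _ => NONE)
  | infertype (GtEq (e1, e2)) =
      (case (infertype e1, infertype e2) of
           (SOME TyInt, SOME TyInt) => SOME TyBool
         | _ => NONE)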
Proving Progress
Principle of Rule Induction (variant form): to prove Φ(a) for all a in the set
SR defined by the rules, it's enough to prove that for each rule instance
h1 .. hk
c
with {h1 , .., hk , c} ⊆ SR , if Φ(h1 ) ∧ ... ∧ Φ(hk ) then Φ(c).
Φ(Γ, e, T ) def= ∀ s. dom(Γ) ⊆ dom(s) ⇒
value(e) ∨ (∃ e ′ , s ′ .he, si −→ he ′ , s ′ i)
Γ ⊢ e1 :int
Γ ⊢ e2 :int
(op +)
Γ ⊢ e1 + e2 :int
58
Using Φ(Γ, e1 , int) and Φ(Γ, e2 , int) we have:
Γ ⊢ !l :int by (deref), and Γ ⊢ 2:int by (int), so Γ ⊢ (!l + 2):int by (op +);
Γ ⊢ 3:int by (int), so Γ ⊢ (!l + 2) + 3:int by (op +).
Proving Determinacy
nullary: Φ(skip)
∀ b ∈ {true, false}.Φ(b)
∀ n ∈ Z.Φ(n)
∀ ℓ ∈ L.Φ(!ℓ)
unary: ∀ ℓ ∈ L.∀ e.Φ(e) ⇒ Φ(ℓ := e)
binary: ∀ op .∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(e1 op e2 )
∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(e1 ; e2 )
∀ e1 , e2 .(Φ(e1 ) ∧ Φ(e2 )) ⇒ Φ(while e1 do e2 )
ternary: ∀ e1 , e2 , e3 .(Φ(e1 ) ∧ Φ(e2 ) ∧ Φ(e3 )) ⇒ Φ(if e1 then e2 else e3 )
59
(op +) hn1 + n2 , si −→ hn, si if n = n1 + n2

he1 , si −→ he1′ , s ′ i
(op1)
he1 op e2 , si −→ he1′ op e2 , s ′ i

(if1) hif true then e2 else e3 , si −→ he2 , si

he1 , si −→ he1′ , s ′ i
(seq2)
he1 ; e2 , si −→ he1′ ; e2 , s ′ i

Φ(e) def= ∀ s, e ′ , s ′ , e ′′ , s ′′ .
(he, si −→ he ′ , s ′ i ∧ he, si −→ he ′′ , s ′′ i)
⇒ he ′ , s ′ i = he ′′ , s ′′ i

he, si −→ he ′ , s ′ i
(assign2)
hℓ := e, si −→ hℓ := e ′ , s ′ i
[diagram: the abstract syntax tree of (!l + 2) + 3, a + node whose children are + (over !l and 2) and 3]
60
Summarising Proof Techniques
Here we will be more precise about inductive definitions and rule induction. Following this
may give you a sharper understanding, but it is not itself examinable. To make an inductive
definition of a particular subset of a set A, take a set R of some concrete rule instances,
each of which is a pair (H , c) where H is a finite subset of A (the hypotheses) and c is an
element of A (the conclusion).
Consider finite trees labelled by elements of A for which every step is in R, eg
[diagram: a tree with conclusion a0 at the root, hypotheses a1 and a2 above it, and a3 above a1 ]
FR (S ) = {c | ∃ H .(H , c) ∈ R ∧ H ⊆ S }
(FR (S ) is the set of all things you can derive in exactly one step from things in S )
SR^0 = {}
SR^{k+1} = FR (SR^k )
SR^ω = ⋃_{k∈N} SR^k
Theorem 9 SR^ω = SR .
Say a subset S ⊆ A is closed under rules R if ∀(H , c) ∈ R.(H ⊆ S ) ⇒ c ∈ S , ie, if
FR (S ) ⊆ S .
Theorem 10 SR = ⋂ {S | S ⊆ A ∧ FR (S ) ⊆ S }
61
This says ‘the subset SR of A inductively defined by R is the smallest set closed under the
rules R’. It is the intersection of all of them, so smaller than (or equal to) any of them.
Now, to prove something about an inductively-defined set...
To see why rule induction is sound, using this definition: Saying {a | Φ(a)} closed under
the rules means exactly FR ({a | Φ(a)}) ⊆ {a | Φ(a)}, so by Theorem 10 we have SR ⊆ {a |
Φ(a)}, i.e. ∀ a ∈ SR .a ∈ {a ′ | Φ(a ′ )}, i.e. ∀ a ∈ SR .Φ(a).
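To see the iterates SR^k concretely, here is a small Moscow ML sketch (an illustration, not part of the notes' development) for the toy rule set with instances ({}, 0) and ({n}, n + 2), which inductively defines the even naturals:

(* fr computes FR(S): everything derivable in exactly one step from S *)
fun fr s = 0 :: List.map (fn n => n + 2) s
(* srk k computes SR^k as a list *)
fun srk 0 = []
  | srk k = fr (srk (k - 1))
(* srk 3 = [0, 2, 4], matching SR^1 = {0}, SR^2 = {0, 2}, SR^3 = {0, 2, 4} *)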
3.5 Exercises
Exercise 14 ⋆Without looking at the proof in the notes, do the cases of the proof of The-
orem 1 (Determinacy) for e1 op e2 , e1 ; e2 , while e1 do e2 , and if e1 then e2 else e3 .
Exercise 15 ⋆Try proving Determinacy for the language with nondeterministic order of
evaluation for e1 op e2 (ie with both (op1) and (op1b) rules), which is not determinate.
Explain where exactly the proof can’t be carried through.
Exercise 16 ⋆Complete the proof of Theorem 2 (Progress).
Exercise 17 ⋆⋆Complete the proof of Theorem 3 (Type Preservation).
Exercise 18 ⋆⋆Give an alternate proof of Theorem 3 (Type Preservation) by rule induc-
tion over type derivations.
Exercise 19 ⋆⋆Prove Theorem 7 (Uniqueness of Typing).
62
4 Functions
Functions – L2
<script type="text/vbscript">
function addone(x)
addone = x+1
end function
</script>
C♯
// delegate type for int-returning thunks (declaration elided on the slide)
delegate int IntThunk();

class M {
  public static void Main() {
    IntThunk[] funcs = new IntThunk[11];
    for (int i = 0; i <= 10; i++)
    {
      funcs[i] = delegate() { return i; };
    }
    foreach (IntThunk f in funcs)
    {
      System.Console.WriteLine(f());
    }
  }
}
Most languages have some kind of function, method, or procedure – some way of abstracting
a piece of code on a formal parameter so that you can use the code multiple times with
different arguments, without having to duplicate the code in the source. The next two
lectures explore the design space for functions, adding them to L1.
63
Functions – Examples
(fn x:int ⇒ x + 1)
(fn x:int ⇒ x + 1) 7
(fn y:int ⇒ (fn x:int ⇒ x + y))
(fn y:int ⇒ (fn x:int ⇒ x + y)) 1
(fn x:int → int ⇒ (fn y:int ⇒ x (x y)))
(fn x:int → int ⇒ (fn y:int ⇒ x (x y))) (fn x:int ⇒ x + 1)
(fn x:int → int ⇒ (fn y:int ⇒ x (x y))) (fn x:int ⇒ x + 1) 7
For simplicity, we’ll deal with anonymous functions only. Functions will always take a single
argument and return a single result — though either might itself be a function or a tuple.
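In ML the same single-argument convention shows up as currying. A small Moscow ML sketch (illustrative only) of the examples above:

val addy = fn (y : int) => (fn (x : int) => x + y)
val add1 = addy 1      (* a function, of type int -> int *)
val three = add1 2     (* applying the result: evaluates to 3 *)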
Functions – Syntax
e ::= ... | fn x :T ⇒ e | e1 e2 | x
Types
T ::= int | bool | unit | T1 → T2
Tloc ::= intref
Note that the first-order function types include types like int → (int → int) and
int → (int → (int → int)), of functions that take an argument of base type and return
64
a (first-order) function, e.g. the fn y:int ⇒ (fn x:int ⇒ x + y) of the examples above.
Some languages go further, forbidding partial application – and thereby avoiding the
need for heap-allocated closures in the implementation. We’ll come back to this.
In order to express the semantics for functions, we need some auxiliary definitions.
Variable shadowing
The second is not allowed in Java. For large systems that would be a problem, eg in a
language with nested function definitions, where you may wish to write a local function
parameter without being aware of what is in the surrounding namespace. There are other
issues to do with class namespaces.
Alpha conversion
65
Alpha conversion – free and bound occurrences
x+y
fn x:int ⇒ x + 2
fn x:int ⇒ x + z
if y then 2 + x else ((fn x:int ⇒ x + 2)z)
All the other occurrences of x are bound by the closest enclosing
fn x :T ⇒ ....
Note that in fn x:int ⇒ 2 the x is not an occurrence. Likewise, in fn x:int ⇒ x + 2 the left
x is not an occurrence; here the right x is an occurrence that is bound by the left x.
Sometimes it is handy to draw in the binding:
fn x:int ⇒ x + 2
fn x:int ⇒ x + z
fn y:int ⇒ y + z
fn z:int ⇒ z + z
For example:
fn x:int ⇒ x + z = fn y:int ⇒ y + z ≠ fn z:int ⇒ z + z
66
Abstract Syntax up to Alpha Conversion
[diagram: the abstract syntax trees of x + z , y + z , and z + z ]
add pointers (from each x node to the closest enclosing fn x :T ⇒ node);
remove names of binders and the occurrences they bind
[diagram: the nameless trees. The trees for x + z and y + z become identical (a • leaf with a pointer, plus a z leaf), while z + z becomes • + • ; similarly fn x:int ⇒ x + 2 becomes fn · :int ⇒ • + 2 with a pointer from • back to the binder]
67
De Bruijn Indices
[diagram: the same nameless trees with the pointers represented as de Bruijn indices, each bound occurrence becoming a number that counts the binders between it and its binder]
Free Variables
Say the free variables of an expression e are the set of variables x for
which there is an occurrence of x free in e .
fv(x ) = {x }
fv(e1 op e2 ) = fv(e1 ) ∪ fv(e2 )
fv(fn x :T ⇒ e) = fv(e) − {x }
For example
fv(x + y) = {x, y}
fv(fn x:int ⇒ x + y) = {y}
fv(x + (fn x:int ⇒ x + y)7) = {x, y}
fv(x ) = {x }
fv(fn x :T ⇒ e) = fv(e) − {x }
fv(e1 e2 ) = fv(e1 ) ∪ fv(e2 )
fv(n) = {}
fv(e1 op e2 ) = fv(e1 ) ∪ fv(e2 )
fv(if e1 then e2 else e3 ) = fv(e1 ) ∪ fv(e2 ) ∪ fv(e3 )
fv(b) = {}
fv(skip) = {}
fv(ℓ := e) = fv(e)
fv(!ℓ) = {}
fv(e1 ; e2 ) = fv(e1 ) ∪ fv(e2 )
fv(while e1 do e2 ) = fv(e1 ) ∪ fv(e2 )
bv(x ) = {}
bv(fn x :T ⇒ e) = {x } ∪ bv(e)
bv(e1 e2 ) = bv(e1 ) ∪ bv(e2 )
...
This is fine for concrete terms, but we're working up to alpha conversion, so (fn x:int ⇒
2) = (fn y:int ⇒ 2) but bv(fn x:int ⇒ 2) = {x} ≠ {y} = bv(fn y:int ⇒ 2). Argh! Looking
68
back at the abstract syntax trees up to alpha conversion, one can see that they just don't
contain this information anyway.
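For concreteness, here is a sketch of fv in Moscow ML, for a small fragment only, with hypothetical constructors (the course's Syntax.sml differs in detail):

datatype expr = Var of string
              | Fn of string * expr       (* fn x:T => e, with the type omitted *)
              | App of expr * expr
              | Num of int

(* fv e: the free variables of e, as a list (possibly with repeats) *)
fun fv (Var x) = [x]
  | fv (Fn (x, e)) = List.filter (fn y => y <> x) (fv e)
  | fv (App (e1, e2)) = fv e1 @ fv e2
  | fv (Num _) = []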
The semantics for functions will involve substituting actual parameters for formal parame-
ters. That’s a bit delicate in a world with binding...
Substitution – Examples
The semantics for functions will involve substituting actual parameters for
formal parameters.
Write {e/x }e ′ for the result of substituting e for all free occurrences of x
in e ′ . For example
{3/x}(x ≥ x) = (3 ≥ 3)
{3/x}((fn x:int ⇒ x + y)x) = (fn x:int ⇒ x + y)3
{y + 2/x}(fn y:int ⇒ x + y) = fn z:int ⇒ (y + 2) + z
Note that substitution is a meta-operation – it’s not part of the L2 expression grammar.
The notation used for substitution varies – people write {3/x }e, or [3/x ]e, or e[3/x ], or
{x ← 3}e, or...
Substitution – Definition
Defining that:
{e/z }x = e if x =z
= x otherwise
{y + 2/x}(fn y:int ⇒ x + y)
= {y + 2/x}(fn y′ :int ⇒ x + y′ ) renaming
= fn y′ :int ⇒ {y + 2/x}(x + y′ ) as y′ ≠ x and y′ ∉ fv(y + 2)
= fn y′ :int ⇒ {y + 2/x}x + {y + 2/x}y′
= fn y′ :int ⇒ (y + 2) + y′
Substitution – Simultaneous
69
Write dom(σ) for the set of variables in the domain of σ; ran(σ) for the set of expressions
in the range of σ, ie
dom({e1 /x1 , .., ek /xk }) = {x1 , .., xk }
ran({e1 /x1 , .., ek /xk }) = {e1 , .., ek }
σ x = σ(x ) if x ∈ dom(σ)
    = x otherwise
σ(fn x :T ⇒ e) = fn x :T ⇒ (σ e) if x ∉ dom(σ) and x ∉ fv(ran(σ)) (*)
σ(e1 e2 ) = (σ e1 )(σ e2 )
σn = n
σ(e1 op e2 ) = σ(e1 ) op σ(e2 )
σ(if e1 then e2 else e3 ) = if σ(e1 ) then σ(e2 ) else σ(e3 )
σ(b) = b
σ(skip) = skip
σ(ℓ := e) = ℓ := σ(e)
σ(!ℓ) = !ℓ
σ(e1 ; e2 ) = σ(e1 ); σ(e2 )
σ(while e1 do e2 ) = while σ(e1 ) do σ(e2 )
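A name-carrying implementation has to realise side-condition (*) by renaming. Here is a sketch of single substitution {e/x }e ′ in Moscow ML, reusing the hypothetical expr and fv of the sketch above; the fresh-name generator is an assumption of this sketch:

val ctr = ref 0
fun fresh () = (ctr := !ctr + 1; "_x" ^ Int.toString (!ctr))

(* subst e x e': the result of {e/x}e', renaming binders to avoid capture *)
fun subst e x (Var y) = if x = y then e else Var y
  | subst e x (Num n) = Num n
  | subst e x (App (e1, e2)) = App (subst e x e1, subst e x e2)
  | subst e x (Fn (y, body)) =
      if y = x then Fn (y, body)                        (* x is shadowed: stop *)
      else if List.exists (fn z => z = y) (fv e) then
        let val y' = fresh ()                           (* avoid capture: rename y *)
        in Fn (y', subst e x (subst (Var y') y body)) end
      else Fn (y, subst e x body)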
Function Behaviour
70
L2 Call-by-value
Values v ::= b | n | skip | fn x :T ⇒ e
he1 , si −→ he1′ , s ′ i
(app1)
he1 e2 , si −→ he1′ e2 , s ′ i
he2 , si −→ he2′ , s ′ i
(app2)
hv e2 , si −→ hv e2′ , s ′ i
(fn) h(fn x :T ⇒ e) v , si −→ h{v /x }e, si
• This is a strict semantics – fully evaluating the argument to function before doing the
application.
• One could evaluate e1 e2 right-to-left instead or left-to-right. That would be perverse
– better design is to match the evaluation order for operators etc.
• The syntax has explicit types and the semantics involves syntax, so types appear in
semantics – but they are not used in any interesting way, so an implementation could
erase them before execution. Not all languages have this property.
• The rules for these constructs, and those in the next few lectures, don’t touch the store,
but we need to include it in the rules in order to get the sequencing of side-effects right.
In a pure functional language, configurations would just be expressions.
• A naive implementation of these rules would have to traverse e and copy v as many
times as there are free occurrences of x in e. Real implementations don’t do that,
using environments instead of doing substitution. Environments are more efficient;
substitutions are simpler to write down – so better for implementation and semantics
respectively.
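Read operationally, the CBV rules give a one-step reducer directly. A Moscow ML sketch (store omitted, since the application rules don't touch it), reusing the hypothetical expr and subst of the sketches above:

fun value (Num _) = true
  | value (Fn _) = true
  | value _ = false

(* reduce e = SOME e' if e reduces to e' in one step, NONE if a value or stuck *)
fun reduce (App (e1, e2)) =
      if not (value e1) then
        Option.map (fn e1' => App (e1', e2)) (reduce e1)     (* (app1) *)
      else if not (value e2) then
        Option.map (fn e2' => App (e1, e2')) (reduce e2)     (* (app2) *)
      else (case e1 of
                Fn (x, body) => SOME (subst e2 x body)       (* (fn)   *)
              | _ => NONE)                                   (* stuck  *)
  | reduce _ = NONE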
71
Function Behaviour. Choice 2: Call-by-name
L2 Call-by-name
he1 , si −→ he1′ , s ′ i
(CBN-app)
he1 e2 , si −→ he1′ e2 , s ′ i
Haskell uses a refined variant – call-by-need – in which, the first time the argument is
evaluated, we ‘overwrite’ all other copies with that value.
That lets you do some very nice programming, e.g. with potentially-infinite datastructures.
72
Function Behaviour. Choice 3: Full beta
Allow both left and right-hand sides of application to reduce. At any point
where the left-hand-side has reduced to a fn-term, replace all
occurrences of the formal parameter in the fn-term by the argument.
Allow reduction inside lambdas.
L2 Beta
he1 , si −→ he1′ , s ′ i
(beta-app1)
he1 e2 , si −→ he1′ e2 , s ′ i
he2 , si −→ he2′ , s ′ i
(beta-app2)
he1 e2 , si −→ he1 e2′ , s ′ i
he, si −→ he ′ , s ′ i
(beta-fn2)
hfn x :T ⇒ e, si −→ hfn x :T ⇒ e ′ , s ′ i
This reduction relation includes the CBV and CBN relations, and also reduction inside
lambdas.
L2 Beta: Example
(fn x:int ⇒ x + x) (2 + 2) reduces in several ways:
either first to (2 + 2) + (2 + 2), thence to 4 + (2 + 2) or to (2 + 2) + 4, and then to 4 + 4;
or, reducing the argument first, to (fn x:int ⇒ x + x) 4 and thence to 4 + 4.
All paths lead to 8.
This ain’t much good for a programming language... why? (if you’ve got any non-terminating
computation Ω, then (λx .y) Ω might terminate or not, depending on the implementation)
(in pure lambda you do have confluence, which saves you – at least mathematically)
But, in full beta, or in CBN, it becomes rather hard to understand what order your code
is going to be run in! Hence, non-strict languages typically don’t allow unrestricted side
effects (our combination of store and CBN is pretty odd). Instead, Haskell encourages pure
programming, without effects (store operations, IO, etc.) except where really necessary.
Where they are necessary, it uses a fancy type system to give you some control of evaluation
order.
Purity
Note that Call-by-Value and Call-by-Name are distinguishable even if there is no store – con-
sider applying a function to a non-terminating argument, eg (fn x:unit ⇒ skip) (while true do skip).
Call-by-Name and Call-by-Need are not distinguishable except by performance properties –
but those really matter.
73
Back to CBV (from now on).
(var) Γ ⊢ x :T if Γ(x ) =T
(fn)
Γ, x :T ⊢ e:T ′
Γ ⊢ fn x :T ⇒ e : T → T ′
(app) Γ ⊢ e1 :T → T ′ Γ ⊢ e2 :T
Γ ⊢ e1 e2 :T ′
The derivation of {} ⊢ (fn x:int ⇒ x + 2) 2:int is:
by (var), x:int ⊢ x:int, and by (int), x:int ⊢ 2:int, so by (op+), x:int ⊢ x + 2:int;
hence by (fn), {} ⊢ (fn x:int ⇒ x + 2):int → int; by (int), {} ⊢ 2:int;
so by (app), {} ⊢ (fn x:int ⇒ x + 2) 2:int.
74
• With our notational convention for Γ, x :T , we could rewrite the (var) rule as Γ, x :T ⊢
x :T . By the convention, x is not in the domain of Γ, and Γ + {x 7→ T } is a perfectly
good partial function.
Another example: the derivation of l :intref ⊢ (fn x:unit ⇒ (l := 1); x) (l := 2):unit.
By (int), l :intref, x:unit ⊢ 1:int, so by (assign), l :intref, x:unit ⊢ (l := 1):unit;
by (var), l :intref, x:unit ⊢ x:unit; so by (seq), l :intref, x:unit ⊢ (l := 1); x:unit,
and by (fn), l :intref ⊢ (fn x:unit ⇒ (l := 1); x):unit → unit.
By (int), l :intref ⊢ 2:int, so by (assign), l :intref ⊢ (l := 2):unit.
Hence by (app), l :intref ⊢ (fn x:unit ⇒ (l := 1); x) (l := 2):unit.
Properties of Typing
Theorem 12 (Type Preservation) If e closed and Γ ⊢ e:T and
dom(Γ) ⊆ dom(s) and he, si −→ he ′ , s ′ i then Γ ⊢ e ′ :T and e ′
closed and dom(Γ) ⊆ dom(s ′ ).
Taking
Φ(e, s, e ′ , s ′ ) =
∀ Γ, T .
Γ ⊢ e:T ∧ closed(e) ∧ dom(Γ) ⊆ dom(s)
⇒
Γ ⊢ e ′ :T ∧ closed(e ′ ) ∧ dom(Γ) ⊆ dom(s ′ )
Normalisation
75
4.4 Local Definitions and Recursive Functions
Local definitions
(let)
Γ ⊢ e1 :T Γ, x :T ⊢ e2 :T ′
Γ ⊢ let val x :T = e1 in e2 end:T ′
(let1)
he1 , si −→ he1′ , s ′ i
hlet val x :T = e1 in e2 end, si −→ hlet val x :T = e1′ in e2 end, s ′ i
(let2)
hlet val x :T = v in e2 end, si −→ h{v /x }e2 , si
Our alpha convention means this really is a local definition – there is no way to refer to the
locally-defined variable outside the let val .
How about
76
But...
What about
Recursive Functions
77
For example:
Below, in the context of the let val rec , x f n finds the smallest n ′ ≥ n
for which f n ′ evaluates to some m ≤ 0.
xf0
end
end
As a test case, we apply it to the function (fn z :int ⇒ if z ≥ 3 then (if 3 ≥ z then 0 else 1) else 1),
which is 0 for argument 3 and 1 elsewhere.
78
More Syntactic Sugar
Do we need e1 ; e2 ?
Do we need while e1 do e2 ?
No: could encode while e1 do e2 by
let val rec w:unit → unit = fn y:unit ⇒ if e1 then (e2 ; w skip) else skip
in
w skip
end
In each case typing is the same (more precisely?); reduction is ‘essentially’ the same. What
does that mean? More later, on contextual equivalence.
We know at least that you can't do this in the language without while or store, as
we had a normalisation theorem there, whereas with while one can write divergent expressions.
4.5 Implementation
Implementation
make
mosml
load "Main";
79
Watch out for the parsing – it is not quite the same as (eg) mosml, so
you need to parenthesise more.
Of these files, you're most likely to want to look at, and change, Semantics.sml;
you should first also look at Syntax.sml.
The implementation lets you type in L2 expressions and initial stores and watch them
resolve, type-check, and reduce.
Implementation – Substitution
80
If e’ represents a closed term fn x :T ⇒ e1′ then e’ = Fn(t,e1’) for t and e1’ representing
T and e1′ . If also e represents a closed term e then subst e 0 e1’ represents {e/x }e1′ .
type typeEnv = (loc * type_loc) list * type_expr list
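A sketch of that de Bruijn substitution in Moscow ML, assuming the substituted term e is closed (so its indices never need shifting); the constructors are again hypothetical:

datatype ty = TyInt | TyFn of ty * ty
datatype expr = Var of int | Fn of ty * expr | App of expr * expr

(* subst e n e': replace index n (and only n) by the closed term e,
   bumping the index as we pass under each binder *)
fun subst e n (Var m) = if n = m then e else Var m
  | subst e n (Fn (t, body)) = Fn (t, subst e (n + 1) body)
  | subst e n (App (e1, e2)) = App (subst e n e1, subst e n e2)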
Implementation – Closures
(if you get that wrong, you end up with dynamic scoping, as in original
LISP)
81
Aside: Small-step vs Big-step Semantics
Syntax
Operational Semantics
Say stores s are finite partial functions from L to Z. Values v ::= b | n | skip | fn x :T ⇒ e
he1 , si −→ he1′ , s ′ i
(op1)
he1 op e2 , si −→ he1′ op e2 , s ′ i
he2 , si −→ he2′ , s ′ i
(op2)
hv op e2 , si −→ hv op e2′ , s ′ i
82
(deref) h!ℓ, si −→ hn, si if ℓ ∈ dom(s) and s(ℓ) = n
he1 , si −→ he1′ , s ′ i
(if3)
hif e1 then e2 else e3 , si −→ hif e1′ then e2 else e3 , s ′ i
(while)
hwhile e1 do e2 , si −→ hif e1 then (e2 ; while e1 do e2 ) else skip, si
he1 , si −→ he1′ , s ′ i
(app1)
he1 e2 , si −→ he1′ e2 , s ′ i
he2 , si −→ he2′ , s ′ i
(app2)
hv e2 , si −→ hv e2′ , s ′ i
(let2)
hlet val x :T = v in e2 end, si −→ h{v /x }e2 , si
(letrecfn) let val rec x :T1 → T2 = (fn y:T1 ⇒ e1 ) in e2 end
−→
{(fn y:T1 ⇒ let val rec x :T1 → T2 = (fn y:T1 ⇒ e1 ) in e1 end)/x }e2
Typing
Take Γ ∈ TypeEnv2, the finite partial functions from L ∪ X to Tloc ∪ T such that
∀ ℓ ∈ dom(Γ).Γ(ℓ) ∈ Tloc
83
∀ x ∈ dom(Γ).Γ(x ) ∈ T
Γ ⊢ e1 :int Γ ⊢ e1 :int
Γ ⊢ e2 :int Γ ⊢ e2 :int
(op +) (op ≥)
Γ ⊢ e1 + e2 :int Γ ⊢ e1 ≥ e2 :bool
Γ ⊢ e1 :bool
Γ ⊢ e2 :T
Γ ⊢ e3 :T
(if)
Γ ⊢ if e1 then e2 else e3 :T
Γ(ℓ) = intref
Γ ⊢ e:int
(assign)
Γ ⊢ ℓ := e:unit
Γ(ℓ) = intref
(deref)
Γ ⊢!ℓ:int
(skip) Γ ⊢ skip:unit
Γ ⊢ e1 :unit
Γ ⊢ e2 :T
(seq)
Γ ⊢ e1 ; e2 :T
Γ ⊢ e1 :bool
Γ ⊢ e2 :unit
(while)
Γ ⊢ while e1 do e2 :unit
(var) Γ ⊢ x :T if Γ(x ) = T
Γ, x :T ⊢ e:T ′
(fn)
Γ ⊢ fn x :T ⇒ e : T → T ′
(app) Γ ⊢ e1 :T → T ′   Γ ⊢ e2 :T
Γ ⊢ e1 e2 :T ′
Γ ⊢ e1 :T Γ, x :T ⊢ e2 :T ′
(let)
Γ ⊢ let val x :T = e1 in e2 end:T ′
84
4.7 Exercises
85
5 Data
Data – L3
So far we have only looked at very simple basic data types – int, bool, and unit, and functions
over them. We now explore more structured data, in as simple a form as possible, and revisit
the semantics of mutable store.
The two basic notions are the product and the sum type.
The product type T1 ∗ T2 lets you tuple together values of types T1 and T2 – so for example
a function that takes an integer and returns a pair of an integer and a boolean has type
int → (int ∗ bool). In C one has structs; in Java classes can have many fields.
The sum type T1 + T2 lets you form a disjoint union, with a value of the sum type either
being a value of type T1 or a value of type T2 . In C one has unions; in Java one might
have many subclasses of a class (see the l1.java representation of the L1 abstract syntax,
for example).
In most languages these appear in richer forms, e.g. with labelled records rather than simple
products, or labelled variants, or ML datatypes with named constructors, rather than simple
sums. We’ll look at labelled records in detail, as a preliminary to the later lecture on
subtyping.
Many languages don’t allow structured data types to appear in arbitrary positions – e.g. the
old C lack of support for functions that return structured values, inherited from close-to-
the-metal early implementations. They might therefore have to have functions or methods
that take a list of arguments, rather than a single argument that could be of product (or
sum, or record) type.
Products
T ::= ... | T1 ∗ T2
Design choices:
• pairs, not arbitrary tuples – have int ∗ (int ∗ int) and (int ∗ int) ∗ int, but (a) they’re
different, and (b) we don’t have (int ∗ int ∗ int). In a full language you’d likely allow
(b) (and still have it be a different type from the other two).
• have projections #1 and #2, not pattern matching fn (x , y) ⇒ e. A full language
should allow the latter, as it often makes for much more elegant code.
• don’t have #e e ′ (couldn’t typecheck!).
86
Products - typing
(proj1) Γ ⊢ e:T1 ∗ T2
Γ ⊢ #1 e:T1
(proj2) Γ ⊢ e:T1 ∗ T2
Γ ⊢ #2 e:T2
Products - reduction
he2 , si −→ he2′ , s ′ i
(pair2)
h(v1 , e2 ), si −→ h(v1 , e2′ ), s ′ i
he, si −→ he ′ , s ′ i he, si −→ he ′ , s ′ i
(proj3) (proj4)
h#1 e, si −→ h#1 e ′ , s ′ i h#2 e, si −→ h#2 e ′ , s ′ i
Again, have to choose evaluation strategy (CBV) and evaluation order (left-to-right, for
consistency).
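For example: h#1 (2 + 3, 4 + 5), si −→ h#1 (5, 4 + 5), si −→ h#1 (5, 9), si −→ h5, si,
where the first two steps use (op +) under the contexts given by (pair1), (pair2) and (proj3), and the last step uses (proj1).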
T ::= ... | T1 + T2
e ::= ... | inl e:T | inr e:T |
case e of inl (x1 :T1 ) ⇒ e1 | inr (x2 :T2 ) ⇒ e2
Those x s are binders.
Sums - typing
(inl) Γ ⊢ e:T1
Γ ⊢ inl e:T1 + T2 :T1 + T2
(inr) Γ ⊢ e:T2
Γ ⊢ inr e:T1 + T2 :T1 + T2
Γ ⊢ e:T1 + T2
Γ, x :T1 ⊢ e1 :T
Γ, y:T2 ⊢ e2 :T
(case)
Γ ⊢ case e of inl (x :T1 ) ⇒ e1 | inr (y:T2 ) ⇒ e2 :T
87
Why do we have these irritating type annotations? To maintain the unique
typing property, as otherwise
You might:
• have a compiler use a type inference algorithm that can infer them.
• require every sum type in a program to be declared, each with different
names for the constructors inl , inr (cf OCaml).
• ...
Sums - reduction
he, si −→ he ′ , s ′ i
(case1) hcase e of inl (x :T1 ) ⇒ e1 | inr (y:T2 ) ⇒ e2 , si
−→ hcase e ′ of inl (x :T1 ) ⇒ e1 | inr (y:T2 ) ⇒ e2 , s ′ i
he, si −→ he ′ , s ′ i
(inr)
hinr e:T , si −→ hinr e ′ :T , s ′ i
T → T ′   introduced by fn x :T ⇒ e , eliminated by application
T ∗ T ′   introduced by ( , ) , eliminated by #1 and #2
T + T ′   introduced by inl ( ) and inr ( ) , eliminated by case
bool      introduced by true and false , eliminated by if
88
The Curry-Howard Isomorphism
(var) Γ, x :T ⊢ x :T , paired with Γ, P ⊢ P .
(fn) from Γ, x :T ⊢ e:T ′ infer Γ ⊢ fn x :T ⇒ e : T → T ′ ,
paired with: from Γ, P ⊢ P ′ infer Γ ⊢ P → P ′ .
(app) from Γ ⊢ e1 :T → T ′ and Γ ⊢ e2 :T infer Γ ⊢ e1 e2 :T ′ ,
paired with: from Γ ⊢ P → P ′ and Γ ⊢ P infer Γ ⊢ P ′ .
(pair) from Γ ⊢ e1 :T1 and Γ ⊢ e2 :T2 infer Γ ⊢ (e1 , e2 ):T1 ∗ T2 ,
paired with: from Γ ⊢ P1 and Γ ⊢ P2 infer Γ ⊢ P1 ∧ P2 .
(proj1) from Γ ⊢ e:T1 ∗ T2 infer Γ ⊢ #1 e:T1 , paired with: from Γ ⊢ P1 ∧ P2 infer Γ ⊢ P1 .
(proj2) from Γ ⊢ e:T1 ∗ T2 infer Γ ⊢ #2 e:T2 , paired with: from Γ ⊢ P1 ∧ P2 infer Γ ⊢ P2 .
(inl) from Γ ⊢ e:T1 infer Γ ⊢ inl e:T1 + T2 :T1 + T2 , paired with: from Γ ⊢ P1 infer Γ ⊢ P1 ∨ P2 .
ML Datatypes
Note (a) this involves recursion at the type level (e.g. types for binary trees), (b) it introduces
constructors (Null and Cons) for each summand, and (c) it’s generative - two different
declarations of IntList will make different types. Making all that precise is beyond the
scope of this course.
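For instance, the IntList mentioned above would be declared in Moscow ML as follows, a recursive sum whose Cons summand carries an int ∗ IntList product:

datatype IntList = Null
                 | Cons of int * IntList

fun sum Null = 0
  | sum (Cons (x, xs)) = x + sum xs
(* sum (Cons (1, Cons (2, Null))) evaluates to 3 *)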
Records
Note:
• The condition on record formation means that our syntax is no longer ‘free’. Formally,
we should have a well-formedness judgment on types.
• Labels are not the same syntactic class as variables, so (fn x:T ⇒ {x = 3}) is not an
expression.
• Does the order of fields matter? Can you use reuse labels in different record types?
The typing rules will fix an answer.
• In ML a pair (true, fn x:int ⇒ x) is actually syntactic sugar for a record {1 = true, 2 =
fn x:int ⇒ x}.
• Note that #lab e is not an application, it just looks like one in the concrete syntax.
• Again we will choose a left-to-right evaluation order for consistency.
89
Records - typing
(recordproj)
Γ ⊢ e:{lab 1 :T1 , .., lab k :Tk }
Γ ⊢ #lab i e:Ti
• Here the field order matters, so (fn x:{foo:int, bar :bool} ⇒ x){bar = true, foo = 17}
does not typecheck. In ML, though, the order doesn’t matter – so Moscow ML will
accept strictly more programs in this syntax than this type system allows.
• Here, as in Moscow ML, one can reuse labels, so {} ⊢ ({foo = 17}, {foo = true}):{foo:int}∗
{foo:bool} is legal, but in some languages (e.g. OCaml) you can't.
Records - reduction
he, si −→ he ′ , s ′ i
(record3)
h#lab i e, si −→ h#lab i e ′ , s ′ i
Mutable Store
Most languages have some kind of mutable store. Two main choices:
90
2 The C-way (also Java etc).
• variables let you refer to a previously calculated value and let you
overwrite that value with another.
• implicit dereferencing and assignment,
void foo(int x) {
  l = l + x;
  ...}
• have some limited type machinery (const qualifiers) to limit
mutability.
References
Staying with choice 1 here. But, those L1/L2 references are very limited:
• can only store ints - for uniformity, would like to store any value
• cannot create new locations (all must exist at beginning)
• cannot write functions that abstract on locations fn l :intref ⇒!l
So, generalise.
Have locations in the expression syntax, but that is just so we can express the intermediate
states of computations – whole programs now should have no locations in them at the start, but
can create them with ref . They can have variables of T ref type, e.g. fn x:int ref ⇒ !x.
References - Typing
(ref) Γ ⊢ e:T
Γ ⊢ ref e : T ref
Γ ⊢ e1 :T ref
Γ ⊢ e2 :T
(assign)
Γ ⊢ e1 := e2 :unit
(loc)
Γ(ℓ) = T ref
Γ ⊢ ℓ:T ref
91
References – Reduction
A location is a value:
v ::= ... | ℓ
Stores s were finite partial maps from L to Z. From now on, take them to
be finite partial maps from L to the set of all values.
he, si −→ he ′ , s ′ i
(ref2)
h ref e, si −→ h ref e ′ , s ′ i
he, si −→ he ′ , s ′ i
(deref2)
h!e, si −→ h!e ′ , s ′ i
he, si −→ he ′ , s ′ i
(assign2)
hℓ := e, si −→ hℓ := e ′ , s ′ i
he, si −→ he ′ , s ′ i
(assign3)
he := e2 , si −→ he ′ := e2 , s ′ i
• A ref has to do something at runtime – ( ref 0, ref 0) should return a pair of two new
locations, each containing 0, not a pair of one location repeated.
• Note the typing and this dynamics permit locations to contain locations, e.g. ref( ref 3).
• This semantics no longer has determinacy, for a technical reason – new locations are
chosen arbitrarily. At the cost of some slight semantic complexity, we could regain
determinacy by working ’up to alpha for locations’.
• What is the store:
1. an array of bytes,
2. an array of values, or
3. a partial function from locations to values?
We take the third, most abstract option. Within the language one cannot do arithmetic
on locations (just as well!) (can in C, can’t in Java) or test whether one is bigger than
another (in presence of garbage collection, they may not stay that way). Might or
might not even be able to test them for equality (can in ML, cannot in L3).
• This store just grows during computation – an implementation can garbage collect (in
many fancy ways), but platonic memory is free.
We don’t have an explicit deallocation operation – if you do, you need a very baroque
type system to prevent dangling pointers being dereferenced. We don’t have unini-
tialised locations (cf. null pointers), so don’t have to worry about dereferencing null.
92
Type-checking the store
For L1, our type properties used dom(Γ) ⊆ dom(s) to express the
condition ‘all locations mentioned in Γ exist in the store s ’.
Now need more: for each ℓ ∈ dom(s) need that s(ℓ) is typable.
Moreover, s(ℓ) might contain some other locations...
Consider
Have made a recursive function by ‘tying the knot by hand’, not using let val rec .
To do this we needed to store function values – couldn’t do this in L2, so this doesn’t
contradict the normalisation theorem we had there.
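The elided example is not reproduced here, but the same knot-tying idea can be sketched in ML with a function-valued reference (illustrative, not the notes' exact program):

val knot : (int -> int) ref = ref (fn n => n)       (* dummy initial contents *)
val () = knot := (fn n => if n <= 0 then 1
                          else n * (!knot) (n - 1)) (* recurse through the store *)
val fact3 = (!knot) 3                               (* evaluates to 6 *)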
Implementation
93
5.3 Evaluation Contexts
We end this chapter by showing a slightly different style for defining operational semantics,
collecting together many of the context rules into a single (eval) rule that uses a definition
of a set of evaluation contexts to describe where in your program the next step of reduction
can take place. This style becomes much more convenient for large languages, though for
L1 and L2 there’s not much advantage either way.
Evaluation Contexts
he, si −→ he ′ , s ′ i
(eval)
hE [e], si −→ hE [e ′ ], s ′ i
replacing the rules (all those with ≥ 1 premise) (op1), (op2), (seq2), (if3),
(app1), (app2), (let1), (pair1), (pair2), (proj3), (proj4), (inl), (inr), (case1),
(record1), (record3), (ref2), (deref2), (assign2), (assign3).
To (eval) we add all the computation rules (all the rest) (op + ), (op ≥ ),
(seq1), (if1), (if2), (while), (fn), (let2), (letrecfn), (proj1), (proj2), (case2),
(case3), (record2), (ref1), (deref1), (assign1).
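The grammar of the evaluation contexts E themselves is not shown here; a sketch of the standard CBV, left-to-right definition matching the context rules listed above would be:

E ::= _ | E op e2 | v op E | E ; e2 | if E then e2 else e3 |
E e2 | v E | let val x :T = E in e2 end |
(E , e2 ) | (v1 , E ) | #1 E | #2 E |
inl E :T | inr E :T | case E of inl (x1 :T1 ) ⇒ e1 | inr (x2 :T2 ) ⇒ e2 |
{lab 1 = v1 , .., lab i = E , lab i+1 = ei+1 , .., lab k = ek } | #lab E |
ref E | !E | ℓ := E | E := e2

with _ the hole at which the next computation rule applies.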
Fortran 1950s
Haskell 1987
Subtyping 1980s
And now? module systems, distribution, mobility, reasoning about objects, security, typed compilation,
approximate analyses,.......
94
5.4 L3: Collected Definition
L3 Syntax
T ::= int | bool | unit | T1 → T2 |T1 ∗ T2 |T1 + T2 |{lab 1 :T1 , .., lab k :Tk }|T ref
Expressions
e ::= n | b | e1 op e2 | if e1 then e2 else e3 |
e1 := e2 |!e | ref e | ℓ |
skip | e1 ; e2 |
while e1 do e2 |
fn x :T ⇒ e | e1 e2 | x |
let val x :T = e1 in e2 end|
let val rec x :T1 → T2 = (fn y:T1 ⇒ e1 ) in e2 end|
(e1 , e2 ) | #1 e | #2 e|
inl e:T | inr e:T |
case e of inl (x1 :T1 ) ⇒ e1 | inr (x2 :T2 ) ⇒ e2 |
{lab 1 = e1 , .., lab k = ek } | #lab e
(where in each record (type or expression) no lab occurs more than once)
In expressions fn x :T ⇒ e the x is a binder. In expressions let val x :T = e1 in e2 end
the x is a binder. In expressions let val rec x :T1 → T2 = (fn y:T1 ⇒ e1 ) in e2 end
the y binds in e1 ; the x binds in (fn y:T ⇒ e1 ) and in e2 . In case e of inl (x1 :T1 ) ⇒ e1 |
inr (x2 :T2 ) ⇒ e2 the x1 binds in e1 and the x2 binds in e2 .
L3 Semantics
Stores s were finite partial maps from L to Z. From now on, take them to be finite partial
maps from L to the set of all values.
Values v ::= b | n | skip | fn x :T ⇒ e|(v1 , v2 )|inl v :T | inr v :T |{lab 1 = v1 , .., lab k = vk }|ℓ
he1 , si −→ he1′ , s ′ i
(op1)
he1 op e2 , si −→ he1′ op e2 , s ′ i
he2 , si −→ he2′ , s ′ i
(op2)
hv op e2 , si −→ hv op e2′ , s ′ i
(seq1) hskip; e2 , si −→ he2 , si
he1 , si −→ he1′ , s ′ i
(seq2)
he1 ; e2 , si −→ he1′ ; e2 , s ′ i
95
(if1) hif true then e2 else e3 , si −→ he2 , si
he1 , si −→ he1′ , s ′ i
(if3)
hif e1 then e2 else e3 , si −→ hif e1′ then e2 else e3 , s ′ i
(while)
hwhile e1 do e2 , si −→ hif e1 then (e2 ; while e1 do e2 ) else skip, si
he1 , si −→ he1′ , s ′ i
(app1)
he1 e2 , si −→ he1′ e2 , s ′ i
he2 , si −→ he2′ , s ′ i
(app2)
hv e2 , si −→ hv e2′ , s ′ i
(let2)
hlet val x :T = v in e2 end, si −→ h{v /x }e2 , si
(letrecfn) let val rec x :T1 → T2 = (fn y:T1 ⇒ e1 ) in e2 end
−→
{(fn y:T1 ⇒ let val rec x :T1 → T2 = (fn y:T1 ⇒ e1 ) in e1 end)/x }e2
he1 , si −→ he1′ , s ′ i
(pair1)
h(e1 , e2 ), si −→ h(e1′ , e2 ), s ′ i
he2 , si −→ he2′ , s ′ i
(pair2)
h(v1 , e2 ), si −→ h(v1 , e2′ ), s ′ i
he, si −→ he ′ , s ′ i he, si −→ he ′ , s ′ i
(proj3) (proj4)
h#1 e, si −→ h#1 e ′ , s ′ i h#2 e, si −→ h#2 e ′ , s ′ i
he, si −→ he ′ , s ′ i
(inl)
hinl e:T , si −→ hinl e ′ :T , s ′ i
he, si −→ he ′ , s ′ i
(case1) hcase e of inl (x :T1 ) ⇒ e1 | inr (y:T2 ) ⇒ e2 , si
−→ hcase e ′ of inl (x :T1 ) ⇒ e1 | inr (y:T2 ) ⇒ e2 , s ′ i
96
he, si −→ he ′ , s ′ i
(inr)
hinr e:T , si −→ hinr e ′ :T , s ′ i
he, si −→ he ′ , s ′ i
(record3)
h#lab i e, si −→ h#lab i e ′ , s ′ i
(ref1) h ref v , si −→ hℓ, s + {ℓ 7→ v }i ℓ ∈
/ dom(s)
he, si −→ he ′ , s ′ i
(ref2)
h ref e, si −→ h ref e ′ , s ′ i
(deref1) h!ℓ, si −→ hv , si if ℓ ∈ dom(s) and s(ℓ) = v
he, si −→ he ′ , s ′ i
(deref2)
h!e, si −→ h!e ′ , s ′ i
he, si −→ he ′ , s ′ i
(assign3)
he := e2 , si −→ he ′ := e2 , s ′ i
L3 Typing
Take Γ ∈ TypeEnv2, the finite partial functions from L ∪ X to Tloc ∪ T such that
∀ ℓ ∈ dom(Γ).Γ(ℓ) ∈ Tloc
∀ x ∈ dom(Γ).Γ(x ) ∈ T
Γ ⊢ e1 :int Γ ⊢ e1 :int
Γ ⊢ e2 :int Γ ⊢ e2 :int
(op +) (op ≥)
Γ ⊢ e1 + e2 :int Γ ⊢ e1 ≥ e2 :bool
Γ ⊢ e1 :bool
Γ ⊢ e2 :T
Γ ⊢ e3 :T
(if)
Γ ⊢ if e1 then e2 else e3 :T
97
(skip) Γ ⊢ skip:unit
Γ ⊢ e1 :unit
Γ ⊢ e2 :T
(seq)
Γ ⊢ e1 ; e2 :T
Γ ⊢ e1 :bool
Γ ⊢ e2 :unit
(while)
Γ ⊢ while e1 do e2 :unit
(var) Γ ⊢ x :T if Γ(x ) = T
Γ, x :T ⊢ e:T ′
(fn)
Γ ⊢ fn x :T ⇒ e : T → T ′
(app) Γ ⊢ e1 :T → T ′   Γ ⊢ e2 :T
Γ ⊢ e1 e2 :T ′
Γ ⊢ e1 :T Γ, x :T ⊢ e2 :T ′
(let)
Γ ⊢ let val x :T = e1 in e2 end:T ′
(proj1) Γ ⊢ e:T1 ∗ T2
Γ ⊢ #1 e:T1
(proj2) Γ ⊢ e:T1 ∗ T2
Γ ⊢ #2 e:T2
(inl) Γ ⊢ e:T1
Γ ⊢ inl e:T1 + T2 :T1 + T2
(inr) Γ ⊢ e:T2
Γ ⊢ inr e:T1 + T2 :T1 + T2
Γ ⊢ e:T1 + T2
Γ, x :T1 ⊢ e1 :T
Γ, y:T2 ⊢ e2 :T
(case)
Γ ⊢ case e of inl (x :T1 ) ⇒ e1 | inr (y:T2 ) ⇒ e2 :T
98
(ref) Γ ⊢ e:T
Γ ⊢ ref e : T ref
Γ ⊢ e1 :T ref
Γ ⊢ e2 :T
(assign)
Γ ⊢ e1 := e2 :unit
Γ(ℓ) = T ref
(loc)
Γ ⊢ ℓ:T ref
5.5 Exercises
Exercise 29 ⋆⋆Design abstract syntax, type rules and evaluation rules for labelled vari-
ants, analogously to the way in which records generalise products.
Exercise 30 ⋆⋆Design type rules and evaluation rules for ML-style exceptions. Start
with exceptions that do not carry any values. Hint 1: take care with nested handlers within
recursive functions. Hint 2: you might want to express your semantics using evaluation
contexts.
Exercise 31 ⋆⋆⋆Extend the L2 implementation to cover all of L3.
[diagram: course structure – L1 (operational semantics: assignment and while; Lectures 1–4), L2 (functions and recursive definitions; Lectures 5, 6), L3 (products, sums, records, references; Lecture 8), leading on to Subtyping and Objects (9), TAL (10), Semantic Equivalence (11), and Concurrency (12); cross-cutting themes: type systems, implementations, language design choices, inductive definitions, inductive proof (structural; rule), and abstract syntax up to alpha]
99
6 Subtyping and Objects
Our type systems so far would all be annoying to use, as they’re quite rigid (Pascal-like).
There is no support for code reuse (except for functions), so you would have to have different
sorting code for, e.g., int lists and int ∗ int lists.
Polymorphism
Subtyping – Motivation
Recall
Γ ⊢ e1 :T → T ′
Γ ⊢ e2 :T
(app)
Γ ⊢ e1 e2 :T ′
so can’t type
even though we’re giving the function a better argument, with more
structure, than it needs.
100
Subsumption
‘Better’? Any value of type {p:int, q:int} can be used wherever a value
of type {p:int} is expected. (*)
Example
(s-refl)
T <: T
Subtyping – Records
{lab 1 :T1 , .., lab k :Tk , lab k+1 :Tk+1 , .., lab k+k ′ :Tk+k ′ }
<: (s-record-width)
{lab 1 :T1 , .., lab k :Tk }
101
Another example:
by (s-record-width), {p:int, q:int} <: {p:int}, so by (s-record-depth)
{x:{p:int, q:int}} <: {x:{p:int}}; also by (s-record-width)
{x:{p:int, q:int}, y:{r:int}} <: {x:{p:int, q:int}};
hence by (s-trans)
{x:{p:int, q:int}, y:{r:int}} <: {x:{p:int}}
(s-record-order)
π a permutation of 1, .., k
{lab 1 :T1 , .., lab k :Tk } <: {lab π(1) :Tπ(1) , .., lab π(k) :Tπ(k) }
Subtyping - Functions
(s-fn)
T1′ <: T1 T2 <: T2′
T1 → T2 <: T1′ → T2′
contravariant on the left of →
f = fn x:{p:int} ⇒ {p = #p x, q = 28}
we have
{} ⊢ f :{p:int} → {p:int, q:int}
{} ⊢ f :{p:int} → {p:int}
{} ⊢ f :{p:int, q:int} → {p:int, q:int}
{} ⊢ f :{p:int, q:int} → {p:int}
as
{p:int, q:int} <: {p:int}
we have
{} ⊢ f :{p:int, q:int} → {p:int}
{} ⊬ f :{p:int} → T for any T
{} ⊬ f :T → {p:int, q:int} for any T
Subtyping – Products
(s-pair)
T1 <: T1′ T2 <: T2′
T1 ∗ T2 <: T1′ ∗ T2′
Subtyping – Sums
Exercise.
102
Subtyping – References
Would either of these rules be sound?
T <: T ′
T ref <: T ′ ref
T ′ <: T
T ref <: T ′ ref
No...
Semantics
Properties
Implementation
Subtyping – Down-casts
The subsumption rule (sub) permits up-casting at any point. How about
down-casting? We could add
e ::= ... | (T )e
with typing rule
Γ ⊢ e:T ′
Γ ⊢ (T )e:T
then you need a dynamic type-check...
This gives flexibility, but at the cost of many potential run-time errors.
Many uses might be better handled by Parametric Polymorphism, aka
Generics. (cf. work by Martin Odersky at EPFL, Lausanne, now in Java
1.5)
The following development is taken from [Pierce, Chapter 18], where you can find more
details (including a treatment of self and a direct semantics for a ‘featherweight’ fragment
of Java).
in
103
Using Subtyping
in
Object Generators
in
104
Reusing Method Code (Simple Classes)
class Counter
{ protected int p;
Counter() { this.p=0; }
int get () { return this.p; }
void inc () { this.p++ ; }
};
class ResetCounter
extends Counter
{ void reset () {this.p=0;}
};
A′ = {} with {p:int}
A′′ = A′ with {q:bool}
A′′′ = A′ with {r:int}
[diagram: the subtype order on these record types – {} (Object, ish!) above {p:int} (A′ ), which is in turn above both {p:int, q:bool} (A′′ ) and {p:int, r:int} (A′′′ )]
6.1 Exercises
Exercise 32 ⋆For each of the following, either give a type derivation or explain why it is
untypable.
1. {} ⊢ {p = {p = {p = {p = 3}}}}:{p:{}}
2. {} ⊢ fn x:{p:bool, q:{p:int, q:bool}} ⇒ #q #p x : ?
3. {} ⊢ fn f:{p:int} → int ⇒ (f {q = 3}) + (f {p = 4}) : ?
4. {} ⊢ fn f:{p:int} → int ⇒ (f {q = 3, p = 2}) + (f {p = 4}) : ?
105
Exercise 33 ⋆For each of the two bogus T ref subtype rules on Page 6, give an example
program that is typable with that rule but gets stuck at runtime.
Exercise 34 ⋆⋆What should the subtype rules for sums T + T ′ be?
Exercise 35 ⋆⋆...and for let and let rec ?
106
7 Semantic Equivalence
Semantic Equivalence
2 + 2 ≃? 4
In what sense are these two expressions the same?
But, you'd hope that in any program you could replace one by the other
without affecting the result....
∫_0^{2+2} e^{sin(x)} dx = ∫_0^{4} e^{sin(x)} dx
How about (l := 0; 4) ≃? (l := 1; 3+!l )
They will produce the same result (in any store), but you cannot replace
one by the other in an arbitrary program context. For example:
C = _ +!l
C [l := 0; 4] = (l := 0; 4)+!l
≄
C [l := 1; 3+!l ] = (l := 1; 3+!l )+!l
On the other hand, consider
(l :=!l + 1); (l :=!l − 1) ≃? (l :=!l )
Those were all particular expressions – may want to know that some
general laws are valid for all e1 , e2 , .... How about these:
e1 ; (e2 ; e3 ) ≃? (e1 ; e2 ); e3
(if e1 then e2 else e3 ); e ≃? if e1 then e2 ; e else e3 ; e
e; (if e1 then e2 else e3 ) ≃? if e1 then e; e2 else e; e3
e; (if e1 then e2 else e3 ) ≃? if e; e1 then e2 else e3
107
Temporarily extend L3 with pointer equality
op ::= ... |=
Γ ⊢ e1 :T ref
Γ ⊢ e2 :T ref
(op =)
Γ ⊢ e1 = e2 :bool
(op =) hℓ = ℓ′ , si −→ hb, si if b = (ℓ = ℓ′ )
f ≃? g
The last two examples are taken from A.M. Pitts, Operational Semantics and Program
Equivalence. In: G. Barthe, P. Dybjer and J. Saraiva (Eds), Applied Semantics. Lecture
Notes in Computer Science, Tutorial, Volume 2395 (Springer-Verlag, 2002), pages 378-412.
(Revised version of lectures at the International Summer School On Applied Semantics,
APPSEM 2000, Caminha, Minho, Portugal, 9-15 September 2000.) ftp://ftp.cl.cam.
ac.uk/papers/amp12/opespe-lncs.pdf
108
What does it mean for ≃ to be ‘good’?
If T = unit then C = _ ; !l .
If T = bool then C = if _ then !l else !l .
If T = int then C = l1 := _ ; !l .
C ::= _ op e2 | e1 op _ |
if _ then e2 else e3 | if e1 then _ else e3 | if e1 then e2 else _ |
ℓ := _ |
_ ; e2 | e1 ; _ |
while _ do e2 | while e1 do _
Say ≃_Γ^T has the congruence property if whenever e1 ≃_Γ^T e2 we have,
for all C and T ′ , if Γ ⊢ C [e1 ]:T ′ and Γ ⊢ C [e2 ]:T ′ then
C [e1 ] ≃_Γ^{T ′} C [e2 ].
109
Theorem 16 (Congruence for L1) ≃_Γ^T has the congruence property.
Proof Outline By case analysis, looking at each L1 context C in turn.
For each C (and for arbitrary e and s ), consider the possible reduction
sequences
(the other possibility, of zero or more (assign1) reductions ending in a stuck state, is excluded
by Theorems 2 and 3 (type preservation and progress))
110
Now, if hℓ := e, si −→ω we have he, si −→ω , so by e ≃_Γ^T e ′ we
have he ′ , si −→ω , so (using (assign2)) we have hℓ := e ′ , si −→ω .
111
By e ≃_Γ^T e ′ we have he ′ , si −→∗ hn, sk−1 i. Then using (assign1), hℓ := e ′ , si −→∗
hℓ := n, sk−1 i −→ hskip, sk−1 + {ℓ 7→ n}i = hek , sk i as required.
Back to the Examples
2 + 2 ≃_Γ^int 4 for any Γ
(l := 0; 4) ≄_Γ^int (l := 1; 3+!l ) for any Γ
Conjecture 1 e1 ; (e2 ; e3 ) ≃_Γ^T (e1 ; e2 ); e3 for any Γ, T , e1 , e2 and e3
such that Γ ⊢ e1 :unit, Γ ⊢ e2 :unit, and Γ ⊢ e3 :T
Conjecture 2 ((if e1 then e2 else e3 ); e) ≃_Γ^T (if e1 then e2 ; e else e3 ; e) for
any Γ, T , e , e1 , e2 and e3 such that Γ ⊢ e1 :bool, Γ ⊢ e2 :unit,
Γ ⊢ e3 :unit, and Γ ⊢ e:T
Conjecture 3 (e; (if e1 then e2 else e3 )) ≃_Γ^T (if e1 then e; e2 else e; e3 ) for
any Γ, T , e , e1 , e2 and e3 such that Γ ⊢ e:unit, Γ ⊢ e1 :bool,
Γ ⊢ e2 :T , and Γ ⊢ e3 :T
A sufficient condition: they don’t mention any locations (but not necessary... e.g. if e1 does
but e2 doesn’t)
112
A weaker sufficient condition: they don’t mention any of the same locations. (but not
necessary... e.g. if they both just read l )
An even weaker sufficient condition: we can regard each expression as a partial function
over stores with domain dom(Γ). Say [[ei ]]Γ is the function that takes a store s with
dom(s) = dom(Γ) and either is undefined, if hei , si −→ω , or is s ′ , if hei , si −→∗ h(), s ′ i
(the Determinacy theorem tells us that this is a definition of a function).
For each location ℓ in dom(Γ), say ei semantically depends on ℓ if there exists s, n such that
[[ei ]]Γ (s) ≠ [[ei ]]Γ (s + {ℓ 7→ n}). (note this is much weaker than “ei contains a dereference
of ℓ”)
Say ei semantically affects ℓ if there exists s such that s(ℓ) ≠ [[ei ]]Γ (s)(ℓ). (note this is much
weaker than “ei contains an assignment to ℓ”)
Now e1 ; e2 ≃_Γ^unit e2 ; e1 if there is no ℓ that is depended on by one ei and affected by the
other.
(still not necessary...?)
7.1 Exercises
8 Concurrency
Concurrency
Our focus so far has been on semantics for sequential computation. But
the world is not sequential...
• multi-processor machines
• multi-threading (perhaps on a single processor)
• networked machines
113
Problems
More Problems!
Theme: as for sequential languages, but much more so, it’s a complicated
world.
Aim of this lecture: just to give you a taste of how a little semantics can
be used to express some of the fine distinctions. Primarily (1) to boost
your intuition for informal reasoning, but also (2) this can support rigorous
proof about really hairy crypto protocols, cache-coherency protocols,
comms, database transactions,....
114
Booleans b ∈ B = {true, false}
Integers n ∈ Z = {..., −1, 0, 1, ...}
Locations ℓ ∈ L = {l , l0 , l1 , l2 , ...}
Operations op ::= + |≥
Expressions
(thread) Γ ⊢ e:unit
Γ ⊢ e:proc
he2 , si −→ he2′ , s ′ i
(parallel2)
he1 ∥ e2 , si −→ he1 ∥ e2′ , s ′ i
115
But, assignments and dereferencing are atomic. For example,
hl := 3498734590879238429384 ∥ l := 7, {l 7→ 0}i
will reduce to a state with l either 3498734590879238429384 or 7, not
something with the first word of one and the second word of the other.
Implement?
Note that the labels +, w and r in this picture are just informal hints as to how those
transitions were derived – they are not actually part of the reduction relation.
Some of the nondeterministic choices “don’t matter”, as you can get back to the same state.
Others do...
Morals
• Almost certainly you (as the programmer) didn’t want all those 3
outcomes to be possible – need better idioms or constructs for
programming.
Usually, though, you can depend on built-in support from the scheduler,
e.g. for mutexes and condition variables (or, at a lower level, tas or
cas).
See this – in the library – for a good discussion of mutexes and condition variables: A. Birrell,
J. Guttag, J. Horning, and R. Levin. Thread synchronization: a Formal Specification. In G.
Nelson, editor, System Programming with Modula-3, chapter 5, pages 119-129. Prentice-
Hall, 1991.
See N. Lynch. Distributed Algorithms for other mutual exclusion algorithms (and much else
besides).
116
Consider simple mutexes, with commands to lock an unlocked mutex and to unlock a locked
mutex (and do nothing for an unlock of an unlocked mutex).
(lock) Γ ⊢ lock m:unit
(unlock) Γ ⊢ unlock m:unit
Note that (lock) atomically (a) checks the mutex is currently false, (b) changes its state,
and (c) lets the thread proceed.
Also, there is no record of which thread is holding a locked mutex.
Need to adapt all the other semantic rules to carry the mutex state M
around. For example, replace
he2 , si −→ he2′ , s ′ i
(op2)
hv op e2 , si −→ hv op e2′ , s ′ i
by
he2 , s, M i −→ he2′ , s ′ , M ′ i
(op2)
hv op e2 , s, M i −→ hv op e2′ , s ′ , M ′ i
(note, the M and s must behave the same wrt evaluation order).
Using a Mutex
Consider
In all the intervening states (until the first unlock ) the second lock can’t proceed.
Look back to behaviour of the program without mutexes. We’ve essentially cut down to the
top and bottom paths (and also added some extra reductions for lock , unlock , and ;).
In this example, l := 1+!l and l := 7+!l commute, so we end up in the same final state
whichever got the lock first. In general, that won’t be the case.
On the downside, we’ve also lost any performance benefits of concurrency (for this program
that’s fine, but in general there’s some other computation that wouldn’t conflict and so
could be done in parallel).
117
Using Several Mutexes
lock m can block (that’s the point). Hence, you can deadlock.
Locking Disciplines
There are many possible locking disciplines. We’ll focus on one, to see
how it – and the properties it guarantees – can be made precise and
proved.
These are semantic properties again. In general, it won’t be computable whether they hold.
For simple ei , though, it’s often obvious. Further, one can construct syntactic disciplines
that are checkable and are sufficient to guarantee these.
See Transactional Information Systems, Gerhard Weikum and Gottfried Vossen, for much
more detail on locking disciplines etc. (albeit not from a programming-language semantics
perspective).
118
Solution: Write One Down
Instead of only defining the global he, s, M i −→ he ′ , s ′ , M ′ i, with rules
(assign1) hℓ := n, s, M i −→ hskip, s + {ℓ 7→ n}, M i if ℓ ∈ dom(s)
he1 , s, M i −→ he1′ , s ′ , M ′ i
(parallel1)
he1 ∥ e2 , s, M i −→ he1′ ∥ e2 , s ′ , M ′ i
define a per-thread labelled transition relation e −→^a e ′ and use that to define
he, s, M i −→ he ′ , s ′ , M ′ i, with rules like
(t-assign1) ℓ := n −→^{ℓ:=n} skip
e1 −→^a e1′
(t-parallel1)
e1 ∥ e2 −→^a e1′ ∥ e2
e −→^{ℓ:=n} e ′   ℓ ∈ dom(s)
(c-assign)
he, s, M i −→ he ′ , s + {ℓ 7→ n}, M i
(op ≥) hn1 ≥ n2 , s, M i −→ hb, s, M i if b = (n1 ≥ n2 )
(t-op ≥) n1 ≥ n2 −→^τ b if b = (n1 ≥ n2 )

he1 , s, M i −→ he1′ , s ′ , M ′ i
(op1)
he1 op e2 , s, M i −→ he1′ op e2 , s ′ , M ′ i

e1 −→^a e1′
(t-op1)
e1 op e2 −→^a e1′ op e2

(deref) h!ℓ, s, M i −→ hn, s, M i if ℓ ∈ dom(s) and s(ℓ) = n
(t-deref) !ℓ −→^{!ℓ=n} n

(assign1) hℓ := n, s, M i −→ hskip, s + {ℓ 7→ n}, M i if ℓ ∈ dom(s)
(t-assign1) ℓ := n −→^{ℓ:=n} skip

he, s, M i −→ he ′ , s ′ , M ′ i
(assign2)
hℓ := e, s, M i −→ hℓ := e ′ , s ′ , M ′ i

e −→^a e ′
(t-assign2)
ℓ := e −→^a ℓ := e ′

(seq1) hskip; e2 , s, M i −→ he2 , s, M i
(t-seq1) skip; e2 −→^τ e2

he1 , s, M i −→ he1′ , s ′ , M ′ i
(seq2)
he1 ; e2 , s, M i −→ he1′ ; e2 , s ′ , M ′ i

e1 −→^a e1′
(t-seq2)
e1 ; e2 −→^a e1′ ; e2

(if1) hif true then e2 else e3 , s, M i −→ he2 , s, M i
(t-if1) if true then e2 else e3 −→^τ e2

(if2) hif false then e2 else e3 , s, M i −→ he3 , s, M i
(t-if2) if false then e2 else e3 −→^τ e3

he1 , s, M i −→ he1′ , s ′ , M ′ i
(if3)
hif e1 then e2 else e3 , s, M i −→ hif e1′ then e2 else e3 , s ′ , M ′ i

e1 −→^a e1′
(t-if3)
if e1 then e2 else e3 −→^a if e1′ then e2 else e3

(while) hwhile e1 do e2 , s, M i −→ hif e1 then (e2 ; while e1 do e2 ) else skip, s, M i
(t-while) while e1 do e2 −→^τ if e1 then (e2 ; while e1 do e2 ) else skip

he1 , s, M i −→ he1′ , s ′ , M ′ i
(parallel1)
he1 ∥ e2 , s, M i −→ he1′ ∥ e2 , s ′ , M ′ i

e1 −→^a e1′
(t-parallel1)
e1 ∥ e2 −→^a e1′ ∥ e2

he2 , s, M i −→ he2′ , s ′ , M ′ i
(parallel2)
he1 ∥ e2 , s, M i −→ he1 ∥ e2′ , s ′ , M ′ i

e2 −→^a e2′
(t-parallel2)
e1 ∥ e2 −→^a e1 ∥ e2′

(lock) hlock m, s, M i −→ h(), s, M + {m 7→ true}i if ¬M (m)
(t-lock) lock m −→^{lock m} ()

(unlock) hunlock m, s, M i −→ h(), s, M + {m 7→ false}i
(t-unlock) unlock m −→^{unlock m} ()

e −→^τ e ′
(c-tau)
he, s, M i −→ he ′ , s, M i

e −→^{ℓ:=n} e ′   ℓ ∈ dom(s)
(c-assign)
he, s, M i −→ he ′ , s + {ℓ 7→ n}, M i

e −→^{lock m} e ′   ¬M (m)
(c-lock)
he, s, M i −→ he ′ , s, M + {m 7→ true}i

e −→^{!ℓ=n} e ′   ℓ ∈ dom(s) ∧ s(ℓ) = n
(c-deref)
he, s, M i −→ he ′ , s, M i

e −→^{unlock m} e ′
(c-unlock)
he, s, M i −→ he ′ , s, M + {m 7→ false}i
119
Example of Thread-local transitions
120
The Theorem
(may be false!)
Language Properties
8.1 Exercises
Exercise 37 ⋆⋆Are the mutexes specified here similar to those described in CSAA?
Exercise 38 ⋆⋆Can you show all the conditions for O2PL are necessary, by giving for
each an example that satisfies all the others and either is not serialisable or deadlocks?
Exercise 39 ⋆⋆⋆⋆Prove the Conjecture about it.
Exercise 40 ⋆⋆⋆Write a semantics for an extension of L1 with threads that are more
like Unix threads (e.g. with thread ids, fork, etc..). Include some of the various ways Unix
threads can exchange information.
121
9 Low-level semantics
Low-level semantics
10 Epilogue
Epilogue
Lecture Feedback
Please do fill in the lecture feedback form – we need to know how the
course could be improved / what should stay the same.
My impression...
Need:
122
What can you use semantics for?
The End
123
Global Semantics, paired with the corresponding Thread-Local Semantics rules:

(op +) hn1 + n2 , s, M i −→ hn, s, M i if n = n1 + n2
(t-op +) n1 + n2 −→^τ n if n = n1 + n2

(op ≥) hn1 ≥ n2 , s, M i −→ hb, s, M i if b = (n1 ≥ n2 )
(t-op ≥) n1 ≥ n2 −→^τ b if b = (n1 ≥ n2 )

he2 , s, M i −→ he2′ , s ′ , M ′ i
(op2)
hv op e2 , s, M i −→ hv op e2′ , s ′ , M ′ i

e2 −→^a e2′
(t-op2)
v op e2 −→^a v op e2′

(deref) h!ℓ, s, M i −→ hn, s, M i if ℓ ∈ dom(s) and s(ℓ) = n
(t-deref) !ℓ −→^{!ℓ=n} n

(assign1) hℓ := n, s, M i −→ hskip, s + {ℓ 7→ n}, M i if ℓ ∈ dom(s)
(t-assign1) ℓ := n −→^{ℓ:=n} skip

he, s, M i −→ he ′ , s ′ , M ′ i
(assign2)
hℓ := e, s, M i −→ hℓ := e ′ , s ′ , M ′ i

e −→^a e ′
(t-assign2)
ℓ := e −→^a ℓ := e ′

(seq1) hskip; e2 , s, M i −→ he2 , s, M i
(t-seq1) skip; e2 −→^τ e2

he1 , s, M i −→ he1′ , s ′ , M ′ i
(seq2)
he1 ; e2 , s, M i −→ he1′ ; e2 , s ′ , M ′ i

e1 −→^a e1′
(t-seq2)
e1 ; e2 −→^a e1′ ; e2

(if1) hif true then e2 else e3 , s, M i −→ he2 , s, M i
(t-if1) if true then e2 else e3 −→^τ e2

(if2) hif false then e2 else e3 , s, M i −→ he3 , s, M i
(t-if2) if false then e2 else e3 −→^τ e3

he1 , s, M i −→ he1′ , s ′ , M ′ i
(if3)
hif e1 then e2 else e3 , s, M i −→ hif e1′ then e2 else e3 , s ′ , M ′ i

e1 −→^a e1′
(t-if3)
if e1 then e2 else e3 −→^a if e1′ then e2 else e3

(while) hwhile e1 do e2 , s, M i −→ hif e1 then (e2 ; while e1 do e2 ) else skip, s, M i
(t-while) while e1 do e2 −→^τ if e1 then (e2 ; while e1 do e2 ) else skip

he1 , s, M i −→ he1′ , s ′ , M ′ i
(parallel1)
he1 ∥ e2 , s, M i −→ he1′ ∥ e2 , s ′ , M ′ i

e1 −→^a e1′
(t-parallel1)
e1 ∥ e2 −→^a e1′ ∥ e2

he2 , s, M i −→ he2′ , s ′ , M ′ i
(parallel2)
he1 ∥ e2 , s, M i −→ he1 ∥ e2′ , s ′ , M ′ i

e2 −→^a e2′
(t-parallel2)
e1 ∥ e2 −→^a e1 ∥ e2′

e −→^{ℓ:=n} e ′   ℓ ∈ dom(s)
(c-assign)
he, s, M i −→ he ′ , s + {ℓ 7→ n}, M i

e −→^{lock m} e ′   ¬M (m)
(c-lock)
he, s, M i −→ he ′ , s, M + {m 7→ true}i

e −→^{!ℓ=n} e ′   ℓ ∈ dom(s) ∧ s(ℓ) = n
(c-deref)
he, s, M i −→ he ′ , s, M i

e −→^{unlock m} e ′
(c-unlock)
he, s, M i −→ he ′ , s, M + {m 7→ false}i
124
The behaviour of (l := 1+!l ) ∥ (l := 7+!l ) for the initial store {l 7→ 0}:
[diagram: the full reduction graph. From h(l := 1+!l) ∥ (l := 7+!l), {l 7→ 0}i the r (read), + and w (write) steps of the two threads interleave; the reachable final states are h() ∥ (), {l 7→ 8}i, h() ∥ (), {l 7→ 7}i and h() ∥ (), {l 7→ 1}i]
125
Proofs differ, but for many of those you meet the following steps should be helpful.
1. Make sure the statement of the conjecture is precisely defined. In particular, make
sure you understand any strange notation, and find the definitions of all the auxiliary
gadgets involved (e.g. definitions of any typing or reduction relations mentioned in the
statement, or any other predicates or functions).
2. Try to understand at an intuitive level what the conjecture is saying – verbalize out
loud the basic point. For example, for a Type Preservation conjecture, the basic
point might be something like “if a well-typed configuration reduces, the result is still
well-typed (with the same type)”.
3. Try to understand intuitively why it is true (or false...). Identify what the most
interesting cases might be — the cases that you think are most likely to be suspicious,
or hard to prove. Sometimes it’s good to start with the easy cases (if the setting
is unfamiliar to you); sometimes it’s good to start with the hard cases (to find any
interesting problems as soon as possible).
4. Think of a good basic strategy. This might be:
(a) simple logic manipulations;
(b) collecting together earlier results, again by simple logic; or
(c) some kind of induction.
5. Try it! (remembering you might have to backtrack if you discover you picked a strategy
that doesn’t work well for this conjecture). This might involve any of the following:
(a) Expanding definitions, inlining them. Sometimes you can just blindly expand all
definitions, but more often it’s important to expand only the definitions which
you want to work with the internal structure of — otherwise things just get too
verbose.
(b) Making abbreviations — defining a new variable to stand for some complex gadget
you’re working with, saying e.g.
where e = (let x:int = 7+2 in x+x)
Take care with choosing variable names.
126
(c) Doing equational reasoning, e.g.
e = e1 by ...
= e2 by ...
= e3 as ...
Here the e might be any mathematical object — arithmetic expressions, or ex-
pressions of some grammar, or formulae. Some handy equations over formulae
are given in §A.2.2.
(d) Proving a formula based on its structure. For example, to prove a formula ∀x ∈
S.P (x) you would often assume you have an arbitrary x and then try to prove
P (x).
Take an arbitrary x ∈ S.
We now have to show P (x):
This is covered in detail in §A.2.3. Much proof is of this form, automatically
driven by the structure of the formula.
(e) Using an assumption you’ve made above.
(f) Induction. As covered in the 1B Semantics notes, there are various kinds of induc-
tion you might want to use: mathematical induction over the natural numbers,
structural induction over the elements of some grammar, or rule induction over
the rules defining some relation (especially a reduction or typing relation). For
each, you should:
i. Decide (and state!) what kind of induction you’re using. This may need
some thought and experience, and you might have to backtrack.
ii. Remind yourself what the induction principle is exactly.
iii. Decide on the induction hypothesis you’re going to use, writing down a pred-
icate Φ which is such that the conclusion of the induction principle implies
the thing you’re trying to prove. Again, this might need some thought. Take
care with the quantifiers here — it’s suspicious if your definition of Φ has
any globally-free variables...
iv. Go through each of the premises of the induction principle and prove each one
(using any of these techniques as appropriate). Many of those premises will
be implications, e.g. ∀x ∈ N.Φ(x) ⇒ Φ(x + 1), for which you can do a proof
based on the structure of the formula — taking an arbitrary x, assuming
Φ(x), and trying to prove Φ(x + 1). Usually at some point in the latter you’d
make use of the assumption Φ(x).
6. In all of the above, remember: the point of doing a proof on paper is to use the
formalism to help you think — to help you cover all cases, precisely — and also to
communicate with the reader. For both, you need to write clearly:
(a) Use enough words! “Assume”, “We have to show”, “By such-and-such we know”,
“Hence”,...
(b) Don’t use random squiggles. It’s good to have formulae properly nested within
text, with no “⇒” or “∴” between lines of text.
7. If it hasn’t worked yet... either
(a) you’ve made some local mistake, e.g. mis-instantiated something, or used the
same variable for two different things, or not noticed that you have a definition
you should have expanded or an assumption you should have used. Fix it and
continue.
127
(b) you’ve discovered that the conjecture is really false. Usually at this point it’s
a good idea to construct a counterexample that is as simple as possible, and to
check carefully that it really is a counterexample.
(c) you need to try a different strategy — often, to use a different induction principle
or to strengthen your induction hypothesis.
(d) you didn’t really understand intuitively what the conjecture is saying, or what
the definitions it uses mean. Go back to them again.
8. If it has worked: read through it, skeptically, and check. Maybe you’ll need to re-write
it to make it comprehensible: proof discovery is not the same as proof exposition. See
the example proofs in the Semantics notes.
9. Finally, give it to someone else, as skeptical and careful as you can find, to see if they
believe it — to see if they believe that what you’ve written down is a proof, not that
they believe that the conjecture is true.
128
A.2 And in More Detail...
First, I’ll explain informal proof intuitively, giving a couple of examples. Then I’ll explain
how this intuition is reflected in the sequent rules from Logic and Proof.
In the following, I’ll call any logic statement a formula. In general, what we’ll be trying to
do is prove a formula, using a collection of formulas that we know to be true or are assuming
to be true. There’s a big difference between using a formula and proving a formula. In fact,
what you do is in many ways opposite. So, I’ll start by explaining how to prove a formula.
A.2.1 Meet the Connectives
Here are the logical connectives and a very brief description of what each means.
P ∧ Q          P and Q are both true
P ∨ Q          P is true, or Q is true, or both are true
¬P             P is not true (P is false)
P ⇒ Q          if P is true then Q is true
P ⇔ Q          P is true exactly when Q is true
∀x ∈ S.P(x)    for all x in S, P is true of x
∃x ∈ S.P(x)    there exists an x in S such that P holds of x
A.2.2 Equivalences
These are formulas that mean the same thing, and this is indicated by a ≃ between them.
The fact that they are equivalent to each other is justified by the truth tables of the con-
nectives.
definition of ⇒             P ⇒ Q ≃ ¬P ∨ Q
definition of ⇔             P ⇔ Q ≃ (P ⇒ Q) ∧ (Q ⇒ P)
definition of ¬             ¬P ≃ P ⇒ false
de Morgan’s laws            ¬(P ∧ Q) ≃ ¬P ∨ ¬Q
                            ¬(P ∨ Q) ≃ ¬P ∧ ¬Q
extension to quantifiers    ¬(∀x.P(x)) ≃ ∃x.¬P(x)
                            ¬(∃x.P(x)) ≃ ∀x.¬P(x)
distributive laws           P ∨ (Q ∧ R) ≃ (P ∨ Q) ∧ (P ∨ R)
                            P ∧ (Q ∨ R) ≃ (P ∧ Q) ∨ (P ∧ R)
coalescing quantifiers      (∀x.P(x)) ∧ (∀x.Q(x)) ≃ ∀x.(P(x) ∧ Q(x))
                            (∃x.P(x)) ∨ (∃x.Q(x)) ≃ ∃x.(P(x) ∨ Q(x))
these apply if x is         (∀x.P(x)) ∧ Q ≃ ∀x.(P(x) ∧ Q)
not free in Q               (∀x.P(x)) ∨ Q ≃ ∀x.(P(x) ∨ Q)
                            (∃x.P(x)) ∧ Q ≃ ∃x.(P(x) ∧ Q)
                            (∃x.P(x)) ∨ Q ≃ ∃x.(P(x) ∨ Q)
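As a sanity check, here is the first de Morgan law proved in Lean 4 (again purely
illustrative, not part of the notes). The two components of the pair are the two
implications making up the iff:

  -- one of de Morgan's laws; each direction is one implication
  theorem demorgan (P Q : Prop) : ¬(P ∨ Q) ↔ ¬P ∧ ¬Q :=
    ⟨fun h => ⟨fun p => h (Or.inl p), fun q => h (Or.inr q)⟩,  -- forwards
     fun ⟨np, nq⟩ h => h.elim np nq⟩                           -- backwards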
A.2.3 How to Prove a Formula
For each of the logical connectives, I’ll explain how to handle them.
∀x ∈ S.P (x) This means “For all x in S, P is true of x.” Such a formula is called a
universally quantified formula. The goal is to prove that the property P , which has some
xs somewhere in it, is true no matter what value in S x takes on. Often the “∈ S” is left
out. For example, in a discussion of lists, you might be asked to prove ∀l.length l > 0 ⇒
∃x. member(x, l). Obviously, l is a list, even if it isn’t explicitly stated as such.
There are several choices as to how to prove a formula beginning with ∀x. The standard
thing to do is to just prove P (x), not assuming anything about x. Thus, in doing the proof
you sort of just mentally strip off the ∀x. What you would write when doing this is “Let x be
any S”. However, there are some subtleties—if you’re already using an x for something else,
you can’t use the same x, because then you would be assuming something about x, namely
that it equals the x you’re already using. In this case, you need to use alpha-conversion¹ to
change the formula you want to prove to ∀y ∈ S.P (y), where y is some variable you’re not
already using, and then prove P (y). What you could write in this case is “Since x is already
in use, we’ll prove the property of y”.
An alternative is induction, if S is a set that is defined with a structural definition. Many
objects you’re likely to be proving properties of are defined with a structural definition.
This includes natural numbers, lists, trees, and terms of a computer language. Sometimes
you can use induction over the natural numbers to prove things about other objects, such
as graphs, by inducting over the number of nodes (or edges) in a graph.
You use induction when you see that during the course of the proof you would need to use
the property P for the subparts of x in order to prove it for x. This usually ends up being
the case if P involves functions defined recursively (i.e., the return value for the function
depends on the function value on the subparts of the argument).
A special case of induction is case analysis. It’s basically induction where you don’t use the
inductive hypothesis: you just prove the property for each possible form that x could have.
Case analysis can be used to prove the theorem about lists above.
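For instance, the list theorem above goes through by exactly this case analysis; here it
is checked in Lean 4 (purely illustrative):

  -- ∀ l. length l > 0 ⇒ ∃ x. member(x, l), by case analysis on l
  example : ∀ l : List Nat, l.length > 0 → ∃ x, x ∈ l := by
    intro l hl
    cases l with
    | nil      => exact absurd hl (by decide)   -- length [] = 0, contradicting hl
    | cons a t => exact ⟨a, List.Mem.head t⟩    -- the head is a member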
A final possibility (which you can use for all formulas, not just for universally quantified
ones) is to assume the contrary, and then derive a contradiction.
∃x ∈ S.P (x) This says “There exists an x in S such that P holds of x.” Such a formula is
called an existentially quantified formula. The main way to prove this is to figure out what
x has to be (that is, to find a concrete representation of it), and then prove that P holds of
that value. Sometimes you can’t give a completely specified value, since the value you pick
for x has to depend on the values of other things you have floating around. For example,
say you want to prove
∀x, y ∈ ℜ.x < y ∧ sin x < 0 ∧ sin y > 0 ⇒ ∃z.x < z ∧ z < y ∧ sin z = 0
where ℜ is the set of real numbers. By the time you get to dealing with the ∃z.x < z ∧ z <
y ∧ sin z = 0, you will have already assumed that x and y were any real numbers. Thus the
value you choose for z has to depend on whatever x and y are.
An alternative way to prove ∃x ∈ S.P (x) is, of course, to assume that no such x exists, and
derive a contradiction.
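In Lean 4 notation (illustrative), supplying a witness looks like this; in the second
example the witness x + 1 depends on the universally quantified x, just as z above must
depend on x and y:

  -- prove ∃ by giving a concrete witness and proving the property of it
  example : ∃ n : Nat, n * n = 9 := ⟨3, rfl⟩

  -- the witness may depend on variables introduced earlier
  example : ∀ x : Nat, ∃ y : Nat, x < y :=
    fun x => ⟨x + 1, Nat.lt_succ_self x⟩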
To summarize what I’ve gone over so far: to prove a universally quantified formula, you must
prove it for a generic variable, one that you haven’t used before. To prove an existentially
quantified formula, you get to choose a value that you want to prove the property of.
P ⇒ Q This says “If P is true, then Q is true”. Such a formula is called an implication,
and it is often pronounced “P implies Q”. The part before the ⇒ sign (here P ) is called
the antecedent, and the part after the ⇒ sign (here Q) is called the consequent. P ⇒ Q is
equivalent to ¬P ∨ Q, and so if P is false, or if Q is true, then P ⇒ Q is true.
The standard way to prove this is to assume P , then use it to help you prove Q. Note that
I said that you will be using P . Thus you will need to follow the rules in Section A.2.4 to
deal with the logical connectives in P .
Other ways to prove P ⇒ Q involve the fact that it is equivalent to ¬P ∨ Q. Thus, you can
prove ¬P without bothering with Q, or you can just prove Q without bothering with P .
¹ Alpha-equivalence says that the name of a bound variable doesn’t matter, so you can change it at will
(this is called alpha-conversion). You’ll get to know the exact meaning of this soon enough, so I won’t
explain it here.
To reason by contradiction you assume that P is true and that Q is not true, and derive a
contradiction.
Another alternative is to prove the contrapositive: ¬Q ⇒ ¬P , which is equivalent to it.
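The same alternatives in Lean 4 (illustrative): a direct proof assumes P and produces
Q; the contrapositive route recovers P ⇒ Q by a classical step.

  -- direct: assume P (the hypothesis) and prove Q
  example (P Q : Prop) (q : Q) : P → Q := fun _hp => q

  -- via the contrapositive ¬Q ⇒ ¬P, using proof by contradiction
  example (P Q : Prop) (contra : ¬Q → ¬P) : P → Q :=
    fun hp => Classical.byContradiction (fun hnq => contra hnq hp)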
P ⇔ Q This says “P is true if and only if Q is true”. The phrase “if and only if” is usually
abbreviated “iff”. Basically, this means that P and Q are either both true, or both false.
Iff is usually used in two main ways: one is where the equivalence is due to one formula
being a definition of another. For example, A ⊆ B ⇔ (∀x. x ∈ A ⇒ x ∈ B) is the standard
definition of subset. For these iff statements, you don’t have to prove them. The other use
of iff is to state the equivalence of two different things. For example, you could define an
SML function fact:
fun fact 0 = 1
| fact n = n * fact (n - 1)
Since in SML whole numbers are integers (both positive and negative) you may be asked
to prove: fact x terminates ⇔ x ≥ 0. The standard way to do this is to use the equivalence:
P ⇔ Q is equivalent to (P ⇒ Q) ∧ (Q ⇒ P ). And so you’d prove that (fact x terminates ⇒
x ≥ 0) ∧ (x ≥ 0 ⇒ fact x terminates).
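The shape of that proof, in Lean 4 (illustrative): an iff is proved by supplying both
implications.

  -- P ⇔ Q from (P ⇒ Q) ∧ (Q ⇒ P)
  example (P Q : Prop) (forwards : P → Q) (backwards : Q → P) : P ↔ Q :=
    ⟨forwards, backwards⟩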
¬P This says “P is not true”. It is equivalent to P ⇒ false, thus this is one of the ways
you prove it: you assume that P is true, and derive a contradiction (that is, you prove
false). Here’s an example of this, which you’ll run into later this year: the undecidability
of the halting problem can be rephrased as ¬∃x ∈ RM. x solves the halting problem, where
RM is the set of register machines. The proof of this in your Computation Theory notes
follows exactly the pattern I described—it assumes there is such a machine and derives a
contradiction.
The other major way to prove ¬P is to figure out what the negation of P is, using equiva-
lences like de Morgan’s laws, and then prove that. For example, to prove ¬∀x ∈ N . ∃y ∈
N . x = y², where N is the set of natural numbers, you could push in the negation to get:
∃x ∈ N . ∀y ∈ N . x ≠ y², and then you could prove that.
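Both strategies, on a tiny example in Lean 4 (illustrative): the first assumes the formula
and derives a contradiction; the second proves the pushed-in ∀ form instead.

  -- ¬P as P ⇒ false: assume the ∃ and derive false
  example : ¬ ∃ n : Nat, n + 1 = 0 :=
    fun ⟨n, h⟩ => Nat.succ_ne_zero n h

  -- equivalently, push the negation in and prove the ∀ form
  example : ∀ n : Nat, n + 1 ≠ 0 :=
    fun n h => Nat.succ_ne_zero n h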
P ∨ Q This says “P is true or Q is true”. This is inclusive or: if P and Q are both true,
then P ∨ Q is still true. Such a formula is called a disjunction. To prove this, you can prove
P or you can prove Q. You have to choose which one to prove. For example, if you need to
prove (5 mod 2 = 0) ∨ (5 mod 2 = 1), then you’ll choose the second one and prove that.
However, as with existentials, the choice of which one to prove will often depend on the
values of other things, like universally quantified variables. For example, when you are
studying the theory of programming languages (you will get a bit of this in Semantics), you
might be asked to prove
∀P ∈ ML. (the evaluation of P runs forever) ∨ (∃v. P evaluates to v)
where ML is the set of all ML programs. You don’t know in advance which of these will be
the case, since some programs do run forever, and some do evaluate to a value. Generally,
the best way to prove the disjunction in this case (when you don’t know in advance which
will hold) is to use the equivalence with implication. For example, you can use the fact
that P ∨ Q is equivalent to ¬P ⇒ Q, then assume ¬P , then use this to prove Q. For
example, your best bet for proving this programming languages theorem is to assume that
the evaluation of P doesn’t run forever, and use this to prove that P evaluates to a value.
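In Lean 4 (illustrative): the first example simply picks the disjunct it can prove; the
second is the ¬P ⇒ Q route, which uses the excluded middle to split on P .

  -- prove a disjunction by proving the disjunct you can
  example : 5 % 2 = 0 ∨ 5 % 2 = 1 := Or.inr rfl

  -- P ∨ Q from ¬P ⇒ Q, via the excluded middle
  example (P Q : Prop) (h : ¬P → Q) : P ∨ Q :=
    (Classical.em P).elim Or.inl (fun np => Or.inr (h np))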
A.2.4 How to Use a Formula
You often end up using a formula to prove other formulas. You can use a formula if someone
has already proved that it’s true, or if you are assuming it because it is the antecedent
(the A in A ⇒ B) of an implication you are proving. For each logical connective, I’ll tell
you how to use it.
∀x ∈ S.P (x) This formula says that something is true of all elements of S. Thus, when
you use it, you can pick any value at all to use instead of x (call it v), and then you can use
P (v).
∃x ∈ S.P (x) This formula says that there is some x that satisfies P . However, you do not
know what it is, so you cannot assume anything about it. The usual approach is to just
say that the thing that is being said to exist is just x, and use the fact that P holds of x to
prove something else. However, if you’re already using an x for something else, you have to
pick another variable to represent the thing that exists.
To summarize this: to use a universally quantified formula, you can choose any value, and
use that the formula holds for that variable. To use an existentially quantified formula, you
must not assume anything about the value that is said to exist, so you just use a variable
(one that you haven’t used before) to represent it. Note that this is more or less opposite
of what you do when you prove a universally or existentially quantified formula.
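The contrast, in Lean 4 (illustrative): a ∀ hypothesis may be instantiated at any value
we like, while an ∃ hypothesis is eliminated to a fresh x about which nothing further is
known.

  -- using ∀x.P(x): instantiate at any value, here 42
  example (P : Nat → Prop) (h : ∀ x, P x) : P 42 := h 42

  -- using ∃x.P(x): obtain some x with P x, assuming nothing more about it
  example (P : Nat → Prop) (Q : Prop) (h : ∃ x, P x) (use : ∀ x, P x → Q) : Q :=
    h.elim use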
¬P Usually, the main use of this formula is to prove the negation of something else.
An example is the use of reduction to prove the unsolvability of various problems in the
Computation Theory course (you’ll learn all about this in Lent term). You want to prove ¬Q,
where Q states that a certain problem (Problem 1) is decidable (in other words, you want
to prove that Problem 1 is not decidable). You know ¬P , where P states that another
problem (Problem 2) is decidable (i.e. ¬P says that Problem 2 is not decidable). What you
do basically is this. You first prove Q ⇒ P , which says that if Problem 1 is decidable, then
so is Problem 2. Since Q ⇒ P ≃ ¬P ⇒ ¬Q, you have now proved ¬P ⇒ ¬Q. You already
know ¬P , so you use modus ponens2 to get that ¬Q.
P ⇒ Q The main way to use this is that you prove P , and then you use modus ponens to
get Q, which you can then use.
P ∧ Q Here you can use both P and Q. Note, you’re not required to use both of them, but
they are both true and are waiting to be used by you if you need them.
P ∨ Q Here, you know that one of P or Q is true, but you do not know which one. To use
this to prove something else, you have to do a split: first you prove the thing using P , then
you prove it using Q.
Note that in each of the above, there is again a difference in the way you use a formula,
versus the way you prove it. They are in a way almost opposites. For example, in proving
P ∧ Q, you have to prove both P and Q, but when you are using the formula, you don’t
have to use both of them.
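And in Lean 4 (illustrative): from P ∧ Q either component is available, while P ∨ Q
forces a case split.

  -- using P ∧ Q: both parts are available (only P is needed here)
  example (P Q R : Prop) (h : P ∧ Q) (f : P → R) : R := f h.left

  -- using P ∨ Q: prove the goal from P, and again from Q
  example (P Q R : Prop) (h : P ∨ Q) (f : P → R) (g : Q → R) : R :=
    h.elim f g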
A.3 An Example
There are several exercises in the Semantics notes that ask you to prove something. Here,
we’ll go back to Regular Languages and Finite Automata. (If they’ve faded, it’s time
² Modus ponens says that if A ⇒ B and A are both true, then B is true.
to remind yourself of them.) The Pumping Lemma for regular sets (PL for short) is an
astonishingly good example of the use of quantifiers. We’ll go over the proof and use of the
PL, paying special attention to the logic of what’s happening.
My favorite book on regular languages, finite automata, and their friends is the Hopcroft
and Ullman book Introduction to Automata Theory, Languages, and Computation. You
should locate this book in your college library, and if it isn’t there, insist that your DoS
order it for you.
In the Automata Theory book, the Pumping Lemma is stated as: “Let L be a regular set.
Then there is a constant n such that if z is any word in L, and |z| ≥ n, we may write z = uvw
in such a way that |uv| ≤ n, |v| ≥ 1, and for all i ≥ 0, uvⁱw is in L.” The Pumping Lemma
is, in my experience, one of the most difficult things about learning automata theory. It
is difficult because people don’t know what to do with all those logical connectives. Let’s
write it as a logical formula.
∀L ∈ RegularLanguages.
  ∃n. ∀z ∈ L. |z| ≥ n ⇒
    ∃u, v, w. z = uvw ∧ |uv| ≤ n ∧ |v| ≥ 1 ∧
      ∀i ≥ 0. uvⁱw ∈ L
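Purely to exhibit that quantifier alternation, here is the statement transcribed into
Lean 4 (illustrative only; the predicate IsRegular is left abstract, and the helper pow
for vⁱ is our own, since defining regular languages here is beside the point):

  -- pow v i is vⁱ, i.e. i copies of v concatenated
  def pow (v : List Char) : Nat → List Char
    | 0     => []
    | i + 1 => v ++ pow v i

  -- the PL, for an abstract notion of regularity of languages over Char
  def PumpingLemmaStatement (IsRegular : (List Char → Prop) → Prop) : Prop :=
    ∀ L, IsRegular L →
      ∃ n, ∀ z, L z → z.length ≥ n →
        ∃ u v w, z = u ++ v ++ w ∧ (u ++ v).length ≤ n ∧ v.length ≥ 1 ∧
          ∀ i, L (u ++ pow v i ++ w)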
Complicated, eh? Well, let’s prove it, using the facts that Hopcroft and Ullman have
established in the chapters previous to the one with the PL. I’ll give the proof and put in
square brackets comments about what I’m doing.
Let L be any regular language. [Here I’m dealing with the ∀L ∈ RegularLanguages by
stating that I’m not assuming anything about L.] Let M be a minimal-state deterministic
finite state machine accepting L. [Here I’m using a fact that Hopcroft and Ullman have
already proved about the equivalence of regular languages and finite automata.] Let n be
the number of states in this finite state machine. [I’m dealing with the ∃n by giving a very
specific value of what it will be, based on the arbitrary L.] Let z be any word in L. [Thus
I deal with ∀z ∈ L.] Assume that |z| ≥ n. [Thus I’m taking care of the ⇒ by assuming the
antecedent.]
Say z is written a₁a₂ . . . aₘ, where m ≥ n. Consider the states that M is in during the
processing of the first n symbols of z, a₁a₂ . . . aₙ. There are n + 1 of these states. Since
there are only n states in M , there must be a duplicate. Say that after symbols aⱼ and aₖ
we are in the same state, state s (i.e. there’s a loop from this state that the machine goes
through as it accepts z), and say that j < k. Now, let u = a₁a₂ . . . aⱼ. This represents the
part of the string that gets you to state s the first time. Let v = aⱼ₊₁ . . . aₖ. This represents
the loop that takes you from s and back to it again. Let w = aₖ₊₁ . . . aₘ, the rest of word
z. [We have chosen definite values for u, v, and w.] Then clearly z = uvw, since u, v, and
w are just different sections of z. |uv| ≤ n since u and v occur within the first n symbols
of z. |v| ≥ 1 since j < k. [Note that we’re dealing with the formulas connected with ∧ by
proving each of them.]
Now, let i be a natural number (i.e. ≥ 0). [This deals with ∀i ≥ 0.] Then uvⁱw ∈ L. [Finally
our conclusion, but we have to explain why this is true.] This is because we can repeat the
loop from s to s (represented by v) as many times as we like, and the resulting word will
still be accepted by M .
Now we use the PL to prove that a language is not regular. This is a rewording of Example
3.1 from Hopcroft and Ullman. I’ll show that L = {0^(i²) | i is an integer, i ≥ 1} is not regular.
Note that L consists of all strings of 0’s whose length is a perfect square. I will use the PL.
I want to prove that L is not regular. I’ll assume the negation (i.e., that L is regular) and
derive a contradiction. So here we go. Remember that what I’m emphasizing here is not
the finite automata stuff itself, but how to use a complicated theorem to prove something
else.
Assume L is regular. We will use the PL to get a contradiction. Since L is regular, the PL
applies to it. [We note that we’re using the ∀ part of the PL for this particular L.] Let n
be as described in the PL. [This takes care of using the ∃n. Note that we are not assuming
anything about its actual value, just that it’s a natural number.] Let z = 0^(n²). [Since the PL
says that something is true of all zs, we can choose the one we want to use it for.] So by the
PL there exist u, v, and w such that z = uvw, |uv| ≤ n, |v| ≥ 1. [Note that we don’t assume
anything about what the u, v, and w actually are; the only thing we know about them is
what the PL tells us about them. This is where people trying to use the PL usually screw
up.] The PL then says that for any i, uvⁱw ∈ L. Well, then uv²w ∈ L. [This is using
the ∀i ≥ 0 bit.] However, n² < |uv²w| ≤ n² + n, since 1 ≤ |v| ≤ n. But n² + n < (n + 1)².
Thus |uv²w| lies properly between n² and (n + 1)² and is thus not a perfect square. Thus
uv²w is not in L. This is a contradiction. Thus our assumption (that L was regular) was
incorrect. Thus L is not a regular language.
A.4 Sequent Calculus Rules
In this section, I will show how the intuitive approach to things that I’ve described above
is reflected in the sequent calculus rules. A sequent is Γ ⊢ ∆, where Γ and ∆ are sets of
formulas.³ Technically, a sequent means that

A₁ ∧ A₂ ∧ . . . ∧ Aₙ ⇒ B₁ ∨ B₂ ∨ . . . ∨ Bₘ    (1)

where A₁, . . . , Aₙ are the formulas in Γ and B₁, . . . , Bₘ are the formulas in ∆: if all the
formulas in Γ are true, then at least one of the formulas in ∆ is true.
³ Logic and Proof writes a sequent with ⇒ between the two sides, which here would clash with the
use of ⇒ as implication. Thus I will use ⊢. You will see something similar in Semantics, where it separates
assumptions (of the types of variables) from something that they allow you to prove.
⁴ I won’t mention iff here: as P ⇔ Q is equivalent to (P ⇒ Q) ∧ (Q ⇒ P ), we don’t need separate rules for
it.
Often you want to use a lemma, i.e. a formula that you have
already proved before. This is shown with the cut rule:
  Γ ⊢ ∆, P      P, Γ ⊢ ∆
  ------------------------ (cut)
  Γ ⊢ ∆
The ∆, P in the first sequent in the hypotheses means that to the right of the ⊢ we have
the set consisting of the formula P plus all the formulas in ∆, i.e., if all formulas in Γ are
true, then P or one of the formulas in ∆ is true. Similarly P, Γ to the left of the ⊢ in the
second sequent means the set consisting of the formula P plus all the formulas in Γ.
We read this rule from the bottom up to make sense of it. Say we want to prove one of the
formulas in ∆ from the formulas in Γ, and we want to make use of a formula P that we’ve
already proved. The fact that we’ve proved P is shown by the left hypothesis (of course,
unless the left hypothesis is itself a basic sequent, then in a completed proof there will be
more lines on top of the left hypothesis, showing the actual proof of the sequent). The fact
that we are allowed to use P in the proof of ∆ is shown in the right hand hypothesis. We
continue to build the proof up from there, using P .
Some other ways of getting formulas to the left of the ⊢ are shown in the rules (¬r) and
(⇒ r) below.
∀x ∈ S.P(x) The two rules for universally quantified formulas are:

  P(v), Γ ⊢ ∆
  ---------------- (∀l)
  ∀x.P(x), Γ ⊢ ∆

  Γ ⊢ ∆, P(x)
  ---------------- (∀r)
  Γ ⊢ ∆, ∀x.P(x)
∃x ∈ S.P(x) The two rules for existentially quantified formulas are:

  P(x), Γ ⊢ ∆
  ---------------- (∃l)
  ∃x.P(x), Γ ⊢ ∆

  Γ ⊢ ∆, P(v)
  ---------------- (∃r)
  Γ ⊢ ∆, ∃x.P(x)
In (∀l) we are using ∀x.P(x), so we can instantiate it at any value v we choose. In (∀r) we
are proving ∀x.P(x), so we must prove P(x) for a generic x: the side-condition is that x must
not be free in Γ or ∆. Similarly, in (∃l) the requirement of x not
being free in the conclusions comes from the requirement not to assume anything about x
(since we don’t know what it is). If x isn’t free in the conclusion, then it’s not free in Γ or
∆. If it were free in Γ or ∆, then we would be assuming that the x used there is the same
as the x we’re assuming exists, and this isn’t allowed.
In (∃r), we are proving ∃x.P (x). Thus we must pick a particular value (call it v) and prove
P for that value. The value v is allowed to contain variables that are free in Γ or ∆, since
you can set it to anything you want.
¬P The two rules for negation are:

  Γ ⊢ ∆, P
  ---------------- (¬l)
  ¬P, Γ ⊢ ∆

  P, Γ ⊢ ∆
  ---------------- (¬r)
  Γ ⊢ ∆, ¬P
Let’s start with the right rule first. I said that the way to prove ¬P is to assume P and
derive a contradiction. If ∆ is the empty set, then this is exactly what this rule says: If
there are no formulas to the right hand side of the ⊢, then this means that the formulas in
Γ are inconsistent (that means, they cannot all be true at the same time). This means that
you have derived a contradiction. So if ∆ is the empty set, the hypothesis of the rule says
that, assuming P , you have obtained a contradiction. Thus, if you are absolutely certain
about all your other hypotheses, then you can be sure that P is not true. The best way to
understand the rule if ∆ is not empty is to write out the meaning of the sequents in terms
of the meaning of the sequent given by Equation 1 and work out the equivalence of the top
and bottom of the rule using the equivalences in your Logic and Proof notes.
The easiest way to understand (¬l) is again by using equivalences.
P ⇒ Q The two rules for implication are:

  Γ ⊢ ∆, P      Q, Γ ⊢ ∆
  ------------------------ (⇒l)
  P ⇒ Q, Γ ⊢ ∆

  P, Γ ⊢ ∆, Q
  ---------------- (⇒r)
  Γ ⊢ ∆, P ⇒ Q
The rule (⇒l) is easily understood using the intuitive explanation of how to use P ⇒ Q given
above. First, we have to prove P . This is the left hypothesis. Then we can use Q, which is
what the right hypothesis says.
The right rule (⇒ r) is also easily understood. In order to prove P ⇒ Q, we assume P ,
then use this to prove Q. This is exactly what the hypothesis says.
P ∧ Q The two rules for conjunction are:

  P, Q, Γ ⊢ ∆
  ---------------- (∧l)
  P ∧ Q, Γ ⊢ ∆

  Γ ⊢ ∆, P      Γ ⊢ ∆, Q
  ------------------------ (∧r)
  Γ ⊢ ∆, P ∧ Q
Both of these rules are easily explained by the intuition above. The left rule (∧l) says that
when you use P ∧ Q, you can use P and Q. The right rule says that to prove P ∧ Q you must
prove P , and you must prove Q. You may wonder why we need separate hypotheses for
the two different proofs. We can’t just put P, Q to the right of the ⊢ in a single hypothesis,
because that would mean that we’re proving one or the other of them (see the meaning of
the sequent given in Equation 1). So we need separate hypotheses to make sure that each
of P and Q has actually been proved.
P ∨ Q The two rules for disjunction are:

  P, Γ ⊢ ∆      Q, Γ ⊢ ∆
  ------------------------ (∨l)
  P ∨ Q, Γ ⊢ ∆

  Γ ⊢ ∆, P, Q
  ---------------- (∨r)
  Γ ⊢ ∆, P ∨ Q
These are also easily understood by the intuitive explanations above. The left rule says that
to prove something (namely, one of the formulas in ∆) using P ∨ Q, you need to prove it
using P , then prove it using Q. The right rule says that in order to prove P ∨ Q, you can
prove one or the other. The hypothesis says that you can prove one or the other, because
in order to show a sequent Γ ⊢ ∆ true, you only need to show that one of the formulas in
∆ is true.