Baker CS341 Packet PDF
Dr. Baker
Fall 200?
For late-breaking information and news related to the class, see
http://www.cs.utexas.edu/users/dbaker/cs341/
Acknowledgements:
This material was assembled by Elaine Rich, who allowed me to use it wholesale. I am
greatly indebted to her for her efforts.
Table of Contents
I. Lecture Notes
A. Overview and Introduction
1.
2.
Regular Languages
Finite State Machines
Nondeterministic Finite State Machines
Interpreters for Finite State Machines
Equivalence of Regular Languages and FSMs
Languages that Are and Are Not Regular
A Review of Equivalence Relations
State Minimization
Summary of Regular Languages and Finite State Machines
Turing Machines
Computing with Turing Machines
Recursively Enumerable and Recursive Languages
Turing Machine Extensions
Problem Encoding, Turing Machine Encoding, and the Universal Turing
Machine
25. Grammars and Turing Machines
26. Undecidability
27. Introduction to Complexity Theory
II. Homework
A. Review
1.
Basic Techniques
Turing Machines
Computing with Turing Machines
Turing Machine Extensions
Unrestricted Grammars
Undecidability
E. Review
22. Review
III.
Supplementary Materials
The Three Hour Tour through Automata Theory
Review of Mathematical Concepts
Regular Languages and Finite State Machines
Context-Free Languages and Pushdown Automata
Recursively Enumerable Languages, Turing Machines, and Decidability
I. Lecture Notes
(3) Optimization: Realize that we can skip the first assignment since the value is never used and that we can precompute the
arithmetic expression, since it contains only constants.
(4) Termination: Decide whether the program is guaranteed to halt.
(5) Interpretation: Figure out what (if anything) it does.
Languages
(1) Σ = {0,1,2,3,4,5,6,7,8,9}
L = {w ∈ Σ*: w represents an odd integer}
  = {w ∈ Σ*: the last character of w is 1, 3, 5, 7, or 9}
  = (0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)* (1 ∪ 3 ∪ 5 ∪ 7 ∪ 9)
(2) Σ = {(, )}
L = {w ∈ Σ*: w has matched parentheses}
  = the set of strings accepted by the grammar:
    S → (S)
    S → SS
    S → ε
(3) L = {w: w is a sentence in English}
Examples:
Mary hit the ball.
Colorless green ideas sleep furiously.
The window needs fixed.
(4) L = {w: w is a C program that halts on all inputs}
Bottom Up Parsing
[Figure: the language hierarchy — Regular ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable Languages]
Regular Grammars
In a regular grammar, all rules must be of the form:
<one nonterminal> → <one terminal> <one nonterminal>, or
<one nonterminal> → <one terminal>, or
<one nonterminal> → ε
English:
S → NP VP
NP → the NP1 | NP1
NP1 → ADJ NP1 | N
N → boy | boys
VP → V | V NP
V → run | runs
What about boys runs
anbncn, n 1
Unrestricted Grammars
[Figure: the language hierarchy — Regular ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable Languages]
A Machine Hierarchy
[Figure: an FSM that accepts odd integers, with transitions on 1,3,5,7,9 and 0,2,4,6,8]
Finite State Machines
An FSM to accept identifiers:
[Figure: an FSM whose transitions are labelled letter, letter or digit, blank or delimiter, anything]
Pushdown Automata
[Figure: a PDA (states s, f) that pushes a's and b's, changes state on #, and then pops matching symbols, accepting strings of the form w#wR]
A Nondeterministic PDA
A PDA to accept strings of the form wwR
[Figure: a nondeterministic PDA that guesses where the middle of the string is]
A PDA to accept strings of the form anbncn
Turing Machines
A Turing Machine to accept strings of the form anbncn
[Figure: a Turing machine for anbncn — it repeatedly marks an a as d, a b as e, and a c as f, sweeping back and forth until every symbol is marked]
[Figure: example Turing machine tape configurations]
[Figure: encoding a Turing machine as a string — symbols and states are spelled out as strings such as a00, a01, … and q000; e.g. the tape # a a b a becomes a00a00a01…]
Church's Thesis
(Church-Turing Thesis)
An algorithm is a formal procedure that halts.
The Thesis: Anything that can be computed by any algorithm can be computed by a Turing machine.
Another way to state it: All "reasonable" formal models of computation are equivalent to the Turing machine. This isn't a formal
statement, so we can't prove it. But many different computational models have been proposed and they all turn out to be
equivalent.
Example: unrestricted grammars
A Machine Hierarchy
FSMs
PDAs
Turing Machines
[Figure: the language hierarchy annotated with the corresponding machines — FSMs accept the regular languages, PDAs the context-free languages, and Turing machines the recursive and recursively enumerable languages]
Closure Properties
Regular Languages are Closed Under:
Union
Concatenation
Kleene closure
Complementation
Reversal
Intersection
Context Free Languages are Closed Under:
Union
Concatenation
Kleene Closure
Reversal
Intersection with regular languages
Etc.
[Table: a proposed complete enumeration Set 1 … Set 5 of sets of integers, each written as a bit vector; a new set is constructed to differ from set i on element i]
But this new set must necessarily be different from all the other sets in the supposedly complete enumeration. Yet it should be
included. Thus a contradiction.
More on Cantor
Of course, if we're going to enumerate, we probably want to do it very systematically, e.g.,
[Table: a systematic enumeration Set 1 … Set 7 of finite sets of integers, written as bit vectors]
Read the rows as bit vectors, but read them backwards. So Set 4 is 100. Notice that this is the binary encoding of 4.
This enumeration will generate all finite sets of integers, and in fact the set of all finite sets of integers is countable.
But when will it generate the set that contains all the integers except 1?
[Table: machines (rows) against inputs I1, I2, … (columns), with 1 marking the pairs that halt; the machine TROUBLE is constructed to differ from every row along the diagonal]
Or maybe HALT said that TROUBLE(TROUBLE) would halt. But then TROUBLE would loop.
Decidability
[Figure: the language hierarchy — Regular ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable Languages]
(3) Optimization: Realize that we can skip the first assignment since the value is never used and that we can precompute the
arithmetic expression, since it contains only constants.
(4) Termination: Decide whether the program is guaranteed to halt.
(5) Interpretation: Figure out what (if anything) useful it does.
So What's Left?
What Is a Language?
Do Homework 2.
Grammars, Languages, and Machines
[Figure: a grammar generates the language L; a machine accepts it]
A string over an alphabet is a finite sequence of symbols drawn from the alphabet.
English string: happynewyear
binary string: 1001101
We will generally omit from strings unless doing so would lead to confusion.
The set of all possible strings over an alphabet Σ is written Σ*.
binary string: 1001101 ∈ {0,1}*
The shortest string contains no characters. It is called the empty string and is written "" or ε (epsilon).
More on Strings
The length of a string is the number of symbols in it.
|ε| = 0
|1001101| = 7
A string a is a substring of a string b if a occurs contiguously as part of b.
aaa
is a substring of
aaabbbaaa
aaaaaa is not a substring of
aaabbbaaa
Every string is a substring (although not a proper substring) of itself.
ε is a substring of every string. Alternatively, we can match ε anywhere.
Notice the analogy with sets here.
Lecture Notes 2
What is a Language?
Operations on Strings
Concatenation: The concatenation of two strings x and y is written x || y, x·y, or xy and is the string formed by appending the
string y to the string x.
|xy| = |x| + |y|
If x = ε and y = food, then xy =
If x = good and y = bye, then |xy| =
Note: x ε = ε x = x for all strings x.
Replication: For each string w and each natural number i, the string w^i is defined recursively as
w^0 = ε
w^i = w^(i-1) w for each i ≥ 1
Like exponentiation, the replication operator has a high precedence.
Examples:
a3 =
(bye)2 =
a0b3 =
String Reversal
An inductive definition:
(1) If |w| = 0 then w^R = w = ε
(2) If |w| ≥ 1 then ∃ a ∈ Σ: w = ua
(a is the last character of w)
and
w^R = a u^R
Example:
(abc)R =
More on String Reversal
Theorem: If w, x are strings, then (wx)^R = x^R w^R
Example: (dogcat)^R = (cat)^R (dog)^R = tacgod
Proof: By induction on |x|. If |x| = 0, then (w ε)^R = w^R = ε w^R. Otherwise x = u a for some character a, and
(w x)^R = ((w u) a)^R          associativity
        = a (w u)^R            definition of reversal
        = a (u^R w^R)          induction hypothesis
        = (u a)^R w^R          definition of reversal
        = x^R w^R
[Figure: the string dogcat split into w and x, with x split into u and its last character a]
Lecture Notes 2
What is a Language?
Defining a Language
A language is a (finite or infinite) set of finite length strings over a finite alphabet Σ.
Example: Let Σ = {a, b}
Some languages over Σ: ∅, {ε}, {a, b}, {ε, a, aa, aaa, aaaa, aaaaa}
The language Σ* contains an infinite number of strings, including: ε, a, b, ab, ababaaa
L = {x ∈ {a, b}* : all a's precede all b's}
So all languages are either finite or countably infinite. Alternatively, all languages are countable.
Operations on Languages 1
Normal set operations: union, intersection, difference, complement
Examples: Σ = {a, b}
L1 = strings with an even number of a's
L2 = strings with no b's
L1 ∪ L2 =
L1 ∩ L2 =
L2 - L1 =
¬(L2 - L1) =
Lecture Notes 2
What is a Language?
Operations on Languages 2
Concatenation: (based on the definition of concatenation of strings)
If L1 and L2 are languages over Σ, their concatenation L = L1 ∘ L2, sometimes L1L2, is
{w ∈ Σ*: w = x y for some x ∈ L1 and y ∈ L2}
Examples:
L1 = {cat, dog}          L2 = {apple, pear}
L1 = {an: n ≥ 1}         L2 = {an: n ≤ 3}
Identities:
∅L = L∅ = ∅ (analogous to multiplication by 0)
L{ε} = {ε}L = L (analogous to multiplication by 1)
Replicated concatenation:
Ln = L∘L∘L∘…∘L (n times)
L1 = L
L0 = {ε}
Example:
L = {dog, cat, fish}
L0 = {}
L1 = {dog, cat, fish}
L2 = {dogdog, dogcat, dogfish, catdog, catcat, catfish, fishdog, fishcat, fishfish}
L1 = an = {an : n ≥ 0}     L2 = bn = {bn : n ≥ 0}
L1L2 = {anbm : n, m ≥ 0}   (common mistake: L1L2 ≠ anbn = {anbn : n ≥ 0})
Note: The scope of any variable used in an expression that invokes replication will be taken to be the entire expression.
L = 1n2m
L = anbman
Operations on Languages 3
Kleene Star (or Kleene closure): L* = {w ∈ Σ* : w = w1 w2 … wk for some k ≥ 0 and some w1, w2, …, wk ∈ L}
Alternative definition: L* = L0 ∪ L1 ∪ L2 ∪ L3 ∪ …
Note: ∀L, ε ∈ L*
Example:
L = {dog, cat, fish}
L* = {ε, dog, cat, fish, dogdog, dogcat, fishcatfish, fishdogdogfishcat, …}
Another useful definition: L+ = L L*
Alternatively, L+ = L1 ∪ L2 ∪ L3 ∪ …
L+ = L* - {ε}   if ε ∉ L
L+ = L*         if ε ∈ L
What is a Language?
Regular Languages
Read Supplementary Materials: Regular Languages and Finite State Machines: Regular Languages
Do Homework 3.
Regular Grammars, Languages, and Machines
Regular Expression
or
Regular Grammar
Regular
Language
Accepts
Finite
State
Machine
, a, bab, ab , (ab)*a*b*
So far, regular expressions are just (finite) strings over some alphabet: Σ ∪ {(, ), ∅, ∪, *}.
Regular Expressions Define Languages
Regular expressions define languages via a semantic interpretation function we'll call L:
1. L(∅) = ∅ and L(a) = {a} for each a ∈ Σ
2. If α, β are regular expressions, then
L(αβ) = L(α)L(β)
= all strings that can be formed by concatenating to some string from L(α) some string from L(β).
Note that if either α or β is ∅, then its language is ∅, so there is nothing to concatenate and the result is ∅.
3. If α, β are regular expressions, then L(α ∪ β) = L(α) ∪ L(β)
4. If α is a regular expression, then L(α*) = L(α)*
5. L( (α) ) = L(α)
A language is regular if and only if it can be described by a regular expression.
A regular expression is always finite, but it may describe a (countably) infinite language.
Lecture Notes 3
Regular Languages
Regular Languages
An equivalent definition of the class of regular languages over an alphabet Σ:
The closure of the languages
{a} ∀ a ∈ Σ, and ∅          [1]
with respect to the functions:
concatenation,               [2]
union, and                   [3]
Kleene star.                 [4]
In other words, the class of regular languages is the smallest set that includes all elements of [1] and that is closed under [2],
[3], and [4].
Closure and Closed
Informally, a set can be defined in terms of a (usually small) starting set and a group of functions over elements from the set.
The functions are applied to members of the set, and if anything new arises, it's added to the set. The resulting set is called
the closure over the initial set and the functions. Note that the function(s) may only be applied a finite number of times.
Examples:
The set of natural numbers N can be defined as the closure over {0} and the successor (succ(n) = n+1) function.
Regular languages can be defined as the closure of {a} ∀ a ∈ Σ and ∅ and the functions of concatenation, union, and
Kleene star.
We say a set is closed over a function if applying the function to arbitrary elements in the set does not yield any new elements.
Examples:
The set of natural numbers N is closed under multiplication.
Regular languages are closed under intersection.
See Supplementary MaterialsReview of Mathematical Concepts for more formal definitions of these terms.
Examples of Regular Languages
L( a*b* ) =
L( (a ∪ b) ) =
L( (a ∪ b)* ) =
L( (ab)*a*b* ) =
L = {w ∈ {a,b}* : |w| is even}
L = {w ∈ {a,b}* : w contains an odd number of a's}
Augmenting Our Notation
It would be really useful to be able to write ε in a regular expression.
Example: (a ∪ ε) b (Optional a followed by b)
But we'd also like a minimal definition of what constitutes a regular expression. Why?
Observe that
∅0 = {ε} (since 0 occurrences of the elements of any set generates the empty string), so
∅* = {ε}
So, without changing the set of languages that can be defined, we can add ε to our notation for regular expressions if we
specify that
L(ε) = {ε}
We're essentially treating ε the same way that we treat the characters in the alphabet.
Having done this, you'll probably find that you rarely need ε in any regular expression.
Lecture Notes 3
Regular Languages
Intersection: (we'll prove later that regular languages are closed under intersection)
Example: L = (a3)* ∩ (a5)*
Operator Precedence in Regular Expressions
Regular expressions are strings in the language of regular expressions. Thus to interpret them we need to:
1. Parse the string
2. Assign a meaning to the parse tree
Parsing regular expressions is a lot like parsing arithmetic expressions. To do it, we must assign precedence to the operators:
            Regular Expressions            Arithmetic Expressions
Highest     Kleene star                    exponentiation
            concatenation, intersection    multiplication
Lowest      union                          addition
            a b* ∪ c d*                    x y2 + i j2
They have limited power. They can be used to define only regular languages.
They don't look much like other kinds of grammars, which generally are composed of sets of production rules.
But we can write more "standard" grammars to define exactly the same languages that regular expressions can define.
Specifically, any such grammar must be composed of rules that:
Lecture Notes 3
Regular Languages
S → ε
S → aT
S → bT
T → a
T → b
T → aS
T → bS
[Figure: the corresponding two-state FSM with states S and T and transitions on a, b]
Recognizer
Language
Regular Languages
Regular Expressions
Regular Grammars
Lecture Notes 3
Regular Languages
Informally, M accepts a string w if M winds up in some state that is an element of F when it has finished reading w (if not, it
rejects w).
The language accepted by M, denoted L(M), is the set of all strings accepted by M.
Deterministic finite state machines (DFSMs) are also called deterministic finite state automata (DFSAs or DFAs).
Computations Using FSMs
A computation of a FSM is a sequence of configurations, where a configuration is any element of K × Σ*.
The yields relation |-M:
(q, w) |-M (q', w') iff
w = a w' for some symbol a ∈ Σ, and
δ(q, a) = q'
(The yields relation effectively runs M one step.)
|-M* is the reflexive, transitive closure of |-M.
(The |-M* relation runs M any number of steps.)
Formally, a FSM M accepts a string w iff
(s, w) |-M* (q, ε), for some q ∈ F.
An Example Computation
A DFSM to accept odd integers:
On input 235, the configurations are:
(q0, 235)
|-M
(q0, 35)
|-M
|-M
Thus (q0, 235) |-M* (q1, ). (What does this mean?)
Lecture Notes 4
More Examples
((aa) ∪ (ab) ∪ (ba) ∪ (bb))*
(b ∪ ε)(ab)*(a ∪ ε)
More Examples
L1 = {w ∈ {a, b}* : every a is immediately followed by a b}
A regular expression for L1:
Lecture Notes 4
Server
send reply
send request
send reply
close socket
M=
q0
0/1
An Odd Parity Generator
After every three bits, output a fourth bit such that each group of four bits has odd parity.
Lecture Notes 4
A Nondeterministic FSA
The idea is to guess (nondeterministically) which character will be the one that doesn't appear.
[Figure: a nondeterministic FSA for the missing-letter language, built from submachines M1, M2, M3 that each guess which character is the one that doesn't appear]
Does this FSA accept:
baaba
Remember: we just have to find one accepting path.
Nondeterministic and Deterministic FSAs
Clearly, {Languages accepted by a DFSA} ⊆ {Languages accepted by a NDFSA}
(Just treat δ as Δ.)
More interestingly,
Theorem: For each NDFSA, there is an equivalent DFSA.
Proof: By construction
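A minimal sketch of that construction in code (the set-of-states representation and the names are ours, not the packet's): each state of the deterministic machine is a set of NDFSA states, generated on demand; ε-transitions are handled separately by the E(q) closure discussed below.

# Sketch of the subset construction (no epsilon-moves here).
# delta: dict mapping (state, symbol) -> set of possible next states.
def subset_construction(alphabet, delta, start, finals):
    d_start = frozenset([start])
    d_states = {d_start}
    d_delta = {}
    worklist = [d_start]
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            T = frozenset(r for q in S for r in delta.get((q, a), set()))
            d_delta[(S, a)] = T
            if T not in d_states:
                d_states.add(T)
                worklist.append(T)
    d_finals = {S for S in d_states if S & finals}
    return d_states, d_delta, d_start, d_finals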
[Figure: the missing-letter NDFSM (states q0–q3) and the start of the equivalent DFSM whose states are sets of NDFSM states]
Lecture Notes 5
A Real Example
[Figure: a "real life" FSM with states such as Hide, Run, Reach for Sword, Pick up Laser, Swing Sword, Become King, and Die, and transitions such as "see enemy", "found by enemy", "coast clear", "see sword", "kill enemy", and "get stabbed"]
Dealing with ε Transitions
E(q) = {p ∈ K : (q, w) |-*M (p, w)}. E(q) is the closure of {q} under the relation {(p, r) : there is a transition (p, ε, r) in Δ}.
An algorithm to compute E(q) (a sketch in code appears after the numbered steps below):
[Figure: the NDFSM with ε-transitions (states q0–q3) from the previous example]
1.
2.
3.
4.
5.
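The numbered steps are filled in during lecture; as a hedged sketch, one way to compute E(q) is a straightforward closure loop (representation ours):

# Sketch: E(q) = all states reachable from q using only epsilon-transitions.
# eps_moves: set of pairs (p, r) meaning there is a transition (p, epsilon, r).
def E(q, eps_moves):
    closure = {q}
    changed = True
    while changed:
        changed = False
        for (p, r) in eps_moves:
            if p in closure and r not in closure:
                closure.add(r)
                changed = True
    return closure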
L1= {w : aa occurs in w}
L2= {x : bb occurs in x}
L3= {y : L1 or L2 }
[Figure: FSMs for L1 (states 10–12) and L2 (states 20–22) joined by ε-transitions from a new start state 00 to accept L3]
Another Example
[Figure: an NDFSM with ε-transitions (states 1–8)]
E(q) =
δ' =
Lecture Notes 5
Example:
The missing letter machine, with |Σ| = n
No. of states after 0 chars: 1
No. of new states after 1 char: (n choose 1) = n
No. of new states after 2 chars: (n choose 2) = n(n-1)/2
No. of new states after 3 chars: (n choose 3) = n(n-1)(n-2)/6
[Figure: the missing-letter NDFSM (states q0–q3), whose deterministic equivalent has exponentially many states]
1.
2.
3.
q1
0,2,4,6,8
4.
5.
0,2,4,6,8
Lecture Notes 5
[Figure: a two-state FSM with states S and T and transitions on a and b]
S: s := get-next-symbol;
   if s = end-of-file then accept;
   else if s = a then go to S;
   else if s = b then go to T;
T: s := get-next-symbol;
   if s = end-of-file then accept;
   else if s = a then go to T;
   else if s = b then go to U;
etc.
ST := s;
Repeat
   i := get-next-symbol;
   if i ≠ end-of-string then
      ST := δ(ST, i)
Until i = end-of-string;
If ST ∈ F then accept else reject
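The same table-driven interpreter, rendered as a small Python function (a sketch; the dictionary representation of δ is ours):

# delta: dict mapping (state, symbol) -> state; finals: the set of accepting states F.
def run_dfsm(w, start, delta, finals):
    st = start
    for symbol in w:                  # i := get-next-symbol
        st = delta[(st, symbol)]      # ST := delta(ST, i)
    return st in finals               # accept iff ST is in F

# Example: the two-state machine that accepts strings over {a, b} of even length.
delta = {("even", "a"): "odd", ("even", "b"): "odd",
         ("odd", "a"): "even", ("odd", "b"): "even"}
print(run_dfsm("abab", "even", delta, {"even"}))   # True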
Nondeterministic FSAs as Algorithms
Real computers are deterministic, so we have three choices if we want to execute a nondeterministic FSA:
1.
2.
Simulate the behavior of the nondeterministic one by constructing sets of states "on the fly" during execution
No conversion cost
Time to analyze string w: O(|w| K2)
3.
Example:
Prove: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∩ (B ∪ C)
  = (B ∪ C) ∩ A            commutativity
  = (B ∩ A) ∪ (C ∩ A)      distributivity
  = (A ∩ B) ∪ (A ∩ C)      commutativity
2.
Do two separate proofs: (1) a ⇒ b, and (2) b ⇒ a, possibly using totally different techniques. In this case, we show first (by
construction) that for every regular expression there is a corresponding FSM. Then we show, by induction on the number of
states, that for every FSM, there is a corresponding regular expression.
For Every Regular Expression There is a Corresponding FSM
A Complementation Example
a
b
q1
q2
L2
Union
Concatenation
Kleene star
Complementation
An Example
(b ab*a)*ab*
Lecture Notes 7
Idea 2: To get from state I to state J without passing through any intermediate state numbered greater than K, a machine may
either:
1. Go from I to J without passing through any state numbered greater than K-1 (which we'll take as the induction hypothesis), or
2. Go from I to K, then from K to K any number of times, then from K to J, in each case without passing through any
intermediate states numbered greater than K-1 (the induction hypothesis, again).
So we'll start with no intermediate states allowed, then add them in one at a time, each time building up the regular expression
with operations under which regular languages are closed.
The Formula
Adding in state k as an intermediate state we can use to go from i to j, described using paths that don't use k:
R(i, j, k) = R(i, j, k - 1)
           ∪ R(i, k, k-1)
             R(k, k, k-1)*    /* then go from the new intermediate state back to itself as many times as you want
             R(k, j, k-1)
Solution:
R(s, q, N)
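A sketch of the same recurrence computed bottom-up in code (the string-building notation, with | for union and e for ε, is ours):

# R[(i, j, k)] = a regex for the paths from i to j using no intermediate state numbered above k.
# arcs: dict mapping (i, j) -> label of the direct arc from i to j, if any. None means no path.
def fsm_to_regex(n, arcs, s, finals):
    R = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            base = arcs.get((i, j))
            if i == j:                           # allow the empty path from a state to itself
                base = "e" if base is None else "(e|" + base + ")"
            R[(i, j, 0)] = base
    for k in range(1, n + 1):                    # add state k as an allowed intermediate state
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                direct = R[(i, j, k - 1)]
                via = None
                if R[(i, k, k - 1)] and R[(k, j, k - 1)]:
                    via = R[(i, k, k - 1)] + "(" + R[(k, k, k - 1)] + ")*" + R[(k, j, k - 1)]
                R[(i, j, k)] = ("(" + direct + "|" + via + ")") if (direct and via) else (direct or via)
    return "|".join(R[(s, f, n)] for f in finals if R[(s, f, n)])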
Lecture Notes 7
(2,3,0) = a
(3,3,0) = b
(3,4,0) = a
a
*
a
= aa
Allow 3 as an intermediate state:
(1, 3, 3) = (1, 3, 2) ∪ (1, 3, 2)(3, 3, 2)*(3, 3, 2)
          = aa ∪ aa (ε ∪ b)* (ε ∪ b)
          = aab*
(1, 4, 3) = (1, 4, 2) ∪ (1, 3, 2)(3, 3, 2)*(3, 4, 2)
          = ∅ ∪ aa (ε ∪ b)* a
          = aab*a
An Easier Way - See Packet
[Figure: a three-state FSM (states 1, 2, 3) with transitions on a and b]
(1) Create a new initial state and a new, unique final state, neither of which is part of a loop.
[Figure: the same machine with a new initial state and a new unique final state added]
Lecture Notes 7
(2) Remove states and arcs and replace with arcs labelled with larger and larger regular expressions. States can be removed in
any order, but dont remove either the start or final state.
[Figure: the machine after removing state 3; the paths through it are replaced by arcs labelled aa*b and ba*b]
(Notice that the removal of state 3 resulted in two new paths because there were two incoming paths to 3 from another state and 1
outgoing path to another state, so 2 × 1 = 2.) The two paths from 2 to 1 should be coalesced by unioning their regular expressions
(not shown).
ab aaa*b ba*b
a
5
Matching IP addresses:
([0-9]+ (\. [0-9]+){3})
T → a
T → b
T → aS
T → bS
[Figure: the two-state FSM (states S and T) corresponding to this grammar]
An Algorithm to Generate the NDFSM from a Regular Grammar
1.
2.
3.
4.
5.
6.
7.
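The seven steps above are worked out in class; as a hedged sketch of one standard construction (names and representation ours): make a state for each nonterminal plus one new accepting state, and turn each rule into a transition.

# rules: list of (lhs, rhs) pairs, rhs being "", a terminal, or a terminal followed by a nonterminal,
# e.g. [("S", "aT"), ("S", "bT"), ("T", "a"), ("T", "b"), ("T", "aS"), ("T", "bS")].
def grammar_to_nfsm(rules, start, nonterminals):
    final = "#"                                   # one new accepting state
    states = set(nonterminals) | {final}
    delta = set()                                 # transitions as triples (state, symbol, state)
    finals = {final}
    for lhs, rhs in rules:
        if rhs == "":                             # A -> epsilon: A itself accepts
            finals.add(lhs)
        elif len(rhs) == 1:                       # A -> b: read b and accept
            delta.add((lhs, rhs, final))
        else:                                     # A -> b C: read b, continue from C
            delta.add((lhs, rhs[0], rhs[1:]))
    return states, delta, start, finals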
T → a
T → b
T → aS
T → bS
Lecture Notes 7
A → bA
A → cA
A → ε
B → aB
B → cB
B → ε
C → aC
C → bC
C → ε
Example:
[Figure: an example NFSM with states X and Y and transitions on a and b]
[Figure: conversions among regular grammars, NFSMs (NFAs), regular expressions, and DFSMs (DFAs)]
Example, Continued
L3 = L1 and divisible by 3
Recall that a number is divisible by 3 if and only if the sum of its digits is divisible by 3. We can build a FSM to determine that
and accept the language L3a, which is composed of strings of digits that sum to a multiple of 3.
L3 = L1 ∩ L3a
Finally, L = L2 L3
Another Example
= {0 - 9}
L = {w : w is the social security number of a living US resident}
Soc. Sec. #
Checking
Checking
2.
The only way to generate/accept an infinite language with a finite description is to use Kleene star (in regular expressions) or
cycles (in automata). This forces some kind of simple repetitive cycle within the strings.
Example:
ab*a generates aba, abba, abbba, abbbba, etc.
Example:
{an : n 1 is a prime number} is not regular.
Lecture Notes 8
If a FSM of n states accepts any string of length ≥ n, how many strings does it accept?
n
________
babbbbab
x y
z
L = bab*ab
xy*z must be in L.
So L includes: baab, babab, babbab, babbbbbbbbbbab
The Pumping Lemma for Regular Languages
If L is regular, then
∃ N ≥ 1, such that
∀ strings w ∈ L, where |w| ≥ N,
∃ x, y, z, such that
w = xyz
and
|xy| ≤ N,
and
y ≠ ε,
and
∀ q ≥ 0, xyqz is in L.
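Why such a decomposition must exist (a sketch with our own names): run the accepting DFSM on w; within the first N characters some state repeats by the pigeonhole principle, and the characters read between the two visits form the pumpable y.

# Given a DFSM (as delta) and an accepted string w with |w| >= the number of states,
# return x, y, z with w = xyz, |xy| <= N and y nonempty.
def pumping_split(w, start, delta):
    seen = {start: 0}           # state -> how many characters had been read when it was entered
    st = start
    for i, c in enumerate(w):
        st = delta[(st, c)]
        if st in seen:          # the same state visited twice: the loop in between is y
            return w[:seen[st]], w[seen[st]:i + 1], w[i + 1:]
        seen[st] = i + 1
    return None                 # only possible if w is shorter than the number of states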
Example: L = anbn
aaaaaaaaaabbbbbbbbbb
x
y
z
N1
long strings w
x, y, z
Call it N
We pick one
We show no x, y, z
y falls in region 3:
L = (ab)*
Choose w = (ab)N
Then,
for all q ≥ 0,
xyqz ∈ L
Note that this does not prove that L is regular. It just fails to prove that it is not.
Using Closure Properties
Once we have some languages that we can prove are not regular, such as anbn, we can use the closure properties of regular
languages to show that other languages are also not regular.
Σ = {a, b}
L = {w : w contains an equal number of a's and b's }
a*b* is regular. So, if L is regular, then L1 = L ∩ a*b* is regular.
Example:
But what if L3 and L1 are regular? What can we say about L2?
L3 = L1 L2.
Example:
Lecture Notes 8
L1 = L ∩ a*b* = anbn, which we have already shown is not regular. Contradiction, so L is not regular.
[Figure: a long string of a's of length at least N, split into x, y, z]
Distribution of primes:
[Figure: tick marks showing where the primes fall along the number line — they thin out as the numbers grow]
But the Prime Number Theorem tells us that the primes "spread out", i.e., that the number of primes not exceeding x is
asymptotic to x/ln x.
Note that when q = |x| + |z|, |xyqz| = (|y| + 1)(|x| + |z|), which is composite (non-prime) if both factors are > 1. If you're careful
about how you choose N in a pumping lemma proof, you can make this true for both factors.
More Examples
Example element of L:
WBWBHHBHQQBHHBQEEQEEB
Lecture Notes 8
More Examples
= {0 - 9}
L = {w = is a prime Fermat number}
The Fermat numbers are defined by
Fn = 2^(2^n) + 1, n = 1, 2, 3, …
Example elements of L:
F1 = 5, F2 = 17, F3 = 257, F4 = 65,537
Another Example
= {0 - 9, *, =}
L = {w = a*b=c: a, b, c {0-9}+ and
OR
Decision Procedures
A decision procedure is an algorithm that answers a question (usually yes or no) and terminates. The whole idea of a
decision procedure itself raises a new class of questions. In particular, we can now ask,
1.
2.
3.
Clearly, if we jump immediately to an answer to question 2, we have our answer to question 1. But sometimes it makes sense to
answer question 1 first. For one thing, it tells us whether to bother looking for answers to questions 2 and 3.
Examples of Question 1:
Is there a decision procedure, given a regular expression E and a string S, for determining whether S is in L(E)?
Is there a decision procedure, given a Turing machine T and an input string S, for determining whether T halts on S?
Lecture Notes 8
Let M1 and M2 be two deterministic FSAs. There is a decision procedure to determine whether M1 and M2 are equivalent. Let L1
and L2 be the languages accepted by M1 and M2. Then the language
L = (L1 ∩ ¬L2) ∪ (¬L1 ∩ L2)
must be regular. L is empty iff L1 = L2. There is a decision procedure to determine whether L is empty and thus whether L1 = L2
and thus whether M1 and M2 are equivalent.
[Figure: Venn diagrams of L1 and L2 and their symmetric difference L1,2]
An equivalence relation on a nonempty set A creates a partition of A. We write the elements of the partition as [a1], [a2],
Example:
Partition:
Lecture Notes 9
[Figure: a DFSM in which one state cannot be reached from the start state]
State 3 is unreachable.
Step (2): Get rid of redundant states.
[Figure: a DFSM in which two states behave identically]
States 2 and 3 are redundant.
Getting Rid of Unreachable States
We can't easily find the unreachable states directly. But we can find the reachable ones and determine the unreachable ones from
there. An algorithm for finding the reachable states:
[Figure: the example machine again (states 1, 2, 3), used to trace the reachable-states algorithm]
Two states have identical sets of transitions out.
Getting Rid of Redundant States
3
a
The outcomes are the same, even though the states aren't.
Finding an Algorithm for Minimization
Capture the notion of equivalence classes of strings with respect to a language.
Capture the (weaker) notion of equivalence classes of strings with respect to a language and a particular FSA.
Prove that we can always find a deterministic FSA with a number of states equal to the number of equivalence classes of strings.
Describe an algorithm for finding that deterministic FSA.
Defining Equivalence for Strings
We want to capture the notion that two strings are equivalent with respect to a language L if, no matter what is tacked on to them
on the right, either they will both be in L or neither will. Why is this the right notion? Because it corresponds naturally to what
the states of a recognizing FSM have to remember.
Example:
(1)
(2)
Lecture Notes 10
State Minimization
a
b
bbb
baa
aa
bb
aba
aab
= {a, b}
L = {w * : |w| is even}
a
b
aa
aabb
bbaa
aabaa
bb
aba
aab
bbb
baa
= {a, b}
L = aab*a
a
b
aa
ab
aabb
aabaa
aabbba
aabbaa
ba
bb
aaa
aba
aab
bab
An Example of L Where All Elements of L Are Not in the Same Equivalence Class
= {a, b}
L = {w {a, b}* : no two adjacent characters are the same}
bb
aba
a
aab
b
baa
aa
aabb
The equivalence classes of L:
Lecture Notes 10
State Minimization
aabaa
aabbba
aabbaa
Is |≈L| Always Finite?
= {a, b}
L = anbn
a
b
The equivalence classes of L:
aaaa
aaaaa
aa
aba
aaa
L is an ideal relation.
What if we now consider what happens to strings when they are being processed by a real FSM?
= {a, b}
L = {w * : |w| is even}
a
1
2
a, b
b
a, b
3
Define ~M to relate pairs of strings that drive M from s to the same state.
Formally, if M is a deterministic FSM, then x ~M y if there is some state q in M such that (s, x) |-*M (q, ε) and (s, y) |-*M (q, ε).
Notice that ~M is an equivalence relation.
An Example of ~M
= {a, b}
L = {w * : |w| is even}
a
1
2
a, b
b
a, b
3
a
b
aa
aabb
bbaa
aabaa
bb
aba
aab
bbb
baa
|~M| =
Lecture Notes 10
State Minimization
Another Example of ~M
= {a, b}
L = {w * : |w| is even}
a,b
1
2
a, b
a
b
aa
aabb
bbaa
aabaa
bb
aba
aab
bbb
baa
|~M| =
[even length]
~M
(3 state)
[even length]
[odd length]
odd ending
in a
odd ending
in b
(S)
(R)
Lecture Notes 10
State Minimization
~M is a Refinement of ≈L.
Theorem: For any deterministic finite automaton M and any strings x, y ∈ Σ*, if x ~M y, then x ≈L y.
Proof: If x ~M y, then x and y drive M to the same state q. From q, any continuation string w will drive M to some state r. Thus
xw and yw both drive M to r. Either r is a final state, in which case they both accept, or it is not, in which case they both reject.
But this is exactly the definition of ≈L.
Corollary: |~M| ≥ |≈L|.
Going the Other Way
When is this true?
If x ≈L(M) y then x ~M y.
Finding the Minimal FSM for L
What's the smallest number of states we can get away with in a machine to accept L?
Example:
L = {w * : |w| is even}
Lecture Notes 10
State Minimization
The Proof
(1) K is finite.
Since L is regular, there must exist a machine M, with |~M| finite. We know that
|~M| ≥ |≈L|
Thus |≈L| is finite.
(2) δ is well defined.
This is assured by the definition of ≈L, which groups together precisely those strings that have the same fate with respect to L.
The Proof, Continued
(3) L = L(M)
Suppose we knew that ([x], y) |-M* ([xy], ε).
Now let [x] be [ε] and let s be a string in Σ*.
Then
([ε], s) |-M* ([s], ε)
M will accept s if [s] ∈ F.
By the definition of F, [s] ∈ F iff all strings in [s] are in L.
So M accepts precisely the strings in L.
The Proof, Continued
Lemma: ([x], y) |-M* ([xy], ε)
By induction on |y|:
Trivial if |y| = 0.
Suppose true for |y| = n.
Show true for |y| = n+1:
Let y = y'a, for some character a. Then |y'| = n, and
([x], y'a) |-M* ([xy'], a)       (induction hypothesis)
([xy'], a) |-M ([xy'a], ε)       (definition of δ)
([x], y'a) |-M* ([xy'a], ε)      (transitivity of |-M*)
([x], y) |-M* ([xy], ε)          (definition of y)
Another Version of the Myhill-Nerode Theorem
Theorem: A language is regular iff |≈L| is finite.
Example:
Consider:
L = anbn
a, aa, aaa, aaaa, aaaaa
Equivalence classes:
Proof:
Regular ⇒ |≈L| is finite: If L is regular, then there exists an accepting machine M with a finite number of states N. We know that
N ≥ |≈L|. Thus |≈L| is finite.
|≈L| is finite ⇒ regular: If |≈L| is finite, then the standard DFSA ML accepts L. Since L is accepted by a FSA, it is regular.
Lecture Notes 10
State Minimization
(b)(ab)*a
(a)(ba)*b
the rest
a b
b
a, b
b
3
a
aa
aaa
aaaa
Equivalence classes:
So Where Do We Stand?
1. We know that for any regular language L there exists a minimal accepting machine ML.
2. We know that |K| of ML equals |L|.
3. We know how to construct ML from L.
But is this good enough?
Consider:
a
a
1
5
a
a
6
Lecture Notes 10
State Minimization
L = {w * : |w| is even}
a
1
2
a, b
b
a, b
3
Constructing ≈ as the Limit of a Sequence of Approximating Equivalence Relations ≡n
(Where n is the length of the input strings that have been considered so far)
We'll consider input strings, starting with ε, and increasing in length by 1 at each iteration. We'll start by way overgrouping
states. Then we'll split them apart as it becomes apparent (with longer and longer strings) that their behavior is not identical.
Initially, ≡0 has only two equivalence classes: [F] and [K - F], since on input ε, there are only two possible outcomes, accept or
reject.
Next consider strings of length 1, i.e., each element of Σ. Split any equivalence classes of ≡0 that don't behave identically on all
inputs. Note that in all cases, ≡n is a refinement of ≡n-1.
Continue, until no splitting occurs, computing ≡n from ≡n-1.
Constructing ≈, Continued
More precisely, for any two states p and q ∈ K and any n ≥ 1, q ≡n p iff:
1. q ≡n-1 p, AND
2. for all a ∈ Σ, δ(p, a) ≡n-1 δ(q, a)
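The whole procedure, as a sketch in code (representation ours): start from the two classes of ≡0 and keep splitting any class whose members disagree, on some input symbol, about which class they move to.

# delta: dict mapping (state, symbol) -> state (a complete DFSM).
def minimize(states, alphabet, delta, finals):
    classes = [c for c in (set(finals), set(states) - set(finals)) if c]   # the classes of the initial relation
    while True:
        def class_of(q):
            return next(i for i, c in enumerate(classes) if q in c)
        new_classes = []
        for c in classes:
            groups = {}                                    # split c by where its members go, class-wise
            for q in c:
                sig = tuple(class_of(delta[(q, a)]) for a in sorted(alphabet))
                groups.setdefault(sig, set()).add(q)
            new_classes.extend(groups.values())
        if len(new_classes) == len(classes):               # no class was split: the partition is the limit
            return new_classes
        classes = new_classes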
Lecture Notes 10
State Minimization
= {a, b}
b
1
a
a
a,b
0 =
1 =
2 =
Another Example
(a*b*)*
a
b
b
2
a
0 =
1 =
Minimal machine:
Lecture Notes 10
State Minimization
10
Another Example
Ta
Tb
T aS
T bS
a, b
T
a, b
#
b
Convert to deterministic:
S = {s}
=
Another Example, Continued
Minimize:
a,b
S(1)
T(2)
a,b
a,b
#S(3)
0 =
1 =
Minimal machine:
Lecture Notes 10
State Minimization
11
[Figure: summary — a regular grammar generates the regular language L, which is accepted by equivalent machines: nondeterministic FSAs, deterministic FSAs, and minimal DFSAs]
What Does Finite State Really Mean?
There are two kinds of finite state problems:
Those in which:
Some history matters.
Only a finite amount of history matters. In particular, it's often the case that we don't care what order things
occurred in.
Examples:
Parity
Money in a vending machine
Seat belt buzzer
Those that are characterized by patterns.
Examples:
Switching circuits:
Telephone
Railroad
Traffic lights
Lexical analysis
grep
Lecture Notes 11
Context-Free Grammars
Read K & S 3.1
Read Supplementary Materials: Context-Free Languages and Pushdown Automata: Context-Free Grammars
Read Supplementary Materials: Context-Free Languages and Pushdown Automata: Designing Context-Free Grammars.
Do Homework 11.
Context-Free Grammars, Languages, and Pushdown Automata
[Figure: a context-free grammar generates the context-free language L; a pushdown automaton accepts it]
Derivation
(Generate)
choose aa
choose ab
yields
Lecture Notes 12
T
a
a a a b
Parse (Accept)
Regular Grammar
S
S aT
S bT
Ta
Tb
T aS
T bS
S
a T
b
a a b
Context-Free Grammars
Regular Grammar
(a b)*a (a b)*
Sa
S bS
S aS
S aT
Ta
Tb
T aT
T bT
choose a from (a b)
choose a from (a b)
choose a
choose a
choose a from (a b)
choose a from (a b)
S
a
S
S
a
S
a
T
a
T
a
S aB
Ba
B bB
vs.
Example 2: L = anb*an
SB
S aSa
B
B bB
Key distinction: Example 1 has no recursion on the nonregular rule.
Context-Free Grammars
Remove all restrictions on the form of the right hand sides.
S abDeFGab
Keep requirement for single non-terminal on left hand side.
S
but not ASB or aSb
Examples:
Lecture Notes 12
or ab
balanced parentheses
S → ε
S → SS
S → (S)
anbn
S → aSb
S → ε
Context-Free Grammars
Context-Free Grammars
A context-free grammar G is a quadruple (V, Σ, R, S), where:
V is the rule alphabet, which contains nonterminals (symbols that are used in the grammar but that do not appear in strings in
the language) and terminals,
Σ (the set of terminals) is a subset of V,
R (the set of rules) is a finite subset of (V - Σ) × V*,
S (the start symbol) is an element of V - Σ.
x ⇒G y is a binary relation where x, y ∈ V* such that x = αAβ and y = αχβ for some rule A → χ in R.
Any sequence of the form
w0 ⇒G w1 ⇒G w2 ⇒G . . . ⇒G wn
e.g., (S) ⇒ (SS) ⇒ ((S)S)
is called a derivation in G. Each wi is called a sentential form.
The language generated by G is
{w ∈ Σ* : S ⇒G* w}
[Figure: example parse trees for strings of a's and b's]
S → A
S → B
A → a
A → aA
A → aAb
B → b
B → Bb
B → aBb
Context-Free Grammars
S → NP VP
NP → the NP1 | NP1
NP1 → ADJ NP1 | N
ADJ → big | youngest | oldest
N → boy | boys
VP → V | V NP
V → run | runs
English
the boys run
big boys run
the youngest boy runs
the youngest oldest boy runs
the boy run
Who did you say Bill saw coming out of the hotel?
Arithmetic Expressions
The Language of Simple Arithmetic Expressions
G = (V, , R, E), where
V = {+, *, id, T, F, E},
= {+, *, id},
R = { E → id
      E → E + E
      E → E * E }
[Figure: parse trees for expressions built from id, +, *, and parentheses]
Lecture Notes 12
Examples:
id + id * id
id * id * id
Context-Free Grammars
BNF
Backus-Naur Form (BNF) is used to define the syntax of programming languages using context-free grammars.
Main idea: give descriptive names to nonterminals and put them in angle brackets.
Example: arithmetic expressions:
<expression> → <expression> + <term>
<expression> → <term>
<term> → <term> * <factor>
<term> → <factor>
<factor> → (<expression>)
<factor> → id
Lecture Notes 12
Context-Free Grammars
(2) The context-free languages are precisely the languages accepted by NDPDAs. But every FSM is a PDA that doesn't bother
with the stack. So every regular language can be accepted by a NDPDA and is thus context-free.
(3) Context-free languages are closed under union, concatenation, and Kleene *, and ε and each single character in Σ are clearly
context free.
Lecture Notes 12
Context-Free Grammars
Parse Trees
Read K & S 3.2
Read Supplementary Materials: Context-Free Languages and Pushdown Automata: Derivations and Parse Trees.
Do Homework 12.
Parse Trees
Regular languages:
We care about recognizing patterns and taking appropriate actions.
Example: A parity checker
Structure
Context free languages:
We care about structure.
E → id
E → E + E
E → E * E
[Figure: parse trees showing the structure of expressions over id, +, *, and parentheses]
[Figure: a parse tree annotated with its height, nodes, leaves, and yield]
Leaves are all labeled with terminals or ε.
Other nodes are labeled with nonterminals.
A path is a sequence of nodes, starting at the root, ending at a leaf, and following branches in the tree.
The length of the yield of any tree T with height H and branching factor (fanout) B is
Derivations
To capture structure, we must capture the path we took through the grammar. Derivations do that.
S
S SS
S (S)
1
2
3
4
5
6
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
1
2
3
5
4
6
S
S
(
S
(
Alternative Derivations
S
S SS
S (S)
S
(
S
(
)
)
S
)
S
S
(
)
( S )
Lecture Notes 13
Parse Trees
Ordering Derivations
Consider two derivations:
1
2
3
4
5
6
7
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
1
2
3
4
5
6
7
We can write these, or any, derivation as
D1 = x1 x2 x3 xn
D2 = x1' x2' x3' xn'
5
(())S
S
(
Lecture Notes 13
Parse Trees
S
(
There's one derivation in this equivalence class that precedes all others in the class.
We call this the leftmost derivation. There is a corresponding rightmost derivation.
The leftmost (rightmost) derivation can be used to construct the parse tree and the parse tree can be used to construct the leftmost
(rightmost) derivation.
Another Example
E id
EE+E
EE*E
id
id
id
[id
id
id]
id
[id
id
id]
id
id
Ambiguity
A grammar G for a language L is ambiguous if there exist strings in L for which G can generate more than one parse tree (note
that we don't care about the number of derivations).
The following grammar for arithmetic expressions is ambiguous:
E → id
E → E + E
E → E * E
Often, when this happens, we can find a different, unambiguous grammar to describe L.
Lecture Notes 13
Parse Trees
Another Example
The following grammar for the language of matched parentheses is ambiguous:
S
S SS
S (S)
S
S
(
S
(
)
)
S
)
S
(
S
(
( S )
S
S1
S1
(
Lecture Notes 13
Parse Trees
S1
( )
S1
)
2.
3.
Arithmetic Expressions
Balanced Parentheses
Whenever we design practical languages, it is important that they not be inherently ambiguous.
Lecture Notes 13
Parse Trees
Pushdown Automata
Read K & S 3.3.
Read Supplementary Materials: Context-Free Languages and Pushdown Automata: Designing Pushdown Automata.
Do Homework 13.
Recognizing Context-Free Languages
Two notions of recognition:
(1) Say yes or no, just like with FSMs
(2) Say yes or no, AND
if yes, describe the structure
c
Just Recognizing
(
(
(
(
(
Finite
State
Controller
(
Definition of a Pushdown Automaton
state
( {})
input or
Lecture Notes 14
K
state
string of symbols to
push on top of stack
Pushdown Automata
the states
the input alphabet
the stack alphabet
((s, [, ), (s, [ ))
((s, ], [ ), (s, ))
Important:
This does not mean that the stack is empty.
An Example of Accepting
[//[
s
]/[/
contains:
[1]
((s, [, ), (s, [ ))
[2]
((s, ], [ ), (s, ))
input = [ [ [ ] [ ] ] ]
trans
1
1
1
2
1
2
2
2
state
s
s
s
s
s
s
s
s
s
unread input
[[[][]]]
[[][]]]
[][]]]
][]]]
[]]]
]]]
]]
]
stack
[
[[
[[[
[[
[[[
[[
[
An Example of Rejecting
[//[
s
]/[/
contains:
[1]
((s, [, ), (s, [ ))
[2]
((s, ], [ ), (s, ))
input = [ [ ] ] ]
trans
state
unread input
stack
s
[[]]]
1
s
[]]]
[
1
s
]]]
[[
2
s
]]
[
2
s
]
none!
s
]
We're in s, a final state, but we cannot accept because the input string is not empty. So we reject.
Lecture Notes 14
Pushdown Automata
a/a/
c//
s
b//b
b/b/
the states
the input alphabet
the stack alphabet
the final states
An Example of Accepting
a//a
a/a/
c//
s
b//b
f
b/b/
contains:
[1]
((s, a, ), (s, a))
[2]
((s, b, ), (s, b))
[3]
((s, c, ), (f, ))
[4]
((f, a, a), (f, ))
[5]
((f, b, b), (f, ))
input = b a c a b
trans
2
1
3
5
6
Lecture Notes 14
state
s
s
s
f
f
f
unread input
bacab
acab
cab
ab
b
stack
b
ab
ab
b
Pushdown Automata
A Nondeterministic PDA
L = wwR
S → ε
S → aSa
S → bSb
A PDA to accept strings of the form wwR:
a//a
a/a/
//
s
b//b
b/b/
the states
the input alphabet
the stack alphabet
the final states
An Example of Accepting
a//a
a/a/
//
s
b//b
b/b/
[1]
[2]
[3]
trans
state
s
s
f
f
unread input
aabbaa
abbaa
abbaa
bbaa
stack
a
a
state
s
s
s
s
f
f
f
f
unread input
aabbaa
abbaa
bbaa
baa
baa
aa
a
stack
a
aa
baa
baa
aa
a
1
3
4
none
trans
1
1
2
3
5
4
4
Lecture Notes 14
[4]
[5]
Pushdown Automata
L = {ambn : m ≤ n}
A context-free grammar for L:
S → ε
S → Sb
S → aSb
A PDA to accept L:
a//a
/* more b's
b/a/
b/a/
b//
2
b//
Accepting Mismatches
L = {ambn : m ≠ n; m, n > 0}
a//a
b/a/
b/a/
a//a
/a/
/a/
b/a/
b/a/
2
a//a
1
/a/
b/a/
/a/
b/a/
2
b//
4
Lecture Notes 14
b//
Pushdown Automata
Eliminating Nondeterminism
A PDA is deterministic if, for each input and state, there is at most one possible transition. Determinism implies uniquely
defined machine behavior.
a//a
/a/
b/a/
/a/
b/a/
1
b//
4
b//
/a/
b/a/
/a/
b/a/
//Z
1
/Z/
b/Z/
4
b//
S → NC
S → QP
N → A
N → B
A → a
A → aA
A → aAb
B → b
B → Bb
B → aBb
C → ε | cC
P → B'
P → C'
B' → b
B' → bB'
B' → bB'c
C' → c | C'c
C' → C'c
C' → bC'c
Q → ε | aQ
a//a
S'
machine for N
a//
b,c
clear and accept
machine for P
Lecture Notes 14
Pushdown Automata
A NDPDA for L:
S → A
S → B
A → ε
A → aAb
B → ε
B → bBa
A DPDA for L:
More on PDAs
What about a PDA to accept strings of the form ww?
Every FSM is (Trivially) a PDA
Given an FSM M = (K, Σ, δ, s, F)
and elements of δ of the form
    (p, i, q)
    old state, input, new state
make a PDA transition
    ((p, i, ε), (q, ε))
    old state, input, don't look at stack; new state, don't push on stack
(//(
S
)/(/
(//(
S'
)/(/
/#/
F
The new machine is nondeterministic:
( ) ( )
Lecture Notes 14
Pushdown Automata
Arithmetic Expressions
//E
1
Example:
a+b*c
But what we really want to do with languages like this is to extract structure.
Comparing Regular and Context-Free Languages
Regular Languages
Context-Free Languages
context-free grammars
parse
= NDPDAs
regular expressions
- or regular grammars
recognize
= DFSAs
Lecture Notes 14
Pushdown Automata
//E
1
Given G = (V, Σ, R, S)
Construct M such that L(M) = L(G)
M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains:
(1) ((p, ε, ε), (q, S))
    push the start symbol on the stack
(2) ((q, ε, A), (q, x)) for each rule A → x in R
    replace left hand side with right hand side
(3) ((q, a, a), (q, ε)) for each a ∈ Σ
    read an input character and pop it from the stack
The resulting machine can execute a leftmost derivation of an input string in a top-down fashion.
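The same construction as a small piece of code (the tuple representation of Δ is ours): one transition to push S, one per grammar rule, and one per terminal.

# A transition ((p, a, gamma), (q, push)) means: in state p, reading a ("" = epsilon),
# with gamma on top of the stack, move to q and replace gamma by push.
def grammar_to_pda(terminals, rules, start_symbol):
    p, q = "p", "q"
    delta = [((p, "", ""), (q, start_symbol))]        # (1) push the start symbol
    for lhs, rhs in rules:                            # (2) replace a nonterminal by a right-hand side
        delta.append(((q, "", lhs), (q, rhs)))
    for a in terminals:                               # (3) match and pop an input character
        delta.append(((q, a, a), (q, "")))
    return {"states": {p, q}, "start": p, "finals": {q}, "delta": delta}

# Example: the grammar used in the trace below (S -> epsilon | B | aSa, B -> epsilon | bB).
pda = grammar_to_pda({"a", "b"},
                     [("S", ""), ("S", "B"), ("S", "aSa"), ("B", ""), ("B", "bB")],
                     "S")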
Lecture Notes 15
input = a a b b a a
trans
state
p
q
q
q
q
q
q
q
q
q
q
q
q
q
0
3
6
3
6
2
5
7
5
7
4
6
6
(p, , ), (q, S)
(q, , S), (q, )
(q, , S), (q, B)
(q, , S), (q, aSa)
(q, , B), (q, )
(q, , B), (q, bB)
(q, a, a), (q, )
(q, b, b), (q, )
0
1
2
3
4
5
6
7
unread input
aabbaa
aabbaa
aabbaa
abbaa
abbaa
bbaa
bbaa
bbaa
baa
baa
aa
aa
a
S
SB
S aSa
B
B bB
stack
S
aSa
Sa
aSaa
Saa
Baa
bBaa
Baa
bBaa
Baa
aa
a
Another Example
L = {anbmcpdq : m + n = p + q}
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(p, , ), (q, S)
(q, , S), (q, aSd)
(q, , S), (q,T)
(q, , S), (q,U)
(q, , T), (q, aTc)
(q, , T), (q, V)
(q, , U), (q, bUd)
(q, , U), (q, V)
(q, , V), (q, bVc
(q, , V), (q, )
(q, a, a), (q, )
(q, b, b), (q, )
(q, c, c), (q, )
(q, d, d), (q, )
0
1
2
3
4
5
6
7
8
9
10
11
12
13
S aSd
ST
SU
T aTc
TV
U bUd
UV
V bVc
V
input = a a b c d d
S aSd
ST
SU
T aTc
TV
U bUd
UV
V bVc
V
(6)
(7)
(8)
(9)
a//a
b//a
c/a/
b//a
1
2
//
d/a/
c/a/
d/a/
3
//
4
//
input = a a b c d d
Lecture Notes 15
Notice Nondeterminism
Machines constructed with the algorithm are often nondeterministic, even when they needn't be. This happens even with trivial
languages.
Example: L = anbn
A grammar for L is:
[1] S → aSb
[2] S → ε
contains:
((s, a, ), (s, a))
((s, b, ), (s, b))
((s, c, ), (f, ))
((f, a, a), (f, ))
((f, b, b), (f, ))
a/a/
c//
s
b//b
b/b/
a/a/
/Z/
c//
s
b//b
f'
b/b/
Step 2:
(1)
Assure that || 1.
(2)
Assure that || 2.
(3)
Assure that || = 1.
Lecture Notes 15
Making M Simple
a//a
c//
//Z
s'
a/a/
/Z/
b//b
f'
b/b/
If the nonterminal <s1, X, s2> ⇒* w, then the PDA starts in state s1 with (at least) X on the stack and after consuming w and
popping the X off the stack, it ends up in state s2.
Start with the rule:
S → <s, Z, f> where s is the start state, f is the (introduced) final state and Z is the stack bottom symbol.
Transitions ((s1, a, X), (s2, YX)) become a set of rules:
<s1, X, q> → a <s2, Y, r> <r, X, q>   for a ∈ Σ ∪ {ε}, ∀ q, r ∈ K
Transitions ((s1, a, X), (s2, Y)) become a set of rules:
<s1, X, q> → a <s2, Y, q>   for a ∈ Σ ∪ {ε}, ∀ q ∈ K
Transitions ((s1, a, X), (s2, ε)) become a rule:
<s1, X, s2> → a   for a ∈ Σ ∪ {ε}
Lecture Notes 15
S <s, Z, f'>
[1]
[2]
[x]
[x]
[x]
[x]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
Context-Free Languages
context-free grammars
parse
= NDPDAs
regular exprs.
or
regular grammars
recognize
= DFSAs
Lecture Notes 15
Now it's time to worry about extracting structure (and doing so efficiently).
Optimizing Context-Free Languages
For regular languages:
Computation = operation of FSMs. So,
Optimization = Operations on FSMs:
Conversion to deterministic FSMs
Minimization of FSMs
For context-free languages:
Computation = operation of parsers. So,
Optimization = Operations on languages
Operations on grammars
Parser design
Before We Start: Operations on Grammars
There are lots of ways to transform grammars so that they are more useful for a particular purpose.
the basic idea:
1. Apply transformation 1 to G to get rid of undesirable property 1. Show that the language generated by G is unchanged.
2. Apply transformation 2 to G to get rid of undesirable property 2. Show that the language generated by G is unchanged AND
that undesirable property 1 has not been reintroduced.
3. Continue until the grammar is in the desired form.
Examples:
Getting rid of ε rules (nullable rules)
Getting rid of sets of rules with a common initial terminal, e.g.,
A → aB, A → aC become A → aD, D → B | C
Conversion to normal forms
Lecture Notes 16
Normal Forms
If you want to design algorithms, it is often useful to have a limited number of input forms that you have to deal with.
Normal forms are designed to do just that. Various ones have been developed for various purposes.
Examples:
Category
fruit
fruit
vegetable
vegetable
or
or
or
Supplier
Price
A
B
A
B
Normal Forms for Grammars
Chomsky Normal Form, in which all rules are of one of the following two forms:
X → a, where a ∈ Σ, or
X → BC, where B and C are nonterminals in G
Greibach Normal Form, in which all rules are of the following form:
X → a β, where a ∈ Σ and β is a (possibly empty) string of nonterminals
If L is a context-free language that does not contain ε, then if G is a grammar for L, G can be rewritten into both of these normal
forms.
Lecture Notes 16
2.
Lecture Notes 16
Remove from G' all productions P whose right hand sides have length greater than 1 and include a terminal (e.g., A →
aB or A → BaC):
3.1. Create a new nonterminal Ta for each terminal a in Σ.
3.2. Modify each production P by substituting Ta for each terminal a.
3.3. Add to G', for each Ta, the rule Ta → a
Example:
A → aB
A → BaC
A → BbC
Ta → a
Tb → b
Conversion to Chomsky Normal Form
4. Remove from G' all productions P whose right hand sides have length greater than 2 (e.g., A → BCDE)
4.1. For each P of the form A → N1N2N3N4…Nn, n > 2, create new nonterminals M2, M3, … Mn-1.
4.2. Replace P with the rule A → N1M2.
4.3. Add the rules M2 → N2M3, M3 → N3M4, … Mn-1 → Nn-1Nn
Example:
A → BCDE   (n = 4)
A → BM2
M2 → CM3
M3 → DE
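Step 4 as a small piece of code (a sketch; the fresh nonterminal names are ours):

# Split every rule whose right-hand side has more than two symbols,
# e.g. A -> B C D E becomes A -> B M1, M1 -> C M2, M2 -> D E.
def split_long_rules(rules):
    new_rules, counter = [], 0
    for lhs, rhs in rules:                  # rhs is a list of nonterminal names
        while len(rhs) > 2:
            counter += 1
            m = "M%d" % counter             # a fresh nonterminal
            new_rules.append((lhs, [rhs[0], m]))
            lhs, rhs = m, rhs[1:]
        new_rules.append((lhs, rhs))
    return new_rules

print(split_long_rules([("A", ["B", "C", "D", "E"])]))
# [('A', ['B', 'M1']), ('M1', ['C', 'M2']), ('M2', ['D', 'E'])]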
Lecture Notes 16
E
E
E
.
.
.
id
Bottom Up
E
E
F
id + id
id + id
id
id
[1]
[2]
[3]
[4]
[5]
[6]
[7]
(1, , ), (2, E)
(2, , E), (2, E+T)
(2, , E), (2, T)
(2, , T), (2, T*F)
(2, , T), (2, F)
(2, , F), (2, (E) )
(2, , F), (2, id)
Lecture Notes 17
id + id * id(id)
Stack:
E
What Does It Produce?
The leftmost derivation of the string. Why?
E E + T T + T F + T id + T
id + T * F id + F * F id + id * F
id + id * id(E) id + id * id(T)
id + id * id(F) id + id * id(id)
E
E
id
id
F
id
( E )
T
F
id
Lecture Notes 17
nondeterministic
nondeterministic
nondeterministic
Is Nondeterminism A Problem?
Yes.
In the case of regular languages, we could cope with nondeterminism in either of two ways:
Create an equivalent deterministic recognizer (FSM)
Simulate the nondeterministic FSM in a number of steps that was still linear in the length of the input string.
For context-free languages, however,
The best straightforward general algorithm for recognizing a string is O(n3) and the best (very complicated) algorithm is
based on a reduction to matrix multiplication, which may get close to O(n2).
We'd really like to find a deterministic parsing algorithm that could run in time proportional to the length of the input string.
Is It Possible to Eliminate Nondeterminism?
In this case: Yes
In general: No
Some definitions:
A PDA M is deterministic if it has no two transitions such that for some (state, input, stack sequence) the two transitions
could both be taken.
Theorem: The class of deterministic context-free languages is a proper subset of the class of context-free languages.
Proof: Later.
Adding a Terminator to the Language
We define the class of deterministic context-free languages with respect to a terminator ($) because we want that class to be as
large as possible.
Theorem: Every deterministic CFL (as just defined) is a context-free language.
Proof:
Without the terminator ($), many seemingly deterministic cfls aren't. Example:
a* {anbn : n> 0}
Possible Solutions to the Nondeterminism Problem
1)
2)
3)
Add a terminator $
Change the parsing algorithm
Modify the grammar
Lecture Notes 17
id + id * id(id)
E E + T T + T F + T id + T
id + T * F id + F * F id + id * F
Considering transitions:
(5) (2, , F), (2, (E) )
(6) (2, , F), (2, id)
(7) (2, , F), (2, id(E))
If we add to the state an indication of what character is next, we have:
(5) (2, (, , F), (2, (E) )
(6) (2, id, , F), (2, id)
(7) (2, id, , F), (2, id(E))
Modifying the Language
So we've solved part of the problem. But what do we do when we come to the end of the input? What will be the state indicator
then?
The solution is to modify the language. Instead of building a machine to accept L, we will build a machine to accept L$.
Using Lookahead
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
F → id(E)
[1]
[2]
[3]
[4]
[5]
[6]
[7]
For now, we'll ignore the issue of when we read the lookahead character and the fact that we only care about it if the top symbol
on the stack is F.
Possible Solutions to the Nondeterminism Problem
1)
2)
3)
Add a terminator $
Change the parsing algorithm
Lecture Notes 17
F → id
F → id(E)
Replace with:
[6'] F → id A
[7'] A → ε
[8'] A → (E)
E → E + T
E → T
The problem:
E
E
E
Replace with:
[1]
[2]
[3]
Lecture Notes 17
E → T E'
E' → + T E'
E' → ε
A m
A mA'
We have just offered heuristic rules for getting rid of some nondeterminism.
We know that not all context-free languages are deterministic, so there are some languages for which these rules won't work.
We define a grammar to be LL(k) if it is possible to decide what production to apply by looking ahead at most k symbols in the
input string.
Specifically, a grammar G is LL(1) iff, whenever
A → α | β are two rules in G:
1. For no terminal a do α and β derive strings beginning with a.
2. At most one of α | β can derive ε.
3. If β ⇒* ε, then α does not derive any strings beginning with a terminal in FOLLOW(A), defined to be the set of terminals
that can immediately follow A in some sentential form.
We define a language to be LL(k) if there exists an LL(k) grammar for it.
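As an illustration of deciding with one symbol of lookahead, here is a minimal recursive-descent recognizer for a simplified version of the transformed expression grammar below (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, without the id(E) case); the packet's parser drives the same decisions from a table and a stack, and the token handling here is ours.

def parse(tokens):                 # tokens such as ["id", "+", "id", "*", "id", "$"]
    pos = [0]
    def peek(): return tokens[pos[0]]
    def eat(t):
        if peek() != t: raise SyntaxError("expected " + t + ", saw " + peek())
        pos[0] += 1
    def E(): T(); Eprime()
    def Eprime():
        if peek() == "+": eat("+"); T(); Eprime()      # E' -> + T E'
        # otherwise E' -> epsilon
    def T(): F(); Tprime()
    def Tprime():
        if peek() == "*": eat("*"); F(); Tprime()      # T' -> * F T'
        # otherwise T' -> epsilon
    def F():
        if peek() == "(": eat("("); E(); eat(")")      # F -> ( E )
        else: eat("id")                                # F -> id
    E(); eat("$")
    return True

print(parse(["id", "+", "id", "*", "id", "$"]))        # True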
Lecture Notes 17
E
ETE'
ETE'
E'
E'+TE'
E'
T
TFT'
TFT'
T'
T'
T'*FT'
T'
F
Fid
F(E)
Given input id + id * id, the first few moves of this parser will be:
E
ETE'
TE'
TFT'
FT'E'
Fid
idT'E'
T'E'
T'
E'
$
E'
T'
id + id * id$
id + id * id$
id + id * id$
id + id * id$
+ id * id$
+ id * id$
S' else ST
III.
Lecture Notes 17
TT*F
TF
[5]
[6]
[7]
F (E)
F id
F id(E)
New Grammar
E → TE'
E' → +TE'
E' → ε
T → FT'
T' → *FT'
T' → ε
F → (E)
F → idA
A → ε
A → (E)
input = id + id + id
E
T
F
id
E'
T'
T
F
E'
T'
id
F
id
E'
T'
Context-Free Languages
context-free grammars
= NDPDAs
parse
find deterministic grammars
find efficient parsers
regular exprs.
or
regular grammars
= DFSAs
recognize
minimize FSAs
Lecture Notes 17
Bottom Up Parsing
Read Supplementary Materials: Context-Free Languages and Pushdown Automata: Parsing, Section 3.
Bottom Up Parsing
An Example:
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
[1]
[2]
[3]
[4]
[5]
[6]
id
id
id
Lecture Notes 18
Bottom Up Parsing
0
1
2
3
4
5
6
7
trans (action)
state
p
0 (shift)
p
6 (reduce F id)
p
4 (reduce T F)
p
2 (reduce E T)
p
0 (shift)
p
0 (shift)
p
6 (reduce F id)
p
4 (reduce T F)
p
0 (shift)
p
0 (shift)
p
6 (reduce F id)
p
3 (reduce T T * F) p
1 (reduce E E + T) p
7 (accept)
q
M for Expressions
unread input
id + id * id$
+ id * id$
+ id * id$
+ id * id$
+ id * id$
id * id$
* id$
* id$
* id$
id$
$
$
$
$
$
stack
id
F
T
E
+E
id+E
F+E
T+E (could also reduce)
*T+E
id*T+E
F*T+E (could also reduce T F)
T+E
E
id
id
id $
E+ id* id
T+ id*id
F+ id*id
id+ id*id
Lecture Notes 18
Bottom Up Parsing
Add a terminator $
2)
3)
Left factor
Let's return to the problem of deciding when to shift and when to reduce (as in our example).
We chose, correctly, to shift * onto the stack, instead of reducing
T+E
to E.
This corresponds to knowing that + has low precedence, so if there are any other operations, we need to do them first.
Solution:
1. Add a one character lookahead capability.
2. Define the precedence relation
       P ⊆ ( V ∪ {$} )  ×  ( Σ ∪ {$} )
            top stack symbol    next input symbol
If (a,b) is in P, we reduce (without consuming the input). Otherwise we shift (consuming the input).
How Does It Work?
We're reconstructing rightmost derivations backwards. So suppose a rightmost derivation contains
abx
To make this happen, we put (a, b) in P. That means we'll try to reduce if a is on top of the stack and b is the next character. We
will actually succeed if the next part of the stack is .
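A sketch of the driver loop this describes (representation ours, and it assumes the grammar has no cyclic unit rules): reduce when the (top-of-stack, next-input) pair is in P and some right-hand side matches the top of the stack, otherwise shift.

# rules: list of (lhs, rhs) with rhs a nonempty tuple of symbols; P: set of (stack_top, next_input) pairs.
def shift_reduce(tokens, rules, P, start_symbol):
    stack, i = [], 0
    tokens = tokens + ["$"]
    while True:
        top = stack[-1] if stack else "$"
        nxt = tokens[i]
        matches = [r for r in rules
                   if len(r[1]) <= len(stack) and tuple(stack[-len(r[1]):]) == r[1]]
        if (top, nxt) in P and matches:
            lhs, rhs = max(matches, key=lambda r: len(r[1]))   # longest-string reduction heuristic
            del stack[-len(rhs):]
            stack.append(lhs)                                  # reduce, without consuming input
        elif nxt != "$":
            stack.append(nxt)                                  # shift, consuming the input
            i += 1
        else:
            return stack == [start_symbol]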
Lecture Notes 18
Bottom Up Parsing
Example
T*F
T
*
E
id
A Different Example
E+T
*
E
E+T$
E + T + id
E + T * id
Lecture Notes 18
id
Bottom Up Parsing
C1
then
if
C2
then
ST1
1
else
ST2
2
2
1
0
1
2
3
4
5
6
7
trans (action)
state
p
0 (shift)
p
6 (reduce F id)
p
4 (reduce T F)
p
2 (reduce E T)
p
0 (shift)
p
0 (shift)
p
6 (reduce F id)
p
4 (reduce T F)
p
0 (shift)
p
0 (shift)
p
6 (reduce F id)
p
3 (reduce T T * F) p
1 (reduce E E + T) p
7 (accept)
q
Lecture Notes 18
unread input
id + id * id$
+ id * id$
+ id * id$
+ id * id$
+ id * id$
id * id$
* id$
* id$
* id$
id$
$
$
$
$
$
Bottom Up Parsing
stack
id
F
T
E
+E
id+E
F+E
T+E (could also reduce)
*T+E
id*T+E
F*T+E (could also reduce T F)
T+E
E
F * T
+ E
We call grammars that become unambiguous with the addition of a precedence relation and the longest string reduction heuristic
weak precedence grammars.
Possible Solutions to the Nondeterminism Problem in a Bottom Up Parser
1)
Add a terminator $
2)
Use an LR parser
3)
LR parsers scan each input Left to right and build a Rightmost derivation. They operate bottom up and deterministically using a
parsing table derived from a grammar for the language to be recognized.
A grammar that can be parsed by an LR parser examining up to k input symbols on each move is an LR(k) grammar. Practical
LR parsers set k to 1.
An LALR ( or Look Ahead LR) parser is a specific kind of LR parser that has two desirable properties:
The parsing table is not huge.
Most useful languages can be parsed.
Another big reason to use an LALR parser:
There are automatic tools that will construct the required parsing table from a grammar and some optional additional
information.
We will be using such a tool:
Lecture Notes 18
yacc
Bottom Up Parsing
Lexical Analyzer
Output Token
Stack
Parsing Table
In simple cases, think of the "states" on the stack as corresponding to either terminal or nonterminal characters.
In more complicated cases, the states contain more information: they encode both the top stack symbol and some facts about
lower objects in the stack. This information is used to determine which action to take in situations that would otherwise be
ambiguous.
The Actions the Parser Can Take
At each step of its operation, an LR parser does the following two things:
1)
2)
Based on its current state, it decides whether it needs a lookahead token. If it does, it gets one.
Based on its current state and the lookahead token if there is one, it chooses one of four possible actions:
Shift the lookahead token onto the stack and clear the lookahead token.
Reduce the top elements of the stack according to some rule of the grammar.
Detect the end of the input and accept the input string.
Lecture Notes 18
Bottom Up Parsing
A Simple Example
0: S rhyme $end ;
1: rhyme sound place ;
2: sound DING DONG ;
3: place DELL
state 0 (empty)
$accept : _rhyme $end
DING shift 3
. error
rhyme goto 1
sound goto 2
state 1 (rhyme)
$accept : rhyme_$end
$end accept
. error
state 2 (sound)
rhyme : sound_place
DELL shift 5
. error
place goto 4
state 3 (DING)
sound : DING_DONG
DONG shift 6
. error
state 4 (place)
rhyme : sound place_ (1)
. reduce 1
by rule 1
state 5 (DELL)
place : DELL_ (3)
. reduce 3
state 6 (DONG)
sound : DING DONG_ (2)
. reduce 2
procname ( id)
Lecture Notes 18
Bottom Up Parsing
Output: -1414.52
The Language Interpretation Problem:
Input: -(17 * 83.56) + 72 / 12
Output: -1414.52
The Language Interpretation Problem:
Input: -(17 * 83.56) + 72 / 12
Output: -1414.52
Lecture Notes 18
Bottom Up Parsing
A string of input tokens, corresponding to the primitive objects of which the input is composed:
-(id * id) + id / id
2
1
2
lex
1
lexical analyzer
yacc
parser
Lecture Notes 18
Bottom Up Parsing
10
lex
The input to lex:
definitions
%%
rules
%%
user routines
All strings that are not matched by any rule are simply copied to the output.
Rules:
[ \t]+ ;
[A-Za-z][A-Za-z0-9]*    return(ID);     find identifiers
[0-9]+
Example:
    integer       action 1
    [a-z]+        action 2
input:
    integers      take action 2
    integer       take action 1
yacc
(Yet Another Compiler Compiler)
Lecture Notes 18
:a b c
:a b c
:a b c
{action}
{$$ = $2}
Bottom Up Parsing
11
Example
Input to yacc:
%token DING DONG DELL
%%
rhyme :
sound place ;
sound :
DING DONG ;
place :
DELL
%%
#include "lex.yy.c"
Running yacc on this input produces exactly the state table shown earlier (states 0 through 6).
The dangling else:   if C1 then if C2 then ST1 else ST2
[Figure: two possible parses — the else may attach to the inner if (1) or to the outer if (2); shifting rather than reducing attaches it to the inner if, which is the usual convention]
[Figure: the parse tree for id + id * id under the unambiguous grammar, built from E, T, and id nodes]
What does the shift-rather-than-reduce heuristic do if we instead write:
    E → E + E
    E → id
Input:   id + id * id
[Figure: with the ambiguous grammar there are two parse trees for this input, one grouping id + (id * id) and one grouping (id + id) * id]
One solution was the precedence table, derived from an unambiguous grammar, which can be encoded into the parsing table of an
LR parser, since it tells us what to do for each top-of-stack, input character combination.
Operator Precedence
We know that we can write an unambiguous grammar for arithmetic expressions that gets the precedence right. But it turns out
that we can build a faster parser if we instead write:
E → E + E | E * E | (E) | id
And, in addition, we specify operator precedence. In yacc, we specify associativity (since we might not always want left) and
precedence using statements in the declaration section of our grammar:
%left '+' '-'
%left '*' '/'
Operators on the first line have lower precedence than operators on the second line, and so forth.
Lecture Notes 18
Bottom Up Parsing
13
Reduce/Reduce Conflicts
Recall: this can easily be used to simulate the longest-prefix heuristic, "Choose the longest possible stack string to reduce."
[1] E → E + T
[2] E → T
[3] T → T * F
[4] T → F
[5] F → (E)
[6] F → id
Building the parser (the usual invocation):
    lex file.l             creates lex.yy.c
    yacc file.y            creates y.tab.c
    cc y.tab.c -ly -ll     actually compiles y.tab.c and lex.yy.c, which is included;
                           -ly links the yacc library, which includes main and yyerror;
                           -ll links the lex library.
[Figure: the parser asks the lexical analyzer for a token; the lexical analyzer returns a token and sets its value in yylval]
Summary
Efficient parsers for languages with the complexity of a typical programming language or command line interface:
Make use of special purpose constructs, like precedence, that are very important in the target languages.
May need complex transition functions to capture all the relevant history in the stack.
Use heuristic rules, like shift instead of reduce, that have been shown to work most of the time.
Would be very difficult to construct by hand (as a result of all of the above).
Lecture Notes 18
Bottom Up Parsing
14
Unfortunately, these are weaker than they are for regular languages.
The Context-Free Languages are Closed Under Union
Let G1 = (V1, Σ1, R1, S1) and
    G2 = (V2, Σ2, R2, S2).
Assume that G1 and G2 have disjoint sets of nonterminals, not including S.
Let L = L(G1) ∪ L(G2).
We can show that L is context-free by exhibiting a CFG for it:
    G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, R1 ∪ R2 ∪ {S → S1, S → S2}, S)
Example
L = {aⁿbⁿ} ∪ (aa)*(bb)*
[Figure: a PDA for L, combining a PDA for {aⁿbⁿ} with a finite-state part whose transitions are (1, a, 2), (1, b, 3), (2, a, 1), (3, b, 4), (4, b, 3)]
A PDA for L.
Don't Try to Use Closure Backwards
One Closure Theorem:
If L1 and L2 are context free, then so is L3 = L1 ∪ L2.
But what if L3 and L1 are context free? What can we say about L2 in L3 = L1 ∪ L2?
Example:
aⁿbⁿc* = aⁿbⁿc* ∪ aⁿbⁿcⁿ
If L is a context-free language, and if w is a string in L where |w| > K, for some value of K, then w can be rewritten as uvxyz,
where |vy| > 0 and |vxy| ≤ M, for some value of M.
uxz, uvxyz, uvvxyyz, uvvvxyyyz, etc. (i.e., uvⁿxyⁿz, for n ≥ 0) are all in L.
[Figure: a parse tree, labeling its height, nodes, leaves, and yield]
Theorem: The length of the yield of any tree T with height H and branching factor (fanout) B is ≤ B^H.
Proof: By induction on H. If H is 1, then just a single rule applies. By definition of fanout, the longest yield is B.
Assume true for H = n.
Consider a tree with H = n + 1. It consists of a root and some number of subtrees, each of which is of height ≤ n (so the induction
hypothesis holds) and yield ≤ Bⁿ. The number of subtrees is ≤ B. So the yield must be ≤ B(Bⁿ), i.e., ≤ B^(n+1).
What Is K?
[Figure: a parse tree rooted at S in which some nonterminal occurs twice on one path]
Assume that we are considering the bottom-most two occurrences of some nonterminal. Then the yield of the upper one is at
most B^(T+1) (since only one nonterminal repeats), where T is the number of nonterminals.
So M = B^(T+1).
Lecture Notes 19
Unfortunately, we don't know where v and y fall. But there are two possibilities:
1. If vy contains all three symbols, then at least one of v or y must contain two of them. But then uvvxyyz contains at least one
out of order symbol.
2. If vy contains only one or two of the symbols, then uvvxyyz must contain unequal numbers of the symbols.
Using the Strong Pumping Lemma for Context-Free Languages
If L is context free, then
there exist K and M (with M ≥ K) such that
for all strings w, where |w| > K
(since this is true for all such w, it must be true for any particular one, so you pick w;
hint: describe w in terms of K or M),
there exist u, v, x, y, z such that w = uvxyz, |vy| > 0, |vxy| ≤ M, and for all n ≥ 0, uvⁿxyⁿz is in L.
We need to pick w, then show that there are no values for u, v, x, y, z that satisfy all the above criteria. To do that, we just need to
focus on possible values for v and y, the pumpable parts. So we show that all possible picks for v and y violate at least one of
the criteria.
Write out a single string w (in terms of K or M) and divide w into regions.
For each possibility for v and y (described in terms of the regions defined above), find some value n such that uvⁿxyⁿz is not in L.
Almost always, the easiest values are 0 (pumping out) or 2 (pumping in). Your value for n may differ for different cases.
Convince the reader that there are no other cases.
Q. E. D.
A Pumping Lemma Proof in Full Detail
Proof that L = {aⁿbⁿcⁿ : n ≥ 0} is not context free.
Suppose L is context free. The context-free pumping lemma applies to L. Let M be the number from the pumping lemma.
Choose w = a^M b^M c^M. Now w ∈ L and |w| > M ≥ K. From the pumping lemma, for all strings w, where |w| > K, there exist u, v, x,
y, z such that w = uvxyz and |vy| > 0, and |vxy| ≤ M, and for all n ≥ 0, uvⁿxyⁿz is in L. There are two main cases:
1. Either v or y contains two or more different types of symbols (a, b or c). In this case, uv²xy²z is not of the form
a*b*c* and hence uv²xy²z ∉ L.
2. Neither v nor y contains two or more different types of symbols. In this case, vy may contain at most two types of
symbols. The string uv⁰xy⁰z will decrease the count of one or two types of symbols, but not the third, so uv⁰xy⁰z ∉ L.
Cases 1 and 2 cover all the possibilities. Therefore, regardless of how w is partitioned, there is some uvⁿxyⁿz that is not in L.
Contradiction. Therefore L is not context free.
Note: the underlined parts of the above proof are boilerplate that can be reused. A complete proof should have this text or
something equivalent.
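For small M the case analysis above can also be checked mechanically. The following Python sketch is an illustration only (it exercises the cases for one small M; it is not a proof): it tries every legal decomposition of a^M b^M c^M and confirms that pumping with n = 0 or n = 2 always leaves the language.

def in_L(s):
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def pumping_fails_everywhere(M):
    w = "a" * M + "b" * M + "c" * M
    for i in range(len(w) + 1):
        for j in range(i, len(w) + 1):
            for k in range(j, len(w) + 1):
                for l in range(k, len(w) + 1):
                    v, y = w[i:j], w[k:l]
                    if len(v) + len(y) == 0 or (l - i) > M:
                        continue          # require |vy| > 0 and |vxy| <= M
                    u, x, z = w[:i], w[j:k], w[l:]
                    if all(in_L(u + v * n + x + y * n + z) for n in (0, 2)):
                        return False      # a decomposition survived pumping
    return True

print(pumping_fails_everywhere(4))        # True: every decomposition of a^4 b^4 c^4 fails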
Context-Free Languages Over a Single-Letter Alphabet
Theorem: Any context-free language over a single-letter alphabet is regular.
Examples:
    L = {aⁿbⁿ}                          over {a} becomes   L = {aⁿaⁿ} = {a²ⁿ} = {w ∈ {a}* : |w| is even}
    L = {aⁿbᵐ : n, m ≥ 0 and n ≠ m}     over {a} becomes   L = {aⁿaᵐ : n, m ≥ 0 and n ≠ m} = {aᵏ : k ≥ 1}
Lecture Notes 19
L = {aⁿ : n ≥ 1 is prime}
|Σ| = 1. So if L were context free, it would also be regular. But we know that it is not. So it is not context free either.
Now what? Consider L = {tt : t ∈ {a, b}*} and choose some w ∈ L with |w| > K, so w = tt.
[Figure: w = t t, with candidate placements of u, v, x, y, z marked]
What if u is ε, v is t, x is ε, y is t, and z is ε? Then uvⁿxyⁿz = tⁿtⁿ, which is still in L, so this decomposition survives pumping.
Suppose |v| = |y|. Now we have to show that repeating them makes the two copies of t different. But we can't.
L = {tt : t ∈ {a, b}*}
But let's consider L' = L ∩ a*b*a*b*.
This time, we let |w| > 2M, with the number of both a's and b's in w > M:
      region 1        region 2        region 3        region 4
    aaaaaaaaaa      bbbbbbbbbb      aaaaaaaaaa      bbbbbbbbbb
    \------------- t -------------/ \------------- t -------------/
(u, v, x, y, z divide w across these four regions)
Now we use pumping to show that L' is not context free.
First, notice that if either v or y contains both a's and b's, then we immediately violate the rules for L' when we pump.
So now we know that v and y must each fall completely in one of the four marked regions.
L' = {tt : t ∈ {a, b}*} ∩ a*b*a*b*, with |w| > 2M and the same four regions of w as above.
Consider the combinations of regions in which (v, y) can fall:
(1,1)  (2,2)  (3,3)  (4,4)  (1,2)  (2,3)  (3,4)  (1,3)  (2,4)  (1,4)
Closing Deterministic PDAs Under Complement (sketch)
[Figure: a deterministic PDA for L$, here L = {aⁿbⁿ}, using a bottom-of-stack marker Z and an end-of-input marker $]
Set M = M'. Make M simple.
The construction, continued: add dead state(s) and swap final and nonfinal states.
Issues: 1) never letting the machine die; 2) ¬(L$) ≠ (¬L)$; 3) keeping the machine deterministic.
Deterministic vs. Nondeterministic Context-Free Languages
Theorem: The class of deterministic context-free languages is a proper subset of the class of context-free languages.
Proof: Consider L = {aⁿbᵐcᵖ : m ≠ n or m ≠ p}. L is context free (a nondeterministic PDA can guess which inequality to check).
But L is not deterministic. If it were, then its complement L1 would be deterministic context free, and thus certainly context free.
But then
L2 = L1 ∩ a*b*c*   (where a*b*c* is a regular language)
would be context free. But
L2 = {aⁿbⁿcⁿ : n ≥ 0}, which we know is not context free.
Thus there exists at least one context-free language that is not deterministic context free.
Note that deterministic context-free languages are not closed under union, intersection, or difference.
Context-Free Languages                                    Regular Languages
  context-free grammars — parse                             regular expressions or regular grammars — recognize
  = NDPDAs — parse                                          = DFSAs — recognize
  find deterministic grammars, find efficient parsers       minimize FSAs
  closed under: concatenation, union, Kleene star           closed under: concatenation, union, Kleene star,
                                                              complement, intersection
                                                            pumping lemma
                                                            deterministic = nondeterministic
[Figure: the language hierarchy — Regular ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable, with FSMs recognizing the regular languages and (deterministic or nondeterministic) PDAs the context-free languages]
Turing Machines
Read K & S 4.1.
Do Homework 17.
Grammars, Recursively Enumerable Languages, and Turing Machines
[Figure: a recursively enumerable language is generated by an unrestricted grammar and accepted by a Turing machine]
Turing Machines
Can we come up with a new kind of automaton that has two properties:
  powerful enough to describe all computable things (unlike FSMs and PDAs), and
  simple enough that we can reason formally about it (like FSMs and PDAs, unlike real computers)?
Turing Machines: A Formal Definition
A Turing machine is a quintuple (K, Σ, δ, s, H):
  K is a finite set of states;
  Σ is an alphabet, containing at least ␣ (the blank) and ▷ (the left-end marker), but not → or ←;
  s ∈ K is the initial state;
  H ⊆ K is the set of halting states;
  δ is a function from
      (K − H) × Σ
  to
      K × (Σ ∪ {→, ←})        (an action: write a symbol, or move right, or move left)
  such that
  (a) if the input symbol is ▷, the action is →, and
  (b) ▷ can never be written.
The input tape is infinite to the right (and full of ), but has a wall to the left. Some definitions allow infinite tape in both
directions, but it doesn't matter.
2. δ must be defined for all (state, input) pairs unless the state is a halt state.
3. Turing machines do not necessarily halt (unlike FSMs). Why? To halt, they must enter a halt state. Otherwise they loop.
An example machine, with Σ = {0, 1, ␣, ▷}:
[Figure: the machine's start state s, halting states H, and transition function δ, given as a table]
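The definition above is easy to simulate directly. A minimal Python sketch, assuming "#" stands for the blank ␣, ">" for the left-end marker ▷, and (a simplification of the configuration convention used below) that the head starts on the first input symbol:

def run_tm(delta, s, H, w, max_steps=10_000):
    tape = list(">" + w + "#")            # "#" = blank, ">" = left-end marker
    q, pos = s, 1                         # head starts on the first symbol of w
    for _ in range(max_steps):
        if q in H:
            return q, "".join(tape)
        q, action = delta[(q, tape[pos])]
        if action == "L":
            pos -= 1
        elif action == "R":
            pos += 1
            if pos == len(tape):
                tape.append("#")          # the tape is blank-filled to the right
        else:
            tape[pos] = action            # write a symbol (never ">", per restriction (b))
    raise RuntimeError("no halting state reached; Turing machines need not halt")

# Example: scan right to the first blank, then halt.
delta = {("s", ">"): ("s", "R"), ("s", "a"): ("s", "R"), ("s", "b"): ("s", "R"), ("s", "#"): ("h", "#")}
print(run_tm(delta, "s", {"h"}, "aab"))   # ('h', '>aab#')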
Configurations
A configuration records (1) the tape contents up to and including the scanned square and (2) the tape contents after the scanned
square; formally it is an element of
    K × ▷Σ* × (Σ*(Σ − {␣}) ∪ {ε}).
The input after the scanned square may be empty, but it may not end with a blank. We assume the entire tape to the right of the
input is filled with blanks.
Examples (underlining the scanned square):
    (q, ▷aab, b)   — written compactly as (q, ▷aabb) with the last b scanned
    (h, ▷aabb, ε)  — written compactly as (h, ▷aabb); this is a halting configuration
Yields
(q1, w1 a1 u1) |-M (q2, w2 a2 u2), where a1 and a2 are the scanned symbols, iff for some b ∈ Σ ∪ {←, →}, δ(q1, a1) = (q2, b) and:
(1) b ∈ Σ (a write): w2 = w1, u2 = u1, and a2 = b.
(2) b = ← (a move left): w1 = w2 a2, and either
    (a) u2 = a1 u1, if a1 ≠ ␣ or u1 ≠ ε, or
    (b) u2 = ε, if a1 = ␣ and u1 = ε.
    (If we scan left off the first square of the blank region, then drop that square from the configuration.)
(3) b = → (a move right): w2 = w1 a1, and either
    (a) u1 = a2 u2, or
    (b) u1 = u2 = ε and a2 = ␣.
    (If we scan right onto the first square of the blank region, then a new blank appears in the configuration.)
[Figure: tape pictures (e.g. ▷aaab) illustrating each case as the head moves between squares]
Yields, Continued
For any Turing machine M, let |-M* be the reflexive, transitive closure of |-M.
Configuration C1 yields configuration C2 if
C1 |-M* C2.
A computation by M is a sequence of configurations C0, C1, …, Cn for some n ≥ 0 such that
C0 |-M C1 |-M C2 |-M … |-M Cn.
We say that the computation is of length n or that it has n steps, and we write
C0 |-Mⁿ Cn.
A Context-Free Example
M takes a tape of a's then b's, possibly with more a's, and adds b's as required to make the number of b's equal the number of a's.
K = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Σ = {a, b, ␣, ▷, 1, 2}
s = 0
H = {9}
[Figure: the transition diagram of M — roughly, it repeatedly marks an a as 1, scans right to mark a matching b as 2 (writing a new 2 if none remains), and finally restores the 1's to a's and the 2's to b's before halting in state 9]
An Example Computation
[Figure: the sequence of configurations of M on a sample input such as aaab]
Notes on Programming
The machine has a strong procedural feel.
It's very common to have state pairs, in which the first writes on the tape and the second moves. Some definitions allow both
actions at once, and those machines will have fewer states.
There are common idioms, like scan left until you find a blank.
Even a very simple machine is a nuisance to write.
A Notation for Turing Machines
(1) Define some basic machines: for each a ∈ Σ, a machine that writes a; plus L (move left one square) and R (move right one square).
(2) Combine machines by connecting them with labeled arrows.
[Figure: the shorthand conventions — M1 followed by an arrow labeled a to M2 means "run M1, and if it halts scanning a, run M2"; an unlabeled arrow means "on any symbol"; M1M2 (or MM, M²) runs the machines in sequence; an arrow labeled x = a, b sets the variable x to the symbol read; conditions such as x ≠ y may guard an arrow]
Useful idioms:
    R␣     find the first blank square to the right of the current square
    L␣     find the first blank square to the left of the current square
    R¬␣    find the first nonblank square to the right of the current square
    L¬␣    find the first nonblank square to the left of the current square
    Ra,b / La,b    find the first occurrence of a or b to the right/left of the current square, then go to M1 if the detected
                   character is a; go to M2 if the detected character is b
    Lx=a,b         find the first occurrence of a or b to the left of the current square and set x to the value found
    Lx=a,b R x     find the first occurrence of a or b to the left of the current square, set x to the value found, move one
                   square to the right, and write x (a or b)
An Example
Input: ▷␣w␣, where w ∈ {1}*        Output: ▷␣w³␣
Example input: ▷␣111␣
[Figure: the machine, built from the basic machines and shorthands above]
A Shifting Machine S←
Input: ▷␣w␣        Output: ▷w␣
Example: ▷␣abba␣ becomes ▷abba␣
[Figure: S←, built from the shorthands, using Lx≠␣ R x to copy each symbol one square to the left]
Turing Machines
A Recognition Example
L = {aⁿbⁿcⁿ : n ≥ 0}
Examples: aabbcc (accept), aaccb (reject)
[Figure: a machine built from the shorthands — it repeatedly marks one a, scans right to mark a matching b and then a matching c; if any check fails it answers n, and when everything is marked it answers y]
A Second Recognition Example
Example: acabb
[Figure: a machine that repeatedly marks a symbol with #, remembers it in x, scans right, compares the corresponding symbol y against x, answers n if y ≠ x, and answers y when all symbols match]
Do Turing Machines Stop?
FSMs
Always halt after n steps, where n is the length of the input. At that point, they either accept or reject.
PDAs
Don't always halt, but there is an algorithm to convert any PDA into one that does halt.
Computing with Turing Machines
A Turing machine M computes a function f if, when started in configuration (s, ▷␣w␣), it halts with ▷␣f(w)␣ on its tape.
Example: a copying machine C.
Input: ▷␣w␣        Output: ▷␣ww␣
[Figure: C built from the shorthands, using the marking trick Lx≠␣ R x, followed by the shifting machine S← and L␣]
Then the machine to compute f(w) = ww is just
    > C S← L␣
Computing Numeric Functions
To compute functions on numbers, we represent a number n on the tape, for example in binary.
Example: computing n + 1 in binary.
Input: ▷␣1111␣        Output: ▷␣10000␣
Why Are We Working with Our Hands Tied Behind Our Backs?
Turing machines are more powerful than any of the other formalisms we have studied so far.
Turing machines are a lot harder to work with than all the real computers we have available.
Why bother?
The very simplicity that makes it hard to program Turing machines makes it possible to reason formally about what they can do.
If we can, once, show that anything a real computer can do can be done (albeit clumsily) on a Turing machine, then we have a
way to reason about what real computers can do.
Semideciding a Language
Let M be a Turing machine and w an input string. Write
    M(w) = halts,            if M halts on input w
    M(w) = does not halt,    otherwise
M semidecides a language L iff M halts on exactly the strings in L (and runs forever on the others).
[Figure: a trivial looping machine, > R␣ looping left and right over a tape of blanks, used to build machines that deliberately fail to halt]
Turing Enumerable Languages
A language L is Turing enumerable iff there is a Turing machine M that writes the strings of L — w1, w2, w3, … — onto its tape, in some order, possibly with repetitions.
[Figure: a semideciding machine M' built from an enumerating machine M — M' watches the strings M generates, compares each with its own input w ("= w?"), and halts if a match appears]
[Figure: the lexicographic enumeration of {a, b}* — ε, a, b, aa, ab, ba, bb, … — used to define lexicographically Turing enumerable languages]
Lemma: There exists at least one language L that is recursively enumerable but not recursive.
Proof that M' doesn't exist: Suppose that the RE languages were closed under complement. Then if L is RE, ¬L would be RE. If
that were true, then L would also be recursive, because we could construct a machine M to decide it:
1. Let T1 be the Turing machine that semidecides L.
2. Let T2 be the Turing machine that semidecides ¬L.
3. Given a string w, fire up both T1 and T2 on w. Since any string in Σ* must be in either L or ¬L, one of the two machines will
eventually halt. If it's T1, accept; if it's T2, reject.
So if the RE languages were closed under complement, every RE language would be recursive.
But we know that there is at least one RE language that is not recursive. Contradiction.
Lecture Notes 22
[Figure: M' decides L by running the lexicographic enumerator M and comparing each generated string against its input]
Proof, Continued
Proof that lexicographically Turing enumerable implies recursive: Let M be a Turing machine that lexicographically enumerates
L. Then, on input w, M' starts up M and waits until either M generates w (so M' accepts), M generates a string that comes after w
(so M' rejects), or M halts (so M' rejects). Thus M' decides L.
Languages vs. Functions
    Languages:  recursive (the TM always halts); recursively enumerable (the TM halts if the answer is yes)
    Functions:  recursive; and what corresponds to recursively enumerable?
Suppose we have a function that is not defined for all elements of its domain.
Example: f: N → N, f(n) = n/2 (defined only when n is even)
Partially Recursive Functions
[Figure: a domain mapped to a range, with some domain elements having no image]
One solution: redefine the domain to be exactly those elements for which f is defined:
[Figure: the restricted domain mapped onto the range]
But what if we don't know? What if the domain is not a recursive set (but it is recursively enumerable)? Then we want to define
the domain as some larger, recursive set and say that the function is partially recursive. There exists a Turing machine that halts
if given an element of the domain but does not halt otherwise.
Language Summary
Class                      IN (show a language is in the class by)                    OUT (show it is not by)
Recursively Enumerable     semidecidable; Turing enumerable; unrestricted grammar     diagonalization; reduction
Recursive                  decision procedure; lexicographically enumerable;
                           complement is recursively enumerable
Context Free               CF grammar; PDA; closure                                   pumping; closure
Regular                    regular expression; FSM; closure                           pumping; closure
Turing Machine Extensions
A one-tape machine's transition function maps
    (K − H) × Σ        to        K × (Σ ∪ {←, →})
    (state, tape symbol)          (state, tape symbol or L or R)
Multiple Tapes
A k-tape machine's transition function maps
    (K − H) × Σ1 × Σ2 × … × Σk
to
    K × (Σ1 ∪ {←, →}) × (Σ2 ∪ {←, →}) × … × (Σk ∪ {←, →})
Simulating Multiple Tapes with One
[Figure: one tape organized into tracks — for each simulated tape, one track holds its contents and another holds a 0/1 marker recording the simulated head position]
Proposed definition and simulation: the one-tape machine's alphabet consists of tuples of track symbols; each simulated step scans the whole used region, collects the symbols under the marked head positions, and then applies the multi-tape machine's move.
Simulating a PDA
The components of a PDA:
Finite state controller
Input tape
Stack
The simulation:
Finite state controller:
Input tape:
Stack:
[Figure: the simulation uses a two-track tape — track 1 holds the PDA's input, track 2 holds the contents of its stack]
Nondeterministic Turing Machines
[Figure: an example nondeterministic machine and the tree of computations it can follow on an input such as abab]
Example: a nondeterministic machine that decides whether a binary number w is composite:
1. Nondeterministically choose two binary numbers p and q, with 1 < p, q and |p|, |q| ≤ |w|, and write them on the tape after w,
   separated by ;.
        110011;111;1111
2. Multiply p and q and put the answer, A, on the tape, in place of p and q.
        110011;1011111
3. Compare A with w: if they are equal, accept; otherwise reject.
Theorem: If a nondeterministic Turing machine M semidecides or decides a language, or computes a function, then there is a
standard Turing machine M' semideciding or deciding the same language or computing the same function.
Note that while nondeterminism doesn't change the computational power of a Turing machine, it can exponentially increase its
speed!
Proof: (by construction)
For semideciding: We build M', which runs through all possible computations of M. If one of them halts, M' halts
Recall the way we did this for FSMs: simulate being in a combination of states.
Will this work here?
What about:
Lecture Notes 23
The Construction
At any point in the operation of a nondeterministic machine M, the maximum number of branches is
    r = |K| · (|Σ| + 2)
        (number of states) × (number of possible actions: write one of the |Σ| symbols, move left, or move right)
So imagine a table with one row for each (state, symbol) pair and r numbered columns; entry j of a row gives the j-th possible
(next state, action) choice. Note that if, in some configuration, there are not r different legal things to do, then some of the entries
on that row will repeat.
The Construction, Continued
Md is a deterministic machine that, given a sequence of choices, simulates M making exactly those choices (suppose r = 6):
    Tape 1: the input
    Tape 2: a working copy of the input
    Tape 3: a choice sequence, e.g. 1 3 2 6 5 4 3 6
Steps of M':
    write ε on Tape 3
    until Md accepts do
        (1) copy the input from Tape 1 to Tape 2
        (2) run Md
        (3) if Md accepts, exit
        (4) otherwise, generate the lexicographically next string on Tape 3.
Pass 1: Tape 3 holds ε; pass 2: 1; pass 3: 2; …; pass 7: 6; pass 8: 11; pass 9: 12; … and eventually strings such as 2635.
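This dovetailing idea — try every finite choice string in length order and rerun the deterministic simulator Md on each — can be sketched as follows. This is my own illustration of the construction; the toy step function stands in for Md, and the cutoff max_len is only there so the sketch terminates.

from itertools import product

def nondeterministic_accepts(step, accepting, start_config, r, max_len=12):
    """step(config, choice) -> next configuration, or None if that choice is illegal here."""
    for length in range(max_len + 1):
        for choices in product(range(1, r + 1), repeat=length):   # Tape 3: next choice string
            config = start_config                                  # copy the input to Tape 2
            for c in choices:                                      # run Md on this choice string
                config = step(config, c)
                if config is None:
                    break
            if config is not None and accepting(config):
                return True                                        # Md accepts, so M' halts
    return False   # a real M' would keep searching forever rather than give up

# Toy example: "guess the string 101", with r = 2 choices (append 0 or append 1) per step.
step = lambda cfg, c: cfg + str(c - 1) if len(cfg) < 3 else None
print(nondeterministic_accepts(step, lambda cfg: cfg == "101", "", r=2))   # True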
Nondeterministic Algorithms
[Figure: the tree of guesses — one character, two characters, three characters, …]
Problem Encoding, Turing Machine Encoding, and the Universal Turing Machine
Encoding Strings
Assign each tape symbol a fixed-length code of the form a followed by a binary string (a000, a001, a010, a011, a100, …).
Then a string s is represented by concatenating the codes of its symbols. For example, for the machine below,
"s" = a001a100a100a000a100
An Encoding Example
Consider M = ({s, q, h}, {␣, ▷, a}, δ, s, {h}), where δ is given by a table of (state, symbol) → (state, action) entries; its values
include (q, ␣), (h, ␣), (s, →), (s, a), and (q, →) (the full table appears in the original figure).
Encode the states and the symbols/actions with fixed-width codes:
    state    representation
    s        q00
    q        q01
    h        q11
The symbols ␣, ▷, a and the actions ←, → receive the five codes a000, a001, a010, a011, a100 (in some fixed order).
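A sketch of such an encoding in Python. The particular code assignment below is made up for illustration; only the fixed-width qXX / aXXX shape follows the notes.

def make_codes(states, symbols_and_actions):
    sw = max(1, (len(states) - 1).bit_length())
    aw = max(1, (len(symbols_and_actions) - 1).bit_length())
    q = {s: "q" + format(i, "0%db" % sw) for i, s in enumerate(states)}
    a = {x: "a" + format(i, "0%db" % aw) for i, x in enumerate(symbols_and_actions)}
    return q, a

def encode_machine(delta, q, a):
    # "M" is the list of quadruples (state, symbol, new state, action), each encoded and joined
    quads = sorted(delta.items())
    return ",".join("(%s,%s,%s,%s)" % (q[s], a[x], q[s2], a[act]) for (s, x), (s2, act) in quads)

q, a = make_codes(["s", "q", "h"], ["#", ">", "a", "L", "R"])
delta = {("s", "#"): ("q", "#"), ("s", ">"): ("s", "R"), ("s", "a"): ("q", "R"),
         ("q", "#"): ("h", "#"), ("q", ">"): ("q", "R"), ("q", "a"): ("s", "a")}
print(encode_machine(delta, q, a))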
The Universal Turing Machine
The specification for U:
U("M" "w") = "M(w)"
"w------------------------w"
0
0
0
"
"
"w--------------------w"
1
0
0
0
0
"M ---------------------------- M"
1
0
0
0
0
q
0
0
0
Initialization of U:
1. Copy "M" onto tape 2
2. Insert "" at the left edge of tape 1, then shift w over.
3. Look at "M", figure out what i is, and write the encoding of state s on tape 3.
The Operation of U
[Figure: a snapshot of U's three tapes during a simulation]
Tape 2 (the encoding of M's transition table as quadruples (state, symbol, new state, action)):
    (q00,a000,q11,a000), (q00,a001,q00,a011),
    (q00,a100,q01,a000), (q01,a000,q00,a011),
    (q01,a001,q01,a011), (q01,a100,q00,a100)
Tape 3 holds the code of the current state (e.g. q01, and later q00).
On each step U looks up the quadruple on tape 2 that matches the current state (tape 3) and the scanned symbol code (tape 1),
updates tape 3 to the new state, and performs the action (write or move) on tape 1, until the state is a halting state.
Unrestricted Grammars
An unrestricted, or Type 0, or phrase structure grammar G is a quadruple (V, Σ, R, S), where
  V is an alphabet,
  Σ (the set of terminals) is a subset of V,
  R (the set of rules) is a finite subset of
      (V* (V − Σ) V*)  ×  V*
       context  N  context       result
  S (the start symbol) is an element of V − Σ.
We define derivations just as we did for context-free grammars.
The language generated by G is
  {w ∈ Σ* : S ⇒G* w}
There is no notion of a derivation tree or rightmost/leftmost derivation for unrestricted grammars.
Unrestricted Grammars
Example: L = {aⁿbⁿcⁿ : n > 0}
S → aBSc
S → aBc
Ba → aB
Bc → bc
Bb → bb
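Because derivations can be searched breadth-first, membership in the language of an unrestricted grammar is semidecidable. A Python sketch using the grammar above; the expansion bound and the length-based pruning are my additions (the pruning happens to be safe for this grammar because none of its rules shortens a string, but it is not available in general).

from collections import deque

RULES = [("S", "aBSc"), ("S", "aBc"), ("Ba", "aB"), ("Bc", "bc"), ("Bb", "bb")]

def derives(target, rules=RULES, start="S", max_expansions=200_000):
    seen, frontier = {start}, deque([start])
    for _ in range(max_expansions):                 # a true semidecider would run forever
        if not frontier:
            return False                            # the (finite) search space is exhausted
        s = frontier.popleft()
        if s == target:
            return True
        if len(s) > len(target):                    # safe here: no rule of this grammar shortens a string
            continue
        for lhs, rhs in rules:
            for i in range(len(s) - len(lhs) + 1):
                if s[i:i + len(lhs)] == lhs:
                    t = s[:i] + rhs + s[i + len(lhs):]
                    if t not in seen:
                        seen.add(t); frontier.append(t)
    return None                                     # gave up: membership still unknown

print(derives("aabbcc"))   # True
print(derives("aabcc"))    # False here -- in general a "no" answer may never come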
Another Example
L = {w ∈ {a, b, c}⁺ : the number of a's, b's and c's is the same}
S → ABCS
S → ABC
AB → BA
BC → CB
AC → CA
BA → AB
CA → AC
CB → BC
A → a
B → b
C → c
Grammars and Turing Machines
[Figure: a two-track tape relating a Turing machine computation to a grammar derivation — one track remembers the original string while the other holds the working string]
>s
<
bp aq
If (q, a) = (p, ) :
abp aqb b
ap< aq<
If (q, a) = (p, ), a
pa aq
If (q, ) = (p, )
pb qb
p< q<
Lecture Notes 25
Example: L = a*
S → >h<
plus rule groups (1)–(6), obtained from the machine that semidecides a*, each of which undoes one kind of move of that
machine (the full list appears in the original figure).
Working It Out
[Figure: the numbered rules and a worked derivation showing that the grammar generates aa — read forwards, the machine's
computation runs >s␣aa< ⊢ >␣aqa< ⊢ >␣aaq< ⊢ … ⊢ >␣h<; read backwards, the grammar derives S ⇒ >h< ⇒ … ⇒ >s␣aa< ⇒ aa,
with the rule used at each step noted]
An Alternative Proof
An alternative is to build a grammar G that simulates the forward operation of a Turing machine M. It uses alternating symbols
to represent two interleaved tapes. One tape remembers the starting string, the other working tape simulates the run of the
machine.
The first (generate) part of G creates all strings over Σ* of the form
    w = ▷ Qs a1 a1 a2 a2 a3 a3 …
(each input symbol appears twice, once on the "remember" track and once on the "working" track, with Qs marking the start state).
The second (test) part of G simulates the execution of M on a particular string w. An example of a partially derived string:
    ▷ a 1 b 2 c c b 4 Q3 a 3 …
Examples of rules:
    b b Q4 → b 4 Q4      (rewrite b as 4)
    b 4 Q3 → Q3 b 4      (move left)
The third (cleanup) part of G erases the junk if M ever reaches h.
Example rule:
    # h a 1 → a # h
More on Functions: Why Have We Been Using Recursive as a Synonym for Computable?
Primitive Recursive Functions
Define a set of basic functions:
    zero_k(n1, n2, …, nk) = 0
    identity_k,j(n1, n2, …, nk) = nj
    successor(n) = n + 1
Combining functions:
    Composition of g with h1, h2, …, hk is
        g(h1( ), h2( ), …, hk( ))
    Primitive recursion of f in terms of g and h:
        f(n1, n2, …, nk, 0)     = g(n1, n2, …, nk)
        f(n1, n2, …, nk, m + 1) = h(n1, n2, …, nk, m, f(n1, n2, …, nk, m))
Example:
    plus(n, 0) = n
    plus(n, m + 1) = succ(plus(n, m))
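The schema transcribes directly into code. A Python sketch (the function names are mine) that builds plus, and then times, purely from the basic functions and the two combining operations:

def zero(*ns):            return 0
def identity(j):          return lambda *ns: ns[j - 1]
def successor(n):         return n + 1

def compose(g, *hs):      return lambda *ns: g(*[h(*ns) for h in hs])

def primitive_recursion(g, h):
    def f(*args):
        *ns, m = args
        acc = g(*ns)                          # f(n1..nk, 0)   = g(n1..nk)
        for i in range(m):                    # f(n1..nk, i+1) = h(n1..nk, i, f(n1..nk, i))
            acc = h(*ns, i, acc)
        return acc
    return f

# plus(n, 0) = n ;  plus(n, m+1) = succ(plus(n, m))
plus = primitive_recursion(identity(1), compose(successor, identity(3)))
# times(n, 0) = 0 ; times(n, m+1) = plus(n, times(n, m))
times = primitive_recursion(zero, compose(plus, identity(1), identity(3)))
print(plus(3, 4), times(3, 4))   # 7 12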
Primitive Recursive Functions and Computability
Not every computable function is primitive recursive. Enumerate the primitive recursive functions of one variable as
f0, f1, f2, f3, f4, … and consider the table of values fi(j); the diagonal function g(n) = fn(n) + 1 is computable but differs from
every fn, so it is not primitive recursive.
Ackermann's function:
A(0, y) = y + 1
A(x + 1, 0) = A(x, 1)
A(x + 1, y + 1) = A(x, A(x + 1, y))
Values of A(x, y):
    x\y      0       1        2             3        4
    0        1       2        3             4        5
    1        2       3        4             5        6
    2        3       5        7             9        11
    3        5       13       29            61       125
    4        13      65533    2^65536 − 3   …        …
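A memoized Python sketch reproduces the small corner of this table; the growth is so fast that nothing much beyond A(3, y) and A(4, 0) can be computed directly.

from functools import lru_cache

@lru_cache(maxsize=None)
def A(x, y):
    if x == 0:
        return y + 1
    if y == 0:
        return A(x - 1, 1)
    return A(x - 1, A(x, y - 1))

print([[A(x, y) for y in range(5)] for x in range(4)])
# [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 5, 7, 9, 11], [5, 13, 29, 61, 125]]
print(A(4, 0))   # 13 -- but A(4, 1) = 65533 and A(4, 2) = 2**65536 - 3 are already out of practical reach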
Recursive Functions
A function is μ-recursive if it can be obtained from the basic functions using the operations of:
  Composition,
  Recursive definition, and
  Minimalization of minimalizable functions:
    The minimalization of g (of k + 1 arguments) is a function f of k arguments defined as:
        f(n1, n2, …, nk) = the least m such that g(n1, n2, …, nk, m) = 1, if such an m exists,
                           0 otherwise.
    A function g is minimalizable iff for every n1, n2, …, nk, there is an m such that g(n1, n2, …, nk, m) = 1.
Theorem: A function is μ-recursive iff it is recursive (i.e., computable by a Turing machine).
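Minimalization is an unbounded search, which is exactly where non-termination — and hence partiality — enters. A Python sketch, using halving (defined only on even numbers, echoing the f(n) = n/2 example) as the function being defined:

from itertools import count

def minimalize(g):
    def f(*ns):
        for m in count():                 # unbounded search: may never return
            if g(*ns, m) == 1:
                return m
    return f

half = minimalize(lambda n, m: 1 if 2 * m == n else 0)
print(half(10))    # 5
# half(7) would search forever: this g is not minimalizable, so the resulting function is only partial,
# defined exactly on the even numbers.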
Partial Recursive Functions
Consider the following function f:
    f(n) = 1 if TM(n) halts on a blank tape
           0 otherwise
The domain of f is the natural numbers. Is f recursive?
[Figure: the domain mapped to the range]
Theorem: There are uncountably many partial functions from N to N, but only countably many Turing machines — and hence
only countably many partial recursive functions.
Functions and Machines
[Figure: Primitive Recursive Functions ⊂ Recursive Functions ⊂ Partial Recursive Functions, all computed by Turing machines]
Languages and Machines
[Figure: Regular ⊂ Deterministic Context-Free ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable, recognized respectively by FSMs, DPDAs, NDPDAs, and Turing machines]
Example: aabbcc
[Figure: the Turing machine for {aⁿbⁿcⁿ} shown earlier, run on the example input aabbcc; note that it never uses tape squares beyond those holding its input, i.e., it is a linear bounded automaton]
Context-Sensitive Languages and Linear Bounded Automata
Theorem: The set of context-sensitive languages is exactly the set of languages that can be accepted by linear bounded automata.
Proof: (sketch) We can construct a linear-bounded automaton B for any context-sensitive language L defined by some grammar
G. We build a machine B with a two track tape. On input w, B keeps w on the first tape. On the second tape, it
nondeterministically constructs all derivations of G. The key is that as soon as any derivation becomes longer than |w| we stop,
since we know it can never get any shorter and thus match w. There is also a proof that from any LBA we can construct a
context-sensitive grammar, analogous to the one we used for Turing machines and unrestricted grammars.
Theorem: There exist recursive languages that are not context sensitive.
[Figure: the hierarchy with context-sensitive languages added — Regular ⊂ Deterministic CF ⊂ Context-Free ⊂ Context-Sensitive ⊂ Recursive ⊂ Recursively Enumerable, with FSMs, DPDAs, NDPDAs, linear bounded automata, and Turing machines as the corresponding machine classes]
The Chomsky Hierarchy
[Figure: Type 3 (Regular, FSMs) ⊂ Type 2 (Context-Free, PDAs) ⊂ Type 1 (Context-Sensitive, linear bounded automata) ⊂ Type 0 (Recursively Enumerable, Turing machines)]
Undecidability
Read K & S 5.1, 5.3, & 5.4.
Read Supplementary Materials: Recursively Enumerable Languages, Turing Machines, and Decidability.
Do Homeworks 21 & 22.
Church's Thesis
(Church-Turing Thesis)
An algorithm is a formal procedure that halts.
The Thesis: Anything that can be computed by any algorithm can be computed by a Turing machine.
Another way to state it: All "reasonable" formal models of computation are equivalent to the Turing machine.
This isn't a formal statement, so we can't prove it. But many different computational models have been proposed and they all turn
out to be equivalent.
Examples:
unrestricted grammars
lambda calculus
cellular automata
DNA computing
quantum computing (?)
Lecture Notes 26
Undecidability
Another View
The Problem View: The halting problem is undecidable.
The Language View: Let H =
{"M" "w" : TM M halts on input string w}
H is recursively enumerable but not recursive.
Why?
H is recursively enumerable because it can be semidecided by U, the Universal Turing Machine.
But H cannot be recursive. If it were, then it would be decided by some TM MH. But MH("M" "w") would have to be:
If M is not a syntactically valid TM, then False.
else HALTS("M" "w")
But we know cannot that HALTS cannot exist.
If H were Recursive
H = {"M" "w" : TM M halts on input string w}
Theorem: If H were also recursive, then every recursively enumerable language would be recursive.
Proof: Let L be any RE language. Since L is RE, there exists a TM M that semidecides it.
Suppose H is recursive and thus is decided by some TM O (oracle).
We can build a TM M' from M that decides L:
1. M' transforms its input tape from w to "M""w".
2. M' invokes O on its tape and returns whatever answer O returns.
So, if H were recursive, all RE languages would be. But it isn't.
Undecidable Problems, Languages that Are Not Recursive, and Partial Functions
The Problem View: The halting problem is undecidable.
The Language View: Let H =
{"M" "w" : TM M halts on input string w}
H is recursively enumerable but not recursive.
The Functional View: Let f("M" "w") = M(w).
f is a partial function on Σ*, defined only on those "M" "w" pairs for which M halts on w.
Lecture Notes 26
Undecidability
Consider two lists of strings over some alphabet Σ. The lists must be finite and of equal length.
A = x1, x2, x3, , xn
B = y1, y2, y3, , yn
Question: Does there exist some finite sequence of integers that can be viewed as indexes of A and B such that, when elements of
A are selected as specified and concatenated together, we get the same string we get when elements of B are selected also as
specified?
For example, if we assert that 1, 3, 4 is such a sequence, we're asserting that x1x3x4 = y1y3y4.
Any problem of this form is an instance of the Post Correspondence Problem.
Is the Post Correspondence Problem decidable?
Post Correspondence Problem Examples
Example 1:
    i    A        B
    1    1        111
    2    10111    10
    3    10       0
Example 2:
    i    A      B
    1    10     101
    2    011    11
    3    101    011
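Any particular instance can be attacked by brute-force search over index sequences; since PCP is undecidable, such a search can only semidecide the question. A Python sketch run on the two examples above (the length cutoff is my addition, so "None" means only "no solution found within the bound"):

from itertools import product

def pcp_search(A, B, max_len=8):
    n = len(A)
    for length in range(1, max_len + 1):
        for seq in product(range(n), repeat=length):
            if "".join(A[i] for i in seq) == "".join(B[i] for i in seq):
                return [i + 1 for i in seq]          # report 1-based indices
    return None

A1, B1 = ["1", "10111", "10"], ["111", "10", "0"]     # example 1
A2, B2 = ["10", "011", "101"], ["101", "11", "011"]   # example 2
print(pcp_search(A1, B1))   # [2, 1, 1, 3] -- both sides spell 101111110
print(pcp_search(A2, B2))   # None within the bound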
Lecture Notes 26
Undecidability
Reduction
Let L1, L2 ⊆ Σ* be languages. A reduction from L1 to L2 is a recursive function τ: Σ* → Σ* such that
    x ∈ L1 iff τ(x) ∈ L2.
Example:
    L1 = {⟨a, b⟩ : a, b ∈ N and b = a + 1}
    τ:  ⟨a, b⟩ becomes ⟨Succ(a), b⟩
    L2 = {⟨a, b⟩ : a, b ∈ N and a = b}
If there is a Turing machine M2 to decide L2, then I can build a Turing machine M1 to decide L1:
1. Take the input and apply Succ to the first number.
2. Invoke M2 on the result.
3. Return whatever answer M2 returns.
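The pattern is mechanical: to decide L1, apply τ and hand the result to the assumed decider for L2. A Python sketch of exactly this example (decide_L2 plays the role of M2):

def tau(pair):                      # the reduction: apply Succ to the first component
    a, b = pair
    return (a + 1, b)

def decide_L2(pair):                # assumed decider for L2 = {(a, b) : a = b}
    a, b = pair
    return a == b

def decide_L1(pair):                # L1 = {(a, b) : b = a + 1}
    return decide_L2(tau(pair))     # 1. apply tau  2. invoke M2  3. return its answer

print(decide_L1((3, 4)), decide_L1((3, 5)))   # True False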
Reductions and Recursive Languages
Theorem: If there is a reduction from L1 to L2 and L2 is recursive, then L1 is recursive.
[Figure: M1 decides "x ∈ L1?" by computing y = τ(x) and running M2 to decide "y ∈ L2?"; M1 answers yes or no exactly as M2 does]
Theorem: If there is a reduction from L1 to L2 and L1 is not recursive, then L2 is not recursive.
[Figure: the same construction with semideciding machines — M1 halts on x iff M2 halts on τ(x)]
Theorem: If there is a reduction from L1 to L2 and L1 is not RE, then L2 is not RE.
Lecture Notes 26
Undecidability
Does M Halt on the Empty Tape?
L1 = H = {"M" "w" : Turing machine M halts on input string w}     (known not to be recursive)
L2 = {"M" : Turing machine M halts on the empty tape}             (decided by some M2?)
Let τ be the function that, from "M" and "w", constructs "M*", which operates as follows on an empty input tape:
1. Write w on the tape.
2. Operate as M would have.
If M2 exists, then M1 = M2(τ(s)) decides L1.
A Formal Reduction Proof
Prove that L2 = {⟨M⟩ : Turing machine M halts on the empty tape} is not recursive.
Proof that L2 is not recursive via a reduction from H = {⟨M, w⟩ : Turing machine M halts on input string w}, a non-recursive
language. Suppose that there exists a TM M2 that decides L2. Construct a machine to decide H as M1(⟨M, w⟩) = M2(τ(⟨M, w⟩)).
The function τ creates from M and w a new machine M*. M* ignores its input and runs M on w, halting exactly when M halts
on w.
⟨M, w⟩ ∈ H ⇒ M halts on w ⇒ M* always halts ⇒ ε ∈ L(M*) ⇒ M* ∈ L2 ⇒ M2 accepts ⇒ M1 accepts.
⟨M, w⟩ ∉ H ⇒ M does not halt on w ⇒ L(M*) = ∅ ⇒ M* ∉ L2 ⇒ M2 rejects ⇒ M1 rejects.
Thus, if there is a machine M2 that decides L2, we could use it to build a machine that decides H. Contradiction. L2 is not
recursive.
Important Elements in a Reduction Proof
  A clear declaration of the reduction "from" and "to" languages and what you're trying to prove with the reduction.
  A description of how a machine is being constructed for the "from" language based on an assumed machine for the "to"
  language and a recursive function τ.
  A description of τ's inputs and outputs. If τ is doing anything nontrivial, it is a good idea to argue that it is recursive.
  Note that machine diagrams are not necessary or even sufficient in these proofs. Use them as thought devices, where
  needed.
  Run through the logic that demonstrates how the "from" language is being decided by your reduction. You must do both
  the accepting and the rejecting cases.
  Declare that the reduction proves that your "to" language is not recursive.
The Most Common Mistake: Doing the Reduction Backwards
L1
L2
Example: If there exists a machine M2 that solves L2, the problem of deciding whether a Turing machine halts on a blank tape,
then we could do H (deciding whether M halts on w) as follows:
1. Create M* from M such that M*, given a blank tape, first writes w on its tape, then simulates the behavior of M.
2. Return M2("M*").
Doing it wrong by reducing L2 (the unknown one to L1): If there exists a machine M1 that solves H, then we could build a
machine that solves L2 as follows:
1. Return (M1("M", "")).
Lecture Notes 26
Undecidability
Reduce L1 to L2:
L1 = L2 + (now - 4/9/1865)
L2
Reduce L2 to L1:
L2 = L1 - (now - 4/9/1865)
L2
L1
Why Backwards Doesn't Work, Continued
Considering L2:
Reduce L1 to L2:
L1 = L2 + (now - 4/9/1865)
L2
Reduce L2 to L1:
L2 = L1 - (now - 4/9/1865)
L2
L1
L1
Considering L3:
Reduce L1 to L3:
L1 = oops
L3
Reduce L3 to L1:
L3 = L1 - 365 - (now - 4/9/1866)
L3
L1
Is There Any String on Which M Halts?
L1 = H = {"M" "w" : Turing machine M halts on input string w}
L2 = {"M" : there exists at least one string on which Turing machine M halts}     (decided by some M2?)
Let τ be the function that, from "M" and "w", constructs "M*", which operates as follows:
1. M* examines its input tape.
2. If it is equal to w, then it simulates M.
3. If not, it loops.
Clearly the only input on which M* has a chance of halting is w, which it does iff M would halt on w.
If M2 exists, then M1 = M2(τ(s)) decides L1.
Lecture Notes 26
Undecidability
Does M Halt on All Inputs?
L1 = {"M" : Turing machine M halts on the empty tape}     (shown above not to be recursive)
L2 = {"M" : Turing machine M halts on all inputs}          (decided by some M2?)
Let τ be the function that, from "M", constructs "M*", which operates as follows:
1. Erase the input tape.
2. Simulate M.
Clearly M* either halts on all inputs or on none, since it ignores its input.
If M2 exists, then M1 = M2(τ(s)) decides L1.
Rice's Theorem
Theorem: No nontrivial property of the recursively enumerable languages is decidable.
Alternate statement: Let P: 2^Σ* → {true, false} be a nontrivial property of the recursively enumerable languages. The language
{⟨M⟩ : P(L(M)) = True} is not recursive.
By "nontrivial" we mean a property that is not simply true for all languages or false for all languages.
Examples:
L contains only even length strings.
L contains an odd number of strings.
L contains all strings that start with "a".
L is infinite.
L is regular.
Note:
Rice's theorem applies to languages, not machines. So, for example, the following properties of machines are decidable:
M contains an even number of states
M has an odd number of symbols in its tape alphabet
Of course, we need a way to define a language. We'll use machines to do that, but the properties we'll deal with are properties of
L(M), not of M itself.
Proof of Rice's Theorem
Proof: Let P be any nontrivial property of the RE languages.
L1 = H = {s = "M" "w" : Turing machine M halts on input string w}
L2 = {"M" : P(L(M)) = true}     (decided by some M2?)
Either P(∅) = true or P(∅) = false. Assume it is false (a matching proof exists if it is true). Since P is nontrivial, there is some
language LP such that P(LP) is true. Let MP be some Turing machine that semidecides LP.
Let τ construct "M*", which operates as follows:
1. Copy its input y to another track for later.
2. Write w on its input tape and execute M on w.
3. If M halts, put y back on the tape and execute MP.
4. If MP halts on y, accept.
Claim: If M2 exists, then M1 = M2(τ(s)) decides L1.
Why? Two cases to consider:
"M" "w" ∈ H ⇒ M halts on w ⇒ M* will halt on all strings that are accepted by MP ⇒ L(M*) = L(MP) = LP ⇒ P(L(M*)) =
P(LP) = true ⇒ M2 decides P, so M2 accepts "M*" ⇒ M1 accepts.
"M" "w" ∉ H ⇒ M doesn't halt on w ⇒ M* halts on nothing ⇒ L(M*) = ∅ ⇒ P(L(M*)) = P(∅) = false ⇒ M2 decides P, so
M2 rejects "M*" ⇒ M1 rejects.
Using Rices Theorem
Example 1:
L
= {s = "M" : M writes a 1 within three moves}.
Example 2:
L
= {s = "M1" "M2": L(M1) = L(M2)}.
Lecture Notes 26
Undecidability
Is L(M) Regular?
L1 = H = {s = "M" "w" : Turing machine M halts on input string w}
L2 = {s = "M" : L(M) is regular}     (decided by some M2?)
Let τ be the function that, from "M" and "w", constructs "M*", whose own input is a string
    t = "M*" "w*"
M*("M*" "w*") operates as follows:
1. Copy its input to another track for later.
2. Write w on its input tape and execute M on w.
3. If M halts, invoke the universal machine U on "M*" "w*".
4. If U halts, halt and accept.
If M2 exists, then M1 = M2(τ(s)) decides L1 (H).
Why?
If M does not halt on w, then M* accepts ∅ (which is regular).
If M does halt on w, then M* accepts H (which is not regular).
Undecidable Problems About Unrestricted Grammars
L1 = H = {"M" "w" : Turing machine M halts on input string w}
L2 = {"G" "w" : w ∈ L(G)}     (decided by some M2?)
Let τ be the construction that builds a grammar G for the language L that is semidecided by M. Thus
    w ∈ L(G) iff M(w) halts.
Then M1("M" "w") = M2("G" "w") would decide L1. Since no machine decides L1, no such M2 exists: membership for
unrestricted grammars is undecidable.
Non-RE Languages
There are an uncountable number of non-RE languages, but only a countably infinite number of TMs (hence RE languages).
The class of non-RE languages is much bigger than that of RE languages!
Intuition: Non-RE languages usually involve either infinite search or knowing that a TM will loop forever in order to accept a
string.
    {⟨M⟩ : M is a TM that does not halt on the empty tape}
    {⟨M⟩ : M is a TM and L(M) = Σ*}
    {⟨M⟩ : M is a TM and there does not exist a string on which M halts}
Diagonalization
L = {⟨M⟩ : M is a TM and M(⟨M⟩) does not halt} is not RE.
Suppose L is RE. Then there is a TM M* that semidecides L. Is ⟨M*⟩ in L?
    If it is, then M*(⟨M*⟩) halts (by the definition of M* as a semideciding machine for L).
    But, by the definition of L, if ⟨M*⟩ ∈ L, then M*(⟨M*⟩) does not halt.
Contradiction. So L is not RE.
(This is a very bare-bones diagonalization proof.)
Diagonalization can only be easily applied to a few non-RE languages.
Lecture Notes 26
Undecidability
10
[Figure: the reduction construction for RE-ness — M1 semidecides "x ∈ L1?" by computing y = τ(x) and running M2 on y; M1 halts iff M2 halts]
Theorem: If there is a reduction from L1 to L2 and L1 is not RE, then L2 is not RE.
Reduction from a known non-RE Language
Using a reduction from a non-RE language:
L1 = ¬H = {⟨M, w⟩ : Turing machine M does not halt on input string w}
L2 = {⟨M⟩ : there does not exist a string on which Turing machine M halts}     (semidecided by some M2?)
Let τ be the function that, from M and w, constructs M*, which operates as follows:
1. Erase the input tape (M* ignores its input).
2. Write w on the tape.
3. Run M on w.
[Figure: M1 passes ⟨M, w⟩ through τ to obtain M*, hands ⟨M*⟩ to M2, and halts iff M2 halts]
⟨M, w⟩ ∈ L1 ⇒ M does not halt on w ⇒ M* does not halt on any input ⇒ M* halts on nothing ⇒ M2 accepts (halts).
⟨M, w⟩ ∉ L1 ⇒ M halts on w ⇒ M* halts on everything ⇒ M2 loops.
If M2 exists, then M1(⟨M, w⟩) = M2(τ(⟨M, w⟩)) semidecides L1. But L1 = ¬H is not RE. Contradiction. So L2 is not RE.
Language Summary
Class                      IN (show a language is in the class by)                    OUT (show it is not by)
Recursively Enumerable     semidecidable; Turing enumerable; unrestricted grammar     diagonalization; reduction
Recursive                  decision procedure; lexicographically enumerable;
                           complement is recursively enumerable
Context Free               CF grammar; PDA; closure                                   pumping; closure
Regular                    regular expression; FSM; closure                           pumping; closure
Introduction to Complexity Theory
Most computational problems you will face in your life are solvable (decidable). We have yet to address whether a problem is
easy or hard. Complexity theory tries to answer this question.
Recall that a computational problem can be recast as a language recognition problem.
Some easy problems:
Pattern matching
Parsing
Database operations (select, join, etc.)
Sorting
Some hard problems:
Traveling salesman problem
Boolean satisfiability
Knapsack problem
Optimal flight scheduling
Hard problems usually involve the examination of a large search space.
Big-O Notation
A function f(n) is O(g(n)) whenever there exists a constant c such that |f(n)| ≤ c·|g(n)| for all n ≥ 0.
(We are usually most interested in the smallest and simplest function, g.)
Examples:
    2n³ + 3n²·log(n) + 75n² + 7n + 2000 is O(n³)
    75·2ⁿ + 200n⁵ + 10000 is O(2ⁿ)
A function f(n) is polynomial if f(n) is O(p(n)) for some polynomial function p.
If a function f(n) is not polynomial, it is considered to be exponential, whether or not it is O of some exponential function
(e.g. n^(log n)).
In the above two examples, the first is polynomial and the second is exponential.
Comparison of Time Complexities
Speed of various time complexities for different values of n, taken to be a measure of problem size. (Assumes 1 step per
microsecond.)
f(n)\n      10            20           30           40            50             60
n           .00001 sec.   .00002 sec.  .00003 sec.  .00004 sec.   .00005 sec.    .00006 sec.
n²          .0001 sec.    .0004 sec.   .0009 sec.   .0016 sec.    .0025 sec.     .0036 sec.
n³          .001 sec.     .008 sec.    .027 sec.    .064 sec.     .125 sec.      .216 sec.
n⁵          .1 sec.       3.2 sec.     24.3 sec.    1.7 min.      5.2 min.       13.0 min.
2ⁿ          .001 sec.     1.0 sec.     17.9 min.    12.7 days     35.7 yr.       366 cent.
3ⁿ          .059 sec.     58 min.      6.5 yr.      3855 cent.    2x10⁸ cent.    1.3x10¹³ cent.
Faster computers don't really help. Even taking into account Moore's Law, algorithms with exponential time complexity are
considered intractable. Polynomial time complexities are strongly desired.
Lecture Notes 27
Complexity Theory
Polynomial Land
If f1(n) and f2(n) are polynomials, then so are:
    f1(n) + f2(n)
    f1(n) · f2(n)
    f1(f2(n))
This means that we can sequence and compose polynomial-time algorithms, with the resulting algorithms remaining
polynomial-time.
Computational Model
For formally describing the time (and space) complexities of algorithms, we will use our old friend, the deciding TM (decision
procedure).
There are two parts:
The problem to be solved must be translated into an equivalent language recognition problem.
A TM to solve the language recognition problem takes an encoded instance of the problem (of size n symbols) as input
and decides the instance in at most TM(n) steps.
We will classify the time complexity of an algorithm (TM) to solve it by its big-O bound on TM(n).
We are most interested in polynomial time complexity algorithms for various types of problems.
Encoding a Problem
Traveling Salesman Problem: Given a set of cities and the distances between them, what is the minimum-distance tour a
salesman can make that covers all cities and returns him to his starting city?
Stated as a decision question over graphs: Given a graph G = (V, E), a positive distance function for each edge d: E → N⁺, and a
bound B, is there a circuit that covers all V where Σ d(e) ≤ B? (Here a minimization problem was turned into a bound problem.)
A possible encoding of the problem:
    Give |V| as an integer.
    Give B as an integer.
    Enumerate all edges (v1, v2, d) as a list of triples of integers (this gives both E and d).
    All integers are expressed as binary numbers.
    Separate these entries with commas.
Note that the sizes of most reasonable problem encodings are polynomially related.
What about Turing Machine Extensions?
Most TM extensions can be simulated by a standard TM in a time polynomially related to the time of the extended machine.
Lecture Notes 27
Complexity Theory
The Class P
P = { L : there is a polynomial-time deterministic TM, M that decides L }
Roughly speaking, P is the class of problems that can be solved by deterministic algorithms in a time that is polynomially related
to the size of the respective problem instance.
The way the problem is encoded or the computational abilities of the machine carrying out the algorithm are not very important.
Example: Given an integer n, is there a positive integer m, such that n = 4m?
Problems in P are considered tractable or easy.
The Class NP
NP = { L: there is a polynomial time nondeterministic TM, M that decides L }
Roughly speaking, NP is the class of problems that can be solved by nondeterministic algorithms in a time that is polynomially
related to the size of the respective problem instance.
Many problems in NP are considered intractable or hard.
Examples:
Traveling salesman problem: Given a graph G = (V, E), a positive distance function for each edge d: EN+, and a
bound B, is there a circuit that covers all V where d(e) B?
Subgraph isomorphism problem: Given two graphs G1 and G2, does G1 contain a subgraph isomorphic to G2?
The Relationship of P and NP
Recursive
NP
P
Lecture Notes 27
Complexity Theory
Why NP is so Interesting
To date, nearly all decidable problems with polynomial bounds on the size of the solution are in this class.
Nondeterminism doesn't influence decidability, so maybe it shouldn't have a big impact on complexity.
Showing that P = NP would dramatically change the computational power of our algorithms.
Stephen Cook's Contribution (1971)
Showed that the Boolean Satisfiability (SAT) problem has the property that every other NP problem can be
polynomially reduced to it. Thus, SAT can be considered the hardest problem in NP.
Suggested that other NP problems may also be among the hardest problems in NP.
Lecture Notes 27
Complexity Theory
[Figure: a polynomial-time reduction — compute τ(w), then run the assumed machine M2 on the result]
NP-Complete Problems
[Figure: the tree of early NP-completeness reductions — SAT to 3SAT; 3SAT to 3DM and VC; 3DM to PARTITION; VC to HC and CLIQUE]
The early NP-complete reductions took this structure. Each node represents a problem; an arrow represents a reduction from
one problem to another.
Today, thousands of diverse problems have been shown to be NP-complete.
Let's now look at these problems.
Lecture Notes 27
Complexity Theory
3SAT (3-satisfiability)
Boolean satisfiability where each clause has exactly 3 terms.
3DM (3-Dimensional Matching)
Consider a set M ⊆ X × Y × Z of triples over disjoint sets X, Y, and Z, such that |X| = |Y| = |Z| = q. Does there exist a matching,
a subset M' ⊆ M such that |M'| = q and M' partitions X, Y, and Z?
This is a generalization of the marriage problem, which has two sets (men and women) and a relation describing acceptable
marriages. Is there a pairing that marries everyone acceptably?
The marriage problem is in P, but this 3-sex version of the problem is NP-complete.
PARTITION
Given a set A and a positive integer size s(a) ∈ N⁺ for each element a ∈ A, is there a subset A' ⊆ A such that
    Σ_{a ∈ A'} s(a)  =  Σ_{a ∈ A − A'} s(a) ?
VC (Vertex Cover)
Given a graph G = (V, E) and an integer K, such that 0 < K ≤ |V|, is there a vertex cover of size K or less for G, that is, a subset
V' ⊆ V such that |V'| ≤ K and, for each edge (u, v) ∈ E, at least one of u and v belongs to V'?
CLIQUE
Given a graph G = (V, E) and an integer J, such that 0 < J ≤ |V|, does G contain a clique of size J or more, that is, a subset
V' ⊆ V such that |V'| ≥ J and every two vertices in V' are joined by an edge in E?
HC (Hamiltonian Circuit)
Given a graph G = (V, E), does there exist a Hamiltonian circuit, that is, an ordering ⟨v1, v2, …, vn⟩ of all of V such that
(v|V|, v1) ∈ E and (vi, vi+1) ∈ E for all i, 1 ≤ i < |V|?
The Traveling Salesman Problem is NP-complete
Given a graph G = (V, E), a positive distance function for each edge d: E → N⁺, and a bound B, is there a circuit that covers all V
where Σ d(e) ≤ B?
To prove that a language TSP is NP-complete, you must do the following:
1. Show that TSP ∈ NP.
2. Select a known NP-complete language L1.
3. Construct a reduction τ from L1 to TSP.
4. Show that τ is a polynomial-time function.
TSP ∈ NP: Guess a set of roads. Verify that the roads form a tour that hits all cities. Answer yes if the guess is a tour and the
sum of the distances is ≤ B.
Reduction from HC: Answer the Hamiltonian circuit question on G = (V, E) by constructing a complete graph where roads
have distance 1 if the edge is in E and 2 otherwise. Pose the TSP question: is there a tour of total length ≤ |V|?
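The reduction from HC is short enough to sketch directly. A Python illustration of the construction just described; the brute-force TSP check is only there to exercise the reduction on a tiny instance and is, of course, exponential.

from itertools import permutations

def hc_to_tsp(V, E):
    E = {frozenset(e) for e in E}
    d = {frozenset((u, v)): (1 if frozenset((u, v)) in E else 2)
         for u in V for v in V if u != v}
    return d, len(V)                           # (complete distance function, bound B = |V|)

def tsp_brute_force(V, d, B):                  # tiny-instance check of "is there a tour of length <= B?"
    first, *rest = V
    return any(sum(d[frozenset(e)] for e in zip((first,) + p, p + (first,))) <= B
               for p in permutations(rest))

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]          # a 4-cycle: Hamiltonian
d, B = hc_to_tsp(V, E)
print(tsp_brute_force(V, d, B))                               # True
d2, B2 = hc_to_tsp(V, [("a", "b"), ("b", "c"), ("c", "d")])   # a path: no Hamiltonian circuit
print(tsp_brute_force(V, d2, B2))                             # False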
Lecture Notes 27
Complexity Theory
[Figure: P ⊆ NP, with the NP-complete problems forming the hardest part of NP (NP-complete = NP ∩ NP-hard)]
Lecture Notes 27
Complexity Theory
II. Homework
CS 341 Homework 1
Basic Techniques
1. What are these sets? Write them using braces, commas, numerals, … (for infinite sets), and ∅ only.
(a) ({1, 3, 5} ∪ {3, 1}) ∩ {3, 5, 7}
(b) {{3}, {3, 5}, {{5, 7}, {7, 9}}}
(c) ({1, 2, 5} - {5, 7, 9}) ∪ ({5, 7, 9} - {1, 2, 5})
(d) 2^{7, 8, 9} - 2^{7, 9}
(e) 2^∅
(f) {x : ∃y ∈ N where x = y²}
(g) {x : x is an integer and x² = 2}
Basic Techniques
(b) LessThanOrEqual defined on ordered pairs of natural numbers. (a, b) (x, y) iff a x or (a = x and
b y). For example, (1,2) (2,1) and (1,2) (1,3).
(c) The relation defined by the following boolean matrix:
1
1
1
1
1
1
1
1
10. Are the following sets closed under the following operations? If not, what are the respective closures?
(a) The odd integers under multiplication.
(b) The positive integers under division.
(c) The negative integers under subtraction.
(d) The negative integers under multiplication.
(e) The odd length strings under concatenation.
11. What is the reflexive transitive closure R* of the relation
R = {(a, b), (a, c), (a, d), (d, c), (d, e)} Draw a directed graph representing R*.
12. For each of the following relations R, over some domain D, compute the reflexive, symmetric, transitive
closure R. Try to think of a simple descriptive name for the new relation R. Since R must be an equivalence
relation, describe the partition that R induces on D.
(a) Let D be the set of 50 states in the US. xy, xRy iff x shares a boundary with y.
(b) Let D be the natural numbers. xy, xRy iff y = x+3.
(c) Let D be the set of strings containing no symbol except a. xy, xRy iff y = xa. (i.e., if y equals x
concatenated with a).
13. Consider an infinite rectangular grid (like an infinite sheet of graph paper). Let S be the set of intersection
points on the grid. Let each point in S be represented as a pair of (x,y) coordinates where adjacent points differ
in one coordinate by exactly 1 and coordinates increase (as is standard) as you move up and to the right.
(a) Let R be the following relation on S: (x1,y1)(x2,y2), (x1,y1)R(x2,y2) iff x2= x1+1 and y2=y1+1. Let R be
the reflexive, symmetric, transitive closure of R. Describe in English the partition P that R induces on S. What
is the cardinality of P?
(b) Let R be the following relation on S: (x1,y1)(x2,y2), (x1,y1)R(x2,y2) iff (x2= x1+1 and y2=y1+1) or (x2= x11 and y2=y1+1). Let R be the reflexive, symmetric, transitive closure of R. Describe in English the partition P
that R induces on S. What is the cardinality of P?
(c) Let R be the following relation on S: (x1,y1)(x2,y2), (x1,y1)R(x2,y2) iff (x2,y2) is reachable from (x1,y1) by
moving two squares in any one of the four directions and then one square in a perpendicular direction. Let R
be the reflexive, symmetric, transitive closure of R. Describe in English the partition P that R induces on S.
What is the cardinality of P?
14. Is the transitive closure of the symmetric closure of a binary relation necessarily reflexive? Prove it or give
a counterexample.
15. Give an example of a binary relation that is not reflexive but has a transitive closure that is reflexive.
16. For each of the following functions, state whether or not it is (i) one-to-one, (ii) onto, and (iii) idempotent.
Justify your answers.
(a) +: P P P, where P is the set of positive integers, and
+(a, b) = a + b (In other words, simply addition defined on the positive integers)
(b) X : B B B, where B is the set {True, False}
Homework 1
Basic Techniques
1. (a) {3, 5}
   (b) {3, 5, 7}
   (c) {1, 2, 7, 9}
   (d) {{8}, {7, 8}, {8, 9}, {7, 8, 9}}
   (e) {∅}
   (f) {0, 1, 4, 9, 16, 25, 36, …} (the perfect squares)
   (g) ∅ (since the square root of 2 is not an integer)
2. (a)
A (B C)
= (B C) A
= (B A) (C A)
= (A B) (A C)
commutativity
distributivity
commutativity
(b)
A (B C)
= (B C) A
= (B A) (C A)
= (A B) (A C)
commutativity
distributivity
commutativity
(c)
A (A B)
= (A B) A
=A
commutativity
absorption
3. (a)
(b)
(c)
{(,1), (,2), ({1}, 1), ({1}, 2), ({2}, 1), ({2}, 2), ({1,2}, 1), ({1,2}, 2)}
4.
R R = {(a, a), (a, d), (a, b), (b, b), (b, c), (b, a), (a, c)}
R inverse = {(b, a), (c, a), (d, c), (a, a), (a, b)}
None of R, R R or R inverse is a function.
5. (a) S = {0, 1, 5, 6, 7, }. S has the same number of elements as N. Why? Because there is a bijection
between S and N: f: S N, where f(0) = 0, f(1) = 1, x 5, f(x) = x - 3. So |S| = 0.
(b) 2.
(c) S = all subsets of {a, b, c}. So S = {, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}. So |S| = 8. We
could also simply have used the fact that the cardinality of the power set of a finite set of cardinality c
is 2c.
(d) S = {(a, 1), (a, 2), (a, 3), (a, 4), (b, 1), (b, 2), (b, 3), (b, 4), (c, 1), (c, 2), (c, 3), (c, 4)}. So |S| = 12. Or
we could have used the fact that, for finite sets, |A B| = |A| * |B|.
(e) S = {(a, 0), (a, 1), , (b, 0), (b, 1),} Clearly S contains an infinite number of elements. But are there
the same number of elements in S as in N, or are there more (26 times more, to be precise)? The
answer is that there are the same number. |S| = 0. To prove this, we need a bijection from S to N. We
can define this bijection as an enumeration of the elements of S:
(a, 0), (b, 0), (c, 0),
(First enumerate all 26 elements of S that have 0 as their second element)
Homework 1
Basic Techniques
6. Mother-of:
Not reflexive: Eve is not the mother of Eve (in fact, no one is her own mother).
Not symmetric: mother-of(Eve, Cain), but not Mother-of(Cain, Eve).
Not transitive: Each person has only one mother, so if Mother-of(x, y) and Mother-of(y, z),
the only way to have Mother-of(x, z) would be if x and y are the same person, but we know
that that's not possible since Mother-of(x, y) and no one can be the mother of herself).
Would-recognize-picture-of:
Not symmetric: W-r-p-o(Elaine, Bill Clinton), but not W-r-p-o (Bill Clinton, Elaine)
Not transitive: W r-p-o(Elaine, Bill Clinton) and W r-p-o(Bill Clinton, Bill's mom) but not
W-r-p-o(Elaine, Bill's mom)
Has-ever-been-married-to:
Not reflexive: No one is married to him or herself.
Not transitive: H-e-b-m-t(Dave, Sue) and H-e-b-m-t(Sue, Jeff) but not
H-e-b-m-t(Dave, Jeff)
Ancestor-of: Not reflexive: not Ancestor-of(Eve, Eve) (in fact, no one is their own ancestor).
Not symmetric: Ancestor-of(Eve, Cain) but not Ancestor-of(Cain, Eve)
Hangs-out-with: Not transitive: Hangs-out-with(Bill, Monica) and Hangs-out-with(Monica, Linda Tripp),
but not Hangs-out-with(Bill, Linda Tripp).
Less-than-or-equal-to: Not symmetric: 1 2, but not 2 1.
7. Yes, if 2^A = 2^B, then A must equal B. Suppose it didn't. Then there is some element x that is in one set but
not the other. Call the set x is in A. Then 2^A must contain {x}, which must not be in 2^B, since x ∉ B. This
would mean that 2^A ≠ 2^B, which contradicts our premise.
8. (a)
(b)
(c)
(d)
yes
no, since no element of a partition can be empty.
no, 0 is missing
no, since, each element of the original set S must appear in only one element of a partition of S.
Basic Techniques
contains all the necessary elements. Since it is not possible to derive zero by multiplying two negative
numbers, it must not be in the closure set.
(e) The odd length strings are not closed under concatenation. "a" || "b" = "ab", which is of length 2. The
closure is the set of strings of length 2. Note that strings of length 1 are not included. Why?
11. R* = R {(x, x) : x {a, b, c, d, e}} {(a, e)}
12. (a) The easiest way to start to solve a problem like this is to start writing down the elements of R and see if
a pattern emerges. So we start with the elements of R: {(TX, LA), (LA, TX), (TX, NM), (NM, TX), (LA, Ark),
(Ark, LA), (LA Miss), (Miss, LA) }. To construct R, we first add all elements of the form (x, x), so we add
(TX,TX), and so forth. Then we add the elements required to establish transitivity:
(NM, TX), (TX, LA) (NM, TX)
(TX, LA), (LA, Ark) (TX, Ark)
(NM, TX), (TX, Ark) (NM, Ark), and so forth.
If we continue this process, we will see that the reflexive, symmetric, transitive closure R relates all states
except Alaska and Hawaii to each other and each of them only to themselves. So R can be described as
relating two states if its possible to drive from one to the other without leaving the country. The partition is:
[Alaska]
[Hawaii]
[all other 48 states]
(b) R includes, for example {(0, 3), (3, 6), (6, 9), (9, 12) }. When we compute the transitive closure, we
add, among other things {(0, 6), (0, 9), (0,12)}. Now try this starting with (1,4) and (2, 5). Its clear that x,y,
xRy iff x = y (mod 3). In other words, two numbers are related iff they have the same remainder mod 3. The
partition is:
[0, 3, 6, 9, 12 ]
[1, 4, 7, 10, 13 ]
[2, 5, 8, 11, 14 ]
(c) R relates all strings composed solely of as to each other. So the partition is
[, a, aa, aaa, aaaa, ]
13. (a) Think of two points being related via R if you can get to the second one by starting at the first and
moving up one square and right one square. When we add transitivity, we gain the ability to move diagonally
by two squares, or three, or whatever. So P is an infinite set. Each element of P consists of the set of points
that fall on an infinite diagonal line running from lower left to upper right.
(b) Now we can more upward on either diagonal. And we can move up and right followed by up and left,
and so forth. The one thing we cant do is move directly up or down or right or left exactly one square. So take
any given point. To visualize the points to which it is related under R, imagine a black and white chess board
where the squares correspond to points on our grid. Each point is related to all other points of the same color.
Thus the cardinality of P is 2.
(c) Now every point is related to every other point. The cardinality of P is 1.
14. You might think that for all relations R on some domain D, the transitive closure of the symmetric closure
of R (call it TC(SC(R))) must be reflexive, because for any two elements x, y ∈ D such that (x, y) ∈ R, we'll
have (x, y), (y, x) ∈ SC(R) and therefore (x, x), (y, y) ∈ TC(SC(R)). This is all true, but does not prove that for
all z ∈ D, (z, z) ∈ TC(SC(R)). Why not? Suppose there is a z ∈ D such that there is no y ∈ D for which (y, z)
∈ R or (z, y) ∈ R. (If you look at the graph of R, z is an isolated vertex with no edges in or out.) Then (z, z) ∉
TC(SC(R)). So the answer is no, with R = ∅ on domain {a} as a simple counterexample: TC(SC(R)) = ∅, yet
it should contain (a, a) if it were reflexive.
CS 341 Homework 2
Strings and Languages
1. Let Σ = {a, b}. Let L1 = {x ∈ Σ* : |x| < 4}. Let L2 = {aa, aaa, aaaa}. List the elements in each of the
following languages L:
(a) L3 = L1 ∪ L2
(b) L4 = L1 ∩ L2
(c) L5 = L1 L4
(d) L6 = L1 - L2
2. Consider the language L = a^nb^nc^m. Which of the following strings are in L?
(a) ε
(b) ab
(c) c
(d) aabc
(e) aabbcc
(f) abbcc
3. It probably seems obvious to you that if you reverse a string, the character that was originally first becomes
last. But the definition we've given doesn't say that; it says only that the character that was originally last
becomes first. If we want to be able to use our intuition about what happens to the first character in a proof, we
need to turn it into a theorem. Prove ∀x, a where x is a string and a is a single character, (ax)^R = x^Ra.
4. For each of the following binary functions, state whether or not it is (i) one-to-one, (ii) onto, (iii) idempotent,
(iv) commutative, and (v) associative. Also (vi) state whether or not it has an identity, and, if so, what it is.
Justify your answers.
(a) || : S × S → S, where S is the set of strings of length ≥ 0
||(a, b) = a || b (in other words, simply concatenation defined on strings)
(b) || : L × L → L, where L is the set of languages over some alphabet Σ
||(a, b) = {w ∈ Σ* : w = x || y for some x ∈ a and y ∈ b}. In other words, the concatenation of two
languages A and B is the set of strings that can be derived by taking a string from A and then
concatenating onto it a string from B.
5. We can define a unary function F to be self-inverse iff ∀x ∈ Domain(F), F(F(x)) = x. The Reverse function
on strings is self-inverse, for example.
(a) Give an example of a self-inverse function on the natural numbers, on sets, and on booleans.
(b) Prove that the Reverse function on strings is self-inverse.
Solutions
1. First we observe that L1 = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb}.
(a) L3 = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa}
(b) L4 = {aa, aaa}
(c) L5 = every way of selecting one element from L1 followed by one element from L4:
{εaa, aaa, baa, aaaa, abaa, baaa, bbaa, aaaaa, aabaa, abaaa, abbaa, baaaa, babaa, bbaaa, bbbaa} ∪
{aaa, aaaa, baaa, aaaaa, abaaa, baaaa, bbaaa, aaaaaa, aabaaa, abaaaa, abbaaa, baaaaa, babaaa,
bbaaaa, bbbaaa}. Note that we've written εaa, just to make it clear how this string was derived. It
should actually be written as just aa. Also note that some elements are in both of these sets (i.e., there's
more than one way to derive them). Eliminating duplicates (since L is a set and thus does not contain
duplicates), we get:
{aa, aaa, baa, aaaa, abaa, baaa, bbaa, aaaaa, aabaa, abaaa, abbaa, baaaa, babaa, bbaaa, bbbaa, aaaaaa,
aabaaa, abaaaa, abbaaa, baaaaa, babaaa, bbaaaa, bbbaaa}
(d) L6 = every string that is in L1 but not in L2: {ε, a, b, ab, ba, bb, aab, aba, abb, baa, bab, bba, bbb}.
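Since every language here is a finite set of strings, these answers are easy to check mechanically. A small Python sketch (ours, not part of the packet; the variable names are just the names used in the problem) computes L3 through L6 directly from L1 and L2.

from itertools import product

sigma = "ab"
L1 = {"".join(p) for n in range(4) for p in product(sigma, repeat=n)}   # all strings with |x| < 4
L2 = {"aa", "aaa", "aaaa"}

L3 = L1 | L2                          # union
L4 = L1 & L2                          # intersection
L5 = {x + y for x in L1 for y in L4}  # concatenation L1 L4
L6 = L1 - L2                          # set difference

print(sorted(L4))   # ['aa', 'aaa']
print(len(L5))      # 23 distinct strings, matching the de-duplicated list above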
2. (a) Yes. n = 0 and m = 0.
(b) Yes. n = 1 and m = 0.
(c) Yes. n = 0 and m = 1.
(d) No. There must be equal numbers of a's and b's.
(e) Yes. n = 2 and m = 2.
(f) No. There must be equal numbers of a's and b's.
3. Prove: ∀x, a where x is a string and a is a single character, (ax)^R = x^Ra. We'll use induction on the length of
x. If |x| = 0 (i.e., x = ε), then (aε)^R = a = ε^Ra. Next we show that if this is true for all strings of length n, then it
is true for all strings of length n + 1. Consider any string x of length n + 1. Since |x| > 0, we can rewrite x as yb
for some single character b.
(ax)^R = (ayb)^R     Rewrite of x as yb
= b(ay)^R            Definition of reversal
= b(y^Ra)            Induction hypothesis (since |x| = n + 1, |y| = n)
= (by^R)a            Associativity of concatenation
= x^Ra               Definition of reversal: if x = yb then x^R = by^R
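The induction above is easy to spot-check. Here is a tiny Python sketch (ours, not the packet's) that compares (ax)^R against x^Ra for every short string over {a, b}, using a reverse function written to mirror the recursive definition in the text.

from itertools import product

def reverse(x):
    """Reverse defined recursively, as in the text: if x = yb for a single character b, then x^R = b y^R."""
    return "" if x == "" else x[-1] + reverse(x[:-1])

ok = all(reverse(a + x) == reverse(x) + a
         for n in range(6)
         for x in map("".join, product("ab", repeat=n))
         for a in "ab")
print(ok)   # True: (ax)^R = x^R a for every x tested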
4. (a)
(b) (ii) || is onto: any language can be obtained by concatenating it with {ε}, which relies on
the fact that ε is an identity for concatenation of strings. Given the way in which we defined
concatenation of languages as the concatenation of strings drawn from the two languages, {ε} is an
identity for concatenation of languages and thus it enables us to prove that all languages can be derived
from the concatenation operation.
(iii) || is not idempotent. ||({a}, {a}) = {aa}
(iv) || is not commutative. ||({a}, {b}) = {ab}. But ||({b}, {a}) = {ba}.
(v) || is associative.
(vi) || has {ε} as both a left and right identity.
5. (a)
(b)
CS 341 Homework 3
Languages and Regular Expressions
1. Describe in English, as briefly as possible, each of the following (in other words, describe the language
defined by each regular expression):
(a) L( ((a*a)b) ∪ b )
(b) L( (((a*b*)*ab) ∪ ((a*b*)*ba))(b ∪ a)* )
2. Rewrite each of these regular expressions as a simpler expression representing the same set.
(a) ∅* ∪ a* ∪ b* ∪ (a ∪ b)*
(b) ((a*b*)* ∪ (b*a*)*)*
(c) (a*b)* ∪ (b*a)*
3. Let Σ = {a, b}. Write regular expressions for the following sets:
(a) All strings in Σ* whose number of a's is divisible by three.
(b) All strings in Σ* with no more than three a's.
(c) All strings in Σ* with exactly one occurrence of the substring aaa.
4. Which of the following are true? Prove your answer.
(a) baa ∈ a*b*a*b*
(b) b*a* ∩ a*b* = a* ∪ b*
(c) a*b* ∩ c*d* = ∅
(d) abcd ∈ (a(cd)*b)*
5. Show that L((a ∪ b)*) = L(a* (ba*)*).
6. Consider the following:
(a) ((a ∪ b) ∪ (ab))*
(b) (a+ a^nb^n)
(c) ((ab)* ∅)
(d) (((ab) ∪ c)* ∩ (b ∪ c*))
(e) (∅* ∪ (bb*))
(i) Which of the above are pure regular expressions?
(ii) For each of the above that is a regular expression, give a simplified equivalent pure regular expression.
(iii) Which of the above represent regular languages?
7. True - False: For all languages L1, L2, and L3
(a) (L1L2)* = L1*L2*
(b) (L1 ∪ L2)* = L1* ∪ L2*
(c) (L1 ∪ L2) L3 = L1 L3 ∪ L2 L3
(d) (L1 ∩ L2) L3 = (L1 L3) ∩ (L2 L3)
(e) (L1+)* = L1*
(f) (L1+)+ = L1+
(g) (L1*)+ = (L1+)*
(h) L1* = L1+
(i) (ab)*a = a(ba)*
(j) (a ∪ b)* b (a ∪ b)* = a* b (a ∪ b)*
(k) [(a ∪ b)* b (a ∪ b)* ∪ (a ∪ b)* a (a ∪ b)*] = (a ∪ b)*
(l) [(a ∪ b)* b (a ∪ b)* ∪ (a ∪ b)* a (a ∪ b)*] = (a ∪ b)+
(m) [(a ∪ b)* ba (a ∪ b)* ∪ a*b*] = (a ∪ b)*
Solutions
1. (a) Any string of a's and/or b's with zero or more a's followed by a single b.
(b) Any string of a's and/or b's with at least one occurrence of ab or ba.
2. (c) (a*b)* ∪ (b*a)* = (a ∪ b)* (in other words, all strings over {a, b}). How do we know that? (a*b)* is the
union of ε and all strings that end in b. (b*a)* is the union of ε and all strings that end in a. Clearly any string
over {a, b} must either be empty or it must end in a or b. So we've got them all.
3. (a) The a's must come in groups of three, but of course there can be arbitrary numbers of b's everywhere. So:
(b*ab*ab*a)*b*
Since the first expression has * around it, it can occur 0 or more times, to give us any number of a's
that is divisible by 3.
(b) Another way to think of this is that there are three optional a's and all the b's you want. That gives us:
b* (a ∪ ε) b* (a ∪ ε) b* (a ∪ ε) b*
(c) Another way to think of this is that we need one instance of aaa. All other instances of aa must occur with
either b or end of string on both sides. The aaa can occur anywhere, so we'll plunk it down, then list the
options for everything else twice, once on each side of it:
(ab ∪ aab ∪ b)* aaa (ba ∪ baa ∪ b)*
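A pattern like this is easy to test mechanically. The sketch below (ours, written with Python's re module rather than the packet's notation) checks that the expression just constructed matches a string of a's and b's exactly when the string contains exactly one (possibly overlapping) occurrence of aaa.

import re
from itertools import product

pattern = re.compile(r"^(ab|aab|b)*aaa(ba|baa|b)*$")

def count_aaa(s):
    # counts occurrences of "aaa", including overlapping ones
    return sum(1 for i in range(len(s) - 2) if s[i:i+3] == "aaa")

for n in range(12):
    for s in map("".join, product("ab", repeat=n)):
        assert bool(pattern.match(s)) == (count_aaa(s) == 1), s
print("the expression agrees with the specification on all strings up to length 11")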
4. (a) True. Consider the defining regular expression: a*b*a*b*. To get baa, take no a's, then one b, then two
a's then no b's.
(b) True. We can prove that two sets X and Y are equal by showing that any string in X must also be in Y
and vice versa. First we show that any string in b*a* ∩ a*b* (which we'll call X) must also be in a* ∪ b*
(which we'll call Y). Any string in X must have two properties: (from b*a*): all b's come before all a's; and
(from a*b*): all a's come before all b's. The only way to have both of these properties simultaneously is to be
composed of only a's or only b's. That's exactly what it takes to be in Y.
Next we must show that every string in Y is in X. Every string in Y is either of the form a* or b*. All strings
of the form a* are in X since we simply take b* to be b^0, which gives us a* ∩ a* = a*. Similarly for all strings
of the form b*, where we take a* to be a^0.
(c) False. Remember that to show that any statement is false it is sufficient to find a single counterexample:
ε ∈ a*b* and ε ∈ c*d*. Thus ε ∈ a*b* ∩ c*d*, which is therefore not equal to ∅.
(d) False. There is no way to generate abcd from (a(cd)*b)*. Let's call the language generated by
(a(cd)*b)* L. Notice that every string in L has the property that every instance of (cd)* is immediately
preceded by a. abcd does not possess that property.
5. That the language on the right is included in the language on the left is immediately apparent since every
string in the right-hand language is a string of a's and b's. To show that any string of a's and b's is contained in
the language on the right, we note that any such string begins with zero or more a's. If there are no b's, then the
string is contained in a*. If there is at least one b, we strip off any initial a's as a part of a* and examine the
remainder. If there are no more b's, the remainder is in ba*. If there is at least one more b to the right, then we
strip off the initial b and any following consecutive a's (a string in ba*) and examine the remainder. Repeat the
last two steps until the end of the string is reached. Thus, every string of a's and b's is included in the language
on the right.
6. (i) a, c, e (b contains the superscript n; d contains ∩)
(ii) (a) = (a ∪ b)*
(c) = ∅
(e) = b*
(iii) a, c, d, e (b is {a^mb^n : m > n}, which is not regular)
7. (a) F, (b) F, (c) T, (d) F, (e) T, (f) T, (g) T, (h) F, (i) T, (j) T, (k) F, (l) T, (m) T, (n) F, (o) F, (p) T (by def. of
+), (q) T, (r) F, (s) T, (t) F, (u) F, (v) T.
CS 341 Homework 4
Deterministic Finite Automata
1. If M is a deterministic finite automaton, under exactly what circumstances is ε ∈ L(M)?
2. Describe informally the languages accepted by each of the following deterministic FSMs:
(from Elements of the Theory of Computation, H. R. Lewis and C. H. Papadimitriou, Prentice-Hall, 1998.)
(FSM diagrams omitted.)
7. Give a dfa accepting {x ∈ {a, b}* : at least one a in x is not immediately followed by b}.
8. Let L = {w ∈ {a, b}* : w does not end in ba}.
(a) Construct a dfa accepting L.
(b) Give a regular expression for L.
9. Consider L = {a^nb^n : 0 ≤ n ≤ 4}
(a) Show that L is regular by giving a dfa that accepts it.
(b) Give a regular expression for L.
10. Construct a deterministic finite state machine to accept strings that correspond to odd integers without
leading zeros.
11. Imagine a traffic light. Let Σ = {a}. In other words, the input consists just of a string of a's. Think of
each a as the output from a timer that signals the light to change. Construct a deterministic finite state
transducer whose outputs are drawn from the set {Y, G, R} (corresponding to the colors yellow, green, and
red). The outputs of the transducer should correspond to the standard traffic light behavior.
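As a hedged illustration (ours, not the packet's answer), one way to encode such a transducer is as a Mealy machine whose states are the current light color; each input a emits the next color and moves to it. The particular start color and G → Y → R cycle below are assumptions.

# A minimal finite-state-transducer sketch for the traffic light (our encoding).
NEXT = {"G": "Y", "Y": "R", "R": "G"}   # assumed cycle green -> yellow -> red -> green

def traffic_light(inputs, start="G"):
    state, outputs = start, []
    for symbol in inputs:
        assert symbol == "a"            # the input alphabet is just {a}
        state = NEXT[state]             # the timer tick changes the light
        outputs.append(state)           # output the color just switched to
    return "".join(outputs)

print(traffic_light("aaaaa"))   # YRGYR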
12. Recall the finite state machine that we constructed in class to accept $1.00 in change or bills. Modify
the soda machine so that it actually does something (i.e., some soda comes out) by converting our finite state
acceptor to a finite state transducer. Let there be two buttons, one for Coke at $.50 and one for Water at
$.75 (yes, it's strange that water costs more than Coke. The world is a strange place). In any case, there will
now be two new symbols in the input alphabet, C and W. The machine should behave as follows:
The machine should keep track of how much money has been inserted. If it ever gets more than $1.50, it
should spit back enough to get it under $1.00 but keep it above $.75.
If the Coke or Water button is pushed and enough money has been inserted, the product and the change
should be output.
If a button is pushed and there is not enough money, the machine should remember the button push and
wait until there is enough money, at which point it should output the product and the change.
13. Consider the problem of designing an annoying buzzer that goes off whenever you try to drive your car
and you're not wearing a seat belt. (For simplicity, we'll just worry about the driver's possible death wish. If
you want to make this harder, you can worry about the other seats as well.) Design a finite state transducer
whose inputs are drawn from the alphabet {KI, KR, SO, SU, BF, BU}, representing the following events,
respectively: "key just inserted into ignition", "key just removed from ignition", "seat just became
occupied", "seat just became unoccupied", "belt has just been fastened", and "belt has just been unfastened".
The output alphabet is {ON, OFF}. The buzzer should go on when ON is output and stay off until OFF is
output.
14. Is it possible to construct a finite state transducer that can output the following sequence:
1010010001000010000010000001
If it is possible, design one. If it's not possible, why not?
Solutions
1. ε ∈ L(M) iff the initial state is a final state. Proof: M will halt in its initial state given ε as input. So: (IF)
If the initial state is a final state, then when M halts in the initial state, it will be in a final state and will
accept ε as an element of L(M). (ONLY IF) If the initial state is not a final state, then when M halts in the
initial state, it will reject its input, namely ε. So the only way to accept ε is for the initial state to be a final
state.
2. (FSM diagram omitted.)
3. (FSM diagram omitted.)
4. (a) (b) (c) (FSM diagrams omitted.)
5. (FSM diagram omitted.)
6. (aa)*(bb* ∪ bb*a(aa)*) = (aa)*b+(ε ∪ a(aa)*) = all strings of a's and b's consisting of an even number of
a's, followed by at least one b, followed by zero or an odd number of a's.
7. (FSM diagram omitted.)
8. (a) (FSM diagram omitted.)
(b) ε ∪ a ∪ (a ∪ b)*(b ∪ aa)
9. (a) (FSM diagram omitted.)
(b) ε ∪ ab ∪ aabb ∪ aaabbb ∪ aaaabbbb
CS 341 Homework 5
Regular Expressions in UNIX
Regular expressions are all over the place in UNIX, including the programs grep, sed, and vi.
There's a regular expression pattern matcher built into the programming language perl. There's also
one built into the majordomo maillist program, to be used as a way to filter email messages. So it's
easy to see that people have found regular expressions extremely useful. Each of the programs that
uses the basic idea offers its own definition of what a regular expression is. Some of them are more
powerful than others. The definition in perl is shown on the reverse of this page.
1. Write perl regular expressions to do the following things. If you have easy access to a perl
interpreter, you might even want to run them.
(a) match occurrences of your phone number
(b) match occurrences of any phone number
(c) match occurrences of any phone number that occurs more than once in a string
(d) match occurrences of any email address that occurs more than once in a string
(e) match the Subject field of any mail message from yourself
(f) match any email messages where the address of the sender occurs in the body of
the message
2. Examine the constructs in the perl regular expression definition closely. Compare them to the
much more limited definition we are using. Some of them can easily be described in terms of the
primitive capabilities we have. In other words, they don't offer additional power, just additional
convenience. Some of them, though, are genuinely more powerful, in the sense that they enable you
to define languages that aren't regular (i.e., they cannot be recognized with Finite State Machines).
Which of the perl constructs actually add power to the system? What is it about them that makes
them more powerful?
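As a concrete, hedged illustration of problem 1(c) and of the extra power asked about in problem 2 (our example, written with Python's Perl-compatible re module rather than perl itself; the simple ddd-dddd phone-number shape is an assumption): the backreference \1 below matches a phone number only if the same digits occur again later, and backreferences are exactly the kind of construct that lets these pattern languages describe non-regular languages.

import re

# Problem 1(c) style: a phone number that occurs more than once in the string.
repeated_phone = re.compile(r"(\d{3}-\d{4}).*\1")
print(bool(repeated_phone.search("call 555-1234 or 555-9876")))          # False
print(bool(repeated_phone.search("call 555-1234, I repeat, 555-1234")))  # True

# Backreferences push the notation beyond regular languages: for example,
# r"^(a*)b\1$" describes { a^n b a^n : n >= 0 }, which no finite state machine can accept.
anban = re.compile(r"^(a*)b\1$")
print(bool(anban.match("aaabaaa")), bool(anban.match("aaabaa")))         # True False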
from Programming Perl, Larry Wall and Randal L. Schwartz, O'Reilly & Associates, 1990.
CS 341 Homework 6
Nondeterministic Finite Automata
1. (a) Which of the following strings are accepted by the nondeterministic finite automaton shown on the left below?
(i) a
(ii) aa
(iii) aab
(iv) b
(FSM diagrams omitted.)
(b) Which of the following strings are accepted by the nondeterministic finite automaton on the right above?
(i) ε
(ii) ab
(iii) abab
(iv) aba
(v) abaa
2. Write regular expressions for the languages accepted by the nondeterministic finite automata of problem 1.
3. For any FSM F, let |F| be the number of states in F. Let R be the machine shown on the right in problem 1.
Let L = {w ∈ {0, 1}* : ∃M such that M is an FSM, L(M) = L(R), |M| ≤ |R|, and w is the binary encoding of |M|}. Write a
regular expression for L.
4. Draw state diagrams for nondeterministic finite automata that accept these languages:
(a) (ab)*(ba)* ∪ aa*
(b) ((ab ∪ aab)*a*)*
(c) ((a*b*a*)*b)*
(d) (ba ∪ b)* ∪ (bb ∪ a)*
5. Some authors define a nondeterministic finite automaton to be a quintuple (K, Σ, Δ, S, F), where K, Σ, Δ, and F are as we
have defined them and S is a finite set of initial states, in the same way that F is a finite set of final states. The automaton may
nondeterministically begin operating in any of these initial states. Explain why this definition is not more general than ours in
any significant way.
6. (a) Find a simple nondeterministic finite automaton accepting ((a ∪ b)*aabab).
(b) Convert the nondeterministic finite automaton of Part (a) into a deterministic finite automaton by the method described
in class and in the notes.
(c) Try to understand how the machine constructed in Part (b) operates. Can you find an equivalent deterministic machine
with fewer states?
7. Construct a NDFSA that accepts the language (ba ∪ ((a ∪ bb) a*b)).
(FSM diagrams omitted.)
(b) ((ab ∪ aab)*a*)* simplifies to (ab ∪ a)*. This is so because aab can be formed by one application of a, followed by one of ab. So it
is redundant inside a Kleene star. Now we can write a two state machine:
(FSM diagram omitted.)
If you put the loop on a on the start state, either in place of where we have it, or in addition to it, it's also right.
(c) First we simplify:
((a*b*a*)*b)*
= ((a ∪ b ∪ a)*b)*   /* (L1*L2*L3*)* = (L1 ∪ L2 ∪ L3)* */
= ((a ∪ b)*b)*       /* union is idempotent */
There is a simple 2 state NDFSM accepting this language, which consists of the empty string and all strings ending with b.
(d) This is the set of strings where either:
(1) every a is preceded by a b,
or (2) all b's occur in pairs.
So we can make a five state nondeterministic machine by making separate machines (each with two states) for the two
languages and then introducing transitions from the start state to both of them.
5. To explain that any construct A is not more general or powerful than some other construct B, it suffices to show that any
instance of A can be simulated by a corresponding instance of B. So in this case, we have to show how to take a multiple start
state NDFSA, A, and convert it to a NDFSA, B, with a single start state. We do this by initially making B equal to A. Then
add to B a new state we'll call S0. Make it the only start state in B. Now add ε-transitions from S0 to each of the states that
was a start state in A. So B has a single start state (thus it satisfies our original definition of a NDFSA), but it simulates the
behavior of A since the first thing it does is to move, nondeterministically, to all of A's start states and then it exactly mimics
the behavior of A.
6. If you take the state machine as it is given, add a new start state and make ε-transitions from it to the given start states, you
have an equivalent machine in the form that we've been using.
7. (a) (FSM diagram omitted: states q0 through q5, with a loop on a and b at q0 and the path
q0 -a-> q1 -a-> q2 -b-> q3 -a-> q4 -b-> q5 spelling out aabab; q5 is the final state.)
(b)
(1) Compute the E(q)s. Since there are no ε-transitions, E(q) for every state q is just {q}.
(2) S = {q0}
(3) δ =
{
({q0}, a, {q0, q1}),
({q0}, b, {q0}),
({q0, q1}, a, {q0, q1, q2}),
({q0, q1}, b, {q0}),
({q0, q1, q2}, a, {q0, q1, q2}),
({q0, q1, q2}, b, {q0, q3}),
({q0, q3}, a, {q0, q1, q4}),
({q0, q3}, b, {q0}),
({q0, q1, q4}, a, {q0, q1, q2}),
({q0, q1, q4}, b, {q0, q5}),
({q0, q5}, a, {q0, q1}),
({q0, q5}, b, {q0}) }
(4) K = {{q0}, {q0, q1}, {q0, q1, q2}, {q0, q3}, {q0, q1, q4}, {q0, q5}}
(5) F = {{q0, q5}}
(c) There isn't a simpler machine, since we need a minimum of six states in order to keep track of how many characters
(between 0 and 5) of the required trailing string we have seen so far.
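The subset construction itself is short enough to write out. The sketch below is ours, not the packet's: it encodes the NFA of 7(a) as a transition table (there are no ε-transitions, so E(q) = {q} is implicit) and produces the same six deterministic states and final state listed in (4) and (5).

# NFA for (a|b)*aabab: q0 loops on a,b and q0-a->q1-a->q2-b->q3-a->q4-b->q5.
delta = {
    ("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q2", "b"): {"q3"},
    ("q3", "a"): {"q4"},
    ("q4", "b"): {"q5"},
}
start, finals, alphabet = {"q0"}, {"q5"}, "ab"

def subset_construction(start, delta, finals, alphabet):
    """Standard subset construction for an NFA with no epsilon-transitions."""
    start = frozenset(start)
    states, trans, todo = {start}, {}, [start]
    while todo:
        S = todo.pop()
        for c in alphabet:
            T = frozenset(q2 for q in S for q2 in delta.get((q, c), set()))
            trans[(S, c)] = T
            if T not in states:
                states.add(T)
                todo.append(T)
    dfa_finals = {S for S in states if S & frozenset(finals)}
    return states, trans, dfa_finals

states, trans, dfa_finals = subset_construction(start, delta, finals, alphabet)
print(len(states))                       # 6 deterministic states, as in (4)
print(sorted(map(sorted, dfa_finals)))   # [['q0', 'q5']], as in (5)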
8. We can build the following machine really easily. We make the path from 1 to 2 to 3 for the ba option. The rest is for the
second choice. We get a nondeterministic machine, as we generally do when we use the simple approach.
(FSM diagram omitted.)
In this case, we could simplify our machine if we wanted to and get rid of state 4 by adding a transition on b from 2 to 5.
9. (1) E(q0) = {q0, q1}, E(q1) = {q1}, E(q2) = {q2}, E(q3) = {q3, q4}, E(q4) = {q4}
(2) s = {q0, q1}
(3) δ =
{ ({q0, q1}, a, {q0, q1}),
({q0, q1}, b, {q0, q1, q2, q4}),
({q0, q1, q2, q4}, a, {q0, q1, q3, q4}),
({q0, q1, q2, q4}, b, {q0, q1, q2, q4}),
({q0, q1, q3, q4}, a, {q0, q1, q3, q4}),
({q0, q1, q3, q4}, b, {q0, q1, q2, q4}) }
(4) K = { {q0, q1}, {q0, q1, q3, q4}, {q0, q1, q2, q4} }
(5) F = { {q0, q1, q3, q4}, {q0, q1, q2, q4} }
This machine corresponds to the regular expression a*b(a ∪ b)*.
10. (a) (FSM diagram omitted.)
CS 341 Homework 7
Review of Equivalence Relations
1. Assume a finite domain that includes just the specific cities mentioned here. Let R = the reflexive,
symmetric, transitive closure of:
(Austin, Dallas), (Dallas, Houston), (Dallas, Amarillo), (Austin, San Marcos),
(Philadelphia, Pittsburgh), (Philadelphia, Paoli), (Paoli, Scranton),
(San Francisco, Los Angeles), (Los Angeles, Long Beach), (Long Beach, Carmel)
(a) Draw R as a graph.
(b) List the elements of the partition defined by R on its domain.
2. Let R be a relation on the set of positive integers. Define R as follows:
{(a, b) : (a mod 2) = (b mod 2)} In other words, R(a, b) iff a and b have the same remainder when
divided by 2.
(a) Consider the following example integers: 1, 2, 3, 4, 5, 6. Draw the subset of R involving just these values as
a graph.
(b) How many elements are there in the partition that R defines on the positive integers?
(c) List the elements of that partition and show some example elements.
3. Consider the language L, over the alphabet Σ = {a, b}, defined by the regular expression
a*(b ∪ ε)a*
Let R be a relation on Σ*, defined as follows:
R(x, y) iff both x and y are in L or neither x nor y is in L. In other words, R(x, y) iff x and y have
identical status with respect to L.
(a) Consider the following example elements of Σ*: ε, b, aa, bb, aabaaa, bab, bbaabb. Draw the subset of R
involving just these values as a graph.
(b) How many elements are there in the partition that R defines on Σ*?
(c) List the elements of that partition and show some example elements.
Solutions
1. (a) (Graph omitted.)
(b) [Austin, Dallas, Houston, Amarillo, San Marcos]
[Philadelphia, Pittsburgh, Paoli, Scranton]
[San Francisco, Los Angeles, Long Beach, Carmel]
2. (a) (Graph omitted.)
(b) Two
(c) [even integers] Examples: 2, 4, 6, 106
[odd integers] Examples: 1, 3, 5, 17, 11679
3.
(a) (Hint: L is the language of strings with no more than one b.)
(b) Two
(c) [strings in L] Examples: ε, aa, b, aabaaa
[strings not in L] Examples: bb, bbaabb, bab
CS 341 Homework 8
Finite Automata, Regular Expressions, and Regular Grammars
1. We showed that the set of finite state machines is closed under complement. To do that, we presented a
technique for converting a deterministic machine M into a machine M' such that L(M') is the complement of
L(M). Why did we insist that M be deterministic? What happens if we interchange the final and nonfinal states
of a nondeterministic finite automaton?
2. Give a direct construction for the closure under intersection of the languages accepted by finite automata.
(Hint: Consider an automaton whose set of states is the Cartesian product of the sets of states of the two
original automata.) Which of the two constructions, the one given in the text or the one suggested in this
problem, is more efficient when the two languages are given in terms of nondeterministic finite automata?
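As a hedged sketch of the Cartesian-product construction hinted at in problem 2 (ours; the two tiny example DFAs below are made up for illustration): the new machine's states are pairs of states, the transition function moves both components at once, and a pair is accepting exactly when both components are.

# Product construction for intersection (a sketch; the example machines are ours):
# M1 accepts strings over {a, b} with an even number of a's; M2 accepts strings ending in b.
M1 = {"start": 0, "finals": {0}, "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}}
M2 = {"start": 0, "finals": {1}, "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}}

def intersect(M1, M2, alphabet="ab"):
    start = (M1["start"], M2["start"])
    delta = {((p, q), c): (M1["delta"][(p, c)], M2["delta"][(q, c)])
             for p in {s for s, _ in M1["delta"]}
             for q in {s for s, _ in M2["delta"]}
             for c in alphabet}
    finals = {(p, q) for p in M1["finals"] for q in M2["finals"]}
    return start, delta, finals

def accepts(machine, w):
    start, delta, finals = machine
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in finals

M = intersect(M1, M2)
print(accepts(M, "aab"), accepts(M, "ab"))   # True False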
3. Using the either of the construction techniques that we discussed, construct a finite automaton that accepts
the language defined by the regular expression: a*(ab ba )b*.
4. Write a regular expression for the language recognized by the following FSM:
(FSM diagram omitted.)
8. Construct a deterministic FSM to accept the intersection of the languages accepted by the following FSMs:
(FSM diagrams omitted.)
4. Without using the algorithm for finding a regular expression from an FSM, we can note in this case that the
lower right state is a dead state, i.e., an absorbing, non-accepting state. We can leave and return to the initial
state, the only accepting state, by reading ab along the upper path or by reading ba along the lower path. These
can be read any number of times, in any order, so the regular expression is (ab ∪ ba)*. Note that ε is included,
as it should be.
5. (a) ((a ∪ ba)(ba)*b)
(b) (FSM diagram omitted.)
6. (a) (FSM diagram omitted.)
(b) (FSM diagram omitted.)
7. (a) Nonterminal S is the starting symbol. We'll use it to generate an odd number of a's. We'll also use the
nonterminal E, and it will always generate an even number of a's. So, whenever we generate an a, we must
either stop then, or we must generate the nonterminal E to reflect the fact that if we generate any more a's, we
must generate an even number of them.
S → a
S → aE
S → bS
E → b
E → bE
E → aS
(b) (FSM diagram omitted.)
8. (FSM diagram omitted: the product machine, whose states are the pairs <1, 1'> through <3, 3'> formed from the states of the two given machines.)
9. (a) (a ∪ bb*aa)* (ε ∪ bb*(a ∪ ε))
(b) All strings in {a, b}* that contain no occurrence of bab.
CS 341 Homework 9
Languages That Are and Are Not Regular
1. Show that the following are not regular.
(a) L = {ww^R : w ∈ {a, b}*}
(b) L = {ww : w ∈ {a, b}*}
(c) L = {ww' : w ∈ {a, b}*}, where w' stands for w with each occurrence of a replaced by b, and vice versa.
2. Show that each of the following is or is not a regular language. The decimal notation for a number is the
number written in the usual way, as a string over the alphabet {-, 0, 1, …, 9}. For example, the decimal
notation for 13 is a string of length 2. In unary notation, only the symbol 1 is used; thus 5 would be represented
as 11111 in unary notation.
(a) L = {w : w is the unary notation for a natural number that is a multiple of 7}
(b) L = {w : w is the decimal notation for a natural number that is a multiple of 7}
(c) L = {w : w is the unary notation for a natural number n such that there exists a pair p and q of twin primes,
both > n}. Two numbers p and q are a pair of twin primes iff q = p + 2 and both p and q are prime. For
example, (3, 5) is a pair of twin primes.
(d) L = {w : w is, for some n ≥ 1, the unary notation for 10^n}
(e) L = {w : w is, for some n ≥ 1, the decimal notation for 10^n}
(f) L = {w : w is of the form x#y, where x, y ∈ {1}+ and y = x + 1 when x and y are interpreted as unary numbers}
(For example, 11#111 and 1111#11111 ∈ L, while 11#11, 1#111, and 1111 ∉ L.)
(g) L = {a^nb^j : |n - j| = 2}
(h) L = {uww^Rv : u, v, w ∈ {a, b}+}
(i) L = {w ∈ {a, b}* : for each prefix x of w, #a(x) ≥ #b(x)}
3. Are the following statements true or false? Explain your answer in each case. (In each case, a fixed alphabet
is assumed.)
(a) Every subset of a regular language is regular.
(b) Let L = L1 ∩ L2. If L is regular and L2 is regular, L1 must be regular.
(c) If L is regular, then so is L' = {xy : x ∈ L and y ∉ L}.
(d) {w : w = w^R} is regular.
(e) If L is a regular language, then so is L' = {w : w ∈ L and w^R ∈ L}.
(f) If C is any set of regular languages, ∪C (the union of all the elements of C) is a regular language.
(g) L = {xyx^R : x, y ∈ Σ*} is regular.
(h) If L = L1 ∪ L2 is a regular language and L1 is a regular language, then L2 is a regular language.
(i) Every regular language has a regular proper subset.
(j) If L1 and L2 are nonregular languages, then L1 ∪ L2 is also not regular.
4. Show that the language L = {a^nb^m : n ≠ m} is not regular.
5. Prove or disprove the following statement:
If L1 and L2 are not regular languages, then L1 ∪ L2 is not regular.
6. Show that the language L = {x ∈ {a, b}* : x = a^nba^mba^max(m,n)} is not regular.
7. Show that the language L = {x ∈ {a, b}* : x contains exactly two more b's than a's} is not regular.
8. Show that the language L = {x ∈ {a, b}* : x contains twice as many a's as b's} is not regular.
Solutions
(b) L = {w : w is the decimal notation for a natural number that is a multiple of 7}. L is regular. We can
build a deterministic FSM M to accept it. M is based on the standard algorithm for long division. The states
represent the remainders we have seen so far (so there are 7 of them, corresponding to 0 through 6). The start state, of
course, is 0, corresponding to a remainder of 0. It is also the final state. The transitions of M are as follows:
∀si ∈ {0 … 6} and cj ∈ {0 … 9}, δ(si, cj) = (10·si + cj) mod 7
So, for example, on the input 962, M would first read 9. When you divide 7 into 9 you get 1 (which we don't
care about, since we don't actually care about the answer; we just care whether the remainder is 0) with a
remainder of 2. So M will enter state 2. Next it reads 6. Since it is in state 2, it must divide 7 into 2*10 + 6
(26). It gets a remainder of 5, so it goes to state 5. Next it reads 2. Since it is in state 5, it must divide 7 into
5*10 + 2 (52), producing a remainder of 3. Since 3 is not zero, we know that 962 is not divisible by 7, so M
rejects.
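The long-division DFA is tiny in code. A sketch (ours, not the packet's), following the transition rule δ(si, cj) = (10·si + cj) mod 7 described above:

def divisible_by_7(decimal_string):
    """Run the 7-state 'long division' DFA: the state is the remainder seen so far."""
    state = 0                                  # start state = remainder 0
    for digit in decimal_string:
        state = (10 * state + int(digit)) % 7  # delta(s, c) = (10s + c) mod 7
    return state == 0                          # accept iff the remainder is 0

print(divisible_by_7("962"))   # False: 962 = 7*137 + 3
print(divisible_by_7("966"))   # True:  966 = 7*138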
(c) L = {w : w is the unary notation for a natural number n such that there exists a pair p and q of twin primes,
both > n}. L is regular. Unfortunately, this time we don't know how to build an FSM for it. We can, however,
prove that it is regular by considering the following two possibilities:
(1) There is an infinite number of twin primes. In this case, for every n, there exists a pair of twin primes
greater than n. Thus L = 1*, which is clearly regular.
(2) There is not an infinite number of twin primes. In this case, there is some largest pair. There is thus
also a largest n that has a pair greater than it. Thus the set of such n's is finite and so is L (the unary
encodings of those values of n). Since L is finite, it is clearly regular.
It is not known which of these cases is true. But interestingly, from our point of view, it doesn't matter. L is
regular in either case. It may bother you that we can assert that L is regular when we cannot draw either an
FSM or a regular expression for it. It shouldn't bother you. We have just given a nonconstructive proof that L
is regular (and thus, by the way, that some FSM M accepts it). Not all proofs need to be constructive. This
situation isn't really any different from the case of L = {w : w is the unary encoding of the number of siblings
I have}. You know that L is finite and thus regular, even though you do not know how many siblings I have
and thus cannot actually build a machine to accept L.
(d) L = {w : w is, for some n ≥ 1, the unary notation for 10^n}. So L = {1^10, 1^100, 1^1000, …}. L isn't
regular, since clearly any machine to accept L will have to count the 1's. We can prove this using the pumping
lemma: Let w = 1^P, where P ≥ N and P is some power of 10. y must be some number of 1's. Clearly, it can be of
length at most P. When we pump it in once, we get a string s whose maximum length is therefore 2P. But the
next power of 10 is 10P. Thus s cannot be in L.
(e) L = {w : w is, for some n ≥ 1, the decimal notation for 10^n}. Often it's easier to work with unary
representations, but not in this case. This L is regular, since it is just 100*.
(f) L = {w : w is of the form x#y, where x, y ∈ {1}+ and y = x + 1 when x and y are interpreted as unary numbers}
(For example, 11#111 and 1111#11111 ∈ L, while 11#11, 1#111, and 1111 ∉ L.) L isn't regular. Intuitively, it
isn't regular because any machine to accept it must count the 1's before the # and then compare that number to
the number of 1's after the #. We can prove that this is true using the pumping lemma: Let w = 1^N#1^(N+1). Since
|xy| ≤ N, y must occur in the region before the #. Thus when we pump (either in or out) we will change x but
not make the corresponding change to y, so y will no longer equal x + 1. The resulting string is thus not in L.
(g) L = {a^nb^j : |n - j| = 2}. L isn't regular. L consists of all strings of the form a*b* where either the number
of a's is two more than the number of b's or the number of b's is two more than the number of a's. We can
show that L is not regular by pumping. Let w = a^Nb^(N+2). Since |xy| ≤ N, y must equal a^p for some p > 0. We can
pump y out once, which will generate the string a^(N-p)b^(N+2), which is not in L.
(h) L = {uww^Rv : u, v, w ∈ {a, b}+}. L is regular. This may seem counterintuitive. But any string of length
at least four with two consecutive identical symbols, not including the first and the last ones, is in L. We simply make
everything up to the first of the two consecutive symbols u. The first of the two consecutive symbols is w. The
second is w^R. And the rest of the string is v. And only strings with at least one pair of consecutive identical symbols
(not including the first and last) are in L, because w must end with some symbol s. w^R must start with that same
symbol s. Thus the string will contain two consecutive occurrences of s. L is regular because it can be
described by the regular expression (a ∪ b)+ (aa ∪ bb) (a ∪ b)+.
(i) L = {w ∈ {a, b}* : for each prefix x of w, #a(x) ≥ #b(x)}. First we need to understand exactly what L is.
In order to do that, we need to define prefix. A string x is a prefix of a string y iff ∃z ∈ Σ* such that y = xz. In
other words, x is a prefix of y iff x is an initial substring of y. For example, the prefixes of abba are ε, a, ab,
abb, and abba. So L is all strings over {a, b}* such that, at any point in the string (reading left to right), there
have never been more b's than a's. The strings ε, a, ab, aaabbb, and ababa are in L. The strings b, ba, abba, and
ababb are not in L. L is not regular, which we can show by pumping. Let w = a^Nb^N. So y = a^p, for some
nonzero p. If we pump out, there will be fewer a's than b's in the resulting string s. So s is not in L, since every
string is a prefix of itself.
3. (a) Every subset of a regular language is regular. FALSE. Often the easiest way to show that a universally
quantified statement such as this one is false is to find a counterexample. So consider L = a*. L is clearly
regular, since we have just shown a regular expression for it. Now consider L' = {a^i : i is prime}. L' ⊆ L. But we
showed in class that L' is not regular.
(b) Let L = L1 ∩ L2. If L is regular and L2 is regular, L1 must be regular. FALSE. We know that the
regular languages are closed under intersection. But it is important to keep in mind that this closure lemma (as
well as all the others we will prove) only says exactly what it says and no more. In particular, it says that:
If L1 is regular and L2 is regular
Then L is regular.
Just like any implication, we can't run this one backward and conclude anything from the fact that L is regular.
Of course, we can't use the closure lemma to say that L1 must not be regular either. So we can't apply the
closure lemma here at all. A rule of thumb: it is almost never true that you can prove the converse of a closure
lemma. So it makes sense to look first for a counterexample. We don't have to look far. Let L = ∅. Let L2 =
∅. So L and L2 are regular. Now let L1 = {a^i : i is prime}. L1 is not regular. Yet L = L1 ∩ L2. Notice that
we could have made L2 anything at all and its intersection with ∅ would have been ∅. When you are looking
for counterexamples, it usually works to look for very simple ones such as ∅ or Σ*, so it's a good idea to start
there first. ∅ works well in this case because we're doing intersection. Σ* is often useful when we're doing
union.
(c) If L is regular, then so is L' = {xy : x ∈ L and y ∉ L}. TRUE. Proof: Saying that y ∉ L is equivalent to
saying that y ∈ ¬L (the complement of L). Since the regular languages are closed under complement, we know that ¬L is also regular. L'
is thus the concatenation of two regular languages. The regular languages are closed under concatenation.
Thus L' must be regular.
(d) L = {w : w = w^R} is regular. FALSE. L is NOT regular. You can prove this easily by using the
pumping lemma and letting w = a^Nba^N.
(e) If L is a regular language, then so is L' = {w : w ∈ L and w^R ∈ L}. TRUE. Proof: Saying that w^R ∈ L is
equivalent to saying that w ∈ L^R. If w must be in both L and L^R, that is equivalent to saying that L' = L ∩ L^R.
L is regular because the problem statement says so. L^R is also regular because the regular languages are closed
under reversal. The regular languages are closed under intersection. So the intersection of L and L^R must be
regular.
Proof that the regular languages are closed under reversal (by construction): If L is regular, then there exists
some FSM M that accepts it. From M, we can construct a new FSM M' that accepts L^R. M' will effectively run
M backwards. Start with the states of M' equal to the states of M. Take the state that corresponds to the start state
of M and make it the final state of M'. Next we want to take the final states of M and make them the start states
of M'. But M' can have only a single start state. So create a new start state in M' and create an epsilon
transition from it to each of the states in M' that correspond to final states of M. Now just flip the arrows on all
the transitions of M and add these new transitions to M'.
(f) If C is any set of regular languages, ∪C is a regular language. FALSE. If C is a finite set of regular
languages, this is true. It follows from the fact that the regular languages are closed under union. But suppose
that C is an infinite set of languages. Then this statement cannot be true. If it were, then every language would
be regular, and we have proved that there are languages that are not regular. Why is this? Because every
language is the union of some set of regular languages. Let L be an arbitrary language whose elements are w1,
w2, w3, …. Let C be the set of singleton languages {{w1}, {w2}, {w3}, …} such that wi ∈ L. The number of
elements of C is equal to the cardinality of L. Each individual element of C is a language that contains a single
string, and so it is finite and thus regular. L = ∪C. Thus, since not all languages are regular, it must not be the
case that ∪C is guaranteed to be regular. If you're not sure you follow this argument, you should try to come
up with a specific counterexample. Choose an L such that L is not regular, and show that it can be described as
∪C for some set of languages C.
(g) L = {xyx^R : x, y ∈ Σ*} is regular. TRUE. Why? We've already said that xx^R isn't regular. This looks a
lot like that, but it differs in a key way. L is the set of strings that can be described as some string x, followed
by some string y (where x and y can be chosen completely independently), followed by the reverse of x. So, for
example, it is clear that abcccccba ∈ L (assuming Σ = {a, b, c}). We let x = ab, y = ccccc, and x^R = ba. Now
consider abbcccccaaa. You might think that this string is not in L. But it is. We let x = a, y = bbcccccaa, and
x^R = a. What about acccb? This string too is in L. We let x = ε, y = acccb, and x^R = ε. Note the following
things about our definition of L: (1) There is no restriction on the length of x. Thus we can let x = ε. (2) There
is no restriction on the relationship of y to x. And (3) ε^R = ε. Thus L is in fact equal to Σ* because we can take
any string w in Σ* and rewrite it as εwε, which is of the form xyx^R. Since Σ* is regular, L must be regular.
(h) If L = L1 ∪ L2 is a regular language and L1 is a regular language, then L2 is a regular language.
FALSE. This is another attempt to use a closure theorem backwards. Let L1 = Σ*. L1 is clearly regular. Since
L1 contains all strings over Σ, the union of L1 with any language is just L1 (i.e., L = Σ*). If the proposition
were true, then all languages L2 would necessarily be regular. But we have already shown that there are
languages that are not regular. Thus the proposition must be false.
(i) Every regular language has a regular proper subset. FALSE. ∅ is regular. And it is a subset of every set.
Thus it is a subset of every regular language. However, it is not a proper subset of itself. Thus this statement is
false. However, the following two similar statements are true:
(1) Every regular language has a regular subset.
(2) Every regular language except ∅ has a regular proper subset.
(j) If L1 and L2 are nonregular languages, then L1 ∪ L2 is also not regular. FALSE. Let L1 = {a^nb^m : n ≥ m}
and L2 = {a^nb^m : n ≤ m}. L1 ∪ L2 = a*b*, which is regular.
4. If L were regular, then its complement, L1, would also be regular. L1 contains all strings over {a, b} that are
not in L. There are two ways not to be in L: have any a's that occur after any b's (in other words, not have all
the a's followed by all the b's), or have an equal number of a's and b's. So now consider
L2 = L1 ∩ a*b*
L2 contains only those elements of L1 in which the a's and b's are in the right order. In other words,
L2 = a^nb^n
But if L were regular, then L1 would be regular. Then L2, since it is the intersection of two regular languages,
would also be regular. But we have already shown that it (a^nb^n) is not regular. Thus L cannot be regular.
5. This statement is false. To prove it, we offer a counterexample. Let L1 = {a^nb^m : n = m} and let L2 =
{a^nb^m : n ≠ m}. We have shown that both L1 and L2 are not regular. However,
L1 ∪ L2 = a*b*, which is regular.
There are plenty of other examples as well. Let L1 = {a^n : n ≥ 1 and n is prime}. Let L2 = {a^n : n ≥ 1 and n is not
prime}. Neither L1 nor L2 is regular. But L1 ∪ L2 = a+, which is clearly regular.
6. This is easy to prove using the pumping lemma. Let w = a^Nba^Nba^N. We know that xy must be contained
within the first block of a's. So, no matter how y is chosen (as long as it is not empty, as required by the
lemma), for any i ≥ 2, xy^iz ∉ L, since the first block of a's will be longer than the last block, which is not
allowed. Therefore L is not regular.
7. First, let L' = L ∩ a*b*, which must be regular if L is. We observe that L' = {a^nb^(n+2) : n ≥ 0}. Now use the
pumping lemma to show that L' is not regular in the same way we used it to show that a^nb^n is not regular.
8. We use the pumping lemma. Let w = a^(2N)b^N. xy must be contained within the block of a's, so when we pump
either in or out, it will no longer be true that there will be twice as many a's as b's, since the number of a's
changes but not the number of b's. Thus the pumped string will not be in L. Therefore L is not regular.
9. (a) L is not regular. We can prove this using the pumping lemma. Let w = a^Nb^N. Since y must occur within
the first N characters of w, y = a^p for some p > 0. Thus when we pump y in, we will have more a's than b's,
which produces strings that are not in L.
(b) L* is also not regular. To prove this, we need first to prove a lemma, which we'll call EQAB: ∀s, s ∈
L* → #a(s) = #b(s). To prove the lemma, we first observe that any string s in L* must be able to be
decomposed into at least one finite sequence of strings, each element of which is in L. Some strings will have
multiple such decompositions. In other words, there may be more than one way to form s by concatenating
together strings in L. For any string s in L*, let SQ be some sequence of elements of L that, when concatenated
together, form s. It doesn't matter which one. Define the function HowMany on the elements of L*.
HowMany(x) returns the length of SQ. Think of HowMany as telling you how many times we went through the
Kleene star loop in deriving x. We will prove EQAB by induction on HowMany(s).
Base case: If HowMany(s) = 0, then s = ε. #a(s) = #b(s).
Induction hypothesis: If HowMany(s) ≤ N, then #a(s) = #b(s).
Show: If HowMany(s) = N+1, then #a(s) = #b(s).
If HowMany(s) = N+1, then ∃w, y such that s = wy, w ∈ L*, HowMany(w) = N, and y ∈ L. In other words,
we can decompose s into a part that was formed by concatenating together N instances of L plus a second part
that is just one more instance of L. Thus we have:
1. #a(y) = #b(y)              Definition of L
2. #a(w) = #b(w)              Induction hypothesis
3. #a(s) = #a(w) + #a(y)      s = wy
4. #b(s) = #b(w) + #b(y)      s = wy
5. #b(s) = #a(w) + #b(y)      4, 2
6. #b(s) = #a(w) + #a(y)      5, 1
7. #b(s) = #a(s)              6, 3
Q. E. D.
Now we can show that L* isn't regular using the pumping lemma. Let w = a^Nb^N. Since y must occur within the
first N characters of w, y = a^p for some p > 0. Thus when we pump y in, we will have a string with more a's
than b's. By EQAB, that string cannot be in L*.
CS 341 Homework 10
State Minimization
1. (a) Give the equivalence classes under ≈L for these languages:
(i) L = (aab ∪ ab)*
(ii) L = {x : x contains an occurrence of aababa}
(iii) L = {xx^R : x ∈ {a, b}*}
(iv) L = {xx : x ∈ {a, b}*}
(v) Ln = {a, b}a{a, b}^n, where n > 0 is a fixed integer
(vi) The language of balanced parentheses
(vi) The language of balanced parentheses
(b) For those languages in (a) for which the answer is finite, give a deterministic finite automaton with the smallest number
of states that accepts the corresponding language.
2. Let L = {x ∈ {a, b}* : x contains at least one a and ends in at least two b's}.
(a) Write a regular expression for L.
(b) Construct a deterministic FSM that accepts L.
(c) Let RL be the equivalence relation of the Myhill-Nerode Theorem. What partition does RL induce on the set
{a, bb, bab, abb, bba, aab, abba, bbaa, baaba}?
(d) How many equivalence classes are there in the partition induced on Σ* by RL?
3. Let L = {x ∈ {a, b}* : x begins with a and ends with b}.
(a) What is the nature of the partition induced on Σ* by RL, the equivalence relation of the Myhill-Nerode Theorem? That
is, how many classes are there in the partition and give a description of the strings in each.
(b) Using these equivalence classes, construct the minimum state deterministic FSM that accepts L.
4. Suppose that we are given a language L and a deterministic FSM M that accepts L. Assume L is a subset of {a, b, c}*. Let
RL and RM be the equivalence relations defined in the proof of the Myhill-Nerode Theorem. True or False:
(a) If we know that x and y are two strings in the same equivalence class of RL, we can be sure that they are in the same
equivalence class of RM.
(b) If we know that x and y are two strings in the same equivalence class of RM, we can be sure that they are in the same
equivalence class of RL.
(c) There must be at least one equivalence class of RL that contains an infinite number of strings.
(d) RM induces a partition on {a, b, c}* that has a finite number of classes.
(e) If ε ∈ L, then [ε] (the equivalence class containing ε) of RL cannot be an infinite set.
5. Use the Myhill-Nerode Theorem to prove that {a^nb^mc^max(m,n)b^pd^p : m, n, p ≥ 0} is not regular.
6. (a) In class we argued that the intersection of two regular languages was regular on the basis of closure properties of
regular languages. We did not show a construction for the FSM that recognizes the intersection of two regular languages.
Such a construction does exist, however, and it is suggested by the fact that L1 ∩ L2 = Σ* - ((Σ* - L1) ∪ (Σ* - L2)).
Given two deterministic FSMs, M1 and M2, that recognize two regular languages L1 and L2, we can construct an FSM that
recognizes L = L1 ∩ L2 (in other words, strings that have all the required properties of both L1 and L2), as follows:
1. Construct machines M1' and M2', as deterministic versions of M1 and M2. This step is necessary because complementation
only works on deterministic machines.
2. Construct machines M1'' and M2'', from M1' and M2', using the construction for complementation, that recognize Σ* - L1
and Σ* - L2, respectively.
3. Construct M3, using the construction for union and the machines M1'' and M2'', that recognizes
((Σ* - L1) ∪ (Σ* - L2)). This will be a nondeterministic FSM.
4. Construct M4, the deterministic equivalent of M3.
5. Construct ML, using the construction for complementation, that recognizes Σ* - ((Σ* - L1) ∪ (Σ* - L2)).
Now consider:
Σ = {a, b}
L1 = {w ∈ Σ* : all a's occur in pairs}
Solutions
1. (a)
(i) L = (aab ∪ ab)*
1. [ε, aab, ab, and all other elements of L]
2. [a or wa : w ∈ L]
3. [aa or waa : w ∈ L]
4. [everything else, i.e., strings that can never become elements of L because they contain illegal
substrings such as bb or aaa]
(ii) L = {x : x contains an occurrence of aababa}
1. [(a ∪ b)*aababa(a ∪ b)*, i.e., all elements of L]
2. [ε or any string not in L and ending in b but not in aab or aabab, i.e., no progress yet on
"aababa"]
3. [wa for any w ∈ [2]; alternatively, any string not in L and ending in a but not in aa, aaba, or
aababa]
4. [any string not in L and ending in aa]
5. [any string not in L and ending in aab]
6. [any string not in L and ending in aaba]
7. [any string not in L and ending in aabab]
Note that this time there is no "everything else". Strings never become hopeless in this
language. They simply fail if we get to the end without finding "aababa".
(iii) L = {xx^R : x ∈ {a, b}*}
1. [a, which is the only string for which the continuations that lead to acceptance are all strings of
the form wa, where w ∈ L]
2. [b, which is the only string for which the continuations that lead to acceptance are all strings of
the form wb, where w ∈ L]
3. [ab, which is the only string for which the continuations that lead to acceptance are all strings
of the form wba, where w ∈ L]
4. [aa, which is the only string for which the continuations that lead to acceptance are all strings
of the form waa, where w ∈ L]
And so forth. Every string is in a distinct equivalence class.
(iv) L = {xx : x ∈ {a, b}*}
1. [a, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading a]
2. [b, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading b]
3. [ab, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading ab]
4. [aa, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading aa]
And so forth. Every string is in a distinct equivalence class.
(v) Ln = {a, b}a{a, b}^n
0. [ε]
1. [a, b]
2. [aa, ba]
3. [aaa, aab, baa, bab]
...
n+2. [(a ∪ b)a(a ∪ b)^n]
n+3. [strings that can never become elements of Ln]
There is a finite number of strings in any specific language Ln. So there is a finite number of equivalence classes of
≈L. Every string in Ln must be of length n+2. So there are n+3 equivalence classes (numbered 0 to n+2, to indicate the length
of the strings in the class) of strings that may become elements of Ln, plus one for strings that are already hopeless, either
because they don't start with ab or aa, or because they are already too long.
(vi) L = the language of balanced parentheses
1. [w*(w* : w ∈ L]     /* i.e., one extra left parenthesis somewhere in the string */
2. [w*((w* : w ∈ L]    /* two extra left parentheses */
3. [w*(((w* : w ∈ L]
4. [w*((((w* : w ∈ L]
5. [w*(((((w* : w ∈ L]
and so on. There is an infinite number of equivalence classes.
Each of these classes is distinct, since ) is an acceptable continuation for 1, but none of the others; )) is acceptable for
2, but none of the others, ))) is acceptable for 3, but none of the others, and so forth.
1. (b)
(i) (FSM diagram omitted.)
(ii) There's always a very simple nondeterministic FSM to recognize all strings that contain a specific substring. It's just a
chain of states for the desired substring, with loops on all letters of on the start state and the final state. In this case, the
machine is:
(FSM diagram omitted.)
To construct a minimal, deterministic FSM, you have two choices. You can use our algorithm to convert the NDFSM to a
deterministic one and then use the minimization algorithm to get the minimal machine. Or you can construct the minimal
FSM directly. In any case, it is:
(FSM diagram omitted.)
(v) (FSM diagram omitted.)
2. (a) (a ∪ b)*a(a ∪ b)*bb
(b) (FSM diagram omitted.)
(c) It's easiest to answer (d) first, and then to consider (c) as a special case.
(d)
[0] = all strings that contain no a
[1] = all strings that end with a
[2] = all strings that end with ab
[3] = all strings that contain at least one a and that end with bb, i.e., all strings in L.
It is clear that these classes are pairwise disjoint and that their union is {a, b}*. Thus they represent a partition of {a, b}*. It
is also easy to see that they are the equivalence classes of RL of the Myhill-Nerode Theorem, since all the members of one
equivalence class will, when suffixed by any string z, form strings all of which are in L or all of which are not in L. Further,
for any x and y from different equivalence classes, it is easy to find a z such that one of xz, yz is in L and the other is not.
Letting the equivalence relation RL be restricted to the set in part (c), gives the partition
{{bb}, {a, bba, abba, bbaa, baaba}, {bab, aab}, {abb}}.
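The same partition can be computed mechanically. A small Python sketch (ours, not the packet's) classifies each sample string by the two features the classes above track: whether an a has been seen yet, and how the string ends.

def rl_class(w):
    """Which of the four equivalence classes of R_L the string w is in,
    for L = {x : x contains at least one a and ends in at least two b's}."""
    if "a" not in w:
        return 0            # [0]: contains no a
    if w.endswith("bb"):
        return 3            # [3]: in L
    if w.endswith("b"):
        return 2            # [2]: ends with ab
    return 1                # [1]: ends with a

samples = ["a", "bb", "bab", "abb", "bba", "aab", "abba", "bbaa", "baaba"]
groups = {}
for w in samples:
    groups.setdefault(rl_class(w), []).append(w)
print(groups)   # {1: ['a','bba','abba','bbaa','baaba'], 0: ['bb'], 2: ['bab','aab'], 3: ['abb']}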
3. (a)
[1] = {ε}
[2] = b(a ∪ b)*
[3] = a ∪ a(a ∪ b)*a
[4] = a(a ∪ b)*b
(b) (FSM diagram omitted.)
4. (a) F, (b) T, (c) T (Σ* is infinite and the number of equivalence classes is finite), (d) T, (e) F.
5. Choose any two distinct strings of a's: call them a^i and a^j (i < j). Then they must be in different equivalence classes of RL
since a^ib^ic^i ∈ L but a^jb^ic^i ∉ L. Therefore, there is an infinite number of equivalence classes and L is not regular.
6. (a) M1, which recognizes L1, is: (FSM diagram omitted.)
M2, which recognizes L2, is: (FSM diagram omitted.)
s = {1, 2, 5}
F = K - {2, 8}, i.e., all states except {2, 8} are final states.
You may find it useful at this point to draw this out.
Step (5) ML = M4 except that now there is a single final state, {2, 8}.
ML is deterministic. M4 is deterministic by construction, and Step 5 can not introduce any nondeterminism since it doesn't
introduce any transitions.
(b)
(c)
{1, 2, 5}
{3, 5}
{2, 6}
{2, 5}
{4, 6}
{2, 7}
{4, 5}
{4, 7}
{2, 8}
{4, 8}
{3, 8}
(d)
[ε]
[strings without bbb but with a single a at the end]
[strings without bbb, with any a's in pairs, and ending in a single b]
[strings without bbb and with at least one pair of a's and any a's in pairs]
[strings that cannot be made legal because they have a single a followed by a b and where every b is preceded by an a and the last character is b]
[strings without bbb, with any a's in pairs, and ending with just two b's]
[strings that cannot be made legal because they have a single a followed by b and where there is no bbb and the last character is a]
[strings that cannot be made legal because they have a single a followed by a b and where there is no bbb but there is at least one bb and the last character is b]
[all strings in L]
[strings that cannot be made legal because they have a single a followed by a b and where there is a bbb, but the ab violation came before the first bbb]
[strings with bbb but with a single a at the end]
{1, 2, 5}, {2, 5} = 1
{3, 5} = 2
{4, 6}, {4, 5}, {4, 7}, {4, 8} = 3
{2, 6} = 4
{2, 7} = 5
{2, 8} = 6
{3, 8} = 7
(FSM diagram of the minimized machine omitted.)
CS 341 Homework 11
Context-Free Grammars
1. Consider the grammar G = (V, Σ, R, S), where
V = {a, b, S, A},
Σ = {a, b},
R = { S → AA,
A → AAA,
A → a,
A → bA,
A → Ab }.
(a) Which strings of L(G) can be produced by derivations of four or fewer steps?
(b) Give at least four distinct derivations for the string babbab.
(c) For any m, n, p > 0, describe a derivation in G of the string b^mab^nab^p.
2. Construct context-free grammars that generate each of these languages:
(a) {wcw^R : w ∈ {a, b}*}
(b) {ww^R : w ∈ {a, b}*}
(c) {w ∈ {a, b}* : w = w^R}
3. Consider the alphabet Σ = {a, b, (, ), ∪, *, ∅}. Construct a context-free grammar that generates all strings in
Σ* that are regular expressions over {a, b}.
4. Let G be a context-free grammar and let k > 0. We let Lk(G) ⊆ L(G) be the set of all strings that have a
derivation in G with k or fewer steps.
(a) What is L5(G), where G = ({S, (, )}, {(, )}, {S → ε, S → SS, S → (S)})?
(b) Show that, for all context-free grammars G and all k > 0, Lk(G) is finite.
5. Let G = (V, Σ, R, S), where
V = {a, b, S},
Σ = {a, b},
R = { S → aSb,
S → aSa,
S → bSa,
S → bSb,
S → ε }.
Show that L(G) is regular.
6. A program in a procedural programming language, such as C or Java, consists of a list of statements, where
each statement is one of several types, such as:
(1) assignment statement, of the form id := E, where E is any arithmetic expression (generated by the grammar
using T and F that we presented in class).
(2) conditional statement, e.g., "if E < E then statement", or while statement , e.g. "while E < E do statement".
(3) goto statement; furthermore each statement could be preceded by a label.
(4) compound statement, i.e., many statements preceded by a begin, followed by an end, and separated by ";".
Give a context-free grammar that generates all possible statements in the simplified programming language
described above.
7. Show that the following languages are context free by exhibiting context-free grammars generating each:
(a) {ambn : m n}
(b) {a^mb^nc^pd^q : m + n = p + q}
(c) {w ∈ {a, b}* : w has twice as many b's as a's}
(d) {uawb : u, w ∈ {a, b}*, |u| = |w|}
8. Let Σ = {a, b, c}. Let L be the language of prefix arithmetic defined as follows:
(i) any member of Σ is a well-formed expression (wff).
(ii) if α and β are any wffs, then so are Aαβ, Sαβ, Mαβ, and Dαβ.
(iii) nothing else is a wff.
(One might think of A, S, M, and D as corresponding to the operators +, -, ×, /, respectively. Thus in L we could
write Aab instead of the usual (a + b), and MSabDbc instead of ((a - b) × (b/c)). Note that parentheses are
unnecessary to resolve ambiguities in L.)
(a) Write a context-free grammar that generates exactly the wff's of L.
(b) Show that L is not regular.
9. Consider the language L = {amb2nc3ndp : p > m, and m, n ≥ 1}.
(a) What is the shortest string in L?
(b) Write a context-free grammar to generate L.
Solutions
1. (a) We can do an exhaustive search of all derivations of length no more than 4:
S ⇒ AA ⇒ aA ⇒ aa
S ⇒ AA ⇒ aA ⇒ abA ⇒ aba
S ⇒ AA ⇒ aA ⇒ aAb ⇒ aab
S ⇒ AA ⇒ bAA ⇒ baA ⇒ baa
S ⇒ AA ⇒ bAA ⇒ bAa ⇒ baa
S ⇒ AA ⇒ AbA ⇒ abA ⇒ aba
S ⇒ AA ⇒ AbA ⇒ Aba ⇒ aba
S ⇒ AA ⇒ Aa ⇒ aa
S ⇒ AA ⇒ Aa ⇒ bAa ⇒ baa
S ⇒ AA ⇒ Aa ⇒ Aba ⇒ aba
S ⇒ AA ⇒ AbA ⇒ abA ⇒ aba
S ⇒ AA ⇒ AbA ⇒ Aba ⇒ aba
S ⇒ AA ⇒ AAb ⇒ aAb ⇒ aab
S ⇒ AA ⇒ AAb ⇒ Aab ⇒ aab
Many of these correspond to the same parse trees, just applying the rules in different orders. In any case, the
strings that can be generated are: aa, aab, aba, baa.
(b) Notice that A ⇒ bA ⇒ bAb ⇒ bab, and also that A ⇒ Ab ⇒ bAb ⇒ bab. This suggests 8 distinct
derivations:
S ⇒ AA ⇒ AbA ⇒ AbAb ⇒ Abab ⇒* babbab
S ⇒ AA ⇒ AAb ⇒ AbAb ⇒ Abab ⇒* babbab
S ⇒ AA ⇒ bAA ⇒ bAbA ⇒ babA ⇒* babbab
S ⇒ AA ⇒ AbA ⇒ bAbA ⇒ babA ⇒* babbab
where each of these four has 2 ways to reach babbab in the last steps. And, of course, one could interleave the
productions rather than doing all of the first A, then all of the second A, or vice versa.
(c) This is a matter of formally describing a sequence of applications of the rules in terms of m, n, p that will
produce the string bmabnabp.
S
⇒  AA             /* by rule S → AA */
⇒* bmAA           /* by m applications of rule A → bA */
⇒  bmaA           /* by rule A → a */
⇒* bmabnA         /* by n applications of rule A → bA */
⇒* bmabnAbp       /* by p applications of rule A → Ab */
⇒  bmabnabp       /* by rule A → a */
Clearly this derivation (and some variations on it) produce bmabnabp for each m, n, p.
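As a sanity check on this recipe, here is a small illustrative sketch (mine, not part of the original solution) that carries out exactly this derivation by repeatedly rewriting the leftmost A; the function name and the use of Python are my own choices.

    def derive(m, n, p):
        s = "S"
        s = s.replace("S", "AA", 1)            # S -> AA
        for _ in range(m):
            s = s.replace("A", "bA", 1)        # A -> bA, applied to the first A
        s = s.replace("A", "a", 1)             # A -> a
        for _ in range(n):
            s = s.replace("A", "bA", 1)        # A -> bA, applied to the remaining A
        for _ in range(p):
            s = s.replace("A", "Ab", 1)        # A -> Ab
        return s.replace("A", "a", 1)          # A -> a

    print(derive(2, 1, 3))                     # bbababbb, i.e. b2 a b1 a b3

For any m, n, p > 0 the result has the required form, mirroring the derivation above.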
2. (a) G = (V, Σ, R, S) with V = {S, a, b, c}, Σ = {a, b, c}, R = {
S → aSa
S → bSb
S → c
}.
(b) Same as (a) except remove c from V and Σ and replace the last rule, S → c, by S → ε.
(c) This language is very similar to the language of (b). (b) was all even length palindromes; this is all
palindromes. We can use the same grammar as (b) except that we must add two rules:
S → a
S → b
3. This is easy. Recall the inductive definition of regular expressions that was given in class:
1. ∅ and each member of Σ is a regular expression.
2. If α, β are regular expressions, then so is αβ.
3. If α, β are regular expressions, then so is α ∪ β.
4. If α is a regular expression, then so is α*.
5. If α is a regular expression, then so is (α).
6. Nothing else is a regular expression.
This definition provides the basis for a grammar for regular expressions:
G = (V, Σ, R, S) with V = {S, a, b, (, ), ∪, *, ∅}, Σ = {a, b, (, ), ∪, *, ∅}, R = {
S → ∅          /* part of rule 1, above */
S → a          /* " */
S → b          /* " */
S → SS         /* rule 2 */
S → S∪S        /* rule 3 */
S → S*         /* rule 4 */
S → (S)        /* rule 5 */
}
4. (a) We omit derivations that don't produce strings in L (i.e., they still contain nonterminals).
L1: S ⇒ ε
L2: S ⇒ (S) ⇒ ()
L3: S ⇒ SS ⇒ S ⇒ ε
    S ⇒ (S) ⇒ ((S)) ⇒ (())
L4: S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()
    S ⇒ SS ⇒ S(S) ⇒ (S) ⇒ ()
    S ⇒ (S) ⇒ ((S)) ⇒ (((S))) ⇒ ((()))
L5: S ⇒ SS ⇒ (S)S ⇒ (S)(S) ⇒ ()(S) ⇒ ()()
Another approach you could take is to build a pushdown automaton for L and then derive a grammar from it.
This may be easier simply because PDA's are good at counting. But deriving the grammar isn't trivial either. If
you had a hard time with this one, don't worry.
(d) L = {uawb : u, w ∈ {a, b}*, |u| = |w|}. This one fools some people since you might think that the a and b
are correlated somehow. But consider the simpler language L' = {uaw : u, w ∈ {a, b}*, |u| = |w|}. This one
seems easier. We just need to generate u and w in parallel, keeping something in the middle that will turn into a.
Now back to L: L is just L' with b tacked on the end. So a grammar for L is:
G = ({S, T, a, b}, {a, b}, R, S), where R = {S → Tb, T → aTa, T → aTb, T → bTa, T → bTb, T → a}.
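A quick way to convince yourself that this grammar is right is to sample random derivations and check the shape of what comes out. The following sketch (an illustration of mine, not part of the assignment) does exactly that.

    import random

    # Randomly expand R = {S -> Tb, T -> aTa | aTb | bTa | bTb | a} and check
    # that every generated string has the form u a w b with |u| = |w|.
    def sample():
        def expand_T():
            if random.random() < 0.4:
                return "a"                      # T -> a
            x, y = random.choice("ab"), random.choice("ab")
            return x + expand_T() + y           # T -> xTy
        return expand_T() + "b"                 # S -> Tb

    for _ in range(5):
        s = sample()
        n = (len(s) - 2) // 2                   # |u| = |w| = n
        assert s[n] == "a" and s[-1] == "b"
        print(s)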
8. (a) G = ({F, A, S, M, D, a, b, c}, {A, S, M, D, a, b, c}, R, F), where R = {
F → a
F → b
F → c
F → AFF
F → SFF
F → MFF
F → DFF }
(The operator symbols A, S, M, and D are terminals here; the single nonterminal F generates the wffs, so F is the start symbol.)
(b) First, we let L' = L ∩ A*a*. Then L' = {Anan+1 : n ≥ 0} (n A's followed by n+1 a's). L' can easily be shown to be nonregular using the
Pumping Theorem, so, since the regular languages are closed under intersection, L must not be regular.
9. (a) abbcccdd
(b) G = ({S, X, Y, a, b, c, d}, {a, b, c, d}, R, S), where R is either:
(S → aXdd, X → Xd, X → aXd, X → bbYccc, Y → bbYccc, Y → ε), or
(S → aSd, S → Sd, S → aMdd, M → bbccc, M → bbMccc)
CS 341 Homework 12
Parse Trees
1. Consider the grammar G = ({+, *, (, ), id, T, F, E}, {+, *, (, ), id}, R, E), where
R = {E → E+T, E → T, T → T * F, T → F, F → (E), F → id}.
Give two derivations of the string id * id + id, one of which is leftmost and one of which is not leftmost.
2. Draw parse trees for each of the following:
(a) The simple grammar of English we presented in class and the string "big Jim ate green cheese."
(b) The grammar of Problem 1 and the strings id + (id + id) * id and (id * id + id * id).
3. Present a context-free grammar that generates ∅, the empty language.
4. Consider the following grammar (the start symbol is S; the alphabets are implicit in the rules):
S → SS | AAA | ε
A → aA | Aa | b
(a) Describe the language generated by this grammar.
(b) Give a left-most derivation for the terminal string abbaba.
(c) Show that the grammar is ambiguous by exhibiting two distinct derivation trees for some terminal string.
(d) If this language is regular, give a regular (right linear) grammar generating it and construct the
corresponding FSM. If the language is not regular, prove that it is not.
5. Consider the following language : L = {wRw" : w {a, b}* and w" indicates w with each occurrence of a
replaced by b, and vice versa}. Give a context-free grammar G that generates L and a parse tree that shows that
aababb L.
6. (a) Consider the CFG that you constructed in Homework 11, Problem 2 for {wcwR : w {a, b}*}. How many
derivations are there, using that grammar, for the string aabacabaa?
(b) Show parse tree(s) corresponding to your derivation(s). Is the grammar ambiguous?
7. Consider the language L = {w {a, b}* : w contains equal numbers of a's and b's}
(a) Write a context-free grammar G for L.
(b) Show two derivations (if possible) for the string aabbab using G. Show at least one leftmost derivation.
(c) Do all your derivations result in the same parse tree? If so, see if you can find other parse trees or convince
yourself there are none.
(d) If G is ambiguous (i.e., you found multiple parse trees), remove the ambiguity. (Hint: look out for two
recursive occurrences of the same nonterminal in the right side of a rule, e.g, X XX)
(e) See how many parse trees you get for aabbab using the grammar developed in (d).
Solutions
3. G = ({S}, Σ, R, S), where R is any set of rules that can't produce any strings in Σ*. So, for example, R =
{S → S} does the trick. So does R = ∅.
4. (a) (a*ba*ba*ba*)*
(b) S ⇒ AAA ⇒ aAAA ⇒ abAA ⇒ abAaA ⇒ abbaA ⇒ abbaAa ⇒ abbaba
(c) [Figure: two distinct derivation trees for the same terminal string, which establishes that the grammar is ambiguous.]
(d) G = ({S, S1, B1, B2, B3, a, b}, {a, b}, R, S), where R = {
S → ε, S → S1,
S1 → aS1, S1 → bB1,
B1 → aB1, B1 → bB2,
B2 → aB2, B2 → bB3,
B3 → aB3, B3 → ε, B3 → S1 }
[Figure: the corresponding FSM, with one state per nonterminal (S, S1, B1, B2, B3) and a/b transitions mirroring these rules.]
5. G = ({S, a, b}, {a, b}, R, S), R = { S → aSb, S → bSa, S → ε }
[Figure: the parse tree for aababb, corresponding to the derivation S ⇒ aSb ⇒ aaSbb ⇒ aabSabb ⇒ aababb.]
6. (a) The grammar is G = (V, Σ, R, S) with V = {S, a, b, c}, Σ = {a, b, c}, R = {S → aSa, S → bSb, S → c}.
There is a single derivation:
S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaSabaa ⇒ aabacabaa
(b) There is a single parse tree. The grammar is unambiguous.
7. (a) G = (V, Σ, R, S) with V = {S, a, b}, Σ = {a, b}, R = {
S → aSb
S → bSa
S → ε
S → SS
}
(b) (i) S ⇒ SS ⇒ aSbS ⇒ aaSbbS ⇒ aabbS ⇒ aabbaSb ⇒ aabbab /* This is the leftmost derivation of the most
"sensible" parse. */
(ii) S ⇒ SS ⇒ SSS ⇒ aSbSS ⇒ aaSbbSS ⇒ aabbSS ⇒ aabbaSbS ⇒ aabbabS ⇒ aabbab /* This is the
leftmost derivation of a parse that introduced an unnecessary S in the first step, which was then eliminated by
rewriting it as ε in the final step. */
(c) No. The two derivations shown here have different parse trees. They do, however, have the same
bracketing, [a[ab]b][ab]. (In other words, they have similar essential structures.) They differ only in how the extra S is
introduced and then eliminated. But there are other derivations that correspond to additional parse trees, and
some of them correspond to a completely different bracketing, [a[ab][ba]b]. One derivation that does this is
(iii) S ⇒ aSb ⇒ aSSb ⇒ aaSbSb ⇒ aabSb ⇒ aabbSab ⇒ aabbab
(d) This is tricky. Recall that we were able to eliminate ambiguity in the case of the balanced parentheses
language just by getting rid of ε except at the very top, to allow for the empty string. If we do the same thing here,
we get R = { S → ε
S → T
T → ab
T → aTb
T → ba
T → bTa
T → TT }
But aabbab still has multiple parses in this grammar. This language is different from balanced parens since we
can go back and forth between being ahead on a's and being ahead on b's (whereas, in the paren language, we
must always either be even or be ahead on open paren). So the two parses correspond to the bracketings
[aabb][ab] and [a [ab] [ba] b]. The trouble is the rule T TT, which can get applied at the very top of the tree
(as in the case of the first bracketing shown here), or anywhere further down (as in the case of the second one).
We clearly need some capability for forming a string by concatenating a perfectly balanced string with another
one, since, without that, we'll get no parse for the string abba. Just nesting won't work. We have to be able to
combine nesting and concatenation, but we have to control it. It's tempting to think that maybe an unambiguous
grammar doesn't exist, but it's pretty easy to see how to build a deterministic pda (with a bottom of stack symbol)
to accept this language, so there must be an unambiguous grammar. What we need is the notion of an A region,
in which we are currently ahead on a's, and a B region, in which we are currently ahead on b's. Then at the top
level, we can allow an A region, followed by a B region, followed by an A region and so forth. Think of
switching between regions as what will happen when the stack is empty and we're completely even on the number
of a's and b's that we've seen so far. For example, [ab][ba] is one A region followed by one B region. Once we
enter an A region, we stay in it, always generating an a followed (possibly after something else embedded in the
middle) by a b. After all, the definition of an A region, is that we're always ahead on a's. Only when we are
even, can we switch to a B region. Until then, if we want to generate a b, we don't need to do a pair starting with
b. We know we're ahead on a's, so make any desired b's go with an a we already have. Once we are even, we
must either quit or move to a B region. If we allow for two A regions to be concatenated at the top, there will be
ambiguity between concatenating two A regions at the top vs. staying in a single one. We must, however, allow
two A regions to be concatenated once we're inside an A region. Consider [a[ab][ab]b] Each [ab] is a perfectly
balanced A region and they are concatenated inside the A region whose boundaries are the first a and the final b.
So we must distinguish between concatenation within a region (which only happens with regions of the same
type, e.g, two A's within an A) and concatenation at the very top level, which only happens between different
types.
Also, we must be careful of any rule of the form X → XX for another reason. Suppose we have a string that
corresponds to XXX. Is that the first X being rewritten as two, or the second one being rewritten as two? We
need to force a single associativity.
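One way to see concretely that the grammar above is still ambiguous is to count parse trees by brute force. The sketch below (mine; Python and the function name are my own choices) counts the trees that T assigns to a string under the rules T → ab | aTb | ba | bTa | TT; since T has no ε-rule, the count is finite.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def trees_T(w):
        n = 0
        if w in ("ab", "ba"):
            n += 1                                   # T -> ab, T -> ba
        if len(w) > 2 and w[0] == "a" and w[-1] == "b":
            n += trees_T(w[1:-1])                    # T -> aTb
        if len(w) > 2 and w[0] == "b" and w[-1] == "a":
            n += trees_T(w[1:-1])                    # T -> bTa
        for i in range(1, len(w)):                   # T -> TT, every split point
            n += trees_T(w[:i]) * trees_T(w[i:])
        return n

    print(trees_T("aabbab"))    # prints 2: one tree per bracketing above

The two trees it finds correspond exactly to the bracketings [aabb][ab] and [a[ab][ba]b] discussed above.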
CS 341 Homework 13
Pushdown Automata
1. Consider the pushdown automaton M = (K, Σ, Γ, Δ, s, F), where
K = {s, f},
F = {f},
Σ = {a, b},
Γ = {a},
Δ = {((s, a, ε), (s, a)), ((s, b, ε), (s, a)), ((s, a, ε), (f, ε)), ((f, a, a), (f, ε)), ((f, b, a), (f, ε))}.
(a) Trace all possible sequences of transitions of M on input aba.
(b) Show that aba, aa, abb ∉ L(M), but baa, bab, baaaa ∈ L(M).
(c) Describe L(M) in English.
2. Construct pushdown automata that accept each of the following:
(a) L = the language generated by the grammar G = (V, Σ, R, S), where
V = {S, (, ), [, ]},
Σ = {(, ), [, ]},
R = { S → ε,
S → SS,
S → [S],
S → (S)}.
(b) L = {ambn : m ≤ n ≤ 2m}.
(c) L = {w ∈ {a, b}* : w = wR}.
(d) L = {w ∈ {a, b}* : w has equal numbers of a's and b's}.
(e) L = {w ∈ {a, b}* : w has twice as many a's as b's}.
(f) L = {ambn : m n }
(g) L = {uawb: u and w {a, b}* and |u| = |w|}
3. Consider the following language: L = {wRw" : w ∈ {a, b}* and w" indicates w with each occurrence of a
replaced by b, and vice versa}. In Homework 12, problem 5, you wrote a context-free grammar for L. Now give
a PDA M that accepts L and trace a computation that shows that aababb ∈ L.
4. Construct a context-free grammar for the language of problem 2(b): L = {ambn : m ≤ n ≤ 2m}.
Solutions
1. (a) There are three possible computations of M on aba:
(s, aba, ε) |- (s, ba, a) |- (s, a, aa) |- (s, ε, aaa)
(s, aba, ε) |- (s, ba, a) |- (s, a, aa) |- (f, ε, aa)
(s, aba, ε) |- (f, ba, ε)
None of these is an accepting configuration.
(b) This is done by tracing the computation of M on each of the strings, as shown in (a).
(c) L(M) is the set of strings whose middle symbol is a. In other words,
L(M) = {xay ∈ {a, b}* : |x| = |y|}.
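Because the transition relation is small, it is easy to check claims like the ones in (b) mechanically. The following sketch (an illustration of mine, assuming the usual convention that M accepts when a final state is reached with empty input and empty stack) explores all computations of M.

    # Each transition is ((state, input read, stack popped), (state, stack pushed)).
    DELTA = [
        (("s", "a", ""), ("s", "a")),   # in s, read a, push an a
        (("s", "b", ""), ("s", "a")),   # in s, read b, push an a
        (("s", "a", ""), ("f", "")),    # guess that this a is the middle symbol
        (("f", "a", "a"), ("f", "")),   # in f, read a, pop an a
        (("f", "b", "a"), ("f", "")),   # in f, read b, pop an a
    ]

    def accepts(w):
        configs = {("s", w, "")}        # (state, remaining input, stack, top first)
        while configs:
            nxt = set()
            for (q, rest, stack) in configs:
                if q == "f" and rest == "" and stack == "":
                    return True
                for ((q1, c, pop), (q2, push)) in DELTA:
                    if q1 == q and rest.startswith(c) and stack.startswith(pop):
                        nxt.add((q2, rest[len(c):], push + stack[len(pop):]))
            configs = nxt
        return False

    for w in ["aba", "aa", "abb", "baa", "bab", "baaaa"]:
        print(w, accepts(w))    # False for the first three, True for the last three

Every transition of M consumes an input symbol, so the search always terminates.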
2. (a) Notice that the square brackets and the parentheses must be properly nested. So the strategy will be to push
the open brackets and parens and pop them against matching close brackets and parens as they are read in. We
only need one state, since all the counting will be done on the stack. Since ε ∈ L, the start state can be final.
Thus we have M = ({s}, {(, ), [, ]}, {(, [}, Δ, s, {s}), where Δ is as follows (sorry about the confusing use of parentheses both as
part of the notation and as symbols in the language):
To make this work, we need to be able to tell if the stack is empty, since that's the only case where we might
consider pushing either a or b. Recall that we can't do that just by writing ε as the stack character, since ε
always matches, even if the stack is not empty. So we'll start by pushing a special character # onto the bottom of
the stack. We can then check to see if the stack is empty by seeing if # is on top. We can do all the real work in
our PDA in a single state. But, because we're using the bottom of stack symbol #, we need two additional states:
the start state, in which we do nothing except push # and move to the working state, and the final state, which we
get to once we've popped # and can then do nothing else. Considering all these issues, we get M = ({s, q, f}, {a,
b}, {#, a, b}, Δ, s, {f}), where
Δ = { ((s, ε, ε), (q, #)),      /* push # and move to the working state q */
((q, a, #), (q, a#)),           /* the stack is empty and we've got an a, so push it */
((q, a, a), (q, aa)),           /* the stack is counting a's and we've got another one, so push it */
((q, b, a), (q, ε)),            /* the stack is counting a's and we've got a b, so cancel the a and the b */
((q, b, #), (q, b#)),           /* the stack is empty and we've got a b, so push it */
((q, b, b), (q, bb)),           /* the stack is counting b's and we've got another one, so push it */
((q, a, b), (q, ε)),            /* the stack is counting b's and we've got an a, so cancel the b and the a */
((q, ε, #), (f, ε)) }.          /* the stack is empty of a's and b's; pop the # and quit */
To convince yourself that M does the job, you should show that M does in fact maintain the invariant we stated
above.
The only nondeterminism in this machine involves the last transition in which we guess that we're at the end of
the input. There is an alternative way to solve this problem in which we don't bother with the bottom of stack
symbol #. Instead, we substitute a lot of nondeterminism and we sometimes push a's on top of b's, and so forth.
Most of those paths will end up in dead ends. The machine has fewer states but is harder to analyze. Try to
construct it if you like.
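The stack discipline that Δ enforces is easy to restate directly, and doing so is a good way to check the invariant. The sketch below (mine, not part of the original solution) mirrors the rules of Δ deterministically, with the single end-of-input guess replaced by a final check that only # remains.

    def has_equal_as_and_bs(w):
        stack = ["#"]
        for ch in w:
            top = stack[-1]
            if ch == "a":
                if top == "b":
                    stack.pop()          # ((q, a, b), (q, e)): cancel a b
                else:
                    stack.append("a")    # stack is empty (#) or counting a's: push
            elif ch == "b":
                if top == "a":
                    stack.pop()          # ((q, b, a), (q, e)): cancel an a
                else:
                    stack.append("b")    # stack is empty (#) or counting b's: push
            else:
                return False
        return stack == ["#"]            # only # left: equal numbers, so accept

    print(has_equal_as_and_bs("abba"), has_equal_as_and_bs("aab"))   # True False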
(e) This one is similar to (d) except that there are two a's for every b. Recall the two techniques for matching two
to one that we discussed in our solution to (b). This time, though, we do know that there are always two a's to
every b. We don't need nondeterminism to allow for either one or two. But, because we no longer know that all
the a's come first, we do need to consider what to do in the two cases: (1) We're counting b's on the stack; and (2)
We're counting a's on the stack. If we're counting b's, let's take the approach in which we push two b's every time
we see one. Then, when we go to cancel a's, we can just pop one b for each a. If we see twice as many a's as b's,
we'll end up with an empty stack. Now what if we're counting a's? We'll push one a for every one we see. When
we see b, we pop two a's. The only special case we need to consider arises in strings such as "aba", where we'll
only have seen a single a at the point at which we see the b. What we need to do is to switch from counting a's to
counting b's, since the b counts twice. Thus the invariant that we want to maintain is now
(Number of a's read so far) - 2*(Number of b's read so far)
=
(Number of a's on stack) - (Number of b's on stack)
We can do all this with M = ({s, q, f}, {a, b}, {#, a, b}, Δ, s, {f}), where
Δ = { ((s, ε, ε), (q, #)),      /* push # and move to the working state q */
((q, a, #), (q, a#)),           /* the stack is empty and we've got an a, so push it */
((q, a, a), (q, aa)),           /* the stack is counting a's and we've got another one, so push it */
((q, b, aa), (q, ε)),           /* the stack is counting a's and we've got a b, so cancel aa and b */
((q, b, a#), (q, b#)),          /* the stack contains a single a and we've got a b, so cancel the a and the b
                                   and start counting b's, since we have a shortage of one a */
((q, b, #), (q, bb#)),          /* the stack is empty and we've got a b, so push two b's */
((q, b, b), (q, bbb)),          /* the stack is counting b's and we've got another one, so push two */
((q, a, b), (q, ε)),            /* the stack is counting b's and we've got an a, so cancel a b and the a */
((q, ε, #), (f, ε)) }.          /* the stack is empty of a's and b's; pop the # and quit */
(g) The idea here is to create a nondeterministic machine. In the start state (1), it reads a's and b's, and for each
character seen, it pushes an x on the stack. Thus it counts the length of u. If it sees an a, it may also guess that
this is the required separator a and go to state 2. In state 2, it reads a's and b's, and for each character seen, pops
an x off the stack. If there's nothing to pop, the machine will fail. If it sees a b, it may also guess that this is the
required final b and go to the final state, state 3. The machine will then accept if both the input and the stack are
empty.
M = ({1, 2, 3}, {a, b}, {x}, Δ, 1, {3}), where Δ = {
((1, a, ε), (1, x)),
((1, b, ε), (1, x)),
((1, a, ε), (2, ε)),
((2, a, x), (2, ε)),
((2, b, x), (2, ε)),
((2, b, ε), (3, ε)) }
3. [Figure: a two-state PDA for L. In the first state it pushes each symbol it reads (a//a, b//b); on an ε-move it guesses that it has reached the middle and changes state; from then on it pops the opposite symbol from the one it reads (a/b/, b/a/), accepting when the input and the stack are both empty.]
4. S → ε
S → aSb
S → aSbb
CS 341 Homework 14
Pushdown Automata and Context-Free Grammars
1. In class, we described an algorithm for constructing a PDA to accept a language L, given a context free
grammar for L. Let L be the balanced brackets language defined by the grammar G = ({S, [, ]}, {[, ]}, R, S),
where R =
{ S → ε, S → SS, S → [S] }
Apply the construction algorithm to this grammar to derive a PDA that accepts L. Trace the operation of the
PDA you have constructed on the input string [[][]].
2. Consider the following PDA M:
[Figure: the transition diagram of M.]
3. Don't even try to use the grammar construction algorithm. Just observe that L = {anbnbmcp : m ≥ p, and n, p
≥ 0}, or, alternatively, {anbmcp : m ≥ n + p, and n, p ≥ 0}. It can be generated by the following rules:
S → S1S2
S1 → aS1b       /* S1 generates the anbn part. */
S1 → ε
S2 → bS2        /* S2 generates the bmcp part. */
S2 → bS2c
S2 → ε
4. (a) [Figure: the PDA's transition diagram; states 2 through 7 play the roles described below.]
We use state 2 to skip over an arbitrary number of bai groups that aren't involved in the required mismatch.
We use state 3 to count the first group of a's we care about.
We use state 4 to count the second group and make sure it's not equal to the first.
We use state 5 to skip over an arbitrary number of bai groups in between the two we care about.
We use state 6 to clear the stack in the case that the second group had fewer a's than the first group did.
We use state 7 to skip over any remaining bai groups that aren't involved in the required mismatch.
(b)
S → A'bLA'
S → A'bRA'
L → ab | aL | aLa       /* L will take care of two groups where the first group has more a's */
R → ba | Ra | aRa       /* R will take care of two groups where the second group has more a's */
A' → bAA' | ε
A → aA | ε
(c)
CS 341 Homework 15
Parsing
1. Show that the following languages are deterministic context free.
(a) {ambn : m n}
(b) {wcwR : w ∈ {a, b}*}
(c) {cambm : m ≥ 0} ∪ {damb2m : m ≥ 0}
(d) {amcbm : m ≥ 0} ∪ {amdb2m : m ≥ 0}
2. Consider the context-free grammar: G = (V, Σ, R, S), where V = {(, ), ., a, S, A}, Σ = {(, ), ., a}, and R =
{ S → (),
S → a,
S → (A),
A → S,
A → A.S}
(If you are familiar with the programming language LISP, notice that L(G) contains all
atoms and lists, where the symbol a stands for any non-null atom.)
(a) Apply left factoring and the rule for getting rid of left recursion to G. Let G' be the resulting grammar.
Argue that G' is LL(1). Construct a deterministic pushdown automaton M that accepts L(G)$ by doing a top
down parse. Study the computation of M on the string ((()).a).
(b) Repeat Part (a) for the grammar resulting from G if one replaces the first rule by A .
(c) Repeat Part (a) for the grammar resulting from G if one replaces the last rule by A S.A.
3. Answer each of the following questions True or False. If you choose false, you should be able to state a
counterexample.
(a) If a language L can be described by a regular expression, we can be sure it is a context-free language.
(b) If a language L cannot be described by a regular expression, we can be sure it is not a context-free
language.
(c) If L is generated by a context-free grammar, then L cannot be regular.
(d) If there is no pushdown automaton accepting L, then L cannot be regular.
(e) If L is accepted by a nondeterministic finite automaton, then there is some deterministic PDA accepting L.
(f) If L is accepted by a deterministic PDA, then L' (the complement of L) must be regular.
(g) If L is accepted by a deterministic PDA, then L' must be context free.
(h) If, for a given L in {a, b}*, there exist x, y, z, such that y ≠ ε and xynz ∈ L for all n ≥ 0, then L must be
regular.
(i) If, for a given L in {a, b}*, there do not exist u, v, x, y, z such that |vy| ≥ 1 and uvnxynz ∈ L for all n ≥ 0,
then L cannot be regular.
(j) If L is regular and L = L1 ∩ L2 for some L1 and L2, then at least one of L1 and L2 must be regular.
(k) If L is context free and L = L1L2 for some L1 and L2, then L1 and L2 must both be context free.
(l) If L is context free, then L* must be regular.
(m) If L is an infinite context-free language, then in any context-free grammar generating L there exists at least
one recursive rule.
(n) If L is an infinite context-free language, then there is some context-free grammar generating L that has no
rule of the form A B, where A and B are nonterminal symbols.
(o) Every context-free grammar can be converted into an equivalent regular grammar.
(p) Given a context-free grammar generating L, every string in L has a right-most derivation.
4. Recall problem 4 from Homework 12. It asked you to consider the following grammar for a language L (the
start symbol is S; the alphabets are implicit in the rules):
S → SS | AAA | ε
A → aA | Aa | b
(a) It is not possible to convert this grammar to an equivalent one in Chomsky Normal Form. Why not?
Homework 15
Parsing
(b) Modify the grammar as little as possible so that it generates L - {ε}. Now convert this new grammar to
Chomsky Normal Form. Is the resulting grammar still ambiguous? Why or why not?
(c) From either the original grammar for L - {ε} or the one in Chomsky Normal Form, construct a PDA that
accepts L - {ε}.
5. Consider the following language : L = {wRw" : w {a, b}* and w" indicates w with each occurrence of a
replaced by b, and vice versa}. In Homework 12, problem 5, you wrote a context-free grammar for L. Then, in
Homework 13, problem 3, you wrote a PDA M that accepts L and traced one of its computations. Now decide
whether you think L is deterministic context free. Defend your answer.
6. Convert the following grammar for arithmetic expressions to Chomsky Normal Form:
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id
7. Again, consider the grammar for arithmetic expressions given in Problem 6. Walk through the process of
doing a top down parse of the following strings using that grammar. Point out the places where a decision has to
be made about what to do.
(a) id * id + id
(b) id * id * id
Solutions
1. (a) L = {ambn : m n}. To show that a language L is deterministic context free, we need to show a
deterministic PDA that accepts L$. We did that for L = {ambn : m n} in class. (See Lecture Notes 14).
(b) L = {wcwR : w {a, b}*}. In class (again see Lecture Notes 14), we built a deterministic PDA to accept L
= {wcwR : w {a, b}*}. Its easy to turn it into a deterministic PDA that accepts L$.
(c) L = {cambm : m ≥ 0} ∪ {damb2m : m ≥ 0}. Often it's hard to build a deterministic PDA for a language that is
formed by taking the union of two other languages. For example, {ambm : m ≥ 0} ∪ {amb2m : m ≥ 0} would be
hard (in fact it's impossible) because we have no way of knowing, until we run out of b's, whether we're
expecting two b's for each a or just one. However, {cambm : m ≥ 0} ∪ {damb2m : m ≥ 0} is actually quite easy.
Every string starts with a c or a d. If it's a c, then we know to look for one b for each a; if it's a d, then we know
to look for two. So the first thing we do is to start our machine like this:
[Figure: from the start state 0, a c-transition leads to state 1 and a d-transition leads to state 2.]
The machine that starts in state 1 is our classic machine for anbn, except of course that it must have a final
transition on $ to its final state.
We have two choices for the machine that starts in state 2. It can either push one a for every a it sees, and then
pop an a for every pair of b's, or it can push two a's for every a it sees, and then pop one a for every b.
(d) L = {amcbm : m ≥ 0} ∪ {amdb2m : m ≥ 0}. Now we've got another unioned language. But this time we don't
get a clue from the first character which part of the language we're dealing with. That turns out to be okay
though, because we do find out before we have to start processing the b's whether we've got two b's for each a or
just one. Recall the two approaches we mentioned for machine 2 in the last problem. What we need here is the
first, the one that pushes a single a for each a it sees. Then, when we see a c or d, we branch and either pop an a
for each b or pop an a for every two b's.
2. (a) We need to apply left factoring to the two rules S → () and S → (A). We also need to eliminate the left
recursion from A → A.S. Applying left factoring gives the first set of rules shown here; then getting rid of the left
recursion gives the second:
After left factoring:               S → (S',  S' → ),  S' → A),  S → a,  A → S,  A → A.S
After removing the left recursion:  S → (S',  S' → ),  S' → A),  S → a,  A → SA',  A' → .SA',  A' → ε
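The resulting grammar really is LL(1): each nonterminal can decide what to do by looking only at the next input symbol. A small recursive-descent sketch (mine; the procedure names are my own) makes this concrete and parses the string studied in part (a).

    # Parses the grammar S -> (S' | a ; S' -> ) | A) ; A -> SA' ; A' -> .SA' | e,
    # reading one lookahead symbol per decision.
    def parse(s):
        s = s + "$"
        pos = 0
        def peek():
            return s[pos]
        def eat(ch):
            nonlocal pos
            assert s[pos] == ch, "expected " + ch + " at position " + str(pos)
            pos += 1
        def S():
            if peek() == "(":
                eat("("); Sprime()
            else:
                eat("a")                  # S -> a
        def Sprime():
            if peek() == ")":
                eat(")")                  # S' -> )
            else:
                A(); eat(")")             # S' -> A)
        def A():
            S(); Aprime()                 # A -> SA'
        def Aprime():
            if peek() == ".":
                eat("."); S(); Aprime()   # A' -> .SA'
            # otherwise A' -> e: consume nothing
        S()
        eat("$")
        return True

    print(parse("((()).a)"))   # True: the string from part (a) is accepted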
(b) Notice that the point of the first rule, which was S → (), was to get a set of parentheses with no A inside.
An alternative way to do that is to dump that rule but to add the rule A → ε. Now we always introduce an A
when we expand S, but we can get rid of it later. If we do this, then there's no left factoring to be done. We still
have to get rid of the left recursion on A, just as we did above, however.
(c) If we change A → A.S to A → S.A, then there's no left recursion to get rid of and we can leave the rules
unchanged. Notice, though, that we'll get different parse trees this way, which may or may not be important. To
see this, consider the string (a.a.a) and parse it using both the original grammar and the one we get if we change
the last rule.
3. (a) True, since all regular languages are context-free.
(b) False, there exist languages that are context-free but not regular.
(c) False. All regular languages are also context-free and thus are generated by context-free grammars.
(d) True, since if L were regular, it would also be context free and thus would be accepted by some PDA.
(e) True, since there must also be a deterministic FSM and thus a deterministic PDA.
(f) False. Consider L = anbn. L' = {w ∈ {a, b}* : either some b comes before some a or there is an unequal
number of a's and b's}. Clearly this language is not regular since we can't count the a's and b's.
(g) True, since the deterministic context-free languages are closed under complement.
(h) False. Suppose L = anc*bn, which is clearly not regular. Let x = aa, y = c, and z = bb. xynz ∈ L.
(i) False. L could be finite.
(j) False. L1 could be anbn and L2 could be {anbm : n ≠ m}. Neither is regular. But L1 ∩ L2 = ∅, which
is regular.
(k) False. Let L1 = a* and L2 = {anbmcm : n ≤ m}. L2 is not context free. But L = L1L2 = a*bmcm, which is
context free.
(l) False. Let L = wwR.
(m) True.
(n) True, since we have a procedure for eliminating such unit productions.
(o) False, since there exist context-free languages that are not regular.
(p) True.
4. (a) No grammar in Chomsky Normal Form can generate ε, yet ε ∈ L.
(b) In the original grammar, we could generate zero copies of AAA (by letting S go to ε), one copy of AAA
(by letting S go to AAA), two copies (by letting S go to SS and then each of them to AAA), three copies of AAA
(by letting S go to SS, then one of the S's goes to SS, then all three go to AAA), and so forth. We want to make
sure that we can still get one copy, two copies, three copies, etc. We just want to eliminate the zero copies
option. Note that the only role of S is to determine how many copies of AAA are produced. Once we have
generated A's we can never go back and apply the S rules again. So all we have to do is to eliminate the
production S → ε. The modified grammar to accept L - {ε} is thus:
G = ({S, A, a, b}, {a, b}, R, S), where R = {
S → SS | AAA
A → aA | Aa | b }
If we convert this grammar to Chomsky Normal Form, we get:
G = ({S, A, B, C, a, b}, {a, b}, R, S), where R = {
S → SS      S → AB      B → AA
A → CA      A → AC      A → b
C → a }
This grammar is still ambiguous.
(c) (from the grammar of part (b)): M = ({p, q}, {a, b}, {S, A, a, b}, Δ, p, {q}),
Δ = { ((p, ε, ε), (q, S)),
((q, ε, S), (q, SS)),
((q, ε, S), (q, AAA)),
((q, ε, A), (q, aA)),
((q, ε, A), (q, Aa)),
((q, ε, A), (q, b)),
((q, a, a), (q, ε)),
((q, b, b), (q, ε)) }
5. L is not deterministic context free for essentially the same reason that wwR is not.
6. The original grammar was:
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id
Step 2. There are no ε-rules to remove. We show steps 3, 4, and 5 next, so it's clear where the rules produced in
steps 4 and 5 came from.
Step 3 (eliminate the unit productions E → T and T → F). We keep
E → E+T
T → T*F
F → (E)
F → id
Then we add:
E → T*F
E → (E)
E → id
T → (E)
T → id
Step 4 (replace the terminals +, *, (, and ) in rules whose right-hand sides are longer than one symbol, using the
new rules P → +, M → *, L → (, R → )):
E → EPT      T → TMF      F → LER
E → TMF      T → LER      F → id
E → LER      T → id
E → id
P → +    M → *    L → (    R → )
Step 5 (break up right-hand sides that are longer than two symbols):
E → EE'      E' → PT
E → TT'      (since T' → MF)
E → LF'      (since F' → ER)
E → id
T → TT'      T' → MF
T → LF'      (since F' → ER)
T → id
F → LF'      F' → ER
F → id
P → +    M → *    L → (    R → )
The Step 5 rules form the final grammar in Chomsky Normal Form.
CS 341 Homework 16
Languages that Are and Are Not Context-Free
1. Show that the following languages are context-free. You can do this by writing a context free grammar or a
PDA, or you can use the closure theorems for context-free languages. For example, you could show that L is the
union of two simpler context-free languages.
(a) L = ancbn
(b) L = {a, b}* - {anbn : n ≥ 0}
(c) L = {ambncpdq : n = q, or m p or m + n = p + q}
(d) L = {a, b}* - L1, where L1 is the language {b a b a2 b a3 b ... b an-1 b an b : n ≥ 1}.
2. Show that the following languages are not context-free.
(a) L = { an2 : n ≥ 0} (that is, strings of a's whose length is a perfect square)
(b) L = {www : w ∈ {a, b}*}
(c) L = {w ∈ {a, b, c}* : w has equal numbers of a's, b's, and c's}
(d) L = {anbman : n ≥ m}
(e) L = {anbmcnd(n+m) : m, n ≥ 0}
3. Give an example of a context free language ( *) that contains a subset that is not context free. Describe the
subset.
4. What is wrong with the following "proof" that anb2nan is context free?
(1) Both {anbn : n 0} and {bnan : n 0} are context free.
(2) anb2nan = {anbn}{bnan }
(3) Since the context free languages are closed under concatenation, anb2nan is context free.
5. Consider the following context free grammar: G = ({S, A, a, b}, {a, b}, R, S), where R = {
S aAS
Sa
A SbA
A SS
A ba }
(a) Answer each of the following questions True or False:
(i) From the fact that G is context free, it follows that there is no regular expression for L(G).
(ii) L(G) contains no strings of length 3.
(iii) For any string w L(G), there exists u, v, x, y, z such that w = uvxyz, |vy| 1, and uvnxynz L(G)
for all n 0.
(iv) If there exist languages L1 and L2 such that L(G) = L1 L2, then L1 and L2 must both be context
free.
(v) The language (L(G))R is context free.
(b) Give a leftmost derivation according to G of aaaabaa.
(c) Give the parse tree corresponding to the derivation in (b).
(d) Give a nondeterministic PDA that accepts L(G).
6. Show that the following language is context free:
L = {xxRyyRzzR : x, y, z {a, b}*}.
Homework 16
2. (a) L = { an2 : n ≥ 0}. Suppose L were context free. Then we could pump. Let n = M2, so w
is the string with (M2)2, or M4, a's. Clearly |w| ≥ K, since M > K. So uvvxyyz must be in L (whatever v and y
are). But it can't be. Why not? Given our w, the next element of L is the string with (M2+1)2 a's. That's M4 +
2M2 + 1 (expanding it out). But we know that |vxy| ≤ M, so we can't pump in more than M a's when we pump
only once. Thus the string we just created can't have more than M4 + M a's. Clearly not enough.
(b) L = {www : w {a, b}*}. The easiest way to do this is not to prove directly that L = {www : w {a, b}*}
is not context free. Instead, let's consider L1 = L a*ba*ba*b. If L is context free, L1 must also be. L1 =
{anbanbanb : n 0}. To show that L1 is not context free, let's choose w = aMbaMbaMb. First we observe that
neither v nor y can contain b, because if either did, then, when we pump, we'd have more than three b's, which is
not allowed. So both must be in one of the three a regions. We consider the cases:
(1, 1) That group of a's will no longer match the other two, so the string is not in L1.
(2, 2)
"
(3, 3)
"
(1, 2) At least one of these two groups will have something pumped into it and will no longer match the
one that is left out.
(2, 3)
"
(1, 3) excluded since |vxy| M, so vxy can't span the middle region of a's.
(c) L = {w ∈ {a, b, c}* : w has equal numbers of a's, b's, and c's}. Again, the easiest thing to do is first to
intersect L with a regular language. This time we construct L1 = L ∩ a*b*c*. L1 must be
context free if L is. But L1 = anbncn, which we've already proven is not context free. So L isn't either.
(d) L = {anbman : n m}. We'll use the pumping lemma to show that L = {anbman : n m} is not context free.
Choose w = aMbMaM. We know that neither v nor y can cross a and b regions, because if one of them did, then,
when we pumped, we'd get a's and b's out of order. So we need only consider the cases where each is in one of
the three regions of w (the first group of a's, the b's, and the second group of a's.)
(1, 1) The first group of a's will no longer match the second group.
(2, 2) If we pump in b's, then at some point there will be more b's than a's, and that's not allowed.
(3, 3) Analogous to (1, 1)
(1, 2) We must either (or both) pump a's into region 1, which means the two a regions won't match, or,
if y is not empty, we'll pump in b's but then eventually there will be more b's than a's.
(2, 3) Analogous to (1, 2)
(1, 3) Ruled out since |vxy| M, so vxy can't span the middle region of b's.
(e) L = {anbmcnd(n+m) : m, n 0}. We can show that L = {anbmcnd(n+m) : m, n 0} is not context free by
pumping. We choose w = aMbMcMd2M. Clearly neither v nor y can cross regions and include more than one letter,
since if that happened we'd get letters out of order when we pumped. So we only consider the cases where v and
y fall within a single region. We'll consider four regions, corresponding to a, b, c, and d.
(1, 1) We'll change the number of a's and they won't match the c's any more.
(1, 2) If v is not empty, we'll change the a's and them won't match the c's. If y is nonempty, we'll change the
number of b's and then we won't have the right number of d's any more.
(1, 3), (1, 4) are ruled out because |vxy| M, so vxy can't span across any whole regions.
(2, 2) We'll change the number of b's but then we won't have the right number of d's.
(2, 3) If v is not empty, we'll change the b's without changing the d's. If y is not empty, we'll change the c's and
they'll no longer match the a's.
(2, 4) is ruled out because |vxy| M, so vxy can't span across any whole regions.
(3, 3) We'll change the number of c's and they won't match the a's.
(3, 4) If v is not empty, we'll change c's and they won't match a's. If y is not empty, we'll change d's without
changing b's.
(4, 4) We'll change d's without changing a's or b's.
Homework 16
3. Let L = { anbmcp : n = m or m = p}. L is clearly context free. We can build a nondeterministic PDA M to
accept it. M has two forks, one of which compares n to m and the other of which compares m to p (skipping over
the a's). L1 = {anbmcp : n = m and m = p} is a subset of L. But L1 = anbncn, which we know is not context free.
4. (1) is fine. (2) is fine if we don't over interpret it. In particular, although both languages are defined in terms
of the variable n, the scope of that variable is a single language. So within each individual language definition,
the two occurrences of n are correctly interpreted to be occurrences of a single variable, and thus the values must
be same both times. However, when we concatenate the two languages, we still have two separate language
definitions with separate variables. So the two n's are different. This is the key. It means that we can't assume
that, given {anbn}{bnan }, we choose the same value of n for the two strings we choose. For example, we could
get a2b2b3a3, which is a2b5a3, which is clearly not in {anb2nan}.
5. (a)
(i) False, since all regular languages are also context free.
(ii) True.
(iii) False. For example a L, but is not long enough to contain pumpable substrings.
(iv) False.
(v) True, since the context-free languages are closed under reversal.
(b) S ⇒ aAS ⇒ aSSS ⇒ aaSS ⇒ aaaS ⇒ aaaaAS ⇒ aaaabaS ⇒ aaaabaa.
(c) [Figure: the parse tree corresponding to the derivation in (b).]
CS 341 Homework 17
Turing Machines
1. Let M = (K, Σ, δ, s, {h}), where
K = {q0, q1, h},
Σ = {a, b, ⊔, ▷},
s = q0,
and δ is given by the following table:
q     σ     δ(q, σ)
q0    a     (q1, b)
q0    b     (q1, a)
q0    ⊔     (h, ⊔)
q0    ▷     (q0, →)
q1    a     (q0, →)
q1    b     (q0, →)
q1    ⊔     (q0, →)
q1    ▷     (q1, →)
(a) Trace the computation of M starting from the configuration (q0, aabbba).
(b) Describe informally what M does when started in q0 on any square of a tape.
2. Repeat Problem 1 for the machine M = (K, Σ, δ, s, {h}), where
K = {q0, q1, q2, h},
Σ = {a, b, ⊔, ▷},
s = q0,
and δ is given by the following table (the transitions on ▷ are δ(q, ▷) = (q, →), and are omitted).
q     σ     δ(q, σ)
q0    a     (q1, ←)
q0    b     (q0, →)
q0    ⊔     (q0, →)
q1    a     (q1, ←)
q1    b     (q2, →)
q1    ⊔     (q1, ←)
q2    a     (q2, →)
q2    b     (q2, →)
q2    ⊔     (h, ⊔)
Turing Machines
q
q0
q0
q0
q1
q1
q1
q2
q2
q2
(q,
)
(q1, )
(q0, )
(q0, )
(q2, )
(h, )
(q1, )
(q2, a)
(q0, )
(q2, )
4. Design and write out in full a Turing machine that scans to the right until it finds two consecutive a's and then
halts. The alphabet of the Turing machine should be {a, b, , }.
5. Give a Turing machine (in our abbreviated notation) that takes as input a string w {a, b}* and squeezes out
the a's. Assume that the input configuration is (s, w) and the output configuration is (h, w'), where w' = w
with all the a's removed.
6. Give a Turing machine (in our abbreviated notation) that shifts its input two characters to the right.
Input:
w
Output:
w
7. (L & P 5.7.2) Show that if a language is recursively enumerable, then there is a Turing machine that
enumerates it without ever repeating an element of the language.
Solutions
1. (a) [Figure: M drawn as a diagram: in q0, reading a writes b and reading b writes a (going to q1), and reading ⊔ halts; in q1, any symbol just moves the head right and returns to q0.]
q0, aabbba
q1, babbba
q0, babbba
q1, bbbbba
q0, bbbbba
q1, bbabba
q0, bbabba
q1, bbaaba
q0, bbaaba
q1, bbaaaa
q0, bbaaaa
q1, bbaaab
q0, bbaaab
h, bbaaab
(b) Converts all a's to b's, and vice versa, starting with the current symbol and moving right.
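The write-or-move transition format used here is easy to simulate directly. The sketch below (mine; the blank and left-end characters are stand-ins I chose) encodes the table from Problem 1 and, when run, prints the same sequence of configurations as the trace above.

    BLANK, LEFT_END = "_", ">"
    DELTA = {
        ("q0", "a"): ("q1", "b"),      ("q0", "b"): ("q1", "a"),
        ("q0", BLANK): ("h", BLANK),   ("q0", LEFT_END): ("q0", "R"),
        ("q1", "a"): ("q0", "R"),      ("q1", "b"): ("q0", "R"),
        ("q1", BLANK): ("q0", "R"),    ("q1", LEFT_END): ("q1", "R"),
    }

    def run(state, tape, head):
        tape = list(tape)
        while state != "h":
            print(state, "".join(tape), "head at", head)
            state, action = DELTA[(state, tape[head])]
            if action == "R":
                head += 1
                if head == len(tape):
                    tape.append(BLANK)     # the tape is blank to the right
            elif action == "L":
                head -= 1
            else:
                tape[head] = action        # the action is a symbol: write it
        print(state, "".join(tape))

    run("q0", ">aabbba" + BLANK, 1)        # start with the head on the first a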
2. (a) [Figure: M drawn as a diagram: q0 moves right over b and ⊔ and switches to q1 on a; q1 moves left over a and ⊔ and switches to q2 on b; q2 moves right over a and b and halts on ⊔.]
q0, abbbbaba
q0, abbbbaba
q0, abbbbaba
q0, abbbbaba
q1, abbbbaba
q1, abbbbaba
q2, abbbbaba
h, abbbbaba
(b) M goes right until it finds an a, then left until it finds a b, then right until it finds a blank, and then it halts.
3.
a/a
a/
a/
q0
q1
q2
/
h
q0, a a a a a
q1, a a a a a
q2, a a a a
q0, a a a a
q1, a a a a
q2, a a a
q0, a a a
q1, a a a
h,a a a
/
M changes every other a, moving left from the start, to a blank. If n is odd, it loops. If n is even, it halts.
4. [Figure: a two-state machine that scans right: in q0, b and ⊔ loop (moving right) while a moves right into q1; in q1, reading a halts, while b or ⊔ moves right and returns to q0.]
5. The idea here is that first we'll push all b's to the left, thus squeezing all the a's to the right. Then we'll just
replace the a's with blanks. In more detail: scan the string from left to right. Every time we find an a, if there are
any b's to the right of it we need to shift them left. So scan right, skipping as many a's as there are. When we
find a b, swap it with the last a. That squeezes one a further to the right. Go back to the left end and repeat.
Eventually all the a's will come after all the b's. At that point, when we look for a b following an a, all we'll find
is a blank. At that point, we just clean up by rewriting all the a's as blanks.
>Ra,
Rb,
aLbL
L
a
Homework 17
Turing Machines
6. The idea is to start with the rightmost character of w, rewrite it as a blank, then move two squares to the right
and plunk that character back down. Then scan left for the next leftmost character, do the same thing, and so
forth.
>L
R2aLL
R3
7. Suppose that M is the Turing machine that enumerates L. Then construct M* to enumerate L with no
repetitions: M* will begin by simulating M. But whenever M outputs a string, M* will first check to see if the
string has been output before (see below). If it has, it will just continue looking for strings. If not, it will output
it, and it will also write the string, with # in front of it, at the right end of the tape. To check whether a string has
been output before, M* just scans its tape checking for the string it is about to output.
Homework 17
Turing Machines
CS 341 Homework 18
Computing with Turing Machines
1. Present Turing machines that decide the following languages over {a, b}:
(a) ∅
(b) {ε}
(c) {a}
(d) {a}*
2. Consider the simple (regular, in fact) language L = {w {a,b}* : |w| is even}
(a) Give a Turing machine that decides L.
(b) Give a Turing machine that semidecides L.
3. Give a Turing machine (in our abbreviated notation) that accepts L = {anbman : m > n}
4. Give a Turing machine (in our abbreviated notation) that accepts L = {ww : w {a, b}*}
5. Give a Turing machine (in our abbreviated notation) that computes the following function from strings in {a,
b}* to strings in {a, b}* : f(w) = wwR.
6. Give a Turing machine that computes the function f: {a,b,c}* N (the integers), where f(w) = the number of
a's (in unary) in w.
7. Let w and x be any two positive integers encoded in unary. Show a Turing machine M that computes
f(w, x) = w + x.
Represent the input to M as
w;x
8. Two's complement form provides a way to represent both positive and negative binary integers. Suppose that
the number of bits allocated to each number is k (generally the word size). Then each positive integer is
represented simply as its binary encoding, with leading zeros. Each negative integer n is represented as the result
of subtracting |n| from 2k, where k is the number of bits to be used in the representation. Given a fixed k, it is
possible to represent any integer n if -2k-1 ≤ n ≤ 2k-1 - 1. The high order digit of each number indicates its sign: it
is zero for positive integers and 1 for negative integers.
Examples, for k = 4:
0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111
-1 = 1111, -2 = 1110, -3 = 1101, -4 = 1100, -5 = 1011, -6 = 1010, -7 = 1001, -8 = 1000
Since Turing machines don't have fixed length words, we'd like to be able to represent any integer. We will
represent positive integers with a single leading 0. We will represent each negative integer n as the result of
subtracting |n| from 2i+1, where i is the smallest value such that 2i ≥ |n|. For example, -65 will be represented as
10111111, since 27 (128) ≥ 65, so we subtract 65 (01000001 in binary) from 28 (in binary, 100000000). We need
the extra digit (i.e., we subtract from 2i+1 rather than from 2i) because, in order for a positive number to be
interpreted as positive, it must have a leading 0, thus consuming an extra digit.
Let w be any integer encoded in two's complement form. Show a Turing machine that computes f(w) = -w.
Homework 18
Solutions
1 (a) We should reject everything, since no strings are in the language.
>n
(b) Other than the left boundary symbol, the tape should be blank:
>R
y
(c) Just the single string a:
>R
a
n
(d) Any number of a's:
a
>R
(a,)
2. (a)
a,b
>R
a,b
>R
Homework 18
a,b
(b)
a,b
3. The idea is to make a sequence of passes over the input. On each pass, we mark off (with d, e, and f) a
matching a, b, and a. This corresponds to the top row of the machine shown here. When there are no matching
groups left, then we accept if there is nothing left or if there are b's in the middle. If there is anything else, we
reject. It turns out that a great deal of this machine is essentially error checking. We get to the R on the second
row as soon as we find the first "extra" b. We can loop in it as long as we find b's. If we find a's we reject. If we
find a blank, then the string had just b's, which is okay, so we accept. Once we find an f, we have to go to the
separate state R on the third row to skip over the f's and make sure we get to the final blank without either any
more b's or any more a's.
d,e
a,e
R
a d
b,f
R
fL
a,b
R
4. The hard part here is that we don't know where the middle of the string is. So we don't know where the
boundary between the first occurrence of w ends and the second begins. We can break this problem into three
subroutines, which will be executed in order:
(1) Find the middle and mark it. If there's a lone character in the middle (i.e., the length of the input string isn't
even), then reject immediately.
(2) Bounce back and forth between the beginning of the first w and the beginning of the second, marking off
characters if they match and rejecting if they don't.
(3) If we get to the end of the w's and everything has matched, accept.
Let's say a little more about step (1). We need to put a marker in the middle of the string. The easiest thing to do
is to make a double marker. We'll use ##. That way, we can start at both ends (bouncing back and forth), moving
a marker one character toward the middle at each step. For example, if we start with the tape aabbaabb,
after one mark off step we'll have a#abbaab#b, then aa#bbaa#bb, and finally
aabb##aabb. So first we shift the whole input string two squares to the right on the tape to make room
for the two markers. Then we bounce back and forth, moving the two markers toward the center. If they meet,
we've got an even length string and we can continue. If they don't, we reject right away.
Homework 18
5. The idea is to work from the middle. We'll scan right to the rightmost character of w (which we find by
scanning for the first blank, then backing up (left) one square. We'll rewrite it so we know not to deal with it
again (a's will become 1's; b's will become 2's.) Then we move right and copy it. Now if we scan back left past
any 1's or 2's, we'll find the next rightmost character of w. We rewrite it to a 1 or a 2, then scan right to a blank
and copy it. We keep this up until, when we scan back to the left, past any 1's or 2's, we hit a blank. That means
we've copied everything. Now we scan to the right, replacing 1's by a's and 2's by b's. Finally, we scan back to
the left to position the read head to the left of w.
[Figure: the machine in our abbreviated notation, implementing the procedure just described.]
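The copying strategy is easier to see in ordinary code. Here is a small sketch (mine, not the machine itself) that mirrors the same plan: mark the rightmost unhandled character of w (a becomes 1, b becomes 2), append its copy, repeat, and finally restore the marks.

    def f(w):
        tape = list(w)
        copy = []
        for i in range(len(w) - 1, -1, -1):          # rightmost unmarked char of w
            ch = tape[i]
            tape[i] = "1" if ch == "a" else "2"      # mark it: a -> 1, b -> 2
            copy.append(ch)                          # write it into the copy region
        tape = ["a" if c == "1" else "b" for c in tape]   # restore 1 -> a, 2 -> b
        return "".join(tape + copy)

    print(f("ab"))    # abba, i.e. w followed by its reversal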
6. The idea here is that we need to write a 1 for every a and we can throw away b's and c's. We want the 1's to
end up at the left end of the string, so then all we have to do to clean up at the end is erase all the b's and c's to the
right of the area with the 1's. So, we'll start at the left edge of the string. We'll skip over b's and c's until we get
to an a. At that point, rewrite it as b so we don't count it again. Then (remembering it in the state) scan left until
we get to the blank (if this is the first 1) or we get to a 1. In either case, move one square to the right and write a
1. We are thus overwriting a b or a c, but we don't care. We're going to throw them away anyway. Now start
again scanning to the right looking for an a. At some point, we'll come to the end of string blank instead. At that
point, just travel leftward, rewriting all the b's and c's to blank, then cross the 1's and land on the blank at the left
of the string.
>R
bL,1R1
b,c
b,c
1
7. All we have to do is to concatenate the two strings. So shift the second one left one square, covering up the
semicolon.
>R; R 1
;L1
LL
Homework 18
8. Do a couple of examples of the conversion to see what's going on. What you'll observe is that we want to scan
from the right. Initially, we may see some zeros, and those will stay as zeros. If we ever see a 1, then we rewrite
the first one as a 1. After that, we're dealing with borrowing, so we swap all digits: every zero becomes a one and
every one becomes a zero, until we hit the blank at the end of the string and halt.
[Figure: the machine in our abbreviated notation, scanning right to the blank and then processing the digits right to left as described.]
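The digit-by-digit rule the machine follows is worth checking on a few examples; this small sketch (mine) applies the same rule to a bit string: scan from the right, keep the trailing zeros and the lowest 1, and flip everything to the left of that.

    def negate_twos_complement(bits):
        out = list(bits)
        i = len(out) - 1
        while i >= 0 and out[i] == "0":
            i -= 1                                   # trailing zeros stay as zeros
        i -= 1                                       # the lowest 1 also stays
        while i >= 0:
            out[i] = "1" if out[i] == "0" else "0"   # flip the remaining digits
            i -= 1
        return "".join(out)

    print(negate_twos_complement("0101"))   # 1011: 5 becomes -5
    print(negate_twos_complement("1111"))   # 0001: -1 becomes 1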
Homework 18
CS 341 Homework 19
Turing Machine Extensions
1. Consider the language L = {wwR}.
(a) Describe a one tape Turing machine to accept L.
(b) Describe a two tape Turing machine to accept L.
(c) How much more efficient is the two tape machine?
2. Give (in abbreviated notation) a nondeterministic Turing machine that accepts the language
L = {wwRuuR : w, u {a, b}*}
Solutions
(1) (a) The one tape machine needs to bounce back and forth between the beginning of the input string and the
end, marking off matching symbols.
(b) The two tape machine works as follows: If the input is , accept. If not, copy the input to the second tape and
record in the state that you have processed an even number of characters so far. Now, start the first tape at the
left end and the second tape at the right end . Check that the symbols on the two tapes are the same. If not,
reject. If so, move the first tape head to the right and the second tape head to the left. Also record that you have
processed an odd number and continue, each time using the state to keep track of whether youve seen an even or
odd number of characters so far. When you reach the end of the input tape, accept if youve seen an even number
of characters. Reject if youve seen an odd number. (The even/odd counter is necessary to make sure that you
reject strings such as aba.)
(c) The one tape machine takes time proportional to the square of the length of the input, since for an input of
length n it will make n passes over the input, each of which takes on average n/2 steps. The two tape machine
takes time that's linear in n. It takes n steps to copy, then another n steps to compare.
2. The idea is just to use nondeterminism to guess the location of the boundary between the w and u regions.
Each path will choose a spot, shift the u region to the right, and insert a boundary marker #. Once this is done,
the machine simply checks each region for wwR. If we get a string in L, one of the guessed paths will work.
Homework 19
CS 341 Homework 20
Unrestricted Grammars
1. Find grammars that generate the following languages:
(a) L = {ww : w ∈ {a, b}*}
(b) L = { a2n : n ≥ 0} (that is, strings of a's whose length is a power of two)
(c) L = { anb2nc3n : n ≥ 1}
(d) L = {wR : w is the social security number of a living American citizen}
(e) L = {wcmdn : w ∈ {a, b}* and m = the number of a's in w and n equals the number of b's in w}
2. Find a grammar that computes the function f(w) = ww, where w {a, b}*.
Solutions
1. (a) L = {ww : w ∈ {a, b}*}
There isn't any way to generate the two w's in the correct order. Suppose we try. Then we could get aSa.
Suppose we want b next. Then we need Sa to become bSab, since the new b has to come after the a that's
already there. That could work. Now we have abSab. Let's say we want a next. Now Sab has to become aSaba.
The problem is that, as the length of the string grows, so does the number of rules we'll need to cope with all the
patterns we could have to replace. In a finite number of rules, we can't deal with replacing S (which we need to
do to get the next character in the first occurrence of w), and adding a new character that is arbitrarily far away
from S.
The other approach we could try would be have a rule S WW, and then let W generate a string of a's and b's.
But this won't work, since we have no way to control the expansion of the two W's so that they produce the same
thing.
So what we need to do is to generate wwR and then, carefully, reverse the order of the characters in wR. What
well do is to start by erecting a wall (#) at the right end of the string. Then well generate wwR. Then, in a
second phase, well take the characters in the second w and, one at a time, starting with the leftmost, move it right
and then move it past the wall. At each step, we move each character up to the wall and then just over it, but we
dont reverse characters once they get over the wall. The first part of the grammar, which will generate wTwR,
looks like this:
S → S1#
S1 → aS1a
S1 → bS1b
S1 → T
T will mark the left edge of the portion that needs to be reversed.
At this point, we can generate strings such as abbbTbbba#. What we need to do now is to reverse the string of
as and bs that is between T and #. To do that, we let T spin off a marker Q, which we can pass rightward
through the string. As it moves to the right, it will take the first a or b it finds with it. It does this by swapping the
character it is carrying (the one just to the right of it) with the next one to the right. It also moves itself one
square to the right. The four rules marked with * accomplish this. When Qs character gets to the # (the rules
marked **), the a or b will swap places with the # (thus hopping the fence) and the Q will go away. We can keep
doing this until all the as and bs are behind the fence and in the right order. Then the final T# will drop out.
Here are the rules for this phase:
T → TQ
Qaa → aQa    *
Qab → bQa    *
Qbb → bQb    *
Qba → aQb    *
Qa# → #a     **
Qb# → #b     **
T# → ε
So with R as given above, the grammar is G = ({S, S1, #, T, Q, a, b}, {a, b}, R, S).
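To see the second phase in action, here is a small sketch (mine) that applies these rules deterministically to a string the first phase can produce, abbbTbbba#, and watches the half to the right of T hop the fence in reverse order.

    SWAPS = {"Qaa": "aQa", "Qab": "bQa", "Qbb": "bQb", "Qba": "aQb",
             "Qa#": "#a",  "Qb#": "#b"}          # the * and ** rules

    def second_phase(s):
        while "T#" not in s:
            for lhs, rhs in SWAPS.items():
                if lhs in s:                     # push the carried symbol one square right
                    s = s.replace(lhs, rhs, 1)
                    break
            else:
                s = s.replace("T", "TQ", 1)      # T -> TQ: send out another Q
        return s.replace("T#", "", 1)            # T# -> e: everything has been moved

    print(second_phase("abbbTbbba#"))   # abbbabbb, i.e. ww for w = abbb

A real derivation in the grammar is nondeterministic, of course; the sketch simply applies an applicable rule at each step, which is enough to reach the terminal string.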
(b) L = { a2n : n ≥ 0} (strings of a's whose length is a power of two)
The idea here is first to generate the first string, which is just a. Then think about the next one. You can derive it
by taking the previous one, and, for every a, write two as. So we get aa. Now to get the third one, we do the
same thing. Each of the two as becomes two and we have four, and so forth. So we need a rule to get us started
and to indicate the possibility of duplication. Then we need rules to actually do the duplication. To make
duplication happen, we need a symbol that gets generated by S indicating the option to repeat. Well use P.
Since duplication can happen an arbitrary number of times, we need P to spin off as many individual duplication
commands as we want. Well use R for that. The one other thing we need is to make sure, if we start a
duplication step, that we finish it. In other words, suppose we currently have aaaa. If we start duplicating the as,
we must duplicate all of them. Otherwise, we might end up with, for example, seven as. So well introduce a
left edge marker, #. Once we fire up a duplication (by creating an R), well only stop (i.e., get rid of R) when R
has made it all the way to the other end of the string (namely the left end since it starts at the right). So we get
the following rules:
S → #aP      P lets us start up duplication processes as often as we like.
P → ε        When we've done as many as we want, we get rid of P.
P → RP       R will actually do a duplication by moving leftward, duplicating every a it sees.
aR → Raa     Actually duplicates one a, and moves R one square to the left so it moves on to the next a.
#R → #       Get rid of R once it's made it all the way to the left.
# → ε        Get rid of # at the end.
So with R as given above, the grammar is G = ({S, P, R, #, a, b}, {a, b}, R, S).
(c) L = { anb2nc3n : n ≥ 1}
This one is very similar to anbncn. The only difference is that we will churn out b's in pairs and c's in triples each
time we expand S. So we get:
S → aBSccc
S → aBccc
Ba → aB
Bc → bbc
Bb → bbb
So with R as given above, the grammar is G = ({S, B, a, b, c}, {a, b, c}, R, S).
(d) L = {w^R : w is the social security number of a living American citizen}
This one is regular. There is a finite number of such social security numbers, so we need one rule for each number. Each rule is of the form S → <valid number>. So with that collection of rules as R, the grammar is G = ({S, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, R, S).
(e) L = {wc^m d^n : w ∈ {a, b}*, m = the number of a's in w, and n = the number of b's in w}
The idea here is to generate a c every time we generate an a and to generate a d every time we generate a b. We'll
do this by generating the nonterminals C and D, which we will use to generate c's and d's once everything is in
the right place. Once we've finished generating all the a's and b's we want, the next thing we need to do is to get
all the D's to the far right of the string, all the C's next, and then have the a's and b's left alone at the left. We
guarantee that everything must line up that way by making sure that C can't become c and D can't become d
unless things are right. To do this, we require that D can only become d if it's all the way to the right (i.e., it's
followed by #) or it's got a d to its right. Similarly with C. We can do this with the following rules:
S → S1#
S1 → aS1C
S1 → bS1D
S1 → ε
DC → CD
D# → d
Dd → dd
C# → c
Cd → cd
Cc → cc
# → ε
So with R as given above, the grammar is G = ({S, S1, C, D, #, a, b, c, d}, {a, b, c, d}, R, S).
2. We need to find a grammar that computes the function f(w) = ww. So we'll get inputs such as SabaS. Think of
the grammar we'll build as a procedure, which will work as described below. At any given time, the string that
has just been derived will be composed of the following regions:
[diagram of the regions, including <the part of w that has already been copied> and the part that has already been inserted]
Most of the rules come in pairs, one dealing with an a, the other with b.
SS
Sa → aS#a
%aa → a%a     Push a to the right through the copied region in exactly the same way we pushed it through w, except we're using % rather than # as the pusher. This rule pushes a past a.
%ab → b%a     Pushes a past b.
%ba → a%b     Same two rules for pushing b.
%bb → b%b
%aW → aW      We've pushed an a all the way to the right boundary, so get rid of %, the pusher.
%bW → bW      Same for a pushed b.
ST → ε        All the characters from w have been copied, so they're all to the left of S, which causes S to be adjacent to the middle marker T. We can now get rid of our special walls. Here we get rid of S and T.
W → ε         Get rid of W. Note that if we do this before we should, there's no way to get rid of %, so any derivation path that does this will fail to produce a string in {a, b}*.
So with R as given above, the grammar is G = ({S, T, W, #, %, a, b}, {a, b}, R, S).
CS 341 Homework 21
Undecidability
1. Which of the following problems about Turing machines are solvable, and which are undecidable? Explain
your answers carefully.
(a) To determine, given a Turing machine M, a state q, and a string w, whether M ever reaches state q when
started with input w from its initial state.
(b) To determine, given a Turing machine M and a string w, whether M ever moves its head to the left when
started with input w.
(c) To determine, given two Turing machines, whether one semidecides the complement of the language
semidecided by the other.
(d) To determine, given a Turing machine M, whether the language semidecided by M is finite.
2. Show that it is decidable, given a pushdown automaton M with one state, whether L(M) = Σ*. (Hint: Show that such an automaton accepts all strings if and only if it accepts all strings of length one.)
3. Which of the following problems about context-free grammars are solvable, and which are undecidable?
Explain your answers carefully.
(a) To determine, given a context-free grammar G, is ε ∈ L(G)?
(b) To determine, given a context-free grammar G, is {ε} = L(G)?
(c) To determine, given two context-free grammars G1 and G2, is L(G1) ⊆ L(G2)?
4. The nonrecursive languages L that we have discussed in class all have the property that either L or the
complement of L is recursively enumerable.
(a) Show by a counting argument that there is a language L such that neither L nor its complement is recursively
enumerable.
(b) Give an example of such a language.
Solutions
1. (a) To determine, given a Turing machine M, a state q, and a string w, whether M ever reaches state q when
started with input w from its initial state. This is not solvable. We can reduce H to it. Essentially, if we can tell
whether a machine M ever reaches some state q, then let q be M's halt state (and we can massage M so it has only
one halt state). If it ever gets to q, it must have halted. More formally:
L1 = H = {s = "M" "w" : Turing machine M halts on input string w}
(?M2) L2 = {s = "M" "w" "q" : M reaches state q when started with input w from its initial state}
Let τ' create, from M, the machine M* as follows. Initially M* equals M. Next, a new halting state H is created in
M*. Then, from each state that was a halting state in M, we create transitions in M* such that for all possible
values of the current tape square, M* goes to H. We create no other transitions to H. Notice that M* will end up
in H in precisely the same situations in which M halts.
Now let τ("M" "w") = τ'("M") "w" "H".
So, if M2 exists, then M1 exists. It invokes τ' to create M*. Then it passes "M*", "w", and "H" to M2 and returns
whatever M2 returns. But M1 doesn't exist. So neither does M2.
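Concretely, the massaging of M into M* might look like the following Python sketch (the (state, symbol) -> (next state, write symbol, move) representation and the function name are my own choices, not the packet's notation):

def add_unique_halt_state(delta, tape_symbols, old_halting_states, new_halt="H"):
    """Build M*'s transition table from M's: from every state that was halting in M,
    on every possible tape symbol, go to the single new halting state H.  No other
    transitions into H are created, so M* reaches H exactly when M would have halted."""
    new_delta = dict(delta)
    for q in old_halting_states:
        for a in tape_symbols:
            # rewrite the same symbol and stay put; only the state change matters
            new_delta[(q, a)] = (new_halt, a, "stay")
    return new_delta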
(b) To determine, given a Turing machine M and a string w, whether M ever moves its head to the left when
started with input w. This one is solvable. We will assume that M is deterministic. We can build the deciding
machine D as follows. D starts by simulating the operation of M on w. D keeps track on another tape of each
configuration of M that it has seen so far. Eventually, one of the following things must happen:
1. M moves its head to the left. In this case, we say yes.
2. M is stuck on some square s of the tape. In other words, it is in some state p looking at some square s on the
tape and it has been in this configuration before. If this happens and M didn't go left yet, then M simply
hasn't moved off of s. And it won't from now on, since it's just going to do the same thing at this point as it
did the last time it was in this configuration. So we say no.
3. M moves off the right hand edge of the input w. So it is in some state p looking at a blank. Within k steps (if
k is the number of states in M), M must repeat some state p. If it does this without moving left, then again we
know that it never will. In other words, if the last time it was in the configuration in which it was in state p,
looking at a blank, there was nothing to the right except blanks, and it can't move left, and it is again in that
same situation, it will do exactly the same thing again. So we say no.
(c) To determine, given two Turing machines, whether one semidecides the complement of the language semidecided by the other. This one is not solvable. We can reduce to it the problem, "Given a Turing machine M, is there any string at all on which M halts?" (which is equivalent to asking "Is L(M) = ∅?"). In the book we show that this problem is not solvable. What we'll do is to build a machine M* that semidecides the language Σ*, which is the complement of the language ∅. If we could build a machine to tell, given two Turing machines, whether one semidecides the complement of the language semidecided by the other, then to find out whether any given machine M accepts anything, we'd pass M and our constructed M* to this new machine. If it says yes, then M accepts ∅. If it says no, then M must accept something. Formally:
L1 = {s = "M" : L(M) ≠ ∅, i.e., M halts on at least one string}
(?M2) L2 = {s = "Ma" "Mb" : Ma semidecides the complement of the language semidecided by Mb}
M accepts strings over some input alphabet Σ. Let τ' construct a machine M* that semidecides the language Σ*.
Then τ("M") = "M" "τ'("M")".
So, if M2 exists, then M1 exists. It invokes τ' to create M*. Then it passes "M" and "M*" to M2 and returns the opposite of whatever M2 returns (since M2 says yes if L(M) = ∅ and M1 wants to say yes if L(M) ≠ ∅). But M1 doesn't exist. So neither does M2.
(d) To determine, given a Turing machine M, whether the language semidecided by M is finite. This one isn't solvable. We can reduce to it the problem, "Given a Turing machine M, does M halt on ε?" We'll construct, from M, a new machine M*, which erases its input tape and then simulates M. M* halts on all inputs iff M halts on ε. If M doesn't halt on ε, then M* halts on no inputs. So there are two situations: M* halts on all inputs (i.e., L(M*) is infinite) or M* halts on no inputs (i.e., L(M*) is finite). So, if we could build a Turing machine M2 to decide whether L(M*) is finite or infinite, we could build a machine M1 to decide whether M halts on ε. Formally:
L1 = {s = "M" : Turing machine M halts on ε}
(?M2) L2 = {s = "M" : L(M) is finite}
4. (a) If any language L is recursively enumerable, then there is a Turing machine that semidecides it. Every Turing machine has a description of finite length. Therefore the number of Turing machines, and thus the number of recursively enumerable languages, is at most countably infinite (since the set of finite strings over a finite alphabet is countably infinite). Similarly, if, for some language L, its complement is recursively enumerable, then the complement has a semideciding Turing machine, so there is at most a countably infinite number of languages whose complements are recursively enumerable. But there is an uncountable number of languages. So there must be languages that are not recursively enumerable and do not have recursively enumerable complements.
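The counting can be summarized in one line (the notation here is mine, not the packet's):

\[
|\{L : L \text{ is r.e.}\}| + |\{L : \overline{L} \text{ is r.e.}\}| \;\le\; \aleph_0 + \aleph_0 \;=\; \aleph_0
\;<\; 2^{\aleph_0} \;=\; |\mathcal{P}(\Sigma^*)| \;=\; |\{L : L \subseteq \Sigma^*\}| ,
\]

so some language lies in neither collection.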
(b) L = {"M" : M halts on the input 0 and M doesn't halt on the input 1}.
The complement of L = {"M" : M doesn't halt on the input 0 or M halts on the input 1}. Neither of these languages is recursively enumerable because of the "doesn't halt" piece.
CS 341 Homework 22
Review
1. Given the following language categories:
A: L is finite.
B: L is not finite but is regular.
C: L is not regular but is deterministic context free.
D: L is not deterministic context free but is context free.
E: L is not context free but is Turing decidable.
F: L is not Turing decidable but is Turing acceptable.
G: L is not Turing acceptable.
Assign the appropriate category to each of the following languages. Make sure you can justify your answer.
a. _____ {a^n b^(kn) : k = 1 or k = 2, n ≥ 0}
b. _____ {a^n b^(kn) : k = 0 or k = 1, n ≥ 0}
c. _____ {a^n b^n c^n : n ≥ 0}
d. _____ {a^n b^n c^m : n ≥ 0, m ≥ 0}
e. _____ {a^n b^n : n ≥ 0} ∪ a*
f. _____ {a^n b^m : n is prime and m is even}
g. _____ {a^n b^m c^(m+n) : n ≥ 0, m ≥ 0}
h. _____ {a^n b^m c^(mn) : n ≥ 0, m ≥ 0}
i. _____ {a^n b^m : n ≥ 0, m ≥ 0}
j. _____ {xy : x ∈ a*, y ∈ b*, |x| = |y|}
k. _____ {xy : x ∈ a*, y ∈ a*, |x| = |y|}
l. _____ {x : x ∈ {a, b, c}*, and x has 5 or more a's}
m. _____ {"M" : M accepts at least 1 string}
n. _____ {"M" : M is a Turing machine that halts on input ε and |"M"| ≤ 1000}
o. _____ {"M" : M is a Turing machine with 50 states}
p. _____ {"M" : M is a Turing machine such that L(M) = a*}
q. _____ {x : x ∈ {A, B, C, D, E, F, G}, and x is the answer you write to this question}
Solutions
a. __D__ {a^n b^(kn) : k = 1 or k = 2, n ≥ 0}
We haven't discussed many techniques for proving that a context-free language isn't deterministic, so we can't prove that this one isn't. But essentially the reason this one isn't is that we don't know what to do when we see b's. Clearly, we can build a pda M to accept this language. As M reads each a, it pushes it onto the stack. When it starts seeing b's, it needs to start popping a's. But there's no way to know, until either it runs out of b's or it gets to the (n+1)st b, whether to pop an a for each b or hold back and pop an a for every other b. So M is not deterministic.
b. __C__ {a^n b^(kn) : k = 0 or k = 1, n ≥ 0}
This one looks very similar to a, but it's different in one key way. Remember that the definition of deterministic context free is that it is possible to build a deterministic pda to accept L$. So now, we can build a deterministic pda M as follows: Push each a onto the stack. When we run out of a's, the next character will either be $ (in the case where k = 0) or b (in the case where k = 1). So we know right away which case we're dealing with. If M sees a b, it goes to a state where it pops one a for each b and accepts if it comes out even. If it sees $, it goes to a state where it clears the stack and accepts.
c. __E__ {a^n b^n c^n : n ≥ 0}
We proved that this is recursive by showing a grammar for it in Lecture Notes 24. We used the pumping theorem to prove that it isn't context free in Lecture Notes 19.
d. __C__ {a^n b^n c^m : n ≥ 0, m ≥ 0}
This one is context free. We need to compare the a's to the b's, but the c's are independent. So a grammar to generate this one is:
S → AC
A → aAb
A → ε
C → cC
C → ε
It's deterministic because we can build a pda that always knows what to do: push a's, pop an a for each b, then simply scan the c's.
e. __C__ {a^n b^n : n ≥ 0} ∪ a*
This one is equivalent to b, since a* = {a^n b^(0n) : n ≥ 0}.
f. __E__ {a^n b^m : n is prime and m is even}
This one is recursive because we can write an algorithm to determine whether a number is prime and another one to determine whether a number is even. The proof that it is not context free is essentially the same as the one we did in class that {a^n : n is prime} is not context free.
g. __C__ {a^n b^m c^(m+n) : n ≥ 0, m ≥ 0}
This one is context free. A grammar for it is (the a's must come before the b's, so a second nonterminal handles the b-c pairs):
S → aSc
S → T
T → bTc
T → ε
It's deterministic because we can build a deterministic pda M for it: M pushes each a onto its stack. It also pushes an a for each b. Then, when it starts seeing c's, it pops one a for each c. If it runs out of a's and c's at the same time, it accepts.
h. __E__ {a^n b^m c^(mn) : n ≥ 0, m ≥ 0}
This one is similar to g, but because the number of c's is equal to the product of n and m, rather than the sum, there is no way to know how many c's to generate until we know both how many a's there are and how many b's. Clearly we can write an algorithm to do it, so it's recursive. To prove that it isn't context free, we need to use the pumping theorem. Let w = a^M b^M c^(M·M). Call the a's region 1, the b's region 2, and the c's region 3. Clearly neither v nor y can span regions since, if they did, we'd get a string with letters out of order. So we need only consider the following possibilities:
(1, 1) The number of c's will no longer be the product of n and m.
(1, 2) The number of c's will no longer be the product of n and m.
(1, 3) Ruled out by |vxy| ≤ M.
(2, 2) The number of c's will no longer be the product of n and m.
(2, 3) The number of c's will no longer be the product of n and m.
(3, 3) The number of c's will no longer be the product of n and m.
i. __B__ {a^n b^m : n ≥ 0, m ≥ 0}
This one is regular. It is defined by the regular expression a*b*. It isn't finite, which we know from the presence of Kleene star in the regular expression.
j. __C__ {xy : x ∈ a*, y ∈ b*, |x| = |y|}
This one is equivalent to {a^n b^n : n ≥ 0}, which we've already shown is context free and not regular. We showed a deterministic pda to accept it in Lecture Notes 14.
k. __B__ {xy : x ∈ a*, y ∈ a*, |x| = |y|}
This one is {w ∈ a* : |w| is even}. We've shown a simple two-state FSM for this one.
l. __B__ {x : x ∈ {a, b, c}*, and x has 5 or more a's}
This one also has a simple FSM F that accepts it. F has six states. It simply counts a's, up to five. If it ever gets to 5, it accepts.
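For concreteness, here is one way (a sketch of my own, with made-up names) to render that six-state machine in Python, with the capped count of a's playing the role of the state:

def accepts_five_or_more_as(x):
    # States 0..5 stand for "number of a's seen so far, capped at 5"; 5 is the
    # only accepting state, and any character outside {a, b, c} rejects outright.
    state = 0
    for c in x:
        if c not in "abc":
            return False
        if c == "a" and state < 5:
            state += 1
    return state == 5

assert accepts_five_or_more_as("abacabacabaa")
assert not accepts_five_or_more_as("aabbcc")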
III. Supplementary Materials
Languages
Characterizing Problems as Language Recognition Tasks
In order to create a formal theory of problems (as opposed to algorithms), we need a single,
relatively straightforward framework that we can use to describe any kind of possibly computable
function. The one we'll use is language recognition.
What is a Language?
A language is a set of strings over an alphabet, which can be any finite collection of symbols.
[Slide - Languages]
Defining a Problem as a Language Recognition Task
We can define any problem as a language recognition task. In other words, we can output just a
boolean, True or False. Some problems seem naturally to be described as recognition tasks. For
example, accept grammatical English sentences and reject bad ones. (Although the truth is that English is so squishy it's nearly impossible to formalize this. So let's pick another example -- accept the syntactically valid C programs and reject the others.)
Problems that you think of more naturally as functions can also be described this way. We define
the set of input strings to consist of strings that are formed by concatenating an input to an
output. Then we only accept strings that have the correct output concatenated to each input.
[Slide - Encoding Output]
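As a tiny illustration (the comma between input and output is my own encoding choice, not the packet's), here is what "only accept strings that have the correct output concatenated to each input" looks like for the squaring function:

def in_squaring_language(s):
    """Accept exactly the strings "n,m" where m is n squared."""
    try:
        n, m = s.split(",")
        return int(m) == int(n) ** 2
    except ValueError:
        return False

print(in_squaring_language("7,49"))    # True: 49 is the correct output for 7
print(in_squaring_language("7,48"))    # False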
Branching Out -- Allowing for Actual Output
Although it is simpler to characterize problems simply as recognition problems and it is possible
to reformulate functional problems as recognition problems, we will see that we can augment the
formalisms we'll develop to allow for output as well.
Defining Languages Using Grammars
Now what we need is a general mechanism for defining languages. Of course, if we have a finite
language, we can just enumerate all the strings in it. But most interesting languages are infinite.
What we need is a finite mechanism for specifying infinite languages. Grammars can do this.
The standard way to write a grammar is as a production system, composed of rules with a left
hand side and a right hand side. Each side contains a sequence of symbols. Some of these
symbols are terminal symbols, i.e., symbols in the language we're defining. Others are drawn
from a finite set of nonterminal symbols, which are internal symbols that we use just to help us
define the language. Of these nonterminal symbols, one is special -- we'll call it the start symbol.
[Slide - Grammars 1]
If there is a grammar that defines a language, then there is an infinite number of such grammars.
Some may be better, from various points of view than others. Consider the grammar for odd
integers. What different grammars could we write? One thing we could do would be to introduce
the idea of odd and even digits. [Slide - Grammars 2]
Sometimes we use single characters, disjoint from the characters of the target language, in our
rules. But sometimes we need more symbols. Then we often use < and > to mark multiple
character nonterminal symbols. [Slide - Grammars 3]
Notice that we've also introduced a notation for OR so that we don't have to write as many
separate rules. By the way, there are lots of ways of writing a grammar of arithmetic expressions.
This one is simple but it's not very good. It doesn't help us at all to determine the precedence of
operators. Later we'll see other grammars that do that.
Grammars as Generators and as Acceptors
So far, we've defined problems as language recognition tasks. But when you look at the
grammars we've considered, you see that there's a sense in which they seem more naturally to be
generators than recognizers. If you start with S, you can generate all the strings in the language
defined by the grammar. We'll see later that we'll use the idea of a grammar as a generator (or an
enumerator) as one way to define some interesting classes of languages.
But you can also use grammars as acceptors, as we've suggested. There are two ways to do that.
One is top-down. By that we mean that you start with S, and apply rules. [work this out for a
simple expression for the Language of Simple Arithmetic Expressions] At some point, you'll
generate a string without any nonterminals (i.e., a string in the language). Check and see if it's
the one you want. If so accept. If not, try again. If you do this systematically, then if the string is
in the language, you'll eventually generate it. If it isn't, you may or may not know when you
should give up. More on that later.
The other approach is bottom up. In this approach, we simply apply the rules sort of backwards,
i.e., we run them from right to left, matching the string to the right hand sides of the rules and
continuing until we generate S and nothing else. [work this out for a simple expression for the
Language of Simple Arithmetic Expressions] Again, there are lots of possibilities to consider
and there's no guarantee that you'll know when to stop if the string isn't in the language.
Actually, for this simple grammar there is, but we can't assure that for all kinds of grammars.
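Here is the top-down, generate-and-test idea as a rough Python sketch (the toy grammar, the names, and the length-based pruning are all my own stand-ins, not the packet's slides):

from collections import deque

# Toy grammar:  E -> E+E | E*E | (E) | id
RULES = {"E": ["E+E", "E*E", "(E)", "id"]}

def generates(target):
    """Systematically expand the leftmost nonterminal; say yes if target appears.
    The pruning on terminal length keeps the search finite for this grammar."""
    max_len = len(target)
    seen, frontier = {"E"}, deque(["E"])
    while frontier:
        s = frontier.popleft()
        if s == target:
            return True
        i = s.find("E")                       # leftmost nonterminal
        if i == -1:
            continue                          # a terminal string, but not the one we want
        for rhs in RULES["E"]:
            t = s[:i] + rhs + s[i + 1:]
            if len(t.replace("E", "")) <= max_len and t not in seen:
                seen.add(t)
                frontier.append(t)
    return False

print(generates("(id+id)*id"))    # True
print(generates("id+*id"))        # False once every candidate up to this length is exhausted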
The Language Hierarchy
Remember that our whole goal in this exercise is to describe classes of problems, characterize
them as easy or hard, and define computational mechanisms for solving them. Since we've
decided to characterize problems as languages to be recognized, what we need to do is to create a
language hierarchy, in which we start with very simple languages and move toward more
complex ones.
Regular Languages
Regular languages are very simple languages that can be defined by a very restricted kind of
grammar. In these grammars, the left side of every rule is a single nonterminal and the right side
is a single terminal optionally followed by a single nonterminal. [slide - Regular Grammars] If
you look at what's going on with these simple grammars, you can see that as you apply rules,
starting with S, you generate a terminal symbol and (optionally) have a new nonterminal to work
with. But you can never end up with multiple nonterminals at once. (Recall our first grammar for
Odd Integers [slide - Grammars 1]).
Of course, we also had another grammar for that same language that didn't satisfy this restriction.
But that's okay. If it is possible to define the language using the restricted formalism, then it falls
into the restricted class. The fact that there are other, less restricted ways to define it doesn't
matter.
It turns out that there is an equivalent, often useful, way to describe this same class of languages,
using regular expressions. [Slide - Regular Expressions and Languages] Regular expressions
don't look like grammars, in the sense that there are no production rules, but they can be used to
define exactly the same set of languages that the restricted class of regular grammars can define.
Here's a regular expression for the language that consists of odd integers, and one for the
language of identifiers. We can try to write a regular expression for the language of matched
parenthesis, but we won't succeed.
Intuitively, regular languages are ones that can be defined without keeping track of more than a
finite number of things at once. So, looking back at some of our example languages [slide
Languages], the first is regular and none of the others is.
Context Free Languages
To get more power in how we define languages, we need to return to the more general production
rule structure.
Suppose we allow rules where the left hand side is composed of a single symbol and the right
hand side can be anything. We then have the class of context-free grammars. We define the
class of context-free languages to include any language that can be generated by a context-free
grammar.
The context-free grammar formalism allows us to define many useful languages, including the
languages of matched parentheses and of equal numbers of parentheses but in any order [slide - Context-Free Grammars]. We can also describe the language of simple arithmetic expressions
[slide - Grammars 3].
Although this system is a lot more powerful (and useful) than regular languages are, it is not
adequate for everything. We'll see some quite simple artificial languages it won't work for in a
minute. But it's also inadequate for things like ordinary English. [slide - English Isn't Context-Free]
Recursively Enumerable Languages
Now suppose we remove all restrictions from the form of our grammars. Any combination of
symbols can appear on the left hand side and any combination of symbols can appear on the
right. The only real restriction is that there can be only a finite number of rules. For example, we
can write a grammar for the language that contains strings of the form a^n b^n c^n. [slide - Unrestricted Grammars]
Once we remove all restrictions, we clearly have the largest set of languages that can be
generated by any finite grammar. We'll call the languages that can be generated in this way the
class of recursively enumerable languages. This means that, for any recursively enumerable
language, it is possible, using the associated grammar, to generate all the strings in the language.
Of course, it may take an infinite amount of time if the language contains an infinite number of
strings. But any given string, if it is enumerated at all, will be enumerated in a finite amount of
time. So I guess we could sit and wait. Unfortunately, of course, we don't know how long to
wait, which is a problem if we're trying to decide whether a string is in the language by
generating all the strings and seeing if the one we care about shows up.
Recursive Languages
There is one remaining set of languages that it is useful to consider. What about the recursively
enumerable languages where we could guarantee that, after a finite amount of time, either a given
string would be generated or we would know that it isn't going to be. For example, if we could
generate all the strings of length 1, then all the strings of length 2, and so forth, we'd either
generate the string we want or we'd just wait until we'd gone past the length we cared about and
then report failure. From a practical point of view, this class is very useful since we like to deal
with solutions to problems that are guaranteed to halt. We'll call this class of languages the
recursive languages. This means that we can not only generate the strings in the language, we
can actually, via some algorithm, decide whether a string is in the language and halt, with an
answer, either way.
Clearly the class of recursive languages is a subset of the class of recursively enumerable ones.
But, unfortunately, this time we're not going to be able to define our new class by placing
syntactic restrictions on the form of the grammars we use. There are some useful languages,
such as a^n b^n c^n, that are recursive. There are some others, unfortunately, that are not.
The Whole Picture
[Slide - The Language Hierarchy]
Computational Devices
Formal Models of Computational Devices
If we want to make formal statements about the kinds of computing power required to solve
various kinds of problems, then we need simple, precise models of computation.
We're looking for models that make it easy to talk about what can be computed -- we're not
worrying about efficiency at this point.
When we described languages and grammars, we saw that we could introduce several different
structures, each with different computational power. We can do the same thing with machines.
Let's start with really simple devices and see what they can do. When we find limitations, we can
expand their power.
Finite State Machines
The only memory consists of the ability to be in one of a finite number of states. The machine
operates by reading an input symbol and moving to a new state that is determined solely by the
state it is in and the input that it reads. There is a unique start state and one or more final states.
If the input is exhausted and the machine is in a final state, then it accepts the input. Otherwise it
rejects it.
By the way, it also makes sense to talk about nondeterministic finite state machines. But it turns
out that adding nondeterminism to finite state machines doesn't increase the class of things they
can compute. It just makes it easier to describe some machines. Intuitively, the reason that
nondeterminism doesn't buy you anything with finite state machines is that we can simulate a
nondeterministic machine with a deterministic machine. We just make states that represent sets
of states in the nondeterministic machine. So in essence, we follow all paths. If one of them
accepts, we accept.
Then why can't we do that with PDA's? For finite state machines, there must be a finite number
of states. Duh. So there is a finite number of subsets of states and we can just make them the
states of our new machine. Clunky but finite. Once we add the stack, however, there is no
longer a finite number of states of the total machine. So there is not a finite number of subsets of
states. So we can't simulate being in several states at once just using states. And we only have
one stack. Which branch would get it? That's why adding nondeterminism actually adds power
for PDAs.
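That subset simulation can be written down compactly. The sketch below is my own rendering of the "states that represent sets of states" idea (names are mine; epsilon moves are left out to keep it short):

from collections import deque

def subset_construction(nfa_delta, start, finals, alphabet):
    """Determinize an NFA: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa_delta, dfa_finals = {}, set()
    frontier, seen = deque([start_set]), {start_set}
    while frontier:
        S = frontier.popleft()
        if S & finals:
            dfa_finals.add(S)
        for a in alphabet:
            T = frozenset(q2 for q in S for q2 in nfa_delta.get((q, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                frontier.append(T)
    return dfa_delta, start_set, dfa_finals

# Example: an NFA accepting strings over {a, b} whose second-to-last symbol is a.
nfa = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
       ("q1", "a"): {"q2"}, ("q1", "b"): {"q2"}}
delta, s0, F = subset_construction(nfa, "q0", {"q2"}, "ab")
print(len({S for (S, _) in delta}))   # number of reachable DFA states (here 4)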
Turing Machines
Clearly there are still some things we cannot do with PDAs. All we have is a single stack. We
can count one thing. If we need to count more than one thing (such as a's and b's in the case of
languages defined by a n b n c n ), we're in trouble.
So we need to define a more powerful computing device. The formalism we'll use is called the
Turing Machine, after its inventor, Alan Turing. There are many different (and equivalent) ways
to write descriptions of Turing Machines, but the basic idea is the same for all of them [slide Turing Machines]. In this new formalism, we allow our machines to write onto the input tape.
They can write on top of the input. They can also write past the input. This makes it easier to
define computation that actually outputs something besides yes or no if we want to. But, most
importantly, because we view the tape as being of infinite length, all limitations of finiteness or
limited storage have been removed, even though we continue to retain the core idea of a finite
number of states in the controller itself.
Notice, though, that Turing Machines are not guaranteed to halt. Our example one always does.
But we could certainly build one that scans right until it finds a blank (writing nothing) and then
scans left until it finds the start symbol and then scans right again and so forth. That's a legal (if
stupid) Turing Machine. Unfortunately, (see below) it's not always possible to tell, given a
Turning Machine, whether it is guaranteed to halt. This is the biggest difference between Turing
Machines and the FSMs and PDAs, both of which will always halt.
Extensions to Turing Machines
You may be thinking, wow, this Turing Machine idea sure is restrictive. For example, suppose
we want to accept all strings in the simple language {w#w^R}. We saw that this was easy to do
in one pass with a pushdown automaton. But to do this with the sort of Turing Machine we've
got so far would be really clunky. [work this out on a slide] We'd have to start at the left of the
string, mark a character, move all the way to the right to find the corresponding character, mark
it, scan back left, do it again, and so forth. We've just transformed a linear process into an n^2 one.
But suppose we had a Turing Machine with 2 tapes. The first thing we'll do is to copy the input
onto the second tape. Now start the read head of the first tape at the left end of the input and the
read head of the second tape at the right end. At each step in the operation of the machine, we
check to make sure that the characters being read on the two tapes are the same. And we move
the head on tape 1 right and the head on tape 2 to the left. If we run out of input on both tapes at the same time, we accept. [slide - A Two Head Turing Machine]
The big question now is, "Have we created a new notational device, one that makes it easier to
describe how a machine will operate, or have we actually created a new kind of device with more
power than the old one?" The answer is the former. We can prove that by showing that we can
simulate a Turing Machine with any finite number of tapes by a machine that computes the same
thing but only has one tape. [slide - Simulating k Heads with One] The key idea here is to use
the one tape but to think of it as having some larger number of tracks. Since there is a finite
tape alphabet, we know that we can encode any finite number of symbols in a finite (but larger)
symbol alphabet. For example, to simulate our two headed machine with a tape alphabet of 3
symbols plus start and blank, we will need 2*2*5*5 or 100 tape symbols. So to do this
simulation, we must do two main things: Encode all the information from the old, multi-tape
machine on the new, single tape machine and redesign the finite state controller so that it
simulates, in several moves, each move of the old machine.
It turns out that any "reasonable" addition you can think of to our idea of a Turing Machine is
implementable with the simple machine we already have. For example, any nondeterministic
Turing Machine can be simulated by a deterministic one. This is really significant. In this, in
some ways trivial, machine, we have captured the idea of computability.
Okay, so our Turing Machines can do everything any other machine can do. It also goes the
other way. We can propose alternative structures that can do everything our Turing Machines
can do. For example, we can simulate any Turing Machine with a deterministic PDA that has
two stacks rather than one. What this machine will do is read its input tape once, copying onto
the first stack all the nonblank symbols. Then it will pop all those symbols off, one at a time, and
move them to the second stack. Now it can move along its simulated tape by transferring symbols
from one stack to the other. [slide - Simulating a Turing Machine with Two Stacks]
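One way to realize the tape-as-two-stacks idea is the following small Python sketch (class and method names are my own); the top of the right stack is the square under the head:

class TwoStackTape:
    """'left' holds everything strictly to the left of the head (top of stack = the
    square just left of the head); 'right' holds the head square and everything to
    its right (top of stack = current square). Blank squares are written '_'."""
    def __init__(self, contents):
        self.left = []
        self.right = list(reversed(contents)) or ["_"]

    def read(self):
        return self.right[-1]

    def write(self, symbol):
        self.right[-1] = symbol

    def move_right(self):
        self.left.append(self.right.pop())
        if not self.right:
            self.right.append("_")        # fresh blank square off the right end

    def move_left(self):
        # for this sketch, moving off the left end just exposes a blank
        self.right.append(self.left.pop() if self.left else "_")

tape = TwoStackTape("abb")
tape.move_right(); tape.write("X")        # tape now reads a X b
tape.move_left();  print(tape.read())     # -> a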
The Universal Turing Machine
So now, having shown that we can simulate anything on a simple Turing Machine, it should
come as no surprise that we can design a Turing Machine that takes as its input the definition of
another Turing Machine, along with an input for that machine. What our machine does is to
simulate the behavior of the machine it is given, on the given input.
Remember that to simulate a k-tape machine by a 1 tape machine we had first to state how to
encode the multiple tapes. Then we had to state how the machine would operate on the
encoding. We have to do the same thing here. First we need to decide how to encode the states
and the tape symbols of the input machine, which we'll call M. There's no upper bound on how
many states or tape symbols there will be. So we can't encode them with single symbols. Instead
we'll encode states as strings that start with a "q" and then have a binary encoding of the state
number (with enough leading zeros so all such encodings take the same number of digits). We'll
encode tape symbols as an "a" followed by a binary encoding of the count of the symbol. And
we'll encode "move left" as 10, "move right" as 01, and stay put as 00. We'll use # as a delimiter
between transitions. [slide - Encoding States, Symbols, and Transitions]
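In that spirit, a single transition could be encoded by something like the following Python sketch (the field order and the fixed widths are assumptions of this sketch, not the packet's exact convention):

def encode_transition(state, symbol, new_state, new_symbol, move,
                      state_bits=3, symbol_bits=2):
    """A state is 'q' plus a fixed-width binary state number, a tape symbol is 'a'
    plus a binary count, and a head move is 10 (left), 01 (right), or 00 (stay)."""
    moves = {"L": "10", "R": "01", "S": "00"}
    q = lambda i: "q" + format(i, "0%db" % state_bits)
    a = lambda i: "a" + format(i, "0%db" % symbol_bits)
    return q(state) + a(symbol) + q(new_state) + a(new_symbol) + moves[move]

# (q1, a2) -> (q3, a0, move right); '#' separates one transition from the next
print(encode_transition(1, 2, 3, 0, "R") + "#")   # q001a10q011a0001#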
Next, we need a way to encode the simulation of the operation of M. We'll use a three tape
machine as our Universal Turing Machine. (Remember, we can always implement it on a one
tape machine, but this is a lot easier to describe.) We'll use one tape to encode the tape of M, the
second tape contains the encoding of M, and the third tape encodes the current state of M during
the simulation. [slide - The Universal Turing Machine]
A Hierarchy of Computational Devices
These various machines that we have just defined fall into an inclusion hierarchy, in the sense that the simpler machines can always be simulated by the more powerful ones. [Slide - A
Machine Hierarchy]
Church's Thesis
If we really want to talk about naturalness, can we say anything about whether we've captured
what it means to be computable? Church's Thesis (also sometimes called the Church-Turing
Thesis) asserts that the precise concept of the Turing Machine that halts on all inputs corresponds
to the intuitive notion of an algorithm. Think about it. Clearly a Turing Machine that halts
defines an algorithm. But what about the other way around? Could there be something that is
computable by some kind of algorithm that is not computable by a Turing Machine that halts?
From what we've seen so far, it may seem unlikely, since every extension we can propose to the
Turing Machine model turns out possibly to make things more convenient, but it never extends
the formal power. It turns out that people have proposed various other formalisms over the last
50 years or so, and they also turn out to be no more powerful than the Turing Machine. Of
course, something could turn up, but it seems unlikely.
itself, then TROUBLE loops (i.e., it doesn't halt, thus contradicting our assumption that HALTS
could do the job). But if HALTS says FALSE, namely that TROUBLE will not halt on itself,
then TROUBLE promptly halts, thus again proving our supposed oracle HALTS wrong. Thus
HALTS cannot exist.
We've used a sort of stripped down version of diagonalization here [slide - Viewing the Halting
Problem as Diagonalization] in which we don't care about the whole row of the item that
creates the contradiction. We're only invoking HALTS with two identical inputs. It's just the
single element that we care about and that causes the problem.
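The argument can be written out as a short Python sketch (HALTS is the assumed oracle from the argument above; it cannot actually exist, so the stub below only signals that):

def HALTS(program_text, input_text):
    """The assumed decider: would this program halt on this input?  No such
    program exists; the stub is here only so the sketch is self-contained."""
    raise NotImplementedError("no such decider exists")

def TROUBLE(program_text):
    # Ask the oracle: would this program halt when run on its own text?
    if HALTS(program_text, program_text):
        while True:          # oracle says "halts" -> loop forever
            pass
    return                   # oracle says "loops" -> halt immediately

# The contradiction: TROUBLE("TROUBLE") halts if and only if HALTS("TROUBLE", "TROUBLE")
# answers False, so HALTS gives a wrong answer on this input no matter what it says.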
So What's Left?
The smallest set is the set that contains no elements. It is called the empty set, and is written ∅ or {}.
When you are working with sets, it is very important to keep in mind the difference between a set and the elements of a
set. Given a set that contains more than one element, this is not usually tricky. It's clear that {1, 2} is distinct from either the
number 1 or the number 2. It sometimes becomes a bit trickier though with singleton sets (sets that contain only a single
element). But it is equally true here. So, for example, {1} is distinct from the number 1. As another example, consider
{∅}. This is a set that contains one element. That element is in turn a set that contains no elements (i.e., the empty set).
Notice that the empty set is a subset of every set (since, trivially, every element of ∅, all none of them, is also an element of every other set). And the empty set is a proper subset of every set other than itself.
It is useful to define some basic operations that can be performed on sets:
The union of two sets A and B (written A ∪ B) contains all elements that are contained in A or B (or both). We can
easily visualize union using a Venn diagram. The union of sets A and B is the entire hatched area:
The intersection of two sets A and B (written A ∩ B) contains all elements that are contained in both A and B. In the
Venn diagram shown above, the intersection of A and B is the double hatched area in the middle.
The difference of two sets A and B (written A - B) contains all elements that are contained in A but not in B. In both of
the following Venn diagrams, the hatched region represents A - B.
The complement of a set A with respect to a specific domain D (written ¬A) contains all elements of D that are not contained in A (i.e., ¬A = D - A). For example, if D is the set of residents of Austin and A is the set of Austin residents who like barbeque, then ¬A is the set of Austin residents who don't like barbeque. The complement of A is shown as the
hatched region of the following Venn diagram:
Two sets are disjoint if they have no elements in common (i.e., their intersection is empty). In the following Venn
diagram, A and B are disjoint:
So far, we've talked about operations on pairs of sets. But just as we can extend binary addition and sum up a whole set of
numbers, we can extend the binary operations on sets and perform them on sets of sets. Recall that for summation we have the Σ notation; analogously, we use ∪Ai and ∩Ai to indicate the union of a set of sets and the intersection of a set of sets, respectively.
Now consider a set A. For example, let A = {1, 2, 3}. Next, let's enumerate the set of all subsets of A:
{∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
We call this set the power set of A, and we write it 2^A. The power set of A is interesting because, if we're working with
the elements of A, we may well care about all the ways in which we can combine those elements.
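A one-liner makes the construction concrete (a Python sketch of my own, not part of the original text):

from itertools import combinations

def power_set(A):
    """All subsets of A -- the power set, written 2^A above."""
    A = list(A)
    return [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]

print(power_set({1, 2, 3}))
# 8 subsets: set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}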
Now for one final property of sets. Again consider the set A above. But this time, rather than looking for all possible
subsets, let's just look for a single way to carve A up into subsets such that each element of A is in precisely one subset.
For example, we might choose any of the following sets of subsets:
{{1}, {2, 3}}
or
{{1, 3}, {2}}
or
{{1, 2, 3}}
We call any such set of subsets a partition of A. Partitions are very useful. For example, suppose we have a set S of
students in a school. We need for every student to be assigned to precisely one lunch period. Thus we must construct a
partition of S: a set of subsets, one for each lunch period, such that each student is in precisely one subset. More formally,
we say that Π is a partition of a set A if and only if (a) no element of Π is empty; (b) all members of Π are disjoint (alternatively, each element of A is in only one element of Π); and (c) ∪Π = A (alternatively, each element of A is in some element of Π and no element not in A is in any element of Π).
This notion of partitioning a set is fundamental to programming. Every time you analyze the set of possible inputs to your
program and consider the various cases that must be dealt with, you're forming a partition of the set of inputs: each input
must fall through precisely one path in your program. So it should come as no surprise that, as we build formal models of
computational devices, we'll rely heavily on the idea of a partition on a set of inputs as an analytical technique.
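The three defining conditions translate directly into a check (a Python sketch of my own, with made-up names):

def is_partition(pi, A):
    """Check the three conditions above: no empty block, pairwise-disjoint blocks,
    and the blocks together cover exactly A."""
    blocks = [set(b) for b in pi]
    if any(not b for b in blocks):
        return False
    union = set()
    for b in blocks:
        if union & b:          # this block overlaps an earlier one
            return False
        union |= b
    return union == set(A)

print(is_partition([{1}, {2, 3}], {1, 2, 3}))      # True
print(is_partition([{1, 3}, {3}], {1, 2, 3}))      # False: 3 appears twice and 2 is missing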
2.1 Relations
An ordered pair is a sequence of two objects. Given any two objects, x and y, there are two ordered pairs that can be
formed. We write them as (x, y) and (y, x). As the name implies, in an ordered pair (as opposed to in a set), order matters
(unless x and y happen to be equal).
The Cartesian product of two sets A and B (written A × B) is the set of all ordered pairs (a, b) such that a ∈ A and b ∈ B.
For example, let A be a set of people {Dave, Sue, Billy} and let B be a set of desserts {cake, pie, ice cream}. Then
A × B = { (Dave, cake), (Dave, pie), (Dave, ice cream),
(Sue, cake), (Sue, pie), (Sue, ice cream),
(Billy, cake), (Billy, pie), (Billy, ice cream)}
As you can see from this example, the Cartesian product of two sets contains elements that represent all the ways of
pairing someone from the first set with someone from the second. Note that A × B is not the same as B × A. In our
example,
B × A = { (cake, Dave), (pie, Dave), (ice cream, Dave),
(cake, Sue), (pie, Sue), (ice cream, Sue),
(cake, Billy), (pie, Billy), (ice cream, Billy)}
We'll have more to say about the cardinality (size) of sets later, but for now, let's make one simple observation about the
cardinality of a Cartesian product. If A and B are finite and if there are p elements in A and q elements in B, then there
are p*q elements in A × B (and in B × A).
We're going to use Cartesian product a lot. It's our basic tool for constructing complex objects out of simpler ones. For
example, we're going to define the class of Finite State Machines as the Cartesian product of five sets. Each individual finite state machine then will be a five-tuple (K, Σ, δ, s, F) drawn from that Cartesian product. The sets will be:
1. The set of all possible sets of states: {{q1}, {q1, q2}, {q1, q2, q3}, ...}. We must draw K from this set.
2. The set of all possible input alphabets: {{a}, {a, b, c}, {1, 2, 3, 4}, {1, w, h, j, k}, {q, a, f}, ...}. We must draw Σ from this set.
3. The set of all possible transition functions, which tell us how to move from one state to the next. We must draw δ from this set.
4. The set of all possible start states. We must draw s from this set.
5. The set of all possible sets of final states. (If we land in one of these when we've finished processing an input string, then we accept the string, otherwise we reject.) We must draw F from this set.
Let's return now to the simpler problem of choosing dessert. Suppose we want to define a relation that tells us, for each
person, what desserts he or she likes. We might write the Dessert relation, for example as
{(Dave, cake), (Dave, ice cream), (Sue, pie), (Sue, ice cream)}
In other words, Dave likes cake and ice cream, Sue likes pie and ice cream, and Billy hates desserts.
We can now define formally what a relation is. A binary relation over two sets A and B is a subset of A × B. Our dessert
relation clearly satisfies this definition. So do lots of other relations, including common ones defined on the integers. For
example, Less than (written <) is a binary relation on the integers. It contains an infinite number of elements drawn from
the Cartesian product of the set of integers with itself. It includes, for example:
{(1, 2), (2, 3), (3, 4), ...}
Notice several important properties of relations as we have defined them. First, a relation may be equal to the empty set.
For example, if Dave, Sue, and Billy all hate dessert, then the dessert relation would be {} or ∅.
Second, there are no constraints on how many times a particular element of A or B may occur in the relation. In the
dessert example, Dave occurs twice, Sue occurs twice, Billy doesn't occur at all, cake occurs once, pie occurs once, and
ice cream occurs twice.
If we have two or more binary relations, we may be able combine them via an operation we'll call composition. For
example, if we knew the number of fat grams in a serving of each kind of dessert, we could ask for the number of fat
grams in a particular person's dessert choices. To compute this, we first use the Dessert relation to find all the desserts
each person likes. Next we get the bad news from the FatGrams relation, which probably looks something like this:
{(cake, 25), (pie, 15), (ice cream, 20)}
Finally, we see that the composed relation that relates people to fat grams is {(Dave, 25), (Dave, 20), (Sue, 15), (Sue,
20)}. Of course, this only worked because when we applied the first relation, we got back desserts, and our second
relation has desserts as its first component. We couldn't have composed Dessert with Less than, for example.
Formally, we say that the composition of two relations R1 ⊆ A × B and R2 ⊆ B × C, written R2 ∘ R1, is {(a, c) : (a, b) ∈ R1 and (b, c) ∈ R2}. Note that in this definition, we've said that to compute R2 ∘ R1, we first apply R1, then R2. In other words we go right to left. Some definitions go the other way. Obviously we can define it either way, but it's important to check carefully what definition people are using and to be consistent in what you do. Using this notation, we'd represent the people to fat grams composition described above as FatGrams ∘ Dessert.
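The dessert example can be replayed directly in Python (a sketch of my own; the data are the pairs given above):

Dessert  = {("Dave", "cake"), ("Dave", "ice cream"), ("Sue", "pie"), ("Sue", "ice cream")}
FatGrams = {("cake", 25), ("pie", 15), ("ice cream", 20)}

def compose(R2, R1):
    """R2 o R1 = {(a, c) : (a, b) in R1 and (b, c) in R2}; R1 is applied first."""
    return {(a, c) for (a, b1) in R1 for (b2, c) in R2 if b1 == b2}

print(compose(FatGrams, Dessert))
# {('Dave', 25), ('Dave', 20), ('Sue', 15), ('Sue', 20)}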
Now let's generalize a bit. An ordered pair is a sequence (where order counts) of two elements. We could also define an
ordered triple as a sequence of three elements, an ordered quadruple as a sequence of four elements, and so forth. More
generally, if n is any positive integer, then an ordered n-tuple is a sequence of n elements. For example, (Ann, Joe, Mark)
is a 3-tuple.
We defined binary relation using our definition of an ordered pair. Now that we've extended our definition of an ordered
pair to an ordered n-tuple, we can extend our notion of a relation to allow for an arbitrary number of elements to be
related. We define an n-ary relation over sets A1, A2, ..., An as a subset of A1 × A2 × ... × An. The n sets may be
different, or they may be the same. For example, let A be a set of people:
A = {Dave, Sue, Billy, Ann, Joe, Mark, Cathy, Pete}
Now suppose that Ann and Dave are the parents of Billy, Ann and Joe are the parents of Mark, and Mark and Sue are the
parents of Cathy. Then we could define a 3-ary (or ternary) relation Child-of as the following subset of A A A:
{(Ann, Dave, Billy), (Ann, Joe, Mark), (Mark, Sue, Cathy)}
2.2 Functions
Relations are very general. They allow an object to be related to any number of other objects at the same time (as we did
in the dessert example above). Sometimes, we want a more restricted notion, in which each object is related to a unique
other object. For example, (at least in an ideal world without criminals or incompetent bureaucrats) each American
resident is related to a unique social security number. To capture this idea we need functions. A function from a set A to
a set B is a special kind of a binary relation over A and B in which each element of A occurs precisely once. The dessert
relation we defined earlier is not a function since Dave and Sue each occur twice and Billy doesn't occur at all. We
haven't restricted each person to precisely one dessert. A simple relation that is a function is the successor function Succ
defined on the integers:
Succ(n) = n + 1.
Of course, we cannot write out all the elements of Succ (since there are an infinite number of them), but Succ includes:
{..., (-3, -2), (-2, -1), (-1, 0), (0, 1), (1, 2), (2, 3), ...}
It's useful to define some additional terms to make it easy to talk about functions. We start by writing
f: A → B,
which means that f is a function from the set A to the set B. We call A the domain of f and B the codomain or range. We
may also say that f is a function from A to B. If a ∈ A, then we write
f(a),
which we read as "f of a" to indicate the element of B to which a is related. We call this element the image of a under f or
the value of f for a. Note that, given our definition of a function, there must be exactly one such element. We'll also call a
the argument of f. For example we have that
Succ(1) = 2, Succ (2) = 3, and so forth.
Thus 2 is the image (or the value) of the argument 1 under Succ.
Succ is a unary function. It maps from a single element (a number) to another number. But there are lots of interesting
functions that map from ordered pairs of elements to a value. We call such functions binary functions. For example,
integer addition is a binary function:
+: (Z × Z) → Z
Thus + includes elements such as ((2, 3), 5), since 2 + 3 is 5. We could also write
+((2,3)) = 5
We have double parentheses here because we're using the outer set to indicate function application (as we did above
without confusion for Succ) and the inner set to define the ordered pair to which the function is being applied. But this is
confusing. So, generally, when the domain of a function is the Cartesian product of two or more sets, as it is here, we drop
the inner set of parentheses and simply write
+(2,3) = 5.
Alternatively, many common binary functions are written in infix notation rather than the prefix notation that is standard
for all kinds of function. This allows us to write
2+3 = 5
So far, we've had unary functions and binary functions. But just as we could define n-ary relations for arbitrary values of
n, we can define n-ary functions. For any positive integer n, an n-ary function f is defined as
f: (D1 × D2 × ... × Dn) → R
For example, let Z be the set of integers. Then
QuadraticEquation: (Z × Z × Z) → F
is a function whose domain is the set of ordered triples of integers and whose range is a set of functions. The definition of QuadraticEquation is:
QuadraticEquation(a, b, c)(x) = ax^2 + bx + c
What we did here is typical of function definition. First we specify the domain and the range of the function. Then we
define how the function is to compute its value (an element of the range) given its arguments (an element of the domain).
QuadraticEquation may seem a bit unusual since its range is a set of functions, but both the domain and the range of a
function can be any set of objects, so sets of functions qualify.
Recall that in the last section we said that we could compose binary relations to derive new relations. Clearly, since
functions are just special kinds of binary relations, if we can compose binary relations we can certainly compose binary
functions. Because a function returns a unique value for each argument, it generally makes a lot more sense to compose
functions than it does relations, and you'll see that although we rarely compose relations that aren't functions, we compose
functions all the time. So, following our definition above for relations, we define the composition of two functions F1 ⊆ A × B and F2 ⊆ B × C, written F2 ∘ F1, as {(a, c) : ∃b such that (a, b) ∈ F1 and (b, c) ∈ F2}. Notice that the composition of two
functions must necessarily also be a function. We mentioned above that there is sometimes confusion about the order in
which relations (and now functions) should be applied when they are composed. To avoid this problem, let's introduce a
new notation F(G(x)). We use the parentheses here to indicate function application, just as we did above. So this notation
is clear. Apply F to the result of first applying G to x. This notation reads right to left as does our definition of the
notation.
A function is a special kind of a relation (one in which each element of the domain occurs precisely once). There are also
special kinds of functions:
A function f : D → R is total if it is defined for every element of D (i.e., every element of D is related to some element of
R). The standard mathematical definition of a function requires totality. The reason we haven't done that here is that, as
we pursue the idea of "computable functions", we'll see that there are total functions whose domains cannot be effectively
defined (for example, the set of C programs that always halt). Thus it is useful to expand the definition of the function's
domain (e.g., to the set of all C programs) and acknowledge that if the function is applied to certain elements of the
domain (e.g., programs that don't halt), its value will be undefined. We call this broader class of functions (which does
include the total functions as a subset) the set of partial functions. For the rest of our discussion in this introductory unit,
we will consider only total functions, but be prepared for the introduction of partial functions later.
A function f : D → R is one to one if no element of the range occurs more than once. In other words, no two elements of
the domain map to the same element of the range. Succ is one to one. For example, the only number to which we can
apply Succ and derive 2 is 1. QuadraticEquation is also one to one. But + isn't. For example, both +(2,3) and +(4,1)
equal 5.
A function f : D → R is onto if every element of R is the value of some element of D. Another way to think of this is that
a function is onto if all of the elements of the range are "covered" by the function. As we defined it above, Succ is onto.
But let's define a different function Succ' on the natural numbers (rather than the integers). So we define
Succ': N → N.
Succ' is not onto because there is no natural number i such that Succ'(i) = 0.
The easiest way to envision the differences between an arbitrary relation, a function, a one to one function and an onto
function is to make two columns (the first for the domain and the second for the range) and think about the sort of
matching problems you probably had on tests in elementary school.
Let's consider the following five matching problems and let's look at various ways of relating the elements of column 1
(the domain) to the elements of column 2 (the range):
[five matching-problem diagrams, numbered 1 through 5, each relating a column of letters (A, B, C, ...) to a column of values (x, y, z)]
The relationship in example 1 is a relation but it is not a function, since there are three values associated with A. The second
example is a function since, for each object in the first column, there is a single value in the second column. But this
function is neither one to one (because x is derived from both A and B) nor onto (because z can't be derived from
anything). The third example is a function that is one to one (because no element of the second column is related to more
than one element of the first column). But it still isn't onto because z has been skipped: nothing in the first column derives
it. The fourth example is a function that is onto (since every element of column two has an arrow coming into it), but it
isn't one to one, since z is derived from both C and D. The fifth and final example is a function that is both one to one and
onto. By the way, see if you can modify either example 3 or example 4 to make them both one to one and onto. You're
not allowed to change the number of elements in either column, just the arrows. You'll notice that you can't do it. In order
for a function to be both one to one and onto, there must be equal numbers of elements in the domain and the range.
The inverse of a binary relation R is simply the set of ordered pairs in R with the elements of each pair reversed.
Formally, if R ⊆ A × B, then R⁻¹ ⊆ B × A = {(b, a) : (a, b) ∈ R}. If a relation is a way of associating with each element of
A a corresponding element of B, then think of its inverse as a way of associating with elements of B their corresponding
elements in A. Every relation has an inverse. Every function also has an inverse, but that inverse may not also be a
function. For example, look again at example two of the matching problems above. Although it is a function, its inverse
is not. Given the argument x, should we return the value A or B? Now consider example 3. Its inverse is also not a
(total) function, since there is no value to be returned for the argument z. Example four has the same problem example
two does. Now look at example five. Its inverse is a function. Whenever a function is both one to one and onto, its
inverse will also be a function and that function will be both one to one and onto.
Inverses are useful. When a function has an inverse, it means that we can move back and forth between columns one and
two without loss of information. Look again at example five. We can think of ourselves as operating in the {A, B, C}
universe or in the {x, y, z} universe interchangeably since we have a well defined way to move from one to the other. And
if we move from column one to column two and then back, we'll be exactly where we started. Functions with inverses
(alternatively, functions that are both one to one and onto) are called bijections. And they may be used to define
isomorphisms between sets, i.e., formal correspondences between the elements of two sets, often with the additional
requirement that some key structure be preserved. We'll use this idea a lot. For example, there exists an isomorphism
between the set of states of a finite state machine and a particular set of sets of input strings that could be fed to the
machine. In this isomorphism, each state is associated with precisely the set of strings that drive the machine to that state.
[Figure: the Mother-of relation M drawn as a graph on the nodes Doreen, Ann, Catherine, and Allison.]
And, finally, approach 4: Again assuming a finite relation R ⊆ A × A, we can build an incidence matrix to represent R as
follows:
1) Construct a square boolean matrix S whose number of rows and columns equals the number of elements of A that
appear in any element of R.
2) Label one row and one column for each such element of A.
3) For each element (p, q) of R, set S(p, q) to 1 (or True). Set all other elements of S to 0 (or False).
The following boolean matrix represents our example relation M defined above:

              Doreen   Ann   Catherine   Allison
Doreen          0       1        0          0
Ann             0       0        1          0
Catherine       0       0        0          1
Allison         0       0        0          0
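The construction of approach 4 is easy to mechanize. The following Python sketch is ours (the names are illustrative); it builds the incidence matrix for the Mother-of relation M:

def incidence_matrix(elements, relation):
    """Build a boolean matrix S where S[p][q] is True iff (p, q) is in the relation."""
    return {p: {q: (p, q) in relation for q in elements} for p in elements}

people = ['Doreen', 'Ann', 'Catherine', 'Allison']
M = {('Doreen', 'Ann'), ('Ann', 'Catherine'), ('Catherine', 'Allison')}
S = incidence_matrix(people, M)
print(S['Doreen']['Ann'])   # True
print(S['Ann']['Doreen'])   # False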
A relation R ⊆ A × A is symmetric if, whenever (a, b) ∈ R, so is (b, a). In other words, if a is related to b, then b is related to a. The Address relation we described above is symmetric. If Joe lives with Ann, then Ann lives with Joe. The Less than or equal relation is not symmetric (since, for example, 2 ≤ 3, but it is not true that 3 ≤ 2). The graph
representation of a symmetric relation has the property that between any two nodes, either there is an arrow going in both
directions or there is an arrow going in neither direction. So we get graphs with components that look like this:
If we choose the matrix representation, we will end up with a symmetric matrix (i.e., if you flip it on its major diagonal,
you'll get the same matrix back again). In other words, if we have a matrix with 1's wherever there is a number in the
following matrix, then there must also be 1's in all the squares marked with an *:
[Figure: a small matrix with numbered entries on one side of the main diagonal and *'s marking the mirror-image squares on the other side.]
A relation R ⊆ A × A is antisymmetric if, whenever (a, b) ∈ R and a ≠ b, then (b, a) ∉ R. The Mother-of relation we
described above is antisymmetric: if Ann is the mother of Catherine, then one thing we know for sure is that Catherine is
not also the mother of Ann. Our Address relation is clearly not antisymmetric, since it is symmetric. There are, however,
relations that are neither symmetric nor antisymmetric. For example, the Likes relation on the set of people: If Joe likes
Bob, then it is possible that Bob likes Joe, but it is also possible that he doesn't.
A relation R ⊆ A × A is transitive if, whenever (a, b) ∈ R and (b, c) ∈ R, (a, c) ∈ R. A simple example of a transitive
relation is Less than. Address is another one: if Joe lives with Ann and Ann lives with Mark, then Joe lives with Mark.
Mother-of is not transitive. But if we change it slightly to Ancestor-of, then we get a transitive relation. If Doreen is an
ancestor of Ann and Ann is an ancestor of Catherine, then Doreen is an ancestor of Catherine.
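For a finite relation represented as a set of ordered pairs, all three properties can be tested mechanically. The following Python sketch is ours, not from the notes; it checks symmetry, antisymmetry, and transitivity, using a small piece of Less than or equal as the test relation:

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R):
    return all(not ((b, a) in R and a != b) for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R
               for (a, b) in R
               for (c, d) in R
               if b == c)

LessThanOrEqual = {(a, b) for a in range(4) for b in range(4) if a <= b}
print(is_symmetric(LessThanOrEqual))      # False
print(is_antisymmetric(LessThanOrEqual))  # True
print(is_transitive(LessThanOrEqual))     # True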
The three properties of reflexivity, symmetry, and transitivity are almost logically independent of each other. We can find
simple, possibly useful relationships with seven of the eight possible combinations of these properties:
Domain      Example
people      Mother-of
people      Would-recognize-picture-of
people      Has-ever-been-married-to
people      Ancestor-of
people      Hangs-out-with (assuming we can say one hangs out with oneself)
numbers     Less than or equal to
numbers     Equality
people      Address
To see why we can't find a good example of a relation that is symmetric and transitive but not reflexive, consider a simple
relation R on {1, 2, 3, 4}. As soon as R contains a single element that relates two unequal objects (e.g., (1, 2)), it must, for
symmetry, contain the matching element (2, 1). So now we have R = {(1, 2), (2, 1)}. To make R transitive, we must add
(1, 1). But that also makes R reflexive.
[Figure: the five elements of P, drawn as nodes labeled 1 through 5.]
Now let's build an equivalence relation E on P. The first thing we have to do is to relate each node to itself, in order to
make the relation reflexive. So we've now got:
[Figure: nodes 1 through 5, each with an arrow looping back to itself.]
Now let's add one additional element (1,2). As soon as we do that, we must also add (2,1), since E must be symmetric. So
now we've got:
[Figure: nodes 1 through 5 with self-loops, plus arrows in both directions between 1 and 2.]
Suppose we now add (2,3). We must also add (3,2) to maintain symmetry. In addition, because we have (1, 2) and (2, 3),
we must create (1,3) for transitivity. And then we need (3, 1) to restore symmetry. That gives us
[Figure: nodes 1 through 5 with self-loops, plus arrows in both directions among 1, 2, and 3.]
Notice what happened here. As soon as we related 3 to 2, we were also forced to relate 3 to 1. If we hadn't, we would no
longer have had an equivalence relation. See what happens now if you add (3, 4) to E.
What we've seen in this example is that an equivalence relation R on a set S carves S up into a set of clusters, which we'll
call equivalence classes. This set of equivalence classes has the following key property:
For any s, t ∈ S, if s ∈ Classi and (s, t) ∈ R, then t ∈ Classi.
In other words, all elements of S that are related under R are in the same equivalence class. To describe equivalence
classes, we'll use the notation [a] to mean the equivalence class to which a belongs. Or we may just write [description],
where description is some clear property shared by all the members of the class. Notice that in general there may be lots
of different ways to describe the same equivalence class. In our example, for instance, [1], [2], and [3] are different names
for the same equivalence class, which includes the elements 1, 2, and 3. In this example, there are two other equivalence
classes as well: [4] and [5].
It is possible to prove that if R is an equivalence relation on a nonempty set A then the equivalence classes of R constitute a partition of A. Recall that Π is a partition of a set A if and only if (a) no element of Π is empty; (b) all members of Π are disjoint; and (c) the union of all the members of Π equals A. In other words, if we want to take a set A and carve it up into a set of subsets, an equivalence relation is a good way to do it.
For example, our Address relation carves up a set of people into subsets of people who live together. Let's look at some
more examples:
Let A be the set of all strings of letters. Let SameLength ⊆ A × A relate strings whose lengths are the same. SameLength is an equivalence relation that carves up the universe of all strings into a collection of subsets, one for each natural number (i.e., strings of length 0, strings of length 1, etc.).
Let Z be the set of integers. Let EqualMod3 ⊆ Z × Z relate integers that have the same remainder when divided by 3. EqualMod3 has three equivalence classes, [0], [1], and [2]. [0] includes 0, 3, 6, etc.
Let CP be the set of C programs, each of which accepts an input of variable length. We'll call the length of any specific input n. Let SameComplexity ⊆ CP × CP relate two programs if their running-time complexity is the same. More specifically, (c1, c2) ∈ SameComplexity precisely in case:
∃m1, m2, k [∀n > k, RunningTime(c1) ≤ m1*RunningTime(c2) AND RunningTime(c2) ≤ m2*RunningTime(c1)]
Not every relation that connects "similar" things is an equivalence relation. For example, consider SimilarCost(x, y),
which holds if the price of x is within $1 of the price of y. Suppose A costs $10, B costs $10.50, and C costs $11.25.
Then SimilarCost(A, B) and SimilarCost(B, C), but not SimilarCost(A, C). So SimilarCost is not transitive, although it is
reflexive and symmetric.
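Computing the equivalence classes of a relation like EqualMod3 is straightforward. The following Python sketch is ours (and it assumes the relation passed in really is an equivalence relation); it groups a finite slice of the integers into the classes [0], [1], and [2]:

def equivalence_classes(elements, related):
    """Group elements into classes, where related(a, b) says whether a and b are equivalent."""
    classes = []
    for e in elements:
        for cls in classes:
            if related(e, cls[0]):     # e belongs with an existing class
                cls.append(e)
                break
        else:
            classes.append([e])        # e starts a new class
    return classes

equal_mod_3 = lambda a, b: a % 3 == b % 3
print(equivalence_classes(range(10), equal_mod_3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]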
3.4 Orderings
Important as equivalence relations are, they're not the only special kind of relation worth mentioning. Let's consider two
more.
A partial order is a relation that is reflexive, antisymmetric, and transitive. If we write out any partial order as a graph,
we'll see a structure like the following one for the relation SubstringOf. Notice that in order to make the graph relatively
easy to read, we'll adopt the convention that we don't write in the links that are required by reflexivity and transitivity.
But, of course, they are there in the relation itself:
[Figure: a fragment of the SubstringOf partial order, with the string yxabcdbce at the top and the substrings yxab, bcd, bce, xab, ab, and bc arranged below it, each connected to the strings that contain it.]
Let's consider another example based on the subsumption relation between pairs of logical expressions. A logical
expression A subsumes another expression B iff (if and only if), whenever A is true B must be true regardless of the values
assigned to the variables and functions of A and B. For example: ∀x P(x) subsumes P(A), since, regardless of what the predicate P is and independently of any axioms we have about it, and regardless of what object A represents, if ∀x P(x) is true, then P(A) must be true. Why is this a useful notion? Suppose we're building a theorem proving or reasoning program. If we already know ∀x P(x), and we are then told P(A), we can throw away this new fact. It doesn't add to our
knowledge (except perhaps to focus our attention on the object A) since it is subsumed by something we already knew. A
small piece of the subsumption relation on logical expressions is shown in the following graph. Notice that now there is a
maximal element, False, which subsumes everything (in other words, if we have the assertion False in our knowledge base,
we have a contradiction even if we know nothing else). There is also a minimal element, True, which tells us nothing.
[Figure: a fragment of the subsumption ordering, with False at the top, True at the bottom, and expressions such as ∀x P(x), P(A), P(B), P(A) ∨ Q(A), ∀x R(x), and two quantified expressions combining R with S and with T in between.]
A total order R ⊆ A × A is a partial order that has the additional property that ∀a, b ∈ A, either (a, b) ∈ R or (b, a) ∈ R. In other words, every pair of elements must be related to each other one way or another. The classic example of a total order is ≤ (or ≥, if you prefer) on the integers. The relation is reflexive since every integer is equal to itself. It's antisymmetric since if a ≤ b and a ≠ b, then for sure it is not also true that b ≤ a. It's transitive: if a ≤ b and b ≤ c, then a ≤ c. And, given any two integers a and b, either a ≤ b or b ≤ a. If we draw any total order as a graph, we'll get something that looks like
this (again without the reflexive and transitive links shown):
[Figure: a fragment of the total order on the integers, showing 3, 4, 5, and 6 arranged in a single chain.]
This is only a tiny piece of the graph, of course. It continues infinitely in both directions. But notice that, unlike our
earlier examples of partial orders, there is no splitting in this graph. For every pair of elements, one is above and one is
below.
∀a, b: a # b = b # a      (commutativity)
Examples: integer addition, set intersection, boolean and

∀a, b, c: (a # b) # c = a # (b # c)      (associativity)
Examples:
(a + b) + c = a + (b + c)                  integer addition
(a ∩ b) ∩ c = a ∩ (b ∩ c)                  set intersection
(a AND b) AND c = a AND (b AND c)          boolean and
(a || b) || c = a || (b || c)              string concatenation

∀a: a # a = a      (idempotency)
Examples:
min(a, a) = a                              integer min
a ∩ a = a                                  set intersection
a AND a = a                                boolean and
The distributivity property relates two binary functions: A function # distributes over another function ! iff
∀a, b, c: a # (b ! c) = (a # b) ! (a # c) and (b ! c) # a = (b # a) ! (c # a)
Examples:
a * (b + c) = (a * b) + (a * c)            integer multiplication over addition
a ∪ (b ∩ c) = (a ∪ b) ∩ (a ∪ c)            set union over intersection
a AND (b OR c) = (a AND b) OR (a AND c)    boolean AND over OR
The absorption laws also relate two binary functions to each other: A function # absorbs another function ! iff
∀a, b: a # (a ! b) = a
Examples:
a ∪ (a ∩ b) = a
a OR (a AND b) = a
It is often the case that when a function is defined over some set A, there are special elements of A that have particular
properties with respect to that function. In particular, it is worth defining what it means to be an identity and to be a zero:
An element a is an identity for the function # iff ∀x ∈ A, x # a = x and a # x = x.
Examples:
b * 1 = b              1 is an identity for integer multiplication
b + 0 = b              0 is an identity for integer addition
b ∪ ∅ = b              ∅ is an identity for set union
b OR False = b         False is an identity for boolean OR
b || "" = b            "" is an identity for string concatenation
Sometimes it is useful to differentiate between a right identity (one that satisfies the first requirement above) and a left
identity (one that satisfies the second requirement above). But for all the functions we'll be concerned with, if there is a
left identity, it is also a right identity and vice versa, so we will talk simply about an identity.
An element a is a zero for the function # iff ∀x ∈ A, x # a = a and a # x = a.
Examples:
b * 0 = 0
b ∩ ∅ = ∅
b AND False = False
Just as with identities, it is sometimes useful to distinguish between left and right zeros, but we won't need to.
Although we're focusing here on binary functions, there's one important property that unary functions may have that is
worth mentioning here:
A unary function % is a self inverse iff ∀x, %(%(x)) = x. In other words, if we compose the function with itself (apply it
twice), we get back the original argument. Note that this is not the same as saying that the function is its own inverse. In
most of the cases we'll consider (including the examples given here), it is not. A single application of the function
produces a new value, but if we apply the function a second time, we get back to where we started.
Examples:
-(-(a)) = a
1/(1/a) = a
¬(¬a) = a
(a^R)^R = a
5.1 Relations
We have defined two relations on sets: subset and proper subset. What can we say about them? Subset is a partial order,
since it is reflexive (every set is a subset of itself), transitive (if A ⊆ B and B ⊆ C, then A ⊆ C) and antisymmetric (if A ⊆ B and A ≠ B, then it must not be true that B ⊆ A). For example, we see that the subset relation imposes the following
partial order if you read each arrow as "is a subset of":
[Figure: a subset ordering with Z (the integers) at the top, Odd numbers and Even numbers below it, and {4, 10} below Even numbers.]
What about proper subset? It is not a partial order since it is not reflexive.
5.2 Functions
All of the functional properties we defined above apply in one way or another to the functions we have defined on sets.
Further, as we saw above, some set functions have a zero or an identity. We'll summarize here (without proof) the
most useful properties that hold for the functions we have defined on sets:
Commutativity
A ∪ B = B ∪ A
A ∩ B = B ∩ A
Associativity
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
Idempotency
A ∪ A = A
A ∩ A = A
Distributivity
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
Absorption
(A ∪ B) ∩ A = A
(A ∩ B) ∪ A = A
Identity
A ∪ ∅ = A
Zero
A ∩ ∅ = ∅
Self Inverse
¬(¬A) = A      (writing ¬X for the complement of X)
In addition, we will want to make use of the following theorems that can be proven to apply specifically to sets and their
operations (as well as to boolean expressions):
De Morgan's laws
¬(A ∪ B) = ¬A ∩ ¬B
¬(A ∩ B) = ¬A ∪ ¬B
6.1 Using Set Identities and the Definitions of the Functions on Sets
Sometimes we want to compare apples to apples. We may, for example, want to prove that two sets of strings are
identical, even though they may have been derived differently. In this case, one approach is to use the set identity
theorems that we enumerated in the last section. Suppose, for example, that we want to prove that
A ∩ (B ∪ (A ∪ C)) = A
We can prove this as follows:
A ∩ (B ∪ (A ∪ C)) = (A ∩ B) ∪ (A ∩ (A ∪ C))        Distributivity
                  = (A ∩ B) ∪ ((A ∪ C) ∩ A)        Commutativity
                  = (A ∩ B) ∪ A                    Absorption
                  = A                              Absorption
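Identities like this one are easy to sanity-check on small concrete sets. The following Python fragment is ours, not part of the notes; it verifies A ∩ (B ∪ (A ∪ C)) = A over every choice of A, B, and C drawn from the subsets of a three-element universe:

from itertools import combinations

def subsets(universe):
    return [set(c) for r in range(len(universe) + 1)
                   for c in combinations(universe, r)]

U = {1, 2, 3}
holds = all(A & (B | (A | C)) == A
            for A in subsets(U) for B in subsets(U) for C in subsets(U))
print(holds)   # True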
Sometimes, even when we're comparing apples to apples, the theorems we've listed aren't enough. In these cases, we need
to use the definitions of the operators. Suppose, for example, that we want to prove that
A − B = A ∩ ¬B
We can prove this as follows (where U stands for the Universe with respect to which we take complement):
A − B
= {x : x ∈ A and x ∉ B}
= {x : x ∈ A and (x ∈ U and x ∉ B)}
= {x : x ∈ A and x ∈ U − B}
= {x : x ∈ A and x ∈ ¬B}
= A ∩ ¬B
6.2 Showing Two Sets are the Same by Showing that Each is a Subset of
the Other
Sometimes, though, our problem is more complex. We may need to compare apples to oranges, by which I mean that we
are comparing sets that aren't even defined in the same terms. For example, we will want to be able to prove that A: the
set of languages that can be defined using regular expressions is the same as B: the set of languages that can be accepted
by a finite state automaton. This seems very hard: Regular expressions look like
a* ∪ (b ∪ ba)*
Finite state machines are a collection of states and rules for moving from one state to another. How can we possibly prove
that these A and B are the same set? The answer is that we can show that the two sets are equal by showing that each is a
subset of the other. For example, in the case of the regular expressions and the finite state machines, we will show first
that, given a regular expression, we can construct a finite state machine that accepts exactly the strings that the regular
expression describes. That gives us A ⊆ B. But there might still be some finite state machines that don't correspond to any regular expressions. So we then show that, given a finite state machine, we can construct a regular expression that defines exactly the same strings that the machine accepts. That gives us B ⊆ A. The final step is to exploit the fact that
A ⊆ B and B ⊆ A → A = B
7 Cardinality of Sets
It seems natural to ask, given some set A, "What is the size of A?" or "How many elements does A contain?" In fact,
we've been doing that informally. We'll now introduce formal techniques for discussing exactly what we mean by the size
of a set. We'll use the term cardinality to describe the way we answer such questions. So we'll reply that the cardinality
of A is X, for some appropriate value of X. For simple cases, determining the value of X is straightforward. In other
cases, it can get quite complicated. For our purposes, however, we can get by with three different kinds of answers: a
natural number (if A is finite), "countably infinite" (if A has the same number of elements as there are integers), and
"uncountably infinite" (if A has more elements than there are integers).
We write the cardinality of a set A as |A|.
A set A is finite and has cardinality n ∈ N (the natural numbers) if either A = ∅ or there is a bijection from A to {1, 2, …, n}, for some value of n. In other words, a set is finite if either it is empty or there exists a one-to-one and onto mapping
from it to a subset of the positive integers. Or, alternatively, a set is finite if we can count its elements and finish. The
cardinality of a finite set is simply a natural number whose value is the number of elements in the set.
A set is infinite if it is not finite. The question now is, "Are all infinite sets the same size?" The answer is no. And we
don't have to venture far to find examples of infinite sets that are not the same size. So we need some way to describe the
cardinality of infinite sets. To do this, we need to define a set of numbers we'll call the cardinal numbers. We'll use these
numbers as our measure of the size of sets. Initially, we'll define all the natural numbers to be cardinal numbers. That lets
us describe the cardinality of finite sets. Now we need to add new cardinal numbers to describe the cardinality of infinite
sets.
Let's start with a simple infinite set N, the natural numbers. We need a new cardinal number to describe the (infinite)
number of natural numbers that there are. Following Cantor, we'll call this number ℵ₀. (Read this as "aleph null". Aleph is the first symbol of the Hebrew alphabet.)
Next, we'll say that any other set that contains the same number of members as N does also has cardinality ℵ₀. We'll also call a set with cardinality ℵ₀ countably infinite. And one more definition: A set is countable if it is either finite or countably infinite.
To show that a set has cardinality ℵ₀, we need to show that there is a bijection between it and N. The existence of such a bijection proves that the two sets have the same number of elements. For example, the set E of even natural numbers has cardinality ℵ₀. To prove this, we offer the bijection:
Even : E → N
Even(x) = x/2
So we have the following mapping from E to N:
E:   0   2   4   6   ...
N:   0   1   2   3   ...
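The bijection and its inverse are one-liners; here is an illustrative Python rendering (ours, not from the notes), with the inverse included to emphasize that the correspondence runs in both directions:

def even_to_n(x):      # the bijection Even : E -> N
    return x // 2

def n_to_even(n):      # its inverse
    return 2 * n

print([even_to_n(x) for x in (0, 2, 4, 6)])   # [0, 1, 2, 3]
print([n_to_even(n) for n in (0, 1, 2, 3)])   # [0, 2, 4, 6]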
This one was easy. The bijection was obvious. Sometimes it's less so. In harder cases, a good way to think about the
problem of finding a bijection from some set A to N is that we need to find an enumeration of A. An enumeration of A is
simply a list of the elements of A in some order. Of course, if A is infinite, the list will be infinite, but as long as we can
guarantee that every element of A will show up eventually, we have an enumeration. But what is an enumeration? It is in
fact a bijection from A to the positive integers, since there is a first element, a second one, a third one, and so forth. Of
course, what we need is a bijection to N, so we just subtract one. Thus if we can devise a technique for enumerating the
elements of A, then our bijection to N is simply
Enum : A → N
Enum(x) = x's position in the enumeration - 1
Let's consider an example of this technique:
Theorem: The union of a countably infinite number of countably infinite sets is countably infinite.
To prove this theorem, we need a way to enumerate all the elements of the union. The simplest thing to do would be to
start by dumping in all the elements of the first set, then all the elements of the second, etc. But, since the first set is
infinite, we'll never get around to considering any of the elements of the other sets. So we need another technique. If we
had a finite number of sets to consider, we could take the first element from each, then the second element from each, and
so forth. But we also have an infinite number of sets, so if we try that approach, we'll never get to the second element of
any of the sets. So we follow the arrows as shown below. The numbers in the squares indicate the order in which we
select elements for the enumeration. This process goes on forever, but it is systematic and it guarantees that, if we wait
long enough, any element of any of the sets will eventually be enumerated.
          Element 1   Element 2   Element 3   Element 4   ...
Set 1         1           2           6           7       ...
Set 2         3           5           8          ...
Set 3         4          ...
Set 4        ...
 ...
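The diagonal sweep can be packaged as a generator. The sketch below is ours (the helper names are illustrative); it enumerates the union of countably many infinite sequences, and although its traversal order is not exactly the one in the picture, it has the property that matters: every element of every set shows up after finitely many steps.

from itertools import islice, count

def dovetail(make_set):
    """Enumerate the union of infinitely many infinite sequences.
    make_set(i) returns an iterator over the elements of set i."""
    sets = []                  # iterators opened so far
    diagonal = 0
    while True:
        sets.append(make_set(diagonal))   # open one new set per sweep
        for it in sets:
            yield next(it)                # take the next element of each open set
        diagonal += 1

# Toy example: set i is {i*10, i*10 + 1, i*10 + 2, ...}.
example = dovetail(lambda i: (i * 10 + j for j in count()))
print(list(islice(example, 12)))
# [0, 1, 10, 2, 11, 20, 3, 12, 21, 30, 4, 13]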
It turns out that a lot of sets have cardinality ℵ₀. Some of them, like the even natural numbers, appear at first to contain fewer elements. Some of them, like the union of a countable number of countable sets, appear at first to be bigger. But in both cases there is a bijection between the elements of the set and the natural numbers, so the cardinality is ℵ₀.
However, this isn't true for every set. There are sets with more than ℵ₀ elements. There are more than ℵ₀ real numbers, for example. As another case, consider an arbitrary set S with cardinality ℵ₀. Now consider the power set of S (the set of all subsets of S). This set has cardinality greater than ℵ₀. To prove this, we need to show that there exists no bijection
between the power set of S and the integers. To do this, we will use a technique called diagonalization. Diagonalization
is a kind of proof by contradiction. It works as follows:
Let's start with the original countably infinite set S. We can enumerate the elements of S (since it's countable), so there's a
first one, a second one, etc. Now we can represent each subset SS of S as a binary vector that contains one element for
each element of the original set S. If SS contains element 1 of S, then the first element of its vector will be 1, otherwise 0.
Similarly for all the other elements of S. Of course, since S is countably infinite, the length of each vector will also be
countably infinite. Thus we might represent a particular subset SS of S as the vector:
Elem 1 of S   Elem 2 of S   Elem 3 of S   Elem 4 of S   Elem 5 of S   Elem 6 of S   ...
     1             0             0             1             1             0        ...
Next, we observe that if the power set P of S were countably infinite, then there would be an enumeration of it that put its
elements in one to one correspondence with the natural numbers. Suppose that enumeration were the following (where
each row represents one element of P as described above. Ignore for the moment the numbers enclosed in parentheses.):
              Elem 1   Elem 2   Elem 3   Elem 4   Elem 5   Elem 6
              of S     of S     of S     of S     of S     of S     ...
Elem 1 of P   1 (1)      0        0        0        0        0      ...
Elem 2 of P     0      1 (2)      0        0        0        0      ...
Elem 3 of P     1        1      0 (3)      0        0        0      ...
Elem 4 of P     0        0        1      0 (4)      0        0      ...
Elem 5 of P     1        0        1        0      0 (5)      0      ...
Elem 6 of P     1        1        1        0        0      0 (6)    ...
    ...        ...      ...      ...      ...      ...      ...
If this really is an enumeration of P, then it must contain all elements of P. But it doesn't. To prove that it doesn't, we will
construct an element L ∈ P that is not on the list. To do this, consider the numbers in parentheses in the matrix above. Using them, we can construct L:
L:    ¬(1)    ¬(2)    ¬(3)    ¬(4)    ¬(5)    ¬(6)    ...
What we mean by ¬(1) is that if (1) is a 1 then ¬(1) is 0; if (1) is a 0, then ¬(1) is 1. So we've constructed the representation for an
element of P. It must be an element of P since it describes a possible subset of S. But we've built it so that it differs from
the first element in the list above by whether or not it includes element 1 of S. It differs from the second element in the list
above by whether or not it includes element 2 of S. And so forth. In the end, it must differ from every element in the list
above in at least one place. Yet it is clearly an element of P. Thus we have a contradiction. The list above was not an
enumeration of P. But since we made no assumptions about it, no enumeration of P can exist. In particular, if we try to
fix the problem by simply adding our new element to the list, we can just turn around and do the same thing again and
create yet another element that's not on the list. Thus there are more than ℵ₀ elements in P. We'll say that sets with more than ℵ₀ elements are uncountably infinite.
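The diagonal construction itself is mechanical. In the following Python sketch (ours, not from the notes), a claimed enumeration of subsets of the natural numbers is modeled as a function membership(i, j) that says whether element j belongs to the i-th listed set; the set the sketch builds differs from every listed set on the diagonal:

def diagonal_set(membership):
    """Given membership(i, j) -> bool for a claimed enumeration of subsets of N,
    return a predicate describing a subset that is not in the enumeration."""
    return lambda j: not membership(j, j)     # flip the j-th bit of the j-th set

# A toy "enumeration": the i-th set is {0, 1, ..., i}.
membership = lambda i, j: j <= i
L = diagonal_set(membership)

# L differs from set i at position i, for every i we care to check.
print(all(L(i) != membership(i, i) for i in range(100)))   # True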
The real numbers are uncountably infinite. The proof that they are is very similar to the one we just did for the power set
except that it's a bit tricky because, when we write out each number as an infinite sequence of digits (like we wrote out
each set above as an infinite sequence of 0's and 1's), we have to consider the fact that several distinct sequences may
represent the same number.
Not all uncountably infinite sets have the same cardinality. There is an infinite number of cardinal numbers. But we won't
need any more. All the uncountably infinite sets we'll deal with (and probably all the ones you can even think of unless
you keep taking power sets) have the same cardinality as the reals and the power set of a countably infinite set.
Thus to describe the cardinality of all the sets we'll consider, we will use one of the following:
The natural numbers, which we'll use to count the number of elements of finite sets,
ℵ₀, which describes the cardinality of countably infinite sets, and
the cardinality of the reals (equivalently, of the power set of a countably infinite set), which covers the uncountably infinite sets we will encounter.
8 Closures
Imagine some set A and some property P. If we care about making sure that A has property P, we are likely to do the
following:
1. Examine A for P. If it has property P, we're happy and we quit.
2. If it doesn't, then add to A the smallest number of additional elements required to satisfy P.
Let's consider some examples:
Let A be a set of friends you're planning to invite to a party. Let P be "A should include everyone who is likely to
find out about the party" (since we don't want to offend anyone). Let's assume that if you invite Bill and Bill has a
friend Bob, then Bill may tell Bob about the party. This means that if you want A to satisfy P, then you have to invite
not only your friends, but your friends' friends, and their friends, and so forth. If you move in a fairly closed circle,
you may be able to satisfy P by adding a few people to the guest list. On the other hand, it's possible that you'd have
to invite the whole city before P would be satisfied. It depends on the connectivity of the FriendsOf relation in your
social setting. The problem is that whenever you add a new person to A, you have to turn around and look at that
person's friends and consider whether there are any of them who are not already in A. If there are, they must be
added, and so forth. There's one positive feature of this problem, however. Notice that there is a unique set that does
satisfy P, given the initial set A. There aren't any choices to be made.
Let A be a set of 6 people. Let P be "A can enter a baseball tournament". This problem is different from the last in
two important ways. First, there is a clear limit to how many elements we have to add to A in order to satisfy P. We
need 9 people and when we've got them we can stop. But notice that there is not a unique way to satisfy P (assuming
that we know more than 9 people). Any way of adding 3 people to the set will work.
Let A be the Address relation (which we defined earlier as "lives at same address as"). Since relations are sets, we
should be able to treat Address just as we've treated the sets of people in our last two examples. We know that
Address is an equivalence relation. So we'll let P be the property of being an equivalence relation (i.e., reflexive,
symmetric, and transitive). But suppose we are only able to collect facts about living arrangements in a piecemeal
fashion. For example, we may learn that Address contains {(Dave, Mary), (Sue, Pete), (John, Bill)}. Immediately we
know, because Address must be reflexive, that it must also contain {(Dave, Dave), (Mary, Mary), (Sue, Sue), (Pete,
Pete), (John, John), (Bill, Bill)}. And, since Address must also be symmetric it must contain {(Mary, Dave), (Pete,
Sue), (Bill, John)}. Now suppose that we discover that Mary lives with Sue. We add {(Mary, Sue)}. To make
Address symmetric again, we must add {(Sue, Mary)}. But now we also have to make it transitive by adding {(Dave,
Sue), (Sue, Dave)}.
Let A be the set of natural numbers. Let P be "the sum of any two elements of A is also in A." Now we've got a
property that is already satisfied. The sum of any two natural numbers is a natural number. This time, we don't have
to add anything to A to establish P.
Let A be the set of natural numbers. Let P be "the quotient of any two elements of A is also in A." This time we have
a problem. 3/5 is not a natural number. We can add elements to A to satisfy P. If we do, we end up with exactly the
rational numbers.
In all of these cases, we're going to want to say that A is closed with respect to P if it possesses P. And, if we have to add
elements to A in order to satisfy P, we'll call a smallest such expanded A that does satisfy P a closure of A with respect to
P. What we need to do next is to define both of these terms more precisely.
A set S ⊆ D is closed under an n-ary relation R defined on D iff, whenever
1. d1, d2, …, dn-1 ∈ S (all of the first n-1 elements are already in the set S), and
2. (d1, d2, …, dn-1, dn) ∈ R (the last element is related to the n-1 other elements via R),
it is also true that dn ∈ S.
A set S' is a closure of S with respect to R (defined on D) iff:
1. S ⊆ S',
2. S' is closed under R, and
3. ∀T ((S ⊆ T ⊆ D and T is closed under R) → |S'| ≤ |T|).
In other words, S' is a closure of S with respect to R if it is an extension of S that is closed under R and if there is no
smaller set that also meets both of those requirements. Note that we can't say that S' must be the smallest set that will do
the job, since we do not yet have any guarantee that there is a unique such smaller set (recall the baseball example above).
These definitions of closure are a very natural way to describe our first example above. Drawing from a set A of people,
you start with S equal to your friends. Then, to compute your invitee list, you simply take the closure of S with respect to
the relation FriendsOf, which will force you to add to S your friends' friends, their friends, and so forth.
Now consider our second example, the case of the baseball team. Here there is no relation R that specifies, if one or more
people are already on the team, then some specific other person must also be on. The property we care about is a property
of the team (set) as a whole and not a property of patterns of individuals (elements). Thus this example, although similar,
is not formally an instance of closure as we have just defined it. This turns out to be significant and leads us to the
following definition:
Any property that asserts that a set S is closed under some relation R is a closure property of S. It is possible to prove that
if P is a closure property, as just defined, on a set A and S is a subset of A, then the closure of S with respect to R exists
and is unique. In other words, there exists a unique minimal set S' that contains S and is closed under R. Of all of our
examples above, the baseball example is the only one that cannot be described in the terms of our definition of a closure
property. The theorem that we have just stated (without proof) guarantees, therefore, that it will be the only one that does
not have a unique minimal solution.
The definitions that we have just provided also work to describe our third example, in which we want to compute the
closure of a relation (since, after all, a relation is a set). All we have to do is to come up with relations that describe the
properties of being reflexive, symmetric, and transitive. To help us see what those relations need to be, let's recall our
definitions of symmetry, reflexivity, and transitivity:
A relation R ⊆ A × A is reflexive if, for each a ∈ A, (a, a) ∈ R.
A relation R ⊆ A × A is symmetric if, whenever (a, b) ∈ R, so is (b, a).
A relation R ⊆ A × A is transitive if, whenever (a, b) ∈ R and (b, c) ∈ R, (a, c) ∈ R.
Looking at these definitions, we can come up with three relations, Reflexivity, Symmetry, and Transitivity. All three are
relations on relations, and they will enable us to define these three properties using the closure definitions we've given so
far. All three definitions assume a base set A on which the relation we are interested is defined:
∀a ∈ A, ((a, a)) ∈ Reflexivity. Notice the double parentheses here. Reflexivity is a unary relation, where each
element is itself an ordered pair. It doesn't really "relate" two elements. It is simply a list of ordered pairs. To see
how it works to define reflexive closure, imagine a set A = {x, y}. Now suppose we start with a relation R on A =
{(x, y)}. Clearly R isn't reflexive. And the Reflexivity relation tells us that it isn't because the reflexivity relation on
A contains {((x, x)), ((y, y))}. This is a unary relation. So n, in the definition of closure, is 1. Consider the first
element ((x, x)). We consider all the components before the nth (i.e., first) and see if they're in R. This means we consider the first zero components. Trivially, all zero of them are in R. So the nth (the first) must also be. This
means that (x, x) must be in R. But it isn't. So to compute the closure of R under Reflexivity, we add it. Similarly for
(y, y).
∀a, b ∈ A, a ≠ b → [((a, b), (b, a)) ∈ Symmetry]. This one is a lot easier. Again, suppose we start with a set A = {x,
y} and a relation R on A = {(x, y)}. Clearly R isn't symmetric. And Symmetry tells us that. Symmetry on A = {((x,
y), (y, x)), ((y, x), (x, y))}. But look at the first element of Symmetry. It tells us that for R to be closed, whenever (x,
y) is in R, (y, x) must also be. But it isn't. To compute the closure of R under Symmetry, we must add it.
∀a, b, c ∈ A, [a ≠ b ∧ b ≠ c] → [((a, b), (b, c), (a, c)) ∈ Transitivity]. Now we will exploit a ternary relation.
Whenever the first two elements of it are present in some relation R, then the third must also be if R is transitive. This
time, let's start with a set A = {x, y, z} and a relation R on A = {(x, y), (y, z)}. Clearly R is not transitive. The
Transitivity relation on A is {((x, y), (y, z), (x, z)), ((x, z), (z, y), (x, y)), ((y, x), (x, z), (y, z)), ((y, z), (z, x), (y, x)),
((z, x), (x, y), (z, y)), ((z, y), (y, x), (z, x))}. Look at the first element of it. Both of the first two components of it are
in R. But the third isn't. To make R transitive, we must add it.
These definitions also work to enable us to describe the closure of the integers under division as the rationals. Following
the definition, A is the set of rationals. S (a subset of A) is the integers and R is QuotientClosure, defined as:
∀a, b, c ∈ A, [a/b = c] → [(a, b, c) ∈ QuotientClosure].
So we've got a quite general definition of closure. And it makes it possible to prove the existence of a unique closure for
any set and any relation R. The only constraint is that this definition works only if we can define the property we care
about as an n-ary relation for some finite n. There are cases of closure where this isn't possible, as we saw above, but we
won't need to worry about them in this class.
So we don't really need any new definitions. We've offered a general definition of closure of a set (any set) under some
relation (which is the way we use to define a property). But most of the cases of closure that we'll care about involve the
special case of the closure of a binary relation given some property that may or may not be naturally describable as a
relation. For example, one could argue that the relations we just described to define the properties of being reflexive,
symmetric, and transitive are far from natural. Thus it will be useful to offer the following alternative definitions. Don't
get confused though by the presence of two definitions. Except when we cannot specify our property P as a relation (and
we won't need to deal with any such cases), these new definitions are simply special cases of the one we already have.
We say that a binary relation B on a set T is closed under property P if B possesses P. For example, LessThanOrEqual is
closed under transitivity since it is transitive. Simple enough. Next:
Let B be a binary relation on a set T. A relation B' is a closure of B with respect to some property P iff:
1. B B',
2. B' is closed under P, and
3. There is no smaller relation B'' that contains B and is closed under P.
So, for example, the transitive closure of B = {(1, 2), (2, 3)} is the smallest new relation B' that contains B but is
transitive. So B' = {(1, 2), (2, 3), (1, 3)}.
You'll generally find it easier to use these definitions than our earlier ones. But keep in mind that, with the earlier
definitions it is possible to prove the existence of a unique closure. Since we went through the process of defining
reflexivity, symmetry, and transitivity using those definitions, we know that there always exists a unique reflexive,
symmetric, and transitive closure for any binary relation. We can exploit that fact at the same time that we use the
simpler definitions to help us find algorithms for computing those closures.
We can, however, guarantee that the transitive closure of any binary relation on a finite set is computable. How? A very
simple approach is the following algorithm for computing the transitive closure of a binary relation B with N elements on
a set A:
Set Trans = B;                       /* Initially Trans is just the original relation. */
/* We need to find all cases where (x, y) and (y, z) are in Trans. Then we must insert (x, z) */
/* into Trans if it isn't already there. */
Boolean AddedSomething = True;       /* We'll keep going until we make one whole pass through */
                                     /* without adding any new elements to Trans. */
while AddedSomething = True do
    AddedSomething = False;
    xcounter = 0;
    foreach element of Trans do
        xcounter = xcounter + 1;
        x = Trans[xcounter][1];      /* Pull out the first element of the current element of Trans. */
        y = Trans[xcounter][2];      /* Pull out the second element of the current element of Trans. */
                                     /* So if the first element of Trans is (p, q), then */
                                     /* x = p and y = q the first time through. */
        zcounter = 0;
        foreach element of Trans do
            zcounter = zcounter + 1;
            if Trans[zcounter][1] = y then do    /* We've found another element (y, z), so we may */
                z = Trans[zcounter][2];          /* need to add (x, z) to Trans. */
                if (x, z) ∉ Trans then do        /* We have to add it. */
                    Insert(Trans, (x, z));
                    AddedSomething = True;
                end;
            end;
        end;
    end;
end;
This algorithm works. Try it on some simple examples. But it's very inefficient. There are much more efficient
algorithms. In particular, if we represent a relation as an incidence matrix, we can do a lot better. Using Warshall's
algorithm, for example, we can find the transitive closure of a relation of n elements using 2n³ bit operations. For a
description of that algorithm, see, for example, Kenneth Rosen, Discrete Mathematics and its Applications, McGraw-Hill.
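For comparison, here is a rough Python version of the same pass-until-stable idea, operating on a relation stored as a set of pairs (this sketch is ours, not from Rosen or the notes):

def transitive_closure(B):
    """Return the transitive closure of a binary relation given as a set of pairs."""
    trans = set(B)
    added_something = True
    while added_something:
        added_something = False
        new_pairs = {(x, z) for (x, y) in trans for (y2, z) in trans if y == y2}
        missing = new_pairs - trans
        if missing:
            trans |= missing
            added_something = True
    return trans

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]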
1. Let v = 1: 1 = 1².
2. Prove that, ∀n ≥ 0,
   (Σ(i=1 to n) Odd_i = n²) → (Σ(i=1 to n+1) Odd_i = (n + 1)²)
To do this, we observe that the sum of the first n+1 odd integers is the sum of the first n of them plus the n+1'st, i.e.,
   Σ(i=1 to n+1) Odd_i = Σ(i=1 to n) Odd_i + Odd_{n+1}
                       = n² + Odd_{n+1}
                       = n² + 2n + 1        (Odd_{n+1} is 2n + 1)
                       = (n + 1)²
Thus we have shown that the sum of the first n+1 odd integers must be equivalent to (n+1)² if it is known that the sum of the first n of them is equivalent to n².
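Induction establishes the claim for every n at once; still, it can be reassuring to spot-check it mechanically. A quick illustrative check in Python (ours, not part of the notes):

def sum_of_first_n_odds(n):
    return sum(2 * i - 1 for i in range(1, n + 1))

print(all(sum_of_first_n_odds(n) == n ** 2 for n in range(1, 50)))   # True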
Mathematical induction lets us prove properties of positive integers. But it also lets us prove properties of other things if
the properties are described in terms of integers. For example, we could talk about the size of finite sets, or the length of
finite strings. Let's do one with sets: For any finite set A, |2^A| = 2^|A|. In other words, the cardinality of the power set of A is 2 raised to the power of the cardinality of A. We'll prove this by induction on the number of elements of A (|A|). We follow the same two steps:
1. Let v = 0. So A is ∅, |A| = 0, and A's power set is {∅}, whose cardinality is 1 = 2^0 = 2^|A|.
2. Prove that, ∀n ≥ 0, if |2^A| = 2^|A| is true for all sets A of cardinality n, then it is also true for all sets S of cardinality n+1. We do this as follows. Since n ≥ 0, and any such S has n + 1 elements, S must have at least one element. Pick one and call it a. Now consider the set T that we get by removing a from S. |T| must be n. So, by the induction hypothesis (namely that |2^A| = 2^|A| if |A| = n), our claim is true for T and we have |2^T| = 2^|T|. Now let's return to the power set of the original set S. It has two parts: those subsets that include a and those that don't. The second part is exactly 2^T, so we know that it has 2^|T| = 2^n elements. The first part (all the subsets that include a) is exactly all the subsets that don't include a, with a added in. Since there are 2^n subsets that don't include a and there are the same number of them once we add a to each, we have that the total number of subsets of our original set S is 2^n (for the ones that don't include a) plus 2^n (for the ones that do include a), for a total of 2(2^n) = 2^(n+1), which is exactly 2^|S|.
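Again, a mechanical spot-check (illustrative only, ours) agrees with the theorem for small sets:

from itertools import combinations

def power_set(A):
    A = list(A)
    return [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]

for A in (set(), {'a'}, {'a', 'b'}, {'a', 'b', 'c'}):
    print(len(power_set(A)) == 2 ** len(A))   # True each time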
Why does mathematical induction work? It relies on the well-ordering property of the integers, which states that every
nonempty set of nonnegative integers has a least element. Let's see how that property assures us that mathematical
induction is valid as a proof technique. We'll use the technique of proof by contradiction to show that, given the well-ordering property, mathematical induction must be valid. Once we have done an induction proof, we know that A(v) (where v is 0 or 1 or some other starting value) is true and we know that ∀n ≥ 0, A(n) → A(n+1). What we're using the technique to enable us to claim is that, therefore, ∀n ≥ v, A(n). Suppose the technique were not valid and there was a set S of nonnegative integers n ≥ v for which A(n) is False. Then, by the well-ordering property, there is a smallest element in this set. Call it x. By definition, x must be equal to or greater than v. But it cannot actually be v because we proved A(v). So it must be greater than v. But now consider x - 1. Since x - 1 is less than x, it cannot be in S (since we chose x to be the smallest value in S). If it's not in S, then we know A(x - 1). But we proved that ∀n ≥ 0, A(n) → A(n+1), so A(x - 1) → A(x). But we assumed that A(x) is false. So that assumption led us to a contradiction; thus it must be false.
Sometimes the principle of mathematical induction is stated in a slightly different but formally equivalent way:
1. Prove that A holds for the smallest value v with which we're concerned.
2. State the induction hypothesis H, which must be of the form, "There is some integer n ≥ v such that A is true for all integers k where v ≤ k ≤ n."
3. Prove that (∀k, v ≤ k ≤ n, A(k)) → A(n + 1). In other words, prove that whenever A holds for all nonnegative integers starting with v, up to and including n, it must also hold for n + 1.
You can use whichever form of the technique is easiest for a particular problem.
So, for example, let's find the meaning of the regular expression (a ∪ b)*b:
L((a ∪ b)*b)
= L((a ∪ b)*) L(b)
= L(a ∪ b)* L(b)
= (L(a) ∪ L(b))* L(b)
= ({a} ∪ {b})* {b}
= {a, b}* {b}
which is just the set of all strings ending in b. Another example is L(((a ∪ b)(a ∪ b))a(a ∪ b)*) = {xay: x and y are strings of a's and b's and |x| = 2}. The distinction between an expression and its meaning is somewhat pedantic, but you should try to understand it. We will usually not actually write L() because it is generally clear from context whether we mean the regular expression or the language denoted by it. For example, a ∈ (a ∪ b)* is technically meaningless since (a ∪ b)* is a regular expression, not a set. Nonetheless, we use it as a reasonable abbreviation for a ∈ L((a ∪ b)*), just as we write 3 + 4 = 4 + 3 to mean that the values of "3 + 4" and "4 + 3", not the expressions themselves, are identical.
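The meaning we just computed, the set of all strings of a's and b's ending in b, can be checked against a mechanical matcher. Python's re module writes union as |, so an illustrative test of (a ∪ b)*b (ours, not from the notes) looks like this:

import re

pattern = re.compile(r'(a|b)*b')   # the regular expression (a U b)*b

for w in ['b', 'ab', 'aab', 'bab', '', 'a', 'ba']:
    print(w, bool(pattern.fullmatch(w)))
# b, ab, aab, and bab match; '', a, and ba do not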
Here are some useful facts about regular expressions and the languages they describe:
(a ∪ b)* = (a*b*)* = (b*a*)*    set of all strings composed exclusively of a's and b's (including the empty string)
(a ∪ b)c = (ac ∪ bc)            Concatenation distributes over union
c(a ∪ b) = (ca ∪ cb)            "
a* ∪ b* ≠ (a ∪ b)*              The right-hand expression denotes a set containing strings of mixed a's and b's, while the left-hand expression does not.
(ab)* ≠ a*b*                    In the right-hand expression, all a's must precede all b's. That's not true for the left-hand expression.
a* ∅* = a* ε = a*
There is an algebra of regular expressions, but it is rather complex and not worth the effort to learn it. Therefore, we will
rely primarily on our knowledge of what the expressions mean to determine the equivalence (or non-equivalence) of
regular expressions.
We are now in a position to state formally our definition of the class of regular languages: Given an alphabet Σ, the set of regular languages over Σ is precisely the set of languages that can be defined using regular expressions with respect to Σ. Another equivalent definition (given our definition of regular expressions and L()) is that the set of regular languages over an alphabet Σ is the smallest set that contains ∅ and each of the elements of Σ, and that is closed under the operations of concatenation, union, and Kleene star (rules 2, 3, and 4 above).
1. Compute E(q) for each q in K. ∀q ∈ K, E(q) = {p ∈ K : (q, ε) |-*M (p, ε)}. In other words, E(q) is the set of states reachable from q without consuming any input.
2. Compute s' = E(s).
3. Compute δ', which is defined as follows: ∀Q ∈ 2^K and ∀a ∈ Σ, δ'(Q, a) = ∪{E(p) : p ∈ K and (q, a, p) ∈ Δ for some q ∈ Q}. Recall that the elements of 2^K are sets of states from the original machine M. So what we've just said is that to compute the transition out of one of these "set" states, find all the transitions out of the component states in the original machine, then find all the states reachable from them via epsilon transitions. The new state is the set of all states reachable in this way. We'll actually compute δ' by first computing it for the new start state s' and each of the elements of Σ. Each state thus created becomes an element of K', the set of states of M'. Then we compute δ' for any new states that were just created. We continue until there are no additional reachable states. (So although δ' is defined for all possible subsets of K, we'll only bother to compute it for the reachable such subsets and in fact we'll define K' to include just the reachable configurations.)
4. Compute K' = that subset of 2^K that is reachable, via δ', as defined in step 3, from s'.
5. Compute F' = {Q ∈ K' : Q ∩ F ≠ ∅}. In other words, each constructed "set" state that contains at least one final state from the original machine M becomes a final state in M'.
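The construction is entirely mechanical, so it can be coded directly. The following Python sketch is ours (the NFA representation, a dictionary delta mapping (state, symbol) pairs to sets of states, with '' marking an epsilon transition, is an assumption, not notation from these notes); it computes E, the new start state, the reachable "set" states with their transitions, and the new final states:

def epsilon_closure(states, delta):
    """E(.) lifted to a set of states: everything reachable via epsilon moves alone."""
    closure, frontier = set(states), list(states)
    while frontier:
        q = frontier.pop()
        for p in delta.get((q, ''), set()):      # '' marks an epsilon transition
            if p not in closure:
                closure.add(p)
                frontier.append(p)
    return frozenset(closure)

def subset_construction(K, sigma, delta, s, F):
    """Return the states, transition function, start state, and final states of the DFSA."""
    start = epsilon_closure({s}, delta)
    states, worklist, dprime = {start}, [start], {}
    while worklist:
        Q = worklist.pop()
        for a in sigma:
            reached = set()
            for q in Q:
                reached |= delta.get((q, a), set())
            P = epsilon_closure(reached, delta)
            dprime[(Q, a)] = P
            if P not in states:
                states.add(P)
                worklist.append(P)
    finals = {Q for Q in states if Q & F}
    return states, dprime, start, finals

# Tiny NFA over {a, b}: accepts strings containing at least one a.
delta = {('q0', 'a'): {'q0', 'q1'}, ('q0', 'b'): {'q0'},
         ('q1', 'a'): {'q1'}, ('q1', 'b'): {'q1'}}
states, dprime, start, finals = subset_construction({'q0', 'q1'}, 'ab', delta, 'q0', {'q1'})
print(len(states), sorted(len(Q) for Q in finals))   # 2 [2]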
However, to complete the proof of the theorem that asserts that there is an equivalent DFSA for every NDFSA, we need
next to prove that the algorithm we have defined does in fact produce a machine that is (1) deterministic, and (2)
equivalent to the original machine.
Proving (1) is trivial. By the definition in step 3 of ', we are guaranteed that ' is defined for all reachable elements of K'
and that it is single valued.
Next we must prove (2). In other words, we must prove that M' accepts a string w if and only if M accepts w. We
constructed the transition function ' of M' so that each step of the operation of M' mimics an "all paths" simulation of M.
So we believe that the two machines are identical, but how can we prove that they are? Suppose we could prove the
following:
Lemma: For any string w ∈ Σ* and any states p, q ∈ K, (q, w) |-*M (p, ε) iff (E(q), w) |-*M' (P, ε) for some P ∈
K' that contains p. In other words, if the original machine M starts in state q and, after reading the string w, can
land in state p, then the new machine M' must behave as follows: when started in the state that corresponds to the
set of states the original machine M could get to from q without consuming any input, M' reads the string w and
lands in one of its new "set" states that contains p. Furthermore, because of the only-if part of the lemma, M'
must end up in a "set" state that contains only states that M could get to from q after reading w and following any
available epsilon transitions.
If we assume that this lemma is true, then the proof that M' is equivalent to M is straightforward: Consider any string w ∈ Σ*. If w ∈ L(M) (i.e., the original machine M accepts w) then the following two statements must be true:
1. The original machine M, when started in its start state, can consume w and end up in a final state. This must be true
given the definition of what it means for a machine to accept a string.
2.  (E(s), w) |-*M' (Q, ε) for some Q containing some f ∈ F. In other words, the new machine, when started in its start state, can consume w and end up in one of its final states. This follows from the lemma, which is more general and describes a computation from any state to any other. But if we use the lemma and let q equal s (i.e., M begins in its start state) and p = f for some f ∈ F (i.e., M ends in a final state), then we have that the new machine M', when started in its start state, E(s), will consume w and end in a state that contains f. But if M' does that, then it has ended up in one of its final states (by the definition of F' in step 5 of the algorithm). So M' accepts w (by the definition of what it means for a machine to accept a string). Thus M' accepts precisely the same set of strings that M does.
Now all we have to do is to prove the lemma. What the lemma is saying is that we've built M' from M in such a way that
the computations of M' mirror those of M and guarantee that the two machines accept the same strings. But of course we
didn't build M' to perform an entire computation. All we did was to describe how to construct '. In other words, we
defined how individual steps of the computation work. What we need to do now is to show that the individual steps, when
taken together, do the right thing for strings of any length. The obvious way to do that, since we know what happens one
step at a time, is to prove the lemma by induction on |w|.
We must first prove that the lemma is true for the base case, where |w| = 0 (i.e., w = ε). To do this, we actually have to do
two proofs, one to establish the if part of the lemma, and the other to establish the only if part:
Basis step, if part: Prove (q, w) |-*M (p, ε) if (E(q), w) |-*M' (P, ε) for some P ∈ K' that contains p. Or, turning it around to make it a little clearer,
[ (E(q), w) |-*M' (P, ε) for some P ∈ K' that contains p ] → (q, w) |-*M (p, ε)
If |w| = 0, then M' makes no moves. So it must end in the same state it started in, namely E(q). If we're told that it ends in some state that contains p, then p ∈ E(q). But, given our definition of E(x), that means exactly that, in the original machine M, p is reachable from q just by following ε-transitions, which is exactly what we need to show.
Basis step, only if part: Recall that only if is equivalent to implies. So now we need to show:
[ (q, w) |-*M (p, ε) ] → (E(q), w) |-*M' (P, ε) for some P ∈ K' that contains p
If |w| = 0, and the original machine M goes from q to p with only w as input, it must go from q to p following just ε-transitions. In other words p ∈ E(q). Now consider the new machine M'. It starts in E(q), the set state that includes all the states that are reachable from q via ε-transitions. Since the new machine is deterministic, it will make no moves at all if its input is ε. So it will halt in exactly the same state it started in, namely E(q). Since we know that p ∈ E(q), we know that M' has halted in a set state that includes p, which is exactly what we needed to show.
Next we'll prove that if the lemma is true for all strings w of length k, k ≥ 0, then it is true for all strings of length k + 1. Considering strings of length k + 1, we know that we are dealing with strings of at least one character. So we can rewrite any such string as zx, where x is a single character and z is a string of length k. The way that M and M' process z will thus be covered by the induction hypothesis. We'll use our definition of δ', which specifies how each individual step of M' operates, to show that, assuming that the machines behave correctly for the first k characters, they behave correctly for the last character also and thus for the entire string of length k + 1. Recall our definition of δ':
δ'(Q, a) = ∪{E(p) : p ∈ K and (q, a, p) ∈ Δ for some q ∈ Q}.
To prove the lemma, we must show a relationship between the behavior of:
M:   (q, w) |-*M (p, ε), and
M':  (E(q), w) |-*M' (P, ε) for some P ∈ K' that contains p
Rewriting w as zx, we have
M:   (q, zx) |-*M (p, ε)
M':  (E(q), zx) |-*M' (P, ε) for some P ∈ K' that contains p
Breaking each of these computations into two pieces, the processing of z followed by the processing of x, we have:
M:   (q, zx) |-*M (si, x) |-*M (p, ε)
M':  (E(q), zx) |-*M' (S, x) |-*M' (P, ε) for some P ∈ K' that contains p
In other words, after processing z, M will be in some set of states si, and M' will be in some state, which we'll call S.
Again, we'll split the proof into two parts:
Induction step, if part:
[ (E(q), zx) |-*M' (S, x) |-*M' (P, ε) for some P ∈ K' that contains p ] → (q, zx) |-*M (si, x) |-*M (p, ε)
If, after reading z, M' is in state S, we know, from the induction hypothesis, that the original machine M, after reading z, must be in some set of states si and that S is precisely that set. Now we just have to describe what happens at the last step when the two machines read x. If we have that M', starting in S and reading x lands in P, then we know, from the definition of δ' above, that P contains precisely the states that M could land in after starting in any si and reading x. Thus if p ∈ P, p must be a state that M could land in.
Induction step, only if part:
(q, zx) |-*M (si, x) |-*M (p, ε) → (E(q), zx) |-*M' (S, x) |-*M' (P, ε) for some P ∈ K' that contains p
By the induction hypothesis, we know that if M, after processing z, can reach some set of states si, then S (the state M' is in after processing z) must contain precisely all the si's. Knowing that, and our definition of δ', we know that from S, reading x, M' must be in some set state P that contains precisely the states that M can reach starting in any of the si's, reading x, and then following all ε-transitions. So, after consuming w (zx), M', when started in E(q), must end up in a state P that contains all and only the states p that M, when started in q, could end up in.
This theorem is a very useful result when we're dealing with FSAs. It's true that, by and large, we want to build
deterministic machines. But, because we have provided a constructive proof of this theorem, we know that we can design
a deterministic FSA by first describing a nondeterministic one (which is sometimes a much simpler task), and then
applying our algorithm to construct a corresponding deterministic one.
The lemma we've just stated is sometimes referred to as the Strong Pumping Lemma. That's because there is a weaker
version that is much less easy to use, yet no easier to prove. We won't say anything more about it, but at least now you
know what it means if someone refers to the Strong Pumping Lemma.
Don't take this picture to be making any claim about what x, y, and z are. But what the picture does show is that w is
composed of two regions:
1. The initial segment, which is all a's.
2. The final segment, which is all b's.
Typically, as we attempt to show that there is no x, y, z triple that satisfies all the conditions of the pumping lemma, what
we'll do is to consider the ways that y can be spread within the regions of w. In this example, we observe immediately that
since |xy| ≤ N, y must be a^g for some g ≥ 1. (In other words, y must lie completely within the first region.) Now there's just
one case to consider. Clearly we'll add just a's as we pump, so there will be more a's than b's, so we'll generate strings that
are not in L. Thus w is not pumpable, we've found a contradiction, and L is not regular.
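The claim that every legal split fails can even be checked mechanically for a particular N. The following small Python sketch is illustrative only (it is no substitute for the argument above): it tries every x, y, z with |xy| ≤ N and |y| ≥ 1 for w = a^N b^N and confirms that pumping with i = 2 always produces a string outside L.

def in_L(s):                          # L = {a^n b^n}
    n = len(s) // 2
    return s == "a" * n + "b" * n

N = 6
w = "a" * N + "b" * N
for j in range(1, N + 1):             # |xy| = j <= N
    for k in range(1, j + 1):         # |y| = k >= 1
        x, y, z = w[:j - k], w[j - k:j], w[j:]
        assert not in_L(x + y * 2 + z)   # pumping y in once already leaves L
print("no x, y, z split of", w, "survives pumping")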
The most common mistake people make in applying the pumping lemma is to show a particular x, y, z triple that isn't
pumpable. Remember, you must show that all such triples fail to be pumpable.
Suppose you try to apply the pumping lemma to a language L and you fail to find a counterexample. In other words, every
string w that you examine is pumpable. What does that mean? Does it mean that L is regular? No. It's true that if L is
regular, you will be unable to find a counterexample. But if L isn't regular, you may fail if you just don't find the right w.
In other words, even in a nonregular language, there may be plenty of strings that are pumpable. For example, consider
L = {a^x b^x} ∪ {a^y}. In other words, if there are any b's there must be the same number of them as there are a's, but it's also okay
just to have a's. We can prove that this language is not regular by choosing w = a^N b^N, just as we did for our previous
example, and the proof will work just as it did above. But suppose that we were less clever. Let's choose w = a^N. Now again
we know that y must be a^g for some g ≥ 1. But now, if we pump y either in or out, we still get strings in L, since all strings
that contain just a's are in L. We haven't proved anything.
Remember that, when you go to apply the pumping lemma, the one thing that is in your control is the choice of w. As you
get experience with this, you'll notice a few useful heuristics that will help you find a w that is easy to work with:
1. Choose w so that there are distinct regions, each of which contains only one element of Σ. When we considered L =
{a^x b^x}, we had no choice about this, since every element of L (except ε) must have a region of a's followed by a region
of b's. But suppose we were interested in L' = {w ∈ {a, b}* : w contains an equal number of a's and b's}. We might
consider choosing w = (ab)^N. But now there are no clear-cut regions. We won't be able to use pumping successfully
because if y = ab, then we can pump to our heart's delight and we'll keep getting strings in L'. What we need to do is to
choose a^N b^N, just as we did when we were working on L. Sure, L' doesn't require that all the a's come first. But
strings in which all the a's do come first are fine elements of L', and they produce clear-cut regions that make the
pumping lemma useful.
2. Choose w so that the regions are big enough that there is a minimal number of configurations for y across the regions.
In particular, you must pick w so that it has length at least N. But there's no reason to be parsimonious. For example,
when we were working on {a^x b^x}, we could have chosen w = a^(N/2) b^(N/2). That would have been long enough. But then we
couldn't have known that y would be a string of a's. We would have had to consider several different possibilities for
y. (You might want to work this one out to see what happens.) It will generally help to choose w so that each region
is of length at least N.
3. Whenever possible, choose w so that there are at least two regions with a clear boundary between them. In particular,
you want to choose w so that there are at least two regions that must be related in some way (e.g., the a region must be
the same length as the b region). If you follow this rule and rule 2 at the same time, then you'll be assured that as you
pump y, you'll change one of the regions without changing the other, thus producing strings that aren't in your
language.
The pumping lemma is a very powerful tool for showing that a language isn't regular. But it does take practice to use it
right. As you're doing the homework problems for this section, you may find it helpful to use the worksheet that appears
on the next page.
[Pumping lemma worksheet: four numbered steps, [1]-[4], ending with Q.E.D.]
terminated derivations, let alone any terminated derivations consisting entirely of terminal symbols (i.e., generated strings).
Thus this grammar generates the language ∅.
Now let us look at our definition of a context-free grammar in a somewhat more formal way. A context-free grammar (CFG)
G consists of four things:
(1) V, a finite set (the total alphabet or vocabulary), which contains two subsets, Σ (the terminal symbols, i.e., the ones that
will occur in strings of the language) and V - Σ (the nonterminal symbols, which are just working symbols within the
grammar).
(2) Σ, a finite set (the terminal alphabet or terminal vocabulary).
(3) R, a finite subset of (V - Σ) × V*, the set of rules. Although each rule is an ordered pair (nonterminal, string), we'll
generally use the notation nonterminal → string to describe our rules.
(4) S, the start symbol or initial symbol, which can be any member of V - Σ.
For example, suppose G = (V, Σ, R, S), where
V = {S, A, B, a, b}, Σ = {a, b}, and R = {S → AB, A → aAa, A → a, B → Bb, B → b}
Then G generates the string aaabb by the following derivation:
(1)  S ⇒ AB ⇒ aAaB ⇒ aaaB ⇒ aaaBb ⇒ aaabb
Formally, given a grammar G, the two-place relation on strings called "derives in one step" and denoted by ⇒ (or by ⇒G if
we want to remind ourselves that the relation is relative to G) is defined as follows:
u ⇒ v iff there exist strings x, y, w ∈ V* and a symbol A ∈ (V - Σ) such that u = xAy, v = xwy, and (A → w) ∈ R.
In words, two strings stand in the "derives in one step" relation for a given grammar just in case the second can be produced
from the first by rewriting a single nonterminal symbol in a way allowed by the rules of the grammar.
(u, v) ∈ ⇒ is commonly written in infix notation, thus: u ⇒ v.
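To make the definition concrete, here is a small Python sketch that computes the "derives in one step" relation for the example grammar above (the representation of rules as (left-hand side, right-hand side) pairs of strings is just an assumption of the sketch):

# R = {S -> AB, A -> aAa, A -> a, B -> Bb, B -> b}; nonterminals are the upper-case letters
rules = [("S", "AB"), ("A", "aAa"), ("A", "a"), ("B", "Bb"), ("B", "b")]

def derives_in_one_step(u):
    # every v with u => v: rewrite one occurrence of one nonterminal by one rule
    out = set()
    for i, symbol in enumerate(u):
        for lhs, rhs in rules:
            if symbol == lhs:
                out.add(u[:i] + rhs + u[i + 1:])
    return out

print(derives_in_one_step("S"))      # {'AB'}
print(derives_in_one_step("AB"))     # {'aAaB', 'aB', 'ABb', 'Ab'}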
This bears an obvious relation to the "yields in one step" relation defined on configurations of a finite automaton. Recall that
there we defined the "yields in zero or more steps" relation by taking the reflexive transitive closure of the yields in one step
relation. We'll do that again here, giving us "yields in zero or more steps," denoted by ⇒* (or ⇒G*, to be explicit), which
holds of two strings iff the second can be derived from the first by finitely many successive applications of rules of the
grammar. In the example grammar above, for instance, S ⇒* aaabb and AA ⇒* aAaa.
A derivation in G is a finite sequence of strings w0 ⇒G w1 ⇒G w2 ⇒G … ⇒G wn.
In other words, a derivation is a finite sequence of strings such that each string, except the first, is derivable in one step from
the immediately preceding string by the rules of the grammar. We can also refer to it as a derivation of wn from w0. Such a
derivation is said to be of length n, or to be a derivation of n steps. (1) above is a 5-step derivation of aaabb from S according
to the given grammar G.
Similarly, A ⇒ aAa is a one-step derivation of aAa from A by the grammar G. (Note that derivations do not have to begin
with S, nor indeed do they have to begin with a working string derivable from S. Thus, AA ⇒ aAaA ⇒ aAaa is also a well-formed derivation according to G, and so we are entitled to write AA ⇒* aAaa.)
The strings generated by a grammar G are then just those that are (i) derivable from the start symbol, and (ii) composed
entirely of terminal symbols. That is, G = (V, Σ, R, S) generates w iff w ∈ Σ* and S ⇒* w. Thus, derivation (1) above shows
that the string aaabb is generated by G. The string aAa, however, is not generated by G, even though it is derivable from S,
because it contains a nonterminal symbol. It may be a little harder to see that the string bba is not generated by G. One
would have to convince oneself that there exists no derivation beginning with S and ending in bba according to the rules of G.
(Question: Is this always determinable in general, given an arbitrary context-free grammar G and string w? In other words,
can one always tell whether or not a given w is "grammatical" according to G? We'll find out the answer to this later.)
The language generated by a grammar G is exactly the set of all strings generated--no more and no less. The same remarks
apply here as in the case of regular languages: a grammar generates a language iff every string in the language is generated by
the grammar and no strings outside the language are generated.
And now our final definition (for this section). A language L is context free if and only if there exists a context-free grammar
that generates it.
Our example grammar happens to generate the language a(aa)*bb*. To prove this formally would require a somewhat
involved argument about the nature of derivations allowed by the rules of G, and such a proof would not necessarily be easily
extended to other grammars. In other words, if you want to prove that a given grammar generates a particular language, you
will in general have to make an argument which is rather specific to the rules of the grammar and show that it generates all the
strings of the particular language and only those. To prove that a grammar generates a particular string, on the other hand, it
suffices to exhibit a derivation from the start symbol terminating in that string. (Question: if such a derivation exists, are we
guaranteed that we will be able to find it?) To prove that a grammar does not generate a particular string, we must show that
there exists no derivation that begins with the start symbol and terminates in that string. The analogous question arises here:
when can we be sure that our search for such a derivation is fruitless and be called off? (We will return to these questions
later.)
Example 1: L = {a^n b^n : n ≥ 0}. Consider G = ({S, a, b}, {a, b}, R, S), where R = {S → aSb, S → ε}.
Each time an a is generated, a corresponding b is generated. They are created in parallel. The first a, b pair created is the
outermost one. The nonterminal S is always between the two regions of a's and b's. Clearly any string a^n b^n ∈ L is produced
by this grammar, since
S ⇒ aSb ⇒ aaSbb ⇒ … ⇒ a^n S b^n ⇒ a^n b^n
(n applications of the rule S → aSb, followed by one application of S → ε).
Therefore L ⊆ L(G).
We must also check that no other strings, not in {a^n b^n}, are produced by the grammar, i.e., we must confirm that L(G) ⊆ L.
Usually this is easy to see intuitively, though you can prove it by induction, typically on the length of a derivation. For
illustration, we'll prove L(G) ⊆ L for this example, though in general you won't need to do this in this class.
Claim: ∀x, x ∈ L(G) → x ∈ L. Proof by induction on the length of the derivation of G producing x.
Base case: The derivation has length 1. Then the derivation must be S ⇒ ε, and ε ∈ L.
Induction step: Assume all derivations of length k produce a string in L, and show the claim holds for derivations of length k
+ 1. A derivation of length k + 1 looks like:
S ⇒ aSb ⇒* axb   (the last k steps derive x from the embedded S)
for some terminal string x such that S ⇒* x. By the induction hypothesis, we know that x ∈ L (since x is produced by a
derivation of length k), and so x = a^n b^n for some n (by definition of L). Therefore, the string axb produced by the length k + 1
derivation is axb = a a^n b^n b = a^(n+1) b^(n+1) ∈ L. Therefore, by induction, we have proved L(G) ⊆ L.
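The two inclusions can also be checked experimentally for small strings. The sketch below is illustrative only: it applies the rules S → aSb and S → ε in all possible ways for a bounded number of steps and confirms that every terminal string produced has the form a^n b^n.

def generated_strings(max_steps):
    # all terminal strings derivable from S in at most max_steps rule applications,
    # for the grammar S -> aSb | epsilon
    frontier, results = {"S"}, set()
    for _ in range(max_steps):
        nxt = set()
        for w in frontier:
            i = w.find("S")
            if i < 0:
                results.add(w)
                continue
            nxt.add(w[:i] + "aSb" + w[i + 1:])   # apply S -> aSb
            nxt.add(w[:i] + w[i + 1:])           # apply S -> epsilon
        frontier = nxt
    return results | {w for w in frontier if "S" not in w}

for w in sorted(generated_strings(6), key=len):
    n = len(w) // 2
    assert w == "a" * n + "b" * n                # every generated string is a^n b^n
    print(repr(w))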
Example 2: L = {xy : |x| = |y|, x ∈ {a, b}*, and y ∈ {c, d}*}. (E.g., ε, ac, ad, bc, bd, abaccc ∈ L.) Here again we will
want to match a's and b's against c's and d's in parallel. We could use two strategies. In the first,
G = ({S, a, b, c, d}, {a, b, c, d}, R, S), where R = {S → aSc, S → aSd, S → bSc, S → bSd, S → ε}.
This explicitly enumerates all possible pairings of a, b symbols with c, d symbols. Clearly, if the number of symbols allowed
in the first and second halves of the strings is n, the number of rules with this method is n^2 + 1, which would be inefficient for
larger alphabets. Another approach is:
G = ({S, L, R, a, b, c, d}, {a, b, c, d}, R, S), where R = {S → LSR, S → ε, L → a, L → b, R → c, R → d}.
(Note that L and R are nonterminals here.) Now the number of rules is only 2n + 2.
Example 3: L = {ww^R : w ∈ {a, b}*}. Any string in L will have matching pairs of symbols. So it is clear that the CFG G =
({S, a, b}, {a, b}, R, S), where R = {S → aSa, S → bSb, S → ε}, generates L, because it produces matching symbols in
parallel. How can we prove L(G) = L? To do half of this and prove that L ⊆ L(G) (i.e., every element of L is generated by
G), we note that any string x ∈ L must either be ε (which is generated by G, since S → ε), or it must be of the form awa or
bwb for some w ∈ L. This suggests an induction proof on strings:
Claim: ∀x, x ∈ L → x ∈ L(G). Proof by induction on the length of x.
Base case: ε ∈ L and ε ∈ L(G).
Induction step: We must show that if the claim holds for all strings of length k, it holds for all strings of length k + 2. (We
use k + 2 here rather than the more usual k + 1 because, in this case, all strings in L have even length. Thus if a string in L has
length k, there are no strings in L of length k + 1.) If |x| = k + 2 and x ∈ L, then x = awa or x = bwb for some w ∈ L. |w| = k,
so, by the induction hypothesis, w ∈ L(G). Therefore S ⇒* w. So either S ⇒ aSa ⇒* awa, and x ∈ L(G), or S ⇒ bSb ⇒*
bwb, and x ∈ L(G).
Conversely, to prove that L(G) ⊆ L, i.e., that G doesn't generate any bad strings, we would use an induction on the length of a
derivation.
Claim: ∀x, x ∈ L(G) → x ∈ L. Proof by induction on the length of the derivation of x.
Base case: length 1. S ⇒ ε, and ε ∈ L.
Induction step: Assume the claim is true for derivations of length k, and show that the claim holds for derivations of length k + 1. A
derivation of length k + 1 looks like:
S ⇒ aSa ⇒* awa     or     S ⇒ bSb ⇒* bwb
(in each case the last k steps derive w from the embedded S)
for some terminal string w such that S ⇒* w. By the induction hypothesis, we know that w ∈ L (since w is produced by a
derivation of length k), and so x = awa is also in L, by the definition of L. (Similarly for the second class of derivations, which
begin with the rule S → bSb.)
As our example languages get more complex, it becomes harder and harder to write detailed proofs of the correctness of our
grammars and we will typically not try to do so.
Example 4: L = {a^n b^2n}. You should recognize that b^2n = (bb)^n, and so this is just like the first example except that instead of
matching a and b, we will match a and bb. So we want
G = ({S, a, b}, {a, b}, R, S), where R = {S → aSbb, S → ε}.
A related problem is L = {a^m b^n : m ≤ n}. One way to approach it is to realize that {a^m b^n : m ≤ n} = {a^m b^(m+k) : k ≥ 0} = {a^m b^k b^m : k ≥ 0}. Therefore, we
can produce a's and b's in parallel and then, when we're done, produce some more b's. So a solution is
G = ({S, B, a, b}, {a, b}, R, S), where R = {S → aSb, S → B, B → bB, B → ε}.
For a grammar whose rules include S → MS and in which M ⇒* a^n b^n, any string x = a^n1 b^n1 … a^nk b^nk ∈ L is derived by
S ⇒* M^k S    (k applications of the rule S → MS)
⇒ M^k
⇒* a^n1 b^n1 M^(k-1)
⇒* …
⇒* a^n1 b^n1 … a^nk b^nk
Now let's consider the fact that there are other derivations of the string aaabb using our example grammar, for example:
(2)  S ⇒ AB ⇒ ABb ⇒ Abb ⇒ aAabb ⇒ aaabb
(3)  S ⇒ AB ⇒ ABb ⇒ aAaBb ⇒ aAabb ⇒ aaabb
(4)-(6)  the remaining derivations, which apply the same rules in the other possible orders.
If you examine all these derivations carefully, you will see that in each case the same rules have been used to rewrite the same
symbols; they differ only in the order in which those rules were applied. For example, in (2) we chose to rewrite the B in
ABb as b (producing Abb) before rewriting the A as aAa, whereas in (3) the same processes occur in the opposite order. Even
though these derivations are technically different (they consist of distinct sequences of strings connected by ⇒), it seems that
in some sense they should all count as equivalent. This equivalence is expressed by the familiar representations known as
derivation trees or parse trees.
The basic idea is that the start symbol of the grammar becomes the root of the tree. When this symbol is rewritten by a
grammar rule S → x1x2…xn, we let the tree "grow" downward with branches to each of the new nodes x1, x2, …, xn; thus:
S
├─ x1
├─ x2
├─ …
└─ xn
When one of these xi symbols is rewritten, it in turn becomes the "mother" node with branches extending to each of its
"daughter" nodes in a similar fashion. Each of the derivations in (1) through (6) would then give rise to the following parse
tree:
S
├─ A
│   ├─ a
│   ├─ A
│   │   └─ a
│   └─ a
└─ B
    ├─ B
    │   └─ b
    └─ b
A note about tree terminology: for us, a tree always has a single root node, and the left-to-right order of nodes is significant;
i.e., a node X with daughters Y and Z in that order is not the same tree as X with daughters Z and Y.
The lines connecting nodes are called branches, and their top-to-bottom orientation is also significant. A mother node is
connected by a single branch to each of the daughter nodes beneath it. Nodes with the same mother are called sisters, e.g.,
the topmost A and B in the tree above are sisters, having S as mother.
Nodes without daughters are called leaves; e.g., each of the nodes labelled with a lower-case letter in the tree above. The
string formed by the left-to-right sequence of leaves is called the yield (aaabb in the tree above).
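The tree terminology translates directly into a small data structure. The Python sketch below (the class name and labels are illustrative) builds the parse tree of aaabb shown above and computes its yield by reading the leaves left to right:

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []          # no children <=> a leaf

    def yield_of(self):
        # the left-to-right sequence of leaf labels
        if not self.children:
            return self.label
        return "".join(c.yield_of() for c in self.children)

tree = Node("S", [
    Node("A", [Node("a"), Node("A", [Node("a")]), Node("a")]),
    Node("B", [Node("B", [Node("b")]), Node("b")]),
])
print(tree.yield_of())                          # aaabb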
It sometimes happens that a grammar allows the derivation of some string by nonequivalent derivations, i.e., derivations that
do not reduce to the same parse tree. Suppose, for example, the grammar contained the rules S → A, S → B, A → b, and B →
b. Then the two following derivations of the string b correspond to two distinct parse trees (S over A over b, and S over B over b):
S ⇒ A ⇒ b        S ⇒ B ⇒ b
A grammar with this property is said to be ambiguous. Such ambiguity is highly undesirable in grammars of programming
languages such as C, LISP, and the like, since the parse tree (the syntactic structure) assigned to a string determines its
translation into machine language and therefore the sequence of commands to be executed. Designers of programming
languages, therefore, take great pains to assure that their grammars (the rules that specify the well-formed strings of the
language) are unambiguous. Natural languages, on the other hand, are typically rife with ambiguities (cf. "They are flying
planes," "Visiting relatives can be annoying," "We saw her duck," etc.), a fact that makes computer applications such as
machine translation, question-answering systems, and so on, maddeningly difficult.
As another example of a grammar and the language it generates:
G = (V, Σ, R, S), where
V = {S, A, a, b},
Σ = {a, b},
R = {S → aS, S → bA, A → aA, A → bS, S → ε},
L(G) = {w ∈ {a, b}* : w contains an even number of b's}.
(Notice that the stack alphabet need not be in any way similar to the input alphabet. We could equally well have pushed a's,
but we don't need to.) This PDA nondeterministically decides when it is done reading a's. Thus one valid computation is
(s, aabb, ε) |- (s, abb, I) |- (f, abb, I),
which is then stuck, and so M rejects along this path. Since a different, accepting computation of aabb exists, this is no
problem, but you might want to eliminate the nondeterminism if you are bothered by it. Note that the nondeterminism arises
from the ε-transition; we only want to take it if we are done reading a's. The only way to know that there are no more a's is to
read the next symbol and see that it's a b. (This is analogous to unfolding a loop in a program.) One other wrinkle: ε ∈ L, so
now state s must be final in order to accept ε. The resulting deterministic PDA is M = ({s, f}, {a, b}, {I}, Δ, s, {s, f}), where Δ =
{ ((s, a, ε), (s, I)),     /* read a's and count them */
  ((s, b, I), (f, ε)),     /* only go to the second phase if there's a b */
  ((f, b, I), (f, ε)) }.   /* read b's and compare them to the a's on the stack */
Notice that this DPDA can still get stuck and thus fail, e.g., on input b or aaba (i.e., strings that aren't in L). Determinism for
PDA's simply means that there is at most one applicable transition, not necessarily exactly one.
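Because this particular machine is deterministic and has no ε-moves, simulating it is easy. Here is a Python sketch (the dictionary encoding of Δ is an assumption; acceptance follows the convention used here, namely that the whole input is read, the machine ends in a final state, and the stack is empty):

delta = {("s", "a", ""): ("s", "I"),    # read a's and count them
         ("s", "b", "I"): ("f", ""),    # only go to the second phase if there's a b
         ("f", "b", "I"): ("f", "")}    # read b's and compare them to the a's
finals = {"s", "f"}

def accepts(w):
    state, stack = "s", []
    for ch in w:
        if stack and (state, ch, stack[-1]) in delta:    # a transition that pops the top symbol
            state, push = delta[(state, ch, stack[-1])]
            stack.pop()
        elif (state, ch, "") in delta:                   # a transition that ignores the stack
            state, push = delta[(state, ch, "")]
        else:
            return False                                 # stuck: reject
        stack.extend(push)
    return state in finals and not stack                 # final state with an empty stack

for w in ["", "ab", "aabb", "aab", "abb", "ba"]:
    print(repr(w), accepts(w))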
Example 3: L = {a^m b^m c^n d^n}. Here we have two independent concerns: matching the a's and b's, and then matching the c's and
d's. Again, start by designing a finite state machine for the language L' that is just like L in structure but where we don't care
how many of each letter there are; in other words, a*b*c*d*. It's obvious that this machine needs four states. So our PDA
must also have four states. The twist is that we must be careful that there is no unexpected interaction between the two
independent parts a^m b^m and c^n d^n. Consider the PDA M = ({1, 2, 3, 4}, {a, b, c, d}, {I}, Δ, 1, {4}), where Δ =
{ ((1, a, ε), (1, I)),     /* read a's and count them */
  ((1, ε, ε), (2, ε)),     /* guess that we're ready to quit reading a's and start reading b's */
  ((2, b, I), (2, ε)),     /* read b's and compare them to the a's */
  ((2, ε, ε), (3, ε)),     /* guess that we're ready to quit reading b's and start reading c's */
  ((3, c, ε), (3, I)),     /* read c's and count them */
  ((3, ε, ε), (4, ε)),     /* guess that we're ready to quit reading c's and start reading d's */
  ((4, d, I), (4, ε)) }.   /* read d's and compare them to the c's */
It is clear that every string in L is accepted by this PDA. Unfortunately, some other strings are also accepted, e.g., ad. Why is this?
Because it's possible to go from state 2 to 3 without clearing off all the I marks we pushed for the a's. That means that the
leftover I's are available to match d's. So this PDA is actually accepting the language {a^m b^n c^p d^q : m ≥ n and m + p = n + q}, a
superset of L. E.g., the string aabcdd is accepted.
One way to fix this problem is to ensure that the stack is really cleared before we leave phase 2 and go to phase 3; this must be
done using a bottom-of-stack marker, say B. This gives M = ({s, 1, 2, 3, 4}, {a, b, c, d}, {B, I}, Δ, s, {4}), where Δ =
{ ((s, ε, ε), (1, B)),     /* push the bottom marker onto the stack */
  ((1, a, ε), (1, I)),     /* read a's and count them */
  ((1, ε, ε), (2, ε)),     /* guess that we're ready to quit reading a's and start reading b's */
  ((2, b, I), (2, ε)),     /* read b's and compare them to the a's */
  ((2, ε, B), (3, ε)),     /* confirm that the stack is empty, then get ready to start reading c's */
  ((3, c, ε), (3, I)),     /* read c's and count them */
  ((3, ε, ε), (4, ε)),     /* guess that we're ready to quit reading c's and start reading d's */
  ((4, d, I), (4, ε)) }.   /* read d's and compare them to the c's */
A different, probably cleaner, fix is simply to use two different symbols for counting the a's and the c's. This gives us
M = ({1, 2, 3, 4}, {a, b, c, d}, {A, C}, Δ, 1, {4}), where Δ =
{ ((1, a, ε), (1, A)),     /* read a's and count them */
  ((1, ε, ε), (2, ε)),     /* guess that we're ready to quit reading a's and start reading b's */
  ((2, b, A), (2, ε)),     /* read b's and compare them to the a's */
  ((2, ε, ε), (3, ε)),     /* guess that we're ready to quit reading b's and start reading c's */
  ((3, c, ε), (3, C)),     /* read c's and count them */
  ((3, ε, ε), (4, ε)),     /* guess that we're ready to quit reading c's and start reading d's */
  ((4, d, C), (4, ε)) }.   /* read d's and compare them to the c's */
Now if an input has more a's than b's, there will be leftover A's on the stack and no way for them to be removed later, so
there is no way such a bad string could be accepted.
As an exercise, you might want to try making a deterministic PDA for this one.
Example 4: L = {a^n b^n} ∪ {b^n a^n}. Just as with nondeterministic finite state automata, whenever the language we're concerned
with can be broken into cases, a reasonable thing to do is to build separate PDAs for each of the sublanguages. Then we
build the overall machine so that, each time it sees a string, it nondeterministically guesses which case the string falls into.
(For example, compare the current problem to the simpler one of making a finite state machine for the regular language a*b*
∪ b*a*.) Taking this approach here, we get M = ({s, 1, 2, 3, 4}, {a, b}, {I}, Δ, s, {2, 4}), where Δ =
{ ((s, ε, ε), (1, ε)),     /* guess that this is an instance of a^n b^n */
  ((s, ε, ε), (3, ε)),     /* guess that this is an instance of b^n a^n */
  ((1, a, ε), (1, I)),     /* a's come first, so read and count them */
  ((1, ε, ε), (2, ε)),     /* begin the b region following the a's */
  ((2, b, I), (2, ε)),     /* read b's and compare them to the a's */
  ((3, b, ε), (3, I)),     /* b's come first, so read and count them */
  ((3, ε, ε), (4, ε)),     /* begin the a region following the b's */
  ((4, a, I), (4, ε)) }.   /* read a's and compare them to the b's */
Notice that although ε ∈ L, the start state s is not a final state, but there is a path (in fact two) from s to a final state.
Now suppose that we want a deterministic machine. We can no longer use this strategy. The ε-moves must be eliminated by
looking ahead. Once we do that, since ε ∈ L, the start state must be final. This gives us M = ({s, 1, 2, 3, 4}, {a, b}, {I}, Δ, s,
{s, 2, 4}), where Δ =
{ ((s, a, ε), (1, I)),
  ((s, b, ε), (3, I)),
  ((1, a, ε), (1, I)),
  ((1, b, I), (2, ε)),
  ((2, b, I), (2, ε)),
  ((3, b, ε), (3, I)),
  ((3, a, I), (4, ε)),
  ((4, a, I), (4, ε)) }.
Example 5: L = {ww^R : w ∈ {a, b}*}. Here we have two phases: the first half and the second half of the string. Within each
half, the symbols may be mixed in any particular order. So we expect that a two-state PDA should do the trick. See the
lecture notes for how it works.
Example 6: L = {ww^R : w ∈ a*b*}. Here the two halves of each element of L are themselves split into two phases: reading
a's, and reading b's. So the straightforward approach would be to design a four-state machine to represent these four phases.
This gives us M = ({1, 2, 3, 4}, {a, b}, {a, b}, Δ, 1, {4}), where Δ =
{ ((1, a, ε), (1, a)),     /* push a's */
  ((1, ε, ε), (2, ε)),     /* guess that we're ready to quit reading a's and start reading b's */
  ((2, b, ε), (2, b)),     /* push b's */
  ((2, ε, ε), (3, ε)),     /* guess that we're ready to quit reading the first w and start reading w^R */
  ((3, b, b), (3, ε)),     /* compare the 2nd b's to the 1st b's */
  ((3, ε, ε), (4, ε)),     /* guess that we're ready to quit reading b's and move to the last region of a's */
  ((4, a, a), (4, ε)) }.   /* compare the 2nd a's to the 1st a's */
You might want to compare this to the straightforward nondeterministic finite state machine that you might design to accept
a*b*b*a*.
There are various simplifications that could be made to this machine. First of all, notice that L = {a^m b^n b^n a^m}. Next, observe
that b^n b^n = (bb)^n, so that, in effect, the only requirement on the b's is that there be an even number of them. And of course a
stack is not even needed to check that. So an alternate solution needs only three states, giving M = ({1, 2, 3}, {a, b}, {a},
Δ, 1, {3}), where Δ =
{ ((1, a, ε), (1, a)),     /* push a's */
  ((1, ε, ε), (2, ε)),     /* guess that we're ready to quit reading a's and start reading b's */
  ((2, bb, ε), (2, ε)),    /* read bb's */
  ((2, ε, ε), (3, ε)),     /* guess that we're ready to quit reading b's and move on to the final group of a's */
  ((3, a, a), (3, ε)) }.   /* compare the 2nd a's to the 1st a's */
/* compare 2nd a's to 1st a's */
This change has the fringe benefit of making the PDA more deterministic since there is no need to guess where the middle of
the b's occurs. However, it is still nondeterministic.
So let's consider another modification. This time, we go ahead and push the a's and the b's that make up w. But now we
notice that we can match w^R against w in a single phase: the required ordering b*a* in w^R will automatically be enforced if
we simply match the input with the stack! So now we have the PDA M = ({1, 2, 3}, {a, b}, {a, b}, Δ, 1, {3}), where Δ =
{ ((1, a, ε), (1, a)),     /* push a's */
  ((1, ε, ε), (2, ε)),     /* guess that we're ready to quit reading a's and start reading b's */
  ((2, b, ε), (2, b)),     /* push b's */
  ((2, ε, ε), (3, ε)),     /* guess that we're ready to quit reading the first w and start reading w^R */
  ((3, a, a), (3, ε)),     /* compare w^R to w */
  ((3, b, b), (3, ε)) }.
Notice that this machine is still nondeterministic. As an exercise, you might try to build a deterministic machine to accept this
language. You'll find that it's impossible; you've got to be able to tell when the end of the string is reached, since it's possible
that there aren't any b's in between the a regions. This suggests that there might be a deterministic PDA that accepts L$, and
in fact there is. Interestingly, even that is not possible for the less restrictive language L = {ww^R : w ∈ {a, b}*} (because
there's no way to tell, without guessing, where w ends and w^R starts). Putting a strong restriction on string format often makes a
language more tractable. Also note that {ww^R : w ∈ a*b+} is accepted by a deterministic PDA; finding such a deterministic PDA
is left as an exercise.
Example 7: Consider L = {w ∈ {a, b}* : #(a, w) = #(b, w)}. In other words, every string in L has the same number of a's as
b's (although the a's and b's can occur in any order). Notice that this language imposes no particular structure on its strings,
since the symbols may be mixed in any order. Thus the rule of thumb that we've been using doesn't really apply here. We
don't need multiple states for multiple string regions. Instead, we'll find that, other than possible bookkeeping states, one
"working" state will be enough.
Sometimes there may be a tradeoff between the degree of nondeterminism in a PDA and its simplicity. We can see that in this
example. One approach to designing a PDA to solve this problem is to keep a balance on the stack of the excess a's or b's.
For example, if there is an a on the stack and we read b, then we cancel them. If, on the other hand, there is an a on the stack
and we read another a, we push the new a on the stack. Whenever the stack is empty, we know that we've seen a matching
number of a's and b's so far. Let's try to design a machine that does this as deterministically as possible. One approach is M =
({s, q, f}, {a, b}, {a, b, c}, Δ, s, {f}), where Δ contains the following transitions:
1.  ((s, ε, ε), (q, c))     /* Before we do anything else, push a marker, c, on the stack so we'll be able to tell when the stack is
                               empty. Then leave state s so we don't ever do this again. */
2.  ((q, a, c), (q, ac))    /* If the stack is empty (we find the bottom c) and we read an a, push c back and then the a (to start
                               counting a's). */
3.  ((q, a, a), (q, aa))    /* If the stack already has a's and we read an a, push the new one. */
4.  ((q, a, b), (q, ε))     /* If the stack has b's and we read an a, then throw away the top b and the new a. */
5.  ((q, b, c), (q, bc))    /* If the stack is empty (we find the bottom c) and we read a b, then start counting b's. */
6.  ((q, b, b), (q, bb))    /* If the stack already has b's and we read a b, push the new one. */
7.  ((q, b, a), (q, ε))     /* If the stack has a's and we read a b, then throw away the top a and the new b. */
8.  ((q, ε, c), (f, ε))     /* If the stack is empty then, without reading any input, move to f, the final state. Clearly we only
                               want to take this transition when we're at the end of the input. */
This PDA attempts to solve our problem deterministically, only pushing an a if there is not a b on the stack. In order to tell
that there is not a b, this PDA has to pop whatever is on the stack and examine it. In order to make sure that there is always
something to pop and look at, we start the process by pushing the special marker c onto the stack. (Recall that there is no way
to check directly for an empty stack. If we write just ε for the value of the current top of stack, we'll get a match no matter what the
stack looks like.) Notice, though, that despite our best efforts, we still have a nondeterministic PDA because, at any point in
reading an input string, if the numbers of a's and b's read so far are equal, then the stack consists only of c, and so transition 8,
((q, ε, c), (f, ε)), may be taken, even if there is remaining input. But if there is still input, then either transition 2 or 5 also
applies. The solution to this problem is to add a terminator to L.
Another thing we could do is to consider a simpler PDA that doesn't even bother trying to be deterministic. Consider M =
({s}, {a, b}, {a, b}, Δ, s, {s}), where Δ =
1.  ((s, a, ε), (s, a))     /* If we read an a, push a. */
2.  ((s, a, b), (s, ε))     /* Cancel an input a against a stack b. */
3.  ((s, b, ε), (s, b))     /* If we read a b, push b. */
4.  ((s, b, a), (s, ε))     /* Cancel an input b against a stack a. */
Now, whenever we're reading a and b is on the stack, there are two applicable transitions: 1, which ignores the b and pushes
the a on the stack, and 2, which pops the b and throws away the a (in other words, it cancels the a and b against each other).
Transitions 3 and 4 do the same two things if we're reading b. It is clear that if we always perform the cancelling transition
when we can, we will accept every string in L. What you might worry about is whether, due to this larger degree of freedom,
we might not also be able to wrongly accept some string not in L. In fact this will not happen, because you can prove that M
has the property that, if x is the string read in so far, and y is the current stack contents, then
#(a, x) - #(b, x) = #(a, y) - #(b, y).
This formula is an invariant of M. We can prove it by induction on the length of the string read so far: it is clearly true
initially, before M reads any input, since 0 - 0 = 0 - 0. And, if it holds before taking a transition, it continues to hold afterward.
We can prove this as follows:
Let x' be the string read so far and let y' be the contents of the stack at some arbitrary point in the computation. Then let us
see what effect each of the four possible transitions has. We first consider:
((s, a, ε), (s, a)): After taking this transition we have that x' = xa and y' = ay. Thus we have
#(a, x') - #(b, x')
   = #(a, xa) - #(b, xa)        /* x' = xa */
   = #(a, xa) - #(b, x)         /* #(b, xa) = #(b, x) */
   = #(a, x) + 1 - #(b, x)
   = #(a, y) + 1 - #(b, y)      /* induction hypothesis */
   = #(a, ay) - #(b, ay)
   = #(a, y') - #(b, y')        /* y' = ay */
So the invariant continues to be true for x' and y' after the transition is taken. Intuitively, the argument is simply that when this
transition is taken, it increments #(a, x) and #(a, y), preserving the invariant equation. The three other transitions also preserve
the invariant as can be seen similarly:
((s, a, b), (s, ε)) increments #(a, x) and decrements #(b, y), preserving equality.
((s, b, ε), (s, b)) increments #(b, x) and #(b, y), preserving equality.
((s, b, a), (s, ε)) increments #(b, x) and decrements #(a, y), preserving equality.
Therefore, the invariant holds initially, and taking any transition continues to preserve it, so it is always true, no matter what
string is read and no matter what transitions are taken. Why is this a good thing to know? Because suppose a string x ∉ L is
read by M. Since x ∉ L, we know that #(a, x) - #(b, x) ≠ 0, and therefore, by the invariant equation, when the whole string x
has been read in, the stack contents y will satisfy #(a, y) - #(b, y) ≠ 0. Thus the stack cannot be empty, and x cannot be
accepted, no matter what sequence of transitions is taken. Thus no bad strings are accepted by M.
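Since M is nondeterministic, checking a string means exploring all of its computations. The Python sketch below is illustrative (the string encoding of the stack is an assumption): it does a breadth-first search over configurations of the one-state machine above, and asserts the invariant at every configuration it visits.

from collections import deque

# ((s, a, eps), (s, a)), ((s, a, b), (s, eps)), ((s, b, eps), (s, b)), ((s, b, a), (s, eps))
delta = [("a", "", "a"), ("a", "b", ""), ("b", "", "b"), ("b", "a", "")]

def accepts(w):
    frontier, seen = deque([(w, "")]), set()        # configurations are (remaining input, stack)
    while frontier:
        rest, stack = frontier.popleft()
        if (rest, stack) in seen:
            continue
        seen.add((rest, stack))
        read = w[:len(w) - len(rest)]
        # the invariant: #(a, x) - #(b, x) = #(a, y) - #(b, y)
        assert read.count("a") - read.count("b") == stack.count("a") - stack.count("b")
        if not rest and not stack:
            return True                             # input consumed, final state s, empty stack
        if not rest:
            continue
        for ch, pop, push in delta:                 # try every applicable transition
            if rest[0] == ch and stack.endswith(pop):
                frontier.append((rest[1:], stack[:len(stack) - len(pop)] + push))
    return False

for w in ["", "ab", "ba", "abba", "aabbab", "aab", "bbb"]:
    print(repr(w), accepts(w))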
In other words, if you can describe a language with a context-free grammar, you can build a nondeterministic PDA for it, and
vice versa. Note here that the class of context-free languages is equivalent to the class of languages accepted by
nondeterministic PDAs. This is different from what we observed when we were considering regular languages. There we
showed that nondeterminism doesn't buy us any power and that we could build a deterministic finite state machine for every
regular language. Now, as we consider context-free languages, we find that nondeterminism does buy us power: there are
languages that are accepted by nondeterministic PDAs for which no deterministic PDA exists. And those languages are
context free (i.e., they can be described with context-free grammars). So this theorem differs from the similar theorem that we
proved for regular languages in that it claims equivalence for nondeterministic PDAs rather than deterministic ones.
We'll prove this theorem by construction in two steps: first we'll show that, given a context-free grammar G, we can construct
a PDA for L(G). Then we'll show (actually, we'll just sketch this second proof) that we can go the other way and construct,
from a PDA that accepts some language L, a grammar for L.
Lemma: Every context-free language is accepted by some nondeterministic PDA.
To prove this lemma, we give the following construction. Given some CFG G = (V, Σ, R, S), we construct an equivalent
PDA M in the following way. M = (K, Σ, Γ, Δ, s, F), where
K = {p, q}                  (the PDA always has just two states)
s = p                       (p is the initial state)
F = {q}                     (q is the only final state)
the input alphabet is Σ     (the terminal alphabet of G)
Γ = V                       (the stack alphabet is the total alphabet of G)
Δ contains
(1) the transition ((p, ε, ε), (q, S))
(2) a transition ((q, ε, A), (q, α)) for each rule A → α in G
(3) a transition ((q, a, a), (q, ε)) for each a ∈ Σ
Notice how closely the machine M mirrors the structure of the original grammar G. M works by using its stack to simulate a
derivation by G. Using the transition created in (1), M begins by pushing S onto its stack and moving to its second state, q,
where it will stay for the rest of its operation. Think of the contents of the stack as M's expectation for what it must find in
order to have seen a legal string in L(G). So if it finds S, it will have found such a string. But if S could be rewritten as some
other sequence α, then if M found α it would also have found a string in L(G). All the transitions generated by (2) take care
of these options by allowing M to replace a stack symbol A by a string α whenever G contains the rule A → α. Of course, at
some point we actually have to look at the input. That's what M does in the transitions generated in (3). If the stack contains
an expectation of some terminal symbol and if the input string actually contains that symbol, M consumes the input symbol
and pops the expected symbol off the stack (effectively canceling out the expectation with the observed symbol). These steps
continue, and if M succeeds in emptying its stack and reading the entire input string, then the input is accepted.
Let's consider an example. Let G = (V = {S, a, b, c}, Σ = {a, b, c}, R = {S → aSa, S → bSb, S → c}, S). This grammar
generates {xcx^R : x ∈ {a, b}*}. Carrying out the construction we just described for this example CFG gives the following
PDA:
M = ({p, q}, {a, b, c}, {S, a, b, c}, Δ, p, {q}), where
Δ = { ((p, ε, ε), (q, S)),
      ((q, ε, S), (q, aSa)),
      ((q, ε, S), (q, bSb)),
      ((q, ε, S), (q, c)),
      ((q, a, a), (q, ε)),
      ((q, b, b), (q, ε)),
      ((q, c, c), (q, ε)) }
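The construction is mechanical enough to write down directly. Here is a Python sketch that, given rules in the (nonterminal, right-hand side) form assumed earlier, produces exactly the transitions (1), (2), and (3) above; applied to this grammar it prints the seven transitions of M. The tuple encoding of transitions, with "" playing the role of ε, is an assumption of the sketch.

def grammar_to_pda(terminals, rules, start):
    # transitions are ((state, input, pop), (state, push))
    delta = [(("p", "", ""), ("q", start))]                 # (1) push the start symbol
    delta += [(("q", "", lhs), ("q", rhs))                  # (2) replace a nonterminal by a right-hand side
              for lhs, rhs in rules]
    delta += [(("q", a, a), ("q", ""))                      # (3) match a terminal against the input
              for a in terminals]
    return delta

rules = [("S", "aSa"), ("S", "bSb"), ("S", "c")]
for t in grammar_to_pda("abc", rules, "S"):
    print(t)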
Here is a derivation of the string abacaba by G:
(1)  S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abacaba
And here is a computation by M accepting that same string:
(2)  (p, abacaba, ε) |- (q, abacaba, S) |- (q, abacaba, aSa) |- (q, bacaba, Sa) |- (q, bacaba, bSba) |- (q, acaba, Sba) |-
     (q, acaba, aSaba) |- (q, caba, Saba) |- (q, caba, caba) |- (q, aba, aba) |- (q, ba, ba) |- (q, a, a) |- (q, ε, ε)
If you look at the successive stack contents in computation (2) above, you will see that they are, in effect, tracing out a
derivation tree for the string abacaba:
S
├─ a
├─ S
│   ├─ b
│   ├─ S
│   │   ├─ a
│   │   ├─ S
│   │   │   └─ c
│   │   └─ a
│   └─ b
└─ a
M is alternately extending the tree and checking to see whether the leaves of the tree match the input string. M is thus acting as a top-down parser. A parser is something that determines whether a presented string is generated by a given grammar (i.e., whether
the string is grammatical or well-formed), and, if it is, calculates a syntactic structure (in this case, a parse tree) assigned to
that string by the grammar. Of course, the machine M that we have just described does not in fact produce a parse tree,
although it could be made to do so by adding some suitable output devices. M is thus not a parser but a recognizer. We'll
have more to say about parsers later, but we can note here that parsers play an important role in many kinds of computer
applications, including compilers for programming languages (where we need to know the structure of each command), query
interpreters for database systems (where we need to know the structure of each user query), and so forth.
Note that M is properly nondeterministic. From the second configuration in (2), we could have gone to (q, abacaba, bSb) or
to (q, abacaba, c), for example, but if we'd done either of those things, M would have reached a dead end. M in effect has to
guess which one of a group of applicable rules of G, if any, is the right one to derive the given string. Such guessing is highly
undesirable in the case of most practical applications, such as compilers, because their operation can be slowed down to the
point of uselessness. Therefore, programming languages and query languages (which are almost always context-free, or
nearly so) are designed so that they can be parsed deterministically and therefore compiled or interpreted in the shortest
possible time. A lot of attention has been given to this problem in Computer Science, as you will learn if you take a course in
compilers. On the other hand, natural languages, such as English, Japanese, etc., were not "designed" for this kind of parsing
efficiency. So, if we want to deal with them by computer, as, for example, in machine translation or information retrieval
systems, we have to abandon any hope of deterministic parsing and strive for maximum nondeterministic efficiency. A lot of
effort has been devoted to these problems as well, as you will learn if you take a course in computational linguistics.
To complete the proof of our lemma, we need to prove that L(M) = L(G). The proof is by induction and is reasonably
straightforward. We'll omit it here, and turn instead to the other half of the theorem:
Lemma: If M is a non-deterministic PDA, there is a context-free grammar G such that L(G) = L(M).
Again, the proof is by construction. Unfortunately, this time the construction is anything but natural. We'd never want
actually to do it. We just care that the construction exists, because it allows us to prove this crucial result. The basic idea
behind the construction is to build a grammar that has the property that if we use it to create a leftmost derivation of some
string s, then we will have simulated the behavior of M while reading s. The nonterminals of the grammar are things like <s,
Z, f> (recall that we can use any names we want for our nonterminals). The reason we use such strange-looking nonterminals
is to make it clear what each one corresponds to. For example, <s, Z, f> will generate all strings that M could consume in the
process of moving from state s with Z on the stack to state f having popped Z off the stack.
To construct G from M, we proceed in two steps: First we take our original machine M and construct a new simple
machine M' (see below). We do this so that there will be fewer cases to consider when we actually do the construction of a
grammar from a machine. Then we build a grammar from M'.
A PDA M is simple iff:
(1) There are no transitions into the start state, and
(2) Whenever ((q, a, β), (p, γ)) is a transition in M and q is not the start state, then β ∈ Γ and |γ| ≤ 2.
In other words, M is simple if it always consults its topmost stack symbol (and no others) and replaces that symbol with either
0, 1, or 2 new symbols. We need to treat the start state separately since, of course, when M starts, its stack is empty and there
is nothing to consult. But we do need to guarantee that the start state can't bypass the restriction of (2) if it also functions as
something other than the start state, i.e., if it is part of a loop. Thus constraint (1).
Although not all machines are simple, there is an algorithm to construct an equivalent simple machine from any machine M.
Thus the fact that our grammar construction algorithm works only on simple machines in no way limits the applicability of
the lemma that says that for any machine there is an equivalent grammar.
Given any PDA M, we construct an equivalent simple PDA M' as follows:
(1) Let M' = M.
(2) Add to M' a new start state s' and a new final state f'. Add a transition from s' to M's original start state that consumes no
input and pushes a special bottom-of-stack symbol Z onto the stack. Add transitions from all of M's original final states to
f'. These transitions should consume no input, but they should pop the bottom-of-stack symbol Z from the stack. For example,
if we start with a straightforward two-state PDA that accepts wcw^R (push a's and b's in the start state s, move to the final state f
on c, then pop matching a's and b's in f), then this step surrounds it with the new states s' and f': s' goes to s on ε//Z, and f goes
to f' on ε/Z/ε.
(3) (a) Assure that |β| ≤ 1. In other words, make sure that no transition looks at more than one symbol on the stack. It is easy
to do this. If there are any transitions in M' that look at two or more symbols, break them down into multiple transitions that
examine one symbol apiece.
(b) Assure that |γ| ≤ 1. In other words, make sure that each transition pushes no more than one symbol onto the stack.
(The rule for simple allows us to push 2, but you'll see why we restrict to 1 at this point in a minute.) Again, if M' has any
transitions that push more than one symbol, break them apart into multiple steps.
(c) Assure that |β| = 1. We already know that |β| isn't greater than 1. But it could be zero. If there are any transitions that
don't examine the stack at all, then change them so that they pop off the top symbol, ignore it, and push it right back on.
When we do this, we will increase by one the length of the string that gets pushed onto the stack. Now you can see why we
did step (b) as we did. If, after completing (b), we never push more than one symbol, we can go ahead and do (c) and still
be assured that we never push more than two symbols (which is what we require for M' to be simple).
We'll omit the proof that this procedure does in fact produce a new machine M' that is simple and equivalent to M.
Once we have a simple machine M' = (K', Σ, Γ', Δ', s', {f'}) derived from our original machine M = (K, Σ, Γ, Δ, s, F), we are
ready to construct a grammar G for L(M') (and thus, equivalently, for L(M)). We let G = (V, Σ, R, S), where V contains a
start symbol S, all the elements of Σ, and a new nonterminal symbol <q, A, p> for every q and p in K' and every A = ε or any
symbol in the stack alphabet of M' (which is the stack alphabet of M plus the special bottom-of-stack marker). The tricky part
is the construction of R, the rules of G. R contains all the following rules (although in fact most will be useless in the sense
that the nonterminal symbol on the left-hand side will never be generated in any derivation that starts with S):
(1) The special rule S → <s, Z, f'>, where s is the start state of the original machine M, Z is the special bottom-of-stack
symbol that M' pushes when it moves from s' to s, and f' is the new final state of M'. This rule says that to be a string in
L(M') you must be a string that M' can consume if it is started in state s with Z on the top of the stack and it makes it to state f'
having popped Z off the stack. All the rest of the rules will correspond to the various paths by which M' might do that.
(2) Consider each transition ((q, a, B), (r, C)) of M' where a is either ε or a single input symbol and C is either a single stack
symbol or ε. In other words, consider each transition of M' that pushes zero or one symbols onto the stack. For each such
transition and each state p of M', we add the rule
<q, B, p> → a <r, C, p>.
Read these rules as saying that one way in which M' can go from q to p and pop B off the stack is by consuming an a, going to
state r, pushing a C onto the stack (all of which are specified by the transition we're dealing with), then eventually getting to p
and popping off the stack the C that the transition specifies must be pushed. Think of these rules this way. The transition that
motivates them tells us how to make a single move from q to r while consuming the input symbol a and popping the stack
symbol B. So think about the strings that could drive M' from q to some arbitrary state p (via this transition) and pop B from
the stack in the process. They include all the strings that start with a and are followed by the strings that can drive M' from r
on to p, provided that they also cause the C that got pushed to be dealt with and popped. Note that of course we must also pop
anything else we push along the way, but we don't have to say that explicitly, since if we haven't done that we can't get to C to
pop it.
(3) Next consider each transition ((q, a, B), (r, CD)) of M', where C and D are stack symbols. In other words, consider every
transition that pushes two symbols onto the stack. (Recall that since M' is simple, we only have to consider the cases of 0, 1,
or 2 symbols being pushed.) Now consider all pairs of states v and w in K' (where v and w are not necessarily distinct). For
all such transitions and pairs of states, construct the rule
<q, B, v> → a <r, C, w> <w, D, v>.
These rules are a bit more complicated than the ones that were generated in (2) just because they describe computations that
involve two intermediate states rather than one, but they work the same way.
(4) For every state q in M', we add the rule
<q, ε, q> → ε.
These rules let us get rid of spurious nonterminals so that we can actually produce strings composed solely of terminal
symbols. They correspond to the fact that M' can (trivially) get from a state back to itself while popping nothing, simply by
doing nothing (i.e., reading the empty string).
See the lecture notes for an example of this process in action. As you'll notice, the grammars that this procedure generates are
very complicated, even for very simple machines. From larger machines, one would get truly enormous grammars (most of
whose rules turn out to be useless, as a matter of fact). So, if one is presented with a PDA, the best bet for finding an
equivalent CFG is to figure out the language accepted by the PDA and then proceed intuitively to construct a CFG that
generates that language.
We'll omit here the proof that this process does indeed produce a grammar G such that L(G) = L(M).
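For completeness, here is a Python sketch of how rules (1) through (4) can be generated from a simple machine's transitions. The tuple encoding of transitions and of the <q, A, p> nonterminals is an assumption of the sketch, and no attempt is made to discard the many useless rules the construction produces.

def pda_to_cfg(states, transitions, s, Z, f2):
    # transitions of a *simple* PDA: ((q, a, B), (r, push)) with a = "" or one input symbol,
    # B a single stack symbol, and push a string of length 0, 1, or 2;
    # the nonterminal <q, A, p> is represented by the tuple (q, A, p)
    rules = [("S", [(s, Z, f2)])]                                    # (1)  S -> <s, Z, f'>
    for (q, a, B), (r, push) in transitions:
        for p in states:
            if len(push) <= 1:                                       # (2)  push zero or one symbols
                rhs = ([a] if a else []) + [(r, push, p)]
                rules.append(((q, B, p), rhs))
            else:                                                    # (3)  push two symbols, C then D
                C, D = push[0], push[1]
                for w in states:
                    rhs = ([a] if a else []) + [(r, C, w), (w, D, p)]
                    rules.append(((q, B, p), rhs))
    rules += [((q, "", q), []) for q in states]                      # (4)  <q, eps, q> -> eps
    return rules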
6 Parsing
Almost always, the reason we care about context-free languages is that we want to build programs that "interpret" or
"understand" them. For example, programming languages are context free. So are most data base query languages.
Command languages that need capabilities (such as matching delimiters) that can't exist in simpler, regular languages are also
context free.
The interpretation process for context free languages generally involves three parts (although these logical parts may be
interleaved in various ways in the interpretation program):
1. Lexical analysis, in which individual characters are combined, generally using finite state machine techniques, to form the
building blocks of the language.
2. Parsing, in which a tree structure is assigned to the string.
3. Semantic interpretation, in which "meaning", often in the form of executable code, is attached to the nodes of the tree and
thus to the entire string itself.
For example, consider the input string "orders := orders + 1;", which might be a legal string in any of a number of
programming languages. Lexical analysis first divides this string of characters into a sequence of six tokens, each of which
corresponds to a basic unit of meaning in the language. The tokens generally contain two parts: an indication of what kind of
thing they are and the actual value of the string that they matched. The six tokens are (with the kind of token shown, followed
by its value in parentheses):
<id> (orders)
:=
<id> (orders)
<op> (+)
<id> (1)
;
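Lexical analysis of this string is exactly the kind of job a finite-state (here, regular-expression based) scanner does. The Python sketch below is illustrative only; its token kinds (<id>, <num>, <assign>, <op>, <semi>) are an assumption and differ slightly from the listing above, which files the constant 1 under <id>.

import re

spec = [("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
        ("num",    r"[0-9]+"),
        ("assign", r":="),
        ("op",     r"[+\-*/]"),
        ("semi",   r";"),
        ("skip",   r"\s+")]
scanner = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in spec))

def tokenize(text):
    # pair each lexeme with the kind of token it is, dropping whitespace
    return [(m.lastgroup, m.group()) for m in scanner.finditer(text) if m.lastgroup != "skip"]

print(tokenize("orders := orders + 1;"))
# [('id', 'orders'), ('assign', ':='), ('id', 'orders'), ('op', '+'), ('num', '1'), ('semi', ';')]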
Assume that we have a grammar for our language that includes the following rules:
<statement> → <assignment statement> ;
<assignment statement> → <id> := <expr>
<expr> → <expr> <op> <expr>
<expr> → <id>
Parsing then assigns the string a tree structure along these lines:
<statement>
├─ <assignment statement>
│   ├─ <id> (orders)
│   ├─ :=
│   └─ <expr>
│       ├─ <expr>
│       │   └─ <id> (orders)
│       ├─ <op> (+)
│       └─ <expr>
│           └─ <id> (1)
└─ ;
Finally, we need to assign a meaning to this string. If we attach appropriate code to each node of this tree, then we can
execute this statement by doing a postorder traversal of the tree. We start at the top node, <statement>, and traverse its left
branch, which takes us to <assignment statement>. We go down its left branch, and, in this case, we find the address of the
variable orders. We come back up to <assignment statement>, and then go down its middle branch, which doesn't tell us
anything that we didn't already know from the fact that we're in an assignment statement. But we still need to go down the
right branch to compute the value that is to be stored. To do that, we start at <expr>. To get its value, we must examine its
subtrees. So we traverse its left branch to get the current value of orders. We then traverse the middle branch to find out
what operation to perform, and then the right branch to get 1. We hand those three things back up to <expr>, which applies
the + operator and computes a new value, which we then pass back up to <assignment statement> and then to <statement>.
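Here is what that postorder traversal might look like in code. The tree and environment representations below are simplified assumptions; the point is only that each node's value is computed from its children's values and handed back up.

# a simplified parse tree for "orders := orders + 1;"
tree = ("assignment",
        ("id", "orders"),
        ("expr", ("id", "orders"), ("op", "+"), ("num", 1)))

env = {"orders": 41}                                # current values of the variables

def evaluate(node):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "id":
        return env[node[1]]                         # look up the variable's current value
    if kind == "expr":
        left, op, right = node[1], node[2], node[3]
        l, r = evaluate(left), evaluate(right)      # postorder: children before the node itself
        return l + r if op[1] == "+" else None
    if kind == "assignment":
        env[node[1][1]] = evaluate(node[2])         # store the computed value under the name
        return env[node[1][1]]

print(evaluate(tree), env)                          # 42 {'orders': 42}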
Lexical analysis is a straightforward process that is generally done using a finite state machine. Semantic interpretation can
be arbitrarily complex, depending on the language, as well as other factors, such as the degree of optimization that is desired.
Parsing, though, is in the middle. It's not completely straightforward, particularly if we are concerned with efficiency. But it
doesn't need to be completely tailored to the individual application. There are some general techniques that can be applied to
a wide variety of context-free languages. It is those techniques that we will discuss briefly here.
6.1 Parsing as Search
Recall that a parse tree for a string in a context-free language describes the set of grammar rules that were applied in the
derivation of the string (and thus the syntactic structure of the string). So to parse a string we have to find that set of rules.
How shall we do it? There are two main approaches:
1. Top down, in which we start with the start symbol of the grammar and work forward, applying grammar rules and
keeping track of what we're doing, until we succeed in deriving the string we're interested in.
2. Bottom up, in which we start with the string we're interested in. In this approach, we apply grammar rules "backwards".
So we look for a rule whose right hand side matches a piece of our string. We "apply" it and build a small subtree that
will eventually be at the bottom of the parse tree. For example, given the assignment statement we looked at above, we
might start by building the tree whose root is <expr> and whose (only) leaf is <id> orders. That gives us a new "string"
to work with, which in this case would be orders := <expr> <op> <id>(1). Now we look for a grammar rule that matches
part of this "string" and apply it. We continue until we apply a rule whose left hand side is the start symbol. At that
point, we've got a complete tree.
Whichever of these approaches we choose, we'd like to be as efficient as possible. Unfortunately, in many cases, we're
forced to conduct a search, since at any given point it may not be possible to decide which rule to apply. There are two
reasons why this might happen:
Our grammar may be ambiguous and there may actually be more than one legal parse tree for our string. We will
generally try to design languages, and grammars for them, so that this doesn't happen. If a string has more than one parse
tree, it is likely to have more than one meaning, and we rarely want to use languages where users can't predict the
meaning of what they write.
There may be only a single parse tree but it may not be possible to know, without trying various alternatives and seeing
which ones work, what that tree should be. This is the problem we'll try to solve with the introduction of various specific
parsing techniques.
6.2
To get a better feeling for why a straightforward parsing algorithm may require search, let's consider again the following
grammar for arithmetic expressions:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
Let's try to do a top-down parse, using this grammar, of the string id + id * id. We will begin with a tree whose only node is
E, the start symbol of the grammar. At each step, we will attempt to expand the leftmost leaf nonterminal in the tree.
Whenever we rewrite a nonterminal as a terminal (for example, when we rewrite F as id), we'll climb back up the tree and
down another branch, each time expanding the leftmost leaf nonterminal. We could do it some other way. For example, we
could always expand the rightmost nonterminal. But since we generally read the input string left to right, it makes sense to
process the parse tree left to right also.
No sooner do we get started on our example parse than we're faced with a choice. Should we expand E by applying rule (1) or
rule (2)? If we choose rule (1), what we're doing is choosing the interpretation in which + is done after * (since + will be at
the top of the tree). If we choose rule (2), we're choosing the interpretation in which * is done after + (since * will be nearest
the top of the tree, which we'll detect at the next step when we have to find a way to rewrite T). We know (because we've
done this before and because we know that we carefully crafted this grammar to force * to have higher precedence than +) that
if we choose rule (2), we'll hit a dead end and have to back up, since there will be no way to deal with + inside T.
Let's just assume for the moment that our parser also knows the right thing to do. It then produces
E
E
Since E is again the leftmost leaf nonterminal, we must again choose how to expand it. This time, the right thing to do is to
choose rule (2), which will rewrite E as T. After that, the next thing to do is to decide how to rewrite T. The right thing to do
is to choose rule (4) and rewrite T as F. Then the next thing to do is to apply rule (6) and rewrite F as id. At this point, we've
generated a terminal symbol. So we read an input symbol and compare it to the one we've generated. In this case, it matches,
so we can continue. If it didn't match, we'd know we'd hit a dead end and we'd have to back up and try another way of
expanding one of the nodes higher up in the tree. But since we found a match, we can continue. At this point, the tree looks
like
[parse tree so far: E expanded by rule (1) into E + T; the leftmost E expanded, by rules (2), (4), and (6), into T, then F, then id]
Since we matched a terminal symbol (id), the next thing to do is to back up until we find a branch that we haven't yet
explored. We back all the way up to the top E, then down its center branch to +. Since this is a terminal symbol, we read the
next input symbol and check for a match. We've got one, so we continue by backing up again to E and taking the third
branch, down to T. Now we face another choice. Should we apply rule (3) or rule (4)? Again, being smart, we'll choose to
apply rule (3), producing
[parse tree so far: E expanded into E + T; the left E derives id; the T on the right expanded by rule (3) into T * F]
The rest of the parse is now easy. We'll expand T to F and then to id, and match the second id. After matching the *, we'll expand F to id and match the last id.
But how can we make our parser know what we knew?
In this case, one simple heuristic we might try is to consider the rules in the order in which they appear in the grammar. That
will work for this example. But suppose the input had been id * id * id. Now we need to choose rule (2) initially. And we're
now in big trouble if we always try rule (1) first. Why? Because we'll never realize we're on the wrong path and back up and
try rule (2). If we choose rule (1), then we will produce the partial parse tree
[partial parse tree: E expanded by rule (1) into E + T]
But now we again have an E to deal with. If we choose rule (1) again, we have
[partial parse tree: the new leftmost E again expanded by rule (1) into E + T]
And then we have another E, and so forth. The problem is that rule (1) contains left recursion. In other words, a symbol, in
this case E, is rewritten as a sequence whose first symbol is identical to the symbol that is being rewritten.
We can solve this problem by rewriting our grammar to get rid of left recursion. There's an algorithm to do this that always
works. We do the following for each nonterminal A that has any left recursive rules. We look at all the rules that have A on
their left hand side, and we divide them into two groups, the left recursive ones and the other ones. Then we replace each rule
with another related rule as shown in the following table:
Original rules (the left recursive ones)        New rules
A → Aα1                                         A' → α1A'
A → Aα2                                         A' → α2A'
A → Aα3                                         A' → α3A'
  ...                                             ...
A → Aαn                                         A' → αnA'
                                                A' → ε

Original rules (the other ones)                 New rules
A → β1                                          A → β1A'
A → β2                                          A → β2A'
  ...                                             ...
A → βn                                          A → βnA'
The basic idea is that, using the original grammar, in any successful parse, A may be expanded some arbitrary number of
times using the left recursive rules, but if we're going to get rid of A (which we must do to derive a terminal string), then we
must eventually apply one of the nonrecursive rules. So, using the original grammar, we might have something like
[parse tree under the original grammar: A expands to A α2, that A expands to A α3, and the innermost A expands to β1, so the yield is β1 α3 α2]
Notice that, whatever β1, α3, and α2 are, β1, which came from one of the nonrecursive rules, comes first. Now look at the new
set of rules in the right hand column above. They say that A must be rewritten as a string that starts with the right hand side of
one of the nonrecursive rules (i.e., some βi). But, if any of the recursive rules had been applied first, then there would be
further substrings, after the βi, derived from those recursive rules. We introduce the new nonterminal A' to describe what
those things could look like, and we write rules, based on the original recursive rules, that tell us how to rewrite A'. Using
this new grammar, we'd get a parse tree for β1 α3 α2 that would look like
[parse tree under the new grammar: A expands to β1 A', that A' expands to α3 A', that A' expands to α2 A', and the last A' expands to ε, so the yield is again β1 α3 α2]
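Since this transformation is purely mechanical, it is easy to implement. The following is a minimal Python sketch (ours, not part of the original notes): a grammar is represented as a dictionary that maps each nonterminal to a list of right hand sides, where each right hand side is a list of symbols and the empty list stands for ε.

def remove_left_recursion(grammar, nonterminal):
    """Remove immediate left recursion from the rules for one nonterminal.

    Replaces  A -> A alpha | beta  with  A -> beta A'  and  A' -> alpha A' | epsilon."""
    rules = grammar[nonterminal]
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == nonterminal]      # the alphas
    nonrecursive = [rhs for rhs in rules if not rhs or rhs[0] != nonterminal]    # the betas
    if not recursive:
        return grammar                       # nothing to do
    new_nt = nonterminal + "'"
    new_grammar = dict(grammar)
    new_grammar[nonterminal] = [beta + [new_nt] for beta in nonrecursive]
    new_grammar[new_nt] = [alpha + [new_nt] for alpha in recursive] + [[]]       # [] stands for epsilon
    return new_grammar

# Example: E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
expr = {"E": [["E", "+", "T"], ["T"]]}
print(remove_left_recursion(expr, "E"))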
If we apply this transformation algorithm to our grammar for arithmetic expressions, we get
(1)  E' → + T E'
(1') E' → ε
(2)  E → T E'
(3)  T' → * F T'
(3') T' → ε
(4)  T → F T'
(5)  F → (E)
(6)  F → id
Now let's return to the problem of parsing id + id * id. This time, there is only a single way to expand the start symbol, E, so
we produce, using rule (2),
[parse tree: E expanded by rule (2) into T E']
Now we need to expand T, and again, there is only a single choice. If you continue with this example, you'll see that if you
have the ability to peek one character ahead in the input (we'll call this character the lookahead character), then it's possible
to know, at each step in the parsing process, which rule to apply.
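To see concretely why one symbol of lookahead is enough, here is a small recursive descent recognizer for the transformed grammar, written as a Python sketch (the function names and the token representation are ours). There is one function per nonterminal, and each one looks only at the lookahead token to decide which rule to apply, so no backtracking is ever needed.

def parse(tokens):
    # Recognizer for:  E -> T E'    E' -> + T E' | epsilon
    #                  T -> F T'    T' -> * F T' | epsilon
    #                  F -> ( E ) | id
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, saw {peek()!r}")
        pos += 1

    def E():          # E -> T E'
        T(); E_prime()

    def E_prime():    # E' -> + T E' | epsilon
        if peek() == "+":
            match("+"); T(); E_prime()

    def T():          # T -> F T'
        F(); T_prime()

    def T_prime():    # T' -> * F T' | epsilon
        if peek() == "*":
            match("*"); F(); T_prime()

    def F():          # F -> ( E ) | id
        if peek() == "(":
            match("("); E(); match(")")
        else:
            match("id")

    E()
    match("$")        # all input must be consumed
    return True

print(parse(["id", "+", "id", "*", "id"]))   # True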
You'll notice that this parse tree assigns a quite different structure to the original string. This could be a serious problem
when we get ready to assign meaning to the string. In particular, if we get rid of left recursion in our grammar for arithmetic
expressions, we'll get parse trees that associate right instead of left. For example, we'll interpret a + b + c as
(a + b) + c using the original grammar, but
a + (b + c) using the new grammar.
For this and various other reasons, it often makes sense to change our approach and parse bottom up, rather than top down.
6.3
Bottom Up Parsing
[parse tree fragment: the first id has been shifted and then reduced, by rules (6), (4), and (2), to F, then T, then E]
At this point, there are no further reductions to consider, since there are no rules whose right hand side is just E. So we must
consume the next input symbol + and push it onto the stack. Now, again, there are no available reductions. So we read the
next symbol, and the stack then contains id + E (we'll write the stack so that we push onto the left). Again, we need to
promote id before we can do anything else, so we promote it to F and then to T. Now we've got:
[two parse tree fragments: the first id reduced up to E, and the second id reduced up to T; the stack contains T + E]
Notice that we've now got two parse tree fragments. Since we're working up from the bottom, we don't know yet how they'll
get put together. The next thing we have to do is to choose between reducing the top three symbols on the stack (T + E) to E
using rule (1) or shifting on the next input symbol. By the way, don't be confused about the order of the symbols here. We'll
always be matching the right hand sides of the rules reversed because the last symbol we read (and thus the rightmost one
we'll match) is at the top of the stack.
Okay, so what should we choose to do, reduce or shift? This is the first choice we've had to make where there isn't one
correct answer for all input strings. When there was just one universally correct answer, we could compute it simply by
examining the grammar. Now we can't do that. In the example we're working with, we don't want to do the reduction, since
the next operator is *. We want the parse tree to correspond to the interpretation in which * is applied before +. That means
that + must be at the top of the tree. If we reduce now, it will be at the bottom. So we need to shift * on and do a reduction
that will build the multiplication piece of the parse tree before we do a reduction involving +. But if the input string had been
id + id + id, we'd want to reduce now in order to cause the first + to be done first, thus producing left associativity. So we
appear to have reached a point where we'll have to branch. Since our grammar won't let us create the interpretation in which
we do the + first, if we choose that path first, we'll eventually hit a dead end and have to back up. We'd like not to waste time
exploring dead end paths, however. We'll come back to how we can make a parser smart enough to do that later. For now,
let's just forge ahead and do the right thing.
As we said, what we want to do here is not to reduce but instead to shift * onto the stack. So the stack now contains * T + E.
At this point, there are no available reductions (since there are no rules whose right hand side contains * as the last symbol),
so we shift the next symbol, resulting in the stack id * T + E. Clearly we have to promote id to F (following the same
argument that we used above), so we've got
[parse tree fragments: the first id reduced up to E, the second id reduced up to T, and the third id reduced to F; the stack contains F * T + E]
Next, we need to reduce (since there aren't any more input symbols to shift), but now we have another decision to make:
should we reduce the top F to T, or should we reduce the top three symbols, using rule (3), to T? The right answer is to use
rule (3), producing:
[parse tree fragments: the first id reduced up to E, and T * F reduced by rule (3) to a single T covering the second and third ids; the stack contains T + E]
Finally, we need to apply rule (1), to produce the single symbol E on the top of the stack, and the parse tree:
[final parse tree: E at the root, expanded into E + T; the left E derives the first id, and the T derives T * F, covering id * id]
In a bottom up parse, we're done when we consume all the input and produce a stack that contains a single symbol, the start
symbol. So we're done (although see the class notes for an extension of this technique in which we add to the input an end-of-input symbol $ and consume it as well).
Now let's return to the question of how we can build a parser that makes the right choices at each step of the parsing process.
As we did the example parse above, there were two kinds of decisions that we had to make:
Whether to shift or reduce (we'll call these shift/reduce conflicts), and
Which of several available reductions to perform (we'll call these reduce/reduce conflicts).
Let's focus first on shift/reduce conflicts. At least in this example, it was always possible to make the right decision on these
conflicts if we had two kinds of information:
A good understanding of what is going on in the grammar. For example, we noted that there's nothing to be done with a
raw id that hasn't been promoted to an F.
A peek at the next input symbol (the one that we're considering shifting), which we call the lookahead symbol. For
example, when we were trying to decide whether to reduce T + E or shift on the next symbol, we looked ahead and saw
that the next symbol was *. Since we know that * has higher precedence than +, we knew not to reduce +, but rather to
wait and deal with * first.
So we as people can be smart and do the right thing. The important question is, can we build a parser that is smart and does
the right thing? The answer is yes. For simple grammars, like the one we're using, it's fairly straightforward to do so. For
more complex grammars, the algorithms that are needed to produce a correct deterministic parser are way beyond the scope of
this class. In fact, they're not something most people ever want to deal with. And that's okay because there are powerful
tools for building parsers. The input to the tools is a grammar. The tool then applies a variety of algorithms to produce a
parser that does the right thing. One of the most widely used such tools is yacc, which we'll discuss further in class. See the
yacc documentation for some more information about how it works.
Although we don't have time to look at all the techniques that systems like yacc use to build deterministic bottom up parsers,
we will look at one of the structures that they can build. A precedence table tells us whether to shift or reduce. It uses just
two sources of information, the current top of stack symbol and the lookahead symbol. We won't describe how this table is
constructed, but let's look at an example of one and see how it works. For our expression grammar, we can build the
following precedence table (where $ is a special symbol concatenated to the end of each input string that signals the end of the
input):
[precedence table omitted: each row is labeled by a possible top-of-stack symbol (including E, T, F, and id) and each column by a lookahead symbol ( ( , ) , id, +, *, $ ); a dot in a cell means reduce, an empty cell means shift]
Here's how to read the table. Compare the leftmost column to the top of the stack and find the row that matches. Now
compare the symbols along the top of the chart to the lookahead symbol and find the column that matches. If there's a dot in
the corresponding square of the table, then reduce. Otherwise, shift. So let's go back to our example input string id + id * id.
Remember that we had a shift/reduce conflict when the stack's contents were T + E and the next input symbol was *. So we
look at the next to the last row of the table, the one that has T as the top of stack symbol. Then we look at the column headed
*. There's no dot, so we don't reduce. But notice that if the lookahead symbol had been +, we'd have found a dot, telling us
to reduce, which is exactly what we'd want to do. Thus this table captures the precedence relationships between the operators
* and +, plus the fact that we want to associate left when faced with operators of equal precedence.
Deterministic, bottom up parsers of the sort that yacc builds are driven by an even more powerful table called a parse table.
Think of the parse table as an extension of the precedence table that contains additional information that has been extracted
from an examination of the grammar.
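To make the shift/reduce machinery concrete, here is a small Python sketch of a bottom up parser for this grammar. The handful of if-tests that decide when to shift and which reduction to apply are hard-coded for this particular grammar; they stand in for the precedence table (or the full parse table that a tool like yacc builds), while the stack discipline of shifting and reducing is the general mechanism. The representation of tokens and rules is ours.

RULES = {                      # rule number -> (left-hand side, right-hand side)
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}

def reduce_by(stack, rule):
    """Pop the right-hand side of the rule off the stack and push its left-hand side."""
    lhs, rhs = RULES[rule]
    if stack[-len(rhs):] != rhs:
        raise SyntaxError(f"cannot reduce by rule {rule}")
    del stack[-len(rhs):]
    stack.append(lhs)

def parse(tokens):
    tokens = tokens + ["$"]            # $ marks the end of the input
    stack, i = [], 0
    while True:
        top = stack[-1] if stack else None
        below = stack[-2] if len(stack) > 1 else None
        third = stack[-3] if len(stack) > 2 else None
        look = tokens[i]
        if top == "id":                                   # a raw id is always promoted (rule 6)
            reduce_by(stack, 6)
        elif top == ")":                                  # finish a parenthesized expression (rule 5)
            reduce_by(stack, 5)
        elif top == "F" and below == "*":                 # finish a multiplication (rule 3)
            reduce_by(stack, 3)
        elif top == "F":                                  # rule 4
            reduce_by(stack, 4)
        elif top == "T" and below == "+" and third == "E" and look != "*":
            reduce_by(stack, 1)                           # finish an addition (rule 1) unless * is waiting
        elif top == "T" and below in (None, "(") and look != "*":
            reduce_by(stack, 2)                           # rule 2
        elif top == "E" and below is None and look == "$":
            return True                                   # accept: input consumed, stack holds only E
        elif look != "$":
            stack.append(look)                            # shift the lookahead symbol
            i += 1
        else:
            raise SyntaxError("no shift or reduce applies")

print(parse(["id", "+", "id", "*", "id"]))                # True

Note that for an input like id + id + id this driver reduces the first + before shifting the second, giving the left association discussed above, while for id + id * id it delays the + reduction until the multiplication has been built.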
Intersection with regular languages: The CFL's are, however, closed under intersection with the regular languages. Given a
CFL L and a regular language R, the intersection L ∩ R is a CFL. Proof: Since L is context-free, there is some nondeterministic PDA accepting it, and since R is regular, there is some deterministic finite state automaton that accepts it. The
two automata can now be merged into a single PDA by a straightforward technique described in class.
Complementation: The CFL's are not closed under complement. Proof: Since the CFL's are closed under union, if they
were also closed under complement, this would imply that they are closed under intersection. This is so because of the set-theoretic equality L1 ∩ L2 = ¬(¬L1 ∪ ¬L2).
[example parse tree omitted: a tree with root A, of height 2 and with a yield of length 5]
3. The height of a parse tree is the length of the longest path in it. For example, the parse tree above is of height 2.
4. The width of a parse tree is the length of its yield (the string consisting of its leaves). For example, the parse tree above is
of width 5.
We observe that in order for a parse tree to achieve a certain width it must attain a certain minimum height. How are height
and width connected? The relationship depends on the rules of the grammar generating the parse tree.
Suppose, for example, that a certain CFG contains the rule A → AAAA. Focusing just on derivations involving this rule, we
see that a tree of height 1 would have a width of 4. A tree of height 2 would have a maximum width of 16 (although there are
narrower trees of height 2 of course).
[parse tree: A → AAAA applied at the root and again at the level below, giving a tree of height 2]
With height 3, the maximum width is 64 (i.e., 4^3), and in general a tree of height n has maximum width of 4^n. Or putting it
another way, if a tree is wider than 4^n then it must be of height greater than n.
Where does the 4 come from? Obviously from the length of the right-hand side of the rule A → AAAA. If we had started
with the rule A → AAAAAA, we would find that a tree of height n has maximum width 6^n.
What about other rules in the grammar? If it contained both the rules A → AAAA and A → AAAAAA, for example, then the
maximum width would be determined by the longer right-hand side. And if there were no other rules whose right-hand sides
were longer than 6, then we could confidently say that any parse tree of height n could be no wider than 6^n.
Let p = the maximum length of the right-hand side of all the rules in G. Then any parse tree generated by G of height m can
be no wider than p^m. Equivalently, any parse tree that is generated by G and that is wider than p^m must have height greater
than m.
Now suppose we have a CFG G = (V, Σ, R, S). Let n = |V - Σ|, the size of the non-terminal alphabet. If G generates a parse
tree of width greater than p^n, then, by the above reasoning, the tree must be of height greater than n, i.e., it contains a path of
length greater than n. Thus there are more than n + 1 nodes on this path (the number of nodes being one greater than the
length of the path), and all of them are non-terminal symbols except possibly the last. Since there are only n distinct nonterminal symbols in G, some such symbol must occur more than once along this path (by the Pigeonhole Principle). What this
says is that if a CFG generates a long enough string, its parse tree is going to be sufficiently high that it is guaranteed to have a
path with some repeated non-terminal symbol along it. Let us represent this situation by the following diagram:
[diagram: a parse tree with root S and yield w, in which some nonterminal A occurs at least twice along one path from the root]
Call the generated string w. The parse tree has S as its root, and let A be a non-terminal symbol (it could be S itself, of
course) that occurs at least twice on some path (indicated by the dotted lines).
Another observation about parse trees: If the leaves are all terminal symbols, then every non-terminal symbol in the tree is the
root of a subtree having terminal symbols as its leaves. Thus, the lower instance of A in the tree above must be the root of a
tree with some substring of w as its leaves. Call this substring x. The upper instance of A likewise roots a tree with a string of
terminal symbols as its leaves, and furthermore, from the geometry of the tree, we see that this string must include x as a
substring. Call the larger string, therefore, vxy. This string vxy is also a substring of the generated string w, which is to say
that for some strings u and z, w = uvxyz. Attaching these names to the appropriate substrings we have the following diagram:
[diagram: the same parse tree with the yield divided as w = u v x y z; the upper A derives vxy and the lower A derives x]
Now, assuming that such a tree is generated by G (which will be true on the assumption that G generates some sufficiently
long string), we can conclude that G must generate some other parse trees as well and therefore their associated terminal
strings. For example, the following tree must also be generated:
[diagram: the tree obtained by rewriting the upper A the way the lower A was rewritten, yielding the string w = u x z]
This tree must be generated because the sequence of rules that rewrote the lower A to produce x could have been applied when the upper A was being rewritten.
Similarly, the sequence of rules that expanded the upper A originally to yield the string vAy could have been applied to the
lower A as well, and if the resulting third A were now rewritten to produce x, we would have:
[diagram: the tree in which the upper A is expanded to v A y twice before the innermost A is rewritten to x, yielding w = u v v x y y z]
Clearly this process could be repeated any number of times to give an infinite number of strings of the form u v^n x y^n z, for all values of n ≥ 0.
We need one further observation before we are ready to state the Pumping Lemma. Consider again any string w that is
sufficiently long that its derivation contains at least one repeating nonterminal (A in our example above). Of course, there
may be any number of occurrences of A, but let's consider the bottom two. Consider the subtree whose root is the second A
up from the bottom (shown in bold):
[diagram: the same tree with the subtree rooted at the second A up from the bottom shown in bold; its yield is vxy]
Notice that the leaves of this subtree correspond to the sequence vxy. How long can this sequence be? The answer relies on
the fact that this subtree contains exactly one repeated nonterminal (since we chose it that way). So the maximum height of
this subtree is n+1, which means that its yield, vxy, can be no longer than p^(n+1). (Recall that p is the length of the longest right-hand side of any rule in the
grammar and n is the number of nonterminals in the grammar.) Why n+1? Because we have n+1 nonterminals available (all n of them plus the one repeated one). So we know
that |vxy| must be ≤ M, where M is some constant that depends on the grammar and that is in fact p^(n+1). We are now ready to
state the Pumping Lemma for context-free languages.
Pumping Lemma for Context-Free Languages: Let G be a context-free grammar. Then there are some constants K and M
depending on G such that, for every string w ∈ L(G) where |w| > K, there are strings u, v, x, y, z such that
(1) w = uvxyz,
(2) |vy| > 0,
(3) |vxy| ≤ M, and
(4) for all n ≥ 0, u v^n x y^n z ∈ L(G).
Remarks: The constant K in the lemma is just the p^n referred to above: the length of the longest right-hand side of a rule of G
raised to the power of the number of non-terminal symbols. In applying the lemma we won't care what the value of K actually
is, only that some such constant exists. If G generates an infinite language, then clearly there will be strings in L(G) longer
than K, no matter what K is. If L(G) is finite, on the other hand, then the lemma still holds (trivially), because K will have a
value greater than the longest strings in L(G). So all strings in L(G) longer than K are guaranteed to be "pumpable," but no
such strings exist, so the lemma is trivially true because the antecedent of the conditional is false. Similarly for M, which is
actually bigger than K; it is p^(n+1). But, again, all we care about is that if L(G) is infinite then M exists. Without knowing what
it is, we can describe strings in terms of it and know that we must have pumpable strings.
This lemma, like the pumping lemma for regular languages, addresses the question of how strings grow longer and longer
without limit so as to produce infinite languages. In the case of regular languages, we saw that strings grow by repeating
some substring any number of times: x y^n z ∈ L for all n ≥ 0. When does this happen? Any string in the language of sufficient
length is guaranteed to contain such a "pumpable" substring. What length is sufficient? The number of states in the
minimum-state deterministic finite state machine that accepts the language. This sets the lower bound for guaranteed
pumpability.
For context-free languages, strings grow by repeating two substrings simultaneously: u v^n x y^n z ∈ L for all n ≥ 0. This, too,
happens when a string in the language is of sufficient length. What is sufficient? Long enough to guarantee that its parse tree
contains a repeated non-terminal along some path. Strings this long exceed the lower bound for guaranteed context-free
pumpability.
What about the condition that |vy| > 0, i.e., v and y cannot both be the empty string? This could happen if by the rules of G
we could get from some non-terminal A back to A again without producing any terminal symbols in the process, and that's
possible with rules like A → B, B → C, C → A, all perfectly good context-free rules. But given that we have a string w
whose length is greater than or equal to K, its derivation must have included some rules that make the string grow longer;
otherwise w couldn't have gotten as long as it did. Therefore, there must be some path in the derivation tree with a repeated
non-terminal that involves branching rules, and along this path, at least one of v or y is non-empty.
Recall that the corresponding condition for regular languages was y ≠ ε. We justified this by pointing out that if a sufficiently
long string w was accepted by the finite state machine, then there had to be a loop in the machine and that loop must read
something besides the empty string; otherwise w couldn't be as long as it is and still be accepted.
And, finally, what about condition (3), |vxy| ≤ M? How does this compare to the finite state pumping lemma? The
corresponding condition there was that |xy| ≤ K. Since |y| ≤ |xy|, this certainly tells us that the pumpable substring y is
(relatively) short. |xy| ≤ K also tells us that y occurs close to the beginning of the string w = xyz. The context-free version, on
the other hand, tells us that |vxy| ≤ M, where v and y are the pumpable substrings. Since |v| ≤ |vxy| ≤ M and |y| ≤ |vxy| ≤ M, we
know that the pumpable substrings v and y are short. Furthermore, from |vxy| ≤ M, we know that v and y must occur close to
each other (or at least not arbitrarily far away from each other). Unlike in the regular pumping lemma, though, they do not
necessarily occur close to the beginning of the string w = uvxyz. This is the reason that context-free pumping lemma proofs
tend to have more cases: the v and y pumpable substrings can occur anywhere within the string w.
Note that this Pumping Lemma, like the one for regular languages, is an if-then statement not an iff statement. Therefore, it
cannot be used to show that a language is context-free, only that it is not.
Example 1: Show that L = {a^n b^n c^n : n ≥ 0} is not context-free.
If L were context-free (i.e., if there were a context-free grammar generating L), then the Pumping Lemma would apply. Then
there would be a constant K such that every string in L of length greater than K would be "pumpable." We show that this is
not so by exhibiting a string w in L that is of length greater than K and that is not pumpable. Since we want to rely on clause
3 of the pumping lemma, and it relies on M > K, we'll actually choose w in terms of M.
Let w = a^M b^M c^M. (Note that this is a particular string, not a language or a variable expression for a string. M is some
number whose exact value we don't happen to know; it might be 23, for example. If so, w would be the unique string
a^23 b^23 c^23.) This string is of length greater than K (of length 3M, where M is greater than K, in fact), and it is a string in the
language {a^n b^n c^n : n ≥ 0}. Therefore, it satisfies the criteria for a pumpable string according to the Pumping Lemma, provided, of course, that L is context-free.
What does it mean to say that a^M b^M c^M is pumpable? It means that there is some way to factor this string into five parts, u, v, x, y, z, meeting the following conditions:
1. v and y are not both the empty string (although any of u, x, or z could be empty),
2. |vxy| ≤ M,
3. uxz ∈ L, uvxyz ∈ L, uvvxyyz ∈ L, uvvvxyyyz ∈ L, etc.; i.e., for all n ≥ 0, u v^n x y^n z ∈ L.
We now show that there is no way to factor a^M b^M c^M into 5 parts meeting these conditions; thus, a^M b^M c^M is not a pumpable
string, contrary to the stipulation of the Pumping Lemma, and thus L is not a context-free language.
How do we do this? We show that no matter how we try to divide a^M b^M c^M in ways that meet the first two conditions, the third
condition always fails. In other words, every "legal" division of a^M b^M c^M fails to be pumpable; that is, there is some value of n
for which u v^n x y^n z ∉ L.
There are clearly a lot of ways to divide this string into 5 parts, but we can simplify the task by grouping the divisions into
cases just as we did with the regular language Pumping Lemma:
Case 1: Either v or y consists of more than one distinct letter (e.g., aab). No such division is pumpable, since for any n ≥ 2,
u v^n x y^n z will contain some letters not in the correct order to be in L. Now that we've eliminated this possibility, all the
remaining cases can assume that both v and y contain only a's, only b's, or only c's (although one of them could also be ε).
Case 2: Both v and y are located within a^M. No such division is pumpable, since we will pump in only a's. So, for n ≥ 2,
u v^n x y^n z will contain more a's than b's or c's and therefore won't be in L. (Note that n = 0 also works.)
Cases 3, 4: Both v and y are located within b^M or c^M. No such division is pumpable, by the same logic as in Case 2.
Case 5: v is located within a^M and y is located within b^M. No such division is pumpable, since for n ≥ 2, u v^n x y^n z will contain
more a's than c's or more b's than c's (or both) and therefore won't be in L. (n = 0 also works here.)
Cases 6, 7: v is located within a^M and y is located within c^M, or v is located within b^M and y is located within c^M. No such
division is pumpable, by the same logic as in Case 5.
Since every way of dividing a^M b^M c^M into 5 parts (such that the 2nd and 4th are not both empty) is covered by at least one of
the above 7 cases, and in each case we find that the resulting division is not pumpable, we conclude that there is no division of
a^M b^M c^M that is pumpable. Since all this was predicated on the assumption that L was a context-free language, we conclude
that L is not context-free after all.
Notice that we didn't need to use condition (3), the fact that |vxy| must be no greater than M, in this proof, although we could have used it
as an alternative way to handle case 6, since it prevents v and y from being separated by a region of size M, which is exactly
the size of the region of b's that occurs between the a's and the c's.
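Because the case analysis is finite once a value for M is fixed, it can be checked exhaustively by machine. The following Python sketch (ours, with M deliberately chosen small) enumerates every division of a^M b^M c^M into u, v, x, y, z with |vy| > 0 and |vxy| ≤ M and verifies that each one fails to pump, exactly as the seven cases argue.

def in_L(s):
    """Membership test for L = {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

M = 4                                     # a small stand-in for the (unknown) constant M
w = "a" * M + "b" * M + "c" * M

def divisions(w):
    """All (u, v, x, y, z) with w = uvxyz, |vy| > 0, and |vxy| <= M."""
    for i in range(len(w) + 1):
        for j in range(i, len(w) + 1):
            for k in range(j, len(w) + 1):
                for l in range(k, len(w) + 1):
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:l], w[l:]
                    if len(v) + len(y) > 0 and len(v) + len(x) + len(y) <= M:
                        yield u, v, x, y, z

def pumpable(u, v, x, y, z, max_n=3):
    """True if u v^n x y^n z stays in L for every n from 0 to max_n."""
    return all(in_L(u + v * n + x + y * n + z) for n in range(max_n + 1))

assert not any(pumpable(*d) for d in divisions(w))
print("no legal division of", w, "is pumpable")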
Example 2: Show that L = {w ∈ {a, b, c}* : #(a, w) = #(b, w) = #(c, w)} is not context free. (We use the notation #(a, w) to
mean the number of a's in the string w.)
Let's first try to use the pumping lemma. We could again choose w = a^M b^M c^M. But now we can't immediately brush off case 1
as we did in Example 1, since L allows for strings that have the a's, b's, and c's interleaved. In fact, this time there are ways
to divide a^M b^M c^M into 5 parts (v, y not both empty), such that the result is pumpable. For example, if v were ab and y were c,
then u v^n x y^n z would be in L for all n ≥ 0, since it would still contain equal numbers of a's, b's, and c's.
So we need some additional help, which we'll find in the closure theorems for context-free languages. Our problem is that the
definition of L is too loose, so it's too easy for the strings that result from pumping to meet the requirements for being in L.
We need to make L more restrictive. Intersection with another language would be one way we could do that. Of course, since
the context-free languages are not closed under intersection, we can't simply consider some new language L' = L ∩ L2, where
L2 is some arbitrary context-free language. Even if we could use pumping to show that L' isn't context free, we'd know
nothing about L. But the context-free languages are closed under intersection with regular languages. So if we construct a
new language L' = L ∩ L2, where L2 is some arbitrary regular language, and then show that L' is not context free, we know
that L isn't either (since, if it were, its intersection with a regular language would also have to be context free). Generally in
problems of this sort, the thing to do is to use intersection with a regular language to constrain L so that all the strings in it
must have identifiable regions of symbols. So what we want to do here is to let L2 = a*b*c*. Then L' = L ∩ L2 = {a^n b^n c^n : n ≥ 0}. If
we hadn't just proved that a^n b^n c^n isn't context free, we could easily do so. In either case, we know immediately that L isn't
context free.
The language H consists of strings that encode a Turing Machine M, followed by the encoding of a string w (which we can think of as the input to M), with the additional constraint
that M halts on input w (which is equivalent to saying that w is in the language accepted by M).
Solving a problem is the higher level notion, which is commonly used in the programming/algorithm context. In our
theoretical framework, we use the term deciding a language because we are talking about Turing Machines, which operate
on strings, and we have a carefully constructed theory that lets us talk about Turing Machines as language recognizers.
In the following section, we'll go through several examples in which we use the technique of problem reduction to show that
some new problem is unsolvable (or, alternatively, that the corresponding language is undecidable). All of these proofs
depend ultimately on our proof, using diagonalization, of the undecidability of the halting problem (H above).
If MLE returns True, then we know that M (the original input to MH) halts on w. If MLE returns False, then we know that it
doesn't. Thus we have built a supposedly unbuildable MH. How did we do it? We claimed when we asserted the existence of
MLE that we could answer what appears to be a more limited question: does M halt on the empty tape? But we can find out
whether M halts on any other specific input (w) by constructing a machine (M*) that starts by writing w on top of whatever
was originally on its tape (thus it ignores its actual input) and then proceeds to do whatever M would have done. Thus M*
behaves the same on all inputs. Thus if we knew what it does on any one input, we'd know what it does for all inputs. So we
ask MLE what it does on the empty string. And that tells us what it does all the time, which must be, by the way we
constructed it, whatever the original machine M does on w.
The only slightly tricky thing here is the procedure for constructing M*. Are we sure that it is in fact computable? Maybe
we've reached the contradiction of claiming to have a machine to solve H not by erroneously claiming to have MLE but rather
by erroneously claiming to have a procedure for constructing M*. But that's not the case. Why not? It's easy to see how to
write a procedure that takes a string w and builds M*. For example, if "w" is "ab", then M* must be:
E R a R b L M, where E is a TM that erases its tape and then moves the read head back to the first square.
In other words, we erase the tape, move back to the left, then move right one square (leaving one blank ahead of the new tape
contents), write a, move right, write b, move left until we get back to the blank that's just to the left of the input, and then
execute M.
The Turing Machine to construct M* is a bit too complicated to write here, but we can see how it works by describing it in a
more standard procedural way: It first writes E. Then, for each character in w, it writes R and then that character. Finally it
writes L M.
To make this whole process even clearer, let's look at this problem not from the point of view of the decidability of the
language H but rather by asking whether we can solve the Halting problem. To do this, let's describe in standard code how
we could solve the Halting problem if we had a subroutine MLE(M: TM) that could tell us whether a particular Turing
Machine halts on the empty string. We'll assume a datatype TM. If you want to, you can think of objects of this type as
essentially strings that correspond to valid Turing Machines. It's like thinking of a type C-program, which is all the strings that
are valid C programs.
We can solve the Halting problem with the following function Halts:
Function Halts(M: TM, w: string): Boolean;
    M* := Construct(M, w);
    Return MLE(M*);
end;
Function Construct(M: TM, w: string): TM;
    /* Construct builds a machine that first erases its tape. Then it copies w onto its tape and moves its */
    /* read head back to the left, ready to begin reading w. Finally, it executes M. */
    Construct := Erase;    /* Erase is a string that corresponds to the TM that erases its input tape. */
    For each character c in w do
        Construct := Construct || "R" || c;
    end;
    Construct := Construct || "L" || M;
    Return(Construct);
end;
Function MLE(M: TM): Boolean;
    /* The function we claim exists and that tells us whether M halts on the empty string. */
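The same reduction can be phrased in ordinary Python, if we (loosely) model a Turing Machine as a Python function that takes a tape string. The sketch below is ours: construct wraps M in a machine that ignores its own input, and mle is the claimed, and in fact impossible, subroutine, so it appears here only as a stub. The point is that everything except mle is clearly computable.

def construct(m, w):
    """Build M*: a machine that ignores its own input, writes w on the tape,
    and then behaves exactly as m would on w."""
    def m_star(_tape):
        return m(w)            # erase whatever was there and run m on w
    return m_star

def halts(m, w):
    """A would-be solution to the halting problem, IF the hypothetical
    oracle mle (does this machine halt on the empty string?) existed."""
    return mle(construct(m, w))

def mle(m):
    """Hypothetical: would return True iff m halts on the empty string.
    No such total, correct procedure can exist; the stub only makes the
    sketch above syntactically complete."""
    raise NotImplementedError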
The most common mistake that people make when they're trying to use reduction to show that a problem isn't solvable (or
that a language isn't decidable) is to do the reduction backwards. In this case, that would mean we would put forward the
following argument: Suppose we had a program Halts to solve the Halting problem (the general problem of determining
whether a TM M halts on some arbitrary input w). Then we could use it as a subroutine to solve the specific problem of
determining whether a TM M halts on the empty string. We'd simply invoke Halts and pass it M and ε. If Halts returns True,
then we say yes. If Halts returns False, we say no. But since we know that Halts can't exist, no solution to our problem can
exist either. The flaw in this argument is the last sentence. Clearly, since Halts can't exist, this particular approach to
solving our problem won't work. But this says nothing about whether there might be some other way to solve our problem.
To see this flaw even more clearly, let's consider applying it in a clearly ridiculous way: Suppose we had a program Halts to
solve the Halting problem. Then we could use it as a subroutine to solve the problem of adding two numbers a and b. We'd
simply invoke Halts and pass it the trivial TM that simply halts immediately and the input ε. If Halts returns True, then we
return a+b. If Halts returns False, we also return a+b. But since we know that Halts can't exist, no solution to our problem
can exist either. Just as before, we have certainly written a procedure for adding two numbers that won't work, since Halts
can't exist. But there are clearly other ways to solve our problem. We don't need Halts. That's totally obvious here. It's less
so in the case of attempting to build MLE. But the logic is the same in both cases: flawed.
Example 2: Given a TM M, is L(M) ≠ ∅? (In other words, does M halt on anything at all?) Let's do this one first using the
solvability of the problem perspective. Then we'll do it from the decidability of the language point of view.
This time, we claim that there exists:
Function MLA(M: TM): Boolean;
    /* Returns True if M halts on any inputs at all and False otherwise. */
We show that if this claim is true and MLA does in fact exist, then we can build a function MLE that solves the problem of
determining whether a TM accepts the empty string. We already know, from our proof in Example 1, that this problem isn't
solvable. So if we can do it using MLA (and some set of clearly computable functions), we know that MLA cannot in fact
exist.
The reduction we need for this example is simple. We claim we have a machine MLA that tells us whether some machine M
accepts anything at all. If we care about some particular input to M (for example, we care about ε), then we will build a new
machine M* that erases whatever was originally on its tape. Then it copies onto its tape the input we care about (i.e., ε) and
runs M. Clearly this new machine M* is oblivious to its actual input. It either always accepts (if M accepts ε) or always
rejects (if M rejects ε). It accepts everything or nothing. So what happens if we pass M* to MLA? If M* always accepts,
then its language is not the empty set and MLA will return True. This will happen precisely in case M halts on ε. If M*
always rejects, then its language is the empty set and MLA will return False. This will happen precisely in case M doesn't
halt on ε. Thus, if MLA really does exist, we have a way to find out whether any machine M halts on ε:
Function MLE(M: TM): Boolean;
    M* := Construct(M);
    Return MLA(M*);
end;
Function Construct(M: TM): TM;
    /* This time, we build an M* that simply erases its input and then runs M (thus running M on ε). */
    Construct := Erase;    /* Erase is a string that corresponds to the TM that erases its input tape. */
    Construct := Construct || M;
    Return(Construct);
end;
But we know that MLE can't exist. Since everything in its code, with the exception of MLA, is trivially computable, the only
way it can't exist is if MLA doesn't actually exist. Thus we've shown that the problem of determining whether or not a TM M
halts on any inputs at all isn't solvable.
Notice that this argument only works because everything else that is done, both in MLE and in Construct, is clearly
computable. We could write it all out in the Turing Machine notation, but we don't need to, since it's possible to prove that
anything that can be done in any standard programming language can also be done with a Turing Machine. So the fact that
we can write code for it is good enough.
Whenever we want to try to use this approach to decide whether or not some new problem is solvable, we can choose to
reduce to the new problem any problem that we already know to be unsolvable. Initially, the only problem we knew wasn't
solvable was the Halting problem, which we showed to be unsolvable using diagonalization. But once we have used
reduction to show that other problems aren't solvable either, we can use any of them for our next problem. The choice is up
to you. Whatever is easiest is the thing to use.
When we choose to use the problem solvability perspective, there is always a risk that we may make a mistake because we
haven't been completely rigorous in our definition of the mechanisms that we can use to solve problems. One big reason for
even defining the Turing Machine formalism is that it is both simple and rigorously defined. Thus, although the problem
solvability perspective may seem more natural to us, the language decidability perspective gives us a better way to construct
rigorous proofs.
So let's show that the language
LA = {"M": L(M) ≠ ∅} is undecidable.
We will show that if LA were decidable, then LE = {"M": ε ∈ L(M)} would also be decidable. But of course, we know that it
isn't.
Suppose LA is decidable; then some TM MLA decides it. We can now show how to construct a new Turing Machine MLE,
which will invoke MLA as a subroutine, and which will decide LE:
MLE(M): /* A decision procedure for LE = {"M": ε ∈ L(M)} */
1. Construct a new TM M*, which behaves as follows:
1.1. Erase its tape.
1.2. Execute M on the resulting empty tape.
2. Invoke MLA(M*).
It's clear that MLE effectively decides LE (if MLA really exists). Why? MLE returns True iff MLA returns True. That happens,
by its definition, if it is passed a TM that accepts at least one string. We pass it M*. M* accepts at least one string (in fact, it
accepts all strings) precisely in case M accepts the empty string. If M does not accept the empty string, then M* accepts
nothing and MLE returns False.
Example 3: Given a TM M, is L(M) = Σ*? (In other words, does M accept everything?) We can show that this problem is
unsolvable by using almost exactly the same technique we just used. In example 2, we wanted to know whether a TM
accepted anything at all. Now we want to know whether it accepts everything. We will answer this question by showing that
the language
L = {"M": L(M) = *} is undecidable.
Recall the machine M* that we constructed for Example 2. It erases its tape and then runs M on the empty tape. Clearly M*
either accepts nothing or it accepts everything, since its behavior is independent of its input. M* is exactly what we need for
this proof too. Again we'll choose to reduce the language LE = {"M": ε ∈ L(M)} to our new problem L:
If L is decidable, then there's a TM ML that decides it. In other words, there's a TM that tells us whether or not some other
machine M accepts everything. If ML exists, then we can define the following TM to decide LE:
MLE(M): /* A decision procedure for LE = {"M": ε ∈ L(M)} */
1. Construct a new TM M*, which behaves as follows:
1.1. Erase its tape.
1.2. Execute M on the resulting empty tape.
2. Invoke ML(M*).
Step 2 will return True if M* halts on all strings in Σ* and False otherwise. So it will return True if and only if M halts on ε.
This would seem to be a correct decision procedure for LE. But we know that such a procedure cannot exist and the only
possible flaw in the procedure we've given is ML. So ML doesn't exist either.
Example 4: Given a TM M, is L(M) infinite? Again, we can use M*. Remember that M* either halts on nothing or it halts
on all elements of Σ*. Assuming that Σ ≠ ∅, that means that M* either halts on nothing or it halts on an infinite number of
strings. It halts on everything if its input machine M halts on ε. Otherwise it halts on nothing. So we can show that the
language
LI = {"M": L(M) is infinite}
is undecidable by reducing the language LE = {"M": ε ∈ L(M)} to it.
If LI is decidable, then there is a Turing Machine MLI that decides it. Given MLI, we decide LE as follows:
MLE(M): /* A decision procedure for LE = {"M": ε ∈ L(M)} */
1. Construct a new TM M*, which behaves as follows:
1.1. Erase its tape.
1.2. Execute M on the resulting empty tape.
2. Invoke MLI(M*).
Step 2 will return True if M* halts on an infinite number of strings and False otherwise. So it will return True if and only if M
halts on ε.
This idea that a single construction may be the basis for several reduction proofs is important. It derives from the fact that
several quite different looking problems may in fact be distinguishing between the same two cases.
Example 5: Given two TMs, M1 and M2, is L(M1) = L(M2)? In other words, is the language
LEQ = {"M1" "M2": L(M1) = L(M2)} decidable?
Now, for the first time, we want to answer a question about the relationship of two Turing Machines to each other. How can
we solve this problem by reducing to it any of the problems we already know to be undecidable? They all involve only a
single machine. The trick is to use a constant, a machine whose behavior we are certain of. So we define M#, which halts on
all inputs. M# is trivial. It ignores its input and goes immediately to a halt state.
If LEQ is decidable, then there is a TM MLEQ that decides it. Using MLEQ and M#, we can decide the language
L = {"M": L(M) = *} (which we showed in example 3 isnt decidable) as follows:
ML(M): /* A decision procedure for L
1. Invoke MLEQ(M, M#).
Clearly M accepts everything if it is equivalent to M#, which is exactly what MLEQ tells us.
This reduction is an example of an easy one. To solve the unsolvable problem, we simply pass the input directly into the
subroutine that we are assuming exists, along with some simple constant. We don't need to do any clever constructions. The
reason this was so simple is that our current problem, LEQ, is really just a generalization of a more specific problem we've
already shown to be unsolvable. Clearly if we can't solve the special case (determining whether a machine is equivalent to
M#), we can't solve the more general problem (determining whether two arbitrary machines are equivalent).
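In the same Python style as before, and with the same caveat that the claimed equivalence decider is only hypothetical, this reduction amounts to nothing more than a constant machine plus a single call:

def mleq(m1, m2):
    """Hypothetical: would return True iff L(m1) = L(m2).  No such decider can exist."""
    raise NotImplementedError

def m_hash(tape):
    """M#: ignores its input and halts (accepts) immediately."""
    return True

def m_l(m):
    """Would-be decider for L = {"M": L(M) = Sigma*}: M accepts everything
    exactly when M is equivalent to the constant machine M#."""
    return mleq(m, m_hash)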
Example 6: Given a TM M, is L(M) regular? Alternatively, is
LR = {"M": L(M) is regular} decidable?
To answer this one, we'll again need to use an interesting construction. To do this, we'll make use of the language
H = {"M" "w" : w ∈ L(M)}
Recall that H is just the set of strings that correspond to a (TM, input) pair, where the TM halts on the input. H is not
decidable (that's what we proved by diagonalization). But it is semidecidable. We can easily build a TM Hsemi that halts
whenever the TM M halts on input w and that fails to halt whenever M doesn't halt on w. All Hsemi has to do is to simulate the
execution of M on w. Note also that H isn't regular (which we can show using the pumping theorem).
Suppose that, from M and Hsemi, we construct a machine M$ that behaves as follows: given an input string w, it first runs Hsemi
on w. Clearly, if Hsemi fails to halt on w, M$ will also fail to halt. But if Hsemi halts, then we move to the next step, which is to
run M on ε. If we make it here, then M$ will halt precisely in case M would halt on ε. So our new machine M$ will either:
1. Accept H, which it will do if ε ∈ L(M), or
2. Accept ∅, which it will do if ε ∉ L(M).
Thus we see that M$ will accept either
1. A nonregular language, H, which it will do if ε ∈ L(M), or
2. A regular language, ∅, which it will do if ε ∉ L(M).
So, if we could tell whether M$ accepts a regular language or not, we'd know whether or not M accepts ε.
We're now ready to show that LR isn't decidable. If it were, then there would be some TM MLR that decided it. But MLR
cannot exist, because, if it did, we could reduce LE = {"M": ε ∈ L(M)} to it as follows:
MLE(M): /* A decision procedure for LE = {"M": ε ∈ L(M)} */
1. Construct a new TM M$(w), which behaves as follows:
   1.1. Execute Hsemi on w.
   1.2. Execute M on ε.
2. Invoke MLR(M$).
3. If the result of step 2 is True, return False; if the result of step 2 is False, return True.
MLE, as just defined, effectively decides LE. Why? If ε ∈ L(M), then L(M$) is H, which isn't regular, so MLR will say False
and we will, correctly, say True. If ε ∉ L(M), then L(M$) is ∅, which is regular, so MLR will say True and we will, correctly, say
False.
By the way, we can use exactly this same argument to show that LC = {"M": L(M) is context free} and LD = {"M": L(M) is
recursive} are undecidable. All we have to do is to show that H is not context free (by pumping) and that it is not recursive
(which we did with diagonalization).
Example 7: Given a TM M and state q, is there any configuration (p, uav), with p ≠ q, that yields a configuration whose state
is q? In other words, is there any state p that could possibly lead M to q? Unlike many (most) of the questions we ask about
Turing Machines, this one is not about future behavior. (e.g., Will the Turing Machine do such and such when started from
here?) So we're probably not even tempted to try simulation (which rarely works anyway).
But there is a way to solve this problem. In essence, we don't need to consider the infinite number of possible configurations
of M. All we need to do is to examine the (finite) transition table of M to see whether there is any transition from some state
other than q (call it p) to q. If there is such a transition (i.e., if there exist a state p ≠ q and a tape symbol σ such that δ(p, σ) = (q, b) for some action b), then the answer is yes.
Otherwise, the answer is no.
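Since the check only inspects the finite transition table, it is easy to code. Here is a Python sketch (ours), with the table represented as a dictionary mapping (state, tape symbol) pairs to (next state, action) pairs:

def some_state_leads_to(delta, q):
    """True iff some transition from a state p != q moves the machine into state q.

    delta: dict mapping (state, tape_symbol) -> (next_state, action)."""
    return any(p != q and next_state == q
               for (p, _symbol), (next_state, _action) in delta.items())

# Tiny example transition table (a made-up machine, just for illustration):
delta = {
    ("s", "a"): ("t", "R"),
    ("t", "a"): ("t", "R"),
    ("t", "#"): ("h", "#"),
}
print(some_state_leads_to(delta, "h"))   # True: state t can move into h
print(some_state_leads_to(delta, "s"))   # False: nothing moves into s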
3 Rice's Theorem
Rice's Theorem makes a very general statement about an entire class of languages that are not recursive. Thus, for some
languages, it is an alternative to problem reduction as a technique for proving that a language is not recursive.
There are several different forms in which Rice's Theorem can be stated. We'll present two of them here:
(Form 1) Any nontrivial property of the recursively enumerable languages is not decidable.
(Form 2) Suppose that C is a proper, nonempty subset of the class of all recursively enumerable languages. Then the
following language is undecidable: LC = {<M>: L(M) is an element of C}.
These two statements look quite different but are in fact nearly equivalent. (Although Form 1 is stronger since it makes a
claim about the language, no matter how you choose to define the language. So it applies given a grammar, for example.
Form 2 only applies directly if the way you define the language is by a Turing Machine that semidecides it. But even this
difference doesn't really matter since we have algorithms for constructing a Turing Machine from a grammar and vice versa.
So it would just take one more step if we started with a grammar and wanted to use Form 2). But, if we want to prove that a
language of Turing machine descriptions is not recursive using Rice's Theorem, we must do the same things, whichever
description of it we prefer.
We'll consider Form 1 first. To use it, we first have to guarantee that the property we are concerned with is a property
(predicate) whose domain is the set of recursively enumerable languages. Here are some properties P whose domains are the
RE languages:
1) P is true of any RE language that contains an even number of strings and false of any RE language that contains an odd
number of strings.
2) P is true of any RE language that contains all strings over its alphabet and false for all RE languages that are missing any
strings over their alphabet.
3) P is true of any RE language that is empty (i.e., contains no strings) and false of any RE language that contains any strings.
4) P is true of any RE language.
5) P is true of any RE language that can be semidecided by a TM with an even number of states and false for any RE language
that cannot be semidecided by such a TM.
6) P is true of any RE language that contains at least one string of infinite length and false of any RE language that contains no
infinite strings.
Here are some properties whose domains are not the RE languages:
1) P is true of Turing machines whose first move is to write "a" and false of other Turing machines.
2) P is true of two tape Turing machines and false of all other Turing machines.
3) P is true of the negative numbers and false of zero and the positive numbers.
4) P is true of even length strings and false of odd length strings.
We can attempt to use Rice's Theorem to tell us that properties in the first list are undecidable. It won't help us at all for
properties in the second list.
But now we need to do one more thing: We must show that P is a nontrivial property. Any property P is nontrivial if it is not
equivalent to True or False. In other words, it must be true of at least one language and false of at least one language.
Let's look at properties 1-6 above:
1) P is true of {a, b} and false of {a}, so it is nontrivial.
2) Let's just consider the case in which Σ is {a, b}. P is true of Σ* and false of {a}, so it is nontrivial.
3) P is true of ∅ and P is false of every other RE language, so it is nontrivial.
4) P is true of any RE language and false of nothing, so P is trivial.
5) P is true of any RE language and false of nothing, so P is trivial. Why? Because, for any RE language L there exists a
semideciding TM M. If M has an even number of states, then P is clearly true. If M has an odd number of states, then create
a new machine M' identical to M except that it has one more state. This state has no effect on M's behavior because there are no
transitions into it. But it guarantees that M' has an even number of states. Since M' accepts L (because M does), P is true of
L. So P is true of all RE languages.
6) P is false for all RE languages. Why? Because the definition of a language is a set of strings, each of finite length. So no
RE language contains a string of infinite length.
So we can use Rice's Theorem to prove that the set of RE languages possessing any one of properties 1, 2, or 3 is not
recursive. But it does not tell us anything about the set of RE languages possessing property 4, 5, or 6.
In summary, to apply this version of Rice's Theorem, it is necessary to do three things:
0) Specify a property P.
1) Show that the domain of P is the set of recursively enumerable languages.
2) Show that P is nontrivial by showing:
a) That P is true of at least one language, and
b) That P is false of at least one language.
Now let's try to use Form 2. We must find a C that is a proper, nonempty subset of the class of all recursively enumerable
languages.
First we notice that this version is stated in terms of C, a subset of the RE languages, rather than P, a property (predicate) that
is true of the RE languages. But this is an insignificant difference. Given a universe U, then one way to define a subset S of
U is by a characteristic function that, for any candidate element x, returns true if x ∈ S and false otherwise. So the P that
corresponds to S is simply whatever property the characteristic function tests for. Alternatively, for any subset S there must
exist a characteristic function for S (although that function need not be computable; that's a different issue). So given a set
S, we can define the property P as "is a member of S," or "possesses whatever property it takes to be determined to be in S."
So Form 2 is stated in terms of the set of languages that satisfy some property P instead of being stated in terms of P directly,
but as there is only one such set for any property P and there is only one such property P (viewed simply as a truth table,
ignoring how you say it in English) for any set, it doesn't matter which specification we use.
Supplementary Materials
Next we notice that this version requires that C be a proper, nonempty subset of the class of RE languages. But this is exactly
the same as requiring that P be nontrivial. Why? For P to be nontrivial, there must exist at least one language of which it
is true and one of which it is false. Since there must exist one language of which it is true, the set of languages that satisfy it
isn't empty. Since there must exist one language of which it is false, the set of languages that satisfy it is not exactly the set of
RE languages and so we have a proper subset.
So, to use Form 2 of Rice's Theorem requires that we:
0) Specify some set C (by specifying a membership predicate P or some other way).
1) Show that C is a subset of the set of RE languages (which is equivalent to saying that the domain of its membership
predicate is the set of RE languages)
2) Show that C is a proper, nonempty subset of the class of recursively enumerable languages by showing that
a) C ≠ ∅ (i.e., its characteristic function P is not trivially false), and
b) C ≠ RE (i.e., its characteristic function P is not trivially true).