Parsing: Compiler Design
CSE 504
Grammars
Recursive-Descent Parsing
Parsing
A.k.a. Syntax Analysis
  Recognize sentences in a language.
  Discover the structure of a document/program.
  Construct (implicitly or explicitly) a tree (called a parse tree) to represent the structure.
  The parse tree is used to guide translation.
Grammars
The syntactic structure of a language is defined using grammars.
Grammars (like regular expressions) specify a set of strings over an alphabet.
Efficient recognizers (automata) can be constructed to determine whether a string is in the language.
Language hierarchy:
  Finite Languages (FL): Enumeration
  Regular Languages (RL ⊃ FL): Regular Expressions
  Context-Free Languages (CFL ⊃ RL): Context-Free Grammars
Regular Languages
Example of a language that is not regular: {a^n b^n | n ≥ 0}
Context-Free Grammars
Terminal Symbols: Tokens
Nonterminal Symbols: sets of strings made up of tokens
Productions: Rules for constructing the set of strings associated with nonterminal symbols.
  Example: Stmt → while Expr do Stmt
Start symbol: a nonterminal symbol that represents the set of all strings in the language.
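The four components of a context-free grammar map naturally onto a small data structure. The following is a minimal sketch, not a prescribed representation: the class name `Grammar`, its field names, and the helper `alternatives` are illustrative choices, and the productions `Stmt → skip` and `Expr → id` are hypothetical additions used only to make the while-statement example complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar: terminals, nonterminals, productions, start symbol."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs (lhs, rhs); rhs is a tuple of grammar symbols
    start: str

    def alternatives(self, nonterminal):
        """Return the right-hand sides of all productions for `nonterminal`."""
        return [rhs for lhs, rhs in self.productions if lhs == nonterminal]


# Stmt -> while Expr do Stmt comes from the slide; the other two productions
# are hypothetical and exist only so that every nonterminal derives something.
while_grammar = Grammar(
    terminals=frozenset({"while", "do", "skip", "id"}),
    nonterminals=frozenset({"Stmt", "Expr"}),
    productions=(
        ("Stmt", ("while", "Expr", "do", "Stmt")),
        ("Stmt", ("skip",)),
        ("Expr", ("id",)),
    ),
    start="Stmt",
)

print(while_grammar.alternatives("Stmt"))
```

Keeping right-hand sides as tuples of symbols (rather than strings) avoids ambiguity when a token such as `while` is longer than one character.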
Grammars
Notation where recursion is explicit.
Examples:
  {ε, a, b, aa, ab, ba, bb, ...} = L((a | b)*):
    S → ε
    S → E S
    E → a
    E → b
    Notational shorthand:
    S → ε | E S
    E → a | b
  {a^n b^n | n ≥ 0}:
    S → ε
    S → a S b
  {w | no. of a's in w = no. of b's in w}:
    S → ε | a S b S | b S a S
    (expressible as a CFG, although not as a regular expression)
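To see which strings a grammar actually generates, one can enumerate its sentences up to a length bound. The sketch below is an illustration under stated assumptions: the function name `sentences` and the dictionary encoding of productions are invented here, ε is encoded as an empty right-hand side, and the search relies on leftmost rewriting adding terminals often enough to terminate (true for the example grammars, not for every grammar).

```python
from collections import deque


def sentences(productions, start, max_len):
    """Enumerate sentences with at most max_len terminals, shortest first.

    productions maps each nonterminal to a list of right-hand sides
    (lists of symbols); an empty list encodes an epsilon production.
    """
    nonterminals = set(productions)
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal of the sentential form, if any.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            results.add("".join(form))  # only terminals left: a sentence
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminal_count = sum(1 for s in new_form if s not in nonterminals)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda w: (len(w), w))


anbn = {"S": [[], ["a", "S", "b"]]}
print(sentences(anbn, "S", 4))
```

For the grammar S → ε | a S b and a bound of 4, the enumeration yields the empty string, `ab`, and `aabb`.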
E → E + E
E → E - E
E → E * E
E → E / E
E → ( E )
E → id
Derivations
Grammar:
  E → E + E
  E → id
E derives id + id:
  E ⇒ E + E
    ⇒ E + id
    ⇒ id + id
αAβ ⇒ αγβ iff A → γ is a production in the grammar.
α ⇒* β if α derives β in zero or more steps.
  Example: E ⇒* id + id
Sentence: A sequence of terminal symbols w such that S ⇒+ w (where S is the start symbol).
Sentential Form: A sequence of terminal/nonterminal symbols α such that S ⇒* α.
Derivations
Grammar:
  E → E + E
  E → id
Leftmost derivation: Leftmost nonterminal is replaced first:
  E ⇒ E + E
    ⇒ id + E
    ⇒ id + id
  Written as E ⇒*lm id + id
Rightmost derivation: Rightmost nonterminal is replaced first:
  E ⇒ E + E
    ⇒ E + id
    ⇒ id + id
  Written as E ⇒*rm id + id
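The difference between leftmost and rightmost derivations can be made concrete with a few lines of code. This is a small sketch for the grammar E → E + E | id; the function name `derive` and its parameters are assumptions made for illustration. It applies a given sequence of right-hand sides, each time rewriting the leftmost or the rightmost nonterminal, and records every sentential form.

```python
PRODUCTIONS = {"E": [["E", "+", "E"], ["id"]]}


def derive(start, choices, productions=PRODUCTIONS, leftmost=True):
    """Apply each right-hand side in `choices` to the leftmost (or rightmost)
    nonterminal and return the list of sentential forms produced."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in productions]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in productions[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(rhs)} is not a production")
        form[i:i + 1] = rhs  # replace the chosen nonterminal by the right-hand side
        steps.append(" ".join(form))
    return steps


choices = [["E", "+", "E"], ["id"], ["id"]]
print(" => ".join(derive("E", choices, leftmost=True)))
print(" => ".join(derive("E", choices, leftmost=False)))
```

Both calls use the same productions in the same order; only the position that gets rewritten differs, which is why the intermediate forms are `id + E` in one case and `E + id` in the other.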
Parse Trees
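A Parse Tree is a graphical representation of a derivation.
Grammar:
  E → E + E
  E → id
Leftmost derivation:  E ⇒ E + E ⇒ id + E ⇒ id + id
Rightmost derivation: E ⇒ E + E ⇒ E + id ⇒ id + id
Both derivations correspond to the same parse tree:
        E
      / | \
     E  +  E
     |     |
     id    id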
Ambiguity
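A grammar is ambiguous if there are multiple parse trees for the same sentence.
Grammar:
  E → E + E
  E → E * E
  E → id
Sentence: id + id * id
[Figure: two parse trees for the sentence, one grouping it as id + (id * id) and the other as (id + id) * id.]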
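Ambiguity can be checked mechanically for a given sentence by counting its parse trees. The sketch below is specific to the grammar E → E + E | E * E | id and is not a general parser: the function name `count_parse_trees` is an assumption, and the counting works by trying every operator as the root of the tree and multiplying the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root: E -> E op E
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence shows that the grammar is ambiguous; for `id + id * id` the count is 2, matching the two groupings.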
Disambiguation
Express preference for one parse tree over others.
Example: id + id * id
The usual precedence of * over + means that the parse tree grouping the sentence as id + (id * id) is preferred over the one grouping it as (id + id) * id.
Parsing
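Construct a parse tree for a given string.
Grammar:
  S → ( S ) S
  S → a
  S → ε
Example strings: (a)a and (a)(a)
[Figure: parse trees for (a)a and (a)(a). In both trees the root applies S → ( S ) S and the inner S derives a; the trailing S derives a in the first tree and ( S ) S in the second.]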
Grammar:
  S → a
Algorithm:
  parse_S() {
    switch (input_token) {
      case TOKEN_A:
        consume(TOKEN_A);
        return;
      default:
        /* Parse Error */
    }
  }
Grammar:
  S → a
  S → ε
Algorithm:
  parse_S() {
    switch (input_token) {
      case TOKEN_A:   /* Production 1 */
        consume(TOKEN_A);
        return;
      case TOKEN_EOF: /* Production 2 */
        return;
      default:
        /* Parse Error */
    }
  }
Grammar:
  S → ( S ) S
  S → a
  S → ε
Algorithm:
  parse_S() {
    switch (input_token) {
      case TOKEN_OPEN_PAREN:  /* Production 1 */
        consume(TOKEN_OPEN_PAREN);
        parse_S();
        consume(TOKEN_CLOSE_PAREN);
        parse_S();
        return;
      case TOKEN_A:           /* Production 2 */
        consume(TOKEN_A);
        return;
      case TOKEN_CLOSE_PAREN:
      case TOKEN_EOF:         /* Production 3 */
        return;
      default:
        /* Parse Error */
    }
  }
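The procedure for S → ( S ) S | a | ε translates almost line for line into a runnable recognizer. The following is a sketch under a few assumptions: tokens are single characters, the end of input is marked with an `"EOF"` token, and the names `accepts`, `ParseError`, `peek`, and `consume` are illustrative rather than taken from the slides.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


def accepts(text):
    """Return True if `text` is a sentence of S -> ( S ) S | a | epsilon."""
    tokens = list(text) + ["EOF"]
    pos = 0

    def peek():
        return tokens[pos]

    def consume(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise ParseError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def parse_S():
        tok = peek()
        if tok == "(":              # Production 1: S -> ( S ) S
            consume("(")
            parse_S()
            consume(")")
            parse_S()
        elif tok == "a":            # Production 2: S -> a
            consume("a")
        elif tok in (")", "EOF"):   # Production 3: S -> epsilon
            return
        else:
            raise ParseError(f"unexpected token {tok!r}")

    try:
        parse_S()
        consume("EOF")  # the whole input must be used up
        return True
    except ParseError:
        return False


print(accepts("(a)(a)"), accepts("(a"))
```

The final `consume("EOF")` matters: without it, a string such as `aa` would be accepted after parsing only its first token.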
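FIRST(X) = First symbol of any string that can be derived from X.
  FIRST(S) = { (, a, ε }   (for the grammar S → ( S ) S | a | ε)
FOLLOW(A) = First symbol that, in some derivation of a sentence in the language, appears immediately after A.
  FOLLOW(S) = { ), EOF }
[Figure: a parse tree rooted at S with an interior node C; a ∈ FIRST(C) is the first symbol of the string derived from C, and b ∈ FOLLOW(C) is the symbol immediately after that string.]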
Grammar:
  S → A S B
  S → ε
  A → a
  B → b
FIRST(X): First terminal in some α such that X ⇒* α.
FOLLOW(A): First terminal in some β such that S ⇒* αAβ.
  FIRST(S) = { a, ε }     FOLLOW(S) = { b, EOF }
  FIRST(A) = { a }        FOLLOW(A) = { a, b }
  FIRST(B) = { b }        FOLLOW(B) = { b, EOF }
Definition of FIRST
Grammar:
  S → A S B
  S → ε
  A → a
  B → b
FIRST(S) ⊇ { ε }, and FIRST(S) ⊇ FIRST(ASB) ⊇ FIRST(A) ⊇ { a }
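The inclusions above suggest computing FIRST as a least fixed point: start with empty sets and keep adding symbols until nothing changes. The sketch below follows one common textbook formulation, not necessarily the exact definition on the original slide; the names `compute_first` and `first_of_string`, the dictionary encoding of the grammar, and the use of the string `"ε"` as a marker are assumptions made for this example.

```python
EPS = "ε"


def first_of_string(symbols, first, nonterminals):
    """FIRST of a sequence of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in nonterminals else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # this symbol cannot vanish, so stop here
    result.add(EPS)                # every symbol can derive epsilon
    return result


def compute_first(grammar):
    """Iterate to the smallest FIRST sets that satisfy all productions."""
    nonterminals = set(grammar)
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, nonterminals)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


# S -> A S B | epsilon, A -> a, B -> b (an empty list encodes epsilon)
GRAMMAR = {"S": [["A", "S", "B"], []], "A": [["a"]], "B": [["b"]]}
print(compute_first(GRAMMAR))
```

On this grammar the iteration stabilizes after two passes: the first pass finds ε in FIRST(S), and the second adds a once FIRST(A) is known.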
Definition of FOLLOW
Grammar:
  S → A S B
  S → ε
  A → a
  B → b
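FOLLOW sets for this grammar can be computed with a fixed-point iteration similar to the one used for FIRST. The sketch below uses a common textbook formulation, which may differ in wording from the original slide: EOF follows the start symbol, whatever can start the rest of a right-hand side follows a nonterminal, and if that rest can vanish, FOLLOW of the left-hand side is inherited. The function names and the hard-coded FIRST sets (taken from the values given earlier for this grammar) are assumptions of this example.

```python
EPS, EOF = "ε", "EOF"


def first_of_string(symbols, first):
    """FIRST of a symbol sequence; symbols without a FIRST entry are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start, first):
    """Iterate to the smallest FOLLOW sets that satisfy all productions."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue                    # terminals have no FOLLOW set
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:                 # the rest can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


GRAMMAR = {"S": [["A", "S", "B"], []], "A": [["a"]], "B": [["b"]]}
FIRST = {"S": {"a", EPS}, "A": {"a"}, "B": {"b"}}
print(compute_follow(GRAMMAR, "S", FIRST))
```

The result agrees with the sets listed for this grammar: FOLLOW(S) = { b, EOF }, FOLLOW(A) = { a, b }, FOLLOW(B) = { b, EOF }.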
Parsing Table
Grammar:
  S → A S B
  S → ε
  A → a
  B → b
Algorithm:
  parse_S() {
    switch (input_token) {
      case TOKEN_A:   /* Production 3 */
        parse_A();
        parse_S();
        parse_B();
        return;
      case TOKEN_B:
      case TOKEN_EOF: /* Production 4 */
        return;
      default:
        /* Parse Error */
    }
  }
Parsing Table:
  Nonterminal | a          | b      | EOF
  S           | S → A S B  | S → ε  | S → ε
Table-driven Parsing
Grammar:
  S → A S B
  S → ε
  A → a
  B → b
Parsing Table:
  Nonterminal | a          | b      | EOF
  S           | S → A S B  | S → ε  | S → ε
  A           | A → a      |        |
  B           |            | B → b  |
Nonrecursive Parsing
Instead of recursion, use an explicit stack along with the parsing table.
Data objects:
  Parsing Table: M(A, a), a two-dimensional array, dimensions indexed by nonterminal symbols (A) and terminal symbols (a).
  A Stack of terminal/nonterminal symbols
  Input stream of tokens
The above data structures are manipulated using a table-driven parsing program.
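The three data objects can be combined into a short driver loop. The following is a sketch, assuming the grammar S → A S B | ε, A → a, B → b and a parsing table written out by hand as a dictionary; the names `ll1_parse`, `TABLE`, and `NONTERMINALS`, and the choice of Python's built-in `SyntaxError` for error reporting, are illustrative.

```python
EOF = "EOF"
NONTERMINALS = {"S", "A", "B"}

# M[A, a]: right-hand side to use for nonterminal A on lookahead a
# (an empty list encodes an epsilon production).
TABLE = {
    ("S", "a"): ["A", "S", "B"],
    ("S", "b"): [],
    ("S", EOF): [],
    ("A", "a"): ["a"],
    ("B", "b"): ["b"],
}


def ll1_parse(tokens, table=TABLE, start="S"):
    """Parse `tokens` with an explicit stack; return the productions applied."""
    stack = [EOF, start]                 # top of the stack is the end of the list
    stream = list(tokens) + [EOF]
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = stream[pos]
        if top in NONTERMINALS:
            rhs = table.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{top}, {lookahead}]")
            applied.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))  # push so the leftmost symbol is on top
        elif top == lookahead:
            pos += 1                     # terminal (or EOF) matches: advance
        else:
            raise SyntaxError(f"expected {top}, found {lookahead}")
    return applied


for lhs, rhs in ll1_parse("aabb"):
    print(lhs, "->", " ".join(rhs) or "ε")
```

Pushing the right-hand side in reverse keeps its leftmost symbol on top of the stack, so the sequence of applied productions traces a leftmost derivation.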
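Grammar:
  S → A S B
  S → ε
  A → a
  B → b
  FIRST(S) = { a, ε }     FOLLOW(S) = { b, EOF }
  FIRST(A) = { a }        FOLLOW(A) = { a, b }
  FIRST(B) = { b }        FOLLOW(B) = { b, EOF }
Parsing Table:
  Nonterminal | a          | b      | EOF
  S           | S → A S B  | S → ε  | S → ε
  A           | A → a      |        |
  B           |            | B → b  |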
Algorithm:
  table_construct(G) {
    for each A → α ∈ G {
      for each a ∈ FIRST(α) such that a ≠ ε
        add A → α to M[A, a];
      if ε ∈ FIRST(α)
        for each b ∈ FOLLOW(A)
          add A → α to M[A, b];
    }
  }
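The table-construction algorithm can be sketched directly in code. This example assumes the FIRST and FOLLOW sets are already known (they are supplied here for the grammar S → A S B | ε, A → a, B → b); the names `build_table` and `first_of_string` and the dictionary-of-lists table layout are choices made for this illustration. Each cell holds a list so that conflicts, if any, remain visible.

```python
EPS, EOF = "ε", "EOF"


def first_of_string(symbols, first):
    """FIRST of a symbol sequence; symbols without a FIRST entry are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Return M as a dict mapping (nonterminal, token) to a list of right-hand sides."""
    table = {}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            rhs_first = first_of_string(rhs, first)
            lookaheads = rhs_first - {EPS}      # a in FIRST(alpha), a != epsilon
            if EPS in rhs_first:                # epsilon in FIRST(alpha)
                lookaheads |= follow[lhs]       # add every b in FOLLOW(A)
            for token in lookaheads:
                table.setdefault((lhs, token), []).append(rhs)
    return table


GRAMMAR = {"S": [["A", "S", "B"], []], "A": [["a"]], "B": [["b"]]}
FIRST = {"S": {"a", EPS}, "A": {"a"}, "B": {"b"}}
FOLLOW = {"S": {"b", EOF}, "A": {"a", "b"}, "B": {"b", EOF}}

for cell, entries in sorted(build_table(GRAMMAR, FIRST, FOLLOW).items()):
    print(cell, entries)
```

A cell with more than one entry would indicate a conflict; for this grammar every filled cell has exactly one.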
Grammar:
  S → ( S ) S
  S → a
  S → ε
LL(1) Grammar: A grammar whose recursive-descent parsing table has no conflicts (i.e., each cell has at most one entry).
Left-recursive grammar:
  A → A a
  A → b
Equivalent grammar without left recursion:
  A  → b A'
  A' → ε
  A' → a A'
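The rewriting shown above follows a general pattern for immediate left recursion: productions A → A α | β become A → β A' and A' → ε | α A'. The sketch below implements only this immediate case; the function name `eliminate_immediate_left_recursion` and the convention of naming the new nonterminal by appending a prime are assumptions of this example.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> epsilon | alpha A'.

    Right-hand sides are lists of symbols; an empty list encodes epsilon.
    Returns a dict mapping each resulting nonterminal to its alternatives.
    """
    new_nt = nonterminal + "'"
    # alpha parts: what follows the leading A in left-recursive alternatives
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nonterminal]
    # beta parts: alternatives that do not start with A
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}   # nothing to do
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [[]] + [alpha + [new_nt] for alpha in recursive],
    }


print(eliminate_immediate_left_recursion("A", [["A", "a"], ["b"]]))
```

Applied to A → A a | b, the function returns A → b A' together with A' → ε | a A', the same grammar as in the example.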
Grammar:
  E → E + E
  E → id
Transformed grammar:
  E  → id E'
  E' → + id E'
  E' → ε
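Parsing Table:
  Nonterminal | a          | b      | EOF
  S           | S → A S B  | S → ε  | S → ε
  A           | A → a      |        |
  B           |            | B → b  |
Parsing the input a a b b (top of the stack is on the right):
  Stack      | Input Stream | Rule
  $S         | a a b b $    | S → A S B
  $B S A     | a a b b $    | A → a
  $B S a     | a a b b $    |
  $B S       | a b b $      | S → A S B
  $B B S A   | a b b $      | A → a
  $B B S a   | a b b $      |
  $B B S     | b b $        | S → ε
  $B B       | b b $        | B → b
  $B b       | b b $        |
  $B         | b $          | B → b
  $b         | b $          |
  $          | $            |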
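Parsing Table:
  Nonterminal | id         | +             | EOF
  E           | E → id E'  |               |
  E'          |            | E' → + id E'  | E' → ε
Parsing the input id + id (top of the stack is on the right):
  Stack      | Input Stream | Derivation
  $E         | id + id $    | E ⇒ id E'
  $E' id     | id + id $    |
  $E'        | + id $       |   ⇒ id + id E'
  $E' id +   | + id $       |
  $E' id     | id $         |
  $E'        | $            |   ⇒ id + id
  $          | $            |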
LL(1) Derivations
Left to Right Scan of input
Leftmost Derivation
(1) look ahead 1 token at each step
Alternative characterization of LL(1) Grammars: Whenever A → α | β ∈ G:
  1. FIRST(α) ∩ FIRST(β) = { }
  2. If α ⇒* ε, then FIRST(β) ∩ FOLLOW(A) = { }
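The alternative characterization can be checked pairwise over the alternatives of each nonterminal. The sketch below uses the standard textbook conditions (disjoint FIRST sets, and no overlap with FOLLOW when one alternative can derive ε) and assumes the FIRST and FOLLOW sets are supplied by the caller; the names `is_ll1` and `first_of_string` are illustrative.

```python
from itertools import combinations

EPS, EOF = "ε", "EOF"


def first_of_string(symbols, first):
    """FIRST of a symbol sequence; symbols without a FIRST entry are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def is_ll1(grammar, first, follow):
    """Check the pairwise LL(1) conditions for every pair A -> alpha | beta."""
    for lhs, alternatives in grammar.items():
        for alpha, beta in combinations(alternatives, 2):
            first_alpha = first_of_string(alpha, first)
            first_beta = first_of_string(beta, first)
            if first_alpha & first_beta:
                return False    # condition 1 fails: FIRST sets overlap
            if EPS in first_alpha and first_beta & follow[lhs]:
                return False    # condition 2 fails: alpha can vanish
            if EPS in first_beta and first_alpha & follow[lhs]:
                return False    # condition 2 with alpha and beta swapped
    return True


asb = {"S": [["A", "S", "B"], []], "A": [["a"]], "B": [["b"]]}
asb_first = {"S": {"a", EPS}, "A": {"a"}, "B": {"b"}}
asb_follow = {"S": {"b", EOF}, "A": {"a", "b"}, "B": {"b", EOF}}
print(is_ll1(asb, asb_first, asb_follow))

expr = {"E": [["E", "+", "E"], ["id"]]}
print(is_ll1(expr, {"E": {"id"}}, {"E": {"+", EOF}}))
```

The first grammar passes both conditions. The second, E → E + E | id, fails the first condition because both alternatives can begin with id.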