Top-Down Parsing: - The Parse Tree Is Created Top To Bottom. - Top-Down Parser
Predictive Parsing
- no backtracking
- efficient
- needs a special form of grammars (LL(1) grammars)
- Recursive Predictive Parsing is a special form of Recursive Descent parsing without backtracking.
- Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.
[Figure: a parse attempt over the symbols a, b, B and c in which one alternative fails and the parser backtracks.]
Predictive Parser
a grammar → eliminate left recursion → left factor
When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.
A → α1 | ... | αn
When we are trying to rewrite the non-terminal stmt, if the current token is "if", we have to choose the first production rule. In other words, we can uniquely choose the production rule by just looking at the current token. Even after we eliminate the left recursion in a grammar and left factor it, the result may still not be suitable for predictive parsing (it may not be an LL(1) grammar).
proc A {
  - match the current token with a, and move to the next token;
  - call B;
  - match the current token with b, and move to the next token;
}
If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production. Most correct choice: we should apply an ε-production for a nonterminal A when the current token is in the follow set of A (the terminals that can follow A in the sentential forms).
proc B {
  case of the current token {
    b:    - match the current token with b, and move to the next token;
          - call B
    e,d:  do nothing        (e and d form the follow set of B)
  }
}
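The procedures above can be sketched in Python as a recursive predictive parser. This is a minimal sketch, assuming the small grammar S → aBa, B → bB | ε (for which FOLLOW(B) = {a}); the names Parser, ParseError and accepts are hypothetical and chosen only for illustration.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive predictive parser for S -> a B a, B -> b B | epsilon (assumed grammar)."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Match the current token with `expected`, and move to the next token.
        if self.current() != expected:
            raise ParseError(f"expected {expected!r}, found {self.current()!r}")
        self.pos += 1

    def parse_S(self):
        # S -> a B a
        self.match("a")
        self.parse_B()
        self.match("a")

    def parse_B(self):
        if self.current() == "b":
            # B -> b B
            self.match("b")
            self.parse_B()
        elif self.current() == "a":
            # B -> epsilon: "a" is in FOLLOW(B), so do nothing
            pass
        else:
            raise ParseError(f"unexpected {self.current()!r} while parsing B")


def accepts(text):
    """Return True if `text` is a sentence of the assumed grammar."""
    parser = Parser(text)
    try:
        parser.parse_S()
        parser.match("$")  # the whole input must be consumed
        return True
    except ParseError:
        return False
```

Each non-terminal becomes one procedure, and the ε-production is chosen only when the current token is in the follow set of B, so no backtracking is ever needed.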
LL(1) Parser
input buffer
our string to be parsed. We will assume that its end is marked with a special symbol $.
output
a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
stack
contains the grammar symbols. At the bottom of the stack, there is a special end marker symbol $. Initially the stack contains only the symbol $ and the starting symbol S ($S is the initial stack). When the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.
parsing table
a two-dimensional array M[A,a]. Each row is a non-terminal symbol, each column is a terminal symbol or the special symbol $, and each entry holds a production rule.
S → aBa
B → bB | ε

LL(1) parsing table:

       a           b          $
S      S → aBa
B      B → ε       B → bB

stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         accept, successful completion
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
parse tree:

        S
      / | \
     a  B  a
       / \
      b   B
         / \
        b   B
            |
            ε
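The table-driven parse of abba can be sketched in Python as follows. This is a minimal sketch, assuming single-character grammar symbols and a table stored as a dictionary keyed by (non-terminal, terminal); the name ll1_parse is hypothetical.

```python
# Parsing table M[A,a] for S -> aBa, B -> bB | epsilon.
# An empty string stands for the epsilon right-hand side.
TABLE = {
    ("S", "a"): "aBa",
    ("B", "b"): "bB",
    ("B", "a"): "",
}


def ll1_parse(text, table, start):
    """Return the list of production rules used, or raise SyntaxError."""
    tokens = list(text) + ["$"]      # input buffer, end marked with $
    stack = ["$", start]             # top of the stack is the end of the list
    nonterminals = {nt for nt, _ in table}
    output = []
    pos = 0
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$" and tok == "$":
            return output            # accept, successful completion
        if top in nonterminals:
            body = table.get((top, tok))
            if body is None:
                raise SyntaxError(f"no entry M[{top},{tok}]")
            output.append(f"{top} -> {body or 'epsilon'}")
            stack.pop()
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == tok:
            stack.pop()              # match the terminal and advance the input
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {tok!r}")


print(ll1_parse("abba", TABLE, "S"))
```

The printed list is the output column of the trace: S -> aBa, B -> bB, B -> bB, B -> epsilon, which is the left-most derivation of abba.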
FIRST(α) is the set of the terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols. If α derives ε, then ε is also in FIRST(α).
FOLLOW(A) is the set of the terminals which occur immediately after (follow) the non-terminal A in the strings derived from the starting symbol.
- a terminal a is in FOLLOW(A) if S ⇒* αAaβ
- $ is in FOLLOW(A) if S ⇒* αA
FIRST Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}

FIRST(TE') = {(, id}
FIRST(+TE') = {+}
FIRST(ε) = {ε}
FIRST(FT') = {(, id}
FIRST(*FT') = {*}
FIRST(ε) = {ε}
FIRST((E)) = {(}
FIRST(id) = {id}
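The FIRST sets of this example can be computed by iterating until no set grows any more. The following is a minimal Python sketch, assuming the grammar is stored as a dictionary from non-terminal to a list of right-hand sides (an empty list stands for ε); the names GRAMMAR, first_of_string and compute_first are hypothetical.

```python
EPS = "ε"

# Expression grammar of the example; [] is the epsilon production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # FIRST of a terminal is the terminal itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                  # this symbol cannot vanish, stop here
    result.add(EPS)                        # every symbol can derive epsilon
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                         # repeat until a fixed point is reached
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```

The helper first_of_string also handles right-hand sides such as TE' or the empty string, which is what the table construction needs later.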
FOLLOW Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
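The FOLLOW sets of this example can be computed with a similar fixed-point iteration. This is a minimal Python sketch that takes the FIRST sets of the non-terminals as given in the FIRST example; the names GRAMMAR, FIRST, first_of_string and compute_follow are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets of the non-terminals, copied from the FIRST example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # FIRST of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                 # $ follows the starting symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue           # only non-terminals have FOLLOW sets
                    rest = first_of_string(body[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:        # the rest can vanish: inherit FOLLOW(head)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```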
E' → ε       FIRST(ε) = {ε}          none, but since ε is in FIRST(ε) and FOLLOW(E') = {$, )}: E' → ε into M[E',$] and M[E',)]
T → FT'      FIRST(FT') = {(, id}    T → FT' into M[T,(] and M[T,id]
T' → *FT'    FIRST(*FT') = {*}       T' → *FT' into M[T',*]
T' → ε       FIRST(ε) = {ε}          none, but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +}: T' → ε into M[T',$], M[T',)] and M[T',+]
F → (E)      FIRST((E)) = {(}        F → (E) into M[F,(]
F → id       FIRST(id) = {id}        F → id into M[F,id]
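The table-filling rule used above (A → α goes into M[A,a] for every terminal a in FIRST(α), and into M[A,b] for every b in FOLLOW(A) when ε is in FIRST(α)) can be sketched in Python. This is a minimal sketch with the FIRST and FOLLOW sets of the expression grammar written in by hand; the names build_table and first_of_string are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],          # [] is the epsilon production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # FIRST of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, follow):
    """Map (non-terminal, terminal) to the list of right-hand sides placed there."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of_string(body)
            targets = body_first - {EPS}
            if EPS in body_first:          # epsilon case: use FOLLOW(head)
                targets |= follow[head]
            for terminal in targets:
                table.setdefault((head, terminal), []).append(body)
    return table


M = build_table(GRAMMAR, FOLLOW)
for (nt, terminal), bodies in sorted(M.items()):
    for body in bodies:
        print(f"M[{nt},{terminal}] = {nt} -> {' '.join(body) or EPS}")
```

Each entry holds a list so that a cell receiving more than one production stays visible instead of being silently overwritten.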
LL(1) Grammars
A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar.
LL(1): one input symbol is used as a look-ahead symbol to determine the parser action.
An entry in the parsing table of a grammar may contain more than one production rule. In this case, we say that the grammar is not an LL(1) grammar.
S → iCtSE | a
E → eS | ε
C → b

       a         b         e          i             t     $
S      S → a                          S → iCtSE
E                          E → eS                         E → ε
                           E → ε
C                C → b

two production rules for M[E,e]
Problem: ambiguity
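Checking a table for multiply-defined entries is mechanical. The following is a minimal Python sketch, assuming the table entries of this example are listed as (non-terminal, terminal, production) triples; the list ENTRIES is a transcription of the example table and the name find_conflicts is hypothetical.

```python
from collections import defaultdict

# Entries of the example parsing table, one triple per production placed in a cell.
ENTRIES = [
    ("S", "a", "S -> a"),
    ("S", "i", "S -> iCtSE"),
    ("E", "e", "E -> eS"),
    ("E", "e", "E -> epsilon"),
    ("E", "$", "E -> epsilon"),
    ("C", "b", "C -> b"),
]


def find_conflicts(entries):
    """Return the cells M[A,a] that hold more than one production rule."""
    table = defaultdict(list)
    for nonterminal, terminal, production in entries:
        table[(nonterminal, terminal)].append(production)
    return {cell: prods for cell, prods in table.items() if len(prods) > 1}


conflicts = find_conflicts(ENTRIES)
for (nonterminal, terminal), prods in conflicts.items():
    print(f"M[{nonterminal},{terminal}] is multiply defined: {prods}")
print("LL(1)" if not conflicts else "not LL(1)")
```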
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A) (where A → α and A → β are two distinct production rules).
Error-Productions
If we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate erroneous constructs. When an error production is used by the parser, we can generate appropriate error diagnostics. Since it is almost impossible to know all the errors that can be made by the programmers, this method is not practical.
Global-Correction
Ideally, we would like a compiler to make as few changes as possible in processing incorrect inputs. We have to globally analyze the input to find the error. This is an expensive method, and it is not used in practice.
[Table: parsing table extended with sync entries for panic-mode error recovery; recoverable entries: S → AbS, A → a, sync.]
stack      input      output
$S         ceadb$     S → AbS
$SbA       ceadb$     A → cAd
$SbdAc     ceadb$
$SbdA      eadb$      Error: unexpected e (illegal A)
                      (Remove all input tokens until first b or d, pop A)
$Sbd       db$
$Sb        b$
$S         $          S → ε
$          $          accept
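The recovery step in this trace can be sketched in Python. This is a minimal sketch under stated assumptions: the table entries and the synchronizing sets below are reconstructed from the trace (S → AbS, A → cAd, A → a, S → ε, and b, d as the synchronizing tokens for A) and are not the complete table; the names TABLE, SYNC and parse_with_recovery are hypothetical.

```python
# Assumed table entries, reconstructed from the trace only.
TABLE = {
    ("S", "a"): ["A", "b", "S"],
    ("S", "c"): ["A", "b", "S"],
    ("S", "$"): [],                 # S -> epsilon
    ("A", "a"): ["a"],
    ("A", "c"): ["c", "A", "d"],
}
# Assumed synchronizing tokens for each non-terminal.
SYNC = {"S": {"$"}, "A": {"b", "d"}}


def parse_with_recovery(text):
    """Parse `text`, recovering from errors; return (errors, productions used)."""
    tokens = list(text) + ["$"]
    stack = ["$", "S"]              # top of the stack is the end of the list
    errors, output, pos = [], [], 0
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$":
            if tok == "$":
                return errors, output
            errors.append(f"unexpected {tok} after the end of the sentence")
            pos += 1                # recovery consumes one input token
        elif top in SYNC:           # top is a non-terminal
            body = TABLE.get((top, tok))
            if body is not None:
                output.append(f"{top} -> {' '.join(body) or 'epsilon'}")
                stack.pop()
                stack.extend(reversed(body))
            else:
                errors.append(f"unexpected {tok} (illegal {top})")
                # Remove input tokens until a synchronizing token, then pop.
                while tokens[pos] not in SYNC[top] and tokens[pos] != "$":
                    pos += 1
                stack.pop()         # recovery always shrinks the stack
        elif top == tok:
            stack.pop()
            pos += 1
        else:
            errors.append(f"missing {top}")
            stack.pop()             # recovery pops the unmatched terminal


errors, output = parse_with_recovery("ceadb")
print(errors)
print(output)
```

Every recovery branch either consumes an input token or pops a stack symbol, which is one way to keep the error routines from putting the parser into an infinite loop.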
We should be careful when we design these error routines, because we may put the parser into an infinite loop.