
Top Down Parsing

Top-down parsing begins with the root symbol and attempts to construct a parse tree for the input string starting from the top. For top-down parsing to work efficiently with predictive parsers, the grammar must be unambiguous, non-left recursive, and left-factored. Left recursion is eliminated by rewriting productions in a right-recursive form. The LL(1) parsing table is constructed using the FIRST and FOLLOW sets to determine the production rules to select based on the next input symbol.

Uploaded by

gargsajal9
Copyright
© Attribution Non-Commercial (BY-NC)

Top-Down Parsing

Top-down parsing attempts to construct a parse tree for the input string, starting from the root.
It attempts to find a leftmost derivation of the string.
An unambiguous grammar alone is not guaranteed to be suitable for top-down parsing.
For top-down (predictive) parsers to work, the grammar must not be left recursive and should be left factored.

When Top-Down Parsing Doesn't Work Well

Consider the productions S → S a | a.
In the process of parsing S we try the rules above. Applied consistently in this order, they lead to an infinite loop.
We could re-order the productions, but the search would then involve a lot of backtracking, and defining a general rule for the ordering becomes complex.
The problem here is that the grammar is left-recursive.
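The infinite loop can be reproduced with a naive recursive-descent procedure. The sketch below is only an illustration: the function names `parse_S` and `loops_forever` are made up here, and Python's recursion limit stands in for "runs forever".

```python
def parse_S(tokens, pos):
    """Naive procedure for S -> S a | a that tries S -> S a first."""
    # Alternative 1: S -> S a. The very first step calls parse_S again at
    # the same position, before any input has been consumed.
    end = parse_S(tokens, pos)
    if end is not None and end < len(tokens) and tokens[end] == "a":
        return end + 1
    # Alternative 2: S -> a. Never reached, because the call above never returns.
    if pos < len(tokens) and tokens[pos] == "a":
        return pos + 1
    return None


def loops_forever(tokens):
    """Return True if the parser overflows the call stack."""
    try:
        parse_S(tokens, 0)
    except RecursionError:
        return True
    return False


print(loops_forever(list("aaa")))  # True
```

No input is consumed between the nested calls, so the recursion depth grows without bound whatever the input is.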

Left Recursion

E → E + T | T
T → T * F | F
F → n | ( E )

[Figure: parse tree in which E keeps expanding on the left to E + T]

Elimination of Left Recursion

Consider the left-recursive grammar
S → S α | β
S generates all strings that start with a β followed by any number of α's.
It can be rewritten using right recursion:
S → β S′
S′ → α S′ | ε

Elimination of Left Recursion: Example

Consider the grammar
S → 1 | S 0        (β = 1 and α = 0)
It can be rewritten as
S → 1 S′
S′ → 0 S′ | ε

More Elimination of Left Recursion

In general
S → S α1 | … | S αn | β1 | … | βm
All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn.
Rewrite as
S → β1 S′ | … | βm S′
S′ → α1 S′ | … | αn S′ | ε
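The general rewrite can be sketched as a small function. This is a minimal sketch under assumptions: alternatives are lists of symbol strings, the new non-terminal is named by appending `'`, and the function name `eliminate_immediate_left_recursion` is hypothetical. It handles only immediate left recursion and assumes no alternative is A → A alone.

```python
EPSILON = "ε"


def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α1 | ... | A αn | β1 | ... | βm as right recursion.

    Returns a dict mapping A and the new non-terminal A' to their alternatives.
    """
    new_nt = nonterminal + "'"
    # Left-recursive alternatives contribute their tails α_i.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # All other alternatives are the β_j.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}      # nothing to do
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[EPSILON]],
    }


result = eliminate_immediate_left_recursion("S", [["1"], ["S", "0"]])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
# S -> 1 S'
# S' -> 0 S' | ε
```

Applied to S → 1 | S 0, it reproduces the rewritten grammar from the example.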

General Left Recursion

The grammar
S → A α | δ        (1)
A → S β            (2)
is also left-recursive, because S ⇒+ S β α.
This left recursion can also be eliminated, by first substituting (2) into (1).
There is a general algorithm (e.g. Aho, Sethi, Ullman 4.3).


Top-Down Parsing
Top-down parsers fall into two classes.

Recursive-Descent Parsing
  Backtracking is needed (if a choice of production rule does not work, we backtrack and try other alternatives).
  It is a general parsing technique, but it is not widely used.
  It is not efficient.

Predictive Parsing
  No backtracking.
  Efficient.
  Needs a special form of grammar (LL(1) grammars).
  Recursive predictive parsing is a special form of recursive-descent parsing without backtracking.
  The non-recursive (table-driven) predictive parser is also known as the LL(1) parser.

Recursive-Descent Parsing (uses Backtracking)


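Backtracking is needed.
It tries to find the leftmost derivation.

S → aBc
B → bd | b
input: abc

[Figure: two parse trees for S. In the first, B is expanded with B → bd, which fails on the input, so the parser backtracks. In the second, B is expanded with B → b.]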
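The backtracking behaviour on this grammar can be sketched with Python generators. This is an illustrative sketch, not a production parser: the names `GRAMMAR`, `derive`, and `parse` are assumptions, and the approach would loop on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "d"], ["b"]],
}


def derive(symbols, text, pos, trace):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each alternative in order; moving on to the next
        # alternative after a failure is the backtracking step.
        for alternative in GRAMMAR[first]:
            trace.append(f"try {first} -> {''.join(alternative)}")
            for mid in derive(alternative, text, pos, trace):
                yield from derive(rest, text, mid, trace)
    elif pos < len(text) and text[pos] == first:
        # Terminal: it must match the current input symbol.
        yield from derive(rest, text, pos + 1, trace)


def parse(text):
    trace = []
    ok = any(end == len(text) for end in derive(["S"], text, 0, trace))
    return ok, trace


ok, trace = parse("abc")
print(ok)     # True
print(trace)  # ['try S -> aBc', 'try B -> bd', 'try B -> b']
```

The trace shows B → bd being tried first and abandoned when d does not match c, after which B → b succeeds.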

Recursive Descent parsing


The parser consists of a set of procedures, one for each non-terminal.
Execution begins with the procedure for the start symbol.
It is not deterministic, since it may pick any production of a non-terminal.
It requires backtracking, i.e. repeated scans over the input.
Backtracking parsers are not considered efficient.

Left Factoring

Consider the grammar
E → T + E | T
T → int | int * T | ( E )

It is impossible to predict which production to use, because
  for T, two productions start with int;
  for E, it is not clear how to predict.

A grammar must be left-factored before it is used for predictive parsing.

Left-Factoring Example
Starting with the grammar
E → T + E | T
T → int | int * T | ( E )

Factor out the common prefixes of the productions:
E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε

Left-Factoring (cont.)
In general,
A → αβ1 | αβ2
where α is non-empty and the first symbols of β1 and β2 (if they have one) are different.
When processing α we cannot know whether to expand A to αβ1 or to αβ2.
But if we rewrite the grammar as
A → αA′
A′ → β1 | β2
we can immediately expand A to αA′.

Left-Factoring -- Algorithm
For each non-terminal A with two or more alternatives (production rules) that share a common non-empty prefix, say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA′ | γ1 | ... | γm
A′ → β1 | ... | βn

Left-Factoring Example1
Left factor the given grammar:
A → abB | aB | cdg | cdeB | cdfB

Result:
A → aA′ | cdA″
A′ → bB | B
A″ → g | eB | fB
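The left-factoring algorithm can be sketched as follows. This is a minimal sketch under assumptions: alternatives are non-empty lists of symbols (ε is written as `["ε"]`), alternatives are grouped by their first symbol, and new non-terminals are named by appending primes. The names `common_prefix` and `left_factor` are hypothetical.

```python
EPSILON = "ε"


def common_prefix(seqs):
    """Longest prefix shared by all sequences in `seqs`."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(grammar):
    """Left factor `grammar`, a dict: non-terminal -> list of alternatives."""
    grammar = {head: [list(alt) for alt in alts] for head, alts in grammar.items()}
    work = list(grammar)
    while work:
        head = work.pop(0)
        groups = {}                      # alternatives grouped by first symbol
        for alt in grammar[head]:
            groups.setdefault(alt[0], []).append(alt)
        new_alts = []
        for group in groups.values():
            if len(group) == 1:          # no shared prefix: keep as is
                new_alts.append(group[0])
                continue
            prefix = common_prefix(group)
            fresh = head + "'"
            while fresh in grammar:      # pick an unused primed name
                fresh += "'"
            grammar[fresh] = [alt[len(prefix):] or [EPSILON] for alt in group]
            new_alts.append(prefix + [fresh])
            work.append(fresh)           # the new rules may need factoring too
        grammar[head] = new_alts
    return grammar


factored = left_factor({"A": [list("abB"), list("aB"), list("cdg"),
                              list("cdeB"), list("cdfB")]})
for head, alts in factored.items():
    print(head, "->", " | ".join("".join(alt) for alt in alts))
# A -> aA' | cdA''
# A' -> bB | B
# A'' -> g | eB | fB
```

On Example 1 it yields the same factored grammar, with `A''` playing the role of A″.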

Predictive Parser
a grammar
  → eliminate left recursion
  → left factor
  → a grammar suitable for predictive parsing (an LL(1) grammar); there is no 100% guarantee.

When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by looking only at the current symbol in the input string.

A → α1 | ... | αn
input: ... a .......        (a is the current token)

Predictive Parser (example)


stmt → if ......
     | while ......
     | begin ......
     | for ......

When we are trying to rewrite the non-terminal stmt and the current token is if, we have to choose the first production rule.
When we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule by looking only at the current token.
Even after we eliminate the left recursion in a grammar and left factor it, the grammar may still not be suitable for predictive parsing (it may not be an LL(1) grammar).

Non-Recursive Predictive Parsing -- LL(1) Parser


Non-recursive predictive parsing is table-driven.
It is a top-down parser.
It is also known as the LL(1) parser.

[Figure: the non-recursive predictive parser reads the input buffer, maintains a stack, consults the parsing table, and produces output.]

LL(1) Parser
input buffer
  The string to be parsed. We assume that its end is marked with a special symbol $.

output
  A production rule representing a step of the derivation sequence (leftmost derivation) of the string in the input buffer.

stack
  Contains the grammar symbols.
  At the bottom of the stack there is a special end-marker symbol $.
  Initially the stack contains only the symbol $ and the start symbol S ($S is the initial stack).
  When the stack is emptied (i.e. only $ is left in the stack), parsing is complete.

parsing table
  A two-dimensional array M[A,a].
  Each row is a non-terminal symbol.
  Each column is a terminal symbol or the special symbol $.
  Each entry holds a production rule.

LL(1) Parser Parser Actions


The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
There are four possible parser actions.

1. If X and a are both $: the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $): the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal: the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error.
   All empty entries in the parsing table are errors.
   If X is a terminal symbol different from a, this is also an error case.

LL(1) Parser Example1


S → aBa
B → bB | ε

LL(1) Parsing Table

      a          b          $
S     S → aBa
B     B → ε      B → bB

stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         accept, successful completion

LL(1) Parser Example1 (cont.)


Outputs: S → aBa    B → bB    B → bB    B → ε

Derivation (leftmost): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba

Parse tree (children in brackets): S( a, B( b, B( b, B( ε ) ) ), a )
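The four parser actions and the Example 1 trace can be sketched as a short driver loop. This is a sketch under assumptions: the table is a Python dict keyed by (non-terminal, lookahead), tokens are single characters, and the names `TABLE`, `NONTERMINALS`, and `ll1_parse` are chosen for illustration.

```python
EPSILON = "ε"

# Parsing table for S -> aBa, B -> bB | ε, keyed by (non-terminal, lookahead).
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): [EPSILON],
}
NONTERMINALS = {"S", "B"}


def ll1_parse(tokens, table=TABLE, start="S"):
    """Return the list of productions used, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]              # the top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        top, lookahead = stack[-1], tokens[pos]
        if top == "$" and lookahead == "$":        # action 1: accept
            return output
        if top not in NONTERMINALS:
            if top != lookahead:                   # action 4: terminal mismatch
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()                            # action 2: match
            pos += 1
        else:
            body = table.get((top, lookahead))
            if body is None:                       # action 4: empty table entry
                raise SyntaxError(f"no rule for M[{top},{lookahead}]")
            stack.pop()                            # action 3: expand X -> Y1...Yk
            stack.extend(reversed([s for s in body if s != EPSILON]))
            output.append(f"{top} -> {''.join(body)}")


print(ll1_parse("abba"))
# ['S -> aBa', 'B -> bB', 'B -> bB', 'B -> ε']
```

The printed productions are the outputs of the leftmost derivation listed for Example 1.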

LL(1) Parser Example2


E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

      id          +             *             (           )          $
E     E → TE′                                 E → TE′
E′                E′ → +TE′                               E′ → ε     E′ → ε
T     T → FT′                                 T → FT′
T′                T′ → ε        T′ → *FT′                 T′ → ε     T′ → ε
F     F → id                                  F → (E)

LL(1) Parser Example2


stack       input      output
$E          id+id$     E → TE′
$E′T        id+id$     T → FT′
$E′T′F      id+id$     F → id
$E′T′id     id+id$
$E′T′       +id$       T′ → ε
$E′         +id$       E′ → +TE′
$E′T+       +id$
$E′T        id$        T → FT′
$E′T′F      id$        F → id
$E′T′id     id$
$E′T′       $          T′ → ε
$E′         $          E′ → ε
$           $          accept

Constructing LL(1) Parsing Tables


Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.

FIRST(α) is the set of terminal symbols that occur as first symbols in strings derived from α, where α is any string of grammar symbols.
If α derives ε, then ε is also in FIRST(α).

FOLLOW(A) is the set of terminals that occur immediately after (follow) the non-terminal A in strings derived from the start symbol.
  A terminal a is in FOLLOW(A) if S ⇒* αAaβ.
  $ is in FOLLOW(A) if S ⇒* αA.

Compute FIRST for Any String X


If X is a terminal symbol: FIRST(X) = {X}.
If X is a non-terminal symbol and X → ε is a production rule: ε is in FIRST(X).
If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule:
  if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1,...,i-1, then a is in FIRST(X);
  if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in FIRST(X).
If X is ε: FIRST(X) = {ε}.
If X is a string Y1Y2..Yn:
  if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1,...,i-1, then a is in FIRST(X);
  if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in FIRST(X).

FIRST Example
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

FIRST(F) = {(, id}
FIRST(T′) = {*, ε}
FIRST(T) = {(, id}
FIRST(E′) = {+, ε}
FIRST(E) = {(, id}

FIRST(TE′) = {(, id}
FIRST(+TE′) = {+}
FIRST(ε) = {ε}
FIRST(FT′) = {(, id}
FIRST(*FT′) = {*}
FIRST(ε) = {ε}
FIRST((E)) = {(}
FIRST(id) = {id}
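The FIRST rules amount to a fixed-point computation, which can be sketched as below. The grammar encoding (a dict of alternatives, primes written as `'`) and the names `first_of_string` and `compute_first` are assumptions made for this sketch.

```python
EPSILON = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string Y1..Yn, given the FIRST sets of the non-terminals."""
    result = set()
    for symbol in symbols:
        # FIRST of a terminal (or of ε) is the symbol itself.
        symbol_first = first.get(symbol, {symbol})
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result              # Yi cannot vanish, so stop here
    result.add(EPSILON)                # every Yi can derive ε
    return result


def compute_first(grammar):
    """Apply the rules repeatedly until no FIRST set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```

`first_of_string` also covers right-hand sides such as TE′ or *FT′, which is what table construction needs.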

Compute FOLLOW (for non-terminals)


If S is the start symbol: $ is in FOLLOW(S).
If A → αBβ is a production rule: everything in FIRST(β) except ε is in FOLLOW(B).
If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)): everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any FOLLOW set.

FOLLOW Example
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

FOLLOW(E) = { $, ) }
FOLLOW(E′) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T′) = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
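The FOLLOW rules can be sketched in the same fixed-point style. To keep the sketch short, the FIRST sets are copied from the FIRST example rather than recomputed; the encoding and the names `first_of_string` and `compute_follow` are assumptions.

```python
EPSILON = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, taken from the FIRST example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols; the empty string gives {ε}."""
    result = set()
    for symbol in symbols:
        symbol_first = FIRST.get(symbol, {symbol})
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                           # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, symbol in enumerate(alt):
                    if symbol not in grammar:        # only non-terminals have FOLLOW
                        continue
                    tail_first = first_of_string(alt[i + 1:])
                    new = tail_first - {EPSILON}     # rule 2
                    if EPSILON in tail_first:        # rule 3
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```

The loop repeats until no FOLLOW set changes, which is the "until nothing more can be added" condition.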

Constructing LL(1) Parsing Table -- Algorithm


For each production rule A → α of a grammar G:
  for each terminal a in FIRST(α), add A → α to M[A,a];
  if ε is in FIRST(α), for each terminal a in FOLLOW(A), add A → α to M[A,a];
  if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].

All other undefined entries of the parsing table are error entries.

Constructing LL(1) Parsing Table -- Example


E → TE′       FIRST(TE′) = {(, id}     E → TE′ into M[E,(] and M[E,id]
E′ → +TE′     FIRST(+TE′) = {+}        E′ → +TE′ into M[E′,+]
E′ → ε        FIRST(ε) = {ε}           none, but since ε is in FIRST(ε) and FOLLOW(E′) = {$, )}:
                                       E′ → ε into M[E′,$] and M[E′,)]
T → FT′       FIRST(FT′) = {(, id}     T → FT′ into M[T,(] and M[T,id]
T′ → *FT′     FIRST(*FT′) = {*}        T′ → *FT′ into M[T′,*]
T′ → ε        FIRST(ε) = {ε}           none, but since ε is in FIRST(ε) and FOLLOW(T′) = {$, ), +}:
                                       T′ → ε into M[T′,$], M[T′,)] and M[T′,+]
F → (E)       FIRST((E)) = {(}         F → (E) into M[F,(]
F → id        FIRST(id) = {id}         F → id into M[F,id]
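The table-construction algorithm can be sketched directly from these values. In this sketch the FIRST set of each right-hand side and the FOLLOW sets are copied from the examples above; the names `PRODUCTIONS`, `FOLLOW`, and `build_table` are assumptions. Each entry is a list so that multiply-defined entries would be visible.

```python
EPSILON = "ε"

# (head, body, FIRST(body)) for each production, copied from the example.
PRODUCTIONS = [
    ("E",  ["T", "E'"],      {"(", "id"}),
    ("E'", ["+", "T", "E'"], {"+"}),
    ("E'", [EPSILON],        {EPSILON}),
    ("T",  ["F", "T'"],      {"(", "id"}),
    ("T'", ["*", "F", "T'"], {"*"}),
    ("T'", [EPSILON],        {EPSILON}),
    ("F",  ["(", "E", ")"],  {"("}),
    ("F",  ["id"],           {"id"}),
]
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def build_table(productions, follow):
    """Return M as a dict: (non-terminal, terminal) -> list of bodies."""
    table = {}
    for head, body, body_first in productions:
        lookaheads = body_first - {EPSILON}       # terminals in FIRST(α)
        if EPSILON in body_first:
            lookaheads |= follow[head]            # FOLLOW(A), which may contain $
        for a in sorted(lookaheads):
            table.setdefault((head, a), []).append(" ".join(body))
    return table


TABLE = build_table(PRODUCTIONS, FOLLOW)
for (head, a), bodies in sorted(TABLE.items()):
    print(f"M[{head},{a}] = " + ", ".join(f"{head} -> {b}" for b in bodies))
```

Because $ is stored in the FOLLOW sets, the separate M[A,$] step of the algorithm is covered by the same union.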


LL(1) Grammars
A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar.

LL(1)
  first L: input scanned from left to right
  second L: leftmost derivation
  1: one input symbol used as a lookahead symbol to determine the parser action

An entry of the parsing table of a grammar may contain more than one production rule. In this case, we say that the grammar is not an LL(1) grammar.


A Grammar which is not LL(1)


S → iCtSE | a
E → eS | ε
C → b

FIRST(iCtSE) = {i}
FIRST(a) = {a}
FIRST(eS) = {e}
FIRST(ε) = {ε}
FIRST(b) = {b}

FOLLOW(S) = { $, e }
FOLLOW(E) = { $, e }
FOLLOW(C) = { t }

      a         b         e                 i              t      $
S     S → a                                 S → iCtSE
E                         E → eS, E → ε                           E → ε
C               C → b

There are two production rules for M[E,e].
Problem: ambiguity.

A Grammar which is not LL(1) (cont.)


What do we have to do if the resulting parsing table contains multiply-defined entries?
  If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
  If the grammar is not left factored, we have to left factor the grammar.
  If the new grammar's parsing table still contains multiply-defined entries, that grammar is ambiguous or it is inherently not an LL(1) grammar.

A left-recursive grammar cannot be an LL(1) grammar.
  A → Aα | β
  Any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα.
  If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A).

A grammar that is not left factored cannot be an LL(1) grammar.
  A → αβ1 | αβ2
  Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).

An ambiguous grammar cannot be an LL(1) grammar.

Properties of LL(1) Grammars


A grammar G is LL(1) if and only if the following conditions hold for any two distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
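The three conditions can be checked pairwise once the FIRST set of each alternative and FOLLOW(A) are known. The sketch below assumes those sets are supplied by the caller; the function name `ll1_violations` and the message wording are made up for illustration.

```python
EPSILON = "ε"


def ll1_violations(alternatives, follow):
    """Check the three LL(1) conditions for one non-terminal A.

    `alternatives` maps each right-hand side (as a string) to its FIRST set;
    `follow` is FOLLOW(A). Returns a list of human-readable violations.
    """
    problems = []
    items = list(alternatives.items())
    for i, (alpha, first_alpha) in enumerate(items):
        for beta, first_beta in items[i + 1:]:
            shared = (first_alpha & first_beta) - {EPSILON}
            if shared:                                           # condition 1
                problems.append(f"{alpha} / {beta}: both start with {sorted(shared)}")
            if EPSILON in first_alpha and EPSILON in first_beta:  # condition 2
                problems.append(f"{alpha} / {beta}: both derive ε")
            # Condition 3, checked in both directions.
            for nullable_first, other, other_first in (
                (first_beta, alpha, first_alpha),
                (first_alpha, beta, first_beta),
            ):
                clash = (other_first - {EPSILON}) & follow
                if EPSILON in nullable_first and clash:
                    problems.append(f"{other}: starts with {sorted(clash)} from FOLLOW")
    return problems


# E -> eS | ε from the grammar that is not LL(1): FOLLOW(E) = {$, e}.
print(ll1_violations({"eS": {"e"}, "ε": {EPSILON}}, {"$", "e"}))
# E' -> +TE' | ε from the expression grammar: FOLLOW(E') = {$, )}.
print(ll1_violations({"+TE'": {"+"}, "ε": {EPSILON}}, {"$", ")"}))
```

The first call reports one violation of condition 3, which corresponds to the two rules in M[E,e]; the second call reports none.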

Error Recovery in Predictive Parsing


An error may occur in predictive parsing (LL(1) parsing)
  if the terminal symbol on the top of the stack does not match the current input symbol;
  if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty.

What should the parser do in an error case?
  The parser should give an error message that is as meaningful as possible.
  It should be able to recover from the error and continue parsing the rest of the input.

Error Recovery Techniques


Panic-Mode Error Recovery
Skipping the input symbols until a synchronizing token is found.

Phrase-Level Error Recovery


Each empty entry in the parsing table is filled with a pointer to a specific error routine that takes care of that error case.

Error-Productions
If we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate erroneous constructs. When an error production is used by the parser, we can generate appropriate error diagnostics. Since it is almost impossible to know all the errors that can be made by the programmers, this method is not practical.

Global-Correction
Ideally, we would like a compiler to make as few changes as possible when processing incorrect inputs. We have to analyze the input globally to find the error. This is an expensive method, and it is not used in practice.

Panic-Mode Error Recovery in LL(1) Parsing


In panic-mode error recovery, we skip all the input symbols until a synchronizing token is found.

What is the synchronizing token?
  All the terminal symbols in the FOLLOW set of a non-terminal can be used as the synchronizing token set for that non-terminal.

So, a simple panic-mode error recovery for LL(1) parsing:
  All the empty entries are marked as synch, to indicate that the parser will skip all the input symbols until it finds a symbol in the FOLLOW set of the non-terminal A that is on the top of the stack. Then the parser pops that non-terminal A from the stack. Parsing continues from that state.
  To handle an unmatched terminal symbol, the parser pops that terminal from the stack and issues an error message saying that the unmatched terminal was inserted.

Adding Synchronizing tokens in panic mode recovery

      id          +             *             (           )          $
E     E → TE′                                 E → TE′     synch      synch
E′                E′ → +TE′                               E′ → ε     E′ → ε
T     T → FT′     synch                       T → FT′     synch      synch
T′                T′ → ε        T′ → *FT′                 T′ → ε     T′ → ε
F     F → id      synch         synch         F → (E)     synch      synch

If M[A,a] is blank, skip the input symbol a.
If M[A,a] is synch, pop the non-terminal from the top of the stack and continue parsing.
If a terminal on the top of the stack does not match the current token, pop the terminal from the stack.
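These three rules can be sketched on top of a table-driven driver. This is an illustration under assumptions: the table is a Python dict, the synch entries are filled in from the FOLLOW sets, and input left over when only $ remains on the stack is skipped, which the slides do not specify. The names `TABLE`, `SYNCH`, and `parse_with_recovery` are hypothetical.

```python
EPSILON, SYNCH = "ε", "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPSILON], ("E'", "$"): [EPSILON],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPSILON], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPSILON], ("T'", "$"): [EPSILON],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
# Synch entries from FOLLOW(E) = {),$}, FOLLOW(T) = {+,),$}, FOLLOW(F) = {+,*,),$}.
for nt, follow in (("E", ")$"), ("T", "+)$"), ("F", "+*)$")):
    for a in follow:
        TABLE[(nt, a)] = SYNCH


def parse_with_recovery(tokens):
    """Parse `tokens` and return the list of error messages (empty if none)."""
    tokens = list(tokens) + ["$"]
    stack, pos, errors = ["$", "E"], 0, []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == "$":
            if a == "$":
                return errors
            errors.append(f"skipped {a!r}")        # assumption: drop leftover input
            pos += 1
        elif top not in NONTERMINALS:
            if top == a:
                pos += 1                           # normal match
            else:
                errors.append(f"inserted {top!r}")  # mismatch: pop the terminal
            stack.pop()
        else:
            entry = TABLE.get((top, a))
            if entry is None:
                errors.append(f"skipped {a!r}")    # blank entry: skip the input symbol
                pos += 1
            elif entry == SYNCH:
                errors.append(f"popped {top}")     # synch entry: pop the non-terminal
                stack.pop()
            else:
                stack.pop()
                stack.extend(reversed([s for s in entry if s != EPSILON]))


print(parse_with_recovery(["id", "*", "+", "id"]))  # ['popped F']
```

On the input id * + id the parser reaches F with + as the current token, finds synch, pops F, and then finishes the parse with a single reported error.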

Phrase-Level Error Recovery


Each empty entry in the parsing table is filled with a pointer to a special error routine that takes care of that error case.
These error routines may:
  change, insert, or delete input symbols;
  issue appropriate error messages;
  pop items from the stack.

We should be careful when we design these error routines, because we may put the parser into an infinite loop.
