Module-3 Syntax Analyzer
2.1 Introduction
Every programming language, such as C or Pascal, has rules that prescribe the syntactic
structure of well-formed programs. The syntax of programming language constructs can
be described by a context-free grammar (CFG) or in BNF (Backus-Naur Form) notation. The
parser determines the syntactic structure of a program; that is, it checks whether the
input is syntactically correct or not. Before proceeding further, let us see what a
context-free grammar is, what a derivation is, and some other important terms that are
used in the coming chapters.
Definition: A context-free grammar, in short a CFG, is a 4-tuple G = (V, T, P, S) where
V is a set of variables. The variables are also called non-terminals.
T is a set of terminals.
P is a set of productions. All productions in P are of the form A → α where A is a non-
terminal and α is a string of grammar symbols.
S is the start symbol.
2.3 Derivation
Now, let us see “What is derivation?”
Definition: The process of obtaining a string of terminals and/or non-terminals from the
start symbol by applying some set of productions (it may include all productions) is
called derivation.
For example, if A → αB and B → β are the productions, the string αβ can be
obtained from the A-production as shown below:
A ⇒ αB [Apply the production A → αB]
  ⇒ αβ [Replace B by using the production B → β]
The above derivation can also be written as shown below:
A ⇒* αβ
Observe the following points:
If a string is obtained by applying only one production, then it is called a one-step
derivation and is denoted by the symbol ⇒.
If one or more productions are applied to get the string ω from A, then we write
A ⇒+ ω
If zero or more productions are applied to get the string ω from A, then we write
A ⇒* ω
Example 2.1: Consider the grammar shown below from which any arithmetic
expression can be obtained.
E → E+E
E → E-E
E → E*E
E → E/E
E → id
Obtain the string id + id * id and show the derivation for the same.
E ⇒ E + E
  ⇒ id + E
  ⇒ id + E * E
  ⇒ id + id * E
  ⇒ id + id * id
Thus, the above sequence of steps can also be written as:
E ⇒+ id + id * id
which indicates that the string id + id * id is obtained in one or more steps by applying
various productions.
Now, let us see “What are the two types of derivations?” The two types of derivations
are:
Leftmost derivation
Rightmost derivation
Consider the following grammar:
E → E+E
E → E*E
E → (E)
E → id
The leftmost derivation for the string id + id * id can be obtained as shown below:
E ⇒lm E + E
  ⇒lm id + E
  ⇒lm id + E * E
  ⇒lm id + id * E
  ⇒lm id + id * id
Again consider the same grammar:
E → E+E
E → E*E
E → (E)
E → id
The rightmost derivation for the string id + id * id can be obtained as shown below:
E ⇒rm E + E
  ⇒rm E + E * E
  ⇒rm E + E * id
  ⇒rm E + id * id
  ⇒rm id + id * id
2.4 Sentence
Consider the following leftmost derivation:
E ⇒ E + E
  ⇒ id + E
  ⇒ id + E * E
  ⇒ id + id * E
  ⇒ id + id * id
The final string of terminals i.e., id + id * id is called sentence of the grammar.
Now, let us see "What are the different sentential forms?" The two sentential forms are:
Left sentential form
Right sentential form
Definition: If there is a derivation of the form S ⇒* α, where at each step in the derivation
process only the leftmost variable is replaced, then α is called a left-sentential form of G.
For example, in the leftmost derivation of id + id * id shown earlier, the strings of
grammar symbols obtained in each step, namely
{ E + E, id + E, id + E * E, id + id * E, id + id * id }
are various left-sentential forms of the given grammar.
Definition: If there is a derivation of the form S ⇒* α, where at each step in the derivation
process only the rightmost non-terminal is replaced, then α is called a right-sentential form
of G.
For example, consider the rightmost derivation of id + id * id shown earlier.
In the above rightmost derivation, the string of grammar symbols obtained in each step
such as:
{ E + E, E + E * E, E + E * id, E + id * id, id + id * id }
are various right-sentential forms of the given grammar.
Example 2.2: Obtain the leftmost derivation for the string aaabbabbba using the
following grammar.
S → aB | bA
A → aS | bAA | a
B → bS | aBB | b
Solution: The leftmost derivation for the string aaabbabbba is:
S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaabBB ⇒ aaabbB ⇒ aaabbaBB ⇒ aaabbabB ⇒ aaabbabbS
  ⇒ aaabbabbbA ⇒ aaabbabbba
Now, let us see "What is the language generated by a grammar?" The formal definition of
the language generated by a grammar G is:
L(G) = { w | w ∈ T* and S ⇒* w }
i.e., w is a string of terminals obtained from the start symbol S by applying various
productions.
For example, for the grammar A → a | aA the various strings that are generated are a, aa,
aaa, ……and so on.
The derivation can be shown in the form of a tree. Such trees are called derivation or
parse trees. The leftmost derivation as well as the rightmost derivation can be
represented using derivation trees. Now, let us see “What is derivation tree or parse tree?”
The derivation tree can be defined as shown below.
Definition: Let G = (V, T, P, S) be a CFG. A tree is a derivation tree (parse tree) with
the following properties:
1. The root has the label S.
2. Every vertex has a label which is in (V ∪ T ∪ {ϵ}).
3. Every leaf node has a label from T (or ϵ) and every interior vertex has a label from V.
4. If a vertex is labeled A and if X1, X2, X3, …, Xn are all children of A from the left,
then A → X1X2X3…Xn must be a production in P.
For example, consider the following grammar and its rightmost derivation along with
parse tree:
E → E+E | E*E | (E) | id

E ⇒rm E + E ⇒rm E + E * E ⇒rm E + E * id ⇒rm E + id * id ⇒rm id + id * id

E
├── E ── id
├── +
└── E
    ├── E ── id
    ├── *
    └── E ── id
Now, let us see "What is the yield of a tree?" The yield of a tree can be formally
defined as follows:
Definition: The yield of a tree is the string of symbols obtained by reading only the
leaves of the tree from left to right, without considering the ϵ-symbols. The yield is
always derived from the root, and the yield of the tree is always a terminal string.
For example, consider the derivation tree (or parse tree) shown below:
E
├── E ── id
├── +
└── E
    ├── E ── id
    ├── *
    └── E ── id
If we read only the terminal symbols in the above parse tree from left to right we get id +
id * id and id + id * id is the yield of the given parse tree.
Example 2.3: Consider the following grammar from which an arithmetic expression can
be obtained:
E → E+E
E → E-E
E → E*E
E → E/E
E → (E) | I
I → id
Show that the grammar is ambiguous.
Solution: The sentence id + id * id can be obtained from leftmost derivation in two
ways as shown below.
Leftmost derivation 1:            Leftmost derivation 2:
E ⇒ E + E                         E ⇒ E * E
  ⇒ id + E                          ⇒ E + E * E
  ⇒ id + E * E                      ⇒ id + E * E
  ⇒ id + id * E                     ⇒ id + id * E
  ⇒ id + id * id                    ⇒ id + id * id
The corresponding derivation trees for the two leftmost derivations are shown below:

Derivation tree 1:
E
├── E ── id
├── +
└── E
    ├── E ── id
    ├── *
    └── E ── id

Derivation tree 2:
E
├── E
│   ├── E ── id
│   ├── +
│   └── E ── id
├── *
└── E ── id
Since the two parse trees are different for the same sentence id + id * id by applying
leftmost derivation, the grammar is ambiguous.
A similar situation arises with the grammar for the if-statement: there are two different
parse trees for the string "ibtibtaea" using leftmost derivation, so that grammar is also
ambiguous and has two interpretations or two meanings. This ambiguity is due to the
dangling-else problem. The dangling-else problem can be eliminated, and thus the
ambiguity of the grammar can also be eliminated.
Now, let us see “What is dangling else problem?” Consider the following grammar:
S → iCtS | iCtSeS | a
C → b
where
i stands for the keyword if
C stands for the condition to be satisfied. Here C is a non-terminal
t stands for the keyword then
S stands for a statement. Here S is a non-terminal
e stands for the keyword else
a stands for any other statement
b stands for a condition
Since the above grammar is ambiguous, we get two different parse trees for the string
ibtibtaea (Look at solution for previous problem for details) as shown below:
Parse tree 1 (else attached to the second if):
S
├── i
├── C ── b
├── t
└── S
    ├── i
    ├── C ── b
    ├── t
    ├── S ── a
    ├── e
    └── S ── a

Parse tree 2 (else attached to the first if):
S
├── i
├── C ── b
├── t
├── S
│   ├── i
│   ├── C ── b
│   ├── t
│   └── S ── a
├── e
└── S ── a
Since there are two parse trees for the same string ibtibtaea, the given grammar is
ambiguous. Observe the following points:
The first parse tree associates else with the second if-statement.
The second parse tree associates else with the first if-statement.
This ambiguity whether to associate else with first if-statement or second if-statement is
called “dangling else problem”.
Now, let us see “How dangling else problem can be solved?” The dangling else problem
can be solved by constructing unambiguous grammar as shown below:
Step 1: The matched statement M is either an if-then-else statement in which the
statements before and after the else keyword are both matched, or any other statement:
M → iCtMeM | a
Step 2: The unmatched statement U is an if-statement without an else, or an if-then-else
statement whose else part is unmatched:
U → iCtS
U → iCtMeU
Step 3: A statement S is either matched or unmatched:
S → M | U
So, the final unambiguous grammar is:
S → M | U
M → iCtMeM | a
U → iCtS | iCtMeU
C → b
Observe that the above grammar associates each else with the closest unmatched then and
eliminates the ambiguity from the grammar.
Example 2.6: Convert the following ambiguous grammar into unambiguous grammar
E → E*E|E-E
E → E^E|E/E
E → E+E
E → (E) | id
The grammar can be converted into unambiguous grammar using the precedence of
operators as well as associativity operators as shown below:
Step 1: Arrange the operators in increasing order of precedence along with their
associativity as shown below:

Operators    Associativity    Non-terminal used
+, –         LEFT             E
*, /         LEFT             T
^            RIGHT            P

Since there are three levels of precedence, we associate three non-terminals: E, T and P.
Also use an extra non-terminal F, generating the basic units in an arithmetic expression.
Step 2: The basic units in expression are id (identifier) and parenthesized expressions.
The production corresponding to this can be written as:
F → (E) | id
Step 3: The next highest priority operator is ^ and it is right associative. So, the
production must start from the non-terminal P and it should have right recursion as shown
below:
P→F^P|F
Step 4: The next highest priority operators are * and / and they are left associative. So,
the production must start from the non-terminal T and it should have left recursion as
shown below:
T→T*P|T/P|P
Step 5: The next highest priority operators are + and – and they are left associative. So,
the production must start from the non-terminal E and it should have left recursion as
shown below:
E→E+T|E–T |T
Step 6: The final grammar which is unambiguous can be written as shown below:
E→E+T|E–T |T
T→T*P|T/P|P
P→F^P|F
F → (E) | id
Example 2.7: Convert the following ambiguous grammar into unambiguous grammar
E→E+E
E→E–E
E→E^E
E→E*E
E→E/E
E → (E) | id
by considering * and – as the lowest-priority operators (left associative), / and + as the
highest-priority operators (right associative), and ^ as an operator whose precedence lies
in between (left associative).
The grammar can be converted into unambiguous grammar using the precedence of
operators as well as associativity operators as shown below:
Step 1: Arrange the operators in increasing order of precedence along with their
associativity as shown below:

Operators    Associativity    Non-terminal used
*, –         LEFT             E
^            LEFT             P
+, /         RIGHT            T

Since there are three levels of precedence, we associate three non-terminals: E, P and T.
Also use an extra non-terminal F generating the basic units in an arithmetic expression.
Step 2: The basic units in expression are id (identifier) and parenthesized expressions.
The production corresponding to this can be written as:
F → (E) | id
Step 3: The next highest priority operators are + and / and they are right associative. So,
the production must start from the non-terminal T and it should be right recursive in RHS
of the production as shown below:
T→F+T|F/T|F
Step 4: The next highest priority operator is ^ and it is left associative. So, the production
must start from the non-terminal P and it should be left recursive in RHS of the
production as shown below:
P→P^T|T
Step 5: The next (lowest) priority operators are * and – and they are left associative. So,
the production must start from the non-terminal E and it should have left recursion as
shown below:
E → E * P | E – P | P
Step 6: The final grammar which is unambiguous can be written as shown below:
E → E * P | E – P | P
P → P ^ T | T
T → F + T | F / T | F
F → (E) | id
Figure: position of the parser in a compiler. The lexical analyzer reads the source
program and supplies a token to the syntax analyzer each time the parser requests the
next token (get next token). The syntax analyzer builds the parse tree, which is passed
to the rest of the phases. Both phases interact with the symbol table and report to the
error handler.
The role of the parser, or the various activities performed by the parser, are shown
below:
The parser reads a sequence of tokens from the lexical analyzer.
The parser checks whether the sequence of tokens obtained from the lexical analyzer can
be generated by the grammar of the source language. This is done by obtaining a
derivation for the sequence of tokens and building the parse tree.
If a derivation is obtained using the sequence of tokens, the program is syntactically
correct and the parse tree is generated.
If a derivation is not obtained, the program is syntactically wrong and the parse tree is
not generated. The parser then reports appropriate error messages clearly and accurately,
along with line numbers.
The parser also recovers from each error quickly so that subsequent errors can be
detected and displayed and the user can correct the program.
The following activities are performed whenever errors are detected by the parser:
Detect the syntax errors accurately and produce appropriate error messages so that the
programmer can correct the program.
It has to recover from the errors quickly and detect subsequent errors in the program.
The error handler should take all these actions very fast and should not slow down the
compilation process.
Now, let us see “What are error recovery strategies of the parser (or syntax analyzer)?”
The various error recovery techniques are:
Panic mode recovery
Error productions
Phrase level recovery
Global correction
Panic Mode Recovery: It is the simplest and most popular error recovery method. When
an error is detected, the parser discards symbols one at a time until next valid token
(called synchronizing token) is found. The typical synchronizing tokens are:
statement terminators such as semicolon
Expression terminators such as \n
It often skips a considerable amount of input without checking it for additional errors.
Once the synchronizing token is found, the parser continues from that point onwards to
identify subsequent errors. In situations where multiple errors occur in the same
statement, this method is not useful.
For example, consider the input: (5 ** 2) + 8
The parser scans the input from left to right and finds no mistake after reading (, 5
and *.
After reading the second *, it knows that no expression has consecutive * operators
and it displays an error “Extra * in the input”
Now, it has to recover from the error. In panic mode recovery, it skips all input
symbols till the next integer 2 is encountered. Here, 2 is the synchronizing token.
Thus, the error is detected and recovered from using panic mode recovery.
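A tiny runnable sketch of this skipping behaviour in C (purely illustrative: it scans
single characters and uses ';' as the only synchronizing token, which is an assumption
for this sketch, not the book's choice):

    #include <stdio.h>

    int main(void) {
        const char *input = "x = 5 ** 2 ; y = 8 ;";
        for (const char *p = input; *p; p++) {
            if (p[0] == '*' && p[1] == '*') {      /* error detected */
                printf("error: extra * in the input\n");
                while (*p && *p != ';')            /* skip to synch token */
                    p++;
                if (!*p) break;
                printf("recovered at ';'\n");      /* resume from here */
            }
        }
        return 0;
    }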
Error Productions: In this strategy, the grammar of the source language is augmented
with extra productions, called error productions, that generate the erroneous constructs
which programmers commonly write. When the parser uses an error production, it can issue
an appropriate diagnostic message and continue parsing.
Disadvantages
Can resolve many errors, but not all potential errors.
The introduction of error productions will complicate the grammar.
Phrase Level Recovery: On discovering an error, the parser performs a local correction
on the remaining input, for example replacing a comma by a semicolon, deleting an
extraneous semicolon, or inserting a missing semicolon, and then continues parsing.
Disadvantages
Very difficult to implement.
Slows down the parsing of correct programs.
Proper care must be taken while choosing replacements as they may lead to infinite
loops.
Global Correction: This is also one of the error correction strategies. The various points
to remember in this error correction method are:
These methods replace incorrect input with correct input using least-cost-correction
algorithms.
These algorithms take an incorrect input string x and grammar G, and find a parse
tree for a related string y, such that the number of insertions, deletions and changes of
tokens required to transform x into y is as small as possible.
These methods are costly to implement in terms of time and space and hence are only
of theoretical interest.
In this section, let us see "What are the different types of parsers?" Parsers are
broadly classified into top-down parsers and bottom-up parsers. First, consider top-down
parsing.
Definition: The process of constructing a parse tree for the string of tokens (obtained
from the lexical analyzer) from the top, i.e., starting from the root node and creating
the nodes of the parse tree in preorder, in a depth-first manner, is called the top-down
parsing technique. Thus, top-down parsing can be viewed as an attempt to find a leftmost
derivation for an input string and to construct the parse tree for that derivation. The
parser that uses this approach is called a top-down parser. Since parsing starts from the
top (i.e., the root) and proceeds down to the leaves, it is called a top-down parser.
Example 2.8: Show the top-down parsing process for the string id + id * id for the
grammar
E→E+E
E→E*E
E → (E)
E → id
E ⇒ E + E (fig. a) step 1
  ⇒ id + E (fig. b) step 2
  ⇒ id + E * E (fig. c) step 3
  ⇒ id + id * E (fig. d) step 4
  ⇒ id + id * id (fig. e) step 5
The above derivation can be written in the form of a parse tree, grown from the start
symbol using the top-down approach:
(Fig. a) E → E + E is applied at the root.
(Fig. b) The leftmost E is replaced by id.
(Fig. c) The remaining E is expanded using E → E * E.
(Fig. d) The leftmost of the two new E's is replaced by id.
(Fig. e) The last E is replaced by id.
(Fig. f) The completed parse tree, whose yield is id + id * id:

E
├── E ── id
├── +
└── E
    ├── E ── id
    ├── *
    └── E ── id
For example, the procedure for the production A → α can be written as shown below:
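Although the procedure is left abstract here, a minimal runnable sketch in C may help,
assuming a hypothetical production A → aBc (with B → b), single-character tokens, and
illustrative helper names lookahead, match and error:

    #include <stdio.h>
    #include <stdlib.h>

    static const char *input;          /* remaining input */
    static char lookahead;             /* current input symbol */

    static void error(void) {
        fprintf(stderr, "syntax error at '%c'\n", lookahead);
        exit(1);
    }

    /* match(): compare the expected terminal t with the current input
       symbol; on a match, increment the input pointer (get next token). */
    static void match(char t) {
        if (lookahead == t)
            lookahead = *++input;
        else
            error();
    }

    static void B(void) { match('b'); }    /* procedure for B -> b */

    /* Procedure for the hypothetical production A -> a B c: a terminal
       becomes a match() call, a non-terminal becomes a procedure call. */
    static void A(void) {
        match('a');
        B();
        match('c');
    }

    int main(void) {
        input = "abc";
        lookahead = *input;
        A();
        printf(lookahead == '\0' ? "parsed\n" : "trailing input\n");
        return 0;
    }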
To match a terminal a, we compare the current input symbol with a. If there is a match,
it is syntactically correct, and we increment the input pointer to get the next token.
If the current input symbol is not a, it is syntactically wrong and an appropriate error
message is displayed.
Some error correction may be done so as to recover from each error quickly, so that
subsequent errors can be detected and displayed and the user can correct the program.
The general procedure for recursive-descent parsing is shown below.
Example 2.9: Algorithm for a recursive descent parser (backtracking is not supported)
For each non-terminal A, write a procedure A(). To expand A using a production
A → X1X2…Xk (chosen based on the current input symbol), the body of A() processes the
symbols X1, X2, …, Xk in order:
1) If Xi is a non-terminal, call the procedure Xi().
2) If Xi is a terminal, compare it with the current input symbol; if they match, advance
the input pointer, otherwise report an error.
Now, let us write the recursive parsers for some of the grammars.
Example 2.10: Write the recursive descent parser for the following grammar
E→T
T→F
F → (E) | id
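A possible solution sketched in C, under the same single-character-token assumption
('i' stands for the token id; the helper names are illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    static const char *input;
    static char lookahead;

    static void error(void) { printf("syntax error\n"); exit(1); }

    static void match(char t) {
        if (lookahead == t) lookahead = *++input;  /* advance pointer */
        else error();
    }

    static void E(void);                 /* forward declaration */

    static void F(void) {                /* F -> (E) | id */
        if (lookahead == '(') { match('('); E(); match(')'); }
        else if (lookahead == 'i') match('i');
        else error();
    }
    static void T(void) { F(); }         /* T -> F */
    static void E(void) { T(); }         /* E -> T */

    int main(void) {
        input = "(i)";                   /* the string (id) */
        lookahead = *input;
        E();
        if (lookahead == '\0') printf("parsed successfully\n");
        else error();
        return 0;
    }

Note that each production maps directly onto a procedure body: a terminal becomes a
match() call and a non-terminal becomes a call to its procedure.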
Now, let us see “What are the different types of recursive descent parsers?” The recursive
descent parser can be classified into two types:
Recursive descent parser with backtracking
Recursive descent parser without backtracking (predictive parser)
Now, let us see “What is the need for backtracking in recursive descent parser?” The
backtracking is necessary for the following reasons:
During parsing, the productions are applied one by one. But, if two or more
alternative productions exist, they are tried in order from left to right, one at a
time. When the chosen production fails to match the input, the parser must undo the
work done with it and try the next alternative; this undoing requires backtracking.
However, recursive descent parsers with backtracking are not frequently used, so we just
concentrate on how they work with an example.
Example 2.11: Show the steps involved in recursive descent parser with backtracking for
the input string cad for the following grammar
S → cAd
A → ab | a
Solution: While parsing the string we keep track of three parts: the partially
constructed parse tree, the input string, and the input pointer.
Step 1: The only unexplored node is S, and we apply the production S → cAd to expand
the non-terminal S:

S
├── c
├── A
└── d

Input: c a d, with the input pointer on c. The leaf c matches the current input symbol,
so match(c) and increment the input pointer.
Step 2: Now, the next node to be expanded is A, and the input pointer points to a:

S
├── c
├── A
└── d

Input: c a d, with the input pointer on a.
2.24 Syntax Analyzer
Step 3: Since there are two productions from A, the first production A → ab is selected
and the non-terminal A is expanded:

S
├── c
├── A
│   ├── a
│   └── b
└── d

The leaf a matches the current input symbol, so match(a) and increment the input
pointer. But the next leaf b does not match the remaining input d.
Step 4: Observe that by selecting A → ab the input string is not matched. So, we have to
reset the input pointer to the symbol a (so as to get back the parse tree shown in
step 2). This is done using backtracking (fig. a). After backtracking, we try expanding
A using the second production A → a and proceed with the comparison (fig. b):

S
├── c
├── A ── a
└── d

The leaf a matches, so match(a) and increment the input pointer.
Step 5: Now, the next symbol d in grammar is compared with d in the input and they
match. Finally, we halt and announce successful completion of parsing.
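The same steps can be expressed as a compact runnable C sketch (the names are
illustrative; each alternative returns success or failure, and the saved input position
realizes the backtracking):

    #include <stdio.h>

    /* Recursive descent with backtracking for S -> cAd, A -> ab | a */

    static const char *input;
    static int pos;                          /* input pointer */

    static int term(char c) {                /* match one terminal */
        return input[pos] == c ? (pos++, 1) : 0;
    }

    static int A(void) {
        int save = pos;                      /* remember the position */
        if (term('a') && term('b')) return 1;    /* try A -> ab */
        pos = save;                          /* backtrack */
        return term('a');                    /* then try A -> a */
    }

    static int S(void) { return term('c') && A() && term('d'); }

    int main(void) {
        input = "cad"; pos = 0;
        if (S() && input[pos] == '\0') printf("cad parsed successfully\n");
        else printf("syntax error\n");
        return 0;
    }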
Now, let us see “For what type of grammars recursive descent parser cannot be
constructed? What is the solution?”
Immediate left recursion: A grammar G is said to have immediate left recursion if it has
a production of the form:
A → Aα
For example, consider the following grammar:
E→E+T|T
T→T*F|F
F → (E) | id
In the above grammar consider the first two productions:
E→E+T
T→T*F
Observe that in the above two productions, the first symbol on the right hand side of the
production is same as the symbol on the left hand side of the production. So, the given
grammar has immediate left recursion in two productions.
Indirect left recursion: A left recursion involving derivations of two or more steps so that
the first symbol on the right hand side of the partial derivation is same as the symbol
from which the derivation started is called indirect left recursion. For example, consider
the following grammar:
E→T
T→F
F → E + T | id
Now, let us write the recursive descent parser for the grammar having left recursion.
Example 2.12: Consider the production E → E + T. Write the recursive descent parser.
Solution: The recursive descent parser for the production E → E + T can be written as
shown below:

E()
{
    E();
    if (input_symbol == '+')
        advance input pointer
    else
        error
    end if
    T();
}
Now, let us see “What is the problem in constructing recursive descent parser for the
grammar having left recursion?” Observe the following points (with respect to the above
procedure which has left recursion)
When a procedure is invoked, the parameter values along with the return address are
pushed onto the stack, and hence the free space on the stack decreases.
The procedure E() is called recursively forever without consuming any input, and
hence the stack grows very fast and will soon be full.
Since there is no space left on the stack to push parameter values and the return
address, the program crashes.
So, a recursive descent parser built from a left-recursive grammar can go into
infinite recursion, eventually crashing the system, and hence a left-recursive grammar
is not suitable for a recursive descent parser. We therefore have to eliminate left
recursion from the grammar and then parse the string.
Consider the left-recursive productions A → Aα | β. Applying the productions repeatedly,
A ⇒ Aα ⇒ Aαα ⇒ … ⇒ βα…α, so the language generated is
{ β, βα, βαα, βααα, …… }
Observe from the above derivations that the language L consists of β followed by zero or
more α's:
L = { βα^i | i ≥ 0 }
The same language can be represented and generated using a different grammar as
shown below:
A → βA'     where A' should generate zero or more α's
From A' we can get zero or more α's using the following productions:
A' → ϵ | αA'
So the final grammar, which generates β followed by zero or more α's and does not have
left recursion, is shown below:
A → βA'
A' → ϵ | αA'
Thus, a grammar which has left recursion can be rewritten as another grammar that does
not have left recursion:
Left recursive grammar        Right recursive grammar
A → Aα | β                    A → βA'
                              A' → ϵ | αA'
In general,
1) E → E + T | T matches the pattern A → Aα1 | β1 with α1 = +T and β1 = T, giving:
   E → TE'
   E' → +TE' | ϵ
2) T → T * F | F matches the pattern A → Aα1 | β1 with α1 = *F and β1 = F, giving:
   T → FT'
   T' → *FT' | ϵ
3) F → (E) | id has no left recursion and is copied as it is: F → (E) | id
The final grammar obtained after eliminating left recursion can be written as shown
below:
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
Now, we can write the recursive descent parser for the above grammar which is obtained
after eliminating left recursion.
Example 2.14: Write the recursive descent parser for the following grammar:
E → TE'
E'→ +TE' | ϵ
T → FT'
T'→ *FT' | ϵ
F → (E) | id
The recursive descent parser for the above grammar is shown below. Note that for each
non-terminal there is a procedure and the right hand side of the production is
implemented as the body of the procedure as shown below:
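One way to write it, as a runnable C sketch (one procedure per non-terminal; 'i' stands
for the token id; the helper names are illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    static const char *input;
    static char lookahead;

    static void error(void) { printf("syntax error\n"); exit(1); }

    static void match(char t) {
        if (lookahead == t) lookahead = *++input;
        else error();
    }

    static void E(void), Eprime(void), T(void), Tprime(void), F(void);

    static void E(void) { T(); Eprime(); }          /* E  -> T E'       */

    static void Eprime(void) {                      /* E' -> +TE' | eps */
        if (lookahead == '+') { match('+'); T(); Eprime(); }
        /* otherwise E' -> eps: return without consuming input */
    }

    static void T(void) { F(); Tprime(); }          /* T  -> F T'       */

    static void Tprime(void) {                      /* T' -> *FT' | eps */
        if (lookahead == '*') { match('*'); F(); Tprime(); }
    }

    static void F(void) {                           /* F  -> (E) | id   */
        if (lookahead == '(') { match('('); E(); match(')'); }
        else if (lookahead == 'i') match('i');
        else error();
    }

    int main(void) {
        input = "i+i*i";                 /* id + id * id */
        lookahead = *input;
        E();
        if (lookahead == '\0') printf("parsed successfully\n");
        else error();
        return 0;
    }

Observe that the ϵ-alternatives of E' and T' are taken implicitly: when the lookahead is
not + (respectively *), the procedure simply returns without consuming any input.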
Example 2.15: Obtain a top-down parse for the string id+id*id for the following grammar
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
The top-down parse for the string id+id*id for the above grammar can be written as
shown below:
E ⇒lm TE'
  ⇒lm FT'E'
  ⇒lm id T'E'
  ⇒lm id E'                 (T' → ϵ)
  ⇒lm id + TE'
  ⇒lm id + FT'E'
  ⇒lm id + id T'E'
  ⇒lm id + id * FT'E'
  ⇒lm id + id * id T'E'
  ⇒lm id + id * id E'       (T' → ϵ)
  ⇒lm id + id * id          (E' → ϵ)
Each step of this leftmost derivation corresponds to expanding the leftmost non-terminal
of the partially built parse tree.
Step 3: Now, the grammar obtained after eliminating indirect left recursion is shown
below:
S → Aa | b
A → Ac | Aad | bd | ϵ
Now, the immediate left recursion can be eliminated as shown below:
1) S → Aa | b has no immediate left recursion and is copied as it is.
2) A → Ac | Aad | bd | ϵ matches the pattern A → Aα1 | Aα2 | β1 | β2 with α1 = c,
   α2 = ad, β1 = bd and β2 = ϵ, giving:
   A → bdA' | A'
   A' → cA' | adA' | ϵ
So, the final grammar obtained after eliminating left recursion is shown below:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ϵ
Now, let us "Write the algorithm to eliminate left recursion". The standard algorithm to
eliminate left recursion is shown below:
Example 2.17: Algorithm to eliminate left recursion (including indirect left recursion)
1) Arrange the non-terminals in some order A1, A2, …, An.
2) for i = 1 to n do
       for j = 1 to i − 1 do
           replace each production of the form Ai → Ajγ by
           Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are all the
           current Aj-productions
       end for
       eliminate the immediate left recursion among the Ai-productions
   end for
Now, let us see “What is left factoring? What is the need for left factoring?”
Ex 1: The grammar to generate string consisting of at least one „a‟ followed by at least
one „b‟ can be written as shown below:
S → aAbB
A→ aA | ϵ Left factored grammar
B → bB | ϵ
Ex 2: The grammar that generates string consisting of at least one „a‟ followed by at least
one „b‟ can also be written as shown below:
S → AB
A→ aA | a Non Left factored grammar
B → bB | b
The two A-productions have the common prefix "a" and the two B-productions have the
common prefix "b" on the right side of the productions. Since a common prefix is present
in both the A-productions and the B-productions, it is not a left-factored grammar.
Note: If two or more productions starting from same non-terminal have a common prefix,
the grammar is not left-factored.
Now, let us see “What is the use of left factoring?” Left factoring is must for top down
parser such as recursive descent parser with backtracking or predictive parser which is
also recursive descent parser without backtracking. This is because, if A-production has
two or more alternate productions and they have a common prefix, then the parser has
some confusion in selecting the appropriate production for expanding the non-terminal A
Now, the question is "How to do left factoring?" Suppose two A-productions have a
common prefix α, i.e., A → αβ1 | αβ2. The left-factoring can be done as shown below:
A → αA'
A' → β1 | β2
Now, after seeing the input derived from α, we can expand A' either to β1 or to β2.
Thus, the given grammar is converted into the left-factored grammar shown above.
Now, let us "Write the algorithm for doing left-factoring". The algorithm for doing
left-factoring is shown below:
Algorithm LEFT_FACTOR(G)
Input: Grammar G
Output: An equivalent left-factored grammar
1) For each non-terminal A, find the longest prefix α which is common to two or more
of its alternatives.
2) If α ≠ ϵ, replace all the A-productions A → αβ1 | αβ2 | … | αβn | γ (where γ
represents all alternatives that do not begin with α) by:
A → αA' | γ
A' → β1 | β2 | β3 | … | βn
3) Repeatedly apply the transformation in step 2 as long as two alternatives for a non-
terminal have a common prefix.
For example, applying this transformation to the dangling-else grammar:
1) S → iCtS | iCtSeS | a has the common prefix α = iCtS, giving S → iCtSS' | a and
   S' → ϵ | eS
2) C → b is copied as it is: C → b
So, the final grammar which is obtained after doing left-factoring is shown below:
S → iCtSS' | a
S' → ϵ | eS
C → b
Ambiguity in the grammar: A grammar having two or more left most derivations or
two or more right most derivations is called ambiguous grammar. For example, the
following grammar is ambiguous:
E → E + E | E – E | E * E | E / E | ( E ) | id
The ambiguous grammar is not suitable for top-down parser. So, ambiguity has to be
eliminated from the grammar. (For details refer section 2.6)
Left recursion: Consider the following grammar:
E → E + T | T
T → T * F | F
F → (E) | id
The above grammar is unambiguous, but it has left recursion and hence it is not
suitable for a top-down parser. So, left recursion has to be eliminated (for details
refer to sections 2.8.3 and 2.8.4).
Backtracking: The backtracking is necessary for top down parser for following
reasons:
1) During parsing, the productions are applied one by one. But, if two or more
alternative productions are there, they are applied in order from left to right one at
a time.
2) When a particular production fails to expand the non-terminal properly,
we have to apply the alternate production. Before trying the alternate production, it is
necessary to undo the activities done using the current production. This is possible
only using backtracking.
Even though backtracking parsers are more powerful than predictive parsers, they are
also much slower, requiring exponential time in general and therefore, backtracking
parsers are not suitable for practical compilers.
Now, let us see "What is a predictive parser? Explain the working of a predictive parser."
Since it can predict which production to use while parsing, it is called a predictive
parser. Predictive parsers accept a restricted class of grammars called LL(k) grammars
(defined in section 2.10).
Now, let us see “What are the various components of predictive parser? How it works?”
The working of predictive parser can be explained easily by knowing the various
components of the predictive parser. The block diagram showing the various parts of
predictive parser are shown below:
Figure: model of a predictive parser. The input buffer holds the string a1a2a3……an$; the
stack holds grammar symbols (here X, Y, Z, …) with $ at the bottom; the parser program
consults the parsing table M and writes the output. The predictive parser thus has four
components, namely: the input buffer, the stack, the parsing table and the output.
Input : The input buffer contains the string to be parsed, and the input string ends
with $. Here, $ indicates the end of the input.
Stack : It contains a sequence of grammar symbols, and $ is placed initially on top of
the stack. When $ is on top of the stack, it indicates that the stack is empty.
Parser : It is a program which takes different actions based on X, the symbol
on top of the stack, and the current input symbol a.
Output : As output, the productions that are used are displayed, using which the parse
tree can be constructed.
Working of the parser: The various actions performed by the parser are shown below:
1) If X = a = $, that is, if the symbol on top of the stack and the current input symbol is
$, then parsing is successful.
2) If X = a ≠ $, that is, if the symbol on top of the stack is same as the current input
symbol but not equal to $, then pop X from the stack and advance the input pointer to
point to next symbol.
3) If X is a terminal and X ≠ a, that is, the symbol on top of the stack is a terminal
that does not match the current input symbol, then the parser calls error().
4) If X is a non-terminal and a is the input symbol, the parser consults the parsing
table entry M[X, a], which contains either an X-production or an error entry. If
X → UVW is the corresponding production, the parser pops X from the stack and pushes
U, V and W in reverse order (W first, so that U is on top).
Now, before seeing how the parser parses the string, let us "Explain the parsing table
and how to use it" or "What information is given in the predictive parsing table?" The
parsing table details and how it can be used can be explained using the following
example.
Example 2.20: Consider the following grammar and the corresponding predictive parsing
table:
GRAMMAR
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id

The parsing table M is a 2-dimensional table whose rows are indexed by the leftmost
variables (non-terminals) and whose columns are indexed by the lookahead tokens:

          id          +            *            (           )          $
E     E → TE'                                E → TE'
E'               E' → +TE'                               E' → ϵ     E' → ϵ
T     T → FT'                                T → FT'
T'               T' → ϵ      T' → *FT'                   T' → ϵ     T' → ϵ
F     F → id                                 F → (E)
The various pieces of information that we get from the above parsing table are shown
below:
The symbols present in the first column of table M, i.e., E, E', T, T' and F, represent
the leftmost non-terminals in the derivation. Let us denote a non-terminal in general
by A.
The symbols present in the first row, such as id, +, *, (, ) and $, represent the next
input tokens obtained from the lexical analyzer. Let us denote a terminal in general
by a.
The entry in a particular row A and column a, denoted by M[A, a], is either
blank or a production. This is the production predicted for the variable A when the
input symbol is a. Now, parsing is done as shown below:
1) If E is on top of the stack and id is the input symbol, the parser consults the
parsing table entry M[E, id] and gets the production E → TE'. Now, the parser removes E
from the stack and pushes TE' in reverse order (E' first, then T, so that T is on top).
So, the entry M[E, id] = E → TE' indicates that in the current leftmost derivation, E is
the leftmost non-terminal, and when the token id is read from the input, we expand the
non-terminal E using the production E → TE'.
2) If E' is on top of the stack and the input is ), the parser consults the parsing table
entry M[E', )] and gets the production E' → ϵ. Now, the parser removes E' from the
stack, but there is nothing on the right side of the production to push. That is, the
entry M[E', )] = E' → ϵ indicates that in the current leftmost derivation, E' is the
leftmost non-terminal and it is replaced by ϵ. Thus, only the leftmost variable is
replaced at each step when the input symbol (lookahead token) is read from the
input buffer which results in leftmost derivation. Thus, we say that predictive
parsing will mimic the leftmost derivation.
3) The entry in row E and column + is blank. This indicates an error entry, and the
parser should display an appropriate error message.
Now, the various actions performed by the parser can be implemented as an algorithm. The
complete algorithm to parse the string using the predictive parser is shown below:
Input: The string w ending with $ (end of the input) and the parsing table M
Output: If w ∈ L(G), i.e., if the input string is generated successfully from the
grammar, the parse tree using leftmost derivation is constructed. Otherwise, the parser
displays an error message.
Method: Initially the $ and S are placed on the stack and the input buffer contains input
string w ending with $. The algorithm shown below uses the parsing table and produce
the parse tree. But, instead of displaying the parse tree, we generate the productions that
are used to generate the parse tree.
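A runnable sketch of this driver loop in C, hard-coding the parsing table of Example 2.20
(the encodings are illustrative, not the book's: P stands for E', Q for T', i for id,
and an empty string for an ϵ body):

    #include <stdio.h>
    #include <string.h>

    #define NT "EPTQF"                     /* non-terminal encodings */

    static char stack[100];                /* parse stack, '$' at bottom */
    static int top = -1;

    static void push(char c) { stack[++top] = c; }

    /* M(A, a): right-hand side predicted by the table, "" for epsilon,
       NULL for a blank (error) entry. */
    static const char *M(char A, char a) {
        switch (A) {
        case 'E': if (a == 'i' || a == '(') return "TP";  break;
        case 'P': if (a == '+') return "+TP";
                  if (a == ')' || a == '$') return "";    break;
        case 'T': if (a == 'i' || a == '(') return "FQ";  break;
        case 'Q': if (a == '*') return "*FQ";
                  if (a == '+' || a == ')' || a == '$') return ""; break;
        case 'F': if (a == 'i') return "i";
                  if (a == '(') return "(E)";             break;
        }
        return NULL;
    }

    int main(void) {
        const char *input = "i+i*i$";      /* id + id * id, ending in $ */
        push('$'); push('E');              /* initialize the stack      */
        char a = *input;
        while (stack[top] != '$') {
            char X = stack[top];
            if (X == a) { top--; a = *++input; }       /* match, advance */
            else if (!strchr(NT, X)) { puts("error"); return 1; }
            else {
                const char *rhs = M(X, a);
                if (!rhs) { puts("error"); return 1; } /* blank entry */
                top--;                                 /* pop X */
                for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                    push(rhs[k]);          /* push RHS in reverse order */
            }
        }
        puts(a == '$' ? "ACCEPT" : "error");
        return 0;
    }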
Example 2.22: Consider the following grammar and the corresponding predictive
parsing table:
GRAMMAR
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id

M: Parsing Table
          id          +            *            (           )          $
E     E → TE'                                E → TE'
E'               E' → +TE'                               E' → ϵ     E' → ϵ
T     T → FT'                                T → FT'
T'               T' → ϵ      T' → *FT'                   T' → ϵ     T' → ϵ
F     F → id                                 F → (E)
Show the sequence of moves made by the predictive parser for the string id+id*id during parsing.
Solution: The sequence of moves made by the parser for the string id+id*id is shown
below:

Stack        Input        Action
$E           id+id*id$    E → TE'
$E'T         id+id*id$    T → FT'
$E'T'F       id+id*id$    F → id
$E'T'id      id+id*id$    match id
$E'T'        +id*id$      T' → ϵ
$E'          +id*id$      E' → +TE'
$E'T+        +id*id$      match +
$E'T         id*id$       T → FT'
$E'T'F       id*id$       F → id
$E'T'id      id*id$       match id
$E'T'        *id$         T' → *FT'
$E'T'F*      *id$         match *
$E'T'F       id$          F → id
$E'T'id      id$          match id
$E'T'        $            T' → ϵ
$E'          $            E' → ϵ
$            $            ACCEPT
Since the stack contains $ and the input pointer points to $, the string id+id*id is parsed
successfully.
The predictive parser can be easily constructed once we know FIRST and FOLLOW sets.
These sets of symbols help us to construct the predictive parsing table very easily.
FIRST(α) is defined as follows:
ϵ ∈ FIRST(α) if α = ϵ                        Definition 1
ϵ ∈ FIRST(α) if α ⇒* ϵ                       Definition 2
a ∈ FIRST(α) if α ⇒* aβ for some β           Definition 3
Example 2.23: Compute FIRST sets for each non-terminal in the following grammar
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
Solution: The FIRST sets for the given grammar can be computed by obtaining various
derivations as shown below:
FIRST(E) = { (, id }    since E ⇒ TE' ⇒ FT'E' ⇒ (E)T'E' and E ⇒* id T'E'
FIRST(T) = { (, id }    since T ⇒ FT' ⇒ (E)T' and T ⇒ FT' ⇒ id T'
FIRST(F) = { (, id }
Consider the derivations not used in the previous derivations:
E' ⇒ +TE'     T' ⇒ *FT'
E' ⇒ ϵ        T' ⇒ ϵ
So, the FIRST sets are:

        E       E'     T       T'     F
FIRST   (, id   +, ϵ   (, id   *, ϵ   (, id
Now, the question is “What is the use of FIRST sets?” The FIRST sets can be used
during predictive parsing while creating the predictive parsing table as shown below:
Consider the A-production A→ α | β and assume FIRST(α) and FIRST(β) are disjoint
i.e., FIRST(α) ∩ FIRST(β) = {} which is an empty set.
If the input symbol obtained from lexical analyzer is a and if a is in FIRST(α) then
use the production A→ α during parsing.
If the input symbol obtained from lexical analyzer is b and if b is in FIRST(β) then
use the production A→ β during parsing.
Thus, using FIRST sets we can choose what production to use between the two
productions A→ α | β when input symbol is a or b.
Now, let us see “What are the rules to be followed to compute FIRST(X)?” or “What is
the algorithm to compute FIRST(X)?” The algorithm or the rules to compute FIRST(X)
are shown below:
ALGORITHM FIRST(X)
Rule 1: If X → aα where a is a terminal, then FIRST(X) ← a
Rule 2: If X → ϵ, then FIRST(X) ← ϵ
Rule 3: If X → Y1Y2Y3………Yn, then FIRST(X) ← non-ϵ symbols of FIRST(Y1); and if
Y1Y2Y3………Yi−1 ⇒* ϵ, then FIRST(X) ← non-ϵ symbols in FIRST(Yi). If all of
Y1, Y2, …, Yn derive ϵ, then FIRST(X) ← ϵ.
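As a cross-check on the hand computations that follow, here is a small runnable C sketch
that computes FIRST sets by fixpoint iteration over the productions of the expression
grammar of example 2.23 (the encodings are illustrative: P = E', Q = T', i = id, and
'#' marks ϵ):

    #include <stdio.h>
    #include <string.h>

    #define NT "EPTQF"

    static const char *prod[] = { "E=TP", "P=+TP", "P=#", "T=FQ",
                                  "Q=*FQ", "Q=#", "F=(E)", "F=i" };

    static char first[5][16];              /* FIRST set per non-terminal */

    static int nt_index(char c) {
        const char *p = strchr(NT, c);
        return p ? (int)(p - NT) : -1;
    }

    static int add(int a, char c) {        /* add c to first[a]; 1 if new */
        if (strchr(first[a], c)) return 0;
        size_t n = strlen(first[a]);
        first[a][n] = c; first[a][n + 1] = '\0';
        return 1;
    }

    int main(void) {
        int changed = 1;
        while (changed) {                  /* repeat until no set grows */
            changed = 0;
            for (int p = 0; p < 8; p++) {
                int A = nt_index(prod[p][0]);
                const char *rhs = prod[p] + 2;
                int i, nullable = 1;
                for (i = 0; rhs[i] && nullable; i++) {
                    int B = nt_index(rhs[i]);
                    nullable = 0;
                    if (B < 0) {                     /* terminal or '#' */
                        if (rhs[i] != '#') changed |= add(A, rhs[i]);
                        else nullable = 1;
                    } else {                         /* rule 3 */
                        for (const char *s = first[B]; *s; s++)
                            if (*s != '#') changed |= add(A, *s);
                        if (strchr(first[B], '#')) nullable = 1;
                    }
                }
                if (nullable && !rhs[i]) changed |= add(A, '#');
            }
        }
        for (int k = 0; k < 5; k++)
            printf("FIRST(%c) = { %s }\n", NT[k], first[k]);
        return 0;
    }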
Now, let us see how FIRST sets are computed by taking some specific examples:
1) Rule 1 is applied if the first symbol on the right hand side of the production is a
terminal. If so, then add only the first symbol.
Ex 1: if A → aBC, then FIRST(A) = {a}
Ex 2: if E' → +TE', then FIRST(E') = {+}
Ex 3: if A → abc, then FIRST(A) = {a}
2) Rule 2 is applied only for ϵ-productions
Ex 1: if A → ϵ, then FIRST(A) = { ϵ }
Ex 2: if E' → ϵ, then FIRST(E') = { ϵ }
3) Rule 3 is applied for all productions not considered in first two steps
Ex : Consider the productions:
S → ABCd
A→ ϵ | +B
B→ ϵ | *B
C→ ϵ | %B
FIRST(A), FIRST(B) and FIRST(C) are computed using rules 1 and 2 as shown below:

        S    A      B      C
FIRST        ϵ, +   ϵ, *   ϵ, %

To compute FIRST(S), consider the production S → ABCd and apply rule 3 as shown below:
a) Add the non-ϵ symbols of FIRST(A) to FIRST(S).
b) Since A ⇒ ϵ, add the non-ϵ symbols of FIRST(B) to FIRST(S).
c) Since B ⇒ ϵ, add the non-ϵ symbols of FIRST(C) to FIRST(S).
d) Since C ⇒ ϵ, add the terminal d to FIRST(S).
So FIRST(S) = { +, *, %, d }.

If instead the S-production were S → ABC (without the trailing d), steps (a)-(c) would be
the same, and since all of A, B and C derive ϵ, the symbol ϵ itself is added to FIRST(S):

        S             A      B      C
FIRST   %, *, +, ϵ    ϵ, +   ϵ, *   ϵ, %
Example: Given FIRST(A) = {+, ϵ}, FIRST(B) = {*, ϵ} and FIRST(C) = {%, –}, compute
FIRST(ABC).
Solution: Using FIRST(A), FIRST(B) and FIRST(C), FIRST(ABC) can be obtained as shown
below:
1) Add the non-ϵ symbols of FIRST(A), i.e., +.
2) Since ϵ ∈ FIRST(A), add the non-ϵ symbols of FIRST(B), i.e., *.
3) Since ϵ ∈ FIRST(B), add the non-ϵ symbols of FIRST(C), i.e., % and –.
Since ϵ ∉ FIRST(C), ϵ is not added, and FIRST(ABC) = { +, *, %, – }.

Example: Given FIRST(A) = {+, ϵ}, FIRST(B) = {*, ϵ} and FIRST(C) = {%, –, ϵ}, compute
FIRST(ABC).
Solution: Steps 1-3 are as before, and
4) since ϵ ∈ FIRST(C) as well, ϵ is added.
So FIRST(ABC) = { +, *, %, –, ϵ }.
ALGORITHM FOLLOW(A)
Rule 1: FOLLOW(S) ← $ where S is the start symbol.
Rule 2: If A → αBβ is a production and β ≠ ϵ, then FOLLOW(B) ← non-ϵ symbols in
FIRST(β)
Rule 3: If A → αBβ is a production and β ⇒* ϵ (including the case A → αB), then
FOLLOW(B) ← FOLLOW(A)
Example 2.26: Compute FIRST and FOLLOW sets for the following grammar:
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
a) Computing FIRST sets: The FIRST sets can be computed as shown below.
Rules 1 and 2 give:

        E    E'     T    T'     F
FIRST        +, ϵ        *, ϵ   (, id

Rule 3: Consider the productions not considered earlier and obtain the FIRST sets as
shown below:
a) E → TE' : add "FIRST(T) − ϵ" to FIRST(E)
b) T → FT' : add "FIRST(F) − ϵ" to FIRST(T)
Transferring FIRST(F) to FIRST(T) and then FIRST(T) to FIRST(E), the final FIRST sets
are:

        E       E'     T       T'     F
FIRST   (, id   +, ϵ   (, id   *, ϵ   (, id
b) Computing FOLLOW sets: Rule 1 puts $ in FOLLOW(E).
Rules 2 & 3: Apply rules 2 and 3 for every production of the form A → αBβ where B is a
non-terminal. Rule 2 copies FIRST(β) − ϵ into FOLLOW(B), and rule 3 copies FOLLOW(A)
into FOLLOW(B) when β ⇒* ϵ:
E → TE' (B = T, β = E') : FOLLOW(T) ← FIRST(E') − ϵ = {+}; since E' ⇒ ϵ,
FOLLOW(T) ← FOLLOW(E)
E → TE' (B = E', β = ϵ) : FOLLOW(E') ← FOLLOW(E)
T → FT' (B = F, β = T') : FOLLOW(F) ← FIRST(T') − ϵ = {*}; since T' ⇒ ϵ,
FOLLOW(F) ← FOLLOW(T)
T → FT' (B = T', β = ϵ) : FOLLOW(T') ← FOLLOW(T)
F → (E) (B = E, β = ")") : FOLLOW(E) ← { ) }; rule 3 not applicable
So, the final FOLLOW sets are:

          E      E'     T         T'        F
FOLLOW    $, )   $, )   +, $, )   +, $, )   +, *, $, )
Now, let us see “What are the steps to be followed while constructing the predictive
parser?” The various steps to be followed while constructing the predictive parser are
shown below:
If the grammar is ambiguous, eliminate ambiguity from the grammar
If the grammar has left recursion, eliminate left recursion
If the grammar has two or more alternatives having common prefix, then do left-
factoring
The resulting grammar is suitable for constructing predictive parsing table
2.9.3 Constructing predictive parsing table
Now, using FIRST and FOLLOW sets, we can easily construct the predictive parsing
table and the productions are entered into the table M[A, a] where
M is a 2-dimensional array representing the predictive parsing table
A is a non-terminal which represent the row values
a is a terminal or $ which is endmarker and represent the column values
Now, let us “Write the algorithm to construct the predictive parsing table” The complete
algorithm is shown below:
ALGORITHM Predictive_Parsing_Table(G, M)
Input : Grammar G
Output : Predictive parsing table M
Procedure : For each production A → α of grammar G apply the following rules
1) For each terminal a in FIRST(α), add A→ α to M[A, a]
2) If FIRST(α) contains ϵ, then for each symbol b in FOLLOW(A), add A → α to M[A, b]
Example 2.27: Obtain the predictive parsing table for the following grammar
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
Solution: The FIRST and FOLLOW sets of each non-terminal of the given grammar are
shown below (see example 2.26 for details):

          E       E'     T         T'        F
FIRST     (, id   +, ϵ   (, id     *, ϵ      (, id
FOLLOW    ), $    ), $   +, ), $   +, ), $   +, *, ), $
For every production of the form A → α, we compute FIRST(α) and entries of the
parsing table can be done as shown below:
F → (E) ( M [ F, ( ] = F → (E) 2
A
F → id id M [ F, id ] = F → id 2
A
id + * ( ) $
E E → TE' E → TE'
E' E' → +TE' E' → E' →
T T → FT' T → FT'
T' T' → T' → *FT' T' → T' →
F F → id F → (E)
Note: Since there are no multiple entries in the parsing table, the given grammar is called
LL(1) grammar. If multiple entries are present in the parsing table, the grammar is not
LL(1). The predictive parser accepts only the language generated from LL(1) grammar.
Example 2.28: Compute FIRST and FOLLOW symbols and predictive parsing table for
the following grammar:
S → iCtS | iCtSeS | a
C→ b
Is the grammar LL(1)?
Solution: We know that the grammar is not left-factored since, two productions have
common prefix “iCtS”. So, it is necessary to do the left-factoring for the given grammar.
The left-factored grammar (for details refer section 2.8.5, example 2.19) is shown below:
S → iCtSS'| a
S'→ ϵ | eS
C→ b
Computing FIRST sets:
Rule 1: S → iCtSS' gives i; S → a gives a; S' → eS gives e; C → b gives b.
Rule 2: S' → ϵ gives ϵ.
Rule 3: This rule is not applied, since all productions are already considered when we
apply the first two rules. So, the final FIRST sets are shown below:

        S      S'     C
FIRST   i, a   e, ϵ   b
FOLLOW sets:

         S      S'     C
FOLLOW   $, e   $, e   t
Rules 2 & 3: Apply rules 2 and 3 for every production of the form A → αBβ where B is a
non-terminal; rule 2 (β ≠ ϵ) gives FOLLOW(B) ← FIRST(β) − ϵ, and rule 3 (β ⇒* ϵ) gives
FOLLOW(B) ← FOLLOW(A):
S → iCtSS' (B = C, β = tSS') : FOLLOW(C) ← {t}; rule 3 not applicable
S → iCtSS' (B = S, β = S') : FOLLOW(S) ← FIRST(S') − ϵ = {e}; since S' ⇒ ϵ,
FOLLOW(S) ← FOLLOW(S)
S → iCtSS' (B = S', β = ϵ) : FOLLOW(S') ← FOLLOW(S)
S' → eS (B = S, β = ϵ) : FOLLOW(S) ← FOLLOW(S')
Note: The productions S → a and C → b are not considered while computing FOLLOW
since there are no variables in those productions. So, the FIRST and FOLLOW sets for
the left-factored grammar are shown below:
         S      S'     C
FIRST    a, i   e, ϵ   b
FOLLOW   $, e   $, e   t
To check whether the grammar is LL(1) or not: We can check whether the grammar is LL(1)
even without constructing the predictive parser. For each pair of alternatives
A → α | β, the following two conditions must be satisfied:
1) FIRST(α) ∩ FIRST(β) = ∅
2) If β ⇒* ϵ, then FIRST(α) ∩ FOLLOW(A) = ∅
Here, consider the productions S' → eS | ϵ: FIRST(eS) = {e} and FOLLOW(S') = {e, $}, so
FIRST(eS) ∩ FOLLOW(S') = {e} ≠ ∅. Since one of the conditions fails, the given grammar
is not LL(1). For a grammar to be LL(1), both conditions must be satisfied.
The table entries are computed as before:
S → a : M[S, a] = S → a
S → iCtSS' : M[S, i] = S → iCtSS'
S' → eS : M[S', e] = S' → eS
S' → ϵ : use FOLLOW(S') = {e, $}: M[S', e] = M[S', $] = S' → ϵ
C → b : M[C, b] = C → b

       a       b       e          i             t     $
S      S → a                      S → iCtSS'
S'                     S' → eS
                       S' → ϵ                         S' → ϵ
C              C → b

Observe that M[S', e] contains two entries, S' → eS and S' → ϵ, which confirms that the
grammar is not LL(1).
Example: Consider the following grammar:
S → a | (L)
L → L, S | S
a) In the production L → L, S, the first symbol on the RHS of the production is the same
as the symbol on the LHS of the production, so the given grammar has left recursion and
hence it is not suitable for a predictive parser.
b) To make it suitable for the LL(1) parser or predictive parser, we need to eliminate
left recursion as shown below (for details refer to section 2.8.4):
1) S → a | (L) has no left recursion and is copied as it is: S → a | (L)
2) L → L, S | S matches the pattern A → Aα1 | β1 with α1 = ", S" and β1 = S, giving:
L → SL'
L' → , SL' | ϵ
The final grammar obtained after eliminating left recursion can be written as shown
below:
S→ a | (L)
L → SL'
L'→ , S L' | ϵ
c) Computing FIRST and FOLLOW: The FIRST sets can be computed as shown below:
Rule 1: S → a gives a; S → (L) gives (; L' → , SL' gives the comma.
Rule 2: L' → ϵ gives ϵ.

        S      L    L'
FIRST   a, (        "," , ϵ

Rule 3: Consider the productions not considered earlier:
a) L → SL' : add "FIRST(S) − ϵ" to FIRST(L)
Transferring FIRST(S) to FIRST(L), the final FIRST sets are shown below:

        S      L      L'
FIRST   a, (   a, (   "," , ϵ
FOLLOW sets: Rule 1 puts $ in FOLLOW(S).
Rules 2 & 3: Apply rules 2 and 3 for every production of the form A → αBβ where B is a
non-terminal:
S → (L) (B = L, β = ")") : FOLLOW(L) ← { ) }; rule 3 not applicable
L → SL' (B = S, β = L') : FOLLOW(S) ← FIRST(L') − ϵ = { , }; since L' ⇒ ϵ,
FOLLOW(S) ← FOLLOW(L)
L → SL' (B = L', β = ϵ) : FOLLOW(L') ← FOLLOW(L)
Note: The productions S → a and L' → ϵ are not considered while computing FOLLOW
since they do not contain non-terminals. So, the FIRST and FOLLOW sets for the grammar
obtained after eliminating left recursion are shown below:

         S         L      L'
FIRST    a, (      a, (   "," , ϵ
FOLLOW   , ), $    )      )
d) Construction of the parsing table: For every production of the form A → α, we
compute FIRST(α) and fill the entries of the parsing table as shown below:
S → a : M[S, a] = S → a
S → (L) : M[S, (] = S → (L)
L → SL' : FIRST(SL') = { a, ( }, so M[L, a] = M[L, (] = L → SL'
L' → , SL' : M[L', ","] = L' → , SL'
L' → ϵ : use FOLLOW(L') = { ) }: M[L', )] = L' → ϵ
The above entries can be entered into the parsing table as shown below:

       (          )         a         ,             $
S      S → (L)              S → a
L      L → SL'              L → SL'
L'                L' → ϵ              L' → , SL'
Since there are no multiple entries in the parse table, the resulting grammar obtained after
eliminating left recursion is LL(1).
The moves made by the predictive parser on the input ( a , ( a , a ) ) are shown below:

Stack        Input          Action
$S           (a,(a,a))$     S → (L)
$)L(         (a,(a,a))$     match (
$)L          a,(a,a))$      L → SL'
$)L'S        a,(a,a))$      S → a
$)L'a        a,(a,a))$      match a
$)L'         ,(a,a))$       L' → , SL'
$)L'S,       ,(a,a))$       match ,  (pop ',' and increment i/p pointer)
$)L'S        (a,a))$        S → (L)
$)L')L(      (a,a))$        match (
$)L')L       a,a))$         L → SL'
$)L')L'S     a,a))$         S → a
$)L')L'a     a,a))$         match a
$)L')L'      ,a))$          L' → , SL'
$)L')L'S,    ,a))$          match ,  (pop ',' and increment i/p pointer)
$)L')L'S     a))$           S → a
$)L')L'a     a))$           match a
$)L')L'      ))$            L' → ϵ
$)L')        ))$            match )
$)L'         )$             L' → ϵ
$)           )$             match )
$            $              Accept
Note: Since the stack is empty (only $ remains) and the i/p pointer also points to $,
which is the endmarker, parsing is successful.
Example: Consider the following grammar:
E → 5 + T | 3 – T
T → V | V*V | V+V
V→ a|b
a) The E-productions and the V-productions are suitable for predictive parsing. But
consider the production:
T → V | V*V | V+V
In the T-productions, two or more alternatives have the common prefix V, and hence the
given grammar is not a left-factored grammar. So, the given grammar is not suitable
for a predictive parser.
b) To make it suitable for LL(1) parser or predictive parser, we need to do left factoring
(For details refer section 2.8.5). If an A-production has two or more alternate
productions and they have a common prefix, then the parser has some confusion in
selecting the appropriate production for expanding the non-terminal A. So, left
factoring is must for top down parser. This can be done as shown below:
1) E → 5 + T | 3 – T has no common prefix and is copied as it is: E → 5 + T | 3 – T
2) T → V | V * V | V + V matches the pattern A → αβ1 | αβ2 | αβ3 with α = V, β1 = ϵ,
β2 = *V and β3 = +V, giving:
T → VT'
T' → ϵ | *V | +V
3) V → a | b is copied as it is: V → a | b
So, the final grammar which is obtained after doing left-factoring is shown below:
E → 5+T|3–T
T → V T'
T'→ ϵ | * V | + V
V→ a|b
c) Computing FIRST and FOLLOW: The FIRST sets can be computed as shown below:
Rule 1: E → 5 + T gives 5; E → 3 – T gives 3; T' → *V gives *; T' → +V gives +;
V → a gives a; V → b gives b.
Rule 2: T' → ϵ gives ϵ.

        E      T    T'        V
FIRST   5, 3        *, +, ϵ   a, b

Rule 3: Consider the productions not considered earlier:
T → VT' : add "FIRST(V) − ϵ" to FIRST(T)
Transferring FIRST(V) to FIRST(T), the final FIRST sets are shown below:

        E      T      T'        V
FIRST   5, 3   a, b   *, +, ϵ   a, b
FOLLOW sets: Rule 1 puts $ in FOLLOW(E).
Rules 2 & 3: Apply rules 2 and 3 for every production of the form A → αBβ where B is a
non-terminal:
E → 5 + T and E → 3 – T (B = T, β = ϵ) : FOLLOW(T) ← FOLLOW(E)
T → VT' (B = V, β = T') : FOLLOW(V) ← FIRST(T') − ϵ = { *, + }; since T' ⇒ ϵ,
FOLLOW(V) ← FOLLOW(T)
T → VT' (B = T', β = ϵ) : FOLLOW(T') ← FOLLOW(T)
So, the FIRST and FOLLOW sets for the left-factored grammar are shown below:

         E      T      T'        V
FIRST    5, 3   a, b   *, +, ϵ   a, b
FOLLOW   $      $      $         *, +, $
d) Now, construct the parsing table and check the LL(1) conditions. For every production
of the form A → α, we compute FIRST(α) and fill the entries of the parsing table as
shown below:
E → 5 + T : M[E, 5] = E → 5 + T
E → 3 – T : M[E, 3] = E → 3 – T
T → VT' : FIRST(VT') = { a, b }, so M[T, a] = M[T, b] = T → VT'
T' → ϵ : use FOLLOW(T') = { $ }: M[T', $] = T' → ϵ
T' → *V : M[T', *] = T' → *V
T' → +V : M[T', +] = T' → +V
V → a : M[V, a] = V → a
V → b : M[V, b] = V → b
       5           3           a         b         *          +          $
E      E → 5 + T   E → 3 – T
T                              T → VT'   T → VT'
T'                                                 T' → *V    T' → +V    T' → ϵ
V                              V → a     V → b
Since there are no multiple entries in the parse table, the resulting grammar obtained after
doing left factoring is LL(1).
Example: Compute the FIRST and FOLLOW sets and the predictive parsing table for the
following grammar, and check whether it is LL(1):
Z → XYZ | d
X → Y | a
Y → c | ϵ
Solution: Computing FIRST sets:
Rule 1: Z → d gives d; X → a gives a; Y → c gives c.
Rule 2: Y → ϵ gives ϵ.

        Z    X    Y
FIRST   d    a    c, ϵ

Rule 3: Consider the productions not considered earlier. For Z → XYZ, add the non-ϵ
symbols of FIRST(X); since X ⇒* ϵ, also those of FIRST(Y); and since Y ⇒ ϵ, also those
of FIRST(Z). For X → Y, add the non-ϵ symbols of FIRST(Y) and, since Y ⇒ ϵ, the symbol
ϵ as well. The final sets are:

         Z          X          Y
FIRST    a, c, d    a, c, ϵ    c, ϵ
FOLLOW   $          a, c, d    a, c, d
FOLLOW sets: Rule 1 puts $ in FOLLOW(Z). Applying rules 2 and 3:
Z → XYZ (B = X, β = YZ) : FOLLOW(X) ← (FIRST(Y) − ϵ) ∪ (FIRST(Z) − ϵ) = { a, c, d };
rule 3 is not applicable since Z does not derive ϵ
Z → XYZ (B = Y, β = Z) : FOLLOW(Y) ← FIRST(Z) − ϵ = { a, c, d }; rule 3 not applicable
Z → XYZ (B = Z, β = ϵ) : FOLLOW(Z) ← FOLLOW(Z)
X → Y : FOLLOW(Y) ← FOLLOW(X)
The table entries are computed as shown below:
Z → XYZ : FIRST(XYZ) = { a, c, d }, so M[Z, a] = M[Z, c] = M[Z, d] = Z → XYZ
Z → d : M[Z, d] = Z → d
X → a : M[X, a] = X → a
X → Y : FIRST(Y) = { c, ϵ }, so M[X, c] = X → Y; since ϵ ∈ FIRST(Y), for each symbol in
FOLLOW(X) = { a, c, d }: M[X, a] = M[X, c] = M[X, d] = X → Y
Y → c : M[Y, c] = Y → c
Y → ϵ : use FOLLOW(Y) = { a, c, d }: M[Y, a] = M[Y, c] = M[Y, d] = Y → ϵ
The parsing table is shown below:

      a          c          d          $
Z     Z → XYZ    Z → XYZ    Z → d
                            Z → XYZ
X     X → a      X → Y      X → Y
      X → Y
Y     Y → ϵ      Y → c      Y → ϵ
                 Y → ϵ
Since there are multiple entries in the parse table, the given grammar is not LL(1).
Example 2.32 : Left factor the following grammar and obtain LL(1) parsing table
E→T+E|T
T → float | float * T | (E)
Solution: The alternatives of the E-production and of the T-production have common
prefixes, so this grammar is not suitable for predictive parsing. We have to do left
factoring so that no two alternatives of a non-terminal have a common prefix. The
left-factoring can be done as shown below:
1) E → T + E | T matches the pattern A → αβ1 | αβ2 with α = T, β1 = +E and β2 = ϵ,
giving:
E → TE'
E' → +E | ϵ
2) T → float | float * T | (E) has the common prefix float in two alternatives, giving:
T → float T' | (E)
T' → ϵ | *T
So, the grammar obtained after doing left factoring is shown below:
E → TE'
E' → +E | ϵ
T → float T' | (E)
T' → ϵ | *T
Computing FIRST sets:
Rule 1: E' → +E gives +; T → float T' gives float; T → (E) gives (; T' → *T gives *.
Rule 2: E' → ϵ and T' → ϵ give ϵ.

        E    E'     T           T'
FIRST        +, ϵ   float, (    *, ϵ

Rule 3: Consider the productions not considered earlier:
E → TE' : add "FIRST(T) − ϵ" to FIRST(E)
Transferring FIRST(T) to FIRST(E), the final FIRST sets are shown below:

        E           E'     T           T'
FIRST   float, (    +, ϵ   float, (    *, ϵ

FOLLOW sets:

         E      E'     T         T'
FOLLOW   $, )   $, )   +, $, )   +, $, )
These FOLLOW sets are obtained by applying rules 2 and 3 to every production of the form
A → αBβ:
E → TE' (B = T, β = E') : FOLLOW(T) ← FIRST(E') − ϵ = {+}; since E' ⇒ ϵ,
FOLLOW(T) ← FOLLOW(E)
E → TE' (B = E', β = ϵ) : FOLLOW(E') ← FOLLOW(E)
T → float T' (B = T', β = ϵ) : FOLLOW(T') ← FOLLOW(T)
T' → *T (B = T, β = ϵ) : FOLLOW(T) ← FOLLOW(T')
T → (E) (B = E, β = ")") : FOLLOW(E) ← { ) }; rule 3 not applicable
Finally, the LL(1) parsing table is obtained as before:

       float           +           *          (          )          $
E      E → TE'                                E → TE'
E'                     E' → +E                           E' → ϵ     E' → ϵ
T      T → float T'                           T → (E)
T'                     T' → ϵ     T' → *T                T' → ϵ     T' → ϵ

Since there are no multiple entries in the table, the left-factored grammar is LL(1).
During predictive parsing, an error is detected in the following situations:
The terminal on top of the stack does not match the next input symbol.
A non-terminal A is on top of the stack, a is the next input symbol, and M[A, a] is a
blank entry (blank denotes an error).
The error recovery is done using panic mode and phrase-level recovery as shown below:
Panic mode: In this approach, error recovery is done by skipping symbols from the
input until a token matches with synchronizing tokens. The synchronizing tokens are
selected such that the parser should quickly recover from the errors that are likely to
occur in practice. Some of the recovery techniques are shown below:
1) For a non-terminal A, consider the symbols in FOLLOW(A). These symbols can
be considered as synchronizing tokens and are added into the parsing table, replacing
only blank entries. Now, whenever there is a mismatch, keep skipping tokens
till we get one of the synchronizing tokens, and remove A from the stack. It is
likely that parsing can continue.
2) For a non-terminal A, consider the symbols in FIRST(A). These symbols can also
be considered as synchronizing tokens and added to the parsing table, replacing
only blank entries. Now, whenever there is a mismatch, keep skipping tokens
till we get one of the synchronizing tokens; parsing can then resume from A. It is
also likely that parsing can continue.
3) If a terminal on top of the stack cannot be matched, pop the terminal from the
stack and issue “Error message” and insert the corresponding terminal and
continue parsing.
For example, consider the parsing table containing synchronizing tokens and
sequence of moves made by the parser in example 2.33 given later in this section.
Phrase level recovery: This recovery method is implemented by filling the blank entries
in the predictive parsing table with pointers to error routines. These routines may change,
insert, replace or delete symbols from the input and issue appropriate error messages.
They may also pop from the stack.
Example 2.33: Consider the following predictive parsing table:

          id          +            *            (           )          $
E     E → TE'                                E → TE'
E'               E' → +TE'                               E' → ϵ     E' → ϵ
T     T → FT'                                T → FT'
T'               T' → ϵ      T' → *FT'                   T' → ϵ     T' → ϵ
F     F → id                                 F → (E)
Add the synchronizing tokens to the above parsing table and show the sequence of
moves made by the parser for the string ") id * + id".
Solution: The synchronizing characters are the characters present in the FIRST or FOLLOW
sets of each non-terminal. In our example, let us add synchronizing characters by
considering FOLLOW of each non-terminal, replacing each blank entry in the parsing
table. The FOLLOW sets of each non-terminal are shown below (refer to example 2.26 for
details):

         E      E'     T         T'        F
FOLLOW   $, )   $, )   +, $, )   +, $, )   +, *, $, )

Now, FOLLOW(E) = { $, ) }. So, M[E, $] = M[E, )] = synch, only for blank entries.
Similarly, FOLLOW(F) = { +, *, $, ) }. So, M[F, +] = M[F, *] = M[F, $] = M[F, )] = synch.
On similar lines we add synchronizing characters to the parsing table as shown below:
          id          +            *            (           )          $
E     E → TE'                                E → TE'    synch      synch
E'               E' → +TE'                              E' → ϵ     E' → ϵ
T     T → FT'    synch                       T → FT'    synch      synch
T'               T' → ϵ      T' → *FT'                  T' → ϵ     T' → ϵ
F     F → id     synch       synch           F → (E)    synch      synch
Now, the sequence of moves made by the parser for the string ") id * + id" is shown
below:

Stack        Input        Remark
$E           )id*+id$     error: skip ); id is in FIRST(E)
$E           id*+id$      E → TE'
$E'T         id*+id$      T → FT'
$E'T'F       id*+id$      F → id
$E'T'id      id*+id$      match id
$E'T'        *+id$        T' → *FT'
$E'T'F*      *+id$        match *
$E'T'F       +id$         error: M[F, +] = synch, so F is popped
$E'T'        +id$         T' → ϵ
$E'          +id$         E' → +TE'
$E'T+        +id$         match +
$E'T         id$          T → FT'
$E'T'F       id$          F → id
$E'T'id      id$          match id
$E'T'        $            T' → ϵ
$E'          $            E' → ϵ
$            $            ACCEPT

Note: Observe that parsing is successful and the parser has also reported two errors.
If the programmer corrects the program by looking at these errors, the parsing action is
successful without any errors.
Computing FIRST sets: The FIRST set of the LHS of a production is nothing but the
set of terminals obtained from the first symbols on the RHS of the production.
Computing FOLLOW sets: The FOLLOW set of any non-terminal A on the RHS of a
production is obtained using the following rules:
1) The set of terminals immediately following A, or the set of first symbols obtained
from the non-terminals immediately following A.
2) If A is on the LHS of a production and B is the rightmost symbol on the RHS of that
production, then FOLLOW(B) = FOLLOW(A).
Exercises
1) What is a context free grammar? What is derivation? What are the two types of
derivations?
2) Define the terms: leftmost derivation, rightmost derivation, sentence
3) What are the different sentential forms? What is left sentential form? What is right
sentential form?
4) Define the terms: Language, derivation tree, yield of a tree, ambiguous grammar
5) Show that the following grammar is ambiguous
E → E+E
E → E-E
E → E*E
E → E/E
E → (E) | I
I → id
6) Is the following grammar ambiguous? (if-statement or if-then-else)
S → iCtS | iCtSeS | a
C → b
7) What is dangling else problem? How dangling else problem can be solved
8) Eliminate ambiguity from the following ambiguous grammar:
S → iCtS | iCtSeS | a
C → b
9) Convert the following ambiguous grammar into unambiguous grammar using normal
precedence and associativity of the operators
E → E*E|E-E
E → E^E|E/E
E → E+E
E → (E) | id
10) Convert the following ambiguous grammar into unambiguous grammar
E→E+E
E→E–E
E→E^E
E→E*E
E→E/E
E → (E) | id
by considering * and – operators lowest priority and they are left associative, / and +
operators have the highest priority and are right associative and ^ operator has
precedence in between and it is left associative.
11) What is parsing? What are the different types of parsers?
12) What are the error recovery strategies of the parser (or syntax analyzer)?
13) What is top down parser? Show the top-down parsing process for the string id + id *
id for the grammar
E → E + E
E → E * E
E → (E)
E → id
14) What is recursive descent parser? Write the algorithm for recursive descent parser
15) Write the recursive descent parser for the following grammar
E→T
T→F
F → (E) | id
16) What are the different types of recursive descent parsers? What is the need for
backtracking in recursive descent parser
17) Show the steps involved in recursive descent parser with backtracking for the input
string cad for the following grammar
S → cAd
A → ab | a
18) For what type of grammars recursive descent parser cannot be constructed? What is
the solution?
19) What is left recursion? What problems are encountered if a recursive descent parser is
constructed for a grammar having left recursion?
20) Write the procedure to eliminate left recursion
21) Eliminate left recursion from the following grammar
E → E +T | T
T→ T*F|F
F → (E) | id
22) Write the recursive descent parser for the following grammar:
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
23) Obtain top-down parse for the string id+id*id for the following grammar
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
24) Eliminate left recursion from the following grammar:
S → Aa | b
A → Ac | Sd | ϵ
25) Write the algorithm to eliminate left recursion
26) What is left factoring? What is the need for left factoring? How to do left factoring?
Write the algorithm for doing left-factoring
27) Do left factoring for the following grammar:
S → iCtS | iCtSeS | a
C → b
28) Briefly explain the problems associated with top-down parser?
29) What is a predictive parser? Explain the working of predictive parser.
30) What are the various components of predictive parser? How it works?
31) Define FIRST and FOLLOW sets and write the rules to compute FIRST and
FOLLOW sets
32) Consider the following grammar:
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
a) Compute FIRST and FOLLOW sets for each non-terminal of the grammar
b) Obtain the predictive parsing table
c) Show the sequence of moves made by the parser for the string id+id*id
d) Add the synchronizing tokens for the above parsing table and show the sequence
of moves made by parser for the string “ ) id * + id”
33) What is LL (1) grammar? How to check whether a given grammar is LL(1) or not
without constructing the predictive parser
34) Compute FIRST and FOLLOW symbols and predictive parsing table for the
following grammar and check whether the grammar is LL(1) or not.
S → iCtS | iCtSeS | a
C→ b
35) Given the following grammar:
S → a | (L)
L→ L,S|S
a. Is the grammar suitable for predictive parser?
b. Do the necessary changes to make it suitable for LL(1) parser
c. Compute FIRST and FOLLOW sets for each non-terminal
d. Obtain the parsing table and check whether the resulting grammar is LL(1) or not.
e. Show the moves made by the predictive parser on the input “( a , ( a , a ) )”