
PCET-NMVPM’s

Nutan College of Engineering and Research, Talegaon, Pune


DEPARTMENT OF COMPUTER ENGINEERING AND SCIENCE

UNIT 3
Syntax Analyzer
Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall
learn the basic concepts used in the construction of a parser.
The parser (syntax analyzer) receives the source code in the form of tokens from the
lexical analyzer and performs syntax analysis, which creates a tree-like intermediate
representation that depicts the grammatical structure of the token stream.
We have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. But a lexical analyzer cannot check the syntax of a given
sentence due to the limitations of regular expressions. Regular expressions cannot
check the balancing of tokens, such as parentheses. Therefore, this phase uses context-free
grammar (CFG), which is recognized by pushdown automata.
The syntax of a language refers to the structure of valid programs / statements of that
language. It is specified by rules known as productions, and a collection of such rules
is known as a grammar.
Parsing is the process of determining whether a stream of tokens is valid according to
the grammar.

Context-Free Grammar:

What is a grammar?
A grammar contains the set of rules needed to construct the sentences of a language. In a
CFG we define a set of rules, and from these rules we construct strings.

In lexical analysis a regular grammar is used, and the language generated is known as a
regular language. But there are languages that no regular grammar can generate. For example:

L = {a^m b^m | m ≥ 1}

The reason is that you can reach the final state only when the number of 'a's and the
number of 'b's in the input string are equal. To do that you have to count both, but
because the value of m can grow without bound, it is not possible to count up to an
arbitrary number using a finite automaton.

A finite state automaton has no auxiliary data structure (stack), i.e. no memory, as a
pushdown automaton has. So it can accept some 'a's followed by some 'b's, but not an
exact number of 'a's followed by the same number of 'b's.

A context-free grammar is a type-2 grammar, and it is recognized by a pushdown
automaton. That is the reason we use context-free grammar in the syntax analysis phase,
to describe constructs that regular grammars cannot.

A context-free grammar is a set of recursive rules used to generate patterns of strings.
A context-free grammar generates a context-free language (CFL) by taking a set of
variables which are defined recursively, in terms of one another, by a set of production
rules.

A context-free grammar (CFG) has four components:

• A set of terminal symbols, sometimes referred to as "tokens". The terminals are the
  elementary symbols of the language defined by the grammar.
• A set of non-terminal symbols, sometimes called "syntactic variables". Non-terminals
  impose a hierarchical structure on the language that is key to syntax analysis and
  translation.
• One non-terminal distinguished as the start symbol. Conventionally, the productions
  for the start symbol are listed first.
• A set of productions of the form LHS → RHS, where
  o LHS (called the head, or left side) is a single non-terminal symbol, and
  o RHS (called the body, or right side) consists of zero or more terminals and
    non-terminals.
The productions specify the manner in which the terminals and non-terminals can be
combined to form strings.

A context-free grammar G can be defined by a four-tuple:

G = (V, T, P, S)
where
V describes a finite set of non-terminal symbols,
T describes a finite set of terminal symbols,
P describes a set of production rules, and
S is the start symbol of the grammar.

In a CFG, the start symbol is used to derive the string. You can derive a string by
repeatedly replacing a non-terminal by the right-hand side of one of its productions,
until all non-terminals have been replaced by terminal symbols.
Production rules:

S → aSa
S → bSb
S → c
Now check that the string abbcbba can be derived from the given CFG.

S ⇒ aSa
  ⇒ abSba
  ⇒ abbSbba
  ⇒ abbcbba
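To make the derivation above concrete, here is a small Python sketch (not part of the original notes; the dictionary representation and the derives() helper are assumed purely for illustration) that encodes the productions S → aSa | bSb | c and checks that abbcbba can be derived:

# A sketch (assumed representation, not from the notes) of the grammar
# G = (V, T, P, S) with productions S -> aSa | bSb | c.
productions = {
    "S": ["aSa", "bSb", "c"],    # P: the production rules, keyed by non-terminal
}
start = "S"                      # S: the start symbol

def derives(sentential, target):
    """Return True if `target` can be derived from `sentential` by repeatedly
    replacing the leftmost non-terminal with one of its rule bodies."""
    if sentential == target:
        return True
    if len(sentential.replace("S", "")) > len(target):
        return False             # more terminals than the target: prune this branch
    for i, ch in enumerate(sentential):
        if ch in productions:    # leftmost non-terminal found
            for body in productions[ch]:
                if derives(sentential[:i] + body + sentential[i + 1:], target):
                    return True
            return False
    return False                 # only terminals left and the strings differ

print(derives(start, "abbcbba"))  # True: S => aSa => abSba => abbSbba => abbcbba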

Capabilities of CFG

The various capabilities of CFG are:

1. Context-free grammar is useful to describe most programming languages.
2. If the grammar is properly designed, then an efficient parser can be constructed
automatically.
3. Using associativity and precedence information, suitable grammars for expressions
can be constructed.
4. Context-free grammar is capable of describing nested structures such as balanced
parentheses, matching begin-end blocks, corresponding if-then-else's, and so on.
Derivation
The syntax analyzer creates a parse tree to check the syntax (pattern) of a string of a
particular language. To generate the parse tree there are two types of parsers: Top-Down
parsers and Bottom-Up parsers. In both methods we generate a parse tree, and the tree is
built using one of two derivation orders:
1) LMD (Left-Most Derivation), used by Top-Down parsers.
2) RMD (Right-Most Derivation), used by Bottom-Up parsers (traced out in reverse).

A derivation is a sequence of applications of production rules. It is used to obtain the
input string through these production rules. During parsing we have to take two decisions:
1. Decide which non-terminal is to be replaced.
2. Decide the production rule by which that non-terminal will be replaced.
We have two options to decide which non-terminal to replace:

1. Left-Most Derivation
In the left-most derivation, the left-most non-terminal of the sentential form is
replaced at each step, so the input string is effectively read from left to right.
Example:
S → S + S
S → S - S
S → a | b | c
String to be derived: a - b + c (Left-Most Derivation)
S ⇒ S + S
  ⇒ S - S + S
  ⇒ a - S + S
  ⇒ a - b + S
  ⇒ a - b + c

2. Right-Most Derivation
In the right-most derivation, the right-most non-terminal of the sentential form is
replaced at each step, so the input string is effectively matched from right to left.

Example: The grammar is:

S → S + S
S → S - S
S → a | b | c

String to be derived: a - b + c
The right-most derivation is:
S ⇒ S - S
  ⇒ S - S + S
  ⇒ S - S + c
  ⇒ S - b + c
  ⇒ a - b + c

• Parse tree
o A parse tree is the graphical representation of a derivation; its nodes are labelled
with symbols, which can be terminals or non-terminals.
o In parsing, the string is derived from the start symbol, so the root of the parse
tree is the start symbol.
o A parse tree reflects the precedence of operators: the deepest sub-tree is evaluated
first, so the operator in a parent node has lower precedence than the operator in its
sub-tree.
The parse tree follows these points:
o All leaf nodes are terminals.
o All interior nodes are non-terminals.
o An in-order traversal of the leaves gives back the original input string.
Example: The production rules are:
T → T + T | T * T
T → a | b | c
String to be derived: a * b + c
Step 1 to Step 5: the parse tree for a * b + c is built up one production at a time
(figures not reproduced here).
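Since the step-by-step figures are not reproduced, here is a minimal sketch (an assumed representation, not from the notes) of the finished parse tree for a * b + c as nested Python tuples; an in-order walk of its leaves recovers the input string, illustrating the last point above:

# A sketch (assumed representation) of the parse tree for "a * b + c" built from
# T -> T + T | T * T | a | b | c.  Interior nodes are ('T', children...); leaves
# are terminal strings.  The deeper sub-tree (a * b) binds tighter.
tree = ("T",
        ("T", ("T", "a"), "*", ("T", "b")),   # deepest sub-tree: a * b
        "+",
        ("T", "c"))

def leaves(node):
    """In-order (left-to-right) collection of the leaf terminals."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:        # skip the non-terminal label at index 0
        result.extend(leaves(child))
    return result

print(" ".join(leaves(tree)))      # a * b + c  -- the original input string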

Ambiguity

A grammar is said to be ambiguous if there exists more than one left-most derivation,
more than one right-most derivation, or more than one parse tree for some input string.
If the grammar is not ambiguous, then it is called unambiguous.

Example:
S → aSb | SS
S → ε
For the string aabb, the above grammar generates two parse trees (figures not reproduced
here).

If a grammar is ambiguous, then it is not good for compiler construction. No method can
automatically detect and remove the ambiguity, but you can remove ambiguity by rewriting
the whole grammar without ambiguity.

A grammar can be recursive in two ways:

1. Left recursion (the left-most symbol of the R.H.S. is the same as the L.H.S.)
2. Right recursion (the right-most symbol of the R.H.S. is the same as the L.H.S.)

A → Aα | β

The above grammar is left recursive because the non-terminal on the left of the
production also occurs at the first position on the right side of a production. We can
eliminate the left recursion by replacing this pair of productions with:

1) A → βA′    2) A′ → αA′ | ε

In the left-recursive grammar, the expansion of A generates Aα, Aαα, Aααα, ... at each
step, causing a top-down parser to enter an infinite loop. The language generated is βα*.

In right recursion, A first does some work α and only then calls A recursively, so α acts
as a condition check and there is no way to fall into an infinite loop. The language
generated by the right-recursive form is α*β.

Most (top-down) parsers do not allow left recursion. Therefore we have to eliminate the
left recursion without changing the language, i.e. βα*.

A → βα* is not allowed, because a grammar cannot contain the * symbol; so we introduce A′
and make it the responsibility of A′ to generate any number of α's, including zero:

1) A → βA′
2) A′ → αA′ | ε
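The rewriting A → Aα | β ⇒ A → βA′, A′ → αA′ | ε can be sketched in Python as below (an assumed helper, not from the notes; the list-of-symbols encoding is an illustration only):

# Sketch (assumed helper) for eliminating *immediate* left recursion.
# Input : a non-terminal A and its alternatives, each a list of symbols.
# Output: rules for A and a fresh A' so that A -> beta A' and A' -> alpha A' | ε.
def eliminate_immediate_left_recursion(A, alternatives):
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == A]      # the alphas
    others    = [alt     for alt in alternatives if not alt or alt[0] != A]   # the betas
    if not recursive:
        return {A: alternatives}           # no immediate left recursion: nothing to do
    A_dash = A + "'"
    return {
        A:      [beta + [A_dash] for beta in others],
        A_dash: [alpha + [A_dash] for alpha in recursive] + [["ε"]],
    }

# E -> E + T | T   becomes   E -> T E'  and  E' -> + T E' | ε
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))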

Example 1 − Consider the left recursion in the grammar:

E → E + T | T
T → T * F | F
F → (E) | id
Eliminate the immediate left recursion from the grammar.

Solution
Comparing E → E + T | T with A → Aα | β:

E → E + T | T

A → Aα | β

∴ A = E, α = +T, β = T

A → Aα | β is changed to A → βA′ and A′ → αA′ | ε

A → βA′ means E → TE′

A′ → αA′ | ε means E′ → +TE′ | ε

Comparing T → T * F | F with A → Aα | β:

T → T * F | F

A → Aα | β

∴ A = T, α = *F, β = F

∴ A → βA′ means T → FT′
A′ → αA′ | ε means T′ → *FT′ | ε
The production F → (E) | id does not have any left recursion.
∴ Combining these productions, we get:

E → TE′
E′ → +TE′ | ε

T → FT′
T′ → *FT′ | ε
F → (E) | id

Till now we have seen that a grammar can be ambiguous or unambiguous, and left recursive
or right recursive. One more property matters: a grammar can be deterministic or
non-deterministic.
A → αβ1 | αβ2 | αβ3
In the above grammar, on seeing α the parser may continue with β1, β2 or β3. So in a
non-deterministic grammar we have many options on a single symbol, and the parser cannot
choose among them without looking further ahead.

• Left Factoring
A grammar with a common prefix between at least two different productions of the same
L.H.S. (non-terminal symbol) is known as a non-deterministic grammar, because for one
symbol it has many candidate productions. With a non-deterministic grammar the compiler
cannot decide on a unique production for a particular terminal, and it often needs to
backtrack in search of the correct production. This backtracking is time-consuming, and
predictive top-down parsers do not allow it. Converting a non-deterministic grammar into
a deterministic grammar is known as left factoring.
In left factoring,
• We make one production for each common prefix.
• The common prefix may be a terminal, a non-terminal, or a combination of both.
• The rest of the derivation is added by new productions.
The grammar obtained after the process of left factoring is called a left-factored
grammar.

Example:

Example 1: Do left factoring in the following grammar-

S → iEtS / iEtSeS / a
E → b

Solution-
The left-factored grammar is-
S → iEtSS' / a
S' → eS / ε
E → b

Example 2: Do left factoring in the following grammar-

A → aAB / aBc / aAc


Solution:

A → aA'
A' → AB / Bc / Ac
Again this grammar has the common prefix A (in the alternatives AB and Ac), so we factor
once more:
A' → AA'' / Bc
A'' → B / c
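One round of this factoring can be sketched as the following Python helper (assumed for illustration, not from the notes); it factors the longest prefix common to all the alternatives it is given, so grouping alternatives by their first symbol, as in the examples above, is left to the caller:

# Sketch (assumed helper): factor the longest common prefix shared by all of the
# given alternatives into one production, moving the differing tails to a fresh
# primed non-terminal (one round of left factoring).
def left_factor_once(A, alternatives):
    prefix = []                               # longest prefix common to every alternative
    for symbols in zip(*alternatives):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    if not prefix:
        return {A: alternatives}              # no common prefix: nothing to factor
    A_dash = A + "'"
    tails = [alt[len(prefix):] or ["ε"] for alt in alternatives]
    return {A: [prefix + [A_dash]], A_dash: tails}

# The two i-alternatives of Example 1:  S -> iEtS | iEtSeS
# become  S -> i E t S S'  and  S' -> ε | e S
print(left_factor_once("S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"]]))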

First and Follow

• Why find FIRST?
We saw the need for backtracking in the introduction to syntax analysis above, and
backtracking is a complex and costly process to implement.
If the compiler knew in advance the first character of the string produced when a
production rule is applied, then by comparing it to the current character (token) of the
input string it could wisely decide which production rule to apply.
Let us take the following grammar:
S -> cAd
A -> bc | a
and the input string "cad".
In this example, if the parser knows that after reading the character 'c' in the input
string and applying S -> cAd, the next character in the input string is 'a', then it
ignores the production rule A -> bc (because 'b' is the first character of the string
produced by this production rule, not 'a') and directly uses the production rule A -> a
(because 'a' is the first character of the string produced by this production rule, and
it is the same as the current character of the input string, which is also 'a').
Hence, if the compiler/parser knows the first characters of the strings that can be
obtained by applying a production rule, it can wisely apply the correct production rule
and obtain the correct syntax tree for the given input string.

• Why find FOLLOW?
The parser faces one more problem. Consider the grammar below:
A -> aBb
B -> c | ε
and suppose the input string to parse is "ab".

As the first character in the input is 'a', the parser applies the rule A -> aBb.

      A
    / | \
   a  B  b

Now the parser checks the second character of the input string, which is 'b', and the
non-terminal to expand is B, but the parser cannot find any string derivable from B that
has 'b' as its first character.
The grammar does contain the production rule B -> ε; if that is applied, B vanishes and
the parser obtains the input "ab", as shown below. But the parser can apply it only when
it knows that the character that follows B in the production rule is the same as the
current character of the input.
In the RHS of A -> aBb, b follows the non-terminal B, i.e. FOLLOW(B) = { b }, and the
current input character read is also 'b'. Hence the parser applies this rule, and it is
able to derive the string "ab" from the given grammar.

      A                 A
    / | \      =>      / \
   a  B  b            a   b
      |
      ε

So FOLLOW lets the parser make a non-terminal vanish (derive ε) when that is needed to
generate the string from the parse tree.

The conclusion is that we need to find the FIRST and FOLLOW sets for a given grammar so
that the parser can properly apply the needed rule at the correct position.

 FIRST
FIRST(X) for a grammar symbol X is the set of terminals that begin the strings
derivable from X.

Rules to compute the FIRST set:

1) If x is a terminal, then FIRST(x) = { x }.
2) If x -> Є is a production rule, then add Є to FIRST(x).
3) If x -> Y1 Y2 ... Yk is a production, then add FIRST(Y1) (except Є) to FIRST(x); if
FIRST(Y1) contains Є, also add FIRST(Y2) (except Є), and so on; if every Yi can derive Є,
then add Є to FIRST(x).
Example 1:
Production Rules of Grammar

E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id

FIRST sets
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }

Example 2:
Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є

FIRST sets

FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba). Whenever a leading symbol can derive Є,
also take the FIRST of the symbol after it: in ACB, A, C and B can each derive Є, so
FIRST(A), FIRST(C), FIRST(B) and Є all contribute; in Cbb, C can derive Є, so b also
contributes; in Ba, B can derive Є, so a also contributes.
= { d, g, h, b, a, Є }

FIRST(A) = { d } U FIRST(BC) = { d, g, h, Є }
(B can derive Є, so FIRST(C) = { h } is included, and since C can also derive Є, Є is
included as well.)

FIRST(B) = { g , Є }

FIRST(C) = { h , Є }
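The rules above can be iterated to a fixed point; the following Python sketch (an assumed implementation, not the official algorithm of these notes) computes the FIRST sets of Example 2 and reproduces the values just listed. Є is written as the string 'ε':

# Sketch (assumed implementation): iterate the FIRST rules to a fixed point.
# Grammar of Example 2:  S -> ACB | Cbb | Ba,  A -> da | BC,  B -> g | ε,  C -> h | ε
EPS = "ε"
grammar = {
    "S": [["A", "C", "B"], ["C", "b", "b"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], [EPS]],
    "C": [["h"], [EPS]],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                for symbol in alt:
                    if symbol not in grammar:          # terminal (or ε): stop here
                        first[nt].add(symbol)
                        break
                    first[nt] |= first[symbol] - {EPS}
                    if EPS not in first[symbol]:
                        break
                else:                                  # every symbol could derive ε
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

print(first_sets(grammar))
# FIRST(S) = {d, g, h, b, a, ε},  FIRST(A) = {d, g, h, ε},
# FIRST(B) = {g, ε},              FIRST(C) = {h, ε}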

Follow
FOLLOW(X) is the set of terminals that can appear immediately to the right of the
non-terminal X in some sentential form.
Rules to compute the FOLLOW set:
1) FOLLOW(S) = { $ } // where S is the starting Non-Terminal
2) If A -> pBq is a production, where p, B and q are any grammar symbols,
then everything in FIRST(q) except Є is in FOLLOW(B).
3) If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
4) If A -> pBq is a production and FIRST(q) contains Є,
then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A).
Example :
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є

F -> (E) | id

FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }

FOLLOW Set
FOLLOW(E) = { $ , ) } // Note ')' is there because of 5th rule
FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st production rule
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = {+,$,)}
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
Example 2:
Production Rules:
S -> aBDh
B -> cC
C -> bC | Є
D -> EF
E -> g | Є
F -> f | Є

FIRST set
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b , Є }
FIRST(D) = FIRST(E) U FIRST(F) = { g, f, Є }
FIRST(E) = { g , Є }
FIRST(F) = { f , Є }

FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }
Example 3:
Production Rules:

S -> ACB|Cbb|Ba

A -> da|BC

B-> g|Є

C-> h| Є

FIRST set

FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba) = { d, g, h, Є, b, a }

FIRST(A) = { d } U {FIRST(B)-Є} U FIRST(C) = { d, g, h, Є }

FIRST(B) = { g, Є }

FIRST(C) = { h, Є }

FOLLOW Set

FOLLOW(S) = { $ }

FOLLOW(A) = { h, g, $ }

FOLLOW(B) = { a, $, h, g }

FOLLOW(C) = { b, g, $, h }
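The FOLLOW rules can be iterated to a fixed point in the same way; the sketch below (an assumed implementation, reusing EPS, grammar and first_sets() from the FIRST sketch earlier) reproduces the FOLLOW sets of Example 3:

# Sketch (assumed implementation): iterate the FOLLOW rules to a fixed point.
# Reuses EPS, grammar and first_sets() from the FIRST sketch above.
def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                              # rule 1
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for i, symbol in enumerate(alt):
                    if symbol not in grammar:           # only non-terminals get FOLLOW
                        continue
                    before = len(follow[symbol])
                    trailer_derives_eps = True          # can everything after `symbol` derive ε?
                    for nxt in alt[i + 1:]:             # rules 2 and 4: FIRST of the trailer
                        if nxt not in grammar:
                            if nxt != EPS:
                                follow[symbol].add(nxt)
                                trailer_derives_eps = False
                            break
                        follow[symbol] |= first[nxt] - {EPS}
                        if EPS not in first[nxt]:
                            trailer_derives_eps = False
                            break
                    if trailer_derives_eps:             # rules 3 and 4: add FOLLOW(nt)
                        follow[symbol] |= follow[nt]
                    if len(follow[symbol]) != before:
                        changed = True
    return follow

print(follow_sets(grammar, "S"))
# FOLLOW(S) = {$},        FOLLOW(A) = {h, g, $},
# FOLLOW(B) = {a, $, h, g},  FOLLOW(C) = {b, g, $, h}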

Note:

1. Є never appears in a FOLLOW set (Є is the empty string, not an input symbol).
2. $ is called the end-marker; it represents the end of the input string and is used
while parsing to indicate that the input string has been completely processed.
3. The grammar used above is a Context-Free Grammar (CFG). The syntax of a programming
language can be specified using a CFG.
4. A CFG production is of the form A -> B, where A is a single non-terminal and B is a
string of grammar symbols (terminals as well as non-terminals).

LL(1) Parser
Here the first L represents that the scanning of the input is done from left to right,
the second L shows that this parsing technique uses the left-most derivation, and the 1
represents the number of look-ahead symbols, i.e. how many input symbols are examined
when making a decision.

Algorithm to construct the LL(1) Parsing Table:

Step 1: Check for left recursion in the grammar; if there is left recursion, remove it
and go to Step 2.
Step 2: Calculate FIRST() and FOLLOW() for all non-terminals.
• FIRST(): for a variable, the set of terminal symbols that can begin the strings
derived from that variable.
• FOLLOW(): the set of terminal symbols that can follow a variable in the process of
derivation.
Step 3: For each production A –> α (A tends to alpha):
• Find FIRST(α) and, for each terminal in FIRST(α), make the entry A –> α in the table.
• If FIRST(α) contains ε (epsilon) as a terminal, then find FOLLOW(A) and, for each
terminal in FOLLOW(A), make the entry A –> α in the table.
• If FIRST(α) contains ε and FOLLOW(A) contains $ as a terminal, then make the entry
A –> α in the table for $.
To construct the parsing table we thus need the two functions FIRST and FOLLOW.
In the table, the rows contain the non-terminals and the columns contain the terminal
symbols. All the null (ε) productions of the grammar go under the FOLLOW elements of
their non-terminal, and the remaining productions go under the elements of their FIRST
set, as described in Step 3; a sketch of this construction follows.
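Step 3 can be sketched as follows (an assumed implementation reusing EPS, first_sets() and follow_sets() from the earlier sketches; first_of_string() and ll1_table() are illustrative names, not standard library functions):

# Sketch (assumed implementation) of Step 3: fill M[A, a] with A -> α for every
# a in FIRST(α), and with the ε-alternatives under FOLLOW(A).
def first_of_string(alpha, first):
    """FIRST of a whole right-hand side alpha (a list of symbols)."""
    result = set()
    for symbol in alpha:
        if symbol not in first:              # terminal or ε
            result.add(symbol)
            return result
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)                          # every symbol could derive ε
    return result

def ll1_table(grammar, start):
    first, follow = first_sets(grammar), follow_sets(grammar, start)
    table = {}                               # (non-terminal, terminal) -> production
    for A, alternatives in grammar.items():
        for alpha in alternatives:
            lookaheads = first_of_string(alpha, first)
            if EPS in lookaheads:            # rules 2 and 3 of Step 3
                lookaheads = (lookaheads - {EPS}) | follow[A]
            for a in lookaheads:
                if (A, a) in table:          # clash: the grammar is not LL(1)
                    raise ValueError(f"conflict at M[{A}, {a}]")
                table[(A, a)] = (A, alpha)
    return table

# The grammar of Example-1 below:
expr_grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["id"], ["(", "E", ")"]],
}
for (A, a), (_, alpha) in sorted(ll1_table(expr_grammar, "E").items()):
    print(f"M[{A}, {a}] = {A} -> {' '.join(alpha)}")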

Example-1:
Consider the Grammar:
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)
*ε denotes epsilon

Find their First and Follow sets:

Production          FIRST           FOLLOW
E –> TE'            { id, ( }       { $, ) }
E' –> +TE' / ε      { +, ε }        { $, ) }
T –> FT'            { id, ( }       { +, $, ) }
T' –> *FT' / ε      { *, ε }        { +, $, ) }
F –> id / (E)       { id, ( }       { *, +, $, ) }


Now, the LL(1) Parsing Table is:

         id           +             *             (            )           $

E        E –> TE'                                 E –> TE'

E'                    E' –> +TE'                               E' –> ε     E' –> ε

T        T –> FT'                                 T –> FT'

T'                    T' –> ε       T' –> *FT'                 T' –> ε     T' –> ε

F        F –> id                                  F –> (E)

(The entries under id and ( come from rule 1 of Step 3; the ε-entries under ) come from
rule 2 and those under $ from rule 3. Blank cells are error entries.)

Here you can also write production numbers instead of the full production rules.

As you can see, all the null (ε) productions are placed under the FOLLOW set of their
symbol, and all the remaining productions lie under the FIRST set of their symbol.

Note: Not every grammar is feasible for an LL(1) parsing table; it may happen that one
cell contains more than one production. If every cell contains at most one production,
then the grammar is LL(1) and can be accepted by an LL(1) parser. Given the table, the
parser itself is a simple stack-driven loop, sketched below.
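The sketch below (assumed driver code, reusing EPS, expr_grammar and ll1_table() from the previous sketch) accepts the token stream id + id * id:

# Sketch (assumed driver): table-driven LL(1) parsing, reusing EPS, expr_grammar
# and ll1_table() from the previous sketch.
def ll1_parse(tokens, grammar, start):
    table = ll1_table(grammar, start)
    stack = ["$", start]                      # $ at the bottom, start symbol on top
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top == EPS:                        # ε on the stack: nothing to match
            continue
        if top not in grammar:                # terminal or $: must match the input
            if top != tokens[i]:
                return False
            i += 1
        else:                                 # non-terminal: consult the table
            entry = table.get((top, tokens[i]))
            if entry is None:
                return False                  # blank cell: syntax error
            stack.extend(reversed(entry[1]))  # push the body, leftmost symbol on top
    return i == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"], expr_grammar, "E"))   # True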

Operator Precedence Parser

An operator precedence parser is a bottom-up parser that interprets an operator grammar,
and it is used only for operator grammars. Ambiguous grammars are not allowed in the
other parsers discussed here, but an operator precedence parser can handle them. It is
mainly used to parse the operators of expressions for a compiler. First we have to look
at operator grammars.

Operator Grammar:

A grammar that is used to define mathematical operators is called an operator grammar or
operator precedence grammar. Such grammars have the restriction that no production has
either an empty right-hand side (a null production) or two adjacent non-terminals in its
right-hand side.

Examples –
• This is an example of an operator grammar:
o E -> E + E / E * E / id
• However, the grammar given below is not an operator grammar, because two non-terminals
are adjacent to each other:
o S -> SAS / a
o A -> bSb / b
• We can convert it into an operator grammar, though:
o S -> SbSbS / SbS / a
o A -> bSb / b
There are two methods for determining what precedence relations should hold between a
pair of terminals:
1. Use the conventional associativity and precedence of the operators.
2. First construct an unambiguous grammar for the language, one that reflects the correct
associativity and precedence in its parse trees, and read the relations off it.
This parser relies on the following three precedence relations:
• a < b means a "yields precedence to" b.
• a > b means a "takes precedence over" b.
• a = b means a "has the same precedence as" b.
(Here <, > and = denote precedence relations between terminals, not arithmetic
comparisons.)

The given grammar is:

E -> E + E / E * E / id
So the operator relation table will contain only terminals.

• No relation is given between id and id, because id will never be compared with id: two
operands cannot appear side by side.
• Between id and +, and between + and id, id is given the higher precedence, because an
identifier is always given higher precedence than any operator.
• For (+, +) we read the row terminal first and then the column terminal; the two '+'
have the same precedence, so we have to check the associativity of the operator. The
operators +, -, *, / are left associative, so we write the > sign for the row terminal.
• $ always has the lowest precedence.
• Using the operator precedence table (not reproduced here) we can parse the input
string.

Suppose we want to parse the string id + id * id (a sketch of this stack-driven loop is
given below):

1) Whenever the top of the stack yields precedence to (or has the same precedence as)
the look-ahead terminal of the input buffer, we push the terminal onto the stack.
2) Whenever the top of the stack takes precedence over the look-ahead terminal, we pop
(reduce) from the stack.
3) Initially the top of the stack is $.
4) TOS (top of the stack) is $ and the look-ahead is "id", so ($, id) are compared; the
relation is <, so we push "id" and advance the look-ahead.
5) Now TOS is "id" and the look-ahead is "+"; (id, +) are compared, and the table shows
>, so we pop "id" from the stack. The stack is again $, with "+" as the look-ahead.
6) ($, +) gives <, so "+" is pushed. Now TOS is "+" and the look-ahead is "id"; (+, id)
gives <, so we push "id" onto the stack again.
7) In this way we follow the same steps until only $ remains on the stack and the
look-ahead is $, at which point the input is accepted.

• The operator relation table has a disadvantage: if we have n operators, then the size
of the table is n*n and the space required is O(n²). In order to decrease the size of
the table, we use an operator precedence function table.
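The loop in steps 1-7 can be sketched as below. The relation table is written here as a Python dictionary whose entries are assumed to follow the conventional precedences of + and * (id highest, $ lowest, both operators left associative); the sketch only decides acceptance and pops one terminal per reduction, whereas a full operator-precedence parser would pop a whole handle delimited by < ... > and build tree nodes:

# Sketch (assumed table and driver) of the relation-driven loop in steps 1-7,
# for the terminals of E -> E + E | E * E | id.
# '<' means "yields precedence to", '>' means "takes precedence over".
relation = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
}

def op_precedence_accepts(tokens):
    """Shift while TOS yields to the look-ahead, reduce while it takes precedence."""
    tokens = tokens + ["$"]
    stack, i = ["$"], 0
    while True:
        top, lookahead = stack[-1], tokens[i]
        if top == "$" and lookahead == "$":
            return True                       # whole input reduced: accept
        rel = relation.get((top, lookahead))
        if rel in ("<", "="):
            stack.append(lookahead)           # shift the look-ahead terminal
            i += 1
        elif rel == ">":
            stack.pop()                       # reduce: pop the handle's terminal
        else:
            return False                      # blank entry: syntax error

print(op_precedence_accepts(["id", "+", "id", "*", "id"]))   # True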
 Operator precedence parsers usually do not store the precedence table
with the relations; rather they are implemented in a special way. Operator
precedence parsers use precedence functions that map terminal symbols

to integers, and the precedence relations between the symbols are


implemented by numerical comparison. The parsing table can be encoded
by two precedence functions f and g that map terminal symbols to integers.
We select f and g such that:

f(a) < g(b) whenever a yields precedence to b


f(a) = g(b) whenever a and b have the same precedence
f(a) > g(b) whenever a takes precedence over b
Example – Consider the following grammar:
E -> E + E / E * E / ( E ) / id
A directed graph is drawn with two nodes f_a and g_a for each terminal a: an edge goes
from f_a to g_b whenever a takes precedence over b, and from g_b to f_a whenever a yields
precedence to b (the graph itself is not reproduced here).

Since there is no cycle in the graph, we can build the function table from it: from each
node we find the longest route to the end.
fid -> g* -> f+ -> g+ -> f$

gid -> f* -> g* -> f+ -> g+ -> f$

From these routes we fill the function table: counting how many arrows there are from
fid up to the end gives 4, so we write 4 for f(id); the path from f+ has length 2, so we
fill in 2 for f(+); and so on for the remaining entries.

The size of the table is only 2n (two rows of n entries each) instead of n*n.

• One disadvantage of function tables is that even though the relation table has blank
entries (which signal errors), the corresponding entries in the function table are
non-blank. Hence the error-detection capability of the relation table is greater than
that of the function table.
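The precedence functions themselves can be sketched as two small dictionaries; the values below are the ones read off the longest routes above (f(id)=4, f(+)=2, and so on; the entries for *, + and $ not explicitly counted in the text are assumed from the same construction), and a numeric comparison then replaces the table lookup:

# Sketch (values taken from the longest-path counts above; f(*), g(*), g(+), f($)
# and g($) are assumed from the same construction) of the precedence functions f, g.
f = {"id": 4, "*": 4, "+": 2, "$": 0}
g = {"id": 5, "*": 3, "+": 1, "$": 0}

def compare(top, lookahead):
    """Numeric stand-in for the relation table: returns '<', '>' or '='."""
    if f[top] < g[lookahead]:
        return "<"
    if f[top] > g[lookahead]:
        return ">"
    return "="

print(compare("+", "*"))    # '<'  : + yields precedence to *
print(compare("id", "+"))   # '>'  : id takes precedence over +
# Note: a blank (error) entry of the relation table still yields some numeric answer
# here, which is exactly the loss of error detection mentioned above.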
