Chapter 3

The document discusses compiler design, focusing on syntax analysis, parsing techniques, and context-free grammar (CFG). It covers the roles of parsing, derivation methods, and the structure of parse trees, along with concepts like ambiguity and resolving it. Additionally, it explains top-down and bottom-up parsing strategies, including examples and notational conventions used in grammar definitions.

Uploaded by

Tesfalegn Yakob
Compiler design

Department of Computer Science


FACULTY OF COMPUTING
Debre Markos University
Year IV Semester I
by…
Birku L
B.Sc. CS, M.Sc. SE
The Academic year of 2016 E.C
Chapter 3 Outline
• Role of parsing and parse trees
  – Context-free grammar
  – Derivation
  – Ambiguity
• Top-down parsing
  – Recursive-descent parsing
  – Non-recursive parsing
• Bottom-up parsing
  – Shift-reduce (S-R) parsing
  – Operator-precedence parsing
  – LR parsing, LALR parsing
  – SLR and CLR parsing
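Syntax Analysis
This phase takes the list of tokens produced by the lexical analysis and arranges these in a tree structure (called the syntax tree) that reflects the structure of the program. It is also known as parsing.

[Figure: list of tokens → syntax analysis → syntax tree, with error messages as a side output]

Each interior node of the syntax tree represents an operation. The children of the node represent the arguments of the operation.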
Key concepts of Syntax Analysis
• Syntax: refers to the set of rules, principles, and processes that govern the structure of sentences in a given language, specifically word order and hierarchical structure.
• Parsing: the process of analyzing a sequence of input tokens (words or symbols) to determine its grammatical structure. This structure is often represented as a parse tree or syntax tree.
• Grammar: a formal grammar defines the syntactic rules of a language. Common types of grammars used in parsing include context-free grammars (CFGs) and regular grammars.
• Parse tree: a tree representation that depicts the syntactic structure of a string according to some formal grammar. The tree's nodes represent syntactic categories, and the edges represent the relationship between these categories.
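Syntax Analysis
[Figure: Syntax Tree for Assignment Statement]
• The tree has an interior node labeled with the multiplication operator; its children are the operands of the multiplication.
• The root, labeled with the assignment operator, assigns the value of the expression on the right to the identifier on the left.
• So the operators are interior nodes, and their operands are the children of those nodes.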
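Role of Syntax Analysis (Parsing)
1. To obtain the stream of tokens from the lexical analyzer.
2. To group them into syntactic structures using the grammar of the language.
3. To pass a parse tree to the next phase (semantic analysis).
4. To report syntax errors if those tokens do not form a valid construct.

[Figure: source program → lexical analyzer → token → parser → to semantic analysis. The parser requests tokens with getNextToken; both the lexical analyzer and the parser use the symbol table.]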
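Context-free grammar (CFG)
• Context-free grammars (CFGs) are a type of formal grammar used to define the syntax of programming languages and other formal languages.
• CFGs are good for describing the overall syntactic structure of programs.
• They can define the context-free languages, a strict superset of the regular languages, i.e. they are more powerful than regular expressions.
• Regular expressions are normally used to classify tokens …
  – They are simpler and more concise for tokens than a grammar.
  – More efficient scanners can be built from regular expressions.
• CFGs are best explained by example...
• Context-free grammars are used to impose structure.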
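Context-free grammar (or CFG)
• Many programming language constructs have an inherently recursive structure that can be defined by context-free grammars.
• For example, we might have a conditional statement defined by a rule such as: if S1 and S2 are statements and E is an expression, then "if E then S1 else S2" is a statement.
• This form of conditional statement cannot be specified using the notation of regular expressions.
• On the other hand, using the syntactic variable stmt to denote the class of statements and expr the class of expressions, we can express the rule using the grammar production
  stmt → if expr then stmt else stmt
• For these kinds of constructs we use context-free grammars.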
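Context-Free Grammar (CFG)
Textbooks use different symbols and terms to describe CFGs.
Formally, a CFG is a 4-tuple G = (V, T, P, S), where
V = nonterminals (or variables), a finite set
T = terminals (or tokens), a finite set
P = productions, a finite set
S = start symbol, S ∈ V
• Productions have the form A → α, where
  A: a single nonterminal
  α: a string of terminals and/or nonterminals
• The rules describe how to rewrite nonterminals (beginning with the start symbol) into terminals.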
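Context-free grammar (CFG)
A context-free grammar has four components; those are terminals, nonterminals, a start symbol, and productions.
1. Terminals are the basic symbols from which strings are formed. The word "token" is a synonym for "terminal" when we are talking about grammars for programming languages. In the conditional statement, each of the keywords if, then, and else is a terminal.
2. Nonterminals are syntactic variables that denote sets of strings. stmt and expr are nonterminals. The nonterminals define sets of strings that help define the language generated by the grammar.
3. In a grammar, one nonterminal is distinguished as the start symbol.
4. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings.
   • Each production consists of a nonterminal, followed by an arrow (sometimes the symbol ::= is used in place of the arrow), followed by a string of nonterminals and terminals.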
Arithmetic Expressions
Suppose we want to describe all arithmetic expressions built from identifiers, parentheses, and the operators +, -, *, / and %.
Here is one possible CFG:
Expr → Expr Op Expr
Expr → ( Expr )
Expr → id
Op → +
Op → -
Op → *
Op → /
Op → %
The nonterminal symbols are V = {Expr, Op}
The terminal symbols are T = {(, ), id, +, -, *, /, %}
How many productions are there? P contains 3 productions for Expr and 5 for Op, a total of 8 productions.
The start symbol is S = Expr
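Example of CFG
Suppose we want to describe only addition and multiplication.
Here is one possible CFG:
Expr → Expr + Expr
Expr → Expr * Expr
Expr → id
V = {Expr}, T = {id, +, *}, P = 3 productions, S = Expr

S → aS | Sa | a
V = {S}, T = {a}, P = 3 productions, S = S

S → aSbS | bSaS | ε
V = {S}, T = {a, b}, P = 3 productions, S = S
(ε denotes the empty string; it is not a terminal.)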
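Notational Conventions
• To avoid always having to state that "these are the terminals," "these are the nonterminals," and so on, we shall employ the following notational conventions with regard to grammars.
1. These symbols are terminals:
   a) Lowercase letters early in the alphabet, such as a, b, c.
   b) Operator symbols such as +, -, *, / etc.
   c) Punctuation symbols such as parentheses, comma.
   d) The digits 0, 1, ..., 9.
   e) Boldface strings such as id or if.
2. These symbols are nonterminals:
   a) Uppercase letters early in the alphabet, such as A, B, C.
   b) The letter S, which, when it appears, is usually the start symbol.
   c) Lowercase italic names such as expr or stmt.
3. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent strings of terminals.
4. Lowercase Greek letters, α, β, γ for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).
5. If A → α1, A → α2, ..., A → αk are all productions with A on the left (we call them A-productions), we may write A → α1 | α2 | ... | αk. We call α1, α2, ..., αk the alternatives for A.
6. Unless otherwise stated, the left side of the first production is the start symbol.
• Using these shorthands, we could write the grammar of arithmetic expressions concisely as
  Expr → Expr A Expr | ( Expr ) | id
  A → + | - | * | / | %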
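Derivation
• Productions are treated as rewriting rules to generate a string; this process is called derivation.
• To show that a sentence is in the language of a grammar (CFG):
  – Start with the start symbol.
  – Repeatedly replace one of the nonterminals by the right side of one of its productions.
  – Stop when the string contains only terminals.
• At each step, we choose a nonterminal to replace.
  – This choice can lead to different derivations.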
Left-most derivation
A left-most derivation of a sentential form is one in which the rules transforming the left-most nonterminal are always applied first.
Grammar: Expr → Expr + Expr | Expr * Expr | id
V = {Expr}, T = {id, +, *}, P = 3 productions, S = Expr
String: id + id * id (for example 2 + 3 * 4)

Derivation 1                       Derivation 2
Expr ⇒ Expr + Expr                 Expr ⇒ Expr * Expr
     ⇒ id + Expr                        ⇒ Expr + Expr * Expr
     ⇒ id + Expr * Expr                 ⇒ id + Expr * Expr
     ⇒ id + id * Expr                   ⇒ id + id * Expr
     ⇒ id + id * id                     ⇒ id + id * id
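The first derivation can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the production names ("plus", "times", "id") and the function `leftmost_derivation` are hypothetical labels chosen for illustration. It always rewrites the left-most occurrence of the nonterminal Expr.

```python
# Grammar from the slide: Expr -> Expr + Expr | Expr * Expr | id
# The dictionary keys are hypothetical names for the three productions.
PRODUCTIONS = {
    "plus": ["Expr", "+", "Expr"],
    "times": ["Expr", "*", "Expr"],
    "id": ["id"],
}


def leftmost_derivation(choices, start="Expr"):
    """Apply the chosen productions, always rewriting the left-most nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for name in choices:
        i = form.index("Expr")             # position of the left-most nonterminal
        form[i:i + 1] = PRODUCTIONS[name]  # replace it by the production body
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation(["plus", "id", "times", "id", "id"])
print(" => ".join(steps))
```

Choosing "times" first instead (`["times", "plus", "id", "id", "id"]`) replays the second derivation of the same string.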
Right-most derivation
A right-most derivation of a sentential form is one in which the rules transforming the right-most nonterminal are always applied first.
Grammar: Expr → Expr + Expr | Expr * Expr | id
V = {Expr}, T = {id, +, *}, P = 3 productions, S = Expr
String: id + id * id (for example 2 + 3 * 4)

Derivation 1                       Derivation 2
Expr ⇒ Expr + Expr                 Expr ⇒ Expr * Expr
     ⇒ Expr + Expr * Expr               ⇒ Expr * id
     ⇒ Expr + Expr * id                 ⇒ Expr + Expr * id
     ⇒ Expr + id * id                   ⇒ Expr + id * id
     ⇒ id + id * id                     ⇒ id + id * id
Parse Tree and derivation
Parsing is the process of checking that a string is in the language of the CFG for your programming language. It is usually coupled with creating an abstract syntax tree.

Derivation 1                       Derivation 2
Expr ⇒ Expr + Expr                 Expr ⇒ Expr * Expr
     ⇒ id + Expr                        ⇒ Expr + Expr * Expr
     ⇒ id + Expr * Expr                 ⇒ id + Expr * Expr
     ⇒ id + id * Expr                   ⇒ id + id * Expr
     ⇒ id + id * id                     ⇒ id + id * id

Parse tree 1: Expr[ Expr[id] + Expr[ Expr[id] * Expr[id] ] ]
Parse tree 2: Expr[ Expr[ Expr[id] + Expr[id] ] * Expr[id] ]

With id + id * id read as 2 + 3 * 4:
Parse tree 1 gives 2 + (3 * 4) = 14
Parse tree 2 gives (2 + 3) * 4 = 20
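The two results can be checked by evaluating each tree bottom-up. This is a small illustrative sketch; the tuple encoding `(left, operator, right)` and the function `evaluate` are assumptions made here, not notation from the slides.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree given as a number or a (left, op, right) tuple."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree


tree_plus_at_root = (2, "+", (3, "*", 4))   # multiplication is grouped first
tree_times_at_root = ((2, "+", 3), "*", 4)  # addition is grouped first
print(evaluate(tree_plus_at_root), evaluate(tree_times_at_root))
```

The same token sequence yields two different values, which is why an ambiguous grammar is a problem for a compiler.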
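Ambiguity
• If a grammar has more than one parse tree for some string, then it is ambiguous.
• Equivalently, a grammar is said to be ambiguous if there is at least one string with two or more left-most or right-most derivations.
• Note that ambiguity is a property of grammars, not languages.
• There is no general algorithm for converting an ambiguous grammar into an unambiguous one.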
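Resolving ambiguity
• Ambiguity may be eliminated by applying disambiguating rules (precedence and associativity) or by rewriting the grammar.
• By left factoring, we can transform a grammar to have this property: no two alternatives for a single nonterminal have a common prefix.
  – For each nonterminal A, find the longest prefix common to two or more of its alternatives and factor it out.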
Backus-Naur Form (BNF)
• BNF (also called Backus Normal Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing.
• A BNF specification is a set of derivation rules, written as
<symbol> ::= expression
BNF for valid arithmetic expression
<expr> ::= <expr> <op> <expr>
<expr> ::= ( <expr> )
<expr> ::= <expr>
<expr> ::= id
<op> ::= + | - | * | /
Left factoring
Before left factoring (alternatives that share a common prefix must be left factored):
<expr> ::= <term> + <expr>
         | <term> - <expr>
         | <term>
<term> ::= <factor> * <term>
         | <factor> / <term>
         | <factor>
After left factoring:
<expr> ::= <term> <expr'>
<expr'> ::= + <expr>
          | - <expr>
          | ε
<term> ::= <factor> <term'>
<term'> ::= * <term>
          | / <term>
          | ε
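Left factoring
Ambiguous grammar:
<stmt> ::= if <expr> then <stmt>
         | if <expr> then <stmt> else <stmt>
         | <otherstmt>
After left factoring:
<stmt> ::= if <expr> then <stmt> <stmt'>
<stmt> ::= <otherstmt>
<stmt'> ::= ε | else <stmt>
This generates the same language as the ambiguous grammar, but a parser that prefers the else alternative of <stmt'> applies the usual rule: each else is matched with the closest unmatched then.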
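Resolving ambiguity
Associativity: for a left-associative operator the parse tree should grow to the left, which means the grammar is written with left recursion. Associativity decides the grouping of operators of equal precedence.
Ambiguous grammar:   E → E + E | id
Unambiguous grammar: E → E + T | T
                     T → id
String: id + id + id
Parse tree: E[ E[ E[ T[id] ] + T[id] ] + T[id] ]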
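Resolving ambiguity
Precedence: here we are concerned about the priority of operators. With the ambiguous grammar we get two parse trees, and so two different results, from the same string.
Grammar: Expr → Expr + Expr | Expr * Expr | id
String: id + id * id
Parse tree 1: Expr[ Expr[id] + Expr[ Expr[id] * Expr[id] ] ]     2 + (3 * 4) = 14
Parse tree 2: Expr[ Expr[ Expr[id] + Expr[id] ] * Expr[id] ]     (2 + 3) * 4 = 20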
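Resolving ambiguity
Precedence: here we are concerned about the priority of operators.
Ambiguous grammar:   E → E + E | E * E | id
Unambiguous grammar: E → E + T | T
                     T → T * F | F
                     F → id
String: id + id * id
Parse tree: E[ E[ T[ F[id] ] ] + T[ T[ F[id] ] * F[id] ] ]
Value: 2 + (3 * 4) = 14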
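Resolving ambiguity
Precedence: here we are concerned about the priority of operators.
Ambiguous grammar:   E → E or E | E and E | not E
Unambiguous grammar: E → E or T | T
                     T → T and F | F
                     F → not F | true | false
String: id or id and (not id), for example true or false and (not true)
Parse tree: E[ E[ T[ F[true] ] ] or T[ T[ F[false] ] and F[ not F[true] ] ] ]
Value: true or (false and (not true)) = true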

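Parser
A parse tree may be viewed as a graphical representation for a derivation that filters out the choice regarding replacement order.
Mainly there are two types of parser:
Top-down parsers:
• Start at the root of the parse tree and fill in towards the leaves
• Pick a production and try to match the input
• Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers:
• Start at the leaves and fill in towards the root
• Build up to a valid parse tree for the input
• Use a stack to store both state and sentential forms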
Top-Down Parser
A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right.
It can also be viewed as finding a left-most derivation for an input string.
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

[Figure: Top-down parsing. The sequence of parse trees built for id + id * id, one per left-most (lm) derivation step, starting from the root E.]
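Bottom-up parser
A bottom-up parser constructs a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
It starts from the input string and reduces it to the root (the start symbol).
Grammar:
E → E + T | T
T → T * F | F
F → id
Sequence of reductions for id * id:
id * id,  F * id,  T * id,  T * F,  T,  E

[Figure: Bottom-up parsing. The partial parse trees built at each of the steps above.]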
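Top-down parser
• Based on the information the parser currently has about the input, a decision is made to go with one particular production.
• If this choice leads to a dead end, the parser would have to backtrack to that decision point, moving backwards through the input, and
• start again making a different choice, and so on until it either found the production that was the appropriate one or ran out of choices.
• For example, consider this simple grammar:
  S → bab | bA
  A → d | cA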
Top-down parser
Let's follow parsing the input bcd. In the trace below, the column on the left is the expansion thus far, the middle is the remaining input, and the right is the action attempted at each step:

Expansion   Remaining input   Action
S           bcd               Try S → bab
bab         bcd               match b
ab          cd                dead-end, backtrack
S           bcd               Try S → bA
bA          bcd               match b
A           cd                Try A → d
d           cd                dead-end, backtrack
A           cd                Try A → cA
cA          cd                match c
A           d                 Try A → d
d           d                 match d
                              Success!
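The trace above can be reproduced with a small backtracking recognizer. This is a minimal sketch under stated assumptions: the grammar S → bab | bA, A → d | cA is read off the trace, and the names `GRAMMAR`, `derive` and `accepts` are hypothetical. A generator is used so that reaching a dead end simply falls through to the next alternative.

```python
# Grammar read off the trace: S -> bab | bA, A -> d | cA
GRAMMAR = {
    "S": [["b", "a", "b"], ["b", "A"]],
    "A": [["d"], ["c", "A"]],
}


def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        for alternative in GRAMMAR[head]:  # try the alternatives in order
            yield from derive(alternative + rest, text, pos)
    elif pos < len(text) and text[pos] == head:
        yield from derive(rest, text, pos + 1)  # terminal matched, advance
    # otherwise: dead end, so the caller moves on to its next alternative


def accepts(text):
    """True if the whole input can be derived from the start symbol S."""
    return any(end == len(text) for end in derive(["S"], text, 0))


print(accepts("bcd"), accepts("bab"), accepts("bcc"))
```

This approach only terminates for grammars without left recursion, and in the worst case it can take exponential time, which is one motivation for predictive parsing.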
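Recursive Descent parsing
• A recursive-descent parser consists of several small functions (procedures), one for each nonterminal in the grammar.
• As we parse a sentence, we call the functions that correspond to the left-side nonterminal of the productions we are applying. If these productions are recursive, we end up calling the functions recursively.
• Execution begins with the procedure for the start symbol.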
A procedure for a nonterminal:
void A() {
    /* choose an A-production, A → X1 X2 ... Xk */
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}
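As an illustration of the procedure scheme above, here is a sketch of a recursive-descent recognizer for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id used in the top-down parsing slides. The class and method names (`Parser`, `peek`, `match`, `parse`) are assumptions made for this example, and the input is assumed to be already split into tokens.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]  # "$" marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def parse(tokens):
    """Return True if the token list is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")   # the whole input must be consumed
        return True
    except ParseError:
        return False


print(parse(["id", "+", "id", "*", "id"]))
```

Each procedure decides which alternative to use by looking only at the next token, so no backtracking is needed for this grammar.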
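Top-down parser
• A predictive parser is characterized by its ability to choose the production to apply solely on the basis of the next input symbol and the current nonterminal being processed.
• To enable this, the grammar must take a particular form.
• We call such a grammar LL(1):
  – the first "L" means we scan the input from left to right;
  – the second "L" means we create a left-most derivation; and
  – the "1" means one input symbol of lookahead.
• Informally, an LL(1) grammar has no left-recursive productions and has been left factored.
• Predictive parsing is a special form of recursive-descent parsing without backtracking.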
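Non Recursive Descent parsing
• It is possible to build a non-recursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.
• The key problem during predictive parsing is that of determining the production to be applied for a nonterminal.
• A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.
  – The input buffer contains the string to be parsed, followed by $, a symbol used as a right end marker to indicate the end of the input string.
  – The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol of the grammar on top of $.
  – The parsing table is a two-dimensional array M[A, a], where A is a nonterminal, and a is a terminal or the symbol $.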
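Non Recursive Descent parsing
• The parser is controlled by a program that behaves as follows. The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser.
• The possible actions are:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the parser looks at the parsing table entry M[X, a]. If M[X, a] holds the production X → Y1 Y2 ... Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack, so that Y1 is on top. The parser outputs the production X → Y1 Y2 ... Yk to represent a step of the derivation.
4. None of the above → error
   • All empty entries in the parsing table are errors.
   • If X is a terminal symbol different from a, this is also an error.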
Non Recursive Descent parsing Algorithm
• Input buffer
  – The string to be parsed. We will assume that its end is marked with a special symbol $.
• Output
  – A production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
• Stack
  – Contains the grammar symbols.
  – At the bottom of the stack, there is a special end marker symbol $.
  – Initially the stack contains only the symbol $ and the starting symbol S.
  – When the stack is emptied (i.e. only $ is left on the stack), parsing is completed.
• Parsing table
  – A two-dimensional array M[A, a]
  – Each row is a nonterminal symbol
  – Each column is a terminal symbol or the special symbol $
  – Each entry holds a production rule.
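LL(1) Grammars
• Predictive parsers are those recursive-descent parsers that need no backtracking.
• Grammars for which we can create predictive parsers are called LL(1):
  – the first L means scanning the input from left to right;
  – the second L means producing a left-most derivation;
  – the 1 stands for using one input symbol of lookahead for each parsing decision.
• A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the following conditions hold:
  – For no terminal a do α and β both derive strings beginning with a.
  – At most one of α or β can derive the empty string.
  – If α ⇒* ε, then β does not derive any string beginning with a terminal in Follow(A).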
LL(1) Grammars
Example
left factoring
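First and Follow
First(α)
• First(α) is the set of terminals that begin strings derived from α.
• If α ⇒* ε then ε is also in First(α).
• In predictive parsing, when we have A → α | β, if First(α) and First(β) are disjoint sets then we can select the appropriate A-production by looking at the next input symbol.
Follow(A)
• Follow(A), for any nonterminal A, is the set of terminals a that can appear immediately after A in some sentential form.
• If we have S ⇒* αAaβ for some α and β, then a is in Follow(A).
• If A can be the rightmost symbol in some sentential form, then $ is in Follow(A).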
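Computing First
• To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any First set:
1. If X is a terminal, then First(X) = {X}.
2. If X is a nonterminal and X → Y1 Y2 ... Yk is a production for some k ≥ 1, then place a in First(X) if for some i, a is in First(Yi) and ε is in all of First(Y1), ..., First(Yi-1); that is, Y1 ... Yi-1 ⇒* ε. If ε is in First(Yj) for all j = 1, ..., k, then add ε to First(X).
3. If X → ε is a production, then add ε to First(X).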
!
• of is
• of is
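Computing Follow
• To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
1. Place $ in Follow(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in First(β) except ε is in Follow(B).
3. If there is a production A → αB, or a production A → αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).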
!
• of is
of is
Examples
• To find the First sets of the given grammars, remember the above rules.
1. S → abc | def | ghi
   First(S) = {a, d, g}
2. S → ABC | def | ghi
   A → a | b | c
   B → b
   D → d
   First(S) = First(A) ∪ {d, g} = {a, b, c, d, g}
3. S → ABC
   A → a | b | ε
   B → c | d | ε
   C → e | f | ε
   First(A) = {a, b, ε}. Because A can derive ε, we do not add ε to First(S) while symbols remain after A; instead we continue with First(B). B can also derive ε, so we continue with First(C). Since C can derive ε as well, ε is finally added:
   First(S) = {a, b, c, d, e, f, ε}
Examples
• To find the Follow sets of a grammar G, remember the above rules.
1. The Follow set of the starting symbol contains $.
2. S → ACD
   C → a | b
   Follow(A) = First(C) = {a, b} and Follow(D) = Follow(S) = {$}
3. S → aSbS | bSaS
   A → a | b | c
   B → b
   D → d
   Follow(S) = {$, b, a}
4. S → ABC
   A → DEF
   B → ε
   C → ε
   D → ε
   Follow(A) starts from First(B); since B derives ε, we move on to the next symbol and use First(C) again in the same way.
First and Follow function
Production        First()          Follow()
S → ABCDE         {a, b, c}        {$}
A → a | ε         {a, ε}           {b, c}
B → b | ε         {b, ε}           {c}
C → c             {c}              {d, e, $}
D → d | ε         {d, ε}           {e, $}
E → e | ε         {e, ε}           {$}

Production        First()          Follow()
S → Bb | Cd       {a, b, c, d}     {$}
B → aB | ε        {a, ε}           {b}
C → cC | ε        {c, ε}           {d}
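Construction of predictive parsing table
• For each production A → α in the grammar, do the following:
1. For each terminal a in First(α), add A → α to M[A, a].
2. If ε is in First(α), then for each terminal b in Follow(A), add A → α to M[A, b].
3. If ε is in First(α) and $ is in Follow(A), add A → α to M[A, $] as well.
• If, after performing the above, there is no production in M[A, a], then set M[A, a] to error.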
Example
• For the given grammar:
  – Eliminate left recursion
  – Apply left factoring
  – Calculate First and Follow
  – Construct the parsing table
  – Check whether the string is acceptable or not
Construction of predictive parsing table
Production          First()       Follow()
E → T E'            {id, (}       {$, )}
E' → + T E' | ε     {+, ε}        {$, )}
T → F T'            {id, (}       {+, $, )}
T' → * F T' | ε     {*, ε}        {+, $, )}
F → id | ( E )      {id, (}       {*, +, $, )}
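The First and Follow sets in the table above can be recomputed by iterating the rules until nothing changes. The following is a sketch under stated assumptions: the grammar is encoded as a dictionary from nonterminal to list of production bodies, and the helper names `first_of`, `compute_first` and `compute_follow` are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["id"], ["(", "E", ")"]],
}


def first_of(symbols, first):
    """First set of a string of grammar symbols, given First of the nonterminals."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in first else {sym}  # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the string can derive ε
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no First set grows
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # rule 1
    changed = True
    while changed:  # repeat until no Follow set grows
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # Follow is defined for nonterminals only
                    tail = first_of(body[i + 1:], first)
                    new = tail - {EPS}        # rule 2
                    if EPS in tail:
                        new |= follow[nt]     # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

The fixed-point loop mirrors the wording of the rules: they are applied repeatedly until no set can be extended.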
Construction of predictive parsing table
        id          +             *             (            )          $
E       E → T E'                                E → T E'
E'                  E' → + T E'                              E' → ε     E' → ε
T       T → F T'                                T → F T'
T'                  T' → ε        T' → * F T'                T' → ε     T' → ε
F       F → id                                  F → ( E )
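The table above follows from the construction rules. As a sketch, the code below hard-codes the First set of each production body and the Follow sets from the preceding slide (rather than recomputing them) and fills M accordingly; the names `PRODUCTIONS`, `FOLLOW`, `build_table` and `M` are hypothetical.

```python
EPS = "ε"
# (head, body, First(body)) for each production of the expression grammar
PRODUCTIONS = [
    ("E", ["T", "E'"], {"id", "("}),
    ("E'", ["+", "T", "E'"], {"+"}),
    ("E'", [EPS], {EPS}),
    ("T", ["F", "T'"], {"id", "("}),
    ("T'", ["*", "F", "T'"], {"*"}),
    ("T'", [EPS], {EPS}),
    ("F", ["id"], {"id"}),
    ("F", ["(", "E", ")"], {"("}),
]
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},
}


def build_table(productions, follow):
    """Fill M[A, a]; keys that are absent correspond to error entries."""
    table = {}
    for head, body, first in productions:
        lookaheads = first - {EPS}
        if EPS in first:
            lookaheads |= follow[head]  # Follow already contains "$" where needed
        for terminal in lookaheads:
            if (head, terminal) in table:
                raise ValueError(f"conflict at M[{head}, {terminal}]: not LL(1)")
            table[(head, terminal)] = body
    return table


M = build_table(PRODUCTIONS, FOLLOW)
print(len(M), M[("T'", "+")], M[("F", "(")])
```

A conflict (two productions for the same entry) would mean the grammar is not LL(1); for this grammar none occurs.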
Construction of predictive parsing table
Parsing the string id + id * id:

Stack          Input               Output
$E             id + id * id$
$E'T           id + id * id$       E → T E'
$E'T'F         id + id * id$       T → F T'
$E'T'id        id + id * id$       F → id
$E'T'          + id * id$
$E'            + id * id$          T' → ε
$E'T+          + id * id$          E' → + T E'
$E'T           id * id$
$E'T'F         id * id$            T → F T'
$E'T'id        id * id$            F → id
$E'T'          * id$
$E'T'F*        * id$               T' → * F T'
$E'T'F         id$
$E'T'id        id$                 F → id
$E'T'          $
$E'            $                   T' → ε
$              $                   E' → ε

[Figure: the parse tree for id + id * id built by the productions in the Output column]
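The trace above can be reproduced by a short table-driven driver. This is a minimal sketch, assuming the parsing table for the expression grammar is hard-coded as a dictionary and the input is already tokenized; the names `TABLE` and `ll1_parse` are hypothetical.

```python
EPS = "ε"
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPS], ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS], ("T'", "$"): [EPS],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {head for head, _ in TABLE}


def ll1_parse(tokens, start="E"):
    """Return the productions of the left-most derivation, or None on error."""
    buffer = tokens + ["$"]
    stack = ["$", start]
    output = []
    i = 0
    while True:
        top, a = stack[-1], buffer[i]
        if top == "$" and a == "$":
            return output                  # rule 1: accept
        if top not in NONTERMINALS:
            if top != a:
                return None                # terminal mismatch
            stack.pop()                    # rule 2: match and advance
            i += 1
        else:
            body = TABLE.get((top, a))
            if body is None:
                return None                # empty table entry
            stack.pop()                    # rule 3: expand, Y1 ends up on top
            stack.extend(s for s in reversed(body) if s != EPS)
            output.append(f"{top} -> {' '.join(body)}")


for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The printed productions appear in the same order as the Output column of the trace.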
Characteristics of LL(1) Grammar
• Left-to-right scanning: the parser reads the input from left to right.
• Leftmost derivation: the parser constructs the leftmost derivation of the sentence.
• One lookahead token: the parser uses one token of lookahead to make parsing decisions.
• No left recursion: the grammar should not have left recursion, as it can lead to infinite recursion in the parser.
• Non-ambiguous: each production should be unambiguously chosen based on the current nonterminal and lookahead token.
Bottom Up parsing
• A bottom-up parser constructs the parse tree from the bottom to the top, i.e. from the leaves to the root.
• It is the process of reducing the input string to the start symbol of the grammar.
• It constructs the parse tree in reverse, which means it uses the reverse of a right-most derivation (RMD) when reducing the input string.
• The most popular bottom-up parser is the LR parser.
• The main objective of a bottom-up parser is to construct a parse tree starting from the input string and proceeding upward to generate the start symbol of the grammar.
• Steps
  – Parsing starts with the input string
  – Scan the input from left to right
  – Detect the right handle
  – Apply a production rule to reduce the handle
  – The procedure continues until it derives the start symbol
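Shift-reduce parser
• The general idea is to shift some symbols of input onto the stack until a reduction can be applied.
• At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of the production.
• The key decisions during bottom-up parsing are about when to reduce and about what production to apply.
• A reduction is the reverse of a step in a derivation.
• The goal of a bottom-up parser is to construct a derivation in reverse: that means it traces a right-most derivation backwards.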
Types of Bottom up parser
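Shift-reduce parser
• A handle is a substring that matches the body of a production and whose reduction represents one step along the reverse of a right-most derivation.
• A stack is used to hold grammar symbols.
• The handle always appears on top of the stack.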
Shift-reduce parser

Stack          Input             Action
$              id + id * id$     shift
$id            + id * id$        reduce by E → id
$E             + id * id$        shift
$E +           id * id$          shift
$E + id        * id$             reduce by E → id
$E + E         * id$             shift
$E + E *       id$               shift
$E + E * id    $                 reduce by E → id
$E + E * E     $                 reduce by E → E * E
$E + E         $                 reduce by E → E + E
$E             $                 accept

Fig. Configurations of a shift-reduce parser on id + id * id
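Operator Precedence parsing
• Operator grammars have the property that no production right side is ε or has two adjacent nonterminals.
• Example:
  E → E A E | id
  A → * | +
  The first production is not in operator-grammar form, because its right side E A E has adjacent nonterminals. These two properties are required by the definition of an operator grammar.
• We can change it into an operator grammar by substituting the alternatives of A:
  E → E A E | id, A → + | *   becomes   E → E + E | E * E | id
• Another operator grammar:
  E → E * E | E + E | E - E | E ^ E | id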
Operator Precedence parsing
There are three possible precedence relations:
• a ·> b: terminal a has higher precedence than terminal b
• a <· b: terminal a has lower precedence than terminal b
• a ≐ b: terminals a and b have the same precedence

Grammar: E → E or E | E and E | not E | id
String: id or id and id
Construct of Operator Precedence Relation table
Grammar: E → E + E | E * E | id

        id      +       *       $
id      --      ·>      ·>      ·>
+       <·      ·>      <·      ·>
*       <·      ·>      ·>      ·>
$       <·      <·      <·      accepted

• When we construct the relation table, id has the highest precedence and $ has the lowest precedence.
• If there is + against +, give the higher precedence to the one on the left, because we apply left associativity; similarly for *.
Example: id + id * id$

[Figure: parse tree for id + id * id, built by reducing each handle on the stack until only E remains above $]
Operator Precedence parsing Algorithm

Stack        Relation   Input            Operation
$            <·         id + id * id$    Push/shift id
$id          ·>         + id * id$       Pop/reduce E → id
$E           <·         + id * id$       Push/shift +
$E+          <·         id * id$         Push/shift id
$E+id        ·>         * id$            Pop/reduce E → id
$E+E         <·         * id$            Push/shift *
$E+E*        <·         id$              Push/shift id
$E+E*id      ·>         $                Pop/reduce E → id
$E+E*E       ·>         $                Pop/reduce E → E * E
$E+E         ·>         $                Pop/reduce E → E + E
$E           --         $                Accepted
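The shift and reduce decisions in the table above can be driven directly by the precedence relations. The sketch below is a simplified illustration for the grammar E → E + E | E * E | id only: handle detection is hard-coded for these three productions, and the names `PRECEDENCE` and `parse` are hypothetical.

```python
LT, GT = "<", ">"  # stand for the relations <. and .>
# PRECEDENCE[a][b]: relation between the topmost stack terminal a and input terminal b
PRECEDENCE = {
    "id": {"+": GT, "*": GT, "$": GT},
    "+": {"id": LT, "+": GT, "*": LT, "$": GT},
    "*": {"id": LT, "+": GT, "*": GT, "$": GT},
    "$": {"id": LT, "+": LT, "*": LT},
}


def parse(tokens):
    """Return the list of reductions, or None if the input is rejected."""
    buffer = tokens + ["$"]
    stack = ["$"]
    reductions = []
    i = 0
    while True:
        top = next(s for s in reversed(stack) if s != "E")  # topmost terminal
        a = buffer[i]
        if top == "$" and a == "$":
            return reductions if stack == ["$", "E"] else None
        relation = PRECEDENCE[top].get(a)
        if relation == LT:
            stack.append(a)                 # push/shift
            i += 1
        elif relation == GT:
            if top == "id" and stack[-1] == "id":
                stack[-1] = "E"             # pop/reduce E -> id
                reductions.append("E -> id")
            elif stack[-3:] in (["E", "+", "E"], ["E", "*", "E"]):
                op = stack[-2]
                del stack[-2:]              # leave one E in place of E op E
                reductions.append(f"E -> E {op} E")
            else:
                return None                 # no valid handle on the stack
        else:
            return None                     # blank table entry


print(parse(["id", "+", "id", "*", "id"]))
```

The multiplication is reduced before the addition, which is exactly what the relation + <· * encodes.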
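Construct of Operator Precedence Function table
• If we have n terminals, our relation table has n × n entries.
• So, to reduce the size of the table, we use an operator precedence function table.
• To construct it we have to draw a graph with the nodes of two functions, f(i) and g(j), where i ranges over the rows and j over the columns of the relation table:

        id      +       *       $
id      --      ·>      ·>      ·>
+       <·      ·>      <·      ·>
*       <·      ·>      ·>      ·>
$       <·      <·      <·      --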
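Construct of Operator Precedence Function table
• Which one is the longest path for each function? For example:
  f(id) → g(*) → f(+) → g(+) → f($)              length 4
  g(id) → f(*) → g(*) → f(+) → g(+) → f($)       length 5

Relation table (rows f(i), columns g(j)):
        id      +       *       $
id      --      ·>      ·>      ·>
+       <·      ·>      <·      ·>
*       <·      ·>      ·>      ·>
$       <·      <·      <·      --

Function table:
        id      +       *       $
f()     4       2       4       0
g()     5       1       3       0

• The functions exist because there is no cycle in the graph.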
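The function table can be recomputed from the relation table with a longest-path search. This sketch assumes the usual graph construction: one node f_a and one node g_a per terminal, an edge f_a → g_b when a ·> b, and an edge g_b → f_a when a <· b. The relation table here has no ≐ entries, so no nodes need to be merged. The names `RELATION`, `edges` and `longest_path` are hypothetical.

```python
from functools import lru_cache

TERMINALS = ["id", "+", "*", "$"]
# RELATION[(a, b)] is "<" for a <. b and ">" for a .> b; blank entries are omitted
RELATION = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

# Build the graph: a .> b gives edge f_a -> g_b, a <. b gives edge g_b -> f_a
edges = {(kind, t): [] for kind in "fg" for t in TERMINALS}
for (a, b), rel in RELATION.items():
    if rel == ">":
        edges[("f", a)].append(("g", b))
    else:
        edges[("g", b)].append(("f", a))


@lru_cache(maxsize=None)
def longest_path(node):
    """Length of the longest path starting at node (the graph is assumed acyclic)."""
    return max((1 + longest_path(nxt) for nxt in edges[node]), default=0)


f = {t: longest_path(("f", t)) for t in TERMINALS}
g = {t: longest_path(("g", t)) for t in TERMINALS}
print(f)
print(g)
```

If the graph contained a cycle, no precedence functions would exist and this recursive search would not terminate, so a real implementation would need a cycle check first.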
Construct of Operator Precedence Relation table

E → E - E | E / E | id
String: id or id and id

E → E or E | E and E | id
String: id or id and id
Construct of Operator Precedence Function table
From Relation table

 The Functional table is


Reading assignment
What is LR parsing?
Questions
