Parsing Part - 1

CSE-361:Compiler Design
Parsing : Part-I
Parsing
Parsing During Compilation
regular
expressions errors
lexical token rest of intermediate

source parser parse
representation
program analyzer get next tree front end
token
symbol
• Collecting token
table information
• Perform type checking
• uses a grammar to check structure of tokens • Intermediate code
generation
• produces a parse tree
• syntactic errors and recovery
• recognize correct syntax
• report errors
Parsers
We categorize the parsers into two groups:
1. Top- Down Parser: The parse tree is created top to bottom, starting from the root
2. Bottom-Up Parser: The parse tree is created bottom to top, starting from the leaves
Both Top-Down and Bottom-Up parsers scan the input from left to right ( One symbol at a time)
Efficient Top-Down and Bottom-Up parsers can be implement only for the sub-classes of context
free grammars
 LL for top-down parsing
 LR for bottom-up parsing
Errors in Programs
Error Detection
Adequate Error Reporting is Not a Trivial
Task
ERROR RECOVERY
ERROR RECOVERY MAY TRIGGER MORE ERRORS!
ERROR RECOVERY APPROACHES: PANIC
MODE
ERROR RECOVERY APPROACHES:
PHRASE-LEVEL RECOVERY
ERROR PRODUCTIONS
GLOBAL CORRECTION
Syntactical Analysis
Each language definition has rules that describe the syntax of well formed programs.
• Format of the rules: context-free grammars
• Why not regular expressions/NFA’s/DFA’s?
 Source program constructs have recursive structure:
digits = [0-9]+;
expr = digits | “(“ expr “+” expr “)”
◦ Finite automata can’t recognize recursive constructs, so cannot ensure expressions

are well-bracketed: a machine with N states cannot remember parenthesis—nesting
depth greater than N
◦ CFG’s are more powerful, but also more costly to implement
CFG versus Regular Expression
Language: set of strings
String: finite sequence of symbols taken from finite alphabet
Regular expressions and CFG’s both describe languages, but over different
alphabets
CFG’s strictly more expressive than RE’s:
Any language recognizable/generated by a RE can also be recognized/generated
by a CFG, but not vice versa.
Also known as Backus-Naur Form (BNF, Algol 60)

CONTEXT FREE GRAMMARS (CFG)
RULE ALTERNATIVE NOTATIONS
Notational Conventions
Terminals
◦ Lower-case letters early in the alphabet: a, b, c
◦ Operator symbols: +, -
◦ Punctuations symbols: parentheses, comma
◦ Boldface strings: id or if
Nonterminals:
◦ Upper-case letters early in the alphabet: A, B, C
◦ The letter S (start symbol)
◦ Lower-case italic names: expr or stmt
Upper-case letters late in the alphabet, such as X, Y, Z, represent either nonterminals or terminals.
Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals.
Notational Conventions
Lower-case Greek letters, such as , , , represent strings of grammar symbols.
Thus A  indicates that there is a single nonterminal A on the left side of the production
and a string of grammar symbols  to the right of the arrow.
If A 1, A 2, …., A k are all productions with A on the left, we may write:
 A 1 | 2 | …. | k
Unless otherwise started, the left side of the first production is the start symbol.
E  E A E | ( E ) | -E | id
A+|-|*| / | 
SUMMARY OF NOTATIONAL
CONVENTIONS
Context Free Grammars : A First Look
Production rules:
1. assign_stmt  id := expr ;
2. expr  expr operator term Derivation: A sequence of grammar rule
3. expr  term applications and substitutions that
transform a starting non-term into a
4. term  id sequence of terminals / tokens.
5. term  real
6. term  integer Terminals: id real integer + - := ;
Nonterminals: assign_stmt, expr, operator, term
7. operator  +
Start symbol: assign_stmt
8. operator  -
Example Grammar:
Simple Arithmetic Expressions
1. expr  expr op expr
2. expr  ( expr )
3. expr  - expr
4. expr  id
9 Production rules
5. op  +
6. op  -
Terminals: id + - * /  ( )
7. op  * Nonterminals: expr, op
8. op  / Start symbol: expr
9. op  
DERIVATIONS
DERIVATIONS
CFG Terminology
Derivation
Let’s derive: id = id + real – integer ;
production rules:
assign_stmt
® id = expr ; =
® id = expr operator term;
® id = expr operator term operator term;
® id = term operator term operator term;
® id = id operator term operator term;
® id = id + term operator term;
® id = id + real operator term;
® id = id + real - term;
® id = id + real - integer;
LEFTMOST DERIVATION
RIGHTMOST DERIVATION
PARSE TREE
PARSE TREE
Input: (id * id) + id

PARSE TREE
Input: (id * id) + id

PARSE TREE
PARSE TREE
AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
Two different parse trees!!

Which derivation of the parse tree is correct??
AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
YES
AMBIGUOUS GRAMMAR
PROBLEMS OF AMBIGUOUS GRAMMAR
4/2+2 4 1
This parse tree gives the Right Answer.
4/2+2=4
PROBLEMS OF AMBIGUOUS GRAMMAR
The Ambiguous Grammar does not consider the

Precedence and Associativity
SOLUTION: REMOVING AMBIGUITY
Ambiguous Grammar:
How to Solve Associativity Problem?
Operators with
Input: 3 + 2 + 6 Parse Tree -1 Parse Tree -2  Two different parse trees!!
 According to the Grammar both
are correct.
Ambiguous Grammar
 But Parse Tree-1 gives WRONG
1. E E + E answer and Parse Tree-2 gives
2. E  E * E RIGHT answer.
num(3) num(6)
3. E  num
num(2) num(6) num(3) num(2) Removing the associativity
problem from the Grammar:
 Operators with same precedence must be 1. E  E + num

resolved by Associativity 2. E  E * num
 Some operators have left associativity (+, -, *, /) 3. E num
and some operators have right associativity (^) Still it is an Ambiguous Grammar!!
Ambiguous Grammar:
How to Solve Associativity Problem?(2)
Input: 3 + 2 + 6 Parse Tree
Removing the associativity

problem from the Grammar:
num(6)
1. E  E + num
2. E  E * num
3. E num num(2)
num(3)
Still it is an Ambiguous Grammar!!
Ambiguous Grammar:
How to Solve Precedence Problem?
Input: 3 + 2 * 6 Parse Tree -1 Parse Tree -2  Two different parse trees!!
 According to the Grammar both
are correct.
Ambiguous Grammar
 But Parse Tree-1 gives RIGHT
1. E E + E answer and Parse Tree-2 gives
2. E  E * E WRONG answer.
num(3) num(6)
3. E  num
num(2) num(6) num(3) num(2)
After Conversion to
Unambiguous Grammar:
 Lower precedence operation rules should be
1. E  E + T | T
declared in the upper level in the Grammar
2. T  T * F | F
 Higher precedence operation rules should be
3. F  num
declared in the lower level in the Grammar
Ambiguous Grammar:
How to Solve Precedence Problem?(2)
Input: 3 + 2 * 6 Parse Tree
After Conversion to
T
Unambiguous Grammar:
T T F
1. E  E + T | T F
F num(6)
2. T  T * F | F num(3)
3. F  num num(2)
UNAMBIGUOUS GRAMMAR
Reading Materials
 Chapter -4 of your Text book:
 Compilers: Principles, Techniques, and Tools
THE END

Parsing Part - 1

Uploaded by

Copyright:

Available Formats

Parsing Part - 1

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Parsing Part - 1

Uploaded by

Copyright:

Available Formats

CSE-361:Compiler Design

lexical token rest of intermediate

◦ Finite automata can’t recognize recursive constructs, so cannot ensure expressions

Also known as Backus-Naur Form (BNF, Algol 60)

Input: (id * id) + id

Input: (id * id) + id

Two different parse trees!!

This parse tree gives the Right Answer.

The Ambiguous Grammar does not consider the

 Operators with same precedence must be 1. E  E + num

Removing the associativity

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.