Parsing Part - 1

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 53

CSE-361:Compiler Design

Parsing : Part-I
Parsing
Parsing During Compilation
regular
expressions errors

lexical token rest of intermediate


source parser parse
representation
program analyzer get next tree front end
token

symbol
• Collecting token
table information
• Perform type checking
• uses a grammar to check structure of tokens • Intermediate code
generation
• produces a parse tree
• syntactic errors and recovery
• recognize correct syntax
• report errors
Parsers
We categorize the parsers into two groups:
1. Top- Down Parser: The parse tree is created top to bottom, starting from the root
2. Bottom-Up Parser: The parse tree is created bottom to top, starting from the leaves

Both Top-Down and Bottom-Up parsers scan the input from left to right ( One symbol at a time)

Efficient Top-Down and Bottom-Up parsers can be implement only for the sub-classes of context
free grammars
 LL for top-down parsing
 LR for bottom-up parsing
Errors in Programs
Error Detection
Adequate Error Reporting is Not a Trivial
Task
ERROR RECOVERY
ERROR RECOVERY MAY TRIGGER MORE ERRORS!
ERROR RECOVERY APPROACHES: PANIC
MODE
ERROR RECOVERY APPROACHES:
PHRASE-LEVEL RECOVERY
ERROR RECOVERY APPROACHES:
ERROR PRODUCTIONS
ERROR RECOVERY APPROACHES:
GLOBAL CORRECTION
Syntactical Analysis
Each language definition has rules that describe the syntax of well formed programs.
• Format of the rules: context-free grammars
• Why not regular expressions/NFA’s/DFA’s?
 Source program constructs have recursive structure:
digits = [0-9]+;
expr = digits | “(“ expr “+” expr “)”

◦ Finite automata can’t recognize recursive constructs, so cannot ensure expressions


are well-bracketed: a machine with N states cannot remember parenthesis—nesting
depth greater than N
◦ CFG’s are more powerful, but also more costly to implement
CFG versus Regular Expression
CFG versus Regular Expression
Language: set of strings
String: finite sequence of symbols taken from finite alphabet
Regular expressions and CFG’s both describe languages, but over different
alphabets
CFG versus Regular Expression
CFG’s strictly more expressive than RE’s:
Any language recognizable/generated by a RE can also be recognized/generated
by a CFG, but not vice versa.

Also known as Backus-Naur Form (BNF, Algol 60)


CONTEXT FREE GRAMMARS (CFG)
RULE ALTERNATIVE NOTATIONS
Notational Conventions
Terminals
◦ Lower-case letters early in the alphabet: a, b, c
◦ Operator symbols: +, -
◦ Punctuations symbols: parentheses, comma
◦ Boldface strings: id or if
Nonterminals:
◦ Upper-case letters early in the alphabet: A, B, C
◦ The letter S (start symbol)
◦ Lower-case italic names: expr or stmt

Upper-case letters late in the alphabet, such as X, Y, Z, represent either nonterminals or terminals.
Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals.
Notational Conventions
Lower-case Greek letters, such as , , , represent strings of grammar symbols.
Thus A  indicates that there is a single nonterminal A on the left side of the production
and a string of grammar symbols  to the right of the arrow.
If A 1, A 2, …., A k are all productions with A on the left, we may write:
 A 1 | 2 | …. | k
Unless otherwise started, the left side of the first production is the start symbol.
E  E A E | ( E ) | -E | id
A+|-|*| / | 
SUMMARY OF NOTATIONAL
CONVENTIONS
Context Free Grammars : A First Look
Production rules:

1. assign_stmt  id := expr ;
2. expr  expr operator term Derivation: A sequence of grammar rule
3. expr  term applications and substitutions that
transform a starting non-term into a
4. term  id sequence of terminals / tokens.
5. term  real
6. term  integer Terminals: id real integer + - := ;
Nonterminals: assign_stmt, expr, operator, term
7. operator  +
Start symbol: assign_stmt
8. operator  -
Example Grammar:
Simple Arithmetic Expressions
1. expr  expr op expr
2. expr  ( expr )
3. expr  - expr
4. expr  id
9 Production rules
5. op  +
6. op  -
Terminals: id + - * /  ( )
7. op  * Nonterminals: expr, op
8. op  / Start symbol: expr
9. op  
DERIVATIONS
DERIVATIONS
CFG Terminology
Derivation
Let’s derive: id = id + real – integer ;
production rules:
assign_stmt
® id = expr ; =
® id = expr operator term;
® id = expr operator term operator term;
® id = term operator term operator term;
® id = id operator term operator term;
® id = id + term operator term;
® id = id + real operator term;
® id = id + real - term;
® id = id + real - integer;
LEFTMOST DERIVATION
RIGHTMOST DERIVATION
PARSE TREE
PARSE TREE

Input: (id * id) + id


PARSE TREE

Input: (id * id) + id


PARSE TREE
PARSE TREE
AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR

Two different parse trees!!


Which derivation of the parse tree is correct??
AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR

YES
AMBIGUOUS GRAMMAR
PROBLEMS OF AMBIGUOUS GRAMMAR

4/2+2 4 1

This parse tree gives the Right Answer.

4/2+2=4
PROBLEMS OF AMBIGUOUS GRAMMAR

The Ambiguous Grammar does not consider the


Precedence and Associativity
SOLUTION: REMOVING AMBIGUITY
Ambiguous Grammar:
How to Solve Associativity Problem?
Operators with
Input: 3 + 2 + 6 Parse Tree -1 Parse Tree -2  Two different parse trees!!
 According to the Grammar both
are correct.
Ambiguous Grammar
 But Parse Tree-1 gives WRONG
1. E E + E answer and Parse Tree-2 gives
2. E  E * E RIGHT answer.
num(3) num(6)
3. E  num
num(2) num(6) num(3) num(2) Removing the associativity
problem from the Grammar:

 Operators with same precedence must be 1. E  E + num


resolved by Associativity 2. E  E * num
 Some operators have left associativity (+, -, *, /) 3. E num
and some operators have right associativity (^) Still it is an Ambiguous Grammar!!
Ambiguous Grammar:
How to Solve Associativity Problem?(2)
Input: 3 + 2 + 6 Parse Tree

Removing the associativity


problem from the Grammar:
num(6)
1. E  E + num
2. E  E * num
3. E num num(2)

num(3)
Still it is an Ambiguous Grammar!!
Ambiguous Grammar:
How to Solve Precedence Problem?
Input: 3 + 2 * 6 Parse Tree -1 Parse Tree -2  Two different parse trees!!
 According to the Grammar both
are correct.
Ambiguous Grammar
 But Parse Tree-1 gives RIGHT
1. E E + E answer and Parse Tree-2 gives
2. E  E * E WRONG answer.
num(3) num(6)
3. E  num
num(2) num(6) num(3) num(2)
After Conversion to
Unambiguous Grammar:
 Lower precedence operation rules should be
1. E  E + T | T
declared in the upper level in the Grammar
2. T  T * F | F
 Higher precedence operation rules should be
3. F  num
declared in the lower level in the Grammar
Ambiguous Grammar:
How to Solve Precedence Problem?(2)
Input: 3 + 2 * 6 Parse Tree

After Conversion to
T
Unambiguous Grammar:
T T F
1. E  E + T | T F
F num(6)
2. T  T * F | F num(3)
3. F  num num(2)
UNAMBIGUOUS GRAMMAR
Reading Materials
 Chapter -4 of your Text book:
 Compilers: Principles, Techniques, and Tools
THE END

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy