Parsing Part - 1
Parsing Part - 1
Parsing Part - 1
Parsing : Part-I
Parsing
Parsing During Compilation
regular
expressions errors
symbol
• Collecting token
table information
• Perform type checking
• uses a grammar to check structure of tokens • Intermediate code
generation
• produces a parse tree
• syntactic errors and recovery
• recognize correct syntax
• report errors
Parsers
We categorize the parsers into two groups:
1. Top- Down Parser: The parse tree is created top to bottom, starting from the root
2. Bottom-Up Parser: The parse tree is created bottom to top, starting from the leaves
Both Top-Down and Bottom-Up parsers scan the input from left to right ( One symbol at a time)
Efficient Top-Down and Bottom-Up parsers can be implement only for the sub-classes of context
free grammars
LL for top-down parsing
LR for bottom-up parsing
Errors in Programs
Error Detection
Adequate Error Reporting is Not a Trivial
Task
ERROR RECOVERY
ERROR RECOVERY MAY TRIGGER MORE ERRORS!
ERROR RECOVERY APPROACHES: PANIC
MODE
ERROR RECOVERY APPROACHES:
PHRASE-LEVEL RECOVERY
ERROR RECOVERY APPROACHES:
ERROR PRODUCTIONS
ERROR RECOVERY APPROACHES:
GLOBAL CORRECTION
Syntactical Analysis
Each language definition has rules that describe the syntax of well formed programs.
• Format of the rules: context-free grammars
• Why not regular expressions/NFA’s/DFA’s?
Source program constructs have recursive structure:
digits = [0-9]+;
expr = digits | “(“ expr “+” expr “)”
Upper-case letters late in the alphabet, such as X, Y, Z, represent either nonterminals or terminals.
Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals.
Notational Conventions
Lower-case Greek letters, such as , , , represent strings of grammar symbols.
Thus A indicates that there is a single nonterminal A on the left side of the production
and a string of grammar symbols to the right of the arrow.
If A 1, A 2, …., A k are all productions with A on the left, we may write:
A 1 | 2 | …. | k
Unless otherwise started, the left side of the first production is the start symbol.
E E A E | ( E ) | -E | id
A+|-|*| / |
SUMMARY OF NOTATIONAL
CONVENTIONS
Context Free Grammars : A First Look
Production rules:
1. assign_stmt id := expr ;
2. expr expr operator term Derivation: A sequence of grammar rule
3. expr term applications and substitutions that
transform a starting non-term into a
4. term id sequence of terminals / tokens.
5. term real
6. term integer Terminals: id real integer + - := ;
Nonterminals: assign_stmt, expr, operator, term
7. operator +
Start symbol: assign_stmt
8. operator -
Example Grammar:
Simple Arithmetic Expressions
1. expr expr op expr
2. expr ( expr )
3. expr - expr
4. expr id
9 Production rules
5. op +
6. op -
Terminals: id + - * / ( )
7. op * Nonterminals: expr, op
8. op / Start symbol: expr
9. op
DERIVATIONS
DERIVATIONS
CFG Terminology
Derivation
Let’s derive: id = id + real – integer ;
production rules:
assign_stmt
® id = expr ; =
® id = expr operator term;
® id = expr operator term operator term;
® id = term operator term operator term;
® id = id operator term operator term;
® id = id + term operator term;
® id = id + real operator term;
® id = id + real - term;
® id = id + real - integer;
LEFTMOST DERIVATION
RIGHTMOST DERIVATION
PARSE TREE
PARSE TREE
YES
AMBIGUOUS GRAMMAR
PROBLEMS OF AMBIGUOUS GRAMMAR
4/2+2 4 1
4/2+2=4
PROBLEMS OF AMBIGUOUS GRAMMAR
num(3)
Still it is an Ambiguous Grammar!!
Ambiguous Grammar:
How to Solve Precedence Problem?
Input: 3 + 2 * 6 Parse Tree -1 Parse Tree -2 Two different parse trees!!
According to the Grammar both
are correct.
Ambiguous Grammar
But Parse Tree-1 gives RIGHT
1. E E + E answer and Parse Tree-2 gives
2. E E * E WRONG answer.
num(3) num(6)
3. E num
num(2) num(6) num(3) num(2)
After Conversion to
Unambiguous Grammar:
Lower precedence operation rules should be
1. E E + T | T
declared in the upper level in the Grammar
2. T T * F | F
Higher precedence operation rules should be
3. F num
declared in the lower level in the Grammar
Ambiguous Grammar:
How to Solve Precedence Problem?(2)
Input: 3 + 2 * 6 Parse Tree
After Conversion to
T
Unambiguous Grammar:
T T F
1. E E + T | T F
F num(6)
2. T T * F | F num(3)
3. F num num(2)
UNAMBIGUOUS GRAMMAR
Reading Materials
Chapter -4 of your Text book:
Compilers: Principles, Techniques, and Tools
THE END