3 - Grammars


Prof. Radu Prodan

SYNTACTIC ANALYSIS
CONTEXT-FREE GRAMMARS
20.03.2024 R. Prodan, Compiler Construction, Summer Semester 2024 1
Phases of a Compiler
[Figure: front-end and back-end phases. Source Code → Lexical Analyser → Tokens → Syntactic Analyser → Syntax Tree → Semantic Analyser → Annotated Tree → Source Code Optimiser → Intermediate Representation → Code Generator → Target Code → Target Code Optimiser → Target Code. Auxiliary components: Literal Table, Symbol Table, Error Handler.]



Agenda
▪ Introduction

▪ Context-free grammars

▪ Ambiguous grammars

▪ Extended BNF

▪ Conclusions



Overview
▪ Syntactic analysis or parsing
– Determines the structure of a program

▪ Context-free grammar
– Grammar rules define the syntax of a programming language
– Operates similarly to a scanner recognising regular expressions

▪ Recursive context-free grammars
– E.g. nested for loops, nested if statements

▪ Parse tree or syntax tree
– Data structures and algorithms are more complex than in lexical analysis



Parsing
▪ Input: tokens produced by the lexical analyser
– The parser calls the getToken scanning procedure whenever it needs the next token

[Figure: sequence of tokens → parser → syntax tree]

▪ Output: parse tree or syntax tree

▪ Multi-pass compilers explicitly create and save the syntax tree
– syntaxTree = parse();

▪ Error handling
– The scanner consumes incorrect characters and generates an error token

▪ Error recovery
– Infer possible correct code from incorrect code and continue parsing



Context-Free Grammar
▪ Specifies the syntactic structure of a programming language

▪ Backus-Naur Form (BNF) for integer arithmetic expressions
– exp → exp op exp | ( exp ) | number
– op → + | – | *

▪ (34 – 3) * 42
– Corresponds to a legal string of seven tokens
– ( number – number ) * number

▪ (34 – 3 * 42
– Not a legal expression: the right parenthesis is missing



Formal Context-Free Grammar
Definition
▪ Context-free grammar: $G = (T, N, P, S)$
– Terminal set: $T$
– Nonterminal set: $N$ (disjoint from $T$)
– Productions or grammar rules $P$: $A \to \alpha$, where $A \in N$ and $\alpha \in (T \cup N)^*$
– Start symbol: $S \in N$

▪ Symbol set: $T \cup N$



BNF Grammar for Pascal
program → program-heading ; program-block .

program-heading → . . .
program-block → . . .
. . .

▪ program is the start symbol

▪ program, program-heading, and program-block are nonterminals

▪ ; and . are terminals

▪ Nonterminals are defined through grammar rules (productions)


Comparison to Regular Expressions
▪ Regular expression
– number = digit digit*
– digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

▪ BNF form
– number → digit | digit number
– digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

▪ Repetition is specified by
– * in regular expressions
– Recursion in BNF grammar rules



Expression Derivation
▪ Grammar
– exp → exp op exp | ( exp ) | number
– op → + | – | *

▪ Derivation
– Sequence of replacements of nonterminals, each by the body of one grammar rule
– Grammar rules are written with →
– Derivation steps are written with ⇒

▪ Derivation for ( 34 – 3 ) * 42

Step  Derivation                          Rule
1     exp ⇒ exp op exp                    [ exp → exp op exp ]
2         ⇒ exp op number                 [ exp → number ]
3         ⇒ exp * number                  [ op → * ]
4         ⇒ ( exp ) * number              [ exp → ( exp ) ]
5         ⇒ ( exp op exp ) * number       [ exp → exp op exp ]
6         ⇒ ( exp op number ) * number    [ exp → number ]
7         ⇒ ( exp – number ) * number     [ op → – ]
8         ⇒ ( number – number ) * number  [ exp → number ]
Derivation
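▪ Derivation step over G is of the form $\alpha A \gamma \Rightarrow \alpha \beta \gamma$
– where $\alpha \in (T \cup N)^*$, $\gamma \in (T \cup N)^*$, and $A \to \beta \in P$

▪ Reflexive transitive closure $\alpha_1 \Rightarrow^* \alpha_n$ of the derivation step relation $\Rightarrow$
– $\alpha_1 \Rightarrow^* \alpha_n$ if and only if $\alpha_1 \Rightarrow \alpha_2 \Rightarrow \dots \Rightarrow \alpha_n$ for some $n \geq 1$ (zero or more steps)

▪ Derivation over grammar G is of the form $S \Rightarrow^* w$
– $w \in T^*$ and $S \in N$ is the start symbol of G

▪ Language generated by G
– $L(G) = \{ w \in T^* \mid S \Rightarrow^* w \text{ in } G \}$

▪ Leftmost derivation $S \Rightarrow^*_{lm} w$: in every step $\alpha A \gamma \Rightarrow \alpha \beta \gamma$, $\alpha \in T^*$

▪ Rightmost derivation $S \Rightarrow^*_{rm} w$: in every step $\alpha A \gamma \Rightarrow \alpha \beta \gamma$, $\gamma \in T^*$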



Parse Tree
▪ Parse tree: labelled tree representing a derivation
– Interior nodes are nonterminals
– Leaf nodes are terminals
– Children of each interior node are the body of a grammar production

▪ Example: leftmost derivation of number + number
(1) exp ⇒ exp op exp
(2)     ⇒ number op exp
(3)     ⇒ number + exp
(4)     ⇒ number + number

exp (1)
├─ exp (2)
│  └─ number
├─ op (3)
│  └─ +
└─ exp (4)
   └─ number

▪ Leftmost derivation
– Leftmost nonterminal is replaced at each derivation step
– Preorder numbering of the interior nodes
Leftmost versus Rightmost Derivation
▪ Rightmost derivation
– Rightmost nonterminal is replaced at each derivation step
– Numbering is the reverse of a postorder numbering of the interior nodes

(1) exp ⇒ exp op exp
(2)     ⇒ exp op number
(3)     ⇒ exp + number
(4)     ⇒ number + number

exp (1)
├─ exp (4)
│  └─ number
├─ op (3)
│  └─ +
└─ exp (2)
   └─ number

▪ Parse tree of the rightmost derivation for (34 – 3) * 42

exp (1)
├─ exp (4)
│  ├─ (
│  ├─ exp (5)
│  │  ├─ exp (8)
│  │  │  └─ number
│  │  ├─ op (7)
│  │  │  └─ –
│  │  └─ exp (6)
│  │     └─ number
│  └─ )
├─ op (3)
│  └─ *
└─ exp (2)
   └─ number
Abstract Syntax Tree (AST)
▪ Syntax-directed translation

▪ Parse tree for 3+4
exp
├─ exp
│  └─ number (3)
├─ op
│  └─ +
└─ exp
   └─ number (4)

▪ AST for 3+4
+
├─ 3
└─ 4

▪ AST for (34–3)*42
*
├─ –
│  ├─ 34
│  └─ 3
└─ 42
– OpExp(Times, OpExp(Minus, ConstExp(34), ConstExp(3)), ConstExp(42))
AST for Expression Grammar
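typedef enum { Plus, Minus, Times } OpKind;
typedef enum { OpK, ConstK } ExpKind;

typedef struct streenode
{ ExpKind kind;                        /* operator node or constant leaf */
  OpKind op;                           /* operator, if kind == OpK */
  struct streenode *lchild, *rchild;   /* operands, if kind == OpK */
  int val;                             /* value, if kind == ConstK */
} STreeNode;
typedef STreeNode *SyntaxTree;

AST for (34 – 3) * 42:
*
├─ –
│  ├─ 34
│  └─ 3
└─ 42

exp → exp op exp | ( exp ) | number
op → + | – | *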
Grammar of Paired Braces
▪ E → ( E ) | a
– Nonterminals: E
– Terminals: (, ), and a
– L(G) = { a, (a), ((a)), (((a))), … } = $\{ (^n a )^n \mid n \in \mathbb{N} \}$
– Derivation for ((a))
• E ⇒ ( E ) ⇒ (( E )) ⇒ (( a ))

▪ E → ( E )
– Nonterminals: E
– Terminals: ( and )
– L(G) = $\emptyset$
• Missing non-recursive case (or base case)



Left and Right Recursion
▪ Left recursive grammar Gl
– A → Aa | a
– A ⇒ Aa ⇒ Aaa ⇒ Aaaa ⇒ aaaa

▪ Right recursive grammar Gr
– A → aA | a
– A ⇒ aA ⇒ aaA ⇒ aaaA ⇒ aaaa

▪ $L(G_l) = L(G_r) = \{ a^n \mid n \geq 1 \} = L(a^+)$
– Same language as generated by the regular expression a+



ε-Production
▪ Grammar for the same language as the regular expression a*
– Needs a rule notation that generates the empty string

▪ ε-production
– empty →
– empty → ε

▪ Left recursive grammar Gl
– A → A a | ε

▪ Right recursive grammar Gr
– A → a A | ε

▪ $L(G_l) = L(G_r) = \{ a^n \mid n \geq 0 \} = L(a^*)$



Simplified Statement Grammar
▪ Without ε-productions
statement → if-stmt | other
if-stmt → if ( exp ) statement
        | if ( exp ) statement else statement
exp → 0 | 1

▪ With ε-productions
statement → if-stmt | other
if-stmt → if ( exp ) statement else-part
else-part → else statement | ε
exp → 0 | 1



Statement Grammar Parse Tree
statement → if-stmt | other
if-stmt → if ( exp ) statement else-part
else-part → else statement | ε
exp → 0 | 1

▪ if (0) other else other

statement
└─ if-stmt
   ├─ if
   ├─ (
   ├─ exp
   │  └─ 0
   ├─ )
   ├─ statement
   │  └─ other
   └─ else-part
      ├─ else
      └─ statement
         └─ other
Statement Grammar AST
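typedef enum { ExpK, StmtK } NodeKind;
typedef enum { Zero, One } ExpKind;
typedef enum { IfK, OtherK } StmtKind;

typedef struct streenode
{ NodeKind kind;                                /* expression or statement node */
  ExpKind ekind;                                /* 0 or 1, if kind == ExpK */
  StmtKind skind;                               /* if or other, if kind == StmtK */
  struct streenode *test, *thenpart, *elsepart; /* children of an if node */
} STreeNode;

AST for if (0) other else other:
if
├─ 0
├─ other
└─ other

typedef STreeNode *SyntaxTree;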



Statement Sequence Grammar
stmt-sequence → stmt ; stmt-sequence | stmt
stmt → s

▪ Input string: s;s;s

stmt-sequence
├─ stmt
│  └─ s
├─ ;
└─ stmt-sequence
   ├─ stmt
   │  └─ s
   ├─ ;
   └─ stmt-sequence
      └─ stmt
         └─ s


Statement Sequence AST
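▪ Following the right recursive grammar: binary ; nodes
;
├─ s
└─ ;
   ├─ s
   └─ s

▪ Variable number of children
seq
├─ s
├─ s
└─ s

▪ Leftmost-child right-sibling
seq
└─ s → s → s
   (seq points only to its first child; each s points to its right sibling)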



Ambiguity
▪ Ambiguous grammar
– Generates a string with two distinct parse trees
– Similar to a nondeterministic automaton
– Considered an incomplete specification

exp → exp op exp | ( exp ) | number
op → + | – | *

▪ Which syntax tree is correct for 34 – 3 * 42?
– Subtraction first: 34 – 3 = 31, 31 * 42 = 1302
– Multiplication first: 3 * 42 = 126, 34 – 126 = –92



Leftmost Derivation
exp → exp op exp | ( exp ) | number
op → + | – | *

▪ Input string: 34–3*42

Step  Leftmost Derivation                Rule
1     exp ⇒ exp op exp                   [ exp → exp op exp ]
2         ⇒ exp op exp op exp            [ exp → exp op exp ]
3         ⇒ number op exp op exp         [ exp → number ]
4         ⇒ number – exp op exp          [ op → – ]
5         ⇒ number – number op exp       [ exp → number ]
6         ⇒ number – number * exp        [ op → * ]
7         ⇒ number – number * number     [ exp → number ]

Resulting tree:
*
├─ –
│  ├─ 34
│  └─ 3
└─ 42
Another Leftmost Derivation
exp → exp op exp | ( exp ) | number
op → + | – | *

▪ Input string: 34–3*42

Step  Leftmost Derivation                Rule
1     exp ⇒ exp op exp                   [ exp → exp op exp ]
2         ⇒ number op exp                [ exp → number ]
3         ⇒ number – exp                 [ op → – ]
4         ⇒ number – exp op exp          [ exp → exp op exp ]
5         ⇒ number – number op exp       [ exp → number ]
6         ⇒ number – number * exp        [ op → * ]
7         ⇒ number – number * number     [ exp → number ]

Resulting tree:
–
├─ 34
└─ *
   ├─ 3
   └─ 42
Disambiguating Rules
▪ Precedence relation of mathematical operators
– * has higher precedence than + and –

▪ Left associative subtraction
– 34 – 3 – 42 = (34 – 3) – 42 = –11
– Not 34 – (3 – 42) = 73

▪ Non-associative operation
– A sequence of more than one operator is not allowed
• 34 – 3 – 42 or 34 – 3 * 42 are illegal
– Only fully parenthesized expressions are legal
• (34 – 3) – 42, 34 – (3 * 42)
Ambiguity Removal
▪ Replace one of the two recursive occurrences with a base case

▪ Separate operators with different precedence into different nonterminals

▪ Consider associativity when writing the recursion

exp → exp addop term | term
addop → + | –
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number

Ambiguous rule:
exp → exp addop exp | term
term → term mulop term | factor

Nonambiguous, left associative:
exp → exp addop term | term
term → term mulop factor | factor

Nonambiguous, right associative:
exp → term addop exp | term
term → factor mulop term | factor



Dangling Else Problem
statement → if-stmt | other
if-stmt → if ( exp ) statement
| if ( exp ) statement else statement
exp → 0 | 1

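▪ if (0) if (1) other else other
– Two distinct parse trees: the else can belong to either if

▪ Ambiguous grammar because of the optional else

▪ Most closely nested (disambiguating) rule
– An else belongs to the closest preceding if that does not yet have an else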

Other Dangling Else Solutions
▪ Mandatory else part
– LISP and other functional languages

▪ Bracketing keyword
if-stmt → if condition then statement-sequence end if
        | if condition then statement-sequence
          else statement-sequence end if

Most closely nested rule:
if (x != 0)
  if (y == 1/x)
    ok = true
  else z = 1/x

Bracketing:
if (x != 0) {
  if (y == 1/x) {
    ok = true
  } else z = 1/x
}

Bracketing keyword:
if (x != 0)
  if (y == 1/x)
    ok = true
  else z = 1/x
  end if
end if

Required else:
if (x != 0)
  if (y == 1/x)
    ok = true
  else z = 1/x
else



Inessential Ambiguity
▪ Statement sequence grammar
stmt-sequence → stmt ; stmt-sequence | stmt
stmt → s

seq
├─ s
├─ s
└─ s

▪ Left and right recursive grammars produce the same abstract syntax tree structure

▪ Inessential ambiguity
– Semantics do not depend on the disambiguating rule

▪ Associative operations generate inessential ambiguity
– Addition, multiplication, concatenation


Extended BNF
▪ Repetitive constructs

▪ Optional constructs



EBNF Repetitive Constructs
▪ Left recursive: A → A α | β
– Kleene closure in regular expressions: A → β α*
– EBNF: A → β { α }

▪ Right recursive: A → α A | β
– Kleene closure in regular expressions: A → α* β
– EBNF: A → { α } β



Expression Grammar in EBNF
exp → exp addop term | term

▪ EBNF left associative form
exp → term { addop term }

▪ EBNF right associative form
exp → { term addop } term



EBNF Optional Constructs
▪ Grammar rules for if-statements
– BNF: if-stmt → if ( exp ) statement
| if ( exp ) statement else statement
– EBNF: if-stmt → if ( exp ) statement [ else statement ]

▪ Statement sequence grammar
– BNF: stmt-sequence → stmt ; stmt-sequence | stmt
– EBNF: stmt-sequence → stmt [ ; stmt-sequence ]

▪ Addition operation in right associative form
– BNF: exp → term addop exp | term
– EBNF: exp → term [ addop exp ]
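EBNF Syntax Diagrams
▪ factor → ( exp ) | number
[Diagram for factor: two alternative paths, one through ( exp ) and one through number]

▪ A → { B }
[Diagram for A: a loop that passes through B zero or more times]

▪ A → [ B ]
[Diagram for A: a path through B with a bypass around it]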



Syntax Diagrams for Simple Arithmetic Expression Grammar
▪ BNF expression grammar
exp → exp addop term | term
addop → + | –
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number

▪ EBNF expression grammar
exp → term { addop term }
addop → + | –
term → factor { mulop factor }
mulop → *
factor → ( exp ) | number

[Syntax diagrams for exp, addop, term, mulop, and factor]
Syntax Diagrams for Simplified Grammar of If-Statements
▪ BNF grammar
statement → if-stmt | other
if-stmt → if ( exp ) statement
        | if ( exp ) statement else statement
exp → 0 | 1

▪ EBNF grammar
statement → if-stmt | other
if-stmt → if ( exp ) statement [ else statement ]
exp → 0 | 1

[Syntax diagrams for statement, if-stmt, and exp]


Conclusions
▪ Syntactic analysis or parsing

▪ Specified through context-free grammars

▪ Represented as Abstract Syntax Trees

▪ Ambiguous grammars

▪ BNF and EBNF representation
