
System Software & Compiler Design Module-4

Chapter – 4

SYNTAX ANALYSIS
Introduction
 The Role of the Parser
The syntax analyzer creates the syntactic structure of the given source program. This
syntactic structure is mostly a parse tree. The syntax analyzer is also known as the parser.
The syntax of a programming language is described by a context-free grammar (CFG). We
will use BNF (Backus-Naur Form) notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the rules
implied by a context-free grammar or not. If it does, the parser creates the parse tree of
that program; otherwise, the parser issues error messages.
A context-free grammar gives a precise syntactic specification of a programming
language. The design of the grammar is an initial phase of the design of a compiler. A
grammar can be directly converted into a parser by some tools. The parser works on a
stream of tokens; the token is the smallest unit it handles.
In the compiler, the parser sits immediately after the lexical analyzer: it receives the stream
of tokens produced by the lexer and passes its output (the parse tree) on to the rest of the
front end.

There are three general types of parsers for grammars:


Universal parser
Top-down parser
Bottom-up parser
Universal parsing methods, such as the Cocke-Younger-Kasami algorithm and Earley's
algorithm, can parse any grammar. However, these general methods are too inefficient to
use in production compilers.
Top-down parsing methods build parse trees from the top (root) to the bottom (leaves),
while bottom-up parsing methods start from the leaves and work their way up to the root. In
either case, the input to the parser is scanned from left to right, one symbol at a time. These
methods are the ones commonly used in compilers.
Efficient top-down and bottom-up parsers can be implemented only for subclasses
of context-free grammars.
– LL for top-down parsing
– LR for bottom-up parsing


 Representative Grammar
Some of the grammars that will be examined in this unit are presented here for ease of
reference. Constructs that begin with keywords like while or int are relatively easy to parse,
because the keyword guides the choice of the grammar production that must be applied to
match the input.
We concentrate on expressions, which present more of a challenge because of the
associativity and precedence of operators.
Associativity and precedence are captured in the following grammar, which describes
expressions, terms, and factors. E represents expressions consisting of terms separated by +
signs, T represents terms consisting of factors separated by * signs, and F represents factors,
which can be either parenthesized expressions or identifiers.

E→E+T|T
T→T*F|F
F → ( E ) | id
The above expression grammar belongs to the class of LR grammars that are suitable for
bottom-up parsing. This grammar can be adapted to handle additional operators and
additional levels of precedence. However, it cannot be used for top-down parsing because it
is left recursive.
The following non-left-recursive grammar, obtained from the grammar above by eliminating left recursion, is used for top-down parsing.
E → T E’
E’ → + T E’ | Ɛ
T → F T’
T’ → * F T’ | Ɛ
F → ( E ) | id
The following grammar treats + and * alike, so it is useful for illustrating techniques for
handling ambiguities during parsing.
E → E + E | E * E | ( E ) | id
Here, E represents expressions of all types. The grammar permits more than one parse tree for
expressions like a+b*c, as the derivations below show.
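For instance, id + id * id (standing for a+b*c) has two distinct leftmost derivations in this
grammar, one for each way of grouping the operators:
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id      (grouping a + (b * c))
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id      (grouping (a + b) * c)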
 Syntax Error Handling
Common programming errors can occur at many different levels.
1. Lexical errors: include misspelling of identifiers, keywords, or operators.
Eg: the use of an identifier elipseSize instead of ellipseSize and missing quotes
around text intended as a string.
2. Syntactic errors: include misplaced semicolons or extra or missing braces.
Eg: in C or Java, the appearance of a case statement without an enclosing switch is a
syntactic error.
3. Semantic errors: include type mismatches between operators and operands.
Eg: a return statement in a Java method with result type void.
4. Logical errors: arise from incorrect reasoning on the part of the programmer.
Eg: the assignment operator = instead of the comparison operator ==.
Goals of the Parser
• Report the presence of errors clearly and accurately


• Recover from each error quickly enough to detect subsequent errors.


• Add minimal overhead to the processing of correct programs.
 Error-Recovery Strategies
• Panic-Mode Recovery
• Phrase-Level Recovery
• Error Productions
• Global Correction
Panic-Mode Recovery
• On discovering an error, the parser discards input symbols one at a time until one of a
designated set of Synchronizing tokens is found.
• Synchronizing tokens are usually delimiters.
Ex: a semicolon or a closing brace }, whose role in the source program is clear and unambiguous.
• It often skips a considerable amount of input without checking it for additional errors.
Advantage:
 Simplicity
 Is guaranteed not to go into an infinite loop
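
A minimal sketch of this strategy is shown below (in C). The token codes and the next_token()
scanner interface are hypothetical stand-ins chosen for the example, not part of these notes: on
detecting an error, the parser discards tokens until it reaches a synchronizing token (here a
semicolon or a closing brace) or the end of the input, and then resumes parsing from that point.

#include <stdio.h>

enum { TOK_SEMI = ';', TOK_RBRACE = '}', TOK_EOF = 0 };

/* hypothetical scanner: returns the next token code, TOK_EOF at end of input */
static int next_token(void) {
    int c = getchar();
    return (c == EOF) ? TOK_EOF : c;
}

/* panic-mode recovery: discard tokens until a synchronizing token is found */
static int synchronize(void) {
    int tok;
    do {
        tok = next_token();
    } while (tok != TOK_SEMI && tok != TOK_RBRACE && tok != TOK_EOF);
    return tok;            /* parsing resumes from this point */
}

int main(void) {
    printf("synchronized on token code %d\n", synchronize());
    return 0;
}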

Phrase-Level Recovery
• A parser may perform local correction on the remaining input; i.e., it may replace a prefix of
the remaining input by some string that allows the parser to continue.
Ex: replace a comma by a semicolon, or insert a missing semicolon.
• The choice of local correction is left to the compiler designer.
• It is used in several error-repairing compilers, as it can correct any input string.
• Its main difficulty is coping with situations in which the actual error occurred before the
point of detection.

Error Productions
• By anticipating common errors that might be encountered, we can augment the grammar for
the language at hand with productions that generate the erroneous constructs.
• Then we can use the grammar augmented by these error productions to construct a parser.
• If an error production is used by the parser, we can generate appropriate error diagnostics
to indicate the erroneous construct that has been recognized in the input.

Global Correction
• We use algorithms that perform a minimal sequence of changes to obtain a globally least-cost
correction.
• Given an incorrect input string x and a grammar G, these algorithms find a parse tree for
a related string y such that the number of insertions, deletions, and changes of tokens
required to transform x into y is as small as possible.
• It is too costly to implement in terms of time and space, so these techniques are mainly of
theoretical interest.

Context-Free Grammars
• Inherently recursive structures of a programming language are defined by a context-free
grammar.
• In a context-free grammar, we have:


– A finite set of terminals (in our case, this will be the set of tokens)
– A finite set of non-terminals (syntactic variables)
– A finite set of production rules of the following form:
• A → α, where A is a non-terminal and α is a string of terminals and non-terminals
(possibly the empty string)
– A start symbol (one of the non-terminal symbols)

NOTATIONAL CONVENTIONS
1. Symbols used for terminals are:
a. Lower case letters early in the alphabet (such as a, b, c, . . .)
b. Operator symbols (such as +, *, . . . )
c. Punctuation symbols (such as parenthesis, comma and so on)
d. The digits 0, 1, . . . , 9
e. Boldface strings and keywords (such as id or if) each of which represents a single terminal
symbol
2. Symbols used for non terminals are:
a. Uppercase letters early in the alphabet (such as A, B, C, …)
b. The letter S, which when it appears is usually the start symbol.
c. Lowercase, italic names (such as expr or stmt).
3. Lowercase Greek letters, such as α, β, γ, represent (possibly empty) strings of
grammar symbols.
4. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols,
either non-terminals or terminals.
5. Lowercase letters late in the alphabet, such as u, v, . . . , z, represent strings of
terminals.
6. A set of productions A → α1 | α2 | . . . | αk; we call α1, α2, . . . , αk the alternatives for A.

Example: Using the above notation, list the terminals, non-terminals, and start symbol of the
following grammar:
E→E+T|E–T|T
T →T * F | T / F | F
F → ( E ) | id
Here the terminals are +, -, *, /, (, ), and id.
The non-terminals are E, T, and F.
The start symbol is E.


 Derivations
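
A derivation rewrites the start symbol one production at a time until only terminals remain;
each step is written with the symbol ⇒. As a brief worked illustration using the expression
grammar E → E + T | T, T → T * F | F, F → ( E ) | id, a leftmost derivation of id + id * id
(the leftmost non-terminal is replaced at each step) is

E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ id + T ⇒ id + T * F ⇒ id + F * F ⇒ id + id * F ⇒ id + id * id

and the corresponding rightmost derivation (the rightmost non-terminal is replaced at each
step) is

E ⇒ E + T ⇒ E + T * F ⇒ E + T * id ⇒ E + F * id ⇒ E + id * id ⇒ T + id * id ⇒ F + id * id ⇒ id + id * id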


Top-Down Parsing
Top-down parsing can be viewed as the problem of constructing a parse tree for the input
string, starting from the root and creating the nodes of the parse tree in preorder.
Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an
input string.

Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we backtrack to try
other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient.
• It tries to find the left-most derivation.

A typical procedure for a non-terminal in a top-down parser is shown below.


void A( ) {
1)  choose an A-production, A → X1 X2 · · · Xk;
2)  for ( i = 1 to k ) {
3)      if ( Xi is a non-terminal )
4)          call procedure Xi( );
5)      else if ( Xi equals the current input symbol a )
6)          advance the input to the next symbol;
7)      else /* an error has occurred */ ;
    }
}

To allow backtracking, the code shown above needs to be modified. First, we
cannot choose a unique A-production at line (1), so we must try each of several productions
in some order. Then, failure at line (7) is not ultimate failure, but suggests only that we need
to return to line (1) and try another A-production. Only if there are no more A-productions to
try do we declare that an input error has been found. In order to try another A-production, we
need to be able to reset the input pointer to where it was when we first reached line (1); thus,
a local variable is needed to store this input pointer for future use.
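
For concreteness, here is a minimal, self-contained C sketch of recursive-descent parsing with
backtracking for the small textbook grammar S → c A d, A → a b | a, on the input cad. The
helper names (input, pos, match) are illustrative choices, not part of the notes above. The
parser first tries the alternative A → a b, fails to match d, restores the saved input position,
and then succeeds with A → a.

#include <stdio.h>

static const char *input;           /* token string, e.g. "cad"            */
static int pos;                     /* current input position              */

static int match(char t) {          /* consume t if it is the next symbol  */
    if (input[pos] == t) { pos++; return 1; }
    return 0;
}

static int A(void) {                /* A -> a b | a                        */
    int saved = pos;                /* remember where A started            */
    if (match('a') && match('b'))   /* first alternative: A -> a b         */
        return 1;
    pos = saved;                    /* backtrack: reset the input pointer  */
    return match('a');              /* second alternative: A -> a          */
}

static int S(void) {                /* S -> c A d                          */
    return match('c') && A() && match('d');
}

int main(void) {
    input = "cad";
    pos = 0;
    /* accept only if S succeeds and the whole input has been consumed */
    printf("%s\n", (S() && input[pos] == '\0') ? "accepted" : "rejected");
    return 0;
}

The saved position plays exactly the role of the stored input pointer described above.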

Predictive Parsing
• No backtracking
• Efficient
• Needs a special form of grammar (LL(1) grammars).
• Recursive predictive parsing is a special form of recursive-descent parsing without
backtracking. The non-recursive (table-driven) predictive parser is also known as the LL(1)
parser.
• When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose
a production rule by just looking at the current symbol in the input string.

• For example, when we are trying to rewrite a non-terminal such as stmt whose alternatives
each begin with a distinct keyword, we can uniquely choose the production rule just by
looking at the current token.
• To prepare a grammar for predictive parsing, we eliminate left recursion and left-factor it;
even then, the resulting grammar may not be suitable for predictive parsing (it may not be an
LL(1) grammar).
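
As an illustration, a recursive predictive parser for the non-left-recursive expression grammar
given earlier can be written with one C procedure per non-terminal. This is a minimal sketch
written for this summary, not taken from the prescribed text: the token i stands for id, and
error(), match(), and ip are illustrative names. A stricter parser would also check the
lookahead against FOLLOW before taking an epsilon-production.

#include <stdio.h>
#include <stdlib.h>

static const char *ip;                 /* current position in the token string */

static void error(void)  { puts("syntax error"); exit(1); }
static void match(char t){ if (*ip == t) ip++; else error(); }

static void E(void);                   /* forward declarations (mutual recursion) */
static void Eprime(void);
static void T(void);
static void Tprime(void);
static void F(void);

static void E(void)      { T(); Eprime(); }                                 /* E  -> T E'        */
static void Eprime(void) { if (*ip == '+') { match('+'); T(); Eprime(); } } /* E' -> + T E' | eps */
static void T(void)      { F(); Tprime(); }                                 /* T  -> F T'        */
static void Tprime(void) { if (*ip == '*') { match('*'); F(); Tprime(); } } /* T' -> * F T' | eps */
static void F(void) {                                                       /* F  -> ( E ) | id  */
    if (*ip == '(')      { match('('); E(); match(')'); }
    else if (*ip == 'i') { match('i'); }
    else                 error();
}

int main(void) {
    ip = "i+i*i";          /* token string for id + id * id */
    E();
    puts(*ip == '\0' ? "accepted" : "rejected");
    return 0;
}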
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing tables:
– FIRST and FOLLOW
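For example, for the non-left-recursive expression grammar given earlier, these sets work out
as follows (Ɛ denotes the empty string):
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, Ɛ }
FIRST(T') = { *, Ɛ }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }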


Non-Recursive Predictive Parsing -- LL (1) Parser


• Non-recursive predictive parsing is table-driven.
• It is a top-down parser.
• It is also known as the LL(1) parser.
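
The C sketch below, written for this summary, shows the core table-driven loop. Single-character
codes stand in for grammar symbols (e for E', t for T', i for id), and the function names table()
and parse() are illustrative choices. The stack is initialized with $ and the start symbol; at each
step the parser either matches a terminal against the input or replaces the non-terminal on top of
the stack using the parsing-table entry M[X, a], here hard-coded for the non-left-recursive
expression grammar given earlier.

#include <stdio.h>
#include <string.h>

/* return the production body for non-terminal X and lookahead a, or NULL on error */
static const char *table(char X, char a) {
    switch (X) {
    case 'E': if (a == 'i' || a == '(') return "Te";            break;
    case 'e': if (a == '+') return "+Te";
              if (a == ')' || a == '$') return "";              break;  /* E' -> epsilon */
    case 'T': if (a == 'i' || a == '(') return "Ft";            break;
    case 't': if (a == '*') return "*Ft";
              if (a == '+' || a == ')' || a == '$') return "";  break;  /* T' -> epsilon */
    case 'F': if (a == 'i') return "i";
              if (a == '(') return "(E)";                       break;
    }
    return NULL;
}

static int is_nonterminal(char X) { return strchr("EeTtF", X) != NULL; }

/* parse a '$'-terminated token string such as "i+i*i$"; returns 1 on success */
int parse(const char *input) {
    char stack[128];
    int top = 0;
    stack[top++] = '$';
    stack[top++] = 'E';              /* start symbol */
    const char *ip = input;

    while (top > 0) {
        char X = stack[--top];       /* pop the top of the stack            */
        if (!is_nonterminal(X)) {    /* terminal (or '$'): must match input */
            if (X != *ip) return 0;
            if (X == '$') return 1;  /* stack and input exhausted together  */
            ip++;
        } else {
            const char *body = table(X, *ip);
            if (body == NULL) return 0;          /* error entry in the table */
            for (int k = (int)strlen(body) - 1; k >= 0; k--)
                stack[top++] = body[k];          /* push the body in reverse */
        }
    }
    return 0;
}

int main(void) {
    printf("%d\n", parse("i+i*i$"));   /* id + id * id : prints 1 (accepted) */
    printf("%d\n", parse("i+*i$"));    /* malformed    : prints 0 (rejected) */
    return 0;
}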


FIRST(S)={a,c,e,Ɛ}

FIRST(A)={a,c}

FOLLOW(S)={$}

FOLLOW(A)={b,d}


Note: For panic mode error recovery, refer example 4.36 in prescribed text book and also
refer figure 4.22 and figure 4.23.


Bottom-Up Parsing
• A bottom-up parser creates the parse tree of the given input starting from leaves towards
the root.
• A bottom-up parser tries to find the rightmost derivation of the given input, in reverse
order.

• Bottom-up parsing is also known as shift-reduce parsing because its two main actions are
shift and reduce.
– At each shift action, the current symbol in the input string is pushed onto a stack.
– At each reduction step, the symbols at the top of the stack (this symbol sequence is the right
side of a production) are replaced by the non-terminal on the left side of that production.
– There are also two more actions: accept and error.

Shift-Reduce Parsing
• A shift-reduce parser tries to reduce the given input string to the start symbol.
• At each reduction step, a substring of the input matching the right side of a production
rule is replaced by the non-terminal on the left side of that production rule.
• If the substring is chosen correctly at each step, the rightmost derivation of the string is
created in reverse order.


• How do we know which substring to replace at each reduction step?

Handle
• Informally, a handle of a string is a substring that matches the right side of a production
rule.
– But not every substring that matches the right side of a production rule is a handle.
• If the grammar is unambiguous, then every right-sentential form of the grammar has exactly
one handle.
• The string to the right of a handle in a right-sentential form contains only terminal symbols.
For example, in the right-sentential form E + T * id of the expression grammar, the handle is
id, corresponding to the production F → id.


A Shift-Reduce Parser
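
As a worked illustration, using the expression grammar E → E + T | T, T → T * F | F,
F → ( E ) | id, a shift-reduce parse of id * id proceeds as follows. The stack grows to the
right, and $ marks both the bottom of the stack and the end of the input:

STACK          INPUT          ACTION
$              id * id $      shift
$ id           * id $         reduce by F → id
$ F            * id $         reduce by T → F
$ T            * id $         shift
$ T *          id $           shift
$ T * id       $              reduce by F → id
$ T * F        $              reduce by T → T * F
$ T            $              reduce by E → T
$ E            $              accept

The handles reduced at each step (id, F, id, T * F, T) trace out, in reverse, a rightmost
derivation of id * id.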


NOTE: For the operator-precedence parsing topic, refer to the prescribed textbook.
