Compiler Design-Notes
CS3501 COMPILER DESIGN SYLLABUS
COURSE OBJECTIVES:
To learn the various phases of a compiler.
To learn the various parsing techniques.
To understand intermediate code generation and run-time environment.
To learn to implement the front-end of the compiler.
To learn to implement a code generator.
To learn to implement code optimization.
45 PERIODS
COURSE OUTCOMES:
On Completion of the course, the students should be able to:
CO1:Understand the techniques in different phases of a compiler.
CO2:Design a lexical analyser for a sample language and learn to use the LEX tool.
CO3:Apply different parsing algorithms to develop a parser and learn to use YACC tool
CO4:Understand semantics rules (SDT), intermediate code generation and run-time environment.
UNIT 1 INTRODUCTION TO COMPILERS & LEXICAL ANALYSIS
• Preprocessor
A preprocessor produces input to compilers. It may perform the following functions.
1. Macro processing: A preprocessor may allow a user to define macros that are short hands for longer
constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: these preprocessors augment older languages with more modern flow-of-control
and data structuring facilities.
4. Language Extensions: These preprocessors attempt to add capabilities to the language in the form of built-in macros.
• Compiler
Compiler is a software which converts a program written in high level language (Source Language) to low level
language (Object/Target/Machine Language).
• Cross Compiler – a compiler that runs on a machine ‘A’ and produces code for another machine ‘B’. It is capable of creating code for a platform other than the one on which the compiler is running.
• Source-to-source Compiler or transcompiler or transpiler is a compiler that translates source code written
in one programming language into source code of another programming language.
• High Level Language – If a program contains pre-processor directives such as #include or #define, it is called HLL. High-level languages are closer to humans but far from machines. These (#) tags are called pre-processor directives; they direct the pre-processor about what to do.
• Pre-Processor – The pre-processor removes all the #include directives by including the files called file
inclusion and all the #define directives using macro expansion. It performs file inclusion, augmentation,
macro-processing etc.
• Assembly Language – It is neither in binary form nor high level. It is an intermediate state that is a combination of machine instructions and some other useful data needed for execution.
• Assembler – For every platform (Hardware + OS) we have an assembler. Assemblers are not universal, since there is one for each platform. The output of an assembler is called an object file. It translates assembly language to machine code.
• Interpreter – An interpreter converts high level language into low level machine language, just like a
compiler. But they are different in the way they read the input. The Compiler in one go reads the inputs,
does the processing and executes the source code whereas the interpreter does the same line by line.
Compiler scans the entire program and translates it as a whole into machine code whereas an interpreter
translates the program one statement at a time. Interpreted programs are usually slower with respect to
compiled ones.
• Relocatable Machine Code – It can be loaded at any point in memory and run. The addresses within the program are generated in such a way that they remain valid when the program is moved.
• Loader/Linker – It converts the relocatable code into absolute code and tries to run the program resulting
in a running program or an error message (or sometimes both can happen). Linker loads a variety of object
files into a single file to make it executable. Then loader loads it in memory and executes it.
TYPES OF TRANSLATORS:-
• INTERPRETER
• COMPILER
• PREPROCESSOR
Steps of programming with a Compiler:
• Program creation.
• Analysis of the language by the compiler; errors are thrown in case of any incorrect statement.
• In case of no error, the compiler converts the source code to machine code.
• Linking of various code files into a runnable program.
• Finally runs the program.

Steps of programming with an Interpreter:
• Program creation.
• Linking of files or generation of machine code is not required by the interpreter.
• Execution of source statements one by one.

Compiler: The compiler saves the machine language. Any change in the source program after compilation requires recompiling the entire code. CPU utilization is more in the case of a compiler.
Interpreter: The interpreter does not save the machine language. Any change in the source program during translation does not require retranslation of the entire code. CPU utilization is less in the case of an interpreter.
COUSINS OF COMPILER
1. Preprocessor 2. Assembler 3. Loader and Link-editor
COMPILER CONSTRUCTION TOOLS
• Parser Generator
• Scanner Generator
• Syntax Directed Translation Engines
• Automatic Code Generators
• Data-Flow Analysis Engines
• Compiler Construction Toolkits
Parser Generator
A parser generator produces syntax analyzers (parsers) from a context-free grammar description of the syntax of a programming language. It is helpful because the syntax analysis phase is quite complex and would otherwise take considerable implementation time.
Scanner Generator
Scanner Generator generates lexical analyzers from the input that consists of regular expression descriptions based on
tokens of a language. It generates a finite automaton to identify the regular expression.
Syntax Directed Translation Engines take a parse tree as input and generate intermediate code with three address
formats. These engines contain routines to traverse the parse tree and generate intermediate code. Each parse tree node
has one or more translations associated with it.
Automatic Code Generators take intermediate code as input and convert it into machine language. Each intermediate
language operation is translated using a set of rules and then sent into the code generator as an input. A template
matching process is used, and by using the templates, an intermediate language statement is replaced by its machine
language equivalent.
Data-Flow Analysis Engines is used for code optimization and can generate an optimized code. Data flow analysis is an
essential part of code optimization that collects the information, the values that flow from one part of a program to
another.
Compiler Construction Toolkits provide an integrated set of routines that helps in creating compiler components or in the
construction of various phases of a compiler.
Symbol Table – It is a data structure used and maintained by the compiler that contains all the identifiers’ names along with their types. It helps the compiler function smoothly by locating identifiers quickly. It consists of the following:
• An extensible array of records.
• The identifier and the associated records contain collected information about the identifier.
• FUNCTION identify (Identifier name)
• RETURNING a pointer to identifier information contains
• The actual string
• A macro definition
• A keyword definition
• A list of type, variable & function definition
• A list of structure and union name definition
• A list of structure and union field selected definitions.
1. Lexical Analyzer – It reads the source program and converts it into tokens. It groups the input characters into lexemes and produces a stream of tokens. Tokens are defined by regular expressions which are understood by the lexical analyzer. It also removes white-spaces and comments.
2. Syntax Analyzer – It is sometimes called the parser. It constructs the parse tree. It takes all the tokens one by one and uses a Context-Free Grammar to construct the parse tree.
The rules of the programming language can be represented entirely by a few productions. Using these productions we can represent what the program actually is. The input has to be checked to see whether it is in the desired format or not.
A syntax error can be detected at this level if the input is not in accordance with the grammar.
3. Semantic Analyzer – It verifies the parse tree, whether it is meaningful or not, and produces a verified parse tree. It also does type checking, label checking and flow-control checking.
4. Intermediate Code Generator – It generates intermediate code, i.e. a form which can be readily executed by a machine. We have many popular intermediate codes, for example three-address code.
Intermediate code is converted to machine language using the last two phases which are platform
dependent.
Till intermediate code, it is same for every compiler out there, but after that, it depends on the platform.
To build a new compiler we don’t need to build it from scratch. We can take the intermediate code from
the already existing compiler and build the last two parts.
5. Code Optimizer – It transforms the code so that it consumes fewer resources and produces more
speed. The meaning of the code being transformed is not altered. Optimisation can be categorized into
two types: machine dependent and machine independent.
6. Target Code Generator – The main purpose of the target code generator is to produce code that the machine can understand; it also performs register allocation, instruction selection, etc. The output is dependent on the type of assembler. This is the final stage of compilation.
Example:Consider the given expression and explain the different phases of compiler in evaluating the
expression
c=a+b*5;
Lexical Analysis
Lexemes Tokens
c identifier
= assignment symbol
a identifier
+ + (addition symbol)
b identifier
* * (multiplication symbol)
5 5 (number)
Hence, the token stream is <id,1> <=> <id,2> <+> <id,3> <*> <5>
Syntax Analysis
• Syntax analysis is the second phase of compiler which is also called as parsing.
Input: Tokens
Output: Syntax tree
Semantic Analysis
• Semantic analysis is the third phase of compiler.
• It checks for the semantic consistency.
• Type information is gathered and stored in symbol table or in syntax tree.
• Performs type checking.
Code Optimization
• Code optimization phase gets the intermediate code as input and produces optimized intermediate code as
output.
• It results in faster running machine code.
• It can be done by reducing the number of lines of code for a program.
(I). Local Optimization:-
These are local transformations that can be applied to a program to make an improvement. For example,
If A > B goto L2
Goto L3
L2 :
This can be replaced by a single statement
If A <= B goto L3
(II)Loop Optimization:-
Another important source of optimization concerns about increasing the speed of loops. A typical loop
improvement is to move a computation that produces the same result each time around the loop to a point, in
the program just before the loop is entered.
To improve the code generation, the optimization involves
o Deduction and removal of dead code (unreachable code).
o Calculation of constants in expressions and terms.
o Collapsing of repeated expression into temporary string.
o Loop unrolling.
o Moving code outside the loop.
o Removal of unwanted temporary variables.
t1 = id3 * 5.0
id1 = id2 + t1
Code Generation
• Code generation is the final phase of a compiler.
• It gets input from code optimization phase and produces the target code or object code as result.
• Intermediate instructions are translated into a sequence of machine instructions that perform the same task.
• The code generation involves
o Allocation of register and memory.
o Generation of correct references.
o Generation of correct data types.
o Generation of missing code.
LDF R2, id3
MULF R2, # 5.0
LDF R1, id2
ADDF R1, R2
STF id1, R1
Table Management (or) Book-keeping:-
This portion keeps the names used by the program and records essential information about each. The data structure used to record this information is called a ‘Symbol Table’.
Error Handlers:-
The error handler is invoked when a flaw (error) in the source program is detected.
Lexical Analysis
Lexical analysis is the process of converting a sequence of characters from source program into a sequence of
tokens.
A program which performs lexical analysis is termed as a lexical analyzer (lexer), tokenizer or scanner.
Lexical analysis consists of two stages of processing which are as follows:
• Scanning
• Tokenization
Role of Lexical Analyzer
When several patterns match a prefix of the input, the lexical analyzer resolves the conflict as follows:
• The longest match is preferred.
• Among rules which match the same number of characters, the rule given first is preferred.
LEXICAL ERRORS
Lexical errors are the errors thrown by the lexer when it is unable to continue, i.e. there is no way to recognise a lexeme as a valid token. Syntax errors, on the other hand, are thrown by the parser when a given sequence of already recognised valid tokens does not match any of the right-hand sides of the grammar rules. A simple panic-mode error handling system requires that we return to a high-level parsing function when a parsing or lexical error is detected.
Lexical error handling approaches
Lexical errors can be handled by the following actions:
• Deleting one character from the remaining input.
• Inserting a missing character into the remaining input.
• Replacing a character by another character.
• Transposing two adjacent characters.
➢ Error Recovery Schemes
• Panic mode recovery
• Local correction
o Source text is changed around the error point in order to get a correct text.
o Analyzer will be restarted with the resultant new text as input.
• Global correction
o It is an enhanced panic mode recovery.
o Preferred when local correction fails.
➢ Panic mode recovery
In panic mode recovery, unmatched patterns are deleted from the remaining input, until the lexical analyzer
can find a well-formed token at the beginning of what input is left.
Input Buffering
The lexical analyzer scans the input from left to right one character at a time. It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned.
Initially both the pointers point to the first character of the input string as shown below
The forward pointer moves ahead to search for the end of the lexeme. As soon as a blank space is encountered, it indicates the end of the lexeme. In the above example, as soon as the forward pointer (fp) encounters a blank space, the lexeme “int” is identified.
The fp is moved ahead over white space; when fp encounters white space it ignores it and moves ahead. Then both the begin pointer (bp) and the forward pointer (fp) are set at the next token.
The input characters are thus read from secondary storage, but reading in this way from secondary storage is costly, hence a buffering technique is used. A block of data is first read into a buffer and then scanned by the lexical analyzer. There are two methods used in this context: the one-buffer scheme and the two-buffer scheme. These are explained below.
In the one-buffer scheme, if a lexeme crosses the buffer boundary, the buffer has to be refilled, which overwrites the first part of the lexeme. The two-buffer scheme avoids this problem by alternately refilling two buffer halves, often with a sentinel character at the end of each half to reduce end-of-buffer tests.
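A minimal C sketch of the two-buffer (sentinel) scheme described above. The buffer size, the refill() helper and the use of '\0' as the sentinel are illustrative assumptions, not part of the notes:

#include <stdio.h>

#define BUF_SIZE 4096                    /* size of each buffer half (assumption) */

static char buf[2 * BUF_SIZE + 2];       /* two halves, each followed by a sentinel slot */
static char *forward = buf;              /* forward pointer (fp) */

/* Read one half of the buffer from the source file and terminate it with a sentinel. */
static void refill(FILE *src, char *half) {
    size_t n = fread(half, 1, BUF_SIZE, src);
    half[n] = '\0';                      /* '\0' used as the end-of-buffer / EOF sentinel */
}

static void init_buffer(FILE *src) {
    refill(src, buf);                    /* load the first half before scanning starts */
    forward = buf;
}

/* Return the next character, switching halves when a sentinel is reached. */
static int next_char(FILE *src) {
    while (*forward == '\0') {           /* reached a sentinel */
        if (forward == buf + BUF_SIZE) {                 /* end of the first half */
            refill(src, buf + BUF_SIZE + 1);
            forward = buf + BUF_SIZE + 1;
        } else if (forward == buf + 2 * BUF_SIZE + 1) {  /* end of the second half */
            refill(src, buf);
            forward = buf;
        } else {
            return EOF;                  /* sentinel not at a half boundary: real end of input */
        }
    }
    return *forward++;
}

int main(void) {
    init_buffer(stdin);
    int c, count = 0;
    while ((c = next_char(stdin)) != EOF)
        count++;
    printf("%d characters scanned\n", count);
    return 0;
}

The begin pointer (bp) would be maintained in the same way, marking the start of the current lexeme while fp runs ahead.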
➢ Strings
Any finite sequence of alphabet symbols is called a string. The length of a string is the total number of occurrences of alphabet symbols in it; e.g., the length of the string sample is 6 and is denoted by |sample| = 6. A string having no symbols, i.e. a string of zero length, is known as the empty string and is denoted by ε (epsilon).
➢ Special Symbols
A typical high-level language contains the following symbols:-
Punctuation – Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment – =
Preprocessor – #
➢ Language
A language is considered as a finite set of strings over some finite set of alphabets. Computer languages are
considered as finite sets, and mathematically set operations can be performed on them. Finite languages can
be described by means of regular expressions.
Regular expressions
Regular expressions have the capability to express finite languages by defining a pattern for finite strings of
symbols. The grammar defined by regular expressions is known as regular grammar. The language defined by
regular grammar is known as regular language.
Regular expression is an important notation for specifying patterns. Each pattern matches a set of strings, so
regular expressions serve as names for a set of strings
➢ Operations
The various operations on languages are:
• Union of two languages L and M is written as
L U M = {s | s is in L or s is in M}
• Concatenation of two languages L and M is written as
LM = {st | s is in L and t is in M}
• The Kleene Closure of a language L is written as
L* = Zero or more occurrence of language L.
➢ Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
• Union : (r)|(s) is a regular expression denoting L(r) U L(s)
• Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
• Kleene closure : (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
For relop ,we use the comparison operations of languages like Pascal or SQL where = is “equals” and < > is
“not equals” because it presents an interesting structure of lexemes.
The terminal of grammar, which are if, then , else, relop ,id and numbers are the names of tokens as far as
the lexical analyzer is concerned, the patterns for the tokens are described using regular definitions.
digit --> [0-9]
digits --> digit+
number --> digits(.digits)?(E[+-]?digits)?
letter --> [A-Za-z]
id --> letter(letter|digit)*
if --> if
then --> then
else --> else
relop --> < | > | <= | >= | = | <>
Lex
Lex is a tool in lexical analysis phase to recognize tokens using regular expression.
• Lex tool itself is a lex compiler.
➢ Use of Lex
• lex.l is an input file written in a language which describes the generation of a lexical analyzer. The lex compiler transforms lex.l into a C program known as lex.yy.c.
• lex.yy.c is compiled by the C compiler to a file called a.out.
• The output of C compiler is the working lexical analyzer which takes stream of input characters and produces
a stream of tokens.
• yylval is a global variable which is shared by lexical analyzer and parser to return the name and an attribute
value of token.
• The attribute value can be numeric code, pointer to symbol table or nothing.
• Another tool for lexical analyzer generation is Flex.
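A typical build-and-run sequence for the pipeline described above, on a Unix-like system (file names are illustrative):

lex lex.l              (produces lex.yy.c)
cc lex.yy.c -ll        (compiles and links with the lex library, producing a.out; use -lfl with Flex)
./a.out < source.txt   (runs the generated lexical analyzer on an input file)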
The token is then given to parser for further processing.
Lookahead Operator
• Lookahead operator is the additional operator that is read by lex in order to distinguish additional pattern for
a token.
• Lexical analyzer is used to read one character ahead of valid lexeme and then retracts to produce token.
%{
#define LT 256
#define LE 257
#define EQ 258
#define NE 259
#define GT 260
#define GE 261
#define RELOP 262   /* RELOP, NUM, THEN and ELSE are assumed; their definitions were missing from the listing */
#define ID 263
#define NUM 264
#define IF 265
#define THEN 266
#define ELSE 267
int attribute;
%}
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
num {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws} {}
if { return(IF); }
then { return(THEN); }
else { return(ELSE); }
{id} { return(ID); }
{num} { return(NUM); }
"<" { attribute=LT;return(RELOP); }
"<=" { attribute=LE;return(RELOP); }
"<>" { attribute=NE;return(RELOP); }
"=" { attribute=EQ;return(RELOP); }
">" { attribute=GT;return(RELOP); }
">=" { attribute=GE;return(RELOP); }
%%
int yywrap(){
return 1;
}
int main() {
int token;
while((token = yylex())) {
printf("<%d,", token);
switch(token) {
case ID:
printf("%s>\n", yytext);
break;
case RELOP:
printf("%d>\n", attribute);
break;
default:
printf(">\n");
break;
}
}
return 0;
}
AUTOMATA
An automaton is defined as a system where information is transmitted and used for performing some functions without direct participation of man.
1. An automaton in which the output depends only on the input is called an automaton without memory.
2. An automaton in which the output depends on the input and the state is called an automaton with memory.
3. An automaton in which the output depends only on the state of the machine is called a Moore machine.
4. An automaton in which the output depends on the state and the input at any instant of time is called a Mealy machine.
Finite automata may have outputs corresponding to each transition. There are two types of finite state
machines that generate output –
Mealy Machine andMoore machine
Mealy Machine:
• Output depends both upon the present state and the present input.
• Generally, it has fewer states than a Moore machine.
• The value of the output function is a function of the transitions and the changes, when the input logic on the present state is done.
• Mealy machines react faster to inputs; they generally react in the same clock cycle.

Moore Machine:
• Output depends only upon the present state.
• Generally, it has more states than a Mealy machine.
• The value of the output function is a function of the current state and the changes at the clock edges, whenever state changes occur.
• In Moore machines, more logic is required to decode the outputs, resulting in more circuit delays; they generally react one clock cycle later.
Finite automata
A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly. A finite automaton is a recognizer for regular expressions. When a regular-expression string is fed into the automaton, it changes its state for each literal. If the input string is successfully processed and the automaton reaches a final state, the string is accepted.
The mathematical model of a finite automaton consists of:
▪ A finite set of states S.
▪ A finite set of input symbols Σ.
▪ A transition function that maps state–symbol pairs to states.
▪ A start (or initial) state s0.
▪ A set of final (or accepting) states F.
1)DETERMINISTIC AUTOMATA
A deterministic finite automaton has at most one transition from each state on any input. A DFA is a special case of an NFA in which:
1. it has no transitions on input ε, and
2. each input symbol has at most one transition from any state.
A finite automaton is called a DFA if there is only one path for a specific input from the current state to the next state.
From state S0, for input ‘a’ there is only one path, going to S2. Similarly, from S0 there is only one path for input ‘b’, going to S1.
2)NONDETERMINISTIC AUTOMATA
An NFA is a mathematical model that consists of:
▪ A set of states S.
▪ A set of input symbols Σ.
▪ A transition function move that maps state–symbol pairs to sets of states.
▪ A state s0 that is distinguished as the start (or initial) state.
▪ A set of states F distinguished as accepting (or final) states.
An NFA may have a number of transitions from a state on a single input symbol.
Example:NFA
Example: Transition diagram for recognizing identifiers or variables.
The transition diagram recognizes an identifier, defined to be a letter followed by any number of letters or digits.
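As an illustration of simulating such a transition diagram, here is a small, self-contained C sketch; the state names and the is_identifier function are illustrative, not from the notes:

#include <ctype.h>
#include <stdio.h>

/* States of the identifier-recognizing DFA: START before any input,
   IN_ID after a leading letter, DEAD after an invalid character. */
enum state { START, IN_ID, DEAD };

/* Return 1 if s is accepted as an identifier: letter (letter|digit)* */
static int is_identifier(const char *s) {
    enum state st = START;
    for (; *s; s++) {
        switch (st) {
        case START:
            st = isalpha((unsigned char)*s) ? IN_ID : DEAD;
            break;
        case IN_ID:
            st = isalnum((unsigned char)*s) ? IN_ID : DEAD;
            break;
        case DEAD:
            return 0;                    /* no transition out of the dead state */
        }
    }
    return st == IN_ID;                  /* accept only if we end in the final state */
}

int main(void) {
    printf("%d %d %d\n", is_identifier("count1"), is_identifier("1count"), is_identifier("x"));
    return 0;                            /* prints: 1 0 1 */
}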
Case 4 − For a regular expression (a+b)*, we can construct the following FA −
Method
Step 1 Construct an NFA with Null moves from the given regular expression.
Step 2 Remove Null transition from the NFA and convert it into its equivalent DFA.
Problem
Convert the following RE into its equivalent DFA − 1 (0 + 1)* 0
Solution
We will concatenate three expressions "1", "(0 + 1)*" and "0"
Now we will remove the ε transitions. After we remove the ε transitions from the NDFA, we get the following −
Minimizing DFA
Algorithm
Input − DFA
Output − Minimized DFA
Step 1 − Draw a table for all pairs of states (Qi, Qj) not necessarily connected directly [All are unmarked
initially]
Step 2 − Consider every state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F or vice versa and mark them.
[Here F is the set of final states]
Step 3 − Repeat this step until we cannot mark anymore states −
If there is an unmarked pair (Qi, Qj), mark it if the pair {δ(Qi, A), δ(Qj, A)} is marked for some input alphabet symbol.
Step 4 − Combine all the unmarked pair (Qi, Qj) and make them a single state in the reduced DFA.
Example
Let us use Algorithm 2 to minimize the DFA shown below.
Step 2 − We mark the state pairs (Qi, Qj) where one state is final and the other is not:

      a    b    c    d    e
b
c     ✔    ✔
d     ✔    ✔
e     ✔    ✔
f               ✔    ✔    ✔
Step 3 − We will try to mark the state pairs, with green colored check mark, transitively. If we input 1 to state
‘a’ and ‘f’, it will go to state ‘c’ and ‘f’ respectively. (c, f) is already marked, hence we will mark pair (a, f). Now,
we input 1 to state ‘b’ and ‘f’; it will go to state ‘d’ and ‘f’ respectively. (d, f) is already marked, hence we will
mark pair (b, f).
      a    b    c    d    e
b
c     ✔    ✔
d     ✔    ✔
e     ✔    ✔
f     ✔    ✔    ✔    ✔    ✔
After step 3, we have got state combinations {a, b} {c, d} {c, e} {d, e} that are unmarked.
We can recombine {c, d} {c, e} {d, e} into {c, d, e}
Hence we got two combined states as − {a, b} and {c, d, e}
So the final minimized DFA will contain three states {f}, {a, b} and {c, d, e}
Algorithm 3
Step 1 − All the states Q are divided in two partitions − final states and non-final states and are denoted by P0.
All the states in a partition are 0th equivalent. Take a counter k and initialize it with 0.
Step 2 − Increment k by 1. For each partition in Pk-1, divide the states in Pk-1 into two partitions if they are k-distinguishable. Two states X and Y within a partition are k-distinguishable if there is an input S such that δ(X, S) and δ(Y, S) are (k-1)-distinguishable.
Step 3 − If Pk ≠ Pk-1, repeat Step 2, otherwise go to Step 4.
Step 4 − Combine kth equivalent sets and make them the new states of the reduced DFA.
Example
Let us consider the following DFA −
q δ(q,0) δ(q,1)
a b c
b a d
c e f
d e f
e e f
f f f
• P0 = {(c,d,e), (a,b,f)}
• P1 = {(c,d,e), (a,b),(f)}
• P2 = {(c,d,e), (a,b),(f)}
Hence, P1 = P2.
There are three states in the reduced DFA. The reduced DFA is as follows −
Q            δ(q,0)       δ(q,1)
{a, b}       {a, b}       {c, d, e}
{c, d, e}    {c, d, e}    {f}
{f}          {f}          {f}
UNIT II
SYNTAX ANALYSIS
In the syntax analysis phase, a compiler verifies whether or not the tokens generated by the lexical analyzer are grouped according to the syntactic rules of the language. This is done by a parser. The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language. It detects and reports any syntax errors and produces a parse tree from which intermediate code can be
generated.
Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Eg- consider a grammar
S -> aS | Sa | a
Grammars are used to describe the syntax of a programming language. It specifies the structure of expression
and statements.
Types of grammar
• Type 0 grammar
• Type 1 grammar
• Type 2 grammar
• Type 3 grammar
The tasks of the error handling process are to detect each error, report it to the user, and then apply some recovery strategy to handle it. During this whole process the processing time of the program should not slow down noticeably. Errors may even show up as blank entries in the symbol table.
Types or Sources of Error – There are two types of error: run-time and compile-time error:
1. A run-time error is an error which takes place during the execution of a program, and usually happens
because of adverse system parameters or invalid input data. The lack of sufficient memory to run an
application or a memory conflict with another program and logical error are example of this. Logic errors,
occur when executed code does not produce the expected result. Logic errors are best handled by
meticulous program debugging.
2. Compile-time errors arise at compile time, before execution of the program. A syntax error or a missing file reference that prevents the program from compiling successfully is an example of this.
Classification of Compile-time error –
1. Lexical : This includes misspellings of identifiers, keywords or operators
2. Syntactical : missing semicolon or unbalanced parenthesis
3. Semantical : incompatible value assignment or type mismatches between operator and operand
4. Logical : code not reachable, infinite loop.
Finding error or reporting an error – Viable-prefix is the property of a parser which allows early detection of
syntax errors.
• Goal: detection of an error as soon as possible without further consuming unnecessary input
• How: detect an error as soon as the prefix of the input does not match a prefix of any string in the
language.
• Example: for(;), this will report an error, as for should have two semicolons inside the parentheses.
Error Recovery –
The basic requirement is for the compiler to simply stop, issue a message, and cease compilation. There are some common recovery methods, which follow.
1. Panic mode recovery: This is the easiest way of error-recovery and also, it prevents the parser from
developing infinite loops while recovering error. The parser discards the input symbol one at a time until one
of the designated (like end, semicolon) set of synchronizing tokens (are typically the statement or
expression terminators) is found. This is adequate when the presence of multiple errors in same statement
is rare. Example: Consider the erroneous expression- (1 + + 2) + 3. Panic-mode recovery: Skip ahead to
next integer and then continue. Bison: use the special terminal error to describe how much input to skip.
E->int|E+E|(E)|error int|(error)
2. Phase level recovery: Perform local correction on the input to repair the error. But error correction is
difficult in this strategy.
3. Error productions: Some common errors are known to the compiler designers that may occur in the code.
Augmented grammars can also be used, as productions that generate erroneous constructs when these
errors are encountered. Example: write 5x instead of 5*x
4. Global correction: Its aim is to make as few changes as possible while converting an incorrect input string
to a valid string. This strategy is costly to implement.
Definition: CFG
The language generated by a context-free grammar G is
L(G) = { w | w ∈ T* and S ⇒* w }
where,
L - Language
G - Grammar
w - Input string
S - Start symbol
T - Set of terminals
Hence, a CFL is a collection of input strings (of terminals) derived from the start symbol of the grammar in one or more steps.
Terminals are symbols from which strings are formed.
• Lowercase letters i.e., a, b, c.
• Operators i.e.,+,-,*·
• Punctuation symbols i.e., comma, parenthesis.
• Digits i.e. 0, 1, 2, · · · ,9.
• Boldface letters i.e., id, if.
Sentence and Sentential Form
Any α ∈ (N ∪ Σ)∗ derivable from the start symbol S is called a sentential form of the grammar. If α ∈ Σ∗, i.e. α ∈
L(G), then α is called a sentence of the grammar.
Parse Tree
Given a grammar G = (Σ, N, P, S), the parse tree of a sentential form x of the grammar is a rooted ordered tree
with the following properties:
o Example G:
list -> list + digit | list - digit | digit
digit -> 0|1|2|3|4|5|6|7|8|9
Ambiguous Grammar
A grammar G is said to be ambiguous if there is a sentence x ∈ L(G) that has two distinct parse trees.
• Example. Suppose a grammar G that can not distinguish between lists and
digits: string → string + string | string - string |0|1|2|3|4|5|6|7|8|9
if-else Ambiguity
A statement of the form if(E1) if(E2) S2 else S3 can be parsed in two different ways, so it produces two different parse trees; normally we associate the else with the nearest if.
➢ if-else Modified
Consider the following production rules:
S → if(E) S | if(E) ES else S | · · ·
ES → if(E) ES else ES | · · ·
We restrict the statements that can appear in the then-part. Now the following statement has a unique parse tree: if(E1) if(E2) S2 else S3
RE = (a | b)*abb (set of strings ending with abb).
Grammar
Parsing
Parsing can be defined as top-down or bottom-up based on how the parse-tree is
constructed.
Top-Down Parsing
We have learnt in the last chapter that the top-down parsing technique parses the input, and
starts constructing a parse tree from the root node gradually moving down to the leaf nodes.
The types of top-down parsing are depicted below:
Bottom-up Parsing
Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it
reaches the root node. Here, we start from a sentence and then apply production rules in
reverse manner in order to reach the start symbol. The image given below depicts the
bottom-up parsers available.
LL vs. LR parsing
LL parsing LR parsing
• LL: Starts with the root nonterminal on the stack. LR: Ends with the root nonterminal on the stack.
• LL: Uses the stack for designating what is still to be expected. LR: Uses the stack for designating what is already seen.
• LL: Builds the parse tree top-down. LR: Builds the parse tree bottom-up.
• LL: Continuously pops a nonterminal off the stack and pushes the corresponding right-hand side. LR: Tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.
• LL: Reads the terminals when it pops one off the stack. LR: Reads the terminals while it pushes them on the stack.
• LL: Pre-order traversal of the parse tree. LR: Post-order traversal of the parse tree.
advances to the next input letter (i.e. ‘e’). The parser tries to expand non-terminal ‘X’ and
checks its production from the left (X → oa). It does not match with the next input symbol. So
the top-down parser backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.
Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer from
backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the
next input symbols. To make the parser back-tracking free, the predictive parser puts some
constraints on the grammar and accepts only a class of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse
tree. Both the stack and the input contains an end symbol $ to denote that the stack is empty
and the input is consumed. The parser refers to the parsing table to take any decision on the
input and stack element combination.
In recursive descent parsing, the parser may have more than one production to choose from
for a single instance of input, whereas in predictive parser, each step has at most one
production to choose. There might be instances where there is no production matching the
input string, making the parsing procedure to fail.
LL Parser
An LL Parser accepts LL grammar. LL grammar is a subset of context-free grammar but with
some restrictions to get the simplified version, in order to achieve easy implementation. LL
grammar can be implemented by means of both algorithms namely, recursive-descent or
table-driven.
An LL parser is denoted as LL(k). The first L in LL(k) stands for parsing the input from left to right, the second L stands for left-most derivation, and k represents the number of lookaheads. Generally k = 1, so LL(k) may also be written as LL(1).
LL Parsing Algorithm
We may stick to deterministic LL(1) for parser explanation, as the size of table grows
exponentially with the value of k. Secondly, if a given grammar is not LL(1), then usually, it is
not LL(k), for any given k.
Given below is an algorithm for LL(1) Parsing:
Input:
string ω
parsing table M for grammar G
Output:
If ω is in L(G) then left-most derivation of ω,
error otherwise.
repeat
let X be the top stack symbol and a the symbol pointed by ip.
if X ∈ Vt or X = $
if X = a
POP X and advance ip.
else
error()
endif
else /* X is non-terminal */
if M[X,a] = X → Y1, Y2,... Yk
POP X
PUSH Yk, Yk-1,... Y1 /* Y1 on top */
Output the production X → Y1, Y2,... Yk
else
error()
endif
endif
until X = $ /* empty stack */
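A minimal C sketch of the table-driven algorithm above, hard-coded for the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id that appears later in these notes. The enum names, table layout and parse() helper are illustrative assumptions:

#include <stdio.h>

/* Grammar symbols: nonterminals E, Ep, T, Tp, F and terminals id + * ( ) $ */
enum { E, Ep, T, Tp, F, ID, PLUS, STAR, LP, RP, END };

/* Production bodies as symbol codes, terminated by -1; an empty body is epsilon. */
static const int body[8][4] = {
    /*0: E  -> T E'   */ {T, Ep, -1},
    /*1: E' -> + T E' */ {PLUS, T, Ep, -1},
    /*2: E' -> eps    */ {-1},
    /*3: T  -> F T'   */ {F, Tp, -1},
    /*4: T' -> * F T' */ {STAR, F, Tp, -1},
    /*5: T' -> eps    */ {-1},
    /*6: F  -> ( E )  */ {LP, E, RP, -1},
    /*7: F  -> id     */ {ID, -1},
};

/* Parsing table M[nonterminal][terminal]: production number, or -1 for error. */
static const int M[5][6] = {
    /*           id   +   *   (   )   $  */
    /* E  */   {  0, -1, -1,  0, -1, -1 },
    /* E' */   { -1,  1, -1, -1,  2,  2 },
    /* T  */   {  3, -1, -1,  3, -1, -1 },
    /* T' */   { -1,  5,  4, -1,  5,  5 },
    /* F  */   {  7, -1, -1,  6, -1, -1 },
};

static int parse(const int *ip) {            /* ip: terminal codes ending with END */
    int stack[256], top = 0;
    stack[top++] = END;                      /* $ marks the bottom of the stack */
    stack[top++] = E;                        /* start symbol */
    while (top > 0) {
        int X = stack[--top];
        if (X >= ID) {                       /* terminal or $ on top of the stack */
            if (X != *ip) return 0;          /* mismatch: error */
            if (X == END) return 1;          /* stack and input both exhausted: accept */
            ip++;
        } else {                             /* non-terminal: consult the table */
            int p = M[X][*ip - ID];
            if (p < 0) return 0;
            int len = 0;
            while (len < 4 && body[p][len] != -1) len++;
            for (int i = len - 1; i >= 0; i--)   /* push the body in reverse, Y1 on top */
                stack[top++] = body[p][i];
        }
    }
    return 0;
}

int main(void) {
    int w[] = { ID, PLUS, ID, STAR, ID, END };       /* id + id * id $ */
    printf("%s\n", parse(w) ? "accepted" : "error");
    return 0;
}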
Rule 3: For a production rule X → Y1Y2Y3, First(X) = First(Y1); if First(Y1) contains ∈, then First(X) also includes First(Y2) − { ∈ }, and so on. If all of Y1, Y2, Y3 derive ∈, then ∈ is in First(X).
Follow Function-
Follow(α) is a set of terminal symbols that appear immediately to the right of α.
Rule-01: For the start symbol S, place $ in Follow(S).
Rule-02: For any production rule A → αBβ, Follow(B) includes First(β) − { ∈ }; if ∈ ∈ First(β) (or β is empty), Follow(B) also includes Follow(A).
Note-01:
Note-02:
• Before calculating the first and follow functions, eliminate Left Recursion from the grammar, if present.
Note-03:
• We calculate the follow function of a non-terminal by looking where it is present on the RHS of a production rule.
Problem-01:
Calculate the first and follow functions for the given grammar-
S → aBDh
B → cC
C → bC / ∈
D → EF
E→g/∈
F→f/∈
Solution-
First Functions-
• First(S) = { a }
• First(B) = { c }
• First(C) = { b , ∈ }
• First(D) = { First(E) – ∈ } ∪ First(F) = { g , f , ∈ }
• First(E) = { g , ∈ }
• First(F) = { f , ∈ }
Follow Functions-
• Follow(S) = { $ }
• Follow(B) = { First(D) – ∈ } ∪ First(h) = { g , f , h }
• Follow(C) = Follow(B) = { g , f , h }
• Follow(D) = First(h) = { h }
• Follow(E) = { First(F) – ∈ } ∪ Follow(D) = { f , h }
• Follow(F) = Follow(D) = { h }
Problem-02:
Calculate the first and follow functions for the given grammar-
S→A
A → aB / Ad
B→b
C→g
Solution-
We have-
S→A
A → aBA’
A’ → dA’ / ∈
B→b
C→g
First Functions-
• First(S) = First(A) = { a }
• First(A) = { a }
• First(A’) = { d , ∈ }
• First(B) = { b }
• First(C) = { g }
Follow Functions-
• Follow(S) = { $ }
• Follow(A) = Follow(S) = { $ }
• Follow(A’) = Follow(A) = { $ }
• Follow(B) = { First(A’) – ∈ } ∪ Follow(A) = { d , $ }
• Follow(C) = NA
Problem-03:
Calculate the first and follow functions for the given grammar-
S → (L) / a
L → SL’
L’ → ,SL’ / ∈
Solution-
First Functions-
• First(S) = { ( , a }
• First(L) = First(S) = { ( , a }
• First(L’) = { , , ∈ }
Follow Functions-
• Follow(S) = { $ , , , ) }
• Follow(L) = { ) }
• Follow(L’) = { ) }
Problem-04:
Calculate the first and follow functions for the given grammar-
S → AaAb / BbBa
A→∈
B→∈
Solution-
First Functions-
• First(S) = { a , b }
• First(A) = { ∈ }
• First(B) = { ∈ }
Follow Functions-
• Follow(S) = { $ }
• Follow(A) = First(a) ∪ First(b) = { a , b }
• Follow(B) = First(b) ∪ First(a) = { a , b }
Left Recursion-
• A production of grammar is said to have left recursion if the leftmost variable of its RHS is
same as variable of its LHS.
• A grammar containing a production having left recursion is called as Left Recursive Grammar.
Example-
S → Sa / ∈
(Left Recursive Grammar)
Left recursion is eliminated by converting the grammar into a right recursive grammar.
If we have the left-recursive pair of productions-
A → Aα / β
(Left Recursive Grammar)
where β does not begin with an A.
Then, we can eliminate left recursion by replacing the pair of productions with-
A → βA’
A’ → αA’ / ∈
Problem-01:
Solution-
The grammar after eliminating left recursion is-
A → aA’
A’ → BdA’ / aA’ / ∈
B → bB’
B’ → eB’ / ∈
Problem-02:
Solution-
Problem-03
Solution-
Problem-04:
Solution-
The grammar after eliminating left recursion is-
S → (L) / a
L → SL’
L’ → ,SL’ / ∈
Problem-05:
Solution-
A → 0S1SA / ∈
Problem-06:
Solution-
Step-01:
Eliminating left recursion from the productions of A, we get-
A → BaA’ / cA’
A’ → aA’ / ∈
B → Bb / Ab / d
Step-02:
Substituting the productions of A in B → Ab, we get-
A → BaA’ / cA’
A’ → aA’ / ∈
B → Bb / BaA’b / cA’b / d
Step-03:
Now, eliminating left recursion from the productions of B, we get the following grammar-
A → BaA’ / cA’
A’ → aA’ / ∈
B → cA’bB’ / dB’
B’ → bB’ / aA’bB’ / ∈
Left Factoring
• If a grammar contains two productions of the form S → aα and S → aβ, it is not suitable for top-down parsing without backtracking. Troubles of this form can sometimes be removed from the grammar by a technique called left factoring.
Example:
1)In the left factoring,
we replace { S→ aα, S→ aβ } by
{ S → aS', S'→ α, S'→ β }
2) Consider the grammar G : S → iEtS | iEtSeS | a
E → b
Solution:
After left factoring, the grammar becomes
S → iEtSS’ | a
S’ → eS | ε
E → b
Example:
Consider the following grammar :
E → E+T | T
T→T*F | F
F → (E) | id
After eliminating left-recursion the grammar is
E → TE’
E’ → +TE’ |ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
First( ) :
FIRST(E) = { ( , id}
FIRST(E’) ={+ ,ε}
FIRST(T) = { ( , id}
FIRST(T’) = {*, ε }
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }
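Using these FIRST and FOLLOW sets, the predictive parsing table for this grammar can be filled in as follows (a worked example following the standard construction; the table itself is not part of the original notes):

        id          +            *            (           )          $
E       E → TE’                               E → TE’
E’                  E’ → +TE’                             E’ → ε     E’ → ε
T       T → FT’                               T → FT’
T’                  T’ → ε       T’ → *FT’                T’ → ε     T’ → ε
F       F → id                                F → (E)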
Conditions for an LL(1) Grammar:
A grammar G is LL(1) if and only if, for every pair of productions A → α | β:
• FIRST(α) ∩ FIRST(β) = ∅ (no terminal can begin strings derived from both α and β).
• At most one of α and β can derive the empty string ε.
• If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
SHIFT-REDUCE PARSING
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for
an input string beginning at the leaves (the bottom) and working up towards the root (the
top).
Example:
Consider the grammar:
S → aABe
A → Abc | b
B→d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST):
abbcde  → aAbcde  (A → b)
aAbcde  → aAde    (A → Abc)
aAde    → aABe    (B → d)
aABe    → S       (S → aABe)
RIGHTMOST DERIVATION (in reverse):
S → aABe → aAde → aAbcde → abbcde
Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
and a rightmost derivation of the input string id1 + id2 * id3:
E → E+E
  → E+E*E
  → E+E*id3
  → E+id2*id3
  → id1+id2*id3
In the above derivation the underlined substrings are called handles.
Handle pruning:
A rightmost derivation in reverse can be obtained by handle pruning; i.e., if w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some rightmost derivation.
Stack implementation of shift-reduce parsing :
Stack         Input            Action
$             id1+id2*id3 $    shift
$ id1         +id2*id3 $       reduce by E → id
$ E           +id2*id3 $       shift
$ E+          id2*id3 $        shift
$ E+id2       *id3 $           reduce by E → id
$ E+E         *id3 $           shift
$ E+E*        id3 $            shift
$ E+E*id3     $                reduce by E → id
$ E+E*E       $                reduce by E → E*E
$ E+E         $                reduce by E → E+E
$ E           $                accept
• shift – The next input symbol is shifted onto the top of the stack.
• reduce – The parser replaces the handle within a stack with a non-terminal.
• accept – The parser announces successful completion of parsing.
• error – The parser discovers that a syntax error has occurred and calls an error recovery routine.
1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.
1. Shift-reduce conflict:
Example:
Option 1 (reduce):
Stack      Input     Action
$ E+E      *id $     reduce by E → E+E
$ E        *id $     shift

Option 2 (shift):
Stack      Input     Action
$ E+E      *id $     shift
$ E+E*     id $      shift
2. Reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R → c
and the input c+c. Both alternatives proceed identically at first:
Stack      Input     Action
$          c+c $     shift
$ c        +c $      reduce by R → c
$ R        +c $      shift
$ R+       c $       shift
$ R+c      $         the parser cannot decide whether to reduce by R → c (towards M → R+R) or by M → R+c.
Viable prefixes:
➢ α is a viable prefix of the grammar if there is w such that αw is a right-sentential form.
➢ The set of prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
➢ The set of viable prefixes is a regular language.
LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a large class of CFGs is called LR(k) parsing. The ‘L’ is for left-to-right scanning of the input, the ‘R’ for constructing a rightmost derivation in reverse, and the ‘k’ for the number of input symbols of lookahead. When ‘k’ is omitted, it is assumed to be 1.
Advantages of LR parsing:
✓ It recognizes virtually all programming language constructs for which CFG can be
written.
✓ It is an efficient non-backtracking shift-reduce parsing method.
✓ The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
✓ It detects a syntactic error as soon as possible.
Drawbacks of LR method:
✓ It is too much work to construct an LR parser by hand; a specialized tool (an LR parser generator such as YACC) is needed.
Types of LR parsing method:
1. SLR- Simple LR
▪ Easiest to implement, least powerful.
2. CLR- Canonical LR
▪ Most powerful, most expensive.
3. LALR- Look-Ahead LR
▪ Intermediate in size and cost between the other two methods.
[Model of an LR parser: an input buffer holding a1 … ai … an $, a stack holding s0 X1 s1 … Xm sm with sm on top, the LR parsing program, the parsing table (action and goto), and the output.]
It consists of : an input, an output, a stack, a driver program, and a parsing table that has
two parts (action and goto).
➢ The parsing program reads characters from an input buffer one at a time.
➢ The program uses a stack to store a string of the form s0 X1 s1 X2 s2 … Xm sm, where sm is on top. Each Xi is a grammar symbol and each si is a state.
Action: The parsing program determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai] in the action table, which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto: The function goto takes a state and grammar symbol as arguments and produces a state.
LR Parsing algorithm:
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program:
set ip to point to the first symbol of w$;
repeat forever begin
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action[s, a] = shift s’ then begin
        push a then s’ on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2*|β| symbols off the stack;
        let s’ be the state now on top of the stack;
        push A then goto[s’, A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then return
    else error()
end
LR(0) items:
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production, then add the item B → .γ to closure(I), if it is not already there. Apply this rule until no more new items can be added.
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I.
Method:
1. Construct C = {I0, I1, …. In}, the collection of sets of LR(0) items for G’.
2. State i is constructed from Ii. The parsing functions for state i are determined as follows:
(a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to “shift j”. Here a must be a terminal.
(b) If [A → α.] is in Ii, then set action[i, a] to “reduce A → α” for all a in FOLLOW(A).
(c) If [S’ → S.] is in Ii, then set action[i, $] to “accept”.
If any conflicting actions are generated by the above rules, we say grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule:
If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made “error”
5. The initial state of the parser is the one constructed from the set of items containing
[S’→.S].
Example grammar:
E → E + T | T
T → T * F | F
F → (E) | id
Augmented grammar :
E’ → E
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
I0 : E’ → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id
GOTO ( I0 , E ) = I1 : E’ → E.
                       E → E.+ T

GOTO ( I0 , T ) = I2 : E → T.
                       T → T.* F

GOTO ( I0 , F ) = I3 : T → F.

GOTO ( I0 , ( ) = I4 : F → (.E)
                       E → .E + T
                       E → .T
                       T → .T * F
                       T → .F
                       F → .(E)
                       F → .id

GOTO ( I0 , id ) = I5 : F → id.

GOTO ( I1 , + ) = I6 : E → E +.T
                       T → .T * F
                       T → .F
                       F → .(E)
                       F → .id

GOTO ( I2 , * ) = I7 : T → T *.F
                       F → .(E)
                       F → .id

GOTO ( I4 , E ) = I8 : F → (E.)
                       E → E.+ T

GOTO ( I4 , T ) = I2 ,  GOTO ( I4 , F ) = I3 ,  GOTO ( I4 , ( ) = I4 ,  GOTO ( I4 , id ) = I5

GOTO ( I6 , T ) = I9 : E → E + T.
                       T → T.* F

GOTO ( I6 , F ) = I3 ,  GOTO ( I6 , ( ) = I4 ,  GOTO ( I6 , id ) = I5

GOTO ( I7 , F ) = I10 : T → T * F.

GOTO ( I7 , ( ) = I4 ,  GOTO ( I7 , id ) = I5

GOTO ( I8 , ) ) = I11 : F → ( E ).

GOTO ( I8 , + ) = I6
FOLLOW (E) = { $ , ) , + }
FOLLOW (T) = { $ , + , ) , * }
FOLLOW (F) = { $ , + , ) , * }
        ACTION                                          GOTO
        id      +       *       (       )       $       E       T       F
I0      s5                      s4                      1       2       3
I1              s6                              acc
I2              r2      s7              r2      r2
I3              r4      r4              r4      r4
I4      s5                      s4                      8       2       3
I5              r6      r6              r6      r6
I6      s5                      s4                              9       3
I7      s5                      s4                                      10
I8              s6                      s11
I9              r1      s7              r1      r1
I10             r3      r3              r3      r3
I11             r5      r5              r5      r5
Stack implementation:
STACK INPUT ACTION
0 id + id * id $ GOTO ( I0 , id ) = s5 ;shift
0F3 + id * id $ GOTO ( I0 , F ) = 3
GOTO ( I3 , + ) = r4 ;reduceby T → F
0T2 + id * id $ GOTO ( I0 , T ) = 2
GOTO ( I2 , + ) = r2 ;reduceby E → T
0E1 + id * id $ GOTO ( I0 , E ) = 1
GOTO ( I1 , + ) = s6 ;shift
0 E 1 + 6 id 5 * id $ GOTO ( I5 , * ) = r6 ;reduceby F → id
0E1+6F3 * id $ GOTO ( I6 , F ) = 3
GOTO ( I3 , * ) = r4 ;reduceby T → F
0E1+6T9 * id $ GOTO ( I6 , T ) = 9
GOTO ( I9 , * ) = s7 ;shift
0 E 1 + 6 T 9 * 7 id 5 $ GOTO ( I5 , $ ) = r6 ;reduceby F → id
0 E 1 + 6 T 9 * 7 F 10 $ GOTO ( I7 , F ) = 10
GOTO ( I10 , $ ) = r3 ;reduceby T → T * F
0E1+6T9 $ GOTO ( I6 , T ) = 9
GOTO ( I9 , $ ) = r1 ;reduceby E → E + T
0E1 $ GOTO ( I0 , E ) = 1
GOTO ( I1 , $ ) =accept
LR(0) Items
An LR(0) item of a grammar G is a production of G with a dot at some position of the body.
(eg.)
A ---> •XYZ
A ---> X•YZ
A ---> XY•Z
A ---> XYZ•
One collection of sets of LR(0) items, called the canonical LR(0) collection, provides a finite automaton that is used to make parsing decisions. Such an automaton is called an LR(0) automaton.
Error handling routines are used to restart the parser to continue its process even after the
occurrence of error.
• Tokens following the error get discarded to restart the parser.
• The YACC command uses a special token name error, for error handling. The token is placed
at places where error might occur so that it provides a recovery subroutine.
• To prevent subsequent occurrence of errors, the parser remains in error state until it
processes three tokens following an error.
• The input is discarded and no message is produced, if an error occurred while the
parser remains in error state.
(eg.) stat : error ‘;’
• The above rule tells the parser that when there is an error, it should ignore the
token and all following tokens until it finds the next semicolon.
• It discards all the tokens after the error and before the next semicolon.
• Once semicolon is found, the rule is reduced by parser and cleanup action
associated with that rule will be performed.
Providing for error correction
The input errors can be corrected by entering a line in the data stream again.
input : error ‘\n’
{
printf (“Reenter last line:”);
}
input
{
$$ = $4;
};
The YACC statement, yyerrok is used to indicate that error recovery is complete.
This statement leaves the error state and begins processing normally.
input : error ‘\n’
{
yyerrok;
printf (“Reenter last line:”);
}
input
{
$$ = $4;
};
Clearing the Lookahead token
• When an error occurs, the lookahead token becomes the token at which the error was
detected.
• The lookahead token must be changed if the error recovery action includes code to find the
correct place to start processing again.
• To clear the lookahead token, the error-recovery action issues the following statement:
yyclearin;
To assist in error handling, macros can be placed in YACC actions.
Macros for error handling
YACC
YACC in compiler design, also known as Yet Another Compiler Compiler, is used to produce the source code of the syntactic analyzer of a language; it is an LALR(1) (Look-Ahead Left-to-Right) parser generator. As input, a parser generator takes a syntax specification and produces as output a procedure for recognizing that language.
Stephen C. Johnson developed YACC in the early 1970s. Initially YACC was written in the B programming language and was soon rewritten in C.
The parts of YACC program are divided into three sections:
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
Definitions: these include the header files and any token information used in the syntax. They are located at the top of the input file. Here, the tokens are defined using the percent sign (%). In YACC, numeric codes are automatically assigned to tokens.
examples:
%token ID
%{ #include <stdio.h> %}
Rules: The rules are defined between %% and %%. These rules define the actions for when the token is scanned
and are executed when a token matches the grammar.
Auxiliary Routines: Auxiliary routines contain the function required in the rules section. This Auxiliary section
includes the main() function, where the yyparse() function is always called.
This yyparse() function plays the role of reading the token, performing actions and then returning to the main() after
the execution or in the case of an error.
Workings of YACC
YACC in compiler design is set to work in C programming language along with its parser generator.
If called with the –d option in the command line, YACC produces y.tab.h with all its specific definitions.
Example of YACC Program
%{
#include <ctype.h>
#include <stdio.h>
int yylex();
void yyerror();
int tmp=0;
%}
%token num
%left '+'
%%
E : E '+' E { $$ = $1 + $3; }   /* assumed rule head; only the "num" alternative survived in the listing */
| num {$$=$1;};
%%
void yyerror(){
printf("Incorrect\n");
tmp=1;
}
int main(){
yyparse();
if(tmp == 0)
printf("Correct\n");            /* assumption: tmp flags whether the input parsed correctly */
return 0;
}
%{
#include <stdlib.h>
#include <stdio.h>
#include "y.tab.h"
%}
%%
[ \t] ;
[\n] return 0;
[0-9]+ { yylval = atoi(yytext); return num; }   /* assumed pattern; the number rule was truncated */
. return yytext[0];
%%
UNIT III SYNTAX DIRECTED TRANSLATION &
INTERMEDIATE CODE GENERATION
ARRAY REFERENCES IN ARITHMETIC EXPRESSIONS
TYPE CHECKING
BACKPATCHING
Backpatching is a technique used in compiler design to update previously generated code with the correct
target addresses or labels. It helps establish the connections between control flow constructs, such as
conditionals and loops, by setting the appropriate target addresses during code generation.
Backpatching can be used to generate code for boolean expressions and flow-of-control statements in a single pass. Labels for the jumping code of a boolean expression B are handled through two synthesized attributes of the non-terminal B: B.truelist and B.falselist. B.truelist is the list of jump or conditional-jump instructions into which the label to which control goes when B is true must eventually be inserted; B.falselist is the list of instructions that eventually receive the label to which control goes when B is false. When the code for B is generated, the jumps to the true and false exits are left with empty label fields; these incomplete jumps are recorded in B.truelist and B.falselist respectively.
• Makelist (i): This function creates a new list that contains only the index i, which
corresponds to an instruction in the array. Additionally, the function returns a pointer to the
newly generated list.
• Merge (p1, p2): The Merge function concatenates the lists pointed to by p1 and p2,
resulting in a single list that combines the jumps from both lists. The function then returns a
pointer to the concatenated list.
• Backpatch (p, i): The Backpatch function is responsible for inserting the index i as the
target label for each instruction on the list pointed to by p. This operation ensures that the
jumps in the list are redirected to the appropriate target location represented by the index i.
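A minimal C sketch of these three list-handling functions, assuming the unfilled label slots of generated jump instructions live in an array jump_target[] (the struct layout and names are illustrative, not from the notes):

#include <stdio.h>
#include <stdlib.h>

/* A node in a list of indices of incomplete jump instructions. */
struct listnode {
    int instr;                   /* index of an instruction in the code array */
    struct listnode *next;
};

int jump_target[1000];           /* label slot of each generated instruction (assumption) */

/* makelist(i): create a new list containing only the instruction index i. */
struct listnode *makelist(int i) {
    struct listnode *p = malloc(sizeof *p);
    p->instr = i;
    p->next = NULL;
    return p;
}

/* merge(p1, p2): concatenate the two lists and return the combined list. */
struct listnode *merge(struct listnode *p1, struct listnode *p2) {
    if (!p1) return p2;
    struct listnode *q = p1;
    while (q->next) q = q->next;
    q->next = p2;
    return p1;
}

/* backpatch(p, i): fill in i as the target label of every instruction on list p. */
void backpatch(struct listnode *p, int i) {
    for (; p; p = p->next)
        jump_target[p->instr] = i;
}

int main(void) {
    struct listnode *t = merge(makelist(100), makelist(104));
    backpatch(t, 200);                                       /* both jumps now target instruction 200 */
    printf("%d %d\n", jump_target[100], jump_target[104]);   /* prints: 200 200 */
    return 0;
}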
By employing a translation technique, code generation for Boolean expressions can be
achieved through bottom-up parsing. In grammar, a non-terminal symbol M triggers a
semantic action that retrieves the index of the subsequent instruction to be generated at the
appropriate moment.
To illustrate this, let’s consider the concept of backpatching using boolean expressions and
a production rules table. Backpatching involves updating previously generated code with the
correct target addresses or labels.
Step 2: We have to find the TAC (three-address code) for the given expression using backpatching:
A < B OR C < D AND P < Q
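One possible sequence of generated instructions for this expression is sketched below as a worked example (instruction numbers starting at 100 are illustrative; the blanks are the label fields later filled in by backpatch):

100: if A < B goto ___
101: goto 102
102: if C < D goto 104
103: goto ___
104: if P < Q goto ___
105: goto ___

Here the OR uses the marker value M.quad = 102 to backpatch the falselist of A < B (instruction 101) to 102, and the AND uses M.quad = 104 to backpatch the truelist of C < D (instruction 102) to 104. The remaining lists, the truelist {100, 104} and the falselist {103, 105}, are filled in when the enclosing statement's true and false targets become known.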
Flow-of-Control Statements:
(1) S → if E then S
(2)   | if E then S else S
(3)   | while E do S
(4)   | begin L end
(5)   | A
(6) L → L ; S
(7)   | S

(1) S → if E then M1 S1 N else M2 S2
{ backpatch(E.truelist, M1.quad);
  backpatch(E.falselist, M2.quad);
  S.nextlist := merge(S1.nextlist, merge(N.nextlist, S2.nextlist)) }
(2) N → ɛ { N.nextlist := makelist(nextquad);
            emit(‘goto _’) }
(3) M → ɛ { M.quad := nextquad }
(6) S → begin L end { S.nextlist := L.nextlist }
(7) S → A { S.nextlist := nil }
(8) L → L1 ; M S { backpatch(L1.nextlist, M.quad);
                   L.nextlist := S.nextlist }
The statement following L1 in order of execution is the beginning of S. Thus the list L1.nextlist is backpatched to the beginning of the code for S, which is given by M.quad.
(9) L → S { L.nextlist := S.nextlist }
UNIT IV RUN-TIME ENVIRONMENT AND CODE
GENERATION
INTERMEDIATE CODE GENERATION
INTRODUCTION
The front end translates a source program into an intermediate representation from which the back end generates target code. Benefits of using a machine-independent intermediate form are:
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.
INTERMEDIATE LANGUAGES
• Syntax tree
• Postfix notation
The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.
Graphical Representations:
Syntax tree:
A syntax tree depicts the natural hierarchical structure of a source program. A dag (Directed Acyclic Graph) gives the same information but in a more compact way because common subexpressions are identified. A syntax tree and dag for the assignment statement a := b * -c + b * -c are as follows:
[Figure: syntax tree (left) and DAG (right) for a := b * -c + b * -c; in the DAG the common subexpression b * -c is shared.]
Postfix notation:
Postfix notation is a linearized representation of a syntax tree; for the assignment a := b * -c + b * -c it is: a b c uminus * b c uminus * + assign
Syntax-directed definition:
Syntax trees for assignment statements are produced by the syntax-directed definition. Non-terminal S generates an assignment statement. The two binary operators + and * are representatives of the full operator set in a typical language. Operator associativities and precedences are the usual ones, even though they have not been put into the grammar. This definition constructs the tree from the input a := b * -c + b * -c.
The token id has an attribute place that points to the symbol-table entry for the identifier. A symbol-table entry can be found from an attribute id.name, representing the lexeme associated with that occurrence of id. If the lexical analyzer holds all lexemes in a single array of characters, then attribute name might be the index of the first character of the lexeme.
Two representations of the syntax tree are as follows. In (a) each node is represented as a
record with a field for its operator and additional fields for pointers to its children. In (b), nodes
are allocated from an array of records and the index or position of the node serves as the pointer
to the node. All the nodes in the syntax tree can be visited by following pointers, starting from
the root at position 10.
(a) Each node is represented as a record with a field for its operator and additional fields for pointers to its children (tree form, shown in the original figure).
(b) Nodes allocated from an array of records; the index of a node serves as its pointer:
0   id      b
1   id      c
2   uminus  1
3   *       0   2
4   id      b
5   id      c
6   uminus  5
7   *       4   6
8   +       3   7
9   id      a
10  assign  9   8
Three-Address Code:
The general form of a three-address statement is x := y op z. For example, a source expression like x + y * z might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.
➢ The use of names for the intermediate values computed by a program allows three-
address code to be easily rearranged – unlike postfix notation.
Three-address code corresponding to the syntax tree and dag given above
(a) Code for the syntax tree:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
(b) Code for the dag:
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
6. param x and call p, n for procedure calls, and return y, where y (representing a returned value) is optional. For example,
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, …, xn).
When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. For example, id := E consists of code to evaluate E into some temporary t, followed by the assignment id.place := t.
Production        Semantic Rules
S → id := E       S.code := E.code || gen(id.place ':=' E.place)
E → E1 + E2       E.place := newtemp;
                  E.code := E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place)
E → E1 * E2       E.place := newtemp;
                  E.code := E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place)
E → - E1          E.place := newtemp;
                  E.code := E1.code || gen(E.place ':=' 'uminus' E1.place)
E → ( E1 )        E.place := E1.place;
                  E.code := E1.code
E → id            E.place := id.place;
                  E.code := ''
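The following is a minimal sketch in C (not from the text) of how the rules for E → E1 + E2, E → E1 * E2 and E → id could be realized by a recursive walk over a syntax tree. The node type Expr and the helpers newtemp and emit_expr are invented for the illustration.

/* Sketch only: emits one three-address statement per interior node,
 * mirroring E.place := newtemp and gen(E.place ':=' ...) above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Expr {            /* syntax-tree node for an expression   */
    char op;                     /* '+', '*', or 0 for an identifier     */
    const char *name;            /* identifier name when op == 0         */
    struct Expr *left, *right;
} Expr;

static int ntemps = 0;

static char *newtemp(void) {     /* invent a fresh temporary name        */
    char buf[16];
    sprintf(buf, "t%d", ++ntemps);
    return strdup(buf);
}

/* Returns E.place; prints E.code as a side effect. */
static char *emit_expr(const Expr *e) {
    if (e->op == 0)              /* E -> id: E.place = id.place, no code */
        return strdup(e->name);
    char *p1 = emit_expr(e->left);
    char *p2 = emit_expr(e->right);
    char *place = newtemp();     /* E.place := newtemp                   */
    printf("%s := %s %c %s\n", place, p1, e->op, p2);
    free(p1); free(p2);
    return place;
}

int main(void) {
    /* a + b * c, built by hand */
    Expr b = {0, "b"}, c = {0, "c"}, a = {0, "a"};
    Expr mul = {'*', NULL, &b, &c};
    Expr add = {'+', NULL, &a, &mul};
    free(emit_expr(&add));       /* prints: t1 := b * c, then t2 := a + t1 */
    return 0;
}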
Semantic rules generating code for a while statement lay the code out as follows:
S.begin:
    E.code      (jumps to S.after when E is false)
    S1.code
    goto S.begin
S.after: . . .
IMPLEMENTATION OF THREE-ADDRESS STATEMENTS
A three-address statement is an abstract form of intermediate code. In a compiler, these statements can be implemented as records with fields for the operator and the operands. Three such representations are:
➢ Quadruples
➢ Triples
➢ Indirect triples
Quadruples:
➢ A quadruple is a record structure with four fields, which are op, arg1, arg2 and result.
➢ The op field contains an internal code for the operator. The three-address statement x := y op z is represented by placing y in arg1, z in arg2 and x in result.
➢ The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.
Triples:
➢ To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it.
➢ If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2.
➢ The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure (for temporary values).
➢ Since three fields are used, this intermediate code format is known as triples. (Possible record layouts for both forms are sketched below.)
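A possible set of C record layouts for quadruples and triples is sketched below; the field and type names (SymEntry, Quad, Triple, TripleArg) are invented for the illustration, not prescribed by the text.

/* Declarations only: one record per three-address statement. */
struct SymEntry;                      /* symbol-table entry (opaque here) */

struct Quad {                         /* x := y op z  ->  (op, y, z, x)   */
    int op;                           /* internal operator code           */
    struct SymEntry *arg1, *arg2;     /* operands (temporaries must be    */
    struct SymEntry *result;          /*   entered into the symbol table) */
};

enum ArgKind { ARG_SYMBOL, ARG_TRIPLE };

struct TripleArg {                    /* operand of a triple              */
    enum ArgKind kind;
    union {
        struct SymEntry *sym;         /* pointer into the symbol table    */
        int triple;                   /* position of the defining triple  */
    } u;
};

struct Triple {                       /* no result field: the statement's */
    int op;                           /* own position names its value     */
    struct TripleArg arg1, arg2;
};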
A ternary operation like x[i] := y requires two entries in the triple structure, as shown below, while x := y[i] is naturally represented as two operations.
Indirect Triples:
➢ Another implementation lists pointers to triples rather than the triples themselves. For example, let us use an array to list pointers to triples in the desired order. Then the triples shown above might be represented as follows:
DECLARATIONS
Declarations in a Procedure:
The syntax of languages such as C, Pascal and Fortran allows all the declarations in a single procedure to be processed as a group. In this case, a global variable, say offset, can keep track of the next available relative address.
➢ Before the first declaration is considered, offset is set to 0. As each new name is seen, that name is entered in the symbol table with offset equal to the current value of offset, and offset is incremented by the width of the data object denoted by that name.
➢ The procedure enter(name, type, offset) creates a symbol-table entry for name, gives it type type and relative address offset in its data area.
➢ The width of an array is obtained by multiplying the width of each element by the number of elements in the array. The width of each pointer is assumed to be 4.
P → D        { offset := 0 }
D → D ; D
Keeping Track of Scope Information:
P → D
One possible implementation of a symbol table is a linked list of entries for names.
A new symbol table is created when a procedure declaration D → proc id ; D1 ; S is seen, and entries for the declarations in D1 are created in the new table. The new table points back to the symbol table of the enclosing procedure; the name represented by id itself is local to the enclosing procedure. The only change from the treatment of variable declarations is that the procedure enter is told which symbol table to make an entry in.
For example, consider the symbol tables for procedures readarray, exchange, and quicksort pointing back to that for the containing procedure sort, consisting of the entire program. Since partition is declared within quicksort, its table points to that of quicksort.
[Figure: nested symbol tables. The table for sort (header nil) has entries a, x, readarray, exchange and quicksort; the tables for readarray, exchange and quicksort point back to sort's table; quicksort's table contains partition, whose own table (with entries i and j) points back to quicksort's.]
The semantic rules are defined in terms of the following operations:
1. mktable(previous) creates a new symbol table and returns a pointer to the new table. The argument previous points to a previously created symbol table, presumably that for the enclosing procedure.
2. enter(table, name, type, offset) creates a new entry for name name in the symbol table pointed to by table. Again, enter places type type and relative address offset in fields within the entry.
3. addwidth(table, width) records the cumulative width of all the entries in table in the header associated with this symbol table.
4. enterproc(table, name, newtable) creates a new entry for procedure name in the symbol table pointed to by table. The argument newtable points to the symbol table for this procedure name.
M → ɛ        { t := mktable(nil);
               push(t, tblptr); push(0, offset) }
D → D1 ; D2
➢ The top element of stack offset is the next available relative address for a local of the current procedure.
➢ In a production A → B C {action A}, all the actions in the subtrees for B and C are done before action A at the end of the production occurs. Hence, the action associated with the marker M is the first to be done.
➢ The action for nonterminal M initializes stack tblptr with a symbol table for the outermost scope, created by operation mktable(nil). The action also pushes relative address 0 onto stack offset.
➢ For each variable declaration id : T, an entry is created for id in the current symbol table. The top of stack offset is incremented by T.width.
➢ When the action on the right side of D → proc id ; N D1 ; S occurs, the width of all declarations generated by D1 is on the top of stack offset; it is recorded using addwidth. Stacks tblptr and offset are then popped.
At this point, the name of the enclosed procedure is entered into the symbol table of its enclosing procedure.
ASSIGNMENT STATEMENTS
Suppose that the context in which an assignment appears is given by the following grammar.
P→M D
M→ɛ
N→ɛ
Nonterminal P becomes the new start symbol when these productions are added to those in the
translation scheme shown below.
E → id        { p := lookup(id.name);
                if p ≠ nil then
                  E.place := p
                else error }
➢ Temporaries can be reused by changing newtemp. The code generated by the rules for E → E1 + E2 has the general form:
evaluate E1 into t1
evaluate E2 into t2
t := t1 + t2
➢ The lifetimes of these temporaries are nested like matching pairs of balanced parentheses.
statement            value of c (after the statement)
(initially)          0
$0 := a * b          1
$1 := c * d          2
$0 := $0 + $1        1
$1 := e * f          2
$0 := $0 - $1        1
x := $0              0
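A minimal sketch in C of this counter-based reuse of temporaries follows; the names newtemp, freetemp and the $i naming convention mirror the table above but are otherwise invented for the example.

#include <stdio.h>

static int c = 0;                     /* number of temporaries in use     */

static void newtemp(char *buf) { sprintf(buf, "$%d", c++); }
static void freetemp(void)     { --c; }

int main(void) {
    /* x := (a*b + c*d) - e*f, producing the sequence from the table */
    char t1[8], t2[8];
    newtemp(t1); printf("%s := a * b\n", t1);           /* c = 1 */
    newtemp(t2); printf("%s := c * d\n", t2);           /* c = 2 */
    printf("%s := %s + %s\n", t1, t1, t2); freetemp();  /* c = 1 */
    newtemp(t2); printf("%s := e * f\n", t2);           /* c = 2 */
    printf("%s := %s - %s\n", t1, t1, t2); freetemp();  /* c = 1 */
    printf("x := %s\n", t1); freetemp();                /* c = 0 */
    return 0;
}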
Elements of an array can be accessed quickly if the elements are stored in a block of consecutive locations. If the width of each array element is w, then the ith element of array A begins in location
base + (i - low) x w
where low is the lower bound on the subscript and base is the relative address of the storage allocated for the array. That is, base is the relative address of A[low].
The expression can be partially evaluated at compile time if it is rewritten as
i x w + (base - low x w)
The subexpression c = base - low x w can be evaluated when the declaration of the array is seen. We assume that c is saved in the symbol table entry for A, so the relative address of A[i] is obtained by simply adding i x w to c.
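A small worked example of the two equivalent forms of the address computation; the values of base, low and w are invented for the illustration.

#include <stdio.h>

int main(void) {
    int base = 100, low = 1, w = 4;       /* A : array[1..10] of integer  */
    int i = 7;

    /* direct form: base + (i - low) * w                                  */
    int addr1 = base + (i - low) * w;

    /* precomputed form: i*w + c, where c = base - low*w is computed once
     * when the declaration of A is seen and saved with A's symbol entry  */
    int c = base - low * w;
    int addr2 = i * w + c;

    printf("A[%d] is at %d (direct) = %d (using c)\n", i, addr1, addr2);
    return 0;
}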
➢ Row-major (row-by-row)
➢ Column-major (column-by-column)
[Figure: layouts of a 2 x 3 array A. Row-major order stores the first row A[1,1], A[1,2], A[1,3] and then the second row A[2,1], A[2,2], A[2,3]; column-major order stores the first column A[1,1], A[2,1], then the second column, and so on.]
In the case of row-major form, the relative address of A[i1, i2] can be calculated by the formula
base + ((i1 - low1) x n2 + (i2 - low2)) x w
where low1 and low2 are the lower bounds on the values of i1 and i2, and n2 is the number of values that i2 can take. That is, if high2 is the upper bound on the value of i2, then n2 = high2 - low2 + 1.
Assuming that i1 and i2 are the only values that are not known at compile time, we can rewrite the above expression as
((i1 x n2) + i2) x w + (base - ((low1 x n2) + low2) x w)
The Translation Scheme for Addressing Array Elements:
(1) S → L := E
(2) E → E + E
(3) E → ( E )
(4) E → L
(5) L → Elist ]
(6) L → id
(7) Elist → Elist , E
(8) Elist → id [ E
We generate a normal assignment if L is a simple name, and an indexed assignment into the location denoted by L otherwise.
When an array reference L is reduced to E, we want the r-value of L. Therefore we use indexing to obtain the contents of the location L.place[L.offset].
For production (7), Elist → Elist1 , E, the semantic action ends with
{ … ; Elist.place := t;
      Elist.ndim := m }
and for production (8), Elist → id [ E, it is
{ Elist.place := E.place;
  Elist.ndim := 1 }
Consider the grammar for assignment statements as above, but suppose there are two types – real and integer, with integers converted to reals when necessary. We have another attribute E.type, whose value is either real or integer. The semantic rule for E.type associated with the production E → E1 + E2 is:
E → E1 + E2    { E.type :=
                   if E1.type = integer and
                      E2.type = integer then integer
                   else real }
The entire semantic rule for E → E1 + E2 and most of the other productions must be modified to generate, when necessary, three-address statements of the form x := inttoreal y, whose effect is to convert integer y to a real of equal value, called x.
E.place := newtemp;
if E1.type = integer and E2.type = integer then begin
    emit(E.place ':=' E1.place 'int +' E2.place);
    E.type := integer
end
else if E1.type = real and E2.type = real then begin
    emit(E.place ':=' E1.place 'real +' E2.place);
    E.type := real
end
else if E1.type = integer and E2.type = real then begin
    u := newtemp;
    emit(u ':=' 'inttoreal' E1.place);
    emit(E.place ':=' u 'real +' E2.place);
    E.type := real
end
else if E1.type = real and E2.type = integer then begin
    u := newtemp;
    emit(u ':=' 'inttoreal' E2.place);
    emit(E.place ':=' E1.place 'real +' u);
    E.type := real
end
else
    E.type := type_error;
For example, for the input x := y + i * j, assuming x and y have type real, and i and j have type integer, the output would look like:
t1 := i int* j
t3 := inttoreal t1
t2 := y real+ t3
x := t2
BOOLEAN EXPRESSIONS
Boolean expressions have two primary purposes. They are used to compute logical
values, but more often they are used as conditional expressions in statements that alter the flow
of control, such as if-then-else, or while-do statements.
Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions. Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions.
There are two principal methods of representing the value of a boolean expression. They are:
➢ To encode true and false numerically and evaluate a boolean expression analogously to an arithmetic expression.
➢ To implement boolean expressions by flow of control, that is, representing the value of a boolean expression by a position reached in a program. This method is particularly convenient in implementing the boolean expressions in flow-of-control statements, such as the if-then and while-do statements.
Numerical Representation
Here, 1 denotes true and 0 denotes false. Expressions will be evaluated completely from left to right, in a manner similar to arithmetic expressions.
For example, a or b and not c may be translated into the three-address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
➢ A relational expression such as a < b is equivalent to the conditional statement
if a < b then 1 else 0
which can be translated into the three-address code sequence (again, we arbitrarily start statement numbers at 100):
100: if a < b goto 103
101: t := 0
102: goto 104
103: t := 1
104: ...
Short-Circuit Code:
We can also translate a boolean expression into three-address code without generating code for any of the boolean operators and without having the code necessarily evaluate the entire expression. This style of evaluation is sometimes called "short-circuit" or "jumping" code. It is possible to evaluate boolean expressions without generating code for the boolean operators and, or, and not if we represent the value of an expression by a position in the code sequence.
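As a small illustration (not taken from the text), C's own && and || operators are defined to short-circuit, so the second operand of the condition below is only evaluated when the first does not already decide the result; the comments sketch the kind of jumping code a compiler might emit, using invented labels.

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 5, d = 3, x = 0;

    if (a < b || c < d)        /*      if a < b goto Ltrue                */
        x = 1;                 /*      if c < d goto Ltrue                */
                               /*      goto Lfalse                        */
                               /* Ltrue:  x := 1                          */
                               /* Lfalse: ...                             */
    printf("x = %d\n", x);
    return 0;
}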
Flow-of-Control Statements
We now consider the translation of boolean expressions into three-address code in the
context of if-then, if-then-else, and while-do statements such as those generated by the following
grammar:
S → if E then S1
  | if E then S1 else S2
  | while E do S1
➢ E.true is the label to which control flows if E is true, and E.false is the label to which
control flows if E is false.
➢ The semantic rules for translating a flow-of-control statement S allow control to flow
from the translation S.code to the three-address instruction immediately following
S.code.
➢ S.next is a label that is attached to the first three-address instruction to be executed after
the code for S.
(a) if-then:
         E.code   (jumps to E.true / E.false)
E.true:  S1.code
E.false: ...
(b) if-then-else:
         E.code   (jumps to E.true / E.false)
E.true:  S1.code
         goto S.next
E.false: S2.code
S.next:  ...
(c) while-do:
S.begin: E.code   (jumps to E.true / E.false)
E.true:  S1.code
         goto S.begin
E.false: ...
Syntax-directed definition for flow-of-control statements
For example, for E → ( E1 ) the rules simply pass the inherited labels through:
E1.true := E.true;
E1.false := E.false;
E.code := E1.code
CASE STATEMENTS
switch expression
begin
    case value : statement
    case value : statement
    ...
    case value : statement
    default : statement
end
When the case values lie in a small range imin to imax, an efficient implementation is a table of labels, with the label of the statement for value j in the entry of the table with offset j - imin, and the label for the default in entries not filled otherwise. To perform the switch, evaluate the expression to obtain the value j, check that the value is within range, and transfer to the table entry at offset j - imin.
switch E
begin
    case V1 : S1
    case V2 : S2
    ...
    case Vn-1 : Sn-1
    default : Sn
end
This case statement is translated into intermediate code that has the following form:
code to evaluate E into t
goto test
L1:   code for S1
      goto next
L2:   code for S2
      goto next
. . .
Ln-1: code for Sn-1
      goto next
Ln:   code for Sn
      goto next
test: if t = V1 goto L1
      if t = V2 goto L2
      . . .
      if t = Vn-1 goto Ln-1
      goto Ln
next:
➢ As each case keyword occurs, a new label Li is created and entered into the symbol table. A pointer to this symbol-table entry and the value Vi of the case constant are placed on a stack (used only to store cases).
➢ Each statement case Vi : Si is processed by emitting the newly created label Li, followed by the code for Si, followed by the jump goto next.
➢ Then, when the keyword end terminating the body of the switch is found, the code can be generated for the n-way branch. Reading the pointer-value pairs on the case stack from the bottom to the top, we can generate a sequence of three-address statements of the form
case V1 L1
case V2 L2
...
case Vn-1 Ln-1
case t Ln
label next
where t is the name holding the value of the selector expression E, and Ln is the label for the default statement.
CODE GENERATION
The final phase in the compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. The code generation techniques presented below can be used whether or not an optimizing phase occurs before code generation.
[Figure: position of the code generator — front end → intermediate code → (code optimizer) → code generator → target program, all consulting the symbol table.]
ISSUES IN THE DESIGN OF A CODE GENERATOR
1. Input to the code generator:
• Prior to code generation, the source program must have been scanned, parsed and translated into an intermediate representation, along with the necessary type checking. Therefore, input to code generation is assumed to be error-free.
2. Target program:
• The output of the code generator is the target program. The output may be:
a. Absolute machine language – it can be placed in a fixed memory location and can be executed immediately.
b. Relocatable machine language – it allows subprograms to be compiled separately.
c. Assembly language – code generation is made easier.
3. Memory management:
• Names in the source program are mapped to addresses of data objects in run-time
memory by the front end and code generator.
• It makes use of symbol table, that is, a name in a three-address statement refers to a
symbol-table entry for the name.
4. Instruction selection:
• The instructions of the target machine should be complete and uniform.
• Instruction speeds and machine idioms are important factors when efficiency of the target program is considered.
• The quality of the generated code is determined by its speed and size.
• For example, every three-address statement of the form x := y + z, where x, y and z are statically allocated, can be translated into the sequence
MOV y, R0
ADD z, R0
MOV R0, x
5. Register allocation
• Instructions involving register operands are shorter and faster than those involving operands in memory.
• The use of registers is often subdivided into two subproblems:
➢ Register allocation – the set of variables that will reside in registers at each point in the program is selected.
➢ Register assignment – the specific register that a variable will reside in is picked.
6. Evaluation order
• The order in which the computations are performed can affect the efficiency of the
target code. Some computation orders require fewer registers to hold intermediate
results than others.
TARGET MACHINE
• Familiarity with the target machine and its instruction set is a prerequisite for designing a
good code generator.
• The target computer is a byte-addressable machine with 4 bytes to a word.
• It has n general-purpose registers, R0, R1, . . . , Rn-1.
• It has two-address instructions of the form:
op source, destination
where op is an op-code, and source and destination are data fields.
• The address modes, with their assembly forms and added costs, are:
MODE              FORM     ADDRESS                     ADDED COST
absolute          M        M                           1
register          R        R                           0
indexed           c(R)     c + contents(R)             1
indirect indexed  *c(R)    contents(c + contents(R))   1
literal           #c       c                           1
• For example: MOV R0, M stores the contents of register R0 into memory location M; MOV 4(R0), M stores the value contents(4 + contents(R0)) into M.
Instruction costs :
• Instruction cost = 1+cost for source and destination address modes. This cost corresponds
to the length of the instruction.
• Address modes involving registers have cost zero.
• Address modes involving memory location or literal have cost one.
• Instruction length should be minimized if space is important. Doing so also minimizes the
time taken to fetch and perform the instruction.
For example : MOV R0, R1 copies the contents of register R0 into R1. It has cost one,
since it occupies only one word of memory.
• The three-address statement a := b + c can be implemented by many different instruction sequences:
i) MOV b, R0
ADD c, R0 cost = 6
MOV R0, a
ii) MOV b, a
ADD c, a cost = 6
• In order to generate good code for target machine, we must utilize its addressing
capabilities efficiently.
Static allocation
A call to a procedure is implemented by the sequence
MOV #here + 20, callee.static_area    /* saves the return address */
GOTO callee.code_area                 /* transfers control to the target code for the called procedure */
where,
callee.static_area – address of the activation record
callee.code_area – address of the first instruction for the called procedure
#here + 20 – literal return address, which is the address of the instruction following the GOTO.
A return from the callee is implemented by
GOTO *callee.static_area
This transfers control to the address saved at the beginning of the activation record.
The statement HALT is the final instruction that returns control to the operating system.
Stack allocation
Static allocation can become stack allocation by using relative addresses for storage in
activation records. In stack allocation, the position of activation record is stored in register so
words in activation records can be accessed as offsets from the value in this register.
Initialization of stack:
MOV #stackstart, SP      /* initialize the stack pointer */
code for the first procedure
HALT                     /* terminate execution */
A procedure call sequence increments SP, saves the return address and transfers control:
ADD #caller.recordsize, SP
MOV #here + 16, *SP      /* save the return address */
GOTO callee.code_area
where,
caller.recordsize – size of the caller's activation record
#here + 16 – address of the instruction following the GOTO.
Basic Blocks
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end. The following algorithm partitions a sequence of three-address statements into basic blocks.
Input: A sequence of three-address statements
Output: A list of basic blocks with each three-address statement in exactly one block
Method:
1. We first determine the set of leaders, the first statements of basic blocks. The rules we use are the following:
a. The first statement is a leader.
b. Any statement that is the target of a conditional or unconditional goto is a leader.
c. Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.
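The following is a minimal sketch in C of rule 1 (marking leaders); the Stmt representation, with one optional jump target per statement, is an assumption made for the sketch rather than anything prescribed by the algorithm.

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool is_jump;      /* conditional or unconditional goto               */
    int  target;       /* index of the statement jumped to (if is_jump)   */
} Stmt;

static void find_leaders(const Stmt *s, int n, bool *leader) {
    for (int i = 0; i < n; i++) leader[i] = false;
    if (n > 0) leader[0] = true;                 /* rule (a)              */
    for (int i = 0; i < n; i++) {
        if (s[i].is_jump) {
            leader[s[i].target] = true;          /* rule (b)              */
            if (i + 1 < n) leader[i + 1] = true; /* rule (c)              */
        }
    }
}

int main(void) {
    /* statements 0..3, with a conditional jump from 3 back to 1 */
    Stmt code[4] = { {false,0}, {false,0}, {false,0}, {true,1} };
    bool leader[4];
    find_leaders(code, 4, leader);
    for (int i = 0; i < 4; i++)
        printf("statement %d%s\n", i, leader[i] ? " is a leader" : "");
    return 0;
}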
• Consider the following source code for the dot product of two vectors a and b of length 20:
begin
    prod := 0;
    i := 1;
    do begin
        prod := prod + a[i] * b[i];
        i := i + 1;
    end
    while i <= 20
end
• The corresponding three-address code is:
(1)  prod := 0
(2)  i := 1
(3)  t1 := 4 * i
(4)  t2 := a[t1]
(5)  t3 := 4 * i
(6)  t4 := b[t3]
(7)  t5 := t2 * t4
(8)  t6 := prod + t5
(9)  prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)
Statement (1) is a leader by rule (a) and statement (3) is a leader by rule (b), so statements (1)-(2) form one basic block and statements (3)-(12) form another.
A number of transformations can be applied to a basic block without changing the set of expressions computed by the block. Two important classes of transformation are:
• Structure-preserving transformations
• Algebraic transformations
1. Structure-preserving transformations:
a) Common sub-expression elimination:
a := b + c          a := b + c
b := a - d          b := a - d
c := b + c          c := b + c
d := a - d          d := b
Since the second and fourth statements compute the same expression, the basic block can be transformed as above.
b) Dead-code elimination:
Suppose x is dead, that is, never subsequently used, at the point where the statement x := y + z appears in a basic block. Then this statement may be safely removed without changing the value of the basic block.
c) Renaming temporary variables:
A statement t := b + c, where t is a temporary, can be changed to u := b + c, where u is a new temporary, with all uses of this instance of t changed to u, without changing the value of the basic block.
d) Interchange of statements:
Suppose a block has the two adjacent statements
t1 := b + c
t2 := x + y
We can interchange the two statements without affecting the value of the block if and only if neither x nor y is t1 and neither b nor c is t2.
2. Algebraic transformations:
Algebraic transformations can be used to change the set of expressions computed by a basic
block into an algebraically equivalent set.
Examples:
i) x := x + 0 or x := x * 1 can be eliminated from a basic block without changing the set of expressions it computes.
Flow Graphs
• Flow graph is a directed graph containing the flow-of-control information for the set of
basic blocks making up a program.
• The nodes of the flow graph are basic blocks. It has a distinguished initial node.
• E.g.: Flow graph for the vector dot product is given as follows:
prod : = 0 B1
i:=1
t1 : = 4 * i
t2 : = a [ t1 ]
t3 : = 4 * i
B2
t4 : = b [ t3 ]
t5 : = t2 * t4
t6 : = prod + t5
prod : = t6
t7 : = i + 1
i : = t7
if i <= 20 goto B2
Loops
A loop is a collection of nodes in a flow graph such that all nodes in the collection are strongly connected and the collection has a unique entry, so the only way to reach a node of the loop from a node outside it is through the entry. In the flow graph above, B2 by itself forms a loop.
NEXT-USE INFORMATION
• If the name in a register is no longer needed, then we remove the name from the register and the register can be used to store some other name.
Input: Basic block B of three-address statements
For each statement x := y op z in B, next-use information is attached to y and z; for example, the symbol table might record:
y – live, next use in statement i
z – live, next use in statement i
• A code generator generates target code for a sequence of three-address statements and effectively uses registers to store operands of the statements.
• For example, for a := b + c, if register Ri contains b and register Rj contains c, the generator can emit the single instruction
ADD Rj, Ri
leaving the result in Ri; otherwise a load of one operand precedes the addition.
• A register descriptor is used to keep track of what is currently in each register. The register descriptors show that initially all the registers are empty.
• An address descriptor stores the location where the current value of the name can be found at run time.
A code-generation algorithm:
The algorithm takes as input a sequence of three-address statements constituting a basic block. For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. Prefer the register for y' if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is a current location of z (again preferring a register). Update the address descriptor of x to indicate that x is in location L; if L is a register, update its descriptor to indicate that it contains the value of x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
• The assignment d : = (a-b) + (a-c) + (a-c) might be translated into the following three-
address code sequence:
t:=a–b
u:=a–c
v:=t+u
d:=v+u
with d live at the end.
[Table: the code sequence generated for these statements, together with the register and address descriptors after each statement; initially all registers are empty.]
Generating Code for Indexed Assignments
The table shows the code sequences generated for the indexed assignment statements a := b[i] and a[i] := b.
The code sequences generated for the pointer assignments a := *p and *p := a (with p in register Rp) are:
Statement    Code          Cost
a := *p      MOV *Rp, a    2
*p := a      MOV a, *Rp    2
Conditional jumps are implemented by computing the value and branching on the condition code, for example:
Statement               Code
x := y + z              MOV y, R0
if x < 0 goto z         ADD z, R0
                        MOV R0, x
                        CJ< z
THE DAG REPRESENTATION OF BASIC BLOCKS
• A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels to store the computed values.
• DAGs are useful data structures for implementing transformations on basic blocks.
• It gives a picture of how the value computed by a statement is used in subsequent statements.
• It provides a good way of determining common sub-expressions.
Algorithm for construction of DAG
Input: A basic block
Output: A DAG for the basic block containing the following information:
1. A label for each node. For leaves, the label is an identifier. For interior nodes, an operator symbol.
2. For each node, a list of attached identifiers to hold the computed values.
The statements of the block have one of three forms:
Case (i) x := y OP z
Case (ii) x := OP y
Case (iii) x := y
Method:
Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In case (i), if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
Step 2: For case (i), determine whether there is a node labeled OP whose left child is node(y) and right child is node(z); if not, create such a node. Let n be the node found or created. For case (ii), determine whether there is a node labeled OP with the single child node(y); if not, create such a node n. For case (iii), let n be node(y).
Step 3: Delete x from the list of attached identifiers for node(x), if it is defined. Append x to the list of attached identifiers for the node n found in Step 2 and set node(x) to n.
1. t1 := 4* i
2. t2 := a[t1]
3. t3 := 4* i
4. t4 := b[t3]
5. t5 := t2*t4
6. t6 := prod+t5
7. prod := t6
8. t7 := i+1
9. i := t7
10. if i<=20 goto (1)
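The heart of Step 2 is reusing an existing node for a repeated (op, left, right) combination, which is what makes the two occurrences of 4 * i above share one node. A minimal sketch in C follows; the fixed-size node table and the function find_or_add are assumptions made for the example.

#include <stdio.h>
#include <string.h>

#define MAXN 64

typedef struct {
    char op[8];          /* operator, or the lexeme for a leaf            */
    int  left, right;    /* child indices, -1 for leaves                  */
} Node;

static Node nodes[MAXN];
static int nnodes = 0;

static int find_or_add(const char *op, int l, int r) {
    for (int i = 0; i < nnodes; i++)
        if (strcmp(nodes[i].op, op) == 0 &&
            nodes[i].left == l && nodes[i].right == r)
            return i;                      /* reuse the existing node     */
    strcpy(nodes[nnodes].op, op);
    nodes[nnodes].left = l;
    nodes[nnodes].right = r;
    return nnodes++;
}

int main(void) {
    int four = find_or_add("4", -1, -1);
    int i    = find_or_add("i", -1, -1);
    int t1   = find_or_add("*", four, i);   /* t1 := 4 * i                */
    int t3   = find_or_add("*", four, i);   /* t3 := 4 * i -> same node   */
    printf("t1 and t3 share node %d = %d\n", t1, t3);
    return 0;
}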
Stages in DAG Construction
[Figures: the DAG for the block above built up one statement at a time, with common subexpressions such as 4 * i shared.]
Application of DAGs:
• We can automatically detect common sub-expressions.
• We can determine which identifiers have their values used in the block.
• We can determine which statements compute values that could be used outside the block.
GENERATING CODE FROM DAGs
The advantage of generating code for a basic block from its DAG representation is that, from a DAG, we can more easily see how to rearrange the order of the final computation sequence than we can when starting from a linear sequence of three-address statements or quadruples.
Generating code for the statements in their original order
t1 := a + b
t2 := c + d
t3 := e - t2
t4 := t1 - t3
(assuming two registers R0 and R1, with only t4 live on exit) gives:
MOV a, R0
ADD b, R0
MOV c, R1
ADD d, R1
MOV R0, t1
MOV e, R0
SUB R1, R0
MOV t1, R1
SUB R0, R1
MOV R1, t4
Rearranging the statements so that the computation of t1 occurs immediately before it is used,
t2 := c + d
t3 := e - t2
t1 := a + b
t4 := t1 - t3
we obtain the shorter sequence
MOV c, R0
ADD d, R0
MOV e, R1
SUB R0, R1
MOV a, R0
ADD b, R0
SUB R1, R0
MOV R0, t4
The heuristic ordering algorithm attempts to make the evaluation of a node immediately follow
the evaluation of its leftmost argument.
Algorithm: node listing for the heuristic ordering
(1) while unlisted interior nodes remain do begin
(2)     select an unlisted node n, all of whose parents have been listed;
(3)     list n;
(4)     while the leftmost child m of n has no unlisted parents and is not a leaf do begin
(5)         list m;
(6)         n := m
        end
    end
[Figure: the DAG being ordered — interior nodes 1 (*), 2 (+), 3 (-), 4 (*), 5 (-), 6 (+) and 8 (+), with leaves 7 (c), 9 (a), 10 (b), 11 (d) and 12 (e).]
Initially, the only node with no unlisted parents is 1 so set n=1 at line (2) and list 1 at line (3).
Now, the left argument of 1, which is 2, has its parents listed, so we list 2 and set n=2 at line (6).
Now, at line (4) we find the leftmost child of 2, which is 6, has an unlisted parent 5. Thus we
select a new n at line (2), and node 3 is the only candidate. We list 3 and proceed down its left
chain, listing 4, 5 and 6. This leaves only 8 among the interior nodes so we list that.
Code sequence:
Evaluating the nodes in the reverse of the listed order gives the three-address code
t8 := d + e
t6 := a + b
t5 := t6 - c
t4 := t5 * t8
t3 := t4 - e
t2 := t6 + t4
t1 := t2 * t3
This will yield optimal code for the DAG on this machine, whatever the number of registers.
1. Contiguous Evaluation
The dynamic programming algorithm partitions the problem of generating optimal code for an expression into the subproblems of generating optimal code for the subexpressions of the given expression. As a simple example, consider an expression E of the form E1 + E2. An optimal program for E is formed by combining optimal programs for E1 and E2, in one or the other order, followed by code to evaluate the operator +. The subproblems of generating optimal code for E1 and E2 are solved similarly.
We say a program P evaluates a tree T contiguously if it first evaluates those subtrees of T that need to be computed into memory. Then, it evaluates the remainder of T either in the order T1, T2, and then the root, or in the order T2, T1, and then the root, in either case using the previously computed values from memory whenever necessary. As an example of noncontiguous evaluation, P might first evaluate part of T1 leaving the value in a register (instead of memory), next evaluate T2, and then return to evaluate the rest of T1.
The contiguous evaluation property defined above ensures that for any expression tree T there always exists an optimal program that consists of optimal programs for subtrees of the root, followed by an instruction to evaluate the root. This property allows us to use a dynamic programming algorithm to generate an optimal program for T.
The dynamic programming algorithm proceeds in three phases (suppose the target machine has r registers):
1. Compute bottom-up for each node n of the expression tree T an array C of costs, in which the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a register, assuming i registers are available for the computation, for 1 ≤ i ≤ r.
2. Traverse T, using the cost vectors to determine which subtrees of T must be computed into memory.
3. Traverse each tree using the cost vectors and associated instructions to generate the final target code. The code for the subtrees computed into memory locations is generated first.
Each of these phases can be implemented to run in time linearly proportional to the size of the expression tree.
The cost of computing a node n includes whatever loads and stores are necessary to
evaluate S in the given number of registers. It also includes the cost of computing the
operator at the root of S. The zeroth component of the cost vector is the optimal cost of
computing the subtree S into memory. The contiguous evaluation property ensures that an
optimal program for S can be generated by considering combinations of optimal programs
only for the subtrees of the root of S. This restriction reduces the number of cases that need
to be considered.
Consider a machine having two registers R0 and R1, and the following instructions, each of unit cost:
LD Ri, Mj        // Ri = Mj
op Ri, Ri, Rj    // Ri = Ri op Rj
op Ri, Ri, Mj    // Ri = Ri op Mj
LD Ri, Rj        // Ri = Rj
ST Mi, Rj        // Mi = Rj
Let us apply the dynamic programming algorithm to generate optimal code for the syntax tree in Fig 8.26. In the first phase, we compute the cost vectors shown at each node. To illustrate this cost computation, consider the cost vector at the leaf a. C[0], the cost of computing a into memory, is 0 since it is already there. C[1], the cost of computing a into a register, is 1 since we can load it into a register with the instruction LD R0, a. C[2], the cost of loading a into a register with two registers available, is the same as that with one register available. The cost vector at leaf a is therefore (0, 1, 1).
Consider the cost vector at the root. We first determine the minimum cost of computing the root with one and two registers available. The machine instruction ADD R0, R0, M matches the root, because the root is labeled with the operator +. Using this instruction, the minimum cost of evaluating the root with one register available is the minimum cost of computing its right subtree into memory, plus the minimum cost of computing its left subtree into the register, plus 1 for the instruction. No other way exists. The cost vectors at the right and left children of the root show that the minimum cost of computing the root with one register available is 5 + 2 + 1 = 8.
Now consider the minimum cost of evaluating the root with two registers available. Three cases arise depending on which instruction is used to compute the root and in what order the left and right subtrees of the root are evaluated.
1. Compute the left subtree with two registers available into register R0, compute the right subtree with one register available into register R1, and use the instruction ADD R0, R0, R1 to compute the root. This sequence has cost 2 + 5 + 1 = 8.
2. Compute the right subtree with two registers available into R1, compute the left subtree with one register available into R0, and use the instruction ADD R0, R0, R1. This sequence has cost 4 + 2 + 1 = 7.
3. Compute the right subtree into memory location M, compute the left subtree with two registers available into register R0, and use the instruction ADD R0, R0, M. This sequence has cost 5 + 2 + 1 = 8.
The minimum cost of computing the root into memory is determined by adding one to the minimum cost of computing the root with all registers available; that is, we compute the root into a register and then store the result. The cost vector at the root is therefore (8, 8, 7).
UNIT-V CODE OPTIMIZATION
INTRODUCTION
➢ The code produced by straightforward compiling algorithms can often be made to run faster or take less space, or both. This improvement is achieved by program transformations that are traditionally called optimizations. Compilers that apply code-improving transformations are called optimizing compilers.
• Machine-independent optimizations are program transformations that improve the target code without taking into consideration any properties of the target machine.
• Machine-dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
Organization for an Optimizing Compiler:
[Figure: the code optimizer sits between the front end and the code generator; it performs control-flow and data-flow analysis and then applies transformations to the intermediate code.]
➢ Function-Preserving Transformations
• There are a number of ways in which a compiler can improve a program without changing the function it computes.
• The transformations commonly used are common sub-expression elimination, copy propagation, dead-code elimination, and constant folding.
➢ Common Sub-expression Elimination:
• Frequently, a program will include several calculations of the same value, such as an offset in an array. Some of the duplicate calculations cannot be avoided by the programmer because they lie below the level of detail accessible within the source language.
For example, the code
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t4 := 4 * i
t5 := n
t6 := b[t4] + t5
can be optimized using common sub-expression elimination as
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t5 := n
t6 := b[t1] + t5
The common sub-expression t4 := 4 * i is eliminated as its computation is already in t1, and the value of i has not been changed between the definition and the use.
➢ Copy Propagation:
• Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f, whenever possible after the copy statement f := g. Copy propagation means use of one variable instead of another. This may not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x.
• For example:
x = Pi;
……
A = x * r * r;
After copy propagation (using Pi for x) the last statement becomes:
A = Pi * r * r;
➢ Dead-Code Elimination:
• A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A related idea is dead or useless code, statements that compute values that never get used. While the programmer is unlikely to introduce any dead code intentionally, it may appear as the result of previous transformations. An optimization can be done by eliminating dead code.
Example:
i=0;
if(i=1)
{
a=b+5;
}
Here, ‘if’ statement is dead code because this condition will never get satisfied.
➢ Constant folding:
• We can eliminate both the test and printing from the object code. More generally,
deducing at compile time that the value of an expression is a constant and using the
constant instead is known as constant folding.
• One advantage of copy propagation is that it often turns the copy statement into dead
code.
✓ For example,
a=3.14157/2 can be replaced by
a=1.570 there by eliminating a division operation.
➢ Loop Optimizations:
• We now give a brief introduction to a very important place for optimizations, namely
loops, especially the inner loops where programs tend to spend the bulk of their time. The
running time of a program may be improved if we decrease the number of instructions in
an inner loop, even if we increase the amount of code outside that loop.
• Three techniques are important for loop optimization: code motion, induction-variable elimination, and reduction in strength.
➢ Code Motion:
• An important modification that decreases the amount of code in a loop is code motion.
This transformation takes an expression that yields the same result independent of the
number of times a loop is executed ( a loop-invariant computation) and places the
expression before the loop. Note that the notion “before the loop” assumes the existence
of an entry for the loop. For example, evaluation of limit - 2 is a loop-invariant computation in the following while-statement:
while (i <= limit - 2)   /* statement does not change limit */
Code motion produces the equivalent
t = limit - 2;
while (i <= t)           /* statement does not change limit or t */
➢ Induction Variables :
• Loops are usually processed inside out. For example consider the loop around B3.
• Note that the values of j and t4 remain in lock-step; every time the value of j decreases by 1, that of t4 decreases by 4, because 4 * j is assigned to t4. Such identifiers are called induction variables.
• When there are two or more induction variables in a loop, it may be possible to get rid of all but one, by the process of induction-variable elimination. For the inner loop around B3 in the figure we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4. However, we can illustrate reduction in strength and a part of the process of induction-variable elimination. Eventually j will be eliminated when the outer loop of B2 - B5 is considered.
Example:
As the relationship t4 := 4 * j surely holds after such an assignment to t4, and t4 is not changed elsewhere in the inner loop around B3, it follows that just after the statement j := j - 1 the relationship t4 = 4 * j - 4 must hold. We may therefore replace the assignment t4 := 4 * j by t4 := t4 - 4. The only problem is that t4 does not have a value when we enter block B3 for the first time. Since we must maintain the relationship t4 = 4 * j on entry to block B3, we place an initialization of t4 at the end of the block where j itself is initialized, shown by the dashed addition to block B1 in the second figure.
[Figure: the flow graph before and after strength reduction of t4 := 4 * j.]
• The replacement of a multiplication by a subtraction will speed up the object code if multiplication takes more time than addition or subtraction, as is the case on many machines.
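An illustrative before/after version of this transformation written in C follows; the array, its size and the loop bounds are invented for the example, and the variable names t4 and j mirror the discussion above.

#include <stdio.h>

#define N 10

int main(void) {
    int a[4 * N];
    for (int k = 0; k < 4 * N; k++) a[k] = k;

    /* before: t4 recomputed with a multiplication on each iteration */
    long sum1 = 0;
    for (int j = N - 1; j >= 1; j--) {
        int t4 = 4 * j;
        sum1 += a[t4];
    }

    /* after: t4 initialised once, then updated by t4 := t4 - 4 */
    long sum2 = 0;
    int t4 = 4 * (N - 1);
    for (int j = N - 1; j >= 1; j--) {
        sum2 += a[t4];
        t4 = t4 - 4;
    }

    printf("%ld %ld\n", sum1, sum2);   /* identical results */
    return 0;
}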
➢ Reduction In Strength:
Reduction in strength replaces an expensive operation by an equivalent cheaper one, as in the replacement of the multiplication 4 * j by the subtraction t4 := t4 - 4 above, or of x ** 2 by x * x.
As noted earlier, the transformations on basic blocks fall into two classes:
✓ Structure-Preserving Transformations
✓ Algebraic Transformations
Structure-Preserving Transformations:
Common sub-expressions need not be computed over and over again. Instead they can be computed once and kept in store, from where they are referenced when encountered again – of course provided the variable values in the expression still remain constant.
Example:
a := b + c
b := a - d
c := b + c
d := a - d
The 2nd and 4th statements compute the same expression, a - d, so the block can be transformed to
a := b + c
b := a - d
c := b + c
d := b
(the 1st and 3rd statements are not common sub-expressions, because b is redefined in between).
➢ Dead code elimination:
It’s possible that a large amount of dead (useless) code may exist in the program. This
might be especially caused when introducing variables and procedures as part of construction or
error-correction of a program – once declared and defined, one forgets to remove them in case
they serve no purpose. Eliminating these will definitely optimize the code.
• Two statements
t1 := b + c
t2 := x + y
can be interchanged or reordered in the basic block when the value of t1 does not affect the value of t2.
Algebraic Transformations:
• For example, the statements
a := b + c
e := c + d + b
can be written as
a := b + c
t := c + d
e := t + b
• Example:
x := y ** 2 can be replaced by the cheaper statement x := y * y.
• The compiler writer should examine the language carefully to determine what
rearrangements of computations are permitted, since computer arithmetic does not always
obey the algebraic identities of mathematics. Thus, a compiler may evaluate x*y-x*z as
x*(y-z) but it may not evaluate a+(b-c) as (a+b)-c.
Dominators:
In a flow graph, a node d dominates node n if every path from the initial node of the flow graph to n goes through d. This is denoted by d dom n. The initial node dominates all the remaining nodes in the flow graph, and the header of a loop dominates all nodes in the loop. Similarly, every node dominates itself.
Example:
[Figure: a flow graph with nodes 1–10 whose dominator sets D(n) are listed below.]
• A useful way of presenting dominator information is in a tree, called the dominator tree, in which the initial node is the root.
• The parent of each other node is its immediate dominator.
• Each node d dominates only its descendants in the tree.
• The existence of the dominator tree follows from a property of dominators: each node n has a unique immediate dominator m that is the last dominator of n on any path from the initial node to n.
• In terms of the dom relation, the immediate dominator m has the property that if d ≠ n and d dom n, then d dom m.
D(1)={1}
D(2)={1,2}
D(3)={1,3}
D(4)={1,3,4}
D(5)={1,3,4,5}
D(6)={1,3,4,6}
D(7)={1,3,4,7}
D(8)={1,3,4,7,8}
D(9)={1,3,4,7,8,9}
D(10)={1,3,4,7,8,10}
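Dominator sets like these are usually computed iteratively from D(entry) = {entry} and D(n) = {n} ∪ ⋂ D(p) over all predecessors p of n, repeated until nothing changes. A minimal sketch in C follows; the tiny four-node graph and the bit-set encoding are invented for the example.

#include <stdio.h>

#define N 4

int main(void) {
    /* edges: 0->1, 1->2, 1->3, 2->3 (node 0 is the initial node) */
    int pred[N][N] = {0};
    pred[1][0] = pred[2][1] = pred[3][1] = pred[3][2] = 1;

    unsigned dom[N];
    dom[0] = 1u << 0;                                   /* D(entry) = {entry} */
    for (int n = 1; n < N; n++) dom[n] = (1u << N) - 1; /* start with all     */

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int n = 1; n < N; n++) {
            unsigned meet = (1u << N) - 1;
            for (int p = 0; p < N; p++)
                if (pred[n][p]) meet &= dom[p];          /* intersect D(p)    */
            unsigned next = meet | (1u << n);            /* add n itself      */
            if (next != dom[n]) { dom[n] = next; changed = 1; }
        }
    }
    for (int n = 0; n < N; n++) printf("D(%d) = %#x\n", n, dom[n]);
    return 0;
}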
Natural Loop:
• One application of dominator information is in determining the loops of a flow graph suitable
for improvement.
✓ A loop must have a single entry point, called the header. This entry point dominates all nodes in the loop, or it would not be the sole entry to the loop.
✓ There must be at least one way to iterate the loop (i.e.) at least one path back to the header.
• One way to find all the loops in a flow graph is to search for edges in the flow graph whose heads dominate their tails. If a → b is an edge, b is the head and a is the tail. These types of edges are called back edges.
✓ Example:
7 → 4     4 DOM 7
10 → 7    7 DOM 10
4 → 3     3 DOM 4
8 → 3     3 DOM 8
9 → 1     1 DOM 9
Algorithm: constructing the natural loop of a back edge
Input: A flow graph G and a back edge n → d
Output: The set loop consisting of all nodes in the natural loop of n → d.
Method: Beginning with node n, we consider each node m ≠ d that we know is in loop, to make sure that m's predecessors are also placed in loop. Each node in loop, except for d, is placed once on stack, so its predecessors will be examined. Note that because d is put in the loop initially, we never examine its predecessors, and thus find only those nodes that reach n without going through d.
procedure insert(m);
    if m is not in loop then begin
        loop := loop U {m};
        push m onto stack
    end;
/* main program */
stack := empty;
loop := {d};
insert(n);
while stack is not empty do begin
    pop m, the first element of stack, off stack;
    for each predecessor p of m do insert(p)
end
Inner loop:
• If we use the natural loops as "the loops", then we have the useful property that unless two loops have the same header, they are either disjoint or one is entirely contained in the other. Thus, neglecting loops with the same header for the moment, we have a natural notion of inner loop: one that contains no other loop.
• When two natural loops have the same header, but neither is nested within the other, they are combined and treated as a single loop.
Pre-Headers:
• Several transformations require us to move statements "before the header". Therefore, we begin treatment of a loop L by creating a new block, called the pre-header.
• The pre-header has only the header as successor, and all edges which formerly entered the header of L from outside L instead enter the pre-header.
• Initially the pre-header is empty, but transformations on L may place statements in it.
[Figure: before – edges from outside enter the header of loop L; after – they enter the new pre-header, whose only successor is the header.]
Reducible Flow Graphs:
• Reducible flow graphs are special flow graphs for which several code-optimization transformations are especially easy to perform, loops are unambiguously defined, dominators can be easily calculated, and data-flow analysis problems can also be solved efficiently.
• Exclusive use of structured flow-of-control statements such as if-then-else and while-do produces programs whose flow graphs are always reducible.
• The most important properties of reducible flow graphs are that there are no jumps into
the middle of loops from outside; the only entry to a loop is through its header.
• Definition:
A flow graph G is reducible if and only if we can partition the edges into two disjoint
groups,forwardedges andbackedges, with the following properties.
✓ The forward edges form an acyclic graph in which every node can be reached from the initial node of G.
✓ The back edges consist only of edges whose heads dominate their tails.
• If we know the relation DOM for a flow graph, we can find and remove all the back edges.
• If the remaining forward edges form an acyclic graph, then we can say the flow graph is reducible.
• In the above example, removing the five back edges 4→3, 7→4, 8→3, 9→1 and 10→7, whose heads dominate their tails, leaves an acyclic graph.
• The key property of reducible flow graphs for loop analysis is that in such flow graphs
every set of nodes that we would informally regard as a loop must contain a back edge.
PEEPHOLE OPTIMIZATION
A simple but effective technique for locally improving the target code is peephole optimization: the code is examined through a small sliding window (the peephole), and a short sequence of instructions is replaced, whenever possible, by a shorter or faster sequence. Characteristic peephole optimizations include:
✓ Redundant-instruction elimination
✓ Flow-of-control optimizations
✓ Algebraic simplifications
✓ Use of machine idioms
✓ Unreachable code elimination
Redundant Loads and Stores:
If we see the instruction sequence
(1) MOV R0, a
(2) MOV a, R0
we can delete instruction (2), because whenever (2) is executed, (1) will ensure that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).
Unreachable Code:
An unlabeled instruction immediately following an unconditional jump may be removed. Consider the source fragment
#define debug 0
….
if ( debug ) { print debugging information }
In the intermediate representation this may become
(a)   if debug = 1 goto L1
      goto L2
  L1: print debugging information
  L2:
• One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the value of debug, (a) can be replaced by:
(b)   if debug ≠ 1 goto L2
      print debugging information
  L2:
• Since debug is set to 0 at the beginning of the program, constant propagation replaces (b) by:
(c)   if 0 ≠ 1 goto L2
      print debugging information
  L2:
• As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by goto L2. Then all the statements that print debugging aids are manifestly unreachable and can be eliminated one at a time.
Flows-Of-Control Optimizations:
• The unnecessary jumps can be eliminated in either the intermediate code or the target code
by the following types of peephole optimizations. We can replace the jump sequence
goto L1
….
L1: goto L2
by the sequence
goto L2
….
L1: goto L2
• If there are now no jumps to L1, then it may be possible to eliminate the statement L1:goto
L2 provided it is preceded by an unconditional jump .Similarly, the sequence
if a < b goto L1
….
L1: goto L2
can be replaced by
If a < b goto L2
….
L1: goto L2
• Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then the sequence
goto L1
……..
L1: if a < b goto L2
L3:                                    (1)
may be replaced by
if a < b goto L2
goto L3
…….
L3:                                    (2)
• While the number of instructions in (1) and (2) is the same, we sometimes skip the unconditional jump in (2), but never in (1). Thus (2) is superior to (1) in execution time.
Algebraic Simplification:
• There is no end to the amount of algebraic simplification that can be attempted through
peephole optimization. Only a few algebraic identities occur frequently enough that it is
worth considering implementing them. For example, statements such as
x := x + 0
or
x := x * 1
are often produced by straightforward intermediate code-generation algorithms, and they can be eliminated easily through peephole optimization.
Reduction in Strength:
• Reduction in strength replaces expensive operations by equivalent cheaper ones on the target
machine. Certain machine instructions are considerably cheaper than others and can often be
used as special cases of more expensive operators.
• For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation
routine. Fixed-point multiplication or division by a power of two is cheaper to implement as
a shift. Floating-point division by a constant can be implemented as multiplication by a
constant, which may be cheaper.
x ** 2 → x * x
Use of Machine Idioms:
• The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value.
• The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i + 1.
i := i + 1 → i++
i := i - 1 → i--
INTRODUCTION TO GLOBAL DATA-FLOW ANALYSIS
• In order to do code optimization and a good job of code generation, the compiler needs to collect information about the program as a whole and to distribute this information to each block in the flow graph.
• A typical data-flow equation has the form
out[S] = gen[S] U (in[S] – kill[S])
This equation can be read as "the information at the end of a statement is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement."
• The details of how data-flow equations are set and solved depend on three factors.
✓ The notions of generating and killing depend on the desired information, i.e., on the data
flow analysis problem to be solved. Moreover, for some problems, instead of proceeding
along with flow of control and defining out[s] in terms of in[s], we need to proceed
backwards and define in[s] in terms of out[s].
✓ Since data flows along control paths, data-flow analysis is affected by the constructs in a
program. In fact, when we write out[s] we implicitly assume that there is unique end
point where control leaves the statement; in general, equations are set up at the level of
basic blocks rather than statements, because blocks do have unique end points.
✓ There are subtleties that go along with such statements as procedure calls, assignments
through pointer variables, and even assignments to array variables.
• Within a basic block, we talk of the point between two adjacent statements, as well as the
point before the first statement and after the last. Thus, block B1 has four points: one
before any of the assignments and one after each of the three assignments.
[Figure: a flow graph with blocks B1–B6. B1 contains the definitions d1: i := m - 1, d2: j := n and d3: a := u1; B2 contains d4: i := i + 1; B3 contains d5: j := j - 1; a later block contains d6: a := u2.]
• Now let us take a global view and consider all the points in all the blocks. A path from p1 to pn is a sequence of points p1, p2, …, pn such that for each i between 1 and n - 1, either
✓ pi is the point immediately preceding a statement and pi+1 is the point immediately following that statement in the same block, or
✓ pi is the end of some block and pi+1 is the beginning of a successor block.
Reaching definitions:
• A definition of a variable x is a statement that assigns, or may assign, a value to x. The most common forms of definition are assignments to x and statements that read a value from an I/O device and store it in x. These statements certainly define a value for x, and they are referred to as unambiguous definitions of x. There are certain kinds of statements that may define a value for x; they are called ambiguous definitions. The most usual forms of ambiguous definitions of x are:
✓ A call of a procedure with x as a parameter, or a procedure that can access x because x is in its scope.
✓ An assignment through a pointer that could refer to x. For example, the assignment *q := y is a definition of x if it is possible that q points to x; we must assume that an assignment through a pointer is a definition of every variable.
• We say a definition d reaches a point p if there is a path from the point immediately following d to p, such that d is not "killed" along that path. Thus a point can be reached by an unambiguous definition and an ambiguous definition of the same variable appearing later along one path.
• Flow graphs for control flow constructs such as do-while statements have a useful
property: there is a single beginning point at which control enters and a single end point
that control leaves from when execution of the statement is over. We exploit this property
when we talk of the definitions reaching the beginning and the end of statements with the
following syntax:
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id
• Expressions in this language are similar to those in the intermediate code, but the flow graphs for statements have restricted forms.
[Figure: the restricted flow-graph forms for S1 ; S2, if E then S1 else S2, and do S1 while E.]
• We define a portion of a flow graph called a region to be a set of nodes N that includes a header, which dominates all other nodes in the region. All edges between nodes in N are in the region, except for some that enter the header.
• The portion of the flow graph corresponding to a statement S is a region that obeys the further restriction that control can flow to just one outside block when it leaves the region.
• We say that the beginning points of the dummy blocks at the entry and exit of a
statement’s region are the beginning and end points, respectively, of the statement. The
equations are inductive, or syntax-directed, definition of the sets in[S], out[S], gen[S],
and kill[S] for all statements S.
• gen[S] is the set of definitions “generated” by S while kill[S] is the set of definitions
that never reach the end of S.
• Consider the following data-flow equations for reaching definitions:
i) S → d : a := b + c
   gen[S] = { d }
   kill[S] = Da – { d }
   out[S] = gen[S] U (in[S] – kill[S])
• Observe the rules for a single assignment to variable a. Surely that assignment is a definition of a, say d. Thus
   gen[S] = { d }
   kill[S] = Da – { d }
ii) S → S1 ; S2
   gen[S] = gen[S2] U (gen[S1] – kill[S2])
   kill[S] = kill[S2] U (kill[S1] – gen[S2])
   in[S1] = in[S]
   in[S2] = out[S1]
   out[S] = out[S2]
• Under what circumstances is definition d generated by S = S1 ; S2? First of all, if it is generated by S2, then it is surely generated by S. If d is generated by S1, it will reach the end of S provided it is not killed by S2. Thus, we write
gen[S] = gen[S2] U (gen[S1] – kill[S2])
• However, there are other kinds of data-flow information, such as the reaching-definitions
problem. It turns out that in is an inherited attribute, and out is a synthesized attribute
depending on in. we intend that in[S] be the set of definitions reaching the beginning of
S, taking into account the flow of control throughout the entire program, including
statements outside of S or within which S is nested.
• The set out[S] is defined similarly for the end of S. It is important to note the distinction between out[S] and gen[S]. The latter is the set of definitions that reach the end of S without following paths outside S.
• Considering an if-statement, we have conservatively assumed that control can follow either branch, so a definition reaches the beginning of S1 or S2 exactly when it reaches the beginning of S.
• A definition reaches the end of S if and only if it reaches the end of one or both substatements; i.e.,
out[S] = out[S1] U out[S2]
Representation of sets:
• Sets of definitions, such as gen[S] and kill[S], can be represented compactly using bit vectors. We assign a number to each definition of interest in the flow graph. Then the bit vector representing a set of definitions will have 1 in position i if and only if the definition numbered i is in the set.
• The number of definition statement can be taken as the index of statement in an array
holding pointers to statements. However, not all definitions may be of interest during
global data-flow analysis. Therefore the number of definitions of interest will typically be
recorded in a separate table.
• A bit vector representation for sets also allows set operations to be implemented efficiently. The union and intersection of two sets can be implemented by logical or and logical and, respectively, basic operations in most systems-oriented programming languages. The difference A – B of sets A and B can be implemented by taking the complement of B and then using logical and to compute A – B, as sketched below.
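A minimal sketch in C of this bit-vector representation, assuming the definitions of interest are numbered 0..31 so that a set fits in one machine word; the type name DefSet and the example gen/kill/in sets are invented for the illustration.

#include <stdio.h>

typedef unsigned int DefSet;                 /* bit i <=> definition d_i  */

static DefSet set_union(DefSet a, DefSet b)     { return a | b;  }
static DefSet set_intersect(DefSet a, DefSet b) { return a & b;  }
static DefSet set_diff(DefSet a, DefSet b)      { return a & ~b; }

int main(void) {
    DefSet gen  = (1u << 2);                 /* gen[S]  = { d2 }          */
    DefSet kill = (1u << 0) | (1u << 5);     /* kill[S] = { d0, d5 }      */
    DefSet in   = (1u << 0) | (1u << 3);     /* in[S]   = { d0, d3 }      */

    /* out[S] = gen[S] U (in[S] - kill[S]) */
    DefSet out = set_union(gen, set_diff(in, kill));
    printf("out[S] = %#x\n", out);           /* bits 2 and 3 set: 0xc     */
    (void)set_intersect;                     /* intersection shown for completeness */
    return 0;
}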
• Space for data-flow information can be traded for time, by saving information only at
certain points and, as needed, recomputing information at intervening points. Basic
blocks are usually treated as a unit during global flow analysis, with attention restricted to
only those points that are the beginnings of blocks.
• Since there are usually many more points than blocks, restricting our effort to blocks is a
significant savings. When needed, the reaching definitions for all points in a block can be
calculated from the reaching definitions for the beginning of a block.
Use-definition chains:
• A convenient way to store reaching-definition information is in use-definition chains, or ud-chains: for each use of a variable, a list of all the definitions that reach that use.
Evaluation order:
• The techniques for conserving space during attribute evaluation, also apply to the
computation of data-flow information using specifications. Specifically, the only
constraint on the evaluation order for the gen, kill, in and out sets for statements is that
imposed by dependencies between these sets. Having chosen an evaluation order, we are
free to release the space for a set after all uses of it have occurred.
• Earlier circular dependencies between attributes were not allowed, but we have seen that
data-flow equations may have circular dependencies.
• Global transformations are not substitute for local transformations; both must be performed.
• The available expressions data-flow problem discussed in the last section allows us to
determine if an expression at point p in a flow graph is a common sub-expression. The
following algorithm formalizes the intuitive ideas presented for eliminating common sub-
expressions.
METHOD: For every statement s of the form x := y + z such that y + z is available at the beginning of s's block, and neither y nor z is defined prior to statement s in that block, do the following.
✓ To discover the evaluations of y + z that reach s's block, we follow flow graph edges, searching backward from s's block. However, we do not go through any block that evaluates y + z. The last evaluation of y + z in each block encountered is an evaluation of y + z that reaches s.
Copy propagation:
• Various algorithms introduce copy statements such as x := y; copies may also be generated directly by the intermediate code generator, although most of these involve temporaries local to one block and can be removed by the DAG construction. We may substitute y for x in all these places, provided the following conditions are met by every such use u of x:
• Statement s (the copy x := y) must be the only definition of x reaching u.
• On every path from s to u, including paths that go through u several times, there are no assignments to y.
• Condition (1) can be checked using ud-chaining information. We shall set up a new data-flow analysis problem in which in[B] is the set of copies s: x := y such that every path from the initial node to the beginning of B contains the statement s, and subsequent to the last occurrence of s, there are no assignments to y.
• Ud-chains can be used to detect those computations in a loop that are loop-invariant, that
is, whose value does not change as long as control stays within the loop. Loop is a region
consisting of set of blocks with a header that dominates all the other blocks, so the only
way to enter the loop is through the header.
• Having found the invariant statements within a loop, we can apply to some of them an optimization known as code motion, in which the statements are moved to the pre-header of the loop. The following conditions ensure that code motion does not change what the program computes. Consider s: x := y + z.
✓ The block containing s dominates all exit nodes of the loop, where an exit of a loop is a node with a successor not in the loop.
✓ There is no other statement in the loop that assigns to x.
✓ No use of x in the loop is reached by any definition of x other than s.
• However, our methods deal with variables that are incremented or decremented zero, one,
two, or more times as we go around a loop. The number of changes to an induction
variable may even differ at different iterations.
• A common situation is one in which an induction variable, say i, indexes an array, and
some other induction variable, say t, whose value is a linear function of i, is the actual
offset used to access the array. Often, the only use made of i is in the test for loop
termination. We can then get rid of i by replacing its test by one on t.
• We shall look for basic induction variables, which are those variables i whose only
assignments within loop L are of the form i := i+c or i-c, where c is a constant.
Leveraging machine-learning (ML) techniques for compiler optimizations has been widely studied.
[Figure: generic view of machine learning in compilers. In stages (a) to (c) the model is learned from input examples; in stage (d) the model is deployed and predicts a heuristic value for a new program. In stage (a) the compiler writer investigates data structures that may be useful, which are then summarized as feature vectors in stage (b). In stage (c) training examples consisting of feature vectors and the correct answer are passed to a machine learning tool. In stage (d) the learned model is inserted in the compiler.]
One such framework applies learned policies to 1) reducing code size with inlining and 2) improving code performance with register allocation (regalloc). Both optimizations are available in the LLVM repository and have been deployed in production.
The DLVM consists of a virtual instruction set, control flow graph and data flow representation. Passes are functions that traverse the intermediate representation of a program, either producing useful results as analyses of the program (analysis passes) or mutating the program for differentiation and optimizations (transform passes).