CompilerDesign 210170107518 Krishna (4-10)
Experiment No - 4
Aim: Introduction to YACC and generate Calculator Program
Objectives:
By the end of this experiment, the students should be able to:
Understand the concept of YACC and its significance in compiler construction
Write grammar rules for a given language
Implement a calculator program using YACC
Theory:
YACC (Yet Another Compiler Compiler) is a tool that is used for generating parsers. It is used in
combination with Lex to generate compilers and interpreters. YACC takes a set of rules and
generates a parser that can recognize and process the input according to those rules.
The grammar rules that are defined using YACC are written in BNF (Backus-Naur Form)
notation. These rules describe the syntax of a programming language.
INPUT FILE:
→ The YACC input file is divided into three parts.
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
Definition Part:
→ The definition part includes information about the tokens used in the syntax definition.
Rule Part:
→ The rules part contains the grammar definition in a modified BNF form. Actions are C code enclosed in { } and can be embedded inside the rules (translation schemes).
The program for generating a calculator using YACC involves the following steps:
Defining the grammar rules for the calculator program
Writing the Lex code for tokenizing the input
Writing the YACC code for parsing the input and generating the output
Program:
Calc.l
%{
/* Definition section */
#include<stdio.h>
#include "calc.tab.h"
extern int yylval;
%}
/* Rule Section */
%%
[0-9]+ {
    yylval = atoi(yytext);   /* pass the numeric value to the parser */
    return NUMBER;
}
[ \t]  ;                     /* skip blanks and tabs */
[\n]   return 0;             /* end of expression */
.      return yytext[0];     /* operators and parentheses as-is */
%%
int yywrap()
{
return 1;
}
Compiler Design (3170701) Enrollment No: 210170107518
Calc.y
%{
/* Declaration section */
#include <stdio.h>
#include "lex.yy.c"
int flag = 0;                 /* set to 1 on a syntax error */
/* Function prototypes */
int yylex();
void yyerror(const char* s);
%}
%token NUMBER
%left '+' '-'
%left '*' '/' '%'
%%
ArithmeticExpression: E {
    printf("\nResult = %d\n", $1);
    return 0;
};
E: E '+' E { $$ = $1 + $3; }
 | E '-' E { $$ = $1 - $3; }
 | E '*' E { $$ = $1 * $3; }
 | E '/' E { $$ = $1 / $3; }
 | E '%' E { $$ = $1 % $3; }
 | '(' E ')' { $$ = $2; }
 | NUMBER   { $$ = $1; }
 ;
%%
void yyerror(const char* s) {
    printf("\n%s\n", s);
    flag = 1;
}
/* Driver code */
int main() {
    printf("\nEnter any arithmetic expression:\n");
    yyparse();
    if (flag == 0)
        printf("\nEntered arithmetic expression is Valid\n\n");
    return 0;
}
After executing the program, we observed that the calculator program was successfully generated using YACC. It was able to perform simple arithmetic operations such as addition, subtraction, multiplication, and division, and it correctly handled parenthesized sub-expressions.
Quiz:
1. What is YACC?
YACC, short for "Yet Another Compiler Compiler," is a tool used in computer programming
to generate parsers and syntax analyzers for processing structured text, typically in the context
of programming languages.
Suggested Reference:
1. "Lex & Yacc" by John R. Levine, Tony Mason, and Doug Brown
2. "The Unix Programming Environment" by Brian W. Kernighan and Rob Pike
https://www.geeksforgeeks.org/yacc-program-to-implement-a-calculator-and-recognize-a-valid-arithmetic-expression/
Rubrics: Understanding of YACC (2) | Grammar Generation (2) | Implementation (2) | Testing & Debugging (2) | Ethics (2) | Total
Each criterion is graded Good (2) or Avg. (1).
Marks
Experiment No - 5
Aim: Implement a program for constructing
a. LL(1) Parser
b. Predictive Parser
Objectives:
By the end of this experiment, the students should be able to:
Understand the concept of parsers and their significance in compiler construction
Write the FIRST and FOLLOW sets for a given grammar
Implement an LL(1) and a predictive parser for a given grammar using top-down parsing
Software/Equipment: C compiler
Theory:
LL(1) Parsing: The first L indicates that the input is scanned from left to right, the second L indicates that the parser uses the leftmost derivation, and the 1 is the number of look-ahead symbols, i.e., how many input symbols are examined when making a parsing decision.
Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer from
backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next
input symbols. To make the parser back-tracking free, the predictive parser puts some constraints
on the grammar and accepts only a class of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contains an end symbol $ to denote that the stack is empty and the
input is consumed. The parser refers to the parsing table to take any decision on the input and
stack element combination.
In recursive descent parsing, the parser may have more than one production to choose from for a single instance of input, whereas in a predictive parser, each step has at most one production to choose. There might be instances where no production matches the input string, causing the parsing procedure to fail.
Program-1:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define TSIZE 128
#define MAX_RULES 20
#define MAX_SYMBOLS 100
int table[100][TSIZE];
char terminal[TSIZE];
char nonterminal[26];
int no_pro;
char first[26][TSIZE];
char follow[26][TSIZE];
char first_rhs[100][TSIZE];
struct product {
char str[100];
int len;
} pro[20];
int modifiedRules = 0;
int isNT(char c) {
return c >= 'A' && c <= 'Z';
}
/* Excerpt: detect immediate left recursion (A -> A alpha) and copy the
   non-recursive rules into modifiedGrammar. */
char modifiedGrammar[MAX_RULES][MAX_SYMBOLS];
void removeLeftRecursion(char production[][MAX_SYMBOLS], int num) {
    for (int i = 0; i < num; i++) {
        int index = 3;                      /* first RHS symbol after "A->" */
        char nonTerminal = production[i][0];
        if (nonTerminal == production[i][index]) {
            printf("%s is left recursive.\n", production[i]);
            /* rewriting the rule into A -> beta A' form is omitted here */
        } else {
            printf("%s is not left recursive.\n", production[i]);
            strcpy(modifiedGrammar[modifiedRules], production[i]);
            modifiedRules++;
        }
    }
printf("\nModified Grammar:\n");
for (int i = 0; i < modifiedRules; i++) {
printf("%s\n", modifiedGrammar[i]);
}
for (int k = 0; k < modifiedRules; k++) {
// printf("=>%s\n", modifiedGrammar[k]);
int j = 0;
nonterminal[modifiedGrammar[k][0] - 'A'] = 1;
for (int i = 0; i < strlen(modifiedGrammar[k]); ++i) {
if (modifiedGrammar[k][i] == '|') {
++no_pro;
pro[no_pro - 1].str[j] = '\0';
pro[no_pro - 1].len = j;
pro[no_pro].str[0] = pro[no_pro - 1].str[0];
pro[no_pro].str[1] = pro[no_pro - 1].str[1];
pro[no_pro].str[2] = pro[no_pro - 1].str[2];
j = 3;
} else {
pro[no_pro].str[j] = modifiedGrammar[k][i];
++j;
if (!isNT(modifiedGrammar[k][i]) && modifiedGrammar[k][i] != '-' &&
modifiedGrammar[k][i] != '>') {
terminal[modifiedGrammar[k][i]] = 1;
}
}
}
pro[no_pro].len = j;
++no_pro;
}
}
/* Set-merging helpers used by FIRST(), FOLLOW() and FIRST_RHS() */
void add_FIRST_A_to_FIRST_B(char A, char B) {
    for (int i = 0; i < TSIZE; ++i)
        if (i != '^' && first[A - 'A'][i]) first[B - 'A'][i] = 1;
}
void add_FIRST_A_to_FOLLOW_B(char A, char B) {
    for (int i = 0; i < TSIZE; ++i)
        if (i != '^' && first[A - 'A'][i]) follow[B - 'A'][i] = 1;
}
void add_FOLLOW_A_to_FOLLOW_B(char A, char B) {
    for (int i = 0; i < TSIZE; ++i)
        if (follow[A - 'A'][i]) follow[B - 'A'][i] = 1;
}
void add_FIRST_A_to_FIRST_RHS__B(char A, int B) {
    for (int i = 0; i < TSIZE; ++i)
        if (i != '^' && first[A - 'A'][i]) first_rhs[B][i] = 1;
}
void FOLLOW() {
int t = 0;
int i, j, k, x;
while (t++ < no_pro) {
for (k = 0; k < 26; ++k) {
if (!nonterminal[k]) continue;
char nt = k + 'A';
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
if (nt == pro[i].str[j]) {
for (x = j + 1; x < pro[i].len; ++x) {
char sc = pro[i].str[x];
if (isNT(sc)) {
add_FIRST_A_to_FOLLOW_B(sc, nt);
if (first[sc - 'A']['^'])
continue;
} else {
follow[nt - 'A'][sc] = 1;
}
break;
}
if (x == pro[i].len)
add_FOLLOW_A_to_FOLLOW_B(pro[i].str[0], nt);
}
}
}
}
}
}
void FIRST() {
int i, j;
int t = 0;
while (t < no_pro) {
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
char sc = pro[i].str[j];
if (isNT(sc)) {
add_FIRST_A_to_FIRST_B(sc, pro[i].str[0]);
if (first[sc - 'A']['^'])
continue;
} else {
if (sc == '\'') continue;
first[pro[i].str[0] - 'A'][sc] = 1;
}
break;
}
if (j == pro[i].len)
first[pro[i].str[0] - 'A']['^'] = 1;
}
++t;
}
}
void FIRST_RHS() {
int i, j;
int t = 0;
while (t < no_pro) {
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
char sc = pro[i].str[j];
if (isNT(sc)) {
add_FIRST_A_to_FIRST_RHS__B(sc, i);
if (first[sc - 'A']['^'])
continue;
} else {
first_rhs[i][sc] = 1;
}
break;
}
if (j == pro[i].len)
first_rhs[i]['^'] = 1;
}
++t;
}
}
int main() {
    /* Sample grammar (in place of readFromFile()); '^' denotes epsilon */
    char production[MAX_RULES][MAX_SYMBOLS] = {
        "E->TX", "X->+TX|^", "T->FY", "Y->*FY|^", "F->(E)|i"
    };
    int num = 5;
    removeLeftRecursion(production, num);
follow[pro[0].str[0] - 'A']['$'] = 1;
FIRST();
FOLLOW();
FIRST_RHS();
int i, j, k;
return 0;
}
In the above program, the grammar is given as input and the FIRST and FOLLOW sets of the non-terminals are identified. Further, the LL(1) parsing table is constructed.
Program-2:
The code calculates FIRST and FOLLOW sets for a grammar and generates a predictive parsing
table. For real-world applications, parser generator tools like Bison or ANTLR are recommended
for efficiency and reliability.
Quiz:
1. What is a parser and state the Role of it?
A parser is a key component of a compiler or interpreter, responsible for analyzing the syntax of a
program written in a programming language. Its primary role is to check whether the program
follows the grammar rules of the language and to construct a parse tree or an abstract syntax tree
(AST) that represents the program's structure.
3. What are the tools available for implementation?
There are several tools and libraries available for implementing parsers, each catering to different parsing techniques and programming languages.
Here are some commonly used tools for parser implementation:
YACC/Bison
ANTLR (ANother Tool for Language Recognition):
PLY (Python Lex-Yacc)
JavaCC
a. FIRST() Set: To calculate the FIRST() set for a symbol, you consider the possible first terminal
symbols that can be derived from that symbol. You iterate through the grammar rules, considering
epsilon (ε) for empty derivations and the FIRST() sets of other symbols. The process continues
until there are no changes to the FIRST() set.
b. FOLLOW() Set: To calculate the FOLLOW() set for a non-terminal symbol, you consider
where that non-terminal appears in the grammar rules. You examine the symbols that can follow
the non-terminal and consider the FIRST() sets of the symbols that can follow it. This process is
repeated until there are no changes to the FOLLOW() set.
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/construction-of-ll1-parsing-table/
3. http://www.cs.ecu.edu/karl/5220/spr16/Notes/Top-down/LL1.html
Rubrics: Knowledge of Parsing techniques (2) | Problem implementation (2) | Completeness and accuracy (2) | Code Quality (2) | Presentation (2) | Total
Each criterion is graded Good (2) or Avg. (1).
Marks
Experiment No - 06
Aim: Implement a Recursive Descent Parser (RDP) and an LALR parser.
Objectives:
By the end of this experiment, the students should be able to:
Understand the RDP, the broad classification of bottom-up parsers, and their significance in compiler construction
Verify whether a string is accepted by the RDP, and parse a given grammar using LR parsers
Implement an RDP and an LALR parser
Software/Equipment: C compiler
Theory:
Recursive Descent Parser uses the technique of top-down parsing without backtracking. It can be defined as a parser that uses recursive procedures to process the input string, with no backtracking. It can be implemented simply in any language that supports recursion. The first symbol of the string on the R.H.S of a production uniquely determines the correct alternative to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a procedure.
The objective of each procedure is to read a sequence of input characters that can be produced by
the corresponding non-terminal, and return a pointer to the root of the parse tree for the non-
terminal. The structure of the procedure is prescribed by the productions for the equivalent non-
terminal.
The recursive procedures are simple to write and adequately efficient if written in a language that implements procedure calls efficiently. There is a procedure for each non-terminal in the grammar. A global variable lookahead holds the current input token, and a procedure match(ExpectedToken) recognizes the next token and advances the input stream pointer, so that lookahead points to the next token to be parsed. match() is effectively a call to the lexical analyzer to get the next token.
For example, input stream is a + b$.
lookahead == a
match()
lookahead == +
match ()
lookahead == b
……………………….
……………………….
In this manner, parsing can be done.
LALR refers to the lookahead LR. To construct the LALR (1) parsing table, we use the
canonical collection of LR (1) items.
In LALR (1) parsing, the LR (1) items which have the same productions but different lookaheads are combined to form a single set of items.
LALR (1) parsing is the same as CLR (1) parsing; the only difference is in the parsing table.
Example
S → AA
A → aA
A→b
Add Augment Production, insert '•' symbol at the first position for every production in G
and also add the look ahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add the augment production to the I0 State and compute the closure:
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "•" is followed by the non-
terminal. So, the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "•" is followed by the non-
terminal. So, the I3 State becomes
I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "•" is followed by the non-
terminal. So, the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
If we analyze the LR (0) items, I3 and I6 are the same but differ only in their lookahead.
I3 = { A → a•A, a/b
A → •aA, a/b
A → •b, a/b
}
I6= { A → a•A, $
A → •aA, $
A → •b, $
}
Clearly I3 and I6 are the same in their LR (0) items but differ in their lookahead, so we can combine them and call the result I36.
I36 = { A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$
}
I4 and I7 are the same but differ only in their look ahead, so we can combine them and call the result I47.
I47 = {A → b•, a/b/$}
I8 and I9 are the same but differ only in their look ahead, so we can combine them and call the result I89.
I89 = {A → aA•, a/b/$}
Drawing DFA:
Program-1:
#include <iostream>
#include <cstdio>
char input[100];
int i, l;
int S();
int A();
int B();
int main() {
    std::cout << "\nRecursive descent parsing for the following grammar" << std::endl;
    std::cout << "\nS -> Aa | bAc | bBa \nA -> d \nB -> d" << std::endl;
    std::cout << "\nEnter the string to be checked: ";
    if (fgets(input, sizeof(input), stdin)) {   // safer than gets()
        int n = 0;
        while (input[n] != '\0' && input[n] != '\n') n++;
        input[n] = '\0';                        // strip the trailing newline
    }
    if (S() && input[i] == '\0')
        std::cout << "\nString is accepted" << std::endl;
    else
        std::cout << "\nString is not accepted" << std::endl;
    return 0;
}
int S() {
    if (input[i] == 'b') {                  // S -> bAc | bBa
        i++;
        if (A() || B()) {                   // A and B both derive d
            if (input[i] == 'c' || input[i] == 'a') {
                i++;
                return 1;
            }
        }
        return 0;
    } else {                                // S -> Aa
        if (A()) {
            if (input[i] == 'a') {
                i++;
                return 1;
            }
        }
        return 0;
    }
}
int A(){
if(input[i] == 'd') {
i++;
return 1;
}
else return 0;
}
int B(){
if(input[i] == 'd'){
i++;
return 1;
}else return 0;
}
Program-1:
In the above output, as per the grammar provided and the calling procedures, the tree is parsed and the input strings are mapped w.r.t. the calling procedures; the strings which are successfully parsed are accepted and the others rejected.
Program-2:
Quiz:
1. What do you mean by shift reduce parsing?
2. Provide broad classification of LR parsers.
3. Differentiate RDP and LALR parser.
4. How to do merging of itemsets?
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/recursive-descent-parser/
3. https://www.youtube.com/watch?v=odoHgcoombw
4. https://www.geeksforgeeks.org/lalr-parser-with-examples/
Marks
Experiment No - 07
Aim: Implement a program to check whether a given grammar is an Operator Precedence Grammar (OPG).
Objectives:
By the end of this experiment, the students should be able to:
Understand the concept of OPG and its significance in compiler construction
Write precedence relations for a grammar
Implement an OPG checker using a C compiler
Software/Equipment: C compiler
Theory:
Operator Precedence Parsing is also a type of bottom-up parsing that can be applied to a class of grammars known as operator grammars.
Relation	Meaning
a ⋖ b	a yields precedence to b
a ≐ b	a has the same precedence as b
a ⋗ b	a takes precedence over b
Program:
#include <iostream>
#include <string>
#include <cctype>
#include <cstdlib>
using namespace std;
// Report that the grammar is not an operator grammar and stop
void f() {
    cout << "Not operator grammar" << endl;
    exit(0);
}
int main() {
    int n;
    string grm[20];
    cout << "Enter the number of productions: ";
    cin >> n;
    for (int i = 0; i < n; i++)
        cin >> grm[i];                      // e.g. A=A+A ('$' denotes epsilon)
    int flag = 1;
    for (int i = 0; i < n; i++) {
        int j = 2;                          // RHS begins after "A="
        char c = grm[i][j];
        while (c != '\0') {
            // epsilon on the RHS violates operator grammar rules
            if (c == '$') {
                flag = 0;
                f();
            }
            // two adjacent non-terminals also violate the rules
            if (isupper(c) && isupper(grm[i][j + 1])) {
                flag = 0;
                f();
            }
            c = grm[i][++j];
        }
    }
    if (flag == 1)
        cout << "Operator grammar" << endl;
    return 0;
}
In the first example, the grammar is analysed as per the operator grammar rules; the input goes against the rules of OPG, so it is not an operator grammar.
In the second example, the input satisfies the rules of OPG (an operator is present between every two non-terminals), so it is an operator grammar.
Quiz:
1. Define operator grammar.
Operator grammar is a formalism used in the field of formal languages and parsing theory to describe
the syntax of programming languages or other formal languages. In operator grammar, the grammar
rules are defined based on operators and their associated precedence and associativity. It is particularly
useful for languages that use infix notation (e.g., arithmetic expressions) and need to specify how
operators should be parsed.
a. Higher-Than Relation (HTR): This relation indicates that one operator has higher precedence than another, meaning it should be evaluated before the other operator in an expression.
b. Lower-Than Relation (LTR): This relation indicates that one operator has lower precedence than another, meaning it should be evaluated after the other operator in an expression.
c. Equal-Precedence Relation (EPR): This relation indicates that two operators have equal precedence,
and their associativity is used to determine the evaluation order.
Determining precedence relations in operator precedence parsing typically involves defining them
explicitly in the grammar or using a table-based approach. Here are two common methods:
a. Grammar Rules: Precedence relations can be defined in the grammar rules themselves. For example,
by using production rules like:
b. Precedence Table: A table-based approach can be used to specify precedence relations. This table
defines the relationships between operators and can be consulted during parsing to resolve ambiguities.
The table might indicate which operators have higher precedence (HTR), lower precedence (LTR), or
are equal in precedence (EPR).
Suggested Reference:
1. https://www.gatevidyalay.com/operator-precedence-parsing/
2. https://www.geeksforgeeks.org/role-of-operator-precedence-parser/
Marks
Experiment No - 08
Aim: Implement a program to generate intermediate code (three-address code) for a given infix expression.
Objectives:
By the end of this experiment, the students should be able to:
Understand the different intermediate code representations and its significance in compiler
construction
Write intermediate code for given infix expression
Software/Equipment: C compiler
Theory:
Three address code is a type of intermediate code which is easy to generate and can be easily converted to machine code. It uses at most three addresses and one operator to represent an expression, and the value computed at each instruction is stored in a temporary variable generated by the compiler. The compiler decides the order of operations given by the three address code.
Three address code is commonly represented in compilers in the following forms:
1. Quadruples – Each instruction is a record with four fields: op, arg1, arg2 and result. Temporaries are explicit, which makes it easy to rearrange code during optimization.
2. Triples – This representation does not use an extra temporary variable to represent a single operation; instead, when a reference to another triple's value is needed, a pointer to that triple is used. So it consists of only three fields, namely op, arg1 and arg2.
Disadvantage –
Temporaries are implicit, which makes it difficult to rearrange code.
It is difficult to optimize, because optimization involves moving intermediate code; when a triple is moved, any other triple referring to it must be updated as well. (With quadruples, by contrast, one can directly access a symbol table entry with the help of a pointer.)
Example – Consider expression a = b * – c + b * – c
3. Indirect Triples – This representation uses pointers to a separately stored listing of all references to computations. It is similar in utility to the quadruple representation but requires less space. Temporaries are implicit, and it is easier to rearrange code.
Question – Write quadruple, triples and indirect triples for following expression : (x + y) * (y +
z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
Program:
#include <iostream>
#include <string>
#include <stack>
#include <vector>
#include <iomanip>
#include <cctype>
using namespace std;
struct Quadruple {
    string op;
    string arg1;
    string arg2;
    string result;
};
int precedence(char c) {
    if (c == '*' || c == '/') return 2;
    if (c == '+' || c == '-') return 1;
    return 0;
}
string infixToPostfix(const string& infix) {
    string postfix;
    stack<char> operators;
    for (char c : infix) {
        if (isalnum(c)) {
            postfix += c;
        } else if (c == '(') {
            operators.push(c);
        } else if (c == ')') {
            while (!operators.empty() && operators.top() != '(') {
                postfix += operators.top();
                operators.pop();
            }
            if (!operators.empty()) operators.pop();   // discard '('
        } else {
            while (!operators.empty() && precedence(operators.top()) >= precedence(c)) {
                postfix += operators.top();
                operators.pop();
            }
            operators.push(c);
        }
    }
    while (!operators.empty()) {
        postfix += operators.top();
        operators.pop();
    }
    return postfix;
}
vector<Quadruple> generateQuadruples(const string& postfix) {
    vector<Quadruple> quadruples;
    stack<string> operands;
    int tempCount = 1;
    for (char c : postfix) {
        if (isalnum(c)) {
            operands.push(string(1, c));
        } else {
            string op2 = operands.top(); operands.pop();
            string op1 = operands.top(); operands.pop();
            string result = "t" + to_string(tempCount++);
            Quadruple quad;
            quad.op = c;
            quad.arg1 = op1;
            quad.arg2 = op2;
            quad.result = result;
            quadruples.push_back(quad);
            operands.push(result);
        }
    }
    return quadruples;
}
void displayQuadruples(const vector<Quadruple>& quadruples) {
    cout << left << setw(6) << "op" << setw(8) << "arg1"
         << setw(8) << "arg2" << "result" << endl;
    for (const Quadruple& q : quadruples)
        cout << left << setw(6) << q.op << setw(8) << q.arg1
             << setw(8) << q.arg2 << q.result << endl;
}
int main() {
    string expression;
    cout << "Enter an infix expression: ";
    cin >> expression;
    vector<Quadruple> quadruples = generateQuadruples(infixToPostfix(expression));
    displayQuadruples(quadruples);
    return 0;
}
In the above example, the user is asked to enter an infix expression and the generated intermediate code (3-address code) is produced as output.
Quiz:
1. What are the different implementation methods for three-address code?
Three-address code (TAC) is an intermediate representation used in compilers to simplify the
translation of high-level programming languages into machine code. There are different methods to
implement TAC:
Quadruple: A quadruple is a four-element data structure used to represent an operation in three-
address code. It consists of the following components:
Triple: A triple is similar to a quadruple but contains only three elements: an operation, a source
operand, and a destination operand. Triples are often used for code generation and can represent simple
operations or assignments.
Indirect Triple: An indirect triple is a variation of the triple in which the source and destination
operands may contain references to memory locations rather than immediate values or variables.
Indirect triples are used when dealing with indirect addressing or complex memory operations.
2. What do you mean by quadruple?
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. https://www.geeksforgeeks.org/introduction-to-intermediate-representationir/
3. https://cs.lmu.edu/~ray/notes/ir/
Marks
Experiment No - 09
Aim: Extract Predecessor and Successor from given Control Flow Graph.
Objectives:
By the end of this experiment, the students should be able to:
Understand the concept of control structure (basic blocks) in a compiler
Write the predecessor and successor for given graph.
Theory:
A basic block is a simple combination of statements. Except at the entry and the exit, a basic block has no branches in or out. This means the flow of control enters at the beginning and always leaves at the end without halting. The instructions of a basic block always execute as a sequence.
The first step is to divide a group of three-address codes into the basic block. The new basic
block always begins with the first instruction and continues to add instructions until it reaches a
jump or a label. If no jumps or labels are identified, the control will flow from one instruction to
the next in sequential order.
The algorithm for the construction of the basic block is described below step by step:
Algorithm: The algorithm used here is partitioning the three-address code into basic blocks.
Input: A sequence of three-address codes will be the input for the basic blocks.
Output: A list of basic blocks with each three address statements, in exactly one block, is
considered as the output.
Method: We’ll start by identifying the intermediate code’s leaders. The following are some
guidelines for identifying leaders:
1. The first instruction in the intermediate code is generally considered as a leader.
2. The instructions that target a conditional or unconditional jump statement can be
considered as a leader.
3. Any instructions that are just after a conditional or unconditional jump statement can
be considered as a leader.
Each leader’s basic block will contain all of the instructions from the leader until the instruction
right before the following leader’s start.
Example of basic block:
Three Address Code for the expression a = b + c – d is:
T1 = b + c
T2 = T1 - d
a = T2
This represents a basic block in which all the statements execute in a sequence one after the
other.
Basic Block Construction:
Let us understand the construction of basic blocks with an example:
Example:
1. PROD = 0
2. I = 1
3. T2 = addr(A) – 4
4. T4 = addr(B) – 4
5. T1 = 4 x I
6. T3 = T2[T1]
7. T5 = T4[T1]
8. T6 = T3 x T5
9. PROD = PROD + T6
10. I = I + 1
11. IF I <=20 GOTO (5)
Using the algorithm given above, we can identify the number of basic blocks in the above three-
address code easily-
There are two Basic Blocks in the above three-address code:
B1 – Statement 1 to 4
B2 – Statement 5 to 11
Transformations on Basic blocks:
Transformations can be applied to a basic block. While transforming, we must not change the set of expressions computed by the block.
There are two types of basic block transformations. These are as follows:
1. Structure-Preserving Transformations
Structure preserving transformations can be achieved by the following methods:
1. Common sub-expression elimination
2. Dead code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements
2. Algebraic Transformations
In the case of algebraic transformation, we basically change the set of expressions into an
algebraically equivalent set.
For example, an expression such as
x := x + 0
or x := x * 1
can be eliminated from a basic block without changing the set of expressions computed.
Flow Graph:
A flow graph is simply a directed graph. For the set of basic blocks, a flow graph shows the flow of control information. A control flow graph is used to depict how program control is passed among the blocks. A flow graph is used to illustrate the flow of control between basic blocks once an intermediate code has been partitioned into basic blocks. When the beginning instruction of block Y can follow the last instruction of block X, an edge may be drawn from block X to block Y.
Let’s make the flow graph of the example that we used for basic block formation:
Firstly, we compute the basic blocks (which is already done above). Secondly, we assign the
flow control information.
Program:
#include <iostream>
struct TreeNode {
    int data;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int d) : data(d), left(nullptr), right(nullptr) {}
};
class BST {
public:
    TreeNode* root;
    BST() : root(nullptr) {}
    void insert(int key) { root = insertNode(root, key); }
private:
    TreeNode* insertNode(TreeNode* node, int key) {
        if (node == nullptr) return new TreeNode(key);
        if (key < node->data) node->left = insertNode(node->left, key);
        else if (key > node->data) node->right = insertNode(node->right, key);
        return node;
    }
};
// Find the in-order predecessor and successor of key in the BST
void findPreSuc(TreeNode* node, TreeNode*& pre, TreeNode*& suc, int key) {
    if (node == nullptr) return;
    if (node->data == key) {
        if (node->left) {                 // predecessor: rightmost in left subtree
            TreeNode* t = node->left;
            while (t->right) t = t->right;
            pre = t;
        }
        if (node->right) {                // successor: leftmost in right subtree
            TreeNode* t = node->right;
            while (t->left) t = t->left;
            suc = t;
        }
    } else if (key < node->data) {
        suc = node;                       // node is a candidate successor
        findPreSuc(node->left, pre, suc, key);
    } else {
        pre = node;                       // node is a candidate predecessor
        findPreSuc(node->right, pre, suc, key);
    }
}
int main() {
    BST tree;
    tree.insert(50);
    tree.insert(30);
    tree.insert(70);
    tree.insert(20);
    tree.insert(40);
    tree.insert(60);
    tree.insert(80);
    int key = 60;
    TreeNode *predecessor = nullptr, *successor = nullptr;
    findPreSuc(tree.root, predecessor, successor, key);
    if (predecessor != nullptr) {
        std::cout << "Predecessor: " << predecessor->data << std::endl;
    } else {
        std::cout << "No predecessor for " << key << std::endl;
    }
    if (successor != nullptr) {
        std::cout << "Successor: " << successor->data << std::endl;
    } else {
        std::cout << "No successor for " << key << std::endl;
    }
    return 0;
}
In the above example, the user gets the predecessor and successor of a given node.
Quiz:
1. What is flowgraph?
A flow graph is a directed graph containing control flow information for the set of
fundamental blocks. A control flow graph shows how program control is parsed among the
blocks.
2. Define DAG.
A directed acyclic graph (DAG) is a conceptual representation of a series of activities. The
order of the activities is depicted by a graph, which is visually presented as a set of circles,
each one representing an activity, some of which are connected by lines, which represent the
flow from one activity to another.
3. Define Backpatching .
Backpatching is basically a process of fulfilling unspecified information. This information is
of labels. It basically uses the appropriate semantic actions during the process of code
generation. It may indicate the address of the Label in goto statements while producing TACs
for the given expressions.
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman
2. https://www.geeksforgeeks.org/data-flow-analysis-compiler/
Marks
Experiment No - 10
Aim: Study of Learning Basic Block Scheduling Heuristics from Optimal Data.
Objectives:
By the end of this experiment, the students should be able to:
Understanding the concept of basic block scheduling and its importance in compiler
optimization.
Understanding the various heuristics used for basic block scheduling.
Analyzing optimal data to learn the basic block scheduling heuristics.
Comparing the performance of the implemented basic block scheduler with other
commonly used basic block schedulers.
Theory:
Instruction scheduling is an important step for improving the performance of object code
produced by a compiler. Basic block scheduling is important in its own right and also as a
building block for scheduling larger groups of instructions such as superblocks. The basic block instruction scheduling problem is to find a minimum length schedule for a basic block (a straight-line sequence of code with a single entry point and a single exit point), subject to precedence, latency, and resource constraints. Solving the problem exactly is known to be difficult, and most compilers use a greedy list scheduling algorithm coupled with a heuristic. The heuristic is usually
hand-crafted, a potentially time-consuming process. Modern architectures are pipelined and can
issue multiple instructions per time cycle. On such processors, the order that the instructions are
scheduled can significantly impact performance.
The basic block instruction scheduling problem is to find a minimum length schedule for a basic
block (a straight-line sequence of code with a single entry point and a single exit point), subject to
precedence, latency, and resource constraints. Instruction scheduling for basic blocks is known to
be NP-complete for realistic architectures. The most popular method for scheduling basic blocks
continues to be list scheduling.
For example, consider multiple-issue pipelined processors. On such processors, there are multiple
functional units and multiple instructions can be issued (begin execution) each clock cycle.
Associated with each instruction is a delay or latency between when the instruction is issued and
when the result is available for other instructions which use the result. Here we assume that all
functional units are fully pipelined and that instructions are typed. Examples of types of
instructions are load/store, integer, floating point, and branch instructions. We use the standard
labelled directed acyclic graph (DAG) representation of a basic block (see Figure 1(a)). Each node
corresponds to an instruction and there is an edge from i to j labelled with a positive integer l (i, j)
if j must not be issued until i has executed for l (i, j) cycles. Given a labelled dependency DAG for
a basic block, a schedule for a multiple-issue processor specifies an issue or start time for each
instruction or node such that the latency constraints are satisfied and the resource constraints are
satisfied. The latter are satisfied if, at every time cycle, the number of instructions of each type
issued at that cycle does not exceed the number of functional units that can execute instructions of
that type. The length of a schedule is the number of cycles needed for the schedule to complete;
i.e., each instruction has been issued at its start time and, for each instruction with no successors,
enough cycles have elapsed that the result for the instruction is available. The basic block
instruction scheduling problem is to construct a schedule with minimum length.
Instruction scheduling for basic blocks is known to be NP-complete for realistic architectures. The
most popular method for scheduling basic blocks continues to be list scheduling. A list scheduler
takes a set of instructions as represented by a dependency DAG and builds a schedule using a
best-first greedy heuristic. A list scheduler generates the schedule by determining all instructions
that can be scheduled at that time step, called the ready list, and using the heuristic to select the
best instruction on the list. The selected instruction is then added to the partial schedule and the
scheduler determines if any new instructions can be added to the ready list.
The heuristic in a list scheduler generally consists of a set of features and an order for testing the
features. Some standard features are as follows. The path length from a node i to a node j in a
DAG is the maximum number of edges along any path from i to j. The critical-path distance from
a node i to a node j in a DAG is the maximum sum of the latencies along any path from i to j.
Note that both the path length and the critical-path distance from a node i to itself are zero. A node j
is a descendant of a node i if there is a directed path from i to j; if the path consists of a single
edge, j is also called an immediate successor of i. The earliest start time of a node i is a lower
bound on the earliest cycle in which the instruction i can be scheduled.
In supervised learning of a classifier from examples, one is given a training set of instances, where
each instance is a vector of feature values and the correct classification for that instance, and is to
induce a classifier from the instances. The classifier is then used to predict the class of instances
that it has not seen before. Many algorithms have been proposed for supervised learning. One of
the most widely used is decision tree learning. In a decision tree the internal nodes of the tree are
labelled with features, the edges to the children of a node are labelled with the possible values of
the feature, and the leaves of the tree are labelled with a classification. To classify a new example,
one starts at the root and repeatedly tests the feature at a node and follows the appropriate branch
until a leaf is reached. The label of the leaf is the predicted classification of the new example.
Algorithm:
This experiment studies automatically learning a good heuristic for basic block scheduling using
supervised machine learning techniques. The novelty of the approach lies in the quality of the
training data: training instances were obtained from very large basic blocks, and an extensive and
systematic analysis was performed to identify the best features and to synthesize new features,
with an emphasis on learning a simple yet accurate heuristic.
Instruction scheduling is an important step for improving the performance of object code
produced by a compiler.
Basic block scheduling is important as a building block for scheduling larger groups of
instructions such as superblocks.
The basic block instruction scheduling problem is to find a minimum length schedule for a
basic block (a straight-line sequence of code with a single entry point and a single exit point),
subject to precedence, latency, and resource constraints.
Solving the problem exactly is known to be difficult, and most compilers use a greedy list
scheduling algorithm coupled with a heuristic.
Modern architectures are pipelined and can issue multiple instructions per time cycle. The
order that the instructions are scheduled can significantly impact performance.
Instruction scheduling for basic blocks is known to be NP-complete for realistic architectures.
The most popular method for scheduling basic blocks continues to be list scheduling, which
takes a set of instructions as represented by a dependency DAG and builds a schedule using a
best-first greedy heuristic.
The heuristic in a list scheduler generally consists of a set of features and an order for testing
the features.
Decision tree learning is one of the most widely used algorithms for supervised learning.
Quiz:
1. What is the basic block instruction scheduling problem?
The basic block instruction scheduling problem involves determining the optimal order of
execution for instructions within a basic block of code. The objective is to minimize execution
time and maximize resource utilization while respecting dependencies among instructions. It
is a critical step in optimizing the performance of compiled code.
2. Why is instruction scheduling important for improving the performance of object code
produced by a compiler?
Instruction scheduling is vital for improving the performance of object code generated by a
compiler for several reasons:
It optimizes the use of available hardware resources, such as functional units and registers,
reducing execution time.
It helps break dependencies and pipeline stalls, resulting in more efficient instruction
execution.
By rearranging instructions, it can improve instruction-level parallelism and reduce idle
time in the pipeline, enhancing overall program performance.
3. What are the constraints that need to be considered in solving the basic block instruction
scheduling problem?
When solving the basic block instruction scheduling problem, several constraints must be
considered, including:
Data dependencies: Instructions must be scheduled to respect data dependencies, such as
read-after-write (RAW) and write-after-read (WAR) hazards.
Resource constraints: Limited hardware resources, like functional units, registers, and
memory, must be managed effectively.
Instruction latency: Different instructions may have different execution times, affecting
scheduling decisions.
Branch instructions: Conditional branches must be handled appropriately to avoid incorrect
speculative execution.
Suggested Reference:
1. https://dl.acm.org/doi/10.5555/1105634.1105652
2. https://www.worldcat.org/title/1032888564
Rubrics: Problem Recognition (2) | Knowledge (2) | Documentation (2) | Presentation (2) | Ethics (2) | Total
Each rubric is graded Good (2) or Avg. (1).
Marks: