CompilerDesign 210170107518 Krishna (4-10)

The document discusses LL(1) and predictive parsing. LL(1) parsing uses a top-down approach with 1 token of lookahead. It requires the grammar to be LL(1) - free of left recursion, unambiguous, and left factored. The LL(1) parsing table is constructed by calculating First and Follow sets and placing productions under terminals in First or Follow. Predictive parsing is also top-down and can predict the production without backtracking using a lookahead pointer. An experiment aims to implement LL(1) and predictive parsers for a given grammar by writing First/Follow sets and generating the parsers.

Compiler Design (3170701) Enrollment No: 210170107518

Experiment No - 4
Aim: Introduction to YACC and generate Calculator Program

Competency and Practical Skills:


• Understanding of YACC and its role in compiler construction
• Ability to write grammar rules and YACC programs for a given language
• Ability to develop programs using YACC

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
 Understand the concept of YACC and its significance in compiler construction
 Write grammar rules for a given language
 Implement a calculator program using YACC

Software/Equipment: Windows/Linux operating system, YACC compiler, text editor, command prompt or terminal

Theory:
YACC (Yet Another Compiler Compiler) is a tool for generating parsers. It is used in
combination with Lex to build compilers and interpreters. YACC takes a set of grammar rules and
generates a parser that recognizes and processes input according to those rules.

The grammar rules that are defined using YACC are written in BNF (Backus-Naur Form)
notation. These rules describe the syntax of a programming language.

INPUT FILE:
→ The YACC input file is divided into three parts.
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....

Definition Part:
→ The definition part includes information about the tokens used in the syntax definition.

Rule Part:
→ The rules part contains the grammar definition in a modified BNF form. Actions are C code
enclosed in { } and can be embedded inside the rules (translation schemes).

Auxiliary Routines Part:


→ The auxiliary routines part is only C code.
→ It includes function definitions for every function needed in the rules part.
→ It can also contain the main() function definition if the parser is going to be run as a program.
→ The main() function must call the function yyparse().
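Putting the three parts together, a minimal, hypothetical YACC input file might look like the sketch below. The token name and rule are illustrative only, and the skeleton still needs a matching Lex file (or a hand-written yylex()) before it will link:

```yacc
%{
/* definitions part: C headers and prototypes */
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
%}

%token NUMBER

%%
/* rules part: grammar in modified BNF, actions as C code in { } */
line : NUMBER   { printf("read a number: %d\n", $1); }
     ;
%%

/* auxiliary routines part: plain C */
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }

int main(void) {
    return yyparse();   /* main() must call yyparse() */
}
```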

For Compiling a YACC Program:


1. Write the Lex program in a file file.l and the YACC program in a file file.y
2. Open a terminal and navigate to the directory where you saved the files.
3. Type lex file.l
4. Type yacc -d file.y (the -d option makes yacc emit the token-definition header y.tab.h)
5. Type cc y.tab.c lex.yy.c -ll (compile the generated .c files, not the header; if the .y file
   already #includes lex.yy.c, compile y.tab.c alone)
6. Type ./a.out

The program for generating a calculator using YACC involves the following steps:
 Defining the grammar rules for the calculator program
 Writing the Lex code for tokenizing the input
 Writing the YACC code for parsing the input and generating the output

Program:

Calc.l

%{
/* Definition section */
#include<stdio.h>
#include "y.tab.h" /* token definitions generated by yacc -d */
extern int yylval;
%}

/* Rule Section */
%%
[0-9]+ {
yylval=atoi(yytext);
return NUMBER;

}
[ \t] ; /* skip spaces and tabs */

[\n] return 0;

. return yytext[0];

%%

int yywrap()
{
return 1;
}

Calc.y
%{
/* Declaration section */
#include <stdio.h>
#include "lex.yy.c"

/* Function prototypes */
int yylex();
void yyerror(const char* s);

/* Global flag variable */


int flag = 0;
%}

%token NUMBER

%left '+' '-'


%left '*' '/' '%'
%left '(' ')'

%%

ArithmeticExpression: E {
printf("\nResult = %d\n", $1);
return 0;
};

E: E '+' E { $$ = $1 + $3; }
| E '-' E { $$ = $1 - $3; }
| E '*' E { $$ = $1 * $3; }
| E '/' E { $$ = $1 / $3; }
| E '%' E { $$ = $1 % $3; }
| '(' E ')' { $$ = $2; }
| NUMBER { $$ = $1; }
;

%%

// Driver code
int main() {
printf("\nEnter any arithmetic expression:\n");

yyparse();

if (flag == 0)
printf("\nEntered arithmetic expression is Valid\n\n");

return 0;
}

void yyerror(const char* s) {


printf("\nEntered arithmetic expression is Invalid: %s\n\n", s);
flag = 1;
}

Observations and Conclusion:

After executing the program, we observed that the calculator program was successfully generated
using YACC. It was able to perform simple arithmetic operations such as addition, subtraction,
multiplication, division, and modulus, and it handled parenthesized expressions. (The grammar
has no unary minus rule, so a leading negative number is reported as invalid.)

Quiz:
1. What is YACC?
YACC, short for "Yet Another Compiler Compiler," is a tool used in computer programming
to generate parsers and syntax analyzers for processing structured text, typically in the context
of programming languages.

2. What is the purpose of YACC?


The purpose of YACC is to assist in the creation of syntax analyzers (parsers) for
programming languages or other formal languages. It helps developers define the grammar of
a language and generate code to recognize and analyze the syntax of programs written in that
language.

3. What is the output of YACC?


The output of YACC is typically a parser or syntax analyzer in the form of source code
written in a programming language like C or C++. This generated code can be integrated into
a compiler or interpreter to process and validate the syntax of input programs.

4. What is a syntax analyzer?


A syntax analyzer, also known as a parser, is a component of a compiler or interpreter that
checks the syntactic structure of a program or input text. It ensures that the input adheres to
the grammar rules of the language and generates a parse tree or an abstract syntax tree for
further analysis or code generation.

5. What is the role of a lexical analyzer in YACC?


The role of a lexical analyzer in YACC is to break down the input text into a sequence of
tokens, which are the smallest meaningful units in a programming language (e.g., keywords,
identifiers, operators, literals). YACC and the lexical analyzer work together: the lexical
analyzer tokenizes the input, and YACC's generated parser processes these tokens based on
the language's grammar rules.

Suggested Reference:
1. "Lex & Yacc" by John R. Levine, Tony Mason, and Doug Brown
2. "The Unix Programming Environment" by Brian W. Kernighan and Rob Pike

References used by the students:

https://www.geeksforgeeks.org/yacc-program-to-implement-a-calculator-and-recognize-a-valid-
arithmetic-expression/

Rubric wise marks obtained:

Rubrics | Understanding  | Grammar        | Implementation | Testing &     | Ethics    | Total
        | of YACC (2)    | Generation (2) | (2)            | Debugging (2) | (2)       |
        | Good    Avg.   | Good    Avg.   | Good    Avg.   | Good    Avg.  | Good Avg. |
        | (2)     (1)    | (2)     (1)    | (2)     (1)    | (2)     (1)   | (2)  (1)  |

Marks   |                |                |                |               |           |
Experiment No - 5
Aim: Implement a program for constructing
a. LL(1) Parser
b. Predictive Parser

Competency and Practical Skills:


• Understanding parsers and their role in compiler construction
• Ability to write First and Follow sets for a given grammar
• Ability to develop LL(1) and predictive parsers using the top-down parsing approach

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
 Understand the concept of parsers and their significance in compiler construction
 Write the First and Follow sets for a given grammar
 Implement an LL(1) parser and a predictive parser using a top-down approach

Software/Equipment: C compiler

Theory:

 LL(1) Parsing: The first L indicates that the input is scanned from left to right; the
second L indicates that the parser uses a leftmost derivation; and the 1 is the number of
lookahead symbols, i.e., how many input symbols the parser examines before making a
decision.

Essential conditions to check first are as follows:


1. The grammar is free from left recursion.
2. The grammar should not be ambiguous.
3. The grammar has to be left factored so that it is deterministic.
These conditions are necessary but not sufficient for a grammar to be LL(1).

Algorithm to construct LL(1) Parsing Table:


Step 1: First check all the essential conditions mentioned above and go to step 2.
Step 2: Calculate First() and Follow() for all non-terminals.
1. First(): the set of terminal symbols that can appear at the beginning of a string derived
from a given variable.
2. Follow(): the set of terminal symbols that can appear immediately after a variable in some
derivation.
Step 3: For each production A –> α (A derives α):
1. Find First(α) and for each terminal in First(α), make entry A –> α in the table.
2. If First(α) contains ε (epsilon) as terminal, then find the Follow(A) and for each terminal
in Follow(A), make entry A –> ε in the table.
3. If the First(α) contains ε and Follow(A) contains $ as terminal, then make entry A –> ε in
the table for the $.
In the parsing table, rows contain the non-terminals and columns contain the terminal
symbols. All ε-productions of the grammar are placed under the elements of the Follow set,
and the remaining productions are placed under the elements of the First set.

 Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer from
backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next
input symbols. To make the parser back-tracking free, the predictive parser puts some constraints
on the grammar and accepts only a class of grammar known as LL(k) grammar.

Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contain an end symbol $ to denote that the stack is empty and the
input is consumed. The parser refers to the parsing table to decide what to do for each
combination of input symbol and stack top.

In recursive descent parsing, the parser may have more than one production to choose from for a
single instance of input, whereas in a predictive parser, each step has at most one production to
choose. There might be instances where no production matches the input string, causing the
parsing procedure to fail.
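The stack-and-table mechanism described above can be sketched in C for the toy LL(1) grammar S -> aSb | ε. The grammar, the hard-coded table, and the function names are illustrative assumptions, not part of the experiment's program:

```c
#include <string.h>

/* Toy LL(1) grammar:  S -> a S b | ^     (^ denotes epsilon)
 * FIRST(aSb) = {a}, FOLLOW(S) = {b, $}, so the parse table is:
 *   M[S][a] = aSb      (placed under FIRST)
 *   M[S][b] = ^        (epsilon entry placed under FOLLOW)
 *   M[S][$] = ^
 */
static const char *lookup(char tok) {
    if (tok == 'a') return "aSb";
    if (tok == 'b' || tok == '$') return "";   /* epsilon production */
    return NULL;                               /* error entry */
}

/* Table-driven predictive parse; input must end with '$'.
 * Returns 1 if the input is of the form a^n b^n, else 0. */
int ll1_parse(const char *input) {
    char stack[128];
    int top = 0, ip = 0;
    stack[top++] = '$';      /* end marker at the bottom of the stack */
    stack[top++] = 'S';      /* start symbol on top */
    while (top > 0) {
        char X = stack[--top];           /* pop the stack top */
        char a = input[ip];              /* current lookahead */
        if (X == 'S') {                  /* non-terminal: consult the table */
            const char *rhs = lookup(a);
            if (!rhs) return 0;
            for (int k = (int)strlen(rhs) - 1; k >= 0; --k)
                stack[top++] = rhs[k];   /* push the RHS in reverse */
        } else {                         /* terminal or '$': must match input */
            if (X != a) return 0;
            if (X == '$') return 1;      /* stack empty and input consumed */
            ip++;
        }
    }
    return 0;
}
```

For example, ll1_parse("aabb$") returns 1, while ll1_parse("aab$") returns 0.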

Program-1:
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define TSIZE 128


#define MAX_RULES 10
#define MAX_SYMBOLS 100

int table[100][TSIZE];
char terminal[TSIZE];
char nonterminal[26];
int no_pro;
char first[26][TSIZE];
char follow[26][TSIZE];
char first_rhs[100][TSIZE];

struct product {
char str[100];
int len;
} pro[20];

int modifiedRules = 0;

int isNT(char c) {
return c >= 'A' && c <= 'Z';
}

void removeLeftRecursion(char production[MAX_RULES][MAX_SYMBOLS], int num) {


char modifiedGrammar[MAX_RULES * 2][MAX_SYMBOLS];
int index = 3;
char ch='Z';

for (int i = 0; i < num; i++) {


printf("\nGRAMMAR:: %s ", production[i]);
char nonTerminal = production[i][0];
char beta, alpha;
int nonTerminalCount = 0;

if (nonTerminal == production[i][index]) {
alpha = production[i][index + 1];
printf("is left recursive.\n");

while (production[i][index] != 0 && production[i][index] != '|') {


index++;
}

if (production[i][index] != 0) {

beta = production[i][index + 1];


printf("Grammar without left recursion: \n");
printf("%c->%c%c\n", nonTerminal, beta, ch);
strcpy(modifiedGrammar[modifiedRules], "");
modifiedGrammar[modifiedRules][0] = nonTerminal;
modifiedGrammar[modifiedRules][1] = '-';
modifiedGrammar[modifiedRules][2] = '>';
modifiedGrammar[modifiedRules][3] = beta;
modifiedGrammar[modifiedRules][4] = ch;
modifiedGrammar[modifiedRules][5] = '\0';
modifiedRules++;

printf("%c->%c%c|E \n", ch, alpha, ch);


strcpy(modifiedGrammar[modifiedRules], "");
modifiedGrammar[modifiedRules][0] = ch;
modifiedGrammar[modifiedRules][1] = '-';
modifiedGrammar[modifiedRules][2] = '>';
modifiedGrammar[modifiedRules][3] = alpha;
modifiedGrammar[modifiedRules][4] = ch;
modifiedGrammar[modifiedRules][5] = '|';
modifiedGrammar[modifiedRules][6] = '^';
modifiedGrammar[modifiedRules][7] = '\0';
modifiedRules++;
ch = (char)((int)(ch-1));
}

} else {
printf("is not left recursive.\n");
strcpy(modifiedGrammar[modifiedRules], production[i]);
modifiedRules++;
}

index = 3;
}

printf("\nModified Grammar:\n");
for (int i = 0; i < modifiedRules; i++) {
printf("%s\n", modifiedGrammar[i]);
}
for (int k = 0; k < modifiedRules; k++) {
// printf("=>%s\n", modifiedGrammar[k]);
int j = 0;
nonterminal[modifiedGrammar[k][0] - 'A'] = 1;
for (int i = 0; i < strlen(modifiedGrammar[k]); ++i) {
if (modifiedGrammar[k][i] == '|') {
++no_pro;
pro[no_pro - 1].str[j] = '\0';
pro[no_pro - 1].len = j;
pro[no_pro].str[0] = pro[no_pro - 1].str[0];
pro[no_pro].str[1] = pro[no_pro - 1].str[1];
pro[no_pro].str[2] = pro[no_pro - 1].str[2];
j = 3;
} else {
pro[no_pro].str[j] = modifiedGrammar[k][i];
++j;
if (!isNT(modifiedGrammar[k][i]) && modifiedGrammar[k][i] != '-' &&
modifiedGrammar[k][i] != '>') {
terminal[modifiedGrammar[k][i]] = 1;
}
}
}
pro[no_pro].len = j;
++no_pro;
}
}

void add_FIRST_A_to_FOLLOW_B(char A, char B) {


int i;
for (i = 0; i < TSIZE; ++i) {
if (i != '^')
follow[B - 'A'][i] = follow[B - 'A'][i] || first[A - 'A'][i];
}
}

void add_FOLLOW_A_to_FOLLOW_B(char A, char B) {


int i;
for (i = 0; i < TSIZE; ++i) {
if (i != '^')
follow[B - 'A'][i] = follow[B - 'A'][i] || follow[A - 'A'][i];
}
}

void FOLLOW() {
int t = 0;
int i, j, k, x;
while (t++ < no_pro) {
for (k = 0; k < 26; ++k) {
if (!nonterminal[k]) continue;
char nt = k + 'A';
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
if (nt == pro[i].str[j]) {
for (x = j + 1; x < pro[i].len; ++x) {
char sc = pro[i].str[x];
if (isNT(sc)) {
add_FIRST_A_to_FOLLOW_B(sc, nt);
if (first[sc - 'A']['^'])
continue;
} else {
follow[nt - 'A'][sc] = 1;
}

break;
}
if (x == pro[i].len)
add_FOLLOW_A_to_FOLLOW_B(pro[i].str[0], nt);
}
}
}
}
}
}

void add_FIRST_A_to_FIRST_B(char A, char B) {


int i;
for (i = 0; i < TSIZE; ++i) {
if (i != '^') {
first[B - 'A'][i] = first[A - 'A'][i] || first[B - 'A'][i];
}
}
}

void FIRST() {
int i, j;
int t = 0;
while (t < no_pro) {
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
char sc = pro[i].str[j];
if (isNT(sc)) {
add_FIRST_A_to_FIRST_B(sc, pro[i].str[0]);
if (first[sc - 'A']['^'])
continue;
} else {
if (sc == '\'') continue;
first[pro[i].str[0] - 'A'][sc] = 1;
}
break;
}
if (j == pro[i].len)
first[pro[i].str[0] - 'A']['^'] = 1;
}
++t;
}
}

void add_FIRST_A_to_FIRST_RHS__B(char A, int B) {


int i;
for (i = 0; i < TSIZE; ++i) {
if (i != '^')
first_rhs[B][i] = first[A - 'A'][i] || first_rhs[B][i];
}
}

void FIRST_RHS() {
int i, j;
int t = 0;
while (t < no_pro) {
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
char sc = pro[i].str[j];
if (isNT(sc)) {
add_FIRST_A_to_FIRST_RHS__B(sc, i);
if (first[sc - 'A']['^'])
continue;
} else {
first_rhs[i][sc] = 1;
}
break;
}
if (j == pro[i].len)
first_rhs[i]['^'] = 1;
}
++t;
}
}

int main() {
// readFromFile();
char production[MAX_RULES][MAX_SYMBOLS];
int num;

printf("Enter Number of Production: ");


scanf("%d", &num);

printf("Enter the grammar in the format 'E->E-A':\n");


for (int i = 0; i < num; i++) {
scanf("%s", production[i]);
}

removeLeftRecursion(production, num);
follow[pro[0].str[0] - 'A']['$'] = 1;
FIRST();
FOLLOW();
FIRST_RHS();
int i, j, k;

// display first of each variable


printf("\n");
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
char c = pro[i].str[0];
printf("FIRST OF %c: ", c);
for (j = 0; j < TSIZE; ++j) {
if (first[c - 'A'][j]) {
printf("%c ", j);
}
}
printf("\n");
}
}

// display follow of each variable


printf("\n");
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
char c = pro[i].str[0];
printf("FOLLOW OF %c: ", c);
for (j = 0; j < TSIZE; ++j) {
if (follow[c - 'A'][j]) {
printf("%c ", j);
}
}
printf("\n");
}
}

// display first of each variable ß


// in form A->ß
printf("\n");
for (i = 0; i < no_pro; ++i) {
printf("FIRST OF %s: ", pro[i].str);
for (j = 0; j < TSIZE; ++j) {
if (first_rhs[i][j]) {
printf("%c ", j);
}
}
printf("\n");
}

// the parse table contains '$'


// set terminal['$'] = 1
// to include '$' in the parse table
terminal['$'] = 1;

// the parse table do not read '^'


// as input
// so we set terminal['^'] = 0
// to remove '^' from terminals
terminal['^'] = 0;

// printing parse table


printf("\n");
printf("\n\t**************** LL(1) PARSING TABLE *******************\n");
printf("\t--------------------------------------------------------\n");
printf("%-10s", "");
for (i = 0; i < TSIZE; ++i) {
if (terminal[i]) printf("%-10c", i);
}
printf("\n");
int p = 0;
for (i = 0; i < no_pro; ++i) {
if (i != 0 && (pro[i].str[0] != pro[i - 1].str[0]))
p = p + 1;
for (j = 0; j < TSIZE; ++j) {
if (first_rhs[i][j] && j != '^') {
table[p][j] = i + 1;
} else if (first_rhs[i]['^']) {
for (k = 0; k < TSIZE; ++k) {
if (follow[pro[i].str[0] - 'A'][k]) {
table[p][k] = i + 1;
}
}
}
}
}
k = 0;
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
printf("%-10c", pro[i].str[0]);
for (j = 0; j < TSIZE; ++j) {
if (table[k][j]) {
printf("%-10s", pro[table[k][j] - 1].str);
} else if (terminal[j]) {
printf("%-10s", "");
}
}
++k;
printf("\n");
}
}

return 0;
}

Program-2

Observations and Conclusion:


Program -1:

In the above example, the grammar is given as input, the First and Follow sets of the
non-terminals are identified, and the LL(1) parsing table is constructed.

Program-2:

The code calculates FIRST and FOLLOW sets for a grammar and generates a predictive parsing
table. For real-world applications, parser generator tools like Bison or ANTLR are recommended
for efficiency and reliability.
Quiz:
1. What is a parser and state the Role of it?
A parser is a key component of a compiler or interpreter, responsible for analyzing the syntax of a
program written in a programming language. Its primary role is to check whether the program
follows the grammar rules of the language and to construct a parse tree or an abstract syntax tree
(AST) that represents the program's structure.

2. Types of parsers? Examples to each.


The parser is mainly classified into two categories, i.e. Top-down Parser, and Bottom-up Parser. These
are explained below:
1) Top-Down Parser:
The top-down parser is the parser that generates parse for the given input string with the help
of grammar productions by expanding the non-terminals i.e. it starts from the start symbol and
ends on the terminals. It uses left most derivation.
Further Top-down parser is classified into 2 types:
a) Recursive descent parser is also known as the Brute force parser or the backtracking
parser. It basically generates the parse tree by using brute force and backtracking.
b) Non-recursive descent parser is also known as LL(1) parser, predictive parser, or
non-backtracking parser. It uses a parsing table to generate the parse tree instead of
backtracking.
2) Bottom-up Parser:
Bottom-up Parser is the parser that generates the parse tree for the given input string with the
help of grammar productions by reducing the terminals, i.e. it starts from the terminals and
ends on the start symbol. It uses the reverse of the rightmost derivation.
Further Bottom-up parser is classified into two types:
a) LR parser is the bottom-up parser that generates the parse tree for the given string by
using unambiguous grammar. It follows the reverse of the rightmost derivation.
b) Operator precedence parser generates the parse tree from given grammar and string but
the only condition is two consecutive non-terminals and epsilon never appears on the
right-hand side of any production. The operator precedence parsing techniques can be
applied to Operator grammars.

3. What are the tools available for implementation?
There are several tools and libraries available for implementing parsers, each catering to
different parsing techniques and programming languages.
Here are some commonly used tools for parser implementation:

 YACC/Bison
 ANTLR (ANother Tool for Language Recognition)
 PLY (Python Lex-Yacc)
 JavaCC

4. How do you calculate FIRST(),FOLLOW() sets used in Parsing Table construction?


Calculating FIRST() and FOLLOW() sets for parsing table construction involves examining the
grammar's rules and symbols:

a. FIRST() Set: To calculate the FIRST() set for a symbol, you consider the possible first terminal
symbols that can be derived from that symbol. You iterate through the grammar rules, considering
epsilon (ε) for empty derivations and the FIRST() sets of other symbols. The process continues
until there are no changes to the FIRST() set.

b. FOLLOW() Set: To calculate the FOLLOW() set for a non-terminal symbol, you consider
where that non-terminal appears in the grammar rules. You examine the symbols that can follow
the non-terminal and consider the FIRST() sets of the symbols that can follow it. This process is
repeated until there are no changes to the FOLLOW() set.
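The iterate-until-nothing-changes idea in (a) can be sketched in C for a hard-coded toy grammar (S -> Ab, A -> a | ε). The rule encoding, with '^' standing for ε, is an assumption of this sketch:

```c
/* Fixed-point computation of FIRST() sets.
 * rules[] holds productions as "LHS=RHS"; '^' denotes epsilon.
 * first[X - 'A'][t] != 0 means terminal t is in FIRST(X). */
#define IS_NT(c) ((c) >= 'A' && (c) <= 'Z')

static const char *rules[] = { "S=Ab", "A=a", "A=^" };
enum { NRULES = 3 };

static int first[26][128];

void compute_first(void) {
    int changed = 1;
    while (changed) {                      /* repeat until a full pass adds nothing */
        changed = 0;
        for (int r = 0; r < NRULES; ++r) {
            int lhs = rules[r][0] - 'A';
            const char *rhs = rules[r] + 2;
            int i = 0, nullable = 1;       /* nullable: RHS prefix so far derives epsilon */
            for (; rhs[i] && nullable; ++i) {
                char s = rhs[i];
                nullable = 0;
                if (IS_NT(s)) {            /* copy FIRST(s) minus '^' into FIRST(lhs) */
                    for (int t = 0; t < 128; ++t)
                        if (t != '^' && first[s - 'A'][t] && !first[lhs][t]) {
                            first[lhs][t] = 1;
                            changed = 1;
                        }
                    if (first[s - 'A']['^'])
                        nullable = 1;      /* s can vanish: look at the next symbol */
                } else if (s == '^') {
                    nullable = 1;          /* explicit epsilon production */
                } else if (!first[lhs][(int)s]) {
                    first[lhs][(int)s] = 1;  /* terminal: it is in FIRST(lhs) */
                    changed = 1;
                }
            }
            if (nullable && !first[lhs]['^']) {
                first[lhs]['^'] = 1;       /* every RHS symbol can vanish */
                changed = 1;
            }
        }
    }
}
```

After compute_first(), FIRST(S) = {a, b} and FIRST(A) = {a, ^}.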

5. Name the most powerful parser.


The canonical LR parser (CLR) is the most powerful among the LR(k) parsers, including SLR and LALR.

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/construction-of-ll1-parsing-table/
3. http://www.cs.ecu.edu/karl/5220/spr16/Notes/Top-down/LL1.html

References used by the students:


https://www.geeksforgeeks.org/predictive-parser-in-compiler-design/

Rubric wise marks obtained:

Rubrics | Knowledge of   | Problem            | Completeness and | Code        | Presentation | Total
        | Parsing        | implementation (2) | accuracy (2)     | Quality (2) | (2)          |
        | techniques (2) |                    |                  |             |              |
        | Good    Avg.   | Good    Avg.       | Good    Avg.     | Good  Avg.  | Good   Avg.  |
        | (2)     (1)    | (2)     (1)        | (2)     (1)      | (2)   (1)   | (2)    (1)   |

Marks   |                |                    |                  |             |              |

Experiment No - 06

Aim: Implement a program for constructing


a. Recursive Descent Parser (RDP)
b. LALR Parser

Competency and Practical Skills:


• Understanding of RDP and bottom-up parsers and their roles in compiler construction
• Ability to check acceptance of a string through RDP and parsing of a string using an LALR
parser for a given grammar
• Ability to develop an RDP (top-down) and an LALR parser (bottom-up)

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
 Understand RDP, the broad classification of bottom-up parsers, and their significance in
compiler construction
 Verify whether a string is accepted by the RDP, and parse a given grammar using LR
parsers
 Implement an RDP and an LALR parser

Software/Equipment: C compiler
Theory:

 Recursive Descent Parser:

Recursive Descent Parser uses the technique of top-down parsing without backtracking. It can
be defined as a parser that uses recursive procedures to process the input string, with no
backtracking. It can be implemented simply in any language that supports recursion. The first
symbol of the string on the R.H.S. of a production uniquely determines the correct alternative
to choose.
The major approach of recursive-descent parsing is to associate each non-terminal with a
procedure. The objective of each procedure is to read a sequence of input characters that can be
produced by the corresponding non-terminal, and return a pointer to the root of the parse tree
for that non-terminal. The structure of each procedure is prescribed by the productions for the
corresponding non-terminal.
The recursive procedures are simple to write and adequately effective if written in a language
that implements procedure calls efficiently. There is a procedure for each non-terminal in the
grammar. A global variable lookahead holds the current input token, and a procedure
match(ExpectedToken) recognizes the next token and advances the input stream pointer, so
that lookahead points to the next token to be parsed. match() is effectively a call to the lexical
analyzer to get the next token.
For example, input stream is a + b$.
lookahead == a
match()
lookahead == +
match ()
lookahead == b
……………………….
……………………….
In this manner, parsing can be done.
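The lookahead/match() scheme narrated above can be sketched in C for the small illustrative grammar E -> id R, R -> + id R | ε, where a single lowercase letter stands for id. The grammar and all names here are assumptions for demonstration:

```c
/* Recursive-descent sketch: one procedure per non-terminal.
 * Grammar:  E -> id R      R -> + id R | epsilon
 * 'lookahead' holds the current token; next_token()/match()
 * play the role of the lexical-analyzer calls described above. */
static const char *src;      /* remaining input */
static char lookahead;       /* current token; '$' marks end of input */

static void next_token(void) { lookahead = *src ? *src++ : '$'; }

static int match(char expected) {
    if (lookahead != expected) return 0;
    next_token();            /* advance the input pointer */
    return 1;
}

static int is_id(char c) { return c >= 'a' && c <= 'z'; }

static int R(void) {
    if (lookahead == '+') {          /* R -> + id R */
        if (!match('+')) return 0;
        if (!is_id(lookahead)) return 0;
        next_token();
        return R();
    }
    return 1;                        /* R -> epsilon */
}

static int E(void) {                 /* E -> id R */
    if (!is_id(lookahead)) return 0;
    next_token();
    return R();
}

/* Returns 1 iff s is a '+'-separated list of single-letter ids. */
int parse_expr(const char *s) {
    src = s;
    next_token();
    return E() && lookahead == '$';
}
```

For example, parse_expr("a+b") returns 1, while parse_expr("a+") returns 0.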

 LALR (1) Parsing:

LALR refers to the lookahead LR. To construct the LALR (1) parsing table, we use the
canonical collection of LR (1) items.
In the LALR (1) parsing, the LR (1) items which have same productions but different look
ahead are combined to form a single set of items
LALR (1) parsing is same as the CLR (1) parsing, only difference in the parsing table.
Example
S → AA
A → aA
A→b
Add Augment Production, insert '•' symbol at the first position for every production in G
and also add the look ahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add the Augment production to the I0 State and compute the Closure:
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "•" is followed by the non-
terminal. So, the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "•" is followed by the non-
terminal. So, the I3 State becomes
I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "•" is followed by the non-
terminal. So, the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) A → aA•, $
If we analyze then LR (0) items of I3 and I6 are same but they differ only in their
lookahead.
I3 = { A → a•A, a/b
A → •aA, a/b
A → •b, a/b
}
I6= { A → a•A, $
A → •aA, $
A → •b, $
}
Clearly I3 and I6 are same in their LR (0) items but differ in their lookahead, so we can
combine them and called as I36.
I36 = { A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$
}
The I4 and I7 are same but they differ only in their look ahead, so we can combine them
and called as I47.
I47 = {A → b•, a/b/$}
The I8 and I9 are same but they differ only in their look ahead, so we can combine them
and called as I89.
I89 = {A → aA•, a/b/$}
Drawing DFA:

LALR (1) Parsing table:

Program-1:

#include <iostream>
#include <cstdio>

char input[100];
int i;

int S();
int A();
int B();

int main() {
std::cout << "\nRecursive descent parsing for the following grammar" << std::endl;
std::cout << "\nS -> Aa | bAc | bBa \nA -> d \nB -> d" << std::endl;
std::cout << "\nEnter the string to be checked: ";

std::cin.getline(input, sizeof(input)); // gets() is unsafe and was removed in C11/C++11

if (S()) {
if (input[i] == '\0')
std::cout << "\nString is accepted" << std::endl;
else
std::cout << "\nString is not accepted" << std::endl;
} else
std::cout << "\nString not accepted" << std::endl;

return 0;
}
int S() {
    if (input[i] == 'b') { // S -> bAc | bBa (A and B both derive 'd')
        i++;
        if (B() || A()) {
            if (input[i] == 'c' || input[i] == 'a') {
                i++;
                return 1;
            }
        }
        return 0; // this branch previously fell through without returning
    } else { // S -> Aa
        if (A() && input[i] == 'a') {
            i++;
            return 1;
        }
        return 0;
    }
}
int A(){
if(input[i] == 'd') {
i++;
return 1;
}
else return 0;
}

int B(){
if(input[i] == 'd'){
i++;
return 1;
}else return 0;
}

Program-2:

Observations and Conclusion:

Program -1:

In the above output, as per the grammar provided, the input string is matched against the
recursive calling procedures; strings that are successfully parsed are accepted and the others
are rejected.
Program-2:

//Write your output and conclusion here as given in program-2

Quiz:
1. What do you mean by shift reduce parsing?
2. Provide broad classification of LR parsers.
3. Differentiate RDP and LALR parser.
4. How to do merging of itemsets?

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/recursive-descent-parser/
3. https://www.youtube.com/watch?v=odoHgcoombw
4. https://www.geeksforgeeks.org/lalr-parser-with-examples/

Rubric wise marks obtained:


Rubrics | Knowledge of   | Problem            | Completeness and | Code        | Presentation | Total
        | Parsing        | implementation (2) | accuracy (2)     | Quality (2) | (2)          |
        | techniques (2) |                    |                  |             |              |
        | Good    Avg.   | Good    Avg.       | Good    Avg.     | Good  Avg.  | Good   Avg.  |
        | (2)     (1)    | (2)     (1)        | (2)     (1)      | (2)   (1)   | (2)    (1)   |

Marks   |                |                    |                  |             |              |

Experiment No - 07

Aim: Implement a program for constructing Operator Precedence Parsing.

Competency and Practical Skills:


• Understanding of OPG and its role in compiler construction
• Ability to write a precedence table for a given language
• Ability to develop an OPG program using a C compiler

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
 Understand the concept of OPG and its significance in compiler construction
 Write precedence relations for a grammar
 Implement an OPG parser using a C compiler

Software/Equipment: C compiler

Theory:

Operator Precedence Parsing is also a type of bottom-up parsing that can be applied to a class of
grammars known as Operator Grammars.

A Grammar G is Operator Grammar if it has the following properties −


 A production should not contain ϵ on its right side.
 There should not be two adjacent non-terminals on the right side of any production.
Example 1 − Verify whether the following Grammar is an operator Grammar or not.
E → E A E |(E)|id
A → +| − | ∗
Solution
No, it is not an operator Grammar, as it does not satisfy property 2 of operator Grammars:
it contains two adjacent non-terminals on the R.H.S. of the production E → E A E.
We can convert it into the operator Grammar by substituting the value of A in E → E A E.
E → E + E |E − E |E * E |(E) | id.
Operator Precedence Relations
Three precedence relations exist between the pair of terminals.

Relation Meaning

p <. q p has lower precedence than q (p yields precedence to q).

p .> q p has higher precedence than q (p takes precedence over q).

p =. q p has equal precedence to q.


Depending upon these precedence Relations, we can decide which operations will be executed or
parsed first.
Association and Precedence Rules
 If operators have different precedence:
Since * has higher precedence than +
Example − In the statement a + b * c
∴ + <. *
In the statement a * b + c
∴ * .> +
 If operators have equal precedence, then use associativity rules.
(a) Example − In the statement a + b + c, the + operators have equal precedence.
As '+' is left associative in a + b + c,
∴ (a + b) will be computed first, and then it will be added to c,
i.e., (a + b) + c
∴ + .> +
Similarly, '*' is left associative in a * b * c.
(b) Example − In the statement a ↑ b ↑ c, ↑ is a right associative operator.
∴ It will become a ↑ (b ↑ c)
∴ (b ↑ c) will be computed first.
∴ ↑ <. ↑
 An identifier has higher precedence than all operators and symbols.
∴ θ <. id    $ <. id
  id .> θ    id .> $
  id .> )    ( <. id
 $ has lower precedence than all other operators and symbols.
$ <. (    $ <. +    $ <. *
id .> $   ) .> $

Example 2– Construct the Precedence Relation table for the Grammar.


E → E + E | E ∗ E | id
Solution
Operator-Precedence Relations

       id     +      *      $
id            .>     .>     .>
+      <.     .>     <.     .>
*      <.     .>     .>     .>
$      <.     <.     <.


Advantages of Operator Precedence Parsing
 It is easy to implement.
Disadvantages of Operator Precedence Parsing
 An operator like minus can be unary or binary, so such an operator can have different
precedences in different statements.
 Operator Precedence Parsing applies to only a small class of Grammars.

Program:

#include <cctype>
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;

// Report that the grammar violates a property and stop
void fail() {
    cout << "Not operator grammar" << endl;
    exit(0);
}

int main() {
    string grm[20];
    int n;

    // Taking the number of productions from the user
    cout << "Enter the number of productions:: ";
    cin >> n;

    cout << "Enter the productions (e.g. E=E+E; use $ for epsilon)::\n";
    for (int i = 0; i < n; i++)
        cin >> grm[i];

    for (int i = 0; i < n; i++) {
        // The right-hand side starts after "X=" at index 2
        for (int j = 2; j < (int)grm[i].size(); j++) {
            char c = grm[i][j];
            // Property 1: no epsilon ('$') on the right side
            if (c == '$')
                fail();
            // Property 2: no two adjacent non-terminals (uppercase letters)
            if (isupper(c) && j + 1 < (int)grm[i].size() && isupper(grm[i][j + 1]))
                fail();
        }
    }

    cout << "Operator grammar" << endl;
    return 0;
}

Observations and Conclusion:

In the first example, the grammar is analysed against the operator grammar rules; since the
input violates them, it is reported as not an operator grammar.

In the second example, the grammar satisfies the rules of OPG (every operator appears between
non-terminals and there are no ε-productions), so it is reported as an operator grammar.

Quiz:
1. Define operator grammar.
Operator grammar is a formalism used in the field of formal languages and parsing theory to describe
the syntax of programming languages or other formal languages. In operator grammar, the grammar
rules are defined based on operators and their associated precedence and associativity. It is particularly
useful for languages that use infix notation (e.g., arithmetic expressions) and need to specify how
operators should be parsed.

2. Define operator precedence grammar.


Operator precedence grammar is a type of operator grammar that specifies not only the operators and
their associativity but also their precedence levels. It is used to resolve ambiguities in expressions by
indicating which operators should be evaluated first. Operator precedence grammar rules help in
parsing and determining the correct order of operations in expressions.

3. What are the different precedence relations in operator precedence parser?


In operator precedence parsing, there are typically three precedence relations used to determine how
operators interact:
a. Higher-Than Relation (HTR): This relation indicates that one operator has higher precedence than
another, meaning it should be evaluated before the other operator in an expression.

b. Lower-Than Relation (LTR): This relation indicates that one operator has lower precedence than
another, meaning it should be evaluated after the other operator in an expression.

c. Equal-Precedence Relation (EPR): This relation indicates that two operators have equal precedence,
and their associativity is used to determine the evaluation order.

4. What different methods are available to determine relations?

Determining precedence relations in operator precedence parsing typically involves defining them
explicitly in the grammar or using a table-based approach. Here are two common methods:

a. Grammar Rules: Precedence relations can be defined in the grammar rules themselves. For example,
by using production rules like:

E -> E '+' E (HTR)


E -> E '-' E (HTR)

b. Precedence Table: A table-based approach can be used to specify precedence relations. This table
defines the relationships between operators and can be consulted during parsing to resolve ambiguities.
The table might indicate which operators have higher precedence (HTR), lower precedence (LTR), or
are equal in precedence (EPR).

5. What do you mean by precedence function?


The precedence function, often referred to as the "precedence relation function," is a key component in
operator precedence parsing. It is used to determine the precedence relation between operators when
parsing expressions. The function takes two operators as input and returns whether the first operator
has higher precedence (HTR), lower precedence (LTR), or equal precedence (EPR) compared to the
second operator.
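As an illustration, one valid (non-unique) pair of precedence functions f and g consistent with the Example 2 table earlier in this experiment is:

```
        f     g
id      4     5
+       2     1
*       4     3
$       0     0
```

Here f(a) < g(b) encodes a <. b and f(a) > g(b) encodes a .> b; for instance, f(+) = 2 < g(*) = 3 gives + <. *, while f(*) = 4 > g(+) = 1 gives * .> +.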

Suggested Reference:
1. https://www.gatevidyalay.com/operator-precedence-parsing/
2. https://www.geeksforgeeks.org/role-of-operator-precedence-parser/

Rubric wise marks obtained:


Rubrics: Knowledge of parsing techniques (2) | Knowledge of precedence table (2) | Implementation (2) | Completeness and accuracy (2) | Presentation (2) | Total
Scale for each rubric: Good (2) / Avg. (1)

Marks

Experiment No - 08

Aim: Generate 3-tuple intermediate code for given infix expression.

Competency and Practical Skills:


• Understanding of intermediate code generation and its role in compiler construction
• Ability to write intermediate code for given infix expression.

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
 Understand the different intermediate code representations and its significance in compiler
construction
 Write intermediate code for given infix expression

Software/Equipment: C compiler

Theory:

Three-address code is a type of intermediate code which is easy to generate and can be easily
converted to machine code. It uses at most three addresses and one operator to represent an
expression, and the value computed at each instruction is stored in a temporary variable
generated by the compiler. The compiler decides the order of operations when generating
three-address code.
Three address code is used in compiler applications:

Optimization: Three address code is often used as an intermediate representation of code


during optimization phases of the compilation process. The three address code allows the
compiler to analyze the code and perform optimizations that can improve the performance of the
generated code.
Code generation: Three address code can also be used as an intermediate representation of
code during the code generation phase of the compilation process. The three address code
allows the compiler to generate code that is specific to the target platform, while also ensuring
that the generated code is correct and efficient.
Debugging: Three address code can be helpful in debugging the code generated by the
compiler. Since three address code is a low-level language, it is often easier to read and
understand than the final generated code. Developers can use the three address code to trace the
execution of the program and identify errors or issues that may be present.
Language translation: Three address code can also be used to translate code from one
programming language to another. By translating code to a common intermediate
representation, it becomes easier to translate the code to multiple target languages.
General representation –
a = b op c
Where a, b or c represents operands like names, constants or compiler generated temporaries
and op represents the operator
Example-1: Convert the expression a * – (b + c) into three address code.
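A plausible answer, using the uminus notation adopted later in this manual (the temporary names are illustrative):

```
t1 = b + c
t2 = uminus t1
t3 = a * t2
```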

Example-2: Write three address code for following code


for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}
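One common translation of this loop into three-address code (the label names and the 4-byte element size are assumptions made for illustration):

```
      i = 1
L1:   if i > 10 goto L2
      t1 = x * 5
      t2 = i * 4        ; byte offset into a
      a[t2] = t1
      i = i + 1
      goto L1
L2:
```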
Implementation of Three Address Code –
There are 3 representations of three address code namely
1. Quadruple
2. Triples
3. Indirect Triples
1. Quadruple – It is a structure which consists of 4 fields, namely op, arg1, arg2 and result. op
denotes the operator, arg1 and arg2 denote the two operands, and result is used to store the
result of the expression.
Advantage –
 Easy to rearrange code for global optimization.
 One can quickly access value of temporary variables using symbol table.
Disadvantage –
 Contains a lot of temporaries.
 Temporary variable creation increases time and space complexity.
Example – Consider expression a = b * – c + b * – c. The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5

2. Triples – This representation doesn’t use an extra temporary variable to represent a single
operation; instead, when a reference to another triple’s value is needed, a pointer to that triple
is used. So, it consists of only three fields, namely op, arg1 and arg2.
Disadvantage –
 Temporaries are implicit and it is difficult to rearrange code.
 It is difficult to optimize, because optimization involves moving intermediate code:
when a triple is moved, any other triple referring to it must be updated as well. (With
the help of a pointer, one can directly access a symbol table entry.)
Example – Consider expression a = b * – c + b * – c
3. Indirect Triples – This representation makes use of pointers to a separately stored list of all
references to computations. It is similar in utility to the quadruple representation but requires
less space. Temporaries are implicit and it is easier to rearrange code.

Example – Consider expression a = b * – c + b * – c

Question – Write quadruple, triples and indirect triples for following expression : (x + y) * (y +
z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
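The first two representations for this code can be sketched as follows (the entry indices and temporary names are illustrative):

```
Quadruples:                  Triples:
op  arg1  arg2  result       #    op  arg1  arg2
+   x     y     t1           (0)  +   x     y
+   y     z     t2           (1)  +   y     z
*   t1    t2    t3           (2)  *   (0)   (1)
+   t1    z     t4           (3)  +   (0)   z
+   t3    t4    t5           (4)  +   (2)   (3)

Indirect triples: a separately stored list of pointers, one per statement,
each referring to one of the triples (0)-(4) above.
```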

Program:
#include <iostream>
#include <string>
#include <stack>
#include <vector>
#include <iomanip>
#include <cctype>
using namespace std;

int precedence(char op) {


if (op == '+' || op == '-')
return 1;
if (op == '*' || op == '/')
return 2;
return 0;
}

struct Quadruple {
string op;
string arg1;
string arg2;
string result;
};

string infixToPostfix(string expression) {


stack<char> operators;
string postfix;

for (char c : expression) {


if (isalnum(c)) {
postfix += c;
} else if (c == '(') {
operators.push(c);
} else if (c == ')') {
while (!operators.empty() && operators.top() != '(') {
postfix += operators.top();
operators.pop();
}
operators.pop(); // Pop the '('
} else {
while (!operators.empty() && precedence(operators.top()) >= precedence(c)) {
postfix += operators.top();
operators.pop();
}
operators.push(c);
}
}

while (!operators.empty()) {
postfix += operators.top();
operators.pop();
}

return postfix;
}

vector<Quadruple> generateQuadruples(string postfix) {


stack<string> operands;
int tempVarCount = 1;

vector<Quadruple> quadruples;

for (char c : postfix) {


if (isalnum(c)) {
string operand(1, c);
operands.push(operand);
} else {
string op2 = operands.top();
operands.pop();
string op1 = operands.top();
operands.pop();

string tempVar = "t" + to_string(tempVarCount);


tempVarCount++;
string result = tempVar;

Quadruple quad;
quad.op = c;
quad.arg1 = op1;
quad.arg2 = op2;
quad.result = result;

quadruples.push_back(quad);

operands.push(result);
}
}
return quadruples;
}

void displayQuadruples(vector<Quadruple> quadruples) {


cout << "Quadruples:" << endl;
cout << setw(10) << "OP" << setw(10) << "Arg1" << setw(10) << "Arg2" << setw(10) <<
"Result" << endl;
for (const Quadruple& quad : quadruples) {
cout << setw(10) << quad.op << setw(10) << quad.arg1 << setw(10) << quad.arg2 <<
setw(10) << quad.result << endl;
}
}

int main() {
string expression;
cout << "Enter an infix expression: ";
cin >> expression;

string postfix = infixToPostfix(expression);


cout << "Postfix notation: " << postfix << endl;

vector<Quadruple> quadruples = generateQuadruples(postfix);

cout << "\nIntermediate Code:" << endl;


for (const Quadruple& quad : quadruples) {
cout << quad.result << " := " << quad.arg1 << " " << quad.op << " " << quad.arg2 << endl;
}

displayQuadruples(quadruples);

return 0;
}

Observations and Conclusion:

In the above example, the user enters an infix expression and the program generates the
corresponding intermediate code (three-address code) in quadruple form.

Quiz:
1. What are the different implementation methods for three-address code?
Three-address code (TAC) is an intermediate representation used in compilers to simplify the
translation of high-level programming languages into machine code. There are different methods to
implement TAC:
Quadruple: A quadruple is a four-element data structure used to represent an operation in three-
address code. It consists of an operator, two source operands (arg1, arg2), and a result field.
Triple: A triple is similar to a quadruple but contains only three elements: an operation, a source
operand, and a destination operand. Triples are often used for code generation and can represent simple
operations or assignments.
Indirect Triple: An indirect triple is a variation of the triple in which the source and destination
operands may contain references to memory locations rather than immediate values or variables.
Indirect triples are used when dealing with indirect addressing or complex memory operations.
2. What do you mean by quadruple?

A quadruple is a four-element data structure used to represent an operation in three-address code.


It consists of the following components:

1. Operation: The operation to be performed (e.g., ADD, SUB, MUL, ASSIGN).


2. Operand1: The first source operand.
3. Operand2: The second source operand.
4. Result: The destination operand where the result of the operation is stored.

3. Define triple and indirect triple.


Triple: A triple is similar to a quadruple but contains only three elements: an operation, a source
operand, and a destination operand. Triples are often used for code generation and can represent simple
operations or assignments.
Indirect Triple: An indirect triple is a variation of the triple in which the source and destination
operands may contain references to memory locations rather than immediate values or variables.
Indirect triples are used when dealing with indirect addressing or complex memory operations.

4. What is abstract syntax tree?


An abstract syntax tree is a hierarchical representation of the syntactic structure of a program or
expression in a programming language. It is a tree-like data structure where each node represents a
language construct, and the edges between nodes represent the relationships between these constructs.
The AST captures the abstract, high-level structure of the code, omitting many of the low-level details
found in parse trees or source code.

5. Differentiate parse tree and abstract syntax tree.


Aspect             | Parse Tree             | Abstract Syntax Tree (AST)
Purpose            | Detailed syntax        | High-level program structure
Nodes              | Many nodes             | Fewer nodes, abstracted syntax
Generated during   | Parsing phase          | Later compilation stages
Use in compilation | Syntax analysis        | Semantic analysis, optimization
Structure          | Reflects grammar rules | Emphasizes program logic
Complexity         | More complex           | Simpler, abstracted
Application        | Grammar adherence      | High-level program analysis

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. https://www.geeksforgeeks.org/introduction-to-intermediate-representationir/
3. https://cs.lmu.edu/~ray/notes/ir/

Rubric wise marks obtained:


Rubrics: Knowledge (2) | Problem Recognition (2) | Logic Building (2) | Completeness and accuracy (2) | Ethics (2) | Total
Scale for each rubric: Good (2) / Avg. (1)

Marks

Experiment No - 09

Aim: Extract Predecessor and Successor from given Control Flow Graph.

Competency and Practical Skills:


• Understanding of Control flow structure in compilers
• Ability to write predecessor and successor for given graph

Relevant CO: CO3

Objectives:
By the end of this experiment, the students should be able to:
 Understand the concept control structure (in blocks) in compiler
 Write the predecessor and successor for given graph.

Software/Equipment: C/C++ compiler

Theory:

A basic block is a straight-line sequence of statements. Apart from its entry and exit, a basic
block has no branches into or out of it: the flow of control enters at the beginning and always
leaves at the end without halting or branching. The instructions of a basic block always execute
in sequence.
The first step is to divide the three-address code into basic blocks. A new basic block always
begins with the first instruction and continues to add instructions until it reaches a jump or a
label. If no jumps or labels are encountered, control flows from one instruction to the next in
sequential order.
The algorithm for the construction of the basic block is described below step by step:
Algorithm: The algorithm used here is partitioning the three-address code into basic blocks.
Input: A sequence of three-address codes will be the input for the basic blocks.
Output: A list of basic blocks with each three address statements, in exactly one block, is
considered as the output.
Method: We’ll start by identifying the intermediate code’s leaders. The following are some
guidelines for identifying leaders:
1. The first instruction in the intermediate code is generally considered as a leader.
2. The instructions that target a conditional or unconditional jump statement can be
considered as a leader.
3. Any instructions that are just after a conditional or unconditional jump statement can
be considered as a leader.
Each leader’s basic block will contain all of the instructions from the leader until the instruction
right before the following leader’s start.
Example of basic block:
Three Address Code for the expression a = b + c – d is:
T1 = b + c
T2 = T1 - d
a = T2
This represents a basic block in which all the statements execute in a sequence one after the
other.
Basic Block Construction:
Let us understand the construction of basic blocks with an example:
Example:
1. PROD = 0
2. I = 1
3. T2 = addr(A) – 4
4. T4 = addr(B) – 4
5. T1 = 4 x I
6. T3 = T2[T1]
7. T5 = T4[T1]
8. T6 = T3 x T5
9. PROD = PROD + T6
10. I = I + 1
11. IF I <=20 GOTO (5)
Using the algorithm given above, we can easily identify the basic blocks in the above
three-address code.
There are two basic blocks:
 B1 – Statements 1 to 4
 B2 – Statements 5 to 11
Transformations on Basic blocks:
Transformations can be applied to a basic block. During a transformation, we must not change
the set of expressions computed by the block.
There are two types of basic block transformations. These are as follows:
1. Structure-Preserving Transformations
Structure preserving transformations can be achieved by the following methods:
1. Common sub-expression elimination
2. Dead code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements
2. Algebraic Transformations
In the case of algebraic transformation, we basically change the set of expressions into an
algebraically equivalent set.
For example, an expression
x := x + 0
or x := x * 1
can be eliminated from a basic block without changing the set of expressions it computes.
Flow Graph:
A flow graph is simply a directed graph. For a set of basic blocks, a flow graph shows the
flow-of-control information. A control flow graph depicts how program control is passed among
the blocks. Once the intermediate code has been partitioned into basic blocks, a flow graph
illustrates the flow of control between them: an edge is drawn from block X to block Y when
the first instruction of Y can follow the last instruction of X.
Let’s make the flow graph of the example that we used for basic block formation:

Flow Graph for above Example

Firstly, we compute the basic blocks (which is already done above). Secondly, we assign the
flow control information.

Program:
#include <iostream>

struct TreeNode {
int data;
TreeNode* left;
TreeNode* right;

TreeNode(int val) : data(val), left(nullptr), right(nullptr) {}


};

class BST {
public:
TreeNode* root;

BST() : root(nullptr) {}

// Function to insert a new node into the BST


TreeNode* insert(TreeNode* node, int data) {
if (node == nullptr) {
return new TreeNode(data);
}

if (data < node->data) {


node->left = insert(node->left, data);
} else if (data > node->data) {
node->right = insert(node->right, data);
}
return node;
}

void insert(int data) {


root = insert(root, data);
}

// Function to find the minimum value node (successor) in the BST


TreeNode* findMin(TreeNode* node) {
while (node->left != nullptr) {
node = node->left;
}
return node;
}

// Function to find the maximum value node (predecessor) in the BST


TreeNode* findMax(TreeNode* node) {
while (node->right != nullptr) {
node = node->right;
}
return node;
}

// Function to find predecessor and successor of a given node


void findPredecessorSuccessor(TreeNode* root, int key, TreeNode*& predecessor,
TreeNode*& successor) {
if (root == nullptr) {
return;
}

// If key is found in the BST


if (root->data == key) {
// The predecessor is the maximum node in the left subtree
if (root->left != nullptr) {
predecessor = findMax(root->left);
}
// The successor is the minimum node in the right subtree
if (root->right != nullptr) {
successor = findMin(root->right);
}
} else if (key < root->data) {
// If the key is smaller, search in the left subtree
successor = root;
findPredecessorSuccessor(root->left, key, predecessor, successor);
} else {
// If the key is larger, search in the right subtree
predecessor = root;
findPredecessorSuccessor(root->right, key, predecessor, successor);
}
}
};

int main() {
BST tree;
tree.insert(50);
tree.insert(30);
tree.insert(70);
tree.insert(20);
tree.insert(40);
tree.insert(60);
tree.insert(80);

int key = 40;


TreeNode* predecessor = nullptr;
TreeNode* successor = nullptr;

tree.findPredecessorSuccessor(tree.root, key, predecessor, successor);

if (predecessor != nullptr) {
std::cout << "Predecessor: " << predecessor->data << std::endl;
} else {
std::cout << "No predecessor for " << key << std::endl;
}

if (successor != nullptr) {
std::cout << "Successor: " << successor->data << std::endl;
} else {
std::cout << "No successor for " << key << std::endl;
}

return 0;
}

Observations and Conclusions:

In the above example, the user gets the predecessor and successor of a given node.

Quiz:
1. What is flowgraph?
A flow graph is a directed graph containing control flow information for the set of
fundamental blocks. A control flow graph shows how program control is parsed among the
blocks.

2. Define DAG.
A directed acyclic graph (DAG) is a conceptual representation of a series of activities. The
order of the activities is depicted by a graph, which is visually presented as a set of circles,
each one representing an activity, some of which are connected by lines, which represent the
flow from one activity to another.

3. Define Backpatching .
Backpatching is basically a process of fulfilling unspecified information. This information is
of labels. It basically uses the appropriate semantic actions during the process of code
generation. It may indicate the address of the Label in goto statements while producing TACs
for the given expressions.

4. What is concept of local transformation on a block of code?


Local transformations in the context of compiler optimization refer to code transformations
applied to a basic block or a localized portion of code within a program. These
transformations are limited in scope to a specific block or region and do not consider the entire
program's context. The goal of local transformations is to improve the efficiency and quality
of code without altering the program's overall behavior.

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman
2. https://www.geeksforgeeks.org/data-flow-analysis-compiler/

Rubric wise marks obtained:


Rubrics: Knowledge (2) | Problem Recognition (2) | Implementation (2) | Correctness (2) | Documentation and Presentation (2) | Total
Scale for each rubric: Good (2) / Avg. (1)

Marks
Experiment No - 10
Aim: Study of Learning Basic Block Scheduling Heuristics from Optimal Data.

Competency and Practical Skills:


 Knowledge of basic computer architecture concepts
 Understanding of compiler design and optimization techniques
 Ability to analyze and interpret performance data
 Proficiency in programming languages such as C/C++ and assembly language

Relevant CO: CO4

Objectives:
By the end of this experiment, the students should be able to:
 Understanding the concept of basic block scheduling and its importance in compiler
optimization.
 Understanding the various heuristics used for basic block scheduling.
 Analyzing optimal data to learn the basic block scheduling heuristics.
 Comparing the performance of the implemented basic block scheduler with other
commonly used basic block schedulers.

Software/Equipment: Computer system

Theory:
Instruction scheduling is an important step for improving the performance of object code
produced by a compiler. Basic block scheduling is important in its own right and also as a
building block for scheduling larger groups of instructions such as superblocks. The basic block
instruction scheduling problem is to find a minimum-length schedule for a basic block (a
straight-line sequence of code with a single entry point and a single exit point) subject to
precedence, latency, and resource constraints. Solving the problem exactly is known to be difficult, and most
compilers use a greedy list scheduling algorithm coupled with a heuristic. The heuristic is usually
hand-crafted, a potentially time-consuming process. Modern architectures are pipelined and can
issue multiple instructions per time cycle. On such processors, the order that the instructions are
scheduled can significantly impact performance.

The basic block instruction scheduling problem is to find a minimum-length schedule for a basic
block (a straight-line sequence of code with a single entry point and a single exit point) subject
to precedence, latency, and resource constraints. Instruction scheduling for basic blocks is known to
be NP-complete for realistic architectures. The most popular method for scheduling basic blocks
continues to be list scheduling.

For example, consider multiple-issue pipelined processors. On such processors, there are multiple
functional units and multiple instructions can be issued (begin execution) each clock cycle.
Associated with each instruction is a delay or latency between when the instruction is issued and
when the result is available for other instructions which use the result. In this paper, we assume
that all functional units are fully pipelined and that instructions are typed. Examples of types of
instructions are load/store, integer, floating point, and branch instructions. We use the standard
labelled directed acyclic graph (DAG) representation of a basic block (see Figure 1(a)). Each node
corresponds to an instruction and there is an edge from i to j labelled with a positive integer l (i, j)
if j must not be issued until i has executed for l (i, j) cycles. Given a labelled dependency DAG for

a basic block, a schedule for a multiple-issue processor specifies an issue or start time for each
instruction or node such that the latency constraints are satisfied and the resource constraints are
satisfied. The latter are satisfied if, at every time cycle, the number of instructions of each type
issued at that cycle does not exceed the number of functional units that can execute instructions of
that type. The length of a schedule is the number of cycles needed for the schedule to complete;
i.e., each instruction has been issued at its start time and, for each instruction with no successors,
enough cycles have elapsed that the result for the instruction is available. The basic block
instruction scheduling problem is to construct a schedule with minimum length.

Instruction scheduling for basic blocks is known to be NP-complete for realistic architectures. The
most popular method for scheduling basic blocks continues to be list scheduling. A list scheduler
takes a set of instructions as represented by a dependency DAG and builds a schedule using a
best-first greedy heuristic. A list scheduler generates the schedule by determining all instructions
that can be scheduled at that time step, called the ready list, and uses the heuristic to determine the
best instruction on the list. The selected instruction is then added to the partial schedule and the
scheduler determines if any new instructions can be added to the ready list.

The heuristic in a list scheduler generally consists of a set of features and an order for testing the
features. Some standard features are as follows. The path length from a node i to a node j in a
DAG is the maximum number of edges along any path from i to j. The critical-path distance from
a node i to a node j in a DAG is the maximum sum of the latencies along any path from i to j.
Note that both the path length and the critical-path distance from a node i to itself are zero. A node j
is a descendant of a node i if there is a directed path from i to j; if the path consists of a single
edge, j is also called an immediate successor of i. The earliest start time of a node i is a lower
bound on the earliest cycle in which the instruction i can be scheduled.
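Assuming the same (i, j, latency) edge representation, the critical-path distance feature can be computed by a memoized search over the DAG; path length is the same computation with every latency replaced by 1. This is an illustrative sketch, not code from the cited work.

```python
from functools import lru_cache

def critical_path_distance(edges, src, dst):
    """Maximum sum of latencies along any path src -> dst in a DAG.

    Returns 0 if src == dst and None if dst is unreachable from src.
    edges: (i, j, latency) triples.
    """
    succs = {}
    for i, j, lat in edges:
        succs.setdefault(i, []).append((j, lat))

    @lru_cache(maxsize=None)
    def dist(u):
        if u == dst:
            return 0                      # distance from a node to itself
        best = None
        for v, lat in succs.get(u, []):   # take the costliest outgoing path
            d = dist(v)
            if d is not None and (best is None or lat + d > best):
                best = lat + d
        return best

    return dist(src)
```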

In supervised learning of a classifier from examples, one is given a training set of instances, where
each instance is a vector of feature values and the correct classification for that instance, and is to
induce a classifier from the instances. The classifier is then used to predict the class of instances
that it has not seen before. Many algorithms have been proposed for supervised learning. One of
the most widely used is decision tree learning. In a decision tree the internal nodes of the tree are
labelled with features, the edges to the children of a node are labelled with the possible values of
the feature, and the leaves of the tree are labelled with a classification. To classify a new example,
one starts at the root and repeatedly tests the feature at a node and follows the appropriate branch
until a leaf is reached. The label of the leaf is the predicted classification of the new example.
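The classification walk described above can be sketched with a tree encoded as nested (feature, branches) pairs; the feature names and labels in the example are invented for illustration only.

```python
def classify(tree, instance):
    """Walk a decision tree down to a leaf.

    tree:     either a leaf label, or a (feature_name, {value: subtree})
              pair for an internal node
    instance: dict mapping feature names to this instance's values
    """
    while isinstance(tree, tuple):          # internal node
        feature, branches = tree
        tree = branches[instance[feature]]  # follow the matching edge
    return tree                             # leaf = predicted class
```

For instance, a two-level tree over hypothetical features "max_delay" and "path_len" routes each instance to a scheduling decision at a leaf.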

Algorithm:

This experiment focuses on automatically learning a good heuristic for basic block scheduling using
supervised machine learning techniques. The novelty of the approach lies in the quality of the
training data: training instances were obtained from very large basic blocks, and an extensive and
systematic analysis was performed to identify the best features and to synthesize new ones. The
emphasis is on learning a simple yet accurate heuristic.

Observations and Conclusion:

 Instruction scheduling is an important step for improving the performance of object code
produced by a compiler.
 Basic block scheduling is important as a building block for scheduling larger groups of
instructions such as superblocks.
 The basic block instruction scheduling problem is to find a minimum-length schedule for a
basic block (a straight-line sequence of code with a single entry point and a single exit point),
subject to precedence, latency, and resource constraints.
 Solving the problem exactly is known to be difficult, and most compilers use a greedy list
scheduling algorithm coupled with a heuristic.
 Modern architectures are pipelined and can issue multiple instructions per time cycle. The
order that the instructions are scheduled can significantly impact performance.
 Instruction scheduling for basic blocks is known to be NP-complete for realistic architectures.
 The most popular method for scheduling basic blocks continues to be list scheduling, which
takes a set of instructions as represented by a dependency DAG and builds a schedule using a
best-first greedy heuristic.
 The heuristic in a list scheduler generally consists of a set of features and an order for testing
the features.
 Decision tree learning is one of the most widely used algorithms for supervised learning.

Quiz:
1. What is the basic block instruction scheduling problem?
The basic block instruction scheduling problem involves determining the optimal order of
execution for instructions within a basic block of code. The objective is to minimize execution
time and maximize resource utilization while respecting dependencies among instructions. It
is a critical step in optimizing the performance of compiled code.

2. Why is instruction scheduling important for improving the performance of object code
produced by a compiler?
Instruction scheduling is vital for improving the performance of object code generated by a
compiler for several reasons:

 It optimizes the use of available hardware resources, such as functional units and registers,
reducing execution time.
 It helps break dependencies and pipeline stalls, resulting in more efficient instruction
execution.
 By rearranging instructions, it can improve instruction-level parallelism and reduce idle
time in the pipeline, enhancing overall program performance.

3. What are the constraints that need to be considered in solving the basic block instruction
scheduling problem?

When solving the basic block instruction scheduling problem, several constraints must be
considered, including:

 Data dependencies: Instructions must be scheduled to respect data dependencies, such as
read-after-write (RAW) and write-after-read (WAR) hazards.

 Resource constraints: Limited hardware resources, like functional units, registers, and
memory, must be managed effectively.
 Instruction latency: Different instructions may have different execution times, affecting
scheduling decisions.
 Branch instructions: Conditional branches must be handled appropriately to avoid incorrect
speculative execution.

4. What is the most popular method for scheduling basic blocks?
List scheduling is one of the most popular methods for scheduling basic blocks. List scheduling
works by maintaining a list of instructions and their ready times, and it schedules instructions in a
way that maximizes instruction-level parallelism while adhering to the defined constraints.

5. What is the heuristic used in a list scheduler?
List scheduling employs heuristics to make scheduling decisions. One common heuristic used in
list scheduling is the "Earliest Time First" (ETF) rule. This rule selects the instruction with the
earliest available time (i.e., the one with all dependencies satisfied and resources available) to
schedule next. The goal is to minimize the time instructions spend waiting for their dependencies
to be resolved, thereby optimizing instruction execution and resource utilization.
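Under the ETF rule as described above, one would first compute each node's earliest start time (a lower bound that ignores resource constraints) and then pick the ready instruction that minimizes it. The sketch below reuses the (i, j, latency) edge format; the function names are chosen here for illustration.

```python
def earliest_start_times(nodes, edges):
    """Lower bound on each node's earliest issue cycle, ignoring resources:
    est[j] = max over incoming edges (i, j, lat) of est[i] + lat."""
    succs = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for i, j, lat in edges:
        succs[i].append((j, lat))
        indeg[j] += 1
    est = {n: 0 for n in nodes}
    order = [n for n in nodes if indeg[n] == 0]  # Kahn's topological order
    for u in order:                              # list grows as nodes free up
        for v, lat in succs[u]:
            est[v] = max(est[v], est[u] + lat)
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    return est

def etf_pick(ready, est):
    """ETF rule: choose the ready instruction with the smallest earliest
    start time (ties broken by node id)."""
    return min(ready, key=lambda n: (est[n], n))
```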

Suggested Reference:

1. https://dl.acm.org/doi/10.5555/1105634.1105652
2. https://www.worldcat.org/title/1032888564

Rubric wise marks obtained:

Rubrics   Knowledge (2)   Problem Recognition (2)   Documentation (2)   Presentation (2)   Ethics (2)   Total
Scale     Good (2) / Avg. (1) for each rubric

Marks
