CC Lab Manual CS 20


University of Engineering and Technology, Lahore


(Narowal Campus)

Lab Manual

Compiler Construction (Lab)


Teacher: Dr. Iqra Munir
Teaching Assistant: Muzamil Yousaf

Department of Computer Science


University of Engineering and Technology, Lahore (Narowal Campus)

Table of Contents

Lab 1  Basics of Compiler Construction
Lab 2  Lexical Analyzer
Lab 3  Input Buffering in Compiler Design
Lab 4  Symbol Table
Lab 5  Parsing (Top-down)
Lab 6  Parsing (Bottom-up)
Lab 7  FIRST & FOLLOW Algorithm
Lab 8  Semantic Analysis
Lab 9  Shift-Reduce Parser


Lab # 1
Basics of Compiler Construction:

A compiler translates the code written in one language to some other language without
changing the meaning of the program. It is also expected that a compiler should make the
target code efficient and optimized in terms of time and space.

Compiler design principles provide an in-depth view of the translation and optimization
process. Compiler design covers basic translation mechanisms and error detection and
recovery. It includes lexical, syntax, and semantic analysis as the front end, and code
generation and optimization as the back end.

Compiler Design:
Computers are a balanced mix of software and hardware. Hardware is just a piece of
machinery, and its functions are controlled by compatible software. Hardware
understands instructions in the form of electronic charge, which is the counterpart of binary
language in software programming. Binary language has only two symbols, 0 and 1. To
instruct the hardware, code must be written in binary format, which is simply a series of
1s and 0s.
Language Processing System:
We have learnt that any computer system is made of hardware and software. The
hardware understands a language, which humans cannot understand. So we write programs
in high-level language, which is easier for us to understand and remember. These programs
are then fed into a series of tools and OS components to get the desired code that can be
used by the machine. This is known as Language Processing System.
Task of the Compiler:


The design of a particular compiler determines which (if any) intermediate language
programs actually appear as concrete text or data structures during compilation. Any
compilation can be broken down into two major tasks: analysis of the source program, and
synthesis of a target program equivalent to it.

Compiler Structure:
The decomposition of any problem identifies both tasks and data structures. For example,
we have just discussed the analysis and synthesis tasks: the analyzer converts the source
program into an abstract representation, and the synthesizer obtains information from this
abstract representation to guide its construction of the target algorithm. Modules fall into a
spectrum with single procedures at one end and simple data objects at the other. Four points
on this spectrum are important for our purposes:
 Procedure: An abstraction of a single "memoryless" action (i.e. an action with no
internal state). It may be invoked with parameters, and its effect depends only upon the
parameter values. (Example: a procedure to calculate the square root of a real value.)
 Package: An abstraction of a collection of actions related by a common internal
state. The declaration of a package is also its instantiation, and hence only one
instance is possible. (Example: the analysis or structure-tree module of a compiler.)
 Abstract data type: An abstraction of a data object on which a number of actions
can be performed. A declaration is separate from instantiation, and hence many
instances may exist. (Example: a stack abstraction providing the operations push,
pop, top, etc.; see the sketch after this list.)
 Variable: An abstraction of a data object on which exactly two operations, fetch
and store, can be performed. (Example: an integer variable in most programming
languages.)
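A minimal sketch of the stack abstract data type from the last example (the int element type and the class name are illustrative):

#include <iostream>
#include <stdexcept>
#include <vector>

// Sketch of a stack abstract data type: the declaration is separate from
// instantiation, so many independent stacks may exist.
class Stack {
public:
    void push(int x) { data.push_back(x); }
    void pop() {
        if (data.empty()) throw std::runtime_error("pop on empty stack");
        data.pop_back();
    }
    int top() const {
        if (data.empty()) throw std::runtime_error("top on empty stack");
        return data.back();
    }
    bool empty() const { return data.empty(); }
private:
    std::vector<int> data;
};

int main() {
    Stack s;                        // one instance; others can be created independently
    s.push(1);
    s.push(2);
    std::cout << s.top() << "\n";   // prints 2
    s.pop();
    std::cout << s.top() << "\n";   // prints 1
    return 0;
}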


Introduction to C++:
C++ is a high-level, general-purpose programming language created by Danish computer
scientist Bjarne Stroustrup. First released in 1985 as an extension of the C programming language,
it has since expanded significantly over time; modern C++ has object-oriented, generic,
and functional features, in addition to facilities for low-level memory manipulation.
C++ was designed with systems programming and embedded, resource-constrained software
and large systems in mind, with performance, efficiency, and flexibility of use as its design
highlights. C++ has also been found useful in many other contexts, with key strengths being
software infrastructure and resource-constrained applications, including desktop
applications, video games, servers (e.g. e-commerce, web search, or databases), and performance-
critical applications (e.g. telephone switches or space probes).

Language:
The C++ language has two main components: a direct mapping of hardware features provided
primarily by the C subset, and zero-overhead abstractions based on those mappings. Stroustrup
describes C++ as "a light-weight abstraction programming language [designed] for building and
using efficient and elegant abstractions", and says that "offering both hardware access and abstraction is
the basis of C++. Doing it efficiently is what distinguishes it from other languages."
C++ inherits most of C's syntax. The following is Bjarne Stroustrup's version of the Hello World
program, which uses the C++ Standard Library stream facility to write a message to standard output.
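A minimal form of that program is:

#include <iostream>

int main()
{
    std::cout << "Hello, world!\n";
}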


Standard library:
The C++ standard consists of two parts: the core language and the standard library. C++
programmers expect the latter on every major implementation of C++; it includes aggregate types
(vectors, lists, maps, sets, queues, stacks, arrays, tuples), algorithms (find, for each, binary search,
random shuffle, etc.), input/output facilities (iostream, for reading from and writing to the console
and files), filesystem library, localization support, smart pointers for automatic memory
management, regular expression support, multi-threading library, atomics support (allowing a
variable to be read or written by at most one thread at a time without any external
synchronization), time utilities (measurement, getting current time, etc.), a system for converting
error reporting that does not use C++ exceptions into C++ exceptions, a random number
generator and a slightly modified version of the C standard library (to make it comply with the
C++ type system).
Benefits of C++ Programming:
There are many reasons to use the C++ programming language to develop software and
applications, beyond its popularity and wide range of use cases. We highlight just a few of the
many advantages and benefits of using C++ below.

Compatibility with the C Programming Language:

Given that C++ is a derivative or extension of C and, in fact, uses practically all of the C syntax,
it is no surprise that C++ is highly compatible with C. If there is a valid C application, it is
compatible with C++ by its very nature.

Community and Resources:

As one of the oldest, most popular programming languages, C++ enjoys a vast community of
developers. A large portion of that community helps add to the continued functionality of C++ by
creating libraries that extend the language’s natural abilities.

Platform Independent and Portability:


C++ is platform-independent and portable, meaning it can run on any operating system or
machine. Because of this, programs developers create in C++ will not be limited to a single OS
environment or require any further coding to work on other operating systems. This increases the
programmer’s audience reach and limits the amount of iterations of an application a coder will
have to make.

Memory Management:

With C++, developers have complete control over memory management, which technically
counts as both an advantage and a disadvantage of programming in C++. It is a bonus to developers
because they have more control over memory allocation. It is a negative in that the coder must be
responsible for the management of memory, instead of giving that task to a garbage collector, as
other programming languages do.

Embedded Systems:

As noted, C++ is a derivative of C. C, in turn, is a procedural language capable of low-level


manipulation. It is like machine language, and because of that, C is a great option for working with
embedded software and the coding of smart devices, firmware, and even compilers.

Scalability:

Scalability – the ability to scale an application to serve varying degrees of users or purposes – is
an important element to any modern piece of software. C++ programs are highly scalable and
capable of serving small amounts of data or large amounts of information.

C++ Standard Template Library (STL):

C++ has the Standard Template Library (STL), which provides libraries – or functions – of pre-
written code that developers can use to save time instead of writing common functionality in their
software. These libraries make coders more efficient: they type fewer lines of code, write
programs more quickly, and avoid errors.
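For instance, a short sketch using a few STL facilities (vector, sort and find):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {4, 1, 3, 2};
    std::sort(v.begin(), v.end());                   // pre-written sorting algorithm
    auto it = std::find(v.begin(), v.end(), 3);      // pre-written search
    if (it != v.end())
        std::cout << "found 3 at index " << (it - v.begin()) << "\n";
    return 0;
}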

Execution and Compilation Speed:

C++ is a fast programming language in terms of both execution and compilation; both are much
quicker than with many other general-purpose development languages. Further, C++ excels
at concurrency, which is important for heavily loaded servers such as web servers, database
servers, and other types of servers.

Lab # 2
Lexical Analyzer:
A lexical analyzer, often referred to as a lexer or scanner, is a crucial component of a compiler
or interpreter. Its primary function is to read the source code of a program and break it down into
smaller units called tokens. These tokens are the basic building blocks of a programming language
and represent the smallest meaningful units in the code, such as keywords, identifiers, literals, and
operators.
It converts the high-level input program into a sequence of tokens.
 Lexical analysis can be implemented with deterministic finite automata (DFA).
 The output is a sequence of tokens that is sent to the parser for syntax analysis.

What is a token? A lexical token is a sequence of characters that can be treated as a unit in the
grammar of a programming language. Examples of tokens:
 Type token (id, number, real, . . . )
 Punctuation tokens (IF, void, return, . . . )
 Alphabetic tokens (keywords)
Keywords; Examples-for, while, if etc.
Identifier; Examples-Variable name, function name, etc.
Operators; Examples '+', '++', '-' etc.
Separators; Examples ',' ';' etc
Example of Non-Tokens:
 Comments, preprocessor directive, macros, blanks, tabs, newline, etc.
Lexeme: The sequence of characters matched by a pattern to form the corresponding token, or a
sequence of input characters that comprises a single token, is called a lexeme, e.g. "float",
"abs_zero_Kelvin", "=", "-", "273", ";".

How Lexical Analyzer works:


1. Input preprocessing: This stage involves cleaning up the input text and preparing it for
lexical analysis. This may include removing comments, whitespace, and other non-essential
characters from the input text.
2. Tokenization: This is the process of breaking the input text into a sequence of tokens. This
is usually done by matching the characters in the input text against a set of patterns or regular
expressions that define the different types of tokens.
3. Token classification: In this stage, the lexer determines the type of each token. For
example, in a programming language, the lexer might classify keywords, identifiers,
operators, and punctuation symbols as separate token types.
4. Token validation: In this stage, the lexer checks that each token is valid according to the
rules of the programming language. For example, it might check that a variable name is a
valid identifier, or that an operator has the correct syntax.
5. Output generation: In this final stage, the lexer generates the output of the lexical analysis
process, which is typically a list of tokens. This list of tokens can then be passed to the next
stage of compilation or interpretation.

Advantages:

 Efficiency
 Flexibility
 Error Detection

Recognition of operators/variables:
In a lexical analyzer or lexer, the recognition of operators and variables involves identifying and
categorizing sequences of characters in the source code into tokens that represent operators and
variables. Here's how this recognition typically works:
Operators Recognition:
Operators are symbols in the source code that represent various operations, such as addition,
subtraction, multiplication, and comparison. To recognize operators, the lexer typically follows
these steps:
 It scans the source code character by character.
 When it encounters a character that could be the start of an operator (e.g., "+", "-",
"*", "="), it begins forming a token.
 The lexer continues scanning characters until it determines that the token is
complete or until it encounters a character that cannot be part of the operator.
 Once the lexer identifies the operator token, it categorizes it as an operator token
and records the specific operator it represents (e.g., "+" for addition, "=" for
assignment).


Variables Recognition:
Variables are names given to values or objects in the program. To recognize variables, the lexer
generally follows these steps:
 It scans the source code character by character.
 When it encounters a character that could be the start of a variable name (e.g., a letter or
an underscore), it begins forming a token.
 The lexer continues scanning characters until it determines that the token is complete or
until it encounters a character that cannot be part of a variable name (e.g., a space or
punctuation).
 Once the lexer identifies the variable token, it categorizes it as a variable token and records
the variable's name.
Here's an example in C++:
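A minimal sketch of the idea (the sample input and the printed token names are illustrative):

#include <iostream>
#include <cctype>
#include <string>

// Sketch of operator/variable recognition: scan character by character,
// start a token at an operator character or a letter/underscore, and extend
// identifiers while letters, digits or underscores follow.
int main() {
    std::string src = "sum = a + b * 2;";     // assumed sample input
    size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (isspace((unsigned char)c)) { ++i; continue; }
        if (isalpha((unsigned char)c) || c == '_') {          // variable/identifier
            size_t start = i;
            while (i < src.size() && (isalnum((unsigned char)src[i]) || src[i] == '_')) ++i;
            std::cout << "IDENTIFIER: " << src.substr(start, i - start) << "\n";
        } else if (c == '+' || c == '-' || c == '*' || c == '/' || c == '=') {
            std::cout << "OPERATOR: " << c << "\n";            // operator token
            ++i;
        } else {
            ++i;                                               // other characters are skipped here
        }
    }
    return 0;
}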

Recognition of keywords/constants:
Recognition of keywords and constants in a lexical analyzer follows a similar pattern to the
recognition of operators and variables. Let's break down how the lexer typically identifies
keywords and constants in the source code:


Keywords Recognition:
Keywords are reserved words in the programming language that have a special meaning.
Examples of keywords include "if," "while," "int," "for," "return," etc. To recognize keywords, the
lexer generally follows these steps:
1. It scans the source code character by character, just like in the case of operators and
variables.
2. When it encounters a character that could be the start of a keyword (e.g., a letter), it begins
forming a token.
3. The lexer continues scanning characters until it determines that the token is complete or
until it encounters a character that cannot be part of a keyword (e.g., a space or
punctuation).
4. Once the lexer identifies the keyword token, it categorizes it as a "Keyword" token and
records the specific keyword it represents (e.g., "if" for conditional statements, "int" for
declaring integer types).
Here's an example in C++:
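A minimal sketch of the idea (only a small illustrative subset of keywords is listed):

#include <iostream>
#include <cctype>
#include <set>
#include <string>

// Sketch of keyword recognition: read a word, then check it against a table
// of reserved words; any other word is classified as an identifier.
int main() {
    const std::set<std::string> keywords = {"if", "while", "int", "for", "return"};
    std::string src = "if (count > 0) return count;";   // assumed sample input
    size_t i = 0;
    while (i < src.size()) {
        if (isalpha((unsigned char)src[i])) {
            size_t start = i;
            while (i < src.size() && isalnum((unsigned char)src[i])) ++i;
            std::string word = src.substr(start, i - start);
            if (keywords.count(word))
                std::cout << "KEYWORD: " << word << "\n";
            else
                std::cout << "IDENTIFIER: " << word << "\n";
        } else {
            ++i;   // punctuation, digits and spaces are ignored in this sketch
        }
    }
    return 0;
}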

Constants Recognition:
Constants in a programming language can be numeric literals (integers or floating-point
numbers), string literals, character literals, or other special values. To recognize constants, the
lexer typically follows these steps:
1. It scans the source code character by character, similar to the previous cases.


2. When it encounters a character that could be the start of a constant (e.g., a digit for numeric
literals or a quote for string literals), it begins forming a token.
3. The lexer continues scanning characters until it determines that the token is complete or
until it encounters a character that cannot be part of the constant (e.g., a space or
punctuation).
4. Once the lexer identifies the constant token, it categorizes it as a "Constant" token and
records the specific value it represents (e.g., "42" for an integer literal, "Hello, World!" for
a string literal).
Here's an example in Python:
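Since the rest of this manual uses C++, here is the same idea sketched in C++ (the sample input is illustrative):

#include <iostream>
#include <cctype>
#include <string>

// Sketch of constant recognition: a digit starts a numeric literal and a
// double quote starts a string literal.
int main() {
    std::string src = "x = 42 + 3.14; msg = \"Hello, World!\";";   // assumed input
    size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (isdigit((unsigned char)c)) {                      // numeric constant
            size_t start = i;
            while (i < src.size() && (isdigit((unsigned char)src[i]) || src[i] == '.')) ++i;
            std::cout << "NUMBER: " << src.substr(start, i - start) << "\n";
        } else if (c == '"') {                                // string constant
            size_t start = ++i;
            while (i < src.size() && src[i] != '"') ++i;
            std::cout << "STRING: " << src.substr(start, i - start) << "\n";
            if (i < src.size()) ++i;                          // skip the closing quote
        } else {
            ++i;
        }
    }
    return 0;
}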

CODE
C++ code to generate the lexical analyzer with the Recognition of operators/variables.


C++ code to generate the lexical analyzer with the Recognition of keywords/constants.


Activity
Write a calculator program that also shows the previous value.


Create a stack that performs push and pop operations.


Lab # 3
Input Buffering in Compiler Design:
The lexical analyzer would otherwise have to access secondary memory each time it identifies a
token, which is time-consuming and costly. So the input string is stored in a buffer and then
scanned by the lexical analyzer.
The lexical analyzer scans the input string from left to right, one character at a time, to identify
tokens. It uses two pointers to scan tokens −
 Begin Pointer (bptr) − It points to the beginning of the string to be read.
 Look Ahead Pointer (lptr) − It moves ahead to search for the end of the token.
Example − For the statement int a, b;
 Both pointers start at the beginning of the string, which is stored in the buffer.

 The lookahead pointer scans the buffer until the token is found.

 The character ("blank space") beyond the token ("int") has to be examined before
the token ("int") can be determined.


 After processing the token ("int"), both pointers are set to the start of the next token
('a'), and this process is repeated for the whole program.

A buffer can be divided into two halves. If the lookahead pointer reaches the halfway
point (the end of the first half), the second half is filled with new characters to be read. If the
lookahead pointer reaches the right end of the second half, the first half is refilled with new
characters, and so on.


Sentinels − Each time the forward pointer is advanced, a check is needed to make sure that it
has not run off the end of one half of the buffer; if it has, the other half must be reloaded.
Placing a sentinel character (typically eof) at the end of each half allows this check to be
combined with the test for the end of the input.
Buffer Pairs − A specialized buffering technique can decrease the amount of overhead
needed to process an input character. It uses two buffers, each of N characters, which are
reloaded alternately.
Two pointers, lexemeBegin and forward, are maintained. lexemeBegin points to the start of the
current lexeme being discovered, while forward scans ahead until a match for a pattern is found.
Once the next lexeme is determined, forward is set to the character at its right end; after the
lexeme is processed, lexemeBegin is set to the character immediately following it.

Code
Code to build a compiler using the buffer.
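A minimal sketch of the buffer-pair idea described above (not a complete compiler; it assumes an input file named input.txt and a half-buffer size N = 16):

#include <cstdio>
#include <iostream>

// Sketch of the buffer-pair technique: two halves of size N are reloaded
// alternately from the source file, a sentinel marks the end of each half,
// and getNextChar() advances the forward pointer, reloading the other half
// whenever a sentinel is reached. (A raw 0xFF byte in the input would be
// mistaken for the sentinel; a real lexer distinguishes the two cases.)
const int N = 16;                    // assumed half-buffer size
const char SENTINEL = char(EOF);
char buffer[2 * N + 2];              // two halves, each followed by a sentinel slot
int forwardPos = -1;                 // the "forward" pointer
FILE* source = nullptr;

void loadHalf(int half) {            // half 0 fills [0, N), half 1 fills [N+1, 2N+1)
    int start = (half == 0) ? 0 : N + 1;
    size_t n = fread(buffer + start, 1, N, source);
    buffer[start + n] = SENTINEL;    // end of this half, or end of the input
}

char getNextChar() {
    ++forwardPos;
    if (buffer[forwardPos] == SENTINEL) {
        if (forwardPos == N) {                // ran off the first half
            loadHalf(1);
            ++forwardPos;                     // step over the sentinel slot
        } else if (forwardPos == 2 * N + 1) { // ran off the second half
            loadHalf(0);
            forwardPos = 0;
        } else {
            return SENTINEL;                  // real end of input
        }
    }
    return buffer[forwardPos];
}

int main() {
    source = fopen("input.txt", "r");         // assumed input file
    if (!source) { std::cerr << "cannot open input.txt\n"; return 1; }
    loadHalf(0);
    for (char c = getNextChar(); c != SENTINEL; c = getNextChar())
        std::cout << c;                       // a real lexer would group characters into lexemes
    fclose(source);
    return 0;
}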


Lab # 4
Symbol Table
Symbol table is an important data structure created and maintained by compilers in order
to store information about the occurrence of various entities such as variable names,
function names, objects, classes, interfaces, etc. Symbol table is used by both the analysis
and the synthesis parts of a compiler.

A symbol table may serve the following purposes depending upon the language in hand:

 To store the names of all entities in a structured form at one place.


 To verify if a variable has been declared.
 To implement type checking, by verifying assignments and expressions in the
source code are semantically correct.
 To determine the scope of a name (scope resolution).

A symbol table is simply a table that can be either linear or a hash table. It maintains an
entry for each name in the following format:

<symbol name, type, attribute>

For example, if a symbol table has to store information about the following variable
declaration:

static int interest;

then it should store the entry such as:

<interest, int, static>

The attribute clause contains the entries related to the name.

Implementation:

If a compiler is to handle a small amount of data, then the symbol table can be implemented
as an unordered list, which is easy to code, but it is only suitable for small tables. A symbol
table can be implemented in one of the following ways:

 Linear (sorted or unsorted) list


 Binary Search Tree
 Hash table

Among all, symbol tables are mostly implemented as hash tables, where the source code
symbol itself is treated as a key for the hash function and the return value is the information
about the symbol.

Operations:

A symbol table, either linear or hash, should provide the following operations.

insert()

This operation is more frequently used by the analysis phase, i.e., the first half of the compiler
where tokens are identified and names are stored in the table. This operation is used to add
information in the symbol table about unique names occurring in the source code. The
format or structure in which the names are stored depends upon the compiler in hand.

An attribute for a symbol in the source code is the information associated with that symbol.
This information contains the value, state, scope, and type about the symbol. The insert()
function takes the symbol and its attributes as arguments and stores the information in the
symbol table.

For example:

int a;

should be processed by the compiler as:

insert(a, int);

lookup()

lookup() operation is used to search a name in the symbol table to determine:

 if the symbol exists in the table.


 if it is declared before it is being used.
 if the name is used in the scope.
 if the symbol is initialized.
 if the symbol is declared multiple times.

The format of lookup() function varies according to the programming language. The basic
format should match the following:

lookup(symbol)


This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol
exists in the symbol table, it returns its attributes stored in the table.

Scope Management:

A compiler maintains two types of symbol tables: a global symbol table which can be
accessed by all the procedures and scope symbol tables that are created for each scope in
the program.

To determine the scope of a name, symbol tables are arranged in a hierarchical structure as
shown in the example below:

...
int value=10;

void pro_one()
{
int one_1;
int one_2;

{ \
int one_3; |_ inner scope 1
int one_4; |
} /

int one_5;

{ \
int one_6; |_ inner scope 2
int one_7; |
} /
}

void pro_two()
{
int two_1;
int two_2;

{ \
int two_3; |_ inner scope 3
int two_4; |
} /


int two_5;
}
...

The above program can be represented in a hierarchical structure of symbol tables:

The global symbol table contains names for one global variable (int value) and two
procedure names, which should be available to all the child nodes shown above. The names
mentioned in the pro_one symbol table (and all its child tables) are not available for
pro_two symbols and its child tables.
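A minimal sketch of such a hierarchy (illustrative only): each scope keeps its own table plus a pointer to the enclosing scope, and lookup searches outward.

#include <iostream>
#include <map>
#include <string>

// Sketch of hierarchical (scoped) symbol tables: each scope has its own
// table and a pointer to the enclosing scope; lookup searches outward.
struct ScopedTable {
    ScopedTable* parent;
    std::map<std::string, std::string> entries;   // name -> type

    explicit ScopedTable(ScopedTable* p = nullptr) : parent(p) {}

    void insert(const std::string& name, const std::string& type) {
        entries[name] = type;
    }

    // Returns the type if the name is visible in this scope or any enclosing
    // scope, or "undeclared" otherwise.
    std::string lookup(const std::string& name) const {
        for (const ScopedTable* s = this; s != nullptr; s = s->parent) {
            auto it = s->entries.find(name);
            if (it != s->entries.end()) return it->second;
        }
        return "undeclared";
    }
};

int main() {
    ScopedTable global;                 // global symbol table
    global.insert("value", "int");

    ScopedTable pro_one(&global);       // scope table for pro_one
    pro_one.insert("one_1", "int");

    ScopedTable inner1(&pro_one);       // inner scope 1
    inner1.insert("one_3", "int");

    std::cout << inner1.lookup("one_3") << "\n";   // int  (own scope)
    std::cout << inner1.lookup("value") << "\n";   // int  (found in global table)
    std::cout << global.lookup("one_3") << "\n";   // undeclared (not visible globally)
    return 0;
}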

Code
Code to Create the Symbol Table.

#include <iostream>
#include <string>
#include <map>

// Enum to represent variable scope


enum class Scope {
Local,

Global
};

// SymbolInfo struct to store information about symbols


struct SymbolInfo {
std::string type;
int offset; // For illustration purposes; you'd typically store more information
Scope scope; // Added scope field
};

// SymbolTable class to manage the symbol table


class SymbolTable {
public:
// Insert a symbol into the symbol table
void insert(const std::string& name, const std::string& type, int offset, Scope scope) {
table[name] = { type, offset, scope };
}

// Lookup a symbol in the symbol table


SymbolInfo lookup(const std::string& name) {
auto it = table.find(name);
if (it != table.end()) {
return it->second;
} else {
return { "Not Found", -1, Scope::Local }; // Default to local
}
}

// Check if a symbol exists in the symbol table


bool exists(const std::string& name) {
return table.find(name) != table.end();
}

private:
std::map<std::string, SymbolInfo> table;
};

int main() {
SymbolTable symbolTable;

// Insert global symbols into the symbol table


symbolTable.insert("global_var", "int", 0, Scope::Global);

// Enter a local scope


{
// Insert local symbols into the symbol table
symbolTable.insert("x", "int", 4, Scope::Local);
symbolTable.insert("y", "float", 8, Scope::Local);

// Lookup a local symbol


SymbolInfo localInfo = symbolTable.lookup("x");
std::cout << "Local Symbol: x, Type: " << localInfo.type << ", Offset: " <<
localInfo.offset << ", Scope: " << (localInfo.scope == Scope::Global ? "Global" :
"Local") << std::endl;


} // Exit local scope

// Lookup global symbol


SymbolInfo globalInfo = symbolTable.lookup("global_var");
std::cout << "Global Symbol: global_var, Type: " << globalInfo.type << ", Offset: " <<
globalInfo.offset << ", Scope: " << (globalInfo.scope == Scope::Global ? "Global" :
"Local") << std::endl;

// Lookup a non-existent symbol


SymbolInfo nonExistentInfo = symbolTable.lookup("w");
std::cout << "Symbol: w, Type: " << nonExistentInfo.type << ", Offset: " <<
nonExistentInfo.offset << ", Scope: " << (nonExistentInfo.scope == Scope::Global ?
"Global" : "Local") << std::endl;

return 0;
}


Lab # 5
Parsing:
Parsing is the process of analyzing a linear sequence of tokens and organizing it into a
structure according to a set of defined rules known as a grammar. The parser is the component
of the translator that performs this organization.

Types of the Parser:

Parser is divided into two types:

 Bottom-up parser
 Top-down parser

Top-Down Parser:

A top-down parser constructs a parse tree for an input string in preorder, starting from the
root. Equivalently, it builds a leftmost derivation of the input string: starting from the
grammar's start symbol, it repeatedly chooses a suitable production rule so that the sentential
form matches the input string from left to right.

Example of Top-Down Parsing:

Consider the lexical analyzer's input string "acb" for the following grammar, using a
leftmost derivation.

S -> aAb

A -> cd | c
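Using this grammar, a top-down parse of "acb" proceeds as follows:

S => aAb       ('a' matches the first input character)
  => acdb      (try A -> cd; the 'd' does not match 'b' in the input, so backtrack)
  => acb       (apply A -> c instead; the entire input is now matched)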


Classification of the Top-Down Parser:

Top-down parsers can be classified based on their approach to parsing as follows:

 Recursive-descent parsers: Recursive-descent parsers are a type of top-down parser


that uses a set of recursive procedures to parse the input. Each non-terminal symbol
in the grammar corresponds to a procedure that parses input for that symbol.
 Backtracking parsers: Backtracking parsers are a type of top-down parser that can
handle non-deterministic grammar. When a parsing decision leads to a dead end,
the parser can backtrack and try another alternative.


 Non-backtracking parsers: Non-backtracking is a technique used in top-down


parsing to ensure that the parser doesn’t revisit already-explored paths in the parse
tree during the parsing process.
 Predictive parsers: Predictive parsers are top-down parsers that use lookahead to
predict which production rule to apply based on the next input symbol. Predictive
parsers are also called LL parsers because they scan the input left to right and construct
a leftmost derivation of the input string.

Advantages of Top-Down Parsing:

 Top-down parsing is much simpler.


 It is incredibly easy to identify the response action of the top-down parser.

Disadvantages of Top-Down Parsing:

 Top-down parsing cannot handle left recursion if it is present in the grammar.


 When using recursive descent parsing, the parser may need to backtrack when it
encounters a symbol that does not match the expected token. This can make the
parsing process slower and less efficient.

Code
Code for top-down parsing.

#include <iostream>
#include <cctype>
#include <stdexcept>
#include <memory>
#include <string>

class TreeNode {
public:
    virtual double evaluate() = 0;
    virtual void display(int depth) = 0;
    virtual ~TreeNode() = default;
};

class BinaryOpNode : public TreeNode {
public:
    BinaryOpNode(char op, std::shared_ptr<TreeNode> left, std::shared_ptr<TreeNode> right)
        : op(op), left(left), right(right) {}

    double evaluate() override {
        double leftValue = left->evaluate();
        double rightValue = right->evaluate();
        if (op == '+') {
            return leftValue + rightValue;
        } else if (op == '-') {
            return leftValue - rightValue;
        } else if (op == '*') {
            return leftValue * rightValue;
        } else if (op == '/') {
            if (rightValue == 0) {
                throw std::runtime_error("Division by zero.");
            }
            return leftValue / rightValue;
        } else {
            throw std::runtime_error("Unknown operator.");
        }
    }

    void display(int depth) override {
        std::cout << std::string(depth * 2, ' ') << "Binary Operator: " << op << std::endl;
        left->display(depth + 1);
        right->display(depth + 1);
    }

private:
    char op;
    std::shared_ptr<TreeNode> left;
    std::shared_ptr<TreeNode> right;
};

class NumberNode : public TreeNode {
public:
    NumberNode(double value) : value(value) {}

    double evaluate() override {
        return value;
    }

    void display(int depth) override {
        std::cout << std::string(depth * 2, ' ') << "Number: " << value << std::endl;
    }

private:
    double value;
};

class Parser {
public:
    Parser(const std::string& input) : input(input), position(0) {}

    std::shared_ptr<TreeNode> parse() {
        auto result = expression();
        if (position < input.length()) {
            throw std::runtime_error("Parsing error: Unexpected input.");
        }
        return result;
    }

private:
    // expression -> term (('+' | '-') term)*
    std::shared_ptr<TreeNode> expression() {
        auto left = term();
        while (position < input.length() && (input[position] == '+' || input[position] == '-')) {
            char op = input[position++];
            auto right = term();
            left = std::make_shared<BinaryOpNode>(op, left, right);
        }
        return left;
    }

    // term -> factor (('*' | '/') factor)*
    std::shared_ptr<TreeNode> term() {
        auto left = factor();
        while (position < input.length() && (input[position] == '*' || input[position] == '/')) {
            char op = input[position++];
            auto right = factor();
            left = std::make_shared<BinaryOpNode>(op, left, right);
        }
        return left;
    }

    // factor -> '(' expression ')' | number
    std::shared_ptr<TreeNode> factor() {
        if (position < input.length() && input[position] == '(') {
            position++;
            auto result = expression();
            if (position < input.length() && input[position] == ')') {
                position++;
                return result;
            } else {
                throw std::runtime_error("Parsing error: Missing closing parenthesis.");
            }
        } else if (position < input.length() && isdigit(input[position])) {
            size_t end;
            double number = std::stod(input.substr(position), &end);
            position += end;
            return std::make_shared<NumberNode>(number);
        } else {
            throw std::runtime_error("Parsing error: Unexpected character.");
        }
    }

    const std::string& input;
    size_t position;
};

int main() {
    std::string input;
    std::cout << "Enter an arithmetic expression: ";
    std::cin >> input;

    try {
        Parser parser(input);
        std::shared_ptr<TreeNode> parseTree = parser.parse();
        std::cout << "Parse Tree:" << std::endl;
        parseTree->display(0);
        double result = parseTree->evaluate();
        std::cout << "Result: " << result << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}


Lab # 6
Parsing:
Parsing is the process of analyzing a linear sequence of tokens and organizing it into a
structure according to a set of defined rules known as a grammar. The parser is the component
of the translator that performs this organization.

Types of the Parser:

Parser is divided into two types:

 Bottom-up parser
 Top-down parser

Bottom-Up Parser:
Bottom-up parsers (shift-reduce parsers) build the parse tree from the leaves up to the root.
Bottom-up parsing can be defined as an attempt to reduce the input string w to the start
symbol of the grammar by tracing out the rightmost derivation of w in reverse.
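For example, consider the grammar

S -> aABe
A -> Abc | b
B -> d

and the input string "abbcde". A bottom-up parser reduces the input to the start symbol step by step:

abbcde  =>  aAbcde   (reduce b to A using A -> b)
aAbcde  =>  aAde     (reduce Abc to A using A -> Abc)
aAde    =>  aABe     (reduce d to B using B -> d)
aABe    =>  S        (reduce aABe to S using S -> aABe)

Read from bottom to top, these steps are exactly the rightmost derivation of "abbcde" in reverse.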


Classification of Bottom-up Parsers:

The most general form of shift-reduce parsing is LR parsing. The L stands for scanning the input
from left to right and the R stands for constructing a rightmost derivation in reverse.

Benefits of LR parsing:
1. Many programming languages are parsed using some variation of an LR parser (C++ and
Perl are notable exceptions).
2. An LR parser can be implemented very efficiently.
3. Of all the parsers that scan their input from left to right, LR parsers detect syntactic
errors as soon as possible.

CODE
Code for Bottom-up Parser with the help of DFA.
#include <iostream>
#include <string>
#include <vector>


#include <stdexcept>

// Define DFA states


enum class State {
START,
NUMBER,
OPERATOR,
ERROR
};

// Define DFA transitions


State transition(State currentState, char inputSymbol) {
switch (currentState) {
case State::START:
if (isdigit(inputSymbol)) {
return State::NUMBER;
}
return State::ERROR;
case State::NUMBER:
if (isdigit(inputSymbol)) {
return State::NUMBER;
} else if (inputSymbol == '+' || inputSymbol == '-' || inputSymbol == '*' ||
inputSymbol == '/') {
return State::OPERATOR;
}
return State::ERROR;
case State::OPERATOR:


if (isdigit(inputSymbol)) {
return State::NUMBER;
}
return State::ERROR;
default:
return State::ERROR;
}
}

std::vector<char> parseBottomUp(const std::string& inputString) {


std::vector<char> parseStack;
State currentState = State::START;

for (char inputSymbol : inputString) {


currentState = transition(currentState, inputSymbol);

if (currentState == State::ERROR) {
break;
}

parseStack.push_back(inputSymbol);
}

if (currentState != State::NUMBER) {
parseStack.clear(); // Clear the stack if parsing fails
}


return parseStack;
}

int main() {
std::string inputString;
std::cout << "Enter an arithmetic expression: ";
std::cin >> inputString;

std::vector<char> parseStack = parseBottomUp(inputString);

if (parseStack.empty()) {
std::cout << "Error: Invalid expression" << std::endl;
} else {
std::cout << "Parse Stack:" << std::endl;
for (char symbol : parseStack) {
std::cout << symbol << std::endl;
}
std::cout << "Input String Length: " << inputString.length() << std::endl;
}

return 0;
}


Lab # 7
FIRST:
We saw the need for backtracking in the discussion of top-down parsing, and backtracking is a
complex process to implement. There is an easier way to sort out this problem:
If the compiler knew in advance what the first character of the string produced by applying a
production rule would be, then by comparing it with the current character or token in the input
string it could wisely decide which production rule to apply.
Example:
Let's take the following grammar:
S -> cAd
A -> bc|a
And the input string is “cad”.
Thus, in the example above, if it knew that after reading character ‘c’ in the input string
and applying S->cAd, next character in the input string is ‘a’, then it would have ignored
the production rule A->bc (because ‘b’ is the first character of the string produced by this
production rule, not ‘a’ ), and directly use the production rule A->a (because ‘a’ is the
first character of the string produced by this production rule, and is same as the current
character of the input string which is also ‘a’).

FOLLOW:
The parser faces one more problem. Let us consider the grammar below to understand this
problem.
A -> aBb
B -> c | ε
And suppose the input string is “ab” to parse.
As the first character in the input is a, the parser applies the rule A->aBb.
        A
      / | \
     a  B  b
Now the parser checks the second character of the input string, which is b; the non-terminal to
derive is B, but the parser cannot get any string derivable from B that contains b as its first
character.
But the grammar does contain a production rule B -> ε; if that is applied then B will vanish,
and the parser gets the input "ab", as shown below. But the parser can apply it only when it
knows that the character that follows B in the production rule is the same as the current
character in the input.
Example:
In RHS of A -> aBb, b follows Non-Terminal B, i.e. FOLLOW(B) = {b}, and the
current input character read is also b. Hence the parser applies this rule. And it is able to
get the string “ab” from the given grammar.
        A                      A
      / | \                  /   \
     a  B  b       =>       a     b
        |
        ε
So FOLLOW can make a non-terminal vanish (derive ε) when that is needed to generate the
string from the parse tree.

The conclusion is that we need to find the FIRST and FOLLOW sets for a given grammar so
that the parser can properly apply the needed rule at the correct position. The sets for the
grammar above are summarized below, followed by an implementation that computes FIRST
and FOLLOW for a small example grammar.
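For the grammar above (A -> aBb, B -> c | ε), the sets work out to:

FIRST(A)  = { a }        FIRST(B)  = { c, ε }
FOLLOW(A) = { $ }        FOLLOW(B) = { b }

On seeing b in the input while expanding B, the parser therefore knows it may apply B -> ε.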
CODE
Implementation of the first and follow algorithm.

#include <iostream>
#include <set>
#include <map>
#include <vector>
#include <string>
#include <cctype>

using namespace std;

// Structure to represent a production rule
struct Production {
    char nonTerminal;
    string production;
};

// Function to compute the First set for a given non-terminal
set<char> computeFirst(char nonTerminal, vector<Production>& productions,
                       map<char, set<char>>& first) {
    set<char> firstSet;

    for (const Production& production : productions) {
        if (production.nonTerminal == nonTerminal) {
            // If the first symbol of the production is a terminal, add it to the First set
            if (isalpha(production.production[0]) && islower(production.production[0])) {
                firstSet.insert(production.production[0]);
            }
            // If the first symbol of the production is a non-terminal, calculate its First set
            else if (isalpha(production.production[0]) && isupper(production.production[0])) {
                set<char> subFirst = computeFirst(production.production[0], productions, first);
                firstSet.insert(subFirst.begin(), subFirst.end());
            }
        }
    }

    first[nonTerminal] = firstSet;
    return firstSet;
}

// Function to compute the Follow set for a given non-terminal
set<char> computeFollow(char nonTerminal, vector<Production>& productions,
                        map<char, set<char>>& first, map<char, set<char>>& follow) {
    set<char> followSet;

    if (nonTerminal == productions[0].nonTerminal) {
        followSet.insert('$');
    }

    for (const Production& production : productions) {
        size_t pos = production.production.find(nonTerminal);

        if (pos != string::npos) {
            string remaining = production.production.substr(pos + 1);

            if (remaining.empty()) {
                if (production.nonTerminal != nonTerminal) {
                    set<char> subFollow = computeFollow(production.nonTerminal,
                                                        productions, first, follow);
                    followSet.insert(subFollow.begin(), subFollow.end());
                }
            } else {
                bool nullable = true;

                for (char symbol : remaining) {
                    if (isalpha(symbol) && isupper(symbol)) {
                        set<char> subFirst = first[symbol];
                        followSet.insert(subFirst.begin(), subFirst.end());

                        if (subFirst.find('e') == subFirst.end()) {
                            nullable = false;
                            break;
                        }
                    } else if (isalpha(symbol) && islower(symbol)) {
                        followSet.insert(symbol);
                        nullable = false;
                        break;
                    } else {
                        followSet.insert(symbol);
                        nullable = false;
                        break;
                    }
                }

                if (nullable) {
                    set<char> subFollow = computeFollow(production.nonTerminal,
                                                        productions, first, follow);
                    followSet.insert(subFollow.begin(), subFollow.end());
                }
            }
        }
    }

    follow[nonTerminal] = followSet;
    return followSet;
}

int main() {
    // Grammar: S -> AB, A -> aA | e, B -> bB | c   ('e' stands for epsilon)
    vector<Production> productions = {
        {'S', "AB"},
        {'A', "aA"},
        {'A', "e"},
        {'B', "bB"},
        {'B', "c"}
    };

    map<char, set<char>> first, follow;

    // Seed the First sets from productions that start with a terminal
    for (const Production& production : productions) {
        if (isalpha(production.production[0]) && islower(production.production[0])) {
            first[production.nonTerminal].insert(production.production[0]);
        }
    }

    // Compute First sets for productions that start with a non-terminal
    for (const Production& production : productions) {
        if (isalpha(production.production[0]) && isupper(production.production[0])) {
            computeFirst(production.nonTerminal, productions, first);
        }
    }

    // Compute the Follow set of the start symbol
    computeFollow(productions[0].nonTerminal, productions, first, follow);

    // Display the First and Follow sets
    for (const auto& entry : first) {
        cout << "First(" << entry.first << "): {";
        for (char symbol : entry.second) {
            cout << symbol << ", ";
        }
        cout << "}" << endl;
    }

    for (const auto& entry : follow) {
        cout << "Follow(" << entry.first << "): {";
        for (char symbol : entry.second) {
            cout << symbol << ", ";
        }
        cout << "}" << endl;
    }

    return 0;
}


Lab # 8
Semantic Analysis:
Semantic analysis is the third phase of the compiler. It makes sure that the declarations and
statements of a program are semantically correct. It is a collection of procedures called by the
parser as and when required by the grammar. Both the syntax tree of the previous phase and the
symbol table are used to check the consistency of the given code. Type checking is an important
part of semantic analysis, where the compiler makes sure that each operator has matching
operands.

Semantic Analyzer:

It uses the syntax tree and the symbol table to check whether the given program is semantically
consistent with the language definition. It gathers type information and stores it in either the
syntax tree or the symbol table. This type information is subsequently used by the compiler
during intermediate-code generation.

Semantic Errors:
Errors recognized by semantic analyzer are as follows:
 Type mismatch
 Undeclared variables
 Reserved identifier misuse

Functions of Semantic Analysis:


1. Type Checking –
Ensures that data types are used in a way consistent with their definition.
2. Label Checking –
Ensures that every label referenced in the program is actually defined.
3. Flow Control Check –
Checks that control structures are used in a proper manner (example: no
break statement outside a loop).

Example:
float x = 10.1;
float y = x*30;
In the example above, the integer 30 will be type-cast to the float 30.0 before the multiplication
by the semantic analyzer.

Department of Computer science


University of Engineering and Technology, Lahore (Narowal Campus)
59

Static and Dynamic Semantics:


1. Static Semantics –
Named so because these rules are checked at compile time. Static semantics and the
meaning of the program during execution are only indirectly related.
2. Dynamic Semantic Analysis –
Defines the meaning of different units of the program, such as expressions and statements.
These are checked at runtime, unlike static semantics.

CODE
Implementation of semantic analyzer.
#include <iostream>
#include <string>
#include <unordered_set>
#include <unordered_map>
#include <sstream>

// Create a symbol table to keep track of declared variables


std::unordered_set<std::string> declaredVariables;

// Create a map to store variable types


std::unordered_map<std::string, std::string> variableTypes;

// Function to declare a variable with a given name and type


void DeclareVariable(const std::string& name, const std::string& type) {
// Check if the variable is already declared
if (declaredVariables.count(name) == 1) {
std::cerr << "Error: Variable '" << name << "' is already declared.\n";
exit(1);
}


// Add the variable to the symbol table and associate it with its type
declaredVariables.insert(name);
variableTypes[name] = type;
}

// Function to check if a variable with a given name is declared


void CheckVariable(const std::string& name) {
// Check if the variable is declared in the symbol table
if (declaredVariables.count(name) == 0) {
std::cerr << "Error: Variable '" << name << "' is undeclared.\n";
exit(1);
}
}

int main() {
std::string input;
std::cout << "Enter 'declare <type> <name>' to declare a variable, or 'check <name>' to
check if a variable is declared: ";
std::getline(std::cin, input);

std::istringstream iss(input);
std::string action;
iss >> action;

if (action == "declare") {
std::string type, name;


iss >> type >> name;


DeclareVariable(name, type);
std::cout << "Variable '" << name << "' of type '" << type << "' has been declared."
<< std::endl;
} else if (action == "check") {
std::string name;
iss >> name;
CheckVariable(name);
std::cout << "Variable '" << name << "' is declared." << std::endl;
} else {
std::cerr << "Invalid input. Please use 'declare' or 'check' commands.\n";
}

return 0;
}
Output:
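For example, one run of the program above could look like this (the user's input follows the prompt):

Enter 'declare <type> <name>' to declare a variable, or 'check <name>' to check if a variable is declared: declare int x
Variable 'x' of type 'int' has been declared.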


Lab # 9
Shift Reduce parser:
A shift-reduce parser constructs the parse in the same manner as bottom-up parsing, i.e. the
parse tree is constructed from the leaves (bottom) to the root (up). A more general form of the
shift-reduce parser is the LR parser.
This parser requires the following data structures:
 An input buffer for storing the input string.
 A stack for storing and accessing the grammar symbols recognized so far.

Basic Operations:
 Shift: This involves moving symbols from the input buffer onto the stack.
 Reduce: If the handle appears on top of the stack, it is reduced using the appropriate
production rule, i.e. the RHS of the production rule is popped off the stack and its LHS is
pushed onto the stack.
 Accept: If only the start symbol is present on the stack and the input buffer is empty,
then the parsing action is called accept. When the accept action is obtained, parsing has
completed successfully.
 Error: This is the situation in which the parser can perform neither a shift nor a reduce
action, and cannot accept.

Example 1 – Consider the grammar


S –> S + S
S –> S * S
S –> id
Perform Shift Reduce parsing for input string “id + id + id”.
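One possible sequence of moves (reducing whenever a handle appears on top of the stack) is:

Stack        Input            Action
$            id+id+id$        shift
$id          +id+id$          reduce S -> id
$S           +id+id$          shift
$S+          id+id$           shift
$S+id        +id$             reduce S -> id
$S+S         +id$             reduce S -> S+S
$S           +id$             shift
$S+          id$              shift
$S+id        $                reduce S -> id
$S+S         $                reduce S -> S+S
$S           $                accept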


Example 2 – Consider the grammar


E –> 2E2
E –> 3E3
E –> 4
Perform Shift Reduce parsing for input string “32423”.

CODE
Implementation of shift-reduce parser.
// Including Libraries
#include <bits/stdc++.h>
using namespace std;

// Global Variables
int z = 0, i = 0, j = 0, c = 0;


// Modify array size to increase


// length of string to be parsed
char a[16], ac[20], stk[15], act[10];

// This Function will check whether


// the stack contain a production rule
// which is to be Reduce.
// Rules can be E->2E2 , E->3E3 , E->4
void check()
{
// Copying string to be printed as action
strcpy(ac,"REDUCE TO E -> ");

// c=length of input string


for(z = 0; z < c; z++)
{
// checking for producing rule E->4
if(stk[z] == '4')
{
printf("%s4", ac);
stk[z] = 'E';
stk[z + 1] = '\0';

//printing action
printf("\n$%s\t%s$\t", stk, a);
}
}

for(z = 0; z < c - 2; z++)


{
// checking for another production
if(stk[z] == '2' && stk[z + 1] == 'E' &&
stk[z + 2] == '2')
{
printf("%s2E2", ac);
stk[z] = 'E';
stk[z + 1] = '\0';
stk[z + 2] = '\0';
printf("\n$%s\t%s$\t", stk, a);
i = i - 2;
}
    }

for(z = 0; z < c - 2; z++)


{
//checking for E->3E3
if(stk[z] == '3' && stk[z + 1] == 'E' &&
stk[z + 2] == '3')
{
printf("%s3E3", ac);
stk[z]='E';
stk[z + 1]='\0';
stk[z + 2]='\0';

printf("\n$%s\t%s$\t", stk, a);


i = i - 2;
}
}
return ; // return to main
}

// Driver Function
int main()
{
printf("GRAMMAR is -\nE->2E2 \nE->3E3 \nE->4\n");

// a is input string
strcpy(a,"32423");

// strlen(a) will return the length of a to c


c=strlen(a);

// "SHIFT" is copied to act to be printed


strcpy(act,"SHIFT");

// This will print Labels (column name)


printf("\nstack \t input \t action");

// This will print the initial


// values of stack and input
printf("\n$\t%s$\t", a);

// This will Run upto length of input string


for(i = 0; j < c; i++, j++)
{
// Printing action
printf("%s", act);

// Pushing into stack


stk[i] = a[j];
stk[i + 1] = '\0';

// Moving the pointer


a[j]=' ';

// Printing action
printf("\n$%s\t%s$\t", stk, a);

// Call check function ..which will


// check the stack whether its contain
// any production or not
check();
}

// Rechecking last time if contain


// any valid production then it will
// replace otherwise invalid
check();

// if top of the stack is E(starting symbol)


// then it will accept the input
if(stk[0] == 'E' && stk[1] == '\0')
printf("Accept\n");
else //else reject
printf("Reject\n");
}
