CC Assignment
Question no. 1
Describe the process of lexical analysis in compiler construction. Write a program that takes a source
code input and performs tokenization, identifying keywords, operators, and identifiers.
Answer.
Lexical Analysis
Lexical analysis is the first phase of a compiler. It takes as input source code written in a
high-level language and breaks it down into smaller units, called tokens, that the later phases can
process easily. Tokens are the individual words and symbols of the program, such as keywords, variable
names, numbers, and punctuation. The component that performs lexical analysis is called the lexical
analyzer, also known as a scanner. A lexical analyzer can be implemented with a deterministic finite
automaton (DFA). Its output is a sequence of tokens, which is sent to the parser for syntax analysis.
Token:
A lexical token is a sequence of characters that can be treated as a single unit in the grammar of a
programming language.
Lexeme:
A lexeme is the sequence of input characters that matches the pattern for a token, that is, the actual
text that makes up a single token. E.g. “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;”
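Program
The following is a minimal sketch of a tokenizer in Python, not a complete lexer for any real
language. The keyword set, the operator list, and the names KEYWORDS, TOKEN_SPEC, and tokenize are
assumptions chosen for illustration. It uses regular expressions (which correspond to finite automata)
to split the input into lexemes and to classify each one.

```python
import re

# Assumed keyword set for a small C-like language (illustrative, not exhaustive).
KEYWORDS = {"int", "float", "char", "if", "else", "while", "for", "return", "void"}

# (token type, pattern) pairs; at each position the first pattern that matches wins.
TOKEN_SPEC = [
    ("Number", r"\d+(?:\.\d+)?"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Operator", r"==|!=|<=|>=|&&|\|\||[+\-*/%=<>!]"),
    ("Punctuation", r"[;,(){}\[\]]"),
    ("Whitespace", r"\s+"),
    ("Unknown", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token type) pairs for the given source text."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "Whitespace":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "Identifier" and lexeme in KEYWORDS:
            kind = "Keyword"  # reserved words are matched as identifiers first
        tokens.append((lexeme, kind))
    return tokens


# Sample input (an assumption; any source text can be passed to tokenize).
for lexeme, kind in tokenize("int x = y + 10;"):
    print(f"Token: {lexeme}, Type: {kind}")
```

For the sample input above, the first line printed is "Token: int, Type: Keyword", followed by one
line for each remaining token.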
Output
Token: int, Type: Keyword
Question no. 2
Explain the phases of a compiler and provide examples of tasks performed in each phase. Illustrate how a
source code is transformed step-by-step through these phases until it becomes executable machine code.
Answer.
Phases of Compiler
A compiler operates in phases, and each phase transforms the source program from one representation
to another. Every phase takes its input from the previous phase and feeds its output to the next phase
of the compiler.
There are six phases in a compiler. Each of these phases helps in converting the high-level language
into machine code. The phases of a compiler are:
Lexical analysis
Syntax analysis
Semantic analysis
Intermediate code generation
Code optimization
Code generation
Phase 1: Lexical Analysis
Lexical analysis is the first phase, in which the compiler scans the source code. The scan proceeds
from left to right, character by character, and groups the characters into tokens.
Here, the character stream from the source program is grouped into meaningful sequences by identifying
the tokens. The lexical analyzer enters the corresponding tokens into the symbol table and passes each
token to the next phase.
Example:
x = y + 10
Tokens:
x : identifier
= : assignment operator
y : identifier
+ : addition operator
10 : number
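The left-to-right, character-by-character scan and the symbol table entry described above can be
sketched by hand in Python. This is only an illustration for statements like the example: the function
name scan, the OPERATORS table, and the layout of the symbol table (identifier mapped to an entry
number) are assumptions.

```python
# Operators recognised by this sketch (only the two used in the example).
OPERATORS = {"=": "assignment operator", "+": "addition operator"}


def scan(source):
    """Scan left to right, one character at a time; return (tokens, symbol_table)."""
    tokens = []
    symbol_table = {}  # identifier -> entry number, in order of first appearance
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            name = source[start:i]
            # Enter the identifier into the symbol table only the first time it is seen.
            symbol_table.setdefault(name, len(symbol_table) + 1)
            tokens.append((name, "identifier"))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append((source[start:i], "number"))
        elif ch in OPERATORS:
            tokens.append((ch, OPERATORS[ch]))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens, symbol_table


tokens, symbol_table = scan("x = y + 10")
for lexeme, kind in tokens:
    print(f"{lexeme} : {kind}")
print("Symbol table:", symbol_table)
```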
Phase 2: Syntax Analysis
Syntax analysis checks the token stream against the grammar rules of the specific programming language
and constructs a parse tree from the tokens. In doing so, it determines the syntactic structure of the
source program.
Example
Any identifier or number is an expression.
In the parse tree:
Interior node: a record with an operator field and two fields for its children
Leaf: a record with two or more fields, one for the token and the others for information about the token
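The two kinds of tree records can be sketched in Python as follows. The class names Leaf and Interior,
the function parse_assignment, and the tiny grammar it accepts (an identifier, then "=", then operands
joined by "+") are assumptions made for this illustration, not a full parser.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    token: str  # token type: "identifier" or "number"
    value: str  # information about the token (here, its lexeme)


@dataclass
class Interior:
    op: str        # operator field
    left: "Node"   # first child
    right: "Node"  # second child


Node = Union[Leaf, Interior]


def parse_assignment(tokens):
    """Build a tree for: identifier = operand (+ operand)*  (assumed toy grammar)."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        lexeme = tokens[pos]
        pos += 1
        if lexeme.isdigit():
            return Leaf("number", lexeme)
        if lexeme.isidentifier():
            return Leaf("identifier", lexeme)
        raise SyntaxError(f"expected identifier or number, got {lexeme!r}")

    target = operand()
    if target.token != "identifier" or pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("expected 'identifier ='")
    pos += 1
    expr = operand()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        expr = Interior("+", expr, operand())  # '+' is treated as left-associative
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return Interior("=", target, expr)


print(parse_assignment(["x", "=", "y", "+", "10"]))
```

Each identifier or number becomes a leaf, and each operator becomes an interior node whose two children
are the sub-expressions it combines.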
Phase 3: Semantic Analysis
Semantic analysis ensures that the components of the program fit together meaningfully. The semantic
analyzer:
Gathers type information and checks for type compatibility
Checks that the operands are permitted by the source language
Checks for type mismatches, incompatible operands, a function called with improper arguments, an
undeclared variable, etc.
Stores the type information it gathers in the symbol table or syntax tree
Performs type checking
Reports a semantic error on a type mismatch when no type conversion rule satisfies the desired
operation
Example
float x = 20.2;
float y = x * 30;
In the above code, the semantic analyzer will typecast the integer 30 to the float 30.0 before the
multiplication.
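The type check and the implicit conversion in this example can be sketched in Python. The node classes
(Num, Var, BinOp, IntToFloat), the function check, and the rule that only int and float exist are
assumptions made to keep the illustration short.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: str  # literal text, e.g. "30" or "20.2"


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class IntToFloat:
    operand: object  # conversion node inserted by the analyzer


def check(node, symbols):
    """Return (possibly rewritten node, type); the type is "int" or "float"."""
    if isinstance(node, Num):
        return node, ("float" if "." in node.value else "int")
    if isinstance(node, Var):
        if node.name not in symbols:
            raise NameError(f"undeclared variable {node.name!r}")
        return node, symbols[node.name]
    left, left_type = check(node.left, symbols)
    right, right_type = check(node.right, symbols)
    if left_type != right_type:
        # Mixed int/float operands: widen the int side to float.
        if left_type == "int":
            left = IntToFloat(left)
        else:
            right = IntToFloat(right)
        left_type = "float"
    return BinOp(node.op, left, right), left_type


symbols = {"x": "float"}  # type information gathered from the declaration of x
typed, result_type = check(BinOp("*", Var("x"), Num("30")), symbols)
print(typed, result_type)
```

The integer literal 30 ends up wrapped in a conversion node, and the whole expression has type float.
A variable that is missing from the symbol table is reported as undeclared.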
Phase 4: Intermediate Code Generation
Intermediate code lies between the high-level language and the machine-level language. It needs to be
generated in a form that is easy to translate into the target machine code.
Example
total = count + rate * 5
t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3
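A sketch of how such three-address code could be produced from an expression tree is given below in
Python. The tree encoding (nested tuples, with leaves as plain strings) and the function name
generate_tac are assumptions for this illustration. The tree is assumed to already contain the
int_to_float node added during semantic analysis.

```python
def generate_tac(target, expr):
    """Translate an expression tree into a list of three-address instructions."""
    code = []
    counter = 0

    def emit(rhs):
        # Store the result of one operation in a fresh temporary t1, t2, ...
        nonlocal counter
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} := {rhs}")
        return temp

    def walk(node):
        if isinstance(node, str):  # leaf: a variable name or a literal
            return node
        if node[0] == "int_to_float":  # unary conversion node
            argument = walk(node[1])
            return emit(f"int_to_float({argument})")
        op, left, right = node  # binary operator node
        left_name = walk(left)
        right_name = walk(right)
        return emit(f"{left_name} {op} {right_name}")

    result = walk(expr)
    code.append(f"{target} := {result}")
    return code


# total = count + rate * 5, with the conversion of 5 already inserted
tree = ("+", "count", ("*", "rate", ("int_to_float", "5")))
for line in generate_tac("total", tree):
    print(line)
```

The children of a node are visited before the node itself, so every operand has a name by the time the
instruction that uses it is emitted.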
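Phase 5: Code Optimization
Example:
Consider the following code:
a = int_to_float(10)
b = c * a
d = e + b
f = d
After optimization, in which the conversion is evaluated at compile time and the copy through d is
removed, it can become:
b = c * 10.0
f = e + b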
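The two simplifications in this example, evaluating the conversion of a constant at compile time and
removing a copy, can be sketched in Python. The instruction format (target, op, arg1, arg2), the op
name "copy", and the functions optimize and show are assumptions. The sketch also assumes that every
name is assigned only once and that a copied temporary is not needed after the code shown.

```python
def optimize(code):
    """Fold int_to_float on integer literals, then remove copies of single-use values."""
    constants = {}  # name -> constant text that replaces it
    folded = []
    for target, op, arg1, arg2 in code:
        # Replace operands that are known constants.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "int_to_float" and arg1.isdigit():
            constants[target] = f"{arg1}.0"  # do the conversion at compile time
            continue
        folded.append((target, op, arg1, arg2))

    result = []
    for target, op, arg1, arg2 in folded:
        if op == "copy" and result and result[-1][0] == arg1:
            uses = sum(instr[2:].count(arg1) for instr in folded)
            if uses == 1:  # the copy is the only use of the value
                result[-1] = (target,) + result[-1][1:]  # compute directly into target
                continue
        result.append((target, op, arg1, arg2))
    return result


def show(instr):
    """Format one instruction as text."""
    target, op, arg1, arg2 = instr
    if op == "copy":
        return f"{target} = {arg1}"
    if arg2 is None:
        return f"{target} = {op}({arg1})"
    return f"{target} = {arg1} {op} {arg2}"


program = [
    ("a", "int_to_float", "10", None),
    ("b", "*", "c", "a"),
    ("d", "+", "e", "b"),
    ("f", "copy", "d", None),
]
for instr in optimize(program):
    print(show(instr))
```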
Phase 6: Code Generation
Code generation is the final phase. It converts the optimized intermediate code into the target
language, which is machine code: the instructions in the intermediate code are converted into machine
instructions, and memory locations and registers are selected and allotted for the variables. The code
generated by this phase is executed to take inputs and generate the expected outputs.
Example
a = b + 60.0
may be translated into the following register-based instructions:
MOVF b, R1
ADDF #60.0, R1
MOVF R1, a
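A very simple code generator for instructions of this kind can be sketched in Python. It is only an
illustration: the mnemonics MOVF, ADDF, and MULF are textbook-style pseudo-assembly rather than a real
instruction set, and the instruction format (target, op, arg1, arg2), the use of a single register R1,
and the function names are assumptions.

```python
# Pseudo-assembly opcodes for the two floating-point operators used in this sketch.
OPCODES = {"+": "ADDF", "*": "MULF"}


def operand(name):
    """Write numeric literals as immediates (#value); leave variable names as-is."""
    return f"#{name}" if name[0].isdigit() else name


def generate(code):
    """Translate (target, op, arg1, arg2) instructions into pseudo-assembly lines."""
    asm = []
    for target, op, arg1, arg2 in code:
        asm.append(f"MOVF {operand(arg1)}, R1")           # load the first operand
        asm.append(f"{OPCODES[op]} {operand(arg2)}, R1")  # combine with the second
        asm.append(f"MOVF R1, {target}")                  # store the result
    return asm


for line in generate([("a", "+", "b", "60.0")]):
    print(line)
```

This sketch loads and stores around every instruction. A real code generator would keep values in
registers across instructions where it can, which is part of register allocation.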