CH5 2
Code Generation
Code generation is the final phase of a compiler. It takes as input an intermediate representation (IR) of the source program, together with supplementary information in the symbol table, and produces as output an equivalent target program.
The code-generation techniques presented below can be used whether or not an optimizing phase occurs before code generation.
Three address code
The given expression is broken down into several separate instructions, each simple enough to be translated easily into machine language.
Each three-address instruction has at most three operands. Typically it combines an assignment with a single binary operator, as in x := y op z.
Example
Given expression:
a := (-c * b) + (-c * d)
Three-address code:
t1 := -c
t2 := t1 * b
t3 := -c
t4 := t3 * d
t5 := t2 + t4
a := t5
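The breakdown above can be mechanized by a post-order walk over the expression tree, creating a new temporary for every operator. The following is a minimal sketch, not the method of any particular compiler; the tuple encoding of the tree and the function name `gen_tac` are hypothetical choices made for illustration.

```python
from itertools import count


def gen_tac(target, expr):
    """Break expr into three-address instructions whose final result is assigned to target.

    Hypothetical tree encoding: a name (str), ('uminus', e) for unary minus,
    or (op, e1, e2) for a binary operator.
    """
    code = []
    temps = count(1)  # supplies t1, t2, ... for intermediate results

    def walk(e):
        if isinstance(e, str):          # a plain name needs no instruction
            return e
        if e[0] == 'uminus':
            arg = walk(e[1])
            t = f"t{next(temps)}"
            code.append(f"{t} := -{arg}")
            return t
        op, left, right = e
        l, r = walk(left), walk(right)  # operands are evaluated left to right
        t = f"t{next(temps)}"
        code.append(f"{t} := {l} {op} {r}")
        return t

    result = walk(expr)
    code.append(f"{target} := {result}")
    return code
```

Calling `gen_tac('a', ('+', ('*', ('uminus', 'c'), 'b'), ('*', ('uminus', 'c'), 'd')))` yields the six instructions t1 := -c through a := t5. Note that the sketch recomputes -c twice; eliminating such common subexpressions is left to an optimizer.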
The three address code can be represented in two
forms: quadruples and triples.
Quadruples
A quadruple has four fields: the operator, the first source operand, the second source operand and the result, in that order.
For the statement a := -b * (c + d), whose three-address code is t1 := -b, t2 := c + d, t3 := t1 * t2, a := t3, the quadruples are:
No.  Op      Arg1  Arg2  Result
(1)  uminus  b     -     t1
(2)  +       c     d     t2
(3)  *       t1    t2    t3
(4)  :=      t3    -     a
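Because every quadruple names its result explicitly, a quadruple list maps naturally onto a record type. The following is a minimal sketch assuming a hypothetical `Quad` record and the same nested-tuple tree encoding as before; `None` stands for the empty "-" field in the table above.

```python
from itertools import count
from typing import List, NamedTuple, Optional, Union

Expr = Union[str, tuple]  # a name, ('uminus', e) or (op, e1, e2)


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]  # None where the table shows "-"
    result: str


def to_quads(target: str, expr: Expr) -> List[Quad]:
    """Translate target := expr into quadruples, one per operator plus a final copy."""
    quads: List[Quad] = []
    temps = count(1)  # t1, t2, ... name the intermediate results

    def walk(e: Expr) -> str:
        if isinstance(e, str):
            return e
        if e[0] == 'uminus':
            arg = walk(e[1])
            t = f"t{next(temps)}"
            quads.append(Quad('uminus', arg, None, t))
            return t
        op, left, right = e
        l, r = walk(left), walk(right)
        t = f"t{next(temps)}"
        quads.append(Quad(op, l, r, t))
        return t

    result = walk(expr)
    quads.append(Quad(':=', result, None, target))
    return quads
```

For `to_quads('a', ('*', ('uminus', 'b'), ('+', 'c', 'd')))` the four records correspond row for row to the table above.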
Triples
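A triple has three fields: the operator, the first source operand and the second source operand.
Because a triple has no result field, a later instruction refers to an earlier result by its position (instruction number) instead of by a temporary name. For the same statement a := -b * (c + d), the triples are:
No.  Op      Arg1  Arg2
(1)  uminus  b     -
(2)  +       c     d
(3)  *       (1)   (2)
(4)  :=      a     (3)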
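In a triple, each reference to a temporary is replaced by the number of the instruction that computed it. The sketch below converts a quadruple list into triples under two stated assumptions: temporaries are produced only by non-copy instructions, and a final copy x := (n) is written with the assigned name as the first operand and the position of its value as the second (one common convention). The function name `quads_to_triples` is hypothetical.

```python
def quads_to_triples(quads):
    """quads: list of (op, arg1, arg2, result) tuples; returns (op, arg1, arg2) triples."""
    position = {}  # temporary name -> number of the instruction that computed it (1-based)
    triples = []

    def ref(arg):
        # A temporary becomes "(n)"; names, constants and None are kept as they are.
        return f"({position[arg]})" if arg in position else arg

    for n, (op, arg1, arg2, result) in enumerate(quads, start=1):
        if op == ':=':
            # Copy into a program variable: the variable itself is the first operand.
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = n
    return triples
```

Applied to the four quadruples for a := -b * (c + d), it produces ('uminus', 'b', None), ('+', 'c', 'd'), ('*', '(1)', '(2)') and (':=', 'a', '(3)'). The design trade-off is visible here: triples save a field, but moving an instruction changes its number, so every reference to it must be updated.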
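Code generator main tasks:
Instruction selection
Register allocation and assignment
Instruction ordering
Factors that determine the difficulty of instruction selection include the level of the IR.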
ISSUES IN THE DESIGN OF A CODE GENERATOR
The following issues arise during the code generation
phase:
Input to code generator
Target program
Memory management
Instruction selection
Register allocation
Evaluation order
Input to code generator
The input to the code generator consists of the IR of the source program produced by the front end, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.
Prior to code generation, the source program has been scanned, parsed and translated into IR by the front end, along with any necessary type checking. Therefore, the input to the code generator is assumed to be error-free.
Target program
The output of the code generator is the target program, which may take one of the following forms:
Absolute machine language
Producing an absolute machine language program as output has the advantage that
it can be placed in a fixed location in memory and immediately executed.
Relocatable machine language
Producing relocatable object modules as output allows subprograms to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader.
If the target machine does not handle relocation automatically, the compiler must provide explicit relocation information to the loader so that the separately compiled program segments can be linked.
Assembly language
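Producing an assembly language program as output makes the process of code generation somewhat easier, because the code generator can emit symbolic instructions and use the macro facilities of the assembler.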
Memory Management
Names in the source program are mapped to addresses of data
objects in run-time memory by the front end and code generator.
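One simple way to picture this mapping is to lay out the declared names one after another in a data area, giving each a relative address (offset) from the start of the area; the run-time address is then the area's base address plus the offset. The sketch below is illustrative only: the type widths in `WIDTHS` are hypothetical, and alignment padding, which real compilers usually add, is ignored.

```python
WIDTHS = {'char': 1, 'int': 4, 'float': 8}  # hypothetical sizes in bytes


def assign_offsets(declarations, base=0):
    """Give each (name, type) pair the next free offset in a data area.

    Returns the name -> offset map and the total size of the area.
    """
    offsets = {}
    offset = base
    for name, ty in declarations:
        offsets[name] = offset   # the name's relative address
        offset += WIDTHS[ty]     # the next name starts right after this one
    return offsets, offset
```

For example, declaring a: int, x: float, c: char places them at offsets 0, 4 and 12, for an area of 13 bytes.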
Instruction selection
The quality of the generated code is determined by its speed and size.
Register allocation
Instructions involving register operands are usually shorter
and faster than those involving operands in memory.
Therefore efficient utilization of registers is particularly
important in generating good code.
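Evaluation order
The order in which computations are performed can affect the efficiency of the target code, because some orders need fewer registers to hold intermediate results than others.
Initially, we shall avoid the problem by generating code for the three-address statements in the order in which they have been produced by the intermediate code generator.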
Approaches to code generation issues
The most important criterion for a code generator is that it always produces correct code.
Correctness takes on special significance because of the number of special cases that a code generator might face.
t1 = t0 + c
d = t0 + t1
Descriptors
While generating code, the code generator has to track both the registers (for availability) and the addresses (locations of values). For this purpose it maintains two descriptors:
Register descriptor:
It keeps track of the values stored in each register, and so tells the code generator which registers are available.
Whenever a new register is required during code generation, this descriptor is consulted for register availability.
Initially, the register descriptor shows that all registers are empty.
Address descriptor:
An address descriptor records the location (or locations) where the current value of a name can be found at run time.
Values of the names (identifiers) used in the program might be stored at different locations during execution, so the address descriptor keeps track of every location where the current value of each identifier is stored.
These locations may include CPU registers, heaps, stacks, memory, or a combination of these.
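The two descriptors can be kept as two dictionaries that are always updated together. The sketch below is a minimal, hypothetical data structure (the class `Descriptors` and method `record_load` are illustrative names, not a library API). Following a common textbook convention, a name's own memory location is denoted by the name itself.

```python
class Descriptors:
    """Register and address descriptors for a simple code generator (illustrative)."""

    def __init__(self, registers, names):
        # Register descriptor: register -> names whose current value it holds.
        self.reg = {r: set() for r in registers}  # initially every register is empty
        # Address descriptor: name -> locations holding its current value.
        # At the start, each value is only in the name's own memory location.
        self.addr = {n: {n} for n in names}

    def record_load(self, name, register):
        """Update both descriptors after emitting MOV name, register."""
        for old in self.reg[register]:            # the register's old contents are overwritten
            self.addr[old].discard(register)
        self.reg[register] = {name}
        self.addr[name].add(register)             # the value is now in memory and in the register

    def free_registers(self):
        return [r for r, held in self.reg.items() if not held]
```

After `record_load('a', 'R0')`, the register descriptor says R0 holds a, and the address descriptor says a's value is both in memory and in R0; loading another name into R0 removes R0 from a's entry again.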
getReg Function
getReg: The code generator uses the getReg function to determine the status of available registers and the locations of name values. getReg works as follows:
If variable y is already in a register R, it uses that register.
Otherwise, if some register R is available (empty), it uses that register.
Otherwise, it chooses the register whose reuse requires the minimum number of load and store instructions.
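The three rules can be sketched as follows. This is a simplified, hypothetical `get_reg` that follows the rules exactly as listed above; a fuller version would also check, before reusing y's register, that y has no further use. For the third rule it counts only the stores needed (values whose only up-to-date copy is in the register) and ignores reload costs, which is an additional simplification.

```python
def get_reg(y, reg_desc, addr_desc):
    """Choose a register for y.

    reg_desc:  register -> set of names whose value the register holds
    addr_desc: name -> set of locations of the name's current value
               (a name's memory location is written as the name itself)
    Returns (register, spills): names that must be stored before reusing it.
    """
    # Rule 1: y is already in a register.
    for r, held in reg_desc.items():
        if y in held:
            return r, []
    # Rule 2: some register is empty.
    for r, held in reg_desc.items():
        if not held:
            return r, []

    # Rule 3: pick the register that needs the fewest stores, i.e. holds the
    # fewest values that exist nowhere else.
    def stores_needed(r):
        return [n for n in reg_desc[r] if addr_desc[n] == {r}]

    best = min(reg_desc, key=lambda r: len(stores_needed(r)))
    return best, stores_needed(best)
```

The function only decides; the caller is expected to emit a store such as MOV R0, t1 for each returned name and update both descriptors.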
A code-generation algorithm
The algorithm takes a sequence of three-address statements as input. For each three-address statement of the form x := y op z, it performs the following actions:
1. Invoke the function getReg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. If the value of y is currently both in memory and in a register, prefer the register as y'. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is the current location of z; if z is both in memory and in a register, prefer the register. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains the value of x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
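The four steps can be combined into a small code generator for one basic block. This is a minimal sketch under several assumptions: a hypothetical two-address target where MOV src, dst copies and ADD/SUB/MUL src, dst computes dst := dst op src; variable names never clash with register names; "next use" is approximated by scanning the rest of the block; getReg reuses y's register only when it holds y alone and y is dead, otherwise takes an empty register, otherwise spills the first register (a deliberately naive choice). Storing live-on-exit values back to memory at the end of the block is an extra step not listed above.

```python
OPCODES = {'+': 'ADD', '-': 'SUB', '*': 'MUL'}  # hypothetical target opcodes


def gen_block(block, registers, live_on_exit):
    """block: list of (x, y, op, z) meaning x := y op z; returns target instructions."""
    reg = {r: set() for r in registers}  # register descriptor: all registers start empty
    addr = {}                            # address descriptor; a missing name is only in memory
    code = []

    def locations(n):
        return addr.get(n, {n})          # a name's memory location is the name itself

    def current(n):
        in_regs = sorted(l for l in locations(n) if l in reg)
        return in_regs[0] if in_regs else n  # prefer a register over memory

    def used_later(n, i):
        return n in live_on_exit or any(n in (s[1], s[3]) for s in block[i + 1:])

    def getreg(y, i):
        for r, held in reg.items():      # reuse y's register if y is dead afterwards
            if held == {y} and not used_later(y, i):
                return r
        for r, held in reg.items():      # otherwise take an empty register
            if not held:
                return r
        r = registers[0]                 # otherwise spill the first register
        for n in reg[r]:
            if n not in locations(n):    # memory copy is stale: store it
                code.append(f"MOV {r}, {n}")
            addr[n] = (locations(n) - {r}) | {n}
        reg[r] = set()
        return r

    for i, (x, y, op, z) in enumerate(block):
        L = getreg(y, i)                                  # step 1
        y_loc = current(y)                                # step 2
        if y_loc != L:
            code.append(f"MOV {y_loc}, {L}")
        code.append(f"{OPCODES[op]} {current(z)}, {L}")  # step 3
        for n in reg[L]:                                  # L loses its old contents
            addr[n] = locations(n) - {L}
        for held in reg.values():                         # x now lives only in L
            held.discard(x)
        reg[L] = {x}
        addr[x] = {L}
        for n in (y, z):                                  # step 4: release dead operands
            if n != x and not used_later(n, i):
                for held in reg.values():
                    held.discard(n)
                addr[n] = locations(n) - set(reg)

    for n in sorted(live_on_exit):                        # store live values kept only in registers
        if n not in locations(n):
            code.append(f"MOV {current(n)}, {n}")
    return code
```

For the block t := a - b, u := a - c, v := t + u, d := v + u with registers R0 and R1 and only d live on exit, the sketch emits MOV a, R0; SUB b, R0; MOV a, R1; SUB c, R1; ADD R1, R0; ADD R1, R0; MOV R0, d. Because t and v are dead after their single use, their registers are reused in place and no extra loads are generated.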