Compiler Design-UNIT-5


jntuk396.blogspot.com
UNIT V Compiler Design

UNIT-V
Symbol Table & Run-Time Environments
Symbol Table
 A symbol table is a data structure used by the compiler to keep track of the
semantics of variables.
 i.e. the symbol table stores scope and binding information about names.
 The symbol table is built during the lexical and syntax analysis phases.
 It is used by the later phases as follows: the semantic analysis phase refers
to the symbol table for type-conflict issues, and code generation refers to the
symbol table to know how much run-time space is allocated and what type of
space is allocated.
Use of symbol table
 To achieve compile-time efficiency, the compiler makes use of the symbol table.
 It associates lexical names with their attributes.
 The items to be stored in symbol table are,
 Variable name
 Constants
 Procedure names
 Literal constants & strings
 Compiler generated temporaries
 Labels in source program
 Compiler uses following types of information from symbol table
 Data type
 Name
 Procedure declarations
 Offset in storage
 In case of structure or record, a pointer to structure table
 For parameters, whether pass by value or reference
 Number and type of arguments passed
 Base address

Types of symbol table


Ordered symbol table
 Here, the entries of variables are made in alphabetical order.
 Searching of an ordered symbol table can be done using linear or binary
search.
Advantages:
 Searching for a particular variable is efficient.
 Relationships between variables can be established easily.
Disadvantages:
 Insertion of an element is costly if there are a large number of entries in the
table.
Unordered symbol table
 In this type of table, variable entries are not made in sorted order.
 Each time before inserting a variable into the table, a lookup is made to
check whether it is already present in the symbol table.
 If the variable is not present, then an entry is made.
Advantage:
 Insertion of a variable is easier.
Disadvantage:
 Searching is done using linear search.
 For larger tables the method turns out to be inefficient, because a lookup is
made before every insertion.

How names are stored in symbol table?


 There are two ways to store names in a symbol table.
 Fixed-length names:
 A fixed space is allocated for every symbol in the table. Space is wasted
in this type of storage if the name of the variable is short.

 For example,

Name        Attribute
CALCULATE   Float…
SUM         Float…
A           Int…
B           Int…

Variable-length names:
 Only the amount of space required by the string is used to store the names.
 A name can be stored with the help of its starting index and length.
 E.g.,

Starting index   Length   Attribute
0                10       …
10               4        …
14               2        …
16               2        …

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
c a l c u l a t e $ s u m $ a $ b $


Symbol table management


Symbol table management is required for the following reasons,
 For quick insertion of identifier and related information.
 For quick search of identifier.
 Following are commonly used data structures for symbol table
construction
 List Data structure
 Self organizing list/ Linked list
 Binary tree
 Hash tables
List data structure
 A linear list is the simplest mechanism to implement a symbol table.
In this method an array is used to store names and associated information.
 New names can be added in the order they arrive.
 A pointer “available” is maintained just past the end of all stored records.
Name 1 Info 1

Name 2 Info 2

Name 3 Info 3

. .
. .
. .

Name n Info n

 To retrieve information about some name we start from the beginning and
go on searching up to available pointer.

 If we reach the available pointer without finding the name, then we get an
error “use of undeclared name”.
 While inserting a new name we should ensure that it is not already
there. If it is already there, another error occurs, i.e. “multiply-defined
name”.
 Advantage is it takes less amount of space.
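The lookup-before-insert behaviour described above can be sketched in C as
follows; the array size, field widths and the name `available` are illustrative
assumptions, not prescribed by the notes.

```c
#include <assert.h>
#include <string.h>

#define MAX 100

struct entry { char name[32]; char type[16]; };

static struct entry table[MAX];
static int available = 0;          /* points just past the last stored record */

/* linear search from the beginning up to the available pointer */
int lookup(const char *name)
{
    for (int i = 0; i < available; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;              /* found */
    return -1;                     /* "use of undeclared name" */
}

/* insert only if not already present; -1 signals "multiply-defined name" */
int insert(const char *name, const char *type)
{
    if (lookup(name) != -1 || available == MAX)
        return -1;
    strcpy(table[available].name, name);
    strcpy(table[available].type, type);
    return available++;
}
```

Note that every insertion pays for a full linear scan, which is exactly the
inefficiency the notes attribute to list organization.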
Self organizing list
 This symbol table representation uses a linked list. A link field is added to
each record.
 We search the records in the order pointed to by the link field.
 A pointer “first” is maintained to point to the first record of the symbol
table.
 When a name is referenced or created, it is moved to the front of the list.
 The most frequently referred names will tend to be at the front of the list.
Hence access time to the most frequently referred names will be the least.
 Insertion is easier.
Binary trees
The symbol table is represented as a binary tree; each node has the form,

Left child | Symbol | Information | Right child

 The left child stores the address of the previous (smaller) symbol and the
right child stores the address of the next (larger) symbol.
 The symbol field stores the name of the symbol and the information
field stores all attributes/information of the symbol.
 The tree is basically a BST, in which the left child is always less than
and the right child always greater than the parent node.
 Hence, insertion of a symbol is efficient.
 The searching process is also efficient.
 Create a BST structure for the following:

int m, n, p;

int compute(int a, int b, int c)
{
    int t = a + b + c;
    return t;
}

int main(void)
{
    int k;
    k = compute(10, 20, 30);
    return 0;
}
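A BST-based symbol table for the identifiers in this fragment can be sketched
as follows; the node layout and the use of strcmp ordering are assumptions for
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node {
    char name[32];
    char info[32];                  /* attributes of the symbol */
    struct node *left, *right;      /* lexicographically smaller / larger */
};

struct node *bst_insert(struct node *root, const char *name, const char *info)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        strcpy(n->name, name);
        strcpy(n->info, info);
        n->left = n->right = NULL;
        return n;
    }
    int cmp = strcmp(name, root->name);
    if (cmp < 0)      root->left  = bst_insert(root->left,  name, info);
    else if (cmp > 0) root->right = bst_insert(root->right, name, info);
    return root;                    /* duplicates are ignored */
}

struct node *bst_lookup(struct node *root, const char *name)
{
    while (root != NULL) {
        int cmp = strcmp(name, root->name);
        if (cmp == 0) return root;
        root = (cmp < 0) ? root->left : root->right;
    }
    return NULL;
}
```

Inserting m, n, p, compute, a, b, c, t and k in order of appearance builds the
requested BST; each comparison halves the search space when the tree is balanced.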

Hash tables
 Hashing is an important technique used to search the records of a symbol
table. This method is superior to list organization.
 In a hashing scheme two tables are maintained – a hash table and a symbol
table.
 The hash table consists of k entries, from 0 to k-1. These entries are
pointers to the names in the symbol table.
 To determine whether a name is in the symbol table, we use a hash
function “h” such that h(name) results in an integer between 0 and k-1.
 We can search for any name using this function.
 The hash function should result in a uniform distribution of names in the
symbol table.
 The hash function should be such that there is a minimum number of
collisions.
 The advantage of hashing is that quick search is possible; the disadvantages
are that it is complicated to implement, extra space is required, and obtaining
the scope of variables is difficult.
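A minimal sketch of such a hash function h; the multiplier and table size below
are illustrative choices, not prescribed by the notes.

```c
#include <assert.h>

#define K 211                      /* number of hash-table entries, 0..K-1 */

/* map a name to a bucket index in 0..K-1 */
unsigned h(const char *name)
{
    unsigned v = 0;
    while (*name)
        v = v * 65599u + (unsigned char)*name++;   /* mix in each character */
    return v % K;                  /* always lands in 0..K-1 */
}
```

The same name always hashes to the same bucket, so a lookup computes h(name)
once and then searches only that bucket's chain in the symbol table.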


Runtime environment
 The compiler requests a block of memory from the OS. This memory is
used for running (executing) the compiled program.
 This block of memory is called run-time storage.
 The run time storage is subdivided to hold the following,
 The generated target code
 Data objects
 Information which keeps track of procedure activations
 The size of generated code is fixed. Hence the target code occupies
statically determined area of the memory.
 Compiler places the target code at the lower end of the memory.
 The amount of memory required by the data objects is known at
compile time, and hence the data objects can also be placed in the static data
area of memory.
 To maximize the utilization of space at run time, the other two areas, stack
and heap are used at opposite ends of the remaining address space.
 The stack is used to store data structures called activation records that get
generated during procedure calls.
 Heap area is the area of run time storage which is allocated to variables
during run-time.
 Size of stack and heap is not fixed. i.e. it may grow or shrink during
program execution.

Storage Organization
The executing target program runs in its own logical address space in which each
program value has a location.
The management and organization of this logical address space is shared among
the compiler, operating system and target machine. The operating system maps
logical addresses into physical addresses, which are usually spread throughout
memory.


 Run-time storage comes in blocks, where a byte is the smallest unit of
addressable memory. Four bytes form a machine word. Multi-byte objects
are stored in consecutive bytes and given the address of the first byte.
 The storage layout for data objects is strongly influenced by the addressing
constraints of the target machine.
 A character array of length 10 needs only enough bytes to hold 10
characters, but a compiler may allocate 12 bytes to get alignment, leaving 2
bytes unused.
 This unused space due to alignment considerations is referred to as padding.
 Program objects whose size is known at compile time may be placed in an
area called static.
 The dynamic areas used to maximize the utilization of space at run time are
the stack and the heap.
Storage organization strategies
 Three different strategies based on the division of run time storage
 Static allocation
 Stack allocation
 Heap allocation

Static Allocation
 The size of data objects is known at compile time. The names of these
objects are bound to storage at compile time itself, and such allocation is
called static allocation. The amount of storage does not change during run
time. At compile time, the compiler can fill in the addresses at which the
target code will find the data it operates on.


 The main limitation is that recursive procedures are not supported by this
type of allocation, because of its static nature.
Stack allocation
 Here the storage is organized as a stack (LIFO).
 This stack is also called the control stack.
 As an activation begins, its activation record is pushed onto the stack, and
on completion of the activation the corresponding activation record is
popped off.
 The local variables are stored in each activation record. Hence locals are
bound to the corresponding activation record on each fresh activation.
 Data structures can be created dynamically under stack allocation.
 Here memory addressing is done using pointers or index registers, hence it
is slower than static allocation.
Heap allocation
 If the values of non-local variables must be retained even after an
activation ends, such retention is not possible with stack allocation
because of its LIFO nature. Hence, heap allocation is used in such
situations.
 Heap allocation allocates a contiguous block of memory when required
for storage of activation records or other data objects.
 This memory can be deallocated when the activation ends.
 Heap management can be done by keeping a linked list of the free blocks;
when any memory is deallocated, that block is appended to the list.


Activation record
 An activation record is a block of memory used for managing the information
needed by a single execution of a procedure. The contents of an activation
record are,

 Temporaries: temporary values used during the evaluation of expressions.
 Local data: data that is local to this execution of the procedure.
 Saved machine status: this field holds information about the status of the
machine just before the procedure is called. It contains the machine
registers and the program counter (PC).
 Control link: an optional field that points to the activation record of the
calling procedure. This link is also called the dynamic link.
 Access link: an optional field pointing to non-local data needed by the called
procedure but found elsewhere, i.e. in another activation record.
 Actual parameters: the parameters passed during the call.
 Return value: stores the result of the function call.
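The fields above can be pictured as a C struct; the concrete types and array
sizes below are assumptions for illustration — the real layout is chosen by the
compiler for each procedure.

```c
#include <assert.h>

/* one activation record, field per field as listed in the notes */
struct activation_record {
    int   return_value;            /* result of the function call */
    int   actual_params[4];        /* parameters passed during the call */
    void *access_link;             /* record where non-local data is found */
    void *control_link;            /* (dynamic link) record of the caller */
    int   saved_registers[8];      /* saved machine status, incl. the PC */
    int   locals[8];               /* local data of the procedure */
    int   temporaries[8];          /* temporaries for expression evaluation */
};
```

On each call a fresh record of this shape is pushed on the control stack, so
every field sits at a fixed offset from the record's base pointer.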


Example: Sketch of a quicksort program

Activation for Quicksort


Activation tree representing calls during an execution of quicksort

Downward-growing stack of activation records


Block and non-block structured storage allocation


 The storage allocation is done for two types of data,
 Local data
 Non-local data
 The local data can be handled using activation record whereas non local data
can be handled using scope information.

Access to Local Data


 Local data can be accessed with the help of the activation record.
 The offset relative to the base pointer of an activation record locates the
local variables within that activation record.
 Hence, reference to any variable x in a procedure = base pointer pointing to
the start of the record + offset of variable x from the base pointer.
 E.g. consider the following,

procedure A
    int a;
    procedure B
        int b;
        body of B;
    body of A;
The contents of the stack along with the base pointer are shown below,

    Return value
    Saved registers     Activation record for A
    Parameters          (each field at a fixed offset)
    Locals: a


(Figure: the activation record for A followed by the activation record for B on
the stack; each variable is located at a fixed offset within its record.)

Access to non local Data


 A procedure may sometimes refer to variables which are not local to it.
 Such variables are called non-local variables.
 For non-local names there are two types of scope rules: static and
dynamic.
 The static scope rule is also called lexical scope. Here the scope is
determined by examining the program text.
 Languages that use static scope rules are called block-structured
languages.
 Dynamic scope rules determine the scope of a declaration of a name at
run time, by considering the current activation.
Static or lexical scope
 A block is a sequence of statements containing local data declarations,
enclosed within delimiters.
 Blocks can be nested.
 The scope of a declaration in a block-structured language is given by the
most closely nested rule (the static rule).


 E.g.
scope_test()
{
    int p, q;          /* block B1 */
    {
        int p;         /* block B2 */
        {
            int r;     /* block B3 */
        }
        ……
    }
    ….
    {
        int q, s, t;   /* block B4 */
    }
}
 The storage for the names corresponding to a particular block can be shown
below.


Lexical scope can be implemented using access links and displays.
Access link:
 Lexical scope can be implemented by keeping a pointer in each activation
record. These pointers are called access links.
 If a procedure p is nested within a procedure q, then the access link of p
points to the most recent activation record of procedure q.

Display:
 It is expensive to traverse down the access links every time a particular
non-local variable is accessed. Access to non-locals can be sped up by
maintaining an array of pointers called a display.
 In display,
 An array of pointers to activation record is maintained.
 Array is indexed by nesting level
 The pointers point to only accessible activation record.
 The display changes when a new activation occurs and it must be reset
when control returns from the new activation.

display[0]
display[1]
display[2]

(Display stack)
The advantage of using a display is that if p is executing and needs to access an
element x belonging to some procedure q, we need to look only in display[i],
where i is the nesting depth of q. We follow the pointer display[i] to the
activation record for q, wherein x is found at some offset.
The compiler knows what i is, so it can access display[i] easily. Hence there is
no need to follow a long chain of access links.
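The single-indirection access described above can be sketched as follows; the
record layout, display size and names are illustrative assumptions.

```c
#include <assert.h>

struct ar { int locals[4]; };      /* heavily simplified activation record */

struct ar *display[8];             /* indexed by nesting depth */

/* fetch the variable at 'offset' in the procedure at nesting depth 'depth' */
int nonlocal(int depth, int offset)
{
    /* one pointer dereference — no walk along an access-link chain */
    return display[depth]->locals[offset];
}
```

Whatever the static distance between p and q, the cost of the access stays
constant, which is exactly the point of keeping a display.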


Heap Management
 Heap is a portion of memory that holds data during the lifetime of the program.
 In the heap, space is allocated for data whose lifetime is not tied to a
single procedure activation.
 In addition, any memory that is to be used throughout the program can also
be kept in the heap.
 Hence managing the heap is important.
 Special software called the memory manager handles the allocation and
deallocation of memory.
Memory Manager:
 The two basic functions of the memory manager are,
 Allocation: when a program requests memory for a variable, the
memory manager produces a chunk of heap memory of the requested size.
 Deallocation: the memory manager reclaims deallocated space and adds it
to the pool of free space so that it can be reused.
 Desired properties of a memory manager:
 Space efficiency: it should minimize the total heap space required by a
program.
 Program efficiency: it should make good use of space, so that the
program runs faster.
 Low overhead: allocation and deallocation should be efficient.
Two types of memory allocation techniques are,
 Explicit allocation
 Implicit allocation
 Explicit allocation is done using constructs like new and dispose.
 Implicit memory allocation is done by the compiler using run-time
support packages.

Explicit allocation
 This is the simplest technique of explicit allocation, where the size of the
block for which memory is allocated is fixed.


 In this technique a free list is used, which is a list of free blocks. Memory
is allocated from this list.
 The blocks are linked to each other in a list structure.
 Memory allocation can be done by pointing the previous node to the newly
allocated block.
 Similarly, deallocation can be done by resetting the previous node's link.

Explicit allocation of variable sized blocks


 Due to frequent memory allocation and deallocation, the heap memory
becomes fragmented.
 For allocating variable-sized blocks we use strategies like first fit, best fit
and worst fit.
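A first-fit search over such a free list can be sketched as follows; the block
layout and sizes are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct block { int size; struct block *next; };

/* first fit: return the first free block large enough, or NULL */
struct block *first_fit(struct block *free_list, int size)
{
    for (struct block *b = free_list; b != NULL; b = b->next)
        if (b->size >= size)
            return b;              /* stop at the first block that fits */
    return NULL;                   /* nothing fits: the heap is fragmented */
}
```

Best fit would instead scan the whole list for the smallest block that is still
large enough; first fit trades some fragmentation for a shorter search.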

Implicit allocation
 Implicit allocation is performed for the user program by run-time support
packages.
 The run-time package needs to know when a storage block is no longer in use.
 The format of a storage block is given below.
 Reference count (RC): a special counter used during implicit allocation. If a
block is referred to by any other block, its reference count is incremented
by one.
 If the value of RC is 0, the block can be deallocated.
 Marking techniques: an alternative approach to determine whether a block
is in use or not. In this method the user program is suspended temporarily,
and the frozen pointers are used to mark the blocks that are in use.
Parameter passing Mechanism
 Types of parameters
 Formal : parameters used in the function definition.
 Actual: parameters passed during function call.
 What are l-values and r-values?
 The r-value is the value of the expression that appears on the right side
of an assignment operator.
 The l-value is the address of the memory location (or variable) that appears
on the left side of the assignment operator.


 What are the parameter passing methods?
 Call by value / pass by value
 Call by address / pass by reference
 Pass by copy-restore
 Pass by name
 Example: swapping of two numbers (using the ‘C’ language)
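The swap example can be written out in C as follows; the function names are
illustrative. `swap_value` receives copies (pass by value), so the caller's
variables are untouched, while `swap_ref` receives addresses (C's way of passing
by reference), so the swap is visible to the caller.

```c
#include <assert.h>

void swap_value(int a, int b)      /* pass by value: a, b are copies */
{
    int t = a;
    a = b;
    b = t;                         /* only the copies are swapped */
}

void swap_ref(int *a, int *b)      /* pass by reference via pointers */
{
    int t = *a;
    *a = *b;
    *b = t;                        /* the caller's memory is swapped */
}
```

Calling swap_value(x, y) leaves x and y unchanged; calling swap_ref(&x, &y)
actually exchanges them.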
Pass by Value:
In the pass-by-value mechanism, the calling procedure passes the r-value of the
actual parameters and the compiler puts it into the called procedure's activation
record. Formal parameters then hold the values passed by the calling procedure.
If the values held by the formal parameters are changed, it has no impact on the
actual parameters.
Pass by Reference:
In the pass-by-reference mechanism, the l-value of the actual parameter is copied
to the activation record of the called procedure. This way, the called procedure
has the address (memory location) of the actual parameter, and the formal
parameter refers to the same memory location. Therefore, if the value pointed to
by the formal parameter is changed, the change is seen on the actual parameter,
as both refer to the same location.
Pass by Copy-restore:
This parameter passing mechanism works like pass-by-reference, except that the
changes to actual parameters are made only when the called procedure ends.
Upon a function call, the values of the actual parameters are copied into the
activation record of the called procedure. Manipulating the formal parameters
has no immediate effect on the actual parameters, but when the called procedure
ends, the values of the formal parameters are copied back to the l-values of the
actual parameters.


Example:
int y;

void copy_restore(int x)     /* x receives a copy of y's value */
{
    x = 99;                  /* y still has value 10 (unaffected) */
    y = 0;                   /* y is now 0 */
}                            /* on return, x (99) is copied back into y */

void calling_procedure()
{
    y = 10;
    copy_restore(y);         /* l-value of y is passed */
    printf("%d", y);         /* prints 99 */
}
When the procedure ends, the value of the formal parameter x is copied back to
the actual parameter y. Even though the value of y was changed before the
procedure ended, the value of x overwrites y, making it behave like call by
reference.
Pass by Name
Languages like ALGOL provide a parameter passing mechanism that works like the
preprocessor in the C language. In the pass-by-name mechanism, each use of a
formal parameter is replaced by the corresponding actual argument expression.
That is, pass-by-name textually substitutes the argument expressions of a
procedure call for the corresponding parameters in the body of the procedure, so
that the body now works on the actual parameters, much like pass-by-reference.
Garbage collection
 The process of collecting unused memory (which was previously allocated to
variables/objects and is no longer needed) in a program and pooling it in a
form that can be used by other applications is called GARBAGE COLLECTION.
 A few languages support automatic garbage collection; in other languages we
need to explicitly use garbage collection techniques.
 The basic idea is to keep track of what memory is referenced, and when it is
no longer accessible, reclaim the memory. Example: a linked list.

Reference count garbage collectors


 Garbage collection (GC) works as follows.
 When an application needs some free space to allocate nodes and there is
no free space available to allocate memory to the objects, a system
routine called the GARBAGE COLLECTOR is invoked.
 The routine then searches the system for nodes that are no longer
accessible from external pointers. These nodes can be made available
for reuse by adding them to the available pool.
 The reference count is a special counter used during implicit memory
allocation. If a block is referred to by any other block, its reference
count is incremented by one.


 A block is deallocated as soon as its reference count becomes zero.
 This kind of garbage collector is called a reference count garbage
collector.
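The counter bookkeeping can be sketched as follows; the object layout and the
names retain/release are illustrative assumptions.

```c
#include <assert.h>

struct obj { int rc; };            /* rc = reference count of the block */

/* another block starts referring to o: increment its count */
void retain(struct obj *o)
{
    o->rc++;
}

/* one reference to o goes away; returns 1 when o can be deallocated */
int release(struct obj *o)
{
    if (--o->rc == 0)
        return 1;                  /* RC reached 0: no one refers to o */
    return 0;
}
```

Every pointer copy must call retain and every pointer drop must call release;
the block is reclaimed exactly when the last reference disappears. (Cycles of
blocks referring to each other are the classic weakness of this scheme.)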

Advantages of GC are,
 Manual memory management done by the programmer (using malloc,
realloc, free) is time consuming and error prone.
 Reusability of memory is achieved using garbage collection.
Disadvantages are,
 The execution of the program is stopped or paused during garbage collection.
 Sometimes a situation called thrashing occurs.
CODE GENERATION
Introduction
 The final phase of a compiler is the code generator.
 It receives an intermediate representation (IR) along with the information in
the symbol table.
 It produces a semantically equivalent target program.

 Code generator main tasks:
 Instruction selection
 Register allocation and assignment
 Instruction ordering


Issues in the Design of Code Generator:


 The most important criterion is that it produces correct code
 Input to the code generator: IR + Symbol table

 We assume front end produces low-level IR, i.e. values of names in it can
be directly manipulated by the target machine.
 Syntactic and semantic errors have been already detected.
 The target program
 Common target architectures are: RISC, CISC and Stack based machines.
 In this chapter, we use a very simple RISC-like computer augmented with
some CISC-like addressing modes.
Instruction selection:

 The code generated must map the IR program into a code sequence that can be
executed by the target machine. The complexities of this mapping are,
 The level of the IR(Intermediate representation)
 The nature of the instruction-set architecture
 The desired quality of the generated code.(speed & size)
Register allocation
 The two subproblems are,
 Register allocation: selecting the set of variables that will reside in
registers at each point in the program.
 Register assignment: selecting the specific register that a variable
resides in.
Evaluation ordering
 The order in which computations are performed can affect the efficiency
of the target code.
 Some computation orders require fewer registers to hold intermediate
results than others.
The Target Language

The target language performs following operations,


 Load operations: LD r,x and LD r1,r2
 Store operations: ST x,r
 Computation operations: OP dst, src1, src2
 Unconditional jumps: BR L
 Conditional jumps: Bcond r, L (like BLTZ r, L)
The simple target machine model uses following addressing modes.
 variable name: x
 indexed address: a(r) like, LD R1, a(R2) means R1=contents(a + contents(R2))
 integer indexed by a register : like LD R1, 100(R2)
 Indirect addressing mode: *r means the memory location found in the location
represented by the contents of register r and *100(r) means
content(contents(100+contents(r)))
 immediate constant addressing mode: like LD R1, #100
The three-address statement x = y - z can be implemented by the machine
instructions,
LD R1, y
LD R2, z
SUB R1, R1, R2
ST x, R1
Suppose a is an array whose elements are 8-byte real numbers. Then,
b = a[i]
LD R1, i          // R1 = i
MUL R1, R1, 8     // R1 = R1 * 8
LD R2, a(R1)      // R2 = contents(a + contents(R1))
ST b, R2          // b = R2

a[j] = c
LD R1, c          // R1 = c
LD R2, j          // R2 = j
MUL R2, R2, 8     // R2 = R2 * 8
ST a(R2), R1      // contents(a + contents(R2)) = R1

x = *p
LD R1, p          // R1 = p
LD R2, 0(R1)      // R2 = contents(0 + contents(R1))
ST x, R2          // x = R2


A conditional-jump three-address instruction,

if x < y goto M
LD R1, x          // R1 = x
LD R2, y          // R2 = y
SUB R1, R1, R2    // R1 = R1 - R2
BLTZ R1, M        // if R1 < 0 jump to M

Basic blocks and flow graphs

 A graph representation of intermediate code that is helpful for code generation.
 Partition the intermediate code into basic blocks, which are sequences of
three-address code with the properties that,
 The flow of control can only enter the basic block through the first
instruction in the block. That is, there are no jumps into the middle of
the block.
 Control will leave the block without halting or branching, except possibly
at the last instruction in the block.
 The basic blocks become the nodes of a flow graph, whose edges indicate
which blocks can follow which other blocks.

Rules for finding leaders


 The first three-address instruction in the intermediate code is a leader.
 Any instruction that is the target of a conditional or unconditional jump is a
leader.
 Any instruction that immediately follows a conditional or unconditional jump
is a leader.
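The three leader rules can be sketched over a toy instruction array; modeling an
instruction only by whether it jumps, and to where, is a deliberate
simplification of real three-address code.

```c
#include <assert.h>

struct instr { int is_jump; int target; };   /* target is a 0-based index */

/* mark leaders[i] = 1 for each leader in instrs[0..n-1] */
void find_leaders(const struct instr *instrs, int n, int *leaders)
{
    for (int i = 0; i < n; i++) leaders[i] = 0;
    if (n > 0) leaders[0] = 1;                   /* rule 1: first instruction */
    for (int i = 0; i < n; i++) {
        if (instrs[i].is_jump) {
            leaders[instrs[i].target] = 1;       /* rule 2: jump target */
            if (i + 1 < n) leaders[i + 1] = 1;   /* rule 3: follows a jump */
        }
    }
}
```

Each basic block then runs from one leader up to (but not including) the next
leader, or to the end of the program.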

Intermediate code to set a 10*10 matrix to an identity.


Three address code for the above code is,

Flow graph
 Once basic blocks are constructed, the flow of control between these blocks
can be represented using edges.
 There is an edge from B to C iff it is possible for the first instruction in
block C to immediately follow the last instruction in block B.
 There are two ways such an edge can be justified,
 There is a conditional or unconditional jump from the end of B to the
beginning of C.
 C immediately follows B in the original order of the three-address
instructions and B does not end in an unconditional jump.

Example: consider the above three-address code; by the three rules above, the
leader instructions are 1, 2, 3, 10, 12 and 13. Using these leaders we can
construct 6 basic blocks, and these basic blocks can then be connected to each
other using edges as shown below.


(Flow graph)

A Simple code generator


 One of the primary issues in code generation is deciding how to use registers
to best advantage. The following are the principal uses of registers.
 In most machine architectures, some or all of the operands of an operation
must be in registers in order to perform the operation.
 Registers make good temporaries – places to hold the result of a
subexpression while a larger expression is being evaluated.
 Registers are used to hold global values which are computed in one basic
block and used in other blocks.
 Registers are often used to help with run-time storage management, for
example, to manage the run-time stack, including the maintenance of stack
pointers and possibly the top elements of the stack itself.


Descriptors
 For each available register, a register descriptor keeps track of the variable
names whose current value is in that register. Since we shall use only those
registers that are available for local use within a basic block, we assume that
initially, all register descriptors are empty. As the code generation progresses,
each register will hold the value of zero or more names.

 For each program variable, an address descriptor keeps track of the location or
locations where the current value of that variable can be found. The location
might be a register, a memory address, a stack location, or some set of more
than one of these. The information can be stored in the symbol-table entry for
that variable name.
The Code-generation Algorithm

This algorithm uses a function getReg(I), which selects registers for each
memory location associated with instruction I.
 Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry
and Rz.
 If y is not in Ry (according to the register descriptor for Ry), then issue an
instruction LD Ry, y', where y' is one of the memory locations for y
(according to the address descriptor for y).
 Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a
location for z.
 Issue the instruction ADD Rx, Ry, Rz.
Rules for updating the register and address descriptors
1. For the instruction LD R, x
 Change the register descriptor for register R so it holds only x.
 Change the address descriptor for x by adding register R as an additional
location.

2. For the instruction ST x, R, change the address descriptor for x to include its
own memory location.

3. For an operation such as ADD Rx, Ry, Rz implementing a three-address
instruction x = y + z
 Change the register descriptor for Rx so that it holds only x.
 Change the address descriptor for x so that its only location is Rx. Note
that the memory location for x is not now in the address descriptor for x.
 Remove Rx from the address descriptor of any variable other than x.

4. When we process a copy statement x = y, after generating the load of y into
register Ry, if needed, and after managing descriptors as for all load statements
(per rule 1):
 Add x to the register descriptor for Ry.
 Change the address descriptor for x so that its only location is Ry.
Example
Let us consider a basic block containing the following 3-address code.

t=a-b
u=a-c
v=t+u
a=d
d=v+u
Instructions generated and the changes in the register and address descriptors is
shown below.


Rules for picking register Ry for y

 If y is currently in a register, pick a register already containing y as Ry.
Do not issue a machine instruction to load this register, as none is needed.
 If y is not in a register, but there is a register that is currently empty,
pick one such register as Ry.
 The difficult case occurs when y is not in a register and there is no register
that is currently empty. We need to pick one of the allowable registers
anyway, and we need to make it safe to reuse.
Peephole optimizations
 An alternative approach to code generation is to generate naïve code and
then improve the quality of the target code by applying optimizations.
 A simple but effective technique for locally improving the target code is
peephole optimization.
 This is done by examining a sliding window of target instructions (called
the peephole) and replacing the instruction sequences within the peephole by
a shorter or faster sequence whenever possible.


Characteristics of peephole optimizations


 Redundant-instruction elimination
 Flow-of-control optimizations
 Algebraic simplifications
 Use of machine idioms
Redundant-instruction elimination
 LD R0,a
ST a,R0
Here the store is redundant: whenever it executes, the load has just placed the
value of a in R0, so ST a,R0 can be deleted. (If the store carried a label, control
could reach it without executing the load, and it would have to be kept.)
Eliminating unreachable code
Example 1:
sum = 0;
if (sum)
    printf("%d", sum);   /* unreachable: sum is always 0 here */
Example 2:
int fun(int a, int b)
{
    int c = a + b;
    return c;
    printf("%d", c);     /* unreachable: follows the return */
}

Flow-of-control optimizations
goto L1
...
L1: goto L2

Can be replaced by:


goto L2
(If there are now no remaining jumps to L1 and it is preceded by an unconditional
jump, the statement L1: goto L2 can be eliminated as well.)

Algebraic simplifications
There is no end to the amount of algebraic simplification that can be attempted
through peephole optimization. Only a few algebraic identities occur frequently
enough to be worth implementing. For example, statements such as
x := x + 0
or
x := x * 1

are often produced by straightforward intermediate code-generation algorithms,
and they can be eliminated easily through peephole optimization.
Reduction in Strength: Reduction in strength replaces expensive operations by
equivalent cheaper ones on the target machine. Certain machine instructions are
considerably cheaper than others and can often be used as special cases of more
expensive operators.
For example, x² is invariably cheaper to implement as x*x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of two is
cheaper to implement as a shift. Floating-point division by a constant can be
implemented as multiplication by a constant, which may be cheaper.
x² → x*x
Use of Machine Idioms:
Some target machines provide single machine instructions that implement certain
operations more efficiently than a general instruction sequence. Replacing target
instructions by such equivalent machine idioms improves efficiency. For example,
some machines have auto-increment and auto-decrement addressing modes, which
perform an addition or subtraction as a side effect of a memory access.
Register Allocation and Assignment
Instructions involving register operands are shorter and faster than those involving
operands in memory.
The use of registers is subdivided into two subproblems:
Register allocation – the set of variables that will reside in registers at a point in
the program is selected.
Register assignment – the specific register that a variable will reside in is picked.
Following are the techniques.

 Global Register Allocation
 Usage Counts
 Register Assignment for Outer Loops
 Register Allocation by Graph Coloring

Global register allocation


 The algorithm explained previously does local (block-by-block) register
allocation.
 As a result, all live variables must be stored back to memory at the end of
each block.

 To save some of these stores and their corresponding loads, we might arrange
to assign registers to frequently used variables and keep these registers
consistent across block boundaries (globally)
 Some options are:
 Keep values of variables used in loops inside registers
 Use a graph-coloring approach for more global allocation
Usage counts
 The usage count measures how often a variable x is used (and kept live)
across the basic blocks of a loop.
 The usage count gives an idea of how many units of cost can be saved by
selecting a specific variable for global register allocation.
 The approximate formula for the usage count of a variable x in a loop L is
∑B in L (use(x,B) + 2 * live(x,B))
 The sum is taken over all blocks B in the loop L.
 For each use of x in B before any definition of x in that block, we add one
unit of saving.
 If x is live on exit from B and is assigned a value in B, then we add 2 units
of saving
Example: consider the flow graph of an inner loop (shown as a figure in the
original notes), in which the usage counts of a, b, c, d, e and f work out to 4, 5, 3,
6, 4 and 4 respectively. Hence, if three registers (R1, R2, R3) are available, they
are allocated to the variables d, b and a, which have the highest usage counts.
Register assignment for outer loops
Consider that there are two loops L1 and L2 where L2 is the inner loop and
allocation of variable a is to be done to some register.
The following criteria should be adopted for register assignment in the outer loop.
If a is allocated a register in loop L2, then it need not be allocated one in L1 - L2
(the part of L1 outside L2).
If a is allocated in L1 and it is not allocated in L2 then store a on an entrance to L2
and load a while leaving L2.
If a is allocated in L2 and not in L1 then load a on entrance of L2 and store a on
exit from L2.

Register allocation by Graph coloring


 When we need a register but all registers are occupied, some register must be
freed for reuse. Graph coloring provides a systematic way of making this
selection.
 Two passes are used in this technique.
 In the first pass, target-machine instructions are selected as though there
were an unlimited supply of registers: each variable is assigned its own
symbolic register.
 In the second pass, the register-interference graph is constructed. In this
graph, each node is a symbolic register, and an edge connects two nodes if
one of them is live at a point where the other is defined.
 A graph-coloring algorithm is then used to assign physical registers so that
no two interfering symbolic registers receive the same physical register.

