UNIT-V
Symbol Table & Run-Time Environments
Symbol Table
The symbol table is a data structure used by the compiler to keep track of the semantics of variables, i.e. it stores information about the scope and binding of names.
The symbol table is built during the lexical and syntax analysis phases.
It is then used by the later phases as follows: the semantic analysis phase refers to the symbol table to detect type conflicts, and the code generation phase refers to it to determine how much run-time space is allocated to each name and what kind of space it is.
Use of symbol table
To achieve compile-time efficiency the compiler makes use of the symbol table, which associates lexical names with their attributes.
The items stored in the symbol table are,
Variable names
Constants
Procedure names
Literal constants & strings
Compiler-generated temporaries
Labels in the source program
The compiler uses the following types of information from the symbol table (a sketch of a possible entry layout follows):
Data type
Name
Procedure declarations
Offset in storage
In case of a structure or record, a pointer to the structure table
For parameters, whether passed by value or by reference
Number and types of arguments passed
Base address
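As a rough illustration (not part of the original notes), a symbol-table entry could be modelled as a record such as the following; the field names are assumptions chosen for clarity.

/* hypothetical sketch of one symbol-table entry, in C */
struct SymbolEntry {
    char *name;                 /* lexical name */
    char *type;                 /* data type, e.g. "int" or "float" */
    int   offset;               /* offset in storage within its data area */
    int   scope_level;          /* block-nesting level of the declaration */
    int   is_by_reference;      /* for parameters: passed by reference? */
    int   num_args;             /* for procedures: number of arguments */
    struct SymbolEntry *next;   /* link to the next entry (list organization) */
};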
For example, consider the names CALCULATE, SUM, A and B. If each name is stored directly in its table entry, the table looks like,

Name          Attribute
CALCULATE     Float…
SUM           Float…
A             Int…
B             Int…

Alternatively, the names can be kept in a separate character array, with each entry recording only the starting index and length of its name:

Index   Length
0       10
10      4
14      2
16      2

0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17
c  a  l  c  u  l  a  t  e  $  s  u  m  $  a  $  b  $
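A minimal sketch of this organization (the identifiers below are assumptions, not from the original notes):

/* names stored in one long character array, each terminated by '$' */
char name_pool[] = "calculate$sum$a$b$";

/* each table entry records where its name starts and how long it is */
struct NameRef {
    int index;     /* starting position in name_pool */
    int length;    /* length of the name, including the '$' */
};

struct NameRef name_table[] = { {0, 10}, {10, 4}, {14, 2}, {16, 2} };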
Linear list
In the linear list organization the symbol table is kept as an array (or list) of records, each holding a name and its information:

Name 1    Info 1
Name 2    Info 2
Name 3    Info 3
.         .
Name n    Info n
To retrieve information about a name we start from the beginning of the list and search up to the "available" pointer, which marks the end of the used entries.
If we reach the available pointer without finding the name, we report the error "use of undeclared name".
While inserting a new name we must ensure that it is not already present; if it is, we report another error, "multiply defined name".
The advantage of this organization is that it takes the least amount of space. A sketch of lookup and insertion follows.
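A rough sketch of lookup and insertion in such a linear list (the names and sizes below are assumptions):

#include <string.h>

struct Entry { char *name; char *info; };

struct Entry table[100];
int available = 0;                  /* index just past the last used entry */

/* search from the beginning of the list up to the available pointer */
int lookup(const char *name) {
    for (int i = 0; i < available; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    return -1;                      /* "use of undeclared name" */
}

/* insert only if the name is not already present */
int insert(char *name, char *info) {
    if (lookup(name) >= 0)
        return -1;                  /* "multiply defined name" */
    table[available].name = name;
    table[available].info = info;
    return available++;
}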
Self-organizing list
This symbol table representation uses a linked list: a link field is added to each record, and the records are searched in the order given by the link fields.
A pointer "first" is maintained to point to the first record of the symbol table.
Whenever a name is referenced or created, its record is moved to the front of the list.
The most frequently referred names therefore tend to be at the front of the list, so their access time is the least.
Insertion is also easy, as sketched below.
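A minimal sketch of the move-to-front behaviour (all identifiers are assumptions):

#include <string.h>

struct Node { char *name; char *info; struct Node *link; };

struct Node *first = 0;             /* "first" points to the head of the list */

/* search for a name; if found, move its record to the front of the list */
struct Node *lookup_mtf(const char *name) {
    struct Node *prev = 0, *cur = first;
    while (cur && strcmp(cur->name, name) != 0) {
        prev = cur;
        cur = cur->link;
    }
    if (cur && prev) {              /* found, and not already at the front */
        prev->link = cur->link;
        cur->link = first;
        first = cur;
    }
    return cur;                     /* 0 means the name was not found */
}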
Binary trees
The symbol table can also be represented as a binary tree, in which each node has four fields:

Left child | Symbol | Information | Right child

The left child and right child fields hold pointers to the nodes for the symbols that come before and after this one in sorted order.
The symbol field stores the name of the symbol, and the information field stores all attributes of the symbol.
The structure is basically a binary search tree (BST): every name in a node's left subtree is less than the node's name, and every name in its right subtree is greater.
Hence both insertion and searching of symbols are efficient.
Example: create a BST structure for the names declared in the following program.
int m, n, p;
int compute(int a, int b, int c)
{
    int t;
    t = a + b + c;
    return (t);
}
void main()
{
    int k;
    k = compute(10, 20, 30);
}
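One possible way to build such a BST programmatically (a sketch under assumed names, not the expected hand-drawn answer):

#include <stdlib.h>
#include <string.h>

struct TreeNode {
    struct TreeNode *left;          /* names lexicographically smaller */
    char *symbol;
    char *info;
    struct TreeNode *right;         /* names lexicographically larger */
};

struct TreeNode *insert_bst(struct TreeNode *root, char *symbol, char *info) {
    if (root == NULL) {
        struct TreeNode *n = malloc(sizeof *n);
        n->left = n->right = NULL;
        n->symbol = symbol;
        n->info = info;
        return n;
    }
    if (strcmp(symbol, root->symbol) < 0)
        root->left = insert_bst(root->left, symbol, info);
    else
        root->right = insert_bst(root->right, symbol, info);
    return root;
}
/* insert m, n, p, compute, a, b, c, t, main, k in their order of appearance */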
Hash tables
Hashing is an important technique used to search the records of the symbol table, and it is superior to the list organization.
In the hashing scheme two tables are maintained: a hash table and a symbol table.
The hash table consists of k entries, numbered 0 to k-1; these entries are pointers into the symbol table, pointing to the names stored there.
To determine whether a name is in the symbol table, we use a hash function h such that h(name) yields an integer between 0 and k-1, and we search for the name starting from that entry.
The hash function should distribute the names uniformly over the symbol table, so that the number of collisions is minimal.
The advantage of hashing is that a quick search is possible; the disadvantages are that it is more complicated to implement, extra space is required, and obtaining the scope of variables is difficult. A small sketch follows.
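A sketch of such a scheme with k buckets and chaining for collisions (all identifiers are assumptions):

#include <string.h>

#define K 211                        /* number of hash-table entries */

struct HashEntry {
    char *name;
    char *info;
    struct HashEntry *next;          /* chain of names hashing to the same bucket */
};

struct HashEntry *hash_table[K];     /* pointers into the symbol table */

/* h(name) yields an integer between 0 and K-1 */
unsigned h(const char *name) {
    unsigned v = 0;
    while (*name)
        v = v * 31 + (unsigned char)*name++;
    return v % K;
}

struct HashEntry *lookup_hash(const char *name) {
    for (struct HashEntry *e = hash_table[h(name)]; e != 0; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return 0;                        /* not found */
}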
Runtime environment
The compiler demands a block of memory from the operating system; this memory is used for executing the compiled program and is called run-time storage.
The run-time storage is subdivided to hold,
The generated target code
Data objects
Information that keeps track of procedure activations
The size of the generated code is fixed, so the target code occupies a statically determined area of memory; the compiler places it at the low end of memory.
The data objects whose sizes are known at compile time can likewise be placed in a static data area of memory.
To maximize the utilization of space at run time, the other two areas, the stack and the heap, are placed at opposite ends of the remaining address space.
The stack is used to store data structures called activation records, which are generated during procedure calls.
The heap is the area of run-time storage allocated to data during run time.
The sizes of the stack and the heap are not fixed; they may grow or shrink during program execution.
Storage Organization
The executing target program runs in its own logical address space, in which each program value has a location.
The management and organization of this logical address space is shared between the compiler, the operating system and the target machine. The operating system maps logical addresses into physical addresses, which are usually spread throughout memory.
Static Allocation
The sizes of the data objects are known at compile time. The names of these objects are bound to storage at compile time, and such allocation is called static allocation. The amount of storage does not change during run time, so at compile time the compiler can fill in the addresses at which the target code will find the data it operates on.
Activation record
An activation record is a block of memory used for managing the information needed by a single execution of a procedure. Its typical contents are,
Return value
Actual parameters
Control link (pointer to the caller's activation record)
Access link (pointer used to reach non-local data)
Saved machine status (return address and saved registers)
Local data
Temporaries
(Figure: activation record for a procedure A, showing the return value, saved registers, parameters and the local variable a at their respective offsets.)
E.g.
scope_test()
{
    int p, q;            /* block B1 */
    {
        int p;           /* block B2 */
        {
            int r;       /* block B3 */
        }
        ……
    }
    ….
    {
        int q, s, t;     /* block B4 */
    }
}
The storage for the names corresponding to a particular block can be shown
below.
Lexical scope can be implemented using access links or displays.
Access link:
Lexical scope can be implemented by adding a pointer, called the access link, to each activation record.
If a procedure p is nested immediately within a procedure q, then the access link in an activation record for p points to the most recent activation record of q.
Display:
It is expensive to traverse the access links every time a non-local variable is accessed. Access to non-locals can be sped up by maintaining an array of pointers called the display.
In a display,
An array of pointers to activation records is maintained.
The array is indexed by nesting depth.
Each pointer points to the currently accessible activation record at that depth.
The display changes when a new activation occurs, and it must be reset when control returns from that activation.
(Figure: a display with entries at depths 0, 1 and 2, each pointing to an activation record on the run-time stack.)
The advantage of using a display is that if a procedure p is executing and needs to access an element x belonging to some procedure q, we need to look only in display[i], where i is the nesting depth of q. We follow the pointer display[i] to the activation record for q, in which x is found at some known offset.
The compiler knows what i is, so it can access display[i] directly; hence there is no need to follow a long chain of access links.
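A small sketch of how a display entry might be maintained on procedure entry and exit (the routine names and the fixed depth limit are assumptions):

#define MAX_DEPTH 10

void *display[MAX_DEPTH];            /* display[i] = accessible activation record at depth i */

/* on entry to a procedure at nesting depth d with activation record ar:
   save the old display[d] in the new activation record, then install ar */
void enter_procedure(int d, void *ar, void **saved_slot) {
    *saved_slot = display[d];
    display[d] = ar;
}

/* on exit, restore the saved entry so the display is reset */
void leave_procedure(int d, void *saved_value) {
    display[d] = saved_value;
}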
Heap Management
The heap is the portion of memory that holds data whose lifetime is not tied to a single procedure activation: data that lives until the end of the program or until it is explicitly deleted.
Any memory that is supposed to be used across procedure calls, such as objects created with new or malloc, is allocated in the heap.
Hence managing the heap is important.
A piece of software called the memory manager handles the allocation and deallocation of heap space.
Memory Manager:
The two basic functions of the memory manager are,
Allocation: when a program requests memory for a variable or object, the memory manager produces a chunk of heap memory of the requested size.
Deallocation: the memory manager returns deallocated space to the pool of free space so that it can be reused.
Desired properties of a memory manager:
Space efficiency: it should minimize the total heap space required by a program.
Program efficiency: it should make good use of the memory subsystem so that programs run faster.
Low overhead: allocation and deallocation themselves should be efficient.
The two types of memory allocation techniques are,
Explicit allocation
Implicit allocation
Explicit allocation and deallocation are performed by the programmer, using constructs such as new and dispose.
Implicit allocation and deallocation are performed by the compiler with the help of run-time support packages.
Explicit allocation
The simplest explicit allocation technique is allocation of fixed-size blocks, where every block handed out has the same, fixed size.
In this technique a free list is used, which is a linked list of the free blocks; memory is allocated from this list.
The free blocks are linked to each other in a list structure.
Allocation is done by removing a block from the front of the free list, and deallocation is done by linking the freed block back onto the free list, as sketched below.
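A minimal sketch of such a fixed-size-block free list (the structure and names are assumptions):

struct Block { struct Block *next; /* payload would follow in a real allocator */ };

struct Block *free_list;             /* head of the list of free blocks */

/* allocation: take a block off the front of the free list */
struct Block *alloc_block(void) {
    struct Block *b = free_list;
    if (b != 0)
        free_list = b->next;
    return b;                        /* 0 means no free block is available */
}

/* deallocation: link the block back onto the free list */
void free_block(struct Block *b) {
    b->next = free_list;
    free_list = b;
}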
Implicit allocation
Implicit allocation and deallocation are performed for the user program by the run-time support package.
The run-time package must be able to determine when a storage block is no longer in use, so each block carries some extra bookkeeping fields (for example a reference count or a mark bit).
Reference count (RC): a counter kept with each block. Whenever another block or pointer refers to this block, its reference count is incremented by one; when a reference disappears, the count is decremented.
If the value of RC drops to 0, the block can be deallocated.
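A tiny sketch of reference counting (the structure and helper names are assumptions):

#include <stdlib.h>

struct RCBlock {
    int ref_count;                   /* number of references to this block */
    /* ... user data ... */
};

void add_reference(struct RCBlock *b)  { b->ref_count++; }

void drop_reference(struct RCBlock *b) {
    if (--b->ref_count == 0)         /* nothing refers to the block any more */
        free(b);                     /* so it can be deallocated */
}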
Marking techniques: an alternative approach to deciding whether a block is in use. In this method the user program is suspended temporarily, and starting from the frozen pointers every block that can still be reached is marked as in use; unmarked blocks can then be reclaimed.
Parameter passing mechanisms
Types of parameters:
Formal parameters: the parameters used in the function definition.
Actual parameters: the parameters passed during the function call.
What are l-values and r-values?
The r-value is the value of the expression on the right-hand side of an assignment operator.
The l-value is the address of the memory location (or variable) on the left-hand side of the assignment operator. For example, in x = y + 1 the l-value is the location of x and the r-value is the value of y + 1.
Pass by Copy-Restore
In this hybrid mechanism (also called call by value-result) the values of the actual parameters are copied into the formal parameters at the call, and when the procedure returns the final values of the formals are copied back into the locations of the actuals.
Example:
int y;
calling_procedure()
{
    y = 10;
    copy_restore(y);   //the l-value of y is passed
    printf("%d", y);   //prints 99
}
copy_restore(int x)
{
    x = 99;            //y still has the value 10 (unaffected)
    y = 0;             //y is now 0
}
When copy_restore ends, the value of the formal parameter x is copied back into the l-value of the actual parameter y. Even though y was changed inside the procedure before it ended, the copied-back value of x (99) overwrites it, making the mechanism behave much like call by reference.
Pass by Name
Languages like ALGOL provide a parameter passing mechanism that works much like the preprocessor in the C language. In pass by name, the body of the called procedure behaves as if it were substituted at the point of call, with the argument expressions textually substituted for the corresponding formal parameters in the body of the procedure. The procedure therefore works directly on the actual parameters, much like pass by reference.
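To continue the C-preprocessor analogy from the notes, here is a rough illustration (the macro and variable names are assumptions, and a macro is only an approximation of true pass by name):

#include <stdio.h>

#define DOUBLE_IT(x) ((x) = (x) * 2)   /* the argument text replaces x in the body */

int main(void) {
    int a[3] = {1, 2, 3};
    int i = 1;
    DOUBLE_IT(a[i]);                   /* expands to ((a[i]) = (a[i]) * 2) */
    printf("%d\n", a[i]);              /* prints 4: a[i] is evaluated at the point of use */
    return 0;
}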
Garbage collection
The process of collecting memory that was previously allocated to variables or objects but is no longer needed, and returning it to a pool from which it can be reused, is called garbage collection.
A few languages support automatic garbage collection; in other languages the programmer must deallocate memory explicitly.
The advantages of GC are,
Manual memory management by the programmer (using malloc, realloc, free) is time consuming and error prone; GC removes this burden.
Reusability of memory is achieved automatically.
The disadvantages are,
The execution of the program is stopped or paused during garbage collection.
Sometimes a situation called thrashing occurs, in which the collector runs repeatedly while reclaiming very little memory.
CODE GENERATION
Introduction
The final phase of a compiler is the code generator.
It receives an intermediate representation (IR) along with the information in the symbol table.
We assume that the front end produces a low-level IR, i.e. the values of the names in it can be directly manipulated by the target machine, and that syntactic and semantic errors have already been detected.
The target program
Common target architectures are RISC, CISC and stack-based machines.
In this chapter we use a very simple RISC-like computer, with the addition of some CISC-like addressing modes.
Instruction selection:
The code generator must map the IR program into a code sequence that can be executed by the target machine. The complexity of this mapping depends on,
The level of the IR (intermediate representation)
The nature of the instruction-set architecture
The desired quality of the generated code (speed and size)
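As a standard illustration (not reproduced from these notes), the nature of the instruction set affects the choice: the statement a = a + 1 can be translated naively as a load/add/store sequence, but a machine with an increment instruction allows a much shorter selection.

LD  R0, a         // R0 = a
ADD R0, R0, #1    // R0 = R0 + 1
ST  a, R0         // a = R0

INC a             // a single instruction, if the target machine provides INC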
Register allocation
The two sub-problems are,
Register allocation: selecting the set of variables that will reside in registers at each point in the program.
Register assignment: selecting the specific register in which a variable will reside.
Evaluation ordering
The order in which computations are performed can affect the efficiency of the target code, because some computation orders require fewer registers to hold intermediate results than others.
The Target Language
a[j] = c
LD R1, c          //R1 = c
LD R2, j          //R2 = j
MUL R2, R2, 8     //R2 = R2 * 8 (each array element occupies 8 bytes)
ST a(R2), R1      //contents(a + contents(R2)) = R1

x = *p
LD R1, p          //R1 = p
LD R2, 0(R1)      //R2 = contents(0 + contents(R1))
ST x, R2          //x = R2
Flow graph
Once basic blocks are constructed, the flow of control between these blocks can be represented using edges.
There is an edge from block B to block C if and only if it is possible for the first instruction in block C to immediately follow the last instruction in block B.
There are two ways such an edge can be justified:
There is a (conditional or unconditional) jump from the end of B to the beginning of C.
C immediately follows B in the original order of the three-address instructions, and B does not end in an unconditional jump.
Example: for the three-address code considered above, the leader instructions are 1, 2, 3, 10, 12 and 13, because these statements are either the first instruction, targets of branch instructions, or instructions immediately following branches. Using these leaders we can construct 6 basic blocks, and these basic blocks can then be connected to each other by edges as shown below.
(Figure: flow graph connecting the six basic blocks.)
Descriptors
For each available register, a register descriptor keeps track of the variable
names whose current value is in that register. Since we shall use only those
registers that are available for local use within a basic block, we assume that
initially, all register descriptors are empty. As the code generation progresses,
each register will hold the value of zero or more names.
For each program variable, an address descriptor keeps track of the location or
locations where the current value of that variable can be found. The location
might be a register, a memory address, a stack location, or some set of more
than one of these. The information can be stored in the symbol-table entry for
that variable name.
The Code-Generation Algorithm
The algorithm uses a function getReg(I), which selects registers for each memory location associated with the three-address instruction I.
For an instruction x = y + z:
Use getReg(x = y + z) to select registers for x, y, and z; call these Rx, Ry and Rz.
If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y', where y' is one of the memory locations for y (according to the address descriptor for y).
Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
Issue the instruction ADD Rx, Ry, Rz.
Rules for updating the register and address descriptors
1. For the instruction LD R, x:
Change the register descriptor for register R so that it holds only x.
Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz implementing x = y + z:
Change the register descriptor for Rx so that it holds only x.
Change the address descriptor for x so that its only location is Rx; note that the memory location for x is not now in the address descriptor for x.
Remove Rx from the address descriptor of any variable other than x.
4. When we process a copy statement x = y, after generating the load for y into register Ry, if needed, and after managing the descriptors as for all load statements (per rule 1):
Add x to the register descriptor for Ry.
Change the address descriptor for x so that its only location is Ry.
Example
Consider a basic block containing the following three-address code:
t = a - b
u = a - c
v = t + u
a = d
d = v + u
The instructions generated, and the corresponding changes in the register and address descriptors, are shown below.
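Since the original descriptor table is not reproduced here, the following is one plausible instruction sequence for this block, assuming three available registers R1, R2, R3 and that a and d are live on exit (the particular getReg choices are assumptions):

t = a - b:   LD R1, a          //R1 = a
             LD R2, b          //R2 = b
             SUB R2, R1, R2    //R2 now holds t
u = a - c:   LD R3, c          //R3 = c
             SUB R1, R1, R3    //R1 now holds u; a remains only in memory
v = t + u:   ADD R3, R2, R1    //R3 now holds v
a = d:       LD R2, d          //R2 holds d, and is also recorded as the location of a
d = v + u:   ADD R1, R3, R1    //R1 now holds d
exit:        ST a, R2          //store the live variables back to memory
             ST d, R1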
Flow-of-control optimizations
Unnecessary jumps can be eliminated by peephole optimization. Consider the jump-to-jump sequence
goto L1
...
L1: goto L2
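Following the standard jump-to-jump elimination, this sequence can be replaced by the following, so that control transfers to L2 directly:

goto L2
...
L1: goto L2

If there are no other jumps to L1 and the statement L1: goto L2 is preceded by an unconditional jump, that statement can then be removed as unreachable code.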
Algebraic simplifications
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization, but only a few algebraic identities occur frequently enough to be worth implementing. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated from the generated code, since they do not change the value of x.
Global register allocation
To save some of the stores at basic-block boundaries and their corresponding loads, we might arrange to assign registers to frequently used variables and keep these registers consistent across block boundaries (globally).
Some options are:
Keep the values of variables used in loops in registers.
Use a graph-coloring approach for more global allocation.
Usage counts
The usage count measures how often a variable x is used and assigned in the basic blocks of a loop L.
The usage count gives an idea of how many units of cost can be saved by selecting x for global register allocation.
An approximate formula for the saving obtained by keeping x in a register throughout the loop L is,
∑ over all blocks B in loop L of ( use(x, B) + 2 * live(x, B) )
use(x, B) counts the uses of x in B before any definition of x in that block; each such use saves one unit.
live(x, B) is 1 if x is live on exit from B and is assigned a value in B, and 0 otherwise; in that case we save two further units.
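As a small illustration with assumed counts (not the example from the original notes): suppose a loop has two blocks, B1 in which x is used twice before being defined, and B2 in which x is assigned and is live on exit. The saving for keeping x in a register throughout the loop is

( use(x, B1) + 2 * live(x, B1) ) + ( use(x, B2) + 2 * live(x, B2) ) = (2 + 0) + (0 + 2) = 4 units.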
Example: in the accompanying flow graph of an inner loop (figure not reproduced here), registers are allocated to the variables c, b and a because they have the highest usage count values.
Register assignment for outer loops
Consider two loops L1 and L2, where L2 is the inner loop, and suppose the variable a is to be allocated to some register.
The following criteria are used for register assignment in the outer loop:
If a is allocated a register in loop L2, then it need not be allocated one in L1 - L2 (the part of L1 outside L2).
If a is allocated in L1 but not in L2, then store a on entrance to L2 and load a on leaving L2.
If a is allocated in L2 but not in L1, then load a on entrance to L2 and store a on exit from L2.