0% found this document useful (0 votes)
30 views

CD Unit-4 &5

Uploaded by

lokeshvarma0309
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views

CD Unit-4 &5

Uploaded by

lokeshvarma0309
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Unit-4 Compiler Design TEC

Intermediate code generation – Variants of syntax trees, Three address code, Types and
declarations, Translation of expressions, Type checking, Control flow, Back patching.

Variants of Syntax trees


1 Q: Explain briefly about variants of syntax trees.
 Syntax tree consists of nodes and children.
i) Nodes – They represent constructs in the source program.
ii) Children – They represent the meaningful components of a construct.
Directed Acyclic Graphs (DAG) for expressions (OR) Procedure for constructing DAG
 In a given expression, the common sub-expressions (sub-expressions occurring more
than one time) are identified by Directed Acyclic Graph (DAG)
 DAG and syntax tree construction is similar.
 The difference between them is : a node N in DAG can have more than one parent if
N represents a common sub-expression. But in case of syntax tree, the common sub-
expression would be represented repeatedly as many times the sub-expression
appears in the original expression.
 For example, let us construct DAG for the expression:
e*(b–c)+(b–c)*a
Here, expression (b – c ) is repeated twice, called common sub-expression.
Step 1
First let us construct DAG for the common sub-expression, ( b –c )

Step 2
Construct DAG for e * ( b – c )

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 1


Unit-4 Compiler Design TEC

Step 3
Again no need to construct DAG for ( b – c ). Hence consider ( b – c ) * a

Step 4
Finally draw DAG for the expression e * ( b – c ) + ( b – c ) * a

 The steps for constructing DAG are as follows.


Symbol Operation
b P1 = mkleaf(id, ptr to entry b )
c P2 = mkleaf(id, ptr to entry c )
- P3 = mknode(-, P1 , P2 )
e P4 = mkleaf(id, ptr to entry e )
* P5 = mknode(*, P3 , P4 )
a P6 = mkleaf(id, ptr to entry a )
* P7 = mknode(*, P3, P6 )
+ P8 = mknode(*, P5 , P7 )

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 2


Unit-4 Compiler Design TEC

 Consider the following SDD to produce syntax tree or DAG


SNO PRODUCTION SEMANTIC RULE
1. EE+T E.Node = new Node ( +, E.Node, T.Node )
2. EE-T E.Node = new Node ( -, E.Node, T.Node )
3. E  T E.Node = T.Node
4. T ( E ) T.Node = E.Node
5. T id T.Node = new Lean( id, id.entry )
6. T digit T.Node = new Leaf( digit, digit.val )

Value Number method for constructing DAGs


 Array of records can be used to store nodes of syntax trees or DAGs.
 Each row of the array represents one record, and therefore one node.
 The first field is an operation code which indicates the label of a node. This is
present in each record.
 Leaves consist of one additional field to hold the lexical value (either a pointer to
symbol table or constant).
 Interior nodes consist of two additional fields to represent the left and right child.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 3


Unit-4 Compiler Design TEC

2 Q: Explain about Three address code (TAC or 3AC)


 TAC is a sequence of statements of the general form
x = y op z
Where x,y,z are names, constants or compiler generated temporaries. op stands for
any operator such fixed or floating point arithmetic operator or logical operator on
Boolean valued data. Here three address refers to three addresses ie addresses of x, y
and z.
 For example, if x = a + b * c then the TAC is,
t1 = b * c
t2 = a + t1
x = t2
Where t1 and t2 are compiler generated temporary variables.
Addresses and Instructions
These are different types of TAC:
 Assignment statements are of the form x = y op z
 Assignment instructions are of the form x = op z ( is unary operation like unary
minus)
 Copy statements are of the form x = y (value of y is assigned to x).
 Unconditional jumps such as
if x relop y goto L
(relational operators <>, <, >, <=, >=, = )
 Param x and call p, n for procedure calls and return y where y represents a
returned value (optional).
Param X1
Param X2
Param Xn
 The following are the different types of TAC
1. x = y op z Assignment ( op = binary arithmetic or logical
operation)
2. x = op y Unary assignment ( op = unary operation )
3. x=y Copy ( x is assigned the value of y )
4. goto L Unconditional jump

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 4


Unit-4 Compiler Design TEC

5. if x relop goto L Conditional jump ( relop is <>, <, >, <=, >=, = )
6. Param x Procedure call
7. call p, n Procedure call
8. return z Procedure call
9. a=b[i] Index assignment
10. x[i]=y Index assignment
11. x = &y Pointer assignment
x = *y
12. *x = y Pointer assignment

3 Q: Briefly discuss about the implementations of TAC or Structure of 3AC.


 Three address code is an abstract form of intermediate code that can be
implemented as a record with the address fields.
 There are three representations used for three address code.
 They are
1. Quadruple representation.
2. Triple representation.
3. Indrect triple representation.
Quadruple representation
 Quadruple is a structure with the atmost four fields such as,
op arg1 arg2 result
 The field op is sued to represent the internal code for operator.
 arg1 and arg2 represents two operands used and result field is used to store the
result of an expression.
 For example, consider the input string x = - a * b + - a * b
 The following is the three address code.
t1 = uminus a
t2 = t1 * b
t3 = uminus a
t4 = t3 * b
t5 = t2 + t4
x = t5

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 5


Unit-4 Compiler Design TEC

 The following is the quadruple representation.


op arg1 arg2 result
uminus a t1
* t1 b t2
uminus a t3
* t3 b t4
+ t2 t4 t5
= t5 x

Triple representation
 In this, the use of temporary variables is avoided by referring the pointers in the
symbol table.
 Triple is a structure with the atmost three fields such as,
op arg1 arg2
 For example, consider the input string x = - a * b + - a * b
 The following is the three address code.
t1 = uminus a
t2 = t1 * b
t3 = uminus a
t4 = t3 * b
t5 = t2 + t4
x = t5
 The following is the triple representation.
op arg1 arg2
uminus a
* (0) b
uminus a
* (2) b
+ (1) (3)
= x (4)

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 6


Unit-4 Compiler Design TEC

Indirect triple representation


 In this, it uses additional array to list the pointers of triple in the desired order.
 For example, consider the input string x = - a * b + - a * b
 The following is the three address code.
t1 = uminus a
t2 = t1 * b
t3 = uminus a
t4 = t3 * b
t5 = t2 + t4
x = t5
 The following is the Indirect triple representation.

(0) ( 11 ) op arg1 arg2


(1) ( 12 ) uminus a
(2) ( 13 ) * ( 11 ) b
(3) ( 14 ) uminus a
(4) ( 15 ) * ( 13 ) b
(5) ( 16 ) + ( 12) ( 14 )
= x ( 15 )

4 Q: Briefly explain about Types and Declarations


 A complier has to perform semantic checking along with syntactic checking.
 Semantic checks can be static (during compilation) or dynamic (during run time).
 The application of types is grouped under checking and translation.
Type checking
 The process of verifying and enforcing the constraints of types – type checking –
may occur either at compile time (static check) or run time (dynamic check).
 A “type checker” implements the type system.
 A “type system” is a collection of rules for assigning type expressions to the parts of
a program.
Translation applications
 Compiler can determine the storage required from the type of a name at run time.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 7


Unit-4 Compiler Design TEC

Type expressions
 Type expression denotes the type of a language construct. It can be a basic type Eg:
primitive data type like int, real, char, boolean etc., or a type error or a void: no
type.
Type equivalence
 Type equivalence contains two important concepts.
 Name equivalence have the same name. For example, a, b have same type and c has
different type.
 Structural equivalence have the same structure. For example, a, b and c have the
same type.
Declaration
 Whenever a declaration is encountered, then create a symbol entry for every local
name in a procedure.
 The symbol table entry should contain type of name, how much storage the name
requires and a relative offset.
Storage layout for local names
 If we know the type of a name, we can determine the amount of storage required for
the name at run time.
 Type and relative address are saved in the symbol table entry for the name.
 The ‘width’ of type is defined as the number of storage units required for objects of
that type.
 SDT is used to compute types and their width for basic and array types.
5 Q: Explain about Type checking.
Rules for Type Checking (TC)
 Type Checking can be Type Synthesis (TS) and Type Inference (TI).
 Type Synthesis (TS) uses the types of its subexpressions in order to build the type of
expression. TS require names to declare before they are used.
 Type Inference (TI) is used to determine the type of a language construct from the
way it is used.
Type conversions
 Type conversion rules differ from one language to another. Java uses widening and
narrowing conversions between primitive types.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 8


Unit-4 Compiler Design TEC

Unification
 Testing equality of expressions is the concept of unification.
6 Q: Explain about Control Flow in Intermediate code generation.
 The main idea of converting any flow of control statements to 3AC is to stimulate
the “branching” of the control flow.
 Boolean expression in programming languages are often used to:
1. Alter the flow of control.
2. Compute logical value.
 Flow of control statements may be converted to 3AC by using following functions:
1. new label – returns a new symbolic label each time it is called.
2. gen ( ) – generates the code (string) passed as parameter to it.
 The following attributes are associated with non-terminals for the code generation:
1. code– contains the generated 3AC.
2. true – contains the label to which a jump takes place if the Boolean expression
associated evaluates to true.
3. false – contains the label to which a jump takes place if the Boolean expression
associated evaluates to false.
4. begin – contains the label / address pointing to the beginning of the code block
for the statement ‘generated’ by the non-terminal.
Boolean expressions
 Boolean expressions consist of Boolean operations like AND (&&), OR ( || ) and
NOT ( ! ) as in C applied to the elements that are Boolean variables or relational
expressions of the form E1 rel E2
E1 rel E2

Arithmetic <. <=, >>=, ! Arithmetic


expression expression

 For example, E  E && E | E || E | !E | (E) | E1 rel E2 | true | false


Short circuit code
 Given Boolean expression can be translated into 3AC without generating code for
any of the boolean operators and without having the code necessarily evaluate the
entire expression. This is called as Jumping code or short circuit code.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 9


Unit-4 Compiler Design TEC

 For example, if ( a < 10 || a > 15 && a != b )


x = 1;

Translate this to jumping code or short circuit code


if a < 10 goto L1
if false a > 15 goto L2
if false a != b goto L2
L1 : x = 1
L2 :
Flow of control statements
 The main idea of converting any flow of control statement to the 3AC is to stimulate
the “branching” of the flow of control using the goto statement.
 Consider the following grammar,
S  if E then S1 | if E then S1 else S2 | while E do S1 ;
 The simulation of the flow of control branching for each statement is depicted
pictorially as follows:
if – then

if – then else

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 10


Unit-4 Compiler Design TEC

while – do

7 Q: Explain about Backpatching.


 The main problem in generating 3AC in a single pass is that we may know the labels
that control must go to at the time jump statements are generated.
 In order to get rid of this problem, a series of branching statements with the targets
of the jumps temporarily left unspecified is generated.
 Back patching is putting the address instead of labels when the proper label is
determined.
One pass code generation using back patching:
 Back patching algorithms perform three types of operations:
1. Makelist ( i ) – Creates a new list containing only ‘ i ‘, an index into the array of
quadruples and returns pointer to the list it has made.
2. Merge ( i, j ) – Concatenates the lists pointed by ‘i’ and ‘j’ and returns a pointer
to the concatenated list.
3. Back patch( P, i ) – Inserts ‘i’ as the target label for each of the statements on the
list pointed by P.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 11


Unit-5 Compiler Design TEC

Runtime environments – Stack allocation of space, Access to non-local data on the stack,
Heap management.
Code generation – Issues in the design of code generation, The target language, Addresses
in the target code, Basic blocks and flow graphs, A simple code generator.

1 Q: Stack allocation of space


 Stack area is used to allocate space to the local variables whenever the procedure is
called. This space is popped off the stack when the procedure terminates.
Activation Trees
 Activation trees are used to describe the nesting of procedure calls to make the stack
allocation feasible efficiently.
 The nesting of procedure calls can be illustrated using the following quicksort
example.
1. program sort(input,output)
2. var a : array[ 0 .. 10 ] of integers
3. procedure readarray
4. var i : integer
5. begin
6. for i = 1 to 9 do read array( a [ i ] )
7. end
8. function partition( y , z : integer ) : integer
9. var i , j , x , v : integer
10. begin ….
11. end
12. procedure quicksort( m ,n : integer )
13. var i : integer
14. begin
15. if ( n > m ) then begin
16. i = partition ( m , n )
17. quicksort( m, i – 1 )
18. quicksort( i + 1 , n )
19. end
20. end

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 1


Unit-5 Compiler Design TEC

21. begin
22. a [ 0 ] := -9999, a [ 10 ] := 9999
23. readarray
24. quicksort(1, 9)
25. end
 The following is the activation tree corresponding to the output of quicksort.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 2


Unit-5 Compiler Design TEC

Activation Record ( AR )
 Activation Record is a memory block used for information management for single
execution of a procedure.
 The following is the activation record ( Read from bottom to top )
Actual parameters
Returned values
Control link
Access link
Saved machine status
Local data
Temporaries

Temporaries – It hold temporary values, such as the result of mathematical


calculation, a buffer value or so on.
Local data – It belongs to the procedure where the control is currently located.
Saved machine status- It provides information about the state of a machine just before
the point where the procedure is called.
Access link – It is used to locate remotely available data. This field is optional.
Control link – It points to the activation record of the procedure which called it, ie the
caller. This field is optional. This link is also known as dynamic link.
Return value – It holds any valued returned by the called procedure. These values can
also be placed in a register depending on the requirements.
Actual parameters – These are used by the calling procedures.

2 Q: Access to non-local data on the stack


 Non-local data is a parameter that is used within a procedure. There are various
situations of access to non-local data. These include:
Data access without nested procedures
For non-nested procedures, variables are accessed as follows:
 Global variables are declared statically as their values and locations are known or
fixed at compile time.
 Local variables are declared locally at the top of the stack using stack top pointer.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 3


Unit-5 Compiler Design TEC

A language with nested procedure declarations – ML


Properties of ML include:
 The variables declared in ML are unchangeable once they are initialized. This means
that ML is a functional language.
 Variable declaration is done using the statement:
val <name> = <exp>
 The syntax for defining functions is,
fun <name> ( <arg> ) = <body>
 Function bodies are defined as,
let <definitions> in <statements> end
Nesting depth
 Nesting Depth defines the level from the start of a procedure at which a particular
nested procedure is defined.
Access link
 It is used to locate any variable or procedure in the case of nested procedures.
Displays
 Displays are used for efficient and easy access of non-local data by avoiding the use of
long chains of activation links, in situations having large network depths.
 Display uses auxiliary array called d.
3 Q: Heap Management
 Heap is the unused memory space available for allocation dynamically.
 It is used for data that lives indefinitely, and is freed only explicitly.
 The existence of such data is independent of the procedure that created it.
The memory manager
 Memory manager is used to keep account of the free space available in the heap area.
Its functions include,
 allocation
 deallocation
 The properties of an efficient memory manager include:
 space efficiency
 program efficiency
 low overhead

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 4


Unit-5 Compiler Design TEC

The memory hierarchy of a computer


 Typical sizes Typical access times
2GB Virtual memory (disk) 3 – 15 ms
256MB – 2 GB Physical memory (RAM) 100 – 150 ms
128Kb – 4 MB 2nd Level Cache 40 – 60 ns
16 – 65KB 1st Level Cache 5 – 10 ns
32 words Registers 1 ns
Locality in programs
Locality refers to the amount of data requirements for a particular program, and the
times required to access or locate the data. Locality is of two types namely,
 Temporal locality – This is present if the memory locations accessed by it are likely to
be accessed again soon.
 Spatial locality – This is the condition when the memory locations close to the location
accessed are likely to be accessed within a short period of time.
4 Q: Issues in the design of code generation
Instruction selection
 The job of instruction selector is to do a good job overall choosing which instructions
to implement which operator in the low level intermediate representation.
 Issues here are:
 level of the IR
 nature of the instruction set architecture
 desired quality of the generated code
 Target Machine:
 n general purpose registers
 instructions: load, store, compute, jump, conditional jump
 various addressing mode
 indexed address, integer indexed by a register, indirect addressing and
immediate constant.
 For example, X = Y + Z we can generate,
LD R0, Y
ADD R0, R0, Z
ST X, R0

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 5


Unit-5 Compiler Design TEC

Register allocation
 Instructions involving register operands are usually shorter and faster than those
involving operands in memory. Therefore, efficient utilization of registers is
particularly important in generating good code.
 The most frequently used variables should be kept in process register for faster access.
 The use of registers if often subdivided into two sub problems:
 During “register allocation”, we select the set of variables that will reside in
register at a point in the program.
 During “register assignment”, we pick the specific register that a variables will
reside in.
 For example, consider t1 = a * b
t2 = t1 + c
t3 = t2 / d
optimal machine code sequence is,
L R1, a
M R1, b
A R2, c
D R2, d
ST R1, t3

5 Q: The Target Language


 The output of Code Generation (CG) is the target language.
 The output may take different forms like absolute machine language, relocatable
machine language or assembly language.
Simple target machine model
The possible kinds of operations are listed below:
1. Load operation ( LD )
 General format is LD dst, addr.
 LD loads the value in location ‘addr’ into location ‘dst’. Here dst is destination.
 For example, LD R1, x  loads the value in location x into register R1.
2. Store operation ( ST )
 General format is ST dst, src.
 For example, ST x, R1  stores the value in register R1 into the location x.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 6


Unit-5 Compiler Design TEC

3. Computation operation
 The general form is OP dst, src1, src2 where OP is a operator like ADD, SUB.
 For example, ADD R1, R2, R3  adds R2 and R3 values and stores into R1.
4. Unconditional jump ( BR )
 The general form is BR L where BR is branch. This causes control to branch to the
machine instruction with label L.
5. Conditional jump
 The general form is Bcond R, L where R is a register, L is label and cond stands for
any of the common tests on values in register R.
 For example, BGTZ R2, L  This instruction causes a jump to label L if the value in
register R2 is greater than zero and allows control to pass to the next machine
instruction if not.
6 Q: Addresses in the target code
 In this, it explains storage allocation strategies namely static and stack allocation
strategies.
Static allocation strategy
 In static allocation names are bound to the storage as the program is compiled. Since
bindings don’t change at runtime, its names are bound to the same storage whenever
the procedure is activated.
 The compiler must decide where the activation record is to go, relative to the target
code. Once this decision is made, the position of each activation record and the storage
for each name in the record is fixed.
 Compiler gives the address of code which is required by the target code.
Limitations:
 The size of data objects must be known at compile time.
 Recursive procedures are not allowed because all the activations of a procedure use
the same bindings for local names.
 Data structures cannot be created dynamically since there is no mechanism for
storage allocation at runtime.
Stack allocation strategy
 In stack allocation, storage is organized as stack and activation records are pushed
and popped as the activation begins and ends respectively.
 Storage for locals in each call of a procedure is contained in the activation record.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 7


Unit-5 Compiler Design TEC

 The values of locals are deleted when activation ends.


 Consider the stack in which size of all activation records are known at compile time.
Consider a register “top” which points the top and increments at run time. Activation
record is allocated and deallocated by incrementing and decrementing the top with
size of the record.
7. Basic blocks and Flow graph
Basic blocks
 Basic block is sequence of constructive statements in which flow of control enters at
the beginning and leaves at the end without halt or possibility of branching except at
the end.
 Basic blocks are constructed by partitioning a sequence of three address instruction.
Algorithm for partitioning into basic blocks:
INPUT:- sequence of three address instructions
OUTPUT:- list of basic blocks
METHOD:
Step1
The first step is to determine the set of leaders. The rules to obtain the leaders are
1. The first statement is a leader.
2. Any statement which is the target of conditional or unconditional GOTO is a
leader.
3. Any statement which immediately follows the conditional GOTO or unconditional
GOTO is a leader.
Step2
For each leader, construct the basic blocks which consists of the leader and all the
instructions up to but not including the next leader or the end of the intermediate
program.
 For example, let us construct the basic blocks for the following:
1. i = 0
2. if ( i > 10 ) goto 6
3. a[ i ] = 0
4. i = i + 1
5. goto 2
6. end

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 8


Unit-5 Compiler Design TEC

 Let us apply step1 and step2 of algorithm of algorithm to identify basic blocks.
Step1
1. i = 0
2. if ( i > 10 ) goto 6
3. a[ i ] = 0
4. i = i + 1
5. goto 2
6. end
 Based on rule 1 of step1, 1 statement is leader.
 Based on rule 2 of step1, 2 and 5 are leaders.
 Based on rule 3 of ste1, 3 is a leader.
Step2
 For each leader, construct the basic blocks.
L 1. i = 0 B1

L
2. if (i > 10 ) GOTO 6 B2

L 3. a[i] = 0
4. i = i + 1
5. GOTO 2 B3

L 6. end B4

Flow graph
 Flow graphs are used to represent the basic blocks and their relationship by a directed
graph.
 There exists an edge from block 1 to block2 iff it is possible for the first instruction in
block2 to immediately flow to the last instruction in block1.
 For example, Let us write the flow graph for the following basic blocks.

1. i = 0 B1

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 9


Unit-5 Compiler Design TEC

2. if (i > 10 ) GOTO 6 B2

4. a[i] = 0
5. i = i + 1
6. GOTO 2 B3

3. end B4

 Flow graph for these basic blocks is:

B1 1. i = 0

B2 2. if (i > 10 ) GOTO B4

B3 4. a[i] = 0
5. i = i + 1
6. GOTO B2

B4 3. end

8 Q Code Generation algorithm


Machine instruction for operations
For X = Y + Z do the following:
 Use getReg ( X = Y + Z ) to select registers for X, Y and Z. Let the registers be Rx,
Ry and Rz.
 If Y is not in Ry then issue an instruction LD Ry, Y. Here Y is one of the memory
location for Y.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 10


Unit-5 Compiler Design TEC

 Similarly if Z is not in Rz, issue an instruction LD Rz, Z. Here Z is a memory


location of Z.
 Issue the instruction ADD Rx, Ry, Rz
Machine instructions for copy statements
 A three address copy statement can be of the form x = y. This is a special case where
we assume that ‘getReg’ will always choose the same register for both x and y.
Ending the basic block
 When the block ends, we need not have to remember the variable value if it is
temporary. In this case, assume that its register is empty if the variable is temporary
and used only within the block.
 Then generate the instruction ST x, R where R is a register in which x’s value exists
at end of the block.
Managing the Register and Address descriptors
1. Load operation ( LD )
 General format is LD dst, addr.
 LD loads the value in location ‘addr’ into location ‘dst’. Here dst is destination.
 For example, LD R1, x  loads the value in location x into register R1.
2. Store operation ( ST )
 General format is ST dst, src.
 For example, ST x, R1  stores the value in register R1 into the location x.
3. Computation operation
 The general form is OP dst, src1, src2 where OP is a operator like ADD, SUB.
For example, ADD R1, R2, R3  adds R2 and R3 values and stores into R1.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 11


Unit-6 Compiler Design TEC

Machine Independent code optimization – Principle sources of optimization, Peephole


optimization, Introduction to data flow analysis.
Principle sources of optimization
1 Q: Explain Principle sources of optimization techniques (OR) Function preserving
transformations).
The following are the principle sources of optimization techniques or function preserving
transformations:
a) Elimination of common sub expression
b) Copy propagation
c) Dead code elimination
d) Constant folding
e) Loop optimizations
Elimination of common sub expression
 Common sub expressions can be either eliminated locally or globally.
 Local common sub expressions can be identified in a basic block.
 Hence first step to eliminate local common sub expressions is to construct a basic
block.
 Then the common sub expressions in a basic block can be deleted by constructing a
directed acyclic graph (DAG).
 For example, x = a + b * ( a + b ) + c + d
The following is the basic block.

t1 = a + b
t2 = a + b
t3 = t1 * t2
t4 = t3 + c
t5 = t4 + d
x = t5

 The local common sub expression in the above basic block are
t1 = a + b
t2 = a + b
 Hence these local common sub expressions can be eliminated. The block obtained
after eliminating local common sub expression is shown below.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 1


Unit-6 Compiler Design TEC

t1 = a + b
t2 = t1 * t1
t3 = t2 + c
t4 = t3 + d
x = t4

Copy or Variable propagation


 Consider the assignment statement of the form x = y. The statement x = y is called as
copy statement.
 To explain copy propagation, take the following example.
a) B1 b=a+d B2 b=a+d
c=b h=b

B3 j=a+d

b) B1 B2
b=a+d b=a+d
c=b h=b

B3 j=b

 Here the common sub expression is j = a + d. When a + d is eliminated, it uses j = b.


Dead code elimination
 A piece of code which is not reachable and never used anywhere in the program, then
it is said to be dead code, and can be removed from the program safely.
 Generally copy statements may lead to dead code. For example,
x=b+c
z=x

d=x+y
 Suppose z variable is not used in the entire program, the z =x becomes the dead code.
Hence, it can be optimized as,

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 2


Unit-6 Compiler Design TEC

x=b+c

d=x+y
Constant folding
 In folding technique, the computation of a constant is done at compile time instead of
execution time and further the computed value of the constant is used.
 For example, k = ( 22 / 7 ) * r * r
Here folding is done by performing the computation of ( 22 / 7 ) at compile time. So it
can be optimized as,
k = 3.14 * r * r
Loop optimizations
 The major source of code optimization is loops, especially the inner loops.
 Most of the run time is spent inside the loops which can be reduced the number of
instructions in the inner loop.
 The following are the loop optimization techniques.
1. Code motion.
2. Elimination of induction variables.
3. Strength reduction.
1. Code motion:-
 Code motion is a technique which is used to move the code outside the loop.
 If there exists any expression outside the loop which the result is unchanged even after
executing the loop many times, then such expression should be placed just before the
loop.
 For example, while ( x ! = max =3 )
{
x = x + 2;
}
Here the expression max -3 is a loop invariant computation. So this can be optimized
as follows:
k = max -3;
while ( x ! = k )
{
x = x + 2;
}

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 3


Unit-6 Compiler Design TEC

2. Elimination of induction variables:-


 An induction variable is a loop control variable or any other variable that depends on
the induction variable in some fixed way.
 It can also be defined as a variable which is incremented or decremented by a fixed
number in the loop each time is executed.
 For example, for( i = 0, j = 0, k =0; i < n; i++ )
{
printf(“%d”, i );
}
 There are three induction variables i, j, k used in the for loop. So each time i is used
but j and k are not used. Hence the code can be optimized after elimination of unused
induction variables is given below.
for( i = 0; i < n; i++ )
{
printf(“%d”, i );
}
3. Strength reduction:-
 It is the process of replacing expensive operations by the equivalent cheaper
operations on the target machine.
 For example, for( k = 1; k < 5; k++)
{
x = 4 * k;
}
 On many machines, multiplication operation takes more time than addition or
subtraction. Hence, the speed of the object code can be increased by replacing
multiplication with addition or subtraction.
for( k = 1; k < 5; k++)
{
x = x + 4;
}

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 4


Unit-6 Compiler Design TEC

2 Q: Explain briefly about peephole optimization techniques.


 Peephole optimization is simple but effective method used to locally improve the
target code by examining a short sequence of target instructions known as peephole
and then replace these instructions by a shorter and/or faster sequence of instructions
whenever required.
 The following are peephole optimization techniques:
1. Elimination of redundant instructions.
2. Optimization of flow of control or elimination of unreachable code.
3. Algebraic simplifications.
4. Strength reduction.
1. Elimination of redundant instructions:-
 This includes elimination of redundant load and store instructions.
 For example, MOV R1, A
MOV A, R1
 Here first instruction is storing the value of A into register R1 and second instruction
is loading R1 value into A.
 These two instructions are redundant so eliminate instruction (2) because whenever
instruction (2) is executed after (1), it is ensured that the register R1 contains A value.
2. Optimization of flow of control or elimination of unreachable code:-
 An unlabeled instruction that immediately follows an unconditional jump can be
removed.
 For example,
i=j
if k = 2 goto L1
goto L2
L1: k is good
L2:
Here L1 immediately follows unconditional jump statement goto L2. Then the code
after elimination of unreachable code is
i=j
if k ≠ 2 goto L2
k is good
L2:

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 5


Unit-6 Compiler Design TEC

3. Algebraic simplifications:-
 Algebraic identities that occur frequently and which is worth considering them can
be simplified.
 For example, X = X * 1 or X = 0 + X is often produced by straight forwards
intermediate code generation algorithms. Hence they can be eliminated easily through
peephole optimization.
4. Strength reduction:-
 Replace expensive statements by a cheaper one.
 For example, X2 is expensive operation. Hence replace this by X * X which is cheaper
one.
3 Q: Explain about Data flow analysis
 Data flow analysis:
– Flow-sensitive: sensitive to the control flow in a function
– Intra procedural analysis
 Examples of optimizations:
– Constant propagation
– Common sub expression elimination
– Dead code elimination
 Data flow analysis abstraction:
– For each point in the program, combines information of all the instances of the
same program point.
 Reaching Definitions

 Every assignment is a definition


 A definition d reaches a point p if there exists path from the point immediately
following d to p such that d is not killed (overwritten) along that path.

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 6


Unit-6 Compiler Design TEC

 Data Flow Analysis Schema

 Build a flow graph (nodes = basic blocks, edges = control flow)


 Set up a set of equations between in[b] and out[b] for all basic blocks b
Live Variable Analysis
 Definition
– A variable v is live at point p if the value of v is used along some path in the flow graph
starting at p.
– Otherwise, the variable is dead.
 For each basic block, determine if each variable is live in each basic block

CHAKRAVARTHY, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE 7

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy