Intermediate Code Generation and Code Optimization
By Birku L.
Introduction
Why IR?
Intermediate Representations
Graphical IRs (Abstract Syntax Trees, Directed Acyclic Graphs, Control Flow Graphs)
Linear IRs (Postfix Notation, Static Single Assignment Form, Stack Machine Code, Three-Address Code)
Code Optimization
Optimization Techniques
Introduction
Intermediate representation
It ties the front and back ends together
Language- and machine-neutral
No limit on registers or memory; no machine-specific instructions.
Many forms
Syntax trees, three-address code, quadruples.
Intermediate code generation can affect the performance of the back end
Why IR?
Portability: suppose we have n source languages and m target languages. Without
intermediate code, each source language must be translated directly into each target
language, so every source-target pair needs its own compiler: n*m compilers in total.
With intermediate code, we need n front ends to translate each source language into
the intermediate code and m back ends to translate the intermediate code into each
target language. Thus we require only n+m compilers.
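[Figure: source languages C, Pascal and FORTRAN; target machines SPARC, HP PA and x86. Without an IR, each language connects to each machine; with an IR, each language and each machine connects only to the IR.]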
if (x < y)
    x = 5*y + 5*y/3;
else
    y = 5;
x = x+y;
[Figure: syntax tree for the code above; its leaves are x, y, 5 and 3, and its internal nodes are the operators.]
Directed Acyclic Graphs (DAGs)
Like compressed trees
Leaves are distinct: variables and constants available on entry
Internal nodes: operators
Can yield efficient code, since each common subexpression is encoded only once
But difficult to transform
Check whether an operand is already present
if not, create a leaf for it
Check whether there is a parent of the operand that represents the
same operation
if not create one, then label the node representing the result with the
name of the destination variable, and remove that label from all other
nodes in the DAG
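The construction steps above can be sketched as follows. This is a minimal illustration, not a production implementation: the class name DagBuilder, its methods, and the tuple encoding of nodes are all assumptions made for the example. Keeping one "current node" per variable name plays the role of moving the destination label from any other node to the new one.

```python
class DagBuilder:
    """Hypothetical helper: builds a DAG from statements dest = left op right."""

    def __init__(self):
        self.nodes = []     # node id -> ("leaf", name) or (op, left_id, right_id)
        self.index = {}     # node key -> node id, used to find existing nodes
        self.current = {}   # variable name -> id of the node labelled with it

    def _node(self, key):
        # Reuse an existing node with the same key, otherwise create one.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def operand(self, name):
        # Operand already labelled on some node? If not, create (or reuse) a leaf.
        if name in self.current:
            return self.current[name]
        return self._node(("leaf", name))

    def assign(self, dest, op, left, right):
        l, r = self.operand(left), self.operand(right)
        node = self._node((op, l, r))   # an existing parent with the same op is reused
        self.current[dest] = node       # the label dest now sits on this node only
        return node


# x = 5*y + 5*y/3, written as three-address statements
dag = DagBuilder()
t1 = dag.assign("t1", "*", "5", "y")
t2 = dag.assign("t2", "*", "5", "y")    # finds the node already built for 5*y
t3 = dag.assign("t3", "/", "t2", "3")
x_node = dag.assign("x", "+", "t1", "t3")
print(t1 == t2, len(dag.nodes))
```

Because t1 and t2 end up on the same node, a code generator walking this DAG would compute 5*y only once.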
Directed Acyclic Graphs (DAGs)…
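if (x < y)
    x = 5*y + 5*y/3;
else
    y = 5;
x = x+y;
[Figure: DAG for the code above, with Statements, IfStmt and AssignStmt nodes; the leaves x, y, 3 and 5 and the operator nodes +, / and * each appear once.]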
Control Flow Graphs (CFGs)
Nodes in the control flow graph are basic blocks
A basic block is a sequence of statements always entered at the beginning
of the block and exited at the end
Edges in the control flow graph represent the control flow
if (x < y)
    x = 5*y + 5*y/3;
else
    y = 5;
x = x+y;
[Figure: control flow graph for the code above]
B0: if (x < y) goto B1 else goto B2
B1: x = 5*y + 5*y/3
B2: y = 5
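As a rough sketch of how basic blocks and edges could be found mechanically, the function below splits a list of three-address instructions into blocks. It assumes the usual "leader" rule, which the slides do not state: a block starts at the first instruction, at every jump target, and right after every jump. The function name basic_blocks and the (label, text) encoding are hypothetical.

```python
def basic_blocks(code):
    """code: list of (label or None, instruction text).
    Returns (blocks as lists of instruction indices, sorted list of edges)."""
    labels = {lab: i for i, (lab, _) in enumerate(code) if lab}
    leaders = {0}
    for i, (_, text) in enumerate(code):
        if "goto" in text:
            leaders.add(labels[text.split()[-1]])   # the jump target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)                  # so does the instruction after a jump
    starts = sorted(leaders)
    blocks = [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(code)])]
    block_of = {s: b for b, s in enumerate(starts)}
    edges = set()
    for b, block in enumerate(blocks):
        last = code[block[-1]][1]
        if "goto" in last:
            edges.add((b, block_of[labels[last.split()[-1]]]))
        # Control falls through unless the block ends in an unconditional jump.
        if not last.startswith("goto") and block[-1] + 1 < len(code):
            edges.add((b, b + 1))
    return blocks, sorted(edges)


code = [
    (None, "if x < y goto L1"),
    (None, "goto L2"),
    ("L1", "t1 := 5 * y"),
    (None, "t2 := 5 * y"),
    (None, "t3 := t2 / 3"),
    (None, "x := t1 + t3"),
    (None, "goto L3"),
    ("L2", "y := 5"),
    ("L3", "x := x + y"),
]
blocks, edges = basic_blocks(code)
print(blocks)
print(edges)
```

On this input the sketch produces one more block than the hand-drawn graph, because the lone "goto L2" instruction forms a block of its own.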
Postfix Notation (PN)
Rules:
1. If E is a variable/constant, the PN of E is E itself
2. If E is an expression of the form E1 op E2, the PN of E is E1' E2' op
(E1' and E2' are the PN of E1 and E2, respectively.)
3. If E is a parenthesized expression of the form (E1), the PN of E is the same
as the PN of E1.
Example:
The PN of the expression (a+b)/(c-d) is ab+cd-/
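The three rules translate almost directly into a recursive function. The sketch below assumes the expression has already been parsed into nested tuples (op, E1, E2), a representation chosen only for this example; parentheses then disappear on their own, which corresponds to rule 3.

```python
def postfix(e):
    """e is a variable/constant (a string) or a tuple (op, e1, e2)."""
    if isinstance(e, str):
        return e                           # rule 1: E itself
    op, e1, e2 = e
    return postfix(e1) + postfix(e2) + op  # rule 2: E1' E2' op


# (a+b)/(c-d)
expr = ("/", ("+", "a", "b"), ("-", "c", "d"))
print(postfix(expr))   # ab+cd-/
```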
Three-Address Code
A popular form of intermediate code used in optimizing
compilers
Each instruction can have at most three operands
Types of three address code statements
Assignment statements: x := y op z and x := op y
Indexed assignments: x := y[ i] and x[ i] := y
Pointer assignments: x := & y, x := * y, * x := y
Unconditional jumps: goto L
Conditional jumps: if x relop y goto L
Function calls: param x, call p, n, and return y
p(x1, ..., xn) =>
    param x1
    ...
    param xn
    call p, n
Three-Address Code
Variables can be represented with their locations in the symbol table
Temporaries: temporaries correspond to the internal nodes of the syntax tree
Source code:
if (x < y)
    x = 5*y + 5*y/3;
else
    y = 5;
x = x+y;
Three-address code:
    if x < y goto L1
    goto L2
L1: t1 := 5 * y
    t2 := 5 * y
    t3 := t2 / 3
    x := t1 + t3
    goto L3
L2: y := 5
L3: x := x + y
Three-address code instructions can be represented as an array of
quadruples: operation, argument1, argument2, result
triples: operation, argument1, argument2
(each triple implicitly corresponds to a temporary)
Syntax tree vs. Three address code
Expression: (A+B*C) + (-B*A) - B
[Figure: syntax tree for the expression]
Three-address code:
T1 := B * C
T2 := A + T1
T3 := - B
T4 := T3 * A
T5 := T2 + T4
T6 := T5 - B
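One way to see the link between the syntax tree and the three-address code is a post-order walk that emits one instruction, with a fresh temporary, per internal node. The sketch below is only an illustration: gen_tac and the tuple encoding of the tree (a one-child tuple for unary minus) are assumptions made here.

```python
def gen_tac(tree):
    """tree: a name (string) or a tuple (op, child) / (op, left, right)."""
    code = []

    def walk(node):
        if isinstance(node, str):
            return node                       # leaves need no instruction
        op, *children = node
        args = [walk(c) for c in children]    # operands first (post-order)
        temp = f"T{len(code) + 1}"            # one temporary per internal node
        if len(args) == 1:
            code.append(f"{temp} := {op} {args[0]}")
        else:
            code.append(f"{temp} := {args[0]} {op} {args[1]}")
        return temp

    walk(tree)
    return code


# (A+B*C) + (-B*A) - B
tree = ("-",
        ("+",
         ("+", "A", ("*", "B", "C")),
         ("*", ("-", "B"), "A")),
        "B")
tac = gen_tac(tree)
print("\n".join(tac))
```

The instructions come out in the same order as in the hand-written listing for this expression.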
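Example
[Figure: tree with root =, left child D and right child /, over the operands A, B and C]
Both listings compute D = ((A + B*C) + (A * (B*C))) / (-C).
Listing 1 (A and C are first copied into temporaries):
T1 := A
T2 := C
T3 := B * T2
T4 := T1 + T3
T5 := T1 * T3
T6 := T4 + T5
T7 := - T2
T8 := T6 / T7
D := T8
Listing 2:
T1 := B * C
T2 := A + T1
T3 := A * T1
T4 := T2 + T3
T5 := - C
T6 := T4 / T5
D := T6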
Implementation of Three Address Code
Quadruples
Four fields: op, arg1, arg2, result
Array of struct {op, *arg1, *arg2, *result}
x:=y op z is represented as op y, z, x
arg1, arg2 and result are usually pointers to symbol table entries.
May need many temporary names.
Many assembly instructions resemble quadruples, except that arg1, arg2, and result
are real registers.
Example:
a=b*-c+b*-c
Implementation of Three Address Code…
Triples
Three fields: op, arg1, and arg2. The result becomes implicit.
arg1 and arg2 are either pointers to the symbol table or indexes/pointers
into the triple structure.
No explicit temporary names are used.
Needs more than one entry for ternary operations such as x := y[i],
a = b+c, x[i] = y, etc.
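To make the difference concrete, the sketch below writes a = b * -c + b * -c as quadruples and then derives triples from them by replacing each temporary name with the index of the triple that computes it. The names Quad and to_triples, the "minus" opcode for unary minus, and the convention that an integer argument means "result of that triple" are all assumptions of this example.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")

# Quadruples for a = b * -c + b * -c (temporaries t1..t5 are explicit)
quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]


def to_triples(quads):
    """Replace temporary names by the index of the triple that computes them."""
    position = {}   # temporary name -> index of its triple
    triples = []
    for q in quads:
        arg1 = position.get(q.arg1, q.arg1)
        arg2 = position.get(q.arg2, q.arg2)
        if q.op == "=":
            # The assignment needs its own entry: (=, target, value)
            triples.append(("=", q.result, arg1))
        else:
            triples.append((q.op, arg1, arg2))
            position[q.result] = len(triples) - 1
    return triples


triples = to_triples(quads)
for i, t in enumerate(triples):
    print(i, t)
```

In the output, strings are symbol-table names and integers refer to earlier triples, so no temporary names remain.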
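Static Single-Assignment Form
Facilitates certain code optimizations
Two distinctive aspects distinguish SSA from three-address code.
First, all assignments in SSA are to variables with distinct names; hence
the term static single-assignment.
Second, where two control-flow paths that assign the same variable join, SSA
uses a φ-function to combine the two definitions.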
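For straight-line code, the renaming that gives every assignment a distinct name can be sketched in a few lines. This illustration deliberately ignores control flow, so it covers only the renaming aspect of SSA; the function to_ssa and the (dest, op, arg1, arg2) tuple format are hypothetical, and the input sequence is a made-up example.

```python
def to_ssa(code):
    """code: list of (dest, op, arg1, arg2) for straight-line code only."""
    version = {}   # variable -> number of its latest definition

    def use(name):
        # Variables never assigned in this code keep their plain name.
        return f"{name}{version[name]}" if name in version else name

    out = []
    for dest, op, a1, a2 in code:
        a1, a2 = use(a1), use(a2)                  # rename uses first
        version[dest] = version.get(dest, 0) + 1   # then create a new name
        out.append((f"{dest}{version[dest]}", op, a1, a2))
    return out


code = [
    ("p", "+", "a", "b"),
    ("q", "-", "p", "c"),
    ("p", "*", "q", "d"),
    ("p", "-", "e", "p"),
    ("q", "+", "p", "q"),
]
ssa = to_ssa(code)
for dest, op, a1, a2 in ssa:
    print(f"{dest} = {a1} {op} {a2}")
```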
Code Optimization
Its advantage: the optimized code executes faster
Loop Fusion
/* Before */
for (i = 0; i < M; i = i + 1) a[i] = b[i] / c[i];
for (i = 0; i < M; i = i + 1) d[i] = a[i] + c[i];
/* After */
for (i = 0; i < M; i = i + 1) {
    a[i] = b[i] / c[i];
    d[i] = a[i] + c[i];
}
Optimization Techniques…
Code Motion
Any code inside a loop that always computes the same value can be
moved before the loop.
Example:
while (i <= limit-2)
do {loop code}
where the loop code doesn't change the limit variable. The
subtraction limit-2 is re-evaluated on every iteration even though
its value never changes. Code motion would substitute:
t = limit-2;
while (i <= t)
do {loop code}
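A very simplified version of this transformation on three-address instructions might look like the sketch below. It assumes a straight-line loop body, that each hoisted destination is assigned only once, and that it is a temporary not read before its definition; a real compiler needs further safety checks. The function hoist_invariants and the instruction tuples are hypothetical.

```python
def hoist_invariants(loop_body):
    """loop_body: list of (dest, op, arg1, arg2).
    Returns (instructions moved before the loop, remaining loop body)."""
    assigned = [dest for dest, *_ in loop_body]
    defined_in_loop = set(assigned)
    preheader, body = [], []
    for instr in loop_body:
        dest, op, a1, a2 = instr
        # Invariant here means: no operand is assigned inside the loop,
        # and dest has a single definition in the loop.
        invariant = (a1 not in defined_in_loop
                     and a2 not in defined_in_loop
                     and assigned.count(dest) == 1)
        (preheader if invariant else body).append(instr)
    return preheader, body


# while (i <= limit-2) { i = i + 1 } written as three-address instructions
loop_body = [
    ("t", "-", "limit", "2"),   # limit is never assigned in the loop
    ("c", "<=", "i", "t"),
    ("i", "+", "i", "1"),
]
preheader, body = hoist_invariants(loop_body)
print(preheader)
print(body)
```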
Copy propagation
Deals with copies to temporary variables, such as a = b.
Compilers generate many copies themselves in intermediate
form.
Copy propagation replaces uses of a copy with references to the
original, so that the copy can be removed. It often reveals dead code.
Example
Before
tmp0 = FP + offset A
tmp1 = tmp0
After
tmp1 = FP + offset A
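The sketch below shows one possible form of copy propagation on straight-line code: every later use of a copied variable is replaced by the original, after which the copy instruction itself is no longer needed. The function propagate_copies, the "copy" opcode and the third instruction (x = tmp1 * 4) are assumptions added for illustration.

```python
def propagate_copies(code):
    """code: list of (dest, op, arg1, arg2); a copy is (dest, "copy", src, None).
    Handles straight-line code only."""
    copies = {}   # dest -> src for copies that are still valid
    out = []
    for dest, op, a1, a2 in code:
        # Replace uses of a copy by the original variable.
        a1, a2 = copies.get(a1, a1), copies.get(a2, a2)
        # dest is overwritten: forget copies that involve it.
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if op == "copy" and a1 != dest:
            copies[dest] = a1
        out.append((dest, op, a1, a2))
    return out


code = [
    ("tmp0", "+", "FP", "offsetA"),
    ("tmp1", "copy", "tmp0", None),
    ("x", "*", "tmp1", "4"),       # hypothetical later use of the copy
]
result = propagate_copies(code)
for instr in result:
    print(instr)
```

After the pass, nothing reads tmp1 any more, so a later dead-code elimination step could delete the copy.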
Peephole Optimization
Look through a small window at the assembly code for common
cases that can be improved
1. Redundant load
2. Redundant push/pop
3. Replace a jump to a jump
4. Remove a jump to the next instruction
5. Replace a jump around a jump
6. Remove useless operations
7. Reduction in strength
Done after code generation; makes small local changes to the
assembly
Redundant Load
Before              After
store Rx, M         store Rx, M
load M, Rx

Redundant Push/Pop
Before              After
push Rx             … nothing …
pop Rx

Replace a jump to a jump
Before              After
goto L1             goto L2
…                   …
L1: goto L2         L1: goto L2
Reduction in Strength
Before              After
mul T0, T0, 2       shift-left T0
add T0, T0, 1       inc T0
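A few of these patterns can be sketched as a single pass over an instruction list with a two-instruction window. The tuple encoding of instructions and the function name peephole are assumptions for this example, and the sketch assumes that no instruction in the list is a jump target (a labelled instruction could not be dropped safely).

```python
def peephole(code):
    """code: list of instruction tuples such as ("store", "R1", "M").
    Returns a new list with a few local patterns improved."""
    out = []
    for ins in code:
        prev = out[-1] if out else None
        # Redundant load: store Rx, M directly followed by load M, Rx
        if prev and prev[0] == "store" and ins == ("load", prev[2], prev[1]):
            continue
        # Redundant push/pop of the same register: drop both
        if prev and prev[0] == "push" and ins == ("pop", prev[1]):
            out.pop()
            continue
        # Reduction in strength
        if ins[0] == "mul" and ins[1] == ins[2] and ins[3] == 2:
            ins = ("shift-left", ins[1])
        elif ins[0] == "add" and ins[1] == ins[2] and ins[3] == 1:
            ins = ("inc", ins[1])
        out.append(ins)
    return out


code = [
    ("store", "R1", "M"),
    ("load", "M", "R1"),
    ("push", "R2"),
    ("pop", "R2"),
    ("mul", "T0", "T0", 2),
    ("add", "T0", "T0", 1),
]
optimized = peephole(code)
for ins in optimized:
    print(ins)
```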