Code Optimization
Introduction
Classifications of Optimization techniques
Factors influencing Optimization
Themes behind Optimization Techniques
Optimizing Transformations
Example
Details of Optimization Techniques
Introduction
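Source program -> Front end -> Intermediate code -> Code generator -> Target code
Where optimization can happen:
  Source program: profile and optimize (user)
  Intermediate code: loop, procedure call, and address calculation improvement (compiler)
  Target code: register usage, instruction choice, peephole optimization (compiler)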
Code optimizer: data flow analysis -> transformations
Classifications of Optimization techniques
Peephole optimization
Local optimizations
Global optimizations
  Inter-procedural
  Intra-procedural
Loop optimization
Factors influencing Optimization
Machine architecture:
  Number of CPU registers
  RISC vs CISC
  Pipeline architecture
  Number of functional units
  Cache size and type
  Cache/memory transfer rate
Themes behind Optimization Techniques
Redundancy elimination:
  Value numbering
  Common subexpression elimination
  Constant/Copy propagation
  Partial redundancy elimination
Optimizing Transformations
Compile-Time Evaluation
Expressions whose values can be precomputed at compile time.
Two ways:
  Constant folding
  Constant propagation
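Constant Folding: evaluate at compile time an expression whose operands are all constants, and replace the expression by its value.
Example:
area := (22.0/7.0) * r ** 2
becomes
area := 3.14286 * r ** 2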
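A minimal sketch of constant folding in Python. It assumes a hypothetical expression representation: nested tuples `(op, left, right)` whose leaves are numbers or variable names. Any subtree whose operands are all constants is evaluated at compile time. A division by zero is deliberately left unfolded.

```python
import operator

# Binary operators the folder is allowed to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "**": operator.pow}

def fold(expr):
    """Recursively replace constant-only subtrees with their value."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: a number or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    both_const = isinstance(left, (int, float)) and isinstance(right, (int, float))
    if both_const and not (op == "/" and right == 0):   # never fold x / 0
        return OPS[op](left, right)
    return (op, left, right)

# area := (22.0/7.0) * r ** 2  ->  only 22.0/7.0 is folded; r ** 2 stays
area = ("*", ("/", 22.0, 7.0), ("**", "r", 2))
print(fold(area))
```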
Constant Propagation: replace a variable with the constant that was assigned to it earlier, provided no other assignment to that variable intervenes.
Example:
pi := 3.14286
area := pi * r ** 2
becomes
area := 3.14286 * r ** 2
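The following sketch shows constant propagation over a straight-line block. It assumes a hypothetical three-address representation made of `(dest, rhs)` pairs, where `rhs` is either a single operand or a tuple `(op, left, right)`. A variable's known constant is forgotten as soon as the variable is reassigned with a non-constant value.

```python
def propagate_constants(block):
    """Constant propagation within one straight-line block.

    Uses of a variable are replaced by the constant most recently
    assigned to it, as long as no other assignment intervenes.
    """
    known = {}                                   # variable -> constant it holds

    def subst(x):
        return known.get(x, x) if isinstance(x, str) else x

    out = []
    for dest, rhs in block:
        if isinstance(rhs, tuple):
            op, left, right = rhs
            rhs = (op, subst(left), subst(right))
        else:
            rhs = subst(rhs)
        out.append((dest, rhs))
        if isinstance(rhs, (int, float)):
            known[dest] = rhs                    # dest now holds a known constant
        else:
            known.pop(dest, None)                # dest redefined: value unknown
    return out

# pi := 3.14286 ; t := r ** 2 ; area := pi * t
block = [("pi", 3.14286), ("t", ("**", "r", 2)), ("area", ("*", "pi", "t"))]
print(propagate_constants(block))
```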
Constant Propagation
When is it performed? Early in the optimization process.
Result: smaller code and fewer registers.
Common Sub-expression Elimination
Example:
a := b * c
x := b * c + 5
becomes
temp := b * c
a := temp
x := temp + 5
x = a + b
...
y = a + b
becomes
t = a + b
x = t
...
y = t
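This is a minimal sketch of local common subexpression elimination. It assumes statements of the hypothetical form `(dest, (op, a, b))`. A table maps each available expression to the variable holding its value, and an entry is dropped when an operand or the holder is reassigned. Operands of `+` and `*` are put in a canonical order, so `b + a` matches `a + b`. Unlike the slide, which introduces a temporary t, this variant reuses the first holder (`y = x`). That is safe only because the table forgets `x` once it is overwritten.

```python
COMMUTATIVE = {"+", "*"}

def local_cse(block):
    """Local common subexpression elimination for one basic block."""
    available = {}                      # (op, a, b) -> variable holding it
    out = []
    for dest, (op, a, b) in block:
        operands = (a, b)
        if op in COMMUTATIVE:
            operands = tuple(sorted(operands, key=repr))   # a + b == b + a
        key = (op,) + operands
        if key in available:
            out.append((dest, available[key]))   # reuse: becomes a copy
        else:
            out.append((dest, (op, a, b)))
        # dest now holds a new value: forget facts that mention it
        available = {k: v for k, v in available.items()
                     if dest not in k[1:] and v != dest}
        if key not in available and dest not in (a, b):
            available[key] = dest
    return out

print(local_cse([("x", ("+", "a", "b")), ("y", ("+", "a", "b"))]))
```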
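Result of common subexpression elimination on a basic block:
t1 = a + b
c = t1
t2 = m * n
d = t2
t3 = b + d
e = t3
f = t1
g = -b
h = t1 /* commutative */
a = j + a
k = t2
j = t3
a = -b
if t2 go to L
Global common subexpression elimination is performed on the flow graph and requires available expression information.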
Common Sub-expression Elimination
1: x := a + b
   a := b
4: z := a + b + 10
a + b is not a common subexpression in statements 1 and 4, because a is reassigned between them.
Code Motion
Example 1: x ** 2 is computed once, above the conditional, and used on both paths:
temp := x ** 2
if (a < b) then
    z := temp
else
    y := temp + 10
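Example 2:
if (a < b) then
    temp = x * 2
    z = temp
else
    y = 10
temp = x * 2
g = temp;
The second temp = x * 2 is redundant on the path through the then-branch but not on the path through the else-branch (a partial redundancy).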
Code Motion
Move an expression out of a loop if its value does not change inside the loop.
Example (max is not modified in the loop):
while (i < max - 2)
is equivalent to:
t := max - 2
while (i < t)
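A minimal sketch of loop-invariant code motion for a loop whose body is a single basic block. It assumes hypothetical statements `(dest, op, a, b)`. A statement is hoisted into the pre-header if each operand is a constant, a variable never assigned in the loop, or a variable already hoisted. In addition, its target must be assigned exactly once and must not be read earlier in the body. As simplifying assumptions, hoisted statements have no side effects and their targets are not used after the loop. A real compiler also checks dominance of the loop exits.

```python
def hoist_invariants(body):
    """Split a single-block loop body into (preheader, new_body)."""
    counts = {}
    for dest, _op, _a, _b in body:
        counts[dest] = counts.get(dest, 0) + 1
    invariant_vars, used = set(), set()
    preheader, new_body = [], []

    def is_invariant(x):
        # constant, not assigned in the loop, or already hoisted
        return not isinstance(x, str) or x not in counts or x in invariant_vars

    for stmt in body:
        dest, _op, a, b = stmt
        if (counts[dest] == 1 and dest not in used
                and is_invariant(a) and is_invariant(b)):
            invariant_vars.add(dest)
            preheader.append(stmt)
        else:
            new_body.append(stmt)
        used.update(x for x in (a, b) if isinstance(x, str))
    return preheader, new_body

# while (i < t): t := max - 2 moves out, i := i + 1 stays
print(hoist_invariants([("t", "-", "max", 2), ("i", "+", "i", 1)]))
```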
Example 2 after code motion (x * 2 is computed once, above the conditional):
temp = x * 2
if (a < b) then
    z = temp
else
    y = 10
g = temp
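Strength Reduction
Replace an expensive operation by a cheaper one. In this example, a multiplication inside the loop becomes an addition.
Example:
for i = 1 to 10 do
    x = i * 5
end
becomes
temp = 5
for i = 1 to 10 do
    x = temp
    temp = temp + 5
end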
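As a quick check, the two loops above can be run side by side in Python. This sketch collects the values assigned to x in each version, so they can be compared directly.

```python
def original():
    xs = []
    for i in range(1, 11):      # for i = 1 to 10
        xs.append(i * 5)        # x = i * 5: one multiplication per iteration
    return xs

def reduced():
    xs = []
    temp = 5                    # value of i * 5 in the first iteration (i = 1)
    for i in range(1, 11):
        xs.append(temp)         # x = temp
        temp = temp + 5         # addition replaces the multiplication
    return xs

print(original() == reduced())
```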
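Dead Code Elimination
Dead code is code that can be removed without changing the program's results. Examples include an assignment whose value is never used and a statement that can never be reached.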
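A minimal sketch of dead code elimination in a straight-line block, using a backward pass. It assumes a hypothetical representation in which each statement is `(dest, uses)`, statements have no side effects other than writing `dest`, and `live_out` lists the variables still needed after the block. An assignment is dropped when its target is not live at that point.

```python
def eliminate_dead_code(block, live_out):
    """Remove assignments whose results are never used."""
    live = set(live_out)
    kept = []
    for dest, uses in reversed(block):
        if dest in live:
            kept.append((dest, uses))
            live.discard(dest)          # this definition satisfies later uses
            live |= set(uses)           # its operands become live
        # otherwise the value is never read: the statement is dropped
    kept.reverse()
    return kept

# a := b + c ; d := a ; a := e ; f := a   with only f needed afterwards
block = [("a", {"b", "c"}), ("d", {"a"}), ("a", {"e"}), ("f", {"a"})]
print(eliminate_dead_code(block, {"f"}))
```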
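Copy Propagation
After a copy statement, later uses of its target can be replaced by its source, provided neither is reassigned in between.
Example (uses of x[i] after the copy x[i] = a are replaced by a):
x[i] = a;
sum = a + a;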
Loop Optimization
Decrease the number of instructions in the inner loop,
even if this increases the number of instructions in the outer loop.
Techniques:
  Code motion
  Induction variable elimination
  Strength reduction
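Peephole Optimization
A pass over the generated code that examines a short window (the peephole) of adjacent instructions and replaces it with a shorter or faster sequence. Typical transformations:
  Redundant instruction elimination
  Flow-of-control optimizations
  Algebraic simplifications
  Use of machine idioms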
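Redundant loads and stores:
MOV R0, a
MOV a, R0
The second instruction can be eliminated without any global knowledge of a, provided it is not the target of a jump (it carries no label).
Constant conditions:
if (0 != 1) goto L2
L2:
The condition 0 != 1 is always true, so the conditional jump can be replaced by goto L2. Any code between the jump and L2 becomes unreachable.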
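A minimal sketch of the redundant-load/store peephole, using a two-instruction window. It assumes a hypothetical instruction encoding: `("MOV", src, dst)` copies src into dst (the source-first convention of the example above), and `("LABEL", name)` marks a jump target. A labelled instruction is never adjacent to its predecessor, so jump targets are preserved automatically.

```python
def remove_redundant_moves(code):
    """Delete a MOV that copies back the value the previous MOV just stored."""
    out = []
    for ins in code:
        prev = out[-1] if out else None
        if (prev is not None and ins[0] == "MOV" and prev[0] == "MOV"
                and ins[1] == prev[2] and ins[2] == prev[1]):
            continue            # MOV R0, a ; MOV a, R0  ->  drop the second MOV
        out.append(ins)
    return out

print(remove_redundant_moves([("MOV", "R0", "a"), ("MOV", "a", "R0")]))
```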
Algebraic identities
Strength reduction:
A ^ 2 = A * A (exponentiation replaced by a cheaper multiplication)
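Objective: replace a division by 2 with an arithmetic shift right (SAR) by one bit.
-5 = 111111...1111111011
SAR by 1 -> 111111...1111111101
which is -3, not -2.
In most languages -5/2 = -2, because integer division truncates toward zero. SAR rounds toward minus infinity, so a shift alone is not a correct replacement when the dividend may be negative.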
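The following Python sketch reproduces the problem and one common fix. Before shifting a negative dividend, add the bias 2**k - 1, so that the shift rounds toward zero like C-style division. The helper names are hypothetical. Python's `>>` on `int` is an arithmetic shift, which makes it a faithful model of SAR.

```python
def trunc_div(x, d):
    """Integer division rounding toward zero, as in C: -5 / 2 == -2."""
    q = abs(x) // abs(d)
    return q if (x >= 0) == (d > 0) else -q

def sar_div(x, k):
    """Divide by 2**k using only an arithmetic shift (rounds toward -infinity)."""
    return x >> k

def shift_div(x, k):
    """Divide by 2**k rounding toward zero: bias negative x by 2**k - 1 first."""
    bias = (1 << k) - 1 if x < 0 else 0
    return (x + bias) >> k

print(sar_div(-5, 1), trunc_div(-5, 2), shift_div(-5, 1))
```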
x * 125 = x * 128 - x * 4 + x = (x << 7) - (x << 2) + x
The multiplication is replaced by two shifts, one subtraction, and one addition.
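One way to derive decompositions like this automatically is the non-adjacent form (NAF), a signed-binary representation with as few nonzero digits as possible. The NAF is a technique chosen here for illustration; the slides do not name it. This sketch assumes a positive constant `c`. For 125 it yields exactly 128 - 4 + 1.

```python
def naf_terms(c):
    """Return (shift, sign) pairs with c == sum(sign * 2**shift), in
    non-adjacent form, so that x * c becomes shifts plus adds/subtracts."""
    terms, shift = [], 0
    while c:
        if c & 1:
            digit = 2 - (c & 3)         # +1 if c % 4 == 1, -1 if c % 4 == 3
            terms.append((shift, digit))
            c -= digit
        c >>= 1
        shift += 1
    return terms[::-1]

def multiply_via_shifts(x, c):
    """Evaluate x * c using only the shifts and adds/subtracts from naf_terms."""
    return sum(sign * (x << shift) for shift, sign in naf_terms(c))

print(naf_terms(125))                   # terms for (x << 7) - (x << 2) + x
```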
Jump to jump:
JNE lab1
...
lab1: JMP lab2
can be replaced by:
JNE lab2
As a result, lab1 may become dead (unreferenced).
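A minimal sketch of this jump-to-jump (jump threading) peephole. It assumes a hypothetical instruction list in which `("LABEL", name)` defines a label and jumps are `(opcode, label)` tuples. The opcode set is illustrative only. Chains of unconditional jumps are followed, and the walk stops if it revisits a label, so a cycle of jumps cannot loop forever.

```python
JUMPS = {"JMP", "JNE", "JE", "JL", "JG"}        # illustrative opcode set

def thread_jumps(code):
    """Retarget jumps whose target label is immediately followed by a JMP."""
    forward = {}                                 # label -> target of its JMP
    for ins, nxt in zip(code, code[1:]):
        if ins[0] == "LABEL" and nxt[0] == "JMP":
            forward[ins[1]] = nxt[1]

    def final_target(lab):
        seen = set()
        while lab in forward and lab not in seen:
            seen.add(lab)
            lab = forward[lab]
        return lab

    return [(ins[0], final_target(ins[1])) if ins[0] in JUMPS else ins
            for ins in code]

code = [("JNE", "lab1"), ("MOV", "R0", "a"),
        ("LABEL", "lab1"), ("JMP", "lab2"), ("LABEL", "lab2"), ("RET",)]
print(thread_jumps(code)[0])
```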
Jump to Return
A jump whose target is a return instruction can be replaced by the return itself:
RET
Use of machine idioms: i := i + 1 can be implemented with the single instruction
INC i
Local Optimization
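DAG representation of Basic Block (BB)
Basic block:
t1 := 4 * i
t3 := 4 * i
t2 := t1 + t3
if (i <= 20) goto L1
DAG:
  Leaves: 4, i, 20
  Node * labelled t1, t3, with children 4 and i (t3 computes the same value as t1, so no new node is created)
  Node + labelled t2, whose left and right children are both the * node
  Node <= labelled (L1), with children i and 20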
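DAG construction
Method: for each three-address statement A in basic block B, A is of one of the following forms:
1. x := y op z
2. x := op y
3. x := y
If (A is of type 1):
    Find a node labelled op whose left and right children are ny and nz respectively [detection of a common sub-expression]
    If (not found) n = Create(op, ny, nz);
If (A is of type 2):
    Find a node labelled op with a single child ny
    If (not found) n = Create(op, ny);
If (A is of type 3): n = ny
Here ny and nz are the nodes currently holding the values of y and z; a leaf is created on first use. Finally, remove x from the identifiers attached to its old node and attach x to n.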
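A minimal Python sketch of the DAG construction method above. It assumes hypothetical statement tuples: `(x, op, y, z)` for type 1, `(x, op, y)` for type 2, and `(x, y)` for type 3. A lookup table keyed by `(op, child nodes)` implements "find a node labelled op with these children". The sketch does not model the extra killing rules that real compilers apply to array stores and pointer assignments.

```python
def build_dag(block):
    """Return (nodes, node_of): nodes[i] = [label, children, attached names],
    node_of maps each identifier or constant to the node holding its value."""
    nodes, node_of, lookup = [], {}, {}

    def leaf(v):
        if v not in node_of:                 # first use: create a leaf
            nodes.append([v, (), []])
            node_of[v] = len(nodes) - 1
        return node_of[v]

    for stmt in block:
        x = stmt[0]
        if len(stmt) == 2:                   # type 3: x := y
            n = leaf(stmt[1])
        else:                                # types 1 and 2
            op, *args = stmt[1:]
            key = (op, tuple(leaf(a) for a in args))
            if key not in lookup:            # not found: create a new node
                nodes.append([op, key[1], []])
                lookup[key] = len(nodes) - 1
            n = lookup[key]
        if x in node_of and x in nodes[node_of[x]][2]:
            nodes[node_of[x]][2].remove(x)   # x no longer labels its old node
        nodes[n][2].append(x)
        node_of[x] = n
    return nodes, node_of

nodes, node_of = build_dag([("t1", "*", 4, "i"), ("t3", "*", 4, "i"),
                            ("t2", "+", "t1", "t3")])
print(nodes[node_of["t1"]])
```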
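Example: DAG construction for the basic block
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
t5 := t2 + t4
i := t5
Steps:
1. t1 := 4 * i: create leaves 4 and i and a node * labelled t1.
2. t2 := a [ t1 ]: create leaf a and a node [] labelled t2, with children a and the * node.
3. t3 := 4 * i: a node * with children 4 and i already exists, so t3 is attached to it (* t1, t3).
4. t4 := b [ t3 ]: create leaf b and a node [] labelled t4, with children b and the * node.
5. t5 := t2 + t4: create a node + labelled t5, with the two [] nodes as children.
6. i := t5: attach i to the + node (+ t5, i).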
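Observations:
A DAG identifies expressions that compute the same value, not merely expressions with the same text.
Example:
a := b + c
b := b - d
c := c + d
e := b + c
The first and last statements contain the common expression b + c, but they do not generate the same result, because b and c are redefined in between.
DAG (b0, c0, d0 are the initial values of b, c, d):
  + a: children b0, c0
  - b: children b0, d0
  + c: children c0, d0
  + e: children the - b node and the + c node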
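Example:
a := b + c
b := a - d
c := b + c
d := a - d
DAG: + a (children b0, c0); - b,d (children the + a node and d0); + c (children the - b,d node and c0).
b and d label the same node, so d := a - d is a common subexpression. If b is not live on exit from the block, the block can be rewritten as:
a := b + c
d := a - d
c := d + c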
Loop Optimization
Dominators:
A node d of a flow graph G dominates a node n, if
every path in G from the initial node to n goes through
d.
Represented as: d dom n
Corollaries:
Every node dominates itself.
The initial node dominates all nodes in G.
The entry node of a loop dominates all nodes in the
loop.
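The dominator relation can be computed iteratively. This sketch uses the standard equations dom(entry) = {entry} and dom(n) = {n} ∪ ⋂ dom(p) over the predecessors p of n. It assumes a hypothetical flow graph given as a dict from each node to its successor list, and it starts every non-entry node at "all nodes", shrinking the sets until they stop changing.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it}."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    preds = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}      # start from "every node dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

print(dominators({1: [2, 3], 2: [4], 3: [4], 4: []}, 1)[4])
```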
[Figure: a flow graph and its dominator tree]
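Natural loops:
1. A natural loop has a single entry node, the header, which dominates all nodes in the loop.
2. There is at least one way to iterate the loop: a back edge n -> h from a node n in the loop to the header h, where h dom n.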
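Given a back edge tail -> header, the natural loop is the header together with every node that can reach the tail without passing through the header. This sketch finds those nodes by walking predecessors backward from the tail. The flow graph is assumed to be a dict of successor lists, and checking that the edge really is a back edge (header dom tail) is left to the caller.

```python
def natural_loop(succ, tail, header):
    """Nodes of the natural loop of the back edge tail -> header."""
    preds = {}
    for n, ss in succ.items():
        for s in ss:
            preds.setdefault(s, set()).add(n)
    loop, stack = {header}, [tail]
    while stack:
        m = stack.pop()
        if m not in loop:                    # header is pre-inserted: walk stops there
            loop.add(m)
            stack.extend(preds.get(m, ()))
    return loop

g = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [2, 6], 6: []}
print(natural_loop(g, 5, 2))
```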
Pre-Header:
A basic block whose only successor is the header of loop L.
Intended to hold statements that are moved out of the loop.
Control flow that used to enter the loop from outside, through the header, now enters the loop through the pre-header.
[Figure: loop L with its header, before and after inserting a pre-header]
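A minimal sketch of pre-header insertion on a flow graph represented, by assumption, as a dict of successor lists. Edges that enter the header from outside the loop are redirected to a new block, and that new block's only successor is the header. Back edges from inside the loop keep pointing at the header. The block name `"preheader"` is hypothetical and must not clash with an existing label.

```python
def insert_preheader(succ, header, loop, name="preheader"):
    """Return a new CFG with a pre-header block placed in front of `header`."""
    new = {n: [name if (s == header and n not in loop) else s for s in ss]
           for n, ss in succ.items()}
    new[name] = [header]                     # the only successor is the header
    return new

g = {1: [2], 2: [3], 3: [2, 4], 4: []}       # loop {2, 3} with header 2
print(insert_preheader(g, 2, {2, 3}))
```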
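Example loop:
       r1 = 0
       r7 = &A
Loop:  r2 = r1 * 4
       r4 = r7 + 3
       r7 = r7 + 1
       r10 = *r2
       r3 = *r4
       r9 = r1 * r3
       r10 = r9 >> 4
       *r2 = r10
       r1 = r1 + 4
       if (r1 < 100) goto Loop
r1 and r7 are basic induction variables, since each is only incremented by a constant in the loop. r2 = r1 * 4 and r4 = r7 + 3 are derived induction variables.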
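The classification of the loop above can be sketched in Python. The loop body is encoded in a hypothetical `(dest, op, a, b)` form, with loads as op `"load"` and the store given a `None` destination. As a simplification, only registers with exactly one definition in the loop are considered, and only integer constants count as increments or factors. Loop-invariant registers are not accepted.

```python
from collections import Counter

LOOP = [("r2", "*", "r1", 4), ("r4", "+", "r7", 3), ("r7", "+", "r7", 1),
        ("r10", "load", "r2", None), ("r3", "load", "r4", None),
        ("r9", "*", "r1", "r3"), ("r10", ">>", "r9", 4),
        (None, "store", "r2", "r10"), ("r1", "+", "r1", 4)]

def basic_induction_vars(body):
    """{reg: increment} for regs defined once as r = r + c or r = r - c."""
    counts = Counter(dest for dest, _op, _a, _b in body)
    result = {}
    for dest, op, a, b in body:
        if (counts[dest] == 1 and op in ("+", "-") and a == dest
                and isinstance(b, int)):
            result[dest] = b if op == "+" else -b
    return result

def derived_induction_vars(body, basic):
    """{reg: (basic_iv, op, c)} for regs defined once as x = y op c,
    with y a basic induction variable and op in +, -, *, <<."""
    counts = Counter(dest for dest, _op, _a, _b in body)
    return {dest: (a, op, b) for dest, op, a, b in body
            if counts[dest] == 1 and dest not in basic and a in basic
            and op in ("+", "-", "*", "<<") and isinstance(b, int)}

basic = basic_induction_vars(LOOP)
print(basic, derived_induction_vars(LOOP, basic))
```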
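Transformation (strength reduction of a derived induction variable)
Target statement S: x = y op z, where y is a basic induction variable, z is loop-invariant, and op is +, -, *, or <<.
1. Insert at the bottom of the pre-header:
   new_reg = y op z   (the expression of S)
2. If opcode(S) is not add/sub, also insert at the bottom of the pre-header:
   new_inc = inc(y) op z
   else:
   new_inc = inc(y)
   Function inc(): calculates the amount by which its first parameter is incremented in each iteration.
3. Insert the following at each update of y:
   new_reg = new_reg + new_inc
4. Change S to: x = new_reg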
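Example (r4 is a basic induction variable with increment 1; r9 is loop-invariant):
Before:
r5 = r4 - 3
r4 = r4 + 1
r7 = r4 * r9
r6 = r4 << 2
After (pre-header: new_reg = r4 * r9, new_inc = r9):
r5 = r4 - 3
r4 = r4 + 1
new_reg += new_inc
r7 = new_reg
r6 = r4 << 2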
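To check the example, the following Python sketch simulates both versions of the loop and compares every value computed. It relies on the stated assumption that r9 is loop-invariant. The pre-header computes new_reg = r4 * r9 and new_inc = inc(r4) * r9 = r9.

```python
def before(r4, r9, n):
    out = []
    for _ in range(n):
        r5 = r4 - 3
        r4 = r4 + 1
        r7 = r4 * r9              # multiplication in every iteration
        r6 = r4 << 2
        out.append((r5, r7, r6))
    return out

def after(r4, r9, n):
    new_reg = r4 * r9             # pre-header: expression of S
    new_inc = r9                  # pre-header: inc(r4) * r9 with inc(r4) = 1
    out = []
    for _ in range(n):
        r5 = r4 - 3
        r4 = r4 + 1
        new_reg += new_inc        # inserted at the update of r4
        r7 = new_reg              # S rewritten: r7 = new_reg
        r6 = r4 << 2
        out.append((r5, r7, r6))
    return out

print(before(0, 3, 5) == after(0, 3, 5))
```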
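Induction variable elimination: r1 and r2 are both decremented by 1 in each iteration. If they also start with the same value and r1 is not live after the loop, uses of r1 can be replaced by r2 and r1 can be eliminated.
Before:
r1 = r1 - 1
r2 = r2 - 1
r7 = r1 * r9
r9 = r2 + r4
r4 = *(r1)
*r2 = r7
After:
r2 = 0
r2 = r2 - 1
r7 = r2 * r9
r9 = r2 + r4
r4 = *(r2)
*r2 = r7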
Variants are ordered by increasing complexity of elimination.
Example:
r4 := r1 + rx
r3 := r1 * 4
...
r1 := r1 + 4
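Loop Unrolling
Replicate the loop body several times and reduce the number of iterations accordingly. This cuts the loop-control overhead (tests and branches) per original iteration.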
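The following sketch illustrates unrolling by a factor of 4 on a simple summation loop. The example loop and the unroll factor are assumptions; the slide does not specify them. Because the trip count need not be a multiple of 4, a remainder loop handles the last `len(a) % 4` elements.

```python
def sum_rolled(a):
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total

def sum_unrolled_by_4(a):
    """Same loop unrolled 4 times: one loop test per four elements."""
    total, i, n = 0, 0, len(a)
    while i + 4 <= n:             # main unrolled loop
        total += a[i]
        total += a[i + 1]
        total += a[i + 2]
        total += a[i + 3]
        i += 4
    while i < n:                  # remainder loop
        total += a[i]
        i += 1
    return total

print(sum_unrolled_by_4(list(range(10))))
```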
IMPORTANT!
Data flow analysis should never tell us that a
transformation is safe when in fact it is not.
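When doing data flow analysis we must be
  Conservative: never use information that might not preserve the behavior of the program.
  Aggressive: collect information that is as precise as possible, to get the greatest benefit from the optimizations.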
Global:
Performed on the flow graph
Goal: collect information at the beginning and end of each basic block.
Iterative:
Construct data flow equations that describe how information flows through each basic block, and solve them by iteratively converging on a solution.
Functions
Algorithm sketch
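Typical problems
Reaching definitions: for each use of a variable, find all definitions that reach it.
Live variables: for a point p and a variable x, determine whether the value of x at p is used along some path starting at p.
Available expressions: find the expressions whose values have been computed on every path to a point and not invalidated since.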
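Points and paths
Point: a location between two adjacent statements, a location before the first statement of a basic block, or a location after the last statement of the basic block.
Path: a sequence of points p1, p2, ..., pn in which each point immediately follows the previous one, either within a block or from the end of a block to the beginning of a successor block.
[Figure: flow graph with blocks B1 to B6, containing d4: i := i + 1 (B2), d5: j := j - 1 (B3), and d6: a := u2 (B5), with a path p1, p2, ..., pn through it]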
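Reaching Definition
A definition of a variable x is a statement that assigns, or may assign, a value to x, for example:
  an assignment to x
  reading a value from an I/O device into x
A definition d reaches a point p if there is a path from the point immediately following d to p along which d is not killed, that is, x is not redefined.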
Reaching Definition
Example:
B1: d1: i := m - 1
    d2: j := n
    d3: a := u1
B2: d4: i := i + 1
B3: d5: j := j - 1
B4
B5: d6: a := u2
B6
The definition of i in d1 reaches p1, but it is killed by d4 and does not reach p2.
The definition of i in d1 does not reach B3, B4, B5, or B6.
Reaching Definition
A region is a set of nodes N that includes a header h, where h dominates all nodes in N.
Composition of Regions
S -> S1 ; S2: S1 is followed by S2.
S -> if E then S1 else S2: the test if E goto S1 leads to either S1 or S2.
S -> do S1 while E: S1 is followed by the test if E goto S1.
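Data flow equations for reaching definitions
S -> d: a := b + c
$\mathit{gen}[S] = \{d\}$
$\mathit{kill}[S] = D_a - \{d\}$, where $D_a$ is the set of all definitions of a in the program
$\mathit{out}[S] = \mathit{gen}[S] \cup (\mathit{in}[S] - \mathit{kill}[S])$
S -> S1 ; S2
$\mathit{gen}[S] = \mathit{gen}[S_2] \cup (\mathit{gen}[S_1] - \mathit{kill}[S_2])$
$\mathit{kill}[S] = \mathit{kill}[S_2] \cup (\mathit{kill}[S_1] - \mathit{gen}[S_2])$
$\mathit{in}[S_1] = \mathit{in}[S]$
$\mathit{in}[S_2] = \mathit{out}[S_1]$
$\mathit{out}[S] = \mathit{out}[S_2]$
S -> if E then S1 else S2
$\mathit{gen}[S] = \mathit{gen}[S_1] \cup \mathit{gen}[S_2]$
$\mathit{kill}[S] = \mathit{kill}[S_1] \cap \mathit{kill}[S_2]$
$\mathit{in}[S_1] = \mathit{in}[S]$
$\mathit{in}[S_2] = \mathit{in}[S]$
$\mathit{out}[S] = \mathit{out}[S_1] \cup \mathit{out}[S_2]$
S -> do S1 while E
$\mathit{gen}[S] = \mathit{gen}[S_1]$
$\mathit{kill}[S] = \mathit{kill}[S_1]$
$\mathit{in}[S_1] = \mathit{in}[S] \cup \mathit{gen}[S_1]$
$\mathit{out}[S] = \mathit{out}[S_1]$
In general:
$\mathit{in}[S] = \bigcup \{\, \mathit{out}[Y] : Y \text{ is a predecessor of } S \,\}$
$\mathit{out}[S] = \mathit{gen}[S] \cup (\mathit{in}[S] - \mathit{kill}[S])$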
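Reaching definitions
What is safe? Assuming that a definition reaches a point even if it might not. The error only costs optimization opportunities.
Reaching definitions is a forward problem:
$\mathit{out}[B] = \mathit{gen}[B] \cup (\mathit{in}[B] - \mathit{kill}[B])$
The confluence (meet) operator is union:
$\mathit{in}[B] = \bigcup_{P \in \mathit{pred}(B)} \mathit{out}[P]$
How do we initialize? Start small: $\mathit{out}[B] = \emptyset$ for every block B, then iterate until no set changes.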
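These equations can be solved with a round-robin iterative solver like the one sketched below. It assumes a flow graph given as a dict of successor lists, in which every successor is also a key, and gen/kill sets of definition ids. Following the slide, it starts small, with every out set empty, and repeats until nothing changes. The usage graph is a small hypothetical example.

```python
def reaching_definitions(succ, gen, kill):
    """Forward, union-meet solver: returns (IN, OUT) per block."""
    preds = {b: [] for b in succ}
    for b, ss in succ.items():
        for s in ss:
            preds[s].append(b)
    IN = {b: set() for b in succ}
    OUT = {b: set() for b in succ}           # start small
    changed = True
    while changed:
        changed = False
        for b in succ:
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
    return IN, OUT

# Hypothetical graph: d1..d3 in B1, d4, d5 in B2, d6 in B3, d7 in B4
succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B4"], "B4": ["B2", "EXIT"], "EXIT": []}
gen = {"B1": {1, 2, 3}, "B2": {4, 5}, "B3": {6}, "B4": {7}, "EXIT": set()}
kill = {"B1": {4, 5, 6, 7}, "B2": {1, 2, 7}, "B3": {3}, "B4": {1, 4}, "EXIT": set()}
IN, OUT = reaching_definitions(succ, gen, kill)
print(IN["B2"])
```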
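Live variables
A backward problem:
$\mathit{in}[B] = \mathit{use}[B] \cup (\mathit{out}[B] - \mathit{def}[B])$
$\mathit{out}[B] = \bigcup_{S \in \mathit{succ}(B)} \mathit{in}[S]$
where use[B] is the set of variables used in B before any definition in B, and def[B] is the set of variables defined in B before any use in B.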
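Example: Liveness
Flow graph: BB1 -> BB2, BB1 -> BB3, BB2 -> BB3.
BB1: r1 = r2 + r3
     r6 = r4 - r5
BB2: r4 = 4
     r6 = 8
BB3: r6 = r2 + r3
     r7 = r4 - r5
At the exit of BB1, r2, r3, r4, r5 are all live because they are used later; r6 is dead because it is redefined before any use.
At the entry of BB2, r4 is dead because it is redefined, and so is r6; r2, r3, r5 are live.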
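A minimal backward liveness solver, applied to the example above, reproduces the live sets stated there. It assumes a hypothetical encoding: each block is a list of `(dest, [sources])` statements, the edges are BB1 -> BB2, BB1 -> BB3 and BB2 -> BB3, and nothing is live after BB3.

```python
def liveness(succ, blocks):
    """Backward, union-meet solver: returns (IN, OUT) live sets per block."""
    use, defs = {}, {}
    for b, stmts in blocks.items():
        u, d = set(), set()
        for dest, srcs in stmts:
            u |= {s for s in srcs if s not in d}   # read before any write in b
            d.add(dest)
        use[b], defs[b] = u, d
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            OUT[b] = set().union(*(IN[s] for s in succ[b]))
            new_in = use[b] | (OUT[b] - defs[b])
            if new_in != IN[b]:
                IN[b], changed = new_in, True
    return IN, OUT

blocks = {"BB1": [("r1", ["r2", "r3"]), ("r6", ["r4", "r5"])],
          "BB2": [("r4", []), ("r6", [])],
          "BB3": [("r6", ["r2", "r3"]), ("r7", ["r4", "r5"])]}
succ = {"BB1": ["BB2", "BB3"], "BB2": ["BB3"], "BB3": []}
IN, OUT = liveness(succ, blocks)
print(sorted(OUT["BB1"]), sorted(IN["BB2"]))
```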
DU/UD Chains
A convenient way to access and use reaching definition information.
Def-Use chains (DU chains): for each definition, the uses it reaches.
Use-Def chains (UD chains): for each use, the definitions that reach it.
Example:
4: r1 = r1 + 5
5: r3 = r5 - r1
6: r7 = r3 * 2
7: r7 = r6
8: r2 = 0
9: r7 = r7 + 1
10: r8 = r7 + 5
11: r1 = r3 - r8
12: r3 = r1 * 2
DU chain of r1:
(1) -> 3, 4
(4) -> 5
DU chain of r3:
(3) -> 11
(5) -> 11
UD chain of r1:
(12) -> 11
UD chain of r7:
(10) -> 6, 9
Transfer function: how information is changed by a basic block.
Meet/confluence function: how information from multiple paths is combined.
Liveness: $\mathit{out}[B] = \bigcup_{S \in \mathit{succ}(B)} \mathit{in}[S]$; compute in[B] by walking the operations of B in reverse order.
Up to this point: any-path problems (reaching definitions, liveness).
Next: all-path problems
  Available definitions
  Available expressions (one could also define reaching expressions, but they are not that useful)
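Example (flow graph: BB1 -> BB2, BB1 -> BB3, BB2 -> BB3):
BB1: 1: r1 = r2 + r3
     2: r6 = r4 - r5
     exit: 1,2 reach; 1,2 available
BB2: entry: 1,2 reach; 1,2 available
     3: r4 = 4
     4: r6 = 8
     exit: 1,3,4 reach; 1,3,4 available
BB3: entry: 1,2,3,4 reach; only 1 available
     5: r6 = r2 + r3
     6: r7 = r4 - r5
At the entry of BB3, every definition reaches along some path, but only definition 1 reaches along all paths. Definition 2 is killed by 4 on the path through BB2, and 3 and 4 do not occur on the direct path from BB1.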
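Reaching and available definitions differ only in the meet operator (union vs. intersection) and in the initial values. The following generic forward solver, a sketch, makes that concrete and reproduces the sets in the example above. It assumes a hypothetical encoding: definitions are numbered 1 to 6 as on the slide, each definition kills the other definitions of the same register, and in[entry] is empty.

```python
def forward_solve(succ, entry, gen, kill, meet, universe):
    """meet="union": reaching (any path), out sets start empty.
    meet="intersection": available (all paths), out sets start at universe."""
    preds = {b: [p for p in succ if b in succ[p]] for b in succ}
    init = set() if meet == "union" else set(universe)
    OUT = {b: set(init) for b in succ}
    IN = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            sets = [OUT[p] for p in preds[b]]
            if b == entry or not sets:
                IN[b] = set()
            elif meet == "union":
                IN[b] = set().union(*sets)
            else:
                IN[b] = set.intersection(*sets)
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
    return IN, OUT

succ = {"BB1": ["BB2", "BB3"], "BB2": ["BB3"], "BB3": []}
gen = {"BB1": {1, 2}, "BB2": {3, 4}, "BB3": {5, 6}}
kill = {"BB1": {4, 5}, "BB2": {2, 5}, "BB3": {2, 4}}   # other defs of r6 (and r4 has none)
IN_r, _ = forward_solve(succ, "BB1", gen, kill, "union", range(1, 7))
IN_a, _ = forward_solve(succ, "BB1", gen, kill, "intersection", range(1, 7))
print(IN_r["BB3"], IN_a["BB3"])
```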
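Available Expression
An expression is the computation on the right-hand side of an operation. Ex: in r2 = r3 + r4, r3 + r4 is an expression.
Requirements: efficiency! The analyses run on large programs, so sets of definitions or expressions are represented as bit vectors.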
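The following sketch shows why bit vectors help. With each definition id i mapped to bit i of an integer (Python's unbounded `int` stands in for a fixed-width bit vector here), the transfer function out = gen ∪ (in − kill) becomes two or three word-wide bitwise operations. Union becomes `|` and intersection becomes `&`.

```python
def to_bits(ids):
    """Encode a set of small integer ids as a bit vector (bit i = id i)."""
    v = 0
    for i in ids:
        v |= 1 << i
    return v

def transfer(in_bits, gen_bits, kill_bits):
    """out = gen | (in & ~kill), computed with whole-word bit operations."""
    return gen_bits | (in_bits & ~kill_bits)

# BB2 of the example: in = {1, 2}, gen = {3, 4}, kill = {2, 5}
print(bin(transfer(to_bits({1, 2}), to_bits({3, 4}), to_bits({2, 5}))))
```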
Classes of optimization
1. Machine independent
2. Machine specific
3. Increasing parallelism, possibly at the cost of more instructions
Constant Folding
r1 = 3 * 4 -> r1 = 12
r1 = 3 / 0 -> ??? Don't evaluate excepting ops! What about FP?
if (1 < 2) goto BB2 -> goto BB2
if (1 > 2) goto BB2 -> convert to a noop
Both branch rewrites can expose dead code.
Algebraic identities
r1 = r2 + 0, r2 - 0, r2 | 0, r2 ^ 0, r2 << 0, r2 >> 0 -> r1 = r2
r1 = 0 * r2, 0 / r2, 0 & r2 -> r1 = 0
r1 = r2 * 1, r2 / 1 -> r1 = r2
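The following sketch applies the folding rules and algebraic identities above to one three-address statement. It assumes a hypothetical encoding `(dest, op, a, b)`, with register names as strings and constants as ints, and it represents a simplified result as a move `(dest, "mov", src, None)`. Only +, - and * are folded, so an excepting operation such as 3 / 0 is left alone. As on the slide, the rule 0 / r2 -> 0 implicitly assumes r2 is nonzero.

```python
FOLD = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}

def simplify(dest, op, a, b):
    """Constant-fold or apply an algebraic identity to dest = a op b."""
    if isinstance(a, int) and isinstance(b, int) and op in FOLD:
        return (dest, "mov", FOLD[op](a, b), None)     # r1 = 3 * 4 -> r1 = 12
    if b == 0 and op in ("+", "-", "|", "^", "<<", ">>"):
        return (dest, "mov", a, None)                  # r2 + 0, ... -> r2
    if a == 0 and op in ("*", "/", "&"):
        return (dest, "mov", 0, None)                  # 0 * r2, ... -> 0
    if b == 1 and op in ("*", "/"):
        return (dest, "mov", a, None)                  # r2 * 1, r2 / 1 -> r2
    return (dest, op, a, b)                            # no rule applies

print(simplify("r1", "*", 3, 4), simplify("r1", "/", 3, 0))
```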
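Strength Reduction
Power of 2 constants (the / and % rewrites are valid only for unsigned or non-negative r2):
r1 = r2 * 8 -> r1 = r2 << 3
r1 = r2 / 4 -> r1 = r2 >> 2
r1 = r2 % 16 -> r1 = r2 & 15
More exotic: replace a multiplication by a constant with shifts and adds/subtracts
r1 = r2 * 6 -> r1 = (r2 << 2) + (r2 << 1)
r1 = r2 * 7 -> r1 = (r2 << 3) - r2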
Constant Propagation
Forward propagation of moves of the form d: rx := L, where L is a literal. Replace uses of rx with L wherever possible.
d must be available at the point of replacement.
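Copy Propagation
Forward propagation of the right-hand side of moves:
r1 := r2
...
r4 := r2 + 1   (the use of r1 has been replaced by r2)
Benefits:
  Reduces the chain of dependency
  Possibly creates dead code (r1 := r2 may no longer be needed)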
Rules:
Statement dS is the source of copy propagation.
Statement dT is the target of copy propagation.
  dS is a mov statement
  src(dS) is a register
  dT uses dest(dS)
  dS is an available definition at dT
  src(dS) is an available expression at dT
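The following is a minimal sketch of copy propagation within a single basic block, following the rules above. It uses a hypothetical encoding in which each statement is `(dest, op, srcs)` and `op == "mov"` means dest := srcs[0]. Inside one block, "dS is available at dT" and "src(dS) is available at dT" both reduce to "neither dest(dS) nor src(dS) is redefined in between". The sketch therefore drops a copy as soon as either register is reassigned.

```python
def copy_propagate(block):
    """Copy propagation within one basic block."""
    copies = {}                                  # dest(dS) -> src(dS) of a valid mov
    out = []
    for dest, op, srcs in block:
        srcs = [copies.get(s, s) for s in srcs]  # dT: rewrite uses of copy targets
        out.append((dest, op, srcs))
        # dest is redefined: invalidate copies into or out of it
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if op == "mov" and isinstance(srcs[0], str) and srcs[0] != dest:
            copies[dest] = srcs[0]               # new source statement dS
    return out

# r1 := r2 ; r4 := r1 + 1   ->   r4 := r2 + 1
print(copy_propagate([("r1", "mov", ["r2"]), ("r4", "+", ["r1", 1])]))
```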
r4 := r2 + r3
r5 := r4 + r6
Dead Code
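Common Subexpression Elimination
Benefits:
  Reduced computation
  Generates mov statements, which can get copy propagated
Example:
dS: r1 := r2 + r3
dT: r4 := r2 + r3
becomes
dS: r1 := r2 + r3
    r100 := r1
dT: r4 := r100
The fresh register r100 keeps the value available at dT even if r1 is redefined between dS and dT.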
[Figure: control flow graph with blocks entry, bb1, bb2, bb3, bb4, bb5]