Code Optimization


Organization

Introduction
Classifications of optimization techniques
Factors influencing optimization
Themes behind optimization techniques
Optimizing transformations
Example
Details of optimization techniques

Introduction

Concerns machine-independent code optimization.

90-10 rule: execution spends 90% of its time in 10% of the code.
It is moderately easy to achieve 90% of the possible optimization; the remaining 10% is very difficult.
Identifying that critical 10% of the code is not possible for a compiler; it is the job of a profiler.
In general, loops are the hot spots.

Introduction

Criteria for code optimization:

Must preserve the semantic equivalence of the program.
The algorithm should not be modified.
A transformation should, on average, speed up the execution of the program.
Worth the effort: intellectual and compilation effort should not be spent on insignificant improvements.
Transformations should be simple enough to have a good effect.

Introduction

Optimization can be done in almost all phases of compilation:

Source code -> [profile and optimize (user)] -> Front end -> Intermediate code -> [loop, procedure-call, and address-calculation improvement (compiler)] -> Code generator -> Target code -> [register usage, instruction choice, peephole optimization (compiler)]

Introduction

Organization of an optimizing compiler; the code optimizer consists of:

Control flow analysis -> Data flow analysis -> Transformation

Classifications of Optimization Techniques

Peephole optimization
Local optimizations
Global optimizations
  Inter-procedural
  Intra-procedural
Loop optimization

Factors Influencing Optimization

The target machine: machine-dependent factors can be parameterized to the compiler for fine tuning.

Architecture of the target CPU:
  Number of CPU registers
  RISC vs. CISC
  Pipeline architecture
  Number of functional units

Machine architecture:
  Cache size and type
  Cache/memory transfer rate

Themes behind Optimization Techniques

Avoid redundancy: something already computed need not be computed again.
Smaller code: less work for the CPU, cache, and memory.
Fewer jumps: jumps interfere with code pre-fetch.
Code locality: code executed close together in time should be generated close together in memory, increasing locality of reference.
Extract more information about the code: more information enables better code generation.

Redundancy Elimination

Redundancy elimination means determining that two computations are equivalent and eliminating one of them. There are several types of redundancy elimination:

Value numbering: associates symbolic values with computations and identifies expressions that have the same value.
Common subexpression elimination: identifies expressions that have operands with the same names.
Constant/copy propagation: identifies variables that have constant/copy values and uses the constants/copies in place of the variables.
Partial redundancy elimination: inserts computations on some paths to convert partial redundancy into full redundancy.

Optimizing Transformations

Compile-time evaluation
Common subexpression elimination
Code motion
Strength reduction
Dead code elimination
Copy propagation
Loop optimization
  Induction variables and strength reduction

Compile-Time Evaluation

Expressions whose values can be precomputed at compilation time. Two forms:
Constant folding
Constant propagation

Compile-Time Evaluation

Constant folding: evaluating an expression with constant operands and replacing the expression with the single resulting value.

Example:
area := (22.0/7.0) * r ** 2
becomes
area := 3.14286 * r ** 2
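The folding step above can be sketched in a few lines. The tuple-based expression tree and the operator table below are illustrative assumptions, not part of the slides:

```python
# A minimal constant-folding sketch over a tiny tuple AST:
# a node is either a constant, a variable name, or (op, left, right).
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def fold(node):
    """Recursively replace operator nodes whose operands are both constants."""
    if not isinstance(node, tuple):          # leaf: constant or variable name
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)          # both operands constant: evaluate now
    return (op, left, right)

# area := (22.0 / 7.0) * r ** 2, with r ** 2 written as r * r
expr = ('*', ('/', 22.0, 7.0), ('*', 'r', 'r'))
print(fold(expr))                            # the division is folded to a constant
```

The subtree involving the variable r is left untouched; only the constant division is evaluated at "compile time".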

Compile-Time Evaluation

Constant propagation: replacing a variable with a constant that has been assigned to it earlier.

Example:
pi := 3.14286
area := pi * r ** 2
becomes
area := 3.14286 * r ** 2

Constant Propagation

What does it mean?
Given an assignment x = c, where c is a constant, replace later uses of x with uses of c, provided there are no intervening assignments to x.
Similar to copy propagation, with an extra feature: it can analyze constant-valued conditionals to determine whether a branch will be executed or not.

When is it performed?
Early in the optimization process.

What is the result?
Smaller code and fewer registers used.

Common Subexpression Elimination

Identify a common subexpression present in different expressions, compute it once, and use the result in all the places. The definitions of the variables involved must not change in between.

Example:
a := b * c                 temp := b * c
...                        a := temp
x := b * c + 5             x := temp + 5

Common Subexpression Elimination

Local common subexpression elimination is performed within basic blocks.

Algorithm sketch:
Traverse the basic block from top to bottom.
Maintain a table of expressions evaluated so far; if any operand of an expression is redefined, remove that expression from the table.
Modify applicable instructions as you go: generate a temporary variable, store the expression in it, and use the variable the next time the expression is encountered.

x = a + b        t = a + b
...              x = t
y = a + b        ...
                 y = t
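The table-driven sketch above can be written out directly. The three-address tuple format (dst, op, src1, src2) and the temporary-naming scheme are assumptions made for illustration:

```python
# Local CSE over three-address tuples (dst, op, src1, src2), using a table
# mapping (op, src1, src2) -> temporary currently holding that value.
def local_cse(block):
    table = {}
    out, n = [], 0
    for dst, op, s1, s2 in block:
        key = (op, s1, s2)
        if key in table:
            out.append((dst, 'copy', table[key], None))   # reuse earlier value
        else:
            n += 1
            t = f't{n}'
            out.append((t, op, s1, s2))
            out.append((dst, 'copy', t, None))
            table[key] = t
        # dst is redefined here: drop table entries that mention it
        table = {k: v for k, v in table.items()
                 if dst not in (k[1], k[2]) and v != dst}
    return out

block = [('x', '+', 'a', 'b'), ('y', '+', 'a', 'b')]
for ins in local_cse(block):
    print(ins)
```

The second evaluation of a + b is replaced by a copy from t1, exactly as in the x/y example above.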

Common Subexpression Elimination

Before:                     After:
c = a + b                   t1 = a + b
d = m * n                   c = t1
e = b + d                   t2 = m * n
f = a + b                   d = t2
g = -b                      t3 = b + d
h = b + a                   e = t3
a = j + a                   f = t1
k = m * n                   g = -b
j = b + d                   h = t1   /* commutative */
a = -b                      a = j + a
if m * n goto L             k = t2
                            j = t3
                            a = -b
                            if t2 goto L

The table contains quintuples: (pos, opd1, opr, opd2, tmp).

Common Subexpression Elimination

Global common subexpression elimination is performed on the flow graph and requires available-expression information.

In addition to finding which expressions are available at the endpoints of basic blocks, we need to know where each of those expressions was most recently evaluated (which block, and which position within that block).

Common Subexpression Elimination

Example (flow graph):
Block 1: x := a + b
(an intervening block on some path): a := b
Block 4: z := a + b + 10

a + b is not a common subexpression in blocks 1 and 4, because a may be modified between them. None of the variables involved may be modified along any path between the two evaluations.

Code Motion

Moving code from one part of the program to another without modifying the algorithm, in order to:
Reduce the size of the program
Reduce the execution frequency of the code subjected to movement

Code Motion

1. Code space reduction: similar to common subexpression elimination, but with the objective of reducing code size.

Example (code hoisting):
if (a < b) then              temp := x ** 2
  z := x ** 2                if (a < b) then
else                           z := temp
  y := x ** 2 + 10           else
                               y := temp + 10

x ** 2 is computed once in both cases, but the code size in the second case is smaller.

Code Motion

2. Execution frequency reduction: reduce the execution frequency of partially available expressions (expressions available along at least one path).

Example:
if (a < b) then              if (a < b) then
  z = x * 2                    temp = x * 2
else                           z = temp
  y = 10                     else
g = x * 2                      y = 10
                               temp = x * 2
                             g = temp

Code Motion

Move an expression out of a loop if its evaluation does not change inside the loop.

Example:
while ( i < (max - 2) ) ...

is equivalent to:
t := max - 2
while ( i < t ) ...

Code Motion

Safety of code movement: moving an expression e from a basic block bi to another block bj is safe only if it does not introduce a new occurrence of e along any path.

Example (unsafe code movement):
if (a < b) then              temp = x * 2
  z = x * 2                  if (a < b) then
else                           z = temp
  y = 10                     else
                               y = 10

The moved version evaluates x * 2 even on the path where a >= b, where it was never evaluated before.

Strength Reduction

Replacement of an operator with a less costly one.

Example:
for i = 1 to 10 do           temp = 5
  x = i * 5                  for i = 1 to 10 do
end                            x = temp
                               temp = temp + 5
                             end

Typical cases of strength reduction occur in the address calculations for array references. It applies to integer expressions involving induction variables (a loop optimization).

Dead Code Elimination

Dead code is a portion of the program that will not be executed on any path of the program, or whose result is never used. It can be removed.

Examples:
No control flow reaches a basic block.
A variable is dead at a point: its value is not used anywhere afterwards in the program.
An assignment is dead: it assigns a value to a dead variable.

Dead Code Elimination

Example:
DEBUG := 0
if (DEBUG) print ...

The test and the print can both be eliminated.

Copy Propagation

What does it mean?
Given an assignment x = y, replace later uses of x with uses of y, provided there are no intervening assignments to x or y.

When is it performed?
At any level, but usually early in the optimization process.

What is the result?
Smaller code.

Copy Propagation

Statements of the form f := g are called copy statements, or copies. Use g for f wherever possible after the copy statement.

Example:
x[i] = a;                 x[i] = a;
sum = x[i] + a;           sum = a + a;

This may not appear to be an improvement by itself, but it opens up scope for other optimizations.

Local Copy Propagation

Performed within basic blocks.

Algorithm sketch:
Traverse the basic block from top to bottom.
Maintain a table of copies encountered so far.
Modify applicable instructions as you go.
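The sketch above can be written out concretely. As before, the tuple instruction format and the 'copy' opcode are illustrative assumptions:

```python
# Local copy propagation over tuples (dst, op, src1, src2),
# where op == 'copy' marks a copy statement dst = src1.
def local_copy_prop(block):
    copies = {}                       # x -> y for each active copy x = y
    out = []
    for dst, op, s1, s2 in block:
        s1 = copies.get(s1, s1)       # substitute known copies into operands
        s2 = copies.get(s2, s2)
        out.append((dst, op, s1, s2))
        # dst is redefined: invalidate any copy involving dst
        copies = {x: y for x, y in copies.items() if dst not in (x, y)}
        if op == 'copy':
            copies[dst] = s1
    return out

block = [('f', 'copy', 'g', None), ('h', '+', 'f', 'f')]
print(local_copy_prop(block))         # h = g + g after propagation
```

Invalidating table entries on redefinition is what keeps the transformation safe when x or y is assigned again later in the block.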

Loop Optimization

Decrease the number of instructions in the inner loop, even if this increases the number of instructions in the outer loop.

Techniques:
Code motion
Induction variable elimination
Strength reduction

Peephole Optimization

Pass over the generated code to examine a few instructions at a time, typically 2 to 4:
Redundant instruction elimination, using algebraic identities
Flow-of-control optimization: removal of redundant jumps
Use of machine idioms

Redundant Instruction Elimination

Redundant load/store: see if an obvious replacement is possible.
MOV R0, a
MOV a, R0
The second instruction can be eliminated without needing any global knowledge of a.

Unreachable code: identify code that will never be executed.
#define DEBUG 0
if (DEBUG) {
    print debugging info
}
compiles to:
if (0 != 1) goto L2
print debugging info
L2:
Since the jump is always taken, the print is unreachable and can be eliminated.

Algebraic Identities

It is worth recognizing single instructions with a constant operand:
A * 1 = A
A * 0 = 0
A / 1 = A
A * 2 = A + A
These are more delicate with floating point.

Strength reduction:
A ^ 2 = A * A
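A peephole pass applying these identities to single instructions might look like the following sketch; the (dst, op, src1, src2) instruction format is an assumption, and the rules are deliberately restricted to the integer identities listed above:

```python
# Peephole rewriting of single three-address instructions using
# the algebraic identities A*1=A, A*0=0, A/1=A, A*2=A+A, A^2=A*A.
def peephole(ins):
    dst, op, a, b = ins
    if op == '*' and b == 1:
        return (dst, 'copy', a, None)     # A * 1 = A
    if op == '*' and b == 0:
        return (dst, 'copy', 0, None)     # A * 0 = 0
    if op == '/' and b == 1:
        return (dst, 'copy', a, None)     # A / 1 = A
    if op == '*' and b == 2:
        return (dst, '+', a, a)           # A * 2 = A + A
    if op == '**' and b == 2:
        return (dst, '*', a, a)           # A ^ 2 = A * A (strength reduction)
    return ins

print(peephole(('x', '*', 'a', 1)))       # ('x', 'copy', 'a', None)
print(peephole(('x', '**', 'a', 2)))      # ('x', '*', 'a', 'a')
```

A real pass would also check that the operands are known to be integers, since these identities do not all hold for floating point.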

Objective

Why would anyone write X * 1? Why bother to correct such obvious junk code?

In fact, one might write:
#define MAX_TASKS 1
...
a = b * MAX_TASKS;

Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

The Right Shift Problem

Arithmetic right shift: shift right and use the sign bit to fill the most significant bits.

-5 = 111111...1111111011
after SAR: 111111...1111111101
which is -3, not -2; yet in most languages -5/2 = -2.

So replacing a division by 2 with an arithmetic right shift is incorrect for negative operands.

Addition Chains for Multiplication

If multiply is very slow (or the machine has no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:

x * 125 = x * 128 - x * 4 + x

This costs two shifts, one subtract, and one add, which may be faster than one multiply. Note the similarity with the efficient exponentiation method.
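The decomposition above can be checked directly; this is just the identity from the slide written as shifts and adds:

```python
# x * 125 rewritten as x*128 - x*4 + x: two shifts, one subtract, one add.
def times_125(x):
    return (x << 7) - (x << 2) + x

# The rewrite agrees with plain multiplication, including for negatives
# (Python's << is an arithmetic shift on signed integers).
assert all(times_125(x) == x * 125 for x in range(-100, 100))
print(times_125(3))   # 375
```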

Folding Jumps to Jumps

A jump to an unconditional jump can copy the target address:
JNE lab1
...
lab1: JMP lab2

can be replaced by:
JNE lab2

As a result, lab1 may become dead (unreferenced).

Jump to Return

A jump to a return can be replaced by a return:
JMP lab1
...
lab1: RET

can be replaced by:
RET

lab1 may become dead code.

Use of Machine Idioms

Use machine-specific hardware instructions, which may be less costly:
i := i + 1
ADD i, #1    can become    INC i

Local Optimization

Optimization of Basic Blocks

Many structure-preserving transformations can be implemented by constructing DAGs of basic blocks.

DAG Representation of a Basic Block (BB)

Leaves are labeled with unique identifiers (variable names or constants).
Interior nodes are labeled by operator symbols.
Nodes optionally carry a list of labels (identifiers).
Edges relate operands to the operator (interior nodes are operators).
An interior node represents a computed value; the identifiers in its label list are deemed to hold that value.

Example: DAG for a BB

t1 := 4 * i
t3 := 4 * i
t2 := t1 + t3
if (i <= 20) goto L1

(Figure: the DAG has a node "* (t1, t3)" with children 4 and i; a node "+ (t2)" whose two children are both the "*" node; and a node "<= (L1)" with children i and 20.)

Construction of DAGs for a BB

Input: a basic block B.
Output: a DAG for B containing the following information:
1) A label for each node.
2) For leaves, the labels are identifiers or constants.
3) For interior nodes, the labels are operators.
4) For each node, a list of attached identifiers (possibly empty; no constants).

Construction of DAGs for a BB

Data structures and functions:

Node:
1) Label: label of the node
2) Left: pointer to the left child node
3) Right: pointer to the right child node
4) List: list of additional labels (empty for leaves)

Node(id): returns the most recent node created for id, else undef.
Create(label, l, r): creates a node with the given label, with l as left child and r as right child; l and r are optional parameters.

Construction of DAGs for a BB

Method: for each three-address statement A in B, where A has one of the forms:
1. x := y op z
2. x := op y
3. x := y

Step 1:
if ((ny = Node(y)) == undef)
    ny = Create(y);
if (A is of type 1) and ((nz = Node(z)) == undef)
    nz = Create(z);

Construction of DAGs for a BB

Step 2:
If (A is of type 1):
    Find a node labelled op with left and right children ny and nz respectively
    [this is the detection of a common subexpression]
    If not found, n = Create(op, ny, nz);
If (A is of type 2):
    Find a node labelled op with the single child ny
    If not found, n = Create(op, ny);
If (A is of type 3): n = Node(y);

Step 3:
Remove x from Node(x).list
Add x to n.list
Node(x) = n;
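The method above can be sketched compactly for statements of the form x := y op z (type 1 only). The class layout and naming are illustrative; the node search in the middle is the common-subexpression detection of step 2:

```python
# DAG construction for a basic block of statements x := y op z.
class DagNode:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right
        self.ids = []                    # identifiers currently holding this value

def build_dag(block):
    current = {}                         # Node(id): most recent node for each id
    nodes = []
    def node_for(y):                     # step 1: find or create operand node
        if y not in current:
            n = DagNode(y); nodes.append(n); current[y] = n
        return current[y]
    for x, op, y, z in block:
        ny, nz = node_for(y), node_for(z)
        # step 2: look for an existing node labelled op over ny, nz
        n = next((m for m in nodes
                  if m.label == op and m.left is ny and m.right is nz), None)
        if n is None:
            n = DagNode(op, ny, nz); nodes.append(n)
        # step 3: move x onto node n
        if x in current and x in current[x].ids:
            current[x].ids.remove(x)
        n.ids.append(x); current[x] = n
    return nodes

block = [('t1', '*', 4, 'i'), ('t3', '*', 4, 'i'), ('t2', '+', 't1', 't3')]
dag = build_dag(block)
print([(n.label, n.ids) for n in dag])
```

On this block the second 4 * i attaches t3 to the existing "*" node rather than creating a new one, reproducing the "* (t1, t3)" sharing of the earlier example.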

Example: DAG Construction from a BB

t1 := 4 * i

DAG: node "* (t1)" with children 4 and i.

Example: DAG Construction from a BB

t1 := 4 * i
t2 := a [ t1 ]

DAG: node "[] (t2)" with children a and the "* (t1)" node.

Example: DAG Construction from a BB

t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i

DAG: t3 attaches to the existing "*" node, which is now labelled "* (t1, t3)".

Example: DAG Construction from a BB

t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]

DAG: a second node "[] (t4)" with children b and the shared "* (t1, t3)" node.

Example: DAG Construction from a BB

t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
t5 := t2 + t4

DAG: node "+ (t5)" with children "[] (t2)" and "[] (t4)".

Example: DAG Construction from a BB

t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
t5 := t2 + t4
i := t5

DAG: i attaches to the "+" node, which is now labelled "+ (t5, i)".

DAG of a Basic Block

Observations:
There is a leaf node for the initial value of each identifier.
There is a node n for each statement s.
The children of node n are the nodes holding the last definitions (prior to s) of the operands of n.

Optimization of Basic Blocks

Common subexpression elimination: by construction of the DAG.

Note: for common subexpression elimination, we are actually targeting expressions that compute the same value.

a := b + c
b := b - d
c := c + d
e := b + c

b + c appears twice, but the two occurrences are only common expressions in form; they do not generate the same result, since b and c are redefined in between.

Optimization of Basic Blocks

The DAG representation identifies expressions that yield the same result:

a := b + c
b := b - d
c := c + d
e := b + c

(Figure: leaves b0, c0, d0; node "+ (a)" over b0 and c0; node "- (b)" over b0 and d0; node "+ (c)" over c0 and d0; node "+ (e)" over the "- (b)" and "+ (c)" nodes. The two b + c computations end up as distinct nodes.)

Optimization of Basic Blocks

Dead code elimination: code generation from the DAG automatically eliminates dead code.

a := b + c
b := a - d
d := a - d
c := d + c

(Figure: the DAG has a single "- (b, d)" node for a - d, a "+ (a)" node over b0 and c0, and a "+ (c)" node; leaves b0, c0, d0.)

If b is not live on exit, code generation yields:
a := b + c
d := a - d
c := d + c

Loop Optimization

Loop Optimizations

The most important set of optimizations: programs are likely to spend most of their time in loops.

Presumption: the loop has already been identified.

Optimizations:
Loop-invariant code removal
Induction variable strength reduction
Induction variable elimination

Loops in Flow Graphs

Dominators: a node d of a flow graph G dominates a node n if every path in G from the initial node to n goes through d. Written: d dom n.

Corollaries:
Every node dominates itself.
The initial node dominates all nodes in G.
The entry node of a loop dominates all nodes in the loop.
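The dominator sets from this definition can be computed with a simple fixed-point iteration: dom(n) is n plus the intersection of dom(p) over all predecessors p. The diamond-shaped example graph below is made up for illustration:

```python
# Iterative dominator computation over a successor-map CFG.
def dominators(succ, entry):
    nodes = set(succ)
    preds = {n: {p for p in nodes if n in succ[p]} for n in nodes}
    dom = {n: set(nodes) for n in nodes}     # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]                # intersect over predecessors
            new |= {n}                       # every node dominates itself
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

succ = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}
dom = dominators(succ, 1)
print(dom[4])   # {1, 4}: node 1 dominates 4, but neither 2 nor 3 does
```

Node 4 is reachable through either 2 or 3, so only the entry node (and 4 itself) appears in its dominator set, matching the corollaries above.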

Loops in Flow Graphs

Each node n has a unique immediate dominator m, which is the last dominator of n on any path in G from the initial node to n:
(d != n) && (d dom n) implies d dom m.

Dominator tree T: a representation of the dominator information of flow graph G.
The root node of T is the initial node of G.
A node d in T dominates all nodes in its subtree.

Example: Loops in a Flow Graph

(Figure: a flow graph on nodes 1 through 8, shown beside its dominator tree.)

Loops in Flow Graphs

Natural loops:
1. A loop has a single entry point, called the header; the header dominates all nodes in the loop.
2. There is at least one path back to the header from the loop nodes (i.e., there is at least one way to iterate the loop).

Natural loops can be detected via back edges: edges whose sink node (head) dominates their source node (tail) in G.
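Given a back edge t -> h, the natural loop is h plus all nodes that can reach t without passing through h, found by walking predecessors backwards from t. In this sketch the dominator sets are supplied precomputed (they could come from the iterative algorithm) and the small example graph is an assumption:

```python
# Natural-loop body for a back edge tail -> head, given dominator sets.
def natural_loop(preds, dom, tail, head):
    assert head in dom[tail]          # tail -> head must be a back edge
    body = {head, tail}
    stack = [tail]
    while stack:                      # walk backwards from the tail;
        n = stack.pop()               # the head blocks further expansion
        for p in preds[n]:
            if p not in body:
                body.add(p)
                stack.append(p)
    return body

# Graph: 1 -> 2 -> 3 -> 2 and 3 -> 4; dominator sets written by hand.
preds = {1: set(), 2: {1, 3}, 3: {2}, 4: {3}}
dom = {1: {1}, 2: {1, 2}, 3: {1, 2, 3}, 4: {1, 2, 3, 4}}
print(natural_loop(preds, dom, tail=3, head=2))   # {2, 3}
```

Because the header is seeded into the body before the walk, the search never escapes through it, which is exactly why the header is the loop's single entry.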

Loop Optimization

Loop interchange: exchange inner loops with outer loops.

Loop splitting: attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range.
A useful special case is loop peeling: simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.

Loop Optimization

Loop fusion: if two adjacent loops iterate the same number of times, their bodies can be combined as long as they make no reference to each other's data.
Loop fission: break a loop into multiple loops over the same index range, each taking only a part of the original loop's body.
Loop unrolling: duplicate the body of the loop multiple times.

Loop Optimization

Pre-header:
Targeted to hold statements that are moved out of the loop.
A basic block which has only the header as its successor.
Control flow that used to enter the loop from outside, through the header, now enters the loop through the pre-header.

Before:             After:
Header              Pre-header
loop L              Header
                    loop L

Loop-Invariant Code Removal

Move out to the pre-header those statements whose source operands do not change within the loop.
Be careful with memory operations.
Be careful with statements which are executed in only some of the iterations.

Loop-Invariant Code Removal

Rules: a statement S: x := y op z is loop invariant if:
y and z are not modified in the loop body;
S is the only statement that modifies x;
for all uses of x, S is in the available-definitions set;
for all exit edges from the loop, S is in the available-definitions set of the edge;
if S is a load or store (memory operation), there are no writes to address(x) in the loop.

Loop-Invariant Code Removal

Loop-invariant code removal can be done without available-definition information. The rules that need change:

"For all uses of x, the definition d is in the available-definition set" is approximated by: d dominates all uses of x.
"For all exit edges where x is live, d is in the available-definition set on the edge" is approximated by: d dominates all exit basic blocks where x is live.

Loop Induction Variables

Induction variables are variables such that every time they change value, they are incremented or decremented.

Basic induction variable: an induction variable whose only assignments within the loop are of the form i = i +/- C, where C is a constant.
Primary induction variable: a basic induction variable that controls the loop execution; in (for i = 0; i < 100; i++), i (the register holding i) is the primary induction variable.
Derived induction variable: a variable that is a linear function of a basic induction variable.

Loop Induction Variables

Example:
        r1 = 0
        r7 = &A
Loop:   r2 = r1 * 4
        r4 = r7 + 3
        r7 = r7 + 1
        r10 = *r2
        r3 = *r4
        r9 = r1 * r3
        r10 = r9 >> 4
        *r2 = r10
        r1 = r1 + 4
        if (r1 < 100) goto Loop

Basic: r4, r7, r1
Primary: r1
Derived: r2

Induction Variable Strength Reduction

Create basic induction variables from derived induction variables.

Rules for S: x := y op z:
op is *, <<, +, or -
y is an induction variable
z is loop invariant
no other statement modifies x
x is not y or z
x is a register

Induction Variable Strength Reduction

Transformation:
Insert into the bottom of the pre-header:
    new_reg = expression of the target statement S
If opcode(S) is not add/sub, also insert into the bottom of the pre-header:
    new_inc = inc(y) op z
else:
    new_inc = inc(x)
(The function inc() calculates the increment amount of its first parameter.)
Insert at each update of y:
    new_reg = new_reg + new_inc
Change S to: x = new_reg

Example: Induction Variable Strength Reduction

Before:                      After (pre-header):
                             new_reg = r4 * r9
                             new_inc = r9

Loop body:
r5 = r4 - 3                  r5 = r4 - 3
r4 = r4 + 1                  r4 = r4 + 1
r7 = r4 * r9                 new_reg += new_inc
r6 = r4 << 2                 r7 = new_reg
                             r6 = r4 << 2

Induction Variable Elimination

Remove unnecessary basic induction variables from the loop by substituting their uses with another basic induction variable.

Rules: find two basic induction variables x and y such that:
x and y are in the same family, incremented at the same place;
the increments are equal;
the initial values are equal;
x is not live at the exit of the loop;
for each basic block where x is defined, there is no use of x between the first and the last definition of y.

Example: Induction Variable Elimination

r1 and r2 have the same increment and initial value, so uses of r1 are replaced by r2:

Before:                      After:
r1 = 0                       r2 = 0
r2 = 0
r1 = r1 - 1                  r2 = r2 - 1
r2 = r2 - 1
r7 = r1 * r9                 r7 = r2 * r9
r9 = r2 + r4                 r9 = r2 + r4
r4 = *(r1)                   r4 = *(r2)
*r1 = r7                     *r2 = r7

Induction Variable Elimination

Variants, by complexity of elimination:
1. Trivial: induction variables that are never used except to increment themselves, and are not live at the exit of the loop.
2. Same increment, same initial value (discussed above).
3. Same increment, initial values a known constant offset from one another.
4. Same increment, nothing known about the relation of the initial values.
5. Different increments, nothing known about the relation of the initial values.

Cases 1 and 2 are basically free; cases 3-5 require complex pre-header operations.

Example: Induction Variable Elimination

Case 4: same increment, unknown initial values.

For the induction variable being eliminated, look at each non-incremental use and generate the same sequence of values as before. If that can be done without adding any extra statements in the loop body, the transformation can be done.

Before:                      After (rx computed in the pre-header):
                             rx := r2 - r1 + 8
r4 := r2 + 8                 r4 := r1 + rx
r3 := r1 + 4                 r3 := r1 + 4
.                            .
.                            .
r1 := r1 + 4                 r1 := r1 + 4
r2 := r2 + 4

Loop Unrolling

Replicate the body of a loop (N-1) additional times, resulting in N total copies:
Enables overlap of operations from different iterations
Increases the potential for instruction-level parallelism (ILP)

Variants:
Unroll by a multiple of a known trip count
Unroll with a remainder loop
While-loop unrolling
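The unroll-with-remainder variant can be illustrated by hand: the main loop runs four copies of the body per iteration, and a remainder loop handles trip counts not divisible by four. The summing body is an assumption made only to have something concrete to unroll:

```python
# Unrolling with a remainder loop: the body "s += a[i]" is replicated 4 times.
def unrolled_sum(a):
    s, i = 0, 0
    main = len(a) - len(a) % 4
    while i < main:              # main loop: 4 copies of the body per trip
        s += a[i]
        s += a[i + 1]
        s += a[i + 2]
        s += a[i + 3]
        i += 4
    while i < len(a):            # remainder loop: the last len(a) % 4 items
        s += a[i]
        i += 1
    return s

print(unrolled_sum(list(range(10))))   # 45, same as the rolled loop
```

A compiler does the same rewrite on the intermediate representation; the remainder loop is what makes the transformation valid for arbitrary trip counts.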

Global Data Flow Analysis

Global Data Flow Analysis

Collect information about the whole program and distribute it to each block in the flow graph.

Data flow information: the information collected by data flow analysis.
Data flow equations: the set of equations solved by data flow analysis to gather data flow information.

Data Flow Analysis

IMPORTANT: data flow analysis should never tell us that a transformation is safe when in fact it is not.

When doing data flow analysis we must be:
Conservative: do not consider information that may not preserve the behavior of the program.
Aggressive: try to collect information that is as exact as possible, so we can get the greatest benefit from our optimizations.

Global Iterative Data Flow Analysis

Global: performed on the flow graph; the goal is to collect information at the beginning and end of each basic block.
Iterative: construct data flow equations that describe how information flows through each basic block, and solve them by iteratively converging on a solution.

Global Iterative Data Flow Analysis

Components of the data flow equations:

Sets containing the collected information:
in set: information coming into the BB from outside (following the flow of data)
gen set: information generated/collected within the BB
kill set: information that, due to action within the BB, will affect what has been collected outside the BB
out set: information leaving the BB

Functions (operations on these sets):
Transfer functions describe how information changes as it flows through a basic block.
Meet functions describe how information from multiple paths is combined.

Global Iterative Data Flow Analysis

Algorithm sketch:
Typically, a bit vector is used to store the information; for example, in reaching definitions each bit position corresponds to one definition.
We use an iterative fixed-point algorithm.
Depending on the nature of the problem being solved, we may need to traverse each basic block in a forward (top-down) or backward direction.
The order in which we "visit" each BB is not important for the correctness of the algorithm, but it is important for efficiency.
The in and out sets should be initialized in a conservative and aggressive way.

Global Iterative Data Flow Analysis

Initialize the gen and kill sets
Initialize the in or out sets (depending on direction)
while there are changes in the in and out sets {
    for each BB {
        apply meet function
        apply transfer function
    }
}

Typical Problems

Reaching definitions: for each use of a variable, find all definitions that reach it.
Upward exposed uses: for each definition of a variable, find all uses that it reaches.
Live variables: for a point p and a variable v, determine whether v is live at p.
Available expressions: find all expressions whose value is available at some point p.

Global Data Flow Analysis

A typical data flow equation:
out[S] = gen[S] ∪ (in[S] - kill[S])

S: a statement
in[S]: information that flows into S
kill[S]: information killed by S
gen[S]: new information generated by S
out[S]: information that flows out of S

Global Data Flow Analysis

The notion of gen and kill depends on the desired information.
In some cases, in may be defined in terms of out; the equation is then solved as the analysis traverses in the backward direction.
Data flow analysis follows the control flow graph; the equations are set up at the level of basic blocks, or even for individual statements.

Points and Paths

A point within a basic block is:
a location between two consecutive statements, or
a location before the first statement of the basic block, or
a location after the last statement of the basic block.

Path: a path from a point p1 to pn is a sequence of points p1, p2, ..., pn such that for each i, 1 <= i < n, either:
pi is the point immediately preceding a statement and pi+1 is the point immediately following that statement in the same block, or
pi is the last point of some block and pi+1 is the first point of a successor block.

Example: Points and Paths

(Figure: a flow graph with blocks B1 through B6.)
B1: d1: i := m - 1; d2: j := n; d3: a := u1
B2: d4: i := i + 1
B3: d5: j := j - 1
B5: d6: a := u2

Path: p1, p2, p3, p4, p5, p6, ..., pn, where consecutive points either surround a statement in the same block or cross from the end of one block to the start of a successor.

Reaching Definitions

A definition of a variable x is a statement that assigns, or may assign, a value to x.

Unambiguous definitions: statements that certainly assign a value to x:
assignments to x;
reading a value from an I/O device into x.

Ambiguous definitions: statements that may assign a value to x:
a call to a procedure with x as a parameter (call by reference);
a call to a procedure which can access x (x being in the scope of the procedure);
x being an alias of some other variable (aliasing);
an assignment through a pointer that could refer to x.

Reaching Definitions

A definition d reaches a point p if:
there is a path from the point immediately following d to p, and
d is not killed along that path (i.e., there is no redefinition of the same variable on the path).

A definition of a variable is killed between two points when there is another definition of that variable along the path.

Example: Reaching Definitions

(Figure: the flow graph of the previous example, blocks B1 through B6, with points p1 and p2 around d4 in B2.)

The definition d1: i := m - 1 reaches p1, but it is killed by d4: i := i + 1 and so does not reach p2.
The definition d1 does not reach B3, B4, B5, or B6.

Reaching Definitions

Conservative view: a definition is assumed to reach a point even if it might not.
Only an unambiguous definition kills an earlier definition.
All edges of the flow graph are assumed to be traversable:
if (a == b) then a = 2
else if (a == b) then a = 4
The definition a = 4 is not reachable, but whether each path in a flow graph can be taken is an undecidable problem, so the analysis assumes it might be.

Data Flow Analysis of a Structured Program

Structured programs have well-defined loop constructs, so the resulting flow graph is always reducible. Without loss of generality we consider only while-do and if-then-else control constructs:

S -> id := E | S ; S | if E then S else S | do S while E
E -> id + id | id

The non-terminals represent regions.

Data Flow Analysis of a Structured Program

Region: a graph G' = (N', E') which is a portion of the control flow graph G.
The set of nodes N' is in G such that:
N' includes a header h;
h dominates all nodes in N'.
The set of edges E' is in G such that all edges a -> b with both a and b in N' are included.

Data Flow Analysis of a Structured Program

In a region consisting of a statement S, control can flow to only one block outside the region.
A loop is a special case of a region that is strongly connected and includes all its back edges.
Dummy blocks with no statements are used as a technical convenience (indicated as open circles).

Data Flow Analysis of a Structured Program: Composition of Regions

Sequence, S -> S1 ; S2: the region for S is the region for S1 followed by the region for S2.

Data Flow Analysis of a Structured Program: Composition of Regions

Conditional, S -> if E then S1 else S2: a test block branches to the regions for S1 and S2, both of which flow to the exit.

Data Flow Analysis of a Structured Program: Composition of Regions

Loop, S -> do S1 while E: the region for S1, followed by a test "if E goto S1" that loops back to it.

Data Flow Equations

Each region (or non-terminal) has four attributes:

gen[S]: the set of definitions generated by block S.
If a definition d is in gen[S], then d reaches the end of block S.

kill[S]: the set of definitions killed by block S.
If d is in kill[S], d never reaches the end of block S: every path from the beginning of S to the end of S must contain a definition of a (where a is the variable defined by d).

Data Flow Equations

in[S]: the set of definitions that are live at the entry point of block S.
out[S]: the set of definitions that are live at the exit point of block S.

The data flow equations are inductive, or syntax directed:
gen and kill are synthesized attributes;
in is an inherited attribute.

Data Flow Equations

gen[S] concerns a single basic block: it is the set of definitions in S that reach the end of S.
In contrast, out[S] is the set of definitions (possibly defined in some other block) live at the end of S, considering all paths through S.

Data Flow Equations

Single statement S (d: a := b + c):

gen[S] = {d}
kill[S] = Da - {d}
out[S] = gen[S] ∪ (in[S] - kill[S])

Da: the set of all definitions of variable a in the program.

Data Flow Equations

Composition, S -> S1 ; S2:

gen[S] = gen[S2] ∪ (gen[S1] - kill[S2])
kill[S] = kill[S2] ∪ (kill[S1] - gen[S2])

in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]

Data Flow Equations

if-then-else, S -> if E then S1 else S2:

gen[S] = gen[S1] ∪ gen[S2]
kill[S] = kill[S1] ∩ kill[S2]

in[S1] = in[S]
in[S2] = in[S]
out[S] = out[S1] ∪ out[S2]

Data Flow Equations

Loop, S -> do S1 while E:

gen[S] = gen[S1]
kill[S] = kill[S1]

in[S1] = in[S] ∪ gen[S1]
out[S] = out[S1]

Data Flow Analysis

The attributes are computed for each region. The equations can be solved in two phases:
gen and kill can be computed in a single pass over each basic block;
in and out are computed iteratively.
The initial condition for in for the whole program is the empty set.
in can be computed top-down; finally, out is computed.

Dealing with Loops

Due to the back edge, in[S] cannot be used directly as in[S1]: in[S1] and out[S1] are interdependent, so the equation is solved iteratively.

The general equations for in and out:

in[S] = ∪ (out[Y] : Y is a predecessor of S)
out[S] = gen[S] ∪ (in[S] - kill[S])

Reaching Definitions

What is safe?
To assume that a definition reaches a point even if it turns out not to: the computed set of definitions reaching a point p will be a superset of the actual set of definitions reaching p.
Goal: make the set of reaching definitions as small as possible (i.e., as close to the actual set as possible).

Reaching Definitions

How are the gen and kill sets defined?
gen[B] = {definitions that appear in B and reach the end of B}
kill[B] = {all definitions that never reach the end of B}

What is the direction of the analysis?
Forward:
out[B] = gen[B] ∪ (in[B] - kill[B])

Reaching Definitions

What is the confluence (meet) operator?
Union:
in[B] = ∪ out[P], over the predecessors P of B

How do we initialize?
Start small, because we want the resulting set to be as small as possible:
for each block B, initialize out[B] = gen[B]

Computation of gen and kill sets


for each basic block BB do
  gen(BB) = ∅; kill(BB) = ∅;
  for each statement (d: x := y op z) in sequential order in BB, do
    kill(BB) = kill(BB) ∪ G[x];
    G[x] = {d};
  endfor
  gen(BB) = ∪ G[x] : for all ids x
endfor

124

Computation of in and out sets


for all basic blocks BB
  in(BB) = ∅
for all basic blocks BB
  out(BB) = gen(BB)
change = true
while (change) do
  change = false
  for each basic block BB, do
    old_out = out(BB)
    in(BB) = ∪ (out(Y)) for all predecessors Y of BB
    out(BB) = gen(BB) ∪ (in(BB) - kill(BB))
    if (old_out != out(BB)) then change = true
  endfor
endfor

125
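The two slides above can be sketched as runnable code. This is a minimal illustration on a hypothetical three-block CFG, not any real compiler's API: a "definition" is a (def-id, variable) pair, gen/kill are computed in one pass per block, and in/out are iterated to a fixed point exactly as in the pseudocode.

```python
# Hypothetical CFG: B1 -> B2 <-> B3. A definition is (def-id, var).
defs = {                       # block -> ordered list of definitions
    "B1": [("d1", "i"), ("d2", "j")],
    "B2": [("d3", "i")],
    "B3": [("d4", "j")],
}
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
all_defs = {d for ds in defs.values() for d in ds}

# Phase 1: gen/kill in a single pass over each block.
gen, kill = {}, {}
for bb, stmts in defs.items():
    g = {}                     # G[x]: last definition of x seen in bb
    k = set()
    for d in stmts:
        _, x = d
        k |= {e for e in all_defs if e[1] == x}   # all defs of x
        g[x] = d
    gen[bb] = set(g.values())
    kill[bb] = k - gen[bb]

# Phase 2: iterate in/out; start small (out = gen).
inn = {bb: set() for bb in defs}
out = {bb: set(gen[bb]) for bb in defs}
change = True
while change:
    change = False
    for bb in defs:
        inn[bb] = set().union(*(out[p] for p in preds[bb]))
        new_out = gen[bb] | (inn[bb] - kill[bb])
        if new_out != out[bb]:
            out[bb], change = new_out, True
```

At the fixed point, d1 (the first def of i) does not reach the end of B2, because d3 kills it on every path into B2's exit.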

Live Variable (Liveness) Analysis

Liveness: For each point p in a program and each variable y, determine
whether y can be used before being redefined, starting at p.

Attributes:
  use = set of variables used in the BB prior to their definition
  def = set of variables defined in the BB prior to any use of the variable
  in = set of variables that are live at the entry point of a BB
  out = set of variables that are live at the exit point of a BB


126

Live Variable (Liveness) Analysis

Data flow equations:

in[B] = use[B] ∪ (out[B] - def[B])
out[B] = ∪ in[S] : S ∈ succ(B)

1st equation: a variable is live coming into the block if either
  it is used before redefinition in B, or
  it is live coming out of B and is not redefined in B.

2nd equation: a variable is live coming out of B iff it is live
coming into one of its successors.
127

Example: Liveness
r1 = r2 + r3
r6 = r4 - r5

r2, r3, r4, r5 are all live here, as they are consumed later;
r6 is dead, as it is redefined later.

r4 = 4
r6 = 8

r6 = r2 + r3
r7 = r4 - r5

r4 is dead here, as it is redefined. So is r6. r2, r3, r5 are live.

What does this mean?
  r6 = r4 - r5 is useless: it produces a dead value!
  Get rid of it!
128

Computation of use and def sets


for each basic block BB do
def(BB) = ; use(BB) = ;
for each statement (x := y op z) in sequential order, do
for each operand y, do
if (y not in def(BB))
use(BB) = use(BB) U {y};
endfor
def(BB) = def(BB) U {x};
endfor
def is the union of all the LHSs
use is all the ids used before defined
129

Computation of in and out sets


for all basic blocks BB
in(BB) = ;
change = true;
while (change) do
change = false
for each basic block BB do
old_in = in(BB);
out(BB) = U{in(Y): for all successors Y of BB}

in(X) = use(X) U (out(X) def(X))


if (old_in != in(X)) then change = true
endfor
endfor

130
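The liveness computation above can be sketched the same way. The two-block program below is hypothetical (loosely modeled on slide 128's example); use/def come from one forward pass per block, and in/out iterate with the backward equations until nothing changes.

```python
# block -> list of (dest, [sources]); hypothetical two-block CFG
blocks = {
    "B1": [("r1", ["r2", "r3"]), ("r6", ["r4", "r5"])],
    "B2": [("r6", ["r2", "r3"]), ("r7", ["r4", "r5"])],
}
succs = {"B1": ["B2"], "B2": []}

# use/def per block, single forward pass
use, defs = {}, {}
for bb, stmts in blocks.items():
    u, d = set(), set()
    for dest, srcs in stmts:
        u |= {s for s in srcs if s not in d}   # used before defined in bb
        d.add(dest)
    use[bb], defs[bb] = u, d

# backward fixed point: in = use U (out - def); out = U in(succ)
inn = {bb: set() for bb in blocks}
out = {bb: set() for bb in blocks}
change = True
while change:
    change = False
    for bb in blocks:
        out[bb] = set().union(*(inn[s] for s in succs[bb]))
        new_in = use[bb] | (out[bb] - defs[bb])
        if new_in != inn[bb]:
            inn[bb], change = new_in, True
```

As in the slide's example, r6 is not live out of B1 (B2 redefines it before any use), so B1's `r6 = r4 - r5` is dead code.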

DU/UD Chains

Convenient way to access/use reaching definition information.

Def-Use chains (DU chains):
  Given a def, what are all the possible consumers of the
  definition produced?

Use-Def chains (UD chains):
  Given a use, what are all the possible producers of the
  definition consumed?
131

Example: DU/UD Chains


1: r1 = MEM[r2+0]
2: r2 = r2 + 1
3: r3 = r1 * r4

4: r1 = r1 + 5
5: r3 = r5 - r1
6: r7 = r3 * 2

7: r7 = r6
8: r2 = 0
9: r7 = r7 + 1

10: r8 = r7 + 5
11: r1 = r3 - r8
12: r3 = r1 * 2

DU Chain of r1:
  (1) -> 3,4
  (4) -> 5
DU Chain of r3:
  (3) -> 11
  (5) -> 11
UD Chain of r1:
  (12) ->
  (12) -> 11
UD Chain of r7:
  (10) -> 6,9
132

Some Things to Think About

Liveness and Reaching definitions are basically the same thing!
  All dataflow is basically the same, with a few parameters:
    Meaning of gen/kill (use/def)
    Backward / Forward
    All paths / some paths (must/may)
      So far, we have looked at may analysis algorithms.
      How do you adjust to do must algorithms?

Dataflow can be slow
  How to implement it efficiently?
  How to represent the info?

133

Generalizing Dataflow Analysis

Transfer function
  How information is changed by a BB
  out[BB] = gen[BB] ∪ (in[BB] - kill[BB])    forward analysis
  in[BB] = gen[BB] ∪ (out[BB] - kill[BB])    backward analysis

Meet/Confluence function
  How information from multiple paths is combined
  in[BB] = ∪ out[P] : P is a pred of BB      forward analysis
  out[BB] = ∪ in[S] : S is a succ of BB      backward analysis

134

Generalized Dataflow Algorithm


change = true;
while (change)
  change = false;
  for each BB
    apply meet function
    apply transfer function
    if any changes, change = true;

135
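A sketch of this generalized algorithm: one fixed-point driver parameterized by direction and meet, as the earlier slide suggested. The names (`dataflow`, `succ`, `gen`, `kill`) are illustrative, not any real compiler's API; for a must analysis you would pass `meet=set.intersection` and initialize with the universal set.

```python
def dataflow(nodes, succ, gen, kill, forward=True, meet=set.union,
             init=frozenset()):
    """Generic iterative solver: transfer = gen U (x - kill)."""
    pred = {n: [m for m in nodes if n in succ[m]] for n in nodes}
    inn = {n: set(init) for n in nodes}
    out = {n: set(init) for n in nodes}
    change = True
    while change:
        change = False
        for n in nodes:
            if forward:                          # meet over predecessors
                vals = [out[p] for p in pred[n]]
                inn[n] = meet(*vals) if vals else set()
                new = gen[n] | (inn[n] - kill[n])    # transfer function
                if new != out[n]:
                    out[n], change = new, True
            else:                                # meet over successors
                vals = [inn[s] for s in succ[n]]
                out[n] = meet(*vals) if vals else set()
                new = gen[n] | (out[n] - kill[n])
                if new != inn[n]:
                    inn[n], change = new, True
    return inn, out

# Liveness = backward analysis, union meet, gen=use, kill=def:
live_in, live_out = dataflow(
    ["A", "B"], {"A": ["B"], "B": []},
    gen={"A": {"x"}, "B": {"y"}}, kill={"A": {"y"}, "B": set()},
    forward=False)
```

With this tiny CFG, y (used in B) is live out of A, while A's own redefinition of y stops it from being live into A.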

Example: Liveness by Upward Exposed Uses

for each basic block BB, do
  gen[BB] = ∅
  kill[BB] = ∅
  for each operation (x := y op z) in reverse order in BB, do
    gen[BB] = gen[BB] - {x}
    kill[BB] = kill[BB] ∪ {x}
    for each source operand y of op, do
      gen[BB] = gen[BB] ∪ {y}
      kill[BB] = kill[BB] - {y}
    endfor
  endfor
endfor
136

Beyond Upward Exposed Uses

Upward exposed defs
  in = gen ∪ (out - kill)
  out = ∪ (in(succ))
  Walk ops in reverse order
    gen += {dest}; kill += {dest}

Downward exposed defs
  in = ∪ (out(pred))
  out = gen ∪ (in - kill)
  Walk in forward order
    gen += {dest}; kill += {dest}

Downward exposed uses
  in = ∪ (out(pred))
  out = gen ∪ (in - kill)
  Walk in forward order
    gen += {src}; kill -= {src}
    gen -= {dest}; kill += {dest}

137

All Path Problem

Up to this point:
  Any-path problems (may relations)
  Definition reaches along some path
    Some sequence of branches along which the def reaches
    Lots of defs of the same variable may reach a point
  Use of the union operator in the meet function

All-path: definition guaranteed to reach
  Regardless of the sequence of branches taken, the def reaches
    Can always count on this
    Only 1 def of a given variable can be guaranteed to reach
  Availability (as opposed to reaching)
    Available definitions
    Available expressions (could also have reaching expressions,
    but not that useful)
138

Reaching vs Available Definitions


1: r1 = r2 + r3
2: r6 = r4 - r5
      (on both outgoing edges: 1,2 reach; 1,2 available)

3: r4 = 4
4: r6 = 8
      (1,3,4 reach; 1,3,4 available)

5: r6 = r2 + r3
6: r7 = r4 - r5
      (at entry: 1,2,3,4 reach; only 1 available)
139

Available Definition Analysis (Adefs)

A definition d is available at a point p if along all paths from d to p,
d is not killed.
  Remember, a definition of a variable is killed between 2 points when
  there is another definition of that variable along the path.
    r1 = r2 + r3 kills previous definitions of r1

Algorithm:
  Forward dataflow analysis, as propagation occurs from defs downwards
  Use the intersect function as the meet operator to guarantee the
  all-path requirement
  gen/kill/in/out are similar to reaching defs
    Initialization of in/out is the tricky part

140

Compute Adef gen/kill Sets


for each basic block BB do
  gen(BB) = ∅; kill(BB) = ∅;
  for each statement (d: x := y op z) in sequential order in BB, do
    kill(BB) = kill(BB) ∪ G[x];
    G[x] = {d};
  endfor
  gen(BB) = ∪ G[x] : for all ids x
endfor

Exactly the same as reaching defs !!

141

Compute Adef in/out Sets


U = universal set of all definitions in the prog
in(0) = ∅; out(0) = gen(0)
for each basic block BB, (BB != 0), do
  in(BB) = ∅; out(BB) = U - kill(BB)
change = true
while (change) do
  change = false
  for each basic block BB, do
    old_out = out(BB)
    in(BB) = ∩ out(Y) : for all predecessors Y of BB
    out(BB) = gen(BB) ∪ (in(BB) - kill(BB))
    if (old_out != out(BB)) then change = true
  endfor
endfor
142

Available Expression Analysis (Aexprs)

An expression is the RHS of an operation.
  Ex: in r2 = r3 + r4, "r3 + r4" is an expression

An expression e is available at a point p if along all paths from e to p,
e is not killed.
  An expression is killed between two points when one of its source
  operands is redefined.
    Ex: r1 = r2 + r3 kills all expressions involving r1

Algorithm:
  Forward dataflow analysis
  Use the intersect function as the meet operator to guarantee the
  all-path requirement
  Looks exactly like Adefs, except gen/kill/in/out track the RHSs of
  operations rather than the LHSs
143

Available Expression

Input: a flow graph with e_kill[B] and e_gen[B]
Output: in[B] and out[B]
Method:
  in[B1] := ∅; out[B1] := e_gen[B1];     // B1 is the entry block
  for each basic block B != B1
    out[B] = U - e_kill[B];
  change = true
  while (change)
    change = false;
    for each basic block B,
      in[B] := ∩ out[P] : P is a pred of B
      old_out := out[B];
      out[B] := e_gen[B] ∪ (in[B] - e_kill[B])
      if (out[B] != old_out) change := true;
144
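The intersection meet is what enforces the all-path requirement. This sketch runs the algorithm above on a hypothetical diamond-shaped CFG (B1 branches to B2 and B3; B2 also falls into B3): "a+b" is computed in B1 and so is available everywhere, while "c+d" is computed only in B2 and therefore is not available at B3, since the direct B1 to B3 path misses it.

```python
# Hypothetical CFG and expression sets; B1 is the entry block.
U = {"a+b", "c+d"}                     # all expressions in the program
e_gen  = {"B1": {"a+b"}, "B2": {"c+d"}, "B3": set()}
e_kill = {"B1": set(),   "B2": set(),   "B3": set()}
preds  = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}

inn = {b: set() for b in e_gen}
out = {b: (set(e_gen[b]) if b == "B1" else U - e_kill[b])
       for b in e_gen}                 # non-entry out starts at U - kill
change = True
while change:
    change = False
    for b in e_gen:
        if preds[b]:                   # intersection meet = all paths
            inn[b] = set.intersection(*(out[p] for p in preds[b]))
        new = e_gen[b] | (inn[b] - e_kill[b])
        if new != out[b]:
            out[b], change = new, True
```

Note how the optimistic initialization (out = U - kill) matters: starting the non-entry blocks at ∅ would make the intersection collapse everything to empty.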

Efficient Calculation of Dataflow


The order in which the basic blocks are visited is important
(for faster convergence).

Forward analysis: DFS order
  Visit a node only when all of its predecessors have been visited

Backward analysis: PostDFS order
  Visit a node only when all of its successors have been visited
145

Representing Dataflow Information

Requirements: Efficiency!
  Large amount of information to store
  Fast access/manipulation

Bitvectors
  General strategy used by most compilers
  Bit positions represent defs (rdefs)
  Efficient set operations: union/intersect/isone
  Used for gen, kill, in, out for each BB
146
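The bitvector idea can be illustrated with Python's arbitrary-precision integers standing in for fixed-width machine words (an illustrative stand-in, not how a C compiler would store them): `|` is union, `&` is intersection, and `& ~` is set difference, so one transfer function is a couple of word-wide operations.

```python
# Assign each definition a bit position.
defs = ["d1", "d2", "d3", "d4"]
bit = {d: 1 << i for i, d in enumerate(defs)}

# Example block attributes as bitvectors.
gen  = bit["d1"] | bit["d2"]           # gen  = {d1, d2}
kill = bit["d2"] | bit["d3"]           # kill = {d2, d3}
inn  = bit["d2"] | bit["d3"] | bit["d4"]

# Transfer function out = gen U (in - kill) in three word ops.
out = gen | (inn & ~kill)

def members(v):
    """Decode a bitvector back into the set of definitions it holds."""
    return {d for d in defs if v & bit[d]}
```

Here d2 and d3 are masked out of in by kill, d4 survives, and gen contributes d1 and d2.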

Optimization using Dataflow

Classes of optimization:

1. Classical (machine independent)
   Reducing operation count (redundancy elimination)
   Simplifying operations

2. Machine specific
   Peephole optimizations
   Take advantage of specialized hardware features

3. Instruction Level Parallelism (ILP) enhancing
   Increasing parallelism
   Possibly increases the number of instructions

147

Types of Classical Optimizations

Operation-level: one operation in isolation
  Constant folding, strength reduction
  Dead code elimination (global, but 1 op at a time)

Local: pairs of operations in the same BB
  May or may not use dataflow analysis

Global: again pairs of operations
  Pairs of operations in different BBs

Loop: body of a loop

148

Constant Folding

Simplify an operation based on the values of its source operands
  Constant propagation creates opportunities for this

All constant operands
  Evaluate the op, replace with a move
    r1 = 3 * 4  →  r1 = 12
    r1 = 3 / 0  →  ???  Don't evaluate excepting ops! What about FP?
  Evaluate conditional branch, replace with BRU or noop
    if (1 < 2) goto BB2  →  goto BB2
    if (1 > 2) goto BB2  →  convert to a noop (dead code)

Algebraic identities
  r1 = r2 + 0, r2 - 0, r2 | 0, r2 ^ 0, r2 << 0, r2 >> 0  →  r1 = r2
  r1 = 0 * r2, 0 / r2, 0 & r2  →  r1 = 0
  r1 = r2 * 1, r2 / 1  →  r1 = r2
149
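A toy folder over a hypothetical tuple IR makes the slide's rules concrete: instructions are `(dest, op, src1, src2)`, literals are ints, registers are strings. Division by zero is deliberately left unfolded, as the slide warns against evaluating excepting operations.

```python
def fold(inst):
    """Fold an op with all-constant operands into a move; else return it."""
    dest, op, a, b = inst
    if not (isinstance(a, int) and isinstance(b, int)):
        return inst                        # not all operands are constants
    if op == "+":
        return (dest, "mov", a + b, None)
    if op == "*":
        return (dest, "mov", a * b, None)
    if op == "/" and b != 0:               # never fold an excepting op
        return (dest, "mov", a // b, None)
    return inst
```

For example, `("r1", "*", 3, 4)` folds to a move of 12, while `("r1", "/", 3, 0)` is returned untouched. Algebraic identities (r2 + 0, r2 * 1, ...) would be a separate rule set over one constant operand.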

Strength Reduction

Replace expensive ops with cheaper ones
  Constant propagation creates opportunities for this

Power-of-2 constants
  Mult by power of 2:  r1 = r2 * 8   →  r1 = r2 << 3
  Div by power of 2:   r1 = r2 / 4   →  r1 = r2 >> 2
  Rem by power of 2:   r1 = r2 % 16  →  r1 = r2 & 15

More exotic
  Replace multiply by constant with a sequence of shifts and adds/subs
    r1 = r2 * 6  →  r100 = r2 << 2; r101 = r2 << 1; r1 = r100 + r101
    r1 = r2 * 7  →  r100 = r2 << 3; r1 = r100 - r2

150
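The shift/add decomposition can be sketched greedily: rewrite each run of consecutive one bits 2^s + ... + 2^(s+k-1) as 2^(s+k) - 2^s, which reproduces the slide's r2 * 7 case exactly (8x - x) and gives a valid (if not always identical) sequence for cases like r2 * 6. This is an illustrative sketch, not a production recurrence.

```python
def shift_add_terms(c):
    """Return [(shift, sign)] so that x * c == sum(sign * (x << shift))."""
    terms, shift = [], 0
    while c:
        if c & 1:
            k = 0                      # measure the run of one bits
            while c & 1:
                c >>= 1
                k += 1
            if k >= 2:                 # run of k ones: 2^(s+k) - 2^s
                terms += [(shift, -1), (shift + k, +1)]
            else:
                terms.append((shift, +1))
            shift += k
        else:
            c >>= 1
            shift += 1
    return terms

def apply_terms(x, terms):
    """Evaluate the shift/add sequence (stands in for emitted code)."""
    return sum(sign * (x << s) for s, sign in terms)
```

For c = 7 this yields [(0, -1), (3, +1)], i.e. (x << 3) - x, matching the slide; for c = 6 it yields (x << 3) - (x << 1), an equally cheap alternative to the slide's two-shift-plus-add form.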

Dead Code Elimination


Remove statement d: x := y op z whose result is never consumed.

Rules:
  DU chain for d is empty
  y and z are not live at d

151

Constant Propagation

Forward propagation of moves/assignments of the form
  d: rx := L    where L is a literal

Replacement of rx with L wherever possible.
  d must be available at the point of replacement.

152

Forward Copy Propagation

Forward propagation of the RHS of assignments or movs.

  r1 := r2            r1 := r2
  .                   .
  .                   .
  r4 := r1 + 1        r4 := r2 + 1

Reduces chains of dependency
Possibly creates dead code
153

Forward Copy Propagation

Rules:
  Statement dS is the source of the copy propagation
  Statement dT is the target of the copy propagation
  dS is a mov statement
  src(dS) is a register
  dT uses dest(dS)
  dS is an available definition at dT
  src(dS) is an available expression at dT
154

Backward Copy Propagation

Backward propagation of the LHS of an assignment.

  dT: r1 := r2 + r3        r4 := r2 + r3
      r5 := r1 + r6   →    r5 := r4 + r6
  dS: r4 := r1             (dS becomes dead code)

Rules:
  dT and dS are in the same basic block
  dest(dT) is a register
  dest(dT) is not live in out[B]
  dest(dS) is a register
  dS uses dest(dT)
  dest(dS) is not used between dT and dS
  dest(dS) is not defined between dT and dS
  There is no use of dest(dT) after the first definition of dest(dS)
155

Local Common Sub-Expression Elimination

Benefits:
  Reduced computation
  Generates mov statements, which can get copy propagated

  dS: r1 := r2 + r3        dS: r1 := r2 + r3
  dT: r4 := r2 + r3   →        r100 := r1
                           dT: r4 := r100

Rules:
  dS and dT have the same expression
  src(dS) == src(dT) for all sources
  For all sources x, x is not redefined between dS and dT

156

Global Common Sub-Expression Elimination

Rules:
  dS and dT have the same expression
  src(dS) == src(dT) for all sources of dS and dT
  Expression of dS is available at dT

157

Unreachable Code Elimination

Mark initial BB visited
to_visit = initial BB
while (to_visit not empty)
  current = to_visit.pop()
  for each unvisited successor block of current
    Mark successor as visited;
    to_visit += successor
  endfor
endwhile
Eliminate all unvisited blocks

(CFG figure: entry, bb1, bb2, bb3, bb4, bb5)

Which BB(s) can be deleted?

158
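The worklist above can be sketched directly; the "unvisited" guard is what makes it terminate on cyclic CFGs. The graph here is hypothetical (the slide's figure only names its blocks, so these edges are made up for illustration): bb4 and bb5 have no path from entry.

```python
# Hypothetical CFG: successor map per block.
succ = {
    "entry": ["bb1", "bb2"],
    "bb1": ["bb3"],
    "bb2": ["bb3"],
    "bb3": ["bb1"],          # back edge: the guard prevents looping
    "bb4": ["bb5"],          # never reached from entry
    "bb5": [],
}

visited = {"entry"}          # mark initial BB visited
to_visit = ["entry"]
while to_visit:
    current = to_visit.pop()
    for s in succ[current]:
        if s not in visited:         # only push unvisited successors
            visited.add(s)
            to_visit.append(s)

unreachable = set(succ) - visited    # blocks safe to delete
```

With these assumed edges, bb4 and bb5 are the blocks that can be eliminated.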
