Value Numbering
PRESTON BRIGGS
Tera Computer Company, 2815 Eastlake Avenue East, Seattle, WA 98102
AND
KEITH D. COOPER AND L. TAYLOR SIMPSON
Rice University, 6100 Main Street, Mail Stop 41, Houston, TX 77005
SUMMARY
Value numbering is a compiler-based program analysis method that allows redundant computations to
be removed. This paper compares hash-based approaches derived from the classic local algorithm1 with
partitioning approaches based on the work of Alpern, Wegman, and Zadeck2. Historically, the hash-based
algorithm has been applied to single basic blocks or extended basic blocks. We have improved the technique
to operate over the routine’s dominator tree. The partitioning approach partitions the values in the routine
into congruence classes and removes computations when one congruent value dominates another. We have
extended this technique to remove computations that define a value in the set of available expressions
(AVAIL)3 . Also, we are able to apply a version of Morel and Renvoise’s partial redundancy elimination4 to
remove even more redundancies.
The paper presents a series of hash-based algorithms and a series of refinements to the partitioning
technique. Within each series, it can be proved that each method discovers at least as many redundancies as
its predecessors. Unfortunately, no such relationship exists between the hash-based and global techniques. On
some programs, the hash-based techniques eliminate more redundancies than the partitioning techniques,
while on others, partitioning wins. We experimentally compare the improvements made by these techniques
when applied to real programs. These results will be useful for commercial compiler writers who wish to
assess the potential impact of each technique before implementation.
INTRODUCTION
Value numbering is a compiler-based program analysis technique with a long history in both
literature and practice. Although the name was originally applied to a method for improving
single basic blocks, it is now used to describe a collection of optimizations that vary in power
and scope. The primary objective of value numbering is to assign an identifying number (a
value number) to each expression in a particular way. The number must have the property that
two expressions have the same number if the compiler can prove they are equal for all possible
program inputs. The numbers can then be used to find redundant computations and remove
them. There are two other objectives accomplished by certain forms of value numbering:
1. To recognize certain algebraic identities (e.g., x + 0 = x) and to use them to simplify
the code.
2. To evaluate expressions whose operands are constants and to propagate their values
through the code.
This paper describes different techniques for assigning numbers and handling redundancies.
There are several ways to accomplish each of these goals, and the methods can be applied
across different scopes. The paper also includes an experimental evaluation of the relative
effectiveness of these different approaches.
In value numbering, the compiler can only assign two expressions the same value number if
it can prove that they always produce equal values. Two techniques for proving this equivalence
appear in the literature:
• The first approach hashes an operator and the value numbers of its operands to produce a
value number for the resulting expression. Hashing provides an efficient representation
of the expressions known at any point during the analysis. The hash-based techniques
are on-line methods that transform the program immediately. Their efficiency relies on
the constant expected-time behavior of hashing.∗ This approach can easily be extended
to propagate constants and simplify algebraic identities.
• The second approach divides the expressions in a procedure into equivalence classes
by value, called congruence classes . Two values are congruent if they are computed
by the same operator and the corresponding operands are congruent. These methods
are called partitioning algorithms. The partitioning algorithm runs off-line; it must run
to completion before transforming the code. It can be made to run in O(E log2 N )
time, where N and E are the number of nodes and edges in the routine’s static single
assignment (SSA) graph6 . The partitioning algorithm cannot propagate constants or
simplify algebraic identities.
Once value numbers have been assigned, redundancies must be discovered and removed.
Many techniques are possible, ranging from ad hoc removal through data-flow techniques.
This paper makes several distinct contributions. These include: (1) an algorithm for hash-
based value numbering over a routine’s dominator tree, (2) an algorithm based on using
a unified hash table for the entire procedure, (3) an extension of Alpern, Wegman, and
Zadeck’s partition-based global value numbering algorithm to perform AVAIL-based removal
of expressions or partial redundancy elimination, and (4) an experimental comparison of these
techniques in the context of an optimizing compiler.
exist in the hash table. If the original variable still contains the same value, the recomputa-
tion can be replaced with a reference to that variable. To verify this condition, we look up
the name corresponding to the value number, v , and verify that its value number is still v
(i.e., VN[name[v]] = v ). Any operator with known-constant arguments is evaluated and the
resulting value used to replace any subsequent references. The algorithm is easily extended to
account for commutativity and simple algebraic identities without affecting its complexity.
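As a concrete illustration, the local technique can be sketched in a few lines of Python. The tuple intermediate representation, the restriction of constant folding to “+”, and the use of explicit copy records in place of the name-array lookup described above are simplifying assumptions of this sketch, not features of the implementation described here.

```python
# A sketch of local, hash-based value numbering for one basic block, with
# constant folding and commutativity.  Instructions are (dest, op, a, b)
# tuples; integer operands are constants.

def value_number_block(block):
    vn = {}        # name or constant -> value number
    exprs = {}     # (op, vn, vn) -> value number of the expression
    counter = [0]

    def number_of(operand):
        if operand in vn:
            return vn[operand]
        if isinstance(operand, int):
            vn[operand] = ('const', operand)   # a constant numbers itself
        else:
            vn[operand] = ('vn', counter[0])   # unknown inputs get fresh numbers
            counter[0] += 1
        return vn[operand]

    out = []
    for dest, op, a, b in block:
        va, vb = number_of(a), number_of(b)
        # Constant folding: evaluate when both operands are known constants.
        if op == '+' and va[0] == 'const' and vb[0] == 'const':
            vn[dest] = ('const', va[1] + vb[1])
            out.append((dest, 'const', va[1] + vb[1]))
            continue
        # Commutative operators hash their operand numbers in canonical order.
        key = (op,) + tuple(sorted((va, vb))) if op in ('+', '*') else (op, va, vb)
        if key in exprs:
            vn[dest] = exprs[key]              # redundant: reuse the prior value
            out.append((dest, 'copy', exprs[key]))
        else:
            vn[dest] = ('vn', counter[0]); counter[0] += 1
            exprs[key] = vn[dest]
            out.append((dest, op, a, b))
    return out
```

Because the table is keyed by value numbers rather than variable names, the sketch sidesteps the name-tracking hazard that the text takes up next.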
As variables get assigned new values, the compiler must carefully keep track of the location
of each expression in the hash table. Consider the code fragment on the left side of Figure 2.
At statement (1), the expression X + Y is found in the hash table, but it is available in B and
not in A, since A has been redefined. We can handle this by making each entry in the name
array contain a list of variable names and carefully keeping the lists up to date. At statement
(2), the situation is worse; X + Y is in the hash table, but it is not available anywhere.
        A ← X + Y        A0 ← X + Y
        B ← X + Y        B0 ← X + Y
        A ← 1            A1 ← 1
    (1) C ← X + Y        C0 ← X + Y
        B ← 2            B1 ← 2
        C ← 3            C1 ← 3
    (2) D ← X + Y        D0 ← X + Y

Figure 2. A code fragment before (left) and after (right) conversion to SSA form.
As described, the technique operates over single basic blocks. With some minor modifica-
tions, we can apply it to an expanded scope, called an extended basic block7 . An extended
basic block is a sequence of blocks B1 , B2 , . . . , Bn where Bi is the only predecessor of Bi+1 ,
for 1 ≤ i < n, and B1 does not have a unique predecessor. To apply value numbering to a
single extended basic block, we can simply apply the single block algorithm to each block in
the sequence, in order, and use the results from Bi−1 to initialize the tables for Bi . This works
because each Bi has a single predecessor, 1 < i ≤ n.
If a block has multiple successors, then it may be a member of more than one extended
basic block. For example, consider the if-then-else construct shown in Figure 3. Block B1
is contained in two extended basic blocks: {B1 , B2} and {B1 , B3}. These blocks are related
by a common prefix. In fact, if the intersection of two extended basic blocks is non-empty,
it must be a common prefix of both. Thus, a collection of extended basic blocks related by
intersection forms a tree, and the trees form a forest representing the control-flow graph. The
start block and each block with multiple predecessors correspond to the root of a tree, and
each block with a single predecessor, p, is a child of p. We can use this tree structure to avoid
processing any basic block more than once.
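This forest is cheap to construct. A sketch, assuming preds maps each block to its list of CFG predecessors (block names are hypothetical):

```python
# Build the forest of extended basic blocks: the start block and every
# block with multiple (or zero) predecessors roots a tree; every block
# with exactly one predecessor p becomes a child of p.

def ebb_forest(blocks, preds, start):
    roots, children = [], {b: [] for b in blocks}
    for b in blocks:
        if b == start or len(preds[b]) != 1:
            roots.append(b)
        else:
            children[preds[b][0]].append(b)   # single predecessor: child of p
    return roots, children
```

On the if-then-else of Figure 3, this makes B1 the root of the tree {B1, B2, B3} and the join block a root of its own tree, as the text describes.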
The tree representation of extended basic blocks leads to a straightforward and efficient
technique for value numbering. It suggests that each tree should be processed in a preorder
walk using a scoped hash table similar to one that would be used for processing declarations
in a language with nested lexical scopes7, 8 . At any point during the processing, the scoped
table contains a sequence of nested scopes, one for each block that is an ancestor of the current
block.
• As new blocks are processed, new scopes are created in the table. Any entries added
to the current scope will supersede entries with the same name in any enclosing scope.
Searches are performed starting with the innermost scope and proceeding outward until
a matching entry is found.
• As the algorithm returns upward from a block, it must undo the effects of processing
that block. Using a scoped table, this corresponds to deleting the block’s scope from the
table. It must also restore the entries in the name and VN arrays. In practice, this adds a
fair amount of overhead and complication to the algorithm, but it does not change the
asymptotic complexity.
The scoped table matches the tree structure of sets of related blocks. It lets the algorithm avoid
reprocessing blocks that appear in multiple extended basic blocks. The next section shows
how to use the properties of static single assignment form to eliminate the name array and to
avoid the complication of restoring the VN array.
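Such a scoped table can be sketched as a stack of dictionaries: opening a scope pushes an empty dictionary, and closing it pops, which discards exactly the entries the block added.

```python
# A sketch of the scoped hash table.  Lookup proceeds from the innermost
# scope outward, so entries in inner scopes shadow outer ones.

class ScopedTable:
    def __init__(self):
        self.scopes = [{}]

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()          # discard everything this scope added

    def insert(self, key, value):
        self.scopes[-1][key] = value

    def lookup(self, key):
        for scope in reversed(self.scopes):
            if key in scope:
                return scope[key]
        return None
```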
Static single assignment form
Many of the difficulties encountered during value numbering of extended basic blocks can
be overcome by constructing the static single assignment (SSA) form of the routine6 . The basic
idea used in constructing SSA form is to give unique names to the targets of all assignments
in the routine, and then to overwrite uses of the assignments with the new names. Special
assignments, called φ-functions, are inserted to select the appropriate definition where more
than one definition site (each with a unique SSA name) reaches a point in the routine. One φ-
function is inserted at each join point for each name in the original routine. In practice, to save
space and time, φ-functions are placed at only certain join points and for only certain names.
Specifically, a φ-function is placed at the birthpoint 9 of each value – the earliest location
Figure 3. An if-then-else construct. B1 ends with if (. . . ) and branches to B2 (x ← 5) and
B3 (x ← 3); the right-hand version shows the same construct with the definitions renamed
x1 ← 5 and x2 ← 3.
where the joined value exists. Each φ-function defines a new name for the original item as a
function of all of the SSA names which are current at the end of the join point’s predecessors.
Any uses of the original name after the φ-function are replaced by the φ-function’s result. The
φ-function selects the value of the input that corresponds to the block from which control is
transferred and assigns this value to the result.
The critical property of SSA for this work is the naming discipline that it imposes on the
code. Each SSA name is assigned a value by exactly one operation in a routine; therefore, no
name is ever reassigned, and no expression ever becomes inaccessible. The advantage of this
approach becomes apparent if the code in Figure 2 is converted to SSA form. At statement
(1), the expression X + Y can be replaced by A0 because the second assignment to A was
given the name A1 . Similarly, the expression at statement (2) can be replaced by A0 . Also,
the transition from single to extended basic blocks is simpler because we can, in fact, use a
scoped hash table where only the new entries must be removed. We can also eliminate the
name array, and we no longer need to restore the VN array.
Dominator-based value numbering technique
To extend the scope of optimization any further requires a mechanism for handling points in
the control-flow graph where paths converge. The method for extended basic blocks already
covers the maximal length regions without such merges. To handle merge points, we will
rely on a well-understood idea from classic optimization and analysis—dominance. In a flow
graph, if node X appears on every path from the start node to node Y, then X dominates Y
(written X ≥ Y)10. If X ≥ Y and X ≠ Y, then X strictly dominates Y (written X > Y). The immediate
dominator of Y, idom(Y), is the closest strict dominator of Y11. In the routine’s dominator
tree, the parent of each node is its immediate dominator. Notice that all nodes that dominate
a node X are ancestors of X in the dominator tree.
Aside from the naming discipline imposed, another key feature of SSA form is the informa-
tion it provides about the way values flow into each basic block. A value can enter a block B
in one of two ways: either it is defined by a φ-function at the start of B or it flows through
B ’s parent in the dominator tree (i.e., B ’s immediate dominator). These observations lead us
procedure DVNT(Block B)
    Mark the beginning of a new scope
    for each φ-function p of the form “n ← φ(. . . )” in B
        if p is meaningless or redundant
            Put the value number for p into VN[n]
            Remove p
        else
            VN[n] ← n
            Add p to the hash table
    for each assignment a of the form “x ← y op z” in B
        Overwrite y with VN[y] and z with VN[z]
        expr ← ⟨y op z⟩
        if expr can be simplified to expr′
            Replace a with “x ← expr′”
            expr ← expr′
        if expr is found in the hash table with value number v
            VN[x] ← v
            Remove a
        else
            VN[x] ← x
            Add expr to the hash table with value number x
    for each successor s of B
        Adjust the φ-function inputs in s
    for each child c of B in the dominator tree
        DVNT(c)
    Clean up the hash table after leaving this scope

Figure 4. Dominator-based value numbering technique (DVNT).
to an algorithm that extends value numbering to larger regions by using the dominator tree.
The algorithm processes each block by initializing the hash table with the information
resulting from value numbering its parent in the dominator tree. To accomplish this, we again
use a scoped hash table. The value numbering proceeds by recursively walking the dominator
tree. Figure 4 shows high-level pseudo-code for the algorithm.
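The algorithm of Figure 4 can be sketched in Python. The block representation (φ-functions as (dest, {predecessor: argument}) pairs, assignments as three-address tuples) is an assumption of this sketch, the algebraic-simplification step is omitted, and every φ input is assumed to be numbered before the φ is reached (no back edges).

```python
# A sketch of DVNT over SSA form.  `table` is the scoped hash table,
# modeled here as a dict plus an undo list per invocation; `vn` maps each
# SSA name to its value number (itself an SSA name).

def dvnt(b, blocks, dom_children, table, vn):
    scope = []                                  # entries to undo on exit
    def put(key, val):
        table[key] = val
        scope.append(key)

    blk = blocks[b]
    kept_phis = []
    for dest, args in blk['phis']:
        vals = [vn.get(a, a) for a in args.values()]
        key = ('phi', b, tuple(vals))
        if len(set(vals)) == 1:                 # meaningless phi
            vn[dest] = vals[0]
        elif key in table:                      # redundant phi (same block)
            vn[dest] = table[key]
        else:
            vn[dest] = dest
            put(key, dest)
            kept_phis.append((dest, args))
    blk['phis'] = kept_phis

    kept = []
    for dest, op, x, y in blk['instrs']:
        key = (op, vn.get(x, x), vn.get(y, y))  # operands overwritten by VNs
        if key in table:
            vn[dest] = table[key]               # redundant computation removed
        else:
            vn[dest] = dest
            put(key, dest)
            kept.append((dest, op, key[1], key[2]))
    blk['instrs'] = kept

    for s in blk['succs']:                      # fill in phi inputs from b
        for dest, args in blocks[s]['phis']:
            if b in args:
                args[b] = vn.get(args[b], args[b])

    for c in dom_children.get(b, []):           # children in reverse postorder
        dvnt(c, blocks, dom_children, table, vn)

    for key in scope:                           # close this block's scope
        del table[key]
```

Running this sketch on the example of Figure 5 empties B2 and B3, leaves a single φ-function x2 ← φ(⟨v0⟩, ⟨w0⟩) in B4, and rewrites z0 as ⟨u0⟩ + ⟨x2⟩, matching the walkthrough below.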
To simplify the implementation of the algorithm, the SSA name of the first occurrence of an
expression (in this path in the dominator tree) becomes the expression’s value number. This
eliminates the need for the name array because each value number is itself an SSA name. For
clarity, we will surround an SSA name that represents a value number with angle brackets (e.g.,
⟨x0⟩). When a redundant computation of an expression is found, the compiler removes the
operation and replaces all uses of the defined SSA name with the expression’s value number.
The compiler can use this replacement scheme over a limited region of the code.
1. The value number can replace a redundant computation in any block dominated by the
first occurrence.
2. The value number can replace a redundant evaluation that is a parameter to a φ-node
corresponding to control flow from a block dominated by the first occurrence. To find
these φ-node parameters, we compute the dominance frontier of the block containing
the first occurrence of the expression. The dominance frontier of node X is the set of
nodes Y such that X dominates a predecessor of Y , but X does not strictly dominate Y
(i.e., DF(X) = {Y | ∃P ∈ Pred(Y), X ≥ P and X ≯ Y}).∗
In both cases, we know that control must flow through the block where the first evaluation
occurred (defining the SSA name’s value).
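Dominance frontiers are easy to compute once immediate dominators are known. A sketch of the standard formulation (due to Cooper, Harvey, and Kennedy), assuming idom maps each node to its immediate dominator with the entry node mapped to itself:

```python
# For each join point Y, walk from each predecessor up the dominator tree
# to idom(Y); every node passed on the way has Y in its dominance frontier.

def dominance_frontiers(preds, idom):
    df = {n: set() for n in idom}
    for y, ps in preds.items():
        if len(ps) < 2:
            continue                  # only join points contribute
        for p in ps:
            runner = p
            while runner != idom[y]:
                df[runner].add(y)     # runner dominates p but not (strictly) y
                runner = idom[runner]
    return df
```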
The φ-functions require special treatment. Before the compiler can analyze the φ-functions
in a block, it must previously have assigned value numbers to all of the inputs. This is not
possible in all cases; specifically, any φ-function input whose value flows along a back edge
(with respect to the dominator tree) cannot have a value number. If any of the parameters
of a φ-function have not been assigned a value number, then the compiler cannot analyze
the φ-function, and it must assign a unique, new value number to the result. The following
two conditions guarantee that all φ-function parameters in a block have been assigned value
numbers:
1. When procedure DVNT (see Figure 4) is called recursively for the children of block b in
the dominator tree, the children must be processed in reverse postorder. This ensures that
all of a block’s predecessors are processed before the block itself, unless the predecessor
is connected by a back edge relative to the DFS tree.
2. The block must have no incoming back edges.
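The reverse postorder required by condition 1 falls out of a depth-first search of the CFG; a sketch, assuming succs maps each block to its successor list:

```python
# Reverse postorder: a node appears before all of its successors except
# those reached through back edges of the DFS tree.

def reverse_postorder(succs, start):
    order, seen = [], set()
    def dfs(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                dfs(s)
        order.append(n)          # record n after all its descendants
    dfs(start)
    return list(reversed(order))
```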
If the above conditions are met, we can analyze the φ-functions in a block and decide if
they can be eliminated. A φ-function can be eliminated if it is meaningless or redundant.
A φ-function is meaningless if all its inputs have the same value number. A meaningless
φ-function can be removed if the references to its result are replaced with the value number
of its input parameters. A φ-function is redundant if it computes the same value as another φ-
function in the same block. The compiler can identify redundant φ-functions using a hashing
scheme analogous to the one used for expressions. Without additional information about
the conditions controlling the execution of different blocks, the compiler cannot compare
φ-functions in different blocks.
After value numbering the φ-functions and instructions in a block, the algorithm visits each
successor block and updates any φ-function inputs that come from the current block. This
involves determining which φ-function parameter corresponds to input from the current block
and overwriting the parameter with its value number. Notice the resemblance between this step
and the corresponding step in the SSA construction algorithm. This step must be performed
before value numbering any of the block’s children in the dominator tree, if the compiler is
going to analyze φ-functions.
To illustrate how the algorithm works, we will apply it to the code fragment in Figure 5.
The first block processed will be B1 . Since none of the expressions have been seen, the names
u0 , v0 , and w0 will be assigned their SSA name as their value number.
The next block processed will be B2 . Since the expression c0 + d0 was defined in block B1
(which dominates B2 ), we can delete the two assignments in this block by assigning the value
number for both x0 and y0 to be ⟨v0⟩. Before we finish processing block B2, we must fill in
the φ-function parameters in its successor block, B4. The first argument of the φ-functions in B4
corresponds to input from block B2, so we replace u0, x0, and y0 with ⟨u0⟩, ⟨v0⟩, and ⟨v0⟩,
respectively.
∗ The SSA-construction algorithm uses dominance frontiers to place φ-nodes6.
Figure 5. An example for dominator-based value numbering.

Before:

    B1: u0 ← a0 + b0
        v0 ← c0 + d0
        w0 ← e0 + f0

    B2: x0 ← c0 + d0        B3: u1 ← a0 + b0
        y0 ← c0 + d0            x1 ← e0 + f0
                                y1 ← e0 + f0

    B4: u2 ← φ(u0, u1)
        x2 ← φ(x0, x1)
        y2 ← φ(y0, y1)
        z0 ← u2 + y2
        u3 ← a0 + b0

After:

    B1: u0 ← a0 + b0
        v0 ← c0 + d0
        w0 ← e0 + f0

    B2: (empty)             B3: (empty)

    B4: x2 ← φ(v0, w0)
        z0 ← u0 + x2
Block B3 will be visited next. Since every right-hand-side expression has been seen, we
assign the value numbers for u1, x1, and y1 to be ⟨u0⟩, ⟨w0⟩, and ⟨w0⟩, respectively, and remove
the assignments. To finish processing B3, we fill in the second parameter of the φ-functions
in B4 with ⟨u0⟩, ⟨w0⟩, and ⟨w0⟩, respectively.
The final block processed will be B4 . The first step is to examine the φ-functions. Notice
that we are able to examine the φ-functions only because we processed B1 ’s children in the
dominator tree (B2 , B3 , and B4 ) in reverse postorder and because there are no back edges
flowing into B4 . The φ-function defining u2 is meaningless because all its parameters are
equal (They have the same value number – hu0 i). Therefore, we eliminate the φ-function by
assigning u2 the value number hu0 i. It is interesting to note that this φ-function was made
meaningless by eliminating the only assignment to u in a block with B4 in its dominance
frontier. In other words, when we eliminate the assignment to u in block B3 , we eliminate the
reason that the φ-function for u was inserted during the construction of SSA form. The second
φ-function combines the values v0 and w0. Since this is the first appearance of a φ-function
with these parameters, x2 is assigned its SSA name as its value number. The φ-function defining
y2 is redundant because it is equal to x2. Therefore, we eliminate this φ-function by assigning
y2 the value number hx2 i. When processing the assignments in the block, we replace each
operand by its value number. This results in the expression hu0i + hx2i in the assignment to
z0 . The assignment to u3 is eliminated by giving u3 the value number hu0 i.
Notice that if we applied single-basic-block value numbering to this example, the only
redundancies we could remove are the assignments to y0 and y1 . If we applied extended-
basic-block value numbering, we could also remove the assignments to x0, u1 , and x1. Only
dominator-based value numbering can remove the assignments to u2 , y2 , and u3 .
Incorporating value numbering into SSA construction
We have described dominator-based value numbering as it would be applied to routines
already in SSA form. However, it is possible to incorporate value numbering into the SSA
construction process. The advantage of combining the steps is to improve the performance
of the optimizer by reducing the amount of work performed and by reducing the size of the
routine’s SSA representation. The algorithm for dominator-based value numbering during SSA
construction is presented in Figure 6. There is a great deal of similarity between the value
numbering process and the renaming process during SSA construction 6 . The renaming process
can be modified as follows to accomplish renaming and value numbering simultaneously:
• For each name in the original program, a stack is maintained which contains subscripts
used to replace uses of that name. To accomplish value numbering, these stacks will
contain value numbers. Notice that each element in the VN array in Figure 4 represents
a value number, but the VN array in Figure 6 represents stacks of value numbers.
• Before inventing a new name for each φ-function or assignment, we first check if it can
be eliminated. If so, we push the value number of the φ-function or assignment onto the
stack for the defined name.
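The modified renaming stacks can be sketched as follows; the helper names are hypothetical, and the point is only that a redundant definition pushes an existing value number instead of a fresh SSA name:

```python
from collections import defaultdict

# Each original name keeps a stack.  Renaming a definition pushes the new
# SSA name or, when the definition is proved redundant, the value number
# of the equivalent earlier definition; leaving the block pops.

stacks = defaultdict(list)

def enter_def(name, ssa_name, redundant_vn=None):
    stacks[name].append(redundant_vn if redundant_vn else ssa_name)

def current(name):
    return stacks[name][-1]    # value number for the current use of name

def exit_def(name):
    stacks[name].pop()         # undo this block's definition on exit
```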
VALUE PARTITIONING
Alpern, Wegman, and Zadeck2 presented a technique that uses a variation of Aho, Hopcroft,
and Ullman’s12 formulation of Hopcroft’s DFA-minimization algorithm to partition values into
Figure 7. The effect of partitioning and renaming. Before: B1 branches to B2 (x0 ← a + b)
and B3 (x1 ← a + b), which join at B4 (x2 ← φ(x0, x1); y ← x2 + 1). After: both blocks
define x0, the φ-function disappears, and B4 computes y ← x0 + 1.
congruence classes. It operates on the SSA form of the routine6 . Two values are congruent if
they are computed by the same opcode, and their corresponding operands are congruent. For
all legal expressions, two congruent values must be equal. Since the definition of congruence
is recursive, there will be routines where the solution is not unique. A trivial solution would
be to set each SSA name in the routine to be congruent only to itself; however, the solution we
seek is the maximum fixed point – the solution that contains the most congruent values.
The algorithm we use differs slightly from Alpern, Wegman, and Zadeck’s. They describe
an algorithm that operates on a structured programming language, where the SSA form is
modified with φif -functions that represent if-then-else structures and φenter and φexit -
functions that represent loop structures. These extensions to SSA form allow φif -functions
to be compared to φif -functions in different blocks. The same is true for φenter and φexit -
functions. In order to be more general, our implementation of value partitioning operates on
pure SSA form, which means that φ-functions in different blocks cannot be congruent.
Figure 8 shows the partitioning algorithm. Initially, the partition contains a congruence class
for the values defined by each operator in the program. The partition is iteratively refined by
examining the uses of all members of a class and determining which classes must be further
subdivided. After the partition stabilizes, the registers and φ-functions in the routine are
renumbered based on the congruence classes so that all congruent definitions have the same
name. In other words, for each SSA name, n, we replace each occurrence of n in the program
with the name chosen to represent the congruence class containing n. Because the effects of
partitioning and renumbering are similar to those of value numbering using the unified hash
table described in the previous section, we think of this technique as a form of global (or
intraprocedural) value numbering.∗ Value partitioning and the unified hash table algorithm do
not necessarily discover the same equivalences, but they both provide a consistent naming of
the expressions throughout the entire routine.
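The optimistic refinement can be sketched without Figure 8's worklist machinery by repeatedly re-keying every SSA name on its operator and its operands' current classes until the partition stabilizes. This is a quadratic but simple alternative to Hopcroft-style splitting; to keep φ-functions in different blocks incongruent, a φ operator would embed its block name.

```python
# defs maps each SSA name to (operator, operand names).  Names not in defs
# (e.g., routine inputs) are treated as singleton classes of their own.

def partition(defs):
    # Optimistic initialization: values defined by the same operator are
    # assumed congruent.
    cls = {n: op for n, (op, args) in defs.items()}
    while True:
        # Signature: operator plus the classes of the corresponding operands.
        sig = {n: (op,) + tuple(cls.get(a, a) for a in args)
               for n, (op, args) in defs.items()}
        # Renumber signatures as compact class ids; names with equal
        # signatures remain congruent.
        ids, new = {}, {}
        for n, s in sig.items():
            new[n] = ids.setdefault(s, len(ids))
        if new == cls:           # partition stabilized
            return cls
        cls = new
```

On the loop of Figure 13 (modeling the constant 1 as a name defined by a constant operator), this sketch proves X2 congruent to Y2 and X3 congruent to Y3, which no hash-based technique discovers.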
Partitioning and renaming alone will not improve the running time of the routine; we must
∗ Rosen, Wegman, and Zadeck13 describe a technique called global value numbering . It is an interesting and powerful approach
to redundancy elimination, but it should not be confused with value partitioning.
also find and remove the redundant computations. We explore three possible approaches:
dominator-based removal, AVAIL-based removal, and partial redundancy elimination.
Dominator-based removal
Alpern, Wegman, and Zadeck2 suggest removing any computation that is dominated by
a definition from the same congruence class. In Figure 9, the computation of z is a redun-
dancy that this method can eliminate. Since the computation of z in block B1 dominates the
computation in block B4, the latter computation can be removed.

Figure 9. Dominator-based removal. Before: z ← x + y is computed in B1, which ends with
if (. . . ) and branches to B2 and B3; they join at B4, where z ← x + y is recomputed. After:
the computation in B4 is removed.

However, dominator-based removal cannot eliminate the redundancy in Figure 10: there,
z ← x + y is computed in both B2 and B3 and recomputed at the join in B4, but neither
earlier computation dominates B4.

Figure 10. A redundancy that dominator-based removal misses. Before: B1 branches to B2
and B3, each computing z ← x + y, and B4 recomputes it at the join. After: the computation
in B4 is removed by AVAIL-based removal.

AVAIL-based removal

A computation can also be removed if a member of its congruence class is in the set of
available expressions (AVAIL) at that point. An expression is available on entry to a block
if it has been computed, and none of its operands subsequently redefined, along every path
from the start block; in Figure 10, x + y is available on entry to B4, so the recomputation
of z can be removed. Care is required when an operand is redefined between two congruent
computations, as in the straight-line fragment

    Z ← X + Y
    X ← . . .
    Z ← X + Y

Figure 11. Data-flow equations for available expressions:

    AVINi  = ∅ if i is the entry block; otherwise AVINi = ⋂ over j ∈ pred(i) of AVOUTj
    AVOUTi = AVINi ∪ definedi

Even though the partitioning algorithm proves that the two assignments to Z are congruent,
the second one is redundant and can be removed. The only way the intervening assignment
will be given the name X is if the value computed is congruent to the definition of X that
reaches the first assignment to Z. The data-flow equations we use are shown in Figure 11.
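These equations solve by a standard iterative fixed point. The sketch below initializes AVIN optimistically to the full universe, as an intersection problem requires, and omits kill sets, which the SSA naming discipline makes unnecessary.

```python
# blocks: iterable of block names; preds: block -> predecessor list;
# defined: block -> set of expressions computed in the block.

def available(blocks, preds, entry, defined):
    universe = set().union(*defined.values())
    avin = {b: set() if b == entry else set(universe) for b in blocks}
    avout = {b: avin[b] | defined[b] for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry or not preds[b]:
                new_in = set()                      # nothing available at entry
            else:
                new_in = set.intersection(*(avout[p] for p in preds[b]))
            new_out = new_in | defined[b]           # AVOUT = AVIN ∪ defined
            if new_in != avin[b] or new_out != avout[b]:
                avin[b], avout[b], changed = new_in, new_out, True
    return avin, avout
```

On the diamond of Figure 10, x + y is in AVIN of the join block even though neither definition dominates it, which is exactly the case AVAIL-based removal handles and dominator-based removal does not.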
Partial redundancy elimination
Partial redundancy elimination (PRE) is an optimization introduced by Morel and Renvoise4 .
Partially redundant computations are redundant along some, but not all, execution paths. PRE
operates by discovering partially redundant computations, inserting code to make many of
them fully redundant,∗ and then removing all redundant computations.
In Figures 9 and 10, the computations of z are redundant along all paths to block B4 , so
they will be removed by PRE. On the other hand, the computation of z in block B4 in Figure 12
∗ PRE only inserts a copy of an evaluation if it can prove that the insertion, followed by removal of the newly redundant code,
makes no path longer. In practice, this prevents it from removing some partially redundant expressions inside if-then-else
constructs.
Figure 12. A partially redundant computation. Before: B1 ends with if (. . . ) and branches to
B2 and B3; B3 computes z ← x + y, and B4 recomputes z ← x + y at the join. After: PRE
inserts z ← x + y into B2 and removes the computation from B4.

Figure 13. Examples used to compare the hash-based and partitioning techniques (left: a
straight-line fragment; right: a loop):

    A ← X − Y          X0 ← 1
    B ← Y − X          Y0 ← 1
    C ← A − B          while (. . . ) {
    D ← B − A              X2 ← φ(X0, X3)
                           Y2 ← φ(Y0, Y3)
                           X3 ← X2 + 1
                           Y3 ← Y2 + 1
                       }
cannot be removed using AVAIL-based removal, because it is not available along the path
through block B2 . The value of z is computed twice along the path through B3 but only once
along the path through B2 . Therefore, it is considered partially redundant. PRE can move the
computation of z from block B4 to block B2 . It inserts a copy of the computation in B2 ,
making the computation in B4 redundant. Next, it removes the computation from B4 . This
will shorten the path through B3 and leave the length of the path through B2 unchanged. PRE
has the added advantage that it moves invariant code out of loops.
When the hash-based techniques process the φ-functions at the top of the loop in Figure 13,
they have not yet visited X3 or Y3. Therefore, they must assign different value numbers
to X2 and Y2 . However, the partitioning technique will prove that X2 is congruent to Y2 (and
thus X3 is congruent to Y3 ). The key feature of the partitioning algorithm which makes this
possible is its initial optimistic assumption that all values defined by the same operator are
congruent. It then proceeds to disprove the instances where the assumption is false. In contrast,
the hash-based approaches begin with the pessimistic assumption that no values are equal and
proceed to prove as many equalities as possible.
We should point out that eliminating more redundancies does not necessarily result in
reduced execution time. This effect is a result of the way different optimizations interact.
The primary interactions are with register allocation and with optimizations that combine
instructions, such as constant folding or peephole optimization. Each replacement affects
register allocation because it has the potential of shortening the live ranges of its operands
and lengthening the live range of its result. Because the precise impact of a replacement on
the lifetimes of values depends completely on context, the impact on demand for registers is
difficult to assess. In a three-address intermediate code, each replacement has two opportunities
to shorten a live range and one opportunity to extend a live range. We believe that the impact
of replacements on the demand for registers is negligible; however, this issue deserves more
study.
The interaction between value numbering and other optimizations can also affect the execu-
tion time of the optimized program. The example in Figure 14 illustrates how removing more
redundancies may not result in improved execution time. The code in block B1 loads the value
of the second element of a common block called “foo”, and the code in block B2 loads the first
element of the same common block. Compared to value numbering over single basic blocks,
value numbering over extended basic blocks will remove more redundancies. In particular,
the computation of register r5 is not needed because the same value is in register r1. However,
the definition of r1 is no longer used in block B1 due to the constant folding in the definition
of r3. The definition of r1 is now partially dead because it is used along the path through
block B2 but not along the path through B3 . If the path through block B3 is taken at run time,
the computation of r1 will be unused. On the other hand, value numbering over single basic
blocks did not remove the definition of r5, and the definition of r1 can be removed by dead
code elimination. The result is that both paths through the CFG are as short as possible. Other
optimizations that fold or combine instructions, such as constant propagation or peephole
optimization, can produce analogous results. In our test suite, the saturr routine exhibits
this behavior.
EXPERIMENTAL RESULTS
Even though we can prove that each of the three partitioning techniques and each form
of hash-based value numbering is never worse than its predecessor in terms of eliminating
redundancies, an equally important question is how much this theoretical distinction matters
in practice. To assess the real impact of these techniques, we have implemented all of the
optimizations in our experimental Fortran compiler. The compiler is centered around our
intermediate language, called ILOC (pronounced “eye-lock”). ILOC is a pseudo-assembly lan-
guage for a RISC machine with an arbitrary number of symbolic registers. LOAD and STORE
operations are provided to access memory, and all computations operate on symbolic registers.
The front end translates Fortran into ILOC. The optimizer is organized as a collection of Unix
filters that consume and produce ILOC. This design allows us to easily apply optimizations
in almost any order. The back end produces code instrumented to count the number of ILOC
Figure 14. An example in which removing more redundancies does not improve the final code.

Original program:

    B1: r1 ← “foo”
        r2 ← 4
        r3 ← r1 + r2
        r4 ← LOAD r3
    B2: r5 ← “foo”          B3: (empty)
        r6 ← 0
        r7 ← r5 + r6
        r8 ← LOAD r7
    B4: (join)

After value numbering over extended basic blocks (more redundancies removed):

    B1: r1 ← “foo”
        r2 ← 4
        r3 ← “foo+4”
        r4 ← LOAD r3
    B2: r6 ← 0              B3: (empty)
        r8 ← LOAD r1
    B4: (join)

After value numbering over single basic blocks (fewer redundancies removed):

    B1: r1 ← “foo”
        r2 ← 4
        r3 ← “foo+4”
        r4 ← LOAD r3
    B2: r5 ← “foo”          B3: (empty)
        r6 ← 0
        r8 ← LOAD r5
    B4: (join)

Final code after dead code elimination (extended basic blocks):

    B1: r1 ← “foo”
        r3 ← “foo+4”
        r4 ← LOAD r3
    B2: r8 ← LOAD r1        B3: (empty)
    B4: (join)

Final code after dead code elimination (single basic blocks):

    B1: r3 ← “foo+4”
        r4 ← LOAD r3
    B2: r5 ← “foo”          B3: (empty)
        r8 ← LOAD r5
    B4: (join)
1.2
1 Single
0.8 Extended
0.6 Dominator
AVAIL
0.4
PRE
0.2
0
tomcatv twldrv gamgen iniset deseco debflu prophy pastem repvid fpppp
1.2
1 Single
0.8 Extended
Dominator
0.6
AVAIL
0.4
PRE
0.2
0
paroi bilan debico inithx integr sgemv cardeb sgemm inideb supp
1.2
1 Single
0.8 Extended
0.6 Dominator
AVAIL
0.4
PRE
0.2
0
saxpy ddeflu fmtset subb ihbtr drepvi x21y21 saturr fmtgen efill
1.2
1 Single
0.8 Extended
Dominator
0.6
AVAIL
0.4
PRE
0.2
0
si heat dcoera lclear orgpar yeh colbur coeray drigl lissag
1.2
1 Single
0.8 Extended
0.6 Dominator
AVAIL
0.4
PRE
0.2
0
aclear sortie sigma hmoy dyeh vgjyeh arret inter intowp ilsw
1.2
1 Single
0.8 Extended
Dominator
0.6
AVAIL
0.4
PRE
0.2
0
svd fmin zeroin spline decomp fehl rkfs urand solve seval rkf45
instructions executed.
Comparisons were made using routines from a suite of benchmarks consisting of routines
drawn from the SPEC benchmark suite15 and from Forsythe, Malcolm, and Moler’s book on
numerical methods16. We refer to the latter as the FMM benchmark. Each routine was optimized
in several different ways by varying the type of redundancy elimination (value numbering
followed by code removal or motion).∗ To achieve accurate comparisons, we varied only the
type of redundancy elimination performed. The complete results are shown in Figures 15
through 20. Each bar represents dynamic counts of ILOC operations, normalized against
the leftmost bar. Routines are optimized using the sequence of global reassociation17,
redundancy elimination, global constant propagation18, global peephole optimization, dead
code elimination6, operator strength reduction19, 20 , redundancy elimination, global constant
propagation, global peephole optimization, dead code elimination, copy coalescing, and a pass
to eliminate empty basic blocks. All forms of value numbering were performed on the SSA
form of the routine. The hash-based approaches use the unified table method. Its global name
space is needed for either AVAIL-based removal or PRE. All tests were run on a two-processor
Sparc10 model 512 running at 50 MHz with 1 MB cache and 115 MB of memory.
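For reading the charts, the normalization is simple: each bar is a variant's dynamic ILOC operation count for a routine divided by the count of the leftmost bar. The sketch below uses invented counts purely for illustration, not measured data:

```python
# Dynamic ILOC operation counts per variant for one routine; the numbers
# here are invented for illustration, not measurements from the paper.
counts = {"Single": 1000, "Extended": 930, "Dominator": 880,
          "AVAIL": 875, "PRE": 760}

baseline = counts["Single"]  # the leftmost bar in the figure
normalized = {name: n / baseline for name, n in counts.items()}
print(normalized["Single"])  # 1.0 by construction
print(normalized["PRE"])     # 0.76
```

A normalized value below 1.0 means the variant executed fewer operations than the baseline; a value above 1.0 would mean it executed more.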
Figures 15 and 16 compare the hash-based techniques. In general, each refinement to the
technique results in an improvement in the results. We see significant improvements when
moving from single basic blocks to extended basic blocks and again to dominator-based
removal. One surprising aspect of this study is that the differences between dominator-based
removal and AVAIL-based removal are small in practice.† The differences between AVAIL-
based removal and PRE are significant. The ability of PRE to move invariant code out of
loops contributes greatly to this improvement. However, there are some examples where our
value-based formulation of AVAIL-based removal is better than PRE, which operates on lexical
names. Figures 17 and 18 compare the partitioning techniques. The results are similar to those for the hash-based techniques.
∗ The sizes of the test cases for matrix300 and tomcatv have been reduced to ease testing.
† This suggests that either (1) the situation depicted in Figure 10 occurs infrequently in the tested codes, or (2) some combination
of the other optimizations catches this situation. It appears that the first explanation is the correct one.
[Figures 17 and 18: Dynamic counts of ILOC operations for the partitioning techniques (Dominator, AVAIL, PRE), normalized to the leftmost bar, over the same SPEC and FMM routines.]
[Figures 19 and 20: Dynamic counts of ILOC operations comparing the partitioning and hash-based versions of dominator-based removal, AVAIL-based removal, and PRE, normalized to the leftmost bar, over the same SPEC and FMM routines.]
SUMMARY
In this paper, we study a variety of redundancy elimination techniques. We have introduced
a technique for applying hash-based value numbering over a routine’s dominator tree. This
technique is competitive in practice with the value partitioning techniques, while being faster and
simpler. Additionally, we have improved the effectiveness of value partitioning by removing
computations based on available values rather than dominance information and by applying
partial redundancy elimination.
We presented experimental data comparing the effectiveness of each type of value numbering
in the context of our optimizing compiler. The data indicates that our extensions to the existing
algorithms can produce significant improvements in execution time.
ACKNOWLEDGEMENTS
Our interest in this problem began with suggestions from both Jonathan Brezin of IBM and
Bob Morgan of DEC. Independently, they suggested that we investigate value numbering over
dominator regions. Bruce Knobe of Intermetrics also urged us to look at extending value
numbering to ever larger regions. The referees made a number of detailed comments and
suggestions that improved both the exposition and content of the paper.
Our colleagues in the Massively Scalar Compiler Project at Rice have played a large role
in this work. In particular, we owe a debt of gratitude to Cliff Click, Tim Harvey, Linlong
Jiang, John Lu, Nat McIntosh, Philip Schielke, Rob Shillner, Lisa Thomas, Linda Torczon,
and Edmar Wienskoski. Without their tireless implementation efforts, we could not have
completed this study.
REFERENCES
1. John Cocke and Jacob T. Schwartz, ‘Programming languages and their compilers: Preliminary notes’, Tech-
nical report, Courant Institute of Mathematical Sciences, New York University, 1970.
2. Bowen Alpern, Mark N. Wegman, and F. Kenneth Zadeck, ‘Detecting equality of variables in programs’,
Conference Record of the Fifteenth Annual ACM Symposium on Principles of Programming Languages, San
Diego, California, January 1988, pp. 1–11.
3. John Cocke, ‘Global common subexpression elimination’, SIGPLAN Notices, 5(7), 20–24 (1970). Proceedings
of a Symposium on Compiler Optimization.
4. Etienne Morel and Claude Renvoise, ‘Global optimization by suppression of partial redundancies’, Commu-
nications of the ACM, 22(2), 96–103 (1979).
5. Jiazhen Cai and Robert Paige, ‘“Look Ma, no hashing, and no arrays neither”’, Conference Record of the
Eighteenth Annual ACM Symposium on Principles of Programming Languages, Orlando, Florida, January
1991, pp. 143–154.
6. Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck, ‘Efficiently com-
puting static single assignment form and the control dependence graph’, ACM Transactions on Programming
Languages and Systems, 13(4), 451–490 (1991).
7. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-
Wesley, 1986.
8. Charles N. Fischer and Richard J. LeBlanc, Jr., Crafting a Compiler with C, The Benjamin/Cummings
Publishing Company, Inc., Redwood City, CA, 1991.
9. John H. Reif, ‘Symbolic programming analysis in almost linear time’, Conference Record of the Fifth Annual
ACM Symposium on Principles of Programming Languages, Tucson, Arizona, January 1978, pp. 76–83.
10. Thomas Lengauer and Robert Endre Tarjan, ‘A fast algorithm for finding dominators in a flowgraph’, ACM
Transactions on Programming Languages and Systems, 1(1), 121–141 (1979).
11. Matthew S. Hecht, Flow Analysis of Computer Programs, Programming Languages Series, Elsevier North-
Holland, Inc., 52 Vanderbilt Avenue, New York, NY 10017, 1977.
12. Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman, The Design and Analysis of Computer Algorithms,
Addison-Wesley, Reading, Massachusetts, 1974.
13. Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck, ‘Global value numbers and redundant computa-
tions’, Conference Record of the Fifteenth Annual ACM Symposium on Principles of Programming Languages,
San Diego, California, January 1988, pp. 12–27.
14. Cliff Click, ‘Combining analyses, combining optimizations’, Ph.D. Thesis, Rice University, 1995.
15. SPEC release 1.2, September 1990. Standard Performance Evaluation Corporation.
16. George E. Forsythe, Michael A. Malcolm, and Cleve B. Moler, Computer Methods for Mathematical Com-
putations, Prentice-Hall, Englewood Cliffs, New Jersey, 1977.
17. Preston Briggs and Keith D. Cooper, ‘Effective partial redundancy elimination’, SIGPLAN Notices, 29(6),
159–170 (1994). Proceedings of the ACM SIGPLAN ’94 Conference on Programming Language Design and
Implementation.
18. Mark N. Wegman and F. Kenneth Zadeck, ‘Constant propagation with conditional branches’, ACM Transac-
tions on Programming Languages and Systems, 13(2), 181–210 (1991).
19. Frances E. Allen, John Cocke, and Ken Kennedy, ‘Reduction of operator strength’, in Steven S. Muchnick
and Neil D. Jones (eds.), Program Flow Analysis: Theory and Applications, Prentice-Hall, 1981.
20. Keith D. Cooper, L. Taylor Simpson, and Christopher A. Vick, ‘Operator strength reduction’, Technical Report
CRPC-TR95635-S, Center for Research on Parallel Computation, Rice University, October 1995.