Chapter 5 - Intermediate Code Generation
Agenda
Introduction
Why IR?
Issues in Designing an IR
Intermediate Representations
Abstract Syntax Trees (AST)
Directed Acyclic Graphs (DAG)
Control Flow Graphs (CFG)
Postfix notation
Static Single Assignment Form (SSA)
Stack Machine Code
Three Address Code
Introduction
Intermediate representation
It ties the front and back ends together
Language and machine neutral
no limit on registers or memory, no machine-specific instructions
Many forms
syntax trees, three-address code, quadruples
Intermediate code generation can affect the performance of the back end
Why IR?
Portability: suppose we have n source languages and m target
languages. Without intermediate code, each source language must be
translated directly into each target language, so every source-target
pair needs its own compiler: n*m compilers in total. With intermediate
code, we need n front ends that translate each source language into
intermediate code and m back ends that translate intermediate code
into each target language, so only n+m compilers are required.
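[Figure: the source languages C, Pascal and FORTRAN and the targets SPARC, HP PA and x86, first connected pair by pair and then connected through a single shared IR]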
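Abstract Syntax Trees (AST)
Example:
    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;
[Figure: abstract syntax tree for the code above. The condition is a < node over x and y, each assignment is an AssignStmt node, and 5*y appears as two separate * subtrees.]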
Directed Acyclic Graphs (DAGs)
Like compressed trees
leaves: variables, constants available on entry
internal nodes: operators
annotated with variable names?
distinct left/right children
Used for basic blocks (DAGs don't show control flow)
Can generate efficient code.
Note: DAGs encode common expressions
But difficult to transform
Good for analysis
Generating DAGs
Check whether an operand is already present
if not, create a leaf for it
Check whether there is a parent of the operand that
represents the same operation
if not, create one
In either case, label the node representing the result with the
name of the destination variable, and remove that label from all
other nodes in the DAG
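The steps above can be sketched in Python as follows. This is a minimal sketch, not the slides' implementation: the class name DagBuilder, its fields, and the tuple encoding of nodes are assumptions made for illustration, and every statement is assumed to have the form dest := left op right.

```python
class DagBuilder:
    """Builds a DAG for one basic block of statements dest := left op right."""

    def __init__(self):
        self.nodes = []    # node id -> ("leaf", name, None) or (op, left_id, right_id)
        self.index = {}    # node description -> node id, used to find existing nodes
        self.current = {}  # variable name -> id of the node that currently carries its label

    def _intern(self, key):
        # Reuse the node with this description if it exists, otherwise create it.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def operand(self, name):
        # A name assigned earlier in the block refers to its labelled node;
        # anything else is a leaf (a constant or a value available on entry).
        if name in self.current:
            return self.current[name]
        return self._intern(("leaf", name, None))

    def assign(self, dest, op, left, right):
        node = self._intern((op, self.operand(left), self.operand(right)))
        # Move the label: dest now names this node and no other.
        self.current[dest] = node
        return node


dag = DagBuilder()
n1 = dag.assign("t1", "*", "5", "y")
n2 = dag.assign("t2", "*", "5", "y")   # same operation on the same operands
dag.assign("t3", "/", "t2", "3")
dag.assign("x", "+", "t1", "t3")
print(n1 == n2, len(dag.nodes))        # True 6
```

Because a variable maps to exactly one node at a time, assigning to dest automatically removes the label from whichever node carried it before. The second 5 * y finds the existing node, so the block needs six nodes in total.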
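ASTs and DAGs:
a := b * -c + b * -c
[Figure: AST and DAG for the assignment above. The AST has two separate subtrees for b * -c; the DAG has a single * node over b and unary minus c, used as both operands of +.]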
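Directed Acyclic Graphs (DAGs), continued
Use directed acyclic graphs to represent expressions
Use a unique node for each expression
Example:
    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;
[Figure: DAG for the code above, rooted at a Statements node with IfStmt and AssignStmt children. The leaves x, y, 5 and 3 and the node for 5*y each appear once and are shared.]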
Control Flow Graphs (CFGs)
Nodes in the control flow graph are basic blocks
A basic block is a sequence of statements always entered at the
beginning of the block and exited at the end
Edges in the control flow graph represent the control flow
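Example:
    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;
Control flow graph:
    B0: if (x < y) goto B1 else goto B2
    B1: x = 5*y + 5*y/3
    B2: y = 5
[Figure: control flow graph with edges from B0 to B1 and from B0 to B2. The final statement x = x+y forms the block reached from both B1 and B2.]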
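One way to find basic blocks mechanically is the usual "leader" rule, which is an assumption here because the slides only define what a basic block is: the first instruction, every labelled instruction, and every instruction that follows a jump each start a new block. The sketch below applies that rule to a list of instruction strings. The function names and the string conventions (labels written as a leading "L1:", jumps containing the word goto) are hypothetical.

```python
def is_labelled(instruction):
    # Assumed convention: a label is a first token ending in ":", e.g. "L1: x := 5".
    return instruction.split()[0].endswith(":")


def basic_blocks(instructions):
    """Split a list of instruction strings into basic blocks."""
    leaders = {0}                                  # the first instruction
    for i, ins in enumerate(instructions):
        if is_labelled(ins):                       # possible jump target
            leaders.add(i)
        if "goto" in ins and i + 1 < len(instructions):
            leaders.add(i + 1)                     # instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    return [instructions[s:e] for s, e in zip(starts, ends)]


code = [
    "if x < y goto L1",
    "goto L2",
    "L1: t1 := 5 * y",
    "t2 := 5 * y",
    "t3 := t2 / 3",
    "x := t1 + t3",
    "goto L3",
    "L2: y := 5",
    "L3: x := x + y",
]
blocks = basic_blocks(code)
for number, block in enumerate(blocks):
    print(number, block)
```

At this lower level the two-way branch of B0 is written as two jump instructions, so the rule yields five blocks rather than four: the conditional jump, the unconditional jump, the then-branch, the else-branch, and the final assignment.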
Postfix Notation (PN)
Form rules:
1. If E is a variable/constant, the PN of E is E itself
2. If E is an expression of the form E1 op E2, the PN of E is
E1’E2’op (E1’ and E2’ are the PN of E1 and E2, respectively.)
3. If E is a parenthesized expression of form (E1), the PN of E is
the same as the PN of E1.
Example:
The PN of the expression (a+b)/(c-d) is ab+cd-/
(the parentheses in (ab+)(cd-)/ only show the grouping; postfix notation itself needs none)
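The three rules translate directly into a recursive function over an expression tree. The sketch below borrows Python's own ast module to parse the infix text, which is an assumption of convenience: Python's expression grammar stands in for the source language, and the names OPS and postfix are hypothetical. Rule 3 needs no code of its own, because parentheses only influence the shape of the tree the parser builds.

```python
import ast

# Operator node types mapped to the symbols used in the slides.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def postfix(node):
    """Return the postfix form of an expression tree as a list of tokens."""
    if isinstance(node, ast.Expression):
        return postfix(node.body)
    if isinstance(node, ast.Name):          # rule 1: a variable
        return [node.id]
    if isinstance(node, ast.Constant):      # rule 1: a constant
        return [str(node.value)]
    if isinstance(node, ast.BinOp):         # rule 2: E1 op E2 -> E1' E2' op
        return postfix(node.left) + postfix(node.right) + [OPS[type(node.op)]]
    raise ValueError("unsupported expression")


tokens = postfix(ast.parse("(a+b)/(c-d)", mode="eval"))
print("".join(tokens))   # ab+cd-/
```

Returning a token list rather than a single string keeps multi-character names unambiguous; joining without spaces is done here only to match the notation used on the slide.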
Three-Address Code
A popular form of intermediate code used in optimizing compilers
Each instruction can have at most three operands
Assignments
x := y
x := y op z op: binary arithmetic or logical operators
x := op y op: unary operators (minus, negation, integer to
float conversion)
Branch
goto L    execute the statement labeled L next
Conditional Branch
if x relop y goto L    relop: <, >, <=, >=, ==, !=
if the condition holds, the statement labeled L is executed next
if the condition does not hold, the statement following this one is
executed next
Three-Address Code
Variables can be represented with their locations in the symbol table
Temporaries: temporaries correspond to the internal nodes of the
syntax tree
Example:
    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;
Three-address code:
        if x < y goto L1
        goto L2
    L1: t1 := 5 * y
        t2 := 5 * y
        t3 := t2 / 3
        x := t1 + t3
        goto L3
    L2: y := 5
    L3: x := x + y
Three-address code instructions can be represented as an array of
quadruples: operation, argument1, argument2, result
triples: operation, argument1, argument2
(each triple implicitly corresponds to a temporary)
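The difference between the two layouts can be made concrete with a small sketch. The type names Quad and Triple, the function to_triples, and the ("ref", i) encoding for "the value computed by triple i" are all assumptions made for illustration. Temporaries are identified by an explicit set passed in by the caller.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")


def to_triples(quads, temps):
    """Rewrite quadruples as triples; `temps` is the set of temporary names."""
    triples = []
    position = {}  # temporary name -> index of the triple that computes it

    def ref(arg):
        # A temporary becomes a reference to the triple that produced it.
        return ("ref", position[arg]) if arg in position else arg

    for q in quads:
        triples.append(Triple(q.op, ref(q.arg1), ref(q.arg2)))
        computed = len(triples) - 1
        if q.result in temps:
            position[q.result] = computed
        else:
            # A named variable needs an explicit assignment triple.
            triples.append(Triple(":=", q.result, ("ref", computed)))
    return triples


quads = [
    Quad("*", "5", "y", "t1"),
    Quad("*", "5", "y", "t2"),
    Quad("/", "t2", "3", "t3"),
    Quad("+", "t1", "t3", "x"),
]
triples = to_triples(quads, temps={"t1", "t2", "t3"})
for i, t in enumerate(triples):
    print(i, t)
```

The quadruples name every result explicitly, so instructions can be reordered freely. The triples drop the temporary names and refer to earlier results by position, which is more compact but ties each reference to the instruction order.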
Three-Address Code Generation for a Simple Grammar
Attributes:
E.place: location that holds the value of expression E
E.code: sequence of instructions generated for E
Procedures:
newtemp(): returns a new temporary each time it is called
gen(): generates an instruction (it must be called with the appropriate arguments)
lookup(id.name): returns the location of id from the symbol table
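A minimal sketch of how these attributes and procedures fit together is given below. It is not the slides' translation scheme: the class name TacGen, the tuple encoding of expression trees, and the argument order of gen() are assumptions, and lookup() simply returns the name itself instead of consulting a real symbol table. translate() returns the pair (E.place, E.code).

```python
class TacGen:
    """Generates three-address code from expression trees.

    A tree is a name (str), a unary node (op, operand),
    or a binary node (op, left, right).
    """

    def __init__(self):
        self.count = 0

    def newtemp(self):
        # Returns a new temporary each time it is called.
        self.count += 1
        return f"t{self.count}"

    def gen(self, result, arg1, op=None, arg2=None):
        # Formats a copy, a unary, or a binary instruction.
        if op is None:
            return f"{result} := {arg1}"
        if arg2 is None:
            return f"{result} := {op} {arg1}"
        return f"{result} := {arg1} {op} {arg2}"

    def lookup(self, name):
        # Stand-in for a symbol-table lookup (assumption: the name is the location).
        return name

    def translate(self, expr):
        """Return (place, code) for an expression tree."""
        if isinstance(expr, str):
            return self.lookup(expr), []
        if len(expr) == 2:
            op, operand = expr
            place1, code1 = self.translate(operand)
            place = self.newtemp()
            return place, code1 + [self.gen(place, place1, op)]
        op, left, right = expr
        place1, code1 = self.translate(left)
        place2, code2 = self.translate(right)
        place = self.newtemp()
        return place, code1 + code2 + [self.gen(place, place1, op, place2)]

    def assign(self, name, expr):
        place, code = self.translate(expr)
        return code + [self.gen(self.lookup(name), place)]


# a := b * -c + b * -c
term = ("*", "b", ("-", "c"))
code = TacGen().assign("a", ("+", term, term))
print("\n".join(code))
```

Each interior node of the tree receives one temporary, so the repeated subexpression b * -c is computed twice here. Sharing it would require building a DAG first.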
Example: three-address code for (A + B * C) + (-B) * A - B
    T1 := B * C
    T2 := A + T1
    T3 := - B
    T4 := T3 * A
    T5 := T2 + T4
    T6 := T5 - B
[Figure: syntax tree for the expression above]
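Example: D = ((A + B * C) + A * (B * C)) / (-C)
Three-address code that first copies A and C into temporaries:
    T1 := A
    T2 := C
    T3 := B * T2
    T4 := T1 + T3
    T5 := T1 * T3
    T6 := T4 + T5
    T7 := - T2
    T8 := T6 / T7
    D := T8
Three-address code that uses the operands directly:
    T1 := B * C
    T2 := A + T1
    T3 := A * T1
    T4 := T2 + T3
    T5 := - C
    T6 := T4 / T5
    D := T6
[Figure: graph for the statement above, with the leaves A, B and C each drawn once]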