Chapter 5 - Intermediate Code Generation


Agenda
• Introduction
• Why IR?
• Issues in Designing an IR
• Intermediate Representations
    Abstract Syntax Trees (AST)
    Directed Acyclic Graphs (DAG)
    Control Flow Graphs (CFG)
    Postfix notation
    Static Single Assignment Form (SSA)
    Stack Machine Code
    Three Address Code
Introduction

Intermediate code, also called an intermediate representation (IR) or
intermediate language, is a kind of abstract machine language that can
express the operations of the target machine without committing to too
many machine details.

Intermediate representation
• It ties the front and back ends together
• Language and machine neutral
    no limit on registers and memory, no machine-specific instructions
• Many forms
    syntax trees, three-address code, quadruples
• Intermediate code generation can affect the performance of the back end
Why IR?
• Portability - Suppose we have n source languages and m target
languages. Without intermediate code, each source language must be
translated directly into each target language, so every source-target
pair needs its own compiler and we require n*m compilers in total.
With intermediate code, we need n front ends that translate each
source language into intermediate code and m back ends that translate
intermediate code into each target language. Thus we require only
n+m compilers.
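The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `compilers_needed` is made up for this example, and the numbers correspond to four source languages and four target machines.

```python
def compilers_needed(n_sources: int, m_targets: int) -> tuple[int, int]:
    """Return (without_ir, with_ir): how many translators must be written."""
    without_ir = n_sources * m_targets   # one compiler per (source, target) pair
    with_ir = n_sources + m_targets      # n front ends plus m back ends
    return without_ir, with_ir

# Four source languages and four target machines.
print(compilers_needed(4, 4))  # (16, 8)
```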
[Figure: without an IR, each source language (C, Pascal, FORTRAN, C++) is connected directly to each target machine (SPARC, HP PA, x86, IBM PPC); with an IR, every source language and every target machine connects only to the IR in the middle.]

• Retargeting - Build a compiler for a new machine by attaching a new
code generator to an existing front end.
• Optimization - Reuse intermediate code optimizers in compilers for
different languages and different machines.
• Program understanding - Intermediate code is simple enough to be
converted easily to any target code, yet rich enough to represent all
the complex structures of a high-level language.
Issues in Designing an IR
• Whether to use an existing IR
    if the target machine architecture is similar
    if the new language is similar
• Designing a new IR needs to consider
    Level (how machine dependent it is)
    Structure
    Expressiveness
    Appropriateness for general and special optimizations
    Appropriateness for code generation
    Whether multiple IRs should be used
Intermediate Representations
Intermediate representations can be expressed using the following forms,
listed from high-level to low-level:

Graphical IRs
• Abstract Syntax Trees (AST)
• Directed Acyclic Graphs (DAG)
• Control Flow Graphs (CFG)
Linear IRs
• Postfix notation
• Three Address Code
• Static Single Assignment Form (SSA)
• Stack Machine Code

Hybrid approaches mix graphical and linear representations
• SGI and SUN compilers use three-address code but provide ASTs
for loops, if-statements and array references
• Use three-address code in basic blocks in control flow graphs
Abstract Syntax Trees (ASTs)
ASTs retain the essential structure of the parse tree, eliminating
unneeded nodes.

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

AST for this fragment (each node is written as node(children)):

    Statements
        IfStmt
            <(x, y)
            AssignStmt(x, +( *(5, y), /( *(5, y), 3 ) ))
            AssignStmt(y, 5)
        AssignStmt(x, +(x, y))
Directed Acyclic Graphs (DAGs)
• Like compressed trees
    leaves: variables, constants available on entry
    internal nodes: operators, possibly annotated with variable names
    distinct left/right children
• Used for basic blocks (DAGs don't show control flow)
• Can generate efficient code
    Note: DAGs encode common subexpressions
• But difficult to transform
• Good for analysis
Generating DAGs
• Check whether each operand is already present in the DAG
    if not, create a leaf for it
• Check whether the operands already have a common parent that
represents the same operation
    if not, create one
• Label the node that represents the result with the name of the
destination variable, and remove that label from all other nodes
in the DAG
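The procedure above can be sketched in Python as follows. This is a minimal illustration, not a production algorithm: the class name `DagBuilder`, the tuple encoding of expressions such as `("*", "b", ("neg", "c"))`, and the use of a dictionary keyed by node signature are all assumptions made for this example.

```python
class DagBuilder:
    """Builds a DAG for the expressions of one basic block (hypothetical helper)."""

    def __init__(self):
        self.nodes = []    # node id -> ("leaf", name) or (op, child ids...)
        self.index = {}    # node signature -> node id, used to find existing nodes
        self.labels = {}   # variable name -> id of the node holding its value

    def _intern(self, signature):
        # Reuse the node if one with the same operator and children exists.
        if signature not in self.index:
            self.index[signature] = len(self.nodes)
            self.nodes.append(signature)
        return self.index[signature]

    def node(self, expr):
        if isinstance(expr, str):
            # A variable assigned earlier in the block refers to its labelled node.
            if expr in self.labels:
                return self.labels[expr]
            return self._intern(("leaf", expr))
        op, *operands = expr
        children = tuple(self.node(operand) for operand in operands)
        return self._intern((op, *children))

    def assign(self, target, expr):
        node_id = self.node(expr)
        # A dict holds one node per name, so re-labelling removes the old label.
        self.labels[target] = node_id
        return node_id


dag = DagBuilder()
term = ("*", "b", ("neg", "c"))
dag.assign("a", ("+", term, term))       # a := b * -c + b * -c
print(len(dag.nodes))                    # 5 nodes; the plain tree needs 9
print(dag.nodes[dag.labels["a"]])        # ('+', 3, 3): both operands are node 3
```

Because both occurrences of `b * -c` map to the same signature, the second one is found in `index` and no new node is created, which is how the DAG encodes the common subexpression.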
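ASTs and DAGs for a := b * -c + b * -c

    AST:  :=(a, +( *(b, -(c)), *(b, -(c)) ))
          the subtree *(b, -(c)) appears twice
    DAG:  :=(a, +(n, n)), where n = *(b, -(c))
          the node n is built once and both operands of + point to it
    (- is the unary minus)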
Directed Acyclic Graphs (DAGs)…
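• Use directed acyclic graphs to represent expressions
• Use a unique node for each expression

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

DAG for this fragment (each node is written as node(children)):

    Statements
        IfStmt
            <(x, y)
            AssignStmt(x, +(n, /(n, 3))), where n = *(5, y) is one shared node
            AssignStmt(y, 5)
        AssignStmt(x, +(x, y))

The leaves x, y and 5 are likewise single nodes shared by all their uses.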
Control Flow Graphs (CFGs)
• Nodes in the control flow graph are basic blocks
    A basic block is a sequence of statements that is always entered at
    the beginning of the block and exited at the end
• Edges in the control flow graph represent the control flow

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

Control flow graph for this fragment:

    B0: if (x < y) goto B1 else goto B2
    B1: x = 5*y + 5*y/3
    B2: y = 5
    B3: x = x+y
    Edges: B0 → B1, B0 → B2, B1 → B3, B2 → B3

• Each block has a sequence of statements
• No jump from or to the middle of the block
• Once a block starts executing, it will execute till the end
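One common way to find the basic blocks of a linear instruction list is to mark "leaders": the first instruction, every jump target, and every instruction that follows a jump. The sketch below applies that idea to a small three-address program for the fragment above. The string format of the instructions and the helper names `split_label` and `basic_blocks` are assumptions made for this illustration.

```python
def split_label(line):
    """Split 'L1: x := 1' into ('L1', 'x := 1'); unlabelled lines give (None, text)."""
    head, sep, rest = line.partition(":")
    if sep and not rest.startswith("="):   # the ':' of a label, not of ':='
        return head.strip(), rest.strip()
    return None, line.strip()


def basic_blocks(code):
    parsed = [split_label(line) for line in code]
    # Every label named after a 'goto' is a jump target.
    targets = {text.split()[-1] for _, text in parsed if "goto" in text.split()}
    leaders = {0}                                   # the first instruction
    for i, (label, text) in enumerate(parsed):
        if label in targets:
            leaders.add(i)                          # target of a jump
        if "goto" in text.split() and i + 1 < len(parsed):
            leaders.add(i + 1)                      # instruction after a jump
    starts = sorted(leaders)
    return [code[a:b] for a, b in zip(starts, starts[1:] + [len(code)])]


code = [
    "if x < y goto L1",
    "goto L2",
    "L1: t1 := 5 * y",
    "t2 := 5 * y",
    "t3 := t2 / 3",
    "x := t1 + t3",
    "goto L3",
    "L2: y := 5",
    "L3: x := x + y",
]
for block in basic_blocks(code):
    print(block)
```

In this linear form the conditional jump and the `goto L2` that follows it land in two separate one-instruction blocks; together they play the role of B0 in the graph above.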
Postfix Notation (PN)
A mathematical notation in which every operator follows all
of its operands.
Example: the PN of the expression 9*(5+2) is 952+*

Formation rules:
1. If E is a variable or constant, the PN of E is E itself.
2. If E is an expression of the form E1 op E2, the PN of E is
E1' E2' op, where E1' and E2' are the PN of E1 and E2, respectively.
3. If E is a parenthesized expression of the form (E1), the PN of E is
the same as the PN of E1.
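The three rules translate almost directly into a recursive function. In this sketch an expression is encoded as a nested tuple `(E1, op, E2)`, which is an assumption of the example; parentheses do not appear in that encoding because grouping is already expressed by the nesting, so rule 3 needs no code. The small evaluator, which handles single-digit operands only, is included to show that the postfix string can be evaluated with a stack.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def to_postfix(expr):
    if isinstance(expr, str):            # rule 1: variable or constant
        return expr
    left, op, right = expr               # rule 2: E1 op E2 becomes E1' E2' op
    return to_postfix(left) + to_postfix(right) + op


def eval_postfix(postfix):
    """Evaluate a postfix string whose operands are single digits."""
    stack = []
    for symbol in postfix:
        if symbol in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[symbol](left, right))
        else:
            stack.append(int(symbol))
    return stack.pop()


first = to_postfix(("9", "*", ("5", "+", "2")))
second = to_postfix((("a", "+", "b"), "/", ("c", "-", "d")))
print(first, second, eval_postfix(first))   # 952+* ab+cd-/ 63
```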
Example:
• The PN of the expression (a+b)/(c-d) is ab+cd-/
Three-Address Code
A popular form of intermediate code used in optimizing compilers.
Each instruction can have at most three operands.
• Assignments
    x := y
    x := y op z      op: binary arithmetic or logical operators
    x := op y        op: unary operators (minus, negation, integer-to-float conversion)
• Branch
    goto L           execute the statement labeled L next
• Conditional branch
    if x relop y goto L      relop: <, >, <=, >=, ==, !=
    if the condition holds, the statement labeled L is executed next;
    if the condition does not hold, the statement following this
    statement is executed next
Three-Address Code
• Variables can be represented with their locations in the symbol table
• Temporaries: temporaries correspond to the internal nodes of the
syntax tree

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

Three-address code for this fragment:

        if x < y goto L1
        goto L2
    L1: t1 := 5 * y
        t2 := 5 * y
        t3 := t2 / 3
        x := t1 + t3
        goto L3
    L2: y := 5
    L3: x := x + y

• Three-address code instructions can be represented as an array of
    quadruples: operation, argument1, argument2, result
    triples: operation, argument1, argument2
    (each triple implicitly corresponds to a temporary)
Three-Address Code Generation for a Simple Grammar
Attributes:  E.place: location that holds the value of expression E
             E.code: sequence of instructions that are generated for E
Procedures:  newtemp(): returns a new temporary each time it is called
             gen(): generates an instruction (must be called with the appropriate arguments)
             lookup(id.name): returns the location of id from the symbol table

Productions      Semantic Rules

S → id := E      id.place ← lookup(id.name);
                 S.code ← E.code || gen(id.place ':=' E.place);
E → E1 + E2      E.place ← newtemp();
                 E.code ← E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place);
E → E1 * E2      E.place ← newtemp();
                 E.code ← E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place);
E → ( E1 )       E.code ← E1.code;
                 E.place ← E1.place;
E → - E1         E.place ← newtemp();
                 E.code ← E1.code || gen(E.place ':=' 'uminus' E1.place);
E → id           E.place ← lookup(id.name);
                 E.code ← '' (empty string)
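The semantic rules above can be sketched as a recursive walk over an expression tree. This is an illustration under several assumptions: expressions are nested tuples such as `("+", e1, e2)` and `("uminus", e1)`, identifier names stand in for the symbol-table locations that `lookup` would return, and instructions are appended to one list instead of concatenating `E.code` strings, which gives the same order for this grammar. The class name `TacGen` is hypothetical.

```python
import itertools


class TacGen:
    def __init__(self):
        self._ids = itertools.count(1)
        self.code = []                      # generated instructions, in order

    def newtemp(self):
        return f"t{next(self._ids)}"

    def gen(self, text):
        self.code.append(text)

    def expr(self, e):
        """Emit the code for e and return E.place."""
        if isinstance(e, str):              # E -> id: no code, place is the name
            return e
        if e[0] == "uminus":                # E -> - E1
            place1 = self.expr(e[1])
            place = self.newtemp()
            self.gen(f"{place} := uminus {place1}")
            return place
        op, e1, e2 = e                      # E -> E1 + E2 | E1 * E2
        place1 = self.expr(e1)
        place2 = self.expr(e2)
        place = self.newtemp()
        self.gen(f"{place} := {place1} {op} {place2}")
        return place

    def assign(self, name, e):              # S -> id := E
        place = self.expr(e)
        self.gen(f"{name} := {place}")


term = ("*", "b", ("uminus", "c"))
generator = TacGen()
generator.assign("a", ("+", term, term))    # a := b * -c + b * -c
print("\n".join(generator.code))
```

Parentheses need no case of their own here, because in a tuple tree the rule E → ( E1 ) simply passes E1.place and E1.code through unchanged.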
Syntax tree vs. Three address code
Expression: (A+B*C) + (-B*A) - B

Syntax tree (each node is written as node(children)):

    -( +( +(A, *(B, C)), *(-(B), A) ), B )

Three-address code:

    T1 := B * C
    T2 := A + T1
    T3 := - B
    T4 := T3 * A
    T5 := T2 + T4
    T6 := T5 - B

• Three-address code is a linearized representation of a syntax tree
(or a DAG) in which explicit names (temporaries) correspond to
the interior nodes of the graph.
DAG vs. Three address code
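Expression: D = ((A+B*C) + (A*B*C)) / -C

DAG (each node is written as node(children)):

    =(D, /( +( +(A, n), *(A, n) ), -(C) )), where n = *(B, C) is one shared node
    and the leaves A, B and C each appear once

Two three-address code sequences for this DAG:

    Sequence 1            Sequence 2
    T1 := A               T1 := B * C
    T2 := C               T2 := A + T1
    T3 := B * T2          T3 := A * T1
    T4 := T1 + T3         T4 := T2 + T3
    T5 := T1 * T3         T5 := - C
    T6 := T4 + T5         T6 := T4 / T5
    T7 := - T2            D := T6
    T8 := T6 / T7
    D := T8

Question: Which IR code sequence is better?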
Implementation of Three Address Code
Quadruples
Four fields: op, arg1, arg2, result
• Array of struct {op, *arg1, *arg2, *result}
• x := y op z is represented as op y, z, x
• arg1, arg2 and result are usually pointers to symbol table entries.
• May need to use many temporary names.
• Many assembly instructions are like quadruples, but arg1, arg2,
and result are real registers.
Example:
a = b * -c + b * -c

        op       arg1   arg2   result
    (0) uminus   c             t1
    (1) *        b      t1     t2
    (2) uminus   c             t3
    (3) *        b      t3     t4
    (4) +        t2     t4     t5
    (5) :=       t5            a
Implementation of Three Address Code…
Triples
Three fields: op, arg1, and arg2. The result becomes implicit.
• arg1 and arg2 are either pointers to the symbol table or
indexes/pointers into the triple structure.
Example: d = a + (b*c)
    1   *        b, c
    2   +        a, (1)
    3   assign   d, (2)
• No explicit temporary names are used.
• More than one entry is needed for ternary operations such as
x := y[i], a = b + c, x[i] = y, etc.
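The relationship between the two layouts can be illustrated by converting a list of quadruples into triples: every temporary name is replaced by the position of the triple that computes it. This is a sketch under stated assumptions: the `Quad` type and the function `quads_to_triples` are names invented for this example, positions are 1-based to match the table above, and every non-copy quadruple is assumed to write to a temporary.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


def quads_to_triples(quads):
    position = {}   # temporary name -> 1-based index of the triple computing it
    triples = []

    def ref(arg):
        # A temporary becomes a reference such as "(1)"; other names stay as is.
        return f"({position[arg]})" if arg in position else arg

    for quad in quads:
        if quad.op == ":=":
            # A copy into a variable becomes an explicit 'assign' triple.
            triples.append(("assign", quad.result, ref(quad.arg1)))
        else:
            position[quad.result] = len(triples) + 1
            triples.append((quad.op, ref(quad.arg1), ref(quad.arg2)))
    return triples


quads = [                       # d = a + (b*c)
    Quad("*", "b", "c", "t1"),
    Quad("+", "a", "t1", "t2"),
    Quad(":=", "t2", None, "d"),
]
triples = quads_to_triples(quads)
for number, triple in enumerate(triples, start=1):
    print(number, *triple)
```

The temporaries t1 and t2 disappear from the output, which is the sense in which each triple implicitly corresponds to a temporary.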
Static Single-Assignment Form
Static single-assignment (SSA) form facilitates certain code optimizations.
Two distinctive aspects distinguish SSA from three-address code.
• First, all assignments in SSA are to variables with distinct
names; hence the term static single-assignment.
• Second, SSA uses a notational convention called the φ-function to
combine two or more definitions of a variable. For example, the same
variable may be defined in two different control-flow paths in a program.
Static Single-Assignment Form
    if ( flag ) x = -1; else x = 1;
    y = x * a;
If we use different names for x in the true part and the false part
of the conditional statement, which name should we use in
the assignment y = x * a?
In this case a φ-function is used to combine the two definitions of x:
    if ( flag ) x1 = -1; else x2 = 1;
    x3 = φ(x1, x2);
    y = x3 * a;
• The φ-function returns the value of its argument that corresponds to the
control-flow path that was taken to get to the assignment statement
containing the φ-function.
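The meaning of the φ-function can be made concrete with a tiny interpreter that remembers which block control came from. This is a sketch for illustration only: the dictionary layout of `program`, the block names, and the function `run_ssa` are assumptions of this example, not a standard representation.

```python
def run_ssa(blocks, inputs, entry="entry"):
    env = dict(inputs)
    previous, current = None, entry
    while True:
        block = blocks[current]
        # A phi-function selects the operand that belongs to the predecessor
        # block through which control reached the current block.
        for target, choices in block.get("phis", []):
            env[target] = env[choices[previous]]
        for target, compute in block.get("body", []):
            env[target] = compute(env)
        kind, *args = block["term"]
        if kind == "ret":
            return env[args[0]]
        if kind == "jmp":
            previous, current = current, args[0]
        else:                                   # ("br", cond, if_true, if_false)
            cond, if_true, if_false = args
            previous, current = current, (if_true if env[cond] else if_false)


# if (flag) x1 = -1; else x2 = 1;  x3 = phi(x1, x2);  y = x3 * a;
program = {
    "entry": {"term": ("br", "flag", "then", "else")},
    "then": {"body": [("x1", lambda env: -1)], "term": ("jmp", "join")},
    "else": {"body": [("x2", lambda env: 1)], "term": ("jmp", "join")},
    "join": {
        "phis": [("x3", {"then": "x1", "else": "x2"})],
        "body": [("y", lambda env: env["x3"] * env["a"])],
        "term": ("ret", "y"),
    },
}
print(run_ssa(program, {"flag": True, "a": 10}))    # -10
print(run_ssa(program, {"flag": False, "a": 10}))   # 10
```

Only one of x1 and x2 exists on any given run, and the φ-function never reads the other one, because it looks up the name associated with the block that was actually executed.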
Stack Machine Code
• Assumes presence of operand stack
• Useful for stack architectures, JVM
• Operations typically pop operands and push results.
• Easy code generation and has compact form
• But difficult to reuse expressions and to rearrange

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

Stack machine code for this fragment:

        load x        pushes the value at the location x to the stack
        load y
        iflt L1       pops the top two elements and compares them
        goto L2
    L1: push 5
        load y
        multiply      pops the top two elements, multiplies them, and
                      pushes the result back to the stack
        push 5
        load y
        multiply
        push 3
        divide
        add
        store x       stores the value at the top of the stack to the location x
        goto L3
    L2: push 5
        store y
    L3: load x
        load y
        add
        store x
Stack Machine Code Generation for a Simple Grammar

Attributes:  E.code: sequence of instructions that are generated for E
             (no place for an expression is needed since the result of an
             expression is stored in the operand stack)
Procedures:  newtemp(): returns a new temporary each time it is called
             gen(): generates an instruction (must be called with the appropriate arguments)
             lookup(id.name): returns the location of id from the symbol table

Productions      Semantic Rules

S → id := E      id.place ← lookup(id.name);
                 S.code ← E.code || gen('store' id.place);
E → E1 + E2      E.code ← E1.code || E2.code || gen('add');
                 (arguments for the add instruction are at the top of the stack)
E → E1 * E2      E.code ← E1.code || E2.code || gen('multiply');
E → ( E1 )       E.code ← E1.code;
E → - E1         E.code ← E1.code || gen('negate');
E → id           E.code ← gen('load' id.place)
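As a sketch of how these rules behave, the Python below generates stack code for an expression tree and then runs it on a small stack interpreter. The tuple encoding of expressions, the instruction tuples such as `("load", "b")`, and the function names are assumptions made for this example; identifier names stand in for the locations that `lookup` would return.

```python
def gen_expr(e, code):
    if isinstance(e, str):                  # E -> id
        code.append(("load", e))
    elif e[0] == "uminus":                  # E -> - E1
        gen_expr(e[1], code)
        code.append(("negate",))
    else:                                   # E -> E1 + E2 | E1 * E2
        op, e1, e2 = e
        gen_expr(e1, code)
        gen_expr(e2, code)
        code.append(("add",) if op == "+" else ("multiply",))


def gen_assign(name, e):                    # S -> id := E
    code = []
    gen_expr(e, code)
    code.append(("store", name))
    return code


def run(code, memory):
    """Execute the instructions with an operand stack; returns the memory."""
    stack = []
    for instruction in code:
        op = instruction[0]
        if op == "load":
            stack.append(memory[instruction[1]])
        elif op == "store":
            memory[instruction[1]] = stack.pop()
        elif op == "negate":
            stack.append(-stack.pop())
        else:                               # add or multiply
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left * right)
    return memory


term = ("*", "b", ("uminus", "c"))
stack_code = gen_assign("a", ("+", term, term))     # a := b * -c + b * -c
for instruction in stack_code:
    print(*instruction)
print(run(stack_code, {"b": 2, "c": 3})["a"])       # -12
```

No temporaries are created: each operator finds its operands on top of the stack, which is why the rules above need no E.place attribute.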
Translation of Expressions
For a = b * -c + b * -c
the following code is generated:
    t1 = -c
    t2 = b * t1
    t3 = -c
    t4 = b * t3
    t5 = t2 + t4
    a = t5
Reading Assignment
Translation of Control flow
Translation of Declarations
Translation of Procedure call/return
