Lec05 Intermediate Code Generation


INTERMEDIATE CODE GENERATION

4TH PHASE OF COMPILER CONSTRUCTION

SECTION 5.1: INTERMEDIATE CODE GENERATION

INTERMEDIATE CODE GENERATION

Intermediate code is machine-independent, yet close to machine instructions.
• The intermediate code generator converts the program in the source language into an equivalent program in an intermediate language.
Benefits of using a machine-independent intermediate form:
• Retargeting is facilitated: a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
• A machine-independent code optimizer can be applied to the intermediate representation.
Three kinds of intermediate representation:
• Graphical representations (syntax trees)
• Postfix notation (operations on values stored on an operand stack; similar to JVM bytecode)
• Three-address code (quadruples or triples)
ABSTRACT SYNTAX TREE
• A syntax tree depicts the natural hierarchical structure of a source program. A DAG (Directed Acyclic Graph) gives the same information but in a more compact way because common subexpressions are identified.

a := b * -c + b * -c

[Figure: syntax tree (left) and DAG (right) for a := b * -c + b * -c]
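
One common way to obtain a DAG instead of a tree is to check, before creating a node, whether an identical node already exists. The sketch below illustrates that idea for the assignment above; the names make_node, nodes and index are hypothetical and do not come from the slides.

```python
nodes = []   # node id -> (op, left, right)
index = {}   # (op, left, right) -> node id, used to detect repeated subexpressions

def make_node(op, left=None, right=None):
    """Return the id of an existing identical node, or create a new one."""
    key = (op, left, right)
    if key not in index:
        index[key] = len(nodes)
        nodes.append(key)
    return index[key]

# Build a := b * -c + b * -c exactly as written, with b * -c built twice.
first = make_node("*", make_node("id:b"), make_node("uminus", make_node("id:c")))
second = make_node("*", make_node("id:b"), make_node("uminus", make_node("id:c")))
total = make_node("+", first, second)
root = make_node("assign", make_node("id:a"), total)

print(first == second, len(nodes))
```

Both occurrences of b * -c map to the same node, so the DAG has 7 nodes where the full syntax tree would have 11.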
ABSTRACT SYNTAX TREE
• An AST omits syntactic clutter: it represents the parsed string in a structured way and discards information that matters for parsing the string but is not needed for analyzing it.
• Abstract syntax trees, or simply syntax trees, differ from parse trees because superficial distinctions of form, unimportant for translation, do not appear in syntax trees.
For the grammar
E → E + T | T
T → T x F | F
F → ( E ) | id
parse the string ‘id + id x id’.

ABSTRACT SYNTAX TREES

[Figure: parse tree for a * (b + c) annotated with E.nptr attributes, and the resulting syntax tree: * with children a and +, where + has children b and c]

Pro: easy restructuring of code and/or expressions for intermediate code optimization
Cons: memory intensive
POSTFIX NOTATION

a := b * -c + b * -c
a b c uminus * b c uminus * + assign

Postfix notation represents operations on a stack.

Bytecode (for example):
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iadd // +
istore 1 // store a

Pro: easy to generate
Cons: stack operations are more difficult to optimize
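
To see why postfix code maps so directly onto a stack, the following sketch evaluates the postfix string above with an explicit operand stack. The function names and the sample values for b and c are assumptions made for illustration only.

```python
def value(x, env):
    """Look up a name in env; numbers are returned unchanged."""
    return env[x] if isinstance(x, str) else x

def eval_postfix(tokens, env):
    """Evaluate a postfix token list; 'assign' stores into env."""
    stack = []
    for tok in tokens:
        if tok == "uminus":
            stack.append(-value(stack.pop(), env))
        elif tok in ("*", "+"):
            right = value(stack.pop(), env)
            left = value(stack.pop(), env)
            stack.append(left * right if tok == "*" else left + right)
        elif tok == "assign":
            rhs = value(stack.pop(), env)
            target = stack.pop()          # the name to assign to, not its value
            env[target] = rhs
        else:
            stack.append(tok)             # operand: push the name itself
    return env

env = eval_postfix("a b c uminus * b c uminus * + assign".split(), {"b": 3, "c": 4})
print(env["a"])
```

With b = 3 and c = 4, each b * -c evaluates to -12, so a receives -24.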
THREE-ADDRESS CODE

Three-address code is a sequence of statements of the general form
x := y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on Boolean-valued data. Thus a source language expression like x + y*z might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.
Advantages of three-address code:
• The unraveling of complicated arithmetic expressions and of nested flow-of-control statements makes three-address code desirable for target code generation and optimization.
• The use of names for the intermediate values computed by a program allows three-address code to be easily rearranged, unlike postfix notation.
THREE-ADDRESS CODE (QUADRUPLES)

a := b * -c + b * -c

Linearized representation of a syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Linearized representation of a syntax DAG:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
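
The two listings can be reproduced with a small recursive walk. This is a minimal sketch, assuming expressions are given as nested tuples; linearize and its reuse flag are hypothetical names. With reuse enabled, a repeated subexpression gets its earlier temporary back, which mimics walking the DAG. Temporaries are numbered consecutively here, so the DAG version ends in t3 rather than the t5 shown above.

```python
import itertools

def linearize(expr, reuse):
    """Emit three-address statements for a name, ("uminus", e) or (op, e1, e2)."""
    code, seen, counter = [], {}, itertools.count(1)

    def walk(e):
        if isinstance(e, str):
            return e                      # a name is its own place
        if reuse and e in seen:
            return seen[e]                # common subexpression: reuse its temporary
        args = [walk(sub) for sub in e[1:]]
        temp = f"t{next(counter)}"
        if e[0] == "uminus":
            code.append(f"{temp} := - {args[0]}")
        else:
            code.append(f"{temp} := {args[0]} {e[0]} {args[1]}")
        seen[e] = temp
        return temp

    place = walk(expr)
    return code, place

term = ("*", "b", ("uminus", "c"))
tree_code, tree_place = linearize(("+", term, term), reuse=False)
dag_code, dag_place = linearize(("+", term, term), reuse=True)
tree_code.append(f"a := {tree_place}")
dag_code.append(f"a := {dag_place}")
print("\n".join(tree_code))
print("\n".join(dag_code))
```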
COMMON THREE-ADDRESS STATEMENTS

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.

2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.

3. Copy statements of the form x := y where the value of y is assigned to x.

4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.

5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator ( <, =, >=, etc. ) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
COMMON THREE-ADDRESS STATEMENTS
6. param x and call p, n for procedure calls, and return y, where y, representing a returned value, is optional. For example,
param x1
param x2
...
param xn
call p, n
is generated as part of a call of the procedure p(x1, x2, ..., xn).

7. Indexed assignments of the form x := y[i] and x[i] := y.

8. Address and pointer assignments of the form x := &y, x := *y, and *x := y.

THREE-ADDRESS STATEMENTS

Binary Operator: op y,z,result or result := y op z


where op is a binary arithmetic or logical operator. This binary operator is
applied to y and z, and the result of the operation is stored in result.
Ex: add a,b,c
gt a,b,c
addr a,b,c
addi a,b,c

Unary Operator: op y,,result or result := op y


where op is a unary arithmetic or logical operator. This unary operator is applied
to y, and the result of the operation is stored in result.
Ex: uminus a,,c
not a,,c
inttoreal a,,c
THREE-ADDRESS STATEMENTS (CONT.)

Move Operator: mov y,,result or result := y


where the content of y is copied into result.
Ex: mov a,,c
movi a,,c
movr a,,c

Unconditional Jumps: jmp ,,L or goto L


Jump to the three-address code with the label L, and the execution continues from
that statement.
Ex: jmp ,,L1 // jump to L1
jmp ,,7 // jump to the statement 7
THREE-ADDRESS STATEMENTS (CONT.)
Conditional Jumps: jmprelop y,z,L or if y relop z goto L
Jump to the three-address statement with the label L if the result of y relop z is true, and continue execution from that statement. If the result is false, execution continues from the statement following this conditional jump.
Ex: jmpgt y,z,L1 // jump to L1 if y>z
jmpgte y,z,L1 // jump to L1 if y>=z
jmpe y,z,L1 // jump to L1 if y==z
jmpne y,z,L1 // jump to L1 if y!=z

The relational operator can also be unary:
jmpnz y,,L1 // jump to L1 if y is not zero
jmpz y,,L1 // jump to L1 if y is zero
jmpt y,,L1 // jump to L1 if y is true
jmpf y,,L1 // jump to L1 if y is false
THREE-ADDRESS STATEMENTS (CONT.)

Procedure Parameters: param x,, or param x
Procedure Calls: call p,n, or call p,n
where x is an actual parameter and the procedure p is invoked with n parameters.

Ex: p(x1,...,xn) ➔
param x1,,
param x2,,
...
param xn,,
call p,n,

Ex: f(x+1,y) ➔
add x,1,t1
param t1,,
param y,,
call f,2,
THREE-ADDRESS STATEMENTS (CONT.)

Indexed Assignments:
move y[i],,x or x := y[i]
move x,,y[i] or y[i] := x

Address and Pointer Assignments:


moveaddr y,,x or x := &y
movecont y,,x or x := *y

SYNTAX-DIRECTED TRANSLATION INTO THREE-ADDRESS CODE

• Use attributes:
  • E.place: the name that will hold the value of E.
    An identifier is assumed to already have its place attribute defined.
  • E.code: holds the three-address statements that evaluate E (this is the ‘translation’ attribute).
• Use the function newtemp, which returns a new temporary variable that can be used.
• Use the function gen to generate a single three-address statement, given the necessary information (variable names and operations).
SYNTAX-DIRECTED TRANSLATION INTO THREE-ADDRESS CODE

S → id := E    S.code = E.code || gen(‘mov’ E.place ‘,,’ id.place)

E → E1 + E2    E.place = newtemp();
               E.code = E1.code || E2.code || gen(‘add’ E1.place ‘,’ E2.place ‘,’ E.place)
E → E1 * E2    E.place = newtemp();
               E.code = E1.code || E2.code || gen(‘mult’ E1.place ‘,’ E2.place ‘,’ E.place)
E → - E1       E.place = newtemp();
               E.code = E1.code || gen(‘uminus’ E1.place ‘,,’ E.place)
E → ( E1 )     E.place = E1.place;
               E.code = E1.code
E → id         E.place = id.place;
               E.code = null
SYNTAX-DIRECTED TRANSLATION (CONT.)

S → while E do S1
    S.begin = newlabel();
    S.after = newlabel();
    S.code = gen(S.begin ‘:’) || E.code ||
             gen(‘jmpf’ E.place ‘,,’ S.after) || S1.code ||
             gen(‘jmp’ ‘,,’ S.begin) ||
             gen(S.after ‘:’)

S → if E then S1 else S2
    S.else = newlabel();
    S.after = newlabel();
    S.code = E.code ||
             gen(‘jmpf’ E.place ‘,,’ S.else) || S1.code ||
             gen(‘jmp’ ‘,,’ S.after) ||
             gen(S.else ‘:’) || S2.code ||
             gen(S.after ‘:’)
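
The place and code attributes translate naturally into functions that return a (place, code) pair. The sketch below is one possible rendering of the expression, assignment and while rules, assuming statements and expressions arrive as nested tuples; the tuple encoding and the names expr and stmt are hypothetical, while newtemp, newlabel and gen play the roles described on the slides.

```python
import itertools

_temps = itertools.count(1)
_labels = itertools.count(1)

def newtemp():
    return f"t{next(_temps)}"

def newlabel():
    return f"L{next(_labels)}"

def gen(op, arg1="", arg2="", result=""):
    return f"{op} {arg1},{arg2},{result}"

def expr(e):
    """Return (E.place, E.code) for a name, ('uminus', e1) or (op, e1, e2)."""
    if isinstance(e, str):
        return e, []                                   # E -> id
    if e[0] == "uminus":                               # E -> - E1
        p1, c1 = expr(e[1])
        place = newtemp()
        return place, c1 + [gen("uminus", p1, "", place)]
    p1, c1 = expr(e[1])                                # E -> E1 op E2
    p2, c2 = expr(e[2])
    place = newtemp()
    return place, c1 + c2 + [gen(e[0], p1, p2, place)]

def stmt(s):
    """Return S.code for ('assign', id, e) or ('while', e, s1)."""
    if s[0] == "assign":
        place, code = expr(s[2])
        return code + [gen("mov", place, "", s[1])]
    if s[0] == "while":
        begin, after = newlabel(), newlabel()
        place, code = expr(s[1])
        return ([f"{begin}:"] + code + [gen("jmpf", place, "", after)]
                + stmt(s[2]) + [gen("jmp", "", "", begin), f"{after}:"])
    raise ValueError(f"unknown statement {s[0]!r}")

loop = stmt(("while", ("lt", "x", "y"), ("assign", "x", ("add", "x", "1"))))
print("\n".join(loop))
```

List concatenation stands in for the || operator. Copying code lists at every level is simple but wasteful, which is one motivation for emitting statements directly, as the translation schemes that follow do.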
TRANSLATION SCHEME TO PRODUCE THREE-ADDRESS CODE
S → id := E { p= lookup(id.name);
              if (p is not nil) then emit(‘mov’ E.place ‘,,’ p)
              else error(“undefined-variable”) }
E → E1 + E2 { E.place = newtemp();
              emit(‘add’ E1.place ‘,’ E2.place ‘,’ E.place) }
E → E1 * E2 { E.place = newtemp();
              emit(‘mult’ E1.place ‘,’ E2.place ‘,’ E.place) }
E → - E1 { E.place = newtemp();
           emit(‘uminus’ E1.place ‘,,’ E.place) }
E → ( E1 ) { E.place = E1.place; }
E → id { p= lookup(id.name);
         if (p is not nil) then E.place = id.place
         else error(“undefined-variable”) }
TRANSLATION SCHEME WITH LOCATIONS
S → id := { E.inloc = S.inloc } E
    { p = lookup(id.name);
      if (p is not nil) then { emit(E.outloc ‘mov’ E.place ‘,,’ p); S.outloc=E.outloc+1 }
      else { error(“undefined-variable”); S.outloc=E.outloc } }

E → { E1.inloc = E.inloc } E1 + { E2.inloc = E1.outloc } E2
    { E.place = newtemp(); emit(E2.outloc ‘add’ E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }

E → { E1.inloc = E.inloc } E1 * { E2.inloc = E1.outloc } E2
    { E.place = newtemp(); emit(E2.outloc ‘mult’ E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }

E → - { E1.inloc = E.inloc } E1
    { E.place = newtemp(); emit(E1.outloc ‘uminus’ E1.place ‘,,’ E.place); E.outloc=E1.outloc+1 }

E → ( { E1.inloc = E.inloc } E1 ) { E.place = E1.place; E.outloc=E1.outloc }

E → id { E.outloc = E.inloc; p= lookup(id.name);
         if (p is not nil) then E.place = id.place
         else error(“undefined-variable”) }
BOOLEAN EXPRESSIONS
E → { E1.inloc = E.inloc } E1 and { E2.inloc = E1.outloc } E2
    { E.place = newtemp(); emit(E2.outloc ‘and’ E1.place ‘,’ E2.place ‘,’ E.place);
      E.outloc=E2.outloc+1 }

E → { E1.inloc = E.inloc } E1 or { E2.inloc = E1.outloc } E2
    { E.place = newtemp(); emit(E2.outloc ‘or’ E1.place ‘,’ E2.place ‘,’ E.place);
      E.outloc=E2.outloc+1 }

E → not { E1.inloc = E.inloc } E1
    { E.place = newtemp(); emit(E1.outloc ‘not’ E1.place ‘,,’ E.place); E.outloc=E1.outloc+1 }

E → { E1.inloc = E.inloc } E1 relop { E2.inloc = E1.outloc } E2
    { E.place = newtemp();
      emit(E2.outloc relop.code E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }

TRANSLATION SCHEME (CONT.)

S → while { E.inloc = S.inloc } E do
    { emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
      S1.inloc=E.outloc+1; } S1
    { emit(S1.outloc ‘jmp’ ‘,,’ S.inloc);
      S.outloc=S1.outloc+1;
      backpatch(E.outloc,S.outloc); }

S → if { E.inloc = S.inloc } E then
    { emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
      S1.inloc=E.outloc+1; } S1 else
    { emit(S1.outloc ‘jmp’ ‘,,’ ‘NOTKNOWN’);
      S2.inloc=S1.outloc+1;
      backpatch(E.outloc,S2.inloc); } S2
    { S.outloc=S2.outloc;
      backpatch(S1.outloc,S.outloc); }
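
Backpatching can be tried out with a plain list as the code array. The following is a minimal sketch of the while rule only, assuming the condition and the body are supplied as functions that emit their own statements and report their outloc; gen_while, cond and body are hypothetical names, while emit and backpatch follow the roles used in the scheme.

```python
code = []          # code[i] holds the statement at location i + 1

def emit(loc, op, arg1="", arg2="", result=""):
    assert loc == len(code) + 1, "locations must be filled in order"
    code.append([op, arg1, arg2, result])

def backpatch(loc, target):
    """Fill in the jump target of the statement at location loc."""
    assert code[loc - 1][3] == "NOTKNOWN"
    code[loc - 1][3] = target

def gen_while(inloc, cond, body):
    """S -> while E do S1, with inloc/outloc passed as arguments and results."""
    e_out, place = cond(inloc)
    emit(e_out, "jmpf", place, "", "NOTKNOWN")   # target not known yet
    s1_out = body(e_out + 1)
    emit(s1_out, "jmp", "", "", inloc)           # back to the loop test
    outloc = s1_out + 1
    backpatch(e_out, outloc)                     # now the exit location is known
    return outloc

def cond(inloc):                                 # E: x < y
    emit(inloc, "lt", "x", "y", "t1")
    return inloc + 1, "t1"

def body(inloc):                                 # S1: x := x + 1
    emit(inloc, "add", "x", "1", "t2")
    emit(inloc + 1, "mov", "t2", "", "x")
    return inloc + 2

end = gen_while(1, cond, body)
for loc, quad in enumerate(code, start=1):
    print(loc, quad)
```

The jmpf at location 2 is emitted before the loop body exists, so its target is patched to 6 only after the body and the closing jmp have been placed.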
THREE ADDRESS CODES - EXAMPLE

Source program:
x:=1;
y:=x+10;
while (x<y) {
    x:=x+1;
    if (x%2==1) then y:=y+1;
    else y:=y-2;
}

➔ Three-address code:
01: mov 1,,x
02: add x,10,t1
03: mov t1,,y
04: lt x,y,t2
05: jmpf t2,,17
06: add x,1,t3
07: mov t3,,x
08: mod x,2,t4
09: eq t4,1,t5
10: jmpf t5,,14
11: add y,1,t6
12: mov t6,,y
13: jmp ,,16
14: sub y,2,t7
15: mov t7,,y
16: jmp ,,4
17:
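
A quick way to check such a listing is to execute it. The sketch below is a small interpreter written for this example only: it assumes integer operands, 1-based statement locations and just the opcodes that appear in the listing. The names PROGRAM, BINARY and run are hypothetical.

```python
PROGRAM = """
mov 1,,x
add x,10,t1
mov t1,,y
lt x,y,t2
jmpf t2,,17
add x,1,t3
mov t3,,x
mod x,2,t4
eq t4,1,t5
jmpf t5,,14
add y,1,t6
mov t6,,y
jmp ,,16
sub y,2,t7
mov t7,,y
jmp ,,4
"""

BINARY = {
    "add": lambda a, b: a + b, "sub": lambda a, b: a - b,
    "mod": lambda a, b: a % b, "lt": lambda a, b: a < b,
    "eq": lambda a, b: a == b,
}

def run(text):
    quads = []
    for line in text.split("\n"):
        if line.strip():
            op, rest = line.split(None, 1)
            quads.append((op, *rest.split(",")))   # (op, arg1, arg2, result)
    env, pc = {}, 1

    def val(arg):
        return int(arg) if arg.isdigit() else env[arg]

    while pc <= len(quads):
        op, a1, a2, res = quads[pc - 1]
        if op == "jmp":
            pc = int(res)
            continue
        if op == "jmpf":
            if not val(a1):
                pc = int(res)
                continue
        elif op == "mov":
            env[res] = val(a1)
        else:
            env[res] = BINARY[op](val(a1), val(a2))
        pc += 1
    return env

final = run(PROGRAM)
print(final["x"], final["y"])
```

Tracing the source program by hand gives the same result: the loop stops with x = 8 and y = 6.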
IMPLEMENTATION OF THREE-ADDRESS STATEMENTS: QUADS

A quadruple is a record structure with four fields: op, arg1, arg2 and result.
• The op field contains an internal code for the operator. The three-address statement x := y op z is represented by placing y in arg1, z in arg2 and x in result (Res).
• The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries for the names represented by these fields. Temporary names must be entered into the symbol table as they are created.

Quads (quadruples) for a:=(-c*b)+(-c * b)
#    Op      Arg1  Arg2  Res
(0)  uminus  c           t1
(1)  *       b     t1    t2
(2)  uminus  c           t3
(3)  *       b     t3    t4
(4)  +       t2    t4    t5
(5)  :=      t5          a

Pro: easy to rearrange code for global optimization
Cons: lots of temporaries

IMPLEMENTATION OF THREE-ADDRESS STATEMENTS: TRIPLES

Three-address statements can be represented by records with only three fields: op, arg1 and arg2.
• The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure (for temporary values).
• Since three fields are used, this intermediate code format is known as triples.

Triples for a:=(-c*b)+(-c * b)
#    Op      Arg1  Arg2
(0)  uminus  c
(1)  *       b     (0)
(2)  uminus  c
(3)  *       b     (2)
(4)  +       (1)   (3)
(5)  :=      a     (4)

Pro: temporaries are implicit
Cons: difficult to rearrange code

INDIRECT TRIPLES
Listing pointers to triples, rather than listing the triples themselves, is called indirect triples.

a:=(-c*b)+(-c * b)

Program:
#    Stmt
(0)  (14)
(1)  (15)
(2)  (16)
(3)  (17)
(4)  (18)
(5)  (19)

Triple container:
#     Op      Arg1  Arg2
(14)  uminus  c
(15)  *       b     (14)
(16)  uminus  c
(17)  *       b     (16)
(18)  +       (15)  (17)
(19)  :=      a     (18)

Pro: temporaries are implicit & easier to rearrange code
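
The relationship between the three representations can be sketched in a few lines. Here quadruples are converted to triples by replacing every temporary with the index of the triple that computes it, and an indirect-triple program is then just a list of indices into that container. The tuple layout and the name to_triples are assumptions made for illustration, and the container is numbered from 0 rather than from 14.

```python
QUADS = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]

def to_triples(quads):
    """Replace each temporary by the index of the triple that computes it."""
    defined_at, triples = {}, []

    def ref(arg):
        return defined_at.get(arg, arg)   # an int refers to a triple, a str to a name

    for op, arg1, arg2, result in quads:
        if op == ":=":
            triples.append((op, result, ref(arg1)))   # target first, as in the table
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            defined_at[result] = len(triples) - 1
    return triples

triples = to_triples(QUADS)

# Indirect triples: the program lists pointers into the triple container, so
# statements can be reordered without renumbering any operand reference.
program = [0, 2, 1, 3, 4, 5]      # both uminus statements moved to the front
for position, pointer in enumerate(program):
    print(position, pointer, triples[pointer])
```

Reordering plain triples would force every operand reference such as (0) or (2) to be renumbered; with the extra level of indirection only the program list changes.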
EXAMPLE
Three-address code for a + b*c - d/(b*c)

SECTION 5.2: TYPE CHECKING

TYPE CHECKING
• A compiler has to perform both syntactic and semantic checks of the source program.
• Semantic checks can be of two types:
  • Static: done during compilation
  • Dynamic: done at run time
• Type checking is one of these static checking operations.
  • Not all type checking can be done at compile time.
  • Some systems also use dynamic type checking.
• A type system is a collection of rules for assigning type expressions to the parts of a program.
• A type checker implements a type system.
• A sound type system eliminates run-time checking for type errors because it allows us to determine statically that these errors cannot occur when the target program runs.
• A programming language is strongly typed if every program its compiler accepts will execute without type errors.
  • In practice, some type checking operations are done at run time (so most programming languages are not strongly typed).
  • Ex: int x[100]; … x[i] ➔ most compilers cannot guarantee that i will be between 0 and 99
TYPE EXPRESSION
• The type of a language construct is denoted by a type expression.
• A type expression can be:
  • A basic type
    • a primitive data type such as integer, real, char, boolean, …
    • type-error, to signal a type error
    • void: no type
  • A type name
    • a name can be used to denote a type expression.
  • A type constructor applied to other type expressions.
    • arrays: If T is a type expression, then array(I,T) is a type expression, where I denotes an index range. Ex: array(0..99,int)
    • products: If T1 and T2 are type expressions, then their Cartesian product T1 x T2 is a type expression. Ex: int x int
    • pointers: If T is a type expression, then pointer(T) is a type expression. Ex: pointer(int)
    • functions: We may treat functions in a programming language as mappings from a domain type D to a range type R. So the type of a function can be denoted by the type expression D→R, where D and R are type expressions. Ex: int→int represents the type of a function that takes an int value as parameter and whose return type is also int.
A SIMPLE TYPE CHECKING SYSTEM
P → D;E
D → D;D
D → id:T { addtype(id.entry,T.type) }
T → char { T.type=char }
T → int { T.type=int }
T → real { T.type=real }
T → ↑T1 { T.type=pointer(T1.type) }
T → array[intnum] of T1 { T.type=array(1..intnum.val,T1.type) }

The prefix operator ↑ builds a pointer type. E.g., ↑ integer leads to the type expression pointer(integer).
TYPE CHECKING OF EXPRESSIONS
E → id { E.type=lookup(id.entry) }
*lookup(e) fetches the type saved in the symbol-table entry pointed to by e.

E → literal { E.type=char }
E → num1 { E.type=int }
E → num2 { E.type=real }
*Constants represented by the tokens literal, num1 and num2 have type char, int and real, respectively.

E → E1 + E2 { if (E1.type=int and E2.type=int) then E.type=int
              else if (E1.type=int and E2.type=real) then E.type=real
              else if (E1.type=real and E2.type=int) then E.type=real
              else if (E1.type=real and E2.type=real) then E.type=real
              else E.type=type-error }
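
The rule for E1 + E2 can be exercised with a small recursive function. This is a minimal sketch, assuming a dictionary stands in for the symbol table and Python int and float constants play the roles of num1 and num2; SYMBOLS and type_of are hypothetical names, and char literals are left out.

```python
SYMBOLS = {"i": "int", "x": "real", "ch": "char"}   # stand-in for the symbol table

def type_of(e):
    """e is an identifier, an int or float constant, or ('+', e1, e2)."""
    if isinstance(e, int):
        return "int"                                 # E -> num1
    if isinstance(e, float):
        return "real"                                # E -> num2
    if isinstance(e, str):
        return SYMBOLS.get(e, "type-error")          # E -> id
    op, e1, e2 = e                                   # E -> E1 + E2
    t1, t2 = type_of(e1), type_of(e2)
    if op == "+" and {t1, t2} <= {"int", "real"}:
        return "int" if t1 == t2 == "int" else "real"
    return "type-error"

print(type_of(("+", "i", 1)))
print(type_of(("+", "i", "x")))
print(type_of(("+", ("+", "ch", 1), "x")))
```

Because type-error is neither int nor real, an error in a subexpression propagates to every enclosing expression, as in the last call.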
TYPE CHECKING OF EXPRESSIONS

E → E1 [E2] { if (E2.type=int and E1.type=array(s,t)) then E.type=t
              else E.type=type-error }
*The index expression E2 must have type integer. The result is the element type t obtained from the type array(s,t) of E1.

E → E1 ↑ { if (E1.type=pointer(t)) then E.type=t
           else E.type=type-error }
*The postfix operator ↑ yields the object pointed to by its operand. The type of E is the type t of the object pointed to by the pointer E1.
TYPE CHECKING OF STATEMENTS

Assignment Statement
S → id = E { if (id.type=E.type) then S.type=void
             else S.type=type-error }

Conditional Statement
S → if E then S1 { if (E.type=boolean) then S.type=S1.type
                   else S.type=type-error }

While Statement
S → while E do S1 { if (E.type=boolean) then S.type=S1.type
                    else S.type=type-error }
TYPE CHECKING OF FUNCTIONS
E → E1 ( E2 ) { if (E2.type=s and E1.type=s→t) then E.type=t
                else E.type=type-error }

Ex: int f(double x, char y) { ... }

f: double x char → int
(argument types: double x char; return type: int)

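
The rules for indexing, dereferencing and function application all follow one pattern: check the constructor of E1.type and return one of its components. The sketch below assumes type expressions are encoded as tuples such as ("array", index_range, element) or ("func", domain, range); that encoding and the function names are hypothetical.

```python
# Basic types are strings; constructed types are tuples:
# ("array", index_range, elem), ("pointer", t), ("product", t1, t2), ("func", d, r)

def index_type(t1, t2):
    """E -> E1 [ E2 ]"""
    if t2 == "int" and isinstance(t1, tuple) and t1[0] == "array":
        return t1[2]
    return "type-error"

def deref_type(t1):
    """E -> E1 ↑"""
    if isinstance(t1, tuple) and t1[0] == "pointer":
        return t1[1]
    return "type-error"

def apply_type(t1, t2):
    """E -> E1 ( E2 )"""
    if isinstance(t1, tuple) and t1[0] == "func" and t1[1] == t2:
        return t1[2]
    return "type-error"

# int f(double x, char y) has type double x char -> int
f = ("func", ("product", "double", "char"), "int")

print(apply_type(f, ("product", "double", "char")))
print(apply_type(f, "double"))
print(index_type(("array", "0..99", "int"), "int"))
print(deref_type(("pointer", "real")))
```

Argument and domain types are compared here with plain tuple equality, which amounts to a structural comparison for types that contain no names.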
STRUCTURAL EQUIVALENCE OF TYPE EXPRESSIONS

• How do we know that two type expressions are equal?
• As long as type expressions are built from basic types (no type names), we may use structural equivalence between two type expressions.

Structural Equivalence Algorithm (sequiv):
if (s and t are same basic types) then return true
else if (s=array(s1,s2) and t=array(t1,t2)) then return (sequiv(s1,t1) and sequiv(s2,t2))
else if (s = s1 x s2 and t = t1 x t2) then return (sequiv(s1,t1) and sequiv(s2,t2))
else if (s=pointer(s1) and t=pointer(t1)) then return (sequiv(s1,t1))
else if (s = s1 → s2 and t = t1 → t2) then return (sequiv(s1,t1) and sequiv(s2,t2))
else return false

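
The sequiv algorithm carries over almost line for line into a recursive function. This sketch assumes type expressions are encoded as strings for basic types (and index ranges) and as tuples for constructed types; the encoding is an assumption, the case analysis follows the algorithm above.

```python
def sequiv(s, t):
    """Structural equivalence of type expressions built from basic types."""
    if isinstance(s, str) or isinstance(t, str):
        return s == t                                 # same basic type
    if s[0] != t[0] or len(s) != len(t):
        return False                                  # different constructors
    if s[0] == "pointer":
        return sequiv(s[1], t[1])
    if s[0] in ("array", "product", "func"):
        return sequiv(s[1], t[1]) and sequiv(s[2], t[2])
    return False

a = ("pointer", ("array", "0..99", "int"))
b = ("pointer", ("array", "0..99", "int"))
print(sequiv(a, b))
print(sequiv(("func", "int", "int"), ("func", "int", "real")))
print(sequiv("int", ("pointer", "int")))
```

The recursion terminates only because the type expressions are finite trees; a type that refers back to itself through a name would not be handled by this function.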
NAMES FOR TYPE EXPRESSIONS
• In some programming languages, we give a name to a type expression and use that name as a type expression afterwards.

type link = ↑ cell;
var p,q : link;
var r,s : ↑ cell

Do p, q, r and s have the same type?

• How do we treat type names?
  • Get the equivalent type expression for a type name (then use structural equivalence), or
  • Treat a type name as a basic type.

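
The two treatments give different answers for the declarations above. The sketch below assumes a dictionary of type-name definitions and the tuple encoding ("pointer", t) for ↑ t; TYPE_NAMES, expand and same_type are hypothetical names, and expand assumes the definitions contain no cycles.

```python
TYPE_NAMES = {"link": ("pointer", "cell")}      # type link = ↑ cell

def expand(t):
    """Replace type names by their definitions (assumes no cyclic definitions)."""
    if isinstance(t, str):
        return expand(TYPE_NAMES[t]) if t in TYPE_NAMES else t
    return (t[0],) + tuple(expand(part) for part in t[1:])

def same_type(s, t, names_are_basic):
    if names_are_basic:
        return s == t                # a type name is equal only to itself
    return expand(s) == expand(t)    # tuple equality compares structure

p_type = "link"                      # var p, q : link
r_type = ("pointer", "cell")         # var r, s : ↑ cell

print(same_type(p_type, r_type, names_are_basic=False))
print(same_type(p_type, r_type, names_are_basic=True))
```

Expanding names makes p and r equivalent, while treating link as a basic type keeps them distinct even though p and q still share a type.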
CYCLES IN TYPE EXPRESSIONS

type link = ↑ cell;
type cell = record
    x : int,
    next : link
end;

• We cannot use structural equivalence if there are cycles in type expressions.
• We have to treat type names as basic types.
➔ But this means that the type expression link is different from the type expression ↑ cell.
