Lec05 Intermediate Code Generation
SECTION 5.1: INTERMEDIATE CODE GENERATION
INTERMEDIATE CODE GENERATION
Intermediate code is machine-independent, yet close to machine instructions.
The given program in a source language is converted to an equivalent program in
an intermediate language by the intermediate code generator.
Benefits of using a machine-independent intermediate form are:
Retargeting is facilitated. That is, a compiler for a different machine can
be created by attaching a back end for the new machine to an existing front
end.
A machine-independent code optimizer can be applied to the intermediate
representation.
Three common forms of intermediate representation:
Graphical representations (syntax trees)
Postfix notation (operations on values stored on an operand stack; similar to JVM bytecode)
Three-address code (triples or quadruples)
ABSTRACT SYNTAX TREE
A syntax tree depicts the natural hierarchical structure of a source program. A
DAG (Directed Acyclic Graph) gives the same information but in a more compact
way because common subexpressions are identified.
a := b * -c + b * -c
[Figure: syntax tree (left) and DAG (right) for the assignment above]
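The sharing that turns the tree into a DAG can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own algorithm: the class Dag and its methods node and leaf are hypothetical names, and a node is identified by its position in a list.

```python
# Hypothetical helper names (Dag, node, leaf); they do not come from the lecture.

class Dag:
    """Builds expression nodes, reusing a node when an identical one exists."""

    def __init__(self):
        self.nodes = []   # node id -> (op, left id, right id)
        self.index = {}   # (op, left id, right id) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:          # first time this subexpression appears
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]             # otherwise share the existing node

    def leaf(self, name):
        return self.node(name)


dag = Dag()
# a := b * -c + b * -c, built bottom-up in the same order a tree would be
left = dag.node("*", dag.leaf("b"), dag.node("uminus", dag.leaf("c")))
right = dag.node("*", dag.leaf("b"), dag.node("uminus", dag.leaf("c")))
root = dag.node("assign", dag.leaf("a"), dag.node("+", left, right))

print(left == right)    # True: both operands of + are the same node
print(len(dag.nodes))   # 7 nodes, where the full syntax tree has 11
```

Looking each node up in a dictionary before creating it is one simple way to detect common subexpressions; other schemes are possible.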
ABSTRACT SYNTAX TREE
An AST omits syntactic clutter: it represents the parsed string in a structured way
and discards information that matters for parsing the string but is not needed for
analyzing it.
Abstract syntax trees, or simply syntax trees, differ from parse trees because
superficial distinctions of form, unimportant for translation, do not appear in syntax
trees.
For the grammar
E → E + T | T
T → T x F | F
F → ( E ) | id
parse the string ‘id + id x id’.
ABSTRACT SYNTAX TREES
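[Figure: annotated parse tree for a * (b + c), in which each E node carries an nptr attribute, and the resulting syntax tree with * at the root, a as its left child, and + (over b and c) as its right child]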
POSTFIX NOTATION
a := b * -c + b * -c
a b c uminus * b c uminus * + assign
Postfix notation represents operations on a stack.
Pro: easy to generate
Cons: stack operations are more difficult to optimize
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iadd // +
istore 1 // store a
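As a rough sketch of why postfix code is easy to generate, the Python below produces it with a single post-order walk and then evaluates it on an operand stack. The tuple encoding of the tree and the names to_postfix and eval_postfix are assumptions made for this illustration.

```python
def to_postfix(node):
    """Post-order walk: operands first, then the operator."""
    if isinstance(node, str):          # leaf: an identifier
        return [node]
    op, *children = node
    out = []
    for child in children:
        out += to_postfix(child)
    return out + [op]


def eval_postfix(tokens, env):
    """Evaluate a postfix expression on an operand stack; env maps names to values."""
    stack = []
    for tok in tokens:
        if tok == "uminus":
            stack.append(-stack.pop())
        elif tok in ("*", "+"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right if tok == "*" else left + right)
        else:
            stack.append(env[tok])     # push the value of a variable
    return stack.pop()


# Nested tuples (op, operand, ...) stand in for the syntax tree.
rhs = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
stmt = ("assign", "a", rhs)

print(" ".join(to_postfix(stmt)))   # a b c uminus * b c uminus * + assign
print(eval_postfix(to_postfix(rhs), {"b": 3, "c": 4}))   # -24
```

Only the right-hand side is evaluated here; handling assign would additionally need a store into the environment, much as istore does in the bytecode above.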
THREE-ADDRESS CODE
a := b * -c + b * -c
Linearized representation of a syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Linearized representation of a syntax DAG:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
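Both linearizations can be reproduced with a short Python sketch. It assumes expressions are nested tuples; gen_tac and its share flag are hypothetical names. With sharing enabled the temporaries are numbered consecutively, so the DAG version ends in t3 rather than the t5 shown above.

```python
def gen_tac(expr, target, share=False):
    """Return three-address statements that compute expr into target.

    expr is a nested tuple (op, operand, ...) or an identifier string.
    With share=True a repeated subexpression reuses its first temporary,
    which corresponds to linearizing the DAG instead of the tree.
    """
    code, seen = [], {}

    def walk(e):
        if isinstance(e, str):             # identifier: its place is its name
            return e
        if share and e in seen:            # common subexpression already computed
            return seen[e]
        op, *args = e
        places = [walk(a) for a in args]
        temp = f"t{len(code) + 1}"         # one new temporary per statement
        if op == "uminus":
            code.append(f"{temp} := - {places[0]}")
        else:
            code.append(f"{temp} := {places[0]} {op} {places[1]}")
        seen[e] = temp
        return temp

    result = walk(expr)
    code.append(f"{target} := {result}")
    return code


rhs = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
print("\n".join(gen_tac(rhs, "a")))               # six statements, as for the tree
print("\n".join(gen_tac(rhs, "a", share=True)))   # four statements, as for the DAG
```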
COMMON THREE-ADDRESS STATEMENTS
4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator
(<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in
relation relop to y. If not, the three-address statement following if x relop y goto L is
executed next, as in the usual sequence.
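The control flow of the two jump statements can be made concrete with a tiny interpreter. This is a sketch under assumptions: statements are encoded as Python tuples, labels are kept in a separate dictionary from label to statement index, and the opcode names copy and add are invented for the example.

```python
import operator

RELOPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
          "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def run(program, labels, env):
    """Execute a list of three-address statements; env maps names to ints."""
    def value(a):
        return env[a] if isinstance(a, str) else a   # a name or a constant

    pc = 0
    while pc < len(program):
        stmt = program[pc]
        pc += 1                                  # default: fall through
        if stmt[0] == "goto":                    # goto L
            pc = labels[stmt[1]]
        elif stmt[0] == "if":                    # if x relop y goto L
            _, x, relop, y, target = stmt
            if RELOPS[relop](value(x), value(y)):
                pc = labels[target]
        elif stmt[0] == "copy":                  # x := y
            env[stmt[1]] = value(stmt[2])
        elif stmt[0] == "add":                   # x := y + z
            env[stmt[1]] = value(stmt[2]) + value(stmt[3])
    return env


# i := 0; s := 0; L1: if i >= n goto L2; s := s + i; i := i + 1; goto L1; L2:
program = [
    ("copy", "i", 0),
    ("copy", "s", 0),
    ("if", "i", ">=", "n", "L2"),
    ("add", "s", "s", "i"),
    ("add", "i", "i", 1),
    ("goto", "L1"),
]
labels = {"L1": 2, "L2": 6}
print(run(program, labels, {"n": 5})["s"])   # 10
```

When the condition is false, the program counter has already been advanced, so execution falls through to the next statement "as in the usual sequence".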
COMMON THREE-ADDRESS STATEMENTS
6. param x and call p, n for procedure calls, and return y, where y, representing a returned
value, is optional. For example,
param x1
param x2
...
param xn
call p,n
is generated as part of a call of the procedure p(x1, x2, …, xn).
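A minimal sketch of how a code generator might emit this sequence is given below. The function name gen_call is hypothetical, and the sketch assumes each argument has already been evaluated into a name.

```python
def gen_call(proc, args):
    """Return the three-address statements for the call proc(args...)."""
    code = [f"param {arg}" for arg in args]    # one param per argument, in order
    code.append(f"call {proc},{len(args)}")    # n is the number of parameters
    return code


print("\n".join(gen_call("p", ["x1", "x2", "x3"])))
```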
Indexed Assignments:
move y[i],,x or x := y[i]
move x,,y[i] or y[i] := x
SYNTAX-DIRECTED TRANSLATION INTO THREE-ADDRESS CODE
Use attributes
E.place: the name that will hold the value of E
An identifier is assumed to already have its place attribute defined.
E.code: holds the three-address statements that evaluate E (this is
the ‘translation’ attribute).
E → - { E1.inloc = E.inloc } E1
{ E.place = newtemp(); emit(E1.outloc ‘uminus’ E1.place ‘,,’ E.place); E.outloc=E1.outloc+1 }
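The place and code attributes can be illustrated with a small Python sketch that computes both bottom-up. It is only an approximation of the scheme: it concatenates code lists instead of calling emit with location counters, and the names Attrs and translate, as well as the tuple encoding of expressions, are assumptions.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Attrs:
    place: str                                  # E.place
    code: list = field(default_factory=list)    # E.code


def translate(expr, temps=None):
    """Compute E.place and E.code bottom-up, as synthesized attributes."""
    temps = temps if temps is not None else count(1)
    if isinstance(expr, str):                   # E -> id
        return Attrs(place=expr)                # an identifier needs no code
    if expr[0] == "uminus":                     # E -> - E1
        e1 = translate(expr[1], temps)
        place = f"t{next(temps)}"               # plays the role of newtemp()
        return Attrs(place, e1.code + [f"{place} := uminus {e1.place}"])
    op, left, right = expr                      # E -> E1 op E2
    e1 = translate(left, temps)
    e2 = translate(right, temps)
    place = f"t{next(temps)}"
    return Attrs(place, e1.code + e2.code
                 + [f"{place} := {e1.place} {op} {e2.place}"])


e = translate(("+", "a", ("uminus", "b")))
print(e.place)   # t2
print(e.code)    # ['t1 := uminus b', 't2 := a + t1']
```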
TRANSLATION SCHEME (CONT.)
Triples
a:=(-c*b)+(-c * b)
Statement list (pointers into the triple table):
#     Stmt
(0)   (14)
(1)   (15)
(2)   (16)
(3)   (17)
(4)   (18)
(5)   (19)
Triple table:
#     Op      Arg1  Arg2
(14)  uminus  c
(15)  *       b     (14)
(16)  uminus  c
(17)  *       b     (16)
(18)  +       (15)  (17)
(19)  :=      a     (18)
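A triple table of this shape can be built with a post-order walk in which each operand is either a name or the position of an earlier triple. The sketch below assumes a tuple encoding of the statement and uses the hypothetical name build_triples; positions start at 0 rather than 14, and no separate statement list is produced.

```python
def build_triples(stmt):
    """Return a list of (op, arg1, arg2) triples for a nested-tuple statement.

    An argument is an identifier string, an int giving the position of the
    triple that computes it, or None when the operator is unary.
    """
    triples = []

    def walk(e):
        if isinstance(e, str):
            return e
        op, *args = e
        operands = [walk(a) for a in args]
        operands += [None] * (2 - len(operands))   # unary ops leave arg2 empty
        triples.append((op, operands[0], operands[1]))
        return len(triples) - 1                    # refer to a triple by position

    walk(stmt)
    return triples


def show(arg):
    """Format an argument: (n) for a triple reference, the name otherwise."""
    return f"({arg})" if isinstance(arg, int) else (arg or "")


stmt = (":=", "a", ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c"))))
for i, (op, arg1, arg2) in enumerate(build_triples(stmt)):
    print(f"({i})  {op:<8}{show(arg1):<6}{show(arg2)}")
```

Because results are referred to by position, triples need no temporary names, at the cost of making statements harder to reorder.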
SECTION 5.2: TYPE CHECKING
TYPE CHECKING
A compiler must perform both syntactic and semantic checks on the source program.
Semantic Checks can be of two types:
Static – done during compilation
Dynamic – done during run-time
A type system is a collection of rules for assigning type expressions to the parts of
a program.
A type checker implements a type system.
A sound type system eliminates run-time checking for type errors because it
allows us to determine statically that these errors cannot occur when the target
program runs.
A programming language is strongly typed if every program its compiler accepts
will execute without type errors.
In practice, some type-checking operations are done at run time (so most programming
languages are not strongly typed).
Ex: int x[100]; … x[i] ➔ most compilers cannot guarantee that i will be between 0 and 99
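The split between static and dynamic checking in the array example can be sketched as follows. The function check_index and its three result strings are hypothetical; the sketch assumes the compiler either knows the index as a constant or knows nothing about it.

```python
def check_index(index, size):
    """Statically check an array index against the bounds 0..size-1.

    index is an int when the compiler knows its value, or a variable name
    when it does not. Returns 'ok', 'error', or 'runtime-check'.
    """
    if isinstance(index, int):
        return "ok" if 0 <= index < size else "error"
    return "runtime-check"       # value unknown until the program runs


print(check_index(5, 100))       # ok
print(check_index(100, 100))     # error
print(check_index("i", 100))     # runtime-check
```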
TYPE EXPRESSION
The type of a language construct is denoted by a type expression.
A type expression can be:
A basic type
a primitive data type such as integer, real, char, boolean, …
type-error to signal a type error
void : no type
A type name
a name can be used to denote a type expression.
A type constructor applies to other type expressions.
arrays: If T is a type expression, then array(I,T) is a type expression where I denotes
index range. Ex: array(0..99,int)
products: If T1 and T2 are type expressions, then their cartesian product T1 x T2 is a
type expression. Ex: int x int
pointers: If T is a type expression, then pointer(T) is a type expression. Ex:
pointer(int)
functions: We may treat a function in a programming language as a mapping from a
domain type D to a range type R. So, the type of a function can be denoted by the
type expression D→R, where D and R are type expressions. Ex: int→int represents the
type of a function that takes an int value as its parameter and whose return type is also
int.
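One possible in-memory representation of these type expressions is sketched below with Python dataclasses. The class and field names are assumptions for illustration; only the printed notation follows the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    name: str                  # e.g. "int", "real", "char", "void", "type-error"

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Array:
    low: int                   # index range low..high
    high: int
    elem: object               # element type expression

    def __str__(self):
        return f"array({self.low}..{self.high},{self.elem})"


@dataclass(frozen=True)
class Product:
    left: object
    right: object

    def __str__(self):
        return f"{self.left} x {self.right}"


@dataclass(frozen=True)
class Pointer:
    to: object

    def __str__(self):
        return f"pointer({self.to})"


@dataclass(frozen=True)
class Function:
    domain: object
    result: object             # the range type R

    def __str__(self):
        return f"{self.domain}→{self.result}"


INT = Basic("int")
print(Array(0, 99, INT))       # array(0..99,int)
print(Product(INT, INT))       # int x int
print(Pointer(INT))            # pointer(int)
print(Function(INT, INT))      # int→int
```

Frozen dataclasses compare field by field, so two separately built values such as Pointer(INT) are equal when their components are equal.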
A SIMPLE TYPE CHECKING SYSTEM
P → D;E
D → D;D
D → id:T { addtype(id.entry,T.type) }
T → char { T.type=char }
T → int { T.type=int }
T → real { T.type=real }
T → ↑T1 { T.type=pointer(T1.type) }
T → array[intnum] of T1 { T.type=array(1..intnum.val,T1.type) }
The prefix operator ↑ builds a pointer type. E.g., ↑integer leads to the type expression
pointer(integer).
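The declaration rules can be tried out with a small Python sketch that parses a type and records it in a symbol table. It makes several assumptions: type expressions are kept as plain strings, the symbol table is a dictionary named symtab, and parse_type and declare are hypothetical names standing in for the T productions and the addtype action.

```python
import re

symtab = {}     # hypothetical symbol table: identifier -> type expression


def parse_type(text):
    """T -> char | int | real | ↑T1 | array[intnum] of T1, returned as a string."""
    text = text.strip()
    if text in ("char", "int", "real"):
        return text
    if text.startswith("↑"):                       # T -> ↑T1
        return f"pointer({parse_type(text[1:])})"
    m = re.fullmatch(r"array\[(\d+)\]\s+of\s+(.+)", text)
    if m:                                          # T -> array[intnum] of T1
        return f"array(1..{int(m.group(1))},{parse_type(m.group(2))})"
    raise ValueError(f"not a type: {text!r}")


def declare(decls):
    """D -> D;D | id:T, recording each identifier's type (the addtype action)."""
    for decl in decls.split(";"):
        name, type_text = decl.split(":", 1)
        symtab[name.strip()] = parse_type(type_text)


declare("c: char; p: ↑int; a: array[10] of ↑real")
print(symtab["p"])   # pointer(int)
print(symtab["a"])   # array(1..10,pointer(real))
```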
TYPE CHECKING OF EXPRESSIONS
E → id { E.type=lookup(id.entry) }
*lookup(e) fetches the type saved in the symbol-table entry pointed to by e.
E → literal { E.type=char }
E → num1 { E.type=int }
E → num2 { E.type=real }
*Constants represented by the tokens literal, num1 and num2 have type char, int
and real, respectively.
*The postfix operator ↑ yields the object pointed to by its operand. The type of E is the type t of the
object pointed to by the pointer E.
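These expression rules can be sketched as a recursive Python function. The encoding is an assumption: expressions are tagged tuples, a pointer type is the tuple ("pointer", t), and an undeclared identifier is treated as a type error, which the rules above do not specify.

```python
def type_of(expr, symtab):
    """Return the type of expr, or 'type-error'.

    expr is ("id", name), ("literal", ch), ("num", value) or ("deref", e1).
    """
    kind = expr[0]
    if kind == "id":                       # E -> id: look the type up
        return symtab.get(expr[1], "type-error")   # assumption: undeclared is an error
    if kind == "literal":                  # E -> literal
        return "char"
    if kind == "num":                      # E -> num1 | num2
        return "int" if isinstance(expr[1], int) else "real"
    if kind == "deref":                    # postfix ↑ applied to E1
        t = type_of(expr[1], symtab)
        if isinstance(t, tuple) and t[0] == "pointer":
            return t[1]                    # the type t of the object pointed to
        return "type-error"
    return "type-error"


symtab = {"p": ("pointer", "int"), "x": "real"}
print(type_of(("deref", ("id", "p")), symtab))   # int
print(type_of(("deref", ("id", "x")), symtab))   # type-error
print(type_of(("num", 2.5), symtab))             # real
```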
TYPE CHECKING OF STATEMENTS
Assignment Statement
S → id = E { if (id.type = E.type) then S.type = void
else S.type = type-error }
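The assignment rule translates almost directly into code. In this sketch the function name check_assign is hypothetical, types are plain strings, and the type of the right-hand side is assumed to have been computed already.

```python
def check_assign(symtab, name, expr_type):
    """S -> id = E: void when both sides have the same type, else type-error."""
    if name in symtab and symtab[name] == expr_type:
        return "void"
    return "type-error"


symtab = {"x": "int", "y": "real"}
print(check_assign(symtab, "x", "int"))    # void
print(check_assign(symtab, "y", "int"))    # type-error
```

Note that this rule demands identical types; it performs no implicit conversion from int to real.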
Conditional Statement
STRUCTURAL EQUIVALENCE OF TYPE EXPRESSIONS
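Only the slide title survives here. A common definition, assumed in the sketch below, is that two type expressions are structurally equivalent when they are the same basic type or are formed by applying the same constructor to structurally equivalent types. The tuple encoding and the name sequiv are illustrative, and the recursion assumes the type expressions contain no cycles.

```python
def sequiv(s, t):
    """True when type expressions s and t are structurally equivalent.

    A type is a string (basic type or index range) or a tuple whose first
    element names the constructor, e.g. ("pointer", "int").
    """
    if isinstance(s, str) or isinstance(t, str):     # basic types must match exactly
        return s == t
    if s[0] != t[0] or len(s) != len(t):             # different constructors
        return False
    return all(sequiv(a, b) for a, b in zip(s[1:], t[1:]))


print(sequiv(("pointer", "int"), ("pointer", "int")))                      # True
print(sequiv(("array", "0..99", "int"), ("array", "0..99", "real")))       # False
print(sequiv(("function", ("product", "int", "int"), "int"),
             ("function", ("product", "int", "int"), "int")))              # True
```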
NAMES FOR TYPE EXPRESSIONS
In some programming languages, we give a name to a type
expression, and we use that name as a type expression afterwards.
CYCLES IN TYPE EXPRESSIONS