1 Unit 4 Complete
Linear Form: The following are commonly used Linear form intermediate
code representations:
a. Prefix notation
b. Postfix notation
c. Three address code
Tree Form: The following are commonly used tree form intermediate
code representations:
a. Syntax tree
b. Directed acyclic graph (DAG)
• Prefix Notation: Also known as Polish notation. The
ordinary (infix) way of writing the sum of a and b is with an operator in
the middle: a + b. The prefix notation for the same expression places the
operator at the left end, as + a b. In general, if e1 and e2 are any prefix
expressions and + is any binary operator, the result of applying + to the
values denoted by e1 and e2 is written in prefix notation as + e1 e2. No parentheses
are needed in prefix notation because the position and arity (number of
arguments) of the operators permit only one way to decode a prefix
expression. In prefix notation, the operator precedes its operands.
Example 1: The prefix representation of the expression (a + b) * c is: * + a b c
Example 2: The prefix representation of the expression (a - b) * (c + d) + (a - b) is:
+ * - a b + c d - a b
• Postfix Notation: Also known as reverse Polish notation or suffix notation.
The ordinary (infix) way of writing the sum of a and b is with an operator in
the middle: a + b. The postfix notation for the same expression places the
operator at the right end, as a b +. In general, if e1 and e2 are any postfix
expressions and + is any binary operator, the result of applying + to the
values denoted by e1 and e2 is written in postfix notation as e1 e2 +. No parentheses
are needed in postfix notation because the position and arity (number of
arguments) of the operators permit only one way to decode a postfix
expression. In postfix notation, the operator follows its operands.
Example 1: The postfix representation of the expression (a + b) * c is: a b + c *
Example 2: The postfix representation of the expression (a - b) * (c + d) + (a - b) is:
a b - c d + * a b - +
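Both notations can be read off an expression tree by walking it in a different order. The following is a minimal sketch, assuming the expression has already been parsed into nested tuples of the form (operator, left, right); the function names `to_prefix` and `to_postfix` are illustrative and not part of the notes.

```python
# Expression trees as nested tuples: (operator, left, right).
# Leaves are plain strings. This representation is an assumption of the sketch.

def to_prefix(node):
    """Operator first, then both operands (pre-order walk)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{op} {to_prefix(left)} {to_prefix(right)}"


def to_postfix(node):
    """Both operands first, then the operator (post-order walk)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{to_postfix(left)} {to_postfix(right)} {op}"


# (a - b) * (c + d) + (a - b), Example 2 above
a_minus_b = ("-", "a", "b")
expr = ("+", ("*", a_minus_b, ("+", "c", "d")), a_minus_b)

print(to_prefix(expr))   # + * - a b + c d - a b
print(to_postfix(expr))  # a b - c d + * a b - +
```

Neither output needs parentheses, because each binary operator takes exactly two operands.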
• Three-Address Code: A statement involving no more than three
references (two for operands and one for the result) is known as a three-address
statement.
• A sequence of three-address statements is known as three-address code.
A three-address statement has the form x = y op z, where x, y, and z each have
an address (memory location). Each statement contains at most one operator on its right-hand side.
Example: The three address code for the expression a + b * c + d :
• T1=b*c
• T2=a+T1
• T3=T2+d
• T1, T2, and T3 are temporary variables. There are 3 ways to represent
three-address code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
1. Quadruple – It is a structure that consists of 4 fields, namely op, arg1,
arg2 and result. op denotes the operator, arg1 and arg2 denote the two
operands, and result stores the result of the expression.
• Advantage –
• Easy to rearrange code for global optimization.
• One can quickly access the value of temporary variables using the symbol table.
• Disadvantage –
• Contains a lot of temporaries.
• Creating temporary variables increases time and space complexity.
Example – Consider expression a = (b * – c) + (b * – c). The three address code
is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
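The listing above can be produced mechanically. The following is a minimal sketch, assuming the expression is available as a nested tuple tree; `gen_quads` and the tuple layout (op, arg1, arg2, result) are illustrative choices, not a prescribed format.

```python
from itertools import count


def gen_quads(node, quads, temps):
    """Post-order walk: emit one quadruple per operator and return the
    name that holds the value of this node."""
    if isinstance(node, str):              # leaf: a variable name
        return node
    op, *args = node
    operands = [gen_quads(arg, quads, temps) for arg in args]
    result = f"t{next(temps)}"             # fresh temporary
    arg1 = operands[0]
    arg2 = operands[1] if len(operands) > 1 else None   # unary operators leave arg2 empty
    quads.append((op, arg1, arg2, result))
    return result


# a = (b * -c) + (b * -c)
term = ("*", "b", ("uminus", "c"))
quads = []
value = gen_quads(("+", term, term), quads, count(1))
quads.append(("=", value, None, "a"))      # final copy into a

for op, arg1, arg2, result in quads:
    print(f"{op:7} {arg1:4} {str(arg2 or ''):4} {result}")
```

Each printed row is one quadruple; the result column holds exactly the temporaries t1 to t5 of the three-address code.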
2. Triples – This representation does not use an extra temporary variable
for each operation; instead, when a reference to another triple's
value is needed, a pointer to that triple is used. So it consists of only three
fields, namely op, arg1 and arg2.
Disadvantage –
• Temporaries are implicit, which makes it difficult to rearrange code.
• It is difficult to optimize because optimization involves moving intermediate
code. When a triple is moved, every other triple that refers to it must be updated
as well. With the help of a pointer, one can directly access a symbol table entry.
Example – Consider expression a = (b * – c) + (b * – c)
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
3. Indirect Triples – This representation keeps a separate list of pointers to
the triples, stored apart from the triples themselves. It is
similar in utility to the quadruple representation but requires less
space. Temporaries are implicit, and it is easier to rearrange code because only the pointer list has to be reordered.
• Example – Consider expression a = (b * – c) + (b * – c)
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
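The difference between the representations is easiest to see on data. The sketch below, under the assumption that quadruples are stored as (op, arg1, arg2, result) tuples, converts them to triples by replacing each temporary with a reference "(i)" to the triple that computes it. The 0-based numbering and the name `to_triples` are illustrative.

```python
# Quadruples for a = (b * -c) + (b * -c), as in the three-address code above.
QUADS = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def to_triples(quads):
    position = {}   # temporary name -> index of the triple that computes it
    triples = []

    def ref(arg):
        # A temporary becomes a reference "(i)"; names and None pass through.
        return f"({position[arg]})" if arg in position else arg

    for op, arg1, arg2, result in quads:
        if op == "=":                      # copy: the target goes into arg1
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


triples = to_triples(QUADS)

# Indirect triples: a separate statement list points at the triples, so
# reordering code only permutes this list and leaves the triples untouched.
statements = list(range(len(triples)))

for s in statements:
    print(s, triples[s])
```

No temporary names survive in the triples: the value of triple (1) is simply referred to as "(1)".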
Syntax tree
• In the parse tree, most of the leaf nodes are the only child of their parent
nodes.
• In the syntax tree, we can eliminate this extra information.
• A syntax tree is a variant of the parse tree. In the syntax tree, interior nodes are
operators and leaves are operands.
• A syntax tree is usually used when representing a program in a tree structure.
• A sentence id + id * id would have the following syntax tree:
Example: x = (a + b * c) / (a – b * c)
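A syntax tree for such expressions can be built with a small recursive-descent parser. This is a minimal sketch, assuming only identifiers, the operators + - * / and parentheses, with no error handling; the function `parse` and the tuple encoding (operator, left, right) are illustrative.

```python
import re


def parse(text):
    """Return the syntax tree of an arithmetic expression as nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():                  # identifier or parenthesised expression
        token = advance()
        if token == "(":
            node = expr()
            advance()              # consume ")"
            return node
        return token

    def term():                    # * and / bind tighter than + and -
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    return expr()


print(parse("id + id * id"))
# ('+', 'id', ('*', 'id', 'id'))
print(parse("(a + b * c) / (a - b * c)"))
# ('/', ('+', 'a', ('*', 'b', 'c')), ('-', 'a', ('*', 'b', 'c')))
```

Interior nodes are operators and leaves are operands; the parentheses of the source text do not appear in the tree because the nesting already encodes them.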
Basic Block and Control Flow Graph:
A basic block is a sequence of consecutive statements in which flow of control
enters at the beginning and leaves at the end without halt or possibility of
branching except at the end.
• The following sequence of three address code forms a basic block:
t1 = a * a
t2 = a * b
t3 = t1 + t2
• The first task is to partition a sequence of three-address code into basic
blocks. A new basic block begins with the first instruction, and instructions
are added until a jump or a label is met. In the absence of a jump, control
moves consecutively from one instruction to the next.
Partition Algorithm for Basic Blocks :
t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b
t6 := t4 +t5
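The partitioning step can be sketched as follows. The sketch uses the usual leader rule, which is an assumption beyond what the notes spell out: the first instruction, every jump target, and every instruction that follows a jump each start a new block. Jump targets are assumed to be written as "goto (n)" with n a 1-based instruction number; the sample code is hypothetical.

```python
import re


def partition(code):
    """Split a list of three-address instructions into basic blocks."""
    leaders = {0}                                    # rule 1: first instruction
    for index, instr in enumerate(code):
        match = re.search(r"goto \((\d+)\)", instr)
        if match:
            leaders.add(int(match.group(1)) - 1)     # rule 2: target of a jump
            if index + 1 < len(code):
                leaders.add(index + 1)               # rule 3: instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]


# Hypothetical input: a loop that accumulates a[i] into prod.
code = [
    "i := 1",
    "t1 := 4 * i",
    "t2 := a[t1]",
    "prod := prod + t2",
    "i := i + 1",
    "if i <= 20 goto (2)",
    "x := prod",
]

for number, block in enumerate(partition(code), start=1):
    print(f"B{number}: {block}")
```

A straight-line sequence with no jumps, such as the six statements above, comes back as a single block.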
A Directed Acyclic Graph for Basic Block is a directed acyclic graph with the
following labels on nodes.
• The graph’s leaves each have a unique identifier, which can be variable names
or constants.
• The interior nodes of the graph are labelled with an operator symbol.
• In addition, nodes may carry a list of identifiers naming the variables that
hold the computed value.
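Method:
Each three-address statement of the block has one of three forms: case (i) x = y OP z,
case (ii) x = OP y, case (iii) x = y.
Step 1:
• If node(y) is undefined, create a leaf node(y).
• For case (i), if node(z) is undefined, create a leaf node(z).
Step 2:
• For case (i), find or create a node(OP) whose left child is node(y) and whose
right child is node(z).
• For case (ii), check whether there is a node(OP) with the single child node(y); if not, create it.
• For case (iii), node n will be node(y).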
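The method can be sketched in code. This is a minimal sketch under stated assumptions: statements are given as tuples (x, op, y, z), with op and z set to None for a copy x = y, and the final bookkeeping (attaching x to node n and detaching it from its previous node) follows the usual textbook formulation rather than anything stated in these notes. The name `build_dag` is illustrative.

```python
def build_dag(block):
    """Build the DAG of a basic block; returns the list of nodes."""
    nodes = []      # each node: {"label": ..., "kids": (...), "ids": [...]}
    current = {}    # identifier -> index of the node holding its current value
    by_key = {}     # (op, child indexes) -> node index, to share common subexpressions

    def node_for(name):
        if name not in current:                  # Step 1: leaf for an undefined operand
            nodes.append({"label": name, "kids": (), "ids": []})
            current[name] = len(nodes) - 1
        return current[name]

    for x, op, y, z in block:
        if op is None:                           # case (iii): x = y
            n = node_for(y)
        else:                                    # cases (i) and (ii)
            kids = (node_for(y),) if z is None else (node_for(y), node_for(z))
            key = (op, kids)
            if key not in by_key:                # Step 2: reuse node(OP) if it exists
                nodes.append({"label": op, "kids": kids, "ids": []})
                by_key[key] = len(nodes) - 1
            n = by_key[key]
        # Assumed final step: x now names node n and no other node.
        if x in current and x in nodes[current[x]]["ids"]:
            nodes[current[x]]["ids"].remove(x)
        nodes[n]["ids"].append(x)
        current[x] = n
    return nodes


block = [
    ("a", "*", "b", "c"),
    ("d", None, "b", None),
    ("e", "*", "d", "c"),
    ("b", None, "e", None),
    ("f", "+", "b", "c"),
    ("g", "+", "d", "f"),
]
for index, node in enumerate(build_dag(block)):
    print(index, node["label"], node["kids"], node["ids"])
```

For this block only one multiplication node is created: e = d * c finds the node already built for a = b * c, because d currently names the same leaf as b.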
Example :
T0 = a + b     (Expression 1)
T1 = T0 + c    (Expression 2)
d = T0 + T1    (Expression 3)
[Figures: the DAG after Expression 1, after Expression 2, and after Expression 3]
Example :
• T1 = a + b
• T2 = T1 + c
• T3 = T1 x T2
Algorithm for construction of DAG
a = b * c
d = b
e = d * c
b = e
f = b + c
g = d + f
[Figures: the DAG after each of the six statements, Step 1 to Step 6]
Example:
(1) S1 := 4 * i
(2) S2 := a[S1]
(3) S3 := 4 * i
(4) S4 := b[S3]
(5) S5 := S2 * S4
(6) S6 := prod + S5
(7) prod := S6
(8) S7 := i + 1
(9) i := S7
(10) if i <= 20 goto (1)
Application of Directed Acyclic Graph:
Syntax Directed Translation:
• In syntax-directed translation, every non-terminal can have zero, one, or
more attributes, depending on the type of the
attribute. The values of these attributes are evaluated by the semantic rules
associated with the production rules.
• In a semantic rule, the attribute is VAL, and an attribute may hold anything, such as
a string, a number, a memory location, or a complex record.
2 types:
1. Syntax directed translation definition
2. Syntax directed translation scheme
Types of attributes – Attributes may be of two types –
• Synthesized
• Inherited
Production        Action
E -> E + T        { print('+') }
E -> T            { }
T -> T * F        { print('*') }
T -> F            { }
F -> num          { print(LexValue) }
Input string: 3 + 4 * 5
Output: 3 4 5 * +
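The scheme above can be tried out with a small translator. This is a minimal sketch, not a bottom-up parser: it assumes a recursive-descent implementation in which the left-recursive productions are replaced by loops, with each print action executed at the point where the corresponding production would be completed. The function name `translate` is illustrative.

```python
import re


def translate(text):
    """Translate an infix expression over numbers, + and * into postfix."""
    tokens = re.findall(r"\d+|[+*]", text)
    out = []
    pos = 0

    def factor():                  # F -> num        { print(LexValue) }
        nonlocal pos
        out.append(tokens[pos])
        pos += 1

    def term():                    # T -> T * F      { print('*') }  |  F
        nonlocal pos
        factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            factor()
            out.append("*")        # action runs after both operands are printed

    def expr():                    # E -> E + T      { print('+') }  |  T
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            term()
            out.append("+")

    expr()
    return " ".join(out)


print(translate("3 + 4 * 5"))   # 3 4 5 * +
```

Because every action sits at the right end of its production, each operator is printed only after both of its operands, which is exactly postfix order.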
Types of SDT:
S-attributed SDT:
• If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
• S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent
nodes depend upon the values of the child nodes.
• Semantic actions are placed at the rightmost end of the RHS.
L-attributed SDT:
• If an SDT uses both synthesized and inherited attributes, with the restriction
that an inherited attribute can inherit values only from the parent or from left siblings,
it is called an L-attributed SDT.
• Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right
manner.
• Semantic actions can be placed anywhere in the RHS.
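A synthesized attribute is computed purely from the attributes of the children, which is why a bottom-up (post-order) evaluation suffices. The following is a minimal sketch, assuming a syntax tree encoded as nested tuples with integer leaves and an attribute called val; both the encoding and the function name are illustrative.

```python
def val(node):
    """Evaluate the synthesized attribute val bottom-up."""
    if isinstance(node, int):          # F -> num : F.val = num.lexval
        return node
    op, left, right = node
    left_val = val(left)               # children are evaluated first ...
    right_val = val(right)
    if op == "+":                      # E -> E1 + T : E.val = E1.val + T.val
        return left_val + right_val
    if op == "*":                      # T -> T1 * F : T.val = T1.val * F.val
        return left_val * right_val
    raise ValueError(f"unknown operator {op!r}")


tree = ("+", 3, ("*", 4, 5))           # syntax tree of 3 + 4 * 5
print(val(tree))                       # 23
```

No value ever flows downward or sideways here, so the definition is S-attributed; an inherited attribute would need an extra parameter passed from the parent or a left sibling.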
Implementation of Syntax directed translation
Expression: x < 100 || y > 200 && x != y
Before backpatching            After backpatching
100 : if x < 100 then ____     100 : if x < 100 then 106
101 : goto ____                101 : goto 102
102 : if y > 200 then ____     102 : if y > 200 then 104
103 : goto ____                103 : goto 107
104 : if x != y then ____      104 : if x != y then 106
105 : goto ____                105 : goto 107
106 : true                     106 : true
107 : false                    107 : false
• B.truelist (tl) = {100, 104}: the jumps that must be patched with the true exit, 106.
• B.falselist (fl) = {103, 105}: the jumps that must be patched with the false exit, 107.
• Backpatch (p, i): Inserts i as the target label for each of the instructions
on the list pointed to by p.
Translation Rules:
1. B -> B1 || M B2
{
Backpatch (B1.fl, M.instr);
B.tl = merge (B1.tl, B2.tl);
B.fl = B2.fl;
}
2. B -> B1 && M B2
{
Backpatch (B1.tl, M.instr);
B.tl = B2.tl;
B.fl = merge (B1.fl, B2.fl);
}
3. B -> ! B1
{
B.tl = B1.fl;
B.fl = B1.tl;
}
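The three translation rules can be exercised on the expression x < 100 || y > 200 && x != y. This is a minimal sketch: the instruction array, the helpers `emit`, `nextinstr`, `relop` and the rule functions are assumptions made for illustration, and the calls are sequenced by hand in the order a bottom-up parser would perform the reductions.

```python
BASE = 100        # label of the first instruction, as in the example
code = []         # instruction array: [text, target]; target None = not yet patched


def emit(text):
    code.append([text, None])
    return BASE + len(code) - 1          # label of the emitted instruction


def nextinstr():
    return BASE + len(code)              # what the marker M records as M.instr


def makelist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


def backpatch(p, target):
    for label in p:
        code[label - BASE][1] = target


def relop(condition):                    # B -> E1 rel E2, returns (truelist, falselist)
    true_list = makelist(emit(f"if {condition} goto"))
    false_list = makelist(emit("goto"))
    return true_list, false_list


def or_rule(b1, m, b2):                  # B -> B1 || M B2
    backpatch(b1[1], m)
    return merge(b1[0], b2[0]), b2[1]


def and_rule(b1, m, b2):                 # B -> B1 && M B2
    backpatch(b1[0], m)
    return b2[0], merge(b1[1], b2[1])


def not_rule(b1):                        # B -> ! B1 (unused in this example)
    return b1[1], b1[0]


b1 = relop("x < 100")
m1 = nextinstr()
b2 = relop("y > 200")
m2 = nextinstr()
b3 = relop("x != y")
truelist, falselist = or_rule(b1, m1, and_rule(b2, m2, b3))

backpatch(truelist, 106)                 # true exit of the whole expression
backpatch(falselist, 107)                # false exit of the whole expression

for offset, (text, target) in enumerate(code):
    print(f"{BASE + offset} : {text} {target}")
```

The printed targets are 106, 102, 104, 107, 106 and 107 for instructions 100 to 105, matching the backpatched listing in the example.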
Labels and Goto:
• The most elementary programming language construct for changing the flow
of control in a program is a label and goto. When a compiler encounters a
statement like goto L, it must check that there is exactly one statement with
label L in the scope of this goto statement. If the label has already appeared,
then the symbol table will have an entry giving the compiler-generated label
for the first three-address instruction associated with the source statement
labeled L. For the translation, we generate a goto three-address statement
with that compiler-generated label as a target.
• When a label L is encountered for the first time in the source program, either
in a declaration or as the target of a forward goto, we enter L into the
symbol table and generate a symbolic label for L.
One-pass code generation using backpatching:
• Backpatching can be used to generate code for Boolean expressions
and flow-of-control statements in a single pass.
• The synthesized attributes truelist and falselist of non-terminal B are used to
manage labels in jumping code for Boolean expressions.
• B.truelist is a list of jump or conditional jump instructions into which we must
insert the label to which control goes if B is true.
• B.falselist is the list of instructions that eventually get the label to which
control goes when B is false.
• When code is generated for B, the jumps to the true and false exits are left
incomplete, with the label field unfilled. These incomplete jumps are placed
on the lists B.truelist and B.falselist, respectively.
• Similarly, a statement S has a synthesized attribute S.nextlist, which
is a list of jumps to the instruction immediately following the code for S.
Instructions are generated into an instruction array, and labels are
indexes into this array. Three functions manipulate the lists of jumps:
• Makelist (i): Creates a new list containing only i, an index into the array of
instructions, and returns a pointer to the newly created
list.
• Merge (p1, p2): Concatenates the lists pointed to by p1 and p2, and returns a
pointer to the concatenated list.
• Backpatch (p, i): Inserts i as the target label for each of the instructions on
the list pointed to by p.
Backpatching for Boolean Expressions:
• A translation scheme can generate code for Boolean expressions during
bottom-up parsing. A marker non-terminal M
in the grammar triggers a semantic action that records, at the appropriate
moment, the index of the next instruction to be generated.
Applications of Backpatching:
The target program is the output of the code generator. The output can be:
a) Assembly language: It makes the process of code generation
easier.
b) Relocatable machine language: It allows subprograms to be compiled separately.
c) Absolute machine language: It can be placed in a fixed location in
memory and can be executed immediately.
3. Memory management
6. Evaluation order
• The efficiency of the target code can be affected by the order in which
the computations are performed. Some computation orders need
fewer registers to hold intermediate results than others.
Activation Records
• An activation record is a contiguous block of storage that manages
information required by a single execution of a procedure.
• When you enter a procedure, you allocate an activation record, and when
you exit that procedure, you de-allocate it.
• Basically, it stores the status of the current activation of a function. So, whenever
a function call occurs, a new activation record is created and
pushed onto the top of the stack.
• It remains on the stack for the duration of that function's execution. Once the
procedure completes and control returns to the calling function, its
activation record is popped off the stack.
• If a procedure is called, an activation record is pushed onto the stack, and it is
popped when control returns to the calling function.
• Activation Record includes some fields which are –
Return values, parameter list, control links, access links,
saved machine status, local data, and temporaries.
Runtime memory
The run-time memory is divided into areas for:
1. Code area: It is known as the text part of a program and does not change at runtime. Its
memory requirements are known at compile time.
2. Static/global data: In this allocation scheme, the compilation data is bound to a fixed
location in memory and does not change when the program executes. As the
memory requirement and storage locations are known in advance, a runtime support
package for memory allocation and de-allocation is not required.
3. Stack: Procedure calls and their activations are managed by means of stack memory
allocation. It works in a last-in-first-out (LIFO) manner, and this allocation strategy is very
useful for recursive procedure calls.
4. Heap: Variables local to a procedure are allocated and de-allocated only at runtime. Heap
allocation is used to dynamically allocate memory to variables and reclaim it when
the variables are no longer required.
Both stack and heap memory can grow and shrink dynamically and unexpectedly.
Therefore, they cannot be provided with a fixed amount of memory in the system.
• The information required during an execution of a procedure is
kept in a block of storage called an activation record. The activation
record includes storage for names local to the procedure.
• Addresses in the target code can be described using the following strategies:
1. Static allocation
2. Stack allocation
1. Initialization of stack:
• MOV #stackstart , SP /*initializes stack*/
• HALT /*terminate execution*/
2. Implementation of Call statement:
main()
{
    one()
}
one()
{
    two()
}
two()
{
    …………
}
3 activation records will be generated:
AR - main()
AR - one()
AR - two()
Example 2:
main()
{
    int f;
    f = fact(3);
}
int fact(int n)
{
    if (n == 1)
        return 1;
    else
        return (n * fact(n - 1));
}
Activation records on the stack at the deepest point of the recursion:
AR - main()
AR - fact(3)
AR - fact(2)
AR - fact(1)
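The push and pop behaviour of activation records in this example can be simulated directly. This is a minimal sketch, assuming a hand-rolled control stack in which each record keeps only the procedure name, its parameters, and a return value; a real activation record also holds links, saved machine status, locals and temporaries. The helper `call` is hypothetical.

```python
stack = []     # control stack; the top of the stack is the end of the list
trace = []     # snapshot of the stack taken at every call


def call(name, body, *args):
    """Push an activation record, run the procedure, pop the record."""
    record = {
        "procedure": f"{name}({', '.join(map(str, args))})",
        "params": args,
        "return_value": None,
    }
    stack.append(record)                                 # allocated on entry
    trace.append([ar["procedure"] for ar in stack])
    record["return_value"] = body(*args)
    stack.pop()                                          # de-allocated on exit
    return record["return_value"]


def fact(n):
    return 1 if n == 1 else n * call("fact", fact, n - 1)


def main():
    return call("fact", fact, 3)


f = call("main", main)
print(f)              # 6
for snapshot in trace:
    print(snapshot)
```

The last snapshot is the deepest one, with main(), fact(3), fact(2) and fact(1) on the stack at the same time; once the calls return, the stack is empty again.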
Code Generation Algorithm
Code Generator: A code generator produces the target code for
three-address statements. It uses registers to store the operands of the
three-address statement.
Example: Consider the three-address statement x := y + z. It can have the
following sequence of code:
MOV y, R0
ADD z, R0
MOV R0, x
For each three-address statement of the form x := y op z, the algorithm performs the following actions:
• Invoke a function getreg to find the location L where the result of the
computation y op z should be stored.
• Consult the address descriptor for y to determine y', the current location of y. If the value of y
is currently in both memory and a register, prefer the register for y'. If the value
of y is not already in L, generate the instruction MOV y', L to place a
copy of y in L.
• Generate the instruction OP z', L, where z' is the current
location of z. If z is in both a register and memory, prefer the register.
Update the address descriptor of x to indicate that x is in location L. If L is a register,
update its descriptor to indicate that it holds x, and remove x from all other descriptors.
• If the current values of y or z have no next uses, are not live on exit from the
block, and are in registers, alter the register descriptor to indicate that after
execution of x := y op z those registers will no longer contain y or z.
Generating Code for Assignment Statements:
• The assignment statement d:= (a-b) + (a-c) + (a-c) can be translated into the
following sequence of three address code:
t:= a-b
u:= a-c
v:= t +u
d:= v+u
Code sequence for the example is as follows:
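One possible code sequence for this example can be derived with a much-simplified version of the algorithm above. This is a sketch under stated assumptions: two registers R0 and R1, d as the only name live on exit, each register holding at most one name, no spilling when registers run out, and a next-use test that simply scans the remaining statements. The function `generate` and its descriptor layout are illustrative.

```python
def generate(block, live_on_exit, registers=("R0", "R1")):
    """block: list of (x, op, y, z) meaning x := y op z. Returns target code."""
    opcode = {"+": "ADD", "-": "SUB"}
    reg_desc = {r: None for r in registers}   # register -> name it holds
    addr_desc = {}                            # name -> register holding its value
    out = []

    def used_later(name, index):
        # Simplified next-use test: read by a later statement or live on exit.
        later = block[index + 1:]
        return any(name in (y, z) for _, _, y, z in later) or name in live_on_exit

    for index, (x, op, y, z) in enumerate(block):
        reg_y = addr_desc.get(y)
        if reg_y is not None and not used_later(y, index):
            loc = reg_y                        # getreg: reuse the register holding y
        else:
            # getreg: take an empty register (the sketch fails if none is free)
            loc = next(r for r in registers if reg_desc[r] is None)
            out.append(f"MOV {reg_y or y}, {loc}")
        out.append(f"{opcode[op]} {addr_desc.get(z, z)}, {loc}")

        if reg_y == loc:
            del addr_desc[y]                   # y's value is overwritten
        old = addr_desc.pop(x, None)
        if old is not None and old != loc:
            reg_desc[old] = None               # stale copy of x
        reg_desc[loc] = x
        addr_desc[x] = loc

        for name in (y, z):                    # free registers of dead operands
            r = addr_desc.get(name)
            if r is not None and r != loc and not used_later(name, index):
                reg_desc[r] = None
                del addr_desc[name]

    for r, name in reg_desc.items():           # store names that are live on exit
        if name in live_on_exit:
            out.append(f"MOV {r}, {name}")
    return out


block = [("t", "-", "a", "b"), ("u", "-", "a", "c"),
         ("v", "+", "t", "u"), ("d", "+", "v", "u")]
target = generate(block, live_on_exit={"d"})
print("\n".join(target))
```

Under these assumptions the sketch emits seven instructions: t is computed in R0, u in R1, both additions reuse R0 because t and v have no later use, and d is stored at the end because it is live on exit.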
Cross compiler:
• A cross compiler is a compiler capable of creating executable code for a
platform other than the one on which the compiler is running.
• In other words, it is a compiler that runs on one machine and produces target code for another
machine.
• For example, a cross compiler executes on machine X and produces machine
code for machine Y.
Where is the cross compiler used?