
UNIT-3

INTERMEDIATE CODE GENERATION


In intermediate code generation we use syntax-directed methods to translate the source program into an intermediate form. Programming-language constructs such as declarations, assignments and flow-of-control statements are handled at this stage.

Intermediate code is:


 The output of the Parser and the input to the Code Generator.
 Relatively machine-independent and allows the compiler to be retargeted.
 Relatively easy to manipulate (optimize).

ABSTRACT SYNTAX TREE:


An abstract syntax tree is the condensed form of a parse tree. It is
 Useful for representing language constructs naturally.
 The production S → if B then S1 else S2 may appear as a single if-then-else node with three children for B, S1 and S2.

Abstract syntax trees can be constructed from syntax-directed definitions. In a parse tree, operators and keywords appear as leaves; in an abstract syntax tree they are associated with the interior nodes that would be the parents of those leaves in the parse tree. Chains of single productions are collapsed into one node, with the operators moving up to become the parent nodes.
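As an illustrative sketch (the `Node` class and its printed form are my own, not from the source), an interior node carries the operator or construct name and its children carry the operands:

```python
# A minimal AST node: operators and keywords label interior nodes,
# and chains of single productions collapse away entirely.
class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def __repr__(self):
        if not self.children:
            return self.op
        return f"{self.op}({', '.join(map(repr, self.children))})"

# S -> if B then S1 else S2 becomes one 'if-then-else' node with 3 children
tree = Node('if-then-else', Node('B'), Node('S1'), Node('S2'))
print(tree)   # if-then-else(B, S1, S2)
```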

What is Three Address Code?


Three-address code is a sequence of statements of the general form x := y op z,
where x, y, and z are names, constants, or compiler-generated temporaries; op stands for
any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on
Boolean-valued data. Note that no built-up arithmetic expressions are permitted, as there is only
one operator on the right side of a statement. Thus a source language expression like x + y * z
might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names. This unraveling of
complicated arithmetic expressions and of nested flow-of-control statements makes three-address
code desirable for target code generation and optimization. The use of names for the intermediate
values computed by a program allows three-address code to be easily rearranged, unlike postfix
notation. Three-address code is a linearized representation of a syntax tree or a DAG in which
explicit names correspond to the interior nodes of the graph.
For example, the assignment a := b * -c + b * -c translates into:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
The reason for the term "three-address code" is that each statement usually contains three
addresses: two for the operands and one for the result. In the implementations of three-address
code given later in this section, a programmer-defined name is replaced by a pointer to a symbol-
table entry for that name.
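The sequence above can be produced mechanically by walking the expression bottom-up. The following is a hand-rolled sketch (the helper names `new_temp` and `emit` are illustrative, not a fixed compiler API):

```python
instrs = []    # emitted three-address instructions
count = 0      # temporary-name counter

def new_temp():
    global count
    count += 1
    return f"t{count}"

def emit(line):
    instrs.append(line)

# a := b * -c + b * -c, translated bottom-up from its syntax tree:
t1 = new_temp(); emit(f"{t1} := -c")          # t1 := -c
t2 = new_temp(); emit(f"{t2} := b * {t1}")    # t2 := b * t1
t3 = new_temp(); emit(f"{t3} := -c")          # t3 := -c
t4 = new_temp(); emit(f"{t4} := b * {t3}")    # t4 := b * t3
t5 = new_temp(); emit(f"{t5} := {t2} + {t4}") # t5 := t2 + t4
emit(f"a := {t5}")

print("\n".join(instrs))
```

Note that a DAG-based translator would reuse t1/t2 for the repeated sub-expression b * -c instead of recomputing it as t3/t4.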

Types of Three-Address Statements


Three-address statements are akin to assembly code. Statements can have symbolic labels,
and there are statements for flow of control. A symbolic label represents the index of a three-
address statement in the array holding intermediate code. Actual indices can be substituted for
the labels either by making a separate pass, or by using "backpatching".
1. Assignment statements of the form x: = y op z, where op is a binary arithmetic or logical
operation.
2. Assignment instructions of the form x:= op y, where op is a unary operation. Essential unary
operations include unary minus, logical negation, shift operators, and conversion operators that,
for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x: = y where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator
(<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation
relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in
the usual sequence.
6. param x and call p, n for procedure calls, and return y, where y (representing a returned
value) is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n indicating the number
of actual parameters in call p, n is not redundant because calls can be nested. The
implementation of procedure calls is outlined in Section 8.7.
7. Indexed assignments of the form x: = y[ i ] and x [ i ]: = y. The first of these sets x to the
value in the location i memory units beyond location y. The statement x[i]:=y sets the contents of
the location i units beyond x to the value of y. In both these instructions, x, y, and i refer to data
objects.
8. Address and pointer assignments of the form x := &y, x := *y and *x := y. The first of these
sets the value of x to be the location of y. Presumably y is a name, perhaps a temporary, that
denotes an expression with an l-value such as A[i, j], and x is a pointer name or temporary. That
is, the r-value of x is the l-value (location) of some object. In the statement x := *y, presumably
y is a pointer or a temporary whose r-value is a location. The r-value of x is made equal to the
contents of that location. Finally, *x := y sets the r-value of the object pointed to by x to the r-
value of y.
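One common in-memory representation for all of these statement types is a record with an operator and up to three address fields. The sketch below is an assumed layout (field and operator names are illustrative), not a prescribed one:

```python
# A quadruple-style record: op plus up to three addresses.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TAC:
    op: str                       # '+', 'uminus', 'copy', 'goto', 'if<', ...
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    result: Optional[str] = None

code = [
    TAC('+', 'y', 'z', 'x'),          # 1. x := y + z
    TAC('uminus', 'y', None, 'x'),    # 2. x := -y
    TAC('copy', 'y', None, 'x'),      # 3. x := y
    TAC('goto', None, None, 'L1'),    # 4. goto L1
    TAC('if<', 'x', 'y', 'L2'),       # 5. if x < y goto L2
    TAC('param', 'x1'),               # 6. param x1
    TAC('call', 'p', '1'),            #    call p, 1
    TAC('[]=', 'y', 'i', 'x'),        # 7. x[i] := y
    TAC('addr', 'y', None, 'x'),      # 8. x := &y
]
print(len(code), "statements")
```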

What is Polish Notation?


Polish notation is also known as prefix notation. Polish notation helps compilers
evaluate mathematical expressions following the order of operations using operator
precedence notation, which defines the order in which operators should be evaluated,
such as multiplication before addition.
In 1924 Jan Łukasiewicz thought of parenthesis-free notations, and this is where the
Polish Notation was invented.
Suppose you have to add 3 and 6 and multiply the result by 2. Normally we write this
as (3 + 6) * 2, called infix notation because the operators sit in between the
operands. In this expression the parentheses are required. But when you write this in prefix
notation, the expression becomes * + 3 6 2, and no parentheses are needed. It is
evaluated by first adding 3 and 6 and then multiplying the result by 2, which gives 18.
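The evaluation order falls out naturally from a recursive reader: each operator consumes exactly two operand sub-expressions. A small sketch (my own illustration, not from the source):

```python
# Evaluate a prefix (Polish) expression such as "* + 3 6 2":
# an operator token recursively evaluates its two operands.
def eval_prefix(tokens):
    tok = next(tokens)
    if tok in '+-*/':
        a = eval_prefix(tokens)
        b = eval_prefix(tokens)
        return {'+': a + b, '-': a - b,
                '*': a * b, '/': a / b}[tok]
    return float(tok)

print(eval_prefix(iter("* + 3 6 2".split())))   # 18.0, i.e. (3 + 6) * 2
```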

Advantages
Here are some advantages of polish notation in compiler design:
1. No need for parentheses: in Polish notation there is no need for parentheses
when writing arithmetic expressions, as the operators come before the operands.
2. Efficient evaluation: evaluating an expression is easier in Polish notation
because a stack can be used for the evaluation.
3. Easy parsing: in Polish notation, parsing is easier than for infix notation.
4. Less scanning: the compiler needs fewer scans, as parentheses are not used
in Polish notation and the compiler does not need to scan operators and
operands differently.
Disadvantages
Here are some disadvantages of polish notation in compiler design:
1. Unfamiliar: someone who sees Polish notation for the first time and does not know
about it will find it very hard to understand how to evaluate the expression.
2. Not commonly used: Polish notation is rarely used in day-to-day life; it is
mostly used for scientific purposes.
3. Difficult for programmers: programmers who are not familiar with Polish
notation will find the expressions hard to read.

ATTRIBUTE GRAMMARS: A CFG G = (V, T, P, S) is called an attribute

grammar iff,
in G, each grammar symbol X ∈ V ∪ T has an associated set of attributes, and each
production p ∈ P is associated with a set of attribute evaluation rules called semantic actions.
In an AG, the values of attributes at a parse tree node are computed by semantic rules.
→ There are two different specifications of AGs used by the Semantic Analyzer in
evaluating the semantics of the program constructs. They are:
- Syntax directed definitions (SDDs)
 o High level specifications
 o Hide implementation details
 o Do not specify an explicit order of evaluation
- Syntax directed translation schemes (SDTs)
 o An SDD which also indicates the order in which semantic rules are to be evaluated
 o Allow some implementation details to be shown
An attribute grammar is the formal expression of the syntax-derived semantic checks associated
with a grammar. It represents the rules of a language not explicitly imparted by the syntax.

There are two ways for writing attributes:


1) Syntax Directed Definition (SDD): a context-free grammar in which a set of semantic
actions is embedded in (associated with) each production of G. It is a high-level specification
in which implementation details are hidden, e.g.,
S.sys = A.sys + B.sys;
/* Does not give any implementation details; it just tells us which attribute
equations to use. Details such as at what point of time they are evaluated and in what
manner are hidden from the programmer. */
E → E1 + T { E.val = E1.val + T.val }
E → T { E.val = T.val }
T → T1 * F { T.val = T1.val * F.val }
T → F { T.val = F.val }
F → (E) { F.val = E.val }
F → id { F.val = id.lexval }
F → num { F.val = num.lexval }
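The SDD above can be evaluated directly by a recursive-descent parser that computes each nonterminal's val attribute as it parses. This is a hedged sketch of one possible implementation (left recursion is replaced by loops, which preserves the attribute equations):

```python
# Evaluate E.val / T.val / F.val for the grammar above.
def parse(expr):
    toks = expr.replace('(', ' ( ').replace(')', ' ) ') \
               .replace('+', ' + ').replace('*', ' * ').split()
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def E():                  # E -> E1 + T : E.val = E1.val + T.val
        nonlocal pos
        val = T()
        while peek() == '+':
            pos += 1
            val += T()
        return val

    def T():                  # T -> T1 * F : T.val = T1.val * F.val
        nonlocal pos
        val = F()
        while peek() == '*':
            pos += 1
            val *= F()
        return val

    def F():                  # F -> ( E ) | num : F.val = E.val | num.lexval
        nonlocal pos
        if peek() == '(':
            pos += 1
            val = E()
            pos += 1          # skip ')'
            return val
        val = int(toks[pos])
        pos += 1
        return val

    return E()

print(parse("2 + 3 * 4"))    # 14
```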
2) Syntax Directed Translation (SDT) scheme: sometimes we want to control the way the
attributes are evaluated, that is, the order and place where they are evaluated. This is a slightly
lower-level specification.
An SDT is an SDD in which semantic actions can be placed at any position in the body of
the production.
For example, the following SDT prints the prefix equivalent of an arithmetic expression consisting of
+ and * operators.
L → E n { printf("E.val") }
E → { printf("+") } E1 + T
E → T
T → { printf("*") } T1 * F
T → F
F → (E)
F → { printf("id.lexval") } id
F → { printf("num.lexval") } num
Each action in an SDT is executed as soon as its node in the parse tree is visited in a preorder
traversal of the tree.
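One way to realize the effect of this SDT (an illustrative sketch, not the book's implementation) is to build the prefix string bottom-up while parsing, placing each operator before the text of the sub-expressions it governs:

```python
# Produce the prefix form the SDT above describes, e.g. "a + b * c" -> "+ a * b c".
def to_prefix(expr):
    toks = expr.replace('(', ' ( ').replace(')', ' ) ') \
               .replace('+', ' + ').replace('*', ' * ').split()
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def E():                  # E -> { print '+' } E1 + T
        nonlocal pos
        s = T()
        while peek() == '+':
            pos += 1
            s = '+ ' + s + ' ' + T()
        return s

    def T():                  # T -> { print '*' } T1 * F
        nonlocal pos
        s = F()
        while peek() == '*':
            pos += 1
            s = '* ' + s + ' ' + F()
        return s

    def F():                  # F -> ( E ) | id/num: print the lexeme itself
        nonlocal pos
        if peek() == '(':
            pos += 1
            s = E()
            pos += 1          # skip ')'
            return s
        s = toks[pos]
        pos += 1
        return s

    return E()

print(to_prefix("a + b * c"))   # + a * b c
```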
Conceptually both the SDD and SDT schemes will:
 Parse input token stream
 Build parse tree
 Traverse the parse tree to evaluate the semantic rules at the parse tree nodes
Evaluation may:
 Generate code
 Save information in the symbol table
 Issue error messages
 Perform any other activity
symbol | attributes
number | value
sign | negative
list | position, value
bit | position, value
Syntax directed translation
In syntax directed translation, along with the grammar we associate some informal
notations, and these notations are called semantic rules.

So we can say that

1. Grammar + semantic rule = SDT (syntax directed translation)


→ In syntax directed translation, every non-terminal can get one or more attributes,
or sometimes none, depending on the type of the attribute. The values
of these attributes are evaluated by the semantic rules associated with the production
rule.

→ In the semantic rules below, the attribute is VAL; an attribute may hold anything, such as a
string, a number, a memory location or a complex record.

→ In syntax directed translation, whenever a construct is encountered in the
programming language, it is translated according to the semantic rules defined
for it in that particular programming language.

Example
Production Semantic Rules
E→E+T E.val := E.val + T.val

E→T E.val := T.val

T→T*F T.val := T.val * F.val

T→F T.val := F.val

F → (E) F.val := E.val

F → num F.val := num.lexval

E.val is one of the attributes of E.

num.lexval is the attribute returned by the lexical analyzer.

SPECIFICATIONS OF A TYPE CHECKER: Consider a language which

consists of a sequence of declarations followed by a single expression:
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ^ T
E → literal | num | E mod E | E [ E ] | E ^
→A type checker is a translation scheme that synthesizes the type of each expression from the
types of its sub-expressions. Consider the above given grammar that generates programs
consisting of a sequence of declarations D followed by a single expression E.
Specifications of a type checker for the language of the above grammar: A program generated
by this grammar is
key : integer;
key mod 1999
Assumptions:
1. The language has three basic types: char, integer and type-error.
2. For simplicity, all arrays start at 1. For example, the declaration array[256] of char leads to the
type expression array(1..256, char).
Rules for symbol table entry:
D → id : T addtype(id.entry, T.type)
T → char T.type = char
T → integer T.type = integer
T → ^T1 T.type = pointer(T1.type)
T → array [ num ] of T1 T.type = array(1..num, T1.type)
TYPE CHECKING OF FUNCTIONS:
Consider the syntax directed definition
E → E1 ( E2 ) E.type = if E2.type == s and E1.type == s → t
then t
else type-error
TYPE CHECKING FOR EXPRESSIONS: Consider the following SDD for expressions:
E → literal E.type = char
E → num E.type = integer
E → id E.type = lookup(id.entry)
E → E1 mod E2 E.type = if E1.type == integer and E2.type == integer then integer else type_error
E → E1 [ E2 ] E.type = if E2.type == integer and E1.type == array(s, t) then t else type_error
E → E1 ^ E.type = if E1.type == pointer(t) then t else type_error
TYPE CHECKING OF STATEMENTS: Statements typically do not have values. Special
basic type void can be assigned to them. Consider the SDD for the grammar below which
generates Assignment statements conditional, and looping statements.
S → id := E S.type = if id.type == E.type then void else type_error
S → if E then S1 S.type = if E.type == boolean then S1.type else type_error
S → while E do S1 S.type = if E.type == boolean then S1.type else type_error
S → S1 ; S2 S.type = if S1.type == void and S2.type == void then void else type_error
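The expression rules above translate almost line for line into a recursive checker. The sketch below is a hedged illustration (the tuple encoding of expressions and types is my own choice; `env` stands in for the symbol table):

```python
# Synthesize E.type from the types of sub-expressions, per the SDD above.
def check_expr(e, env):
    kind = e[0]
    if kind == 'literal': return 'char'
    if kind == 'num':     return 'integer'
    if kind == 'id':      return env.get(e[1], 'type_error')   # lookup(id.entry)
    if kind == 'mod':     # E -> E1 mod E2
        t1, t2 = check_expr(e[1], env), check_expr(e[2], env)
        return 'integer' if t1 == t2 == 'integer' else 'type_error'
    if kind == 'index':   # E -> E1 [ E2 ]
        t1, t2 = check_expr(e[1], env), check_expr(e[2], env)
        if t2 == 'integer' and isinstance(t1, tuple) and t1[0] == 'array':
            return t1[2]                 # element type t of array(s, t)
        return 'type_error'
    if kind == 'deref':   # E -> E1 ^
        t1 = check_expr(e[1], env)
        return t1[1] if isinstance(t1, tuple) and t1[0] == 'pointer' else 'type_error'
    return 'type_error'

# key : integer;  line : array [256] of char
env = {'key': 'integer', 'line': ('array', (1, 256), 'char')}
print(check_expr(('mod', ('id', 'key'), ('num', 1999)), env))   # integer
print(check_expr(('index', ('id', 'line'), ('num', 7)), env))   # char
```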

Symbol Table
→Symbol table is an important data structure used in a compiler.

→Symbol table is used to store information about the occurrence of various entities
such as objects, classes, variable names, interfaces and function names. It is used by both the
analysis and synthesis phases.

→The symbol table is used for the following purposes:

 It is used to store the name of all entities in a structured form at one place.

 It is used to verify if a variable has been declared.

 It is used to determine the scope of a name.

 It is used to implement type checking by verifying that assignments and expressions in
the source code are semantically correct.

A symbol table entry has the format:

<symbol name, type, attribute>


For example, suppose the symbol table stores information about the following variable
declaration:

static int salary


then, it stores an entry in the following format:

<salary, int, static>

The clause attribute contains the entries related to the name.


Implementation: the symbol table can be implemented as an unordered list if the
compiler only needs to handle a small amount of data.

A symbol table can be implemented in one of the following techniques:

Linear (sorted or unsorted) list

Hash table

Binary search tree


Symbol tables are mostly implemented as hash tables.

Operations
The symbol table provides the following operations:

insert()
The insert() operation is used more frequently in the analysis phase, when tokens
are identified and names are stored in the table.

The insert() operation is used to insert information into the symbol table, such as a
unique name occurring in the source code.

int x;

should be processed by the compiler as:

insert(x, int)

lookup()
In the symbol table, lookup() operation is used to search a name. It is used to determine:

The existence of the symbol in the table.

The declaration of the symbol before it is used.

Whether the name is used in the scope.

The initialization of the symbol.

Whether the name is declared multiple times.


The basic format of lookup() function is as follows:

lookup(symbol)
This format varies according to the programming language.
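Putting the two operations together, here is a minimal hash-table-backed symbol table. It is a sketch under assumed conventions (the entry fields `type` and `attribute` mirror the <symbol name, type, attribute> format above; the class and method names are illustrative):

```python
# A toy symbol table: insert() during analysis, lookup() to query names.
class SymbolTable:
    def __init__(self):
        self.entries = {}        # Python dicts are hash tables

    def insert(self, name, type_, attribute=None):
        if name in self.entries:
            raise KeyError(f"{name} declared multiple times")
        self.entries[name] = {'type': type_, 'attribute': attribute}

    def lookup(self, name):
        return self.entries.get(name)    # None if not declared

table = SymbolTable()
table.insert('salary', 'int', 'static')   # static int salary
table.insert('x', 'int')                  # int x;
print(table.lookup('salary'))   # {'type': 'int', 'attribute': 'static'}
print(table.lookup('missing'))  # None
```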
ORGANIZATION FOR BLOCK STRUCTURES:
A block is any sequence of operations or instructions that is used to perform a [sub]task. In
any programming language,
 Blocks contain their own local data structures.
 Blocks can be nested, and their beginnings and ends are marked by delimiters.
 They ensure that each block is either independent of the others or nested in another block. That is,
it is not possible for two blocks B1 and B2 to overlap in such a way that B1
begins first, then B2 begins, but B1 ends before B2.
 This nesting property is called block structure. The scope of a declaration in a block-
structured language is given by the most closely nested rule:
1. The scope of a declaration in a block B includes B.
2. If a name X is not declared in a block B, then an occurrence of X in B is in the scope
of a declaration of X in an enclosing block B' such that B' has a declaration of X, and B'
is more closely nested around B than any other block with a declaration of X.
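The most closely nested rule can be implemented by chaining each block's table to its enclosing block's table and walking outward on lookup. A sketch (the `Scope` class is my own illustration):

```python
# Each block gets its own table; lookup walks toward the outermost block,
# so the most closely nested declaration wins.
class Scope:
    def __init__(self, parent=None):
        self.names = {}
        self.parent = parent

    def declare(self, name, value):
        self.names[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:          # walk out through enclosing blocks
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(name)

b0 = Scope();   b0.declare('a', 0); b0.declare('b', 0)   # block B0
b1 = Scope(b0); b1.declare('b', 1)                       # B1 nested in B0
b2 = Scope(b1); b2.declare('a', 2)                       # B2 nested in B1

print(b2.lookup('a'), b2.lookup('b'))   # 2 1: a from B2, b from B1
```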

Department of Computer Science & Engineering Course File : Compiler Design


DECLARATION SCOPE
int a = 0 B0, not including B2
int b = 0 B0, not including B1
int b = 1 B1, not including B3
int a = 2 B2 only
int b = 3 B3 only
The outcome of the print statements in the nested blocks will therefore be:
2 1
0 3
0 1
0 0
Blocks:
 Blocks are simpler to handle than procedures.
 Blocks can be treated as parameterless procedures.
 Use a stack for memory allocation.
 Allocate space for the complete procedure body at one time.

BLOCK STRUCTURED AND NON-BLOCK STRUCTURED STORAGE ALLOCATION

Storage binding and symbolic registers: translating variable names into addresses is a
process that must occur before or during code generation.
 Each variable is assigned an address or addressing method.
 Each variable is assigned an offset with respect to a base address, which changes with every
invocation.
 Variables fall into four classes: global, global static, stack, and local (non-stack) static.
The variable names have to be translated into addresses before or during code generation.
There is a base address, and every name is given an offset with respect to this base which changes
with every invocation. The variables can be divided into four categories:
a) Global Variables: fixed relocatable address or offset with respect to a base such as the global pointer.
b) Global Static Variables: global variables have static duration (hence
also called static variables): they last, and the values stored in them persist, for as long as the
program does. (Of course, the values can in general still be overwritten, so they don't necessarily
persist forever.) Therefore they have a fixed relocatable address or offset with respect to a base
such as the global pointer.
c) Stack Variables: allocate stack/global variables in registers where possible. Registers are not
indexable, therefore arrays cannot be kept in registers.
 Assign symbolic registers to scalar variables.
 Use graph coloring for global register allocation.
d) Stack Static Variables: by default, local variables (stack variables, those declared within a
function) have automatic duration: they spring into existence when the function is called, and
they (and their values) disappear when the function returns. This is why they are stored on the stack
and have an offset from the stack/frame pointer.
Register allocation is usually done for global variables. Since registers are not indexable,
arrays cannot be kept in registers, as they are indexed data structures. Graph coloring is a
simple technique for allocating registers and minimizing register spills that works well in practice.
→ When a register must be freed for immediate use, its contents are stored in memory (a spill).
We assign symbolic registers to scalar variables, which are then used in graph coloring.
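Graph coloring for register allocation can be sketched as follows. This is a deliberately naive greedy version for illustration (production allocators use Chaitin-style simplify/spill heuristics); nodes are symbolic registers and an edge means the two variables are live at the same time:

```python
# Greedy graph coloring: give each variable one of k registers such that
# no two interfering (simultaneously live) variables share a register.
def color(interference, k):
    assignment = {}
    for var in interference:
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        assignment[var] = free[0] if free else 'spill'   # spill when out of colors
    return assignment

# a interferes with b and c; c also interferes with d.
graph = {'a': {'b', 'c'}, 'b': {'a'}, 'c': {'a', 'd'}, 'd': {'c'}}
print(color(graph, 2))   # {'a': 0, 'b': 1, 'c': 1, 'd': 0}
```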

Local Variables in Frame


 Assign locals to consecutive locations, allowing enough space for each.
 Word-sized objects may be put on half-word boundaries:
 requires two half-word loads,
 requires shift, or, and instructions to reassemble the word.
 Aligning on double-word boundaries wastes space.
 The machine may allow only small offsets.

STATIC ALLOCATION: In this scheme, a call statement is implemented by a sequence of two

instructions:
 A move instruction saves the return address.
 A goto transfers control to the target code.
The instruction sequence is
MOV #here+20, callee.static-area
GOTO callee.code-area
callee.static-area and callee.code-area are constants referring to the address of the activation record
and the first address of the called procedure, respectively.
#here+20 in the move instruction is the return address: the address of the instruction following
the goto instruction.
A return from procedure callee is implemented by
GOTO *callee.static-area
For the call statement, we need to save the return address somewhere and then jump to
the location of the callee function. callee.static-area is a fixed location in memory. 20 is added
to #here because the code corresponding to the call takes 20 bytes (at 4 bytes per word:
4*3 = 12 for the MOV instruction, and 8 for the GOTO that follows). Then GOTO callee.code-area
takes us to the code of the callee, as callee.code-area is merely the address where the code of
the callee starts. A return from the callee is implemented by GOTO *callee.static-area. Note
that this works only because callee.static-area is a constant.
Example:
Assume each action block takes 20 bytes of space, and that the start addresses of the code
for c and p are 100 and 200. The activation records are statically allocated starting at
addresses 300 and 364.
100: ACTION-1
120: MOV 140, 364
132: GOTO 200
140: ACTION-2
160: HALT
200: ACTION-3
220: GOTO *364
300: (activation record for c)
364: (activation record for p; holds the return address)
→ Statically we say that the code for c starts at 100 and that for p starts at 200. At some point,
c calls p. Using the strategy discussed earlier, and assuming that callee.static-area is at
memory location 364, we get the code as given. Here we assume that a call to 'action'
corresponds to a single machine instruction which takes 20 bytes.

RUN TIME STORAGE MANAGEMENT:


To study the run-time storage management system it is sufficient to focus on the statements:
action, call, return and halt, because they by themselves give us sufficient insight into the
behavior shown by functions in calling each other and returning.
And the run-time allocation and de-allocation of activations occur on the call of functions and
when they return.
There are mainly two kinds of run-time allocation systems: Static allocation and Stack
Allocation. While static allocation is used by the FORTRAN class of languages, stack allocation
is used by the Ada class of languages.

Storage Allocation
The different ways to allocate memory are:

1.Static storage allocation

2.Stack storage allocation

3.Heap storage allocation

Static storage allocation


In static allocation, names are bound to storage locations at compile time.

If memory is created at compile time, then it is created in the static
area and only once.

Static allocation does not support dynamic data structures: memory is
created only at compile time and deallocated only after program completion.

One drawback of static storage allocation is that the size and position of data
objects must be known at compile time.

Another drawback is that recursive procedures are restricted.

Stack Storage Allocation


In stack storage allocation, storage is organized as a stack.

An activation record is pushed onto the stack when an activation begins, and it is
popped when the activation ends.

The activation record contains the locals, so that they are bound to fresh storage in
each activation. The values of the locals are deleted when the activation ends.

It works on a last-in-first-out (LIFO) basis, and this allocation supports
recursion.

Heap Storage Allocation


Heap allocation is the most flexible allocation scheme.

Allocation and deallocation of memory can be done at any time and at any place,
depending upon the user's requirements.

Heap allocation is used to allocate memory to variables dynamically and to
reclaim it when the variables are no longer used.

Heap storage allocation also supports recursion.

Example:
1. fact (int n)
2. {
3. if (n<=1)
4. return 1;
5. else
6. return (n * fact(n-1));
7. }
8. fact (6)
The dynamic allocation is as follows:
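The stack behaviour can be made visible with a small simulation (an illustrative sketch; the explicit `stack` list stands in for the run-time control stack, with each record holding just the parameter n):

```python
# Simulate stack allocation for fact(6): push an activation record per call,
# pop it on return, and record the stack depth as it grows.
stack = []     # the run-time control stack of activation records
trace = []     # snapshot of the stack at each call

def fact(n):
    stack.append({'n': n})                    # push activation record
    trace.append([f['n'] for f in stack])
    result = 1 if n <= 1 else n * fact(n - 1)
    stack.pop()                               # pop on return
    return result

print(fact(6))     # 720
print(trace[-1])   # deepest stack: [6, 5, 4, 3, 2, 1]
```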
