
UNIT-III

In intermediate code generation we use syntax-directed methods to translate programming language constructs of the source program, such as declarations, assignments, and flow-of-control statements, into an intermediate form.

Figure 4.1 : Intermediate Code Generator


Intermediate code is:

 The output of the Parser and the input to the Code Generator.
 Relatively machine-independent and allows the compiler to be retargeted.
 Relatively easy to manipulate (optimize).

What are the Advantages of an intermediate language?

Advantages of using an intermediate language include:

1. Retargeting is facilitated - a compiler for a new machine can be built by attaching a new code generator to an existing front-end.

2. Optimization - intermediate code optimizers can be reused in compilers for different languages and different machines.

Note: the terms "intermediate code", "intermediate language", and "intermediate representation" are all used interchangeably.

Types of Intermediate representations / forms: There are three types of intermediate representation:

1. Syntax Trees

2. Postfix notation

3. Three Address Code

Semantic rules for generating three-address code from common programming language constructs are similar to those for constructing syntax trees or for generating postfix notation.

Graphical Representations

A syntax tree depicts the natural hierarchical structure of a source program. A DAG (Directed Acyclic Graph) gives the same information but in a more compact way because common sub-expressions are identified. A syntax tree for the assignment statement a := b * -c + b * -c appears in the following figure.

            assign
           /      \
          a        +
                 /    \
                *      *
               / \    / \
              b uminus b uminus
                  |        |
                  c        c

Figure 4.2 : Abstract Syntax Tree for the statement a:=b*-c+b*-c

Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree in the figure is

a b c uminus * b c uminus * + assign

The edges in a syntax tree do not appear explicitly in postfix notation. They can be recovered from the order in which the nodes appear and the number of operands that the operator at a node expects. The recovery of edges is similar to the evaluation, using a stack, of an expression in postfix notation.
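The stack-based recovery of edges can be sketched as follows. This is an illustration of our own (the arity table and the tuple-of-indices edge format are assumptions, not part of the text): each operator pops the nodes for its operands, and the popped nodes become its children.

```python
# Map each operator to the number of operands it expects (assumed arities).
ARITY = {"uminus": 1, "+": 2, "*": 2, "assign": 2}

def edges_from_postfix(tokens):
    """Scan the postfix token list with a stack; every operator pops its
    operands, and each pop yields one recovered edge (parent, child),
    where nodes are identified by their position in the token list."""
    stack, edges = [], []
    for i, tok in enumerate(tokens):
        if tok in ARITY:
            for _ in range(ARITY[tok]):
                child = stack.pop()
                edges.append((i, child))   # edge: operator node -> child node
            stack.append(i)                # the operator node replaces its operands
        else:
            stack.append(i)                # operand: push its node index
    return edges

postfix = ["a", "b", "c", "uminus", "*", "b", "c", "uminus", "*", "+", "assign"]
print(edges_from_postfix(postfix))
```

Running this on the postfix string above reproduces exactly the parent-child edges of the syntax tree in Figure 4.2.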

What is Three Address Code?

Three-address code is a sequence of statements of the general form: x := y op z

where x, y, and z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on Boolean-valued data. Note that no built-up arithmetic expressions are permitted, as there is only one operator on the right side of a statement. Thus a source language expression like x + y * z might be translated into the sequence

t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names. This unraveling of complicated arithmetic expressions and of nested flow-of-control statements makes three-address code desirable for target code generation and optimization. The use of names for the intermediate values computed by a program allows three-address code to be easily rearranged - unlike postfix notation. Three-address code is a linearized representation of a syntax tree or a DAG in which explicit names correspond to the interior nodes of the graph.

Intermediate code (three-address code) for the above arithmetic expression:

t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
The reason for the term "three-address code" is that each statement usually contains three addresses, two for the operands and one for the result. In the implementations of three-address code given later in this section, a programmer-defined name is replaced by a pointer to a symbol-table entry for that name.
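The step from an expression tree to the temporaries t1..t5 above can be sketched in code. This is a minimal illustration of our own (the tuple AST shape and the helper names newtemp/gen are assumptions): one temporary per interior node, one operator per emitted statement.

```python
# Generate three-address code from a nested expression tree.
temp_count = 0
code = []

def newtemp():
    """Return a fresh temporary name t1, t2, ... on each call."""
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(node):
    """Return the name holding the value of node, emitting statements
    into `code` for every interior (operator) node."""
    if isinstance(node, str):          # a leaf: identifier or constant
        return node
    op, *kids = node                   # interior node: (op, child, ...)
    places = [gen(k) for k in kids]
    t = newtemp()
    if len(places) == 1:
        code.append(f"{t} := {op} {places[0]}")
    else:
        code.append(f"{t} := {places[0]} {op} {places[1]}")
    return t

# a := b * -c + b * -c
expr = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
code.append(f"a := {gen(expr)}")
print("\n".join(code))
```

The emitted sequence matches the t1..t5 listing above exactly, because the tree is walked children-first.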

Types of Three-Address Statements

Three-address statements are akin to assembly code. Statements can have symbolic labels and there are statements for flow of control. A symbolic label represents the index of a three-address statement in the array holding intermediate code. Actual indices can be substituted for the labels either by making a separate pass, or by using "back patching," discussed in Section 8.6. Here are the common three-address statements used in the remainder of this book:

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.

2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.

3. Copy statements of the form x := y, where the value of y is assigned to x.

4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator
(<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation
relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in
the usual sequence.

6. param x and call p, n for procedure calls and return y, where y, representing a returned value, is optional. Their typical use is as the sequence of three-address statements

param x1
param x2
...
param xn
call p, n

generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n indicating the number of actual parameters in "call p, n" is not redundant because calls can be nested. The implementation of procedure calls is outlined in Section 8.7.

7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the value in the location i memory units beyond location y. The statement x[i] := y sets the contents of the location i units beyond x to the value of y. In both these instructions, x, y, and i refer to data objects.

8. Address and pointer assignments of the form x := &y, x := *y and *x := y. The first of these sets the value of x to be the location of y. Presumably y is a name, perhaps a temporary, that denotes an expression with an l-value such as A[i, j], and x is a pointer name or temporary. That is, the r-value of x is the l-value (location) of some object. In the statement x := *y, presumably y is a pointer or a temporary whose r-value is a location. The r-value of x is made equal to the contents of that location. Finally, *x := y sets the r-value of the object pointed to by x to the r-value of y.

The choice of allowable operators is an important issue in the design of an intermediate form. The operator set must clearly be rich enough to implement the operations in the source language. A small operator set is easier to implement on a new target machine. However, a restricted instruction set may force the front end to generate long sequences of statements for some source-language operations. The optimizer and code generator may then have to work harder if good code is to be generated.

SYNTAX DIRECTED TRANSLATION OF THREE ADDRESS CODE

When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. The value of a nonterminal on the right side of a production is computed into a new temporary t. In general, the three-address code for id := E consists of code to evaluate E into some temporary t, followed by the assignment id.place := t. If an expression is a single identifier, say y, then y itself holds the value of the expression. For the moment, we create a new name every time a temporary is needed; techniques for reusing temporaries are given in Section 8.3. The S-attributed definition in Fig. 8.6 generates three-address code for assignment statements. Given input a := b * -c + b * -c, it produces the code in Fig. 8.5(a). The synthesized attribute S.code represents the three-address code for the assignment S. The non-terminal E has two attributes:

1. E.place, the name that will hold the value of E, and

2. E.code, the sequence of three-address statements evaluating E.

The function newtemp returns a sequence of distinct names t1, t2, ... in response to successive calls. For convenience, we use the notation gen(x ':=' y '+' z) in Fig. 8.6 to represent the three-address statement x := y + z. Expressions appearing instead of variables like x, y, and z are evaluated when passed to gen, and quoted operators or operands, like '+', are taken literally. In practice, three-address statements might be sent to an output file, rather than built up into the code attributes. Flow-of-control statements can be added to the language of assignments in Fig. 8.6 by productions and semantic rules like the ones for while statements in Fig. 8.7. In the figure, the code for S → while E do S1 is generated using new attributes S.begin and S.after to mark the first statement in the code for E and the statement following the code for S, respectively.

These attributes represent labels created by a function newlabel that returns a new label every time it is called.
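The S.begin/S.after scheme for while statements can be sketched as follows. This is an illustration of our own (newlabel, the emit format, and the "if t = 0 goto" test for a false condition are assumptions about one plausible encoding, not the book's exact rules):

```python
label_count = 0

def newlabel():
    """Return a fresh label L1, L2, ... on each call."""
    global label_count
    label_count += 1
    return f"L{label_count}"

def gen_while(cond_code, cond_place, body_code):
    """Three-address code for `while E do S1`: S.begin marks the start of
    the code for E, S.after marks the statement following the loop."""
    begin, after = newlabel(), newlabel()   # S.begin and S.after
    return (
        [f"{begin}:"]
        + cond_code
        + [f"if {cond_place} = 0 goto {after}"]   # exit when condition is false
        + body_code
        + [f"goto {begin}", f"{after}:"]
    )

# while i < n do i := i + 1
code = gen_while(["t1 := i < n"], "t1", ["i := i + 1"])
print("\n".join(code))
```

Note how the two labels delimit exactly the two points the text names: the first statement of the condition code and the statement after the loop.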
IMPLEMENTATIONS OF THREE-ADDRESS STATEMENTS:

A three-address statement is an abstract form of intermediate code. In a compiler, these statements can be implemented as records with fields for the operator and the operands. Three such representations are quadruples, triples, and indirect triples.

QUADRUPLES:

A quadruple is a record structure with four fields, which we call op, arg1, arg2, and result. The op field contains an internal code for the operator. The three-address statement x := y op z is represented by placing y in arg1, z in arg2, and x in result. Statements with unary operators like x := -y or x := y do not use arg2. Operators like param use neither arg2 nor result. Conditional and unconditional jumps put the target label in result. The quadruples in Fig. 8.8(a) are for the assignment a := b * -c + b * -c. They are obtained from the three-address code. The contents of fields arg1, arg2, and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.

TRIPLES:

To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it. If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2, as shown below. The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table (for programmer-defined names or constants) or pointers into the triple structure (for temporary values). Since three fields are used, this intermediate code format is known as triples. Except for the treatment of programmer-defined names, triples correspond to the representation of a syntax tree or DAG by an array of nodes.

      op      arg1  arg2  result            op      arg1  arg2
(0)   uminus  c           t1          (0)   uminus  c
(1)   *       b     t1    t2          (1)   *       b     (0)
(2)   uminus  c           t3          (2)   uminus  c
(3)   *       b     t3    t4          (3)   *       b     (2)
(4)   +       t2    t4    t5          (4)   +       (1)   (3)
(5)   :=      t5          a           (5)   :=      a     (4)

Table 8.8(a): Quadruples              Table 8.8(b): Triples

Parenthesized numbers represent pointers into the triple structure, while symbol-table pointers are represented by the names themselves. In practice, the information needed to interpret the different kinds of entries in the arg1 and arg2 fields can be encoded into the op field or some additional fields. The triples in Fig. 8.8(b) correspond to the quadruples in Fig. 8.8(a). Note that the copy statement a := t5 is encoded in the triple representation by placing a in the arg1 field and using the operator assign. A ternary operation like x[i] := y requires two entries in the triple structure, as shown in Fig. 8.9(a), while x := y[i] is naturally represented as two operations in Fig. 8.9(b).
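The two record layouts above can be contrasted directly in code. This is a sketch of our own (the tuples are one possible record encoding; using a plain int for a triple-structure pointer is our convention):

```python
# Quadruples: (op, arg1, arg2, result) -- temporaries have explicit names,
# so they would need symbol-table entries.
quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    (":=",     "t5", None, "a"),
]

# Triples: (op, arg1, arg2) -- a temporary is referred to by the index
# (position) of the triple that computes it, written here as an int.
triples = [
    ("uminus", "c", None),
    ("*",      "b", 0),     # (0) is the uminus triple above
    ("uminus", "c", None),
    ("*",      "b", 2),
    ("+",      1,   3),
    (":=",     "a", 4),     # copy a := t5, with a in arg1 as in Table 8.8(b)
]

# The operator sequence is the same in both layouts; only the way
# intermediate values are named differs.
for i, (op, *_rest) in enumerate(triples):
    assert quads[i][0] == op
print("operators agree at every position")
```

The design trade-off is visible here: quadruples can be reordered freely (each result has a name), whereas moving a triple invalidates every parenthesized pointer to it.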

Indirect Triples

Another implementation of three-address code that has been considered is that of listing
pointers to triples, rather than listing the triples themselves. This implementation is naturally
called indirect triples. For example, let us use an array statement to list pointers to triples in the
desired order. Then the triples in Fig. 8.8(b) might be represented as in Fig. 8.10.

Figure 8.10 : Indirect Triples

SEMANTIC ANALYSIS: This phase focuses mainly on the following aspects of translation:

. Checking the semantics
. Error reporting
. Disambiguating overloaded operators
. Type coercion
. Static checking
  - Type checking
  - Control flow checking
  - Uniqueness checking
  - Name checking

Assume that the program has been verified to be syntactically correct and converted into some kind of intermediate representation (a parse tree). One now has the parse tree available. The next phase will be semantic analysis of the generated parse tree. Semantic analysis also includes error reporting in case any semantic error is found.

Semantic analysis is a pass by a compiler that adds semantic information to the parse tree and performs certain checks based on this information. It logically follows the parsing phase, in which the parse tree is generated, and logically precedes the code generation phase, in which (intermediate/target) code is generated. (In a compiler implementation, it may be possible to fold different phases into one pass.) Typical examples of semantic information that is added and checked are typing information (type checking) and the binding of variables and function names to their definitions (object binding). Sometimes some early code optimization is also done in this phase. For this phase the compiler usually maintains symbol tables in which it stores what each symbol (variable names, function names, etc.) refers to.

FOLLOWING THINGS ARE DONE IN SEMANTIC ANALYSIS:

Disambiguate overloaded operators: If an operator is overloaded, one would like to pin down the meaning of that particular use of the operator, because the code generation phase that follows needs a single, unambiguous meaning.

TYPE CHECKING: The process of verifying and enforcing the constraints of types is called type checking. This may occur either at compile-time (a static check) or run-time (a dynamic check). Static type checking is a primary task of the semantic analysis carried out by a compiler. If type rules are enforced strongly (that is, generally allowing only those automatic type conversions which do not lose information), the language is called strongly typed; if not, weakly typed.
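A static type check over a tiny expression language can be sketched as follows. This is a toy illustration of our own (the tuple AST shape, the two types, and the rule "+ only combines ints" are assumptions for the sketch):

```python
def typeof(node, env):
    """Return the type of an expression node, or raise TypeError.
    `env` maps declared identifiers to their types."""
    if isinstance(node, int):
        return "int"                   # integer literal
    if isinstance(node, str):          # identifier: look up its declaration
        if node not in env:
            raise TypeError(f"{node} is not declared")
        return env[node]
    op, left, right = node             # interior node: (op, left, right)
    lt, rt = typeof(left, env), typeof(right, env)
    if op == "+" and lt == rt == "int":
        return "int"
    raise TypeError(f"cannot apply {op} to {lt} and {rt}")

# string x; int y;  y = x + 3  -- the use of x is a type error
env = {"x": "string", "y": "int"}
try:
    typeof(("+", "x", 3), env)
except TypeError as e:
    print("type error:", e)
```

The same lookup that catches the type error also catches the "c is not declared" kind of error mentioned later: an identifier missing from `env` fails the check.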

UNIQUENESS CHECKING: Whether a variable name is unique or not within its scope.

Type coercion: Whether some kind of mixing of types is allowed. Done in languages which are not strongly typed. This can be done dynamically as well as statically.

NAME CHECKS: Check whether any variable has a name which is not allowed, e.g., a name that is the same as a reserved word (e.g., int in Java).

 Parser cannot catch all the program errors
 There is a level of correctness that is deeper than syntax analysis
 Some language features cannot be modeled using the context-free grammar formalism
- Whether an identifier has been declared before use; this is the problem of identifying the language
{wαw | w ∈ Σ*}

- This language is not context free

A parser has its own limitations in catching program errors related to semantics, something that
is deeper than syntax analysis. Typical features of semantic analysis cannot be modeled using
context free grammar formalism. If one tries to incorporate those features in the definition of a
language then that language doesn't remain context free anymore.

Example:

string x; int y;
y = x + 3       -- the use of x is a type error
int a, b;
a = b + c       -- c is not declared

An identifier may refer to different variables in different parts of the program. An identifier may be usable in one part of the program but not another. These are a couple of examples which tell us what a compiler typically has to do beyond syntax analysis. The third point can be explained like this: an identifier x can be declared in two separate functions in the program, once of type int and once of type char. Hence the same identifier will have to be bound to these two different properties in the two different contexts. The fourth point can be explained in this manner: a variable declared within one function cannot be used within the scope of the definition of another function unless declared there separately. This is just an example; you can probably think of many more cases in which a variable declared in one scope cannot be used in another scope.

ABSTRACT SYNTAX TREE: Is nothing but the condensed form of a parse tree. It is

 Useful for representing language constructs naturally.
 The production S → if B then S1 else S2 may appear as

In the next few slides we will see how abstract syntax trees can be constructed from syntax
directed definitions. Abstract syntax trees are condensed form of parse trees. Normally operators
and keywords appear as leaves but in an abstract syntax tree they are associated with the interior
nodes that would be the parent of those leaves in the parse tree. This is clearly indicated by the
examples in these slides.
Chains of single productions may be collapsed, and operators move to the parent nodes.

Chains of single productions are collapsed into one node, with the operator moving up to become the node.

CONSTRUCTING ABSTRACT SYNTAX TREE FOR EXPRESSIONS:

In constructing the syntax tree, we follow the convention that:

. Each node of the tree can be represented as a record consisting of at least two fields, to store operators and operands.
. operator: one field for the operator, remaining fields are pointers to the operands - mknode(op, left, right)
. identifier: one field with label id and another pointing to the symbol table - mkleaf(id, id.entry)
. number: one field with label num and another to keep the value of the number - mkleaf(num, val)

Each node in an abstract syntax tree can be implemented as a record with several fields. In the node for an operator one field identifies the operator (called the label of the node) and the remaining contain pointers to the nodes for operands. Nodes of an abstract syntax tree may have additional fields to hold values (or pointers to values) of attributes attached to the node. The functions given above are used to create the nodes of abstract syntax trees for expressions. Each function returns a pointer to a newly created node.
For example, the following sequence of function calls creates a syntax tree for w = a - 4 + c:

P1 = mkleaf(id, entry.a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id, entry.c)
P5 = mknode(+, P3, P4)

An example showing the formation of an abstract syntax tree by the given function calls for the expression a - 4 + c. The call sequence can be defined based on its postfix form, which is explained below.

A- Write the postfix equivalent of the expression for which we want to construct a syntax tree

For the above string w = a - 4 + c, the postfix form is a 4 - c +

B- Call the functions in the sequence, as defined by the sequence in the postfix expression which
results in the desired tree. In the case above, call mkleaf() for a, mkleaf() for 4, mknode() for
-, mkleaf() for c , and mknode() for + at last.

1. P1 = mkleaf(id, a.entry) : A leaf node made for the identifier a, and an entry for a is made in
the symbol table.

2. P2 = mkleaf(num,4) : A leaf node made for the number 4, and entry for its value.

3. P3 = mknode(-,P1,P2) : An internal node for the -, takes the pointer to previously made nodes
P1, P2 as arguments and represents the expression a-4.

4. P4 = mkleaf(id, c.entry) : A leaf node made for the identifier c , and an entry for c.entry made
in the symbol table.

5. P5 = mknode(+,P3,P4) : An internal node for the + , takes the pointer to previously made
nodes P3,P4 as arguments and represents the expression a- 4+c .
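The mkleaf/mknode constructors and the call sequence above can be sketched directly. This is an illustration of our own (plain dicts stand in for the records; the field names label/left/right/entry/value are our choice):

```python
def mknode(op, left, right):
    """Interior node: the operator is the label, children are pointers."""
    return {"label": op, "left": left, "right": right}

def mkleaf_id(entry):
    """Leaf for an identifier: label id plus a symbol-table pointer."""
    return {"label": "id", "entry": entry}

def mkleaf_num(value):
    """Leaf for a number: label num plus the value itself."""
    return {"label": "num", "value": value}

# The call sequence for w = a - 4 + c, driven by its postfix form a 4 - c +
p1 = mkleaf_id("a")
p2 = mkleaf_num(4)
p3 = mknode("-", p1, p2)     # represents a - 4
p4 = mkleaf_id("c")
p5 = mknode("+", p3, p4)     # represents a - 4 + c

print(p5["label"], p5["left"]["label"], p5["right"]["label"])
```

Each call returns a pointer (here, a Python reference) to a newly created node, exactly as the five numbered steps describe.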

Following is the syntax directed definition for constructing the syntax tree above:

E → E1 + T    E.ptr := mknode(+, E1.ptr, T.ptr)
E → T         E.ptr := T.ptr
T → T1 * F    T.ptr := mknode(*, T1.ptr, F.ptr)
T → F         T.ptr := F.ptr
F → (E)       F.ptr := E.ptr
F → id        F.ptr := mkleaf(id, id.entry)
F → num       F.ptr := mkleaf(num, val)

Now we have the syntax directed definitions to construct the syntax tree for a given grammar. All the rules mentioned above are taken care of and an abstract syntax tree is formed.

ATTRIBUTE GRAMMARS: A CFG G = (V, T, P, S) is called an attribute grammar iff each grammar symbol X ∈ V ∪ T has an associated set of attributes, and each production p ∈ P is associated with a set of attribute evaluation rules called semantic actions. In an AG, the values of attributes at a parse tree node are computed by semantic rules. There are two different specifications of AGs used by the semantic analyzer in evaluating the semantics of the program constructs. They are:

- Syntax directed definitions (SDDs)
  o High level specifications
  o Hide implementation details
  o Explicit order of evaluation is not specified
- Syntax directed translation schemes (SDTs)
  o Nothing but an SDD which indicates the order in which semantic rules are to be evaluated, and
  o Allows some implementation details to be shown.
An attribute grammar is the formal expression of the syntax-derived semantic checks
associated with a grammar. It represents the rules of a language not explicitly imparted by the
syntax. In a practical way, it defines the information that is needed in the abstract syntax tree in
order to successfully perform semantic analysis. This information is stored as attributes of the
nodes of the abstract syntax tree. The values of those attributes are calculated by semantic rules.

There are two ways for writing attributes:

1) Syntax Directed Definition (SDD): Is a context-free grammar in which a set of semantic actions is embedded (associated) with each production of G.

It is a high level specification in which implementation details are hidden, e.g.,

S.sys = A.sys + B.sys;

/* does not give any implementation details. It just tells us the kind of attribute equation we will be using. Details like at what point of time it is evaluated, and in what manner, are hidden from the programmer. */

E → E1 + T    { E.val = E1.val + T.val }
E → T         { E.val = T.val }
T → T1 * F    { T.val = T1.val * F.val }
T → F         { T.val = F.val }
F → (E)       { F.val = E.val }
F → id        { F.val = id.lexval }
F → num       { F.val = num.lexval }

2) Syntax directed Translation(SDT) scheme: Sometimes we want to control the way the
attributes are evaluated, the order and place where they are evaluated. This is of a slightly lower
level.

An SDT is an SDD in which semantic actions can be placed at any position in the body of the production. For example, the following SDT prints the prefix equivalent of an arithmetic expression consisting of + and * operators:

L → E n                       { printf(E.val) }
E → { printf('+') } E1 + T
E → T
T → { printf('*') } T1 * F
T → F
F → (E)
F → { printf(id.lexval) } id
F → { printf(num.lexval) } num

An action in an SDT is executed as soon as its node in the parse tree is visited in a preorder traversal of the tree.
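The effect of those embedded actions under a preorder traversal can be sketched as follows. This is an illustration of our own (the tuple AST stands in for the parse tree, and collecting output in a list stands in for printf):

```python
def print_prefix(node, out):
    """Preorder walk: emit the operator before its operands, which is
    exactly what placing printf('+')/printf('*') before E1/T1 achieves."""
    if isinstance(node, tuple):
        op, left, right = node
        out.append(op)                 # action fires on first visit to the node
        print_prefix(left, out)
        print_prefix(right, out)
    else:
        out.append(str(node))          # printf(id.lexval) / printf(num.lexval)

out = []
print_prefix(("+", ("*", 3, 4), 5), out)   # 3 * 4 + 5
print(" ".join(out))
```

For 3 * 4 + 5 this produces the prefix form + * 3 4 5, with each symbol emitted at the moment its node is first visited.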

Conceptually both the SDD and SDT schemes will:


 Parse input token stream
 Build parse tree
 Traverse the parse tree to evaluate the semantic rules at the parse tree nodes
Evaluation may:
 Generate code
 Save information in the symbol table
 Issue error messages
 Perform any other activity

To avoid repeated traversal of the parse tree, actions are taken simultaneously when a token is
found. So calculation of attributes goes along with the construction of the parse tree.

Along with the evaluation of the semantic rules the compiler may simultaneously generate code,
save the information in the symbol table, and/or issue error messages etc. at the same time while
building the parse tree.

This saves multiple passes of the parse tree.

Example

number → sign list
sign → + | -
list → list bit | bit
bit → 0 | 1

Build an attribute grammar that annotates number with the value it represents.

. Associate attributes with grammar symbols:

symbol    attributes
number    value
sign      negative
list      position, value
bit       position, value

production            attribute rules
number → sign list    list.position = 0
                      if sign.negative
                      then number.value = -list.value
                      else number.value = list.value
sign → +              sign.negative = false
sign → -              sign.negative = true
list → bit            bit.position = list.position
                      list.value = bit.value
list0 → list1 bit     list1.position = list0.position + 1
                      bit.position = list0.position
                      list0.value = list1.value + bit.value
bit → 0               bit.value = 0
bit → 1               bit.value = 2^bit.position

Explanation of attribute rules:

number → sign list   /* since list is the rightmost, it is assigned position 0;
                        sign determines whether the value of the number is the
                        same as, or the negative of, the value of list */
sign → + | -         /* set the Boolean attribute (negative) for sign */
list → bit           /* the bit's position is the same as the list's position
                        because this bit is the rightmost; the value of the
                        list is the same as the bit's */
list0 → list1 bit    /* position and value calculations */
bit → 0 | 1          /* set the corresponding value */
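The attribute rules above can be evaluated directly on a signed binary string. This is a sketch of our own (the string input format, e.g. "-101", is our assumption); it applies the same position/value rules, with the rightmost bit at position 0:

```python
def number_value(text):
    """Evaluate the attribute grammar for a signed binary number,
    e.g. '-101' -> -5."""
    sign, bits = text[0], text[1:]
    negative = (sign == "-")                   # sign.negative
    value = 0
    for pos, bit in enumerate(reversed(bits)): # bit.position counts from the right
        if bit == "1":
            value += 2 ** pos                  # bit -> 1: bit.value = 2^bit.position
        # bit -> 0 contributes bit.value = 0
    return -value if negative else value       # number.value

print(number_value("-101"))
print(number_value("+110"))
```

Note how `pos` plays the role of the inherited attribute position and the running `value` plays the role of the synthesized attribute value.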

Attributes of RHS can be computed from attributes of LHS and vice versa.

The parse tree and the dependence graph are as under.

The dependence graph shows the dependence of attributes on other attributes, along with the syntax tree. A top-down traversal is followed by a bottom-up traversal to resolve the dependencies. value and negative are synthesized attributes; position is an inherited attribute.

Attributes: Attributes fall into two classes, namely synthesized attributes and inherited attributes. The value of a synthesized attribute is computed from the values of its children nodes. The value of an inherited attribute is computed from the sibling and parent nodes.

The attributes are divided into two groups, called synthesized attributes and inherited
attributes. The synthesized attributes are the result of the attribute evaluation rules also using the
values of the inherited attributes. The values of the inherited attributes are inherited from parent
nodes and siblings.

Each grammar production A → α has associated with it a set of semantic rules of the form

b = f(c1, c2, ..., ck)

where f is a function, and either

- b is a synthesized attribute of A, or
- b is an inherited attribute of one of the grammar symbols on the right.

In either case, attribute b depends on attributes c1, c2, ..., ck.

The dependence relation tells us what attributes we need to know beforehand to calculate a particular attribute.

Here the value of the attribute b depends on the values of the attributes c1 to ck. If c1 to ck belong to the children nodes and b to A, then b is called a synthesized attribute. And if b belongs to one of the symbols of α (the child nodes), then it is an inherited attribute of one of the grammar symbols on the right.
Synthesized Attributes: A syntax directed definition that uses only synthesized attributes is said to be an S-attributed definition.

. A parse tree for an S-attributed definition can be annotated by evaluating the semantic rules for the attributes.

S-attributed grammars are a class of attribute grammars, comparable with L-attributed grammars
but characterized by having no inherited attributes at all. Inherited attributes, which must be
passed down from parent nodes to children nodes of the abstract syntax tree during the semantic
analysis, pose a problem for bottom-up parsing because in bottom-up parsing, the parent nodes
of the abstract syntax tree are created after creation of all of their children. Attribute evaluation
in S-attributed grammars can be incorporated conveniently in both top-down parsing and bottom-
up parsing .

Syntax directed definition for a desk calculator program:

L → E n       print(E.val)
E → E1 + T    E.val = E1.val + T.val
E → T         E.val = T.val
T → T1 * F    T.val = T1.val * F.val
T → F         T.val = F.val
F → (E)       F.val = E.val
F → digit     F.val = digit.lexval

. Terminals are assumed to have only synthesized attributes, the values of which are supplied by the lexical analyzer.

. The start symbol does not have any inherited attribute.

This grammar uses only synthesized attributes. The start symbol has no parents, hence no inherited attributes.

Parse tree for 3 * 4 + 5 n

Using the previous attribute grammar, the calculations have been worked out here for 3 * 4 + 5 n. Bottom-up parsing has been done.
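The bottom-up evaluation of the desk-calculator SDD on 3 * 4 + 5 n can be sketched as follows. This is an illustration of our own (the tuple tree stands in for the parse tree produced by the parser):

```python
def val(node):
    """Synthesized attribute .val: computed from the children's values,
    so the tree is evaluated bottom-up."""
    if isinstance(node, int):
        return node                      # F -> digit : F.val = digit.lexval
    op, left, right = node
    if op == "+":
        return val(left) + val(right)    # E -> E1 + T : E.val = E1.val + T.val
    if op == "*":
        return val(left) * val(right)    # T -> T1 * F : T.val = T1.val * F.val
    raise ValueError(op)

# L -> E n : print(E.val)
print(val(("+", ("*", 3, 4), 5)))
```

Because every rule uses only children's attributes, the evaluation order coincides with the order in which a bottom-up parser reduces the input.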

Inherited Attributes: An inherited attribute is one whose value is defined in terms of attributes at the parent and/or siblings.

. Used for finding out the context in which it appears
. It is possible to use only S-attributes, but it is more natural to use inherited attributes

D → T L       L.in = T.type
T → real      T.type = real
T → int       T.type = int
L → L1 , id   L1.in = L.in; addtype(id.entry, L.in)
L → id        addtype(id.entry, L.in)

Inherited attributes help to find the context (type, scope, etc.) of a token, e.g., the type of a token, or its scope when the same variable name is used multiple times in a program in different functions. An inherited attribute system may be replaced by an S-attributed system, but it is more natural to use inherited attributes in some cases, like the example given above.

Here the addtype(a, b) function adds a symbol table entry for the id a and attaches to it the type b.
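The flow of the inherited attribute L.in through a declaration can be sketched as follows. This is an illustration of our own (a dict stands in for the symbol table; the decl helper is a stand-in for the D → T L productions, not the book's code):

```python
symtab = {}

def addtype(entry, typ):
    """Attach a type to an identifier's symbol-table entry."""
    symtab[entry] = typ

def decl(typ, ids):
    """D -> T L: L.in = T.type, then each L production (L -> L1 , id
    and L -> id) passes L.in along and calls addtype(id.entry, L.in)."""
    for name in ids:
        addtype(name, typ)     # the inherited type reaches every id

decl("real", ["x", "y", "z"])
print(symtab)
```

The single value computed at T (the type real) is inherited unchanged by each L node, which is why every identifier in the list ends up with the same type.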

Parse tree for real x, y, z


Dependence of attributes in an inherited attribute system: the value of in (an inherited attribute) at the three L nodes gives the type of the three identifiers x, y and z. These are determined by computing the value of the attribute T.type at the left child of the root and then evaluating L.in top down at the three L nodes in the right subtree of the root. At each L node the procedure addtype is called, which inserts the type of the identifier into its entry in the symbol table. The figure also shows the dependence graph, which is introduced later.

Dependence Graph: If an attribute b depends on an attribute c, then the semantic rule for b must be evaluated after the semantic rule for c.

. The dependencies among the nodes can be depicted by a directed graph called a dependency graph.

Dependency Graph: a directed graph indicating interdependencies among the synthesized and inherited attributes of various nodes in a parse tree.

Algorithm to construct a dependency graph:

for each node n in the parse tree do
    for each attribute a of the grammar symbol at n do
        construct a node in the dependency graph for a

for each node n in the parse tree do
    for each semantic rule b = f(c1, c2, ..., ck)
            { associated with the production used at n } do
        for i = 1 to k do
            construct an edge from the node for ci to the node for b

This is an algorithm to construct the dependency graph: after making one node for every attribute of all the nodes of the parse tree, make one edge to each attribute from each of the other attributes on which it depends.
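The edge-construction step, plus the topological sort used later to pick an evaluation order, can be sketched as follows. This is an illustration of our own (attribute nodes are plain strings; the three L.in rules mirror the real id1, id2, id3 example, with the node names being our labels):

```python
from collections import defaultdict

edges = defaultdict(list)   # edge ci -> b means: b depends on ci

def add_rule(b, cs):
    """Semantic rule b = f(c1, ..., ck): one edge from each ci to b."""
    for c in cs:
        edges[c].append(b)

# Rules for D -> T L on  real id1, id2, id3
add_rule("L1.in", ["T.type"])   # L.in = T.type
add_rule("L2.in", ["L1.in"])    # L1.in = L.in, passed down the list
add_rule("L3.in", ["L2.in"])

def topo(nodes):
    """Order the nodes so that every edge goes from earlier to later:
    a valid order for evaluating the semantic rules."""
    seen, order = set(), []
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for m in edges[n]:
            visit(m)
        order.append(n)          # post-order: dependents recorded first
    for n in nodes:
        visit(n)
    return order[::-1]           # reversed post-order = topological order

print(topo(["T.type", "L1.in", "L2.in", "L3.in"]))
```

Any topological order works; this depth-first one evaluates T.type first and then each L.in down the chain, matching the top-down evaluation described above.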

For example, the semantic rule A.a = f(X.x, Y.y) for the production A → XY defines the synthesized attribute a of A to be dependent on the attribute x of X and the attribute y of Y. Thus the dependency graph will contain an edge from X.x to A.a and from Y.y to A.a, accounting for the two dependencies. Similarly, for the semantic rule X.x = g(A.a, Y.y) for the same production there will be an edge from A.a to X.x and an edge from Y.y to X.x.

Example

. Whenever the following production is used in a parse tree

E → E1 + E2    E.val = E1.val + E2.val

we create a dependency graph.

The synthesized attribute E.val depends on E1.val and E2.val, hence the two edges, one each from E1.val and E2.val.

For example, consider the dependency graph for the string real id1, id2, id3.

. Put a dummy synthesized attribute b for a semantic rule that consists of a procedure call

The figure shows the dependency graph for the statement real id1, id2, id3 along with the
parse tree. Procedure calls can be thought of as rules defining the values of dummy synthesized
attributes of the nonterminal on the left side of the associated production. Blue arrows constitute
the dependency graph and black lines, the parse tree. Each of the semantic rules addtype
(id.entry, L.in) associated with the L productions leads to the creation of the dummy attribute.

Evaluation Order:

Any topological sort of the dependency graph gives a valid order in which the semantic rules must be evaluated.

a4 = real
a5 = a4
addtype(id3.entry, a5)
a7 = a5
addtype(id2.entry, a7)
a9 = a7
addtype(id1.entry, a9)

A topological sort of a directed acyclic graph is any ordering m1, m2, ..., mk of the nodes
of the graph such that edges go from nodes earlier in the ordering to later nodes. Thus if
mi -> mj is an edge, then mi appears before mj in the ordering. The order of the statements
shown above is obtained from a topological sort of the dependency graph; 'an' stands for
the attribute associated with the node numbered n in the dependency graph.
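The evaluation order can be computed with Kahn's algorithm; below is a minimal sketch, assuming the dependency graph is given as a map from each attribute to the set of attributes it depends on (an assumed representation):

```python
from collections import deque

def topological_order(depends_on):
    """Kahn's algorithm: order attributes so that every attribute
    appears after all the attributes it depends on."""
    indegree = {a: len(deps) for a, deps in depends_on.items()}
    dependents = {a: [] for a in depends_on}   # reverse edges
    for a, deps in depends_on.items():
        for d in deps:
            dependents[d].append(a)
    ready = deque(a for a, n in indegree.items() if n == 0)
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for b in dependents[a]:
            indegree[b] -= 1
            if indegree[b] == 0:
                ready.append(b)
    if len(order) != len(depends_on):
        raise ValueError("cycle: no valid evaluation order exists")
    return order

# Dependencies behind the statements above (a4 -> a5 -> a7 -> a9):
deps = {"a4": set(), "a5": {"a4"}, "a7": {"a5"}, "a9": {"a7"}}
print(topological_order(deps))  # ['a4', 'a5', 'a7', 'a9']
```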

An Abstract Syntax Tree is a condensed form of the parse tree, which is

. Useful for representing language constructs.

. The production S -> if B then S1 else S2 may appear as a node labelled if-then-else with
the subtrees for B, S1 and S2 as its children.

In what follows we will see how abstract syntax trees can be constructed from
syntax-directed definitions. Abstract syntax trees are a condensed form of parse trees.
Normally operators and keywords appear as leaves, but in an abstract syntax tree they are
associated with the interior nodes that would be the parents of those leaves in the parse
tree. This is clearly indicated by the examples below.

. Chains of single productions may be collapsed, and operators move to the parent nodes.

A chain of single productions is collapsed into one node, with the operator moving up to
become that node.
For constructing the abstract syntax tree for expressions,

. Each node can be represented as a record:

    - operator : one field for the operator, the remaining fields are pointers to the
      operands : mknode(op, left, right)
    - identifier : one field with label id and another pointing to the symbol table :
      mkleaf(id, entry)
    - number : one field with label num and another holding the value of the number :
      mkleaf(num, val)

Each node in an abstract syntax tree can be implemented as a record with several fields.
In the node for an operator, one field identifies the operator (called the label of the node)
and the remaining fields contain pointers to the nodes for the operands. Nodes of an
abstract syntax tree may have additional fields to hold values (or pointers to values) of
attributes attached to the node. The functions given above are used to create the nodes of
abstract syntax trees for expressions. Each function returns a pointer to a newly created
node.

Example : The following sequence of function calls creates a syntax tree for a - 4 + c:

    P1 = mkleaf(id, entry.a)
    P2 = mkleaf(num, 4)
    P3 = mknode(-, P1, P2)
    P4 = mkleaf(id, entry.c)
    P5 = mknode(+, P3, P4)

The call sequence can be explained as:

1. P1 = mkleaf(id, entry.a) : A leaf node is made for the identifier 'a' and an entry for 'a' is
made in the symbol table.
2. P2 = mkleaf(num, 4) : A leaf node is made for the number 4.
3. P3 = mknode(-, P1, P2) : An internal node is made for '-'. It takes the previously made
nodes as arguments and represents the expression a - 4.
4. P4 = mkleaf(id, entry.c) : A leaf node is made for the identifier 'c' and an entry for 'c' is
made in the symbol table.
5. P5 = mknode(+, P3, P4) : An internal node is made for '+'. It takes the previously made
nodes as arguments and represents the expression a - 4 + c.
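The call sequence can be mimicked in a few lines of Python; the dict-based node layout and the toy symbol table below are assumptions made for the sketch:

```python
# Minimal mkleaf/mknode sketch: nodes are plain dicts, and the symbol
# table is just a dict from name to entry (an assumption for the demo).

def mknode(op, left, right):
    return {"label": op, "left": left, "right": right}

def mkleaf_id(entry):
    return {"label": "id", "entry": entry}

def mkleaf_num(val):
    return {"label": "num", "val": val}

symtab = {"a": "entry_a", "c": "entry_c"}

# Build the tree for a - 4 + c, following the call sequence above:
p1 = mkleaf_id(symtab["a"])
p2 = mkleaf_num(4)
p3 = mknode("-", p1, p2)
p4 = mkleaf_id(symtab["c"])
p5 = mknode("+", p3, p4)

print(p5["label"], p5["left"]["label"], p5["right"]["label"])  # + - id
```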
A syntax-directed definition for constructing syntax trees:

    E -> E1 + T    E.ptr = mknode(+, E1.ptr, T.ptr)
    E -> T         E.ptr = T.ptr
    T -> T1 * F    T.ptr = mknode(*, T1.ptr, F.ptr)
    T -> F         T.ptr = F.ptr
    F -> (E)       F.ptr = E.ptr
    F -> id        F.ptr = mkleaf(id, id.entry)
    F -> num       F.ptr = mkleaf(num, num.val)

We now have the syntax-directed definition to construct the syntax tree for a given grammar.
All the rules mentioned above are taken care of, and an abstract syntax tree is formed.

Translation schemes : A translation scheme is a CFG in which semantic actions occur within
the right-hand sides of productions. A translation scheme to map infix to postfix:

    E -> T R
    R -> addop T { print(addop) } R | e
    T -> num { print(num) }

Parse tree for 9 - 5 + 2: we treat the actions as terminal symbols and perform a depth-first
traversal to obtain 9 5 - 2 +.
 When designing a translation scheme, ensure that an attribute's value is available when it is referred to.
 In the case of synthesized attributes this is trivial (why?)

In a translation scheme, since we are dealing with implementation, we have to explicitly
worry about the order of traversal. We can place actions within the right-hand sides of the
rules in order to control the order of traversal. In the given example we have two terminals
(num and addop); a string is a number followed by R (which necessarily has to begin with an
addop). The given grammar is in infix notation and we need to convert it into postfix
notation. If we ignore all the actions, we get the ordinary parse tree; if we include them, we
get a parse tree with actions, each treated so far as a terminal. Now, if we do a depth-first
traversal and execute each action as we encounter it, we get the postfix notation. In a
translation scheme we have to take care of the evaluation order; otherwise some parts may be
left undefined, and different placements of the actions give different results. Note that a
translation scheme differs from a syntax-directed definition: the latter imposes no evaluation
order, while here we have an explicit one. By choosing an explicit evaluation order we have
to set the correct action at the correct place in order to get the desired output. The place of
each action is very important; finding the appropriate places is what translation schemes are
all about. If we have only synthesized attributes, the scheme is trivial: when we reach a node,
all of its children must already have been evaluated, along with all their attributes, so the
right place for an action is simply the rightmost end of the production.
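The depth-first traversal with embedded actions corresponds to a recursive-descent translator in which each action becomes an output statement placed at the matching point in the parsing routine. A minimal sketch (single-character tokens are assumed for brevity):

```python
# Recursive-descent implementation of the translation scheme
#   E -> T R ; R -> addop T {print(addop)} R | e ; T -> num {print(num)}
# Tokens are single characters; "printed" output is collected in a list.

def translate(tokens):
    out = []
    pos = 0

    def T():
        nonlocal pos
        out.append(tokens[pos])   # action: print(num)
        pos += 1

    def R():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            T()
            out.append(op)        # action: print(addop), after T
            R()
        # else: R -> e, emit nothing

    T()
    R()
    return " ".join(out)

print(translate("9-5+2"))  # 9 5 - 2 +
```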

In the case of both inherited and synthesized attributes:

. An inherited attribute for a symbol on the right-hand side of a production must be computed
in an action before that symbol. Consider:

    S -> A1 A2    { A1.in = 1; A2.in = 2 }
    A -> a        { print(A.in) }

A depth-first traversal gives an 'undefined' error, because the actions defining A1.in and
A2.in are placed after the symbols that use them.

. A synthesized attribute for the nonterminal on the left-hand side can be computed only
after all the attributes it references have been computed; the action should normally be
placed at the end of the right-hand side.

We have a problem when we have both synthesized and inherited attributes. For the given
example, if we place the actions as shown, we cannot evaluate the scheme: during a
depth-first traversal, nothing can be printed for A1 because A1.in has not yet been
initialized. We therefore have to find the correct places for the actions: the inherited
attribute of each A must be computed to its left. This follows from the definition of an
L-attributed definition, which says that when we reach a node, everything on its left must
already have been computed. If we do this, the attribute will always be evaluated at the
correct place. For specific cases like the given example, computing it anywhere on the left
will work, but in general it must be computed immediately to the left of the symbol.
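The corrected placement can be sketched as a recursive-descent parser in which each inherited attribute is computed before descending into the corresponding nonterminal (function and variable names here are illustrative, not from the text):

```python
# S -> {A1.in = 1} A1 {A2.in = 2} A2 ;  A -> a {print(A.in)}
# Each inherited attribute is computed *before* the call that parses
# the symbol, so it is defined when A's action refers to it.

def parse_S(out):
    a1_in = 1              # action placed before A1
    parse_A(a1_in, out)
    a2_in = 2              # action placed before A2
    parse_A(a2_in, out)

def parse_A(inherited, out):
    out.append(inherited)  # action: print(A.in)

result = []
parse_S(result)
print(result)  # [1, 2]
```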

Example: Translation scheme for EQN

    S -> B            B.pts = 10
                      S.ht = B.ht
    B -> B1 B2        B1.pts = B.pts
                      B2.pts = B.pts
                      B.ht = max(B1.ht, B2.ht)
    B -> B1 sub B2    B1.pts = B.pts
                      B2.pts = shrink(B.pts)
                      B.ht = disp(B1.ht, B2.ht)
    B -> text         B.ht = text.h * B.pts

We now look at another example: a grammar for composing text. EQN was an
equation-typesetting system used in early UNIX, a rough LaTeX equivalent for equations.
The start symbol is a block: S -> B. We can also have subscripts and superscripts; here we
look at subscripts. A block is composed of several blocks: in B -> B1 B2, and in
B -> B1 sub B2, B2 is a subscript of B1. We have to determine the point size (inherited
attribute pts) and the height (synthesized attribute ht). The relevant rules for height and
point size are given above.

After putting the actions in the right places, as per the rules stated, read the rules from left
to right and top to bottom. Note that all inherited attributes are computed to the left of the
B symbols and synthesized attributes to the right.
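The scheme can be sketched by threading pts downward and returning ht upward over an explicit block tree. The bodies of shrink and disp below are assumptions, since the text only names these functions:

```python
# pts flows down (inherited), ht flows up (synthesized).
# shrink/disp bodies are illustrative stand-ins, not from the text.

def shrink(pts):
    return pts * 7 // 10            # subscripts are set smaller

def disp(h1, h2):
    return h1 + h2 // 2             # subscript displaced below the base

def height(block, pts):
    kind = block[0]
    if kind == "text":              # B -> text : B.ht = text.h * B.pts
        return block[1] * pts
    if kind == "juxt":              # B -> B1 B2 : max of the two heights
        return max(height(block[1], pts), height(block[2], pts))
    if kind == "sub":               # B -> B1 sub B2 : shrink pts for B2
        return disp(height(block[1], pts), height(block[2], shrink(pts)))
    raise ValueError(kind)

# "x sub i" with text.h = 1 for both pieces, starting at 10 points:
expr = ("sub", ("text", 1), ("text", 1))
print(height(expr, 10))  # 13
```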

Top-down Translation: Use predictive parsing to implement L-attributed definitions.

    E -> E1 + T    E.val := E1.val + T.val
    E -> E1 - T    E.val := E1.val - T.val
    E -> T         E.val := T.val
    T -> (E)       T.val := E.val
    T -> num       T.val := num.lexval

We now come to implementation: how to use the parse tree and an L-attributed definition
together, with a one-to-one correspondence between them. We first look at the top-down
translation scheme. The first major problem is left recursion: if we remove left recursion by
our standard mechanism, we introduce new symbols, and the new symbols will not work
with the existing actions. Also, we have to do the parsing in a single pass.
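One standard way around the left-recursion problem is to pass the running value down the new nonterminal R as an inherited attribute; below is a sketch for the grammar above (the token format, a list of ints and operator characters, is an assumption):

```python
# Left recursion in E -> E + T | E - T | T is removed as
#   E -> T R ;  R -> + T R | - T R | e
# The running value is passed down R as an inherited attribute,
# modelled here by the parameter `acc`.

def parse_E(tokens, pos):
    val, pos = parse_T(tokens, pos)
    return parse_R(tokens, pos, val)

def parse_R(tokens, pos, acc):
    if pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        val, pos = parse_T(tokens, pos + 1)
        acc = acc + val if op == "+" else acc - val
        return parse_R(tokens, pos, acc)
    return acc, pos               # R -> e : synthesize the accumulated value

def parse_T(tokens, pos):
    if tokens[pos] == "(":        # T -> ( E )
        val, pos = parse_E(tokens, pos + 1)
        return val, pos + 1       # skip ')'
    return tokens[pos], pos + 1   # T -> num

val, _ = parse_E([9, "-", 5, "+", 2], 0)
print(val)  # 6
```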

TYPE SYSTEM AND TYPE CHECKING:

A type system is a collection of rules for assigning type expressions to the various parts of
a program. For example:

. If both operands of the arithmetic operators +, -, x are integers, then the result is of type integer.
. The result of the unary & operator is a pointer to the object referred to by the operand:
- If the type of the operand is X, then the type of the result is pointer to X.

In Pascal, types are classified under:

1. Basic types: These are atomic types with no internal structure. They include the types boolean,
character, integer and real.

2. Sub-range types: A sub-range type defines a range of values within the range of another type.
For example, type A = 1..10; B = 100..1000; U = 'A'..'Z';

3. Enumerated types: An enumerated type is defined by listing all of the possible values for the
type. For example: type Colour = (Red, Yellow, Green); Country = (NZ, Aus, SL, WI, Pak, Ind,
SA, Ken, Zim, Eng); Both the sub-range and enumerated types can be treated as basic types.

4. Constructed types: A constructed type is built from basic types and other constructed
types. Examples of constructed types are arrays, records and sets. Additionally, pointers and
functions can also be treated as constructed types.

TYPE EXPRESSION:
The type of a language construct is denoted by a type expression. A type expression is
either a basic type or is formed by applying an operator called a type constructor to other
type expressions:

 A basic type is a type expression
 A type constructor applied to type expressions is a type expression

Two special basic types appear:

- type_error : signals an error during type checking

- void : denotes the absence of a value

Formally, a type expression is recursively defined as:

1. A basic type is a type expression. Among the basic types are boolean, char, integer, and
real. A special basic type, type_error, is used to signal an error during type checking.
Another special basic type is void, which denotes "the absence of a value" and is used to
check statements.
2. Since type expressions may be named, a type name is a type expression.
3. The result of applying a type constructor to a type expression is a type expression.
4. Type expressions may contain variables whose values are type expressions themselves.

TYPE CONSTRUCTORS: Type constructors build the type expressions of user-defined
types from other type expressions.
Arrays : If T is a type expression and I is a range of integers, then array ( I , T ) is the type
expression denoting the type of array with elements of type T and index set I.

For example, the Pascal declaration, var A: array[1 .. 10] of integer; associates the type
expression array ( 1..10, integer ) with A.

Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is also a type
expression.

Records : A record type constructor is applied to a tuple formed from field names and field
types. For example, consider the declaration

    type row = record
        addr : integer;
        lexeme : array [1 .. 15] of char
    end;
    var table: array [1 .. 10] of row;

The type row has the type expression record((addr x integer) x (lexeme x array(1..15, char))),
and the type expression of table is array(1..10, row).

Note: Including the field names in the type expression allows us to define another record type
with the same fields but with different names without being forced to equate the two.

Pointers: If T is a type expression, then pointer ( T ) is a type expression denoting the type
"pointer to an object of type T".
For example, in Pascal, the declaration
var p: row declares variable p to have type pointer( row ).
Functions : Analogous to mathematical functions, functions in programming languages may
be defined as mapping a domain type D to a range type R. The type of such a function is
denoted by the type expression D -> R. For example, the built-in function mod of Pascal has
domain type int x int and range type int; thus we say mod has the type int x int -> int.
As another example, consider the Pascal declaration

    function f(a, b: char) : ^integer;

Here the type of f is denoted by the type expression char x char -> pointer(integer).
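The constructors above can be modelled directly; the tagged-tuple encoding below is a representation chosen for the sketch, not prescribed by the text:

```python
# Type expressions as nested tuples: basic types are strings, and
# each constructor tags a tuple with its name.

def array(index_range, elem):  return ("array", index_range, elem)
def product(t1, t2):           return ("x", t1, t2)
def pointer(t):                return ("pointer", t)
def function(domain, range_):  return ("->", domain, range_)

# var A: array[1..10] of integer
A_type = array((1, 10), "integer")

# mod : int x int -> int
mod_type = function(product("integer", "integer"), "integer")

# var p: row  ->  pointer(row)
p_type = pointer("row")

print(mod_type)  # ('->', ('x', 'integer', 'integer'), 'integer')
```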

SPECIFICATIONS OF A TYPE CHECKER: Consider a language which consists of a
sequence of declarations followed by a single expression:

    P -> D ; E
    D -> D ; D | id : T
    T -> char | integer | array [ num ] of T | ^T
    E -> literal | num | E mod E | E [ E ] | E ^

A type checker is a translation scheme that synthesizes the type of each expression from the
types of its sub-expressions. Consider the above grammar, which generates programs
consisting of a sequence of declarations D followed by a single expression E.

Specifications of a type checker for the language of the above grammar: A program generated
by this grammar is

key : integer;
key mod 1999

Assumptions:

1. The language has three basic types: char , int and type-error

2. For simplicity, all arrays start at 1. For example, the declaration array[256] of char leads to the
type expression array ( 1.. 256, char).

Rules for Symbol Table entry:

    D -> id : T                 addtype(id.entry, T.type)
    T -> char                   T.type = char
    T -> integer                T.type = int
    T -> ^T1                    T.type = pointer(T1.type)
    T -> array [ num ] of T1    T.type = array(1..num, T1.type)
TYPE CHECKING OF FUNCTIONS :

Consider the syntax-directed definition

    E -> E1 ( E2 )    E.type = if E2.type == s and E1.type == s -> t
                               then t
                               else type_error

The rules for the symbol table entry are specified above; they are basically the way in which
the symbol table entries corresponding to the productions are made.

Type checking of functions

The production E -> E ( E ) where an expression is the application of one expression to another
can be used to represent the application of a function to an argument. The rule for checking the
type of a function application is

E -> E1 ( E2 ) { E.type := if E2.type == s and E1. type == s -> t then t else type_error }

This rule says that in an expression formed by applying E1 to E2, the type of E1 must be a
function s -> t from the type s of E2 to some range type t; the type of E1 ( E2 ) is t. The
above rule can be generalized to functions with more than one argument by constructing a
product type consisting of the arguments. Thus n arguments of types T1, T2, ..., Tn can be
viewed as a single argument of the type T1 x T2 x ... x Tn. For example,

    root : (real -> real) x real -> real

declares a function root that takes a function from reals to reals and a real as arguments and
returns a real. The Pascal-like syntax for this declaration is

    function root ( function f (real) : real; x: real ) : real
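The application rule can be sketched directly, with function types encoded as tagged tuples ("->", domain, range); the encoding is an assumption made for the demo, not from the text:

```python
# Checking E -> E1(E2): E1 must have a function type s -> t whose
# domain s equals the type of E2; the result type is then t.

def check_apply(fn_type, arg_type):
    if (isinstance(fn_type, tuple) and fn_type[0] == "->"
            and fn_type[1] == arg_type):
        return fn_type[2]        # range type t
    return "type_error"

real_to_real = ("->", "real", "real")
print(check_apply(real_to_real, "real"))     # real
print(check_apply(real_to_real, "integer"))  # type_error
```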

TYPE CHECKING FOR EXPRESSIONS: Consider the following SDD for expressions:

    E -> literal      E.type = char
    E -> num          E.type = integer
    E -> id           E.type = lookup(id.entry)
    E -> E1 mod E2    E.type = if E1.type == integer and E2.type == integer
                               then integer
                               else type_error
    E -> E1 [ E2 ]    E.type = if E2.type == integer and E1.type == array(s, t)
                               then t
                               else type_error
    E -> E1 ^         E.type = if E1.type == pointer(t)
                               then t
                               else type_error

To perform type checking of expressions, the following rules are used, where the
synthesized attribute type for E gives the type expression assigned by the type system to the
expression generated by E.

The following semantic rules say that constants represented by the tokens literal and num have
type char and integer , respectively:

E -> literal { E.type := char }

E -> num { E.type := integer }

. The function lookup ( e ) is used to fetch the type saved in the symbol-table entry pointed to by
e. When an identifier appears in an expression, its declared type is fetched and assigned to the
attribute type:

E -> id { E.type := lookup ( id . entry ) }

. According to the following rule, the expression formed by applying the mod operator to two
sub-expressions of type integer has type integer ; otherwise, its type is type_error .

E -> E1 mod E2 { E.type := if E1.type == integer and E2.type == integer then integer else
type_error }

In an array reference E1 [ E2 ], the index expression E2 must have type integer, in which
case the result is the element type t obtained from the type array ( s, t ) of E1.

E -> E1 [ E2 ] { E.type := if E2.type == integer and E1.type == array ( s, t ) then t else
type_error }

Within expressions, the postfix operator ^ yields the object pointed to by its operand. The
type of E1^ is the type t of the object pointed to by the pointer E1:

E -> E1 ^ { E.type := if E1.type == pointer ( t ) then t else type_error }
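These rules translate almost line for line into a recursive checker. In the sketch below, expressions are tagged tuples, basic types are strings, and constructed types are tuples such as ("array", s, t) and ("pointer", t); all of these encodings are assumptions made for the demo:

```python
def etype(e, symtab):
    """Synthesize the type of expression e, per the rules above."""
    tag = e[0]
    if tag == "literal":
        return "char"
    if tag == "num":
        return "integer"
    if tag == "id":
        return symtab[e[1]]                     # lookup(id.entry)
    if tag == "mod":                            # E1 mod E2
        t1, t2 = etype(e[1], symtab), etype(e[2], symtab)
        return "integer" if t1 == t2 == "integer" else "type_error"
    if tag == "index":                          # E1 [ E2 ]
        t1, t2 = etype(e[1], symtab), etype(e[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                        # element type t
        return "type_error"
    if tag == "deref":                          # E1 ^
        t1 = etype(e[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]
        return "type_error"
    return "type_error"

symtab = {"key": "integer", "line": ("array", (1, 80), "char")}
print(etype(("mod", ("id", "key"), ("num", 1999)), symtab))  # integer
```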


TYPE CHECKING OF STATEMENTS: Statements typically do not have values, so the
special basic type void can be assigned to them. Consider the SDD for the grammar below,
which generates assignment, conditional, and looping statements.

    S -> id := E         S.type = if id.type == E.type
                                  then void
                                  else type_error
    S -> if E then S1    S.type = if E.type == boolean
                                  then S1.type
                                  else type_error
    S -> while E do S1   S.type = if E.type == boolean
                                  then S1.type
                                  else type_error
    S -> S1 ; S2         S.type = if S1.type == void and S2.type == void
                                  then void
                                  else type_error

Since statements do not have values, the special basic type void is assigned to them, but if an
error is detected within a statement, the type assigned to the statement is type_error .

The statements considered below are assignment, conditional, and while statements. Sequences
of statements are separated by semi-colons. The productions given below can be combined with
those given before if we change the production for a complete program to P -> D; S. The
program now consists of declarations followed by statements.

Rules for type checking the statements are given below.

1. S -> id := E { S.type := if id.type == E.type then void else type_error }

This rule checks that the left and right sides of an assignment statement have the same type.

2. S -> if E then S1 { S.type := if E.type == boolean then S1.type else type_error }

This rule specifies that the expression in an if-then statement must have the type boolean.

3. S -> while E do S1 { S.type := if E.type == boolean then S1.type else type_error }

This rule specifies that the expression in a while statement must have the type boolean.

4. S -> S1 ; S2 { S.type := if S1.type == void and S2.type == void then void else type_error }

Errors are propagated by this last rule, because a sequence of statements has type void only
if each sub-statement has type void.
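The four statement rules can be sketched as a recursive checker; expression types are supplied by a precomputed map to keep the sketch self-contained (the tuple statement encoding is an assumption for the demo):

```python
def stype(s, etypes):
    """Check a statement per the four rules above; etypes maps each
    expression/identifier name to its already-computed type."""
    tag = s[0]
    if tag == "assign":                      # S -> id := E
        id_t, e_t = etypes[s[1]], etypes[s[2]]
        return "void" if id_t == e_t else "type_error"
    if tag in ("if", "while"):               # S -> if/while E ... S1
        cond_t = etypes[s[1]]
        body_t = stype(s[2], etypes)
        return body_t if cond_t == "boolean" else "type_error"
    if tag == "seq":                         # S -> S1 ; S2
        t1, t2 = stype(s[1], etypes), stype(s[2], etypes)
        return "void" if t1 == t2 == "void" else "type_error"
    return "type_error"

etypes = {"i": "integer", "n": "integer", "flag": "boolean"}
ok = ("seq", ("assign", "i", "n"), ("while", "flag", ("assign", "n", "i")))
bad = ("if", "i", ("assign", "i", "n"))      # condition is not boolean
print(stype(ok, etypes))   # void
print(stype(bad, etypes))  # type_error
```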

IMPORTANT & EXPECTED QUESTIONS

1. What do you mean by THREE ADDRESS CODE? Generate the three-address code for
the following code.

    begin
        PROD := 0;
        I := 1;
        do
        begin
            PROD := PROD + A[I] * B[I];
            I := I + 1;
        end
        while I <= 20
    end

2. Write a short note on Attributed grammar & Annotated parse tree.


3. Define an intermediate code form. Explain various intermediate code forms?
4. What is Syntax Directed Translation? Construct Syntax Directed Translation scheme to
convert a given arithmetic expression into three address code.
5. What are Synthesized and Inherited attributes? Explain with examples?
6. Explain SDT for Simple Type checker?
7. Define and construct triples, quadruples and indirect triple notations of an expression: a *
- (b + c).

ASSIGNMENT QUESTIONS:
1. Write Three address code for the below example

    while (i < 10)
    {
        a = b + c * -d;
        i++;
    }

2. What is a Syntax Directed Definition? Write Syntax Directed definition to convert binary
value in to decimal?
