Unit-Iii: Figure 4.1: Intermediate Code Generator
In intermediate code generation we use syntax-directed methods to translate the source program into an intermediate form; programming-language constructs such as declarations, assignments and flow-of-control statements are translated into this form.
The intermediate code is the output of the parser and the input to the code generator. It is:
1. Relatively machine-independent, which allows the compiler to be retargeted.
2. Relatively easy to manipulate (optimize).
Retargeting is facilitated: a compiler for a new machine can be built by attaching a new code generator to an existing front end.
The common intermediate representations are:
1. Syntax trees
2. Postfix notation
3. Three-address code
Semantic rules for generating three-address code from common programming language constructs are similar to those for constructing syntax trees or for generating postfix notation.
Graphical Representations
A syntax tree depicts the natural hierarchical structure of a source program. A DAG (Directed Acyclic Graph) gives the same information in a more compact way because common sub-expressions are identified. A syntax tree for the assignment statement a := b * -c + b * -c appears in the following figure.

            assign
           /      \
          a        +
                 /   \
                *     *
               / \   / \
              b uminus b uminus
                  |        |
                  c        c
Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree in the figure is
a b c uminus * b c uminus * + assign
The edges in a syntax tree do not appear explicitly in postfix notation. They can be recovered from the order in which the nodes appear and the number of operands that the operator at each node expects. The recovery of edges is similar to the evaluation, using a stack, of an expression in postfix notation.
THREE-ADDRESS CODE:
The three-address code for the assignment a := b * -c + b * -c is:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
The reason for the term "three-address code" is that each statement usually contains three addresses, two for the operands and one for the result. In the implementations of three-address code given later in this section, a programmer-defined name is replaced by a pointer to a symbol-table entry for that name.
Three-address statements are akin to assembly code. Statements can have symbolic labels and there are statements for flow of control. A symbolic label represents the index of a three-address statement in the array holding intermediate code. Actual indices can be substituted for the labels either by making a separate pass, or by using "backpatching," discussed in Section 8.6. Here are the common three-address statements used in the remainder of this section:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator
(<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation
relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in
the usual sequence.
6. param x and call p, n for procedure calls, and return y, where y (representing a returned value) is optional. Their typical use is the sequence of three-address statements
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n indicating the number of actual parameters in "call p, n" is not redundant because calls can be nested. The implementation of procedure calls is outlined in Section 8.7.
7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the value in the location i memory units beyond location y. The statement x[i] := y sets the contents of the location i units beyond x to the value of y. In both these instructions, x, y, and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y and *x := y. The first of these sets the value of x to be the location of y. Presumably y is a name, perhaps a temporary, that denotes an expression with an l-value such as A[i, j], and x is a pointer name or temporary. That is, the r-value of x is the l-value (location) of some object. In the statement x := *y, presumably y is a pointer or a temporary whose r-value is a location. The r-value of x is made equal to the contents of that location. Finally, *x := y sets the r-value of the object pointed to by x to the r-value of y.
When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. The value of a non-terminal on the left side of a production is computed into a new temporary t. In general, the three-address code for id := E consists of code to evaluate E into some temporary t, followed by the assignment id.place := t. If an expression is a single identifier, say y, then y itself holds the value of the expression. For the moment, we create a new name every time a temporary is needed; techniques for reusing temporaries are given in Section 8.3. The S-attributed definition in Fig. 8.6 generates three-address code for assignment statements. Given input a := b * -c + b * -c, it produces the code in Fig. 8.5(a). The synthesized attribute S.code represents the three-address code for the assignment S. The non-terminal E has two attributes: E.place, the name that will hold the value of E, and E.code, the sequence of three-address statements evaluating E.
The function newtemp returns a sequence of distinct names t1, t2, ... in response to successive calls. For convenience, we use the notation gen(x ':=' y '+' z) in Fig. 8.6 to represent the three-address statement x := y + z. Expressions appearing instead of variables like x, y, and z are evaluated when passed to gen, and quoted operators or operands, like '+', are taken literally. In practice, three-address statements might be sent to an output file, rather than built up into the code attributes. Flow-of-control statements can be added to the language of assignments in Fig. 8.6 by productions and semantic rules like the ones for while statements in Fig. 8.7. In the figure, the code for S -> while E do S1 is generated using new attributes S.begin and S.after to mark the first statement in the code for E and the statement following the code for S, respectively. These attributes represent labels created by a function newlabel that returns a new label every time it is called.
IMPLEMENTATIONS OF THREE-ADDRESS STATEMENTS:
QUADRUPLES:
A quadruple is a record structure with four fields, which we call op, arg1, arg2, and result. The op field contains an internal code for the operator. The three-address statement x := y op z is represented by placing y in arg1, z in arg2, and x in result. Statements with unary operators like x := -y or x := y do not use arg2. Operators like param use neither arg2 nor result. Conditional and unconditional jumps put the target label in result. The quadruples in Fig. 8.8(a) are for the assignment a := b * -c + b * -c. They are obtained from the three-address code given earlier. The contents of fields arg1, arg2, and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.
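A minimal sketch of the quadruple representation for the running example, using tuples in place of symbol-table pointers (an illustrative assumption):

```python
# A quadruple is (op, arg1, arg2, result); unary operators and copy
# statements leave arg2 empty (None here).

quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    (":=",     "t5", None, "a"),
]

def quad_to_stmt(op, arg1, arg2, result):
    """Render one quadruple back as a three-address statement."""
    if op == ":=":                       # copy statement
        return f"{result} := {arg1}"
    sym = "-" if op == "uminus" else op  # print unary minus as '-'
    if arg2 is None:                     # unary operator
        return f"{result} := {sym}{arg1}"
    return f"{result} := {arg1} {sym} {arg2}"

for q in quads:
    print(quad_to_stmt(*q))
```

Printing the quadruples reproduces the six three-address statements of the example.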
TRIPLES:
To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it. If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2, as shown below. The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table (for programmer-defined names or constants) or pointers into the triple structure (for temporary values). Since only three fields are used, this intermediate code format is known as triples. Except for the treatment of programmer-defined names, triples correspond to the representation of a syntax tree or DAG by an array of nodes.
Parenthesized numbers represent pointers into the triple structure, while symbol-table
pointers are represented by the names themselves. In practice, the information needed to interpret
the different kinds of entries in the arg 1 and arg2 fields can be encoded into the op field or some
additional fields. The triples in Fig. 8.8(b) correspond to the quadruples in Fig. 8.8(a). Note that
the copy statement a:= t5 is encoded in the triple representation by placing a in the arg 1 field
and using the operator assign. A ternary operation like x[ i ]: = y requires two entries in the triple
structure, as shown in Fig. 8.9(a), while x: = y[i] is naturally represented as two operations in
Fig. 8.9(b).
Indirect Triples
Another implementation of three-address code is to list pointers to triples, rather than listing the triples themselves. This implementation is naturally called indirect triples. For example, we can use an array statement to list pointers to triples in the desired order. Then the triples in Fig. 8.8(b) might be represented as in Fig. 8.10.
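Both representations can be sketched together; the ("#", i) encoding of a pointer into the triple structure is an assumption for illustration:

```python
# Triples: (op, arg1, arg2); a temporary value is referred to by the
# index of the triple that computes it, written here as ("#", i).

triples = [
    ("uminus", "c", None),          # (0)
    ("*", "b", ("#", 0)),           # (1)
    ("uminus", "c", None),          # (2)
    ("*", "b", ("#", 2)),           # (3)
    ("+", ("#", 1), ("#", 3)),      # (4)
    ("assign", "a", ("#", 4)),      # (5)
]

# Indirect triples: a separate "statement" array lists pointers
# (indices) into the triple structure in the desired execution order,
# so an optimizer can reorder statements by permuting this array
# without renumbering the references inside the triples.
statement = [0, 1, 2, 3, 4, 5]

execution_order = [triples[i] for i in statement]
```

Note how the copy statement a := t5 becomes the triple ("assign", "a", ("#", 4)), with a in the arg1 field, as described above.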
Assume that the program has been verified to be syntactically correct and converted into some kind of intermediate representation (a parse tree). The next phase is semantic analysis of the generated parse tree. Semantic analysis also includes error reporting in case any semantic error is found.
Semantic analysis is a pass by a compiler that adds semantic information to the parse tree
and performs certain checks based on this information. It logically follows the parsing phase, in
which the parse tree is generated, and logically precedes the code generation phase, in which
(intermediate/target) code is generated. (In a compiler implementation, it may be possible to fold
different phases into one pass.) Typical examples of semantic information that is added and checked are type information (type checking) and the binding of variables and function names to their definitions (object binding). Sometimes some early code optimization is also done in this phase. For this phase the compiler usually maintains symbol tables in which it stores what each symbol (variable names, function names, etc.) refers to.
TYPE CHECKING: The process of verifying and enforcing the constraints of types is called
type checking. This may occur either at compile-time (a static check) or run-time (a dynamic
check). Static type checking is a primary task of the semantic analysis carried out by a compiler.
If type rules are enforced strongly (that is, generally allowing only those automatic type conversions which do not lose information), the language is called strongly typed; if not, weakly typed.
UNIQUENESS CHECKING: Whether a variable name is unique in its scope.
TYPE COERCION: Whether some mixing of types is allowed. Coercion is done in languages which are not strongly typed, and can be performed dynamically as well as statically.
NAME CHECKS: Check whether any variable has a name which is not allowed, e.g., a name that is the same as a reserved identifier (such as int in Java).
A parser has its own limitations in catching program errors related to semantics, something that
is deeper than syntax analysis. Typical features of semantic analysis cannot be modeled using
context free grammar formalism. If one tries to incorporate those features in the definition of a
language then that language doesn't remain context free anymore.
Example:
string x; int y;
y = x + 3;      // the use of x is a type error
int a, b;
a = b + c;      // c is not declared
An identifier may refer to different variables in different parts of the program. An identifier may be usable in one part of the program but not in another. These examples show what a compiler has to do beyond syntax analysis. The first point can be explained like this: an identifier x can be declared in two separate functions in the program, once of type int and once of type char, so the same identifier has to be bound to these two different properties in the two different contexts. The second point can be explained in this manner: a variable declared within one function cannot be used within the scope of another function unless declared there separately. This is just an example; you can think of many more in which a variable declared in one scope cannot be used in another scope.
ABSTRACT SYNTAX TREE: An abstract syntax tree is a condensed form of the parse tree. We now see how abstract syntax trees can be constructed from syntax-directed definitions. Normally operators and keywords appear as leaves of a parse tree, but in an abstract syntax tree they are associated with the interior nodes that would be the parents of those leaves in the parse tree, as the following examples indicate.
Chains of single productions may be collapsed into one node, with the operators moving up to become the node.
. Each node of the tree can be represented as a record consisting of at least two fields, to store operators and operands:
. operator : one field for the operator, with the remaining fields holding pointers to the operands: mknode(op, left, right)
. identifier : one field with label id and another with a pointer to the symbol table: mkleaf(id, id.entry)
. number : one field with label num and another to keep the value of the number: mkleaf(num, val)
Each node in an abstract syntax tree can be implemented as a record with several fields. In the
node for an operator one field identifies the operator (called the label of the node) and the
remaining contain pointers to the nodes for operands. Nodes of an abstract syntax tree may have
additional fields to hold values (or pointers to values) of attributes attached to the node. The functions given above are used to create the nodes of abstract syntax trees for expressions. Each function returns a pointer to a newly created node.
For example, the following sequence of function calls creates a syntax tree for a - 4 + c:
P1 = mkleaf(id, entry.a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id, entry.c)
P5 = mknode(+, P3, P4)
This example shows the formation of an abstract syntax tree by the given function calls for the expression a - 4 + c. The call sequence can be derived from the expression's postfix form, as explained below.
A. Write the postfix equivalent of the expression for which we want to construct a syntax tree (here a 4 - c +).
B. Call the functions in the sequence defined by the postfix expression, which results in the desired tree. In the case above, call mkleaf() for a, mkleaf() for 4, mknode() for -, mkleaf() for c, and mknode() for + last.
1. P1 = mkleaf(id, a.entry): a leaf node is made for the identifier a, and an entry for a is made in the symbol table.
2. P2 = mkleaf(num, 4): a leaf node is made for the number 4, holding its value.
3. P3 = mknode(-, P1, P2): an internal node for -, which takes pointers to the previously made nodes P1 and P2 as arguments and represents the expression a - 4.
4. P4 = mkleaf(id, c.entry): a leaf node is made for the identifier c, and an entry for c is made in the symbol table.
5. P5 = mknode(+, P3, P4): an internal node for +, which takes pointers to the previously made nodes P3 and P4 as arguments and represents the expression a - 4 + c.
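The five calls above can be sketched as follows; the dictionary node layout and the symbol-table representation are assumptions for illustration, while the function names follow the text.

```python
# Minimal sketch of mkleaf/mknode for building abstract syntax trees.

symtab = {}

def mkleaf(label, value):
    """Leaf node: label is 'id' or 'num'; value is an identifier name
    (entered into the symbol table) or a literal number."""
    if label == "id":
        symtab.setdefault(value, {"name": value})
        return {"label": "id", "entry": symtab[value]}
    return {"label": "num", "val": value}

def mknode(op, left, right):
    """Interior node for operator op, with pointers to two operand
    subtrees; each call returns the newly created node."""
    return {"label": op, "left": left, "right": right}

# Build the tree for a - 4 + c by the call sequence in the text:
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)
```

The root p5 is the + node whose left child is the subtree for a - 4, matching the tree described above.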
A syntax-directed definition for constructing this syntax tree is given later in this section. With such syntax-directed definitions, all the rules mentioned above are taken care of and an abstract syntax tree is formed for a given grammar.
1) Syntax-directed definition (SDD): a high-level specification in which implementation details are hidden, e.g., S.sys = A.sys + B.sys. Such an attribute equation does not give any implementation details; details like when and in what manner it is evaluated are hidden from the programmer.
2) Syntax directed Translation(SDT) scheme: Sometimes we want to control the way the
attributes are evaluated, the order and place where they are evaluated. This is of a slightly lower
level.
An SDT is an SDD in which semantic actions can be placed at any position in the body of the
production.
For example, the following SDT prints the prefix equivalent of an arithmetic expression consisting of + and * operators.

L -> E n { print(E.val) }
E -> { print('+') } E1 + T
E -> T
T -> { print('*') } T1 * F
T -> F
F -> ( E )
F -> { print(id.lexval) } id
F -> { print(num.lexval) } num

Each action in this SDT is executed as soon as its node in the parse tree is visited in a preorder traversal of the tree.
To avoid repeated traversals of the parse tree, actions are taken as soon as the relevant token is found, so the calculation of attributes goes along with the construction of the parse tree. Along with the evaluation of the semantic rules, the compiler may simultaneously generate code, save information in the symbol table, and/or issue error messages while building the parse tree.
Example
Build an attribute grammar that annotates Number with the value it represents; if sign.negative, the value is negated. Attributes of the RHS can be computed from attributes of the LHS and vice versa. A dependence graph shows the dependence of attributes on other attributes, along with the syntax tree; a top-down traversal is followed by a bottom-up traversal to resolve the dependencies. Number, val and neg are synthesized attributes; pos is an inherited attribute.
Attributes: attributes fall into two classes, synthesized attributes and inherited attributes. The value of a synthesized attribute is computed by the attribute evaluation rules from the values of attributes at the children nodes, possibly also using the values of inherited attributes. The value of an inherited attribute is inherited from the parent node and the siblings.
Each grammar production A -> α has associated with it a set of semantic rules of the form b = f(c1, c2, ..., ck). The dependence relation tells us what attributes we need to know beforehand to calculate a particular attribute. Here the value of the attribute b depends on the values of the attributes c1 to ck. If c1 to ck belong to the children nodes and b to A, then b is called a synthesized attribute; and if b belongs to one of the grammar symbols on the right (the child nodes), then it is an inherited attribute.
Synthesized Attributes: a syntax-directed definition that uses only synthesized attributes is said to be an S-attributed definition. A parse tree for an S-attributed definition can be annotated by evaluating the semantic rules for the attributes at each node, bottom-up from the leaves to the root.
S-attributed grammars are a class of attribute grammars, comparable with L-attributed grammars
but characterized by having no inherited attributes at all. Inherited attributes, which must be
passed down from parent nodes to children nodes of the abstract syntax tree during the semantic
analysis, pose a problem for bottom-up parsing because in bottom-up parsing, the parent nodes
of the abstract syntax tree are created after creation of all of their children. Attribute evaluation
in S-attributed grammars can be incorporated conveniently in both top-down parsing and bottom-
up parsing .
. Terminals are assumed to have only synthesized attributes, the values of which are supplied by the lexical analyzer.
This is a grammar which uses only synthesized attributes. The start symbol has no parents, hence no inherited attributes.
. It is possible to use only S-attributes, but it is more natural to use inherited attributes, as in the following declaration grammar:
D -> T L        L.in = T.type
T -> real       T.type = real
T -> int        T.type = int
L -> L1 , id    L1.in = L.in; addtype(id.entry, L.in)
L -> id         addtype(id.entry, L.in)
Inherited attributes help to find the context (type, scope, etc.) of a token, e.g., the type of a token, or its scope when the same variable name is used multiple times in a program in different functions. An inherited-attribute system may be replaced by an S-attributed system, but it is more natural to use inherited attributes in cases like the example given above. Here the function addtype(a, b) adds a symbol-table entry for the id a and attaches to it the type b.
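The declaration grammar above can be sketched as follows; the flat list of identifiers stands in for the L-chain of the grammar, which is an assumption for illustration, while addtype follows the text.

```python
# Sketch of D -> T L: T.type is synthesized from the type keyword,
# L.in is inherited and flows down the identifier list.

symtab = {}

def addtype(name, typ):
    """Attach type typ to the symbol-table entry for name."""
    symtab[name] = typ

def decl(t_type, ids):
    """The type synthesized from T becomes L.in and is passed to
    every identifier in the list (L -> L1 , id : L1.in = L.in)."""
    l_in = t_type                  # L.in = T.type
    for name in ids:
        addtype(name, l_in)        # addtype(id.entry, L.in)

# real id1, id2, id3
decl("real", ["id1", "id2", "id3"])
```

After the call, every identifier in the declaration carries the inherited type real in the symbol table.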
Dependency Graph: if an attribute b depends on an attribute c, then the semantic rule for b must be evaluated after the semantic rule for c. The dependencies among the nodes can be depicted by a directed graph called a dependency graph: a directed graph indicating the interdependencies among the synthesized and inherited attributes of various nodes in a parse tree.
An algorithm to construct the dependency graph: after making one node for every attribute of each node of the parse tree, make an edge to each attribute from the other attributes on which it depends.
For example ,
The semantic rule A.a = f(X.x , Y.y) for the production A -> XY defines the synthesized
attribute a of A to be dependent on the attribute x of X and the attribute y of Y . Thus the
dependency graph will contain an edge from X.x to A.a and Y.y to A.a accounting for the two
dependencies. Similarly, for the semantic rule X.x = g(A.a, Y.y) for the same production there will be an edge from A.a to X.x and an edge from Y.y to X.x.
Example
For example, consider the dependency graph for the string real id1, id2, id3.
. Put a dummy synthesized attribute b for a semantic rule that consists of a procedure call.
The figure shows the dependency graph for the statement real id1, id2, id3 along with the
parse tree. Procedure calls can be thought of as rules defining the values of dummy synthesized
attributes of the nonterminal on the left side of the associated production. Blue arrows constitute
the dependency graph and black lines, the parse tree. Each of the semantic rules addtype
(id.entry, L.in) associated with the L productions leads to the creation of the dummy attribute.
Evaluation Order :
Any topological sort of dependency graph gives a valid order in which semantic rules must be
evaluated
a4 := real
a5 := a4
addtype(id3.entry, a5)
a7 := a5
addtype(id2.entry, a7)
a9 := a7
addtype(id1.entry, a9)
A topological sort of a directed acyclic graph is any ordering m1, m2, m3 .......mk of the
nodes of the graph such that edges go from nodes earlier in the ordering to later nodes. Thus if
mi -> mj is an edge from mi to mj then mi appears before mj in the ordering. The order of the
statements shown in the slide is obtained from the topological sort of the dependency graph in
the previous slide. 'an' stands for the attribute associated with the node numbered n in the
dependency graph. The numbering is as shown in the previous slide.
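A topological sort of the dependency graph can be sketched with Kahn's algorithm; the node names below follow the attribute numbering of the example, while the graph encoding is an assumption for illustration.

```python
# Topological sort of a dependency graph: any resulting order is a
# valid order in which to evaluate the semantic rules.

from collections import deque

def topo_sort(nodes, edges):
    """edges contains pairs (mi, mj) meaning mi must come before mj."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)   # no prerequisites
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:                              # n satisfied, so
            indeg[m] -= 1                              # release successors
            if indeg[m] == 0:
                queue.append(m)
    return order

# Dependencies among the attributes of the declaration example:
# a4 -> a5 -> a7 -> a9
order = topo_sort(["a9", "a7", "a5", "a4"],
                  [("a4", "a5"), ("a5", "a7"), ("a7", "a9")])
```

Every edge mi -> mj ends up with mi before mj in the returned order, which is exactly the property required of the evaluation order.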
A syntax-directed definition for constructing the syntax tree:

E -> E1 + T    E.ptr = mknode(+, E1.ptr, T.ptr)
E -> T         E.ptr = T.ptr
T -> T1 * F    T.ptr = mknode(*, T1.ptr, F.ptr)
T -> F         T.ptr = F.ptr
F -> ( E )     F.ptr = E.ptr
F -> id        F.ptr = mkleaf(id, id.entry)
F -> num       F.ptr = mkleaf(num, num.val)

With these syntax-directed definitions we can construct the abstract syntax tree for a given grammar; all the rules mentioned above are taken care of.
Translation schemes: a translation scheme is a CFG where semantic actions occur within the right-hand side of a production. A translation scheme to map infix to postfix:

E -> T R
R -> addop T { print(addop) } R | e
T -> num { print(num) }

We treat the actions as if they were terminal symbols and perform a depth-first traversal; for the input 9 - 5 + 2 this prints 9 5 - 2 +.
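The scheme can be sketched as a recursive-descent translator that executes each action as soon as it is reached; the token-list input is an assumption for illustration.

```python
# Recursive-descent sketch of the translation scheme
#   E -> T R     R -> addop T { print(addop) } R | e     T -> num { print(num) }
# Actions are executed at the point where they appear in the rule.

def to_postfix(tokens):
    """Translate a tokenized infix expression of numbers and + / -
    into its postfix equivalent."""
    out = []
    pos = 0

    def T():
        nonlocal pos
        out.append(tokens[pos])        # T -> num { print(num) }
        pos += 1

    def R():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]           # R -> addop T { print(addop) } R
            pos += 1
            T()
            out.append(op)             # the action fires after T
            R()
        # R -> e : emit nothing

    T()                                # E -> T R
    R()
    return " ".join(out)

print(to_postfix(["9", "-", "5", "+", "2"]))   # prints 9 5 - 2 +
```

Because the print action sits after T in the R-production, each operator is emitted only after its right operand, which is what yields postfix order.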
When designing a translation scheme, we must ensure that an attribute value is available whenever it is referred to. In the case of synthesized attributes this is trivial, since the attribute can be evaluated once all the children have been evaluated.
In a translation scheme, since we are dealing with implementation, we have to explicitly worry about the order of traversal. We can now put actions in between the symbols on the RHS of a rule, and we place them precisely in order to control the order of traversal. In the given example, we have two terminals (num and addop); the input can generally be seen as a number followed by R (which necessarily has to begin with an addop). The given grammar is in infix notation and we need to convert it into postfix notation. If we ignore all the actions, we get the plain parse tree; if we include them, we get a parse tree with actions, where each action is treated as a terminal. Now, if we do a depth-first traversal and execute each action as we encounter it, we get the postfix notation. In a translation scheme we have to take care of the evaluation order, otherwise some parts may be left undefined, and different placements of the actions give different results. Note that a translation scheme is different from a syntax-directed definition: the latter specifies no evaluation order, while here the order is explicit, and we have to place the correct action at the correct position to get the desired output. The place of each action is very important, and finding the appropriate places is what designing a translation scheme is all about. If we have only synthesized attributes, the translation scheme is trivial: when we reach a node, all its children and their attributes must already have been evaluated, so the right place for the action is simply the rightmost position, at the end of the RHS.
. An inherited attribute for a symbol on rhs of a production must be computed in an action before
that symbol
. A synthesized attribute for non terminal on the lhs can be computed after all attributes it
references, have been computed. The action normally should be placed at the end of rhs
We have a problem when we have both synthesized and inherited attributes. For the given example, if we place the actions as shown, we cannot evaluate it: when doing a depth-first traversal, we cannot print anything for A1, because A1 has not yet been initialized. We therefore have to find the correct places for the actions: the inherited attribute of A must be calculated in an action placed to its left. This follows logically from the definition of an L-attributed definition, which says that when we reach a node, everything on its left must already have been computed. If we do this, the attribute is always evaluated at the correct place. For specific cases (like the given example) calculating it anywhere on the left will work, but in general it must be computed immediately to the left.
S -> B            B.pts = 10
                  S.ht = B.ht
B -> B1 B2        B1.pts = B.pts
                  B2.pts = B.pts
                  B.ht = max(B1.ht, B2.ht)
B -> B1 sub B2    B1.pts = B.pts
                  B2.pts = shrink(B.pts)
                  B.ht = disp(B1.ht, B2.ht)
B -> text         B.ht = text.h * B.pts
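The attribute flow of this grammar can be sketched directly: pts is inherited top-down and ht is synthesized bottom-up. The concrete shrink factor and displacement rule below are assumptions for illustration, as the text leaves them abstract.

```python
# Sketch of the EQN-style rules: pts flows down, ht flows up.
# shrink() and disp() are assumed implementations of the abstract
# functions named in the grammar.

def shrink(pts):
    return pts * 0.7              # smaller point size for a subscript (assumed)

def disp(h1, h2):
    return h1 + h2 / 2            # displace the subscript downward (assumed)

def height(block, pts):
    """Synthesize B.ht for a block, given the inherited B.pts."""
    kind = block[0]
    if kind == "text":            # B -> text : B.ht = text.h * B.pts
        _, h = block
        return h * pts
    if kind == "juxt":            # B -> B1 B2 : both inherit B.pts
        _, b1, b2 = block
        return max(height(b1, pts), height(b2, pts))
    if kind == "sub":             # B -> B1 sub B2 : B2 inherits shrink(B.pts)
        _, b1, b2 = block
        return disp(height(b1, pts), height(b2, shrink(pts)))

# S -> B with B.pts = 10: a character with a subscript
doc = ("sub", ("text", 1.0), ("text", 1.0))
print(height(doc, 10))
```

Note how the inherited attribute appears as a parameter passed down the recursion while the synthesized attribute is the return value, mirroring "inherited on the left, synthesized on the right."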
We now look at another example: a grammar describing how text is composed. EQN was an equation-setting system used as an early typesetting system for UNIX, playing a role for equations similar to that of LaTeX. The start symbol is a block: S -> B. We can also have subscripts and superscripts; here we look at subscripts. A block is composed of several blocks: B -> B1 B2, and B2 may be a subscript of B1. We have to determine the point size (inherited) and the height (synthesized); the relevant functions for height and point size are given alongside. After putting all the actions at the correct places as per the rules stated, and reading from left to right and top to bottom, we note that all inherited attributes are calculated to the left of the B symbols and synthesized attributes to the right.
We now come to implementation: how to use the parser and L-attributed definitions so that translation proceeds in one-to-one correspondence with the construction of the parse tree. We first look at the top-down translation scheme. The first major problem is left recursion: if we remove left recursion by our standard mechanism, we introduce new symbols, and the new symbols will not work with the existing actions. Also, we have to do the parsing in a single pass.
. If both operands of the arithmetic operators +, -, * are integers, then the result is of type integer.
. The result of the unary & operator is a pointer to the object referred to by the operand.
  - If the type of the operand is X, then the type of the result is pointer to X.
1. Basic types: These are atomic types with no internal structure. They include the types boolean,
character, integer and real.
2. Sub-range types: A sub-range type defines a range of values within the range of another type.
For example, type A = 1..10; B = 100..1000; U = 'A'..'Z';
3. Enumerated types: An enumerated type is defined by listing all of the possible values for the
type. For example: type Colour = (Red, Yellow, Green); Country = (NZ, Aus, SL, WI, Pak, Ind,
SA, Ken, Zim, Eng); Both the sub-range and enumerated types can be treated as basic types.
4. Constructed types: A constructed type is constructed from basic types and other constructed types. Examples of constructed types are arrays, records and sets. Additionally, pointers and functions can also be treated as constructed types.
TYPE EXPRESSION:
The type of a language construct is denoted by a type expression. A type expression is either a basic type or is formed by applying a type constructor to other type expressions:
1. A basic type is a type expression. Among the basic types are boolean, char, integer, and real.
A special basic type, type_error, is used to signal an error during type checking. Another
special basic type is void, which denotes "the absence of a value" and is used to check statements.
2. Since type expressions may be named, a type name is a type expression.
3. The result of applying a type constructor to a type expression is a type expression.
4. Type expressions may contain variables whose values are type expressions themselves.
TYPE CONSTRUCTORS: Type constructors build the type expressions of user-defined types
from the types of their components.
Arrays : If T is a type expression and I is a range of integers, then array ( I , T ) is the type
expression denoting the type of array with elements of type T and index set I.
For example, the Pascal declaration, var A: array[1 .. 10] of integer; associates the type
expression array ( 1..10, integer ) with A.
Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is also a type
expression.
Records : A record type constructor is applied to a tuple formed from field names and field
types. For example, the declaration
type row = record
addr : integer;
lexeme : array[1 .. 15] of char
end;
var table : array[1 .. 10] of row;
associates with row the type expression record ((addr x integer) x (lexeme x array(1 .. 15, char)))
and with table the type expression array(1 .. 10, row).
Note: Including the field names in the type expression allows us to define another record type
with the same fields but with different names without being forced to equate the two.
Pointers: If T is a type expression, then pointer ( T ) is a type expression denoting the type
"pointer to an object of type T".
For example, in Pascal, the declaration
var p : ^row
declares variable p to have type pointer( row ).
Functions : Analogous to mathematical functions, functions in programming languages may be
defined as mapping a domain type D to a range type R. The type of such a function is denoted by
the type expression D -> R. For example, the built-in function mod of Pascal has domain type
int x int and range type int. Thus we say mod has the type: int x int -> int
As another example, according to the Pascal declaration
function f(a, b : char) : ^integer;
the type of f is denoted by the type expression char x char -> pointer( integer )
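The constructors above can be modeled directly as data. The following Python sketch is a hedged illustration (the class names are my own, not from the notes) of how a compiler might represent these type expressions:

```python
from dataclasses import dataclass

# Hypothetical encodings of the type expressions described above.

@dataclass(frozen=True)
class Basic:
    name: str                      # e.g. "integer", "char", "boolean", "real"

@dataclass(frozen=True)
class Array:
    index: tuple                   # index range, e.g. (1, 10)
    elem: object                   # element type expression

@dataclass(frozen=True)
class Product:
    left: object                   # T1 x T2
    right: object

@dataclass(frozen=True)
class Pointer:
    to: object                     # pointer(T)

@dataclass(frozen=True)
class Function:
    domain: object                 # D -> R
    range: object

integer = Basic("integer")
char = Basic("char")

# mod : int x int -> int
mod_type = Function(Product(integer, integer), integer)

# var A: array[1 .. 10] of integer  =>  array(1..10, integer)
A_type = Array((1, 10), integer)
```

Making the dataclasses frozen gives structural equality for free, which is convenient when two type expressions must be compared during checking.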
P -> D ; E
D -> D ; D | id : T
A type checker is a translation scheme that synthesizes the type of each expression from the
types of its sub-expressions. Consider the grammar above, which generates programs consisting
of a sequence of declarations D followed by a single expression E.
Specification of a type checker for the language of this grammar: a program generated by
this grammar is
key : integer;
key mod 1999
Assumptions:
1. The language has three basic types: char , int and type_error
2. For simplicity, all arrays start at 1. For example, the declaration array[256] of char leads to the
type expression array ( 1.. 256, char).
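To make the scheme concrete, here is a hedged Python sketch of a checker for the tiny program above (the grammar is reduced to declarations followed by `id mod num`; the helper names are my own, not the notes' code):

```python
# Hypothetical sketch of the type checker for: D ; E  with  E -> E mod E | id | num
CHAR, INT, TYPE_ERROR = "char", "integer", "type_error"

symtab = {}                        # symbol table filled in by declarations

def declare(name, typ):
    # D -> id : T  -- record the declared type in the symbol table
    symtab[name] = typ

def lookup(name):
    # fetch the type saved in the symbol-table entry, as in lookup(e)
    return symtab.get(name, TYPE_ERROR)

def check_mod(t1, t2):
    # E -> E1 mod E2 { E.type := if E1.type == integer and
    #                  E2.type == integer then integer else type_error }
    return INT if t1 == INT and t2 == INT else TYPE_ERROR

# key : integer;  key mod 1999
declare("key", INT)
result = check_mod(lookup("key"), INT)   # the token num 1999 has type integer
```

An undeclared identifier falls through `lookup` as type_error, which then propagates upward through `check_mod` exactly as the rules intend.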
The rules for the production D -> id : T make the symbol-table entries: the type expression
built from each declaration is saved in the symbol table under the declared identifier.
The production E -> E ( E ) where an expression is the application of one expression to another
can be used to represent the application of a function to an argument. The rule for checking the
type of a function application is
E -> E1 ( E2 ) { E.type := if E2.type == s and E1. type == s -> t then t else type_error }
This rule says that in an expression formed by applying E1 to E2, the type of E1 must be a
function s -> t from the type s of E2 to some range type t ; the type of E1 ( E2 ) is t . The above
rule can be generalized to functions with more than one argument by constructing a product type
consisting of the arguments. Thus n arguments of types T1 , T2 , ... Tn can be viewed as a single
argument of the type T1 x T2 x ... x Tn. For example, the type expression
( real -> real ) x real -> real
declares a function root that takes a function from reals to reals and a real as arguments and
returns a real. The Pascal-like syntax for this declaration is
function root ( function f (real) : real; x : real ) : real
TYPE CHECKING FOR EXPRESSIONS: To perform type checking of expressions, the following
rules are used, where the synthesized attribute type for E gives the type expression assigned by
the type system to the expression generated by E.
The following semantic rules say that constants represented by the tokens literal and num have
type char and integer , respectively:
E -> literal { E.type := char }
E -> num { E.type := integer }
. The function lookup ( e ) is used to fetch the type saved in the symbol-table entry pointed to by
e. When an identifier appears in an expression, its declared type is fetched and assigned to the
attribute type:
E -> id { E.type := lookup ( id.entry ) }
. According to the following rule, the expression formed by applying the mod operator to two
sub-expressions of type integer has type integer ; otherwise, its type is type_error .
E -> E1 mod E2 { E.type := if E1.type == integer and E2.type == integer then integer else
type_error }
In an array reference E1 [ E2 ], the index expression E2 must have type integer , in which case
the result is the element type t obtained from the type array ( s, t ) of E1:
E -> E1 [ E2 ] { E.type := if E2.type == integer and E1.type == array ( s, t ) then t else type_error }
Within expressions, the postfix dereference operator ^ yields the object pointed to by its operand.
The type of E1^ is the type t of the object pointed to by the pointer E1:
E -> E1 ^ { E.type := if E1.type == pointer ( t ) then t else type_error }
Since statements do not have values, the special basic type void is assigned to them, but if an
error is detected within a statement, the type assigned to the statement is type_error .
The statements considered below are assignment, conditional, and while statements. Sequences
of statements are separated by semi-colons. The productions given below can be combined with
those given before if we change the production for a complete program to P -> D; S. The
program now consists of declarations followed by statements.
1. S -> id := E { S.type := if id.type == E.type then void else type_error }
This rule checks that the left and right sides of an assignment statement have the same type.
2. S -> if E then S1 { S.type := if E.type == boolean then S1.type else type_error }
This rule specifies that the expression in an if-then statement must have the type boolean .
3. S -> while E do S1 { S.type := if E.type == boolean then S1.type else type_error }
This rule specifies that the expression in a while statement must have the type boolean .
4. S -> S1 ; S2 { S.type := if S1.type == void and S2.type == void then void else type_error }
Errors are propagated by this last rule because a sequence of statements has type void only if
each sub-statement has type void.
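The statement rules can be sketched in Python as follows (a hedged illustration; the function names are my own, not the notes' code):

```python
# Hypothetical sketch of the statement-checking rules described above.
VOID, TYPE_ERROR, BOOLEAN = "void", "type_error", "boolean"

def check_assign(id_type, expr_type):
    # S -> id := E : both sides must have the same type
    return VOID if id_type == expr_type else TYPE_ERROR

def check_while(cond_type, body_type):
    # S -> while E do S1 : the condition must be boolean
    return body_type if cond_type == BOOLEAN else TYPE_ERROR

def check_seq(t1, t2):
    # S -> S1 ; S2 : void only if every sub-statement is void
    return VOID if t1 == VOID and t2 == VOID else TYPE_ERROR

ok = check_seq(check_assign("integer", "integer"),
               check_while(BOOLEAN, VOID))            # well-typed sequence
bad = check_seq(ok, check_assign("char", "integer"))  # error propagates
```

A single mistyped sub-statement makes the whole sequence type_error, which is exactly the error-propagation behavior rule 4 describes.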
1. What do you mean by THREE ADDRESS CODE? Generate the three-address code for
the following code.
begin
    PROD := 0;
    I := 1;
    do
    begin
        PROD := PROD + A[I] * B[I];
        I := I + 1;
    end
    while I <= 20
end
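One possible three-address translation of the loop above (a sketch that ignores array address arithmetic; the temporaries t1..t3 and label L1 are illustrative):

    PROD := 0
    I := 1
L1: t1 := A[I]
    t2 := B[I]
    t3 := t1 * t2
    PROD := PROD + t3
    I := I + 1
    if I <= 20 goto L1

Each statement has at most one operator on the right-hand side, so the multiplication and the accumulation each get their own statement, with temporaries holding the intermediate results.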
ASSIGNMENT QUESTIONS:
1. Write Three address code for the below example
while ( i < 10 )
{
    a = b + c * -d;
    i++;
}
2. What is a Syntax Directed Definition? Write a Syntax Directed Definition to convert a binary
value into decimal.