CD_UNIT III
• Attribute grammars, syntax-directed definitions, evaluation and flow of attributes
in a syntax tree. Symbol table: basic structure, symbol attributes and management. Run-time
environment: procedure activation, parameter passing, value return, memory
allocation, scope.
Attribute grammars:
• An attribute grammar is a special form of context-free grammar in which some additional
information (attributes) is attached to one or more of its non-terminals in order to provide
context-sensitive information. Each attribute has a well-defined domain of values, such as
integer, float, character, string, or expression.
• An attribute grammar is a means of giving semantics to a context-free grammar, and it can
help specify both the syntax and the semantics of a programming language. When viewed as a parse
tree, an attribute grammar can pass values or information among the nodes of the tree.
• Example
• E → E1 + T { E.value = E1.value + T.value }
• The right part of the production, in braces, contains the semantic rule that specifies how the production should be
interpreted. Here, the values of the non-terminals E1 and T are added together and the result is
copied to the non-terminal E (E1 denotes the occurrence of E on the right-hand side).
• Semantic attributes may be assigned values from their domain at parse time and
evaluated at the time of assignments or conditions. Based on the way the attributes get
their values, they can be broadly divided into two categories: synthesized attributes and
inherited attributes.
syntax directed definition:
Syntax Directed Definition (SDD) is a kind of abstract specification. It is a generalization of a context-free
grammar in which each grammar production X → α has associated with it a set of semantic rules of the
form s = f(b1, b2, …, bk), where s is an attribute computed by the function f from the attributes b1, …, bk. An attribute can be a string,
number, type or memory location. Semantic rules are fragments of code, usually written at
the end of the production and enclosed in curly braces ({ }). Example:
E --> E1 + T { E.val = E1.val + T.val}
Annotated Parse Tree – The parse tree containing the values of attributes at each node for a given input
string is called an annotated or decorated parse tree.
Features –
•High level specification
•Hides implementation details
•Explicit order of evaluation is not specified
Types of attributes – There are two types of attributes:
1. Synthesized Attributes – These are attributes that derive their values from their child
nodes, i.e., the value of a synthesized attribute at a node is computed from the values of attributes at the children of that
node in the parse tree.
Example:
E --> E1 + T { E.val = E1.val + T.val}
Here, E.val derives its value from E1.val and T.val.
Computation of Synthesized Attributes –
•Write the SDD using appropriate semantic rules for each production in given grammar.
•The annotated parse tree is generated and attribute values are computed in a bottom-up manner.
•The value obtained at root node is the final output.
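Example: Consider the following grammar
S --> E
E --> E1 + T
E --> T
T --> T1 * F
T --> F
F --> digit
The SDD for the above grammar can be written as follows:
Production        Semantic Rules
S --> E           print(E.val)
E --> E1 + T      E.val = E1.val + T.val
E --> T           E.val = T.val
T --> T1 * F      T.val = T1.val * F.val
T --> F           T.val = F.val
F --> digit       F.val = digit.lexval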
Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse tree for the input
string is:
[Figure: annotated parse tree for the input 4 * 5 + 6]
For computation of attributes we start from the leftmost bottom node. The rule F → digit is used to reduce digit to F, and
the value of digit is obtained from the lexical analyzer; it becomes the value of F through the semantic action F.val = digit.lexval.
Hence, F.val = 4, and since T is the parent node of F, we get T.val = 4 from the semantic action T.val = F.val. Then, for the
T → T1 * F production, the corresponding semantic action is T.val = T1.val * F.val. Hence, T.val = 4 * 5 = 20.
This value becomes E1.val through E → T, and the right operand 6 gives T.val = 6. Combining them, E.val = E1.val + T.val = 20 + 6 = 26.
Then, the production S → E is applied, and the semantic action associated with it prints E.val. Hence, the output will be 26.
2. Inherited Attributes – These are attributes that derive their values from their parent or sibling nodes, i.e., the value of an
inherited attribute is computed from the values of attributes at the parent or sibling nodes.
Example:
A --> BCD { C.in = A.in, C.type = B.type }
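Consider, for example, a declaration of the identifiers a and c, such as int a, c, where T derives the type keyword and L derives
the list of identifiers. The annotated parse tree for the input string is:
[Figure: annotated parse tree for the declaration int a, c]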
The value at the L nodes is obtained from T.type (a sibling), which is the lexical value int, float or
double. Each L node then passes this type down to the identifiers a and c. The type is therefore computed in a top-down manner, i.e., by a
preorder traversal. Using the function Enter_type, the type of the identifiers a and c is inserted into the symbol table at the
corresponding id.entry.
Evaluation and flow of attribute in a syntax tree
Basic structure
• A syntax tree is a tree in which each leaf node represents an operand, while each interior
node represents an operator. The syntax tree is a condensed (abbreviated) form of the parse tree, and it is
usually used when representing a program in a tree structure.
• 1. mknode (op, left, right): It creates an operator node with the label op and two fields
containing pointers to the left and right children.
• 2. mkleaf (id, entry): It creates an identifier node with the label id and a field entry, which
is a reference to the identifier’s symbol table entry.
• 3. mkleaf (num, val): It creates a number node with the label num and a field val containing the number’s value.
For example, the syntax tree for the expression a - 4 + c is built by the sequence of calls
p1 = mkleaf(id, entry-a); p2 = mkleaf(num, 4); p3 = mknode('-', p1, p2);
p4 = mkleaf(id, entry-c); p5 = mknode('+', p3, p4).
Here p1, p2, …, p5 are pointers to the nodes created, and entry-a and entry-c are pointers to the symbol table entries for the
identifiers ‘a’ and ‘c’, respectively.
The Directed Acyclic Graph (DAG) is a tool that shows the structure of basic blocks, allows you to examine
the flow of values between them, and also allows you to optimize them. A DAG allows for simple transformations of
basic blocks.
An array of records is used to hold the nodes of a syntax tree or DAG. Each row of the array corresponds to a
single record, and hence a single node. The first field in each record is an operation code, which indicates the
node’s label. In the given figure below, interior nodes contain two more fields denoting the left and right
children, while leaves have one additional field that stores the lexical value (either a symbol-table pointer or a
constant in this instance).
The integer index of the record for a node inside the array is used to refer to that node. Historically, this integer has
been called the value number of the node, or of the expression represented by the node. In the figure, the node labeled +
has value number 3, while its left and right children have value numbers 1 and 2, respectively. In practice, we may use
pointers to records or references to objects instead of integer indexes, but the reference to a node would
still be referred to as its “value number.” If stored in an appropriate data structure, value numbers help us construct
expression DAGs efficiently.
Algorithm: The value-number method for constructing the nodes of a Directed Acyclic Graph.
INPUT: Label op, node l, and node r.
OUTPUT: The value number of a node in the array with signature (op, l, r).
METHOD: Search the array for a node M with label op, left child l, and right child r. If there is such a node, return
the value number of M. If not, create in the array a new node N with label op, left child l, and right child r, and
return its value number.
While the algorithm produces the intended result, searching the entire array every time one node is requested is
time-consuming, especially if the array contains expressions from an entire program. A more efficient method is a hash table, in which the
nodes are divided into “buckets,” each of which typically contains only a few nodes.
The hash table is one of numerous data structures that can efficiently support dictionaries. A dictionary is a
data type that allows us to add and remove elements from a set, as well as to detect whether a particular element is
present in the set. A good dictionary data structure, such as a hash table, executes each of these operations in
constant or near-constant time, regardless of the size of the set.
To build a hash table for the nodes of a DAG, we require a hash function h that computes the bucket index for a
signature (op, l, r) in such a manner that the signatures are distributed across buckets and no one bucket gets
more than a fair share of the nodes. The bucket index h(op, l, r) is computed deterministically from op, l,
and r, so we can repeat the calculation and always arrive at the same bucket index for a given signature (op, l, r).
The buckets can be implemented as linked lists, as in the given figure. The bucket headers are stored in an array
indexed by the hash value; each header points to the first cell of a list. Each cell in a bucket’s linked list
contains the value number of one of the nodes that hash to that bucket. That is, node (op, l, r) can be found on
the list whose header is at index h(op, l, r) of the array.
symbol attributes:
• Symbol table is an important data structure created and maintained by compilers in order to store information
about the occurrence of various entities such as variable names, function names, objects, classes, interfaces,
etc. Symbol table is used by both the analysis and the synthesis parts of a compiler.
• A symbol table may serve the following purposes depending upon the language in hand:
• To store the names of all entities in a structured form at one place.
• To verify if a variable has been declared.
• To implement type checking, by verifying that assignments and expressions in the source code are semantically
correct.
• To determine the scope of a name (scope resolution).
• A symbol table is simply a table which can be either linear or a hash table. It maintains an entry for each name
in the following format:
• <symbol name, type, attribute>
• For example, if a symbol table has to store information about the following variable declaration:
• static int interest;
• then it should store the entry such as:
• <interest, int, static>
• The attribute clause contains the entries related to the name.
Implementation
If a compiler is to handle only a small amount of data, then the symbol table can be implemented as an unordered list,
which is easy to code but suitable only for small tables. A symbol table can be implemented in one of the
following ways:
•Linear (sorted or unsorted) list
•Binary Search Tree
•Hash table
Of these, symbol tables are most commonly implemented as hash tables, where the source-code symbol itself is treated as
the key for the hash function and the value returned is the information about the symbol.
Operations
A symbol table, either linear or hash, should provide the following
operations.
insert()
This operation is used more frequently by the analysis phase, i.e., the first half of the compiler, where tokens are
identified and names are stored in the table. This operation is used to add information in the symbol table about
unique names occurring in the source code. The format or structure in which the names are stored depends upon
the compiler in hand.
An attribute for a symbol in the source code is the information associated with that symbol. This information contains
the value, state, scope, and type of the symbol. The insert() function takes the symbol and its attributes as
arguments and stores the information in the symbol table. For example:
int a;
should be processed by the compiler as: insert(a, int);
lookup(symbol)
This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists in the symbol
table, it returns its attributes stored in the table.
Scope Management
A compiler maintains two types of symbol tables: a global symbol table which can be accessed by all the procedures
and scope symbol tables that are created for each scope in the program.
To determine the scope of a name, symbol tables are arranged in hierarchical structure as shown in the example
below:
int value=10;
void pro_one()
{
    int one_1;
    int one_2;
    {               /* inner scope 1 */
        int one_3;
        int one_4;
    }
    int one_5;
    {               /* inner scope 2 */
        int one_6;
        int one_7;
    }
}
void pro_two()
{
    int two_1;
    int two_2;
    {               /* inner scope 3 */
        int two_3;
        int two_4;
    }
    int two_5;
}
Machine Status: Stores the machine status, such as registers and the program counter, before the procedure is
called.
Control Link: Stores the address of the activation record of the caller procedure.
Access Link: Stores information about data that lies outside the local scope.
Actual Parameters: Stores the actual parameters, i.e., the parameters used to send input to the called
procedure.
Procedure activation:
• Control stack is a run-time stack which is used to keep track of the live
procedure activations, i.e., it is used to find out the procedures whose
execution has not yet been completed.
• When a procedure is called (activation begins), its name is pushed
onto the stack, and when it returns (activation ends), it is popped.
• An activation record is used to manage the information needed by a single
execution of a procedure.
• An activation record is pushed onto the stack when a procedure is called,
and it is popped when control returns to the caller function.
• The diagram below shows the contents of activation records:
Return Value: It is used by the called procedure to return a value to the calling procedure.
Actual Parameter: It is used by calling procedures to supply parameters to the called procedures.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds information about the status of the machine before the procedure is called.
Local Data: It holds the data that is local to the execution of the procedure.
parameter passing:
• There are different ways in which parameter data can be passed into and out of methods and functions. Let us
assume that a function B() is called from another function A(). In this case A is called the “caller function” and
B is called the “called function or callee function”. Also, the arguments which A sends to B are called actual
arguments and the parameters of B are called formal arguments.
• Terminology
• Formal Parameter : A variable and its type as they appear in the prototype of the function or method.
• Actual Parameter : The variable or expression corresponding to a formal parameter that appears in the
function or method call in the calling environment.
• Modes:
• IN: Passes info from caller to callee.
• OUT: Callee writes values in caller.
• IN/OUT: Caller tells callee value of variable, which may be updated by callee.
• Important methods of Parameter Passing
• Pass By Value: This method uses in-mode semantics. Changes made to formal parameter do not get
transmitted back to the caller. Any modifications to the formal parameter variable inside the called
function or method affect only the separate storage location and will not be reflected in the actual
parameter in the calling environment. This method is also called call by value.
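// C program to illustrate
// call by value
#include <stdio.h>
void func(int a, int b)
{
    a += b; // modifies only the local copy of a
    printf("In func, a = %d b = %d\n", a, b);
}
int main(void)
{
    int x = 5, y = 7;
    // Passing parameters
    func(x, y);
    printf("In main, x = %d y = %d\n", x, y);
    return 0;
}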
Output:
In func, a = 12 b = 7
In main, x = 5 y = 7
Languages like C, C++ and Java support this type of parameter passing. Java, in fact, is strictly call by value.
Shortcomings:
1. Inefficiency in storage allocation
2. For objects and arrays, the copy semantics are costly
• Pass by Reference (aliasing): This technique uses in/out-mode semantics. Changes made to the formal
parameter do get transmitted back to the caller through parameter passing. Any changes to the formal
parameter are reflected in the actual parameter in the calling environment, as the formal parameter receives a
reference (or pointer) to the actual data. This method is also called call by reference. This method is
efficient in both time and space.
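// C program to illustrate call by reference
#include <stdio.h>
void swapnum(int* i, int* j)
{ int temp = *i; // swap the values that i and j point to
    *i = *j;
    *j = temp;
}
int main(void)
{ int a = 10, b = 20;
    // passing parameters
    swapnum(&a, &b);
    printf("a is %d and b is %d\n", a, b);
    return 0;
}
Output:
a is 20 and b is 10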
Other methods of Parameter Passing
These techniques are older and were used in earlier programming languages like Pascal, Algol and Fortran. These
techniques are not commonly used in modern high-level languages.
1. Pass by Result: This method uses out-mode semantics. Just before control is transferred back to the caller, the
value of the formal parameter is transmitted back to the actual parameter. This method is sometimes called call by
result. In general, the pass-by-result technique is implemented by copy.
2. Pass by Value-Result: This method uses in/out-mode semantics. It is a combination of Pass-by-Value and
Pass-by-Result: the value of the actual parameter is copied into the formal parameter on entry, and just before control is
transferred back to the caller, the value of the formal parameter is transmitted back to the actual parameter. This method
is sometimes called call by value-result.
3. Pass by Name: This technique is used in programming languages such as Algol. In this technique, the symbolic “name”
of a variable is passed, which allows it both to be accessed and to be updated.
Example:
To double the value of C[j], you can pass its name (not its value) into the following procedure.
procedure double(x);
    real x;
begin
    x := x * 2
end;
In general, the effect of pass-by-name is to textually substitute the argument in a procedure call for the corresponding
parameter in the body of the procedure.
Implications of Pass-by-Name mechanism:
•The argument expression is re-evaluated each time the formal parameter is accessed.
•The procedure can change the values of variables used in the argument expression and hence change the
expression’s value.
Value Return:
• The value of expression, if present, is returned to the calling function. If expression is omitted, the return
value of the function is undefined. The expression, if present, is evaluated and then converted to the
type returned by the function. When a return statement contains an expression in functions that have a
void return type, the compiler generates a warning, and the expression isn't evaluated.
• If no return statement appears in a function definition, control automatically returns to the calling
function after the last statement of the called function is executed. In this case, the return value of the
called function is undefined. If the function has a return type other than void, it's a serious bug, and the
compiler prints a warning diagnostic message. If the function has a void return type, this behavior is
okay, but may be considered poor style. Use a plain return statement to make your intent clear.
• As a good engineering practice, always specify a return type for your functions. If a return value isn't
required, declare the function to have void return type. If a return type isn't specified, a C89 compiler
assumes a default return type of int; C99 and later standards no longer permit this implicit int.
• Many programmers use parentheses to enclose the expression argument of the return statement.
However, C doesn't require the parentheses.
• The compiler may issue a warning diagnostic message about unreachable code if it finds any
statements placed after the return statement.
• In a main function, the return statement and expression are optional. What happens to the returned
value, if one is specified, depends on the implementation. Microsoft-specific: The Microsoft C
implementation returns the expression value to the process that invoked the program, such as cmd.exe.
If no return expression is supplied, the Microsoft C runtime returns a value that indicates success (0) or
failure (a non-zero value).
Example
• This example is one program in several parts. It demonstrates the return statement, and how it's used
both to end function execution, and optionally, to return a value.
// C_return_statement.c
// Compile using: cl /W4 C_return_statement.c
#include <limits.h> // for INT_MAX
#include <stdio.h>  // for printf
long long square( int value )
{
    // Cast one operand to long long to force the
    // expression to be evaluated as type long long.
    // Note that parentheses around the return expression
    // are allowed, but not required here.
    return ( value * (long long) value );
}
The square function returns the square of its argument, in a wider type to prevent an arithmetic
error. Microsoft-specific: In the Microsoft C implementation, the long long type is large enough to
hold the product of two int values without overflow.
The parentheses around the return expression in square are evaluated as part of the expression, and
aren't required by the return statement.
memory allocation:
• Since C is a structured language, it has some fixed rules for programming. One of them is that the size of an
array cannot be changed once it has been declared. An array is a collection of items stored at contiguous memory locations.
• As can be seen, the length (size) of the array above is 9. But what if there is a
requirement to change this length (size)? For example:
• If there is a situation where only 5 elements need to be entered in this array, the
remaining 4 indices just waste memory. So there is a requirement to lessen the
length (size) of the array from 9 to 5.
•Take another situation. In this, there is an array of 9 elements with all 9 indices filled. But there is a need
to enter 3 more elements in this array. In this case, 3 indices more are required. So the length (size) of the
array needs to be changed from 9 to 12.
This procedure is referred to as Dynamic Memory Allocation in C.
Therefore, C Dynamic Memory Allocation can be defined as a procedure in which the size of a data
structure (like Array) is changed during the runtime.
C provides some functions to achieve these tasks. There are 4 library functions provided by C defined
under <stdlib.h> header file to facilitate dynamic memory allocation in C programming. They are:
1.malloc()
2.calloc()
3.free()
4.realloc()
C malloc() method
The “malloc” or “memory allocation” method in C is used to dynamically allocate a single large block of
memory with the specified size. It returns a pointer of type void, which can be cast into a pointer of any
type. It doesn’t initialize memory at execution time, so each block initially holds a garbage (indeterminate) value.
If space is insufficient, the allocation fails and malloc() returns a NULL pointer.
Syntax:
ptr = (cast-type*)malloc(byte-size);
For Example:
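#include <stdio.h>
#include <stdlib.h>
int main()
{
    int *ptr, n, i; // ptr will hold the base address of the block created
    printf("Enter number of elements: ");
    if (scanf("%d", &n) != 1 || n <= 0) return 1;
    ptr = (int*)malloc(n * sizeof(int)); // block contents are uninitialized
    if (ptr == NULL) { printf("Memory not allocated.\n"); return 1; }
    printf("Memory successfully allocated using malloc.\n");
    printf("The elements of the array are: ");
    for (i = 0; i < n; ++i) { ptr[i] = i + 1; printf("%d, ", ptr[i]); }
    free(ptr);
    return 0;
}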
Output:
Enter number of elements: 5
Memory successfully allocated using malloc.
The elements of the array are: 1, 2, 3, 4, 5,
C calloc() method
“calloc” or “contiguous allocation” method in C is used to dynamically allocate the specified number of blocks of
memory of the specified type. It is very similar to malloc() but differs in two points:
1. It initializes each block with a default value ‘0’.
2. It has two parameters or arguments, compared to one for malloc().
Syntax:
ptr = (cast-type*)calloc(n, element-size);
Here, n is the number of elements and element-size is the size of each element. For Example:
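#include <stdio.h>
#include <stdlib.h>
int main()
{
    int *ptr, n, i;
    printf("Enter number of elements: ");
    if (scanf("%d", &n) != 1 || n <= 0) return 1;
    ptr = (int*)calloc(n, sizeof(int)); // n ints, every one set to 0
    if (ptr == NULL) { printf("Memory not allocated.\n"); return 1; }
    printf("Memory successfully allocated using calloc.\n");
    printf("The elements of the array are: ");
    for (i = 0; i < n; ++i) { ptr[i] = i + 1; printf("%d, ", ptr[i]); }
    free(ptr);
    return 0;
}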
Output:
Enter number of elements: 5
Memory successfully allocated using calloc.
The elements of the array are: 1, 2, 3, 4, 5,
C free() method
“free” method in C is used to dynamically de-allocate memory. The memory allocated using the functions malloc() and
calloc() is not de-allocated on its own. Hence the free() method should be used whenever dynamically allocated memory
is no longer needed. It helps to reduce wastage of memory by freeing it.
Syntax:
free(ptr);
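#include <stdio.h>
#include <stdlib.h>
int main()
{
    int *ptr, *ptr1, n;
    printf("Enter number of elements: ");
    if (scanf("%d", &n) != 1 || n <= 0) return 1;
    ptr = (int*)malloc(n * sizeof(int));   // uninitialized block
    ptr1 = (int*)calloc(n, sizeof(int));   // zero-initialized block
    if (ptr == NULL || ptr1 == NULL) { free(ptr); free(ptr1); return 1; } // free(NULL) is a no-op
    printf("Memory successfully allocated using malloc.\n");
    free(ptr);  // release the malloc() block
    printf("Malloc Memory successfully freed.\n");
    printf("Memory successfully allocated using calloc.\n");
    free(ptr1); // release the calloc() block
    printf("Calloc Memory successfully freed.\n");
    return 0;
}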
Output:
Enter number of elements: 5
Memory successfully allocated using malloc.
Malloc Memory successfully freed.
Memory successfully allocated using calloc.
Calloc Memory successfully freed.
C realloc() method
“realloc” or “re-allocation” method in C is used to dynamically change the memory allocation of
previously allocated memory. In other words, if the memory previously allocated with the help of malloc or
calloc is insufficient, realloc can be used to dynamically re-allocate memory. Re-allocation of memory
maintains the values already present, and the newly added blocks are not initialized (they hold garbage values).
If realloc fails, it returns NULL and the original block is left unchanged.
Syntax:
ptr = realloc(ptr, newSize);
where ptr is reallocated with the new size 'newSize'.
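#include <stdio.h>
#include <stdlib.h>
int main()
{
    int *ptr, *tmp, n, i;
    printf("Enter number of elements: ");
    if (scanf("%d", &n) != 1 || n <= 0) return 1;
    ptr = (int*)calloc(n, sizeof(int));
    if (ptr == NULL) return 1; // allocation failed
    printf("Memory successfully allocated using calloc.\n");
    printf("The elements of the array are: ");
    for (i = 0; i < n; ++i) { ptr[i] = i + 1; printf("%d, ", ptr[i]); }
    printf("\nEnter the new size of the array: ");
    if (scanf("%d", &n) != 1 || n <= 0) { free(ptr); return 1; }
    // Keep the old pointer until realloc() succeeds so the block is not leaked
    tmp = (int*)realloc(ptr, n * sizeof(int));
    if (tmp == NULL) { free(ptr); return 1; }
    ptr = tmp;
    printf("Memory successfully re-allocated using realloc.\n");
    printf("The elements of the array are: ");
    for (i = 0; i < n; ++i) { ptr[i] = i + 1; printf("%d, ", ptr[i]); }
    free(ptr);
    return 0;
}
Output:
Enter number of elements: 5
Memory successfully allocated using calloc.
The elements of the array are: 1, 2, 3, 4, 5,
Enter the new size of the array: 10
Memory successfully re-allocated using realloc.
The elements of the array are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,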
Scope:
• The scope of a variable x is the region of the program in which uses of x refer to its
declaration. One of the basic reasons for scoping is to keep variables in different parts of the
program distinct from one another. Since there are only a small number of short variable
names, and programmers share habits about naming variables (e.g., i for an array index), in
any program of moderate size the same variable name will be used in multiple different
scopes.
Scoping is generally divided into two classes:
1. Static Scoping
2. Dynamic Scoping
Static Scoping:
Static scoping is also called lexical scoping. In this scoping, a variable always refers to the declaration in its nearest
enclosing environment in the program text (ultimately, the top-level environment). This is a property of the program text and
is unrelated to the run-time call stack. Static scoping also makes it much easier to write modular code, as a programmer
can figure out the scope just by looking at the code. In contrast, dynamic scope requires the
programmer to anticipate all possible dynamic contexts.
In most programming languages including C, C++, and Java, variables are always statically
(or lexically) scoped i.e., binding of a variable can be determined by program text and is
independent of the run-time function call stack.
For example, the output of the program below is 10, i.e., the value returned by f() does not
depend on who is calling it (here g() calls it and has its own x with value 20). f() always returns
the value of the global variable x.
// A C program to demonstrate static scoping.
#include<stdio.h>
int x = 10;
// Called by g()
int f()
{
    return x;
}
// g() has its own variable
// named as x and calls f()
int g()
{
    int x = 20;
    return f();
}
int main()
{
    printf("%d", g());
    printf("\n");
    return 0;
}
Output :
10
To sum up, in static scoping the compiler first searches in the current block, then in the successively enclosing
blocks, and finally in the global variables.
Dynamic Scoping:
With dynamic scope, a global identifier refers to the identifier associated with the most recent
environment and is uncommon in modern languages. In technical terms, this means that each
identifier has a global stack of bindings and the occurrence of an identifier is searched in the
most recent binding.
In simpler terms, in dynamic scoping, the compiler first searches the current block and then
successively all the calling functions.
// Since dynamic scoping is very uncommon in
// the familiar languages, we consider the
// following pseudo code as our example. It
// prints 20 in a language that uses dynamic
// scoping.
int x = 10;
// Called by g()
int f()
{
    return x;
}
// g() has its own variable
// named as x and calls f()
int g()
{
    int x = 20;
    return f();
}
main()
{
    printf(g());
}
Output in a language that uses Dynamic Scoping :
20