Unit 5
Conceptually, with both syntax-directed definitions and translation schemes, we parse the
input token stream, build the parse tree, and then traverse the tree as needed to evaluate the
semantic rules at the parse tree nodes. Evaluation of the semantic rules may generate code,
save information in a symbol table, issue error messages, or perform any other activities.
The translation of the token stream is the result obtained by evaluating the semantic rules.
Definition
Syntax-Directed Translation (SDT) augments the grammar with rules that facilitate semantic
analysis. SDT involves passing information bottom-up and/or top-down through the parse tree
in the form of attributes attached to the nodes. Syntax-directed translation rules use 1) lexical
values of nodes, 2) constants, and 3) attributes associated with the non-terminals in their
definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or syntax
tree and compute the values of attributes at the nodes of the tree by visiting them in some
order. In many cases, translation can be done during parsing without building an explicit
tree.
Example
E -> E+T | T
T -> T*F | F
F -> INTLIT
This is a grammar to syntactically validate an expression having additions and
multiplications in it. Now, to carry out semantic analysis we will augment SDT rules to this
grammar, in order to pass some information up the parse tree and check for semantic errors,
if any. In this example, we will focus on the evaluation of the given expression, as we don’t
have any semantic assertions to check in this very basic example.
E -> E+T { E.val = E.val + T.val } PR#1
E -> T { E.val = T.val } PR#2
T -> T*F { T.val = T.val * F.val } PR#3
T -> F { T.val = F.val } PR#4
F -> INTLIT { F.val = INTLIT.lexval } PR#5
To understand translation rules further, consider the first SDT rule, attached to the
[ E -> E+T ] production. The translation rule in question has val as an attribute of both
non-terminals, E and T. The right-hand side of the translation rule refers to the attribute
values of the nodes on the right side of the production rule, and vice versa. Generalizing,
SDT rules are augmented rules of a CFG that associate 1) a set of attributes with every
symbol of the grammar and 2) a set of translation rules with every production rule, using
attributes, constants, and lexical values.
Let’s take a string to see how semantic analysis happens: S = 2+3*4. The parse tree
corresponding to S would be:
To evaluate the translation rules, we can employ a single depth-first traversal of the parse
tree. This works here only because the SDT rules impose no particular evaluation order
beyond requiring that children's attributes be computed before their parents', which holds
for any grammar whose attributes are all synthesized. Otherwise, we would have to work out
a traversal plan that evaluates all the attributes in one or more passes over the parse
tree. For clarity, we will compute the translation rules of our example bottom-up, in
left-to-right fashion.
The above diagram shows how semantic analysis could happen. Information flows bottom-up,
and all of a node's children's attributes are computed before its own, as discussed above.
Right-hand-side nodes are sometimes annotated with subscript 1 to distinguish children
from parents.
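As a rough sketch (not from the text), the bottom-up evaluation just described can be coded as a postorder traversal over a hand-built parse tree; the tuple encoding of tree nodes is an assumption made here for illustration.

```python
# A minimal sketch: each parse-tree node is a tuple (production, children...)
# and val is computed by a postorder traversal, mirroring PR#1-PR#5 above.

def evaluate(node):
    """Postorder evaluation: children first, then the parent's rule."""
    if node[0] == "F -> INTLIT":          # PR#5: F.val = INTLIT.lexval
        return node[1]
    if node[0] in ("E -> T", "T -> F"):   # PR#2, PR#4: copy the child's val
        return evaluate(node[1])
    left, right = evaluate(node[1]), evaluate(node[2])
    if node[0] == "E -> E+T":             # PR#1: E.val = E.val + T.val
        return left + right
    if node[0] == "T -> T*F":             # PR#3: T.val = T.val * F.val
        return left * right
    raise ValueError(f"unknown production {node[0]}")

# Parse tree for S = 2+3*4 (built by hand here; a parser would produce it)
tree = ("E -> E+T",
        ("E -> T", ("T -> F", ("F -> INTLIT", 2))),
        ("T -> T*F", ("T -> F", ("F -> INTLIT", 3)), ("F -> INTLIT", 4)))

print(evaluate(tree))   # 2 + 3 * 4 = 14
```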
Additional Information
Synthesized Attributes are such attributes that depend only on the attribute values of
children nodes.
Thus [ E -> E+T { E.val = E.val + T.val } ] has a synthesized attribute val corresponding to
node E. If all the semantic attributes in an augmented grammar are synthesized, a single
depth-first traversal in any order is sufficient for the semantic analysis phase.
Inherited Attributes are attributes that depend on the parent's and/or siblings' attributes.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E and Ep are the same
grammar symbol annotated to distinguish parent from child, has an inherited attribute val
corresponding to node T.
Intermediate languages
In the analysis-synthesis model of a compiler, the front end of a compiler translates a source
program into an independent intermediate code, then the back end of the compiler uses this
intermediate code to generate the target code (which can be understood by the machine). The
benefits of using machine-independent intermediate code are:
• Because of the machine-independent intermediate code, portability is enhanced. For
example, if a compiler translates the source language directly to its target machine
language, without the option of generating intermediate code, then a full native compiler
is required for each new machine, because the compiler itself must be modified to match
each machine's specifications.
• Retargeting is facilitated.
• It is easier to apply source code modification to improve the performance of source
code by optimizing the intermediate code.
What is Intermediate Code Generation?
Intermediate Code Generation is a stage in the process of compiling a program, where the
compiler translates the source code into an intermediate representation. This representation
is not machine code but is simpler than the original high-level code. Here’s how it works:
• Translation: The compiler takes the high-level code (like C or Java) and converts it
into an intermediate form, which can be easier to analyze and manipulate.
• Portability: This intermediate code can often run on different types of machines
without needing major changes, making it more versatile.
• Optimization: Before turning it into machine code, the compiler can optimize this
intermediate code to make the final program run faster or use less memory.
If we generate machine code directly from source code, then for n target machines we will
need n optimizers and n code generators, but if we have a machine-independent
intermediate code, we need only one optimizer. Intermediate code can be either
language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address code).
The following are commonly used intermediate code representations:
Postfix Notation
• Also known as reverse Polish notation or suffix notation.
• In the infix notation, the operator is placed between operands, e.g., a + b. Postfix
notation positions the operator at the right end, as in ab +.
• For any two postfix expressions e1 and e2 and a binary operator +, applying the operator
yields e1 e2 +.
• Postfix notation eliminates the need for parentheses, as the operator’s position and arity
allow unambiguous expression decoding.
• In postfix notation, the operator consistently follows the operand.
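To illustrate why postfix needs no parentheses, here is a hedged sketch of a stack-based evaluator for postfix strings of one-digit operands (the single-digit restriction is an assumption of this sketch):

```python
# Each digit is pushed; each operator pops two operands and pushes the result.
# The operator's position alone determines the order of evaluation, so no
# parentheses are ever needed.

def eval_postfix(expr):
    stack = []
    for tok in expr:
        if tok.isdigit():
            stack.append(int(tok))
        else:                       # binary operator: pop two operands
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b, "*": a * b}[tok])
    return stack.pop()

print(eval_postfix("23+4*"))   # (2 + 3) * 4 = 20
print(eval_postfix("234*+"))   # 2 + 3 * 4 = 14
```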
Example 1: The postfix representation of the expression (a + b) * c is: ab+c*
Example 2: The postfix representation of the expression (a - b) * (c + d) + (a - b) is: ab-cd+*ab-+
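The conversion itself can be sketched with the classic shunting-yard algorithm (single-letter operands and the four basic operators are assumptions of this sketch):

```python
# Shunting-yard: operands go straight to the output; operators wait on a
# stack until an operator of lower precedence (or a parenthesis) arrives.

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(expr):
    out, stack = [], []
    for tok in expr.replace(" ", ""):
        if tok.isalnum():                       # operand: emit immediately
            out.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":                        # pop until the matching "("
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()
        else:                                   # operator: pop >= precedence first
            while stack and stack[-1] != "(" and PREC[stack[-1]] >= PREC[tok]:
                out.append(stack.pop())
            stack.append(tok)
    while stack:
        out.append(stack.pop())
    return "".join(out)

print(to_postfix("(a+b)*c"))            # ab+c*
print(to_postfix("(a-b)*(c+d)+(a-b)"))  # ab-cd+*ab-+
```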
Three-Address Code
• A three address statement involves a maximum of three references, consisting of two
for operands and one for the result.
• A sequence of three address statements collectively forms a three address code.
• The typical form of a three address statement is expressed as x = y op z, where x, y,
and z represent memory addresses.
• Each variable (x, y, z) in a three address statement is associated with a specific memory
location.
While a standard three address statement includes three references, there are instances
where a statement may contain fewer than three references, yet it is still categorized as a
three address statement.
Example: The three-address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
where T1, T2, T3 are compiler-generated temporary variables.
There are 3 ways to represent a Three-Address Code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
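As an illustrative sketch, the three statements from the example above (T1 = b * c; T2 = a + T1; T3 = T2 + d) can be stored as quadruples, and a small helper (hypothetical, written here only for illustration) can convert them to triples:

```python
# Quadruples store (op, arg1, arg2, result); triples drop the result field
# and refer to earlier rows by their index instead of by temporary name.

quadruples = [
    ("*", "b", "c", "T1"),   # T1 = b * c
    ("+", "a", "T1", "T2"),  # T2 = a + T1
    ("+", "T2", "d", "T3"),  # T3 = T2 + d
]

def quads_to_triples(quads):
    """Replace each temporary name with the index of the row that computes it."""
    row_of = {}       # temporary name -> triple row index
    triples = []
    for op, a1, a2, result in quads:
        a1 = row_of.get(a1, a1)
        a2 = row_of.get(a2, a2)
        row_of[result] = len(triples)
        triples.append((op, a1, a2))
    return triples

print(quads_to_triples(quadruples))
# [('*', 'b', 'c'), ('+', 'a', 0), ('+', 1, 'd')]
```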
Syntax Tree
• A syntax tree serves as a condensed representation of a parse tree.
• The operator and keyword nodes of the parse tree are moved into their respective parent
nodes in the syntax tree: internal nodes are operators and child (leaf) nodes are operands.
• Creating a syntax tree involves strategically placing parentheses within the expression.
This technique contributes to a more intuitive representation, making it easier to discern
the sequence in which operands should be processed.
The syntax tree not only condenses the parse tree but also offers an improved visual
representation of the program's syntactic structure.
Example: x = (a + b * c) / (a – b * c)
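A minimal sketch of how such a syntax tree might be represented, assuming a simple Node class (an illustration, not a prescribed implementation); internal nodes hold operators and leaves hold operands:

```python
# Syntax tree for x = (a + b * c) / (a - b * c), built bottom-up.

class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

def leaf(name):
    return Node(name)

tree = Node("=",
            leaf("x"),
            Node("/",
                 Node("+", leaf("a"), Node("*", leaf("b"), leaf("c"))),
                 Node("-", leaf("a"), Node("*", leaf("b"), leaf("c")))))

def infix(n):
    """Recover a fully parenthesized rendering from the tree."""
    if n.left is None:
        return n.op
    return f"({infix(n.left)} {n.op} {infix(n.right)})"

print(infix(tree))   # (x = ((a + (b * c)) / (a - (b * c))))
```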
Advantages of Intermediate Code Generation
• Easier to Implement: Intermediate code generation can simplify the code generation
process by reducing the complexity of the input code, making it easier to implement.
• Facilitates Code Optimization: Intermediate code generation can enable the use of
various code optimization techniques, leading to improved performance and efficiency
of the generated code.
• Platform Independence: Intermediate code is platform-independent, meaning that it
can be translated into machine code or bytecode for any platform.
• Code Reuse: Intermediate code can be reused in the future to generate code for other
platforms or languages.
• Easier Debugging: Intermediate code can be easier to debug than machine code or
bytecode, as it is closer to the original source code.
Disadvantages of Intermediate Code Generation
• Increased Compilation Time: Intermediate code generation can significantly increase
the compilation time, making it less suitable for real-time or time-critical applications.
• Additional Memory Usage: Intermediate code generation requires additional memory
to store the intermediate representation, which can be a concern for memory-limited
systems.
• Increased Complexity: Intermediate code generation can increase the complexity of
the compiler design, making it harder to implement and maintain.
• Reduced Performance: The process of generating intermediate code can result in code
that executes slower than code generated directly from the source code.
Assignment statements
E→E+E
E→E∗E
E → −E
E → (E)
E → id
• E.PLACE − the name that will hold the value of the expression.
• E.CODE − the sequence of three-address statements that evaluates the expression E. The
CODE for the non-terminal on the left of a production is the concatenation of the CODE
for each non-terminal on its right.
Abstract Translation Scheme
E → E(1) + E(2)    { T = newtemp( );
                     E.PLACE = T;
                     E.CODE = E(1).CODE || E(2).CODE || E.PLACE || '=' || E(1).PLACE || '+' || E(2).PLACE }
E → E(1) ∗ E(2)    { T = newtemp( );
                     E.PLACE = T;
                     E.CODE = E(1).CODE || E(2).CODE || E.PLACE || '=' || E(1).PLACE || '*' || E(2).PLACE }
E → −E(1)          { T = newtemp( );
                     E.PLACE = T;
                     E.CODE = E(1).CODE || E.PLACE || '=−' || E(1).PLACE }
E.PLACE || '=' || E(1).PLACE || '+' || E(2).PLACE is a string that is appended to
E.CODE = E(1).CODE || E(2).CODE.
The production E → (E(1)) has no string following E.CODE = E(1).CODE, because there is no
operator on the right-hand side of the production. Similarly, the production E → id has no
string appended after E.CODE = null: since no expression appears on its right-hand side, no
CODE attribute exists, because CODE represents a sequence of three-address statements
evaluating an expression. Its right-hand side consists of the terminal id, which is not an
expression. We can also use a procedure GEN(statement) in place of S.CODE and E.CODE, as
the GEN procedure automatically generates three-address statements.
E → (E(1))    None
E → id        None
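The PLACE/CODE scheme above can be sketched in code, assuming a tuple-based expression tree and a newtemp() counter (both illustrative choices, not the text's own implementation):

```python
# Each node returns (PLACE, CODE), where CODE is the list of three-address
# statements and PLACE is the name holding the node's value.

counter = 0
def newtemp():
    """Return a fresh temporary name T1, T2, ..."""
    global counter
    counter += 1
    return f"T{counter}"

def translate(node):
    if isinstance(node, str):                  # E -> id: PLACE = id, CODE empty
        return node, []
    if node[0] == "neg":                       # E -> -E(1)
        p1, c1 = translate(node[1])
        t = newtemp()
        return t, c1 + [f"{t} = -{p1}"]
    op, left, right = node                     # E -> E(1) op E(2)
    p1, c1 = translate(left)
    p2, c2 = translate(right)
    t = newtemp()
    return t, c1 + c2 + [f"{t} = {p1} {op} {p2}"]

place, code = translate(("+", "a", ("*", "b", "c")))   # a + b * c
print(code)    # ['T1 = b * c', 'T2 = a + T1']
print(place)   # T2
```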
Backpatching
Backpatching is essentially the process of filling in unspecified information, namely the
targets of labels. It uses appropriate semantic actions during code generation and supplies
the addresses of labels in goto statements when producing three-address code (TAC) for the
given expressions.
Ordinarily, two passes are used, because assigning the positions of these label statements
in a single pass is quite challenging: the addresses can be left unresolved in the first
pass and then filled in during the second. Backpatching is the process of filling in these
gaps of incomplete information.
What is Backpatching?
Backpatching is a method of dealing with jumps in control-flow constructs such as if
statements and loops during the intermediate code generation phase of the compiler. Since
the targets of these jumps may not be known until later stages of compilation, backpatching
is a way to fill in those destinations once they are determined.
Forward jumps are very common in constructs like if statements, while loops, and switch
cases. For example, in a language with goto statements, the destination of a goto may not
be resolved until its label appears after the goto statement. Backpatching is a mechanism
for keeping track of and resolving such forward references, i.e., jumps from lower
addresses to higher addresses.
Need for Backpatching:
Backpatching is mainly used for two purposes:
1. Boolean Expression
Boolean expressions are expressions whose results can be either true or false. A boolean
expression, named after the mathematician George Boole, is an expression that evaluates to
either true or false. Let’s look at some common language examples:
• My favorite color is blue. → true
• I am afraid of mathematics. → false
• 2 is greater than 5. → false
2. Flow of control statements:
The flow of control statements needs to be managed during the execution of statements in a
program, for example in if-else statements and loops.
3. Labels and Gotos
The most elementary programming language construct for changing the flow of control in a
program is a label and goto. When a compiler encounters a statement like goto L, it must
check that there is exactly one statement with label L in the scope of this goto statement. If
the label has already appeared, then the symbol table will have an entry giving the compiler-
generated label for the first three-address instruction associated with the source statement
labeled L. For the translation, we generate a goto three-address statement with that compiler-
generated label as a target.
When a label L is encountered for the first time in the source program, either in a
declaration or as the target of a forward goto, we enter L into the symbol table and
generate a compiler label for L.
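The goto/label bookkeeping just described can be sketched as follows, assuming a dictionary-based symbol table and '_' as the placeholder for an unresolved target (both assumptions of this sketch):

```python
# When goto L is seen before L is defined, record the instruction index so it
# can be backpatched; when L is defined, fill in every waiting goto.

instructions = []
labels = {}          # source label -> instruction index
pending = {}         # source label -> indexes of gotos awaiting the label

def emit_goto(label):
    if label in labels:                       # label already seen: target known
        instructions.append(f"goto {labels[label]}")
    else:                                     # forward reference: leave blank
        pending.setdefault(label, []).append(len(instructions))
        instructions.append("goto _")

def define_label(label):
    labels[label] = len(instructions)
    for i in pending.pop(label, []):          # backpatch the waiting gotos
        instructions[i] = f"goto {labels[label]}"

emit_goto("L")                # forward goto: target unknown yet
instructions.append("x = 1")
define_label("L")             # L now labels index 2
instructions.append("y = 2")
print(instructions)           # ['goto 2', 'x = 1', 'y = 2']
```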
One-pass code generation using backpatching:
In a single pass, backpatching may be used to generate code for Boolean expressions as well
as for flow-of-control statements. The synthesized attributes truelist and falselist of
non-terminal B are used to manage labels in the jumping code for Boolean expressions.
B.truelist is the list of jump or conditional-jump instructions into which we must insert
the label to which control goes when B is true; B.falselist is the list of instructions
that eventually receive the label to which control goes when B is false. While code is
being generated for B, the targets of its true and false jumps are left blank; these
incomplete jumps are held on the lists B.truelist and B.falselist, respectively.
A statement S, for example, has a synthesized attribute S.nextlist, which denotes a list of
jumps to the instruction immediately after the code for S. Instructions are generated into
an instruction array, with labels serving as indexes into it. Three functions are used to
manipulate lists of jumps:
• Makelist(i): creates a new list containing only i, an index into the array of
instructions, and returns a pointer to the newly created list.
• Merge(p1, p2): concatenates the lists pointed to by p1 and p2 and returns a pointer to
the concatenated list.
• Backpatch(p, i): inserts i as the target label for each of the instructions on the list
pointed to by p.
Backpatching for Boolean Expressions
Using a translation scheme, we can generate code for Boolean expressions during bottom-up
parsing. A non-terminal marker M in the grammar introduces a semantic action that picks up
the index of the next instruction to be generated at the appropriate time.
For Example, Backpatching using boolean expressions production rules table:
Step 1: Generation of the production table
Step 2: We have to find the TAC(Three address code) for the given expression using
backpatching:
A < B OR C < D AND P < Q
Step 3: Now we will make the parse tree for the expression:
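A compact sketch of Makelist, Merge, and Backpatch, applied to the expression A < B OR C < D AND P < Q; instruction indexes starting at 0 and '_' marking an unfilled label are both assumptions of this sketch:

```python
instr = []                      # emitted instructions; labels are indexes

def emit(code):
    instr.append(code)
    return len(instr) - 1

def makelist(i):                # Makelist(i)
    return [i]

def merge(p1, p2):              # Merge(p1, p2)
    return p1 + p2

def backpatch(plist, target):   # Backpatch(p, i): fill in "_" placeholders
    for i in plist:
        instr[i] = instr[i].replace("_", str(target))

def relop(x, op, y):            # B -> x relop y: one true jump, one false jump
    true_list = makelist(emit(f"if {x} {op} {y} goto _"))
    false_list = makelist(emit("goto _"))
    return true_list, false_list

# A < B OR C < D AND P < Q   (AND binds tighter than OR)
t1, f1 = relop("A", "<", "B")
m1 = len(instr)                 # marker M: first instruction of C < D
t2, f2 = relop("C", "<", "D")
m2 = len(instr)                 # marker M: first instruction of P < Q
t3, f3 = relop("P", "<", "Q")

backpatch(t2, m2)               # B -> B1 AND M B2: if B1 is true, test B2
t_and, f_and = t3, merge(f2, f3)

backpatch(f1, m1)               # B -> B1 OR M B2: if B1 is false, test B2
t_or, f_or = merge(t1, t_and), f_and

for i, line in enumerate(instr):
    print(i, line)
# 0 if A < B goto _     (overall true exit, filled by the enclosing statement)
# 1 goto 2
# 2 if C < D goto 4
# 3 goto _              (overall false exit)
# 4 if P < Q goto _     (overall true exit)
# 5 goto _              (overall false exit)
```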
The Flow of Control Statements:
Control statements are those that alter the order in which statements are executed. If, If-else,
Switch-Case, and while-do statements are examples. Boolean expressions are often used in
computer languages to
• Alter the flow of control: Boolean expressions are conditional expressions that change
the flow of control in a statement. The value of such a Boolean statement is implicit in
the program’s position. For example, if (A) B, the expression A must be true if
statement B is reached.
• Compute logical values: A translation scheme can also generate code for Boolean
expressions during bottom-up parsing. A non-terminal marker M in the grammar introduces
a semantic action that records the index of the next instruction to be generated at the
appropriate moment.
Applications of Backpatching
• Backpatching is used to translate flow-of-control statements in one pass itself.
• Backpatching is used for producing quadruples for boolean expressions during bottom-
up parsing.
• It is the activity of filling up unspecified information of labels during the code
generation process.
• It helps to resolve forward branches that have been planted in the code.