20210624-80604 Automata and Compiler Design
(Autonomous)
Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad, Telangana-500100 www.mrec.ac.in
Lecture Notes on Automata and Compiler Design
TEXT BOOKS:
1. John C. Martin, "Introduction to Languages and the Theory of Computation", TMH, Third Edition.
2. Alfred Aho, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", Pearson Education Asia.
REFERENCES:
1. Adesh K. Pandey, "An Introduction to Automata Theory and Formal Languages", S.K. Kataria and Sons.
2. Daniel I. A. Cohen, "Introduction to Computer Theory", John Wiley and Sons, Inc.
3. Allen I. Holub, "Compiler Design in C", Prentice Hall of India.
4. J.P. Bennet, "Introduction to Compiler Techniques", Tata McGraw-Hill, Second Edition.
E-RESOURCES:
1. https://www.iitg.ernet.in/dgoswami/Flat-Notes.pdf
2. https://books.google.co.in/books?isbn=8184313020
3. http://www.jalc.de/
4. https://arxiv.org/list/cs.FL/0906
5. http://freevideolectures.com/Course/3379/Formal-Languages-and-Automata-Theory
6. http://nptel.ac.in/courses/111103016/
Course Outcomes:
At the end of the course, students will be able to
1. Define the theory of automata, the types of automata, and finite automata with outputs.
2. Differentiate regular languages and apply the pumping lemma.
3. Classify grammars, check ambiguity, and apply the pumping lemma for CFLs and the various types of PDA.
4. Illustrate the Turing machine concept and the techniques applied in computers.
5. Analyze P vs. NP class problems, NP-hard vs. NP-complete problems, LBA, LR grammars, counter machines, and decidability of problems.
UNIT -1
Fundamentals
Examples:
Σ = {0, 1}
L = {x | x is in Σ* and x contains an even number of 0's}
Σ = {0, 1, 2, …, 9, .}
L = {x | x is in Σ* and x forms a finite-length real number} = {0, 1.5, 9.326, …}
Σ = {a, b, c, …, z, A, B, …, Z}
L = {x | x is in Σ* and x is a Pascal reserved word} = {BEGIN, END, IF, …}
Σ = {Pascal reserved words} ∪ {(, ), ., :, ;, …} ∪ {legal Pascal identifiers}
L = {x | x is in Σ* and x is a syntactically correct Pascal program}
Σ = {English words}
L = {x | x is in Σ* and x is a syntactically correct English sentence}
Examples:
Let Σ = {0, 1}.
(0+1)*                 all strings of 0's and 1's
0(0+1)*                all strings of 0's and 1's, beginning with a 0
(0+1)*1                all strings of 0's and 1's, ending with a 1
(0+1)*0(0+1)*          all strings of 0's and 1's containing at least one 0
(0+1)*0(0+1)*0(0+1)*   all strings of 0's and 1's containing at least two 0's
(0+1)*01*01*           all strings of 0's and 1's containing at least two 0's
(1+01*0)*              all strings of 0's and 1's containing an even number of 0's
1*(01*01*)*            all strings of 0's and 1's containing an even number of 0's
(1*01*0)*1*            all strings of 0's and 1's containing an even number of 0's
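As a quick sanity check, not from the notes, the following brute-force sketch verifies that one of these expressions matches exactly the binary strings with an even number of 0's:

import re
from itertools import product

pattern = re.compile(r'1*(01*01*)*')
for n in range(8):
    for bits in product('01', repeat=n):
        s = ''.join(bits)
        matched = pattern.fullmatch(s) is not None
        # the regex should accept s iff s has an even count of 0's
        assert matched == (s.count('0') % 2 == 0)
print("1*(01*01*)* accepts exactly the strings with an even number of 0's")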
Identities:
3. Ø* = ε
4. ε* = ε
5. u + v = v + u
7. u + u = u
8. u* = (u*)*
9. u(v + w) = uv + uw
10. (u + v)* = u*(u + v)* = (u + vu*)* = (u*v*)* = u*(vu*)* = (u*v)*u*
A finite state machine has a set of states and two functions called the next-state function and the output function.
The set of states corresponds to all the possible combinations of the internal storage. If there are n bits of storage, there are 2^n possible states.
The next-state function is a combinational logic function that, given the inputs and the current state, determines the next state of the system.
The output function produces a set of outputs from the current state and the inputs.
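A minimal sketch of this view (the machine and its function names are my own illustration): one bit of storage gives 2^1 = 2 states, and the machine below outputs 1 whenever the current input equals the previous input.

def next_state(state, inp):
    # the next state simply stores the last input bit
    return inp

def output(state, inp):
    # the output is computed from the current state and the input
    return 1 if state == inp else 0

state = 0
for inp in [0, 0, 1, 1, 0]:
    print(output(state, inp), end=' ')   # prints: 1 1 0 1 0
    state = next_state(state, inp)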
Finite Automata
• While NFA's are more expressive than DFA's, we will see that adding nondeterminism does not let us define any language that cannot be defined by a DFA.
• One way to think of this is that we might write a program using an NFA, but then when it is "compiled" we turn the NFA into an equivalent DFA.
• Let M = (Q, Σ, δ, q0, F) be a DFA and let w be in Σ*. Then w is accepted by M iff δ*(q0, w) is in F, where δ* is the transition function extended to strings.
• Another equivalent definition:
L(M) = {w | w is in Σ* and w is accepted by M}
Notes:
• A DFA M = (Q, Σ, δ, q0, F) partitions the set Σ* into two sets: L(M) and Σ* - L(M).
Step 1: Q' = ɸ
Step 2: Q' = {q0}
Step 3: For each state in Q', find the states reached on each input symbol. Currently the only state in Q' is q0; find the moves from q0 on input symbols a and b using the transition function of the NFA, and update the transition table δ' of the DFA.
δ' (transition function of the DFA):
Now {q0, q1} will be considered as a single state. As its entry is not in Q', add it to Q'. So Q' = {q0, {q0, q1}}.
The moves from state {q0, q1} on the different input symbols are not yet present in the transition table of the DFA; we calculate them like this:
δ'({q0, q1}, a) = δ(q0, a) ∪ δ(q1, a) = {q0, q1}
δ'({q0, q1}, b) = δ(q0, b) ∪ δ(q1, b) = {q0, q2}
Now {q0, q2} will be considered as a single state. As its entry is not in Q', add it to Q'. So Q' = {q0, {q0, q1}, {q0, q2}}.
The moves from state {q0, q2} on the different input symbols are not yet present in the transition table of the DFA; we calculate them like this:
δ'({q0, q2}, a) = δ(q0, a) ∪ δ(q2, a) = {q0, q1}
δ'({q0, q2}, b) = δ(q0, b) ∪ δ(q2, b) = {q0}
Now we update the transition table of the DFA. As no new state is generated, we are done with the conversion. The final states of the DFA are the states that have q2 as a component, i.e., {q0, q2}.
The parameters of the resulting DFA are:
Q' = {q0, {q0, q1}, {q0, q2}}
∑ = {a, b}
F = {{q0, q2}}, with the transition function δ' as shown above. The final DFA for the above NFA is shown in Figure 2.
Note: Sometimes it is not easy to convert a regular expression directly to a DFA. You can first convert the regular expression to an NFA and then convert the NFA to a DFA, as sketched below.
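The subset construction used above can be sketched as follows. The NFA moves are inferred from this example: δ(q0,a) = {q0,q1}, δ(q0,b) = {q0}, δ(q1,b) = {q2}, all other moves empty, with q2 the NFA final state.

delta = {('q0', 'a'): {'q0', 'q1'}, ('q0', 'b'): {'q0'},
         ('q1', 'a'): set(),        ('q1', 'b'): {'q2'},
         ('q2', 'a'): set(),        ('q2', 'b'): set()}

def subset_construction(start, symbols):
    start_set = frozenset([start])
    states, work, trans = {start_set}, [start_set], {}
    while work:
        S = work.pop()
        for x in symbols:
            # delta'(S, x) = union of delta(q, x) over q in S
            T = frozenset(r for q in S for r in delta[(q, x)])
            trans[(S, x)] = T
            if T not in states:        # a new subset: add it to Q'
                states.add(T)
                work.append(T)
    return states, trans

states, trans = subset_construction('q0', 'ab')
for (S, x), T in sorted(trans.items(), key=str):
    print(sorted(S), x, '->', sorted(T))
print('final:', [sorted(S) for S in states if 'q2' in S])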
CONTEXT FREE-GRAMMAR
Example CFG:
S → 0S1 | ε
• Note that G "generates" the language {0^k 1^k | k >= 0}
For the constructed table, fill the entries for the input symbols in the FOLLOW set with synch, and then fill the rest of the columns with the error entry.
If an interior vertex is labeled A and its children are labeled X1, X2, …, Xn (each a terminal or non-terminal), then A → X1X2…Xn must be a production in P.
The first L stands for "Left-to-right scan of the input". The second L stands for "Left-most derivation". The '1' stands for one input symbol of lookahead used at each step to make a parsing decision.
LL(1) Grammar:
• If a vertex has label ε, then that vertex is a leaf and the only child of its parent.
• More generally, a derivation tree can be defined with any non-terminal as the root.
Notes:
• The root can be any non-terminal.
• Leaf nodes can be terminals or non-terminals.
If there are no multiple entries in the recursive descent (predictive) parser table, the given grammar is LL(1).
If the grammar G is ambiguous or left recursive, then the recursive descent table will have at least one multiply defined entry.
The weakness of LL(1) (top-down, predictive) parsing is that it must predict which production to use.
Error Recovery in Predictive Parser:
Error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing set.
• A derivation tree with root S shows the productions used to obtain a sentential form.
The moves of a parser and its error recovery on the erroneous input id*+id are as follows:
• FIRSTk: the k terminals that can appear at the beginning of a string derived from a non-terminal
• FOLLOWk: the k terminals that can come after a derived non-terminal
The basic idea is to create a lookup table using this information from which the parser
can then simply go and check what derivation is to be made given a certain input token.
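To make the lookup-table idea concrete, here is a sketch (the helper names are my own) that computes FIRST and FOLLOW sets and fills the LL(1) table for the grammar discussed below (S → aAa | bAba, A → b | ε); a multiply-defined entry shows the grammar is not LL(1).

EPS = 'eps'
grammar = {'S': [['a', 'A', 'a'], ['b', 'A', 'b', 'a']],
           'A': [['b'], [EPS]]}

def first_of_seq(seq, first):
    # FIRST of a sentential form X1 X2 ... Xn
    out = set()
    for sym in seq:
        f = first[sym]
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                      # every symbol can vanish
    return out

first = {t: {t} for t in 'ab'}
first[EPS] = {EPS}
first.update({nt: set() for nt in grammar})
changed = True
while changed:                        # FIRST sets to a fixed point
    changed = False
    for nt, prods in grammar.items():
        for p in prods:
            f = first_of_seq(p, first)
            if not f <= first[nt]:
                first[nt] |= f
                changed = True

follow = {'S': {'$'}, 'A': set()}     # start symbol gets the end marker
changed = True
while changed:                        # FOLLOW sets to a fixed point
    changed = False
    for nt, prods in grammar.items():
        for p in prods:
            for i, sym in enumerate(p):
                if sym not in grammar:
                    continue
                f = first_of_seq(p[i + 1:], first)
                new = f - {EPS}
                if EPS in f:
                    new |= follow[nt]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True

table = {}
for nt, prods in grammar.items():     # fill M[A, t] from FIRST/FOLLOW
    for p in prods:
        f = first_of_seq(p, first)
        terms = (f - {EPS}) | (follow[nt] if EPS in f else set())
        for t in terms:
            table.setdefault((nt, t), []).append(p)

for entry, prods in table.items():
    if len(prods) > 1:                # multiply defined entry: not LL(1)
        print('conflict at', entry, ':', prods)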
The class of LL(1) grammars is so easily parsed because it is strong. The strong LL(k) grammars are a subset of the LL(k) grammars that can be parsed without knowledge of the left-context of the parse. That is, each parsing decision is based only on the next k tokens of the input for the current nonterminal that is being expanded. Formally, a grammar G = (N, T, P, S) is strong if for any two distinct A-productions in the grammar:
A → α
A → β
FIRSTk(α FOLLOWk(A)) ∩ FIRSTk(β FOLLOWk(A)) = ∅
That looks complicated, so we'll see it another way. Let's take a textbook example to understand, instead, when a grammar is "weak" or when exactly we would need to know the left-context of the parse.
S → aAa
S → bAba
A → b
A → ϵ
Here, you'll notice that for an LL(2) instance, the lookahead ba could result from either of the S-productions. So the parser needs some left-context to decide whether ba is produced within S → aAa or S → bAba. Such a grammar is therefore "weak" as opposed to being a strong LL(k) grammar.
BOTTOM-UP PARSING:
A bottom-up parser builds a derivation by working from the input sentence back towards the start symbol S. The rightmost derivation in reverse order is done in bottom-up parsing.
S ⇒ r1 ⇒ r2 ⇒ … ⇒ rn = sentence
Assuming the production A → β, to reduce ri to ri-1 we match some RHS β against a substring of ri and then replace β with its corresponding LHS, A. In terms of the parse tree, this is working from the leaves to the root.
Example - 1:
Grammar:
S → if E then S else S
  | while E do S
  | print
E → true
  | id
Leftmost derivation of "if id then while true do print else S":
S ⇒lm if E then S else S
  ⇒lm if id then S else S
  ⇒lm if id then while E do S else S
  ⇒lm if id then while true do S else S
  ⇒lm if id then while true do print else S
HANDLE PRUNING:
Keep removing handles, replacing them with corresponding LHS of production, until we reach S.
Example:
E → E+E | E*E | (E) | id

Right-sentential form    Handle    Reducing production
a + b * c                a         E → id
E + b * c                b         E → id
E + E * c                c         E → id

The grammar is ambiguous, so there are actually two handles at the next-to-last step. We can use parser-generators that compute the handles for us.
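A sketch of handle pruning with the grammar above (my own simplification): where several handles exist, this version scans left to right and reduces the first RHS it finds.

productions = [('E', ['E', '+', 'E']), ('E', ['E', '*', 'E']),
               ('E', ['(', 'E', ')']), ('E', ['id'])]

def reduce_once(form):
    for i in range(len(form)):              # leftmost match wins
        for lhs, rhs in productions:
            if form[i:i + len(rhs)] == rhs:
                return form[:i] + [lhs] + form[i + len(rhs):], (lhs, rhs)
    return None, None

form = ['id', '+', 'id', '*', 'id']         # stands for a + b * c
while form != ['E']:
    form, (lhs, rhs) = reduce_once(form)
    print(' '.join(form), '  by', lhs, '->', ' '.join(rhs))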
LR PARSINGINTRODUCTION:
The "L" is for left-to-right scanning of the input and the "R" is for constructing
a rightmost derivation in reverse.
• The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
LR-PARSERS:
LR(k) parsers are the most general non-backtracking shift-reduce parsers. Two cases of interest are k=0 and k=1; LR(1) is of practical relevance.
LR(1) items play a key role in the LR(1) and LALR(1) table construction algorithms. LR parsers have more information available than LL parsers when choosing a production:
* LR knows everything derived from the RHS plus k lookahead symbols.
* LL just knows k lookahead symbols into what's derived from the RHS.
* The deterministic context-free languages are exactly the LR(1) languages.
LALR PARSING:
Example: consider the grammar
1: S → CC
2: C → cC
3: C → d
After merging item sets with common cores, the LALR item sets include:
I36 → C → c·C, c/d/$
      C → ·cC, c/d/$
      C → ·d, c/d/$
I5 → same as before
I47 → C → d·, c/d/$
I89 → C → cC·, c/d/$
The LALR parsing table (s = shift, r = reduce by production number):

State    c      d      $      S    C
0        s36    s47           1    2
1                      acc
2        s36    s47                5
36       s36    s47                89
47       r3     r3     r3
5                      r1
89       r2     r2     r2
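The table can be exercised with a standard table-driven LR loop; a sketch (state names follow the merged table above, the driver itself is my own):

ACTION = {(0, 'c'): ('s', 36), (0, 'd'): ('s', 47),
          (1, '$'): ('acc',),
          (2, 'c'): ('s', 36), (2, 'd'): ('s', 47),
          (36, 'c'): ('s', 36), (36, 'd'): ('s', 47),
          (47, 'c'): ('r', 3), (47, 'd'): ('r', 3), (47, '$'): ('r', 3),
          (5, '$'): ('r', 1),
          (89, 'c'): ('r', 2), (89, 'd'): ('r', 2), (89, '$'): ('r', 2)}
GOTO = {(0, 'S'): 1, (0, 'C'): 2, (2, 'C'): 5, (36, 'C'): 89}
PRODS = {1: ('S', 2), 2: ('C', 2), 3: ('C', 1)}   # lhs and RHS length

def parse(tokens):
    stack, toks = [0], list(tokens) + ['$']
    while True:
        act = ACTION.get((stack[-1], toks[0]))
        if act is None:
            return False                  # error entry
        if act[0] == 'acc':
            return True
        if act[0] == 's':                 # shift
            stack.append(act[1])
            toks.pop(0)
        else:                             # reduce by production act[1]
            lhs, n = PRODS[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(parse('cdd'))   # True:  cdd is in L(G)
print(parse('cd'))    # False: no entry for (2, '$')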
Ambiguous grammar:
A CFG is said to be ambiguous if there exists more than one derivation tree for some input string, i.e., more than one LeftMost Derivation Tree (LMDT) or RightMost Derivation Tree (RMDT).
Definition: a CFG G = (V, T, P, S) is said to be ambiguous if and only if there exists a string in T* that has more than one parse tree,
where V is a finite set of variables,
T is a finite set of terminals,
P is a finite set of productions of the form A → α, where A is a variable and α ∈ (V ∪ T)*, and S is a designated variable called the start symbol.
For example, E → E + E | E * E | id is ambiguous: the string id + id * id has two distinct parse trees.
Input File:
A YACC input file is divided into three parts.
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
Input File: Definition Part:
• The definition part includes information about the tokens used in the syntax definition:
• %token NUMBER
Semantics
Syntax Directed Translation:
• A formalism called a syntax directed definition is used for specifying translations for programming language constructs.
• A syntax directed definition is a generalization of a context free grammar in which each grammar symbol has an associated set of attributes and each production is associated with a set of semantic rules.
• An SDD is a generalization of a CFG in which each grammar production X → α has associated with it a set of semantic rules of the form
a := f(b1, b2, …, bk)
• This set of attributes for a grammar symbol is partitioned into two subsets called synthesized and inherited attributes of that grammar symbol.
Inherited attribute: (↑, →)
An inherited attribute is one whose value at a parse tree node is determined in terms of attributes at the parent and/or siblings of that node.
The process of computing the attribute values at the nodes is called annotating (or decorating) the parse tree. Terminals can have synthesized attributes, but not inherited attributes.
• Of course, the order of these computations depends on the dependency graph induced by the semantic rules.
Ex 1: Synthesized Attributes. Consider the CFG:
S → EN
E → E + T
E → E - T
E → T
T → T * F
T → T / F
T → F
F → (E)
F → digit
N → ;

The semantic rules:
S → EN        S.val = E.val
E → E1 + T    E.val = E1.val + T.val
E → E1 - T    E.val = E1.val - T.val
E → T         E.val = T.val
T → T1 * F    T.val = T1.val * F.val
T → T1 / F    T.val = T1.val / F.val
T → F         T.val = F.val
F → (E)       F.val = E.val
F → digit     F.val = digit.lexval
N → ;         (the semicolon is the terminating symbol)

For the non-terminals E, T and F the values can be obtained using the attribute "val". In S → EN, the symbol S is the start symbol; this rule is used to print the final answer of the expression.
Write the SDD using the appropriate semantic actions for the corresponding production rules of the given grammar. The annotated parse tree is generated and the attribute values are computed; the computation is done in a bottom-up manner, as sketched below.
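A sketch of this bottom-up evaluation of the synthesized attribute "val" (the tuple encoding of the parse tree is my own; a leaf carries the digit's lexval):

def val(node):
    if not isinstance(node, tuple):
        return node                          # F -> digit: F.val = digit.lexval
    op, left, right = node
    a, b = val(left), val(right)             # children first: bottom-up
    return {'+': a + b, '-': a - b,
            '*': a * b, '/': a / b}[op]

# annotated parse tree for the input 3 * 5 + 4 ;  (S -> E N prints E.val)
tree = ('+', ('*', 3, 5), 4)
print('S prints:', val(tree))                # 19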
L-attributed SDT
This form of SDT uses both synthesized and inherited attributes, with the restriction of not taking values from right siblings.
In L-attributed SDTs, a non-terminal can get values from its parent, children, and sibling nodes, as in the following production:
S → ABC
S can take values from A, B, and C (synthesized). A can take values from S only. B can take values from S and A. C can get values from S, A, and B. No non-terminal can get values from the sibling to its right.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right parsing manner.
Analysis + synthesis = translation (source code → intermediate code → target code).
In the analysis-synthesis model of a compiler, the front end translates a source program into an intermediate representation from which the back end generates target code. In many compilers the source code is translated into a language which is intermediate in complexity between a HLL and machine code. The usual intermediate code introduces symbols to stand for various temporary quantities.
We assume that the source program has already been parsed and statically checked. The various intermediate code forms are:
a) Postfix (Polish) notation
b) Abstract syntax trees (or) syntax trees
c) Quadruples
d) Triples (three-address code)
e) Indirect triples
f) Abstract machine code (or) pseudocode

Postfix notation:
In general, if e1 and e2 are any postfix expressions and θ is any binary operator, the result of applying θ to the values denoted by e1 and e2 is indicated in postfix notation by e1 e2 θ. No parentheses are needed in postfix notation, because the position and priority (number of arguments) of the operators permits only one way to decode a postfix expression.
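A sketch of decoding a postfix expression with a stack (my own minimal version): each operator consumes the two values on top, so no parentheses are ever needed.

def eval_postfix(tokens):
    stack = []
    for t in tokens:
        if t in '+-*/':
            b, a = stack.pop(), stack.pop()    # right operand is on top
            stack.append({'+': a + b, '-': a - b,
                          '*': a * b, '/': a / b}[t])
        else:
            stack.append(float(t))
    return stack.pop()

# (a + b) * c  in postfix is  a b + c *
print(eval_postfix('2 3 + 4 *'.split()))       # 20.0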
• A parse tree showing the values of attributes at each node is called an Annotated parsetree.
ASSIGNMENT STATEMENTS
Suppose that the context in which an assignment appears is given by the following grammar:
P → M D
M → ε
D → D ; D | id : T | proc id ; N D
N → ε
Nonterminal P becomes the new start symbol when these productions are added to those in the translation scheme shown below.
E → ( E1 )   { E.place := E1.place }
E → id       { p := lookup(id.name);
               if p ≠ nil then
                   E.place := p
               else error }
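A sketch of this scheme in action (the symbol table and newtemp counter are my own minimal stand-ins): E.place is either the symbol-table entry for an id or a fresh temporary holding a subexpression's value.

symtab = {'a': 'a', 'b': 'b', 'c': 'c'}
code, temps = [], iter(range(1, 100))

def newtemp():
    return f't{next(temps)}'

def gen_expr(e):
    if isinstance(e, str):                  # E -> id
        p = symtab.get(e)
        if p is None:
            raise NameError(e)              # "else error"
        return p                            # E.place := p
    op, l, r = e                            # E -> E1 op E2
    place = newtemp()
    code.append(f'{place} := {gen_expr(l)} {op} {gen_expr(r)}')
    return place

# d := (a - b) + (a - c)
code.append(f'd := {gen_expr(("+", ("-", "a", "b"), ("-", "a", "c")))}')
print('\n'.join(code))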
Flow-of-Control Statements
We now consider the translation of boolean expressions into three-address code in the context of if-then, if-then-else, and while-do statements, such as those generated by the following grammar:
S → if E then S1
  | if E then S1 else S2
  | while E do S1
In each of these productions, E is the Boolean expression to be translated. In the translation, we
assume that a three-address statement can be symbolically labeled, and that the function newlabel returns a new symbolic label each time it is called.
• E.true is the label to which control flows if E is true, and E.false is the label to which
control flows if E is false.
• The semantic rules for translating a flow-of-control statement S allow control to flow from
the translation S.code to the three-address instruction immediately following S.code.
• S.next is a label that is attached to the first three-address instruction to be executed after the code for S.
The figure below sketches the code layout for the while-do case (the if-then and if-then-else layouts are analogous, with S2.code placed at E.false):
S.begin: E.code        (jumps to E.true or E.false)
E.true:  S1.code
         goto S.begin
E.false: ...
(c) while-do
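A sketch of emitting this while-do layout (newlabel is as described above; the rest of the encoding is my own): E.code is simplified to a conditional jump to E.true and an unconditional jump to E.false.

labels = iter(range(1, 100))

def newlabel():
    return f'L{next(labels)}'

def gen_while(E_test, S1_code):
    begin, true, false = newlabel(), newlabel(), newlabel()
    code  = [f'{begin}:']
    code += [f'if {E_test} goto {true}', f'goto {false}']   # E.code
    code += [f'{true}:'] + S1_code + [f'goto {begin}']      # S1.code
    code += [f'{false}:']                                   # = S.next
    return code

print('\n'.join(gen_while('a < b', ['a := a + 1'])))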
For example,
S ab → ba
A → S
Here, the variables are S, A and the terminals are a, b.
For example,
S → AB
AB → abc
B → b
In Type 2,
1. First of all it should be Type 1.
2. The left-hand side of a production can have only one variable: |α| = 1.
For example,
S → AB
A → a
B → b
In Type 3, productions are of the form V → VT* / T* (left-linear) or V → T*V / T* (right-linear), for example:
S → ab
Type Checking:
• A compiler has to do semantic checks in addition to syntactic checks.
• Semantic checks:
• Static - done during compilation
• Dynamic - done during run-time
• Type checking is one of these static checking operations.
• We may not do all type checking at compile-time; some systems also use dynamic type checking.
• A type system is a collection of rules for assigning type expressions to the parts of a program.
• A type checker implements a type system.
• A sound type system eliminates run-time type checking for type errors.
• A programming language is strongly-typed if every program its compiler accepts will execute without type errors.
• Ex: for int x[100]; … x[i], most compilers cannot guarantee that i will be between 0 and 99.
Type Expression:
• The type of a language construct is denoted by a type expression, which may be:
• A basic type
• void: no type
• arrays: if T is a type expression, then array(I, T) is a type expression, where I denotes the index range. Ex: array(0..99, int)
• products: if T1 and T2 are type expressions, then their Cartesian product T1 x T2 is a type expression. Ex: int x int
• pointers: if T is a type expression, then pointer(T) is a type expression. Ex: pointer(int)
Typical type-checking rules end by signalling an error when no rule applies, for example:
S → id := E   { if id.type = E.type then S.type := void
                else S.type := type-error }
E → E1 mod E2 { if E1.type = integer and E2.type = integer
                then E.type := integer
                else E.type := type-error }
A function type is written with an arrow, e.g. f: double x char -> int. Structural equivalence of compound types is checked recursively: two products s1 x s2 and t1 x t2 are equivalent if sequiv(s1, t1) and sequiv(s2, t2).
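A sketch of structural equivalence over the type constructors above (the tuple encodings are my own):

def sequiv(s, t):
    if s == t:                                   # same basic type
        return True
    if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0]:
        if s[0] == 'x':                          # product s1 x s2
            return sequiv(s[1], t[1]) and sequiv(s[2], t[2])
        if s[0] == 'ptr':                        # pointer(T)
            return sequiv(s[1], t[1])
        if s[0] == 'fun':                        # s1 -> s2
            return sequiv(s[1], t[1]) and sequiv(s[2], t[2])
    return False

f = ('fun', ('x', 'double', 'char'), 'int')      # f: double x char -> int
print(sequiv(f, ('fun', ('x', 'double', 'char'), 'int')))    # True
print(sequiv(('ptr', 'int'), ('ptr', 'char')))               # False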
In some programming languages, we give a name to a type expression, and we use that name as a type expression afterwards. Ex:
var r, s : ↑cell
NARROWING THE SET OF POSSIBLE TYPES. The second step in resolving the overloading of operators and functions in the expression E' consists of determining whether a unique type can be assigned to each subexpression E of E', and generating the code for evaluating each subexpression E of E'. This is done by:
• assigning an inherited attribute E.unique to each subexpression E of E' such that either E can be assigned a unique type and E.unique is this type, or E cannot be assigned a unique type and E.unique is type_error;
• assigning a synthesized attribute E.code which is the target code for evaluating E, and executing the following translation scheme (where the attributes E.types have already been computed).
• The executing target program runs in its own logical address space in which each program value has a location.
• The management and organization of this logical address space is shared between the compiler, operating system and target machine. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.
The typical layout of the logical address space is:
Code
Static Data
Stack
free memory
Heap
Run-time storage comes in blocks, where a byte is the smallest unit of addressable memory; a block is identified by the address of its first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints of the target machine.
• A character array of length 10 needs only enough bytes to hold 10 characters, but a compiler may allocate 12 bytes to meet alignment constraints, leaving 2 bytes unused.
• This unused space due to alignment considerations is referred to as padding.
• The size of some program objects may be known at compile time; these may be placed in an area called static.
• The dynamic areas used to maximize the utilization of space at run time are the stack and the heap.
Activation records:
• Procedure calls and returns are usually managed by a run-time stack called the control stack.
• Each live activation has an activation record on the control stack, with the root of the activation tree at the bottom; the latest activation has its record at the top of the stack.
• The contents of the activation record vary with the language being implemented. The diagram below shows the contents of an activation record.
Storage-allocation strategies:
1. Static allocation - lays out storage for all data objects at compile time.
2. Stack allocation - manages the run-time storage as a stack.
3. Heap allocation - allocates and deallocates storage as needed at run time from a data area known as the heap.
STATIC ALLOCATION
In static allocation, names are bound to storage as the program is compiled, so there is no need for a run-time support package.
• Since the bindings do not change at run-time, every time a procedure is activated its names are bound to the same storage locations.
• Therefore values of local names are retained across activations of a procedure. That is, when control returns to a procedure, the values of the locals are the same as they were when control left the last time.
• From the type of a name, the compiler decides the amount of storage for the name and decides where the activation records go. At compile time, we can fill in the addresses at which the target code can find the data it operates on.
Machine independent optimizations:
Machine independent optimizations are program transformations that improve the target code without taking into consideration any properties of the target machine.
Machine dependent optimizations:
Machine dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
• Simply stated, the best program transformations are those that yield the most benefit for the least effort.
• The transformation must preserve the meaning of programs. That is, the optimization must not change the output produced by a program for a given input, or cause an error, such as a division by zero, that was not present in the original source program. At all times we take the "safe" approach of missing an opportunity to apply a transformation rather than risk changing what the program does.
• The transformation must be worth the effort: it does not make sense for a compiler writer to expend effort on a transformation that yields little or no improvement.
• Data flow analysis (DFA) is the process of ascertaining and collecting information prior to program execution about the possible modification, preservation, and use of certain entities (such as values or attributes of variables) in a computer program.
Function-Preserving Transformations
• There are a number of ways in which a compiler can improve a program without changing the function it computes.
• The transformations
o Common sub-expression elimination,
o Copy propagation,
o Dead-code elimination, and
o Constant folding
are common examples of such function-preserving transformations. The other transformations come up primarily when global optimizations are performed.
• Frequently, a program will include several calculations of the same value, such as an offset in an array. Some of the duplicate calculations cannot be avoided by the programmer because they lie below the level of detail accessible within the source language.
Common sub-expression elimination:
• An occurrence of an expression E is called a common sub-expression if E was previously computed, and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using common sub-expression elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common sub-expression t4 := 4*i is eliminated as its computation is already in t1, and the value of i has not changed between its definition and its use.
Copy propagation: given a copy statement such as x = Pi, later uses of x can be replaced by Pi:
x = Pi; …… A = x*r*r;
can be transformed to
A = Pi*r*r;
and the variable x is eliminated.
Dead-Code Elimination:
A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A related idea is dead or useless code: statements that compute values that never get used. While the programmer is unlikely to introduce any dead code intentionally, it may appear as the result of previous transformations. An optimization can be done by eliminating dead code.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the 'if' statement is dead code because this condition will never get satisfied.
Constant folding:
• We can eliminate both the test and the printing from the object code. More generally, deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding.
• One advantage of copy propagation is that it often turns the copy statement into dead code.
For example,
a = 3.14157/2 can be replaced by
a = 1.570785, thereby eliminating a division operation.
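A sketch of constant folding combined with copy propagation over a list of (dest, op, arg1, arg2) three-address statements (the representation is my own simplification, and it assumes each name is assigned only once):

def fold(block):
    consts, out = {}, []
    for dest, op, a1, a2 in block:
        a1 = consts.get(a1, a1)                  # propagate copies/constants
        a2 = consts.get(a2, a2)
        if op == ':=' and a2 is None:
            consts[dest] = a1                    # remember the copy
            out.append((dest, ':=', a1, None))
        elif isinstance(a1, float) and isinstance(a2, float):
            val = {'+': a1 + a2, '-': a1 - a2,
                   '*': a1 * a2, '/': a1 / a2}[op]
            consts[dest] = val                   # folded at compile time
            out.append((dest, ':=', val, None))
        else:
            out.append((dest, op, a1, a2))
    return out

block = [('a', ':=', 3.14157, None), ('b', '/', 'a', 2.0),
         ('c', '*', 'r', 'b')]
print(fold(block))
# b := 1.570785 is computed at compile time; c becomes r * 1.570785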
Loop Optimizations
• We now give a brief introduction to a very important place for optimizations, namely loops, especially the inner loops where programs tend to spend the bulk of their time. The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop.
• Three techniques are important for loop optimization:
Code motion, which moves code outside a loop;
Induction-variable elimination, which we apply to eliminate variables from the inner loop;
Reduction in strength, which replaces an expensive operation by a cheaper one, such as a multiplication by an addition.
Code Motion:
An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times a loop is executed (a loop-invariant computation) and places the expression before the loop. For example, the evaluation of limit-2 in while (i <= limit-2) is loop-invariant and can be hoisted: t = limit-2; while (i <= t).
Induction Variables
• Loops are usually processed inside out. For example, consider the loop around B3.
• Note that the values of j and t4 remain in lock-step; every time the value of j decreases by 1, that of t4 decreases by 4, because 4*j is assigned to t4. Such identifiers are called induction variables.
• When there are two or more induction variables in a loop, it may be possible to get rid of all but one by the process of induction-variable elimination. For the inner loop around B3 in Fig. we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4.
• However, we can illustrate reduction in strength and a part of the process of induction-variable elimination. Eventually j will be eliminated when the outer loop of B2-B5 is considered.
LOOPS IN FLOWGRAPH
• A graph representation of three-address statements, called a flow graph, is useful for
understanding code-generation algorithms, even if the graph is not explicitly constructed
by a code-generation algorithm. Nodes in the flow graph represent computations, and the
edges represent the flow of control.
Dominators:
In a flow graph, a node d dominates node n, if every path from initial node of the flow graph to n
goes through d. This will be denoted by d dom n. Every initial node dominates all the remaining
nodes in the flow graph and the entry of a loop dominates all nodes in the loop.
Similarly, every node dominates itself.
Example:
In the flow graph below:
• The initial node, node 1, dominates every node.
• Node 2 dominates itself.
• Node 3 dominates all but 1 and 2.
• Node 4 dominates all but 1, 2 and 3.
• Nodes 5 and 6 dominate only themselves, since the flow of control can skip around either by going through the other.
• Node 7 dominates 7, 8, 9 and 10.
• Node 8 dominates 8, 9 and 10.
D(1)={1}
D(2)={1,2}
D(3)={1,3}
D(4)={1,3,4}
D(5)={1,3,4,5}
D(6)={1,3,4,6}
D(7)={1,3,4,7}
D(8)={1,3,4,7,8}
D(9)={1,3,4,7,8,9}
D(10)={1,3,4,7,8,10}
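The D() sets can be computed iteratively; a sketch, assuming the edge set of the example flow graph (reconstructed here from the back edges and D() sets above):

succ = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}
nodes = set(succ)
pred = {n: {m for m in nodes if n in succ[m]} for n in nodes}

# D(n) starts at all nodes and shrinks:
# D(n) = {n} U intersection of D(p) over predecessors p; D(initial) = {initial}
D = {n: set(nodes) for n in nodes}
D[1] = {1}
changed = True
while changed:
    changed = False
    for n in nodes - {1}:
        new = {n} | set.intersection(*(D[p] for p in pred[n]))
        if new != D[n]:
            D[n], changed = new, True

for n in sorted(nodes):
    print(f'D({n}) =', sorted(D[n]))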
Natural Loop
One way to find all the loops in a flow graph is to search for edges in the flow graph whose heads dominate their tails. If a → b is an edge, b is the head and a is the tail. These types of edges are called back edges.
Example:
In the above graph, the back edges are:
7 → 4, since 4 DOM 7
10 → 7, since 7 DOM 10
4 → 3
8 → 3
9 → 1
LOOP:
• If we use the natural loops as "the loops", then we have the useful property that unless two loops have the same header, they are either disjoint or one is entirely contained in the other. Thus, neglecting loops with the same header for the moment, we have a natural notion of inner loop: one that contains no other loop.
• When two natural loops have the same header, but neither is nested within the other, they are combined and treated as a single loop.
Pre-Headers:
(Figure: a new block, the pre-header, is inserted in front of the header of loop L; the edges that formerly entered the header from outside L now enter the pre-header, whose only successor is the header.)
Definition:
• A flow graph G is reducible if and only if we can partition the edges into two disjoint groups, forward edges and back edges, with the following properties:
• The forward edges form an acyclic graph in which every node can be reached from the initial node of G.
• The back edges consist only of edges whose heads dominate their tails.
Example: the above flow graph is reducible.
• If we know the relation DOM for a flow graph, we can find and remove all the back edges.
• The remaining edges are forward edges.
• If the forward edges form an acyclic graph, then we can say the flow graph is reducible.
• In the above example, remove the five back edges 4→3, 7→4, 8→3, 9→1 and 10→7, whose heads dominate their tails; the remaining graph is acyclic.
• The key property of reducible flow graphs for loop analysis is that in such flow graphs every set of nodes that we would informally regard as a loop must contain a back edge.
PEEPHOLE OPTIMIZATION
This equation can be read as "the information at the end of a statement is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement."
• The details of how data-flow equations are set up and solved depend on three factors:
• The notions of generating and killing depend on the desired information, i.e., on the data-flow analysis problem to be solved. Moreover, for some problems, instead of proceeding along with the flow of control and defining out[S] in terms of in[S], we need to proceed backwards and define in[S] in terms of out[S].
• Since data flows along control paths, data-flow analysis is affected by the constructs in a program. In fact, when we write out[S] we implicitly assume that there is a unique end point where control leaves the statement; in general, equations are set up at the level of basic blocks rather than statements, because blocks do have unique end points.
• There are subtleties that go along with such statements as procedure calls, assignments through pointer variables, and even assignments to array variables.
Consider a single assignment statement S: d: a := b + c. Surely that assignment is a definition of a, say d. Thus
gen[S] = { d }
On the other hand, d "kills" all other definitions of a, so we write
kill[S] = Da - { d }      (Da is the set of all definitions of a)
out[S] = gen[S] ∪ (in[S] - kill[S])
For a sequence S → S1 ; S2, the sets compose as
gen[S] = gen[S2] ∪ (gen[S1] - kill[S2])
kill[S] = kill[S2] ∪ (kill[S1] - gen[S2])
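A sketch of solving out[B] = gen[B] ∪ (in[B] - kill[B]) iteratively at the basic-block level (the tiny flow graph and its gen/kill sets are my own illustration):

succ = {'B1': ['B2'], 'B2': ['B2', 'B3'], 'B3': []}
gen  = {'B1': {'d1', 'd2'}, 'B2': {'d3'}, 'B3': {'d4'}}
kill = {'B1': {'d3'}, 'B2': {'d1'}, 'B3': {'d2'}}

IN  = {B: set() for B in succ}
OUT = {B: set(gen[B]) for B in succ}
changed = True
while changed:
    changed = False
    for B in succ:
        # in[B] is the union of out[P] over the predecessors P of B
        IN[B] = set().union(*[OUT[P] for P in succ if B in succ[P]])
        new = gen[B] | (IN[B] - kill[B])
        if new != OUT[B]:
            OUT[B], changed = new, True

for B in succ:
    print(B, 'in:', sorted(IN[B]), 'out:', sorted(OUT[B]))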
While the details are dependent on the target language and the operating system, issues such as
memory management, instruction selection, register allocation, and evaluation order are inherent
in almost all code generation problems.
INPUT TO THE CODE GENERATOR
The input to the code generator consists of the intermediate representation of the source program
produced by the front end, together with information in the symbol table that is used to determine
the run time addresses of the data objects denoted by the names in the intermediate
representation.
There are several choices for the intermediate language, including: linear representations such as
postfix notation, three address representations such as quadruples, virtual machine representations
such as syntax trees and dags.
We assume that prior to code generation the front end has scanned, parsed, and translated the source program into a reasonably detailed intermediate representation, so the values of names appearing in the intermediate language can be represented by quantities that the target machine can directly manipulate (bits, integers, reals, pointers, etc.). We also assume that the necessary type checking has taken place, so type-conversion operators have been inserted wherever necessary and obvious semantic errors (e.g., attempting to index an array by a floating point number) have already been detected. The code generation phase can therefore proceed on the assumption that its input is free of errors.
TARGET PROGRAMS
The output of the code generator is the target program. The output may take on a variety of
forms: absolute machine language, relocatable machine language, or assembly language.
Producing an absolute machine language program as output has the advantage that it can be
placed in a location in memory and immediately executed. A small program can be
compiled and executed quickly. A number of “student-job” compilers, such as WATFIV and
PL/C, produce absolute code.
Because producing assembly code does not duplicate the entire task of the assembler, this choice is another reasonable alternative, especially for a machine with a small memory, where a compiler must make several passes.
A code-generation algorithm:
The algorithm takes as input a sequence of three-address statements constituting a basic block. For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. Prefer the register for y' if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is a current location of z (again preferring a register), and update the address descriptor of x to indicate that x is in L.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
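A much-simplified sketch of these actions (descriptor maintenance is reduced to what the straight-line example below needs, and getreg just reuses a register whose value is dead, else takes a free one):

registers = {'R0': None, 'R1': None}          # register descriptor
addr = {}                                     # address descriptor: name -> location

def getreg(y, live):
    for r, held in registers.items():
        if held == y and y not in live:       # y dies here: reuse its register
            return r
    for r, held in registers.items():
        if held is None:
            return r
    raise RuntimeError('would need to spill')

def gen(x, y, op, z, live):
    L = getreg(y, live)
    if addr.get(y) != L:                      # MOV y', L if y not already in L
        print(f'MOV {y}, {L}')
    print(f'{op} {addr.get(z, z)}, {L}')      # OP z', L
    registers[L] = x                          # L now holds x
    addr[x] = L
    for n in (y, z):                          # free registers of dead operands
        for r, held in registers.items():
            if held == n and n not in live:
                registers[r] = None

gen('t', 'a', 'SUB', 'b', live={'a'})         # t := a - b, a still needed
gen('u', 'a', 'SUB', 'c', live=set())         # u := a - c
gen('v', 't', 'ADD', 'u', live={'u'})         # v := t + u
gen('d', 'v', 'ADD', 'u', live=set())         # d := v + u
print('MOV', addr['d'] + ', d')               # d is live on exit: store it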
The assignment d := (a-b) + (a-c) + (a-c) might be translated into the following three-address code sequence:
t := a - b
u := a - c
v := t + u
d := v + u
with d live at the end.
Code sequence for the example is:

Statements   Code Generated   Register descriptor     Address descriptor
                              registers empty
t := a - b   MOV a, R0        R0 contains t           t in R0
             SUB b, R0
u := a - c   MOV a, R1        R0 contains t           t in R0
             SUB c, R1        R1 contains u           u in R1
v := t + u   ADD R1, R0       R0 contains v           u in R1
                              R1 contains u           v in R0
d := v + u   ADD R1, R0       R0 contains d           d in R0
             MOV R0, d                                d in R0 and memory
The table below shows the code sequences generated for the pointer assignments a := *p and *p := a (code for the indexed assignments a := b[i] and a[i] := b is analogous; the last column is the cost):

Statement    Code Generated    Cost
a := *p      MOV *Rp, a        2
*p := a      MOV a, *Rp        2
REGISTER ALLOCATION
Instructions involving register operands are usually shorter and faster than those involving operands in memory.
Therefore, efficient utilization of register is particularly important in generating good code. The use of registers is
often subdivided into two sub problems:
1. During register allocation, we select the set of variables that will reside in registers at
a point in theprogram.
2. During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
Finding an optimal assignment of registers to variables is difficult, even with single register
values. Mathematically, the problem is NP-complete. The problem is further complicated because
the hardware and/or the operating system of the target machine may require that certain register
usage conventions be observed.
Certain machines require register pairs (an even and next odd-numbered register) for some operands and results. For example, in the IBM System/370 machines integer multiplication and integer division involve register pairs. The multiplication instruction is of the form
M x, y
where x, the multiplicand, is the even register of an even/odd register pair and y represents the multiplier; the product occupies the entire pair. The division instruction is of the form
D x, y
where the 64-bit dividend occupies an even/odd register pair whose even register is x; y represents the divisor. After division, the even register holds the remainder and the odd register the quotient.
Now consider the two three-address code sequences (a) and (b), in which the only difference is the operator in the second statement. The shortest assembly sequences for (a) and (b) are given in (c). Ri stands for register i. L, ST and A stand for load, store and add respectively. The optimal choice for the register into which a is to be loaded depends on what will ultimately happen to t.

t := a + b        t := a + b
t := t * c        t := t + c
t := t / d        t := t / d

L R1, a           L R0, a
A R1, b           A R0, b
M R0, c           A R0, c
(a)               (b)
THE DAG REPRESENTATION FOR BASIC BLOCKS
A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
Leaves are labeled by unique identifiers, either variable names or constants.
Interior nodes are labeled by an operator symbol.
Nodes are also optionally given a sequence of identifiers for labels to store
the computed values.
DAGs are useful data structures for implementing transformations on basic blocks.
It gives a picture of how the value computed by a statement is used in
subsequent statements.
It provides a good way of determining common sub-expressions.
Input: A basic block
Output: A DAG for the basic block containing the following information:
A label for each node. For leaves, the label is an identifier. For interior nodes,
an operator symbol.
For each node a list of attached identifiers to hold the computed values.
Case (i) x : = y OP z
Case (ii) x : = OP y
Case (iii) x : = y
Method:
Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In case (i), if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
Step 2: For case (i), determine whether there is a node (OP) whose left child is node(y) and whose right child is node(z); if not, create such a node. For case (ii), determine whether there is a node (OP) with the single child node(y); if not, create such a node. For case (iii), node n will be node(y).
Step 3: Delete x from the list of identifiers for node(x). Append x to the list of attached identifiers for the node n found in Step 2, and set node(x) to n.
t1 := 4*i
t2 := a[t1]
t3 := 4*i
t4 := b[t3]
t5 := t2*t4
t6 := prod+t5
prod := t6
t7 := i+1
i := t7
if i<=20 goto (1)
(Final DAG, figure: the two 4*i computations share a single * node; prod and i label the + nodes built from prod+t5 and i+1; the <= node compares i with the leaf 20.)
The advantage of generating code for a basic block from its DAG representation is that from a DAG we can easily see how to rearrange the order of the final computation sequence, more easily than we can starting from a linear sequence of three-address statements or quadruples.
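The DAG construction itself can be sketched as follows for binary statements (the tuple encoding and helper names are my own); running it on the two 4*i statements above shows them sharing one node, which is how common sub-expressions are detected:

def build_dag(block):
    node = {}                 # identifier -> node id
    made = {}                 # (op, child ids) -> node id (reused if repeated)
    labels = {}               # node id -> attached identifiers
    nodes = []                # node id -> (op, children) or leaf name

    def leaf(y):
        if y not in node:     # Step 1: create a leaf if node(y) is undefined
            node[y] = len(nodes)
            nodes.append(y)
        return node[y]

    for dest, op, y, z in block:
        key = (op, leaf(y), leaf(z))
        if key not in made:   # Step 2: reuse an existing (OP, y, z) node
            made[key] = len(nodes)
            nodes.append(key)
        n = made[key]
        for ids in labels.values():
            ids.discard(dest) # Step 3: delete dest from its old node
        labels.setdefault(n, set()).add(dest)
        node[dest] = n
    return nodes, labels

block = [('t1', '*', '4', 'i'), ('t3', '*', '4', 'i'),
         ('t5', '*', 't1', 't3')]
nodes, labels = build_dag(block)
for n, ids in labels.items():
    print(n, nodes[n], sorted(ids))   # t1 and t3 are attached to one * node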