• Expressive: matches our notion of languages (and application?!)
• Redundant to help avoid programming errors
How to translate?
• Source code and machine code mismatch in
level of abstraction
• Goals of translation
– Good performance for the generated
code
– Good compile time performance
– Maintainable code
– High level of abstraction
The big picture
• Compiler is part of program
development environment
[Figure: the program development cycle. Machine code → Linker → resolved machine code → Loader → executable image → execution on the target machine, which normally ends up with an error. The executable is then run under the control of a debugger, and from the debugging results the programmer manually corrects the code.]
How to translate easily?
• Translate in steps. Each step handles a reasonably
simple, logical, and well defined task
The first few steps
• The first few steps can be understood by
analogies to how humans comprehend a
natural language
The next step
• Once the words are understood, the next
step is to understand the structure of the
sentence
[Figure: parse tree of an English sentence, rooted at "Sentence"]
Parsing
• Parsing a program is exactly the same
• Consider the statement
if x == y then z = 1 else z = 2
            if stmt
          /    |    \
        ==     =     =
       /  \   / \   / \
      x    y z   1 z   2
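As an illustration of how such a tree might be built, here is a minimal recursive-descent sketch in Python. The grammar in the comments and the function name `parse_if` are assumptions made for this example, not part of the slides, and the input is assumed to be separated by spaces.

```python
# Hypothetical grammar for the statement on the slide (an assumption):
#   stmt   -> "if" cond "then" assign "else" assign
#   cond   -> NAME "==" NAME
#   assign -> NAME "=" VALUE

def parse_if(source):
    """Parse 'if a == b then c = 1 else d = 2' into a nested-tuple tree."""
    tokens = source.split()          # tokens are assumed to be space separated
    pos = 0

    def expect(word=None):
        """Consume the next token, optionally checking that it equals `word`."""
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if word is not None and tok != word:
            raise SyntaxError(f"expected {word!r}, found {tok!r}")
        pos += 1
        return tok

    def cond():
        left = expect()
        expect("==")
        right = expect()
        return ("==", left, right)

    def assign():
        target = expect()
        expect("=")
        value = expect()
        return ("=", target, value)

    expect("if")
    test = cond()
    expect("then")
    then_branch = assign()
    expect("else")
    else_branch = assign()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("if stmt", test, then_branch, else_branch)


tree = parse_if("if x == y then z = 1 else z = 2")
print(tree)
```

Each grammar rule becomes one small function, so the shape of the code mirrors the shape of the tree it returns.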
Understanding the meaning
• Once the sentence structure is understood we try to
understand the meaning of the sentence (semantic analysis)
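• Example:
Prateek said Nitin left his assignment at home
• Whose assignment is it: Prateek's or Nitin's?
• If both names were Amit (Amit said Amit left his assignment at home): how many Amits are there? Which one left the assignment?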
Semantic Analysis
• Too hard for compilers. They do not have
capabilities similar to human understanding
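#include <iostream>

int main() {
    int Amit = 3;            // outer declaration
    {
        int Amit = 4;        // inner declaration shadows the outer one
        std::cout << Amit;   // prints 4: the innermost binding is used
    }
}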
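The binding rule in the fragment above can be sketched with a stack of scopes. This is only an illustration in Python; the class `ScopeStack` and its method names are hypothetical, and a real compiler stores far more than a value per name.

```python
class ScopeStack:
    """A stack of dictionaries, one per open block."""

    def __init__(self):
        self.scopes = [{}]            # the outermost (global) scope

    def enter(self):
        self.scopes.append({})        # a '{' opens a new scope

    def leave(self):
        self.scopes.pop()             # a '}' closes the innermost scope

    def declare(self, name, value):
        self.scopes[-1][name] = value

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


scopes = ScopeStack()
scopes.enter()
scopes.declare("Amit", 3)
scopes.enter()
scopes.declare("Amit", 4)
inner = scopes.lookup("Amit")   # resolved in the inner block
scopes.leave()
outer = scopes.lookup("Amit")   # resolved in the outer block
print(inner, outer)
```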
More on Semantic Analysis
• Compilers perform many other checks
besides variable bindings
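• Type checking
Amit left her work at home
– There is a type mismatch between "her" and "Amit" (assuming Amit is a man's name), so they cannot refer to the same person
[Figure: compiler structure; the front end is language specific]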
Front End Phases
• Lexical Analysis
– Recognize tokens and ignore white spaces,
comments
– Error reporting
– Model using regular expressions
– Recognize using Finite State Automata
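A minimal tokenizer sketch along these lines, written in Python. The token classes, the keyword set and the `//` comment syntax are assumptions chosen for this example; Python's `re` module stands in for a hand-built finite state automaton.

```python
import re

# Assumed token classes for a tiny language; order matters because the
# first alternative that matches wins (COMMENT must precede OP's "/").
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"==|[=+\-*/;()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if", "then", "else"}


def tokenize(text):
    """Return (kind, lexeme) pairs, ignoring white space and comments."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "ERROR":
            raise SyntaxError(f"illegal character {lexeme!r} at offset {match.start()}")
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("if (b == 0) a = b; // copy b into a"))
```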
Syntax Analysis
• Check syntax and construct abstract
syntax tree
          if
        /  |  \
      ==   =   ;
     /  \ / \
    b   0 a  b
Semantic Analysis
• Check semantics
• Error reporting
• Disambiguate
overloaded operators
• Type coercion
• Static checking
– Type checking
– Control flow checking
– Uniqueness checking
– Name checks
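A few of these checks can be sketched as a small static checker in Python. The tuple-based tree format, the type names and the int-to-float coercion rule are assumptions made for this illustration.

```python
NUMERIC = ("int", "float")


def type_of(node, env):
    """Return the static type of `node`; raise on a name or type error."""
    kind = node[0]
    if kind == "lit":                                  # ("lit", 3) or ("lit", 1.5)
        return "float" if isinstance(node[1], float) else "int"
    if kind == "var":                                  # name check
        if node[1] not in env:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return env[node[1]]
    if kind in ("+", "*", "=="):
        left, right = type_of(node[1], env), type_of(node[2], env)
        if left not in NUMERIC or right not in NUMERIC:
            raise TypeError(f"operator {kind!r} needs numeric operands")
        if kind == "==":
            return "boolean"
        # Type coercion: int is widened to float in mixed arithmetic.
        return "float" if "float" in (left, right) else "int"
    if kind == "=":                                    # ("=", name, expr)
        target = type_of(("var", node[1]), env)
        value = type_of(node[2], env)
        if target == value or (target == "float" and value == "int"):
            return target
        raise TypeError(f"cannot assign {value} to {target} variable {node[1]!r}")
    if kind == "if":                                   # ("if", cond, stmt)
        if type_of(node[1], env) != "boolean":
            raise TypeError("if condition must be boolean")
        type_of(node[2], env)
        return "void"
    raise ValueError(f"unknown node kind {kind!r}")


env = {"a": "int", "b": "int", "x": "float"}
stmt = ("if", ("==", ("var", "b"), ("lit", 0)), ("=", "a", ("var", "b")))
print(type_of(stmt, env))
```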
Code Optimization
• No strong counterpart in English, but it is similar to
editing/précis writing
• Example: x = 15 * 3 is transformed to x = 45
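This transformation is usually called constant folding. A minimal Python sketch, assuming expressions are nested tuples of the form `(op, left, right)` with numbers and variable names as leaves (a representation chosen for this example):

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Evaluate at compile time every subexpression whose operands are constants."""
    if not isinstance(node, tuple):
        return node                       # a number or a variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)       # both operands known: compute now
    return (op, left, right)


print(fold(("*", 15, 3)))                 # the right-hand side of x = 15 * 3
print(fold(("+", "y", ("*", 15, 3))))     # only the constant part is folded
```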
Example of Optimizations
PI = 3.14159                      3A+4M+1D+2E
Area = 4 * PI * R^2
Volume = (4/3) * PI * R^3
--------------------------------
X = 3.14159 * R * R               3A+5M
Area = 4 * X
Volume = 1.33 * X * R
--------------------------------
Area = 4 * 3.14159 * R * R        2A+4M+1D
Volume = ( Area / 3 ) * R
--------------------------------
Area = 12.56636 * R * R           2A+3M+1D
Volume = ( Area / 3 ) * R
--------------------------------
X = R * R                         3A+4M
Area = 12.56636 * X
Volume = 4.18879 * X * R

A : assignment    M : multiplication
D : division      E : exponent
Code Generation
• Usually a two step process
– Generate intermediate code from the
semantic representation of the program
– Generate machine code from the
intermediate code
Intermediate Code Generation
• Map identifiers to locations (memory/storage
allocation)
• Lay out parameter passing protocols:
locations for parameters and return
values, layout of activation frames,
etc.
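A rough Python sketch of two of these tasks: emitting three-address intermediate code for an expression tree, and mapping identifiers to storage locations. The tuple tree format, the temporary names `t1, t2, ...` and the 4-byte word size are assumptions for illustration only.

```python
import itertools


def gen_tac(node, code, temps):
    """Append three-address instructions for `node`; return the name holding its value."""
    if not isinstance(node, tuple):
        return str(node)                          # constant or identifier
    op, left, right = node
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"                      # fresh temporary
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp


def translate_assignment(target, expr):
    code = []
    result = gen_tac(expr, code, itertools.count(1))
    code.append(f"{target} = {result}")
    return code


def allocate(names, word_size=4):
    """Map each identifier to an offset, assuming one word per variable."""
    return {name: index * word_size for index, name in enumerate(names)}


code = translate_assignment("area", ("*", ("*", 4, "pi"), ("*", "r", "r")))
layout = allocate(["pi", "r", "area"])
print("\n".join(code))
print(layout)
```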
Post translation Optimizations
• Algebraic transformations and reordering
– Remove/simplify operations like
• Multiplication by 1
• Multiplication by 0
• Addition with 0
• Instruction selection
– Addressing mode selection
– Opcode selection
– Peephole optimization
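The three algebraic simplifications listed above can be sketched in Python as follows. The tuple representation is an assumption, and the rules are meant for integer expressions without side effects (replacing x * 0 by 0 is not always safe for floating point values).

```python
def simplify(node):
    """Apply x*1 -> x, x*0 -> 0 and x+0 -> x bottom-up on an (op, left, right) tree."""
    if not isinstance(node, tuple):
        return node                        # constant or variable name
    op, left, right = node
    left, right = simplify(left), simplify(right)
    if op == "*":
        if left == 0 or right == 0:
            return 0                       # multiplication by 0
        if left == 1:
            return right                   # multiplication by 1
        if right == 1:
            return left
    if op == "+":
        if left == 0:
            return right                   # addition with 0
        if right == 0:
            return left
    return (op, left, right)


print(simplify(("+", ("*", "x", 1), 0)))
print(simplify(("*", ("+", "y", 0), 0)))
```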
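[Figure: the type-annotated syntax tree for the if statement (== is boolean; =, a, b and 0 are int) passes through optimization and code generation to give the instructions below]
CMP Cx, 0
CMOVZ Dx, Cx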
Compiler structure
[Figure: phases of the compiler and the representation passed between them]
Source Program
→ Lexical Analysis → Token stream
→ Syntax Analysis → Abstract Syntax tree
→ Semantic Analysis → Unambiguous Program representation
→ IL code generator → IL code
→ Optimizer → Optimized code
→ Code generator → Target Program
• Information required about the program variables
during compilation
– Class of variable: keyword, identifier etc.
– Type of variable: integer, float, array, function etc.
– Amount of storage required
– Address in the memory
– Scope information
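A minimal Python sketch of a symbol table that records these attributes. The field names, the type sizes in `SIZES` and the use of nesting depth as the scope are all assumptions for illustration; keying on depth alone cannot tell sibling blocks apart, which a real compiler must handle.

```python
from dataclasses import dataclass

SIZES = {"integer": 4, "float": 8}       # assumed storage sizes in bytes


@dataclass
class Symbol:
    name: str
    kind: str        # class of the name: "identifier", "keyword", ...
    type: str        # "integer", "float", ...
    size: int        # amount of storage required
    address: int     # offset assigned in memory
    scope: int       # nesting depth of the declaring block


class SymbolTable:
    def __init__(self):
        self.entries = {}
        self.next_address = 0

    def declare(self, name, type_, scope=0, kind="identifier"):
        key = (name, scope)
        if key in self.entries:
            raise KeyError(f"{name!r} already declared in scope {scope}")
        size = SIZES[type_]
        symbol = Symbol(name, kind, type_, size, self.next_address, scope)
        self.next_address += size        # next free location
        self.entries[key] = symbol
        return symbol

    def lookup(self, name, scope):
        # Search the current nesting level first, then the enclosing ones.
        for level in range(scope, -1, -1):
            if (name, level) in self.entries:
                return self.entries[(name, level)]
        raise KeyError(f"{name!r} is not declared")


table = SymbolTable()
table.declare("count", "integer")
table.declare("ratio", "float", scope=1)
print(table.lookup("count", scope=1))
print(table.lookup("ratio", scope=1))
```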
Final Compiler structure
[Figure: the same phases as before, now with a Symbol Table attached to the compiler]
Source Program
→ Lexical Analysis → Token stream
→ Syntax Analysis → Abstract Syntax tree
→ Semantic Analysis → Unambiguous Program representation
→ IL code generator → IL code
→ Optimizer → Optimized code
→ Code generator → Target Program
Advantages of the model
• Also known as Analysis-Synthesis model of
compilation
– Front end phases are known as analysis phases
– Back end phases are known as synthesis phases
• Compiler is retargetable
CS304 COMPILER DESIGN
Teaching Scheme: 4(L) - 0(T) - 0(P)    Credits: 4
Total hours: 42

Module 1 (7 hours, 15% of sem. exam marks)
Introduction to compilers – Analysis of the source program, Phases of a compiler, Grouping of phases, compiler writing tools – bootstrapping.
Lexical Analysis: The role of Lexical Analyzer, Input Buffering, Specification of Tokens using Regular Expressions, Review of Finite Automata, Recognition of Tokens.

Module 2 (6 hours, 15% of sem. exam marks)
Syntax Analysis: Review of Context-Free Grammars – Derivation trees and Parse Trees, Ambiguity. Top-Down Parsing: Recursive Descent parsing, Predictive parsing, LL(1) Grammars.

FIRST INTERNAL EXAM

Module 3 (7 hours, 15% of sem. exam marks)
Bottom-Up Parsing: Shift Reduce parsing – Operator precedence parsing (Concepts only). LR parsing – Constructing SLR parsing tables, Constructing Canonical LR parsing tables and Constructing LALR parsing tables.

Module 4 (8 hours, 15% of sem. exam marks)
Syntax directed translation: Syntax directed definitions, Bottom-up evaluation of S-attributed definitions, L-attributed definitions, Top-down translation, Bottom-up evaluation of inherited attributes. Type Checking: Type systems, Specification of a simple type checker.

SECOND INTERNAL EXAM

Module 5 (7 hours, 20% of sem. exam marks)
Run-Time Environments: Source Language issues, Storage organization, Storage-allocation strategies. Intermediate Code Generation (ICG): Intermediate languages – Graphical representations, Three Address code, Quadruples, Triples. Assignment statements, Boolean expressions.

Module 6 (7 hours, 20% of sem. exam marks)
Code Optimization: Principal sources of optimization, Optimization of Basic blocks.
Code generation: Issues in the design of a code generator. The target machine, A simple code generator.

Text Books
1. A. V. Aho, R. Sethi and J. D. Ullman, Compilers – Principles, Techniques and Tools, Addison Wesley.
2. D. M. Dhamdhere, Systems Programming and Operating Systems, Tata McGraw Hill & Company.
Issues in Compiler Design
• Compilation appears to be very simple, but there
are many pitfalls
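[Figure: M front ends (F1, F2, F3, … FM) and N back ends (B1, B2, B3, … BN). Left: every front end is connected directly to every back end. Right: all front ends and all back ends are connected through a Universal IL.]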
How to reduce development and testing effort?
• DO NOT WRITE COMPILERS
• GENERATE compilers
[Figure: Source Language Specification + Target Machine Specification → Compiler Generator → Compiler]
Specifications and Compiler Generator
• How to write specifications of the source
language and the target machine?
– The language is broken into subcomponents such as lexemes,
structure, semantics, etc.
– Each component can be specified separately. For
example, an identifier may be specified as
• A string of characters that has at least one letter
• Starts with a letter followed by alphanumeric characters
• letter(letter|digit)*
– Similarly syntax and semantics can be described
• Can target machine be described using
specifications?
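The identifier specification letter(letter|digit)* translates almost directly into a regular expression. A small Python sketch, assuming letters are the ASCII letters and digits are 0 to 9 (the slides do not fix the alphabet):

```python
import re

LETTER = "[A-Za-z]"      # assumed alphabet of letters
DIGIT = "[0-9]"
IDENTIFIER = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")


def is_identifier(text):
    """True when the whole string matches letter(letter|digit)*."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ["count", "x1", "1x", ""]:
    print(repr(candidate), is_identifier(candidate))
```

A lexical analyzer generator takes specifications of this kind for every token class and produces the recognizer automatically.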
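Tool based Compiler Development
[Figure: each compiler phase is produced by a generator from its specification]
Source Program → Lexical Analyzer → Parser → Semantic Analyzer → Optimizer → IL code generator → Code generator → Target Program
– Lexeme specs → Lexical Analyzer Generator → Lexical Analyzer
– Parser specs → Parser Generator → Parser
– Phase Specifications → Other phase Generators → other phases
– Machine specifications → Code generator generator → Code generator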
How to Retarget Compilers?
• Changing specifications of a phase can lead to a
new compiler
– If machine specifications are changed then the
compiler can generate code for a different machine
without changing any other phase
– If front end specifications are changed then we can
get a compiler for a new language
Bootstrapping
• Compiler is a complex program and should not be
written in assembly language
Bootstrapping …
• A compiler can be characterized by three languages: the source
language (S), the target language (T), and the implementation
language (I)
• Write a cross compiler for a language L in
implementation language S to generate
code for machine N
• Existing compiler for S runs on a different
machine M and generates code for M
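• When compiler L_S N (source L, implemented in S, target N) is run
through S_M M we get compiler L_M N
[Figure: T-diagrams. L_S N compiled by S_M M yields L_M N. Example: an EQN to TROFF translator written in C, compiled by a C compiler that runs on the PDP11 and generates PDP11 code, yields an EQN to TROFF translator that runs on the PDP11.]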
Bootstrapping a Compiler
• Suppose L_L N is to be developed on a machine M
where L_M M is available
[Figure: T-diagrams for the two steps]
– Compiling L_L N with L_M M gives L_M N
– Compiling L_L N again with L_M N gives L_N N
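The T-diagram compositions used in cross compilation and bootstrapping can be checked mechanically. In this Python sketch a compiler is described only by its three languages; the names `Compiler` and `compile_with` are hypothetical, and the model ignores the question of which machine the compiling step physically runs on.

```python
from collections import namedtuple

# A compiler is characterized by its source, implementation and target languages.
Compiler = namedtuple("Compiler", "source implementation target")


def compile_with(program, compiler):
    """Run `program` (itself a compiler) through `compiler`."""
    if program.implementation != compiler.source:
        raise ValueError("the compiler cannot read the program's implementation language")
    # Source and target stay the same; only the implementation language changes.
    return program._replace(implementation=compiler.target)


# Cross compiler: L_S N run through S_M M gives L_M N.
lmn = compile_with(Compiler("L", "S", "N"), Compiler("S", "M", "M"))

# Bootstrapping: L_L N through L_M M gives L_M N, then L_L N through L_M N gives L_N N.
lln = Compiler("L", "L", "N")
step1 = compile_with(lln, Compiler("L", "M", "M"))
step2 = compile_with(lln, step1)
print(lmn, step1, step2, sep="\n")
```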
Compilers of the 21st Century
• The overall structure of almost all compilers is
similar to the structure we have discussed