
What are Compilers?

• Translates from one representation of the program to another
• Typically from high-level source code to low-level machine code or object code
• Source code is normally optimized for human readability
  – Expressive: matches our notion of languages (and application?!)
  – Redundant to help avoid programming errors
• Machine code is optimized for hardware
  – Redundancy is reduced
  – Information about the intent is lost
How to translate?

• Source code and machine code mismatch in level of abstraction
• Some languages are farther from machine code than others
• Goals of translation
  – Good performance for the generated code
  – Good compile-time performance
  – Maintainable code
  – High level of abstraction
• Correctness is a very important issue. Can compilers be proven correct? Very tedious!
• However, correctness has implications for the development cost
[Figure: High-level program → Compiler → Low-level code]
The big picture

• The compiler is part of a program development environment
• The other typical components of this environment are the editor, assembler, linker, loader, debugger, profiler etc.
• The compiler (and all other tools) must support each other for easy program development
[Figure: The programmer writes the Source Program in an Editor; the Compiler translates it to Assembly code; the Assembler produces Machine Code; the Linker resolves it into Resolved Machine Code; the Loader builds the Executable Image, which executes on the target machine under control of a Debugger. Debugging results go back to the programmer, who does manual correction of the code; the cycle normally ends up with errors to fix.]
How to translate easily?

• Translate in steps. Each step handles a reasonably simple, logical, and well-defined task
• Design a series of program representations
• Intermediate representations should be amenable to program manipulation of various kinds (type checking, optimization, code generation etc.)
• Representations become more machine-specific and less language-specific as the translation proceeds
The first few steps

• The first few steps can be understood by analogy to how humans comprehend a natural language
• The first step is recognizing/knowing the alphabet of a language. For example
  – English text consists of lower- and upper-case letters, digits, punctuation and white space
  – Written programs consist of characters from the ASCII character set (normally 9-13, 32-126)
• The next step in understanding the sentence is recognizing words (lexical analysis)
  – English words can be found in dictionaries
  – Programming languages have a dictionary (keywords etc.) and rules for constructing words (identifiers, numbers etc.)
Lexical Analysis

• Recognizing words is not completely trivial. For example:
  ist his ase nte nce?
• Therefore, we must know what the word separators are
• The language must define rules for breaking a sentence into a sequence of words
• Normally white space and punctuation are word separators in languages
• In programming languages a character from a different class may also be treated as a word separator
• The lexical analyzer breaks a sentence into a sequence of words or tokens:
  – If a == b then a = 1 ; else a = 2 ;
  – Sequence of words (total 14 tokens):
    if a == b then a = 1 ; else a = 2 ;
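The tokenization above can be sketched as a tiny lexer. This is a minimal illustration, not a production scanner: the token classes (KEYWORD, IDENT, NUMBER, OP, SEMI) and the keyword set are assumptions chosen for this example, not part of any particular language definition.

```python
import re

# Token classes and patterns; order matters, and == is tried before =
# so the longer operator wins (the longest-match rule).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP",     r"==|="),
    ("SEMI",   r";"),
    ("WORD",   r"[A-Za-z_]\w*"),
    ("SKIP",   r"\s+"),        # white space separates words, emits nothing
]
KEYWORDS = {"if", "then", "else"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "WORD":     # dictionary lookup: keyword or identifier
            kind = "KEYWORD" if lexeme in KEYWORDS else "IDENT"
        tokens.append((kind, lexeme))
    return tokens

toks = tokenize("if a == b then a = 1 ; else a = 2 ;")
print(len(toks))   # 14
```

Running it on the slide's sentence yields exactly the 14 tokens listed above.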
The next step

• Once the words are understood, the next step is to understand the structure of the sentence
• The process is known as syntax checking or parsing

  Sentence
  ├─ subject: "I" (pronoun)
  ├─ verb: "am going" (aux verb)
  └─ adverb-phrase: "to market" (adverb)
Parsing

• Parsing a program is exactly the same
• Consider an expression
  if x == y then z = 1 else z = 2

  if-stmt
  ├─ predicate: == (x, y)
  ├─ then-stmt: = (z, 1)
  └─ else-stmt: = (z, 2)
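A recursive-descent parser for this one statement form can be sketched in a few lines. The token kinds (IF, ID, EQEQ, ...) and the nested-tuple node shapes are assumptions made for illustration, not a fixed grammar from the slides.

```python
# Parses the single statement form: if ID == ID then ID = NUM else ID = NUM
def parse_if(tokens):
    pos = 0

    def eat(kind):
        nonlocal pos
        k, lexeme = tokens[pos]
        if k != kind:
            raise SyntaxError(f"expected {kind}, got {k} ({lexeme!r})")
        pos += 1
        return lexeme

    eat("IF")
    left = eat("ID"); eat("EQEQ"); right = eat("ID")
    eat("THEN")
    tvar = eat("ID"); eat("ASSIGN"); tval = eat("NUM")
    eat("ELSE")
    evar = eat("ID"); eat("ASSIGN"); eval_ = eat("NUM")
    # Assemble the same tree as in the slide's diagram
    return ("if", ("==", left, right), ("=", tvar, tval), ("=", evar, eval_))

toks = [("IF", "if"), ("ID", "x"), ("EQEQ", "=="), ("ID", "y"),
        ("THEN", "then"), ("ID", "z"), ("ASSIGN", "="), ("NUM", "1"),
        ("ELSE", "else"), ("ID", "z"), ("ASSIGN", "="), ("NUM", "2")]
tree = parse_if(toks)
print(tree)   # ('if', ('==', 'x', 'y'), ('=', 'z', '1'), ('=', 'z', '2'))
```

The returned tuple mirrors the tree drawn above: one if node with a predicate, a then-statement and an else-statement.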
Understanding the meaning

• Once the sentence structure is understood we try to understand the meaning of the sentence (semantic analysis)
• Example:
  Prateek said Nitin left his assignment at home
• What does his refer to? Prateek or Nitin?
• An even worse case:
  Amit said Amit left his assignment at home
• How many Amits are there? Which one left the assignment?
Semantic Analysis

• Too hard for compilers. They do not have capabilities similar to human understanding
• However, compilers do perform analysis to understand the meaning and catch inconsistencies
• Programming languages define strict rules to avoid such ambiguities

  { int Amit = 3;
    { int Amit = 4;
      cout << Amit;
    }
  }
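How a compiler resolves the two Amit declarations can be sketched with a stack of scopes. Modeling scopes as a stack of dictionaries is an assumption made for illustration; real compilers use richer symbol-table structures.

```python
class ScopeStack:
    def __init__(self):
        self.scopes = [{}]                   # global scope

    def enter(self):
        self.scopes.append({})               # a new { ... } block opens

    def leave(self):
        self.scopes.pop()                    # the block closes

    def declare(self, name, value):
        self.scopes[-1][name] = value        # bind in the current scope

    def lookup(self, name):
        for scope in reversed(self.scopes):  # search innermost scope first
            if name in scope:
                return scope[name]
        raise NameError(name)

# Mirrors the nested { int Amit = 3; { int Amit = 4; cout << Amit; } } example:
s = ScopeStack()
s.declare("Amit", 3)
s.enter()
s.declare("Amit", 4)
inner = s.lookup("Amit")   # the inner declaration shadows the outer one
s.leave()
outer = s.lookup("Amit")   # the outer binding is visible again
print(inner, outer)        # 4 3
```

The innermost-first search is exactly the rule that makes cout << Amit print 4 in the C++ fragment above.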
More on Semantic Analysis

• Compilers perform many other checks besides variable bindings
• Type checking
  Amit left her work at home
• There is a type mismatch between her and Amit. Presumably Amit is a male. And they are not the same person.
Compiler structure once again

  Source Program → [Lexical Analysis] → token stream → [Syntax Analysis] → abstract syntax tree → [Semantic Analysis] → unambiguous program representation → … → Target Program

  Front End (language specific)
Front End Phases

• Lexical Analysis
  – Recognize tokens and ignore white spaces, comments
  – Generates token stream
  – Error reporting
  – Model using regular expressions
  – Recognize using Finite State Automata
Syntax Analysis

• Check syntax and construct abstract syntax tree, e.g. for if (b == 0) a = b;

  if
  ├─ == (b, 0)
  ├─ = (a, b)
  └─ ;

• Error reporting and recovery
• Model using context-free grammars
• Recognize using Push-down automata / Table-driven parsers
Semantic Analysis

• Check semantics
• Error reporting
• Disambiguate overloaded operators
• Type coercion
• Static checking
  – Type checking
  – Control flow checking
  – Uniqueness checking
  – Name checks
Code Optimization

• No strong counterpart in English, but similar to editing/précis writing
• Automatically modify programs so that they
  – Run faster
  – Use fewer resources (memory, registers, space, fewer fetches etc.)
• Some common optimizations
  – Common sub-expression elimination
  – Copy propagation
  – Dead code elimination
  – Code motion
  – Strength reduction
  – Constant folding
• Example: x = 15 * 3 is transformed to x = 45
Example of Optimizations

  PI = 3.14159
  Area = 4 * PI * R^2
  Volume = (4/3) * PI * R^3        cost: 3A + 4M + 1D + 2E
  --------------------------------
  X = 3.14159 * R * R
  Area = 4 * X
  Volume = 1.33 * X * R            cost: 3A + 5M
  --------------------------------
  Area = 4 * 3.14159 * R * R
  Volume = (Area / 3) * R          cost: 2A + 4M + 1D
  --------------------------------
  Area = 12.56636 * R * R
  Volume = (Area / 3) * R          cost: 2A + 3M + 1D
  --------------------------------
  X = R * R
  Area = 12.56636 * X
  Volume = 4.18879 * X * R         cost: 3A + 4M

  A : assignment   M : multiplication   D : division   E : exponent
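The constant folding used above (4 * PI becoming 12.56636) can be sketched as a small pass over expression trees. The nested-tuple encoding ("op", left, right) is an assumption made for illustration.

```python
import operator

# Fold any subexpression whose operands are both compile-time constants.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def fold(node):
    if not isinstance(node, tuple):
        return node                        # leaf: constant or identifier
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)        # both operands known: fold now
    return (op, left, right)               # otherwise keep the operation

folded = fold(("*", 15, 3))                # the x = 15 * 3 example
partly = fold(("*", ("*", 4, 3.14159), ("*", "R", "R")))
print(folded)   # 45
print(partly)   # 4 * PI folds to a constant; R * R stays symbolic
```

Note how folding works bottom-up: constant subtrees collapse to values, while anything involving the unknown R is left for run time.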
Code Generation

• Usually a two-step process
  – Generate intermediate code from the semantic representation of the program
  – Generate machine code from the intermediate code
• The advantage is that each phase is simple
• Requires design of an intermediate language
• Most compilers perform translation between successive intermediate representations
• Intermediate languages are generally ordered in decreasing level of abstraction, from highest (source) to lowest (machine)
• However, typically the one after intermediate code generation is the most important
Intermediate Code Generation

• Abstraction at the source level:
  identifiers, operators, expressions, statements, conditionals, iteration, functions (user defined, system defined or libraries)
• Abstraction at the target level:
  memory locations, registers, stack, opcodes, addressing modes, system libraries, interface to the operating system
• Code generation is a mapping from source-level abstractions to target machine abstractions
Intermediate Code Generation …

• Map identifiers to locations (memory/storage allocation)
• Explicate variable accesses (change identifier references to relocatable/absolute addresses)
• Map source operators to an opcode or a sequence of opcodes
• Convert conditionals and iterations to test/jump or compare instructions
Intermediate Code Generation …

• Lay out parameter passing protocols: locations for parameters, return values, layout of activation frames etc.
• Interface calls to libraries, the runtime system, and the operating system
Post-translation Optimizations

• Algebraic transformations and re-ordering
  – Remove/simplify operations like
    • Multiplication by 1
    • Multiplication by 0
    • Addition with 0
  – Reorder instructions based on
    • Commutative properties of operators
    • For example x+y is the same as y+x (always?)
• Instruction selection
  – Addressing mode selection
  – Opcode selection
  – Peephole optimization
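The algebraic simplifications listed above can be sketched as one rewrite pass over expression nodes. The ("op", left, right) encoding is an assumption carried over for illustration; note the caveat in the comments about side effects, which is the point of the "(always?)" above.

```python
def simplify(node):
    if not isinstance(node, tuple):
        return node
    op = node[0]
    left, right = simplify(node[1]), simplify(node[2])
    if op == "*" and (left == 1 or right == 1):
        return right if left == 1 else left    # multiplication by 1
    if op == "*" and (left == 0 or right == 0):
        return 0   # multiplication by 0; only safe if the dropped
                   # operand has no side effects
    if op == "+" and (left == 0 or right == 0):
        return right if left == 0 else left    # addition with 0
    return (op, left, right)

simplified = simplify(("+", ("*", "x", 1), 0))
print(simplified)   # the whole expression collapses to just x
zeroed = simplify(("*", "y", 0))
print(zeroed)       # 0
```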
[Figure: The type-annotated syntax tree for if (b == 0) a = b;, with boolean and int annotations on its nodes, passes through intermediate code generation and optimization into code generation, which emits:
  CMP Cx, 0
  CMOVZ Dx, Cx ]
Compiler structure

  Source Program → [Lexical Analysis] → token stream → [Syntax Analysis] → abstract syntax tree → [Semantic Analysis] → unambiguous program representation → [IL code generator] → IL code → [Optimizer] → optimized code → [Code generator] → Target Program

  Front End (language specific) | Optional Phase | Back End (machine specific)
• Information required about program variables during compilation
  – Class of variable: keyword, identifier etc.
  – Type of variable: integer, float, array, function etc.
  – Amount of storage required
  – Address in the memory
  – Scope information
• Location to store this information
  – Attributes with the variable (has obvious problems)
  – At a central repository, with every phase referring to the repository whenever information is required
• Normally the second approach is preferred
  – Use a data structure called a symbol table
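A central-repository symbol table holding the attributes listed above can be sketched as a dictionary keyed by name. The field names and the sample entry (its address, size and scope) are assumptions made for illustration.

```python
symbol_table = {}

def declare(name, cls, typ, size, address, scope):
    # One entry per name, holding the attributes every phase may need.
    symbol_table[name] = {"class": cls, "type": typ, "size": size,
                          "address": address, "scope": scope}

# Any phase can then consult the repository by name:
declare("rate", "identifier", "float", 4, 0x1008, "global")
print(symbol_table["rate"]["type"])    # float
```

The lexer would typically create the entry, while later phases (type checking, storage allocation, code generation) fill in and read its fields.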
Final Compiler structure

  Symbol Table (shared by all phases)

  Source Program → [Lexical Analysis] → token stream → [Syntax Analysis] → abstract syntax tree → [Semantic Analysis] → unambiguous program representation → [IL code generator] → IL code → [Optimizer] → optimized code → [Code generator] → Target Program

  Front End (language specific) | Optional Phase | Back End (machine specific)
Translation of an assignment statement

  position = initial + rate * 60
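The statement above can be lowered into three-address code with a short sketch. The ("op", left, right) tree shape and the t0, t1, … temporary names are assumptions made for illustration.

```python
from itertools import count

def gen_tac(node, code, counter):
    """Lower an expression tree into three-address instructions,
    returning the name holding the result."""
    if not isinstance(node, tuple):
        return str(node)                   # leaf: identifier or constant
    op, left, right = node
    l = gen_tac(left, code, counter)
    r = gen_tac(right, code, counter)
    t = f"t{next(counter)}"                # fresh temporary for the result
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
result = gen_tac(("+", "initial", ("*", "rate", 60)), code, count())
code.append(f"position = {result}")
print("\n".join(code))
# t0 = rate * 60
# t1 = initial + t0
# position = t1
```

Each instruction has at most one operator on the right-hand side, which is what makes this form easy to optimize and to map onto machine instructions.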
Advantages of the model

• Also known as the Analysis-Synthesis model of compilation
  – Front end phases are known as analysis phases
  – Back end phases are known as synthesis phases
• Each phase has well-defined work
• Each phase handles a logical activity in the process of compilation
Advantages of the model …

• The compiler is retargetable
• Source- and machine-independent code optimization is possible
• The optimization phase can be inserted after the front and back end phases have been developed and deployed
CS304 COMPILER DESIGN
Teaching Scheme: 4(L) - 0(T) - 0(P)   Credits: 4

Module 1 (7 hours, 15% of Sem. Exam Marks)
  Introduction to compilers: Analysis of the source program, Phases of a compiler, Grouping of phases, compiler writing tools, bootstrapping.
  Lexical Analysis: The role of the Lexical Analyzer, Input Buffering, Specification of Tokens using Regular Expressions, Review of Finite Automata, Recognition of Tokens.

Module 2 (6 hours, 15%)
  Syntax Analysis: Review of Context-Free Grammars, Derivation trees and Parse Trees, Ambiguity. Top-Down Parsing: Recursive Descent parsing, Predictive parsing, LL(1) Grammars.

FIRST INTERNAL EXAM

Module 3 (7 hours, 15%)
  Bottom-Up Parsing: Shift Reduce parsing, Operator precedence parsing (Concepts only), LR parsing: Constructing SLR parsing tables, Constructing Canonical LR parsing tables and Constructing LALR parsing tables.

Module 4 (8 hours, 15%)
  Syntax directed translation: Syntax directed definitions, Bottom-up evaluation of S-attributed definitions, L-attributed definitions, Top-down translation, Bottom-up evaluation of inherited attributes. Type Checking: Type systems, Specification of a simple type checker.

SECOND INTERNAL EXAM

Module 5 (7 hours, 20%)
  Run-Time Environments: Source Language issues, Storage organization, Storage-allocation strategies. Intermediate Code Generation (ICG): Intermediate languages, Graphical representations, Three Address code, Quadruples, Triples. Assignment statements, Boolean expressions.

Module 6 (7 hours, 20%)
  Code Optimization: Principal sources of optimization, Optimization of Basic blocks.
  Code generation: Issues in the design of a code generator. The target machine, A simple code generator.

Total: 42 hours

Text Books
1. A. Aho, Ravi Sethi and J. D. Ullman, Compilers: Principles, Techniques and Tools, Addison Wesley.
2. D. M. Dhamdhere, System Programming and Operating Systems, Tata McGraw Hill & Company.
CO # | Course Outcome | Cognitive Process
1 | Explain the concepts and different phases of compilation with compile-time error handling. | Understand
2 | Represent language tokens using regular expressions, context-free grammar and finite automata, and design a lexical analyzer for a language. | Understand
3 | Compare top-down with bottom-up parsers, and develop an appropriate parser to produce a parse tree representation of the input. | Analyze
4 | Generate intermediate code for statements in a high-level language. | Apply
5 | Design syntax-directed translation schemes for a given context-free grammar. | Apply
6 | Apply optimization techniques to intermediate code and generate machine code for a high-level language program. | Apply
Issues in Compiler Design

• Compilation appears to be very simple, but there are many pitfalls
• How are erroneous programs handled?
• The design of programming languages has a big impact on the complexity of the compiler
• The M*N vs. M+N problem
  – Compilers are required for all the languages and all the machines
  – For M languages and N machines we need to develop M*N compilers
  – However, there is a lot of repetition of work because of similar activities in the front ends and back ends
  – Can we design only M front ends and N back ends, and somehow link them to get all M*N compilers?
M*N vs M+N Problem

[Figure: On the left, each front end F1…FM is connected directly to each back end B1…BN, requiring M*N compilers. On the right, all front ends F1…FM emit a Universal Intermediate Language, which all back ends B1…BN consume, requiring only M front ends and N back ends.]
How to reduce development and testing effort?

• DO NOT WRITE COMPILERS
• GENERATE compilers
• A compiler generator should be able to "generate" a compiler from the source language and target machine specifications

  Source Language Specification ─┐
                                 ├→ Compiler Generator → Compiler
  Target Machine Specification ──┘
Specifications and Compiler Generator

• How to write specifications of the source language and the target machine?
  – The language is broken into sub-components like lexemes, structure, semantics etc.
  – Each component can be specified separately. For example, an identifier may be specified as
    • A string of characters that has at least one letter
    • Starts with a letter, followed by alphanumerics
    • letter(letter|digit)*
  – Similarly syntax and semantics can be described
• Can the target machine be described using specifications?
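The identifier specification letter(letter|digit)* maps directly onto a regular expression, the same form a lexer generator would compile into a finite automaton. A small sketch:

```python
import re

# letter(letter|digit)* : a leading letter, then any mix of letters/digits.
ident = re.compile(r"[A-Za-z][A-Za-z0-9]*")

accepts = bool(ident.fullmatch("x25"))     # starts with a letter: accepted
rejects = bool(ident.fullmatch("25x"))     # starts with a digit: rejected
print(accepts, rejects)   # True False
```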
Tool-based Compiler Development

[Figure: Lexeme specs drive a lexical analyzer generator, parser specs drive a parser generator, phase specifications drive generators for the other phases, and machine specifications drive a code generator generator. The generated Lexical Analyzer, Parser, Semantic Analyzer, IL code generator, Optimizer and Code generator together translate the Source Program into the Target Program.]
How to Retarget Compilers?

• Changing the specifications of a phase can lead to a new compiler
  – If machine specifications are changed, the compiler can generate code for a different machine without changing any other phase
  – If front end specifications are changed, we get a compiler for a new language
• Tool-based compiler development cuts down development/maintenance time by almost 30-40%
• Tool development/testing is a one-time effort
• Compiler performance can be improved by improving a tool and/or the specification for a particular phase
Bootstrapping

• A compiler is a complex program and should not be written in assembly language
• How to write a compiler for a language in the same language (the first time!)?
• The first time this experiment was done was for Lisp
• Initially, Lisp was used as a notation for writing functions
• Functions were then hand-translated into assembly language and executed
• McCarthy wrote a function eval[e,a] in Lisp that took a Lisp expression e as an argument
• The function was later hand-translated and it became an interpreter for Lisp
Bootstrapping …

• A compiler can be characterized by three languages: the source language (S), the target language (T), and the implementation language (I)
• The three languages S, I, and T can be quite different. Such a compiler is called a cross-compiler
• This is represented by a T-diagram:

    S ──── T
       I

• In textual form this can be represented as S_I^T: a compiler from S to T, written in I
• Write a cross compiler for a language L in implementation language S to generate code for machine N
• An existing compiler for S runs on a different machine M and generates code for M
• When the compiler L_S^N is run through S_M^M, we get the compiler L_M^N

[T-diagrams: L_S^N composed with S_M^M yields L_M^N. The historical example: EQN, implemented in C, run through a C compiler on the PDP-11, gives an EQN-to-TROFF translator running on the PDP-11.]
Bootstrapping a Compiler

• Suppose L_L^N is to be developed on a machine M where L_M^M is available
  – Running L_L^N through L_M^M gives L_M^N
• Compile L_L^N a second time using the generated compiler
  – Running L_L^N through L_M^N gives L_N^N
Bootstrapping a Compiler: the Complete Picture

[T-diagrams: L_L^N run through L_M^M produces L_M^N; L_L^N run through L_M^N produces L_N^N.]
Compilers of the 21st Century

• The overall structure of almost all compilers is similar to the structure we have discussed
• The proportions of the effort have changed since the early days of compilation
• Earlier, the front end phases were the most complex and expensive parts
• Today, the back end phases and optimization dominate all other phases; the front end phases are typically a small fraction of the total time
