Introduction1
1. Introduction
Why Compilers?
• Compiler
– A program that translates from one language to another
– It must preserve the semantics of the source
– It should produce an efficient program in the target
language
• In the beginning, there was machine language
– Ugly: hard to write and hard to debug
– Then came textual assembly
– High-level languages – Fortran, Pascal, C, C++
– Machine structures became too complex and software
management too difficult to continue with low-level
languages
Compiler learning
• Isn’t it an old discipline?
– Yes, it is a well-established discipline
– Algorithms, methods and techniques were researched and
developed in the early stages of computer science
– There are many compilers around and many tools to
generate them automatically
• So why do we need to learn it?
– You may never write a full compiler
– But the techniques we learn are useful in many tasks, such as
writing an interpreter for a scripting language or
validating form input
Why Study Compilers? (1)
• Become a better programmer(!)
– Insight into interaction between languages,
compilers, and hardware
– Understanding of implementation techniques
– Better intuition about what your code does
Why Study Compilers? (2)
• Compiler techniques are everywhere
– Parsing (little languages, interpreters, XML)
– Database engines, query languages
– AI: domain-specific languages
– Text processing
• TeX/LaTeX -> DVI -> PostScript -> PDF
– Hardware: VHDL; model-checking tools
– Mathematics (Mathematica, Matlab)
Why Study Compilers? (3)
• Fascinating blend of theory and engineering
– Direct applications of theory to practice
• Parsing, scanning, static analysis
– Some very difficult problems (NP-hard or
worse)
• Resource allocation, “optimization”, etc.
• Need to come up with good-enough
approximations/heuristics
Why Study Compilers? (4)
• Ideas from many parts of CS
– AI: Greedy algorithms, heuristic search
– Algorithms: graph algorithms, dynamic programming,
approximation algorithms
– Theory: Grammars, DFAs and PDAs, pattern matching,
fixed-point algorithms
– Systems: Allocation & naming, synchronization,
locality
– Architecture: pipelines, instruction set use, memory
hierarchy management
Why Study Compilers? (5)
• You might even write a compiler some day!
– You’ll almost certainly write parsers and
interpreters in some context if you haven’t
already
Course scope
• Aim:
– To learn techniques of a modern compiler
• Books:
– Des Watson, A Practical Approach to Compiler
Construction, Springer, 2017.
– Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D.
Ullman, Compilers: Principles, Techniques, and Tools,
Second Edition
Prerequisites
• Problem Solving & Programming (C++,
Python)
• Data Structures
• Algorithm Design
• Theory of Automata
• Computer Architecture
• Assembly Programming
• Operating Systems
Topics
• High Level Languages
• Lexical analysis (Scanning)
• Syntax Analysis (Parsing)
• Syntax Directed Translation
• Intermediate Code Generation
• Run-time environments
• Code Generation
• Machine Independent Optimization
Grading policy
• Midterm (25 Marks)
• Final exam (50 Marks)
• Assignments (20 Marks)
• Quizzes (5 Marks)
Terminology
• Compiler:
– a program that translates an executable program in one
language into an executable program in another
language
– we expect the program produced by the compiler to be
better, in some way, than the original
• Interpreter:
– a program that reads an executable program and
produces the results of running that program
– usually, this involves executing the source program in
some fashion
• Our course is mainly about compilers but many of
the same issues arise in interpreters
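The difference between the two definitions above can be made concrete with a minimal sketch. Everything here is an assumption made for illustration: the tuple-based expression format, the stack-machine opcodes (`PUSH`, `ADD`, `MUL`) and the function names are hypothetical, not part of the course material. The interpreter produces the result directly; the compiler produces another program, which a separate machine then runs.

```python
# Hypothetical mini-language: an expression is either an int or a
# nested tuple such as ("+", 2, ("*", 3, 4)).

def interpret(expr):
    """Interpreter: walk the tree and produce the result directly."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b

def compile_expr(expr):
    """Compiler: translate the tree into code for a stack machine."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    opcode = "ADD" if op == "+" else "MUL"
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]

def run(code):
    """Stand-in for the target machine: executes compiled instructions."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if opcode == "ADD" else a * b)
    return stack.pop()

program = ("+", 2, ("*", 3, 4))
print(interpret(program))          # 14
print(compile_expr(program))       # the translated program
print(run(compile_expr(program)))  # 14
```

Both paths give the same answer, which is the "preserve semantics" requirement in miniature. Both also start by analysing the same input, which is why many compiler issues arise in interpreters too.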
Common Issues
• Compilers and interpreters both must read
the input (a stream of characters) and
“understand” it: this is analysis
• Object orientation
• Support for asynchronous processes and
parallelism
• Software portability
– Machine independence
– Java
Advantages of High-Level Languages (Cont..)
[Figure: Source code → Compiler → Machine code; the compiler also reports errors]
• Recognizes legal (and illegal) programs
• Generates correct code
• Manages storage of all variables and code
• Agrees on a format for object (or
assembly) code
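The first task in the list, recognizing legal and illegal programs, can be observed with a small sketch that uses Python's own built-in `compile()` as a stand-in for the compiler box in the figure. The helper name `is_legal` is hypothetical, and the exact wording of the error message depends on the Python version.

```python
def is_legal(source):
    """Return (True, None) for a syntactically legal program,
    otherwise (False, a short error message)."""
    try:
        # Built-in compile() parses and compiles the text but does not run it.
        compile(source, "<example>", "exec")
        return True, None
    except SyntaxError as err:
        return False, f"line {err.lineno}: {err.msg}"

print(is_legal("x = x + y"))   # legal, even though x and y are undefined
print(is_legal("x = = y"))     # illegal: rejected before anything runs
```

Note that the first program is accepted although `x` and `y` have no values: legality here is decided from the program text alone, before execution.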
Structure of a Compiler
• First approximation
– Front end: analysis
• Read source program and understand its structure
and meaning
– Back end: synthesis
• Generate equivalent target language program
Front end
[Figure: Source code → Scanner → tokens → Parser → IR; both stages report errors]
• Scanner:
– Maps characters into tokens – the basic unit of syntax
• x = x + y becomes <id, x> = <id, x> + <id, y>
– Typical tokens: number, id, +, -, *, /, do, end
– Eliminates white space (tabs, blanks) and comments
• A key issue is speed, so instead of using a tool like
LEX it is sometimes necessary to write your own
scanner
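A hand-written scanner for the example above might look like the following minimal sketch. The token classes, the keyword set and the name `scan` are assumptions chosen to mirror the slide; comment handling is left out to keep it short.

```python
import re

# Assumed token classes for a toy language; the names are illustrative.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[+\-*/=]"),
    ("skip",   r"\s+"),   # white space is discarded
    ("error",  r"."),     # any other character is illegal
]
KEYWORDS = {"do", "end"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Map a stream of characters to a list of (kind, lexeme) tokens."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "error":
            raise SyntaxError(f"illegal character {lexeme!r} at position {m.start()}")
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme  # each keyword is its own token class
        tokens.append((kind, lexeme))
    return tokens

print(scan("x = x + y"))
# [('id', 'x'), ('op', '='), ('id', 'x'), ('op', '+'), ('id', 'y')]
```

This version leans on the `re` module for brevity. A scanner written for speed would more likely be a hand-coded loop over characters.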
Front end
• Parser:
– Recognize context-free syntax
– Guide context-sensitive analysis
– Construct IR
– Produce meaningful error messages
– Attempt error correction
• There are parser generators like YACC which
automate much of the work
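The parser's duties listed above (recognize context-free syntax, construct IR, produce error messages) can be sketched by hand as a small recursive-descent parser. The grammar, the nested-tuple IR and the function names are assumptions for illustration; a YACC-generated parser would use a table-driven technique instead, and the one-line scanner here silently skips illegal characters.

```python
import re

def parse(text):
    """Parse an arithmetic expression and return a nested-tuple IR."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)  # simplistic scanner
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():      # expr -> term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():      # term -> factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():    # factor -> number | id | "(" expr ")"
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("x + y * 2"))
# ('+', ('id', 'x'), ('*', ('id', 'y'), ('num', 2)))
```

Each grammar rule becomes one function, so operator precedence falls out of the rule structure: `*` binds tighter than `+` because `term` is called from inside `expr`.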
Front end
• Context-free grammars are used to describe
programming language syntax:
[Figure: Chomsky hierarchy of grammar classes: Type 0, Type 1, Type 2, Type 3]
Chomsky Hierarchy
• Chomsky type 0
– a free grammar or an unrestricted grammar
– α → β
• Chomsky type 1
– a context-sensitive grammar
– αAβ → αγβ
Chomsky Hierarchy
• Chomsky type 2
– a context-free grammar
– A → γ
– where A is a single non-terminal symbol.
– These productions correspond directly to BNF
rules.
• Chomsky type 3
– a regular grammar or a finite-state grammar
– A → a or A → aB
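The practical difference between type 3 and type 2 can be seen in how much memory a recognizer needs. The sketch below uses two example grammars chosen for illustration (they are assumptions, not taken from the slides): the regular grammar A → aA | b, and the context-free grammar S → (S)S | ε for balanced parentheses.

```python
def accepts_regular(s):
    """Type 3 example: grammar A -> aA | b, i.e. the language a*b.
    A fixed set of states is enough; no extra memory is needed."""
    state = "A"
    for ch in s:
        if state == "A" and ch == "a":
            state = "A"       # production A -> aA
        elif state == "A" and ch == "b":
            state = "done"    # production A -> b
        else:
            return False
    return state == "done"

def accepts_balanced(s):
    """Type 2 example: grammar S -> (S)S | empty (balanced parentheses).
    Nesting depth is unbounded, so a counter (a degenerate stack) is needed."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:     # a ")" with no matching "("
                return False
        else:
            return False
    return depth == 0

print(accepts_regular("aaab"), accepts_regular("aba"))      # True False
print(accepts_balanced("(()())"), accepts_balanced("(()"))  # True False
```

This mirrors the usual division of labour in a front end: token classes are regular and can be handled by the scanner, while nested constructs need a context-free grammar and are handled by the parser.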