
Slides 01 - Compiler Construction - UET CS - Introduction

This document discusses the basics of compiler construction including an introduction, prerequisites, phases of compilation like lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation.

Uploaded by

mubashrazaman33

CS411-Compiler Construction

Introduction

Talha Waheed, Dept. of CS, UET, Lahore, Pakistan.


Course Learning Objectives



Introduction and Overview
Pre-Requisite Concepts
Core Concepts from
• Computer Programming
• Data Structures
• Theory of Automata

Additional Concepts from
• Programming Languages
• Computer Architecture
• Assembly Language
• Software Design (e.g. OOAD)
• Software Engineering
• Algorithm Design
Software and Apps Are Everywhere
• Computers and software applications are everywhere around us.
• Every software application is written in some programming
language.
• Before a program can be run, it must be translated into a form that can
be executed by a computer.
COMPILER
A software application that translates a program written
in one language or form (i.e. source) into an equivalent program
in another language or form (i.e. target).

Source Program → COMPILER → Target Program
COMPILER → Error Messages

• Source program: normally a program written in a high-level programming language
• Target program: normally the equivalent program in assembly or a relocatable object file

Source Languages: (General Purpose) – C/C++, Java, Python, Pascal, Fortran, etc.; (Special Purpose) – Mathematica, MATLAB, PostScript, TeX, ...

Target Languages: Machine languages of various architectures, such as Intel and AMD


History
• The first compiler translated arithmetic formulas (FORmula TRANslation,
i.e. the Fortran language) into machine code
• Implementation effort: roughly 6 hours/day × 300 days × 18 years
• Today, good formalisms, implementation languages, programming
environments and software toolkits are available

Classifications
On the basis of Functionality
• Debugging Compiler, Optimizing Compiler
On the basis of Execution Style
• Interpreter vs Compiler
On the basis of structure
• Single Pass vs. Multi-Pass (Don’t confuse pass with phases of compiler)
Applications of Compiler in CS
• The basic principles and techniques used in compiler design are
applicable to almost every applied area of computer science. E.g.
– Techniques used in a lexical analyzer can be used in text editors,
information retrieval systems, and pattern recognition programs.
– Techniques used in a parser can be used in a query processing system
such as SQL.
– Software with a complex front end may need techniques used
in compiler design. For example, a symbolic equation solver that takes an
equation as input must parse that equation.
– Most of the techniques used in compiler design can be used in
Natural Language Processing (NLP) systems.
Two Major Parts of Compilers
• In the analysis phase, an intermediate representation is created from
the given source program by going through the
– Lexical Analyzer,
– Syntax Analyzer,
– Semantic Analyzer
• In the synthesis phase, the equivalent target program is created from
this intermediate representation by going through the
– Intermediate Code Generator,
– Code Optimizer,
– Code Generator
Phases of Compilation
source program
↓ lexical analyzer (front end, source side)
↓ syntax analyzer
↓ semantic analyzer
↓ intermediate code generator
↓ code optimizer
↓ code generator (back end, target side)
target program

• The symbol table manager and the error handler interact with the phases.
• Each phase transforms the source program from one representation
into another representation.
Lexical Analyzer
• Scans the source program linearly, grouping characters into tokens
• Reads the source program character by character and
returns the tokens of the source program.
• Puts information about identifiers, functions and
constants into the symbol table.
• Regular expressions are used to describe tokens
(lexical constructs).
• A (Deterministic) Finite State Automaton can be used
in the implementation of a lexical analyzer.
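As a rough illustration of the last bullet, the sketch below drives a small deterministic finite automaton over a lexeme to decide whether it is an identifier or a number. The state names, the character classes and the `classify` helper are assumptions made for this example; they do not come from the slides.

```python
# Hypothetical DFA: states, character classes and names are illustrative only.
def char_class(ch):
    """Map a character to the input symbol the DFA works with."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# (current state, input class) -> next state; missing entries mean "reject".
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}

# Accepting states and the token kind each one recognizes.
ACCEPTING = {"in_id": "identifier", "in_num": "number"}

def classify(lexeme):
    """Run the DFA over the lexeme; return the token kind or None."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return ACCEPTING.get(state)

print(classify("newval"))  # identifier
print(classify("12"))      # number
print(classify("12ab"))    # None
```

A table-driven loop like this is roughly the shape of scanner that generator tools produce, although real generated scanners are considerably more elaborate.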
Tokens
• A token describes a pattern of characters that have the same
meaning in the source program (such as identifiers,
operators, keywords, numbers, delimiters and so on).

newval := oldval + 12

String    Token
newval    identifier
:=        assignment operator
oldval    identifier
+         add operator
12        number
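A minimal sketch of how such a token table could be produced, assuming each token kind is described by one regular expression. The `TOKEN_SPEC` list and the `tokenize` function are names invented for this example, and the patterns cover only the four token kinds that appear in the statement above.

```python
import re

# Assumed token patterns for this example only; a real language needs many more.
TOKEN_SPEC = [
    ("number", re.compile(r"\d+")),
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
    ("assignment operator", re.compile(r":=")),
    ("add operator", re.compile(r"\+")),
]

def tokenize(source):
    """Scan the source left to right and return (string, token kind) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():      # whitespace separates tokens
            pos += 1
            continue
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                tokens.append((match.group(), kind))
                pos = match.end()
                break
        else:                          # no pattern matched at this position
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens

for text, kind in tokenize("newval := oldval + 12"):
    print(f"{text:8} {kind}")
```

Trying the patterns in a fixed order keeps the sketch short; a production lexer would normally apply a longest-match rule and handle keywords separately.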
Syntax Analyzer
• A syntax analyzer creates the syntactic structure (generally a parse
tree) of the given program.
• A syntax analyzer is also called a parser.
• A parse tree describes a syntactic structure.
• The syntax of a language is specified by a context-free grammar
(CFG).
• The rules in a CFG are mostly recursive.
• A syntax analyzer checks whether a given program satisfies the rules
implied by a CFG or not.
– If it does, the syntax analyzer creates a parse tree for the given
program.
Syntax Analyzer (CFG)

• Ex: We use BNF (Backus-Naur Form) to specify a CFG

assgstmt → identifier := expression
expression → identifier
expression → number
expression → expression + expression
Syntax Analyzer

newval := oldval + 12

String    Token
newval    identifier
:=        assignment operator
oldval    identifier
+         add operator
12        number

Parse tree:

assgstmt
    identifier (newval)
    :=
    expression
        expression
            identifier (oldval)
        +
        expression
            number (12)
• In a parse tree, all terminals are at leaves.
• All inner nodes are non-terminals in a context free grammar.
• Compiler may or may not explicitly build the tree
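The following is a hand-written recursive-descent sketch that builds this kind of parse tree as nested tuples. It assumes the input has already been tokenized into (string, kind) pairs with the hypothetical kinds `identifier`, `number`, `assign` and `add`. Because the rule expression → expression + expression is left-recursive, the sketch parses it with a loop instead of direct recursion, which is a common workaround and not something the slides prescribe.

```python
def parse_assignment(tokens):
    """tokens: list of (text, kind) pairs. Returns a nested-tuple parse tree."""
    pos = 0

    def expect(kind):
        # Consume one token of the given kind or report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][1] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        text = tokens[pos][0]
        pos += 1
        return text

    def operand():
        # expression -> number | identifier
        if pos < len(tokens) and tokens[pos][1] == "number":
            return ("expression", ("number", expect("number")))
        return ("expression", ("identifier", expect("identifier")))

    def expression():
        # expression -> expression + expression, handled iteratively
        node = operand()
        while pos < len(tokens) and tokens[pos][1] == "add":
            expect("add")
            node = ("expression", node, "+", operand())
        return node

    # assgstmt -> identifier := expression
    target = expect("identifier")
    expect("assign")
    tree = ("assgstmt", ("identifier", target), ":=", expression())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][0]}")
    return tree

tokens = [("newval", "identifier"), (":=", "assign"),
          ("oldval", "identifier"), ("+", "add"), ("12", "number")]
print(parse_assignment(tokens))
```

Terminals end up at the leaves of the nested tuples and every inner tuple is labelled with a non-terminal, mirroring the first two bullets above.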
Syntax Analyzer versus Lexical Analyzer
Lexical Analyzer
• Regular expressions
• Words
• Deals with simple non-recursive constructs of the language.
• The lexical analyzer simplifies the job of the syntax analyzer.
• The lexical analyzer recognizes the smallest meaningful units (tokens) in
a source program.

Syntax Analyzer
• Context-free grammar
• Sentences
• Deals with recursive constructs of the language.
• The syntax analyzer works on the smallest meaningful units (tokens) in a
source program to recognize meaningful structures in our programming language.
Parsing Techniques
• Depending on how the parse tree is created, there are different parsing
techniques.
• These parsing techniques are categorized into two groups:
– Top-Down Parsing, Bottom-Up Parsing
• Top-Down Parsing:
– Construction of the parse tree starts at the root, and proceeds towards the leaves.
– Efficient top-down parsers can be easily constructed by hand.
– Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
• Bottom-Up Parsing:
– Construction of the parse tree starts at the leaves, and proceeds towards the root.
– Normally efficient bottom-up parsers are created with the help of some software tools.
– Bottom-up parsing is also known as shift-reduce parsing.
– Operator-Precedence Parsing – simple, restrictive, easy to implement
– LR Parsing – a much more general form of shift-reduce parsing; variants include LR, SLR and LALR
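To make the shift-reduce idea concrete, here is a toy sketch for the expression rules used in these slides (expression → identifier, expression → number, expression → expression + expression). It uses hand-written reduce checks instead of a generated LR table, so it illustrates only the stack discipline and is not an LR parser. The function name and the trace format are assumptions for this example.

```python
def shift_reduce(kinds):
    """kinds: list of terminal symbols such as "identifier", "+", "number".

    Returns (accepted, trace), where trace records the stack after each step.
    """
    stack, trace = [], []
    for kind in kinds:
        stack.append(kind)                       # shift the next input symbol
        trace.append(("shift", list(stack)))
        while True:                              # reduce while a handle is on top
            if stack[-1] in ("identifier", "number"):
                stack[-1:] = ["expression"]      # expression -> identifier | number
            elif stack[-3:] == ["expression", "+", "expression"]:
                stack[-3:] = ["expression"]      # expression -> expression + expression
            else:
                break
            trace.append(("reduce", list(stack)))
    # Accept only if the whole input was reduced to a single start symbol.
    return stack == ["expression"], trace

accepted, trace = shift_reduce(["identifier", "+", "number"])
for action, stack in trace:
    print(f"{action:7} {stack}")
print("accepted:", accepted)
```

The tree is built from the leaves upwards: each reduce step corresponds to creating one inner node of the parse tree.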
Symbol Table Management

• The symbol table is a data structure used by all
phases of the compiler to keep track of user-defined
symbols as well as keywords
• During the early phases (lexical and syntax
analysis), symbols are discovered and put into the
symbol table
• During later phases, symbols are looked up to
validate their usage
Symbol Table

Name    Type       Def/UDef    Other
avg     realid     D           ...
if      keyword                ...
num     intid      D           ...
sum                            ...
then    keyword                ...
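A minimal sketch of a symbol table along these lines, backed by a dictionary. The class name, the method names (`enter`, `define`, `lookup`) and the entry layout are assumptions for illustration; real compilers also track scopes, storage and more.

```python
class SymbolTable:
    """Toy symbol table: maps a name to its type and defined/undefined flag."""

    def __init__(self, keywords=()):
        # Keywords are pre-loaded so later phases can tell them from identifiers.
        self._entries = {kw: {"type": "keyword", "defined": False} for kw in keywords}

    def enter(self, name):
        """Early phases: record a discovered name if unseen, return its entry."""
        return self._entries.setdefault(name, {"type": None, "defined": False})

    def define(self, name, type_name):
        """Record a declaration: set the type and mark the symbol as defined."""
        entry = self.enter(name)
        if entry["type"] == "keyword":
            raise ValueError(f"{name!r} is a keyword and cannot be declared")
        entry["type"] = type_name
        entry["defined"] = True

    def lookup(self, name):
        """Later phases: return the entry for a name, or None if unknown."""
        return self._entries.get(name)

table = SymbolTable(keywords=("if", "then"))
table.define("avg", "realid")
table.define("num", "intid")
table.enter("sum")                 # seen, but not yet declared
print(table.lookup("avg"))
print(table.lookup("sum"))
```

Here `sum` stays undefined with no type, matching the empty cells in the table above.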
Semantic Analysis

• Meanings of sentences
• Involves examining the syntax output for correct
semantic usage
– type checking
– flow of control checks
– uniqueness checks (identifiers, case labels, etc.)
Semantic Analyzer
• A semantic analyzer checks the source program for semantic errors and
collects the type information for the code generation.
• Type-checking is an important part of semantic analyzer.
• Normally, semantic information cannot be represented by the context-free
grammars used in syntax analyzers.
• Context-free grammars used in syntax analysis are therefore augmented with
attributes (semantic rules)
– the result is a syntax-directed translation,
– Attribute grammars
• Ex:
newval := oldval + 12

• The type of the identifier newval must match the type of the expression (oldval+12)
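The check described in this example could be sketched as follows. The sketch assumes a strict rule (types must match exactly, with no implicit conversions) and assumes that numeric literals have type `int`; both are simplifications chosen for illustration. The tuple-based expression format and the function names are likewise hypothetical.

```python
def expression_type(node, symbol_types):
    """Compute the type of an expression given as nested tuples.

    node is ("number", text), ("identifier", name) or ("add", left, right).
    symbol_types maps declared identifier names to type names.
    """
    kind = node[0]
    if kind == "number":
        return "int"                       # assumption: all literals are integers
    if kind == "identifier":
        name = node[1]
        if name not in symbol_types:
            raise TypeError(f"undeclared identifier {name!r}")
        return symbol_types[name]
    if kind == "add":
        left = expression_type(node[1], symbol_types)
        right = expression_type(node[2], symbol_types)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")

def check_assignment(target, expr, symbol_types):
    """Verify that the target's type matches the expression's type."""
    target_type = expression_type(("identifier", target), symbol_types)
    expr_type = expression_type(expr, symbol_types)
    if target_type != expr_type:
        raise TypeError(f"cannot assign {expr_type} to {target!r} of type {target_type}")
    return target_type

expr = ("add", ("identifier", "oldval"), ("number", "12"))
print(check_assignment("newval", expr, {"newval": "int", "oldval": "int"}))
```

With `newval` declared as `real` instead, the same call raises a `TypeError`, which is the kind of semantic error this phase reports.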
Error Management

• Errors can occur in all phases of the compiler
• Invalid character combinations (lexical errors), syntax errors,
semantic errors, etc.
• Good compilers will attempt to recover from
errors and continue
Intermediate Code Generation

• Rather than generating code for a specific architecture,
most compilers generate an intermediate language
• Provides input to optimization
• Retargeting is facilitated
• Machine-independent code optimization can be
applied
• Code generation for different source languages can be
combined
• Can be interpreted at this point
Intermediate Code

• Generally three-address statements

Source statement:
newval := oldval * fact + 1

With identifiers replaced by symbol table references:
id1 := id2 * id3 + 1

Intermediate code:
MULT id2,id3,temp1
ADD temp1,#1,temp2
MOV temp2,id1
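A small sketch of how three-address statements like these could be generated from an expression tree. The tuple-based tree format and the `gen_assignment` function are assumptions for this example; the instruction spelling follows the slide (operands first, destination last, constants prefixed with `#`).

```python
import itertools

def gen_assignment(target, expr):
    """Emit three-address code for `target := expr`.

    expr is an identifier name (str), an integer constant (int),
    or a tuple (OP, left, right) such as ("MULT", "id2", "id3").
    """
    code = []
    temps = itertools.count(1)             # temp1, temp2, ...

    def gen(node):
        if isinstance(node, int):
            return f"#{node}"              # immediate constant
        if isinstance(node, str):
            return node                    # identifier
        op, left, right = node
        a = gen(left)                      # operands are evaluated first
        b = gen(right)
        dest = f"temp{next(temps)}"        # fresh temporary for the result
        code.append(f"{op} {a},{b},{dest}")
        return dest

    result = gen(expr)
    code.append(f"MOV {result},{target}")
    return code

# id1 := id2 * id3 + 1
for line in gen_assignment("id1", ("ADD", ("MULT", "id2", "id3"), 1)):
    print(line)
```

A post-order walk of the tree is enough here because each operator's operands must be computed before the operator itself.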
Optimization

• Intermediate code is examined and optimized
• Ranges from something as simple as combining adjacent
statements to reorganizing data for cache
efficiency
• Can make orders of magnitude difference in
the execution speed of generated code
Code Optimizer (for Intermediate Code
Generator)
• The code optimizer optimizes the code produced by
the intermediate code generator in terms of time
and space.

Before optimization:
MULT id2,id3,temp1
ADD temp1,#1,temp2
MOV temp2,id1

After optimization:
MULT id2,id3,temp1
ADD temp1,#1,id1
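This particular improvement can be sketched as a tiny peephole pass that folds a copy out of a temporary into the instruction that computed it. The tuple representation and the function name are assumptions for this example, and the pass is safe only under the stated assumption that each temporary is not read again after the `MOV`.

```python
def remove_temp_moves(code):
    """code: list of tuples (op, operand..., destination).

    Rewrites `OP a,b,tempN` followed by `MOV tempN,x` into `OP a,b,x`.
    Assumption: tempN is not used anywhere after the MOV.
    """
    optimized = []
    for instr in code:
        if (instr[0] == "MOV" and optimized
                and instr[1].startswith("temp")
                and optimized[-1][-1] == instr[1]):
            # The previous instruction wrote the temporary that is merely
            # copied here, so let it write the final destination directly.
            optimized[-1] = optimized[-1][:-1] + (instr[2],)
        else:
            optimized.append(instr)
    return optimized

before = [("MULT", "id2", "id3", "temp1"),
          ("ADD", "temp1", "#1", "temp2"),
          ("MOV", "temp2", "id1")]
for instr in remove_temp_moves(before):
    print(instr[0], ",".join(instr[1:]))
```

A real optimizer would confirm the no-later-use assumption with liveness information instead of taking it for granted.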
Code Generation

• Generation of real executable code for a
particular target architecture
• Output can be either assembly code for the target
architecture, which still has to be assembled, or object code
ready for linking
Code Generator

• Produces the target language for a specific
architecture.
• The target program is normally a
relocatable object file containing the machine
code.

Intermediate code:
MULT id2,id3,temp1
ADD temp1,#1,id1

Target code:
MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
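A deliberately naive sketch of this translation for a machine with a single working register. It assumes that every temporary is consumed as the left operand of the very next instruction, so temporaries can live in the register and never need to be stored; the `generate` function and this restriction are assumptions made for illustration.

```python
def generate(code, register="R1"):
    """code: list of (op, left, right, dest) three-address tuples.

    Emits two-operand instructions that accumulate results in one register.
    Assumption: a temporary is only ever used as the left operand of the
    instruction that immediately follows the one producing it.
    """
    target = []
    in_register = None                  # name of the value currently in the register
    for op, left, right, dest in code:
        if in_register != left:
            target.append(f"MOVE {left},{register}")   # load the left operand
        target.append(f"{op} {right},{register}")      # register := register op right
        in_register = dest
        if not dest.startswith("temp"):
            target.append(f"MOVE {register},{dest}")   # store results that are real variables
    return target

intermediate = [("MULT", "id2", "id3", "temp1"),
                ("ADD", "temp1", "#1", "id1")]
for line in generate(intermediate):
    print(line)
```

Real code generators handle many registers, spilling and instruction selection; this sketch shows only how three-address statements map onto two-operand register instructions.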
Compiler Construction Toolkits

• A number of tools exist to help in the
development of some stages of the compiler,
e.g.
– Lex (Flex) - lexical analyzer generator
– Yacc (Bison) - parser generator
[Figure: Translation of an Assignment Statement, from input to output]
Quick Review - Phases of Compilation
source program
↓ lexical analyzer (front end, source side)
↓ syntax analyzer
↓ semantic analyzer
↓ intermediate code generator
↓ code optimizer
↓ code generator (back end, target side)
target program

• The symbol table manager and the error handler interact with the phases.
• Each phase transforms the source program from one representation
into another representation.
Overall Context of Compiler in a Language Processing System
skeletal source program
↓ preprocessor
modified source program (character stream)
↓ lexical analyzer
token stream
↓ syntax analyzer
syntax/parse tree
↓ semantic analyzer
annotated syntax tree
↓ intermediate code generator
intermediate representation
↓ machine-independent code optimizer
intermediate representation
↓ code generator
target machine code
↓ machine-dependent code optimizer
target assembly program
↓ assembler
relocatable machine code
↓ loader/linker (together with library files / relocatable obj files)
target machine code

• The source-facing phases form the front end; the target-facing phases form the back end.
• The symbol table manager and the error handler interact with the compiler phases.
Traditional Compilers Implementation

Modern Compilers Implementation
• The three major components of a three-phase compiler can be developed by different vendors
• M high-level languages, N architectures
Chomsky Hierarchy of Formal Languages and Grammars
The Chomsky hierarchy relates grammars,
languages, and automata
