Chapter 1 Completed


Chapter 1

Introduction
➢ Programming languages are notations for describing
computations to people and to machines.
➢ The world as we know it depends on programming
languages, because all the software running on all the
computers was written in some programming
language.
➢ But, before a program can be run, it first must be
translated into a form in which it can be executed by a
computer.
➢ The software systems that do this translation are called
compilers.
This course is about:
➢ How to design and implement compilers.
➢ Basic ideas that can be used to construct
translators for a wide variety of languages and
machines.
➢ The principles and techniques of compiler design,
which are also useful beyond compilers.
Phases of a Compiler
➢ A compiler processes source code through several phases to
convert high-level programming code into machine code.
➢ These phases are typically divided into two main parts: *Analysis*
and *Synthesis*.
➢ Analysis: The analysis phase reads the source code and breaks it down into an
intermediate representation.
➢ Synthesis: The synthesis phase takes the intermediate representation
from the analysis phase and transforms it into the target code.
➢ The individual phases are:
1. Lexical Analysis (Scanner)
2. Syntax Analysis (Parser)
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
7. Code Linking and Loading
Lexical Analysis (Scanner)
The compiler reads the source code as a stream of
characters and groups them into meaningful
sequences, called *tokens* (e.g., keywords,
identifiers, literals).
Removes whitespace and comments, and may perform
simple error checking.
Output: a stream of tokens.
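The steps above can be sketched as a small hand-written scanner. This is only an illustration: the keyword set, the token kind names, and the `//` comment style are assumptions, not part of any particular language definition.

```python
def scan(source):
    """Group characters into (kind, text) tokens, skipping whitespace and // comments."""
    keywords = {"int", "return", "if", "else", "while"}  # assumed keyword set
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                                   # whitespace is discarded
        elif source.startswith("//", i):
            while i < len(source) and source[i] != "\n":
                i += 1                               # comment runs to end of line
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            tokens.append(("KEYWORD" if word in keywords else "IDENT", word))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif ch in "+-*/=;()":
            tokens.append(("SYMBOL", ch))
            i += 1
        else:
            # Simple error checking: report characters that start no token
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens


print(scan("int sum = a + b; // add two values"))
```

The scanner never looks at how tokens combine; that is left to the parser.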
Syntax Analysis (Parser)
Uses tokens from the lexical analyzer to create
a *parse tree* (or syntax tree) based on the
source language’s grammar.
Checks for syntactical errors and reports if the
code violates the language’s rules.
Output: a parse tree or abstract syntax tree (AST).
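As an illustration, a hand-written recursive-descent parser for arithmetic expressions might look like the sketch below. The grammar (expression, term, factor), the token format (a list of strings), and the nested-tuple AST are assumptions chosen for brevity.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple AST.

    Assumed grammar:
        expression -> term (('+' | '-') term)*
        term       -> factor (('*' | '/') factor)*
        factor     -> NUMBER | '(' expression ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expression():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["3", "+", "5", "*", "2"]))
```

Input that violates the grammar, such as `["3", "+"]`, raises `SyntaxError` instead of producing a tree.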
Semantic Analysis
Checks the parse tree for semantic errors,
ensuring that expressions, variable types, and
scope rules are valid.
May involve type checking, ensuring that operators
apply to compatible data types.
Output: an annotated syntax tree, with type
information added.
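A minimal sketch of type checking, assuming a toy language with only `int` and `string` types and arithmetic operators that accept only `int` operands. The node kinds (`int_lit`, `str_lit`, `var`) are hypothetical names used for this example.

```python
def check(node, types):
    """Return the type of an expression node, or raise on a semantic error.

    node  : tuple such as ("int_lit", 3), ("var", "a") or ("+", left, right)
    types : dict mapping declared variable names to their types
    """
    kind = node[0]
    if kind == "int_lit":
        return "int"
    if kind == "str_lit":
        return "string"
    if kind == "var":
        name = node[1]
        if name not in types:
            raise NameError(f"undeclared variable {name!r}")
        return types[name]
    if kind in ("+", "-", "*"):
        left = check(node[1], types)
        right = check(node[2], types)
        if left != "int" or right != "int":
            raise TypeError(f"operator {kind!r} cannot combine {left} and {right}")
        return "int"
    raise ValueError(f"unknown node kind {kind!r}")


declared = {"a": "int", "b": "int"}
print(check(("+", ("var", "a"), ("var", "b")), declared))
```

The returned type is the annotation that a fuller implementation would attach to each node of the syntax tree.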
Intermediate Code Generation
Transforms the syntax tree into an intermediate
representation (IR), which is typically easier to
optimize and translate into machine code.
Common IRs include three-address code or abstract
machine code.
Output: Intermediate code.
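One possible way to produce three-address code is a post-order walk over the syntax tree that introduces a fresh temporary for each operator. The sketch assumes a nested-tuple tree of the form `(op, left, right)` with constants or variable names as leaves; the temporary names `t1`, `t2`, ... are a convention, not a requirement.

```python
def generate_tac(ast):
    """Return (instructions, result_name) for a nested-tuple expression tree."""
    code = []
    temp_count = 0

    def visit(node):
        nonlocal temp_count
        if not isinstance(node, tuple):
            return str(node)            # leaf: constant or variable name
        op, left, right = node
        left_name = visit(left)         # operands are generated first
        right_name = visit(right)
        temp_count += 1
        temp = f"t{temp_count}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = visit(ast)
    return code, result


instructions, result = generate_tac(("+", 3, ("*", 5, 2)))
for line in instructions:
    print(line)
```

Each emitted instruction has at most one operator on the right-hand side, which is what makes later optimization passes simple to write.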
Code Optimization
Optimizes the intermediate code to improve
efficiency (speed or memory usage) without
altering its functionality.
Can include dead code elimination, loop
optimization, constant folding, etc.
Output: optimized intermediate code.
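Constant folding, for example, can be sketched as a bottom-up rewrite that replaces any operator whose operands are both known constants with the computed value. The nested-tuple tree format is an assumption made for this example.

```python
import operator

# Operators this sketch knows how to evaluate at compile time
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return an equivalent tree with constant subexpressions pre-computed."""
    if not isinstance(node, tuple):
        return node                      # leaf: constant or variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)      # both operands known: compute now
    return (op, left, right)


print(fold(("+", "x", ("*", 5, 2))))
```

The variable `x` is unknown at compile time, so only the `5 * 2` subtree is folded and the addition is kept.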
Code Generation
Converts the optimized intermediate code into
machine code or assembly code specific to the
target CPU architecture.
Allocates memory for variables and translates IR
instructions into machine-specific instructions.
Output: target code (machine code or assembly).
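A very naive code generator can translate each three-address instruction into a fixed load, operate, store pattern. The sketch below emits informal x86-style text with symbolic variable names; it performs no register allocation and its output is not assembler-ready. The tuple layout `(dest, left, op, right)` is an assumption.

```python
# Assumed mapping from IR operators to x86-style mnemonics
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "IMUL"}


def emit(tac):
    """Translate (dest, left, op, right) tuples into x86-style text lines."""
    lines = []
    for dest, left, op, right in tac:
        lines.append(f"MOV EAX, {left}")                 # load first operand
        lines.append(f"{MNEMONICS[op]} EAX, {right}")    # apply the operation
        lines.append(f"MOV {dest}, EAX")                 # store the result
    return lines


for line in emit([("sum", "a", "+", "b")]):
    print(line)
```

A real code generator would also choose registers, assign addresses to variables, and avoid redundant loads and stores.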
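Code Linking and Loading
Combines code from multiple modules or
libraries, resolving addresses for functions and
variables.
Prepares the program for execution by the
operating system.
Strictly speaking, this step is carried out by the
linker and loader after compilation, not by the
compiler itself.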
Computer Language Representation
➢ Computer languages are represented at multiple levels,
each designed to bridge the gap between human-readable
code and machine-executable instructions.
➢ Here are the different forms of language representation
used in the compilation process:
o Source Code (High-Level Language)
o Tokens (Lexical Representation)
o Abstract Syntax Tree (AST) and Parse Tree (Syntax
Representation)
o Symbol Table (Semantic Representation)
o Intermediate Representation (IR)
o Assembly Code (Low-Level Representation)
o Machine Code (Binary Representation)
Source Code (High-Level Language)
Written by programmers in high-level languages such
as Python, C++, and Java.
These languages are designed for human readability,
using syntax and constructs that abstract away
hardware details.
Example: int sum = a + b;
Tokens (Lexical Representation)
After lexical analysis, source code is broken into
tokens, which are the smallest units with meaning
(e.g., keywords, identifiers, operators).
Tokens make parsing easier by breaking code into
recognizable patterns.
Example: The line int sum = a + b;
is tokenized as [int, sum, =, a, +, b, ;].
Abstract Syntax Tree (AST) and Parse Tree (Syntax
Representation)
➢ The parser organizes tokens into a hierarchical structure
based on grammar rules.
➢ This structure can be represented as a *parse tree* or an
*abstract syntax tree (AST)*.
➢ The AST removes unnecessary syntactic details, focusing on
the structure and relationships between elements.
Example for expression: 3 + 5 * 2

Parse Tree

          <expression>
         /     |      \
<expression>  '+'    <term>
     |              /   |   \
     3         <term>  '*'  <factor>
                  |             |
                  5             2

Abstract Syntax Tree (AST)

    (+)
   /   \
  3    (*)
      /   \
     5     2
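The AST for `3 + 5 * 2` can be written down directly as nested tuples and evaluated with a short recursive function. This is a sketch that assumes only `+` and `*` and integer leaves.

```python
def evaluate(node):
    """Evaluate a nested-tuple AST whose leaves are integers."""
    if isinstance(node, int):
        return node
    op, left, right = node
    left_value, right_value = evaluate(left), evaluate(right)
    if op == "+":
        return left_value + right_value
    if op == "*":
        return left_value * right_value
    raise ValueError(f"unsupported operator {op!r}")


ast = ("+", 3, ("*", 5, 2))      # the multiplication sits deeper in the tree
print(evaluate(ast))             # 13
```

Operator precedence is encoded by the shape of the tree: the tuple `("*", ("+", 3, 5), 2)` uses the same tokens but evaluates to 16.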
Symbol Table (Semantic Representation)
The compiler creates a *symbol table* that holds
information about identifiers (variables, functions,
etc.), including types, scopes, and memory locations.
This table is essential for semantic checks and
intermediate code generation.
Example: sum is an int variable, scoped locally within a
function.
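A minimal sketch of a scoped symbol table, assuming a stack of dictionaries with the innermost scope last. The class and method names are hypothetical, and a real table would also record memory locations.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier to its attributes."""

    def __init__(self):
        self.scopes = [{}]               # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        scope[name] = {"type": type_, "depth": len(self.scopes) - 1}

    def lookup(self, name):
        # Search from the innermost scope outwards
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


table = SymbolTable()
table.enter_scope()                      # entering a function body
table.declare("sum", "int")
print(table.lookup("sum"))
```

After `exit_scope()`, looking up `sum` raises `NameError`, which is how a use outside the function would be reported.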
Intermediate Representation (IR)
A lower-level, platform-independent representation
used between the high-level code and machine
code.
Common IRs include three-address code, quadruples,
and static single-assignment (SSA) form.
Example: sum = a + b in three-address code might look like
t1 = a + b; sum = t1.
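The same example can be stored as quadruples of the form (operator, argument 1, argument 2, result). The small interpreter below is only there to show that the representation is precise enough to execute; the field order and the use of `None` for an unused argument are assumptions.

```python
# sum = a + b as quadruples: (op, arg1, arg2, result)
quads = [
    ("+", "a", "b", "t1"),
    ("=", "t1", None, "sum"),
]


def run(quads, env):
    """Execute quadruples against a dict of variable values and return the new dict."""
    env = dict(env)                      # do not modify the caller's dict

    def value(x):
        return env[x] if isinstance(x, str) else x

    for op, arg1, arg2, result in quads:
        if op == "=":
            env[result] = value(arg1)
        elif op == "+":
            env[result] = value(arg1) + value(arg2)
        elif op == "*":
            env[result] = value(arg1) * value(arg2)
        else:
            raise ValueError(f"unsupported operator {op!r}")
    return env


print(run(quads, {"a": 2, "b": 3})["sum"])   # 5
```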
Assembly Code (Low-Level Representation)
Assembly code is a human-readable form of
machine instructions, specific to the architecture.
It represents operations directly supported by the
processor, such as MOV, ADD, and JMP.
Example (in x86 Assembly):
o MOV EAX, a
o ADD EAX, b
o MOV sum, EAX
Machine Code (Binary Representation)
The final compiled code is in binary format, consisting of
a series of 0s and 1s.
Machine code is executed directly by the CPU and is
specific to the hardware architecture.
➢ Example: The machine code for MOV EAX, a is a short sequence of
bytes whose exact bit pattern depends on the CPU instruction set
and on the address assigned to a.
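To make this concrete, the sketch below encodes a related instruction, `MOV EAX, imm32`, which on x86 is the opcode byte `0xB8` followed by the 32-bit constant in little-endian order. An immediate operand is used instead of the variable `a` because the encoding of a memory operand depends on the address the variable ends up at. The function name is hypothetical.

```python
import struct


def encode_mov_eax_imm32(value):
    """Encode MOV EAX, imm32: opcode 0xB8 plus a little-endian 32-bit immediate."""
    return bytes([0xB8]) + struct.pack("<I", value & 0xFFFFFFFF)


code = encode_mov_eax_imm32(42)
print(code.hex(" "))                          # b8 2a 00 00 00
print(" ".join(f"{b:08b}" for b in code))     # the same bytes as binary digits
```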
Compiler Construction Tools
Compiler construction tools are specialized software
utilities designed to aid in developing compilers by
automating various parts of the process.
These tools help in generating parts of the compiler,
such as lexical analyzers and parsers, without requiring
developers to write them from scratch.
Here are some key compiler construction tools:
1. Lexical Analyzer Generators
2. Parser Generators
3. Syntax-Directed Translation Engines
4. Intermediate Code Generators
5. Code Optimization Tools
6. Code Generation Tools
7. Debugger and Profiler Generators
8. Parser/Grammar Visualization Tools
9. Integrated Development Environments (IDEs) for
Compiler Construction
10. Automated Testing Tools
Lexical Analyzer Generators
Tools like Lex and Flex (Fast Lexical Analyzer Generator) are
commonly used to create lexical analyzers.
These tools take a set of regular expressions that define
token patterns and automatically generate code to
identify tokens in the source code.
Example: Lex or Flex can generate a tokenizer for recognizing
keywords, identifiers, numbers, and symbols in a programming
language.
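The idea behind such generators can be imitated in a few lines: the token patterns are listed as a table of regular expressions, and the matching code is derived from that table. This is a sketch using Python's `re` module, not Lex or Flex output, and the rule names and patterns are assumptions. One difference to keep in mind is that Lex picks the longest match, while this sketch tries the rules in the order listed.

```python
import re

# Token specification: (name, regular expression), tried in this order
RULES = [
    ("KEYWORD", r"\b(?:int|if|else|while|return)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("NUMBER",  r"\d+"),
    ("SYMBOL",  r"[+\-*/=;(){}]"),
    ("SKIP",    r"\s+"),
    ("ERROR",   r"."),
]

# Build one combined pattern with a named group per rule
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def tokenize(text):
    """Return a list of (kind, text) tokens according to RULES."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


print(tokenize("int sum = a + b;"))
```

Adding a new token kind only requires a new entry in `RULES`, which is the main attraction of the generator approach.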
Parser Generators
➢ Parser generators are tools that automate the creation of
parsers from a formal grammar.
➢ A parser is a fundamental component in compilers and
interpreters, responsible for analyzing the syntactic structure of
source code to ensure it matches the grammar of the
programming language.
➢ Tools like Yacc (Yet Another Compiler Compiler), Bison, ANTLR
(Another Tool for Language Recognition), and JavaCC (Java
Compiler Compiler) are widely used to create parsers.
➢ They take a grammar definition, written in Backus-Naur Form
(BNF) or a similar notation, and generate code that can parse
input according to that grammar.
➢ Between them, these tools cover both bottom-up and top-down
parsing: Yacc and Bison generate bottom-up (LALR) parsers, while
ANTLR and JavaCC generate top-down (LL) parsers.
Syntax-Directed Translation Engines
➢ Syntax-Directed Translation (SDT) engines are mechanisms
used in compilers and interpreters to perform semantic
analysis and intermediate code generation by associating
specific semantic actions with grammar rules.
➢ Tools like Yacc and ANTLR also support syntax-directed
translation, where semantic actions (code) are embedded in
the grammar.
➢ These actions can be used to build syntax trees, generate
intermediate code, or populate symbol tables while parsing
the input.
➢ Syntax-directed translation engines streamline the process of
associating grammar rules with actions that execute when
those rules are recognized.
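A classic small example of syntax-directed translation is converting infix expressions to postfix. In the hand-written sketch below, each `output.append(...)` plays the role of a semantic action that runs as soon as its grammar rule has been recognized. The grammar and the token format (a list of strings) are assumptions, and no parser generator is involved.

```python
def to_postfix(tokens):
    """Translate infix tokens to postfix by running actions during parsing."""
    pos = 0
    output = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                          # expr -> term (('+' | '-') term)*
        nonlocal pos
        term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            term()
            output.append(op)            # action: emit operator after operands

    def term():                          # term -> factor (('*' | '/') factor)*
        nonlocal pos
        factor()
        while peek() in ("*", "/"):
            op = tokens[pos]
            pos += 1
            factor()
            output.append(op)            # action: emit operator after operands

    def factor():                        # factor -> NUMBER
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        pos += 1
        output.append(tok)               # action: emit the operand

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return output


print(" ".join(to_postfix(["3", "+", "5", "*", "2"])))   # 3 5 2 * +
```

No tree is built at any point: the translation falls out of the order in which the rules are recognized.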
Intermediate Code Generators
Some compiler tools have built-in support for
generating intermediate representations (IR) that
simplify code optimization and target code
generation.
LLVM (originally short for Low Level Virtual Machine) is a
popular tool in modern compiler construction that provides an IR,
as well as various backends and optimizations for generating
machine code across different architectures.
Code Optimization Tools
Tools like LLVM and GCC provide a suite of
optimization passes that can be applied to
intermediate code to make the final machine
code more efficient.
These tools allow developers to focus on writing the
core of the compiler while leveraging existing,
well-tested optimization techniques, such as dead code
elimination, loop unrolling, and constant folding.
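Dead code elimination, for instance, can be sketched as a single backward pass over straight-line three-address code that keeps only assignments whose result is still needed. This is a simplified illustration, not how LLVM or GCC implement it: it assumes no branches, no side effects, and instructions of the hypothetical form `(dest, op, args)`.

```python
def eliminate_dead_code(code, live_out):
    """Drop assignments whose result is never used.

    code     : list of (dest, op, args) with args a tuple of names or constants
    live_out : names whose values are needed after this block
    """
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(code):
        if dest in live:
            live.discard(dest)                                    # defined here
            live.update(a for a in args if isinstance(a, str))    # operands are needed
            kept.append((dest, op, args))
        # otherwise nothing later reads dest, so the instruction is dropped
    kept.reverse()
    return kept


block = [
    ("t1", "+", ("a", "b")),
    ("t2", "*", ("a", 2)),       # t2 is never read afterwards
    ("sum", "=", ("t1",)),
]
for instruction in eliminate_dead_code(block, live_out={"sum"}):
    print(instruction)
```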
Code Generation Tools
LLVM also doubles as a code generation tool,
where the IR can be transformed into machine
code for various architectures.
Retargetable code generators (like GCC and
LLVM) can convert intermediate
representations into the machine code of
different CPUs, making the compiler more
versatile.
Debugger and Profiler Generators
Tools like GDB (GNU Debugger) and Valgrind
provide debugging and profiling capabilities.
Some compilers integrate debug information
generators that allow code to be debugged
at the source level, tracking variables,
functions, and line numbers.
Parser/Grammar Visualization Tools
JFLAP (Java Formal Languages and Automata
Package) and other similar tools allow
visualization of parsing and automata, helping
compiler developers better understand the
grammar and its ambiguities.
Visualization tools are helpful for debugging
parser conflicts, like shift/reduce or
reduce/reduce conflicts in LR parsers.
Integrated Development Environments
(IDEs) for Compiler Construction
➢ IDEs such as Eclipse with Xtext and IntelliJ
IDEA (when used with appropriate plugins) offer
support for creating domain-specific languages
(DSLs) and language processing features like
syntax highlighting and code completion.
These IDEs simplify compiler construction by
providing editing and debugging support within a
more user-friendly environment.
Automated Testing Tools
Tools like DejaGnu and torture test suites (such as
GCC's) help in validating compiler implementations
by testing for compliance with language specifications
and standard behaviors.
Automated testing tools are essential to ensure that the
compiler correctly translates programs across a wide
range of cases.
