
Document From Aditya Tripathi

This document provides an overview of compiler design, detailing the role of compilers in translating high-level programming languages into machine code. It covers key concepts such as the phases of compilation, lexical analysis, data structures used in compilers, and tools like LEX and YACC for generating lexical analyzers and parsers. Additionally, it discusses context-free grammars and their capabilities and limitations in representing programming language syntax.


UNIT-I: Introduction to Compiler Design

1. Introduction to Compiler
A compiler is a software program that translates code written in a high-level programming language (like
C, Java) into machine code (binary) that a computer can execute.
Example:
#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}
A compiler converts this into assembly or machine language the CPU understands.
Reference Links:
• GeeksforGeeks - Introduction to Compiler Design

• TutorialsPoint - Compiler Basics

2. Analysis of Source Program


This refers to how the compiler understands and processes the input program. It typically involves:

• Lexical Analysis: Breaking the source code into a stream of tokens.


• Syntax Analysis: Checking the grammatical structure of the token stream against the language’s
syntax rules.
• Semantic Analysis: Verifying the meaning and consistency of the code, such as type checking
and variable declaration.
Example: For the code int a = b + 1;
• Lexical Analysis: tokens → int, a, =, b, +, 1, ;
• Syntax Analysis: confirms the tokens form a valid declaration with an initializer expression.
• Semantic Analysis: checks that b is declared and that its type is compatible with int.
Reference Links:

• GeeksforGeeks - Phases of a Compiler


• Stanford University - Compiler Analysis Phases (PDF)

3. Phases and Passes of a Compiler


Phases of Compilation (in order):
1. Lexical Analysis
2. Syntax Analysis

3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization

6. Code Generation
7. Symbol Table Management (interacts with all phases)
8. Error Handling (interacts with all phases)

Passes:
• Single Pass Compiler: Processes the source code once, typically for simpler languages.

• Multi-Pass Compiler: Processes the code in multiple stages, often separating declaration processing from code generation, allowing for more complex optimizations and language features.

Reference Links:
• GeeksforGeeks - Phases of a Compiler
• Computer Science Stack Exchange - Difference between phases and passes of a compiler

4. Bootstrapping
Bootstrapping is a technique where a compiler for a language is written in the very language it is
intended to compile. This is common for developing new compilers or porting existing ones.
Example: Writing a C compiler using the C language itself.
Reference Links:

• Wikipedia - Bootstrapping (compilers)


• GeeksforGeeks - Bootstrapping in Compiler Design

5. Lexical Analyzers (Scanners)


The lexical analyzer (lexer or scanner) is the first phase of a compiler. It reads the stream of characters
from the source code and groups them into meaningful units called tokens.
Example: Input: int x = 5; Output Tokens: keyword(int), identifier(x), operator(=),
constant(5), delimiter(;)
Reference Links:

• GeeksforGeeks - Lexical Analysis in Compiler Design


• TutorialsPoint - Compiler Design - Lexical Analysis

6. Data Structures in Compilation


Compilers heavily rely on various data structures to manage and process information about the source
program.
• Symbol Table: Stores information about identifiers (variables, functions, classes), their types,
scope, and other attributes. It’s used across almost all phases.

• Parse Trees (Concrete Syntax Trees): A tree representation of the syntactic structure of the
input string, reflecting the derivation steps from a grammar.
• Abstract Syntax Trees (ASTs): A simplified and abstract representation of the program’s
structure, omitting concrete syntax details like parentheses, and directly representing the logical
structure.

• Hash Tables: Often used for efficient symbol table lookups.


• Directed Acyclic Graphs (DAGs): Used in optimization to represent expressions and identify
common subexpressions.
Reference Links:

• GeeksforGeeks - Symbol Table in Compiler Design


• GeeksforGeeks - Parse Tree vs Abstract Syntax Tree

7. LEX: Lexical Analyzer Generator
LEX (or Flex, its GNU version) is a tool that generates lexical analyzers. It takes a set of regular
expressions (patterns for tokens) and corresponding actions as input and produces C code for a lexer.
Example Lex program snippet (extended here into a complete, runnable specification):
%option noyywrap
%%
[0-9]+      { printf("Number found: %s\n", yytext); }
[a-zA-Z]+   { printf("Word found: %s\n", yytext); }
.|\n        { /* ignore anything else */ }
%%
int main(void) { return yylex(); }

Reference Links:
• GeeksforGeeks - Lexical Analyzer using LEX for C and C++
• TutorialsPoint - Lex & Yacc Tutorial

8. Input Buffering
To optimize the reading of source code characters, lexical analyzers often use input buffering techniques.
This avoids frequent disk I/O operations.
• Two-Buffer Scheme: Divides the input buffer into two halves. When one half is processed, the
next characters are read into the other half.
• Sentinels: A special character (sentinel) is placed at the end of each buffer half to eliminate the
need for checking the end of the buffer in every character read, speeding up the process.
Reference Links:

• GeeksforGeeks - Input Buffering in Compiler Design


• TutorialsPoint - Compiler Design - Input Buffering

9. Specification and Recognition of Tokens


• Specification of Tokens: Tokens are typically specified using regular expressions. Regular
expressions are a powerful notation for describing patterns in text.
– Examples:
∗ Keywords: int|float|char
∗ Identifiers: [a-zA-Z_][a-zA-Z0-9_]*
∗ Integer literals: [0-9]+
• Recognition of Tokens: Regular expressions are implemented using finite automata.
– Non-deterministic Finite Automata (NFA): Can have multiple transitions for the same
input symbol or empty transitions.
– Deterministic Finite Automata (DFA): For each state and input symbol, there’s exactly
one transition. NFAs can be converted to DFAs, and DFAs are used to efficiently recognize
tokens.

Reference Links:
• GeeksforGeeks - Specification of Tokens in Compiler Design
• NPTEL - Finite Automata and Regular Expressions (Lecture on Automata Theory relevant here)

10. YACC: Yet Another Compiler Compiler
YACC (Yet Another Compiler Compiler, or Bison, its GNU version) is a parser generator. It takes
a context-free grammar specification as input and generates C code for a parser (syntax analyzer). This
parser builds a parse tree or an AST from the token stream provided by the lexer.
Example YACC program snippet (for a simple expression grammar; the %left declaration, added here, resolves the ambiguity of the two recursive rules):
%token NUMBER
%left '+' '-'
%%
expr: expr '+' expr
    | expr '-' expr
    | NUMBER
    ;
%%
Reference Links:
• GeeksforGeeks - YACC Tutorial
• TutorialsPoint - Lex & Yacc Tutorial

11. The Syntactic Specification of Programming Languages: Context-Free Grammars (CFG)
Context-Free Grammars (CFGs) are a formal system used to describe the syntax (structure) of
programming languages. They consist of:
• Terminals: The basic symbols of the language (e.g., tokens like int, +, if, id).
• Non-terminals: Variables representing syntactic categories (e.g., Statement, Expression, Declaration).
• Start Symbol: A special non-terminal that represents the entire program or the highest-level
syntactic category.
• Production Rules: Rules that define how non-terminals can be replaced by sequences of terminals
and other non-terminals.
Example CFG for arithmetic expressions:
• E → E + T | T
• T → T * F | F
• F → (E) | id
Reference Links:
• GeeksforGeeks - Context-Free Grammar (CFG) in Compiler Design
• TutorialsPoint - Compiler Design - Syntax Analysis (CFG)

12. Derivation and Parse Trees


• Derivation: A sequence of applications of production rules to derive a string of terminals from
the start symbol of a grammar. It shows how a sentence in the language is generated.
– Leftmost Derivation: Always expands the leftmost non-terminal.
– Rightmost Derivation: Always expands the rightmost non-terminal.
• Parse Trees: A graphical representation of a derivation. Each internal node is a non-terminal,
each leaf node is a terminal, and the children of a node represent the symbols on the right-hand
side of a production rule applied to the node’s non-terminal.

Example: For the expression a + b * c using the CFG above: A parse tree visually represents the
hierarchical structure:

            E
          / | \
         E  +  T
         |    /|\
         T   T * F
         |   |   |
         F   F   c
         |   |
         a   b

Note: the subtree T → T * F under the + node shows that b * c groups together before the addition.
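For the same string a + b * c, the leftmost derivation (always expanding the leftmost non-terminal, with a, b, c standing for id) runs:

```latex
E \Rightarrow E + T
  \Rightarrow T + T
  \Rightarrow F + T
  \Rightarrow a + T
  \Rightarrow a + T * F
  \Rightarrow a + F * F
  \Rightarrow a + b * F
  \Rightarrow a + b * c
```

Reading the derivation top-down reproduces exactly the parse tree shown above: each step expands one internal node into its children.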


Reference Links:
• GeeksforGeeks - Derivation and Parse Tree
• YouTube - Parse Tree in Compiler Design (search for "Parse Tree in Compiler Design" to find relevant videos)

13. Capabilities of CFG


• What CFGs can represent:

– The recursive nature of programming language constructs (e.g., nested expressions, loops,
conditional statements, function calls).
– Hierarchical structures of programs.
– Most of the syntactic structure of typical programming languages.

• What CFGs cannot represent (limitations):


– Context-sensitive aspects: Rules that depend on the surrounding context (e.g., a variable
must be declared before use, an array index must be an integer). These are typically handled
by the semantic analysis phase.
– Agreement in numbers (e.g., the number of parameters in a function call must match the definition).
– Type compatibility rules.

Reference Links:
• Stack Overflow - What are the limitations of Context-Free Grammars?
• TutorialsPoint - Compiler Design - Context-Free Grammar Limitations (often implied in the CFG
section’s examples)
