Document From Aditya Tripathi
1. Introduction to Compiler
A compiler is a program that translates code written in a high-level programming language (such as
C or Java) into a lower-level form, usually machine code (binary) that a computer can execute.
Example:
#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}
A compiler converts this into assembly or machine language that the CPU understands.
Reference Links:
• GeeksforGeeks - Introduction to Compiler Design
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
7. Symbol Table Management (interacts with all phases)
8. Error Handling (interacts with all phases)
Passes:
• Single-Pass Compiler: Reads the source code once and translates each construct as it is
encountered, which suits simpler languages.
• Multi-Pass Compiler: Processes the code in several passes, often separating declaration
processing from code generation, which allows more complex optimizations and language features.
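The difference shows up with forward references. The following is a minimal sketch, not taken from any real compiler: a toy program is a list of lines, each a label definition, a jump, or some other instruction. The names Line, Kind, MAX_LABELS and resolve_jumps are hypothetical. A jump to a label defined later cannot be resolved on first sight, so pass 1 only records label addresses and pass 2 fills in the jump targets.

```c
#include <string.h>

#define MAX_LABELS 16

typedef enum { DEF_LABEL, JUMP, OTHER } Kind;

typedef struct {
    Kind kind;
    const char *name;   /* label name for DEF_LABEL and JUMP, unused for OTHER */
} Line;

/* Resolve every JUMP in `prog` (n lines) to the address of its label.
   targets[a] receives the destination of the instruction at address a,
   or -1 if that instruction is not a jump. Returns the instruction
   count, or -1 on an undefined label or a full label table. */
int resolve_jumps(const Line *prog, int n, int *targets) {
    const char *names[MAX_LABELS];
    int addrs[MAX_LABELS];
    int nlabels = 0;

    /* Pass 1: record the address of every label. Labels take no space. */
    int addr = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].kind == DEF_LABEL) {
            if (nlabels == MAX_LABELS) return -1;
            names[nlabels] = prog[i].name;
            addrs[nlabels] = addr;
            nlabels++;
        } else {
            addr++;
        }
    }

    /* Pass 2: every label is known now, so forward jumps resolve too. */
    addr = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].kind == DEF_LABEL) continue;
        targets[addr] = -1;
        if (prog[i].kind == JUMP) {
            for (int k = 0; k < nlabels; k++)
                if (strcmp(names[k], prog[i].name) == 0) targets[addr] = addrs[k];
            if (targets[addr] == -1) return -1;
        }
        addr++;
    }
    return addr;
}
```

For the lines "jump end", "other", "end:", "other", the jump at address 0 resolves to address 2 even though the label appears after it. A single-pass design would need backpatching or a declare-before-use rule to handle the same input.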
Reference Links:
• GeeksforGeeks - Phases of a Compiler
• Computer Science Stack Exchange - Difference between phases and passes of a compiler
4. Bootstrapping
Bootstrapping is a technique where a compiler for a language is written in the very language it is
intended to compile. This is common for developing new compilers or porting existing ones.
Example: Writing a C compiler using the C language itself.
• Parse Trees (Concrete Syntax Trees): A tree representation of the syntactic structure of the
input string, reflecting the derivation steps from a grammar.
• Abstract Syntax Trees (ASTs): A simplified and abstract representation of the program’s
structure, omitting concrete syntax details like parentheses, and directly representing the logical
structure.
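As an illustration of the AST idea, here is a minimal sketch in C for expressions built from constants, addition and multiplication. The type and function names (Ast, AstKind, ast_num, ast_binary, ast_eval, ast_free) are assumptions made for this example. Note that the tree stores no parentheses or grammar symbols: grouping is expressed only by which node is the child of which.

```c
#include <stdlib.h>

/* Node kinds for a tiny expression AST (hypothetical names). */
typedef enum { AST_NUM, AST_ADD, AST_MUL } AstKind;

typedef struct Ast {
    AstKind kind;
    int value;             /* used only when kind == AST_NUM */
    struct Ast *left;      /* operands; NULL for leaves */
    struct Ast *right;
} Ast;

/* Allocate a leaf holding a constant. */
Ast *ast_num(int value) {
    Ast *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->kind = AST_NUM;
    n->value = value;
    n->left = n->right = NULL;
    return n;
}

/* Allocate an interior node for a binary operator. */
Ast *ast_binary(AstKind kind, Ast *left, Ast *right) {
    Ast *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->kind = kind;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* Evaluate by walking the tree bottom-up. */
int ast_eval(const Ast *n) {
    switch (n->kind) {
    case AST_NUM: return n->value;
    case AST_ADD: return ast_eval(n->left) + ast_eval(n->right);
    case AST_MUL: return ast_eval(n->left) * ast_eval(n->right);
    }
    return 0;   /* not reached for well-formed trees */
}

/* Release a whole tree. */
void ast_free(Ast *n) {
    if (n == NULL) return;
    ast_free(n->left);
    ast_free(n->right);
    free(n);
}
```

The expression 2 + 3 * 4 would be built as ast_binary(AST_ADD, ast_num(2), ast_binary(AST_MUL, ast_num(3), ast_num(4))). The multiplication sits below the addition, so precedence is already encoded in the shape of the tree.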
7. LEX: Lexical Analyzer Generator
LEX (or Flex, its widely used open-source successor) is a tool that generates lexical analyzers. It
takes a set of regular expressions (patterns for tokens) with corresponding actions as input and
produces C code for a lexer.
Example Lex program snippet:
%%
[0-9]+ { printf("Number found: %s\n", yytext); }
[a-zA-Z]+ { printf("Word found: %s\n", yytext); }
%%
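To make the two rules concrete, here is a hand-written C sketch of roughly what a lexer for them has to do. It is not the code Lex generates (that is table-driven), and the function name scan and its buffer-based interface are assumptions chosen so the result is easy to inspect. Characters that match neither pattern are copied through unchanged, which mirrors Lex's default action for unmatched input.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Scan `input` and append one line per token to `out` (capacity `cap`,
   which must be at least 1). Returns the number of tokens recognized. */
int scan(const char *input, char *out, size_t cap) {
    size_t used = 0;
    int tokens = 0;
    const char *p = input;

    out[0] = '\0';
    while (*p != '\0') {
        const char *start = p;
        const char *label = NULL;
        int n;

        if (isdigit((unsigned char)*p)) {          /* [0-9]+ */
            while (isdigit((unsigned char)*p)) p++;
            label = "Number";
        } else if (isalpha((unsigned char)*p)) {   /* [a-zA-Z]+ in the C locale */
            while (isalpha((unsigned char)*p)) p++;
            label = "Word";
        } else {
            p++;                                   /* no rule matches: echo it */
        }

        if (label != NULL)
            n = snprintf(out + used, cap - used, "%s found: %.*s\n",
                         label, (int)(p - start), start);
        else
            n = snprintf(out + used, cap - used, "%c", *start);

        if (n < 0 || (size_t)n >= cap - used) break;   /* output buffer full */
        used += (size_t)n;
        if (label != NULL) tokens++;
    }
    return tokens;
}
```

Each branch consumes the longest run of matching characters, which is the same longest-match rule Lex applies when choosing a token.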
Reference Links:
• GeeksforGeeks - Lexical Analyzer using LEX for C and C++
• TutorialsPoint - Lex & Yacc Tutorial
8. Input Buffering
To optimize the reading of source code characters, lexical analyzers often use input buffering techniques.
This avoids frequent disk I/O operations.
• Two-Buffer Scheme: Divides the input buffer into two halves. When one half is processed, the
next characters are read into the other half.
• Sentinels: A special character (sentinel) is placed at the end of each buffer half to eliminate the
need for checking the end of the buffer in every character read, speeding up the process.
Reference Links:
• GeeksforGeeks - Specification of Tokens in Compiler Design
• NPTEL - Finite Automata and Regular Expressions (Lecture on Automata Theory relevant here)
10. YACC: Yet Another Compiler Compiler
YACC (Yet Another Compiler Compiler) is a parser generator; Bison is its GNU counterpart. It takes
a context-free grammar specification as input and generates C code for a parser (syntax analyzer). The
generated parser consumes the token stream provided by the lexer and runs the action attached to each
grammar rule, which is where a parse tree or an AST is typically built.
Example YACC program snippet (for a simple expression grammar):
%token NUMBER
%left '+' '-'
%%
expr: expr '+' expr
    | expr '-' expr
    | NUMBER
    ;
%%
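For comparison, here is a hand-written C sketch that accepts the same language as the grammar above and computes the value of the expression. YACC itself generates a table-driven bottom-up parser, so this loop is a substitute technique, not what the tool emits. It treats + and - as left-associative, which is the usual convention for these operators. The function names parse_expr, parse_number and skip_blanks are assumptions, and overflow of long is not checked.

```c
#include <ctype.h>

/* Skip spaces and tabs between tokens. */
static const char *skip_blanks(const char *p) {
    while (*p == ' ' || *p == '\t') p++;
    return p;
}

/* NUMBER: one or more decimal digits. On success, stores the value in
   *out, advances *pp past the digits, and returns 1; otherwise returns 0. */
static int parse_number(const char **pp, long *out) {
    const char *p = skip_blanks(*pp);
    long v = 0;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    *pp = p;
    *out = v;
    return 1;
}

/* expr: NUMBER followed by any number of ('+' | '-') NUMBER pairs,
   combined left to right. Returns 1 and stores the value in *result
   if the whole string is a valid expression, 0 otherwise. */
int parse_expr(const char *text, long *result) {
    const char *p = text;
    long acc;

    if (!parse_number(&p, &acc)) return 0;
    for (;;) {
        long rhs;
        char op;

        p = skip_blanks(p);
        op = *p;
        if (op != '+' && op != '-') break;
        p++;
        if (!parse_number(&p, &rhs)) return 0;    /* operator without operand */
        acc = (op == '+') ? acc + rhs : acc - rhs;
    }
    if (*p != '\0') return 0;                     /* trailing garbage */
    *result = acc;
    return 1;
}
```

Because the accumulator is updated after each operand, 10 - 4 + 3 is read as (10 - 4) + 3. An input such as "1 +" is rejected, since the grammar requires an operand after every operator.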
Reference Links:
• GeeksforGeeks - YACC Tutorial
• TutorialsPoint - Lex & Yacc Tutorial
Example: For the expression a + b * c, using the CFG above, the parse tree shows the hierarchical
structure:
E
├── E
│   └── T
│       └── F
│           └── a
├── +
└── T
    ├── T
    │   └── F
    │       └── b
    ├── *
    └── F
        └── c
– The recursive nature of programming language constructs (e.g., nested expressions, loops,
conditional statements, function calls).
– Hierarchical structures of programs.
– Most of the syntactic structure of typical programming languages.
Reference Links:
• Stack Overflow - What are the limitations of Context-Free Grammars?
• TutorialsPoint - Compiler Design - Context-Free Grammar Limitations (often implied in the CFG
section’s examples)