Lex and Yacc
Lex
Lex is a program generator that produces lexical analyzers
(widely used on Unix).
It is mostly used together with the Yacc parser generator.
It was written by Eric Schmidt and Mike Lesk.
Lex reads an input specification describing the lexical analyzer and
outputs source code implementing that lexical analyzer in the C
programming language.
Lex reads patterns (regular expressions) and then produces C
code for a lexical analyzer that scans the input for text matching those patterns, such as identifiers.
• The lexical analyzer is a program that transforms an input stream into a
sequence of tokens.
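To make the idea concrete, here is a minimal hand-written scanner in C that turns a string into a sequence of token codes, which is the job a Lex-generated yylex() performs. The function next_token() and the TOK_* codes are hypothetical names chosen for this sketch; real Lex output is table-driven rather than a chain of if statements.

```c
#include <ctype.h>

/* Hypothetical token codes for this sketch (not Lex or Yacc names). */
enum { TOK_EOF = 0, TOK_IDENT, TOK_NUMBER, TOK_PLUS, TOK_STAR, TOK_ERROR };

/* Scan one token starting at *p, advance *p past it, and return its code.
 * Returns TOK_EOF (0) at the end of the string, mirroring yylex(). */
int next_token(const char **p)
{
    const char *s = *p;
    int tok;

    while (isspace((unsigned char)*s))        /* skip blanks between tokens */
        s++;

    if (*s == '\0') {
        tok = TOK_EOF;
    } else if (isalpha((unsigned char)*s)) {  /* letter(letter|digit)* */
        while (isalnum((unsigned char)*s))
            s++;
        tok = TOK_IDENT;
    } else if (isdigit((unsigned char)*s)) {  /* digit+ */
        while (isdigit((unsigned char)*s))
            s++;
        tok = TOK_NUMBER;
    } else if (*s == '+') {
        s++;
        tok = TOK_PLUS;
    } else if (*s == '*') {
        s++;
        tok = TOK_STAR;
    } else {
        s++;                                  /* skip the offending character */
        tok = TOK_ERROR;
    }
    *p = s;
    return tok;
}

/* Usage:
 *   const char *in = "x1 + 42*y";
 *   while ((tok = next_token(&in)) != TOK_EOF) { ... }
 * yields TOK_IDENT, TOK_PLUS, TOK_NUMBER, TOK_STAR, TOK_IDENT. */
```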
◦ A simple pattern: letter(letter|digit)*
This pattern matches a string of characters that begins with a single letter followed by
zero or more letters or digits.
Lex translates regular expressions like this one into a computer program that simulates a finite-state automaton (FSA).
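The FSA for letter(letter|digit)* needs only three states: a start state, an accepting state reached after the first letter, and a dead state for anything else. The sketch below encodes that automaton as a transition table in C. The names matches_identifier(), START, ACCEPT, and DEAD are illustrative assumptions; the tables Lex generates follow the same idea but are produced automatically and are usually much larger.

```c
#include <ctype.h>

/* States of the DFA for letter(letter|digit)*. */
enum { START = 0, ACCEPT = 1, DEAD = 2 };
/* Input classes: every character is a letter, a digit, or something else. */
enum { C_LETTER = 0, C_DIGIT = 1, C_OTHER = 2 };

/* Map a character to its input class. */
static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* transition[state][class] gives the next state. */
static const int transition[3][3] = {
    /* letter  digit   other */
    {  ACCEPT, DEAD,   DEAD },   /* START: must begin with a letter   */
    {  ACCEPT, ACCEPT, DEAD },   /* ACCEPT: letters or digits follow  */
    {  DEAD,   DEAD,   DEAD },   /* DEAD: once rejected, stay rejected */
};

/* Return 1 if the whole string s matches letter(letter|digit)*, else 0. */
int matches_identifier(const char *s)
{
    int state = START;
    for (; *s != '\0'; s++)
        state = transition[state][char_class(*s)];
    return state == ACCEPT;
}
```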
Auxiliary declarations
The auxiliary declarations are copied as is by LEX into the output file lex.yy.c.
This C code consists of instructions to the C compiler and is not
processed by the LEX tool. The auxiliary declarations (which are optional)
are written in C and are enclosed between '%{' and '%}'. They are
generally used to declare functions, include header files, or define global
variables and constants.
Auxiliary functions
LEX generates C code for the rules specified in the Rules section and places
this code into a single function called yylex() (discussed in detail later).
In addition to this LEX-generated code, the programmer may wish to add their
own code to the lex.yy.c file. The auxiliary functions section allows the
programmer to do this.
The yyvariables
These variables are accessible in the LEX program and are automatically declared
by LEX in lex.yy.c.
yyin
yytext
yyleng
yyin
yyin is a variable of type FILE* that points to the input file. yyin is defined by LEX
automatically. If the programmer assigns an input file to yyin (typically in main() in the auxiliary functions
section), the scanner reads from that file. Otherwise LEX sets yyin to
stdin (console input).
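The following C sketch mimics the yyin default described above. Here yyin is a stand-in declared by hand (in a real scanner lex.yy.c defines it), and count_chars() is a hypothetical mini-scanner that, like the generated yylex(), falls back to stdin only when yyin was never assigned.

```c
#include <stdio.h>

/* Stand-in for the yyin that lex.yy.c defines; NULL until the programmer sets it. */
FILE *yyin = NULL;

/* Hypothetical mini-scanner: counts the characters it reads from yyin.
 * Like the generated yylex(), it uses stdin when yyin was never set. */
int count_chars(void)
{
    int n = 0;
    if (yyin == NULL)
        yyin = stdin;              /* default input: the console */
    while (fgetc(yyin) != EOF)
        n++;
    return n;
}
```

In an actual Lex program the assignment usually happens in main(), for example yyin = fopen(argv[1], "r"); followed by a call to yylex().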
yytext
yytext is of type char* and points to the lexeme currently matched. Each invocation of
the function yylex() leaves yytext pointing to the lexeme it found in the input
stream. The contents of yytext are overwritten by the next yylex()
invocation, so a lexeme that must be kept has to be copied first.
yyleng
yyleng is a variable of type int that stores the length of the lexeme pointed to
by yytext.
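The overwriting behaviour of yytext is easy to trip over, so here is a small C mock. yytext, yyleng, next_word(), and save_lexeme() are stand-ins written for this sketch (the real variables live in lex.yy.c, and flex manages its buffer differently); what carries over is the pattern of copying yyleng bytes before the next scan.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the Lex variables. This mock reuses one buffer, so every
 * call overwrites the previous lexeme, as described above. */
static char lexeme_buf[64];
char *yytext = lexeme_buf;
int yyleng = 0;

/* Hypothetical scanner: copies the next run of letters/digits from *p
 * into yytext, sets yyleng, and returns 1 (or 0 at end of input). */
int next_word(const char **p)
{
    const char *s = *p;
    while (*s != '\0' && !isalnum((unsigned char)*s))
        s++;                                      /* skip separators */
    yyleng = 0;
    while (isalnum((unsigned char)*s) && yyleng < (int)sizeof lexeme_buf - 1)
        lexeme_buf[yyleng++] = *s++;
    lexeme_buf[yyleng] = '\0';
    *p = s;
    return yyleng > 0;
}

/* Keep a lexeme beyond the next call: copy yyleng bytes into fresh memory. */
char *save_lexeme(void)
{
    char *copy = malloc((size_t)yyleng + 1);
    if (copy != NULL) {
        memcpy(copy, yytext, (size_t)yyleng);
        copy[yyleng] = '\0';
    }
    return copy;
}
```

A pointer that merely aliases yytext shows the newest lexeme after the next call, while the copy returned by save_lexeme() keeps the old one.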
The yyfunctions
yylex()
yywrap()
yylex()
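yylex() is a function with return type int. LEX automatically defines yylex() in lex.yy.c but
does not call it. The programmer must call yylex(), usually from main() in the auxiliary functions
section of the LEX program; when Lex is used with Yacc, the generated parser yyparse() calls yylex() instead.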
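A minimal sketch of the calling pattern, assuming a hypothetical stub in place of the generated yylex() so the fragment compiles on its own. The loop keeps calling yylex() until it returns 0, which is how the real function signals end of input; the canned token codes and count_tokens() are illustrative only.

```c
#include <stdio.h>

/* Hypothetical stub standing in for the generated yylex(): it returns
 * canned token codes, then 0 to signal end of input. */
static const int canned[] = { 1, 2, 1 };
static size_t next_index = 0;

int yylex(void)
{
    if (next_index < sizeof canned / sizeof canned[0])
        return canned[next_index++];
    return 0;
}

/* The usual driver, as it might appear in the auxiliary functions section:
 * call yylex() repeatedly until it returns 0. */
int count_tokens(void)
{
    int tok, n = 0;
    while ((tok = yylex()) != 0) {
        printf("token %d\n", tok);
        n++;
    }
    return n;
}
```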
yywrap()
LEX declares the function yywrap(), with return type int, in the file lex.yy.c, but it does not
provide a definition for it there. yylex() calls yywrap() when it reaches
the end of input. If yywrap() returns zero (false), yylex() assumes there is
more input and continues scanning from the file yyin now points to (yywrap() is expected to have
reassigned yyin). If yywrap() returns a non-zero value (true), yylex() stops scanning and
returns 0 (i.e. it "wraps up"). The programmer must therefore either define yywrap() in the
Auxiliary functions section or put %option noyywrap in the declarations section. If neither is
done, building the scanner fails, typically with an undefined reference to yywrap() at link time,
unless a Lex library that supplies a default yywrap() (such as flex's -lfl) is linked.
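The sketch below mimics how yylex() and yywrap() cooperate to scan several files in sequence. yyin is a hand-declared stand-in, and pending, n_pending, next_file, and count_all_chars() are hypothetical helpers; count_all_chars() imitates only the end-of-input handling of yylex(), not its pattern matching.

```c
#include <stdio.h>

/* Stand-in for yyin plus a hypothetical list of files still to scan. */
FILE *yyin = NULL;
static FILE *pending[8];
static int n_pending = 0, next_file = 0;

/* yywrap(): at end of input, point yyin at the next pending file and
 * return 0 (keep scanning); when none are left, return 1 (wrap up). */
int yywrap(void)
{
    if (next_file < n_pending) {
        yyin = pending[next_file++];
        return 0;
    }
    return 1;
}

/* Mock of yylex()'s end-of-input handling: counts characters and asks
 * yywrap() what to do each time the current file is exhausted. */
int count_all_chars(void)
{
    int n = 0;
    if (yyin == NULL)
        yyin = stdin;
    for (;;) {
        while (fgetc(yyin) != EOF)
            n++;
        if (yywrap() != 0)
            return n;           /* yywrap() reported no more input */
    }
}
```

With flex, %option noyywrap has the same effect as a yywrap() that always returns 1.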
Disambiguation Rules
yylex() uses two important disambiguation rules to select the action to execute
when more than one pattern matches the input:
1. The longest match is preferred.
2. Among patterns that match the same number of characters, the one listed first is chosen.
Example:
"break" { return BREAK; }
[a-zA-Z][a-zA-Z0-9]* { return IDENTIFIER; }
If "break" is found in the input, both patterns match five characters, so the first pattern is
chosen and yylex() returns BREAK. If "breakdown" is found, the second pattern matches the longer
string, so yylex() returns IDENTIFIER.
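The two rules can be simulated in a few lines of C. The sketch below is not generated code: pick_rule(), match_break(), and match_identifier() are hypothetical helpers that compute how many characters each pattern from the example matches at the start of the input, keep the longest match, and let the earlier rule win ties. isalpha()/isalnum() stand in for [a-zA-Z] and [a-zA-Z0-9], which holds in the default C locale.

```c
#include <ctype.h>
#include <string.h>

enum { BREAK = 1, IDENTIFIER = 2 };   /* hypothetical token codes */

/* Length of a match of the literal "break" at s, or 0. */
static size_t match_break(const char *s)
{
    return strncmp(s, "break", 5) == 0 ? 5 : 0;
}

/* Length of the longest match of [a-zA-Z][a-zA-Z0-9]* at s, or 0. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Choose a rule: the longest match wins; on a tie the rule listed first
 * wins. Returns the token code and stores the match length in *len;
 * returns 0 if no rule matches. */
int pick_rule(const char *s, size_t *len)
{
    size_t (*const rules[])(const char *) = { match_break, match_identifier };
    const int tokens[] = { BREAK, IDENTIFIER };
    int best = -1;
    size_t best_len = 0;

    for (int i = 0; i < 2; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {     /* strictly longer only, so earlier rules keep ties */
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best < 0 ? 0 : tokens[best];
}
```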
Example (layout of a Lex file):
/* Declarations section */
%%
/* Rules section */
%%
/* Auxiliary functions */
An expression may be:
◦ the sum of two expressions,
◦ the product of two expressions,
◦ or an identifier.
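In a Yacc specification this language would be written as a grammar whose tokens come from yylex(), with precedence declarations such as %left resolving the ambiguity between sums and products. The C sketch below does not show Yacc output; it swaps in a hand-written recursive-descent recognizer, assuming the usual convention that * binds tighter than + and treating an identifier as letter(letter|digit)*. The names is_expression(), expr(), term(), and factor() are illustrative.

```c
#include <ctype.h>

/* Recursive-descent recognizer for the expression language above.
 * Precedence is encoded in the grammar, so '*' binds tighter than '+':
 *   expr   -> term   { '+' term }
 *   term   -> factor { '*' factor }
 *   factor -> identifier              (letter(letter|digit)*)
 */
static const char *pos;   /* current position in the input */

static void skip_spaces(void)
{
    while (isspace((unsigned char)*pos))
        pos++;
}

static int factor(void)
{
    skip_spaces();
    if (!isalpha((unsigned char)*pos))
        return 0;                  /* an identifier must start with a letter */
    while (isalnum((unsigned char)*pos))
        pos++;
    return 1;
}

static int term(void)
{
    if (!factor())
        return 0;
    for (;;) {
        skip_spaces();
        if (*pos != '*')
            return 1;
        pos++;
        if (!factor())
            return 0;
    }
}

static int expr(void)
{
    if (!term())
        return 0;
    for (;;) {
        skip_spaces();
        if (*pos != '+')
            return 1;
        pos++;
        if (!term())
            return 0;
    }
}

/* Return 1 if the whole string is a valid expression, else 0. */
int is_expression(const char *s)
{
    pos = s;
    if (!expr())
        return 0;
    skip_spaces();
    return *pos == '\0';           /* reject trailing junk such as "a b" */
}
```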
Lex file
Yacc file
Linking Lex and Yacc