Lex and Yacc Examples Lab Task
return(token);
where the appropriate token value is returned. An easy way to get access to Yacc's names for tokens is
to compile the Lex output file as part of the Yacc output file by placing the line #include "lex.yy.c" in the
last section of Yacc input. Supposing the grammar to be named ``good'' and the lexical rules to be
named ``better'' the UNIX command sequence can just be:
yacc good
lex better
cc y.tab.c -ly -ll
The Yacc library (-ly) should be loaded before the Lex library, to obtain a main program which invokes
the Yacc parser. The Lex and Yacc programs can be generated in either order.
9. Examples.
As a trivial problem, consider copying an input file while adding 3 to every positive number divisible by
7. Here is a suitable Lex source program
%%
        int k;
[0-9]+  {
        k = atoi(yytext);
        if (k%7 == 0)
                printf("%d", k+3);
        else
                printf("%d",k);
        }
to do just that. The rule [0-9]+ recognizes strings of digits; atoi converts the digits to binary and stores
the result in k. The operator % (remainder) is used to check whether k is divisible by 7; if it is, it is
incremented by 3 as it is written out. It may be objected that this program will alter such input items as
49.63 or X7. Furthermore, it increments the absolute value of all negative numbers divisible by 7. To
avoid this, just add a few more rules after the active one, as here:
%%
        int k;
-?[0-9]+        {
        k = atoi(yytext);
        printf("%d",
                k%7 == 0 ? k+3 : k);
        }
-?[0-9.]+       ECHO;
[A-Za-z][A-Za-z0-9]+    ECHO;
Compiler Construction lab manual Spring 2018 By: Rabbia Mahum
Numerical strings containing a ``.'' or preceded by a letter will be picked up by one of the last two rules,
and not changed. The if-else has been replaced by a C conditional expression to save space; the form
a?b:c means ``if a then b else c''.
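To make the interplay of the three rules concrete, here is a minimal hand-written sketch in plain C of what the refined scanner does to a string. It is not the code Lex generates; the function name add3_to_multiples_of_7 and the buffer-based interface are assumptions made for illustration, and strtol is used in place of atoi. At each position it tries the identifier rule, then the signed digits-and-dots rule, and falls back to copying one character, which is Lex's default action.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: rewrite `in` into `out` (capacity `cap`) the way the
 * three rules above would. Returns `out`. */
char *add3_to_multiples_of_7(const char *in, char *out, size_t cap)
{
    size_t o = 0;

    if (cap == 0)
        return out;
    out[0] = '\0';

    while (*in != '\0' && o + 1 < cap) {
        size_t len = 0;      /* length of the token matched at `in` */
        int is_integer = 0;  /* true when the rule -?[0-9]+ applies */
        int written;

        if (isalpha((unsigned char)in[0]) && isalnum((unsigned char)in[1])) {
            /* [A-Za-z][A-Za-z0-9]+ : a letter followed by letters or digits */
            len = 2;
            while (isalnum((unsigned char)in[len]))
                len++;
        } else {
            /* -?[0-9.]+ : optional sign, then digits and dots */
            size_t sign = (in[0] == '-');
            size_t n = sign;
            int has_dot = 0;
            while (isdigit((unsigned char)in[n]) || in[n] == '.') {
                has_dot |= (in[n] == '.');
                n++;
            }
            if (n > sign) {
                len = n;
                is_integer = !has_dot;  /* no dot: the first rule wins the tie */
            }
        }

        if (is_integer) {
            long k = strtol(in, NULL, 10);
            written = snprintf(out + o, cap - o, "%ld", k % 7 == 0 ? k + 3 : k);
        } else {
            if (len == 0)
                len = 1;  /* default rule: copy one unmatched character */
            written = snprintf(out + o, cap - o, "%.*s", (int)len, in);
        }
        if (written < 0 || (size_t)written >= cap - o)
            break;  /* output buffer full: stop, out stays NUL-terminated */
        o += (size_t)written;
        in += len;
    }
    return out;
}
```

Called on the string "14 49.63 X7 -21 8" with a 64-byte buffer, the sketch changes only 14 and -21, since 49.63 and X7 are caught by the two ECHO rules.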
For an example of statistics gathering, here is a program which histograms the lengths of words, where a
word is defined as a string of letters.
        int lengs[100];
%%
[a-z]+  lengs[yyleng]++;
.       |
\n      ;
%%
int yywrap()
{
        int i;
        printf("Length No. words\n");
        for (i = 0; i < 100; i++)
                if (lengs[i] > 0)
                        printf("%5d%10d\n", i, lengs[i]);
        return(1);
}
This program accumulates the histogram, while producing no output. At the end of the input it prints
the table. The final statement return(1); indicates that Lex is to perform wrapup. If yywrap returns zero
(false), it implies that further input is available and the program is to continue reading and processing.
Providing a yywrap that never returns true causes an infinite loop.
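The same accumulate-then-report structure can be sketched by hand in plain C, without Lex. The names count_word_lengths, print_histogram and MAX_LEN are assumptions for illustration. Unlike the Lex source, this sketch ignores words of MAX_LEN or more letters instead of indexing past the end of the array.

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_LEN 100  /* same size as the lengs array in the Lex source */

/* Count runs of lower-case letters in `text`, as the rule [a-z]+ would. */
void count_word_lengths(const char *text, int lengs[MAX_LEN])
{
    size_t run = 0;  /* length of the current run of lower-case letters */

    for (;; text++) {
        if (islower((unsigned char)*text)) {
            run++;
            continue;
        }
        /* Any other character (or the end of the string) ends the word. */
        if (run > 0 && run < MAX_LEN)
            lengs[run]++;  /* plays the role of lengs[yyleng]++ */
        run = 0;
        if (*text == '\0')
            break;
    }
}

/* Print the table, as yywrap does at the end of the input. */
void print_histogram(const int lengs[MAX_LEN])
{
    int i;

    printf("Length No. words\n");
    for (i = 0; i < MAX_LEN; i++)
        if (lengs[i] > 0)
            printf("%5d%10d\n", i, lengs[i]);
}
```

The array passed to count_word_lengths must be zero-initialized by the caller; print_histogram is then called once, after all input has been counted.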
As a larger example, here are some parts of a program written by N. L. Schryer to convert double
precision Fortran to single precision Fortran. Because Fortran does not distinguish upper and lower case
letters, this routine begins by defining a set of classes including both cases of each letter:
a [aA]
b [bB]
c [cC]
...
z [zZ]
An additional class recognizes white space:
W [ \t]*
The first rule changes ``double precision'' to ``real'', or ``DOUBLE PRECISION'' to ``REAL''.
{d}{o}{u}{b}{l}{e}{W}{p}{r}{e}{c}{i}{s}{i}{o}{n} {
printf(yytext[0]=='d'? "real" : "REAL");
}
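As a rough illustration of what this pattern and its action do, the following plain C sketch matches the keyword without regard to case, allows optional blanks and tabs between the two words (the class W), and picks the replacement from the case of the first letter. The function names match_double_precision and single_precision_keyword are hypothetical and are not part of the original program.

```c
#include <ctype.h>
#include <stddef.h>

/* Return the number of characters matched at the start of `s` by
 * {d}{o}{u}{b}{l}{e}{W}{p}{r}{e}{c}{i}{s}{i}{o}{n}, or 0 if there is no match. */
size_t match_double_precision(const char *s)
{
    static const char first[] = "double";
    static const char second[] = "precision";
    size_t i = 0, k;

    for (k = 0; first[k] != '\0'; k++, i++)
        if (tolower((unsigned char)s[i]) != first[k])
            return 0;
    while (s[i] == ' ' || s[i] == '\t')  /* W is [ \t]*, so it may be empty */
        i++;
    for (k = 0; second[k] != '\0'; k++, i++)
        if (tolower((unsigned char)s[i]) != second[k])
            return 0;
    return i;
}

/* Choose the replacement the same way the action does: by the first letter. */
const char *single_precision_keyword(const char *matched)
{
    return matched[0] == 'd' ? "real" : "REAL";
}
```

Note that, like the action in the rule, the sketch looks only at the first letter, so mixed-case input such as "Double Precision" is replaced by "REAL".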
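Care is taken throughout this program to preserve the case (upper or lower) of the original program. The
conditional operator is used to select the proper form of the keyword. The next rule copies continuation
card indications to avoid confusing them with constants:
^"     "[^ 0] ECHO;
In the regular expression, the quotes surround the blanks. It is interpreted as ``beginning of line, then
five blanks, then anything but blank or zero.'' Note the two different meanings of ^. There follow some
rules to change double precision constants to ordinary floating constants.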
[0-9]+{W}{d}{W}[+-]?{W}[0-9]+ |
[0-9]+{W}"."{W}{d}{W}[+-]?{W}[0-9]+ |
"."{W}[0-9]+{W}{d}{W}[+-]?{W}[0-9]+ {
        /* convert constants */
        for (p = yytext; *p != 0; p++)
                {
                if (*p == 'd' || *p == 'D')
                        *p += 'e' - 'd';
                }
        ECHO;
        }
After the floating point constant is recognized, it is scanned by the for loop to find the letter d or D. The
program then adds 'e'-'d', which converts it to the next letter of the alphabet. The modified constant,
now single-precision, is written out again. There follow a series of names which must be respelled to
remove their initial d. By using the array yytext the same action suffices for all the names (only a sample
of a rather long list is given here).
{d}{s}{i}{n} |
{d}{c}{o}{s} |
{d}{s}{q}{r}{t} |
{d}{a}{t}{a}{n} |
...
{d}{f}{l}{o}{a}{t} printf("%s",yytext+1);
Another list of names must have initial d changed to initial a:
{d}{l}{o}{g} |
{d}{l}{o}{g}10 |
{d}{m}{i}{n}1 |
{d}{m}{a}{x}1 {
yytext[0] += 'a' - 'd';
ECHO;
}
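The in-place rewrites used by the last three rules can be tried outside Lex with a small plain C sketch. The function names are hypothetical, and each function takes an ordinary string where the Lex actions use yytext. Adding a difference of character codes works for both cases because 'E'-'D' equals 'e'-'d' and 'A'-'D' equals 'a'-'d' in ASCII.

```c
#include <stdio.h>

/* Change the exponent letter of a double precision constant: d -> e, D -> E. */
void exponent_to_single(char *text)
{
    char *p;

    for (p = text; *p != '\0'; p++)
        if (*p == 'd' || *p == 'D')
            *p += 'e' - 'd';
}

/* Drop the initial d, as printf("%s", yytext+1) does: dsin -> sin. */
const char *drop_initial_d(const char *text)
{
    return text + 1;
}

/* Replace the initial d by a, keeping its case: dlog -> alog, DLOG -> ALOG. */
void initial_d_to_a(char *text)
{
    text[0] += 'a' - 'd';
}
```

These helpers assume, as the rules guarantee, that the string really is a matched constant or a name starting with d or D; they do no checking of their own.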
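And one routine must have initial d changed to initial r:
{d}1{m}{a}{c}{h} {
yytext[0] += 'r' - 'd';
ECHO;
}
To avoid such names as dsinx being detected as instances of dsin, some final rules pick up longer
words as identifiers and copy some surviving characters: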
[A-Za-z][A-Za-z0-9]* |
[0-9]+ |
\n |
. ECHO;
Note that this program is not complete; it does not deal with the spacing problems in Fortran or with
the use of keywords as identifiers.
Solution:
Go to Tools and build the Lex file, or open cmd and change to the directory where you saved your file:
cd path where you have saved your file