Problem Set 1
In this first written assignment, you'll get the chance to play around with the various
constructions that come up when doing lexical analysis. This problem set will start off
by reviewing those constructions, then ask you to think critically about them. You'll be
considering different algorithms for lexical analysis, along with some of the limitations
of what we presented in class. In particular, you'll explore the various algorithms for
generating automaton-driven scanners and for resolving conflicts that might arise in
them.
This assignment is due on Friday, July 6. I suggest starting early while the material is
still fresh in your mind. Feel free to email us with questions or to drop by office hours.
Overall, this assignment is worth 10% of your total grade in the course. Each problem is
weighted equally.
Submission instructions
There are two ways to submit this assignment:
1. Submit a physical copy of your answers in the filing cabinet in the open space
near the handout hangout in the Gates building. If you haven't been there before,
it's right inside the entrance labeled “Stanford Engineering Venture Fund
Laboratories.”
2. Send an email with an electronic copy of your answers to the staff list at
cs143sum1112staff@lists.stanford.edu. Please include the string [WA1]
somewhere in the subject line so that it's easier for us to find your submission.
Good luck!
[Figure: three finite automata over the alphabet {a, b}, labeled (i), (ii), and (iii); (i) has states q0 through q2, (ii) has states q0 through q3, and (iii) has states q0 through q5 with ε-transitions.]
%%
"for" { return T_For; }
[A-Za-z_][A-Za-z0-9_]* { return T_Identifier; }
We saw how the string fort could be tokenized in nine possible ways, based on how we
chose to apply the regular expressions. In this case, the maximal-munch algorithm dictates
that we would scan the string as the identifier fort.
However, in some cases we may have a set of regular expressions for which it is possible
to tokenize a particular input string, but for which the maximal-munch algorithm will
not be able to break the input into tokens. Give an example of such a set of regular
expressions and an input string such that
• The string can be broken apart into substrings, where each substring matches one
of the regular expressions, but
• The maximal-munch algorithm will fail to break the string apart in a way where
each piece matches one of the regular expressions.
Additionally, explain how the string can be tokenized and why maximal-munch will fail
to tokenize it.
* If you want to, you can actually use flex to compile and run this scanner to check your answer.
However, please be sure that you understand why the output is as it is.
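If you do want to experiment with flex, one possible way to flesh the fragment out into something compilable is sketched below. The file name check.l, the %option noyywrap setting, the whitespace rule, and printing token names instead of returning the T_For/T_Identifier values are all choices made just for this experiment, not part of the assignment.

%option noyywrap
%%
"for"                   { printf("T_For\n"); }
[A-Za-z_][A-Za-z0-9_]*  { printf("T_Identifier(%s)\n", yytext); }
[ \t\n]                 { /* skip whitespace so interactive input is easy to test */ }
.                       { printf("unexpected character: %s\n", yytext); }
%%
int main(void) {
    yylex();   /* scan stdin to EOF, printing one line per token */
    return 0;
}

Building it with flex check.l && cc lex.yy.c -o check and then running echo fort | ./check should print T_Identifier(fort), the maximal-munch result discussed earlier.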
* Θ(n²) is similar to O(n²), except that it implies a tight bound instead of an upper bound. If a function is
O(n²), it could also be O(n). However, if a function is Θ(n²), it grows asymptotically at the rate of n².
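For reference, the distinction can be written out symbolically as below; the specific functions 3n² + 5n and 5n in the comments are just illustrations, not taken from any problem.

% Example: f(n) = 3n^2 + 5n is Theta(n^2), since 3n^2 <= 3n^2 + 5n <= 4n^2 for all n >= 5.
% Example: g(n) = 5n is O(n^2) but not Theta(n^2), since no c > 0 gives c n^2 <= 5n for all large n.
f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \le c\, g(n) \text{ for all } n \ge n_0
f(n) \in \Theta(g(n)) \iff \exists\, c_1, c_2 > 0,\ n_0 \text{ such that } c_1\, g(n) \le f(n) \le c_2\, g(n) \text{ for all } n \ge n_0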