320 Problem Set 7

This document is Problem Set 7 for CS103. Its eight problems, plus one extra-credit problem, explore regular expressions, finite automata, regular languages, and closure properties and their limits. The problems develop skills such as designing regular expressions, proving that languages are (or aren't) regular, and understanding the Myhill-Nerode theorem. The problem set is due on November 18.
CS103 Handout 32

Fall 2016 November 11, 2016

Problem Set 7
What can you do with regular expressions? What are the limits of regular languages? On this
problem set, you'll find out!
As always, please feel free to drop by office hours, ask on Piazza, or send us emails if you have any
questions. We'd be happy to help out.
Good luck, and have fun!
Due Friday, November 18 at the start of class.

Skills developed in this problem set:


• Designing and testing regular expressions.
• Switching between different representations of regular languages to prove results about those
languages.
• Using the state-elimination algorithm to convert finite automata into regular expressions.
• Understanding the definition of distinguishability at a conceptual level.
• Using the Myhill-Nerode theorem to prove that languages are or are not regular.
• Building an intuition for what makes a language regular or nonregular.
• Developing a nuanced understanding of the proof of the Myhill-Nerode theorem and using it
to generalize the proof of that result.
• Exploring the nuances of closure properties and their limits.
Problem One: Designing Regular Expressions (14 Points)
Below is a list of alphabets and languages over those alphabets. For each language, write a regular
expression for that language.
Please use our online tool to design, test, and submit your regular expressions. Typed or handwritten
solutions will not be accepted. To use it, visit the CS103 website and click the “Regex Editor” link
under the “Resources” header. As before, make a note in your GradeScope submission of which team
member submitted your answers to this question so that we know where to look. Also, as a reminder,
please test your submissions thoroughly, since we'll be grading them with an autograder.
i. Let Σ = {a, b} and let L = { w ∈ Σ* | w does not contain ba as a substring }. Write a regular
expression for L.
ii. Let Σ = {a, b} and let L = { w ∈ Σ* | w does not contain bb as a substring }. Write a regular
expression for L.
iii. Suppose you are taking a walk with your dog on a leash of length two. Let Σ = {y, d} and let
L = { w ∈ Σ* | w represents a walk with your dog on a leash where you and your dog both end up
at the same location }. For example, the string yyddddyy is in L because you and your dog are
never more than two steps apart and both of you end up four steps ahead of where you started;
similarly, ddydyy ∈ L. However, yyyyddd ∉ L, since halfway through your walk you are three
steps ahead of your dog; ddyd ∉ L, because your dog ends up two steps ahead of you; and
ddyddyyy ∉ L, because at one point in your walk your dog is three steps ahead of you. Write a regular
expression for L.
iv. Let Σ = {a, b} and let L = { w ∈ Σ* | w ≠ ab }. Write a regular expression for L.
v. Let Σ = {M, D, C, L, X, V, I} and let L = { w ∈ Σ* | w is a number less than 1,000 represented in
Roman numerals }. For example, CMXCIX ∈ L, since it represents the number 999, as are the strings
L (50), VIII (8), DCLXVI (666), CXXXVII (137), and CDXII (412). However, we have VIIII ∉ L
(you'll never have four I's in a row; use IX or IV instead), that MI ∉ L (it's a Roman numeral, but
it's for a number that's too large), that VX ∉ L (this isn't a valid Roman numeral), and that IM ∉ L
(the notation of using a smaller digit to subtract from a larger one only lets you use I to prefix V
and X, or X to prefix L and C, or C to prefix D and M). The Romans didn't have a way of expressing
the number 0, so to make your life easier we'll say that ε ∈ L and that the empty string represents
0. (Oh, those silly Romans.) Write a regular expression for L.
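Since the autograder checks exact membership, it can help to have an independent membership test to compare your regular expressions against. The following Python sketch is one way to write such tests for parts (iii) and (v). The function names are hypothetical, and the Roman numeral test assumes that L consists of exactly the standard numerals for 0 through 999 (with ε standing for 0), which matches the examples above but is not spelled out in full by the problem statement.

```python
def is_leash_walk(w: str, leash: int = 2) -> bool:
    """Reference test for part (iii): 'y' means you step, 'd' means your dog steps."""
    gap = 0  # your position minus your dog's position
    for step in w:
        if step == "y":
            gap += 1
        elif step == "d":
            gap -= 1
        else:
            return False  # character outside the alphabet {y, d}
        if abs(gap) > leash:
            return False  # the two of you are farther apart than the leash allows
    return gap == 0  # both of you must end at the same location


def to_roman(n: int) -> str:
    """Standard Roman numeral for 0 <= n < 1000; 0 maps to the empty string."""
    symbols = [(900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
               (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"),
               (4, "IV"), (1, "I")]
    pieces = []
    for value, symbol in symbols:
        while n >= value:
            pieces.append(symbol)
            n -= value
    return "".join(pieces)


# Assumption: L is exactly this set of 1,000 strings.
ROMAN_BELOW_1000 = {to_roman(n) for n in range(1000)}


def is_roman_below_1000(w: str) -> bool:
    """Reference test for part (v)."""
    return w in ROMAN_BELOW_1000
```

For instance, `is_leash_walk("ddydyy")` is true and `is_leash_walk("ddyd")` is false, while `is_roman_below_1000("CDXII")` is true and `is_roman_below_1000("VX")` is false, in agreement with the examples in the problem statement.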

Problem Two: Finite and Cofinite Languages (6 Points)


A language L is called finite if L contains finitely many strings. More precisely, a language L is a finite
language if |L| is a natural number. A language L is called cofinite if its complement is a finite language;
that is, L is cofinite if |L̅| is a natural number, where L̅ denotes the complement of L.
i. Prove that any finite language is regular.
ii. Prove that any cofinite language is regular.
Problem Three: State Elimination (6 Points)
The state elimination algorithm gives a way to transform a finite automaton (DFA or NFA) into a regular
expression. It's a really beautiful algorithm once you get the hang of it, so we thought that we'd let you try
it out on a particular example.
Let Σ = {a, b} and let L = { w ∈ Σ* | w has an even number of a's and an even number of b's}. Below is a
finite automaton for L that we've prepared for the state elimination algorithm by adding in a new start
state qstart and a new accept state qacc:

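[Figure: the prepared automaton, with states qstart, q0, q1, q2, q3, and qacc. The labels that survive indicate ε-transitions from qstart to q0 and from q0 to qacc, a-transitions in both directions between q0 and q1 and between q2 and q3, and b-transitions in both directions between q0 and q2 and between q1 and q3.]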

We'd like you to use the state elimination algorithm to produce a regular expression for L.
i. Run two steps of the state elimination algorithm on the above automaton. Specifically, first remove
state q₁, then remove state q₂. Show your result at this point.
ii. Finish the state elimination algorithm by removing q₃, then q₀. What regular expression do you get
for L?
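State elimination tends to produce long expressions, so a mechanical check of the result can be reassuring. The sketch below compares a candidate regular expression against a direct membership test on every string up to a fixed length. It assumes you transcribe your answer into Python's `re` syntax (using `|` for union), which differs from the notation used in lecture; the helper names are hypothetical. Agreement on short strings is evidence of correctness, not a proof.

```python
import re
from itertools import product


def strings_up_to(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0 through max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def even_as_and_bs(w: str) -> bool:
    """Direct test for L: an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


def disagreements(pattern: str, in_language, alphabet: str = "ab", max_len: int = 8):
    """List the strings on which the regex and the membership test differ."""
    compiled = re.compile(pattern)
    return [w for w in strings_up_to(alphabet, max_len)
            if (compiled.fullmatch(w) is not None) != in_language(w)]
```

As an example of a deliberately wrong candidate, `disagreements("(aa|bb)*", even_as_and_bs, max_len=4)` returns `['abab', 'abba', 'baab', 'baba']`, the strings of length at most four that belong to L but that the candidate misses. A correct expression should produce an empty list.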

Problem Four: Distinguishable Strings (6 Points)


The Myhill-Nerode theorem is one of the trickier and more nuanced theorems we've covered this quarter.
This question explores what the theorem means and, importantly, what it doesn't mean.
Let Σ = {a, b} and let L = { w ∈ Σ* | |w| is even }.
i. Show that L is a regular language.
ii. Prove that there is an infinite set S ⊆ Σ* where there are infinitely many pairs of distinct strings
x, y ∈ S such that x ≢L y.
iii. Prove that there is no infinite set S ⊆ Σ* where all pairs of distinct strings x, y ∈ S satisfy x ≢L y.
The distinction between parts (ii) and (iii) is important for understanding the Myhill-Nerode theorem. A
language is nonregular not if you can find infinitely many pairs of distinguishable strings, but rather if you
can find infinitely many strings that are all pairwise distinguishable. This is a subtle distinction, but it's an
important one!
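To make the definition concrete: x ≢L y means there is some string w for which exactly one of xw and yw belongs to L. The sketch below searches for such a w by brute force, using the language from this problem. The function names and the search bound are assumptions made for illustration. Finding a suffix shows that two strings are distinguishable; failing to find one within the bound proves nothing on its own.

```python
from itertools import product


def even_length(w: str) -> bool:
    """Membership test for L = { w | |w| is even }."""
    return len(w) % 2 == 0


def distinguishing_suffix(x, y, in_language, alphabet="ab", max_len=4):
    """Return a suffix w with exactly one of xw, yw in the language,
    or None if no suffix of length at most max_len works."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if in_language(x + w) != in_language(y + w):
                return w
    return None
```

Here `distinguishing_suffix("a", "", even_length)` returns the empty string, since ε ∈ L but a ∉ L, whereas `distinguishing_suffix("a", "b", even_length)` returns `None`.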
Problem Five: Balanced Parentheses (12 Points)
Let Σ = {(, )} and consider the language L₁ = { w ∈ Σ* | w is a string of balanced parentheses }. For
example, we have () ∈ L₁, (()) ∈ L₁, (()())() ∈ L₁, ε ∈ L₁, and (())((()())) ∈ L₁, but )( ∉ L₁,
(() ∉ L₁, and ((()))) ∉ L₁. This question explores properties of this language.
i. Prove that L₁ is not a regular language. One consequence of this result – which you don't need to
prove – is that most languages that support some sort of nested parentheses, such as most
programming languages and HTML, aren't regular and so can't be parsed using regular expressions.
Let's say that the nesting depth of a string of balanced parentheses is the maximum number of unmatched
open parentheses at any point inside the string. For example, the string ((())) has nesting depth three,
the string (()())() has nesting depth two, and the string ε has nesting depth zero.
Consider the language L₂ = { w ∈ Σ* | w is a string of balanced parentheses and w's nesting depth is at
most four }. For example, ((())) ∈ L₂, (()()) ∈ L₂, and (((())))(()) ∈ L₂, but ((((())))) ∉ L₂
because although it's a string of balanced parentheses, the nesting goes five levels deep.
ii. Design a DFA for L₂, showing that L₂ is regular. A consequence of this result – which, again, you
don't need to prove – is that while you can't parse general programs or HTML with regular
expressions, you can (in principle) parse programs with low nesting depth or HTML documents
without deeply-nested tags using regular expressions. Please submit this DFA using the DFA
editor on the course website.
iii. Look back at your proof from part (i) of this problem. Imagine that you were to take that exact
proof and blindly replace every instance of “L₁” with “L₂.” This would give you an (incorrect) proof
that L₂ is nonregular (we know it has to be wrong because L₂ is indeed regular). Where would the
error be in that proof? Be as specific as possible.
iv. Without making reference to DFAs, NFAs, regular expressions, or the Myhill-Nerode theorem,
explain, intuitively, why L₁ is nonregular while L₂ is regular.
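The definitions of balanced strings and nesting depth used in this problem can be written out as a short Python sketch, which may be handy for checking test cases before submitting a DFA. The function names are hypothetical, and the code is only a membership test, not a solution to any part of the problem.

```python
def nesting_depth(w: str):
    """Return the nesting depth of w if w is balanced, otherwise None."""
    unmatched = 0  # open parentheses not yet matched at this point
    deepest = 0    # largest value of `unmatched` seen so far
    for ch in w:
        if ch == "(":
            unmatched += 1
            deepest = max(deepest, unmatched)
        elif ch == ")":
            unmatched -= 1
            if unmatched < 0:
                return None  # a close parenthesis with nothing to match
        else:
            return None  # character outside the alphabet
    return deepest if unmatched == 0 else None


def is_balanced(w: str) -> bool:
    """Membership test for the language of balanced parentheses."""
    return nesting_depth(w) is not None


def is_balanced_with_depth_at_most(w: str, limit: int = 4) -> bool:
    """Membership test for balanced strings whose nesting depth is at most `limit`."""
    depth = nesting_depth(w)
    return depth is not None and depth <= limit
```

For example, `nesting_depth("(()())()")` is 2, `nesting_depth(")(")` is `None`, and `is_balanced_with_depth_at_most("((((()))))")` is false because that string nests five levels deep.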

Problem Six: Tautonyms (8 Points)


A tautonym is a word that consists of the same string repeated twice. For example, the words “bulbul,”
“caracara,” and “dikdik” are all tautonyms (the first two are species of birds, and the last is the cutest
animal you'll ever see), as is the word “hotshots” (people who aren't very fun to be around). Let Σ = {a, b}
and consider the following language:
L = { ww | w ∈ Σ* }
This is the language of all tautonyms over Σ. Below is an incorrect proof that L is not regular:
Proof: Let S = { aⁿ | n ∈ ℕ }. This set is infinite because it contains one string for each natural
number. We claim that any two strings in S are distinguishable relative to L. To see this,
consider any two distinct strings aⁿ and aᵐ in the set S. Then aⁿaⁿ ∈ L but aᵐaⁿ ∉ L, so aⁿ ≢L aᵐ.
This means that S is an infinite set of strings that are pairwise distinguishable to L. Therefore,
by the Myhill-Nerode theorem, L is not regular. ■
Although this language is indeed nonregular, this proof is incorrect.
i. What's wrong with this proof? Be specific.
ii. Although the above proof is incorrect, the language L isn't regular. Prove this.
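When checking individual claims of the form "this string is (or isn't) in L," a direct membership test can help. The following is a minimal Python sketch of the definition L = { ww | w ∈ Σ* }; the function name is hypothetical.

```python
def is_tautonym(s: str) -> bool:
    """True if s = ww for some string w (the empty string counts, with w = ε)."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:]
```

For example, `is_tautonym("dikdik")` and `is_tautonym("abab")` are true, while `is_tautonym("abba")` and `is_tautonym("aba")` are false.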
Problem Seven: State Lower Bounds (6 Points)
The Myhill-Nerode theorem we proved in lecture is actually a special case of a more general theorem
about regular languages that can be used to prove lower bounds on the number of states necessary to
construct a DFA for a given language.
i. Let L be a language over Σ. Suppose there's a finite set S such that any two distinct strings x, y ∈ S
are distinguishable relative to L (that is, x ≢L y). Prove that any DFA for L must have at least |S|
states. (You sometimes hear this referred to as lower-bounding the size of any DFA for L.)
Consider this language from Problem Two, part (iii) of Problem Set Six:
L₁ = { w ∈ {a, b}* | w contains at least two b's with exactly five characters between them }
It's possible to build a seven-state NFA for this particular language, but any DFA for this language will
have to have a huge number of states.
ii. Let S = {a, b}⁶. Prove that any two distinct strings in S are distinguishable relative to L₁. This
shows that any DFA for L₁ must have at least 64 states, since there are 64 strings in S.
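Before writing the proof for part (ii), you may want to convince yourself that the claim is true. The sketch below does so by brute force: for each string x in S it records which suffixes w (of length at most six, an arbitrary bound chosen for this sketch) put xw in L₁, and then counts how many different records appear. Two strings with different records are distinguishable, so 64 different records means all 64 strings are pairwise distinguishable. The names are hypothetical, and this is a sanity check rather than a proof.

```python
from itertools import product


def two_bs_five_apart(w: str) -> bool:
    """True if w has two b's with exactly five characters between them,
    that is, w[i] == w[i + 6] == 'b' for some index i."""
    return any(w[i] == "b" and w[i + 6] == "b" for i in range(len(w) - 6))


def all_strings(alphabet: str, length: int):
    """List every string over `alphabet` of exactly the given length."""
    return ["".join(chars) for chars in product(alphabet, repeat=length)]


def signature(x: str, suffixes) -> tuple:
    """Record, for each suffix w, whether xw is in the language."""
    return tuple(two_bs_five_apart(x + w) for w in suffixes)


S = all_strings("ab", 6)
SUFFIXES = [w for n in range(7) for w in all_strings("ab", n)]
SIGNATURES = {signature(x, SUFFIXES) for x in S}
```

With these definitions, `len(S)` and `len(SIGNATURES)` are both 64.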

Problem Eight: Closure Properties Revisited (9 Points)


When building up the regular expressions, we explored several closure properties of the regular languages.
This problem explores some of their nuances.
The regular languages are closed under complementation: If L is regular, so is L̅.
i. Prove or disprove: the nonregular languages are closed under complementation.
The regular languages are closed under union: If L₁ and L₂ are regular, so is L₁ ∪ L₂.
ii. Prove or disprove: the nonregular languages are closed under union.
We know that the union of any two regular languages is regular. Using induction, we can show that the
union of any finite number of regular languages is also regular. As a result, we say that the regular
languages are closed under finite union.
An infinite union is the union of infinitely many sets. For example, the rational numbers can be expressed
as the infinite union { x/1 | x ∈ ℤ } ∪ { x/2 | x ∈ ℤ } ∪ { x/3 | x ∈ ℤ } ∪ … out to infinity.
iii. Prove or disprove: the regular languages are closed under infinite union.
Extra Credit Problem: Fooling Sets (1 Point Extra Credit)
In Problem Seven, you saw how to use distinguishability to lower-bound the size of DFAs for a particular
language. Unfortunately, distinguishability is not a powerful enough technique to lower-bound the sizes of
NFAs. In fact, it's in general quite hard to bound NFA sizes; there's a $1,000,000 prize for anyone who
finds a polynomial-time algorithm that, given an arbitrary NFA, converts it to the smallest possible
equivalent NFA!
Although it's generally difficult to lower-bound the sizes of NFAs, there are some techniques
we can use to find such lower bounds. Let L be a language over Σ. A generalized fooling
set for L is a set ℱ ⊆ Σ* × Σ* with the following properties:
• For any (x, y) ∈ ℱ, we have xy ∈ L.
• For any distinct pairs (x₁, y₁), (x₂, y₂) ∈ ℱ, we have x₁y₂ ∉ L or x₂y₁ ∉ L (this is an inclusive OR).
As an example, consider this language L₁:
L₁ = { w ∈ {a, b}* | w contains at least two b's with exactly five characters between them }
The following set is a generalized fooling set for L₁:
ℱ₁ = { (b, aaaaab), (ba, aaaab), (baa, aaab), (baaa, aab), (baaaa, ab), (baaaaa, b), (baaaaab, ε) }
It's worth investigating why, exactly, this is a generalized fooling set for L₁.
Prove that if there is a generalized fooling set ℱ for some language L that contains n pairs of strings, then
any NFA for L must have at least n states.
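One way to investigate why ℱ₁ is a generalized fooling set for L₁ is to check the two defining properties mechanically. The Python sketch below does this for a finite list of pairs; the function names are hypothetical, and the list is assumed to contain no repeated pairs. The check confirms the example but says nothing about the general statement you are asked to prove.

```python
def two_bs_five_apart(w: str) -> bool:
    """True if w has two b's with exactly five characters between them."""
    return any(w[i] == "b" and w[i + 6] == "b" for i in range(len(w) - 6))


# The set F1 from the handout, with the empty string standing for ε.
F1 = [("b", "aaaaab"), ("ba", "aaaab"), ("baa", "aaab"), ("baaa", "aab"),
      ("baaaa", "ab"), ("baaaaa", "b"), ("baaaaab", "")]


def is_generalized_fooling_set(pairs, in_language) -> bool:
    """Check both fooling-set properties for a list of distinct pairs."""
    # Property 1: every pair (x, y) must satisfy xy in L.
    if not all(in_language(x + y) for x, y in pairs):
        return False
    # Property 2: for distinct pairs, x1y2 or x2y1 (or both) must be outside L.
    for i, (x1, y1) in enumerate(pairs):
        for x2, y2 in pairs[i + 1:]:
            if in_language(x1 + y2) and in_language(x2 + y1):
                return False
    return True
```

Here `is_generalized_fooling_set(F1, two_bs_five_apart)` is true. By contrast, the two pairs `("b", "aaaaab")` and `("bb", "aaaaab")` do not form a generalized fooling set, because both cross-concatenations land in L₁.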
