CFG - A Twice B
1. (60 points)
(a) (5 points) Let Σ be the alphabet {a, b}. Give a context-free grammar for the language A1, where
$A_1 = \{ w \in \Sigma^* \mid \exists n \in \mathbb{Z}_{\ge 0} .\ w = a^n b^{2n} \}$
Note: in all problems, you do not need to write your grammar in CNF or any other “special” form. In
fact, you should write your rules so that it is clear why they generate the specified language, and if it is not
obvious, add a short explanation of the intuition behind your solution.
Solution:
S0 → ε | a S0 bb
This solution is “obvious” enough that it doesn’t require further explanation.
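Even an “obvious” grammar can be spot-checked by brute force. The sketch below (the helper names `a1_grammar_strings`, `in_a1`, and `grammar_matches_a1` are my own, not part of the assignment) enumerates every string the rule S0 → ε | a S0 bb can produce up to a length bound and compares that set with the definition of A1 on all strings over {a, b}. A bounded check like this is only a sanity test, not a proof.

```python
from itertools import product

def a1_grammar_strings(max_len: int) -> set[str]:
    """All strings of length <= max_len derivable from S0 -> ε | a S0 bb."""
    derived = set()
    w = ""                      # S0 -> ε
    while len(w) <= max_len:
        derived.add(w)
        w = "a" + w + "bb"      # wrap once more with S0 -> a S0 bb
    return derived

def in_a1(w: str) -> bool:
    """Direct test of the definition: w = a^n b^(2n) for some n >= 0."""
    n = len(w) // 3
    return len(w) % 3 == 0 and w == "a" * n + "b" * (2 * n)

def grammar_matches_a1(max_len: int) -> bool:
    """Compare the grammar with the definition on every string over {a, b} up to max_len."""
    by_definition = {
        "".join(t)
        for n in range(max_len + 1)
        for t in product("ab", repeat=n)
        if in_a1("".join(t))
    }
    return by_definition == a1_grammar_strings(max_len)

print(sorted(a1_grammar_strings(9), key=len))  # ['', 'abb', 'aabbbb', 'aaabbbbbb']
print(grammar_matches_a1(12))                   # True
```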
(b) (10 points) Describe a PDA that recognizes language A1. You can just draw a transition diagram where
edges are labeled as in Sipser.
Solution:
See the PDA of Figure 1.
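Figure 1 is not reproduced in this text, so the following is only a sketch of one possible PDA for A1, simulated in Python; the stack alphabet {$, X} and the two state names are assumptions, not necessarily those of Figure 1. The idea is to push two X's for every a and pop one X for every b, using the states to forbid an a after the first b.

```python
def pda_a1_accepts(w: str) -> bool:
    """Simulate one possible deterministic PDA for A1 = {a^n b^(2n) | n >= 0}.

    Assumed design, not necessarily the PDA of Figure 1: push '$', push two
    X's for every a (state "push"), then pop one X for every b (state "pop").
    """
    stack = ["$"]                      # bottom-of-stack marker
    state = "push"
    for ch in w:
        if state == "push" and ch == "a":
            stack.extend(["X", "X"])   # each a must be matched by two b's
        elif ch == "b" and stack[-1] == "X":
            state = "pop"              # after the first b, no more a's are allowed
            stack.pop()
        else:
            return False               # no transition applies: reject
    return stack == ["$"]              # accept iff every X was matched

for s in ["", "abb", "aabbbb", "aabbb", "abba", "ba"]:
    print(repr(s), pda_a1_accepts(s))
```

Pushing the two-symbol string XX in one move uses Sipser's shorthand for pushing a string; a strict diagram would route it through an intermediate state.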
(c) (15 points) Let Σ be the alphabet {a, b}. Give a context-free grammar for the language A2, where
A2 = {w ∈ Σ∗ | #a(w) = 2#b(w)}
where #a(w) denotes the number of a’s in w and likewise for #b(w). My grammar is fairly short, but it
requires a bit of explanation to see that it is correct. Make sure that you include enough of an explanation
of why your grammar is correct that your solution is convincing.
Solution:
S0 → ε | S0 S0
   | a S0 a S0 b
   | a S0 b S0 a
   | b S0 a S0 a
Every string generated by this grammar has twice as many a’s as b’s: ε does, concatenating two such strings preserves the property, and each of the other three rules adds exactly two a’s and one b.
Now, I’ll show that every string that has twice as many a’s as b’s is generated by this grammar. Let
f(s) = #a(s) − 2#b(s), so that s ∈ A2 exactly when f(s) = 0.
Let w be an arbitrary string in A2. I’ll sketch a proof, by induction on |w|, that w is generated by the grammar.
If w = ε, then w is generated by the derivation S0 ⇒ ε.
Otherwise, if we can find non-empty strings x, y ∈ A2 such that w = xy, then S0 ⇒ S0 S0 ⇒* xy = w, since S0 ⇒* x and S0 ⇒* y by the induction hypothesis.
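Otherwise, f(x) ≠ 0 for every x such that xy = w for some non-empty strings x and y.
If f(x) > 0 for all x as described above, then w must be of the form aaub for some string u ∈ Σ*: the first two symbols must be a’s, since a prefix b or ab would make f negative, and the last symbol must be b, since only a b can lower f from a positive value to f(w) = 0. Moreover, f(u) = f(w) − f(aa) − f(b) = 0 − 2 + 2 = 0. Thus u ∈ A2 and we get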
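$$
\begin{aligned}
S_0 &\Rightarrow a\,S_0\,a\,S_0\,b && S_0 \to a S_0 a S_0 b\\
    &\Rightarrow a\,a\,S_0\,b && S_0 \to \varepsilon \text{ (first } S_0\text{)}\\
    &\Rightarrow^* a\,a\,u\,b && S_0 \Rightarrow^* u \text{ by the induction hypothesis}\\
    &= w
\end{aligned}
$$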
Thus, w is generated by the grammar.
If f(x) < 0 for all x as described above, then an argument analogous to the one above shows that
w is of the form buaa for some u ∈ A2, and w is generated using the rule S0 → b S0 a S0 a with the second S0 replaced by ε.
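Otherwise, f(x) must change sign as we consider longer prefixes x of w, but f(x) is never 0. Note
that if f(x · c) > f(x) for some c ∈ Σ, then c = a and f(x · c) = f(x) + 1, so f cannot go from negative to positive without taking the value 0. Thus, the sign change
in f must be from positive to negative, and it happens exactly once, when a b takes f from 1 to −1. We conclude that w has the form aubva, where au is the prefix with f(au) = 1 just before that b (so f(u) = 0), and the final a raises f from −1 to f(w) = 0 (so f(v) = 0). Hence u, v ∈ A2 and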
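$$
\begin{aligned}
S_0 &\Rightarrow a\,S_0\,b\,S_0\,a && S_0 \to a S_0 b S_0 a\\
    &\Rightarrow^* a\,u\,b\,v\,a && S_0 \Rightarrow^* u \text{ and } S_0 \Rightarrow^* v \text{ by the induction hypothesis}\\
    &= w
\end{aligned}
$$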
Thus, w is generated by the grammar.
This completes the proof (sketch). The language generated by the grammar given above is A2.
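The induction argument can be spot-checked mechanically. The sketch below (the names `derives`, `in_a2`, and `grammar_matches_a2` are hypothetical helpers, not part of the original solution) turns each grammar rule into a memoized membership test and compares it with the definition of A2 on every string over {a, b} up to length 9. Restricting S0 → S0 S0 to non-empty parts loses nothing, because ε·w = w. This bounded check complements the proof; it does not replace it.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derives(w: str) -> bool:
    """True iff S0 =>* w in the grammar
    S0 -> ε | S0 S0 | a S0 a S0 b | a S0 b S0 a | b S0 a S0 a."""
    if w == "":
        return True                                   # S0 -> ε
    # S0 -> S0 S0 with both parts non-empty (recursion is on shorter strings)
    if any(derives(w[:k]) and derives(w[k:]) for k in range(1, len(w))):
        return True
    # The three rules with terminals, written as (first, middle, last) symbols
    for first, mid, last in (("a", "a", "b"), ("a", "b", "a"), ("b", "a", "a")):
        if len(w) >= 3 and w[0] == first and w[-1] == last:
            inner = w[1:-1]
            # choose where the middle terminal sits: inner = u + mid + v
            for k, ch in enumerate(inner):
                if ch == mid and derives(inner[:k]) and derives(inner[k + 1:]):
                    return True
    return False

def in_a2(w: str) -> bool:
    """Definition of A2: #a(w) = 2 #b(w)."""
    return w.count("a") == 2 * w.count("b")

def grammar_matches_a2(max_len: int) -> bool:
    """Exhaustively compare the grammar with A2 on all strings up to max_len."""
    return all(
        derives(s) == in_a2(s)
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("ab", repeat=n))
    )

print(grammar_matches_a2(9))  # True
```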
(d) (10 points) Describe a PDA that recognizes language A2. You can just draw a transition diagram where
edges are labeled as in Sipser.
Solution:
See the PDA of Figure 2.
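Figure 2 is not reproduced here. As a hedged sketch of one way such a PDA can work, assuming an encoding of my own choosing rather than that of Figure 2, the stack can hold the running value of f(prefix) = #a − 2#b from the proof above: |f| copies of A when f > 0, or of B when f < 0, above a $ marker. Reading an a adds 1 to f and reading a b subtracts 2; in a transition diagram, the second stack move for b would go through an intermediate state.

```python
from itertools import product

def pda_a2_accepts(w: str) -> bool:
    """Sketch of a PDA for A2 that keeps f(prefix) = #a - 2#b on the stack.

    Assumed encoding (not necessarily Figure 2): '$' at the bottom, then |f|
    copies of 'A' if f > 0 or of 'B' if f < 0. The two symbols never mix.
    """
    stack = ["$"]

    def add_one() -> None:
        if stack[-1] == "B":
            stack.pop()          # cancel one unit of negative f
        else:
            stack.append("A")    # record one more unit of positive f

    def subtract_one() -> None:
        if stack[-1] == "A":
            stack.pop()          # cancel one unit of positive f
        else:
            stack.append("B")    # record one more unit of negative f

    for ch in w:
        if ch == "a":
            add_one()
        elif ch == "b":
            subtract_one()       # a b lowers f by 2: two stack moves
            subtract_one()
        else:
            return False         # symbol outside Σ = {a, b}
    return stack == ["$"]        # accept iff f(w) = 0

def agrees_with_definition(max_len: int) -> bool:
    """Compare the sketch with #a(w) = 2 #b(w) on all strings over {a, b} up to max_len."""
    return all(
        pda_a2_accepts(s) == (s.count("a") == 2 * s.count("b"))
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("ab", repeat=n))
    )

print(agrees_with_definition(10))  # True
```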
(e) (10 points) Let Σ be the alphabet {a, b, c}. Give a context-free grammar for the language A3, where
$A_3 = \{ w \in \Sigma^* \mid \exists i, j \in \mathbb{Z}_{\ge 0} .\ w = a^i b^j c^{i+j} \}$
Solution:
S0 → ε | S1 | a S0 c
S1 → ε | b S1 c
S0 is the start variable.
This one merits a bit of explanation. The rules for S0 can derive ε (i.e. $a^0 b^0 c^0$) or strings of the form
$a^i S_1 c^i$ (equivalently, $a^i b^0 S_1 c^i$). Likewise, S1 derives strings of the form $b^j c^j$. Thus, the grammar
produces all strings of the form $a^i b^j c^j c^i = a^i b^j c^{i+j}$, and no others. This is the language A3.
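The explanation above also suggests a simple recognizer for this grammar: a leading a can only come from S0 → a S0 c, so matching a…c pairs can be stripped from the outside first, and whatever remains must come from S1 by stripping b…c pairs. The sketch below (helper names are my own) implements that idea and compares it with the definition of A3 on all strings over {a, b, c} up to length 7.

```python
import re
from itertools import product

def derives_s1(w: str) -> bool:
    """S1 -> ε | b S1 c: repeatedly strip an outer b ... c pair."""
    while w:
        if w[0] == "b" and w[-1] == "c":
            w = w[1:-1]
        else:
            return False
    return True                      # reached ε

def derives_s0(w: str) -> bool:
    """S0 -> ε | S1 | a S0 c.

    A leading a can only come from S0 -> a S0 c, so those pairs are stripped
    first; what remains must come from S1 (which also covers S0 -> ε).
    """
    while w and w[0] == "a" and w[-1] == "c":
        w = w[1:-1]
    return derives_s1(w)

def in_a3(w: str) -> bool:
    """Definition of A3: w = a^i b^j c^(i+j)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return m is not None and len(m.group(3)) == len(m.group(1)) + len(m.group(2))

def grammar_matches_a3(max_len: int) -> bool:
    """Exhaustively compare the recognizer with the definition up to max_len."""
    return all(
        derives_s0(s) == in_a3(s)
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("abc", repeat=n))
    )

print(grammar_matches_a3(7))  # True
```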
(f) (10 points) Describe a PDA that recognizes language A3. You can just draw a transition diagram where
edges are labeled as in Sipser.
Solution:
See the PDA of Figure 3.
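Figure 3 is not reproduced here. A natural PDA for A3 pushes one symbol for every a or b and pops one for every c, with its states enforcing the order a*b*c*. The sketch below simulates that idea; the stack symbol X and the phase numbering are assumptions of this sketch, not necessarily the states of Figure 3.

```python
def pda_a3_accepts(w: str) -> bool:
    """Sketch of a PDA for A3: push one X per a or b, pop one X per c.

    Assumed states (not necessarily those of Figure 3): phase 0 while reading
    a's, phase 1 while reading b's, phase 2 while reading c's. Phases never go
    backwards, which enforces the shape a*b*c*.
    """
    order = {"a": 0, "b": 1, "c": 2}
    phase = 0
    stack = ["$"]                    # bottom-of-stack marker
    for ch in w:
        if ch not in order or order[ch] < phase:
            return False             # out-of-order symbol: no transition
        phase = order[ch]
        if ch in "ab":
            stack.append("X")        # one pending c for every a or b
        elif stack[-1] == "X":
            stack.pop()              # match this c with an earlier a or b
        else:
            return False             # more c's than a's and b's together
    return stack == ["$"]            # accept iff every a and b was matched

for s in ["", "aabccc", "bbcc", "abc", "acbc"]:
    print(repr(s), pda_a3_accepts(s))
```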
2. (10 points) Prove that language B described below is not context free.
$B = \{ w \in \{a, b\}^* \mid (w = w^R) \wedge (\#_a(w) = \#_b(w)) \}$
where $w^R$ is the reverse of w. In English, B is the language of all palindromes that contain an equal number of
a’s and b’s.
Solution:
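Let p be a proposed pumping lemma constant, and let $w = a^p b^{2p} a^p \in B$. Suppose $w = uvxyz$ with $|vxy| \le p$
and $|vy| > 0$. It must be the case that #a(vy) = #b(vy); otherwise $uv^2xy^2z$ clearly has an unequal number
of a’s and b’s. Thus we assume #a(vy) = #b(vy), so vy contains both an a and a b. Since $|vxy| \le p$, vxy lies within the prefix $a^p b^p$ or within the suffix $b^p a^p$ of w; by symmetry, we assume without loss of generality that it lies within $a^p b^p$. The last p b’s and the last p a’s of w are untouched, so $uv^2xy^2z$ still has the suffix $b^p a^p$, and a palindrome with that suffix must have the prefix $a^p b^p$. However, $uv^2xy^2z$ does not: if v and y each lie within a single block of w, then $v \in a^+$ and $y \in b^+$, so the pumped string begins with $a^{p+|v|}$; if v or y straddles the boundary between the a’s and the b’s, then the pumped string either begins with $a^{p+|v|}$ (when a non-empty $v \in a^+$ precedes a straddling y) or begins with $a^p b^t a$ for some $1 \le t < p$. In every case, $uv^2xy^2z \ne (uv^2xy^2z)^R$, so $uv^2xy^2z \notin B$.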
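The case analysis above can be double-checked for small pumping constants by trying every decomposition. The sketch below (helper names are my own) enumerates all splits $w = uvxyz$ of $a^p b^{2p} a^p$ with $|vxy| \le p$ and $|vy| > 0$ and confirms that pumping with i = 2 always leaves B, for p = 1, …, 6. A finite check like this proves nothing about all p; it only guards against a missed case in the argument.

```python
def in_b(w: str) -> bool:
    """B: palindromes over {a, b} with equally many a's and b's."""
    return w == w[::-1] and w.count("a") == w.count("b")

def decompositions(w: str, p: int):
    """Yield every split w = u v x y z with |vxy| <= p and |vy| > 0."""
    n = len(w)
    for start in range(n + 1):
        for lv in range(p + 1):
            for lx in range(p + 1 - lv):
                for ly in range(p + 1 - lv - lx):
                    end = start + lv + lx + ly
                    if lv + ly == 0 or end > n:
                        continue
                    yield (w[:start],
                           w[start:start + lv],
                           w[start + lv:start + lv + lx],
                           w[start + lv + lx:end],
                           w[end:])

def pump(parts, i: int) -> str:
    """Return u v^i x y^i z."""
    u, v, x, y, z = parts
    return u + v * i + x + y * i + z

def pumping_fails_for_b(p: int) -> bool:
    """True if every decomposition of a^p b^(2p) a^p leaves B when pumped with i = 2."""
    w = "a" * p + "b" * (2 * p) + "a" * p
    assert in_b(w)
    return all(not in_b(pump(parts, 2)) for parts in decompositions(w, p))

print([pumping_fails_for_b(p) for p in range(1, 7)])  # [True, True, True, True, True, True]
```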
3. (20 points) One of the languages described below is context free and the other is not. Determine which is which.
Give a CFG or describe a PDA for the context-free language, and use the pumping lemma to prove that the other
language is not context free. For both languages the alphabet is {a, b, c, d}.
C1 = {w | ∃i, j ∈ Z≥0 . w = ai bj ci dj }
C2 = {w | ∃i, j ∈ Z≥0 . w = ai bj cj di }
Solution:
C1 is not context-free: Let p be a proposed pumping lemma constant, and let $w = a^p b^p c^p d^p \in C_1$. Suppose $w =
uvxyz$ with $|vxy| \le p$ and $|vy| > 0$. Pair each symbol with its partner: a with c, and b with d. In w, the a’s and the c’s
are separated by $b^p$, and the b’s and the d’s are separated by $c^p$, so a substring of length at most p cannot contain a
symbol together with its partner. Let σ be any symbol that occurs in vy, and let σ′ be its partner. Then vy contains no σ′,
so in $uv^2xy^2z$ the number of σ’s increases while the number of σ′’s does not. Since every string in C1 has #a = #c and
#b = #d, $uv^2xy^2z \notin C_1$.
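As with B, the counting argument can be spot-checked for small p. The sketch below (helper names are my own; the decomposition enumerator is repeated so the block stands alone) confirms that every decomposition of $a^p b^p c^p d^p$ with $|vxy| \le p$ and $|vy| > 0$ leaves C1 when pumped with i = 2, for p = 1, …, 5.

```python
import re

def in_c1(w: str) -> bool:
    """C1: w = a^i b^j c^i d^j for some i, j >= 0."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", w)
    return (m is not None
            and len(m.group(1)) == len(m.group(3))
            and len(m.group(2)) == len(m.group(4)))

def decompositions(w: str, p: int):
    """Yield every split w = u v x y z with |vxy| <= p and |vy| > 0."""
    n = len(w)
    for start in range(n + 1):
        for lv in range(p + 1):
            for lx in range(p + 1 - lv):
                for ly in range(p + 1 - lv - lx):
                    end = start + lv + lx + ly
                    if lv + ly == 0 or end > n:
                        continue
                    yield (w[:start],
                           w[start:start + lv],
                           w[start + lv:start + lv + lx],
                           w[start + lv + lx:end],
                           w[end:])

def pump(parts, i: int) -> str:
    """Return u v^i x y^i z."""
    u, v, x, y, z = parts
    return u + v * i + x + y * i + z

def pumping_fails_for_c1(p: int) -> bool:
    """True if every decomposition of a^p b^p c^p d^p leaves C1 when pumped with i = 2."""
    w = "a" * p + "b" * p + "c" * p + "d" * p
    return all(not in_c1(pump(parts, 2)) for parts in decompositions(w, p))

print([pumping_fails_for_c1(p) for p in range(1, 6)])  # [True, True, True, True, True]
```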
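C2 is context-free, because it is generated by the following grammar:
S → aSd | T
T → bTc | ε
Here S derives $a^i T d^i$ and T derives $b^j c^j$, so S derives exactly the strings $a^i b^j c^j d^i$.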
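The grammar for C2 can be checked the same way as the one for A1. The sketch below (helper names are my own) builds every string derivable from T up to a length bound, wraps each one in matching a…d layers as S → aSd does, and compares the resulting set with the definition of C2 on all strings over {a, b, c, d} up to length 7.

```python
import re
from itertools import product

def c2_strings(max_len: int) -> set[str]:
    """All strings of length <= max_len derivable from S -> aSd | T, T -> bTc | ε."""
    t_strings = set()
    t = ""
    while len(t) <= max_len:
        t_strings.add(t)         # T =>* b^j c^j
        t = "b" + t + "c"        # one more application of T -> bTc
    result = set()
    for t in t_strings:
        s = t                    # S -> T
        while len(s) <= max_len:
            result.add(s)        # S =>* a^i T d^i =>* a^i b^j c^j d^i
            s = "a" + s + "d"    # one more layer from S -> aSd
    return result

def in_c2(w: str) -> bool:
    """Definition of C2: w = a^i b^j c^j d^i."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", w)
    return (m is not None
            and len(m.group(1)) == len(m.group(4))
            and len(m.group(2)) == len(m.group(3)))

def grammar_matches_c2(max_len: int) -> bool:
    by_definition = {
        s
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("abcd", repeat=n))
        if in_c2(s)
    }
    return by_definition == c2_strings(max_len)

print(sorted(c2_strings(4), key=lambda s: (len(s), s)))
print(grammar_matches_c2(7))  # True
```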
4. (20 points) One of the languages described below is context free and the other is not. Determine which is which.
Give a CFG or describe a PDA for the context-free language, and use the pumping lemma to prove that the other
language is not context free. For both languages the alphabet is {a, b, c, d}.
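$D_1 = \{ x_1 c x_2 c \cdots c x_k \mid \text{each } x_i \in \{a, b\}^*, \text{ and for every } i, j \in \{1, \ldots, k\}, \text{ if } i \ne j \text{, then } x_i \ne x_j \}$
$D_2 = \{ x_1 c x_2 c \cdots c x_k \mid \text{each } x_i \in \{a, b\}^*, \text{ and there is some pair } i, j \in \{1, \ldots, k\} \text{ with } i \ne j \text{ and } x_i \ne x_j \}$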
Solution:
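D1 is not context-free: Let p be a proposed pumping lemma constant, and let $w = a^0 c a^1 c a^2 c \cdots c a^{p-1} c a^p \in D_1$.
If $w = uvxyz$ with $|vxy| \le p$ and $|vy| > 0$, we consider two cases. (1) If vy contains a c, assume without loss of generality
that v contains the c (the case in which only y does is symmetric), and write v = qcr for some strings q and r. Then $v^3 = qcrqcrqcr$
contains the same substring delimited by c’s twice, namely the longest prefix of rq that does not contain a c, which occurs
right after the c of the first copy of v and again right after the c of the second copy. Therefore, $uv^3xy^3z \notin D_1$.
(2) If vy does not contain a c, then v and y each lie within a single block of a’s. Let $a^i$, for some $1 \le i \le p$, be the
longest block that v or y lies in. In $uv^0xy^0z$ that block becomes shorter than i and no block grows, so no substring of length i
appears; but there are still a total of p c-symbols, hence p + 1 substrings whose lengths lie in the p-element set
$\{0, \ldots, p\} \setminus \{i\}$. By the pigeon-hole principle, two of these substrings have the same length, and since both
consist only of a’s, they are equal. Therefore, $uv^0xy^0z \notin D_1$.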
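The two cases can be spot-checked for small p. The sketch below (helper names are my own, and the decomposition enumerator is repeated so the block stands alone) builds the witness $a^0 c a^1 c \cdots c a^p$, pumps every decomposition up to i = 3 when vy contains a c and down to i = 0 otherwise, exactly as in the two cases above, and confirms the result always leaves D1 for p = 1, …, 5.

```python
def in_d1(w: str) -> bool:
    """D1: the blocks x1, ..., xk between the c's are pairwise distinct."""
    blocks = w.split("c")
    return len(blocks) == len(set(blocks))

def d1_witness(p: int) -> str:
    """The string a^0 c a^1 c ... c a^p used in the proof."""
    return "c".join("a" * i for i in range(p + 1))

def decompositions(w: str, p: int):
    """Yield every split w = u v x y z with |vxy| <= p and |vy| > 0."""
    n = len(w)
    for start in range(n + 1):
        for lv in range(p + 1):
            for lx in range(p + 1 - lv):
                for ly in range(p + 1 - lv - lx):
                    end = start + lv + lx + ly
                    if lv + ly == 0 or end > n:
                        continue
                    yield (w[:start],
                           w[start:start + lv],
                           w[start + lv:start + lv + lx],
                           w[start + lv + lx:end],
                           w[end:])

def pump(parts, i: int) -> str:
    """Return u v^i x y^i z."""
    u, v, x, y, z = parts
    return u + v * i + x + y * i + z

def pumping_fails_for_d1(p: int) -> bool:
    """True if the proof's choice of i pushes every decomposition out of D1."""
    w = d1_witness(p)
    for parts in decompositions(w, p):
        _, v, _, y, _ = parts
        i = 3 if "c" in v + y else 0   # case (1): pump up to 3; case (2): pump down to 0
        if in_d1(pump(parts, i)):
            return False
    return True

print(d1_witness(3))                                   # cacaacaaa
print([pumping_fails_for_d1(p) for p in range(1, 6)])  # [True, True, True, True, True]
```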
D2 is context-free. The key observation is that if there are two xi’s that differ, then there is an i such that
xi ≠ xi+1 (otherwise, all of the xi’s would be equal). Figure 4 shows a PDA that recognizes language D2.
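The key observation is easy to confirm exhaustively for short inputs. The sketch below (helper names are my own) compares the literal definition of D2 (some pair i ≠ j with xi ≠ xj) with the restricted test that only looks at neighbouring blocks, on every string over {a, b, c} up to length 7.

```python
from itertools import product

def in_d2(w: str) -> bool:
    """Literal definition: some pair x_i, x_j with i != j differs."""
    blocks = w.split("c")
    k = len(blocks)
    return any(blocks[i] != blocks[j] for i in range(k) for j in range(k) if i != j)

def adjacent_pair_differs(w: str) -> bool:
    """The key observation: it suffices to compare neighbouring blocks."""
    blocks = w.split("c")
    return any(blocks[i] != blocks[i + 1] for i in range(len(blocks) - 1))

def observation_holds(max_len: int) -> bool:
    return all(
        in_d2(s) == adjacent_pair_differs(s)
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("abc", repeat=n))
    )

print(observation_holds(7))  # True
```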
[Figure 4: PDA that recognizes D2, with states q0 through q8 and endmarker $.]
The PDA initially pushes an endmarker, $, onto the stack. It moves directly to state q2 if it guesses that x1 ≠ x2. Otherwise, it
moves to state q1 to skip over x1, x2, … until it reaches a pair that differs.
Now, note that if xi and xi+1 differ, then either they have the same length but different symbols in some
position, OR they have different lengths. If they have the same length, then in state q2 the PDA pushes one marker, •,
onto the stack for each symbol of xi until it reaches the position, guessed nondeterministically, where the two strings differ. It pushes the symbol of xi at that position
onto the stack and transitions to state q3. In state q3, the PDA skips over the rest of xi. When it reaches the c
that separates xi from xi+1, it pops the guessed symbol and transitions to state q4 if that symbol was an a
in string xi, or to state q5 if it was a b. In state q4, the PDA pops one marker for each symbol of xi+1 until it reaches the symbol in the
same position as the a in string xi. If the corresponding symbol in xi+1 is a b, the PDA transitions to state q6
and accepts. The operation in state q5 is similar.
If xi and xi+1 have different lengths, the PDA stays in state q2 the entire time that it reads xi and transitions to
state q7 when it reads the c that separates xi from xi+1. At this point, the number of markers on the stack is
equal to the length of xi. In state q7, the PDA pops off one marker for each symbol of xi+1. If |xi+1| > |xi|,
then the PDA will read an a or b when the $ marker is on the top of the stack, and it will transition to state q6
and accept once it finishes reading the string. If |xi+1| < |xi| and xi+1 is not the last substring, then the PDA
will read a c while there are still one or more • markers on the stack. Again, the PDA will transition to state q6
and eventually accept. Finally, if |xi+1| < |xi| and xi+1 is the last substring, then the PDA will reach the end of
the input while there are still one or more • markers on the stack. In this case, the PDA will transition to state q8
and accept.
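The nondeterministic behaviour described above can be sketched in Python by trying every guess explicitly. In the sketch below, which is an illustration under simplifying assumptions rather than a transcription of Figure 4, `run_branch` follows one guessed branch: the index i of the pair (xi, xi+1) and either a position where the symbols differ or `None` for the different-length branch. The q0/q1 phase that skips earlier blocks, and the final reading of the remaining input, are abstracted away by splitting on c. The PDA accepts if any branch does, and the result is compared with the definition of D2.

```python
from itertools import product
from typing import Optional

def run_branch(w: str, i: int, pos: Optional[int]) -> bool:
    """Follow one guessed branch of the D2 PDA sketch.

    i   : 0-based index of the block x_i compared with x_{i+1}
    pos : guessed position where the symbols differ, or None for the
          different-length branch (states q2, q7, q8 in the text)
    """
    blocks = w.split("c")
    if i + 1 >= len(blocks):
        return False
    xi, xnext = blocks[i], blocks[i + 1]
    stack = ["$"]
    if pos is None:
        stack.extend("•" * len(xi))   # q2: one marker per symbol of x_i
        for _ in xnext:               # q7: pop one marker per symbol of x_{i+1}
            if stack[-1] == "$":
                return True           # x_{i+1} is longer
            stack.pop()
        return stack[-1] != "$"       # markers remain: x_{i+1} is shorter
    if pos >= len(xi):
        return False
    stack.extend("•" * pos)           # q2: markers for the symbols before pos
    stack.append(xi[pos])             # push the guessed symbol of x_i (then q3)
    guessed = stack.pop()             # on the separating c: pop it (q4 for a, q5 for b)
    for ch in xnext:
        if stack[-1] == "•":
            stack.pop()               # q4/q5: pop one marker per symbol
        else:
            return ch != guessed      # '$' exposed: ch sits at position pos of x_{i+1}
    return False                      # x_{i+1} has no symbol at position pos

def pda_d2_accepts(w: str) -> bool:
    """Nondeterminism as exhaustive search over all guesses (i, pos)."""
    k = len(w.split("c"))
    return any(run_branch(w, i, pos)
               for i in range(k - 1)
               for pos in [None, *range(len(w))])

def matches_definition(max_len: int) -> bool:
    """Compare with D2: the blocks are not all equal."""
    return all(
        pda_d2_accepts(s) == (len(set(s.split("c"))) > 1)
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("abc", repeat=n))
    )

print(matches_definition(7))  # True
```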