Regular Expressions
• Highlights:
– A regular expression is used to specify a language, and it does so
precisely.
– Regular expressions are very intuitive.
– Regular expressions are very useful in a variety of contexts.
– Given a regular expression, an NFA-ε can be constructed from it
automatically.
– From that NFA-ε, an NFA, then a DFA, and finally a corresponding
program can be constructed, all automatically!
Two Operations
• Concatenation:
– x = 010
– y = 1101
– xy = 0101101
• Language Union:
– L1 = {01, 00}
– L2 = {01, 11, 010}
– L1 ∪ L2 = {01, 00, 11, 010}
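The two operations above can be sketched in a few lines of Python, assuming strings are modelled as `str` values and languages as `set` objects (a modelling choice for illustration, not something the slides prescribe):

```python
# Strings are Python str values; languages are Python sets of strings.
x = "010"
y = "1101"
xy = x + y  # concatenation of two strings

L1 = {"01", "00"}
L2 = {"01", "11", "010"}
L1_union_L2 = L1 | L2  # language union is ordinary set union

print(xy)
print(sorted(L1_union_L2))
```

Note that 01 appears in both L1 and L2 but only once in the union, since a language is a set.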
Operations on Languages
Kleene closure
Say L (that is, L^1) = {a, abc, ba}, over Σ = {a, b, c}.
Then L^2 = {aa, aabc, aba, abca, abcabc, abcba, baa, baabc, baba},
and so on for L^3, L^4, …
But L^0 = {ε}.
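The powers of L can be computed directly. The following is a minimal sketch, assuming languages are Python sets and ε is the empty string; the helper names `concat`, `power`, and `star_up_to` are made up for this illustration. Since L* is infinite whenever L contains a nonempty string, only a finite approximation is built.

```python
def concat(A, B):
    """Language concatenation: every string of A followed by every string of B."""
    return {a + b for a in A for b in B}

def power(L, k):
    """L^k, with L^0 = {ε}; ε is represented by the empty Python string."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result

def star_up_to(L, n):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ ... ∪ L^n."""
    out = set()
    for k in range(n + 1):
        out |= power(L, k)
    return out

L = {"a", "abc", "ba"}
print(sorted(power(L, 2)))
print(len(star_up_to(L, 2)))
```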
Definition of a Regular Expression
Let r and s be regular expressions that represent the sets R and S, respectively.
• Examples: Let Σ = {0, 1}
(0 + 1)*0(0 + 1)*             All strings of 0’s and 1’s containing at least one 0
(0 + 1)*0(0 + 1)*0(0 + 1)*    All strings of 0’s and 1’s containing at least two 0’s
(0 + 1)*01*01*                All strings of 0’s and 1’s containing at least two 0’s
(1 + 01*0)*                   All strings of 0’s and 1’s containing an even number of 0’s
1*(01*01*)*                   All strings of 0’s and 1’s containing an even number of 0’s
(1*01*0)*1*                   All strings of 0’s and 1’s containing an even number of 0’s
(0 + 1)* = (0*1*)*            Any string of 0’s and 1’s, i.e., Σ* (Σ = {0, 1} in all cases here)
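These descriptions can be spot-checked by brute force. The sketch below rewrites each expression in Python `re` syntax (the union operator + becomes |) and compares it against a plain counting predicate on every string up to a fixed length. The helper names and the length bound of 8 are arbitrary choices; a bounded check is evidence, not a proof.

```python
import re
from itertools import product

# Slide notation -> Python `re` syntax: union "+" becomes "|".
EXAMPLES = [
    (r"(0|1)*0(0|1)*",        lambda s: s.count("0") >= 1),
    (r"(0|1)*0(0|1)*0(0|1)*", lambda s: s.count("0") >= 2),
    (r"(0|1)*01*01*",         lambda s: s.count("0") >= 2),
    (r"(1|01*0)*",            lambda s: s.count("0") % 2 == 0),
    (r"1*(01*01*)*",          lambda s: s.count("0") % 2 == 0),
    (r"(1*01*0)*1*",          lambda s: s.count("0") % 2 == 0),
    (r"(0*1*)*",              lambda s: True),
]

def all_strings(max_len, alphabet="01"):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def agrees(pattern, predicate, max_len=8):
    """True when the regex and the predicate accept exactly the same short strings."""
    rx = re.compile(pattern)
    return all((rx.fullmatch(s) is not None) == predicate(s)
               for s in all_strings(max_len))

for pattern, predicate in EXAMPLES:
    print(pattern, agrees(pattern, predicate))
```

`fullmatch` is used rather than `search` because a regular expression in the formal sense describes whole strings, not substrings.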
1. Øu = uØ = Ø        [like multiplying by 0]
2. εu = uε = u        [like multiplying by 1]
3. Ø* = ε             L* = ∪_{i≥0} L^i = L^0 ∪ L^1 ∪ L^2 ∪ …
4. ε* = ε             [the language {ε}]
5. u+v = v+u
6. u+Ø = u
7. u+u = u
8. u* = (u*)*
9. u(v+w) = uv+uw     [which operation is implicit before the parenthesis?]
10. (u+v)w = uw+vw
11. (uv)*u = u(vu)*   [note: you have to have a single u, at start or end]
                      [note: (uv)* ≠ u*v*]
12. (u+v)* = (u*+v)*
           = u*(u+v)*
           = (u+vu*)*
           = (u*v*)*
           = u*(vu*)*
           = (u*v)*u*
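A few of these identities can be sanity-checked mechanically. The sketch below instantiates u and v with two arbitrary sample expressions (01 and 1, chosen only for illustration), writes both sides in Python `re` syntax, and compares them on all short strings. Agreement on one instance up to a length bound does not prove an identity, but a single disagreement does refute one, as for (uv)* versus u*v*.

```python
import re
from itertools import product

U, V = "(?:01)", "(?:1)"      # sample instances of u and v (arbitrary choices)
UV = f"(?:{U}|{V})*"           # (u+v)*

IDENTITIES = [
    (f"(?:{U}{V})*{U}", f"{U}(?:{V}{U})*"),   # (uv)*u = u(vu)*
    (UV, f"(?:{U}*|{V})*"),                   # (u+v)* = (u*+v)*
    (UV, f"{U}*(?:{U}|{V})*"),                # (u+v)* = u*(u+v)*
    (UV, f"(?:{U}|{V}{U}*)*"),                # (u+v)* = (u+vu*)*
    (UV, f"(?:{U}*{V}*)*"),                   # (u+v)* = (u*v*)*
    (UV, f"{U}*(?:{V}{U}*)*"),                # (u+v)* = u*(vu*)*
    (UV, f"(?:{U}*{V})*{U}*"),                # (u+v)* = (u*v)*u*
]
NON_IDENTITY = (f"(?:{U}{V})*", f"{U}*{V}*")  # (uv)* versus u*v*

def same_language(p, q, max_len=8, alphabet="01"):
    """Compare two patterns on every string of length 0..max_len."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            s = "".join(t)
            if (rp.fullmatch(s) is None) != (rq.fullmatch(s) is None):
                return False
    return True

print(all(same_language(p, q) for p, q in IDENTITIES))
print(same_language(*NON_IDENTITY))
```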
Equivalence of Regular Expressions and NFA-εs
• Note:
Throughout the following, keep in mind that a string is accepted by an NFA-ε
if there exists ANY path from the start state to any final state.
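Basis: OP(r) = 0, where OP(r) is the number of operators in r
For Ø:
[Figure: start state q0 and final state qf, with no transition between them]
For ε:
[Figure: a single state qf that is both the start state and the final state]
For a:
[Figure: q0 --a--> qf]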
Inductive Hypothesis: Suppose there exists a k ≥ 0 such that for any regular
expression r where 0 ≤ OP(r) ≤ k, there exists an NFA-ε M such that L(M) = L(r).
Furthermore, suppose that M has exactly one final state.
Inductive Step: Let r be a regular expression with OP(r) = k + 1. There are three cases.
Case 1) r = r1 + r2
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive
hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and
L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.
Construct M as:
[Figure: new start state q0 with ε-transitions to q1 and q2, the start states of M1 and M2; ε-transitions from f1 and f2, the final states of M1 and M2, to the new final state qf]
Case 2) r = r1r2
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive hypothesis there
exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore,
both M1 and M2 have exactly one final state.
Construct M as:
[Figure: M1 (start state q1, final state f1) followed by M2 (start state q2, final state f2), joined by an ε-transition from f1 to q2]
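Case 3) r = r1*
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1) ≤ k. By the inductive hypothesis there exists
an NFA-ε machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state.
Construct M as:
[Figure: new start state q0 with an ε-transition to q1, the start state of M1; an ε-transition from f1, the final state of M1, to the new final state qf; two further ε-transitions, drawn above and below M1, one from q0 to qf and one from f1 back to q1]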
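The basis and the three cases translate almost line for line into a recursive construction. The following is a minimal sketch, not a definitive implementation: the tuple encoding of expressions, the `Builder` class, and the use of the empty string as the ε label are all assumptions made for illustration. Each call returns a fragment with one start state and one final state, matching the invariant in the inductive hypothesis.

```python
from itertools import count

EPS = ""  # label used for ε-transitions (an assumption of this sketch)

class Builder:
    def __init__(self):
        self.delta = {}       # (state, label) -> set of next states
        self._ids = count()   # fresh integer state names

    def new_state(self):
        return next(self._ids)

    def arc(self, p, label, q):
        self.delta.setdefault((p, label), set()).add(q)

    def build(self, r):
        """Return (start, final) of an NFA-ε fragment for expression r."""
        kind = r[0]
        if kind == "empty":                  # Ø: no path from start to final
            return self.new_state(), self.new_state()
        if kind == "eps":                    # ε: the start state is the final state
            q = self.new_state()
            return q, q
        if kind == "sym":                    # a: a single arc labelled a
            q0, qf = self.new_state(), self.new_state()
            self.arc(q0, r[1], qf)
            return q0, qf
        if kind == "union":                  # Case 1: r1 + r2
            q1, f1 = self.build(r[1])
            q2, f2 = self.build(r[2])
            q0, qf = self.new_state(), self.new_state()
            self.arc(q0, EPS, q1)
            self.arc(q0, EPS, q2)
            self.arc(f1, EPS, qf)
            self.arc(f2, EPS, qf)
            return q0, qf
        if kind == "concat":                 # Case 2: r1 r2
            q1, f1 = self.build(r[1])
            q2, f2 = self.build(r[2])
            self.arc(f1, EPS, q2)
            return q1, f2
        if kind == "star":                   # Case 3: r1*
            q1, f1 = self.build(r[1])
            q0, qf = self.new_state(), self.new_state()
            self.arc(q0, EPS, q1)
            self.arc(f1, EPS, qf)
            self.arc(q0, EPS, qf)            # skip M1 entirely: accepts ε
            self.arc(f1, EPS, q1)            # loop back: repeat M1
            return q0, qf
        raise ValueError(f"unknown node {kind!r}")

def eps_closure(delta, states):
    """All states reachable from `states` using only ε-arcs."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, EPS), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(r, w):
    """Build the NFA-ε for r and test whether some path on w reaches the final state."""
    b = Builder()
    start, final = b.build(r)
    current = eps_closure(b.delta, {start})
    for a in w:
        moved = set()
        for p in current:
            moved |= b.delta.get((p, a), set())
        current = eps_closure(b.delta, moved)
    return final in current

# r = 0(0+1)*
R = ("concat", ("sym", "0"), ("star", ("union", ("sym", "0"), ("sym", "1"))))
print(accepts(R, "0110"), accepts(R, "10"))
```

Acceptance tracks the set of all states reachable so far, which is exactly the "ANY path" criterion stated in the note above.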
• Example:
r = 0(0+1)*
r = r1r2
r1 = 0
r2 = (0+1)*
r2 = r3*
r3 = 0+1
r3 = r4 + r5
r4 = 0
r5 = 1
The machine is built bottom-up, one subexpression at a time:
[Figure, step 1: machine for r5 = 1: q0 --1--> q1]
[Figure, step 2: machine for r4 = 0: q2 --0--> q3]
[Figure, step 3: machine for r3 = r4 + r5: new start state q4 with ε-transitions to q0 and q2; ε-transitions from q1 and q3 to the new final state q5]
[Figure, step 4: machine for r2 = r3*: new start state q6 and new final state qf; ε-transitions from q6 to q4 and from q5 to qf, plus two further ε-transitions drawn above and below]
[Figure, step 5: machine for r1 = 0: q8 --0--> q9]
[Figure, step 6: machine for r = r1r2: ε-transition from q9 to q6; start state q8, final state qf]
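The finished machine for r = 0(0+1)* can be simulated directly. The sketch below hard-codes the arcs using the state names from the figures; the directions of the ε-arcs are an interpretation of the drawings based on Cases 1 to 3 (in particular, the two extra arcs of the star step are taken to run from q6 to qf and from q5 back to q4). The result is cross-checked against Python's `re` module on all short strings.

```python
import re
from itertools import product

EPS = "ε"  # label for ε-arcs (naming is an assumption of this sketch)

# Arcs as read from the final figure; ε-arc directions follow Cases 1-3.
DELTA = {
    ("q8", "0"): {"q9"},          # r1 = 0
    ("q9", EPS): {"q6"},          # concatenation r1 r2
    ("q6", EPS): {"q4", "qf"},    # star: enter r3, or skip it
    ("q4", EPS): {"q0", "q2"},    # union: branch to r5 or r4
    ("q0", "1"): {"q1"},          # r5 = 1
    ("q2", "0"): {"q3"},          # r4 = 0
    ("q1", EPS): {"q5"},
    ("q3", EPS): {"q5"},
    ("q5", EPS): {"q4", "qf"},    # star: repeat r3, or finish
}
START, FINAL = "q8", "qf"

def eps_closure(states):
    """All states reachable from `states` using only ε-arcs."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in DELTA.get((p, EPS), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(w):
    current = eps_closure({START})
    for a in w:
        moved = set()
        for p in current:
            moved |= DELTA.get((p, a), set())
        current = eps_closure(moved)
    return FINAL in current

def agrees_with_regex(max_len=6):
    """Compare the machine with the pattern 0(0|1)* on all short strings."""
    rx = re.compile(r"0(0|1)*")
    words = ("".join(t) for n in range(max_len + 1) for t in product("01", repeat=n))
    return all((rx.fullmatch(w) is not None) == accepts(w) for w in words)

print(accepts("0"), accepts("0110"), accepts("1"), agrees_with_regex())
```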
Equivalence Proved So Far
• Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn}, and
define:
R_{i,j} is the set of all strings that define a path in M from qi to qj.
• Example:
[Figure: a five-state DFA over {0, 1} with states q1 through q5]
• In words: R^k_{i,j} is the set of all strings that define a path in M from qi to qj
that passes through no state numbered greater than k.
• Definition:
R^k_{i,j} = { x | x is in Σ*, δ(qi, x) = qj, and for every u with 1 ≤ |u| < |x| and
x = uv, if δ(qi, u) = qp then p ≤ k }
• Note that it may be true that i > k or j > k; only the intermediate states on the
path from qi to qj may not be numbered greater than k.
• Example:
[Figure: the five-state DFA with states q1 through q5, repeated]
5) R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}
Now you see the purpose of introducing k: so that we can write it as an RE.
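The recurrence in 5) can be run on actual sets of strings. The sketch below is only an illustration under several assumptions: the two-state DFA is hypothetical (it accepts strings with an odd number of 1s and does not come from the slides), states are named 1..n, and every language is truncated to strings of length at most N so that the sets stay finite. After the last round k = n, the union of R^n_{1,f} over the final states f should equal the accepted strings up to length N.

```python
from itertools import product

N = 6            # only strings of length <= N are tracked
SIGMA = "01"
# Hypothetical DFA: states 1 and 2, accepting strings with an odd number of 1s.
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
STATES = [1, 2]
START, FINALS = 1, {2}

def cat(A, B):
    """Concatenation of two languages, truncated to length N."""
    return {a + b for a in A for b in B if len(a + b) <= N}

def star(A):
    """Kleene closure, truncated to length N."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = cat(frontier, A) - result
        result |= frontier
    return result

def base(i, j):
    """R^0_{i,j}: symbols labelling arcs q_i -> q_j, plus ε when i = j."""
    R = {a for a in SIGMA if DELTA[(i, a)] == j}
    return R | {""} if i == j else R

def R_table():
    """Apply the recurrence for k = 1..n, starting from the k = 0 sets."""
    R = {(i, j): base(i, j) for i in STATES for j in STATES}
    for k in STATES:
        R = {(i, j): cat(cat(R[(i, k)], star(R[(k, k)])), R[(k, j)]) | R[(i, j)]
             for i in STATES for j in STATES}
    return R

def dfa_accepts(w):
    q = START
    for a in w:
        q = DELTA[(q, a)]
    return q in FINALS

R = R_table()
language = set().union(*(R[(START, f)] for f in FINALS))
expected = {"".join(t) for n in range(N + 1)
            for t in product(SIGMA, repeat=n) if dfa_accepts("".join(t))}
print(language == expected)
```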
• Notes on 5:
[Figure: a path from qi to qj]
• If x is a string in R^k_{i,j}, then no state numbered > k may be passed through when processing
x, and either:
– qk is not passed through, i.e., x is in R^{k-1}_{i,j}
– qk is passed through one or more times, i.e., x is in R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j}
• Lemma 2: Let M = (Q, Σ, δ, q1, F) be a DFA. Then there exists a regular expression r
such that L(M) = L(r).
• Proof:
First we will show (by induction on k) that for all i, j, and k, where 1 ≤ i, j ≤ n
and 0 ≤ k ≤ n, there exists a regular expression r such that L(r) = R^k_{i,j}.
Basis: k = 0
R^0_{i,j} contains single symbols, one for each transition from qi to qj, and possibly ε if i = j.
r^0_{i,j} = Ø    [when there are no transitions from qi to qj and i ≠ j]
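case 3) No transitions from qi to qj and i = j
r^0_{i,j} = ε
Inductive Step:
Consider R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}. By the inductive hypothesis there
exist regular expressions r^{k-1}_{i,k}, r^{k-1}_{k,k}, r^{k-1}_{k,j}, and r^{k-1}_{i,j} generating R^{k-1}_{i,k}, R^{k-1}_{k,k},
R^{k-1}_{k,j}, and R^{k-1}_{i,j}, respectively. Thus, if we let
r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j} + r^{k-1}_{i,j}
then r^k_{i,j} is a regular expression generating R^k_{i,j}.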
• Finally, if F = {qj1, qj2, …, qjr}, then
r = r^n_{1,j1} + r^n_{1,j2} + … + r^n_{1,jr}
is a regular expression with L(r) = L(M).
• Note: not only does this prove that regular expressions generate the regular
languages, but it also provides an algorithm for computing such an expression!
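The algorithm implied by the proof can be sketched symbolically. This is an illustration under stated assumptions rather than a definitive implementation: the two-state DFA is hypothetical (odd number of 1s), expressions are built as Python `re` strings with `None` standing for Ø and the empty string for ε, and only the trivial simplifications Ø+u = u, u+u = u, and Ø* = ε* = ε are applied, so the output is correct but far from minimal.

```python
import re
from itertools import product

# Hypothetical DFA (not from the slides): odd number of 1s. States are 1..n.
N_STATES = 2
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
START, FINALS = 1, [2]

# Expressions are Python `re` strings; None stands for Ø and "" for ε.
def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if a == "":
        return f"(?:{b})?"
    if b == "":
        return f"(?:{a})?"
    return f"(?:{a}|{b})"

def concat(a, b):
    return None if a is None or b is None else a + b

def star(a):
    return "" if a in (None, "") else f"(?:{a})*"

def dfa_to_regex():
    states = range(1, N_STATES + 1)
    r = {}
    for i in states:                      # basis, k = 0
        for j in states:
            e = "" if i == j else None
            for (p, a), q in DELTA.items():
                if p == i and q == j:
                    e = union(e, a)
            r[(i, j)] = e
    for k in states:                      # inductive step, k = 1..n
        r = {(i, j): union(concat(concat(r[(i, k)], star(r[(k, k)])), r[(k, j)]),
                           r[(i, j)])
             for i in states for j in states}
    result = None                         # union over the final states
    for f in FINALS:
        result = union(result, r[(START, f)])
    return result

def dfa_accepts(w):
    q = START
    for a in w:
        q = DELTA[(q, a)]
    return q in FINALS

def check(pattern, max_len=7):
    """Compare the generated pattern with the DFA on all short strings."""
    rx = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1) for t in product("01", repeat=n))
    return all((rx.fullmatch(w) is not None) == dfa_accepts(w) for w in words)

pattern = dfa_to_regex()   # not None here, since a final state is reachable
print(pattern)
print(check(pattern))
```

The printed pattern grows quickly with the number of states, which is why hand computations such as the table-based example simplify each entry before moving to the next column.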
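• Example:
[Figure: DFA with states q1, q2, q3 over {0, 1}: q1 --0--> q2, q1 --1--> q3, q2 --0--> q1, q2 --1--> q3, q3 --0/1--> q2]
The first table column is computed from the DFA.

             k = 0
r^k_{1,1}    ε
r^k_{1,2}    0
r^k_{1,3}    1
r^k_{2,1}    0
r^k_{2,2}    ε
r^k_{2,3}    1
r^k_{3,1}    Ø
r^k_{3,2}    0+1
r^k_{3,3}    ε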
• All remaining columns are computed from the previous column using the
formula.

             k = 0    k = 1
r^k_{1,1}    ε        ε
r^k_{1,2}    0        0
r^k_{1,3}    1        1
r^k_{2,1}    0        0
r^k_{2,2}    ε        ε + 00
r^k_{2,3}    1        1 + 01
r^k_{3,1}    Ø        Ø
r^k_{3,2}    0+1      0+1
r^k_{3,3}    ε        ε
             k = 0    k = 1     k = 2
r^k_{1,1}    ε        ε         (00)*
r^k_{1,2}    0        0         0(00)*
r^k_{1,3}    1        1         0*1
r^k_{2,1}    0        0         0(00)*
r^k_{2,2}    ε        ε + 00    (00)*
r^k_{2,3}    1        1 + 01    0*1
r^k_{3,1}    Ø        Ø         (0 + 1)(00)*0
r^k_{3,2}    0+1      0+1       (0 + 1)(00)*
r^k_{3,3}    ε        ε         ε + (0 + 1)0*1
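The k = 2 column can be checked against the definition of R^k_{i,j} by brute force. In the sketch below the transition function is read off the k = 0 column, each table entry is rewritten in Python `re` syntax (+ becomes |, and ε + u becomes an optional group), and `in_R` tests directly whether a string labels a path whose intermediate states are all numbered at most k. The helper names and the length bound are arbitrary choices for this illustration.

```python
import re
from itertools import product

# DFA read off the k = 0 column: DELTA[(state, symbol)] = next state.
DELTA = {(1, "0"): 2, (1, "1"): 3, (2, "0"): 1,
         (2, "1"): 3, (3, "0"): 2, (3, "1"): 2}

# k = 2 column in Python `re` syntax ("+" -> "|", "ε + u" -> "(u)?").
K2 = {
    (1, 1): r"(00)*",       (1, 2): r"0(00)*",     (1, 3): r"0*1",
    (2, 1): r"0(00)*",      (2, 2): r"(00)*",      (2, 3): r"0*1",
    (3, 1): r"(0|1)(00)*0", (3, 2): r"(0|1)(00)*", (3, 3): r"((0|1)0*1)?",
}

def in_R(k, i, j, x):
    """True when x labels a path q_i -> q_j whose intermediate states are all <= k."""
    q = i
    for pos, a in enumerate(x):
        q = DELTA[(q, a)]
        if pos < len(x) - 1 and q > k:   # state reached after a proper prefix
            return False
    return q == j

def table_matches(max_len=7):
    """Compare every k = 2 entry with the definition on all short strings."""
    words = ["".join(t) for n in range(max_len + 1) for t in product("01", repeat=n)]
    for (i, j), pattern in K2.items():
        rx = re.compile(pattern)
        for x in words:
            if (rx.fullmatch(x) is not None) != in_R(2, i, j, x):
                return False
    return True

print(table_matches())
```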
• To complete the regular expression for the language, we compute:
r^3_{1,2} + r^3_{1,3} [complete this]
Now we have proved equivalence of all
• Proof:
(if) Suppose there exists a DFA M such that L = L(M). Then by Lemma 2
there exists a regular expression r such that L = L(r).
(only if) Suppose there exists a regular expression r such that L = L(r). Then
by Lemma 1 there exists a DFA M such that L = L(M).