Unit 4: Regular Expressions
Overview
• Regular expressions
• Equivalence of RE and RL
Regular Expressions
• Regular languages (RL) are often described
by means of algebraic expressions called
regular expressions (RE).
• In arithmetic we use the +, * operations to
construct expressions: (2+3)*5
• The value of the arithmetic expression is the
number 25.
• The value of a regular expression is a regular
language.
Regular operations
• In RE we use regular operations to construct
expressions describing regular languages:
(0 + 1)* ◦ 0
where:
r+s means r OR s (union)
r* means the Kleene star of r
r◦s (or rs) means the concatenation of r and s
Formal definition
• The set of regular expressions over an alphabet Σ
is defined inductively as follows:
Basis:
∅, ε, and a (for all a ∈ Σ) are regular
expressions.
Induction:
If r and s are RE then the following expressions
are also RE:
– (r)
– r+s
– r◦s
– r*
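The inductive definition maps directly onto a recursive data type. The sketch below is an illustration, not part of the original slides: the class names Empty, Epsilon, Symbol, Or, Cat, Star and the function to_str are hypothetical. Because each induction step contributes its own pair of parentheses, the printed form needs no priority rules.

```python
from dataclasses import dataclass
from typing import Union

# Basis: ∅, ε, and a single symbol a ∈ Σ.
@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Epsilon:
    pass

@dataclass(frozen=True)
class Symbol:
    a: str

# Induction: r+s, r◦s, and r*. Grouping "(r)" is implicit in the tree shape.
@dataclass(frozen=True)
class Or:
    r: "Regex"
    s: "Regex"

@dataclass(frozen=True)
class Cat:
    r: "Regex"
    s: "Regex"

@dataclass(frozen=True)
class Star:
    r: "Regex"

Regex = Union[Empty, Epsilon, Symbol, Or, Cat, Star]

def to_str(e: Regex) -> str:
    """Print e with every + and ◦ wrapped in its own parentheses."""
    if isinstance(e, Empty):
        return "∅"
    if isinstance(e, Epsilon):
        return "ε"
    if isinstance(e, Symbol):
        return e.a
    if isinstance(e, Or):
        return f"({to_str(e.r)}+{to_str(e.s)})"
    if isinstance(e, Cat):
        return f"({to_str(e.r)}◦{to_str(e.s)})"
    if isinstance(e, Star):
        # The operand is a basis case, another star, or an already parenthesized + / ◦.
        return f"{to_str(e.r)}*"
    raise TypeError(f"not a regular expression: {e!r}")

# ( 0 + 1 )* ◦ 0
print(to_str(Cat(Star(Or(Symbol("0"), Symbol("1"))), Symbol("0"))))
```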
Examples over Σ={a,b}
ε, a, a+b, b*, (a+b)b, ab*, a*+b*
• To avoid using many parentheses, the
operations have the following priority
hierarchy:
1. * - highest (do it first)
2. ◦
3. + - lowest (do it last)
• Example: (b+(a◦(b*))) = b+ab*
(Notation: the symbol ◦ can be dropped; Σ means
(σ1+σ2+σ3+…); r+ means rr*.)
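The priority hierarchy can be sketched as a printer that adds parentheses only where * > ◦ > + requires them and drops ◦. The tree classes (Empty, Epsilon, Symbol, Or, Cat, Star) are hypothetical and defined inside the block so it runs on its own; PRIORITY and show are illustrative names, not part of the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Epsilon:
    pass

@dataclass(frozen=True)
class Symbol:
    a: str

@dataclass(frozen=True)
class Or:
    r: object
    s: object

@dataclass(frozen=True)
class Cat:
    r: object
    s: object

@dataclass(frozen=True)
class Star:
    r: object

# + lowest (1), ◦ middle (2), * highest (3); basis cases bind tightest (4).
PRIORITY = {Or: 1, Cat: 2, Star: 3}

def priority(e) -> int:
    return PRIORITY.get(type(e), 4)

def show(e) -> str:
    """Print e with the fewest parentheses the hierarchy allows; ◦ is dropped."""
    def operand(child, needed: int) -> str:
        text = show(child)
        return f"({text})" if priority(child) < needed else text

    if isinstance(e, Empty):
        return "∅"
    if isinstance(e, Epsilon):
        return "ε"
    if isinstance(e, Symbol):
        return e.a
    if isinstance(e, Or):
        return operand(e.r, 1) + "+" + operand(e.s, 1)
    if isinstance(e, Cat):
        return operand(e.r, 2) + operand(e.s, 2)
    if isinstance(e, Star):
        return operand(e.r, 4) + "*"   # anything that is not a basis case gets parentheses
    raise TypeError(f"not a regular expression: {e!r}")

# (b+(a◦(b*)))
print(show(Or(Symbol("b"), Cat(Symbol("a"), Star(Symbol("b"))))))
```

For this input the printer gives b+ab*, matching the example above; a union inside a concatenation, such as (a+b)b, keeps its parentheses.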
Regular expressions and
regular languages
We associate each regular expression r with a
regular language L(r) as follows:
• L(∅)=∅,
• L(ε)={ε},
• L(a)={a} for each a∈Σ,
• L(r+s)=L(r)∪L(s),
• L(r◦s)=L(r)◦L(s),
• L(r*)=(L(r))*.
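The definition of L(r) can be run as a recursive function, one case per rule. Since L(r*) is usually infinite, this sketch (an assumption made for illustration) computes only the words of L(r) of length at most n. The tree classes and the name lang are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Epsilon:
    pass

@dataclass(frozen=True)
class Symbol:
    a: str

@dataclass(frozen=True)
class Or:
    r: object
    s: object

@dataclass(frozen=True)
class Cat:
    r: object
    s: object

@dataclass(frozen=True)
class Star:
    r: object

def lang(e, n: int) -> set:
    """Words of L(e) with length <= n, following the definition of L case by case."""
    if isinstance(e, Empty):          # L(∅) = ∅
        return set()
    if isinstance(e, Epsilon):        # L(ε) = {ε}
        return {""}
    if isinstance(e, Symbol):         # L(a) = {a}
        return {e.a} if n >= 1 else set()
    if isinstance(e, Or):             # L(r+s) = L(r) ∪ L(s)
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):            # L(r◦s) = L(r)◦L(s)
        left, right = lang(e.r, n), lang(e.s, n)
        return {x + y for x in left for y in right if len(x) + len(y) <= n}
    if isinstance(e, Star):           # L(r*) = (L(r))*
        pieces = lang(e.r, n) - {""}
        result, frontier = {""}, {""}
        while frontier:               # each round appends one more nonempty piece
            frontier = {x + y for x in frontier for y in pieces
                        if len(x) + len(y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")

example = Cat(Star(Or(Symbol("0"), Symbol("1"))), Symbol("0"))   # (0+1)*◦0
print(sorted(lang(example, 3)))
```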
Examples over Σ={0,1}
In Class: Describe each language as
a regular expression.
L1 = { w | w has a single 1}
L2 = { w | w has at least one 1}
L3 = { w | w contains the string 110}
L4 = { w | |w| mod 2 =0}
L5 = { w | w starts with 1}
L6 = { w | w ends with 00}
Examples over Σ={0,1}, cont’
Properties of regular expressions 1
Properties of regular expressions 2
• r(s+t)=rs+rt
• r+(s+t)=(r+s)+t
• r(st)=(rs)t
• r*=(r*)*=r*r*=r*+r
• r*+r+=r*
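These identities hold for all regular expressions r, s, t. Comparing both sides on every short word is a quick sanity check for particular instances, though it proves nothing in general. This sketch assumes Python's re module as the matcher: the course's + (OR) is written |, (?:…) groups without capturing, and Python's r+ already means rr*. The instances r=ab, s=a, t=b* over {a,b} and the helper names are hypothetical.

```python
import itertools
import re

def words(alphabet: str, max_len: int):
    """Every word over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def same_up_to(p: str, q: str, alphabet: str = "ab", max_len: int = 8) -> bool:
    """True if patterns p and q accept exactly the same words of length <= max_len."""
    P, Q = re.compile(p), re.compile(q)
    return all(bool(P.fullmatch(w)) == bool(Q.fullmatch(w))
               for w in words(alphabet, max_len))

# Hypothetical instances r = ab, s = a, t = b*, each wrapped in a group.
r, s, t = "(?:ab)", "(?:a)", "(?:b*)"
identities = {
    "r(s+t) = rs+rt":    (f"{r}(?:{s}|{t})", f"{r}{s}|{r}{t}"),
    "r+(s+t) = (r+s)+t": (f"{r}|(?:{s}|{t})", f"(?:{r}|{s})|{t}"),
    "r(st) = (rs)t":     (f"{r}(?:{s}{t})", f"(?:{r}{s}){t}"),
    "r* = (r*)*":        (f"{r}*", f"(?:{r}*)*"),
    "r* = r*r*":         (f"{r}*", f"{r}*{r}*"),
    "r* = r*+r":         (f"{r}*", f"{r}*|{r}"),
    "r*+r+ = r*":        (f"{r}*|{r}+", f"{r}*"),
}
for name, (lhs, rhs) in identities.items():
    print(name, same_up_to(lhs, rhs))
```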
Equivalence of
regular expressions
Example
Let r=ε+(0+1)*1
Let s=(0*1)*
Are r and s equivalent?
L[r]⊆L[s]
r=ε+(0+1)*1, s=(0*1)*
• Let w∈L[r], where r=ε+(0+1)*1.
• If w=ε then w∈L[s] (by definition of *).
• If w≠ε then w can be written as w=w’1,
where w’∈L[(0+1)*]. Assume that w’ contains k
instances of the letter 1. Then w’ can
be written as w’ = x₁1 x₂1 … xₖ1 xₖ₊₁, where each xᵢ∈0*.
But then w=w’1=
=(x₁1)(x₂1)…(xₖ₊₁1)=(0*1)(0*1)…(0*1).
So w∈L[(0*1)*], and hence L[r]⊆L[s].
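A bounded test cannot replace the proof, but it can catch a wrong answer before one tries to prove it. The sketch below compares the two expressions on every word over {0,1} of length at most 10 using Python's re module, where + is written |, ε+X can be written X?, and (?:…) groups without capturing. The helper names words and first_difference are hypothetical.

```python
import itertools
import re

def words(alphabet: str, max_len: int):
    """Every word over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def first_difference(p: str, q: str, alphabet: str = "01", max_len: int = 10):
    """Shortest word of length <= max_len accepted by exactly one pattern, else None."""
    P, Q = re.compile(p), re.compile(q)
    for w in words(alphabet, max_len):
        if bool(P.fullmatch(w)) != bool(Q.fullmatch(w)):
            return w
    return None

R = r"(?:(?:0|1)*1)?"   # r = ε+(0+1)*1, since X? means ε+X
S = r"(?:0*1)*"          # s = (0*1)*
print(first_difference(R, S))
```

It should print None: no word of length at most 10 separates r and s. This agrees with the argument above, which gives L[r]⊆L[s]; the reverse inclusion holds because every word of (0*1)* is either ε or ends with 1.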
Another example
Are r and s equivalent?
r=(0+1)*1+0*
s=(1+0)(0*1)*
Answer: No.
• Consider the word w=ε.
• w∈L[r], where r=(0+1)*1+0*, because ε∈L[0*].
• But w∉L[s], where s=(1+0)(0*1)*, as all words in L[s]
have at least one letter.
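For a "No" answer one counterexample suffices, and short ones can be found mechanically. This sketch lists every word over {0,1} of length at most 3 on which the two expressions disagree, using Python's re module (| stands for the course's +, and (?:…) groups without capturing). The helper names are hypothetical.

```python
import itertools
import re

def words(alphabet: str, max_len: int):
    """Every word over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def differences(p: str, q: str, alphabet: str = "01", max_len: int = 3) -> list:
    """All words of length <= max_len, shortest first, accepted by exactly one pattern."""
    P, Q = re.compile(p), re.compile(q)
    return [w for w in words(alphabet, max_len)
            if bool(P.fullmatch(w)) != bool(Q.fullmatch(w))]

R = r"(?:0|1)*1|0*"        # r = (0+1)*1+0*
S = r"(?:1|0)(?:0*1)*"     # s = (1+0)(0*1)*
print(differences(R, S))
```

The list starts with ε (the empty string), the counterexample used above. The words 00 and 000 also separate r and s: they belong to 0*, but a word of length at least 2 in L[s] must end with 1.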
Equivalence of RE with FA
• Regular expressions and finite automata are
equivalent in terms of the languages they
describe.
Theorem:
A language is regular iff some regular
expression describes it.
[Diagram: Regular Languages ⇔ DFA ⇔ NFA ⇔ Regular Expressions]
Converting RE into FA
Proof idea:
Transform a regular expression r into a
nondeterministic finite automaton (NFA) N that
accepts the language L(r).
Converting RE into RL
[Diagram: Regular Expression → NFA → Regular Language]
RE to FA Algorithm
[Figure: rule 1, applied to an edge from state i to state j]
RE to FA Algorithm (cont.)
[Figure: rule 2, an edge i →(r+s)→ j is replaced by two edges i →r→ j and i →s→ j]
RE to FA Algorithm (cont.)
[Figure: rule 3, an edge i →(rs)→ j is replaced by i →r→ (new state) →s→ j]
RE to FA Algorithm (cont.)
[Figure: rule 4, applied to an edge i →(r*)→ j]
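The four rules can be sketched as a worklist that keeps rewriting edges until every label is ε or a single symbol. The figures for rules 1 and 4 did not survive, so two parts of this sketch are assumptions following the usual form of this construction: rule 1 erases an edge labeled ∅, and rule 4 replaces i →(r*)→ j by ε-edges i → k → j through a new state k that carries a loop labeled r. The state names q0, q1, … and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Epsilon:
    pass

@dataclass(frozen=True)
class Symbol:
    a: str

@dataclass(frozen=True)
class Or:
    r: object
    s: object

@dataclass(frozen=True)
class Cat:
    r: object
    s: object

@dataclass(frozen=True)
class Star:
    r: object

EPS = None  # label used for ε-edges in the finished NFA

def re_to_nfa(regex):
    """Start from s --regex--> f and rewrite edges until each label is ε or one symbol."""
    fresh = count()
    pending = [("s", regex, "f")]
    edges = []                                   # finished edges (i, label, j)
    while pending:
        i, lab, j = pending.pop()
        if isinstance(lab, Empty):               # rule 1 (assumed): erase the edge
            continue
        if isinstance(lab, Epsilon):
            edges.append((i, EPS, j))
        elif isinstance(lab, Symbol):
            edges.append((i, lab.a, j))
        elif isinstance(lab, Or):                # rule 2: two parallel edges
            pending += [(i, lab.r, j), (i, lab.s, j)]
        elif isinstance(lab, Cat):               # rule 3: new state in the middle
            k = f"q{next(fresh)}"
            pending += [(i, lab.r, k), (k, lab.s, j)]
        elif isinstance(lab, Star):              # rule 4 (assumed shape): ε in, loop r, ε out
            k = f"q{next(fresh)}"
            edges += [(i, EPS, k), (k, EPS, j)]
            pending.append((k, lab.r, k))
        else:
            raise TypeError(f"not a regular expression: {lab!r}")
    return edges, "s", "f"

def accepts(nfa, word: str) -> bool:
    """Simulate the NFA, following ε-edges between input symbols."""
    edges, start, final = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for i, lab, j in edges:
                if i == q and lab is EPS and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    current = closure({start})
    for c in word:
        current = closure({j for i, lab, j in edges if i in current and lab == c})
    return final in current

nfa = re_to_nfa(Or(Star(Symbol("b")), Cat(Symbol("a"), Symbol("b"))))   # b*+ab
for w in ["", "b", "bbb", "ab", "a", "ba", "abb"]:
    print(repr(w), accepts(nfa, w))
```

Applied to b*+ab, the expression worked by hand in the example that follows, the sketch should accept ε, b, bbb and ab, and reject a, ba and abb.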
Example
[Figure: start machine s →(b*+ab)→ f]
Example (cont.)
[Figure: the edge labeled b*+ab is split into two edges s →(b*)→ f and s →(ab)→ f]
Example (cont.)
[Figure: the edge labeled ab is replaced by s →a→ (new state) →b→ f; the edge labeled b* is unchanged]
Example (cont.)
Next we apply rule 4 for b*:
[Figure: the edge labeled b* is transformed by rule 4; the new state gets a loop labeled b]
The final NFA
[Figure: the final NFA from s to f, with a loop labeled b on one path and the labels a, b on the other]
Proof idea:
Transform a DFA N into a regular
expression r such that L(r)=L(N).
Converting RL into RE
[Diagram: Regular Language → DFA → Regular Expression]
Converting FA into RE
Generalized NFA
• Before we start, we convert the DFA into a
generalized NFA (GNFA):
– A GNFA may have regular expressions as edge labels.
– A GNFA has a single accept state.
– The start state has arrows going out but none coming in.
– The accept state has arrows coming in but none going
out.
– Except for the start and accept states, one arrow goes
from every state to every other state (including itself).
[Figure: a GNFA with a single edge s →r→ f]
Converting DFA into GNFA
The input is a DFA or an NFA N=(Q, Σ, δ, q0, F).
Perform the following steps:
[Figure: new states s and f added around the original machine with start state q0]
Converting DFA into GNFA (cont.)
[Figure: states s, q0, and f]
Converting DFA into GNFA (cont.)
3. For each pair of states i and j that have more
than one edge between them (in the same
direction), replace all the edges by a single
edge labeled with the RE formed by the sum of
the labels of these edges.
4. If there is no edge <i,j> then label(i,j)=∅.
[Figure: two edges i →r→ j and i →s→ j are merged into one edge i →(r+s)→ j]
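The conversion can be sketched with edge labels stored as RE strings in a dictionary keyed by state pairs, so that a missing key plays the role of the label ∅ (step 4). Steps 1 and 2 survive only as figures; the sketch assumes their usual form, which is consistent with the example that follows: a new start state s with an ε-edge to q0, and a new accept state f with an ε-edge from every old accept state. It handles a DFA only (one target per state and symbol); all names are hypothetical.

```python
EMPTY, EPS = "∅", "ε"

def union(r: str, s: str) -> str:
    """r+s, with ∅ as the identity and duplicate terms merged."""
    if r == EMPTY:
        return s
    if s == EMPTY or s == r:
        return r
    return f"{r}+{s}"

def dfa_to_gnfa(states, delta, start, accepts):
    """Return (gnfa_states, labels); labels[(i, j)] is an RE string.
    Pairs missing from labels are implicitly labeled ∅ (step 4).
    Assumes no DFA state is itself named 's' or 'f'."""
    labels = {}

    def add_edge(i, j, lab):
        # Step 3: parallel edges i -> j are merged into one edge labeled by their sum.
        labels[(i, j)] = union(labels.get((i, j), EMPTY), lab)

    add_edge("s", start, EPS)        # assumed step 1: new start state s
    for q in accepts:
        add_edge(q, "f", EPS)        # assumed step 2: single new accept state f
    for (q, a), p in delta.items():
        add_edge(q, p, a)
    return ["s", *states, "f"], labels

# DFA of the example (q1 accepting, as the later elimination steps imply)
delta = {("q0", "a"): "q1", ("q0", "b"): "q2",
         ("q1", "a"): "q1", ("q1", "b"): "q1",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}
gnfa_states, labels = dfa_to_gnfa(["q0", "q1", "q2"], delta, "q0", ["q1"])
for (i, j), lab in sorted(labels.items()):
    print(f"{i} -> {j}: {lab}")
```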
Example: DFA to GNFA
[Figure: DFA with start state q0 and accept state q1: q0 →a→ q1, q0 →b→ q2, and a loop labeled a,b on each of q1 and q2]
Example: DFA to GNFA (cont.)
[Figure: the same DFA with a new start state s (ε-edge to q0) and a new accept state f (ε-edge from q1)]
Example: DFA to GNFA (cont.)
[Figure: the GNFA, with each label a,b replaced by a+b]
Converting GNFA into RE
• Let old(i,j) denote the label on edge <i,j> of
the current GNFA.
• Construct a sequence of new machines by
eliminating one state at a time until the only
two states remaining are s and f.
• The state elimination order is arbitrary.
• When a state is eliminated, a new
(equivalent) machine is constructed.
• The label on <s,f> in the final machine is
the required RE.
Converting GNFA into RE (cont.)
Eliminating state k
• For each pair of states (i,j) where i,j≠k, the
label of (i,j) will be updated as follows:
new(i,j)=old(i,j) + old(i,k)old(k,k)*old(k,j)
[Figure: the edges i →old(i,j)→ j, i →old(i,k)→ k, k →old(k,j)→ j and the loop old(k,k) on k are replaced by one edge i →old(i,j)+old(i,k)old(k,k)*old(k,j)→ j]
Converting GNFA into RE (cont.)
• The states of the new machine are those of
the current machine with state k eliminated.
• The edges of the new machine are those
edges (i,j) for which new(i,j) has been
calculated.
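The elimination step can be sketched on a dictionary of RE-string labels keyed by state pairs (a missing pair means ∅). The string helpers apply the simplifications the worked example uses implicitly (∅+r=r, ∅r=r∅=∅, εr=rε=r, ∅*=ε*=ε) and add parentheses only where + would otherwise bind wrongly; they are an illustrative assumption, not a complete simplifier. All names are hypothetical.

```python
EMPTY, EPS = "∅", "ε"

def _has_top_level_plus(r: str) -> bool:
    """True if r contains a + that is not inside parentheses."""
    depth = 0
    for c in r:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c == "+" and depth == 0:
            return True
    return False

def _is_atom(r: str) -> bool:
    """True for a single symbol or a string enclosed by one outer pair of parentheses."""
    if len(r) == 1:
        return True
    if not r.startswith("("):
        return False
    depth = 0
    for pos, c in enumerate(r):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return pos == len(r) - 1
    return False

def union(r: str, s: str) -> str:
    if r == EMPTY:
        return s
    if s == EMPTY or s == r:
        return r
    return f"{r}+{s}"

def concat(r: str, s: str) -> str:
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r

    def wrap(x: str) -> str:
        return f"({x})" if _has_top_level_plus(x) else x

    return wrap(r) + wrap(s)

def star(r: str) -> str:
    if r in (EMPTY, EPS):
        return EPS                          # ∅* = ε* = ε
    return (r if _is_atom(r) else f"({r})") + "*"

def eliminate(states, labels, order):
    """Remove the states in `order` one at a time; return the label on <s,f>."""
    remaining = list(states)
    for k in order:
        remaining.remove(k)
        old = labels
        labels = {}
        for i in remaining:
            for j in remaining:
                # new(i,j) = old(i,j) + old(i,k) old(k,k)* old(k,j)
                through_k = concat(concat(old.get((i, k), EMPTY), star(old.get((k, k), EMPTY))),
                                   old.get((k, j), EMPTY))
                new = union(old.get((i, j), EMPTY), through_k)
                if new != EMPTY:
                    labels[(i, j)] = new
    return labels.get(("s", "f"), EMPTY)

# GNFA of the example (labels a,b already written as a+b)
gnfa = {("s", "q0"): EPS, ("q0", "q1"): "a", ("q0", "q2"): "b",
        ("q1", "q1"): "a+b", ("q2", "q2"): "a+b", ("q1", "f"): EPS}
states = ["s", "q0", "q1", "q2", "f"]
print(eliminate(states, gnfa, ["q2", "q0", "q1"]))
```

With the example GNFA and the order q2, q0, q1 this returns a(a+b)*. Other orders also give correct expressions, though in general they may be written differently.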
Example: GNFA to RE
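[Figure: GNFA with states s, q0, q1, q2, f: s →ε→ q0, q0 →a→ q1, q0 →b→ q2, q1 →ε→ f, and a loop labeled a+b on each of q1 and q2]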
Example: GNFA to RE
Eliminate state q2
• q2 has no outgoing edge except its own loop,
so old(q2,j)=∅ for every j≠q2, and every term
old(i,q2)old(q2,q2)*old(q2,j) is ∅. Nothing needs
to change when q2 is deleted.
[Figure: the GNFA before q2 is deleted]
[Figure: the GNFA after q2 is deleted: s →ε→ q0 →a→ q1 →ε→ f, with a loop labeled a+b on q1]
Example: GNFA to RE
Eliminate state q0
• The only path through q0 goes from s to q1.
• We add an edge labeled by the regular
expression associated with the deleted edges:
new(s,q1)=old(s,q1)+old(s,q0)old(q0,q0)*old(q0,q1)=
=∅+ε∅*a=a (since ∅*=ε)
[Figure: the GNFA before q0 is deleted]
[Figure: the GNFA after q0 is deleted: s →a→ q1 →ε→ f, with a loop labeled a+b on q1]
Example: GNFA to RE
Eliminate state q1
• The only path through q1 goes from s to f.
new(s,f)=old(s,f)+old(s,q1)old(q1,q1)*old(q1,f)=
=∅+a(a+b)*ε=a(a+b)*
[Figure: the GNFA before q1 is deleted]
[Figure: the final machine s →a(a+b)*→ f]
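A quick cross-check of the result: the sketch below runs the example DFA on every word over {a,b} of length at most 8 and compares its answer with Python's re.fullmatch on a(?:a|b)*, the Python spelling of a(a+b)*. The transition table is reconstructed from the figure, with q1 taken as the accept state (consistent with the elimination steps); names are hypothetical.

```python
import itertools
import re

# Transition table reconstructed from the example figure (assumption: q1 accepts).
DELTA = {("q0", "a"): "q1", ("q0", "b"): "q2",
         ("q1", "a"): "q1", ("q1", "b"): "q1",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}
START, ACCEPT = "q0", {"q1"}

def dfa_accepts(word: str) -> bool:
    """Run the DFA on word and report whether it ends in an accept state."""
    q = START
    for c in word:
        q = DELTA[(q, c)]
    return q in ACCEPT

PATTERN = re.compile(r"a(?:a|b)*")   # a(a+b)* written in Python re syntax

def mismatches(max_len: int = 8) -> list:
    """Words over {a,b} of length <= max_len on which the DFA and the RE disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in itertools.product("ab", repeat=n):
            w = "".join(letters)
            if dfa_accepts(w) != bool(PATTERN.fullmatch(w)):
                bad.append(w)
    return bad

print(mismatches())
```

An empty list means the DFA and a(a+b)* agree on all words up to that length; both accept exactly the words that start with a.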
Example II
What is the regular expression of L(A)?
[Figure: automaton A with states q0 and q1; edge labels a, b, and a,b]
Solution: In class
Example III
What is the regular expression of L(A)?
[Figure: automaton A with states q0, q1, and q2; edge labels a and b]
Solution: In class