Theory of Computation Term Paper
A term paper/project submitted in fulfillment of the requirements for the award of the degree of Master
of Computer Applications
Topic: Techniques for converting regular expressions to and from finite automata
INTRODUCTION:
Regular expressions represent sets of strings algebraically. They are widely used in computer science: in
compiler design, text editors, searching for e-mail addresses, the Unix grep filter, train track switches,
pattern matching, context switching and many other areas. The need to convert regular expressions into
finite automata and vice versa motivates research into alternative methods that minimize the time these
conversions take. Several techniques have been proposed for converting a deterministic finite automaton to a
regular expression, including the transitive closure method, the Brzozowski algebraic method and the state
elimination method. None of these techniques is guaranteed to produce the smallest equivalent regular
expression. Our purpose is to find the smallest regular expression equivalent to a given deterministic
finite automaton. The state elimination approach is the most widely used and efficient approach for
converting deterministic finite automata to regular expressions.
This term paper investigates the different techniques used for converting deterministic finite automata to
regular expressions and presents a brief comparison among them.
A deterministic finite automaton (DFA) is a finite-state machine that accepts or rejects finite strings of
symbols. From each state there is exactly one transition arrow leading to a next state for each symbol. A DFA
can be defined by a 5-tuple (Q, Σ, δ, q0, F), where
Q is a finite set of states
Σ is a finite set of symbols (the alphabet)
δ is the transition function, that is, δ: Q × Σ → Q
q0 ∈ Q is the start state
F ⊆ Q is the set of accepting (final) states
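As an illustration of the 5-tuple, the following minimal Python sketch stores each component directly and runs the machine on an input string. The class name `DFA`, its `accepts` method and the example machine (strings over {a, b} with an even number of a's) are hypothetical choices made for illustration; they do not come from the sources of this paper.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A DFA stored as its 5-tuple (Q, Σ, δ, q0, F)."""
    states: set      # Q
    alphabet: set    # Σ
    delta: dict      # δ: maps (state, symbol) -> next state, defined on all of Q × Σ
    start: str       # q0
    finals: set      # F

    def accepts(self, word: str) -> bool:
        q = self.start
        for c in word:
            if c not in self.alphabet:
                return False          # symbol outside Σ: treat the word as rejected
            q = self.delta[(q, c)]    # exactly one next state per (state, symbol)
        return q in self.finals


# Hypothetical example: strings over {a, b} containing an even number of a's.
even_as = DFA(
    states={"even", "odd"},
    alphabet={"a", "b"},
    delta={("even", "a"): "odd", ("even", "b"): "even",
           ("odd", "a"): "even", ("odd", "b"): "odd"},
    start="even",
    finals={"even"},
)
```

With this machine, `even_as.accepts("abab")` returns True and `even_as.accepts("ab")` returns False. Because δ is a total function, the loop never has to choose between transitions, which is exactly what "deterministic" means.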
3. Regular Expression:
A regular expression (RE) is a pattern that describes a set of strings. Regular expressions over an alphabet Σ
can be defined as follows:
1) Each symbol a ∈ Σ is a regular expression denoting the language {a}. The empty string (ϵ) and the null
language (ϕ) are regular expressions denoting the languages {ϵ} and ∅ (the empty set) respectively.
2) If E and F are regular expressions denoting the languages L(E) and L(F) respectively, then the following
rules can be applied recursively.
a. The union of E and F is denoted by the regular expression E+F and represents the language L(E) ∪ L(F).
b. The concatenation of E and F is denoted by EF and represents the language L(EF) = L(E)L(F) = {xy | x ∈ L(E), y ∈ L(F)}.
c. The Kleene closure of E is denoted by E* and represents the language (L(E))*.
Every regular expression is formed by a finite number of applications of rules 1 and 2.
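The recursive definition translates almost line for line into code. The sketch below is illustrative rather than a standard API: it assumes a regular expression is encoded as nested Python tuples with one constructor per rule, and the hypothetical function `lang(e, n)` lists the strings of length at most n in L(e). The length bound keeps the result finite even when E* denotes an infinite language.

```python
# Regular expressions as nested tuples, one constructor per rule (assumed encoding):
#   ("empty",)        -> ϕ, the empty language           (rule 1)
#   ("eps",)          -> {ϵ}                             (rule 1)
#   ("sym", a)        -> {a} for a symbol a in Σ         (rule 1)
#   ("union", E, F)   -> L(E) ∪ L(F)                     (rule 2a)
#   ("concat", E, F)  -> L(E)L(F)                        (rule 2b)
#   ("star", E)       -> (L(E))*                         (rule 2c)

def lang(e, n):
    """Return the set of strings of length <= n in the language denoted by e."""
    kind = e[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {e[1]} if len(e[1]) <= n else set()
    if kind == "union":
        return lang(e[1], n) | lang(e[2], n)
    if kind == "concat":
        left, right = lang(e[1], n), lang(e[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":
        base = lang(e[1], n) - {""}       # ϵ pieces add nothing to a closure
        result, frontier = {""}, {""}     # zero repetitions give ϵ
        while frontier:                   # append one more piece until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown constructor {kind!r}")


# Example: (a+b)*b, the strings over {a, b} that end in b.
ends_in_b = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "b"))
# lang(ends_in_b, 2) == {"b", "ab", "bb"}
```

Note that `lang(("star", ("empty",)), n)` returns {""}, matching the identity ϕ* = {ϵ} that follows from rule 2c.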
CONVERSIONS BETWEEN REGULAR EXPRESSION & AUTOMATA
This section describes the different techniques used for converting between deterministic finite automata
and regular expressions.
(a) Conversion of DFA to RE
Kleene proved that every RE has an equivalent DFA and vice versa. On the basis of this theoretical result, a
DFA can be converted into an RE, and vice versa, by suitable algorithms. To convert an RE to a DFA, we first
convert the RE to an NFA (Thompson's construction) and then convert the NFA into a DFA (subset construction).
For converting a DFA to a regular expression, the following methods have been introduced.
▪ Transitive closure method
▪ Brzozowski Algebraic method
▪ State elimination method
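Since the introduction names state elimination as the most widely used of these methods, here is a minimal Python sketch of it. It assumes the DFA is given as a dictionary `delta` mapping (state, symbol) pairs to states, with single-character symbols. The function names, the fresh states `_start` and `_final`, and the string notation (+ for union, ε for the empty string, None for the empty language) are assumptions of this sketch. A new start and a new final state are added, then each original state q is removed in turn, replacing every path p → q → r by a direct edge labelled R(p,q)R(q,q)*R(q,r), unioned with any existing label R(p,r).

```python
EPS = "ε"  # label for the empty string; None stands for the empty language ϕ


def wrap(r):
    """Parenthesize r if it has a top-level '+', so it can be concatenated safely."""
    depth = 0
    for ch in r:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return f"({r})"
    return r


def re_union(r1, r2):
    if r1 is None:
        return r2
    if r2 is None or r1 == r2:
        return r1
    return f"{r1}+{r2}"


def re_concat(*parts):
    if any(p is None for p in parts):
        return None                          # concatenating with ϕ gives ϕ
    kept = [p for p in parts if p != EPS]    # ε is the identity for concatenation
    return "".join(wrap(p) for p in kept) if kept else EPS


def re_star(r):
    if r is None or r == EPS:
        return EPS                           # ϕ* = ε* = ε
    return f"{r}*" if len(r) == 1 else f"({r})*"


def dfa_to_regex(states, delta, start, finals):
    """State elimination. delta maps (p, symbol) -> q; returns a regex string or None."""
    S, F = "_start", "_final"                # hypothetical fresh states; must not clash with DFA states
    R = {}                                   # R[(p, q)] = regex labelling the edge p -> q

    def add(p, q, r):
        R[(p, q)] = re_union(R.get((p, q)), r)

    for (p, a), q in delta.items():
        add(p, q, a)
    add(S, start, EPS)
    for f in finals:
        add(f, F, EPS)

    for q in states:                         # eliminate the original states one by one
        loop = re_star(R.pop((q, q), None))
        preds = [p for (p, x) in R if x == q]
        succs = [r for (x, r) in R if x == q]
        for p in preds:
            for r in succs:
                # bypass q: R(p,r) := R(p,r) + R(p,q) R(q,q)* R(q,r)
                add(p, r, re_concat(R[(p, q)], loop, R[(q, r)]))
        for key in [k for k in R if q in k]:
            del R[key]                       # q and all its edges are gone
    return R.get((S, F))                     # None means the DFA accepts nothing


# Example: DFA over {a, b} accepting the strings that end in b.
ends_in_b = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
result = dfa_to_regex([0, 1], ends_in_b, 0, {1})  # "a*b(b+aa*b)*"
```

For this DFA the sketch returns a*b(b+aa*b)*, which is correct but longer than the equivalent (a+b)*b. The output depends on the order in which states are eliminated, and the method does not guarantee the smallest expression, which is the limitation the introduction points out.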
(b) Conversion of RE to FA
Every regular expression has an equivalent NFA and vice versa. There are multiple ways to translate an RE
into an equivalent NFA, but the two main and most popular approaches are Thompson's algorithm, which is the
one used in this project, and McNaughton and Yamada's algorithm.
(b1) Thompson’s algorithm.
Thompson's algorithm was first described by Thompson in his CACM paper in 1968. It parses the input RE
bottom-up and constructs the equivalent NFA from partial NFAs: the RE is represented as a syntax tree in
which every subexpression is a subtree, and each subtree is translated into a partial NFA whose shape depends
on the subtree's operator. For example, the NFA for a single character c is a start state joined to an
accepting state by one transition labelled c, and combining two such automata in sequence gives the
automaton for ab:
Figure 6: Automaton that represents the concatenation of two characters, ‘a’ and ‘b’ (ab)
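The alternation a|b is constructed by adding a new start state with one ϵ-transition to the automaton for the
first expression and another to the automaton for the second expression; both automata then lead by
ϵ-transitions to a new accepting state: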
Figure 7: Automaton that represents the union of two characters, ‘a’ and ‘b’ (a|b)
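The loops a* and a+ are constructed almost identically, since a+ can be written as aa*. For a*, new start and
accepting states are added: ϵ-transitions lead from the new start state into the automaton for a and directly
to the new accepting state (so that ϵ is accepted), and from the accepting state of the automaton for a back
to its start (to repeat a) and on to the new accepting state.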
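The constructions for a single character, concatenation, alternation and closure, followed by the subset construction mentioned in section (a), can be sketched in Python as below. This is an illustrative sketch, not Thompson's original implementation: fragments are joined with explicit ϵ-edges, `None` stands for ϵ, the expression tree is built by hand rather than parsed, and the names `symbol`, `concat`, `union`, `star`, `closure`, `to_dfa` and `run` are hypothetical.

```python
from itertools import count

EPS = None        # label used for ϵ-transitions
_ids = count()    # source of fresh state numbers


class NFA:
    """A Thompson fragment: one start state, one accept state, labelled edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges  # edges: (src, label, dst)


def symbol(c):
    # single character: start --c--> accept
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, c, f)])


def concat(n1, n2):
    # ab: an ϵ-edge joins the accept state of n1 to the start state of n2
    return NFA(n1.start, n2.accept, n1.edges + n2.edges + [(n1.accept, EPS, n2.start)])


def union(n1, n2):
    # a|b: a new start state branches to both fragments, both reach a new accept state
    s, f = next(_ids), next(_ids)
    return NFA(s, f, n1.edges + n2.edges + [
        (s, EPS, n1.start), (s, EPS, n2.start), (n1.accept, EPS, f), (n2.accept, EPS, f)])


def star(n):
    # a*: skip the fragment (accept ϵ) or run it and loop back to repeat
    s, f = next(_ids), next(_ids)
    return NFA(s, f, n.edges + [
        (s, EPS, n.start), (s, EPS, f), (n.accept, EPS, n.start), (n.accept, EPS, f)])


def closure(nfa, states):
    # all NFA states reachable from `states` using only ϵ-edges
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def to_dfa(nfa, alphabet):
    # subset construction: each DFA state is an ϵ-closed set of NFA states
    start = closure(nfa, {nfa.start})
    delta, todo = {}, [start]
    while todo:
        S = todo.pop()
        if S in delta:
            continue
        delta[S] = {}
        for c in alphabet:
            T = closure(nfa, {d for s, l, d in nfa.edges if s in S and l == c})
            delta[S][c] = T          # the empty set acts as a dead state
            todo.append(T)
    finals = {S for S in delta if nfa.accept in S}
    return start, delta, finals


def run(dfa, word):
    start, delta, finals = dfa
    S = start
    for c in word:
        S = delta[S][c]
    return S in finals


# Example: (a|b)*abb, built bottom-up from its syntax tree.
a_or_b_star = star(union(symbol("a"), symbol("b")))
nfa = concat(concat(concat(a_or_b_star, symbol("a")), symbol("b")), symbol("b"))
dfa = to_dfa(nfa, "ab")
```

Following the rewriting a+ = aa*, the fragment for a+ can be built as `concat(symbol("a"), star(symbol("a")))`. Each call to `symbol` creates fresh states, so the two copies of a do not share states, which Thompson's construction requires.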
CONCLUSION
This paper provides insight into the various approaches used for converting deterministic finite automata
to regular expressions and vice versa, and compares the different techniques for converting a DFA to an RE.
Researching this project has shown that converting regular expressions to DFAs and back again are
well-understood processes that can be implemented without great difficulty. The most time-consuming part of
the project was coding the parser for regular expressions. This is because, although regular expressions
define regular languages, the language of regular expressions themselves is not regular (for example, their
parentheses must be balanced) and must be described by a context-free grammar.
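The point about parsing can be made concrete with a small recursive-descent parser. The sketch below assumes a hypothetical syntax (single characters as symbols, | for union, postfix * for closure and parentheses for grouping; + and ϵ are not supported) and produces nested tuples such as ("concat", E, F). The recursion through parenthesized subexpressions is the nesting that a finite automaton cannot track.

```python
class Parser:
    """Recursive-descent parser for the (assumed) context-free grammar
         expr   -> term ('|' term)*
         term   -> factor factor*
         factor -> atom '*'*
         atom   -> symbol | '(' expr ')'
    """

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def eat(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
        return tree

    def expr(self):
        # union has the lowest precedence
        tree = self.term()
        while self.peek() == "|":
            self.eat("|")
            tree = ("union", tree, self.term())
        return tree

    def term(self):
        # concatenation: keep reading factors until '|', ')' or end of input
        tree = self.factor()
        while self.peek() not in (None, "|", ")"):
            tree = ("concat", tree, self.factor())
        return tree

    def factor(self):
        # postfix closure binds tightest
        tree = self.atom()
        while self.peek() == "*":
            self.eat("*")
            tree = ("star", tree)
        return tree

    def atom(self):
        ch = self.peek()
        if ch == "(":
            self.eat("(")
            tree = self.expr()   # recursion: this is where balanced parentheses are matched
            self.eat(")")
            return tree
        if ch is None or ch in "|)*":
            raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")
        self.pos += 1
        return ("sym", ch)


tree = Parser("a(b|c)*").parse()
# ("concat", ("sym", "a"), ("star", ("union", ("sym", "b"), ("sym", "c"))))
```

An unbalanced input such as "(a" raises SyntaxError when the parser reaches the end of the text while still expecting ")".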