Exact String Matchin
1 pandeyravik@gmail.com, 2 staruna@jklu.edu.in
Abstract. String matching and searching are among the classical problems of computer
engineering. Numerous algorithms exist, and they fall into two classes: approximate matching
and exact matching. This paper compares prevalent exact matching algorithms in terms of their
functionality and complexity, with critical remarks to clarify the differences among them.
String matching is applied in many fields, including network intrusion detection, DNA matching
in bioinformatics, plagiarism checking for fraud detection, information security, pattern
recognition, text mining, web searching, recommendation systems, document comparison,
authentication systems, and web scraping. Its uses are not restricted to these fields; the
idea also offers many advantages for ongoing and future research.
1. Introduction
String matching underlies many valuable algorithms used in practice. String matching and searching
algorithms often serve as base algorithms for advanced research, and researchers adapt them to their
needs. For example, a parallel search algorithm suits a search in a big data environment, whereas the
same algorithm may be inefficient on a small data set. If an approximate match to some string is
required, an algorithm from the approximate matching family must be used. This paper focuses on a few
prevalent exact matching algorithms. The discussion is not exhaustive, and less common algorithms not
reviewed here may perform better in a particular situation. Among the algorithms discussed in the
paper, the Naive algorithm is considered the simplest and the oldest, but in terms of performance it
lags behind the other algorithms presented.
2. Exact String Matching Algorithms
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
FEST 2020 IOP Publishing
Journal of Physics: Conference Series 1854 (2021) 012042 doi:10.1088/1742-6596/1854/1/012042
deprived of any fixed directive. The search time complexity of the Brute_Force (Naïve) algorithm is
O(n*m).
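
The O(n*m) bound can be illustrated with a minimal sketch of the brute-force search. It assumes that n is the length of the text and m the length of the pattern; the function name naive_search is hypothetical and chosen only for this example.

```python
def naive_search(pattern, text):
    """Return every index of text at which pattern occurs (brute force)."""
    m, n = len(pattern), len(text)
    matches = []
    # Try each of the n - m + 1 possible alignments of the pattern.
    for j in range(n - m + 1):
        i = 0
        # Compare character by character; up to m comparisons per alignment.
        while i < m and text[j + i] == pattern[i]:
            i += 1
        if i == m:
            matches.append(j)
    return matches


print(naive_search("aba", "ababa"))
```

Each alignment may cost up to m comparisons and there are n - m + 1 alignments, which gives the O(n*m) worst case, reached for instance with a pattern such as "aaab" in a text made only of "a" characters.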
The essence of the Turbo_BM algorithm is to remember the substring of the text that matched a suffix
of the pattern during the previous attempt; the algorithm is a modified form of the Boyer_Moore
algorithm. It needs no pre-processing beyond that of the base algorithm and requires only constant
additional space. Remembering this substring lets the algorithm skip over it in the current attempt
and perform a turbo shift, from which the name Turbo_BM derives. The space and pre-processing time
complexities are O(σ) and O(σ+m) respectively, whereas the search time complexity is O(n). In the
worst case, at most 2n text character comparisons are performed during the search.
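
The following is a minimal sketch of the search described above, adapted from the widely published formulation of Turbo_BM rather than from any listing in this paper. All function and variable names are hypothetical; u is assumed to be the length of the remembered substring, and characters absent from the pattern are assumed to shift by the full pattern length m.

```python
def bad_character_table(pattern):
    """Shift for each pattern character; absent characters shift by m."""
    m = len(pattern)
    return {pattern[i]: m - 1 - i for i in range(m - 1)}


def suffix_lengths(pattern):
    """suff[i] = length of the longest suffix of pattern ending at index i."""
    m = len(pattern)
    suff = [0] * m
    suff[m - 1] = m
    f = g = m - 1
    for i in range(m - 2, -1, -1):
        if i > g and suff[i + m - 1 - f] < i - g:
            suff[i] = suff[i + m - 1 - f]
        else:
            g = min(g, i)
            f = i
            while g >= 0 and pattern[g] == pattern[g + m - 1 - f]:
                g -= 1
            suff[i] = f - g
    return suff


def good_suffix_table(pattern):
    """Boyer_Moore good-suffix shifts, reused unchanged by Turbo_BM."""
    m = len(pattern)
    suff = suffix_lengths(pattern)
    shifts = [m] * m
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:
            while j < m - 1 - i:
                if shifts[j] == m:
                    shifts[j] = m - 1 - i
                j += 1
    for i in range(m - 1):
        shifts[m - 1 - suff[i]] = m - 1 - i
    return shifts


def turbo_bm_search(pattern, text):
    """Return every index of text at which pattern occurs."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    bad_char = bad_character_table(pattern)
    good_suffix = good_suffix_table(pattern)
    matches = []
    j = 0        # current alignment of the pattern in the text
    u = 0        # length of the substring remembered from the last attempt
    shift = m
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1
            if u != 0 and i == m - 1 - shift:
                i -= u   # jump over the remembered substring
        if i < 0:
            matches.append(j)
            shift = good_suffix[0]
            u = m - shift
        else:
            v = m - 1 - i            # characters matched in this attempt
            turbo_shift = u - v
            bc_shift = bad_char.get(text[j + i], m) - m + 1 + i
            shift = max(turbo_shift, bc_shift, good_suffix[i])
            if shift == good_suffix[i]:
                u = min(m - shift, v)
            else:
                if turbo_shift < bc_shift:
                    shift = max(shift, u + 1)
                u = 0
        j += shift
    return matches


print(turbo_bm_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
```

Only the two integers u and shift are carried from one attempt to the next, which corresponds to the constant additional space mentioned above; the two tables are the same ones the Boyer_Moore algorithm builds.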
3. Comparison Chart
Table 1. Pre-processing and matching complexities of the algorithms discussed in the previous
section.
Name of Algorithm                 Pre-processing Complexity   Matching Complexity
Brute Force (Naïve)               ----                        O(n*m)
Finite Automata                   O(m|∑|)                     O(n)
Rabin_Karp                        O(m)                        O((n-m+1) m)
Shift OR                          O(σ+m)                      O(n)
Morris_Pratt                      O(m)                        O(n+m)
Knuth_Morris_Pratt                O(m)                        O(n)
Colussi                           O(m)                        O(n)
Forward DAWG                      O(n)                        O(n)
Boyer_Moore                       O(|∑|)                      O((n-m+1)+|∑|)
Quick Search                      O(σ+m)                      O(n*m)
Horspool                          O(σ+m)                      O(n*m)
Apostolico_Giancarlo              O(σ+m)                      O(n)
Boyer_Moore_Smith                 O(σ+m)                      O(n*m)
Raita                             O(σ+m)                      O(n*m)
Turbo_BM                          O(σ+m)                      O(n)
Berry_Ravindran                   O(σ²+m)                     O(n*m)
Wu_Manber                         O(|∑|+m)                    O(n)
Backward_Oracle_Matching          O(m)                        O(n*m)
Backward Nondeterministic DAWG    O(m)                        O(n*m)
Commentz_Walter                   O(σ+m)                      O(n*m)
Aho_Corasick                      O(nk+m)                     O(n+m+z)
4. Conclusion
The paper discussed some of the popular exact matching algorithms and found that only a few show
linear pre-processing or search time complexity. Shift OR, Morris_Pratt, Knuth_Morris_Pratt, Colussi,
Forward DAWG, Boyer_Moore, Apostolico_Giancarlo, Turbo_BM, and Wu_Manber are linear in both the
pre-processing and the search phase. Among the discussed algorithms, some are good at matching long
patterns (e.g. Backward_Oracle), while a few are highly efficient with short patterns (e.g. Quick
Search). Many of the algorithms work by simple pattern comparison, while others use automata (e.g.
Searching with an Automaton, Forward DAWG Matching, Backward Nondeterministic DAWG Matching,
Aho_Corasick) or hash functions (e.g. Rabin_Karp, Wu_Manber). Several need additional storage, while
a few need none. String matching and searching algorithms span a wide range, including exact match,
approximate match, parallel search, single pattern search, and multiple pattern search.