Unit V
while i < n, do
   if text[i] = pattern[j], then
      increase i and j by 1
   if j = m, then
      print the location (i-j) as there is the pattern
      j := prefArray[j-1]
   else if i < n AND pattern[j] ≠ text[i] then
      if j ≠ 0 then
         j := prefArray[j - 1]
      else
         increase i by 1
done
End
Worked example of the search algorithm
Input:
Main String: “AAAABAAAAABBBAAAAB”, Pattern: “AAAB”
Output:
Pattern found at location: 1
Pattern found at location: 7
Pattern found at location: 14
Input:
txt[] = “THIS IS A TEST TEXT”, pat[] = “TEST”
Output:
Pattern found at index: 10
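The search loop and the two worked examples above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names are illustrative, and build_prefix_array is an assumption, since this section uses prefArray without showing how it is built. It is taken here to be the standard KMP prefix (failure) table.

```python
def build_prefix_array(pattern):
    """Assumed construction of prefArray (not shown in this section):
    pref[k] is the length of the longest proper prefix of pattern[:k+1]
    that is also a suffix of it."""
    pref = [0] * len(pattern)
    length = 0  # length of the prefix matched so far
    k = 1
    while k < len(pattern):
        if pattern[k] == pattern[length]:
            length += 1
            pref[k] = length
            k += 1
        elif length != 0:
            length = pref[length - 1]  # fall back without advancing k
        else:
            pref[k] = 0
            k += 1
    return pref


def kmp_search(text, pattern):
    """Return the 0-based start index of every occurrence of pattern in text.
    The loop mirrors the pseudocode: i indexes the text, j the pattern."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pref = build_prefix_array(pattern)
    locations = []
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == m:
            locations.append(i - j)  # full match ends just before i
            j = pref[j - 1]          # keep searching for further matches
        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = pref[j - 1]      # reuse the part already matched
            else:
                i += 1
    return locations


print(kmp_search("AAAABAAAAABBBAAAAB", "AAAB"))
print(kmp_search("THIS IS A TEST TEXT", "TEST"))
```

The returned positions are 0-based, which matches the locations listed in the worked examples.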
BOYER MOORE ALGORITHM
• A hardware text search system uses a specialized hardware machine to perform
the searches and pass the results to the main computer, which supports the
user interface and the retrieval of hits.
• Since the searcher is hardware based, scalability is achieved by
increasing the number of hardware search devices.
• The only limit on speed is the time it takes to stream the text off secondary
storage. With one search machine per disk, the maximum time needed to search
a database of any size is the time needed to search one disk.
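The scaling argument above can be checked with a small calculation. This is only an illustrative sketch: the disk sizes and the streaming rate are made-up numbers, and the function names are hypothetical. It assumes every search machine streams text at the same rate and that all machines run concurrently.

```python
def single_searcher_time(disk_sizes_gb, rate_gb_per_s):
    """One search machine streams every disk in turn."""
    return sum(disk_sizes_gb) / rate_gb_per_s


def searcher_per_disk_time(disk_sizes_gb, rate_gb_per_s):
    """One search machine per disk, all running at once:
    the largest disk bounds the total search time."""
    return max(disk_sizes_gb) / rate_gb_per_s


# Hypothetical figures: four 100 GB disks streamed at 0.5 GB/s.
disks = [100, 100, 100, 100]
print(single_searcher_time(disks, 0.5))    # sequential scan of all disks
print(searcher_per_disk_time(disks, 0.5))  # bounded by a single disk
```

Under these assumptions, adding disks (each with its own searcher) grows the database without growing the search time.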
The Fast Data Finder (FDF)
• It is the most recent specialized hardware text search unit still in use in many
organizations.
• It was developed to search text and has been used to search English and
foreign languages.
• The early Fast Data Finders consisted of an array of programmable text
processing cells connected in series forming a pipeline hardware search
processor.
• The cells are implemented using a VLSI chip. In the tests each chip contained
24 processor cells with a typical system containing 3600 cells (the FDF-3 has
a rack mount configuration with 10,800 cells).
• Each cell is a comparator for a single character, limiting the total number of
characters in a query to the number of cells. The cells are interconnected with
an 8-bit data path and an approximately 20-bit control path.
• The text to be searched passes through each cell in a pipeline fashion until
the complete database has been searched.
• As data are analyzed at each cell, the states of the 20 control lines are
modified depending upon their current state and the results from the comparator.
• A cell is composed of both a register cell (Rs) and a comparator (Cs).
• The input from the document database is controlled and buffered by the
microprocessor/memory and fed through the comparators.
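The cell structure described above can be illustrated with a much simplified software model. This is a sketch under stated assumptions, not the actual FDF logic: each cell holds one query character in its register and compares it with the incoming text character, and the roughly 20 control lines are reduced to a single bit of match state handed from each cell to the next. The names Cell and pipeline_search are hypothetical.

```python
class Cell:
    """One cell: a register (one query character) plus a comparator."""

    def __init__(self, register_char):
        self.register_char = register_char
        self.state = False  # True if the query matched up to this cell

    def compare(self, text_char, state_in):
        # Comparator result combined with the state from the previous cell.
        return state_in and text_char == self.register_char


def pipeline_search(text, query):
    """Stream text past a chain of cells; return 0-based start positions of hits.
    One cell per query character, so query length is limited by the cell count."""
    cells = [Cell(c) for c in query]
    hits = []
    if not cells:
        return hits
    for pos, ch in enumerate(text):
        # Update last to first so each cell reads its neighbour's state
        # from the previous character, as if all cells switched at once.
        for k in range(len(cells) - 1, -1, -1):
            state_in = True if k == 0 else cells[k - 1].state
            cells[k].state = cells[k].compare(ch, state_in)
        if cells[-1].state:
            hits.append(pos - len(cells) + 1)
    return hits


print(pipeline_search("THIS IS A TEST TEXT", "TEST"))
```

In this model the whole text is examined in a single pass with a fixed amount of work per character per cell, which is the property that makes a hardware pipeline attractive.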
Other Hardware
• In the FDF, the search characters are stored in the registers. The connections
between the registers represent the control lines, which also pass state information.
• The earliest hardware text string search unit was the Rapid Search Machine,
developed by General Electric. The machine consisted of a special-purpose search unit
where a single query was passed against a magnetic tape containing the
documents.
• In recent years the evaluation of IRS and techniques for indexing, sorting,
searching and retrieving information have become increasingly important.