Fast and Memory-Efficient Regular Expression Matching For Deep Packet Inspection

Fast and Memory-Efficient Regular Expression
Matching for Deep Packet Inspection
Fang Yu
Zhifeng Chen
Yanlei Diao
T.V. Lakshman
Randy H. Katz
Electrical Engineering and Computer Sciences

University of California at Berkeley
Technical Report No. UCB/EECS-2006-76

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-76.html
May 22, 2006

Copyright © 2006, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission.
Fast and Memory-Efficient Regular Expression
Matching for Deep Packet Inspection
Fang Yu, Member, IEEE Zhifeng Chen, Member, IEEE Yanlei Diao, Member, IEEE
T. V. Lakshman , Fellow, IEEE Randy H. Katz, Fellow, IEEE
Abstract—Packet content scanning at high speed has become As regular expressions gain widespread adoption for
extremely important due to its applications in network security, packet content scanning, it is imperative that regular
network monitoring, HTTP load balancing, etc. In content expression matching over the packet payload keep up with
scanning, the packet payload is compared against a set of the line-speed packet header processing. Unfortunately, this
patterns specified as regular expressions. In this paper, we first
show that memory requirements using traditional methods are
requirement cannot be met in many existing payload
prohibitively high for many patterns used in packet scanning scanning implementations. For example, when all 70
applications. We then propose regular expression rewrite protocol filters are enabled in the Linux L7-filter [1], we
techniques that can effectively reduce memory usage. Further, found that the system throughput drops to less than 10Mbps,
we develop a grouping scheme that can strategically compile a which is well below current LAN speeds. Moreover, over
set of regular expressions into several engines, resulting in 90% of the CPU time is spent in regular expression
remarkable improvement of regular expression matching speed matching, leaving little time for other intrusion detection or
without much increase in memory usage. We implement a new monitoring functions. On the other hand, although many
DFA-based packet scanner using the above techniques. Our schemes for fast string matching [4-11] have been developed
experimental results using real-world traffic and patterns show
that our implementation achieves a factor of 12 to 42
recently in intrusion detection systems, they focus on explicit
performance improvement over a commonly used DFA-based string patterns only and can not be easily extended to fast
scanner. Compared to the state-of-art NFA-based regular expression matching.
implementation, our DFA-based packet scanner achieves 50 to The inefficiency in regular expression matching is largely
700 times speedup. due to the fact that the current solutions are not optimized for
the following three unique complex features of regular
I. INTRODUCTION expressions used in network packet scanning applications.
Packet content scanning (also known as Layer-7 filtering or • First, many such patterns use multiple wildcard
payload scanning) is crucial to network security and network metacharacters (e.g., ‘.’, ‘*’). For example, the pattern
monitoring applications. In these applications, the payload of for identifying the Internet radio protocol,
packets in a traffic stream is matched against a given set of “membername.*session.*player”, has two wildcard
patterns to identify specific classes of applications, viruses, fragments “.*”. Some patterns even contain over ten
protocol definitions, etc. such wildcard fragments. As regular expressions are
converted into state machines for pattern matching, large
Currently, regular expressions are replacing explicit
numbers of wildcards can cause the corresponding
string patterns as the pattern matching language of choice in
Deterministic Finite Automaton (DFA) to grow
packet scanning applications. Their widespread use is due to
exponentially.
their expressive power and flexibility for describing useful
patterns. For example, in the Linux Application Protocol • Second, a majority of the wildcards are used with length
Classifier (L7-filter) [1], all protocol identifiers are expressed restrictions (‘?’, ‘+’). As we shall show later in the
as regular expressions. Similarly, the Snort [2] intrusion paper, such length restrictions can increase the resource
detection system has evolved from no regular expressions in needs for expression matching.
its ruleset in April 2003 to 1131 out of 4867 rules using • Third, groups of characters are also commonly used: for
regular expressions as of February 2006. Another intrusion example, the pattern for matching the ftp protocol,
detection system, Bro [3], also uses regular expressions as its “^220[\x09-\x0d -~]*ftp”, contains a class (inside the
pattern language. brackets) that includes all the printing characters and
space characters. The class of characters may intersect
F. Yu, Department of EECS, University of California Berkeley, Berkeley, with other classes or wildcards. Such interaction can
CA 94720 (phone: 510-642-8284; email: fyu@eecs.berkeley.edu).
Z. Chen, Google Inc., 1600 Amphitheathre Pkwy, Mountain View, CA result in a highly complex state machine.
94043 (email: zhifengc@google.com) To the best of our knowledge, there has not been any
Y. Diao, Department of Computer Science, University of Massachusetts detailed study of optimizations for these kinds of regular
Amherst, Amherst, MA 01003 (email: yanlei@cs.umass.edu).
T.V. Lakshman, Bell Laboratories, Lucent Technologies, 101 Crawfords
expressions as they are so specific to network packet
Corner Road, Holmdel, NJ 07733 (email: lakshman@research.bell-labs.com). scanning applications. In this paper, we address this gap by
Randy H. Katz, Department of EECS, University of California Berkeley, analyzing these regular expressions and developing memory-
Berkeley, CA 94720 (email: randy@eecs.berkeley.edu).
1
efficient DFA-based solutions for high speed processing. then a letter l, w or t, and some arbitrary characters, and
Specifically, we make the following contributions: finally the ASCII letters c0 and 80 in the hexadecimal form.
• We analyze the computational and storage cost of Table 2 compares the regular expressions used in two
building individual DFAs for matching regular networking applications, Snort and the Linux L7-filter,
expressions, and identify the structural characteristics of against those used in emerging Extensible Markup Language
the regular expressions in networking applications that (XML) filtering applications [12, 13] where regular
lead to exponential growth of DFAs, as presented in expressions are matched over text documents encoded in
Section 3.2. XML. We notice three main differences: (1) While both
• Based on the above analysis, we propose two rewrite types of applications use wildcards (‘.’, ‘?’, ‘+’, ‘*’), the
rules for specific regular expressions in Section 3.3. The patterns for packet scanning applications contain larger
rewritten rules can dramatically reduce the size of numbers of them in each pattern; (2) classes of characters
resulting DFAs, making them small enough to fit in (“[]”) are used only in packet scanning applications; (3) a
memory. We prove that the patterns after rewriting are high percentage of patterns in packet payload scanning
equivalent to the original ones for detecting non- applications have length restrictions on some of the classes or
overlapping patterns. While we do not claim to handle wildcards, while such length restrictions usually do not occur
all possible cases of dramatic DFA growth (in fact the in XML filtering. This shows that compared to the XML
worse case cannot be improved), our rewrite rules do filtering applications, network packet scanning applications
cover those patterns present in common payload face additional challenges These challenges lead to a
scanning rulesets like Snort and Bro, thus making fast significant increase in the complexity of regular expression
DFA-based pattern matching feasible for today’s matching, as we shall show later in this paper.
payload scanning applications. Table 1. Features of Regular Expressions
Syntax Meaning Example
• We further develop techniques to intelligently combine ^ Pattern to be matched ÂB means the input starts with AB.
multiple DFAs into a small number of groups to improve at the start of the input A pattern without ‘^’, e.g., AB, can
the matching speed in Section IV, while avoiding the be matched anywhere in the input.
| OR relationship A|B denotes A or B.
exponential growth in the number of states in memory.
. A single character
We demonstrate the effectiveness of our rewriting and wildcard
grouping solutions through a detailed performance analysis ? A quantifier denoting A? denotes A or an empty string.
one or less
using real-world payload scanning pattern sets. As the results
* A quantifier denoting A* means an arbitrary number of As.
show, our DFA-based implementation can increase the zero or more
regular expression matching speed on the order of 50 to 700 {} Repeat A{100} denotes 100 As.
times over the NFA-based implementation used in the Linux [] A class of characters [lwt] denotes a letter l, w, or t.
L7-filter and Snort system. It can also achieve 12-42 times [^] Anything but [^\n] denotes any character except \n.
speedup over a commonly used DFA-based parser. The Table 2. Comparison of regular expressions in networking
pattern matching speed can achieve gigabit rates for certain applications against those in XML filtering
pattern sets. This is significant for implementing fast regular Snort L7-filter XML filtering
# of regular expressions analyzed 1555 70 1,000-100,000
expression matching of the packet payload using network
% of patterns starting with “^” 74.4% 72.8% ≥80%
processors or general-purpose processors, as the ability to % of patterns with wildcards “., +, 74.9% 75.7% 50% -
more quickly and efficiently classify enables many new ?, *” 100%
technologies like real-time worm detection, content lookup in Average # of wildcards per pattern 4.65 7.03 1-2
overlay networks, fine-grained load balancing, etc. % of patterns with class “[ ]” 31.6% 52.8% 0
Average # of classes per pattern 7.97 4.78 0
% of patterns with length 56.3% 21.4% ≈0
II.PROBLEM STATEMENT restrictions on classes or wildcards
In this section, we first discuss regular expressions used in 2.2 Solution Space for Regular Expression Matching
packet payload scanning applications, then present the
Finite automata are a natural formalism for regular
possible solutions for regular expression matching, and
expressions. There are two main categories: Deterministic
finally define the specific problem that we address in this
Finite Automaton (DFA) and Nondeterministic Finite
paper.
Automaton (NFA). In this section, we survey existing
2.1 Regular Expression Patterns solutions using these two types of automata.
A regular expression describes a set of strings without A DFA consists of a finite set of input symbols, denoted
enumerating them explicitly. Table 1 lists the common as ∑, a finite set of states, and a transition function δ [14]. In
features of regular expression patterns used in packet payload networking applications, ∑ contains the 28 symbols from the
scanning. For example, consider a regular expression from extended ASCII code. Among the states, there is a single
the Linux L7-filter [1] for detecting Yahoo traffic: start state q0 and a set of accepting states. The transition
“^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80”. This pattern function δ takes a state and an input symbol as arguments
matches any packet payload that starts with ymsg, ypns, or and returns a state. A key feature of DFA is that at any time
yhoo, followed by seven or fewer arbitrary characters, and there is only one active state in the DFA. An NFA works
similarly to a DFA except that the δ function maps from a
2
state and a symbol to a set of new states. Therefore, multiple such situations, a network processor or general-purpose
states can be active simultaneously in an NFA. CPU-based implementation may be more desirable.
A theoretical worst case study [14] shows that a single 2.3 Problem statement
regular expression of length n can be expressed as an NFA In this paper, we seek a fast and memory-efficient solution to
with O(n) states. When the NFA is converted into a DFA, it
regular expression matching for packet payload scanning.
may generate O(∑n) states. The processing complexity for We define the scope of the problem as follows:
each character in the input is O(1) in a DFA, but is O(n2) for
• We consider DFA-based approaches in this paper, as
an NFA when all n states are active at the same time.
NFA-based approaches are inefficient on serial
To handle m regular expressions, two choices are processors or processors with limited parallelism (e.g.,
possible: processing them individually in m automata, or multi-core CPUs in comparison to FPGAs). Our goal is
compiling them into a single automaton. The former is used to achieve O(1) computation cost for each incoming
in Snort [2] and Linux L7-filer [1]. The latter is proposed in character, which cannot be accomplished by any existing
recent studies [12, 13] so that the single composite NFA can DFA-based solutions due to their excessive memory
support shared matching of common prefixes of those usage. Thus, the focus of the study is to reduce memory
expressions. Despite the demonstrated performance gains overhead of DFA while approaching the optimal
over using m separate NFAs, in practice this approach processing speed of O(1) per character.
experiences large numbers of active states. This has the same
worst case complexity as the sum of m separate NFAs. • We focus on general-purpose processor-based
Therefore, this approach on a serial processor can be slow, as architectures and explore the limits of regular
given any input character, each active state must be serially expression matching in this environment. Wherever
examined to obtain new states. appropriate, we leverage the trend of multi-core
processors that are becoming prevalent in those
In DFA-based systems, compiling m regular expressions
architectures. Nevertheless, our results can be used in
into a composite DFA provides guaranteed performance
FPGA-based and ASIC-based approaches as well [20].
benefit over running m individual DFA. Specifically, a
composite DFA reduces processing cost from O(m) (O(1) for It is worth noting that there are two sources of memory
each automaton) to O(1), i.e., a single lookup to obtain the usage in DFAs: states and transitions. The number of
next state for any given character. However, the number of transitions is linear with respect to the number of states
because for each state there can be at most 28 (for all ASCII
states in the composite automaton grows to O(∑mn) in the
characters) links to next states. Therefore, we consider the
theoretical worst case. In fact, we will show in Section 4 that
number of states (in minimized DFA) as the primary factor
typical patterns in packet payload scanning applications
for determining the memory usage in the rest of the paper.
indeed interact with each other and can cause the creation of
Also, due to the need for high performance, we do not
an exponential number of states in the composite DFA.
consider DFAs that use any table compression techniques.
Table 3. Worst case comparisons of DFA and NFA
One regular expression of m regular expressions
length n compiled together III.MATCHING OF INDIVIDUAL PATTERNS
Processing Storage Processing Storage In this section, we present our solution to matching
complexity cost complexity cost individual regular expression patterns. The main technical
NFA O(n2) O(n) O(n2m) O(nm)
challenge is to create DFAs that can fit in memory, thus
DFA O(1) O(∑n) O(1) O(∑nm) making a fast DFA-based approach feasible. We first define
There is a middle ground between DFA and NFA called a few concepts key to DFA construction in the context of
lazy DFA. Lazy DFA are designed to reduce memory packet payload scanning in Section 3.1. We then analyze the
consumption of conventional DFA [12, 15]: a lazy DFA size of DFAs for typical payload scanning patterns in Section
keeps a subset of the DFA that matches the most common 3.2. Although theoretical analyses [12, 14] have shown that
strings in memory; for uncommon strings, it extends the DFAs are subject to exponential blow-up, here, we identify
subset from the corresponding NFA at runtime. As such, a specific structures that can lead to exponential growth of
lazy DFA is usually much smaller than the corresponding DFAs. Based on the insights from this analysis, in Section
fully-compiled DFA and provides good performance for 3.3, we propose pattern rewrite techniques that explore the
common input strings. Bro intrusion detection systems [3] possibility of trading off exhaustive pattern matching (which
adopt this approach. However, malicious senders can easily real-world applications often allow) for memory efficiency.
construct packets that keep the system busy and slow down Finally, we offer guidelines to pattern writers on how to write
the matching process. patterns amenable to efficient implementation in Section 3.4.
Field Programmable Gate Arrays (FPGAs) provide a
high degree of parallelism and thus can be used to speed up 3.1 Design Considerations
the regular expression matching process. There are existing Although regular expressions and automata theory can be
FPGA solutions that build circuits based on DFA [16] or directly applied to packet payload scanning, there is a
NFA [17-19]. These approaches are promising if the extra noticeable difference in between. Most existing studies on
FPGA hardware can be embedded in the packet processors. regular expressions focus on a specific type of evaluation,
FPGAs, however, are not available in many applications; in that is, checking if a fixed length string belongs to the
language that a regular expression defines. More specifically,
3
a fixed length string is said to be in the language of a regular search starts from the beginning of the input, reading
expression, if the string is matched from start to end by a characters until (1) it has reported all matches (if exhaustive
DFA corresponding to that regular expression. In contrast, in matching is used) or one match (if non-overlapping matching
packet payload scanning, a regular expression pattern can be is used), or (2) it has reached the end of the input. In the
matched by the entire input or specific substrings of the input. former case, the new search will start from the next character
Without a priori knowledge of the starting and ending in input (if exhaustive matching is used) or from the
positions of those substrings, DFAs created for recognizing character after the reported match (if non-overlapping
all substring matches can be highly complex. matching is used). In the latter case, a new search is initiated
For a better understanding, we next present a few from the next character in input. This style of repeated
concepts pertaining to the completeness of matching results scanning using DFA is commonly used in language parsers.
and the DFA execution model for substring matching. However, it is inefficient for packet payload scanning where
Completeness of matching results the chance of the packet payload matching a particular
Given a regular expression pattern and an input string, a pattern is low (such inefficiency is verified in Section 5.3.3).
complete set of results contains all substrings of the input One-pass search. In the second approach, “.*” is pre-
that the pattern can possibly match. For example, given a pended to each pattern without ‘^’, which explicitly states
pattern ab* and an input abbb, three possible matches can be that the pattern can be matched anywhere in the input. Then a
reported, ab, abb, and abbb. We call this style of matching DFA is created for the extended pattern. As the input is
Exhaustive Matching. It is formally defined as below: scanned from start to end, the DFA can recognize all
substring matches that may start at different positions of the
Exhaustive Matching: Consider the matching process M as a
function from a pattern P and a string S to a power set of input. Using one pass search, this approach can truly achieve
S, such that, M(P, S) = {substring S' of S| S' is accepted O(1) computation cost per character, thus suitable for
networking applications. To achieve high scanning rate, we
by the DFA of P}.
adopt this approach in the rest of the study.
In practice, it is expensive and often unnecessary to
report all matching substrings, as most applications can be 3.2 DFA Analysis for Individual Regular Expressions
satisfied by a subset of those matches. Therefore, we propose Next, we study the complexity of DFA for typical patterns
a new concept, Non-overlapping Matching, that relaxes the used in real-world packet payload scanning applications such
requirements of exhaustive matching. as Linux L7-filter, Snort, and Bro. The study is based on the
Non-overlapping Matching: Consider the matching process use of exhaustive matching and one-pass search. Table 4
M as a function from a pattern P and a string S to a set of summarizes the results.
strings, specifically, M(P, S) = {substring Si of S| ∀ Si, Sj • Explicit strings generate DFAs of length linear to the
accepted by the DFA of P, Si ∩ Sj = φ }. number of characters in the pattern.
If a pattern appears in multiple locations of the input, this • If a pattern starts with ‘^’, it creates a DFA of polynomial
matching process reports all non-overlapping substrings that complexity with respect to the pattern length k and the
match the pattern. Revisit our example above. For the pattern length restriction j. Our observation from the existing
ab* and the input abbb, the three matches overlap by sharing payload scanning rule sets is that the pattern length k is
the prefix ab. For this example, non-overlapping matching usually limited but the length restriction j can reach
will report one match instead of three. hundreds or even thousands. Therefore, Case 4 can result
For most payload scanning applications, we expect that in a large DFA because it has a factor quadratic in j.
non-overlapping matching would suffice, as those • Patterns starting with “.*” and having length restrictions
applications are mostly interested in knowing if certain (Case 5) cause the creation of DFA of exponential size.
attacks or application layer patterns appear in a packet. In Table 4. Analysis of patterns with k characters
fact, most existing scanning tools like grep and flex and Pattern features Example # of states
systems like Snort [2] and Bro [3] implement special cases of 1. Explicit strings with k characters ÂBCD k+1
non-overlapping matching such as left-most longest .*ABCD
matching or left-most shortest matching. As we shall show 2. Wildcards ÂB.*CD k+1
.*AB.*CD
later this section, non-overlapping matching can be exploited
3. Patterns with ^, a wildcard, and a ÂB.{j+}CD O(k*j)
to construct more memory-efficient DFAs. length restriction j ÂB.{0, j}CD
DFA execution model for substring matching ÂB.{j}CD
In the following discussion, we focus on patterns without ‘^’ 4. Patterns with ^, a class of characters
Â+[A-Z]{j}D O(k+j2)
attached at the beginning. Recall that for such patterns, there overlaps with the prefix, and a length
restriction j
is no prior knowledge of whether/where a matching substring
5. Patterns with a length restriction j, .*AB.{j}CD O(k+2j)
may appear. To handle these patterns, two types of DFAs can where a wildcard or a class of .*A[A-Z]{j+}D
be created with different execution models: characters overlaps with the prefix
Repeated searches. A DFA can be created directly from a Next, we explain the two cases of large DFA sizes,
pattern using standard DFA construction techniques [14]. To namely, Case 4 and Case 5 of Table 4, in more detail.
find the set of matching substrings (using either exhaustive or Case 4: DFA of Quadratic Size
non-overlapping matching), the DFA execution needs to be A common misconception is that patterns starting with ‘^’
augmented with repeated searches of the input: An initial create simple DFAs. However, we discover that even with
4
‘^’, classes of characters that overlap with the prefix pattern post processing after identifying the prefix “SEARCH”. This
can still yield a complex DFA. Consider the pattern approach does not solve the problem because every packet
^B+[^\n]{3}D, where the class of character [^\n] denotes any (even normal traffic) with the prefix will incur the counting
character but the return character (\n). Its corresponding DFA process. In addition, intruders can easily construct packets
has a quadratic number of states, as shown in Figure 1. The with multiple (different) prefixes to invoke many requests for
quadratic complexity comes from the fact that the letter B such post processing.
overlaps with the class of character [^\n] and, hence, there is Case 5: DFA of Exponential Size
inherent ambiguity in the pattern: A second B letter can be Many payload scanning patterns contain an exact distance
matched either as part of B+, or as part of [^\n]{3}. requirement. Figure 2 shows the DFA for an example pattern
Therefore, if an input contains multiple Bs, the DFA needs to “.*A..CD”. An exponential number of states (22+1) are needed
remember the number of Bs it has seen and their locations in to represent these two wildcard characters. This is because
order to make a correct decision with the next input we need to remember all possible effects of the preceding As
character. If the class of characters has length restriction of j as they may yield different results when combined with
bytes, DFA needs O(j2) states to remember the combination subsequent inputs. For example, an input AAB is different
of distance to the first B and the distance to the last B. from ABA because a subsequent input BCD forms a valid
pattern with AAB (AABBCD), but not so with ABA
(ABABCD). In general, if a pattern matches exactly j
arbitrary characters, O(2j) states are needed to handle the
exact j requirement. This result is also reported in [12].
Similar results apply to the case where the class of characters
overlaps with the prefix, e.g., “.*A[A-Z]{j}D”.
Figure 1. A DFA for Pattern ^B+[^\n]{3}D

Similar structures in real world pattern sets:
A significant number of patterns in the Snort rule set fall into
this category. For example, the regular expression for the
NNTP rule is “^SEARCH\s+[^\n]{1024}”. Similar to the
example in Figure 1, \s overlaps with ^\n. White space Figure 2. A DFA for pattern .*A.{2}CD
characters cause ambiguity of whether they should match \s+ Similar structures in real world pattern sets:
or be counted as part of the 1024 non-return characters In the intrusion detection system Snort, 53.8% of the patterns
[^\n]{1024}. Specifically, an input of SEARCH followed by (mostly for detecting buffer overflow attempts) contain a
1024 white spaces and then 1024 ‘a’s will have 1024 ways of fixed length restriction. Out of them, around 80% of the rules
matching strings, i.e., one white space matches \s+ and the start with ^; hence, they will not cause exponential growth of
rest as part of [^\n]{1024}, or two white spaces match \s+ DFA. The remaining 20% of the patterns do suffer from the
and the rest as part of [^\n]{1024}, etc. By using 10242 states state explosion problem. For example, consider the rule for
to remember all possible consequences of these white spaces, detecting IMAP authentication overflow attempts, which
the DFA accommodates all the ways to match the substrings uses the regular expression “.*AUTH\s[^\n]{100}”. This rule
of different lengths. Note that all these substrings start with detects any input that contains AUTH, then a white space,
SEARCH and hence are overlapping matches. and no return character in the following 100 bytes. If we
This type of quadratic state problem cannot be solved by directly compile this rule into a DFA, the DFA will contain
an NFA-based approach. Specifically, the corresponding more than 10,000 states because it needs to remember all the
NFA contains 1042 states; among these, one is for the possible consequences that an AUTH\s subsequent to the first
matching of SEARCH, one for the matching of \s+, and the AUTH\s can lead to. For example, the second AUTH\s can
rest of the 1024 states for the counting of [\^n]{1024} with either match [^\n]{100} or be counted as a new match of the
one state for each count. An intruder can easily construct an prefix of the regular expression.
input as “SEARCH” followed by 1024 white spaces. With
this input, both the \s+ state and all the 1023 non-return
states would be active at the same time. Given the next
character, the NFA needs to check these 1024 states
sequentially to compute a new set of active states. Figure 3. NFA for the pattern .*AUTH\s[^\n]{100}
This problem cannot be solved by a fixed string pre- It is obvious that the exponential blow-up problem cannot
filtering scheme (used by Snort), either. This is because pre- be mitigated by using an NFA-based approach. The NFA for
filtering can only recognize the presence of the fixed string the pattern “.*AUTH\s[^\n]{100}” is shown in Figure 3.
“SEARCH” in the input. After that, an NFA or DFA-based Because the first state has a self-loop marked with Σ, the
matching scheme is still needed in post processing to report input “AUTH\sAUTH\sAUTH\s…” can cause a large number
whether the input matches the pattern and what the matches of states to be simultaneously active, resulting in
are. Another choice is to count the subsequent characters in significantly degraded system performance, as demonstrated
5
by our results reported in Section 5.3.3. Similar to Case 4, also easy to see that the new pattern requires a number of
this problem cannot be solved by a fixed string pre-filtering states linear in j because it has removed the ambiguity for
scheme (used by Snort), either. matching \s.
We provide a theorem (Theorem 1 in the Appendix) for a
more general case where the suffix of a pattern contains a
class of characters overlapping with its prefix and a length
restriction, “Â+[A-Z]{j}”. We prove that this type of pattern
FigureT 4. DFA for rewriting the pattern .*AUTH\s[^\n]{100} can be rewritten to “Â[A-Z]{j}” with equivalence
guaranteed under the condition of non-overlap matching.
Note that our rewrite rule can also be extended to patterns
with various types of length restriction such as “Â+[A-
Z]{j+}” and “Â+[A-Z]{j,k}”. Details are omitted in the
interest of space.
Using Rewrite Rule (1), we successfully rewrote 17
similar patterns in the Snort rule set. Detailed results
regarding these rewrites are reported in Section 5.2.
Figure 5. Transformed NFA for deriving Rewrite Rule (1) Rewrite Rule (2)
3.3 Regular Expression Rewrites As we discussed in Section 3.2, patterns like
We have identified the typical patterns used in packet “.*AUTH\s[^\n]{100}” generate exponential numbers of
payload scanning that can cause the creation of large DFAs. states to keep track of all the AUTH\s subsequent to the fist
In this section, we investigate the possibility of rewriting AUTH\s. If non-overlapping matching is used, the intuition
some of those patterns to reduce the DFA size. Such of our rewriting is that after matching the first AUTH\s, we
rewriting is enabled by relaxing the requirement of do not need to keep track of the second AUTH\s. This is
exhaustive matching to that of non-overlapping matching. In because (1) if there is a ‘\n’ character within the next 100
particular, we propose two rewrite rules, one for rewriting bytes, the return character must also be within 100 bytes to
specific patterns belonging to the case of quadratic-sized the second AUTH\s, and (2) if there is no ‘\n’ character
DFAs (Case 4 in Section 3.2), and the other for rewriting within the next 100 bytes, the first AUTH\s and the following
specific patterns that generate exponential-sized DFAs (Case characters have already matched the pattern. This intuition
5 of Section 3.2). The commonality of the patterns amenable implies that we can rewrite the pattern such that it only
to rewrites is that their suffixes address length restricted attempts to capture one match of the prefix pattern.
occurrences of a class of characters that overlap with their Following the intuition, we can simplify the DFA by
prefixes. These patterns are typical in real-world rulesets removing the states that deal with the successive AUTH\s. As
such as Snort and Bro. For these patterns, as shown in shown in Figure 4, the simplified DFA first searches for
Section 3.2, neither the NFA-based solution nor the fixed AUTH in the first 4 states, then looks for a white space, and
string pre-filtering scheme can handle them efficiently. In after that starts to count and check whether the next 100
contrast, our rewrites rules can convert these patterns into bytes contains a return character. After rewriting, the DFA
DFAs with their sizes successfully reduced from quadratic or only contains 106 states.
exponential to only linear. We derive our rewrite pattern from the simplified DFA
Rewrite Rule (1) shown in Figure 4. Applying a standard technique that maps
a DFA/NFA to a regular expression [14], we transform this
As shown in Section 3.2, patterns that start with ‘^’ and
DFA to an equivalent NFA in Figure 5. For the link that
contain classes of characters with length restrictions, e.g.,
moves from state 1 back to the start state in Figure 4 (i.e.,
“^SEARCH\s+[^\n]{1024}”, can generate DFAs of quadratic
matching A then not U), the transformed NFA places it right
size with respect to the length restriction. Below, we first
at the start state and labels it with A[Û]. The transformed
explain the intuition behind Rewrite Rule (1) using the above
NFA does the same for each link moving from state i
example and then state a theorem for more general cases.
(1≤i≤105) to the start state in Figure 4. The transformed
Given the fact that such patterns are used in packet
NFA can be directly described using the following regular
scanning applications for detecting buffer overflow attempts,
expression:
it seems reasonable to assume that non-overlapping matches
“([Â]|A[Û]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s
are sufficient for reporting such attacks. Based on this
[^\n]{0,99}\n)*AUTH\s[^\n]{100}”.
observation, we propose to rewrite the pattern
“^SEARCH\s+[^\n]{1024}” to “^SEARCH\s[^\n]{1024}”. This rule first enumerates all the cases that do not satisfy the
The new pattern specifies that after matching a single white pattern and then attaches the original pattern to the end of the
space, we start counting for [^\n]{1024} no matter what the new pattern. In other words, “.*” is replaced with the cases
content is. It is not hard to see that for every matching that do not match the pattern, represented by
substring s that the original pattern reports, the new pattern ([Â]|A[Û]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99
produces a substring s’ that is either identical to s or is a }\n)*. Then, when the DFA comes to the states for
prefix of s. In other words, the new pattern essentially AUTH\s[^\n]{100}, it must be able to match the pattern.
implements non-overlapping left-most shortest match. It is Since the rewritten pattern is directly obtained from a DFA
6
of size j+5, it generates a DFA of a linear number of states as In this pattern, \s+ matches all white spaces and [^\s] means
opposed to an exponential number before the rewrite. the first non white space character. If there are more than
We also provide a theorem (Theorem 2 in the Appendix) 1023 non return characters following the first non white
that proves the equivalence of the new pattern and the space character, it is a buffer overflow attack. By adding
original pattern for a more general case “.*AB[A-Z]{j}” [^\s], the ambiguity in the original pattern is removed; given
under the condition of non-overlapping matching. Moreover, an input, there is only way of matching each packet. As a
we offer rewrite rules for patterns in other forms of length result, this new pattern generates a DFA of linear size.
restriction, e.g., “.*AB[A-Z]{j+}”.
Rewrite Rule (2) is applicable to 54 expressions in the IV.SELECTIVE GROUPING OF MULTIPLE PATTERNS
Snort rule sets and 49 in the Bro rule set. We wrote a script to The previous section presented our analysis of the
automatically rewrite these patterns and observed significant complexity of the DFA created for individual patterns and
reduction in DFA size. Detailed simulation results are two rewrite techniques that simplify these DFA so that they
reported in Section 5.2. could fit in memory. As we mentioned in Section 2.2, it is
3.4 Notes for Pattern Writers well known that the computation complexity for processing
As mentioned above, an important outcome of this work is m patterns reduces from O(m) to O(1) per character, when
that our pattern rewriter can automatically perform both the m patterns are compiled into a single composite DFA.
types of rewriting. An additional benefit is that our analysis However, it is usually infeasible to compile a large set of
provides insight into how to write regular expression patterns patterns together due to the complicated interactions between
amenable to efficient DFA implementation. We discuss this patterns. In some cases, the composite DFA may experience
in more detail below. exponential growth in size, although none of the individual
DFA has an exponential component.
From the analysis in Section 3.2, we can see that patterns
with length restrictions can generate large DFAs. By In this section, we first present two examples illustrating
studying typical packet payload scanning pattern sets the interactions between patterns, and then use a real-world
including Linux L7-filter, Snort, and Bro, we found that payload scanning ruleset to demonstrate the existence of
21.4-56.3% of the length restrictions are associated with exponential growth in reality. Based on these observations,
classes of characters. The most common classes of characters we propose grouping algorithms that selectively divide
are “[^\n]”, “[^\]]” (i.e., not ‘]’), and “[^\]”, used for detecting patterns into groups while avoiding the adverse interaction
buffer overflow attempts. The length restrictions of these among patterns.
patterns are typically large (233 on the average and reaching 4.1 Interactions of Regular Expressions
up to 1024). For these types of patterns, we highly encourage
the pattern writer to add “^” so as to avoid the exponential When patterns share prefixes, some states can be merged. For
state growth as we showed in Section 3.3. For patterns that example, states 1 and 2 shown in Figure 6 are shared by
cannot start with “^”, the pattern writers can use the Rewrite “.*ABCD” and “.*ABAB”. Combining these patterns can
Rule 2 to generate state efficient patterns. save both storage and computation.
Even for patterns starting with “^”, we need to carefully
avoid the interactions between a class of characters and its
preceding character as shown in Rewrite Rule 1. One may
wonder why a pattern writer uses \s+ in the pattern
“^SEARCH\s+[^\n]{1024}”, when it can be simplified as \s.
Our understanding is that, in reality, a server implementation Figure 6. A DFA for pattern .*ABCD and .*ABAB
of a search task usually interprets the input in one of the two
However, if the patterns do not share the same prefix,
ways: either skip a white space after SEARCH and take the
putting m patterns together may generate 2m states.
following up to 1024 characters to conduct a search, or skip
Figure 7 shows a composite DFA for matching
all white spaces and take the rest for the search. The original
“.*AB.*CD” and “.*EF.*GH”. This DFA contains many
pattern writer may want to catch intrusion to systems of
states that did not exist in the individual DFAs. Among them,
either implementation. However, the original pattern will
state 8 is created to record the case of matching both prefixes
generate false positives if the server does the first type of
AB and EF. Generally speaking, if there are l patterns with
implementation (skipping all the white spaces). This is
one wildcard per pattern, we need O(2l) states to record the
because if an input is followed by 1024 white spaces and
matching of the power set of the prefixes. In such scenarios,
then some non-whitespace regular command of less than
adding one more pattern into the DFA doubles its size. If
1024 bytes, the server can skip these white spaces and take
there are x wildcards per pattern, then (x+1)l states are
the follow-up command successfully. However, this input
required. There are several such patterns in the Linux L7-
will be caught by the original pattern as intrusion because
filter. For example, the pattern for the remote desktop
these white spaces themselves can trigger the alarm. To catch
protocol is “.*rdpdr.*cliprdr.*rdpsnd”, and the pattern for
attacks to this type of server implementation, while not
Internet radio is “.*membername.*session.*player”. Snort
generating false positives, we need the following pattern.
also has similar patterns and the number of “.*” in a pattern
“^SEARCH\s+[^\s][^\n]{1023}”
can go up to six.
7
leading to the individual DFA, a new accepting state, and
two ε edges from the DFA accepting states to the new
accepting state, as shown in Figure 9. Then we run the NFA
to DFA conversion algorithm and the DFA minimization
algorithm to obtain the composite DFA.
ε ε
ε ε
Figure 7. A DFA for pattern .*AB.*CD and .*EF.*GH
Figure 9. Composite NFA for two DFAs
4.2 Interactions of Real-world Regular Expressions We use the information on pairwise interaction to group a
We study the pattern interactions of the Linux L7-filter [1] in set of m regular expressions. The intuition is that if there is
this section. If 70 patterns are compiled separately into 70 no interaction between any pair selected from R1, R2, and R3,
DFAs, each DFA has tens to hundreds of states. The total the composite DFA of R1, R2, R3 is not likely to exceed the
number of states is 3533. When we start to group multiple sum of individual ones. We validate this point using
patterns into a composite DFA (we select patterns with a empirical results in Section 5.3.1.
random order), the processing complexity decreases. We devise grouping algorithms both for multi-core
However, the total number of DFA states (i.e., the sum of the processor architecture, where groups of patterns can be
composite DFA and those ungroup ones) grows over 136,786 processed in parallel among different processing units, and
with just 40 patterns, as illustrated by the increasing dotted for general processor architecture, where the DFA for one
line in Figure 8. We could not add more patterns into the group corresponds to one process or thread. Next, we present
composite DFA because it exceeded the memory limit in the the algorithm for the former architecture first and then the
test machine that we used (1.5 GB). However, not all patterns algorithm for the latter.
cause significant DFA growth. Only some patterns (e.g., In multi-core architecture, there are multiple parallel
pattern 12, 37, and 38 as shown in Figure 8) lead to processing units. Their number is usually limited, e.g., 16 in
significant growth of the DFA. These patterns all contain Intel IXP2800 NPU, which is much smaller than the number
large numbers of wildcards, and sometimes have classes of of patterns. Hence, one DFA per pattern per processing unit
characters. For example, pattern 12 contains a fixed length is infeasible. Our goal is to design an algorithm that divides
(20) of wildcards, pattern 37 contains three unrestricted regular expressions into several groups, so that one
wildcards (“.*”), and pattern 38 contains 19 classes of processing unit can run one or several composite DFAs. In
characters, 4 unrestricted wildcards, and 8 length restricted addition, the size of local memory of each processing unit is
wildcards. quite limited. For example, the newly architected IBM cell
60000
processor has 8 synergistic processor elements, each with
128KB local memory [23]. Hence, we need to keep grouping
Total Num ber of States
50000
40000 patterns until they meet the local memory limit. The pseudo-
30000
code of the algorithm is provided below.
In this algorithm, we first compute the pairwise
20000
interaction of regular expressions. With this pairwise
10000
information, we construct a graph with each pattern as a
0 vertex and an edge between patterns Ri and Rj if they interact
with each other. Using this graph, we can start with a pattern
1 6 11 16 21 26 31
# of patterns compiles together

that has least interaction with others, and then try to add
Figure 8. DFA Size and Processing Complexity of Multiple patterns that have least interactions into the same group. We
Patterns (Unsorted order) keep adding until the composite DFA is larger than the local
memory limit. Then we proceed to create a new group from
4.3 Regular Expressions Grouping Algorithms
the patterns that remain ungrouped.
As discussed above, certain patterns interact with each other ______________________________________________________________________________
when compiled together, which can result in a large For regular expression Ri in the set
composite DFA. In this section, we propose algorithms to For regular expression Rj in the set
selectively partition m patterns to k groups such that patterns Compute pairwise interaction of Ri and Rj
in each group do not adversely interact with each other. As Construct a graph G(V, E)
such, these algorithms reduce the computation complexity V is the set of regular expressions, with one vertex per regular
from O(m) to O(k) without causing extra memory usage. expression
We first provide a formal definition of interaction: two E is the set of edges between vertices, with an edge (Vi, Vj) if Ri
and Rj interact with each other.
patterns interact with each other if their composite DFA
contains more states than the sum of two individual ones. To Repeat
calculate the number of states in the composite DFA, we first New group (NG) = φ
construct an NFA by adding a new start state, two ε edges
8
Pick a regular expression that has the least interaction with the number of grouping operations, hence reducing the
others and add it into new group NG number of resulting groups. In this algorithm, we group
Repeat patterns using a similar routine as the previous algorithm.
Pick a regular expression R has the least number of edges However, we stop grouping when the size of the composite
connected to the new group
Compile NG ∪ {R} into a DFA
DFA (denoted as D(NG)) exceeds its share of the leftover
if this DFA is larger than the limit memory. Here, the DFA’s share of the leftover memory is
break; calculated using the formula = (Leftover memory L) *
else (Number of patterns in the group) / (Number of ungrouped
Add R into NG patterns).
Until every regular expression in G is examined Discussion: Grouping multiple regular expressions into
Delete NG from G one composite DFA is a well known technique to enhance
Until no regular expression is left in G matching speed. Our algorithms focus on picking the right
______________________________________________________________________________
Algorithm for Multi-core Processor Architecture with patterns to be grouped together. Similar to our approach,
limited total memory size systems like Bro group patterns into one group, instead of
several groups. They adopt a lazy DFA-based approach,
General processor architecture. In the general where they cache commonly used DFA states and extend the
processor architecture, if there are multiple composite DFAs DFA at run-time if needed. The distinction between our
to be run, the processor executes each of them sequentially. approach and Bro’s approach is that our grouping algorithm
Usually all the DFAs are kept in the main memory for the produces scanners of deterministic complexity. The lazy
performance purpose. Since the memory is shared among all DFA-based approach, although fast and memory efficient on
DFAs, we want to group all patterns into the smallest number most common inputs, may be exploited by intruders to
of groups (hence the smallest number of DFA) while not construct malicious packets that force the lazy DFA to enter
exceeding the available memory size. It is clear that finding many corner cases [15]. Our fully-developed DFA does not
the smallest number of groups is an NP hard problem. In this have performance degradation under such attacks.
work, we apply heuristics to find a small number of groups
that can serve as a good approximation. The pseudo-code of V.EVALUTION RESULTS
our algorithm for the general processor architecture is shown
in the following. We implement a DFA scanner with rewriting and grouping
_____________________________________________ functionality for efficient regular expression matching. In
this section, we evaluate the effectiveness of our rewriting
Leftover memory L = Total memory techniques for reducing DFA size, and the effectiveness of
For regular expression Ri in the set our grouping algorithms for creating memory-efficient
Compute the DFA size Di for Ri
composite DFA. We also compare the speed of our scanner
Leftover memory -= Di
against a DFA-based repeated scanner generated by flex [25]
Repeat and a best-known NFA-based scanner [26]. Compared to the
New group (NG) = φ DFA-based repeated scanning approach, our DFA-based one
pass scanning approach has 12 to 42 times performance
Pick a regular expression which has the least interaction with
others and add it into new group NG
improvements. Compared to the NFA-based implementation,
Repeat our DFA scanner is 50 to 700 times faster on traffic dumps
Pick a regular expression R that has the least number of obtained form MIT and Berkeley networks.
edges connected to the new group
5.1 Experimental Setup
Compile NG ∪ {R} into a DFA
To focus on regular expressions commonly used in
if D(NG) > ∑
D ( R ) +L*|NG|/(#left patterns)
Ri ∈NG
i
networking applications, we select the following three
break; complex pattern sets: The first is from the Linux layer 7 filter
else [1] which contains 70 regular expressions for detecting
Add R into NG different protocols. The second is from the Snort system [2]
Until every regular expression in G is examined which contains 1555 regular expressions for intrusion
Leftover memory L -= D(NG) - ∑
D( Ri )
Ri ∈NG
detection. The last one is from Bro intrusion detection system
[3] with a total of 2781 regular expressions.
Delete NG from G We use two sets of real-world packet traces. The first set
Until no regular expression is left in G is the intrusion detection evaluation data set from the MIT
____________________________________________ DARPA project [24]. It contains more than a million packets.
The second data set is from a local LAN with 20 machines at
Algorithm for General-Processor Architecture the UC Berkeley networking group, which contains more
Different from the multi-core case, in this algorithm we than six million packets. The characteristics of MIT dump
first compute the DFA of individual patterns and compute the are very different from Berkeley dump. MIT dump mostly
leftover memory size. At any stage, we always try to contains intrusion packets that are long, with the average
distribute the leftover memory evenly among the ungrouped packet payload length being 507.386 bytes. In the Berkeley
expressions, which is the heuristics that we apply to increase dump, however, most packets are normal traffic, with 67.65
9
bytes on average in the packet payload. A high percentage of that share the same header information. Note that we do not
the packets are ICMP and ARP packets that are very short. report the results of using the Snort rule set because its
We use Flex [25] to convert regular expressions into patterns overlap significantly with those of the Bro rule set.
DFAs. Our implementation of the DFA scanner eliminates
5.3.1 Interaction of Patterns
backtracking operations [25]. It only performs one-pass
search over the input and is able to report matching results at For all three pattern sets, a majority of patterns are non-
the position of the end of each matching substring. interactive, particularly in Bro http patterns set where all
All the experimental results reported were obtained on patterns are non-interactive. As a result, most patterns in
PCs with 3.4 Ghz CPU and 3.7 GB memory. these rulesets can be combined pair-wise. This nice property
offers a significant potential for our grouping algorithms to
5.2 Effect of Rule Rewriting produce small numbers of groups. To achieve that, one
We apply our rewriting scheme presented in Section 3.2 to assumption that our grouping algorithms use needs to be
the Linux L7-filter, Snort and Bro pattern sets. For the Linux verified. As stated in Section 4.2, the assumption is that if
L7-filter pattern set, we do not identify any pattern that needs three patterns are pair-wise non-interactive, it is highly likely
to be rewritten. For the Snort pattern set, however, 71 rules that the size of the composite DFA will not exceed the sum
need to be rewritten. For Bro, 49 patterns (mostly imported of the individual sizes. We verify this assumption with the
from Snort) need to be rewritten using Rewrite Rule 2. For real world pattern sets. Table 6 shows that this assumption is
these patterns, we gain significant memory savings as shown valid for over 99.8% of the cases from all three pattern sets.
in Table 5. For both types of rewrite, the DFA size reduction Table 6. Interaction of regular expressions
rate is over 98%. No-interaction
Pair-wise No-interaction
Table 5. Rewriting effects lead to No-interaction
Pair-wise
Type of Rewrite Rule Number Average DFA three patterns
Set of length Reduction
Patterns restriction Rate Linux L7-filter 71.18% 99.87%
Rewrite Rule 1: Snort 17 370 >98% Bro http 100% 100%
(Quadratic case) Bro 0 0 0 Bro payload 93.3% 99.99%
Rewrite Rule 2: Snort 54 344 >99%1
(Exponential Case) Bro 49 214.4 >99%1
17 patterns belong to the category for which Rewrite Rule 5.3.2 Grouping Results
1 can be applied. These patterns (e.g., “^SEARCH\s+[^\n]
We apply our grouping algorithms to all three pattern sets
{1024}”) all contain a character (e.g., \s) that is allowed to
and successfully group all of them into small (<5) numbers
appear multiple times before a class of characters (e.g., [^\n])
of groups. For the Bro’s http pattern set, since patterns do not
with a fixed length restriction (e.g., 1024). As discussed in
interact with each other, it is possible to compile all 648
Section 3.2, this type of pattern generates DFAs that expand
patterns into one composite DFA of 6218 states. The other
quadratically in the length restriction. After rewriting, the
two sets, however, cannot be grouped into one group due to
DFA sizes come down to linear in the length restriction. A
interactions. Below, we report results obtained using our
total of 103 patterns need to be rewritten using Rewrite Rule
grouping algorithm for the multi-core architecture in Table 7,
2. Before rewriting, most of them generate exponential sized
where local memory is limited. The results for the general
DFAs that cannot even be compiled successfully. With our
processor architecture are in Table 8.
rewriting techniques, the collection of DFAs created for all
the patterns in the Snort system can fit into 95MB memory, Table 7. Results of grouping algorithms for the multi-core
which can be satisfied in most PC-based systems. architecture
7 (a) Linux L7-filter (70 Patterns)
5.3 Effect of Grouping Multiple Patterns Composite DFA Total Number Compilation
Groups
In this section, we apply the grouping techniques to regular state limit of States Time (s)
617 10 4267 3.3
expression sets. We show that our grouping techniques can 2000 5 6181 12.6
intelligently group patterns to boost system throughput, while 4000 4 9307 29.1
avoiding extensive memory usages. We test on three pattern 16000 3 29986 54.5
sets: the Linux L7-filter, the Bro http-related pattern set and
the Bro payload related pattern set. The patterns of L7-filter 7(b) Payload patterns from Bro (222 Patterns)
Composite DFA Total Number Compilation
can be grouped because the payload of an incoming packet is state limit
Groups
of States Time (s)
compared against all the patterns, regardless of the packet 540 11 4868 20
header information. For the Bro pattern set, as most rules are 1000 7 4656 118
related to packets with specific header information, we pick 2000 5 5430 780
the http related patterns (a total of 648) that share the same 6000 4 9197 1038
header information, as well as 222 payload scanning patterns Table 7(a) shows the results for Linux L7-filter pattern
set. We start by limiting the number of states in each
composite DFA to 617, the size of the largest DFA created
1 for a single pattern in the Linux L7-filter set. The actual
Note, we use over 99% because some of the patterns create too many states
to be compiled successfully without rewriting. 99% is obtained by memory cost is 617 times 256 next state pointers times
calculating those successful ones. log(617) bits for each pointer, which amounts to 192 KB.
10
Considering that most modern processors have large data Linux L7-filter or Bro system do not change frequently, the
caches (>0.5MB), this memory cost for a single composite occasional overhead of several minutes is affordable. In
DFA is comparatively small. Our algorithm generates 10 addition, we do not need to regroup all patterns given any
groups when the limit on the DFA size is set to 617. It new pattern. We can just compute its pairwise interactions
creates fewer groups when the limit is increased to larger with existing patterns and pick a group that yields least total
numbers. As today’s multi-core network processors have 8- interactions. This type of incremental update computation
16 engines, it is feasible to allocate each composite DFA to time is in average less than 1 second on the Bro payload
one processor and take advantage of parallelism. pattern set.
With our grouping algorithms, we can decrease the
5.3.3 Speed Comparison
number of pattern groups from 70 (originally ungrouped) to 3
groups. This means that, given a character, the generated We compare our DFA-based algorithms with the starte-of-
packet content scanner needs to perform only three state the-art NFA-based regular expression matching algorithm.
transitions instead of the 70 state transitions that were Both L7-filter and Snort systems use this NFA-based library.
necessary with the original ungrouped case. This results in a We also compare it with the DFA-based repeated scan
significant performance enhancement (show in Section approach generated by flex [25]. The results are summarized
5.3.3). in Table 9. Our DFA-based one pass scanner is 47.9 to 704
For Bro’s payload pattern set, we can group more times faster than the NFA-based scanner. Compared to DFA-
patterns into one group. As Table 7(b) shows, starting from based repeated scan engine, our scanner yields a performance
540, the largest individual DFA size, the grouping algorithm improvement of 1244% to 4238%. Also note that although
can group 222 patterns into 11 groups. As the DFA state these dumps have dramatically different characteristics, our
limit increases, the number of groups decreases down to 4. scanner provides similar throughputs over these dumps
because it scans each character only once. The other two
Table 8. Results of grouping algorithms for general approaches are subject to dramatic change in throughput (1.8
processor architecture to 3.4 times) over these traces, because they need to do
8(a) Linux L7-filter (70 Patterns) backtracking or repeated scans. Of course, we admit that the
Total DFA
Total
Compilation memory usage of our scanner is 2.6 to 8.4 times the NFA-
state limit
Groups Number of
Time (s) based approach. However, the largest scanner we created
States (Linux L7-filter, 3 groups) uses 13.3MB memory, which is
3533 12 3371 5.602 well under the memory limit of most modern systems.
4000 10 3753 7.335 Table 9. Comparison of the Different Scanners
10000 5 7280 37.928 Throughputs
Memory
32000 3 25215 49.976 (Mb/s)
Consumption
MIT Berkeley (KB)
8(b) Payload patterns from Bro (223 Patterns) dump dump
Composite Total Linux NFA 0.98 3.4 1636
Compilation L-7 DFA RP 16.3 34.6 7632
DFA state Groups Number of
Time (s) DFA OP 3 groups 690.8 728.3 13596
limit States
Bro NFA 30.4 56.1 1632
5221 6 4697 1050
DFA RP 117.2 83.2 1624
8000 4 6854 1030 Http
DFA OP 1 group 1458 1612.8 4264
Bro NFA 5.8 14.8 1632
Table 8 demonstrates that the grouping algorithm for the DFA RP 17.1 25.6 7628
general processor architecture can effectively reduce the Payload
DFA OP 4 groups 566.1 568.3 4312
number of groups generated as the memory limit imposed on NFA—NFA-based implementation
the algorithm is increased. In addition, the total number of DFA RP – Flex generated DFA-based repeated scan engine
DFA states is close to the memory limit, showing the DFA OP – Our DFA one pass scanning engine
algorithm can fully utilize the memory allocated to the packet
scanner. Note that we start memory limit at 3533 DFA states VI.CONCLUSION AND FUTURE WORK
for Linux L7-filter which is the total number of the states of We considered the implementation of fast regular expression
individual DFAs. Our simulation results show that we can matching for packet payload scanning applications. While
group 70 patterns into 12 groups with no extra memory usage. NFA-based approaches are usually adopted for
Similarly, we start 5221 DFA states for Bro payload set, implementation because naïve DFA implementations can
which is the sum of 233 individual DFAs. Even without extra have exponentially growing memory costs, we showed that
memory, we can decrease the number of groups from 233 to with our rewriting techniques, memory-efficient DFA-based
6. approaches are possible. In addition, we presented a scheme
Beyond the effectiveness, Table 7 and Table 8 also that selectively groups patterns together to further speed up
present the running time of our grouping algorithms. This the matching process. Our DFA-based implementation is 2 to
overhead is a one-time cost. In networking environments, the 3 orders of magnitude faster than the widely used NFA
packet content scanner operates continuously until there are implementation and 1 to 2 orders of magnitude faster than a
new patterns to be inserted to the system. As patterns in the commonly used DFA-based parser. Our grouping scheme
11
can be applied to general processor architecture where the Appendix
DFA for one group corresponds to one process or thread, as
well as to multi-core architecture where groups of patterns Theorem 1: Pattern P1 “Â[A-Z]{j}” is equivalent to the
can be processed in parallel among different processing units. original pattern P2 “Â+[A-Z]{j}” for detecting non-
In the future, it would be an interesting study to apply overlapping shortest string.
different DFA compression techniques and explore tradeoffs (1) Any input matching P1 must match P2 as well, and the
between the overhead of compression and the savings in shortest matched string S1 for P1 is the same as the
memory usage. shortest matched sting S2 for P2.
Proof: For any input matching P1, it must match pattern P2
REFEENCES because we can use “\s” to match “\s+” and the remaining j
[1] J. Levandoski, E. Sommer, and M. Strait, "Application Layer Packet
characters as [A-Z]{j}. Next we prove their matched string S1
Classifier for Linux." http://l7-filter.sourceforge.net/. and S2 are identical. For P1, there is only one way of
[2] "SNORT Network Intrusion Detection System." http://www.snort.org. selecting S1 and its length is j+1. There maybe multiple
[3] "Bro Intrusion Detection System." http://bro-ids.org/Overview.html. ways of selecting S2 (same start position, overlapping
[4] L. Tan and T. Sherwood, "A High Throughput String Matching
Architecture for Intrusion Detection and Prevention," Proc. LISA, 2005.
strings), with length from j+1 to infinity. If we pick the
[5] Y. Cho and W. Mangione-Smith, "Deep packet filter with dedicated shortest match, its length would also be j+1. In addition, S1
logic and read only memories," Proc. FCCM, 2004. and S2 must start from the beginning of the input due to the
[6] Z. K. Baker and V. K. Prasanna, "Time and area efficient pattern requirement of ^. Given that they have the same length, S1
matching on FPGAs," Proc. FPGAs, 2004.
[7] Z. K. Baker and V. K. Prasanna, "A methodology for synthesis of
and S2 must be identical.
efficient intrusion detection systems on FPGAs.," Proc. FCCM, 2004. (2) Any input matching P2 must match P1 as well, and both
[8] M. Aldwairi, T. Conte, and P. Franzon, "Configurable string matching patterns report matching of the same shortest string.
hardware for speedup up intrusion detection," Proc. WASSA, 2004.
[9] S. Dharmapurikar, M. Attig, and J. Lockwood, "Deep packet inspection Proof: For any input matching P2 “Â+[A-Z]{j}”, it must
using parallel bloom filters," IEEE Micro, 2004. have x “A”s (x>=1) matched as “Â+”, y “A”s and z [A-Z]
[10] F. Yu, R. H. Katz, and T. V. Lakshman, "Gigabit Rate Packet Pattern characters (starting from a none “A”) matched as “[A-Z]{j}”
Matching with TCAM," Proc. ICNP, 2004. (y>=0, z>=0, y+z=j). This input must match P1 “Â[A-
[11] Y. H. Cho and W. H. MangioneSmith, "A Pattern Matching
Coprocessor for Network Security," Proc. DAC, 2005. Z]{j}” because the input have x-1+y+z>=j [A-Z] characters
[12] T. J. Green, A. Gupta, G. Miklau, M. Onizuka, and D. Suciu, after the first A. Similar to (1), the shortest matched strings
"Processing XML Streams with Deterministic Automata and Stream are the same.
Indexes," ACM TODS, vol. 29, 2004.
[13] Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer, "Path
Since pattern starts with ^, P1 and P2 report at most one
Sharing and Predicate Evaluation for High-Performance XML Filtering," match for one line. Given (1) and (2), P1 and P2 report the
ACM TODS, 2003. same results for any input, hence they are equivalent. 
[14] J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to Automata Theorem 2. Patter P1 “.*AB[A-Z]{j}” can be rewritten as
Theory, Languages, and Computation, Second ed: Addison Wesley, 2001.
[15] R. Sommer and V. Paxson, "Enhancing Byte-Level Network Intrusion pattern P2 “([Â]|A[^B]|AB[A-Z]{j-1}[^(A-Z)])*AB[A-
Detection Signatures with Context," Proc. CCS, 2003. Z]{j}”. These two patterns are equivalent for detecting non-
[16] J. Moscola, J. Lockwood, R. P. Loui, and Michael Pachos, overlapping strings.
"Implementation of a Content-Scanning Module for an Internet Firewall,"
Proc. FCCM, 2003.
Proof: It is trivial that these two patterns are equivalent when
[17] R. Sidhu and V. K. Prasanna, "Fast regular expression matching using the input does not contain AB\s because none of them match
FPGAs," Proc. FCCM, 2001. the input. It is also trivial if the input only contains one AB\s.
[18] R. Franklin, D. Carver, and B. Hutchings, "Assisting network intrusion Next, we prove the case where we have multiple ABs without
detection with reconfigurable hardware," Proc. FCCM, 2002.
[19] C. R. Clark and D. E. Schimmel, "Scalable pattern matching for high
[^(A-Z)] in between and they are within j bytes to the first AB
speed networks," Proc FCCM, 2004. through (1), (2) and (3).
[20] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, J. Turner., (1) Any input not matching P2 does not match P1 either.
"Algorithms to accelerate Multiple Regular Expression Matching for Deep
Packet Inspection," Under submission. Proof: Since the input does not match P2, there must be a
[21] "Standard for Information Technology, Portable Operating System [^(A-Z)] character within the next j bytes of the first AB, this
Interface (POSIX)," Portable Applications Standards Committee of IEEE character must also be located within j bytes to the following
Computer Society and the Open Group. ABs. Hence, P1 will not be matched either.
[22] C. L. A. Clarke and G. V. Cormack, "On the use of regular expressions
for searching text," Technical Report CS-95-07, Department of Computer (2) Any input matching P2 must match P1. P2 and P1
Science, University of Waterloo, 1995. generate matching results at the same position.
[23] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and
D. Shippy, "Introduction to the Cell multiprocessor," IBM J. RES. & DEV., Proof: For any input matching P2, it must report matching
vol. 49, JULY/SEPTEMBER 2005. result at j positions after the first AB. If there is no [^(A-Z)]
[24] "MIT DARPA Intrusion Detection Data Sets." character within the next j bytes to one of the ABs, then there
http://www.ll.mit.edu/IST/ideval/data/2000/2000_data_index.html. will not be [^(A-Z)] within the next j bytes to the first AB
[25] V. Paxson et al., "Flex: A fast scanner generator."
http://www.gnu.org/software/flex/. because there is no [^(A-Z)] in between of these ABs.
[26] Perl compatible Regular Expression, http://www.pcre.org/ Therefore, the match result of P1 will be generated j bytes
after the first AB as well. Hence, S1 and S2 are the same.
(3) P1 and P2 report the same number of matches.
Proof: When there are multiple ABs without [^(A-Z)]
between them and they are within j bytes to the first AB. P1
12
would only report one match, because these ABs are within j
bytes and their matching strings overlap with each other. P2
would report one match too. Hence, P1 and P2 report the
same number of matches. If there are multiple non-
overlapping patterns in the input (ABs are at least j apart or
with [^(A-Z)] in between), P1 and P2 still report the same
number of matches because we can divide the input to
segments, where only one match is reported in one segment.
Given (1), (2) and (3), for any input, patterns P1 and P2
report the same matching results and hence they are
equivalent. 
13

Fast and Memory-Efficient Regular Expression Matching For Deep Packet Inspection

Uploaded by

Copyright:

Available Formats

Fast and Memory-Efficient Regular Expression Matching For Deep Packet Inspection

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Fast and Memory-Efficient Regular Expression Matching For Deep Packet Inspection

Uploaded by

Copyright:

Available Formats

Fast and Memory-Efficient Regular Expression

Matching for Deep Packet Inspection

Electrical Engineering and Computer Sciences

Technical Report No. UCB/EECS-2006-76

May 22, 2006

Figure 1. A DFA for Pattern ^B+[^\n]{3}D

# of patterns compiles together

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.