A memory-based NFA regular expression match engine for signature ...

Computer Communications 36 (2013) 1255–1267

Contents lists available at SciVerse ScienceDirect

Computer Communications journal homepage: www.elsevier.com/locate/comcom

A memory-based NFA regular expression match engine for signature-based intrusion detection

Derek Pao ⇑, Nga Lam Or, Ray C.C. Cheung

Department of Electronic Engineering, City University of Hong Kong, Hong Kong

Article info

Article history: Received 15 October 2012; received in revised form 14 January 2013; accepted 4 March 2013; available online 15 March 2013

Keywords: Signature-based intrusion detection; Regular expression matching; Non-deterministic finite automaton; Memory-based architecture

Abstract

Signature-based intrusion detection is required to inspect network traffic at wire-speed. Matching packet payloads against patterns specified with regular expressions is a computation-intensive task. Hence, the design of hardware accelerators to speed up regular expression matching has been an active research area. A systematic approach to detecting regular expressions is based on finite automata. The space-time tradeoff between the deterministic finite automaton (DFA) and the non-deterministic finite automaton (NFA) is well known. A DFA can offer constant throughput but it may suffer from the state explosion problem. Hence, implementation of DFA for large pattern sets on embedded devices with limited on-chip memory may not be viable. An NFA requires linear space but the throughput can be very low. Implementations of NFA with hardwired circuits can overcome the speed deficiency by exploiting the massive parallelism offered by dedicated hardware circuitry, but this approach does not support efficient dynamic updates. In this paper, we shall present a memory-based architecture for the implementation of NFA to speed up regular expression matching for signature-based intrusion detection. The proposed method supports dynamic updates and offers constant throughput so that it can be used to supplement the existing DFA-based methods in handling large pattern sets. © 2013 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +852 34428607; fax: +852 34420562. E-mail addresses: [email protected] (D. Pao), [email protected] (N.L. Or), [email protected] (R.C.C. Cheung). 0140-3664/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.comcom.2013.03.002

1. Introduction

Two categories of intrusion detection techniques are used in today’s intrusion detection systems (IDS), namely anomaly detection and signature-based detection. Anomaly detection methods [1] detect attacks by monitoring network traffic behaviors. If the observed traffic behavior deviates significantly from the expected user profile, the IDS may generate an alert and/or take appropriate actions. Anomaly detection can be used to protect the system against unknown attacks. The detection rate is about 98% while the false alarm rate is about 1% [2]. Anomaly detection has two limitations. First, by the time the attack is detected some damage might already have been done to the computer system; anomaly detection can only reduce and localize the damage. Second, anomaly detection may not be able to detect slow attacks, where the attacker deliberately slows down the traffic rate of the attack to avoid detection.

Signature-based detection methods are based on content inspection. Signature-based detection is precise and accurate, but it has one limitation: the attack patterns have to be known in advance, and are characterized by the intrusion signatures. If an intrusion signature is found in the packet header/payload, the IDS will execute the predefined action, e.g., generate an alert and/or block the packet. Pattern matching is a computation-intensive task. In this paper we shall present the design of a hardware accelerator to speed up the pattern matching process, in particular for the detection of intrusion signatures specified using regular expressions.

Snort [3] is an open-source IDS. With close to 400,000 registered users, Snort is considered to be the de facto standard for IDS. Our study will be based on the intrusion signatures used in Snort. A Snort rule consists of two parts, namely the rule header and the rule option. The rule header specifies the traffic flow and action, and the rule option specifies the contents in the packet payload that may constitute a threat to the system. Two example Snort rules are shown in Fig. 1 (information on class type, reference number, etc., is omitted). The intrusion pattern of rule 1 is a sequence of raw bytes (specified by the keyword “content”), and the intrusion pattern of rule 2 is specified using a Perl compatible regular expression (pcre). The Snort system is required to inspect the Internet traffic in real time. Each packet has to be compared against all the installed rules. If the packet header and the packet payload match an installed rule, the corresponding rule action will be carried out. There are about 10,000 rules in the Snort ruleset. In a physical deployment, the system administrator can choose to install the subsets of the rules appropriate to the application environment. An IDS deployed in a high-speed router or server machine is required to process large amounts of data in real time. Hence, the design of hardware accelerators to speed up pattern matching has been an active research area.

A pattern can be specified as a simple string or a regular expression (regex). In the Snort ruleset, about 1/3 of the patterns are specified as regexes. There has been significant progress in the design of hardware string matching engines in recent years [4–9]. However, the design of a cost-effective hardware regex matching engine that can meet the application requirements of intrusion detection is still an on-going research problem.

A systematic approach to detecting regexes is based on the finite automaton (FA) [10]. An FA can be classified as deterministic (DFA) or non-deterministic (NFA). A DFA has the following three properties: (i) there is one active state in the system at any time; (ii) for each state there is one and only one next state for a given input symbol; and (iii) each state transition consumes one input symbol. In contrast to the DFA, an NFA allows (i) multiple active states, and (ii) multiple state transitions in processing one input symbol. The space-time tradeoff between DFA and NFA is well known. The processing rate of a DFA is constant (typically 1 input character per cycle in a hardware implementation), but it may suffer from the exponential state explosion problem. An NFA only requires linear space, but its processing speed depends on the pattern set and the input sequence. Theoretically the NFA can make up to $n^2$ state transitions when processing 1 input symbol, where n is the number of states in the NFA. A characteristic of IDS is that the pattern set is fairly dynamic. When a new attack has been identified, the pattern set should be updated as soon as possible.
Hence, there are three basic requirements in the design of a hardware regex matching engine for intrusion detection, namely hardware cost, processing speed, and dynamic updating. A DFA-based regex detection engine can offer good processing speed and allows dynamic updates, but the hardware cost is prohibitive. Alternatively, a DFA with fixed memory allocation may not be able to handle regexes that cause severe state explosion. A conventional lookup-table-driven NFA-based regex matching engine has good memory efficiency, but its processing speed is slow: the system may need to look up the transition rule table multiple times in processing 1 input symbol. There have been attempts to implement NFA using hardwired circuits on FPGA such that the device can offer constant processing speed [11–15]. A disadvantage of this approach is that whenever the pattern set is updated, the FPGA needs to be reconfigured. Although an FPGA is reconfigurable, the hardware synthesis procedure may not be fully automatic. Very often some manual adjustments of system parameters are required in order to achieve the desired timing performance, and the synthesis time can be on the order of hours.

In this paper we shall present a novel memory-based NFA architecture for regex matching called MX-NFA. A distinctive feature of the proposed architecture is that it can meet the three requirements on cost, speed, and dynamic update. The hardware system requires linear memory space, offers constant processing speed, and allows dynamic updates to the pattern set without the need to reconfigure the hardware circuits. Another advantage of MX-NFA is its simplicity.

The processing logic is simple and the mapping of the Snort regex to the internal data structures is fairly easy.

In real life the IDS may need to detect hundreds of signatures (strings and regexes) at the same time. Most of the regexes are relatively simple and do not cause severe state explosion. These “simple” regexes and the strings can be handled by the DFA approach. However, a small percentage of regexes in the pattern set may cause exponential state explosion. These “complex” regexes can be handled by the proposed MX-NFA. For example, if we detect the regex of rule 2 of Fig. 1 using DFA, the conventional DFA contains 23,616 states and 6 million transition rules. If the optimization methods of [16,17] are applied, the number of transition rules is reduced to 68,848 (close to 99% reduction). The memory cost for the optimized DFA transition rule table is still more than 2.7 Mbit. If MX-NFA is used, the memory cost is less than 3 Kbit.

The organization of the remaining parts of this paper is as follows. In Section 2 we shall outline the basic principle of detecting regexes using DFA and NFA. A review of previous studies on the optimization of memory-based DFA/NFA will be given. The design of the MX-NFA will be presented in Section 3. Section 4 is devoted to performance evaluation. We shall evaluate the hardware implementation cost of MX-NFA on FPGA, and give a comparison with related methods. Section 5 is the conclusion.

2. Background and prior work

2.1. Detecting regular expression using DFA and NFA

The regular expression features supported in our method are listed in Table 1. We contrast the structure of the DFA state transition graph and the NFA state transition graph with a simple example shown in Fig. 2. In this study, the NFA does not have ε-transitions (transitions that do not consume any input symbol). The pattern specified by R1 = .*\d..[^\s]*z can appear anywhere in the input. It starts with a decimal digit ‘0’ to ‘9’ followed by two wildcard characters, then zero or more characters that are not white-space characters, and ends with the character ‘z’. In the subsequent discussion, the “.*” at the beginning of a regex is understood and is omitted.

The DFA scans the input stream in one pass, and makes one state transition when an input character is consumed. There is only one active state in the DFA at any time. Hence, a DFA uses distinct states to represent all possible partially matching substrings when scanning the input up to a given point. As a result, the state graph has a more complex structure, as shown in Fig. 2(a). A transition rule (an edge in the transition graph) is a four-tuple (current state, input symbol, next state, output information). The system searches the transition rule table using the current state ID and the input symbol to determine the next state and the associated output. Intrusion patterns are byte-oriented, i.e., there are 256 distinct symbols in the alphabet set. A DFA with n states will have $n \times 256$ transition rules. Theoretically, a regex with length L can be mapped to a DFA with $O(2^L)$ states in the worst case, and this is known as the exponential state explosion problem.
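The table-driven lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the NFA edge list for R1 is transcribed by hand from the description of the pattern, the wildcard is assumed to match every byte except newline, and the names `NFA_EDGES`, `build_dfa` and `dfa_scan` are hypothetical. The DFA is obtained with the textbook subset construction, so it is not necessarily identical to the graph of Fig. 2(a).

```python
import string

ANY = frozenset(range(256))
DIGIT = frozenset(map(ord, string.digits))
SPACE = frozenset(map(ord, string.whitespace))
DOT = ANY - {ord("\n")}  # assumption: the wildcard excludes newline

# Hand-written NFA for R1 = \d..[^\s]*z as (state, accepted bytes, next state).
# State 0 loops on every byte (the implicit leading ".*"); state 4 is the output state.
NFA_EDGES = [
    (0, ANY, 0),
    (0, DIGIT, 1),
    (1, DOT, 2),
    (2, DOT, 3),
    (3, ANY - SPACE, 3),
    (3, frozenset({ord("z")}), 4),
]
NFA_OUTPUT = 4


def build_dfa(edges, start=0):
    """Subset construction: every DFA state stands for a set of NFA states."""
    ids = {frozenset({start}): 0}
    rows = {}
    work = [frozenset({start})]
    while work:
        current = work.pop()
        row = []
        for byte in range(256):
            nxt = frozenset(d for s, symbols, d in edges
                            if s in current and byte in symbols)
            if nxt not in ids:
                ids[nxt] = len(ids)
                work.append(nxt)
            row.append(ids[nxt])
        rows[ids[current]] = row
    table = [rows[i] for i in range(len(rows))]  # n rows of 256 entries each
    accepting = {i for subset, i in ids.items() if NFA_OUTPUT in subset}
    return table, accepting


def dfa_scan(table, accepting, data):
    """One table lookup per input byte; returns the offsets where a match ends."""
    state, hits = 0, []
    for pos, byte in enumerate(data):
        state = table[state][byte]
        if state in accepting:
            hits.append(pos)
    return hits


TABLE, ACCEPTING = build_dfa(NFA_EDGES)
print(len(TABLE), "DFA states,", len(TABLE) * 256, "transition rules")
print(dfa_scan(TABLE, ACCEPTING, b"a1b2c3z\nde"))
```

For the input “a1b2c3z\nde” the scan reports a single match ending at offset 6, i.e., at the seventh input character. The table holds 256 entries per DFA state, which is the $n \times 256$ cost discussed above.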

Fig. 1. Example Snort rules.


Table 1. Regular expression features supported in MX-NFA.

Regex symbol              Feature
*                         Kleene star. Matches zero or more times
+                         Plus-closure. Matches one or more times
?                         Matches zero or one time
^                         Matches pattern at start of input, or start of a line (with /m option)
.                         Wildcard character
(a|b)                     Alternate character, e.g., ‘a’ or ‘b’ (a special case of character subclass)
[a-z], \d, \s, etc.       Character subclass, e.g., ‘a’ to ‘z’, decimal digits, white-space characters
[^a]                      Negation of character class, e.g., not ‘a’
{n}, {n,m}, {n,}, {,m}    Discrete repetitions of a character (repetition of a substring with 2 or more characters is not supported)

(a) DFA state graph for R1. State 0 is the initial state, and states 9 and 10 are output states.

(b) NFA state graph for R1. State 0 is the initial state, and state 4 is the output state.

Fig. 2. DFA and NFA state graphs for R1 = .*\d..[^\s]*z.

In a typical hardware implementation, the table lookup operation can be pipelined and the system can attain a constant throughput. Having a deterministic throughput is a clear advantage because the IDS is required to scan the network traffic in real time. Another advantage of DFA-based methods is that the hardware architecture is memory-based. When the pattern set is updated, we only need to modify the transition rule table accordingly. The major issue of detecting regexes with DFA is the excessive memory cost, especially when we are considering implementation on embedded devices with limited on-chip memory.

An NFA has superior memory efficiency compared to a DFA. The number of states in the NFA is approximately equal to the length of the regex. Multiple active states are allowed in an NFA, and the NFA can make multiple state transitions when an input symbol is consumed. As a result, the structure of the NFA state transition graph is much simpler and exhibits a more or less linear structure, as shown in Fig. 2(b). The distance of a state is equal to the minimum number of hops from the initial state. One may notice that the transitions in an NFA are mainly in the forward direction, i.e.,

moving to a state with larger distance. Backward transitions moving to states with smaller distance are, in general, not required in an NFA. For example, the backward transition from state 3 to state 0 in the NFA of Fig. 2(b) is redundant and can be removed from the state graph.

A conventional implementation of NFA uses a transition rule table. The system maintains a list of active states, i.e., the current state list (cs_list). Initially the cs_list will only contain state 0. When the system receives an input symbol, it will look up the transition rule table using a state ID retrieved from cs_list and the input symbol. One or more matching rules can be found, and the corresponding next state(s) will be added to the next state list (ns_list). When all the entries in cs_list have been processed, cs_list is replaced by ns_list, and the NFA is ready to process the next input symbol. The execution of the conventional NFA with the input “a1b2c3z\nde” is depicted in Fig. 3. Initially the cs_list only contains state 0. When the NFA receives the input character ‘a’, only 1 transition from state 0 to state 0 is possible. When the NFA receives the input character ‘1’, it can make two state transitions from state 0 to state 0 and from state 0


Fig. 3. Transition rule table for the NFA of Fig. 2(b), and the execution of the NFA for the given input.

to state 1. Hence, there are 2 states in cs_list (states 0 and 1) when the NFA processes the input character ‘b’, and the NFA will make transitions from state 0 to state 0 and from state 1 to state 2. The changes in the cs_list for the subsequent input characters are listed in Fig. 3. In this example, the NFA would need to look up the transition rule table 4 times when processing the character ‘\n’.

2.2. Advanced DFA methods

The memory cost of a DFA depends on the number of transition rules, which is equal to $n \times |\Sigma|$, where n is the number of states and $\Sigma$ is the alphabet set. The first approach to reducing the memory cost is to reduce the number of states in the DFA. The early work of Hopcroft [16] described an efficient method to reduce the number of states in a DFA by merging equivalent states into one state. Becchi et al. [18] proposed a method to merge non-equivalent states in a DFA by labeling the transitions in the transition graph. The performance gains of this method vary from 30% to about 90% depending on the properties of the regexes. Fu et al. [19] proposed two rule rewriting strategies that can help to reduce the number of states in the DFA. However, the rule rewriting strategies can only be applied to regexes with some specific format. Typically the hardware accelerator may contain 8–16 DFAs, while the number of regexes in the pattern set is in the order of hundreds. It is inevitable that multiple regexes have to be merged to form a DFA. But merging multiple regexes into one DFA may lead to even more severe state explosion. Rule grouping heuristics that try to reduce the size of the resultant DFA are studied in Refs. [19,20].

The second approach to reducing the memory cost is to reduce $|\Sigma|$. By introducing equivalent symbol classes [21,22], the number of distinct symbols can be reduced. However, this scheme is effective only if the symbols appearing in the regexes correspond to a small subset of $\Sigma$. Hence, each DFA can only handle a small number of regexes.
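The idea behind equivalent symbol classes can be illustrated with a short Python sketch. It is not the algorithm of [21,22]; it simply groups byte values whose columns in a DFA transition table are identical, under the assumption that the table is stored as `table[state][byte]`. The function name `symbol_classes` and the three-state toy DFA (which detects the string "ab") are hypothetical.

```python
def symbol_classes(table):
    """Group byte values whose transition-table columns are identical.

    Returns (class_of, compact): class_of[byte] is a class index and
    compact[state][cls] is the next state for any byte of that class.
    """
    column_ids = {}
    class_of = []
    for byte in range(256):
        column = tuple(row[byte] for row in table)
        class_of.append(column_ids.setdefault(column, len(column_ids)))
    representative = {}
    for byte, cls in enumerate(class_of):
        representative.setdefault(cls, byte)  # first byte seen for each class
    compact = [[row[representative[c]] for c in range(len(representative))]
               for row in table]
    return class_of, compact


def toy_table():
    """Toy DFA for the string "ab": state 2 is reached after seeing "ab"."""
    rows = [[0] * 256 for _ in range(3)]
    for row in rows:
        row[ord("a")] = 1
    rows[1][ord("b")] = 2
    return rows


TOY = toy_table()
CLASS_OF, COMPACT = symbol_classes(TOY)
print(len(COMPACT[0]), "symbol classes instead of 256 symbols")
```

In the toy table only ‘a’, ‘b’ and “any other byte” behave differently, so each row shrinks from 256 entries to 3. A lookup becomes `COMPACT[state][CLASS_OF[byte]]`, at the price of one extra indirection per input byte. When many distinct symbols appear in the pattern set, few columns coincide and the saving disappears, which is the limitation noted above.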
The third approach to reduce the memory cost is to reduce the number of transition rules needed to be stored in the lookup table. Let’s denote the states that are 1-hop from the initial state as the level-1 states. The transition rules are divided into 3 partitions in [23], (i) transitions to the initial state, (ii) transitions to level-1 states, and (iii) transitions to other states. By storing the transition rules for partition 2 and partition 3 in separate physical memories, (i) all transition rules of partition 1 need not be stored in the system, (ii) transition rules in partition 2 that are destined to the same level-1 state can be represented by one entry in the lookup table with a don’t care current state value. In processing an input symbol, the DFA will look up the transition rule tables of partitions 2 and 3 in parallel. If matching rules are found in partitions 2 and 3, then priority is given to the matching rule found in partition 3.

If no matching rule is found in either partition, then the system will move to the initial state by default. The number of transition rules can be reduced by about 95% using this approach. The industry has adopted this approach together with the rule grouping method of [20] and the further optimized lookup table organization of [24] in the design of an on-chip hardware accelerator in a network processor [25] to support wire-speed deep packet inspection.

The delayed input DFA (D²FA) [17] is another well-known method to reduce the number of transition rules in a DFA. By introducing default transitions that do not consume any input symbol, the number of transition rules can be reduced substantially. Let u and v be two distinct states in the DFA, and suppose there exists an edge (u, a, w), i.e., a transition rule from state u to w with input a, and an edge (v, a, w). The edge (u, a, w) can be removed by introducing a default transition from u to v. Each state is allowed to have at most one outgoing default transition. The number of transition rules can be reduced by up to 99%. Comparing D²FA and [23], D²FA can achieve better reduction of transition rules at the expense of a lower processing speed. Some other variants of D²FA and further improvements that try to reduce the speed penalty can be found in Refs. [26–28].

Kumar et al. observed that state explosion is usually caused by a wildcard or character negation (e.g., [^\n]) that appears in the regex [29]. They proposed to insert annotations to the states in the DFA so as to avoid state explosion. This approach is formalized by Smith et al. in their proposed extended finite state automata (XFA) [30,31]. Variables and program instructions can be associated with states in the XFA. When the system visits a state, the associated program instructions will be executed.
Although it is possible to implement this idea in hardware, the system throughput will be degraded when states with associated program instructions are visited frequently.

The above-mentioned methods can achieve varying degrees of memory reduction depending on the properties of the regexes in the pattern set. A fundamental limitation of these methods is that the optimization process is computation-intensive. For example, the methods of [17,18,23] require the initial DFA to be constructed before the optimization process can be carried out. If the group of regexes causes exponential state explosion, generation of the initial DFA may not be feasible. The method of [30,31] requires manual selection of an instruction template, and the subsequent execution time of the optimization algorithm can be up to an hour.

2.3. Advanced NFA and hybrid FA methods

Lee proposed an implementation of a memory-based NFA using bit-map based processing [32]. The internal system state and the next state function are encoded with bit-vectors whose


length is equal to the number of states in the NFA. The number of bit-vectors is equal to $|\Sigma|$, one for each alphabet value. Lee’s method has limited flexibility because (i) a hardware unit can only be used to detect 1 regex, (ii) repetition of characters is handled by unrolling, and (iii) the length of the bit-vector has to be pre-determined in a hardware implementation. In principle, multiple regexes can be grouped together to form an NFA. However, the proposed architecture can only report a match when the NFA arrives at an output state, without the identity of the matching signature. This is not sufficient for our application in intrusion detection.

Becchi and Crowley [33,34] explored the use of hybrid DFA/NFA to detect regexes. The match engine consists of a front-end DFA and multiple back-end DFAs/NFAs. When the front-end DFA detects the prefix string of some complex regexes, activation command(s) are sent to the back-end FAs via a FIFO queue. The back-end FAs share common resources like the transition rule table and a counter table. In general, the back-end FAs will take multiple cycles to process one character. If the FIFO queue is full, the front-end DFA needs to be stalled. The processing speed of the match engine depends on the input data as well as the properties of the pattern set.

Recently Bando et al. presented a new approach called lookahead finite automata (LaFA) in [35], which also emphasizes dynamic update capability. Bando observes that a regex pattern can, in general, be divided into several segments, i.e., $s_1 v_1 s_2 v_2$, where the $s_i$ are simple strings consisting of a fixed-length sequence of defined byte values and the $v_i$ are variable strings consisting of arbitrary character subclasses, repetition counts, etc. Instead of detecting the pattern in the conventional manner by matching the input against the symbols in a regex from left to right, LaFA tries to match the simple strings first and then verifies the interleaved variable strings on demand.
This method may have difficulties in detecting some complex regexes. Let us consider the regex of rule 2 in Fig. 1. In this regex, \x5Csv\s+[^\x7D]*\x3B[^\x7D]*\x3B[^\x7B]{12}, we have (i) two adjacent variable strings \s+ and [^\x7D]*, and (ii) two variable strings [^\x7D]* separated by a single-character simple string \x3B, and the character \x3B may also be part of the variable string [^\x7D]*. The authors of LaFA might not have considered these situations in their design.

Yang and Prasanna [36] proposed another hybrid approach called semi-deterministic finite automata (SFA). An SFA consists of p constituent DFAs running in parallel, where each DFA is responsible for a subset of states in the original NFA. The space requirement of SFA is $O(|\Sigma| \times p^2 \times (n/p)^c)$, and the time to process one input symbol is $O(p^2/c^2)$, where n is the number of states in the underlying NFA and $c \ge 1$ is a design parameter.
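To make the bit-vector encoding mentioned at the start of this subsection concrete, the following Python sketch simulates an NFA for R1 with one bit per NFA position and one mask per byte value (256 masks in total). It is a generic bit-parallel simulation written for illustration, not a reconstruction of the design in [32]. The names `SYMBOLS`, `FOLLOW`, `FIRST`, `LAST`, `build_masks` and `scan` are hypothetical, and the wildcard is assumed to exclude newline.

```python
import string

ANY = frozenset(range(256))
DIGIT = frozenset(map(ord, string.digits))
SPACE = frozenset(map(ord, string.whitespace))
DOT = ANY - {ord("\n")}  # assumption: the wildcard excludes newline

# One bit per position of R1 = \d..[^\s]*z (the leading ".*" is implicit).
SYMBOLS = [DIGIT, DOT, DOT, ANY - SPACE, frozenset({ord("z")})]
# FOLLOW[i]: positions that may match right after position i has matched.
FOLLOW = [0b00010, 0b00100, 0b11000, 0b11000, 0b00000]
FIRST = 0b00001  # position that may start a match at any input offset
LAST = 0b10000   # position whose match completes the pattern


def build_masks(symbols):
    """masks[byte] has bit i set when position i accepts that byte."""
    masks = [0] * 256
    for i, accepted in enumerate(symbols):
        for byte in accepted:
            masks[byte] |= 1 << i
    return masks


def scan(masks, follow, first, last, data):
    """Bit-vector NFA simulation; returns the offsets where a match ends."""
    active, hits = 0, []
    for pos, byte in enumerate(data):
        candidates = first
        bits, i = active, 0
        while bits:  # OR together the successors of every active position
            if bits & 1:
                candidates |= follow[i]
            bits >>= 1
            i += 1
        active = candidates & masks[byte]
        if active & last:
            hits.append(pos)
    return hits


MASKS = build_masks(SYMBOLS)
print(scan(MASKS, FOLLOW, FIRST, LAST, b"a1b2c3z\nde"))
```

The inner loop over active bits is what a hardware implementation would evaluate in parallel. The width of the bit-vectors is fixed by the number of positions, which mirrors limitation (iii) above, and a counted repetition would have to be unrolled into additional bits.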

3. Memory-based NFA (MX-NFA) architecture

Discrete repetition of a character subclass, e.g., [^\s]{n,m} (a non-white-space character repeating n to m times), is quite common in the Snort ruleset, and the repetition count can be up to 1024. If the character repetition is handled by unrolling, the hardware cost can become very high. In Section 3.1, we shall first present the basic design of MX-NFA for handling regexes without character repetition. In Section 3.2, we shall present an extension of MX-NFA with a count module for handling character repetition without unrolling.

3.1. The basic MX-NFA match unit

In the proposed MX-NFA architecture, the transition rules are stored in a table with embedded control circuits. The behavior of a transition rule is defined by the values of the associated control flags. The organization of the MX-NFA transition rule table is very different from the conventional approach. Each transition rule only contains a transition symbol and the associated control flags, and no explicit state IDs are maintained. Individual transition rules can be enabled or disabled. The system does not explicitly maintain the set of active states. Instead, all the out-going transition rules from an active state of the underlying NFA are enabled, and out-going transition rules from an inactive state are disabled. When the input character matches the transition symbol of an enabled transition rule, the corresponding rule will be fired and it may in turn activate the transition rules associated with the corresponding next state.

The control flags for the basic MX-NFA unit are listed in Fig. 4. An entry is active if the Enable (E) bit is set. When the input character matches the transition symbol of an active entry, the entry is fired and the adjacent rules specified by S2S1 are activated in the next cycle. By default, an active transition rule will be turned off automatically in the next cycle, unless the Hold (H) bit is set or it is activated again by some other event.
We shall illustrate the basic idea of the MX-NFA match unit using regex R1 as the example. The left-hand side of Fig. 5 shows the setup of the MX-NFA match unit that corresponds to the NFA state transition graph of Fig. 2(b). The transition edge from state 3 to state 0 with input ‘\s’ is redundant, and it can be ignored in the MX-NFA rule table. The right-hand side of Fig. 5 shows the dynamic changes of the E-bits of the match entries when the system processes the input sequence “a1b2c3z\nde”. The system is initialized (or reset) in cycle t = 0, and entries with the I-bit equal to 1 are set to active, i.e., E = 1 for e1. Since the pattern R1 can appear anywhere in the input, the H-bit of e1 is set to 1. Consequently, e1 is enabled

Control flag: Description

Enable (E): The rule is active if E=1. By default the E bit is reset automatically at the end of the clock cycle.
Initial (I): If I=1, the rule is active after initialization (or system reset).
Hold (H): If H=1, the E bit will not be reset automatically.
Self-activation (S0): If S0=1, the rule will activate itself in the next cycle.
Sequential activation (S2S1): The two control bits specify the number of sequential rules to be activated if the rule is fired. Let the current rule be rule i.
    S2S1=00: activate rule i+1
    S2S1=01: activate rules i+1 to i+2
    S2S1=10: activate rules i+1 to i+3
    S2S1=11: activate rules i+1 to i+4
Output (O): If O=1, an output signal is generated when the rule is fired.

Fig. 4. Control flags for a match entry in the basic MX-NFA.


MX-NFA rule table (left panel)

Entry  Symbol  I  H  O  S2S1  S0
e1     \d      1  1  0  00    0
e2     .       0  0  0  00    0
e3     .       0  0  0  01    0
e4     [^\s]   0  0  0  00    1
e5     z       0  0  1  00    0
e6     null    0  0  0  00    0

Right panel: changes of the E-bit of entries e1 to e6 in cycles t = 0 (reset) to t = 10 when processing the inputs “a1b2c3z\nde”, one character per cycle. Entry e5 generates the match signal (M) at t = 7.

Fig. 5. Setup of the MX-NFA match unit for R1 = .*\d..[^\s]*z, and the changes of the E-bit in the matching process. The E-bit is highlighted if the rule is fired when processing the given input character. The superscript M represents the generation of a match signal.

throughout the matching process. The O-bit of e5 is equal to 1, which means that if e5 is fired an output match signal will be generated. The ID of the matching regex is stored in a register associated with the match unit.

Let’s consider the matching process shown on the right-hand side of Fig. 5. In cycle 1, the input character ‘a’ does not match the symbol of any enabled entry. In cycle 2, the input character ‘1’ matches the symbol of e1. When e1 is fired in cycle 2, e2 is activated in cycle 3. In cycle 3, e1 and e2 are enabled, which corresponds to the scenario where state 0 and state 1 of the NFA in Fig. 2(b) are active. The input character in cycle 3 matches the symbol of e2 (which is a wildcard) and subsequently e3 is enabled in cycle 4. The E-bit of e2 is reset automatically at the end of cycle 3, and hence it is not active in cycle 4. In cycle 4, both e1 and e3 are fired. When e3 is fired, it will activate e4 and e5 because the value of S2S1 of e3 is equal to 01. When e4 is fired in cycle 5, it activates itself (with S0 = 1) and e5 (with S2S1 = 00). As a result, e4 remains enabled as long as the input character is not a white-space character (‘\s’), and it will also keep e5 active. When the character ‘z’ is processed in cycle 7, e5 is fired and a match output signal is generated by the hardware. Entry e6 is a dummy entry used as a delimiter. It will never be fired because its symbol is set to null.

We shall now consider the implementation of the basic MX-NFA unit in FPGA. The transition symbol of an entry can be an arbitrary character subclass, and the match unit is required to be programmable. The input character is compared to all the enabled entries in parallel. We implement the parallel comparators for arbitrary character subclasses using block RAM, and the control flags are stored in the flip-flops of the logic cells. The flip-flops are connected into a serial chain such that the required control data for the match entries can be downloaded from the I/O interface.
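The behavior of the control flags can be checked with a small software model. The Python sketch below is an illustration only, not the hardware design: each entry holds a set of accepted byte values (standing in for one programmable comparator) together with the flags of Fig. 4, and `run` applies the enable, hold, self-activation and sequential-activation behavior described above, one input byte per cycle. The names `Entry`, `R1_UNIT` and `run` are hypothetical, S2S1 is stored as an integer 0 to 3, and the wildcard is assumed to exclude newline.

```python
import string
from dataclasses import dataclass

ANY = frozenset(range(256))
DIGIT = frozenset(map(ord, string.digits))
SPACE = frozenset(map(ord, string.whitespace))
DOT = ANY - {ord("\n")}  # assumption: the wildcard excludes newline


@dataclass
class Entry:
    symbol: frozenset  # byte values accepted by this entry
    I: int = 0         # active after reset
    H: int = 0         # hold: the E-bit is not cleared automatically
    O: int = 0         # output: firing generates a match signal
    S: int = 0         # S2S1 as an integer: activate entries i+1 .. i+1+S
    S0: int = 0        # self-activation when the entry is fired


# Rule table for R1 = \d..[^\s]*z, entries e1 to e6 (e6 is the null delimiter).
R1_UNIT = [
    Entry(DIGIT, I=1, H=1),
    Entry(DOT),
    Entry(DOT, S=1),
    Entry(ANY - SPACE, S0=1),
    Entry(frozenset({ord("z")}), O=1),
    Entry(frozenset()),
]


def run(unit, data):
    """Returns (cycles with a match signal, E-bits seen in each cycle)."""
    enable = [e.I for e in unit]
    history, match_cycles = [], []
    for cycle, byte in enumerate(data, start=1):
        history.append(enable[:])
        fired = [bool(en) and byte in e.symbol for en, e in zip(enable, unit)]
        # An entry stays enabled if it is held, or if it fired with S0 set.
        nxt = [1 if en and (e.H or (f and e.S0)) else 0
               for en, e, f in zip(enable, unit, fired)]
        for i, (e, f) in enumerate(zip(unit, fired)):
            if f:
                for j in range(i + 1, min(i + 2 + e.S, len(unit))):
                    nxt[j] = 1  # sequential activation of following entries
                if e.O:
                    match_cycles.append(cycle)
        enable = nxt
    return match_cycles, history


MATCHES, HISTORY = run(R1_UNIT, b"a1b2c3z\nde")
print("match signal in cycles:", MATCHES)
for t, bits in enumerate(HISTORY, start=1):
    print(f"t={t}", bits)
```

Under these assumptions the model reports the match signal in cycle 7, in agreement with the walkthrough, and in cycle 5 the enabled entries are e1, e2, e4 and e5. Updating the pattern only changes the contents of `R1_UNIT`; the processing loop is untouched, which is the property that lets the hardware accept new patterns without resynthesis.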
When the pattern set is modified, the FPGA need not be reconfigured. Assume a 9 Kbit block RAM is used. The block RAM is configured as a 256 × 36-bit array. The input character is used as the 8-bit address to reference a 36-bit word in the block RAM. The arbitrary character subclass of a rule symbol is stored in a column of 256 bits in the memory array. A basic MX-NFA unit can have 36 entries. The i-th bit of the memory word corresponds to the match result of the input character with the symbol of the i-th match entry. A null value in the entry symbol (e.g., e6 of Fig. 5) corresponds to having all the 256 bits in the column equal to 0.

The hardware organization of a basic MX-NFA match unit is depicted in Fig. 6. There are 36 match entries in a basic MX-NFA match unit. Each match entry is equipped with a control logic block. The control logic block is responsible for maintaining the E-bit of the given entry, and for generating the activation signals $a_1$ to $a_4$ and the output match signal M. The activation signal $a_j$ of the i-th entry is connected to the input $p_j$ of the (i + j)-th entry. Let $E^t$ denote the value of the E-bit in cycle t. The E-bit of an entry will be set in cycle t + 1 if

(i) Et = 1 and the value of the control flag H = 1; or (ii) Et = 1, the input character matches the rule symbol and the control flag S0 = 1; or (iii) an activation signal is received from one of the preceding four entries. The setting of the E-bit, and generation of the signals a1 to a4, and M are defined by the following Boolean equations:

$$
\begin{aligned}
E_{t+1} &= E_t \cdot (H + m_t \cdot S_0) + p_t^1 + p_t^2 + p_t^3 + p_t^4 \\
a_t^1 &= E_t \cdot m_t \\
a_t^2 &= E_t \cdot m_t \cdot (S_2 + S_1) \\
a_t^3 &= E_t \cdot m_t \cdot S_2 \\
a_t^4 &= E_t \cdot m_t \cdot S_2 \cdot S_1 \\
M_t &= E_t \cdot m_t \cdot O
\end{aligned}
$$

The 36 entries in the match unit are divided into the lower and upper halves. An 8-bit pattern ID (pid) register is associated with the MX-NFA match unit. The output interface produces a match signal and a 9-bit pid value. The leftmost 8 bits of the pid are obtained from the register, and the rightmost bit is generated by the circuit. The rightmost bit is equal to 0 if the match signal is generated from the lower half, and it is equal to 1 if the match signal is generated from the upper half. Multiple MX-NFA match units can be cascaded to process longer regexes. When multiple regexes are mapped to the MX-NFA, output entries belonging to different regexes cannot be located in the same block (lower/upper half of a match unit).

3.2. MX-NFA unit with count module

Let's consider the following regex structure with a discrete repetition of an arbitrary character subclass, prefixString [char subclass]{n, m} suffixString. The repetition of the character subclass is referred to as the counting block. A counting block can belong to one of the following five categories:

1. The suffixString is empty, i.e., the counting block appears at the end of the regex. In this case, the counting can be handled by a counter device.

2. The value of n is equal to 0, i.e., the repetition is from 0 to m times. In this case, the counting can be handled by a counter device.

3. The prefixString cannot be contained in the counting block, e.g., R = /^ab[^\n]{3,5}cd/m. In this example, the prefixString "ab" is required to be at the beginning of a line. Hence, whenever a '\n' character is seen, the repetition count should be terminated. Alternatively, if the length of the prefixString is greater than the repetition count, e.g., abc.{2}de, then the prefixString cannot be contained in the counting block.
In this case, the counting can also be handled by 2 counter devices.


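[Figure: the 8-bit input character drives the address decoder of a 256 × 36 block RAM; each of the 36 output bits m feeds the control logic of one match entry, which holds the E-bit (Enable) and the control flags I, H, O, S0, S1, S2, receives the inputs p1 to p4, and produces the signals a1 to a4 and M; the output interface, together with the register (Reg), produces the match and pid outputs.]

Fig. 6. Block diagram of the basic MX-NFA match unit. A pattern is found if match = 1, and the pattern ID is given by pid.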

4. The prefixString can be contained in the counting block and the repetition count is a fixed value, e.g., R = ab.{7}cd. In this case, the counting cannot be handled by a counter device and the counting block needs to be unrolled. The minimum unrolling is such that the length of the prefixString is larger than the count value. For example, the regex is unrolled to ab....{4}cd such that the prefixString ‘‘ab...’’ has a length of 5 and it cannot appear in the counting block ‘‘.{4}’’. 5. The prefixString can be contained in the counting block and the repetition count is a range, e.g., R = ab.{4-9}cd. In this case, the counting block is subdivided into two parts, i.e., R = ab.{4}.{0-5}cd. The repetition count in the first part of the counting block has a fixed value, and the repetition count of the second part is a range that starts from 0. The first part will be handled by simple expansion as in case (4), and the second part can be handled by a counter device as in case (2). The MX-NFA unit with count module is designed to handle the repetition count of cases 1, 2, 3 and 5 without unrolling. A precondition for using the count module is that the time to start the counting process can be determined precisely. We shall discuss some exception cases in Section 4.2. Fig. 7 shows the structure of the MX-NFA block with count module. To enhance readability, the output interface is not shown in the diagram. A memory column is used to monitor the character of the counting block, and

8 match entries are associated with the count module. The remaining 27 match entries are the same as in the basic unit. A count module contains two counter circuits and additional control signals to support interaction between the counters and the associated match entries. The two counters, namely the lower- and upper-bound counters, are responsible for keeping track of the lower and upper bounds of the counting range. To implement the counting condition {n, m}, the initial values of the lower- and upper-bound counters are set to n and m + 1, respectively. The initial values of the lower- and upper-bound counters are pre-loaded into the corresponding internal registers. The count module has 2 control flags, U (unbounded) and N (not-renewable). If U = 1 and the initial upper-bound count value is non-zero, the upper-bound count value is logically equal to infinity, i.e., this is used to implement the counting condition {n,}. In the physical implementation, if U = 1 the upper-bound counter will be frozen after the initial value is loaded into the counter (i.e., it will never count down to zero). The lower- and upper-bound counters will be reset if the input character does not match the counting condition. The control flag N controls the action to be taken by the counters when an activation signal AC is received. If N = 0, whenever an activation signal is received, the initial values are loaded into the counters and the counters start the count-down process. If N = 1, the activation signal is ignored if the counters are not in the idle state (i.e., the count value of any one of the 2 counters is non-zero). When the counter counts down to zero (i.e., making a transition


[Figure: the basic match unit of Fig. 6 extended with a count module. The count module contains the lower-bound and upper-bound counters and the control flags N and U; it receives the activate signal AC and the match bit m of its monitored memory column, and sends the signals CEL, CEU and Br to the control logic of the associated match entries, whose control flags are I, H, O, S0, S1, S2, R, C.]

Fig. 7. Block diagram of the MX-NFA match unit with a count module. The output interface is not shown in the figure. CEL and CEU are the count event signals. The Br (break) signal is generated if the input character does not match the requirement of the count module.

Control flag | Description
Respond (R) | If R = 1, respond to count-event and break signals: (i) set E-bit upon the count event of the lower-bound counter CEL; (ii) clear E-bit upon the count event of the upper-bound counter CEU; (iii) clear E-bit upon the break signal Br.
Activate Counter (C) | If C = 1, send an activate signal AC to the count module when the rule is fired.

Fig. 8. Additional control flags required in the match entry associated with a count module.

from 1 to 0), a count event signal (CEL for the lower-bound counter and CEU for the upper-bound counter) is generated. When the count value is reduced to zero, the counter stops counting. If the input character does not match the character subclass of the counting block (this is monitored by the bit value m read from the block RAM), the counting process is aborted and a break (Br) signal is generated. In addition to the control flags shown in Fig. 4, each match entry associated with a count module has two more control flags, namely respond (R) and activate counter (C). The control flags R and C specify how the match entry will interact with the count module as shown in Fig. 8. We shall illustrate our method using a simple example. Consider the regex R2 = /^ab[^\n]{3,5}cd/m. The setup of the MX-NFA is shown in Fig. 9. In R2, the suffixString "cd" can be separated from the prefixString by 3 to 5 characters not equal to '\n'. The count module is used to handle the counting process and the activation of the detection of the suffixString after 3 cycles and de-activation after 6 cycles. In this example, the prefixString is equal to "\nab", and it cannot appear in the counting block [^\n]{3,5}. Hence, the count module does not expect to receive another activation signal when it is in the counting state. The entry labeled cm is used to monitor if the input character matches the requirement of the

count module. The pattern of R2 can appear at the beginning of the input stream or at the start of a line. Hence, both e1 and e2 are enabled after the system reset (i.e., the I-bit of e1 and e2 is equal to 1). Entry e1 remains active throughout the matching process to detect the start of a line. If the input stream is equal to "\nab1234cde", entries e1, e2 and e3 will be fired in cycles 1–3, respectively. When e3 is fired, an activation signal is sent to the count module, and the lower- and upper-bound counters are initialized to 3 and 6, respectively. Entry e4 is a dummy entry used as a delimiter to avoid the activation of e5 when e3 is fired. The counters start the count-down in cycle 4. In cycle 6, the lower-bound counter will make a 1 to 0 transition, and generates a count-event signal CEL. The R-bit of e5 is equal to 1, and it will respond to the CEL signal by setting its E-bit to 1 in cycle 7. The H-bit of e5 is equal to 1, hence the E-bit of e5 remains equal to 1 until it is explicitly reset by another event. The upper-bound counter will generate a count-event CEU in cycle 9. In response to the CEU signal, the E-bit of e5 is cleared in cycle 10 (i.e., the detection of the suffixString is disabled). For the given input stream, e5 and e6 will be fired in cycles 8 and 9, respectively. When e6 fires in cycle 9, a match signal is generated. There is one remark about the mapping of the regex to the MX-NFA entries. The entry that activates the count


MX-NFA rule table (control flags I, H, O, S2S1, S0, R, C) and changes of the E-bit for each input character:

Rule | I | H | O | S2S1 | S0 | R | C | t=0 reset | t=1 '\n' | t=2 'a' | t=3 'b' | t=4 '1' | t=5 '2' | t=6 '3' | t=7 '4' | t=8 'c' | t=9 'd' | t=10 'e'
cm | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
e1 | 1 | 1 | 0 | 00 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
e2 | 1 | 0 | 0 | 00 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
e3 | 0 | 0 | 0 | 00 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
e4 | 0 | 0 | 0 | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
e5 | 0 | 1 | 0 | 00 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0
e6 | 0 | 0 | 1 | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1M | 0
e7 | 0 | 0 | 0 | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1

Fig. 9. Setup of the MX-NFA with count module for R2 = /^ab[^\n]{3,5}cd/m, and the changes of the E-bit in the matching process. The initial count values for the lower- and upper-bound counters are equal to 3 and 6, respectively. The N-bit of the count module is set to 1. The E-bit is highlighted if the rule is fired when processing the given input character.

MX-NFA rule table (control flags I, H, O, S2S1, S0, R, C) and changes of the E-bit for each input character:

Rule | I | H | O | S2S1 | S0 | R | C | t=0 reset | t=1 'a' | t=2 'b' | t=3 'a' | t=4 'b' | t=5 '1' | t=6 '2' | t=7 '3' | t=8 'c' | t=9 'd' | t=10 'e'
cm | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
e1 | 1 | 1 | 0 | 00 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
e2 | 0 | 0 | 0 | 00 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
e3 | 0 | 0 | 0 | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
e4 | 0 | 0 | 0 | 00 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0
e5 | 0 | 1 | 0 | 00 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0
e6 | 0 | 0 | 1 | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1M | 0
e7 | 0 | 0 | 0 | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1

Fig. 10. Setup of the MX-NFA with count module for R3 = ab[^\n]{2,4}cd, and the changes of the E-bit in the matching process. The initial count values for the lower- and upper-bound counters are equal to 0 and 3, respectively. The N-bit of the count module is set to 0. The E-bit is highlighted if the rule is fired when processing the given input character.

module (e3 in Fig. 9) and the entry that responds to the count-events (e5 in Fig. 9) should belong to the same block (i.e., associated with the same count module). We shall consider one more example, R3 = ab[^\n]{2,4}cd. In this example, the suffixString can be separated from the prefixString by 2–4 characters, and the prefixString "ab" may appear in the counting block [^\n]{2,4}. This case can be resolved by partially unrolling the counting block such that the lower-bound count value is equal to zero. R3 is unrolled to R3' = ab[^\n][^\n][^\n]{0,2}cd. The setup in the MX-NFA rule table is shown in Fig. 10. The initial values of the lower- and upper-bound counters are equal to 0 and 3, respectively, and the N-bit of the count module is set to 0. Assume the input to the match unit is "abab123cde". The substring "ab123cd" matches the requirements of R3. Entry e1 is enabled after system reset, and it remains active throughout the matching process. When e4 is fired in cycle 4, the count module is activated. Entry e5 is also activated when e4 is fired, because the suffixString "cd" may follow immediately after the revised prefixString "ab[^\n][^\n]". In cycle 6, when e4 is fired the second time, the value of the upper-bound counter is equal to 2 at the start of the cycle. When the count module receives another activation signal in cycle 6, the count value of the upper-bound counter is reset to 3 again (with the N-bit = 0). As a result, the upper-bound counter will generate a count-event CEU in cycle 9, and the E-bit of e5 is cleared in cycle 10. In this example, e5 is active from cycle 5 to cycle 9. Entry e5 is fired in cycle 8 when the input character matches the expected value. The match unit detects regex R3 when e6 fires in cycle 9.
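The counter timing in the two examples above can be checked with a small software model. This is only a sketch under stated assumptions, not the authors' circuit: the class `CountModule`, its field names and the helper `event_cycles` are hypothetical, the match entries are not modelled (the cycles in which AC is raised are supplied by hand from the text), and the order of operations inside a cycle (count down first, then sample the activation signal) is an assumption chosen to agree with the cycle numbers quoted for R2 and R3.

```python
from dataclasses import dataclass


@dataclass
class CountModule:
    """Hypothetical software model of one count module (two down-counters)."""
    lower_init: int      # initial lower-bound count, n for the condition {n, m}
    upper_init: int      # initial upper-bound count, m + 1 for {n, m}
    N: int = 0           # 1 = not-renewable: ignore AC unless both counters are idle
    U: int = 0           # 1 = unbounded: the upper-bound counter is frozen
    lower: int = 0
    upper: int = 0

    def step(self, m, activate):
        """Advance one cycle and return the signals (CEL, CEU, Br)."""
        cel = ceu = 0
        br = int(not m)
        if br:
            # Input is outside the character subclass: abort the count.
            self.lower = self.upper = 0
        else:
            if self.lower > 0:
                self.lower -= 1
                cel = int(self.lower == 0)      # 1 -> 0 transition
            if self.upper > 0 and not self.U:
                self.upper -= 1
                ceu = int(self.upper == 0)      # 1 -> 0 transition
        # Assumption: AC is sampled after the count-down of the same cycle.
        idle = self.lower == 0 and self.upper == 0
        if activate and (self.N == 0 or idle):
            self.lower, self.upper = self.lower_init, self.upper_init
        return cel, ceu, br


def event_cycles(module, text, in_subclass, activate_at):
    """Feed one character per cycle (1-based) and record when signals occur."""
    events = {"CEL": [], "CEU": [], "Br": []}
    for cycle, ch in enumerate(text, start=1):
        signals = module.step(int(in_subclass(ch)), int(cycle in activate_at))
        for name, flag in zip(("CEL", "CEU", "Br"), signals):
            if flag:
                events[name].append(cycle)
    return events


def not_newline(ch):
    return ch != "\n"


# R2: counters start at 3 and 6, N = 1, e3 fires (raising AC) in cycle 3.
r2 = event_cycles(CountModule(3, 6, N=1), "\nab1234cde", not_newline, {3})
# R3: counters start at 0 and 3, N = 0, e4 fires in cycles 4 and 6.
r3 = event_cycles(CountModule(0, 3, N=0), "abab123cde", not_newline, {4, 6})
```

Under these assumptions the model raises CEL in cycle 6 and CEU in cycle 9 for R2, and CEU in cycle 9 for R3 after the reload in cycle 6, which agrees with the cycle numbers given in the text.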

4. Implementation and performance evaluation

4.1. FPGA implementation

We demonstrate the feasibility of the proposed MX-NFA architecture using the Virtex-4 XC4VLX200 FPGA. The size of the block RAM is 18 Kbit, and the smallest depth is 512 with a word width of 36 bits. When the input character is used as the address, only half of the memory words can be accessed. To overcome the problem, we make use of the dual-port feature supported by the FPGA. The block RAM is logically divided into two partitions of size 256 × 36. The physical address range of partition A is from 0 to 255, and the address range of partition B is 256 to 511. We prepend a 0 to the input character to form the 9-bit address used by port A of the block RAM, and prepend a 1 to the input character to form the address used by port B. By doing so, an 18 Kbit block RAM can be used to implement two MX-NFA match units, and the block RAM can be fully utilized. The count module is equipped with two 11-bit counters, i.e., the largest count value is up to 2047. Fig. 11 summarizes the hardware resources required for implementing the basic MX-NFA and the MX-NFA with count module. The system's clock speed is 143 MHz when 8 match units (using 4 block


Hardware resources | Resources available in the device (XC4VLX200) | Basic MX-NFA (72 match entries) | MX-NFA with count module (70 match entries + 2 count modules) | MX-NFA with count module (280 match entries + 8 count modules)
18 Kbit block RAM | 336 | 1 | 1 | 4
4-input LUT | 178176 | 788 | 1211 | 4377
Register (bit) | 178176 | 1007 | 1283 | 3856
Logic slice | 89088 | 732 | 1057 | 3693

Fig. 11. FPGA implementation cost of MX-NFA.

Table 2. Distribution of the sizes of D2FA.

No. of states in D2FA | No. of regexes
Less than 1000 | 2479
1000–1999 | 19
2000–100,000 | 70
Cannot be generated (size of initial DFA > 100,000 states) | 211

RAMs) are cascaded to form a chain with 280 match entries and 8 count modules.

4.2. Comparison with DFA/D2FA

An open-source software package called the regular expression processor was developed by Becchi [37]. This software implements the D2FA method, and hence allows us to perform a detailed comparison of MX-NFA with D2FA. Our study is based on Snortrules-snapshot-2904.tar downloaded from the Snort web site in September 2011. We evaluate the size of the DFA/D2FA of individual regexes (one regex at a time). The software first maps the regex to a standard DFA, optimizes the DFA using the classical state minimization method of [16], and then generates the D2FA. A total of 2779 regexes are analyzed using Becchi's program. The maximum number of states allowed in the program is set to 100K. If the size of the initial DFA exceeds this limit, the program is aborted. In general, D2FA is very effective for detecting most of the regexes, but there are about 300 exceptions. The distribution of the sizes of the D2FA is summarized in Table 2. Some typical regexes that cause varying degrees of state explosion are shown in Fig. 12. The transition symbol of an edge (transition rule) produced by Becchi's program can be an arbitrary character subclass as depicted in the example of Fig. 1(a). If the D2FA is implemented in hardware, the number of entries in the lookup table can be much larger than the number of edges shown in Fig. 12. In the following discussion, we shall highlight how to detect these "complex" regexes using MX-NFA.

Let us consider the regex Ra = \x5Csv\s+[^\x7D]*\x3B[^\x7D]*\x3B[^\x7B]{12}. The D2FA to detect Ra contains 16,128 states and 68,848 edges. To detect Ra using the MX-NFA, we only require 11 match entries plus one count module. The setup of the MX-NFA is shown in Fig. 13. The counting block at the end of Ra can be handled by a count module. The entry cm is used to monitor if the input character matches the character subclass [^\x7B] of the counting block. The initial values for the lower- and upper-bound counters are set to 11 and 0, respectively, i.e., the upper-bound count is not used in this example. Entry e11 is a dummy entry used as a delimiter. Eight match entries (say entries e4 to e11 in this example) are associated with the count module. Regex Ra contains one plus-closure \s+ and two Kleene closures [^\x7D]*. The closures are realized by the self-activation of the corresponding entries, i.e., e4, e5 and e7. When the match unit sees an input sequence "\x5Csv" at the start of input or start of a line, e4 is enabled. If the next input character is a white-space character, e4 is fired and entries e4 to e6 are enabled (S0 = 1, and S2S1 = 01). The

allowable input character that follows the white-space character can be either \s, [^\x7D] or \x3B. These three possibilities are handled by e4 to e6, respectively. Entry e8 is used to activate the count module. After the count module is activated and it sees 11 instances of [^\x7B], it will generate a CEL signal to activate entry e10. If the next input character also matches [^\x7B], e10 is fired and a match output is generated.

The counting blocks in regexes Rb to Rd can be handled in a similar manner in MX-NFA. There is one remark for Rb and Rc. These two regexes contain some alternate substrings, e.g., (SENDFROM|MAILHOST) in Rc. The regex is expanded to two patterns, SET_SENDFROM\x28\x27[^\x27]{256} and SET_MAILHOST\x28\x27[^\x27]{256}. Each of these two patterns consumes 17 match entries and 1 count module. Regex Re has two counting blocks back-to-back, i.e., \x00\x08.{13}[^\x00]{56}. One way to detect this pattern in MX-NFA is to unroll the first counting block, and use a count module to handle the second counting block at the end of the regex.

The counting block in Rf is preceded by a closure, i.e., \s+[^\s]{200}. In this case, the character subclass in the closure is the complement of the character subclass in the counting block. We unroll the counting block once such that the prefixString does not end with a closure, i.e., the equivalent regex is /^OPTIONS\s+\S[^\s]{199}/m. The setup of the MX-NFA for detecting Rf is shown in Fig. 14. The initial values of the lower- and upper-bound counters are set to 198 and 0, respectively. The N-bit of the count module is set to 1. The counter is activated by the firing of entry e10, i.e., a non-white-space character received after ^OPTIONS\s+. If the count module sees 198 consecutive instances of [^\s], it will generate the CEL signal to enable entry e12. If one more instance of [^\s] is received, a match output is generated.

Let's consider the regex Rg. The authors believe the intention of this regex is to detect a pattern that starts with "o=" in a line or at the beginning of the input and, after skipping one or more white-space characters that follow the '=' sign, has 256 characters in the input (commencing with a non-white-space character) before the end of the current line. There is, however, one difficulty that arises from the given specification /^o=\s+[^\r\n]{256}/m, where the counting block [^\r\n]{256} is preceded by the closure \s+. The character subclass in the closure \s+ partially overlaps with the character subclass of the counting block [^\r\n]. It is difficult to decide when to start the counter of the count module. There are two ways to resolve this ambiguity. The first approach is to unroll the counting block, and the match engine would be more expensive, requiring 261 entries in this example. The second approach is to rewrite the regex following the recommendation of [19]. Rg may be rewritten to Rg' = /^o=\s[^\r\n]{256}/m, or rewritten to Rg'' = /^o=\s+\S[^\r\n]{255}/m. In the latter case, by introducing the non-white-space character \S in between the closure \s+ and the counting block [^\r\n]{255} in Rg'', the counter is started when the match unit sees the first non-white-space character that follows "o=\s+". We can see that rule rewriting also helps to reduce the size of the DFA/D2FA significantly in this case.
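The overlap between \s+ and [^\r\n] described above can be observed with an ordinary software regex engine. The sketch below uses Python's `re` module as a stand-in for the PCRE syntax used by Snort (an assumption; the two engines agree on the constructs used here), and the three test lines are made up for illustration. It shows that the rewritten forms agree with the original on a typical line, while Rg'' rejects lines in which the 256 characters begin with white space, which is exactly the ambiguity the rewriting removes.

```python
import re

# The three forms discussed in the text, written as Python regexes.
RG = re.compile(r"^o=\s+[^\r\n]{256}", re.M)        # original specification
RG_1 = re.compile(r"^o=\s[^\r\n]{256}", re.M)       # Rg'
RG_2 = re.compile(r"^o=\s+\S[^\r\n]{255}", re.M)    # Rg''


def verdicts(line):
    """Return (Rg, Rg', Rg'') match results for one input line."""
    return tuple(bool(p.search(line)) for p in (RG, RG_1, RG_2))


# Illustrative inputs (not taken from the paper).
typical = "o= " + "A" * 256     # one space, then 256 non-white-space characters
padded = "o=  " + "A" * 255     # two spaces, then only 255 other characters
blank = "o=" + " " * 257        # white space only

results = {name: verdicts(line)
           for name, line in (("typical", typical),
                              ("padded", padded),
                              ("blank", blank))}
```

For the `padded` line the original form still matches because the engine backtracks and lets the second space count as the first of the 256 characters; a counter that must start at a well-defined point cannot reproduce that behaviour, which is why Rg'' fixes the start of the count at the first non-white-space character.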


Fig. 12. Comparison of MX-NFA and D2FA for regexes that cause state explosion.

A similar situation happens in Rh. There are 3 counting blocks in Rh, and the first counting block is preceded by a closure, i.e., \s+[\w\s@\.]{200,}, and the character subclass of the closure is a subset of the character subclass of the counting block. Hence, to detect this regex using MX-NFA, the first counting block is unrolled and the remaining 2 counting blocks can be handled by two count modules. Unrolling can be avoided if rewriting the regex to \s+[\w@\.][\w\s@\.]{199,} is acceptable.

Many patterns in the SQL ruleset of Snort share the structure of Rv. The regex is used to guard against buffer-overflow attacks, i.e., attacks that submit an SQL query string with 1024 characters or more. The SQL query string can be placed in between single quotes (\x27), or double quotes (\x22). Regex Rv contains a back reference \2. Back references are, in general, not supported in MX-NFA. However, in this special case, the back reference refers to the component (\x27[^\x27]{1024,}\x27|\x22[^\x22]{1024,}\x22), i.e., if the first instance of the substring is delimited by single quotes, then the second instance of the substring is also required to be delimited by single quotes; similarly for double quotes. Rv can be decomposed into 6 independent cases based on the '|' operator:

(i) \w+[\r\n\s]*\x3a=[\r\n\s]*\x27[^\x27]{1024,}\x27[\r\n\s]*\x3b.*userid[\r\n\s]*=>[\r\n\s]*\x27[^\x27]{1024,}\x27
(ii) \w+[\r\n\s]*\x3a=[\r\n\s]*\x22[^\x22]{1024,}\x22[\r\n\s]*\x3b.*userid[\r\n\s]*=>[\r\n\s]*\x22[^\x22]{1024,}\x22
(iii) userid\s*=>\s*\x27[^\x27]{1024,}
(iv) userid\s*=>\s*\x22[^\x22]{1024,}
(v) \(\s*\x22[^\x22]{1024,}
(vi) \(\s*\x27[^\x27]{1024,}

If we substitute \x27 by \x22 in case (i), we will obtain case (ii). Similarly for cases (iii) and (iv), and (v) and (vi). To detect case (i), we need to use 27 match entries and 2 count modules. To detect


4.3. A qualitative comparison with other methods

Regex Ra, control flags of the match entries:

Entry | I | H | O | S2S1 | S0 | R | C
cm | -- | -- | -- | -- | -- | -- | --
e1 | 1 | 1 | 0 | 00 | 0 | -- | --
e2 | 0 | 0 | 0 | 00 | 0 | -- | --
e3 | 0 | 0 | 0 | 00 | 0 | -- | --
e4 | 0 | 0 | 0 | 01 | 1 | 0 | 0
e5 | 0 | 0 | 0 | 00 | 1 | 0 | 0
e6 | 0 | 0 | 0 | 01 | 0 | 0 | 0
e7 | 0 | 0 | 0 | 00 | 1 | 0 | 0
e8 | 0 | 0 | 0 | 00 | 0 | 0 | 1
e9 | 0 | 0 | 0 | 00 | 0 | 0 | 0
e10 | 0 | 0 | 1 | 00 | 0 | 1 | 0
e11 | 0 | 0 | 0 | 00 | 0 | 0 | 0

Fig. 13. Setup of the MX-NFA for detecting Ra. The initial count values for the lower- and upper-bound counters are equal to 11 and 0, respectively. The N-bit of the count module is set to 1.

case (iii), we need to use 14 match entries and 1 count module. To detect case (v), we need to use 6 match entries and 1 count module. The DFA/D2FA for regexes Rq to Rv cannot be generated because the number of states in the initial DFA exceeds the predefined limit allowed by the program. Even if rule rewriting is applied to Ru, the number of states in the DFA is still over 100 K. But these difficult cases can be handled by MX-NFA with affordable cost. In practice, the pattern set contains a relatively large number of regexes. When multiple regexes are merged into one DFA/D2FA, the state explosion problem can become even more severe. For example, Rc and Re can be mapped to independent D2FA with 2041 and 2446 states, respectively. However, when they are put together, the resultant DFA contains more than 100 K states and the program is aborted.
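The growth in DFA size discussed above can be reproduced on a textbook pattern. The sketch below is not Becchi's program and does not use any Snort regex: it applies the classical subset construction to the NFA of the unanchored pattern a.{k} over a two-letter alphabet (a hypothetical example chosen because its behaviour is well known) and counts the reachable DFA states. The function name `dfa_state_count` is an assumption of this sketch.

```python
def dfa_state_count(k, alphabet="ab"):
    """Count reachable subset-construction states for the pattern  .*a.{k}  ."""
    # NFA states are 0..k+1. State 0 is the start state and loops on every
    # symbol; on 'a' it also moves to state 1. States 1..k advance on any
    # symbol, and state k+1 is the accepting state.
    def move(states, ch):
        nxt = {0}                           # the start state is always active
        for s in states:
            if s == 0 and ch == "a":
                nxt.add(1)                  # this 'a' may begin the counted tail
            elif 1 <= s <= k:
                nxt.add(s + 1)              # '.' consumes any symbol
        return frozenset(nxt)

    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for ch in alphabet:
            nxt = move(current, ch)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


# NFA size grows linearly (k + 2 states); DFA size doubles with every step of k.
growth = [(k, k + 2, dfa_state_count(k)) for k in range(0, 9)]
```

The NFA for this pattern has k + 2 states, whereas the subset construction reaches 2^(k+1) DFA states, because the DFA must remember which of the last k + 1 symbols were 'a'. Counting blocks that the prefix can overlap behave in the same way, which is consistent with the sizes reported for the regexes in Fig. 12.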


It is not easy to make a direct comparison with the other published regex matching methods. In general, the system cost and performance depend on the properties of the regexes. Different research teams used different pattern sets in their evaluations. We have seen in the previous section that the complexity of the regexes can vary substantially.

In general, extended FAs with annotated state variables and instructions [30,31], and hybrid FAs [33,36] cannot guarantee constant throughput. The processing speed depends on the pattern set and the input sequence. This is a disadvantage for signature-based IDS, where the system needs to scan the network traffic in real time. MX-NFA has the advantage that it can guarantee a constant processing speed of one character per cycle.

The processing architecture of LaFA [35] is very complex, in particular the variable string verification module. The pattern set that can be handled by the hardware is constrained by a number of design parameters, such as the size of the history buffer, the size of the time stamp table, the number of processing tracks, the number of repetition detection modules, etc. However, these constraints are not apparent. Moreover, the handling of regexes with very short simple strings and consecutive variable strings is not clearly explained. MX-NFA has the advantage that the processing logic is very simple, and the mapping of regexes to the hardware match units is straightforward.

A disadvantage of the NFA architecture of [32] is that a hardware detection unit can only handle one regex. In principle, multiple regexes can be merged into one NFA. However, the bit-vector processing architecture will only report whether the automaton has reached an output state, and it does not identify the matching regex. This is not sufficient for IDS, where the system needs to know which intrusion signature has been found. The other two disadvantages of [32] are (i) repetition of a character subclass can only be handled by unrolling, and (ii) the size of the bit-vector must be pre-determined in the hardware implementation. In comparison with [32], MX-NFA is more efficient in handling repetition of character subclasses, and is more flexible in resource sharing. Cascaded match units can be used to detect multiple regexes of different lengths. When mapping the regexes to the MX-NFA hardware, we only need to (i) delimit the regex by a null entry, and (ii) ensure that the output entries of two regexes will not be mapped to the same block (lower or upper half of a match unit).
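The simplicity of the per-character processing claimed above can be illustrated with a short software model of the basic match unit of Section 3.1. This is a sketch under stated assumptions rather than the authors' implementation: the names `Entry`, `build_ram` and `run` are hypothetical, the rule table for ab*c is made up for illustration and is not one of the paper's figures, and the model processes the first character with the E-bits equal to the I-bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One match entry: its symbol (a character subclass) and control flags."""
    symbol: frozenset      # accepted characters; the empty set is a null delimiter
    I: int = 0
    H: int = 0
    O: int = 0
    S2: int = 0
    S1: int = 0
    S0: int = 0


def build_ram(entries):
    """Model of the block RAM: bit i of word c is 1 if character c matches entry i."""
    return [sum(1 << i for i, e in enumerate(entries) if chr(c) in e.symbol)
            for c in range(256)]


def run(entries, data):
    """Process bytes one per cycle; return the 1-based cycles with a match output."""
    ram = build_ram(entries)
    n = len(entries)
    E = [e.I for e in entries]                  # E-bits after reset
    hits = []
    for cycle, byte in enumerate(data, start=1):
        word = ram[byte]                        # one memory read per character
        m = [(word >> i) & 1 for i in range(n)]
        fire = [E[i] & m[i] for i in range(n)]
        # Activation signals (a1, a2, a3, a4) produced by every entry.
        a = [(f, f & (e.S2 | e.S1), f & e.S2, f & e.S2 & e.S1)
             for f, e in zip(fire, entries)]
        nxt = []
        for i, e in enumerate(entries):
            p = 0
            for j in range(1, 5):               # input p_j comes from entry i - j
                if i - j >= 0:
                    p |= a[i - j][j - 1]
            nxt.append((E[i] & (e.H | (m[i] & e.S0))) | p)
        if any(f & e.O for f, e in zip(fire, entries)):
            hits.append(cycle)                  # M = E . m . O for some entry
        E = nxt
    return hits


# Hypothetical rule table for the unanchored pattern ab*c.
AB_STAR_C = [
    Entry(frozenset("a"), I=1, H=1, S1=1),      # S2S1 = 01: enables e2 and e3
    Entry(frozenset("b"), S0=1),                # closure: re-enables itself and e3
    Entry(frozenset("c"), O=1),                 # output entry
    Entry(frozenset()),                         # null entry used as delimiter
]
```

For example, `run(AB_STAR_C, b"xabbbc")` reports a match in cycle 6 and `run(AB_STAR_C, b"abxc")` reports none. Each character costs one table lookup followed by the same fixed set of Boolean operations, independent of the input history, which is the property behind the constant throughput argument.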

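Regex Rf, control flags of the match entries:

Entry | I | H | O | S2S1 | S0 | R | C
cm | -- | -- | -- | -- | -- | -- | --
e1 | 1 | 1 | 0 | 00 | 0 | -- | --
e2 | 1 | 0 | 0 | 00 | 0 | -- | --
e3 | 0 | 0 | 0 | 00 | 0 | -- | --
e4 | 0 | 0 | 0 | 00 | 0 | -- | --
e5 | 0 | 0 | 0 | 00 | 0 | -- | --
e6 | 0 | 0 | 0 | 00 | 0 | 0 | 0
e7 | 0 | 0 | 0 | 00 | 0 | 0 | 0
e8 | 0 | 0 | 0 | 00 | 0 | 0 | 0
e9 | 0 | 0 | 0 | 00 | 1 | 0 | 0
e10 | 0 | 0 | 0 | 00 | 0 | 0 | 1
e11 | 0 | 0 | 0 | 00 | 0 | 0 | 0
e12 | 0 | 0 | 1 | 00 | 0 | 1 | 0
e13 | 0 | 0 | 0 | 00 | 0 | 0 | 0

Regex Rg'', control flags of the match entries:

Entry | I | H | O | S2S1 | S0 | R | C
cm | -- | -- | -- | -- | -- | -- | --
e1 | 1 | 1 | 0 | 00 | 0 | 0 | 0
e2 | 1 | 0 | 0 | 00 | 0 | 0 | 0
e3 | 0 | 0 | 0 | 00 | 0 | 0 | 0
e4 | 0 | 0 | 0 | 00 | 1 | 0 | 0
e5 | 0 | 0 | 0 | 00 | 0 | 0 | 1
e6 | 0 | 0 | 0 | 00 | 0 | 0 | 0
e7 | 0 | 0 | 1 | 00 | 0 | 1 | 0
e8 | 0 | 0 | 0 | 00 | 0 | 0 | 0

Fig. 14. Setup of the MX-NFA for detecting Rf and Rg''.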


5. Conclusion and future work

Signature-based intrusion detection needs to detect hundreds of patterns at the same time. In general, more than half of the patterns are simple strings and the rest are regular expressions. A recent advanced network processor [25] is equipped with DFA-based string/regex matching engines to speed up deep packet inspection. However, detecting regexes using DFA may suffer from the state explosion problem, and the memory requirement can be prohibitive. Based on the evaluation results, we can see the advantage of MX-NFA over DFA-based methods in detecting typical Snort regexes that cause state explosion. To detect the 20 regexes listed in Fig. 12 with MX-NFA, we need to use fewer than 1800 match entries and 35 count modules. A match entry in MX-NFA consumes 256 bits of memory. Hence, the overall memory cost for detecting the 20 regexes of Fig. 12 with MX-NFA is about 450 Kbit. If pattern Rn of Fig. 12 is to be detected using D2FA, the automaton contains close to 80 K states and 440 K transition rules. A transition rule requires at least 42 bits (two 17-bit state IDs and one 8-bit character). Hence, the memory cost of the lookup table for Rn alone exceeds 18 Mbit.

The hardware implementation of MX-NFA is simple, and the mapping of regexes to the MX-NFA is straightforward. The number of match entries required is linearly proportional to the total length of the regexes. Since the MX-NFA offers constant throughput and allows dynamic updates, it can be used as a supporting unit to supplement DFA-based regex detection methods [17,23], and largely reduce the overall memory cost of the match engine.

Anti-virus systems are another security tool that requires high-speed content inspection. The size of a virus database is much larger than the ruleset of IDS. There are over 90 K patterns in the ClamAV [38] virus database, where close to 8 K patterns contain some regex features.
The design of a hardware accelerator to support the full set of patterns in the ClamAV virus database is a very challenging problem. Our previous work in [39,40] is limited to the detection of the 83 K simple strings. In our future work, we shall try to extend our method to detect the 8 K regex patterns in the ClamAV virus database, and the proposed MX-NFA architecture will be used as one of the building blocks in the overall system.

Acknowledgements

This work was supported by a grant from the Research Grant Council of the HKSAR, China (Project No. CityU 119809). The authors would like to thank Mr. Yuchen Yang for his contributions in the development of a computer program for generating the data structures required by MX-NFA.

References

[1] J.M. Estevez-Tapiador, P. Garcia-Teodoro, J.E. Diaz-Verdejo, Anomaly detection methods in wired networks: a survey and taxonomy, Computer Communications 27 (2004) 1560–1584.
[2] M. Tavallaee, N. Stakhanova, A.A. Ghorbani, Toward credible evaluation of anomaly-based intrusion-detection methods, IEEE Transaction on Systems, Man, and Cybernetics – Part C: Applications and Reviews 40 (2010) 516–524.
[3] Snort intrusion detection system, http://www.snort.org.
[4] S. Dharmapurikar, P. Krishnamurthy, T.S. Sproull, J.W. Lockwood, Deep packet inspection using parallel bloom filters, IEEE Micro 24 (2004) 44–51.
[5] L. Tan, B. Brotherton, T. Sherwood, Bit-split string-matching engines for intrusion detection and prevention, ACM Transaction on Architecture and Code Optimization 3 (2006) 3–34.
[6] J.T.L. Ho, G.G.F. Lemieux, PERG: a scalable FPGA-based pattern-matching engine with consolidated bloomier filters, in: International Conference on Field-Programmable Technology, 2008, pp. 73–80.
[7] T. Song, W. Zhang, D. Wang, Y. Xue, A memory efficient multiple pattern matching architecture for network security, IEEE INFOCOM (2008) 166–170.


[8] N. Hua, H. Song, T.V. Lakshman, Variable-stride multi-pattern matching for scalable deep packet inspection, IEEE INFOCOM (2009) 415–423.
[9] D. Pao, W. Lin, B. Liu, A memory-efficient pipelined implementation of the Aho–Corasick string-matching algorithm, ACM Transaction on Architecture and Code Optimization 7 (10) (2010).
[10] A.V. Aho, J.E. Hopcroft, J.D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Co., 1974.
[11] Z.K. Baker, V.K. Prasanna, Automatic synthesis of efficient intrusion detection systems on FPGAs, IEEE Transaction on Dependable and Secure Computing 3 (2006) 289–300.
[12] C.H. Lin, C.T. Huang, C.P. Jiang, S.C. Chang, Optimization of pattern matching circuits for regular expression on FPGA, IEEE Transaction on VLSI System 15 (2007) 1303–1310.
[13] I. Sourdis, J. Bispo, J.M.P. Cardoso, S. Vassiliadis, Regular expression matching in reconfigurable hardware, Journal of Signal Processing Systems 51 (2008) 99–121.
[14] I. Sourdis, D.N. Pnevmatikatos, S. Vassiliadis, Scalable multigigabit pattern matching for packet inspection, IEEE Transactions on VLSI System 16 (2008) 156–166.
[15] Y.H.E. Yang, W. Jiang, V.K. Prasanna, Compact architecture for high-throughput regular expression matching on FPGA, in: ACM/IEEE ANCS, 2008, pp. 30–39.
[16] J. Hopcroft, An nlogn algorithm for minimizing states in a finite automaton, Technical Report, Stanford University, STAN-CS-71-190, 1971.
[17] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, J. Turner, Algorithms to accelerate multiple regular expressions matching for deep packet inspection, ACM SIGCOMM Computer Communication Review 36 (2006) 339–350.
[18] M. Becchi, S. Cadambi, Memory-efficient regular expression search using state merging, in: IEEE INFOCOM, 2007, pp. 1064–1072.
[19] F. Yu, Z. Chen, Y. Diao, T. Lakshman, R.H. Katz, Fast and memory-efficient regular expression matching for deep packet inspection, in: ACM/IEEE ANCS, 2006, pp. 93–102.
[20] J. Rohrer, K. Atasu, J. van Lunteren, C. Hagleitner, Memory-efficient distribution of regular expressions for fast deep packet inspection, in: IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis, 2009, pp. 147–154.
[21] M. Becchi, P. Crowley, An improved algorithm to accelerate regular expression evaluation, in: ACM/IEEE ANCS, 2007, pp. 145–154.
[22] B.C. Brodie, D.E. Taylor, R.K. Cytron, A scalable architecture for high-throughput regular-expression pattern matching, SIGARCH Computer Architecture News 34 (2006) 191–202.
[23] J. van Lunteren, High-performance pattern-matching for intrusion detection, in: IEEE INFOCOM, 2006, pp. 1–13.
[24] J. van Lunteren, A. Guanella, Hardware-accelerated regular expression matching at multiple tens of Gb/s, in: IEEE INFOCOM, 2012, pp. 1737–1745.
[25] J.D. Brown, S. Woodward, B.M. Bass, C.L. Johnson, IBM power edge of network processor: a wire-speed system on a chip, IEEE Micro 31 (2011) 76–85.
[26] S. Kumar, J. Turner, J. Williams, Advanced algorithms for fast and scalable deep packet inspection, in: ACM/IEEE ANCS, 2006, pp. 81–92.
[27] D. Ficara, S. Giordano, G. Provissi, F. Vitucci, G. Antichi, A. Di Pietro, An improved DFA for fast regular expression matching, SIGCOMM Computer Communication Review 38 (2008) 29–40.
[28] T. Liu, Y. Yang, Y. Liu, Y. Sun, Li Guo, An efficient regular expressions compression algorithm from a new perspective, in: IEEE INFOCOM, 2011.
[29] S. Kumar, B. Chandrasekaran, J. Turner, G. Varghese, Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia, in: ACM/IEEE ANCS, 2007, pp. 155–164.
[30] R. Smith, C. Estan, S. Jha, XFA: faster signature matching with extended automaton, in: IEEE Symposium on Security and Privacy, 2008, pp. 187–201.
[31] R. Smith, C. Estan, S. Jha, S. Kong, Deflating the big bang: fast and scalable deep packet inspection with extended finite automata, ACM SIGCOMM Computer Communication Review 38 (2008) 207–218.
[32] T.H. Lee, Hardware architecture for high-performance regular expression matching, IEEE Transaction on Computers 58 (2009) 984–993.
[33] M. Becchi, P. Crowley, A hybrid finite automaton for practical deep packet inspection, in: ACM Conference on Emerging Network Experiment and Technology (CoNEXT), 2007, pp. 1–12.
[34] M. Becchi, P. Crowley, Extending finite automata to efficiently match Perl-compatible regular expressions, in: ACM Conference on Emerging Network Experiment and Technology (CoNEXT), 2008, pp. 1–12.
[35] M. Bando, N.S. Artan, H.J. Chao, Scalable lookahead regular expression detection system for deep packet inspection, IEEE/ACM Transaction on Networking 20 (2012) 699–714.
[36] Y.-H.E. Yang, V.K. Prasanna, Space-time tradeoff in regular expression matching with semi-deterministic finite automata, in: IEEE INFOCOM, 2011.
[37] M. Becchi, Regular Expression Processor, http://regex.wustl.edu/index.php/Main_Page.
[38] ClamAV anti-virus system, http://www.clamav.net.
[39] D. Pao, X. Wang, X. Wang, C. Cao, Y. Zhu, String searching engine for virus scanning, IEEE Transaction on Computers 60 (2011) 1596–1609.
[40] D. Pao, X. Wang, Multi-stride string searching for high-speed content inspection, The Computer Journal 55 (2012) 1216–1231.
