An Evaluation of FPGA-based IDS Pattern

5 downloads 0 Views 106KB Size Report
the string “ABCA” we have to remember the matching of character A 3 cycles earlier, the matching of B two cycles earlier, etc, until the final character is matched ...
An Evaluation of FPGA-based IDS Pattern Matching Techniques Ioannis Sourdis† , Dionisios Pnevmatikatos‡ , and Stamatis Vassiliadis† †Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands {Sourdis,Stamatis}@CE.ET.TUDelft.NL Tel: +31-15-27-89656

Abstract Pattern matching is one of the most computationally intensive tasks in network security systems. Numerous pattern matching approaches have been proposed in the past. The most common ones use: regular expressions, discrete comparators or CAM, Pre-decoding, and Hashing to match patterns. The researchers’ first concern was to achieve high operating throughput in order to process incoming packets in wire rates. Since the set of matching patterns increases rapidly, though, pattern matching designers started considering also the area cost of their designs. In this paper, we attempt an evaluation of FPGA-based pattern matching techniques for network security systems. We measure the efficiency of pattern matching modules in terms of obtained performance per area cost. Keywords: pattern matching, FPGA, Intrusion Detection Systems, network security

1 Introduction The area of intrusion detection systems is very active recently. Deep packet inspection is performed by intrusion detection systems (IDS) to provide sufficient protection from attacks[10]. Such systems check the packet header, rely on pattern matching techniques to analyze the packet payload, and make decisions on the significance of the packet body. Matching every incoming byte, though, against thousands of pattern

‡Microprocessor and Hardware Laboratory, Electronic and Computer Engineering Dept., Technical University of Crete, Chania, Greece [email protected] Fax: +31-15-27-84898

characters at wire rates is a computationally intensive task. Measurements on Snort IDS show that 80% of total processing is spent on string matching in the case of Web-intensive traffic[11]. In the past, numerous hardware units have been proposed for FPGA-based IDS pattern matching that can match thousands of patterns in parrallel (tens of thousand characters in total)[13, 18, 6, 12, 7, 16, 14, 2, 8, 4, 19, 3, 15, 9, 4, 20]. Generally speaking, the performance of FPGA-based systems is promising and shows that FPGAs can support the increasing needs for network security. Utilizing regular expressions, discrete comparators, CAM and hashing are some of the most common techniques for IDS pattern matching. We evaluate different IDS pattern matching approaches, including our solutions [19, 18, 20], in terms of performance and area cost and analyze their efficiency and the trade-offs that occur. The remainder of the paper is organized as follows: In Section 2 we describe different pattern matching methods. In Section 3, we present detailed implementation results of the best published pattern matching solutions and compare them using a performance efficiency metric. Finally, in Section 4 we present our conclusions.

2 IDS Pattern Matching Methods In this Section we describe different pattern matching techniques for intrusion detection implemented in FPGAs. We discuss the characteristics, advantages

449

and disadvantages of each approach. All proposed IDS pattern matching techniques have two objectives: (i) high operating throughput and (ii) low area cost.

8

output f

8

C

8

B

8

A

e

c enable a

b

Figure 1. Hardware NFA implementation of the following regular expression, which contains wild cards: f ((ab)|(c(∗.1 )e))∗.

Regular Expressions: Until 2004, one of the most common approaches was the regular expressions matching using Finite Automata, Non-deterministic (NFAs) or Deterministic (DFAs), which results in designs with low cost, but at a modest throughput [12, 7, 16, 14]. The basic idea here is to generate regular expressions for every pattern or group of patterns, and implement them with N/DFA (Figure 1). The use of parallelism (processing multiple bytes or characters per cycle) is in general difficult in finiteautomata implementations that are built with the implicit assumption that the input is checked one byte at a time. One proposed solution to this problem is the usage of packet-level parallelism where multiple pattern matching subsystems operating in parallel can process more than one packets[14]. Finally, finite automata are usually restricted in their operating frequency by the amount of combinational logic utilized for state transitions. In many cases the equations are complex, resulting in multilevel implementations even with FPGA 4-to-1 LUTs. Discrete Comparatos/CAM: A more brute-force approach is the use of CAM or discrete comparators [13, 18, 6](Figure 2). In this case, it is easier to achieve higher operating frequency by utilizing finegrain pipelining, while it is relatively straightforward to increase throughput by exploiting parallelism. On the other hand, sharing logic (i.e. for patterns that have common substrings) is more difficult. Therefore, simple discrete comparators and CAM designs usually suffer from high area cost. Pre-decoding: A more successful technique to in-

D

Match “ABCD”

Figure 2. Discrete Comparator that matches pattern "ABCD".

Decoder 8 Incoming Packets 8

8

8

=C

Match “ABCA”

=B =A

SRL16

Figure 3. Details of Pre-decoded CAM matching: four comparators provide the equality signals for characters A, B, and C. To match the string “ABCA” we have to remember the matching of character A 3 cycles earlier, the matching of B two cycles earlier, etc, until the final character is matched in the current cycle. This is achieved with the shift registers of width 3, 2, ... at the proper match lines.

crease sharing of character comparators and reduce design cost is to use pre-decoding that is applicable to both regular expression and CAM-like approaches [2, 8, 4, 19, 3]. The main idea is that incoming characters are pre-decoded resulting in each unique character being represented by a single wire (figure 3). In this way, an N -character comparator is reduced to an N input AND gate. Hashing: Both regular expressions and CAM-like approaches match each pattern separately, trying to share (if possible) common substrings between patterns. Instead of matching each pattern separately, it is more efficient to utilize a hash module to deter-

450

lowing equation:

Shift Register

Incoming Data

Match

Hash Tree Pattern ID Indirection Memory

Comparator Pattern ID Length

Address

Pattern Memory

P EM =

P erf ormance = Area Cost

T hroughput bytes Logic Cells + M EM 12 Character

(1)

Figure 4. Block diagram of hashing pattern matching approach. Incoming data are hashed to access the memory that contains the matching patterns. A subsequent simple comparison between incoming data and memory output determines the match.

mine which pattern is a possible match, read this pattern from a memory and compare it against the incoming data [15, 9, 4, 20]. Hardware hashing for pattern matching is a technique widely used for decades. Figure 4 depicts a Hash method for pattern matching[20]. The incoming packet data are shifted into a serial-in parallel-out shift register. The parallel-out lines of the shift register provide input to the comparator, which is also fed by the memory that stores the patterns. A selected subset of the incoming data bits are used as inputs to a hash module, which outputs the ID of the “possible match” pattern. The pattern ID is utilized to read the pattern. A subsequent comparison between the memory output and the incoming data determines whether there is a match.

3 Evaluation In this section we evaluate and compare the different FPGA-based IDS pattern matching techniques. We evaluate them by measuring their performance and area cost. We measure designs performance in terms of operating throughput (Gbps), and their area cost in terms of number of logic cells required for each matching character. For designs that require memory we measure the memory area cost based on the fact that 12 bytes of memory occupy area equal to a logic cell [21]. In order to evaluate and compare the efficiency of each solution, we also utilize the Performance Efficiency Metric (PEM), which takes into account both performance and area cost, and it is given by the fol-

Table 1 compares different FPGA-based pattern matching techniques. Regular Expression designs might require relatively low area, however, they do not achieve high throughput. More specifically, they require at least 2.5 logic cells per matching character and can process 0.4-1.2 Gbps of incoming data. The low area cost -compared to the discrete comparator solution- is due to their ability to share common parts of different pattern, when more than one patterns are included in the same regular expression. Their modest throughput is due to their complexity and the fact that these designs are relatively difficult to pipeline. On the other hand, discrete comparators’ solutions have higher area cost, since there is no sharing between separate patterns, but achieve much higher throughput mostly because it is easy to use pipeline and exploit parallelism. They can achieve 2-10 Gbps throughput requiring 10-20 LC/char. Generally speaking, discrete comparators’ designs have better efficiency than regular expressions. As discussed, a significant improvement in reducing the area cost of pattern matching designs is the use of pre-decoding. Pre-decoded CAMs and NFAs designs use centralized comparators (decoders) to match each character individually. Therefore each pattern character is matched only once and its result (character match signal) is shared among the pattern matchers. Consequently, the area cost is much lower (0.3-3.5 LC/char) and the performance remains high (up to 10 Gbps), since the designs become simpler. The efficiency of designs with pre-decoding have at least 10× better efficiency compared to discrete comparators and regular expression solutions. Finally, in the case of hashing, the area cost depends primarily on the hash function and the number of matching patterns. The resulting designs have even lower area cost, requiring 0.1-0.9 LC/char (including memory area) and high operating throughput (0.5-5 Gbps). The efficiency of pattern matching designs that use hashing is up to 60% better than that of designs using

451

Table 1. Comparison of different FPGA-based pattern matching approaches.

Description

Sourdis et al.[20]Hashing

Input bits /cycle 16

8 8 Papadopoulos et al.[15]Hashing 16 Attig et al.[1]Hashing 8 Cho et al.[5]Hashing 8 Sourdis[17]PreD-CAM 8 Sourdis et al.[19]PreD-CAM 32 Cho et al.[4]PreD-CAM 8 Baker et al.[2]PreD-CAM 8 Baker et al.[3]PreD-CAM 8 Clark et al.[8]PreD-NFAs 32 Sourdis et al.[18]Dis.Comp 32 Gokhale et al.[13]Dis.Comp 32 Cho et al.[6] Dis.Comp 32 Sidhu et al.[16]NFAs 8 Franklin et al.[12]NFAs 8 Moscola et al.[14]DFAs 32

ThrouLogic Eq. LCMEM ghput #chars PEM Cells1 /char2 Kbits (Gbps) 4.167 10,224 0.64 306 6.46 Vitex2-1500 5.734 12,106 0.89 612 20,911 6.44 Virtex2-1000 2.108 6,272 0.44 288 4.72 Virtex2-1000 2.000 2,570 0.50 630 4.00 18,636 Virtex2-3000 3.712 5,230 0.96 1,188 3.87 VirtexE-2000 0.502 36,720 0.10 629 420k3 5.05 Spartan3-1000 1.900 >8,0004 >0.475 162 20,800

Suggest Documents