Accelerating String Matching Using Multi-threaded ... - Google Sites
Recommend Documents
AbstractâNetwork Intrusion Detection System has been widely used to protect ... malware. The string matching engine us
processor are too slow for today's networking. ⢠Hardware approaches for .... less complexity and memory usage compare
scalable high performance, independent of the input stream or the pattern set being analyzed by .... languages like Java, C, C++, Python and Haskell. We used a.
Jun 14, 2006 - means by definition that P [j] = i. If any of ..... with realistic real world data. .... Parameterized du
website or repository, are prohibited. ... We give simple and practical algorithms for finding all such pattern ..... compiled them with Borland C++ Builder 6. We per ...
SCICON Consultancy International Ltmited, Sanderson House, 49 Berners Street, London WIP 4AQ,. England. GEOFF R. DOWLING. Department of Computer ...
algorithm is developed to equally divide the attack signatures .... Rule options contain extra information for matching packets ..... E. Node level parallelism.
fast : search for the last (rightmost) character of x ;. 2 .... i = z yj] thus xi = yj; start = j ? i is the possible starting position of an occurrence of x in y; wall is the ...
AbstractâA string dictionary is a data structure for storing a set of strings that maps them ..... been proposed [9],
Joseph Daniel Pehoushek z. Abstract ... The two most famous are the Knuth-Morris-Pratt algorithm 3] and ... Using the shift tables of Knuth-Morris-Pratt algo-.
times faster with significant improvement on memory efficiency. Furthermore, because the ... become inadequate for the h
They are critical net-work security tools that help protect high-speed computer ... Most hardware-based solutions for hi
One of the largest areas deals with speech recognition, where the ... wireless networks, as the air is a low qual- .....
Linked data from various profiles was extracted using LinkedIn API [5]. Google Forms were used to create forms to be filled by professionals of various industries.
GPUs one step closer to wider use for general purpose computations. The use of a GPU instead of a CPU to perform general purpose computations is known as ...
shown to be around 28 s, for patterns 16 character long, achieving a dramatic speed-up over classical FPGA computations. 2.2 Self-reconfiguration. The main ...
ini selari dengan pertambahan jumlah aplikasi web dan perisian berdiri sendiri pada ... Terdapat banyak teknik pengesanan klon telah dihasilkan pada hari ini.
exact multiple string matching algorithm, which benefits from Intel's. SSE (streaming SIMD extensions) technology for searching long strings. Our experimental ...
in the English King James version of the Holy Bible (as provided by [1]). BibleLines: The .... 1. Project Gutenberg official home site. http://www.gutenberg.net/.
career path to a graduate to reach his/her dream career given the current ... information from the data. Further, this was used as .... Technology. > Doctoral, Data ...
Jul 28, 2011 - Non-parametric change-point detection using string matching ..... 0.25. 0.3. 0.35. 0.4 location of argmin Ï(j) / n frequency γ. 0.42. 0.44. 0.46.
Oct 13, 2009 - Accelerating Pattern Matching for DPI. Jun Mu, Sakir ... conventional hardware accelerating methods, performance .... Assume that there are m keys to be inserted into a ... first RAM block can be expressed using equation(1).
(2) The solid-state sensors are increasingly used, which capture only a portion ... file is small. We have ... the ridge
where the end point of the alignment maybe be unknown. How- ever, it needs to know where the two matching sequences star
Accelerating String Matching Using Multi-threaded ... - Google Sites
Accelerating String Matching Using Multi-threaded Algorithm on GPU Cheng-Hung Lin*, Sheng-Yu Tsai**, Chen-Hsiung Liu**, Shih-Chieh Chang**, Jyuo-Min Shyu** *National Taiwan Normal University, Taiwan **National Tsing Hua University , Taiwan
Introduction • Network Intrusions Detection System (NIDS) has been widely used to detect network attacks. • The pattern matching engine dominates the performance of an NIDS. • Traditional pattern matching approaches on uniprocessor are too slow for today’s networking. • Hardware approaches for acceleration pattern matching. – Logic-based – Memory-based – Multiprocessor-based 2
GPU for Pattern Matching • Parallel computation on GPU is suitable for accelerating pattern matching.
AAAAAAAAAAAAAAAAAAAAAAAB
AAAAAAAAAAAAAAAAAAAAAAAB Thread #1 Thread #2
Thread #3 Thread #4
1 thread 24 cycles
4 segments 4 threads 6 cycles 3
Boundary Problem • Boundary Problem – Pattern occurring in the boundary of adjacent segments cannot be detected. – False negative results False Negative
AAAAAAAAAAAAAAAAAABBBBBB Thread #1 Thread #2
Thread #3 Thread #4 4
Overlapped Computation • To resolve boundary problem – Scan across boundaries
• Problem – Overhead of overlapped computation – Throughput reduction Thread #3 can identify "AB" Thread #1
Aho-Corasick Algorithm • Aho-Corasick (AC) algorithm has been widely used for pattern matching due to its advantage of matching multiple patterns in a single pass – Compiling multiple patterns into a composite state machine
B
Patterns (1) AB (2) ABG (3) BEDE (4) EF
2
1
A
G 3
[^ABE] B
0
4
E 8
E
F
E
D 5
6
7
9 6
Aho-Corasick Algorithm (cont.) • Aho-Corasick (AC) state machine composes of – Solid line represents valid transitions. – Dotted line represents failure transitions.
• Failure transition backtracks the state machine to recognize patterns in different start locations. B 1
A
G 2
3
[^ABE]
Input strings : A B E D E
B
0
4
E 8
location 1
E
F
E
D 5
6
7
9
location 2 7
Problems of AC on GPU • Direct implementation of AC on GPU – To resolve the boundary problem, each thread has low bound constraint of scanning length • Constraint = segment length + overlapped length • Overlapped length = the length of longest pattern -1
– Overhead of overlapped computation AAAAAAAAAAAAAAAAAABBBBBB
8
Problems of AC on GPU (cont.)
9
Failureless-AC State Machine • AC state machine – Failure transition backtracks the state machine to recognize patterns in different start locations. B
1
A
2
3
[^ABE]
Input strings :
4
E
location 2
E
D
E
B
0
A B E D E location 1
G
6
5
7
F 8
9
• Failureless-AC state machine – Remove failure transition – Terminated when no valid transitions – Recognize patterns in location 1. Input strings : Location 1
A B E D E
0
Stop B 1
A B
4
E 8
G 2
E
F
3 E
D 5
9
6
7
10
Parallel Failureless-AC Algorithm • Parallel Failureless-AC (PFAC) Algorithm – Allocate each byte of input a thread to traverse Failureless-AC state machine.
XXXXXXXXXABEDEXXXXXXXXXX
11
Mechanism of PFAC Thread #n
…XXXXABEDEXXX… Thread #n+1 1
A B 0
B
2
E 4
E 8
G D
5 F
3 E 6
1
A 7
B 0
Thread #n
2
E 4
E 9
B
8
3 E
D 5
F
G
6
7
9
Thread #n+1
12
Reducing Overlapped Computation • Direct Implementation of AC Algorithm – Each thread has low bound constraint of scanning length – Overlapped computation (overlapped length = 3)
• PFAC Algorithm – – – –
Without boundary problem. Each thread has variable scanning length Most thread terminates early Reducing overlapped computation to 1 1 3 …CCCCCCCCBCCCCCC… 13
Experimental Environments • CPU: Intel® Core™ i7 CPU 950 @3.07 GHz – 4 cores – 12 GB DDR3 memory • GPU: NVIDIA ® GeForce ® GTX 480 @ 1.4 GHz – 480 cores – 1536MB DDR5 memory • Patterns: String pattern of Snort V2.4 – 1,998 rules containing 41,997 characters – Total 27,754 states • Input: Normal and worst case – DEFCON packet 14
Experimental Results Table 1: Throughput of normal case inputs
Memory efficiency= (Throughput x # of characters) / Memory 19
Conclusions • We have proposed a novel parallel string matching algorithm which is well-suited to be performed on GPUs and is free from the boundary detection problem. • The proposed algorithm creates a new state machine which has less complexity and memory usage compared to the traditional Aho-Corasick state machine. • The new algorithm achieves a significant speedup compared to the traditional Aho-Corasick algorithm accelerated by OpenMP on CPU. • Compared to other GPU approaches, the new algorithm achieves 11.6 times faster than the state-of-the-art approach. 20