Work-in-Progress: A Fast Online Sequential Learning Accelerator for IoT Network Intrusion Detection
Hantao Huang, Rai Suleman Khalid, Wenye Liu and Hao Yu
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
ABSTRACT
Deployment of IoT devices for smart buildings and homes offers a high level of comfort with increased energy efficiency, but it can also introduce potential cyber-attacks such as network intrusions via linked IoT devices. Because securing an IoT network demands low power and low latency, traditional software-based security systems are not applicable. Instead, embedded hardware-accelerated data analytics is preferred for network intrusion detection. In this paper, we propose an online sequential machine learning hardware accelerator to perform real-time network intrusion detection. A learning algorithm based on a single-hidden-layer feedforward neural network is developed with a least-squares solver realized in hardware. Experimental results on a single FPGA achieve a bandwidth of 409.6 Gbps with fast yet low-power network intrusion detection on a number of benchmarks.

This work is sponsored by grants from Singapore MOE Tier-2 (MOE2015-T2-2013), NRF-ENIC-SERTD-SMES-NTUJTCI3C-2016 (WP4) and NRF-ENIC-SERTD-SMES-NTUJTCI3C-2016 (WP5).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CODES/ISSS ’17 Companion, October 15–20, 2017, Seoul, Republic of Korea
© 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery. ACM ISBN 978-1-4503-5185-0/17/10...$15.00
https://doi.org/10.1145/3125502.3125532

1 INTRODUCTION
Deployment of IoT devices for smart buildings and homes will make life more comfortable by linking physical data collection to the cyber network for data analytics. It also, however, opens the door to cyber-attacks. As such, IoT-based smart buildings and homes become typical cyber-physical systems that need security protection. For example, smart meters can be hacked to compromise occupants' privacy or to steal utility bills. A network intrusion detection system (NIDS) is one important mechanism for protecting a network from malicious activities in real time. IoT networks therefore need a low-power, low-latency intrusion detection accelerator. In this work, we present a fast machine learning hardware accelerator for IoT NIDS. A computationally efficient single-hidden-layer feedforward neural network model is developed using a least-squares solver, which makes online sequential learning feasible for IoT NIDS. Accordingly, a scalable and parameterized hardware realization is developed on FPGA with 128 parallel processing elements (PEs) operating at 50 MHz and consuming 0.85 W. Experimental results on the ISCX-2012 benchmark show that the proposed FPGA accelerator achieves good detection accuracy and a bandwidth of 409.6 Gbps, with training speed-ups of 4.5× and 77.4× over a general-purpose CPU and an embedded CPU, respectively. Energy savings of around 113.83× and 57.75×, respectively, are also observed.

Figure 1: Embedded software and hardware co-design for IoT network intrusion detection with hardware architecture. [Figure omitted. Recoverable labels: IoT network traffic (NIC or pcap file); packet collecting, packet filtering and feature extraction on embedded systems (Beagleboard xM); feature vectors; detection accelerator and online sequential learning accelerator (Virtex-7 FPGA); model updates; IoT system alert. Accelerator blocks: input mux, LFSR, input weight, PE, PreH memory, sigmoid function look-up table, activation matrix H, Fp-to-flp conversion, vector core, scalar core, shared intermediate results, output weight.]
2 IOT NETWORK INTRUSION DETECTION SYSTEM
2.1 NIDS Architecture for IoT System
The goal of a network intrusion detection system is to differentiate anomalous network activity from normal network traffic in a timely fashion. The major challenge is that the patterns of attack signatures change over time, so the NIDS has to be updated to handle these changes [1]. Therefore, we propose a hardware/software co-designed machine learning accelerator, as shown in Fig. 1. We partition the system so that feature extraction runs in software and online sequential learning runs in hardware. This partition is motivated by the variety of communication protocols in IoT systems and by the high complexity of machine learning. Details of the sequential learning accelerator are discussed in the next section.
2.2 Online Sequential Learning Algorithm
Sequential learning is the process of adjusting the trained model to minimize the accuracy loss on new incoming data. This is especially necessary for NIDS since network traffic changes over time. We first build the relationship between the hidden-layer output H and the input training data X as

$$\mathrm{preH} = XA + B, \qquad H = \frac{1}{1 + e^{-\mathrm{preH}}} \tag{1}$$
Table 1: Intrusion detection accuracy compared with other algorithms on the NSL-KDD dataset

| Class   | Proposed w/ SL† | Proposed w/o SL | SVM [6] | MLP [6] | Naive Bayes [6] |
|---------|-----------------|-----------------|---------|---------|-----------------|
| Normal  | 97.01%          | 95.82%          | 92.8%   | 93%     | 85.8%           |
| DoS     | 64.03%          | 76.21%          | 74.7%   | 71.2%   | 69.4%           |
| Probe   | 79.51%          | 72.61%          | 71.6%   | 60.1%   | 32.8%           |
| R2L     | 8.19%           | 0.25%           | 12.3%   | 0.001%  | 0.095%          |
| Overall | 76.04%          | 75.15%          | 74.55%  | 70.6%   | 70.5%           |

† SL denotes sequential learning. The SVM, MLP and Naive Bayes results use the built-in models of [6].
In (1), $A \in \mathbb{R}^{n \times L}$ and $B \in \mathbb{R}^{N \times L}$ are the randomly generated input weights and biases, whose entries $a_{ij}$ and $b_{ij}$ lie in $[-1, 1]$. $N$, $n$ and $L$ are the number of training samples, the feature dimension and the number of hidden nodes, respectively. The training process is designed to find the output weight $\Gamma$ that minimizes

$$\min_{\Gamma} \; \|H\Gamma - T\|^2 + \lambda \|\Gamma\|^2 \tag{2}$$

where $H$ is the hidden-layer output matrix generated by the sigmoid activation function, and $\lambda$ is a user-defined parameter that balances the training error against the magnitude of the output weights. [2] shows that this training method does not affect the overall accuracy. The output weight $\Gamma$ is computed by solving a least-squares problem:

$$\Gamma = (\tilde{H}^T \tilde{H})^{-1} \tilde{H}^T \tilde{T}, \qquad \tilde{H} = \begin{bmatrix} H \\ \sqrt{\lambda}\, I \end{bmatrix}, \qquad \tilde{T} = \begin{bmatrix} T \\ 0 \end{bmatrix} \tag{3}$$

where $\tilde{H} \in \mathbb{R}^{(N+L) \times L}$, $\tilde{T} \in \mathbb{R}^{(N+L) \times M}$, $I \in \mathbb{R}^{L \times L}$ and $M$ is the number of classes. The complexity of solving for the output weight is reduced by the square-root-free Cholesky decomposition and incremental least-squares solutions.

Our training data set $X$ and labels $T$ can be updated to adapt to new network traffic patterns. The sequential learning process consists of an initialization phase and a sequential learning phase. In the initialization phase, the output weight $\Gamma_0$ is calculated by (3). In the sequential learning phase, new training data arrive one by one. When the $(k+1)$th training sample arrives, the output weight $\Gamma_{k+1}$ is calculated as [3]:

$$\begin{aligned}
P_{k+1} &= P_k - P_k H_{(k+1)}^T \left(I + H_{(k+1)} P_k H_{(k+1)}^T\right)^{-1} H_{(k+1)} P_k \\
\Gamma_{k+1} &= \Gamma_k + P_{k+1} H_{(k+1)}^T \left(Y_{k+1} - H_{(k+1)} \Gamma_k\right)
\end{aligned} \tag{4}$$

where $P_0 = (\tilde{H}_{(k=0)}^T \tilde{H}_{(k=0)})^{-1}$ and the other matrix inversions can be calculated using the incremental least-squares solver. $H_{(k+1)}$ denotes the newly generated activation output and $Y_{k+1}$ the corresponding target. A folded architecture with parallel and pipelined multiplication is proposed, as shown in Fig. 1.
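As a rough illustration of Eqs. (1) to (4), the following pure-Python sketch processes training samples one at a time. It is not the authors' hardware implementation. It assumes a single bias row `b` shared by all samples, one-hot targets, and an empty initial batch, so that P_0 = (λI)^-1 and Γ_0 = 0 (the case of (3) in which H has no rows yet). With one sample per step, the term I + H P H^T in (4) is a scalar, so the update needs no matrix inversion. All function names are hypothetical.

```python
import math
import random


def random_input_layer(n, L, rng):
    """Draw input weights A (n x L) and a bias row b (length L) uniformly from [-1, 1]."""
    A = [[rng.uniform(-1.0, 1.0) for _ in range(L)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    return A, b


def hidden_row(x, A, b):
    """Eq. (1) for one sample: h = sigmoid(x A + b)."""
    pre = [sum(x[i] * A[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return [1.0 / (1.0 + math.exp(-p)) for p in pre]


def init_state(L, M, lam):
    """Assumed start with no initial batch: P_0 = (lam * I)^-1 and Gamma_0 = 0."""
    P = [[1.0 / lam if i == j else 0.0 for j in range(L)] for i in range(L)]
    Gamma = [[0.0] * M for _ in range(L)]
    return P, Gamma


def sequential_update(P, Gamma, h, y):
    """Eq. (4) for one new sample with hidden row h (length L) and target y (length M)."""
    L, M = len(h), len(y)
    Ph = [sum(P[i][j] * h[j] for j in range(L)) for i in range(L)]  # P_k h^T
    denom = 1.0 + sum(h[i] * Ph[i] for i in range(L))               # I + h P_k h^T (scalar)
    # P stays symmetric, so h P_k is the transpose of P_k h^T.
    P_new = [[P[i][j] - Ph[i] * Ph[j] / denom for j in range(L)] for i in range(L)]
    P_new_h = [sum(P_new[i][j] * h[j] for j in range(L)) for i in range(L)]
    err = [y[m] - sum(h[i] * Gamma[i][m] for i in range(L)) for m in range(M)]
    Gamma_new = [[Gamma[i][m] + P_new_h[i] * err[m] for m in range(M)] for i in range(L)]
    return P_new, Gamma_new


def train_stream(X, T, A, b, lam):
    """Feed samples one by one and return the final output weight Gamma (L x M)."""
    P, Gamma = init_state(len(b), len(T[0]), lam)
    for x, t in zip(X, T):
        P, Gamma = sequential_update(P, Gamma, hidden_row(x, A, b), t)
    return Gamma


def predict(x, A, b, Gamma):
    """Return the index of the class with the largest output score."""
    h = hidden_row(x, A, b)
    scores = [sum(h[i] * Gamma[i][m] for i in range(len(h))) for m in range(len(Gamma[0]))]
    return scores.index(max(scores))


if __name__ == "__main__":
    rng = random.Random(0)
    A, b = random_input_layer(n=3, L=8, rng=rng)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
    T = [[1.0, 0.0] if x[0] + x[1] > 0 else [0.0, 1.0] for x in X]  # toy two-class labels
    Gamma = train_stream(X, T, A, b, lam=0.01)
    hits = sum(predict(x, A, b, Gamma) == t.index(1.0) for x, t in zip(X, T))
    print(f"training accuracy on the toy stream: {hits / len(X):.2f}")
```

Under the stated initialization, the weight obtained after the last sample satisfies the normal equations (H^T H + λI) Γ = H^T T, which is the solution of (3) on the same data. Starting from a non-empty initial batch, as the paper describes, would instead require one L×L solve for P_0.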
3 EXPERIMENT RESULTS
To verify our proposed architecture, we have implemented it on a Xilinx Virtex-7 FPGA. The HDL code is synthesized using Synplify, and the maximum operating frequency of the system is 54.1 MHz with 128 parallel PEs. Results on our accelerator are denoted as hardware-accelerated NIDS. We develop two baselines for comparison: Software-NIDS on a general-purpose CPU and Embedded-NIDS on an embedded CPU. To test the proposed system, we use two benchmarks, ISCX-2012 [4] and NSL-KDD [5], which cover the attacks considered for the IoT system, such as DoS, Probe and R2L. We do not consider the U2R attack in the NSL-KDD dataset, since it is not expected in the home area network (HAN) and its number of samples is very small. Fig. 2 shows the F-measure accuracy on ISCX-2012 and NSL-KDD. The hardware-accelerated NIDS is slightly less accurate than the software-NIDS, mainly because of the 8-bit fixed data format. Fig. 2 also shows that our system can not only detect anomalous network traffic but also identify its class with high accuracy. In the experiment, the maximum throughput of the proposed architecture is 12.68 GFLOPS with 128-way parallelism for matrix multiplication at 50 MHz.
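The paper states only that the accelerator uses an 8-bit fixed data format; the exact format is not given. To illustrate where the small accuracy loss can come from, the sketch below assumes a signed 8-bit code with 6 fractional bits, rounding to nearest and saturation. These choices, and the function names, are assumptions for illustration and not the authors' design.

```python
import math


def to_fixed8(x, frac_bits=6):
    """Quantize a real value to a signed 8-bit code with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    code = math.floor(x * scale + 0.5)   # round to nearest, ties upward
    return max(-128, min(127, code))     # saturate to the signed 8-bit range


def from_fixed8(code, frac_bits=6):
    """Convert an 8-bit code back to the real value it represents."""
    return code / (1 << frac_bits)


def fixed_dot(xs, ws, frac_bits=6):
    """Dot product with 8-bit operands and a wide integer accumulator."""
    acc = sum(to_fixed8(x, frac_bits) * to_fixed8(w, frac_bits) for x, w in zip(xs, ws))
    return acc / (1 << (2 * frac_bits))  # each product carries 2 * frac_bits fractional bits


if __name__ == "__main__":
    xs = [0.30, -0.72, 0.05]
    ws = [0.91, 0.44, -0.63]
    exact = sum(x * w for x, w in zip(xs, ws))
    print("floating point:", exact)
    print("8-bit fixed   :", fixed_dot(xs, ws))
```

With 6 fractional bits, every in-range value is represented to within 1/128, and values outside roughly [-2, 2) saturate. Errors of this size in weights and activations are one plausible source of the gap between the hardware and software results.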
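Figure 2: Detection accuracy of the proposed sequential learning algorithm on (a) ISCX 2012 and (b) NSL-KDD. [Plots omitted; the horizontal axis of each panel is Class.]

Table 2: Performance comparison on the ISCX 2012 benchmark

| Platform | Type  | Format         | Time     | Power          | Energy    | Speed-up | Energy improvement |
|----------|-------|----------------|----------|----------------|-----------|----------|--------------------|
| Software | Train | Single         | 656 s    | 84 W           | 55104 J   | 4.5×     | 113.83×            |
| Software | Test  | Single         | 30.24 ms | 84 W           | 2540.2 mJ | 92.2×    | 2309.2×            |
| Embedded | Train | Single         | 11183 s  | 2.5 W          | 27957.5 J | 77.4×    | 57.75×             |
| Embedded | Test  | Single         | 383 ms   | 2.5 W          | 957.6 mJ  | 1168×    | 870.55×            |
| FPGA     | Train | Single + Fixed | 144.5 s  | (2.5 + 0.85) W | 484.08 J  | –        | –                  |
| FPGA     | Test  | Single + Fixed | 0.328 ms | (2.5 + 0.85) W | 1.10 mJ   | –        | –                  |

Speed-up and energy improvement are those of the FPGA implementation relative to each platform.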
The maximum input bandwidth is 409.6 Gbps and can be extended with higher parallelism and a faster operating frequency. Table 1 compares the accuracy of our proposed algorithm with that of other algorithms. Our method with sequential learning achieves the highest overall accuracy (76.04%), although it does not lead in every class: its DoS accuracy is lower than that of the other methods, and its R2L accuracy is below that of SVM. Furthermore, its training complexity is lower than that of SVM-based training. The low detection rate for R2L is due to the similarity between R2L behavior and normal traffic. R2L attacks can later be filtered out by firewalls or other rule-based techniques. Table 2 provides detailed comparisons between the platforms. Our proposed hardware-accelerated NIDS consumes 0.85 W on the FPGA accelerator and 2.5 W on the embedded system. For training, our accelerator achieves 4.5× and 77.4× speed-ups over Software-NIDS and Embedded-NIDS, respectively. The testing process consists mainly of matrix-vector multiplications, for which our method achieves 92.2× and 1168× speed-ups over the Software-NIDS and Embedded-NIDS implementations. Furthermore, the hardware-accelerated NIDS achieves energy savings of around 113.83× and 57.75× in training compared to Software-NIDS and Embedded-NIDS, respectively.
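The derived columns of Table 2 follow from the measured time and power: energy is power multiplied by time, and the speed-up and energy improvement are ratios of each baseline to the FPGA row of the same type. The sketch below recomputes them from the table entries; the dictionary layout and function names are illustrative only.

```python
# (platform, type) -> (time in seconds, power in watts), copied from Table 2.
MEASURED = {
    ("Software", "Train"): (656.0, 84.0),
    ("Software", "Test"): (30.24e-3, 84.0),
    ("Embedded", "Train"): (11183.0, 2.5),
    ("Embedded", "Test"): (383e-3, 2.5),
    ("FPGA", "Train"): (144.5, 2.5 + 0.85),
    ("FPGA", "Test"): (0.328e-3, 2.5 + 0.85),
}


def energy(platform, phase):
    """Energy in joules: time multiplied by power."""
    seconds, watts = MEASURED[(platform, phase)]
    return seconds * watts


def speedup(baseline, phase):
    """Run time of the baseline divided by run time of the FPGA system."""
    return MEASURED[(baseline, phase)][0] / MEASURED[("FPGA", phase)][0]


def energy_improvement(baseline, phase):
    """Energy of the baseline divided by energy of the FPGA system."""
    return energy(baseline, phase) / energy("FPGA", phase)


if __name__ == "__main__":
    for baseline in ("Software", "Embedded"):
        for phase in ("Train", "Test"):
            print(
                f"{baseline:8s} {phase:5s} "
                f"speed-up {speedup(baseline, phase):8.1f}x  "
                f"energy improvement {energy_improvement(baseline, phase):8.2f}x"
            )
```

The training-phase ratios match the printed values. The test-phase energy improvements computed from unrounded energies come out slightly higher than the printed 2309.2× and 870.55×, which are consistent with dividing the rounded energy entries (for example 2540.2 mJ / 1.10 mJ).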
4 CONCLUSION
In this paper, we propose an online sequential machine learning hardware accelerator to perform real-time network intrusion detection. A fast and low-power IoT NIDS is achieved through hardware/software co-design, with sequential learning to adapt to new threats. Future work will investigate the detection speed with deeper neural networks and consider the memory constraints of the FPGA for a more energy-efficient and accurate NIDS.
REFERENCES
[1] Eduardo Viegas et al. Towards an energy-efficient anomaly-based intrusion detection engine for embedded systems. IEEE Transactions on Computers, 66(1):163–177, 2017.
[2] Hantao Huang et al. Distributed-neuron-network based machine learning on smart-gateway network towards real-time indoor data analytics. In IEEE/ACM DATE, 2016.
[3] Nan-Ying Liang et al. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Transactions on Neural Networks, 17(6):1411–1423, 2006.
[4] ISCX 2012. http://www.unb.ca/research/iscx/dataset/iscx-IDS-dataset.html.
[5] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani. NSL-KDD dataset. http://www.iscx.ca/NSL-KDD, 2012.
[6] Mark Hall et al. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, 2009.