Runtime Mitigation of Illegal Packet Request Attacks in Networks-on-Chip
N Prasad, Rajit Karmakar, Santanu Chattopadhyay, and Indrajit Chakrabarti
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur, WB 721302, India
{nprasad,rajit,santanu,indrajit}@ece.iitkgp.ernet.in

Abstract: A novel Denial-of-Service attack for Networks-on-Chip, namely the illegal packet request attack (IPRA), is proposed, and measures to mitigate it are addressed. Hardware Trojans, which cause these attacks, are conditionally triggered inside the routers at the buffer sites associated with the local core, when the core is idle. These attacks contribute to the degradation of network performance and may even create deadlocks, which can raise serious concerns in time-critical systems. A security unit has been proposed to detect these attacks and mitigate the consequent loss by guiding the control units of the corresponding buffers to either isolate or mask the attacked buffers at runtime. Area and power overheads of the proposed secure router are found to be a maximum of 1.69% and 0.63%, respectively, when compared to a baseline router in a 16×16 Mesh network. The proposed secure router can also improve the normalized execution time as well as the energy consumption of benchmark applications under the considered IPRAs.

Index Terms: Allocator, buffer, hardware Trojan, Network-on-Chip, security.

I. INTRODUCTION

With the increase in the heterogeneous functionality provided by modern electronic systems, several IPs from different vendors are being integrated to realize such systems. In this direction, even Networks-on-Chip (NoCs) are available as individual IPs, easing the task of designers in performing detailed interconnection on the chip. This encourages attackers to disrupt the functionality of the chip through malicious means. Denial-of-Service (DoS) attacks degrade the performance of NoCs and may even create deadlocks, either by occupying the hardware resources or by misguiding the flow of legal packets. Insertion of hardware Trojans (HTs) into a chip has become a common practice to perform such DoS attacks [1], [2]. Although Quality-of-Service (QoS) mechanisms exist at software or task level, as in [3], [4], to improve the metrics of legal packet flow, they can also be vulnerable to DoS attacks generated by the HTs.

On the other hand, secure NoC design has been an active research area for over a decade [5], [6]. A few works targeting secure router design for NoCs have been proposed in the literature [1], [2], [7]. Ancajas et al. [1] have proposed Fort-NoCs to protect a compromised NoC (C-NoC) in an MPSoC platform. The threats considered there are related to covert backdoor activation of hardware Trojans (HTs) to snoop the ongoing data communication. However, that work does not consider HTs that can be triggered when there is no data communication. Biswas et al. [7] have addressed attacks on routing tables, namely, the unauthorized access attack and the misrouting attack. They have proposed different monitoring-based countermeasures against such attacks. Though the presented countermeasures are effective in determining the location and the type of attack, they have been implemented at a cost of area and execution time. Boraten and Kodi [2] have proposed a packet validation technique called P-Sec for protecting compromised NoC architectures. The attacks considered are fault injection side channel attacks and covert hardware Trojan (HT) attacks on NoC links. Though P-Sec can secure the packet information while the packets are flowing in the network, it does not deal with attacks that are confined to the router microarchitecture. They have also proposed a target-activated sequential payload (TASP) HT model that injects faults into the packets by inspecting them [8]. To circumvent the threats, the authors have proposed a heuristic threat detection model to classify faults and to discover the HTs within compromised links. Many of the works reported earlier consider the attack scenario while a core is communicating with others. However, the case of attacking an NoC via a router while the cores remain idle has not been considered so far. Unlike the previous works, this work concentrates on the attack that happens at the buffer sites of a router when the core attached to it remains idle.

In an MPSoC platform, the processing cores remain idle for some time during the execution of a mapped application. During this idle time, cores do not inject any packet into the network, and the buffer corresponding to the core in the associated router should not request the switch allocator for any access to the path. Hardware Trojans, which perform DoS attacks, can choose this period of application execution to generate and send illegal packet requests to the switch allocator. To the best of the authors’ knowledge, the illegal packet request attack (IPRA), launched when a core is idle, has not been proposed in the existing literature.

This paper proposes a secure router architecture, called SeRA, to protect a compromised NoC from HTs that are deployed at the buffer sites in the router and generate IPRAs. SeRA has been endowed with a security unit (SU), which has been proposed to mitigate the IPRAs at runtime.

The rest of the paper is organized as follows. Section II presents the motivation and describes the attack scenario. Section III describes the proposed security unit and router architecture for SeRA. Section IV presents the performance evaluation results. Section V concludes the article.

II. MOTIVATION AND ATTACK SCENARIO

A. Motivation and Threat Relevance

Table I shows the average idle times of cores while running SPLASH-2 [9] benchmark applications on a 64 core NoC-based MPSoC. This offers a potential advantage to the attackers, to increase the network congestion by injecting
978-1-4673-6853-7/17/$31.00 ©2017 IEEE
illegal packets during this time. Fig. 1a shows the execution time overhead for raytrace as the number of HTs employed increases, assuming that the illegal traffic occupies 50% of the core idle time in an 8×8 NoC. From the figure, if the number of HTs is 20, the performance overhead rises to 29%. For larger NoCs, the effect can be further aggravated, as the amount of illegal traffic would be higher. This underlines the importance of this threat when time-critical applications are considered.

TABLE I: Average Idle Time of Cores Observed while Running SPLASH-2 Benchmarks on a 64 Core MPSoC

Application    Average Idle Time (%)
barnes         30.97
cholesky       26.37
fmm            56.52
radix          39.80
raytrace       9.92

Another requirement for HTs is to add as little overhead to the design footprint as possible, so as to go undetected inside a fairly large design. The area overhead, as well as the power overhead, of the considered HTs is very small, as a few XOR gates are sufficient to generate the attack. The number of gates accounting for an HT is (2B + F), where B denotes the number of source/destination address bits, and F denotes the number of bits required to identify a flit. Fig. 1a shows the area overhead of the HTs, as compared to a baseline (unaffected) router in an 8×8 network, on the secondary y-axis. From the figure, it is evident that the proposed HTs do not occupy a large design footprint, as the area overhead to deploy 20 HTs is only 2.18% of the router area.

[Fig. 1 (graphics not reproduced). (a) x-axis: Number of Hardware Trojans; left y-axis: Execution Time Overhead (%); right y-axis: Area Overhead (%). (b) Blocks: VC Allocator, Switch Allocator, Routing Unit, Crossbar, HT.]

B. Attack Scenario

Fig. 1b shows the microarchitecture of a generic 5-port mesh router with a hardware Trojan. The steps in the attack scenario follow the stages of the router pipeline. The following sequence of steps describes the IPRA.

1) HTs deployed at the buffer sites trigger when the respective condition is met, and the corresponding bits get manipulated (shown as the red buffer slot in Fig. 1b).
2) The switch allocator (SA) receives the illegal request from the corresponding input port and, depending on the priority state, grants a slot to the packet.
3) The routing unit (RU), which considers only the current and destination address values, computes the proper output port according to the destination.
4) The crossbar (XB), after receiving the corresponding switch select signals from the RU and SA, forwards the packet to the next router.

To secure the packet information, packet encryption is performed with the help of cryptographic primitives (CPs) [10]. Owing to the large hardware cost of CPs, they are generally not employed in NoCs. Even so, packet encryption is assumed here to be performed for all valid packets. If a core is not injecting any packet, it flushes all-zero bits, and no encryption is performed on those. The proper time for the HT to trigger is therefore the detection of the all-zero state of a packet, as there is no encryption performed.
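The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' model: it assumes dimension-ordered XY routing as the routing algorithm, a switch allocator that treats any non-zero FId as a request, and row-major node numbering on an 8×8 mesh. The names HeaderFlit, sa_has_request, xy_route and forward are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeaderFlit:
    src: int  # source address
    dst: int  # destination address
    fid: int  # flit identifier; 0 is assumed to mean "no flit"


def sa_has_request(flit: HeaderFlit) -> bool:
    """Step 2: the SA sees a request whenever FId is non-zero (assumed)."""
    return flit.fid != 0


def xy_route(current: int, dst: int, width: int) -> str:
    """Step 3: XY routing looks only at the current and destination nodes."""
    cx, cy = current % width, current // width
    dx, dy = dst % width, dst // width
    if dx != cx:
        return "EAST" if dx > cx else "WEST"
    if dy != cy:
        return "SOUTH" if dy > cy else "NORTH"
    return "LOCAL"


def forward(flit: HeaderFlit, current: int, width: int):
    """Steps 2-4: return the output port the crossbar would use, or None."""
    if not sa_has_request(flit):
        return None
    return xy_route(current, flit.dst, width)


idle = HeaderFlit(src=0, dst=0, fid=0)        # idle core flushes zeros
tampered = HeaderFlit(src=0, dst=63, fid=3)   # step 1: HT sets Dst and FId bits
print(forward(idle, current=9, width=8))      # None: nothing is requested
print(forward(tampered, current=9, width=8))  # EAST: illegal packet leaves the router
```

Note that xy_route never reads the source field, so under this assumed routing scheme nothing in the pipeline questions where the request came from.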
Fig. 1: (a) Execution time and area overheads observed due to hardware Trojans. (b) Microarchitecture of a generic mesh router with a hardware Trojan (HT).

Fig. 2a shows the physical location inside a router where the attacks can happen. The fields Src, Dst and FId in the header flit correspond to the source, destination and flit identifier, respectively. FId identifies the type of the flit, which can be header, body, or tail. Depending on the value of FId, the SA considers a request from the corresponding input port and grants a slot for the packet to flow, according to the priority adopted. The propagation of the attack in a router has been highlighted in Fig. 1b. This attack is meaningful because many existing routing algorithms consider only the current and destination addresses for deciding the routes of packets [11]. This makes the RU unaware of the source of the packet, which gives a potential advantage to the attackers to generate illegal packet requests.

Fig. 2b shows a possible way of inserting the HT for the considered IPRA. In the figure, P_IN is the regular input packet information and EOC is the end of computation flag obtained from the associated core. The simple logic needed to trigger the HTs consists of XOR gates, whose inputs are P_IN and EOC. As long as EOC remains low, denoting that the core is injecting packets, the deployed HTs are not activated. As soon as EOC goes high, they are activated, and the corresponding bits in the considered fields are flipped and set to logic 1. The choice of locations for the XOR gates is made by the attacker, who may insert the HTs anywhere in the locations shown. A backdoor kill switch, as mentioned in [12], can be adopted for timely activation of the HTs.
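The trigger logic just described can be modelled bit by bit. The sketch below is an illustration under stated assumptions rather than the authors' netlist: each tapped bit passes through one XOR gate with EOC, the field widths B = 6 and F = 2 correspond to a 64-node network with a 2-bit flit identifier (the latter is an assumption), and the function names are hypothetical.

```python
def ht_xor(p_in_bit: int, eoc: int) -> int:
    """One Trojan XOR gate: passes P_IN when EOC = 0, inverts it when EOC = 1."""
    return p_in_bit ^ eoc


def apply_trojan(field: int, width: int, eoc: int, tapped=None) -> int:
    """Apply XOR gates to the tapped bit positions of a header field.

    `tapped` lists the bit positions chosen by the attacker; by default
    every bit of the field is tapped.
    """
    positions = range(width) if tapped is None else tapped
    out = field
    for pos in positions:
        bit = (field >> pos) & 1
        out = (out & ~(1 << pos)) | (ht_xor(bit, eoc) << pos)
    return out


def ht_gate_count(b: int, f: int) -> int:
    """Gates for a Trojan covering Src, Dst and FId: 2B + F (from the text)."""
    return 2 * b + f


B, F = 6, 2  # 64 nodes -> 6 address bits; the 2-bit FId is assumed
# Core active (EOC = 0): the header passes through unchanged.
print(apply_trojan(0b101101, B, eoc=0))  # 45
# Core idle (EOC = 1) with an all-zero flush: every tapped bit becomes 1.
print(apply_trojan(0, B, eoc=1))         # 63
print(apply_trojan(0, F, eoc=1))         # 3
print(ht_gate_count(B, F))               # 14
```

Because the idle core flushes zeros, XOR-ing with a high EOC is equivalent to setting the tapped bits to logic 1, which is the behaviour the text describes.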
Fig. 2: (a) Attack points in buffer location corresponding to header flit. (b) Insertion of considered hardware Trojan.
[Fig. 2 labels (graphics not reproduced): BufIn, BufOut; flits Header, Body, ..., Body, Tail; header fields Src, Dst, FId; XOR gate inputs P_IN and EOC.]
III. PROPOSED SECURITY UNIT

The proposed security unit (SU) has been designed to mitigate the IPRAs and to restore secure operation in the network. The adverse effects of these attacks are nullified at the point where the attacks happen, so that the performance of the network is not affected.

The proposed SU consists of modules that validate the packet header. The inputs to the SU are the information bits in the packet header, namely the source (Src) and destination (Dst) addresses and the flit identifier (FId). Since the HT is triggered during the non-execution period of a core, it is sufficient to cross-check the address bits of the source and destination nodes to detect the threat. In a secure scenario, all these bits should be zero. Thus, a module called P_VLD verifies whether all the bits corresponding to the source and destination addresses are zero. This detects whether any HTs are present at these locations. Checking the non-zero status of these bits is necessary, as a few fault-tolerant switch allocators assume that a request is valid if they find non-zero bits in either the Src or Dst field, irrespective of the contents of FId [13].

If the HT is triggered at the location of FId, there is always a possibility that the SA considers it to be a valid request and accordingly grants a port for a packet corresponding to its destination address. To check for this, the I_OUT module has been employed in the SU. The number of FId bits depends on whether the packet has one or more flits. Considering multiple flits per packet, I_OUT checks for the all-zero state of those bits. P_VLD outputs a zero (0) if the bits corresponding to the source and destination addresses are all zeros. Similarly, I_OUT outputs a zero (0) if the bits corresponding to FId are all zeros. A one (1) output from either of these blocks means that there is an HT in the corresponding buffer location. After the P_VLD and I_OUT bits have been obtained, warning bits (W_OUT) are generated and sent to units such as the SA for further processing.

Fig. 3a shows the logic of the proposed SU for a 2×2 Mesh NoC. The hardware overhead of the proposed security unit is 2(B − 1) + (F − 1) 2-input OR gates, where B and F denote the number of bits required to represent the source (destination) address and the flit identifier, respectively. Fig. 3b shows the architecture of SeRA, endowed with the proposed SU. At the end of computation, EOC functions as an enable signal to the SU, which then starts monitoring the status of the buffers associated with it. The SU outputs the warning signals to the SA for further processing.

Once a threat has been detected, mechanisms such as buffer masking and buffer isolation can be applied to avoid the usage of such an insecure buffer. This degrades the network performance to some extent, as the total number of buffers available to a port is reduced, since the masked buffers cannot be used further. However, the efficient buffer management schemes discussed in [14], [15] would alleviate this problem.

Fig. 3: (a) Logic of a security unit (SU) in the proposed architecture for 2×2 network. (b) The proposed secure router architecture.
[Fig. 3 labels (graphics not reproduced): Security Unit, P_VLD, I_OUT, Src[0], Src[1], Dst[0], Dst[1], FId[0], FId[1], Input Buffer, VC Allocator, Switch Allocator, Routing Unit, Crossbar.]

IV. PERFORMANCE EVALUATION

In order to assess the efficacy of the proposed security unit, performance evaluation has been done on Mesh NoCs of different sizes. Performance metrics such as area overhead and normalized execution time under threats have been compared with those of a baseline router. The architecture of the router has been modelled in VHDL. Area and power results have been obtained by synthesizing the proposed architecture using Synopsys Design Compiler (SDC) and TSMC 90 nm technology libraries with a supply voltage of 1.0 V at an operating frequency of 1 GHz.

A. Area and Power Overhead

Fig. 4a shows the number of additional OR gates required to realize the proposed router architecture. The value increases with an increase in either the number of virtual channels (VCs) per input port or the network size. Since the attack can happen at any buffer, a security unit has been placed at each input VC of all routers in the network. In the present case, unified buffer management has been assumed, which means that the buffer allocation to a particular port is done at runtime. This makes it necessary for the router to have a security unit for each virtual channel present in the router.

Fig. 4b shows the area overhead of the proposed router architecture when compared with the baseline router. From the figure, it is evident that if the number of virtual channels per input port increases, the area overhead of SeRA relative to the baseline router reduces. Also, if the number of nodes in the network increases, the area of other units, such as the RU, increases, and thus the overhead of SeRA compared to the baseline router is reduced. For a 16 node network with 2 VCs per input port, the area overhead observed is 3.64%. Here the router has an SU for each of its VCs.

Fig. 4c shows the power overhead of the proposed router architecture when compared with the baseline router. From the figure, it is evident that if the number of virtual channels per input port increases, the power overhead of SeRA increases. This is due to the fact that the SU has to remain active for the entire time when the core is idle. Figs. 4b-4c correspond to SeRA with SUs employed at all VCs in the router. However, if the network size increases, the power overhead decreases. This is due to the increase in the power of other router components. Thus, for a 64 node network with 8 VCs per input port, the power overhead observed is 3.54%.

Fig. 4: (a) Number of additional 2-input OR gates required for the proposed router. (b) Percentage area overhead and (c) Percentage power overhead of proposed router compared to baseline router.
[Fig. 4 axes (graphics not reproduced): x-axis Number of Nodes (16, 64, 256); series 2, 4 and 8 VCs/port; y-axes Number of Additional Gates, Percentage Area Overhead, Percentage Power Overhead.]

B. Performance Overhead
[Fig. 5 axes (graphics not reproduced): Normalized Execution Time and Normalized Energy Consumption versus Application Traffic from SPLASH-2 Benchmarks (barnes, cholesky, fmm, radix, raytrace, average); series: Baseline with No Threat, Baseline with Threats, SeRA with Threats.]
For evaluating the performance of SeRA under IPRAs, a 64-node mesh NoC has been considered. Communication traces for SPLASH-2 benchmarks have been obtained from the SniperSim [16] system simulator. The BookSim 2.0 network simulator [17] has been used to obtain the performance parameters. Each router is considered to have five input/output (IO) ports, with four VCs per IO port. Each buffer has the capacity to store 8 flits, and each flit is 32 bits wide.

Fig. 5 shows the comparison of the normalized execution time of SeRA and that of the baseline routers under threats. To the total communication volume of an application, illegal traffic for 50% of the core idle time has been added. From the figure, it is evident that if the illegal traffic is 50% of the core idle time, the execution time increases by 34.80% for applications such as raytrace. This shows the severity of IPRAs in the network. However, in the presence of SUs in the routers, the effect of these illegal packets is brought down significantly. One can observe that the normalized execution time comes close to its optimal value, which is one. The slight residual degradation for SeRA is due to the reduction in the total number of buffers (inherently VCs). This is a consequence of the HT attacks: once a VC of a buffer has been detected with an attack, it is either isolated or masked off, which prevents it from being considered for further data communication. SeRA can bring down the effect of IPRAs, in terms of execution time overhead, from 34.80% to 7.80% for raytrace benchmark traffic, and similarly for other applications as well. On average, if the IPRAs contribute an increase in total communication volume corresponding to 50% of the core idle time, SeRA shows an execution time overhead of 4.86%.

Fig. 5 also compares the energy consumption of SeRA with that of the baseline router under threats. The power results necessary for this comparison have been obtained using SDC. From the figure, one may note that SeRA has better energy consumption than the baseline router under threats. On average, SeRA improves energy consumption by 37.75% compared with the baseline router under threats, which corresponds to an overhead of 9.76% compared to the baseline router with no threats.
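The residual overhead comes from virtual channels being taken out of service once the SU flags them. A minimal sketch of that detect-and-mask behaviour follows. It is a functional illustration under assumptions, not the authors' RTL: P_VLD and I_OUT are modelled as plain OR-reductions enabled by EOC, masking is modelled as a per-VC flag, and all class and function names are hypothetical. The gate count helper simply evaluates the formula stated in Section III.

```python
from functools import reduce
from operator import or_


def or_reduce(bits):
    """OR all bits together, as a chain of 2-input OR gates would."""
    return reduce(or_, bits, 0)


def security_unit(src_bits, dst_bits, fid_bits, eoc):
    """Return (P_VLD, I_OUT); both stay 0 unless EOC enables the check."""
    if not eoc:
        return 0, 0
    p_vld = or_reduce(src_bits) | or_reduce(dst_bits)
    i_out = or_reduce(fid_bits)
    return p_vld, i_out


def su_gate_count(b, f):
    """2-input OR gates per SU, as stated in Section III: 2(B - 1) + (F - 1)."""
    return 2 * (b - 1) + (f - 1)


class InputPort:
    """Tracks which virtual channels of one port are still usable."""

    def __init__(self, num_vcs):
        self.masked = [False] * num_vcs

    def check_vc(self, vc, src_bits, dst_bits, fid_bits, eoc):
        p_vld, i_out = security_unit(src_bits, dst_bits, fid_bits, eoc)
        if p_vld or i_out:  # warning raised: mask the buffer
            self.masked[vc] = True
        return self.masked[vc]

    def usable_vcs(self):
        return self.masked.count(False)


port = InputPort(num_vcs=4)                      # four VCs per port, as in the evaluation
port.check_vc(0, [0, 0], [0, 0], [0, 0], eoc=1)  # clean idle buffer
port.check_vc(1, [0, 0], [0, 1], [1, 1], eoc=1)  # Trojan flipped Dst and FId bits
print(port.usable_vcs())                         # 3
print(su_gate_count(2, 2))                       # 3 gates for the 2x2 case
```

With one of four VCs masked, the port keeps working with reduced buffering, which is the kind of graceful degradation the measurements reflect.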
Fig. 5: Performance comparison of SeRA with baseline router.
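As a worked reading of the raytrace figures reported in this section, and assuming that both overheads are measured against the no-threat baseline (normalized execution time of 1), the reduction can be expressed in two ways. The symbols below are introduced only for this illustration.

```latex
\begin{align*}
T_{\text{baseline, threat}} &= 1 + 0.3480 = 1.3480, \\
T_{\text{SeRA, threat}}     &= 1 + 0.0780 = 1.0780, \\
\frac{T_{\text{baseline, threat}} - T_{\text{SeRA, threat}}}{T_{\text{baseline, threat}}}
  &= \frac{0.2700}{1.3480} \approx 20.0\%, \\
\frac{34.80 - 7.80}{34.80} &\approx 77.6\%.
\end{align*}
```

Under this assumption, SeRA shortens the attacked execution time by roughly one fifth and removes a little over three quarters of the attack-induced overhead for raytrace.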
V. CONCLUSIONS

This article has proposed a secure router architecture (SeRA) for the Network-on-Chip (NoC) paradigm, which adopts effective measures to counter illegal packet request attacks (IPRAs). Conditionally triggered hardware Trojans (HTs), which reside at the buffer sites in the routers, have been considered to cause these attacks. The attacks have been assumed to happen when the core attached to a router remains idle. A security unit (SU) has been proposed to mitigate these attacks and to restore secure communication in the NoC. Compared to a baseline router, SeRA has a maximum area overhead of 3.64% for a 16-node network. If HTs inject illegal packets for 50% of the core idle time, SeRA keeps the average execution time overhead at 4.86% and the average energy consumption overhead at 9.76% for real benchmarks. This slight degradation is due to the reduction in the number of virtual channels that support data communication after the attacks happen. Thus, SeRA is able to mitigate such threats in an NoC with graceful degradation of the execution time of the running application. Future work includes designing secure router architectures for NoCs in the presence of faults.

REFERENCES

[1] D. M. Ancajas et al., “Fort-NoCs: Mitigating the threat of a compromised NoC,” in DAC, Jun 2014, pp. 1–6.
[2] T. Boraten and A. K. Kodi, “Packet security with path sensitization for NoCs,” in DATE, Mar 2016, pp. 1136–1139.
[3] E. Carara et al., “Managing QoS flows at task level in NoC-based MPSoCs,” in VLSI-SoC, Oct 2009, pp. 133–138.
[4] E. Carara et al., “Achieving composability in NoC-based MPSoCs through QoS management at software level,” in DATE, Mar 2011, pp. 1–6.
[5] C. H. Gebotys and R. J. Gebotys, “A framework for security on NoC technologies,” in ISVLSI, Feb 2003, pp. 113–117.
[6] S. Evain and J. P. Diguet, “From NoC security analysis to design solutions,” in SIPS, Nov 2005, pp. 166–171.
[7] A. K. Biswas et al., “Router attack toward NoC-enabled MPSoC and monitoring countermeasures against such threat,” CSSP, vol. 34, no. 10, pp. 3241–3290, Oct 2015.
[8] T. Boraten and A. K. Kodi, “Mitigation of denial of service attack with hardware Trojans in NoC architectures,” in IPDPS, May 2016, pp. 1091–1100.
[9] S. C. Woo et al., “The SPLASH-2 programs: Characterization and methodological considerations,” in ISCA, Jun 1995, pp. 24–36.
[10] H. K. Kapoor et al., “A security framework for NoC using authenticated encryption and session keys,” CSSP, vol. 32, no. 6, pp. 2605–2622, Dec 2013.
[11] M. Palesi and M. Daneshtalab, Eds., Routing Algorithms in Networks-on-Chip. Springer New York, 2014.
[12] T. Boraten and A. K. Kodi, “Mitigation of denial of service attack with hardware Trojans in NoC architectures,” in IPDPS, May 2016, pp. 1091–1100.
[13] G. Dimitrakopoulos and E. Kalligeros, “Low-cost fault-tolerant switch allocator for network-on-chip routers,” in INA-OCMC, Jan 2012, pp. 25–28.
[14] I. Seitanidis et al., “ElastiStore: Flexible elastic buffering for virtual-channel-based networks on chip,” TVLSI, vol. 23, no. 12, pp. 3015–3028, Dec 2015.
[15] M. Oveis-Gharan and G. N. Khan, “Efficient dynamic virtual channel organization and architecture for NoC systems,” TVLSI, vol. 24, no. 2, pp. 465–478, Feb 2016.
[16] T. E. Carlson et al., “An evaluation of high-level mechanistic core models,” TACO, vol. 11, no. 3, pp. 28:1–28:23, Oct 2014.
[17] N. Jiang et al., “A detailed and flexible cycle-accurate network-on-chip simulator,” in ISPASS, Apr 2013, pp. 86–96.