1
Data Falsification Attacks on Consensus-based Detection Systems Bhavya Kailkhura, Student Member, IEEE, Swastik Brahma, Member, IEEE, Pramod K. Varshney, Fellow, IEEE
Abstract—This paper considers the problem of signal detection in distributed networks in the presence of data falsification (Byzantine) attacks. Detection approaches considered in the paper are based on fully distributed consensus algorithms, where all of the nodes exchange information only with their neighbors in the absence of a fusion center. For such networks, we first characterize the negative effect of Byzantines on the steady-state and transient detection performance of conventional consensusbased detection algorithms. To avoid performance deterioration, we propose a distributed weighted average consensus algorithm that is robust to Byzantine attacks. We show that, under reasonable assumptions, the global test statistic for detection can be computed locally at each node using our proposed consensus algorithm. We exploit the statistical distribution of the nodes’ data to devise techniques for mitigating the influence of data falsifying Byzantines on the distributed detection system. Since some parameters of the statistical distribution of the nodes’ data might not be known a priori, we propose learning based techniques to enable an adaptive design of the local fusion or update rules. Numerical results are presented for illustration. Index Terms—Distributed detection, spectrum sensing, ad-hoc cognitive radio networks, consensus algorithms, data falsification attacks, Byzantines
I. I NTRODUCTION Distributed detection is a well studied topic in the detection theory literature [1]–[3]. The traditional distributed detection framework comprises of a group of spatially distributed nodes which acquire observations regarding the phenomenon of interest and send a compressed version to the fusion center (FC) where a global decision is made. However, in many scenarios, a centralized FC may not be available or the FC may become an information bottleneck causing degradation of system performance, potentially leading to system failure. Also, due to the distributed nature of future communication networks, and various practical constraints, e.g., absence of the FC, transmit power or hardware constraints and dynamic nature of the wireless medium, it may be desirable to employ alternate peer-to-peer local information exchange in order to reach a global decision. One such decentralized approach for peer-to-peer local information exchange and inference is the use of a consensus algorithm [4]. Recently, distributed detection based on consensus algorithms has been explored in [5]–[10]. Consensus-based detection approaches operate in three phases. First, in the sensing This work was supported in part by ARO under Grant W911NF-14-1-0339. The authors would like to thank Aditya Vempaty for his valuable comments and suggestions to improve the quality of the paper. B. Kailkhura, S. Brahma and P. K. Varshney are with Department of EECS, Syracuse University, Syracuse, NY 13244. (email:
[email protected];
[email protected];
[email protected])
phase, each node collects sufficient number of observations over a period of time (summary statistic). Second, in the information fusion phase, each node communicates only with its neighbors and updates its local state information (summary statistic) about the phenomenon by a local fusion rule that employs a weighted combination of its own value and those received from its neighbors. Nodes continue with these consensus iterations until the whole network converges to a steady-state value which is the global test statistic. Finally, in the decision making phase, nodes make their own decisions regarding the presence or the absence of the phenomenon. In particular, the authors in [8], [9] considered average consensus-based distributed detection and emphasized network designs based on the small world phenomenon for faster convergence [6]. A bio-inspired consensus scheme was introduced for spectrum sensing in [10]. However, these consensus-based fusion algorithms only ensure equal gain combining of local measurements. The authors in [5] proposed to use distributed weighted fusion algorithms for cognitive radio spectrum sensing. They showed that weighted average consensus-based schemes outperform average consensus-based schemes and achieve much better detection performance than the equal gain combining schemes. More recently, other decentralized schemes [11]–[13] for distributed detection have been proposed where, as opposed to consensus based strategies, sensing and information fusion phases occur in the same time step. These strategies are particularly useful in scenarios where the hypothesis is rapidly changing and may change during the information fusion phase of the consensus strategies. However, in this paper, we focus only on consensus based distributed detection. However, consensus-based detection schemes are quite vulnerable to different types of attacks. One typical attack on such networks is a Byzantine attack. While Byzantine attacks (originally proposed in [14]) may, in general, refer to many types of malicious behavior, our focus in this paper is on data-falsification attacks [15]–[21]. In [17], [18],the authors considered the problem of distributed detection in the presence of Byzantines for a centralized model with fusion center and determined the optimal attacking strategy which minimizes the detection error exponent. The authors in [22] considered the problem of distributed detection with covert Byzantine attacks in a centralized model. They obtained the optimal attacking strategy from the point of view of a smart adversary to disguise itself from the proposed detection scheme while accomplishing its attack. In our earlier work [23] on this problem, we analyzed the performance of data fusion schemes
2
with Byzantines in a centralized model.1 Thus far, research on detection in the presence of Byzantine attacks has predominantly focused on addressing these attacks under the centralized model in which information is available at the FC [17], [18], [21], [23], [24]. A few attempts have been made to address the security threats in “average” consensus-based detection schemes in recent research [25]– [30]. Most of these existing works on countering Byzantine or data falsification attacks in distributed networks rely on a threshold for detecting Byzantines. The main idea is to exclude nodes from the neighbors list whose state information deviates significantly from the mean value. In [26] and [29], two different defense schemes against data falsification attacks for distributed consensus-based detection were proposed. In [26], the scheme eliminates the state value with the largest deviation from the local mean at each iteration step and, therefore, it can only deal with the situation in which only one Byzantine node exists. Note that, it excludes one state value even if there is no Byzantine node. In [29], the vulnerability of distributed consensus-based spectrum sensing was analyzed and an outlier detection algorithm with an adaptive threshold was proposed. The authors in [28] proposed a Byzantine mitigation technique based on adaptive local thresholds. This scheme mitigates the misbehavior of Byzantine nodes and tolerates the occasional large deviation introduced by honest users. It adaptively reduces the corresponding coefficients so that the Byzantines are eventually isolated from the network. Excluding the Byzantine nodes from the fusion process may not be the best strategy from the network’s perspective. As shown in our earlier work [21] in the context of distributed detection with one-bit measurements under a centralized model with an FC, an intelligent way to improve the performance of the network is to use the information of the identified Byzantines to the network’s benefit. More specifically, learning-based techniques have the potential to outperform the existing exclusion-based techniques. In this paper, we pursue such a design philosophy in the context of raw data fusion in decentralized networks based on weighted average consensus algorithms. Byzantines can attack weighted average consensus algorithms in two ways: 1) nodes falsify their initial data, and 2) nodes falsify their weight values. To the best of the authors’ knowledge, the susceptibility and protection of weighted average consensus-based detection schemes has not been considered in the literature. As will be seen later, in conventional consensus algorithms, weight manipulation cannot be detected. To design methodologies for defending against Byzantine attacks, two fundamental challenges arise. First, how do nodes recognize the presence of attackers? Second, after identification of an attacker or group of attackers, how do nodes adapt their operating parameters? Due to the large number of nodes and complexity of the distributed network, we develop and analyze schemes where the nodes update their own operating parameters autonomously. Our approach further introduces an adaptive fusion based detection algorithm which supports the 1 In our current work, we extend our previous work [23] for decentralized model and focus on the performance analysis of the consensus based distributed detection with Byzantines.
learning of the attackers’ behavior. Our scheme differs from all existing work on Byzantine mitigation that are based on exclusion strategies [25]–[29], where the only defense is to identify and exclude the attackers from the consensus process. A. Main Contributions In this paper, we focus on the susceptibility and protection of consensus-based detection algorithms. Our main contributions are summarized as follows: • We characterize the effect of Byzantines on the steadystate performance of the conventional consensus-based detection algorithms. More specifically, we quantify the minimum fraction of Byzantines needed to make the deflection coefficient of the global statistic equal to zero. • Using probability of detection and probability of false alarm as measures of detection performance, we investigate the degradation of transient detection performance of the conventional consensus algorithms in the presence of Byzantines. • We propose a robust distributed weighted average consensus algorithm which allows detection of weight manipulation by Byzantines and obtain closed-form expressions for optimal weights to mitigate the effect of data falsification attacks. • Finally, we propose a technique based on the expectationmaximization (EM) algorithm and maximum likelihood (ML) estimation to learn the operating parameters (or weights) of the nodes in the network to enable an adaptive design of the local fusion or update rules. The rest of the paper is organized as follows. In Sections II and III, we introduce our system model and Byzantine attack model, respectively. In Section IV, we study the security performance of weighted average consensus-based detection schemes. In Section V, we propose a protection mechanism to mitigate the effect of data falsification attacks on consensusbased detection schemes. Finally, Section VI concludes the paper. II. S YSTEM
MODEL
Consider two hypotheses H0 (signal is absent) and H1 (signal is present). Also, consider N nodes organized in an undirected graph G which faces the task of determining which of the two hypotheses is true. First, we define the network model used in this paper. A. Network Model We model the network topology as an undirected graph G = (V, E), where V = {v1 , · · · , vN } represents the set of nodes in the network with |V | = N . The set of communication links in the network correspond to the set of edges E, where (vi , vj ) ∈ E, if and only if there is a communication link between vi and vj so that, vi and vj can directly communicate with each other. The adjacency matrix A of the graph is defined as 1 if (vi , vj ) ∈ E, aij = 0 otherwise.
3
B. Sensing Phase 2
3
1 5
4
where ζi is the deterministic gain corresponding to the sensing channel, st is the deterministic signal at time instant t, nti is AWGN, i.e., nti ∼ N (0, σi2 ) (where N denotes the normal distribution) and independent across time. Each node i calculates a summary statistic Yi over a detection interval of M samples, as P t 2 Yi = M t=1 |zi |
6
Figure 1.
A distributed network with 6 nodes
The neighborhood of a node i is defined as Ni = {vj ∈ V : (vi , vj ) ∈ E}, ∀i ∈ {1, 2, · · · , N }. The degree di of a node vi is the number P of edges in E which include vi as an endpoint, i.e., di = N j=1 aij . The degree matrix D is defined as a diagonal matrix with diag(d1 , · · · , dN ) and the Laplacian matrix L is defined as di if j = i, lij = −aij otherwise. In other words, L = D − A. As an illustration, consider a network with six nodes trying to reach consensus (see Figure 1). The degree matrix for this network is given by D = diag(1, 3, 2, 4, 1, 1). The adjacency matrix A for this network is given by 0 1 0 A= 0 0 0
1 0 1 1 0 0
0 1 0 1 0 0
0 1 1 0 1 1
0 0 0 1 0 0
0 0 0 . 1 0 0
Therefore, the Laplacian matrix L = D−A for this network is given by 1 −1 0 L= 0 0 0
−1 3 −1 −1 0 0
0 −1 2 −1 0 0
0 −1 −1 4 −1 −1
We consider an N -node network using the energy detection scheme [31]. For the ith node, the sensed signal zit at time instant t is given by nti , under H0 zit = t ζi s + nti under H1 ,
0 0 0 −1 1 0
0 0 0 . −1 0 1
The consensus-based distributed detection scheme usually contains three phases: sensing, information fusion, and decision making. In the sensing phase, each node acquires the summary statistic about the phenomenon of interest. In this paper, we adopt the energy detection method so that the local summary statistic is the received signal energy. Next, in the information fusion phase, each node communicates with its neighbors to update their state values (summary statistic) and continues with the consensus iteration until the whole network converges to a steady state which is the global test statistic. Finally, in the decision making phase, nodes make their own decisions about the presence of the phenomenon using this global test statistic. In the following each of these phases is described in more detail.
where M is determined by the time-bandwidth product [32]. Since Yi is the sum of the squares of M i.i.d. Gaussian random variables, it can be shown that σY2i follows a central chi-square i distribution with M degrees of freedom (χ2M ) under H0 , and, a non-central chi-square distribution with M degrees of freedom and parameter ηi under H1 , i.e., Yi χ2M , under H0 ∼ 2 χM (ηi ) under H1 σi2 where P ηi = Es |ζi |2 /σi2 is the local SNR at the ith node and M Es = t=1 |st |2 represents the sensed signal energy over M detection instants. Note that the local SNR is M times the average SNR at the output of the energy detector, which is Es |ζi |2 . Mσ2 i
C. Information Fusion Phase In this section, we give a brief introduction to conventional consensus algorithms [4]. and explain how consensus is reached using the following two steps. Step 1: All nodes establish communication links with their neighbors, and broadcast their information state, xi (0) = Yi . Step 2: Each node updates its local state information by a local fusion rule (weighted combination of its own value and those received from its neighbors) [4]. We denote node i’s updated information at iteration k by xi (k). Node i continues to broadcast information xi (k) and update its local information state until consensus is reached. This process of updating information state can be written in a compact form as X (xj (k) − xi (k)) (1) xi (k + 1) = xi (k) + wi j∈Ni
where is the time step and wi is the weight given to node i’s information. Using the notation x(k) = [x1 (k), · · · , xN (k)]T , network dynamics can be represented in the matrix form as, x(k + 1) = W x(k) where, W = I − diag(1/w1 , · · · , 1/wN )L is referred to as a Perron matrix. The consensus algorithm is nothing but a local fusion or update rule that fuses the nodes’ local information state with information coming from neighbor nodes, and it is well known that every node asymptotically reaches the same information state for arbitrary initial values [4].
4
D. Decision Making Phase ∗
The final information state x after reaching consensus for the above consensus algorithm will be the weighted ∗ average of the PN PNinitial states of all the nodes [4] or x = i=1 wi Yi / i=1 wi , ∀i. Average consensus can be seen as a special case of weighted average consensus with wi = w, ∀i. After the whole network reaches a consensus, each node makes its own decision about the hypothesis using a predefined threshold λ2
Decision =
H1 H0
if x∗ > λ otherwise
where weights are given by [5] ηi /σ 2 wi = PN i . (2) 2 i=1 ηi /σi PN PN In the rest of the paper, Λ = i=1 wi Yi / i=1 wi is referred to as the final test statistic. Next, we discuss Byzantine attacks on consensus-based detection schemes and analyze the performance degradation of the weighted average consensus-based detection algorithms due to these attacks. III. ATTACKS
ON
C ONSENSUS BASED D ETECTION A LGORITHMS
When there are no adversaries in the network, we noted in the last section that consensus can be reached to the weighted average of arbitrary initial values by having the nodes use the update strategy x(k + 1) = W x(k) with an appropriate weight matrix W . However, suppose, that instead of broadcasting the true summary statistic Yi and applying the update strategy (1), some nodes (referred to as Byzantines) deviate from the prescribed strategies. Accordingly, Byzantines can attack in two ways: data falsification (nodes falsify their initial data or weight values) and consensus disruption (nodes do not follow the update rule given by (1)). More specifically, Byzantine node i can do the following Data falsification: xi (0) = Yi + ∆i , or wi is changed to w ˜i Consensus disruption: P (xj (k) − xi (k)) + ui (k), xi (k + 1) = xi (k) + wi
past literature (e.g., see [6], [33]) where control theoretic techniques were developed to identify disruption attackers in a ‘single’ consensus iteration. However, these techniques cannot identify the data falsification attacker due to philosophically different nature of the problem. Also, notice that, by knowing the existence of such an identification mechanism, a smart adversary will aim to disguise itself while degrading the detection performance. In contrast to disruption attackers, data falsification attackers are more capable and can manage to disguise themselves while degrading the detection performance of the network by falsifying their data. Susceptibility and protection of consensus strategies to data falsification attacks has received scant attention, and this is the focus of our work. Our main focus3 here is on the scenarios where an attacker performs a data falsification attack by introducing (∆i , w ˜i ) during initialization. We exploit the statistical distribution of the initial values and devise techniques to mitigate the influence of Byzantines on the distributed detection system. Our approach for data falsification attacks on consensus-based detection systems complements the techniques proposed in [6], [33] that are mainly focused on consensus disruption attacks. A. Data Falsification Attack In data falsification attacks, try to manipulate the PNattackers P N final test statistic (i.e., Λ = i=1 wi Yi / i=1 wi ) in a manner so as to degrade the detection performance. We consider a network with N nodes that uses Algorithm (1) for reaching consensus. Weight wi , given to node i’s data Yi in the final test statistic, is controlled or updated by node i itself while carrying out the iteration in Algorithm (1). So by falsifying initial values Yi or weights wi , the attackers can manipulate the final test statistic. Detection performance will be degraded because Byzantine nodes can always set a higher weight to their manipulated information. Thus, the final statistic’s value across the whole network will be dominated by the Byzantine node’s local statistic that will lead to degraded detection performance. Next, we define a mathematical model for data falsification attackers. We analyze the degradation in detection performance of the network when Byzantines falsify their initial values Yi for fixed arbitrary weights w ˜i . B. Attack Model
where (∆i , w ˜i ) and ui (k) are introduced at the initialization step and at the update step k, respectively. The attack model considered above is extremely general, and allows Byzantine node i to update its value in a completely arbitrary manner (via appropriate choices of (∆i , w ˜i ), and ui (k), at each time step). An adversary performing consensus disruption attack has the objective to disrupt the consensus operation. However, consensus disruption attacks can be easily detected because of the nature of the attack. Furthermore, the identification of consensus disruption attackers has been investigated in the
The objective of Byzantines is to degrade the detection performance of the network by falsifying their data (Yi , wi ). We assume that Byzantines have an advantage and know the true hypothesis. Under this assumption, we analyze the detection performance of the data fusion schemes which yields the maximum performance degradation that the Byzantines can cause, i. e., worst case detection performance. We consider the case when weights of the Byzantines have been tampered by setting their value at w ˜i and analyze the effect of falsifying the initial values Yi . Now a mathematical model for a Byzantine attack is presented. Byzantines tamper with their initial
2 In practice, parameters such as threshold λ and consensus time step are set off-line based on well know techniques [4]. In this paper, these parameters are assumed known and setting the parameters is not considered.
3 Later, we also come up with a robust distributed weighted average consensus algorithm which allows detection of consensus disruption attack while mitigating the effect of data falsification attacks.
j∈Ni
5
values Yi and send Y˜i such that the detection performance is degraded. Under H0 : Yi + ∆i with probability Pi ˜ Yi = Yi with probability (1 − Pi )
Y˜i =
Yi − ∆i Yi
with probability Pi with probability (1 − Pi )
where Pi is the attack probability and ∆i is a constant value which represents the attack strength, which is zero for honest nodes. As we show later, Byzantine nodes will use a large value of ∆i so that the final statistic’s value is dominated by the Byzantine node’s local statistic leading to a degraded detection performance. We use deflection coefficient [34] to characterize the security performance of the detection scheme due to its simplicity and its strong relationship with the global detection performance. Deflection coefficient of (µ1 − µ0 )2 , the global test statistic is defined as: D(Λ) = 2 σ(0) where µk = E[Λ|Hk ], k = 0, 1, is the conditional mean 2 and σ(k) = E[(Λ − µk )2 |Hk ], k = 0, 1, is the conditional variance. The deflection coefficient is closely related to performance measures such as the Receiver Operating Characteristics (ROC) curve [34]. In general, the detection performance monotonically increases with an increasing value of the deflection coefficient. We define the critical point of the distributed detection network as the minimum fraction of Byzantine nodes needed to make the deflection coefficient of the global test statistic equal to zero (in which case, we say that the network becomes blind) and denote it by αblind . We assume that the communication between nodes is errorfree and our network topology is fixed during the whole consensus process and, therefore, consensus can be reached without disruption [4]. In the next section, we analyze the security performance of consensus-based detection schemes in the presence of data falsifying Byzantines as modeled above. This analysis will be useful in revealing some quantitative relationships to judge the degradation of the detection performance with data falsifying Byzantines. IV. P ERFORMANCE
ANALYSIS OF CONSENSUS - BASED
DETECTION ALGORITHMS
In this section, we analyze the effect of data falsification attacks on conventional consensus-based detection algorithms. First, we characterize the effect of Byzantines on the steadystate performance of the consensus-based detection algorithms and determine αblind . Without loss of generality, we assume that the nodes corresponding to the first N1 indices i = 1, · · · , N1 are Byzantines and the remaining nodes corresponding to indices i = N1 + 1, · · · , N are honest nodes. P LetPusN1define w ˜i + w = [w ˜1 , · · · , w ˜N1 , wN1 +1 · · · , wN ]T and w = i=1 PN i=N1 +1 wi .
5 Deflection Coefficient
Under H1 :
6
4 3 2 1 0 0
15 0.25
10 0.50
0.75
Attack Probability ‘P’
Figure 2. ∆.
5 1
0
Attack Strength ‘∆’
Deflection Coefficient as a function of attack parameters P and
Lemma 1. For data fusion schemes, the condition to blind the network or equivalently to make the deflection coefficient zero is given by N1 X i=1
w ˜i (2Pi ∆i − ηi σi2 ) =
N X
wi ηi σi2 .
i=N1 +1
Proof: Please see Appendix A. Note that, when wi = w ˜i = z, ηi = η, σi = σ, Pi = N1 P, ∆i = ∆, ∀i, the blinding condition simplifies to = N 2 1 ησ . 2 P∆ Next, to gain insights into the solution, we present some numerical results in Figure 2. We plot the deflection coefficient of the global test statistic as a function of attack parameters Pi = P, ∆i = ∆, ∀i. We consider a 6-node network with the topology given by the undirected graph shown in Figure 1 deployed to detect a phenomenon. Nodes 1 and 2 are considered to be Byzantines. Sensing channel gains of the nodes are assumed to be h = [0.8, 0.7, 0.72, 0.61, 0.69, 0.9] and weights are given by (2). We also assume that M = 12, Es = 5, and σi2 = 1, ∀i. Notice that, the deflection coefficient is zero when the condition in Lemma 1 is satisfied. Another observation to make is that the deflection coefficient can be made zero even when only two out of six nodes are Byzantines. Thus, by appropriately choosing attack parameters (P, ∆), less than 50% of data falsifying Byzantines can blind the network. Next, using the probability of detection and the probability of false alarm as measures of detection performance, we investigate the degradation of transient detection performance of consensus algorithms with Byzantines. More specifically, we analyze the detection performance of the data fusion scheme, denoted as x(t + 1) = W t x(0), as a function of consensus iteration t by assuming that each node makes its local decision using the information available at the end of iteration t. For analytical tractability, we assume that Pi = P, ∀i. We denote t by wji the element of matrix W t in the jth row and ith
6
N X 1
µ0 =
i=1
N X
w ˜i w ˜i Pi P (M σi2 + ∆i ) + (1 − Pi ) P M σi2 + w w
N X
i=1
w ˜i w ˜i Pi P ((M + ηi )σi2 − ∆i ) + (1 − Pi ) P (M + ηi )σi2 + w w N X 1
2 σ(0)
=
i=1
w ˜ Pi w
2
Pi (1 −
Pi )∆2i
+
column. Using these notations, we calculate the probability of detection and the probability of false alarm at the jth node at consensus iteration t. For clarity of exposition, we first derive our results for a small network with two Byzantine nodes and one honest node (see Appendix B). Due to the probabilistic nature of the Byzantine’s behavior, it may behave as an honest node with a probability (1 − P ). Let S denote the set of all combinations of such Byzantine strategies: S = {{b1 , b2 }, {h1 , b2 }, {b1 , h2 }, {h1 , h2 }}
(6)
where by bi we mean that Byzantine node i behaves as a Byzantine and by hi we mean that Byzantine node i behaves as an honest node. Let As ∈ U denote the indices of Byzantines behaving as an honest node in the strategy combination s, then, from (6) we have U = {A1 = {}, A2 = {1}, A3 = {2}, A4 = {1, 2}} U c = {Ac1 = {1, 2}, Ac2 = {2}, Ac3 = {1}, Ac4 = {}} where {} is used to denote the null set. Let us use ms to denote the cardinality of subset As ∈ U . Using these notations, we generalize our results for any arbitrary N . Lemma 2. ThePtest statistic ofPnode j at consensus iteration N N1 t t ˜ ˜t = t, i.e., Λ j i=N1 +1 wji Yi is a Gaussian i=1 wji Yi + mixture with PDF ˜ t |Hk ) f (Λ j
=
X
P N1 −ms (1 − P )ms ×
As ∈U
φ (µk )As +
with (µk )As =
P
u∈As
N X
t (µ1k )i , wji
i=N1 +1
t wju (µ1k )j +
N X i=1
P
u∈Acs
t (wji (σ1k )i )2
t wju (µ2k )j .
The performance of the detection scheme in the presence of Byzantines can be represented in terms of the probability of detection and the probability of false alarm of the network. Proposition 1. The probability of detection and the probability of false alarm of node j at consensus iteration t in the presence of Byzantines can be represented as Pdt (j)
X
=
P
As ∈U
N1 −ms
ms
(1−P )
t (µ ) wji 11 i N t 2 i=1 (wji (σ11 )i )
λ − (µ1 )As − qP Q
PN
i=N1 +1
2M σi4
+
N X
i=N1 +1
X
Pft (j) =
As ∈U
w P i M σi2 w
i=N1 +1
1
µ1 =
N X
i=N1 +1
w Pi w
2
(3)
w P i (M + ηi )σi2 w
2M σi4
(4)
(5)
t (µ ) w 10 i i=N1 +1 ji , t (σ ) )2 (w 10 i i=1 ji
λ − (µ0 )As − P N1 −ms (1−P )ms Q qP N
PN
where λ is the threshold used for detection by node j. Next, to gain insights into the results given in Proposition 1, we present some numerical results in Figures 3 and 4. We consider the 6-node network shown in Figure 1 where the nodes employ the consensus algorithm 1 with = 0.6897 to detect a phenomenon. Nodes 1 and 2 are considered to be Byzantines. We also assume that ηi = 10, σi2 = 2, λ = 33 and wi = 1. Attack parameters are assumed to be (Pi , ∆i ) = (0.5, 6) and w ˜i = 1.1. To characterize the transient performance of the weighted average consensus algorithm, in Figure 3(a), we plot the probability of detection as a function of the number of consensus iterations without Byzantines, i.e., (∆i = 0, w ˜i = wi ). Next, in Figure 3(b), we plot the probability of detection as a function of the number of consensus iterations in the presence of Byzantines. It can be seen that the detection performance degrades in the presence of Byzantines. In Figure 4(a), we plot the probability of false alarm as a function of the number of consensus iterations without Byzantines, i.e., (∆i = 0, w ˜i = wi ). Next, in Figure 4(b), we plot the probability of false alarm as a function of the number of consensus iterations in the presence of Byzantines. From both Figures 3 and 4, it can be seen that the Byzantine attack can severely degrade transient detection performance. From the discussion in this section, we can see that Byzantines can severely degrade both the steady-state and the transient detection performance of conventional consensus-based detection algorithms. As mentioned earlier, a data falsifying Byzantine i can tamper its weight wi as well as its sensing data Yi to degrade detection performance. One approach to mitigate the effect of sensing data falsification is to assign weights based on the quality of the data. In other words, a lower weight can be given to the data of the node identified as a Byzantine. However, to implement this approach one has to address the following two issues. First, in the conventional weighted average consensus algorithm, weight wi given to node i’s data is controlled or updated by the node itself (see discussion in Section III-A). Thus, a Byzantine node can always set a higher weight to its manipulated information and the final statistics will be
7
0.98
1
0.96
0.92
Probability of Detection
Probability of Detection
0.95 0.94
Node 1 Node 2
0.9
Node 3 0.88
Node 4
Node 1 Node 2 0.85
Node 3 Node 4
0.8
Node 5
0.86
0.9
Node 5
Node 6 0.84
5
10 15 Iteration Steps
20
Node 6 0.75
25
5
(a)
10
15 Iteration Steps
20
25
(b)
Figure 3. (a) Probability of detection as a function of consensus iteration steps. (b) Probability of detection as a function of consensus iteration steps with Byzantines.
0.14
0.25 Node 1
Node 1 0.12
Node 2
Node 2
0.1
Probability of False Alarm
Probability of False Alarm
0.2 Node 3 Node 4 0.08
Node 5 Node 6
0.06 0.04
Node 3 Node 4
0.15
Node 5 Node 6
0.1
0.05 0.02 0
5
10
15 Iteration Steps
20
25
(a)
0
5
10 15 Iteration Steps
20
25
(b)
Figure 4. (a) Probability of false alarm as a function of consensus iteration steps. (b) Probability of false alarm as a function of consensus iteration steps with Byzantines.
dominated by the Byzantine nodes’ local statistic that will lead to degraded detection performance. It will be impossible for any algorithm to detect this type of malicious behavior, since any weight that a Byzantine chooses for itself is a legitimate value that could also have been chosen by a node that is functioning correctly. Thus, in the conventional consensus algorithms weight manipulation cannot be detected and therefore, conventional consensus algorithms cannot be used in the presence of an attacker. Second, as will be seen later, the optimal weights given to nodes’ sensing data depend on the following unknown parameters: identity of the nodes, which indicates whether the node is honest or Byzantine, and underlying statistical distribution of the nodes’ data. In the next section, we address these concerns by proposing a learning based robust weighted average consensus algorithm. V. A ROBUST C ONSENSUS BASED D ETECTION A LGORITHM In order to address the first issue discussed in Section IV, which is the optimal weight design, we propose a consensus algorithm in which the weight for node i’s information is controlled (or updated) by the neighbors of node i rather than by node i itself. Note that, networks deploying such an algorithm are more robust to weight manipulation because if
a Byzantine node j wants to assign an incorrect weight to the data of its neighbor i in the global test statistic, it has to ensure that all the neighbors of node i put the same incorrect weight as node j. Furthermore, the proposed algorithm enables the detection of weight-manipulating Byzantines (in contrast to conventional consensus algorithms). The attack can be treated as a consensus disruption attack and weight manipulation can be detected, unless all the neighbors of honest nodes are Byzantines.4 A. Distributed Algorithm for Weighted Average Consensus In this section, we address the following questions: does there exist a distributed algorithm that solves the weighted average consensus problem while satisfying the condition that weights must be controlled (or updated) by neighbors Ni of node i rather than by node i itself? If such an algorithm exists, then, what are the conditions or constraints for the algorithm to converge? We consider a network of N nodes with a fixed and connected topology G(V, E). Next, we state Perron-Frobenius theorem [35], which will be used later for the design and analysis of our robust weighted average consensus algorithm. 4 Weight manipulating Byzantines can be easily identified by techniques such as those given in [6], [33].
8
Theorem 1 ( [35]). Let W be a primitive non-negative matrix with left and right eigenvectors u and v, respectively, satisfying T W v = v and uT W = uT . Then limk→∞ W k = vu . vT u Using the above theorem, we take a reverse-engineering ˆ which has approach to design a modified Perron matrix W T the weight vector w = [w1 , w2 , · · · , wN ] , wi > 0, ∀i as its left eigenvector and ~1 as its right eigenvector corresponding to eigenvalue 1. From the above theorem, if the modified Perron ˆ is primitive and non-negative, then a weighted matrix W average consensus can be achieved. Now, the problem boils ˆ which meets our requirement down to designing such a W that weights are controlled (or updated) by the neighbors Ni of node i rather than by node i itself. For this purpose, we propose a modified Perron matrix ˆ = I − (T ⊗ L) where L is the original graph Laplacian, W ⊗ is element-wise matrix multiplication operator, and T is a transformation given by P j∈N wj i if i = j [T ]ij = l w ii otherwise. j Observe that, the above transformation T satisfies the condition that weights are controlled (or updated) by neighbors Ni of node i rather than by node i itself. Based on the above transformation T , we propose our distributed consensus algorithm: P wj (xj (k) − xi (k)). xi (k + 1) = xi (k) + j∈Ni
Note that, the form of our update equation is different from the conventional update equation. Let us denote the modified ˆ = I − L, ˆ where L ˆ = T ⊗ L. Perron matrix by W We then explore the properties of the modified Perron matrix ˆ and show that it satisfies the requirements of the PerronW Frobenius theorem [35]. These properties will later be utilized to prove the convergence of our proposed consensus algorithm. Lemma 3. Let G be a connected graph with N nodes. Then, ˆ = I − (T ⊗ L), with 0 < < the modified Perron matrix W 1 X satisfies the following properties. max wj i
j∈Ni
ˆ is a nonnegative matrix with left eigenvector w and 1) W right eigenvector ~1 corresponding to eigenvalue 1; ˆ are in a unit circle; 2) All eigenvalues of W ˆ is a primitive matrix5 . 3) W
0 0, > 0 and X ˆ max wj ≤ 1, ∀i. Since w is the left eigenvector of L T
i
j∈Ni
9
9
1 0.9
8
0.8
6 5 Node 1
4
Node 2 Node 3
3 2
Probability of detection
State Value
7 0.7 0.6 0.5 0.4 0.3
Node 4
0.2
Node 5
0.1
Proposed approach Equal gain combining Cuttoff combining
Node 6 1
Figure 5.
5
10
15 Iteration Steps
20
25
30
Convergence of the network with a 6-nodes ( = 0.3).
0
Figure 6.
0
0.2
0.4 0.6 Probability of false alarm
N
max
1 ,{δiH }N {δiB }i=1 i=N
s.t.
1
(µ1 − µ0 )2 2 σ(0) +1
N1 X
δiB
+
Next, to address the second issue discussed in Section IV, we exploit the statistical distribution of the sensing data and devise techniques to mitigate the influence of Byzantines on the distributed detection system. We propose a three-tier mitigation scheme where the following three steps are performed at each node: 1) identification of Byzantine neighbors, 2) estimation of parameters of identified Byzantine neighbors, and 3) adaptation of consensus algorithm (or update weights) using estimated parameters. We first present the design of distributed optimal weights for the honest/Byzantine nodes assuming that the identities of the nodes are known. Later we will explain how the identity of nodes (i.e., honest/Byzantine) can be determined. 1) Design of Distributed Optimal Weights in the Presence of Byzantines: In this subsection, we derive closed form expressions for the distributed optimal weights which maximize the deflection coefficient. First, we consider the P PN 1
w B Y˜i +
N
w H Yi
i i=1 P i=N1 +1 i where global test statistic Λ = w P PN1 B PN H w = i=1 wi + i=N1 +1 wi and obtain a closed form solution for optimal centralized weights. Then, we extend our analysis to the distributed scenario. Let us denote by δiB , the centralized weight given to the Byzantine node and by δiH , the centralized weight given to the Honest P P node. By considering δiB = wiB / w and δiH = wiH / w, the optimal weight design problem can be stated formally as:
N X
δiH = 1
i=N1 +1
i=1
B. Adaptive Design of the Update Rules based on Learning of Nodes’ Behavior
1
ROC for different protection approaches
consensus has been reached on the global decision statistics, the weighted average of the initial values (states). In the proposed consensus algorithm, weights given to node i’s data are updated by neighbors of the node i rather than by node i itself which addresses the first issue discussed in Section IV.
0.8
2 where µ1 , µ0 and σ(0) are given in (3), (4) and (5), respectively. The solution of the above problem is presented in the next lemma.
Lemma 4. Optimal centralized weights which maximize the deflection coefficient are given as δiB
=
wiB N1 P
wiB
N1 P
wiB +
i=1
δiH
=
i=1
where wiB =
+
wiH
N P
wiH
i=N1 +1
wiH
,
N P
i=N1 +1
(ηi σi2
ηi − 2Pi ∆i ) and wiH = . ∆2i Pi (1 − Pi ) + 2M σi4 2M σi2
Proof: The above results can be obtained by setting the derivative of the deflection coefficient equal to zero and solving the equation. Note that, the constraint is trivially satisfied by normalizing the obtained weights as the value of deflection coefficient is unchanged after normalization. The optimality of the weights in Lemma 4 can also be verified by upper bounding the expression of deflection coefficient using the Cauchy-Schwarz inequality and observing that this upper bound is achieved by the weights in Lemma 4. Remark 1. Distributed optimal weights can be chosen as wiB and wiH . Thus, the value of the global test statistic (or final weighted average consensus) is the same as the optimal
10
2) Identification, Estimation, and Adaptive Fusion Rule: The first step at each node m is to determine the identity (I i ∈ {H, B}) of its neighboring nodes i ∈ Nm . Notice that, if node i is an honest node, its data under hypothesis Hk is normally distributed N ((µ1k )i , (σ1k )2i ). On the other hand, if node i is a Byzantine node, its data under hypothesis Hk is a Gaussian mixture which comes from N ((µ1k )i , (σ1k )2i ) with probability (αi1 = 1 − Pi ) and from N ((µ2k )i , (σ2k )2i ) with probability αi2 = Pi . Therefore, determining the identity (I i ∈ {H, B}) of neighboring nodes i ∈ Nm can be posed as a hypothesis testing problem:
1
Probability of Detection ’Pd’
0.95 0.9 0.85 0.8 Point where Deflection Coefficient is zero
0.75 0.7 0.65 0.6
Figure 7.
Byzantine Data Included Byzantine Data Excluded 0
20
40 60 Attack Strength ’∆’
80
100
Probability of Detection as a function of attack strength
centralized weighted combining scheme7 . Next, to gain insights into the solution, we present some numerical results in Figure 6 that corroborate our theoretical results. We assume that M = 12, ηi = 3, σi2 = 0.5 and the attack parameters are (Pi , ∆i ) = (0.5, 9). In Figure 6, we compare our proposed weighted average consensus-based detection scheme with the equal gain combining scheme8 and the scheme where Byzantines are excluded from the fusion process. It can be clearly seen from the figure that our proposed scheme performs better than the rest of the schemes. Next, in Figure 7, we plot the probability of detection of the proposed scheme as a function of the attack strength ∆ for Pi = 1. The threshold λ is chosen to constrain the probability of false alarm below a constant δ = 0.01. Also, we assume that M = 12, ηi = 10, σi2 = 6. It can be seen from the figure that the worst detection performance is when the deflection coefficient is zero which also implies that the attacker is not providing any information or is excluded from the data fusion process. The reason for this is that when the deflection coefficient is non-zero, the information provided by the attacker is also non-zero which is being utilized by the proposed scheme to improve the detection performance. Notice that, the optimal weights for the Byzantines are functions of the attack parameters (Pi , ∆i ), which may not be known to the neighboring nodes in practice. In addition, the parameters of the honest nodes might also not be known. Therefore, we propose a technique to learn or estimate these parameters. We then use these estimates to adaptively design the local fusion rule which are updated after each learning iteration. 7 Note that, weights w B can be negative and in that case convergence of i the proposed algorithm is not guaranteed. However, this situation can be dealt off-line by adding a constant value to make wiB ≥ 0 and changing the threshold λ accordingly. More specifically, by choosing a constant c such that wiB + x c(0) ≥ 0, ∀i and λ ← λ + βc where β is the number of i
nodes with wiB < 0. 8 In equal gain combining scheme, all the nodes (including Byzantines) are given the same weight.
I0 (I i = H) : Yi is generated from a Gaussian distribution under each hypothesis Hk ; I1 (I i = B) : Yi is generated from a Gaussian mixture distribution under each hypothesis Hk . Node classification can then be achieved using the maximum likelihood decision rule: f (Yi | I0 )
H
≷
f (Yi | I1 )
(7)
B
where f (Yi | Il ) is the probability density function (PDF) of Yi under each hypothesis Il . However, the parameters of the distributions are not known. Next, we propose a technique to learn these parameters. For an honest node i, the parameters to be estimated are ((µ1k )i , (σ1k )2i ) and for Byzantines the unknown parameter set to be estimated is θ = {αij , (µjk )i , (σjk )2i }, where k = {0, 1}, j = {1, 2} and i = 1, · · · , Nm , for Nm neighboring nodes. These parameters are estimated by observing the data over multiple learning iterations. In each learning iteration t, each node in the network employs the data coming from their neighbors for D detection intervals to learn their respective parameters. It is assumed that each node has the knowledge of the past D hypothesis test results (or history) through a feedback mechanism.9 Also, notice that the learning is done in a separate learning phase and is not a part of consensus iterations. First, we explain how the unknown parameter set for the distribution under the null hypothesis (I0 ) can be estimated. Let us denote the data coming from an honest neighboring node i as Yi (t) = [yi0 (1), · · · , yi0 (D1 (t)), yi1 (D1 (t)+1), · · · , yi1 (D)] where D1 (t) denotes the number of times H0 occurred in learning iteration t and yik denotes the data of node i when the true hypothesis was Hk . To estimate the parameter set, ((µ1k )i , (σ1k )2i ), of an honest neighboring node, one can employ a maximum likelihood based estimator (MLE). We use ((ˆ µ1k )i (t), (ˆ σ1k )2i (t)) to denote the estimates at learning iteration t, where each learning iteration consists of D detection intervals. The ML estimate of ((µ1k )i , (σ1k )2i ) can be written in a recursive form as following: 9 Note that, there exist several applications where this assumption is valid, e.g., cognitive radio networks, sensor networks, etc. Also, the proposed method can be extended to the scenarios where the knowledge of the true hypothesis is not available at the cost of analytical tractability.
11
(ˆ σ10 )2i (t
+ 1) =
t P
D1 (r)[(ˆ σ10 )2i (t) + ((ˆ µ10 )i (t + 1) − (ˆ µ10 )i (t))2 ] +
r=1
D1P (t+1)
[yi0 (d) − (ˆ µ10 )i (t + 1)]2
d=1
t+1 P
(8)
D1 (r)
r=1
(ˆ σ11 )2i (t
+ 1) =
t P
D−DP 1 (t+1)
[yi1 (d) − (ˆ µ11 )i (t + 1)]2
(D − D1 (r))[(ˆ σ11 )2i (t) + ((ˆ µ11 )i (t + 1) − (ˆ µ11 )i (t))2 ] +
r=1
d=1
t+1 P
(9)
(D − D1 (r))
r=1
(ˆ µ10 )i (t + 1) =
(ˆ µ11 )i (t + 1)
=
t P
r=1 t+1 P
D1 (r) (ˆ µ10 )i (t) + D1 (r)
r=1 t P
1 t+1 P
D1 (t+1)
X
yi0 (d)(10)
n=1
(ˆ µ11 )i (t) +
(D − D1 (r)) 1 (D − D1 (r))
D X
θ
yi1 (d)(11)
d=D1 (t+1)
First, we maximize Q(θ, θl ) subject to the constraint
r=1
where expressions for (σˆ10 )2i and (σˆ11 )2i are given in (8) and (9), respectively. Observe that, by writing these expressions in a recursive manner, we need to store only D data samples at any given learning iteration t, but effectively use all tD data samples to determine the estimates. Next, we explain how the unknown parameter set for the distribution under the alternate hypothesis (I1 ) can be estimated. Since the data is distributed as a Gaussian mixture, we employ the expectation-maximization (EM) algorithm to estimate the unknown parameter set for Byzantines. Let us denote the data coming from a Byzantine neighbor i as ˜ i (t) = [˜ Y yi0 (1), · · · , y˜i0 (D1 (t)), y˜i1 (D1 (t) + 1), · · · , y˜i1 (D)] where D1 (t) denotes the number of times H0 occurred in learning iteration t and y˜ik denotes the data of node i when the true hypothesis was Hk . Let us denote the hidden variable as zj with j = {1, 2} or (Z = [z1 , z2 ]). Now, the joint conditional PDF of y˜ik and zj , given the parameter set is =
P (zj |˜ yik (d), θ)P (˜ yik (d)|(µjk )i , (σjk )2i )
=
yik (d)|(µjk )i , (σjk )2i ) αij P (˜
In the expectation step of EM, we compute the expectation of the log-likelihood function with respect to the hidden ˜ i , and the current variables zj , given the measurements Y estimate of the parameter set θl . This is given by Q(θ, θ l )
=
˜ i , Z|θ)|Y ˜ i , θl ] E[log P (Y
=
2 DX 1 (t) X
2 X
D X
1. We define the Lagrangian L as 2 X L = Q(θ, θl ) + λ αij − 1 . Now, we equate the derivative of L to zero: d L = λ+ dαij
DP 1 (t) d=1
P (zj |˜ yi0 (d), θl ) +
αij
D P
d=D1 (t)+1
P (zj |˜ yi1 (d), θl ) = 0.
αij
Multiplying both sides by αij and summing over j gives λ = −D. Similarly, we equate the derivative of Q(θ, θl ) with respect to (µjk )i and (σk )2i to zero. Now, an iterative algorithm for all the parameters is αij (l
D1 (t) 1 X + 1) = P (zj |˜ yi0 (d), θ l ) + D d=1
(µj0 )i (l + 1) =
DP 1 (t) d=1
DP 1 (t) d=1
(µj1 )i (l + 1) =
(14)
d=D1 (t)+1 D P
P (zj |˜ yi1 (d), θ l )˜ yi1 (d) (15)
d=D1 (t)+1
(σj0 )2i (l
+ 1) =
DP 1 (t)
j=1 d=1
P (zj |˜ yi1 (d), θ l )
P (zj |˜ yi0 (d), θ l )(˜ yi0 (d) − (µj0 )i (l + 1))2 2 P
(σj1 )2i (l + 1) =
2 P
d=D1 (t)+1
P (zj |˜ yi1 (d), θ l )(13)
P (zj |˜ yi0 (d), θ l )
D P
2 P
D X
P (zj |˜ yi0 (d), θ l )˜ yi0 (d)
DP 1 (t)
j=1 d=1
j=1 d=D1 (t)+1
αij =
j=1
yi0 (d), θ l )] yi0 (d)|(µj0 )i , (σj0 )2i )P (zj |˜ log[αij P (˜ yi1 (d), θ l )] yi1 (d)|(µj1 )i , (σj1 )2i )P (zj |˜ log[αij P (˜
2 P
j=1
j=1 d=1
+
(12)
θl+1 = arg max Q(θ, θl ).
r=1
P (˜ yik (d), zj |θ)
.
αin (l)P (˜ yik (d)|(µnk )i (l), (σnk )2i (l))
In the maximization step of the EM algorithm, we maximize Q(θ, θl ) with respect to the parameter set θ so as to compute the next parameter set:
(D − D1 (r))
t+1 P
αij (l)P (˜ yik (d)|(µjk )i (l), (σjk )2i (l)) 2 P
D1 (r) d=1
r=1
r=1 t+1 P
P (zj |˜ yik (d), θ l ) =
D P
j=1 d=D1 (t)+1
(16) P (zj |˜ yi0 (d), θ l )
P (zj |˜ yi1 (d), θ l )(˜ yi1 (d) − (µj1 )i (l + 1))2
2 P
D P
j=1 d=D1 (t)+1
(17) P (zj |˜ yi1 (d), θ l )
12
combining scheme approaches the detection performance of weighted gain combining with known optimal weight based scheme. Note that, the above learning based scheme can be used in conjunction with the proposed weighted average consensusbased algorithm to mitigate the effect of Byzantines.
1 0.9 0.8 0.7
Pd
0.6 0.5
VI. C ONCLUSION
0.4 Learning Iteration 1 Learning Iteration 2 Learning Iteration 3 Learning Iteration 4 Optimal weights
0.3 0.2 0.1 0
0
0.2
0.4
0.6
0.8
1
P
f
Figure 8.
ROC for different learning iterations
In the learning iteration t, let the estimates after the ˆ convergence of the above algorithm be denoted by θ(t) = {α ˆ ij (t), (ˆ µjk )i (t), (ˆ σjk )2i (t)}. These estimates are then used as the initial values for the next learning iteration t + 1 that uses a new set of D data samples. After learning the unknown parameter set under I0 and I1 , node classification can be achieved using the following maximum likelihood decision rule: fˆ(Yi | I0 )
H
≷
fˆ(Yi | I1 )
(18)
B
where fˆ(·) is the PDF based on estimated parameters. Using the above estimates and node classification, the optimal distributed weights for honest nodes after learning iteration t can be written as wiH (t) =
(ˆ µ11 )i (t) − (ˆ µ10 )i (t) . (ˆ σ10 )2i (t)
(19)
Similarly, the optimal distributed weights for Byzantines after learning iteration t can be written as
wiB (t) =
2 P
j=1
α ˆij (t)[(µj1 )i (t) − (ˆ µj0 )i (t)] α ˆi1 (t)ˆ αi2 (t)X + Y
(20)
where X = ((ˆ µ10 (t))i − (ˆ µ20 (t))i )2 and Y = 2 2 i i α ˆ 1 (t) (ˆ σ10 )i (t) + α ˆ 2 (t) (ˆ σ20 )i (t) . Next, we present some numerical results in Figure 8 to evaluate the performance of our proposed scheme. Consider the scenario where 6 nodes organized in an undirected graph (as shown in Figure 1) are trying to detect a phenomenon. Node 1 and node 2 are considered to be Byzantines. We assume that ((µ10 )i , (σ10 )2i ) = (3, 1.5), ((µ11 )i , (σ11 )2i ) = (4, 2) and the attack parameters are (Pi , ∆i ) = (0.5, 9). In Figure 8, we plot ROC curves for different number of learning iterations. For every learning iteration, we assume that D1 = 10 and D = 20. It can be seen from Figure 8 that within 4 learning iterations, detection performance of the learning based weighted gain
AND FUTURE WORK
In this paper, we analyzed the security performance of conventional consensus-based algorithms in the presence of data falsification attacks. We showed that above a certain fraction of Byzantine attackers in the network, existing consensusbased detection algorithm are ineffective. Next, we proposed a robust distributed weighted average consensus algorithm and devised a learning technique to estimate the operating parameters (or weights) of the nodes. This enables an adaptive design of the local fusion or update rules to mitigate the effect of data falsification attacks. We demonstrated that the proposed scheme, which uses the information of the identified Byzantines to network’s benefit, outperforms exclusion based approaches where the only defense is to identify and exclude the attackers from the consensus process. There are still many interesting questions that remain to be explored in the future work such as an analysis of the problem for time varying topologies. Note that, some analytical methodologies used in this paper are certainly exploitable for studying the attacks in time varying topologies. Other questions such as the optimal topology so as to result in the fastest convergence rate and the problem with covert data falsification attacks with a smart adversary who disguises himself from the proposed detection scheme while accomplishing its attack can also be investigated. Also, in this paper, we have assumed that the hypothesis does not change during the consensus iterations. One interesting direction to consider in the future is to study the problem where hypotheses are allowed to change during the information fusion phase. A PPENDIX A The local test statistic Yi has the mean M σi2 if H0 meani = (M + ηi )σi2 if H1 and the variance V ari =
2M σi4 2(M + 2ηi )σi4
if H0 if H1 .
The goal of Byzantine nodes is to make the deflection coefficient as small as possible. Since the Deflection Coefficient is always non-negative, the Byzantines seek to make D(Λ) = (µ1 − µ0 )2 = 0. The conditional mean µk = E[Λ|Hk ] and 2 σ(0) 2 conditional variance σ(0) = E[(Λ − µ0 )2 |H0 ] of the global P P PN1 w), can test statistic, Λ = ( i=1 w ˜i Y˜i + N i=N1 +1 wi Yi )/( be computed and are given by (3), (4) and (5), respectively. After substituting values from (3), (4) and (5), the condition to make D(Λ) = 0 becomes
13
[3] V. Veeravalli and P. K. Varshney, “Distributed inference in wireless sensor networks,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 370, pp. 100– w ˜i (2Pi ∆i − = 117, 2012. i=1 i=N1 +1 [4] R. Olfati-Saber, J. Fax, and R. Murray, “Consensus and Cooperation in Networked Multi-Agent Systems,” Proceedings of the IEEE, vol. 95, A PPENDIX B no. 1, pp. 215–233, Jan 2007. [5] W. Zhang, Z. Wang, Y. Guo, H. Liu, Y. Chen, and J. Mitola, “Distributed Note that, for sufficiently large M (in practice M ≥ 12), Cooperative Spectrum Sensing Based on Weighted Average Consensus,” the distribution of Byzantine’s data Y˜i given Hk is a Gaussian in Global Telecommunications Conference (GLOBECOM 2011), 2011 2 IEEE, Dec 2011, pp. 1–6. mixture which comes from N ((µ1k )i , (σ1k )i ) with probability [6] F. Pasqualetti, A. Bicchi, and F. Bullo, “Consensus Computation in Un(1 − P ) and from N ((µ2k )i , (σ2k )2i ) with probability P , and reliable Networks: A System Theoretic Approach,” Automatic Control, IEEE Transactions on, vol. 57, no. 1, pp. 90–104, Jan 2012. (µ10 )i = M σi2 , (µ20 )i = M σi2 + ∆i [7] G. Xiong and S. Kishore, “Consensus-based distributed detection algorithm in wireless ad hoc networks,” in Signal Processing and Commu(µ11 )i = (M + ηi )σi2 , (µ21 )i = (M + ηi )σi2 − ∆i nication Systems, 2009. ICSPCS 2009. 3rd International Conference on, Sept 2009, pp. 1–6. (σ10 )2i = (σ20 )2i = 2M σi4 , and (σ11 )2i = (σ21 )2i = 2(M +2ηi )σi4 .[8] S. Aldosari and J. Moura, “Distributed Detection in Sensor Networks: Connectivity Graph and Small World Networks,” in Signals, Systems t ˜ Yi Now, the probability density function (PDF) of xtji = wji and Computers, 2005. Conference Record of the Thirty-Ninth Asilomar Conference on, Oct 2005, pp. 230–234. conditioned on Hk can be derived as [9] S. Kar, S. Aldosari, and J. Moura, “Topology for Distributed Inference t t on Graphs,” Signal Processing, IEEE Transactions on, vol. 56, no. 6, f (xtji |Hk ) = (1 − P )φ(wji (µ1k )i , (wji (σ1k )i )2 ) pp. 2609–2613, June 2008. t t +P φ(wji (µ2k )i , (wji (σ2k )i )2 ) (21) [10] F. Yu, M. Huang, and H. Tang, “Biologically inspired consensus-based spectrum sensing in mobile Ad Hoc networks with cognitive radios,” Network, IEEE, vol. 24, no. 3, pp. 26–30, May 2010. where φ(x|µ, σ 2 ) (for notational convenience denoted as [11] D. Bajovic, D. Jakovetic, J. Xavier, B. Sinopoli, and J. M. F. Moura, 2 2 2 φ(µ, σ )) is the PDF of X ∼ N (µ, σ ) and φ(x|µ, σ ) = “Distributed detection via gaussian running consensus: Large deviations 2 2 asymptotic analysis,” IEEE Transactions on Signal Processing, vol. 59, √1 e−(x−µ) /2σ . σ 2π 9, pp. 4381–4396, Sept 2011. Now, for the three node case, the transient test statistic [12] no. A. Sayed, “Adaptation, learning, and optimization over networks,” ˜ t = wt Y˜1 + wt Y˜2 + wt Y3 , is a summation of independent Λ Found. Trends Mach. Learn., vol. 7, no. 4-5, pp. 311–801, Jul. 2014. j3 j2 j j1 t t ˜ [Online]. Available: http://dx.doi.org/10.1561/2200000051 Yi is random variables. The conditional PDF of Xji = wji [13] V. Matta, P. Braca, S. Marano, and A. H. Sayed, “Exact asymptotics of t ˜ given in (21). Notice that, PDF of Λj is the convolution (∗) of distributed detection over adaptive networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, f (xtj1 ) = (1−P )φ(µ11 , (σ11 )2 )+P φ(µ21 , (σ12 )2 ), f (xtj2 ) = (1− 2015, pp. 3377–3381. P )φ(µ12 , (σ21 )2 ) + P φ(µ22 , (σ22 )2 )) and f (xtj3 ) = φ(µ13 , (σ31 )2 ). [14] April L. Lamport, R. Shostak, and M. Pease, “The byzantine generals problem,” ACM Trans. Program. Lang. Syst., vol. 4, no. 3, pp. 382–401, Jul. f (zjt ) = f (xtj1 ) ∗ f (xtj2 ) ∗ f (xtj3 ) 1982. [Online]. Available: http://doi.acm.org/10.1145/357172.357176 [15] A. Fragkiadakis, E. Tragos, and I. Askoxylakis, “A survey on security f (zjt ) = [(1 − P )φ(µ11 , (σ11 )2 ) + P φ(µ21 , (σ12 )2 )]∗ threats and detection techniques in cognitive radio networks,” IEEE Communications Surveys Tutorials, vol. 15, no. 1, pp. 428–445, 2013. 1 1 2 2 2 2 1 1 2 [(1 − P )φ(µ2 , (σ2 ) ) + P φ(µ2 , (σ2 ) )] ∗ φ(µ3 , (σ3 ) ) [16] H. Rif`a-Pous, M. J. Blasco, and C. Garrigues, “Review of robust cooperative spectrum sensing techniques for cognitive radio networks,” 1 1 2 1 2 2 1 1 2 1 = (1 − P ) [φ(µ1 , (σ1 ) ) ∗ φ(µ2 , (σ2 ) ) ∗ φ(µ3 , (σ3 ) )] Wirel. Pers. Commun., vol. 67, no. 2, pp. 175–198, Nov. 2012. [Online]. Available: http://dx.doi.org/10.1007/s11277-011-0372-x [17] S. Marano, V. Matta, and L. Tong, “Distributed detection in the presence +(P )2 [φ(µ21 , (σ12 )2 ) ∗ φ(µ22 , (σ22 )2 )) ∗ φ(µ13 , (σ31 )2 )] of byzantine attacks,” IEEE Trans. Signal Process., vol. 57, no. 1, pp. 16 –29, Jan. 2009. +P (1 − P )[φ(µ21 , (σ12 )2 ) ∗ φ(µ12 , (σ21 )2 ) ∗ φ(µ13 , (σ31 )2 )] [18] A. Rawat, P. Anand, H. Chen, and P. Varshney, “Collaborative spectrum sensing in the presence of byzantine attacks in cognitive radio networks,” +(1 − P )P [φ(µ11 , (σ11 )2 ) ∗ φ(µ22 , (σ22 )2 ) ∗ φ(µ13 , (σ31 )2 )] IEEE Trans. Signal Process., vol. 59, no. 2, pp. 774 –786, Feb 2011. Now, using the fact that convolution of two normal PDFs [19] B. Kailkhura, S. Brahma, Y. S. Han, and P. K. Varshney, “Distributed Detection in Tree Topologies With Byzantines,” IEEE Trans. Signal φ(µi , σi2 ) and φ(µj , σj2 ) is again normally distributed with Process., vol. 62, pp. 3208–3219, June 2014. mean (µi + µj ) and variance (σi2 + σj2 ), we can derive the [20] ——, “Optimal Distributed Detection in the Presence of Byzantines,” in Proc. The 38th International Conference on Acoustics, Speech, and results below. Signal Processing (ICASSP 2013), Vancouver, Canada, May 2013. [21] A. Vempaty, K. Agrawal, H. Chen, and P. K. Varshney, “Adaptive f (zjt ) = (1 − P )2 [φ(µ11 + µ12 + µ13 , (σ11 )2 + (σ21 )2 + (σ31 )2 )] learning of Byzantines’ behavior in cooperative spectrum sensing,” in Proc. IEEE Wireless Comm. and Networking Conf. (WCNC), March +P 2 [φ(µ21 + µ22 + µ13 , (σ12 )2 + (σ22 )2 + (σ31 )2 )] 2011, pp. 1310 –1315. [22] B. Kailkhura, Y. Han, S. Brahma, and P. Varshney, “On covert data fal2 1 1 2 2 1 2 1 2 +P (1 − P )[φ(µ1 + µ2 + µ3 , (σ1 ) + (σ2 ) + (σ3 ) )] sification attacks on distributed detection systems,” in 13th International Symposium on Communications and Information Technologies (ISCIT), +(1 − P )P [φ(µ11 + µ22 + µ13 , (σ11 )2 + (σ22 )2 + (σ31 )2 )]. 2013, Sept 2013, pp. 412–417. [23] B. Kailkhura, S. Brahma, and P. Varshney, “On the performance analysis of data fusion schemes with byzantines,” in 2014 IEEE International R EFERENCES Conference on Acoustics, Speech and Signal Processing (ICASSP), May [1] P. K. Varshney, Distributed Detection and Data Fusion. New 2014, pp. 7411–7415. York:Springer-Verlag, 1997. [24] A. Min, K.-H. Kim, and K. Shin, “Robust cooperative sensing via state [2] R. Viswanathan and P. K. Varshney, “Distributed detection with multiple estimation in cognitive radio networks,” in New Frontiers in Dynamic sensors: Part I - Fundamentals,” Proc. IEEE, vol. 85, no. 1, pp. 54 –63, Spectrum Access Networks (DySPAN), 2011 IEEE Symposium on, May Jan 1997. 2011, pp. 185–196. N1 X
ηi σi2 )
N X
wi ηi σi2
14
[25] H. Tang, F. Yu, M. Huang, and Z. Li, “Distributed consensus-based security mechanisms in cognitive radio mobile ad hoc networks,” Communications, IET, vol. 6, no. 8, pp. 974–983, May 2012. [26] F. Yu, H. Tang, M. Huang, Z. Li, and P. Mason, “Defense against spectrum sensing data falsification attacks in mobile ad hoc networks with cognitive radios,” in Military Communications Conference, 2009. MILCOM 2009. IEEE, Oct 2009, pp. 1–7. [27] F. Yu, M. Huang, and H. Tang, “Biologically inspired consensus-based spectrum sensing in mobile Ad Hoc networks with cognitive radios,” Network, IEEE, vol. 24, no. 3, pp. 26–30, May 2010. [28] S. Liu, H. Zhu, S. Li, X. Li, C. Chen, and X. Guan, “An adaptive deviation-tolerant secure scheme for distributed cooperative spectrum sensing,” in Global Communications Conference (GLOBECOM), 2012 IEEE, Dec 2012, pp. 603–608. [29] Q. Yan, M. Li, T. Jiang, W. Lou, and Y. Hou, “Vulnerability and protection for distributed consensus-based spectrum sensing in cognitive radio networks,” in INFOCOM, 2012 Proceedings IEEE, March 2012, pp. 900–908. [30] Z. Li, F. Yu, and M. Huang, “A Distributed Consensus-Based Cooperative Spectrum-Sensing Scheme in Cognitive Radios,” Vehicular Technology, IEEE Transactions on, vol. 59, no. 1, pp. 383–393, Jan 2010. [31] F. Digham, M.-S. Alouini, and M. K. Simon, “On the energy detection of unknown signals over fading channels,” in Communications, 2003. ICC ’03. IEEE International Conference on, vol. 5, May 2003, pp. 3575– 3579 vol.5. [32] F. F. Digham, M. S. Alouini, and M. K. Simon, “On the energy detection of unknown signals over fading channels,” in IEEE International Conference on Communications, 2003. ICC ’03., vol. 5, May 2003, pp. 3575–3579 vol.5. [33] S. Sundaram and C. Hadjicostis, “Distributed Function Calculation via Linear Iterative Strategies in the Presence of Malicious Agents,” Automatic Control, IEEE Transactions on, vol. 56, no. 7, pp. 1495– 1508, July 2011. [34] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume 2: Detection Theory. ser. Prentice Hall Signal Processing Series, A. V. Oppenheim, Ed. Prentice Hall PTR, 1998. [35] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1987.
Bhavya Kailkhura (S’12) received the M.S. degree in electrical engineering from Syracuse University, Syracuse, NY. Since 2012, he has been pursuing the Ph.D. degree with the Department of Electrical Engineering and Computer Science, Syracuse University. His research interests include high-dimensional data analysis, signal processing, machine learning and their applications to solve inference problems with security and privacy constraints.
Swastik Brahma (S’09-M’14) received the B.Tech. degree in computer science and engineering from West Bengal University of Technology, Kolkata, India, and the M.S. and Ph.D. degrees in computer science from the University of Central Florida, Orlando, FL, USA, in 2005, 2008, and 2011, respectively. In 2011, he joined the Department of Electrical Engineering and Computer Science (EECS), Syracuse University, Syracuse, NY, USA, as a Research Associate, where he is currently a Research Scientist. His research interests include communication systems, detection and estimation theory, cybersecurity, and game theory. He serves as a technical program committee member for several conferences. He was the recipient of the Best Paper Award in the IEEE GLOBECOM Conference in 2008, and also the recipient of the Best Ph.D. Forum Award in the IEEE WoWMoM Conference in 2009. Pramod K. Varshney (S’72-M’77-SM’82-F’97) was born in Allahabad, India, on July 1, 1952. He received the B.S. degree in electrical engineering and computer science (with highest hons.), and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign in 1972, 1974, and 1976 respectively. From 1972 to 1976, he held teaching and research assistantships at the University of Illinois. Since 1976, he has been with Syracuse University, Syracuse, NY, where he is currently a Distinguished Professor of Electrical Engineering and Computer Science and the Director of CASE: Center for Advanced Systems and Engineering. He served as the Associate Chair of the department from 1993 to 1996. He is also an Adjunct Professor of Radiology at Upstate Medical University, Syracuse, NY. His current research interests are in distributed sensor networks and data fusion, detection and estimation theory, wireless communications, image processing, radar signal processing, and remote sensing. He has published extensively. He is the author of Distributed Detection and Data Fusion (Springer-Verlag, 1997). He has served as a consultant to several major companies. Dr. Varshney was a James Scholar, a Bronze Tablet Senior, and a Fellow while at the University of Illinois. He is a member of Tau Beta Pi and is the recipient of the 1981 ASEE Dow Outstanding Young Faculty Award. He was elected to the grade of Fellow of the IEEE in 1997 for his contributions in the area of distributed detection and data fusion. He was the Guest Editor of the Special Issue on Data Fusion of the PROCEEDINGS OF THE IEEE, January 1997. In 2000, he received the Third Millennium Medal from the IEEE and Chancellor’s Citation for exceptional academic achievement at Syracuse University. He is the recipient of the IEEE 2012 Judith A. Resnik Award, Doctor of Engineering honoris causa from Drexel University in 2014 and ECE Distinguished Alumni Award from the University of Illinois in 2015. He is on the Editorial Boards of the Journal on Advances in Information Fusion and IEEE Signal Processing Magazine. He was the President of International Society of Information Fusion during 2001.