©2010 Old City Publishing, Inc. Published by license under the OCP Science imprint, a member of the Old City Publishing Group
Ad Hoc & Sensor Wireless Networks, Vol. 0, pp. 1–30 Reprints available directly from the publisher Photocopying permitted by license only
Data Aggregation Security Challenge in Wireless Sensor Networks: A Survey Nabila Labraoui1, Mourad Guerroui2, Makhlouf Aliouat3 and Tanveer Zia4 1
STIC University of Tlemcen, Algeria
[email protected] 2 PRISM University of Versailles, France
[email protected] 3 University of Setif, Algeria
[email protected] 4 Charles Sturt University, Australia
[email protected] Received: February 27, 2010. Accepted: October 14, 2010.
Data aggregation in wireless sensor networks (WSN) is a rapidly emerging research area. It can greatly help conserve the scarce energy resources by eliminating redundant data thus achieving a longer network lifetime. However, securing data aggregation in WSN is made even more challenging, by the fact that the sensor nodes and aggregators deployed in hostile environments are exposed to various security threats. In this paper, we survey the current research related to security in data aggregation in wireless sensor networks. We have classified the security schemes studied in two main categories: cryptographic based scheme and trust based scheme. We provide an overview and a comparative study of these schemes and highlight the future research directions to address the flaws in existing schemes. Keywords: Data Aggregation, Security, Sensor Networks, Survey.
1 Introduction Advancements in micro electro mechanical systems (MEMS) [9] and wireless networks have made possible the advent of tiny sensor nodes called “smart dust” which are low cost small tiny devices with limited coverage, low power, smaller memory sizes and low bandwidth [2]. A wireless sensor network (WSN) [5] is an ad-hoc network consisting of a large number of sensor nodes deployed to sense their surrounding environment. Sensor nodes are 1
2
N. Labraoui et al.
usually used to collect and report application-specific data to the monitoring node, known as a “sink node” [1]. WSNs are expected to be solutions to many applications [12,13], such as providing health care for the elderly, surveillance, emergency, and battlefield intelligence data gathering. Along with the attractive features and increasingly important roles, sensor networks however have their inherent limitations. Resource constraints, which is determined by the design goal of small-size and low-cost; security vulnerability, due to the open nature of wireless communication channels and the lack of physical protection of individual sensor nodes which makes it easy for the adversaries to eavesdrop the communication and compromise sensor nodes [3]. Security solutions require high computation, memory, storage and energy resources, which creates an additional challenge when working with tiny sensor nodes [2]. Extensive research has been conducted to address these limitations by developing schemes that can improve resource efficiency and enhance security. In the resource constrained WSN environment, forwarding of large amounts of data becomes the major focus of energy and bandwidth optimization efforts. Data aggregation has thus been put forward as an essential technique to achieve power and bandwidth efficiency in WSN. Based on the principle that the sink does not necessarily need all raw pieces of data collected by each sensor but only a summary or aggregated data thereof, data aggregation is done by aggregators which are comparatively powerful sensor nodes having the ability to aggregate and process data forwarded by source nodes at each intermediate node enrooted to the sink. The data communication constitutes an important share of the total energy consumption of the sensor network; sending one bit requires almost the same amount of energy as executing 50 to 150 instructions [38]. Thus, data aggregation can greatly help conserve the scarce energy resources by eliminating redundant data [6], and achieving a longer network lifetime. Typical aggregation functions include SUM, AVERAGE, MAX/MIN, and so on [4,11]. However, data aggregation in sensor networks is even more challenging by the fact that the sensor nodes and aggregators deployed in hostile environments are exposed to various threats such as node compromise, injection of bogus aggregators, disclosure of sensed data and aggregate values to intruders or tampering with nodes and data transmitted over wireless links. Therefore, the processing and aggregation mechanisms need to be resilient against attacks where the aggregator and a fraction of the sensor nodes may be compromised. There are only two existing survey on secure data aggregation schemes, which are done by Sang et al. [51] and Alzaid [15]. The first survey classifies the frameworks into: hop-by-hop and end-to-end encrypted data aggregation. However, this classification does not detail the security analysis of these schemes nor their performance. The second survey classifies the frameworks into two groups, according to the number of aggregator nodes and whether the integrity of the aggregated result is considered or not. This last is more comprehensive because a conceptual secure data aggregation framework is
Data Aggregation Security
3
proposed to help in comparing secure data aggregation schemes. However, both of these surveys have focused only on solutions based on cryptographic techniques. They completely ignore the recently emerging solutions based on trust and reputation systems, which have been proposed as an attractive complement to cryptography in securing WSNs. In our paper, we present a global survey that includes the problem statement in secure data aggregation and a novel taxonomy of existing secure schemes. We provide an overview and a comparative study of these schemes and highlight the future research directions to address the flaws in existing schemes. Nonetheless, we believe our paper will serve as a useful guide and starting point for the researchers who are interested in conducting research in the secure data aggregation. The rest of the paper is organized as follows. Section 2 introduces the concept of data aggregation in wireless sensor networks. While section 3 introduces the problem statement of secure data aggregation in WSN. Section 4 presents the classification of secure aggregation schemes with description of existing works and Section 5 discusses the performance comparison of the cited schemes. Section 6 highlights future research directions. Concluding remarks are given in Section 7.
2. Data aggregation In a typical sensor network scenario, different nodes collect data from the environment and then send it to some central node or sink which analyze and process the data and then send it to the application. However, in many cases, data produced by different node can be jointly processed while being forwarded to the sink node. Therefore, in-network aggregation deals with this distributed processing of data within the network. Data aggregation techniques explore how the data is to be routed in the network as well as the processing methods that are applied on the packets received by a node. They have a great impact on the energy consumption of nodes and thus on network efficiency by reducing number of transmission or length of packet. Elena Fosolo et al. [59] defined the in-network aggregation process as follows: “In-network aggregation is the global process of gathering and routing information through a multi-hop network, processing data at intermediate nodes with the objective of reducing resource consumption (in particular energy), thereby increasing network lifetime.” 2.1 Approaches There are two approaches for in-network aggregation: with size reduction and without size reduction. In-network aggregation with size reduction refers to the process of combining and compressing the data packets received by a node from its neighbours in order to reduce the packet length to be transmitted or forwarded towards sink. As an example, consider the situation when a
4
N. Labraoui et al.
node receives two packets that have a spatial correlated data. In this case, it is worthless to send both packets. Instead of that, one should apply any function like AVG, MAX, MIN and then send a single packet. This approach considerably reduces the amount of bits transmitted in the network and thus saving a lot of energy but on the other hand, it also reduces the precision of value of data received. In-network aggregation without size reduction refers to the process merging data packets received from different neighbours in to a single data packet but without processing the value of data. As an example, two packets may contain different physical quantities (like temperature and humidity) and they can be merged in to a single packet by keeping both values intact but keeping a single header. This approach preserves the value of data and thus transmit more bits in the network but still reduce the overhead by keeping single header. This of the two approaches to use depends on many factors like the type of application, data rate, network characteristics and so on. There is also a trade-off between energy consumption and precision of data for the two approaches. Most of the work done till now on in-network aggregation mainly deals with problem of forwarding packets from source to sink, to facilitate aggregation therein. Actually the main idea behind were to enhance existing routing protocols such that they can efficiently aggregate data. So till now, most of the data aggregation techniques fall under three categories. They are treebased approaches, multi-path approaches, and cluster-based approaches. There are also some hybrid approaches that combine any of the three techniques above. 2.2 Data Model A major function of sensor networks is to collect data, such as temperature, moisture, location, acceleration, velocity, angle, weight, blood pressure, utility consumption, etc. Usually these sensor readings are expressed as digital values, which have a certain range [dmin; dmax]. For example, an electricity meter reading of a household is usually between 0 to 100 kwh per day, depending on the size and usage pattern. Usually these sensor readings are aggregated at the base station once an hour or once a half hour. The time interval between two data aggregation operation in such network is much larger than the time used for each data aggregation operation. 2.3 Efficiency of Data Aggregation Devices in wireless sensor networks (or sensors) are often resources-limited or energy-constrained. Hence, it is important to design and develop efficient data processing techniques to make effective use of the data. Data aggregation [58] is an efficient mechanism in query processing in which data is processed and aggregated within the network. Only processed and aggregated data is returned to the base (or query) station. In such a setting,
Data Aggregation Security
5
aggregators collect the raw information from the individual nodes, process it locally, and reply to the aggregate queries of a remote user. Compared to the centralized approach where all raw data are returned, data aggregation can achieve a significant reduction in communication overhead and hence save resource consumption and increase the lifetime of wireless sensor networks. As an example, Figure 1 shows a network with 7 nodes. When data is collected without data aggregation, totally 17 transmissions are needed; however, with data aggregation only 7 transmissions are needed. As the network size grows (for example, if the network size is 2500), data collection without data aggregation will consume extremely large bandwidth as shown in [58]. 2.4 Practical applications The sensor network community is investigating several disciplines in which sensor networks might be applicable for various purposes. The following paragraphs discuss some potential applications briefly. We consider the class of applications that utilizes in-network data aggregation. 2.4.1 Military applications Wireless sensor networks can form an integral part of military command, control, communications, computing, intelligence, surveillance, reconnaissance and targeting (C4ISRT) systems. The rapid deployment, self-organization and fault tolerance characteristics of sensor networks make them a very promising sensing technique for military C4ISRT. VigilNet: The VigilNet system is a real-time WSN for military surveillance. The general objective of VigilNet is to alert military command and control units to the occurrence of events of interest in hostile regions. The events of interest are the presence of people, people with weapons, or vehicles. Successful detection, tracking and classification require that the application obtain the current position of an object with acceptable precision and confidence. VigilNet is an operational self-organizing sensor network (of over 200 XSM Mote nodes) to provide tripwire-based surveillance with a
Figure 1 Illustration of benefit from data aggregation.
6
N. Labraoui et al.
sentry-based power management scheme, in order to achieve a minimum 3 to 6 month lifetime [65]. This project is sponsored by DARPA (Defense Advance Research Projects Agency). 2.4.2 Habitat and Environment monitoring Habitat and environmental monitoring represent an important class of sensor network applications that spans large geographic areas and monitor, model and forecast physical processes, such as environmental pollution, flooding etc. Great Duck Island: In the spring of 2002, the Intel Research Laboratory at Berkeley initiated collaboration with the College of the Atlantic in Bar Harbour and the University of California at Berkeley to deploy wireless sensor networks on Great Duck Island, Maine. These networks monitor the microclimates in and around nesting burrows used by the Leach’s Storm Petrel [61]. Thirty-two motes were placed at area of interest (e.g., inside a burrows). Those motes, grouped into sensor patches, transmit sensor reading to a gateway(CerfCube),which is responsible for aggregating and forwarding the data from the sensor patch to a remote base station through a local transmit network. The base station then provides data logging and replicates the data every 15 minutes to a Postgress database in Berkeley over satellite link. Users can interact with the sensor network in two ways. Remote users can access the replica data base server in Berkeley, a small PDA-size device can be used to perform local interactions such as adjusting the sampling rates, power management parameters etc. CORIE: CORIE is a prototype of EOFS for Columbia River. Thirteen stationary sensor nodes are deployed across the Columbia River estuary; one mobile sensor station drifts offshore. Those sensor stations are usually fixed on a pier or a buoy. The stationary stations are powered by power grid, while the mobile station uses solar panel to harness solar energy. Sensor data are transmitted via wireless link toward onshore master stations; they are then further forwarded to a centralized server and fed into a computationally intensive physical environment model. The output of the model is used to guide vessel transportation and forecasting. ALERT: Automated Local Evaluation in Real-Time (ALERT [62]) is probably the first well-known wireless sensor network being deployed in real world. It was developed by the National Weather Service in the 1970’s. ALERT provides important real-time rainfall and water level information to evaluate the possibility of potential flooding. ALERT sensor sites are usually equipped with meteorological/hydrological sensors, such as water level sensors, temperature sensors, and wind sensors, data are transmitted via light-ofsight radio communication from the sensor site to the base station, a Flood Forecast Model is adopted to process those data and issue automatic warning, web-based query is available. Currently ALERT is deployed across most of
Data Aggregation Security
7
the western United States; it is heavily used for flood alarming in California and Arizona. PODS-A Remote Ecological Micro-Sensor Network: PODS [63] is a research project in University of Hawaii that built wireless network of environmental sensor to investigate why endangered species of plants will grow in one area but not in neighbouring areas. They deployed camouflaged sensors node, called Pods, in Hawaii Volcanoes National Park. Two types of sensor data are collected, weather data are collected every ten minutes and image data are collected once per hour, users can use Internet to access the data from a server in University of Hawaii at Manoa. GLACSWEB monitoring system: The GlacsWeb project is developing a monitoring system for a glacial environment that will be transferable to other remote locales both on Earth and in space. GlacsWeb focuses on the ongoing research area of sub glacial bed deformation—how the ground on which glaciers rest affects their movement. To accurately study this environment, the system must autonomously record glaciers over a reasonable geographic area and a relatively long time. The GlacsWeb system consists of probes inserted in the glacier, a base station on the glacier surface, and a reference station that relays data to the SNS in Southampton, England. Most of the nine probes, deployed in 2003, are at the ice-sediment boundary from 50 to 80 meters deep. Each probe is equipped with pressure, temperature, and orientation (tilt in three dimensions) sensors. FireWxNet: FireWxNet is a multi-tiered portable wireless system for monitoring weather conditions in rugged wild land fire environments. FireWxNet provides fire fighting community with the ability to safely and easily measure and view fire and weather conditions over a wide range of locations and elevations within forest fires [66]. FireWxNet was deployed in the summer of 2005 in Montana and Colorado. 2.4.3 The Wine Industry One of the most popular early deployments of Sensor Networks is in the Wine Production Industry [15]. However, there are other successful examples in different parts of the world, such as how Pickberry Vineyard in Sonoma, California was able to manage its crop growing problems through the use of a Sensor Network provided by Accenture Labs [16]. From the perspective of vine, growing sensors can be used to monitor soil moisture, rainfall, wind velocity and direction, and air and soil temperature. The monitoring of these factors can be critical in the cost management for a vineyard. For example, detecting frost in a timely manner can prevent loss of a crop, all the more important since, to take the Pickberry example, each ton of grapes can be worth US$1000 to US$4000. However, while all the focus is currently on using Sensor Networks for the growing of grapes this is merely one of the stages in wine production for which
8
N. Labraoui et al.
intelligent sensors can play a role. For example, temperature must be strictly controlled during the vinification process (i.e. the conversion of grape juice into wine).
3 Problem statement: Secure Data Aggregation 3.1 Security Requirement in WSN aggregation In-network data processing, such as data fusion and aggregation [6], has emerged in the recent years as an active research area in WSN. One of the important issues related to data aggregation is to find a realistic balance between computational overhead, delay, data resolution and trustworthiness [7]. This section describes the required security primitives to strengthen the security in aggregation schemes. Data Integrity: This property ensures that the content of a message has not been altered, during transmission process. An adversary near the aggregator point will be able to change the aggregated result sent to the sink by adding some fragments or manipulating the packet’s content without detection. Moreover, even without the existence of an adversary, data might be damaged or lost due to the wireless environment. Data confidentiality: It is also essential to prevent leakage of sensitive data. If, for example, a network was responsible for monitoring a military target for planning a surprise attack, then it would be necessary to ensure that the privacy of the information is preserved so that the target does not become aware of the ensuing plans. For this reason, a sensor network that uses data aggregation is also required to protect the confidentiality of the aggregated data. Data Authentication: guarantees that the reported data is the same as the original one and it has come from reliable source “identification”. In secure data aggregation, both identification and authentication are important to ensure the legitimate data transfer between sensors. Data Freshness and Availability: Given that sensor networks are used to monitor time-sensitive events, it is important to ensure that the data provided by the network is current and available at all times. This means that an adversary cannot replay old messages in the future. Moreover, data aggregation security requirements should be carefully implemented to avoid extra energy consumption. If no more energy is left, the data will no longer be available. 3.2 Attacks against WSN aggregation WSNs are vulnerable to different types of attacks [16] due to the nature of the transmission medium (broadcast), remote and hostile deployment location, and the lack of physical security in each node. However, the damage caused by these attacks varies in applications depending on the
Data Aggregation Security
9
assumed threat model. Therefore, data aggregation must be done securely to prevent a deceptive reading of the state of the environment being monitored. In this section, we discuss about the attacks that might affect the aggregation in the WSN. Node compromise: Current sensor hardware does not provide any resistance to physical tampering. If an adversary captures a node, he can easily extract the cryptographic primitives as well as exploit the shortcomings of the software implementation. This allows the adversary to launch attacks from within the system as an insider, bypassing encryption and password security systems. Considering the data aggregation scenario, the compromised nodes can successfully authenticate bogus reports to their neighbours, which have no way to distinguish bogus data from legitimate ones [8]. Denial of Service Attack: is a standard attack on the WSN, by transmitting radio signals that interfere with the radio frequencies used by the WSN. Sometimes this attack is called jamming. As the adversary capability increases, it can affect larger portions of the network. In the aggregation context, an example of the DoS can be an aggregator that refuses to aggregate and prevents data from travelling into the higher levels. Sybil attacks: refers to the scenario when a malicious node pretends to have multiple identities. For example, the malicious node can claim false identities, or impersonate other legitimate nodes in the network [16]. It affects aggregation schemes in different ways [15]. Firstly, an adversary may create multiple identities to generate additional votes in the aggregator election phase and select a malicious node to be the aggregator. Secondly, the aggregated result may be affected if the adversary is able to generate multiple entries with different readings. Thirdly, some schemes use witnesses to validate the aggregated data and data is only valid if n out of m witnesses agreed on the aggregation results. However, an adversary can launch a Sybil attack and generate n or more witness identities to make the base station accept the aggregation results. Selective Forwarding Attack: In selective forwarding, a malicious node acts like a black hole and refuses to forward every packet. Adversary uses the compromised node to forward the selected messages. In the aggregation context, any compromised intermediate nodes have the ability to launch the selective forwarding attack and this subsequently affects the aggregation results. Replay Attack: In this case, an attacker records some traffic from the network without even understanding its content and replays them later on to mislead the aggregator and consequently the aggregation results will be affected. Stealthy Attack: the adversary’s goal is to make the user accept false aggregation results, which are significantly different from the true results determined by the measured values, while not being detected by the user.
10
N. Labraoui et al.
3.3 Security challenges in Data Aggregation Consequently, it is believed that a secure data aggregation is very challenging issue, and requires more attention during design process. According to the properties required by the application and according to the type of attack and the type of adversary, a secure data aggregation scheme should have as well as possible following properties: Low communication overhead: the purpose of conducting aggregation is to reduce communication overhead. Thus, a secure scheme should maintain this purpose. Scalability: Secure aggregation techniques should provide high-security features for small networks, but also maintain these characteristics when applied to larger ones. Flexibility: secure aggregation techniques should be able to function well in any kind of environments and support dynamic deployment of nodes. Effectiveness: it is important to ensure the accuracy of the final aggregation result. Generality: the secure aggregation scheme should apply to various aggregation functions, such as MAX/MIN, MEAN, SUM, COUNT, and so forth.
4 Classification of secure aggregation schemes Many innovative and intuitive secure aggregation schemes for WSNs have been proposed for solving the problem of security in sensor networks. In this section, we survey these schemes and classify them into two classes: cryptographic-based secure data aggregation and trust-based secure data aggregation schemes. See figure 2.
Secure Data Aggregation
Cryptographic-based secure
Trust-based secure data
aggregation scheme
aggregation scheme
Techniques based on
Trust and reputation
concealed data
based techniques
Techniques based on
Artificial Intelligent
revealed data
based techniques
Figure 2 Classification of secure aggregation schemes.
Data Aggregation Security
11
4.1 Cryptographic-based secure aggregation The security issues, such as data confidentiality and integrity in data aggregation become vital when the sensor network is deployed in a hostile environment. Most current research in securing data aggregation in WSNs, have been achieved through cryptographic scheme. We distinguee two techniques: techniques based on revealed data (hop-by-hop privacy), and techniques based on concealed data (end-to-end privacy). A. Techniques based on revealed data Many protocols based on revealed data (hop-by-hop privacy) provide more efficient aggregation operation and highly consider data integrity. Hu and Evans proposed the first work that addresses the security problem in aggregation (SDA) protocol for WSNs that is resilient to both intruder devices and single device key compromises [23]. They present a secure aggregation protocol to detect misbehaving sensor nodes by exploiting two main ideas: delayed aggregation and delayed authentication. Instead of aggregating the messages at the immediate next hop, the messages are directly forwarded over the first hop to the second hop, where the aggregation is performed. By postponing the aggregation to one more hop away, it guarantees the integrity for networks where two consecutive nodes are not compromised. The delayed authentication enables authentication keys to be symmetric keys. It adopts the µTESLA protocol, which achieves asymmetry from clock synchronization and delayed key disclosure. However, the protocol may be vulnerable if a parent and a child node in the hierarchy are compromised. Przydatek et al. proposed an aggregate-commit- prove framework (SIA) [24], that allows an aggregator to accept data with high probability if the aggregate result is within a desired bound or reject the result if it is outside the bound. By constructing random sampling mechanisms and interactive proof, this scheme proposes protocols to securely compute aggregation functions including median, min/max, counting and average. Wagner [45] studies the inherent security of some aggregation functions, but he only considers the level of impact a compromised sensor may have on the final result. He proposed a mathematical framework (RA) for evaluating the security of several resilient aggregation techniques and described a number of better methods for securing the data aggregation such as how the median function is a good way to summaries statistics. Furthermore, Wagner claimed that trimming and truncation can be used to strengthen the security of many aggregation primitives by eliminating possible outliers. However, this work concerns the security of aggregation functions, not the aggregation security itself. Yang et al. propose SDAP [41], a Secure Hop-by-hop data aggregation protocol based on the principles of divide-and-conquer and commit-andattest. In SDAP, a novel probabilistic grouping technique is utilized to dynamically partition the nodes in a tree topology into subtrees. A commit-
12
N. Labraoui et al.
ment-based hop-by-hop aggregation is conducted in each subtree to generate a group aggregate. The base station identifies the suspicious subtrees based on the set of group aggregates. Finally, each subtree under suspect participates in an attestation procedure to prove the correctness of its group aggregate. Chan et al. [42] designed a novel verification algorithm (SHDA) by which the BS could detect if the computed aggregate was falsified. The authors extended the work in SIA by applying the aggregate-commit-prove framework in fully a distributed network instead of single aggregator model. In general, this scheme offers exactly what the SIA does data integrity, authentication, and confidentiality. Each parent sensor performs an aggregation function whenever it has heard from its child nodes. In addition, it has to create a commitment to the set of the input used to compute the aggregated result by using a merkle hash tree. Then, it forwards the aggregated data and the commitment to its parent until it reaches the base station. Once the base station received the final commitment values, it rebroadcasts them into the rest of the network in an authenticated broadcast. Each node is responsible for checking whether its contribution was added to the aggregated data or not. Once its readings are added, it sends an authentication code to the BS. For communication efficiency, the authentication codes are aggregated along the way to the base station. However, missing one authentication code for any reason leads the BS to reject the aggregated result. Furthermore, noticeable delay, too much transmission and computation will be added as consequences of adding security to the scheme. Çam et al. proposed an energy-efficient secure pattern based data aggregation (ESPDA) protocol for wireless sensor networks in [25,26]. ESPDA is applicable for hierarchy-based sensor networks. In ESPDA, a cluster head first requests sensor nodes to send the corresponding pattern code for the sensed data. If multiple sensor nodes send the same pattern code to the cluster head, only one of them is permitted to send the data to the cluster head. ESPDA is secure because it does not require encrypted data to be decrypted by cluster heads in order to perform data aggregation. Sanli et al. [43] propose a new data aggregation technique employing the concept of differential data (SRDA). The basic idea behind SRDA is that nodes transmit the differential data rather than the raw sensed data. The raw data sensed by sensor nodes are compared with a reference data and then only the difference data are transmitted. Consequently, differential aggregation has great potential to reduce the amount of data to be transmitted from sensor nodes to cluster-heads. The basic motivation behind differential data aggregation is that significant changes in sensor measurements occur only when an important event (e.g., a fire event for a temperature network) happens in the environment. In general, these so-called important events occur much less frequently than ordinary events in sensor networks.
Data Aggregation Security
13
Du et al. proposed a witness-based data aggregation (WDA) scheme for WSNs to assure the validation of the data sent from data fusion nodes to the base station [27]. In order to prove the validity of the fusion result, the fusion node has to provide proofs from several witnesses. A witness is one who also conducts data fusion like a data fusion node, but does not forward its result to the base station. Instead, each witness computes the message authentication code (MAC) of the result and then provides it to the data fusion node who must forward the proofs to the base station. Emiliano et al. propose the Fuzzy-based FAIR framework for resilient data aggregation in real-time responsive wireless sensor networks supporting in-network processing [40]. Like to Du’s protocol [27], Witness nodes have been often employed to confirm the result of aggregator nodes, in order to ensure the integrity of data during aggregation, and aggregate and forward the result themselves. However, Witnesses not only confirm the aggregator’s result, but also aggregate and forward the result themselves. Thus, the protocol is more robust and there is no dependency on a single aggregator. Kui et al. [56] proposed a protocol based on a clique infrastructure. The protocol uses the watchdog method to verify the calculated aggregation value of the parent. However, sensor nodes use the cryptographic algorithms only when a cheating activity is detected. Topological constraints are introduced to build a secure aggregation tree (SAT) that facilitates the monitoring of data aggregators. In SAT, any child node is able to listen to the incoming data of its parent node. When the aggregated data of a data aggregator are questionable, a weighted voting scheme is employed to decide whether the data aggregator is properly behaving or is cheating. If the data aggregator is a misbehaving node, then SAT is rebuilt locally so that the misbehaving data aggregator is excluded from the aggregation tree. B. Techniques based on Concealed Data Concealed data aggregation (CDA) is an improved version of the in-network aggregation, which in contrast to the classic Hop-by-Hop ensures the End-toEnd privacy, i.e. encrypted values do not need to be decrypted for the aggregation. Instead, the aggregation is performed with encrypted values and only the sink can decrypt the result. The fundamental basis for CDA are cryptographic methods that provide the privacy homomorphism (PH) property. An encryption algorithm E() is homomorphic, if for given E(x) and E(y) one can obtain E(x*y) without decrypting x,y for some operation *. The concept was introduced by Rivest et al. [17] in 1978. The two most common variations of PHs are the additive PH and the multiplicative PH. The latter provides the property E(x × y) = E(x) ⊗ E(y). Girao et al. [18] propose a CDA scheme (CDAM) that is based on the PH proposed in [20]. They claimed that, for the WSN data aggregation scenario,
14
N. Labraoui et al.
the security level is still adequate and the proposed PH method can be employed for encryption. Castelluccia et al. proposed a simple and provable secure additively homomorphic stream cipher (HSC) that allows for the efficient aggregation of encrypted data [19]. The new cipher replace the xor (Exclusive-OR) operation with modular addition and is therefore very well suited for CPU-constrained devices such as those in WSNs. The aggregation based on this cipher can be used to efficiently compute statistical values such as the mean, variance, and standard deviation of sensed data while achieving significant bandwidth gain. One limitation of this proposal is the important overhead and scalability problem that generates if the network is unreliable. Wenbo et al. [3] propose two different solutions: CPDA and SMART. The former gives a solution for data aggregation preserving node-privacy against other nodes for additive aggregation functions in WSNs. We observe that CPDA can be extended to provide privacy against BS and furnish a solution against data-loss as well. However, as stated by the authors, the overhead of this protocol is high. Indeed, they use the anonymization sets, where each node out of the C nodes within a cluster has to send (and receive) C − 1 messages, resulting in O(C2) sent (and received) messages within each cluster, for each aggregation phase. Furthermore, each single node has to encrypt and decrypt O(C) messages, and the cluster head has to compute the inverse of a C × C matrix, for each aggregation phase. The latter proposal, SMART, is more efficient than the first proposal. However, it does not protect privacy against the BS, and suffers from the same problem of message-loss as in [19]. Önen et al. [21] and Castellucia [22] propose a new scheme that combines a PH and multiple encryptions, (PHM1 and PHM2). Theses two works are quite similar but were developed in parallel and independently. The homomorphic of the underlying encryption technique allows sensors to aggregate their clear text measurements with the encrypted aggregate values whereas the multiple encryption scheme assure that aggregates values and individual measurements results remain oblivious to all intermediate nodes enrooted to the sink. The proposed scheme assures the end-to-end confidentiality and scales efficiency. It improves the bandwidth performance of secure aggregation scheme described in [19] and allows resisting against n compromised nodes. Mahimkar et al. [44] develop a Secure Data Aggregation and Verification (SDAV) protocol that ensures that the base station accepts the aggregate readings with high reliability, even if the cluster-head is compromised or there is a collusion of less than t compromised nodes within a cluster. Integrity check of the readings is done using Merkle Hash Trees avoiding over-reliance on the cluster-heads. They also present a hierarchical network structure for establishing cluster key in sensor networks using elliptic curve cryptosystems. The cluster key is secret from all the sensor nodes, thereby eliminating eavesdropping on the cluster key. Once an aggregator receives sensor readings in the same cluster, it aggregates them and broadcasts the average value
Data Aggregation Security
15
of the readings. Each sensor in the cluster compares its reading with the average value received from the aggregator. Then, it partially signs the average value only and only if the difference between the received average value and its reading is less than a certain value (threshold). Then, the aggregator (cluster-head) combines partial signatures to form a full signature of the aggregated results and sends it to the base station. However, this scheme requires high communication costs on data validation, and supports only the AVG aggregation function. Claveirole et al. [48] propose three schemes to achieve secure aggregation in sensor networks: Secret Multipath Aggregation (SMA), Dispersed Multipath Aggregation (DMA), and Authenticated Dispersed Multipath Aggregation (A-DMA). The main idea behind these three approaches is to exploit using multiple paths toward the sink. In fact, a sensor may split a handful of its readings into n separate messages such that t messages are needed to reconstruct the readings. By sending messages along disjoint paths, a sensor ensures that intermediate nodes do not have complete knowledge of the sensed data. Every sensor splits its readings the same way, and intermediate nodes may aggregate messages by adding them and multiplying them by a constant. When the sink recovers t aggregated messages, it may reconstruct an aggregate of the readings. In such a scenario, SMA guarantees confidentiality by applying the concept of secret sharing [49]. DMA and its authenticated version, A-DMA, address availability by dispersing information over the different paths [50]. Depending on the scheme and its parameters, these techniques provide varying levels of resistance against DoS attacks, eavesdropping, and data tampering. By using secret multipath aggregation, one can guarantee that a subset of compromised paths cannot reveal/leak any information about the readings. This is at the cost of some overhead. By using dispersed multipath aggregation, one has an optimal overhead but achieves lower levels of confidentiality. Depending on the application or scenario, one approach offers more advantages than another does. Ren et al. [55] propose a secure data Aggregation scheme for clustering wireless sensor networks (ECCM), combining privacy homomorphism (PH) based end-to-end encryption and ECC-MAC based hop-by-hop authentication. The design goal of this proposal is to protect the sensitive message from leaking to the snoopers, and to detect the injected or fabricated message and drop it as early as possible. For the first goal, the authors adopted PH based encryption to conceal the original data from the overhearing; for the second goal, hop-by-hop authentication is adopted to verify the message through message authentication code (MAC). Due to the shared key is established using ECC based point multiplication, the difficulty to get the key is a discrete logarithm problem. This scheme is suitable for hierarchical WSN with a little additional computation cost. Ozedemir proposed a secure data aggregation protocol, called CDAP [57], which takes advantage of asymmetric key-based privacy homomorphic cryp-
16
N. Labraoui et al.
tography to achieve end-to-end data confidentiality and data aggregation together. The author point out that asymmetric cryptography based privacy homomorphism incurs high computational overhead, which cannot be afforded by regular sensor nodes with scarce resources. To mitigate this problem, CDAP protocol employs a set of resource-rich sensor nodes, called aggregator nodes (AGGNODEs), for privacy homomorphic encryption and aggregation of the encrypted data. In CDAP, after the network deployment each AGGNODE establishes pairwise keys with its neighbouring nodes so that neighbouring nodes can send their sensor readings securely. The AGGNODE decrypts all the data received from its neighbours, aggregates them, and encrypts the aggregated data using the privacy homomorphic encryption algorithm. Once the data are encrypted with the privacy homomorphic encryption algorithm, only the base station can decrypt them using its private key. Due to homomorphic property, intermediate AGGNODEs can aggregate those encrypted data during data forwarding. Therefore, the data collected by sensor nodes are aggregated by AGGNODEs as they travel towards the base station. The base station decrypts the final aggregated data using its private key. C. Discussion The field of cryptography within in-network data processing is a very promising research field, and introduces many interesting challenges [7]. For instance, a straightforward approach is to achieve hop-by-hop privacy and integrity protection. However, nodes can be captured and the disclosure of their key material can lead to the disclosure of raw data and partial results, or malicious modification of data. In particular, the packaging of sensor nodes can also be affected by the low-cost requirements, not allowing for tamperresistant devices: hence, nodes may be captured and their secret material can be disclosed to an attacker. To overcome this problem, techniques are proposed, that exploit end-to-end privacy in conjunction with particular key distribution, homomorphic encryption schemes or public encryption schemes. Nevertheless, in order to make sensor networks economically viable, sensor devices are limited in their energy, computation and communication capabilities; hence, these schemes have to take into account the technical and economic constraints. To find a realistic balance between computational overhead, delay and data trustworthiness, selecting the appropriate cryptography methods for sensor nodes is fundamental. Therefore, various cryptographic services are required for some applications and common use of symmetric-key algorithms such as AES and MACs are not just imposing problems such as key protection and management but can be at the same time even more expensive. Elliptic curve cryptography (ECC) is emerging as an attractive publickey cryptosystem for wireless environments. Compared to traditional cryptosystems like RSA, ECC offers equivalent security with smaller key
Data Aggregation Security
17
sizes, which results in faster computations; lower power consumption, as well as memory and bandwidth savings. This is especially useful for mobile devices, which are typically limited in terms of their CPU, power and network connectivity. The US government has recently endorsed elliptic Curve Cryptography (ECC). In spite of the diversity and the proved efficiency of the cryptographybased solutions, many of them assume that the sensor nodes are trustworthy and reporting data truthfully. In practice though, sensors are usually deployed in open unattended environments, and hence are susceptible to physical tampering. When a node is compromised, the adversary can inject bogus data into the network. However, we argue that the conventional view of security based on cryptography alone is not sufficient for the unique characteristics and novel misbehaviours encountered in open networks. Even though cryptography can provide integrity, confidentiality, and authentication, it fails in the face of insider attacks. This necessitates a system that can cope with such internal attacks. 4.2 Trust-based secure data aggregation As we cite above, WSNs are often deployed in unattended territories that can often be hostile, they are subject to physical capture by adversaries. A simple tamper proofing is not a viable solution [10]. Hence, sensors can be modified to misbehave and disrupt the entire network. This allows the adversary to access the cryptographic material held by the captured node and allow the adversary to launch attacks from within the system as an insider, bypassing encryption and password security systems. The compromised nodes can successfully authenticate bogus reports to their neighbours, which have no way to distinguish false data from legitimate ones [8]. Trust and reputation systems have been proposed as an attractive complement to cryptography in securing WSNs. They provide the ability to detect and isolate both faulty and malicious nodes that are behaving inappropriately in the context of the specific WSN. Recently, attention has been given to the concept of trust to increase security and reliability in Ad Hoc [32, 33] and sensor networks [35, 34]. The notion of trust to be used throughout this paper is briefly defined as “trust is the degree of belief about the future behaviour of other entities, which is based on the ones the past experience with and observation of the others actions”. Reputation is another complex notion that spans across multiple disciplines. It is quite different from but easily confused with trust. The mathematical foundation for reputation management is rooted in statistics and probability [39]. Furthermore, reputation is based on collection of evidence of good and bad behaviour undertaken by other entities. It is based on past experience with a given entity, whereas trust is not restricted to this. In this section, we study the proposed schemes based on trust and reputation for secure data aggregation in networks. We classify them into two catego-
18
N. Labraoui et al.
ries: trust and reputation-based framework, and artificial intelligent-based framework. A. Trust and reputation-based framework In [28], the authors propose a trust-based framework for secure data aggregation in WSN based on Bayesian model and beta distribution probability (TKL). They first evaluate trust in individual sensor nodes based on Kullback-Leibler (KL) distance or relative entropy. The idea is to calculate the distance between ideal node behaviour and the actual node behaviour. The authors assign a confidence value to aggregated sensor data. Based on sensor data confidence, the framework computes an opinion, which encompasses belief and uncertainty on the aggregation of sensor data by means of the consensus operator [37]. Nevertheless, this approach is still time-consuming for establishing a stable reputation on sensor nodes. The reputation on a node, based the inverse square of its KL distance, suffers from severe oscillation for the first reputation evaluation. Hur et al. propose in [29] a trust-based aggregation scheme for WSN based on local trust evaluation (LTE). This local trust evaluation mechanism is suitable for resource-limited sensor networks. The trustworthiness of a node is computed based upon several trust evaluation factors, such as battery lifetime, sensing communication ratio, sensing result and consistency level. Each sensor node computes only their neighbour node’s trust accumulatively. Prior to a data aggregation, sensor nodes elect an aggregator node in their own grid, which has the highest trust value among all the nodes in an identical grid by the majority of vote. A trust agreement process is necessary, because the trust value of a node is evaluated by its neighbour nodes, that is any node never know its own trust value. Sensing data from multiple nodes are aggregated in consideration of agreed trust values of member nodes per each grid. Deceitful data from malicious or compromised nodes whose trust value are lower than those of the other legal nodes can be excluded. One drawback of this scheme is that it considers that the trust evaluation is computed by only trustworthiness nodes. Rabinovich et al. [30] propose a mechanism to detect and mitigate stealthy attacks against sensor networks by using distributed localized constraint validation and randomized routing (RTM). More specifically, they consider those applications of sensor networks where measurements are spatially correlated. This correlation is expressed in the form of measurement constraint. Sensors observe their neighbours during transmission and file “complaints” if their neighbours’ data violate the constraint when compared with their own. Protection against compromised aggregators is achieved via construction of randomized delivery trees. Every time the SN sends data to the server, it includes both current and small number of recent measurements. This allows the server to detect an attack when a small fraction of sensors has been compromised. Based on complaints, the server uses a beta reputation system [36] to identify
Data Aggregation Security
19
compromised sensors. In order to test the effectiveness of the scheme, authors developed a custom simulator to analyse the time to detect a compromised sensors, the fraction of compromised sensors successfully detected by the system, and the communication overhead introduced by the security protocol. Alzaid et al. [46] propose a reputation-based secure data aggregation (RSDA) that focuses on enhancing the data availability and the accuracy of the aggregated data. The normal sensors are grouped into cells and in each cell; one of sensors is selected to be the cell representative Crep. Initially Crep is chosen randomly since all nodes start with same reputation value. The Crep is responsible for confirming its cell reading Cread (reported by other cell member), aggregating it with other readings (if the cell is an intermediate cell), and forwarding it to upper stream cell. By monitoring neighborhood’s activities, each sensor node evaluates the behaviour of Crep and its cell members in order to filter out the inconsistent data in the presence of multiple compromised nodes (< t in each cell). RSDA is expected to detect compromised nodes and then black list them, which helps to reach the main two goals: extend the network lifetime and protect the accuracy of the aggregated data. RSDA uses beta probability density function (PDF) to update the reputation value of each sensor node due to its flexibility, strong foundation on the statistics, simplicity that meet the needs of resource constraint nodes. Ozdemir [47] propose a Secure and reliable data Aggregation protocol, called SELDA, which is based on trustworthiness of sensor nodes and data aggregators. The basic idea behind protocol SELDA is that sensor nodes observe actions of their neighbouring nodes to develop trust levels for the environment using Beta distribution function. Sensor nodes exchange their trust levels with neighbouring nodes to form a web of trust that allows them to determine secure and reliable paths to data aggregators. Based on these trust levels, sensor nodes transmit their data to aggregator(s) over one or more secure paths. During data aggregation, data aggregators weight the data based on the trust levels of the sender nodes. In order to prevent forgery and selective forwarding attacks by compromised nodes, the author propose a secure multipath data transmission algorithm that ensures secure data delivery to data aggregators. The proposed data transmission algorithm secretly selects some paths based on the reliability of the paths and keeps the quantity and identity of the selected paths secret. One important property of SELDA is that, due to the monitoring mechanisms in use, it can detect if a data aggregator is under DoS attack. The simulation results show that SELDA increases the reliability of the aggregated data at the expense of a tolerable communication overhead. B. Artificial intelligent-based framework Interest on applying Artificial Intelligence techniques on securing sensor network environments is rising. [11] Use offline neural network based learning technique to model spatial patterns in sensed data. [13] applies reinforce-
20
N. Labraoui et al.
ment-learning techniques for intrusion detection. [14] and [12] base their detection systems on multi-agent systems. The critical issue of aggregation, however, is not taken into consideration by any of these researches. In [31] the authors propose the first mechanism (AIF) that combines statistics and artificial intelligence techniques for robust detection of malicious nodes in a sensor network environment without unnecessarily eliminating honest nodes, e.g., the descendants of a malicious node. In particular, hypothesis testing mechanisms are used by a child node to estimate the probability of error reporting by a child node over one data reporting epoch, while reinforcement learning schemes are used to update such reputation over successive epochs. The use of straightforward statistical techniques is not sufficient for error monitoring in sensor networks. Due to variations in the environment and intrinsic sensing error characteristics, there can be significant short-range deviations between data reported by one node and its siblings. Hence, significant tests may observe deviations between individual reported values and aggregates reported by parent nodes for a given epoch. To create a more robust system, reputations must be accumulated over successive epochs; and only if consistent deviations are observed, a parent node should be labelled to be malicious. One drawback of this scheme is that it cannot detect unbiased errors introduced by the aggregator node. A very high learning rate can increase detection but also introduce unacceptable false positives. C. Discussion Trust and reputation have been recently suggested as an effective security mechanism for open environments such as the Internet, and considerable research has been done on modeling and managing trust and reputation. Some research work has shown that rating nodes’ trust and reputation is an effective approach in distributed environments to improve security; to support decision-making, and to promote node collaboration. In-network evaluation of trustworthiness of sensor data is based, e.g., on probability theory. The trust information can thus be used in order to determine if the sensor data should be skipped or be used for further data processing. Importantly, there is no centralized computation or storage of the reputation ratings; instead, each node maintains a reputation rating for each other node that it interacts with. This reputation rating is then used to suppress the contribution of faulty nodes in the final aggregated data. To accelerate the building of reputation over time, nodes share their observations about other nodes with the rest of the network. However, this periodic exchange of reputation values between the nodes induces an extra transmission overhead. The scalability of those approaches is still questionable. Actually, the efficiency of those approaches is tightly related to the number of nodes involved in the WSN. Hence, using the trust and reputation management scheme to secure wireless sensor networks requires paying close attention to the incurred
Data Aggregation Security
21
bandwidth and delay overhead, which have thus far been overlooked by most research work. On other hand, building a robust trust and reputation system presents several important challenges on its own [10]. The most pressing is the possibility that a malicious node that participates in the reputation system can prevent it from functioning by lying. A compromised node can falsely accuse well-behaving nodes of malicious actions or falsely praise bad-behaving nodes. To maintain its integrity the reputation system must be able to prevent these kinds of attacks. Sun et al. have classified attacks on trust as bad mouthing attack, on-off attack and conflicting behaviour attack [60]. Bad mouthing attack: is an attack when a malicious node provides dishonest vote to increase the reputation of bad node. Bad mouthing also implies to a node, which ignores to record a trust value for a good node or unfairly records a negative value, causing a bad reputation. On-off attack: in this type of attack, malicious nodes behave good and bad alternatively, compromising the network with the hope that bad behaviour will be undetected. On-off attack causes a good node to become a malicious one and bad node to act as trustworthy. Conflicting behaviour attack: malicious nodes can damage the good nodes reputation by having a conflicting behaviour; behaving good to some nodes and bad to other nodes. Due to this conflicting behaviour, nodes in one group will have low reputation value to the nodes in other group. Another important issue when building reputation is determining when a node has performed a malicious action and being able to distinguish it from natural failures. Due to the uncertain nature of WSN environment, such as collisions on the wireless channel, it is not always possible to distinguish these two kinds of erroneous behaviours. All mentioned security protocols using trust and reputation mechanisms are based on particular assumptions about the nature of attacks. If the attacker is “weak”, the protocol will achieve its security goal. This means that an intruder is prevented from breaking into a sensor network and hinders its proper operation. If the attacker is “strong” (i.e., behaves more maliciously), there is a non-negligible probability that the adversary will break in. Because of their resource constraints, sensor nodes usually cannot deal with very strong adversaries. Therefore, what is needed is a second line of defence: An Intrusion Detection System (IDS) that can detect a third party’s attempts of exploiting possible insecurities and warn for malicious attacks, even if these attacks have not been experienced before.
5 Performance Comparison This section, attempts to compare the secure data aggregation schemes that were reviewed in the last section. Comparison of security schemes can be
22
N. Labraoui et al.
difficult since the designers solve secure aggregation from different angles. Therefore, these schemes are compared in a number of different ways: security services provided (confidentiality, integrity, authentication, freshness and availability), design goal (scalability, overhead, flexibility, effectiveness and generality), and resilience against attacks described in section 2.2. We summarise this comparison in table 1 and 2.
Security Services Concealed Data
Conf
CDAM HCS PHM1 PHM2 SDAV CPDA SMART SMA DMA A-DMA ECCM CDAP
⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
Revealed
Conf
Data
SDA SIA SDAP SHDA ESPDA SRDA
⊗ ⊗ ⊗ ⊗ ⊗
Integ
FAIR SAT
Fresh
Attacks existence Avai
⊗ ⊗
Integ
⊗ ⊗ ⊗ ⊗
⊗
⊗ ⊗
⊗
Auth
Fresh
⊗ ⊗ ⊗ ⊗
Avai
NC.
⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ NC .
⊗ ⊗
⊗ ⊗ ⊗ ⊗ ⊗
⊗ ⊗ ⊗ ⊗
WDA
Auth
⊗ ⊗ ⊗
⊗ ⊗
DoS .
⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
SFo
Rep
Syb
⊗ ⊗ ⊗ ⊗ ⊗
Design Goal Ste
Scal
⊗
⊗
⊗
High
⊗ ⊗
High High High
Fle
Eff
⊗ ⊗ ⊗
High
⊗
Gen
⊗ ⊗ ⊗ ⊗
High Med
⊗ ⊗
⊗ ⊗ ⊗
Med High High
⊗ ⊗
⊗ ⊗
⊗
DoS
SFo
Rep
⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
⊗ ⊗ ⊗ ⊗
⊗ Syb
Ste
Scal
⊗
Med Med Over
Fle
⊗
High
⊗
⊗ ⊗ ⊗
Low Low
⊗ ⊗ ⊗
Med Med
⊗
⊗
⊗
Gen
⊗ ⊗ ⊗ ⊗
High
⊗
Eff
High High
⊗ ⊗ ⊗ ⊗
Over
Low
⊗ ⊗ ⊗ ⊗ ⊗ ⊗
Conf: Confidentiality - Integ: Integrity – Auth: Authentication – Fresh: Freshness – Avai: Availability NC: Node Compromise – DoS: Deni of Service – Sfo: Selective Forwarding – Rep: Replay – Syb: Sybil – Ste: Stealthy. Scal: Scalability – Over: Overhead – Fle: Flexibility – Eff: Effectiveness – Gen: Genarality
Table 1 A summary of camparison between cryptography-based secure aggregation schemes.
Security Services Trust and Reputation TKL LTE RTM RSDA SELDA Artificial Intelligent AIF
Conf
Integ
⊗ Conf
Integ
Auth
⊗
Fresh
Attacks existence Avai
⊗ ⊗ ⊗
⊗ ⊗ ⊗
⊗ ⊗
⊗ ⊗
Auth
Fresh
⊗
⊗
Avai
⊗
NC.
NC.
DoS
SFo
⊗ ⊗ ⊗ ⊗ ⊗
⊗ ⊗ ⊗
DoS
SFo
⊗
⊗
Rep
Design Goal
Syb
Ste
Scal
⊗ ⊗ ⊗
Syb
⊗
Flex
low low High
⊗ ⊗ Rep
Over
⊗
low Med
Ste
Scal
Over low
Table 2 A summary of comparison between trust-based secure aggregation schemes.
Flex
Eff
Gen
⊗ ⊗ ⊗
⊗ ⊗
⊗ ⊗
⊗ ⊗
Eff
⊗
Gen
⊗
Data Aggregation Security
23
Since the assumed adversary varies from one scheme to other, each proposed scheme has different requirements. In cryptographic based scheme, the data confidentiality represents the minimum-security requirements. As we see in table1, the techniques based on revealed data provide more efficient aggregation operation in term of overhead and highly consider data integrity. However, they represent weaker model of data confidentiality perspective than techniques based on concealed data. However, in trust based scheme, the availability and the network lifetime are the principle concern and should be provided as well as possible. We notice that all the proposed schemes based on cryptography are vulnerable to DoS and physical attacks, except the schemes SMA, DMA and A-DMA proposed by Claveirole et al. [48]. There are the only schemes, which handle the DoS attack. On the other hand, we notice that all the trust based schemes are vulnerable to DoS and Sybil attacks. Metrics used to evaluate the design goals are: Scalability, Overhead, Flexibility, and Effectiveness. 5.1 Scalability As we have cited in section 2.3, a scheme is considered scalable if it can provide high-security feature as well for small networks that for a larger ones. Alzaid et al. [15] classify the secure data aggregation into two models: the single aggregator and the multiple aggregators. In the single aggregator model, all individual data in the WSN travels to only one aggregator point in the network before reaching the base station (the querier). This model is useful when the network is small, however large networks are not suitable places to implement this model especially when data redundancy at the lower levels is high [15]. This is the case of SIA and SHDA. In the multiple aggregator models, the collected data in the WSN are aggregated more than one time before reaching the querier [15]. We could think that this model is well suitable in the large WSNs because it achieves greater reduction in the number of bits transmitted. However, this affirmation is credible only if the scheme does not generate a high overhead in communication and bandwidth. We cite the example of PHM1 and PHM2, which in spite of the high overhead (caused by the multiple encryption that consumes energy), scale efficiency because it improves the bandwidth gain. While HCS is not scalable especially if the network is unreliable because it generates an important traffic in bandwidth when transmitting the id of nodes, which participate (or not participate) in the aggregation function. In ESPDA, only one sensor per each pattern code is permitted to send data to the clusterhead and in SRDA each sensor transmit the differential data rather than the raw sensed, reducing thus the transmission overhead. Both of these two schemes are independent to the size of network and scale efficiency. In CDAP, powerful nodes, called AGGNODEs are used to make the scheme feasible for large heterogeneous wireless sensor networks. SRDA, which is trust-based scheme, is also scalable because in the cluster one sensor is elected to transmit the reading.
24
N. Labraoui et al.
5.2 Overhead The scheme introduces overhead that consumes bandwidth and energy. This overhead is measured by the size of transmitted message and the energy consumption. In concealed data techniques, data encryption is very expensive, and generates important energy consumption especially if the data is encrypted more than one time (multiple encryptions). It is the case in CDAM, HCS, PHM1, PHM2, SDAV and CPDA. Including energy consumed on CPU processing, every cryptographic primitive requires a different amount of time and a different number of CPU cycles for execution, resulting in different energy consumption values. For example, Skipjack requires 22,044.60 CPU cycles and consumes 71.76 µjoules for calculating a 29-byte packet MAC [67]. In the case of WDA and FAIR, to prove the validity of aggregation result, the aggregator has to provide proofs from several witnesses. It must then forward the proofs to the base station by piggybacking them with the aggregation result. This introduces some additional bandwidth consumption. The overhead is also increase because of the interactive verification phase imposed by checking integrity and accuracy. Moreover, a verification process has been set up in different ways, such as using interactive protocols (SIA, SHDA, SDAP), broadcast authentication by the base station (SDA, SIA, SHDA, SDAP), or voting system (WDA, FAIR). RTM and SELDA which are a trust based scheme makes the network more reliable but introduces a high overhead due to the fact that RTM uses a randomized delivery trees and SELDA uses multi path data transmission. SDAV improved the data integrity vulnerability in WDA by using digital signature. It then requires high communication costs on data validation. 5.3 Effectiveness The scheme, which ensures the accuracy of the final aggregation result, can contain a verification phase to enhance the querier ability in distinguishing between the valid and the invalid aggregated readings, as the case of SIA, SHDA, SDAP, WDA and FAIR in cryptographic based schemes. The scheme can also ensure the effectiveness of aggregation by monitoring the aggregator when sending data aggregation result to the querier. If monitors detect that the aggregator send an altered data, they send an alarm to the querier. This is the case of SAT in cryptographic based scheme and TKL, LTE, RTM, RSDA and SELDA in the trust-based schemes. However, IAF cannot ensure completely the accuracy of aggregation because this scheme cannot detect unbiased errors introduced by the aggregator node. 5.4 Flexibility All the proposed schemes are designed for static sensor networks. They do not consider the mobility of the node. However, the topology of WSNs can vary some times. The scheme must take into account this characteristic and
Data Aggregation Security
25
consider addition of new nodes or removing them in the network while supporting the same security level. It is true that this property is very difficult to ensure. Only CDAM, PHM1 and PHM2 are flexible. 5.5 Generality A secure data aggregation is general if it can be applied to various aggregation functions, such as MAX/MIN, MEAN, SUM, and COUNT. Majority of schemes ensure this property except of SDAV, CPDA, SMART, SIA and SDAP. Finally, there is no scheme, which is perfect. Each proposal has its advantages and its limitations. A trade-off between security level and performance must be carefully balanced.
6 Future Research Directions We believe that in-network processing is one of the key issues that has to be considered in all layers of the WSN’s network protocol stack in order to minimize energy consumption. However, this operation cannot be efficiently done without being secured. Integrating security as an essential component of data aggregation protocols is an interesting problem for future research. The secure data aggregation issue has been intensively studied in the past and several schemes have been proposed in the specialized literature. Most of the existing work has mainly focused on the development of an efficient routing mechanism for data aggregation. However, the performance of the secure data aggregation protocol is strongly coupled with the infrastructure of the network. We also think that secure aggregation should consider keying schemes that are more energy-efficient. Moreover, multi-tiered hierarchical aggregation approaches, such as in [54] would be the most efficient scheme when the WSN contains a high number of sensor nodes. It is worthwhile to pay more attention how to provide an efficient scheme that improves availability without neglecting energy efficiency. Combining aspects such as security, data latency and system lifetime in the context of secure data aggregation is worth exploring. Another interesting domain of research is the application of source coding theory for data gathering networks. The sensor data are highly correlated and energy efficiency can be achieved by joint source coding and data compression. On the other hand and according to [52,53], the sensor network research community has ignored at least until now the effects of mobility on the delivery of security services. Current proposed secure data aggregation schemes have focused on securing data generated and disseminated by static WSN. Consequently, the effects of mobility and network dynamism on secure aggregation must be investigated and need to be studied more extensively in order to design realistic and more practical secure aggregation schemes.
26
N. Labraoui et al.
7 Conclusion Sensor networks are vulnerable to insider and outsider attacks much more than other wireless networks for the reasons discussed in section 2. When designing a security protocol it is important to understand the dangerous and damaging effects these attacks can have so that the protocol can guard against them. Secure data aggregation in wireless sensor networks is a critical issue that has been addressed through many proposed schemes. These proposals are classified into two categories: cryptographic based scheme and trust based scheme. Some promising results have been recently achieved in secure data aggregation. They are based on advanced cryptographic concepts, such as privacy homomorphism and elliptic curve cryptography. Despite of the diversity and the proved efficiency of these solutions, many of them assume that the sensor nodes are trustworthy and reporting data truthfully. Even though cryptography can provide integrity, confidentiality, and authentication, it fails in the face of insider attacks. Another important issue is related to assessment of trustworthiness and reliability of the data provided by WSNs. However, this issue is still in its infancy, and there are not clear trust evaluation models, which can be applied to sensor networks properly. This paper provides an overview of theses techniques, each of which offers different advantages and disadvantages. A balance between the requirements and resources of a WSN determines which technique should be employed. We notice that no secure aggregation technique is ideal to all the scenarios where sensor networks are used; therefore, the techniques employed must depend upon the requirements of target application and resources of each individual sensor network. Despite of potentially great importance and very interesting theoretical and practical challenges, the topic of secure data aggregation in wireless sensor networks have received until recently much less attention than, e.g., secure routing or key management. Therefore, despite of many interesting initial results, the security questions related to data aggregation in WSN remain largely open, and in our opinion constitute an interesting area for further research.
References [1] O. Moussaoui, A. Ksentin, M. Naimi and M. Gueroui, “A novel clustering algorithm for efficient energy saving in wireless sensor networks” in the 7th IEEE International Symposium on Computer networks (ISCN’06), Istanbul, Turkey, June 2006. [2] T. A Zia and A. Y Zomaya, “A security framework for wireless sensor networks”. In Proceedings of the IEEE Sensor Applications Symposium (SAS), Houston, Texas, February 7–9, 2006. [3] H. Wenbo, L. Xue, N. Hoang, K. Nahrstedt and T. Abdelzaher, “PDA: Privacy-preserving Data Aggregation in Wireless Sensor Networks”, In 26th IEEE International Conference on Computer Communications, (INFOCOM), Anchorage, AK, May 2007, pp. 2045–2053.
Data Aggregation Security
27
[4] N. Xu, S. Rangwala, K. Chintalapudi, D. Ganesan, A. Broad, R. Govindan, and D. Estrin, “A Wireless Sensor Network for Structural Monitoring,”, In Proceedings of the ACM Conference on Embedded Networked Sensor Systems, Baltimore, MD, November 2004. [5] A. Perrig, R. Szewczyk, J. D. Tygar, V. Wen, and D. E. Culler, “SPINS: Security protocols for sensor network”, Wireless Networks, 2002, vol. 8, no. 5, pp. 521–534. [6] R. Rajagopalan and P. K. Varshney, “Data aggregation techniques in sensor networks: A survey”, In Communications Surveys & Tutorials, IEEE, Fourth Quarter 2006, Vol. 8, Issue 4, pp 48–63. [7] A.Sorniotti, L.Gomez, K.Wrona and L.Odorico “Secure and trusted in-network data processing in wireless sensor networks: a survey”, Journal of Information Assurance and Security (JIAS), September 2007, Vol. 2, Issue 3. [8] A. Perrig, J. Stankovic, D.Wagner, “Security in Wireless Sensor Networks”, Communication of the ACM, June 2004. [9] B. Warneke, K.S.J. Pister, “MEMS for Distributed Wireless Sensor Networks,” 9th International Conference on Electronics, Circuits and Systems, Dubrovnik, Croatia, September 2002. [10] S. Ganeriwal and M. Srivastava, “Reputation-based framework for high integrity sensor networks”, In Proceedings of the 2nd ACM workshop on Security of ad hoc and sensor networks (SASN), October 2004 , pp. 66–77. [11] P. Mukherjee and S. Sen, “Detecting malicious sensor nodes from learned data patterns”, In Proceedings of the Workshop on Agent Technology for Networks, 2007, pp. 11–17. [12] R. M. Ruairi and M. T. Keane, “An energy-efficient, multi-agent sensor network for detecting diffuse events”, In Proceedings of the 20th international joint conference on Artifical intelligence (IJCAI), 2007, pp. 1390–1395. [13] A. L. Servin and D. Kudenko, “Multi-agent reinforcement learning for intrusion detection”, In Adaptive Learning Agents and Multi Agent Systems, 2007, pp. 158–170. [14] J. Wu, C. jun Wang, J. Wang, and S. fu Chen, “Dynamic hierarchical distributed intrusion detection system based on multi-agent system”, In Proceedings of the 2006 IEEE/WIC/ ACM international conference on Web Intelligence and Intelligent Agent Technology, Washington, DC, USA, 2006, pp. 89–93. [15] H. Alzaid, E. Foo And J.M Gonzalez. “secure data aggregation in wireless sensor networks: a survey” In L. Brankovic and M.Miller, editors, 6th Australasian Information Security Conference (AISC) Wollongong, NSW, Australia, 2008, vol. 81 of CRPIT, pp. 93–105. [16] T. Roosta, S. Shieh, And S. Sastry, “Taxonomy of security attacks in sensor networks”, In the First IEEE International Conference on System Integration and Reliability Improvements’, IEEE International, Washington, DC, USA, 2006. [17] R. L. Rivest, L. Adleman, and M. L. Dertouzos, “On Data Banks and Privacy Homomorphisms,”, In Foundations of Secure Computation, New York: Academic, 1978, pp. 169–79. [18] J. Girao, D. Westhoff, and M. Schneider, “CDA: Concealed Data Aggregation for Reverse Multicast Traffic wireless Sensor Networks,” In Proceeding IEEE International Conference on Communication (ICC). , Seoul, Korea, May 2005. [19] C. Castelluccia, E. Mykletun, and G. Tsudik, “Efficient Aggregation of Encrypted Data Wireless Sensor Network,” In Proceeding ACM/IEEE Mobiquitous, San Diego, CA, July 2005. [20] J. Domingo-Ferrer, “A Provably Secure Additive and Multiplicative Privacy Homomorphism,” Lecture Notes in Computer Science., vol. 2433, 2002, pp. 471–83.
28
N. Labraoui et al.
[21] M. Onen and R. Molva, “Secure data aggregation with multiple encryption” 4th European conference on Wireless Sensor Networks, Delft, The Netherlands, January 29-31, 2007, Also published as LNCS Vol. 4373 , pp 117–132. [22] C.Castellucia, “Securing very dynamic groups and data aggregation in wireless sensor networks”, IEEE International Conference on Mobile Adhoc and Sensor Systems, (MASS), Pisa, Italy, October 2007, pp. 1–9. [23] L. Hu and D. Evans, “Secure aggregation for wireless networks”, In Proceeding of Workshop on Security and Assurance in Ad hoc Networks, Orlando, FL, January 2003. [24] B. Przydatek, D. Song, and A. Perrig, “SIA : Secure information aggregation in sensor networks”, In Proceedings of the 1st international conference on Embedded networked sensor systems, Los Angeles, CA, Nov 5–7, 2003. [25] H. Çam, D. Muthuavinashiappan, and P. Nair, “ESPDA: Energy Efficient and Secure Pattern-Based Data Aggregation for Wireless Sensor Networks,” In Proceeding IEEE Sensors, Toronto, Canada, Oct. 2003, pp. 732–36. [26] H. Çam, D. Muthuavinashiappan, and P. Nair, “Energy-Efficient Security Protocol for Wireless Sensor Networks”, In Proceeding IEEE Vehicular Technology Conference., Orlando, FL, Oct. 2003, pp. 2981–84. [27] Du, W., Deng, J., Han, Y. S. and P. Varshney, “A witness-based approach for data fusion assurance in wireless sensor networks”, In IEEE Global Communications Conference (GLOBECOM), 2003, Vol. 3, pp. 1435–1439. [28] W. Zhang, S. Das and Y. Liu, “A trust based framework for secure data aggregation on wireless sensor networks”, In Proceedings of 3rd Annual IEEE Communications Society and Networks (SECON), 2006, pp. 60–69. [29] J. H. Junbeom , L. Yoonho, H. Seongmin, Y. Hyunsoo, “Trust-based aggregation in wireless sensor networks”, International Conference on Computing, Communications and Control Technologies, July 2005. [30] P. Rabinovich, R. Simon, “Secure aggregation in sensor networks using neighbourhood watch”, In IEEE International Conference, Glasgow, June 2007, pp. 1484–1491. [31] A. Gursel, O. Mistry, S. Sandip, “Robust Trust Mechanisms for Monitoring Aggregator Nodes in Sensor Networks”, International Workshop on Agent Technology for Sensor Networks (ATSN-08), Estoril, Portugal, May 2008. [32] S. Buchegger and J. L. Boudec, “Performance analysis of the CONFIDANT protocol,” In Proceeding in 3rd ACM international Symposium on Mobile ad hoc networking & computing, 2002. [33] P. Michiardi and R. Molva, “CORE: A Collaborative Reputation Mechanism to enforce node cooperation in Mobile Ad hoc Networks”, In Proceedings of the IFIP TC6/TC11 6th Joint Working Conference on Communications and Multimedia Security: Advanced Communications and Multimedia Security, 2002. [34] A. Srinivasan, J. Teitelbaum, and J. Wu, “DRBTS: Distributed Reputation-based Beacon Trust System,” In 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing (DASC’06), 2006. [35] S. Ganeriwal and M. Srivastava, “Reputation-based framework for high integrity sensor networks”, In Proceedings of the 2nd ACM workshop on Security of ad hoc and sensor networks (SASN ’04), October 2004, pp. 66–77. [36] A. Jsang and R. Ismail, “The Beta Reputation System”, In Proceedings of the 15th Bled Conference on Electronic Commerce, Bled, Slovenia, 17–19 June 2002. [37] Audun Joang, “A logic for uncertain probabilities”, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(3):279–311, 2001
Data Aggregation Security
29
[38] S. Peter, K.Piotrowski, and P. Langndoerfer. “On concealed data aggregation for aggregation for wireless sensor networks”, In Proceedings of the IEEE Consumer Communications and Networking Conference, January 2007. [39] P. Robinson and M. Beigl, “Trust context spaces: An infrastructure for pervasive security”, In Proceedings of the first International Conference on Security in Pervasive Computing, 2003. [40] Emiliano De Cristofaro1, Jens-Matthias Bohli2, Dirk Westho, “FAIR: Fuzzy-based Aggregation providing In-network Resilience for real-time Wireless Sensor Networks”, In Proceedings of the 2nd ACM conference on Wireless network security, Zurich, Switzerland, pp 253–260, 2009. [41] Y. Yang, X. Wang, S. Zhu and G. Gao, “SDAP: a secure hop-by-hop data aggregation protocol for sensor networks,” In Proceedings of ACM International Symposium on Mobile Ad Ad Hoc Networking and Computing, Florence, Italy, pp. 356–367, May 2006. [42] H. Chan, A. Perrig, and D. Song, “Secure hierarchical in-network aggregation in sensor networks”, In Proceedings of the 13th ACM conference on Computer and communications security (CCS ’06), pages 278–287, 2006. [43] H. O. Sanli, S. Ozdemir and H. Cam, “SRDA: Secure Reference-Based Data Aggregation Protocol for Wireless Sensor Networks”, In Vehicular Technology Conference, pp. 4650–4654, 2004. [44] A. Mahimkar, T. S. Rappaport, “SecureDAV: A Secure Data Aggregation and Verification Protocol for Sensor Networks”, In Global Telecommunications Conference, Vol. 4, pp. 2175–2179, 2004. [45] D. Wagner, “Resilient aggregation in sensor networks”, In Proceedings of the 2nd ACM workshop on Security of ad hoc and sensor networks, Washington DC, USA, 2004. [46] H. Alzaid, E. Foo, J. G. Nieto,” RSDA: Reputation-based Secure Data Aggregation in Wireless Sensor Networks”, In Proceedings of the 9th International Conference on Parallel and Distributed Computing, Applications and Technologies, pp. 419–424, 2008. [47] Suat Ozdemir, “Secure and Reliable Data Aggregation for Wireless Sensor Networks”, Lecture Notes in Computer Science, Springer Berlin/Heidelberg, Vol. 4836, pp 102–109, 2007. [48] T. Claveirole, M. Amorim, M. Abdalla and Y.Viniotis, “Securing wireless sensor networks against aggregator compromises“, IEEE Communications Magazine, April 2008. [49] A. Shamir, “How to share a secret,” Commununication of ACM, vol. 22, no. 11, pp. 612–613, 1979. [50] M. O. Rabin, “Efficient dispersal of information for security, load balancing, and fault tolerance,” Journal of the ACM, vol. 36, no. 2, pp. 335–348, 1989. [51] Y. Sang, H. Shen, Y. Inoguchi, Y. Tan and N. Xiong, “Secure data aggregation in wireless sensor networks: A survey”, In Proceedings of the 7th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT ’06)’, IEEE Computer Society, Washington, DC, USA, pp. 315–320, 2006. [52] G. Kambourakis, E. Klaoudatou, and S.Gritzalis, “Securing Medical Sensor Environments: The CodeBlue Framework Case”, Security and Communication Networks, 1:5, 375–388, 2008. [53] A. Muneeb, S. Umar, A. Dunkels, T. Voigt, K. Romer, K. Lanendoen, J. Polastre, and A. Uzmi, Z., “ Medium Access Control Issues in Sensor Networks”, ACM SIGCOMM Computer Communication Review, April 2006, 36(2), 33–36, 2006. [54] J. Deng, R. Han, and S. Mishra, “Security Support for In-network Processing in Wireless Sensor Networks,” 2003 ACM Wksp. Security of Ad Hoc and Sensor Networks (SASN ’03), Oct. 2003.
30
N. Labraoui et al.
[55] S. Q. Ren, D. S. Kim, and J. S. Park, “A Secure Data Aggregation Scheme for Wireless Sensor Networks”, Frontiers of high performance computing and networking, ISPA 2007 Workshops, Springer-Verlag Berlin Heidelberg, LNCS 4743, pp. 32–40, 2007. [56] W. Kui, D. Dreef, B. Sun, Y. Xiao, “Secure data aggregation without persistent cryptographic operations in wireless sensor networks”, Ad Hoc Networks 5 (1) (2007) 100–111 [57] S. Ozdemir, “Concealed data aggregation in heterogeneous sensor networks using privacy homomorphism”, In Proceedings of the IEEE International Conference on Pervasive Services (ICPS’07), Istanbul, Turkey, 2007, pp. 165–168. [58] S. Madden, M. J. Franklin, and J. M. Hellerstein, “TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks,” ACM SIGOPS Operating Systems Review, 2002, 36(SI), pp. 131–146. [59] E. Fasolo, M. Rossi, J. Widmer, and M. Zorzi, “In-Network Aggregation Techniques for Wireless Sensor Networks: A Survey”, IEEE Wireless communication 2007. [60] Y.L. Sun, Z. Han and K. J. R. Liu, “Attacks on Trust Evaluation in Distributed Networks” 40th Annual Conference on Information Sciences and Systems”, Princeton, New Jersey, March 22–24, 2006. [61] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson, “Wireless Sensor Networks for Habitat Monitoring,” In ACM International Workshop on Wireless Sensor Networks and Applications (WSNA’02), Atlanta, GA, September 2002. [62] http://www.alertsystems.org. [63] E. Biagioni and K. Bridges, “The application of remote sensor technology to assist the recovery of rare and endangered species”, In Special issue on Distributed Sensor Networks for the International Journal of High Performance Computing Applications, Vol. 16, N. 3, August 2002. [64] “MEMS Come to Oz Wine Industry”, Electronic News June 2004. [65] J. Stankovic, “Wireless Sensor Networks. In Handbook of Real-Time and Embedded Systems”, edited by I. Lee, J.Y.-T. Leung, S.H. Son (CRC Press, Boca Raton 2007). [66] C. Hartung, S. Holbrook, R. Han and C. Seielstad, “FireWxNet:AMulti-Tiered PortableWireless System for MonitoringWeather Conditions in Wildland Fire Environments”, In Proceedings of the 4th International Conference on Mobile Systems, Applications and Services (MobiSys), 2006 (2006) pp. 28–41. [67] A. S. Wander, N. Gura, H. Eberle, V. Gupta, and S. Shantz. “Energy Analysis of PublicKey Cryptography on Small Wireless Devices”, In Proceeding of the 3rd IEEE International Conference on Pervasive Computing and Communications (PerCom), March 2005.