On the Construction of Data Aggregation Tree with

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2016.2581491, IEEE Sensors Journal


On the Construction of Data Aggregation Tree with Maximizing Lifetime in Large Scale Wireless Sensor Networks

Shaohua Wan, Yudong Zhang, Chen Jia

Abstract—Data aggregation protocols are generally used to extend the lifetime of sensor networks by reducing the communication cost. Traditionally, tree-based structured approaches have been studied for the many applications in which it is a basic operation for the sink to periodically collect reports from all sensors. Since the data aggregation process usually proceeds for many rounds, it is important to collect these data efficiently, that is, to reduce the energy cost of data transmission. In such applications, a tree is usually adopted as the routing structure to save the computation cost of maintaining the routing tables of sensors. In our previous work, we demonstrated that multiple trees, as well as split trees, can provide additional lifetime extensions when certain nodes are deployed in a wireless sensor network. In this paper, we explore how the number of trees in the family-set influences the lifetime gain, and we study the problem of constructing data aggregation trees that minimize the total energy cost of data transmission under a diverse set of scenarios and network query regions. By dividing the query area, the sensory and aggregated data are returned through a number of different forwarding trees, one within each sub query area, which reduces the network "hot spots". To evaluate the performance of the proposed approach, we compare and analyze an angular division routing algorithm and a query region division routing algorithm with LEACH. Theoretical and experimental results illustrate that the query region division algorithm based on angle leads to lower energy cost in comparison with the models reported in the literature.

Index Terms—Energy efficiency, query region division, LEACH, tree structure, wireless sensor networks.

Shaohua Wan and Yudong Zhang are corresponding authors. Manuscript received January 20, 2016; revised June 3, 2016; accepted June 10, 2016. This work was supported by Natural Science Foundation of Hubei Province of China under Grant 32415113004. Shaohua Wan is with the School of Information and Safety Engineering, Zhongnan University of Economics and Law, and with State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology ([email protected]). Yudong Zhang is with the School of Computer Science and Technology, Nanjing Normal University, Nanjing ([email protected]). Jia Chen is with the Department of Electrical and Computer Engineering, Technical University of Munich ([email protected]).

I. INTRODUCTION

In many applications, sensors are required to send reports to a specific target (e.g., a base station or a sink) periodically [1]. In habitat monitoring [2] and civil structure maintenance [3], it is a basic operation for the sink to periodically collect reports from sensors. Since the data gathering process usually proceeds for many rounds, it is necessary, for energy saving, to reduce the number of packets carrying the reports that are transmitted in each round. In this paper, we study data gathering in wireless sensor networks.

Data aggregation is a well-known method for data gathering. In [1], a fixed number of reports received or generated by a sensor are aggregated into one packet. In other applications, a sensor can aggregate the reports received or generated into one report using a divisible function (e.g., SUM, MAX, MIN, AVERAGE, top-k) [4]. Data compression, which exploits the correlation between data so that the number of reports is reduced, is another method for data gathering [5], [6]. In many applications, however, no spatial or temporal correlation exists between data (e.g., status reports [1]), and data aggregation is the more suitable method for data gathering.

The effectiveness of data aggregation is mainly determined by the routing structure. Tree-based routing structures have been studied extensively and from different perspectives, e.g., in-network processing of aggregate queries, maximizing geographic knowledge, balancing workloads, and exploiting knowledge about the mobility of sinks. An approach that combines tree-based and multipath-based routing, called "Tributaries and Deltas", was presented in [7]. The focus of that work was to adjust the balance between trees ("Tributaries") and multipaths ("Deltas") in response to varying network conditions, expressed in terms of the packet drop rate, for the purpose of robust and efficient in-network computation of aggregates.
In particular, algorithms are presented in [8], [9], [10], [11], [12], [13], [14], [15] for computing frequent item sets, quantiles, histograms, and spatio-temporal data, along with the criteria for changing the role of a particular node. In this paper, we study the problem of constructing a number of data aggregation trees with minimum energy cost. With this understanding, we set out to explore the benefits that a given sensor network may gain when multiple trees are used in conjunction with a collection of multipaths for routing the results of a given query Q pertaining to a geographic region. Specifically, we present two complementary approaches and investigate their impact on load balancing in terms of the spatial distribution of the energy consumption of the nodes involved in processing Q.

In our recent work [16], we investigated the problem of load balancing for prolonging the operational lifetime of the nodes in sensor networks, at the expense of additional (bounded) delay as a quality-of-service criterion, by means of combining multiple trees and multiple multipaths. Two strategies were proposed and experimentally studied: (1) generating a collection of trees that can be used in alternation; and (2) constructing split trees that can be used concurrently, each providing the same spatial coverage for a given query region, but yielding the best lifetime extension benefits. Motivated by this observation, we now devise an experiment-based methodology for determining how the number of trees in the family-set influences the lifetime gain, and what the optimal value is under a diverse set of scenarios and a given query area (the cardinality of tributary problem, CoT). We intend to tie CoT to various query settings, with or without aggregation, and to user preferences such as stringent or relaxed QoS (Quality of Service) requirements, e.g., delay tolerance and accuracy of the result. In the end, we hope to provide a quantitative guideline for choosing CoT at query specification for real-world deployments that will yield, on a best-effort basis, extended lifetime over the traditional methods of single-tributary construction.

1530-437X (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. RELATED WORK

Various aspects of the routing problem in wireless sensor network settings have received considerable attention, because communication has the largest impact on the depletion of the batteries of the sensor nodes. In many data aggregation algorithms, a tree is used as the routing structure [17], [18], [19], [20], especially for applications that have to monitor events continuously. In [21], Madden et al. propose the Tiny AGgregation Service (TAG) framework, which uses the shortest path tree and proposes improvements such as snooping-based and hypothesis-testing-based optimizations, dynamic parent switching, and the use of a child cache to estimate lost data. TAG lets parents notify their children about the waiting time for gathering all data from the children before transmitting, and the sleeping schedule can be adjusted accordingly. Ding et al. use a shortest path tree with parent energy-awareness in [22], where the neighboring node with the shortest distance to the sink that has higher residual energy is chosen as the parent. Zhang and Cao propose Dynamic Convoy Tree-Based Collaboration (DCTC) in [23], [24]. Essentially, DCTC tries to balance the tree in the monitoring region to reduce the energy consumption. However, it assumes that sensor nodes know the distance to the center of the event, which may not be feasible to compute from the sensed information in all tracking applications. The reason for all of the above tree configurations is that sensors, which usually have limited resources, can save the relatively high computational cost of maintaining routing tables if they route packets based on a tree. While several papers target the maximization of the network lifetime [25], [26], [27], [28], [29], [30], [31], [32], it is sometimes desirable to minimize the energy cost. In particular, Wang [33] has investigated the problem of good coverage and load balancing for the purpose of prolonging the operational lifetime
of the nodes in a sensor network. In this paper, we study the problem of constructing a data aggregation tree with minimum energy cost. The research focuses on how to choose a better routing metric based on data attributes to facilitate data aggregation. Protocols in the tree-based family are built on the traditional shortest path routing tree. The main objective is to construct such trees in a given query region. When long-running queries are executed, trees are used as the routing structure, especially for applications that have to monitor events continuously and periodically. The reason is that sensors, which usually have limited resources, can save the relatively high computational cost of maintaining routing tables if they route packets based on a tree. The nodes in the proximity of a designated route maintain higher energy reserves than the ones involved in servicing that particular route. Hence, if these otherwise dormant nodes could be used for routing, the variance of the energy levels in the proximity of the designated route would be reduced over time. Moreover, if used carefully, routing along these alternative nodes will not add significant delay to packet delivery compared with the original route.

In [18], the authors prove that the problem of finding an optimal aggregation tree that maximizes the network lifetime is NP-complete and propose an approximation algorithm that produces a suboptimal tree. Recently, there has been some work on constructing a tree routing structure with maximum lifetime. Xue et al. [34] use a linear programming approach to give an approximation algorithm for finding a maximum lifetime aggregation tree. Shan et al. [30] use a shortest path aggregation tree to collect data from the network. In contrast, our scheme focuses on how to divide the query region in which data aggregation trees are constructed with maximum lifetime.

III. PROBLEM STATEMENT AND SYSTEM MODEL

A. Problem Formation

Given the goal of allowing users to pose declarative queries over sensor networks, a user connects to one of the sink nodes, then formulates and submits a query that adheres to the Tiny-SQL query language. When the sink node is physically located within the query region, which is a common scenario for small-scale sensor networks, the aggregation tree construction is initiated directly at the sink node, which in turn becomes the actual root of the respective tree. The particularity of this scenario is that the sink and root functionalities are handled by the same node. Therefore, the tree-splitting mechanism is not applicable in this case, because the sink node cannot be easily changed. For example, it is unrealistic to assume that the user will move across the network so that the sink node changes over time. One could propose separating the sink from the root node by choosing a root node in close proximity to the sink and establishing a point-to-point route from the root to the sink. Unfortunately, as can be observed in Fig. 1, the point-to-point route will have to share the nodes of the aggregation tree, effectively doubling their load, which is the situation we are
trying to avoid in the first place. It is important to note that the sampling nodes, that is, the nodes physically situated within the sampling region, are critically important for the accuracy of the query result. Nevertheless, it would be interesting to investigate experimentally whether any lifetime gains can indeed be obtained in this scenario.

Therefore, our approach is designed to exploit root-workload balancing in the scenarios where this is readily possible, that is, when the sink node is physically located outside the query region. Fortunately, this case is more common in large-scale sensor network applications, where the user is likely to be more interested in sampling remote areas of the network than the entire wireless sensor network. Fig. 1 illustrates this scenario, which is the only one we consider: it shows an example with 3 split trees and their respective roots in 3 sub query regions, along with the geographic routes used to ship the partially aggregated results to the sink node. These routes should be constructed so that they do not cross the aggregation-tree areas, in order to prevent wireless medium contention and to reduce the load of the sampling nodes. Sensor nodes outside the sampling region are also important, as they may be used for data-relay duties, making the connection between the sampling region and the sink node. We use geographic shortest-path routing to implement the transmission [3].

Fig. 1. Sink node is physically located outside the region, which is divided into 3 sub-regions. In each sub-region, one routing tree has been constructed. (Figure labels: Sink, Root, Query region.)
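The relay step between a sub-region root and the sink can be illustrated with a minimal sketch of greedy geographic forwarding. This is a simplified stand-in under stated assumptions, not the routing protocol used in the paper: each node is identified by its (x, y) position, all nodes share one hypothetical `radio_range`, and a packet is always handed to the neighbor closest to the sink. The function names `greedy_next_hop` and `greedy_route` are illustrative only, and no recovery mode is included for the case where no neighbor is closer to the sink.

```python
import math

def greedy_next_hop(current, neighbors, sink):
    """Return the neighbor strictly closer to the sink than `current`, or None."""
    best, best_d = None, math.dist(current, sink)
    for nb in neighbors:
        d = math.dist(nb, sink)
        if d < best_d:
            best, best_d = nb, d
    return best

def greedy_route(start, sink, positions, radio_range):
    """Forward greedily from `start` until the sink is within radio range.

    Returns the list of visited positions ending at the sink, or None when
    the packet reaches a node with no neighbor closer to the sink.
    """
    path = [start]
    current = start
    while math.dist(current, sink) > radio_range:
        # One-hop neighbors: every other node within communication range.
        neighbors = [p for p in positions
                     if p != current and math.dist(p, current) <= radio_range]
        nxt = greedy_next_hop(current, neighbors, sink)
        if nxt is None:
            return None  # local minimum: greedy forwarding cannot progress
        path.append(nxt)
        current = nxt
    path.append(sink)
    return path

# Hypothetical relay nodes on a line between a root at (0, 0) and the sink.
relays = [(0, 0), (10, 0), (20, 0), (30, 0)]
print(greedy_route((0, 0), (40, 0), relays, radio_range=12))
```

Because the distance to the sink strictly decreases at every hop, the loop always terminates, either at the sink or at a node where greedy progress is impossible.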

B. System Model

We assume that the sensor network consists of N nodes, SN = {S1, S2, …, SN}, and that each node is aware of its location, either through GPS or by some other technique (e.g., trilateration based on a few beacon nodes). In addition, each node knows the locations of its one-hop neighbors, i.e., the nodes within its communication range. The nodes are assumed to be static; their geographic locations do not change over time. A particular query Q is specified as a sextuple (Sink, QR, Sval, F, Δ, Type), where:

1. Sink is the ID of the sink node, which is the final destination of all the packets containing values/readings of relevance for Q;
2. QR is the geographic region of interest, from which the reading of a particular value is needed;
3. Sval describes the particular value that needs to be monitored (e.g., temperature);
4. F denotes the sampling frequency for the query;
5. Δ specifies the temporal duration, in terms of an interval [tbegin, tend];
6. Type is a string that specifies the query operation performed, for example, monitoring some aggregate value (e.g., SUM, COUNT, AVG, MEDIAN) or simply gathering the readings of individual nodes.

To construct a routing tree rooted at a particular node, we assume that a standard geographic routing protocol, similar to the ones used in the TAG approach, is employed.

IV. PARALLEL TRANSMISSION WITH DISJOINT TREES

Multipath routing approaches can be used for two purposes: load balancing, which in turn prolongs the lifetime of the network, and robustness of the transmission. We set out to explore the benefits that a given sensor network may gain when multiple trees are used for routing the results of a given query Q pertaining to a geographic region. We compare two algorithms and investigate their impact on load balancing in terms of the spatial distribution of the energy consumption of the nodes involved in processing Q. An example illustrating our problem setting is given in Fig. 1, which presents a scenario in which the readings of the sensors in a region Q are transmitted to a given sink. Here, the routing is executed using multiple trees combined with GPSR paths. The multiple trees (CoT) determine the family of routes to be used outside Q and towards the sink, which means that an efficient family of tributaries and deltas can increase the lifetime of each of the nodes involved. Consequently, we explore two ways in which the set of nodes in Q can be split into subsets that form multiple disjoint trees transmitting concurrently.

A. Angular Query Region Division Routing Algorithm

The algorithm consists of four stages: dividing the query area, distributing the query message, constructing the routing trees, and collecting sensor readings.

(1) The rectangle ABCD represents the query area and S is the sink node. The angle ∠ASB is divided into n equal parts by dividing rays from S that pass through the query area ABCD, so that the region is divided into n sub-regions. We also propose a geographic-routing-based query message multicast protocol and an itinerary-based data collection protocol, which saves the energy of distributing query messages by scheduling only some nodes within the query area to broadcast query messages. The proposed data collection protocol returns the query results to the sink through a geographic routing protocol, which reduces the number of data forwarding operations.

(2) In the message distribution and routing tree construction phase, the sink node sends the query message to the nodes of each query sub-region that are close to it. We call these the root nodes; they are shown in red in Fig. 2. A root broadcasts a message that is flooded through the entire queried region. In that message, a node includes its own level, which is the number of hops between it and the aggregation root; the aggregation root's level is zero. Any sensor without an assigned level that receives this message sets its own level to the level in the message plus one, until all nodes in the queried region have been assigned a level. Since a node may receive the message from several neighbors at roughly the same time, it chooses the one closest to the aggregation root as its parent. Finally, once the query message has been broadcast in every sub query area, a number of routing trees have been constructed that are rooted at the red nodes and together cover the same spatio-temporal region.

(3) In the collection phase, each node periodically senses data, which is routed toward the root and finally to the query node. We use the routing tree structure to transmit the data towards the sink level by level. Each inner node must await its children's measurements before forwarding the data to its parent. In our case, each node has one and only one parent, while a parent can have multiple children, as seen in Fig. 2. Communication in the tree is restricted to parent-child links only, which eliminates path searching and extensive message exchange. At each intermediate node, we apply the aggregation function (MIN, MAX or AVG), depending on the query specification, before forwarding the data to the direct parent. Raw measurements are not allowed to be sent towards the sink directly. In practice, this prevents the packets from becoming progressively larger or more numerous, which in turn yields communication savings. In the end, the sensory data is shipped to the parent along the routing tree, level by level, until it reaches the red nodes. The proposed data collection protocol then returns the query results to the sink through a geographic routing protocol, which reduces the number of data forwarding operations.
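The first three stages above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: nodes are (x, y) tuples, all nodes share one hypothetical `radio_range`, the query region lies on one side of the sink so that angles do not wrap around ±π, the root of a sub-region is taken to be the member closest to the sink, and flooding is modeled as a breadth-first search restricted to the sub-region. All function names are hypothetical.

```python
import math
from collections import deque

def sector_index(node, sink, a, b, n):
    """Index (0..n-1) of the equal-angle slice of angle ASB that contains `node`."""
    def angle(p):
        return math.atan2(p[1] - sink[1], p[0] - sink[0])
    lo, hi = sorted((angle(a), angle(b)))
    frac = (angle(node) - lo) / (hi - lo)
    return min(n - 1, max(0, int(frac * n)))

def build_tree(members, root, radio_range):
    """Flood hop levels from `root`, then let each node pick its parent."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in members:
            if v not in level and math.dist(u, v) <= radio_range:
                level[v] = level[u] + 1  # level in the message plus one
                queue.append(v)
    parent = {root: None}
    for v, lv in level.items():
        if v == root:
            continue
        # Neighbors one level up; choose the one closest to the root.
        candidates = [u for u in level
                      if level[u] == lv - 1 and math.dist(u, v) <= radio_range]
        parent[v] = min(candidates, key=lambda u: math.dist(u, root))
    return level, parent

def angular_trees(nodes, sink, a, b, n, radio_range):
    """Stage (1) and (2): divide nodes into n angular sub-regions, one tree each."""
    sectors = [[] for _ in range(n)]
    for p in nodes:
        sectors[sector_index(p, sink, a, b, n)].append(p)
    trees = []
    for members in sectors:
        if members:
            root = min(members, key=lambda p: math.dist(p, sink))
            level, parent = build_tree(members, root, radio_range)
            trees.append((root, level, parent))
    return trees

def aggregate_to_root(readings, level, parent, combine=max):
    """Stage (3): merge partial results upward, deepest level first."""
    partial = {v: readings[v] for v in level}
    for v in sorted(level, key=level.get, reverse=True):
        if parent[v] is not None:
            partial[parent[v]] = combine(partial[parent[v]], partial[v])
    root = next(v for v in parent if parent[v] is None)
    return partial[root]

# Hypothetical 5 x 5 grid in a 100 x 100 region, sink above the edge AB.
nodes = [(x, y) for x in (10, 30, 50, 70, 90) for y in (10, 30, 50, 70, 90)]
trees = angular_trees(nodes, sink=(50, 120), a=(0, 100), b=(100, 100),
                      n=3, radio_range=25)
for root, level, parent in trees:
    readings = {v: v[0] + v[1] for v in level}  # made-up sensor values
    print(root, len(level), aggregate_to_root(readings, level, parent))
```

Processing nodes from the deepest level upward mirrors the rule that an inner node waits for its children before forwarding. The sketch uses MAX as the aggregation function; an AVG query would need to carry (sum, count) pairs instead of a single value so that partial results can be merged correctly.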

Fig. 2. The angular query region division. (Figure labels: Sink, A, B, C, D.)

B. Query Region Division Routing Algorithm with LEACH

We studied some typical routing protocols and energy-hole processing strategies. To overcome the low energy efficiency, short lifetime, and complexity of most routing protocols, we propose an energy-load-balancing regional division routing algorithm with LEACH-C [35], as Fig. 5 illustrates. This algorithm adopts two ways of transferring data, inter-region and intra-region, to avoid routing holes in the initial network stage, which can prolong the network lifetime by balancing the energy consumption. Here, nodes are organized into sub-regions that communicate along the tree structure towards the local roots, and these local roots transmit the data to the global sink, which is accessed by the end user. During the setup phase of determining the roots in the sub-regions, each node sends its location and energy level to the sink, which uses the simulated annealing algorithm to find K optimal roots. Once the roots of the sub query regions and the associated nodes are found, the sink broadcasts a message that contains the root ID for each node. If a node's root ID matches its own ID, the node is a root, and the aggregation tree construction is initiated directly at that root node. Finally, a forest-based topology is obtained, which ensures that the energy load is evenly distributed among all the nodes, and we consider each child tree in the forest as a sub-region.

Fig. 5. Query region division routing algorithm with LEACH-C

Optimum Number of Roots

We can analytically determine the optimal value of $k$ in LEACH using the computation and communication energy models. Assume that there are $N$ nodes distributed uniformly in an $M \times M$ region. If there are $k$ clusters (the roots of the trees), there are on average $N/k$ nodes per cluster (one cluster head and $(N/k) - 1$ non-cluster-head nodes; each cluster head is the root of a tree). Each root dissipates energy receiving signals from the nodes, aggregating the signals, and transmitting the aggregate signal to the BS (sink). Since the BS is far from the nodes, presumably the energy dissipation follows the multipath model ($d^4$ power loss). Therefore, the energy dissipated in the cluster head (root) node during a single frame is

$$E_{CH} = l E_{elec}\left(\frac{N}{k} - 1\right) + l E_{DA}\frac{N}{k} + l E_{elec} + l \epsilon_{mp} d_{toBS}^{4} \tag{1}$$

where $l$ is the number of bits in each data message, $d_{toBS}$ is the distance from the root node to the sink, and we have assumed perfect data aggregation. Each non-root node only needs to transmit its data to its parent once during a frame. Presumably the distance to the root is small, so the energy dissipation follows the Friis free-space model ($d^2$ power loss). Thus, the energy used in each non-root node is

$$E_{\text{non-CH}} = l E_{elec} + l \epsilon_{fs} d_{toCH}^{2} \tag{2}$$

where $d_{toCH}$ is the distance from the node to the cluster head. The area occupied by each cluster is approximately $M^2/k$. In general, this is an arbitrary-shaped region with a node distribution $\rho(x, y)$. The expected squared distance from the nodes to the cluster head (assumed to be at the center of mass of the cluster) is given by



$$E[d_{toCH}^{2}] = \iint (x^2 + y^2)\,\rho(x, y)\,dx\,dy = \iint r^2 \rho(r, \theta)\, r\,dr\,d\theta \tag{3}$$

If we assume this area is a circle with radius $R = M/\sqrt{\pi k}$ and $\rho(r, \theta)$ is constant for $r$ and $\theta$, Eq. (3) can be simplified to

$$E[d_{toCH}^{2}] = \rho \int_{\theta=0}^{2\pi} \int_{r=0}^{M/\sqrt{\pi k}} r^3\,dr\,d\theta = \frac{\rho}{2\pi}\frac{M^4}{k^2} \tag{4}$$

If the density of nodes is uniform throughout the cluster area, then $\rho = 1/(M^2/k)$ and

$$E[d_{toCH}^{2}] = \frac{1}{2\pi}\frac{M^2}{k} \tag{5}$$

Therefore, in this case

$$E_{\text{non-CH}} = l E_{elec} + l \epsilon_{fs} \frac{1}{2\pi}\frac{M^2}{k} \tag{6}$$

The energy dissipated in a cluster during the frame is

$$E_{cluster} = E_{CH} + \left(\frac{N}{k} - 1\right) E_{\text{non-CH}} \approx E_{CH} + \frac{N}{k} E_{\text{non-CH}} \tag{7}$$

And the total energy for the frame is

$$E_{total} = k E_{cluster} = l \left(E_{elec} N + E_{DA} N + k \epsilon_{mp} d_{toBS}^{4} + E_{elec} N + \epsilon_{fs} \frac{1}{2\pi}\frac{M^2}{k} N\right) \tag{8}$$

We can thus find the optimum number of clusters by setting the derivative of $E_{total}$ with respect to $k$ to zero:

$$k_{opt} = \frac{\sqrt{N}}{\sqrt{2\pi}} \sqrt{\frac{\epsilon_{fs}}{\epsilon_{mp}}} \frac{M}{d_{toBS}^{2}} \tag{9}$$

In our experiments, N = 100 nodes, M = 100 m, $\epsilon_{fs}$ = 10 pJ, $\epsilon_{mp}$ = 0.0013 pJ, and 75 m

conditions, the experiment is implemented on Mica2 Motes. Our simulation setting is as follows. We create 100 nodes uniformly deployed in a 100 × 100 m² area, which use the heartbeat node discovery protocol to determine their neighbors. Nodes are homogeneous, sharing the same configuration: 19.2 kbps transmission/reception rate on MAC 802.15.4, 10 s time-to-sleep interval, maximum message size of 36 bytes, and power consumption characteristics based on the Mica2 Motes. A small battery powers each node, with an initial capacity of 35 mAh, which, given the power consumption characteristics summarized in Table 1, is expected to power a node for tens of hours, depending on the load. For simplicity, we do not consider the mobility of the nodes.

Table 1. Energy Characteristics of Mica2 Motes
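As a numerical check of Eqs. (8) and (9), the following minimal sketch evaluates the optimum number of roots for the parameter values quoted in the text. It rests on several assumptions that are not stated in the source: $d_{toBS}$ is taken to be 75 m (the only distance value that survives in the text), and the message length `l` and the values of `e_elec` and `e_da` are placeholders chosen only so that Eq. (8) can be evaluated. Those placeholders do not affect $k_{opt}$, which depends only on N, M, $d_{toBS}$, $\epsilon_{fs}$ and $\epsilon_{mp}$.

```python
import math

def total_energy(k, N, M, d_to_bs, l, e_elec, e_da, eps_fs, eps_mp):
    """Total energy dissipated per frame for k roots, following Eq. (8)."""
    return l * (e_elec * N + e_da * N + k * eps_mp * d_to_bs ** 4
                + e_elec * N + eps_fs * M ** 2 * N / (2 * math.pi * k))

def optimal_roots(N, M, d_to_bs, eps_fs, eps_mp):
    """Optimum number of roots, following Eq. (9)."""
    return (math.sqrt(N) / math.sqrt(2 * math.pi)
            * math.sqrt(eps_fs / eps_mp) * M / d_to_bs ** 2)

# Values from the text; d_to_bs = 75 m is an assumption (see lead-in).
params = dict(N=100, M=100.0, d_to_bs=75.0, eps_fs=10e-12, eps_mp=0.0013e-12)
k_opt = optimal_roots(**params)
print(round(k_opt, 2))  # about 6.22

# Placeholder energy constants (hypothetical, not from the source).
extra = dict(l=2000, e_elec=50e-9, e_da=5e-9)
for k in (0.8 * k_opt, k_opt, 1.2 * k_opt):
    print(round(k, 2), total_energy(k, **params, **extra))
```

Since the number of roots must be an integer in practice, one would compare `total_energy` at the two integers on either side of `k_opt` and pick the smaller value.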
