Wireless Netw DOI 10.1007/s11276-009-0211-0

An approach for near-optimal distributed data fusion in wireless sensor networks

Damianos Gavalas · Aristides Mpitziopoulos · Grammati Pantziou · Charalampos Konstantopoulos

© Springer Science+Business Media, LLC 2009

Abstract In wireless sensor networks (WSNs), a large volume of redundant sensory traffic is produced owing to the high node density and the diverse placement of nodes. This depletes scarce network resources such as bandwidth and energy, thus shortening the lifetime of the sensor network. Recently, the mobile agent (MA) paradigm has been proposed as a solution to these problems. The MA approach performs data processing and makes data aggregation decisions at the nodes, rather than bringing data back to a central processor (sink); in this way, redundant sensory data is eliminated. In this article, we consider the problem of calculating near-optimal routes for MAs that incrementally fuse the data as they visit the nodes of a WSN. The order of visited nodes (the agent's itinerary) affects not only the quality but also the overall cost of data fusion. Our proposed heuristic algorithm adapts methods usually applied in network design problems to the specific requirements of sensor networks. It computes an approximate solution to the problem by suggesting an appropriate number of MAs that minimizes the overall data fusion cost and constructs near-optimal itineraries for each of them. The performance gain of our algorithm over alternative approaches, in terms of both cost and task completion latency, is demonstrated through a quantitative evaluation and in simulated environments using a Java-based tool.

Keywords Data fusion · Wireless sensor networks · Mobile agents · Itinerary optimization · Heuristic

D. Gavalas (corresponding author) · A. Mpitziopoulos
Department of Cultural Technology and Communication, University of the Aegean, Lesvos, Greece

G. Pantziou
Department of Informatics, Technological Educational Institution of Athens, Athens, Greece

C. Konstantopoulos
Department of Informatics, University of Piraeus, Piraeus, Greece

1 Introduction

Data fusion is the process of combining data and knowledge from different sources with the aim of maximizing the useful information content. It improves reliability while offering the opportunity to minimize the data retained. Multiple sensor data fusion is an evolving technology concerned with how to fuse data from multiple sensors in order to obtain a more accurate estimate of the environment [29]. Applications of data fusion span a wide spectrum, including environment monitoring, automatic target detection and tracking, battlefield surveillance, remote sensing and global awareness [1]. They are usually time-critical, cover a large geographical area and require reliable delivery of accurate information. Most energy-efficient proposals for handling multisensor data fusion in wireless sensor networks (WSNs) are based on the traditional client/server computing model, in which each sensor sends its sensory data to a back-end processing element (PE) or sink. However, as advances in sensor technology and computer networking allow the deployment of large numbers of smaller and cheaper sensors, huge volumes of data need to be processed in real time.


Recent research works have suggested the use of mobile agents (MAs) [22], an intrinsically distributed computing technology, in the field of WSNs. Among other applications, MAs have been proposed as a promising solution for high-performance distributed sensor data integration tasks [35]. In such applications, the choice of the agents' itineraries (the order of visited sensors) is critically important, as it affects the overall energy consumption and data fusion cost. Only a few research articles have dealt with the problem of approximating optimal MA routes, either through heuristics [30] or through genetic algorithms [39]. The most notable weakness of these algorithms is that they rely on a single MA to visit and fuse data from distributed sensors; such solutions do not scale acceptably for networks comprising hundreds or thousands of sensor nodes. Herein, we propose a heuristic algorithm that adapts methods usually applied in network design problems to the specific requirements of distributed data fusion in sensor networks. Our proposed near-optimal itinerary design (NOID) algorithm not only suggests an appropriate number of MAs that minimizes the overall data fusion cost, but also constructs near-optimal itineraries for each of them. The initial ideas behind our proposed algorithm have been presented in [15].

The remainder of the paper is organized as follows: Sect. 2 introduces the MA computing model and its advantages over the client/server model. Section 3 reviews works related to the research presented herein. Section 4 describes the design and functionality of our heuristic algorithm for designing near-optimal itineraries for MAs performing data fusion tasks in WSNs. Section 5 discusses several issues related to the motivation, credibility and performance of NOID, while Sect. 6 identifies suitable applications for NOID and refers to practical issues that arise in real WSN deployments. Section 7 discusses simulation results and Sect. 8 concludes the paper and presents future directions of our work.

2 Mobile agents

Mobile agent technology represents a relatively recent trend in distributed computing, which addresses the flexibility and scalability problems of centralized models. The term MA [28] refers to an autonomous program with the ability to move from host to host and act on behalf of users towards the completion of an assigned task. Lange and Oshima [22] listed seven good reasons to use MAs, including reducing network load, overcoming network latency, and robust and fault-tolerant performance. The MA-based computing model moves the code (processing) to the data rather than transferring raw data to the processing module. By transmitting the computation engine instead of data, this model offers several important benefits. First, the network overhead associated with data transfers is substantially decreased: instead of passing large amounts of raw data over the network through several round trips, only an agent of small size is sent. MAs can also be programmed to carry task-adaptive processes which extend the built-in capability of the system. Last but not least, MAs favor stability and fault tolerance, since they can be dispatched while the network connection is alive and return results when the connection is re-established. Therefore, the performance of the system is not much affected by unreliable network links. On the other hand, the role of MAs in distributed computing is still being debated, mainly due to security concerns [13]. Also, any computer system that hosts MAs should be equipped with a software platform capable of receiving, instantiating and executing MAs (see Sect. 6); this platform may place considerable demands upon system resources. Several applications have shown clear evidence of benefiting from the use of MAs [26], including e-commerce and m-commerce trading [33], distributed information retrieval [19], network awareness [2], and network & systems management [13, 31, 32]. Network-robust applications are also of great interest in military situations today; MAs are used to monitor and react instantly to continuously changing network conditions and to guarantee successful performance of the application tasks. Mobile agents have also found a natural fit in the field of WSNs; hence, a significant amount of research has been dedicated to proposing ways for the efficient usage of MAs in the context of WSNs. In particular, MAs have been proposed for enabling dynamically reconfigurable WSNs through easy development of adaptive and application-specific software for sensor nodes [36], for separating sensor nodes into clusters [23], for multi-resolution data integration and fusion [29], for data dissemination [6] and for location tracking of moving objects [4, 35].
These applications involve the usage of multi-hop MAs visiting large numbers of sensors. The order in which those sensors are visited (i.e., the MA's itinerary) is a critical issue that seriously affects the overall performance. Randomly selected routes may even result in performance worse than that of the conventional client/server model; yet, this issue is not addressed in these works.

3 Related work

Wireless sensor network environments form a promising application area for MAs; yet, such environments pose new challenges, as the wireless bandwidth is typically much lower than that of a wired network and sensory data traffic may even exceed network capacity. To solve the problem of overwhelming data traffic, [29] and [30] proposed the MA-based distributed sensor network (MADSN) for scalable and energy-efficient data aggregation. By transmitting the software code (MA) to sensor nodes, a large amount of sensory data may be filtered at the source by eliminating redundancy. The use of MAs enables more efficient data aggregation than the traditional client/server model, in which raw sensory data are transmitted to the PE where data fusion takes place (see Fig. 1a). In the MA-based scheme, MA objects may visit a number of sensors and progressively fuse the retrieved sensory data before returning to the PE to deliver the data (see Fig. 1b). The main objective in MA-based distributed computing is to minimize the volume of network traffic exchanged between distributed systems while maintaining relatively low task execution time, especially for time-critical tasks [11, 13]. Despite the potential of agent mobility in distributed applications, inappropriate use of MAs may lead to a highly inefficient design. In data fusion applications, for instance, using a single MA object that sequentially visits all sensor nodes may actually lead to response time, network overhead and energy consumption worse than those of the conventional client/server model. A rational approach to overcome such scalability problems is to partition the managed network into several logical/physical clusters (domains), with each cluster assigned to a separate MA. However, an efficient method is needed to determine an optimal clustering of sensor nodes as well as optimal itineraries for the individual MAs. A first attempt to address the issue of optimizing the itineraries of multi-hop MAs has been reported in [17]: Iqbal et al. developed a performance model that allows agents to decide whether they should migrate to a site and communicate locally or perform the communication remotely through remote procedure calls.
The same approach has been followed in [31], in the context of network and systems management applications. Rubinstein et al. [32] evaluated the scalability of MA-based management on large enterprise networks and compared the performance of this approach against that of the centralized management paradigm. Based on the assumption that "MA's state increases with the number of visited nodes and, as a consequence, migration becomes difficult", they proposed a strategy in which the MA returns to the management station to deliver its collected data, thereby reducing its size before visiting the remaining hosts.

Fig. 1 Centralized vs. mobile agents-based data fusion in WSNs: (a) centralized (client/server) data fusion at the processing element; (b) mobile agent-based data fusion (panels show the sensors within the surveillance region and the processing element)

To the best of our knowledge, only [30] and [39] deal with the problem of designing optimal MA itineraries in the context of WSNs. In [30], Qi and Wang proposed two heuristics to optimize the itinerary of MAs performing data fusion tasks. In the local closest first (LCF) algorithm, each MA starts its route from the PE and searches for the next destination with the shortest distance to its current location. In the global closest first (GCF) algorithm, MAs also start their itinerary from the PE node and select the node closest to the center of the surveillance region as the next-hop destination. The output of LCF-like algorithms, though, depends heavily on the MA's original location, while the nodes left to be visited last are typically associated with high migration cost [20] (see, for instance, the last two hops in Fig. 5a); the reason is that these algorithms search for the next destination among the nodes adjacent to the MA's current location, instead of looking at the 'global' network distance matrix. On the other hand, in most cases GCF produces messier routes than LCF, with repetitive MA oscillations around the region center, resulting in long route paths and unacceptably poor performance [30, 39]. Wu et al. [39] proposed a genetic algorithm for computing routes for an MA that incrementally fuses the data as it visits the nodes in a WSN. Although it provides superior performance (smaller network overhead and energy spending) compared with the LCF and GCF algorithms, this approach implies a time-expensive optimal itinerary calculation (genetic algorithms typically start their execution with a random solution 'vector' which is improved as the execution progresses), which is unacceptable for time-critical applications, e.g., target location and tracking.
Also, in such applications, the group of visited sensor nodes (i.e., those with the maximum detected signal level) changes frequently over time depending on the target's movement; hence, a method that guarantees fast adaptation of MA itineraries is needed. Most importantly, both the approaches proposed in [30] and [39] involve the use of a single MA object, launched from the PE station, that sequentially visits all sensors regardless of their physical location on the plane. Their performance is satisfactory for small WSNs; however, it deteriorates as the network size grows and the sensor distributions become more complicated. This is because both the MA's round-trip delay and the overall migration cost may increase quadratically with network size, as the traveling MA accumulates the data retrieved from visited sensors [11]. The growing MA state size not only results in increased consumption of the limited wireless bandwidth, but also drains the limited energy supplies of sensor nodes. Note, though, that the MA state size remains constant in scenarios that involve efficient fusion algorithms. An alternative approach for MA-based data aggregation is presented in [40], which proposes a hierarchical infrastructure wherein neighbor nodes are grouped in clusters, with an MA object assigned to each cluster to fuse data from its nodes (cluster members). The same idea had been proposed earlier for another application field [14]. The main advantage of this model is that it enables a fairer distribution and localization of data traffic across the network. On the other hand, the problem of optimizing the itineraries of MAs traveling within the clusters still remains. In addition, such hierarchical structures imply complex manageability and heavy control traffic for the clustering process, and lead to rapid energy depletion of the nodes entrusted with data aggregation (cluster heads).

Our algorithm has been designed on the basis of three objectives: (a) MA itineraries should be derived as fast as possible and adapt quickly to frequent topology changes, e.g.
sensor node failures due to energy depletion, by suggesting alternative MA itineraries (hence, an efficient heuristic is needed); (b) the number of MAs involved in the data fusion process should depend on the number and the physical location of the sensors to be visited, as well as on the amount of data collected from each sensor, and the order in which an MA visits its assigned nodes should be computed so as to minimize the overall migration cost and response time; (c) since the energy consumption of sensors is directly related to the volume of transmitted data, sensor nodes with insufficient energy availability should be visited at the beginning of the agents' itineraries (when MAs have not yet accumulated large amounts of data).
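To make the weakness of LCF-like routing concrete, the sketch below applies the greedy LCF rule, as summarized in this section, to the cost matrix of Table 1 (Sect. 4.2). It is an illustrative reimplementation, not the code of [30]: we assume that "shortest distance" means the lowest cost-matrix entry, and the names `COSTS`, `cost` and `lcf_itinerary` are hypothetical.

```python
# Illustrative sketch (not the code of [30]): the local closest first (LCF)
# rule applied to the cost matrix of Table 1. Assumption: "shortest distance"
# is read as the lowest link cost c_{u,v}; the matrix is symmetric.
COSTS = {
    ("S0", "A"): 50, ("S0", "B"): 40, ("S0", "C"): 62, ("S0", "D"): 56,
    ("S0", "E"): 42, ("S0", "F"): 88,
    ("A", "B"): 22, ("A", "C"): 24, ("A", "D"): 58, ("A", "E"): 73,
    ("A", "F"): 177,
    ("B", "C"): 22, ("B", "D"): 21, ("B", "E"): 27, ("B", "F"): 130,
    ("C", "D"): 19, ("C", "E"): 39, ("C", "F"): 131,
    ("D", "E"): 18, ("D", "F"): 80,
    ("E", "F"): 73,
}


def cost(u: str, v: str) -> int:
    """Symmetric lookup of the link cost c_{u,v} (upper triangle stored)."""
    return COSTS[(u, v)] if (u, v) in COSTS else COSTS[(v, u)]


def lcf_itinerary(pe: str, sensors: list[str]) -> list[str]:
    """Single-MA route: always migrate to the cheapest unvisited sensor."""
    route, current, unvisited = [pe], pe, set(sensors)
    while unvisited:
        # Greedy local choice; sorting first makes tie-breaking deterministic.
        nxt = min(sorted(unvisited), key=lambda v: cost(current, v))
        route.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    route.append(pe)  # the MA returns to the PE to deliver the fused data
    return route


route = lcf_itinerary("S0", ["A", "B", "C", "D", "E", "F"])
print(route)  # ['S0', 'B', 'D', 'E', 'C', 'A', 'F', 'S0']
print([cost(u, v) for u, v in zip(route, route[1:])])  # [40, 21, 18, 39, 24, 177, 88]
```

On this matrix, the two most expensive migrations of the route (A to F, then F back to the PE) are its last two, in line with the remark above that the nodes LCF leaves for the end tend to incur high migration cost.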

4 The near-optimal itinerary design algorithm

4.1 Problem statement

A wireless sensor network (WSN) is represented by a complete graph G = (V, E), |V| = n, where each node i, i = 0,…,n − 1, in V corresponds to a sensor node Si, i = 0,…,n − 1, and each edge (i,j) in E corresponds to a communication link between the sensors Si and Sj. The sensor S0 corresponds to the PE. Each link (i,j) is associated with a cost ci,j, which is a function of the power loss of the signal transmitted over the wireless link (i,j) (defined as the difference, in dB, between the effective power transmitted by Si and the power received by Sj, and itself a function of the physical distance between Si and Sj), as well as of the transmitting power and the signal energy detected by node Si. The mobile agent routing (MAR) problem asks for a path (itinerary) in a WSN that optimizes a certain routing objective. The overall routing objective is to maximize the sum of the signal energy received at the visited sensors while minimizing the energy consumption (power needed for communication) and the path losses [39]. The MAR problem is NP-complete [7, 39], and approximate solutions to it are given by heuristic approaches [30, 39], as discussed in the previous section.

Let us consider an extension of the problem where, given a WSN with S = {S0, S1, …, Sn−1} its set of sensors and C = {ci,j | (i,j) ∈ E} its cost matrix, instead of one itinerary we ask for a set of near-optimal itineraries I = {I0,…, Ik}, all originated and terminated at the PE (node S0), such that the sum of the costs of the itineraries in I is minimized. The total cost per polling interval over all itineraries is defined as:

$$c_{total} = \sum_{i=1}^{|I|} \sum_{j=0}^{|I_i|} (d_j + s)\, c_{k,l}, \qquad k, l \in \{0, \ldots, n-1\} \qquad (1)$$

where |I| denotes the number of agent itineraries, |Ii| represents the number of nodes included in the ith MA itinerary, dj is the amount of data collected by the ith MA on the first j visited sensors, s is the MA initial size and ck,l is the cost of utilizing the link (k,l) traversed by the MA on its jth hop, i.e., the wireless link connecting sensors Sk and Sl (ck,l is given by the network cost matrix). Therefore, the extended MAR problem asks for a set of itineraries I minimizing the cost function of Equation (1). Note that Equation (1) formulates a cost function that generally applies to any MAR scenario; for instance, in the case of efficient fusion algorithms (where MAs always carry the same payload), it still holds by setting dj = c for all j. Interestingly, our problem of designing optimal itineraries (the extended MAR problem) exhibits some similarities with the Multi-point Line Topology or Constrained Minimum Spanning Tree (CMST) problem. A CMST is a Minimum Spanning Tree (i.e., a connected graph without cycles of minimum total cost) with an additional constraint on the size of the sub-trees rooted at the 'center', i.e., there is an upper limit on the number of nodes included in each of the sub-trees originating at the tree's root. In the CMST problem, the objective is the optimal selection of the links connecting terminals to concentrators or directly to the network center, resulting in the minimum possible total cost. The CMST problem is NP-hard and, as a result, several heuristics have been proposed to deal with it efficiently [20]. The output of a CMST algorithm typically comprises topologies partitioned into several multi-point lines (or tree branches), where groups of terminals share a sub-tree to a specific node (center). For instance, Fig. 2a depicts a set of nodes with a given network center and the costs of connecting individual pairs of nodes, and Fig. 2b presents the near-optimal multi-line topology that minimizes the overall cost, where network nodes have been partitioned into two clusters or sub-trees, each directed to the network center.

Fig. 2 CMST problem: (a) the unconnected graph; (b) the optimal multi-point line topology (constrained minimum spanning tree)

If the terms 'network center', 'link' and 'multi-point line' are replaced with the terms 'processing element', 'migration' and 'itinerary', respectively, and given that the output of CMST algorithms (a group of multi-point lines rooted at the center) resembles a group of itineraries all originated at the PE, the similarity between the CMST and MA itinerary planning problems becomes apparent, although their cost functions and routing objectives are different. Based on these observations, in this work we propose and study the performance of a heuristic for the solution of the extended MAR problem, called the NOID (near-optimal itinerary design) algorithm. NOID adapts some basic principles of the Esau-Williams (E-W) heuristic for the CMST problem [9] to the specific requirements of our itinerary planning problem. Note that the cost function used in the E-W algorithm considers the cost of the selected links as the only factor contributing to the total itinerary cost.
This is not an adequate metric for evaluating the cost of agent itineraries ctotal, because another key factor affecting ctotal is the agent size and, more importantly, the agent size increment rate [11], which depends on the amount of data collected by the MA at every sensor.
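The effect of the agent size increment rate on Equation (1) can be illustrated with a short sketch. It assumes, as in the worked example of Sect. 4.2, that every visited sensor adds a constant d bytes, so that dj = j·d, and it borrows the cost matrix of Table 1; the helper names are hypothetical. Visiting the same two sensors in opposite orders gives different costs, because the expensive migration is cheaper when performed before the MA has grown.

```python
# Sketch of the cost function of Equation (1). Assumption (as in the worked
# example of Sect. 4.2): every visited sensor adds a constant d bytes, so the
# MA carries d_j = j * d bytes of data on its jth hop, plus its code size s.
COSTS = {
    ("S0", "A"): 50, ("S0", "B"): 40, ("S0", "C"): 62, ("S0", "D"): 56,
    ("S0", "E"): 42, ("S0", "F"): 88,
    ("A", "B"): 22, ("A", "C"): 24, ("A", "D"): 58, ("A", "E"): 73,
    ("A", "F"): 177,
    ("B", "C"): 22, ("B", "D"): 21, ("B", "E"): 27, ("B", "F"): 130,
    ("C", "D"): 19, ("C", "E"): 39, ("C", "F"): 131,
    ("D", "E"): 18, ("D", "F"): 80,
    ("E", "F"): 73,
}


def cost(u: str, v: str) -> int:
    """Symmetric lookup of the link cost c_{k,l} (upper triangle stored)."""
    return COSTS[(u, v)] if (u, v) in COSTS else COSTS[(v, u)]


def itinerary_cost(itinerary: list[str], s: int, d: int) -> int:
    """Inner sum of Equation (1) for one itinerary S0, ..., S0."""
    total = 0
    for j, (k, l) in enumerate(zip(itinerary, itinerary[1:])):
        d_j = j * d                      # data collected on the first j sensors
        total += (d_j + s) * cost(k, l)  # MA size on hop j times link cost
    return total


def total_cost(itineraries: list[list[str]], s: int, d: int) -> int:
    """Outer sum of Equation (1): c_total over all agent itineraries."""
    return sum(itinerary_cost(it, s, d) for it in itineraries)


far_first = ["S0", "A", "B", "S0"]   # A is farther from the PE than B
near_first = ["S0", "B", "A", "S0"]
print(itinerary_cost(far_first, s=1000, d=100))   # 122200
print(itinerary_cost(near_first, s=1000, d=100))  # 124200
print(total_cost([["S0", "F", "C", "D", "E", "S0"], far_first], s=1000, d=100))  # 459300
```

The last line evaluates the two itineraries that NOID derives for the example network in Sect. 4.2, under the same assumptions; the figures are produced by this sketch only and are not results reported in the paper.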


4.2 The NOID algorithm

The NOID algorithm takes into account the amount of data accumulated by MAs at each visited sensor (without loss of generality, we assume this is a constant d), a factor ignored by the LCF and GCF heuristics for solving the MAR problem. Namely, NOID recognizes that traveling MAs may become 'heavier' while visiting sensors without returning to the PE to deliver their collected data [11]. Therefore, NOID restricts the number of migrations performed by individual MAs, thereby promoting the parallel employment of multiple cooperating MAs, each visiting a subset of sensors.¹ Specifically, given a set of sensors S = {S0, S1, …, Sn−1}, the PE node S0 and the cost matrix C, the aim of the NOID algorithm is to return a set of near-optimal itineraries I = {I0,…, Ik}, all originated and terminated at the PE. Initially, we assume |S| = n itineraries (as many as the WSN sensors) I0,…, In−1, each containing a single sensor (S0, S1, …, Sn−1, respectively). On each algorithm step, two sensors Si and Sj are 'connected' and, as a result, the itineraries including these hosts (I(i) and I(j), respectively) are merged into a single itinerary.

¹ Admittedly, the payload carried by the MA may practically remain constant as it migrates within the sensor field, given that an efficient fusion algorithm is applied (e.g., [29]). However, this scenario, referred to as 'full aggregation' [24], only applies to a specific class of applications wherein sensory data exhibit high spatial redundancy, so that the MAs' capability for performing progressive fusion can be exploited. In general, partial aggregation is observed for most applications [24]. NOID, on the other hand, is an itinerary scheduling method that generally applies to any data retrieval application, whether or not spatial redundancy is present.

As mentioned in Sect. 3, the LCF and GCF algorithms usually fail as they tend to leave hosts located far from the center stranded, since they prioritize the inclusion of hosts close to the last selected node or to the center. This is certainly meaningful for target tracking applications, wherein closer nodes are likely to provide important, similar information about the target. For data fusion applications, though, relatively expensive links are left to be included last in the solution, significantly increasing the overall cost. One way of dealing with this problem is to pay more attention to sensors far from the center, giving preference to links incident upon them. The NOID algorithm accomplishes this by using the concept of a 'tradeoff function' ti,j associated with each link (i,j). On each iteration of the algorithm, the itineraries that include the pair of sensors Si and Sj with the minimum tradeoff function value ti,j are merged into one. The concept of the tradeoff function was introduced in the E-W algorithm, where it is defined as $t_{i,j} = c_{i,j} - C_{i,S_0}$, with $C_{i,S_0}$ the cost of connecting I(i) to S0. This function considers the cost of the selected links as the only factor contributing to the total itinerary cost. Since this is not an adequate metric for evaluating the cost of agent itineraries ctotal (another key factor affecting ctotal is the amount of data collected by the MA at every sensor), we define the tradeoff function as follows:

$$t_{i,j} = c_{i,j} + p_{i,j} + \sum_{k=1}^{|I(i)|+|I(j)|} [f \cdot k \cdot d] - C_{i,S_0}, \qquad 0 \le f \le 1; \quad |I(j)| = 0 \text{ if } j \equiv S_0 \qquad (2)$$

where $C_{i,S_0}$ is the cost of connecting I(i) to the PE S0. Initially, this is simply the cost of connecting node i directly to the PE. However, as sensor Si becomes part of an itinerary containing other sensors, this changes to:

$$C_{i,S_0} = \min_{k \in I(i)} c_{k,S_0} \qquad (3)$$
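Equations (2) and (3), together with the penalty p_{i,j} of Equation (4) below, can be sketched as a single function. This is a minimal illustration, not the authors' implementation: itineraries are modeled as plain Python sets of sensor names, the cost matrix of Table 1 is reused, and the names `gate_cost` and `tradeoff` are hypothetical. The usage lines evaluate the first two steps of the worked example given later in this section (s = 1,000, d = 100, f = 1).

```python
# Sketch of the tradeoff function of Equation (2), with C_{i,S0} from
# Equation (3) and the penalty p_{i,j} from Equation (4). Itineraries are
# modeled as plain sets of sensor names (an illustrative assumption).
COSTS = {
    ("S0", "A"): 50, ("S0", "B"): 40, ("S0", "C"): 62, ("S0", "D"): 56,
    ("S0", "E"): 42, ("S0", "F"): 88,
    ("A", "B"): 22, ("A", "C"): 24, ("A", "D"): 58, ("A", "E"): 73,
    ("A", "F"): 177,
    ("B", "C"): 22, ("B", "D"): 21, ("B", "E"): 27, ("B", "F"): 130,
    ("C", "D"): 19, ("C", "E"): 39, ("C", "F"): 131,
    ("D", "E"): 18, ("D", "F"): 80,
    ("E", "F"): 73,
}
PE = "S0"


def cost(u: str, v: str) -> int:
    """Symmetric lookup of the link cost c_{u,v} (upper triangle stored)."""
    return COSTS[(u, v)] if (u, v) in COSTS else COSTS[(v, u)]


def gate_cost(comp_i: set[str]) -> int:
    """Equation (3): cheapest direct link from any member of I(i) to the PE."""
    return min(cost(k, PE) for k in comp_i)


def tradeoff(i: str, j: str, comp_i: set[str], comp_j: set[str],
             s: int, d: int, f: float = 1.0) -> float:
    """Equation (2): t_{i,j} for sensor i (in comp_i) and node j (in comp_j)."""
    size_j = 0 if j == PE else len(comp_j)   # |I(j)| = 0 when j is the PE
    penalty = s if j == PE else 0            # p_{i,j} of Equation (4)
    data = sum(f * k * d for k in range(1, len(comp_i) + size_j + 1))
    return cost(i, j) + penalty + data - gate_cost(comp_i)


# Step 1 (all itineraries are singletons):
print(tradeoff("C", "D", {"C"}, {"D"}, s=1000, d=100))       # 257.0
print(tradeoff("A", PE, {"A"}, set(), s=1000, d=100))         # 1100.0
# Step 2 (C and D now share the itinerary {C, D}):
print(tradeoff("C", "B", {"C", "D"}, {"B"}, s=1000, d=100))  # 566.0
```

While all itineraries are singletons, linking a sensor directly to the PE always evaluates to s + f·d (here 1,100), so cheap sensor-to-sensor links are merged first; the value 566 for t_{C,B} agrees with the worked example of this section.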

Equation (2) extends and adapts this function to the specific requirements of the agent itinerary planning problem. The main idea behind this equation is that the more nodes an itinerary already includes, the more difficult it is for a new host to become part of that itinerary, especially when d is large. In particular, the inclusion of a term representing the amount of data collected from the previous hosts (k·d), together with the number of sensors already included in the itineraries considered for merging, i.e., |I(i)| and |I(j)|, obstructs the construction of large itineraries, thereby promoting the formation of multiple itineraries assigned to separate MAs. The coefficient f represents the filtering applied by the data fusion algorithm to the data collected from each sensor.² In the case of efficient fusion algorithms, where the MA size remains unaffected throughout the itinerary, the filtering coefficient is adjusted so that f·k·d = c, ∀k. Finally, pi,j is a penalty coefficient defined as follows:

$$p_{i,j} = \begin{cases} s, & \text{if } S_j \equiv S_0 \\ 0, & \text{elsewhere} \end{cases} \qquad (4)$$

The penalty coefficient discourages the creation of short itineraries, i.e., it encourages the itineraries created so far to merge rather than be connected directly to the PE. The value s assigned to pi,j when Sj ≡ S0 represents the cost burden of connecting an itinerary directly to the PE rather than to another itinerary. This is exemplified in Fig. 3, which illustrates a WSN of four sensor nodes. On the second step of NOID's execution, two itineraries have been created (see Fig. 3a). On the next steps, NOID should either consider merging the two itineraries (see Fig. 3b) or connecting them to the PE (see Fig. 3c). In the former case, a single agent itinerary is derived, in which the MA performs 5 hops; the cost of transferring the MA code is therefore 5·s. In the latter case, the corresponding cost for the two derived itineraries is 6·s. Thus, the extra cost burden of s that applies in the latter case should be included in the tradeoff function, hence the choice of pi,j values shown in Equation (4).

² If, for instance, the deployed sensors monitor the atmospheric temperature within the sensor field and the data fusion task involves collecting the mean or maximum measured temperature, then f is very small (since the MA will carry a single temperature value at a time).

Figure 4 lists a pseudo-code implementation of the NOID algorithm. On each algorithm step, tradeoff function values ti,j are evaluated for all pairs (i,j), except those where sensors Si and Sj are already part of the same itinerary; the itineraries including the nodes that give the minimum ti,j value are merged. For instance, if the tradeoff function is minimized for the pair of sensors Sk and Sl, then I(k) and I(l) are merged into one itinerary. A comparison of NOID with the LCF and GCF heuristics is illustrated in Fig. 5. The sequence numbers indicate the order in which individual links (or migrations) become accepted in the corresponding algorithm steps. The algorithms' outputs for the particular WSN configuration of Fig.
5 are based on the cost matrix presented in Table 1. In our prototype implementation, the calculation of the WSN cost matrix entries is only based on the spatial distance between sensors. This decision approach has been taken because the transmission power (hence, energy) required to transmit data between pairs of sensors increases linearly with their physical distance [1, 39]; therefore, we accept that it is ‘cheaper’ to transmit certain volumes of data between sensors which are located in proximity. Note that when NOID’s execution terminates, one or more ‘sub-trees’ (groups of nodes) rooted at the PE node have been constructed; this is shown on Fig. 5c, where the sequence numbers enclosed within circles indicate the order in which individual links (or migrations) become

Wireless Netw PE

PE

PE

Fig. 3 (a) Itineraries created in step 2 of NOID’s execution; (b) considering merging the itineraries; (c) considering connecting the itineraries directly to the PE Fig. 4 Pseudocode implementation of NOID algorithm

NOID (n, c, d, s, S0) // n: Total number of sensors, c: cost matrix, d: data collected per host, S0: processing element initialize I // I: I0, .., In-1, where I0 ={S0}, … In-1 ={Sn-1} current = S0 N_ connected = 0 // N_connected: the number of sensors already included into an itinerary while (N_ connected < n) // I(i) is the sequence of hosts (itinerary) where sensor i has already been included

compute t

i, j

= ci , j + p i , j +

I (i ) + I ( j )

∑[ f ⋅ k ⋅ d ] − C k =1

C i , S0 = min c k , S0 k∈I ( i )

i , S0

, where I s (i ) ∩ I s ( j ) = ∅ and

// I s (i ) denotes the set corresponding to itinerary sequence I(i)

merge (I(i), I(j)), for (i, j) minimizing the tradeoff function ( min t i , j ) i, j

N_ connected ++ return I

accepted in the corresponding algorithm steps. It is then a trivial task to produce the near-optimal itineraries (starting and terminating at the PE node) for traversing the nodes of each sub-tree; these itineraries correspond to a post-order traversal³ of the sub-trees (shown in Fig. 5d). An alternative approach for traversing the sub-trees produced by NOID would be to include an energy metric.⁴

The execution steps of NOID for the test network graph of Fig. 5 are shown in Fig. 6, where the links (agent migrations) selected on each step are highlighted; we assume that the initial MA size is s = 1,000 bytes, the amount of data collected per host is d = 100 bytes and f = 1. On every algorithm step, a pair (i, j) minimizing $t_{i,j}$ is selected and the itineraries containing hosts $S_i$ and $S_j$ are merged into a single itinerary. This process is repeated until all sensors are connected to the PE, either directly or via another sensor. For each sensor $S_i$, Fig. 6 only presents the value $\min_j t_{i,j}$ of the pair (i, j) minimizing the tradeoff function; the pair (i, j) that minimizes $t_{i,j}$ over all sensors, $\min_{i,j} t_{i,j}$, is then selected.

³ Post-order traversal (for each node v, visit the subtrees rooted at v, then visit v) is more efficient than pre-order traversal (for each node v, visit v, then the subtrees rooted at v) or in-order traversal (for each node v, visit some of the subtrees rooted at v, then v, then the rest of the subtrees rooted at v), as it yields a lower total itinerary cost. Specifically, post-order traversal lets the MA visit distant sensors first and leave the sensors located close to the processing element for the end of the itinerary. Hence, the relatively 'expensive' migrations are performed while the MA has not yet collected much data; at the end of their itineraries, when MAs have already accumulated data from the previously visited sensors, they only have to perform short migrations. The cost efficiency of post-order over pre-order and in-order traversal has been verified experimentally with the simulator tool presented in Sect. 7.

⁴ In this approach, the sensors comprising each sub-tree would be sorted in increasing order of their remaining energy level. The sensors with the lowest residual energy would be visited first, while the MA does not yet carry large amounts of data. Hence, the energy those sensors spend to transmit the MA to the next destination sensor would be minimized; sensors with sufficient energy would be left for the end of the MA's itinerary.

Fig. 5 (a) Output of LCF, (b) output of GCF (C denotes the network's center), (c) output of NOID (the sequence numbers indicate the order in which the corresponding MA migrations are accepted, i.e., the algorithm's iteration sequence numbers), (d) the MA itineraries derived from the NOID algorithm's output

Table 1 Cost matrix of the WSN shown in Fig. 5
       S0    A    B    C    D    E    F
S0      –   50   40   62   56   42   88
A            –   22   24   58   73  177
B                 –   22   21   27  130
C                      –   19   39  131
D                           –   18   80
E                                –   73
F                                     –
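The merge loop just described can be checked against Fig. 6 with a minimal Python sketch. The tradeoff function used here is an assumption inferred from the numbers printed in Fig. 6, not a quotation of Eq. (2): joining sensor i's itinerary to sensor j's costs $c_{i,j} + f d \sum_{k=1}^{|I(i)|+|I(j)|} k$, connecting it to the PE costs $c_{i,S_0} + s + f d \sum_{k=1}^{|I(i)|} k$, and both subtract the cost of the link through which i's itinerary currently reaches $S_0$. The function names are hypothetical; all t values are recomputed on every step (no heap), ties go to the first pair found, and subtree children are visited in the order their edges were accepted.

```python
# Table 1 costs (upper triangle); S0 is the PE. Migration costs are assumed symmetric.
COST = {
    ("S0", "A"): 50, ("S0", "B"): 40, ("S0", "C"): 62, ("S0", "D"): 56,
    ("S0", "E"): 42, ("S0", "F"): 88,
    ("A", "B"): 22, ("A", "C"): 24, ("A", "D"): 58, ("A", "E"): 73, ("A", "F"): 177,
    ("B", "C"): 22, ("B", "D"): 21, ("B", "E"): 27, ("B", "F"): 130,
    ("C", "D"): 19, ("C", "E"): 39, ("C", "F"): 131,
    ("D", "E"): 18, ("D", "F"): 80,
    ("E", "F"): 73,
}


def cost(u, v):
    return COST[(u, v)] if (u, v) in COST else COST[(v, u)]


def noid_merge(sensors, s=1000, d=100, f=1, pe="S0"):
    """Esau-Williams-style merging; returns accepted (i, j, t) steps and tree edges."""
    comp = {x: x for x in sensors}       # sensor -> id of its component (itinerary)
    members = {x: [x] for x in sensors}  # component id -> its sensors
    gateway = {x: x for x in sensors}    # component id -> node through which it reaches the PE
    connected = set()                    # component ids already attached to the PE

    def acc(n):                          # assumed data term: f*d*(1 + 2 + ... + n)
        return f * d * (n * (n + 1) // 2)

    def tradeoff(i, j):
        ci = comp[i]
        saving = cost(gateway[ci], pe)   # cost of the component's current link to the PE
        if j == pe:
            return cost(i, pe) + s + acc(len(members[ci])) - saving
        return cost(i, j) + acc(len(members[ci]) + len(members[comp[j]])) - saving

    steps, edges = [], []
    while any(cid not in connected for cid in members):
        best = None
        for i in sensors:
            if comp[i] in connected:
                continue
            for j in sensors + [pe]:
                if j != pe and comp[j] == comp[i]:
                    continue
                t = tradeoff(i, j)
                if best is None or t < best[0]:
                    best = (t, i, j)
        t, i, j = best
        steps.append((i, j, t))
        edges.append((i, j))
        ci = comp[i]
        if j == pe:
            connected.add(ci)            # this itinerary is finished
        else:
            cj = comp[j]                 # i's itinerary joins j's and inherits j's gateway
            for x in members[ci]:
                comp[x] = cj
            members[cj].extend(members.pop(ci))
            del gateway[ci]
    return steps, edges


def post_order_itineraries(edges, pe="S0"):
    """One itinerary per subtree of the PE, visiting its nodes in post-order."""
    adj = {}
    for u, v in edges:                   # children keep the order in which edges were accepted
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def visit(node, parent, out):
        for child in adj[node]:
            if child != parent:
                visit(child, node, out)
        out.append(node)
        return out

    return [visit(root, pe, []) for root in adj[pe]]


steps, edges = noid_merge(["A", "B", "C", "D", "E", "F"])
for i, j, t in steps:
    print(f"t_{i}{j} = {t}")
print(post_order_itineraries(edges))     # [['A', 'B'], ['F', 'C', 'D', 'E']]
```

Under these assumptions the sketch selects the same six pairs as Fig. 6 (257, 272, 285, 962, 1300, 2000) and yields the two itineraries of Fig. 5d, with $S_0$ implicitly prepended and appended to each.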

For instance, on step one, the pair minimizing $t_{i,j}$ is (i, j) = (C, D); hence the itineraries including hosts C and D are merged, forming $I_s(C) \cup I_s(D) = \{C, D\}$. On the next step, the $t_{i,j}$ values are re-calculated; for instance,

$$t_{C,B} = c_{C,B} + \sum_{k=1}^{|I(C)|+|I(B)|} 100\,k - c_{D,S_0} = 22 + (100 + 200 + 300) - 56 = 566$$

since the itinerary set including host C has grown ($I_s(C) = I_s(D) = \{C, D\}$, hence $|I(C)| = 2$, while $|I(B)| = 1$) and the merged itinerary now reaches $S_0$ through D ($c_{D,S_0} = 56$ in Table 1). At the end of step 6, two itinerary sequences have been constructed, forming two subtrees rooted at the manager host: F, C, D, E and A, B (see Fig. 5c). The itineraries of the two MAs are then designed based on the post-order traversal of the two subtrees: $I_1 = S_0, F, C, D, E, S_0$ and $I_2 = S_0, A, B, S_0$ (see Fig. 5d).

It should be noted that MAs may easily deal with unexpected failures of sensors included in their itinerary (e.g. due to energy depletion) [13]. As soon as a sensor failure is detected (i.e. the network connection between the current and the next sensor cannot be established), the MA either waits for a specified timeout for a possible recovery of the destination sensor or hops directly to the node scheduled to be visited after the failed one. In either case, the failure event is recorded and reported to the PE so that itineraries are re-calculated and the failed sensor is omitted from future MA itineraries. The itinerary design algorithms are executed at the PE node; this is a reasonable choice since an MA always starts its data collection journey from the PE node, which can usually be equipped with more powerful computing resources than regular sensor nodes. The MAs simply follow the routes proposed by these algorithms. We assume that the PE node has the predetermined knowledge necessary for performing the global optimization, such as the

Step 1: $t_{AB} = 22+(100+200)-50 = 272$, $t_{BD} = 21+(100+200)-40 = 281$, $t_{CD} = 19+(100+200)-62 = 257$, $t_{DE} = 18+(100+200)-56 = 262$, $t_{ED} = 18+(100+200)-42 = 276$, $t_{FE} = 73+(100+200)-88 = 285$
Step 2: $t_{AB} = 272$, $t_{BA} = 22+(100+200)-40 = 282$, $t_{CB} = 22+(100+200+300)-56 = 566$, $t_{DE} = 18+(100+200+300)-56 = 562$, $t_{EB} = 27+(100+200)-42 = 285$, $t_{FE} = 285$
Step 3: $t_{AE} = 73+(100+200+300)-40 = 633$, $t_{BE} = 27+(100+200+300)-40 = 587$, $t_{CE} = 39+(100+200+300)-56 = 583$, $t_{DE} = 18+(100+200+300)-56 = 562$, $t_{EF} = 73+(100+200)-42 = 331$, $t_{FE} = 285$
Step 4: $t_{AC} = 24+(100+200+300+400)-40 = 984$, $t_{BD} = 21+(100+200+300+400)-40 = 981$, $t_{CB} = 22+(100+200+300+400)-56 = 966$, $t_{DE} = 18+(100+200+300+400)-56 = 962$, $t_{ED} = 18+(100+200+300+400)-42 = 976$, $t_{FD} = 80+(100+200+300+400)-42 = 1038$
Step 5: $t_{AS_0} = 50+1000+(100+200)-40 = 1310$, $t_{BS_0} = 40+1000+(100+200)-40 = 1300$, $t_{CS_0} = 62+1000+(100+200+300+400)-42 = 2020$, $t_{DS_0} = 56+1000+(100+200+300+400)-42 = 2014$, $t_{ES_0} = 42+1000+(100+200+300+400)-42 = 2000$, $t_{FS_0} = 88+1000+(100+200+300+400)-42 = 2046$
Step 6: $t_{AC} = 24+(100+200+300+400+500+600)-40 = 2084$, $t_{BD} = 21+(100+200+300+400+500+600)-40 = 2081$, $t_{CS_0} = 2020$, $t_{DS_0} = 2014$, $t_{ES_0} = 2000$, $t_{FS_0} = 2046$

Fig. 6 NOID algorithm execution steps for the network of Fig. 5 (for s = 1,000 bytes and d = 100 bytes); for each sensor, the minimum tradeoff value on each step is listed ($S_0$ denotes the PE)

geographical locations (obtained through GPS interfaces) and the transmitting/receiving parameters of the sensor nodes. However, more recent methods may be applied to relax the requirement for GPS-enabled sensor devices [38]. As shown in Fig. 5d, NOID has the MAs return to the sink to deliver their results. Admittedly, an alternative approach would be to schedule MA meetings at designated nodes, where the agents merge their data so as to further reduce the cost. However, such an approach would require some form of synchronous or asynchronous inter-agent communication (e.g. a communication language such as KQML for synchronous communication or a blackboard system for asynchronous communication [3]), raising manageability issues and increasing the overhead on the nodes serving as meeting points. It would also require an algorithm for the near-optimal selection of suitable meeting-point nodes. Finally, merging the data collected by different MAs is unlikely to decrease the overall data overhead drastically: individual MAs are typically assigned to non-overlapping sets of nodes located far from each other, so their collected data is unlikely to exhibit high spatial–temporal redundancy. Due to these disadvantages, this approach has not been considered further.

4.3 Computational complexity of NOID

Global closest first (GCF) essentially uses sorting to compute the MA path; its computational complexity is $O(N \log N)$ with a comparison-based sorting algorithm. LCF has a computational complexity of $O(N^2)$ if the closest neighbour node is obtained by simple comparison at each step. Wu et al. [39] estimated the complexity of their genetic algorithm-based solution as $O(N \cdot M \cdot G)$, where N is the number of nodes in the network, M is the initial population size, and G is the maximum generation number used to indicate the end of the computation. The computational complexity of the NOID algorithm is at most $O(N^2 \log N)$ when the trade-off values $t_{i,j}$ are kept in a heap. A heap returns the smallest $t_{i,j}$ in O(1) time; the $O(\log N)$ factor in the total complexity is due to the cost of updating the heap. Specifically, each time an itinerary merging takes place, some of the $t_{i,j}$ values change and the new values must replace the old ones in the heap, each replacement costing at most $O(\log N)$. The $t_{i,j}$ values affected by a merge are those of the edges connecting nodes of the just-merged itineraries with nodes outside them. Most importantly, at most N such calculations/replacements are needed per merge, since each node requires at most one $t_{i,j}$ adjustment. Hence, over the N merging steps of NOID, the total cost is $O(N^2 \log N)$.
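The heap argument above can be illustrated with Python's `heapq` module, which provides a binary min-heap but no in-place key update. The sketch below therefore uses lazy deletion, which is an assumption and not necessarily the structure used in NOID's implementation: each new $\min_j t_{i,j}$ of a sensor is pushed in $O(\log N)$, and outdated entries are discarded only when they reach the top, so reading the minimum costs O(1) amortized rather than worst-case. The class name `LazyMinHeap` is hypothetical; keys are sensor labels and the values are taken from Fig. 6.

```python
import heapq


class LazyMinHeap:
    """Min-heap of per-key values with lazy deletion of outdated entries."""

    def __init__(self):
        self._heap = []      # (value, key) entries, possibly outdated
        self._current = {}   # key -> its latest value

    def update(self, key, value):
        """Record a new value for key in O(log n); any older entry becomes stale."""
        self._current[key] = value
        heapq.heappush(self._heap, (value, key))

    def remove(self, key):
        """Forget key, e.g. once its itinerary has been connected to the PE."""
        self._current.pop(key, None)

    def peek_min(self):
        """Return (key, value) with the smallest current value, or None if empty."""
        while self._heap:
            value, key = self._heap[0]
            if self._current.get(key) == value:
                return key, value
            heapq.heappop(self._heap)  # stale entry: drop it and look again
        return None


h = LazyMinHeap()
for sensor, t in {"A": 272, "B": 281, "C": 257, "D": 262, "E": 276, "F": 285}.items():
    h.update(sensor, t)
print(h.peek_min())  # ('C', 257)

# After merging C and D, refresh the minima that changed (Fig. 6, step 2)
for sensor, t in {"B": 282, "C": 566, "D": 562, "E": 285}.items():
    h.update(sensor, t)
print(h.peek_min())  # ('A', 272)
```

Pushing one fresh entry per affected sensor mirrors the "at most one adjustment per node" argument; the price of lazy deletion is that the heap may temporarily hold more entries than there are sensors.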

5 Discussion

This section discusses several issues related to the motivation, credibility, functionality and performance of NOID.

5.1 Inclusion of additional parameters in calculating agent itineraries

The cost function of Eq. (1) appears to take into account only the spatial distance among sensor nodes when designing agent itineraries. Nevertheless, spatial distance also captures energy expenditure, since long-haul radio communications consume considerably more energy than short-range communications [12]. Beyond spatial information, a number of additional parameters could be taken into account when computing agent itineraries. Knowledge or estimates of the sensor values could serve as such a parameter. Consider a temperature monitoring application wherein the latest measurements of certain sensor nodes do not vary significantly. Those nodes are therefore more 'predictable' and could be excluded from the computed agent itineraries for a given time interval, saving network bandwidth and preserving the nodes' energy. Similarly, knowledge of sensor values could be exploited in dense WSNs wherein the readings of certain node groups typically present high spatial redundancy. In such scenarios, it would be meaningful to include each of those groups in a separate itinerary. That approach would take advantage of the inherent capability of MAs to apply filtering operations to retrieved data [see the filtering coefficient f in Eq. (2)]: only a small portion of the retrieved data would be accumulated and the MA size would remain practically unaffected, thereby minimizing energy expenditure and the demand on the nodes' resources.

An important aspect of the NOID algorithm is that the inclusion of additional itinerary scheduling parameters does not affect the tradeoff function (2); such parameters only affect the cost matrix values, i.e. the cost of agent migrations among sensor nodes. For instance, assuming that cost matrix entries depend on the spatial distance and the MA's size, the cost of an MA migration between nodes i and j would be calculated as:

$$c_{i,j} = a \, d^{s}_{i,j} \, (s + d_i) \tag{5}$$

where a is a constant, $d^{s}_{i,j}$ represents the spatial distance between i and j, and the data size $d_j$ carried by the MA when migrating from node j depends on the redundancy of the sensory readings:

$$d_j = (1 - r_{i,j})(d_i + d_j) \tag{6}$$

where $r_{i,j}$ represents the spatial redundancy among the readings of i and j.

5.2 Static versus dynamic itinerary planning


Itinerary scheduling can be classified as static, dynamic, predictive dynamic or hybrid [7, 27, 41]. Static planning uses current global network information and derives efficient agent paths at the dispatcher prior to the MA's migration. LCF, GCF and NOID fall into this category since they are executed at the PE platform; hence, MA routes are predefined rather than computed on the fly (awareness of the nodes' geographical locations is assumed). This is a reasonable choice since MAs start their journey from the PE node, which is typically equipped with powerful computing resources. In other words, NOID is centralized⁵, namely it requires the PE to know the location and residual energy of each sensor. On the other hand, since the global information collected at the sink may become outdated, a static itinerary may become suboptimal in a dynamic WSN. Thus, static itinerary planning suits applications where a given set of nodes must be polled repeatedly for sensory data (e.g. environmental conditions monitoring) and topology changes are infrequent. A dynamic itinerary is determined on the fly at each hop of the MA. In [41], a dynamic planning method is proposed to achieve progressive fusion accuracy: the approach seeks the sensor node with the maximum residual energy that requires the least energy consumption for the agent's migration and provides the greatest information gain, so that the MA migrates to the sensor node that progressively increases the accuracy. A dynamic itinerary enables a more prompt response to potential topology changes. On the other hand, it raises energy demands since the next-hop computation executes on resource-constrained nodes; in addition, the MA's size increases considerably (the itinerary scheduling logic is embedded into the MA code and transferred on every MA migration) [27]. Thus, dynamic itinerary planning suits highly dynamic networking environments and applications wherein the set of polled nodes cannot be determined in advance. In a hybrid approach, the nodes to be visited are selected by the PE, but the visiting order is decided on the fly by the MA. The hybrid approach is adopted in [39], which targets object tracking applications; in case of topology changes in the network (e.g. communication loss or energy depletion), the MA routing code is re-executed at the PE and the new route is sent to the MA. Predictive itinerary planning [41] has been devised to optimize the performance of MA-based collaborative processing applications, following the observation that in target tracking applications, for instance, the movement pattern of the target is an important factor affecting the final fusion results, yet one that dynamic planning approaches do not take into account. In conclusion, the choice of the appropriate itinerary planning method is application-dependent. NOID is certainly not the appropriate mechanism for WSN environments that present frequent topology changes or for applications with high dynamics (i.e. those that require real-time adaptation of agent itineraries to provide progressive accuracy). However, NOID could easily adopt hybrid planning practices; e.g. the MAs could be instructed to bypass nodes with low residual energy along their routes.

⁵ NOID extends and adapts the Esau–Williams algorithm to the specific requirements of the agent itinerary planning problem. To the best of our knowledge, there is no distributed implementation of the Esau–Williams algorithm. Essentially, this algorithm builds a spanning tree whose subtrees have bounded weight. Distributed algorithms for building spanning trees exist in the literature; the simplest is to build a breadth-first search tree [25] by flooding a message from the root of the tree. However, these approaches, like more recent ones [21], incur high message overhead, hence high energy cost in WSNs.

5.3 Cost of post-order tree traversal versus spanning tree-based data aggregation

As a first step, NOID builds spanning trees with bounded weight rooted at the sink (see Fig. 5c). It then 'converts' these trees into MA itineraries through a post-order traversal, instead of straightforwardly using the spanning trees as data aggregation trees. We argue that the latter approach does not necessarily achieve high levels of aggregation, hence cost efficiency. Each node in such trees must first wait to receive data from some or all of its children, then aggregate them locally with its own data, and then send the aggregated data further up the tree [34]. When nodes generate unpredictable amounts of sensory data, it becomes difficult for a node to determine the right waiting time before sending its own data up the tree. Also, collisions that may occur due to concurrent data transmissions from nearby nodes in the tree add to the difficulty of finding a reliable timing control scheme at each node [42]. In contrast, the MA-based data aggregation model overcomes these problems, as the sequential fusion performed by traveling MAs removes the need for an aggregation synchronization plan. Also, collisions due to concurrent data transmissions are eliminated, as sensory data are collected and transferred by MAs.

In general, the cost of the post-order traversal is considerably reduced by a shortcutting technique adopted by NOID. Specifically, suppose that the last node from which the MA has collected data is node a and that the tree traversal dictates that the next node to collect data from is node b. Instead of faithfully following the tree path between a and b, the MA by-passes as many nodes as possible along this path whenever direct communication is possible. Clearly, this reduces the cost of the path followed by an MA between these two nodes, especially in dense WSN deployments. Thus, the tree serves more as a guideline for determining the visiting order of the nodes than as the actual route of the MA. The use of trees in finding low-cost tours in graphs has been extensively studied in the literature. For example, a well-known approximation for the metric traveling salesman problem (TSP) derives a tour by traversing a minimum spanning tree (MST) and shortcutting whenever possible, similarly to the course adopted by NOID; Christofides' algorithm (see [37] for a complete description) refines this idea by augmenting the MST with a minimum-weight perfect matching before shortcutting. However, in NOID's case the trees are not MSTs over different groups of nodes; they are constructed with a different objective in mind, namely to minimize the cost function of Eq. (1).

6 Application domains and practical issues

Near-optimal itinerary design ideally suits applications wherein the inherent capability of MAs to delegate processing intelligence to the system level offers added value compared to the client–server paradigm. The advantages of NOID become more evident in WSN environments wherein sensory data exhibit high spatial–temporal redundancy. Temporal redundancy occurs when individual sensor measurements do not fluctuate considerably over relatively prolonged time periods. Spatial redundancy arises in terrains densely covered by sensor nodes, where neighbouring nodes record practically identical readings. When the above-mentioned conditions do not hold, single-hop agent mobility (wherein each MA migrates to a remote node and permanently executes an assigned task thereafter) represents a more reasonable choice than the multi-hop mobility exercised by NOID.


Hence, NOID's agents would represent a suitable middleware solution in a variety of application fields presenting high spatial–temporal redundancy of sensory data, e.g. environmental monitoring, security applications, and processing/filtering of sensor readings such as light intensity, temperature, moisture, acceleration and orientation. This claim is also supported by Cristescu et al. [8], who argue that multi-hop itineraries are the appropriate approach for data gathering when the data of distributed nodes are spatially correlated. NOID would also represent an ideal medium for the transparent deployment of new application logic to distributed nodes (remote sensor programming). A fundamental requirement for any computer system that hosts agents is a software platform, commonly referred to as a Mobile Agent Server (MAS) [13], acting as an interface between visiting MAs and legacy systems. In the context of sensor nodes, MAS modules reside functionally on top of the hardware platform and the operating system or a virtual machine (VM), providing an efficient runtime environment for receiving, instantiating, executing and dispatching incoming MAs, while protecting the system against malicious agent attacks. As illustrated in Fig. 7, a MAS typically comprises the following sub-components: (a) a Listener daemon, which listens for incoming MAs and provides them with an execution thread; (b) the Security component, an optional subsystem that may authenticate incoming agents to ensure that they have been dispatched from trusted hosts and/or encrypt sensitive sensory data; (c) the Service logic component, an optional module that extends the functionality of MAs and implements application logic that the MAs do not carry yet require to perform their decentralized tasks; (d) the Migration facility, whose role is to dispatch an MA, upon request, to a specific network device.
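To make the division of labour among the four MAS sub-components concrete, here is a toy Python model. It is a sketch under stated assumptions, not the API of any real agent platform: all class and method names are hypothetical, agents are plain objects passed between in-process servers instead of serialized over the radio, the Listener runs an agent synchronously rather than on a dedicated thread, and the Security component is reduced to a check of the dispatching host.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MobileAgent:
    origin: str                        # host that dispatched the agent (e.g. the PE)
    itinerary: List[str]               # hosts still to be visited, in order
    task: Callable[[dict], object]     # code the agent runs at every host
    state: List[object] = field(default_factory=list)  # data accumulated so far


class MobileAgentServer:
    """Toy MAS with the four sub-components of Fig. 7 (hypothetical names)."""

    def __init__(self, host: str, network: Dict[str, "MobileAgentServer"],
                 services: Optional[dict] = None, trusted: Optional[set] = None):
        self.host = host
        self.network = network           # host name -> MAS, standing in for the radio link
        self.services = services or {}   # (c) service logic offered to visiting agents
        self.trusted = trusted           # (b) optional security check; None disables it
        network[host] = self

    def listen(self, agent: MobileAgent) -> MobileAgent:
        """(a) Listener: accept an incoming agent and execute it (synchronously here)."""
        if self.trusted is not None and agent.origin not in self.trusted:
            raise PermissionError(f"{self.host}: agent from untrusted host {agent.origin}")
        agent.state.append(agent.task(self.services))
        return self.migrate(agent)

    def migrate(self, agent: MobileAgent) -> MobileAgent:
        """(d) Migration facility: forward the agent to the next host on its itinerary."""
        if not agent.itinerary:
            return agent                 # itinerary exhausted: hand the agent back to the caller
        next_host = agent.itinerary.pop(0)
        return self.network[next_host].listen(agent)


net = {}
pe = MobileAgentServer("PE", net)
MobileAgentServer("S1", net, services={"temperature": 21.5}, trusted={"PE"})
MobileAgentServer("S2", net, services={"temperature": 22.0}, trusted={"PE"})
agent = MobileAgent("PE", ["S1", "S2"], lambda svc: svc["temperature"])
done = pe.migrate(agent)
print(done.state)                        # [21.5, 22.0]
```

In a real deployment the Migration facility would serialize the agent's code and state and transmit them to the next sensor, which is exactly the per-hop transmission cost that NOID's cost function accounts for.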
The implementation of a MAS with rich functionality would be unacceptable for resource-constrained, primitive sensor nodes (Java implementations of MAS modules for PCs typically occupy 100–400 KB [13]). However, modern sensor node devices, although still suffering from energy constraints and limited processing capabilities, can undertake fairly demanding tasks (in terms of memory and processing requirements).⁶ The installation of agent platforms (MASs) in experimental testbeds has already been demonstrated in several research projects [5, 10, 16], providing ground for their future exploitation in commercial products.

⁶ Sun Microsystems' SunSPOTs are equipped with a 180 MHz ARM920T processor and 4 MB of Flash memory, with a lifetime of 7 h to 909 days depending on CPU usage. Crossbow's MICA2 motes are equipped with an 8 MHz ATmega128L CPU, 128 KB of program memory and 512 KB of external storage, with a lifetime of 450 days at an average duty cycle of 1%.


Fig. 7 The hardware/software stack required for hosting MA-based applications

7 Simulation results

Our simulation work compares the performance of NOID against the LCF, GCF and genetic algorithm (GA) approaches in terms of overall itinerary length, data fusion cost and data fusion response time. Unless otherwise specified, the parameters used throughout the simulation tests are those shown in Table 2. The simulation results presented herein have been averaged over ten simulation runs (i.e., ten different network topologies). Simulations have been conducted using a Java-based tool implemented for this purpose. The simulator allows simulation parameters to be specified easily and graphically illustrates the output of NOID, LCF, GA and GCF, while also recording their respective overall itinerary length, data fusion cost and response time. It also takes into account a number of WSN-related constraints, for instance the sensors' transmission range and the network transfer rate, and implements an energy expenditure model (where the energy spent on every MA migration depends on the spatial distance and the MA's size). It should be noted that the source code of the GA implementation (written in C++) has been kindly provided by the authors of [39] and integrated within our simulator using the Java Native Interface (JNI) [18].

Table 2 Simulation parameters
Parameter                                                     Value
Simulated plane (m²)                                          1,500 × 1,000
Number of sensors                                             100
Sensors transmission power (dBm)                              4
Sensors transmission range (m), assuming clear terrain        10
Network transfer rate (Kbps)                                  250
Initial sensors battery lifetime                              20–100 energy units
MA execution time at each sensor (processing delay, in ms)    50
MA instantiation delay (ms)                                   10
MA code size (s, in bytes)                                    1,000
Bytes accumulated by the MA at each sensor (d, in bytes)      100
Data fusion coefficient (f)                                   1

Figure 8 illustrates representative screens of our Java-based simulator that draw the output of the four MA-based distributed data fusion algorithms. The ellipse in Fig. 8b denotes the network center. Notably, GCF typically suggests wireless hops among distant sensor nodes which are not within mutual transmission range and hence must be routed through intermediate nodes, frequently requiring complex routing decisions and increasing the overall latency and energy consumption. The same usually applies to the last hops of the MA itineraries suggested by LCF. In contrast, NOID tends to construct itineraries with medium-distance hops (which is usually the case for relatively dense networks), wherein traveling MAs hardly ever migrate from one sensor to another through intermediate nodes. The output of NOID (shown in Fig. 8e) involves a considerably shorter overall itinerary length than that of GCF, yet a longer one than that of LCF. However, the four itineraries of NOID result in the lowest overall cost [the cost of the data fusion task is based on Equation (1)]. It is stressed that, unlike LCF, GCF and GA, the output of NOID for a given network topology is not fixed but depends on the data fusion parameters. For instance, when the amount of data collected from each sensor is changed to d = 5 and d = 200 bytes, NOID proposes one and seven near-optimal agent itineraries, respectively (see Fig. 9). A first set of simulation experiments compares the performance of the LCF, GCF, GA and NOID algorithms in

Fig. 8 Java-based simulation of MA-based distributed data fusion algorithms: (a) LCF output (total itinerary length: 1,973 m, overall cost: 1,559,869); (b) GCF output (total itinerary length: 3,985 m, overall cost: 3,070,673); (c) GA output (total itinerary length: 1,677 m, overall cost: 1,283,610); (d) the trees constructed by NOID (four trees, each assigned to an individual MA); (e) NOID output, where the four trees created in the previous step are traversed in post-order (total itinerary length: 2,750 m, overall cost: 1,150,605)


terms of their total itinerary length. As illustrated in Fig. 10a, when d = 0 bytes (the MA collects no data), NOID proposes itineraries of almost identical length to those of the LCF and GA algorithms and considerably shorter than those of GCF. Specifically, the GA algorithm outperforms the alternative approaches when the scheduling of a single itinerary is considered (d = 0 bytes for NOID), at the expense of increased execution times. As the d/s ratio increases, though, the total length of NOID itineraries increases remarkably (since multiple agents cooperate in the data fusion task). However, as shown by later simulation results, the increased itinerary length does not result in a higher itinerary cost (in fact, the cost benefit of NOID over LCF, GCF and GA increases with d/s). Figure 10b depicts how the number of itineraries derived by NOID increases for larger d values (in such scenarios it is more cost-effective to generate multiple short itineraries, so that the MAs' state size does not grow much). Figure 10c shows (in logarithmic scale) the maximum number of MA hops proposed by NOID in relation to d. For instance, for d = 0 a single itinerary is generated, hence the MA performs a number of hops equal to the number of sensors (plus the return hop to the PE); in the extreme case that d ≥ s, each sensor is assigned its own MA, which performs only two hops (to visit the sensor and then return to the PE).

100

250

(a) # NOID itineraries

Total itinerary length (km)

Fig. 9 Output of NOID algorithm for: (a) d = 5 bytes; (b) d = 200 bytes

10

(b)

200 150 100 50

1

0 20

40

60

80

100

120

140

160

180

20

200

40

60

GCF

GA

NOID (d=0)

# NOID max itinerary length

LCF

1000

80

100

120

140

160

180

200

Number of sensors

Number of sensors d=0

NOID (d=1000)

d=50

d=200

d=1000

(c)

100

10

1 20

40

60

80

100

120

140

160

180

200

Number of sensors d=0

d=50

d=200

d=1000

Fig. 10 (a) Comparison of LCF, GCF, GA and NOID algorithms in terms of total itinerary length; (b) number of itineraries suggested by NOID; (c) length (number of hops) of the maximum itinerary suggested by NOID

123

Wireless Netw

1.000

Figure 11 compares the LCF, GCF, GA and NOID algorithms in terms of their overall data fusion cost. For a relatively low d/s ratio (0.05 in Fig. 11a), the cost saving offered by NOID over GA, LCF and GCF is 11.7, 26.4 and 466.6%, respectively, for a network of 100 sensors, and increases to 17.8, 34.9 and 571.1% for 200 sensors. When d/s is increased to 1 (see Fig. 11b), the performance gain of NOID over GA, LCF and GCF becomes 421, 480.9 and 2,614.3%, respectively, for 200 sensors. For d/s = 10 (not shown), the gain of NOID increases to 1,326.7, 1,856.5 and 8,730.2%, respectively. Finally, Fig. 11c verifies that, for a given network size, the cost benefit of NOID over LCF and GCF increases drastically with the d/s ratio.

Admittedly, NOID does not derive optimal solutions; as stated in Sect. 4.1, the MAR problem is NP-complete. To verify how closely NOID results converge to optimal solutions, we have implemented an algorithm that derives optimal MA itineraries. Namely, for a given set of N nodes it exhaustively examines all possible numbers of itineraries I (I = 1, …, N); for each itinerary $I_i$ (i = 1, …, I) it examines all possible numbers of itinerary nodes $N_i$ ($N_i$ = 1, …, N − I + 1) with all possible node permutations. Figure 12 illustrates the performance (overall fusion cost) of NOID and the alternative approaches against the optimal solution. We examine networks of small scale (up to ten nodes), as the execution time needed to derive the optimal solution increases exponentially. NOID performs better than the alternative heuristic and genetic algorithm approaches and does not diverge considerably from the optimal solution (Fig. 12).

Figure 13 draws simulation results that serve as an informal validation that NOID indeed proposes a near-optimal number of agent itineraries. The cost of NOID's itineraries is compared against the cost output of alternative solutions with a predefined number of itineraries. To ensure a fairer comparison, we have not grouped the nodes into separate itineraries completely at random. Instead, when considering I itineraries, we have specified groups comprising n/I sensors each (n denotes the overall number of nodes), where the group members are located within short spatial distance of each other. In addition, the order of MA migrations within those groups is based on the LCF algorithm. Figure 13a confirms a clear advantage of NOID's itineraries over the alternative approaches, since the number of itineraries suggested by NOID largely depends on the actual network scale. For instance, NOID employs 21 MAs on average when n = 100 and 47 when n = 200. Indeed, the '20 MAs' solution performs better than the other fixed-number solutions for n = 100, and the '50 MAs' solution performs better for n = 200. However,

Fig. 11 Comparison of LCF, GCF, GA and NOID algorithms in terms of their overall data fusion cost for (a) s = 1,000 bytes, d = 50 bytes, (b) s = d = 1,000 bytes, (c) s = 1,000 bytes, network size of 100 sensors
[Plot axes: y-axis data fusion cost (millions); x-axis number of sensors (20–200) in (a) and (b), d (bytes, 0–2,000) in (c); series LCF, GCF, GA, NOID]

Fig. 12 Comparison of LCF, GCF, GA and NOID algorithms in terms of their overall data fusion cost against the optimal solution for (a) s = 1,000 bytes, d = 100 bytes, (b) s = 1,000 bytes, N = 10 nodes
[Plot axes: x-axis number of sensors (8–10) in (a), d (bytes, 100–500) in (b); series Optimal, NOID, GA, LCF, GCF]

Fig. 13 Comparison of the cost of NOID's itineraries against the cost output of alternative solutions with a predefined number of itineraries (MAs), for (a) s = 1,000 bytes, d = 100 bytes, (b) s = 1,000 bytes, network size of 100 sensors
[Plot axes: x-axis number of sensors in (a), d (bytes, 0–500) in (b); series NOID, 1 MA, 10 MAs, 20 MAs, 50 MAs]

Fig. 14 Comparison of LCF, GCF, GA and NOID algorithms in terms of the overall response time for (a) d = 50 bytes, s = 1,000 bytes, (b) d = 1,000 bytes, s = 1,000 bytes
[Plot axes: y-axis overall response time (sec); x-axis number of sensors (20–200); series LCF, GCF, GA, NOID]

NOID maintains a clear performance gain, mainly due to its more intelligent node grouping method. Figure 13b also verifies the advantage of NOID's itineraries for varying d/s ratios. A last set of experiments evaluates the overall response time of the LCF, GCF, GA and NOID algorithms for completing data fusion tasks. Response time is calculated as the sum of the MA instantiation delay, processing delay, MA transmission delay and propagation delay:


$$t_{\mathrm{overall}} = t_{\mathrm{inst}} + t_{\mathrm{proc}} + t_{\mathrm{trans}} + t_{\mathrm{prop}} \tag{7}$$

The MA instantiation delay is related to the number of MAs involved in the data fusion task (in our experiments it takes 10 ms to instantiate each MA object). Hence, it is constant for the LCF, GCF and GA algorithms, which always instantiate a single MA that visits the whole set of sensor nodes, while for NOID it depends on the network scale and the d/s ratio (see Fig. 10b), which dictate the number of proposed itineraries. Processing delay (the time needed for the MA to complete its data fusion task on each sensor) is constant (assumed to be 50 ms per sensor in our experiments). Transmission delay depends on the network transfer rate and the current size of the MA (i.e. the MA's code size plus the amount of data accumulated within the MA's state). Finally, propagation delay depends on the physical distance covered in successive MA migrations (i.e. on the overall itinerary length). Response time measurements are depicted in Fig. 14. In both graphs, the response times of LCF, GCF and GA almost coincide: the differences derive from the algorithms' itinerary lengths, which are shortest for GA, longer for LCF and longest for GCF (see Fig. 10a). As the d/s ratio increases (see Fig. 14b), the response time gain of NOID over LCF, GCF and GA increases drastically, as transmission time dominates the other delay components for LCF, GCF and GA. That is, although NOID dispatches a large number of MAs (see Fig. 10b), thereby increasing $t_{\mathrm{inst}}$, these MAs work in parallel and each of them visits a small set of sensors (unlike LCF, GCF and GA, where a single MA performs a number of hops equal to the number of sensors). Hence, in NOID, MAs have not collected large chunks of data by the end of their itineraries, which considerably decreases the associated transmission delay. The response time gain of NOID is a welcome side effect of the cost function (1), which favours shorter agent itineraries, especially as the d/s ratio increases.
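Eq. (7) can be sketched in Python under assumptions that the text does not state explicitly: MAs are instantiated one after another at the PE and then travel in parallel, so the overall time is the total instantiation delay plus the time of the slowest MA; 250 Kbps is read as 250,000 bit/s; with f = 1 the MA grows by d bytes at every visited sensor; and propagation uses the free-space speed of light. The function name, the hop-length input format and the 100 m hop geometry in the usage lines are hypothetical; the parameter defaults follow Table 2.

```python
SPEED_OF_LIGHT = 3.0e8  # m/s; assumed radio propagation speed


def response_time(itineraries, s=1000, d=100, rate_bps=250_000,
                  t_proc=0.050, t_inst=0.010):
    """Sketch of Eq. (7), in seconds.

    Each itinerary is a list of hop lengths in metres for
    PE -> sensor_1 -> ... -> sensor_k -> PE, i.e. k + 1 hops for k sensors.
    """
    slowest = 0.0
    for hops in itineraries:
        k = len(hops) - 1                          # sensors visited by this MA
        proc = k * t_proc                          # processing delay at every sensor
        trans = sum((s + h * d) * 8 / rate_bps     # MA size on hop h: code + data gathered so far
                    for h in range(len(hops)))
        prop = sum(hops) / SPEED_OF_LIGHT          # propagation over the covered distance
        slowest = max(slowest, proc + trans + prop)
    # MAs are instantiated sequentially at the PE, then travel in parallel
    return len(itineraries) * t_inst + slowest


# Four sensors, every hop 100 m long, with d = s = 1,000 bytes
single = response_time([[100] * 5], d=1000)            # one MA visits all four sensors
split = response_time([[100] * 3, [100] * 3], d=1000)  # two MAs visit two sensors each
print(round(single, 3), round(split, 3))               # 0.69 0.312
```

Even in this toy model, splitting the work between two MAs more than halves the response time once d is comparable to s, because transmission delay grows with the data each MA carries, which is consistent with the trend reported for Fig. 14b.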

8 Conclusions and future work

In this article we presented NOID, an efficient heuristic algorithm that derives near-optimal itineraries for MAs performing incremental data fusion in WSN environments. NOID considers only spatial information when designing MA itineraries. Even so, it also reduces the energy consumed by agent transmissions, since the transmission power required between a pair of sensors increases with their physical distance. NOID has been extensively evaluated through simulation tests. It has been shown to outperform existing MA-based approaches in terms of both data fusion cost and overall response time. As future work, we intend to investigate the applicability of NOID to object tracking applications. There, MA itineraries will include only sensors with high signal strength (high target detection accuracy) and sufficient energy availability. Another future research direction involves the implementation of NOID in real WSN environments.
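The distance argument above can be made concrete with the first-order radio model that is common in the WSN literature. In that model, sending k bits over d metres costs E_elec·k + ε_amp·k·d². This is a swapped-in textbook model, not the energy model used in this article. The constants (50 nJ/bit, 100 pJ/bit/m²), the sensor layout and the message sizes are illustrative assumptions. Reception energy is ignored, and the MA is assumed to grow by a fixed, hypothetical `data_bits` per visited sensor. The sketch compares two visiting orders of the same three sensors.

```python
import math

# First-order radio model constants: a common textbook assumption,
# not values taken from this article.
E_ELEC = 50e-9      # J/bit consumed by the radio electronics
EPS_AMP = 100e-12   # J/bit/m^2 consumed by the transmit amplifier


def tx_energy(bits, distance_m):
    """Energy to transmit `bits` over `distance_m` with d^2 path loss."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def itinerary_energy(sink, stops, code_bits, data_bits):
    """Transmission energy of one MA visiting `stops` in order, then the sink.

    Hypothetically assumes the MA grows by `data_bits` at every sensor.
    """
    path = [sink] + list(stops) + [sink]
    energy, visited = 0.0, 0
    for hop, (a, b) in enumerate(zip(path, path[1:])):
        energy += tx_energy(code_bits + visited * data_bits, math.dist(a, b))
        if hop < len(stops):   # b is a sensor, so the MA picks up its data
            visited += 1
    return energy


sink = (0.0, 0.0)
in_order = [(10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]   # 60 m round trip
zigzag = [(30.0, 0.0), (10.0, 0.0), (20.0, 0.0)]     # 80 m round trip
e_short = itinerary_energy(sink, in_order, 8_000, 1_000)
e_long = itinerary_energy(sink, zigzag, 8_000, 1_000)
print(f"in order: {e_short * 1e3:.2f} mJ, zigzag: {e_long * 1e3:.2f} mJ")
```

Both orders carry identical payloads, so the electronics term is the same (1.9 mJ) in both cases. The roughly 0.36 mJ gap therefore comes entirely from the distance-dependent amplifier term.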

Acknowledgments

The authors of [39] are acknowledged for kindly providing us with the source code of their genetic algorithm implementation. We are also grateful to the anonymous reviewers for their thorough reviews, which helped us improve the technical content and presentation of our paper.

References

1. Akyildiz, I. F., Su, W., Sankarasubramaniam, Y., & Cayirci, E. (2002). A survey on sensor networks. IEEE Communications Magazine, 40(8), 102–114.
2. Al-Hammouri, A., Zhang, W., Buchheit, R., Liberatore, V., Chrysanthis, P., & Pruhs, K. (2006). Network awareness and application adaptability. Information Systems and E-Business Management, 4(4), 399–419.
3. Baumann, J., Hohl, F., Radouniklis, N., Rothermel, K., & Straßer, M. (1997). Communication concepts for mobile agent systems. In Proceedings of the 1st international workshop on mobile agents (MA’97) (pp. 123–135).
4. Boulis, A. (2005). Programming sensor networks with mobile agents. In Proceedings of the 6th international conference on mobile data management (MDM’2005) (pp. 252–256).
5. Boulis, A., Han, C., & Srivastava, M. (2003). Design and implementation of a framework for efficient and programmable sensor networks. In Proceedings of ACM MobiSys’03 (pp. 187–200).
6. Chen, M., Kwon, T., & Choi, Y. (2005). Data dissemination based on mobile agent in wireless sensor networks. In Proceedings of the 30th IEEE conference on local computer networks (LCN’05) (pp. 527–529).
7. Chen, M., Gonzalez, S., & Leung, V. C. M. (2007). Applications and design issues for mobile agents in wireless sensor networks. IEEE Wireless Communications, 14(6), 20–26.
8. Cristescu, R., Beferull-Lozano, B., Vetterli, M., & Wattenhofer, R. (2006). Network correlated data gathering with explicit communication: NP-completeness and algorithms. IEEE/ACM Transactions on Networking, 14(1), 41–54.
9. Esau, L. R., & Williams, K. C. (1966). On teleprocessing system design, Part II: A method for approximating the optimal network. IBM Systems Journal, 5, 142–147.
10. Fok, C. L., Roman, G. C., & Lu, C. (2005). Rapid development and flexible deployment of adaptive wireless sensor network applications. In Proceedings of the 25th international conference on distributed computing systems (ICDCS’2005) (pp. 653–662).
11. Fuggetta, A., Picco, G. P., & Vigna, G. (1998). Understanding code mobility. IEEE Transactions on Software Engineering, 24(5), 346–361.
12. Gao, J. L. (2002). Analysis of energy consumption for ad hoc wireless sensor networks using a bit-meter-per-joule metric. In IPN Progress Report 42-150 (http://ipnpr.jpl.nasa.gov/progress_report/42-150/150L.pdf).
13. Gavalas, D. (2001). Mobile software agents for network monitoring and performance management. PhD Thesis, University of Essex, UK.
14. Gavalas, D., Greenwood, D., Ghanbari, M., & O’Mahony, M. (2002). Hierarchical network management: A scalable and dynamic mobile agent-based approach. Computer Networks, 38(6), 693–711.
15. Gavalas, D., Pantziou, G., Konstantopoulos, C., & Mamalis, B. (2006). A method for incremental data fusion in distributed sensor networks. In Proceedings of the 3rd IFIP conference on artificial intelligence applications & innovations (AIAI’2006) (pp. 635–642).
16. Georgoulas, D., & Blow, K. (2007). In-Motes Bins: A real time application for environmental monitoring in wireless sensor networks. In Proceedings of the 9th IEEE/IFIP international conference on mobile and wireless communications networks (MWCN’2007) (pp. 21–26).
17. Iqbal, A., Baumann, J., & Straßer, M. (1998). Efficient algorithms to find optimal agent migration strategies. Universität Stuttgart, Fakultät Informatik, Bericht Nr. 1998/05.
18. Java Native Interface, http://java.sun.com/j2se/1.5.0/docs/guide/jni/index.html.
19. Jiao, Y., & Hurson, A. R. (2005). Adaptive power management for mobile agent-based information retrieval. In Proceedings of the 19th international conference on advanced information networking and applications (AINA’05) (pp. 675–680).
20. Kershenbaum, A. (1993). Telecommunications network design algorithms. New York: McGraw-Hill.
21. Kutten, S., & Peleg, D. (1998). Fast distributed construction of k-dominating sets and applications. Journal of Algorithms, 28, 40–66.
22. Lange, D. B., & Oshima, M. (1999). Seven good reasons for mobile agents. Communications of the ACM, 42(3), 88–89.
23. Lotfinezhad, M., & Liang, B. (2005). Energy efficient clustering in sensor networks with mobile agents. In Proceedings of the IEEE wireless communications and networking conference (WCNC’05).
24. Luo, H., Liu, Y., & Das, S. K. (2007). Routing correlated data in wireless sensor networks: A survey. IEEE Network, 21(6), 40–47.
25. Lynch, N. (1996). Distributed algorithms. San Francisco, California: Morgan Kaufmann.
26. Milojicic, D. (1999). Mobile agent applications. IEEE Concurrency, 7(3), 80–90.
27. Mpitziopoulos, A., Gavalas, D., Konstantopoulos, C., & Pantziou, G. (2009). Mobile agent middleware for autonomic data fusion in wireless sensor networks. In M. K. Denko, L. T. Yang, & Y. Zhang (Eds.), Autonomic computing and networking, chapter 3 (pp. 57–81). USA: Springer.
28. Picco, G. P. (2001). Mobile agents: An introduction. Microprocessors and Microsystems, 25(2), 65–74.
29. Qi, H., Iyengar, S. S., & Chakrabarty, K. (2001). Multi-resolution data integration using mobile agents in distributed sensor networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 31(3), 383–391.
30. Qi, H., & Wang, F. (2001). Optimal itinerary analysis for mobile agents in ad hoc wireless sensor networks. In Proceedings of the 13th international conference on wireless communications (Wireless’2001) (pp. 147–153).
31. Reuter, E., & Baude, F. (2002). System and network management itineraries for mobile agents. In Proceedings of the 4th international workshop on mobile agents for telecommunication applications (MATA’02), LNCS (Vol. 2521, pp. 227–238).
32. Rubinstein, M. G., Duarte, O. C., & Pujolle, G. (2003). Scalability of a mobile agents based network management application. Journal of Communications and Networks, 5(3), 240–248.
33. Shih, D. H., Huang, S. Y., & Yen, D. C. (2005). A new reverse auction agent system for m-commerce using mobile agents. Computer Standards & Interfaces, 27(4), 383–395.
34. Solis, I., & Obraczka, K. (2004). The impact of timing in data aggregation for sensor networks. In Proceedings of IEEE international conference on communications (ICC’04) (pp. 3640–3645).
35. Tseng, Y. C., Kuo, S. P., Lee, H. W., & Huang, C. F. (2004). Location tracking in a wireless sensor network by mobile agents and its data fusion strategies. Computer Journal, 47(4), 448–460.
36. Umezawa, T., Satoh, I., & Anzai, Y. (2002). A mobile agent-based framework for configurable sensor networks. In Proceedings of the 4th international workshop on mobile agents for telecommunications applications (MATA’02) (pp. 128–140).
37. Vazirani, V. (2001). Approximation algorithms. Berlin: Springer.
38. Wong, K. F. S., Tsang, I. W., Cheung, V., Chan, S. H. G., & Kwok, J. T. (2005). Position estimation for wireless sensor networks. In Proceedings of the 2005 IEEE Global Communications Conference (Globecom’2005).
39. Wu, Q., Rao, N., Barhen, J., Iyengar, S., Vaishnavi, V., Qi, H., et al. (2004). On computing mobile agent routes for data fusion in distributed sensor networks. IEEE Transactions on Knowledge and Data Engineering, 16(6), 740–753.
40. Xu, Y., & Qi, H. (2004). Distributed computing paradigms for collaborative signal and information processing in sensor networks. Journal of Parallel and Distributed Computing, 64(8), 945–959.
41. Xu, Y., & Qi, H. (2008). Mobile agent migration modeling and design for target tracking in wireless sensor networks. Ad Hoc Networks, 6(1), 1–16.
42. Yuan, W., Krishnamurthy, S., & Tripathi, S. (2003). Synchronization of multiple levels of data fusion in wireless sensor networks. In Proceedings of the 46th IEEE global telecommunications conference (Globecom’03) (pp. 221–225).

Author Biographies

Damianos Gavalas received his B.Sc. degree in Informatics (Computer Science) from the University of Athens, Greece, in 1995. He received his M.Sc. and Ph.D. degrees in electronic engineering from the University of Essex, UK, in 1997 and 2001, respectively. Currently, he is an Assistant Professor in the Department of Cultural Technology and Communication, University of the Aegean, Greece. He has served as a TPC member in several leading conferences in the field of mobile and wireless communications. He has co-authored over 80 papers published in international journals and conference proceedings. His research interests include distributed computing, mobile code, network and systems management, mobile computing, m-commerce, and mobile ad-hoc and sensor networks.

Aristides Mpitziopoulos graduated from the Hellenic Air Force Technical NCO Academy in 1996 with a speciality in telecommunications engineering. He received his B.E. degree in Culture Technology and Communication in 2006 from the University of the Aegean, Mytilene, Hellas. He is an officer in the Hellenic Air Force, serving in Electronic Warfare. He is pursuing his Ph.D. degree at the Department of Cultural Technology and Communication, University of the Aegean, Mytilene, Hellas. His main research interests are in electronic warfare, network design, mobile agents, data fusion and security in wireless sensor networks, and wireless networking in general. He is a student member of IEEE.

Grammati Pantziou received the Diploma in Mathematics and her Ph.D. degree in Computer Science from the University of Patras, Greece, in 1984 and 1991, respectively. She was a Post-Doctoral Research and Teaching Fellow at the University of Patras (1991–1992) and a Research Assistant Professor at Dartmouth College, Hanover, NH, USA (1992–1994). She was then an Assistant Professor at the University of Central Florida, Orlando, FL, USA (1994–1995) and a Senior Researcher at the Computer Technology Institute, Patras (1995–1998). Since September 1998, she has been a Professor at the Department of Informatics of the Technological Educational Institution of Athens, Greece. Her current research interests are in the areas of parallel computing, design and analysis of algorithms, distributed and mobile computing, and multimedia systems.

Charalampos Konstantopoulos received his Diploma in Computer Engineering from the Department of Computer Engineering and Informatics at the University of Patras, Greece, in 1993. He also received his Ph.D. degree in Computer Science from the same department in 2000. Currently, he is a Lecturer at the Department of Informatics, University of Piraeus, Greece. His research interests include parallel and distributed algorithms, mobile computing, sensor networks and multimedia computing.
