Data Aggregation in Sensor Networks using Learning Automata

M. Esnaashari 1, M. R. Meybodi 1,2

1 Soft Computing Laboratory, Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran
2 Institute for Studies in Theoretical Physics and Mathematics (IPM), School of Computer Science, Tehran, Iran
(Esnaashari,mmeybodi)@aut.ac.ir

Abstract: One way to reduce energy consumption in wireless sensor networks is to reduce the number of packets transmitted in the network. Since sensor networks are usually deployed with a number of redundant nodes (to overcome the problem of node failures, which is common in such networks), many nodes may carry almost the same information, which can be aggregated in intermediate nodes, thereby reducing the number of transmitted packets. The aggregation ratio is maximized if the data packets of all nodes having almost the same information are aggregated together. For this to occur, each node should forward its packets along a path on which the maximum number of nodes with almost the same information as the sending node exist. In many real scenarios, such a path does not remain the same for the entire network lifetime and changes from time to time. These changes may result from changes in the environment in which the sensor network resides and usually cannot be predicted beforehand. In this paper, a learning automata based data aggregation method for sensor networks in which the environment's changes cannot be predicted beforehand is proposed. In the proposed method, each node in the network is equipped with a learning automaton. These learning automata collectively learn, for each node, the path of aggregation with maximum aggregation ratio for transmitting its packets toward the sink. To evaluate the performance of the proposed method, computer simulations have been conducted and the results are compared with the results of three existing methods. The results show that the proposed method outperforms all of these methods, especially when the environment is highly dynamic.
Index Terms: Sensor Networks, Data Aggregation, Learning Automata

1. Introduction

Sensor nodes are very prone to failure due to their limited resources and abilities. One major reason for such failures is the exhaustion of energy resources, and for this reason energy is a vital factor in sensor networks. Multi-hop routing, which is a way of gathering the information of every node in every part of the network at the sink, is one of the major energy consuming tasks performed throughout the lifetime of a sensor network. For this reason, each node should try to find a path to the sink using which minimum energy is consumed by the network. One way to do so is for every node to compute the amount of energy consumed on each path separately, and then to choose the path with the least energy consumption [16]. One problem with such a method is that it needs a way of computing the energy consumption along each path. Another way is to make use of the data aggregation technique. In data aggregation, packets with related information are combined together in intermediate nodes to form a single packet before further forwarding to the sink. This reduces the number of transmitted packets in the network and, as a result, less energy is consumed. Using this method, a node has to select a path along which the maximum number of nodes with related packets exists. We refer to this path for each node as the Aggregation Path of that node hereafter. Finding the Aggregation Path of each node is not so simple considering only local information. A simple approach to find an approximation of this path is to use a greedy algorithm: each node tries to forward its packets to a neighbor which has information related to that of the node, as is done in [12]. The problem of finding the Aggregation Path of each node becomes even more complicated if the information possessed by each node changes during the lifetime of the network. In such a situation, the Aggregation Path of each node changes from time to time and hence, nodes should adapt themselves to these changes. In this paper, a new method based on learning automata is proposed to make nodes capable of adapting themselves to the changes occurring in their Aggregation Paths. In the proposed method, each node in the network is equipped with one learning automaton. These learning automata collectively learn the Aggregation Paths of all nodes of the network and adapt the nodes to the changes occurring in these paths.
To evaluate the performance of the proposed method, several experiments have been conducted and the results are compared with the results of three different methods: i) the LEACH algorithm [11], ii) the method given in [12], which performs data aggregation with no learning, and iii) the method given in [14], which performs data aggregation using Q-learning. The results show that the proposed method outperforms all of these methods, especially when Aggregation Paths are highly dynamic.
The rest of this paper is organized as follows. Section 2 gives an overview of related work on data aggregation in sensor networks. Learning automata, the basic learning strategy used in the proposed method, are discussed in section 3. The problem statement is given in section 4 and the proposed method is presented in section 5. Simulation results are given in section 6. Section 7 concludes the paper.

2. Related Work

Some of the methods reported recently for data aggregation in sensor networks, such as [1], [2] and [3], are query based methods. In these methods a query is generated at the sink and then broadcasted through the network. Each node, upon receiving this query, processes it partially and then passes it on to its neighbors. After the query is processed completely, its result is sent back to the sink. In this scenario, some nodes just process the query, and some others propagate it, receive partial results, aggregate the results, and send them back to the sink.
In some other works, such as [4], [5], [6], [7], [8] and [11], the network is first clustered, and then cluster heads are used for aggregating data packets in each cluster separately. Among these methods, the LEACH (Low Energy Adaptive Clustering Hierarchy) protocol, introduced by Heinzelman et al., has attracted the most attention. The operation of LEACH is separated into two phases: the setup phase and the steady state phase. During the setup phase, a predetermined fraction of nodes, p, elect themselves as cluster heads by comparing a chosen random number with a predefined threshold. In the steady state phase, cluster heads aggregate the data reported by their cluster member nodes and forward the aggregated data to the sink. Lotfinezhad and Liang in [9] investigate the effect of partially correlated data on the performance of clustering methods for data aggregation. They have shown that partially correlated data has a strong effect on clustering performance, meaning that the lower the data correlation, the lower the clustering performance. They have also shown that the amount of energy consumed in a single node is strongly related to its position in the network; a node far from the sink may consume more energy than other nodes. They have also shown that the network lifetime is inversely proportional to the total consumed energy.
Some other methods are also reported for data aggregation in sensor networks. Guestrin et al. in [10] try to approximate the data generated in a single node for a specific period of time with a third order equation, and send the coefficients of this equation to the sink instead of the data itself. Dasgupta et al. in [19] assume that each node in the network has the ability to aggregate data. Based on this assumption, they present a method for maximizing the network lifetime, defined as the time elapsed before the first node dies due to energy exhaustion. This method tries to find an aggregation tree rooted at the sink, spanning all the sensors, that maximizes the lifetime of the network. For this purpose, they first find all possible aggregation trees and then make use of a heuristic approach to find the one which maximizes the lifetime of the network. Beaver and Sharaf in [12] propose an algorithm which tries to find paths along which sensors belonging to the same group (i.e. sensing the same phenomenon) reside. In this algorithm, the sink initiates a "path construction" packet through the network. Each node, upon receiving this packet for the first time, sets its parent to the sender of the packet, and then forwards the packet to its neighbors. If a node receives "path construction" again, it checks whether the sender of the packet belongs to the same group as itself or not. If so, this node changes its parent to this new sender of "path construction". This way, each path to the sink will consist mostly of nodes belonging to the same group and hence the aggregation ratio along the path will be increased. This method performs well in situations where nodes do not change their groups during the lifetime of the network.
Very few efforts in data aggregation make use of learning. Radivojac et al. in [13] use a machine learning technique to force sensor nodes to send only specific data required by the sink. The learning algorithm used in this method is executed in the sink and its results are propagated throughout the network. Beyens et al. in [14] use a Q-learner in each node, forcing the node to send its data via a path along which the aggregation ratio is maximum. They regard the sensor network as a Multi-Agent System (MAS) in which the agents are the sensor nodes. Each agent has a number of actions equal to the number of its neighbor nodes; each action is to select one of the neighbors for forwarding the packets of the node towards the sink. A Q-value is associated with each action. When data is routed from node A to the sink via neighbor node B, a feedback is given by B to A. The feedback is computed based on the number of hops from B to the sink and the aggregation ratio gained in B. This feedback is used in A to update the Q-value of the action corresponding to the selection of B. After the learning, the neighbor corresponding to the action with the highest Q-value is chosen for forwarding the node's data towards the sink.
Learning automata have proved to perform well in the dynamic environments of wireless, ad hoc and sensor networks. Haleem and Chandramouli in [23] use learning automata to address a cross-layer design for joint user scheduling and adaptive rate control for downlink wireless transmission. The proposed method tends to ensure that user defined rate requests are satisfied by the right combination of transmission schedules and rate selections. Nicopolitidis et al. in [24] propose a bit rate control mechanism based on learning automata for broadcasting data items in wireless networks. A learning automaton is used in the server which learns the demand of the wireless clients for each data item. As a result of this learning, the server is able to transmit the more demanded data items more frequently. The same authors in [25] propose a learning automata based polling protocol for wireless LANs in which the access point uses a learning automaton to assign to each station a portion of the bandwidth proportional to the station's need. A decentralized approach of the above method is also given in [26, 27]. Ramana and Murthy in [28] propose Learning-TCP, a novel learning automata based reliable transport protocol for wireless networks, which efficiently adjusts the congestion window size and thus reduces packet losses. Learning automata have also been used in cellular radio networks to dynamically adjust the number of guard channels [29, 30, 31]. Recently, a few attempts have been made to apply learning automata to sensor networks [22, 32, 33]. In [22], Gholipour and Meybodi propose a learning automata-based mobicast routing protocol for wireless sensor networks. The proposed protocol uses learning automata to adaptively determine the location and the shape of the forwarding zone in such a way that the number of activated nodes in each forwarding hop along the routing path is almost the same. A clustering algorithm using cellular learning automata is proposed by Esnaashari and Meybodi in [32]. In this clustering algorithm, each node is equipped with a learning automaton. The learning automaton of each node, in cooperation with the learning automata of the neighboring nodes, determines the role of the node, i.e. whether it is to be a cluster head or a cluster member. The same authors in [33] propose a learning automata-based scheduling solution to the dynamic point coverage problem. In the proposed scheduling algorithm, the learning automaton of each node learns the sleep duration of that node based on the movement pattern of a single moving target point in the network.
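To make the mechanism of [14] concrete, the sketch below shows, in Python, one plausible form of such a per-node Q-learner. The class name, the epsilon-greedy exploration and the particular weighting of hop count against aggregation ratio in the reward are illustrative assumptions, not details taken from [14].

```python
import random

class QNeighborSelector:
    """Per-node Q-learner in the spirit of [14]: one Q-value per neighbor (action).

    The reward shaping (aggregation ratio minus a hop-count term) and the
    epsilon-greedy exploration are illustrative assumptions only.
    """

    def __init__(self, neighbors, alpha=0.1, epsilon=0.1):
        self.q = {n: 0.0 for n in neighbors}   # Q-value of the action "forward via n"
        self.alpha = alpha                      # learning rate
        self.epsilon = epsilon                  # exploration probability

    def select(self):
        # Mostly exploit the neighbor with the highest Q-value, sometimes explore.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def feedback(self, neighbor, hops_to_sink, aggregation_ratio):
        # Hypothetical feedback from the chosen neighbor: favor high aggregation
        # and short remaining paths, then move the Q-value towards it.
        reward = aggregation_ratio - 0.1 * hops_to_sink
        self.q[neighbor] += self.alpha * (reward - self.q[neighbor])
```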

3. Learning Automata

A learning automaton is an abstract model which randomly selects one action out of its finite set of actions and performs it on a random environment. The environment then evaluates the selected action and responds to the automaton with a reinforcement signal. Based on the selected action and the received signal, the automaton updates its internal state and selects its next action. Figure 1 depicts the relationship between an automaton and its environment.
The environment can be defined by the triple E = {α, β, c}, where α = {α_1, α_2, ..., α_r} represents a finite input set, β = {β_1, β_2, ..., β_r} represents the output set, and c = {c_1, c_2, ..., c_r} is a set of penalty probabilities, where each element c_i of c corresponds to one action α_i. An environment in which β can take only the binary values 0 or 1 is referred to as a P-model environment. A further generalization of the environment allows finite output sets with more than two elements that take values in the interval [0, 1]; such an environment is referred to as a Q-model. Finally, when the output of the environment is a continuous random variable which assumes values in the interval [0, 1], it is referred to as an S-model. Learning automata are classified into fixed-structure stochastic and variable-structure stochastic. In the following, we consider only variable-structure automata.

Figure 1. Relationship between a learning automaton and its environment

A variable-structure automaton is defined by the quadruple {α, β, p, T}, in which α = {α_1, α_2, ..., α_r} represents the action set of the automaton, β = {β_1, β_2, ..., β_r} represents the input set, p = {p_1, p_2, ..., p_r} represents the action probability set, and finally p(n+1) = T[α(n), β(n), p(n)] represents the learning algorithm. This automaton operates as follows. Based on the action probability set p, the automaton randomly selects an action α_i and performs it on the environment. After receiving the environment's reinforcement signal, the automaton updates its action probability set based on equations (1) for favorable responses and equations (2) for unfavorable ones:

  p_i(n+1) = p_i(n) + a·(1 − p_i(n))
  p_j(n+1) = p_j(n) − a·p_j(n)    ∀j, j ≠ i    (1)

  p_i(n+1) = (1 − b)·p_i(n)
  p_j(n+1) = b/(r−1) + (1 − b)·p_j(n)    ∀j, j ≠ i    (2)

In these two equations, a and b are the reward and penalty parameters respectively. For a = b, the learning algorithm is called L_{R-P} (Linear Reward-Penalty); for b << a, it is called L_{RεP} (Linear Reward-epsilon-Penalty); and for b = 0, it is called L_{RI} (Linear Reward-Inaction).

4. Problem Statement

Consider a set of sensor nodes s_i deployed in an area A to measure a quantity of interest ζ (such as temperature), each node s_i obtaining an estimate ζ̂_i of ζ at its own position. Considering a dense network, the measured value of ζ is almost equal in some certain region (Ω ⊂ A) around each node s_i, and hence ζ̂_i can be aggregated with ζ̂_j for all s_j ∈ Ω. ζ partitions A into M regions Ω_1, Ω_2, ..., Ω_M with the following properties:

• |ζ̂_i − ζ̂_j| < ε;    ∀s_i, s_j ∈ Ω_k,  k = 1, 2, ..., M
• Ω_i ≠ ∅;    for i = 1, 2, ..., M
• Ω_i ∩ Ω_j = ∅;    for i ≠ j
• ∪_{i=1}^{M} Ω_i = A

We assume that each Ω_k can be approximated with a simple circle with center (X_k, Y_k) and radius R_k. We also assume that ζ changes from time to time in different parts of the network and hence Ω_1, Ω_2, ..., Ω_M are not static regions; their positions and radiuses vary during the network lifetime. The aim of the network is to periodically collect in the sink node the estimated position, radius and value ζ̂_k of each region Ω_k.
To specify the problem more clearly, without loss of generality, we make use of a specific application in which nodes have the ability to sense the temperature; hence ζ is temperature in this application. We assume that several circular climates exist in the environment, each having a different temperature and moving along a specific direction with a specific velocity. The velocity and direction of the movement of each climate may change randomly in time. These climates and their temperatures partition the area of the network into different parts (Ω), each having a different temperature. Figure 2 depicts such an environment with 18 sensor nodes and 5 climates. Dashed circles in this figure give different partitions of the network based on the estimate of the temperature in different nodes. Note that these dashed circles have some area in common, since each Ω is estimated by a circle. However, it is not a complicated task for the sink node to make a more accurate estimation of the Ωs in which these common areas are removed. Each sensor node reports the temperature of the climate in which it resides. If a node happens to reside in more than one climate region, its sensor senses a temperature equal to the mean temperature of all climates it resides in. A large static climate is assumed to be in the environment which covers the whole area A of the network, and its temperature is assumed to be 0. With the above assumptions, one can simply compute the temperature each node reports in figure 2. For example, s2 and s5 report 37, s4 reports -6, and s3 reports the mean of 37 and -6, which is 15.5. In addition, s1, s7, s8, s9, s10, s15 and s18 report 0 as their estimations.

Figure 2. A schematic of the simulated environment with 18 sensor nodes and 5 different climates. Dashed circles give different partitions of the network based on the estimate of the temperature.

5. The Proposed Method

In the network, every node s_i has an internal clock which activates the node every t seconds to measure ζ and report it to the sink using a multi-hop routing scheme. The proposed routing scheme consists of two phases: a route discovery phase and a route selection phase. The route discovery phase is first used to find for each node a number of alternative routes towards the sink. The route selection phase is then used to select, among all alternative routes, a route using which more aggregation can be performed. Data aggregation is performed along each route to the sink wherever it is possible. Data and location aggregation are done at the same time; that is, wherever data aggregation is performed, location aggregation is also performed. In location aggregation, the positions of the nodes whose data are aggregated are averaged to find an estimate of the center of the region to which they belong. The estimated radius of the region then becomes the distance between the estimated center and the farthest node. Without location aggregation, a single aggregated packet resulting from aggregating n different packets from n different nodes must contain the positions of all n nodes; this makes the aggregated packet too long, which consumes more energy for transmission.
In order to have more aggregation in the network, each node must try to route its packets toward the sink using a path along which more aggregation can be performed. Since the positions of the regions change dynamically, the routing algorithm must adapt itself to these changes. To achieve the above two goals, each node in the network is equipped with a learning automaton which is responsible for selecting the next best hop to forward the packets towards the sink during a data gathering round. In the rest of this section, we first describe the first version of the proposed algorithm, which we call the basic algorithm, in order to give the main idea. Then two improvements to the basic algorithm are given to further improve the aggregation ratio and prolong the network lifetime.
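Before detailing the two phases, the following minimal Python sketch shows the variable-structure automaton of section 3 as it would sit in a single node, with one action per routing-list entry and the updates of equations (1) and (2); the class and method names are illustrative only, not an interface defined in the paper.

```python
import random

class LearningAutomaton:
    """Minimal sketch of a variable-structure learning automaton (section 3).
    Actions would correspond to the entries of a node's routing list."""

    def __init__(self, actions, a=0.1, b=0.1):
        self.actions = list(actions)
        self.p = [1.0 / len(self.actions)] * len(self.actions)  # uniform initial probabilities
        self.a, self.b = a, b  # reward and penalty parameters

    def select(self):
        # Sample an action index according to the current probability vector.
        return random.choices(range(len(self.actions)), weights=self.p)[0]

    def reward(self, i):
        # Equation (1): a favorable response strengthens the selected action i.
        for j in range(len(self.p)):
            if j == i:
                self.p[j] += self.a * (1.0 - self.p[j])
            else:
                self.p[j] -= self.a * self.p[j]

    def penalize(self, i):
        # Equation (2): an unfavorable response weakens the selected action i.
        r = len(self.p)
        for j in range(len(self.p)):
            if j == i:
                self.p[j] = (1.0 - self.b) * self.p[j]
            else:
                self.p[j] = self.b / (r - 1) + (1.0 - self.b) * self.p[j]
```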

5-1. Route Discovery Phase

The route discovery algorithm is used to discover all possible paths from each node of the network towards the sink. The algorithm is based on the simple flooding method. The sink node initiates a "Path Construction" packet and broadcasts it throughout the network. This packet contains a parameter called "DistanceToSink" which is set to 1 at the sink node. Other nodes wait to receive the "Path Construction" packet. Upon receiving this packet, three different cases may arise:
(a) A "Path Construction" packet with a lower "DistanceToSink" than that of the current packet has previously been received at this node. In this case, the newly received packet is ignored.
(b) The previously received "Path Construction" packets have a "DistanceToSink" equal to that of the current packet. In this case, the sender of the "Path Construction" is added to the routing table as a new alternative path to the sink, "DistanceToSink" is increased by 1, and the packet is broadcasted over the network again.
(c) The previously received "Path Construction" packets have a greater "DistanceToSink" than that of the current packet. This means that a path with a smaller number of hops to the sink has been found. So, all the previously found paths are removed from the routing table. Then the sender of the currently received "Path Construction" is added as a path to the sink, "DistanceToSink" is increased by 1, and finally the packet is broadcasted over the network again.
The algorithm continues until each node in the network discovers all its paths with the same number of hops toward the sink. For each path, storing only the next hop in the routing table of the node is sufficient. At the end of the route discovery phase, each node in the network has a list of all its neighbors using which it can forward its packets toward the sink. We use RoutingList_i (RL_i) to refer to this list for node s_i, where |RL_i| is the number of entries in this list.

5-2. Route Selection Phase

After the route discovery phase is over, each node in the network is activated every t seconds and reports its information to the sink node. In this phase each node uses a learning automaton for selecting the next best hop to forward its packets towards the sink during each data gathering round. This way, each node gradually learns the best neighbor (the neighbor with the most related data) for forwarding its packets toward the sink. The learning automaton associated to node s_i, referred to as LA_i, has |RL_i| actions, the probability of each being initially set to 1/|RL_i|. Each action of LA_i corresponds to the selection of one of the neighboring nodes listed in RL_i to be used for forwarding packets of s_i towards the sink. After node s_i is activated and has measured the temperature, it asks its learning automaton to select one of its neighboring nodes (actions) for forwarding the measured temperature toward the sink. Data are aggregated during the route selection phase. The action selected by the automaton is rewarded or penalized based on the acknowledgment received from the selected neighboring node. How to penalize or reward the selected action will be described later.
Between two subsequent activations of node s_i, it may receive two different packet types from its neighbors: data packets and acknowledgement packets. A data packet contains the temperature measured by the sender of the packet and its location, whereas an acknowledgment packet is the response of a neighbor to a data packet sent by s_i to that neighbor. A data packet transmitted by node s_i contains three different parameters:

- K_i, which specifies the number of packets aggregated into that packet.
- ζ_{i,K_i}, which is the aggregated data computed using equation (3):

  ζ_{i,K_i} = (ζ_i + Σ_{j=1}^{n} ζ_j) / (n + 1)    (3)

  In the above equation, n is the number of packets received at node s_i.
- (x_{i,K_i}, y_{i,K_i}), which is the aggregated location computed using equations (4):

  x_{i,K_i} = (x_i + Σ_{j=1}^{n} x_j) / (n + 1),    y_{i,K_i} = (y_i + Σ_{j=1}^{n} y_j) / (n + 1)    (4)
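As an illustration of equations (3) and (4), the sketch below computes the aggregated value and location from a node's own measurement and the packets received since its last activation; the function and variable names are illustrative.

```python
def aggregate_measurements(own_value, own_xy, received):
    """Equations (3) and (4): average the node's own measurement and position
    with the n packets received since the last activation.
    `received` is assumed to be a list of (value, (x, y)) tuples."""
    n = len(received)
    zeta = (own_value + sum(v for v, _ in received)) / (n + 1)      # equation (3)
    x = (own_xy[0] + sum(xy[0] for _, xy in received)) / (n + 1)    # equation (4)
    y = (own_xy[1] + sum(xy[1] for _, xy in received)) / (n + 1)    # equation (4)
    return zeta, (x, y)
```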


Received data packets are temporarily stored and are processed upon the next activation of s_i. In the following we discuss what a node performs when activated and what it performs during the time between two subsequent activations. Node s_i, upon its nth activation, performs the following operations:

- s_i sets K_i = 1.
- s_i senses its surrounding environment to measure ζ_i.
- s_i sets ζ_{i,K_i} = ζ_i, x_{i,K_i} = x_i, y_{i,K_i} = y_i.
- If no data packet has been received between the (n-1)th and nth activations of s_i, then
  o s_i creates a data packet containing ζ_{i,K_i}, (x_{i,K_i}, y_{i,K_i}) and K_i.
- Else, if any data packet has been received, then
  o For each packet received from any neighbor j:
    ▪ If inequality (5) is satisfied, then
      • s_i performs data aggregation using a recursive version of equation (3), given in equation (6).
      • s_i performs location aggregation using a recursive version of equations (4), given in equations (7).
      • s_i sets K_i = K_i + K_j.
      • s_i uses equation (8) to compute the data aggregation ratio (DAR_{i,j}).
      • DAR_{i,j} is sent back to node s_j via an acknowledgement packet as the feedback from the environment.
  o s_i creates a data packet containing ζ_{i,K_i}, (x_{i,K_i}, y_{i,K_i}) and K_i.
- LA_i selects one of its actions (say action k) to determine the neighbor to which the newly made data packet should be forwarded.
- The newly created data packet, and the data packets received from the neighbors of s_i for which inequality (5) is not satisfied, are transmitted to neighbor k.

  |ζ_{i,K_i} − ζ_{j,K_j}| < ε    (5)

ε is a threshold which specifies the maximum difference between measured temperatures below which aggregation can be performed.

  ζ_{i,K_i} = (K_i·ζ_{i,K_i} + K_j·ζ_{j,K_j}) / (K_i + K_j)    (6)

  x_{i,K_i} = (K_i·x_{i,K_i} + K_j·x_{j,K_j}) / (K_i + K_j),    y_{i,K_i} = (K_i·y_{i,K_i} + K_j·y_{j,K_j}) / (K_i + K_j)    (7)

  DAR_{i,j} = U( min(ζ_{i,K_i} + ε, ζ_{j,K_j} + ε) − max(ζ_{i,K_i} − ε, ζ_{j,K_j} − ε) )    (8)

  where U(x) = x if x ≥ 0, and U(x) = 0 if x < 0.    (9)

Upon receiving the acknowledgment packet containing DAR_{k,i}, LA_i penalizes or rewards its selected action (action k). If DAR_{k,i} is greater than the threshold AcceptRate, then action k is rewarded according to equations (10):

  p_k(n+1) = p_k(n) + α·DAR_{k,i}·(1 − p_k(n))
  p_l(n+1) = p_l(n) − α·DAR_{k,i}·p_l(n)    ∀l, l ≠ k    (10)

and otherwise, it is penalized according to equations (11):

  p_k(n+1) = (1 − β·(1 − DAR_{k,i}))·p_k(n)
  p_l(n+1) = β·(1 − DAR_{k,i})/(r − 1) + (1 − β·(1 − DAR_{k,i}))·p_l(n)    ∀l, l ≠ k    (11)

In equations (10) and (11), α is the reward rate and β is the punishment rate. If, by any means, node s_i does not receive any acknowledgment from node s_k, LA_i penalizes its selected action according to equations (11), considering DAR_{k,i} as zero. Node s_i then retransmits its measured temperature via one of its neighboring nodes listed in RL_i other than node s_k. If no other neighboring node exists in RL_i, node s_i transmits a RouteRequest packet to all of its neighboring nodes. If a neighboring node l receives the packet and s_i is not listed in its RL_l, it replies with a RouteReply message. Node s_i gathers the received RouteReply packets and adds their senders to its RL_i. The LA_i of s_i is then reinitialized with all actions having the same action probability. If node s_i does not receive any RouteReply packet, it can be regarded as a dead node, at which time the network lifetime is considered to be over.
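The feedback loop of equations (8)-(11) can be summarized by the following sketch, which computes DAR for a pair of aggregated values and applies the reward or penalty update to the action probability vector; the function names, and the reuse of the automaton object from the earlier sketch, are assumptions made for illustration.

```python
def data_aggregation_ratio(zeta_i, zeta_j, eps):
    """Equations (8) and (9): overlap of the +/-eps intervals around the two
    aggregated values, clipped at zero by U(x)."""
    overlap = min(zeta_i + eps, zeta_j + eps) - max(zeta_i - eps, zeta_j - eps)
    return max(overlap, 0.0)

def update_on_ack(la, k, dar, accept_rate=0.85, alpha=0.1, beta=0.1):
    """Equations (10) and (11): reward action k of the automaton `la` when the
    reported DAR exceeds AcceptRate, otherwise penalize it.
    `la.p` is the action probability vector (see the automaton sketch above)."""
    r = len(la.p)
    if dar > accept_rate:
        for l in range(r):                      # equation (10)
            if l == k:
                la.p[l] += alpha * dar * (1.0 - la.p[l])
            else:
                la.p[l] -= alpha * dar * la.p[l]
    else:
        factor = beta * (1.0 - dar)
        for l in range(r):                      # equation (11)
            if l == k:
                la.p[l] = (1.0 - factor) * la.p[l]
            else:
                la.p[l] = factor / (r - 1) + (1.0 - factor) * la.p[l]
```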

5-3. Improvements

In this section we discuss two improvements to the method proposed in the previous section. For the sake of clarity of presentation, in the rest of this paper we call the proposed method the "Basic Algorithm", the basic algorithm with the first improvement "Algorithm 1", and the basic algorithm with the second improvement "Algorithm 2".
Algorithm 1: In Algorithm 1, unlike the basic algorithm in which only the aggregation ratio between the node and the next node in the routing path is considered for choosing a neighboring node, the aggregation ratio between the node and its two next nodes in the routing path is considered. For the network of figure 3, if the basic algorithm is used, then node s10 chooses one of its neighbors s7, s8 and s9 at random for sending its packets towards the sink, whereas Algorithm 1 chooses node s7. This is because the second node from s10 in the routing path passing through s7 is node s6, which is in the same region as s10 and hence its information can be aggregated with the information of s10. Algorithm 1 performs the following. Each node s_i, upon receiving a data packet from its neighbor s_j in its nth activation time, computes DAR_{i,j} using equations (8) and (9), and then makes use of equation (12) for computing Enhanced-DAR_{i,j}. Once Enhanced-DAR_{i,j} is computed, it is sent back to node s_j as the response of node s_i.

  Enhanced-DAR_{i,j} = max(DAR_{i,j}, MRR_i(n))    (12)

In equation (12), MRR_i stands for the Maximum Received Reward and is defined by equation (13):

  MRR_i(n) = max_{1 ≤ t ≤ n} (DAR_{k,i}; k ∈ RL_i)    (13)

The max operation is performed over all responses received by node s_i from the network startup until the time it is computed. Using this algorithm, in the network of figure 3, the Enhanced-DAR computed at node s7 is higher than that of nodes s8 and s9. This is because DAR is equal in all three nodes, but MRR at node s7 is higher than that at nodes s8 and s9, which results in a higher probability of selecting node s7 by node s10.
Algorithm 2: Algorithm 2 is Algorithm 1 in which, when a node cannot find a neighboring node for data aggregation (a neighboring node in the same region), such as node s6 in figure 3, it chooses a neighboring node with more residual energy for forwarding its packets. Using this algorithm, a node located at the boundary of a region forwards its packets to the outside of its region using the neighbor with the highest residual energy. This means that routing inside a region is performed based on the data aggregation ratio between nodes, whereas routing between different regions is performed based on residual energy. In Algorithm 2, upon receiving a packet in node s_i from the neighboring node s_j, Enhanced-DAR_{i,j} is computed. If this ratio is more than AcceptRate, node s_i sends back a favorable response to LA_j. Otherwise, node s_i considers its residual energy to compute the response to the action of LA_j; if the normalized residual energy (NRE_i, computed using equation (14)) is more than AcceptRate, the response is favorable, and it is unfavorable otherwise.

  NRE_i = EL_i / MaxEnergyLevel    (14)

In the above equation, EL_i is the residual energy level of node s_i and MaxEnergyLevel is the energy level of a node with a fully charged battery.

Figure 3. Network of node s10 and the environment around it

6. Experimental Results

To evaluate the performance of the proposed method, several experiments have been conducted. In the first two experiments, the basic algorithm is compared with three different methods: i) the LEACH algorithm proposed in [11], ii) the method given in [12] with no learning, and iii) the method given in [14] which uses Q-learning. In the third experiment, the performance of the proposed data aggregation algorithm is studied by comparing the number of packets received at the sink node in each data gathering round with the number of packets which need to be received at the sink in that round. The fourth experiment compares Algorithms 1 and 2 with the Basic Algorithm. The last experiment studies the convergence behavior of the learning automata in different nodes of the network.
All simulations have been implemented using the NS2 simulator. We have used IEEE 802.11 as the MAC layer protocol, two-ray ground as the propagation model and an omnidirectional antenna. Nodes are placed randomly on a 2-dimensional area of size 100(m)x100(m). The energy model specified in [19] is used for estimating the amount of energy consumed for packet transmissions. All packets except for data packets, which are 525 bytes long, are assumed to be 8 bytes long. The environment is modeled by 12 circular climates; each has its own temperature, direction and velocity of movement. Climates move using the random way point movement strategy with a velocity which is randomly selected from a normal distribution with µ = .001 and σ = .0001. Movements of the climates are bounded to the boundaries of the network, and hence if a climate happens to get out of the environment boundaries, it changes its direction by 180 degrees and comes back into the environment of the network. In all experiments, α and β for the proposed algorithm are set to .1, and ε (the acceptable range of aggregation) is set to 5. Also, the α and β parameters in the Q-learning method are set to .1 and .01 respectively. These values are selected experimentally. t, which is the time interval between two consecutive sendings of data to the sink in each node, is equal to 10 seconds. AcceptRate is set to .85. Simulations are performed for 50, 100, 200, 300, 400, and 500 nodes. The results are averaged over 50 runs.
Experiment 1: In this experiment, we study the total number of packets received at the sink node. Figure 4 depicts the results for the proposed method and the three existing methods mentioned earlier. This figure shows that the proposed method outperforms methods i, ii, and iii by 66%, 37%, and 9%, respectively. Figure 5 compares the results obtained for the proposed method with the results obtained for the other three methods for N = 500 when µ varies in the range [0, .05]. This figure shows that the proposed method has no superiority over the other methods when climates have no movement (µ = 0), but it works better as µ increases.

Figure 4. Total number of received packets at the sink

Figure 5. Changes in the total number of received packets at the sink node with respect to changes in µ

Experiment 2: In this experiment, we study the mean energy consumption of the nodes in the network. Figure 6 shows that the proposed method outperforms the other three methods in terms of mean energy consumption. It should be noted that in the proposed method, the overhead of the acknowledgement packets has also been taken into consideration for estimating the total energy consumption. We found that, on average, the amount of energy consumed for sending or receiving acknowledgement packets by each node is about 30% of the total energy consumed by that node. However, the reduction in the total number of data packets transmitted throughout the network in the proposed method compensates for the overhead of these acknowledgment packets. The proposed method works well when climate zones are moving. To show this, we compared the results obtained for the proposed method with the results obtained for the other methods for N = 500 for different values of µ between 0 and .05. The comparison, whose result is given in figure 7, shows that the proposed method performs better in terms of mean energy consumption as µ increases. Figure 8 compares the lifetime of the network for the proposed method and the other three methods in terms of the Round of Death metric. The higher lifetime of the proposed algorithm shown in figure 8 is due to the reduction in the number of transmitted packets in the network. As can be seen from figure 8, when the number of nodes in the network increases, the Round of Death decreases. This is mainly due to the fact that in a denser network, sensor nodes near the sink relay more packets and hence their energy is depleted earlier, resulting in a lower Round of Death.
Experiment 3: This experiment is conducted to study the efficiency of the proposed method with respect to the information received at the sink node during a data gathering round. For this purpose, we compare the number of packets actually received at the sink node using the proposed method with the number of packets which need to be received at the sink node in each data gathering round. Figure 9, which gives this comparison, shows that by applying the proposed method, the number of packets received at the sink node in each data gathering round gradually approaches its ideal amount, which is the number of regions in the environment in that round. Peaks in this figure correspond to sudden changes in the number of regions due to the movements of the climates. It can be seen that the proposed algorithm learns new routing paths and hence adapts the network to these changes.

Figure 6. Mean energy consumption of nodes


Figure 7. Changes in mean energy consumption of nodes with respect to changes in µ

Figure 8. Network lifetime based on the Round of Death metric

Figure 9. Number of packets received at the sink in each data gathering round for the Basic Algorithm vs. the number of packets which must be received at the sink in that round

Experiment 4: In this experiment, the basic algorithm is compared with Algorithms 1 and 2 with respect to the mean energy consumption of nodes and the total number of packets received at the sink node. As shown in figures 10 and 11, compared to the basic algorithm, Algorithms 1 and 2 both result in a smaller number of received packets at the sink and a lower mean energy consumption in each node. This indicates that a higher data aggregation ratio can be obtained using Algorithm 1 or 2. Figure 12 compares the basic algorithm, Algorithm 1 and Algorithm 2 in terms of network lifetime. As before, the Round of Death metric is used for this purpose. It can be seen that Algorithm 1 and Algorithm 2 both result in a longer lifetime for the network.

Figure 10. Total number of packets received at the sink

Figure 11. Mean energy consumption of nodes

Figure 12. Network lifetime based on the Round of Death metric

Experiment 5: This experiment is conducted to better understand the convergence behavior of the learning automata residing in the nodes of the network. For this purpose, we keep track of the action probability vectors of 3 learning automata residing in nodes s27, s42, and s21 of the network of figure 13. RL27 has three entries, corresponding to s16, s26 and s36, as shown in figure 13(a). Thus learning automaton LA27 has three actions, each corresponding to one of these nodes. None of s16, s26 and s36 is in the same region as node s27, and hence no aggregation can be performed. As a result, the action probability of none of the actions of LA27 approaches 1, as depicted in figure 14. RL42 has three entries, s31, s32 and s33 (figure 13(b)). Node s32 is in the same region as s42 and hence s32 can aggregate packets received from s42. Therefore, as shown in figure 15, the probability of the action corresponding to s32 approaches 1. Finally, RL21 has three entries, s10, s20 and s30 (figure 13(c)). Node s21 is in the same region as s10 and s20, and hence both s10 and s20 can aggregate packets received from s21. As a result, the probabilities of the two actions corresponding to s10 and s20 approach .5, whereas the probability of the action corresponding to s30 approaches 0. This is depicted in figure 16.

Figure 13. (a) Environment around node s27 and its neighbors, (b) environment around node s42 and its neighbors, (c) environment around node s21 and its neighbors

Figure 14. Action probabilities for node s27

Figure 15. Action probabilities for node s42

Figure 16. Action probabilities for node s21

7. Conclusion

In this paper we proposed a novel method based on learning automata for data aggregation in wireless sensor networks, especially for the case in which the environment's changes cannot be predicted beforehand. In this method each node in the network is equipped with a learning automaton. The learning automaton has a number of actions, each of which corresponds to one of the neighbors of the node. The learning automaton of each node helps the node to find the next best hop for forwarding its packets toward the sink, with the aim of performing as much data aggregation as possible. It was shown through simulations that the proposed method outperforms the existing methods in terms of network lifetime, especially when the environment is highly dynamic.

References

[1] Y. Xu, W. C. Lee, J. Xu, and G. Mitchell, "Processing Window Queries in Wireless Sensor Networks", IEEE International Conference on Data Engineering (ICDE'06), Atlanta, GA, April 2006.
[2] R. Rosemark and W. C. Lee, "Decentralizing Query Processing in Sensor Networks", Second International Conference on Mobile and Ubiquitous Systems: Networking and Services (Mobiquitous'05), San Diego, CA, July 2005, pp. 270-280.


[3] J. Winter, Y. Xu, and W. C. Lee, "Energy Efficient Processing of K Nearest Neighbor Queries in Location-aware Sensor Networks", Second International Conference on Mobile and Ubiquitous Systems: Networking and Services (Mobiquitous'05), San Diego, CA, July 2005, pp. 281-292.
[4] G. Bontempi and Y. Le Borgne, "An adaptive modular approach to the mining of sensor network data", Workshop on Data Mining in Sensor Networks, SIAM SDM, Newport Beach, CA, USA, April 2005.
[5] C. Liu, K. Wu, and J. Pei, "A Dynamic Clustering and Scheduling Approach to Energy Saving in Data Collection from Wireless Sensor Networks", Proceedings of the Second Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON'05), Santa Clara, California, USA, September 2005.
[6] O. Younis and S. Fahmy, "An Experimental Study of Routing and Data Aggregation in Sensor Networks", Proceedings of the International Workshop on Localized Communication and Topology Protocols for Ad hoc Networks (LOCAN), held in conjunction with the 2nd IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS-2005), November 2005.
[7] R. Virrankoski and A. Savvides, "TASC: Topology Adaptive Spatial Clustering for Sensor Networks", Second IEEE International Conference on Mobile Ad Hoc and Sensor Systems, Washington, DC, November 2005.
[8] S. Soro and W. Heinzelman, "Prolonging the Lifetime of Wireless Sensor Networks via Unequal Clustering", Proceedings of the 5th International Workshop on Algorithms for Wireless, Mobile, Ad Hoc and Sensor Networks (IEEE WMAN '05), April 2005.
[9] M. Lotfinezhad and B. Liang, "Effect of partially correlated data on clustering in wireless sensor networks", Proceedings of the IEEE International Conference on Sensor and Ad hoc Communications and Networks (SECON), Santa Clara, California, October 2004.
[10] C. Guestrin, P. Bodik, R. Thibaux, M. Paskin and S. Madden, "Distributed Regression: An Efficient Framework for Modeling Sensor Network Data", Intel Corporation, 2004.
[11] W. Heinzelman, A. Chandrakasan, H. Balakrishnan, "Energy-efficient communication protocol for wireless microsensor networks", Proc. of 33rd Hawaii Intl. Conf. on System Science (HICSS '00), January 2000.
[12] J. Beaver, M. A. Sharaf, A. Labrinidis, and P. K. Chrysanthis, "Location-Aware Routing for Data Aggregation in Sensor Networks", Proc. of the 2nd Hellenic Data Management Symposium, 2003.
[13] P. Radivojac, U. Korad, K. M. Sivalingam and Z. Obradovic, "Learning from Class-Imbalanced Data in Wireless Sensor Networks", IEEE Semiannual Vehicular Technology Conference, VTC-Fall 2003, Vol. 5, pp. 3030-3034, Orlando, Florida, U.S.A., October 2003.
[14] P. Beyens, M. Peeters, K. Steenhaut and A. Nowe, "Routing with Compression in Wireless Sensor Networks: a Q-learning Approach", Fifth European Workshop on Adaptive Agents and Multi-Agent Systems (AAMAS 05), Paris, France, 2005.
[15] K. S. Narendra and M. A. L. Thathachar, Learning Automata: An Introduction, Prentice Hall, 1989.
[16] R. Shah and J. Rabaey, "Energy Aware Routing for Low Energy Ad Hoc Sensor Networks", Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), Orlando, Florida, March 2002.
[17] M. Ilyas and I. Mahgoub, Handbook of Sensor Networks: Compact Wireless and Wired Sensing Systems, CRC Press, London, Washington, D.C., 2005.
[18] K. Akkaya and M. Younis, "A Survey on Routing Protocols for Wireless Sensor Networks", Elsevier Ad Hoc Networks Journal, pp. 325-349, 2005.
[19] K. Dasgupta, K. Kalpakis and P. Namjoshi, "An Efficient Clustering-based Heuristic for Data Gathering and Aggregation in Sensor Networks", IEEE Wireless Communications and Networking Conference, Vol. 4, No. 1, March 2003.
[20] W. Heinzelman, A. Chandrakasan, H. Balakrishnan, "Energy Efficient Communication Protocol for Wireless Microsensor Networks", Intl. Conf. on System Sciences, Hawaii, January 2000.
[21] M. A. L. Thathachar and P. S. Sastry, "Varieties of Learning Automata: An Overview", IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, Vol. 32, No. 6, pp. 711-722, 2002.
[22] M. Golipour and M. R. Meybodi, "LA-Mobicast: A Learning Automata based Mobicast Routing Protocol for Wireless Sensor Networks", Sensor Letters, Vol. 6, No. 2, pp. 305-311, April 2008.
[23] M. Haleem and R. Chandramouli, "Adaptive downlink scheduling and rate selection: a cross layer design", Special issue on Mobile Computing and Networking, IEEE Journal on Selected Areas in Communications, Vol. 23, No. 6, June 2005.
[24] P. Nicopolitidis, G. I. Papadimitriou and A. S. Pomportsis, "Exploiting Locality of Demand to Improve the Performance of Wireless Data Broadcasting", IEEE Transactions on Vehicular Technology, Vol. 55, No. 4, pp. 1347-1361, July 2006.
[25] P. Nicopolitidis, G. I. Papadimitriou and A. S. Pomportsis, "Learning-Automata-Based Polling Protocols for Wireless LANs", IEEE Transactions on Communications, Vol. 51, No. 3, pp. 453-463, March 2003.
[26] P. Nicopolitidis, G. I. Papadimitriou, M. S. Obaidat and A. S. Pomportsis, "Carrier-sense-assisted Adaptive Learning MAC Protocol for Distributed Wireless LANs", International Journal of Communication Systems, Wiley, Vol. 18, No. 7, pp. 657-669, September 2005.
[27] P. Nicopolitidis, G. I. Papadimitriou and A. S. Pomportsis, "Distributed Protocols for Ad-Hoc Wireless LANs: A Learning-Automata-Based Approach", Ad Hoc Networks Journal, Elsevier, Vol. 2, No. 4, pp. 419-431, October 2004.
[28] B. V. Ramana and C. S. R. Murthy, "Learning-TCP: A Novel Learning Automata Based Congestion Window Updating Mechanism for Ad hoc Wireless Networks", Proc. 12th IEEE International Conference on High Performance Computing, LNCS, pp. 454-464, December 2005.
[29] H. Beigy and M. R. Meybodi, "A Learning Automata Based Dynamic Guard Channel Scheme", Lecture Notes on Information and Communication Technology, Vol. 2510, Springer Verlag, pp. 643-650, November 2002.
[30] H. Beigy and M. R. Meybodi, "An Adaptive Uniform Guard Channel Algorithm: A Learning Automata Approach", Lecture Notes in Intelligent Data Engineering and Automated Learning, Springer Verlag, LNCS 2690, Berlin, Heidelberg, Germany, pp. 405-409, 2003.

[31] H. Beigy and M. R. Meybodi, "Learning Automata based Dynamic Guard Channel Algorithms", Journal of High Speed Networks, 2008, to appear.
[32] M. Esnaashari and M. R. Meybodi, "A Cellular Learning Automata based Clustering Algorithm for Wireless Sensor Networks", Sensor Letters, Vol. 6, pp. 1-13, 2008, to appear.
[33] M. Esnaashari and M. R. Meybodi, "Dynamic Point Coverage in Wireless Sensor Networks: A Learning Automata Approach", Lecture Notes in Computer Science, Springer Verlag, Kish Island, Iran, pp. 758-762, to appear.
