Pradnya M. Daflapurkar et al., International Journal of Technology and Engineering Science (IJTES), Vol. 1(9), pp. 1360-1365, December 2013. ISSN: 2320-8007.

Performance Evaluation of WSN Parameters Using Reinforcement Learning: A Survey

Ms. Pradnya M. Daflapurkar, Ph.D. Scholar, Sathyabama University, Chennai 600 119, Tamil Nadu, India; Asst. Prof., Marathwada Mitra Mandal's Institute of Technology, Lohgaon, Pune 411047, Maharashtra, India. [email protected]

B. P. Patil, Professor, Electronics & Telecommunication, Army Institute of Technology, Dighi Hills, Dighi, Pune 411015, Maharashtra, India. bp_patil@rediffmail.com

Abstract. A Wireless Sensor Network (WSN) consists of tiny nodes with limited energy and computational power, whose locations in the environment can change dynamically. Providing intelligent, effective, autonomous support to wireless sensor networks is an active area of research. The task considered here is the optimized use of energy and resources to increase the lifetime of the network. Several computational intelligence paradigms can address this task; reinforcement learning is the paradigm studied in this paper for evaluating the performance parameters of WSN.

Keywords: Reinforcement Learning, Wireless Sensor Networks, Q-learning, Energy Efficient Routing

1 Introduction

A wireless sensor network (WSN) is a network of distributed autonomous devices that can sense or monitor physical or environmental conditions simultaneously [1]. WSNs are used in numerous applications such as environmental monitoring, habitat monitoring, prediction and detection of natural calamities, medical monitoring and structural health monitoring. A WSN consists of a large number of small, inexpensive, disposable and autonomous sensor nodes that are generally deployed in an ad-hoc manner over vast geographical areas for remote operations. Sensor nodes have several constraints in terms of storage resources, computational capabilities, communication bandwidth and power supply. Typically, sensor nodes are grouped into clusters, and each cluster has a node that acts as a cluster head. All the nodes forward their sensor data to the cluster head, which in turn routes it to a specialized node called a sink node or base station through multi-hop wireless communication. However, very often the sensor network is very small and consists of a single cluster with a single base station. Other scenarios, such as multiple base stations or mobile nodes, are also possible.


Resource constraints and dynamic topology pose potential challenges in network discovery, network control and routing, collaborative information processing, querying and tasking [1]. In wireless networks, intelligence enables each host to make the right decision at the right time to achieve optimum performance [2]. Different computational intelligence paradigms exist for building intelligent machines that can resolve the complex issues mentioned above. Reinforcement learning (RL) is one of the paradigms that has been widely used by researchers. RL is an unsupervised machine learning technique that improves system performance. The phrase "unsupervised" means that the technique enables the host to learn knowledge about its operating environment by itself, without being overseen by an external teacher or critic [2]. The present article is organized as follows. Section 2 covers the background and motivation for applying the RL approach in wireless sensor networks, through the policy-based and intelligence-based approaches respectively [2]. Section 3 presents a brief overview of traditional reinforcement learning. Section 4 discusses challenges in sensor networks, including RL and its applications in wireless sensor networks. Section 5 describes RL techniques applied to one of the WSN parameters. Section 6 summarizes the RL applications discussed above.

2 Various Approaches

2.1 Policy-based Approach

In this type of approach, each agent follows a predefined set of rules that is hard-coded. A policy defines the rules through conditional statements, i.e. if-then-else, as shown in Fig. 1, or through a state-event diagram.


When a host encounters a particular condition (or state) and an event in the operating environment, it performs the corresponding action. A state, such as queue size, is monitored at all times, while an event, such as a call handoff, happens occasionally and is detected whenever it occurs.

If (state S1, event E1) Then (Action A1);
Else if (state S2, event E2) Then (Action A2);
Else if (state S3, event E3) Then (Action A3);
...
Else if (state Sn, event En) Then (Action An);
End if

Fig. 1. The if-then-else predefined policy
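The predefined policy of Fig. 1 is essentially a static lookup from (state, event) pairs to actions. The following minimal sketch illustrates this; the state, event and action names are purely hypothetical placeholders, not part of any particular protocol:

```python
# Minimal sketch of a hard-coded (policy-based) decision rule. The states,
# events and actions below are illustrative placeholders only.
POLICY = {
    ("QUEUE_FULL", "PACKET_ARRIVAL"): "DROP_PACKET",
    ("QUEUE_OK", "PACKET_ARRIVAL"): "ENQUEUE_PACKET",
    ("LOW_BATTERY", "PERIODIC_TIMER"): "ENTER_SLEEP_MODE",
}

def select_action(state, event):
    # The mapping is fixed at design time: it cannot adapt "on the fly"
    # when the operating environment changes, which is the drawback
    # discussed in the following paragraph.
    return POLICY.get((state, event), "DEFAULT_ACTION")

print(select_action("QUEUE_FULL", "PACKET_ARRIVAL"))  # -> DROP_PACKET
```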

A major drawback of a policy-based system is that, since actions are hard-coded, they cannot be changed "on the fly" with respect to a continually changing operating environment. Specifically, the relationships between the states, events and actions are static. Wireless communication is a complex and dynamic system; for example, the spectrum used, channels, topology and nodal availability are uncertain factors that affect performance in a complex manner. Hence, a policy-based system may not be able to deal with all possible states and events encountered throughout its operation, which results in suboptimal performance.

2.2 Intelligence-based Approach

An alternative to the policy-based approach incorporates intelligence within the system and is called the intelligence-based approach. Intelligence enables each host to learn new states, events and actions, as well as to match them so that optimal or near-optimal actions can be taken. The basic concept behind this approach is "practice makes perfect": while deciding on an optimal action is a difficult task, intelligence approximates an optimal action, that is, it achieves an optimal or near-optimal action as time goes by. A host learns about its action by evaluating feedback, that is, the consequences of executing the action. Since a traditional policy-based system is not receptive to feedback, it does not achieve intelligence. Through learning on the fly, the policy in Fig. 1 evolves with time in order to approximate the optimal policy. The remainder of this section discusses the necessity of continuous learning in the intelligence-based approach and introduces RL as well.

2.2.1 Necessity of Continuous Learning

In a dynamic environment, continuous learning is necessary so that the policy remains optimal or near-optimal. There are two main reasons for this. Firstly, the operating environment evolves with time, so that new state-event pairs may be encountered and new actions may be discovered; hence the policy must be constantly updated to match the state-event pairs with optimal or near-optimal actions. Secondly, the network performance brought about by an action for a particular state-event pair may degrade as time goes by, so rematching may be necessary. Additionally, most operating environments in wireless networks possess statistical properties, e.g. the traffic load may follow a Poisson process, so it may take many trials to learn the policy; hence continuous learning is necessary. One machine learning technique used to achieve continuous learning is reinforcement learning [3].


3 Reinforcement Learning

Reinforcement learning (RL) (Mitchell, 1997; Sutton & Barto, 1998) is a biologically inspired machine learning technique in which a learning agent acquires its knowledge from direct interaction with its environment [4]. A simple example is a mouse in a maze trying to find the path to a piece of cheese. At any moment, it must select a direction to move, and the result of an action is either finding the cheese or not. This maps onto the reinforcement learning setting, in which an agent (e.g. the mouse) selects actions (e.g. the direction to move) and receives rewards (e.g. cheese) from the environment for each action.

Fig. 2. General RL model. The agent selects one action according to its current internal state (its current view of the environment and its previous knowledge) and carries out this action.


A well-known and widely used RL algorithm is Q-learning, whose model consists of the following components.

Agent State: The learning agent has a finite set of possible states S, and s_t represents the agent's state at time t. In the above example, the state of the mouse is its current position in the maze.

Actions: Q-learning associates a set of actions A_s with each state s. In our maze environment, the movement steps of the mouse represent the actions: forward, backward, left and right.

Immediate Rewards: An immediate reward r(s_t, a_t) is associated with each state transition. In the above example, all state transitions that do not lead to the goal state have an immediate reward of 0 (no cheese), and those leading to the goal state have an immediate reward of 1 (cheese reached). The agent can see the actions with their associated rewards from its current state; it never has global knowledge about the states, the environment and their rewards.

Action Cost: In addition to the reward, there is also a cost c(s_t, a_t) associated with each action in each state. This is a scalar value indicating the cost of the action. In the above example, it costs one unit of energy (one bite of cheese) for the mouse to make any movement. Costs are always considered negative rewards and hence are directly subtracted from the immediate reward.

Value Function: In contrast to immediate rewards, which are associated with each action in each state and are easily observable, the value function represents the expected total accumulated reward. The goal of the agent is to learn the sequence of actions with the maximum value function, that is, the sequence for which the reward on the taken path is maximum.
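To make these components concrete, the maze example can be expressed as plain data structures and a one-step transition function. This is only an illustrative sketch: the 4x4 grid, the cheese cell at (3, 3), the reward of 1 and the uniform movement cost of 1 are assumed values, not taken from the surveyed literature.

```python
# Sketch of the mouse-in-a-maze example: states are grid cells, actions are
# movement steps, the immediate reward is 1 only when the cheese is reached,
# and every movement has a cost of one unit of energy.
GRID_SIZE = 4                                  # assumed 4x4 maze
CHEESE = (3, 3)                                # assumed goal cell
ACTIONS = {"forward": (0, 1), "backward": (0, -1),
           "left": (-1, 0), "right": (1, 0)}   # illustrative direction encoding

def step(state, action):
    """Apply an action and return (next_state, immediate_reward, cost)."""
    dx, dy = ACTIONS[action]
    nx = min(max(state[0] + dx, 0), GRID_SIZE - 1)   # stay inside the maze
    ny = min(max(state[1] + dy, 0), GRID_SIZE - 1)
    next_state = (nx, ny)
    reward = 1.0 if next_state == CHEESE else 0.0    # cheese reached or not
    cost = 1.0                                       # one bite of cheese per move
    return next_state, reward, cost
```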

Q-Value: To represent the currently expected total future reward in any state, a Q-value Q(s_t, a_t) is associated with each action and state. The Q-value represents the memory of the learning agent in terms of the quality of the action in this particular state.

Initially, the Q-values are set to zero, indicating that the agent knows nothing. Through trial and experience the agent learns how good an action was. The Q-values of the actions change through learning and finally represent the absolute value function. After convergence, choosing in each state the action with the greatest Q-value guarantees taking the optimal decision (path).

Updating the Q-Value: A simple rule is defined to update the Q-value after each step of the agent:

Q(st 1 , at )  Q(st , at )   ( R(st , at )  Q(st , at ))........(1) where

Q( st 1 , at )

The new Q-Value of the pair { st+1,at } when agent is in state st+1 after taking action at in state st. The old Q-Value and

Q( st , at ) term, consist of  ( R(st , at )  QThe (st , acorrection t ))

received reward and old Q-Value, γ is the learning constant.

It prevent the Q-Values from changing to fast and thus oscillating. The total received reward is computed as

R(st , at )  r (st , at )  c(st , at ) Where,

r ( st , at ) : immediate reward as defined above and

c( st , at ) : cost of taking action at in state st .

Fig. 3. The reinforcement learning model.
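Equation (1) translates directly into a small update routine. The sketch below assumes a Q-table stored as a dictionary initialized to zero, the total reward R = r - c as described above, and an arbitrary learning constant gamma = 0.5:

```python
from collections import defaultdict

GAMMA = 0.5                 # learning constant, chosen arbitrarily for illustration
Q = defaultdict(float)      # Q-values start at zero: the agent knows nothing yet

def update_q(state, action, reward, cost):
    """Apply Eq. (1): Q <- Q + gamma * (R - Q), with R = r - c."""
    total_reward = reward - cost                   # cost acts as a negative reward
    old_q = Q[(state, action)]
    Q[(state, action)] = old_q + GAMMA * (total_reward - old_q)
    return Q[(state, action)]
```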


Exploration Strategy (Action Selection Policy): Learning is performed in episodes, e.g. the mouse takes actions in its environment and updates the associated Q-values until it reaches the cheese. After completion, a new episode begins, and this is repeated until the Q-values no longer change. The question is how to select the next action. Always taking the action with the maximum Q-value (greedy policy) may result in finding only a locally optimal solution, while always selecting randomly (random policy) means ignoring prior experience and spending too much energy learning the complete environment. These two extreme strategies are called exploitation and exploration. The problem is to combine and weight both so that optimal results are achieved as fast as possible. The most commonly used strategy is called ε-greedy: with probability ε the agent takes a random action, and with probability (1 − ε) it takes the best available action.


RL is well suited to distributed problems such as routing. It has moderate memory requirements and rather low computational needs at the individual nodes; the memory requirement arises from the need to keep many different possible actions and their values. It needs some time to converge, but it is easy to implement, is highly flexible to changes in topology and learns the optimal solution (e.g. the shortest path).
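A minimal sketch of the ε-greedy selection rule is given below; it assumes the Q-table of the previous sketch (a defaultdict, so unseen state-action pairs evaluate to zero) and an arbitrary exploration probability of 0.1:

```python
import random

EPSILON = 0.1   # assumed exploration probability

def epsilon_greedy(Q, state, actions):
    """With probability epsilon take a random action (exploration); otherwise
    take the action with the maximum Q-value in this state (exploitation)."""
    if random.random() < EPSILON:
        return random.choice(list(actions))
    return max(actions, key=lambda a: Q[(state, a)])
```

In an episode, the agent would repeatedly call epsilon_greedy to pick an action, apply it with step, and feed the observed reward and cost into update_q until the goal state is reached.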

4 Challenges in Sensor Networks

Real deployments of WSN implement one of three general application classes: periodic reporting, event detection, and database-like storage. Periodic reporting is the simplest application scenario, in which at regular intervals the sensors sample their environment, store the sensory data and send it to the base station(s). The main property of periodic reporting applications is the predictability of data traffic and volume. In contrast, in event detection applications nodes sense the environment and the data is immediately evaluated for its usefulness; if useful data (an event) is detected, it is transmitted to the base station(s). The data traffic can hardly be predicted: events usually occur randomly and the resulting data traffic is irregular, although a small amount of data has to be exchanged for route management and aliveness checks even when no events are detected. Database-like storage systems [5] are similar to event-based systems: all sensory data (regular samples and events) is stored locally on the nodes, and the base stations search for interesting data and retrieve it from the nodes directly. The main challenge in this application is to store the data in a smart way for fast search and retrieval.

The challenges and properties of WSN deployments can be summarized as follows:

Wireless Ad-hoc Network: A fixed communication infrastructure does not exist. The shared wireless medium places additional restrictions on the communication between the nodes, and problems such as unreliable and asymmetric links arise. However, it also gives a broadcast advantage: a packet transmitted by a node to another is received by all neighbors of the transmitting node.

Mobility and Topology Changes: WSNs always work in dynamic scenarios: new nodes may join the network and existing nodes may change position throughout the lifetime of the network. Nodes may cease to function, and surviving nodes may move out of the transmission radii of other nodes. WSN applications must be robust against dynamic topology and node failures.


Energy Limitation: The limited energy of a node is a major constraint. The basic scenario includes a topology of sensor nodes and a limited number of more powerful base stations. Once the nodes are deployed, recharging and maintenance of the batteries on the sensor nodes is not possible. Communication tasks consume most of the power available to a sensor node, and in order to ensure task completion a node must have sufficient energy, which has to be used frugally.

Physical Distribution: In a WSN, each unit is an autonomous computational unit that communicates with its neighbors via messages. The data is distributed throughout the nodes in the network and can be gathered at a central station only at high communication cost. Consequently, algorithms that operate on global information from the entire network become very expensive; hence, distributed algorithms are highly desirable.

The major WSN challenges addressed by RL techniques are presented in the following subsections.

1. Design and Deployment: WSNs are used in a wide range of applications, from monitoring biological systems through tissue-implanted sensors to monitoring forest fires through air-dropped sensors. In some applications, the sensor nodes need to be placed accurately at predefined locations. Sensor network design aims at determining the type, amount and location of sensor nodes to be placed in the environment in order to obtain complete knowledge of its functioning conditions.

2. Localization: Node localization is the creation of location awareness in all deployed sensor nodes. Location information is used to detect and record events, or to route packets using geometry-aware routing [6, 7]. Besides, the location itself is often the data that needs to be sensed. Localization methods that use the time of arrival of signals from multiple base stations are commonly used in WSN [8].

3. Data Aggregation and Sensor Fusion: Sensor fusion is the process of combining data derived from multiple sources such that either the resulting information is in some sense better than what would be possible with the individual sources, or the communication overhead of sending individual sensor readings to the base station is reduced. Due to the large-scale deployment of sensors, voluminous data is generated, and its efficient collection is a critical issue. The most widely used methods for sensor fusion are Bayesian networks and Kalman filters [9].


A survey of data aggregation techniques used in WSN is presented in [10].

4. Energy-Aware Routing and Clustering: In many applications a network life expectancy of a few months or years is desired; hence the economic usage of energy is very important in WSN, as replacing or recharging the batteries on the nodes may be impractical, expensive or dangerous. Routing refers to determining a path from a source node to a destination node. In proactive routing methods, routing tables are created and stored regardless of when the routes are used, while in reactive routing methods, routes are computed as necessary. In a densely deployed network the routing tables take a huge amount of memory, and therefore hybrids of proactive and reactive methods are suitable for such networks; hierarchical clustering of the network is a possible solution. An overview of modern WSN routing algorithms is presented in [11].

5. Scheduling: In order to conserve energy, typical sensor nodes remain in sleep mode most of the time and go into active mode periodically in order to acquire and transmit sensory data. A strict schedule needs to be followed regarding when a node wakes up, senses and transmits (or performs locomotion), ensuring the maximum network lifetime. The main objective is to make the WSN nodes take the right action at the right time.

6. Security: WSNs have wireless links which are highly susceptible to impersonation, message distortion, eavesdropping, etc. Poorly protected nodes that move into a hostile environment can be easily compromised, and administration becomes more difficult due to the dynamic topology. Various security challenges in WSN are analyzed in [12], and the key issues that need to be resolved for achieving adequate security are summarized in [13]. A review of security threats to WSN and a survey of defense mechanisms is presented in [14].

7. Quality of Service Management: QoS refers to an assurance by the network to provide a set of measurable service attributes to the end-to-end user/application in terms of fairness, delay, jitter, available bandwidth and packet loss. A network has to provide QoS [17] while maximizing network resource utilization. To achieve this goal, the network is required to analyze the application requirements and deploy various network QoS mechanisms. A survey of QoS support in WSN is presented in [15].


5 RL Application to Energy-Efficient Routing in WSN

- Q-Learning: Q-learning is a model-free algorithm that maintains values for state-action pairs, called Q-values. The action in each state is chosen according to the ε-greedy policy. The algorithm uses a parameter γ, the learning rate, to update the Q-values for each state it experiences; as the name suggests, it determines how quickly the agent learns from the environment to discover the optimal policy.

- Q-Routing: Q-learning techniques [16] are used to create routing policies, and routing algorithms based on Q-learning are referred to as Q-routing algorithms. Q-routing provides dynamic load balancing in packet-switched networks by estimating the routing delay at different nodes. Q-routing is also used to propagate energy information faster through the network. Q-routing algorithms have been applied in a multi-source, single-sink scenario where all non-sink nodes have a certain probability of generating messages for the sink. Each node's routing table can contain information for multiple possible paths to the sink along different neighbors; it stores one Q-value per neighbor, since all data in the network has the same destination, so each value is an indication of how capable this neighbor is of forwarding data to the sink. When a node has to forward a data packet it can select a neighboring node based on the values stored in its routing table, i.e. the values of all neighboring nodes adjacent to the source node.

- Energy-Efficient Routing: In this approach, the neighboring nodes' energy levels are used as a metric for efficient routing. The approach works by appending a node's energy level to the feedback message that it sends after receiving a message. When a node receives feedback, it uses the received energy level to modify the neighbor's Q-value in its routing table. It does this by generating a multiplier for the value stored in the routing table, where the multiplier depends on the amount of energy available at the node: the less energy available, the higher the multiplier, which discourages further use of nodes with low power levels. Energy-efficient algorithms based on Q-routing thus extend the lifetime of the network.
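A rough sketch of the energy-aware feedback update described above is given below. It is not the algorithm of [16]: it assumes, as in classic Q-routing, that each neighbor's Q-value estimates the cost of delivering a packet to the sink through that neighbor (lower is better; the text above describes the value equivalently as the neighbor's capability), that feedback messages carry the neighbor's residual energy level, and that the multiplier form shown is only one possible choice:

```python
def energy_multiplier(energy_level, full_energy=100.0):
    """Illustrative penalty: the less residual energy the neighbor reports,
    the larger the multiplier, which inflates its cost estimate and
    discourages further use of nodes with low power levels."""
    return 1.0 + (full_energy - energy_level) / full_energy   # ranges 1.0 .. 2.0

def on_feedback(q_table, neighbor, reported_cost, energy_level, alpha=0.5):
    """Update the routing-table entry for a neighbor when its feedback message
    (carrying a cost estimate and its energy level) is received."""
    penalized = reported_cost * energy_multiplier(energy_level)
    q_table[neighbor] += alpha * (penalized - q_table[neighbor])

def select_next_hop(q_table):
    """Forward the packet to the neighbor with the lowest penalized cost."""
    return min(q_table, key=q_table.get)

# Example: two neighbors with equal cost estimates but different energy levels.
q = {"n1": 3.0, "n2": 3.0}
on_feedback(q, "n1", reported_cost=2.5, energy_level=20.0)
on_feedback(q, "n2", reported_cost=2.5, energy_level=90.0)
print(select_next_hop(q))   # -> "n2": same cost estimate, but more residual energy
```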



6 Conclusion

This article surveys reinforcement learning techniques applied in WSN to evaluate and improve the performance of the network. The RL techniques studied help to increase the lifetime of the network; RL is also used for resource allocation and task scheduling in WSN.

7 References

1. R. V. Kulkarni and G. K. Venayagamoorthy, "Computational Intelligence in Wireless Sensor Networks: A Survey," IEEE Communications Surveys and Tutorials, 2010.
2. Kok-Lim Alvin Yau, Peter Komisarczuk, and Paul D. Teal, "Reinforcement Learning for Context Awareness and Intelligence in Wireless Networks: Review, New Features and Open Issues," Elsevier Journal of Network and Computer Applications, vol. 35, pp. 253-267, 2012.
3. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
4. A. Foerster and A. Foerster, Emerging Communications for Wireless Sensor Networks. InTech, 2011 (ISBN 978-953-307-082-7).
5. S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, "TinyDB: An acquisitional query processing system for sensor networks," ACM Trans. Database Syst., vol. 30, no. 1, pp. 122-173, 2005.
6. N. Patwari, J. Ash, S. Kyperountas, A. Hero, R. Moses, and N. Correal, "Locating the nodes: Cooperative localization in wireless sensor networks," IEEE Signal Process. Mag., vol. 22, no. 4, pp. 54-69, July 2005.
7. J. Aspnes, T. Eren, D. Goldenberg, A. Morse, W. Whiteley, Y. Yang, B. Anderson, and P. Belhumeur, "A theory of network localization," IEEE Trans. Mobile Comput., vol. 5, no. 12, pp. 1663-1678, Dec. 2006.
8. A. Boukerche, H. Oliveira, E. Nakamura, and A. Loureiro, "Localization systems for wireless sensor networks," IEEE Wireless Commun. Mag., vol. 14, no. 6, pp. 6-12, Dec. 2007.
9. R. R. Brooks and S. S. Iyengar, Multi-Sensor Fusion: Fundamentals and Applications with Software. Upper Saddle River, NJ, USA: Prentice-Hall, 1998.
10. R. Rajagopalan and P. Varshney, "Data aggregation techniques in sensor networks: A survey," IEEE Commun. Surveys Tuts., vol. 8, no. 4, pp. 48-63, Fourth Quarter 2006.
11. P. Jiang, Y. Wen, J. Wang, X. Shen, and A. Xue, "A study of routing protocols in wireless sensor networks," in Proc. 6th World Congress on Intelligent Control and Automation (WCICA 2006), vol. 1, pp. 266-270, 2006.
12. F. Hu and N. K. Sharma, "Security considerations in ad hoc sensor networks," Ad Hoc Networks, vol. 3, no. 1, pp. 69-89, 2005.
13. C. Karlof and D. Wagner, "Secure routing in wireless sensor networks: Attacks and countermeasures," Elsevier Ad Hoc Networks J., Special Issue on Sensor Network Applications and Protocols, vol. 1, no. 2-3, pp. 293-315, Sept. 2003.
14. X. Chen, K. Makki, K. Yen, and N. Pissinou, "Sensor network security: A survey," IEEE Commun. Surveys Tuts., vol. 11, no. 2, pp. 52-73, 2009.
15. D. Chen and P. Varshney, "QoS support in wireless sensor networks: A survey," June 2004.
16. M. Devillé, Y.-A. Le Borgne, and A. Nowé, "Reinforcement Learning for Energy Efficient Routing in Wireless Sensor Networks," in Proc. 23rd Benelux Conference on Artificial Intelligence (BNAIC 2011), pp. 1-8, 3-4 November 2011.
17. V. Pereira, J. Sa Silva, and E. Monteiro, "A Framework for Wireless Sensor Networks Performance Monitoring," IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), 25-28 June 2012.

