Reinforcement Learning Approach for Ad Hoc Network

Proceedings of the 3rd International Conference on Recent Trends in Engineering & Technology (ICRTET'2014), organized by SNJB's Late Sau. K. B. Jain College of Engineering, Chandwad. ISBN: 978-93-5107-220-1, 28-30 March, 2014.

Reinforcement Learning Approach for Ad Hoc Network

Rahul Desai (a, *), B. P. Patil (b)
(a) Research Scholar, Sinhgad College of Engineering; Department of Information Technology, Army Institute of Technology, Pune, 411015, India
(b) Department of Electronics and Telecommunication, Army Institute of Technology, Pune, 411015, India

Abstract

Most routing algorithms for ad hoc networks cannot adapt to run-time changes such as traffic load or the time needed to deliver a packet to its destination; consequently, although they provide the shortest path, that shortest path may not be the optimum path for delivering packets. An optimum path can only be achieved when the state of the network is sensed every time packets are transmitted from the sender. The state of the network depends on various properties such as queue sizes and the status of links and nodes. The Q-learning framework of Watkins and Dayan is therefore used to develop and improve such adaptive routing algorithms. In Q-routing, Q tables are used to represent the state of the network. Q-learning is based on reinforcement learning, a popular machine learning technique that allows an agent to automatically determine the optimal behaviour for achieving a specific goal from the positive or negative feedback it receives from the environment after taking an action. This paper describes Q-routing protocols over mobile ad hoc networks.

Keywords: Ad Hoc Networks, DSDV, WRP, OLSR, AODV, TORA, Q Routing, CQ Routing, DRQ Routing.

1. Introduction

An ad hoc network is an infrastructure-less network that dynamically creates a temporary network. Due to the limited transmission range of wireless network interfaces, multiple network hops are needed for one node to exchange data with another across the network. Each mobile node also acts as a router and forwards packets for other mobile nodes that may not be within direct reach [1]. The routers are free to move randomly and organize themselves arbitrarily; thus, the network's wireless topology may change rapidly and unpredictably. Because of its mobility and lack of infrastructure, an ad hoc network poses various new design requirements.

This work is organized as follows. Section 2 surveys and compares existing routing protocols and algorithms, section 3 introduces and describes the Q-routing protocol in detail, section 4 presents optimization techniques applied to Q-routing, and section 5 gives the conclusion.

2. Survey of Ad Hoc Routing Protocols

In a MANET, nodes are not aware of each other, so they need to discover each other by broadcasting their presence to neighbouring nodes. They also need to listen for other broadcasts in case a new node is added. The protocols used for routing in ad hoc networks are divided into proactive and on-demand routing protocols. Proactive protocols always maintain unicast routes between all pairs of nodes, regardless of whether all routes are actually used. Therefore, when there is a need to transmit packets, a route is readily available and no route-discovery delay is incurred [2]. These protocols can also find shortest-path routes, but they are not directly suitable for resource-poor mobile ad hoc networks because of their high overheads and poor convergence behaviour. They are broadly classified into the two traditional categories: distance vector and link state.

Destination Sequenced Distance Vector (DSDV) [2] was one of the earliest protocols developed for ad hoc networks. DSDV uses destination sequence numbers to avoid the count-to-infinity problem. DSDV also uses triggered incremental routing updates

* Corresponding author. Tel.: +919403357088. E-mail address: [email protected], [email protected].



to quickly propagate information about route changes. The Wireless Routing Protocol (WRP) [2] is another distance vector protocol optimized for ad hoc networks. It uses next-hop and second-to-last-hop information to overcome the count-to-infinity problem. Optimized Link State Routing (OLSR) [3, 4] is an optimized version of a link state protocol such as OSPF. It uses the multipoint relay (MPR) concept to efficiently transmit link state updates across the network: only the nodes selected as MPRs are allowed to generate link state updates. OLSR also uses only periodic updates for link state dissemination. Table 1 summarizes the proactive routing protocols.

Table 1: Proactive routing protocols summary

Parameter        | DSDV                              | WRP                               | OLSR
Route updation   | Periodic, triggered to neighbours | Periodic, triggered to neighbours | Periodic, triggered in the network
Routing overhead | High                              | High                              | Low
Caching overhead | Medium                            | High                              | High
Throughput       | Low                               | Low                               | Medium
Routing tables   | 2                                 | 4                                 | 4

A different approach from table-driven routing is source-initiated on-demand routing. The main idea in on-demand routing is to find and maintain only the routes that are needed, thereby avoiding the cost of maintaining routes that are not used. When a route is needed, a route discovery process is initiated. Once a route has been established, it is maintained by a route maintenance procedure until either the destination becomes inaccessible along every path from the source or the route is no longer desired. This saves energy and bandwidth during periods of inactivity, but congestion can occur during high activity, and the route discovery process introduces significant delays.

The Dynamic Source Routing Protocol (DSR) [5, 6] uses the source routing concept: the sender knows the complete hop-by-hop route to the destination. These routes are stored in a route cache, and data packets carry the source route in the packet header. DSR also uses a route discovery process to dynamically determine the optimum route; the route request and reply packets are themselves source routed. Ad Hoc On-Demand Distance Vector Routing (AODV) [7, 8] is a pure on-demand routing protocol that initiates a similar route discovery process, but AODV uses traditional routing tables with one entry per destination, whereas DSR maintains multiple route cache entries for every destination. The Temporally Ordered Routing Algorithm (TORA) is another on-demand protocol. Its route discovery procedure computes multiple loop-free routes to the destination, which constitute a destination-oriented directed acyclic graph (DAG). While the ad hoc network is viewed as an undirected graph, TORA imposes a logical directionality on the links. TORA employs a route maintenance procedure, requiring strong inter-nodal coordination, based on a link reversal concept for localized recovery from route failures. Table 2 summarizes the on-demand routing protocols.

Table 2: On-demand routing protocols summary

Parameter        | AODV                                             | DSR                              | TORA
Route creation   | By source                                        | By source                        | Locally
Caching overhead | Low                                              | High                             | Medium
Throughput       | High                                             | Low                              | Low
Multipath        | No                                               | Yes                              | Yes
Route updation   | Non-periodic                                     | Non-periodic                     | High routing overhead
Advantages       | Adaptable to large and highly dynamic topologies | Reduced route discovery overhead | -

To simulate the desired behaviour, the network simulator ns-2 (version 2.34) is used. The simulation uses a 500 x 500 area with a random mobility model and is performed for 20, 40, 60, 80 and 100 nodes with 50% mobility. The DSDV, DSR, AODV and Multi-Path AODV (AOMDV) protocols are compared on two performance parameters: packet delivery ratio and average end-to-end delay. AODV is found to give the best performance in terms of packet delivery ratio as well as average end-to-end delay for light and heavy loads, although the lowest average end-to-end delay overall is observed with DSDV; this is expected, since DSDV is a proactive routing protocol and a route is always available. Figures 1 and 2 show the packet delivery ratio and average end-to-end delay results.
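The two metrics reported above are standard trace-derived quantities. The following minimal Python sketch (not the authors' simulation post-processing scripts; the per-packet timestamp records are illustrative assumptions) shows how they are conventionally computed from send/receive records:

def packet_delivery_ratio(sent_times, recv_times):
    # Fraction of sent packets (by packet id) that were actually received.
    if not sent_times:
        return 0.0
    delivered = sum(1 for pid in sent_times if pid in recv_times)
    return delivered / len(sent_times)

def average_end_to_end_delay(sent_times, recv_times):
    # Mean (receive time - send time) over packets that arrived.
    delays = [recv_times[pid] - sent_times[pid] for pid in sent_times if pid in recv_times]
    return sum(delays) / len(delays) if delays else float("inf")

# Hypothetical timestamps in seconds; packet 2 is lost.
sent = {1: 0.10, 2: 0.20, 3: 0.30}
recv = {1: 0.35, 3: 0.62}
print(packet_delivery_ratio(sent, recv))       # 0.666...
print(average_end_to_end_delay(sent, recv))    # 0.285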



Figure 1. Packet delivery ratio

Figure 2. Average end-to-end delay

The final summary of proactive and on-demand routing protocols is presented in Table 3.

Table 3: Proactive vs. on-demand routing protocols

Parameter                       | Proactive                                                              | On-Demand
Storage requirements            | Higher                                                                 | Depends on the number of routes maintained or needed
Route availability              | Route is always available                                              | Computed as per need
Periodic route updates          | Constant periodic propagation of routing information                   | No periodic updates; control information is propagated only when the topology changes
Delay                           | Low                                                                    | High
Scalability                     | Around 100 nodes                                                       | More than 100 nodes
Control traffic                 | High                                                                   | Low
Routing information             | Kept stored in tables                                                  | Not stored
Performance                     | High, provided the network size is small enough                        | High, provided the network size is large enough
Performance under high mobility | Poorer                                                                 | Good
Route to every other node       | A route to every other node in the ad hoc network is always available  | Not available
First-packet latency            | Less than for on-demand protocols                                      | More than for table-driven protocols, because a route must first be built

3. Q Routing

Section 2 discussed various conventional routing algorithms, both proactive and on-demand. This section discusses in detail a class of adaptive routing algorithms based on the reinforcement learning approach, called Q-routing (Littman and Boyan 1993a; Littman and Boyan 1993b; Boyan and Littman 1994) [9, 10, 11]. Two different approaches can be used for learning: in the model-based approach, the learning agent learns a model of the environment/network and uses this knowledge to design an optimum control policy for the network, while in the model-free approach a controller is learned directly from the actual outcomes. The reinforcement learning approach used here for adaptive network routing follows the model-based view in that the model of the system is represented in terms of Q values. Each Q value is of the form Q(S, A), representing the reinforcement of taking



some action A in state S [12, 13]. When a node X receives a packet for some destination D, node X looks at the vector Qx(*, D) and selects the neighbouring node Ŷ for which the value Qx(Ŷ, D) is minimum. Node X thus sends the packet to the neighbour from which it is expected to reach the destination as quickly as possible. These Q values are only estimates and therefore do not necessarily give the best solution; the routing decision is optimum only if the Q values are accurate.

Figure 3. Example of Q Routing

Once the packet from source S is received by node X, node X checks its Q table, which is similar to a routing table except that it holds Q values directly proportional to the actual delay a packet takes to reach the destination. Using Qx(Y, D) and Qx(N, D), node X decides the optimum route (via neighbour Y or N) for the packet to reach destination D. Once the packet reaches the chosen neighbour (Y or N), that neighbour returns its own best Q value estimate for destination D back to node X, and node X updates the corresponding entry Qx(Y, D) or Qx(N, D). The Q-routing algorithm can be expressed in two procedures: PacketSend(X) describes what node X does when it has to send a packet, and PacketReceive(Y) describes what node Y does when it receives a packet P(S, D) from one of its neighbouring nodes X.

PacketSend(X)
1. If PacketQueue(X) is not empty, go to step 2.
2. Take the first packet P(S, D) from PacketQueue(X).
3. Compute the best neighbour Y = argmin over Y of Qx(Y, D).
4. Forward the packet to neighbour Y and wait for Y's estimate.
5. Receive the estimate from node Y and compute Qx(Y, D)_est = q_y + δ + Qy(Z, D), where q_y is the time the packet spends in Y's queue and δ is the transmission delay over the link.
6. Update the Q value: Qx(Y, D)_new = Qx(Y, D)_old + η_f (Qx(Y, D)_est − Qx(Y, D)_old), where η_f is the learning rate.
7. Get ready to send the next packet (go to step 1).

PacketReceive(Y)
1. Receive a packet P(S, D) from neighbour X.
2. Compute Y's best estimate for destination D, Qy(Z, D) = min over Z of Qy(Z, D).
3. Send Qy(Z, D) + q_y back to node X.
4. If Y is the destination, consume the packet; else go to step 5.
5. Check PacketQueue(Y); if it is full, drop the packet, else go to step 6.
6. Append the packet to PacketQueue(Y).
7. Get ready to receive the next packet (go to step 1).
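As a rough illustration of these steps, the following Python sketch (not part of the original paper; the node, queue and delay bookkeeping are simplified assumptions) shows the neighbour-selection rule of step 3 and the Q-value update of steps 5-6 of PacketSend:

from collections import defaultdict

class QRouter:
    def __init__(self, node_id, neighbours, learning_rate=0.5):
        self.node_id = node_id
        self.neighbours = neighbours      # neighbour node ids
        self.eta = learning_rate          # constant learning rate eta_f
        # Q[(y, d)] = estimated time for a packet to reach destination d via neighbour y
        self.Q = defaultdict(float)

    def best_neighbour(self, dest):
        # Step 3: pick the neighbour with the smallest Q value for this destination.
        return min(self.neighbours, key=lambda y: self.Q[(y, dest)])

    def update(self, y, dest, q_y, delta, best_q_at_y):
        # Steps 5-6: fold Y's returned estimate back into Qx(Y, D).
        # q_y: queueing delay at Y, delta: link transmission delay,
        # best_q_at_y: min_Z Qy(Z, D) returned by Y.
        estimate = q_y + delta + best_q_at_y
        old = self.Q[(y, dest)]
        self.Q[(y, dest)] = old + self.eta * (estimate - old)

# Hypothetical usage: node X with neighbours Y and N, forwarding towards destination D.
x = QRouter("X", ["Y", "N"])
next_hop = x.best_neighbour("D")
x.update(next_hop, "D", q_y=0.4, delta=0.1, best_q_at_y=2.0)
print(next_hop, x.Q[(next_hop, "D")])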

4. Optimization of Q Routing

Figure 4. Optimization of Q Routing – CQ and DRQ Routing



Inspired by the Q-routing algorithm [14, 15], an optimized routing algorithm, Confidence-based Q-routing (CQ routing), is presented. The quality of the control policy depends mainly on the Q values in the network, which represent its current state, and these Q values must be updated whenever the network state changes. Some Q values may not get updated for a long time, since updates depend on the network traffic and load conditions, and decisions based on such stale Q values are unreliable. In order to bring reliability into the Q values, confidence values are introduced. For every Q value in the network there is a corresponding confidence value between 0 and 1; these confidence values are also updated, decaying exponentially with every time step in which the corresponding Q value is not updated.

Figure 5. Example of CQ Routing

Each Q value is associated with a measure of confidence. A confidence value of 1 means the corresponding Q value is completely reliable, while a value of 0 means it is unreliable. In Q-routing the learning rate is constant, but in CQ routing the learning rate depends on the confidence of the Q value being updated and on the confidence of its new estimate. When node X sends a packet to its neighbour Y, it also receives the confidence value Cy(Z, D) associated with the returned Q value. When node X updates its Q value, it first computes the learning rate η_f, which depends on both Cx and Cy:

Qx(Y, D)_new = Qx(Y, D)_old + η_f(Cx(Y, D), Cy(Z, D)) ((Qy(Z, D) + q_y + δ) − Qx(Y, D)_old)

The learning rate function η_f(C_old, C_est) should be high if either the confidence in the old Q value is low or the confidence in the new estimate is high. A simple and effective learning rate function is:

η_f(C_old, C_est) = max(C_est, 1 − C_old)

Every C value whose corresponding Q value was not updated in the last time step decays with time:

Cx(Y, D) = λ Cx(Y, D), where λ ∈ (0, 1) is the decay constant.

If a Q value was updated in the last time step, the corresponding C value is updated based on the C values used in the Q-value update:

Cx(Y, D)_new = Cx(Y, D)_old + η_f(Cx(Y, D), Cy(Z, D)) (Cy(Z, D) − Cx(Y, D)_old)

Dual Reinforcement Q-routing (DRQ routing) is a modified version of the Q-routing algorithm in which learning occurs in both directions; since learning takes place in both directions, the learning performance of the Q-routing algorithm effectively doubles. When a node X sends a packet to one of its neighbours Y, the packet carries along information about the Q values of node X; when node Y receives this packet, it uses that information to update its own Q values associated with neighbour X. The only overhead is a slight increase in the size of the packets. Q value updates in backward exploration are more accurate than Q value updates in forward exploration.
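As an illustration of the CQ-routing update described above, the following Python sketch (not from the original paper; the table structures, the particular decay constant and the helper names are assumptions) implements the confidence-weighted learning rate, the joint Q/C update, and the decay of unused C values:

DECAY = 0.95  # assumed value for the decay constant lambda, in (0, 1)

def learning_rate(c_old, c_est):
    # eta_f(C_old, C_est) = max(C_est, 1 - C_old)
    return max(c_est, 1.0 - c_old)

def cq_update(Q, C, y, dest, q_y, delta, q_best_at_y, c_best_at_y):
    # One hop X -> Y towards dest: update Qx(Y, D) and Cx(Y, D)
    # using Y's best estimate q_best_at_y and its confidence c_best_at_y.
    key = (y, dest)
    q_old, c_old = Q.get(key, 0.0), C.get(key, 0.0)
    eta = learning_rate(c_old, c_best_at_y)
    Q[key] = q_old + eta * ((q_best_at_y + q_y + delta) - q_old)
    C[key] = c_old + eta * (c_best_at_y - c_old)
    return key

def decay_unused(C, updated_keys):
    # C values whose Q values were not updated this step decay exponentially.
    for key in C:
        if key not in updated_keys:
            C[key] *= DECAY

# Hypothetical usage for one packet hop at node X.
Q = {("Y", "D"): 3.0, ("N", "D"): 5.0}
C = {("Y", "D"): 0.2, ("N", "D"): 0.9}
updated = {cq_update(Q, C, "Y", "D", q_y=0.4, delta=0.1, q_best_at_y=2.0, c_best_at_y=0.8)}
decay_unused(C, updated)
print(Q, C)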

Figure 6. Example of DRQ Routing
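The backward exploration of DRQ routing can be sketched in the same style. The snippet below is an illustrative assumption, not the authors' implementation: it shows how node Y could refresh its estimate Qy(X, S) from the Q information piggybacked on a packet received from neighbour X.

def drq_backward_update(Q_y, x, source, q_x, delta, x_best_to_source, eta=0.5):
    # Q_y: node Y's Q table keyed by (neighbour, destination).
    # q_x: time the packet waited in X's queue, delta: link transmission delay,
    # x_best_to_source: min_Z Qx(Z, S), X's best estimate of delivery time to S,
    # all piggybacked on the data packet travelling S -> ... -> X -> Y.
    key = (x, source)
    old = Q_y.get(key, 0.0)
    estimate = q_x + delta + x_best_to_source
    Q_y[key] = old + eta * (estimate - old)

# Hypothetical usage at node Y for a packet that originated at S and arrived via X.
Q_y = {("X", "S"): 4.0}
drq_backward_update(Q_y, "X", "S", q_x=0.3, delta=0.1, x_best_to_source=3.0)
print(Q_y)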

There are various further optimizations of Q-routing, such as probabilistic routing, in which probability distributions associated with the vectors Qx and Cx are used. Predictive Q-routing is another extension of Q-routing that maintains a recovery rate for the Q values and the best estimated delivery time, and uses them to make its routing decisions; the recovery rate is used to predict the Q value at



the current time step. CDRQ routing [17, 18] combines the features of both CQ routing and DRQ routing: with each hop of a packet P(S, D) from node X to node Y, the Q and C values of both nodes X and Y are updated, in forward and backward exploration respectively.

5. Conclusion

Although the AODV and AOMDV protocols give good performance, both fail under high mobility and heavy load. Q-routing performs better than the existing protocols used over MANETs because it considers the network state at run time. Q-routing is easy to implement over fixed networks, but it is much harder to implement over mobile ad hoc networks because of their special characteristics. Although Q-routing outperforms Bellman-Ford-style algorithms, further optimization techniques should still be applied to it: CDRQ routing performs better than plain Q-routing, and other optimizations such as probabilistic CDRQ routing (PRCQ routing) and predictive Q-routing could also be used.

References

1. S. Radha, S. Shanmugavel, "Mobility Models in Mobile Ad Hoc Network", IETE Journal of Research, Vol. 53, No. 1, 2007.
2. A. K. Sharma and Amit Goyal, "A Power Efficient Self Adjusting Routing Protocols for Mobile Ad Hoc Networks", IETE Journal of Research, Vol. 53, No. 4, July-August 2007.
3. Asma Toteja, Raynees Gujral, Sunil Thalia, "Comparative Performance Analysis of DSDV, AODV and DSR Routing Protocols in MANETs, using NS2", 2010 International Conference on Advances in Computing Engineering, IEEE Computer Society.
4. T. Clausen et al., "Optimized Link State Routing Protocol", http://www.ietf.org.internet-drafts/draft-ietf-manet-olsr-11.txt, July 2003, IETF Internet Draft.
5. Haseeb Zafar, Nancy Alhamahmy, David Harle and Ivan Andonovic, "Survey of Reactive and Hybrid Routing Protocols for Mobile Ad Hoc Networks", International Journal of Communication Networks and Information Security (IJCNIS), Vol. 3, No. 3, December 2011.
6. D. B. Johnson, D. A. Maltz, Y. Hu and J. G. Jetcheva, "The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks (DSR)", http://www.ietf.org/internetdrafts/draft-ietf-manet-dsr-07.txt, Feb 2002, IETF Internet Draft.
7. Rajesh Shrivastava and Rashween Kaur Saluja, "Performance Evaluation of Extended AODV Using Different Scenarios", International Journal of Smart Sensors and Ad Hoc Networks (IJSSAN), ISSN 2248-9738, Vol. 1, Issue 3, 2012.
8. S. Ali, A. Ali, "Performance Analysis of AODV, DSR and OLSR in MANET", Master's Thesis, M.10:04, COM/School of Computing, BTH, 2010.
9. Shailesh Kumar, "Confidence based Dual Reinforcement Q-Routing: An On-line Adaptive Network Routing Algorithm", Report AI98-267, May 1998.
10. S. Kumar, "Confidence based Dual Reinforcement Q Routing: An On-line Adaptive Network Routing Algorithm", Technical Report, University of Texas, Austin, 1998.
11. J. A. Boyan, M. L. Littman, "Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach", in Advances in Neural Information Processing Systems 6 (NIPS 6), Morgan Kaufmann, San Francisco, CA, pp. 671-678, 1994.
12. Davi Kelly, "Reinforcement Learning with Application to Adaptive Network Routing", Journal of Theoretical and Applied Information Technology, 2005.
13. Ramzi A. Haraty and Badieh Traboulsi, "MANET with the Q-Routing Protocol", ICN 2012: The Eleventh International Conference on Networks.
14. Mahmoud Alilou, Mohammad Ali Jabraeil Jamali, Behrooz Talebzadeh and Maysam Alilou, "Modified Q-learning Routing Algorithm in Fixed Networks", Australian Journal of Basic and Applied Sciences, 5(12): 2699-2703, 2011, ISSN 1991-8178.
15. Shalabh Bhatnagar, K. Mohan Babu, "New Algorithms of the Q-learning Type", Automatica, 44 (2008) 1111-1119.
16. Nicholas Mastronarde and Mihaela van der Schaar, "Fast Reinforcement Learning for Energy-Efficient Wireless Communication", IEEE Transactions on Signal Processing, Vol. 59, No. 12, December 2011.
17. Soon Teck Yap and Mohamed Othman, "An Adaptive Algorithm: Enhanced Confidence based Q Routing Algorithm in Network Traffic", Malaysian Journal of Computer Science, Vol. 17, No. 2, December 2004, pp. 21-29.
18. Shailesh Kumar, Risto Miikkulainen, "Dual Reinforcement Q-routing: An On-line Adaptive Routing Algorithm", Proceedings of Artificial Neural Networks in Engineering, 1997.

