Reinforcement Learning Based Routing in ... - Semantic Scholar

Reinforcement Learning Based Routing in Wireless Mesh Networks

Mustapha Boushaba¹, Abdelhakim Hafid¹, Abdeltouab Belbekkouche¹, Michel Gendreau²

¹ Network Research Laboratory, University of Montreal, Montreal, Canada
{boushamu, ahafid, belbekka}@iro.umontreal.ca

² CIRRELT and MAGI, École Polytechnique de Montreal, Montreal, Canada
michel.gendreau@cirrelt.ca

Abstract. This paper addresses the problem of efficient routing in backbone wireless mesh networks (WMNs) where each mesh router (MR) is equipped with multiple radio interfaces and a subset of nodes serve as gateways to the Internet. Most routing schemes have been designed to reduce routing costs by optimizing a single metric, e.g., hop count or interference ratio. However, when several such metrics are considered together, the complexity of the routing problem increases drastically. Thus, an efficient and adaptive routing scheme is needed that takes several metrics into account simultaneously and considers traffic congestion around the gateways. In this paper, we propose an adaptive scheme for routing traffic in WMNs, called RLBDR (Reinforcement Learning-based Distributed Routing), that (1) considers the critical areas around the gateways, where mesh routers are much more likely to become congested, and (2) adaptively learns an optimal routing policy taking into account multiple metrics, such as loss ratio, interference ratio, load at the gateways and end-to-end delay. Simulation results show that RLBDR can significantly improve overall network performance compared to schemes that use, as the path selection metric, either the Metric of Interference and Channel switching (MIC), Best Path to Best Gateway (BP2BG), Expected Transmission Count (ETX), nearest gateway (i.e., shortest path to gateway) or load at gateways.

Keywords: WMN; Interference; ETX; Reinforcement Learning; Routing.

1 Introduction

Over the last several years, wireless communication technologies have become increasingly important in our daily lives. These technologies have given rise to several types of wireless networks, such as WSNs (wireless sensor networks), VANETs (vehicular ad hoc networks) and WMNs (wireless mesh networks). Among these networks, WMNs [1, 9] have attracted significant research attention due to features that include dynamic self-organization, self-configuration, easy maintenance and low cost.
In case of link failures, the network is able to automatically establish alternative routes. A WMN can be seen as a multi-hop Mobile Ad hoc Network (MANET) with extended connectivity, with the difference that WMNs are characterized by a relatively static architecture and low mobility.

To increase WMN performance and capacity, nodes can be equipped with multiple radios operating on multiple channels; with a single channel, a node cannot transmit and receive simultaneously. Moreover, because wireless channel resources are scarce and radios support a limited number of channels, network performance is strongly affected by interference and congestion, which cause considerable packet losses and higher delays. Communication between two nodes in a multi-hop WMN can be relayed by several intermediate nodes, called Mesh Routers (MRs), whose role is to forward information from one node to another. Usually, MRs send traffic to a gateway (GW) that connects them to the Internet. If a WMN has a single GW, the gateway selection problem is trivial, since all upstream and downstream traffic must traverse that GW; however, this node then becomes a likely bottleneck and single point of failure in the network [2]. To mitigate this problem, multiple gateways are installed to distribute the load among them and hence improve performance. However, increasing the number of GWs does not necessarily increase the capacity of a WMN: network capacity is closely related to network connectivity and to the placement of GWs, issues that are out of the scope of this paper.

[Fig. 1 image: WMN topology in which each link is annotated with its interference ratio (IR), e.g., IR = 0.1 or IR = 0.2]

Fig. 1. WMN, multi-metric routing case

The main goal of a routing protocol is to find the best routes according to given requirements. These requirements vary according to client needs (e.g., bandwidth, delay and packet loss). In general, routing protocols are designed to optimize only one of these goals [3, 4]; however, there are several situations in WMNs where optimizing one metric degrades performance in terms of another. This situation is illustrated in Fig. 1, where nodes A, B, C, D and E are MRs, node S is the source, and nodes GW1 and GW2 are gateways. In this example, the routing metric is either the Interference Ratio (IR) or the load at the gateways (GW1 and GW2). Optimizing these metrics separately does not, in general, lead to an optimal solution. If S wants to send traffic to the Internet, the best path when using IR is S-D-A-GW2, since it has the smallest interference ratio. On the other hand, the best path when using load at the gateways is S-E-B-GW1, since it traverses the least loaded gateway. This shows that different metrics can yield different, possibly conflicting, results. Thus, several metrics must be considered simultaneously to optimize multi-hop routing in WMNs.

In WMNs, multiple users communicate with the Internet through gateways. In such an environment, depending on the network topology and the routing strategy, traffic concentration may be observed not only at certain gateways but also at mesh routers in the neighborhood of gateways, which become traffic "hot-spots" (e.g., nodes A and B in Fig. 1). This concentration may excessively increase congestion and interference on the wireless channels around the gateways. Thus, the neighborhood of gateways is a critical area that routing protocols for WMNs should take into consideration. More specifically, gateways and mesh routers in the critical area should be given particular attention during path computation to prevent drastic performance degradation in that area.

In this paper, we propose to use reinforcement learning, namely the Q-learning algorithm, to route traffic in multi-hop, multi-radio wireless mesh networks. In the proposed mechanism (RLBDR), reinforcement learning is used to dynamically update path costs and to select the next hop each time a packet is forwarded; learning agents in each mesh router learn the best link on which to forward an incoming packet by continuously exploiting what they have learned in the past while exploring new alternatives to discover better actions in the future. RLBDR consists of (a) exchanging advertisement messages (GWADV) between gateways and mesh routers; (b) measuring the quality of links, e.g., interference ratio and loss ratio; (c) selecting the best gateway; and (d) using a learning agent in each node to learn the best neighbor to which an incoming packet should be sent towards a given gateway.

The remainder of the paper is organized as follows.
In Section 2, we present related work on metrics and routing protocols in wireless mesh networks. In Section 3, we present the network model, notations and definitions used throughout the paper, and then describe RLBDR in detail. Simulation results are presented and discussed in Section 4. Section 5 concludes the paper.

2 Related work

In this paper, we propose a reinforcement learning-based routing scheme for WMNs that uses a novel routing metric. Thus, in this section, we present an overview of (a) routing metrics proposed in the context of WMNs and (b) learning techniques used in routing schemes.

2.1 Routing metrics for wireless mesh networks

Many routing schemes have been proposed in the literature for different types of wireless networks [5, 6, 7, 8]. Most of these schemes can be roughly divided into three categories: (1) proactive routing protocols (e.g., Optimized Link State Routing, OLSR [19]); (2) reactive routing protocols (e.g., Ad hoc On-demand Distance Vector, AODV [10]); and (3) hybrid protocols (e.g., Temporally-Ordered Routing Algorithm, TORA [11]).

Based on the aforementioned routing techniques, several routing metrics have emerged. In general, a routing metric is used by a routing protocol to select the path with the highest throughput, the lowest delay and/or the lowest packet loss ratio. Hop count is the simplest routing metric. However, it is a poor choice for WMNs: a path that is shorter in hop count may suffer heavier interference and higher packet loss than a longer one. ETX [3] estimates the number of transmissions and retransmissions needed to successfully transmit a frame on a link, while ETT [4] estimates the time a data frame needs to be successfully transmitted on a link. ETX is defined in Eq. (1):

$$\mathrm{ETX} = \frac{1}{d_f \times d_r} \qquad (1)$$

where $d_f$ and $d_r$ denote the forward and reverse delivery ratios on the link, respectively. The ETT metric is defined in Eq. (2):

$$\mathrm{ETT} = \mathrm{ETX} \times \frac{S}{B} \qquad (2)$$

where $S$ represents the packet size and $B$ the bandwidth of the link.
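As a rough illustration of Eqs. (1) and (2), and of the hop-count argument above, the Python sketch below computes per-link ETX and ETT and compares two paths. It assumes, as is common practice with ETX but not stated here, that a path's cost is the sum of its link costs; the `Link` type, the helper names, and all delivery ratios, bandwidths and packet sizes are hypothetical values chosen for illustration, not taken from the paper's simulations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical link record; field names are illustrative only."""
    d_f: float            # forward delivery ratio, 0 < d_f <= 1
    d_r: float            # reverse delivery ratio, 0 < d_r <= 1
    bandwidth_bps: float  # B in Eq. (2), bits per second


def etx(link: Link) -> float:
    """Eq. (1): expected number of transmissions on the link."""
    return 1.0 / (link.d_f * link.d_r)


def ett(link: Link, packet_size_bits: float) -> float:
    """Eq. (2): expected transmission time in seconds, ETX * S / B."""
    return etx(link) * packet_size_bits / link.bandwidth_bps


def path_cost(links: list[Link], link_metric) -> float:
    """Assumed additive path cost: sum of a per-link metric over the path."""
    return sum(link_metric(link) for link in links)


# Hypothetical paths: two lossy hops versus three cleaner hops.
short_path = [Link(0.5, 0.5, 1e6)] * 2
long_path = [Link(0.9, 0.9, 1e6)] * 3

print(len(short_path), len(long_path))       # hop count: 2 3
print(path_cost(short_path, etx),
      round(path_cost(long_path, etx), 2))   # summed ETX: 8.0 3.7

packet_bits = 1024 * 8  # assumed 1 KB packet (S in Eq. (2))
print(path_cost(short_path, lambda lk: ett(lk, packet_bits)))  # seconds
```

With these made-up numbers, hop count prefers the 2-hop path, whereas summed ETX prefers the 3-hop path (about 3.7 expected transmissions versus 8), which is the kind of disagreement the text describes.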

The key shortcoming of ETX and ETT is that they do not consider interference when selecting a path. Metrics built on ETX and/or ETT include Weighted Cumulative Expected Transmission Time (WCETT) [4], Metric of Interference and Channel switching (MIC) [12] and Best Path to Best Gateway (BP2BG) [13]. WCETT [4] was proposed as an extension of ETT that takes into account intra-flow interference, i.e., interference among hops of the same path that share a channel, and therefore favors channel diversity. For a given path, WCETT is defined as follows:

$$\mathrm{WCETT} = (1-\beta) \times \sum_{i=1}^{n} \mathrm{ETT}_i + \beta \times \max_{1 \le j \le k} X_j \qquad (3)$$

where 0
