
An Epidemiological Study of Information Dissemination in Mobile Networks

Cedric Westphal∗, Karim Seada†, Charles Perkins‡, Ryuji Wakikawa§

∗ DoCoMo Labs USA, † Nokia Research Center, ‡ WiChorus.com, § Toyota ITC/Keio University

Email: ∗ [email protected], † [email protected], ‡ [email protected], § [email protected]

Abstract: We present an analytical, epidemiological model for the overhead of information dissemination in mobile networks, which finds application in a range of settings, including resource discovery, information dissemination in sensor networks, location services and ad hoc network routing. We consider two principal information dissemination mechanisms: one in which nodes strictly relay information for other nodes, and one in which they also take advantage of the information already being disseminated to insert their own data to share. The latter mechanism is denoted by path accumulation. We show that the overhead scales as $n^{3/2}$ with path accumulation and as $n^{5/4}$ without path accumulation. With respect to the connection rate $\lambda$, both scale as $\sqrt{\lambda}$. We also identify the cut-off point for path accumulation as $\xi_c = 1 - 2/\sqrt{n}$. Below that threshold, path accumulation should be preferred, while above it, path accumulation does not provide substantial benefits. Our model proves extremely accurate against simulation data.

I. INTRODUCTION

Mobile networks render most of the information regarding the location of nodes, resources or services transient and ephemeral. As such, many types of information often need to be updated and disseminated between the nodes of a mobile network. For instance, a node would announce its services or available resources by flooding this information through the network or by sending it to a set of specific servers. Nodes interested in sharing their services or resources could piggyback on the initial announcement dissemination.

A similar mechanism of information dissemination occurs in sensor networks, where a large amount of gathered data needs to be delivered or accessed [1], [2]. There has been significant recent research on how to aggregate this data efficiently while preserving the network resources.

Another purpose for disseminating location information is to locate nodes across the network. For instance, the mapping of an ID to a location in a geographic protocol (e.g. geographic routing), or various location-based services in general (for instance [3]), require information to be disseminated throughout the network. The basic mechanism is for nodes to relay their current information so that other nodes can reach them when needed. A specific case of the dissemination of location information, which we will focus on below, is that of route discovery in ad hoc network protocols.

This research was sponsored in part by a grant from the Japan Society for Promotion of Science (JSPS).

While dissemination of information seems a natural target for the use of the mathematics of infectious disease [4], these tools have not been applied, to the best of our knowledge, to mobile networks. This work attempts to show the usefulness of these concepts in this context. We are interested in studying the information dissemination process using an epidemiological model. To do so, we define a metric: the efficiency of the information dissemination process. We consider two typical information dissemination processes, with a wide range of applications: a mechanism where each node floods its information asynchronously throughout the network, and a mechanism where nodes that receive information to disseminate from another node take advantage of the message to insert their own information to share (these will be defined formally below).

Our key contribution is to formulate a mathematical framework to study the overhead of control messages for information dissemination in mobile networks. We adopt a macroscopic, fluid-model approximation which holds for large networks. While such epidemiological techniques have been used for the dissemination of, for instance, viruses in peer-to-peer networks [5], our model is significantly different: it does not require the exchange of information to be between a node pair, but along a network path. Also, each node disseminates different information, instead of sharing the same virus. To the best of our knowledge, this is the first application of epidemiological analysis to this context.

The paper is organized as follows. We start by describing the information dissemination process in more detail in Section II. We will also describe the path accumulation process to disseminate information. We then take several incremental steps towards our overhead results, which we focus on the overhead of reactive ad hoc routing protocols.
First we consider the efficiency of the information dissemination process in a static network in Section III. In particular, we describe the embedded Markov chain where the state is the knowledge of the topology upon the event that information is being disseminated. We also show the relationship between the occurrence of such an event and time. The second step finds the steady state behavior of the network knowledge in a dynamic network. We consider networks where the topology knowledge evolves due to expiration of information in the cache in Section IV and networks where the topology evolves due to node mobility in Section V. Finally, we compute the overhead of our information dissemination process in Section VI. We will see that for two asymptotic behaviors (the size of the network, or the rate of connection requests), path accumulation offers little performance benefit in the limit. We provide some numerical validation of our analytical results in Section VII, based on the case of route discovery in ad hoc networks. We hasten to add that this special case is highly relevant in its own right, and is enough justification for the applicability of our model on its own. We end with related works in Section VIII and concluding remarks in Section IX.

II. INFORMATION DISSEMINATION MECHANISMS

We present here the two main information dissemination mechanisms that we will study in the remainder of this document: information dissemination without and with path accumulation.

Our first information dissemination mechanism is a simple asynchronous request-reply mechanism. When a node requires some information, it broadcasts its interest through the network. Each node which receives the piece of information stores it in a cache for a pre-set time-to-live period. If a node receives the same packet (as identified by the original sender and some sequence number) several times, it stops re-broadcasting the packet. We denote this mechanism by basic information dissemination.

The second mechanism is similar to the first, but adds the following modification: when a node receives a broadcast from another node, it can insert its own information to share. Since the packet has already traveled through the network, the information will not reach all the nodes. However, it might be useful to some nodes anyway. Since each node along the path of a packet inserts information, the packet accumulates more and more information, and we denote this mechanism by information dissemination with path accumulation.
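The two mechanisms can be sketched on a toy topology. The following is a minimal illustration and not the authors' simulator: the grid topology, the loss-free breadth-first flooding, and the names `grid_neighbors` and `flood` are all assumptions made here. Each cache entry is a pair (holder, target), meaning that the holder has learned a route to the target.

```python
from collections import deque

def grid_neighbors(k):
    """Adjacency of a k x k grid (hypothetical stand-in for a connected network)."""
    adj = {}
    for x in range(k):
        for y in range(k):
            adj[(x, y)] = [(x + dx, y + dy)
                           for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= x + dx < k and 0 <= y + dy < k]
    return adj

def flood(adj, source, path_accumulation):
    """Flood one request from `source`; return the set of (holder, target) entries cached."""
    learned = set()
    path_to = {source: [source]}      # nodes traversed by the first copy each node receives
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in path_to:          # duplicate copy: dropped, not re-broadcast
                continue
            path_to[v] = path_to[u] + [v]
            # Basic dissemination teaches v only about the source; with path
            # accumulation v also learns every node the packet went through.
            targets = path_to[u] if path_accumulation else [source]
            learned.update((v, t) for t in targets)
            queue.append(v)
    return learned
```

On a 5 x 5 grid flooded from a corner, `flood(..., path_accumulation=False)` creates one entry per reached node, while the path-accumulation variant creates one entry per hop travelled, so the gain per flood is the average hop distance to the source.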
In order to build some intuition, we note that the basic information dissemination corresponds to the route request mechanism used in ad hoc networks by AODV [6], where the information being disseminated is the route towards the source of the packet, and the time to live is the Active Route Time-out (ART). The second mechanism corresponds in the same context to the route request mechanism in DSR [7]: each node inserts its own address in the source route along the path followed by a route request, thus making use of the path accumulation feature. In both protocols, the route request (RREQ) is the mechanism to disseminate the information throughout the network. For clarity of explanation, we place ourselves in a routing protocol framework to derive our analytical model. However, the results are applicable to a wide range of applications as described above.

III. ROUTE DISCOVERY IN STATIC NETWORK

We consider a network with $n$ nodes, evenly distributed in a square area, with density such that all nodes are connected for the corresponding transmission range. Wherever $n$ varies in our scaling results, it is implied that the network size varies as well to keep the density constant (expanded network assumption). Due to our choice of topology, the average distance between two nodes will scale as $\sqrt{n}$ times some constant dependent on transmission range and density. Since both the node density and the coefficient in front of $\sqrt{n}$ in the average node distance are $O(1)$, we do not make them explicit in our scaling results.

We assume a perfect link layer: nodes are either within connection range, or not. In future work, we intend to model the variations of the link layer and the impact of link quality on our results. Since we focus on routing at the network layer, it is reasonable (and quite common) to abstract the underlying physical and link layers.

We assume that each node generates connection attempts to a destination chosen randomly and uniformly among the other nodes, according to a Poisson process with rate $\lambda$. These attempts in turn generate a route request if and only if the destination is not in the route table of the originating node.

Define a path as an ordered pair of nodes $(s, d)$. We say we know a path $(s, d)$ if the routing protocol is able to forward packets from $s$ to $d$ without issuing a route request. The number of paths is $n^2$ ¹ ².

We define by $t_x(i)$ the integer-indexed sequence of times at which route requests are issued by the routing protocol, where $x$ is $np$ for no path accumulation, $pa$ for path accumulation, or omitted when the context is clear. At time $t_x(i)$, the $i$th route request is issued. We set $t_x(0) = 0$. $t_x(i)$ is a Markov process, as time $t_x(i+1)$ depends only on $t_x(i)$, the state $\eta_x(i)$ defined next, and the Poisson processes generating connections at each node. However, $t_x(i), i \in \mathbb{N}$ is not a Poisson process: as more routes are known throughout the network, the probability that a connection attempt turns into a route request decreases.

Define by $\eta_x(t)$ the number of paths known by the routing protocol at time $t$ and by $\eta_x(i)$ this quantity at time $t_x(i)$.
This is the sum of the number of active paths cached in each node's route table. Finally, define by $\xi_x(\cdot) = \eta_x(\cdot)/n^2$ the efficiency of the route discovery process, that is, the fraction of paths discovered by the routing protocol normalized by the total number of paths.

We first consider a static model, by which we mean that the connectivity between two nodes does not change with time. We also assume that the Active Route Time-Out is set to $\infty$, which is consistent with (and optimal for) a static network. We will later relax both assumptions.

Our approach is to focus on some global quantities for the network, and to ignore the state at the node-level granularity. This is a macroscopic approach, and we do require that the network is large enough that the behavior of a few nodes does not affect the overall conclusions.

¹ We use route and path interchangeably. Our definition precludes having multiple different paths between two nodes, even though there might be several ways to route packets from $s$ to $d$.

² Technically, it is $n(n-1)$ but the difference is only $n$, which we can omit compared to $n^2$ for large values of $n$. We will make such simplifications several times without further discussion when some terms are negligible, and we trust that the reader will understand the omission of $o(n)$ in front of $n$.

A. Without Path Accumulation

Theorem 3.1: In a network with average node degree $d$, the number of known routes $\eta_{np}(i)$ at time $t_i$ and the efficiency $\xi_{np}(i)$ are given by:

$$\xi_{np}(i) = \frac{\eta_{np}(i)}{n^2} = 1 - \left(1 - \frac{d}{n}\right)\left(1 - \frac{1}{n}\right)^{i} \tag{1}$$

The proof is in Appendix. Theorem 3.1 provides the state at the transition point of the embedded RREQ process. In order to know the knowledge of the topology $\eta_{np}$ as a function of time instead of the RREQ index, we need to derive an expression for $t_{np}(i)$. Combining $t_{np}(i)$ with $\eta_{np}(i)$ will give a description of the route discovery process as a function of time.

Theorem 3.2: The route request process issues the $i$th route request at time $t_{np}(i)$, where $t_{np}(0) = 0$ and $t_{np}(i)$ follows the equation:

$$t_{np}(i) = \frac{n-1}{\lambda(n-d)}\left[\left(\frac{n}{n-1}\right)^{i} - 1\right] \tag{2}$$

The proof can be found in the Appendix. We are now able to combine Theorems 3.1 and 3.2 when we wish to obtain $\eta_{np}$ as a function of time. Taking an initial condition of 0 instead of $\xi(0) = nd/n^2 = d/n$, we can state the following result (we omit the straightforward derivation):

Theorem 3.3:

$$\xi_{np}(t) \sim \frac{\lambda t}{1 + \lambda t} \tag{3}$$

B. With Path Accumulation

The analysis of the route discovery process with path accumulation is not as simple as without path accumulation. There are several behaviors to observe. Recall that $\eta_{pa}$ is the number of routes discovered by the route discovery process using path accumulation in a static network with infinite active route time-out and $\xi_{pa} = \eta_{pa}/n^2$.

A first order approximation would modify the result of Theorem 3.1 thusly: each new RREQ is sent to the other $n$ nodes. However, it disseminates the information not only about the source's location, but about the intermediate points on the path as well. Define $\bar{h}(n)$ to be the average path length. Each RREQ discovers $n\bar{h}(n)$ paths, instead of the $n$ routes of AODV without path accumulation. $\bar{h}(n)$ is of the order of $\sqrt{n}$, with a multiplicative coefficient which depends on the topology, the transmission range and the density. In our simulations, we find that $\bar{h}(n) \sim \sqrt{n}/2$. This is the value we use in the sequel. Modifying accordingly the steps of Theorem 3.1 gives:

$$\eta_{pa}(i) = n^2 - (n^2 - dn)\left(1 - \frac{1}{2\sqrt{n}}\right)^{i} \tag{4}$$
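The closed forms (1) to (4) can be evaluated side by side to see how the per-RREQ index and the time description fit together. This is a small sketch under the formulas as read here; the function names are hypothetical, and $d/n$ is used as the initial efficiency.

```python
def xi_np(i, n, d):
    """Equation (1): efficiency after the i-th route request, no path accumulation."""
    return 1.0 - (1.0 - d / n) * (1.0 - 1.0 / n) ** i

def t_np(i, n, d, lam):
    """Equation (2): time at which the i-th route request is issued."""
    return (n - 1) / (lam * (n - d)) * ((n / (n - 1)) ** i - 1.0)

def xi_np_fluid(t, lam):
    """Equation (3): large-network approximation of the efficiency at time t."""
    return lam * t / (1.0 + lam * t)

def xi_pa_first_order(i, n, d):
    """Equation (4) divided by n^2: first-order estimate with path accumulation."""
    return 1.0 - (1.0 - d / n) * (1.0 - 1.0 / (2.0 * n ** 0.5)) ** i
```

For example, with `n = 1000`, `d = 8` and `lam = 1.0`, comparing `xi_np(i, n, d)` against `xi_np_fluid(t_np(i, n, d, lam), lam)` shows the two descriptions differing by less than one percentage point, the residual coming from the initial condition $d/n$ that Theorem 3.3 drops.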

As it turns out, this first order approximation does not capture the true behavior of $\eta_{pa}$: a RREQ is issued with probability $(n^2 - \eta_{pa})/n^2$ at the source and, as we have seen in the previous section, it reaches $n(n^2 - \eta_{pa})/n^2$ nodes. That is, the source's location is distributed throughout the network to a number of nodes $n$ scaled by a total factor $((n^2 - \eta)/n^2)^2$.

This scaling should be applied to intermediary points on the path of the route request as well. Namely, a RREQ, once it is generated, will bring in:

$$\eta_{pa}(i+1) = \eta_{pa}(i) + \frac{n^2 - \eta_{pa}(i)}{n^2}\left(n + \left(\frac{n^2 - \eta_{pa}(i)}{n^2}\right)^{2} n\,\bar{h}(n)\right)$$

This is an iterative construction for $\eta_{pa}(i)$, and we can state the following theorem.

Theorem 3.4: In a network with $n$ nodes, the efficiency $\xi_{pa}(i)$ at time $t_i$ is given by the following recurrence:

$$\xi_{pa}(i+1) = \xi_{pa}(i) + (1 - \xi_{pa}(i))\left(\frac{1}{n} + \frac{(1 - \xi_{pa}(i))^2}{2\sqrt{n}}\right) \tag{5}$$

We can identify two phases for the behavior of the route discovery process with path accumulation as more and more information is distributed throughout the network. The term $\frac{1}{n} + \frac{(1-\xi_{pa})^2}{2\sqrt{n}}$ has two components, which induce two regimes:

(i) When $\xi_{pa}$ is close to zero, that is, when very little is known about the network topology, the component $1/n$ will be small compared to $(1-\xi_{pa})^2/(2\sqrt{n})$. Further, we can approximate the dominant term by $1/(2\sqrt{n})$, and the behavior of the network will be that described by Equation (4). As this is the regime of an almost empty system, and as the approximation does not hold as the system gathers information, we denote this phase as the warm-up phase.

(ii) When $\xi_{pa}$ is close to one, the term $1/n$ becomes dominant and $(1-\xi_{pa})^2/(2\sqrt{n})$ negligible. The equation which drives the system can then be approximated by Equation (20). This means that the behavior of the system with path accumulation now follows the same equation as without path accumulation. Since this regime happens when $\xi_{pa} \to 1$, or $t \to \infty$, we denote this phase as the tail phase.

The cut-off value is the value $\xi_c$ for which the two terms are equal, namely $\xi_c = 1 - 2/\sqrt{n}$. For $\xi_{pa} < \xi_c$, path accumulation is beneficial. For $\xi_{pa} > \xi_c$, its performance will be similar to not using path accumulation, but with RREQs of larger size.

In the warm-up phase, path accumulation provides the most benefit. In the tail phase, on the other hand, the gains in terms of discovering the topology are equivalent to not using path accumulation. This means that, in the tail regime, the increasing size of the RREQ does not bring a tangible benefit over the steady size of an AODV RREQ. Finding the point at which this regime kicks in is essential to avoid burdening the network with unnecessary path accumulation.

To make explicit the relation between time and $\xi(\cdot)$, we can numerically solve $\xi(t)$ using Equations (5) and (6) as in the proof of Theorem 3.2:

$$t_{i+1} = t_i + \frac{1}{n\lambda(1 - \xi_{pa}(i))} \tag{6}$$
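The numerical solution mentioned above amounts to iterating the recurrence (5) together with the time step (6). A minimal sketch follows; the function name and the starting value $\xi(0) = d/n$ (neighbors known at the outset) are choices made for this illustration.

```python
def iterate_path_accumulation(n, lam, d, steps):
    """Iterate recurrence (5) and time step (6); return a list of (t_i, xi_pa(i))."""
    xi, t = d / n, 0.0
    history = [(t, xi)]
    for _ in range(steps):
        # Equation (6): waiting time until the next route request, using xi_pa(i).
        t += 1.0 / (n * lam * (1.0 - xi))
        # Equation (5): the two components of the efficiency increment.
        direct = 1.0 / n                                   # source's own route
        accumulated = (1.0 - xi) ** 2 / (2.0 * n ** 0.5)   # routes to intermediate nodes
        xi += (1.0 - xi) * (direct + accumulated)
        history.append((t, xi))
    return history
```

Printing `direct` and `accumulated` along the trajectory shows the accumulated term dominating early on and the direct term taking over as the efficiency approaches one, which is the warm-up and tail behavior described above.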

IV. STATIC NETWORK WITH FINITE ART

We now consider networks where routes are not permanent. We first consider that routes are kept in the route table only for a finite amount of time, denoted by Active Route Time-out (ART). We later consider that routes expire due to mobility in the network. The case of a static network with finite ART allows us to introduce the methods we will use in the mobility case, and this case is important in this regard.

A. Without Path Accumulation

We now relax the assumption that the ART is infinite, and set it to $\tau < \infty$. The system will have two distinct components: the positive route request component, which will create entries in the route tables at a given rate; and a negative time-out process which will flush routes out of the system. We investigate here the convergence of $\eta_{np}$ as a function of $\tau$ and $\lambda$.

Theorem 4.1: As $t$ goes to infinity, we have:

$$\xi^*_{np} = \frac{\eta^*_{np}}{n^2} = \lim_{t\to\infty}\frac{\eta_{np}(t)}{n^2} = \frac{\lambda\tau - \sqrt{\lambda\tau}}{\lambda\tau - 1} \tag{7}$$

for $\lambda\tau \neq 1$, and the value of $\eta^*_{np}$ for $\lambda\tau = 1$ is $n^2/2$ (with the corresponding $\xi^*_{np}$ equal to $1/2$).

The proof is in Appendix. We can verify from this expression that, in a static network, the longer the time-out $\tau$, the higher the value to which $\eta_{np}$ will converge, and the fewer route requests will be issued. Similarly, the higher the rate of request, the more routes are generated, and the higher the knowledge of the topology. We see that the convergence is only dependent on the product $\lambda\tau$, and is insensitive to the network size $n$.
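The steady state of Theorem 4.1 depends only on the product $\lambda\tau$, which makes it easy to tabulate. The sketch below evaluates Equation (7) as read here, with the special case at $\lambda\tau = 1$ handled explicitly; the function name is hypothetical. Dividing numerator and denominator by $\sqrt{\lambda\tau} - 1$ gives the equivalent form $\sqrt{\lambda\tau}/(1 + \sqrt{\lambda\tau})$, which is useful as a cross-check.

```python
import math

def xi_np_star(rho):
    """Equation (7) with rho = lambda * tau; the value at rho = 1 is 1/2."""
    if math.isclose(rho, 1.0):
        return 0.5
    return (rho - math.sqrt(rho)) / (rho - 1.0)
```

For instance, `xi_np_star(4.0)` evaluates to two thirds, and the value increases with `rho`, matching the observation that a longer time-out or a higher request rate raises the knowledge of the topology.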

B. With Path Accumulation

We now derive the corresponding results for the case of route requests with path accumulation. If the operating point of the steady state is either in the warm-up phase or in the tail phase, we can compute the limit of $\eta_{pa}$ as $t \to \infty$ analytically. For the tail phase, the result is the same as without path accumulation:

$$\eta^*_{pa} = n^2\,\frac{\lambda\tau - \sqrt{\lambda\tau}}{\lambda\tau - 1} \tag{8}$$

This is due to the fact that path accumulation in the tail phase does not bring any extra benefits: most of the topology is known, so there are fewer intermediate paths to discover. Therefore, $\eta_{pa}(i+1) - \eta_{pa}(i) \sim \eta_{np}(i+1) - \eta_{np}(i)$, and routes disappear at the same ART-controlled rate.

In the warm-up phase, on the other hand, the result is different. From (4), we see that:

$$\eta_{pa}(i+1) - \eta_{pa}(i) = \frac{n^2 - \eta_{pa}(i)}{2\sqrt{n}} \tag{9}$$

In the warm-up phase, connection attempts happen at rate $n\lambda$ and most become RREQs. Using the same steps as before, we can derive:

$$\lambda n\,\frac{(n^2 - \eta_{pa})^2}{2\sqrt{n}} = \frac{\eta_{pa}^2}{\tau} \tag{10}$$

This is a 2nd degree polynomial, which has only one root between 0 and $n^2$, which can be computed to be equal to the result in the next statement (we skip the straightforward yet tedious derivation in the interest of space):

Theorem 4.2: In the warm-up phase,

$$\xi^*_{pa} = \frac{\sqrt{\lambda\tau\sqrt{n}/2}}{1 + \sqrt{\lambda\tau\sqrt{n}/2}} \tag{11}$$

Remark 4.1: One should note that, unlike the steady state behavior for $\xi_{np}$, the limit of $\xi_{pa}$ in the warm-up phase depends on $n$ (it actually depends on the factor $\bar{h}(n) = \sqrt{n}/2$). $\xi_{pa} \to 1$ as $n \to \infty$. The larger the size of the network, the higher the efficiency of the route discovery process using path accumulation (at least initially, since, as $\xi_{pa}$ reaches $\xi_c$, the system would leave the warm-up phase).

When the convergence falls in between the two phases, we are not able to give a closed form expression for the limit of $\xi_{pa}$. To find its value, one needs to solve the following equation, deriving the rate of route request discovery using equation (5) and taking similar steps as for Theorem 4.1:

$$\lambda\tau\,(1 - \xi_{pa})^2\left(1 + \frac{\sqrt{n}}{2}(1 - \xi_{pa})^2\right) = \xi_{pa}^2 \tag{12}$$

This is a polynomial function of $\xi_{pa}$, thus finding the root between 0 and 1 is straightforward using numerical methods. Using $X = 1 - \xi_{pa}$, the equation to solve becomes:

$$\frac{\lambda\tau\sqrt{n}}{2}X^4 + (\lambda\tau - 1)X^2 + 2X - 1 = 0 \tag{13}$$

where the desired root is now the unique root between 0 and 1. We solve some examples in Section VII and compare the obtained values with the simulations.

V. INFORMATION DISSEMINATION IN A MOBILE NETWORK

We now turn to dynamic, mobile networks. Obviously, the fraction of known routes in the network will not be a monotonically increasing function which converges to 1: each route request will discover new routes, but previously known routes will have to be removed from the route table as they expire due to the mobility.

We assume the network size and density stay constant, and that the node mobility follows an independent and identically distributed motion process with an average velocity $v$ ³. We omit the effect of the ART in the route discovery process: we assume an omniscient point of view, where a route is kept in the route table as long as it is valid, and is removed as soon as it breaks.

In a dynamic network, the CDF for the path life $L$ is given by:

$$P(L < t) = 1 - e^{-\mu h v t / r} \tag{14}$$

where $h$ is the number of hops in the path, $\mu$ is the node density in the plane, $v$ is the average node velocity, and $r$ is the connectivity radius. Define by $c$ the product $\mu v/r$. $c$ is a constant independent of $n$.

³ We only require that the mobility model satisfies Equation (14), a condition satisfied by many mobility models, including the modified Random Waypoint.

We assume the parameter $c$ is known and fixed for now. We also make the assumption that the paths are independent; this is not an exact assumption, but we will show its validity and usefulness in the numerical evaluation of our results in Section VII. Define $\bar{h}(n)$ to be the average of $h$ for a network of $n$ nodes. Equation (14) has been evaluated empirically in [8], and a similar law has been derived analytically in [9], [10]. It has also been shown that replacing $h$ with its average $\bar{h}(n)$ in equation (14) models the global rate of link breaks in the network.
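Equation (14) says that the lifetime of an $h$-hop path is exponentially distributed with rate $c\,h$, so its mean is $1/(c\,h)$. The short sketch below samples such lifetimes to make the dependence on the hop count concrete; the function names, the seed, and the numerical values in the usage note are assumptions for illustration only.

```python
import math
import random

def path_lifetime_cdf(t, h, c):
    """Equation (14) with c = mu * v / r: probability that an h-hop path breaks before t."""
    return 1.0 - math.exp(-c * h * t)

def sample_lifetimes(h, c, count, seed=0):
    """Draw `count` independent path lifetimes, exponential with rate c * h."""
    rng = random.Random(seed)
    return [rng.expovariate(c * h) for _ in range(count)]
```

With `c = 0.1` and `h = 5`, the sample mean of `sample_lifetimes(5, 0.1, 20000)` is close to 2 time units, and doubling the hop count halves it, which is why longer paths in larger networks expire faster.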

A. Without Path Accumulation

Theorem 5.1: For a mobile network with density $\mu$, average node velocity $v$, and connection radius $r$ such that $c = \mu v/r$, the efficiency $\xi_{np}$ converges to:

$$\xi^*_{np} = \lim_{t\to\infty}\frac{\eta_{np}(t)}{n^2} = \frac{\lambda - \sqrt{\lambda c\,\bar{h}(n)}}{\lambda - c\,\bar{h}(n)} = \frac{\sqrt{\lambda}}{\sqrt{\lambda} + \sqrt{c\,\bar{h}(n)}} \tag{15}$$

The proof is in Appendix. If $c = 0$, $\xi^*_{np}$ will converge to 1. On the other hand, if there is mobility, $c > 0$, we can highlight the following consequence:

Theorem 5.2: If $c > 0$, the fraction of routes known in the network by caching the routes discovered using AODV behaves as $O(n^{-1/4})$ ⁴ as $n \to \infty$.

Proof: The proof of the Theorem is a direct consequence of Theorem 5.1 and of the fact that $\bar{h}(n) = O(\sqrt{n})$.

A consequence of the theorem needs to be highlighted: the knowledge of the network topology goes to zero as the network grows in the presence of any mobility.

A practical consequence of this is as follows: caching route information in the network outside of the route being set up by the current RREQ brings a diminishing return as the size of the network grows. AODV is not efficient at disseminating information, so that, for large networks, it might be a better use of resources not to attempt to cache routes outside of the current transaction (that is, on the path of the route request-route reply exchange).

⁴ We write that $f(x) = O(g(x))$ if $\lim_{x\to\infty} f(x)/g(x)$ converges to a constant $K \neq 0$ and $f(x) = o(g(x))$ if $K = 0$.

B. With Path Accumulation

Theorem 5.3: In a mobile network with parameter $c = \mu v/r$ which uses path accumulation, the efficiency of the routing protocol converges to $\xi^*_{pa}$, which is defined as $1 - x$, where $x$ is the root of the polynomial:

$$\frac{\lambda}{c}X^4 + \left(\frac{2\lambda}{c\sqrt{n}} - 1\right)X^2 + 2X - 1 \tag{16}$$

which lies between 0 and 1.

Proof: One can write a balance equation similar to equation (12), but using the right hand side of equation (24). The steps leading to equation (16) from the balance equation are straightforward.

This gives a numerical method to estimate how much knowledge of the network topology is acquired through the route discovery process. As a sanity check, one can verify that $\xi_{pa} = 1$ corresponds to $x = 0$, which is indeed a root of the polynomial when $c = 0$. One can retrieve the static case with infinite ART this way.

We can solve analytically the limit of $\eta_{pa}$ in the case of mobility for the warm-up and tail phases. The tail phase is equivalent to AODV above. However, we have seen that for AODV, as soon as there is mobility, the limit for $\xi_{np}$ goes to 0 as $n$ gets large. This means that the tail phase, for which $\xi$ is high, is not reachable. For small $\xi_{pa}$, the warm-up behavior is a close approximation. However, the warm-up phase is actually an upper bound on $\xi$, as the efficiency increment in the warm-up phase is $\frac{1}{2\sqrt{n}}$, while it is actually $\frac{(1-\xi)^2}{2\sqrt{n}} + o(1/\sqrt{n})$ in our model. We can consider the asymptotic behavior of $\xi_{pa}$:

Theorem 5.4: For a network of size $n$ with a mobility parameter $c > 0$ and using path accumulation, the limit $\xi^*_{pa}$ is a strictly positive constant as $n \to \infty$.

Proof: $\xi^*_{pa}$ is one minus the root between 0 and 1 of the polynomial $\frac{\lambda}{c}X^4 + \left(\frac{2\lambda}{c\sqrt{n}} - 1\right)X^2 + 2X - 1$. As $n \to \infty$, the polynomial becomes $\frac{\lambda}{c}X^4 - X^2 + 2X - 1$. Since this polynomial equals $-1$ at 0, and $\lambda/c$ at 1, the root $x$ between 0 and 1 is strictly less than 1, and thus $\xi^*_{pa} = 1 - x$ is strictly positive.

Theorem 5.4 shows that the asymptotic behavior for $\xi_{pa}$ is drastically different than for $\xi_{np}$ as the size of the network grows large, in the presence of mobility: while the amount of useful information vanishes for a protocol without path accumulation, it stays constant for a protocol with path accumulation ⁵.

Using path accumulation in a large network entails creating route requests which become larger and larger, as the size of the RREQ increases as $\sqrt{n}$. Since the efficiency of the routing protocols should be considered with respect to the protocol overhead, we now turn our attention to this performance measure.

⁵ Other protocols such as hierarchical routing [11] or fisheye [12] would of course scale differently, but the study of these protocols is beyond the scope of this paper.

VI. OVERHEAD OF REACTIVE ROUTING PROTOCOLS

We can now make use of the results we have derived thus far to compute the overhead of the routing protocols under consideration. Please note that we consider here exclusively the overhead as the amount of bandwidth used by the control messages (as approximated by the RREQs, since the RREPs are negligible). For instance, we do not consider the gain in contention achieved by reducing the number of access attempts to the MAC layer. Also, we do not consider the repair of broken routes, and the overhead created by sending route error messages and the route recovery mechanism. Since we consider the overhead of control messages at route set-up only, our results provide a lower bound on the actual overhead.
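Before the overhead itself, the steady-state efficiencies it relies on can be evaluated numerically. The polynomial of Theorem 5.3 and the one in Equation (13) share the form $aX^4 + (\rho - 1)X^2 + 2X - 1$, with $a = \lambda/c$, $\rho = 2\lambda/(c\sqrt{n})$ under mobility and $a = \lambda\tau\sqrt{n}/2$, $\rho = \lambda\tau$ for a finite ART. The following sketch finds the root in $(0, 1)$ by bisection; the function names are hypothetical, bisection is a choice made here rather than the authors' stated method, and $\bar{h}(n) = \sqrt{n}/2$ is taken from the simulations quoted earlier.

```python
import math

def quartic_root(a, rho, tol=1e-12):
    """Root in (0, 1) of a*X^4 + (rho - 1)*X^2 + 2*X - 1, found by bisection."""
    def p(x):
        return a * x ** 4 + (rho - 1.0) * x ** 2 + 2.0 * x - 1.0
    lo, hi = 0.0, 1.0          # p(0) = -1 < 0 and p(1) = a + rho > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def xi_pa_mobile(lam, c, n):
    """Theorem 5.3: one minus the root of polynomial (16)."""
    return 1.0 - quartic_root(lam / c, 2.0 * lam / (c * math.sqrt(n)))

def xi_np_mobile(lam, c, n):
    """Equation (15), assuming h(n) = sqrt(n) / 2."""
    ch = c * math.sqrt(n) / 2.0
    return math.sqrt(lam) / (math.sqrt(lam) + math.sqrt(ch))
```

Evaluating both functions for growing `n` at fixed `lam` and `c` illustrates the contrast between Theorems 5.2 and 5.4: `xi_np_mobile` decays like $n^{-1/4}$ while `xi_pa_mobile` levels off at a positive value.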

A. Overhead

Knowing $\xi$ allows us to compute the overhead of a reactive routing protocol. Since most of the overhead resides in the route discovery process, computing the overhead is akin to computing the rate of generating route requests times the size of a route request.

Theorem 6.1: The overhead $\Xi_y$ of the routing protocol $y$, where $y$ is either $pa$ for path accumulation or $np$ otherwise, is equal to:

$$\Xi_y = \lambda_a\left(\frac{1}{\xi^*_y} - 1\right)B_y \tag{17}$$

where $\lambda_a = n\lambda$ is the aggregated rate of request in the network, $\xi^*_y$ is the limit of $\xi_y(t)$ computed using Theorem 5.1 if $y = np$ or 5.3 if $y = pa$, and $B_y$ is the size of the route request. $B_y = B$ if $y = np$ and $B_y = B\sqrt{n}/4$ if $y = pa$.

Proof: We know the rate of route request generation: it is $n\lambda\,\frac{1 - \xi_y}{\xi_y}$, which converges to $\lambda_a\left(\frac{1}{\xi^*_y} - 1\right)$. The packet size is trivially $B$ without path accumulation. It is $B\bar{h}(n)/2$ with path accumulation, since it grows by $B$ at each hop ⁶.

B. Scaling of the Overhead

We can use our overhead formula to compare route discovery with and without path accumulation in two scaling dimensions: as the size $n$ of the network grows, and as the request rate $\lambda$ grows. For the size of the network, we have:

Theorem 6.2: The overhead for route discovery with and without path accumulation grows as $n \to \infty$:

$$\Xi_{np}(n) = O(n^{5/4}) \quad\text{and}\quad \Xi_{pa}(n) = O(n^{3/2}) \tag{18}$$

Proof: $\Xi_{np} = n\lambda\left(\frac{1}{\xi_{np}} - 1\right)B$, and $\frac{1}{\xi_{np}} = O(n^{1/4})$ from Theorem 5.2. $\Xi_{pa} = n\lambda\left(\frac{1}{\xi_{pa}} - 1\right)\frac{B\sqrt{n}}{4}$, where $\left(\frac{1}{\xi_{pa}} - 1\right)$ goes to a constant per Theorem 5.4.

In terms of bandwidth, as the network size scales, the overhead of path accumulation is not compensated by the benefit of path accumulation: even though the protocol manages to disseminate some information throughout the network to reuse for future connections, the benefit of this re-use does not reduce the bandwidth enough to make up for the increased bandwidth of the control messages.

For the route request intensity, we can derive the following scaling law:

Theorem 6.3: The overhead for route discovery with and without path accumulation grows as $\lambda \to \infty$:

$$\Xi_{np}(\lambda) \sim \sqrt{\frac{c\lambda}{2}}\,n^{5/4}B \quad\text{and}\quad \Xi_{pa}(\lambda) \sim \sqrt{\frac{c\lambda}{2}}\,n^{7/4}B \tag{19}$$

The proof is in Appendix. While the scaling behavior for route discovery with and without path accumulation is the same in terms of $\lambda$, path accumulation increases the overhead by a $\sqrt{n}$ factor as $\lambda$ increases.

Note: the route discovery process is driven by a parameter which is $\rho = \lambda\tau$ in the static case with finite ART, or $\rho = \lambda/c$ in the mobility case. This means that the scaling behavior we wrote up for $\lambda$ is actually true for $\rho$. For instance, increasing $\tau$ and keeping $\lambda$ constant is equivalent in terms of overhead to increasing $\lambda$ and keeping $\tau$ constant.

We compare the behavior of the overhead as a function of $\lambda$ in Figure 1, with a network of 100 nodes and a mobility constant $c = 1$. The efficiency y-axis is on the right hand side, while the overhead (measured in $B$ units) is on the left hand side.

Fig. 1. Overhead Ξ and Efficiency ξ as a function of λ

⁶ The model could accommodate a finer description of the RREQ size as $B_c + kB_h$ after $k$ hops, where $B_c$ is a constant overhead, and $B_h$ is the incremental packet size change at each hop.
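The scaling laws in the network size can be checked by evaluating the overhead formula of Theorem 6.1 directly and measuring the slope on a log-log scale. The sketch below is an illustration under stated assumptions: $\bar{h}(n) = \sqrt{n}/2$, a route request size of $B\sqrt{n}/4$ with path accumulation, a bisection solver for the polynomial of Theorem 5.3, and hypothetical function names. Without path accumulation, Equation (15) gives $1/\xi^*_{np} - 1 = \sqrt{c\,\bar{h}(n)/\lambda}$, which is used as a shortcut.

```python
import math

def overhead_np(lam, c, n, B=1.0):
    """Overhead without path accumulation: n*lam*(1/xi - 1)*B, with xi from (15)."""
    inv_xi_minus_1 = math.sqrt(c * math.sqrt(n) / (2.0 * lam))
    return n * lam * inv_xi_minus_1 * B

def overhead_pa(lam, c, n, B=1.0):
    """Overhead with path accumulation, with xi = 1 - x and x the root of (16)."""
    a, rho = lam / c, 2.0 * lam / (c * math.sqrt(n))
    lo, hi = 0.0, 1.0
    for _ in range(200):                      # bisection on the root in (0, 1)
        mid = 0.5 * (lo + hi)
        if a * mid ** 4 + (rho - 1.0) * mid ** 2 + 2.0 * mid - 1.0 < 0.0:
            lo = mid
        else:
            hi = mid
    xi = 1.0 - 0.5 * (lo + hi)
    return n * lam * (1.0 / xi - 1.0) * B * math.sqrt(n) / 4.0

def loglog_slope(f, n1, n2):
    """Empirical scaling exponent of f between network sizes n1 and n2."""
    return math.log(f(n2) / f(n1)) / math.log(n2 / n1)
```

Calling `loglog_slope(lambda n: overhead_np(1.0, 1.0, n), 1e8, 1e10)` and the same with `overhead_pa` gives exponents close to 5/4 and 3/2 respectively, in line with Theorem 6.2.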

VII. NUMERICAL VALIDATION

In order to validate our analytical models, we conducted simulations of the model defined by equation (1) for AODV and equation (5) for DSR (we actually use AODV-PA [13] in the simulations in order to isolate the information dissemination from the other parameters of the protocols). We ran NS-2 simulations of static networks of 100 and 1,000 nodes at a constant density of around 8 neighbors per node in order to count the number of paths discovered at each RREQ and compare the simulation results to the analysis. Each node sends exactly one packet to a random destination. If a route entry to the destination already exists, no RREQ is generated.

We first consider the static network case. The top of Figure 2 presents the relationship between the theoretical results and the simulations for a 100-node network, while the bottom graph shows results for 1,000 nodes. The analysis matches the simulation perfectly for AODV and very well for AODV-PA. For the path accumulation case, we can confirm the existence of the two phases: the warm-up, where our analysis finds the tangent at the origin, and the tail, where it finds the asymptotic behavior towards ∞.

Fig. 2. Static case, ART = ∞, 100 and 1,000 nodes network. (Top panel: "Route Discovery Efficiency for a 100 Nodes Network"; bottom panel: "Route Discovery Efficiency for a 1,000 Nodes Network". x-axis: RREQ; y-axis: Fraction of Discovered Topology. Curves: AODV Simulation, AODV Analytical Model, AODV-PA Simulation, AODV-PA Analytical Model, Warm-Up Phase Asymptote, Tail Phase Asymptote.)

Figure 3 checks that the behavior of $\xi_{np}$ and $\xi_{pa}$ is density-independent, at least to first order. We plot different simulated values for both $\xi_{np}$ and $\xi_{pa}$, and we can see that, aside from an offset at the origin due to the number of neighbors known through Hello messages, the behavior is roughly independent of the network density.

Fig. 3. ξ as a function of the node density, simulation and analysis. (Plot title: "Efficiency ξ for Different Densities". x-axis: RREQ, 0 to 96; y-axis: Efficiency (ξ). Curves: ξ-NPA Analysis, ξ-NPA Density 10, 15, 20 and 25, ξ-PA Analysis, ξ-PA Density 10, 15, 20 and 25.)

When the ART is finite, the knowledge of the topology described by $\xi_{np}$ is plotted in Figure 4 (top). Each curve in the figure converges to an asymptote. The curves correspond to ARTs ranging from 1s to 25s and are ordered on the graph by increasing ART: the longer the ART, the higher the limit of $\xi_{np}$. We also include straight lines on the graph, which correspond to the values obtained from the formula given by Theorem 4.1. Note that the simulation converges to values that are closely predicted by the theory.

Fig. 4. Percentage of the Topology that is Discovered by AODV (top) and AODV-PA (bottom) for different ART values. (Panel titles: "Efficiency of the Route Discovery Process for AODV" and "Efficiency of the Route Discovery Process for AODV-PA". x-axis: RREQ; y-axis: Fraction of Discovered Topology. Curves: ART = 1s, 5s, 10s, 15s, 20s, 25s and ∞.)
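The limits that the finite-ART curves approach can be explored from the balance equation (23) of the Appendix, $\lambda (1 - \xi_{np})^2 / \xi_{np} = \xi_{np} / \tau$. Taking the route lifetime τ to be the ART (our reading of the proof), its only root in (0, 1) is $\xi^* = \sqrt{\lambda\tau} / (1 + \sqrt{\lambda\tau})$. The sketch below evaluates that root for the ART values of Figure 4. The rate λ = 0.01 is a placeholder, since the simulated connection rate is not restated in this section, and the function name is ours.

```python
import math


def xi_np_steady_state(lam, tau):
    """Root in (0, 1) of lam * tau * (1 - xi)^2 = xi^2, i.e. equation (23)."""
    s = math.sqrt(lam * tau)
    return s / (1.0 + s)


if __name__ == "__main__":
    lam = 0.01  # placeholder per-node connection rate, not from the paper
    for art in (1, 5, 10, 15, 20, 25):
        print(f"ART={art:2d}s  xi*={xi_np_steady_state(lam, art):.3f}")
```

The root increases with τ, which agrees with the ordering of the curves: the longer the ART, the higher the limit.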

Figure 4 (bottom) describes the same quantities for AODV-PA instead of AODV. The curves correspond to ART values of 1s, 5s, 10s, 15s, 20s, 25s (the limit of $\xi_{pa}$ increases with the ART) and ∞ (the curve that eventually increases to 1). In the same order, one can find the lines corresponding to the solutions of equation (13). There is close agreement between the values computed using equation (13) and the asymptotic behavior of $\xi_{pa}$.

TABLE I
EFFICIENCY OF AODV ROUTE DISCOVERY FOR DIFFERENT VELOCITIES

Velocity (m/s)    Estimation    Simulation
0                 100%          77.9%
5                 8.82%         8.82%
10                6.42%         7.52%
15                5.29%         7.46%
20                4.62%         4.34%
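One way to explore the Estimation column is to solve the balance equation from the proof of Theorem 5.1, $\lambda (1 - \xi_{np})^2 = \xi_{np}^2\, c\, \bar{h}(n)$, whose root in (0, 1] is $\xi_{np} = 1 / (1 + \sqrt{c\,\bar{h}(n)/\lambda})$. The sketch below assumes, as our own simplification, that the term $c\,\bar{h}(n)/\lambda$ grows linearly with velocity, and it calibrates the single proportionality constant k on the 5 m/s entry of Table I. The function names and the linear-in-velocity assumption are ours, not the paper's.

```python
import math


def xi_np_mobile(velocity, k):
    """Root of lam*(1 - xi)^2 = xi^2 * c*h, assuming c*h/lam = k * velocity."""
    return 1.0 / (1.0 + math.sqrt(k * velocity))


def calibrate_k(velocity, xi_target):
    """Choose k so that xi_np_mobile(velocity, k) equals xi_target."""
    return (1.0 / xi_target - 1.0) ** 2 / velocity


if __name__ == "__main__":
    k = calibrate_k(5.0, 0.0882)  # match the simulated value at 5 m/s (Table I)
    for v in (0.0, 5.0, 10.0, 15.0, 20.0):
        print(f"{v:4.0f} m/s: {100.0 * xi_np_mobile(v, k):.2f}%")
```

Under this assumption the computed values stay close to the Estimation column at 10, 15 and 20 m/s, which suggests (but does not prove) that the calibration is of this form.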

We compare the simulated limit of $\xi_{np}$ with our analysis in the presence of mobility. Since routes in the route table might not be valid, we have to correct the measured number of routes in all route tables by removing the routes that have expired. We use a sampling methodology to estimate the fraction of routes that are counted but invalid. We count three types of packets: $c_1$, packets that are delivered after issuing a RREQ; $c_2$, packets that are delivered using a cached route; and $c_3$, route errors. We obtain an estimate of the ratio of valid routes over all routes by computing $c_2/(c_2 + c_3)$. This method, while convenient, ignores route errors due to causes other than invalid routes (for instance, a packet dropped on a valid route due to interference), so we are only aiming for an approximation. The results are presented in Table I. In order to estimate the physical parameters of the network accurately, we set the parameters in our model so that the value for 5 m/s equals the simulated value. We want to check that, once properly calibrated, our model gives a reasonably accurate depiction of the knowledge of the topology.

VIII. RELATED WORK

The overhead of location dissemination in ad hoc routing has been extensively studied via simulations [14], [15], [16], [17], [18]. Our results are analytical in nature and apply to the performance of reactive protocols, while other recent results [19], [20], [21] consider the overhead of proactive protocols. Analytical models have been proposed in the past [22], [23], [24], but they do not consider cached data and the impact of route re-use on the overhead. Moreover, [22], [23] focus on a Manhattan grid topology. Our work uses a generic topology and explicitly takes into account cached routes. Hierarchical protocols [11], [12] organize the route discovery in a more scalable manner, which would lead to a lower overhead.
Some hierarchical networks build a first tier composed of cluster heads, and our results would apply to the dissemination of route information over the cluster head backbone. Doing away with the route discovery process altogether has been considered in [25]. However, it is replaced by the proactive maintenance of a routing overlay, which also comes at a cost. A study of this overhead would be interesting to compare to our results, and it is in our research plans.

IX. CONCLUSION

We have presented an analytical model of the efficiency of the information dissemination process in mobile ad hoc networks, which computes the fraction of paths known in the network as a function of time, both in a static model and in a mobile scenario. Our analytical work used fluid approximations of a large-scale ad hoc network to obtain a macroscopic description of the system.

In a static network, we pointed out two regimes for route discovery using path accumulation: one in which it outperforms route discovery without path accumulation by a great margin, and one in which it is equivalent to route discovery without path accumulation. In this latter regime, path accumulation does not provide any extra benefit in finding routes, but uses up a lot more bandwidth, wasting capacity. We identified the cut-off value $\xi_c = 1 - 2/\sqrt{n}$.

We used our model to compute the overhead of the route discovery process. We computed two asymptotes of the overhead with and without path accumulation. As $n \to \infty$, the overhead scales as $n^{5/4}$ without path accumulation and $n^{3/2}$ with path accumulation. Path accumulation is detrimental to the overhead in this scaling regime. As $\lambda \to \infty$, both scale as $\sqrt{\lambda}$, but with a higher multiplying coefficient for path accumulation. In both cases, path accumulation does not provide a benefit in the limit (path accumulation is of course beneficial for smaller $n$ and smaller $\lambda$, since it will decrease $\xi_{pa}$ below $\xi_c$). We provided a numerical evaluation which very closely supports our analytical model.

REFERENCES

[1] T. Abdelzaher et al., “Mobiscopes for human spaces,” IEEE Pervasive Computing - Mobile and Ubiquitous Systems, vol. 6, no. 2, April 2007.
[2] J. Heidemann, F. Silva, and D. Estrin, “Matching data dissemination algorithms to application requirements,” in Proc. of the ACM SenSys, 2003.
[3] S. M. Das, H. Pucha, and Y. C. Hu, “Performance comparison of scalable location services for geographic ad hoc routing,” in Proc. of IEEE INFOCOM, March 2005.
[4] H. Hethcote, “The mathematics of infectious diseases,” SIAM Review, vol. 42, no. 4, pp. 599–653, December 2000.
[5] R. Thommes and M. Coates, “Epidemiological modelling of peer-to-peer viruses and pollution,” in Proc. of IEEE Infocom, 2006.
[6] C. Perkins and E. M. Royer, “Ad hoc on-demand distance vector routing,” in Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, New Orleans, LA, February 1999.
[7] D. Johnson, D. Maltz, and Y.-C. Hu, “The dynamic source routing protocol for mobile ad hoc networks (DSR),” IETF MANET working group, draft-ietf-manet-dsr-10.txt, work in progress, July 2004.
[8] F. Bai, N. Sadagopan, B. Krishnamachari, and A. Helmy, “Modeling path duration distributions in MANETs and their impact on routing performance,” IEEE Journal on Selected Areas of Communications (JSAC), vol. 22, no. 7, pp. 1357–1373, September 2004.
[9] Y. Han, R. La, A. Makowski, and S. Lee, “Distribution of path durations in mobile ad hoc networks - Palm’s theorem to the rescue,” Computer Networks (Special Issue on Network Modeling and Simulation), vol. 50, no. 12, pp. 1887–1900, August 2006.
[10] R. La and Y. Han, “Distribution of path durations in mobile ad-hoc networks and path selection,” IEEE/ACM Transactions on Networking, vol. 15, no. 5, pp. 993–1006, October 2007.
[11] C.-C. Chiang, G. Pei, M. Gerla, and T.-W. Chen, “Scalable routing strategies for ad hoc wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 17, pp. 1369–79, 1999.
[12] G. Pei, M. Gerla, and T.-W. Chen, “Fisheye state routing: A routing scheme for ad hoc wireless networks,” in Proc. ICC, New-Orleans, USA, 2000.
[13] S. Gwalani, E. M. Belding-Royer, and C. Perkins, “AODV-PA: AODV with path accumulation,” in Proc. of IEEE ICC’03.
[14] J. Broch, D. A. Maltz, D. B. Johnson, Y.-C. Hu, and J. Jetcheva, “A performance comparison of multi-hop wireless ad hoc network routing protocols,” in Proc. of ACM MobiCom’98, Dallas, TX, October 1998.
[15] S. Das, R. Castañeda, J. Yan, and R. Sengupta, “Comparative performance evaluation of routing protocols for mobile ad hoc networks,” in Proc. of IC3N, 1998, pp. 153–61.
[16] P. Johansson, T. Larsson, N. Hedman, B. Mielczarek, and M. Degermark, “Scenario-based performance analysis of routing protocols for mobile ad-hoc networks,” in Proc. ACM/IEEE Mobicom’99, pp. 195–206.
[17] E. Borgia, “Experimental evaluation of ad hoc routing protocols,” in Proc. of IEEE PerCom workshop, March 2005.
[18] S. R. Das, C. E. Perkins, E. M. Royer, and M. K. Marina, “Performance comparison of two on-demand routing protocols for ad hoc networks,” IEEE Personal Communications Magazine, pp. 16–28, February 2001.
[19] X. Wu, H. Sadjadpour, and J. Garcia-Luna-Aceves, “Routing overhead as a function of node mobility: Modeling framework and implications on proactive routing,” in Proc. IEEE MASS’07, Pisa, Italy, October 2007.
[20] N. Zhou and A. A. Abouzeid, “Information theoretic analysis of proactive routing overhead in mobile ad hoc networks,” to appear in IEEE Transactions on Information Theory.
[21] D. Wang and A. A. Abouzeid, “Link state routing overhead in mobile ad hoc networks: A rate-distortion formulation,” in Proc. of Infocom 2008.
[22] N. Zhou, H. Wu, and A. A. Abouzeid, “Reactive routing overhead in networks with unreliable nodes,” in Proc. of ACM Mobicom, 2003.
[23] N. Zhou, H. Wu, and A. A. Abouzeid, “The impact of traffic patterns on the overhead of reactive routing protocols,” IEEE JSAC, vol. 23, no. 3, pp. 547–560, March 2005.
[24] M. Naserian, K. Tepe, and M. Tarique, “Routing overhead analysis for reactive routing protocols in wireless ad hoc networks,” in Proc. WiMob 2005.
[25] M. Caesar, M. Castro, E. Nightingale, G. O’Shea, and A. Rowstron, “Virtual ring routing: Network routing inspired by DHTs,” in Proc. of ACM SIGCOMM’06, Pisa, Italy, pp. 351–362.

APPENDIX

Proof of Theorem 3.1: We proceed by induction. $\eta_{np}(0) = nd$, as we start with a route table containing only the information about the neighbors of each node, and the average degree is $d$. Further,

$$\eta_{np}(i+1) = \eta_{np}(i) + \frac{n^2 - \eta_{np}(i)}{n^2}\, n \qquad (20)$$

since a route request will reach all $n$ nodes with probability $\frac{n^2 - \eta_{np}(i)}{n^2}$. The route discovery process will actually discover more than $n$ paths, as it will also set up the path to the destination using the route reply. However, since the number of paths discovered by the RREP is of the order of the average path length $\bar{h}(n) = O(\sqrt{n})$, we can neglect it. Equation (20) can be rewritten as $\eta_{np}(i+1) = n + \eta_{np}(i)\left(1 - \frac{1}{n}\right)$, which implies $\eta_{np}(i) = n^2 - (n^2 - dn)\left(1 - \frac{1}{n}\right)^i$.
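The recursion (20) and its closed form are easy to check numerically. The sketch below iterates (20) from $\eta_{np}(0) = nd$ and compares the result with $n^2 - (n^2 - dn)(1 - 1/n)^i$. The values n = 100 and d = 8 mirror the simulation setup of Section VII, and the function names are ours.

```python
def eta_np_recursive(n, d, steps):
    """Iterate equation (20): each RREQ adds n * (n^2 - eta) / n^2 new routes."""
    eta = float(n * d)  # eta_np(0) = n*d: only neighbors are known initially
    values = [eta]
    for _ in range(steps):
        eta = eta + (n * n - eta) / (n * n) * n
        values.append(eta)
    return values


def eta_np_closed_form(n, d, i):
    """Closed form: n^2 - (n^2 - d*n) * (1 - 1/n)^i."""
    return n * n - (n * n - d * n) * (1.0 - 1.0 / n) ** i


if __name__ == "__main__":
    n, d = 100, 8
    recursive = eta_np_recursive(n, d, 300)
    for i in (0, 50, 100, 300):
        xi = recursive[i] / (n * n)  # fraction of the n^2 routes that are known
        print(i, round(recursive[i], 2), round(eta_np_closed_form(n, d, i), 2), round(xi, 3))
```

The two columns agree up to floating-point error, and the known fraction $\eta_{np}(i)/n^2$ climbs from $d/n$ towards 1.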

Proof of Theorem 3.2: The packet connection attempts are generated at each node according to a Poisson process with rate $\lambda$, which implies that the overall connection attempt process is another Poisson process with rate $n\lambda$. The time $t_{i+1}$ satisfies $t_{i+1} = t_i + N_i/(n\lambda)$, where $N_i$ is the number of connection attempts strictly after the $i$th RREQ up to the $(i+1)$st route request. Between the two route requests, the knowledge about the topology stays the same, so $N_i$ represents the number of times a binomial random variable with parameter $\eta_{np}(i)/n^2$ hits one successively before hitting 0 (if the binomial r.v. hits one, then no new RREQ is generated and the route is known at the source; when it hits 0, then a new RREQ is generated). We approximate $N_i$ with its average value $\eta_{np}(i)/(n^2 - \eta_{np}(i)) + 1$, which yields $t_{i+1} = t_i + \frac{1}{n\lambda}\,\frac{n^2}{n^2 - \eta_{np}(i)}$, or

$$t_i = \frac{n}{\lambda} \sum_{j=0}^{i-1} \frac{1}{n^2 - \eta_{np}(j)} \qquad (21)$$

Plugging (1) into equation (21), we obtain:

$$t_i = \frac{n}{\lambda} \sum_{j=0}^{i-1} \frac{1}{(n^2 - dn)\left(1 - \frac{1}{n}\right)^j} \qquad (22)$$

which reduces to the Theorem’s stated result.
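To see how (21) and (22) behave, the sketch below accumulates the expected RREQ times $t_i$ term by term, using $n^2 - \eta_{np}(j) = (n^2 - dn)(1 - 1/n)^j$, and compares them with the geometric-series sum of (22), $\frac{n-1}{\lambda (n-d)}\left[(1 - 1/n)^{-i} - 1\right]$. That closed form is our own simplification of (22), not a quotation of Theorem 3.2, and the values of n, d and λ are illustrative.

```python
def rreq_times(n, d, lam, count):
    """Expected time of the i-th RREQ for i = 0..count, accumulated from (21)."""
    r = 1.0 - 1.0 / n
    times = [0.0]
    for j in range(count):
        unknown = (n * n - d * n) * r ** j  # n^2 - eta_np(j)
        times.append(times[-1] + n / (lam * unknown))
    return times


def rreq_time_closed_form(n, d, lam, i):
    """Geometric-series sum of (22); our simplification, see the lead-in."""
    r = 1.0 - 1.0 / n
    return (n - 1) / (lam * (n - d)) * (r ** (-i) - 1.0)


if __name__ == "__main__":
    n, d, lam = 100, 8, 0.1  # illustrative values
    times = rreq_times(n, d, lam, 500)
    for i in (1, 100, 500):
        print(i, times[i], rreq_time_closed_form(n, d, lam, i))
```

The gaps between successive RREQs grow geometrically: as more of the topology becomes known, more connection attempts are served from the route tables before a new RREQ is needed.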

Proof of Theorem 4.1: For the system to reach steady state, we need the creation rate to be equal to the elimination rate of routes. The rate at which routes are discovered is a function of three elements: (i) the connection generation process, a Poisson process with cumulative rate $n\lambda$, (ii) the rate at which such connection attempts turn into route requests, and (iii) the number of routes discovered per RREQ, $(1 - \xi_{np})n$ (see Eq. 20). Only the second element is unknown, and we have seen in the proof of Theorem 3.2 that a connection attempt turns into a route request with probability $1 - \xi_{np}$ and does not turn into a route request with probability $\xi_{np}$. Thus in steady state, the rate at which connections turn into a RREQ is $\frac{1 - \xi_{np}}{\xi_{np}}$. The rate at which routes are created is thus, multiplying all three elements: $n\lambda\, \frac{1 - \xi_{np}}{\xi_{np}}\, (1 - \xi_{np})\, n = n^2 \lambda\, \frac{(1 - \xi_{np})^2}{\xi_{np}}$. The decay process destroys routes at rate $\eta_{np}/\tau$. Thus we can write:

$$\lambda\, \frac{(1 - \xi_{np})^2}{\xi_{np}} = \frac{\eta_{np}}{n^2 \tau} = \frac{\xi_{np}}{\tau} \qquad (23)$$

which is a polynomial in $\xi_{np}$ for which the only root $\xi_{np}^*$ between 0 and 1 is the result of the Theorem.

Proof of Theorem 5.1: We can write a balance equation similar to equation (23) using the rate of route breakage derived from equation (14), instead of the rate of route disappearance through the Active Route Time-Out process.

$$\lambda n\, \frac{n^2 - \eta_{np}}{\eta_{np}}\, \frac{n^2 - \eta_{np}}{n^2}\, n = \eta_{np} \lim_{\delta \to 0} \frac{P(L < \delta)}{\delta} \qquad (24)$$

This eventually gives us: $\lambda (1 - \xi_{np})^2 = \xi_{np}^2\, c\, \bar{h}(n)$. This is a second order polynomial in $\xi_{np}$, for which the only root between 0 and 1 is the result of the Theorem.

Proof of Theorem 6.3: The first relation in the Theorem statement comes from the fact that $\xi_{np} = \sqrt{\lambda} / \left(\sqrt{\lambda} + \sqrt{c\sqrt{n}/2}\right)$, and thus $1/\xi_{np} - 1 \sim \sqrt{c\sqrt{n}/2} / \sqrt{\lambda}$. In order to derive the second relationship, we need to compute $\xi_{np} - \xi_{pa} = (1 - \xi_{pa}) - (1 - \xi_{np})$. Denote $x_y = 1 - \xi_y$. Then $x_y$ is the root of a polynomial $P_y$ for $y$ either $np$ or $pa$: $P_{np} = \left(\frac{2\lambda}{c\sqrt{n}} - 1\right) X^2 + 2X - 1$ and $P_{pa} = \frac{\lambda}{c} X^4 + P_{np}$. Thus $P_{np}(x_{np}) - P_{pa}(x_{pa}) = 0$, which provides the following relationship:

$$x_{np} - x_{pa} = \frac{\frac{\lambda}{c}\, x_{pa}^4}{\left(\frac{2\lambda}{c\sqrt{n}} - 1\right)(x_{pa} + x_{np}) + 2}$$

For large $\lambda$, $\left(\frac{2\lambda}{c\sqrt{n}} - 1\right) > 0$ and since $x_{pa} < x_{np}$, we have

√ n 3 n 3 4 xpa√< 4 xnp . Since n then |xnp − xpa | < 4λ3/2 . Thus, using ξpa − ξnp = xnp − xpa ,

|xnp − xpa |