Maximum Entropy Based Randomized Routing in Data-Centric Networks Kenji Leibnitz, Tetsuya Shimokawa, Ferdinand Peper
Masayuki Murata
Center for Information and Neural Networks (CiNet) NICT and Osaka University, Japan {leibnitz,shimokawa,peper}@nict.go.jp
Graduate School of Information Science and Technology, Osaka University, Japan
[email protected]
Abstract—Recently, there are increasing efforts to define new protocols and architectures for data-centric networking that do not rely on the addresses of the individual network nodes. In that case, queries are usually disseminated by flooding throughout the network until the node with the requested content is found. However, flooding always imposes an additional burden on the network since many unnecessary requests are circulated, particularly in the case of energy-constrained systems. In this paper, we examine the feasibility of a more lightweight search method by using a random walk in the network and investigate the possibility of a better load balancing through a maximum entropy random walk. We perform simulation studies and analyze the performance compared to the general random walk and flooding mechanisms in terms of search time and energy per query. We conclude that under suitable topologies the maximum entropy random walk is a viable alternative to flooding.
I. I NTRODUCTION Traditionally, communication networks operate in an address-centric manner, where the focus of the communication process is on the host-to-host unicast connection between a source and destination node that are identified by their network addresses, e.g. IP address. The growing number of automatically generated sensor data (“big data”) that is becoming accessible and the shift toward content distribution rather than individual host-to-host communication are leading toward the data-centric operation and management of information and communication networks. Data-centric protocols have been proposed in the past for wireless sensor networks [1], [2] and there have also been proposals for content-centric networking in IP networks [3], [4]. While the burden from management of network addresses can be greatly relieved in favor of the content distribution, the problem still remains in finding where this content is physically located within the network. Usually, searching for the hosts with the desired content is performed through the broadcast of query packets that are received by neighbor nodes and again replicated until the desired content is found. This process is also termed flooding. However, in large networks flooding will lead to a large overhead due to the large number of unnecessarily propagated queries. One way to reduce this burden is by modifying the flooding process by including a time-to-live (TTL) parameter to restrict the search space [5]. Another way is to use randomness in the dissemination process and probabilistically forwarding the query to only
Copyright 2013 IEICE
desnaon
desnaon 10
4
4
9 4
3
7
3
3
6
3 2
1
2
3
2
source
8
2
3
4
5
1
source (a) Query Flooding
(b) Random Walk
Fig. 1. Flooding and random walk (numbers indicate hop sequence)
some neighboring nodes instead of all neighbors. Such concept is applied for example in mobile ad hoc networks [6] or peer-to-peer networks in the form of the gossip protocol (or epidemic protocol) [7] and it is beneficial when the network size is very large or its structure is unknown. The application of fluctuations or noise has also been demonstrated in [8], [9] by biologically inspired network control methods to improve robustness. In nature, fluctuations are fundamental to the function of all kinds of biological systems ranging from molecular level to the brain [10]. Another way how to utilize randomness in the search on a network graph is through a random walk of the query packets. Random walk is a fundamental stochastic process and is performed when a query packet arrives at a node where the next hop node is randomly selected with a certain probability from the set of all neighboring nodes. While it may seem less effective performing a random walk to search for a specific node instead of using the shortest paths between the two end nodes, its benefit is that only knowledge of local connectivity is required and no routing table is necessary. However, the topology of the network has a large impact on the performance of the search and can be improved by choosing appropriate probabilities for selecting a neighbor as next hop. In [11]– [13], random walk routing in wireless sensor networks was proposed and shown to work efficiently while also balancing the load in regular grid topologies. Furthermore, random walks for queries in unstructured peer-to-peer (P2P) networks were studied in [14]. Figure 1 briefly illustrates the different behavior between flooding and random walk when a source node is searching
for a destination node. In Fig. 1(a), the query packet is received by an intermediate node that forwards it to all of its neighbors. Each arrow indicates a new query packet transmission and the correct destination is reached after 4 hops. The small numbers annotating the links represent the sequence number of taken hops. However, of all the 12 packets sent in this figure, only 4 would be actually necessary for reaching the destination node. On the other hand, Fig. 1(b) shows the random walk of a single query packet in the network, which may take wrong turns making the total number of hops a stochastic quantity. This figure also shows that the walk depends on the topological properties of the network and can be manipulated by appropriately setting the link weights, which determine the probability of choosing a certain link on the next hop. In this paper, we discuss how we can set the probabilities at a node for its next hop neighbors when the search is performed by a random walk process over a network graph. We show that using a random walk can be beneficial under certain topologies compared to flooding as the number of nodes involved in the query search and the energy consumption are reduced. Particularly in scale-free topologies, which are commonly observed in real world networks, a general random walk can be improved by appropriately setting the next hop probabilities. The paper is organized as follows. In Section II, we give a formal description of the problem and discuss the general random walk, as well as the variation of maximum entropy random walk that is used as the basis in this paper. Then, in Section III, we evaluate the proposed method by numerical simulations and we conclude this paper in Section IV. II. R ANDOM WALK ON N ETWORK G RAPHS The topology of a network of N nodes is expressed by the adjacency matrix A = (ai,j )N ×N of its graph. If a link exists between two nodes i and j, then we have an entry ai,j = 1, whereas ai,j = 0 indicates that there is no link. The sums kiin =
N
ai,j
j=1
and
kiout =
N
aj,i
j=1
are defined as the out-degree and in-degree of node i and denote the number of next hop neighbors or previous hop neighbors of node i, respectively. In the following, we will consider only undirected graphs, so kiin = kiout for all i and only referred to as degree ki . The random walk of a query packet along the edges of the graph can be expressed by a Markov chain, where a packet at node i randomly chooses a next hop node j with probability pi,j independently of its previous hop sequence. The question we are addressing in the following is whether we can define a suitable probability matrix P = (pi,j )N ×N for a given adjacency matrix A of a network graph such that the routing process is improved with regard to some performance metric, e.g., mean first-passage time. Since the physical topology A may vary in its structure and could be regular or highly complex (e.g. small world or scalefree), our goal is to counteract this heterogeneity by trying
Copyright 2013 IEICE
source trajectory of random walk
destination
Fig. 2. Example trajectory of a GRW on a scale-free network topology with N = 50 nodes and single destination node
to make nodes equally reachable from each other. Entropy characterizes the degree of disorder or randomness of a system, making this problem similar to that of maximizing entropy in a thermodynamic system to help a diffusion process spread out evenly. For a Markov chain, the entropy rate S is independent of the initial distribution of the states. The entropy rate S can be seen as the time density of the average information in a stochastic process. For the stationary probability πi∗ of finding a packet at node i, entropy rate is defined as follows [17]. S=−
N i=1
πi∗
N
pi,j ln pi,j
(1)
j=1
Thus, the higher the entropy rate S is, the more evenly balanced the network is for reaching each node. A. General Random Walk The simplest form of random walk is a General Random Walk (GRW), where the probability pi,j of choosing any neighbor node j as next hop of node i is equal for each neighbor. Since the probability is uniform, pi,j can be computed by dividing through the degree ki of node i. pi,j =
ai,j ki
i, j = 1, . . . , N
(2)
The resulting stationary distribution of finding a query packet at node i at an arbitrary time is then given by Eqn. (3), cf. [15]. ki πi∗ = N
j=1
kj
(3)
The entropy rate for GRW is therefore calculated with Eqns. (2) and (3) as shown in Eqn. (4). N i=1 ki ln ki SGRW = (4) N i=1 ki Figure 2 illustrates an example trajectory of GRW on a scale-free network topology with N = 50 nodes.
or biological networks, we can see that the network entropy for MERW always exceeds that of GRW.
TABLE I N ETWORK ENTROPY FOR VARIOUS NETWORKS TOPOLOGIES AT&T Sprint Level 3 USAir97 C. elegans Brain
nodes 631 604 624 332 297 57300
links 2078 2268 5298 2126 2148 2731905
SGRW 2.32 2.45 3.33 3.33 2.95 6.35
SMERW 10.68 9.43 8.11 3.72 3.19 7.04
C. Weight of Paths Our goal of setting the probabilities for MERW is to influence the random walk process to reduce the dependence on network topology permitting an equal propagation of query packets throughout the network. Let us now consider a path of length t within the network leading through nodes i0 , i1 , . . . , it . If we look at GRW, the probability for having this exact trajectory of a packet is
B. Maximum Entropy Random Walk Instead of using GRW, Burda et al. [15] introduce the construction of Maximum Entropy Random Walk (MERW), where the transition matrix is defined by the entries pi,j as follows. pi,j =
ai,j ψj λ ψi
(5)
The term λ is the largest eigenvalue of A and ψ its corresponding normalized eigenvector with i ψi2 = 1. Thus, the transition probabilities of the random walk process are similar to the eigenvector centrality of the nodes, which is regarded as one way of describing the influence of a node within the topology. The same way of defining entropy within a network and setting the transition probabilities accordingly to maximize entropy were also discussed by Demetrius and Manke in [17]. They further establish a relationship between entropy and the robustness of the average shortest path length since networks that have a higher entropy are also more robust toward removal of nodes. The stationary distribution of finding a query packet at node i with MERW and its entropy rate are then according to [15] as shown in Eqns. (6) and (7). πi∗ = ψi2 SMERW = log λ
(6) (7)
Unfortunately, the definition of the transition probabilities in Eqn. (5) requires knowledge of the largest eigenvalue of the adjacency matrix and its corresponding eigenvector. This can only be determined if the topology is fully known and is usually not very practical, especially in large networks. However, it was shown by Sinatra et al. [16] that the maximum entropy random walk can be constructed only with limited and local information based on the degrees of the first and second hop neighbors of each node. Table I shows the network entropies S of GRW and MERW for some example network topologies. The first three topologies, AT&T, Sprint, and Level 3, are router level topologies of ISPs obtained from the Rocketfuel data set [19]. The network USAir97 is the US airport network and C. elegans is a network of neurons in a biological organism. Both are often found as reference topologies in complex network literature. The data set of Brain is that of a functional brain network extracted from our own experimental data from resting state functional magnetic resonance imaging (fMRI). In all cases of engineered
Copyright 2013 IEICE
(t)
PGRW (γi0 ,it ) = pi0 ,i1 pi1 ,i2 · · · pit−1 ,it 1 = . ki0 ki1 · · · kit−1
(8)
where i0 is the starting node and it is the end node of the path and kij denotes the degree of each node along this path. On the other hand, for MERW the probability of the trajectory becomes 1 ψi (t) PMERW (γi0 ,it ) = t t (9) λ ψi0 and it only depends on the elements in the eigenvector for the first and last node of the path. It is independent of the transition probabilities pij of all intermediate nodes that are along the path. Thus, all paths from i0 to it of the same length t have an equal probability. III. N UMERICAL E VALUATIONS We now investigate by simulations how well the MERW based random walk query search performs when compared to GRW and flooding. Let us consider a network consisting of N nodes of which there are 1 ≤ R ≤ N randomly located destination nodes offering the desired content. If not stated otherwise, the source node issuing the query request and the R destination nodes are selected uniformly random among all nodes in the network. We consider two randomly generated network topologies in our simulations. The Barab´asi-Albert (BA) topology [20] is a scale-free network topology generated with N nodes through preferential attachment, each with a minimum node degree of 2. The Watts-Strogatz (WS) topology [21] is a small-world network topology, where each node connects by default to its first nearest neighbor in a regular grid lattice and has a shortcut connection with probability 0.1. Each data point in the following graphs is computed as the average over 1000 simulation runs and error bars indicate the 95% confidence intervals using Student’s t-distribution. A. Average Number of Hops The average number of hops for a query packet indicates the speed of search within the network. Figure 3 shows the average hop count until finding one of the destinations for the three different methods with R = 50 destination nodes and the network size N varying from 200 to 1000. The results for the scale-free topology are shown in Fig. 3(a) and for the small-world topology in Fig. 3(b).
4.5
80
3 2.5 2
60 50 40 30 20
1.5 1 0
10
200
400 600 800 number of nodes N
1000
0 0
1200
(a) Scale-free BA topology
20
200
400 600 800 number of nodes N
1000
1200
1000
1200
(a) Scale-free BA topology
25
80
GRW MERW Flooding
70
GRW MERW Flooding
60 total energy per query
average number of hops
GRW MERW Flooding
70
3.5
total energy per query
average number of hops
4
90
GRW MERW Flooding
15
10
50 40 30 20
5 10
0 0
200
400 600 800 number of nodes N
1000
1200
0 0
200
400 600 800 number of nodes N
(b) Small world WS topology
(b) Small world WS topology
Fig. 3. Average number of hops until finding the queried content for R = 50 destination nodes. In both topology types, flooding shows a better performance than the random walks, but the difference is smaller in scale-free networks than in small-world networks.
Fig. 4. Total energy per query for R = 50 destination nodes and CT x = 2.5. The energy consumption of flooding is higher than the random walks in scalefree networks since more nodes are involved in the query dissemination, while in small-world networks flooding becomes better for larger network sizes.
The basic GRW always performs worst among all three considered methods and MERW improves the walk performance by requiring a smaller number of hops. However, as expected, both random walks are inferior to the flooding case in terms of average number of hops. The dependence of both random walk methods on the topology is best seen when both Fig. 3(a) and Fig. 3(b) are qualitatively compared to each other. For the small-world type of network, the random walk methods require significantly longer time to find the destinations, although it should be remarked here that networks occurring in the real world resemble rather scale-free networks than Watts-Strogatz type of small-world networks. The differences between the average number of hops can be explained by the average path lengths among each node pair in both topologies. B. Energy Consumption Model In the evaluation of the number of hops, it could be seen that the random walk search is inferior in search time compared to flooding. The reason is that in the flooding process multiple nodes participate in the search in parallel, while in random walk only a sequence of unicast connections is used. For the case that transmissions come without any cost, flooding will always be preferable in terms of search speed.
Copyright 2013 IEICE
However, in many cases, such as in wireless sensor networks, energy efficiency is of higher importance than performance and battery power is required for both the sending and the receiving nodes. In order to include energy consumption into our model, we consider the following simple model. Let us say that each reception of a packet has a unit cost of 1 and the transmission of a packet has a cost of C ≥ 1. A typical value of C ≈ 2.5 has been reported for the Mica mote in wireless sensor networks [22]. To include both energy and performance into our evaluation, we now consider the total energy cost per query search, but leave out the costs for sensing and for monitoring the radio channel for incoming signals since these would incur the same costs in all three considered methods. Figure 4 shows the total energy required per query for C = 2.5. The relation to the number of hops in Fig. 3 is linear for the random walks, since each hop involves one transmitter and receiver due to the unicast connections. On the other hand, the energy for flooding drastically increases with the network size and is worse than the random walks in the scale-free case in Fig. 4(a). However, in small-world networks, flooding performs worse with regard to energy consumption when the
200
12
150
11 10 average number of hops
total energy per query
GRW MERW Flooding
100
50
betweenness centrality degree clustering coefficient
9 8
GRW
7 6 5
Flooding
MERW 4 3
0 0
10
20 30 number of destinations R
40
(a) Scale-free BA topology 600 GRW MERW Flooding
total energy per query
500
200
400
600 800 number of nodes N
1000
Fig. 6. Placement of R = 10 destinations at nodes in a scale-free network with highest clustering coefficient (dash-dot), degree (dashed), or betweenness centrality (solid).
D. Strategic Placement of Content
400 300 200 100 0 0
2
10
20 30 number of destinations R
40
(b) Small world WS topology Fig. 5. Total energy per query for N = 500 nodes and C = 2.5. In both types of network topologies, energy consumption saturates with increasing number of destination nodes indicating that only a small number of R is sufficient.
number of nodes is small, but improves for larger networks.
As we have seen above, the topology plays an important role on the performance of the random walk. Furthermore, the number of content destinations R further affects the speed of the query search up to a certain percentage of the total number of nodes above which no real improvement can be achieved. However, the destinations were placed in the above evaluations randomly throughout the network. In this subsection we look at where to place the R destinations with respect to each node’s complex network measures. In particular, we consider the three local node-related measures of degree, clustering coefficient, and betweenness centrality in order to allocate R destinations to the nodes having the largest of these three measures, respectively. Clustering coefficient ci of node i expresses how nodes linked to i are also linked among each other. N
C. Impact of Number of Destinations In order for the network manager to provision the network with an adequate number of destinations, we investigate the performance of a network of fixed size N = 500 and varying number of destinations. In Fig. 5(a), it can be seen that regardless of the number of destinations R, the energy for flooding is always much higher than for both random walks in scale-free topologies. However, for GRW and MERW there is no real benefit of increasing R beyond a value of about 20 after which the energy remains the same. In the case of small-world networks in Fig. 5(b), there is also a saturation point visible, although flooding remains more effective here as discussed in the previous section due to its higher connectivity. In general, this means that there exists an optimal number of destination nodes for each query depending on the total number of nodes in the network and a comparatively small fraction of destination nodes (≈ 0.04N in the example of Fig. 5) is sufficient to make random walk competitive or even superior to flooding depending on the type of network topology.
Copyright 2013 IEICE
2 ti 1 ci = n i=1 ki (ki − 1)
with
N 1 ti = ai,j ai,h aj,h 2 h,j=1
Betweenness centrality bi of node i describes the number of shortest paths from all nodes to each other that pass through node i bi =
1 (n − 1)(n − 2)
N h,j=1 h=i,h=j,i=j
ρh,j (i) ρh,j
where ρh,j is the number of shortest paths between nodes h and j, and ρh,j (i) is the number of shortest paths between h and j that pass through i. Figure 6 shows the results of the simulations of hop counts for R = 10 destinations and varying network size N for scalefree topologies. Choosing the R nodes with highest clustering coefficient (dash-dot line) yields the worst results for GRW and flooding, while MERW shows nearly equal average number of hops for all 3 placement strategies. The reason for this behavior lies in the fact that MERW already compensates for the heterogeneity of the topology, so that unlike GRW it cannot
cliques
destination in clique 1
destination in clique 2
scale-free topology
Fig. 7. Sketch of an optimal routing topology consisting of fully meshed cliques interconnected by a scale-free topology
be further improved by strategic planning of destinations and it is sufficient to perform a random placement for MERW. In both other cases of GRW and flooding, however, the search performance can be improved by choosing either the nodes with high degrees or high betweenness centrality. IV. C ONCLUSION AND F UTURE W ORK In this paper we studied the feasibility of applying a random walk query search in a data-centric network under random and complex topologies. For suitable topologies, random walk does not necessarily perform much worse than the commonly used flooding mechanism. On the contrary, since only a single path is followed during the query search, fewer nodes are involved in the dissemination process, which leads to a lower consumption of energy. We have seen that a maximum entropy random walk improves the general random walk in performance by counteracting the irregularities in topology to balance the reachability probability of the destination nodes. Furthermore, only a small number of destination nodes in the network is sufficient to provide replicas of the desired content for achieving performance compared to flooding for both small world and scale-free topologies. While this study focused on the evaluation of MERW by simulation, our future work includes deriving and evaluating analytical formulations of the considered performance metrics. This study showed further that scale-free type of topologies are beneficial for random walks compared to small-world topologies. In order to speed up the query search by random walk, our future goal is to design an optimal topology where the random walk performs equally to flooding. A possible solution is illustrated in Fig. 7 as a hybrid network structure consisting of fully-meshed cliques that are interconnected by a scale-free topology. Such structures are similar to the barbell (or dumbbell) topology [23], where each clique could accommodate one destination node. In such topology, it is easy to reach a clique for the query packet and once it is within the clique it can easily find the destination node since each node is only a single hop away.
Copyright 2013 IEICE
R EFERENCES [1] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed diffusion: a scalable and robust communication paradigm for sensor networks”, in Proc. 6th Annu. Int. Conf. on Mobile Computing and Networking (MobiCom ’00), pp. 56–56, Boston, MA, 2000. [2] I. Stojmenovi´c and S. Olariu, “Data-centric protocols for wireless sensor networks”, in Handbook of sensor networks: algorithms and architectures, I. Stojmenovi´c (Ed.), John Wiley & Sons, 2005. [3] V. Jacobson, D. K. Smetters, J. D. Thornton, M. Plass, N. Briggs, and R. Braynard. 2012. “Networking named content”. Commun. ACM, vol. 55, no. 1, pp. 117–124, Jan. 2012. [4] J. Choi, J. Han, E. Cho, T. Kwon, and Y. Choi, “A survey on content-oriented networking for efficient content delivery”, IEEE Commun. Mag., vol. 49, no. 3, pp. 121–127, March 2011. [5] N. B. Chang and M. Liu, “Controlled flooding search in a large network”, IEEE/ACM Trans. Netw., vol. 15, no. 2, pp. 436–449, April 2007. [6] Z. Haas, J. Halpern, and L. Li, “Gossip-based ad hoc routing”. IEEE/ACM Trans. Netw., vol. 14, no. 3, pp. 479–491, June 2006. [7] M. Jelasity, A. Montresor, and O. Babaoglu, “Gossip-based aggregation in large dynamic networks”. ACM Trans. Comp. Syst., vol. 23, no. 3, pp. 219–252, Aug. 2005. [8] K. Leibnitz, N. Wakamiya, and M. Murata. “Biologically-inspired selfadaptive multi-path routing in overlay networks”, Commun. ACM, special issue on self-managed systems and services, vol. 49, no. 3, pp. 62– 67, Mar. 2006. [9] K. Leibnitz and M. Murata, “Attractor selection and perturbation for robust networks in fluctuating environments”, IEEE Netw., special issue on biologically inspired networking, vol. 24, no. 3, pp. 14–18, May/June 2010. [10] T. Yanagida, M. Ueda, T. Murata, S. Esaki, Y. Ishii, “Brownian motion, fluctuation and life”, Biosystems, vol. 88, no. 3, pp. 228–242, April 2007. [11] S. D. Servetto and G. Barrenechea, “Constrained random walks on random graphs: routing algorithms for large scale wireless sensor networks”, in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications (WSNA ’02), pp. 12–21, Atlanta, GA, September 2002. [12] H. Tian, H. Shen, and R. Matsuzawa, “Random walk routing for wireless sensor networks”, in Proc. 6th Int. Conf. on Parallel and Distributed Computing, Applications and Technologies (PDCAT’05), pp. 196–200, Dalian, China, Dec. 2005. [13] I. Mabrouki, X. Lagrange, and G. Froc, “Random walk based routing protocol for wireless sensor networks”, in Proc. 2nd Int. Conf. on Performance Evaluation Methodologies and Tools (ValueTools ’07), Nantes, France, Oct. 2007. [14] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, “Search and replication in unstructured peer-to-peer networks”, in Proc. 16th Int. Conf. on Supercomputing (ICS ’02), pp. 84-95, New York, NY, June 2002. [15] Z. Burda, Z. Duda, J. M. Luck, and B. Waclaw, “Localization of the maximal entropy random walk”, Phys. Rev. Lett., vol. 102, 160602, 2009. [16] R. Sinatra, J. G´omez-Garde˜nes, R. Lambiotte, V. Nicosia, and V. Latora, “Maximal-entropy random walks in complex networks with limited information”, Phys. Rev. E, vol. 83, no. 3, 030103(R), 2011. [17] L. Demetrius and T. Manke, “Robustness and network evolution – an entropic principle”, Physica A, vol. 346, pp. 682–696, 2005. [18] J. D. Noh and H. Rieger, “Random walks on complex networks”, Phys. Rev. Lett., vol. 92, no. 11, 118701-1, 2004. [19] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, “Measuring ISP topologies with Rocketfuel”, IEEE/ACM Trans. Netw., vol. 12, no. 1, pp. 2–16, Feb. 2004. [20] R. Albert, A.-L. Barab´asi, “Statistical mechanics of complex networks”, Rev. Mod. Phys., vol. 74, no. 1, pp. 47–97, 2002. [21] D. J. Watts, S. H. Strogatz, “Collective dynamics of ‘small-world’ networks”, Nature, vol. 393, no. 6684, pp. 409–410, 1998. [22] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, “Wireless sensor networks for habitat monitoring”, in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications (WSNA ’02), pp. 88–97, Atlanta, GA, September 2002. [23] A.-M. Kermarrec, E. Le Merrer, B. Sericola, G. Tr´edan, “Second order centrality: Distributed assessment of nodes criticity in complex networks”, Comput. Commun., vol. 34, no. 5, pp. 619–628, April 2011.