Distributed Algorithm for Shortest Path Problem in Undirected Graphs via Randomized Strategy
Mohammadreza Doostmohammadian
Sepideh Pourazarm
Usman A. Khan
Electrical and Computer Engineering Tufts University Medford, MA 02155 Email:
[email protected]
Systems Engineering Boston University Boston, MA 02115 Email:
[email protected]
Electrical and Computer Engineering Tufts University Medford, MA 02155 Email:
[email protected]
Abstract: In this paper, we introduce a distributed algorithm for the single-source shortest path problem in undirected graphs. In this problem, we find the shortest path from a given source node to every other node in the graph. We start with undirected unweighted graphs, in which the shortest path is a path with the minimum number of edges. We then modify the algorithm to find shortest paths in weighted graphs, in which the shortest path is a path with minimum cost, i.e., minimum sum of edge weights. We treat the convergence time of the randomized algorithm on random Erdős-Rényi graphs as a random variable and, based on its distribution, approximate a stop-time criterion for graphs with unknown topology. We claim that the stop-time criterion is related to graph parameters such as the number of nodes and the graph diameter.
I. INTRODUCTION
Randomized strategies have recently been adopted as a computational tool in many applications because the resulting algorithms are simple yet fast. Applications include clustering and grouping in networks [1], game theory [2], [3], the distributed assignment problem [4], and even controllability analysis [5]. In this paper we apply this strategy to the shortest path problem in graphs. This problem seeks the path in a graph from a source vertex to a destination vertex that minimizes the total cost. A typical example is finding the quickest way to get from one city to another on a road map; in this case, the vertices are cities and the edges are weighted by the time needed to travel between them [6]. Several versions of this problem are discussed in graph theory, such as the single-source shortest path problem [7], [8], [9], the single-pair shortest path problem [10], and the all-pairs shortest path problem [11]. Accordingly, there are different solutions and algorithms for these problems [12]. In this paper, we focus on the single-source shortest path problem; however, the results can be easily generalized to the multi-source problem. The best-known algorithms for this problem are Dijkstra's algorithm [7] and the Bellman-Ford algorithm [8], [9]. Dijkstra's algorithm selects shortest paths following a greedy strategy, whereas Bellman-Ford obtains them through repeated edge relaxations. Both algorithms apply to directed graphs with non-negative weights; Bellman-Ford additionally handles negative edge weights, provided that no negative cycle is reachable from the source vertex. In both of these solutions (and other works in the literature), the nodes are required to have memory and the graph must be known in order to solve the problem. Further, existing solutions are centralized and rely on a central process to handle all of the computation and information in the system, whereas in this work we provide a distributed solution. This work applies to problems where the nodes are deployed in an environment with limited infrastructure and there is no local or central coordinator. The basic idea is similar to the randomized task allocation using the swap-collide algorithm presented in [4].

This paper presents a new distributed algorithm for the single-source shortest path problem based on data carried by a travelling token. In this algorithm every node knows only its immediate neighbors and no other nodes in the graph. The algorithm finds the shortest path without any information about the network topology. Assuming that at each time slot only one node communicates with one of its neighbors, we introduce a distributed and coordination-free algorithm that finds shortest paths in an undirected graph in almost surely finite time. Our algorithm is iterative and is based on a randomized strategy in which a token travels randomly throughout the graph. At each time slot the active node (the node holding the token) sends the token to one of its neighbors chosen at random. This travel continues until the token returns to the source node. The source node then uses the data carried by the token to estimate paths to the other nodes, and sends the token out again to start a new travel and update the previous paths. Intuitively, this algorithm satisfies the following properties, which lead to convergence: (i) existing paths are gradually introduced to the source node; (ii) each new travel injects new paths (information) into the system and helps find the optimal solution; (iii) once a shortest path is found, it never changes. Such an algorithm finds the shortest paths in undirected graphs in almost surely finite time. We explore the convergence time of this algorithm for different random graphs. We choose Erdős-Rényi graphs as our network model [13].
We first run the algorithm for a large number of iterations. Then, we look for the distribution that best fits the resulting convergence-time variable. Here, we consider the Gamma, Generalized Extreme Value, Log-Normal, and Log-Logistic distributions, among others. We compare these fits based on the error of their estimated parameters. We then find the relation between the distribution parameters and graph parameters such as the diameter and the number of nodes. This may help to find a stopping criterion for unknown connected graphs where only the diameter of the graph and the number of nodes are known. We also modify the algorithm to solve the shortest path problem for weighted graphs.

The paper is organized as follows. Section II formulates the problem and gives preliminaries. Our randomized algorithm is discussed in Section III. Section IV analyzes the convergence time and the stop-time criterion. Section V provides a real network example. In Section VI the algorithm is redefined for weighted graphs, and finally, Section VII concludes the paper.

II. PROBLEM FORMULATION
Consider a network G of N nodes connected over an undirected graph G = (V, E), where V = {1, ..., N} is the vertex set and E is the edge set. The length (or cost) of a path P is the sum of the weights of the edges of P. That is, if P consists of edges $e_1, e_2, \ldots, e_k$, then the length of P is

$$L(P) = \sum_{i=1}^{k} w(e_i). \qquad (1)$$
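As a small illustration of (1), the following sketch sums edge weights along a path. The dictionary `weight`, keyed by unordered node pairs, the helper name `path_length`, and the example weights are illustrative assumptions and not part of the original formulation.

```python
def path_length(weight, path):
    """Return L(P): the sum of w(e_i) over consecutive node pairs of `path`."""
    return sum(weight[frozenset(edge)] for edge in zip(path, path[1:]))


# Hypothetical undirected graph: edges {1,2}, {2,3}, {1,3} with assumed weights.
weight = {frozenset({1, 2}): 3, frozenset({2, 3}): 4, frozenset({1, 3}): 9}

via_two = path_length(weight, [1, 2, 3])  # path 1 -> 2 -> 3
direct = path_length(weight, [1, 3])      # path 1 -> 3
print(via_two, direct)
```

With all weights equal to 1, the same function returns the number of edges of the path.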
The distance from vertex $v_1$ to vertex $v_2$ in G is the length of a minimum-length path (shortest path) from $v_1$ to $v_2$, if such a path exists. We solve the shortest path problem for two classes of graphs: undirected uniformly weighted graphs and undirected positive-weighted graphs.

A. Uniform weighted graphs

In this category, we assume that the weights of all edges in the graph are uniform and equal to 1. The graph can then be represented by its N × N adjacency matrix A:

$$A_{ij} = \begin{cases} 1, & \text{if } (i, j) \in E, \\ 0, & \text{otherwise.} \end{cases} \qquad (2)$$
Here, the shortest path problem is to find a path with the fewest edges between two vertices.

B. Positive-weighted graphs

For these graphs, each edge may be weighted by a positive number and the graph is defined in terms of its N × N weight matrix W:

$$W_{ij} \begin{cases} > 0, & \text{if } (i, j) \in E, \\ = 0, & \text{otherwise.} \end{cases} \qquad (3)$$

The shortest path problem is to find a path with minimum cost between two vertices.

III. DISTRIBUTED RANDOMIZED ALGORITHM
Consider a network corresponding to a graph G. The problem is to find the shortest path from the source node, $v_s$, to all other nodes in the graph. Note that the graph is bidirectional. We now describe the basic algorithm. As an initial condition, the path to each node is set to an empty vector. Let $x_i(k)$ denote the state of node i at iteration k of the algorithm. The source node s initially holds a token, denoted by θ, i.e.,

$$x_s(0) = \theta, \qquad x_i(0) = 0 \ \text{ for } i \neq s, \qquad (4)$$

and starts the algorithm by sending the token to one of its neighbors chosen at random. In other words, at each iteration, the active node (the node holding the token) sends the token to one of its neighbors chosen with uniform probability. Suppose node i is the active node at time k − 1. The probability that a neighbor $j \in N_i$ holds the token at the next iteration, k, is

$$P(x_j(k) = \theta) = \frac{1}{|N_i|}, \qquad (5)$$

where $N_i$ denotes the neighbor set of node i. This travel continues as the iterations progress, and the token keeps the identities of the visited nodes. As the graph is undirected, after a while the token returns to the source node. The source node then has a string of visited nodes, whose subsequences reveal existing paths. Over the iterations the algorithm updates the previously found paths and finally finds the shortest paths. The first path to a specific node is simply the path to the previous node in the string plus the edge between them. For further updates based on new information carried by the token, the algorithm performs relaxation [12]. The process of relaxing an edge $(v_i, v_j)$ consists of testing whether we can improve the shortest path to $v_j$ found so far by going through $v_i$ and, if so, updating $L(v_j)$. A relaxation step may decrease the value of the shortest-path estimate $L(v_j)$ and update the path to node $v_j$. Note that in our algorithm node $v_j$ is the node following $v_i$ in the string. The algorithm is given in the following:

Given: stopping criteria, source-node;
Result: shortest path from the source-node
Initialization: empty path to all nodes;
while counter < stopping criteria do
    counter = counter + 1;
    Source-node randomly selects one neighbor;
    Sends token to the selected neighbor;
    if current-node is NOT the source-node then
        Current holder of the token sends it to a random neighbor;
        Token keeps the identity of the visited nodes;
    else
        Read the memory of the token and estimate the weight of the shortest path;
        Clear the token memory;
    end
end
Algorithm 1: Shortest path algorithm for unweighted graphs

Lemma 1: [12] (Sub-paths of shortest paths are shortest paths) Given a weighted, directed graph G = (V, E) with weight function $w : E \to \mathbb{R}$, let $p = \langle v_1, v_2, \ldots, v_k \rangle$ be a shortest path from vertex $v_1$ to vertex $v_k$ and, for any i and j such that $1 \le i \le j \le k$, let $p_{ij} = \langle v_i, v_{i+1}, \ldots, v_j \rangle$ be the sub-path of p from vertex $v_i$ to vertex $v_j$. Then $p_{ij}$ is a shortest path from $v_i$ to $v_j$.

Using the above lemma, the relaxation strategy results in finding the shortest paths. Clearly, the above randomized strategy does not require knowledge of the graph topology and does not need any centralized coordination.
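The token walk and the relaxation step of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the adjacency-list dictionary `neighbors`, the function name `token_shortest_paths`, the hop budget, and the example graph are all hypothetical, and the source is assumed to relax the edges of each completed trip in the order in which the token traversed them.

```python
import random


def token_shortest_paths(neighbors, source, max_hops, rng):
    """Simulate the token walk; return the source's (dist, path) estimates."""
    dist = {source: 0}           # current shortest-path estimates L(v)
    path = {source: [source]}    # current best known path to each node
    trip = [source]              # token memory: nodes visited on this trip
    current = source
    for _ in range(max_hops):
        # The active node forwards the token to a uniformly chosen neighbor.
        current = rng.choice(neighbors[current])
        trip.append(current)
        if current == source:
            # Token is back at the source: relax each traversed edge (u, v).
            for u, v in zip(trip, trip[1:]):
                if dist[u] + 1 < dist.get(v, float("inf")):
                    dist[v] = dist[u] + 1
                    path[v] = path[u] + [v]
            trip = [source]      # clear the token memory
    return dist, path


# Hypothetical 6-node undirected graph given as adjacency lists.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
dist, path = token_shortest_paths(neighbors, 0, 5000, random.Random(1))
print(dist)
print(path[5])
```

Because an estimate is overwritten only by a strictly shorter one, every stored value is the length of a path that the token actually traversed, so the estimates never fall below the true distances.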
IV. CONVERGENCE OF THE ALGORITHM
In this section we analyze the convergence time of the algorithm based on simulations. The goal is to define the stopping criterion of our distributed algorithm. Here, we provide one example to illustrate the main idea; however, similar results can be inferred for similar graphs. We define the random convergence time as the earliest time at which the shortest paths from the source node to all other nodes have been found. Intuitively, because of the iterative randomized strategy, all existing paths are gradually introduced to the source node, which updates its latest estimates of the shortest paths until it reaches the optimal solution. So, this algorithm converges in almost surely finite time. We find the convergence time for the sample graph by performing Monte-Carlo trials. Recording the convergence time of each trial, we obtain the histogram of the convergence time for the given graph. Based on the statistical analysis of such data we aim to define the stopping criterion for new unknown graphs.

For the analysis we use random Erdős-Rényi graphs. The graph is constructed by randomly wiring nodes with probability p, i.e., for every pair of nodes we draw a random number and, if it is less than a pre-specified value p, we put an edge between those two nodes. Clearly, for higher values of p the graph is denser. An example with n = 15 nodes and p = 0.25 is given in Fig. 1. The graph diameter for this example is 3. We randomly pick a node as the source node, then record the exact convergence time for 800 Monte-Carlo trials. Notice that in this part we know the shortest paths of the given graph exactly. Therefore, the stop-time criterion in Algorithm 1 is the time at which the shortest paths at the source node exactly match the known shortest paths of the graph.1 We provide sample analysis data for the given example graph in Fig. 2. We fit four different distributions: the Gamma distribution, the Generalized Extreme Value (GEV) distribution, the Log-Normal distribution, and the Log-Logistic distribution. The estimated covariances of the parameter estimates for these fits are given in the following tables.

1 This is for the sake of illustration; for unknown graphs we use the stop-time criterion that will be defined later in this section.

Fig. 1. Example Erdős-Rényi graph: all edges are bidirectional.

TABLE I. ESTIMATED COVARIANCE OF PARAMETERS a, b FOR GAMMA DISTRIBUTION
        a         b
a     0.238    -1.195
b    -1.195     6.425
Fig. 2. Sampled convergence time for the example graph (horizontal axis: data; vertical axis: density). Different types of distributions are examined to fit the data.

TABLE II. ESTIMATED COVARIANCE OF PARAMETERS µ, σ FOR LOG-NORMAL DISTRIBUTION
        µ            σ
µ     0.00035      3.18e-018
σ     3.18e-018    0.00018

TABLE III. ESTIMATED COVARIANCE OF PARAMETERS µ, σ FOR LOG-LOGISTIC DISTRIBUTION
        µ            σ
µ     0.000358     2.638e-006
σ     2.638e-006   8.0573e-005

TABLE IV. ESTIMATED COVARIANCE OF PARAMETERS k, σ, µ FOR GENERALIZED EXTREME VALUE DISTRIBUTION
        k          σ         µ
k     0.00147   -0.0272   -0.0544
σ    -0.027      9.285     5.503
µ    -0.0544     5.503    16.88
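The Monte-Carlo experiment described in this section can be sketched as follows. This is only an illustrative sketch under stated assumptions: the helper names (`erdos_renyi`, `bfs_distances`, `convergence_time`, `fit_lognormal`) are hypothetical, the graph is redrawn until it is connected, 200 trials are used instead of 800 to keep the run short, the ground truth is obtained by breadth-first search, and the Log-Normal parameters are estimated by the closed-form maximum-likelihood formulas (mean and standard deviation of the log samples), which may differ from the fitting tool used for the tables above.

```python
import math
import random
from collections import deque


def erdos_renyi(n, p, rng):
    """Undirected random graph: each pair is wired independently w.p. p."""
    nbrs = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def bfs_distances(nbrs, source):
    """Exact hop distances from `source` (used only as ground truth)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def convergence_time(nbrs, source, truth, rng, max_hops=10**6):
    """Hops until the source's estimates equal `truth`; None if never."""
    est = {source: 0}
    trip, current = [source], source
    for hop in range(1, max_hops + 1):
        current = rng.choice(nbrs[current])
        trip.append(current)
        if current == source:  # token returned: relax the trip in order
            for u, v in zip(trip, trip[1:]):
                if est[u] + 1 < est.get(v, math.inf):
                    est[v] = est[u] + 1
            trip = [source]
            if est == truth:
                return hop
    return None


def fit_lognormal(samples):
    """Closed-form maximum-likelihood estimates (mu, sigma) of a Log-Normal."""
    logs = [math.log(t) for t in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


rng = random.Random(0)
while True:  # redraw until the sampled graph is connected
    graph = erdos_renyi(15, 0.25, rng)
    truth = bfs_distances(graph, 0)
    if len(truth) == 15:
        break
trials = [convergence_time(graph, 0, truth, rng) for _ in range(200)]
times = [t for t in trials if t is not None]
mu, sigma = fit_lognormal(times)
print(len(times), round(mu, 3), round(sigma, 3))
```

The fitted values depend on the sampled graph and on the seed, so they are not expected to reproduce the numbers reported for Fig. 2.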
One can compare these distributions based on the sum of the variances of the estimated parameters, i.e., the trace of each of the above tables. Among these distributions, the Log-Normal and Log-Logistic distributions best fit the simulation results. We use the Log-Normal distribution for the analysis in the rest of the paper. The pdf of this distribution is

$$f(t; \mu, \sigma) = \frac{1}{t \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln(t) - \mu)^2}{2\sigma^2}\right). \qquad (6)$$

We use this distribution to define the stop-time criterion for Algorithm 1. We define this criterion as the time by which the probability of having found the shortest paths is 99%, in other words, the time to the left of which the area under the distribution is 99%. We therefore call it the 99% criterion.2 For the Log-Normal distribution the cumulative distribution function is

$$F(t; \mu, \sigma) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{\ln(t) - \mu}{\sqrt{2}\,\sigma}\right)\right). \qquad (7)$$

The stop-time criterion is defined by

$$0.99 = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{\ln(T_{99\%}) - \mu}{\sqrt{2}\,\sigma}\right)\right), \qquad (8)$$

$$0.98 = \operatorname{erf}\left(\frac{\ln(T_{99\%}) - \mu}{\sqrt{2}\,\sigma}\right), \qquad (9)$$

$$\ln(T_{99\%}) = \sqrt{2}\,\sigma\,\operatorname{erf}^{-1}(0.98) + \mu, \qquad (10)$$

$$T_{99\%} = \exp\left(\sqrt{2}\,\sigma\,\operatorname{erf}^{-1}(0.98) + \mu\right). \qquad (11)$$

2 It is worth mentioning that any of the four mentioned distributions can be used to define the 99% stop-time criterion, since they give the same result. This is because the tails of all four distributions in Fig. 2 lie on top of each other.

Fig. 5. σ vs. number of nodes (vertical axis: Log-Normal parameter σ).

Fig. 6. µ vs. number of nodes (vertical axis: Log-Normal parameter µ).
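Equations (7) and (11) can be evaluated numerically with the Python standard library, as in the sketch below. The function names `stop_time` and `lognormal_cdf` are hypothetical. The sketch relies on the identity $\sqrt{2}\,\operatorname{erf}^{-1}(2q - 1) = \Phi^{-1}(q)$, where $\Phi^{-1}$ is the standard normal quantile function, so that (11) becomes $T_{99\%} = \exp(\mu + \sigma\,\Phi^{-1}(0.99))$.

```python
import math
from statistics import NormalDist


def stop_time(mu, sigma, prob=0.99):
    """Evaluate (11): the time t at which the Log-Normal CDF equals `prob`."""
    z = NormalDist().inv_cdf(prob)  # equals sqrt(2) * erfinv(2 * prob - 1)
    return math.exp(mu + sigma * z)


def lognormal_cdf(t, mu, sigma):
    """Evaluate (7): the Log-Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (math.sqrt(2.0) * sigma)))


# Rounded parameters reported for the example in Fig. 2.
t99 = stop_time(5.527, 0.370)
print(round(t99, 1), round(lognormal_cdf(t99, 5.527, 0.370), 4))
```

With the rounded parameters µ = 5.527 and σ = 0.370 this evaluates to roughly 594 to 595 iterations, close to the reported value of 596; the small gap is presumably due to rounding of the fitted parameters.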
For the example in Fig. 2, the fitted Log-Normal distribution has µ = 5.527 and σ = 0.370; thus, the stop-time criterion is T99% = 596 iterations. This implies that, after sending the token for 596 iterations and extracting the information, the algorithm has found the shortest paths for this graph with probability 99%. Notice that here the graph was known and we defined T99% based on the fitted distribution. For an unknown graph, however, we first need to estimate the Log-Normal parameters in order to find T99%.

A. Stop-time criteria based on graph features

In this section we compare the Log-Normal distribution parameters for Erdős-Rényi graphs with different diameters and numbers of nodes. First, we change the wiring probability p from 0.2 to 1 over a 15-node graph to obtain different diameters. The parameters resulting from fitting the Log-Normal distribution are given in Fig. 3 and Fig. 4. We see a decreasing trend in the σ parameter and an increase in the µ parameter as the graph diameter increases.

Fig. 7. Air traffic network with 235 nodes and 2100 links: the large node represents the source node.

Fig. 3. σ vs. graph diameter.

Fig. 4. µ vs. graph diameter.
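How the wiring probability p controls the diameter of a 15-node graph can be explored with a short sketch such as the one below. The helper names `erdos_renyi` and `diameter`, the list of p values, and the choice of averaging over 20 connected samples per value are assumptions made for illustration; disconnected samples are discarded because their diameter is undefined.

```python
import random
from collections import deque


def erdos_renyi(n, p, rng):
    """Undirected random graph: each pair is wired independently w.p. p."""
    nbrs = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def diameter(nbrs):
    """Largest hop distance between any two nodes; None if disconnected."""
    worst = 0
    for s in nbrs:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nbrs):
            return None
        worst = max(worst, max(dist.values()))
    return worst


rng = random.Random(0)
mean_diameter = {}
for p in (0.2, 0.4, 0.6, 0.8, 1.0):
    found = []
    while len(found) < 20:  # keep only connected samples
        d = diameter(erdos_renyi(15, p, rng))
        if d is not None:
            found.append(d)
    mean_diameter[p] = sum(found) / len(found)
print(mean_diameter)
```

Each diameter value obtained this way could then be paired with the Log-Normal parameters fitted to the convergence times of the corresponding graph.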
Next, we analyze Erdős-Rényi graphs with different numbers of nodes. We construct random undirected graphs with probability p = 0.3 and number of nodes ranging from n = 8 to n = 33. Fig. 5 and Fig. 6 show the evolution of the distribution parameters for different numbers of nodes. Based on such simulations we can extrapolate the σ and µ parameters for unknown graphs.

V. EXAMPLE
In this section we illustrate the results of running our distributed algorithm on a real-world airline network. The data is taken from [14]. This example graph has 235 nodes and 2100 edges. Each node represents an airport and each edge represents a possible airline connection between two airports. The graph is given in Fig. 7. We run Algorithm 1 to find the shortest path from the source airport to all other airports. The token travels in the network and, whenever it comes back to the source node, the source node updates its information, i.e., its shortest paths to all other nodes. The running time of the algorithm is approximated based on the stop-time criterion discussed in the previous section. The diameter of this graph is 6; this means that from any airport one can get to every other airport with at most 5 connections. The resulting convergence time data are given in Fig. 8.

VI. MODIFIED ALGORITHM FOR WEIGHTED GRAPHS
The algorithm for weighted graphs is similar to the algorithm explained in Section III. The only difference is that in this version the token keeps the weight of the edge between the current active node and its randomly chosen neighbor along with the node identities. So, the token provides the source node with data consisting of two strings: a string of node identities and a string of edge weights. As the number of iterations increases, the relaxation procedure is performed based on the cost of the path.
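The weighted variant can be sketched in the same style, with the token carrying two strings: node identities and edge weights. This is a hedged sketch, not the authors' code: the nested dictionary `weights` (with `weights[u][v]` the positive weight of the undirected edge between u and v), the function names, the hop budget, and the example graph are assumptions. Dijkstra's algorithm appears only as a centralized reference for checking the result.

```python
import heapq
import math
import random


def weighted_token_paths(weights, source, max_hops, rng):
    """Token walk whose memory holds node identities and edge weights."""
    cost = {source: 0}
    path = {source: [source]}
    nodes, edge_w = [source], []  # the two strings carried by the token
    current = source
    for _ in range(max_hops):
        nxt = rng.choice(list(weights[current]))  # uniform over neighbors
        nodes.append(nxt)
        edge_w.append(weights[current][nxt])
        current = nxt
        if current == source:
            # Relax every traversed edge using the recorded weights.
            for (u, v), w in zip(zip(nodes, nodes[1:]), edge_w):
                if cost[u] + w < cost.get(v, math.inf):
                    cost[v] = cost[u] + w
                    path[v] = path[u] + [v]
            nodes, edge_w = [source], []  # clear the token memory
    return cost, path


def dijkstra(weights, source):
    """Centralized reference solution, used here only for comparison."""
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue
        for v, w in weights[u].items():
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return best


# Hypothetical undirected weighted graph with symmetric entries.
weights = {
    0: {1: 4, 2: 1},
    1: {0: 4, 2: 2, 3: 5},
    2: {0: 1, 1: 2, 3: 8},
    3: {1: 5, 2: 8},
}
cost, path = weighted_token_paths(weights, 0, 5000, random.Random(2))
print(cost, path[3])
print(cost == dijkstra(weights, 0))
```

In this example the cheapest route to node 3 uses three edges although a two-edge route exists, which is why the relaxation compares path costs rather than hop counts.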
Fig. 8. The convergence time of Algorithm 1 for the air-traffic network and its fitted distributions (GEV, Log-Logistic, Log-Normal, and Gamma; horizontal axis: data, ×10^4; vertical axis: density, ×10^-4).

Given: stop-time criteria, source-node;
Result: weighted shortest path from the source-node
Initialization: empty path to all nodes; distance to all nodes is zero;
while counter < stop-time criteria do
    counter = counter + 1;
    Source-node randomly selects one neighbor;
    Sends token to the selected neighbor;
    if current-node is NOT the source-node then
        Current holder of the token sends it to a random neighbor;
        Token keeps the identity of the visited nodes;
        Token keeps the weight of all edges in the travel;
    else
        Read the memory of the token and estimate the shortest path;
        Clear the token memory;
    end
end
Algorithm 2: Modified shortest path algorithm for weighted graphs

VII. CONCLUSIONS

In this paper, we provide a distributed randomized algorithm for the single-source shortest path problem, in which we find the optimal path from a source node to the other nodes in a graph. We introduce an iterative randomized strategy for which the running time of the algorithm is a random variable, but the algorithm converges to the shortest paths in almost surely finite time. We analyze the convergence time (99% stop-time criterion) for various random Erdős-Rényi graphs. We see that the convergence time is best fitted by the Log-Normal distribution and, further, that its parameters depend on the graph diameter and the number of nodes. However, this relation is hard to formulate since these are not the only factors (graph features) affecting the convergence time. Other distributions may also be of interest for analyzing the stop-time criterion, because there is not much difference in the T99% criterion among these distributions. As a future direction, we intend to understand the dependency of the convergence time on other network features such as closeness, degree centrality, betweenness centrality, etc. Further, the randomized approach based on the token travel can be applied to network tomography to determine the unknown topology of a network.

REFERENCES
[1] V. L. Minden, C. C. Youn, and U. A. Khan, “A distributed self-clustering algorithm for autonomous multi-agent systems,” in 50th Annual Allerton Conference on Communication, Control, and Computing, 2012, pp. 1445–1448.
[2] T. Basar and G. J. Olsder, Dynamic noncooperative game theory, vol. 200, SIAM, 1995.
[3] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508–2530, 2006.
[4] U. A. Khan and S. Kar, “A coordination-free distributed algorithm for simple assignment problems using randomized actions,” in 45th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2011, pp. 58–61.
[5] Y. Y. Liu, J. J. Slotine, and A. L. Barabási, “Control centrality and hierarchical structure in complex networks,” PLoS ONE, vol. 7, no. 9, pp. e44459, 2012.
[6] I. Abraham, A. Fiat, A. V. Goldberg, and R. F. Werneck, “Highway dimension, shortest paths, and provably efficient algorithms,” in 21st Annual ACM-SIAM Symposium on Discrete Algorithms, 2010, pp. 782–793.
[7] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.
[8] R. Bellman, “On a routing problem,” Quarterly of Applied Mathematics.
[9] L. R. Ford and D. R. Fulkerson, Flows in Networks, Princeton University Press.
[10] D. Delling, P. Sanders, D. Schultes, and D. Wagner, “Engineering route planning algorithms,” in Algorithmics of large and complex networks, pp. 117–139. Springer, 2009.
[11] D. B. Johnson, “Efficient algorithms for shortest paths in sparse networks,” Journal of the ACM, vol. 24, no. 1, pp. 1–13, 1977.
[12] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, MIT Press, 2009.
[13] P. Erdős and A. Rényi, “On the evolution of random graphs,” Publications of the Mathematical Institute of the Hungarian Academy of Sciences, vol. 5, pp. 17–61, 1960.
[14] http://wiki.gephi.org/index.php/Datasets.