Minimum Spanning Tree on a Subset of Nodes
MINIMUM SPANNING TREE ON A SUBSET OF NODES

Petr Cenek, Michal Hrčka
University of Žilina, Faculty of Management Science and Informatics, Slovak Republic
e-mail: [email protected] [email protected]

Abstract

Many optimisation algorithms calculate the whole matrix of shortest paths for a network before they solve the actual problem, even when the problem is defined only on a reduced subset of nodes. This is acceptable for combinatorial problems, where the calculation of distances takes only a fraction of the total solution time. Simple problems such as the minimum spanning tree, on the other hand, would spend most of the solution time on the distance matrix. The proposed algorithm takes another approach. It does not calculate the minimum distances for the whole network in advance but calculates them only when they are needed for actual pairs of nodes. Significant savings in solution time were achieved in comparison with the traditional approach.

Keywords: network optimisation, minimum spanning tree, shortest paths, labelling algorithms
1
INTRODUCTION

A transportation network is defined as a set of nodes and edges, where nodes stand for cities, villages or just road junctions, and edges represent road links. A network defined in this way can be a vast structure, while optimisation problems are often to be solved only for a chosen subset of its nodes. For instance, a distribution of goods will be planned only for inhabited places in a network, or even for a limited selection of places with a significant population, with shops for a special kind of product, etc.

Many network problems are solved following a typical optimisation process. A matrix of minimum distances in the network is calculated first, and the actual problem is solved afterwards using the distances among all pairs of nodes. The computational complexity thus depends on the complexity of the algorithm for finding the matrix of minimum distances and on the complexity of solving the actual problem. The complexity of the distance matrix calculation is approximately $O(N^2)$ to $O(N^3)$, where $N$ is the number of nodes, depending on the quality of the algorithm used and on the characteristics of the network (especially its density).
2
MODEL OF A TRANSPORTATION NETWORK

The shortest paths problem and the subsequent search for a minimum spanning tree are discussed here for transport applications. The term transportation network is therefore used for the infrastructure instead of the term graph, which is customary in graph theory.

A transportation network can be described as a structure $G(V, H)$, where $V$ is the set of nodes of the network and $H$ is the set of edges. A non-negative function $c(h): H \to R_0^+$ defines a cost (a length, for instance) for each edge $h \in H$. The function $c(h)$ can also be written as $c_{ij}$, where $(i,j)$ is another description of the edge $h$ by its beginning node $i$ and end node $j$. The number of nodes is denoted by $N = |V|$ and the number of edges by $M = |H|$. The shortest paths on such a graph are calculated according to the costs $c(h)$ defined for each edge.

The problem of finding the shortest path between a pair of nodes $r$ and $s$ can be formulated mathematically as follows: minimise

$$\sum_{(i,j) \in H} c_{ij} \, x_{ij} \qquad (1)$$

subject to

$$\sum_{(i,k) \in H} x_{ik} - \sum_{(k,j) \in H} x_{kj} = \begin{cases} -1 & \text{for } k = r \\ 0 & \text{for } k \neq r \text{ and } k \neq s \\ +1 & \text{for } k = s \end{cases} \qquad \text{for } k \in V \qquad (2)$$

$$x_{ij} \in \{0, 1\} \qquad \text{for } (i,j) \in H \qquad (3)$$

Here $x_{ij} = 1$ if the edge $(i,j)$ is used on the path and $x_{ij} = 0$ otherwise.
3
SHORTEST PATHS ALGORITHMS

There are many algorithms for finding shortest paths in a network. Labelling algorithms of the label-set or label-correct type seem to be the most efficient among them, especially for transportation networks, which have relatively few edges compared to the number of nodes. Transportation networks are situated in a plane, which is why they can be represented approximately by planar graphs, so the number of edges satisfies the condition $M \le 3(N-1)$.

The principle of labelling algorithms is well known. Each node $i$ in the network is marked by a label $d_i$, which is set to infinity (or to a sufficiently big number greater than any distance in the network). The label of the root is set to zero, $d_r = 0$ (the root $r$ is the beginning node of a path), and the root is put in a queue (denoted as $\{r\} \to F$). The following improvement step takes a potential node from the queue and checks whether an improvement can be achieved by routing a path via the potential node. The check is repeated for all successors of the potential node, and if the path to a successor via the potential node is shorter than the known path (the value of its label),
the new path is routed via the potential node. The successor's label is set to the sum of the potential node's label and the length of the edge to the successor, the potential node is stored as the predecessor, and the successor is added to the queue as a new potential node. The whole process repeats until the queue is empty. The algorithm can be formally described as:

Step 0: Initialise $d_i = \infty$ for $i = 1, 2, \ldots, N$, $d_r = 0$ and $\{r\} \to F$.
Step 1: Take a potential node $u$ from the queue, $F \to u$, and try improvements for all successors of the node $u$:
    for all $j$ such that $(u,j) \in H$ do
        if $d_j > d_u + c_{uj}$ then begin $d_j = d_u + c_{uj}$; $F \cup \{j\} \to F$ end
Step 2: Repeat step 1 until the queue is empty, $F = \emptyset$.

The algorithm types differ in the organisation of the queue. A label-correct algorithm uses a FIFO (First-In-First-Out) queue, which puts newcomers at the end of the queue and takes the first node in the queue as the potential node for further improvements (this is the variant described above). A label-set algorithm chooses from all nodes in the queue the potential node with the minimum value of its label (with the minimum distance from the root among all nodes in the queue). The algorithm is modified only at the beginning of step 1 as follows:

Step 1: Take a potential node $u$ with the minimum value of $d_u$ from the queue, $F \to u \mid d_u = \min_{k \in F} \{d_k\}$, and try improvements for all successors of the node $u$ …

Both algorithms have similar efficiency when finding a whole tree of shortest paths from a root. The label-correct algorithm repeats the processing of some nodes, as they can enter the queue repeatedly. The label-set algorithm processes every node just once, but it must search the priority queue when choosing a potential node, which decreases its performance.

Nevertheless, there is an important difference between the two algorithms when searching for the shortest path between a single pair of nodes (with non-negative costs $c_{ij} \ge 0$). The label-set algorithm takes from the queue a potential node $u$ whose label $d_u$ already represents the minimum distance between the root $r$ and the node $u$. If the node $u$ is the end node of the path, the algorithm can finish without processing the remaining nodes in the queue. The label-set algorithm is therefore usually more efficient when searching for the shortest path between a single pair of nodes.
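The two queue disciplines can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the adjacency-list representation and the small example network `EDGES` are assumptions made here, and the label-set variant uses a binary heap as its priority queue.

```python
import heapq
from collections import deque
from math import inf


def build_adjacency(edges):
    """Adjacency list {node: [(neighbour, cost), ...]} from undirected edges (i, j, cost)."""
    adj = {}
    for i, j, cost in edges:
        adj.setdefault(i, []).append((j, cost))
        adj.setdefault(j, []).append((i, cost))
    return adj


def label_correct(adj, root):
    """Label-correct search: FIFO queue, a node may enter the queue repeatedly."""
    d = {node: inf for node in adj}          # Step 0: all labels set to infinity
    d[root] = 0
    queue, in_queue = deque([root]), {root}
    while queue:                             # Step 2: repeat until the queue is empty
        u = queue.popleft()                  # Step 1: take the first node in the queue
        in_queue.discard(u)
        for j, cost in adj[u]:
            if d[j] > d[u] + cost:           # shorter path to j via u
                d[j] = d[u] + cost
                if j not in in_queue:
                    queue.append(j)
                    in_queue.add(j)
    return d


def label_set(adj, root, target=None):
    """Label-set search: always take the queued node with the minimum label.

    With non-negative costs the label of a node taken from the queue is final,
    so the search stops as soon as `target` is taken. Labels of nodes that were
    not yet taken from the queue may then be larger than the true distances.
    """
    d = {node: inf for node in adj}
    d[root] = 0
    heap, done = [(0, root)], set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue                         # outdated heap entry, label already final
        done.add(u)
        if u == target:
            break                            # early termination for a single pair
        for j, cost in adj[u]:
            if d[j] > du + cost:
                d[j] = du + cost
                heapq.heappush(heap, (d[j], j))
    return d


# Hypothetical example network: edges given as (i, j, cost).
EDGES = [(0, 1, 2), (1, 2, 2), (2, 3, 1), (3, 4, 3), (1, 4, 4), (4, 5, 1), (0, 5, 7)]
adj = build_adjacency(EDGES)
all_labels = label_correct(adj, 0)           # whole tree of shortest paths from node 0
pair_label = label_set(adj, 0, target=4)[4]  # single pair (0, 4), stops early
```

Calling `label_set` without `target` yields the same labels as `label_correct`; only the order in which nodes are processed differs.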
4
MINIMUM SPANNING TREE

The spanning tree problem is defined for an undirected graph and consists of the construction of a sub-graph $G_S = (V, K)$, where $V$ is the set of nodes and $K \subset H$ is a subset of edges. The sub-graph $G_S$ must be connected and without cycles. The minimum spanning tree $G_S^*$ is a spanning tree with the minimal sum of costs of the edges used in the tree:

$$\sum_{h \in K} c(h) = \min_{L \subset H} \sum_{h \in L} c(h)$$

where the minimum is taken over all edge subsets $L$ for which $(V, L)$ is a spanning tree.
Popular methods for solving the minimum spanning tree problem are Kruskal's and Prim's algorithms. Both are based on a similar principle. The algorithm is initialised using only the nodes of the graph, which creates $N$ disjoint components. The disjoint components are then connected using edges sorted according to their length. When a candidate edge connects two disjoint components, the edge is inserted into the subset $K$ and the two components are joined and marked as a single component. The process continues until all nodes are connected in one component (the resulting sub-graph is connected).

The main difference between the two algorithms is that in Kruskal's algorithm all edges are sorted and may connect any two disjoint components, while in Prim's algorithm one node is chosen as a basic component, which is grown by joining other nodes. Only the edges going out from the basic component to other nodes are sorted according to their length and used to connect the not yet connected nodes.
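The component-joining principle of Kruskal's algorithm can be illustrated by a short Python sketch. The function name `kruskal`, the `(i, j, cost)` edge format and the example network are assumptions chosen for illustration, and the graph is assumed to be connected.

```python
def kruskal(nodes, edges):
    """Return (total_cost, tree_edges) of a minimum spanning tree.

    `edges` is a list of undirected edges (i, j, cost); the graph is assumed connected.
    """
    component = {node: node for node in nodes}    # N disjoint components

    def find(node):
        # Follow the links to the representative of the component,
        # shortening the chain on the way (path halving).
        while component[node] != node:
            component[node] = component[component[node]]
            node = component[node]
        return node

    total, tree = 0, []
    for i, j, cost in sorted(edges, key=lambda e: e[2]):   # edges sorted by length
        ci, cj = find(i), find(j)
        if ci != cj:                  # the edge connects two disjoint components
            component[ci] = cj        # join them into a single component
            tree.append((i, j, cost))
            total += cost
    return total, tree


# Hypothetical example network: edges given as (i, j, cost).
EDGES = [(0, 1, 2), (1, 2, 2), (2, 3, 1), (3, 4, 3), (1, 4, 4), (4, 5, 1), (0, 5, 7)]
total_cost, tree_edges = kruskal(range(6), EDGES)
```

Prim's algorithm would differ only in the selection rule: instead of scanning all sorted edges, it would repeatedly take the shortest edge leaving the single growing component.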
4.1 Minimum spanning tree for a subset of nodes

The situation is different if only a subset of selected (terminal) nodes $V_T \subset V$ is to be connected by the spanning tree $G_T = (V_T, K)$. The edge costs cannot be used directly and single edges are not inserted into the tree; distances and shortest paths among the terminal nodes must be used instead.

One way to solve the problem is to calculate the whole matrix of shortest paths among the terminal nodes and to use the shortest paths as edges of a new complete graph. The spanning tree is then created using the sorted distances among the terminal nodes, with the shortest paths among them taking the place of the edges of the original graph.

The newly proposed algorithm avoids the time-consuming calculation of the whole shortest paths matrix and tries to find the shortest paths and insert them into the resulting tree dynamically. The initialisation step is similar to the general approach: the sub-graph is initialised using only the terminal nodes, which creates $N_T = |V_T|$ disjoint components. The shortest paths connecting the disjoint components are then to be found.

The algorithm is based on the label-set algorithm for finding shortest paths. It starts by marking each node $i$ by a label $d_i = \infty$. All terminal nodes are used as roots of a shortest paths forest. Their labels are set to zero, $d_r = 0$ for all $r \in V_T$, and for each root an edge to its nearest successor is put into the queue (denoted as $F \cup \{(r,s)\} \to F$). The queue
$F$ is sorted according to the labels $d_s$. Each node is also marked as a member of its own component (for terminal nodes $r \in V_T$) or as a member of a zero component (not included in any real component) for all non-terminal (transient) nodes $i \notin V_T$.

A search algorithm of the label-set type is then started. An edge $(r,u)$ is taken from the queue and an edge to the next possible successor of the node $r$ is put into the queue. The label $d_u$ is the minimum value among the nodes in the queue, so the node $u$ is the nearest unprocessed node from any root (or any component). The following three cases may occur:

1. The node $u$ does not belong to a real component (it is a transient node). Its label is newly set and the node is stored in the queue for further processing (continuing the path). The node $u$ is also marked as reachable from the component to which the beginning node of the path belongs.
2. The node $u$ is a terminal node from a different component than the root of the path. In this case a new connection between two different (disjoint) components is found. The whole path from the root to the end node in the other component (all edges along that path) is inserted into the subset $K$, and the two components are joined and marked as a single component.
3. The node $u$ is a transient node already reachable from another component. In this case a new connection between two different (disjoint) components is found, but the path is not necessarily the shortest one yet. The connection should therefore be stored but not yet inserted into the spanning tree. It will be used if the markings of the node $u$ do not change before the processing of the queue progresses to label values of at least $D = d_{ru} + d_{su}$.

The reason is as follows. When a node $u$ is taken from the queue by a label-set algorithm, its label $d_u$ is the minimum label in the queue; in other words, all shorter paths have already been processed. If the path leads to a terminal node from another component, no shorter path between these components can exist and the new path can be used directly. If a transient node $u$ is processed on a path from a root $r$, let us denote its proposed label by $d_{ru}$. Let us suppose that the node is already marked with the label $d_{su}$ as reachable from a different component with a path root $s$. The total length of the path is then $D = d_{ru} + d_{su}$, and this path can be used only when the processing of nodes in the queue reaches that value, that is, when

$$d_{ru} + d_{su} \le \min_{k \in F} \{d_k\}$$

The situation is illustrated in Figure 1.
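The idea of growing a shortest paths forest from all terminal nodes at once can be sketched as follows. This is a simplified variant and not the authors' exact procedure: instead of inserting connections dynamically while the queue is processed, it runs the multi-root label-set search to completion and only then selects connections in Kruskal fashion, an approach usually attributed to Mehlhorn's construction of the distance network. A candidate connection is generated for every edge $(i,j)$ whose end nodes were reached from different roots, with length $d_i + c_{ij} + d_j$. The function name, the data structures and the example network are assumptions, and the network is assumed to be connected.

```python
import heapq
from math import inf


def terminal_spanning_tree(edges, terminals):
    """Total length and connections of a minimum spanning tree over `terminals`.

    Each connection is (length, r, s, (i, j)): terminals r and s are joined by the
    shortest paths r..i and j..s plus the edge (i, j). The network is assumed connected.
    """
    adj = {}
    for i, j, cost in edges:
        adj.setdefault(i, []).append((j, cost))
        adj.setdefault(j, []).append((i, cost))

    # Label-set search started from all terminals at once (shortest paths forest).
    d = {node: inf for node in adj}
    root = {node: None for node in adj}    # None plays the role of the zero component
    heap = []
    for r in terminals:
        d[r], root[r] = 0, r
        heap.append((0, r))
    heapq.heapify(heap)
    done = set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue                       # outdated heap entry
        done.add(u)
        for j, cost in adj[u]:
            if d[j] > du + cost:
                d[j], root[j] = du + cost, root[u]   # j is now reachable from root[u]
                heapq.heappush(heap, (d[j], j))

    # Every edge joining nodes reached from different roots is a candidate connection.
    candidates = sorted(
        ((d[i] + cost + d[j], root[i], root[j], (i, j))
         for i, j, cost in edges if root[i] != root[j]),
        key=lambda c: c[0],
    )

    # Kruskal-style selection on the components of terminal nodes.
    component = {r: r for r in terminals}

    def find(r):
        while component[r] != r:
            component[r] = component[component[r]]
            r = component[r]
        return r

    total, connections = 0, []
    for length, r, s, edge in candidates:
        cr, cs = find(r), find(s)
        if cr != cs:
            component[cr] = cs
            connections.append((length, r, s, edge))
            total += length
    return total, connections


# Hypothetical example network and terminal subset.
EDGES = [(0, 1, 2), (1, 2, 2), (2, 3, 1), (3, 4, 3), (1, 4, 4), (4, 5, 1), (0, 5, 7)]
total_length, connections = terminal_spanning_tree(EDGES, [0, 2, 4])
```

The sketch performs a single label-set search regardless of the number of terminal nodes, which is the property the proposed algorithm relies on. The paths themselves could be recovered by additionally storing a predecessor for every node.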
(diagram: roots $r$ and $s$, a transient node $u$ reached from both roots with labels $d_{ru}$ and $d_{su}$, and an alternative direct connection with $d_{rs} < d_{ru} + d_{su}$)
Figure 1 Creating shortest paths from 2 roots and alternative direct edge

A comparison of the traditional approach, which calculates all shortest paths (the distance matrix) among the terminal nodes, with the new method, which searches for shortest paths dynamically, shows that the expected gains can be reached. The computational complexity was compared using measured calculation times for both methods. Tests were run on a road network of the South Moravian region with 860 nodes and 1360 edges. The size of the tested subsets was varied from 100 to 800 terminal nodes. The increasing curve in the graph (Figure 2) shows the calculation times for the calculation of the distance sub-matrix and the subsequent search for a minimum spanning tree among the terminal nodes. The calculation time of the new algorithm, which searches for shortest paths dynamically while solving the minimum spanning tree problem, is shown by the flat curve. The advantage of the new algorithm can be seen in the graph.

The difference between the two approaches can be explained as follows. The calculation of the distance matrix among terminal nodes in the traditional algorithm demands one calculation of a shortest paths tree for each terminal node, that is, $n$ calculations for $n$ terminal nodes.
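For comparison, the traditional approach can be sketched as one label-set search per terminal node followed by Prim's algorithm on the complete graph of terminal nodes. The function names and the example network are assumptions made for illustration, and the network is assumed to be connected; the code used for the measurements reported here is not shown in the paper.

```python
import heapq
from math import inf


def shortest_tree(adj, root):
    """Label-set search returning the labels (distances from `root`) of all nodes."""
    d = {node: inf for node in adj}
    d[root] = 0
    heap, done = [(0, root)], set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue                     # outdated heap entry
        done.add(u)
        for j, cost in adj[u]:
            if d[j] > du + cost:
                d[j] = du + cost
                heapq.heappush(heap, (d[j], j))
    return d


def baseline_terminal_tree(edges, terminals):
    """Length of a minimum spanning tree over `terminals` via the distance sub-matrix."""
    adj = {}
    for i, j, cost in edges:
        adj.setdefault(i, []).append((j, cost))
        adj.setdefault(j, []).append((i, cost))

    terminals = list(terminals)
    # Distance sub-matrix: one shortest paths tree per terminal, n searches in total.
    dist = {r: shortest_tree(adj, r) for r in terminals}

    # Prim's algorithm on the complete graph over the terminal nodes.
    start = terminals[0]
    best = {t: dist[start][t] for t in terminals[1:]}   # cheapest link to the tree
    total = 0
    while best:
        t = min(best, key=best.get)      # nearest terminal not yet in the tree
        total += best.pop(t)
        for other in best:
            best[other] = min(best[other], dist[t][other])
    return total


# Hypothetical example network and terminal subset.
EDGES = [(0, 1, 2), (1, 2, 2), (2, 3, 1), (3, 4, 3), (1, 4, 4), (4, 5, 1), (0, 5, 7)]
baseline_length = baseline_terminal_tree(EDGES, [0, 2, 4])
```

The dictionary comprehension that fills `dist` is where the number of shortest paths tree calculations grows with the number of terminal nodes.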
(chart: Comparison of algorithms efficiency; vertical axis: calculation time [s], 0 to 2; horizontal axis: number of terminal nodes, 100 to 800; series: Spanning tree, Distance submatrix)
Figure 2 Efficiency comparison of a traditional approach vs. the proposed algorithm
The computational complexity of the traditional approach, measured in the number of shortest paths tree calculations, can therefore be approximated by $O(n)$. The new approach calculates the forest of shortest paths trees only once, independently of the number of terminal nodes, so in the same measure its computational complexity appears to be $O(1)$.
5
CONCLUSIONS

The problem of finding a minimum spanning tree on a subset of nodes has several practical applications and is by no means a purely theoretical problem. It can be used to find the cheapest set of transit services in public transport which satisfies the transport demands in a network with minimum costs for the service provider. The proposed solution cannot be used directly in practice, but it gives a lower bound on the provider's costs and can serve as a basis for further improvements from the passenger's point of view.

Another application is the routing and design of electricity, water, gas or other utility networks which connect different appliances and workplaces in a building, where routing is allowed only along predefined edges of an underlying mesh. The number of transient nodes in such cases is very large, and the proposed method can therefore help to solve problems of realistic size.

The presented algorithm is fast and finds an optimal solution of the minimum spanning tree problem. The practical applications mentioned above would be fully optimised by solving the Steiner tree problem on a graph, which is a much harder problem than the minimum spanning tree problem discussed in this paper. Nevertheless, we believe that the proposed search algorithm, after a proper modification, can help to find a good solution even for the Steiner tree problem.