Functional Graph Algorithms with Depth-First Search
(Preliminary Summary)

David J. King    John Launchbury

University of Glasgow

Authors' address: Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland, United Kingdom. Electronic mail: {gnik, jl}@dcs.glasgow.ac.uk.

Abstract

Performing a depth-first search of a graph is one of the fundamental approaches for solving a variety of graph problems. Implementing depth-first search efficiently in a pure functional language has only become possible with the advent of imperative functional programming. In this paper we mix the techniques of pure functional programming in the same cauldron as depth-first search to yield a more lucid approach to viewing a variety of graph algorithms. This claim will be illustrated with several examples.

1 Introduction

Graph algorithms have long been a challenge to functional programmers. It has not been at all clear how to express such algorithms without using side effects to achieve efficiency. For example, many texts provide implementations of search algorithms which are quadratic in the size of the graph (see [7, 3, 2], for instance), compared with the standard linear implementations given for imperative languages (see [1], for instance). In this paper we implement a variety of algorithms based on depth-first search (DFS), obtaining linear time efficiency for them all.

The importance of depth-first search for graph algorithms was established by Tarjan and Hopcroft [10, 4]. They demonstrated how depth-first search could be used as a skeleton on which to build efficient graph algorithms: the code fragments relevant to a particular algorithm are embedded into the DFS procedure in order to compute relevant information while the search proceeds. While this is quite elegant, it has a number of drawbacks. Firstly, the DFS code becomes intertwined with the code for the particular algorithm, resulting in opaque programs. Secondly, reasoning about such algorithms is dynamic (the search is a process under discussion rather than a value), and such reasoning is complex. In response to this, DFS algorithms are commonly introduced with respect to the DFS tree (or forest in general), providing a static intermediate value for reasoning.

We build on this latter idea. If having an explicit DFS tree is good for reasoning then, so long as the overheads are not unacceptable, it is good for programming.

In particular, we present a variety of DFS algorithms as combinations of standard components, passing explicit intermediate values from one to the other. In doing so we gain a far greater degree of modularity than is usually found in implementations of these algorithms, while still retaining the standard complexity measure.

There is one place where we do need to use destructive update in order to obtain the same complexity (within a constant factor) as imperative graph algorithms. The Glasgow Haskell compiler provides extensions to the non-strict, pure functional language Haskell [5], including updatable arrays, and also allows us to encapsulate these state-based actions so that they return pure functional values. Consequently we obtain linear algorithms and yet retain the ability to perform purely functional reasoning on all but one reusable component.

2 Representing graphs

There are many ways to represent (directed) graphs. For our purposes, we use an array of adjacency lists. The array is indexed by vertices, and each component of the array is a list of those vertices reachable along a single edge. This adjacency structure is linear in the size of the graph, that is, the sum of the number of vertices and the number of edges. We use a standard Haskell immutable array, which gives constant time lookup (but not update; these arrays may be shared arbitrarily). We can use the same mechanism to represent undirected graphs as well, simply by ensuring that we have edges in both directions.

  type Graph = Array Vertex [Vertex]
  type Edge  = Assoc Vertex Vertex

  out :: Graph -> Vertex -> [Vertex]
  out g v = g ! v

  vertices :: Graph -> [Vertex]
  vertices = indices

  mapG :: (Vertex -> [Vertex] -> a) -> Graph -> Array Vertex a
  mapG f g = array (bounds g) [ v := f v (out g v) | v <- vertices g ]

  edges :: Graph -> [Edge]
  edges g = [ v := w | v <- vertices g, w <- out g v ]

  buildG :: (Vertex,Vertex) -> [Edge] -> Graph
  buildG b es = accumArray (flip (:)) [] b es

Figure 1: Graph abstract data type

In the abstract data type presented in Figure 1, Vertex can be any type belonging to the Haskell index class Ix, which includes Int and Char as well as many other types. Haskell arrays come with indexing (!) and the functions indices (returning a list of the indices) and bounds (returning a pair of the least and greatest indices). We provide graph versions of bounds and (!), namely vertices and out. The function mapG applies a function to every graph vertex, building a table (an array indexed by the vertices) of the results. For example, we might define

  outdegree :: Graph -> Array Vertex Int
  outdegree g = mapG numEdges g
    where numEdges v ws = length ws

to build a table of the number of edges leaving each vertex.

Haskell provides an easy and general method for building an array from an association list (essentially a list of pairs built using the infix := constructor). This operation takes linear time with respect to the length of the association list. So in linear time we can define a list of edges, and then convert it to a graph using the function buildG. For example,

  graph = buildG ('a','j')
            ['a':='b', 'a':='f', 'b':='c', 'b':='e',
             'c':='a', 'c':='d', 'e':='d', 'g':='h',
             'g':='j', 'h':='f', 'h':='i', 'h':='j']

will produce the array representation for the graph in Figure 2.

Figure 2: A directed graph

Then, to find the immediate successors of 'h', say, we compute:

  out graph 'h'

which returns ['f', 'i', 'j']. The function edges lets us go in the other direction, extracting an edge list from the array representation, again in linear time. This immediately gives us a way to create the transpose of a graph:

  transposeG :: Graph -> Graph
  transposeG g = buildG (bounds g) (map reverseE (edges g))
    where reverseE (v:=w) = w:=v

We extract the edges from the original graph, reverse their direction, and rebuild a graph with the new edges. Then, for example,

  out (transposeG graph) 'h'

will return ['g'].
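Combining transposeG with the table-building function mapG gives other per-vertex statistics almost for free. For instance, a table of in-degrees can be obtained by computing out-degrees on the transposed graph (our example, in the style of outdegree above):

  -- In-degree of each vertex: the out-degree in the transposed graph.
  indegree :: Graph -> Array Vertex Int
  indegree g = outdegree (transposeG g)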

3 Depth-first search

Depth-first search may be loosely described as follows. Initially, all the vertices of the graph are deemed "unvisited", so we choose one and explore an edge leading to a new vertex. Now we start at this vertex and explore an edge leading to another new vertex. We continue in this fashion until we reach a vertex which has no edges leading to unvisited vertices. At this point we backtrack, and continue from the latest vertex which does lead to new unvisited vertices. Eventually we reach a point where every vertex reachable from the initial vertex has been visited. If there are any unvisited vertices left, we choose one and begin the search again, until finally every vertex has been visited once, and every edge has been examined.

Figure 3: A depth-first forest of the graph

It is common to identify the spanning forest defined by a depth-first traversal of a graph. This forest (a list of trees) is depicted in Figure 3 for the graph in Figure 2. The (solid) tree edges are those graph edges which lead to unvisited vertices. The remaining graph edges are also shown, but in dashed lines. These edges are classified according to their relationship with the tree, namely: forward edges (which connect ancestors in the tree to descendants), back edges (the reverse), and cross edges (which connect nodes across the forest, but always from right to left). This standard classification is useful for thinking about a number of algorithms. In particular, we will take it as read that no graph contains left-right cross edges with respect to a depth-first spanning forest: if a forest did contain left-right cross edges, it could not have been created by a depth-first traversal.

4 Implementing depth-first search

The approach to DFS algorithms which we advocate in this paper is to manipulate the DFS forest explicitly. The first step, therefore, is to construct the depth-first spanning forest from a graph. For this we need two things: the first is an appropriate definition of trees and forests (Section 4.1), and the second is a method for marking vertices so that we can determine when a vertex has been visited (Section 4.2). Once we have seen how to build DFS forests (Section 4.3), we will see how to use them in a variety of algorithms (Section 5).

4.1 Representing trees and forests

In Figure 4 we provide an implementation of trees and forests. A forest is a list of trees, and a tree is a node containing some value, together with a forest of sub-trees. Both trees and forests are polymorphic in the type of data they may contain.

  data Tree a   = Node a (Forest a)
  type Forest a = [Tree a]

  preorder :: Tree a -> [a]
  preorder (Node a ts) = [a] ++ preorderF ts

  postorder :: Tree a -> [a]
  postorder (Node a ts) = postorderF ts ++ [a]

  preorderF :: Forest a -> [a]
  preorderF ts = concat (map preorder ts)

  postorderF :: Forest a -> [a]
  postorderF ts = concat (map postorder ts)

Figure 4: Tree and forest flattening functions

The functions preorder and postorder are standard flattening functions, preorder placing ancestors before descendants, and postorder doing the reverse. Both place the components of subtrees in left-to-right order. (Because of the repeated appends (++) caused by concat, these operations incur an extra logarithmic factor. Removing this is easy, but the definitions become a little less clear.)
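One standard way to remove that logarithmic factor is to thread an accumulating list parameter instead of appending; the following sketch is ours, not the paper's definitions:

  -- Flatten a tree onto the front of an already-flattened rest,
  -- so every cons is constant time and the whole pass is linear.
  preorderA :: Tree a -> [a] -> [a]
  preorderA (Node a ts) rest = a : preorderFA ts rest

  preorderFA :: Forest a -> [a] -> [a]
  preorderFA ts rest = foldr preorderA rest ts

  -- Then preorder t = preorderA t []; postorder is analogous.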

4.2 Marking vertices

The purpose of marking vertices during a search is to determine whether a vertex has been previously visited or not. One way of viewing this is to think of maintaining a set of those vertices which have been visited. The operations we require on the set are to add new members, and to test for membership. In imperative implementations of graph algorithms, marking is performed by destructive update, giving constant time for both operations. Indeed, within the standard von Neumann architecture there seems to be no way to obtain constant time for both operations unless destructive update is used. This means that, to implement the construction of the DFS forest, we will need to make use of explicit state within Haskell.

Imperative features were initially introduced into the Glasgow Haskell compiler to perform input and output [8]. The approach is based on monads [11], and can easily be extended to achieve in-situ array updates. Launchbury showed how the original model could be extended to allow the imperative actions to be delayed until their results are required [6]. This is the model we use. The type constructor of the monad is called Seq (because it sequences actions) and is an instance of the standard state transformer monad. So elements of type Seq Int, say, are functions which, when applied to the state, return a pair of an integer together with a new state. As usual we have the unit return and the sequencing combinator bind:

  return :: a -> Seq a
  return a s = (a,s)

  bind :: Seq a -> (a -> Seq b) -> Seq b
  (m `bind` k) s = k a t  where (a,t) = m s

but we will hide this behind some syntactic sugar. We extend Haskell expressions with an expression of the form {Q}, where

  Q ::= E  |  E; Q  |  x <- E; Q

which is expanded as follows:

  {E}          =  E
  {E; Q}       =  E `bind` \_ -> ({Q})
  {x <- E; Q}  =  E `bind` \x -> ({Q})

(In Haskell the symbols {, }, and ; are sometimes used as layout markers, but we will use them only for monad syntax.)
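As a small worked instance of the expansion (our illustration, using only return and bind from above), the expression {x <- return 1; return (x+1)} expands to

  return 1 `bind` \x -> (return (x+1))

a computation of type Seq Int which delivers 2 when run.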

The Seq monad provides three basic array operations:

  newArr   :: Ix a => (a,a) -> b -> Seq (ArrRef a b)
  readArr  :: Ix a => ArrRef a b -> a -> Seq b
  writeArr :: Ix a => ArrRef a b -> a -> b -> Seq ()

The first, newArr, takes a pair of index bounds (the type a must lie in the index class Ix) together with an initial value, and returns a reference to an initialised array. The time this operation takes is linear with respect to the number of elements in the array. The other two provide for reading and writing an element of the array, and both take constant time. Finally, the Seq monad comes equipped with a function newSeq:

  newSeq :: Seq a -> a

This takes a state-transformer function, applies it to an initial state, extracts the final value and discards the final state.
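As a small exercise of these primitives together with the {Q} sugar, here is a function of ours (not the paper's; the name countDups is hypothetical) that counts how many elements of a list have occurred earlier in it, marking seen elements in a mutable Boolean array, much as the search in Section 4.3 marks visited vertices:

  -- Count elements already seen, using a mutable array of marks.
  countDups :: Ix a => (a,a) -> [a] -> Int
  countDups bnds xs = newSeq {marks <- newArr bnds False;
                              go marks xs}
    where
      go marks []     = return 0
      go marks (y:ys) =
        {seen <- readArr marks y;
         writeArr marks y True;       -- mark y as seen
         n <- go marks ys;
         return (if seen then n+1 else n)}

For example, countDups ('a','z') "abca" returns 1.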


4.3 Performing depth-first search

The algorithm for DFS is given in Figure 5. The function dfs takes a graph g and a list of vertices vs, and returns the depth-first spanning forest of g. The list of vertices vs gives an initial ordering for searching the vertices, and is used to resume the search whenever one tree is completed. Clearly the head of vs will be the root of the very first tree.

  dfs :: Graph -> [Vertex] -> Forest Vertex
  dfs g vs = newSeq {marks <- newArr (bounds g) False;
                     search marks vs}
    where
      search :: ArrRef Vertex Bool -> [Vertex] -> Seq (Forest Vertex)
      search marks []     = return []
      search marks (v:vs) =
        {visited <- readArr marks v;
         if visited then search marks vs
         else {writeArr marks v True;
               ts <- search marks (out g v);
               us <- search marks vs;
               return (Node v ts : us)}}

Figure 5: The depth-first search algorithm

5 Depth-first search algorithms

With the DFS forest available as an explicit value, a variety of graph algorithms can be expressed as combinations of the components built so far.

Algorithm 3. Connected components

Two vertices in an undirected graph are connected if there is a path between them. A directed graph may be searched as if it were undirected by adding the reverse of every edge:

  components :: Graph -> Forest Vertex
  components g = dfs (buildG (bounds g) (es ++ map reverseE es)) (vertices g)
    where es = edges g
          reverseE (v:=w) = w:=v

Each tree in the resulting forest then contains the vertices of exactly one connected component.

The undirected graph we actually search may have duplicate edges, but this has no effect on the structure of the components. For the graph in Figure 2, for example, components yields a single tree: the edge between h and f links the two halves of the picture once edges are treated as undirected.

Algorithm 4. Strongly connected components

Two distinct vertices in a directed graph are said to be strongly connected if each is reachable from the other. A strongly connected component is a maximal subgraph in which all the vertices are strongly connected with each other. This problem is well known to compiler writers as the dependency analysis problem: separating procedures/functions into mutually recursive groups. We implement the double depth-first search algorithm of Kosaraju (unpublished) and Sharir [9].

  scc :: Graph -> Forest Vertex
  scc g = dfs (transposeG g) (reverse (postOrd g))
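The ordering function postOrd flattens a complete depth-first forest of the graph in postorder; a minimal definition in terms of the Figure 4 combinators (our reconstruction; we assume this is the intended one) is:

  -- Postorder listing of all vertices, from a full DFS of the graph.
  postOrd :: Graph -> [Vertex]
  postOrd g = postorderF (dfs g (vertices g))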

We order the vertices of a graph using the postordering of vertices. The reverse of this ordering is used to perform a depth-first traversal of the transpose of the graph. The result is a forest, where each tree constitutes a single strongly connected component.

Intuition as to why the algorithm works comes from recalling the nature of the order of vertices produced by postOrd. We have already recognised that the only left-right edges in this order are back edges. If we reverse this list and perform a DFS on the transposed graph, then the only way to move forward in the list of vertices is by following what was a back edge in the original search, which is therefore bound to be in a cycle (follow the tree edges back to get to the original node). A minor variation on this algorithm is to reverse the roles of the original and transposed graphs:

  scc' :: Graph -> Forest Vertex
  scc' g = dfs g (reverse (postOrd (transposeG g)))

The advantage now is that not only does the result express the strongly connected components, but it is also a valid DFS forest for the original graph (rather than for the transposed graph). (For the graph of Figure 2, for example, the only strongly connected component with more than one vertex is {a, b, c}.)
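To recover the components as plain vertex lists (for dependency analysis, say), one can flatten each tree of the forest using the Figure 4 functions; a one-line sketch of ours (the name sccLists is hypothetical):

  sccLists :: Graph -> [[Vertex]]
  sccLists g = map preorder (scc' g)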

Algorithm 5. Finding reachable vertices

Finding all the vertices that are reachable from a single vertex v is a simple application of DFS. Commencing a search at v will construct a tree containing all of v's reachable vertices. We then flatten this with preorder to give a list of all the vertices reachable from v.

  reachable :: Graph -> Vertex -> [Vertex]
  reachable g v = preorderF (dfs g [v])

One application of this algorithm is to determine if there is a path between two vertices:

  path :: Graph -> Vertex -> Vertex -> Bool
  path g v w = w `elem` (reachable g v)

The elem test is lazy: it returns True the instant a match is found. Hence the result of reachable is demanded lazily, and so is only produced lazily: as soon as the required vertex is discovered, the generation of the DFS forest ceases. Thus dfs implements a true search and not merely a complete traversal. For instance, path graph 'a' 'd' returns True as soon as 'd' is encountered, without forcing the remainder of the spanning forest.

6 Evidence that we achieve linear complexity for DFS

The depth-first search algorithm presented should run in O(V + E) time (for a graph with V vertices and E edges). To provide experimental evidence we took measurements on the strongly connected components algorithm, which uses two depth-first searches and should also run in O(V + E) time. The results of our experiment are in Figure 6. Timings were taken on randomly generated graphs (with differing numbers of vertices and edges) and are accurate to approximately 1%. The plotted points clearly all lie on a plane, indicating the linearity of the algorithm.

Figure 6: Measurements taken on the strongly connected components algorithm (running time in seconds, plotted against the number of vertices and the number of edges, each ranging over 500-5000)

References

[1] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, Massachusetts, 1990.

[2] Rachel Harrison. Abstract Data Types in Standard ML. John Wiley and Sons, 1993.

[3] Ian Holyer. Functional Programming with Miranda. Pitman, London, 1991.

[4] John E. Hopcroft and Robert E. Tarjan. Algorithm 447: Efficient algorithms for graph manipulation. Communications of the ACM, 16(6):372-378, June 1973.

[5] Paul Hudak, Simon L. Peyton Jones, Philip Wadler, Arvind, Brian Boutel, Jon Fairbairn, Joseph Fasel, María M. Guzmán, Kevin Hammond, John Hughes, Thomas Johnsson, Richard Kieburtz, Rishiyur S. Nikhil, Will Partain, and John Peterson. Report on the functional programming language Haskell, Version 1.2. ACM SIGPLAN Notices, 27(5), May 1992.

[6] John Launchbury. Lazy imperative programming. In ACM SIGPLAN Workshop on State in Programming Languages, pages 46-56, Copenhagen, Denmark, June 1993.

[7] L. C. Paulson. ML for the Working Programmer. Cambridge University Press, Cambridge, 1991.

[8] Simon L. Peyton Jones and Philip Wadler. Imperative functional programming. In 20th ACM Symposium on Principles of Programming Languages, Charleston, South Carolina, January 1993.

[9] M. Sharir. A strong-connectivity algorithm and its applications in data flow analysis. Computers and Mathematics with Applications, 7(1), 1981.

[10] Robert E. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146-160, June 1972.

[11] Philip Wadler. The essence of functional programming (invited talk). In 19th ACM Symposium on Principles of Programming Languages, Albuquerque, New Mexico, January 1992.