Oracles for Distances Avoiding a Node or Link ... - Semantic Scholar

1 downloads 0 Views 346KB Size Report
We can also keep track of the shortest path avoiding any failed node or ... Our motivating scenario is a network where node/link-failures happen quite rarely.
Oracles for Distances Avoiding a Node or Link Failure∗ Camil Demetrescu



Mikkel Thorup



R. A. Chowdhury

§

Vijaya Ramachandran



Abstract We consider the problem of preprocessing an edge-weighted directed graph G to answer queries that ask for the shortest distance from any given node x to any other node y avoiding an arbitrary failed node or link. We describe an oracle (i.e, a simple data structure) for such queries that can be stored in O(n2 log n) space, and which allows queries to be answered in O(1) time, where n is the number of nodes in G. We also √ show that if we are willing to use Θ(n2.5 ) space, we can reduce the preprocessing time by a factor of n while maintaining the constant query time. We can also keep track of the shortest path avoiding any failed node or link by maintaining for each node the outgoing edge that should be used to get on such a path. AMS Subject Classifications. [05C85, 68W01] Graph algorithms; [68P05] Data structures; [05C38] Shortest paths; [90B18] Network failures.

1

Introduction

In the distance sensitivity problem we wish to construct a data structure (which we call the distance sensitivity oracle) for a network with links having nonnegative real-valued weights, that supports any sequence of the following two distance queries and the corresponding path queries: nf-distance(x, y, v): return the shortest distance from node x to node y of the network using paths that do not contain node v; lf-distance(x, y, u, v): return the shortest distance from node x to node y of the network using paths that do not contain link (u, v); The corresponding paths queries are specified respectively by nf-path(x, y, v) and lf-path(x, y, u, v). Our motivating scenario is a network where node/link-failures happen quite rarely. As soon as a node or link-failure has been noticed, we want to be able to answer distance queries and provide directions for shortest paths in the network without the failed node or link. On the other hand, we assume that we have plenty of time to compute a new data structure in the background. We note that the ability to deal with node/link-failures enables us to deal with some other related aspects of the network. For example, by dealing with a link-failure (u, v) we actually deal with arbitrary changes to a link weight. More precisely, we simply compute the distance from node x to node y in the network where (u, v) has its weight changed to w as ∗ Part of the results presented in this paper for link failures appeared in the Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02) and in the Proceedings of the 13th International Symposium on Algorithms and Computation (ISAAC’02). † Dipartimento di Informatica e Sistemistica, Universit` a di Roma “La Sapienza”, via Salaria 113, 00198 Rome, Italy. Email: [email protected]. URL: http://www.dis.uniroma1.it/~demetres. FAX: +39-06-85300849. Author supported in part by the DIMACS Special Focus on Next Generation Networks Technologies and Applications, by the IST Programme of the EU under contract n. IST-1999-14.186 (ALCOM-FT) and by the Italian Ministry of University and Scientific Research (Project “Alinweb: Algorithms for Internet and the Web”). This work has been done while visiting the AT&T Shannon Labs, 180 Park Avenue, Florham Park, NJ 07932. ‡ AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932. Email: [email protected]. URL: http://www.research.att.com/info/mthorup. § The University of Texas at Austin, Department of Computer Sciences, Austin, TX 78712. Email: [email protected]. URL: http://www.cs.utexas.edu/~shaikat. Supported in part by an MCD Graduate Fellowship and NSF Grant CCR-9988160. ¶ The University of Texas at Austin, Department of Computer Sciences, Austin, TX 78712. Email: [email protected]. URL: http://www.cs.utexas.edu/~vlr. Supported in part by Texas Advanced Research Program Grant 003658-0029-1999 and NSF Grant CCR-9988160.

1

min{lf-distance(x, y, u, v), distance(x, u)+ w + distance(v, y)}. Here a weight change could model that traffic is moving slower/faster along a certain link. An interesting application of dealing with single weight changes is a local search like the one in [9]. There, one wants to consider a neighborhood of a given weight setting, where each neighbor is obtained by changing a single weight. Another motivation for solving the distance sensitivity problem arises from recent interest in Vickrey pricing of networks [11, 19]. We describe this application in more detail at the end of this section. We model the network as a weighted directed graph G = (V, E, w), where w is a nonnegative real-valued weight function over E, |V | = n and |E| = m. A variant of the distance sensitivity problem, related to reachability in directed acyclic graphs with respect to edge failure, was first introduced by King and Sagert in [14], where they consider the problem of supporting sensitivity queries of the kind: “Is there a path from node x to node y which does not contain edge (u, v)?” Here, we are concerned with general weighted digraphs and distance queries instead of reachability queries, and consider both node and link failure. This problem is similar to the replacement paths problem ([11], also see the erratum for this paper, and [12, 18]) which, given a pair of nodes x and y in G, computes the set of shortest paths from x to y avoiding each of the nodes (or edges) on πx,y one at a time, where πx,y is the original shortest path from x to y in G. This problem can be solved in O(m + n log n) preprocessing time and O(n) space for both the node removal [18] and edge removal [11] case. However, in this paper, we are interested in finding replacement paths for all possible node pairs in a directed graph. Of a similar flavor is the most vital node (or arc) problem [1, 2, 4, 18], which is the problem of identifying the node (or edge) on a given shortest path, whose removal results in the longest replacement path. The most natural approach to the distance sensitivity problem would be to use one of the recent dynamic all pairs shortest paths (APSP) algorithms [5, 6, 13, 15], and delete the failed link (or the failed node – in which case we need to represent each node in the original graph as an edge by splitting it). The best bound (from [6]) takes O(n2 log2 n) amortized time for real weighted directed graphs, but then queries avoiding the failed node or link are answered in constant time. However, our goal here is to answer a query as quickly as possible after a node or link failure, and then we are actually better off just using an O(n log n + m) single source shortest paths (SSSP) algorithm [10] computing the answer from scratch. Another extreme solution would be to construct a table that for each node pair (x, y) and each node/edge stores the distance from x to y avoiding that node/edge. For node failures, such a table of size O(n3 ) is trivially computed by n APSP computations. For real-weighted sparse graphs, this takes O(mn2 + n3 log log n) time using a recent O(mn + n2 log log n) time APSP algorithm [20], and for dense graphs, it takes O(n4 (log log n/log n)1/2 ) time using an O(n3 (log log n/log n)1/2 ) time APSP algorithm [21]. However, for link failures the size of the table is O(mn2 ) and requires m APSP computations. Space can be reduced (at least for the link failure case) by working from one source x at a time, constructing a shortest paths tree T (x). This tree changes only if we remove any of the O(n) nodes or edges in it. Hence it is only for these nodes and edges that we need to record new distances from x to all other nodes y ∈ V . As a result, the total space requirement is O(n3 ) for both node and edge deletion. The query time is still O(1). As elaborated in [22], it is also quite easy to get a running time of O(mn2 + n3 log log n). However, still we consider the cubic space prohibitive for most practical applications. Our Results. The main result of this paper is a deterministic oracle with fast query time for both node and link failure that uses nearly the same space as that required for storing the distance matrix of the input graph. More precisely, we construct an oracle that uses O(n2 log n) space and answers distance queries under a node or link failure in O(1) worst-case time. This result is quite surprising, since the space bound is significantly smaller than n3 and yet our scheme answers queries in O(1) time. The construction time for our oracle is O(mn2 + n3 log n) in the worst case. However, if one is willing to settle for Θ(n2.5 ) space, we can improve preprocessing time to O(mn1.5 + n2.5 log n), and the query time remains constant. In Table 1 we put our bounds in perspective by comparing them to the bounds for related problems obtainable with previous algorithms: in the context of most vital node detection and Vickrey pricing, we are extrapolating the performance of algorithms designed for a single source-destination pair to the all-pairs case. To achieve our bounds, we construct data structures where we store information about all pairs shortest paths excluding only nodes or edges with specific properties, rather than excluding all possible nodes or edges. We also store information about shortest paths where we exclude all nodes on subpaths, rather than single nodes. We remark that our algorithms are very simple, and thus amenable to efficient implementations. 2

Oracle

Context

Graph Type

Construction Time

Space

Query Time

Nardelli et al. [18]

Most Vital Node Detection [nf-distance(x, y, v)]

Undirected

O(mn2 + n3 log n)

O(n3 )

O(1)

Hershberger & Suri [11]

Vickrey Pricing in Networks [lf-distance(x, y, u, v)]

Undirected

O(mn2 + n3 log n)

O(n3 )

O(1)

Method 1

Our Contribution

Node/Link Failure [nf-distance(x, y, v)] [lf-distance(x, y, u, v)]

O(mn2 + n3 log n) Directed (or Undirected)

O(n2 log n)

O(1)

Method 2 O(mn1.5 + n2.5 log n)

O(n2.5 )

O(1)

Table 1: Known Results and Our Contribution.

Part of the results presented in this paper for link failures were presented in two conference papers: one by the first two authors [7] which presented the two algorithms, and the other by the last two authors [3]. The paper [3] improves some of the results in [7], but it also has an extra claim on construction time which is incorrect. In this paper, we extend those oracles to efficiently handle node failures in addition to link failures. In the process, the presentation of the paper has been changed: we first present algorithms to handle node failures directly and then extend these algorithms to deal with link failures. Notation. Throughout the paper, we denote by T (x) a shortest path tree of G rooted at node x and we denote by πx,y the unique path from node x to node y in T (x). We also denote by wx,y the weight of edge (x, y) in b we denote G, by dx,y the weighted distance from x to y in G, and by hx,y the number of edges in πx,y . By G b b Since graph G where we have reversed the orientation of edges. Thus, π b and T denote π and T related to G. we can resolve ties by adding a small random fraction to each weight, we assume w.l.o.g. that all shortest paths are unique: this implies that πx,y and π ˆy,x use the same edges and for any node z ∈ πx,y , both πx,z and πz,y are subpaths of πx,y . We let BT (x) (i, j) denote the band of paths in T (x) that connect nodes at level i with nodes at level j > i in the tree, assuming that the level of x in T (x) is 0. Notice that, if πc,d ∈ BT (x) (i, j) then hc,d = j − i. Let a and b be nodes on πx,y . We say a < b if a appears before b on the path. When referring to πx,y , we denote by [a, b] the segment of πx,y that starts at a and ends at b, and includes both a and b. By [a, b) we denote the same segment but excluding the node b and the edge incident on it. Similarly, (a, b] excludes a and the edge u,v emanating from it. By πx,y we denote the shortest path from node x to node y in G avoiding the nodes in [u, v]. Finally, we define the notion of a detour. Definition 1.1. Let x ≤ a < u ≤ v < b ≤ y be nodes on πx,y . Consider a path px,y from x to y of the form πx,a · pa,b · πb,y , where · denotes the path concatenation operator and pa,b is a path from a to b. The path pa,b is said to be a detour for going from x to y in G avoiding the nodes in [u, v], provided πa,b ∩ pa,b = ∅. u,v Claim 1.1. Let πx,y be the shortest path from x to y avoiding the segment [u, v]. Then there exist nodes a and b u,v on πx,y with x ≤ a < u ≤ v < b ≤ y such that πx,y can be represented as πx,a · pa,b · πb,y with pa,b being a detour u,v and pa,b = πa,b . u,v Proof. Since πx,y avoids the nodes in [u, v], pa,b cannot intersect any node in [u, v]. Moreover, pa,b cannot intersect any node in the segment (a, u), since if it intersects such a node c then pc,b will be a subpath of pa,b and thus

3

π u,v a,b

π x,a a

x

u

v

π b,y b

y

u,v Figure 1: Optimal detour πa,b for nodes x, y in G without edge (u, v).

πx,c · pc,b · πb,y will be a shorter path than πx,a · pa,b · πb,y . A similar argument rules out the possibility of pa,b u,v intersecting a node in the segment (v, b). Thus, πx,y = πx,a · pa,b · πb,y . Now, if a a detour p0a,b shorter than pa,b 0 exists, πx,a · pa,b · πb,y will be a shorter path than πx,a · pa,b · πb,y and thus contradicting the first part of the claim u,v which has already been proved. Hence, pa,b = πa,b . u t In view of the above claim, we will say that pa,b is an optimal detour for a shortest path from x to y in G u,v u,v avoiding the nodes in [u, v] if πx,y = πx,a · pa,b · πb,y and pa,b = πa,b (see Figure 1). Note that nodes a and b mark u,v the longest common portions of πx,y and πx,y . Vickrey Pricing in Networks. The Vickrey mechanism is a generalization of the sealed bid second price auction, in which the highest bidder wins the auction but pays a price equal to the second highest bid. This auction protocol motivates a rational bidder to bid truthfully [17]. In a distributed network in which multiple rational self-interested agents own different parts of the network, Vickrey mechanism is often the best way to determine the utility of various network elements. In order to elicit truthful responses from the agents, each agent is compensated in proportion to the marginal utility he/she brings to the network. Willing manipulations by participating agents is eliminated by making an agent’s payment depend only on the declarations of other agents. Consider a scenario in which we need to find shortest paths in a network G, where links are owned by selfinterested agents. Agents are assumed to bid on each link individually. Nisan and Ronen [19] formulated the following expression as the payment pe (x, y) to be made to the owner of a link e for a given node pair (x, y): pe (x, y) =



dx,y (G|we =∞ ) − dx,y (G|we =0 ) 0

if e ∈ πx,y otherwise

For any e ∈ πx,y , the term dx,y (G|we =0 ) can be easily computed from the shortest path tree T (x) rooted at x as dx,y (G|we =0 ) = dx,y (G) − we = dx,u (G) + dv,y (G), where e = (u, v). But computing dx,y (G|we =∞ ) naively for all e ∈ πx,y requires running an O(m + n log n) time shortest paths algorithm [8, 10] on G \ e for each e ∈ πx,y . This requires O(m + n log n) × hx,y time in the worst case. This problem was studied in [11], but no improvement to this trivial bound is known. The distance sensitivity problem we study in this paper is a generalization of the above problem to the situation where one is potentially interested in finding all Vickrey payments for all node pairs (instead of a single pair). In this case our first algorithm can carry out the entire computation using significantly less space than that used by the na¨ıve algorithm. On the other hand, our second algorithm reduces both time and space requirements of the computation. The complexities of all three algorithms are compared in Table 2 (However, for very sparse graphs the running time of the na¨ıve algorithm can be improved to O(mn(m + n log log n)) [20]). Organization of the paper. The remainder of this paper is organized as follows. In Section 2 we show how to compute shortest paths in a directed graph G with nonnegative real-valued edge weights where we exclude entire paths and we define a “path cover” notion, showing how to use information about shortest paths that do not use entire paths to determine shortest paths excluding a single node. These tools are used in Section 3 to devise an oracle for the distance sensitivity problem that answers nf-distance queries in constant worst-case time using nearly the same space required for storing a single distance matrix. This oracle can be constructed in O(mn2 + n3 log n) worst-case time. In Section 4 we show that, if one is willing to settle for more space, we can reduce the preprocessing time to O(mn1.5 + n2.5 log n). This second oracle uses O(n2.5 ) space while still answering nf-distance queries in constant worst-case time. Section 5 shows how to extend the oracles designed for node failures to deal also with link failures and Section 6 discusses a space lower bound for the single-source version of the distance sensitivity problem. Finally, Section 7 provides some concluding remarks. 4

Algorithm

Time

Space

Query Time

Na¨ıve

O(n2 (m + n log n))

O(n3 )

O(1)

Our 1st Result

O(n2 (m + n log n))

O(n2 log n)

O(1)

Our 2nd Result

O(n1.5 (m + n log n))

O(n2.5 )

O(1)

Table 2: Complexity of computing Vickrey payments for all node pairs.

2 Shortest Distances Under Deletion of Paths In this section we provide simple algorithms for computing shortest distances in a directed graph G with nonnegative real-valued edge weights where we exclude an entire path. These algorithms will be useful in Section 3 and Section 4 for constructing distance sensitivity oracles. In particular, if P is a set of shortest paths in G, we consider the problem of designing a procedure exclude(G, x, P ) that computes for each path π ∈ P the shortest distances from node x to all other nodes in G when the nodes on π are deleted. Throughout this paper we will assume that deletion of the nodes on a given path π results in the deletion of all its nodes including its endpoints. We can compute exclude(G, x, P ) in O(|P |(m + n log n)) worst-case time using a Dijkstra computation [10] on the graph G with all nodes in π deleted, for each π ∈ P . In the following we show that this computation becomes considerably more efficient if the paths in P are ‘independent’, a notion we define below. Algorithm exclude-d. Let G, x and P be as above, and let π ∈ P . Let Tx (π) be the subtree of T (x) rooted at the first node of π, and let Wπ be the set of all nodes in Tx (π) except the nodes on π. We observe that only nodes in Wπ may have their distances from x increased if we remove from G the nodes on π. Hence, consider the following directed graph Gπ = (Vπ , Eπ , wπ ), where • Vπ = Wπ ∪ {x}. • Eπ contains an edge from x to each node in Wπ and the edges in G induced by nodes in Wπ . • wπ,a,b is the weight of edge (a, b) in Gπ defined as:  wπ,a,b =

minc6∈Tx (π) {dx,c + wc,b } wa,b

if a = x otherwise

It is not difficult to see (see proof of Claim 2.1) that the shortest path in G from x to a node v in Wπ when nodes on π are removed has the same length as the shortest path from x to v in Gπ . Hence shortest paths from x to all nodes in Wπ when nodes in π are deleted from G can be computed by a Dijkstra computation on Gπ with source x. The algorithm exclude-d(G, x, P ) works in the same way as exclude(G, x, P ), but it uses Gπ instead of G for each π in P . Since the graph Gπ is typically smaller than G, exclude-d can be expected to have better performance than exclude for any collection of paths P . We now define the notion of independent paths, and we show that exclude-d performs significantly better than exclude when P is a collection of independent paths. Definition 2.1. Let P be a set of shortest paths in G. The paths in P are independent in G w.r.t. x if for any π1 , π2 ∈ P , no node of π1 is a descendent of a node of π2 in T (x), and vice versa. Claim 2.1. If P is a set of independent paths, then algorithm exclude-d(G, x, P ) requires O(m + n log n) time in the worst case and computes the same output as exclude(G, x, P ). 5

1. 2. 3. 4. 5. 6. 7. 8. 9. 10.

function nf-distance(x, y, v) : R if dx,v + dv,y > dx,y then return dx,y l ← dlog 2 hx,v e if l = log 2 hx,v then return sl[x, y, l + 1] r ← dlog2 hv,y e if r = log2 hv,y then return sr[x, y, r + 1] u ˆ ← vr[x, v, l], vˆ ← vl[v, y, r] d ← min(dx,uˆ + sl[ˆ u, y, l], sr[x, vˆ, r] + dvˆ,y ) if hx,v ≤ hv,y then d ← min(d, dl[x, y, l]) else d ← min(d, dr[x, y, r]) return d

{v in left half} {v in right half}

Figure 2: Query algorithm for the first oracle. Proof. Using the notation given above, for a given π ∈ P , let π 0 x,y be the shortest path from x to any y ∈ Wπ avoiding the nodes on π. Let b be the first node on π 0 x,y such that b ∈ Wπ and let c be the node preceding b. Then let us write π 0 x,y = px,c · ec,b · π 0 b,y where ec,b is the edge from c to b. Since for any z 6∈ Tx (π), the path πx,z is composed entirely of the nodes in T (x) − Tx (π), it follows that px,c = πx,c . Also note that π 0 b,y cannot contain any node z 6∈ Tx (π), since if it contains such a node z then πx,z will be a shorter path to z contradicting our choice of b. These two observations justify the use of Gπ instead of G − π in order to compute the shortest path tree rooted at x avoiding the nodes on π. For each π ∈ P , |Vπ | ≤ the number of nodes in Tx (π). Since the paths in P are independent, any two such subtrees are disjoint for distinct paths in P . Since each Eπ contains edges in G induced by nodes in Wπ and edges from x to only the nodes in Wπ , it follows that the sum of the cardinalities of all Vπ and Eπ are linear in n and m, respectively. Hence exclude-d runs in O(m + n log n) time when P is a set of independent paths. u t Path Cover. In this section, we discuss a covering notion that will be crucial to proving the correctness of our query algorithms. Let x ≤ l1 < l2 ≤ r1 < r2 ≤ y be nodes on πx,y . We define distx,y (l1 , l2 , r1 , r2 ) to be the u,v shortest distance from x to y when nodes in [l2 , r1 ] are deleted and when the detour πa,b (see Definition 1.1) is constrained to have a in [l1 , l2 ) and b in (r1 , r2 ]. If x and y are clear from the context, we will denote this by just dist(l1 , l2 , r1 , r2 ). We will say that a value d covers [l1 , l2 ) × (r1 , r2 ] if d ≤ dist(l1 , l2 , r1 , r2 ). Claim 2.2. For any x < u b≤u e < u ≤ v < ve ≤ vb < y on πx,y ,

distx,y (x, u, v, y) = min(distx,y (x, u e, ve, y), distx,y (b u, u, v, y), distx,y (x, u, v, vb))

Proof. Let the end-points of the optimal detour in the shortest path from x to y that avoids [u, v] be a and b, a < b, and consider all possible positions of a in [x, u): • a in [x, u b): dist(x, u, v, b v ) handles the cases when b lies in (v, vb] and dist(x, u e, ve, y) handles the cases when b lies in (b v , y]. Thus, min(dist(x, u, v, vb), dist(x, u e, ve, y)) handles all possible positions of b in (v, y]. • a in [b u, u): dist(b u, u, v, y) handles the cases where b lies in (v, y].

Thus, for each possible position of a in [x, u), min(distx,y (x, u e, ve, y), distx,y (b u, u, v, y), distx,y (x, u, v, vb)) handles all possible positions of b in (v, y]. u t 3 Distance Sensitivity Oracles with Query Time O(1) and Space O(n2 log n) In this section we describe a deterministic oracle for single node failure with constant query time that uses nearly the same space required for storing a single distance matrix. In particular, we show how to preprocess a graph with nonnegative real-valued edge weights in O(mn2 + n3 log n) worst-case time, producing a compact oracle that uses O(n2 log n) space and answers nf-distance queries in O(1) worst-case time. Data structure. Using O(n2 log n) space, we maintain each dx,y and hx,y , and we maintain six matrices dl, dr, sl, sr, vl, and vr of size n × n × log n defined for any x, y and for any i, 1 ≤ i ≤ dlog2 hx,y e, as follows: 6

^ Paths covered by sl[u,y,l]

x

^ u

~ u

~ v

v

^ v

y

2l-1 ^ Paths covered by sr[x,v,r]

x

^ u

~ u

~ v

v

^ v

y

2r-1 Paths covered by dl[x,y,l]

x

^ u

~ u

~ v

v

^ v

y

2l-1 Figure 3: Query example with hx,v ≤ hv,y : The shortest distance from node x to node y without using node v can be obtained by taking the minimum of dx,ˆu + sl[b u, y, l], sr[x, vb, r] + dvˆ,y , and dl[x, y, l]. Notice that the union of the paths in the grey areas in the figure is the set of all possible paths from x to y that do not use node v. • dl[x, y, i] = shortest distance from node x to node y in G without the nodes on the sub-path of πx,y starting at level 2i−1 and ending at level 2i − 1 in T (x); b without the nodes on the sub-path of π • dr[x, y, i] = shortest distance from node y to node x in G by,x starting i−1 i b at level 2 and ending at level 2 − 1 in T (y); • sl[x, y, i] = shortest distance from node x to node y in G without the node of πx,y on level 2i−1 in T (x);

b without the node of π • sr[x, y, i] = shortest distance from node y to node x in G by,x on level 2i−1 in Tb(y); • vl[x, y, i] = node of πx,y at level 2i−1 in T (x);

• vr[x, y, i] = node of π by,x at level 2i−1 in Tb(y).

Preprocessing. Distances dx,y and hx,y and matrices vl and vr are easily initialized from shortest path trees of G. To compute dl[x, y, i], we call procedure exclude(G, x, BT (x)(2i−1 , 2i −1)), discussed in Section 2, for each x and for each i, 1 ≤ i ≤ dlog2 hx,y e (recall that BT (x) is defined under Notation in Section 1). To compute dr[x, y, i], we call i−1 i b y, B procedure exclude(G, b(y) (2 , 2 −1)), for each y and for each i, 1 ≤ i ≤ dlog2 hx,y e. For each x and for each T i, 1 ≤ i ≤ dlog2 hx,y e, we compute sl[x, y, i] by calling procedure exclude-d(G, x, BT (x)(2i−1 , 2i−1 )). We compute i−1 i−1 b y, B sr[x, y, i] by calling procedure exclude-d(G, b(y) (2 , 2 )), for each y and for each i, 1 ≤ i ≤ dlog2 hx,y e. T 7

Query. The query algorithm is shown in Figure 2. In line 1 of the algorithm, we get rid of the case where v ∈ / πx,y and return dx,y as the answer. Lines 2 and 3 take care of the case where v is 2l edges away from node x on πx,y for some nonnegative integer l, 0 ≤ l < log2 hx,y . Lines 4 and 5 handle the case where v is 2r edges away from node y on π by,x for some nonnegative integer r, 0 ≤ r < log2 hx,y . Lines 6 to 9 take care of the remaining cases.

Correctness. Using the matrices sl and sr lines 2 to 5 of the query algorithm answer the following two types of trivial queries: (1) hx,v = 2l for some nonnegative integer l, 0 ≤ l < log2 hx,y , and (2) hv,y = 2r for some nonnegative integer r, 0 ≤ r < log2 hx,y . So in order to prove the correctness of nf-distance, we need only to prove the correctness of the code segment of lines 6 to 9 that handles the nontrivial case when none of the above four conditions hold. Here, dx,ˆu + sl[b u, y, l] = distx,y (b u, u, v, y) and sr[x, vb, r] + dvˆ,y = distx,y (x, u, v, vb). Now let us consider the case when hx,v ≤ hv,y (see Figure 3). In this case, 0 < hx,ˆu < huˆ,v = 2l−1 ≤ hv,ˆv . dl[x, y, l] is the distance from x to y avoiding a sub-path having 2l−1 nodes and starting at a node that is 2l−1 edges away from x on πx,y . Let the endpoints of that sub-path be u e and ve, and hx,˜u < hx,˜v . So, we have, x < u b≤ u e < v < ve ≤ vb < y. Thus, by Claim 2.2, d covers [x, v) × (v, y]. By construction of matrices dl, sr, and sr, d is always equal to the weight of some path from x to y that does not use node v. Thus, we can conclude that d is the desired answer to nf-distance(x, y, v). A similar argument holds for the case when hx,v > hv,y . Claim 3.1. Preprocessing requires O(mn2 + n3 log n) worst-case time and any nf-distance operation requires O(1) worst-case time. Proof. We observe that the number of distinct paths in BT (x) (2i−1 , 2i − 1) is exactly the same as the number of nodes on level 2i − 1 in T (x). Hence, the total number of distinct paths in all BT (x) (2i−1 , 2i − 1), for 1 ≤ i ≤ dlog2 hx,y e, is bounded P from above by the number of nodes in T (x). Since exclude(G, x, P ) runs i−1 i , 2 − 1)| = O(n), the matrices dl and dr can be in O(|P |(m + n log n)) time and 1≤i≤dlog2 hx,y e |BT (x) (2 calculated in O(mn2 + n3 log n) worst-case time. On the other hand, for each x and for each i, 1 ≤ i ≤ dlog2 hx,y e, we can compute sl[x, y, i] by calling procedure exclude-d(G, x, BT (x)(2i−1 , 2i−1 )) since the single-node paths in BT (x) (2i−1 , 2i−1 ) are trivially independent. Since exclude-d(G, x, BT (x)(2i−1 , 2i−1 )) runs in O(m + n log n), the total time required to compute the matrix sl is O(mn log n + n2 log2 n). Similarly the matrix sr can be computed in O(mn log n + n2 log2 n) time. Hence the preprocessing time is dominated by the time to compute the dl and dr matrices and requires O(mn2 + n3 log n) worst-case time. Since the query algorithm executes a constant number of steps, it runs in O(1) worst-case time. u t 4 Improving Preprocessing Time In this section we show that, if one is willing to settle for more space, we can design a distance sensitivity oracle where we reduce the preprocessing time to O(mn1.5 + n2.5 log n). This second oracle uses O(n2.5 ) space and answers distance queries in O(1) worst-case time.

Data structure. We maintain each dx,y √ and hx,y , and we maintain five matrices dh, dt, √ vc, dc, and bc using O(n2.5 ) space. Matrix dh has size n × n × n, matrices dt, vc, and dc have size n × n × 2 n, and matrix bc has size n × n. They are defined as follows: • dh[x, y, i] = shortest distance from node x to node y in G without the node v on πx,y for which hx,v = i, √ where 0 < i ≤ n; • dt[x, y, i] √ = shortest distance from node x to node y in G without the node v on πx,y for which hv,y = i,where 0 < i ≤ 2 n; • vc[x, y, i] = √ node of πx,y at level qi in T (x), where q0 = 0 < q1 < ·√ · · < qk < n is any increasing sequence of k + 1 ≤ 2 n positive numbers depending on x, and qi − qi−1 ≤ n for any i, 1 ≤ i ≤ k (the method of obtaining the sequence {qi : 0 ≤ i ≤ k} is described later); • dc[x, y, i] = shortest distance from ˆ = vc[x, y, i] + 1 √ node x to node y in G without the nodes in πuˆ,ˆv , where u and vˆ = vc[x, y, i + 1], if hvˆ,y > n, and undefined otherwise; • bc[x, y] = index i such that y is at a level between qi + 1 and qi+1 in T (x). 8

l0

x

s0

q0

x

s1

s3

q3 q4 q5 q6

s4

q7

s2 l2 l3

(a) Sequence {li} obtained by cutting T'(x) at vertices of degree >1

x

q1 q2

l1

(b) Sequence {si} obtained by cutting at regular intervals of height √n

(c) Sequence {qi} obtained by merging sequences {li} and {si}

Figure 4: Constructing matrix dc: cutting trees T 0 (x) to form bands of edge-disjoint paths Preprocessing. As distances dx,y and hx,y are easily initialized from shortest path trees of G, we focus on constructing matrices dh, dt, dc, vc, and bc. Since bands BT (x) (i, i) contain paths formed by single nodes, which we note are trivially independent w.r.t. x, constructing matrix dh can be done by performing calls to algorithm √ exclude-d(G, x, BT (x) (i, i)), presented in Section 2, for each node x and for each i such that 0 < i ≤ n. b y, B Similarly, matrix dt can be initialized via calls to algorithm exclude-d(G, b(y) (i, i)) for each node y and for T √ each i such that 0 < i ≤ 2 n. √ To compute dc, we consider the problem of cutting each shortest path tree T (x) into at most 2 n bands of √ height ≤ n, finding a suitable subset of each band containing paths independent w.r.t. x, and calling exclude-d to compute shortest distances without those paths. √ More in detail, for each node √ x we wish to find a sequence q0 = 0 < q1 < · · · < qk < n of length k + 1 ≤ 2 n such that qi − qi−1 ≤ n for any i, 1 ≤ i ≤ k, and compute a subset of paths in BT (x) (qi + 1, qi+1 ) which are independent w.r.t. x. The following claim provides a nice combinatorial property on trees that helps us solve the problem. Let T be a tree with n nodes and let size(v) be the number √ of nodes in the subtree of T rooted at v. Let T 0 be obtained from T by deleting any node u such that size(u) ≤ n. √ Claim 4.1. For any tree T with n nodes, at most n nodes in T 0 have out-degree > 1. √ √ Proof.√Since T 0 contains only nodes that have size > n in T , T 0 has at most n leaves. This implies that at most n nodes of T 0 have out-degree > 1. u t 0 Let l0 < l1 < · · · < lk be the sequence of levels is a node with out-degree √ in T such that at each level li there 0 > 1 in T (Figure 4a). By Claim 4.1, k ≤ n. We now observe that cutting T 0 at each li yields bands of node-disjoint paths:

Claim 4.2. For any i, 1 ≤ i < k, BT 0 (li + 1, li+1 ) is a band of node-disjoint shortest paths. Proof. BT 0 (li + 1, li+1 ) contains all the paths that connect nodes at level li + 1 with nodes at level li+1 in T 0 . The proof easily follows by observing that, by the definition of sequence {li }, all nodes in T 0 at levels li + 1 to li+1 − 1 have out-degree ≤ 1. u t Notice that, since T 0 is obtained by pruning T , BT 0 (li + 1, li+1 ) ⊆ BT (li + 1, li+1 ). Moreover, since paths in BT 0 (li +1, li+1 ) are node-disjoint and connect nodes at level li +1 to nodes at level li+1 in T 0 , they √ are independent in G w.r.t. the root of T 0 . Unfortunately, however, we are not guaranteed that li − li−1 ≤ n, as we need for constructing dc. However, we note that splitting a band of node-disjoint paths yields again bands of node-disjoint paths. This leads to the following claim: Claim 4.3. If BT (i + 1, j) is a band of node-disjoint shortest paths, then for any i < h < j, both BT (i + 1, h) and BT (h + 1, j) are bands of node-disjoint shortest paths. 9

1. 2. 3. 4. 5. 6.

7.

function nf-distance(x, y, v) : R if dx,v + dv,y √ > dx,y then return dx,y if hv,y ≤ 2 n then return dt[x, y, hv,y ] i ← bc[x, v] u ˆ ← vc[x, y, i] + 1 vˆ ← vc[x, y, i + 1] d ← min { dc[x, y, i], dx,uˆ + dh[ˆ u, y, hu,v ˆ ], dt[x, vˆ, hv,ˆv ] + dvˆ,y } return d

Figure 5: Query algorithm for the second oracle. √ Let s0 < s1 < · · · < s√n be a sequence such that si = i · n (Figure 4b). By Claim 4.3, if we √ merge n with sequences {li } and {si } and get rid of duplicates, we obtain an ordered sequence {q } of length ≤ 2 i √ the desired properties√(Figure 4c). We remark that {qi } induces at most 2 n bands of node-disjoint paths in T 0 (x) with height ≤ n. These paths are independent w.r.t x. To initialize √ dc, we can thus perform calls to exclude-d(G, x, BT 0(x) (qi + 1, qi+1 )) for each node x and for each 0 < i ≤ 2 n. Again, we can use exclude-d instead of exclude. At this point, one may argue that excluding only independent paths in T 0 (x), which is obtained by pruning T (x), might not give the correct result for some dc[x, y, i] if y 6∈ T 0 (x). However, dc[x, y, i] is defined only when √ hvˆ,y > n, where vˆ = vc[x, y, i + 1], and vˆ ∈ T 0 (x) in this case. Thus dc[x, y, i] is correctly computed by calling exclude-d(G, x, BT 0(x) (qi + 1, qi+1 )). Finally, we observe that once sequences {qi } have been computed for each T (x), matrices vc and bc are easily initialized. √ Query. The query algorithm is shown in Figure 5. We first get rid of the cases where v 6∈ πx,y and hv,y ≤ 2 n, for which the answers are stored explicitly in dx,y and dt[x, y, hv,y ], respectively (lines 1–2). In line 3 we retrieve the unique index i such that v falls in BT (x) (qi + 1, qi+1 ), and then in lines 4–5 we identify the nodes u b and vb on the path πx,y in T (x) that are at levels qi + 1 and qi+1 , respectively. The correct answer is given in line 6 by accessing matrices dc, dh, and dt. Correctness. of nf-distance in the case √ that lines 3–7 are executed, we first note √ √ To prove the correctness that hv,y > 2 n implies hvˆ,y > n, since by construction qi − qi−1 ≤ n, and thus dc[x, y, i] is well-defined. We u,v now prove that the answer takes into account all possible configurations of the end-points of detours πa,b (see Figure 1). It is easy to see that x < u b < v < vb < y and: • dc[x, y, i] = distx,y (x, u b, vb, y)

• dx,ˆu + dh[ˆ u, y, huˆ,v ] = distx,y (b u, v, v, y) • dt[x, vˆ, hv,ˆv ] + dvˆ,y = distx,y (x, v, v, vb)

Thus, by Claim 2.2, d = distx,y (x, v, v, y).

Claim 4.4. Preprocessing requires O(mn1.5 + n2.5 log n) worst-case time and any nf-distance operation requires O(1) worst-case time. Proof. Growing shortest path trees T (x) for all nodes x requires O(mn + n2 log n) in the worst-case [10]. The proof √ for the preprocessing follows from Claim 2.1 by observing that initializing dc, dh, and dt is carried out via O( n) calls to exclude-d for each node x. The bound for queries is straightforward. u t 5 Handling Link Failures The oracles in the previous two sections can be easily extended to handle link failures by maintaining one additional matrix de of size n × n for any x and y: 10

1. 2. 3. 4. 5.

function lf-distance(x, y, u, v) : R if dx,u + wu,v + dv,y > dx,y then return dx,y d1 ← nf-distance(x, y, u) d2 ← dx,u + de[u, y] d ← min(d1 , d2 ) return d

Figure 6: Query algorithm for link failure. • de[x, y] = shortest distance from node x to node y in G without the first edge of πx,y . Claim 5.1. The matrix de can be initialized in O(mn + n2 log n) time. Proof. The proof directly follows from the properties of an earlier version of the algorithm exclude-d in [7] based on the notion of edge-independent paths. However, since in this paper we use the notion of node-independent paths instead, we present a proof of the claim based on it. Consider a given T (x) and let v1 , v2 , · · · vk be the children of x in T (x). We extend T (x) to T 0 (x) and thus G to G0 by introducing k new nodes u1 , u2 , · · · uk and for 1 ≤ i ≤ k, replacing each edge (x, vi ) in T (x) by two consecutive edges (x, ui ) and (ui , vi ). Clearly removing any node ui from T 0 (x) has the same effect as removing the corresponding edge (x, vi ) from T (x). Therefore, since the single node paths in BT 0 (x) (1, 1) are trivially independent, we can compute de[x, y] for all y ∈ V − {x} in time O(m + n log n) by calling exclude-d(G0, x, BT 0 (x) (1, 1)). (Note that we can perform the same computation in the same time bound without constructing G0 explicitly, but instead extending exclude-d to handle this special case.) Since we need to call exclude-d once for each x ∈ V , the total initialization time for de is O(mn + n2 log n). u t Query. The query algorithm is shown in Figure 6. In line 1 of the algorithm, we get rid of the case where the failed link is not on πx,y . In line 2, d1 is assigned the shortest distance from x to y avoiding node u and thus edge (u, v). In line 3, d2 is assigned the shortest distance from x to y that avoids the edge (u, v) but passes through the node u. In line 4, d is assigned the minimum of d1 and d2 which is then returned in line 5 as the shortest x to y distance avoiding (u, v). Correctness. We observe that the paths in G from x to y that avoid (u, v) can be divided into two groups: (1) paths that avoid u and (2) paths that pass through u. Line 2 of the algorithm finds the length of the shortest path in group (1) and line 3 does the same for group (2). Thus the minimum of the two distances obtained in lines 2 and 3 gives the required distance. Note that since nf-distance runs in constant worst-case time, so does lf-distance. Remark. We also observe that the shortest distance from any node x to another node y avoiding two consecutive b which is the dual failed links on πxy , can be computed in constant time by maintaining another n × n matrix de, b of de in G. b b without the first edge of π • de[x, y] = shortest distance from node y to node x in G ˆy,x .

Assuming that the two consecutive failed links are (u, v) and (v, w), all paths in G from x to y avoiding those two edges can be divided into two groups: (1) paths that avoid v: the length of the shortest such path can be found by calling nf-distance(x, y, v), and (2) paths that pass through v: the length of the shortest such path is b given by de[x, v] + de[v, y]. Thus the smaller of these distances is the required shortest distance. 6

A Space Lower Bound

In this section, we briefly discuss a space lower bound for the single-source version of the distance sensitivity problem. This version is relevant, e.g., in routing in networks, where routers typically only need to know which outgoing link to choose to get on a shortest path to a packet’s destination [16]. Without link-failures, this can be 11

represented in O(n) space with a single-source shortest path tree. In Section 3, we have seen that one can represent all pairs shortest paths with failures in O(n2 log n) space. A natural question is whether there is something to do for the single-source case. If d is the maximal number of hops to the destination, in [] it is shown that one can store distances avoiding a failed link in O(d · n) space. We now discuss the negative result that for any number m of edges in a graph G, we may need Ω(m) space to represent distances with link-failures. The graph is as follows. Consider a chain over the nodes v0 , . . . , vn−1 , where v0 is the source and n is the number of nodes of G. Each edge (vi , vi+1 ) has weight 2 · n. If an edge (vi , vj ), j > i + 1, is in G, it gets weight (4 · n − i − j) · n + id(i, j), where id(i, j) < n. Hence, if link (vi , vi+1 ) fails and we want to go to vj from v0 , the unique optimal replacement link to use is (vi , vj ), if it is in G. This holds even in the undirected case. The graph has up to n replacement edges, and for each of them, there is at least a link-failure and a destination that will need it. Hence, a link-failure oracle will have to know the edges of G, and will need to know all the id(i, j) values. This yields Ω(m) space, which might be as high as Ω(n2 ) for dense graphs. 7 Conclusions We have presented compact data structures for maintaining information about shortest paths in a weighted directed graph in case of both node failures and link failures. We have shown that, surprisingly, such a data structure can be stored using nearly the same space required to store a single distance matrix, while still supporting queries in constant time. Our oracle can be easily constructed in O(mn2 + n3 log n) time, matching the preprocessing time of all pairs variants of similar problems such as most vital node detection [18] and Vickrey pricing [11] in networks, which require O(n3 ) space. However, we have shown that using O(n2.5 ) space we can improve construction time to O(mn1.5 + n2.5 log n). Differently from the case of dynamic algorithms, where distances have to be updated after each node or link failure, our oracles are already prepared to any possible failure of a single node or link, and so the delay time in answering a query is minimized. If failures in a network happen quite rarely, when a node or link goes down we have time to construct a new oracle in the background to cope with a possible additional failure. It would be interesting to explore whether compact oracles with fast query time that are able to deal with more than one failure at a time can be constructed. Finally, can we further improve construction time? Acknowledgments The first two authors wish to thank the anonymous referees for some very useful suggestions for improving the presentation of the conference version of their paper. References [1] M.O. Ball, B.L. Golden, and R.V. Vohra. Finding the most vital arcs in a network. Operations Research Letters, 8:73–76, 1989. [2] A. Bar-Noy, S. Khuller, and B. Schieber. The complexity of finding most vital arcs and nodes. Technical Report No CS-TR-3539, Institute for Advanced Studies, University of Maryland, College Park, MD, 1995. [3] R.A. Chowdhury and V. Ramachandran. Improved distance oracles for avoiding link-failure. In Proc. of the 13th International Symposium on Algorithms and Computation (ISAAC’02), Vancouver, Canada, LNCS 2518, pages 523– 534, 2002. [4] H.W. Corley and D.Y. Sha. Most vital links and nodes in weighted networks. Operations Research Letters, 8:157–160, 1982. [5] C. Demetrescu and G.F. Italiano. Fully dynamic all pairs shortest paths with real edge weights. In Proc. of the 42nd IEEE Annual Symposium on Foundations of Computer Science (FOCS’01), Las Vegas, Nevada, 2001. [6] C. Demetrescu and G.F. Italiano. A new approach to dynamic all pairs shortest paths. To appear in Proc. of the 35th Annual ACM Symposium on Theory of Computing (STOC’03), San Diego, California, 2003. [7] C. Demetrescu and M. Thorup. Oracles for distances avoiding a link-failure. In Proc. of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02), San Francisco, California, pages 838–843, 2002. [8] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959. [9] B. Fortz and M. Thorup. Internet traffic engineering by optimizing OSPF weights. In Proceedings of the 19th IEEE INFOCOM - The Conference on Computer Communications, Tel-Aviv, Israel, pages 519–528, 2000. [10] M.L. Fredman and R.E. Tarjan. Fibonacci heaps and their use in improved network optimization algorithms. Journal of the ACM, 34:596–615, 1987.

12

[11] J. Hershberger and S. Suri. Vickrey prices and shortest paths: What is an edge worth?. In Proc. of the 42nd IEEE Annual Symposium on Foundations of Computer Science (FOCS’01), Las Vegas, Nevada, pages 129–140, 2001. Erratum in FOCS’02. [12] J. Hershberger, S. Suri, and A. Bhosle. On the difficulty of some shortest path problems. In Proc. of the 20th International Symposium of Theoretical Aspects of Computer Science (STACS’03), Berlin, Germany, pages 343–354, 2003. [13] V. King. Fully dynamic algorithms for maintaining all-pairs shortest paths and transitive closure in digraphs. In Proc. 40th IEEE Symposium on Foundations of Computer Science (FOCS’99), New York City, New York, pages 81–99, 1999. [14] V. King and G. Sagert. A fully dynamic algorithm for maintaining the transitive closure. In Proc. 31st ACM Symposium on Theory of Computing (STOC’99), Atlanta, Georgia, pages 492–498, 1999. [15] V. King and M. Thorup. A space saving trick for directed dynamic transitive closure and shortest path algorithms. In Proceedings of the 7th Annual International Computing and Combinatorics Conference (COCOON’01), Guilin, China, LNCS 2108, pages 268–277, 2001. [16] J. Moy. OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley, 1999. [17] A. Mas-Collel, W. Whinston, and J. Green. Microechonomic theory. Oxford University Press, 1999. [18] E. Nardelli, G. Proietti, and P. Widmayer. Finding the most vital node of a shortest path. In Proceedings of the 7th Annual International Computing and Combinatorics Conference (COCOON’01), Guilin, China, LNCS 2108, pages 278–287, 2001. [19] N. Nisan and A. Ronen. Algorithmic mechanism design. In Proceedings of the 31st Annual ACM Symposium in Theory of Computation (STOC’99), pages 129–140, 1999. [20] S. Pettie. A faster all-pairs shortest path algorithm for real-weighted sparse graphs. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming (ICALP’02), Malaga, Spain, LNCS 2380, pages 85-97, 2002. [21] T. Takaoka. Subcubic cost algorithms for the all pairs shortest path problem. Algorithmica, 30:309–318, 1998. [22] M. Thorup. Fortifying OSPF/IS-IS against link-failure, 2001. Manuscript.

13