Reliable Communication in Faulty Star Networks

Khaled Day, Department of Computer Science, Sultan Qaboos University, Muscat 123, Oman

Abdel-Elah Al-Ayyoub, Department of Computer Science and Information Systems, Jordan University of Science and Technology, Irbid 22110, Jordan

0-7695-1573-8/02/$17.00 (C) 2002 IEEE

Abstract: We take advantage of the hierarchical structure of the star graph network to obtain an efficient method for constructing node-disjoint paths between arbitrary pairs of nodes in the network. A distributed fault-tolerant routing algorithm for the star network based on this construction method is then presented and evaluated. The proposed algorithm adapts its routing decisions in response to node failures. Node failures and repairs may occur dynamically (at any time), provided that the total number of faulty nodes at any given time is less than n-1, the node connectivity of the n-star. We show that if node failures occur "reasonably" far apart in time, then all messages are routed on paths of length δ + ε, where δ is the minimum distance between the source and the destination and ε is 0, 2, or 4. In the unlikely case where more failures occur within a "short period", the algorithm still delivers all messages, but possibly via longer paths.

Index Terms: Fault tolerance, star graph, routing, node-disjoint paths, distributed algorithm.

1. Introduction

The star graph [1] has received considerable attention in the last decade as an attractive interconnection topology [2]. It can interconnect a large number of processors using a small number of communication channels while providing a high level of redundancy that makes it highly fault-tolerant [3]. The star graph network has many other attractive features, including vertex and edge symmetry, sub-logarithmic degree and diameter, a recursive structure, efficient routing and broadcasting, and the embedding of important interconnection graphs such as hypercubes, trees, cycles, and meshes [6-9,13,14,17,19,21,23]. The star network is optimally fault-tolerant and strongly resilient [3]. Its fault diameter is one plus its diameter [16,22]. A number of efficient algorithms have been designed to run on star-graph-interconnected multiprocessors [4,11,18,20].

In a large multiprocessor system, the probability of having faulty nodes (or links) approaches one as the size of the network increases. It is therefore essential to design, for such systems, fault-tolerant routing algorithms that can route messages between non-faulty nodes in the presence of faulty ones. Some fault-tolerance properties of the star graph are analyzed in [3,16,22]. A fault-tolerant routing scheme using depth-first search is presented in [24]. The scheme keeps information about the traversal path in a stack that is popped every time a message backtracks. However, this scheme guarantees neither liveness nor deadlock-free transmission. In fact, an example can be found in [5] where a message gets stuck by being continuously sent to the same node. In [5], a fault-tolerant routing scheme also based on depth-first search combined with backtracking is presented. There, a routed message carries a large amount of information about the traversed links and visited nodes. Furthermore, when an optimal path does not exist in a faulty n-star and assuming at most n-2 faults, their routing algorithm may require up to 2i+2 additional routing steps, where i is O( n ).

In this paper, we first present an efficient method for constructing n-1 node-disjoint paths between any pair of distinct nodes in the n-star network. We then design and evaluate a fault-tolerant routing algorithm based on this construction method. Our algorithm allows node failures and repairs to occur dynamically (at any time), provided that the total number of faulty nodes does not exceed n-2 at any given time. Under this condition, at least one of the n-1 node-disjoint paths between two fault-free nodes is fault-free. Our approach is based on identifying these fault-free paths and using them to route messages. When such a routing path becomes blocked due to a node failure, the blocked messages are de-routed and special warning messages are sent back to the message sources so that they switch to different node-disjoint paths. In our approach, failure information is propagated only to source nodes that have actually issued messages over blocked paths. When such a source node issues a message, and if the faulty node that blocks the message's path has not yet recovered, then, and only then, is the source node requested to switch to a different path for the corresponding destination. The techniques used to identify the paths, to propagate node failure information to source nodes, and to switch from one routing path to another incur little communication and computation overhead. However, each node has to maintain a routing table of n! entries, each identifying the node-disjoint path to use when sending to the corresponding destination node. These table entries are adjusted upon reception of warning messages. The algorithm can be readily extended to handle link failures in addition to node failures.

2. Preliminaries

An interconnection network is modeled as an undirected graph whose vertices correspond to the processors and whose edges correspond to the communication links. An n-star graph, denoted by Sn, is an undirected graph consisting of N = n! vertices labeled with the n! permutations of n symbols (we use the set of symbols {1, 2, ..., n}), such that there is an edge between two vertices if, and only if, their labels differ only in the first (leftmost) position and in exactly one other position. Sn is regular of degree n-1 and is vertex and edge symmetric [1,2].
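As a minimal sketch of this definition (not code from the paper), the Python below assumes that a node is stored as a tuple holding a permutation of 1..n; positions are 1-based in the text and 0-based in Python. The helper names `node`, `neighbor`, `adjacent`, and `star_graph` are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Node = Tuple[int, ...]


def node(label: str) -> Node:
    """Parse a label such as "2143" (single-digit symbols, so n <= 9)."""
    return tuple(int(c) for c in label)


def neighbor(x: Node, i: int) -> Node:
    """X(i): exchange the first symbol with the symbol at 1-based position i."""
    if i == 1:
        return x  # X(1) = X
    z = list(x)
    z[0], z[i - 1] = z[i - 1], z[0]
    return tuple(z)


def adjacent(x: Node, y: Node) -> bool:
    """Edge test: the labels differ in position 1 and in exactly one other position."""
    diff = [k for k in range(len(x)) if x[k] != y[k]]
    return len(diff) == 2 and diff[0] == 0


def star_graph(n: int) -> Dict[Node, List[Node]]:
    """Adjacency lists of S_n: n! vertices, each with neighbors X(2), ..., X(n)."""
    return {v: [neighbor(v, i) for i in range(2, n + 1)]
            for v in permutations(range(1, n + 1))}


g = star_graph(4)
print(len(g), len(g[node("2143")]))  # 24 vertices, degree 3
```

Because both labels are permutations of the same symbols, differing in exactly positions 1 and k already forces the two symbols to be swapped, so `adjacent` needs no further check.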
The following notations are used in the path construction given in the next section. Let X and Y be two nodes in the n-star graph Sn, and let α be one of the symbols in {1, 2, ..., n}.

Let i and j be two symbol positions (i.e., 1 ≤ i, j ≤ n). We introduce the following notations:
Xi: the symbol at the ith position of the permutation X.
pos(X, α): the position of the symbol α in the permutation X.
X → Y: an edge from node X to node Y.
X(i): the neighbor of X obtained by exchanging X1 and Xi. Notice that X(1) = X.
X(i,j): the node (permutation) obtained from X(i) by exchanging its first symbol with its jth symbol, i.e., X(i,j) = (X(i))(j).
$S_n^{\alpha/i}$: the subgraph of Sn containing all nodes with symbol α in position i.
π = X → ... → Y: π is a path from X to Y.
MinPath(X, Y): a minimum-length path between X and Y.
π1 // π2: the join of two paths π1 and π2. This operation is defined only when the last node of π1 coincides with the first node of π2; the result is a path containing the nodes of π1 followed by those of π2, without duplicating the common node.

A minimum-length path MinPath(X, Y) from X to Y can be constructed by repeatedly applying the following routing rules at every node Z, starting at X, until the destination Y is reached [2]:
(R1) If Z1 = Y1, then select the smallest i > 1 such that Zi ≠ Yi, and exchange Z1 and Zi.
(R2) Otherwise, let i = pos(Y, Z1), and exchange Z1 and Zi.

Failure Assumptions. We make the following failure assumptions:
1) All node faults are fail-stop, i.e., there are no malicious faults.
2) Fault detection and diagnosis algorithms exist, and each node knows exactly the failure status of all its neighbors.
3) Node failures and repairs may happen at any time, but the total number of faulty nodes never exceeds n-2.
We assume that each node's information about the failure status of its neighbors is updated dynamically by the fault detection and diagnosis algorithms.

3. Efficient Node-Disjoint Path Construction

A number of methods for constructing node-disjoint paths in star networks have been proposed in previous work [9,10,15]. However, these methods involve a cumbersome characterization of the node-disjoint paths and therefore cannot easily be used for efficient fault-tolerant routing. We start by presenting an easier-to-implement node-disjoint path construction method based on the hierarchical structure of the star graph. This method will be used as the basis for the fault-tolerant routing algorithm presented in the next section.

Let X = X1X2 ... Xn and Y = Y1Y2 ... Yn be two distinct nodes in the n-star graph. Let p be the smallest symbol position satisfying Xp ≠ Yp and p ≥ 2. The value of p plays a key role in the node-disjoint path construction. Let α be any symbol other than Xp and Yp, and let iα = pos(X, α) and jα = pos(Y, α). For each value of α, we construct a path πα between X and Y as follows:

πα = X [→ X(iα)] → X(iα,p) // MinPath(X(iα,p), Y(jα,p)) // Y(jα,p) → Y(jα) [→ Y].

In the path πα defined above, the starting edge X → X(iα) brings the symbol α to the first position. If α is already in the first position in X (i.e., if iα = 1), this edge is omitted; the square-bracket notation [→ X(iα)] expresses this possible omission. Similarly, the last edge Y(jα) → Y is omitted when Y(jα) equals Y (i.e., when jα = 1). The edge to X(iα,p) then moves α to position p, so both X(iα,p) and Y(jα,p) lie in $S_n^{\alpha/p}$; rules (R1) and (R2) never move a symbol that already sits in its correct position, so the minimum-length path between them stays inside this subgraph. Since there are n-2 values of α satisfying α ≠ Xp and α ≠ Yp, there are n-2 such paths πα.

Theorem 1: The above n-2 paths πα are node-disjoint. (Proof omitted due to size limitation.)

Theorem 2: The n-2 paths πα, together with the path $\pi_{Y_p}$, form a set of n-1 node-disjoint paths. (Proof omitted due to size limitation.)
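The construction of πα can be sketched in Python as follows, again assuming that nodes are tuples (this representation and all function names are illustrative, not the authors' code). `min_path` applies rules (R1) and (R2) until Y is reached, and `pi_alpha` assembles πα from its three parts. The position p is passed in by the caller rather than computed, so the sketch can be checked directly against the values given for Figure 1 (p = 3).

```python
from typing import List, Tuple

Node = Tuple[int, ...]


def node(label: str) -> Node:
    return tuple(int(c) for c in label)


def swap(z: Node, i: int) -> Node:
    """Z(i): exchange Z1 and Zi (1-based positions); Z(1) = Z."""
    if i == 1:
        return z
    t = list(z)
    t[0], t[i - 1] = t[i - 1], t[0]
    return tuple(t)


def pos(z: Node, a: int) -> int:
    """pos(Z, a): 1-based position of symbol a in Z."""
    return z.index(a) + 1


def min_path(x: Node, y: Node) -> List[Node]:
    """MinPath(X, Y) obtained by applying rules R1/R2 at every node."""
    path, z = [x], x
    while z != y:
        if z[0] == y[0]:  # R1: first mismatched position after position 1
            i = next(k for k in range(2, len(z) + 1) if z[k - 1] != y[k - 1])
        else:             # R2: send Z1 to its position in Y
            i = pos(y, z[0])
        z = swap(z, i)
        path.append(z)
    return path


def join(p1: List[Node], p2: List[Node]) -> List[Node]:
    """p1 // p2, defined only when p1 ends where p2 starts."""
    assert p1[-1] == p2[0]
    return p1 + p2[1:]


def pi_alpha(x: Node, y: Node, alpha: int, p: int) -> List[Node]:
    """X [-> X(i)] -> X(i,p) // MinPath(X(i,p), Y(j,p)) // Y(j,p) -> Y(j) [-> Y]."""
    assert alpha not in (x[p - 1], y[p - 1]), "alpha must differ from Xp and Yp"
    i, j = pos(x, alpha), pos(y, alpha)
    head = [x] if i == 1 else [x, swap(x, i)]  # bring alpha to the front
    head.append(swap(head[-1], p))              # X(i,p): alpha now at position p
    y_j = swap(y, j)                            # Y(j)
    tail = [swap(y_j, p), y_j] + ([y] if j != 1 else [])  # Y(j,p) -> Y(j) [-> Y]
    return join(join(head, min_path(head[-1], tail[0])), tail)


X, Y = node("2143"), node("1234")
for a in (1, 2):
    path = pi_alpha(X, Y, a, p=3)
    print(a, " -> ".join("".join(map(str, v)) for v in path))
# 1 2143 -> 1243 -> 4213 -> 3214 -> 1234
# 2 2143 -> 4123 -> 3124 -> 2134 -> 1234
```

Both paths have length 4, the same as `min_path(X, Y)`, and they share no intermediate node, in line with Theorem 1 for this pair. The path $\pi_{Y_p}$ is not sketched because its construction is not spelled out in this section.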

Theorem 3: There is a set of n-1 node-disjoint paths between any two nodes of the star graph, each of length δ + ε, where δ is the minimum distance between the two nodes and ε is 0, 2, or 4. (Proof omitted due to size limitation.)

Figure 1 illustrates the construction of three node-disjoint paths between X = 2143 and Y = 1234 in S4. In this case we have p = 3, Xp = 4, Yp = 3, α1 = 1, i1 = pos(X, α1) = 2, j1 = pos(Y, α1) = 1, α2 = 2, i2 = pos(X, α2) = 1, j2 = pos(Y, α2) = 2, l = pos(X, Yp) = 4, k = pos(Y, Xp) = 4, and m = 2. This path construction method serves as the basis for the fault-tolerant routing algorithm presented in the next section.

Figure 1. Three node-disjoint paths between 2143 and 1234 in S4. [Figure labels: π1 passes through X(2) = 1243, X(2,3) = 4213, and Y(1,3) = 3214; π2 through X(1,3) = 4123, Y(2,3) = 3124, and Y(2) = 2134; π3 through X(4) = 3142, X(4,3) = 4132, 1432, Y(4,2) = 2431, and Y(4) = 4231. The subgraphs $S_4^{1/3}$, $S_4^{2/3}$, $S_4^{3/3}$, and $S_4^{4/3}$ are marked.]

Using the above construction, each of the n-2 node-disjoint paths πα is completely identified by the triple (X, Y, α), and the path $\pi_{Y_p}$ is completely identified by the triple (X, Y, Yp). Notice that in both cases the last item of the path identifier is the symbol that is moved to the first position (if not already there) and then to position p during the first one or two routing steps. Recall that p is a position at which the source X and the destination Y differ. The path identifier (X, Y, α) or (X, Y, Yp) is included in the message header to assist routing along one of the node-disjoint paths.

4. Overview Of A Fault-Tolerant Routing Algorithm

In this section we present an efficient distributed fault-tolerant routing algorithm for the star graph based on the node-disjoint paths constructed in the previous section. The algorithm guarantees the routing of a message from any source node X to any destination node Y in the presence of at most n-2 faulty nodes. Each intermediate node routes the message based only on local data and on the failure status of its neighboring nodes. The algorithm routes a message to the destination Y along an optimal or near-optimal path, except during a short period immediately following the detection of a node failure. The following result is derived from the existence of n-1 node-disjoint paths of optimal or near-optimal length between any two nodes of the n-star (Theorem 3).

Corollary: In the presence of at most n-2 faulty nodes in the n-star graph, there exists between any two fault-free nodes at least one fault-free path whose length is the minimum length, the minimum length plus 2, or the minimum length plus 4. Indeed, since the n-1 paths share no intermediate node, each faulty node blocks at most one of them, so at most n-2 of the n-1 paths can be blocked.

Consider a source node X and a destination node Y. At any given time, messages issued by X are routed towards Y on one of the n-1 node-disjoint paths constructed in the previous section. A message carries a path identification of the form (X, Y, s), where s is either α or Yp as described earlier. Each node X maintains a routing table PathIdX that holds, for each possible destination, the symbol s corresponding to the path currently in use. So if the path currently used to route messages from X to Y is (X, Y, s), then PathIdX[Y] = s. As long as there are no new failures or repairs, messages from X to Y continue to flow on the same path. Initially, every node X sets its table entry PathIdX[Y] to Yp for every destination Y; this identifies the path $\pi_{Y_p}$ between X and Y. A source node X wishing to send a message to a destination node Y tags the message with the triple (X, Y, PathIdX[Y]) and forwards it to its neighbor X(d), where d = p if X1 = PathIdX[Y], and otherwise d is the position such that Xd = PathIdX[Y].

An intermediate node Z receiving a message carrying the identifier (X, Y, s) attempts to forward the message along the identified path. Node Z determines the forwarding edge on this path from the node-disjoint path construction presented earlier. If the forwarding edge leads to a faulty neighboring node, then node Z instead forwards the message along the path (Z, Y, PathIdZ[Y]) and changes the path identification of the message from (X, Y, s) to (Z, Y, PathIdZ[Y]). In other words, the message is switched to the path currently used from Z to Y, instead of continuing on the path currently used from X to Y. After this de-routing, node Z sends a special warning message back to the source node X, requesting it to switch to the next node-disjoint path for future messages going from X to Y. Such a warning message is routed in the same way as regular messages. In addition to the regular path identification, a warning message carries a tag (WRN, Y), where WRN is a bit flag indicating that the message is a warning and Y is the destination for which the current path has to be adjusted at the source X.
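A small Python sketch of the header handling just described follows. It assumes that nodes are tuples and that PathIdZ is a plain dictionary keyed by destination; the names `Header`, `first_hop`, and `deroute` are hypothetical. The text only states that a warning is routed like a regular message, so giving the warning the identifier (Z, X, PathIdZ[X]) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Node = Tuple[int, ...]


@dataclass(frozen=True)
class Header:
    src: Node                        # X in the path identifier (X, Y, s)
    dst: Node                        # Y
    s: int                           # alpha or Yp: names one node-disjoint path
    wrn: bool = False                # WRN flag; False stands for NRM
    wrn_dest: Optional[Node] = None  # destination whose entry the source must adjust


def swap(z: Node, i: int) -> Node:
    """Z(i): exchange Z1 and Zi (1-based); Z(1) = Z."""
    if i == 1:
        return z
    t = list(z)
    t[0], t[i - 1] = t[i - 1], t[0]
    return tuple(t)


def first_hop(x: Node, s: int, p: int) -> Node:
    """Source rule: forward to X(d), with d = p if X1 = s, else the d with Xd = s."""
    d = p if x[0] == s else x.index(s) + 1
    return swap(x, d)


def deroute(z: Node, hdr: Header, path_id_z: Dict[Node, int]) -> Tuple[Header, Header]:
    """At node Z, when the next hop on (X, Y, s) is faulty: re-tag the message
    with Z's current path to Y, and build a warning addressed to the source X."""
    rerouted = Header(src=z, dst=hdr.dst, s=path_id_z[hdr.dst])
    # Assumption: the warning travels on Z's current path to X, like any message.
    warning = Header(src=z, dst=hdr.src, s=path_id_z[hdr.src],
                     wrn=True, wrn_dest=hdr.dst)
    return rerouted, warning
```

For X = 2143 with p = 3 (the Figure 1 pair), `first_hop` returns 1243, 4123, and 3142 for s = 1, 2, and 3, which are the first nodes after X on the three paths of the figure.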


In normal (non-warning) messages, the tag (WRN, Y) is replaced by the tag (NRM, nil). When a node X detects a new failure of a neighboring node F, X adjusts all the entries of its routing table PathIdX that correspond to paths whose starting edge leads to F. For each such entry, node X switches to the next path whose starting edge does not lead to a faulty node, taken from the circular sequence $\pi_{Y_p}, \pi_{\alpha_1}, \pi_{\alpha_2}, \ldots, \pi_{\alpha_{n-2}}$ of node-disjoint paths, where $\alpha_1, \alpha_2, \ldots, \alpha_{n-2}$ are the n-2 values of α. A similar adjustment of the PathIdX table is performed when X receives a warning message carrying a tag (WRN, wrndest), except that in this case X adjusts only the entry PathIdX[wrndest] of its routing table. When X detects that a neighboring node R has recovered from failure, it no longer excludes the paths whose first edge leads to R in subsequent adjustments of the routing table.

5. Performance Of The Fault-Tolerant Routing Algorithm

We first show that it is impossible to design an optimal algorithm (i.e., one that routes every message on the shortest available fault-free path) under the dynamic failure assumptions.

Theorem 4: There is no optimal routing algorithm for the n-star that tolerates up to n-2 faulty nodes when failures may occur dynamically (at any time). (Proof omitted due to size limitation.)

In fact, under the general dynamic failure assumption it is not even possible to design a routing algorithm that guarantees an upper bound on the routing distance. This is because no algorithm can predict when and where failures will occur, so a message can theoretically loop forever between intermediate nodes under a continually changing failure configuration. We therefore consider some reasonable assumptions on the timing of failures.

The proposed algorithm goes through transition periods immediately following failure detection events. The length of such a transition period is the time needed to propagate warning messages back to source nodes so that they switch to different node-disjoint paths. Let us first consider the behavior of the algorithm outside these transition periods. Assume that the time between two consecutive node failures is sufficiently long; more precisely, it is long enough for source nodes to receive and process all warning messages related to previous node failures before the next node failure occurs. After the path adjustments, and until the next node failure, all messages are then routed on optimal or near-optimal paths. This assumption on the timing of failures is not unrealistic, since in real systems node failures are unlikely to be fractions of a second apart, while warning-message propagation and processing usually take about a fraction of a second.

Let $t_i$ be the time at which the ith node failure occurs, and let $Z_i$ be the node that fails at time $t_i$. Let $w_i$ be the time at which the corresponding source node completes the reception and processing of the last warning message associated with the failure of node $Z_i$.

Theorem 5: In the presence of at most n-2 faulty nodes at any time, if $w_j < t_{i+1}$ for all $j \le i$, then during the time interval $[w, t_{i+1}]$ algorithm Route delivers all messages on paths of minimum length, minimum length plus two, or minimum length plus four, where $w = \max\{w_j : j \le i\}$. (Proof omitted due to size limitation.)

We now show that when faults occur sufficiently far apart in time, a message is de-routed at most once during the transition period $[t_i, w_i]$, which increases the routing distance by at most eight over the optimal length.

Theorem 6: In the presence of at most n-2 faulty nodes at any time, if $w_j < t_i$ for all $j < i$, and if $w_i < t_{i+1}$, then during the time interval $[t_i, w_i]$ algorithm Route routes messages on paths of length at most the minimum length plus eight. (Proof omitted due to size limitation.)
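A transition period ends once the affected sources have adjusted their PathIdX tables as described in Section 4. The following is a minimal sketch of that adjustment at a single node; the class name, the per-destination candidate list [Yp, α1, ..., αn-2], and the `first_hop(Y, s)` callback are assumptions made for illustration. Under the first-hop rule of Section 4, the n-1 identifiers cover every symbol except Xp, so the first hops of the n-1 paths are exactly the n-1 neighbors of X; with at most n-2 faulty nodes, the circular search below therefore always finds a path whose first hop is healthy.

```python
from typing import Callable, Dict, List, Set, Tuple

Node = Tuple[int, ...]


class PathIdTable:
    """Hypothetical model of the routing table PathId_X at one node X."""

    def __init__(self, candidates: Dict[Node, List[int]],
                 first_hop: Callable[[Node, int], Node]):
        self.candidates = candidates  # Y -> [Yp, alpha_1, ..., alpha_{n-2}]
        self.first_hop = first_hop    # (Y, s) -> neighbor reached by the first edge
        self.path_id = {y: c[0] for y, c in candidates.items()}  # start on pi_Yp
        self.faulty: Set[Node] = set()

    def _advance(self, y: Node) -> None:
        """Move circularly to the next path whose first edge avoids faulty neighbors."""
        seq = self.candidates[y]
        k = seq.index(self.path_id[y])
        for step in range(1, len(seq) + 1):
            s = seq[(k + step) % len(seq)]
            if self.first_hop(y, s) not in self.faulty:
                self.path_id[y] = s
                return

    def neighbor_failed(self, f: Node) -> None:
        """Adjust every entry whose current path starts with an edge to F."""
        self.faulty.add(f)
        for y in self.path_id:
            if self.first_hop(y, self.path_id[y]) == f:
                self._advance(y)

    def neighbor_recovered(self, r: Node) -> None:
        """R is no longer excluded in later adjustments."""
        self.faulty.discard(r)

    def warning(self, wrn_dest: Node) -> None:
        """A (WRN, wrndest) tag adjusts only PathId_X[wrndest]."""
        self._advance(wrn_dest)


def fig1_first_hop(y: Node, s: int) -> Node:
    x, p = (2, 1, 4, 3), 3  # X = 2143 and p = 3, as in Figure 1
    d = p if x[0] == s else x.index(s) + 1
    t = list(x)
    t[0], t[d - 1] = t[d - 1], t[0]
    return tuple(t)


Y = (1, 2, 3, 4)
table = PathIdTable({Y: [3, 1, 2]}, fig1_first_hop)
table.neighbor_failed((3, 1, 4, 2))  # first hop of the path named by Yp = 3 fails
print(table.path_id[Y])              # 1
table.warning(Y)
print(table.path_id[Y])              # 2
```

Keeping the faulty set separate from the table lets a repaired neighbor become eligible again at the next adjustment, without rewriting existing entries.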


In general, algorithm Route delivers all messages to their destinations, provided that the total number of faults is less than n-1 and that messages do not get trapped in infinite loops caused by a continuously changing fault configuration. Such infinite looping can theoretically occur no matter which routing algorithm is used. Consider, for example, a scenario in which a message is sent back and forth between two adjacent nodes X and Y whose two sets of n-2 other neighbors (excluding X and Y themselves) alternately become faulty and get repaired. However, if the fault configuration remains unchanged for a sufficiently long time, then algorithm Route eventually delivers all messages, as stated in the following result.

Theorem 7: For any time t there exists a time τ > t such that, if no faults occur between t and τ, then algorithm Route delivers before τ all messages issued before t. (Proof omitted due to size limitation.)

6. Conclusion

This paper makes two contributions to the study of star graph interconnection networks. First, we designed a new, efficient method for constructing node-disjoint paths between any two nodes of the star graph. Second, we proposed and analyzed a fault-tolerant routing algorithm for the n-star network based on this node-disjoint path construction method. For the fault-tolerant routing, node failures may occur dynamically at any time, provided that the total number of faults does not exceed n-2. We have shown that if failures occur reasonably far apart in time, then all messages are routed on optimal or near-optimal paths (of length at most the minimum length plus four), except for some messages sent while failure information is being propagated. During such a transition period, the routing distance is shown not to exceed the optimal distance plus eight, provided that each failure occurs only after the handling of previous failures has been completed.
In the unlikely case where more faults occur during the period needed to inform source nodes about previous faults, the algorithm still guarantees delivery of all messages routed during this transition period, but possibly with larger delays. An extension of the algorithm to handle link failures in addition to node failures has been discussed. The algorithm can easily be adapted to any interconnection network for which there exists a simple, systematic characterization of complete sets of optimal or near-optimal node-disjoint paths.

References
[1] S. B. Akers and B. Krishnamurthy, "A Group-Theoretic Model for Symmetric Interconnection Networks," IEEE Trans. on Computers, vol. 38, no. 4, pp. 555-566, April 1989.
[2] S. B. Akers, D. Harel, and B. Krishnamurthy, "The Star Graph: An Attractive Alternative to the n-Cube," in Proc. Int. Conf. Parallel Processing, 1987, pp. 393-400.
[3] S. B. Akers and B. Krishnamurthy, "The Fault Tolerance of Star Graphs," in Proc. 2nd Int. Conf. Supercomputing, 1987, pp. 270-276.
[4] A. Al-Ayyoub and K. Day, "Matrix Decomposition on the Star Graph," IEEE Trans. on Parallel and Distributed Systems, vol. 8, no. 8, pp. 803-812, August 1997.
[5] N. Bagherzadeh, N. Nassif, and S. Latifi, "A Routing and Broadcasting Scheme in Faulty Star Graphs," IEEE Trans. on Computers, vol. 42, no. 11, pp. 1398-1403, November 1993.
[6] P. Berthome, A. Ferreira, and S. Perennes, "Optimal Information Dissemination in Star and Pancake Networks," IEEE Trans. on Parallel and Distributed Systems, vol. 7, no. 12, pp. 1292-1300, December 1996.
[7] C. Chen and J. Chen, "Optimal Parallel Routing in Star Networks," IEEE Trans. on Computers, vol. 46, no. 12, pp. 1293-1303, December 1997.
[8] C. Chen and J. Chen, "Nearly Optimal One-to-Many Parallel Routing in Star Networks," IEEE Trans. on Parallel and Distributed Systems, vol. 8, no. 12, pp. 1293-1303, December 1997.


[9] K. Day and A. Tripathi, "A Comparative Study of Topological Properties of Hypercubes and Star Graphs," IEEE Trans. on Parallel and Distributed Systems, vol. 5, no. 1, pp. 31-38, January 1994.
[10] M. Dietzfelbinger, S. Madhavapeddy, and I. H. Sudborough, "Three Disjoint Path Paradigms in Star Networks," in Proc. IEEE Symp. Parallel and Distributed Processing, Dallas, 1991, pp. 400-406.
[11] P. Fragopoulou and S. G. Akl, "A Parallel Algorithm for Computing Fourier Transforms on the Star Graph," IEEE Trans. on Parallel and Distributed Systems, vol. 5, no. 5, pp. 525-531, May 1994.
[12] L. Gargano, U. Vaccaro, and A. Vozella, "Fault-Tolerant Routing in the Star and Pancake Interconnection Networks," Information Processing Letters, vol. 45, no. 6, pp. 315-320, April 1993.
[13] S. W. Graham and S. R. Seidel, "The Cost of Broadcasting on Star Graphs and K-ary Hypercubes," IEEE Trans. on Computers, vol. 42, no. 6, pp. 756-759, June 1993.
[14] J. S. Jwo, S. Lakshmivarahan, and S. K. Dhall, "Embedding of Cycles and Grids in Star Graphs," Journal of Circuits, Systems, and Computers, vol. 1, no. 1, pp. 43-74, 1991.
[15] J. Jwo, S. Lakshmivarahan, and S. K. Dhall, "Characterization of Node-Disjoint (Parallel) Paths in Star Graphs," in Proc. Int'l Parallel Processing Symp., Anaheim, Calif., 1991, pp. 404-409.
[16] S. Latifi, "On the Fault Diameter of the Star Graph," Information Processing Letters, vol. 46, pp. 143-150, 1993.

[17] J. I. Lee and J. H. Chang, "Embedding Complete Binary Trees in Star Graphs," Journal of the Korea Information Science Society, vol. 21, no. 2, pp. 407-415, February 1994.
[18] A. Menn and A. K. Somani, "An Efficient Sorting Algorithm for the Star Graph Interconnection Network," in Proc. Int'l Conf. Parallel Processing, 1990, pp. 1-8.
[19] M. Nigam, S. Sahni, and B. Krishnamurthy, "Embedding Hamiltonians and Hypercubes in Star Interconnection Graphs," in Proc. Int'l Conf. Parallel Processing, 1990, pp. 340-343.
[20] S. Rajasekaran and D. S. L. Wei, "Selection, Routing, and Sorting on the Star Graph," in Proc. 7th Int'l Parallel Processing Symp., 1993, pp. 661-665.
[21] S. Ranka, J. C. Wang, and N. Yeh, "Embedding Meshes on the Star Graph," Journal of Parallel and Distributed Computing, vol. 19, no. 2, pp. 131-135, October 1993.
[22] Y. Rouskov and P. K. Srimani, "Fault Diameter of Star Graphs," Information Processing Letters, vol. 48, pp. 243-251, 1993.
[23] J. P. Sheu, W. H. Liaw, and T. S. Chen, "A Broadcasting Algorithm in Star Graph Interconnection Networks," Information Processing Letters, vol. 48, no. 5, pp. 237-241, December 1993.
[24] S. Sur and P. K. Srimani, "A Fault-Tolerant Routing Algorithm in Star Graph Interconnection Networks," in Proc. Int. Conf. Parallel Processing, vol. 3, 1991, pp. 267-270.
