A NEW FAULT-TOLERANT ROUTING ALGORITHM

July 5, 2004

11:57

WSPC/113-IJHSC

00022

International Journal of High Speed Computing, Vol. 12, No. 1 (2004) 29–54
© World Scientific Publishing Company

A NEW FAULT-TOLERANT ROUTING ALGORITHM FOR k-ARY n-CUBE NETWORKS

J. AL-SADI
Department of Computer Science, Zarka Private University, Zarka, Jordan

K. DAY
Department of Computer Science, Sultan Qaboos University, Sultanate of Oman

M. OULD-KHAOUA
Department of Computing Science, University of Glasgow, Glasgow G12 8RZ, UK

Received February 2002

This paper describes a new fault-tolerant routing algorithm for k-ary n-cubes using the concept of “probability vectors”. To compute these vectors, a node first determines its faulty set, which represents the set of all its neighbouring nodes that are faulty or unreachable due to faulty links. Each node then calculates a probability vector, where the lth element represents the probability that a destination node at distance l cannot be reached through a minimal path due to a faulty node or link. The probability vectors are used by all the nodes to achieve efficient fault-tolerant routing in the network. An extensive performance analysis conducted in this study reveals that the proposed algorithm exhibits good fault-tolerance properties in terms of the achieved percentage of reachability and routing distances.

Keywords: Multicomputers; interconnection networks; k-ary n-cube; fault-tolerant routing; performance evaluation.

1. Introduction

K-ary n-cubes have been one of the most common networks for multicomputers due to their desirable properties, such as ease of implementation and the ability to reduce message latency by exploiting the communication locality found in many parallel applications. K-ary n-cubes possess an n-dimensional grid


structure with k nodes in each dimension such that every node is connected to its neighbouring nodes in each dimension by direct channels. The two most popular instances of k-ary n-cubes are the hypercube (where k = 2) and the torus (where n = 2 or 3). The former has been used in early multicomputers such as the iPSC/2 [19] and iPSC/860 [24], while the latter has been adopted in more recent systems like the J-Machine [18], CRAY T3D [13] and CRAY T3E [2].

A routing algorithm specifies how a message selects a path from source to destination, and has a great impact on network performance. Routing in fault-free networks has been extensively studied in the past [8, 9, 11, 16, 17, 22]. As the network size scales up, the probability of processor and link failure also increases. It is therefore essential to design fault-tolerant routing algorithms that allow messages to reach their destinations even in the presence of faulty components (links and nodes).

Existing fault-tolerant routing algorithms, discussed mainly in the context of the hypercube topology [3, 5, 12, 14, 15, 20], have assumed that a node knows either only the status of its neighbours (such a model is called local-information-based) [3, 5, 12] or the status of all nodes (global-information-based) [4, 23]. Local-information-based routing yields sub-optimal routes (if not routing failure) due to the insufficient information upon which the routing decisions are made. Global-information-based routing can achieve optimal or near-optimal routing, but often at the expense of high communication overhead to maintain up-to-date network-wide fault information. The main challenge is therefore to devise a simple and efficient way of representing limited global fault information that allows optimal or near-optimal fault-tolerant routing.

A number of recent studies have described limited-global-information-based fault-tolerant routing algorithms. Most of these algorithms, however, have been developed for the hypercube [1, 6, 15, 25, 26]. As a result, little work has considered the other versions of the k-ary n-cube, such as tori. In fact, most of the existing research on k-ary n-cubes has dealt with the practical and implementation issues associated with fault-tolerant routing [8, 9, 11]. Except for the work in [21], there has hardly been any study that investigates the topological properties of k-ary n-cubes for the provision of efficient fault-tolerant routing algorithms. In [21], the authors have described a fault-tolerant routing algorithm for k-ary n-cubes, which requires a large information table in each node. The table contains an entry for every other node in the network; each entry maintains a list of optimal neighbours leading to that destination node, along with the associated probabilities. The algorithm suffers from


high computational complexity and storage cost since nodes maintain global information about the network.

In this paper, we propose a new limited-global-information-based routing algorithm for high-radix k-ary n-cubes, such as the torus. As we shall see below, the algorithm uses the concept of “probability vectors” to considerably reduce the storage requirement for maintaining fault information, compared to the algorithm proposed in [21]. In the proposed algorithm, each node A starts by determining the set of its faulty or unreachable neighbours. Each node A then calculates its probability vector $P^A = (P_1^A, \ldots, P_{n\lfloor k/2 \rfloor}^A)$. The lth element, $P_l^A$, of the probability vector represents the probability that a destination node at distance l from A cannot be reached from A using a minimal path due to faulty nodes and links.

An extensive analysis is performed in this study to assess the performance of the proposed algorithm. The results presented here reveal that the new algorithm performs near-optimal routing for practical numbers of faulty nodes. Moreover, the results reveal that the algorithm exhibits good performance levels in terms of the achieved routing distances and percentages of reachability even when there is a large number of faulty nodes.

The remainder of the paper is organised as follows. Section 2 reviews some background information (preliminaries and notation) that will be useful for the subsequent sections. Section 3 presents the proposed fault-tolerant algorithm for the k-ary n-cube. Section 4 presents an analytical study of the proposed algorithm. Section 5 conducts a performance evaluation of the new algorithm through simulation experiments. Section 6 concludes this study.

2. Preliminaries and Notation

The k-ary n-cube, $Q_n^k$, is an undirected graph with $k^n$ vertices (nodes). Each node A is labelled in the form $A = a_{n-1}, a_{n-2}, \ldots, a_0$, where $0 \le a_i < k$. Two nodes $A = a_{n-1}, a_{n-2}, \ldots, a_0$ and $B = b_{n-1}, b_{n-2}, \ldots, b_0$ are joined by a link if, and only if, there exists i, $0 \le i < n$, such that $a_i = b_i \pm 1 \pmod{k}$ and $a_j = b_j$ for all $j \ne i$. For the sake of clarity, we will omit writing mod k in similar expressions in the remainder of our discussion. $Q_n^k$ has degree 2n and diameter $n\lfloor k/2 \rfloor$. The length of the shortest path between nodes A and B is equal to their Lee distance, which is given by

$$d_L(A, B) = \sum_{i=0}^{n-1} w_i, \quad \text{where } w_i = \min\left(|a_i - b_i|,\; k - |a_i - b_i|\right).$$
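As a small illustration, the Lee distance defined above can be computed directly from the node labels. The following is a minimal Python sketch; the function name `lee_distance` and the representation of a node label as a tuple of digits are assumptions made for this example and are not notation from the paper.

```python
def lee_distance(a, b, k):
    """Lee distance between nodes a and b of a k-ary n-cube.

    a and b are equal-length tuples of digits in range(k); this tuple
    representation is an assumption made for illustration.
    """
    if len(a) != len(b):
        raise ValueError("nodes must have the same number of dimensions")
    total = 0
    for ai, bi in zip(a, b):
        diff = abs(ai - bi)
        # w_i: travel directly or wrap around the ring, whichever is shorter
        total += min(diff, k - diff)
    return total


# Nodes (2, 0) and (0, 1) of a 3-ary 2-cube: one hop in each dimension.
print(lee_distance((2, 0), (0, 1), 3))  # prints 2
```

The wrap-around term `k - diff` is what distinguishes the Lee distance on a torus from the Manhattan distance on a mesh.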

if M.Route_distance ≤ dL(A, D) + (k − 2) * No-FaultyNodes then
{
    M.Route_distance := M.Route_distance + 1
    if D is a reachable neighbour then deliver M to D; exit;    /* destination reached */
    l = Lee distance between A and D
    Let A(i±) be a reachable preferred neighbour with least P_{l−1}^{A(i±)} value;
    Let A(j±) be a reachable spare neighbour with least P_{l+1}^{A(j±)} value;
    Sp = (l + 2)(1 − P_{l+1}^{A(j±)}) + (l + 4) P_{l+1}^{A(j±)};    /* least expected distance through A(j±) */
    Pr = l (1 − P_{l−1}^{A(i±)}) + (l + 2) P_{l−1}^{A(i±)};    /* least expected distance through A(i±) */
    if ∃ A(i±) and ((∃ A(j±) and Pr ≤ Sp) or (∼∃ A(j±))) then send M to A(i±);
    else if ∃ A(j±) and ((∃ A(i±) and Pr > Sp) or (∼∃ A(i±))) then send M to A(j±);
    else failure    /* destination unreachable */
}
else Detecting_Looping
End.

Fig. 1. Outline of the proposed “PV Routing” fault-tolerant routing algorithm.

PV Routing selects the forwarding neighbouring node with the least expected routing distance to the desired destination, using probability-based estimations of the least expected routing distance (given by Eqs. (5) and (6)). If an immediate non-faulty neighbour is not available then the destination is unreachable. Routing failure occurs in such a case.

Example 1. Consider the 3-ary 2-cube with four faulty nodes, as shown in Fig. 2 (faulty nodes are represented as black nodes). Table 1 shows the corresponding faulty set and probability vectors associated with each node A.

Example 2. Consider a 3-ary 2-cube with four faulty nodes, as described in Fig. 2. To route a message from the source node (2, 0) to the destination node (0, 1), first node (2, 0) has two preferred neighbours, (0, 0) and (2, 1), but since node (0, 0) is faulty, the proposed routing algorithm will route to node (2, 1) as an intermediate node. Node (2, 1) has one spare neighbour (long cycle neighbour), (1, 1), but the algorithm routes the message through the preferred neighbour to its destination node (0, 1).
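The forwarding decision in Fig. 1 can be sketched in a few lines. The Python below is an illustrative sketch only: the function name `choose_next_hop` and the dictionary inputs are hypothetical, and the probability values are assumed to have been computed beforehand and exchanged between neighbours (their defining equations are not reproduced here). Only the comparison between Pr and Sp follows the figure.

```python
def choose_next_hop(l, preferred, spare):
    """Pick the neighbour to forward to, following the Pr/Sp rule of Fig. 1.

    l         -- Lee distance from the current node to the destination
    preferred -- dict: reachable preferred neighbour -> its P_{l-1} value
    spare     -- dict: reachable spare neighbour -> its P_{l+1} value
    Returns the chosen neighbour, or None when routing fails.
    (Names and input layout are assumptions made for this sketch.)
    """
    best_pref = min(preferred, key=preferred.get) if preferred else None
    best_spare = min(spare, key=spare.get) if spare else None

    if best_pref is None and best_spare is None:
        return None  # destination unreachable from this node
    if best_spare is None:
        return best_pref
    if best_pref is None:
        return best_spare

    p_pref = preferred[best_pref]
    p_spare = spare[best_spare]
    # Expected distance through the best preferred neighbour
    pr = l * (1 - p_pref) + (l + 2) * p_pref
    # Expected distance through the best spare neighbour
    sp = (l + 2) * (1 - p_spare) + (l + 4) * p_spare
    return best_pref if pr <= sp else best_spare


# One reachable preferred neighbour and one spare neighbour at distance l = 2.
print(choose_next_hop(2, {"pref": 0.25}, {"spare": 0.5}))
```

Loop detection (the Route_distance check in the first line of Fig. 1) is deliberately left out of this sketch.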

Fig. 2. An example of a 3-ary 2-cube with four faulty nodes.

Table 1. The faulty sets and probability vectors of nodes in a 3-ary 2-cube with 4 faulty nodes.

Node A    Faulty set    P_1^A     P_2^A
(0, 0)    Faulty        Faulty    Faulty
(0, 1)    {00, 02}      0.500     0.191
(0, 2)    Faulty        Faulty    Faulty
(1, 0)    {00, 12}      0.500     0.273
(1, 1)    {12}          0.250     0.171
(1, 2)    Faulty        Faulty    Faulty
(2, 0)    {00, 22}      0.500     0.273
(2, 1)    {22}          0.250     0.171
(2, 2)    Faulty        Faulty    Faulty
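The faulty sets in Table 1 can be reproduced from the positions of the four faulty nodes. The sketch below is illustrative: the helper names `neighbours` and `faulty_set` are hypothetical, the faulty node positions are read off the rows marked “Faulty” in Table 1, and only faulty nodes are modelled (neighbours that are unreachable because of faulty links are not considered).

```python
def neighbours(node, k):
    """All 2n neighbours of a node in a k-ary n-cube (k >= 3 assumed)."""
    result = []
    for dim in range(len(node)):
        for step in (1, -1):
            nb = list(node)
            nb[dim] = (nb[dim] + step) % k  # wrap around the ring
            result.append(tuple(nb))
    return result


def faulty_set(node, k, faulty):
    """Neighbours of `node` that are faulty (link faults are not modelled)."""
    return {nb for nb in neighbours(node, k) if nb in faulty}


# Faulty nodes of the 3-ary 2-cube example, taken from Table 1.
FAULTY = {(0, 0), (0, 2), (1, 2), (2, 2)}

for node in [(0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]:
    print(node, sorted(faulty_set(node, 3, FAULTY)))
```

Each printed set corresponds to the faulty-set column of the matching non-faulty row in Table 1.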


The proposed algorithm can detect and minimise the effects of looping. Notice from the description of PV Routing in Fig. 1 that looping is detected if the routing distance exceeds the specified limit (the Lee distance plus f(k − 2), where f is the number of faulty nodes). Since each faulty node may cause a derouting and an increase in the routing distance by a value ranging between 1 and k − 2, the maximum increase in the routing distance should not exceed f(k − 2). Since looping occurs when a destination is not reachable from the source node, the algorithm then discards such a message.

4. Performance Analysis

This section analyses some performance properties of the proposed PV Routing algorithm in terms of the achieved minimum and average routing distances for various types of the k-ary n-cube. In the remainder of the paper, we assume that there are f faulty nodes in the network, and that all the N nodes are equally likely to be faulty with the failure probability p = f/N. Furthermore, we assume that the source and destination nodes are non-faulty. Let us now start by deriving a lower bound on the probability of minimum distance routing using the new algorithm.

4.1. A lower bound on the probability of minimum distance routing

For any two nodes at Hamming distance h, h ≥ 2, and Lee distance l, $h \le l \le n\lfloor k/2 \rfloor$, the k-ary n-cube with k ≥ 5 is known [7] to embed a family π of 2n node-disjoint paths of the following lengths: h paths of length l, 2n − 2h paths of length l + 2, and h paths of length l + 4. Assume there exists a “hypothetical” routing algorithm R that attempts to route along a non-faulty path from the family π of the shortest possible length before considering other paths. The following theorem provides a lower bound on the probability of minimum distance routing achieved by the algorithm R.

Theorem 1. For any source A and destination D at Lee distance l, $h \le l \le n\lfloor k/2 \rfloor$, and Hamming distance h, 2 ≤ h ≤ n, and k ≥ 5, the routing algorithm R routes from A to D on a path of length at most l + 4 with a probability of at least $1 - P_l \cdot P_{l+2} \cdot P_{l+4}$, where

$$P_l = \left(1 - (1 - p)^{l}\right)^{h} \tag{7}$$

$$P_{l+2} = \left(1 - (1 - p)^{l+2}\right)^{2n-2h} \tag{8}$$

$$P_{l+4} = \left(1 - (1 - p)^{l+4}\right)^{h} \tag{9}$$

$$p = \frac{f}{N} = \frac{f}{k^{n}}. \tag{10}$$

Proof. Let $P_l$ be the probability that all node-disjoint paths in π of length l are faulty. Such a path is faulty if at least one of its l nodes (other than the source node) is faulty. Each node is faulty with probability p = f/N since there are f faulty nodes and all the N nodes in the network are equally likely to be faulty. Therefore a path of length l from π is faulty with probability $1 - (1 - p)^l$, and hence $P_l = [1 - (1 - p)^l]^h$. Similar analysis yields the expressions for $P_{l+2}$ and $P_{l+4}$. Therefore at least one of the 2n paths of π is non-faulty with probability $1 - P_l \cdot P_{l+2} \cdot P_{l+4}$.

The PV Routing algorithm attempts to route through a neighbour that has the highest probability of minimum distance routing. The new algorithm keeps all options open and may select from any of the possible paths. As a result, it does not have any preference for a particular family of paths, as does algorithm R in Theorem 1. It is therefore expected that PV Routing will perform at least as well as algorithm R. In other words, PV Routing is expected to route from a source A to a destination D at Lee distance l on a minimum distance path with probability at least $1 - P_l \cdot P_{l+2} \cdot P_{l+4}$.

Claim 1. PV Routing routes in k-ary n-cubes with k ≥ 5 between any given pair of nodes at Lee distance l, $h \le l \le n\lfloor k/2 \rfloor$, and Hamming distance h, 2 ≤ h ≤ n, on a minimum length path with probability at least $1 - P_l \cdot P_{l+2} \cdot P_{l+4}$.

This claim is verified experimentally by analysing the performance of the proposed algorithm in order to measure the path lengths against the number of faulty nodes in the network. To this end, simulation experiments have been carried out over an 8-ary 3-cube with 512 nodes with different random distributions of faulty nodes. We started our experiments with a non-faulty k-ary n-cube and then the number of faulty nodes was increased gradually up to 70% of the network size with random fault distribution.
Paths from every node A to all destinations at Lee distance 6 and Hamming distance 3 (taken as average Lee and Hamming distances) were selected. Figure 3 shows both the calculated and measured probability of minimum distance routing against the number of faulty nodes in the 8-ary 3-cube when the Lee distance is l = 6 and the Hamming distance is h = 3. Other simulation experiments have been carried out over the 8-ary 3-cube with a fixed number of faulty nodes, 153 (30% of the nodes), with different random distributions. A total of 300,000 source-destination pairs were randomly selected. Table 2 contains both the calculated and measured probability of minimum distance routing for different Lee and Hamming distances. Both Fig. 3 and Table 2 confirm that the probability of minimum distance routing for PV Routing is always better than the corresponding probability for the hypothetical routing algorithm R. This shows that the probability that PV Routing routes from a source A to a destination D at Lee distance l on a minimum length path is at least $1 - P_l \cdot P_{l+2} \cdot P_{l+4}$.

Fig. 3. The calculated and measured probability of minimum distance routing against the number of faulty nodes in the 8-ary 3-cube.

4.2. The average routing distance in the k-ary 2-cube (or 2-D torus)

We restrict the discussion in this section to the class of k-ary 2-cubes (or 2-D tori). We will later extend our results to 3-dimensional tori and then generalise them to the k-ary n-cube. In order to evaluate the average routing distance of PV Routing, we define a hypothetical class of probabilistic routing algorithms. We then evaluate the performance of these algorithms with the aim of deriving bounds on the performance of PV Routing.

Definition 2. A routing algorithm is called a Probabilistic Routing Algorithm (or PRA for short) if the routing decisions are based on maximising the probability of minimum distance routing when selecting a node from the fault-free neighbours.


Table 2. Calculated and measured probability of minimum distance routing for a fixed number of faulty nodes (30% faulty nodes) in the 8-ary 3-cube.

Lee dist.    Hamming dist.    Analytical prob.    Experimental prob.
 2           2                0.885               0.985
 3           2                0.751               0.957
 3           3                0.783               0.985
 4           2                0.604               0.937
 4           3                0.636               0.964
 5           2                0.467               0.934
 5           3                0.495               0.950
 6           2                0.351               0.931
 6           3                0.373               0.945
 7           2                0.258               0.936
 7           3                0.275               0.944
 8           2                0.187               0.936
 8           3                0.200               0.943
 9           3                0.144               0.943
10           3                0.103               0.937
11           3                0.073               0.928
12           3                0.052               0.918
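The analytical column of Table 2 follows from Eqs. (7)–(10). The sketch below evaluates the bound $1 - P_l \cdot P_{l+2} \cdot P_{l+4}$ for the 8-ary 3-cube with 153 faulty nodes; the function name and argument order are assumptions made for this example.

```python
def min_distance_lower_bound(l, h, n, k, f):
    """Lower bound 1 - P_l * P_{l+2} * P_{l+4} from Eqs. (7)-(10).

    l, h -- Lee and Hamming distance between source and destination
    n, k -- dimension and radix of the k-ary n-cube
    f    -- number of faulty nodes
    """
    p = f / k**n  # Eq. (10): node failure probability

    def all_faulty(length, count):
        # Probability that `count` disjoint paths of `length` nodes are all faulty
        return (1 - (1 - p) ** length) ** count

    p_l = all_faulty(l, h)                    # Eq. (7)
    p_l2 = all_faulty(l + 2, 2 * n - 2 * h)   # Eq. (8)
    p_l4 = all_faulty(l + 4, h)               # Eq. (9)
    return 1 - p_l * p_l2 * p_l4


# First rows of Table 2: 8-ary 3-cube, 153 faulty nodes.
for lee, hamming in [(2, 2), (3, 2), (6, 3)]:
    print(lee, hamming, round(min_distance_lower_bound(lee, hamming, 3, 8, 153), 3))
```

For these three rows the rounded values agree with the tabulated analytical probabilities (0.885, 0.751 and 0.373). Note that when h = n the exponent in Eq. (8) is zero, so $P_{l+2} = 1$ and only the paths of length l and l + 4 contribute.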

The following assumptions are made to simplify the analysis of the PRA algorithm and to derive bounds on the performance of the PV Routing algorithm.

(i) In selecting the next move, the neighbours are considered in the following order: preferred on the first dimension, preferred on the second dimension, spare on the first dimension, and spare on the second dimension.
(ii) After f spare routing moves, the message is discarded to avoid looping.

Lemma 1. PV Routing is a PRA.

Proof. Since the decisions of the PV Routing algorithm are based on maximising the probability of minimum distance routing when selecting a node from the fault-free neighbours, PV Routing is a PRA. The PV Routing algorithm routes messages using the probabilistic value $P_l^A$ as the basis of the routing decisions to destinations at Lee distance l. The maximum number of spare moves a path is expected to pass through using PV Routing is f. The total path length in such a case is the Lee distance between source and destination plus 2f. If the path exceeds f spare moves then looping must have occurred and the message is discarded.


We now derive an expression for the average routing distance in the PRA algorithm. Since the k-ary n-cube has a symmetric network topology, we will focus our discussion, without loss of generality, on a particular source node, A. We will use the following notation during the derivation:

- $P_{l_1,l_2,s}$: probability of making exactly s spare moves when routing between the source node A and a destination with Lee distance components $(l_1, l_2)$, where $l = l_1 + l_2$, and $l_1$ and $l_2$ are the distances across the first and second dimension, respectively.
- $\bar{D}_l$: average routing distance from the source node A to destinations at Lee distance l.
- $\bar{D}_{l_1,l_2}$: average routing distance from the source node A to destinations with Lee distance components $(l_1, l_2)$.
- $w_{l_1,l_2}$: ratio of the number of nodes with Lee distance components $(l_1, l_2)$ to the number of nodes at Lee distance $l = l_1 + l_2$ from the source node A.
- $N_l$: the number of nodes at Lee distance l from the source node A.

Lemma 2. $w_{l_1,l_2}$ is given by:

$$w_{l_1,l_2} = \begin{cases}
\dfrac{1}{4\lfloor k/2 \rfloor - 2l + 2} & l > \lfloor k/2 \rfloor \text{ and } (l_1 = 0 \text{ or } l_2 = 0) \\[2ex]
\dfrac{1}{2\lfloor k/2 \rfloor - l + 1} & l > \lfloor k/2 \rfloor \text{ and } (l_1 > 0 \text{ and } l_2 > 0) \\[2ex]
\dfrac{1}{2l} & l \le \lfloor k/2 \rfloor \text{ and } (l_1 = 0 \text{ or } l_2 = 0) \\[2ex]
\dfrac{1}{l} & l \le \lfloor k/2 \rfloor \text{ and } (l_1 > 0 \text{ and } l_2 > 0)
\end{cases} \tag{11}$$

Proof. For a given $l_1$ and $l_2$, $0 \le l_1, l_2 \le \lfloor k/2 \rfloor$, and a fixed source node $A = (x_1, x_2)$ in the k-ary 2-cube, if $l_1 \ne 0$ and $l_2 \ne 0$ then there are four possible destination nodes with Lee distance components $(l_1, l_2)$ from A. These are: $(x_1 + l_1, x_2 + l_2)$, $(x_1 + l_1, x_2 - l_2)$, $(x_1 - l_1, x_2 + l_2)$, and $(x_1 - l_1, x_2 - l_2)$. If, however, $l_1 = 0$ or $l_2 = 0$ then there are only two such possible destinations. Furthermore, the number of destinations $N_l$ at a given Lee distance l from the source node A is given by

$$N_l = 4\left(2\lfloor k/2 \rfloor - l + 1\right), \quad l > \lfloor k/2 \rfloor. \tag{12}$$

This is derived from the fact that the range for each of $l_1$ and $l_2$ should be $\lfloor k/2 \rfloor, \lfloor k/2 \rfloor - 1, \lfloor k/2 \rfloor - 2, \ldots, l - \lfloor k/2 \rfloor$. The number of values in this range is $2\lfloor k/2 \rfloor - l + 1$. Furthermore, there are four destinations with Lee distance components $(l_1, l - l_1)$ for each case. Similar reasoning yields the result $N_l = 4l$ for the case $l \le \lfloor k/2 \rfloor$. Hence the claimed result for $w_{l_1,l_2}$.

Lemma 3.

$\bar{D}_{l_1,l_2}$ is given by

$$\bar{D}_{l_1,l_2} = \sum_{s=0}^{f} P_{l_1,l_2,s} \cdot (l_1 + l_2 + 2s), \tag{13}$$

where

$$P_{l_1,l_2,s} = (1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} + p^2(1-p)P_{l_1+1,l_2,s-1} + p^3(1-p)P_{l_1,l_2+1,s-1}. \tag{14}$$

The probability $P_{l_1,l_2,s}$ satisfies the following boundary conditions:

$$P_{l_1,l_2,s} = \begin{cases}
1 & l_1 = 1,\ l_2 = 0,\ s = 0 \\
1 & l_1 = 0,\ l_2 = 1,\ s = 0 \\
(1-p)^{l_1-1} & l_1 > 0,\ l_2 = 0,\ s = 0 \\
(1-p)^{l_2-1} & l_1 = 0,\ l_2 > 0,\ s = 0 \\
(1-p)P_{l_1-1,l_2,0} + p(1-p)P_{l_1,l_2-1,0} & l_1 > 0,\ l_2 > 0,\ s = 0 \\
0 & l_1 = 1,\ l_2 = 0,\ s > 0 \\
0 & l_1 = 0,\ l_2 = 1,\ s > 0 \\
(1-p)P_{l_1-1,0,s} + p(1-p)P_{l_1+1,0,s-1} + p^2(1-p)P_{l_1,1,s-1} & 1 < l_1 < \lfloor k/2 \rfloor,\ l_2 = 0,\ s > 0 \\
(1-p)P_{l_1-1,0,s} + p(1-p)P_{l_1,1,s-1} & l_1 = \lfloor k/2 \rfloor,\ l_2 = 0,\ s > 0 \\
(1-p)P_{0,l_2-1,s} + p(1-p)P_{1,l_2,s-1} + p^2(1-p)P_{0,l_2+1,s-1} & l_1 = 0,\ 1 < l_2 < \lfloor k/2 \rfloor,\ s > 0 \\
(1-p)P_{0,l_2-1,s} + p(1-p)P_{1,l_2,s-1} & l_1 = 0,\ l_2 = \lfloor k/2 \rfloor,\ s > 0 \\
(1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} + p^2(1-p)P_{l_1+1,l_2,s-1} + p^3(1-p)P_{l_1,l_2+1,s-1} & 0 < l_1 < \lfloor k/2 \rfloor,\ 0 < l_2 < \lfloor k/2 \rfloor,\ s > 0 \\
(1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} + p^2(1-p)P_{l_1,l_2+1,s-1} & l_1 = \lfloor k/2 \rfloor,\ 0 < l_2 < \lfloor k/2 \rfloor,\ s > 0 \\
(1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} + p^2(1-p)P_{l_1+1,l_2,s-1} & l_2 = \lfloor k/2 \rfloor,\ 0 < l_1 < \lfloor k/2 \rfloor,\ s > 0 \\
(1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} & l_1 = \lfloor k/2 \rfloor,\ l_2 = \lfloor k/2 \rfloor,\ s > 0
\end{cases} \tag{15}$$


Proof. Let $\bar{D}_{l_1,l_2}$ be the average routing distance between a given pair of nodes with Lee distance components $(l_1, l_2)$. Since each spare move increases the routing distance by 2 hops, and since messages are discarded after making f spare moves, we can write $\bar{D}_{l_1,l_2}$ as

$$\bar{D}_{l_1,l_2} = \sum_{s=0}^{f} (l_1 + l_2 + 2s)\,P_{l_1,l_2,s}. \tag{16}$$

To make s spare moves when routing a message at Lee distance l from its destination, we distinguish the following cases based on the first move made:

(a) A preferred move on the first dimension leading to a node with Lee distance components $(l_1 - 1, l_2)$ from the destination, and the remaining route must include s spare moves.
(b) A preferred move on the second dimension leading to a node with Lee distance components $(l_1, l_2 - 1)$ from the destination, and the remaining route must include s spare moves.
(c) A spare move on the first dimension leading to a node with Lee distance components $(l_1 + 1, l_2)$ from the destination, and the remaining route must include s − 1 spare moves.
(d) A spare move on the second dimension leading to a node with Lee distance components $(l_1, l_2 + 1)$ from the destination, and the remaining route must include s − 1 spare moves.

It can be easily verified that $P_{1,0,0} = 1$, $P_{0,1,0} = 1$, $P_{1,0,s} = 0$ and $P_{0,1,s} = 0$ for all s > 0, since the source and destination nodes are both assumed to be non-faulty. For $l = l_1 + l_2 \ge 2$, $P_{l_1,0,0}$ is the probability that a destination with Lee distance components $(l_1, 0)$ is minimally reachable. This probability is equal to $(1 - p)^{l_1 - 1}$, as this requires all $l_1 - 1$ preferred intermediate nodes to be non-faulty. Following similar arguments, the probabilities $P_{0,l_2,0}$ and $P_{l_1,l_2,0}$ are obtained. For $0 < l_1 < \lfloor k/2 \rfloor$, $0 < l_2 < \lfloor k/2 \rfloor$, and s > 0, we can therefore write $P_{l_1,l_2,s}$ as

$$P_{l_1,l_2,s} = (1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} + p^2(1-p)P_{l_1+1,l_2,s-1} + p^3(1-p)P_{l_1,l_2+1,s-1}. \tag{17}$$

When the destination is at Lee distance $\lfloor k/2 \rfloor$ on one or both dimensions, then the first move can only be a preferred move on that dimension, and in this case

$$P_{l_1,l_2,s} = \begin{cases}
(1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} + p^2(1-p)P_{l_1,l_2+1,s-1} & l_1 = \lfloor k/2 \rfloor \text{ and } 0 < l_2 < \lfloor k/2 \rfloor \\
(1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} + p^2(1-p)P_{l_1+1,l_2,s-1} & 0 < l_1 < \lfloor k/2 \rfloor \text{ and } l_2 = \lfloor k/2 \rfloor \\
(1-p)P_{l_1-1,l_2,s} + p(1-p)P_{l_1,l_2-1,s} & l_1 = \lfloor k/2 \rfloor \text{ and } l_2 = \lfloor k/2 \rfloor
\end{cases} \tag{18}$$

The results of Lemmas 2 and 3 are used to obtain the following theorem.

Theorem 2. For the PRA algorithm, the average routing distance, $\bar{D}_l$, between a given pair of nodes at Lee distance l in the k-ary 2-cube is given by

$$\bar{D}_l = \sum_{l_1=0}^{\min(l,\lfloor k/2 \rfloor)} \bar{D}_{l_1,l-l_1} \cdot w_{l_1,l-l_1}. \tag{19}$$
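Lemmas 2 and 3 and Theorem 2 can be evaluated numerically with a memoised recursion. The sketch below is one possible reading of Eqs. (11)–(19), not the authors' implementation: the function name `average_routing_distance` is hypothetical, an odd radix k is assumed, and terms of Eq. (19) whose second component $l - l_1$ would exceed $\lfloor k/2 \rfloor$ are skipped (an assumption, since no destination has such components). The list of candidate moves encodes the fixed PRA order, so the ith available move receives the weight $p^{i}(1-p)$, which matches each case of Eq. (15).

```python
from functools import lru_cache


def average_routing_distance(k, f, l):
    """Average PRA routing distance at Lee distance l in a k-ary 2-cube
    with f faulty nodes (sketch of Eqs. (11)-(19), odd k assumed)."""
    half = k // 2
    p = f / (k * k)   # failure probability p = f / N
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def prob(l1, l2, s):
        # P_{l1,l2,s}: probability of exactly s spare moves, Eq. (15)
        if s == 0:
            if l2 == 0:
                return q ** (l1 - 1)
            if l1 == 0:
                return q ** (l2 - 1)
            return q * prob(l1 - 1, l2, 0) + p * q * prob(l1, l2 - 1, 0)
        if (l1, l2) in ((1, 0), (0, 1)):
            return 0.0  # destination is a neighbour: no spare move needed
        moves = []      # candidate first moves, in the fixed PRA order
        if l1 > 0:
            moves.append((l1 - 1, l2, s))        # preferred, dimension 1
        if l2 > 0:
            moves.append((l1, l2 - 1, s))        # preferred, dimension 2
        if l1 < half:
            moves.append((l1 + 1, l2, s - 1))    # spare, dimension 1
        if l2 < half:
            moves.append((l1, l2 + 1, s - 1))    # spare, dimension 2
        return sum(p ** i * q * prob(*m) for i, m in enumerate(moves))

    def weight(l1, l2):
        # w_{l1,l2} of Lemma 2, written as (destinations) / N_l
        n_l = 4 * (2 * half - l + 1) if l > half else 4 * l
        return (4 if l1 > 0 and l2 > 0 else 2) / n_l

    total = 0.0
    for l1 in range(0, min(l, half) + 1):
        l2 = l - l1
        if l2 > half:
            continue  # assumption: skip components no destination can have
        d = sum((l1 + l2 + 2 * s) * prob(l1, l2, s) for s in range(f + 1))
        total += d * weight(l1, l2)   # Eq. (19)
    return total


# 15-ary 2-cube with five faulty nodes, Lee distance 2.
print(round(average_routing_distance(15, 5, 2), 3))
```

Under these assumptions the value for l = 2 and five faults comes out close to the corresponding calculated entry for the 15-ary 2-cube reported in the paper (2.02), and with no faults the function returns l itself.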

Claim 2. The average routing distance between two nodes at Lee distance l for PV Routing in the k-ary 2-cube is at most $\bar{D}_l$.

This claim is intuitively justified by the fact that PV Routing makes routing decisions based on maximising the calculated probability of minimum distance routing, while the PRA algorithm selects the first feasible move from the list: preferred on the first dimension, preferred on the second dimension, spare on the first dimension, and spare on the second dimension. Since the PV Routing and PRA algorithms share a similar probabilistic nature, we expect them to perform similarly in terms of the achieved average routing distances.

To support this intuitive claim, we compare the results obtained using the above-derived expressions against those obtained through simulation. We first solved the equations related to $w_{l_1,l_2}$, $P_{l_1,l_2,s}$, $\bar{D}_{l_1,l_2}$, and $\bar{D}_l$ given by Lemmas 2, 3, and Theorem 2. These calculations yield the average routing distance vector $\bar{D}^A = (\bar{D}_1, \bar{D}_2, \ldots, \bar{D}_{n\lfloor k/2 \rfloor})$. We then performed experiments with our PV Routing algorithm to measure the corresponding average routing distance vector. Tables 3 and 4 show results for the measured and calculated average routing distances, respectively, in a 225-node 15-ary 2-cube where the number of randomly distributed faulty nodes is increased gradually up to 60. All possible source-destination pairs have been generated and tested. The


Table 3. The measured average routing distance between two nodes at Lee distance l for different numbers of faulty nodes in the 15-ary 2-cube. Rows give the number of faulty nodes; columns give the Lee distance l.

Faulty   l=1    l=2    l=3    l=4    l=5    l=6    l=7    l=8    l=9    l=10   l=11   l=12   l=13   l=14
 0       1      2      3      4      5      6      7      8      9      10     11     12     13     14
 5       1.00   2.10   3.15   4.14   5.14   6.14   7.14   8.13   9.11   10.11  11.12  12.12  13.13  14.14
10       1.00   2.11   3.18   4.21   5.23   6.25   7.28   8.27   9.25   10.25  11.25  12.26  13.29  14.30
15       1.00   2.16   3.24   4.30   5.35   6.39   7.43   8.41   9.38   10.37  11.38  12.38  13.40  14.39
20       1.00   2.17   3.26   4.33   5.39   6.47   7.53   8.52   9.48   10.49  11.51  12.52  13.53  14.52
25       1.00   2.24   3.39   4.50   5.56   6.64   7.71   8.69   9.64   10.61  11.60  12.56  13.53  14.49
30       1.00   2.25   3.41   4.50   5.57   6.66   7.75   8.71   9.66   10.62  11.62  12.61  13.65  14.66
35       1.00   2.40   3.54   4.66   5.74   6.87   7.98   8.92   9.83   10.83  11.84  12.81  13.85  14.85
40       1.00   2.40   3.62   4.81   5.90   7.04   8.18   9.16   10.11  11.10  12.13  13.14  14.10  15.13
45       1.00   2.43   3.66   4.89   6.05   7.23   8.40   9.40   10.38  11.30  12.31  13.23  14.24  15.15
50       1.00   2.52   3.73   4.96   6.07   7.27   8.42   9.39   10.29  11.22  12.23  13.27  14.34  15.25
55       1.00   2.59   3.84   5.06   6.23   7.43   8.58   9.59   10.53  11.49  12.65  13.87  15.10  15.86
60       1.00   2.54   3.81   5.13   6.31   7.52   8.58   9.52   10.41  11.44  12.64  13.82  14.92  15.78

Table 4. The calculated average routing distance between two nodes at Lee distance l for different numbers of faulty nodes in the 15-ary 2-cube. Rows give the number of faulty nodes; columns give the Lee distance l.

Faulty   l=1    l=2    l=3    l=4    l=5    l=6    l=7    l=8    l=9    l=10   l=11   l=12   l=13   l=14
 0       1      2      3      4      5      6      7      8      9      10     11     12     13     14
 5       1      2.02   3.05   4.07   5.09   6.12   7.14   8.14   9.16   10.2   11.2   12.2   13.2   14.3
10       1.00   2.05   3.10   4.15   5.20   6.24   7.29   8.29   9.32   10.36  11.41  12.45  13.49  14.49
15       1.00   2.08   3.15   4.23   5.30   6.38   7.44   8.45   9.49   10.55  11.61  12.67  13.72  14.68
20       1.00   2.11   3.21   4.32   5.42   6.52   7.60   8.61   9.65   10.73  11.81  12.88  13.94  14.84
25       1.00   2.14   3.28   4.41   5.54   6.67   7.77   8.78   9.82   10.90  11.99  13.07  14.13  14.94
30       1.00   2.18   3.35   4.51   5.67   6.82   7.93   8.94   9.98   11.06  12.15  13.24  14.29  14.98
35       1.00   2.21   3.42   4.61   5.79   6.96   8.08   9.10   10.12  11.20  12.29  13.37  14.39  14.95
40       1.00   2.25   3.49   4.71   5.91   7.10   8.23   9.24   10.25  11.32  12.39  13.45  14.43  14.83
45       1.00   2.29   3.56   4.81   6.03   7.23   8.35   9.35   10.35  11.39  12.45  13.47  14.39  14.62
50       1.00   2.33   3.63   4.90   6.14   7.34   8.44   9.42   10.40  11.42  12.44  13.42  14.26  14.29
55       1.00   2.38   3.70   4.98   6.23   7.42   8.49   9.45   10.41  11.38  12.36  13.27  14.01  14.84
60       1.00   2.41   3.76   5.05   6.29   7.47   8.49   9.42   10.34  11.27  12.18  13.02  14.64  14.27

analytical results are in close agreement with those obtained through simulation, demonstrating the accuracy of the above analytical derivation. The results also show that PV Routing can achieve performance as good as that of a fault-tolerant routing algorithm such as the PRA


algorithm that achieves high ratios of minimal routing in the presence of faults in the network.

A second set of simulation experiments has been carried out for k-ary 2-cubes of different sizes with a fixed number of faulty nodes (20% of the nodes) and different random distributions. All possible source-destination pairs in the network have been generated and tested. Table 5 contains both the calculated and measured average routing distances for the resulting Lee distances. This table also supports the claim that the proposed PV Routing

Table 5. The measured and calculated average routing distance for a fixed number of faulty nodes (20% faulty) in the k-ary 2-cube of different sizes.

k = 3
  Lee dist.     1      2
  Measured      1      2
  Calculated    1      2

k = 5
  Lee dist.     1      2      3      4
  Measured      1      2.5    3.395  4.463
  Calculated    1      2.171  3.179  3.999

k = 7
  Lee dist.     1      2      3      4      5      6
  Measured      1      2.556  3.827  4.975  5.878  6.849
  Calculated    1      2.278  3.442  4.463  5.487  6.184

k = 9
  Lee dist.     1      2      3      4      5      6      7      8
  Measured      1      2.301  3.533  4.587  5.575  6.555  7.617  8.572
  Calculated    1      2.284  3.53   4.682  5.701  6.744  7.768  8.375

k = 11
  Lee dist.     1      2      3      4      5      6      7      8      9      10
  Measured      1      2.42   3.572  4.734  5.846  6.835  7.766  8.624  9.513  10.344
  Calculated    1      2.289  3.549  4.78   5.92   6.934  7.965  9.022  10.02  10.502

k = 13
  Lee dist.     1      2      3      4      5      6      7      8      9      10      11      12
  Measured      1      2.273  3.566  4.682  5.757  6.909  7.947  8.937  9.903  10.872  11.89   12.91
  Calculated    1      2.295  3.562  4.807  6.023  7.152  8.158  9.171  10.225 11.271  12.226  12.566

k = 15
  Lee dist.     1      2      3      4      5      6      7      8      9      10      11      12      13      14
  Measured      1      2.368  3.52   4.713  5.844  7.025  8.175  9.139  10.049 11.008  11.97   12.941  13.937  14.835
  Calculated    1      2.293  3.559  4.805  6.031  7.231  8.349  9.348  10.347 11.392  12.447  13.473  14.392  14.618


and hypothetical PRA algorithms exhibit similar performance characteristics in terms of the achieved average routing distance.

4.3. The average routing distance in the k-ary 3-cube (or 3-D torus)

Using an analysis similar to that introduced in Lemmas 1 and 2 above, the average routing distance $\bar{D}_l$ for the PRA algorithm in the k-ary 3-cube is found to be

$$\bar{D}_l = \sum_{l_1=0}^{x} \sum_{l_2=0}^{x} \sum_{l_3=0}^{x} \bar{D}_{l_1,l_2,l_3} \cdot w_{l_1,l_2,l_3}, \quad \text{where } l_1 + l_2 + l_3 = l \text{ and } x = \min(l, \lfloor k/2 \rfloor) \tag{20}$$

$$\bar{D}_{l_1,l_2,l_3} = \sum_{s=0}^{f} (l_1 + l_2 + l_3 + 2s)\, P_{l_1,l_2,l_3,s} \tag{21}$$

$$\begin{aligned}
P_{l_1,l_2,l_3,s} ={} & (1-p)\,P_{l_1-1,l_2,l_3,s} + p(1-p)\,P_{l_1,l_2-1,l_3,s} + p^2(1-p)\,P_{l_1,l_2,l_3-1,s} \\
& + p^3(1-p)\,P_{l_1+1,l_2,l_3,s-1} + p^4(1-p)\,P_{l_1,l_2+1,l_3,s-1} + p^5(1-p)\,P_{l_1,l_2,l_3+1,s-1}
\end{aligned} \tag{22}$$

$$w_{l_1,l_2,l_3} = \begin{cases}
\dfrac{2}{N_l} & l_1 > 0 \text{ or } l_2 > 0 \text{ or } l_3 > 0 \\[2mm]
\dfrac{4}{N_l} & l_1 = 0 \text{ or } l_2 = 0 \text{ or } l_3 = 0 \\[2mm]
\dfrac{8}{N_l} & l_1 > 0 \text{ and } l_2 > 0 \text{ and } l_3 > 0
\end{cases} \tag{23}$$

$$N_l = \begin{cases}
\displaystyle\sum_{l_1=0}^{\lfloor k/2 \rfloor} \sum_{l_2=0}^{\lfloor k/2 \rfloor} \sum_{l_3=0}^{\lfloor k/2 \rfloor} 1 & l_1 + l_2 + l_3 = l \text{ and } l > \lfloor k/2 \rfloor \\[2mm]
6 + 12(l-1) + 8 \displaystyle\sum_{i=0}^{l-2} i & l \le \lfloor k/2 \rfloor
\end{cases} \tag{24}$$
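The second branch of Eq. (24) can be checked numerically. The following is a minimal sketch, assuming that $N_l$ counts the nodes at Lee distance l from a given node and that offsets do not wrap around the torus (for instance, odd k with l ≤ ⌊k/2⌋). Reading the weights 2, 4 and 8 of Eq. (23) as the number of sign combinations of the non-zero offsets is also an assumption of this sketch; the function names are illustrative only.

```python
from itertools import product

def nodes_at_distance(l):
    """Count integer offsets (d1, d2, d3) with |d1| + |d2| + |d3| == l by brute force."""
    span = range(-l, l + 1)
    return sum(1 for d in product(span, repeat=3) if sum(map(abs, d)) == l)

def closed_form(l):
    """Second branch of Eq. (24): 6 + 12(l - 1) + 8 * (0 + 1 + ... + (l - 2))."""
    return 6 + 12 * (l - 1) + 8 * sum(range(l - 1))

def weighted_count(l):
    """Sum 2**(number of non-zero l_i) over non-negative (l1, l2, l3) with l1 + l2 + l3 == l."""
    total = 0
    for l1, l2, l3 in product(range(l + 1), repeat=3):
        if l1 + l2 + l3 == l:
            total += 2 ** sum(1 for v in (l1, l2, l3) if v > 0)
    return total

for l in range(1, 6):
    print(l, nodes_at_distance(l), closed_form(l), weighted_count(l))
```

For l = 1, ..., 5 the three counts coincide (6, 18, 38, 66, 102), which suggests that, under these assumptions, the weights in Eq. (23) sum to one over all admissible triples.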

Table 6 shows both the calculated and the measured average routing distances achieved by PV Routing for different sizes of the k-ary 3-cube, where the number of faulty nodes is 20% of the total network size. The results demonstrate that the above-derived expressions predict the average routing distance with a reasonable degree of accuracy.


Table 6. The measured and calculated average routing distance for a fixed number of faulty nodes (20% faulty) in the k-ary 3-cube of different sizes.

k     Lee dist.   Measured   Calculated
3     1           1          1
      2           2.082      2.043
      3           3.112      3.014
5     1           1          1
      2           2.210      2.203
      3           3.240      3.277
      4           4.246      4.233
      5           5.270      5.354
      6           6.279      6.347
7     1           1          1
      2           2.317      2.232
      3           3.478      3.408
      4           4.568      4.501
      5           5.554      5.573
      6           6.660      6.643
      7           7.680      7.693
      8           8.787      8.754
      9           9.792      9.726
9     1           1          1
      2           2.323      2.235
      3           3.434      3.426
      4           4.551      4.602
      5           5.791      5.709
      6           6.791      6.785
      7           7.840      7.875
      8           8.905      8.954
      9           10.075     10.012
      10          11.954     11.083
      11          12.157     12.159
      12          13.122     13.096
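To make the agreement between the two columns of Table 6 concrete, the relative gap can be computed row by row. The sketch below uses the k = 7 entries copied from the table; the helper name `relative_gaps` is illustrative and not part of the original study.

```python
# Table 6, k = 7 (20% faulty nodes): values copied from the table.
measured = [1, 2.317, 3.478, 4.568, 5.554, 6.660, 7.680, 8.787, 9.792]
calculated = [1, 2.232, 3.408, 4.501, 5.573, 6.643, 7.693, 8.754, 9.726]

def relative_gaps(measured, calculated):
    """Return |measured - calculated| / measured for each Lee distance."""
    return [abs(m - c) / m for m, c in zip(measured, calculated)]

gaps = relative_gaps(measured, calculated)
# Lee distances start at 1, list indices at 0.
worst_lee = max(range(len(gaps)), key=gaps.__getitem__) + 1
print(f"largest relative gap: {max(gaps):.1%} at Lee distance {worst_lee}")
```

For this block of the table the largest gap is below 4% and occurs at Lee distance 2.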

4.4. The average routing distance for the k-ary n-cube (the general case)

We now compute the expected average routing distance for the k-ary n-cube. To reduce the complexity of the combinatorics involved, and thus simplify the analysis, we assume that the Lee distance $l_i$ ($1 \le i \le n$) across any of the n dimensions is equal to $l_i = k/4$, which is the average routing distance along a dimension. Assuming equal chances of encountering faulty nodes on each of the n dimensions, the number of spare moves on each dimension is at most $f/n$. Since each spare move causes two extra routing steps, the overall average routing distance can thus be written as

$$\bar{D} = \sum_{i=1}^{n} \sum_{s=0}^{f/n} (k/4 + 2s)\, P_{i,s} \tag{25}$$

where $P_{i,s}$ is the probability that s spare moves are made on dimension i. In order to evaluate $P_{i,s}$ we assume that the routing algorithm attempts to make a preferred move on one of dimensions 1, 2, ..., n in this order. If that is not possible, the routing attempts to make a spare move on one of dimensions 1, 2, ..., n in this order. Consequently, a spare move is performed on dimension i if none of the n preferred moves is possible (due to faulty nodes or links) and none of the i − 1 spare moves on dimensions 1, 2, ..., i − 1 is possible. Hence the probability of making one spare move on dimension i is $p^{n+(i-1)}(1-p)$. The probability of making s spare moves on dimension i is given by

$$P_{i,s} = \left(p^{\,n+i-1}(1-p)\right)^{s}, \quad (1 \le i \le n). \tag{26}$$
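Equations (25) and (26) are straightforward to evaluate numerically. The following is a minimal sketch under two assumptions that are not stated in the text: p = 0.1 is taken to correspond to 10% faulty nodes, and the upper limit f/n of the inner sum is passed as an integer argument `s_max`. The function names are illustrative.

```python
def spare_move_probability(p, n, i, s):
    """Eq. (26): probability of s spare moves on dimension i (1 <= i <= n)."""
    return (p ** (n + i - 1) * (1 - p)) ** s

def average_routing_distance(k, n, p, s_max):
    """Eq. (25): sum over dimensions i = 1..n and spare-move counts s = 0..s_max.

    s_max stands in for the upper limit f/n (an assumption of this sketch).
    """
    return sum(
        (k / 4 + 2 * s) * spare_move_probability(p, n, i, s)
        for i in range(1, n + 1)
        for s in range(s_max + 1)
    )

# k-ary 3-cube with an assumed fault probability p = 0.1.
for k in range(2, 11):
    print(k, round(average_routing_distance(k, n=3, p=0.1, s_max=1), 3))
```

With n = 3, p = 0.1 and `s_max = 1` the printed values stay within 0.001 of the calculated column of Table 7, since the terms with s ≥ 2 contribute only of the order of $10^{-6}$.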

Table 7 shows the results obtained from the above equations and from simulation for the average routing distance in the k-ary 3-cube for different sizes, where the number of faulty nodes is 10% of the network size. The results show that in each case the measured and calculated average routing distances are in close agreement with each other, demonstrating the validity of the above approximate analytical model.

Table 7. The measured and calculated average routing distance for a fixed number of faulty nodes (10% faulty) in the k-ary 3-cube of different sizes.

k     Measured $\bar{D}$   Calculated $\bar{D}$
2     1.714                1.502
3     2.083                2.253
4     3.051                3.003
5     3.735                3.753
6     4.625                4.503
7     5.352                5.254
8     6.258                6.004
9     7.002                6.754
10    7.861                7.504


5. Experimental Performance Analysis

This section obtains experimentally three additional performance measures for the proposed PV Routing algorithm, namely deviation from optimality, unreachability, and looping. To this end, simulation experiments have been carried out over a 10-ary 3-cube with 1000 nodes with different random distributions of faulty nodes. We started the experiments with a non-faulty k-ary n-cube, and the number of faulty nodes was then increased gradually up to 50% of the network size with random fault distribution. A total of 300,000 source-destination pairs were selected randomly at each run. The first two sets of results are reported below (in Figs. 4 and 5, respectively). However, before presenting the results, we define the following variables, which will be used to quantify some performance measures.

Fig. 4. Percentage of unreachability and average percentage of deviation in the proposed PV Routing algorithm for 9-ary 3-cube. [Two panels, Unreachability and Deviation, each plotted against the number of faulty nodes.]

Fig. 5. Percentage of deviation, unreachability, and looping in the proposed PV Routing algorithm for different sizes of the k-ary n-cube (each curve represents a specific percentage of faulty nodes). [Three panels, Deviation from Optimality, Unreachability and Looping, each plotted against k from 2 to 9, with one curve for each of 10%, 20%, 30%, 40% and 50% faulty nodes.]


- Total: total number of generated messages
- Routing Distance: number of links crossed by a message
- Lee Distance: Lee distance between the message source and destination
- Fail Count: number of routing failure cases
- Looping Count: number of messages that cross a number of links beyond a maximum threshold before being discarded.
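As an illustration of how such an experiment might be set up, the sketch below computes the Lee distance between two nodes of a k-ary n-cube (the sum over dimensions of the shorter way around each ring) and draws a random fault pattern and a source-destination pair. The helper names, the fixed seed and the choice to draw only non-faulty end points are assumptions of this sketch, not details taken from the study.

```python
import random
from itertools import product

def lee_distance(a, b, k):
    """Lee distance between two nodes of a k-ary n-cube given as digit tuples."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))

def random_faulty_set(k, n, fraction, rng):
    """Pick round(fraction * k**n) distinct faulty nodes uniformly at random."""
    nodes = list(product(range(k), repeat=n))
    return set(rng.sample(nodes, round(fraction * len(nodes))))

def random_pair(k, n, faulty, rng):
    """Draw a source and a different destination, both non-faulty (an assumption)."""
    healthy = [v for v in product(range(k), repeat=n) if v not in faulty]
    return tuple(rng.sample(healthy, 2))

rng = random.Random(0)  # fixed seed so that runs are repeatable
faulty = random_faulty_set(k=10, n=3, fraction=0.2, rng=rng)
src, dst = random_pair(10, 3, faulty, rng)
print(len(faulty), src, dst, lee_distance(src, dst, 10))
```

In a 10-ary 3-cube, for example, nodes (0, 0, 0) and (9, 5, 1) are at Lee distance 1 + 5 + 1 = 7, because the first dimension wraps around.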

Using the above variables we propose the following three performance measures as the basis for studying the PV Routing algorithm [20, 24].

- Average percentage of deviation from optimality $= \dfrac{1}{\text{Total}} \sum \dfrac{\text{Routing Distance} - \text{Lee Distance}}{\text{Lee Distance}} \times 100$
- Percentage of unreachability $= \dfrac{\text{Fail Count}}{\text{Total}} \times 100$
- Percentage of looping $= \dfrac{\text{Looping Count}}{\text{Total}} \times 100$.
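The three measures can be computed from per-message records as in the following sketch. The `MessageRecord` type and its field names are hypothetical. The text does not say which messages enter the deviation sum, so restricting it to delivered messages while dividing by Total, and treating failed and looping messages as disjoint, are assumptions made here.

```python
from dataclasses import dataclass

@dataclass
class MessageRecord:
    lee_distance: int        # Lee distance between source and destination
    routing_distance: int    # links actually crossed by the message
    failed: bool = False     # counted in Fail Count
    looped: bool = False     # counted in Looping Count

def performance_measures(records):
    """Return (deviation %, unreachability %, looping %) over all generated messages."""
    total = len(records)
    delivered = [r for r in records if not (r.failed or r.looped)]
    deviation = 100 * sum(
        (r.routing_distance - r.lee_distance) / r.lee_distance for r in delivered
    ) / total
    unreachability = 100 * sum(r.failed for r in records) / total
    looping = 100 * sum(r.looped for r in records) / total
    return deviation, unreachability, looping

records = [
    MessageRecord(lee_distance=4, routing_distance=4),
    MessageRecord(lee_distance=4, routing_distance=6),   # one spare move: two extra links
    MessageRecord(lee_distance=2, routing_distance=2),
    MessageRecord(lee_distance=5, routing_distance=0, failed=True),
]
print(performance_measures(records))  # (12.5, 25.0, 0.0)
```

In this toy run one message out of four takes a detour of two links on a distance-4 route, giving an average deviation of 12.5%, and one message is undeliverable, giving 25% unreachability.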

The average deviation from optimality indicates how close the achieved routing is to the minimal distance routing. The percentage of unreachability measures the percentage of messages that the algorithm failed to deliver to a destination due to faulty components. The percentage of looping indicates the ratio of messages that failed to reach destinations due to network partitioning. We believe that these three measures give realistic indications of the performance of a fault-tolerant routing algorithm and are sufficient for the purpose of our current study.

Figure 4 reveals that PV Routing achieves high reachability with a low percentage of deviation from optimality. The deviation from optimality remains low as long as the number of faulty nodes does not exceed 50% of the total number of nodes; beyond that it grows almost linearly with the number of faulty nodes. The proposed algorithm is capable of routing messages using optimal distance paths even when there are a large number of faulty components. This is because the algorithm repeatedly chooses to route through areas of the network with the least number of faults in the neighbourhood, by routing to a preferred neighbour with the least probability that a destination at distance l from A is not minimally reachable from A. As a result, the algorithm tends to select paths that diverge from areas with high counts of faulty components. The results also reveal that the percentage of looping remains practically negligible when the percentage of faulty nodes is less than 40%.

Another experiment was conducted to evaluate the performance behaviour of our algorithm when the network size increases. For the sake of illustration, we have fixed the value of n to 3 and increased the value of k from 2 up to 9; we have found that the same conclusions are reached when other values of n are considered. For each network size, the algorithm has been tested by setting the percentage of faulty nodes to 10% of the network size, then to 20%, 30%, 40%, and 50% of the network size. At each run, a total of 30,000 source-destination pairs were selected randomly. The results presented in Fig. 5 show that the performance properties of the new fault-tolerant routing algorithm are not affected as the network size is scaled up. This reveals that the new algorithm has the desirable property of maintaining good performance levels without imposing any restriction on the system size.

6. Conclusions

K-ary n-cubes have been one of the most common networks for multicomputers due to their ease of implementation and ability to exploit communication locality found in many parallel applications. This study has first introduced the concept of "probability vectors", and then used it to propose a new fault-tolerant routing algorithm for k-ary n-cubes. As a first step in the new algorithm, each node A determines its view of the faulty set $F_A$ of neighbouring nodes which are either faulty or unreachable from A. Equipped with these faulty sets, node A calculates its probability vector, $P^A$, by exchanging fault information with its reachable neighbours. An element $P_l^A$, $1 \le l \le n\lfloor k/2 \rfloor$, of the vector represents the probability that a destination node at distance l cannot be reached from node A using a minimal path due to a faulty node or link along the path. Each node then uses the probability vectors to perform efficient fault-tolerant routing in the network. An analytical study has been presented to derive lower bounds on the average message distance achieved by the proposed algorithm. A performance analysis of the algorithm using simulation experiments has also been reported.
The results have revealed that the new algorithm exhibits good performance in terms of the routing distance and percentage of reachability even when the number of faulty nodes in the network is large. The results have also revealed that the algorithm maintains good performance levels as the network size scales up. The next step in our work is to apply the concept of probability vectors to other well-known multicomputer networks, such as generalised hypercubes and n-dimensional meshes.

References

[1] J. Al-Sadi, K. Day and M. Ould-Khaoua, Fault-tolerant routing in hypercubes using probability vectors, Parallel Computing 27(10) (2001) 1381–1399.


[2] E. Anderson, J. Brooks, C. Grassl and S. Scott, Performance of the Cray T3E multiprocessor, Proc. Supercomputing Conference, San Jose, California, 1997.
[3] M. S. Chen and K. G. Shin, Adaptive fault-tolerant routing in hypercube multicomputers, IEEE Trans. Computers 39(12) (1990) 1406–1416.
[4] M.-S. Chen and K. G. Shin, On hypercube fault-tolerant routing using global information, Proc. 4th Conf. Hypercube Concurrent Computers & Applications, 1989, pp. 83–86.
[5] M. S. Chen and K. G. Shin, Depth-first search approach for fault-tolerant routing in hypercube multicomputers, IEEE Trans. Parallel & Distributed Systems 1(2) (1990) 152–159.
[6] G. M. Chiu and K.-S. Chen, Fault-tolerant routing strategy using routing capability in hypercube multicomputers, Proc. Int. Conf. Parallel and Distributed Systems (1996) 396–403.
[7] K. Day and A. Al-Ayyoub, Fault diameter of k-ary n-cube networks, IEEE Trans. Parallel & Distributed Systems 8(9) (1997) 903–907.
[8] J. Duato, S. Yalamanchili and L. Ni, Interconnection Networks: An Engineering Approach, IEEE Computer Society Press, 1997.
[9] P. T. Gaughan and S. Yalamanchili, Adaptive routing protocols for hypercube interconnection networks, IEEE Computer 26(5) (1993) 12–24.
[10] S. Graham and S. Seidel, The cost of broadcasting on star graphs and k-ary hypercubes, IEEE Trans. Computers 42(6) (1993) 756–759.
[11] L. Gravano, G. Pifarre, P. Berman and J. Sanz, Adaptive deadlock- and livelock-free routing with all minimal paths in torus networks, IEEE Trans. Parallel & Distributed Systems 5(12) (1994) 1233–1251.
[12] J. M. Gordon and Q. F. Stout, Hypercube message routing in the presence of faults, Proc. 3rd Conf. Hypercube Concurrent Computers and Applications (Pasadena, California, 1988), pp. 251–263.
[13] R. E. Kessler and J. L. Schwarzmeier, CRAY T3D: A new dimension for Cray Research, in CompCon, Houston, Texas, Spring 1993, pp. 176–182.
[14] Y. Lan, A fault-tolerant routing algorithm in hypercubes, Proc. Int. Conf. Parallel Processing (1994) 163–166.
[15] T. C. Lee and J. P. Hayes, A fault-tolerant communication scheme for hypercube computers, IEEE Trans. Computers 41(10) (1992) 1242–1256.
[16] D. Linder and J. Harden, An adaptive and fault tolerant wormhole routing strategy for k-ary n-cubes, IEEE Trans. Computers 40(1) (1991) 2–12.
[17] L. M. Ni and P. K. McKinley, A survey of routing techniques in wormhole networks, IEEE Computer 26(2) (1993) 62–76.
[18] M. Noakes et al., The J-machine multicomputer: An architectural evaluation, Proc. 20th Int. Symp. Computer Architecture (1993).
[19] S. F. Nugent, The iPSC/2 Direct-Connect communication technology, Proc. Conf. on Hypercube Concurrent Computers & Applications 1 (1988) 51–60.
[20] C. S. Raghavendra, P. J. Yang and S. B. Tien, Free dimension - an effective approach to achieving fault tolerance in hypercubes, Proc. 22nd Int. Symp. Fault-Tolerant Comp. (1992) 170–177.
[21] C. P. Ravikumar and C. S. Panda, Adaptive routing in k-ary n-cubes using incomplete diagnostic information, Microprocessors and Microsystems 20 (1997) 351–360.


[22] Y. Saad and M. H. Schultz, Data communication in hypercubes, Technical Report YALEU/DCS/RR-428, Computer Science Dept., Yale University, 1985.
[23] J.-P. Sheu and M.-Y. Su, A multicast algorithm for hypercube multiprocessors, Proc. Int. Conf. Parallel Processing (1992) 18–22.
[24] B. Vanvoorst, S. Seidel and E. Barscz, Workload of an iPSC/860, Proc. Scalable High-Performance Computing Conf. (1994) 221–228.
[25] J. Wu, Adaptive fault-tolerant routing in cube-based multicomputers using safety vectors, IEEE Trans. Parallel and Distributed Systems 9(4) (1998) 321–334.
[26] J. Wu and E. B. Fernandez, Broadcasting in faulty hypercubes, Proc. 11th Symp. Reliable Distributed Systems (1992) 122–129.
