Optimal Time Broadcasting in Faulty Star Networks Aohan Mei, Feng Bao, Yukihiro Hamada, and Yoshihide Igarashi Department of Computer Science, Gunma University, Kiryu, 376 Japan Email: fmei, bao, hamada,
[email protected] Abstract. This paper investigates fault-tolerant broadcasting in star networks. We propose a non-adaptive single-port broadcasting scheme in the n-star network such that it tolerates n 0 2 faults in the worst case and completes the broadcasting in O (n log n) time. The existence of such a broadcasting scheme was not known before. A new technique, called diusing-and-disseminating, is introduced into the scheme. This technique is useful to overcome the diculty due to the special topology of star networks. We also analyze the reliability of the broadcasting scheme in the case where faults are randomly distributed in the n-star network. We show that the broadcasting scheme in the n-star network can tolerate (n!) random faults with a high probability, where is any constant less than 1.
1 Introduction Broadcasting is one of the fundamental tasks in network communications. It is the process of disseminating a message from the source node to all other nodes in the network. It can be accomplished in such a way that each node repeatedly receives and forwards messages. For the past decade an overwhelming amount of studies on broadcasting in networks have been done. There are some good survey papers on this subject, e.g.,[9],[12]. In large networks, some of the nodes and/or links may be faulty. However, multiple copies of the message can be disseminated through disjoint paths. The fault-tolerance of broadcasting can be achieved by processing this type of message multiplicities. We say that the broadcasting is successful if the correct message from the source node can reach all the healthy nodes in the network within a certain limit of time. Apparently, the maximum number of faults that can be tolerated in this way is less than the connectivity of the network in the worst case. Various models of fault-tolerant broadcasting and communication in networks have been proposed. We can contrast some features of the models. These are adaptive versus non-adaptive, single-port versus multi-port, full-duplex versus half-duplex and others [10],[12]. For some models of broadcasting it is assumed that each node has the information about the set of faults [8],[13]. For other models of broadcasting it is assumed that each node does not have any information about faults. For a model of the latter type each healthy node receives a
message from an adjacent node and sends the message to an adjacent node no matter where faulty nodes and/or links exist. Broadcasting schemes in hypercubes discussed in [5] and [14] are for a model of the latter type. Star networks were proposed as attractive interconnection networks [1]. In recent years, star networks have been much studied [2],[3],[6],[7],[15],[16]. Previous theoretical work related to this paper is as follows. The n-star network consists of n (n 0 1)-star networks and additional n!=2 links. The connectivity and the diameter of the n-star network are n 0 1 and b3(n 0 1)=2c, respectively [1], [2]. For any pair of distinct nodes u and v in the n-star network, there are n 0 1 node-disjoint paths from u to v with lengths at most dist(u; v) + 4, where dist(u; v) denotes the distance between u and v [6], [15]. It was shown that the fault diameter of the star network is diam + 1, where diam denotes the diameter of the star network [15]. In [11] a single-port broadcasting scheme for the n-star with no faults was proposed. Its running time is at most n log n and optimal in the sense that for any constant c < 1, there does not exist any single-port broadcasting scheme with running time cn log n. A problem of nding a single-port broadcasting scheme in faulty star networks was also presented in [11]. However, any substantial solution to this problem has not been given. In this paper we give an optimal solution to this problem. For the multi-port model, a faulttolerant broadcasting using routing was introduced in [4], and its running time is O(n 23 log n). With additional assumption that all nodes of the star network have the information about the set of faults, a multi-port broadcasting scheme using the minimum time and the minimal number of messages possible was presented in [8]. In this paper we consider single-port broadcasting in faulty star networks under the assumption that each node does not have any information about faults. We give a single-port broadcasting scheme that can tolerate n 0 2 faults. We show that for any constant > 0, broadcasting by our scheme terminates within (1 + )n log n time for a suciently large n. We can say that the running time of our fault-tolerant broadcasting scheme is optimal for the asymptotic order and almost optimal for the constant factor of the order. However, we should like to point out that our result is of theoretical interest. We study the asymptotic performance of broadcasting in faulty star networks from theoretical viewpoint. The broadcasting process by our scheme can be divided into two stages. The rst stage is called the diusing stage, and second stage is called the disseminating stage. In the diusing stage the message from the source node is transmitted along n 0 1 disjoint channels in the n-star network. This stage can be implemented in the single-port manner in O (n log n) time. We will explain this stage in Section 4. The disseminating stage is essentially the same as the broadcasting process commonly used in networks (e.g., broadcasting in hypercubes [14]). It can be also implemented in the single-port manner and in O (n log n) time. This stage will be explained in Section 3. In Section 5 we give a probabilistic analysis of the reliability of the broadcasting by our scheme in the n-star network with faults randomly distributed. We show that the broadcasting is successful with a probability higher than 1 0 1=n! if there are at most (n!) faulty nodes randomly
distributed in the n-star network and is a constant less than 1.
2 Preliminaries Let a a 1 1 1 an be a permutation of n symbols 1; 2; 1 1 1 ; n. For each i (2 i n) and a permutation a a 1 1 1 an , a generator gi is de ned as gi (a 1 1 1 ai ai 1 1 1 an ) = ai 1 1 1 ai0 a ai 1 1 1 an . An undirected graph G = (V; E ) is called the nstar graph (denoted by Sn ) if V = fa a 1 1 1 an j a a 1 1 1 an is a permutation of 1; 2; 1 1 1 ; ng and E = f(u; v ) j u; v 2 V and v = gi (u) for some ig. The n-star graph is also called the n-star network. For 2 i n, edges of Sn speci ed by gi are said to be of dimension i. We often denote edge (u; v ) by giu if gi (u) = v . It is immediate that Sn has n! nodes and it is (n 0 1)-regular. We can choose (n 0 1)! permutations with an identical last symbol from n! permutations. Hence, we can decompose the n-star network into n node-disjoint (n 0 1)-star networks. From this property we can say that star networks are hierarchical. As an example, we show the 4-star network in Figure 1, where #a; #b; #c and #d are connected to #a; #b; #c and #d, respectively. 1
2
1
1
1
2
1
+1
+1
1
Fig. 1.
2
1
2
The 4-star network.
Let a ; a ; 1 1 1 ; am be m distinct symbols chosen from f1; 2; 1 1 1 ; ng. There are m! permutations of a ; a ; 1 1 1 ; am . For each permutation a a 1 1 1 am , we assign an integer between 0 and m! 0 1 in the lexicographic order. This integer is called the rank of a a 1 1 1 am and denoted by r (a a 1 1 1 am ). 1
2
1
1
2
2
1
1
2
2
In Section 4, we propose a fault-tolerant broadcasting scheme that exploits smaller sub-star networks Si 's (2 i < n) recursively. In the analysis of the scheme, we specify sub-star networks in the following way. Each sub-star network Si of Sn is speci ed by a permutation of n symbols and the set of generators fg ; g ; 1 1 1 ; gig. In other words, for each i, Sn can be decomposed into (n!)=(i!) disjoint Si 's by partitioning the node set so that nodes belong to the same part of the decomposition if and only if n 0 i symbols from the rightmost of all the 2
3
nodes of the part are an identical sequence. Note that each sub-star network Si can be also decomposed into disjoint smaller sub-star networks. For simplicity, we use 1 3 n to denote a permutation with the rst and last symbols being 1 and n, respectively. This notation can be extended to denote other permutations. For example, 1 3 n 0 1n denotes a permutation with the rst, the (n 0 1)st and the nth symbols being 1, n 0 1 and n, respectively. We assume that each node represents a processor and each link represents a bidirectional communication line connecting two nodes at the extremes of the link. All the nodes in a network are synchronized with a global clock. Broadcasting time is measured as the number of steps to complete the broadcasting. In each step every node can send a message to at most one neighbor node and can receive a message from at most one neighbor node. Such a model is called a single-port network. Communication from node b b 1 1 1 bn to node a a 1 1 1 an can be done in the same fashion as communication from the identity permutation 12 1 1 1 n to node f (a )f (a ) 1 1 1 f (an ), where f (bk ) = k for each k (1 k n). Hence, it is sucient to consider broadcasting only from the source node 12 1 1 1 n. All logarithms in this paper are to the base 2. 1
1
2
1
2
2
3 An Information Disseminating Scheme In this section we give a natural broadcasting scheme in Sn . The principle of the broadcasting scheme is the same as that of broadcasting in hypercubes given in [5], [14]. It is described as the following procedure. The procedure is executed at each node u of Sn concurrently. procedure Dissem(n; t) (* it is executed at each node u *) repeat t times for i := 2 to n do if u held the message before the current step then u sends the message along dimension i For the implementation of procedure Dissem(n; t), it is not necessarily that each node knows whether its incident nodes and links are healthy and which is the source node of broadcasting.
Theorem 1. If there are at most n02 faulty nodes and/or links in Sn , then every node receives the message from the source node by executing Dissem(n; diam +1) at each node in Sn . Proof. For any pair of distinct nodes u and v in Sn , there are n 0 1 node-disjoint paths from u to v with lengths at most minfdist(u; v ) + 4; diam + 1g [6], [15]. Hence, the theorem holds.
Corollary 2. If there are at most n 0 2 faulty nodes and/or links in Sn , then the message from the source node can reach every node within distance d by executing procedure Dissem(n; d + 4) at each node in Sn . 2
Theorem 3. For any t < bn=2c, procedure Dissem(n; t) cannot complete broadcasting even if there are no faults in Sn . Proof. Without loss of generality we may assume that the source node is 12 1 1 1 n. Consider a node = a a 1 1 1 an such that for each i (1 i bn=2c), ai > bn=2c. Obviously there exists such a node. We show that the message from the source node cannot reach node by executing procedure Dissem(n; t) if t < bn=2c. We call each loop of the repeat statement of Dissem(n; t) a round. Then Dissem(n; t) consists of t rounds and each round consists of n 0 1 steps. For each node u = u u 1 1 1 un , let l(u) = jfui : 1 i bn=2c and ui > bn=2cgj. Then apparently l (12 1 1 1 n) = 0 and l() = l(a a 1 1 1 an ) = bn=2c. If the message held in node u moves to node v during a round of procedure Dissem, then l(v) l(u) + 1. Hence, if the message from the source node reaches node v by Dissem(n; t) then l(v) t. We therefore need at least bn=2c rounds to complete broadcasting by procedure Dissem. 1
1
2
2
1
2
From Theorem 3.1 and Theorem 3.3 we can say that procedure Dissem can broadcast a message from the source node throughout the network Sn in 3n =2+ O(n) steps if there are at most n 0 2 faults, but cannot complete broadcasting in Sn in bn=2c(n 0 1) steps even if there are no faults. 2
4 An Ecient Fault-Tolerant Broadcasting Scheme From the hierarchical structure of star networks, a broadcasting scheme was proposed in [11]. Roughly speaking, the scheme consists of n 0 2 rounds. Suppose that the source 12 1 1 1 n has initially a message which will be broadcast in Sn . The source sends the message to n sub-star networks Sn0 's so that at least one node of each Sn0 receives the message. Recursively continue this process. Then at least one node of each sub-star network Sn0i in the ith round (i 2) has the message. Each node with the message in each Sn0i sends the message to n 0 i +1 sub-star networks Sn0i in the (i +1)st round so that at least one node in each Sn0i receives the message. Clearly, all the nodes in Sn receive the message from the source after repeating the round n 0 2 times. This scheme, however, may fail if there is a fault in Sn . Let us recall the de nition of function r given in Section 2. For example, for each permutation of arbitrary three distinct symbols function r assign a rank between 0 and 5 in lexicographic order. For permutations of integers 1, 3, and 5, r(135) = 0, r (153) = 1, r(315) = 2, r(351) = 3, r (513) = 4, and r (531) = 5. For a pair of permutations p and p of the same length (not necessarily they are permutations of the same set of integers), if r (p ) 6= r(p ) then there is at least one position where the symbols of p and p are distinct. Note that p and p are distinct if r(p ) 6= r(p ) and that p and p are not necessarily identical even if r(p ) = r(p ). Let d be the minimum positive integer such that d! n. Let p = p p 1 1 1 pn be an arbitrary permutation of 1; 2; 1 1 1 ; n, and let p[i; j ] denote pi pi 1 1 1 pj , where 1 i j n. We partition the label of each permutation p into three 1
1
+1
+1
1
2
1
1
2
1
1
2
2
2
1
1
2
2
1
2
+1
intervals. The rst interval is just the rst symbol p[1; 1] and called the head. The second interval is p[2; 3d + 1] and called the identi er district. The identi er district is divided into three blocks, p[2; d +1], p[d +2; 2d +1], and p[2d +2; 3d +1]. The last interval is p[3d + 2; n]. (See Figure 2.) The broadcasting process by our scheme is divided into two stages, called the diusing stage and the disseminating stage. In the diusing stage, the message from the source node s = 12 1 1 1 n is transmitted along n 0 1 internally disjoint channels. Each channel contains at least one node in every (3d + 1)-sub-star such sub-star networks. In other network. Note that there are totally dn words, after the diusing stage, at least one node in each (3d + 1)-sub-star network holds the message if there exist at most n 0 2 faults in Sn . Hence, after the diusing stage, for every node u in Sn , there exists a node v holding the message within distance b d c from u. Then, Dissem (n; b d c + 4) is executed in the disseminating stage. By Corollary 3.3, every node can receive the message in this way if there exist at most n 0 2 faults in Sn . !
(3 +1)!
9
9
2
2
Fig. 2.
The identi er district of p.
The diusing stage consists of the pre-stage and the recursive stage. During the pre-stage, s sends its message to its n 0 2 neighbors g (s) = 213 1 1 1 n, g (s) = 3214 1 1 1 n; 1 1 1, gn0 (s) = (n 0 1)2 1 1 1 (n 0 2)1n. Then the n 0 1 nodes with the message (including s) send the message along dimension n. Then, n 0 1 nodes t = n 3 1, t = n 3 2, 1 1 1, tn0 = n 3 (n 0 1) receive the message. Next for each j (1 j n 0 1), the message is transmitted from tj to wj along an appropriate route, where wj satis es the conditions that wj [3d + 2; n] = tj [3d + 2; n] and r(wj [2; d + 1]) = r(wj [d + 2; 2d + 1]) = r(wj [2d + 2; 3d + 1]) = j . For each j (1 j n 0 1)) we can specify such a route from tj to wj by choosing appropriate nodes whose labels are obtained by some changes of symbols in tj [2; d + 1], tj [d + 2; 2d + 1], and tj [2d + 2; 3d + 1]. For any pair of distinct i and j (1 i; j n 0 1) the route from ti to wi and the route from tj to wj are node-disjoint since the last symbol of the label of each node on the former route is i while the last symbol of each node on the latter route is j . The recursive stage follows the pre-stage. The recursive stage is consistent with the hierarchical structure of Sn . For clear explanation, we now assume that no faults exist in Sn . Suppose that there exist n 0 1 nodes holding the message in an m-sub-star network in the recursion stage. Let these n 0 1 nodes be x ; x ; 1 1 1 ; xn0 , where for each j (1 j n 0 1), r (xj [2; d + 1]) = r (xj [d + 2
1
1
2
1
2
1
1
3
2; 2d + 1]) = r (xj [2d + 2; 3d + 1]) = j . Then for each j (1 j n 0 1), the message is transmitted from xj to m nodes that are in dierent (m 0 1)-substar networks. That is, for each j (1 j n 0 1), xj transmits the message to a node, say yj in each of the m 0 1 Sm0 , where yj does not necessarily satisfy r(yj [2; d + 1]) = r (yj [d + 2; 2d + 1]) = r(yj [2d + 2; 3d + 1]) = j . However, at least two of r (yj [2; d + 1]), r (yj [d + 2; 2d + 1]) and r (yj [2d + 2; 3d + 1]) are equal to j . If one of these three ranks is not equal to the others, then we can choose a route from yj to a node in the same Sm0 , say zj which is obtained by some changes of symbols in the corresponding intervals so that r (zj [2; d + 1]) = r(zj [d + 2; 2d + 1]) = r(zj [2d + 2; 3d + 1]) = j . Then we move to the next round of the recursive stage. The recursive stage stops when it reaches (3d + 1)-substar networks. The principle of the recursive stage is depicted in Figure 3. More detailed description about the recursive stage is given in Subsection 4.1. 1
1
Fig. 3.
Message routes from Sm to Sm01 in the recursive stage.
4.1 Construction of n 0 1 Node-Disjoint i-Level Channels De nition 4. For a node p in Sn , the identi er (id for short) of p is r(p[i; i + d]) if r(p[i; i + d]) = r (p[j; j + d]) for some i 6= j (i; j = 1; d + 1 or 2d + 1), and otherwise it is 0.
Lemma 5. For any pair of permutations p and p of length n, they are distinct if their id's are not identical. 2 1
2
We now explain the process of the recursive stage in detail. For each j (1 n 0 1), wj has the message at the beginning of the recursive stage. We
j
therefore may consider that each wj is a message source at the beginning of the recursive stage. Hereafter, we call them message sources. Remember that r(wj [2; d + 1]) = r(wj [d + 2; 2d + 1]) = r(wj [2d + 2; 3d + 1]) = j . The recursive stage is divided into n 0 3d 0 1 rounds. For each i (1 i n 0 3d 0 1) and each j (1 j n 0 1), let Wi (wj ) be the set of nodes that hold the message from wj and are ready to broadcast the message at the beginning of round i. Initially, let W (wj ) = fwj g and Wi (wj ) = (i > 1). Its contents will be renewed at the beginning of each round i. For each i (1 i n 0 3d 0 1) and each j (1 j n 0 1), during round i each node in Wi (wj ) executes the following operations: 1
(1) Each node p 2 Wi (wj ) sends its message to g (p); g (p); 1 1 1 ; g d (p) sequentially. Then node p broadcasts its message in a binary jumping way as described in Section 4.2. That is, p sends its message to gd (p ); g d (p ); 1 1 1 ; g d k (p) sequentially, where k is the maximum integer such that 3d + 1 + 2k n 0 i. 2
3 +1+1
3 +1+2
3
3 +1
3 +1+2
When all of g (p); g (p); 1 1 1 ; g d (p) have received the message, each node u 2 fg (p); g (p); 1 1 1 ; g d (p)g executes the following operations: 2
2
3
3 +1
3
3 +1
(2) Along gn0i , u sends the message received from p. Then after gn0i (u) receives the message, it transmits the message to node u0 through a path such that r(u0 [2; d + 1]) = r(u0 [d + 2; 2d + 1]) = r (u0 [2d + 2; 3d + 1]) = j . Let Routepi;j (u) be the set of nodes on the path from u to u0 . Node u0 is added to Wi (wj ). +1
+1
(
)
+1
Let 3d + 1 < dim v in the set of nodes (including gd (p ); g d ;111; v gd k (p)) that received the message along an edge g dim broadcasts the message in a binary jumping way as described in (3) below: 3d + 1 + 2k . Each node
3 +1+1
3 +1+2
3 +1+2
(3) When p sends a message along g pd l01 , v rst receives a message v (it is not necessarily that v receives the message directly from from gdim p). Then v sends its message to gdim l (v); gdim l+1 (v); 1 1 1 ; gdim q (v) sequentially, where q is the maximum integer such that dim + 2q n 0 i. When the binary jumping transmissions nished, p and all the nodes that received the message along a dimension among g d ; g d ; 1 1 1 ; gn0i , send the message along gn0i . The set of the nal n 0 3d 0 i nodes that have received the message along gn0i is denoted by BJi (p). All the nodes in BJi (p) are added to Wi (wj ), too. 3 +1+2
+2
+2
+2
3 +2
3 +3
+1
+1
+1
The operations listed (1), (2) and (3) above are called Rule (1), Rule (2) and Rule (3), respectively. A detailed description about this scheme will be given in Subsection 4.2.
Lemma 6. Let wj = 3j (1 j < n) be a message source in Sn at the beginning of the recursive stage. For each i (1 i < n 0 3d) and for an arbitrary node p 2 Wi (wj ), r(p[2; d + 1]) = r(p[d + 2; 2d + 1]) = r(p[2d + 2; 3d + 1]) = j . 2
De nition 7. Let p 2 Wi (wj ), H = fg (p); g (p); 1 1 1 ; g d (p)g and Yi (wj ; p) = S Routepi;j (u). Let Xi (wj ; p) be the set of nodes that have received the mesu2H sage directly or indirectly from node p but not in YS i (wj ; p) during round i. We S Xi (wj ; p). Yi (wj ; p) and Vi (wj ) = de ne Ui (wj ) = 2
(
3
3 +1
)
p2Wi (wj )
p2Wi (wj )
De nition 8. Let wj be a message source in Sn at the beginning of the recursive stage. For 1 i < n 0 3d, the i-level channel rooted at wj , denoted by Ci (wj ), is de ned as
Si
(Uk (wj ) [ Vk (wj )).
k=1
The next lemma is immediate from the de nition of the i-level channel.
Lemma 9. For 1 i < n 0 3d, at least one node in each Sn0i obtained by xing the last i symbols of the labels of nodes of Sn is contained in the i-level channel rooted at wj , Ci (wj ). 2 Lemma 10. Let wj = 3j (1 j < n) be a message source in Sn at the beginning of the recursive stage satisfying r(wj [2; d + 1]) = r(wj [d + 2; 2d + 1]) = r(wj [2d + 2; 3d + 1]) = j . Then for each i (1 i < n 0 3d) there exists an i-level channel rooted at wj , Ci (wj ), such that the id of every node in Ci (wj ) is j . Proof. By the de nition of the i-level channel rooted at wj , Ci (wj ) =
Si
(Uk (wj )
k=1
[Vk (wj )). From Lemma 4.2, for each node p 2 Wi(wj ), r(p[2; d + 1]) = r(p[d + 2; 2d + 1]) = r (p[2d + 2; 3d + 1]) = j . According to Rule 3 of the recursive stage, the identi er district of each node in Vk (wj ) (1 k < n 0 3d) remains unchanged. Hence, the id's of all the nodes in
n0S 3d01
Vk (wj ) are equal to j . On the other hand, from Rule 1 and Rule 2 of the recursive stage, at most one of u[2; d + 1], u[d + 2; 2d + 1] and u[2d + 2; 3d + 1] of each node u 2 Uk (wj ) (1 k < n 0 3d) n0Sd0 has changed. Hence, the id's of all the nodes in Uk (wj ) are equal to j . k=1
3
1
k=1
From Lemma 4.1 and Lemma 4.4, the following theorem is immediate.
Theorem 11. Let w ; w ; 1 1 1 ; wn0 be n 0 1 message sources in Sn . Then for each i (1 i < n 0 3d), there exist n 0 1 node-disjoint i-level channels Ci (w ); Ci (w ); 1 1 1 ; Ci (wn0 ). 2 1
2
1
1
2
1
Lemma 12. For 1 i < n 0 3d, the time needed in round i of the recursive stage is 3d + maxflog(n 0 3d 0 i) + 1; b d c + 1g. 3
2
Proof. The process of round i contains two parts. One part is to broadcast the message to Ui (wj ); and the other is to broadcast the message to Vi (wj ). At the beginning of round i, each node p 2 Wi (wj ) sends its message to g (p); g (p); 1 1 1 ; g d (p) sequentially. This takes 3d steps. Then node p sends the message to 2
3 +1
3
gd (p ); g d (p ); 1 1 1 ; g d (p) sequentially. Simultaneously g (p); g (p); 1 1 1 ; g d (p) that received the message from p, send the message along gn0i . 3 +1+1
3 +1+2k
3 +1+2
2
3
Since the distance between u and u0 speci ed in Rule 2 of the recursive stage is at most 1 + b dc, the time needed in round i is 3d + b dc + 1. On the other hand, broadcasting time to Vi (wj ) is 3d + log(n 0 3d 0 i) + 1. Hence, the lemma holds. 3 +1
+1
3
3
2
2
Theorem 13. Let s be the source node in Sn . For an arbitrary sub-star network S d obtained by xing the last n 0 3d 0 1 symbols of the labels of nodes in Sn , s can send the message to n 0 1 distinct nodes in the S d through n 0 1 noden0Pd0 (3d +maxflog(n 0 3d 0 i)+1; b d c +1g) disjoint paths within n + b d c0 1+ i steps. 3 +1
3 +1
3
9
2
1
3
2
=1
Proof. Initially, s sends the message to n 0 2 neighbors g (s); g (s); 1 1 1 ; gn0 (s) sequentially. Then the n 0 1 nodes with the message (including s) send the message along dimension n. So, n 0 1 nodes 31; 32; 1 1 1 ; 3n 0 1 receive the message. Furthermore, each node 3j (1 j < n) transmits the message to node wj satisfying r(wj [2; d + 1]) = r(wj [d + 2; 2d + 1]) = r (wj [2d + 2; 3d + 1]) = j . Clearly, these n 0 1 paths are node-disjoint. Then we may regard such node wj as a message source, and let these nodes be w ; w ; 1 1 1 ; wn0 . These operations need at most n + b d c 0 1 steps totally. From Lemma 4.3, Cn0 d0 (wj ) (1 j < n) will be used to send the message of wj to all the S d 's obtained by xing the last n 0 3d 0 1 symbols of the node labels. That is, at least one node in each S d has received the message from wj . From Theorem 4.5, arbitrary two Cn0 d0 (wj1 ) and Cn0 d0 (wj2 ) are node-disjoint if j 6= j . From Lemma 4.6, the total time is n + b d c 0 1 + 2
1
2
3
1
1
9
2
3
1
3 +1
3 +1
3
n0P 3d01 i=1
1
2
(3d + maxflog(n 0 3d 0 i) + 1; b
1
3
1 9
2
d c + 1g). Hence, the theorem holds. 2
3
4.2 Diusing-and-Disseminating Scheme in Faulty Star Networks We assume that there exist at most f faults in Sn , where f < n 0 1. Our broad-
casting scheme, called the Diusing-and-Disseminating, is described as follows: procedure Diusing-and-Disseminating (* for each node u *) if u is the source node then =3 the pre-stage 3= for i := 2 to n 0 1 do send the message along giu if u has the message then send the message along gnu if u received a message from gnu then for i := 1 to b d c do adjust u so that r(u[2; d + 1]) = r(u[d + 2; 2d + 1]) = r(u[2d + 2; 3d + 1]) = u[n; n] for i := 0 to n 0 3d 0 2 do =3 the recursive stage 3= 9
2
begin
if u has the message then begin for j := 2 to 3d + 1 do
send the message along gju call Binary-jump(3d + 1; n 0 i)
end if u received the message from gju (1 < j < 3d + 2) then send the message along gnu0i if u received a message from gnu0i then call Route(u) end call Dissem(n; b d c + 4) =3 the disseminating stage 3= 9
2
The following procedure Binary-jump(n ; n ) is to distribute the message from a node in Sn to its neighbors in a binary jumping way. The node with the message at the beginning of each round i is regarded as a message source in round i. procedure Binary-jump(n ; n ) (* for each node u *) if u is a message source then X := n for i := 1 to dlog(n 0 n + 1)e do if u has the message then 1
1
2
2
1
2
1
begin dim := X + 2i0 if dim minfn 0 1; n g then u send the message along gdim end else if the message was rst received by u then X := the dimension through which the message came if u has the message then send the message along gnu2 Lemma 14. Let u = h d 1 1 1 dn1 an1 1 1 1 an2 3 be a message source. After u executes the for loop of Binary-jump(n ; n ), there exists a set of n 0 n nodes fan1 d 1 1 1 dn1 3; an1 d 1 1 1 dn1 3, 1 1 1 ; an2 d 1 1 1 dn1 3g such that each node in the 1
2
1
2
+1 1
+1
2
+2
2
2
1
2
2
set has received the message. Proof. We prove the lemma by induction on i, the iteration number of for loop. For i = 1, the message source sends the message along gnu1 , and then the node receiving the message has an1 in the rst position of its label. Suppose that the lemma holds true up to the ith iteration. That is, after the ith iteration, the nodes that received the message have an1 ; 1 1 1 ; an1 i 0 in the rst positions of their labels. We rst assume that 2i n . Then at the (i + 1)st iteration the message source u sends the message to a node with an1 i at the rst position of its label. Since node an1 l 3 (1 l < 2i ) received the message from dimension n + l, the node sends the message to a node with an1 l i at the rst position of its label. If 2i n and 2i > n , then there exists a number c (1 c < 2i ) such that 2i + c = n . +1
+1
+1
+2
+1
2
+
+2
1
+ +2
2
2
+1
2
1
In the diusing stage, a node with the message often has a block in its identi er district such that the rank of the block is dierent from the rank of the corresponding block of the message source. Such a block is called a destroyed block. Route(u) can transmit a message from u to a node with no destroyed block in its id district. procedure Route(u) (*for each node u *) if not r(u[2; d + 1]) = r(u[d + 2; 2d + 1]) = r(u[2d + 2; 3d + 1]) then
begin
nd the destroyed block u[j + 1; j + d] (j = 1; d + 1 or 2d + 1) for i := 1 to b d c do adjust u[j + 1; j + d] so that r(u[2; d + 1]) = r(u[d + 2; 2d + 1]) = r(u[2d + 2; 3d + 1]) end 3
2
Lemma 15. For an arbitrary node u in Sn , Route(u) can transmit a message from u to a node u0 with r(u0 [2; d +1]) = r(u0 [d +2; 2d +1]) = r(u0 [2d +2; 3d +1]) within b d c steps if just one u[j + 1; j + d] (j = 1; d + 1; or 2d + 1) is destroyed. 3
2
Proof. These two nodes u and u0 are in the same Sd obtained by xing the last n 0 d 0 1 symbols of the node labels in Sn . Hence, the maximum number of steps needed to transmit the message is at most the diameter of Sd . +1
+1
Theorem 16. Procedure Diusing-and-Disseminating can broadcast a message from the source node to all other nodes in Sn within (1 + ")n log n steps if there exist at most n 0 2 faulty nodes and/or links in the network, where " is a positive constant less than 1.
Proof. For a suciently large n, that (" log n)! n. Thus, assume that d = " log n. The total time T (n) needed by Diusing-and-Disseminating is evaluated as follows: 1
1
9d 9d T (n) = n + 0 1 + ( + 4)(n 0 1) 2 2 n0X d0 3d (3d + maxflog(n 0 3d 0 i) + 1; + 1g) + 3
1
2
i=1
n + 92d 0 1 + ( 92d + 4)(n 0 1) + ( 92d + 1)(n 0 3d 0 1) +
n0X 3d01 i=1
(log(n 0 3d 0 i) + 1)
= "n log n + log(n 0 3d 0 1)! + O(n) = (1 + ")n log n + O(n)
5 Analysis of Broadcasting in Sn with Random Faults Since the connectivity of Sn is n 0 1, any broadcasting scheme in Sn can tolerate at most n 0 2 faults. However, if we assume that faulty places are randomly distributed in a network then the worst case occurs rarely. Hence, even if there exist much more than n 0 2 faults in Sn , broadcasting may succeed with a high probability. In this section, we show that for any constant < 1, if there are no more than (n!) faulty nodes randomly distributed in Sn , then broadcasting by our scheme succeeds with a probability higher than 1 0 1=n!. For simplicity, we assume that there are at most f faulty nodes and that every link is healthy. Consider all con gurations of Sn with at most f faulty nodes. We also assume that for each i (1 i f ) all con gurations of Sn with exact i faulty nodes are equally P probable and that the probability of a con guration with i faults is pi , where fi pi = 1. We denote a con guration of Sn with i faulty nodes by conf (n; i). Apparently the number of dierent conf (n; i)'s is n! . We now estimate the probability that broadcasting by our scheme fails. i =1
We denote CONF (n; i)(fail) = fconf (n; i) : broadcasting fails in conf (n; i)g. Then equality (1) is immediate.
P robfbroadcasting succeedsg = 1 0
f X i=1
0 n!
jCONF (n; i)(fail)j i
1
pi : (1)
Let s be the source node and v be an arbitrary node in Sn . Denote CONF (n; i)(fail; v) = fconf (n; i) : the message from s cannot reach v by our broadcasting scheme in conf (n; i)g. Then we have the next inequality.
jCONF (n; i)(fail)j
10
f X i=1
2n!(n log n)n0
1
n! 0 (n 0 1) i 0 (n 0 1)
0 n! i
1
pi :
(7)
Theorem 17. For any constant < 1, if there are no more than (n!) faulty nodes randomly distributed in Sn , broadcasting by our scheme succeeds with a probability higher than 1 0 1=n!. Proof. Let f = (n!) . From inequality (7) and the following inequality n! 0 (n 0 1) n! 0 = i(i 0 1) 1 1 1 (i 0 n + 2) < i n0 ; i 0 (n 0 1) i n!(n! 0 1) 1 1 1 (n! 0 n + 2) n! 1
1
P robfsuccessg > 1 0
f X 1 (n!) n log n n0 n i log n n0 > 10 > 10 : pi n! n! n! 2
i=1
1
2
1
The lower bound obtained above on the successful probability is not tight. Some failure con gurations are counted more than once in our analysis, and some con gurations that are actually not failure ones are also counted as failure con gurations in our analysis. However, the result is quite satisfactory since (n!) is much larger than n 0 2 while the lower bound 1 0 n on the probability of successful broadcasting is highly close to 1. We therefore can say that our probabilistic analysis is acceptable. We can similarly analyze the case where there exist only failed links randomly distributed in Sn . We can also analyze in a similar way the case where there exist both failed nodes and failed links, if a reasonable probabilistic distribution of the faults can be given. 1
!
6 Conclusion We showed that our broadcasting scheme tolerates up to n 0 2 faults in Sn and that its running time is O(n log n). This running time is optimal for the asymptotic order and almost optimal for the constant factor of the order. We conjecture that the scheme might be optimal even for the constant factor of the order, too. This problem is theoretically interesting, and worthy for the further investigation.
References 1. S. B. Akers, D. Harel, and B. Krishnamurthy, \The star graph: An attractive alternative to the n-cube." In Proc. Int. Conf. Parallel Processing, pp. 393{400, 1987. 2. S. B. Akers and B. Krishnamurthy, \On group graphs and their fault tolerance." IEEE Trans. Computers, Vol. C-36, pp. 885{888, 1987. 3. S. B. Akers and B. Krishnamurthy, \A group-theoretic model for symmetric interconnection networks." IEEE Trans. Computers, Vol. 38, pp. 555{566, 1989. 4. N. Bagherzadeh, N. Nassif, and S. Lati , \A routing and broadcasting scheme on faulty star graphs." IEEE Trans. Computers, Vol. 42, pp. 1398{1403, 1993. 5. S. Carlsson, Y. Igarashi, K. Kanai, A. Lingas, K. Miura, and O. Petersson, \Information disseminating schemes for fault tolerance in hypercubes." IEICE Trans. Fundamentals, Vol. E75-A, pp. 255{260, 1992. 6. K. Day and A. Tripathi, \A comparative study of topological properties of hypercubes and star graphs." IEEE Trans. Parallel and Distributed Systems, Vol. 5, pp. 31{38, 1994. 7. P. Fragopoulou and S. G. Akl, \Optimal communication algorithms on star graphs using spanning tree constructions." Journal of Parallel and Distributed Computing, Vol. 24, pp. 55{71, 1995. 8. L. Gargano, A. A. Rescigno, and U. Vaccaro, \Optimal communication in faulty star networks." Manuscript, 1995. 9. S. M. Hedetniemi, S. T. Hedetniemi, and A. L. Liestman, \A survey of gossiping and broadcasting in communication networks." Networks, Vol. 18, pp. 319{349, 1988. 10. A. L. Liestman, \Fault-tolerant broadcast graphs." Networks, Vol. 15, pp. 159{171, 1985.
11. V. E. Mendia and D. Sarkar, \Optimal broadcasting on the star graph." IEEE Trans. Parallel and Distributed Systems, Vol. 3, pp. 389{396, 1992. 12. A. Pelc, \Fault-tolerant broadcasting and gossiping in communication networks." Manuscript, 1995. 13. D. Peleg, \A note on optimal time broadcast in faulty hypercubes." Journal of Parallel and Distributed Computing, Vol. 26, pp. 132{135, 1995. 14. P. Ramanathan and K. G. Shin, \Reliable broadcast in hypercube multicomputers." IEEE Trans. Computers, Vol. 37, pp. 1654{1657, 1988. 15. Y. Rouskov and P. K. Srimani, \Fault diameter of star graphs." Information Processing Letters, Vol. 48, pp. 243{251, 1993. 16. S. Sur and P. K. Srimani, \A fault tolerant routing algorithm in star graph interconnection networks." In Proc. Int. Conf. Parallel Processing, Vol. III, pp. 267{270, 1991.
This article was processed using the LaTEX macro package with LLNCS style