Efficient Routing in Interconnection Networks Based on Cycle Prefix Graphs

W. Chen, V. Faber, E. Knill
Los Alamos National Laboratory, Los Alamos, NM 87545
May 1993; revised September 1993.

Index terms: Cayley graphs, networks, routing.

Abstract. Cycle prefix (CP) graphs are a class of directed graphs whose vertices are the permutations of $\{1, \dots, n\}$ and whose edges are the pairs $(\sigma(1\,2\,\cdots\,k), \sigma)$ for $2 \le k \le n$. CP graphs are vertex symmetric and have diameter and degree $n-1$. They have been proposed as graphs underlying interconnection networks. We give a new efficient routing scheme for CP graphs based on the inversion code for permutations. This routing scheme has the advantage of requiring no more than $\sum_{i=1}^{n} \lceil \log_2 i \rceil$ bits in addition to the message. The best previously known method requires $(n-2)\lceil \log_2(n+1) \rceil$ bits.

1 Introduction

We describe an efficient new routing scheme for cycle prefix (CP) graphs. CP graphs [6] are a class of directed graphs with small diameter and outdegree for a given number of vertices. Specifically, CP(n) is a graph with $n!$ vertices and outdegree and diameter $n-1$ (see below for details). This compares very favorably with the hypercube: a hypercube with about $n!$ vertices has degree and diameter about $\log_2(n!)$, which grows like $n \log_2 n$. It also compares well with other graphs with good degree/diameter properties that have been proposed for interconnection networks. For example, the n-star graph [2] also has $n!$ vertices and outdegree $n-1$, but its diameter is $\lfloor 3(n-1)/2 \rfloor$. CP graphs are therefore promising candidates for highly connected interconnection networks.

Although the routing method for CP graphs given in [6] is appealingly simple, the number of bits required for the message headers is $(n-2)\lceil \log_2(n+1) \rceil$, which is substantially larger than $\lceil \log_2(n!) \rceil$. The number of bits required for the header could be reduced to the minimum by using an arbitrary encoding of addresses, for example by letting the header be the code of the destination node. In general, however, such an encoding requires large lookup tables at each node to determine the next edge on the path. A good encoding allows efficient processing of the headers using little local storage or processing effort beyond that required for receiving and buffering.

The routing scheme described here is based on the inversion code for permutations. It requires $\sum_{i=1}^{n} \lceil \log_2 i \rceil$ bits for the header. This decreases message latency, particularly if the average size of a message is small. We show that our routing scheme can be easily implemented: generating a header from an internal description of the destination is fast and parallelizable. The algorithm is compatible with many of the known methods for increasing network throughput and for avoiding deadlock [4] [5].

The outline of the paper is as follows. Section 2 contains the definitions and an overview of CP graphs and their generalizations. The combinatorial coding method underlying the routing algorithm is described in Section 3. The algorithm is given in Section 4 and analyzed in Section 5. A generalization to CP coset graphs is outlined in Section 6. Finally, in Section 7, we conclude with a discussion of the properties of the algorithm and its relationship to other networks and routing methods.

2 Definitions

The cycle prefix graphs (CP graphs) are Cayley coset digraphs defined on the group $S_n$ of permutations of $[1,n] = \{1, \dots, n\}$. Application of permutations is on the left. If $\pi$ is a permutation which maps $i$ to $\pi(i)$, then we write $\pi = (\pi(1), \pi(2), \dots, \pi(n))$. Thus permutations are represented as sequences of distinct elements. Composition is defined by $(\pi\sigma)(i) = \pi(\sigma(i))$. The cycle prefix permutations $\rho_k$ with $2 \le k \le n$ are the permutations which cyclically permute $[1,k]$ to the left and leave the other numbers fixed. Thus
$$\rho_k = (k, 1, \dots, k-1, k+1, \dots, n).$$
Let CP(n,1) be the digraph whose vertices are the permutations of $[1,n]$ and whose edges are the ordered pairs $(\pi, \sigma)$ such that $\sigma = \pi\rho_k$ for some $k$. Figure 1 shows CP(3,1).

Figure 1: CP(3,1).

For general $k \ge 1$, CP(n,k) is obtained by identifying vertices of CP(n,1) modulo a subgroup of $S_n$. Let $H_k$ be the subgroup of $S_n$ consisting of the permutations $\tau$ with $\tau(i) = i$ for $i \le n-k$. CP(n,k) is the digraph whose vertices are the left cosets $\pi H_k$ of $H_k$ and whose edges are the ordered pairs $(\pi H_k, \sigma H_k)$ such that $\pi\tau\rho_j \in \sigma H_k$ for some $\tau \in H_k$ and some $2 \le j \le n$. Note that $H_1 = \{[1,n]\}$ contains only the identity permutation, so for $k = 1$ this definition is consistent with the earlier definition of CP(n,1).

The graphs CP(n,k) can be viewed in terms of partial permutations. A partial permutation of $[1,n]$ of length $k$ is a sequence $p = (p(1), \dots, p(k))$ of distinct elements of $[1,n]$. We identify the set of length $n-k$ partial permutations of $[1,n]$ with $S_n/H_k = \{\pi H_k \mid \pi \in S_n\}$: the coset of $H_k$ corresponding to the partial permutation $p = (p(1), \dots, p(n-k))$ consists of the permutations $\pi$ with $\pi(i) = p(i)$ for $1 \le i \le n-k$. In this way, the vertices of CP(n,k) can be identified with partial permutations of length $n-k$. The neighbors of $p$ in CP(n,k) are the partial permutations of the form $(p(i), p(1), \dots, p(i-1), p(i+1), \dots, p(n-k))$ with $2 \le i \le n-k$, and $(y, p(1), \dots, p(n-k-1))$, where $y \notin \{p(1), \dots, p(n-k)\}$. Figure 2 shows CP(4,2).

To describe the basic properties of CP graphs and the routing methods, we define notation for operations on partial permutations. Let $p = (p(1), \dots, p(k))$ be a partial permutation. Concatenation of sequences is denoted by juxtaposition. For example, if $q = (q(1), \dots, q(l))$, then $pq = (p(1), \dots, p(k), q(1), \dots, q(l))$. We write $j \in p$ iff $j = p(i)$ for some $i$. If $j = p(i)$, then $p^{-1}(j) = i$. If $j \notin p$, then define $p^{-1}(j) = k+1$. Let $p \setminus j$ denote the sequence obtained from $p$ by removing $j$ if $j \in p$, and by removing the last element of $p$ if $j \notin p$. Thus the neighbors of $p$ in CP(n, n-k) are the partial permutations $(i)(p \setminus i)$ with $i \ne p(1)$.
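As a concrete check of these definitions, here is a small Python sketch that builds CP(n,k) directly from the neighbor rule $(i)(p \setminus i)$, $i \ne p(1)$. It assumes partial permutations are stored as tuples; the function names `remove`, `neighbors` and `cp_graph` are our own, hypothetical choices and not taken from the paper.

```python
from itertools import permutations


def remove(p, j):
    """Drop j from p if j occurs in p; otherwise drop the last entry of p."""
    if j in p:
        return tuple(x for x in p if x != j)
    return p[:-1]


def neighbors(p, n):
    """Out-neighbours (i)(p minus i), i != p(1), of a partial permutation p of [1, n]."""
    return [(i,) + remove(p, i) for i in range(1, n + 1) if i != p[0]]


def cp_graph(n, k):
    """Adjacency lists of CP(n, k): vertices are the partial permutations of length n - k."""
    return {p: neighbors(p, n) for p in permutations(range(1, n + 1), n - k)}


g = cp_graph(4, 2)
print(len(g))                 # 12 vertices, i.e. 4!/2!
print(neighbors((1, 2), 4))   # [(2, 1), (3, 1), (4, 1)]
```

Every vertex of CP(4,2) built this way has exactly $n - 1 = 3$ distinct out-neighbors, all of which are again vertices of the graph.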

Let $p \setminus q = (((p \setminus q(l)) \setminus q(l-1)) \cdots) \setminus q(1)$. The expression $[i,j]$ stands for the partial permutation $i(i+1)\cdots j$. If $j < i$, $[i,j]$ is considered to be the empty sequence. Let $p[i,j]$ denote the sequence $(p(i), p(i+1), \dots, p(j))$. If $q$ is a partial permutation, then $q \preceq p$ means that $q$ is a subsequence of $p$.

The graphs CP(n,k) have the unique shortest path property: between any two vertices of CP(n,k) there is a unique shortest path. The marked path in Figure 2 shows the shortest path from $(1,2)$ to $(1,4)$ in CP(4,2). The shortest path is found as follows. Let $p$ and $q$ be two partial permutations of length $n-k$. To determine the first edge on the shortest path from $p$ to $q$, find the minimum $l$ such that (1) $q[l, n-k] \preceq p$ and (2) $p(i) \in q$ for all $i < p^{-1}(q(n-k))$. If there is no such $l$, let $l = n-k+1$. Define $d(p,q) = l-1$. The neighbor $n(p,q)$ of $p$ on the shortest path is then given by $(q(l-1))(p \setminus q(l-1))$. To see that this defines the unique shortest path between pairs of vertices, note that $n(p,q)$ is the only neighbor $s$ of $p$ which satisfies $d(s,q) < d(p,q)$, and that $d(n(p,q), q) = d(p,q) - 1$. Since $d(x,y) = 0$ iff $x = y$, $d(p,q)$ is the distance from $p$ to $q$. This proves the following result [6]:

Theorem 2.1 Let $p, q \in \mathrm{CP}(n,k)$. The distance from $p$ to $q$ is given by $d(p,q)$.

Let $d_i = d_i(p,q) = q[d(p,q) - i + 1, d(p,q)]$. Then the $i$'th node after $p$ on the shortest path from $p$ to $q$ is given by $d_i(p \setminus d_i)$.
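The shortest-path rule can be sketched directly and compared against breadth-first search on small CP graphs. This is an illustrative sketch under our own conventions (1-based positions as in the text, partial permutations as tuples); `distance_rule`, `next_hop` and `rule_matches_bfs` are hypothetical names.

```python
from collections import deque
from itertools import permutations


def remove(p, j):
    """Drop j from p if j occurs in p; otherwise drop the last entry of p."""
    if j in p:
        return tuple(x for x in p if x != j)
    return p[:-1]


def neighbors(p, n):
    return [(i,) + remove(p, i) for i in range(1, n + 1) if i != p[0]]


def is_subsequence(q, p):
    it = iter(p)
    return all(x in it for x in q)      # each membership test consumes the iterator


def distance_rule(p, q):
    """d(p, q) = l - 1 for the minimum l satisfying conditions (1) and (2)."""
    m = len(q)
    pos = {x: i + 1 for i, x in enumerate(p)}         # 1-based positions in p
    last = pos.get(q[-1], m + 1)                      # p^{-1}(q(m)); m + 1 if q(m) is not in p
    cond2 = all(p[i - 1] in q for i in range(1, last))
    for l in range(1, m + 1):
        if cond2 and is_subsequence(q[l - 1:], p):    # q[l, m] is a subsequence of p
            return l - 1
    return m                                          # no such l: l = m + 1


def next_hop(p, q):
    """n(p, q) = (q(l-1))(p minus q(l-1)), or None when p == q."""
    d = distance_rule(p, q)
    if d == 0:
        return None
    x = q[d - 1]
    return (x,) + remove(p, x)


def bfs_distances(src, n):
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in neighbors(u, n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rule_matches_bfs(n, k):
    """Compare the rule with breadth-first search on every pair of vertices of CP(n, k)."""
    verts = list(permutations(range(1, n + 1), n - k))
    for p in verts:
        dist = bfs_distances(p, n)
        for q in verts:
            if distance_rule(p, q) != dist[q]:
                return False
            s = next_hop(p, q)
            if s is not None and (s not in neighbors(p, n)
                                  or distance_rule(s, q) != dist[q] - 1):
                return False
    return True


print(distance_rule((1, 2), (1, 4)), next_hop((1, 2), (1, 4)))   # 2 (4, 1)
print(rule_matches_bfs(4, 2))                                     # True
```

For the pair $(1,2)$, $(1,4)$ in CP(4,2) the rule gives distance 2, and the first step goes to $(4,1)$, which is the unique shortest path marked in Figure 2.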

Theorem 2.1 suggests the following simple method for routing in CP(n,k). The header sent by $p$ to $d_1(p \setminus d_1)$ is the reverse of the sequence
$$H(p,q) = (0, \dots, 0)\, q[1, d(p,q) - 1]$$
of length $n-k-1$. The reason for using the reverse of $H(p,q)$ is so that the first element of the header received by $d_1(p \setminus d_1)$ is $q(d(p,q) - 1)$. This way $d_1(p \setminus d_1)$ knows the next edge along the path at the earliest possible moment. Furthermore, to update the header, $d_1(p \setminus d_1)$ only needs to remove the first entry and append a 0, which is a computationally simple task easily implementable in hardware. Generating the header at the origin of the message is also not difficult; we discuss this issue when we analyze the routing method described below. To represent the header requires $(n-k-1)\lceil \log_2(n+1) \rceil$ bits.

What is the minimum number of bits $S$ required by a header for shortest path routing in a regular graph $G$? Let $N$ be the number of vertices of $G$ and $\Delta$ the degree of $G$. If the header contains an arbitrary address, then $S \ge \log_2 N$. However, if a node knows which edge a message was received on, the lower bound can be decreased to $S \ge \log_2(N/\Delta)$. This reduction is rarely exploited, partly because adding fault tolerance is difficult if the destination depends on the last edge traversed. The exact value of the lower bound on $S$ depends on the graph. In the case of CP(n,k) graphs, a precise expression can be obtained:
$$S \ge S_{\min} = \max_{p,s} \log_2 \left| \{ q \mid d_1(p,q)(p \setminus d_1(p,q)) = s \} \right|.$$
By vertex symmetry, this expression does not depend on $p$. Consider the case $k = 1$. Using $p = [1,n]$ and $s = (n, 1, \dots, n-1)$, one can show that
$$S_{\min} = \log_2\Big( (n-1)! \sum_{i=1}^{n-1} \frac{1}{i!} \Big) \approx \log_2\big( (e-1)(n-1)! \big).$$
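The closed form for $S_{\min}$ can be checked by brute force for small $n$. The sketch below (hypothetical helper names) relies on the shortest-path rule of Theorem 2.1: with $p = [1,n]$, condition (2) is vacuous for full permutations and condition (1) says that $q[l,n]$ is increasing, so the first edge toward $q$ is labelled by the entry just before the maximal increasing suffix of $q$. The code counts, for each out-edge of $[1,n]$, how many destinations are routed through it.

```python
from itertools import permutations
from math import factorial


def first_hop_label(q):
    """Label of the first edge from the identity [1, n] towards q in CP(n, 1).

    With p = [1, n], d(p, q) + 1 is where the maximal increasing suffix of q starts.
    """
    l = len(q)
    while l > 1 and q[l - 2] < q[l - 1]:
        l -= 1
    d = l - 1
    return q[d - 1] if d > 0 else None     # the edge labelled q(d)


def routed_counts(n):
    """Number of destinations q whose shortest path from [1, n] uses each out-edge."""
    counts = {}
    for q in permutations(range(1, n + 1)):
        label = first_hop_label(q)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


def closed_form(n):
    """(n-1)! * sum_{i=1}^{n-1} 1/i!, evaluated exactly with integers."""
    return sum(factorial(n - 1) // factorial(i) for i in range(1, n))


def ceil_log2(x):
    return (x - 1).bit_length()            # exact ceil(log2 x) for integers x >= 1


for n in range(3, 8):
    counts = routed_counts(n)
    print(n, max(counts.values()), counts[n], closed_form(n), ceil_log2(closed_form(n)))
```

For $n = 3, \dots, 7$ the largest count occurs on the edge labelled $n$, i.e. toward $s = (n, 1, \dots, n-1)$, and equals the closed form (3, 10, 41, 206, 1237); rounding $\log_2$ of these up gives 2, 4, 6, 8, 11, in agreement with the $S_{\min}$ column of Table 2.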

3 The inversion code for permutations

The inversion code is a well-known representation of permutations. See e.g. Stanley [11] for definitions and a discussion of its properties. If $\sigma$ is a permutation of $[1,n]$, then the inversion code for $\sigma$ is a function $I(\sigma)$ from $[1,n]$ to $[0,n-1]$ with $I(\sigma)(i) < i$. $I(\sigma)$ is defined by
$$I(\sigma)(i) = \left| \{ j \mid j > \sigma^{-1}(i) \text{ and } \sigma(j) < i \} \right|.$$
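A minimal Python sketch of this definition, together with the insertion procedure that inverts it (described in the following paragraph). The function names are hypothetical, and permutations are tuples in the sequence notation of Section 2.

```python
from itertools import permutations


def inversion_code(sigma):
    """I(sigma)(i) = |{j > sigma^{-1}(i) : sigma(j) < i}|, returned as a dict i -> value."""
    pos = {v: j for j, v in enumerate(sigma)}               # 0-based sigma^{-1}
    return {i: sum(1 for v in sigma[pos[i] + 1:] if v < i)
            for i in range(1, len(sigma) + 1)}


def decode(f, n):
    """Rebuild sigma from f: start with (1); insert l + 1 after the (l - f(l+1))'th element."""
    p = [1]
    for l in range(1, n):
        p.insert(l - f[l + 1], l + 1)    # list.insert(idx, x) puts x after the idx'th element
    return tuple(p)


def code_bits(n):
    """Bits for I(sigma) stored as a list: sum of ceil(log2 i), since 0 <= I(sigma)(i) < i."""
    return sum((i - 1).bit_length() for i in range(1, n + 1))


print(inversion_code((1, 3, 2)))   # {1: 0, 2: 0, 3: 1}
print(code_bits(9))                # 21
print(all(decode(inversion_code(s), 5) == s for s in permutations(range(1, 6))))   # True
```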

An inversion of $\sigma$ is a pair $k < l$ such that $\sigma(k) > \sigma(l)$. Thus $I(\sigma)(i)$ is the number of inversions $k < l$ with $k = \sigma^{-1}(i)$. For example, $I((1,3,2))(3) = 1$ and $I((1,3,2))(i) = 0$ otherwise. Since $0 \le I(\sigma)(k) < k$, a binary representation of $I(\sigma)$ as a list requires $\sum_{i=1}^{n} \lceil \log_2 i \rceil$ bits. For all $\sigma$, $I(\sigma)(1) = 0$.

The inversion code is a bijection of the permutations of $[1,n]$ onto the set of functions $f : [1,n] \to [0,n-1]$ which satisfy $f(i) < i$ for all $i$. Given such a function $f$, the unique permutation $\sigma$ with $I(\sigma)(i) = f(i)$ is obtained as follows. Form the partial permutation $p_1 = (1)$. Suppose $p_l$ has been obtained. Construct $p_{l+1}$ by inserting $l+1$ after the $(l - f(l+1))$'th element of $p_l$. Then $\sigma = p_n$.

Inversion codes can be generalized to partial permutations. Fix $n$ and $l$. Let $p = (p(1), \dots, p(l))$ be a length $l$ partial permutation of $[1,n]$. One possible code for $p$ is given by $I(pq)$, where $q$ consists of the $x \notin p$ in increasing order. However, the obvious representation of this code requires the same number of bits as does the inversion code of a full permutation. Here is a more space-efficient way of encoding $p$. Let $s = (p(i_1) < p(i_2) < \dots < p(i_l))$ be the elements of $p$ in increasing order. Let $t = (t(1), \dots, t(n))$ be the sequence defined by
$$t(k) = \begin{cases} 0 & \text{if } k \notin p, \\ p(j) & \text{if } k = p(i_j). \end{cases}$$
Then $t$ is obtained by inserting zeros in $p$ such that the $i$ with $t(i) \ne 0$ are exactly the $i$ occurring in $p$. For $1 \le k \le l$, let $I(p)(k)$ be the number of entries of $t$ to the right of the entry $p(i_k)$ that are smaller than $p(i_k)$:
$$I(p)(k) = \left| \{ j \mid j > t^{-1}(p(i_k)) \text{ and } t(j) < p(i_k) \} \right|,$$
where $t^{-1}(p(i_k))$ denotes the position of $p(i_k)$ in $t$.

$I(p)$ is a version of the multiset inversion code of $t$.

For example, if $p = (3,2,6,5)$ and $n = 7$, then $t = (0,3,2,0,6,5,0)$ and $I(p) = (2,3,1,2)$. The function $I(p)$ satisfies $I(p)(k) < n - l + k$ for $1 \le k \le l$. Any function $f$ with this property can be decoded into a unique partial permutation of length $l$ by the following sequence of steps. Let $t_0$ be the sequence of $n - l$ zeros. Construct $t_{k+1}$ from $t_k$ by inserting $k+1$ after the $(n - l + k - f(k+1))$'th element of $t_k$. Let $s_1 < \dots < s_l$ be the positions of the non-zero elements of $t_l$. Finally, let $p(i) = s_j$, where $j = t_l(s_i)$. To represent $I(p)$ as a sequence requires $\sum_{k=1}^{l} \lceil \log_2(n - l + k) \rceil$ bits.
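The space-efficient code for partial permutations and its decoder can be sketched the same way. This is a sketch under our own conventions (hypothetical names `encode_partial` and `decode_partial`, tuples for partial permutations); note that the insertion index in the decoder counts all $n - l + k$ entries of $t_k$, zeros included. It reproduces the worked example $p = (3,2,6,5)$, $n = 7$.

```python
from itertools import permutations


def encode_partial(p, n):
    """I(p) for a partial permutation p of [1, n], following the definition above."""
    members = set(p)
    entries = iter(p)
    # t(k) = 0 if k is not in p; otherwise the entries of p, in order, fill the positions in p.
    t = [next(entries) if k in members else 0 for k in range(1, n + 1)]
    code = []
    for v in sorted(p):                          # v = p(i_k), the k-th smallest entry of p
        x = t.index(v)                           # 0-based position of v in t
        code.append(sum(1 for w in t[x + 1:] if w < v))
    return tuple(code)


def decode_partial(f, n):
    """Inverse map: insert 1, ..., l into n - l zeros, then read off p(i) = s_j, j = t_l(s_i)."""
    l = len(f)
    t = [0] * (n - l)                            # t_0
    for k in range(l):                           # build t_{k+1} from t_k
        t.insert(len(t) - f[k], k + 1)           # after the (n - l + k - f(k+1))'th element
    s = [x + 1 for x, v in enumerate(t) if v != 0]   # positions s_1 < ... < s_l (1-based)
    return tuple(s[t[si - 1] - 1] for si in s)


print(encode_partial((3, 2, 6, 5), 7))    # (2, 3, 1, 2)
print(decode_partial((2, 3, 1, 2), 7))    # (3, 2, 6, 5)
print(all(decode_partial(encode_partial(p, 6), 6) == p
          for p in permutations(range(1, 7), 3)))   # True
```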

4 Routing in CP(n,1) using the inversion code

In an interconnection network based on CP(n,1), each node of CP(n,1) contains a processor. We identify node $\pi$ with the processor it contains. The edges leaving $\pi$ are labelled: the edge $(\pi, \pi\rho_k)$ is labelled by $\pi(k)$. Thus if we leave node $\pi$ by the edge labelled $\pi(k)$, then the next node we reach is $(\pi(k))(\pi \setminus \pi(k))$. We assume that each processor/node "knows" its permutation in the forms $(\pi(1), \dots, \pi(n))$ and $(\pi^{-1}(1), \dots, \pi^{-1}(n))$. It should also have access to the permutation
$$\tilde\pi = (\pi(n-1), \pi(n), \pi(n-2), \pi(n-3), \dots, \pi(1))$$
and its inversion code in the order $(I(\tilde\pi)(n), I(\tilde\pi)(n-1), \dots, I(\tilde\pi)(1))$.

Suppose we want to send a message $M$ from $\pi$ to $\sigma$. To do this, node $\pi$ first generates a header $H$ and prepends it to $M$. This gives the message $HM$. Node $\pi$ then sends $HM$ to the next processor on the shortest path to $\sigma$. When a processor receives a message $HM$, it reads and processes the header, and thus obtains information about where to send the message. If the message is to be passed on, the header is modified, the new header replaces $H$, and the message is sent along the appropriate edge.

The simplest method using inversion codes lets $H$ be the inversion code of the destination node. However, in this case, obtaining the next edge along the path is a nontrivial task which must be performed at each step. The goal is to trivialize the computations for all but the originating node without substantially increasing the complexity of initially generating the header. Processor $\pi$ generates the header $H$ as follows:

1. Determine the largest increasing sequence $i_{n-k} < \dots < i_n$ such that $\pi(i_{n-k+j}) = \sigma(n-k+j)$ for $0 \le j \le k$, that is, the longest suffix $\sigma[n-k, n]$ of $\sigma$ that is a subsequence of $\pi$. This determines the shortest path to $\sigma$ as discussed in Section 2; its length is $n-k-1$.
2. Let $\tau$ be the permutation
$$\tau = (\sigma(n-k-2), \dots, \sigma(1), \sigma(n-1), \sigma(n), \sigma(n-2), \dots, \sigma(n-k), \sigma(n-k-1))$$
(when $k = 0$, the entry $\sigma(n-k-1) = \sigma(n-1)$ appears only once). Note the positions of $\sigma(n-1)$ and $\sigma(n)$: they ensure that the destination node can be extracted from the header by any node along the path.
3. Construct the inversion code $I(\tau)$ for $\tau$ and let $H = (I(\tau)(n), I(\tau)(n-1), \dots, I(\tau)(2))$. Processor $\pi$ then sends $HM$ along the edge labelled $\sigma(n-k-1)$.

The important observation is that the first $n-k-2$ elements of $\tau$ are the sequence of edge labels, other than the first one, along the shortest path to $\sigma$, while the remaining elements are the last $k+2$ elements of $\sigma$ in reverse order, except that $\sigma(n)$ and $\sigma(n-1)$ are swapped. These properties of the header (the main invariants) will be maintained as the message is passed along.

If processor $\omega$ receives the message $HM$, it processes $H$ as follows. Let $H = (I(\tau)(n), I(\tau)(n-1), \dots, I(\tau)(2))$ for some permutation $\tau$.
1. If $I(\tau)(i) = I(\tilde\omega)(i)$ for each $i$, then the message $M$ is for $\omega$ and does not need to be passed on. In this case skip the subsequent steps. The condition means that $\tau = \tilde\omega$, i.e. $\tau$ is the reverse of $\omega$ with its first two entries swapped.
2. Find the largest $k$ such that $I(\tau)(k) = k-1$. The number $k$ is the largest such that all $i < k$ follow $k$ in $\tau$. This implies that $k = \tau(1)$.
3. Let $I(\tau')(i) = I(\tau)(i) + 1$ for $i > k$, $I(\tau')(k) = 0$, and $I(\tau')(i) = I(\tau)(i)$ otherwise. Let $H' = (I(\tau')(n), I(\tau')(n-1), \dots, I(\tau')(2))$. Then $\tau'$ is the permutation obtained by moving $\tau(1)$ from the beginning of $\tau$ to the end. This makes sure that the main invariants are maintained.
4. Send $H'M$ along the edge labelled by $k$.

Table 1 illustrates the procedure.

Table 1. Routing in CP(6,1) from $\pi = (1,3,2,5,4,6)$ to $\sigma = (1,6,2,3,5,4)$.
Path: (1,3,2,5,4,6) → (2,1,3,5,4,6) → (6,2,1,3,5,4) → (1,6,2,3,5,4) = σ

Receiving node        τ (received)     I(τ) = (I(τ)(1), …, I(τ)(6))    Message
(2,1,3,5,4,6)         (6,1,5,4,3,2)    (0,0,1,2,3,5)                  [5,3,2,1,0,M]
(6,2,1,3,5,4)         (1,5,4,3,2,6)    (0,0,1,2,3,0)                  [0,3,2,1,0,M]
(1,6,2,3,5,4) = σ     (5,4,3,2,6,1)    (0,1,2,3,4,1)                  [1,4,3,2,1,M]
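To make the header manipulations concrete, here is a simulation sketch of the routing procedure in CP(n,1). It is our own illustration, not the authors' implementation: the names `make_header`, `process`, `route` and `tilde` are hypothetical, and each node recomputes $I(\tilde\omega)$ instead of storing it. The header permutation $\tau$ is built as the rotation of $\tilde\sigma$ that puts the not-yet-used edge labels first, which agrees with step 2 of the header construction.

```python
from itertools import permutations


def inversion_code(s):
    """(I(s)(1), ..., I(s)(n)) with I(s)(i) = |{j > s^{-1}(i) : s(j) < i}|."""
    pos = {v: j for j, v in enumerate(s)}
    return tuple(sum(1 for v in s[pos[i] + 1:] if v < i) for i in range(1, len(s) + 1))


def tilde(s):
    """(s(n-1), s(n), s(n-2), ..., s(1)): the reverse of s with its first two entries swapped."""
    r = list(reversed(s))
    r[0], r[1] = r[1], r[0]
    return tuple(r)


def to_header(code):
    """Header order (I(n), ..., I(2)); I(1) = 0 is never sent."""
    return tuple(reversed(code[1:]))


def make_header(pi, sigma):
    """Source side: (first edge label, header), or None if pi == sigma."""
    n = len(sigma)
    pos = {v: j for j, v in enumerate(pi)}
    d = n - 1                       # extend the suffix sigma[d:] while it stays a subsequence of pi
    while d > 0 and pos[sigma[d - 1]] < pos[sigma[d]]:
        d -= 1
    if d == 0:
        return None
    t = tilde(sigma)
    tau = t[n - d + 1:] + t[:n - d + 1]   # labels sigma(d-1), ..., sigma(1) first
    return sigma[d - 1], to_header(inversion_code(tau))


def process(node, header):
    """Receiver side: None if node is the destination, else (edge label, new header)."""
    n = len(node)
    code = (0,) + tuple(reversed(header))             # code[i - 1] = I(tau)(i)
    if code == inversion_code(tilde(node)):           # tau equals ~node
        return None
    k = max(i for i in range(1, n + 1) if code[i - 1] == i - 1)   # k = tau(1)
    # Moving tau(1) = k to the end: entries above k gain one inversion, k gets 0.
    new = tuple(c + 1 if i > k else 0 if i == k else c
                for i, c in enumerate(code, start=1))
    return k, to_header(new)


def move(node, label):
    """Follow the edge labelled `label`: that entry moves to the front."""
    return (label,) + tuple(x for x in node if x != label)


def route(pi, sigma):
    """List of nodes visited by a message from pi to sigma."""
    first = make_header(pi, sigma)
    if first is None:
        return [pi]
    label, header = first
    path = [pi, move(pi, label)]
    for _ in range(len(pi)):                          # a shortest path has at most n - 1 edges
        step = process(path[-1], header)
        if step is None:
            return path
        label, header = step
        path.append(move(path[-1], label))
    raise RuntimeError("message was not delivered")


print(route((1, 3, 2, 5, 4, 6), (1, 6, 2, 3, 5, 4)))
print(all(route(p, s)[-1] == s
          for p in permutations(range(1, 5)) for s in permutations(range(1, 5))))   # True
```

For the example with source $(1,3,2,5,4,6)$ and destination $(1,6,2,3,5,4)$, the three receivers see the headers [5,3,2,1,0], [0,3,2,1,0] and [1,4,3,2,1]. The swapped pair $\sigma(n-1), \sigma(n)$ matters: if plain reversal were used instead of $\tilde\sigma$, the intermediate node $(2,3,1)$ on the path from $(3,2,1)$ to $(1,2,3)$ would receive exactly the reverse of its own label and accept the message too early.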

Theorem 4.1 The routing algorithm described above is correct.

Proof. Most of the proof is contained in the description and the associated comments. It remains to show that $\tau = \tilde\omega$ iff $\omega$ is the destination of the message. We say that $\omega$ and $\tau$ agree on (the ordering of) $\{i,j\}$ iff $i$ and $j$ occur in the same order in $\omega$ as in $\tau$. The construction of $\tau$ and the main invariants guarantee that any receiver $\omega$ of a message destined for $\sigma$ agrees with $\tau$ on $\{\sigma(n-1), \sigma(n)\}$. Now $\omega$ accepts the message iff there is only one pair $\{i,j\}$ on which $\omega$ agrees with $\tau$, and this pair is given by $\{\tau(1), \tau(2)\}$. This holds iff $\omega = \sigma$.

5 Analysis of the routing method for CP(n,1)

The goal of this section is to demonstrate that the operations required in the routing algorithm can be implemented efficiently in hardware. Both sequential and parallel methods will be discussed. Since $n$ is expected to be small, e.g. $10 \le n \le 15$ (note that $15! \approx 1.3 \times 10^{12}$), a real implementation will use fixed-size circuits to make maximum use of parallelization.

To implement the routing method efficiently, let each processor $\pi$ have random access to the sequences $(\pi(1), \dots, \pi(n))$, $(\pi^{-1}(1), \dots, \pi^{-1}(n))$ and $(I(\tilde\pi)(n), \dots, I(\tilde\pi)(1))$. The local storage requirement for these sequences is $O(n \log n)$.

[…] call $p_i$ increasing. The parallel prefix method allows $p_0$ to determine the least $i$ such that $p_i$ is increasing in $\lceil \log_2(n-1) \rceil$ steps.

Next, the inversion code of $\tau$ has to be determined. This can be done sequentially by online sorting of $\tau$, starting with $\tau(n), \tau(n-1), \dots$, and determining the insertion position of $\tau(i)$ in the ordered set consisting of $\tau(i+1), \dots, \tau(n)$. A fast parallel prefix method using $\binom{n}{2}$ processors is also available. Note that $\binom{15}{2} = 105$, so a reasonably sized circuit implementing such an algorithm is possible for the networks that are likely to be used. For each $i$, use processing elements $p_{i,j}$ with $i \le j \le n$. In the first step, $p_{i,j}$ fetches $\tau(i)$ and $\tau(j)$. We are assuming that concurrent reads are possible; otherwise another parallel prefix step is required to distribute the values of $\tau$ to the processors which need them. Next, $p_{i,j}$ stores 1 if $\tau(i) > \tau(j)$ and 0 otherwise. The computation of the parallel prefix sum of $p_{i,i}, p_{i,i+1}, \dots$ into $p_{i,i}$ results in $p_{i,i}$ knowing $I(\tau)(\tau(i))$, the number of $j > i$ with $\tau(j) < \tau(i)$. This requires $\lceil \log_2(n-i) \rceil$ steps. Finally, $p_{i,i}$ stores this value as $I(\tau)(\tau(i))$.
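The comparison-and-prefix-sum scheme can be simulated sequentially to check the values and the number of rounds. In the sketch below (hypothetical names; a Hillis-Steele scan stands in for whichever parallel prefix circuit one would actually build), each row $i$ plays the role of the processing elements $p_{i,j}$, and rows are treated as running concurrently.

```python
from itertools import permutations


def prefix_sum_rounds(values):
    """Hillis-Steele inclusive prefix sum; returns (sums, number of parallel rounds)."""
    x = list(values)
    rounds, step = 0, 1
    while step < len(x):
        # in one round every element j adds the value `step` places to its left
        x = [x[j] + (x[j - step] if j >= step else 0) for j in range(len(x))]
        step *= 2
        rounds += 1
    return x, rounds


def inversion_code_parallel(tau):
    """Row i simulates the processing elements p_{i,j}; rows are independent."""
    n = len(tau)
    code, rounds = {}, 0
    for i in range(n):
        bits = [1 if tau[i] > tau[j] else 0 for j in range(i + 1, n)]   # comparison step
        sums, r = prefix_sum_rounds(bits)
        code[tau[i]] = sums[-1] if sums else 0    # smaller entries to the right of tau(i)
        rounds = max(rounds, r)                   # all rows run concurrently
    return code, rounds


print(inversion_code_parallel((6, 1, 5, 4, 3, 2)))
# ({6: 5, 1: 0, 5: 3, 4: 2, 3: 1, 2: 0}, 3)
```

Only the last entry of each row's prefix sum is used here, so a tree reduction of the same depth $\lceil \log_2(n-i) \rceil$ would also do; for $n = 15$ the deepest row needs 4 rounds.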

Passing on the message. Except for checking whether the current node is the destination of the message, the header can be processed field by field in order of arrival. Since the header is received in the order $I(\tau)(n), \dots, I(\tau)(2)$, the algorithm proceeds by adding one to $I(\tau)(i)$ to obtain $I(\tau')(i)$ until the first $i$ with $I(\tau)(i) = i-1$ is received. If there is no such $i$, then the desired $i$ is 1. It then sets $I(\tau')(i) = 0$ (unless $i = 1$) and leaves the remaining $I(\tau)(j)$'s unchanged. The beginning parts of the header can be passed on as soon as $\pi$ has received (1) the first $i$ with $I(\tau)(i) = i-1$ and (2) an $i$ with $I(\tau)(i) \ne I(\tilde\pi)(i)$.

6 Generalization to CP(n,k)

The simplest way in which one can use inversion codes for routing in CP(n,k) is to extend each partial permutation $p$ to a full permutation $p'$ by appending the remaining elements of $[1,n]$ in order. One can then use the same algorithm as for CP(n,1). The header permutation $\tau$ is defined as before: the sequence of edge labels to be traversed is placed at the front, in order, so that if these labels are moved to the end of the permutation one by one, the resulting permutation is the reverse of the destination label except for the ordering of the first two elements.

If shorter headers are desirable, one can use the space-efficient inversion code for partial permutations from Section 3. The header is constructed essentially as before. The partial permutation associated with a given node $p$ is extended by appending the smallest element $y \notin p$ to $p$. This yields the labels actually used. The header consists of the inversion code of an extended partial permutation $\tau$ whose prefix is the sequence of edge labels to be traversed, and which satisfies the condition that if the labels are moved to the end of $\tau$ one by one, then the final $\tau$ is the reverse of the label of the destination node except for the ordering of the first two elements. Though this is more space efficient than extending to full permutations, the decoding tasks are more difficult. Determining the label of the next edge on the path cannot be done in stream processing mode, and full decoding of the inversion code to a partial permutation may be needed at each step along the path. It is possible to implement fast parallel decoding and encoding circuitry. However, this is considerably less simple than the routing algorithm for CP(n,1). Finally, observe that if $k$ is large, say $k \ge n/2$, it may be more efficient to use the method described in Section 2.

7 Discussion

We have described an efficient algorithm for routing in CP graphs. The primary improvement over previously known routing methods for this class of graphs is in the amount of space required for the message headers. It is important to minimize this resource, particularly in networks with fine-grain parallelism, where each message is of a fixed, relatively small size. For example, CP(9,1) is a graph with 362880 nodes. To encode the label of a node therefore requires at least $\lceil \log_2(362880) \rceil = 19$ bits. The algorithm described here uses $\sum_{i=1}^{9} \lceil \log_2 i \rceil = 21$ bits. The algorithm described in [6] requires $7 \lceil \log_2 10 \rceil = 28$ bits. We have therefore obtained an improvement of 7 bits, while requiring only 2 bits more than the simple lower bound and 4 bits more than $S_{\min}$. Table 2 gives other values for comparison.

Table 2. Comparison of header lengths (in bits) for routing methods in CP(n,1). Columns: List of edges, $(n-2)\lceil \log_2(n+1) \rceil$; Inversion code, $\sum_{i=1}^{n} \lceil \log_2 i \rceil$; Address, $\lceil \log_2(n!) \rceil$; $S_{\min}$, $\lceil \log_2((n-1)! \sum_{i=1}^{n-1} 1/i!) \rceil$.

n     n!                List of edges   Inversion code   Address   S_min
3     6                 2               3                3         2
4     24                6               5                5         4
5     120               9               8                7         6
6     720               12              11               10        8
7     5040              15              14               13        11
8     40320             24              17               16        14
9     362880            28              21               19        17
10    3628800           32              25               22        20
11    39916800          36              29               26        23
12    479001600         40              33               29        27
13    6227020800        44              37               33        30
14    87178291200       48              41               37        34
15    1307674368000     52              45               41        38
16    20922789888000    70              49               45        42

The fundamental issues underlying general routing methods in interconnection networks are discussed in [4] [5]. The routing algorithm described here is compatible with many of the approaches that have been suggested for dealing with deadlock and latency problems. Note that in most cases the next edge on the path to the destination node is known before the entire header is received. This can contribute to the efficiency of methods such as wormhole routing.

In order for a routing method to be able to avoid faulty edges and nodes, it must be possible to select alternate paths to the destination. Both the routing method of [6] described in Section 2 and the one based on the inversion code can be used to select an arbitrary path of length $l \le n-1$ from $\pi$ to $\sigma$. However, if the distance to the destination is $n-1$, the first edge on the selected path must be the one belonging to the unique shortest path, even if each node along the path is allowed to change the selected path. This is because both methods take advantage of the fact that the distance to the destination from an intermediate node is at most $n-2$. There are simple ways to add the ability to select alternate paths in all cases. For the method of [6], it suffices to add an additional edge label to the header. For the method described here, one can use inversion codes of permutations of $[1,n+1]$ by implicitly appending $n+1$ to each node label. In either case, an additional $\lceil \log_2(n+1) \rceil$ bits are required to represent the header.

The simplicity, efficiency and adaptability of routing in CP graphs contribute to the advantages of these graphs as interconnection networks. Other networks with good degree/diameter properties are the de Bruijn graphs and their generalizations [7], the star graphs [1] and others [3]. Though relatively simple routing algorithms exist for most of these graphs, they remain to be optimized. Since the star graphs are also based on permutations, it is likely that the inversion code can be used for them. In designing a code for message headers it is important to have efficient algorithms for generating the headers and for determining the next edge on the path. We have achieved this for networks based on CP graphs.
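The entries of Table 2 follow from the column formulas and can be recomputed exactly with integer arithmetic. A sketch with hypothetical names; $\lceil \log_2 x \rceil$ is computed as the bit length of $x - 1$ to avoid floating-point rounding:

```python
from math import factorial


def ceil_log2(x):
    """Exact ceil(log2 x) for an integer x >= 1."""
    return (x - 1).bit_length()


def table2_row(n):
    """(n!, list of edges, inversion code, address, S_min) header lengths in bits for CP(n, 1)."""
    list_of_edges = (n - 2) * ceil_log2(n + 1)
    inversion = sum(ceil_log2(i) for i in range(1, n + 1))
    address = ceil_log2(factorial(n))
    s_min = ceil_log2(sum(factorial(n - 1) // factorial(i) for i in range(1, n)))
    return factorial(n), list_of_edges, inversion, address, s_min


for n in range(3, 17):
    print(n, *table2_row(n))
```

It reproduces the header-length columns of Table 2; the factorial column gives $9! = 362880$ for the CP(9,1) example above.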

References

[1] S.B. Akers, D. Harel, B. Krishnamurthy, "The star graph: an attractive alternative to the n-cube", Proceedings of the Intl. Conf. on Parallel Processing (1987) 393–400.
[2] S.B. Akers, B. Krishnamurthy, "A group-theoretic model for symmetric interconnection networks", IEEE Trans. Comp. 38 (1989) 555–566.
[3] J.-C. Bermond, N. Homobono, C. Peyrat, "Large fault-tolerant interconnection networks", Graphs and Combinatorics 5 (1989) 107–123.
[4] W.J. Dally, C.L. Seitz, "Deadlock-free message routing in multiprocessor interconnection networks", IEEE Trans. Comp. C-36 (1987) 547–553.
[5] W.J. Dally, "Express cubes: improving the performance of k-ary n-cube interconnection networks", IEEE Trans. Comp. 40 (1991) 1016–1023.
[6] V. Faber, J. Moore, W.Y.C. Chen, "Cycle prefix digraphs for symmetric interconnection networks", Networks, to appear (1993).
[7] M. Imase, M. Itoh, "A design for directed graphs with minimum diameter", IEEE Trans. Comput. C-32 (1983) 782–784.
[8] R.E. Ladner, M.J. Fischer, "Parallel prefix computation", Journal of the ACM 27 (1980) 831–838.
[9] C.P. Kruskal, L. Rudolph, M. Snir, "The power of parallel prefix", IEEE Trans. Comput. C-34 (1985) 965–968.
[10] E. Knill, "The connectivity of Cayley coset digraphs", preprint (1993).
[11] R.P. Stanley, Enumerative Combinatorics, Vol. I, Wadsworth & Brooks/Cole, Monterey, Cal., 1986.

Figure 2: CP(4,2).
