Embedding of Hypercubes into Grids

S.L. Bezrukov†, J.D. Chavez‡, L.H. Harper§, M. Röttger†, U.-P. Schroeder†
Abstract

We consider one-to-one embeddings of the $n$-dimensional hypercube into grids with $2^n$ vertices and present lower and upper bounds and asymptotic estimates for minimal dilation, edge-congestion, and their mean values. We also introduce and study two new cost-measures for such embeddings, namely the sum over $i = 1,\dots,n$ of dilations and the sum of edge-congestions caused by the hypercube edges of the $i$th dimension. It is shown that, in the simulation of uniaxial hypercube algorithms via the embedding approach, these measures are much more suitable for evaluating the slowdown than the traditional cost measures.
1 Introduction and Motivation

The power of message-passing multiprocessor systems depends heavily on the structure of their interconnection network. Various types of network topologies have gained favor and are in use today. Among these, grids are emerging as one of the most popular network architectures. Many distributed computing systems which appeared in the last few years are based on this structure, especially on 2- or 3-dimensional grids. Examples of systems with grid-based networks as their underlying hardware architecture are those developed by Intel in the Touchstone project and those produced by the Parsytec corporation. A natural reason for the popularity of grids lies in their simple structure, which provides for easy construction and scaling-up of such systems. On the other hand, a large number of algorithms and programming techniques (e.g. the ascend-descend algorithms) have been developed for the hypercube. Its popularity, efficiency, and versatility as a programming network model are largely due to its recursive structure, which is well suited for many algorithms, for example the divide-and-conquer algorithms.

This work was supported by the DFG-Sonderforschungsbereich 376 "Massive Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen" and by the EC ESPRIT Long Term Research Project 20244 "ALCOM-IT".
† Department of Math. and CS, University of Paderborn, D-33095 Paderborn, Germany.
‡ Department of Math., California State University, San Bernardino, CA 92407, USA.
§ Department of Math., California State University, Riverside, CA 92521-0135, USA.
In order to run these efficient hypercube algorithms on a grid-based parallel computer, one has to simulate their communication requirements. A natural approach to such simulation is via graph embedding. We denote by $V(G)$ and $E(G)$ the vertex and edge set of a graph $G$, respectively. An embedding $f$ of a graph $G$ into a graph $H$ is an injective mapping $f : V(G) \to V(H)$ together with a routing scheme, which assigns to each edge $e = \{u,v\} \in E(G)$ some path $f(e)$ from $f(u)$ to $f(v)$ in $H$. The quality of an embedding $f$ may be expressed in terms of dilation and edge-congestion, in order to satisfy the demands of process locality and communication efficiency. For an edge $e \in E(G)$ its dilation $\mathrm{dil}_f(e)$ in the embedding $f$ is defined as the length of the path $f(e)$ in $H$. For an edge $e' \in E(H)$ we define its congestion $\mathrm{con}_f(e')$ in the embedding $f$ to be $|\{e \in E(G) \mid e' \in f(e)\}|$.

In this paper we consider embeddings of hypercubes into grids. As usual, by the hypercube of dimension $n \in \mathbb{N}$ (denoted $Q_n$) we mean the graph with $V(Q_n) = \{0,1\}^n$ and $E(Q_n) = \{\{x,y\} \mid x,y \in V(Q_n),\ \rho(x,y) = 1\}$, where $\rho(x,y)$ is the Hamming distance (i.e. the number of entries in which $x$ and $y$ differ). The $d$-dimensional grid $G^d$ is defined as the Cartesian product of $d$ pipes with $p$ vertices each. We assume throughout the paper that the grid $G^d$ has $2^n$ vertices, i.e. $d$ divides $n$, and so $p = 2^{n/d}$. For the class $F$ of all considered embeddings of $Q_n$ into $G^d$, denote

\[ \mathrm{dil}(n,d) = \min_{f \in F} \max_{e \in E(Q_n)} \mathrm{dil}_f(e), \qquad \mathrm{con}(n,d) = \min_{f \in F} \max_{e' \in E(G^d)} \mathrm{con}_f(e'), \]

\[ \overline{\mathrm{dil}}(n,d) = \min_{f \in F} \frac{1}{|E(Q_n)|} \sum_{e \in E(Q_n)} \mathrm{dil}_f(e), \qquad \overline{\mathrm{con}}(n,d) = \min_{f \in F} \frac{1}{|E(G^d)|} \sum_{e' \in E(G^d)} \mathrm{con}_f(e'). \]
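The definitions above can be made concrete for the simplest host graph, a pipe. The following is a minimal sketch, assuming hypercube vertices are encoded as the integers $0,\dots,2^n-1$ and every edge is routed along the unique shortest path of the pipe; the names `hypercube_edges` and `pipe_costs` are illustrative and not taken from the paper.

```python
def hypercube_edges(n):
    """All edges {u, v} of Q_n, with vertices encoded as integers 0..2^n - 1."""
    return [(u, u | (1 << b)) for u in range(1 << n)
            for b in range(n) if not u & (1 << b)]


def pipe_costs(n, position):
    """Costs of an embedding of Q_n into a pipe with 2^n vertices.

    position[u] is the (0-based) pipe vertex that hypercube vertex u is mapped to.
    Each hypercube edge is routed along the unique shortest path in the pipe.
    Returns (max dilation, max congestion, mean dilation, mean congestion).
    """
    edges = hypercube_edges(n)
    size = 1 << n
    congestion = [0] * (size - 1)  # congestion[k]: pipe edge between k and k + 1
    dilations = []
    for u, v in edges:
        lo, hi = sorted((position[u], position[v]))
        dilations.append(hi - lo)
        for k in range(lo, hi):
            congestion[k] += 1
    return (max(dilations), max(congestion),
            sum(dilations) / len(edges), sum(congestion) / len(congestion))
```

For example, `pipe_costs(2, [0, 1, 2, 3])` evaluates the four cost measures for the numbering of $Q_2$ by integer value.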
In the case $d = 1$ the numbers $\mathrm{dil}(n,1)$ and $\mathrm{con}(n,1)$ are called bandwidth and cutwidth, respectively, and are well studied. For $A \subseteq V(Q_n)$, denote

\[ \Gamma_t A = \{ v \in V(Q_n) \setminus A \mid \min_{w \in A} \rho(v,w) \le t \}, \qquad \partial A = \{ \{u,v\} \in E(Q_n) \mid u \in A,\ v \in V(Q_n) \setminus A \}. \]

It is known (see [6] and [5, 9], respectively) that

\[ \mathrm{dil}(n,1) = \max_{m} \min_{|A| = m} |\Gamma_1 A| = \sum_{i=0}^{n-1} \binom{i}{\lfloor i/2 \rfloor}, \tag{1} \]

\[ \mathrm{con}(n,1) = \max_{m} \min_{|A| = m} |\partial A| = \frac{2^{n+1} - 2 + (n \bmod 2)}{3}. \tag{2} \]
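For very small $n$ the closed forms (1) and (2) can be compared with an exhaustive search over all numberings of $Q_n$. The sketch below is only an illustration under the assumption that vertices are encoded as integers; the function names are hypothetical, and the search is feasible only for $n \le 3$ since it enumerates $(2^n)!$ numberings.

```python
from itertools import permutations
from math import comb


def hypercube_edges(n):
    """All edges of Q_n, with vertices encoded as integers 0..2^n - 1."""
    return [(u, u | (1 << b)) for u in range(1 << n)
            for b in range(n) if not u & (1 << b)]


def bandwidth_formula(n):
    """Right-hand side of (1)."""
    return sum(comb(i, i // 2) for i in range(n))


def cutwidth_formula(n):
    """Right-hand side of (2); the numerator is always divisible by 3."""
    return (2 ** (n + 1) - 2 + n % 2) // 3


def brute_force(n):
    """Exact (bandwidth, cutwidth) of Q_n by trying every numbering (tiny n only)."""
    edges = hypercube_edges(n)
    size = 1 << n
    best_bw = best_cw = size * n
    for pos in permutations(range(size)):
        bw = max(abs(pos[u] - pos[v]) for u, v in edges)
        # diff[] marks where each routed edge starts and stops covering pipe edges
        diff = [0] * (size + 1)
        for u, v in edges:
            lo, hi = sorted((pos[u], pos[v]))
            diff[lo] += 1
            diff[hi] -= 1
        cw = running = 0
        for k in range(size - 1):
            running += diff[k]
            cw = max(cw, running)
        best_bw = min(best_bw, bw)
        best_cw = min(best_cw, cw)
    return best_bw, best_cw
```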
We denote by $f_{ban}$ and $f_{lex}$ the embeddings of $Q_n$ into the pipe $G^1$ which attain the values in (1) and (2), respectively. To define these embeddings we first number the pipe vertices by $1, 2, \dots, 2^n$ from one of its ends to the other, then introduce the Bandwidth order $B$ and the Lexicographic order $L$ on the vertices of $Q_n$, and finally embed the $i$th vertex of $Q_n$ in the order $B$ (resp. $L$) to the pipe vertex numbered $i$, for $i = 1,\dots,2^n$.

Let $x, y \in V(Q_n)$ with $x = (x_1,\dots,x_n)$, $y = (y_1,\dots,y_n)$. We say that $x$ is greater than $y$ in the order $L$ (denoted $x >_L y$) iff $\sum_{i=1}^{n} x_i 2^{n-i} > \sum_{i=1}^{n} y_i 2^{n-i}$. In a similar way, just by replacing $n$ with $d$ and $2$ with $p$ in this formula, we define the Lexicographic order of the vertices of $G^d$. Concerning the order $B$ for $Q_n$, we write $x >_B y$ iff

(i) $\sum_{i=1}^{n} x_i > \sum_{i=1}^{n} y_i$, or
(ii) $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$ and $x <_L y$.
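The two orders can be generated directly from their definitions and checked against (1) and (2) for small $n$. This is a minimal sketch, assuming vertices are represented as bit tuples $(x_1,\dots,x_n)$ and pipe vertices are numbered from 0; the helper names are illustrative only.

```python
from itertools import product
from math import comb


def lex_order(n):
    """Vertices of Q_n as bit tuples (x_1, ..., x_n), increasing in the order L."""
    return list(product((0, 1), repeat=n))


def bandwidth_order(n):
    """Increasing order B: smaller weight first, ties broken by decreasing order L."""
    lex_rank = {x: r for r, x in enumerate(lex_order(n))}
    return sorted(lex_rank, key=lambda x: (sum(x), -lex_rank[x]))


def pipe_max_costs(order):
    """(max dilation, max congestion) when the k-th vertex of `order` goes to pipe vertex k."""
    pos = {x: k for k, x in enumerate(order)}
    n = len(order[0])
    load = [0] * (len(order) - 1)
    max_dil = 0
    for x in order:
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                lo, hi = sorted((pos[x], pos[y]))
                max_dil = max(max_dil, hi - lo)
                for k in range(lo, hi):
                    load[k] += 1
    return max_dil, max(load)


def formula_1(n):
    """Right-hand side of (1)."""
    return sum(comb(i, i // 2) for i in range(n))


def formula_2(n):
    """Right-hand side of (2)."""
    return (2 ** (n + 1) - 2 + n % 2) // 3
```

For instance, `pipe_max_costs(bandwidth_order(4))[0]` can be compared with `formula_1(4)`, and `pipe_max_costs(lex_order(4))[1]` with `formula_2(4)`.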
The results of Harper [5, 6] show that for each $m \in [1, 2^n]$, the minima in (1) and (2) are attained on the subsets of $V(Q_n)$ formed by the first $m$ elements in the orders $B$ and $L$, respectively. The case $d = 2$ was studied by Zienicke [10] (see also [1]), who proved that

\[ \mathrm{dil}(n,2) = O\!\left( \sqrt{\frac{2^n}{n}} \right) \]
and
\[ \mathrm{con}(n,2) = O\!\left( \sqrt{n\,2^n} \right). \]
These results were sharpened by Lai and Sprague [8] (cf. also [2]), who showed that

\[ 0.89 \le \lim_{n \to \infty} \frac{\mathrm{dil}(n,2)}{\sqrt{2^n/n}} \le \frac{2}{\sqrt{\pi}} \approx 1.128. \tag{3} \]
In this paper we extend the approach of Harper [6], which allows us to obtain a good lower bound for $\mathrm{dil}(n,d)$ for higher-dimensional grids. The constructions for all the upper bounds presented in the next sections are based on embeddings of hypercubes into pipes and are of the following type. Since $d$ divides $n$, we represent $Q_n$ as the Cartesian product of $d$ copies of $Q_{n/d}$. Now embed each $Q_{n/d}$ into a pipe $P_i$ with $2^{n/d}$ vertices in accordance with some embedding $f$, the same for each $i = 1,\dots,d$. Since $P_1 \times \cdots \times P_d$ is isomorphic to $G^d$, we obtain an embedding of the whole $Q_n$ into $G^d$, which we denote by $\tilde f$. We show (cf. [8] for $d = 2$):
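Theorem 1

\[ \frac{1}{\sqrt{3}} \le \lim_{\substack{n,d \to \infty \\ d = o(n)}} \frac{\mathrm{dil}(n,d)}{\sqrt{d/n}\; 2^{n/d}} \le \sqrt{\frac{2}{\pi}}, \tag{4} \]

\[ \mathrm{con}(n,d) \sim \frac{2}{3}\, 2^{n/d} \quad \text{as } n \to \infty. \tag{5} \]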
In contrast to this, concerning the minimal average values of the dilation and edge-congestion, we present the following results:

Theorem 2 Let $n \to \infty$. Then

\[ \overline{\mathrm{dil}}(n,d) \sim \frac{d}{n}\, 2^{n/d}, \qquad \overline{\mathrm{con}}(n,d) \sim \frac{1}{2}\, 2^{n/d}. \]
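The product construction described before Theorem 1 can be sketched in a few lines. The code below is an illustration under explicit assumptions: vertices of $Q_n$ are bit tuples, the $j$th block of $n/d$ consecutive bits determines the $j$th grid coordinate, and each hypercube edge is routed along the grid line joining the images of its end vertices (which differ in exactly one coordinate for a product embedding). All function names are hypothetical.

```python
from itertools import product


def lex_pipe_position(k):
    """Pipe vertex (0-based) of each vertex of Q_k under the Lexicographic order."""
    return {x: r for r, x in enumerate(product((0, 1), repeat=k))}


def product_embedding(n, d, pipe_position):
    """Map each vertex of Q_n to a point of the d-dimensional grid, block by block."""
    k = n // d
    return {x: tuple(pipe_position[x[j * k:(j + 1) * k]] for j in range(d))
            for x in product((0, 1), repeat=n)}


def grid_costs(n, d, image):
    """(max dilation, max edge-congestion) of a product embedding with straight-line routing."""
    load = {}
    max_dil = 0
    for x in image:
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                a, b = image[x], image[y]
                axis = next(j for j in range(d) if a[j] != b[j])
                lo, hi = sorted((a[axis], b[axis]))
                max_dil = max(max_dil, hi - lo)
                for c in range(lo, hi):
                    # grid edge from coordinate c to c + 1 along `axis`
                    key = (axis, a[:axis] + (c,) + a[axis + 1:])
                    load[key] = load.get(key, 0) + 1
    return max_dil, max(load.values())
```

Calling `grid_costs(6, 2, product_embedding(6, 2, lex_pipe_position(3)))` reproduces the dilation and congestion of the lexicographic pipe embedding of $Q_3$, which reflects that every grid line carries exactly one copy of $Q_{n/d}$.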
In the analysis of interconnection networks it is usually assumed that two neighboring nodes of the network can communicate only in discrete time units. Thus, the standard parameters of an embedding, such as the dilation and the edge-congestion, are appropriate for evaluating its quality only under the assumption that in each time unit any edge of $Q_n$ is equally likely to be used for communication. However, for a wide class of hypercube algorithms, namely the so-called uniaxial algorithms, in each time unit $i = 1,\dots,n$ only the edges $E_i \subseteq E(Q_n)$ which are parallel to the $i$th coordinate axis (or dimension) can be used. In this situation the slowdown of the simulation in time unit $i$ via the embedding approach can still be estimated by the dilation $d_i$ and the edge-congestion $c_i$ caused only by the edges of $E_i$ in the embedding. Hence, effective simulation of the whole uniaxial algorithm after $n$ time units corresponds to minimizing not each $d_i$ or $c_i$ separately, but rather their sums.

In accordance with this, we introduce and study two new quality measures of an embedding $f$. We say that an edge of $Q_n$ is of the $i$th dimension if its two end vertices differ in the $i$th entry, and denote by $E_i$ the set of all edges of $Q_n$ of the $i$th dimension. Furthermore, let $w_1,\dots,w_n$ with $w_1 \ge w_2 \ge \cdots \ge w_n$ be non-negative weights which correspond to how often the communication links of the $i$th dimension in $Q_n$ are used during the run-time of a uniaxial algorithm. Now, instead of $\mathrm{dil}(n,d)$ and $\mathrm{con}(n,d)$, consider the functions

\[ \mathrm{Sdil}(n,d) = \min_{f \in F} \sum_{i=1}^{n} w_i \max_{e \in E_i} \mathrm{dil}_f(e), \]

\[ \mathrm{Scon}(n,d) = \min_{f \in F} \sum_{i=1}^{n} w_i \max_{e' \in E(G^d)} \{ \mathrm{con}_f(e') \mid e' \in f(e),\ e \in E_i \}. \]
These functions describe precisely the total slowdown of the run-time of a uniaxial algorithm when it is simulated on the $d$-dimensional grid. We are able to compute the values $\mathrm{Sdil}(n,d)$ and $\mathrm{Scon}(n,d)$ exactly only for $d = 1$. For larger values of $d$, exact formulas are difficult to come by without special knowledge about the weights. This concerns mostly the upper bounds, so we assume that $w_1 = \cdots = w_n = 1$ for $d > 1$. However, our arguments provide lower bounds for $\mathrm{Sdil}(n,d)$ and $\mathrm{Scon}(n,d)$ for arbitrary weights as well.
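Theorem 3

\[ \mathrm{Sdil}(n,1) = \mathrm{Scon}(n,1) = \sum_{i=1}^{n} w_i\, 2^{i-1}, \tag{6} \]

\[ \frac{1}{d}\, \mathrm{Sdil}(n,d) \sim \mathrm{Scon}(n,d) \sim 2^{n/d} \quad \text{for } d > 1 \text{ as } n \to \infty. \tag{7} \]

The upcoming sections are devoted to the proofs of Theorems 1 - 3.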
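The value in (6) can be reproduced for a concrete pipe embedding. The following sketch makes two labelled assumptions: dimension $i$ is identified with the bit of place value $2^{i-1}$ of an integer-encoded vertex (a relabeling of coordinates relative to the order $L$), and the congestion of dimension $i$ counts only the paths of edges in $E_i$, as in the informal description of $c_i$ above. The function name is hypothetical.

```python
def uniaxial_costs(n, position, weights):
    """Weighted sums of per-dimension dilation and congestion of a pipe embedding.

    position[u]: pipe vertex (0-based) of hypercube vertex u, an integer 0..2^n - 1.
    weights[i - 1]: weight w_i of dimension i; dimension i is bit i - 1 of u (assumption).
    Returns (sum_i w_i * max dilation in E_i, sum_i w_i * max congestion caused by E_i).
    """
    size = 1 << n
    sdil = scon = 0
    for i in range(1, n + 1):
        bit = 1 << (i - 1)
        load = [0] * (size - 1)  # congestion caused by edges of dimension i only
        max_dil = 0
        for u in range(size):
            if not u & bit:
                lo, hi = sorted((position[u], position[u | bit]))
                max_dil = max(max_dil, hi - lo)
                for k in range(lo, hi):
                    load[k] += 1
        sdil += weights[i - 1] * max_dil
        scon += weights[i - 1] * max(load)
    return sdil, scon
```

With the numbering by integer value and weights $w = (3, 2, 1)$, `uniaxial_costs(3, list(range(8)), [3, 2, 1])` gives the same number for both sums, matching $\sum_i w_i 2^{i-1}$.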
2 Standard Measures

2.1 Bounds for the Dilation

Proof of (4). Considering the product embedding $\tilde f_{ban}$, we immediately get

\[ \mathrm{dil}(n,d) \le \mathrm{dil}(n/d, 1). \tag{8} \]

The inequality (8) in combination with (1) and

\[ \mathrm{dil}(n,1) \sim \binom{n}{\lfloor n/2 \rfloor} \]

as $n \to \infty$ (see [2] for a complete proof) provides the upper bound in (4).

To prove the lower bound in (4) we extend the approach of Harper [6]. For $x = (x_1,\dots,x_d) \in V(G^d)$ denote $\|x\| = \sum_{i=1}^{d} x_i$. We introduce the levels $L_i$ of the grid $G^d$ defined by $L_i = \{ x \in V(G^d) \mid \|x\| = i \}$ for $i = 0,\dots,d(p-1)$ and consider a set $D = \bigcup_{i=0}^{k} L_i \cup D'$ for some $k$ and $D' \subseteq L_{k+1}$. As follows from the vertex isoperimetric problem for the grid [3], for any fixed $h$ such a set $D$ (with an appropriate choice of the subset $D'$) has a minimal number of vertices at distance at most $h$ in the grid among all subsets of the grid of the same cardinality.

Denote $|D| = m$. Let $A$ be the collection of vertices of $Q_n$ mapped into $D$ in an embedding $f$ and consider the set $F$ of vertices of the grid which are images of the vertices of $\Gamma_t A$ in the embedding. Let $y \in F$ be a vertex with maximal value of $\|y\|$ and denote $q = \|y\|$. Now $q - k$ is the width of a band in the grid, located between its levels $L_k$ and $L_q$, which is required to place the vertices of $\Gamma_t A$ for $|A| = |D| = m$. We denote this width by $W_t(m) := q - k - 1$. Furthermore, let $u$ be a vertex of $\Gamma_t A$ whose image in the embedding is $y$, and let $v \in A$ with $f(v) = x \in D$. Considering a shortest path $P$ connecting the vertices $u$ and $v$ in $Q_n$, we conclude that there exists an edge $e \in P$ such that $\mathrm{dil}_f(e) \ge (\|y\| - \|x\| - 1)/\rho(u,v) \ge W_t(m)/t$. This leads to the lower bound

\[ \mathrm{dil}(n,d) \ge \max_{m} \max_{t} \frac{W_t(m)}{t}. \tag{9} \]
Let us choose $m$ and $t$ of the form

\[ m = \sum_{i=0}^{(n - s\sqrt{n})/2} \binom{n}{i}, \tag{10} \]

\[ t = s\sqrt{n}, \tag{11} \]

with some positive constant $s$, which will be an optimization parameter (here and below we omit the integer parts for brevity). Let $H$ be the ball of radius $(n - s\sqrt{n})/2$ (in the Hamming metric) centered at $(0,\dots,0) \in V(Q_n)$. Now $|H| = m$, and from the vertex isoperimetric problem for $Q_n$ [7] it is known that

\[ g_t(m) := \min_{|A| = m} |\Gamma_t A| = |\Gamma_t H| = \sum_{i=(n - s\sqrt{n})/2}^{(n + s\sqrt{n})/2} \binom{n}{i}. \tag{12} \]
This shows that $|V(Q_n) \setminus (A \cup \Gamma_t A)| \le m$, which implies $|V(G^d) \setminus (D \cup F)| \le m$. Since the grid $G^d$ has $d(p-1)+1$ levels, we get

\[ W_t(m) \ge z := d(p-1) - 1 - 2k. \]

In other words, $z$ is defined in such a way that the sum of the $z$ greatest numbers $|L_j|$ ($0 \le j \le d(p-1)$) asymptotically equals $g_t(m)$ (cf. (12)). Since the largest levels of $G^d$ are located symmetrically around its middle level (with $j = d(p-1)/2$), we have $z = 2x$ with $x$ determined by the equation

\[ \sum_{j = d(p-1)/2 - x}^{d(p-1)/2 + x} |L_j| \sim g_t(m). \tag{13} \]
For further applications we have to estimate the sums in (12) and (13) asymptotically. For this, we use some facts from probability theory. Let $\xi$ be a continuous random variable and $F(x)$ be its distribution function (i.e. $F(x) = P(\xi \le x)$). We say that $\xi$ is normally distributed in $(-\infty, \infty)$ if

\[ F(x) = \Phi(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz. \]

Next, consider a sequence of discrete random variables $\eta_n$, each taking on a finite number of values $x_n^i$ with corresponding probabilities $p_n^i$. Let $F_n(x) = P(\eta_n \le x) = \sum_{i:\, x_n^i \le x} p_n^i$ be the distribution function of $\eta_n$, and let $\mu_n$ and $\sigma_n^2$ be its mean and variance, respectively. We say that $\eta_n$ is asymptotically normal with mean $\mu_n$ and variance $\sigma_n^2$ if

\[ \lim_{n \to \infty} \sum_{i:\, x_n^i \le \mu_n + x\sigma_n} p_n^i = \Phi(x) \quad \text{for every } x \in (-\infty, \infty). \]

The random variables $\xi_1,\dots,\xi_n$ are called independent if

\[ P(\xi_1 < x_1, \dots, \xi_n < x_n) = P(\xi_1 < x_1) \cdots P(\xi_n < x_n). \]

Theorem 4 (Central Limit Theorem, cf. e.g. [4]). If $\xi_1,\dots,\xi_n$ are independent random variables with the common distribution function $F$, mean $\mu$, and variance $\sigma^2$, then the random variable $\eta_n = \xi_1 + \cdots + \xi_n$ is asymptotically normal with mean $n\mu$ and variance $n\sigma^2$.
We apply Theorem 4 to the independent discrete random variables $\xi_1,\dots,\xi_n$, each of which takes on the values $0,\dots,l-1$ with probability $1/l$. Then the random variable $\eta_n = \xi_1 + \cdots + \xi_n$ is asymptotically normal with mean and variance, respectively,

\[ \mu_{n,l} = n\, \frac{l-1}{2}, \qquad \sigma_{n,l}^2 = n\, \frac{l^2 - 1}{12}. \tag{14} \]

Denote by $L_j$ the number of vertices of the $l \times \cdots \times l$ grid $G^n$ at distance $j$ from the vertex $(0,\dots,0)$, i.e. the size of its $j$th level. Clearly, $\eta_n$ takes on the values $0,\dots,n(l-1)$, each grid vertex having probability $1/l^n$, so the distribution function $F_n(x)$ of $\eta_n$ is $F_n(x) = l^{-n} \sum_{j \le x} L_j$. Therefore,

\[ \lim_{n \to \infty} \sum_{i:\, x_n^i \le \mu_{n,l} + x\sigma_{n,l}} p_n^i = \lim_{n \to \infty} \frac{1}{l^n} \sum_{j=0}^{\mu_{n,l} + x\sigma_{n,l}} L_j = \Phi(x). \tag{15} \]

Equality (15) also implies

\[ \sum_{j = \mu_{n,l} - x\sigma_{n,l}}^{\mu_{n,l} + x\sigma_{n,l}} L_j \sim \Psi(x)\, l^n \tag{16} \]

as $n \to \infty$, where $\Psi(x) = \Phi(x) - \Phi(-x) = \frac{1}{\sqrt{2\pi}} \int_{-x}^{x} e^{-z^2/2}\, dz$.
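The approximation (16) can be examined numerically by computing the level sizes $L_j$ exactly. The sketch below is an illustration only: it builds the level sizes by repeated convolution and compares the exact central fraction with $\Psi(x)$, expressed through the error function as $\Psi(x) = \operatorname{erf}(x/\sqrt{2})$. The function names are hypothetical.

```python
from math import erf, sqrt


def level_sizes(n, l):
    """L_j for j = 0..n(l-1): number of points of {0..l-1}^n with coordinate sum j."""
    sizes = [1]
    for _ in range(n):
        nxt = [0] * (len(sizes) + l - 1)
        for j, count in enumerate(sizes):
            for a in range(l):
                nxt[j + a] += count
        sizes = nxt
    return sizes


def central_fraction(n, l, x):
    """Exact fraction of grid points on levels j with |j - mu| <= x * sigma, cf. (14)."""
    mu = n * (l - 1) / 2
    sigma = sqrt(n * (l * l - 1) / 12)
    sizes = level_sizes(n, l)
    inside = sum(c for j, c in enumerate(sizes) if abs(j - mu) <= x * sigma)
    return inside / l ** n


def psi(x):
    """Psi(x) = Phi(x) - Phi(-x) for the standard normal distribution."""
    return erf(x / sqrt(2))
```

For moderate $n$, `central_fraction(n, l, x)` and `psi(x)` should already be close; the gap is expected to shrink as $n$ grows.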
Returning now to the estimation of the sum in (13), let us look for $x$ represented in the form $x = y\,\sigma_{d,p}$. Applying Theorem 4 with $n = d$, $l = p = 2^{n/d}$, $\mu_{n,l} = \mu_{d,p} = d(p-1)/2$, $\sigma_{n,l} = \sigma_{d,p}$, one has

\[ g_t(m) \sim \sum_{j = d(p-1)/2 - x}^{d(p-1)/2 + x} L_j \sim \sum_{j = dp/2 - y\sigma_{d,p}}^{dp/2 + y\sigma_{d,p}} L_j \sim \Psi(y)\, 2^n. \tag{17} \]

Similarly, for the estimation of the sum in (12) we apply Theorem 4 to $Q_n$ with $l = 2$ and $\sigma_{n,2} = \sqrt{n}/2$. Rewriting $t$ in (11) in the form $t = 2s\,\sigma_{n,2}$, we get

\[ g_t(m) = \sum_{j = n/2 - s\sigma_{n,2}}^{n/2 + s\sigma_{n,2}} \binom{n}{j} \sim \Psi(s)\, 2^n. \tag{18} \]

From (17) and (18) it follows that $y \sim s$. Thus, $W_t(m) \ge z \sim 2s\,\sigma_{d,p}$. Finally, taking into account that $t = 2s\,\sigma_{n,2} = s\sqrt{n}$ and $\sigma_{d,p} \sim p\sqrt{d/12}$ (cf. (14)), one has

\[ \frac{W_t(m)}{t} \gtrsim \frac{2s\,\sigma_{d,p}}{2s\,\sigma_{n,2}} \sim \frac{2p\sqrt{d/12}}{\sqrt{n}} = \sqrt{\frac{d}{3n}}\; 2^{n/d}, \]

and the lower bound in (4) is proved. □
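The two constants in (4) can be put side by side numerically. The sketch below evaluates the upper bound (8) exactly from formula (1) and normalises it as in (4), writing $k = n/d$ for the dimension of each factor; it is an illustration only, and the function names are hypothetical.

```python
from math import comb, pi, sqrt


def pipe_dilation(k):
    """dil(k, 1) as given by formula (1)."""
    return sum(comb(i, i // 2) for i in range(k))


def upper_ratio(k):
    """dil(k, 1) / (sqrt(1/k) * 2^k), i.e. the bound (8) in the normalisation of (4)."""
    return pipe_dilation(k) * sqrt(k) / 2 ** k


lower_constant = 1 / sqrt(3)    # left-hand side of (4)
upper_constant = sqrt(2 / pi)   # right-hand side of (4)
```

Evaluating `upper_ratio(k)` for growing `k` shows the ratio settling near `upper_constant`, while `lower_constant` stays strictly below it, so the two bounds in (4) differ by a constant factor.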
2.2 Bounds for the Edge-Congestion

Proof of (5). Consider the collection $F_m \subseteq G^d$ of the first $m$ vertices of $G^d$ taken in the Lexicographic order and let $C(F_m)$ denote the edge cut separating $F_m$ from $G^d \setminus F_m$. By induction on $d$ it is easily shown that

\[ |C(F_m)| = O\!\left(p^{d-1}\right) = O\!\left(2^{\frac{d-1}{d} n}\right) \tag{19} \]

for any $m$, $1 \le m \le 2^n$. Now, let the maximum in (2) be attained for $m = m_0$ (depending on the parity of $n$), and consider $F_{m_0}$. Denote by $G^d(i)$ the subgrid $\{ (x_1,\dots,x_d) \in G^d \mid x_1 = i \}$. One has

\[ \bigcup_{i=0}^{q} G^d(i) \subseteq F_{m_0} \subset \bigcup_{i=0}^{q+1} G^d(i) \]

for some $q$, $0 \le q < p - 1$. Hence,

\[ |C(F_{m_0})| = p^{d-1} + |C'(F_{m_0} \cap G^d(q+1))|, \tag{20} \]

where the cut $C'$ on the right-hand side of (20) is considered in the subgrid $G^d(q+1)$. From (20) and (19) applied to $G^d(q+1)$ it follows that

\[ |C(F_{m_0})| = p^{d-1} + O\!\left(p^{d-2}\right) = 2^{\frac{d-1}{d} n} + O\!\left(2^{\frac{d-2}{d} n}\right). \tag{21} \]
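Let $f$ be an embedding attaining $\mathrm{con}(n,d)$ and let $A \subseteq V(Q_n)$ be the set of vertices which are mapped to $F_{m_0}$ in this embedding. Then at least $|\partial A|$ edges of $Q_n$ are mapped to paths of $G^d$ passing through the cut $C(F_{m_0})$, so there exists an edge $e \in C(F_{m_0})$ such that

\[ \mathrm{con}(n,d) \ge \mathrm{con}_f(e) \ge \frac{|\partial A|}{|C(F_{m_0})|}. \tag{22} \]

Since $|A| = |F_{m_0}| = m_0$, (2) implies $|\partial A| \ge \frac{2}{3} 2^n - \frac{2}{3}$, so from (21) and (22) it follows that

\[ \mathrm{con}(n,d) \ge \frac{\frac{2}{3} 2^n - \frac{2}{3}}{2^{\frac{d-1}{d} n} + O\!\left(2^{\frac{d-2}{d} n}\right)} = \frac{2}{3}\, 2^{n/d} (1 - o(1)). \tag{23} \]

To get an upper bound consider the product embedding $\tilde f_{lex}$. One has

\[ \mathrm{con}(n,d) \le \mathrm{con}(n/d, 1) \sim \frac{2}{3}\, 2^{n/d}, \]

which together with (23) completes the proof of (5). □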
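The cut sizes $|C(F_m)|$ used in (19) - (21) can be tabulated for small grids. The following is a minimal sketch, assuming grid vertices are tuples in $\{0,\dots,p-1\}^d$ listed in Lexicographic order with the first coordinate most significant; the function name is hypothetical.

```python
from itertools import product


def lex_cut_sizes(d, p):
    """|C(F_m)| for m = 1..p^d - 1, where F_m is the first m grid vertices in order L."""
    vertices = list(product(range(p), repeat=d))  # product() yields Lexicographic order
    rank = {v: r for r, v in enumerate(vertices)}
    crossing = [0] * (len(vertices) + 1)
    for v in vertices:
        for axis in range(d):
            if v[axis] + 1 < p:
                w = v[:axis] + (v[axis] + 1,) + v[axis + 1:]
                # the grid edge {v, w} is cut by F_m exactly when rank[v] < m <= rank[w]
                for m in range(rank[v] + 1, rank[w] + 1):
                    crossing[m] += 1
    return crossing[1:len(vertices)]
```

For $d = 2$ and $p = 4$ the largest value in `lex_cut_sizes(2, 4)` is $p + 1$, in line with the decomposition (20): one full layer boundary of size $p^{d-1}$ plus a cut inside the partially filled layer.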
3 Average Case Analysis

Note that for any embedding $f$ of a graph $G$ into $H$ it holds that

\[ \sum_{e \in E(G)} \mathrm{dil}_f(e) = \sum_{e' \in E(H)} \mathrm{con}_f(e'). \tag{24} \]

Indeed, let $e \in E(G)$ and $\mathrm{dil}_f(e) = a$. Then the image of $e$ in the embedding is a path in $H$ with $a$ edges. Therefore, the edge $e$ increases by 1 the congestion of exactly $a$ edges of $H$.

Proof of Theorem 2. Consider an embedding $f$ of $Q_n$ into $G^d$ and let us compute the sum on the right-hand side of (24), which we denote by $\Lambda_f$. Let $A \subseteq V(Q_n)$ and let $D$ be its image under $f$ in the grid $G^d$. Consider the edge-cut $\nabla D$ separating $D$ from its complement in $G^d$. Then,

\[ \sum_{e \in \nabla D} \mathrm{con}_f(e) \ge |\partial A|. \tag{25} \]
Indeed, the image of each edge of $\partial A$ passes through the edges of $\nabla D$. However, there may be other paths, connecting two vertices of $D$ (or of its complement), which are also cut by $\nabla D$. This could make the sum in (25) greater than $|\partial A|$.

Now, let $D_m$ be the collection of the first $m$ vertices of $G^d$ in the Lexicographic order and let $A_m \subseteq V(Q_n)$ be the set of vertices that are mapped to $D_m$ in the embedding $f$. Then $|A_m| = m$ and (25) implies

\[ \sum_{m=1}^{2^n} \sum_{e \in \nabla D_m} \mathrm{con}_f(e) \ge \sum_{m=1}^{2^n} |\partial A_m|. \tag{26} \]

Due to a result of Harper [5],

\[ \sum_{m=1}^{2^n} |\partial A_m| \ge 2^{n-1} (2^n - 1), \tag{27} \]

where equality holds if each $A_m$ is the collection of the first $m$ vertices of $Q_n$ taken in the Lexicographic order.

Let us estimate the double sum in (26). We say that an edge $e = \{u,v\} \in E(G^d)$ is an $i$-edge if the vectors $u, v$ differ in the $i$th entry. Denote by $E^i$ the set of all $i$-edges of $G^d$. It is easily shown by induction on $d$ that, due to the choice of the Lexicographic order, $\mathrm{con}_f(e)$ for each $i$-edge $e$ appears in the double sum at most $p^{i-1}$ times. Therefore, denoting $c_i = \sum_{e \in E^i} \mathrm{con}_f(e)$ and taking into account (26) - (27), one has $\Lambda_f = \sum_{i=1}^{d} c_i$ and

\[ \sum_{i=1}^{d} c_i\, p^{i-1} \ge \sum_{m=1}^{2^n} \sum_{e \in \nabla D_m} \mathrm{con}_f(e) \ge 2^{n-1} (2^n - 1). \tag{28} \]
Thus, we are able to estimate the sum of the $c_i\, p^{i-1}$. To get a bound for $\Lambda_f$, however, we need to estimate the sum of the $c_i$. In order to do this, let us define a squashed Lexicographic order $L_j$. Consider the permutation

\[ \pi_j = \begin{pmatrix} 1 & \cdots & j-1 & j & j+1 & \cdots & d \\ d-j+2 & \cdots & d & 1 & 2 & \cdots & d-j+1 \end{pmatrix}, \]

the bottom row of which is simply the cyclic shift of the top row by $j$ entries. Now, for $x = (x_1,\dots,x_d)$, $y = (y_1,\dots,y_d) \in V(G^d)$ we say that $x$ is greater than $y$ in the order $L_j$ if $\sum_{i=1}^{d} x_i\, p^{\pi_j(i)-1} > \sum_{i=1}^{d} y_i\, p^{\pi_j(i)-1}$. Clearly, the order $L_j$ is isomorphic to the Lexicographic order up to rotations of the grid $G^d$. Therefore, considering, instead of the sets $D_m$, the collections of the first $m$ vertices of $G^d$ in the order $L_j$ and applying the arguments above, one has for any $j \in \{1,\dots,d\}$

\[ \sum_{i=1}^{d} c_i\, p^{\pi_j(i)-1} \ge 2^{n-1} (2^n - 1). \tag{29} \]

Summing the inequalities (29) for $j = 1,\dots,d$, we finally get

\[ \sum_{i=1}^{d} c_i \sum_{j=1}^{d} p^{j-1} \ge d\, 2^{n-1} (2^n - 1), \]

which, with $p = 2^{n/d}$, implies as $n \to \infty$

\[ \Lambda_f = \sum_{i=1}^{d} c_i \ge \frac{d\, 2^{n-1} (2^n - 1)(p - 1)}{p^d - 1} = \frac{d}{2}\, 2^{n(d+1)/d} (1 - o(1)). \tag{30} \]
To get an upper bound for $\min_{f \in F} \Lambda_f$, consider the product embedding $\tilde f_{lex}$. We call a set of $p$ vertices of $G^d$ a column if all these vertices agree in some $d - 1$ entries. Denote by $C_d$ the number of columns in $G^d$. Since the image of each edge of $Q_n$ in the embedding $\tilde f_{lex}$ belongs to some column, denoting by $A_m$ the collection of the first $m$ vertices of $Q_{n/d}$ in the Lexicographic order, one has (cf. (27) and [5])

\[ \Lambda_{\tilde f_{lex}} = C_d \sum_{e \in E(G^1)} \mathrm{con}_{f_{lex}}(e) = C_d \sum_{m=1}^{2^{n/d}} |\partial A_m| = C_d\, 2^{n/d - 1} (2^{n/d} - 1). \tag{31} \]

Simple arguments show that $C_d$ satisfies the recursion $C_d = p\, C_{d-1} + p^{d-1}$, which with $C_1 = 1$ gives $C_d = d\, p^{d-1}$. Therefore, (30) and (31) imply as $n \to \infty$

\[ \min_{f \in F} \Lambda_f \sim \frac{d}{2}\, 2^{n(d+1)/d}. \tag{32} \]

Finally, using (32) and (24), one has

\[ \overline{\mathrm{dil}}(n,d) = \min_{f \in F} \Lambda_f / |E(Q_n)|, \qquad \overline{\mathrm{con}}(n,d) = \min_{f \in F} \Lambda_f / |E(G^d)|, \]

which, after taking into account $|E(G^d)| = (p-1)\, C_d \sim d\, 2^n$ and $|E(Q_n)| = n\, 2^{n-1}$, implies Theorem 2. □
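Several identities used in this section can be checked by direct enumeration for small parameters. The sketch below assumes integer-encoded hypercube vertices, takes the first $m$ vertices in Lexicographic order to be $\{0,\dots,m-1\}$, and models the product of lexicographic pipe embeddings by letting each block of $n/d$ consecutive bits determine one grid coordinate. The function names are hypothetical.

```python
def boundary_size(n, m):
    """Size of the edge boundary of A_m = {0, ..., m-1} in Q_n."""
    return sum(1 for u in range(m) for b in range(n)
               if not u & (1 << b) and (u | (1 << b)) >= m)


def columns(d, p):
    """C_d from the recursion C_d = p * C_{d-1} + p^(d-1) with C_1 = 1."""
    c = 1
    for k in range(2, d + 1):
        c = p * c + p ** (k - 1)
    return c


def total_dilation_product_lex(n, d):
    """Sum of dilations over all edges of Q_n for the product of lexicographic embeddings."""
    k = n // d
    total = 0
    for u in range(1 << n):
        for b in range(n):
            if not u & (1 << b):
                # flipping bit b moves one coordinate by the place value of b in its block
                total += 1 << (b % k)
    return total
```

Summing `boundary_size(n, m)` over $m = 1,\dots,2^n$ gives the equality case of (27), `columns(d, p)` agrees with $d\,p^{d-1}$, and `total_dilation_product_lex(n, d)` agrees with the right-hand side of (31).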
4 Simulation of the Uniaxial Algorithms

4.1 Bounds for the Sum of Edge-Congestions

We show first that $\mathrm{Scon}(n,1) = \sum_{i=1}^{n} w_i\, 2^{i-1}$. For that we consider a slightly more general function defined on numberings of $Q_n$ by the numbers $1, 2, \dots, 2^n$. Let $N$ be such a numbering and let $A_m \subseteq V(Q_n)$ be the set of the first $m$ numbered vertices. Consider $\partial A_m$ and denote by $\theta_i(A_m)$ the number of edges of $\partial A_m$ which have the $i$th dimension. Further denote

\[ \Theta_i(N; n, m) = \max_{1 \le j \le m} \theta_i(A_j), \qquad \Theta(N; n, m) = \sum_{i=1}^{n} w_i\, \Theta_i(N; n, m). \]

In this notation

\[ \mathrm{Scon}(n,1) \ge \min_{N} \Theta(N; n, 2^n). \tag{33} \]
Let $l(n)$ equal the minimum in (33), and for $\alpha \in \{0,1\}$ and $A \subseteq V(Q_n)$, $|A| = m$, denote

\[ Q_{n,\alpha} = \{ (x_1,\dots,x_n) \in V(Q_n) \mid x_n = \alpha \}, \qquad A_\alpha = A \cap Q_{n,\alpha}, \qquad m_\alpha = |A_\alpha|. \]

Clearly the numbering $N$ naturally induces numberings of $Q_{n,0}$ and $Q_{n,1}$, which we denote by $N_0$ and $N_1$, respectively. One has

\[ \Theta(N; n, m) \ge \Theta(N_0; n-1, m_0) + \Theta(N_1; n-1, m_1) + w_n, \tag{34} \]

where in computing $\Theta(N_\alpha; n-1, m_\alpha)$ we deal with the weights $w_1 \ge w_2 \ge \cdots \ge w_{n-1}$. To show (34), consider the set $A_m \subseteq V(Q_n)$ and notice that $\partial A_m$ in $Q_n$ consists of $\partial (A_m)_0$ and $\partial (A_m)_1$ in the subcubes $Q_{n,0}$ and $Q_{n,1}$, respectively, if we consider just the edges of the first $n-1$ dimensions. Moreover, $\theta_n(A_m) \ge 1$ and $\partial (A_m)_0 \cap \partial (A_m)_1 = \emptyset$. Clearly, (34) implies $l(n) \ge 2\, l(n-1) + w_n$, the solution of which with $l(1) = w_1$ gives $l(n) \ge \sum_{i=1}^{n} w_i\, 2^{i-1}$, and so by (33)

\[ \mathrm{Scon}(n,1) \ge \sum_{i=1}^{n} w_i\, 2^{i-1}. \]
In order to prove the reverse inequality, consider the numbering of $Q_n$ induced by the Lexicographic order $L$. Clearly, $\Theta_i(L; n, 2^n) = 2^{i-1}$ for $i = 1,\dots,n$, which gives

\[ \mathrm{Scon}(n,1) \le \Theta(L; n, 2^n) = \sum_{i=1}^{n} w_i\, 2^{i-1}. \]

In the case $d > 1$ and $w_1 = \cdots = w_n = 1$ we use arguments quite similar to those of Section 2.2 (cf. (22)). Introducing the set $F_{m_0} \subseteq G^d$ and taking into account (21), we come to the inequality

\[ \mathrm{Scon}(n,d) \ge \frac{l(n)}{|C(F_{m_0})|} = 2^{n/d} (1 - o(1)). \tag{35} \]

To get an upper bound we consider the product embedding $\tilde f_{lex}$, which leads to

\[ \mathrm{Scon}(n,d) \le \mathrm{Scon}(n/d, 1) \sim 2^{n/d}. \]
4.2 Bounds for the Sum of Dilations

Consider first the case $d = 1$. We apply induction on $n$. For an embedding $f$ and $i = 1,\dots,n$ denote $d_i = \frac{1}{|E_i|} \sum_{e \in E_i} \mathrm{dil}_f(e)$. Note that $\max_{e \in E_i} \mathrm{dil}_f(e) \ge d_i$ and $\sum_{i=1}^{n} d_i \ge 2^n - 1$ (cf. (30) and (27)). Now, assuming $w_1 \ge \cdots \ge w_n$ and applying the inductive hypothesis with $w_i' = w_i - w_n$, $i = 1,\dots,n-1$, one has

\[ \mathrm{Sdil}(n,1) \ge \sum_{i=1}^{n} w_i d_i = \sum_{i=1}^{n-1} w_i' d_i + w_n \sum_{i=1}^{n} d_i \ge \sum_{i=1}^{n-1} w_i'\, 2^{i-1} + w_n (2^n - 1) = \sum_{i=1}^{n} w_i\, 2^{i-1}. \tag{36} \]
For $d > 1$, using the same method and (30) again, one gets

\[ \mathrm{Sdil}(n,d) \ge \sum_{i=1}^{n} d_i = \frac{1}{2^{n-1}} \sum_{i=1}^{n} \sum_{e \in E_i} \mathrm{dil}_f(e) = \frac{1}{2^{n-1}} \sum_{e \in E(Q_n)} \mathrm{dil}_f(e) \ge d\, 2^{n/d} (1 - o(1)). \tag{37} \]

Clearly, for the embedding $f_{lex}$ one has $d_i = 2^{i-1}$ for $i = 1,\dots,n$ and thus $\mathrm{Sdil}(n,1) \le \sum_{i=1}^{n} w_i\, 2^{i-1}$, which matches the lower bound (36). Furthermore, the product embedding $\tilde f_{lex}$ with $w_1 = \cdots = w_n = 1$ provides

\[ \mathrm{Sdil}(n,d) \le d\, \mathrm{Sdil}(n/d, 1) \sim d\, 2^{n/d}. \]

This completes the proof of Theorem 3. □

Note that in the case of arbitrary weights $w_i$ and $d > 1$ our approach provides lower bounds for Scon and Sdil of the form

\[ 2^{-\frac{d-1}{d} n} \sum_{i=1}^{n} w_i\, 2^{i-1} (1 - o(1)) \]

instead of (35) and (37). However, concerning the upper bounds, the edge-congestion of the product embedding $\tilde f_{lex}$ equals the maximum of the edge-congestions of the embeddings $f_{lex}$ of the factors $Q_{n/d}$ into pipes. This scheme requires partitioning the set $\{w_1,\dots,w_n\}$ into $d$ parts $Q_i$ with sums (weighted with powers of 2) that are as equal as possible. One practical way to do this is to put $Q_i = \{ w_j \mid j \equiv i - 1 \pmod d \}$, $i = 1,\dots,d$.
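The round-robin partition of the weights suggested above can be sketched as follows. This is only an illustration of the index rule $j \equiv i - 1 \pmod d$ with 1-based indices $j$ and $i$; the function name is hypothetical.

```python
def partition_weights(weights, d):
    """Split w_1..w_n into d parts: part i (1-based) holds the w_j with j = i - 1 (mod d)."""
    n = len(weights)
    return [[weights[j - 1] for j in range(1, n + 1) if j % d == (i - 1) % d]
            for i in range(1, d + 1)]
```

For example, `partition_weights([6, 5, 4, 3, 2, 1], 2)` interleaves the sorted weights so that each of the two factors $Q_{n/d}$ receives weights of every magnitude.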
5 Conclusion

Since the communication time of most parallel algorithms significantly exceeds the time required for local computations, the choice of appropriate embeddings is very important for the simulation of these algorithms.

We considered one-to-one embeddings of the hypercube of dimension $n$ into $d$-dimensional grids with the same number of nodes and derived bounds for the dilation and edge-congestion of such embeddings. We also introduced and studied two new cost-measures for embeddings. These new measures are much more suitable for estimating the slowdown of uniaxial algorithms than the ordinary embedding parameters. It is interesting to note that, concerning the ordinary parameters (dilation and edge-congestion), an embedding optimal for one of them is not optimal for the other, while for our new cost-measures the product embedding $\tilde f_{lex}$ is optimal for both of them. This makes it highly probable that this embedding provides good practical results for the simulation.

Below, we present some experimental results concerning data transfer in the Parsytec GCel 1024 parallel computer, consisting of 1024 T-800 transputers configured as a $32 \times 32$ grid. Message routing in this computer is supported by a software implementation of wormhole routing. We simulated the communication behavior of uniaxial algorithms with $w_1 = \cdots = w_n$ for $n = 1,\dots,10$ by sending 6.4 Mbyte of data through each edge of $E_i$ in $Q_n$ for $i = 1,\dots,n$, split into small packets of 120 bytes. For each of these values of $n$, we embedded $Q_n$ into a subgrid of the $32 \times 32$ grid, and for each $i = 1,\dots,n$ we measured the maximal time $t_i$ for a message to reach its destination in the grid along the paths corresponding to the images of the edges of $E_i$ in the embedding. Let $t = (t_1 + \cdots + t_n)/n$; then the average performance $P(f)$ of the communication in the embedding $f$ equals $P(f) = 6.4/t$ Mbyte/s. In Fig. 1 we show the results for $P(\tilde f_{lex})$ and $P(\tilde f_{ban})$. As one can see from the figure, the performance of the two embeddings is almost the same for $n \le 4$, which is explained by the similarity of the orders $B$ and $L$ for small $n$. However, for larger $n$ the difference increases and reaches a factor of approximately 6 for $n = 10$, i.e. the whole communication in uniaxial algorithms is done 6 times faster in the embedding $\tilde f_{lex}$ than in $\tilde f_{ban}$. Therefore, the embedding $\tilde f_{lex}$ is not only asymptotically optimal, as we showed above, but also provides very good practical results for concrete instances.
6 Acknowledgments

We thank Ulrich Freise for providing the measurements on the GCel 1024 parallel computer.
References

[1] Annexstein F.: Embedding hypercubes and related networks into mesh-connected processor arrays. J. Parall. Distr. Comput., 23 (1994), 72-79.
[2] Bezrukov S.L., Röttger M., Schroeder U.-P.: Embedding of hypercubes into grids. Technical Report tr-sfb-95-2, University of Paderborn, 1995.
[3] Bollobás B., Leader I.: Compressions and isoperimetric inequalities. J. Comb. Th., A-56 (1991), 47-62.
[4] Feller W.: An introduction to probability theory and its applications. Vol. 1, New York: Wiley, 1968.
[5] Harper L.H.: Optimal assignment of numbers to vertices. J. Soc. Ind. Appl. Math., 12 (1964), 131-135.
[6] Harper L.H.: Optimal numberings and isoperimetric problems on graphs. J. Comb. Theory, 1 (1966), No. 3, 385-393.
[7] Katona G.O.H.: A theorem on finite sets. In: Theory of Graphs, Akadémiai Kiadó, Budapest 1968, 187-207.
[8] Lai T.-H., Sprague A.P.: Placement of the processors of a hypercube. IEEE Trans. Comp., 40 (1991), No. 6, 714-722.
[9] Nakano K.: Linear layout of generalized hypercubes. In: Lect. Notes in Comp. Sci., 790, Springer Verlag, 1994, 364-375.
[10] Zienicke P.: Embedding hypercubes in 2-dimensional meshes. Humboldt-Universität zu Berlin (unpublished manuscript).