A distributed scheme for many-to-one embedding ...

A Distributed Scheme for Many-to-one Embedding Binary Trees into Hypercubes

Yao-Ming Yeh and Yiu-Cheng Shyu
Department of Information and Computer Education
National Taiwan Normal University
Taipei, Taiwan

Abstract

The commercial hypercube-based machine is a good physical architecture for embedding a variety of logical topologies because of its logarithmic diameter and high bandwidth. Through such embedding, the algorithms originally developed for logical topologies can be directly executed on hypercubes. In this paper, we present a distributed scheme for embedding larger binary trees into smaller hypercubes. From a practical point of view, this is a very important issue when only a hypercube of limited size is available. The desired embeddings can be constructed in parallel with time complexity of $O(\log N)$, where $N$ is the total number of tree nodes. The scheme provides a uniform many-to-one embedding with minimum overhead.

1 Introduction

Due to its structural regularity and numerous useful properties [1], the hypercube has drawn considerable attention in recent years. Several successful multiprocessor systems have been built using the hypercube topology, including the Cosmic Cube [2], the Intel iPSC [3], and the nCUBE [4].

From a programming perspective, binary trees are an important class of computational structures. They are widely employed to solve many applications that use the "divide-and-conquer" methodology to formulate parallel algorithms, such as merging, sorting, and graph problems. Therefore, it is interesting to see whether these algorithms can be executed on hypercubes efficiently. One way to do this is to embed binary trees into hypercubes. Through such embedding, the algorithms originally developed for binary trees can be directly executed on hypercubes.

The problem of embedding can be modeled by graph mapping. It can be viewed as mapping the guest graph into the host graph. The cost of an embedding is usually measured in terms of dilation and expansion. Dilation is defined as the maximum distance in the host graph between the adjacent nodes in the guest graph. Expansion is the ratio of the size of the host graph to the size of the guest graph.

Many researchers have studied the embeddings of binary trees into hypercubes. A survey of several known results was given in [5]. Wu [6] gave recursively defined, bottom-up algorithms for embedding a complete binary tree into a hypercube either with dilation 1 and expansion 2 or with dilation 2 and expansion 1; Johnsson [7], in contrast, gave a top-down algorithm. Also at that time, Bhatt and Ipsen [8] and Deshpande and Jenevein [9] showed that a complete binary tree cannot be embedded into a hypercube with dilation cost of 1 and expansion cost less than 2. Therefore, a two-rooted complete binary tree was introduced, which can be embedded into a hypercube with dilation cost of 1 and expansion cost of 1. A different and simpler method for embedding the two-rooted complete binary tree was given by K. Efe [10].

Since most of the previous results [6,7,8,9,10] are based on a centralized algorithm that deals with one-to-one embedding only, they become infeasible when the number of tree nodes is larger than the number of available cube nodes. To overcome this problem, it is desirable to study other useful embeddings, such as many-to-one embedding, in which multiple tree nodes are mapped into a single cube node. The many-to-one embedding commonly arises from the execution of tree-like programs in which only one level of tree nodes is active in each step of the execution. As only one level of nodes is active at a time, only nodes on the same level have to be mapped to distinct cube nodes. In this paper, we present a distributed scheme for embedding larger binary trees into smaller hypercubes, which provides a uniform many-to-one embedding with minimum overhead. This scheme is an extension of our earlier result proposed in [11] for distributed one-to-one embedding of the two-rooted complete binary tree into the hypercube.

The rest of this paper is organized as follows. Section 2 introduces the basic notations and preliminaries. A brief review of the previous scheme in [11] for distributed one-to-one embedding of the two-rooted complete binary tree into the hypercube is given in Section 3. Section 4 presents the proposed many-to-one scheme. Concluding remarks are given in Section 5.

2 Notations and Preliminaries

An n-dimensional hypercube (or n-cube for short), denoted by $Q_n$, consists of $2^n$ identical processors. Each processor has its own local memory and is interconnected by $n2^{n-1}$ point-to-point communication channels, which are known as the edges of an n-cube. Each node of $Q_n$ is given a distinct n-bit label or address $x_n \ldots x_1$ (bit $x_i$ corresponding to dimension $i$), such that two nodes connected by a link have labels that differ in exactly one bit. A link between adjacent nodes has dimension $i$ if the labels of the nodes differ only in the $i$th bit.

An embedding of a guest graph $G = (V_G, E_G)$ into a host graph $H = (V_H, E_H)$ is a one-to-one function $f : V_G \to V_H$. The dilation cost of an embedding is $\max\{\text{Hamming distance}(f(u), f(v)) : (u, v) \in E_G\}$. The expansion cost is defined as the ratio $|V_H| / |V_G|$. Intuitively, dilation measures communication overhead, and expansion measures processor utilization. When embedding is performed in a many-to-one manner, we use load factor instead of expansion to measure processor utilization. The load factor of an embedding is defined as the maximum number of nodes in guest graph $G$ that are mapped into the same node in host graph $H$. In this paper, the host graph is always the hypercube, and the guest graph can be the two-rooted complete binary tree or the complete binary tree.

In the previous schemes [6,7,8,9,10], the embedding can be viewed as a labeling of each tree node with an n-bit label over {0,1}. The label of each tree node represents the cube node it will be mapped to. In our distributed embedding scheme, starting with the root, each node is responsible for determining the neighbors across which dimensions will be its children, and for invoking the embedding procedure at its children. It can be viewed as labeling each tree edge with a dimension within 1 to n. Since adjacent nodes in a hypercube differ in only a single bit position, a single integer (i.e., an edge label) suffices to differentiate between the neighbors of a node. If $G$ can be embedded into $H$ with dilation 1, the edge labeling of $G$ is called a proper edge labeling of $G$. In the subsequent discussion, neighbors of a given node will be referred to by the edge label differentiating them from this node.

A complete binary tree of height n, denoted by $T_n$, is a one-rooted binary tree with the root on level 1 and $2^n - 1$ nodes. A two-rooted complete binary tree of height n, denoted by $2T_n$, comprises $2^n$ nodes. It can be obtained from a complete binary tree by introducing a dummy node between the root and one of its subtrees. Fig. 1(a) shows a complete binary tree $T_n$. Fig. 1(b) shows a two-rooted complete binary tree $2T_n$ with $L$ and $R$ as the two roots and $S_l$ and $S_r$ as the roots of their subtrees.
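The cost measures defined in this section can be computed directly from a mapping. The following is a minimal Python sketch, not part of the original scheme: the function names and the three-node toy tree are hypothetical and chosen only for illustration. Cube addresses are represented as integers, so the Hamming distance is the number of one-bits in the XOR of two addresses.

```python
from collections import Counter

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two cube addresses differ."""
    return bin(a ^ b).count("1")

def dilation(tree_edges, f):
    """Maximum Hamming distance between the images of adjacent tree nodes."""
    return max(hamming(f[u], f[v]) for u, v in tree_edges)

def expansion(f, n):
    """|V_H| / |V_G| for a guest graph with len(f) nodes and host Q_n."""
    return 2 ** n / len(f)

def load_factor(f):
    """Maximum number of tree nodes mapped to the same cube node."""
    return max(Counter(f.values()).values())

# Hypothetical toy guest graph: T_2 with root "a" and children "b", "c".
edges = [("a", "b"), ("a", "c")]
one_to_one = {"a": 0b00, "b": 0b01, "c": 0b10}   # into Q_2
many_to_one = {"a": 0b0, "b": 0b0, "c": 0b1}     # into Q_1

print("one-to-one :", dilation(edges, one_to_one), expansion(one_to_one, 2))
print("many-to-one:", dilation(edges, many_to_one), load_factor(many_to_one))
```

In the many-to-one case, the edge between "a" and "b" has Hamming distance 0 because both nodes share a cube node, which is why load factor rather than expansion is the meaningful utilization measure there.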

Fig. 1 Two types of complete binary trees: (a) complete binary tree, (b) two-rooted complete binary tree

3 One-to-one Embedding of $2T_n$

To facilitate later explanation of the many-to-one scheme, we briefly review the distributed embedding scheme developed in [11] for one-to-one embedding of a two-rooted complete binary tree $2T_n$ into the hypercube $Q_n$. In this scheme, each node except the two roots receives a packet of information from its parent, determines which neighbors will be its children, and sends them the information they need to continue the configuration. The information needed to construct the desired embedding is called the "packet message". The format of the packet message is (n, currentlevel, Control-word, Control-list), where n is the height of the binary tree; currentlevel is the level of the current node in the binary tree; Control-word is a character that helps a node determine the locations of its children and generate the new packet messages for its children; and Control-list is an array of cardinal numbers whose items are numbered from one, with one corresponding to the leftmost item.

The procedure starts with the right root, which can be invoked at any cube node by the host processor. Initially, the right root takes the left root to be its neighbor across dimension n. Then the right root and the left root generate the initial packet messages (n, 2, R, [n, n-1, ..., 1]) and (n, 2, L, [n, n-1, ..., 1]) and send them to the roots of their subtrees $S_r$ and $S_l$ across the dimensions n-2 and n-1, respectively. When a node has received the packet message, the embedding procedure at that node, named 2TreeEmbed, performs the following stages in sequence.

Stage 1: It checks whether the node is a leaf. If currentlevel = n, the embedding procedure 2TreeEmbed terminates; otherwise, it goes to the next stage.
Stage 2: It determines the dimensions across which its children should be located by retrieving specific items of the Control-list.
Stage 3: It generates the new packet messages for its children, which are used to construct the rest of the configuration.
Stage 4: It sends its children the new packet messages to invoke the embedding procedure 2TreeEmbed at these child nodes.

Owing to the use of the edge labeling method, the embedding can be rooted at any cube node by invoking the embedding procedure 2TreeEmbed at that node. When all the leaf nodes have been invoked, the entire embedding has been constructed. The whole algorithm runs in $O(\log N)$ time, which is the time it takes to walk the paths from the two roots to the leaves in parallel, where $N$ is the number of tree nodes. Finally, the embedding result in [11] can be summarized as the theorem below.

Theorem 1: Procedure 2TreeEmbed constructs a proper edge labeling of $2T_n$ for embedding into $Q_n$ with expansion 1 [11].
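The remark that an edge-labeled embedding can be rooted at any cube node can be illustrated with a short Python sketch. It is a generic illustration, not the 2TreeEmbed procedure itself: the function name and the four-node labeled tree are hypothetical. Each child address is obtained by flipping the bit of the parent's address that corresponds to the edge label, so moving the root only XOR-translates every address.

```python
def addresses_from_edge_labels(root, root_addr, labeled_edges):
    """Derive cube addresses from edge labels.

    labeled_edges holds (parent, child, dim) triples listed so that every
    parent appears before its children; dim is a dimension in 1..n.
    """
    addr = {root: root_addr}
    for parent, child, dim in labeled_edges:
        # Crossing the link of dimension dim flips bit dim (numbered from 1).
        addr[child] = addr[parent] ^ (1 << (dim - 1))
    return addr

# Hypothetical edge-labeled tree used only for illustration.
labeled_edges = [("r", "a", 1), ("r", "b", 2), ("a", "c", 2)]

at_zero = addresses_from_edge_labels("r", 0b000, labeled_edges)
at_five = addresses_from_edge_labels("r", 0b101, labeled_edges)

for node in at_zero:
    print(node, format(at_zero[node], "03b"), format(at_five[node], "03b"))
```

Because the two placements differ by a fixed XOR mask, adjacent tree nodes keep the same Hamming distance in both, so a proper edge labeling stays proper wherever the root is invoked.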


An example of $2T_3$, for which 2TreeEmbed constructs a proper edge labeling for embedding into $Q_3$, is shown in Fig. 2. The Control-word and Control-list received by each node are shown in brackets; n and currentlevel are omitted for clarity.

Fig. 2 Distributed embedding of $2T_3$ into $Q_3$

4 Many-to-one Embeddings of Binary Trees

From a practical point of view, it is very important to address the issue of simulating larger binary trees on smaller hypercubes. If larger binary trees can be embedded into smaller hypercubes with minimum cost, then problems of larger size can be solved with the available machine. Hence, many-to-one embedding deserves investigation. For such embedding, the three fundamental requirements that we emphasize are: (1) lower dilation, (2) lower load factor, and (3) lower slow down.

The first requirement is needed to keep communication overhead low. The adjacent nodes of a binary tree should be mapped to nodes of the hypercube that are as near to each other as possible. The second requirement is needed for balancing the work loads of the processors. The number of tree nodes mapped to a cube node should be uniform across the nodes of the hypercube. To better explain the third requirement, consider the execution of tree-like programs in which only one level of nodes is active in each step of the execution. The tree nodes on the same level should therefore be mapped to cube nodes that are as distinct as possible. If each tree node on the same level is mapped to a distinct cube node, the system will not incur any slow down, since the tree nodes being simulated by a cube node are not active at the same time.

Because of these requirements, we reject the simple schemes that violate any of them; otherwise, the embedding overhead would grow linearly with the number of nodes in the hypercube. The scheme we present here satisfies all the requirements above.

4.1 Embedding of $2T_n$

In this subsection, we consider many-to-one embedding of $2T_n$ into $Q_k$, $k \le n-1$. First, we discuss the case $k = n-1$, which can be easily extended to derive all other cases for $k \le n-2$. A two-rooted complete binary tree $2T_n$ contains $2^n$ nodes, of which $2^{n-1}$ are leaf nodes and $2^{n-1}$ are internal nodes. We observe that the internal nodes from level 1 to level n-1 of $2T_n$ can be viewed as a $2T_{n-1}$ and that the number of leaf nodes equals the number of internal nodes. Based on this observation, we know from Theorem 1 that the internal nodes of $2T_n$ can be one-to-one embedded into $Q_{n-1}$. If we map each leaf node to the cube node of a distinct internal node, we have an embedding of $2T_n$ into $Q_{n-1}$. We can obtain such an embedding by adapting, with simple modifications, the one-to-one embedding procedure 2TreeEmbed. These modifications are based on the following two aspects:

Aspect 1: The embedding of the internal nodes of $2T_n$ is just like the one-to-one embedding of $2T_{n-1}$ into $Q_{n-1}$.
Aspect 2: We restrict each node on level n-1 to take its left child to be itself (i.e., map to the same cube node) and take its right child to be its neighbor across dimension right, where right is the second item of the Control-list.

The modified 2TreeEmbed is described formally as follows.

Modified 2TreeEmbed (n, currentlevel, Control-word, Control-list)
begin
    /* initialization step */
    if current node is the right root R of 2T_n then
        invoke the left root L via link n-1
        send (n, 2, R, [n-1, n-2, ..., 1]) via link n-2 to invoke S_r
        stop
    if current node is the left root L of 2T_n then
        send (n, 2, L, [n-1, n-2, ..., 1]) via link n-1 to invoke S_l
        stop

    /* Stage 1: check whether it is a leaf node */
    if currentlevel = n then stop

    /* Stage 2: determine the locations of its children */
    switch (currentlevel)
        case n-2 : left = Control-list[1]
                   if Control-word = L then right = Control-list[3]
                   else right = Control-list[2]
        case n-1 : left = /* stay at current node */
                   right = Control-list[2]
        default  : left = Control-list[1]
                   right = Control-list[4]
    end

    /* Stage 3: generate packet messages for its children */
    delete the first item of the Control-list
    L-Control-list = R-Control-list = Control-list
    if currentlevel < n-1 then
        if Control-word = L then interchange the first and second items of L-Control-list
        if Control-word = R then interchange the first and second items of R-Control-list

    /* Stage 4: send the packet messages to its children */
    send packet message (n, currentlevel+1, L, L-Control-list) via link left to its left child
    send packet message (n, currentlevel+1, R, R-Control-list) via link right to its right child
end
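The stages of the modified procedure can be traced with a sequential Python simulation. This is a sketch under stated assumptions, not the authors' implementation: the message passing is replaced by a queue, the node names and helper functions are hypothetical, and the sketch is restricted to $n \ge 4$. One assumption deserves emphasis: the subtree roots are placed across links n-3 (from R) and n-2 (from L), which is what the Section 3 initialization gives when applied to $2T_{n-1}$ in $Q_{n-1}$ as Aspect 1 suggests. Link n-1 from L would lead back to the cube node of R, so that value is not used here.

```python
from collections import Counter, deque

def across(addr, dim):
    """Neighbor of cube node addr across link dim (links numbered from 1)."""
    return addr ^ (1 << (dim - 1))

def simulate_modified_2tree_embed(n, root_addr=0):
    """Place 2T_n in Q_(n-1); returns (place, level, edges). Sketch for n >= 4."""
    if n < 4:
        raise ValueError("this sketch assumes n >= 4")
    place = {"R": root_addr, "L": across(root_addr, n - 1)}
    level = {"R": 1, "L": 1}
    edges = [("R", "L")]
    queue = deque()
    control_list = list(range(n - 1, 0, -1))        # [n-1, n-2, ..., 1]
    # ASSUMPTION: subtree roots sit across links n-3 (from R) and n-2 (from L).
    for parent, word, dim in (("R", "R", n - 3), ("L", "L", n - 2)):
        child = "S" + word.lower()
        place[child] = across(place[parent], dim)
        level[child] = 2
        edges.append((parent, child))
        queue.append((child, word, control_list))
    while queue:
        node, word, clist = queue.popleft()
        lvl = level[node]
        if lvl == n:                                 # Stage 1: leaf node
            continue
        # Stage 2 (Control-list is 1-based in the text, 0-based here)
        if lvl == n - 1:
            left, right = None, clist[1]             # left child stays put
        elif lvl == n - 2:
            left = clist[0]
            right = clist[2] if word == "L" else clist[1]
        else:
            left, right = clist[0], clist[3]
        # Stage 3: drop the first item, then swap in one of the two copies
        l_list, r_list = clist[1:], clist[1:]
        if lvl < n - 1:
            swapped = l_list if word == "L" else r_list
            swapped[0], swapped[1] = swapped[1], swapped[0]
        # Stage 4: "send" the packets by queueing the children
        for child_word, dim, child_list in (("L", left, l_list),
                                            ("R", right, r_list)):
            child = node + child_word
            place[child] = place[node] if dim is None else across(place[node], dim)
            level[child] = lvl + 1
            edges.append((node, child))
            queue.append((child, child_word, child_list))
    return place, level, edges

def dilation(place, edges):
    return max(bin(place[u] ^ place[v]).count("1") for u, v in edges)

def load_factor(place):
    return max(Counter(place.values()).values())

def max_same_level_load(place, level):
    """Largest number of same-level tree nodes sharing one cube node."""
    return max(Counter((level[v], place[v]) for v in place).values())

for n in (4, 5):
    place, level, edges = simulate_modified_2tree_embed(n)
    print(n, len(place), dilation(place, edges), load_factor(place),
          max_same_level_load(place, level))
```

For n = 4 and n = 5 the simulation places all $2^n$ tree nodes with dilation 1, load factor 2, and no two nodes of the same level on one cube node. Larger n has not been checked by hand here, so the sketch should be read as a way to experiment with the procedure rather than as a proof of it.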


After these modifications, the embedding result of $2T_n$ into $Q_{n-1}$ is stated in the following lemma.

Lemma 1: Modified 2TreeEmbed successfully embeds $2T_n$ into $Q_{n-1}$ with dilation 1, expansion 1/2, load factor 2, and without any slow down.

Proof: The internal nodes of $2T_n$ can be viewed as a $2T_{n-1}$. From Theorem 1, we know that it can be one-to-one embedded into $Q_{n-1}$ with dilation 1. By the restriction of Aspect 2 of the modifications described above, each left leaf node is mapped to its parent and each right leaf node is mapped to a neighbor of its parent. Thus, the entire embedding still has dilation 1. At this point, each cube node simulates exactly one internal node and one leaf node, so the load factor is 2 and the expansion is 1/2. Since the two tree nodes being simulated by a cube node are not active at the same time, the embedding will not incur any slow down.

Fig. 3 Embedding of $2T_4$ into $Q_3$ with dilation 1, expansion 1/2, load factor 2, and without any slow down

An example of embedding $2T_4$ into $Q_3$ is given in Fig. 3. Fig. 3(a) shows that the internal nodes of $2T_4$ are one-to-one embedded into $Q_3$. The locations of children determined by the nodes on level 3 of $2T_4$ are depicted in Fig. 3(b), where the symbol "-" indicates that the node and its child are assigned to the same cube node. Fig. 3(c) shows the embedding result, where the dotted lines represent the internal nodes to which the leaf nodes are mapped.

The embedding of $2T_n$ into $Q_{n-1}$ can be easily extended to deal with the embedding of $2T_n$ into $Q_k$, $k \le n-2$. The theorem below shows the extension method and generalizes a similar, but not identical, result given by K. Efe and K. Ramaiyer [12].

Theorem 2: A $2T_n$ can be embedded into $Q_k$, $k \le n-1$, with dilation 1, expansion $1/2^{n-k}$, optimum load factor $2^{n-k}$, and minimum slow down $2^{n-k}-(n-k)-1$.

Before proving this theorem, it is instructive to introduce the optimum load factor and the minimum slow down of embedding $2T_n$ into $Q_k$, $k \le n-1$. The number of tree nodes of $2T_n$ is $2^n$ and the number of cube nodes of $Q_k$ is $2^k$. Clearly, the optimum load factor for such an embedding is $2^n/2^k = 2^{n-k}$. To better explain the minimum slow down, consider the type of tree-like algorithms in which only one level of nodes is active at a time. If the number of tree nodes on the same level being simulated by a cube node is greater than one, a slow down will be incurred. Let $S_i$ denote the maximum number of tree nodes on level $i$ being simulated by a cube node. Hence, the slow down incurred on level $i$ is $S_i - 1$. The embedding has minimum slow down if $\sum_{i=1}^{n}(S_i - 1)$ is minimum. Intuitively, if we distribute the tree nodes on the same level evenly over the cube nodes (i.e., $S_1, S_2, \ldots, S_k, S_{k+1}, S_{k+2}, \ldots, S_n = 1, 1, \ldots, 1, 2^k/2^k, 2^{k+1}/2^k, \ldots, 2^{n-1}/2^k = 1, 1, \ldots, 1, 2^0, 2^1, \ldots, 2^{n-k-1}$), the slow down will be minimum.

Proof (of Theorem 2): As described in Lemma 1, the nodes of $2T_n$ from level 1 to level k+1 can be viewed as a $2T_{k+1}$ and can be embedded into $Q_k$ with dilation 1. Then, we replace each leaf node of the $2T_{k+1}$ by an (n-k)-level binary tree; that is, the leaf node and its descendants in $2T_n$ are mapped to the same cube node. As the result of this replacement, we can embed $2T_n$ into $Q_k$ with dilation 1. At this time, each cube node simulates an internal node and a subtree of $2^{n-k}-1$ nodes (i.e., a total of $2^{n-k}$ tree nodes). Hence, the load factor is the optimum $2^{n-k}$ and the expansion is $1/2^{n-k}$. Note that we distribute the tree nodes on the same level evenly over the cube nodes. The sequence $S_1, S_2, \ldots, S_k, S_{k+1}, S_{k+2}, \ldots, S_n$ of our embedding is $1, 1, \ldots, 1, 2^0, 2^1, \ldots, 2^{n-k-1}$. Thus, we have an embedding with minimum slow down $(1-1) + (1-1) + \cdots + (1-1) + (2^0-1) + (2^1-1) + \cdots + (2^{n-k-1}-1) = 2^{n-k}-(n-k)-1$.

The many-to-one scheme we present here is entirely distributed and extremely flexible. It only needs $O(\log N)$ time to set up or reconfigure the desired embedding for an N-node tree. We can make a brief comparison between our scheme and a previous study. K. Efe and K. Ramaiyer [12] propose a centralized, node labeling algorithm for embedding $2T_n$ into $Q_k$, $k \le n-1$. Their scheme has the same result in terms of dilation, expansion, and load factor. However, it needs $O(N)$ time to set up or reconfigure the desired embedding.
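The arithmetic behind the optimum load factor and the minimum slow down can be checked numerically. The Python sketch below is an illustration with hypothetical function names; it assumes $1 \le k \le n-1$ and uses the level sizes of $2T_n$ implied by the definitions in Section 2 (two roots on level 1 and $2^{i-1}$ nodes on each level $i \ge 2$).

```python
def level_sizes(n):
    """Node counts on levels 1..n of 2T_n: two roots, then 2^(i-1) on level i."""
    return [2] + [2 ** (i - 1) for i in range(2, n + 1)]

def even_profile(n, k):
    """S_1..S_n when every level is spread evenly over the 2^k cube nodes."""
    cube_nodes = 2 ** k
    # Ceiling division: a level smaller than the cube still occupies one slot.
    return [(size + cube_nodes - 1) // cube_nodes for size in level_sizes(n)]

def total_slow_down(n, k):
    """Sum of the per-level slow downs S_i - 1."""
    return sum(s - 1 for s in even_profile(n, k))

def closed_form_slow_down(n, k):
    """The value 2^(n-k) - (n-k) - 1 stated in Theorem 2."""
    return 2 ** (n - k) - (n - k) - 1

def optimum_load_factor(n, k):
    """2^n tree nodes shared evenly among 2^k cube nodes."""
    return 2 ** n // 2 ** k

print(even_profile(5, 3), total_slow_down(5, 3), optimum_load_factor(5, 3))
print(even_profile(6, 2), total_slow_down(6, 2), closed_form_slow_down(6, 2))
```

For $n = 5$, $k = 3$ the profile is 1, 1, 1, 1, 2 and the total slow down is 1; for $k = n-1$ the total is 0, which agrees with Lemma 1.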


4.2 Embedding of $T_n$

M. Y. Fang and W. T. Chen [13] proposed a scheme for embedding $T_n$ into $Q_k$, $k \le n-1$. They first embed $T_n$ into $Q_{n-1}$, and then fold $Q_{n-1}$ until it reaches the desired size of $Q_k$. Although the result is bounded above in terms of dilation and load factor by the constants 2 and $(3/2) \times 2^{n-k}$ respectively, their embedding has $2^{n-3}-1$ dilation-two edges in total and the load factor is not optimum (the optimum is $2^{n-k}$). It violates both the first and the second requirements we emphasized. However, using a relatively simple method, we can get a better result that satisfies all requirements, as stated in the following theorem.

Theorem 3: A $T_n$ can be embedded into a $Q_k$, $k \le n-1$, with a single edge having dilation 2 and the rest having dilation no more than 1, expansion near $1/2^{n-k}$, optimum load factor $2^{n-k}$, and minimum slow down $2^{n-k}-(n-k)-1$.

Proof: Since a complete binary tree $T_n$ becomes a two-rooted complete binary tree $2T_n$ by introducing a dummy node between the root and its left subtree, the embedding of $T_n$ into $Q_k$ can be easily derived from the embedding of $2T_n$ into $Q_k$ (see Fig. 4), where $k \le n-1$. The only dilation-two edge of $T_n$ is between the root and its left subtree. The result then follows immediately from Theorem 2.
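The transformation used in the proof can be sketched in Python as follows. The function name and the data layout are hypothetical. The four addresses are the top of a $2T_3$ placement in $Q_3$ obtained from the Section 3 initialization, with R assumed to sit at 000: L lies across dimension 3, $S_l$ across dimension 2 from L, and $S_r$ across dimension 1 from R. Only the top of the tree is shown, since the rest of the embedding is unchanged.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")

def drop_dummy_root(place, edges, dummy="L", root="R"):
    """Turn a 2T_n embedding into a T_n embedding by deleting the dummy root.

    The child of the dummy node is re-attached directly to the real root;
    every other tree node keeps its cube address.
    """
    new_place = {v: a for v, a in place.items() if v != dummy}
    new_edges = []
    for u, v in edges:
        if dummy not in (u, v):
            new_edges.append((u, v))
        elif {u, v} != {dummy, root}:
            child = v if u == dummy else u
            new_edges.append((root, child))
    return new_place, new_edges

# Top of a 2T_3 placement in Q_3 (R assumed at address 000).
place = {"R": 0b000, "L": 0b100, "Sl": 0b110, "Sr": 0b001}
edges = [("R", "L"), ("L", "Sl"), ("R", "Sr")]

t_place, t_edges = drop_dummy_root(place, edges)
edge_dilation = {e: hamming(t_place[e[0]], t_place[e[1]]) for e in t_edges}
print(edge_dilation)
```

The edge from the root to $S_l$ replaces a two-hop path through the dummy node, so its Hamming distance is 2, while the edge to $S_r$ keeps distance 1.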

Fig. 4 Embedding transformation from $2T_n$ to $T_n$

Fig. 5(a) shows that the embedding of $T_4$ into $Q_3$ is obtained from the embedding of $2T_4$ into $Q_3$. The embedding result is shown in Fig. 5(b), where the dotted lines represent the internal nodes to which the leaf nodes are mapped.

Fig. 5 Embedding of $T_4$ into $Q_3$

5 Conclusions

In this paper, we have discussed a distributed scheme for embedding larger binary trees into smaller hypercubes. From a practical point of view, many-to-one embedding is an important class of embeddings. If larger binary trees can be embedded into smaller hypercubes, problems of larger size can be solved with the available machine. With simple modifications of the one-to-one embedding scheme, the many-to-one embedding with optimum load factor and minimum slow down can be derived easily.

Tree-like topology embedding, in particular embedding into the hypercube, has received considerable interest. As a result, it might be interesting to consider our embedding technique under various dynamic or probabilistic fault models or under relaxed load and dilation constraints. More general areas of research include adapting our technique to the embeddings of other structures, such as arbitrary trees, dynamic trees, and so forth. This will be pursued in the future.

References

[1] Y. Saad and M. H. Schultz, "Topological properties of hypercubes," IEEE Trans. Comput., vol. 37, no. 7, pp. 867-872, 1988.
[2] C. L. Seitz, "The cosmic cube," Commun. ACM, vol. 28, pp. 22-33, Jan. 1985.
[3] Intel Corp., "A new direction in scientific computing," Order 28009-001, Beaverton, OR, 1985.
[4] L. N. Bhuyan and D. P. Agrawal, "Generalized hypercube and hyperbus structures for a computer network," IEEE Trans. Comput., vol. C-33, pp. 323-333, 1984.
[5] M. Livingston and Q. F. Stout, "Embeddings in hypercubes," Proc. 6th Intern. Conf. on Math. Modeling, Aug. 1987.
[6] A. Wu, "Embedding of tree networks into hypercubes," J. Parallel Distrib. Comput. 2, pp. 238-249, 1985.
[7] S. L. Johnsson, "Communication efficient basic linear algebra computations on hypercube architectures," J. Parallel Distrib. Comput. 4, pp. 133-172, 1987.
[8] S. N. Bhatt and I. C. F. Ipsen, "How to embed trees in hypercubes," Rep. YALEU/DCS/RR-443, Department of Computer Science, Yale University, 1985.
[9] S. Deshpande and R. Jenevein, "Scalability of a binary tree on a hypercube," Proc. ICPP, pp. 661-668, 1986.
[10] K. Efe, "Embedding mesh of trees in the hypercube," J. Parallel Distrib. Comput. 11, pp. 222-230, 1991.
[11] Y. C. Shyu and Y. M. Yeh, "Distributed schemes for embedding binary trees into hypercubes," Proc. ICAST, pp. 53-60, 1994.
[12] K. Efe and K. Ramaiyer, "Congestion and fault tolerance of binary tree embeddings on hypercube," Proc. IEEE 32nd Symp. Foundations of Computer Science, pp. 458-463, 1991.
[13] M. Y. Fang and W. T. Chen, "An embedding of large binary trees into hypercube multiprocessors of limited size," J. Information Science and Engineering 8, pp. 105-119, 1992.
