
A Heuristic Processor Allocation Strategy in Hypercube Systems
S.Y. Yoon, O. Kang, H. Yoon, S.R. Maeng, and J.W. Cho
Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 373-1 Kusong-Dong, Yusung-Gu, Taejon 305-701, Korea
E-mail: [email protected]

ABSTRACT
In this paper, a new processor allocation scheme for hypercube systems, called the HPA (Heuristic Processor Allocation) strategy, is presented. In this scheme, an undirected graph, called the SC-graph (Subcube-graph), is used to maintain the free subcubes available in the system, which are represented by vertices. An allocation request for a k-cube is satisfied by finding a free subcube of dimension k in the SC-graph or by decomposing the nearest higher-dimension subcube. If there is more than one subcube of dimension k, the subcube which has the minimum degree in the SC-graph is selected to reduce external fragmentation. For deallocating a released subcube, a heuristic algorithm is used to keep the dimension of the free subcubes as high as possible. It is theoretically shown that the HPA strategy is not only statically optimal but also has a complete subcube recognition capability in a dynamic environment. Extensive simulation results show that the HPA strategy improves performance and significantly reduces the allocation/deallocation time compared to previously proposed schemes.

location of a subcube of the size determined by 1). The first step is investigated in [9]. The second step is the subject of this paper and is generally called processor allocation. The main objectives of processor allocation are to maximize processor utilization and to minimize external and internal fragmentation. Recognizing more of the subcubes that are available in the system enhances processor utilization. Internal fragmentation occurs when the processor allocation scheme cannot recognize the available subcubes. External fragmentation occurs when a sufficient number of processors are available in the hypercube but, because they are scattered, they do not form a subcube large enough to accommodate the incoming task. To deal with this fragmentation problem, many schemes [2,4,7] have been proposed recently. There are two approaches to processor allocation: bottom-up and top-down. In the bottom-up approach, $2^n$ allocation bits are used to keep track of the availability of all the processors, and the allocation schemes examine these bits to form subcubes. Depending on how the allocation bits are searched, there are several schemes, such as the Buddy strategy [3], the Gray code (GC) strategy [3], and the tree-collapsing (TC) strategy [5]. Since the bottom-up approach uses only the information in the allocation bits, it is fairly difficult to reduce external fragmentation. Furthermore, under heavy system load, searching for free subcubes takes a long time. The top-down approach, on the other hand, keeps lists or a set of the free subcubes available in the system. Among the top-down approaches, the free list strategy [8] maintains one list per dimension for the free subcubes available in the system, so it can recognize all the subcubes. Another top-down approach, proposed in [6], maintains a subcube set to keep the free subcubes available in the system.
In this paper, we propose a new top-down allocation strategy, called the Heuristic Processor Allocation (HPA) strategy. An undirected graph, whose vertices represent the available subcubes in the system, is used to allocate and deallocate the subcubes. When deallocating a released subcube, a heuristic algorithm is used to keep the available subcubes as large as possible. To reduce search time and minimize external fragmentation, it modifies only the related nodes in the

1. Introduction

Recently, hypercube systems have been drawing considerable attention from researchers because their topological properties offer high potential for the parallel execution of various algorithms. An n-dimensional hypercube system, also known as an n-cube, is a parallel computer characterized by the presence of $2^n$ processors interconnected as an n-dimensional binary cube. A processor with its own local memory is connected to its n neighboring processors. There are $2^n$ distinct n-bit binary addresses or labels that may be assigned to the processors such that the address of each processor differs from that of each of its n neighbors in exactly one bit position. For multi-user multi-tasking hypercube systems, one of the fundamental problems is how to allocate subcubes to incoming tasks. A task in a hypercube system is represented as a task flow graph and assigned to a subcube in the system for execution. Upon completion of execution, the subcube used for the task must be released for later use. The efficient allocation and deallocation of processors in the hypercube system is an important issue for achieving high performance. Processor allocation in a hypercube system consists of two steps: 1) determination of the size of a subcube to accommodate an incoming task, and 2) recognition and

0-8186-2310-1/91 $1.00 © 1991 IEEE


graph when an allocation or deallocation is requested. The simulation results show that processor utilization with the HPA strategy is almost the same as that with the multiple GC strategy, while its search time is virtually the same as that of the Buddy and GC strategies and much less than that of the multiple GC strategy. The rest of the paper is organized as follows. Section 2 reviews previous work on processor allocation for hypercube systems. The HPA strategy is presented in Section 3. Section 4 presents the theoretical analysis and simulation results that compare the proposed strategy with others. Finally, concluding remarks are given in Section 5.

pared to the multiple GC strategy. Moreover, the implementation of the extended version is quite complex. A subcube allocation scheme based on tree collapsing, called the TC strategy, has been proposed in [5]. This scheme extends the Buddy strategy: it successively collapses the binary tree representations of a hypercube so that nodes which form a subcube but are distant from each other are brought close together. This scheme can recognize all subcubes with much less complexity than the multiple GC strategy in generating the search space. However, it needs a large amount of storage for the binary tree structures as well as a long execution time to produce a collapsed tree. Therefore, the authors proposed a hardware circuit to speed up the tree collapsing. A subcube allocation scheme for hypercubes of large dimension, $n \ge 10$, called Best-Sequential-Fit-First (BSFF), has been proposed in [11]. In this scheme, the subcube information and computation are distributed among idle nodes, so the burden on the host is minimized. This scheme has a complete subcube recognition capability. A common feature of the above schemes is that they use a bottom-up approach, since a subcube is formed starting from the lowest dimension. In these approaches, allocation and deallocation of subcubes usually result in an excessively fragmented hypercube.

2. Previous Works

For the comparison of processor allocation schemes, the major criteria are optimality, subcube recognition capability, and the time complexity of subcube allocation. An allocation scheme is optimal if it can always allocate a k-cube when there are at least $2^k$ free nodes. An allocation scheme is called statically optimal if it is always able to allocate a k-cube when there are at least $2^k$ free nodes available, under the condition that none of the allocated cubes are released. On the other hand, it is called dynamically optimal if the scheme can accommodate processor allocation and deallocation at any time. Several subcube allocation schemes that have a complete subcube recognition capability and/or static optimality have been proposed in the last few years.

2.2 Top-down Approach

In order to minimize fragmentation, Dutt and Hayes [6] have proposed subcube allocation and coalescing strategies. This scheme introduced the concept of a maximal set of subcubes (MSS), which is useful in making allocations, and suggested a heuristic algorithm for efficiently coalescing a released cube with the existing free cubes. It has been proved in [6] that the problems of allocating subcubes and of forming an MSS are NP-hard in a dynamic environment. Simulations show that the coalescing strategy, which uses the heuristic algorithm, is two to three times slower than the Buddy strategy but has considerably higher performance, measured by the allocation ratio for subcube requests. Another top-down approach, called the free list strategy, which uses the k-map adjacency rule, has also been proposed in [8]. In this scheme, free lists are used to record all disjoint free subcubes, with one list per dimension. A requested subcube is allocated from the lists. Therefore, the allocation steps are simple, but the scheme involves a quite complicated deallocation procedure. During a subcube deallocation, the deallocated subcube is compared with all adjacent idle subcubes in order to form a subcube of higher dimension. Although this strategy assures full subcube recognition in a dynamic environment, its worst-case time complexity can be very high. Hence, an extension of the algorithm for parallel implementation is introduced, and the modified free list scheme is proposed to reduce the time complexity.

2.1 Bottom-up Approach

Chen and Shin [3] proposed two subcube allocation schemes, one using the Buddy system, called the Buddy strategy, and one using a Gray code (GC), called the GC strategy; these are the predominant schemes currently in use. Although they can be easily implemented because of their simplicity, the Buddy and GC strategies can recognize only $2^{n-k}$ and $2^{n-k+1}$ subcubes of dimension k, respectively, both in $O(N)$ time, where $N = 2^n$ and […] $\alpha > \beta$ if all the nodes included in $\beta$ are also included in $\alpha$ and the dimension of $\alpha$ is greater than that of $\beta$. If $\alpha$ and $\beta$ have the same nodes, then $\alpha$ is equal to $\beta$, denoted $\alpha = \beta$.

Step 1 Let k be the size of the subcube that can accommodate the input task.
Step 2 If there are nodes of size k, select the node n that has the minimum degree in $G_f$. Allocate the corresponding subcube and delete n from $G_f$. Stop.
Step 3 If not, find the nearest higher-dimension subcube Q that is available. If there are no higher-dimension subcubes in the graph $G_f$, then the request is denied and the task is kept in the

Definition 5 Let $Q_f$ be the set of free subcubes in hypercube $Q_n$. Then an undirected graph $G_f = (N, E)$ is the SC-graph for $Q_f$ if each node $n_1 \in N$ corresponds to an element of $Q_f$ and there is an edge $(n_1, n_2) \in E$ if $H(n_1, n_2) = 1$, where $n_1, n_2 \in N$.
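The construction in Definition 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: subcubes are written as strings over `0`, `1`, `*`; the distance `hamming` counts the positions where one address has `0` and the other has `1`, ignoring `*` (the usual convention, assumed here because the definition of H is not part of this excerpt); and the free subcubes are our reading of the legend of Fig. 3.2. The names `hamming` and `build_sc_graph` are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions where one address has 0 and the other has 1.

    Positions holding '*' in either address are ignored (assumed convention).
    """
    return sum(1 for x, y in zip(a, b) if x != "*" and y != "*" and x != y)


def build_sc_graph(free_subcubes):
    """Return adjacency sets: an edge joins two free subcubes at distance 1."""
    graph = {cube: set() for cube in free_subcubes}
    for a, b in combinations(free_subcubes, 2):
        if hamming(a, b) == 1:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Free subcubes as read from Fig. 3.2 (a): C1, C2, C3, C4.
free = ["1110", "11*1", "0***", "100*"]
graph = build_sc_graph(free)
degrees = {cube: len(neighbours) for cube, neighbours in graph.items()}
print(degrees)
```

With these inputs, C4 = `100*` has degree 2 and C2 = `11*1` has degree 3, which agrees with the choice of C4 over C2 in Example 1.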


None of these free subcubes can accept the request $I_j$. According to the above property, the total number of free nodes, $\sum_{i=1}^{m} 2^{|C_i|}$, is smaller than $2^{|I_j|}$. If we add all the

waiting queue.
Step 4 Decompose the subcube Q found in step 3 until the requested subcube size is reached. Delete the node which corresponds to Q from $G_f$, and insert the nodes which represent the remaining subcubes into $G_f$. Go to step 2.

Example 1 Fig. 3.2 (a) shows the subcubes currently available in the 4-cube system, and Fig. 3.2 (b) shows the corresponding SC-graph. Let us assume that there is a request for a subcube of size one. Then the HPA strategy allocates C4 {100*} because the degree of C4 in $G_f$ is less than that of C2 {11*1}. After this, let us assume that there is a 2-cube request. Then C3 is decomposed into two 2-cubes, C5 {01**} and C6 {00**}, and C6 is allocated because it has the minimum degree. After C4 and C6 are allocated, the available subcubes and the corresponding SC-graph are as shown in Fig. 3.3.
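The allocation steps and Example 1 can be traced with the following minimal Python sketch. It is not the authors' implementation: the string representation of subcubes, the helper names, the rule of splitting on the leftmost `*`, and the tie-breaking by list order are all assumptions made for illustration.

```python
def hamming(a, b):
    """Positions where one address has 0 and the other has 1 ('*' ignored)."""
    return sum(1 for x, y in zip(a, b) if "*" not in (x, y) and x != y)


def degree(cube, free):
    """Degree of `cube` in the SC-graph induced by the list `free`."""
    return sum(1 for other in free if other != cube and hamming(cube, other) == 1)


def dim(cube):
    return cube.count("*")


def split(cube):
    """Halve a subcube by fixing its leftmost '*' (assumed splitting rule)."""
    i = cube.index("*")
    return cube[:i] + "0" + cube[i + 1:], cube[:i] + "1" + cube[i + 1:]


def allocate(free, k):
    """Allocate a k-cube from `free` (mutated in place); return it, or None."""
    while True:
        candidates = [c for c in free if dim(c) == k]
        if candidates:  # Step 2: pick the minimum-degree node of size k
            chosen = min(candidates, key=lambda c: degree(c, free))
            free.remove(chosen)
            return chosen
        larger = [c for c in free if dim(c) > k]
        if not larger:  # Step 3: no higher-dimension subcube, deny the request
            return None
        nearest = min(larger, key=dim)
        free.remove(nearest)  # Step 4: decompose one level, then retry step 2
        free.extend(split(nearest))


free = ["1110", "11*1", "0***", "100*"]  # C1, C2, C3, C4 of Fig. 3.2
print(allocate(free, 1))  # 100*  (C4)
print(allocate(free, 2))  # 00**  (C6)
print(free)               # ['1110', '11*1', '01**']
```

Decomposing one level per loop iteration leaves the same set of free subcubes as decomposing all the way down in a single step; the loop form just keeps the sketch short.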

allocated and free nodes, we must have $\sum_{i=1}^{j-1} 2^{|I_i|} + \sum_{i=1}^{m} 2^{|C_i|} = 2^n$. Since $\sum_{i=1}^{m} 2^{|C_i|} < 2^{|I_j|}$, $\sum_{i=1}^{j} 2^{|I_i|} > 2^n$ can

strategy can always accommodate any input task sequence $I = \{I_1, I_2, \ldots, I_k\}$ if and only if $\sum_{i=1}^{k} 2^{|I_i|} \le 2^n$.

Thus, the HPA strategy is statically optimal. □

3.3 Deallocation Scheme in the HPA Strategy

When a task is completed, the corresponding subcube is deallocated for later use, while the dimension of the free subcubes available in the system is kept as large as possible. To achieve this goal, the free list strategy proposed in [8] exhaustively searches all the subcubes which can be formed with the elements in the free lists of subcubes. In the HPA strategy, however, only the part of the SC-graph related to the released subcube is tested, so the search time can be significantly reduced. When a subcube $Q_k$ of size k is released, the SC-graph changes according to one of the following three cases, depending on k and the current structure of the SC-graph.

Fig. 3.2 SC-graph before allocation

Case-1 Only one subcube of size k is added.
Case-2 More than one subcube of size k is formed, each of which includes part of the released subcube.
Case-3 Subcubes of sizes from k+1 to M+1 are formed, where M is the size of the highest-dimension subcube before the deallocation is processed. The new subcubes then include all or part of the released subcube.

be derived for $j \le k$. This contradicts the assumption on the size of the cubes, i.e., $\sum_{i=1}^{k} 2^{|I_i|} \le 2^n$. Therefore, the HPA

(a) C1 = {1110}, C2 = {11*1}, C3 = {0***}, C4 = {100*}

C1 = {1110}, C2 = {11*1}, C5 = {01**}

Case-1 and case-2 are easy to process, whereas case-3 is difficult and complicated because the free subcube size should be kept as large as possible. Hence, the complexity of the deallocation algorithm is determined by case-3.

Fig. 3.3 SC-graph after allocation

Lemma 1 Let $Q_n$ be an n-dimensional subcube that is decomposed into a set of subcubes $\{C_1, C_2, \ldots, C_m\}$, where $m \ge 3$. Then the SC-graph which represents $\{C_1, C_2, \ldots, C_m\}$ has a cycle.

Theorem 1 The HPA strategy is statically optimal.
Proof For static allocation, the HPA strategy can have at most one node for each dimension in the SC-graph. So all the nodes are distinct from each other in terms of subcube dimension. Let $I = \{I_1, I_2, \ldots, I_k\}$ be the input task sequence such that $\sum_{i=1}^{k} 2^{|I_i|} \le 2^n$, where

Proof Trivial. □
Lemma 2 Let $n_i$ and $n_j$ be nodes of the SC-graph corresponding to subcubes $C_i$ and $C_j$, respectively. Let there exist an edge between $n_i$ and $n_j$ in the SC-graph, i.e., $H(C_i, C_j) = 1$, and let subcubes $C_n$ and $C_m$ be greater than or equal to $C_i$ and $C_j$, respectively. Then there exists an edge between $N_n$ and $N_m$, which correspond to the subcubes $C_n$ and $C_m$, i.e., $H(C_n, C_m) = 1$.

$|I_i|$ is the dimension of the request $I_i$. Assume that there is a task $I_j$ ($1 \le j \le k$) that cannot be accommodated by the HPA strategy. From this, it is obvious that there is no subcube of dimension larger than or equal to $|I_j|$ among the remaining free cubes. Let the set of remaining free subcubes after the allocation of the requests up to $I_{j-1}$ be $C = \{C_1, C_2, \ldots, C_m\}$.


Proof Trivial. □
Theorem 2 Let the subcube $Q_n$ not be formable from the SC-graph $G_f$ before the subcube $Q_k$ is released, and let $Q_n$, for $n > k$, be constructed from a set of nodes $\{n_1, n_2, \ldots, n_m\}$ ($m \ge 3$) in the SC-graph $G_f$. Then the node $N_k$ which corresponds to $Q_k$ is a member of $\{n_1, n_2, \ldots, n_m\}$, and there is a cycle which consists of $n_1, n_2, \ldots, n_m$.

Proof We know that $N_k$ is a member of $\{n_1, n_2, \ldots, n_m\}$ because the subcube $Q_n$ cannot be formed from the SC-graph $G_f$ before the subcube $Q_k$ is released. Let $C_1, C_2, \ldots, C_m$ be the subcubes corresponding to $n_1, n_2, \ldots, n_m$, respectively, and let $Q_n$ be generated from $S_1, S_2, \ldots, S_m$, where $S_i \le C_i$, $1 \le i \le m$. Then, by Lemma 1, the SC-graph constructed from $S_1, S_2, \ldots, S_m$ has a cycle. By Lemma 2, the SC-graph constructed from $C_1, C_2, \ldots, C_m$ also has a cycle. □

When a subcube of size k is released, the deallocation algorithm in the HPA strategy first examines whether case-1 is satisfied. If not, then case-3 is processed. If the released subcube cannot form a subcube $Q_m$ ($m > k$) in this step, then case-2 is processed. The deallocation algorithm in the HPA strategy is as follows:

Processor Deallocation
(The terms related to the deallocation algorithm are defined as follows:
k : Size of the released subcube $Q_k$
$N_f$ : Number of free processors in hypercube system $Q_n$
$Q_f$ : Set of free subcubes available in $Q_n$
$G_f$ : SC-graph corresponding to $Q_f$
M : Size of the highest-dimension subcube in $Q_f$
$M_n$ : Integer p such that $2^p \le N_f + 2^k < 2^{p+1}$, i.e., $M_n$ is the size of the highest-dimension subcube which can be formed when $Q_k$ is added to $Q_f$
Let $M_i$ be initially empty.)

Step 1 Insert the node $N_k$ corresponding to $Q_k$ into $G_f$.
Step 2 If the degree of $N_k$ is zero or $2^k > N_f$, then stop.
Step 3 For each cycle which contains $N_k$ in $G_f$, do the following steps (a) and (b). The search space for enumerating cycles is restricted to the nodes $N_i$ such that the Hamming distance between $Q_k$ and the subcube corresponding to $N_i$ is less than or equal to $\min(M_n, \max(k, M+1))$, where $\min(M_n, \max(k, M+1))$ is the size of the highest-dimension subcube that can be formed with $Q_k$.
(a) If the subcubes $Q_{Ci}$ corresponding to the nodes which constitute a cycle form a subcube $Q_m$ of size m ($m > k$), then insert $Q_m$ into the set $M_i$.
(b) If m is equal to $\min(M_n, \max(k, M+1))$, then go to step 4.

Step 4 Let $Q_{max}$ be the maximum subcube in $M_i$. Insert a node corresponding to $Q_{max}$ into $G_f$, and form subcubes $Q_{Ri}$ from the nodes which constitute the subcubes $Q_{Ci}$ and are not included in $Q_{max}$. Insert the nodes corresponding to $Q_{Ri}$ into $G_f$ and delete the nodes corresponding to $Q_{Ci}$ from $G_f$.
Step 5 If step 3 does not form $Q_m$ ($m > k$), then find a complementary subcube $Q_k'$ of $Q_k$. If it exists, form a subcube $Q_{k+1}$ from $Q_k'$ and $Q_k$. Insert the node corresponding to $Q_{k+1}$ into $G_f$ and delete the nodes corresponding to $Q_k'$ and $Q_k$ from $G_f$.
Step 6 If step 5 does not form $Q_{k+1}$, then find two subcubes, $Q_{c1}$ and $Q_{c2}$, of size k-1 such that the Hamming distance and exact distance between $Q_k$ and them are one and two, respectively, and there is a bit position i such that the addresses of the three subcubes differ at position i. If they exist, then form two subcubes, $Q_{k1}$ and $Q_{k2}$, of size k. Insert the nodes corresponding to $Q_{k1}$ and $Q_{k2}$ into $G_f$ and delete the nodes corresponding to $Q_{c1}$, $Q_{c2}$, and $Q_k$ from $G_f$.

Step 2 examines case-1. Steps 3 and 4 generate higher-dimension subcubes using the property of Theorem 2. Step 5 makes a subcube of size k+1 with a complementary subcube, and step 6 processes case-2. Step 5 generates a new subcube which has a higher dimension than the released subcube even if the subcubes do not form a cycle in the SC-graph. Using the property of Theorem 2, the deallocation algorithm forms new subcubes which have a higher dimension than the relinquished subcube by detecting a cycle in the SC-graph. The cycle includes all or part of the nodes corresponding to the released subcube in the SC-graph, and the cycle which forms the largest-dimension subcube is selected. The purpose of this selection strategy is to minimize external fragmentation. An example of the deallocation algorithm is given below.

Example 2 Let the set of free subcubes and the corresponding SC-graph be as shown in Fig. 3.4, and let the released subcube be S. Then the node $N_s$ corresponding to the subcube S is created by step 1.
By step 3, two cycles, which consist of $\{N_s, C_2, C_3\}$ and $\{N_s, C_1, C_2, C_3\}$, are detected in the SC-graph. In step 4, the cycle which consists of C1, C2, C3, and $N_s$ is selected to form a maximum subcube. Therefore, the subcube C5 {0***} is formed from the cycle which consists of C1, C2, C3, and $N_s$. The node corresponding to this cycle is inserted into the SC-graph and all the nodes belonging to this cycle are removed. The nodes which are not included in C5 {0***} form a new subcube C6 {1110}. Hence, three subcubes, C4 {1000}, C5 {0***}, and C6 {1110}, are formed, and the SC-graph is constructed as in Fig. 3.5 (b). Fig. 3.6 illustrates the processing of step 6. One subcube of size 2 and two subcubes of size 1 make two subcubes of size 2. Let the current free subcubes be C1 {01*1} and C2 {10*0} and the released subcube be S {00**}. Using step 6, two 2-cubes, C3 and C4, can be generated.
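Steps 5 and 6 can be illustrated with a short Python sketch. It rests on our own reading of the steps rather than on the paper's code: subcubes are strings over `0`, `1`, `*`; step 5 is taken to merge two subcubes whose addresses differ in exactly one fixed bit; step 6 is taken to split the released subcube along one of its `*` positions and merge each half with a free complementary subcube. The function names are hypothetical, and `nodes` exists only to check that no processor is lost or duplicated.

```python
from itertools import product


def nodes(cube):
    """Expand a subcube address into the set of processor addresses it covers."""
    covered = set()
    for bits in product("01", repeat=cube.count("*")):
        it = iter(bits)
        covered.add("".join(next(it) if ch == "*" else ch for ch in cube))
    return covered


def complement_merge(a, b):
    """Step 5 (as read here): merge subcubes differing in one fixed bit, else None."""
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "*" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "*" + a[i + 1:]
    return None


def step6(released, free):
    """Step 6 (as read here): split `released` on one '*' and merge both halves.

    Returns (new_subcubes, consumed_free_subcubes), or None if no split works.
    """
    for i, ch in enumerate(released):
        if ch != "*":
            continue
        halves = [released[:i] + bit + released[i + 1:] for bit in "01"]
        merged, used = [], []
        for half in halves:
            for cube in free:
                if cube in used:
                    continue
                result = complement_merge(half, cube)
                if result is not None:
                    merged.append(result)
                    used.append(cube)
                    break
        if len(merged) == 2:
            return merged, used
    return None


print(complement_merge("010*", "011*"))    # 01**
print(step6("00**", ["01*1", "10*0"]))     # (['*0*0', '0**1'], ['10*0', '01*1'])
```

For the subcubes of Fig. 3.6, the sketch returns `*0*0` and `0**1`, and the union of processors covered before and after the regrouping is the same.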


2) Suppose that a higher-dimension subcube $Q_k$ can be generated from a set of subcubes $\{C_1, \ldots, C_n\}$ ($n \ge 3$). Then, by Theorem 2, $Q_m \in \{C_1, C_2, \ldots, C_n\}$ and the SC-graph of $\{C_1, C_2, \ldots, C_n\}$ has a cycle. Step 3 in the deallocation algorithm of the HPA strategy detects all possible cycles including the node corresponding to $Q_m$. Thus, a cycle consisting of $n_1, n_2, \ldots, n_n$ is also detected and the higher-dimension subcube $Q_k$ is actually constructed. From 1) and 2), the HPA strategy generates a recognizable k-cube if it exists. □

Fig. 3.4 The processing of steps 3 and 4 (before deallocation). Free subcubes: C1 = {00**}, C2 = {0111}, C3 = {010*}, C4 = {1000}; released subcube: S = {*110}.

4. Performance Comparison

The time complexity of the single GC strategy is $O(2^n)$ [8]. In the multiple GC strategy, $C(n, \lfloor n/2 \rfloor)$ different GCs must be generated to achieve complete subcube recognition. Hence the multiple GC scheme has time complexity $O(C(n, \lfloor n/2 \rfloor) 2^n)$. There are two allocation schemes using the top-down approach, namely, the MSS strategy and the free list strategy. In the former scheme, the authors proposed two deallocation algorithms: one is a heuristic algorithm without perfect subcube recognition ability and the other is an optimal algorithm. The time complexities of these algorithms are $O(n 2^n)$ and $O(2^{3^n})$, respectively. In the latter scheme, the deallocation complexity claimed in [8] is $O(n^3)$, derived by assuming the list length to be $O(n)$. Theoretically, however, the list length can grow as large as $O(2^{n-k})$ when only one or two lists are nonempty, say the list for dimension 0 or 1. Therefore, this gives rise to a time complexity of at least O([…]), where k is a subcube dimension. From this point of view, the exact time complexity is fairly difficult to estimate. We now analyze the theoretical time complexity of the HPA strategy. The time complexity is related to the number of cycles in the SC-graph. Theoretically, the maximum number of nodes in the SC-graph is equal to the number of disjoint subcubes when all the free disjoint cubes are of dimension 0. For processor allocation, to reduce external fragmentation, one needs to search for the node which has the minimum degree in the SC-graph. Hence, the time complexity of allocation in the HPA strategy is $O(2^n)$. If we assume that the number of subcubes is $O(n)$, the allocation complexity of the HPA strategy is $O(n)$. For processor deallocation, step 3 of the deallocation algorithm takes the most time, since all cycles in the SC-graph must be searched.
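To give a feel for the gap between the single and multiple GC bounds quoted above, the following Python snippet tabulates $2^n$ against $C(n, \lfloor n/2 \rfloor) \cdot 2^n$ for a few dimensions. It is plain arithmetic on the stated bounds with constant factors ignored; the function names are hypothetical and the numbers are not measurements.

```python
from math import comb


def gray_codes_needed(n: int) -> int:
    """Number of distinct Gray codes, C(n, floor(n/2)), quoted for complete recognition."""
    return comb(n, n // 2)


def multiple_gc_search_cost(n: int) -> int:
    """Bound C(n, floor(n/2)) * 2^n for the multiple GC strategy, constants ignored."""
    return gray_codes_needed(n) * 2 ** n


# Columns: n, single GC bound 2^n, number of GCs, multiple GC bound.
for n in (4, 6, 8, 10):
    print(n, 2 ** n, gray_codes_needed(n), multiple_gc_search_cost(n))
```

For a 4-cube this gives 6 Gray codes and a bound of 96 against 16 for a single GC; the ratio grows quickly with n, which is the motivation for avoiding exhaustive multiple GC search.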
Mateti and Deo [10] have shown that the upper bound of the time complexity for enumerating all the cycles in a graph is $O((v+e)c)$, where e is the number of edges, v the number of nodes, and c the number of cycles in the graph. In an SC-graph, the maximum number of nodes is equal to the number of members in the maximal independent set corresponding to the hypercube. Therefore, the maximum number of nodes in the SC-graph is less than or equal to $2^{k-1}$, where k is the dimension of the subcube. Note that the maximum number of edges for n nodes can grow as large as $n(n-1)/2$ when all the nodes are connected, because the complete graph has the maximum number of edges among undirected graphs. The number

Fig. 3.5 The processing of steps 3 and 4 (after deallocation). Free subcubes: C4 = {1000}, C5 = {0***}, C6 = {1110}.

4.1 The Worst Case Analysis

Fig. 3.6 The processing of step 6 in deallocation. Before: C1 = {01*1}, C2 = {10*0}; after: C3 = {*0*0}, C4 = {0**1}.

Theorem 3 The HPA strategy can recognize any possible k-cube if it exists.

Proof Without loss of generality, assume that the subcube $Q_k$ cannot be formed from the SC-graph $G_f$ before the subcube $Q_m$ (m