Proceedings of the 28th Annual Hawaii International Conference on System Sciences - 1995
Guiding Processor Allocation with Estimated Execution Time for Mesh Connected Multiple Processor Systems

Yung-Kang Chu
Dept. of Electrical Engineering, Michigan State University, East Lansing, MI 48824

I-Ling Yen
Dept. of Computer Science, Michigan State University, East Lansing, MI 48824

Diane T. Rover
Dept. of Electrical Engineering, Michigan State University, East Lansing, MI 48824

Abstract

Mesh connected parallel architectures have become increasingly popular in the design of multiprocessor systems in recent years. Several submesh allocation strategies for two-dimensional mesh systems have been proposed. In this paper, we investigate the effect of using estimated execution times to guide submesh allocation. We have proposed a family of processor allocation strategies, called Estimated Execution Time (EET) strategies, based on the estimated execution time information. Extensive simulations have been performed to study the performance of our strategies compared with other strategies. The results show that our strategies outperform existing strategies in terms of mean and standard deviation of response time under all load conditions and different job characteristics. Inaccurate estimation of such execution time will simply cause less overall system performance improvement, but not cause any execution failures.

1 Introduction

Mesh architectures have become increasingly popular in the design of multiprocessor systems as wormhole routing technology advances. Many two-dimensional (2D) mesh systems have been built or are under development. Intel Paragon [6], Touchstone Delta System [7], and Tera Computer System [1] are several examples. A partitionable 2D mesh-connected MIMD system can be divided into submeshes of different sizes and run different jobs. To fully exploit such a partitionable system and maximize overall system throughput, we need to consider the problem of how to effectively allocate processors to jobs. Jobs are submitted to the system dynamically, with a certain arrival rate, and are placed in a global job queue. Each job requests a submesh of a certain width and height. The processor allocator tries to find a free submesh for the job using a processor allocation strategy.

Several processor allocation strategies have been proposed. Li and Cheng proposed the Two-Dimensional Buddy System (2DBS) [5]. However, it is applicable only when the system is a square mesh with all sides being exactly powers of 2. Additionally, it will cause large internal fragmentation if the requested submesh does not also satisfy this constraint. To avoid these two drawbacks, Chuang and Tzeng proposed the Frame Sliding (FS) strategy [2]. It uses a frame sliding over the mesh plane to search for a free submesh. It is applicable to any rectangular mesh system with arbitrary width and height and it eliminates the internal fragmentation. However, this strategy sometimes misses an available free submesh and causes unnecessary external fragmentation. The First Fit (FF) and Best Fit (BF) strategies were proposed by Zhu [9] to improve the Frame Sliding strategy. Both strategies will recognize a free submesh for an incoming request if there is one available. Ding and Bhuyan proposed a first-fit strategy called Adaptive Scan (AS) [4], which allows the rotating of the submesh request by 90 degrees (i.e., interchanging the width and height) and searches the mesh system one more time if a free submesh is not found in the first pass. The Busy List (BL) strategy proposed by Das Sharma and Pradhan [3] is a best-fit strategy which tries to allocate a job to the submesh with the largest number of busy neighbors. It gives the best performance among all the existing strategies.
We have performed simulation runs to study the performance of existing strategies and have observed that the different completion times of the jobs in the system can result in busy processors being arbitrarily scattered around, thus causing large external fragmentation. Based on our experience, many of the jobs running on parallel systems are programs submitted repeatedly. The approximate execution time of these programs can be quite accurately estimated by the user. By providing an estimated execution time to the processor allocator, the system can allocate jobs with similar finishing times in the same areas. In this paper, we investigate the effect of using estimated execution times to guide submesh allocation. We have developed two processor allocation strategies, Estimated Execution Time-Search Boundaries (EET-SB) and Estimated Execution Time-Search Corners (EET-SC), based on the estimated execution time information. Extensive simulations have been performed to compare our strategies with two existing best-fit strategies, namely, the Best Fit (BF) strategy and the Busy List (BL) strategy, under different load conditions and job characteristics. The results show that EET strategies outperform the existing strategies, reducing the mean and standard deviation of response time of existing strategies. Inaccurate estimation of execution time will simply cause less overall system performance improvement, but not cause any execution failures.

The rest of the paper is organized as follows. In the next section, we review several processor allocation strategies. Then we propose a new family of allocation strategies, EET, in Section 3. In Section 4, we describe our simulation model and present simulation results. Finally, we conclude the paper in Section 5.
2 Previous Allocation Strategies

A two-dimensional (rectangular) mesh, denoted by M(W, H), consists of W x H nodes arranged in a W x H two-dimensional grid. Each node represents a processor. The node in column i and row j can be represented by address <i, j>. A two-dimensional submesh in M(W, H), denoted by S(w, h), consists of w x h nodes and is a subgrid of M(W, H) such that $1 \le w \le W$ and $1 \le h \le H$. The address of a submesh is denoted by a quadruple <x1, y1, x2, y2>, where <x1, y1> indicates the lower left corner and <x2, y2> indicates the upper right corner of the submesh. The base of a submesh is the node at the lower left corner of the submesh. Figure 1 shows a rectangular mesh M(5,4). Nodes <2,1>, <3,1>, <4,1>, <2,2>, <3,2>, <4,2> form a submesh S(3,2) with address <2,1,4,2>. The base of S(3,2) is <2,1>. A node is busy if it is allocated to a job. A node is free if it is not busy. A free submesh is a submesh whose nodes are all free. A busy (allocated) submesh is a submesh whose nodes are all allocated to

is the base of the submesh and $2^k$ is its size. If $k > 0$, then $B(x, y, k-1)$, $B(x + 2^{k-1}, y, k-1)$, $B(x, y + 2^{k-1}, k-1)$, and $B(x + 2^{k-1}, y + 2^{k-1}, k-1)$ are also submeshes, which are buddies of each other. An incoming request for S(w, h) is assigned a square submesh of size $2^{\lceil \log_2(\max(w, h)) \rceil}$. This strategy tends to give large internal fragmentation and, hence, a significant amount of processor resources could be wasted.
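The buddy-system sizing rule and the internal fragmentation it causes can be illustrated with a short sketch. The function names `buddy_side` and `internal_fragmentation` are hypothetical and chosen only for this illustration; the sketch assumes the assigned square has side $2^{\lceil \log_2(\max(w, h)) \rceil}$ and counts the wasted processors as the difference between the assigned and the requested area.

```python
def buddy_side(w: int, h: int) -> int:
    """Side of the square submesh assigned to a request S(w, h).

    Computes 2**ceil(log2(max(w, h))) with integer arithmetic only,
    so no floating-point rounding is involved.
    """
    m = max(w, h)
    return 1 << (m - 1).bit_length()


def internal_fragmentation(w: int, h: int) -> int:
    """Processors allocated but not requested (hypothetical helper)."""
    side = buddy_side(w, h)
    return side * side - w * h


if __name__ == "__main__":
    for w, h in [(3, 2), (5, 4), (8, 8)]:
        print((w, h), buddy_side(w, h), internal_fragmentation(w, h))
```

For the submesh S(3,2) used as the running example, the sketch assigns a 4 x 4 square and leaves 10 of its 16 processors unused.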
2.2 Frame Sliding Strategy

The Frame Sliding (FS) strategy [2] first introduced the idea of busy set, coverage set and reject set. The busy set is the collection of all busy submeshes. The coverage set is the union of the coverage submeshes of all busy submeshes. The coverage submesh of a busy submesh with respect to an incoming requested submesh is a submesh whose nodes cannot serve as the base for the requested submesh. For an allocated submesh <x1, y1, x2, y2>, the coverage submesh for an incoming job J(w, h) is <max(0, x1 - w + 1), max(0, y1 - h + 1), x2, y2>, where max(i, j) returns the larger value of i and j. The reject set with respect to an incoming job J(w, h) for a mesh system M(W, H) consists of two reject submeshes <W - w + 1, 0, W - 1, H - 1> and <0, H - h + 1, W - 1, H - 1>. Nodes in the reject set can never be the base of any free submesh for the incoming job.

For a job requesting a w x h submesh, the FS strategy first generates the coverage set and reject set according to w, h, and the busy set. Starting from the lowest leftmost free processor, it compares the base node of every candidate frame (submesh) in sequence against the coverage set and reject set. If this node belongs to any element of the two sets, the frame will slide w nodes to the right. If it exceeds the right boundary, the frame will go up h nodes and then slide from right to left. The slide goes on until a free submesh is found or the entire mesh is searched. The FS strategy can be applied to rectangular meshes with arbitrary width and height. It allocates a job to a free submesh with the same size as that of the job, thus eliminating the internal fragmentation. However, due to the use of strides, this strategy may miss available free submeshes.
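The coverage and reject sets can be sketched in a few lines. This is only an illustration of the two formulas above, not the FS implementation from [2]: the names `coverage_submesh`, `reject_set`, and `can_be_base` are hypothetical, coordinates are zero-based as in the formulas, and the frame-sliding search itself is omitted.

```python
from typing import List, Tuple

Submesh = Tuple[int, int, int, int]  # address <x1, y1, x2, y2>


def coverage_submesh(busy: Submesh, w: int, h: int) -> Submesh:
    """Nodes that cannot be the base of J(w, h) because of one busy submesh."""
    x1, y1, x2, y2 = busy
    return (max(0, x1 - w + 1), max(0, y1 - h + 1), x2, y2)


def reject_set(W: int, H: int, w: int, h: int) -> List[Submesh]:
    """Nodes too close to the right or top boundary to be a base of J(w, h)."""
    return [(W - w + 1, 0, W - 1, H - 1), (0, H - h + 1, W - 1, H - 1)]


def contains(s: Submesh, x: int, y: int) -> bool:
    x1, y1, x2, y2 = s
    return x1 <= x <= x2 and y1 <= y <= y2


def can_be_base(x: int, y: int, busy_set: List[Submesh],
                W: int, H: int, w: int, h: int) -> bool:
    """True if <x, y> lies in neither the coverage set nor the reject set."""
    blocked = [coverage_submesh(b, w, h) for b in busy_set]
    blocked += reject_set(W, H, w, h)
    return not any(contains(s, x, y) for s in blocked)


if __name__ == "__main__":
    busy = [(2, 1, 4, 2)]                      # a busy S(3,2) in M(5,4)
    print(coverage_submesh(busy[0], 2, 2))     # coverage for a 2 x 2 request
    print(can_be_base(0, 0, busy, 5, 4, 2, 2))
```

When w or h equals 1, the corresponding reject submesh has its lower coordinate above its upper coordinate, so `contains` treats it as empty, which matches the fact that no node is rejected in that direction.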
2.3 First Fit Strategy

The First Fit (FF) strategy proposed by Zhu [9] is a modification of the FS strategy. Instead of using a busy set, a busy array B[W, H] is constructed according to each processor's status, where B[i, j] is 1 (0) if processor <i, j> is busy (free). Instead of computing coverage submeshes individually, a coverage array C[W-w+1, H-h+1] can be computed from the busy array, where C[i, j] is 0 if node <i, j> can be a base for incoming job J(w, h). The reject set is no longer needed because the dimension of the coverage array is reduced by w - 1 and h - 1 in the x and y directions, respectively. To fill the coverage array, the FF strategy first scans the rows in the busy array from right to left to fill the coverages to the left of busy submeshes. Then it scans all columns of the coverage array from top to bottom, starting at the leftmost column, to fill the coverages below the busy submeshes. The first zero found in the coverage array is used as the base for the requested submesh. This approach eliminates the allocation miss encountered in the FS strategy. The FF strategy allocates a job to the first available submesh. Thus, a large contiguous area may be divided and allocated for a small submesh request, resulting in potential external fragmentation.

2.4 Best Fit Strategy

In [9], Zhu also proposed a Best Fit (BF) allocation strategy. Instead of allocating the first available free submesh, the BF strategy chooses among the available submeshes for allocation based on a certain heuristic (to be specified later) which tries to allocate jobs at corners to reduce the external fragmentation. A corner is an element in the coverage array that serves as the end of a consecutive sequence of 0's in both the row and column where it is located. The heuristic is to choose the corner in the coverage array with the largest number of busy neighbors (1's) surrounding it. For a submesh request, the coverage array is computed following the procedures described for the FF strategy. The coverage array is then scanned as follows to identify corners: (1) Scan each row in the coverage array in any order and mark the two end elements of each sequence of consecutive 0's as left and right, respectively; and (2) Scan each column in the coverage array in any order, and for each consecutive sequence of 0's in a column, mark the two end elements as bottom and top, respectively. If any top or bottom is also marked by left or right, it is a corner. The number of busy neighbors (1's) of each corner is then computed. A corner located in one of the four corners of the coverage array is considered as having two additional 1's surrounding it, while a corner located on the boundaries of the coverage array, e.g., C[0, 1], is considered as having one additional 1. The corner with the largest number of busy neighbors is chosen for allocation. If there are ties, the corner with the smallest area is selected. The area of a corner is the product of the numbers of consecutive 0's in its row and column starting from the corner.

Although the BF strategy is designed to reduce the fragmentation and subsequently to improve the system performance, it does not exhibit a better performance compared with the FF strategy in most cases. Zhu indicated in [9] that the BF strategy may result in jobs scattered in the mesh, while the FF strategy tends to allocate jobs toward the upper left corner. Consequently, the mesh becomes more fragmented when using the BF strategy.
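The coverage array and the corner scan can be made concrete with a small sketch. It is a simplification under stated assumptions: the coverage array is filled by a brute-force check of every base rather than by the two-pass row and column scan of [9], the order in which `first_fit_base` looks for the first zero (row by row from the bottom, left to right) is an assumption, and the busy-neighbor count and area tie-break of the BF heuristic are omitted. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # indexed [x][y]; 1 = busy or covered, 0 = free


def coverage_array(B: Grid, w: int, h: int) -> Grid:
    """C[i][j] is 0 exactly when <i, j> can be the base of a free w x h submesh."""
    W, H = len(B), len(B[0])
    return [
        [
            0 if all(B[x][y] == 0
                     for x in range(i, i + w)
                     for y in range(j, j + h)) else 1
            for j in range(H - h + 1)
        ]
        for i in range(W - w + 1)
    ]


def first_fit_base(C: Grid) -> Optional[Tuple[int, int]]:
    """First zero in the coverage array (scan order is an assumption)."""
    if not C or not C[0]:
        return None
    for j in range(len(C[0])):
        for i in range(len(C)):
            if C[i][j] == 0:
                return (i, j)
    return None


def _is_run_end(line: List[int], k: int) -> bool:
    """True if line[k] is 0 and sits at either end of a run of 0's."""
    if line[k] != 0:
        return False
    return (k == 0 or line[k - 1] == 1
            or k == len(line) - 1 or line[k + 1] == 1)


def corners(C: Grid) -> List[Tuple[int, int]]:
    """Elements that end a run of 0's in both their row and their column."""
    if not C or not C[0]:
        return []
    found = []
    for i in range(len(C)):
        for j in range(len(C[0])):
            row = [C[x][j] for x in range(len(C))]
            if _is_run_end(row, i) and _is_run_end(C[i], j):
                found.append((i, j))
    return found
```

With the busy submesh <2,1,4,2> of M(5,4) and a 2 x 2 request, only the leftmost column of the coverage array is free, and its two end elements are the corners.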
2.5 Adaptive Scan Strategy

The Adaptive Scan (AS) strategy [4] is a modified version of the FS strategy discussed above. It is the first strategy that allows the rotating of the submesh request by 90 degrees if a free submesh is not found in the first pass. After constructing the coverage set for the particular submesh request, it scans the entire mesh starting from the lowest leftmost node <0, 0>, from left to right, bottom to top, to node <W-w+1, H-h+1>, with adaptive step sizes to skip impossible nodes. If the current node belongs to a coverage submesh in the coverage set, let $x_{max}$ be the maximal x index of this coverage submesh. The scan will jump from the current node to the $x_{max} + 1$ position in the same row until the x index exceeds W - w + 1 or a free submesh is found. If a free base is not found, it will scan the next row from left to right until the y index exceeds H - h + 1 or a free submesh is found. If it cannot find a free base after scanning the whole mesh, it will rotate the requested submesh from S(w, h) to S(h, w) and scan the mesh one more time. The AS strategy will always find a free submesh if there is one available, so it has complete submesh recognition capability, and it gives the best performance among all first-fit allocation strategies assuming an FCFS scheduling policy is used.

2.6 Busy List Strategy

The Busy List (BL) strategy [3] is a best-fit strategy with complete submesh recognition capability. In this algorithm, a list of all the allocated submeshes, the busy list (busy set), is maintained. For an incoming job J(w, h), the algorithm generates initial candidate submeshes for both J(w, h) and J(h, w) along the four corners of each allocated submesh within the mesh boundaries. The initial candidate submeshes generated from each allocated submesh will then be considered one by one to check for overlap with any of the other allocated submeshes. An initial candidate submesh will slide along the boundary of the allocated submesh which generated it until a free submesh is found only if it overlaps with another allocated submesh. At the same time, it increases its boundary value (to be defined later) by the length of the common boundary if it is adjacent to some allocated submesh. It also generates candidate submeshes which are at the four corners of the mesh system. However, they will be discarded in case of an overlap with some allocated submesh. The best-fit submesh is the one with the highest boundary value. The boundary value of a free submesh is the number of busy neighbors (including mesh boundaries) it has. The consequence of the BL strategy is that jobs tend to gather either at the four corners of the mesh system or adjacent to allocated jobs. If an initial candidate submesh does not overlap with any allocated submesh, it will not slide. Because the BL allocation strategy does not consider all free neighboring nodes of all the allocated submeshes, it may not always be able to find the free submesh with the highest boundary value.

3 Estimated Execution Time Strategy

We have performed simulation runs to study the performance of existing strategies and found that the different completion times of the jobs in the system can result in busy processors being scattered throughout the mesh, thus causing large external fragmentation. Actually, most jobs running on parallel systems are programs submitted repeatedly. Users usually can quite accurately estimate the execution time of their jobs. Having an estimated execution time for the job submitted to the processor allocator, the system can try to allocate jobs with similar completion times in the same area. We propose a family of processor allocation strategies, called Estimated Execution Time (EET), which utilize the estimated execution time information.

3.1 Basic Idea and Heuristics
The basic idea of our EET strategies is to allocate a job in close proximity to jobs with similar estimated completion times. The estimated completion time (ECT) of a job in the system is the time it was allocated plus its estimated execution time. The estimated completion time of the job which is currently being allocated is the current time plus its estimated execution time. The estimated completion time difference (ECTD) is calculated for each boundary node of a candidate submesh and its neighbor. If a neighbor is free, the ECTD is the estimated execution time of the job. If a neighboring node is busy, the ECTD is defined as the absolute value of the estimated completion time difference of the two neighboring jobs. If the neighboring node is beyond the boundary, the ECTD is zero. Because of inaccurate estimation, it is possible to encounter a situation in which the ECT of a neighboring job is less than the current time. If this happens, current time is used to represent the ECT of the neighboring job and the ECTD is merely the estimated execution time of the job currently being allocated. The total estimated completion time difference (TECTD) of a candidate submesh is the sum of the estimated completion time differences of all its neighboring nodes.

Consider the 8 x 8 mesh in Figure 2. Two jobs, X and Y, are currently in the system with estimated completion times x and y, respectively. Submeshes A and B are two candidate submeshes for a job Z requesting a 3 x 4 submesh with estimated completion time z, where z is the current time t plus the estimated execution time z' of Z. If both x and y are greater than t, candidate submesh A has a total estimated completion time difference of $3 \times |z - x| + 2 \times |z - y| + 2 \times z'$. Candidate submesh B has a total estimated completion time difference of $3 \times |z - x| + 4 \times z'$. If t is greater than x and t is less than y, A has a total estimated completion time difference of $3 \times z' + 2 \times |z - y| + 2 \times z'$ and B has a total estimated completion time difference of $3 \times z' + 4 \times z'$. If both x and y are less than t, A has a total estimated completion time difference of $3 \times z' + 2 \times z' + 2 \times z'$ and B has a total estimated completion time difference of $3 \times z' + 4 \times z'$. The objective is to choose the candidate submesh with the smallest total estimated completion time difference.

Figure 2: A two-dimensional mesh M(8,8). (Figure labels: Job X, Job Y, Candidate submesh A, Candidate submesh B.)

3.2 Detailed Algorithms

We first propose the Estimated Execution Time-Search Boundaries (EET-SB) strategy. It will search candidate submeshes which are adjacent to the boundaries of the mesh system or to allocated submeshes. The searching procedure is similar to the BL strategy, but the EET-SB strategy will consider all possible candidate submeshes along the boundaries. After each successful submesh allocation, the system must record the estimated completion time of that job and update the busy list for future allocation consideration. After each job completes and releases its processors, the busy list again needs to be updated. The detailed algorithm is presented in Figure 3.

The Best Fit allocation strategy proposed by Zhu is designed to find corners in the coverage array. A major reason why the BF strategy performs worse than the BL strategy is that the BF strategy does not allow the rotating of the requested submesh by 90 degrees. We propose the Estimated Execution Time-Search Corners (EET-SC) processor allocation strategy in Figure 4. The corner-searching procedure is similar to that in the BF strategy, but we allow the rotating of the requested submesh. Simulation results and performance comparisons are given in the next section.
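The TECTD heuristic of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mesh is represented by a hypothetical grid `ect_grid[x][y]` that holds `None` for a free node and the owning job's estimated completion time otherwise, each pair of a boundary node and its outside neighbor contributes one ECTD term, and the job layout in the usage example is invented to reproduce the neighbor counts of candidate A (three neighbors in X, two in Y, two free), not the exact layout of Figure 2.

```python
from typing import List, Optional, Tuple

Submesh = Tuple[int, int, int, int]          # address <x1, y1, x2, y2>
EctGrid = List[List[Optional[float]]]        # [x][y]; None means free


def make_grid(W: int, H: int, jobs: List[Tuple[Submesh, float]]) -> EctGrid:
    """Build the grid from (submesh, estimated completion time) pairs."""
    grid: EctGrid = [[None] * H for _ in range(W)]
    for (x1, y1, x2, y2), ect in jobs:
        for x in range(x1, x2 + 1):
            for y in range(y1, y2 + 1):
                grid[x][y] = ect
    return grid


def tectd(cand: Submesh, grid: EctGrid, now: float, eet: float) -> float:
    """Total estimated completion time difference of a candidate submesh."""
    W, H = len(grid), len(grid[0])
    x1, y1, x2, y2 = cand
    z = now + eet                             # ECT of the incoming job
    neighbors = [(x, y1 - 1) for x in range(x1, x2 + 1)]
    neighbors += [(x, y2 + 1) for x in range(x1, x2 + 1)]
    neighbors += [(x1 - 1, y) for y in range(y1, y2 + 1)]
    neighbors += [(x2 + 1, y) for y in range(y1, y2 + 1)]
    total = 0.0
    for nx, ny in neighbors:
        if not (0 <= nx < W and 0 <= ny < H):
            continue                          # beyond the mesh: ECTD is zero
        ect = grid[nx][ny]
        if ect is None:
            total += eet                      # free neighbor
        else:
            total += abs(z - max(ect, now))   # overdue ECT is clamped to now
    return total


def best_candidate(cands: List[Submesh], grid: EctGrid,
                   now: float, eet: float) -> Submesh:
    """Candidate with the smallest TECTD (candidates assumed to be free)."""
    return min(cands, key=lambda c: tectd(c, grid, now, eet))


if __name__ == "__main__":
    # Hypothetical layout: X above the candidate, Y to its right.
    grid = make_grid(8, 8, [((0, 4, 2, 5), 18.0), ((3, 0, 4, 1), 12.0)])
    print(tectd((0, 0, 2, 3), grid, now=10.0, eet=5.0))
```

With t = 10, z' = 5, x = 18, and y = 12, the candidate in the usage example evaluates to 3 x 3 + 2 x 3 + 2 x 5 = 25, matching the first case of the worked example.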
Allocate J(w, h)
1. if busy-list = null, then allocate J(w, h) to <0, 0, w - 1, h - 1>; stop;
   else initialize min-TECTD = ∞ and TECTD = 0;
2. generate candidate submeshes for J(w, h) and J(h, w) on the four corners of the mesh system and put them in the candidate-list;
3. for each job B in the busy-list do
      generate candidate submeshes for J(w, h) and J(h, w) that lie within the boundaries of the mesh system along the four boundaries of B and put them in candidate-list;
   end for;
4. get the first submesh S in the candidate-list;
5. for each job B in the busy-list do
      if S overlaps with B, then remove S from the candidate-list and goto step 8.
      else if S is adjacent to B, then multiply their ECTD by the length of their common boundary, add it to TECTD, and mark those neighboring nodes of S as busy;
   end for;
6. multiply the number of free neighboring nodes of S by the EET of J(w, h), and add it to TECTD;
7. if TECTD < min-TECTD, then choice-submesh = S; min-TECTD = TECTD;
8. if S is not the last submesh in the candidate-list, then get the next submesh in candidate-list, let it be S, and goto step 5;
9. if the candidate-list is empty, then put J(w, h) in the waiting queue and stop;
   else allocate J(w, h) or J(h, w) to choice-submesh and stop;

Figure 3: The EET-SB allocation strategy.

Allocate J(w, h)
1. compute the coverage array for J(w, h);
2. scan all the rows in the coverage array and mark two end elements of each sequence of consecutive 0's as end;
3. scan all the columns in the coverage array; if any element is an end element of a consecutive sequence of 0's in a column, and it is previously marked as end, then add it to corner-list;
4. compute the TECTD for the candidate submesh based at each corner;
5. compute the coverage array for J(h, w);
6. follow steps 2, 3, and 4 to identify corners and their corresponding TECTD;
7. if the corner-list is empty, then put J(w, h) in the waiting queue and stop;
   else choose the candidate submesh with the smallest TECTD and stop;

Figure 4: The EET-SC allocation strategy.

4 Simulation Results and Analysis

We developed simulations to compare the performance of our EET strategies with the BL and BF strategies. In order to achieve the peak performance of the BL strategy, we implement it in such a way that it will consider all free neighboring nodes and always find the free submesh with the highest boundary value. In order to have a fair comparison, the BF strategy is implemented in such a way that it can rotate the requested submesh as in the EET-SC strategy.

4.1 Simulation Model

The simulator is implemented in C/CSIM [8]. Various mesh sizes ranging from 16 x 16 to 64 x 64 are considered for the simulation. However, the simulation results for mesh systems of various sizes follow a similar trend; thus, we only report the performance results for a 32 x 32 mesh system in this paper. In our simulation, job execution time and job inter-arrival time are each assumed to have the exponential distribution with the mean of EXTM (mean job execution time) and IATM (mean job inter-arrival time), respectively. For our EET strategies, we generate an estimated execution time for each job according to the exact job execution time y of the job by a normal distribution with mean y and standard deviation c x y, where c is a coefficient indicating the accuracy of the estimation. In our experiments, c ranges from 0 (perfect estimation) to 1. If the estimated execution time generated is negative, it will be set to zero. The width w and height h of the requested submesh are generated independently. We considered three distributions for w and h, namely, uniform, decreasing, and increasing distributions [2], to cover different characteristics of jobs. In the uniform distribution, w and h are equally likely to have a value of 1 through N (N = 32). In the decreasing distribution, the range of 1 through N is divided into several intervals. The distribution of w and h within each interval is still uniform. However, in a decreasing distribution, the probability that w or h falls into one of the intervals decreases as the values in the interval increase. Essentially, a decreasing distribution on w and h represents a system with more jobs requesting small submeshes. Similarly, in an increasing distribution, the probability that w or h falls into one of the intervals increases as the values in the interval increase. If w and h follow the increasing distribution, the system will have a greater chance to
receive jobs requesting large submeshes. The results reported in this paper are collected from 20 independent simulation runs. Each run completes 50,000 allocations/deallocations, and we record the response time of all jobs. The error is less than 5% with a 90% confidence level. As in [5, 2, 9, 4], the overheads of the allocation algorithms are assumed to be negligible. In order to compare the performance of various strategies under different load, we define the load as follows:

$$\text{load} = \frac{n \times \text{EXTM}}{N \times \text{IATM}}$$

where n is the mean requested submesh size (number of processors) of jobs, and N is the total number of processors in the 2D mesh system. In our simulation, we fix the EXTM to be 10.0 time units and adjust the IATM according to the desired load. The system will become unstable with too small an IATM value. We measure the system performance based on the mean and standard deviation of response time. To have an accurate measurement, only stable systems are considered.

4.2 Performance Results

Figure 5 illustrates the mean response time under different workloads using BL and EET-SB strategies with the coefficient c equal to 0.0, 0.5, and 1.0. The width and height of the requested submeshes are uniformly distributed from 1 to 32. When the load is light (less than 0.3), all allocation strategies have similar performance. Most jobs will be served promptly after they enter the system. For load greater than 0.4, the EET-SB strategy shows performance gain. When the load reaches 0.5, the EET-SB strategy reduces the mean response time compared to the BL allocation strategy by 18% to 27% under different estimation accuracies. When the load reaches 0.6, the system becomes unstable, so we report the results only up to a load of 0.5. The standard deviation of the response time is presented in Figure 6. Figures 5 and 6 indicate that the EET-SB strategy outperforms the BL strategy under all workloads when width and height of requested submeshes are uniformly distributed. As the execution time estimation error decreases, the mean response time becomes smaller. When the coefficient c is 1, the standard deviation of the estimated execution time equals the exact execution time. The estimation error is very large under this condition, and over 16% of the jobs will have their estimated execution time set to zero. In a real system, it is reasonable that some users have no idea about the execution times of their submitted jobs. The system can set a default value or zero to those jobs for the purpose of allocation. In our simulation, the EET-SB strategy outperforms the BL strategy even under the worst case of c = 1. When the coefficient c is 0, exact execution time is used to select the candidate submesh and the EET-SB strategy performs the best.

Figure 5: Mean response time vs. load (w and h: uniform distribution).

Figure 6: Standard deviation of response time vs. load (w and h: uniform distribution).

The same experiment is conducted to consider different characteristics of job requests, namely, increasing and decreasing distributions for the width w and height h of submeshes requested. Here, we consider an increasing distribution where the range of 1 through 32 (for height or width) is divided into four intervals, [1,16], [17,24], [25,28], and [29,32], and the probabilities that the height or width falls in these intervals
are P[1,16] = 0.2, P[17,24] = 0.2, P[25,28] = 0.2, and P[29,32] = 0.4. For decreasing distribution, we have intervals [1,4], [5,8], [9,16], and [17,32], and the probabilities that the height or width falls in these intervals are P[1,4] = 0.4, P[5,8] = 0.2, P[9,16] = 0.2, and P[17,32] = 0.2.

The simulation results of the mean and standard deviation of response time for BL and EET-SB strategies when the width and height of requested submeshes are distributed under the increasing distribution are presented in Figures 7 and 8, respectively. Performance results are similar to those of the uniform distribution case. However, the performance difference among various strategies is not as significant as that of the previous case. This moderate performance improvement for increasing distribution of job requests is expected. The system tends to have more jobs requesting large submeshes in an increasing distribution and, hence, there is a reduced chance of allocating large jobs into the system while some jobs are in the system. The extreme case is that all jobs request the full mesh, so no matter which processor allocation strategy you use, the performance will be the same.

Figure 7: Mean response time vs. load (w and h: increasing distribution).

Figure 8: Standard deviation of response time vs. load (w and h: increasing distribution).

Figure 9: Mean response time vs. load (w and h: decreasing distribution).

Figure 10: Standard deviation of response time vs. load (w and h: decreasing distribution).

Figures 9 and 10 show the mean and standard deviation of response time versus load, respectively, for BL and EET-SB strategies under the decreasing distribution of job requests. With a decreasing distribution, the system has more small jobs than large ones. The processor allocation strategy plays an important role when the system load is heavy. As the load reached 0.5, the EET-SB strategy reduced the mean response time by
Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS'95) 1060-3425/95 $10.00 © 1995 IEEE
Proceedings of the 28th Annual Hawaii International Conference on System Sciences 160 150 140 130 120 110 100 90 60 70 60 50 40 30 20 10 0
300 c
0
0.1
0.2
0.4
I 1
BF -
-
;: 10 0 0
BF -
0.1
Lt:d
Figure 11: Mean response time vs. load (w and h: uniform distribution).
110 100 9060 70 60 50. 40
0
0.5
. 0.1
0.2
Lt:d
0.4
0.5
Figure 12: Standard deviation of response time vs. load (w and h: uniform distribution).
15 to 55% relative to the BL strategy under different estimation accuracies. Figures 5 to 10 show that the EET strategy outperforms the BL strategy under all load conditions, various job characteristics, and different estimation accuracies.
Because the performance difference is small when the width and height of the requested submesh follow an increasing distribution, only the results for the uniform and decreasing distributions are presented for the BF and EET-SC strategies. Figures 11 and 12 show the mean and standard deviation of response time under various loads for the BF and EET-SC strategies when w and h are uniformly distributed. Figures 13 and 14 show the mean and standard deviation of response time under various loads for the BF and EET-SC strategies when w and h follow a decreasing distribution. As we can observe from Figures 11 through 14, the EET-SC strategy outperforms the BF strategy.
Figure 13: Mean response time vs. load (w and h: decreasing distribution).
In order to compare the performance of all the strategies discussed above, we combine Figure 5 with Figure 11 and Figure 9 with Figure 13 into Figures 15 and 16, respectively. We can see that the BL strategy outperforms the BF strategy under all conditions. The EET-SC and EET-SB strategies have similar performance. The EET-SC strategy has a slight advantage over the EET-SB strategy in the c = 1 case, while the EET-SB strategy performs better than the EET-SC strategy when c = 0.
When the EET-SB strategy is used, all the candidate submeshes along the boundaries are available to choose from, while the EET-SC strategy considers only corners. It is therefore reasonable that the EET-SB strategy can find a better fit than the EET-SC strategy when the execution time estimation is precise. In the case of c = 1, because the EET-SC strategy considers only corners and the EET-SB strategy considers all boundaries, the EET-SB strategy is more likely to choose a bad candidate submesh as a result of the large estimation error. Finally, the EET strategies outperform the BF and BL strategies under all conditions.
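The difference in candidate-set size between a corner-only and a boundary-based search can be illustrated with a small sketch. It is not the EET-SC or EET-SB algorithm itself: this section does not restate how corners and boundaries are defined in a partially occupied mesh, so the sketch assumes an empty W x H mesh and counts only placements against the mesh's own edges. The function name `candidate_bases` and the (x, y) lower-left base-position convention are assumptions made for illustration.

```python
def candidate_bases(W, H, w, h):
    """Return (corners, boundary) sets of lower-left base positions for a
    w x h submesh in an EMPTY W x H mesh (illustrative assumption only)."""
    if w > W or h > H:
        return set(), set()          # the request cannot fit at all
    xs_edge = {0, W - w}             # x positions touching the left/right edge
    ys_edge = {0, H - h}             # y positions touching the bottom/top edge
    # Corner placements touch two perpendicular edges of the mesh.
    corners = {(x, y) for x in xs_edge for y in ys_edge}
    # Boundary placements touch at least one edge of the mesh.
    boundary = {
        (x, y)
        for x in range(W - w + 1)
        for y in range(H - h + 1)
        if x in xs_edge or y in ys_edge
    }
    return corners, boundary


corners, boundary = candidate_bases(8, 8, 3, 2)
print(len(corners), len(boundary))   # prints: 4 22
```

Under these assumptions every corner placement is also a boundary placement, so a boundary search always has at least as many candidates to choose from. That is consistent with the observation above: more candidates help when the estimates are precise, and offer more opportunities to pick badly when they are not.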
5 Conclusions
In this paper, we investigated the effect of using estimated execution time to guide submesh allocation. We have developed two processor allocation strategies for two-dimensional mesh-connected systems, called the EET-SB and EET-SC strategies, which use the estimated execution time information provided by users to allocate jobs with similar finishing times in the same areas. Extensive simulation runs have been performed to compare various strategies. The results indicate that both EET strategies perform better than previously proposed strategies under all simulated conditions. An inaccurate estimate of the execution time merely reduces the overall performance improvement; it does not cause any execution failures.
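The idea of placing jobs with similar finishing times in the same areas, and the reason a poor estimate costs performance rather than correctness, can be sketched as follows. The scoring rule here (mean absolute gap between the new job's estimated finish time and those of its would-be neighbors) is a hypothetical stand-in, not the paper's selection criterion, and the names `Candidate`, `finish_time_mismatch`, and `pick_candidate` are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str  # hypothetical label for a free submesh
    # Estimated finish times of the running jobs adjacent to this submesh.
    neighbor_finish_times: list = field(default_factory=list)


def finish_time_mismatch(candidate, est_finish):
    """Mean absolute gap between the new job's estimated finish time and the
    estimated finish times of its would-be neighbors (assumed metric)."""
    times = candidate.neighbor_finish_times
    if not times:
        return 0.0  # no neighbors, so there is nothing to mismatch
    return sum(abs(t - est_finish) for t in times) / len(times)


def pick_candidate(candidates, now, est_exec_time):
    """Choose the free submesh whose neighbors finish closest to the new job."""
    est_finish = now + est_exec_time
    return min(candidates, key=lambda c: finish_time_mismatch(c, est_finish))


# Both candidates are free submeshes, so either choice is a valid allocation.
candidates = [
    Candidate("A", [105.0, 110.0]),
    Candidate("B", [140.0, 150.0]),
]
accurate = pick_candidate(candidates, now=100.0, est_exec_time=8.0)
overestimated = pick_candidate(candidates, now=100.0, est_exec_time=45.0)
print(accurate.name, overestimated.name)   # prints: A B
```

In this sketch the estimate only ranks submeshes that are already free, so an overestimate changes which one is chosen (B instead of A) without ever producing an invalid placement.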
Figure 14: Standard deviation of response time vs. load (w and h: decreasing distribution).
Figure 15: Mean response time vs. load (w and h: uniform distribution).
Figure 16: Mean response time vs. load (w and h: decreasing distribution).
References
[1] R. Alverson et al., “The Tera Computer System,” Proc. 1990 Int’l Conf. on Supercomputing, pp. 1-6, 1990.
[2] P.-J. Chuang and N.-F. Tzeng, “An Efficient Submesh Allocation Strategy for Mesh Computer Systems,” Proc. International Conference on Distributed Computing Systems, pp. 256-263, May 1991.
[3] D. Das Sharma and D. K. Pradhan, “A Fast and Efficient Strategy for Submesh Allocation in Mesh-Connected Parallel Computers,” Proc. IEEE Symp. on Parallel and Distributed Processing, pp. 682-689, Dec. 1993.
[4] J. Ding and L. N. Bhuyan, “An Adaptive Submesh Allocation Strategy for Two-Dimensional Mesh Connected Systems,” Proc. 1993 International Conference on Parallel Processing, vol. II, pp. 193-200, Aug. 1993.
[5] K. Li and K. H. Cheng, “A Two Dimensional Buddy System for Dynamic Resource Allocation in a Partitionable Mesh Connected System,” Proc. ACM Computer Science Conference, pp. 22-28, Feb. 1990.
[6] “Paragon XP/S Product Overview,” Intel Corporation, 1991.
[7] “A Touchstone DELTA System Description,” Intel Corporation, 1991.
[8] H. Schwetman, CSIM User’s Guide, Microelectronics and Computer Technology Corporation, 1991.
[9] Y. Zhu, “Efficient Processor Allocation Strategies for Mesh-Connected Parallel Computers,” Journal of Parallel and Distributed Computing, vol. 16, pp. 328-337, Dec. 1992.