Efficient Resource Allocation for Parallel and Distributed Systems

Ahmed M. Mohamed, Reda Ammar and Lester Lipsky
Dept. of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
Tel: 860-486-0849, Fax: 860-486-1273
{ahmed, reda, lester}@engr.uconn.edu

Abstract. The resource requirements of parallel scientific and commercial applications are quite diverse. In order to achieve the expected levels of high availability and fast service, new methods for resource allocation and management are needed. In this paper, we develop a new analytical model for resource allocation. The model includes the major system and application parameters that affect performance. It can detect and take into account the contention that may occur in shared devices, and it also includes the effect of the transient behavior and of the type of the application's service distribution. We use the model to examine the performance characteristics of these parameters and their effect on resource allocation. Our results show that the service distribution, the transient behavior and any resource contention have a significant impact on resource allocation.

Key Words: Cluster Computing, Resource Allocation, Performance Modeling, Transient Analysis, Jackson Networks, Queueing Analysis, Parallel Systems.
1. Introduction

The class of parallel scientific and commercial applications is extremely diverse. These applications exhibit a wide range of speedup and efficiency characteristics. Furthermore, a wide variety of parallel and distributed systems is available to the user community, ranging from traditional multiprocessor vector systems to clusters or networks of workstations (NOW) and even geographically dispersed meta-systems connected by high-speed Internet connections (GRID). The considerable availability of different parallel and distributed systems, together with the diversity of parallel scientific and commercial applications, makes resource allocation a non-trivial problem. The resource allocation policy should not be based on a specific system or application, and it should take into account the major parameters that affect system and application performance. The focus of this paper is to develop an efficient resource allocation policy based on performance modeling and analysis. In [1,2], we developed an analytical performance model that can integrate different architecture and application parameters in one analytical model and can detect which parameter is the bottleneck of the system. These parameters include architecture parameters (system size, speed of each CPU, capacity of storage devices and communication bandwidth), application parameters (size, amount of shared data, service distribution), the degree of contention at each shared device (calculated by the model) and the effect of the system's transient characteristics. The model estimates the effect of these parameters on resource allocation. It is general and can be applied to any system and any parallel application.

The rest of the paper is organized as follows. Section 2 gives a brief background on models that have been developed for this problem. Section 3 describes our performance model. Section 4 introduces our model for resource allocation. We present experimental results in Section 5.
2. Related Work

Research in this area has generated a large volume of results under different assumptions about the application and the system. Corsava et al. [5] presented a resource allocation model based on an intelligent architecture. They installed intelligent agents on each device in the system. These agents monitor the system and send the results to the scheduler, which then makes decisions based on the available information. Like any monitoring system, the model makes decisions based only on the currently available information: it cannot predict the performance behavior or the expected bottlenecks, and it does not take into account the application size or the system parameters. Many models based on monitoring are reported in the literature [10,21]. Another approach is to consider the allocation problem as a scheduling problem [1,4,6,7,20]. Dowdy et al. [6] reviewed the role of architecture issues in the choice of the scheduling discipline. They also chose a selected set of scheduling policies for different architectures and discussed the merits of each. The allocation problem can also be considered as a partitioning problem. Islam et al. [10] developed a resource management system for allocating resources. The system consists of several components, including a hierarchical software architecture and a new technique called flexible dynamic partitioning. The hierarchical software architecture supports the coexistence of multiple independent scheduling strategies, while flexible dynamic partitioning is responsible for managing the resources. We developed an analytical performance model [16,17] for parallel and distributed systems that includes the major parameters affecting the performance of such systems. This model can then be utilized in resource allocation. The advantage of using such analytical models is their flexibility, since they can be used for different system configurations. Also, unlike the monitoring approach, they do not place any load on the system. Our model does not assume any specific scheduling method; therefore it can be used as a baseline model.
3. The Performance Model

Since the early 1970s, networks of queues have been studied and applied to numerous areas in computer science and engineering with a high degree of success. General exponential queueing network models were first solved by Jackson [11] and by Gordon and Newell [8], who showed that certain classes of steady-state queueing networks with any number of service centers can be solved using a product-form solution. A substantial contribution was made by Buzen [2,3], who showed that the ominous-looking formulas are computationally manageable. Moore and Baskett et al. [18,19] summarized the generalizations under which the product-form solution can be used (e.g., processor sharing, multiple customer classes). Thereafter, the performance analysis of queueing networks came to be considered a research field of its own.
3.1 Steady State Model

It has been shown by Jackson [11], Gordon and Newell [8] and Buzen [3] that for any network of K service centers with exponentially distributed service times, serving a total of N customers, the probability of being in the state n = {n1, n2, ..., nK}, where n1 + n2 + ... + nK = N, is

P(n) = (1/G(K)) ∏_{i=1}^{K} [ Xi^{ni} / βi(ni) ]    (1)
Here ni is the number of customers at service center i, Xi is the fraction of time a customer spends at each visit to service center i when there is no one else in the system, and βi(ni) is the load function of service center i. The function G(K) normalizes the probabilities so that they sum to 1:
G(K) = ∑_{all n} ∏_{i=1}^{K} [ Xi^{ni} / βi(ni) ]    (2)
One can then compute G(K) using Buzen's algorithm, and from it various performance parameters, such as the system throughput

QK(N) = G(N - 1) / G(N)    (3)

In the model, the average time to complete an application of N tasks running on a cluster of K workstations is given by

E(Tapp(N)) = N / QK(N)    (4)
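As a concrete illustration, G(K) in equations (2)-(4) can be computed with Buzen's convolution algorithm. The sketch below covers only the simplest case of single-server centers with constant load functions (βi(n) = 1); the X values are hypothetical, not taken from the paper's experiments.

```python
def buzen_G(X, N):
    """Normalization constants G(0..N) for a closed product-form network,
    computed by Buzen's convolution algorithm.  X[i] is the relative load
    of center i; constant load functions beta_i(n) = 1 are assumed."""
    g = [1.0] + [0.0] * N          # network with no centers: only n = 0 feasible
    for x in X:                    # fold in one service center at a time
        for n in range(1, N + 1):
            g[n] += x * g[n - 1]
    return g

def throughput(X, N):
    g = buzen_G(X, N)
    return g[N - 1] / g[N]         # equation (3): Q_K(N) = G(N-1)/G(N)

def mean_app_time(X, N):
    return N / throughput(X, N)    # equation (4)

# balanced two-center network with two customers
q = throughput([1.0, 1.0], 2)
```

For this balanced two-center, two-customer network the three states (2,0), (1,1), (0,2) are equally likely, so the throughput is G(1)/G(2) = 2/3, which the code reproduces.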
3.2 The Transient Model
We first introduce some definitions that are important for understanding the model; see [15,17] for a complete description. Ξk is the set of all internal states of S when there are k active customers in it; there are D(k) such states. Mk is the completion rate matrix, where [Mk]ii is the service rate of leaving state i; all off-diagonal entries are zero.
Pk is the transition matrix, where [Pk]ij, i,j ∈ Ξk, is the probability that the system goes from state i to state j when a service completes while the system is in state i. Qk is the exit matrix, where [Qk]ij is the probability that a customer leaving S while the system is in state i ∈ Ξk leaves the system in state j ∈ Ξk-1. Rk is the entrance matrix, where [Rk]ij is the probability that a customer entering S and finding it in state i ∈ Ξk-1 goes to the server that puts the system in state j ∈ Ξk. τ'k is a column vector of dimension D(k), where [τ'k]i is the mean time until a customer leaves S, given that the system started in state i ∈ Ξk.

Suppose we have a computer system made up of K workstations and we wish to compute a job made of N tasks, where N > K. The first K tasks are assigned to the system and the rest are queued up waiting for service. Once a task finishes, it is immediately replaced by another task from the execution queue. Assume that the system initially opens up and K tasks flow in. The mean time to finish executing all tasks is

E(T) = pK [ ∑_{i=0}^{N-K} (YK RK)^i ] τ'K + pK (YK RK)^{N-K} YK [ τ'K + YK τ'K-1 + ... + YK YK-1 ... Y1 τ'1 ]    (5)

For a complete description of the model, see [16,17]. We used Linear Algebraic Queueing Theory (LAQT) [13] in our analysis. The above matrices include all of the major parameters mentioned earlier.
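As an independent sanity check on equation (5) (not part of the model's derivation), the finite-workload system can also be simulated directly. The sketch below assumes exponential service times and the same task-replacement discipline: K tasks start, and each completed task is immediately replaced from the queue. All numeric parameters are illustrative.

```python
import heapq
import random

def sim_completion_time(N, K, mean_service, rng):
    """One replication: time until all N tasks finish on K servers,
    with queued tasks immediately replacing completed ones."""
    finish = [rng.expovariate(1.0 / mean_service) for _ in range(min(N, K))]
    heapq.heapify(finish)
    for _ in range(N - K):
        t = heapq.heappop(finish)                      # a server frees up at t
        heapq.heappush(finish, t + rng.expovariate(1.0 / mean_service))
    return max(finish)                                 # last task completes

rng = random.Random(42)
reps = 200
est = sum(sim_completion_time(100, 8, 24.0, rng) for _ in range(reps)) / reps
# bulk phase is about (N-K)*mean/K = 276, plus a drain phase for the last
# K tasks: roughly 340 time units overall for these numbers
```

Comparing such an estimate with equation (5) is one way to validate an implementation of the transient model.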
4. The Resource Allocation Algorithm

The target parallel application is considered to be a set of independent tasks {t1, t2, ..., tN}, where each task is made up of a sequence of requests for CPU, local data and remote data. The service time of each task is unknown, but the average service time over the whole set of tasks is known. The set of active tasks can communicate by exchanging data from each other's disks. The tasks run in parallel, but they must queue for service when they wish to access the same device. These tasks are to be executed by a computing system consisting of K computers. The resources available include the number of computers, the computing power available in each computer, the number of disks and their speeds, and the bandwidth of each communication channel. Our model is utilized to assign these resources as follows:
1. Identify the application parameters (N, amount of shared data, mean and variance of the tasks' service time).
2. Identify the system parameters (K, speed of CPUs, disks and channel bandwidth).
3. Construct the matrices described earlier.
4. Calculate the estimated running time from equation (5).
5. Check the vector pK(YK RK) to test whether any of the resources is about to saturate (its probability is close to 1).
6. If any resource is about to saturate, add more of that resource to resolve the problem.
7. If the estimated running time is still not good enough, first resolve all saturated resources, then increase the number of processors.
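The seven steps above can be sketched as a feedback loop. Everything below is illustrative: `estimate_time` stands in for equation (5) and `utilizations` for the saturation check on pK(YK RK); neither is the paper's actual implementation, and the toy cost functions are invented for the example.

```python
def allocate(system, target_time, estimate_time, utilizations):
    """Steps 4-7: estimate the running time, relieve saturated
    resources first, and only then add processors.
    `system` maps resource name -> capacity."""
    while True:
        t = estimate_time(system)                       # step 4
        sat = [r for r, u in utilizations(system).items() if u > 0.95]
        if sat:                                         # steps 5-6
            for r in sat:
                system[r] *= 2                          # e.g. a faster channel
            continue
        if t <= target_time:                            # running time good enough
            return system, t
        system["cpus"] += 1                             # step 7: add a processor

# toy stand-ins: time scales as work/cpus, channel load grows with cpus
work = 100.0
est = lambda s: work / s["cpus"]
util = lambda s: {"bandwidth": 0.3 * s["cpus"] / s["bandwidth"]}
plan, t = allocate({"cpus": 2, "bandwidth": 1.0}, 10.0, est, util)
```

With these toy cost functions the loop twice detects a saturating channel and doubles its bandwidth before continuing to add processors, ending at 10 CPUs and 4x the original bandwidth.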
5. Results

It is usually assumed that parallel and distributed systems work most of the time in steady state, or that the transient portion does not have a significant impact on performance. We believe this assumption is not accurate. Therefore, we first show the importance of transient analysis and to what extent the steady state model can be used. Then, we integrate our performance model with the resource allocation algorithm described earlier.
5.1 Importance of Transient Analysis
To show the importance of transient analysis, we used both the steady state model and our performance model to calculate the running time of a parallel application on an 8-node central storage system and a 5-node distributed storage system. The average running time per task is 24 units of time, and each task spends 90% of its time locally. The speed of the central disk is three times that of any local disk. We varied the number of tasks per application between 10 and 300. The percentage error is calculated as
E = [ EK(Tapp(N))tr − EK(Tapp(N))SS ] / EK(Tapp(N))tr

where EK(Tapp(N))tr is the mean service time of an application of N tasks running on K workstations calculated using the transient model, and EK(Tapp(N))SS is the same quantity calculated using the steady state model.
Fig 1. Effect of performance region on prediction of running time (prediction error % vs. number of tasks per application).
In Figure 1, we can observe that the percentage error is approximately 20% when the application consists of 60 tasks (almost 8 times the system size). The error drops to 5% when the application consists of 240 tasks (30 times the system size). In Figure 2, we run the same experiment on a 5-node distributed storage system. We see that ignoring the effect of the transient regions can lead to serious errors in calculating different performance metrics. We expect this to get even worse for larger systems, since the bigger the system, the more slowly it reaches steady state and the more dominant the transient behavior becomes. Therefore, to achieve accurate performance measures, one should take into account the effect of the performance regions as well as the effect of contention, which is calculated by our model.

5.2 Importance of Service Distribution

The exponential distribution is widely used in many performance models. However, it has been shown by Leland and Ott [12] that the distribution of CPU times at BELLCORE is power-tailed (PT). Also, Lipsky [14], Hatem [9] and others found that the sizes of files stored on disks, and even page sizes, are PT. If these observations are correct, then performance modeling based on the exponential distribution is no longer adequate. Our model supports any kind of service distribution [17].
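The role of the squared coefficient of variation C2 discussed here can be made concrete with samplers for the distribution families the experiments use. The balanced-means fit for the two-phase hyperexponential is one standard choice among several (an assumption, not necessarily the paper's); the parameter values are illustrative.

```python
import random

def erlang(mean, k, rng):
    """Erlang-k service time: C2 = 1/k (narrow, C2 <= 1)."""
    return sum(rng.expovariate(k / mean) for _ in range(k))

def hyperexp(mean, c2, rng):
    """Two-phase hyperexponential with C2 = c2 > 1 (balanced-means fit)."""
    p = 0.5 * (1.0 + ((c2 - 1.0) / (c2 + 1.0)) ** 0.5)
    if rng.random() < p:
        return rng.expovariate(2.0 * p / mean)          # fast phase
    return rng.expovariate(2.0 * (1.0 - p) / mean)      # slow phase

# empirical check: mean ~ 24, C2 ~ 5 for the hyperexponential sampler
rng = random.Random(1)
xs = [hyperexp(24.0, 5.0, rng) for _ in range(20000)]
m = sum(xs) / len(xs)
c2 = sum((x - m) ** 2 for x in xs) / len(xs) / m ** 2
```

A quick moment calculation confirms the balanced-means fit: with p as above, the mixture's mean is exactly `mean` and its squared coefficient of variation is exactly `c2`.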
Fig 3. Effect of service distribution on resource allocation (speedup vs. squared coefficient of variation C2, for N = 50 and N = 200).
In Figure 3, we show the effect of the service distribution on the speedup obtained from an 8-node distributed system for applications of 50 and 200 tasks. We see that changing the squared coefficient of variation (C2) from 1 (exponential) to 10 makes the speedup drop significantly; in fact, it drops by 50% when C2 is 20. We also notice the effect of the transient region by comparing the speedup when the number of tasks is 50 (where the transient region has a significant impact) with that of the same application when the number of tasks is 200.
5.3 Resource Allocation
Fig 2. Effect of transient region on prediction of running time (5-node distributed storage system).
In our analysis, we provide more insight into the effect of the different factors that influence resource allocation in parallel and distributed systems. We take into account the parameters of both the architecture and the application. We also include three new parameters: the effect of the transient regions, the effect of contention and, more importantly, the effect of the service distribution.

5.3.1 Systems with Exponential Service Centers

In this experiment we accept the exponential distribution, which is widely used in analytical performance models, as the service distribution, and evaluate the effect of transient behavior and contention on resource allocation. The application we use consists of 20, 100 and 200 tasks. Each task spends on average 140.25 units of time at its local workstation, 12.375 units of time at the remote disk and 12.375 units of time at the communication channel. In Figures 4 and 5 we use our model to calculate the speedup provided by the system to the application.
Fig 4. Effect of transient region on resource allocation (speedup vs. number of workstations, for N = 20, 100 and 200).
In Figure 4 we show the effect of the performance region on resource allocation. The number of workstations allocated to the application does not guarantee the required speedup if we ignore the performance region. The speedup drops by 50% for a system of 10 workstations when N is 20 compared with the case of N = 200.
As mentioned earlier, this drop in speedup occurs because the transient regions dominate when N = 20 and have less effect when N = 200 (steady state dominates). In Figure 5 we used the same application but chose a small communication bandwidth so that contention arises at the communication channel; we then increased the bandwidth threefold to resolve the bottleneck. We notice that increasing the number of workstations did not improve the speedup of the system in the case of contention. Therefore, it is very important to test whether the system has any saturated resource before adding more processors.

5.3.2 Non-Exponential Dedicated Servers

One of the factors that can affect resource allocation significantly is the application's service distribution. In Figure 6, we used the same application parameters but changed the CPU (dedicated server) service distribution, keeping the service distribution of the shared servers exponential. We used the Erlangian (C2 = 0.5), exponential (C2 = 1) and hyper-exponential (C2 = 5) distributions. Our results show that the exponential distribution can be a good approximation for narrow distributions (C2 < 1). On the other hand, it fails to approximate distributions with C2 > 1 (applications with high variance).
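The trend in Figure 6 can be reproduced qualitatively with a toy simulation (all parameters hypothetical, not the paper's experiment): holding the mean service time fixed while raising C2 lengthens the time to drain the last tasks, so the high-variance run finishes later on average and the speedup falls.

```python
import heapq
import random

def hyperexp(mean, c2, rng):
    """Balanced-means two-phase hyperexponential, C2 = c2 > 1."""
    p = 0.5 * (1.0 + ((c2 - 1.0) / (c2 + 1.0)) ** 0.5)
    rate = (2.0 * p if rng.random() < p else 2.0 * (1.0 - p)) / mean
    return rng.expovariate(rate)

def erlang(mean, k, rng):
    """Erlang-k, C2 = 1/k."""
    return sum(rng.expovariate(k / mean) for _ in range(k))

def makespan(N, K, draw, rng):
    """Time to finish N tasks on K dedicated servers, each completed
    task immediately replaced from the queue."""
    finish = [draw(rng) for _ in range(K)]
    heapq.heapify(finish)
    for _ in range(N - K):
        t = heapq.heappop(finish)
        heapq.heappush(finish, t + draw(rng))
    return max(finish)

def avg_makespan(draw, reps, seed):
    rng = random.Random(seed)
    return sum(makespan(100, 8, draw, rng) for _ in range(reps)) / reps

t_low = avg_makespan(lambda r: erlang(24.0, 2, r), 300, 7)       # C2 = 0.5
t_high = avg_makespan(lambda r: hyperexp(24.0, 5.0, r), 300, 7)  # C2 = 5
```

Both runs have the same mean service time per task, so the gap between `t_high` and `t_low` isolates the effect of the service-time variance.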
Fig 6. Effect of service distribution on resource allocation (speedup vs. number of workstations, for Erlangian E2, exponential and hyper-exponential H2 service distributions).
The type of service distribution still has a significant impact even with dedicated servers (no waiting). Our results indicate that if we also change the service distribution of the shared devices, the speedup becomes even lower (due to queueing). See [17] for more results.
Fig 5. Effect of contention on resource allocation (speedup vs. number of workstations, with and without contention).
6. Conclusion

The efficient use of available resources plays an important role in the design and use of parallel and distributed systems. The considerable availability of different parallel and distributed systems, together with the diversity of parallel scientific and commercial applications, makes resource allocation a non-trivial problem. In this paper, we presented a new analytical model for resource allocation. The model includes the major system and application parameters that affect performance. It can detect and take into account the contention that may occur in shared devices, and it also includes the effect of the transient behavior and of the type of the application's service distribution. We showed to what extent these parameters affect the resource allocation decision. The model is general and can be applied to any parallel or distributed system. Additional work needs to be done to make the model fault tolerant.
References
[1] R. Buyya, "High Performance Cluster Computing: Architecture and Systems," Prentice Hall PTR, NJ, 1999.
[2] J. Buzen, "Queueing Network Models of Multiprogramming," Thesis, Harvard University, 1971.
[3] J. Buzen, "Computational Algorithms for Closed Queueing Networks," Comm. ACM, Vol. 16, No. 9, Sep. 1973.
[4] R. J. Chen, "A Hybrid Solution of Fork/Join Synchronization in Parallel Queues," IEEE Trans. Parallel and Distributed Systems, Vol. 12, No. 8, pp. 829-845, Aug. 2001.
[5] S. Corsava and V. Getov, "Intelligent Architecture for Automatic Resource Allocation in Computer Clusters," International Parallel and Distributed Processing Symposium, Nice, France, Apr. 2003.
[6] L. Dowdy, E. Rosti and E. Smirni, "Scheduling Issues in High Performance Computing," Performance Evaluation Review, Mar. 1999.
[7] I. Foster and C. Kesselman, "The Grid: Blueprint for a New Computing Infrastructure," Morgan Kaufmann, 1998.
[8] W. Gordon and G. Newell, "Closed Queueing Systems," JORSA, Vol. 15, pp. 254-265, 1967.
[9] J. Hatem and L. Lipsky, "Buffer Problems on Telecommunications Networks," 5th International Conference on Telecommunication Systems, Nashville, TN, 1997.
[10] N. Islam, A. Prodromidis, M. Squillante, L. Fong and A. Gopal, "Extensible Resource Management for Cluster Computing," 17th International Conference on Distributed Computing Systems (ICDCS'97), Baltimore, Maryland, May 1997.
[11] J. Jackson, "Jobshop-Like Queueing Systems," J. TIMS, Vol. 10, pp. 131-142, 1963.
[12] Leland and T. Ott, "Analysis of CPU Times on 6 VAX 11/780 at BELLCORE," Proceedings of the International Conference on Measurement and Modeling of Computer Systems, Apr. 1986.
[13] L. Lipsky, "Queueing Theory: A Linear Algebraic Approach," MacMillan, 1992.
[14] L. Lipsky, "The Importance of Power-Tail Distributions for Modeling Queueing Systems," Operations Research, Vol. 47, No. 2, Mar.-Apr. 1999.
[15] L. Lipsky and J. Church, "Applications of Queueing Network Models," Computing Surveys, pp. 205-221, Sep. 1977.
[16] A. Mohamed, R. Ammar and L. Lipsky, "Transient Model for Jackson Networks and its Approximation," 7th International Conference on Principles of Distributed Systems (OPODIS'03), Martinique, France, Dec. 2003.
[17] A. Mohamed, L. Lipsky and R. Ammar, "Performance Model for Parallel and Distributed Systems with Finite Workloads," submitted to Journal of Performance Evaluation.
[18] F. Moore, "Computational Model of a Closed Queueing Network with Exponential Servers," IBM J. of Res. and Develop., pp. 567-572, Nov. 1962.
[19] R. Muntz, F. Baskett and K. Chandy, "Open, Closed and Mixed Networks of Queues with Different Classes of Customers," JACM, Vol. 22, pp. 248-260, Apr. 1975.
[20] K. Naik, K. Setia and S. Squillante, "Processor Allocation in Multiprogrammed Distributed-Memory Parallel Computer Systems," Journal of Parallel and Distributed Computing, Vol. 46(1), pp. 28-47, Oct. 1997.
[21] P. Pazel, T. Eilam, L. Fong, M. Kalantar, K. Appleby and G. Goldszmidt, "Neptune: A Dynamic Resource Allocation and Planning System for a Cluster Computing Utility," 2nd International Symposium on Cluster Computing and the Grid (CCGRID'02), Berlin, Germany, May 2002.