Balanced Scheduling Algorithm Considering Availability in Mobile Grid

JongHyuk Lee1, SungJin Song1, JoonMin Gil2, KwangSik Chung3, Taeweon Suh1, and HeonChang Yu1

1 Dept. of Computer Science Education, Korea University
{spurt,white}@comedu.korea.ac.kr, {suhtw,yuhc}@korea.ac.kr
2 Dept. of Computer Science Education, Catholic University of Daegu
[email protected]
3 Dept. of Computer Science, Korea National Open University
[email protected]

Abstract. The emerging Grid is extending the scope of resources to mobile devices and sensors that are connected through unreliable networks. The number of mobile device users is increasing dramatically, and mobile devices provide capabilities, such as location awareness, that are not normally incorporated in fixed Grid resources. Nevertheless, compared to fixed Grid resources, mobile devices have inferior characteristics such as poor performance, limited battery life, and unreliable communication. Job scheduling and load balancing are therefore more challenging in a mobile Grid environment. This paper presents a novel balanced scheduling algorithm for the mobile Grid that takes mobility and availability into account. We analyzed users' mobility patterns to quantitatively measure resource availability, which is classified into three types: full availability, partial availability, and unavailability. We also propose a load balancing technique that classifies mobile devices into nine groups depending on availability. The experimental results show that, in terms of execution time, our scheduling algorithm outperforms scheduling that does not consider availability and load balancing.

Keywords: scheduling, load balancing, availability, mobile grid.

1 Introduction

Grid [1] is a large-scale virtual computing environment where geographically distributed resources collaboratively provide a computing infrastructure. It is used for solving computing-intensive and data-intensive problems that are not practically feasible to run in traditional distributed computing environments. The early Grid was implemented mostly with physically fixed, high-performance resources connected through high-speed and reliable networks. Emerging Grids [2] are extending the scope of resources to mobile devices and sensors that are connected through unreliable networks. In particular, a mobile Grid focuses on incorporating mobile devices by supporting new functionalities such as mobility. A mobile device in a mobile Grid can act as both a consumer and a provider. As a consumer it requests services from the Grid, and as a provider it actively participates in processing service requests.

Compared to physically fixed Grid resources such as desktop computers, mobile devices tend to provide relatively inferior performance in terms of CPU capability, amount of main memory, and storage capacity. They also have a limited battery life and are commonly connected to wireless networks that are not as reliable as wired networks. Because of these availability and reliability issues, it is not straightforward to use a mobile device as a Grid resource, and there is skepticism about using a mobile device as a provider. Nevertheless, mobile devices offer capabilities, such as location awareness, that are not normally incorporated in fixed Grid resources. The number of mobile device users is growing rapidly, and new devices with faster processors and larger memories are released at an ever-faster pace. Considering these capabilities and the enormous number of devices, mobile devices have immense potential to serve as resource providers in a mobile Grid environment.

The challenges in a mobile Grid are job scheduling and load balancing under unreliable communication. For example, the network link in a mobile Grid could be broken while a job that requires data communication is executing. The job must then wait until the connection is reestablished, which degrades its performance. Without proper load balancing, all jobs may be allocated only to stable resources such as physically fixed Grid components. Mobile devices, which perform worse individually but exist in enormous numbers, are then left out. This not only decreases the utilization of Grid resources but also degrades performance because of the improper load balancing.

This paper presents a novel balanced scheduling algorithm that takes mobility and availability into account. We analyzed users' mobility patterns to quantitatively measure resource availability, which is classified into three types. We also propose a load balancing technique that divides mobile devices into nine groups depending on availability and job type (computing-intensive and data-intensive).

The rest of the paper is organized as follows. Section 2 presents related work on scheduling algorithms in the mobile Grid. Section 3 discusses challenges introduced by the mobility of mobile devices. Section 4 describes the system architecture of a mobile Grid and the characteristics of users' mobility patterns, which are used to propose our load balancing algorithm in Section 5. Experimental results are presented in Section 6. Finally, Section 7 concludes the paper and outlines future work.

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-311-D00173). Corresponding author.

N. Abdennadher and D. Petcu (Eds.): GPC 2009, LNCS 5529, pp. 211–222, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Related Work

Several studies have investigated scheduling issues in the mobile Grid, focusing on power efficiency, communication availability under mobility, and job replication.

From the power efficiency point of view, Chang-Qin Huang et al. [3] proposed a proxy-based hierarchical scheduling model that takes into account mobility and power management in a wireless environment. In this model, the scheduler comprises two levels (top level and proxy level) to use the energy of wireless nodes efficiently while guaranteeing QoS.

There are several studies on communication availability. Park et al. [4] proposed a scheduling algorithm that considers processor and communication availability. This algorithm confines the communication scope and thus remains usable even when the network link is broken because of the mobile device's mobility. However, it is applicable only to job types that require no communication during execution. Farooq et al. [5] devised a generic mobility model to predict the time duration for which a user (and thus a device) will remain in a specific domain. It is based on learning from the user's past behavior. This model computes the average mobility and the time in range based on the user range parameter, and then calculates how many jobs a mobile device can execute. Ghosh et al. [6] proposed a scheduling algorithm that applies a pricing strategy to the job allocation problem and optimizes the total system cost. Since these algorithms do not consider processor availability and load balancing, they are limited in how far they can optimize scheduling.

Regarding job replication, Litke et al. [7] proposed a method that estimates the number of job replications using the Weibull reliability function and, using a knapsack formulation, maximizes resource utilization for the workloads caused by replication.

3 Problem Statement

As opposed to a traditional Grid, a mobile Grid has the characteristic that Grid resources change their physical positions as users move. When a mobile device crosses domains in a wireless network, the mobile user needs terminal mobility to maintain a session. When a mobile user opens a session on one device and moves to another device, the user needs both user mobility and session mobility for the service to continue. Present mobile computing guarantees terminal mobility and session mobility through Mobile IP (MIP) and the Session Initiation Protocol (SIP), implemented in the network layer or a layer above it. When the network condition is stable, the resource and job management techniques of the traditional Grid are directly applicable to the mobile Grid. However, resources in a mobile Grid may actively participate in the Grid or may be separated from it, depending primarily on the network health. Therefore, the traditional resource and job management techniques are not sufficient for the mobile Grid environment. In particular, a fault tolerance feature that withstands unstable network links should be incorporated in the mobile Grid to achieve the performance goal.


The availability and reliability of the system are greatly influenced by the device's mobility. Availability is defined as whether the user can use a system immediately at a specific time. Reliability, on the other hand, is whether the user can utilize a resource without a failure. Thus the availability in a mobile Grid can be defined as the ratio of the expected uptime (e.g. system power is on) to the sum of the expected uptime and downtime (e.g. system power is off):

$$\text{Availability} = \frac{T_{up}}{T_{up} + T_{down}} \tag{1}$$

where $T_{up}$ is the uptime and $T_{down}$ is the downtime. The downtime is classified into planned downtime and unplanned downtime. For example, a reboot caused by system configuration changes belongs to the planned downtime. Unhandled exceptions and physical problems such as hardware failure belong to the unplanned downtime. Since the planned downtime is inevitable, we focus on reducing the unplanned downtime caused by power supply shortage and network link failure. There are four different combinations of power status and network link status. We pay special attention to the case where the system power is on but the network link is down. From the point of view of a job, this case counts as uptime if the job does not require communication over the network link during execution, and as downtime if it does. The system availability therefore differs according to the job characteristics. When a job must be executed for a relatively long time without suspension, reliability plays an important role, and the communication failure caused by the device's mobility should be taken into account in the formula.
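The job-dependent reading of Eq. (1) can be illustrated with a short Python sketch. It assumes a hypothetical observation log in which each interval records its duration, the power status, and the link status; the names `Interval` and `availability` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    duration: float  # length of the observed interval (any time unit)
    power_on: bool   # True if the device was powered during the interval
    link_up: bool    # True if the network link was connected


def availability(intervals, needs_network):
    """Eq. (1): T_up / (T_up + T_down), with uptime judged per job type.

    A powered interval without a link counts as uptime only for jobs
    that do not need the network during execution.
    """
    t_up = 0.0
    t_down = 0.0
    for iv in intervals:
        usable = iv.power_on and (iv.link_up or not needs_network)
        if usable:
            t_up += iv.duration
        else:
            t_down += iv.duration
    total = t_up + t_down
    return t_up / total if total > 0 else 0.0


# Hypothetical trace: 60 units connected, 30 powered but offline, 10 powered off.
trace = [
    Interval(60.0, True, True),
    Interval(30.0, True, False),
    Interval(10.0, False, False),
]
no_comm = availability(trace, needs_network=False)
with_comm = availability(trace, needs_network=True)
```

With this made-up trace, the same device is available 90% of the time for a job without communication and 60% of the time for a job that communicates, which is the distinction the paragraph above draws.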

4 System Model

4.1 System Architecture

A mobile Grid is a convergence of wired and wireless computing environments that efficiently utilizes fixed and mobile Grid resources. It typically consists of physically fixed devices, mobile devices, and proxies. Fig. 1 shows the system architecture of a mobile Grid. The proxy is a delegation system that delivers job requests to the Grid, so mobile devices requesting jobs do not have to be online all the time. The main functionalities of the proxy are the information service and job scheduling. The information service collects resource information through information providers such as the Network Weather Service (NWS). The job scheduler chooses a suitable resource to execute a requested job according to a scheduling algorithm. Our scheduling algorithm takes into account user mobility and load balancing, described in Section 4.2 and Section 5.

4.2 Characteristics of User Mobility

This section investigates characteristics of users’ mobility patterns. We first discuss mobility parameters investigated in the prior research, and introduce new parameters suitable for the mobile Grid environment.


Fig. 1. System architecture of a mobile Grid

In computer networks, an Access Point (AP) is a device that allows wireless devices to connect to a wireless network. A mobile device user may move freely among APs while keeping access to the network. In a mobile environment with APs, the total time ($T_{all}$) of a mobile device is divided into an uptime and a downtime. The uptime is further divided into the time ($T_{connected}$) during which the network is connected and the time ($T_{disconnected}$) during which the network is disconnected. In [8], two metrics are introduced to model user mobility: AP prevalence and user persistence. These two metrics are defined as follows.

Definition 1. AP Prevalence: the ratio of the time ($T_{ij}$) that the $i$th user spends at the $j$th AP to the time ($T^{i}_{connected}$) during which the network is connected.

$$\mathrm{Prev}_{ij} = \frac{T_{ij}}{T^{i}_{connected}} \tag{2}$$
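As a small illustration of Definition 1, the following Python sketch computes the prevalence of each AP for a single user. The session list format `(ap_id, duration)` and the function name `ap_prevalence` are assumptions made for this example; the sum of all session durations is taken as the user's connected time.

```python
from collections import defaultdict


def ap_prevalence(sessions):
    """Return {ap_id: Prev_ij} for one user, following Eq. (2).

    `sessions` is a list of (ap_id, duration) pairs. The total of all
    durations is used as T_connected for this user.
    """
    time_per_ap = defaultdict(float)
    for ap_id, duration in sessions:
        time_per_ap[ap_id] += duration
    t_connected = sum(time_per_ap.values())
    if t_connected == 0:
        return {}
    return {ap: t / t_connected for ap, t in time_per_ap.items()}


# Hypothetical user: two sessions at AP1 and one at AP2 (durations in seconds).
sessions = [("AP1", 300.0), ("AP2", 100.0), ("AP1", 100.0)]
prev = ap_prevalence(sessions)
```

For this made-up user, AP1 has a prevalence of 0.8 and AP2 of 0.2. The prevalence values of one user always sum to 1, so the metric describes where the connected time is spent, not how stable each session was.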

The more often a user visits an AP and/or the more time the user spends at the AP, the higher the AP prevalence becomes. In [8], each user is classified into one of five groups (stationary, occasionally mobile, regular, somewhat mobile, and highly mobile) based on the maximum prevalence and the median prevalence. Since the AP prevalence does not capture how a user maintains a session at an AP, it cannot represent the communication instability caused by a user's frequent movements among APs. The user persistence complements this shortcoming.

Definition 2. User Persistence: the time duration during which the $i$th user stays at the $j$th AP until the user moves to another AP or the network link goes down.

$$\sum_{k=1}^{n} \mathrm{Pers}_{ijk} = T_{ij} \tag{3}$$

where $n$ is the number of sessions.

If the terminal mobility of mobile devices is guaranteed, a mobile device can continuously interact with a job requestor and the user persistence does not contribute to reliability. In a mobile Grid, however, a network link could go down unexpectedly. In such a situation, mobile devices are not able to communicate with the requestor, which is an important factor for job allocation. As mentioned, there are four possible combinations of power status (on and off) and network link status (connected and disconnected). Since the network link cannot be established without power, only the remaining three combinations are considered in this paper. The probabilities $P_c$, $P_p$, and $P_d$ of these cases are given by Eqs. (4), (5), and (6), respectively. $P_c$ is the probability that the power is on and the network is connected. $P_p$ is the probability that the power is on and the network is disconnected. Finally, $P_d$ is the probability that the power is off and the network is disconnected.

$$P^{i}_{c} = \frac{\sum_{k=1}^{n} \mathrm{Pers}_{ijk}}{T^{i}_{all}} = \frac{T_{ij}}{T^{i}_{all}} \tag{4}$$

$$P^{i}_{p} = \frac{T^{i}_{up}}{T^{i}_{all}} - P^{i}_{c} \tag{5}$$

$$P^{i}_{d} = 1 - (P^{i}_{c} + P^{i}_{p}) = \frac{T^{i}_{down}}{T^{i}_{all}} \tag{6}$$

Using the above three equations, we classify availability into three types: full availability, partial availability, and unavailability.

Definition 3. Full Availability: the probability that a mobile device fully executes jobs and returns the outcome via the network link.

$$A_c = \frac{P_c}{P_c + P_p + P_d} \tag{7}$$

Definition 4. Partial Availability: the probability that a mobile device fully executes jobs but does not return the outcome because of a network failure.

$$A_p = \frac{P_p}{P_c + P_p + P_d} \tag{8}$$

Definition 5. Unavailability: the probability that a mobile device does not execute jobs at all because the device is off.

$$A_d = 1 - (A_c + A_p) \tag{9}$$
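Eqs. (4) to (9) can be chained into one small computation. The sketch below is a minimal illustration that assumes the three durations of a device are already measured; the function name `availabilities` and its parameters are hypothetical, and the total connected time stands in for the per-AP sum of Eq. (4).

```python
def availabilities(t_connected, t_up, t_all):
    """Return (A_c, A_p, A_d) for one device from measured durations.

    t_connected: time with power on and the network link connected
    t_up:        time with power on, whether or not the link is connected
    t_all:       total observed time (uptime plus downtime)
    """
    if t_all <= 0 or not 0 <= t_connected <= t_up <= t_all:
        raise ValueError("expected 0 <= t_connected <= t_up <= t_all and t_all > 0")
    # Eqs. (4)-(6): probabilities of the three power/link states.
    p_c = t_connected / t_all
    p_p = t_up / t_all - p_c
    p_d = 1.0 - (p_c + p_p)
    # Eqs. (7)-(9): normalise by the sum, which equals 1 up to rounding.
    total = p_c + p_p + p_d
    a_c = p_c / total
    a_p = p_p / total
    a_d = 1.0 - (a_c + a_p)
    return a_c, a_p, a_d


# Hypothetical device: 6 h connected, 9 h powered on, 10 h observed.
a_c, a_p, a_d = availabilities(t_connected=6.0, t_up=9.0, t_all=10.0)
```

Because $P_c + P_p + P_d = 1$ by Eq. (6), the denominators in Eqs. (7) and (8) equal one, so in this sketch $A_c$ and $A_p$ coincide with $P_c$ and $P_p$ apart from floating-point rounding.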

The job execution in a Grid consists of three phases: input transmission, computation, and outcome transmission. Data transmission may also occur in the middle of computation, which is possible only when the mobile device is connected to a network. We express the data transmission time in terms of the full availability and define the communication unit time ($u_{cm}$) as the time to transmit data when the network condition is healthy. The expected transmission time $E_{cm}$ in terms of $u_{cm}$ is given by Eq. (10). The formula accounts for whether the network link is established at the beginning of the transmission.

$$E_{cm} = A_c \cdot u_{cm} + (1 - A_c)\left(u_{cm} + \frac{1}{A_c}\right) = u_{cm} + \frac{1}{A_c} - 1 \tag{10}$$
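The simplification in Eq. (10) can be checked numerically. The following Python sketch evaluates both the expanded and the closed form; the function names are illustrative, and the requirement $0 < A_c \le 1$ is an assumption added here so that the division is defined.

```python
def expected_transmission_time(u_cm, a_c):
    """Expanded form of Eq. (10)."""
    if not 0 < a_c <= 1:
        raise ValueError("a_c must be in (0, 1]")
    return a_c * u_cm + (1 - a_c) * (u_cm + 1 / a_c)


def expected_transmission_time_closed(u_cm, a_c):
    """Closed form of Eq. (10): u_cm + 1/A_c - 1."""
    if not 0 < a_c <= 1:
        raise ValueError("a_c must be in (0, 1]")
    return u_cm + 1 / a_c - 1


# Hypothetical values: 5 time units of transmission, link up half of the time.
expanded = expected_transmission_time(5.0, 0.5)
closed = expected_transmission_time_closed(5.0, 0.5)
```

With $A_c = 1$ both forms reduce to $u_{cm}$, and the penalty term $1/A_c - 1$ grows as the full availability drops.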


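We define the computation unit time ($u_{cp}$) as the time for computation when the computing resource is completely available. If communication is required $c$ times during the job execution, with $mu^{i}_{cm}$ the amount of the $i$th transmission and $mu_{cm} = \sum_{i=1}^{c} mu^{i}_{cm}$ the total amount, the expected computation time $E_{cp}$ in terms of $u_{cp}$ and $u_{cm}$ is given by

$$\begin{aligned} E_{cp} &= (A_c + A_p) \cdot u_{cp} + (1 - A_c - A_p)\left(u_{cp} + \frac{1}{A_c + A_p}\right) + \sum_{i=1}^{c} E_{cm}(mu^{i}_{cm}) \\ &= u_{cp} + mu_{cm} + \frac{1}{A_c + A_p} + \frac{c}{A_c} - c - 1 \end{aligned} \tag{11}$$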

Therefore, the expected execution time $E$ is expressed as follows.

$$E = (su_{cm} + mu_{cm} + eu_{cm}) + u_{cp} + \frac{1}{A_c + A_p} + \frac{c + 2}{A_c} - c - 3 \tag{12}$$

where $su_{cm}$ is the time to transfer input data at the beginning and $eu_{cm}$ is the time for outcome transmission at the end of a job execution. The first four terms ($su_{cm}$, $mu_{cm}$, $eu_{cm}$, $u_{cp}$) in Eq. (12) represent the communication and computation times that would be spent in a reliable wired network. The remaining terms ($1/(A_c + A_p)$, $(c + 2)/A_c$, $c$, $3$) are specific to mobile computing. In other words, they are the amount by which the expected execution time increases. Therefore, the parameters ($A_c$, $A_p$, $c$) should be used as criteria for choosing mobile devices as target Grid resources to minimize the overall execution time of workloads. When only full availability is considered, as in a traditional Grid, the expected computation time ($E_{cp}$) and the expected execution time ($E$) reduce to Eqs. (13) and (14), since $A_p$ becomes zero.

$$\begin{aligned} E_{cp} &= A_c \cdot u_{cp} + (1 - A_c)\left(u_{cp} + \frac{1}{A_c}\right) + \sum_{i=1}^{c} E_{cm}(mu^{i}_{cm}) \\ &= u_{cp} + mu_{cm} + \frac{1 + c}{A_c} - c - 1 \end{aligned} \tag{13}$$

$$E = (su_{cm} + mu_{cm} + eu_{cm}) + u_{cp} + \frac{3 + c}{A_c} - c - 3 \tag{14}$$

The value of Eq. (14) is never lower than that of Eq. (12). In other words, by utilizing devices with partial availability ($A_p$) in a mobile Grid, the expected execution time is decreased by up to $1/A_c - 1/(A_c + A_p)$. Therefore, to improve the performance of workload execution, it is imperative for a scheduling algorithm to consider both the full availability and the partial availability of the computing resources.
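The comparison between Eq. (12) and Eq. (14) can be reproduced with a few lines of Python. This is a sketch with made-up job parameters; the function names are illustrative and not taken from the paper.

```python
def expected_execution_time(su_cm, mu_cm, eu_cm, u_cp, a_c, a_p, c):
    """Eq. (12): expected execution time with full and partial availability."""
    return (su_cm + mu_cm + eu_cm) + u_cp + 1 / (a_c + a_p) + (c + 2) / a_c - c - 3


def expected_execution_time_full_only(su_cm, mu_cm, eu_cm, u_cp, a_c, c):
    """Eq. (14): expected execution time when only full availability counts."""
    return (su_cm + mu_cm + eu_cm) + u_cp + (3 + c) / a_c - c - 3


# Hypothetical job: three transmission phases, 10 units of computation,
# and c = 2 transmissions in the middle of the computation.
job = dict(su_cm=2.0, mu_cm=4.0, eu_cm=2.0, u_cp=10.0, c=2)
e12 = expected_execution_time(a_c=0.5, a_p=0.25, **job)
e14 = expected_execution_time_full_only(a_c=0.5, **job)
saving = e14 - e12
```

For these values the difference equals $1/A_c - 1/(A_c + A_p) = 2 - 4/3 \approx 0.67$ time units, and it vanishes when $A_p = 0$.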

5 Balanced Scheduling Algorithm Considering Availability

In a Grid, jobs are commonly classified into two types: computing-intensive jobs and communication-intensive jobs. The two types have different resource requirements. In a mobile Grid, performance tends to be better when the scheduler first assigns a job with a large amount of communication to a stable mobile device under a healthy network condition, because the job execution could otherwise be delayed by unexpected network breakdowns in the middle of execution. In a mobile Grid in particular, it is imperative to assign each job to a suitable mobile device according to its job type.

For job allocation, we propose a multi-level queue scheduling algorithm with priority. The priority is determined by the full availability and the partial availability. Mobile devices are classified into nine groups based on these two values, as shown in Table 1, where $o$ and $\omega$ represent an upper bound and a lower bound, respectively, for the classification.

Table 1. Nine groups based on the full availability and the partial availability. In each group label, the first letter is the level of $A_c$ and the second letter is the level of $A_p$.

| | $A_p$ Low: $A_p \in [0, \omega_{A_p})$ | $A_p$ Medium: $A_p \in [\omega_{A_p}, o_{A_p})$ | $A_p$ High: $A_p \in [o_{A_p}, 1]$ |
|---|---|---|---|
| $A_c$ Low: $A_c \in [0, \omega_{A_c})$ | LL | LM | LH |
| $A_c$ Medium: $A_c \in [\omega_{A_c}, o_{A_c})$ | ML | MM | MH |
| $A_c$ High: $A_c \in [o_{A_c}, 1]$ | HL | HM | HH |

In practice, the groups HH, HM, and HL are preferable for executing jobs, since their computing resources and network conditions are relatively stable and healthy. However, in terms of execution time and resource utilization, it is wasteful to assign all jobs to devices from just these three groups. A job with no communication may be executed on a resource with either full or partial availability ($A_c$ and $A_p$), resulting in better overall performance. In the opposite case, where communication is involved in execution, the number of transmissions ($c$) should additionally be considered. We propose the following scheduling algorithm according to the job type.

– A job with a long communication interval is assigned to a mobile device with a higher $A_p$, even if $A_c$ is not high.
– A job with a large amount of communication is assigned to a mobile device with a higher $A_c$.
– Determine a priority level (high, middle, low) according to the communication interval of a job and select a queue corresponding to the level of $A_p$.
– Determine a priority level (high, middle, low) according to the amount of communication and select a queue corresponding to the level of $A_c$.
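The grouping of Table 1 and the queue selection rules listed above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: the bound values (0.3 and 0.6), the device names, the job kinds `"volume"` and `"interval"`, and all function names are assumptions, and the paper does not specify how the two rules are combined when both apply.

```python
from collections import defaultdict

LEVELS = ("L", "M", "H")


def level(value, lower, upper):
    """Map an availability in [0, 1] to 'L', 'M' or 'H' using two bounds."""
    if value < lower:
        return "L"
    if value < upper:
        return "M"
    return "H"


def device_group(a_c, a_p, bounds_c=(0.3, 0.6), bounds_p=(0.3, 0.6)):
    """Return a Table 1 label such as 'HM' (A_c level first, then A_p level)."""
    return level(a_c, *bounds_c) + level(a_p, *bounds_p)


def build_queues(devices):
    """Group devices, given as {name: (A_c, A_p)}, into the nine queues."""
    queues = defaultdict(list)
    for name, (a_c, a_p) in devices.items():
        queues[device_group(a_c, a_p)].append(name)
    return queues


def candidate_groups(job_kind, priority):
    """Groups whose relevant availability level matches the job priority.

    job_kind 'volume':   priority from the amount of communication, match on A_c
    job_kind 'interval': priority from the communication interval, match on A_p
    """
    if job_kind == "volume":
        return [priority + p for p in LEVELS]
    if job_kind == "interval":
        return [c + priority for c in LEVELS]
    raise ValueError("job_kind must be 'volume' or 'interval'")


def candidate_devices(queues, job_kind, priority):
    """Devices found in the queues selected for the job."""
    return [d for g in candidate_groups(job_kind, priority) for d in queues.get(g, [])]


# Hypothetical devices with (A_c, A_p) values.
devices = {"phone-a": (0.8, 0.1), "phone-b": (0.2, 0.7), "pda-c": (0.5, 0.5)}
queues = build_queues(devices)
heavy_comm = candidate_devices(queues, "volume", "H")
long_interval = candidate_devices(queues, "interval", "H")
```

In this example a job with a large amount of communication is directed to `phone-a` (group HL), while a job with a long communication interval is directed to `phone-b` (group LH), so devices with low $A_c$ still receive work.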

6 Experiments

6.1 Experimental Environment

We evaluated our scheduling algorithm using the SimGrid toolkit [9] with a real-life trace: the WLAN trace [10] of the Dartmouth campus. After analyzing the trace, we extracted network information and supplied it to the SimGrid platform. The trace is composed of the syslog records produced by APs from September 1, 2005 to October 4, 2006. The trace as of June 6, 2006 was chosen to create the network information provided as input to the SimGrid platform.

Fig. 2. Histogram of the number of sessions lasting less than 2 hours

Fig. 3. Cumulative distribution function of the number of sessions

Fig. 2 shows a histogram of the number of sessions lasting less than 2 hours. Fig. 3 shows a cumulative distribution function of the number of sessions. The data shown in Fig. 3 include sessions maintained for more than 24 hours. Because of the unstable communication environment, sessions maintained for less than 2 hours account for about 80% of the trace. After fitting the probability density function to various statistical distributions, we found that the Pareto distribution fits best, with shape parameter 1.2602 and scale parameter 3018.0.

To provide the processor status information to the SimGrid platform, we used the Weibull distribution since it effectively represents machine availability [11]. We randomly extracted time durations using the inverse function of the Weibull distribution. Based on the network information and the time durations, we created test vectors for full and partial availabilities. The network information from the WLAN trace is used as the time slots for full availability, and the time durations are used as the time slots for partial availability. The processor start time is then calculated by padding a time duration from the network information in front of the processor time duration, and the processor completion time is calculated by padding a time duration from the network information behind the processor time duration. In this way, the processor information is synchronized with the network information.

Fig. 4 shows the distribution of $A_c$ and $A_p$ for mobile devices. The dotted lines on the x and y axes indicate the upper and lower bounds of $A_c$ and $A_p$, which divide the devices into nine groups. We then randomly created jobs with various computation and communication sizes and limited the number of data transmissions during job execution to two.

6.2 Experimental Results

Our experiments investigated the effects of two factors (user mobility and load balancing) on execution time. First, we examined the effect of user mobility using the four methods shown in Table 2. Method I-1, for example, allocates jobs to resources without considering full availability, partial availability, or data transmission during job execution.

Fig. 4. Distribution of $A_c$ and $A_p$ for mobile devices

Table 2. Four methods to evaluate effects of user mobility on performance (O: considered, X: not considered)

| Considerations | I-1 | I-2 | I-3 | I-4 |
|---|---|---|---|---|
| full availability | X | O | O | O |
| partial availability | X | X | O | O |
| data transmission in executing a job | X | X | X | O |

Fig. 5 shows the average execution time of each method when 4,000 jobs are executed. As shown, Method I-4 reports the shortest execution time. Our algorithm provides a 10% performance improvement compared to prior work [4], whose conditions are reflected in Method I-3. Currently, our dynamic scheduling algorithm is not directly applicable to batch jobs. However, we expect that the performance of batch jobs would be greatly enhanced by applying min-min and max-min according to the job type. Second, we investigated the effects of load balancing on execution time. Based on Method I-4, we experimented with the following three methods.

Fig. 5. Average execution time of Methods I-1, I-2, I-3, and I-4

– Method II-1: jobs are evenly allocated to mobile devices in a round-robin fashion.
– Method II-2: mobile devices are classified into three groups according to the full availability, and a job is allocated to the group corresponding to its job type.
– Method II-3: mobile devices are classified into nine groups according to the full availability and the partial availability, and a job is allocated to the group corresponding to its job type.

Fig. 6. Average execution time of Methods II-1, II-2, and II-3

Fig. 7. Standard deviation of load for Methods II-1, II-2, and II-3

Fig. 6 shows the average execution time of each method, and Fig. 7 shows the standard deviation of the load on Grid resources. We found that Method II-2 provides the best performance, although only marginally better than Method II-3. However, Method II-2 reports the worst load balancing. Method II-1 provides the best load balancing but the worst execution time. Method II-3 reports a medium standard deviation of load and an execution time comparable to the best. Consequently, Method II-3 is a reasonable choice for job scheduling: its execution time is satisfactory and, at the same time, its standard deviation of load is relatively low.

The average execution time and the degree of load balancing are influenced by the upper and lower bounds. Because our scheduler determines a group according to the job type, if many mobile devices belong to a group that receives fewer jobs, both the average execution time and the load balancing worsen. Therefore, it is imperative to determine the upper and lower bounds dynamically.
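The load-balancing metric discussed above can be illustrated with a minimal Python sketch of round-robin allocation in the style of Method II-1. It assumes that "load" means the number of jobs per device and uses the population standard deviation; the paper states neither choice explicitly, and the function names are hypothetical.

```python
from itertools import cycle
from statistics import pstdev


def round_robin(jobs, devices):
    """Hand jobs to devices in turn and return the job count per device."""
    loads = {d: 0 for d in devices}
    for _, device in zip(jobs, cycle(devices)):
        loads[device] += 1
    return loads


def load_stddev(loads):
    """Population standard deviation of the per-device job counts."""
    return pstdev(list(loads.values()))


# Hypothetical run: ten jobs over three devices.
even = round_robin(range(10), ["a", "b", "c"])
skewed = {"a": 10, "b": 0, "c": 0}  # all jobs on the most stable device
```

Round robin yields loads of 4, 3, and 3 jobs, whose standard deviation is far below that of the skewed allocation. A group-based method trades some of this evenness for shorter execution time, which is the trade-off the results above describe.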

7 Conclusions and Future Work

This paper presents a novel balanced scheduling algorithm for the mobile Grid that considers the mobility patterns of mobile device users. Our algorithm takes into account mobility and load balancing in scheduling. We analyzed users' mobility patterns to quantitatively measure resource availability, which is classified into three types: full availability, partial availability, and unavailability. An adaptive load balancing technique is also proposed that classifies mobile devices into nine groups depending on the full and the partial availabilities. The experimental results show that our scheduling algorithm outperforms scheduling that does not consider the partial availability. Throughout the experiments, we found that the partial availability and the grouping are crucial factors for performance and load balancing. Overall, our study provides effective algorithms for allocating mobile resources according to the job type. In the future, we plan to conduct a wider variety of experiments to study additional factors that contribute to performance and load balancing in the mobile Grid. We also plan to apply methods such as batch scheduling and dynamic selection of the upper and lower bounds to a mobile Grid.

References

1. Foster, I., Kesselman, C.: The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, San Francisco (2004)
2. Kurdi, H., Li, M., Al-Raweshidy, H.: A Classification of Emerging and Traditional Grid Systems. IEEE Distributed Systems Online 9(3) (2008)
3. Huang, C., Zhu, Z.T., Wu, Y.H., Xiao, Z.H.: Power-Aware Hierarchical Scheduling with Respect to Resource Intermittence in Wireless Grids. In: Proc. of the Fifth Int. Conf. on Machine Learning and Cybernetics (2006)
4. Park, S.M., Ko, Y.B., Kim, J.H.: Disconnected Operation Service in Mobile Grid Computing. In: Proc. of the Int. Conf. on Service Oriented Computing (2003)
5. Farooq, U., Khalil, W.: A Generic Mobility Model for Resource Prediction in Mobile Grids. In: Proc. of the Int. Symp. on Collaborative Technologies and Systems (2006)
6. Ghosh, P., Roy, N., Das, S.K.: Mobility-Aware Efficient Job Scheduling in Mobile Grids. In: Proc. of Cluster Computing and Grid (2007)
7. Litke, A., Skoutas, D., Tserpes, K., Varvarigou, T.: Efficient task replication and management for adaptive fault tolerance in Mobile Grid environments. Future Generation Computer Systems 23, 163–178 (2007)
8. Balazinska, M., Castro, P.: Characterizing Mobility and Network Usage in a Corporate Wireless Local-Area Network. In: ACM MobiSys (2003)
9. Casanova, H.: Simgrid: A toolkit for the simulation of application scheduling. In: Proc. of 1st IEEE/ACM Int. Symp. on Cluster Computing and the Grid (2001)
10. Henderson, T., Kotz, D.: CRAWDAD trace dartmouth/campus/syslog/05_06 (v. 2007-02-08), http://crawdad.cs.dartmouth.edu/dartmouth/campus/syslog/05_06
11. Nurmi, D., Brevik, J., Wolski, R.: Modeling machine availability in enterprise and wide-area distributed computing environments. UCSB Computer Science Technical Report Number CS2003-28
