Adaptive Utility-Based Scheduling in Resource-Constrained Systems

David Vengerov
Sun Microsystems, UMPK16-160, 16 Network Circle, Menlo Park, CA 94025
[email protected]
Abstract. This paper addresses the problem of scheduling jobs in soft real-time systems, where the utility of completing each job decreases over time. We present a utility-based framework for making repeated scheduling decisions based on dynamically observed information about unscheduled jobs and the system's resources. This framework generalizes the standard scheduling problem to a resource-constrained environment, where resource allocation (RA) decisions (how many CPUs to allocate to each job) have to be made concurrently with the scheduling decisions (when to execute each job). Discrete-time Optimal Control theory is used to formulate the optimization problem of finding the scheduling/RA policy that maximizes the average utility per time step obtained from completed jobs. We propose a Reinforcement Learning (RL) architecture for solving the NP-hard Optimal Control problem in real time, and our experimental results demonstrate the feasibility and benefits of the proposed approach.
Content Areas: Reinforcement Learning, Scheduling, Simulation, Fuzzy Logic

This material is based upon work supported by DARPA under Contract No. NBCH3039002.
1 Introduction
The classical real-time scheduling problem is that of deciding on the order in which the currently unscheduled jobs should be executed by a single processor, with no possibility of processor sharing. In the case of hard real-time systems, each job is characterized by an execution time and a deadline. A soft real-time system is one where each job receives some utility as a function of its completion time. The concept of scheduling jobs based on Time Utility Functions (TUFs) was first introduced by Jensen in [3], where the objective was maximizing the total utility accrued by the system over time. The utility accrual (UA) paradigm is a generalization of deadline scheduling in hard real-time systems. An important benefit of the UA paradigm is that it allows one to optimize the productivity of the system: the average utility from completed jobs per unit of time ([4]). Recently, the originator of the UA paradigm, E. D. Jensen, co-authored a paper [10] that claims to present the only scheduling algorithm able to handle arbitrarily shaped TUFs and multi-unit resource constraints (multiple resource types, each having multiple units). This algorithm is based on the idea that since the future is unpredictable, it is best to use the “greedy” strategy of inserting as many “high utility” jobs into the schedule as early as possible. While this is a reasonable heuristic for a resource-constrained environment, it is possible to develop algorithms that do try to predict the future and hence can perform better than the greedy strategy.

Optimal Control theory addresses the problem of finding a policy that maximizes a possibly discounted sum of future rewards and hence can be used to address the scheduling problem within the UA paradigm. The discrete-time optimal control problem, also known as the Markov Decision Problem (MDP), can be solved using dynamic programming if the system model is known and the number of possible system states is small. Neither of these conditions is satisfied in real-world scheduling systems, and the Reinforcement Learning methodology ([1, 6]) was developed as a formal extension of dynamic programming to the case of unknown system models and large state spaces. We are not aware of any attempts to use the Optimal Control or MDP formulation for solving the real-time scheduling problem within the UA paradigm, in either a single-CPU or a multi-CPU environment.

This paper addresses the above shortcomings by presenting a general MDP framework and algorithms for solving scheduling problems within the UA paradigm in the multi-CPU environment. The key feature of the proposed scheduling approach is the approximation, using the Reinforcement Learning methodology, of a value function that computes the expected long-term productivity of the machine (average utility per time step obtained from all completed jobs) as a function of its current state. Once an approximation to this value function is obtained, any scheduling decision can be made that modifies the current machine state with the goal of increasing its state value. Experimental results implementing the above utility-based scheduling approach are presented, demonstrating a consistent increase in machine productivity over the standard scheduling approach in the UA domain. This benchmark approach lies at the heart of the algorithms presented in [2, 5, 10], suitably extended to handle multi-CPU jobs contending for a limited number of CPUs.
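The value-driven decision rule described above can be made concrete with a short sketch. The following minimal illustration (in Python, with hypothetical names such as `simulate` and `V` that are not from the paper) selects the scheduling action whose resulting machine state has the highest estimated long-term value:

```python
# Minimal sketch (not from the paper) of value-driven scheduling decisions:
# among the candidate actions, pick the one whose resulting machine state has
# the highest estimated long-term value V(s'). The value function V is assumed
# to be a learned approximator, such as the fuzzy rulebase of Section 3.2.

def choose_action(state, candidate_actions, simulate, V):
    """Return the candidate action leading to the highest-valued next state.

    state             -- current machine state (running jobs, free CPUs, waiting queue)
    candidate_actions -- e.g., schedule job i now, preempt job j, oversubscribe, wait
    simulate          -- hypothetical helper mapping (state, action) -> resulting state
    V                 -- learned value function: state -> expected long-term utility
    """
    best_action, best_value = None, float("-inf")
    for action in candidate_actions:
        value = V(simulate(state, action))
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```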
2 Problem Formulation
Each job arriving in the system requires a certain number of CPUs to be executed at the maximum rate, but it can be executed with fewer CPUs at a slower rate. The final job utility decays as its waiting+execution time increases. Consequently, if only a small number of CPUs are idle in the system, a question arises whether a new job that ideally requires more CPUs should still be scheduled, receiving a small final utility due to its prolonged execution time. Alternatively, the scheduler can wait until more CPUs become available to ensure a shorter execution time, while risking a long waiting time for this job and, once again, a low final utility. Another possible resource management decision in such a situation is to suspend some of the currently running jobs or compact them onto a smaller number of CPUs (extending the execution time of each job), freeing up enough CPUs for the waiting job to be executed at a fast rate. This paper presents a novel utility-based scheduling framework capable of adequately resolving the tradeoffs mentioned above.

An overview of the considered problem is given below; more details are given in Section 4. The job execution time, if the job is assigned the required number of CPUs, varies among the jobs but is assumed to be known (this assumption is not necessary for the scheduling policies to work – job execution times can be random, and knowing only the mean execution time for each job would suffice). The unit utility of each job is a decreasing function of the job completion time, which includes the job waiting time. The final utility of each job is its unit utility multiplied by K*L, where K is its desired number of CPUs and L is its ideal execution time if the job were to receive K CPUs. The K*L factor is introduced to reflect the assumption that larger and longer jobs should receive the same scheduling priority as smaller and shorter jobs – that is, to make the scheduler indifferent between scheduling one job requiring K CPUs with an ideal execution time of L and scheduling K*L jobs, each of which requires one CPU and has an ideal execution time of one unit of time. Jobs are assumed to arrive randomly into the system. The goal is to continually schedule incoming jobs onto the system and to decide how many CPUs should be allocated to each job so as to maximize the average system productivity (final utility from all completed jobs) per time step.
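As an illustration of this utility model, the sketch below computes a job's final utility. The linearly decaying unit-utility function is an assumption made only for this example; the formulation requires merely that unit utility decrease with completion time.

```python
# Illustrative computation of a job's final utility under the formulation above.
# The linear time-utility function below is an assumption for this example only.

def unit_utility(completion_time, horizon=100.0):
    """Example TUF: utility 1.0 at completion time 0, decaying linearly to 0."""
    return max(0.0, 1.0 - completion_time / horizon)

def final_utility(wait_time, exec_time, K, L):
    """Final utility = unit utility at (waiting + execution) time, scaled by K*L.

    K -- number of CPUs the job ideally requires
    L -- ideal execution time if the job receives all K CPUs
    """
    return unit_utility(wait_time + exec_time) * K * L

# A job desiring K=4 CPUs for L=10 time units that waited 5 steps and then ran
# for 20 steps (on fewer CPUs than desired) earns 0.75 * 4 * 10 = 30 units.
print(final_utility(wait_time=5, exec_time=20, K=4, L=10))
```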
3 Solution Methodology

3.1 Overview
The basic scheduling algorithm uses the best-fit technique: the machine selects as the next job to be scheduled the one that fits its free resources most tightly (the fewest free CPUs will remain after the job is scheduled). If there is a tie among the jobs that provide the best fit to the available resources, the job with the highest unit utility gets scheduled.

In addition to this basic scheduling algorithm, this paper considers two types of utility-based policies, which apply when the machine cannot fit any of the currently waiting jobs. The first, the preemption policy, decides whether one or several of the jobs currently running on the machine should be temporarily suspended in order to allow one of the waiting jobs to be scheduled with all the CPUs it desires. The suspended jobs are placed in the waiting queue and can later be resumed from the point where they left off. The second, the oversubscribing policy, decides whether any of the currently waiting jobs should be “squeezed” into the remaining free CPUs even if there are fewer of them than the job ideally desires. Both of these policies make scheduling decisions by selecting the job configuration that provides the best starting point for the machine's future operation in terms of maximizing the expected long-term utility (value) from all jobs completed in the future.

Thus, the key component of the proposed scheduling framework is the ability to learn the value function of a machine based on the state of its resources, the jobs it is executing, and the jobs waiting to be scheduled. Once such a value function is obtained, any kind of scheduling decision can be considered (not just preemption and oversubscribing), driven by maximization of the machine's state value following the scheduling decision. Any kind of parameterized value function approximation can be used in the proposed scheduling framework, since the only criterion for its compatibility with the reinforcement learning methodology used to tune its parameters is differentiability of the function with respect to each tunable parameter. As a demonstration, a parameterized fuzzy rulebase will be used in this paper to approximate machine value functions.
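A minimal sketch of the best-fit selection rule described above is given below (in Python); the `Job` container and its field names are hypothetical, since the paper does not prescribe a particular data structure:

```python
# Sketch of the best-fit rule: among the waiting jobs that fit into the free
# CPUs, pick the one leaving the fewest CPUs idle, breaking ties by unit utility.

from dataclasses import dataclass

@dataclass
class Job:                 # hypothetical container, not from the paper
    cpus_desired: int      # K, the number of CPUs the job ideally requires
    unit_utility: float    # current unit utility of the job

def best_fit_job(waiting_jobs, free_cpus):
    """Return the tightest-fitting waiting job, or None if no job fits.

    When None is returned, the preemption or oversubscribing policy is consulted.
    """
    fitting = [j for j in waiting_jobs if j.cpus_desired <= free_cpus]
    if not fitting:
        return None
    # Fewest remaining free CPUs first; higher unit utility breaks ties.
    return min(fitting, key=lambda j: (free_cpus - j.cpus_desired, -j.unit_utility))
```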
3.2 Fuzzy Rulebase
A fuzzy rulebase is a function f that maps an input vector x ∈ ℝⁿ into a scalar output.
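To illustrate one such parameterized and differentiable approximator, the sketch below uses a common Takagi-Sugeno style rulebase with Gaussian memberships and tunable singleton conclusions; this particular form is an assumption made for illustration and is not necessarily the exact rulebase used in the paper. Because the output is a smooth function of the conclusion parameters, gradient-based reinforcement learning updates can tune them.

```python
import numpy as np

# Hedged sketch of a parameterized fuzzy rulebase in the common Takagi-Sugeno
# style: the output is a firing-strength-weighted average of per-rule tunable
# conclusions theta. The membership functions and rule layout are assumptions.

def gaussian_membership(x, center, width=0.25):
    """Degree to which an input matches a rule's center in each dimension."""
    return np.exp(-((x - center) / width) ** 2)

def fuzzy_rulebase(x, centers, theta):
    """Map an input vector x to a scalar output.

    x       -- input vector of shape (n,)
    centers -- rule centers, array of shape (num_rules, n)
    theta   -- tunable rule conclusions, array of shape (num_rules,);
               the output is differentiable in theta, as required for RL tuning.
    """
    # Firing strength of each rule: product of per-dimension membership degrees.
    w = np.prod(gaussian_membership(x[None, :], centers), axis=1)
    return float(np.dot(w, theta) / (np.sum(w) + 1e-12))

# Example usage: a 2-dimensional input evaluated against 3 rules.
y = fuzzy_rulebase(np.array([0.2, 0.7]), np.random.rand(3, 2), np.zeros(3))
```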