As Soon as Probable: Optimal Scheduling under Stochastic Uncertainty

Jean-François Kempf, Marius Bozga, and Oded Maler
CNRS-VERIMAG, University of Grenoble, France
@imag.fr
Abstract. In this paper we continue our investigation of stochastic (and hence dynamic) variants of classical scheduling problems. Such problems can be modeled as duration probabilistic automata (DPA), a well-structured class of acyclic timed automata where temporal uncertainty is interpreted as a bounded uniform distribution of task durations [18]. In [12] we have developed a framework for computing the expected performance of a given scheduling policy. In the present paper we move from analysis to controller synthesis and develop a dynamic-programming style procedure for automatically synthesizing (or approximating) expected-time optimal schedulers, using an iterative computation of a stochastic time-to-go function over the state and clock space of the automaton.
1 Introduction
The allocation over time of reusable resources to competing tasks is a problem of almost universal importance, manifested in numerous application domains ranging from manufacturing (job-shop) to program parallelization (task-graph). When the system admits uncertainty, for example in task arrival time or duration, one cannot execute a fixed time-triggered schedule but needs to develop a scheduling policy (or strategy) that prescribes scheduling decisions at the different states that the system may find itself in. The general problem falls into the scope of controller synthesis for timed systems [5] and its extensions toward time-optimality [4] and cost-optimality [14]. The reader is referred to [1] for a general framework for modeling and solving dynamic scheduling problems based on a restricted class of timed automata and to [17] for a more general exposition of control in the presence of adversaries.

The above-mentioned work treats temporal uncertainty in the set-theoretic and worst-case tradition: a duration of a task can be any number in a given interval (no probability assigned) and the strategy is evaluated according to the worst performance induced by an instance of the external environment. Duration probabilistic automata (DPA) have been introduced in [18] to express stochastic scheduling problems and evaluate the expected performance of schedulers by methods similar to techniques used for generalized semi-Markov processes (GSMP) [10, 11, 7] such as [6, 2]. This approach refines the successor operator used in the verification of timed automata [9, 15] from a zone transformer into a density transformer and can compute, for example, the time distribution of reaching the final state and hence the expected termination time under a given scheduler. In [12] we developed an alternative clock-free procedure for evaluating the performance of schedulers by computing volumes of zones in the duration space.

In the present paper we move to the synthesis problem, where we seek an optimal scheduling policy which decides in what global situations to start an enabled step of a process and under what conditions to wait and let other processes use the resource first. To this end we define a stochastic time-to-go function which assigns to any state of the schedule (global state of the automaton and the values of active clocks) the density of the time to total termination starting from this state and following the optimal strategy. These functions are piecewise-continuous and we show how they can be computed backwards from the final state.

The rest of the paper is organized as follows. Section 2 provides preliminary definitions. Section 3 introduces single processes, shows how to model them by automata, and solves the (degenerate) optimal scheduling problem by backward value iteration. In Section 4 we introduce several processes running in parallel and admitting resource conflicts. We model them as products of automata and define schedulers that, in each state, resolve the non-determinism associated with the decision whether to start a step. In Section 5 we define the basic iterative step for computing the value (optimal expected time-to-go) in a state of the automaton based on the value of its successors. The value/policy iteration algorithm using these operators defines the optimal strategy. In Section 6 we study the computational aspects of the algorithm. First we characterize the class of time densities obtained by applying the passive race-analysis part of the iteration. Then we prove a non-laziness property of optimal schedulers for this class of problems, which facilitates the approximate solution of the optimization part of the iteration. A discussion of future work closes the paper.
2 Preliminaries
Definition 1 (Bounded Support Time Density). A time density is a function ψ : R⁺ → R⁺ satisfying

∫_0^∞ ψ[t] dt = 1.
A time density is of bounded support when ψ[t] ≠ 0 iff t ∈ I for some interval I = [a, b]. A partial time density satisfies the weaker condition ∫ ψ[t] dt < 1. A bounded-support time density is uniform if ψ[t] = 1/(b − a) inside its support [a, b]. We use a non-standard notation for distributions:

ψ[≤ t] = ∫_0^t ψ[t′] dt′        ψ[> t] = 1 − ψ[≤ t]
with ψ[≤ t] indicating the probability of a duration which is at most t. We use c to denote the "deterministic" (Dirac) density which gives the constant c with probability 1. The expected value of a time density ψ is E(ψ) = ∫ ψ[t] · t dt. We will use time densities to specify durations of tasks (process steps) as well as the remaining time to termination given some state of the system. To this end we need the following operators on densities: 1) Convolution, to characterize the duration of two or more tasks composed sequentially, and 2) Shift, to reflect the change in the remaining time given that the process has already spent some amount of time x.

Definition 2 (Convolution and Shift). Let ψ, ψ1 and ψ2 be uniform densities supported by I = [a, b], I1 = [a1, b1] and I2 = [a2, b2], respectively.

– The convolution ψ1 ∗ ψ2 is a density ψ′ supported by I′ = I1 ⊕ I2 = [a1 + a2, b1 + b2] and defined as

ψ′[t] = ∫_0^t ψ1[t′] ψ2[t − t′] dt′
– The residual density (shift) of ψ relative to a real number 0 ≤ x < b is ψ′ = ψ/x such that ψ′[t] = ψ[x + t] · γ_{a,b}(x) where

γ_{a,b}(x) = { 1                 if 0 ≤ x ≤ a
               (b − a)/(b − x)   if a < x < b

When x < a, ψ/x is a simple shift of ψ. When x > a we already know that the actual duration is at least x (restricted to the sub-interval [x, b]) and hence we need to normalize. Note also that (ψ ∗ c)[t] = (c ∗ ψ)[t] = ψ[t − c] and that 0 is the identity element for convolution. We write ψ′ = ψ/x more explicitly as

ψ′[t] = { 0           when x + t ≤ a
          1/(b − a)   when x ≤ a, a < x + t ≤ b
          1/(b − x)   when a < x, x + t ≤ b
          0           when b < x + t

One can verify that the shift satisfies ψ/x[t − x] = γ_{a,b}(x) · ψ[t] and (ψ/x)/y = ψ/(x + y). Note that the expressive weakness (and computational advantage) of the exponential distribution is due to ψ/x = ψ. A subset of a hyper-rectangle is called a zone if it is expressible as a conjunction of orthogonal and difference constraints, namely constraints of the form x_i ≤ c, x_i − x_{i′} ≤ c, etc.
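As a concrete illustration (ours, not part of the paper's formal development), the two operators can be approximated numerically by representing densities on a discretized time axis; the helper names below (`uniform`, `convolve`, `shift`, `expect`) are hypothetical:

```python
import numpy as np

DT = 0.001                        # grid step
T = np.arange(0.0, 20.0, DT)      # time axis

def uniform(a, b):
    """Uniform density on [a, b], sampled on the grid."""
    return np.where((T >= a) & (T <= b), 1.0 / (b - a), 0.0)

def convolve(psi1, psi2):
    """(psi1 * psi2)[t] = int_0^t psi1[t'] psi2[t - t'] dt'."""
    return np.convolve(psi1, psi2)[:len(T)] * DT

def shift(psi, a, b, x):
    """Residual density psi/x for psi uniform on [a, b], 0 <= x < b."""
    gamma = 1.0 if x <= a else (b - a) / (b - x)   # normalization factor
    return np.interp(T + x, T, psi, right=0.0) * gamma   # psi[x + t] * gamma

def expect(psi):
    return float(np.sum(psi * T) * DT)

psi1, psi2 = uniform(1, 3), uniform(2, 5)
print(expect(convolve(psi1, psi2)), expect(psi1) + expect(psi2))  # both ~5.5
print(float(np.sum(shift(psi1, 1, 3, 2.0)) * DT))                 # ~1.0: still a full density
```

The first check reflects E(ψ1 ∗ ψ2) = E(ψ1) + E(ψ2); the second that the residual density, thanks to the γ factor, integrates to 1.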
3 Processes in Isolation
Processes in our model are inspired by the jobs in the job-shop problem. Each process consists of an ordered sequence of steps, indexed by K = {1, . . . , k}, such that a step can start executing only after its predecessor has terminated.¹

Definition 3 (Process). A sequential stochastic process is a pair P = (I, Ψ) where I = {I_j}_{j∈K} is a sequence of duration intervals and Ψ = {ψ_j}_{j∈K} is a matching sequence of densities, with ψ_j being the uniform density over I_j = [a_j, b_j], indicating the duration of step j. Probabilistically speaking, step durations can be viewed as a finite sequence of independent uniform random variables {y_j}_{j∈K} that we denote as points y = (y_1, . . . , y_k) ranging over a duration space D = I_1 × · · · × I_k ⊆ R^k with density ψ(y_1, . . . , y_k) = ψ_1(y_1) · · · ψ_k(y_k). A state-based representation of a process is given by simple DPA.
¹ See [1] for a straightforward generalization to partial-order precedence constraints.
Definition 4 (SDPA). A simple duration probabilistic automaton (SDPA) of k steps is a tuple A = (Σ, Q, {x}, Y, ∆) where Σ = Σs ⊎ Σe is the alphabet of start and end actions with Σs = {s_1, . . . , s_k} and Σe = {e_1, . . . , e_k}. The state space is an ordered set Q = {q̄_1, q_1, q̄_2, . . . , q_k, q̄_{k+1}}, with the q̄_j states considered idle and the q_j states active; x is a clock variable and Y = {y_1, . . . , y_k} is a set of auxiliary random variables. The transition relation ∆ consists of tuples of the form (q, g, r, q′) with q and q′ being the source and target of the transition, g a guard, a precondition (external or internal) for the transition, and r an action (internal or external) accompanying the transition. The transitions are of two types:

1. Start transitions: for every idle state q̄_j, j < k + 1, there is one transition of the form (q̄_j, s_j, {x}, q_j). The transition, triggered by a scheduler command s_j, activates clock x and sets it to zero;
2. End transitions: for every active state q_j, there is a transition, conditioned by the clock value, of the form (q_j, x = y_j, e_j, q̄_{j+1}). This transition renders clock x inactive and outputs an e_j event.
[Figure: a chain alternating idle states q̄_j and active states q_j, with start transitions s_j resetting the clock, end transitions guarded by x = y_j, and durations drawn as y_j := ψ_j().]

Fig. 1. A simple DPA.
For each step j we draw a duration y_j according to ψ_j. Upon a scheduler command s_j the automaton moves from a waiting state q̄_j to an active state q_j in which clock x advances with derivative 1. The end transition is taken when x = y_j, that is, y_j time after the corresponding start transition. An extended state of the automaton is a pair (q, x) consisting of a discrete state and a clock value which represents the time elapsed since the last start transition. The extended state-space of the SDPA is thus S = {(q̄_j, ⊥) : j ≤ k + 1} ∪ {(q_j, x) : j ≤ k ∧ x ≤ b_j} where ⊥ indicates the inactivity of the clock in waiting/idle states. Note the difference between transition labels s_j and e_j: the start transitions are controllable and are issued by the scheduler that we want to optimally synthesize, while the end transitions are generated by the uncontrolled external (to the scheduler) environment, represented by random variable y_j. Without a scheduler, the SDPA is underdetermined and can issue a start transition at any time. The derivation of an optimal scheduler is done via the computation of a time-to-go function that we first illustrate on the degenerate case of one process in isolation. In this case each state has only one successor and any waiting between steps unnecessarily increases the time to termination.

Definition 5 (Local Stochastic Time to Go). The local stochastic time-to-go function associates with every state (q, x) a time density µ(q, x) with µ(q, x)[t] indicating the probability to terminate within t time given that we start from (q, x) and apply the optimal strategy. This function admits the following inductive definition:

µ(q̄_{k+1}, ⊥) = 0                                                  (1)
µ(q̄_j, ⊥) = µ(q_j, 0)                                              (2)
µ(q_j, x)[t] = ∫_0^t ψ_j/x[t′] · µ(q̄_{j+1}, ⊥)[t − t′] dt′          (3)
Line (1) indicates the final state while (2) comes from the fact that in the absence of conflicts the optimal scheduler need not wait and should start each step immediately when enabled. Equation (3) computes the probability for termination at t based on the probabilities of terminating the current step in some t′ and of the remaining time-to-go being t − t′. It can be summarized in functional form as

µ(q_j, x) = ψ_j/x ∗ µ(q̄_{j+1}, ⊥) = ψ_j/x ∗ µ(q_{j+1}, 0)           (4)
The successive application of (4) yields, not surprisingly, µ(q_1, 0) = ψ_1 ∗ · · · ∗ ψ_k.

Definition 6 (Local Expected Time to Go). The expected time-to-go function is V : Q × X → R⁺ defined as

V(q, x) = ∫ µ(q, x)[t] · t dt = E(µ(q, x)).

This measure satisfies V(q_j, x) = E(ψ_j/x) + V(q_{j+1}, 0) where the first term is the expected termination time of step j starting from x. For the initial state this yields

V(q_1, 0) = E(ψ_1 ∗ · · · ∗ ψ_k) = E(ψ_1) + · · · + E(ψ_k) = ∑_{j=1}^{k} (a_j + b_j)/2.
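This closed form is easy to evaluate directly. The following small sketch (ours, with arbitrary step intervals) computes the local expected time-to-go of Definition 6, with E(ψ_j/x) obtained from the two cases of the shift operator:

```python
def expected_residual(a, b, x):
    # E(psi/x): (a+b)/2 - x while x <= a, (b-x)/2 once x has passed a
    return (a + b) / 2.0 - x if x <= a else (b - x) / 2.0

def local_value(steps, j, x):
    # V(q_j, x) = E(psi_j / x) + V(q_{j+1}, 0), the tail being a sum of midpoints
    a, b = steps[j]
    tail = sum((ai + bi) / 2.0 for ai, bi in steps[j + 1:])
    return expected_residual(a, b, x) + tail

steps = [(1, 3), (2, 5), (0, 4)]          # arbitrary intervals [a_j, b_j]
print(local_value(steps, 0, 0.0))          # (1+3)/2 + (2+5)/2 + (0+4)/2 = 7.5
print(local_value(steps, 0, 2.5))          # clock past a_1: (3-2.5)/2 + 3.5 + 2.0
```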
4 Conflicts and Schedulers
We extend the model to n processes, indexed by N = {1, . . . , n}, that may run in parallel except for steps which are mutually conflicting due to the use of the same resource. For simplicity of notation we assume all processes have the same number k of steps.

Definition 7 (Process System). A process system is a triple (P, M, h) where P = P^1 ‖ · · · ‖ P^n = {(I^i, Ψ^i)}_{i∈N} is a set of processes, M is a set of resources, and h : N × K → M is a function which assigns to each step the resource that it uses.
We use the notation P^i_j to refer to step j of process i, and ψ^i_j and I^i_j = [a^i_j, b^i_j] for the respective densities and their support intervals. Likewise we denote the corresponding controllable and uncontrollable actions by s^i_j and e^i_j, respectively. Without loss of generality we assume there is one instance of each resource type, hence any two steps P^i_j and P^{i′}_{j′} such that h(i, j) = h(i′, j′) are in conflict and cannot execute simultaneously. Each process is modeled as an SDPA A^i = (Σ^i, Q^i, {x^i}, Y^i, ∆^i) and the global system is obtained as a product of those, restricted to conflict-free states. We write global states as q = (q^1, . . . , q^n) and exclude states where for some i and i′, q^i = q^i_j, q^{i′} = q^{i′}_{j′} and steps P^i_j and P^{i′}_{j′} are conflicting. We say that action s^i_j (respectively, e^i_j) is enabled in q if q^i = q̄^i_j (resp. q^i = q^i_j). Since only one transition per process is possible in a global state, we will sometimes drop the j-index and refer to those as s^i and e^i.

Definition 8 (Duration Probabilistic Automata). A duration probabilistic automaton (DPA) is a composition A = A^1 ◦ · · · ◦ A^n = (Σ, Q, X, Y, ∆) of n SDPA with the action alphabet being Σ = ∪_i Σ^i. The discrete state space is Q ⊆ Q^1 × · · · × Q^n (with forbidden states excluded). The set of clocks is X = {x^1, . . . , x^n}, the extended state-space is S ⊆ S^1 × · · · × S^n and the auxiliary variables are Y = ∪_i Y^i, ranging over the joint duration space D = D^1 × · · · × D^n. The transition relation ∆ is built using interleaving, that is, a transition (q, g, r, q′) from q = (q^1, . . . , q^i, . . . , q^n) to q′ = (q^1, . . . , q′^i, . . . , q^n) exists in ∆ if a transition (q^i, g, r, q′^i) exists in ∆^i, provided that q′ is not forbidden.

The DPA thus defined (see Fig. 2-(a)) is not probabilistically correct as it admits non-determinism of a non-probabilistic nature: in a given state the automaton may choose between several start transitions or decide to wait for an end transition (the termination of an active step). A scheduler selects one action in any state and then the only non-determinism that remains is due to the probabilistic task durations. A discussion of different types of schedulers can be found in [12].

Definition 9 (Scheduler). A scheduler for a DPA A is a function Ω : S → Σs ∪ {w} such that for every s ∈ Σs, Ω(q, x) = s only if s is enabled in q, and Ω(q, x) = w (wait) only if q admits at least one active component.

Composing the scheduler with the DPA (see Fig. 2-(b)) renders it input-deterministic in the sense that any point y ∈ D induces a unique² run of the automaton.

Definition 10 (Steps, Runs and Successors). The steps of a controlled DPA A ◦ Ω, induced by a point y ∈ D, are of the following types:

– Start steps: (q, x) −s^i_j→ (q′, x′) iff q^i = q̄^i_j and Ω(q, x) = s^i_j;
– End steps: (q, x) −e^i_j→ (q′, x′) iff q^i = q^i_j and x^i = y^i_j;
– Time steps: (q, x) −t→ (q, x + t) iff ∀i (q^i = q^i_j ⇒ x^i + t < y^i_j).
² We define a priority order among the e^i-actions so that in the (measure-zero) situation where two actions are taken simultaneously we impose the order to guarantee a unique run and avoid artifacts introduced by the interleaving semantics.
The run associated with y is a sequence of steps starting at (q̄^1_1, . . . , q̄^n_1) and ending in (q̄^1_{k+1}, . . . , q̄^n_{k+1}). The t-i-successor of a state (q, x), denoted by σ^i(t, q, x), is the state (q′, x′) reached from (q, x) after a time step of duration t followed by a transition of P^i. The duration of a run is the sum of the time steps and it coincides with the termination time of the last process, known as the makespan in the scheduling literature.
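To make the run semantics concrete, the following Monte-Carlo sketch (our illustration, not the paper's algorithm) simulates a two-process system whose second steps conflict on one resource, under a FIFO scheduler that starts each step as soon as its predecessor is done and the resource is free (as in Fig. 2-(b)), and estimates the expected makespan by sampling the duration space:

```python
import random

def simulate(p1, p2, seed):
    """p1, p2: lists of (a, b) step intervals; the second steps share a resource."""
    rng = random.Random(seed)
    y1 = [rng.uniform(a, b) for a, b in p1]   # the point y in D for P^1
    y2 = [rng.uniform(a, b) for a, b in p2]   # and for P^2
    t1, t2 = y1[0], y2[0]                     # first steps run in parallel from time 0
    if t1 <= t2:                              # P^1 wins the race for the resource
        end1 = t1 + y1[1]
        end2 = max(t2, end1) + y2[1]          # P^2 may have to wait for the resource
    else:
        end2 = t2 + y2[1]
        end1 = max(t1, end2) + y1[1]
    return max(end1, end2)                    # makespan: termination of the last process

runs = [simulate([(1, 3), (2, 4)], [(2, 5), (1, 2)], s) for s in range(100_000)]
print(sum(runs) / len(runs))                  # Monte-Carlo estimate of the expected makespan
```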
5 Expected Time Optimal Schedulers
In [12] we developed a method to compute the expected termination time under a given scheduler by computing volumes in the duration space. We now show how to optimally synthesize such schedulers from an uncontrolled DPA description. To this end we extend the formulation of stochastic time-to-go from a single process (3) to multiple processes (7). We define a partial order relation on global states based on the order on the local states, with q ⪯ q′ if for every i, q^i ⪯ q′^i. For extended states we let (q, x) ⪯ (q′, x′) if either q ≺ q′ or (q = q′ ∧ x ≤ x′). The forward cone of a state q is the set of all states q′ such that q ≺ q′. The immediate successors of a state q are the states reachable from it by one transition. A partial scheduler is a scheduler defined only on a subset of Q. To optimize the decision of the scheduler in a state we need to compare the effect of the possible actions on the time-to-go.

Definition 11 (Local Stochastic Time-to-Go). Let A be a DPA with a partial strategy whose domain includes the forward cone of a state q. With every i, x and every s ∈ Σs ∪ {w} enabled in q, the time density µ^i(q, x, s) : R⁺ → [0, 1] characterizes the stochastic time-to-go for process P^i if the controller issues action s at state (q, x) and continues from there according to the partial strategy.

Note that for any successor q′ of q the optimal action has already been selected and we denote its associated time-to-go by µ^i(q′, x′). Once µ^i(q, x, s) has been computed for every i, the following measures, all associated with action s, can be derived from it.

Definition 12 (Global Stochastic Time-to-Go). With every state (q, x) and action s enabled in it, we define

– The stochastic time-to-go for total termination (makespan), which is the expected duration of the last task: µ(q, x, s) = max{µ^1(q, x, s), . . . , µ^n(q, x, s)};
– The expected total termination time: V(q, x, s) = ∫ t · µ(q, x, s)[t] dt
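For intuition only: when the termination times happen to be independent, the distribution of their max is the product of the individual distribution functions, and the density of the max follows by differentiation; the numeric sketch below (ours) computes it this way. In the paper's setting the µ^i at a common state are in general dependent, which is precisely what the race analysis below accounts for.

```python
import numpy as np

DT = 0.001
T = np.arange(0.0, 20.0, DT)

def cdf(psi):
    return np.cumsum(psi) * DT

def density_of_max(densities):
    F = np.prod([cdf(p) for p in densities], axis=0)  # F_max(t) = prod_i F_i(t)
    return np.gradient(F, DT)                          # differentiate back to a density

psi1 = np.where((T >= 1) & (T <= 3), 0.5, 0.0)         # uniform on [1, 3]
psi2 = np.where((T >= 2) & (T <= 5), 1.0 / 3.0, 0.0)   # uniform on [2, 5]
mu = density_of_max([psi1, psi2])
print(float(np.sum(mu * T) * DT))                      # expected value of the max
```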
The computation of µ for a state q, based on the stochastic time-to-go of its successors, is one of the major contributions of the paper. The hard part is the computation of the time-to-go associated with waiting in a state where several processes are active. In this race situation the automaton may leave q via different transitions and µ should be computed based on the probabilities of these transitions (and their timing) and the time-to-go from the respective successor states.
[Figure: two SDPA in parallel and their product, with the conflict state removed in (a) and the scheduler-resolved automaton in (b).]

Fig. 2. (a) The DPA for two parallel processes admitting a resource conflict between their respective second steps, with the conflict state (q^1_2, q^2_2) removed. The dashed arrows indicate start transitions which should be under the control of a scheduler while the dotted arrows indicate post-conflict start transitions; (b) the automaton resulting from composition with a FIFO scheduler which starts a step as soon as it is enabled.
With each state (q, x) we associate a family {ρ^i(q, x)}_{i∈N} of partial time densities with the intended meaning that ρ^i(q, x)[t] is (the density of) the probability that the first process to terminate its current step is P^i and that this occurs within t time. This is relative to the fact that the time elapsed since the initiation of each active step is captured by the respective clock value in x.³

³ Note that the dependence on the time already elapsed is in sharp contrast with the memoryless exponential distribution, where this time does not matter for the future. For those distributions the time-to-go is associated only with the discrete state and is much easier to compute; see [1] for the derivation of optimal schedulers for timed automata with exponential durations.

Definition 13 (Race Winner). Let q be a state where n processes are active, each in a step admitting a time density ψ^i. With every clock valuation x = (x^1, . . . , x^n) ≤ (b^1, . . . , b^n) and every i we associate the partial density:

ρ^i(q, x)[t] = ψ^i/x^i[t] · ∏_{i′≠i} ψ^{i′}/x^{i′}[> t]

This is the probability that P^i terminates in t time and every other process P^{i′} terminates within some t′ > t.

Definition 14 (Computing Stochastic Time-to-Go). For every i, the function µ^i is defined inductively as

µ^i((. . . , q̄^i_{k+1}, . . .), x) = 0                                                   (5)
µ^i(q, x, s^{i′}) = µ^i(σ^{i′}(0, q, x))                                                 (6)
µ^i(q, x, w)[t] = ∑_{i′=1}^{n} ∫_0^t ρ^{i′}(q, x)[t′] · µ^i(σ^{i′}(t′, q, x))[t − t′] dt′   (7)
For any global state where P^i is in its final state, µ^i is zero (5). Each enabled start action s^{i′} leads immediately to the successor state and the cost-to-go is inherited from there (6). For waiting, we take the convolution between the probability of P^{i′} winning the race and the value of µ^i in the post-transition state, and sum up over all the active processes (7). The basic iterative step in computing the value function and strategy is summarized in Algorithm 1. A dynamic programming algorithm starting from the final state and applying the above procedure will produce the expected-time optimal strategy. Since we are dealing with acyclic systems, the question of convergence to a fixed point does not arise and the only challenge is to show that the defined operators are computable.
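A numeric sanity check (ours, with hypothetical helper names) of the race densities of Definition 13 for two active steps: the ρ^i must sum up to a full density, since some process wins the race with probability 1. The full operator (7) would additionally convolve each ρ^{i′} with the successor's µ^i.

```python
import numpy as np

DT = 0.0005
T = np.arange(0.0, 10.0, DT)

def residual(a, b, x):
    """psi/x of Definition 2 for psi uniform on [a, b], clock already at x."""
    lo, hi = max(a - x, 0.0), b - x
    return np.where((T >= lo) & (T <= hi), 1.0 / (hi - lo), 0.0)

def survival(psi):
    """psi[> t] = 1 - psi[<= t]."""
    return 1.0 - np.cumsum(psi) * DT

psi1 = residual(1.0, 4.0, 0.5)   # P^1: step on [1, 4], clock x^1 = 0.5
psi2 = residual(2.0, 3.0, 0.0)   # P^2: step on [2, 3], clock x^2 = 0
rho1 = psi1 * survival(psi2)     # P^1 finishes first, at time t
rho2 = psi2 * survival(psi1)     # P^2 finishes first, at time t
print(float((np.sum(rho1) + np.sum(rho2)) * DT))   # ~1: the race has a winner
```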
6 Computing Optimal Strategies
Algorithm 1 splits into three parts, the third being merely book-keeping. In the first we essentially compute the outcome of waiting (by race analysis) and of starting. As we shall see, starting from a specific class of time densities, this computation can be done in a symbolic/analytic way, resulting in closed-form expressions over x and t for the values of µ^i, µ and V associated with each action. The second part involves optimization: classifying extended states according to the action that optimizes V in them. For this part we prove a monotonicity property which facilitates the task of approximating the boundaries between the optimality domains of the various actions.
Algorithm 1: Value Iteration
Input: A global state q such that Ω(q′, x) and µ^i(q′, x) have been computed for each of its successors q′ and every i
Output: Ω(q, x) and µ^i(q, x)

% COMPUTE:
forall s ∈ Σs ∪ {w} enabled in q
    for i = 1..n compute µ^i(q, x, s) according to (6)-(7) end
    compute µ(q, x, s) (max of random variables)
    compute V(q, x, s) (expected makespan)
end
% OPTIMIZE:
forall x ∈ Z_q
    V(q, x) = min_s V(q, x, s)
    s* = argmin_s V(q, x, s)
    Ω(q, x) = s*
end
% UPDATE:
for i = 1..n
    µ^i(q, x) = µ^i(q, x, s*)
end
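The OPTIMIZE part can be pictured as the following grid sweep (a schematic sketch of ours; `V_s` and `V_w` stand for the already-computed values V(q, x, s) and V(q, x, w), and the stand-in functions below are made up):

```python
import numpy as np

def optimize_on_grid(V_s, V_w, bounds, eps):
    """One OPTIMIZE sweep: pick the cheaper action at every grid point of Z_q."""
    axes = [np.arange(lo, hi + eps / 2, eps) for lo, hi in bounds]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, len(bounds))
    return {tuple(x): ("s" if V_s(x) <= V_w(x) else "w") for x in grid}

# made-up stand-ins for a 2-clock state, just to exercise the sweep
V_s = lambda x: 5.0 - 0.5 * x[0] - 0.5 * x[1]
V_w = lambda x: 4.2 - 0.9 * x[0]
strategy = optimize_on_grid(V_s, V_w, bounds=[(0.0, 2.0), (0.0, 2.0)], eps=0.25)
print(sum(a == "w" for a in strategy.values()), "grid points prefer waiting")
```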
Each of the functions we have defined is in fact an infinite family of functions parameterized by a state q and clock valuation x ranging over the rectangle Z_q. They can be characterized as follows.

Definition 15 (Zone-Polynomial Time Densities). A function µ : Z → (R⁺ → [0, 1]) over a rectangular clock space Z is zone-polynomial if it can be written as

µ(x^1, . . . , x^n)[t] = { f_1(x^1, . . . , x^n)[t]  if Z_1(x^1, . . . , x^n) and l_1 ≤ t ≤ u_1
                           f_2(x^1, . . . , x^n)[t]  if Z_2(x^1, . . . , x^n) and l_2 ≤ t ≤ u_2
                           ...
                           f_L(x^1, . . . , x^n)[t]  if Z_L(x^1, . . . , x^n) and l_L ≤ t ≤ u_L

where

– For every r, Z_r(x^1, . . . , x^n) is a zone included in the rectangle Z, which moreover satisfies either Z_r ⊆ [x^i ≤ a^i] or Z_r ⊆ [a^i ≤ x^i], for every i = 1..n.
– For every r, the bounds l_r, u_r of the t interval are either nonnegative integers c or terms of the form c − x^i, with i = 1..n, c ∈ Z⁺. Moreover, the interval [l_r, u_r] must be consistent with the underlying zone, that is, Z_r ⊆ [l_r ≤ u_r].
– For every r, f_r(x^1, . . . , x^n)[t] = ∑_k (P_k(x^1, . . . , x^n) / Q_r(x^1, . . . , x^n)) t^k where the P_k are arbitrary polynomials and Q_r is a characteristic polynomial associated with zone Z_r, defined as Q_r = ∏_i (b^i − max{x^i, a^i}). Note that for each zone, the max is attained uniformly as either a^i or x^i.

Theorem 1 (Closure of Zone-Polynomial Densities). Zone-polynomial time densities are closed under (6) and (7).

Sketch of Proof: Operation (6) is a simple substitution. Closure under summation is also evident: one just needs to refine the partitions associated with the summed functions to make them compatible and then apply the operation in each partition block. The only intricate operation is the quasi-convolution part of (7). The function µ^i(σ^{i′}(t′, q, x))[t − t′] is not a zone-polynomial time density. Due to the time progress by t′ enforced by the substitution σ^{i′}, it may happen that polynomials of the form (b^{i′} − (x^{i′} + t′)) appear in the denominators. But, in all feasible cases, they are simplified through multiplication by ρ^{i′}(q, x)[t′], which contains the same polynomials as factors (within ψ^{i′}/x^{i′}[> t′], see Def. 13). Hence, the integration over t′ is always trivially completed, as t′ occurs only in numerator polynomials and/or powers of the form t′^k and (t − t′)^k. Moreover, after integration, the remaining constraints on t and x can be rewritten to match the required form of zone-polynomial time densities.

Next we move to the optimization part of the iteration. Consider a state q where process P^i has to decide whether or not to start a step while other processes are active in their respective steps. The optimal strategy Ω in this state partitions the clock space of the other processes into Ω⁻¹(s^i) and Ω⁻¹(w), the extended states where starting or waiting, respectively, is preferred. The boundary corresponds to the set of solutions of the piecewise-polynomial equation V(q, x, s) = V(q, x, w). Our approach is to approximate the optimal strategy based on sampling: we cut the clock space Z_q into an ε-grid and for each grid point x we compare V(q, x, s) and V(q, x, w) and select the optimal value (Fig. 3-(a)). Once the optimal action has been computed for all grid points we complete the strategy for the rest of the clock space by selecting in each ε-cube the action which is optimal for its leftmost corner (Fig. 3-(b)). To estimate the deviation from the optimal strategy we first bound the derivative of V with respect to any of the clocks.

[Figure: grids over the clock space (x^2, x^3).]

Fig. 3. Approximating the optimal action for P^1 in the clock space of P^2 ‖ P^3: (a) computing the optimal action for all grid points, with the dark circle indicating Ω⁻¹(w); (b) approximating the optimal strategy, with the dark cubes indicating Ω′⁻¹(w).

Lemma 1 (Derivative of V). Let V be the value function associated with a problem. Then for every (q, x) and for every i, ∂V/∂x^i (q, x) ≥ −1.

Sketch of Proof: Consider first the local value function of a process in isolation. In any state q, the clock space splits into two parts. When x < a, the derivative is naturally −1. When x > a the rate of progress is slower (because progress is combined with accumulation of optimism-contradicting information) and the magnitude of the derivative is smaller.⁴ When running together, the progress of each process is bounded by its progress in isolation or by the progress of another process that blocks it. The progress of the expected termination of the last process (makespan) is bounded by the progress of each of the processes.

Lemma 2 (Approximation). Let x′ be a point in the clock space and let x < x′ be its nearest grid point on the left, hence d(x, x′) < ε. Assume without loss of generality that the optimal strategy Ω satisfies Ω(q, x) = s and Ω(q, x′) = w. Consider an approximate strategy Ω′ such that Ω′(q, x′) = s; then the respective value functions of the strategies satisfy V′(q, x′) − V(q, x′) ≤ ε.

Proof: According to what we know about the optimal strategy we have

V(q, x′, w) < V(q, x′, s) < V(q, x, s) < V(q, x, w)

and from Lemma 1, V(q, x, s) − V(q, x′, w) ≤ ε, and so is V(q, x′, s) − V(q, x′, w) = V′(q, x′) − V(q, x′).

Before we state the consequent main result, we prove an important property of optimal strategies which can reduce the number of grid points for which the optimal strategy should be computed. This "non-laziness" property, formulated in the deterministic setting in [1], simply captures the intuition that preventing an enabled process from taking a resource is not useful unless some other process benefits from the resource during the waiting period.⁵ The proof in a non-deterministic setting is more involved.
⁴ The derivative of V represents the progress toward the average duration, minus the growth of the average itself when x > a.
⁵ Or, in scheduling under uncertainty, if some information gathered during the period has increased the expected time-to-go associated with waiting, which is impossible in our setting. Non-lazy schedules are also known as active schedules.
Definition 16 (Laziness). A scheduling policy Ω is lazy at (q, x) if Ω(q, x + t) = s^i for some i and Ω(q, x + t′) = w for every t′ ∈ [0, t). A schedule is non-lazy if no such (q, x) exists.⁶

⁶ This definition of laziness can be extended from a one-dimensional interval [x, x + t] to any pair of clock valuations x and x′ such that x′ is reachable from x by a time step.

Theorem 2 (Non-Lazy Optimal Schedulers). The optimal value V can be obtained by a non-lazy scheduler.

To prove the theorem we need a lemma whose proof can be found in the appendix.

Lemma 3 (Value of Progress). Let q be a state and let x and x′ be two clock valuations which are identical except for x′^i = x^i + δ. Then the value of (q, x′) is at least as good as the value of (q, x), that is, V(q, x′) ≤ V(q, x).

The lemma could be wrongly interpreted as saying that a more advanced state always has a better value. This is true in general only for progress that does not induce a possibility of blocking other processes: advancing the clock of an already-active step, or starting a step on a resource that has no other future users. Starting a step on a resource that has other users in its horizon is a type of progress which is outside the scope of the lemma.

Proof of Theorem 2: Imagine a strategy Ω which is lazy at (q, x) and takes its earliest start at (q, x + t), as illustrated in Fig. 4-(a). Following an alternative strategy that starts at x, we will find ourselves after t time in a state where one clock is more advanced (Fig. 4-(b)) and hence satisfying the condition of Lemma 3. If we keep all instances of laziness partially ordered according to ⪯ and apply the above modification starting from the minimal elements of the state space, we gradually push the earliest laziness toward later states until it is completely eliminated.
[Figure]

Fig. 4. Proof of the theorem: the state reached after starting at x + t (a) is less advanced than after starting at x (b).
To illustrate the limited scope of the lemma and the theorem, consider a task whose duration is characterized by a discrete probability with probability p for a and 1 − p for b. In this case, the value function associated with waiting is

V(x) = { p(a − x) + (1 − p)(b − x)   when x < a
         0 · (a − x) + 1 · (b − x)   when x > a

Here at x = a there is a jump in V from (1 − p)(b − a) to (b − a) which is, intuitively, due to the accumulation of information: after waiting a time units without observing an end event, we know that the duration is certainly b. Such a situation contradicts the lemma, because for x < a < x′ we may have V(x′) > V(x). This jump in the expected time-to-go for waiting can also justify laziness: when x < a the expected value of waiting can be better than that of starting, but after x = a the relation between these values may change.

Corollary 1 (Forward-Backward Closure). Let (q, x) be an extended state such that V(q, x, w) < V(q, x, s), so that the optimal scheduler satisfies Ω(q, x) = w. Then Ω(q, x′) = w for any x′ = x + t. Likewise, if V(q, x, w) > V(q, x, s) then Ω(q, x′) = s for all x′ = x − t.

This property can be used to reduce the number of grid points for which the strategy should be computed: Ω(q, x) = w implies Ω(q, x′) = w for any grid point of the form x′ = x + mε, and likewise if Ω(q, x) = s, starting is also optimal for any x′ = x − mε. Using an adaptation of the multi-dimensional binary search algorithm of [16], originally devised to approximate Pareto fronts for multi-criteria optimization problems, one can complete the marking of grid points with fewer than (1/ε)^n evaluations of the strategies.

Theorem 3 (Main Result). Let Ω be the expected-time optimal scheduler whose value at the initial state is V. For any ε, one can compute a scheduler Ω′ whose value V′ satisfies V′ − V ≤ ε.

Proof: Apply Algorithm 1 while going backwards from the final state. In the optimization part use sampling with grid size ε/nk. The division by nk is needed to tackle the (theoretical) possibility of approximation-error accumulation along paths.
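As a sketch (ours) of the saving offered by Corollary 1: along any line in the direction of time progress the optimal action switches from s to w at most once, so the switching point on a line of m grid points can be located by binary search with O(log m) evaluations, in the spirit of the multi-dimensional search of [16]:

```python
def first_wait_index(action_at, m):
    """action_at(j) in {'s', 'w'} along a line of m grid points, assumed
    monotone (s ... s w ... w) by Corollary 1; binary-search the switch."""
    lo, hi = 0, m
    while lo < hi:
        mid = (lo + hi) // 2
        if action_at(mid) == "s":
            lo = mid + 1        # everything up to mid is still `s`
        else:
            hi = mid            # mid already prefers `w`
    return lo                   # index of the first `w` (m if none)

# toy monotone strategy along one line: `s` below index 13, `w` from there on
print(first_wait_index(lambda j: "s" if j < 13 else "w", 100))   # -> 13
```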
7 Discussion
To the best of our knowledge, this work presents the first automatic derivation of optimal controllers/schedulers for "non-Markovian" continuous-time processes. We have laid down the conceptual and technical foundation for treating clock-dependent time densities and value functions and investigated the properties of optimal schedulers. We list below some ongoing and future work directions:

1. The algorithm as described here works individually on each global state, which is a recipe for quick state explosion. A more sophisticated algorithm that works on the whole decision subspace associated with an action s, taking advantage of forward/backward closure, will be much more efficient. Although the stronger property of upward-downward closure does not, unfortunately, follow from non-laziness, the multi-dimensional binary search algorithm of [16] could provide good approximations, whose error analysis will require a more refined characterization of the partition induced by the optimal strategy.
2. Implementation and experimentation: we currently have a prototype implementation building upon the infrastructure developed in [12] for computing integrals over zones. Once completed, we intend to compare the cost/quality tradeoffs of our algorithm with statistical methods such as [8, 13] that evaluate and synthesize schedulers based on sampling the duration space. Such experiments will determine whether symbolic techniques can be part of tools for design-space exploration [13].
3. Extending the result to cyclic systems will require a definition of the value associated with infinite runs and some new techniques for proving convergence to a fixed point, in the spirit of [3]. Introducing stochasticity in task arrival will enrich queuing theory with the expressive advantage of automata.
4. One can think of alternative measures for evaluating the performance of the scheduler. For example, rather than considering the expected makespan (max over all processes) we can optimize the average termination time of all tasks, or even employ a multi-dimensional vector of termination times in the multi-criteria spirit.

Acknowledgement: The project benefitted from the support of the French ANR project EQINOCS and numerous useful comments given by E. Asarin.
References

1. Y. Abdeddaïm, E. Asarin, and O. Maler. Scheduling with timed automata. Theoretical Computer Science, 354(2):272–300, 2006.
2. R. Alur and M. Bernadsky. Bounded model checking for GSMP models of stochastic real-time systems. In HSCC, pages 19–33, 2006.
3. E. Asarin and A. Degorre. Volume and entropy of regular timed languages: Analytic approach. In FORMATS, 2009.
4. E. Asarin and O. Maler. As soon as possible: Time optimal control for timed automata. In HSCC, pages 19–30, 1999.
5. E. Asarin, O. Maler, A. Pnueli, and J. Sifakis. Controller synthesis for timed automata. In Proc. IFAC Symposium on System Structure and Control, pages 469–474, 1998.
6. L. Carnevali, L. Grassi, and E. Vicario. State-density functions over DBM domains in the analysis of non-Markovian models. IEEE Trans. Software Eng., 35(2):178–194, 2009.
7. C.G. Cassandras and S. Lafortune. Introduction to Discrete Event Systems. Springer, 2008.
8. A. David, K. Larsen, A. Legay, M. Mikučionis, and Z. Wang. Time for statistical model checking of real-time systems. In CAV, pages 349–355, 2011.
9. C. Daws, A. Olivero, S. Tripakis, and S. Yovine. The tool KRONOS. In Hybrid Systems, pages 208–219, 1995.
10. R. German. Non-Markovian analysis. In E. Brinksma, H. Hermanns, and J.-P. Katoen, editors, Euro Summer School on Trends in Computer Science, pages 156–182, 2000.
11. P.W. Glynn. A GSMP formalism for discrete event systems. Proceedings of the IEEE, 77(1):14–23, 1989.
12. J.-F. Kempf, M. Bozga, and O. Maler. Performance evaluation of schedulers in a probabilistic setting. In FORMATS, pages 1–15, 2011.
13. J.-F. Kempf. On Computer-Aided Design-Space Exploration for Multi-Cores. PhD thesis, University of Grenoble, October 2012.
14. K.G. Larsen, G. Behrmann, E. Brinksma, A. Fehnker, T. Hune, P. Pettersson, and J. Romijn. As cheap as possible: Efficient cost-optimal reachability for priced timed automata. In CAV, 2001.
15. K.G. Larsen, P. Pettersson, and W. Yi. UPPAAL in a nutshell. International Journal on Software Tools for Technology Transfer (STTT), 1(1):134–152, 1997.
16. J. Legriel, C. Le Guernic, S. Cotton, and O. Maler. Approximating the Pareto front of multi-criteria optimization problems. In TACAS, 2010.
17. O. Maler. On optimal and reasonable control in the presence of adversaries. Annual Reviews in Control, 31(1):1–15, 2007.
18. O. Maler, K.G. Larsen, and B.H. Krogh. On zone-based analysis of duration probabilistic automata. In INFINITY, pages 33–46, 2010.
Appendix: Proof of Lemma 3

Lemma 3 (Value of Progress). Let q be a state and let x and x′ be two clock valuations which are identical except for x′^i = x^i + δ. Then the value of (q, x′) is at least as good as the value of (q, x), that is, V(q, x′) ≤ V(q, x).

Sketch of Proof: We prove it by induction on the number of transitions remaining between q and the final state.

Base case: When there is only one end transition pending and one active clock working toward a duration characterized by a time density ψ, the value is defined as

V(q, x) = ∫_{max(x,a)}^{b} ψ/x(y)(y − x) dy = { (a + b)/2 − x   when x ≤ a
                                                (b − x)/2       when x ≥ a
and the derivative dV/dx is negative (−1 and then −1/2). Hence V(q, x′) < V(q, x).

Inductive case: suppose the lemma holds for all states beyond q. We will show that each action taken in (q, x′) can lead to a state at least as advanced as the state reached via the same action from (q, x). For an immediate start transition this is evident. Suppose we take action w at both (q, x′) and (q, x). Then there are three possibilities: if the race winner at (q, x′) is some P^{i′}, i′ ≠ i, taking an e^{i′} transition to q′, then the same P^{i′} will win the race from (q, x) within the same time, landing in q′ with the value of x^i less advanced by δ, so the inductive hypothesis holds; see Figure 5-(a) and the corresponding runs ξ_x and ξ_{x′}, depicted below projected on the two relevant clocks:
ξ_{x′}: (x^i + δ, x^{i′}) −t→ (x^i + δ + t, x^{i′} + t) −e^{i′}→ (x^i + δ + t, ⊥)
ξ_x:   (x^i, x^{i′}) −t→ (x^i + t, x^{i′} + t) −e^{i′}→ (x^i + t, ⊥).
Suppose, on the contrary, that P^i, the process whose clock is more advanced in x′ than in x, wins the race from (q, x′) within t time. If P^i is also the winner from (q, x), the e^i transition will be taken by ξ_x after t + δ time. By waiting at q′ for δ time, run ξ_{x′} can reach the same state as ξ_x; see Figure 5-(b) and the corresponding runs:
ξ_{x′}: (x^i + δ, x^{i′}) −t→ (x^i + δ + t, x^{i′} + t) −e^i→ (⊥, x^{i′} + t) −δ→ (⊥, x^{i′} + t + δ)
ξ_x:   (x^i, x^{i′}) −t+δ→ (x^i + δ + t, x^{i′} + δ + t) −e^i→ (⊥, x^{i′} + t + δ).
Finally, suppose the race winner from (q, x) is P^{i′} within some t + δ′ time, δ′ < δ, leading to state q″. This splits into two sub-cases. If at q″ the run ξ_x does not start a new step then, by waiting at q′ for δ − δ′ time, ξ_{x′} can reach within t + δ′ the same extended state that ξ_x reaches in the longer time t + δ:
ξ_{x′}: (x^i + δ, x^{i′}) −t→ −e^i→ (⊥, x^{i′} + t) −δ′→ (⊥, x^{i′} + t + δ′) −e^{i′}→ (⊥, ⊥)
ξ_x:   (x^i, x^{i′}) −t+δ′→ −e^{i′}→ (x^i + δ′ + t, ⊥) −δ−δ′→ (x^i + δ + t, ⊥) −e^i→ (⊥, ⊥).
[Figure]

Fig. 5. Proof of the lemma: (a) P^{i′} wins from both x and x′; (b) P^i wins from both; (c) P^i wins from x′ and P^{i′} wins from x. Here either ξ_{x′} reaches point A in less time than does ξ_x, or both reach point B at the same time.
If, on the other hand, during the interval of duration δ − δ′ in which ξ_x catches up with the progress of ξ_{x′} in P^i, ξ_x makes an s^{i′} transition to start a next step of P^{i′}, so can ξ_{x′}, and they will reach the same state:
ξ_{x′}: (x^i + δ, x^{i′}) −t→ −e^i→ (⊥, x^{i′} + t) −δ′→ (⊥, x^{i′} + t + δ′) −e^{i′}→ −s^{i′}→ (⊥, 0) −δ−δ′→ (⊥, δ − δ′)
ξ_x:   (x^i, x^{i′}) −t+δ′→ −e^{i′}→ −s^{i′}→ (x^i + δ′ + t, 0) −δ−δ′→ (x^i + δ + t, δ − δ′) −e^i→ (⊥, δ − δ′).
These two cases are illustrated in Figure 5-(c). The bottom line of this case analysis is that for every action taken in (q, x) and (q, x′) and for every y, there exist durations t_y and t′_y such that 0 ≤ t′_y ≤ t_y, a state p_y such that q ≺ p_y, and clock valuations z_y and z′_y such that z_y ≤ z′_y in the value of one clock (as in the premise of the lemma), with

(q, x) −t_y→ (p_y, z_y) and (q, x′) −t′_y→ (p_y, z′_y).

The cost-to-go from (q, x) can be written as

V(q, x) = ∫ ψ(y) · (t_y + V(p_y, z_y)) dy

and since for every y we have t′_y ≤ t_y and, moreover, following the inductive hypothesis, V(p_y, z′_y) ≤ V(p_y, z_y), we can conclude that V(q, x′) ≤ V(q, x).