Modelling and Performance Evaluation of Computer Systems Edited by H. BEILNER and E. GELENBE
NORTH-HOLLAND
MODELLING AND PERFORMANCE EVALUATION OF COMPUTER SYSTEMS Proceedings of the International Workshop organized by THE COMMISSION OF THE EUROPEAN COMMUNITIES Joint Research Centre, Ispra Establishment, Department A Ispra (Varese), Italy October 4-6, 1976 edited by
H. BEILNER Universität Dortmund, FRG
and
E. GELENBE IRIA-Laboria, Rocquencourt, France
© 1977 NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM · NEW YORK · OXFORD
© North-Holland Publishing Company, Amsterdam and ECSC, EEC, EAEC, Brussels and Luxembourg (1977). All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
Library of Congress Catalog Card Number: 77-1179 North-Holland ISBN: 0 7204 0554 8 PUBLISHERS:
NORTH-HOLLAND PUBLISHING COMPANY AMSTERDAM, NEW YORK, OXFORD
SOLE DISTRIBUTORS FOR THE U.S.A. AND CANADA:
ELSEVIER / NORTH-HOLLAND, INC. 52 VANDERBILT AVENUE, NEW YORK, N.Y. 10017
for
The Commission of the European Communities, Directorate-General Scientific and Technical Information and Information Management, Luxembourg EUR 5659 LEGAL NOTICE Neither the Commission of the European Communities nor any person acting on behalf of the Commission is responsible for the use which might be made of the following information.
PRINTED IN THE NETHERLANDS
CONTENTS

FOREWORD .......... vii
PROGRAMME CHAIRMAN'S PREFACE .......... ix

Statistical sequential methods in performance evaluation of computer systems
M. ARATO .......... 1

Accuracy of an approximate computer system model
M. BADEL, A.V.Y. SHUM .......... 11

A characterization of VM/370 workloads
Y. BARD .......... 35

Scheduling dependent tasks with different arrival times to meet deadlines
J. BLAZEWICZ .......... 39

A performance evaluation of the CII SIRIS 8 operating system - methodology, tools and first results
L. BOI, P. CROS, J.P. DRUCBERT, J.Y. ROUSSELOT, P. BOURRET, R. TREPOS .......... 57

Task sequencing in a batch environment with setup times
J. BRUNO, R. SETHI .......... 67

Product form and local balance in queueing networks
K.M. CHANDY, J.H. HOWARD, JR., D.F. TOWSLEY .......... 81

Optimal queueing policies in multiple-processor computers
G. FAYOLLE, M. ROBIN .......... 103

File assignment in memory hierarchies
D. FOSTER, J.C. BROWNE .......... 119

Maximum load and service delays in a data-base system with recovery from failures
E. GELENBE, D. DEROCHETTE .......... 129

Random injection control of multiprogramming in virtual memory
E. GELENBE, A. KURINCKX .......... 143

A modelling approach to the evaluation of computer system performance
H. GOMAA .......... 171

The use of memory allocation to control response times in paged computer systems with different job classes
J.H. HINE, I. MITRANI, S. TSUR .......... 201

Comparison of the working sets and bounded locality intervals of a program
J. LENFANT .......... 217

Deterministic job scheduling in computing systems
C.L. LIU .......... 241

Formal modelling of discrete dynamic systems
H.C. MAYR, P.C. LOCKEMANN .......... 255

Ergodicity conditions and congestion control in computer networks
G. PUJOLLE .......... 287

Queueing model of a multiprogrammed computer system with a job queue and a fixed number of initiators
M. REISER, A.G. KONHEIM .......... 319

An analytic model of dispatching algorithms
Y. SHOHAT, J.C. STRAUSS .......... 335

Practical considerations in the numerical analysis of Markovian models
W.J. STEWART .......... 363

Working set dynamics
H. VANTILBORGH .......... 377

Comparison of global memory management strategies in virtual memory systems with two classes of processes
A.L. SCHOUTE .......... 389

An approach to adapting a multiaccess time-sharing computer system to user requirements by simulation methods
M. BAZEWICZ, A. PETERSEIL .......... 415

A versatile programmable hardware monitor
Y. BEKKERS, B. DECOUTY .......... 423

An automatic clustering technique applied to workload analysis and system tuning
H. FANGMEYER, R. GLODEN, J. LARISSE .......... 427

Front- and back-end minicomputer arrangement in multiprogramming environment
D. GRILLO, A. PERUGIA .......... 435

Statistical problems in the simulation of computer systems
J.P.C. KLEIJNEN .......... 467

Performance evaluation of a batch-time sharing computer system using a trace driven model
V. MINETTI .......... 485

Hardware measurement of CPU activities
H. SCHREIBER .......... 499

An approach to the straightforward production of computer system simulators
O. TEDONE .......... 501

Scheduling with memory allocation in multiprocessing systems
J. WEGLARZ .......... 513

AUTHOR INDEX .......... 515
FOREWORD

In August 1974 IRIA organized the First International Workshop on Modelling and Performance Evaluation of Computer Systems and also took the initiative that similar workshops could be organized in subsequent years. Upon the initiative of Professor E. Gelenbe and IRIA, the Joint Research Centre (JRC) of the Commission of the European Communities, Ispra Establishment, was invited to host the Second International Workshop on Modelling and Performance Evaluation of Computer Systems, which took place in Stresa on October 4-6, 1976 with an attendance of almost 200 participants. In accepting to host the workshop, the Joint Research Centre wanted to make a further contribution to stimulating a forum for the exchange of ideas in an area of Computer Science whose importance is ever increasing. We felt that a series of Workshops on this subject, held at different locations in Europe, may have a long-lasting effect on suitable and timely trends in research. The Workshop was co-sponsored by IRIA and IFIP Working Group 7.3 (Computer Systems Modelling). The programme was arranged by a Scientific Programme Committee composed of:

H. Beilner, Universität Dortmund, Germany
D.P. Bovet, Università di Pisa, Italy
G. Iazeolla, Università di Pisa, Italy
H. Fangmeyer, JRC - Department A, Ispra, Italy
E. Gelenbe, IRIA-Laboria, France (Chairman)
J. Larisse, JRC - Department A, Ispra, Italy
M.M. Lehman, Imperial College, United Kingdom
P.J. Courtois, Laboratoire M.B.L.E., Brussels, Belgium

The organizational arrangements were ably made by Messrs. Fangmeyer and Larisse of the Ispra Establishment and by Mrs. Moretti and her staff of the Press and Public Relations Service. Many people have thus contributed to an important Workshop whose results are presented for a wider public in this volume.
H.J. Helms Workshop Chairman
PROGRAMME CHAIRMAN'S PREFACE

In the last fifteen years, a discipline under the broad title of Computer System Modelling and Performance Evaluation has come into existence within computer science and engineering. The basic motivation for this discipline is highly practical: its objective is to create the tools for the quantitative and rational systems analysis, design and optimization of complex computer systems. In its outlook this area is goal oriented: problems are tackled primarily when practical impact of the solutions can be foreseen. Therefore a substantial part of the critique and discussion which surround this field, as Dr. Helms so rightly pointed out in his introductory remarks to this Workshop, concerns the applicability of the methods and solutions which are proposed. A number of key topics having strong potential for continued growth have been identified. Probabilistic models of computer systems are foremost among these, since they combine the ability to be directly applied to concrete industrial problems with sufficient intellectual challenge to justify deep mathematical work. The applicability of these models is now widely recognized, and a multitude of case studies of their use have been documented. This recognition is confirmed by the fact that there are two ongoing industrial projects (one in the U.S. and the other in France) for developing computer-aided modelling packages based upon networks-of-queues models and their analytical solutions. These models also play a central role with respect to conventional simulation and measurement techniques: they are used to validate simulation models, they help the measurement expert to formulate pertinent questions concerning the experiments he should design, and in general they serve to clarify the issues and to structure the questions which the system performance analyst must tackle.
It is perfectly legitimate and necessary that the cases where these models are not sufficiently precise, or where their assumptions are too simplistic, be brought out. This can only serve as an incentive for the serious researcher, and as a warning to the user who wants to call upon certain mathematical results without carefully examining the assumptions under which he is working. However criticism of an area based on the observation of a limited or marginal sample of papers is unjustified. It should be pointed out that these models do allow us to examine time dependent behaviour of computer systems if we are willing to pay the computational cost. Although initially the mathematical results which were used in this area were well known to queueing theorists, it is now a pleasure for a computer scientist to observe that this situation has rapidly reversed itself. Computer scientists have been able to obtain exact or approximate solutions to large classes of queueing networks which had not been previously treated. In this respect at least, computer science will have made a contribution to applied probability theory. We can hope for many more results of this nature which combine practical impact with more broad theoretical significance. Another key subject matter within computer system performance evaluation is the analysis of deterministic schedules of computations and of tasks. One can observe two basic trends of research in this area : the search for "good" algorithms for organizing computations on single or multiple processors (algorithms for matrix operations, for parallel computations, for task sequencing, etc.) with or without time constraints (e.g. as in real-time scheduling), and the classification of scheduling problems with respect to their intrinsic difficulty (i.e. complexity). 
Although immediate applications of results in this area are not as numerous, one can see the importance it can have if most computer scheduling problems can be systematically classified as either possessing a computationally simple solution or an heuristic for which performance bounds are known.
Simple statistical models of computer systems have also been proposed and applied, both in the characterization of workloads and in order to provide global cause-and-effect relationships for computer systems. We should expect to see more statistics used in the future to provide measures of the degree of confidence to be awarded to models of specific computer systems. The papers in this volume cover many of these topics. Application areas include detailed performance studies of operating systems, program behaviour models and measurements, models and simulations of virtual memory page-on-demand systems, scheduling of multiprocessor systems, and reliable information processing in the presence of system failures. Most of the papers originate from European research laboratories and universities, though our colleagues from the U.S. have also made a substantial contribution to our workshop. I take this opportunity to thank, on behalf of the programme committee, Dr. Helms and his dedicated colleagues Mr. Fangmeyer and Dr. Larisse of EURATOM-CETIS, who directed the organisation of the workshop and followed up on all matters which helped the success of the Stresa meeting. IRIA contributed by publicizing the call for papers and the workshop with the help of Mlle. Bricheteau. The papers were selected by the scientific programme committee with the help of referees whom each individual committee member called upon. The sponsorship of IFIP Working Group 7.3 and the support of its chairman, Dr. P.E. Green of IBM, are very much appreciated. The validity of computer system modelling and performance evaluation research in Europe, and the large number of interesting problems which remain to be solved, imply that we can look forward to a regular series of symposia in this area.
Erol Gelenbe
Paris, December 1976
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
STATISTICAL SEQUENTIAL METHODS IN PERFORMANCE EVALUATION OF COMPUTER SYSTEMS

M. Arato
Research Institute for Applied Computer Sciences
Budapest, Hungary

The methods of statistical sequential analysis used in hypothesis testing are well known. The Bayesian approach makes their use much simpler and gives new methods for solving various problems. We give exact treatments of some hierarchical memory management problems. For example, to minimize the page fault rate in virtual memory systems when the reference string η_1, η_2, ... is an independent, identically distributed sequence of random variables with unknown probabilities, we have to solve the Bellman equations. By the same method we solve a problem of dynamic file allocation in a two-computer network. The problem is treated for the case when the rates of the file requests form independent sequences of independent random variables.

INTRODUCTION

In this paper algorithms are considered which provide control of stored items in a computing system. The statistical analysis of two-level storage hierarchies is treated, and then the dynamic file assignment in a two-computer network is discussed by the same mathematical method. In the first case it is known that the replacement problem arises in computer system management whenever the executable memory space available is insufficient to contain all data and code that may be accessed during the execution of the programs. The first level (buffer or cache) denotes the faster device, and the backing store represents the larger but slower memory. The devices may contain n items, called pages, and we assume that k < n of them may be in the buffer. The generator output (the reference string) is a sequence of requests η_1, η_2, ... for page contents. For example, if the i-th request η_i takes on the value j (j ≤ n), this means that the i-th request is for the contents of the j-th page. All accesses to the hierarchy must be served from the first level. If the j-th page is present in the buffer, the page is delivered in T_1 seconds; otherwise the page must be fetched from the second level, which requires T_2 seconds. Generally T_1 < T_2. We assume, without loss of generality, that the buffer is full from the beginning, and a page must be removed to make room for the j-th page if it is not in the first level. In this paper it is assumed that a page is brought into main memory only when it is demanded and found missing, and that a page is then removed from main memory. Such a scheme is known as demand paging.

The purpose of the replacement algorithm is to minimize the number of required replacements, termed page faults. We take as a cost criterion the average number of page faults generated during execution, and we wish to minimize this cost. In finding the optimal demand paging algorithm it is assumed, as in all the previous papers, that p_1 (p_2 = 1 - p_1) is known, but it is not known which probability is related with the pages. The computer is multiprogrammed, and one page may be kept constantly in the first level memory. In case of a page fault the page on the second level is brought into main memory, and the replacement of one page to the second level occurs after delivering the content of the demanded page. A sequence of N references is to be made, and at each stage either A_1 or A_2 is on the second level, the loss being 1 if a page fault occurs and 0 otherwise. Let ξ denote the a priori probability that A_1 has the smaller request probability, p_2. Let η_t (t = 1, 2, ...) denote the reference string: η_t = i (i = 1, 2) if the i-th page (A_i) was referenced.
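This two-page, one-slot demand-paging setting can be sketched as a small Monte Carlo check (an illustration, not from the paper: when it is known which page has the larger request probability, the rule that keeps that page in the first level attains fault rate min(p_1, p_2)):

```python
import random

def fault_rate(p1, n_refs, seed=1):
    """Two pages, one first-level slot, demand paging.

    Page 1 is requested with probability p1, page 2 with 1 - p1,
    independently at each reference.  With known probabilities the
    replacement rule keeps the more probable page resident (after a
    fault the demanded page is delivered, then pushed back down),
    so the long-run fault rate tends to min(p1, 1 - p1).
    """
    rng = random.Random(seed)
    resident = 1 if p1 >= 0.5 else 2   # page kept in the first level
    faults = 0
    for _ in range(n_refs):
        page = 1 if rng.random() < p1 else 2
        if page != resident:           # page fault: fetch from level 2
            faults += 1
    return faults / n_refs
```

With p1 = 0.8 the observed rate settles near 0.2, the request probability of the colder page; the interesting case treated in this paper is when it is not known which page is the colder one.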
Let X_t (t = 1, 2, ..., N) denote the random variable which indicates whether the reference at time moment t was served from the first level:

    X_t = 1 if the page on the first level was referenced,
    X_t = 0 if the page on the second level was referenced (a page fault).

Let d_t (t = 0, 1, ..., N - 1) denote the decision which page is to be removed to level 2:

    d_t = 1 if page A_1 goes to level 2,
    d_t = 2 if page A_2 goes to level 2.

It is obvious that

    X_t^(d_{t-1}) = 1 if η_t ≠ d_{t-1},
    X_t^(d_{t-1}) = 0 if η_t = d_{t-1}.

We introduce the non-observable random variable ω which gives the relation between the request probabilities (p_1, p_2) and the pages A_1, A_2:

    ω = 1 if the pages (A_1, A_2) have reference probabilities (p_2, p_1),
    ω = 2 if the pages (A_1, A_2) have reference probabilities (p_1, p_2).

The distribution of ω is P(ω = 1) = ξ, P(ω = 2) = 1 - ξ.

We seek among all Markov decision rules δ = (d_0, d_1, ..., d_{N-1}) (see Shiryaev [13]), where d_t depends only on η_1, ..., η_t, such a δ* = (d_0*, ..., d_{N-1}*) for which

    E(X_1^(d_0*) + ... + X_N^(d_{N-1}*)) = max_δ E(X_1^(d_0) + ... + X_N^(d_{N-1})).    (1.1)

Simple calculations give

    E(X_1^(1)) = p_1 ξ + p_2 (1 - ξ),    E(X_1^(2)) = p_2 ξ + p_1 (1 - ξ),

and so

    E(X_1^(1)) - E(X_1^(2)) = (p_1 - p_2)(2ξ - 1).    (1.2)

From (1.2) we get that the difference is greater than 0 if ξ > 1/2 (it does not depend on p_1); this means d_0 = 1 if ξ > 1/2 and d_0 = 2 if ξ < 1/2. Now we prove the following lemma.

Lemma 1. Let ξ = 1/2 and let ξ(t) denote the a posteriori probability P(ω = 1 | η_1, ..., η_t); then

    ξ(t) = p_1^k p_2^(t-k) / (p_1^k p_2^(t-k) + p_1^(t-k) p_2^k),    (1.3)

where k denotes the number of occurrences of page A_2 in the reference string (i.e. η_λ = 2, 1 ≤ λ ≤ t).

Proof. On the basis of Bayes' theorem we get

    ξ(1) = P(ω = 1 | η_1) = P(η_1 | ω = 1) P(ω = 1) / [ P(η_1 | ω = 1) P(ω = 1) + P(η_1 | ω = 2) P(ω = 2) ],

and from here, in the case ξ = 1/2,

    ξ(1) = p_2 / (p_1 + p_2)  if η_1 = 1,
    ξ(1) = p_1 / (p_1 + p_2)  if η_1 = 2.

In the same way we get (ξ = 1/2)

    ξ(2) = p_1^2 / (p_1^2 + p_2^2)  if η_1 = 2, η_2 = 2,
    ξ(2) = p_2^2 / (p_1^2 + p_2^2)  if η_1 = 1, η_2 = 1,
    ξ(2) = 1/2                      if η_1 ≠ η_2.

By induction the lemma can easily be proved.

By the same method as in (1.2) we can prove that after the first observation η_1 the best decision is

    d_1 = 1 if ξ(1) = p_1 / (p_1 + p_2)  (i.e. η_1 = 2),
    d_1 = 2 if ξ(1) = p_2 / (p_1 + p_2)  (i.e. η_1 = 1).

Let us introduce the following reward V(x):

    V(x) = 1 if x = 1,
    V(x) = 0 if x = 0.
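The closed form of Lemma 1 can be checked against the step-by-step Bayes update (a sketch; the encoding of ω and of the reference string follows the definitions above):

```python
def posterior_sequential(p1, p2, refs, xi0=0.5):
    """Bayes update of xi(t) = P(omega = 1 | eta_1, ..., eta_t).

    omega = 1 means page A1 has the smaller probability p2, so a
    reference to A1 has likelihood p2 under omega = 1 and p1 under
    omega = 2 (and vice versa for a reference to A2).
    """
    xi = xi0
    for r in refs:
        l1 = p2 if r == 1 else p1      # P(eta = r | omega = 1)
        l2 = p1 if r == 1 else p2      # P(eta = r | omega = 2)
        xi = l1 * xi / (l1 * xi + l2 * (1 - xi))
    return xi

def posterior_closed_form(p1, p2, refs):
    """Lemma 1: with xi = 1/2 the posterior xi(t) depends on the
    history only through k, the number of references to page A2."""
    t = len(refs)
    k = sum(1 for r in refs if r == 2)
    num = p1 ** k * p2 ** (t - k)
    return num / (num + p1 ** (t - k) * p2 ** k)
```

For any reference string the two computations agree, which is the content of the induction step in the proof above.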
If ξ(t) denotes the posterior probability of {ω = 1} given the observations up to time moment t - 1, then the total gain from t until N can be expressed as follows (see Prohorov-Rozanov [11]):
    V(t, ξ(t), X_t = x, δ^(t,N)) = E_{ξ(t)} ( Σ_{s=t}^{N} V(X_s^(d_{s-1})) ),    δ^(t,N) = (d_{t-1}, ..., d_{N-1}).

THEOREM 1. Let δ* be the sequential procedure which prescribes that page A_1 should be put on the second level if ξ(t) > 1/2 and that page A_2 should be put on the second level if ξ(t) < 1/2. Then δ* is an optimal sequential procedure. In other words, the optimal procedure removes to the second level, at every moment t, that page for which the posterior probability of having the smaller reference probability p_2 is greater.

Proof. It is obvious that V(N, ξ(N), x) = V(x), since it does not depend on ξ(N), and

    V(N - 1, ξ(N - 1), x) = V(x) + max_{d_{N-1}} { P(ω = 1) P(X_N^(d_{N-1}) = 1 | ω = 1) V(N, ξ(N), 1)
                                                 + P(ω = 2) P(X_N^(d_{N-1}) = 1 | ω = 2) V(N, ξ(N), 1) },

where V(N, ξ(N), 1) = 1. From the last relation we get, with decision d_{N-1} = 1,

    V(N - 1, ξ(N - 1), x, d_{N-1} = 1) = V(x) + [ ξ(N) p_1 + (1 - ξ(N)) p_2 ],

and with decision d_{N-1} = 2,

    V(N - 1, ξ(N - 1), x, d_{N-1} = 2) = V(x) + [ ξ(N) p_2 + (1 - ξ(N)) p_1 ].

For their difference we have (p_1 - p_2)(2 ξ(N) - 1), and this means, as p_1 > p_2, that the optimal decision d_{N-1} has the form

    d_{N-1} = 1 if ξ(N) > 1/2,
    d_{N-1} = 2 if ξ(N) < 1/2,

i.e. the decision depends only on the a posteriori probabilities. The Bellman equation for every t has the form

    V(t, ξ(t), x) = V(x) + max_{d_t} { ξ(t+1) P(X_{t+1}^(d_t) = 1 | ω = 1) V(t+1, ξ(t+1), 1)
                                     + ξ(t+1) P(X_{t+1}^(d_t) = 0 | ω = 1) V(t+1, ξ(t+1), 0)
                                     + (1 - ξ(t+1)) P(X_{t+1}^(d_t) = 1 | ω = 2) V(t+1, ξ(t+1), 1)
                                     + (1 - ξ(t+1)) P(X_{t+1}^(d_t) = 0 | ω = 2) V(t+1, ξ(t+1), 0) },

which gives the optimal decision rule (see Prohorov-Rozanov [11])

    d_t = 1 if ξ(t+1) > 1/2,
    d_t = 2 if ξ(t+1) < 1/2,

which proves the theorem.
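For small N the expectation in (1.1) can be evaluated exactly by enumerating all reference strings, which lets one compare the posterior-threshold rule numerically with a fixed rule (a sketch, not part of the proof; the tie at ξ = 1/2 is broken arbitrarily):

```python
from itertools import product

def expected_hits(policy, p1, p2, N, xi0=0.5):
    """Exact E(X_1 + ... + X_N): sum over omega and over all 2^N
    reference strings, weighting each string by its probability."""
    total = 0.0
    for omega, w in ((1, xi0), (2, 1.0 - xi0)):
        q1 = p2 if omega == 1 else p1      # P(eta_t = 1 | omega)
        for refs in product((1, 2), repeat=N):
            prob = w
            hits = 0
            evicted = policy(())           # d_0, chosen before any data
            for t, r in enumerate(refs):
                prob *= q1 if r == 1 else 1.0 - q1
                if r != evicted:           # referenced page is in level 1
                    hits += 1
                evicted = policy(refs[:t + 1])
            total += prob * hits
    return total

def threshold_policy(p1, p2):
    """Theorem 1's rule: evict A1 when xi(t) > 1/2, else A2,
    with xi(t) computed from Lemma 1."""
    def policy(history):
        t, k = len(history), sum(1 for r in history if r == 2)
        num = p1 ** k * p2 ** (t - k)
        xi = num / (num + p1 ** (t - k) * p2 ** k)
        return 1 if xi > 0.5 else 2
    return policy
```

Under the symmetric prior ξ = 1/2, a rule that always evicts the same page earns exactly N/2 expected hits, while the threshold rule does strictly better once observations accumulate.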
We prove the following

THEOREM 2. Let p_1 > 1/2, ξ = 1/2 and N be fixed. Let the reference string η_t be an independent, identically distributed sequence of random variables with two states. Then the optimal sequential procedure δ* which minimizes the expected number of page faults (see (1.1)) puts at each stage that page on the second level which has the smaller request frequency.

Proof. Theorem 1 shows that the optimal procedure δ* has the following form (a similar result is known in the "two-armed bandit" problem):

    d_t = 1 if ξ(t) > 1/2,
    d_t = 2 if ξ(t) < 1/2.    (1.4)

Comparing (1.3) and (1.4) we get (using again the fact p_1 > 1/2)

    d_t = 1 if k > t/2,
    d_t = 2 if k < t/2,

where k is the number of references to page A_2 among the first t, and the theorem is proved.

2. ON THE FILE ASSIGNMENT IN A COMPUTER NETWORK

When a file is used by several computers in a network, it can be stored in the memory of one of them and be accessed by the other computers. Here we assume that only one copy of each file is allowed to exist in the system at any given time, which is the case when the files are being updated. Under several simplifying assumptions (e.g. that the files are requested by the computers according to independent request processes, the files are short, their transmission takes very little time, and the communication lines have sufficient capacity) we can treat each file separately. In the earlier analyses the goal was to find the best static location for the files for the entire operating period (see e.g. Chu [6]). But the parameters of the system vary with time, and in this case a dynamic allocation might give a substantial improvement in performance (see Segall [12]). Here we show that a dynamic file assignment might also be necessary when the statistical description contains unknown parameters which have to be estimated from the observed requests. The goal of this part is to show, in the most elementary case, that the dynamic file allocation problem can be solved in the discrete time case.

Let us consider a completely connected system of two computers, and let the decision d_t at time t (t = 0, 1, 2, ...) be defined in the following way: d_t = i if the file in consideration will be stored at computer i at time t + 1, i.e. after t (i = 1, 2). The variables Y_i(t) (i = 1, 2) take the value 1 or 0 according to whether the file is located in memory i or not at time t. We assume that the operation cost is 0 when the file is in the requested computer and that the communication cost per transmission from computer 1 to 2 (or from 2 to 1) is 1. The expected total cost over the period t = 1, 2, ..., N is

    C = E { Σ_{t=1}^{N} [ Y_1(t) η_2(t) + Y_2(t) η_1(t) ] }.

When p_1 and p_2 are known (p_1 > p_2), C is minimal if Y_1(t) = 1 for all t (d_{t-1} = 1); in this case C = N p_2. Now we assume that the values p_1, p_2 are unknown and that the control variables (decisions) d_t can be functionals of the past observations { η_1(s), η_2(s), Y_1(s), Y_2(s), s ≤ t }. The exercise is the same: find optimal decisions to minimize the expected cost C. Using the Bayesian approach we introduce the random variable ω, where

    ω = 1 if η_1(t) has the rate p_1 and η_2(t) the rate p_2,
    ω = 2 if η_1(t) has the rate p_2 and η_2(t) the rate p_1,

and P(ω = 1) = ξ, P(ω = 2) = 1 - ξ. We assume that p_1 and p_2 are known, but it is not known with which request process they are related. Let X_t denote the random variable which indicates that at time moment t a transmission took place because the file was not in the requested computer:

    X_t = 1 if (Y_1(t) = 1 and η_2(t) = 1) or (Y_2(t) = 1 and η_1(t) = 1),
    X_t = 0 otherwise.

With the help of the decisions d_t we have

    X_t^(d_{t-1}) = Y_1(t) η_2(t) + Y_2(t) η_1(t).

We seek among the Markov decision rules δ = (d_0, ..., d_{N-1}) such a δ* = (d_0*, ..., d_{N-1}*) for which E(X_1 + ... + X_N) is minimal.
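The gain from adaptive placement over a static one can be illustrated with a toy Monte Carlo run (a sketch under this section's assumptions, not the paper's derived optimum; the majority rule below is an illustrative stand-in which, for ξ = 1/2, points the same way as the posterior threshold):

```python
import random

def average_cost(rule, p1, p2, N, trials=2000, seed=7):
    """Average of C = sum_t [Y1(t)*eta2(t) + Y2(t)*eta1(t)].

    eta_1, eta_2 are independent Bernoulli request streams; which
    computer gets the high rate p1 is decided by a fair coin (the
    unobserved omega).  Each remote request costs one transmission.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rates = (p1, p2) if rng.random() < 0.5 else (p2, p1)
        counts = [0, 0]                  # requests seen from each computer
        loc = 0                          # file location (0-based): computer 1
        for _ in range(N):
            eta = [1 if rng.random() < rates[i] else 0 for i in (0, 1)]
            total += eta[1 - loc]        # request from the other computer
            counts[0] += eta[0]
            counts[1] += eta[1]
            loc = rule(counts, loc)      # d_t fixes the location for t + 1
    return total / trials

adaptive = lambda counts, loc: 0 if counts[0] >= counts[1] else 1
static = lambda counts, loc: 0           # best static guess without data
```

With p1 = 0.9, p2 = 0.1 and N = 50, the static rule pays about N(p1 + p2)/2 = 25 transmissions per run on average, while the adaptive rule approaches the known-parameter optimum N*p2 = 5 plus a small learning cost.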
ACCURACY OF AN APPROXIMATE COMPUTER SYSTEM MODEL
M. BADEL, A.V.Y. SHUM

[Pages 21-27: a series of numerical tables comparing, for degrees of multiprogramming 1 through 10 and parameter values 1/μ = 0.25, 0.5, 0.75, 0.8 and 0.9, the results obtained by simulation (columns S) with those of the approximate model (columns F). The scanned tables are too garbled to reproduce.]
Table 1. Sample User Workload Report (time span: 30829 seconds)

                           Trivial    Nontrivial   Other     Total
No. of transactions        9459       18171        --        --
Total virtual time         130.008    12867.270    80.965    13078.238
Total ovhd time            462.123    12176.875    152.790   12791.785
Active users               2.249      3.864        0.036     6.149

Averages per transaction:

                           Trivial    Nontrivial
*Problem (virtual) time    0.01374    0.70812
*CP ovhd time              0.04886    0.67013
*VIO                       0.625      40.831
*Virt. lines printed       2.304      10.471
*Virt. cards read          0.021      0.334
*Virt. cards punched       0.081      0.076
*Working set               21.969     78.642
*Paging index              2.995      7.945
*Page reads                1.456      3.862
*Think time                7.077      2.638
Response time              0.252      3.918
Time in eligible list      0.001      0.018
Time in Q                  0.250      3.901

(Quantities marked * are also reported per second of virtual time in the original table; those columns are too garbled in the scan to reconstruct.)
A CHARACTERIZATION OF VM/370 WORKLOADS
ization cannot be performed independently of model validation. If the model produces correct output, both the model and the characterization stand validated. If the output is incorrect, either one may be at fault. Our validation consisted of obtaining monitor tapes from nine different VM/370 installations, covering a wide range of configurations and workloads. In all cases, the system performance analysis and characterization of user workload reports were generated. Depending on the variety of workloads observed, users in each case were placed into one to three user classes, whose descriptions were entered into the model. Additional data required by the model were estimated as follows: it was assumed that accesses to I/O devices by all user classes were in the same proportion as the overall measured I/O rates to these devices, that the average record length was 800 bytes (the normal CMS* disk file block length), and that accesses at any time were uniformly distributed over an area equal to the span of a normal CMS minidisk. It should be noted that a systems engineer working at a specific installation could easily come up with more realistic estimates. The model predictions are compared to the observed performance in Table 2. The model accuracy specifications called for ±30% accuracy, except for short response times, where errors of up to 1 second could be tolerated. It will be seen from Table 2 that these requirements were met in all but one case. The generally low response time predictions are due to the assumption that disk arm movements are confined to one minidisk area on each disk, whereas in fact many seek motions traverse from one area to another. The extent of the specified seek distances provides an easy vehicle for tuning the model to a given installation. The data shown thus far do not demonstrate the portability of the workload characterization, since no measurements were obtained on workloads that were transferred from one configuration to another. We are currently on the lookout for such data.

TOWARDS A TYPICAL USER PROFILE

When an entirely new installation is to be configured, one might have to draw on an inventory of workload profiles obtained at existing installations. It is interesting, therefore, to find out how workloads of various types vary among installations, and whether "typical" workload profiles can be generated. As a step in this direction, the workload characterizations of the transaction-oriented CMS users from seven VM/370 installations were combined into a characterization which is summarized in Table 3. In computing the averages reported in the table, each installation workload was given an equal weight. Since the CPU's at the installations ranged from a Model 145 to a Model 168, all CPU times were reduced to the 145 speed. It was found that, even after this adjustment, average CPU times per nontrivial transaction varied considerably from one installation to the next. Therefore, nontrivial-transaction resource requirements were reduced to a basis of one second of Model 145 CPU time. This transformation would not affect the results of performance predictions made by the model, except for a proportionate change in nontrivial response time.
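The speed adjustment and one-second normalization described above can be sketched as follows. The relative speed factors below are hypothetical stand-ins (the paper does not list conversion factors); only the normalization logic follows the text.

```python
# Hedged sketch: scale measured CPU times to Model-145-equivalent seconds,
# then re-express a nontrivial transaction's resources per second of 145
# CPU time. The speed factors are assumed values for illustration only.

SPEED_VS_145 = {"135": 0.5, "145": 1.0, "158": 2.0, "168": 5.0}  # assumed

def to_145_seconds(cpu_seconds, model):
    """Convert CPU seconds measured on `model` to Model-145-equivalent seconds."""
    return cpu_seconds * SPEED_VS_145[model]

def normalize_nontrivial(resources, cpu_145_seconds):
    """Scale a resource vector to a basis of one second of 145 CPU time."""
    return {name: value / cpu_145_seconds for name, value in resources.items()}

# Example: a nontrivial transaction measured on a (faster) Model 168.
cpu = to_145_seconds(0.2, "168")          # 1.0 second of 145 time
profile = normalize_nontrivial({"virtual_sio": 61.5, "page_reads": 7.9}, cpu)
```

As the text notes, this rescaling leaves the model's predictions unchanged except for a proportionate change in nontrivial response time.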
*CMS is the interactive operating system usually used under VM/370 [2].
                      Percent CPU          Percent Virtual     Avg. Trivial          Avg. Nontrivial
CPU      Avg.         Utilization          CPU Time            Response (Seconds)    Response (Seconds)
Model    Logged
         Users        Meas.  Pred.         Meas.  Pred.        Meas.  Pred.          Meas.  Pred.

135      4            17.1   18.6          5.3    5.7          0.7    0.6            19.0   21.8
145      8            84.0   85.7 (81.3)*  42.5   43.5 (44.3)  0.25   0.21 (0.14)    3.9    3.1  (2.74)
145      15           96.6   95.1 (85.3)   40.8   36.3 (34.9)  0.51   0.70 (0.69)    26.6   32.5 (18.2)
155-II   20           22.2   22.3          6.7    6.6          0.05   0.05           1.20   1.06
155-II   23           36.9   36.1          10.7   10.4         0.08   0.10           2.76   3.35
158      37           59.2   55.2 (66.0)   31.5   28.8 (30.5)  0.21   0.14 (0.25)    15.6   15.5 (20.4)
158      46           70.3   65.9 (99.7)   37.8   35.2 (45.4)  0.14   0.10 (0.31)    2.54   1.51** (5.14)
158      24           68.8   71.8 (70.2)   52.2   55.8 (53.6)  0.07   0.08 (0.09)    6.07   4.99 (7.25)
168      72           36.0   35.1 (34.3)   14.5   14.6 (16.3)  0.13   0.10 (0.06)    7.8    6.2  (4.7)

*Predictions in parentheses are based on the average workload described in Table 3. See next section for explanation.
**Predicted value outside accuracy specifications.

Table 2. Validation of Model and Workload Description
Perusal of Table 3 reveals that there is considerable, though not complete, uniformity among the workloads. Among trivial transactions, the variations (relative to the mean) in CPU and think times are small, and those in virtual I/O's and paging index are moderate. The relatively large variations in unit record activity are not important from a system performance point of view. Among the important variables on the nontrivial side, only the paging index is even moderately variable. Of all the tabulated variations, the one with the most profound effect on performance predictions is that of the trivial to nontrivial transaction ratio. In order to assess the error that would be committed if the "typical" data of Table 3 were assumed to describe an unknown CMS workload, these data were introduced into the model in place of the measured data for some of the configurations of Table 2 (except that the correct trivial to nontrivial transaction ratios were retained). The resulting performance predictions are shown in parentheses in Table 2. While sometimes considerably less accurate than predictions based on the true workloads, these predictions may still be considered as useful "ballpark" estimates of expected performance.

CONCLUSION

We have described a characterization of VM/370 workloads which is easily obtained from measurements on installed systems, and which is suitable for driving an analytic model of the system. The makeup of the modeled workload can be varied easily by changing the relative numbers of users in various classes. Sensitivity of performance to workload parameters is easily determined by the model. The workload characterization also has uses other than as input to a model. For the installation manager, it points to the users who consume the critical resources of the system; for the system developer, it highlights the user activities for which the system should be designed. A vexing problem in performance analysis is the finding of benchmarks which typify a complex user environment. Using characterizations such as the one presented here, it is possible to create synthetic benchmarks which approximate the workload of any particular installation [10]. The most important extension to our workload characterization would be a facility for converting user-level descriptions of the most common types of activity (e.g., file editing, data base queries, teleprocessing network control) into the machine-level description.
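The idea of driving a synthetic benchmark from a characterization like Table 3 can be sketched as follows. The exponential sampling is an illustrative assumption of this sketch; the paper does not prescribe distributions for the class parameters.

```python
# Hedged sketch: generate a synthetic transaction stream from class-average
# workload parameters (CPU time and think time per transaction). The choice
# of exponential distributions is an assumption made here for illustration.
import random

def synthetic_transactions(count, mean_cpu, mean_think, seed=0):
    rng = random.Random(seed)   # fixed seed for reproducibility
    return [{"cpu": rng.expovariate(1 / mean_cpu),
             "think": rng.expovariate(1 / mean_think)}
            for _ in range(count)]

# Trivial-transaction averages from Table 3: 0.013 s CPU, 4.6 s think time.
stream = synthetic_transactions(1000, mean_cpu=0.013, mean_think=4.6)
```

A benchmark driver would replay such a stream against a real or modeled system, which is the use suggested in [10].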
                            Trivial Transactions       Nontrivial Transactions
Parameter                   Average   Std. Deviation   Average   Std. Deviation

Virtual CPU time            0.013     0.004            1.0       0
CP CPU time                 0.045     0.013            0.86      0.35
Virtual Start I/O's         0.95      0.60             61.5      11.3
Virtual lines printed       0.61      0.66             15.4      17.7
Virtual cards read          0.03      0.05             5.9       7.7
Virtual cards punched       0.26      0.28             4.7       5.5
Working set                 14.1      6.9              40.2      22.5
Paging index                1.6       1.2              6.1       5.1
Page reads                  2.0       1.8              7.9       7.1
Think time                  4.6       1.4              6.1       3.2

Parameter                                      Average   Std. Deviation

Secondary storage paging slots                 52        17
Trivial/nontrivial transactions ratio          27        32
Active users, per 100 logged-on users          68        14

Notes:
1. All CPU times are given as seconds on a Model 145.
2. All nontrivial transactions are reduced to a basis of one second of Model 145 time.
3. Standard deviations are between installation averages, not between individual transactions.
4. Extreme observations were ignored in computing the trivial/nontrivial ratio and paging slot values.

Table 3. Average CMS Workload
REFERENCES

1. Y. Bard, An Analytic Model of CP-67 and VM/370, in Computer Architectures and Networks, E. Gelenbe and R. Mahl (eds.), North-Holland Publishing Co., Amsterdam (1974), pp. 419-460. The model has been extensively modified since the above was published. A new description will appear in a forthcoming paper.
2. IBM Virtual Machine Facility/370, Introduction, Form No. GC20-1800, IBM Data Processing Division, White Plains, N.Y. (1972).
3. C. J. Young, VM/370 Biased Scheduler, Technical Report TR 75.0001, IBM New England Programming Center, Burlington, MA (1973).
4. IBM Virtual Machine Facility/370, Control Program (CP) Program Logic, Form No. SY20-0880, IBM Data Processing Division, White Plains, N.Y. (1972).
5. Y. Bard, Performance Criteria and Measurement for a Time-Sharing System, IBM Systems Journal 10, 193-216 (1971).
6. Y. Bard and K. V. Suryanarayana, On the Structure of CP-67 Overhead, in Statistical Computer Performance Evaluation, W. Freiberger (ed.), Academic Press, New York (1972), pp. 329-346.
7. S. J. Boies, User Behavior on an Interactive Computer System, IBM Systems Journal 13, 2-18 (1974).
8. Y. Bard, Prediction of System Paging Rates, Proceedings of Computer Science and Statistics: 7th Annual Conference on the Interface, Iowa State University, Ames, Iowa, 138-141 (1973).
9. P. H. Callaway, Performance Measurement Tools for VM/370, IBM Systems Journal 14, 134-160 (1975).
10. P. A. Hamilton and B. W. Kernighan, Synthetically Generated Performance Test Loads for Operating Systems, Proceedings of the 1st Annual SIGME Symposium on Measurement and Evaluation, Palo Alto (1973), pp. 121-126.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed.
© North-Holland Publishing Company (1976)

SCHEDULING DEPENDENT TASKS WITH DIFFERENT ARRIVAL TIMES TO MEET DEADLINES

J. Błażewicz
Institute of Control Engineering
Technical University of Poznań
Poznań, Poland
This paper deals with deterministic problems of scheduling dependent tasks on one processor. Tasks are to be scheduled before their deadlines. First, it is proven that, when scheduling tasks which enter the system at time t=0, preemptions need not be considered. Then the case of tasks which arrive at different time instants is presented. A scheduling algorithm is proposed and the proof of its optimality is given.

INTRODUCTION

In this paper the problem of scheduling dependent and preemptable tasks, which arrive at known, different time instants, on one processor in a hard-real-time environment will be considered. In this environment all tasks have strict deadlines that must be respected (Liu and Layland /1973/, Manacher /1967/). This is often the case in process control or monitoring systems. Labetoulle /1974/ and Manacher /1967/ considered systems in which independent tasks consist of a computation that must be executed periodically, with each computation of a task having a fixed deadline for completion. The deadline of a given computation can be no
later than the time of the request for the execution of the next computation of the task. In this case, the optimal solution is obtained by scheduling tasks according to the earliest deadline. The optimal algorithm for a set of dependent tasks, when all of them arrive at time t=0, was given by Lawler and Moore /1969/. The algorithms mentioned above also concern the case of one processor. Cases in which dependent tasks arrive at different time instants might occur, for example, on a work project, when certain supplies are not scheduled to arrive until some time after the project has begun. In a real-time computer program, they might correspond to inputs supplied, at various discrete times within some timing template, by the external hardware, which sends a "go" message when it has data ready (Manacher /1967/).

PROBLEM FORMULATION

We will be concerned with a set of preemptable tasks T_1, T_2, ..., T_n. A partial order < is defined on this set, the order specifying operational precedence constraints. T_i < T_j signifies that T_i must be completed before T_j can begin. On the other hand, T_i ≮ T_j means that T_i does not precede T_j. For every task T_j, j=1,2,...,n, the processing time τ_j, the arrival time r_j ≥ 0, and the deadline d_j are given. The tasks are to be scheduled on one processor. Our objective is to find a schedule, if one exists, such that no task will be late and all precedence relations will be satisfied. Without loss of generality we can assume that tasks are numbered in such a way that T_i < T_j implies i < j. It can also be assumed that T_i < T_j implies r_i ≤ r_j, since T_j cannot begin until T_i is finished. Some notions are now defined. Task T_j will be called available (at moment t) if r_j ≤ t and all its predecessors have been processed before or during time instant t. We will call a schedule optimal if all tasks are processed before their deadlines and all precedence relations are satisfied. A scheduling algorithm will be called optimal
when it finds the optimal schedule whenever one exists, i.e. whenever an arbitrary algorithm could find such a schedule.

THE OPTIMAL ALGORITHM

First, we will prove a lemma which is a generalization of the theorem given by McNaughton /1959/.

Lemma
When scheduling dependent tasks on one processor with r_j = 0, j=1,2,...,n, to meet deadlines, preemptions need not be considered.

Proof

We will prove that, if there exists an optimal schedule in which tasks are preempted, then an optimal schedule without preemptions also exists. Let us assume that there exists an optimal schedule Π in which tasks are preempted, and let T_k be one of them (see Fig. 1a).

[Fig. 1: schedule Π, in which T_k is preempted, and schedule Π', in which T_k is processed without preemption; the drawing itself is not recoverable from the scan.]

In Fig. 1, A, B, C and D are parts of the optimal schedule. The sequence of tasks can be changed in such a way that we obtain a schedule Π' (Fig. 1b) in which task T_k is processed without preemptions and is finished at the same time instant t = a. In such a schedule Π', precedence relations are not violated, since: task T_k is begun no earlier than in Π, and its predecessors (in part A of the sequence) will be completed before this moment; task T_k is completed at the same moment as in schedule Π, so that all its successors (in part D of the sequence) may be processed in
the same way as in Π; the other tasks (in parts B and C) are processed in the same sequence as in Π, so that precedence relations among them are respected. No task in the new schedule Π' is late, since task T_k and the tasks from the parts that were not moved up (A and D) are completed at the same time, and tasks from the parts that were moved up (B and C) are completed earlier than in Π. Hence, schedule Π' is also optimal. Consequently, after a finite number of such changes we can find an optimal schedule in which no task is preempted.

An algorithm, denoted by R, is described below, and it will be proved that this algorithm will find a schedule in which all tasks are completed on time, if such a schedule exists.

Algorithm R
1° For every task T_j determine a modified deadline

    d*_j = min [ d_j , min { d*_k : T_j < T_k } ] + εj,

where ε is a small number and the term εj is used to break ties.

2° Assign the processor to the available task T_j which has the minimum value of d*_j. Process it until either it is complete or a task T_k with d*_k < d*_j arrives and becomes available; in the latter case preempt task T_j. Repeat this step until all tasks are scheduled.

We will now prove the following theorem.
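Algorithm R above can be sketched in code as follows. This is a hedged reconstruction of the OCR-damaged text: the bottom-up computation of d*_j and the exact tie-breaking term are assumed readings, not a verbatim implementation from the paper.

```python
# Hedged reconstruction of Algorithm R: modified deadlines plus preemptive
# smallest-d* scheduling on one processor. Assumes tasks are numbered so
# that T_i < T_j implies i < j (as in the problem formulation).

def modified_deadlines(d, succ, eps=1e-6):
    n = len(d)
    dstar = [0.0] * n
    for j in range(n - 1, -1, -1):          # successors have larger indices
        m = min([d[j]] + [dstar[k] for k in succ[j]])
        dstar[j] = m + eps * (j + 1)        # assumed tie-breaking term eps*j
    return dstar

def algorithm_R(p, r, d, succ):
    """Schedule tasks (processing times p, arrivals r, deadlines d) with
    precedence lists succ; returns the finishing time of each task."""
    n = len(p)
    dstar = modified_deadlines(d, succ)
    preds_left = [0] * n
    for j in range(n):
        for k in succ[j]:
            preds_left[k] += 1
    rem, finish, t = list(p), [0.0] * n, 0.0
    unfinished = set(range(n))
    while unfinished:
        avail = [j for j in unfinished if r[j] <= t and preds_left[j] == 0]
        if not avail:                       # idle until the next root task arrives
            t = min(r[j] for j in unfinished if preds_left[j] == 0)
            continue
        j = min(avail, key=lambda x: dstar[x])
        future = [r[k] - t for k in unfinished if r[k] > t]
        run = min([rem[j]] + future)        # run until done or next arrival
        t += run
        rem[j] -= run
        if rem[j] <= 1e-12:                 # task complete
            finish[j] = t
            unfinished.remove(j)
            for k in succ[j]:
                preds_left[k] -= 1
    return finish
```

For example, with p=[3,1], r=[0,1], d=[10,2] and no precedence, the sketch preempts the long task when the urgent one arrives, so both meet their deadlines.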
Theorem 1

Algorithm R is optimal.

Proof

Two cases may occur.

Case 1. Tasks arrive at such time instants that, if the whole task set is processed in increasing order of d*_j, no task T_k will arrive
during the processing of any T_j, j=1,2,...,n, such that d*_k < d*_j. The whole task set may then be processed:

1° in increasing order of d*_j,
2° as though all tasks were available at time t=0.

The algorithm which assigns tasks to a processor so that 1° and 2° are fulfilled is optimal for sets of nonpreemptable tasks (Lawler and Moore /1969/). Following the lemma, it can be stated that it is also optimal for sets of preemptable tasks. Then, for this case, if an optimal schedule exists, algorithm R will find it.

Case 2. Tasks arrive at such moments that, if they are processed in increasing order of d*_j, some task T_k will arrive, during the processing of some T_j, with d*_k < d*_j. We must now prove that the optimal algorithm will choose, from among the available tasks, one with the smallest d*_j. Let us assume that it does not, i.e. there exists an optimal schedule Π in which tasks are processed in another order. Let T_k (or part of T_k) be processed before T_j (or part of T_j) in Π, despite the fact that d*_j < d*_k ...

[The scan breaks off here; the remainder of this proof and the beginning of the next paper, "A Performance Evaluation of the CII SIRIS 8 O.S." by L. Boi et al., are missing. The text resumes in that paper's section on disk-address measurements:]

... is true, and where Z1, Z2 are areas and OP a logical operator (=, ≠, >, <, ≥, ≤).
Disk number    1        2        3        4       5
χ² value       15.308   845.66   41.308   3.162   90.63

There are 6 possible values of s; therefore the χ² has 5 degrees of freedom, and there is a probability of 0.01 that the χ² value would be greater than 15.086. The repartition is not uniform (except for disk number 4). But this disk is a private one; it was not used during 2 sessions. The repartition for the two other sessions was:
Number of accesses:

Sector       1      2      3      4      5      6
Session 3    605    561    515    513    490    668
Session 4    6712   6700   6723   6747   6734   6739

χ² value: 37.65. The χ² has 15 degrees of freedom, and there is a probability of 0.01 that the χ² value would be greater than 30.578. So we can assert that the repartitions are not the same and that the probability of sector access is not stationary.

V.1.2. CYLINDER ADDRESS
Disk number    1      2     3     4     5
χ² value       1610   716   572   201   2460

Even if the χ² has 405 degrees of freedom, it is obvious that the repartition is not uniform.
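The χ² statistic used throughout this section can be sketched as follows, applied here to the session-3 sector counts quoted above. The paper's tabulated χ² values were computed per disk, so this run is only an illustration of the method, not a reproduction of a tabulated figure.

```python
# Pearson chi-square test of uniformity, as used in section V.1.

def chi_square_uniform(observed):
    """Chi-square statistic against the hypothesis of a uniform repartition."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

session3 = [605, 561, 515, 513, 490, 668]   # accesses per sector, session 3
chi2 = chi_square_uniform(session3)
# 6 sectors -> 5 degrees of freedom; the 0.01-level critical value quoted
# in the text is 15.086, so uniformity is rejected when chi2 exceeds it.
uniform_rejected = chi2 > 15.086
```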
V.2. TEST OF INDEPENDENCE OF CONSECUTIVE DISK ADDRESSES

Let p = the probability of two consecutive addresses being on the same cylinder, computed from the distribution of cylinder addresses by the formula

    p = Σ (i=0 to 405) p_i²

where p_i is the use probability of cylinder i.

n = observed number of two consecutive inputs/outputs on the same cylinder
A PERFORMANCE EVALUATION OF THE CII SIRIS 8 O.S.
T = number of disk accesses
r = n / T
dr = confidence interval for r, with a probability of error equal to 0.01
v = ratio of arm movements lower than 10 cylinders
Disk    1        2        3        4        5
p       0.1208   0.0319   0.0814   0.0255   0.0354
n       22983    25003    5806     3593     18104
T       68277    55039    8522     7996     24925
r       0.3366   0.4543   0.6813   0.4493   0.7263
dr      0.0047   0.0021   0.005    0.0056   0.0028
v       0.7937   0.5302   0.8468   0.4583   0.8701
We conclude that two consecutive disk addresses are not independent.

V.3. RATE OF INTERVENTION OF THE OPTIMIZATION ALGORITHM

For the disks used in the computing center observed, the arm movement time is 10 ms (minimum) to 70 ms (maximum), and the rotation time is 25 ms. So the maximum access time is 95 ms and the average access time is 60 ms. Let q_M be the ratio of intervals between inputs/outputs lower than 95 ms, and q_a the ratio of intervals between inputs/outputs lower than 60 ms. Let q = 1 − r = the probability of two consecutive disk addresses being on different cylinders (the tabulated q values below equal 1 − r, with r the observed same-cylinder ratio of section V.2).

q·q_M is the minimum rate of intervention of the I/O optimizer.
q·q_a is an approximation of the average rate of intervention of the I/O optimizer.
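These definitions can be checked directly against the tabulated figures; a minimal sketch using the disk-1 values (r, q_M and q_a taken from the tables of this section):

```python
# Intervention-rate computation of section V.3, using disk-1 values
# from the paper's tables.
r  = 0.3366   # observed ratio of same-cylinder consecutive accesses
qM = 0.6211   # ratio of I/O intervals shorter than 95 ms (maximum access time)
qa = 0.4677   # ratio of I/O intervals shorter than 60 ms (average access time)

q = 1 - r                 # consecutive accesses on different cylinders
min_rate = q * qM         # minimum rate of intervention of the optimizer
avg_rate = q * qa         # approximate average rate of intervention
```

The products reproduce the q·q_M and q·q_a rows of the table below (0.412 and 0.3103 for disk 1).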
L. BOI et al.
Disk number    1        2        3        4        5
q              0.6634   0.5457   0.3187   0.5507   0.2737
q_M            0.6211   0.6274   0.6207   0.635    0.7863
q_a            0.4677   0.4532   0.4762   0.528    0.5933
q·q_M          0.412    0.3424   0.1978   0.3497   0.2152
q·q_a          0.3103   0.2473   0.1518   0.2908   0.1624
We give below, as an example, some results for one disk and one session.

The 10 most used cylinders (Disk 1):

Cylinder number    Number of uses    Rate of use in %
5                  16602             20.42
4                  13389             16.47
6                  12256             15.07
1                  11494             14.13
3                  6628              8.15
55                 1740              2.14
71                 1473              1.81
100                1437              1.77
216                1404              1.73
46                 1363              1.68
Arm movements:

Length of movement    Number    Rate in %
0                     22983     33.66
1                     14458     21.18
2                     8569      12.55
3                     3924      5.75
4                     2253      3.3
5                     1219      1.76
50                    549       0.8
70                    531       0.78
66                    429       0.63
51                    414       0.61
VI. CONCLUSION

This study shows that the assumption of uniform distribution of disk addresses is not true, so we may have misgivings about the efficiency of the optimization algorithm. But we have also shown that this algorithm is useless in at most 70% of inputs/outputs. The utility of such an algorithm may therefore be questioned. However, we must remember that there is no actually good algorithm for disk policy; it may be necessary to have different algorithms for different uses of disks (swap, users, systems). But the most important interest of this work is to show that the analysis tool is almost more important than the measurement tool in order to obtain significant results. As such a tool may easily be made portable, we hope that an important effort will be made in that direction.
[Figures: measurement plots — the number of accesses per sector, and a utilization curve over time for 2 physical disks; the graphs and their French annotations are not recoverable from the scan.]
ACKNOWLEDGMENTS

We acknowledge: Mr TERRINE, responsible for the SFER project; the CII team, responsible for the SIRIS 8 conception; and those responsible for the computing centers.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed.
© North-Holland Publishing Company (1976)

TASK SEQUENCING IN A BATCH ENVIRONMENT WITH SETUP TIMES

John Bruno and Ravi Sethi†
Computer Science Department
Pennsylvania State University, University Park, PA 16802

The problem of sequencing a finite collection of tasks on one or more machines so as to minimize the average finishing time is considered. Tasks are divided into classes, and associated with each class is a setup time. Dynamic programming solutions are given. Precedence constraints are also considered.

INTRODUCTION

We consider a class of scheduling problems consisting of a collection of r classes of tasks to be processed by m ≥ 1 machines. There is a setup task for each class which must be processed prior to processing tasks in its class. There is a known processing time and deferral cost for each task, and there are precedence constraints among the tasks within a class. The scheduling discipline is nonpreemptive and no inserted idle time is allowed. The objective is to minimize the sum of the weighted finishing times of all the tasks. There are many examples in both computer and job-shop scheduling in which a penalty is incurred when we switch the processor from one task to another. Consider, for example, processing a collection of programs written in several different source languages and submitted as a batch for compilation. The loading time of a specific language processor corresponds to the penalty incurred (setup time) when we switch from one class of programs to another. The compilation time for a program can, in most cases, be accurately estimated from the length of the source code. If we consider a single processor and all unity deferral costs, our problem is to determine a sequence in which to load the language processors (they may have to be loaded more than once) and a sequence in which programs should be processed so as to minimize the average finishing time of all the tasks. One can envision similar examples in which the setup time corresponds to loading a microprogram which can process a particular class of tasks. In the next section we give a precise specification of the problem.
Following this, we consider the special case in which there are no precedence constraints. We give a basic property of optimal schedules which clarifies the sequencing order within a class. This leads to a dynamic programming solution on one machine whose time complexity is O(r²nʳ), where r is the number of classes and n is the total number of nonsetup tasks. We also give a solution for more than one machine, but here we must impose the additional restriction that all the deferral costs are identical. We then allow precedence constraints but restrict ourselves to a single machine. Lemma 3 gives a basic sequencing property of tasks within a class for optimal schedules. When we restrict the precedence constraints to the series-parallel type, this property allows us to efficiently transform the problem with precedence constraints to one without precedence constraints. In the last section we
†This research has been partially supported by the National Science Foundation under Faculty Fellowship Number GZ-3707.
discuss some complexity-related issues.

PROBLEM SPECIFICATION

Let r be a fixed positive integer giving the number of classes of tasks. There are (k_i + 1) ≥ 1 tasks in class i, denoted by T_i0, T_i1, ..., T_ik_i. Task T_i0 is the setup task for class i. The processing time and deferral cost for task T_ij are denoted by τ_ij and ω_ij, respectively. The processing times are all positive except possibly for the setup tasks, the deferral costs are nonnegative, and ω_i0 = 0. Associated with class i is an (irreflexive) partial order ...

[A portion of the paper is missing from the scan here; the text resumes in the dynamic-programming section.]

Let n̄, s̄, and ā be vectors denoting (n_1,...,n_r), (s_1,...,s_m), and (a_1,...,a_m), respectively. If x̄ is a vector, then x̄(i) denotes the i-th coordinate of x̄. Let 0̄ denote a vector with all coordinates equal to 0. Define the set C(n̄,ā) of pairs of integers as

    C(n̄,ā) = { (i,j) | 1 ≤ i ≤ r and 1 ≤ j ≤ m and n̄(i) > 0 and ā(j) > 0 }.
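For the single-machine case with unit deferral costs described in the introduction, the dynamic-programming idea can be sketched as follows. This is an assumed reconstruction for illustration, not the paper's exact recurrence (which also handles general deferral costs and m machines); it relies on the within-class shortest-processing-time order being optimal when all costs are equal.

```python
# Hedged DP sketch: one machine, unit deferral costs, no precedence
# constraints. State: how many tasks of each class are already scheduled,
# plus the class the machine is currently set up for. Scheduling an item
# of duration t delays every task not yet completed, so it contributes
# t * (number of remaining tasks) to the total finishing time.
from functools import lru_cache

def min_total_finishing_time(setup, times):
    # setup[i]: setup time of class i; times[i]: task processing times of
    # class i, taken shortest-first within each class.
    times = tuple(tuple(sorted(ts)) for ts in times)
    r = len(setup)

    @lru_cache(maxsize=None)
    def M(done, cur):
        remaining = sum(len(times[i]) - done[i] for i in range(r))
        if remaining == 0:
            return 0.0
        best = float("inf")
        for i in range(r):
            if done[i] == len(times[i]):
                continue
            cost = (setup[i] if i != cur else 0) * remaining   # resetup delay
            cost += times[i][done[i]] * remaining              # task delay
            nxt = list(done); nxt[i] += 1
            best = min(best, cost + M(tuple(nxt), i))
        return best

    return M((0,) * r, -1)   # -1: no class loaded initially
```

The number of states is proportional to r·Π(k_i + 1), in line with the O(r²nʳ) complexity quoted in the introduction. For example, two classes with setup times 1, task times {1, 2} and {1}, give a minimum total finishing time of 12 (e.g., setup 1, tasks 1 and 2, setup 2, task 1, with completion times 2, 4 and 6).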
We now supply the optimality equations:

    (5)  M(0̄; s̄; 0̄) = 0
    (6)  M(n̄; s̄; ā) = min ...

[The right-hand side of equation (6) and several following pages are missing from the scan. The text resumes in the middle of a later proof:]

... and accordingly ρ(U_{q-1}) ≥ ρ(U_q), a contradiction; therefore we conclude that U_q must occur as an uninterrupted sequence in every optimal schedule with respect to <'.

Part 2. In this part of the proof we show that there is an optimal schedule with respect to <' of the form indicated in Fig. 4. Through a series of interchanges which do not increase the mwft (mean weighted finishing time), we can move U_q to the position between U_β and U_γ, thereby obtaining a schedule S' which satisfies the sequencing constraints with respect to <' ...

[Another portion of the scan is missing; the text resumes in the next paper, "Product Form and Local Balance in Queueing Networks" by K.M. Chandy, J.H. Howard, Jr., and D.F. Towsley:]

... > 0 for all non-empty feasible S. It may be possible to describe a discipline by more than one set of parameters a(i|S,k) and r(i|S). For example, the Processor Sharing discipline is characterized by equal service rates r(i|S) = 1/n (1 ≤ i ≤ n).
The occupancy and state of the network are denoted by S̄ and X̄, respectively. We define:

    S̄ = (S_1, ..., S_U)                X̄ = (X_1, ..., X_U)
    S̄+(u,i,k)   = (S_1, ..., S_u+(i,k), ..., S_U)
    X̄+(u,i,k,y) = (X_1, ..., X_u+(i,k,y), ..., X_U)
    S̄-(u,i)     = (S_1, ..., S_u-(i), ..., S_U)
    X̄-(u,i)     = (X_1, ..., X_u-(i), ..., X_U)
    r(u,i|S̄) = r_u(i|S_u)
    a(u,i|S̄,k) = a_u(i|S_u,k)
    n(u)   = the number of customers in queue u
    k(u,i) = the class of the customer at station i of queue u
K.M. CHANDY, J.H. HOWARD, JR., and D.F. TOWSLEY

    x_{u,i} = the remaining service requirement of the customer at station i of queue u.
By an argument similar to the one given for single queues in [22], the equilibrium state probability function p(X) for a network must satisfy the network balance equation:

Σ_{u=1..U} { Σ_{i=1..n(u)} [ r(u,i|S) ∂p(X)/∂X_{u,i} + Σ_{v=1..U} Σ_{k=1..K} Σ_{j=1..n(v)+1} b_{v,k;u,k(u,i)} a(u,i|S-(u,i),k(u,i)) f_{u,k(u,i)}(X_{u,i}) r(v,j|S-(u,i)+(v,j,k)) p(X-(u,i)+(v,j,k,0+)) ] } = 0        (10)
We assume the network is ergodic, and the balance equation has a unique solution. As with individual queues, we subdivide the network balance equation. p(X) satisfies local balance for queue u in the network if the term in braces {} vanishes for all feasible X. This balances the gain in probability density of state X due to arrivals in queue u against the loss in probability density due to service in queue u. Refining further, p(X) satisfies station balance for queue u in the network if the term in brackets [] vanishes for all i = 1,...,n(u) and all feasible X. This balances the gain due to arrivals at station i against the loss due to service at station i of queue u, for each i. To relate the behavior of a queue in the network to its behavior with Poisson arrivals (as studied in Section 3), we define queue u in isolation to be a queue with the same service discipline and distributions as the queue in the network, and with customers of class k arriving in a Poisson manner with rates λ_{u,k} proportional to the relative visit rates y_{u,k} of the network. Thus λ_{u,k} = A·y_{u,k}. The constant of proportionality A is unimportant provided that it is positive and not so large that an isolated queue has no steady state.

THEOREM 6. If each queue of a network satisfies local balance when isolated, then:

1.
The equilibrium state probability density function of the network takes the product form:

p(X) = (1/G) Π_{u=1..U} p_u(X_u),

where G is a normalizing constant and p_u(X_u) is the state probability density function of queue u in isolation.

2.
The network is locally balanced, and
3.
Each queue which satisfies station balance when isolated satisfies station balance in the network.
Proof:
Muntz [5] has proved part 1.
We prove parts (2) and (3) in [22].
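Theorem 6, part 1, can be exercised numerically in the simplest possible setting. The sketch below is not from the paper: it takes a single-class closed cyclic network of two exponential FCFS queues with invented rates, builds the product-form distribution p(X) = (1/G) Π_u p_u(X_u), and checks that it satisfies the global balance equations of the underlying Markov chain.

```python
# Numerical check of the product form for a tiny closed network.
# Rates mu, visit ratios y, and population N are invented for illustration;
# the theorem itself covers far richer disciplines and service distributions.
mu = [1.0, 2.0]      # service rates of queues 1 and 2
y = [1.0, 1.0]       # relative visit rates (cyclic routing)
N = 3                # customers in the closed network

states = [(n, N - n) for n in range(N + 1)]   # (n1, n2) with n1 + n2 = N

def weight(state):
    """Unnormalized product form: prod_u (y_u / mu_u) ** n_u."""
    w = 1.0
    for u, n in enumerate(state):
        w *= (y[u] / mu[u]) ** n
    return w

G = sum(weight(s) for s in states)            # normalizing constant
p = {s: weight(s) / G for s in states}

def check_balance(p):
    """Global balance: probability flow out of each state equals flow in."""
    for (n1, n2) in states:
        out_rate = (mu[0] if n1 > 0 else 0.0) + (mu[1] if n2 > 0 else 0.0)
        inflow = 0.0
        if n2 >= 1:   # queue-1 completion in (n1+1, n2-1) leads to (n1, n2)
            inflow += mu[0] * p[(n1 + 1, n2 - 1)]
        if n1 >= 1:   # queue-2 completion in (n1-1, n2+1) leads to (n1, n2)
            inflow += mu[1] * p[(n1 - 1, n2 + 1)]
        if abs(out_rate * p[(n1, n2)] - inflow) > 1e-12:
            return False
    return True
```

Here the per-queue factors are geometric, which is exactly what the isolated M/M/1 queues contribute to the product.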
PRODUCT FORM AND LOCAL BALANCE IN QUEUEING NETWORKS

A class independent discipline (CID) network is one in which each queue has a class independent discipline. We define the CID product form of a network state probability function by combining the network and isolated-queue product forms:

p(X) = (1/G) Π_{u=1..U} [ q_u(n(u)) Π_{i=1..n(u)} y_{u,k(u,i)} (1 - F_{u,k(u,i)}(X_{u,i})) ],

where

q_u(n(u)) = Π_{m=1..n(u)} 1/R_u(m)

and G is a normalizing constant. Clearly, G will be a function of the individual queues in the network, their interconnections, and the number of customers of each class in the network.
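The normalizing constant G can be computed without enumerating the state space. The sketch below uses Buzen's convolution algorithm for a closed single-class network; this is a standard technique rather than anything defined in the paper, and the per-queue loads rho (playing the role of y_u/mu_u) are invented.

```python
# Two ways to obtain the normalizing constant of a closed single-class
# product-form network; Buzen's convolution avoids state enumeration.
def buzen_G(rho, N):
    """Normalizing constant G(N) for per-queue loads rho."""
    g = [1.0] + [0.0] * N                # g[n]: constant over queues seen so far
    for r in rho:
        for n in range(1, N + 1):
            g[n] += r * g[n - 1]         # g_u(n) = g_{u-1}(n) + r * g_u(n-1)
    return g[N]

def brute_force_G(rho, N):
    """Same constant by direct summation over all population vectors."""
    def rec(u, left):
        if u == len(rho) - 1:
            return rho[u] ** left
        return sum(rho[u] ** n * rec(u + 1, left - n) for n in range(left + 1))
    return rec(0, N)

rho = [0.5, 1.0, 2.0]                    # invented loads for three queues
```

The convolution costs O(U·N) operations versus the combinatorial cost of the direct sum, which is why it is the usual way to evaluate such constants.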
THEOREM 7. A CID network has the CID product form at equilibrium if and only if each queue in the network has the product form when isolated.

Proof: By Theorem 5, the product form at equilibrium for individual CID queues when isolated implies local balance when isolated, and thus by Theorem 6 it implies the CID product form. The forward implication is shown in [22].

APPENDIX - NOTATION

Queue Structure
K        the (fixed) number of customer classes.
k        a particular customer class, 1 ≤ k ≤ K.
n        the number of customers in a queue, n ≥ 0.
i        an occupied station, 1 ≤ i ≤ n.
k(i)     the class of the customer at station i.
S = (k(1),...,k(n))   the occupancy, or discrete part of the state.
X_i      the remaining service requirement of the customer at station i, X_i > 0.

..., where |ξ^i| is the total number of modules transferred at τ^i (therefore τ^{i+1} > τ^i). The two formulations are equivalent iff h = 0.

2. HEURISTIC FORMULATION.

We use dynamic programming methods to derive functional inequalities for V(n,m) defined by (1.2). Let t be an arbitrary point on the time axis, (n,m) the corresponding state and τ^1 the first decision time with τ^1 ≥ t.

i)
If τ^1 > t, we choose δ, 0 < δ < τ^1 - t; on [t, t+δ[, the process evolves like a pure Markov process. Assuming that optimal decisions are taken after t+δ, the cost on [t,+∞[ is:

(2.1)  X = E_{nm} ( ∫_0^δ e^{-αs} f(N_s,M_s) ds + e^{-αδ} V(N_δ,M_δ) ).

ii) If τ^1 = t, a transfer occurs, and the cost is

(2.2)  Y(ξ) = k(ξ) + V(n-ξ, m+ξ)
G. FAYOLLE and M. ROBIN
where ξ = 1 and k(ξ) = k_{12} (resp. ξ = -1 and k(ξ) = k_{21}) if the transfer is made from 1 to 2 (resp. from 2 to 1). Obviously, (2.2) holds iff n-ξ ≥ 0 and m+ξ ≥ 0
(see Section 3.1 for details).
The optimal cost V(n,m) must verify

(2.3)  V(n,m) = min { X, min_{ξ=-1,+1} Y(ξ) }.

From now on, we set MV(n,m) = min_{ξ=-1,+1} Y(ξ). (2.3) can be written

(2.4)  V(n,m) ≤ X
(2.5)  V(n,m) ≤ MV(n,m)
(2.6)  (V(n,m) - X)(V(n,m) - MV(n,m)) = 0
The infinitesimal generator A of the Markov process (N_t,M_t) is

(2.7)  A W(n,m) = χ_{n<N} λ_1 [W(n+1,m) - W(n,m)] + χ_{n>0} μ_1 [W(n-1,m) - W(n,m)]
                + χ_{m<M} λ_2 [W(n,m+1) - W(n,m)] + χ_{m>0} μ_2 [W(n,m-1) - W(n,m)].
Using Dynkin's formula,

E_{nm} e^{-αδ} V(N_δ,M_δ) - V(n,m) = E_{nm} ∫_0^δ e^{-αs} (AV - αV)(N_s,M_s) ds.

Dividing by δ and letting δ → 0, it follows that

(2.8)  -AV + αV ≤ f,  V ≤ MV,  (-AV + αV - f)(V - MV) = 0.
Remark 2.1. In (2.8), V is a nonnegative function on {0,1,...,N} × {0,1,...,M}.
The uniqueness results from the stochastic interpretation of u. Let us define

(3.8)  τ = Inf { s ≥ 0 | u(N_s,M_s) = ψ(N_s,M_s) },

which is a stopping time w.r.t. the family of σ-fields F_t = σ{ N_s,M_s | s ≤ t }. Dynkin's formula (cf. [6]) yields

(3.9)  E_{nm} e^{-ατ} u(N_τ,M_τ) - u(n,m) = E_{nm} ∫_0^τ e^{-αs} (Au - αu)(N_s,M_s) ds.

But, by definition of τ, t ∈ [0,τ[ ⇒ u(N_t,M_t) < ψ(N_t,M_t); hence, in (3.7), we have for t ∈ [0,τ[

(3.10)  -Au + αu = f,

which gives in (3.9)

(3.11)  u(n,m) = E_{nm} ( ∫_0^τ e^{-αs} f(N_s,M_s) ds + e^{-ατ} ψ(N_τ,M_τ) ).

Now let τ be an arbitrary stopping time w.r.t. F_t. From the same Dynkin's formula and (3.7), it follows that

(3.12)  u(n,m) ≤ E_{nm} ( ∫_0^τ e^{-αs} f(N_s,M_s) ds + e^{-ατ} ψ(N_τ,M_τ) ).
OPTIMAL QUEUEING POLICIES IN MULTIPLE-PROCESSOR COMPUTERS
(3.11) and (3.12) imply

(3.13)  u(n,m) = inf_τ E_{nm} ( ∫_0^τ e^{-αs} f(N_s,M_s) ds + e^{-ατ} ψ(N_τ,M_τ) ),

where the infimum is taken over the set of all stopping times w.r.t. F_t. But the previous computation holds for any solution of (3.7) and, as the right member of (3.13) is uniquely defined, the solution of (3.7) (if it exists) is unique. Now the existence of a solution of (3.7) is a consequence of Theorem 5.1, p. 245, of J.L. Lions [7] (the theorem of [7] is stated in a much more general context). Another proof could be given by using the "penalized problem"
-Au_ε + αu_ε + (1/ε)(u_ε - ψ)^+ = f,

for which it can be shown that u_ε ↓ u, the solution of (3.7). The proof of (i) is terminated.
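The stochastic interpretation behind (3.9)-(3.13) is easy to exercise in the no-stopping case τ = +∞, where the discounted cost w(n,m) = E_{nm} ∫_0^∞ e^{-αs} f(N_s,M_s) ds solves -Aw + αw = f for the freely evolving pair of queues. The sketch below (all parameter values invented, not those of the paper) solves this linear system by Gauss-Seidel sweeps:

```python
# Discounted holding cost of the uncontrolled two-queue Markov process,
# from the resolvent equation alpha*w - A w = f; illustrative parameters.
lam1, mu1, lam2, mu2 = 0.5, 1.0, 0.8, 1.2
Ncap, Mcap, alpha = 5, 5, 0.3
f = lambda n, m: n + m                      # holding cost

w = {(n, m): 0.0 for n in range(Ncap + 1) for m in range(Mcap + 1)}
for sweep in range(2000):
    delta = 0.0
    for (n, m) in w:
        num, den = f(n, m), alpha
        for rate, nb in (
            (lam1 if n < Ncap else 0.0, (n + 1, m)),   # arrival, queue 1
            (mu1 if n > 0 else 0.0, (n - 1, m)),       # service, queue 1
            (lam2 if m < Mcap else 0.0, (n, m + 1)),   # arrival, queue 2
            (mu2 if m > 0 else 0.0, (n, m - 1)),       # service, queue 2
        ):
            num += rate * w.get(nb, 0.0)
            den += rate
        new = num / den                     # Gauss-Seidel update
        delta = max(delta, abs(new - w[(n, m)]))
        w[(n, m)] = new
    if delta < 1e-12:
        break
```

Because α > 0 the sweep map is a contraction, so the iteration converges to the unique solution, exactly as the uniqueness argument above leads one to expect.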
Proof of (ii). The stochastic interpretation of u given by (3.13) yields ψ ≤ ψ' ⇒ u ≤ u', if u and u' are the solutions of (3.7) corresponding to ψ and ψ' respectively. We establish below that V^{k+1} ≤ V^k by induction.

Demonstration. Let us first prove that 0 ≤ V^1 ≤ V^0; we have
(3.14)  V^1(n,m) = inf_τ E_{nm} ( ∫_0^τ e^{-αs} f(N_s,M_s) ds + e^{-ατ} MV^0(N_τ,M_τ) ).

We set

J^1_{nm}(τ) = E_{nm} ( ∫_0^τ e^{-αs} f(N_s,M_s) ds + e^{-ατ} MV^0(N_τ,M_τ) ).

Clearly,

V^1(n,m) ≤ J^1_{nm}(+∞) ≤ V^0(n,m).

Now if V^k ≤ V^{k-1}, the operator M being non-decreasing, we have MV^k ≤ MV^{k-1}, hence J^{k+1}_{nm}(τ) ≤ J^k_{nm}(τ), and it follows that V^{k+1} ≤ V^k.

Furthermore, f ≥ 0 easily implies (via the stochastic interpretation) V^k ≥ 0 for all k.
Therefore V^k converges to a limit V. Upon letting k → ∞, we obtain -AV + αV ≤ f, as V^k ≤ MV^{k-1} can equivalently be written

V^k(n,m) ≤ k_{12} + V^{k-1}(n-1, m+1)
V^k(n,m) ≤ k_{21} + V^{k-1}(n+1, m-1)

(for n-1 ≥ 0, m-1 ≥ 0, n+1 ≤ N, m+1 ≤ M; the boundary cases are straightforward).
We can take the limit for each (n,m) (there is a finite number of states), which yields V ≤ MV. From the same argument applied to the third equation of (3.5), it follows that V is a solution of (3.1) to (3.3).
To show the uniqueness, we use a proof similar to Laetsch [8]. Let u_1 and u_2 be two solutions of (3.1) to (3.3), and let γ be the greatest number such that 0 < γ ≤ 1 and γu_1 ≤ u_2. Assuming γ < 1, there exists δ, γ < ...

Let

τ^1 = Inf ( s ≥ 0 | V(N^1_s,M^1_s) = MV(N^1_s,M^1_s) ),

and define ζ^1 as follows. Let

(3.18)  M^1 V(n,m) = min_{ξ>0} [ k_{12} ξ + V(n-ξ, m+ξ) ]
(3.19)  M^2 V(n,m) = min_{ξ>0} [ k_{21} ξ + V(n+ξ, m-ξ) ]

(of course the minimization is done over values of ξ which keep n-ξ, n+ξ, m+ξ, m-ξ in {0,1,...,N} or {0,1,...,M}). Let ξ^1(n,m) realize the minimum of M^1 V(n,m) and ξ^2(n,m) realize the minimum of M^2 V(n,m). Then ζ^1 is defined as ξ^1(N^1_{τ^1},M^1_{τ^1}) if M^1 V(N^1_{τ^1},M^1_{τ^1}) ≤ M^2 V(N^1_{τ^1},M^1_{τ^1}), and as ξ^2(N^1_{τ^1},M^1_{τ^1}) in the other case. According to the previous definitions, we have

V(N^1_s,M^1_s) < MV(N^1_s,M^1_s),   s ∈ [0,τ^1[,

and (by (3.1)-(3.3)),

-AV(N^1_s,M^1_s) + αV(N^1_s,M^1_s) = f(N^1_s,M^1_s),   s ∈ [0,τ^1[.
From

E_{nm} e^{-ατ^1} V(N^1_{τ^1},M^1_{τ^1}) - V(n,m) = E_{nm} ∫_0^{τ^1} e^{-αs} (AV - αV)(N^1_s,M^1_s) ds

(valid, because (N^1_s,M^1_s) is a Markov process), we get

(3.20)  V(n,m) = E_{nm} ( ∫_0^{τ^1} e^{-αs} f(N^1_s,M^1_s) ds + e^{-ατ^1} c_1 ζ^1 ) + E_{nm} e^{-ατ^1} V(N^2_{τ^1},M^2_{τ^1}),

where c_1 = k_{12} if the transfer is made from (1) to (2) and c_1 = k_{21} otherwise.
Defining, now,

(3.21)  τ^2 = Inf ( s ≥ τ^1 | V(N^2_s,M^2_s) = MV(N^2_s,M^2_s) ),

there comes, for s ∈ [τ^1,τ^2[,

E_{nm} e^{-ατ^2} V(N^2_{τ^2},M^2_{τ^2}) - E_{nm} e^{-ατ^1} V(N^2_{τ^1},M^2_{τ^1}) = -E_{nm} ∫_{τ^1}^{τ^2} e^{-αs} f(N^2_s,M^2_s) ds,

V(N^2_{τ^2},M^2_{τ^2}) = c_2 ζ^2 + V(N^3_{τ^2},M^3_{τ^2}),

and so on...

This leads to:

(3.22)  V(n,m) = E_{nm} ( ∫_0^{τ^k} e^{-αs} f(N_s,M_s) ds + Σ_{j≤k} e^{-ατ^j} c_j ζ^j + e^{-ατ^k} V(N_{τ^k},M_{τ^k}) ).

V is bounded, and letting τ^k ↑ +∞, we have:
V(n,m) = J_{nm}(v*),

where v* is the control defined by the previous construction (3.18), (3.19), (3.21). For an arbitrary control v, the same computation gives inequalities in (3.20), (3.22) [on account of (3.1) and (3.2)], and it follows that

V(n,m) ≤ J_{nm}(v).

Thus V is the optimal cost.

The case h ≠ 0. We still denote by A the operator defined in (2.10), and we give briefly the modifications w.r.t. Section 3.1. It can be proved (as for Theorem 2.1) that the sequence V^k is well defined. But here, we show that V^k → V uniformly. Upon setting
||V^{k+1} - V^k|| = sup_{n,m} |V^{k+1}(n,m) - V^k(n,m)|,

it follows from (3.13) that

||V^{k+1} - V^k|| ≤ ||MV^k - MV^{k-1}||,

and easily

||MV^k - MV^{k-1}|| ≤ e^{-αh} ||V^k - V^{k-1}||.

Then, for α > 0, h > 0, the mapping V^k → V^{k+1} is a contraction mapping, and V^k → V
uniformly w.r.t. (n,m). The end of the proof is identical to that of Theorem 2.1.

4. THE LONG RUN AVERAGE COST.

The payoff is now defined by

(4.1)  J_{nm}(v) = lim inf_{T→∞} (1/T) E_{nm} ( ∫_0^T f(N_s,M_s) ds + Σ_{τ^i ≤ T} k(ξ^i) ).

We denote by V_α(n,m) the optimal cost in the discounted problem and

(4.2)  V_o(n,m) = inf_v J_{nm}(v).
Theorem 4.1.

i) |V_α(n,m) - V_α(o,o)| ≤ C, a constant;
ii) αV_α → V_o, which is a constant, and there exists a function h(n,m) such that

(4.3)  -Ah + V_o ≤ f,  h ≤ Mh,  (-Ah + V_o - f)(h - Mh) = 0.
Proof. Let τ^o = inf ( s ≥ 0 | (N_s,M_s) = (o,o) ) (here (N_s,M_s) is the Markov process describing the free evolution of the two queues). Setting τ_{nm} = inf ( s ≥ 0 | (N_s,M_s) = (n,m) ), it is well known that E_{nm} τ^o and E_{oo} τ_{nm} are finite. Hence, from (3.1), we have

V_α(n,m) ≤ E_{nm} ( ∫_0^{τ^o} e^{-αs} f(N_s,M_s) ds + e^{-ατ^o} V_α(N_{τ^o},M_{τ^o}) ),

i.e.

V_α(n,m) ≤ ||f|| E_{nm} τ^o + V_α(o,o)  ⇒  V_α(n,m) - V_α(o,o) ≤ C.

Taking the initial condition (o,o), the same argument with τ_{nm} entails

V_α(o,o) - V_α(n,m) ≤ C,

where C = ||f|| sup_{n,m} ( E_{nm} τ^o, E_{oo} τ_{nm} ) < +∞, because the state space is finite. From αV_α ≤ ||f|| = sup_{n,m} f, we know that there exists a sequence α_k ↓ 0 such that α_k V_{α_k} → V_o, and

|V_{α_k}(n,m) - V_{α_k}(o,o)| ≤ C,
lim_{k→∞} α_k V_{α_k}(n,m) = lim_{k→∞} α_k V_{α_k}(o,o),

and V_o = constant. Using A V_α(o,o) = 0 (V_α(o,o) is a constant) and setting h_α(n,m) = V_α(n,m) - V_α(o,o), we have
-Ah_α + αh_α + αV_α(o,o) ≤ f,
h_α ≤ Mh_α,
(-Ah_α + αh_α + αV_α(o,o) - f)(h_α - Mh_α) = 0.

Hence, if h = lim_{k→∞} ( V_{α_k}(n,m) - V_{α_k}(o,o) ), we get the previously asserted inequalities:

-Ah + V_o ≤ f,  h ≤ Mh,  (-Ah + V_o - f)(h - Mh) = 0.

By the construction of Section 3.2, one can show that V_o is the optimal average cost. [We could easily proceed as in Section 3.2:

V_o · E_{nm}(τ^k ∧ T) ≤ E_{nm} ( ∫_0^{τ^k ∧ T} f(N_s,M_s) ds + Σ_{i≥1} k(ξ^i) χ_{τ^i ≤ T} ) + E_{nm} h(N_{τ^k ∧ T},M_{τ^k ∧ T}) - h(n,m),

where the equality holds when the τ^k are defined by

τ^k = Inf ( s ≥ τ^{k-1} | h(N_s,M_s) = Mh(N_s,M_s) ).]
5. ABOUT SOME OTHER CASES.

We give here the form of the optimality conditions in some other cases. The optimal policy is obtained by using repeatedly the method of Section 3. In general, the optimality conditions are always of the form (3.1) to (3.3) if the uncontrolled process is Markovian with A as its infinitesimal operator.

5.1. Finite horizon problem.

The state is described by (t,N_t,M_t) and the operator is ∂/∂t + A, where A is given by (2.7). Moreover, we must add the final condition V(T,n,m) = 0 in the inequalities, if T is the value of the horizon.

5.2. M/M/s queues.

The only modification is that μ_1 and μ_2 are replaced by μ_1 (s_1 ∧ n) and μ_2 (s_2 ∧ m), where s_1 and s_2 are respectively the numbers of servers in subsystems (1) and (2).
5.3. General service time distributions.

G_1(t) and G_2(t) denote the distributions of service times. We define

μ_i(y) = G'_i(y) / (1 - G_i(y)),   i = 1,2,

where G'_i(y) = dG_i(y)/dy. It is known that in an M/G/1 queue, the process (N_t,Y_t) is Markovian if Y_t represents the elapsed time from the last beginning of service (if N_t ≠ 0; we take Y_t = 0 if N_t = 0). Therefore, in our case, the state will be described by (N_t,Y^1_t,M_t,Y^2_t), and the operator A has the form A_1 + A_2, where

A_1 V(n,y_1,m,y_2) = χ_{n>0} ( ∂V/∂y_1 + μ_1(y_1) [V(n-1,0,m,y_2) - V(n,y_1,m,y_2)] ) + χ_{n<N} λ_1 [V(n+1,y_1,m,y_2) - V(n,y_1,m,y_2)]

and A_2 is analogously defined with ∂/∂y_2 and μ_2(y_2). The form of A (a differential operator) introduces some additional difficulties in the proof of the existence of a solution to the inequalities obtained in Section 3. Nevertheless, the previous assertions remain valid. (For instance, the proof can be extended as in [5].)

6. A NUMERICAL EXAMPLE.

Observe that the procedure used in Section 3 gives a numerical algorithm if inequalities (3.7) (or (3.5)) can be solved. These inequalities are known in numerical analysis as "variational inequalities" (in this paper we have a very simple case of variational inequalities), and a suitable numerical method to solve them is the relaxation-projection method, cf. [9], which we use below. The algorithm is the following:

1. Solve (3.4).
2. Solve (3.5),

until

sup_{n,m} |V^k(n,m) - V^{k-1}(n,m)| < ε,
where ε is a given error threshold. The numerical data for the example were the following:

N = 20, M = 20, λ_1 = 0.5, λ_2 = 1.0, μ_1 = 0.5, μ_2 = 4,
f(n,m) = c_1 n + c_2 m + c_3 (δ_{nN} + δ_{mM})   (δ_{ij} = 1 if i = j, 0 otherwise),
c_1 = 1, c_2 = 1.1, c_3 = 10, k_{12} = k_{21} = 0.1, h = 0 or 0.01.

On Figures 1 and 2, three regions appear: region 1, where transfers are made from 1 to 2;
region 2, where transfers are made from 2 to 1; and an intermediate region 3, where no decision is taken.
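The successive-approximation scheme of Section 3 can be sketched directly, using the Section 6 data as far as they are legible. The discount rate α is not stated in the text, so α = 1 is an assumption here, and plain Gauss-Seidel value iteration stands in for the relaxation-projection method of [9]:

```python
# Value iteration for V = min(X, MV), cf. (2.3), on the truncated grid;
# parameters follow the Section 6 example where legible, alpha is assumed.
N = M = 20
lam1, mu1, lam2, mu2 = 0.5, 0.5, 1.0, 4.0
c1, c2, c3, k12, k21, alpha = 1.0, 1.1, 10.0, 0.1, 0.1, 1.0

def f(n, m):
    return c1 * n + c2 * m + c3 * ((n == N) + (m == M))

V = {(n, m): 0.0 for n in range(N + 1) for m in range(M + 1)}

def MV(n, m):
    """Transfer operator: best single-module transfer, if feasible."""
    best = float("inf")
    if n >= 1 and m + 1 <= M:
        best = min(best, k12 + V[(n - 1, m + 1)])   # transfer 1 -> 2
    if m >= 1 and n + 1 <= N:
        best = min(best, k21 + V[(n + 1, m - 1)])   # transfer 2 -> 1
    return best

for sweep in range(5000):
    delta = 0.0
    for (n, m) in V:
        num, den = f(n, m), alpha
        for rate, nb in ((lam1 if n < N else 0.0, (n + 1, m)),
                         (mu1 if n > 0 else 0.0, (n - 1, m)),
                         (lam2 if m < M else 0.0, (n, m + 1)),
                         (mu2 if m > 0 else 0.0, (n, m - 1))):
            num += rate * V.get(nb, 0.0)
            den += rate
        new = min(num / den, MV(n, m))              # "continue" vs "transfer"
        delta = max(delta, abs(new - V[(n, m)]))
        V[(n, m)] = new
    if delta < 1e-10:
        break
```

At the fixed point, the transfer regions of the kind shown in Figures 1 and 2 can be read off as the states where the MV branch of the min is active.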
Some remarks must be made about the numerical experiments:

i) c_3 is used to penalize the state when the system rejects an arrival. Otherwise, the policy would sometimes allow a queue to fill up in order to "reduce" the arrival rate.

ii) If c_1 = c_2, the policy is very insensitive to the variations of λ_i, μ_i, i = 1,2, and a transfer occurs either when one queue is idle or when it is full (this last fact being due to the cost c_3).

In the situation presented here, the computer time used is about 5 seconds on an IBM 370/168 computer.
CONCLUSION.

In this paper, we have controlled the jockeying between two interconnected queues. Obviously, the method used applies in the general case of n queues. The problem we are now faced with is to calculate analytically the values of certain measures of performance in the optimally controlled process. Taking the example of a holding cost function related to the global waiting time (i.e., for instance, f(n,m) = n/λ_1 + m/λ_2), and for arbitrarily low transfer costs (k_{12} and k_{21}), what is the magnitude of the improvement w.r.t. independent processors?
REFERENCES.

[1] E.G. Coffman, I. Mitrani, "Selecting a scheduling rule that meets pre-specified response time demands", Proceedings Fifth Symposium on Operating Systems Principles, Nov. 1975, Austin, Texas.
[2] H. Glazer, "Jockeying in queues", Operations Research 6, 145 (1958).
[3] E. Koenigsberg, "On jockeying in queues", Management Science, Vol. 12, No. 5, 1966.
[4] S.H. Fuller, D. Siewiorek, "Some observations on semiconductor technology and the architecture of large digital modules", Computer, Vol. 6, No. 10, Oct. 1973, pp. 14-21.
[5] M. Robin, "Some optimal control problems for queueing systems", Proceedings Symposium on Stochastic Systems, Lexington, Ky., USA, 1975 (to appear in Math. Programming Studies).
[6] E. Dynkin, "Markov Processes", Vol. 1, Springer-Verlag, 1965.
[7] J.L. Lions, "Quelques méthodes de résolution des problèmes aux limites non linéaires", Dunod, Paris, 1969.
[8] T. Laetsch, "A uniqueness theorem for elliptic quasi-variational inequalities", J. of Functional Analysis, 18, 1975, pp. 286-287.
[9] R. Glowinski, J.L. Lions, R. Trémolières, "Approximation des inéquations variationnelles", Dunod, Paris, 1976.
[Figure 1 (h = 0) and Figure 2: the computed transfer regions (region 1: transfers from 1 to 2; region 2: transfers from 2 to 1).]
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
FILE ASSIGNMENT IN MEMORY HIERARCHIES
Derrell Foster
Department of Computer Science
Duke University
Durham, North Carolina

and

J.C. Browne
Department of Computer Science
The University of Texas at Austin
Austin, Texas
This paper describes a methodology for file assignments among the devices of a heterogeneous memory hierarchy in order to balance file accesses. It has a number of features which extend its capability beyond previous methodologies. These include:

(a) Explicit account of queueing for devices in the hierarchy; it is thus applicable to a multiprocessor/multiprogramming environment.
(b) It uses total system throughput rather than device usage or performance as a performance metric.
(c) The possible reusability of files loaded into executable memory is explicitly modeled.
(d) Device size constraints in the hierarchy are expressly included.
(e) Implementation of the methodology is straightforward and its application is practical.
An implementation of the methodology is described, and the model system is then applied to several assignment problems and to study the utility of various heuristic file assignment procedures. The implementation is a two-phase hybrid model which retains all of the essential performance-determining factors without an intractable cost of evaluation. A simple heuristic which appears to be extremely effective is proposed for file assignment.

INTRODUCTION AND OVERVIEW

This paper defines a methodology for file assignment to a heterogeneous hierarchy of non-executable memories, describes the implementation of this methodology and applies a model implementation to several example problems. The methodology is demonstrated to be both effective with respect to improving system throughput and efficient in application. File transfer to and from executable memory from the levels of a hierarchically structured auxiliary memory system can often be a performance-limiting factor for large data processing computer systems. The problem of optimal assignment of files to storage devices has already received considerable attention. The great variety of auxiliary memory devices now being developed and projected [1,2]
will increase interest in the problem domain. Ramamoorthy and Chandy [3] is the starting point for most subsequent work on the memory hierarchy analysis problem. Ramamoorthy and Chandy formulated the assignment problem in operations research terms and established the technique of using a cost basis. Arora and Gallo [4] determined a methodology for optimum file assignment in a uniprogramming environment. Chow [5] has made a complete formal analysis of the uniprogramming assignment case. Shedler [6] has analyzed a particular two-level hierarchy with implicit inclusion of queueing and multiprogramming effects. Buzen [7] has shown that I/O devices should be loaded proportionately to their speed to maximize system throughput in a multiprogramming system. Chen [8] gives elegant solutions to several interesting cases of the file assignment problem, including queueing effects, by determining optimal device utilizations. Chen does not consider application of his technique, which is couched in terms of a linear programming problem. Experience suggests that applications of linear programming techniques to large file sets will be either intractable or extremely expensive.

PROBLEM STATEMENT AND SOLUTION METHODOLOGY

This paper focuses on the problem of assigning logical files (or segments) to the heterogeneous storage devices of a multiprogrammed (and/or multiprocessor) computer system so as to maximize the task or job completion rate (throughput) of the total system. Factors which affect system performance under a given file assignment include: the access time, transfer rate and storage capacity of the storage device set; the channel connections between the storage devices and the executable memory; the usage profile of the file set, including the probability that a file will be used more than once following a transfer to executable memory (file reusability); and queueing delays for device access caused by multiprogramming and/or multiprocessing.
The file assignment methodology presented in this paper takes all of these factors into account. Reusability has not been previously considered in other work on file assignment. The effects of queueing delays and explicit device size constraints are, for the first time, integrated into a complete and applicable file assignment formalism. The solution methodology presented here can be considered to be an extension of the work of Arora and Gallo [4] to a multiprogramming/multiprocessor environment with inclusion of possible reusability of files loaded into executable memory. The practical inclusion of device queueing extends the optimality of assignments into the domain of multiprocessor/multiprogrammed usage of a file storage system. The wait time associated with queueing for a device can rapidly increase its effective service time. For example, a 50% utilization of a drum with a 10 millisecond mean service time and an exponential distribution of service times gives an effective service time of 20 milliseconds, thus doubling the effective service time of the device. Multiple uses in executable memory of a file after it has been loaded from a storage system is a common occurrence in time-sharing systems and transaction-oriented systems whose language processors and/or utilities are reentrant or multiple-user code. The examples given in Section 4 demonstrate that file assignments which include the effects of multiprogramming/multiprocessing environments and loaded-file reusability can be significantly different from those generated on the basis of uniprogramming or static assignment environments. Throughput will vary significantly and be increased significantly due to the inclusion of these factors in file assignment. The solution methodology is defined as follows:

(a) Determine the characteristics of the file set, characterizing each file in terms of the volume of the file, the request frequency of the file, the instructions executed per request, the words loaded per request, and the reusability of the file. The values possible for reusability are: program file not reusable, data file not reusable, program file reusable and data file reusable.

(b) Specify the hardware configuration in terms of processors,
storage devices, and channel interconnections. The service times for each device must also be specified.

(c) Make an initial file assignment among the devices of the storage hierarchy.

(d) Determine the fraction of file references satisfied by files already loaded into executable memory, the system throughput with the initial file assignment, and the device utilizations.

(e) Determine the utilization fractions for the device sets of the storage hierarchy which maximize the total system throughput, given the reusage fraction generated preceding. (Note also that optimal device usage depends upon channel interconnection and usage.)

(f) Make a file assignment which is "optimal" with respect to generating a match for the optimal device utilization fractions determined in step (e).

(g) Determine the file reusage fraction, device utilizations and the throughput of the system under this new file assignment.

(h) Reassign files so as to correct for deviation of the actual device utilizations from the optimal device utilizations.

(i) Iterate on steps (e), (f), (g) and (h) until either no files are reassigned in (f) or (h), or until the variation in throughput on a cycle falls below a desired threshold.

Steps (d) and (g) are, of course, executed in terms of system models. Appropriate models will be defined in Section 2. It should be noted that this methodology is applicable not only at load time, but also defines a procedure for dynamic reassignment in terms of usage variations. A more detailed description of the methodology can be found in Foster [9].

A MODEL IMPLEMENTATION OF THE METHODOLOGY

A methodology for file assignment is useful in direct proportion to its implementability as well as to its insight into the factors controlling device hierarchy management. This section describes an implementation made to validate the implementability and applicability of this methodology.
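Steps (f) and (h) of the methodology amount to steering per-device load fractions toward target utilizations. A toy sketch of one greedy pass (all file names, frequencies and device speeds are invented; the targets are taken proportional to device speed, following Buzen [7]):

```python
# Greedy single pass at matching device load fractions to targets.
# Every number and name below is illustrative, not from the paper.
files = {"F1": 0.30, "F2": 0.25, "F3": 0.20, "F4": 0.15, "F5": 0.10}  # request freq
speed = {"fast_drum": 6.0, "slow_drum": 3.0, "disk": 1.0}

total_speed = sum(speed.values())
target = {d: s / total_speed for d, s in speed.items()}   # desired load fractions

load = {d: 0.0 for d in speed}
assignment = {}
for name, freq in sorted(files.items(), key=lambda kv: -kv[1]):
    # place the file where the shortfall (target minus current load) is largest
    dev = max(speed, key=lambda d: target[d] - load[d])
    assignment[name] = dev
    load[dev] += freq
```

In the full methodology this pass would be alternated with the model evaluations of steps (d) and (g) until the assignment stabilizes.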
We first consider the determination of optimal device utilization fractions (step (e) in the methodology definition). We adopt a queueing network description of the system. In this model the storage device utilization fractions are determined by the device service rates, the branching probabilities to a device from the CPU or CPUs in the system, the fractional reusability of the loaded files, and the queueing algorithms for the devices. The branching probability to a specified storage device is determined by the files assigned to the device. The branching probabilities are thus the variables to be optimized [7,8] in the determination of the optimal device utilizations, since all other factors are unaffected by file assignment (we are not considering optimization of usage within a given individual device). An example network which was used in an application given in Section 4 is shown in Figure 1. The branching probability (f) for bypassing storage device service and returning directly for additional CPU service is determined from a simulation model described below. It represents the fraction of file requests satisfied by files already loaded in executable memory. The implemented network model uses first-come-first-served scheduling of the storage devices and processor-sharing scheduling on the CPU, with exponential service rates for all devices. (More detailed models could be used if desired. The approximations here, however, are reasonable ones in the light of current practice.) This model system is analytically soluble. The values of P_i, the branching probabilities which optimize system throughput, are determined by a grid search. Hogarth and Chandy [10] and Baskett and Price [11] determined analytical procedures for optimal selection of branching probabilities on several metrics. The appropriate degree of multiprogramming is determined from the simulation model described in the next paragraph.
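The grid search of step (e) can be illustrated with a textbook closed central-server model (a CPU plus three storage devices, in the spirit of Figure 1) evaluated by exact Mean Value Analysis. MVA is a standard closed-network technique, not the solution method named in the paper, and the service times, bypass fraction, minimum per-device probability (a crude stand-in for capacity constraints) and multiprogramming level are all invented:

```python
from itertools import product

s_cpu, s_dev = 0.005, [0.010, 0.020, 0.050]  # mean service times (seconds)
fr, n_jobs = 0.3, 5                          # bypass fraction f, degree of MP

def throughput(P):
    """System throughput of the closed model under branching probs P."""
    # Service demand per job cycle: CPU once, device i with prob (1-fr)*P[i].
    D = [s_cpu] + [(1 - fr) * P[i] * s_dev[i] for i in range(3)]
    Q = [0.0] * 4
    X = 0.0
    for n in range(1, n_jobs + 1):           # exact MVA recursion
        R = [D[k] * (1 + Q[k]) for k in range(4)]
        X = n / sum(R)
        Q = [X * R[k] for k in range(4)]
    return X

best_x, best_P = 0.0, None
step = 20                                     # grid resolution 1/20 = 0.05
for a, b in product(range(step + 1), repeat=2):
    if a + b > step:
        continue
    P = (a / step, b / step, (step - a - b) / step)
    if min(P) < 0.1:                          # crude stand-in for capacity limits
        continue
    x = throughput(P)
    if x > best_x:
        best_x, best_P = x, P
```

Without the minimum-probability constraint the search would simply pile all load onto the fastest device; the real problem avoids this degeneracy through device size constraints and the reusability fraction.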
FIGURE 1: ANALYTICAL MODEL FOR DETERMINATION OF OPTIMAL BRANCHING PROBABILITIES

Evaluation of the fractional reusability of the files loaded in executable memory and application of executable memory constraints (steps (d) and (e) in the methodology definition) was accomplished by a simulation model written in ASPOL [12]. The system model is illustrated by Figure 2, which refers to one of the examples of Section 4. This model includes express allocation of executable memory to files and the reservation of channels and devices. Job flow through the model proceeds as follows. A job file is first assigned memory; a channel to the appropriate device is then assigned. The I/O request is satisfied and the job file loaded. It begins a CPU service and requests a data file. The process is continued for the data file and subsequent job and data files. Requests for files are generated in conformance to the activity profiles. Explicit account of loaded files is kept. The fractional reusability is determined from these statistics. File sets with a fairly compact frequency-of-usage profile (a common characteristic), operating on an executable memory system capable of holding several active records (or active file subsets), will show a strong coupling between reusability and loading strategies. The associated tendency of file references to cluster (which we have not included in our modeling implementation) will tend to magnify this coupling. The iterative approach between a queueing model and a simulation model described in the methodology is designed with several goals in mind. The necessity of explicit allocation of executable memory to account for file size and reusability factors requires the use of simulation for evaluation of appropriate models.
The need for exploring an extensive parameter space of possible file assignments suggested the definition of a two-phase model where the analytically soluble queueing network model can be used to determine optimal branching probabilities (and thus optimal file assignments). The model system implementation described
preceding represents a geographically compact device hierarchy. The methodology is, however, extendable to network file assignment and this extension is being pursued by Chandy and Hewes [13].
FIGURE 2: MODEL FOR DETERMINATION OF EXECUTABLE MEMORY ALLOCATION AND FILE REUSABILITY

EXAMPLE APPLICATIONS

The set of example applications is given here only in outline form. For other applications or for details of the results the interested reader should consult Foster [9]. The first example demonstrates the effect of queueing delays. It uses the file activity profile of Figure 3. This profile is the file set obtained and used by Arora and Gallo [4], except that reusability is expressly considered. When the processing of a non-reusable file finishes, that file is deallocated from executable memory. When the processing of a reusable file finishes, that file remains in executable memory until it is replaced by another file which needs its memory. If a reusable file is a data file, its memory is deallocated only after it is written to the appropriate device, with a probability of 0.1. The hardware configuration is as follows:

(a) There is a single CPU with a mean execution time for instructions of one microsecond.

(b) The executable memory has a size sufficient to hold any 5 file blocks and a transfer time of 0.5 microseconds per word.
FIGURE 3: ACTIVITY PROFILE (the tabulated profile lists, for each file, its type and reusability class (P, D, R), relative frequency, instruction count, record size and volume; the numeric columns are not reproduced here)

(c) File memory hierarchy:
    (1) A fast drum large enough to hold all records, with an average latency time of 4.3 milliseconds and a transfer time per word of 4.2 microseconds.
    (2) A slower drum with a sufficient capacity to hold all files, a latency of 17 milliseconds and a transfer time per word of 4.2 microseconds.
    (3) A disk with enough capacity to hold all files. The combined seek and latency time is 47.5 milliseconds and the average transfer time per word is 7 microseconds.
The degree of multiprogramming, i.e., the number of jobs located anywhere in the model system, is constrained to be five. There is, thus, no queueing anywhere in the system save for the actual memory devices. The initial strategy which would be applied in this case by Arora and Gallo is to load all files to the fast drum. The solution obtained including queueing and reusability assigns 59% of the files (on a relative frequency basis) to the slow drum and 28% of the files to the disk. The actual reference probability of the several devices will be different from the static frequency because of the reusability factor. One file of high frequency of use is assigned to the disk. It is almost never transported since it nearly always stays in executable memory. The difference in throughput is significant. The optimal static assignment of placing all files on the fast device gives a relative throughput of 209, whereas the optimal solution gives a throughput of 257. The reusability fraction for loaded files is approximately 0.32. The utilizations of the several devices are as follows: the fast drum .80, the slow drum .18, and the slow disk .02.

The second example is designed to test the significance of reusability. If we alter the activity profile of Figure 3 so that all files are either reusable or not reusable, and assign finite sizes to the executable memory and the several file memories, a reasonable evaluation of the influence of reusability is obtained. The hardware system characteristics are as follows:

(a) There is a single CPU with a one instruction per microsecond execution rate.
(b) There is a single fast memory of 32k words (k = 1024).
(c) File storage: the file storage hierarchy has 3 entries.
    (1) A fast drum with a capacity of 1.2 million words, a latency time of 4.3 milliseconds and a transfer time of 4.2 microseconds per word.
    (2) A slow drum which has a capacity of 1.2 million words, a transfer rate of 4.2 microseconds per word, and a latency time of 17 milliseconds.
    (3) A slow disk with a capacity of 3 million words, a combined mean seek and latency time of 47.5 milliseconds and a transfer time per word of 7 microseconds.
These devices are all connected to the CPU by two pooled channels. We set an upper limit on multiprogramming of 7. The throughput for the case where all files are reusable becomes 140 units and the system is processor-bound. The throughput in the case where the files are all not reusable is 96.9 units and the system is essentially load-bound. A further example, which illustrates the influence of executable memory sizes, is to divide the 32k words of executable memory into two 16k-word memories, one at the same performance level of the original 32k memory and a slower memory with one half the performance characteristics of the faster memory. The result is then that for reusable files throughput falls to 119 units, while for non-reusable files, where the bottleneck is principally access to the devices, the throughput falls only to 94.

The final application discussed here is the evaluation of file assignment heuristics. The model implementation of the methodology is very efficient in application. The complete solution to the first example of this section required approximately 1 1/4 minutes of central processor time on a CDC 6600. It is, however, not always trivial to gather the data on file usage necessary to apply the methodology given here. It is, therefore, still desirable to evaluate simple heuristics to determine their relative accuracy. A very simple heuristic with a good performance may very well be entirely preferable to an excellent algorithm with cumbersome application. The following heuristic strategies were evaluated:
(a) Load the most frequently executed files on the fastest devices, using only device capacity as a loading constraint.
(b) Load the most frequently executed files on the fastest devices, using device capacity and semi-optimal branching probabilities (which will be defined subsequently) as loading constraints.
(c) Load the most frequently executed files, after frequency normalization (i.e., request frequency divided by volume), on the fastest devices, using device capacity as a loading constraint.
(d) Load the most frequently executed files, after frequency normalization, on the fastest devices, using device capacity and semi-optimal branching probabilities as the loading constraints.
(e) Load the most frequently used files on the slowest devices, using only device capacity as a loading constraint.
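Strategy (c), the simplest of the successful heuristics, can be sketched as a greedy assignment. The names and tuple layouts below are assumptions chosen for illustration, not the authors' implementation:

```python
def assign_files(files, devices):
    """Greedy sketch of strategy (c): order files by normalized
    frequency (request frequency / volume) and place each one on
    the fastest device that still has capacity.

    files:   list of (name, frequency, volume) tuples
    devices: list of (name, capacity) tuples, fastest first
    Returns a {file_name: device_name} placement.
    """
    remaining = {name: cap for name, cap in devices}
    order = sorted(files, key=lambda f: f[1] / f[2], reverse=True)
    placement = {}
    for name, freq, vol in order:
        for dev, _ in devices:          # try the fastest device first
            if remaining[dev] >= vol:
                remaining[dev] -= vol
                placement[name] = dev
                break
    return placement
```

A file of high normalized frequency claims the fast device first; once the fast device's capacity is exhausted, the remaining files spill to the slower devices.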
Obviously (e) is a worst-case strategy devised for comparison purposes. A semi-optimal branching probability is computed by estimating a reusability factor and a mean I/O device service time, using the mean record size of all files and the device characteristics, and attempting to approximately match the optimal branching probabilities obtained by a single solution of the queueing network model. Table 1 gives the result of this evaluation. Strategies (c) and (d) give fair to good results: the very simple strategy (c) obtains an assignment which differs from the optimal assignment by about 10% in throughput. The somewhat more complex strategy (d), which omits only iteration through the hybrid model, obtains a throughput of 92.6 units, only about 5% less than obtained by an optimal file assignment strategy.

TABLE 1: THROUGHPUT FOR HEURISTIC LOADING STRATEGIES

    Heuristic    Throughput
    a            82.3
    b            84.1
    c            87.3
    d            92.6
    e            75.8
    Optimal      96.9
SUMMARY AND COMMENT

The methodology for file assignment presented here spans all of the major performance-limiting factors in file assignment to memory hierarchies. It is, however, straightforward to apply. It yields excellent results in the example cases tried. The performance effects of device queueing and file reusability were explored. A simple heuristic for file assignment that appears to be near optimal for the limited span of cases examined was found. There are at least two areas where the methodology described here needs extension and development. These are consideration of request set clustering and evaluation of how and when reassignments should be executed.

ACKNOWLEDGEMENTS

This research was supported by the National Science Foundation under Grant GJ-1084 to the Department of Computer Sciences at the University of Texas.
REFERENCES

1. R.R. Martin and H.D. Frankel, "Electronic Disks in the 1980's," Computer 8, 24 (1975).
2. S.L. Rege, "Cost, Performance and Size Tradeoffs for Different Levels in a Memory Hierarchy," Computer 9, 43 (1976).
3. C.V. Ramamoorthy and K.M. Chandy, "Optimization of Memory Hierarchies in Multiprogrammed Systems," J. ACM 17, 426 (1970).
4. S. Arora and A. Gallo, "Optimization of State Loading in Multilevel Memory Systems," J. ACM 20, 307 (1973).
5. C.K. Chow, "On Optimization of Storage Hierarchies," IBM J. Res. and Devel. 18 (1974).
6. G.S. Shedler, "A Queueing Model of a Multiprogrammed Computer with a Two-Level Storage System," Comm. ACM 16, 3 (1973).
7. J. Buzen, "Computational Algorithms for Closed Queueing Networks with Exponential Servers," Comm. ACM 16, 517 (1973).
8. P.S. Chen, "Optimal File Allocation," Ph.D. Thesis, Harvard University, August 1973.
9. D.V. Foster, "File Assignment in Memory Hierarchies," Ph.D. Dissertation, Department of Computer Science, University of Texas, August 1974.
10. J. Hogarth and K.M. Chandy (submitted for publication).
11. F. Baskett and T. Price, private communication.
12. Control Data Corporation, Publication No. 17314200.
13. K.M. Chandy and J.E. Hewes, "File Allocation in Distributed Systems," Proceedings of the International Symposium on Computer Performance Modeling, Measurement and Evaluation, Cambridge, Mass., March 1976, pp. 10-13.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
MAXIMUM LOAD AND SERVICE DELAYS IN A DATA-BASE SYSTEM WITH RECOVERY FROM FAILURES

E. GELENBE, D. DEROCHETTE
University of Liège
Service d'Informatique
59, Avenue des Tilleuls
B-4000 Liège, Belgium

ABSTRACT
A mathematical model of a transaction oriented system under intermittent failures is proposed. The system is assumed to operate with a checkpointing and roll-back/recovery method to ensure reliable information processing. The model is used to derive the principal performance measures, including availability, response time and the system saturation point.
1. INTRODUCTION

The purpose of this paper is to study the performance of a transaction oriented computer system under the effect of intermittent system failures when the roll-back/recovery method is used to maintain the integrity of the data stored in the system. In particular we would like to respond to questions such as "What is the maximum load of transactions the system can support in the presence of a given rate of failure?", "How will the response time vary as a function of the failure rate?", or "What proportion of the system time will be taken up by recovering from failures or preparing check-points as opposed to useful transaction processing?". All of these questions are particularly important in data-base systems, in which elaborate procedures for assuring data integrity can consume an important part of system time, especially during activity periods when the failure rate is high (e.g. when a modified version or a new release of the system is being introduced, or when the data-base contents are being modified frequently and errors are being introduced inadvertently). Here by a failure we shall mean not just a deficiency in the hardware or software which leads to erroneous system operation but also the intentional or unintentional introduction of erroneous information which breaches the integrity of the information stored in the system. The system will be assumed to operate in the standard checkpoint-rollback-recovery
(CRR) mode. For the purposes of this paper,
the instant of occurrence of a failure will coincide with the instant of detection of a failure; as we shall presently see, this assumption leads to no loss of generality in our context. CRR mode operation may be described as follows (see Figure 1).

Figure 1 (check-pointing instants a_i, checkpoint durations z_i, and failure instants)

At specified instants of time {a_i}, i in N+, a checkpoint (CP) is established; that is, a secure and valid copy of all the information contained in the system is stored into a memory unit which cannot be tampered with by failures. The creation of the i-th CP immobilizes the system during z_i seconds. This means that during the time interval (a_i, a_i + z_i) no transactions can be processed. We also assume here that in this interval no failures can occur; this assumption is not essential and will be removed later. At instants {b_i}, i in N+, requests for transactions occur: we assume that two or more transactions cannot be processed in parallel and that they occur singly (i.e. several transactions cannot be requested simultaneously at some b_i). The service time for the i-th transaction is s_i. At instants {c_i}, i in N+, failures take place. A failure is supposed to invalidate all of the information stored in the system, excepting that which has been saved at the CP which precedes the failure; this is why the preceding remark, that the instant of occurrence and of detection of a failure can be assumed to coincide, is valid. For all transactions occurring in the time interval (a_i, a_{i+1}) a copy is maintained in the system, so that if a failure occurs at some time t in (a_i + z_i, a_{i+1})
all transactions 1) which have occurred in the interval (a_i + z_i, t) have to be processed once again.

CHANDY et al. [1] have obtained analytical results, in particular for the optimal value of the time separating two successive CP's in order to minimize the ratio of total expected downtime (their cost function) to total time over this period. They assume that CP's are established periodically and that failures occur according to a Poisson process. With respect to the statement of the problem they consider, the main difference with our work is that they assume that the volume of transactions requested during the establishment of a CP or during rollback/recovery is negligible, and thus there is no degradation in system performance due to CRR mode operation. A survey of the work in this area is given by CHANDY [2]. The problem of selecting an optimum CP interval has also been examined by YOUNG [3]. Related problems are studied in [4,5].

The detailed model and the principal results of this paper will be presented in the following sections. As an example of the kind of result which we obtain, let us mention the following without giving the detailed assumptions on which it is based.

Result. Let requests for transactions occur at a fixed rate λ of transactions per unit time, and let the average service time for each transaction be μ^-1. Let T be the average value of the time spent in normal operation between (a_i + z_i) and a_{i+1} for any i. Denote by Z the average value of z_i, and let the time necessary for recovery from a failure by the CRR method be given by (F + μkT) on the average. The system will have a finite response time to transactions and be nonsaturated if and only if

    λ/μ < (1 + μkγT + Z/T + γF)^-1                                   (1)

where γ is the rate of occurrence of failures. T_o, the value of T which maximizes the supremum of the set of attainable values of λ for a nonsaturated system, is given by

    T_o = (Z/γkμ)^(1/2)                                              (2)

In fact (2), which is obtained from (1) immediately, is equivalent to a result derived in [3].

1) In fact we shall presently see that our results will also hold if only a subset of these, e.g. the transactions which modify the contents of the database, has to be reprocessed.

In the following we shall present a mathematical model of the system we have described. It will turn out to be a nonstandard queueing model; that is, one for which the analysis at stationary state is not available in the literature. We shall provide a complete analysis of the model at stationary state, giving the generating functions for the relevant probability distributions. The ergodicity condition given in equation (1) will follow from these results. In order to interpret (1) in the absence of failures and of the CRR mode of operation, one should take T → ∞ and γ → 0, with Tγ → 0. We then obtain the familiar relation λ/μ < 1.
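The saturation bound (1) and the optimal interval (2) are straightforward to evaluate numerically. The parameter values in the test below are hypothetical, chosen only to illustrate the trade-off between checkpointing cost and reprocessing cost:

```python
import math

def max_utilization(gamma, k, mu, T, Z, F):
    """Right-hand side of condition (1): the largest value of
    lambda/mu for which the system remains nonsaturated."""
    return 1.0 / (1.0 + mu * k * gamma * T + Z / T + gamma * F)

def optimal_interval(gamma, k, mu, Z):
    """Checkpoint interval T_o of equation (2), which maximizes the
    admissible load by minimizing mu*k*gamma*T + Z/T."""
    return math.sqrt(Z / (gamma * k * mu))
```

Checking T_o against nearby values of T confirms that it maximizes the right-hand side of (1): too-frequent checkpoints inflate Z/T, too-rare ones inflate the reprocessing term.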
    [λ - λx + γ + T^-1 + μ - μ/x] G_1(x)
        = μ(1 - 1/x) p(0,1) + Z^-1 G_3(x) + (μkT)^-1 G_2(x)          (11)

Similarly from (7), (6) and (9), (10):

    γ G_1(x) = [λ - λx + (μkT)^-1] G_2(x)                            (12)

    T^-1 G_1(x) = [λ - λx + Z^-1] G_3(x)                             (13)

These equations yield

    G_1(x) = μ(1 - 1/x) p(0,1) / [D(x) - A(x) - B(x)]                (14)

where

    D(x) = λ - λx + γ + T^-1 + μ - μ/x
    A(x) = γ (μkT)^-1 / (λ - λx + (μkT)^-1)
    B(x) = Z^-1 T^-1 / (λ - λx + Z^-1)

Clearly the system is ergodic only if p(0,1) > 0 and 0 < G_1(1) < ∞. By applying l'Hôpital's rule to (14), we see that a necessary condition for ergodicity is that

    D'(1) - A'(1) - B'(1) > 0

or, equivalently, that

    μ > λ[1 + μkTγ + Z/T]                                            (15)
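The ergodicity condition (15) is easy to check numerically for given parameter values; the numbers used in the test below are hypothetical:

```python
def is_ergodic(lam, mu, gamma, k, T, Z):
    """Ergodicity condition (15): mu > lam * (1 + mu*k*T*gamma + Z/T)."""
    return mu > lam * (1.0 + mu * k * T * gamma + Z / T)
```

As (15) shows, both the checkpointing overhead (the Z/T term) and the failure-induced reprocessing (the μkTγ term) shrink the admissible arrival rate below the failure-free bound λ < μ.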
To show that it is sufficient, notice that if G(x) = G_1(x) + G_2(x) + G_3(x), then

    G(1) = G_1(1)[1 + μkγT + Z/T]                                    (16)

Notice also from (5)-(11), with (d/dt)p(n,q) = 0, that if p(0,1) > 0, then p(n,q) > 0 for all n > 0, q = 1,2,3. Now assume (15) and take

    p(0,1) = [1 + μkTγ + Z/T]^-1 - λ/μ = A - ρ                       (17)

which is obviously positive. Using (14), to which l'Hôpital's rule is applied, in (16) we obtain G(1) = 1, which is obviously finite. Therefore (15) is necessary and sufficient for the ergodicity of (N_t, Q_t ; t in R+) and Theorem 1 is proved.

Let us now turn to the proof of Theorem 3. Using (12)-(14) we write
    G(x) = G_1(x) + G_2(x) + G_3(x) = G_1(x)[1 + C(x)]               (18)

where

    C(x) = T^-1/(λ - λx + Z^-1) + γ/(λ - λx + (μkT)^-1)

The expected number of transaction requests in queue at stationary state is

    E{N_1} = lim_{x→1} (d/dx) G(x)

and by LITTLE's theorem, W = λ^-1 E{N_1}. Notice that G_1(1) = A and

    C'(1) = λ[γ(μkT)^2 + Z^2/T]

Therefore

    G'(1) = A^-1 G_1'(1) + A λ[γ(μkT)^2 + Z^2/T]

Denote by u, v the numerator and denominator respectively of G_1(x) in (14). Then

    G_1'(x) = (u'v - v'u)/v^2

which is indeterminate at x = 1. L'Hôpital's rule has to be applied to it twice to obtain
    G_1'(1) = [u''(1) v'(1) - v''(1) u'(1)] / (2 (v'(1))^2)

where

    u'(1) = μ p(0,1) = μ(A - ρ)
    u''(1) = -2μ(A - ρ)
    v'(1) = μ - λA^-1 = μA^-1(A - ρ)
    v''(1) = -2μ - 2λ^2[γ(μkT)^2 + Z^2/T]

so that

    G_1'(1) = [ρA + λρA^2(γ(μkT)^2 + Z^2/T)] / (A - ρ)

and
    E{N_1} = [ρ + λA^2(γ(μkT)^2 + Z^2/T)] / (A - ρ)

    p(v,t) = Pr{V_t^0 = v_0, V_t^1 = v_1, ..., V_t^{n+2} = v_{n+2} | V_0}
where v = (v_0, ..., v_{n+2}) is a vector of integers, 0 ≤ v_i ≤ L, i = 1, ..., n+2 (L is the total number of terminals and of user processes in the model). Under our assumptions the p(v,t) satisfy the following system of differential-difference equations:
    (d/dt) p(v,t) = -[(L-M)λ + Σ'_i μ_i + α(N) + D(v)] p(v,t)
                    + (L-M+1)λ p(c(v,0,n+2), t)
                    + Σ'_i μ_i p(c(v,0,i), t)
                    + Σ_{i=1}^{n+2} (a_i)^-1 p(c(v,i,0), t)
                    + D(c(v,0,n+1)) p(c(v,0,n+1), t)                 (2)

where a_{n+1} = Q, a_{n+2} = C, a_1 = e(m), and for any vector v we define

    c(v,i,j) = (v_0, ..., v_i - 1, ..., v_j + 1, ..., v_{n+2}),  for all i, j, j ≠ i      (3)

and in (2) any p(v,t) such that v contains an element which is negative or larger than L is set to p(v,t) = 0 for all t ≥ 0. Also we take D(v) = 0 if the (n+1)th component of v is zero (i.e. v_{n+1} = 0). These conditions, together with the assumption that μ_i = 0 in (2) if v_i = 0, suffice to define the boundary conditions of (2). Finally, notice that:
    L = Σ_{i=0}^{n+2} v_i,   M = Σ_{i=0}^{n+1} v_i,   N = Σ_{i=0}^{n} v_i

D(v) is the arrival rate of processes from the impeded set to the CPU queue:

    D(v) = (M - N) δ(N)                                              (4)
This system of differential-difference equations will possess a unique solution p(v,t) ≥ 0 such that Σ_v p(v,t) = 1, t ≥ 0, if all the state transition rates are positive and finite.
However the available theory [11] does not provide the equilibrium solution. In the following section we resort to an approximate solution method.

2.3.2. Approximate solution via decomposition

The approximate analysis technique for queueing networks developed in [3,5] is suited to systems composed of subsystems where the time constants of inter-subsystem interactions are appreciably larger than those associated with interaction within each subsystem. This makes it possible to use equilibrium results for each subsystem in the global system model.
RANDOM INJECTION CONTROL OF MULTIPROGRAMMING
E. GELENBE, A. KURINCKX

In our analysis we apply the technique several times to our model. These successive simplifications are shown on Figures 3(a) and 3(b), for the model of Figure 2(a).
Figures 3(a), 3(b) and 3(c) (successive simplifications of the model of Figure 2(a), showing the impeded set and the equivalent servers γ(M) and β(N))
As a first step we will approximate the system of Figure 2(a) by the model shown on Figure 3(a). Let B(N,t)Δt be the probability that a process leaves the resource loop to enter either the impeded or think state in the interval (t, t + Δt) when the number of processes in the resource loop is N. We may write

    B(N,t) = [x_{n+1}(N) + x_{n+2}(N)] α(N) A(N,t)

where A(N,t) is the probability that the CPU is not idle when there are N processes in the resource loop at time t. The approximation consists of replacing A(N,t) by A_0(N):

    B(N,t) ≈ β(N) = [x_{n+1}(N) + x_{n+2}(N)] α(N) A_0(N)            (5)

where A_0(N) is the stationary probability that the CPU queue in the closed model of Figure 2(b) (i.e. the resource loop with N processes and with no interactions with the terminals or with the impeded set) is not empty. This approximation will be valid [3,5] if, for 1 ≤ i ≤ n, α(N)[x_{n+1}(N) + x_{n+2}(N)] is small compared with μ_i.
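As an illustration of how A_0(N) might be obtained for an exponential, product-form resource loop, here is a sketch of Buzen's convolution algorithm; the relative service demands used in the test are hypothetical:

```python
def busy_probability(demands, N):
    """Stationary probability that station 0 is busy in a closed
    product-form network with N customers, via Buzen's convolution:
    G_k(n) = G_{k-1}(n) + d_k * G_k(n-1), and U_0 = d_0 * G(N-1) / G(N).

    demands: relative service demand d_i (visit ratio / service rate)
    of each station, station 0 first."""
    G = [1.0] + [0.0] * N
    for d in demands:
        for n in range(1, N + 1):
            G[n] += d * G[n - 1]   # in-place convolution over stations
    return demands[0] * G[N - 1] / G[N]
```

For two balanced stations and N = 2 customers the three feasible states are equally likely, so the CPU is busy with probability 2/3; with a single station the server is always busy.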
APPENDIX
Outlined here are the details of the derivation of the resulting equivalent server (7). γ(M) is given by

    γ(M) = Σ_{N=1}^{M} [x_{n+2}/(x_{n+1} + x_{n+2})] β(N) p(N/M)     (A1)

Recall that p(N/M) is the steady-state probability of having N processes in the resource loop given that there are M processes in either the impeded set or in the resource loop. We obtain this probability by solving the equilibrium equations for the system of Figure 3(c):

    p(N/M)[β(N) + δ(N)(M-N)] = p(N+1/M)β(N+1) + p(N-1/M)δ(N-1)(M-N+1),  1 ≤ N < M
    p(1/M)β(1) = p(0/M)δ(0)M
    p(M/M)β(M) = p(M-1/M)δ(M-1)                                      (A2)

The solution of these equations is

    p(N/M) = p(0/M) [Π_{i=0}^{N-1} δ(i)(M-i)] / [Π_{i=1}^{N} β(i)],   0 < N ≤ M     (A3)

and we obtain p(0/M) by using the condition

    Σ_{N=0}^{M} p(N/M) = 1

This yields the result:

    p(0/M) = [Σ_{N=1}^{M} (Π_{i=0}^{N-1} δ(i)(M-i)) / (Π_{i=1}^{N} β(i)) + 1]^-1    (A4)
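Equations (A2)-(A4) can be evaluated numerically by accumulating the products of (A3) and normalizing; β and δ in the test below are arbitrary placeholder rate functions, not values from the paper:

```python
def conditional_distribution(beta, delta, M):
    """Solve the birth-death balance equations (A2) for p(N/M):
    unnormalized weights per (A3), normalized per (A4).

    beta(N):  departure rate from the resource loop holding N processes
    delta(N): per-process injection rate from the impeded set
    """
    w = [1.0]                                 # w[N] proportional to p(N/M)
    for N in range(1, M + 1):
        up = delta(N - 1) * (M - N + 1)       # delta(N-1) * (M - (N-1))
        w.append(w[-1] * up / beta(N))
    total = sum(w)
    return [x / total for x in w]
```

The returned list sums to one, and each adjacent pair satisfies the detailed-balance relations of (A2).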
Using this value, and (1), expressions (A2) and (A3) become

    p(N/M) = p(0/M) B^N M! / [(M-N)! Π_{i=1}^{N} β(i)],   1 ≤ N ≤ M     (A5)

    p(0/M) = [Σ_{N=1}^{M} B^N M! / ((M-N)! Π_{i=1}^{N} β(i)) + 1]^-1

and, replacing p(N/M) by its value in (A1):

    γ(M) = B p(0/M) M! Σ_{N=1}^{M} [x_{n+2}/(x_{n+1} + x_{n+2})] B^{N-1} / [(M-N)! Π_{i=1}^{N-1} β(i)]     (A6)

We know that a_{n+1} = Q and a_{n+2} = C. Let B tend to infinity. The limit of equation (A6) is then the expression for γ(M) given in (7). Notice that we set B → ∞ (as mentioned in the discussion of the model) to represent the fact that no time is spent by processes in the impeded set. Notice also that when N = 0 the empty products in (A3) and (A5) are taken equal to 1.
REFERENCES

[1] L. BELADY, C.J. KUEHNER, "Dynamic Space Sharing in Computer Systems", CACM 12, 5, May (1969), 262-288.
[2] P. DENNING, "Thrashing: Its Causes and Prevention", Proc. AFIPS 1968 Fall Joint Comput. Conf., 33, 915-922.
[3] P.J. COURTOIS, "On the Near Complete Decomposability of Networks of Queues and of Stochastic Models of Multiprogramming Systems", Carnegie-Mellon University, Computer Science Dept., Research Report, November (1971).
[4] P.J. COURTOIS, "Decomposability, Instabilities, and Saturation in Multiprogramming Systems", CACM 18, 7, July (1975), 371-377.
[5] P.J. COURTOIS, "Error Analysis in Nearly-completely Decomposable Stochastic Systems", Econometrica, 43, 4, July (1975), 691-709.
[6] A. BRANDWAJN, J. BUZEN, E. GELENBE, D. POTIER, "A Model of Performance for Virtual Memory Systems", Proc. ACM-SIGMETRICS Symposium, Montreal, September (1974); to appear in IEEE Trans. Software Engineering.
[7] M. BADEL, E. GELENBE, J. LEROUDIER, D. POTIER, "Adaptive Optimization of a Virtual Memory System", Proc. IEEE, Special Issue on Interactive Systems, June (1975).
[8] P. DENNING, S. GRAHAM, "Multiprogrammed Memory Management", Proc. IEEE, Special Issue on Interactive Systems, June (1975).
[9] P. DENNING, S. GRAHAM, "Multiprogramming and Program Behaviour", Proc. ACM-SIGMETRICS, 3, 4, September (1974).
[10] J. LEROUDIER, D. POTIER, "Principles of Optimality for Multiprogramming", Intern. Symp. on Computer Performance, Modelling, Measurements, Evaluation, Harvard, March (1976).
[11] E. GELENBE, R.R. MUNTZ, "Exact and Approximate Solutions to Probabilistic Models of Computer System Behaviour - Part I", to appear in Acta Informatica.
[12] J. LEROUDIER, M. PARENT, "Quelques aspects de la modélisation des systèmes informatiques par simulation à événements discrets", RAIRO, Informatique, Vol. 10, no. 1, Janvier (1976), pp. 5-26.
[13] R.W. CONWAY, "Some Tactical Problems in Digital Simulation", Management Sciences, 10, 1, October (1963).
[14] P. LE GALL, "Convergence des Simulations et Application aux Réseaux Téléphoniques", AFIRO monographies de recherche opérationnelle, 7, (1968).
[15] E. GELENBE, A. KURINCKX, "Optimal and Suboptimal Random Injection Control of Multiprogramming", in preparation.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
A MODELLING APPROACH TO THE EVALUATION OF COMPUTER SYSTEM PERFORMANCE
H. GOMAA
Department of Computing and Control
Imperial College of Science and Technology
London, England
This paper investigates some aspects of developing fast models of computer system performance. Two different modelling techniques, regression and simulation modelling, have been applied, and a method developed of combining their use within a multilevel hybrid modelling framework. The main objective of this paper is to demonstrate the feasibility and value of this approach to the modelling and evaluation of computer system performance. The approach has been demonstrated by modelling a CDC 6000 computer system at three levels of detail. At each level, a self-contained model of the system has been developed. The Workload Model is a purely regression model of computer system performance. It was developed after a comprehensive performance analysis and analysis of residuals. The model expresses a batch job's elapsed time as a function of the job's resource demands and the load on the system. The model has been successfully validated. The Load Adjusting Model is a hybrid simulation/regression model in which a simulation framework is created which models job arrival and termination. Within this framework, regression techniques are used. The Memory Management Model is developed from the Load Adjusting Model by a systematic expansion of detail. The Memory Management subsystem is simulated in considerable detail, whereas the rest of the system is modelled in much less detail. Both hybrid models have been calibrated and validated.
1. INTRODUCTION

Three methods of evaluating computer system performance (11) are by:
(1) Measuring and analysing the performance of the system processing the normal production workload.

(2) Creating a model of the workload, in the form of a set of benchmark or synthetic programs, and applying this to the actual computer system. By this means, experiments may be carried out in a controlled reproducible environment.

(3) Modelling the computer system and its workload. Techniques for modelling computer systems include simulation, analytical and statistical techniques.

For a computer system performance model to be of most value to computer installation managers, it should be capable of modelling the system's performance in a fraction of the real-world time. In such conditions, it is more economical to experiment with the model than the system itself. Furthermore, real system data should be used for input to the model, as well as for calibration and validation purposes.
This paper investigates some aspects of developing fast models of computer system performance. Two different modelling techniques, regression and simulation modelling, have been applied and a method developed of combining their use within a multilevel hybrid modelling framework. The main objective of this paper is to demonstrate the feasibility and value of this approach to the modelling and evaluation of computer system performance.
2. THE MODELLING APPROACH

1) Multilevel Modelling. One major problem of modelling computer systems is that of selecting the right level of detail to include in the model. This problem may be overcome by modelling the system at more than one level of detail. At each level, a self-contained model of the system is designed, implemented, calibrated and validated. At the next level, the model is refined further by adding more detail. This method implies that only as much detail need be incorporated into the model as is necessary for the aspect of the system being evaluated. In general, the less detail included in the model, the more the saving in time and cost to develop the model and the more economical the running of the model becomes.
2) Regression Modelling. Regression analysis is a statistical method of modelling computer systems which relies on workload and performance data collected from the system being evaluated. Regression analysis provides a means of approximating to the complex relationships that exist in a computer system by some simple mathematical function such as a polynomial.
3) Hybrid Modelling. A regression model has the disadvantage of being static, i.e. it does not model the passage of time. A simulation model is capable of overcoming this limitation. Furthermore, a simulation model can model structural and logical relationships in the system. However, a simulation model which produced results similar to a regression model would probably need to model the system in considerably more detail, and consequently be more expensive to implement. A more promising alternative is to combine simulation and regression modelling techniques to produce hybrid models of computer system performance.
The approach has been demonstrated by modelling the short job workload on the Imperial College CDC 6000 Kronos system, at three levels of detail. At each level a self-contained model of the system has been developed. At the first level, the Workload Model uses purely regression techniques. At the second level, the Load Adjusting Model, simulation techniques are introduced and combined with the regression techniques. At the third level, the Memory Management Model, more detail is introduced with the simulation of the memory management subsystem.
All three models are trace driven and model the execution phase of a batch job, that is the time from when a job is first scheduled for execution to the time it terminates. The time spent in this phase is the job elapsed time.
3. THE SYSTEM MODELLED

3.1 The Imperial College CDC 6000 System

The Imperial College (IC) CDC 6000 system supports a timesharing service, a 'cafeteria' service for short batch jobs, a local batch service and a remote batch service.
Batch jobs may fall into one of five job categories. In this paper, jobs in the smallest category are referred to as short jobs, while jobs in the other four categories are referred to as long jobs. Short jobs may use up to 16 seconds CPU time, 25K words Central Memory and no magnetic tapes. The IC CDC 6400 consists of one central processor (CPU) and ten peripheral processors (PPs) (6). The central processor executes user jobs, which are multiprogrammed in 64K Central Memory (CM), and a limited number of system functions. The PPs execute system functions only. The operating system used is Kronos. It supports dynamic storage allocation, including rollin/rollout.
3.2 The Kronos Dayfile
The data used for the evaluation was derived entirely from the CDC Dayfile. The Dayfile is a system file which collects workload and performance data and is used primarily for accounting purposes (10). The Kronos Dayfile provides sufficient data to allow the batch workload to be modelled at the job level. However, insufficient data is available to allow the interactive workload to be modelled.
The contents of the Kronos Dayfile need to be reduced into a more convenient form for input to the models of the Kronos system. The Dayfile for a particular session is input to a suite of Dayfile Processing programs collectively called the Preprocessor. The Preprocessor outputs one or more files for each session. Each file consists of a set of observations (one per job) where each observation consists of:
(a) A performance measure, namely job elapsed time. This is the time from when a job starts execution to the time it terminates.
(b) Measures of the job's resource demands, e.g. CPU time required, average Central Memory used, number of disc records transferred and number of job steps.
(c) Measures of the system load. These are measures of the amount of competition for system resources that a job experiences:
(i) Average number of terminals logged in while this job was in execution.
(ii) Average number of batch jobs and the average number of short jobs concurrently in execution with this job. These are measures of batch job competition for Central Memory as well as the CPU. The multiprogramming level cannot be determined, as no record of rollin/rollout is maintained by the Dayfile.

A MODELLING APPROACH TO EVALUATING PERFORMANCE
4. THE WORKLOAD MODEL
4.1 Regression Modelling

A regression model (i.e. equation) relates an output (dependent) variable Y to several input (independent) variables X_1, X_2, ..., X_k. The functional relationship could be linear in its coefficients, such as:

Y = a_0 + Σ_{i=1}^{k} a_i X_i

where a_i, i = 0, 1, ..., k are unknown parameters called the regression coefficients, whose values may be estimated by means of least squares fitting techniques (7,8). Examples of the application of regression techniques to the evaluation of computer system performance are given in references (1, 2, 9, 14, 20, 21). This section highlights the main results of the regression modelling of the IC Kronos system. A detailed description is given in reference (12).
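As an illustration (not from the paper), coefficients of such a linear model can be estimated by ordinary least squares. The sketch below fits a model with two independent variables to synthetic data; all data and coefficient values are made up, whereas the paper's own fits used the Dayfile observations of section 3.2.

```python
import numpy as np

# Hypothetical illustration of least squares fitting: Y = a0 + a1*X1 + a2*X2.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, size=(n, 2))      # two independent variables
true_a = np.array([3.0, 2.0, 5.0])       # assumed "true" a0, a1, a2
Y = true_a[0] + X @ true_a[1:] + rng.normal(0, 1, n)

A = np.column_stack([np.ones(n), X])     # design matrix with intercept column
coef, _, _, _ = np.linalg.lstsq(A, Y, rcond=None)

# R^2: proportion of variation explained by the model
Y_hat = A @ coef
r2 = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)
print(coef, r2)
```

With noise small relative to the signal, the estimated coefficients land close to the assumed ones and R² is high, mirroring the fit statistics reported in Table 1.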
4.2 Regression Modelling of the Kronos System

The first models constructed used regression techniques to model the performance of the batch workload on the Kronos system in the presence of a timesharing load. In the models, the dependent variable is job elapsed time (see 3.2). The independent variables are of two types:
(a) Measures of the job's resource demands (see 3.2).
(b) Measures of the system load while the job was in execution (see 3.2).

Initially the workload was classified into long and short job workloads, and models were constructed of each class. Several problems were experienced modelling the long job workload (12), and it was therefore decided to concentrate instead on modelling the short job workload. The initial models of the short job workload were unsatisfactory. In particular, the amount of variation explained by the models (R²) was low.
An analysis of residuals (for each observation, i.e. job, the residual = actual job elapsed time - predicted job elapsed time) showed that during certain periods of each session, there were groups of jobs with long elapsed time and large positive residuals. A close study of the Dayfile revealed that during these periods, the system was heavily loaded. Because of this, short jobs were rolled out of Central Memory for substantial periods of time. Since rollout time is not recorded in the Kronos Dayfile, it was not possible to account for this in the models. When the jobs with large residuals were excluded, models with much better fits were constructed.
4.3 Regression Modelling in the Absence of a Timesharing Load
A CYBER 7314 was installed at Imperial College in the summer of 1974. The CYBER is architecturally similar to the 6400. The workload was divided between the two machines with the CYBER supporting an entirely batch workload. This provided an excellent opportunity for the batch workload to be modelled in the absence of the timesharing workload.
Four models of the short job workload were constructed, one for each session modelled. In the initial models, R² was over 0.67, showing that in the absence of the timesharing load, much better models of the short job workload could be constructed.
However, an analysis of the residuals led to the discovery that there were some jobs in each session which displayed unusual characteristics (in particular they were highly I/O bound) and whose presence distorted the models. These jobs constituted less than 5% of the short job workload. It was therefore decided to exclude these observations from the models.
4.4 The Workload Model
Models were constructed for the four sessions which covered a period of four months, and a model was constructed using data pooled from all four sessions (Table 1). The same independent variables were forced into each model to allow a comparison of the models. The models were validated by comparing the residual sum of squares of the models of the individual sessions with that of the pooled model (12).
The validation of the models showed that the regression coefficients of the four models were consistent, although there was a greater variation in the intercepts. This is considered a significant result, as both in this project and elsewhere (1, 2, 18) the inconsistency of regression coefficients has been a major problem experienced in applying regression modelling techniques to computer system performance evaluation.

Table 1: Regression Models of the Short Job Workload on the I.C. Kronos System

Independent        27/1/75   28/4/75   30/4/75   30/1/75   Pooled
Variable           a.m.      a.m.      a.m.      p.m.

T    r.c.          2.25      2.22      2.21      2.28      2.25
     s.e.          0.09      0.12      0.09      0.09      0.05
     t             25.0      18.8      24.8      25.1      47.3

K    r.c.          1.16      1.05      1.05      1.14      1.08
     s.e.          0.15      0.10      0.10      0.12      0.06
     t             7.6       10.5      10.5      9.3       19.3

N    r.c.          5.47      7.17      5.38      4.92      5.32
     s.e.          0.65      0.82      0.72      0.41      0.26
     t             8.5       8.8       7.4       12.1      20.1

Intercept          3.38      1.92      4.02      3.06      3.25
R²                 0.71      0.75      0.73      0.73      0.73
s                  7.7       7.7       7.7       8.1       7.8
F                  266       249       311       318       1145
No. of jobs        333       259       345       354       1291

Key:
T : CPU time
K : No. of job steps
N : Average short job load
r.c. : regression coefficient
s.e. : standard error of regression coefficient
t : t-statistic for significance of regression coefficient
R² : proportion of variation explained by model
s : standard error of residuals
F : F-statistic for significance of regression equation
It is therefore justifiable to use the model pooled from all four sessions as a satisfactory regression model of the short job workload. This model is called the Workload Model and is represented by:

t_e = 3.25 + 2.25T + 1.08K + 5.32N

where t_e is the job elapsed time, T is the CPU time required, K is the number of job steps, and N is the average number of short jobs in competition with this job over its lifetime.
The regression coefficient of T is an estimate of the average time expansion factor experienced by each job for each second of CPU time. The regression coefficient of K is an estimate of the average system overhead to initiate a job step. The regression coefficient of N provides a means of estimating the delay experienced by a short job due to competition from other short jobs. The reason why only short jobs make a significant contribution to the model is that they have priority over long jobs in the competition for Central Memory.
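The pooled Workload Model amounts to a one-line predictor. A minimal sketch, using the pooled coefficients above (the example job's inputs are invented):

```python
def workload_model(T, K, N):
    """Predicted short-job elapsed time (secs) from the pooled Workload Model:
    t_e = 3.25 + 2.25*T + 1.08*K + 5.32*N
    where T is CPU time (secs), K the number of job steps, and N the average
    number of short jobs competing with this job over its lifetime."""
    return 3.25 + 2.25 * T + 1.08 * K + 5.32 * N

# Hypothetical example job: 4 secs CPU, 3 job steps, 2 competing short jobs.
print(workload_model(4, 3, 2))
```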
5. THE LOAD ADJUSTING MODEL
5.1 Limitations of the Workload Model

Once the purely regression Workload Model has been validated, it may be used for making fast and approximate predictions of a batch job's elapsed time, given the job's resource demands and the load on the system during its execution. The Workload Model suffers from an important structural limitation: a regression model is static and hence does not recognise the passage of time. To enable the Workload Model to model varying loads, the load experienced by each job is input as an independent variable. This is used to estimate the delay experienced by the job due to competition from other jobs. The main limitation of this method is that all measures of the load on the system must be specified in advance of a run of the model (Figure 1). A model which overcomes this limitation is one which is capable of adjusting its estimate of system load as each modelled job commences or terminates execution (Figure 2). Unlike a regression model, such a model must be dynamic, i.e. it must be capable of modelling the passage of time, and in particular job arrival and job termination.

Figure 1: The Workload Model (inputs: job's resource demands and measured load; output: predicted job elapsed time)

Figure 2: The Load Adjusting Model (inputs: job's resource demands and estimated load; output: predicted job elapsed time)
5.2 Fast Dynamic Computer System Models

There are a number of ways of building fast dynamic models of computer systems. One method is to use simulation techniques. A simulation model may model a computer system at almost any required level of detail. However, there is a tendency for many simulation models to model systems in considerable detail. In these cases, the greatest drawback to simulation modelling is probably its relatively high cost (17). A more promising alternative is to combine simulation with different modelling techniques to produce dynamic hybrid models.
Kimbleton has described an analytically driven computer system simulator (15, 16) which combines simulation and queuing modelling techniques. The method described here combines regression techniques with simulation techniques. A simulation framework is used to model the passage of time. By this means, the static regression model is converted into a dynamic hybrid simulation/regression model.

5.3 Concepts of the Load Adjusting Model

5.3.1 Introduction

In the Load Adjusting Model (LAM), a simulation framework is created
which allows each job's progress through the system to be modelled dynamically. Within this framework, two submodels (22) are used. Each submodel has the property that it can run as part of the main model, receiving inputs from and feeding its output to other parts of the model. Alternatively, the submodel may run independently of the remainder of the model.
A regression submodel predicts each job's elapsed time in the absence of competition from other jobs. The simulation framework allows the number of jobs in execution at any stage to be estimated. A numerical submodel predicts the time delay experienced by a job due to the competition from other jobs, for each period when the number of jobs executing is constant. The simulation framework maintains a running sum of the predictions of the two submodels. At the simulated time of job termination, this sum is the predicted job elapsed time.
5.3.2 Modelling Job Elapsed Time

A batch job's elapsed time t_e may be considered as consisting of two components:

t_e = t_j + t_d    (1)

t_j is the elapsed time a job would experience if no other job were competing with it for resources. t_d is the delay a job experiences due to competing with other jobs for system resources. Hence t_d is equal to zero if the job experiences no competition from other jobs. Consequently t_j is the minimum elapsed time a job would experience in the system and is referred to as the job execution time from now on. It is predicted by the regression submodel:

t_j = f(d_1, d_2, ..., d_n)    (2)

where (d_1, d_2, ..., d_n) are the job's resource demands, e.g. CPU time, memory and I/O demands.

5.3.3 Time Intervals

In modelling a system session, LAM divides the session up into a series of time intervals. A time interval is defined as a period of time during which the number of jobs competing for resources is constant. A time interval is started or terminated by one of two possible events: job arrival or job termination. To simplify the model, it is assumed that in any time interval t_i in which there is more than one job competing for resources, each job is treated identically by the system.
In an interval t_i, each executing job experiences some useful execution t_ji and some delay t_di. t_ji may be accounted for by CPU time, I/O time, or by the system carrying out some function for the job, e.g. job step initiation. t_di is the delay experienced by a job due to the competition from other jobs for scarce system resources, and so may represent time waiting for CPU, waiting for I/O, or time rolled out of Central Memory.
A job's elapsed time t_e may be expressed as:

t_e = t_j + t_d
    = t_j + Σ_{i=1}^{s} t_di

where s is the number of time intervals a job passes through in the execution phase. For each job, t_j may be predicted at the simulated time of job arrival using equation (2). t_di is estimated for each time interval as described next.

5.3.4 Modelling Delay Time

It is assumed that in each interval, all jobs are treated identically by the system. It is further assumed that the delay t_di experienced by each job in a time interval is:
(i) a function of the number of jobs, n, competing for resources with a given job;
(ii) a linear function of the length of the time interval t_i, i.e.

t_di = t_i g(n)    (3)

In the general case, we assume that g(n) is a polynomial of the form:

g(n) = a_0 + Σ_{k=1}^{p} a_k n^k

However, since t_di has been defined such that there is no delay if only one job is executing, i.e. t_di = 0 when n = 0, it follows that a_0 = 0 and

g(n) = Σ_{k=1}^{p} a_k n^k

Furthermore, if we assume that only the first two terms of the polynomial are significant, then we have for any time interval t_i in which there are N jobs executing:

g(N) = a_1 N + a_2 N²

Substituting for g(N) in (3):

t_di = (a_1 N + a_2 N²) t_i    (4)

However t_ji = t_i - t_di. Substituting for t_di from (4):

t_ji = (1 - a_1 N - a_2 N²) t_i    (5)

t_i = t_ji / (1 - a_1 N - a_2 N²)    (6)
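For concreteness, equations (4)-(6) can be written out directly. The sketch below is illustrative, plugging in the calibrated values a_1 = 0.11 and a_2 = 0.01 reported in Table 2:

```python
A1, A2 = 0.11, 0.01   # delay polynomial coefficients (Table 2 calibration)

def g(n):
    """Fraction of an interval lost to competition from n other jobs."""
    return A1 * n + A2 * n * n

def split_interval(t_i, n):
    """Split an interval of length t_i into useful execution t_ji (eq. 5)
    and delay t_di (eq. 4) for a job competing with n other jobs."""
    t_di = g(n) * t_i
    t_ji = (1 - A1 * n - A2 * n * n) * t_i
    return t_ji, t_di

def interval_to_finish(t_jrm, n):
    """Eq. (6)/(7): real time needed to accumulate t_jrm of useful execution."""
    return t_jrm / (1 - A1 * n - A2 * n * n)
```

With two competitors, for example, a 10-second interval splits into 7.4 seconds of useful execution and 2.6 seconds of delay, and inverting via equation (6) recovers the 10-second interval.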
5.3.5 Estimating Time Interval Length

A time interval is terminated either by a new job arriving or a job terminating. The time of the next job arrival t_a is obtained from the input trace. The time a job terminates is predicted by the model. When a job enters the system, its execution time t_j is predicted by the regression submodel (equation 2). At the start of each time interval, each job in the system has a remaining execution time t_jr, which is the real time a job would require to complete execution if no other jobs were competing for resources. If we assume all jobs are treated identically, then the job with the minimum t_jr (given by t_jrm) is the job that will terminate first. The time interval t_im necessary to complete execution of the job with execution time t_jrm is computed using equation (6):

t_im = t_jrm / (1 - a_1 N - a_2 N²)    (7)

t_im is then compared with t_a to determine whether the next event is a job arrival or a job termination. Hence, the length of the next time interval t_i is given by:

t_i = min(t_im, t_a)

Given t_i, the execution time t_ji and delay time t_di for this interval may be computed using equations (5) and (4) respectively. t_i is added to the value of the elapsed time so far for each job and t_ji is subtracted from the value of the execution time remaining (t_jr) for each job. This procedure continues until t_jr is reduced to zero for a particular job. This represents the time at which the model predicts the job will terminate. The accumulated elapsed time at the simulated time of job termination is then the predicted elapsed time for that job.

5.4 The Load Adjusting Model of the Kronos System

5.4.1 The Regression Submodel

The Workload Model, developed for the short job workload on the IC Kronos
System (see 4.4) is a regression equation of the form:

t_e = b_0' + b_1'T + b_2'K + b_3'N    (8)

As only the short job workload is modelled, it is appropriate to amend the definition of job execution time given in 5.3.2. It is now defined as the elapsed time experienced by a job when it experiences no competition from other short jobs. The equation

t_j = b_0 + b_1 T + b_2 K    (9)

was found to be a suitable model for the subset of the short job workload which did not experience any competition from other short jobs (i.e. N = 0). Consequently, this is the regression submodel used in the Load Adjusting Model for predicting job execution time. However, this subset was not a representative sample of the short job workload, as these short jobs tended to make substantially lower resource demands. Hence, one would not expect the values of the coefficients b_0, b_1 and b_2 for the model of the subset to be the appropriate values for the regression submodel. However, they provide a suitable initial setting for these parameters.
5.4.2 Assumptions made by LAM

(1) As only the short job workload is modelled, the delay experienced by a short job in a time interval is assumed to depend only on the competition from other short jobs. The competition from long jobs is ignored (see also 4.4).

(2) It is assumed that in each time interval, all jobs are treated identically by the system. This is a reasonable assumption for CPU allocation, where a round-robin scheduling algorithm is enforced. It is likely to be less reasonable for I/O management.
(3) Since all real time data obtained from the Dayfile (e.g. job start and end times) are measured in units of a second, the basic real time quantum in the model is the second. It should be pointed out, however, that CPU time is measured in milliseconds. This value is used by the regression submodel to predict job execution time, which is then rounded to the nearest second.

These assumptions are bound to lead to inaccuracies in the model. Attempts are made to minimize these during the calibration of the model.

5.5 Implementation

A Preprocessor (see 3.2) prepares a workload trace, representing a given
session, for input to LAM. The trace consists of a set of short jobs ordered by job arrival time. Each job is represented by a vector of its resource demands which is input to the model at the simulated time of job arrival. The parameter settings for the two submodels are also input. The model outputs the predicted elapsed time, execution time and delay time for each job.
A Postprocessor analyzes the file generated by a run of LAM. The Postprocessor prints the results in tabular form, computes the means and standard deviations of the predictions of the model, and plots various figures as required. It also carries out a statistical analysis of the results.
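The interval-by-interval procedure of section 5.3, with the submodels of 5.4, can be sketched as a small event loop. This is an illustrative reconstruction, not the Fortran original; the default coefficients are the calibrated values reported in Table 2, and the floor on the execution fraction is an added guard for loads outside the calibrated range:

```python
def lam(trace, b=(4.7, 2.1, 0.8), a=(0.11, 0.01)):
    """Sketch of the LAM event loop.
    trace: list of (arrival_time, cpu_secs, job_steps), sorted by arrival.
    b: regression submodel coefficients of eq. (9); a: delay coefficients
    of eq. (4). Returns the predicted elapsed time of each job."""
    b0, b1, b2 = b
    a1, a2 = a
    active = {}   # job index -> [remaining execution time t_jr, elapsed time]
    done = {}
    now, i = 0.0, 0
    while i < len(trace) or active:
        t_a = (trace[i][0] - now) if i < len(trace) else float("inf")
        n = max(len(active) - 1, 0)                # competitors per active job
        slow = max(1 - a1 * n - a2 * n * n, 0.05)  # eq. (5) fraction, floored
        # interval needed by the job closest to completion, eq. (7)
        t_im = min((rem / slow for rem, _ in active.values()),
                   default=float("inf"))
        t_i = min(t_im, t_a)                       # length of next interval
        for job in active.values():                # advance every active job
            job[0] -= slow * t_i                   # useful execution consumed
            job[1] += t_i                          # real elapsed time accrued
        now += t_i
        if t_a <= t_im:                            # next event: job arrival
            _, cpu, steps = trace[i]
            active[i] = [b0 + b1 * cpu + b2 * steps, 0.0]   # eq. (9)
            i += 1
        else:                                      # next event: job termination
            jid = min(active, key=lambda j: active[j][0])
            done[jid] = active.pop(jid)[1]
    return [done[j] for j in sorted(done)]
```

A lone job finishes in exactly its predicted execution time; overlapping jobs stretch one another's elapsed times through the delay polynomial.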
Workload traces were prepared for three sessions. These were the morning sessions of 27/1/75 and 30/4/75 and the afternoon session of 30/1/75. All three sessions were used in the construction of the Workload Model (4.4).
5.6 Calibration of LAM

Calibration is an iterative procedure whose objective is to reduce the difference in behaviour between the model and the real system by adjusting the parameters of the model (3). There are two sets of parameters which were adjusted during calibration:
(a) The parameters b_0, b_1 and b_2 of the regression submodel for the job execution time (equation 9).
(b) The parameters a_1 and a_2 of the delay time submodel (equation 4).

The overall calibration approach was based on that used in the calibration of a simulation model of OS/360 under LASP (5). The remainder of this section
highlights the main features of the calibration. The calibration is described in detail in reference (13). One method of comparing the difference between the real system and the model is to compare the real job elapsed time with the predicted job elapsed time. The difference between these values is the residual. The mean absolute value of the residuals was chosen as a 'figure of merit' (5).
Calibration may be carried out by applying a given trace to the model and adjusting the parameters by means of an iterative tuning procedure. For each stage of this procedure, LAM (with a given setting of the parameters) and the Workload Model are run using the same input trace. For both models, each job's elapsed time is predicted and the residual is derived. The Postprocessor prepares a sequence of matched pairs, one for each job, of absolute values of residuals. The non-parametric Wilcoxon matched-pairs signed-ranks test (19) is then carried out to compare the sequence of matched pairs. It tests the null hypothesis that there is no significant difference between the models.
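The matched-pairs comparison can be sketched as follows. This is a self-contained reconstruction using the normal approximation to the Wilcoxon signed-ranks statistic (the paper cites reference (19), which may use exact small-sample tables), applied here to invented residual data:

```python
import math

def wilcoxon_signed_rank(x, y):
    """Wilcoxon matched-pairs signed-ranks test, normal approximation.
    x, y: paired observations (e.g. absolute residuals of the Workload
    Model and of LAM for the same jobs). Returns (W_plus, two_sided_p)."""
    d = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    # rank the absolute differences, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w_plus, p
```

A large p accepts the null hypothesis that the two models' absolute residuals do not differ systematically; a small p indicates one model fits significantly better.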
Each time a run with the model is made, a decision has to be made about the settings of the parameters for the next run. To assist in this, Beilner's Method of Good Balance is used (4). This method provides a systematic means of guiding the model-builder towards constructing a well-balanced model.
With the guidance provided by this analysis, the parameters are altered and the process is repeated. The iterative tuning procedure is continued until no further significant improvement in LAM is obtained. LAM was calibrated using the 30/4/75 trace. Some problems were experienced during the calibration (13). These were caused by perturbations in the real system which were not recorded in the data and hence should not be modelled. A few short jobs affected by these perturbations were excluded from the calibration.
5.7 Validation of LAM

Validation of the LAM aims at determining the domain of situations for which the model performs with a given accuracy, for an established calibration (3). During calibration, the parameters of the model are adjusted with the objective of reducing the difference in behaviour between the real and modelled worlds for a particular workload trace. The objective of the validation process is to find a set of parameter values, determined during calibration, with which the model predictions are not significantly different for other traces.
The LAM parameters were set to values obtained in the calibration of the model with the 30/4/75 trace. The model was then run with the 27/1/75 and 30/1/75 traces respectively. A non-parametric test, the Mann-Whitney U-test (19), was carried out to determine if there was any significant difference in the model predictions. The criterion for comparison was the absolute value of residuals. The null hypothesis, that two independent groups of observations have been drawn from the same population, was tested. In this case, the independent groups were the sets of absolute residuals obtained by running the model with a given set of parameters, using different input traces. Since there were three traces, the test was carried out in a pairwise manner, comparing two sets of absolute residuals at a time, making three tests in all.

The Mann-Whitney test indicates the probability of the null hypothesis, that the two sets of absolute residuals have been drawn from the same distribution, being true. Table 2 shows that the null hypothesis may be accepted at the 10% level. Hence the LAM has been successfully validated for the three sessions under consideration.
6. THE MEMORY MANAGEMENT MODEL
6.1 Limitations of the Load Adjusting Model

In the Workload Model, a measure of the level of competition is given by the average number of short jobs competing for resources with a given job. This measure is input to the model as an independent variable. In the Load Adjusting Model, the level of competition is estimated using a simulation framework. However, the level of competition, whether measured in the Workload Model or estimated in LAM, does not distinguish between jobs in Central Memory (CM) competing for the CPU and I/O, and jobs rolled out of CM. Furthermore, in both models, only the short job workload is modelled and the presence of the long job workload is ignored.
The Memory Management Model (MMM) attempts to overcome both these limitations by simulating the memory management subsystem. This subsystem is modelled at a greater level of detail than the rest of the system. By simulating this subsystem, it becomes possible to make reasonable estimates of how many long jobs are executing in CM and how many are rolled out.
Short job delay, due to competition with other jobs, is estimated on a time interval basis using a numerical submodel (as in LAM). However, one important difference is that the predicted delay in MMM is based on the estimated number of jobs resident in CM, rather than the estimated number of jobs in execution, some of which may be rolled out. A regression submodel is used to predict job execution time, defined in MMM as a job's elapsed time in the absence of competition from any other jobs.

Table 2: Validation of Load Adjusting Model (LAM)

Parameter Settings:
b_0 = 4.7, b_1 = 2.1, b_2 = 0.8, a_1 = 0.11, a_2 = 0.01

Model Predictions:

Session    actual t_e (secs)    predicted t_e (secs)    |r|     |r'|    P
27/1/75    17.4                 15.6                    5.08    4.56    0.161
30/4/75    17.0                 15.7                    4.72    4.55    0.064
30/1/75    16.8                 15.7                    4.42    4.01    0.119

Key:
t_e : mean job elapsed time
|r| : mean of absolute residuals of Workload Model (WM)
|r'| : mean of absolute residuals of LAM
P : probability that there is no difference between WM and LAM (Wilcoxon Test)

Pairwise Comparison of LAM Predictions using Mann-Whitney U-Test

Session A    Session B    P'
27/1/75      30/4/75      0.140
27/1/75      30/1/75      0.492
30/4/75      30/1/75      0.161

P' : probability that there is no difference between the two sets of absolute residuals.
6.2 Memory Management in Kronos

When a job first enters the system, it is placed in the Input Queue. A
job rolled out of CM is placed in the Rollout Queue. For the purpose of memory allocation, the Job Scheduler (which handles memory as well as job scheduling) treats the Input and Rollout queues as one queue.
Every job has a CM priority associated with it. When a job enters the Input or Rollout Queue, its CM priority is set to an initial value. Its priority is gradually aged till an upper bound is reached. On the IC Kronos system, short jobs are given an initial priority which is above the upper bound. Consequently, their priorities are not aged, and they are given preferential treatment over other batch jobs in the allocation of CM. A job resident in CM is liable to be rolled out if a higher CM priority job makes a memory request.
Each job resident in CM is also awarded two time slices, a CPU time slice and a CM time slice. The CPU time slice is the amount of CPU time a job may use before becoming eligible for rollout. The CM time slice is the amount of real time a job may be resident in CM before becoming eligible for rollout.
6.3 Assumptions Made by the Memory Management Model

In addition to assumptions 2 and 3 made by LAM (5.4.2), the following assumptions are made by MMM:

(1) The regression submodel for predicting short job execution time is also used to predict the execution time of long jobs. This is because considerable difficulty was experienced modelling the long job workload (12) and so a separate regression submodel for this workload was not available. The objective of introducing the long job workload into the MMM was therefore limited to making a better estimate of the competition experienced by short jobs. This assumption is bound to lead to inaccuracies in the model. Nevertheless it was felt that modelling the long job workload, even in a crude form, was preferable to not modelling it at all.
(2) The only information on memory allocation that may be derived from the Dayfile is the average CM used by a job during its execution. Consequently, the model assumes that a job uses its average memory size throughout its execution. It would be a simple extension to the model to handle memory allocation at the job step level, if the data were available.
(3) Only two priority levels for access to CM are assumed, one for short jobs and the other for long jobs. As short jobs have a constant access priority to CM which is higher than that of long jobs, the approximation is a reasonable one.
6.4 Overall Design of the Memory Management Model

6.4.1 Job Commencement

At the simulated time of job arrival in the system, the job's execution time is predicted by the regression submodel. The model then determines whether sufficient memory is available for the job. If sufficient memory is available, an entry is set up for the job and is linked onto the end of the Execution List. This list contains an entry for each simulated job executing in CM.
If insufficient memory is available, then for long jobs the entry is linked onto the end of a combined Input/Rollout Queue for long jobs. This contains an entry for every long job in Input or Rollout state. For short jobs, the model checks if there is enough 'eligible' memory, in addition to free memory, to allow the job to start execution. Eligible memory is that used by executing jobs which are eligible for rollout. This includes all long jobs and those short jobs which have exceeded their time slice (see 6.4.2). If there is now sufficient memory, as many 'eligible' jobs as necessary are rolled out, and the new job is linked onto the Execution List. If insufficient memory is available, the job is linked onto the end of the short job Input/Rollout Queue.
6.4.2 Time Slice Expiry

Each time round its main loop (at each event occurrence), the model checks each job in the Execution List to determine whether it has exceeded the CPU time or CM time slice (see 6.2). The CM time slice is a real time measure, but the CPU time slice has to be converted into a real time measure, which is named the Execution time slice. At the simulated time of job commencement, the Execution time slice is estimated for each job whose total CPU time is greater than the
CPU time slice:

Execution Time Slice = (Estimated Job Execution Time × CPU Time Slice) / Total CPU Time Required

If a job has exceeded its time slice, its priority is set to 'eligible for rollout'.

6.4.3 Job Rollout and Rollin

If insufficient memory is available when a short job enters the system, jobs in 'eligible' state are liable to be rolled out, until enough memory has been released to satisfy a memory request. Long jobs are always rolled out before eligible short jobs. Each time round its main loop, the model checks if sufficient memory is available for one or more jobs in first the short job and then the long job Input/Rollout queues. If so, the job is brought into CM and placed onto the end of the Execution List.
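The time-slice conversion of 6.4.2 and the rollout rule of 6.4.3 can be sketched as follows. The function names and job representation are illustrative, not the paper's Fortran:

```python
def execution_time_slice(est_exec_time, cpu_slice, total_cpu):
    """6.4.2: convert the CPU time slice into a real-time Execution time
    slice, scaling by the job's estimated execution time."""
    return est_exec_time * cpu_slice / total_cpu

def roll_out_for(request, free_cm, executing):
    """6.4.3 sketch: roll out 'eligible' jobs, long jobs first, until the
    memory request can be met. Each job is a dict with keys 'cm' (words of
    CM held), 'long' and 'eligible'. Mutates the executing list and
    returns (request_satisfied, rolled_out_jobs)."""
    rolled = []
    # all long jobs plus time-sliced-out short jobs are candidates;
    # long jobs are always rolled out before eligible short jobs
    candidates = [j for j in executing if j["long"] or j["eligible"]]
    candidates.sort(key=lambda j: not j["long"])
    for job in candidates:
        if free_cm >= request:
            break
        executing.remove(job)
        rolled.append(job)
        free_cm += job["cm"]
    return free_cm >= request, rolled
```

For example, a job with an estimated execution time of 15.5 s, a 4 s CPU time slice and 8 s of total CPU demand receives a 7.75 s Execution time slice.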
6.5 Calibration and Validation

A number of problems arose in the calibration of the Memory Management Model (MMM). These were due to the limited amount of workload and performance data available. In addition to the data used in calibrating LAM, the following data was available:
(i) The total amount of Central Memory available for user jobs, 50K 60-bit words.
(ii) The average CM used by each job during execution.

The main problems in calibrating this model were:

(a) In Kronos, memory is usually allocated to a job at the job step level. Because the model uses the average CM allocated to jobs, it thus considerably underestimates the rate of change of memory allocation.
(b) No record of rollin/rollout is maintained by the Dayfile. Thus no indication is given of which jobs were rolled out, and for how long.

Because of these problems, the calibration approach adopted for MMM was
similar to that for LAM, in spite of the fact that the system is modelled at a greater level of detail. The model was calibrated using the 30/1/75 trace and validated using the 27/1/75 and 30/4/75 traces. The results of the validation are shown in Table 3. The Mann-Whitney test shows that the null hypothesis, that the two sets of absolute residuals have been drawn from the same population, may be accepted at the 4% level. Thus the MMM has been validated at this level for the three sessions under consideration.
7. COMPARISON OF MODELS
Both hybrid models were written in Fortran. The Load Adjusting Model is just over 400 statements long, while the Memory Management Model is just under 800 statements long. Table 4 shows, for the three models, the compilation times and the execution times required to model one 3½ hour session consisting of 354 short jobs and 61 long jobs. The results show that experimenting with any of these models is more than three orders of magnitude more economical than experimenting with the real system.
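The economy claim can be checked directly against the Table 4 figures:

```python
# The "three orders of magnitude" claim: a 3.5 hour session against the
# slowest model run in Table 4 (4.6 s of CPU time).

session_seconds = 3.5 * 3600          # 12600 s of real operation
model_seconds = 4.6                   # Memory Management Model execution time
speedup = session_seconds / model_seconds
print(round(speedup))                 # roughly 2700x, i.e. well over 10^3
```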
Table 5 compares the predictions of the three validated models for each of the three sessions. It shows that the mean absolute value of the residuals, as a percentage of mean actual elapsed time, is between 26% and 29% for the Workload Model, and between 24% and 27% for both the Load Adjusting and Memory Management Models. The main reason for the comparatively large residuals in all three models is the limitations of the available data, derived entirely from the Kronos Dayfile.
These limitations also meant that the amount of workload data collected for input to the model, and of performance data for calibration purposes, could not be significantly increased as the level of detail of the models increased. Whereas the Workload Model uses the average load experienced by each job as input, this figure is predicted by both the Load Adjusting and Memory Management Models, given the times of job arrival. The MMM also uses the average memory requirement of each job and the total amount of Central Memory available to user jobs. The effect of these limitations is that the improvement in the predictions, as the level of detail of the models is increased, is not as significant as one would aim for with this modelling approach. Thus Table 5 shows that:
(a) The predictions of the Load Adjusting Model are significantly better than
A MODELLING APPROACH TO EVALUATING PERFORMANCE
193
Table 3: Validation of Memory Management Model (MMM)

Parameter settings: b₀ = 4.7, a₁ = 0.07, b₁ = 1.9, b₂ = 0.7, a₂ = 0.01

Model predictions:

Session    Actual t̄ₑ (secs)   Predicted t̄ₑ (secs)   |r̄| WM   |r̄| MMM   P
27/1/75    17.4               15.4                  5.08     4.51      0.037
30/4/75    17.0               15.7                  4.72     4.58      0.133
30/1/75    16.8               15.2                  4.42     3.99      0.046

Key:
t̄ₑ      : mean job elapsed time
|r̄| WM  : mean of absolute residuals of Workload Model (WM)
|r̄| MMM : mean of absolute residuals of MMM
P       : probability that there is no difference between WM and MMM (Wilcoxon test)

Pairwise comparison of MMM predictions using the Mann-Whitney U-test:

Session A   Session B   P'
27/1/75     30/4/75     0.057
27/1/75     30/1/75     0.444
30/4/75     30/1/75     0.042

P' : probability that there is no difference between the two sets of absolute residuals.
Table 4: Comparison of Model Speeds

Model                      Compilation Time (secs)   Execution Time* (secs)
Workload Model             1.8                       0.7
Load Adjusting Model       2.3                       3.4
Memory Management Model    4.6                       4.6

* CPU time to process a 3½ hour session on the CYBER.
Table 5: Comparison of Model Predictions

                        Workload Model    Load Adjusting Model       Memory Management Model
Session  Measured t̄ₑ   |r̄|    |r̄|/t̄ₑ    |r̄|    |r̄|/t̄ₑ   P₁       |r̄|    |r̄|/t̄ₑ   P₂      P₃
         (secs)         (secs)  (%)        (secs)  (%)                 (secs)  (%)
27/1/75  17.4           5.08    29.2       4.56    26.2      0.161     4.51    25.9      0.181   0.037
30/4/75  17.0           4.72    27.8       4.55    26.8      0.064     4.58    26.9      0.344   0.113
30/1/75  16.8           4.42    26.4       4.01    23.9      0.119     3.99    23.8      0.419   0.046

Key:
t̄ₑ  : mean actual elapsed time
|r̄| : mean absolute value of residuals for a given model
P₁  : probability that there is no significant difference between LAM and WM (Wilcoxon test)
P₂  : probability that there is no significant difference between MMM and LAM (Wilcoxon test)
P₃  : probability that there is no significant difference between MMM and WM (Wilcoxon test)
the Workload Model at the 10% level (i.e. with 90% confidence) in only one case out of three.
(b) The predictions of the Memory Management Model are significantly better than those of the Workload Model at the 5% level in two cases out of three.
(c) In none of the cases are the predictions of the Memory Management Model significantly better than those of the Load Adjusting Model.
The main limitations of the data are:
(a) Job rollout time is not recorded. This data could be used for developing a more accurate regression model and to assist in the calibration of the Memory Management Model.
(b) No good measures of I/O demand are available. Better measures, such as a job's non-overlapped I/O time and the count of I/O requests made, could lead to an I/O term appearing in the regression models.
(c) Data is collected at the job level. Hence the batch subsystem can only be modelled at the job level and not at the job step level.
(d) There is insufficient data to model the timesharing subsystem.

It has been suggested that most user installations would easily tolerate models with an accuracy level of 20% (16). It is believed that, with additional data, the accuracy of the three models could be improved so that the mean absolute value of the residuals would fall below this level under most conditions.
8. CONCLUSIONS

(a) A detailed understanding of system performance and of the characteristics of the workload is necessary to enable the construction of meaningful and consistent regression models at the workload level.
(b) Regression and simulation modelling techniques may be combined to construct a hybrid model, which is a dynamic model of system performance.
(c) The hybrid simulation/regression framework provides a valuable means of modelling the system at different levels of detail.
By constructing, calibrating and validating the three models of the Imperial College system, the feasibility of combining regression and simulation modelling techniques within a multilevel modelling framework has been demonstrated. By this means the advantages of both techniques are exploited. Regression analysis provides a fast statistical method of modelling a system or subsystem at a gross level. Simulation models the passage of time and provides a means of modelling the system in more detail by representing logical and structural relationships in the system.
9. ACKNOWLEDGEMENTS

I am indebted to Professors D.J. Howarth and M.M. Lehman and to Dr. H. Beilner for their invaluable advice and assistance. I am also indebted to W.J. Brooker for his assistance on statistical matters. I am very grateful to the Imperial College Computer Centre Manager, Mr. A. Spirling, and his predecessor Mr. J.L. Benbow, for providing the Dayfile tapes. I am also grateful to P.G. Jones and J.L. Thompson for implementing some of the Dayfile processing programs.
10. REFERENCES

1) Y. Bard, 'Performance Criteria and Measurement for a Time Sharing System', IBM Systems Journal, Vol. 10, No. 3, 1971.
2) Y. Bard and K.R. Suryanarayana, 'On the Structure of CP Overhead', in Statistical Computer Performance Evaluation, W. Freiberger (ed.), Academic Press, 1972.
3) H. Beilner, 'Problems in Calibrating and Validating Simulation Models', Proc. 2nd Seminar on Experimental Simulation, Liblice (CSSR), 1973.
4) H. Beilner, 'The Method of Good Balance', Proc. Computer Systems Performance Evaluation Workshop, Iowa State University, 1973.
5) H. Beilner and G. Waldbaum, 'Statistical Methodology for Calibrating a Trace-Driven Simulator of a Batch Computer System', in Statistical Computer Performance Evaluation, W. Freiberger (ed.), Academic Press, 1972.
6) Control Data 6400/6500/6600/6700 Computer Systems Reference Manual, Publication No. 60100000.
7) C. Daniel and F. Wood, 'Fitting Equations to Data', Wiley, 1971.
8) N. Draper and H. Smith, 'Applied Regression Analysis', Wiley, 1966.
9) H.P. Friedman and G. Waldbaum, 'Evaluating System Changes under Uncontrolled Workloads: A Case Study', IBM Systems Journal, Vol. 14, No. 4, 1975.
10) H. Gomaa and M.M. Lehman, 'Performance Analysis of an Interactive Computing System in a Controlled Environment', Proc. Online Conference on Computer System Evaluation, September 1973.
11) H. Gomaa, 'Performance Measurement and Evaluation: A Survey', Imperial College, Dept. of Computing and Control, Internal Report, May 1976.
12) H. Gomaa, 'Regression Models for the Evaluation of Computer System Performance', Proc. Eurocomp Conference on Performance Evaluation, September 1976.
13) H. Gomaa, 'The Calibration and Validation of a Hybrid Simulation/Regression Model', Imperial College, Dept. of Computing and Control, Internal Report, September 1976.
14) U. Grenander and R. Tsao, 'Quantitative Methods for Evaluating Computer System Performance: A Review and Proposals', in Statistical Computer Performance Evaluation, W. Freiberger (ed.), Academic Press, 1972.
15) S.R. Kimbleton, 'A Fast Approach to Computer System Performance Prediction', Proc. Computer Architectures and Networks - Modelling and Evaluation Workshop, IRIA, 1974.
16) S.R. Kimbleton, 'A Heuristic Approach to Computer Systems Performance Prediction', Proc. AFIPS National Computer Conference, 1975.
17) H.C. Lucas, 'Performance Evaluation and Monitoring', ACM Computing Surveys, Vol. 3, No. 3, 1971.
18) M. Schatzoff and P. Bryant, 'Regression Methods in Performance Evaluation: Some Comments on the State of the Art', Proc. Computer System Performance Evaluation Workshop, Iowa State University, 1973.
19) S. Siegel, 'Non-Parametric Statistics for the Behavioural Sciences', McGraw-Hill, 1956.
20) R. Watson, 'Computer Performance Analysis: Applications of Accounting Data', Rand Report R-573-NASA/PR, May 1971.
21) G. Waldbaum, 'Evaluating Computing System Changes by Means of Regression Models', Proc. 1st SIGME Symposium on Measurement and Evaluation, 1973.
22) G. Waldbaum and H. Beilner, 'Submodel Simulation', Proc. Summer Simulation Conference, 1973.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
THE USE OF MEMORY ALLOCATION TO CONTROL RESPONSE TIMES IN PAGED COMPUTER SYSTEMS WITH DIFFERENT JOB CLASSES

J.H. Hine, I. Mitrani and S. Tsur
University of Newcastle upon Tyne, U.K.

Abstract

The possibility of giving different quality of service to jobs of different classes by regulating their memory allocation is examined in the context of a paged computer system. Two parameterized algorithms which partition the main memory between two classes of jobs are considered. Initially a closed system consisting of a processor and a paging device, with fixed numbers of jobs, is studied to determine optimal degrees of multiprogramming and the proportion of processor time devoted to each class. Applying a decomposition approach and treating the closed system as a single server, the response times in an open system with external arrivals are studied. The object here is to investigate the effect of the memory allocation parameters on the expected response times under the two algorithms. Bounds on the expected response times as functions of the control parameters are obtained by considering the extremes of loading conditions. A way of applying the results to systems with more than two job classes is indicated.

1. Introduction
This paper presents some preliminary results on the effect of the use of memory allocation to regulate expected response times. We assume that the demand on a computer system can be divided into several classes, such as terminal users, relatively small student batch jobs and larger production jobs. A management decision is to allocate the system resources in such a way that each class has an expected response time reflecting the importance placed on it. This problem has previously been considered with the emphasis on central processor allocation [4, 7]. In the present study memory allocation is used to control the expected response times. For the case mentioned we might consider a multiprogramming system which runs a time-sharing system in a portion of memory, maintains a resident load-and-go monitor and processes a batch stream through the remainder of memory.

The bulk of the paper is concerned with the case of two job classes. The system is described in section 2. The model represents a central processor, a paging device and external queues for jobs awaiting a memory allocation. This model can be solved by the method of decomposition. An aggregate server, representing the processor and drum, is used to study response times as a function of memory allocation. Section 3 introduces a strategy based on a fixed partition of memory among the classes. Within a system with two classes of jobs, bounds are obtained on the response times of jobs of each class as a function of the size of its partition. In section 4 we attempt to generalize the strategy to allow the partition size to vary when this will benefit system performance. Bounds are also obtained for the response times in this case. Finally, we consider the generalisation of our methods to systems with more than two job classes and suggest a possible approach.

2. Model Description

The system under consideration consists of a CPU, M pages of main memory and a paging drum. The demand comprises two job classes: jobs of class i arrive in a Poisson stream with rate λᵢ and have CPU requirements distributed exponentially with mean 1/μᵢ; i = 1, 2. At any moment in time a job is either passive, i.e. waiting in the external queue associated with its class, or it is active, i.e. occupying a certain amount of main memory and circulating between the CPU and the paging drum (fig. 1).
Fig. 1: Diagram of the system model

Fig. 2: Lifetime function (expected time between page faults vs. page allocation)
The behaviour of the system is controlled by a scheduling strategy S. If at a given moment in time there are N₁ class 1 jobs and N₂ class 2 jobs in the system, S determines: a) the number of active class i jobs, nᵢ, and b) the fraction of main memory allocated to class i, Γᵢ; i = 1, 2; Γ₁ + Γ₂ = 1. Each active class i job gets ⌊Γᵢ M/nᵢ⌋ pages of main memory (i = 1, 2), where ⌊x⌋ is the largest integer less than or equal to x. Sometimes we shall refer to a) as the 'admission policy' and to b) as the 'memory allocation policy'; it should be remembered, however, that they are not independent but are parts of a general scheduling policy.

The behaviour of a class i job executing in p pages of main memory is governed by a lifetime function eᵢ(p) giving the expected time between page faults; the distribution of that time is assumed to be exponential. The scheduling strategy and the lifetime functions determine the page fault rate ξᵢ(n₁,n₂) of a class i job when n₁ class 1 jobs and n₂ class 2 jobs are active. For our numerical results we have used the lifetime functions proposed by Chamberlin et al [3] and shown in fig. 2:

    eᵢ(p) = 2bᵢ / (1 + (cᵢ/p)²) ;   i = 1, 2                    (1)
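As a numerical illustration of (1), the sketch below uses the class 1 settings quoted with Table 1 (b₁ = 15 msec, c₁ = 10 pages):

```python
# Sketch of the lifetime function (1): expected time between page faults
# for a job running in p pages of main memory.

def lifetime(p, b, c):
    """e(p) = 2b / (1 + (c/p)^2)."""
    return 2.0 * b / (1.0 + (c / p) ** 2)

# e(p) grows with the allocation p but saturates at 2b: at p = c it is
# exactly b, and enlarging p beyond c buys ever smaller gains -- the point
# of diminishing returns discussed in section 4.
```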
The scheduling discipline at the CPU is assumed to be Processor-Sharing; this is a good approximation to the Round-Robin disciplines most commonly used in practice. Queueing at the drum is on a First-In-First-Out basis with exponential service, but the instantaneous service rate η(n) is assumed to depend on the number of jobs, n, in the drum queue. Denning [6] has shown that if η(n) is chosen so that

    1/η(n) = (T/m) · (2n + m − 1)/(2n)                          (2)
then this FIFO discipline satisfactorily approximates the departure process from a drum with m sectors and rotation time T.

We regard the demand and hardware characteristics as fixed and are interested in the effect of the scheduling strategy S on system performance. Two families of scheduling strategies will be considered. In each case, the study proceeds in three separate stages, described below.

First, the closed network consisting of the CPU and the drum, with n₁ class 1 jobs and n₂ class 2 jobs endlessly circulating between them, is analysed in the steady state. Only the memory allocation part of S plays a role here. Under our assumptions, this model falls within the class of models considered by Baskett et al [2]. Let the system state be described by the vector (k₁,k₂), where kᵢ is the number of class i jobs at the CPU; i = 1, 2. The number of jobs at the drum is j = n₁ + n₂ − k₁ − k₂. The steady-state probabilities are given by

    P(k₁,k₂) = C · [(k₁+k₂)! / (k₁! k₂!)] · [1/ξ₁(n₁,n₂)]^k₁ · [1/ξ₂(n₁,n₂)]^k₂
               · [j! / ((n₁−k₁)! (n₂−k₂)!)] · ∏_{l=1..j} 1/η(l)              (3)

where C is a normalising constant, ξᵢ(n₁,n₂) is the page fault rate for class i jobs and η(l) is the state-dependent drum service rate. These probabilities can be determined numerically with little difficulty. Using them, and remembering that the CPU is Processor-Shared, one can find the steady-state probabilities πᵢ(n₁,n₂) that the CPU is serving a class i job:
    π₁(n₁,n₂) = Σ_{k₁=1..n₁} Σ_{k₂=0..n₂} [k₁/(k₁+k₂)] P(k₁,k₂)
                                                                    (4)
    π₂(n₁,n₂) = Σ_{k₁=0..n₁} Σ_{k₂=1..n₂} [k₂/(k₁+k₂)] P(k₁,k₂)
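Equations (2)-(4) can be evaluated directly. The sketch below assumes constant page-fault rates ξ₁, ξ₂ (in the model they follow from the lifetime functions and the memory allocation) and builds in the drum rate (2); all parameter values are illustrative.

```python
# Numerical sketch of the closed CPU-drum network: product-form weights as
# in (3), normalised, then the PS probabilities (4).
from math import factorial

def eta(n, m, T):
    """State-dependent drum service rate (2) for n queued requests."""
    return 1.0 / ((T / m) * (2 * n + m - 1) / (2 * n))

def cpu_probs(n1, n2, xi1, xi2, m=9, T=0.1):
    """Return (pi1, pi2): probabilities that the PS CPU serves a class 1 /
    class 2 job, with n1 + n2 jobs circulating and constant fault rates."""
    w = {}
    for k1 in range(n1 + 1):
        for k2 in range(n2 + 1):
            j = n1 + n2 - k1 - k2                 # jobs at the drum
            x = (factorial(k1 + k2) / (factorial(k1) * factorial(k2))
                 * xi1 ** -k1 * xi2 ** -k2
                 * factorial(j) / (factorial(n1 - k1) * factorial(n2 - k2)))
            for l in range(1, j + 1):             # drum service product
                x /= eta(l, m, T)
            w[k1, k2] = x
    c = 1.0 / sum(w.values())                     # normalising constant C
    pi1 = sum(c * x * k1 / (k1 + k2) for (k1, k2), x in w.items() if k1)
    pi2 = sum(c * x * k2 / (k1 + k2) for (k1, k2), x in w.items() if k2)
    return pi1, pi2
```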
The sum of these two probabilities is equal to the CPU utilisation factor. πᵢ(n₁,n₂)/nᵢ represents the average amount of CPU time that a class i job receives per unit of elapsed time. For a given memory allocation policy, the values of n₁ and n₂ which maximise the CPU utilisation factor are found by performing a search. Denote these values by L₁ and L₂. We also propose a 'rule of thumb' for finding L₁ and L₂.

Second, the admission policy of S for the open system with external arrivals and departures is formulated. Basically, this states that up to Lᵢ class i jobs may be active at any one time, where Lᵢ depends on the memory allocated to class i at the time; i = 1, 2. The scheduling policy S is now completely defined.

Third, the steady-state probability that the CPU is serving a class i job, given that there are N₁ class 1 jobs and N₂ class 2 jobs in the system (and hence the numbers of active jobs are n₁ and n₂), is taken to be equal to the steady-state probability that in the closed network with n₁ class 1 jobs and n₂ class 2 jobs the CPU is serving a class i job:

    prob(CPU serves a class i job | N₁,N₂) = πᵢ(n₁,n₂) ;   i = 1, 2          (5)
This is an approximation; it amounts to asserting that the subsystem of the active jobs reaches steady state between each two changes in the open system state (N₁,N₂). Such an assumption is justified in our case, because the expected interarrival times 1/λ₁, 1/λ₂ and CPU requirements 1/μ₁, 1/μ₂ are, in practice, several orders of magnitude larger than the expected times between page faults and drum service times. For a more detailed discussion of this 'decomposition' approach see Courtois [5].

It follows from (5) that the instantaneous departure rate for class i jobs when the system is in state (N₁,N₂) is equal to

    μᵢ(N₁,N₂) = μᵢ πᵢ(n₁,n₂) ;   i = 1, 2                                    (6)

This allows the CPU and drum to be replaced by a single server which serves class 1 and class 2 jobs simultaneously with state-dependent service rates given by (6). The state vector (N₁,N₂) can therefore be treated as a two-dimensional birth and death process whose steady-state probabilities P(N₁,N₂) satisfy the system of balance equations

    P(N₁,N₂)[λ₁ + λ₂ + μ₁(N₁,N₂) + μ₂(N₁,N₂)] = λ₁P(N₁−1,N₂) + λ₂P(N₁,N₂−1)
        + μ₁(N₁+1,N₂)P(N₁+1,N₂) + μ₂(N₁,N₂+1)P(N₁,N₂+1) ;  N₁,N₂ = 0,1,2,...  (7)

where P(−1,N₂) = P(N₁,−1) = 0. For a given set of parameter values, equations (7) plus the normalising equation

    Σ_{N₁,N₂=0..∞} P(N₁,N₂) = 1                                              (8)

can be solved numerically in order to obtain the expected number of class i jobs in the system, E[Nᵢ], and the expected response time for class i jobs, Wᵢ = E[Nᵢ]/λᵢ; i = 1, 2. We shall determine the range of Wᵢ values by finding, for each scheduling strategy, an upper and a lower bound on Wᵢ.
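A minimal numerical sketch of solving (7)-(8) on a truncated grid follows. For checkability it uses plain M/M/1 service rates (μᵢ constant whenever class i is present) rather than the state-dependent rates (6), so the resulting W₁ has a known closed form.

```python
# Gauss-Seidel solution of the truncated 2-D birth-and-death process
# (7)-(8), with illustrative constant service rates; under these rates the
# two queues decouple and W1 should approach 1/(mu1 - lam1).

def birth_death_W1(lam1, lam2, mu1, mu2, N=25, sweeps=400):
    """Expected class 1 response time W1 = E[N1] / lam1 on an N x N grid."""
    P = [[1.0] * N for _ in range(N)]
    for _ in range(sweeps):
        for i in range(N):
            for j in range(N):
                out = (lam1 * (i < N - 1) + lam2 * (j < N - 1)
                       + mu1 * (i > 0) + mu2 * (j > 0))
                inc = 0.0
                if i > 0:     inc += lam1 * P[i - 1][j]
                if j > 0:     inc += lam2 * P[i][j - 1]
                if i < N - 1: inc += mu1 * P[i + 1][j]
                if j < N - 1: inc += mu2 * P[i][j + 1]
                P[i][j] = inc / out                # balance equation (7)
        s = sum(map(sum, P))                       # normalisation (8)
        P = [[p / s for p in row] for row in P]
    EN1 = sum(i * sum(P[i]) for i in range(N))
    return EN1 / lam1
```

With λ₁ = 0.3 and μ₁ = 1.0 the computed W₁ approaches 1/(μ₁ − λ₁) ≈ 1.43, which gives a useful sanity check before state-dependent rates are substituted.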
3. Static partitioning strategies
In this section we consider scheduling strategies which allocate to the job classes memory partitions of fixed size, allowing however jobs of one class to occupy the whole of memory when no jobs of the other class are present. The numbers of active jobs are determined by placing limits on the degrees of multiprogramming. More precisely, let memory be divided into two fractions γ₁ and γ₂; γ₁ + γ₂ = 1. If, at a given moment in time, there are N₁ class 1 jobs and N₂ class 2 jobs in the system then

a) The number of active class i jobs, nᵢ, is given by

    nᵢ = min[Nᵢ, Lᵢ(γᵢ)]  if Nⱼ > 0
                                          i = 1, 2;  j ≠ i              (9)
    nᵢ = min[Nᵢ, Lᵢ(1)]   if Nⱼ = 0

where Lᵢ(γᵢ) is the optimal class i degree of multiprogramming for the partition (γ₁,γ₂) and Lᵢ(1) is the optimal class i degree of multiprogramming when that class occupies the whole memory. These values will be determined later.

b) The memory available to class i jobs is divided equally among them.

We thus have a family of scheduling strategies depending on a single parameter, γ₁. When γ₁ is varied in the interval [0,1], the policies range from preemptive priority for class 1 (γ₁ = 1) to preemptive priority for class 2 (γ₁ = 0). The analysis of their performance proceeds as outlined in the last section. For a given γ₁, the closed network is solved, varying the numbers n₁ and n₂. This gives the probabilities πᵢ(n₁,n₂) defined by (4) and also determines the pair [L₁(γ₁), L₂(γ₂)] which optimises the CPU utilisation factor. We found that there are usually several pairs (n₁,n₂) which yield utilisation factors close to the optimum and that the pair (⌊γ₁M/c₁⌋, ⌊γ₂M/c₂⌋), where cᵢ is the page parameter of the lifetime function (1), is always one of them. It seems therefore unnecessary to perform an exhaustive search in each case. Table 1 gives some typical results for a particular case, showing the optimal degrees of multiprogramming and the near-optimal ones obtained from the above heuristic.

Having determined the probabilities πᵢ(n₁,n₂) and the parameters of the admission policy (9), we can decompose the open system and solve (7) and (8), with service rates μᵢ(N₁,N₂) given by (6). However, a closed-form solution for the general two-dimensional birth and death process is not available and numerical solutions, while feasible, are time-consuming and produce results only for isolated points in the parameter space. We shall give instead upper and lower bounds on the average response times Wᵢ (i = 1, 2), as functions of γ₁. Consider the average response time for class 1, W₁.
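Admission rule (9), combined with the heuristic levels Lᵢ = ⌊γᵢM/cᵢ⌋ described above, can be sketched as follows; partition sizes are passed as integer page counts (γᵢM), and all names are illustrative.

```python
# Sketch of admission rule (9) with heuristic multiprogramming levels.
# M1 is the class 1 partition in pages (gamma1 * M); integers throughout.

def active_jobs(N1, N2, M1, M, c1, c2):
    """(n1, n2): class i is limited to its own partition unless the other
    class is absent, in which case it may use the whole memory M."""
    M2 = M - M1
    n1 = min(N1, (M if N2 == 0 else M1) // c1)
    n2 = min(N2, (M if N1 == 0 else M2) // c2)
    return n1, n2
```

With the Table 1 settings (M = 80, c₁ = 10, c₂ = 20, γ₁ = 32/80) the partition limits come out as L₁ = 3 and L₂ = 2, rising to 8 and 4 when a class has the memory to itself.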
If we replace the present system by one in which class 2 jobs stop arriving as soon as there is one of them in the system, this will surely result in a lower W₁, since it will reduce the interference from class 2. Thus a lower bound on W₁ can be obtained by analysing a simpler birth and death process in which the second dimension has only two states, zero and one. This analysis is presented in Appendix A.

Similarly, if we replace the present system by one in which the class 2 queue is always saturated, this will result in a higher W₁, since it will increase the interference from class 2. Thus an upper bound for W₁ can be obtained by solving an M/M/1 queueing system with state-dependent service rates, μ₁*(N₁), given by

    μ₁*(N₁) = μ₁(N₁, L₂(γ₂))       if N₁ < L₁(γ₁)
                                                                    (10)
    μ₁*(N₁) = μ₁(L₁(γ₁), L₂(γ₂))   if N₁ ≥ L₁(γ₁)

The bound obtained in this way is reasonable for large values of γ₁ but is too crude for small values of γ₁ (e.g. it is always infinity for γ₁ = 0). An alternative upper bound is needed for these values, one which takes into account the fact that the class 2 queue is not saturated. We use the case γ₁ = 0 to find a
Table 1: π₁(i,j), π₂(i,j) in the fixed partition case. Each cell gives the pair π₁(n₁,n₂), π₂(n₁,n₂).

n₁ \ n₂        0             1             2             3             4
0         0.000 0.000   0.000 0.458   0.000 0.673   0.000 0.749   0.000 0.772
1         0.347 0.000   0.270 0.356   0.232 0.487
2         0.560 0.000   0.400 0.305   0.351 0.426
3         0.684 0.000   0.417 0.279   0.373 0.397
4         0.756 0.000
5         0.790 0.000
6         0.792 0.000
7         0.777 0.000
8         0.771 0.000

Parameters used: p₁ = 10 pages, p₂ = 20 pages, M = 80 page frames, b₁ = 15 msec, b₂ = 25 msec, γ₁ = 32/80, γ₂ = 48/80. Disk parameters: 9 sectors, 100 msec rotation time. Results marked by a 'box' correspond to the near-optimal degrees of multiprogramming determined by the heuristic used; results marked by '*' correspond to the optimal ones determined by a search.
Fig. 3: Expected response time bounds vs γ₁: fixed partition. Logarithmic scale.
Fig. 4: W₁, W₂ region with fixed partition strategy (μ₁ = 1.0, μ₂ = 0.5, λ₁ = 0.3, λ₂ = 0.1). Labelled points represent values of γ₁. Logarithmic scale.
heuristic for such a bound; now jobs of class 1 are executed only during idle periods of the class 2 queue. The analysis of this case is presented in Appendix B. The average response time for class 2, W₂, is treated in exactly the same way.

Figure 3 shows the upper and lower bounds for W₁ and W₂ in a particular case. It illustrates how one upper bound has been used for large partitions and the alternative one for small partitions. Figure 4 shows the region of possibly achievable pairs (W₁,W₂) of response times. We do not know, of course, whether all points in the region are achievable. The region was constructed by combining the lower bounds and upper bounds from Fig. 3 for particular values of γ₁.

4. Dynamic partitioning strategies

Although the fixed partition strategies are able to influence response times, they do not make very good use of memory under certain loading conditions. For example, suppose that, in order to give class 1 jobs a low average response time, a large partition is allocated to that class. Then at times when the class 1 traffic intensity is low there may be only one or two jobs in the partition. Each job would receive a relatively large memory allocation but, as the lifetime function in Fig. 2 indicates, a point of diminishing returns is quickly reached. Just as dynamic partitioning of memory among jobs has been found to be more efficient than fixed partitioning, so dynamic partitioning among classes should be more efficient than fixed. Therefore, we generalize the strategies of the last section by allowing jobs of one class to "spill over" into the other partition if the latter is under-utilized.

The proposed scheduling strategy has three parameters, p₁, p₂ and γ₁. pᵢ is the minimum number of pages that a class i job may be allocated; as in the last section, experience indicates that pᵢ = cᵢ is a good choice (i = 1, 2). γ₁ is the fraction of memory provisionally allocated to class 1. A fraction γ₂ = 1 − γ₁ is similarly allocated to class 2. If at any moment in time there are N₁ class 1 jobs and N₂ class 2 jobs in the system then for i = 1, 2:

1. If N₁p₁ ≤ γ₁M and N₂p₂ ≤ γ₂M then all jobs are active and each class i job is allocated either ⌊γᵢM/Nᵢ⌋ pages if a job of the other class is present or ⌊M/Nᵢ⌋ pages if no jobs of the other class are present.

2. If N₁p₁ ≥ γ₁M and N₂p₂ ≥ γ₂M then nᵢ = ⌊γᵢM/pᵢ⌋ jobs of class i are active and each is allocated pᵢ pages.

3. If Nᵢpᵢ > γᵢM and Nⱼpⱼ < γⱼM for j ≠ i then all class j jobs are active and either

   a) Nᵢpᵢ ≤ M − Nⱼpⱼ, in which case all class i jobs are also active and allocated pᵢ pages each, and each class j job is allocated ⌊(M − Nᵢpᵢ)/Nⱼ⌋ pages, or

   b) Nᵢpᵢ > M − Nⱼpⱼ, in which case nᵢ = ⌊(M − Nⱼpⱼ)/pᵢ⌋ class i jobs are active and each is allocated pᵢ pages; each class j job is allocated pⱼ pages.
In other words, if the contention for both partitions is evenly balanced, whether light (case 1) or heavy (case 2), this strategy behaves in an identical fashion to the fixed partition one. If one partition is loaded heavily while the other is loaded lightly (case 3, classes i and j respectively), the first class is allowed to expand into the second partition. When that occurs, each job of the expanding class is limited to the minimum allocation. The expansion is not allowed to force any job of the other class to be excluded from memory or to receive less than its minimum allocation. Thus the strategy responds to the current loading placed on the system.

Proceeding as before, the closed network was solved for all feasible combinations of n₁ and n₂ and the probabilities π₁(n₁,n₂) and π₂(n₁,n₂) were computed. Table 2 shows their values for the same particular case as Table 1. It can be
Table 2: π₁(i,j), π₂(i,j) in the dynamic partition case. Parameters as in Table 1.

n₁ \ n₂        0             1             2             3             4
0         0.000 0.000   0.000 0.459   0.000 0.674   0.000 0.750   0.000 0.772
1         0.347 0.000   0.271 0.356   0.272 0.487   0.188 0.580
2         0.560 0.000   0.401 0.306   0.351 0.426   0.249 0.525
3         0.684 0.000   0.418 0.280   0.374 0.398
4         0.756 0.000   0.500 0.239   0.453 0.321
5         0.790 0.000   0.568 0.199
6         0.792 0.000   0.625 0.148
7         0.777 0.000
Fig. 5: Expected response time bounds vs γ₁: dynamic partition (λ₁ = 0.3, λ₂ = 0.1, μ₁ = 1.0, μ₂ = 0.5). Logarithmic scale.
seen that there are now more states which yield high CPU utilisations.

Using these probabilities and the decomposition of the open system, it is again possible to obtain bounds for the expected response times W₁ and W₂. In determining the lower bound of Wᵢ it is noted that, if we assume the other class is lightly loaded, nearly all of memory will be available to class i and γᵢ does not play a significant role. For example, the lower bound of W₁ is obtained by considering an M/M/1 system with state-dependent service rates μ₁(N₁,0). An upper bound on Wᵢ can be obtained by assuming that the other queue is saturated. This implies that the other class will always take the maximum possible memory allocation allowed by the strategy. This leads to another M/M/1 system with state-dependent service rates. As with the fixed partition, this bound works well for large memory allocations but poorly for smaller ones. For the latter, the heuristic described in Appendix B was used.

Fig. 5 shows the upper and lower bounds of W₁ and W₂ for the same case as Fig. 3. Note that the upper and lower bounds on Wᵢ no longer converge as γᵢ approaches unity. For the fixed partition strategy they converged because γᵢ = 1 meant that class i had preemptive priority over the other class. For the dynamic case this is not true: the other class can impair the performance of class i whenever class i needs less than the whole of memory.

5. Generalization to more than two job classes
In the previous sections we have examined the effect of two different memory allocation strategies on the response times of two classes. The fixed partition strategy may be generalized to include C classes. The static partitioning of memory is defined by the vector γ = (γ₁,...,γ_C), where

    Σ_{i=1..C} γᵢ = 1

and the state of the system by the vector N = (N₁,...,N_C), in which Nᵢ is the number of class i jobs in the system. The amount of memory that class i is allocated at any given time is a function of the system state and is given by

    Γᵢ = [ γᵢ δ(Nᵢ) / Σ_{j=1..C} γⱼ δ(Nⱼ) ] · M

where δ(n) = 1 if n > 0, δ(n) = 0 otherwise. Memory is partitioned among the classes with jobs in the system in proportion to γ. The closed model may be solved as before to determine the optimal levels of multiprogramming and the probabilities πᵢ(n₁,n₂,...,n_C). The aggregate model behaves as a C-dimensional birth and death process. To determine bounds for the expected response time of class i we form an aggregate class which is used to represent the joint behaviour of the other classes. For example, to bound W₁, form a class with arrival rate λ defined by λ = λ₂ + λ₃ + ... + λ_C and a correspondingly aggregated service rate μ. From this point we can proceed as with two classes. The bounds obtained will be an approximation because of the aggregation to determine μ. A similar generalisation is possible for the dynamic partitioning strategy but it is not so straightforward.
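The proportional C-class allocation Γᵢ can be sketched directly (names illustrative):

```python
# Sketch of the C-class static allocation: memory is shared in proportion
# to gamma_i among the classes that currently have jobs in the system.

def memory_shares(gammas, Ns, M):
    """Pages allocated to each class (the Gamma_i of the text)."""
    active = [g if n > 0 else 0.0 for g, n in zip(gammas, Ns)]
    total = sum(active)
    return [M * g / total if total else 0.0 for g in active]

# e.g. with gamma = (0.5, 0.3, 0.2) and no class 3 jobs present, classes 1
# and 2 split the memory in the ratio 5:3.
```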
6. Conclusion
We have reported on an initial investigation into the use of memory allocat ion to meet response time requirements. The bounds derived give an indication of how the performance of the system may be shifted by the allocation of memory. The upper bounds may be relevant to some current systems. It is common practice to maintain a queue of batch jobs to provide background processing for a timesharing system. This queue may be viewed as saturated. We would expect the timesharing system to receive a relatively large allocation of memory and that its performance would be near the upper bound obtained for either strategy. Further work is needed to specify more precisely the response times, Vi , as functions of Yi and to extend the results to three or more classes. Computer managers might then be able to determine expected response times for several classes by parameters in the memory allocation strategy. References [1] Avi-Itzhak, B. and Naor, P., Some Queueing Problems with the Service Station Subject to Breakdowns, Opns. Res., 11, 3, pp 303-320, 1963. [2]
[2] Baskett, F., Chandy, K.M., Muntz, R.R. and Palacios, F.G., Open, Closed and Mixed Networks of Queues with Different Classes of Customers, JACM, 22, 2, pp 248-260, 1975.

[3] Chamberlin, D.D., Fuller, S.H. and Liu, L.Y., A Page Allocation Strategy for Multiprogramming Systems with Virtual Memory, Proc. 4th Symposium on Operating Systems Principles, pp 66-72, 1973.

[4] Coffman, E.G. and Mitrani, I., Selecting a Scheduling Rule that Meets Pre-Specified Response Time Demands, Proc. 5th Symposium on Operating Systems Principles, pp 187-191, 1975.

[5] Courtois, P., Decomposability, Instability and Saturation in Multiprogramming Systems, CACM, 18, 7, pp 371-377, 1975.

[6] Denning, P.J., A Note on Paging Drum Efficiency, Comp. Surveys, 4, 1, pp 1-4, 1972.

[7] Mitrani, I. and Hine, J.H., Complete Parameterised Families of Job Scheduling Strategies, Tech. Report 81, Computing Laboratory, University of Newcastle upon Tyne, 1975.

[8] Omahen, K. and Marathe, V., Analysis and Applications of the Delay Cycle for the M/M/c Queueing System, CSD-TR-165, Computer Science Dept., Purdue University.
J.H. HINE, I. MITRANI and S. TSUR
Appendix A

Here we derive a lower bound on the expected response time for class i jobs, W_i, by assuming that all class j jobs (j ≠ i) which find on arrival one class j job in the system are lost. Short of eliminating class j altogether, this assumption produces the most favourable environment for class i and hence the least value of W_i. We shall present the analysis in terms of class 1, but it applies equally well to class 2.

The system can now be only in states (N_1, 0) and (N_1, 1); N_1 = 0, 1, 2, .... The steady-state probabilities of these states satisfy equations (7) and (8), where N_2 = 0, 1 and where λ_2 = 0 in all terms involving N_2 = 1. Furthermore, the coefficients of equations (7) become independent of N_1 for N_1 ≥ L, where L = L_1(1) is the maximum number of class 1 jobs that may be active at any one time (see the admission rule (9)). We introduce two generating functions

(A-1)   G_k(z) = Σ_{N=0}^{∞} p(N,k) z^N ;   k = 0, 1

and further split them into

(A-2)   G_k(z) = Σ_{N=0}^{L-1} p(N,k) z^N + Σ_{N=L}^{∞} p(N,k) z^N = p_k(z) + g_k(z) ;   k = 0, 1

Equations (7) for N_1 = L, L+1, ... and N_2 = 0, 1 lead to a system of two simultaneous equations for g_0(z) and g_1(z):

(A-3)   [ λ_2 z + (1-z)(λ_1 z - μ)        -ν z                      ] [ g_0(z) ]     [ f_0(z) ]
        [ -λ_2 z                           ν z + (1-z)(λ_1 z - μ*)  ] [ g_1(z) ]  =  [ f_1(z) ]

where μ = μ_1(L_1(1), 0) is the maximum service rate for class 1 when the latter has the whole memory; μ* = μ_1(L_1(γ_1), 1) is the similar rate when class 1 runs in a partition γ_1; ν = μ_2(L_1(γ_1), 1) is the class 2 service rate under those conditions. The terms on the right-hand side are

f_0(z) = λ_1 z^{L+1} p(L-1, 0) - μ z^L p(L, 0) ;   f_1(z) = λ_1 z^{L+1} p(L-1, 1) - μ* z^L p(L, 1).

(A-2) and (A-3) allow us to express the generating functions (A-1) in terms of 2L+2 unknown probabilities: p(N,k); N = 0, 1, ..., L; k = 0, 1. The balance equations (7) for N_1 = 0, 1, ..., L-1; N_2 = 0, 1 provide 2L relations for these unknowns. Another equation is obtained from the normalising condition (8), which can now be rewritten as

G_0(1) + G_1(1) = 1

Finally, we notice that the matrix on the left-hand side of (A-3) is singular for some point z_0 in the interval (0,1). Since the power series g_0(z) and g_1(z) converge for all values of z in that interval, the extended matrix (obtained by adding the column of free terms f_0(z) and f_1(z) to it) must have rank 1 for z = z_0. This provides the last necessary equation, and we are now able to determine all the unknowns. The expected number of class 1 jobs in the system is equal to

E[N_1] = G_0'(1) + G_1'(1)

and

W_1 = E[N_1]/λ_1.
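The construction of Appendix A can be cross-checked without generating functions: truncate the chain of states (N_1, N_2) at a large N_1 and solve the global balance equations numerically. The sketch below is ours, not the paper's; all rates are illustrative assumptions, and, as in the lower-bound model, a class 2 arrival that finds a class 2 job present is lost.

```python
# Numerical check of the Appendix A model: the chain (N1, N2) with N2 = 0 or 1
# is truncated at N1 = nmax and its stationary distribution is obtained by
# Gaussian elimination on the balance equations pi*Q = 0, sum(pi) = 1.
# mu is the class-1 service rate with the whole memory, mu_star the rate in a
# partition, nu the class-2 service rate; all numeric values are illustrative.

def w1_lower_bound(lam1=0.5, lam2=0.2, mu=1.0, mu_star=0.6, nu=0.4, nmax=50):
    states = [(n, k) for n in range(nmax + 1) for k in (0, 1)]
    idx = {s: i for i, s in enumerate(states)}
    m = len(states)
    q = [[0.0] * m for _ in range(m)]                  # infinitesimal generator

    def add(src, dst, r):
        q[idx[src]][idx[dst]] += r
        q[idx[src]][idx[src]] -= r

    for n, k in states:
        if n < nmax:
            add((n, k), (n + 1, k), lam1)                       # class-1 arrival
        if n > 0:
            add((n, k), (n - 1, k), mu if k == 0 else mu_star)  # class-1 service
        if k == 0:
            add((n, 0), (n, 1), lam2)       # class-2 arrival (only one admitted)
        else:
            add((n, 1), (n, 0), nu)         # class-2 departure

    # Solve pi*Q = 0 with sum(pi) = 1: transpose Q, replace one balance
    # equation by the normalisation, eliminate with partial pivoting.
    a = [[q[j][i] for j in range(m)] for i in range(m)]
    b = [0.0] * m
    a[m - 1] = [1.0] * m
    b[m - 1] = 1.0
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            if f != 0.0:
                for c in range(col, m):
                    a[row][c] -= f * a[col][c]
                b[row] -= f * b[col]
    pi = [0.0] * m
    for row in range(m - 1, -1, -1):
        s = b[row] - sum(a[row][c] * pi[c] for c in range(row + 1, m))
        pi[row] = s / a[row][row]

    e_n1 = sum(n * pi[idx[(n, k)]] for n, k in states)
    return e_n1 / lam1                     # Little's law: W1 = E[N1]/lam1
```

With mu_star = mu the class 2 phase has no effect and the model collapses to an ordinary M/M/1 queue, for which W_1 = 1/(mu - lam1) exactly; this gives a convenient sanity check.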
The above calculations were performed for different values of γ_1 to obtain the lower bounds in figure 3.

Appendix B

In this appendix we derive an expression for the expected response time of a class i job, with expected service time s_i, for the fixed partition case with γ_i = 0. By the assumptions made in determining s_i, the expected response time may either be bounded or estimated by a heuristic. For clarity the determination of W_1 with γ_1 = 0 is done; it is obvious that the roles of the classes may be reversed.

Class 2 goes through alternating busy and idle periods. During a busy period class 1 is preempted, and during an idle period class 1 is allocated the total memory. This model corresponds to the breakdown model solved by Avi-Itzhak and Naor in [1]. They consider a single class of jobs and a server that breaks down after exponentially distributed intervals. Although the distribution of the repair time is general, it suffices to know its first and second moments in order to determine the expected response time of the system. The two models are identical, with the start of a class 2 busy period corresponding to a breakdown. The duration of an idle period is simply the time until the next class 2 job arrives; this has expected value 1/λ_2. Let the duration of a busy period for class 2 be denoted by the random variable θ, drawn from an unknown probability density function with first and second moments E[θ] and E[θ²] respectively. We can now state the probability that class 2 is idle, p_0, or busy, p_1:

p_0 = 1/(1 + λ_2 E[θ]) ;   p_1 = λ_2 E[θ]/(1 + λ_2 E[θ])

From [1] the expected response time for class 1 is given by

(B-1)   W_1 = (p_0 - λ_1 s_1)^{-1} (s_1 + p_0 p_1 E[θ²]/2E[θ])
when the distribution of the service times is exponential.

Class 2 behaves as an M/M/1 system with state-dependent service rates μ_2(0, N_2). The first and second moments of the busy period distribution for this system have been derived by Omahen and Marathe [8]. Let θ_k be the random variable denoting the length of a period in which k or more class 2 jobs are in the system. If L is the maximum number of class 2 jobs admitted to memory, then from [8]:

E[θ_L] = (μ_2(0, L) - λ_2)^{-1}

and

(B-2)   E[θ_k] = (1 + λ_2 E[θ_{k+1}]) / μ_2(0, k) ;   1 ≤ k < L
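The recursion (B-2), as reconstructed here, is straightforward to evaluate; the sketch below is ours, with μ_2 passed as a function mu2(0, k) in the paper's notation and all numeric rates being illustrative assumptions.

```python
# Downward recursion (B-2) for the busy-period means of the class-2 queue.
# mu2(j, k) is the class-2 service rate with k class-2 jobs present (the
# first argument mirrors the paper's mu2(0, k) notation); L is the admission
# limit.  E[theta_1] is the mean busy period started by a single class-2 job.

def busy_period_mean(lam2, mu2, L):
    e = 1.0 / (mu2(0, L) - lam2)           # E[theta_L]; requires mu2(0,L) > lam2
    for k in range(L - 1, 0, -1):          # E[theta_k] = (1 + lam2*E[theta_{k+1}]) / mu2(0,k)
        e = (1.0 + lam2 * e) / mu2(0, k)
    return e                               # E[theta_1]

def idle_probability(lam2, mu2, L):
    # p0 = 1/(1 + lam2*E[theta]): the long-run fraction of time class 2 is idle.
    return 1.0 / (1.0 + lam2 * busy_period_mean(lam2, mu2, L))
```

With a state-independent rate the recursion has the M/M/1 busy period mean 1/(μ_2 - λ_2) as its fixed point, which serves as a check.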
Denote by P_{T-k}(α_1, α_2, ..., α_k) the sum of the products

α_1^{r_1} (α_1 + α_2)^{r_2} ··· (α_1 + α_2 + ··· + α_k)^{r_k}

for all the k-tuples (r_j) of nonnegative integers whose sum is T-k. Let S_j be the probability that one of the j top pages of the LRU stack is referenced:
J. LENFANT

(4)   S_j = Σ_{q=1}^{j} α_q    (0 ≤ j ≤ n) *
Since Pr(D > j) = 1 - S_j, equality (3) is equivalent to

(5)   Pr(w(t,T) = k) = (1-S_1)(1-S_2) ··· (1-S_{k-1}) P_{T-k}(α_1, α_2, ..., α_k)

As the power series expansion of 1/(1-S_h z) is 1 + (α_1+α_2+···+α_h) z + (α_1+α_2+···+α_h)² z² + ···, the number P_{T-k}(α_1, α_2, ..., α_k) is the coefficient of z^{T-k} in the power series expansion of the rational function

Q(z) = 1 / [(1-S_1 z)(1-S_2 z) ··· (1-S_k z)]

As a consequence of assumption LRU3, the poles of this rational function are simple. Therefore its canonical expansion is

(6)   Q(z) = Σ_{h=1}^{k} [ S_h^{k-1} / Π_{1≤j≤k, j≠h} (S_h - S_j) ] · 1/(1 - S_h z)

If we use this expression of Q(z) in order to compute the coefficients of its power series expansion, we obtain

(7)   P_{T-k}(α_1, α_2, ..., α_k) = Σ_{h=1}^{k} A_{kh} S_h^{T-k}

where

(8)   A_{kh} = S_h^{k-1} / Π_{1≤q≤k, q≠h} (S_h - S_q)
Combining equalities (5) and (7), we obtain the following result:

Proposition 1: Assume that the behavior of a program is described by the LRU stack model. The distribution of the size of its working sets is such that, for any positive integer k not greater than min(n,T),

(9)   Pr(w(t,T) = k) = Σ_{h=1}^{k} B_{kh} S_h^{T-1}

where

(10)   B_{kh} = Π_{1≤j≤k-1} (1-S_j) / Π_{1≤q≤k, q≠h} (S_h - S_q)

and

(4)   S_j = Σ_{q=1}^{j} α_q

* Let (a_j)_{j∈I} be a family of numbers. If the index set I is empty, then Σ_{j∈I} a_j = 0 and Π_{j∈I} a_j = 1. For instance, S_0 = 0 (equality 4).
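Proposition 1 can be evaluated directly and checked against a brute-force enumeration of all distance strings of length T. The code below is our sketch (the distribution alpha is an illustrative assumption); it relies on the fact that, inside a window, a reference touches a new page exactly when its stack distance exceeds the number of distinct pages already touched.

```python
# Working-set size distribution via formula (9) with coefficients (10),
# plus an independent brute-force check.  alpha[j-1] = Pr(stack distance = j);
# assumption LRU3 requires the partial sums S_j to be distinct.
from itertools import product

def ws_size_distribution(alpha, T):
    n = len(alpha)
    S = [0.0]                                   # S[j] = alpha_1 + ... + alpha_j
    for a in alpha:
        S.append(S[-1] + a)
    dist = {}
    for k in range(1, min(n, T) + 1):
        pre = 1.0
        for j in range(1, k):
            pre *= 1.0 - S[j]                   # (1-S_1)...(1-S_{k-1})
        p = 0.0
        for h in range(1, k + 1):               # formula (9)
            denom = 1.0
            for q in range(1, k + 1):
                if q != h:
                    denom *= S[h] - S[q]
            p += pre / denom * S[h] ** (T - 1)
        dist[k] = p
    return dist

def ws_size_distribution_bruteforce(alpha, T):
    n = len(alpha)
    dist = {k: 0.0 for k in range(1, min(n, T) + 1)}
    for string in product(range(1, n + 1), repeat=T):
        prob, seen = 1.0, 0
        for d in string:
            prob *= alpha[d - 1]
            if d > seen:                        # distance beyond the pages already
                seen += 1                       # touched in the window: a new page
        dist[seen] += prob
    return dist
```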
WORKING SETS AND BOUNDED LOCALITY INTERVALS

With formula (9), it is easy to compute effectively the distribution of w(t,T) as well as the mean size w̄(t,T) and the missing-page rate m(t,T). Note that

(11)   m(t,T) = Σ_{k=1}^{min(n,T)} (1-S_k) Pr(w(t,T) = k)

4 - ANALYSIS OF THE BOUNDED LOCALITY INTERVALS

Denote by σ_t = (a_1, a_2, ..., a_n) the state of the program at time t, i.e. the LRU stack just after reference R_t. According to the definition stated above, the set Z_i(t) = {a_1, a_2, ..., a_i} is an activity set iff there is an integer T_i (0 ≤ T_i < t) with the following properties:

AS1 : D_{T_i} > i
AS2 : for any instant u, T_i < u ≤ t : 1 ≤ D_u ≤ i
AS3 : for any integer j, 1 ≤ j ≤ i, there is an instant u, T_i < u ≤ t, such that R_u = a_j.

Condition AS3 leads immediately to an upper bound for T_i : T_i ≤ t - i. It is noteworthy that there is at most one integer T_i which fulfils both AS1 and AS2.

4.1. PROBABILITY THAT Z_i(t) IS AN ACTIVITY SET

Denote by θ an integer less than t. By the additive property of probability measures, we obtain:

(12)   Pr(Z_i(t) is an activity set) = Σ_{θ=0}^{t-i} Pr(Z_i(t) is an activity set and T_i = θ)
The event "Z_i(t) is an activity set and T_i = θ" is equivalent to the following event: the distance string D_θ D_{θ+1} ··· D_t is of the form

c_{i+1,n} c_{1,i} (1)^{r_1} c_{2,i} (1,2)^{r_2} c_{3,i} (1,2,3)^{r_3} ··· c_{i,i} (1,2,...,i)^{r_i}

where
. c_{k,h} is a distance in the range {k, k+1, ..., h} (k ≤ h) ;
. the r_j's are nonnegative integers whose sum is t-θ-i ;
. (1,2,...,h)^r is a string of r distances less than or equal to h.

Because of the stochastic independence of stack distances, the probability of such a string is

Pr(D_θ ∈ {i+1, ..., n}) × Pr(D_{θ+1} ∈ {1, ..., i}) × ··· × Pr(D_{t-r_i+1}, ..., D_t belong to {1, 2, ..., i})

We obtain:

Pr(Z_i(t) is an activity set and T_i = θ) = (1-S_i) × S_i × Π_{1≤q≤i-1} (S_i - S_q) × Σ α_1^{r_1} (α_1+α_2)^{r_2} ··· (α_1+α_2+···+α_i)^{r_i}

the sum Σ being computed over the set of all i-tuples (r_j) of nonnegative integers whose sum is t-θ-i. Thus this probability equals

(13)   (1-S_i) × S_i × Π_{1≤q≤i-1} (S_i - S_q) × P_{t-θ-i}(α_1, α_2, ..., α_i)

Using equality (7), this may be rewritten as

(14)   Pr(Z_i(t) is an activity set and T_i = θ) = Σ_{h=1}^{i} C_{ih} S_h^{t-θ-1}

where

(15)   C_{ih} = (1-S_i) S_i Π_{1≤q≤i-1} (S_i - S_q) / Π_{1≤q≤i, q≠h} (S_h - S_q)
By comparing equalities (12) and (14) we obtain:

(16)   Pr(Z_i(t) is an a.s.) = Σ_{h=1}^{i} C_{ih} (S_h^{i-1} - S_h^t)/(1-S_h)   (if 1 ≤ i < n)
                             = 0                                                 (if i = n)

If we are only interested in the long-run behavior of the program, we can neglect the term S_h^t in the formula above:

Proposition 2: The long-run probability that the set of the i most recently used pages is an activity set is

(17)   Pr(Z_i is an a.s.) = Σ_{h=1}^{i} C_{ih} S_h^{i-1}/(1-S_h)   (1 ≤ i ≤ n-1)
                          = 0                                        (i = n)

Corollary 1: The mean number of simultaneously existing activity sets is

(18)   Σ_{i=1}^{n-1} Σ_{h=1}^{i} C_{ih} S_h^{i-1}/(1-S_h)

Proof: Let us denote by χ(.,.) the random variable which assumes the following values:

χ(i,t) = 1 if Z_i(t) is an activity set at time t, 0 otherwise.

The number of activity sets at time t is Σ_{i=1}^{n} χ(i,t). The mean value of this number is

Σ_{i=1}^{n} E[χ(i,t)] = Σ_{i=1}^{n} Pr(χ(i,t) = 1) = Σ_{i=1}^{n} Pr(Z_i is an a.s.)
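Proposition 2, with the coefficients C_ih of (15) as reconstructed above, can be evaluated and also checked by simulating the distance string directly: Z_i(t) is an activity set exactly when the references since the last distance greater than i have touched i distinct pages. Both routines below are our sketches; alpha is an illustrative distribution.

```python
# Long-run probability (17) that Z_i is an activity set, plus a Monte Carlo
# check on a simulated i.i.d. distance string.  alpha[j-1] = Pr(distance j);
# the S_j must be distinct (assumption LRU3).
import random

def activity_set_probability(alpha, i):
    n = len(alpha)
    if i >= n:
        return 0.0                 # AS1 needs a distance > i: impossible for i = n
    S = [0.0]
    for a in alpha:
        S.append(S[-1] + a)
    total = 0.0
    for h in range(1, i + 1):
        c = (1.0 - S[i]) * S[i]    # coefficient C_ih of (15)
        for q in range(1, i):
            c *= S[i] - S[q]
        for q in range(1, i + 1):
            if q != h:
                c /= S[h] - S[q]
        total += c * S[h] ** (i - 1) / (1.0 - S[h])   # formula (17)
    return total

def sample_distance(rng, alpha):
    u = rng.random()
    acc = 0.0
    for j, a in enumerate(alpha, start=1):
        acc += a
        if u < acc:
            return j
    return len(alpha)

def simulated_activity_fraction(alpha, i, steps=30000, seed=7):
    rng = random.Random(seed)
    seen = None                    # distinct pages touched since the last distance > i
    hits = 0
    for _ in range(steps):
        d = sample_distance(rng, alpha)
        if d > i:
            seen = 0               # candidate T_i: the activity set restarts here
        elif seen is not None and d > seen:
            seen += 1              # a reference to a page not yet touched
        if seen == i:              # exactly the top i pages have been touched
            hits += 1
    return hits / steps
```

For i = 1 formula (17) reduces to S_1, and for i = 2 (with three pages) to S_2(S_2 - S_1)/(1 - S_1), both of which can be derived by elementary arguments.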
The corollary follows immediately from this equality and Proposition 2.

4.2. BOUNDED LOCALITY INTERVALS

If Z_i(t) is an activity set at time t which terminates at time t+1, the lifetime of the associated BLI is defined by

τ_i = t - T_i + 1

Consequently the distribution of the lifetime is such that

(19)   Pr(τ_i = x) = Pr(Z_i(t) is an a.s. and T_i = t-x+1 and D_{t+1} > i | Z_i(t) is an a.s. which terminates at time t)
                   = Pr(Z_i(t) is an a.s. and T_i = t-x+1)(1-S_i) / [Pr(Z_i(t) is an a.s.)(1-S_i)]

Corollary 2: The long-run distribution of the lifetime of a BLI of size i is given by the equality

(20)   Pr(τ_i = x) = 0   if x ≤ i
       Pr(τ_i = x) = Σ_{h=1}^{i} C_{ih} S_h^{x-2} / Σ_{h=1}^{i} C_{ih} S_h^{i-1} (1-S_h)^{-1}   if x ≥ i+1

It follows that the mean lifetime of a BLI is

(21)   E(τ_i) = Σ_{h=1}^{i} C_{ih} S_h^{i-1} (i+1-iS_h)(1-S_h)^{-2} / Σ_{h=1}^{i} C_{ih} S_h^{i-1} (1-S_h)^{-1}

From equality (21) we can derive an expression for another useful quantity: the fraction p_(i) of the execution time which is covered by BLI's of i pages. Let t be a positive integer. With probability Pr(Z_i(t) is an a.s.) × (1-S_i) the set Z_i is an activity set terminating at t+1. Thus the fraction p_(i) is

(22)   p_(i) = Pr(Z_i(t) is an a.s.) × (1-S_i) × E(τ_i)
             = (1-S_i) Σ_{h=1}^{i} C_{ih} S_h^{i-1} (i+1-iS_h)(1-S_h)^{-2}
In sections 3 and 4 we analyzed working sets and activity sets independently. Now, in order to compare these two concepts, we compute the probability that a working set is an activity set.

5 - COMPARISON OF WORKING SETS AND ACTIVITY SETS

Let us first notice that if the size of the working set W(t,T) is i, then this working set is Z_i(t). In sections 3 and 4 we computed the probability that the working set is Z_i(t) and the probability that Z_i(t) is an activity set. This is not sufficient to compare these two events because they are not stochastically independent. It is possible to compute the joint distribution

(23)   Pr(w(t,T) = j and Z_i(t) is an activity set).

However the resulting expression is so complicated that it is not very useful for effective computations. In this section we are less ambitious and compute only the probability that the working set W(t,T) is an activity set. Basically we have to evaluate expression (23) when i equals j. Denote by E the event "Z_i(t) is an activity set and T_i = θ and w(t,T) = i" and by f(.,.,.) the function

(24)   f(i,θ,T) = Pr(E)

We shall compute the values assumed by f in the following domain:
. i is an integer and 1 ≤ i ≤ n-1
. T is an integer and 2i < T ≤ t
. θ is an integer and 0 ≤ θ ≤ t

Note that f(n,.,.) = 0. Moreover the restriction 2i < T (instead of i ≤ T) keeps us from studying a special case of little interest. We distinguish three cases according to the value of θ.

First case: 0 ≤ θ < t-T+1. The references considered for the evaluation of w(t,T) are those of the interval [t-T+1, t]. The event E can be depicted as the intersection of the three following events:
. D_θ > i
. 1 ≤ D_u ≤ i for all u, θ < u ≤ t
. the i pages at the top of the LRU stack just after reference R_θ are referenced during the interval [t-T+1, t]

Thus event E occurs iff the distance string D_θ D_{θ+1} D_{θ+2} ··· D_t is of the form

c_{i+1,n} (1,2,...,i)^{t-θ-T} c_{1,i} (1)^{r_1} c_{2,i} (1,2)^{r_2} ··· c_{i,i} (1,2,...,i)^{r_i}

where the r_j's are nonnegative integers whose sum is T-i. Therefore

f(i,θ,T) = (1-S_i) × S_i^{t-θ-T} × S_i × Π_{1≤q≤i-1} (S_i - S_q) × P_{T-i}(α_1, ..., α_i)

Using equalities (7) and (15), we obtain

(25)   f(i,θ,T) = S_i^{t-θ-T} Σ_{h=1}^{i} C_{ih} S_h^{T-1}    (T ≥ i)

Second case: t-T+1 ≤ θ ≤ t+i-T.
Event E is the intersection of the three following events:

(I) w(θ-1, θ+T-t-1) ≤ i-1 (i.e., at most i-1 pages are referenced during [t-T+1, θ-1])
(II) D_θ > i
(III) the i pages at the top of the LRU stack just after θ (and only these pages) are referenced during [θ+1, t]

(I) is not an actual constraint, since the inequality θ+T-t-1 ≤ i-1 is equivalent to the inequality θ ≤ t+i-T, which holds in this case. The intersection of events (II) and (III) is "Z_i(t) is an activity set and T_i = θ". Its probability is given by eq. (14):

(26)   f(i,θ,T) = Σ_{h=1}^{i} C_{ih} S_h^{t-θ-1}   if θ ≤ t-i ;   = 0   if θ > t-i

Thanks to the inequality 2i < T by which we have restricted the domain of f, t-i > t+i-T and, consequently, θ cannot be greater than t-i in this second case.

Third case: t+i-T < θ ≤ t. We can consider event E as the intersection of the same events as in the second case. However, the condition "w(θ-1, θ+T-t-1) ≤ i-1" may now not hold:

(27)   f(i,θ,T) = Pr(w(θ-1, θ+T-t-1) ≤ i-1) × Pr(Z_i(t) is an activity set and T_i = θ)

From Proposition 1, we obtain:
(28)   Pr(w(θ-1, θ+T-t-1) ≤ i-1) = Σ_{k=1}^{i-1} Σ_{j=1}^{k} B_{kj} S_j^{θ+T-t-2} = Σ_{q=1}^{i-1} ( Σ_{k=q}^{i-1} B_{kq} ) S_q^{θ+T-t-2}

Insert this expression into equality (27). Denote by D_{iqh} the product

(29)   D_{iqh} = ( Σ_{k=q}^{i-1} B_{kq} ) C_{ih}    (1 ≤ i, q, h ≤ n)

The expression of f(i,θ,T) in this third case is then

(30)   f(i,θ,T) = Σ_{q=1}^{i-1} Σ_{h=1}^{i} D_{iqh} S_q^{θ+T-t-2} S_h^{t-θ-1}   if θ ≤ t-i ;   = 0   if θ > t-i
Let us summarize the partial results that we obtained by considering the three cases:

f(i,θ,T) = S_i^{t-θ-T} Σ_{h=1}^{i} C_{ih} S_h^{T-1}                        for 0 ≤ θ ≤ t-T
f(i,θ,T) = Σ_{h=1}^{i} C_{ih} S_h^{t-θ-1}                                  for t-T+1 ≤ θ ≤ t+i-T
f(i,θ,T) = Σ_{1≤q≤i-1} Σ_{1≤h≤i} D_{iqh} S_q^{θ+T-t-2} S_h^{t-θ-1}         for t+i-T < θ ≤ t-i
f(i,θ,T) = 0                                                               for t-i < θ ≤ t

By a summation with respect to θ, we obtain

Pr(w(t,T) = i and Z_i(t) is an a.s.) = Σ_{θ=0}^{t} f(i,θ,T)
   = Σ_{h=1}^{i} C_{ih} S_h^{T-1} (1 - S_i^{t-T+1})/(1 - S_i)
   + Σ_{h=1}^{i} C_{ih} (S_h^{T-i-1} - S_h^{T-1})/(1 - S_h)
   + (T - 2i) Σ_{j=1}^{i-1} D_{ijj} S_j^{T-3}
   + Σ_{1≤h≤i} Σ_{1≤q≤i-1, q≠h} D_{iqh} (S_q^{i-1} S_h^{T-i-1} - S_q^{T-i-1} S_h^{i-1})/(S_h - S_q)
If we sum with respect to i and let t tend to infinity, we obtain the following proposition:

Proposition 3: The long-run probability that a working set with window size T is an activity set is

Pr(W(t,T) is an activity set) = Σ_{i=1}^{n-1} π_i    (T ≥ 2n)

where

π_i = Σ_{h=1}^{i} C_{ih} S_h^{T-1}/(1 - S_i) + Σ_{h=1}^{i} C_{ih} (S_h^{T-i-1} - S_h^{T-1})/(1 - S_h) + (T - 2i) Σ_{j=1}^{i-1} D_{ijj} S_j^{T-3} + Σ_{1≤h≤i} Σ_{1≤q≤i-1, q≠h} D_{iqh} (S_q^{i-1} S_h^{T-i-1} - S_q^{T-i-1} S_h^{i-1})/(S_h - S_q)
FIGURE 4 : Bounded Locality Intervals of program A. a) Probability that Z_i is an activity set; b) Fraction p(i) of the execution time covered by BLI's of size i; c) Relative difference D(s) = (F_WS(s) - F_LRU(s))/F_LRU(s)
FIGURE 5 : Bounded Locality Intervals of program B. a) Probability that Z_i is an activity set; b) Fraction p(i) of the execution time covered by BLI's of size i; c) Relative difference D(s) = (F_WS(s) - F_LRU(s))/F_LRU(s)
FIGURE 6 : Bounded Locality Intervals of program C. a) Probability that Z_i is an activity set; b) Fraction p(i) of the execution time covered by BLI's of size i; c) Relative difference D(s) = (F_WS(s) - F_LRU(s))/F_LRU(s)
On more than 50 numerical examples, we have observed a coincidence relevant to the relative efficiency of LRU memory management versus working set policies. Assume that s page frames are allocated to an n-page program (n > s). If page replacement is decided according to the LRU rule, the page-fault rate is

(33)   F_LRU(s) = Σ_{i=s+1}^{n} α_i

Alternatively, if the memory hierarchy is operating under a working set policy, i.e. if the set of pages present in main storage is the working set W(t,T) for some fixed T (this involves dynamic partitioning), the page-fault rate is the missing-page rate m(T). In order to compare both policies we have to choose the window size T so that w̄(T) = s. This may not be feasible in an exact way, since T is an integer. However, from equality (9) of Proposition 1, w̄(T) is an increasing and continuous mapping of the real interval [0, ∞[ onto the real interval [0, n[. Therefore it has an inverse w̄^{-1}. We define as follows the page-fault rate of a program to which s page frames are allocated on the average, under a working set rule:

(34)   F_WS(s) = m(w̄^{-1}(s))

The relative difference D(s) = (F_WS(s) - F_LRU(s))/F_LRU(s) is plotted in figures 4c and 5c for programs A and B respectively. The coincidence is that the maxima of D(s) and those of p(i) are attained for the same values of the argument. This is particularly striking for the hypothetical program C (figure 6), which is defined by n = 20 and

α_i = k × 1000   if 1 ≤ i ≤ 5
α_i = k × 100    if 6 ≤ i ≤ 10
α_i = k × 10     if 11 ≤ i ≤ 15
α_i = k          if 16 ≤ i ≤ 20

(k is a normalizing constant). The difference in efficiency between the LRU algorithm and working set policies is rather small, except when the size of memory allocated is close to a very frequently occurring size of localities (in the BLI sense).

The probability that the working set coincides with an activity set depends on the window size T, as shown in figures 7 and 8; programs differ considerably from one another in this probability. The probability of coincidence is very high for program B when T belongs to the interval [3 000, 5 000], whereas it is rather small for program A when T is in the considered domain. Since w̄(T) tends to n when T tends to infinity and Z_n is not an activity set, Pr(W(t,T) is an activity set) is negligible for sufficiently large values of the window size.
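The comparison (33)-(34) can be sketched numerically for a small hypothetical program (the distance distribution below is our illustrative assumption, not one of the measured programs A, B or C): F_LRU(s) = 1 - S_s, while F_WS(s) is the missing-page rate (11) at the integer window T whose mean working-set size is closest to s.

```python
# LRU versus working-set page-fault rates, per (33) and (34), for a small
# hypothetical program.  ws_distribution implements formula (9)/(10);
# the window inversion picks the integer T with mean size closest to s.

def _S(alpha):
    s = [0.0]
    for a in alpha:
        s.append(s[-1] + a)
    return s

def ws_distribution(alpha, T):
    S = _S(alpha)
    n = len(alpha)
    out = {}
    for k in range(1, min(n, T) + 1):
        pre = 1.0
        for j in range(1, k):
            pre *= 1.0 - S[j]
        p = 0.0
        for h in range(1, k + 1):
            den = 1.0
            for q in range(1, k + 1):
                if q != h:
                    den *= S[h] - S[q]
            p += pre / den * S[h] ** (T - 1)
        out[k] = p
    return out

def fault_rates(alpha, s, tmax=500):
    S = _S(alpha)
    f_lru = 1.0 - S[s]                                   # (33): sum of alpha_i, i > s
    best_t, best_gap = 1, float("inf")
    for T in range(1, tmax):                             # crude inversion of wbar(T)
        wbar = sum(k * p for k, p in ws_distribution(alpha, T).items())
        if abs(wbar - s) < best_gap:
            best_gap, best_t = abs(wbar - s), T
    dist = ws_distribution(alpha, best_t)
    f_ws = sum((1.0 - S[k]) * p for k, p in dist.items())  # missing-page rate (11)
    return f_lru, f_ws
```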
Conclusions: The locality property of programs has a dramatic impact on the performance of virtual memory systems. Some page replacement algorithms, e.g. the LRU algorithm, have been devised to take advantage of this fact. An attempt to quantify the intuitive notion of locality leads to the concept of a working set; working sets induce policies for storage hierarchy management which are easy to implement. Activity sets and bounded locality intervals are another approach for defining both locality and locality lifetime. They permit one to obtain a deeper insight into the scattering of references, especially as concerns the detection of locality changes that take place (sometimes slowly, sometimes very fast). Although an efficient tool for the study of program behavior, the concept of bounded locality interval is ill-suited to the design of rules that would manage a memory hierarchy while dynamically estimating the characteristics of the current program. The two approaches to the notion of locality may not be considered as equivalent since, very often, the working set W(t,T) happens to be different from all activity sets existing at time t.

FIGURE 7 : Probability that the working set W(t,T) is an activity set at time t (program A)

FIGURE 8 : Probability that the working set W(t,T) is an activity set at time t (program B)
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed.
© North-Holland Publishing Company (1976)

DETERMINISTIC JOB SCHEDULING IN COMPUTING SYSTEMS+

C. L. Liu
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801 USA
I. Introduction

We survey in this paper some of the recent activities in the area of deterministic job scheduling in computing systems. The problem of job scheduling is clearly an important one in the design of computer operating systems. Furthermore, the abstraction and idealization of task sets and computing systems, and the design and implementation of scheduling strategies for effective utilization of computing resources, are important problems in connection with the study of modelling and performance evaluation of computing systems. It should also be noted that many of the results on job scheduling in computing systems have immediate interpretations in several problem areas in operations research and industrial engineering.

The amount of literature on the subject produced in the past several years is indeed voluminous. As general references, we refer the reader to Baer [1], Baker [2], Coffman [6], Coffman and Denning [7], and Conway, Maxwell, and Miller [10]. Instead of summarizing and listing the many results, we shall attempt to identify and to illustrate some of the general features of these results. It is hoped that such an identification will be helpful in leading to an appreciation of the overall picture and an understanding of some of the design principles and methodologies.

II. The Models
We describe first a general model of a computing system which can be specialized in various ways to include most of the results in the literature on job scheduling. We make the following assumptions:

(1) A computing system consists of two classes of resources, dedicated resources and shared resources. In each class, there are different kinds of resources.

(2) There is a certain number of units of dedicated resources of each kind. The execution of a job requires an integral number of units of each kind, including zero units as a possibility. The execution of a job completely occupies a unit, and no other jobs can be executed on the same unit concurrently. Examples of dedicated resources are processors, input-output devices, and so on.
+ Supported in part by the National Science Foundation under grant NSF MCS73-O3I1O8-AOI, and in part by the Gesellschaft für Mathematik und Datenverarbeitung mbH, Bonn, where the author was a visiting researcher in the summer of 1976.
C. L. LIU
(3) There is a unit of shared resources of each kind.† The execution of a job requires a fraction of the unit of each kind of shared resources, including a zero fraction as a possibility. Concurrent execution of a number of jobs might share the same unit of shared resources, provided that the sum of the fractions of the unit they share does not exceed one. Examples of shared resources are core memories, magnetic disks and drums, and so on.

(4) The units of each kind of dedicated resources might not be identical. It might be the case that the execution times will be different when a job is executed on different units of one kind of dedicated resources. It might be the case that a job can only be executed on some of the units of a particular kind of dedicated resource. Since the execution of a job might require, in general, more than one unit of dedicated resources of each kind, the execution of a job is said to be completed if its execution on all units is completed.

(5) The unit of each kind of shared resources is considered to be uniform. A job will release the portions of shared resources it occupies when its execution on all units of dedicated resources is completed.

Let 𝒥 = {J_1, J_2, ..., J_k, ...} be a set of jobs and < be a precedence relation* on 𝒥. That J_k < J_l means the execution of job J_l cannot begin until the execution of job J_k has been completed. J_k is called a predecessor of J_l, and J_l is called a successor of J_k. A set of jobs is said to be independent if the precedence relation < is empty.

Each job in 𝒥 is specified in the following way: Let p denote the number of kinds of dedicated resources and n denote the number of units of each kind in the computing system on which the set of jobs 𝒥 is to be executed.** Let q denote the number of kinds of shared resources. The utilization of the dedicated resources by a job J_k is specified by a p × n matrix T_k = ||t^k_{ij}|| and a p-component vector U_k = ||u^k_i||, where 0 ≤ t^k_{ij} ≤ ∞ and for each i there exists at least one j such that t^k_{ij} < ∞, and u^k_i is an integer such that 0 ≤ u^k_i ≤ n.*** The value of t^k_{ij} is the time it takes to execute job J_k on the j-th unit of dedicated resources of the i-th kind. That t^k_{ij} = ∞ means that job J_k cannot be executed on the j-th unit of dedicated resources of the i-th kind. The value of u^k_i is the number of units of dedicated resources of the i-th kind which the execution of job J_k requires. Similarly, the utilization of the shared resources by J_k is specified by a q-component vector V_k = ||v^k_i||, where 0 ≤ v^k_i ≤ 1. The value of v^k_i is the fraction of the i-th kind of shared resources which the execution of job J_k requires.

By scheduling a set of jobs on a computing system, we mean to assign, within certain time interval(s), to each job the resources that are needed for its execution, with the constraint that all the resources needed for the execution of a job are assigned to the job simultaneously. A schedule is a specification of the assignment of resources to the jobs, and a scheduling algorithm is a procedure that

† There is no loss in generality in normalizing each kind of shared resources to one unit.
* A precedence relation is a binary relation that is antisymmetric and transitive.
** As will be seen, there is no loss of generality in assuming that there is the same number of units in each kind of dedicated resources.
*** For each i, u^k_i is not larger than the number of finite entries in the i-th row of T_k.
produces a schedule for every given set of jobs. By preemptive scheduling discipline, we mean to allow the interruption of the execution of jobs in a schedule. By non-preemptive scheduling discipline, we mean the execution of a job must continue until completion, once its execution commences. Most of the early work on deterministic job scheduling dealt with the case in which there is only one kind of dedicated resources and no shared resources. Furthermore, when we assume that the execution time of a job is the same on all units of the dedicated resources and that the execution of each job requires exactly one unit of the dedicated resources, the problem is exactly that of job scheduling on a computing system with identical processors. In this case, we shall refer to t , which is always equal to t , t ,,...,t , as the execution time of the job. Hu [25] studied the scheduling problem for jobs with unit execution times and with the precedence relation over them being a forest. For some extension and generalization of Hu's result see Chen and Liu [k], and Schindler [ 4 6 ] . Fujii, Kasami, and Ninomiya [13], and Coffman and Graham [9] studied the problem of scheduling jobs with unit execution times on a twoprocessor computing system. Chen and Liu [5] and Lam and Sethi [39] investigated extensions to multi-processor computing systems. Gonzales, Ibarra, and Sahni [22], Horowitz and Sahni [23], Ibarra and Kim [26], Kafura [30], Liu and Liu [¿2], Liu and Yang [43] extended the case mentioned above to that in which the processors are not identical, meaning that the values of tn.,t,-, ...,t, in the matrix T, for a job J, might not all be the 11' 12' 'In k k same. A special case is one in which the processors are assumed to be of differ ent speeds. In particular, let a ,a , ...,a , and 1 be the speed of the first, second, ..., n-1
, and the n
processor, respectively.
(There is no loss in
generality in assuming the speed of the n processor to be 1.) It follows that we have t n n = tn /a,, tnri = t, /a„, ...,fc.. = t, /a., ..., t, , = t, /a , for 11 ln' 1' 12 In' 2' ' li ln' 1' ' 1, n-1 In n-1 all jobs. Kafura and Shen [31,32] studied the case in which a job can be executed only on a subset of the processors,which is assumed to be identical. In other words, in the matrix T, = Rt,, t n _ tn II for each job J,, some of the entries are k U 12 In k' infinite while all finite entries are equal. A physical interpretation of the case is the situation in which each processor has its own private memory, and the scheduling of a job on a processor must satisfy the job's memory requirement. Garey and Graham [lU] studied the case in which there is one kind of dedicated resource and arbitrarily many kinds of shared resources. Yao [h-91 studied a special case in which all jobs have unit execution times. Krause, Shen, and Schwetman [36] studied the case in which there is only one kind of shared resources with an additional condition that only a limited number of jobs can share the resources simultaneously.
III. Scheduling to Minimize Completion Time
Different criteria can be used to measure how good a schedule is. The most common one is the completion time of a schedule, that is, the total time it takes to complete the execution of a set of jobs according to the schedule. Clearly, for a given set of jobs, a "good" schedule is one with "short" completion time, and an optimal schedule is one with the shortest possible completion time. The effectiveness of a scheduling algorithm is measured by how good the schedules it produces are. One might wish to consider the worst-case performance of a scheduling algorithm, or one might wish to consider the average-case performance of a scheduling algorithm. Most of the current work is concerned with the
244
C. L. LIU
worst-case performance analysis of scheduling algorithms. As was mentioned above, we shall make an attempt to identify some of the general features of scheduling algorithms whose effectiveness will be measured by the completion time of the schedules they produce.

1. There are algorithms that produce optimal schedules. Clearly, optimal schedules and algorithms that produce optimal schedules are of significant interest. Unfortunately, very little is known about "efficient" algorithms that produce optimal schedules for arbitrary computing systems and arbitrary sets of jobs. As a matter of fact, efficient algorithms that produce optimal schedules are known only for the following cases:

(i) Jobs having unit execution times with the precedence relation over them being a forest are to be scheduled on a computing system with identical processors.

(ii) Jobs having unit execution times are to be scheduled on a computing system with two identical processors.
We shall describe an algorithm due to Hu [25] which produces an optimal schedule for case (i). We introduce first the notion of demand scheduling algorithms. A demand scheduling algorithm is one that always attempts to schedule executable jobs on resources that are free at any time instant. In other words, a demand scheduling algorithm never leaves any resources idle intentionally. A particularly simple class of demand scheduling algorithms is known as list scheduling algorithms. A list scheduling algorithm assigns distinct priorities to jobs and allocates resources to jobs with highest priorities among all executable ones at any time instant. Hu's algorithm is a list scheduling algorithm. We define first the notion of the level of a job:

(i) The level of a job that has no successor is defined to be 1.

(ii) The level of a job that has one or more successors is equal to one plus the maximum value of the levels of its successors.
In Hu's algorithm priorities are assigned to jobs according to their levels such that jobs of higher levels will have higher priorities. (Assignment of priorities to jobs of the same level is arbitrary.) Hsu [24] contains a simple proof that Hu's algorithm produces optimal schedules for case (i). See also Chen and Liu [4]. Fujii, Kasami, and Ninomiya [13] and Coffman and Graham [9] discovered algorithms that produce optimal schedules for case (ii). We present here Coffman and Graham's algorithm, which is also a list scheduling algorithm. In Coffman and Graham's algorithm, priorities are assigned to jobs as follows:

(i) Starting with 1, which is the lowest priority, distinct and consecutive priorities are assigned arbitrarily to jobs that have no successors.
A job is said to be executable at a time instant if executions of its predecessors have all been completed at that time instant. It is not difficult to construct examples to show that there are optimal schedules in which resources are left idle intentionally.
DETERMINISTIC JOB SCHEDULING

245

(ii) Priorities are assigned to jobs with one or more successors recursively:

(a) A job all of whose successors have been assigned priorities will be labelled with the list of the priorities of its successors in decreasing order.

(b) Compare the labels of all labelled jobs according to the lexicographical order. Starting with the lowest unassigned priority, distinct and consecutive priorities are assigned to the labelled jobs such that jobs with larger labels will be assigned higher priorities.
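Hu's level computation and the resulting list schedule can be sketched as follows (an illustrative sketch, not code from the paper; the forest and job names are invented, and Coffman and Graham's lexicographic labelling could be grafted onto the same scheduling loop):

```python
from collections import defaultdict

def levels(successors, jobs):
    """Level 1 for a job with no successor; otherwise one plus the
    maximum level of its successors."""
    memo = {}
    def level(j):
        if j not in memo:
            succ = successors.get(j, [])
            memo[j] = 1 if not succ else 1 + max(level(s) for s in succ)
        return memo[j]
    return {j: level(j) for j in jobs}

def hu_schedule(successors, jobs, n):
    """List-schedule unit-time jobs on n identical processors, giving
    higher priority to higher levels (ties broken arbitrarily)."""
    lev = levels(successors, jobs)
    preds = defaultdict(set)
    for j, succ in successors.items():
        for s in succ:
            preds[s].add(j)
    done, schedule = set(), []
    while len(done) < len(jobs):
        # a job is executable once all its predecessors are done
        ready = [j for j in jobs if j not in done and preds[j] <= done]
        step = sorted(ready, key=lambda j: -lev[j])[:n]
        schedule.append(step)
        done.update(step)
    return schedule

# Invented forest: a precedes b and c, b precedes d; two processors.
print(hu_schedule({'a': ['b', 'c'], 'b': ['d']}, ['a', 'b', 'c', 'd'], 2))
# [['a'], ['b', 'c'], ['d']] -- an optimal 3-step schedule
```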
For quite a number of years our inability to discover efficient algorithms that produce optimal schedules has been a rather frustrating experience. However, such frustration is at least partially pacified by some recent results in Complexity Theory. There exists a class of problems known as NP-complete problems, many of which are known to be "difficult" in that no efficient algorithm for solving them has been found although a great deal of effort has gone into attacking them. (See Cook [11] and Karp [33,34].) Furthermore, it has been shown that if one could discover an efficient algorithm for solving one of these problems one would obtain immediately efficient algorithms for all NP-complete problems. Some examples of NP-complete problems are the travelling salesman problem, to determine whether a graph has a clique of given size, to determine whether a planar graph is 3-colorable, and the discrete multicommodity flow problem. It has been shown that the general scheduling problem is an NP-complete problem. Furthermore, many simplified versions of the general scheduling problem are also NP-complete problems. For example, we have

Theorem 1: The following problems are NP-complete problems:

(i) To determine whether a set of independent jobs can be scheduled on a computing system with two identical processors so that the completion time is less than or equal to a given ω.

(ii) To determine whether a set of jobs with unit execution times can be scheduled on a computing system with n identical processors, for any n, so that the completion time is less than or equal to a given ω.

(iii) To determine whether a set of independent jobs with unit execution times can be scheduled on a computing system with three identical processors and one kind of shared resources so that the completion time is less than or equal to a given ω.
The complexity of scheduling problems was first studied by Ullman [48]. See also Brucker, Lenstra, and Rinnooy Kan [3], and Garey and Johnson [16]. We should point out that there is a large body of literature on obtaining optimal schedules by the methods of complete enumeration, mixed integer and nonlinear programming, and dynamic programming. Note that in these approaches, the
Clearly, we must be more precise about what we mean by an "efficient" algorithm. Without going into all the technical details, we simply stipulate that an efficient algorithm is one that produces a schedule for a set of m jobs in time proportional to a polynomial function of m.
computation time required to produce an optimal schedule will be an exponential function of the number of jobs to be scheduled. We refer the reader to Lenstra [40] and Rinnooy Kan [44]. See also Horowitz and Sahni [23] and Sahni [45].

2. There are simple scheduling algorithms that spend very little effort to search for a schedule. Almost directly opposite to the approach of spending a lot of effort to determine an optimal schedule, one could consider the approach of spending little or no effort to search for a reasonably good schedule. In view of the discovery of the class of NP-complete problems, such an approach becomes a particularly attractive one. (As general references to the area of approximation algorithms, see Johnson [27] and Garey and Johnson [17].) For example, a very simple scheduling algorithm is a list scheduling algorithm with arbitrary assignment of priorities. The following result is due to Graham [19,20,21]:

Theorem 2: For a computing system with n identical processors, let ω denote the completion time of a schedule for a given set of jobs produced by an arbitrary list scheduling algorithm and let ω_0 denote the shortest possible completion time. Then

    ω/ω_0 ≤ 2 - 1/n

For n = 2, the ratio ω/ω_0 in Theorem 2 is upper-bounded by the constant 3/2.
That is, in terms of the completion time a schedule produced by any list scheduling algorithm is never worse than an optimal schedule by more than 50%. When the number of processors in the system increases, although the comparison becomes less favorable, the suboptimal schedule is never worse than an optimal schedule by more than 100%. Theorem 2 can be extended immediately:

Theorem 3 (Liu and Liu [42]): For a computing system with n_1 processors of speed b_1, n_2 processors of speed b_2, ..., n_k processors of speed b_k, where b_1 > b_2 > ... > b_k ≥ 1, we have

    ω/ω_0 ≤ b_1/b_k + 1 - b_1 / (Σ_{i=1}^{k} n_i b_i)
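The bound in Theorem 2 can be checked exhaustively on a tiny invented instance (an illustrative sketch, not from the paper; for independent jobs, every schedule without intentional idling corresponds to some priority list, so the minimum over lists is the optimum):

```python
from itertools import permutations

def list_schedule(times, priority, n):
    """Completion time when independent jobs are assigned, in priority
    order, to the processor that becomes free earliest."""
    free = [0.0] * n
    for j in priority:
        i = free.index(min(free))   # earliest-available processor
        free[i] += times[j]
    return max(free)

# Invented instance: five independent jobs on two identical processors.
times, n = [3, 3, 2, 2, 2], 2
jobs = range(len(times))
omega_0 = min(list_schedule(times, p, n) for p in permutations(jobs))
worst = max(list_schedule(times, p, n) for p in permutations(jobs))
assert omega_0 == 6
assert worst / omega_0 <= 2 - 1 / n   # the bound of Theorem 2
```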
Garey and Graham [14], and Yao [49] studied list scheduling algorithms for computing systems with shared resources. For example, similar to Theorems 2 and 3, we have

Theorem 4 (Garey and Graham [14]): For a computing system with n identical processors and one kind of shared resources, we have
From now on, we shall consistently use ω to denote the completion time of an arbitrary schedule and ω_0 to denote the completion time of an optimal schedule.
Theorem 5 (Garey and Graham [14]): For a computing system with two or more processors and q kinds of shared resources and for a set of independent jobs, we have

    ω/ω_0 ≤ min( (n+1)/2 , q + 2 - (2q+1)/n )
3. There are cases in which an algorithm that produces optimal schedules under a certain set of conditions is applied to situations that do not satisfy these conditions. The following results illustrate this point:

Theorem 6 (Chen and Liu [5]): When Hu's algorithm is applied to schedule a set of jobs with unit execution times on a computing system with n identical processors, then

    ω/ω_0 ≤ 4/3                for n = 2
    ω/ω_0 ≤ 2 - 1/(n-1)        for n ≥ 3
Theorem 7 (Lam and Sethi [39]): When Coffman and Graham's algorithm is applied to schedule a set of jobs with unit execution times on a computing system with n identical processors, then

    ω/ω_0 ≤ 2 - 2/n
Kaufman [35] extended Hu's algorithm to the scheduling of jobs with unequal execution times on a computing system with n identical processors, where the precedence relation over the jobs is a forest. By defining the level of a job to be the length of the chain between the job and the root of the tree it is in (including the execution time of the job itself), Kaufman has shown that

Theorem 8: In the extended Hu's algorithm described above

    ω ≤ ω_P + k - k/n

where ω_P is the completion time when the jobs are executed according to an optimal preemptive schedule, and k is the execution time of the longest job in the set.

4. There are algorithms that perform a certain amount of computation in order to produce good schedules. For example, consider the problem of scheduling a set of independent jobs on a computing system with n identical processors. If we sort the jobs according to their execution times and assign high priorities to jobs with long execution times, we can upper-bound the worst-case behavior of such a list scheduling algorithm by:

Theorem 9 (Graham [20]): For the scheduling algorithm described above

    ω/ω_0 ≤ 4/3 - 1/(3n)
Extension of the idea of assigning high priorities to jobs with long execution times to computing systems with non-identical processors has been carried out in Gonzales, Ibarra, and Sahni [22], and Ibarra and Kim [26]. As another example, we consider the following algorithm for scheduling a set of independent jobs on a computing system with n identical processors: We pick out the k longest jobs in the set and schedule them in such a way that the total execution time (for the execution of these k jobs) is minimum. The remaining jobs will be scheduled according to the rule that whenever a processor is free an arbitrarily chosen job will be executed on that processor. Graham [20] has shown that

Theorem 10: For the scheduling algorithm described above

    ω/ω_0 ≤ 1 + (1 - 1/n) / (1 + ⌊k/n⌋)
5. One can consider algorithms that produce schedules which are as close to optimal schedules as is desired at the expense of computation time. An algorithm is said to be an ε-approximation algorithm if for a given ε, the algorithm will yield a schedule such that the ratio (ω - ω_0)/ω_0 is less than ε. Sahni [45] studied the problem of scheduling a set of independent jobs on a computing system with n identical processors and obtained an ε-approximation algorithm whose complexity is O(m(m²/ε)^{n-1}), where m is the number of jobs in the set.

IV. Scheduling to Meet Deadlines
To illustrate some other aspects of the scheduling problem, we shall survey some of the results on a generalization of our model by assuming that each job has a ready time, a time at or after which execution of the job can begin, and a deadline, a time at or prior to which execution of the job must be completed. We consider two cases:

(i) For a given computing system with fixed amounts of resources, a set of jobs is said to be schedulable if there is a schedule according to which all jobs can be executed to meet their deadlines. Such a schedule will be referred to as a feasible schedule for the set of jobs. A set of jobs is said to be schedulable by a scheduling algorithm if the algorithm yields a feasible schedule for the set. Consequently, a scheduling algorithm is said to be optimal if it yields a feasible schedule for every schedulable set of jobs. On the other hand, if a scheduling algorithm is not optimal, one would wish to measure the effectiveness of the algorithm in terms of the fraction of schedulable sets of jobs it is capable of scheduling.

(ii) For a computing system with variable amounts of resources, one would want to determine algorithms that produce a feasible schedule for a given set of jobs utilizing a minimum amount, or close to a minimum amount, of resources. In this case, a scheduling algorithm is said to be optimal if it yields a feasible schedule for a given set of jobs utilizing a minimum amount of resources. Again, if a scheduling algorithm is not optimal, one would wish to measure its effectiveness in terms of the amount of resources it utilizes.
An example of case (ii) that has been studied rather extensively is one in which a set of independent jobs is to be scheduled on a computing system consisting of identical processors, with all jobs having the same ready time and deadline. Such a problem can clearly be rephrased as a bin packing problem in which we have an infinite supply of bins of a fixed size and a set of packages to be packed into bins so that the sum of the sizes of the packages in a bin does not exceed the size of the bin. For the many results on bin packing, see Johnson et al. [29], and Johnson [28]. See also Coffman, Garey, and Johnson [8], and Garey, Graham, Johnson, and Yao [15]. As it turns out, the problem of determining a feasible schedule for a given set of jobs that utilizes a minimum number of processors is another NP-complete problem. Consequently, attention has been directed toward simple scheduling algorithms that do not use excessive amounts of resources. As illustrations, we present the following results: An algorithm known as the next-fit algorithm assigns jobs to processors one at a time. (Since the jobs are independent, the jobs assigned to one processor will be executed sequentially.) Let us number the processors as P_1, P_2, .... Starting with processor P_1, a job will be assigned to processor P_i if its execution time plus the execution times of the jobs that have already been assigned to P_i does not exceed the deadline. Otherwise, it will be assigned to processor P_{i+1}, and the assignment step will then be repeated for the next job on processor P_{i+1}. Let N denote the number of processors needed by the next-fit algorithm and N_0 denote that needed by an optimal algorithm. It is not difficult to show that

Theorem 11: For the next-fit algorithm

    lim_{N_0→∞} N/N_0 ≤ 2
An improvement of the next-fit algorithm is the first-fit algorithm. It also assigns jobs to processors one at a time. The only difference is that a job will be assigned to P_1 if its execution time plus the execution times of the jobs that have already been assigned to P_1 does not exceed the deadline. Otherwise, attempts will be made to assign the job to P_2, and then P_3, and so on. We have

Theorem 12 (Johnson et al. [29]): For the first-fit algorithm

    lim_{N_0→∞} N/N_0 ≤ 17/10
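The two packing rules just described can be sketched as follows (an illustrative sketch; the execution times and the common deadline are invented):

```python
def next_fit(times, deadline):
    """Processors used by next-fit: keep filling the current processor and
    open a new one as soon as a job does not fit before the deadline."""
    procs, load = 1, 0
    for t in times:
        if load + t <= deadline:
            load += t
        else:
            procs, load = procs + 1, t   # never look back at earlier processors
    return procs

def first_fit(times, deadline):
    """Processors used by first-fit: place each job on the lowest-numbered
    processor whose total load still fits before the deadline."""
    loads = []
    for t in times:
        for i, load in enumerate(loads):
            if load + t <= deadline:
                loads[i] += t
                break
        else:
            loads.append(t)              # no existing processor fits: open one
    return len(loads)

# Invented instance: execution times with a common deadline of 10.
times = [5, 7, 5, 3, 4, 6]
print(next_fit(times, 10), first_fit(times, 10))  # 4 3
```

On this instance first-fit reuses the slack that next-fit abandons, matching the flavour of Theorems 11 and 12.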
As an illustration of case (i), we consider first the problem of scheduling a set of independent jobs with individual ready times and deadlines on a computing system with a single processor where preemption is allowed. Because preemption is allowed, we can restrict ourselves to the consideration of demand scheduling algorithms only. The notion of list scheduling algorithms defined above can also be carried over. Here, we shall allow a job of higher priority to preempt the execution of a job of lower priority, whenever the former is ready. A list scheduling algorithm, known as the relative urgency algorithm or deadline driven algorithm, assigns priorities to jobs according to their deadlines, with higher priorities assigned to jobs having earlier deadlines. It has been shown that
Theorem 13 (Labetoulle [37]): The relative urgency algorithm is optimal.
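A unit-step simulation of the relative urgency (deadline driven) rule can be sketched as follows (an illustrative sketch with invented integer-time jobs; a real implementation would advance between events rather than unit steps):

```python
def edf_feasible(jobs, horizon):
    """jobs: list of (ready, deadline, exec_time), one processor.
    At each unit step, run the ready unfinished job with the earliest
    deadline, preempting any lower-priority job."""
    remaining = [c for (r, d, c) in jobs]
    for t in range(horizon):
        ready = [(jobs[i][1], i) for i in range(len(jobs))
                 if jobs[i][0] <= t and remaining[i] > 0]
        if ready:
            d, i = min(ready)          # earliest deadline first
            if t + 1 > d:              # still running past its deadline
                return False
            remaining[i] -= 1
    return all(c == 0 for c in remaining)

# Invented instances: (ready, deadline, execution time).
assert edf_feasible([(0, 4, 2), (1, 3, 1), (0, 6, 2)], 6)
assert not edf_feasible([(0, 2, 2), (0, 2, 2)], 6)   # total demand exceeds capacity
```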
See also Labetoulle [38] for further results. A special case of the problem of scheduling independent jobs with individual ready times and deadlines is that of scheduling periodic job streams. The problem was studied in Serlin [47] and Liu and Layland [41]. We define a periodic job stream to be an infinite sequence of jobs with periodic ready times, where the deadline of a job is the ready time of the succeeding job in the job stream. Moreover, all jobs in the job stream have the same execution time. For a periodic job stream J_i, we shall use T_i to denote the period of the jobs in the stream and C_i to denote the execution time of each job. The ratio u_i = C_i/T_i is referred to as the utilization factor of the job stream.
Clearly, the relative urgency algorithm can be applied to the scheduling of periodic job streams. However, a class of algorithms that are easier to implement assign priorities to the job streams instead of to the jobs. In other words, if a job stream has higher priority than another job stream, then any job in the former stream will have higher priority than any job in the latter stream. Among all algorithms that assign priorities to job streams, we study one that assigns priorities to job streams according to their periods, with higher priorities assigned to job streams with shorter periods. We shall refer to such a scheduling algorithm as the rate monotonic scheduling algorithm. The following theorems are due to Liu and Layland [41]:

Theorem 14: Among all scheduling algorithms that assign priorities to periodic job streams, the rate monotonic scheduling algorithm is a best one in the sense that if a set of periodic job streams is schedulable by any assignment of priorities to the job streams, it is schedulable by the rate monotonic scheduling algorithm.

Theorem 15: A set of m periodic job streams is schedulable by the rate monotonic scheduling algorithm if the sum of their utilization factors is less than or equal to m(2^{1/m} - 1).
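Theorem 15's sufficient condition is simple to apply in practice (a sketch; the streams below are invented):

```python
def rm_bound_schedulable(streams):
    """Sufficient (not necessary) test of Theorem 15: m periodic job
    streams, given as (C_i, T_i) pairs, are schedulable by the rate
    monotonic algorithm if sum(C_i/T_i) <= m * (2**(1/m) - 1)."""
    m = len(streams)
    return sum(c / t for c, t in streams) <= m * (2 ** (1 / m) - 1)

# Invented streams: utilizations 0.25 + 0.5 = 0.75 <= 2*(sqrt(2)-1) ~ 0.828.
assert rm_bound_schedulable([(1, 4), (2, 4)])
# Total utilization 1.0 exceeds the bound, so this test is inconclusive.
assert not rm_bound_schedulable([(1, 2), (1, 2)])
```

Note that failing the bound does not prove unschedulability; the condition is only sufficient.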
Note that Theorem 15 shows a way to measure the effectiveness of the rate monotonic scheduling algorithm, which is not an optimal algorithm. Dhall and Liu [12] studied the problem of scheduling periodic job streams on a computing system with n identical processors. Some of their results illustrate the possibility of combining the techniques discussed above. We define the rate-monotonic next-fit scheduling algorithm and the rate-monotonic first-fit scheduling algorithm for periodic job streams: Periodic job streams are assigned to processors using either the next-fit algorithm or the first-fit algorithm similar to that defined above for independent jobs. In the present case, a set of job streams is said to "fit" into one processor if they are schedulable by the rate monotonic scheduling algorithm. We have:

Theorem 16: When a set of periodic job streams is scheduled by the rate-monotonic next-fit algorithm, we have

    2.4 ≤ lim_{N_0→∞} N/N_0 ≤ 2.67

where N is the number of processors used by the rate-monotonic next-fit algorithm and N_0 is the minimum number of processors needed.
Theorem 17: When a set of periodic job streams is scheduled by the rate-monotonic first-fit algorithm, we have

    lim_{N_0→∞} N/N_0 ≤ 2

where N is the number of processors used by the rate-monotonic first-fit algorithm and N_0 is the minimum number of processors needed. Labetoulle [38] has also studied the problem of scheduling independent jobs for multiprocessor computing systems. Garey and Johnson [18] studied the case of scheduling jobs with unit execution times over which there is a precedence relation on a computing system with two identical processors.

V. Concluding Remarks
As was mentioned in the introduction, it is not possible to carry out an exhaustive survey of the many aspects of the problem area of job scheduling in the amount of space we have for this paper. Some of the important topics we did not mention or have barely touched upon include scheduling to minimize mean flow time or weighted mean flow time, scheduling with penalty or cost functions, and preemptive scheduling. We refer the interested reader to the general references cited in Section I.

References

[1]
Baer, J. L., "A survey of some theoretical aspects of multiprocessing," Computing Surveys, 5, 1 (1973), 31-80.
[2]
Baker, K., Introduction to Sequencing and Scheduling, John Wiley & Sons, 1974.
[3]
Brucker, P., J. K. Lenstra and A. H. G. Rinnooy Kan, "Complexity of machine scheduling problems," to appear in Operations Research.
[4]
Chen, N. F. and C. L. Liu, "On a class of scheduling algorithms for multiprocessor computing systems," Proceedings of the 1974 Sagamore Computer Conference on Parallel Processing (1974), 1-16.
[5]
Chen, N. F. and C. L. Liu, "Bounds on the critical path scheduling algorithm for multiprocessor computing systems," to appear.
[6]
Coffman, E. G., Jr., (ed.), Computer and Job-Shop Scheduling Theory, John Wiley & Sons, 1976.
[7]
Coffman, E. G., Jr. and P. J. Denning, Operating Systems Theory, Prentice-Hall, 1973.
[8]
Coffman, E. G., Jr., M. R. Garey and D. S. Johnson, "An application of bin packing to multiprocessor scheduling," to appear.
[9]
Coffman, E. G., Jr. and R. L. Graham, "Optimal scheduling for two processor systems," Acta Informatica, 1, 3 (1972), 200-213.
[10]
Conway, R. W., W. L. Maxwell and L. W. Miller, Theory of Scheduling, Addison-Wesley, 1967.
[11]
Cook, S. A., "The complexity of theorem-proving procedures," Proc. of the 3rd Annual ACM Symposium on the Theory of Computing, 1971, 151-158.
[12]
Dhall, S. K., and C. L. Liu, "On a real-time scheduling problem," to appear.
[13]
Fujii, M., T. Kasami, and K. Ninomiya, "Optimal sequence of two equivalent processors," SIAM J. on Applied Math., 17, 3 (1969), 784-789. Erratum, 20, 1 (1971), 141.
[14]
Garey, M. R., and R. L. Graham, "Bounds for multiprocessor scheduling with resource constraints," SIAM J. on Computing, 4 (1975), 187-200.
[15]
Garey, M. R., R. L. Graham, D. S. Johnson, and A. C. Yao, "Resource constrained scheduling as generalized bin packing," to appear in J. Combinatorial Theory (A).
[16]
Garey, M. R., and D. S. Johnson, "Complexity results for multiprocessor scheduling under resource constraints," SIAM J. on Computing, 4 (1975), 397-411.
[17]
Garey, M. R., and D. S. Johnson, "Approximation algorithms for combinatorial problems: an annotated bibliography," to appear in Algorithms and Complexity: Recent Results and New Directions, J. F. Traub (ed.).
[18]
Garey, M. R., and D. S. Johnson, "Scheduling tasks with non-uniform deadlines on two processors," to appear in J. ACM.
[19]
Graham, R. L., "Bounds for certain multiprocessing anomalies," Bell System Tech. J., 45 (1966), 1563-1581.
[20]
Graham, R. L., "Bounds on multiprocessing timing anomalies," SIAM J. on Applied Math., 17 (1969), 416-429.
[21]
Graham, R. L., "Bounds on multiprocessing anomalies and related packing problems," Proc. of the Spring Joint Computer Conference (1972), 205-217.
[22]
Gonzales, T., O. H. Ibarra, and S. Sahni, "Bounds for LPT schedules on uniform processors," University of Minnesota, Computer Science Technical Report, 1975.
[23]
Horowitz, E., and S. Sahni, "Exact and approximate algorithms for scheduling non-identical processors," to appear in J. ACM.
[24]
Hsu, N. C., "Elementary proof of Hu's theorem on isotone mappings," Proc. AMS, 17 (1966), 111-114.
[25]
Hu, T. C., "Parallel scheduling and assembly line problems," Oper. Res., 9, 6 (1961), 841-848.
[26]
Ibarra, O. H., and C. E. Kim, "Heuristic algorithms for scheduling independent tasks on non-identical processors," University of Minnesota, Computer Science Technical Report, 1975.
[27]
Johnson, D. S., "Approximation algorithms for combinatorial problems," J. Computer and Systems Sciences, 9 (1974), 256-278.
[28]
Johnson, D. S., "Fast algorithms for bin packing," J. Computer and Systems Sciences, 8 (1974), 272-314.
[29]
Johnson, D. S., A. Demers, J. D. Ullman, M. R. Garey, and R. L. Graham, "Worst-case performance bounds for simple one-dimensional packing algorithms," SIAM J. on Computing, 3 (1974), 299-325.
[30]
Kafura, D. G., "Analysis of scheduling algorithms for a model of a multi processing computer system," Ph.D. Thesis, Purdue University, 1974.
[31]
Kafura, D. G., and V. Y. Shen, "Scheduling independent processors with different storage capacities," Proc. ACM National Conf. (1974), 161-166.
[32]
Kafura, D. G., and V. Y. Shen, "An algorithm to design the memory configuration of a computer network," Purdue University, Computer Science Technical Report, 1975.
[33]
Karp, R. M., "Reducibility among combinatorial problems," Complexity of Computer Computations, R. E. Miller and J. W. Thatcher (eds.), Plenum Press, New York, N.Y. (1972), 85-103.
[34]
Karp, R. M., "On the computational complexity of combinatorial problems," Networks, 5 (1975), 45-68.
[35]
Kaufman, M. T., "An almost-optimal algorithm for the assembly line scheduling problem," IEEE Trans. on Computers, C-23 (1974), 1169-1174.
[36]
Krause, K. L., V. Y. Shen, and H. D. Schwetman, "Analysis of several task-scheduling algorithms for a model of multiprogramming computer systems," J. ACM, 22 (1975), 522-550.
[37]
Labetoulle, J., "Ordonnancement des processus temps réel sur une ressource préemptive," Thèse de 3ème cycle, Université Paris VI (1974).
[38]
Labetoulle, J., "Real time scheduling in a multiprocessor environment," to appear.
[39]
Lam, S., and R. Sethi, "Worst case analysis of two scheduling algorithms," to appear in SIAM J. on Computing.
[40]
Lenstra, J. K., Sequencing by Enumerative Methods, Mathematisch Centrum, Amsterdam, 1976.
[41]
Liu, C. L., and J. W. Layland, "Scheduling algorithms for multiprogramming in a hard-real-time environment," J. ACM, 20 (1973), 46-61.
[42]
Liu, J. W. S., and C. L. Liu, "Bounds on scheduling algorithms for heterogeneous computing systems," Proc. of the 1974 IFIP Congress (1974), 349-353.
[43]
Liu, J. W. S., and A. Yang, "Optimal scheduling of independent tasks on heterogeneous computing systems," Proc. of the ACM (1974), 38-45.
[44]
Rinnooy Kan, A. H. G., Machine Scheduling Problems, H. E. Stenfert Kroese B. V., Leiden, 1974.
[45]
Sahni, S., "Algorithms for scheduling independent tasks," J. ACM, 23 (1976), 116-127.
[46]
Schindler, S., "On optimal schedules for multiprocessor systems," Proc. of Princeton Conf. on Information Sciences and Systems (1972), 219-223.
[47]
Serlin, O., "Scheduling of time critical processes," Proc. of the Spring Joint Computer Conference (1972), 925-932.
[48]
Ullman, J. D., "Polynomial complete scheduling problems," Operating Systems Review, 7 (1973), 96-101.
[49]
Yao, A. C., "On scheduling unit-time tasks with limited resources," Proc. of the Sagamore Conference on Parallel Processing (1974), 17-36.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
FORMAL MODELLING OF DISCRETE DYNAMIC SYSTEMS
Heinrich C. Mayr, Peter C. Lockemann
Institut für Informatik II, Universität Karlsruhe
The aim of the paper is to introduce a tool for the formal description of systems with discrete dynamic behaviour. The states of these systems are defined by distinct distributions of discrete entities over certain system elements that are capable of storing these entities. A state transition can then be described as a change from one distribution to another, the change being due to active system elements. Information processing systems form a subclass of discrete dynamic systems so that the formal concepts developed in the paper may be applied to their description and evaluation. The respective models are called "Decision and Action Nets".
1. Introduction

The analysis of information systems - already existing or being developed - often suffers from a lack of concepts and tools for its systematic execution. Seldom in this domain does the precision of the results of an analysis satisfy mathematical demands, even if formal methods are used for certain partial domains. It is easy to see that the lack of precise concepts often influences the quality of the information gained in a negative way. Further, the error probability is proportional to the system complexity; therefore the definition of exact descriptive tools whose application can be supported by computer is necessary. This could also support communication about information systems and their development, both being impeded by a non-unified terminology. Apart from a unified terminology we search for concepts to describe systems which support both the analysis of the system structure and of the processes taking place within the system. Among others, the following system characteristics are of interest:
- liveness and safeness
- conflicts in case of access to resources
- existence of bottlenecks
- store capacity
- speed of operation and transfer
- distribution and co-ordination of partial tasks
- structure and manipulation of the data produced by the system.
A tool for description covering these would in addition introduce possibilities to compare different systems; it would be highly desirable if such a comparison could be performed on different levels of abstraction. A number of attempts in this direction have already been carried out. Apart from Petri nets, which reflect enlogic and synchronic system characteristics, no attempt has found widespread application so far. In our opinion this is due to two essential facts:

- First, the demand for model interpretation by these concepts is too high, i.e.
255
256
H.C. MAYR, P.C. LOCKEMANN
abstraction of things occurs which are of interest to system analysts; they, however, can insert them by interpretation only.

- Secondly, some concepts start from a kind of model-module technique, and so require the user to adapt his ideas in a way to enable their description by the existing modules (e.g. the Evaluation Nets of Noe and Nutt (1973)).

We believe the first thing is to find out from the system aspect which properties and relations must at least be describable, no matter how complicated it is to formalize them. Then we may attempt to restrict the conceptual system to what is necessary in each particular case. The concept introduced in part 4 of this paper is to be understood as a step in this direction, without claiming, however, that it will satisfy all requirements. In the remainder we shall consider a more general subject area, namely the area of dynamic systems, of which information systems are a part. We will therefore give a more detailed discussion of dynamic systems in section 2. It is obvious that the concept of model takes up an essential part when handling tools for description. Hence we will discuss it in section 3.
2. Dynamic Systems
It is widely agreed to define a system by a set of objects (the system elements), governed by so-called system relations existing between them. A particular system is specified by the kind of its elements and relations: to the relations certain predicates are attached, whose validity must be determinable for all objects of a domain being observed, so that the system can be explicitly distinguished from its environment. However, this does not bar further relations of other kinds from existing between system elements and objects of the environment. Such system elements are called boundary elements. A system without boundary elements is designated as closed, otherwise as open. We assume, however, each element to enter at least one system relation with at least one different element, i.e. isolated elements are excluded. Going even further, we demand that systems be connected; i.e. for every binary partitioning of the element set at least one pair of elements can be found with a relation existing between them such that the elements do not belong to the same subset. Otherwise we shall speak of two or more systems which are not necessarily independent of each other. The reason for this requirement will become apparent in section 3. Using the concept of set in the system definition has two important consequences: On the one hand we express thereby that any objects of human observation or thought are qualified to be system elements (according to Cantor's set definition), and in particular that the system elements may themselves be systems. On the other hand it is guaranteed that all elements are well-distinguished with regard to the system observed. In accordance with the origin of the system elements, the relations existing between them will be of either an intellectual or a physical nature. Logical dependencies between the constituents of a complex theory are as admissible as relations corresponding to the physical transport of energy or materials.
The remarks so far apply mainly to structural and static aspects, so that we could also speak of a structure instead of a system. The notion of function, however, is also associated with the system concept. By this notion we understand the entirety of tasks which are assigned to the system by its external environment, or to a system element by its own (system-internal) environment. The notion of
FORMAL MODELLING OF DISCRETE DYNAMIC SYSTEMS
task is not defined exactly, but we can state that in the literature it is influenced by the following six aspects: way of performance (how to perform the task?), subject (who?), object (at what?), resources (with what?), time (when?), location (where?). The execution of a task is called an activity. A system activity is split into activities of one or more system elements. As a result there must be some interaction between elements (executing partial tasks) to execute the whole task in a consistent fashion. The entirety of activities of a system element defines its behaviour. Consequently the behaviour of a system is determined by the behaviour of its elements and their interactions. An activity is a process that takes up a certain amount of time, i.e. time is a reference entity for systems. Whereas interaction is hard to imagine for mental objects, it is realised in the material domain by physical phenomena which are exchanged via the existing relations and can be observed or measured at certain places of the system. They are generally designated as flow entities. If the phenomena consist of discrete entities, the respective system is called discrete. Obviously flow entities are also system elements, having, however, a variable and time-dependent relationship to the system. Their distribution in the system, observed at a certain instant in time, defines the system state at that particular instant. A system is called dynamic if state variations can be observed in an ordered time space. Structural variations (i.e. alterations in the composition of a system by elements or relations between elements) lead to a new system according to our definition of system. If for each possible system state all subsequent states are uniquely determined, the respective system is called deterministic.
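The view of the system state as the distribution of flow entities over the system, together with a deterministic transition, can be made concrete in a small sketch; the store names and the transfer rule below are invented for illustration:

```python
def step(state):
    """One deterministic transition: move a single flow entity from
    store 's1' to store 's2' whenever 's1' is non-empty. The state
    is the distribution of flow entities over the stores."""
    new = dict(state)
    if new["s1"] > 0:
        new["s1"] -= 1
        new["s2"] += 1
    return new

state = {"s1": 2, "s2": 0}
state = step(step(state))
print(state)  # {'s1': 0, 's2': 2}
```

Since each state uniquely determines its successor, this toy system is deterministic in the sense defined above.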
3.
Remarks on Modelling
Our aim is to develop concepts which enable us to describe dynamic systems in a uniform way and to analyse their behaviour by means of their description, i.e. their model. The results of such an analysis depend highly on the quality of the model, which for its part is directly related to the power and the precision of the concepts used for modelling. Hence we must take the notion of modelling concept into closer consideration before discussing our special tools for system description. The relevant literature considers a model to be a conceptual image of a certain object (the model object). As such it depends on a subject capable of perceiving the model object, i.e. the model is an intellectual affair. Modelling, therefore, is a subjective process associating a model with a model object. For the purpose of communication a model must be materialised, i.e. a physical representation must be associated with the model. Associating a representation with a model is called realization. We can assume that each modelling is based upon certain intellectual concepts. In the most general case, these concepts are no more than intuitive notions. For communication within a system, however, concepts are needed which must be independent of a particular subject; these we shall call modelling concepts. They supply the rules according to which we arrive at perceptions of objects. In addition, realization concepts regulate the realization of models. The rules themselves are formulated in a meta-language - in the sequel mainly in natural English. This corresponds to the observation in modern linguistics that any human cognition is embedded within the scope of language.
H.C. MAYR, P.C. LOCKEMANN
In order to discuss concepts without constantly referring to their formulation in the meta-language, we shall denote them by so-called concept indications (e.g. the concept of "finite automaton"). The total of all objects for which models can be constructed according to a given concept is called the object domain of the concept, and the models are called instances of that concept. Usually these models are named after the concept indication. A set of modelling concepts whose object domains together cover a given object set forms - together with a set of related realization concepts - a modelling system for this object set. It is our goal to define a modelling system for the set of discrete dynamic systems. In developing a model for a given system one usually wishes to disregard certain aspects of the system and emphasize others, i.e. to abstract from the former. This is true for each modelling, i.e. modelling and abstraction are closely interrelated. The extent to which a modelling concept effects the disregard of aspects defines its abstraction level. Consequently there exists an order between abstraction levels, at least from an intuitive standpoint. A modelling system for dynamic systems will be of particular value if it allows for a whole range of abstraction levels. For many purposes models would be useless if they abstracted from the structural characteristics of their objects. On the other hand, if they are structured ("complex") they can certainly be understood as static systems. Hence the notion of system is a fundamental concept. This is the reason why we demanded strong connectivity of systems in the previous section. The components of a complex model and the relations existing between them are, for their part, models developed according to particular concepts. Unstructured models are called primitive, just like the concepts they are based upon. Models are formed for the sake of deriving statements concerning their objects.
Therefore complex modelling concepts are expected to result in models that, from a meta-theoretic standpoint, conform structurally to the model object. Naturally, the degree of this homomorphy depends on the level of abstraction. Models whose homomorphy with the model object is perceptible by mere human inspection are called figurative, whereas models are called formal if they are based upon a mathematical concept. Corresponding designations exist for model representations, too. Often formal models are described by figurative representations and vice versa. Models are static, i.e. they are structures. Hence their alteration leads to new models. Consequently we may only express in a model the possibility of state variations and their results; the transition from one state to another cannot itself be modelled. (It may, however, be simulated with the help of a model representation to which a driver is applied.) Consequently the concept of state is fundamental to model objects as well. The state of an object refers to the characteristics of all its time-dependent features at a given moment in time. If an object alters its own state, we speak of that state alteration as an ("active") transition. Thereby it becomes evident that we must distinguish between the modelling of transitions and the modelling of activities: the latter concern alterations of objects by other objects. Activities within a dynamic system lead to transitions of the entire system. As mentioned above, homomorphy between object and model is required in the sense that the structural characteristics of the model must be observable in the object as well. The converse is not true: the extent to which structural characteristics of the object reappear in the model is directly related to the level of abstraction chosen before. Surely it would not be realistic to search for modelling
systems that cover the abundance of processes in real systems, further produce figurative and transparent models, and on top of that can be formalized. On the other hand, the existing modelling systems for discrete dynamic systems abstract too much, i.e. cover the relevant aspects only insufficiently. The modelling system of decision nets discussed in the remainder is an attempt to introduce more general modelling concepts for discrete dynamic systems.
4.
Modelling of discrete dynamic systems by means of decision nets
4.1 Modelling of structural characteristics

Dynamic systems are composed of active elements (i.e. elements capable of performing activities), flow entities, and passive elements where flow entities can be observed. Since in physical systems flow entities occur for a finite duration, passive elements will be called stores. Stores are also the connecting link to an observer outside the system, because he can draw his conclusions on the processes going on in the system only by observation and comparison with earlier observations. The elements capable of action are called offices. Generally an activity of an office is subject to certain conditions (e.g. the existence of a task) whose existence is manifested by the occurrence of flow entities in stores where they may be perceived by the respective office. The idea of perception is used to express the potential for access, such as measurability, visibility etc. The utilization of such an access is called recognition, regardless of whether flow entities are absorbed from their store or not. The structural prerequisite for recognition, and for transport of flow entities from stores to offices, is the existence of connections between offices and stores (visual connection, audible connection etc.). Furthermore, access in the opposite direction is necessary as well: offices can place flow entities in stores and so make them available to other offices. Here connections for access must be provided, too. We call all these connections, regardless of their direction, interfaces. Obviously, interfaces represent the structural relationships between system elements. The essential difference between stores and interfaces is as follows: a flow entity passing through an interface (in consequence of an access) is not accessible in the sense of our definition of access.
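The structural vocabulary of this subsection - offices, stores and directed interfaces - can be captured in a small data structure. All names below are illustrative, not part of the original formalism:

```python
from dataclasses import dataclass, field

@dataclass
class NetStructure:
    """Structural description in the spirit of the text: offices
    (active elements), stores (passive elements) and directed
    interfaces connecting them."""
    offices: set = field(default_factory=set)
    stores: set = field(default_factory=set)
    # (office, store) pairs: 'inputs' = the office observes the store,
    # 'outputs' = the office places flow entities on the store.
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)

    def check(self):
        # every interface must connect an office with a store
        for office, store in self.inputs | self.outputs:
            assert office in self.offices and store in self.stores

net = NetStructure(
    offices={"clerk"},
    stores={"in_tray", "out_tray"},
    inputs={("clerk", "in_tray")},
    outputs={("clerk", "out_tray")},
)
net.check()
print(len(net.offices), len(net.stores))  # 1 2
```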
Hence one end of an interface is determined as the point at which flow entities cease or start to be observable, the other end as the point where flow entities begin or cease to be the subject of an activity. The direction of an interface depends on the capabilities of the office involved for access, on the access characteristics of the store involved (e.g. read-only memory), and on the properties of the transport medium (e.g. semi-permeability in diffusion processes). In the modelling system, however, the direction is exclusively a feature of the (connecting) interface. Interfaces through which an office observes flow entities are called inputs of the office, otherwise outputs of the office. Whereas stores are exclusively passive, offices may also give the appearance of being able to store flow entities. In such offices a medium (memory) must exist, where flow entities can be filed or recognised exclusively by the office
involved. It is quite obvious that this can be described by means of the concepts defined up to now, provided the office is adequately dissected into a subsystem. Hence the distinction between stores and memories is defined only relative to a given level of abstraction, i.e. the specific needs of the modelling person. In summary, offices, stores and interfaces are our tools for describing the structure of a dynamic system. The similarity to unmarked Petri nets suggests an analogous formal treatment, except that we shall combine all interfaces (inputs and outputs) into a single relation. In addition, we require both the set of offices and the set of stores to be non-empty, since otherwise we cannot speak of a dynamic system.

4.2 Modelling of flow entities

As a rule, the possibilities of access established by an interface are not universal, but can only be applied to flow entities with strictly defined characteristics, namely those which are the subject or the result of activities of the office involved. Furthermore there are offices which can perform different kinds of activities, and their reaction then depends on certain characteristics of the flow entities filed on their input stores. Consequently, there exist relationships between flow entities and activities that induce a typification of flow entities specific to at least one office: flow entities of the same type cause this office to react in the same general fashion. This reasoning parallels discussions in the field of programming languages, where the type or mode of an object is defined by the operations that can be performed on it. The concept of office-specific type runs counter to the needs of communication between offices. To circumvent the problem, one may consider flow entities of different types to be built from elementary entities whose elementary type is identical throughout the system.
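The idea of building flow entities from atoms of system-wide elementary types, and of typing an entity by counting its constituents (the "characteristic vector" introduced below), can be sketched as follows; the type names are invented for illustration:

```python
from collections import Counter

# The elementary types are fixed system-wide; a flow entity is a
# collection of atoms, and its type is the vector counting how many
# atoms of each elementary type it contains.
ELEMENTARY_TYPES = ["form", "stamp", "signature"]   # illustrative

def characteristic_vector(atoms):
    """Map a flow entity (a list of (elementary_type, atom_id) pairs)
    to its characteristic vector over ELEMENTARY_TYPES."""
    counts = Counter(kind for kind, _ in atoms)
    return tuple(counts[t] for t in ELEMENTARY_TYPES)

# a flow entity built from two 'form' atoms and one 'stamp' atom
entity = [("form", 1), ("form", 2), ("stamp", 7)]
print(characteristic_vector(entity))  # (2, 1, 0)
```

Two flow entities with the same vector are of the same type and would cause an office to react in the same general fashion.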
These elementary constituents are denoted as atoms; the entirety of atoms with respect to a system will be called the atom domain of that system. The set of all flow entities (the action domain) of a system can then be described by means of the atom domain and a number of composition rules. For reasons of complexity we shall restrict the rules to set formation; any additional structural characteristics of flow entities will be expressed in terms of additional atoms. Furthermore we allow only a finite number of distinct elementary types. This restriction can be justified by the fact that the total of all offices of a real system has at best to work on a finite number of different kinds of tasks. (Of course, the number of potential flow entities and activities can still be unrestricted.) The primitive types (i.e. the types of primitives) define a partition of the atom domain into finite and disjoint classes. So the type of a flow entity can be modelled by a "characteristic" vector whose components identify the types of the constituting primitives. In many applications it will be sufficient to analyse system and office behaviour in dependence on the types of the exchanged flow entities. Besides, on this level of abstraction one can compare and classify dynamic systems according to their structure and behaviour, independently of the atom domain involved. In the following we refer only to this level of abstraction, because the discussion of more detailed models (action nets) would be beyond the scope of this paper. Modelling flow entities by characteristic vectors (called V-types) enables a formal description of store contents and, consequently, of the distribution of flow entities. Apart from capacity we do not take structural store properties into account. Rather, if needed, we interpret them as primitives of a certain type,
marking this type in the V-types filed on the respective store. Neglecting them, it is possible that several identical V-types are related to the same store (e.g. if the position in a structured store is the only criterion to distinguish between several flow entities); hence the concept of set is not adequate for enumerating store contents. Instead we must refer to the concept of selection often used in combinatorics. The functional relation between the stores of a net structure and selections of V-types is called the V-type distribution. The type concept can also be used to formalise the "passability" of interfaces. As mentioned above, the access possibilities represented by interfaces are not universal, but can only be applied to flow entities with strictly determined characteristics. We then take the type of the accessible flow entities as a basis, and express the passability by an interval of characteristic vectors. The corresponding assignment is denoted as the interface definition. Thereby we can state which types a flow entity may consist of, at least and at most, in order to enable transport via the corresponding interface.

4.3 Properties of offices

Before looking into the modelling of office behaviour we have to discuss two fundamental restrictions resulting from our modelling system:
- Considering the fact that flow entities are discrete, and interpreting their occurrence in the system in the form of a distribution over the stores of a (static) structure, we are not able to describe a continuous flow. Consequently, the reaction of an office must always be a discrete event.
- The second restriction results from our postulate that the event of an activity be expressed exclusively by an alteration of store contents. Hence the office behaviour, too, can only be specified by its effect on the stores related to it. It is then impossible to interfere with an event of activity, i.e. an activity takes place "in one step" from the view of both the observer and the other offices.
In order to enable interference, the splitting of the office into a partial system will be necessary. A complete description of the processes in such systems is only possible if we dispose of tools to describe the conditions under which the single offices can become active. According to the remarks made so far, offices become active if perceptible flow entities have occurred on the input stores. In reality, however, this is rarely necessary for all input stores. Hence a concept of activation according to Petri nets seems too restricted to us in this context. As a consequence, we need tools to mark those subsets of office inputs (activation possibilities) which cause an office activity when stimulated. In this context stimulation means that perceptible flow entities are filed on the stores involved. The respective tool consists of a relation defined as a correspondence, which we call conditioning. Hence the conditioning of the offices of a net structure describes structural conditions for activity. In addition, the activation of an office also depends on the filing situation on its output stores. For if these dispose only of finite capacity, the offices filing flow entities on them can only operate as long as the capacity allows. So it is necessary to know, for any office activity, the output stores it is addressed to, and we further need adequate mappings, which are called flow behaviour. In this context it is sufficient to restrict their domain to the sets of the (perceptible) flow entities being accepted in the particular case; besides, we only need to regard their types. Yet it is possible that by a distribution of flow entities more than one activation possibility of an office is exploited. Normally it is determined which of these causes a reaction first. Therefore we must provide for a concept which is easily defined by indexing the activation possibilities (priority list).
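The interplay of activation possibilities, stimulation and the priority list can be sketched as a small selection routine. The representation chosen here (possibilities as sets of store names, priorities as ranks) is an assumption made for illustration only:

```python
def next_activation(activation_possibilities, priority, stores):
    """Return the highest-priority activation possibility all of
    whose input stores hold at least one perceptible flow entity
    (i.e. which is stimulated), or None. 'priority' maps each
    possibility (a frozenset of store names) to its index in the
    priority list; a lower index reacts first."""
    stimulated = [
        k for k in activation_possibilities
        if all(stores.get(s, 0) > 0 for s in k)
    ]
    if not stimulated:
        return None
    return min(stimulated, key=lambda k: priority[k])

k1 = frozenset({"a"})
k2 = frozenset({"a", "b"})
prio = {k1: 1, k2: 0}   # k2 reacts first when both are stimulated

print(next_activation({k1, k2}, prio, {"a": 1, "b": 1}))  # chooses k2
print(next_activation({k1, k2}, prio, {"a": 1}))          # chooses k1
```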
4.4 Decision Nets

We can imagine a particular activity of an office to consist of the following steps: access to flow entities in the stores of an activation possibility, processing of the perceived flow entities, and filing of the produced flow entities on the output stores. The description of an activity has to cover these steps. Whereas the formalism for the description of filing is easy to define (union of selections), access and processing remain to be taken into closer consideration: it is possible that there is more than one flow entity on an input store of an office meeting the particular activation preconditions. The object of an access is, however, only a single flow entity, the identification of which is random or deterministic. For the description of deterministic access we define functions, called access specifications. Access specifications are logically related to the interfaces. Further we need a concept for the description of the access behaviour, i.e. whether flow entities are removed or copied from input stores. Finally we define a set of functions (the behaviour) which specify the flow entities resulting from an activity. So we can define the objects called decision nets, which we will understand as models of discrete dynamic systems:

Definition: An η-decision net is a quadruple N = (η, T, S, Σ) where
(1) η ∈ ℕ (number of primitive types);
(2) (T, S, Γ) is a connected bipartite graph (T: offices, S: stores, Γ ⊆ T×S ∪ S×T: interfaces; |T| > 0);
(3) […] β is the access specification of σ;
(4) s ∈ S ⇒ s = (c), where c ∈ ℕ is the capacity of s;
(5) t ∈ T ⇒ t = (K_t, p_t, Φ_t, Π_t, F_t) with
(5.1) K_t ⊆ 2^{…}, where k ∈ K_t, σ ∈ k: σ ∈ S×{t} (hence τ: T → 2^{…} is a correspondence, the conditioning); p_t: K_t → ℕ (the priority list);
(5.2) Φ_t, Π_t, F_t are sets of functions φ_{tk}, π_{tk}, f_{tk} (k ∈ K_t), where in detail […]
INFLUENCE OF PROGRAM LOADING ON THE PAGE FAULT RATE

M. PARENT and D. POTIER

It follows immediately that

    Π_i = Π_1 ∏_{j=1}^{i−1} O_j ,   i = 2, …, M−1

    Π_M = […]

Solution of model B

We have

    O_{i,j} = […] ,   i = 1, …, M−1 ; j = i+1
            = […] ,   i = 1, …, M−1 ; j = 1, …, i
            = […] ,   i = M ; j = 1, …, M−1
            = […] ,   i = M ; j = M
            = 0   otherwise

and equations (4) can be written:

    […]

In order to solve this system of equations, we form the differences Π_i − Π_{i−1}, i = 2, …, M. We then have:

    […]

From this equation, the solution is simply obtained as:

    […]

Solution of model C

We have:

    O_{i,j} = […] ,   i = 1, …, M−1 ; j = i+1
            = […] ,   i = 2, …, M ; j = 1
            = […] ,   i = 2, …, M−1 ; j = i
            = […] ,   i = M ; j = M
2. MAXIMAL TRAFFIC RATE, RESPONSE TIME AND UTILIZATION RATE WITH RETRANSMISSION OF TYPE 1

We give in figure 2 the model that we want to study (we have only depicted stations n and n+1). To analyse the behaviour of station n, we are going to replace it by an equivalent queue in which there is no feedback. We shall denote the mean of a distribution f(x) by E[f(x)], the variance by Var[f(x)] and the squared
G. PUJOLLE
coefficient of variation of interarrival (service) times by Ka_n (Ks_n). Let λ_n, μ_n, p_n be the interarrival rate, the service rate and the rejection probability at station n. Let g(x) be the distribution of the service time at station n. We shall take E[g(x)] = 1/μ_n and Var[g(x)] = Var_n. The service time distribution of the equivalent station is:

    h(x) = (1 − p_{n+1}) Σ_{k≥1} p_{n+1}^{k−1} g^{*k}(x)    (1)

where * denotes the convolution product. By Laplace-Stieltjes transforms we obtain:

    E[h(x)] = E[g(x)] / (1 − p_{n+1}) = 1 / (μ_n (1 − p_{n+1}))

    Ks{h(x)} = p_{n+1} + Ks{g(x)} (1 − p_{n+1})    (2)

We shall set

    ν_n = μ_n (1 − p_{n+1})   and   Ks'_n = p_{n+1} + Ks_n (1 − p_{n+1}).    (3)

To study the behaviour of station n, it is necessary to know the interarrival rate λ_n and the squared coefficient of variation of interarrival times at this station. The packet arrival rate λ_n at station n is equal to λ / (1 − p_n). The number of packets flowing from station n−1 to n is the sum of the external arrivals and of the number of packets retransmitted from station n−1. In the following analysis we shall use diffusion approximations [3 to 6]. We have two possibilities to approximate the squared coefficient of variation of interarrival times: the methods of Reiser and Kobayashi [12] and of Gelenbe and Pujolle [13]. We take the first one because in tandem queues the formulation is very easy: Ka_n = Ks'_{n−1}. The approximation will be better as we approach the saturation condition.

The probability p_n that a packet is refused at the entrance of station n is equal to the probability that there are M_n packets in the station. This value comes from Gelenbe [5]; thus we have the system:

    λ_n = λ / (1 − p_n)
    p_n = […]    (4)

ERGODICITY CONDITIONS AND CONGESTION CONTROL IN COMPUTER NETWORKS

where ρ_n = λ_n / ν_n and γ_n = exp(2 (λ_n − ν_n) / b_n) with b_n = λ_n Ka_n + ν_n Ks'_n.

It is a linear system of equations with two unknowns in which the probability p_{n+1} interferes (Ks'_n = Var_n μ_n² (1 − p_{n+1}) + p_{n+1}). We must study the system beginning with the last station, because p_{K+1} = 0; thus we obtain λ_n and p_n step by step, from n = K to n = 2. Utilization rates are given by 1 − P_n(0), where P_n(0) is determined in [5]:

    P_n(0) = […]    (5)

The mean number of customers in station n is computed by the technique described by Badel [6, 9], which we recall briefly. If f is the probability distribution of the number of customers in station n, given by [5]:

    […]
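Equation (1) states that the equivalent service time is a geometrically distributed number of independent service rounds. A quick Monte Carlo check of the mean relation E[h(x)] = E[g(x)] / (1 − p_{n+1}) is sketched below; the parameter values are invented, and g(x) is taken exponential purely for the check:

```python
import random

random.seed(1)
mu, p = 2.0, 0.3   # service rate and rejection probability (illustrative)

def equivalent_service():
    """One draw from h(x): an exponential service repeated with
    probability p after each completion, i.e. a geometric number
    of rounds as in equation (1)."""
    total = random.expovariate(mu)
    while random.random() < p:
        total += random.expovariate(mu)
    return total

n = 200_000
mean = sum(equivalent_service() for _ in range(n)) / n
# both values should be close to 1/(mu*(1-p)) ≈ 0.714
print(round(mean, 2), round(1 / (mu * (1 - p)), 2))
```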
[…] = M and X^{(3)} > 0, we can reduce the dimensionality of the state by defining Y = (Y^{(1)}, Y^{(2)}) where Y^{(1)}_t = X^{(1)}_t + X^{(2)}_t and Y^{(2)}_t = X^{(3)}_t. The process Y_t describes a related queuing system S* which is portrayed in Fig. 2(b). In S* we allow full generality with respect to routing. S_M is equivalent to S* given that π_{1,2} = π_{2,1} = 0. The blocking mechanism of S* is such that server 1 shuts down immediately after the transition (i, M−1) → (i−1, M) occurs. We shall only be interested in the stationary state probabilities {p_{i,j}} of the process Y, which are defined by

    p_{i,j} = lim_{t→∞} Pr{Y^{(1)}_t = i and Y^{(2)}_t = j}.

Y is an irreducible continuous-time Markov process, hence the p_{i,j} always exist.
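For a small truncated state space, the stationary probabilities of an irreducible continuous-time Markov process can be obtained numerically by solving πQ = 0 together with the normalization Σπ = 1. The generator below is an arbitrary three-state example, not the S* process:

```python
import numpy as np

def stationary(Q):
    """Solve pi @ Q = 0, sum(pi) = 1 for an irreducible CTMC with
    generator matrix Q (each row of Q sums to zero)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])   # append the normalization row
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  2.0, -4.0]])
pi = stationary(Q)
print(np.round(pi, 3))  # [0.364 0.545 0.091]
```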
The stationary probabilities satisfy a system of linear equations. For simplicity of notation, we define α_1 = μ_1 π_{1,3}, α_2 = μ_1 π_{1,2}, β_1 = μ_2 π_{2,3} and β_2 = μ_2 π_{2,1}. We will also assume that the rates μ_1 and μ_2 are normalized such that λ = 1. This results in balance equations of the form

    (1 + […]) p_{i,j} = p_{i−1,j} + […]
Proof: Equation (22) follows directly from the recursion (20). The proof of (23) to (26) requires straightforward but extremely tedious computations. We found the use of an automatic symbol manipulation system [4] an essential help.

Define η_− = min{ρ_1, ρ_2} and η_+ = max{ρ_1, ρ_2}. We are now ready for

Theorem 1: Assume α_1 > 0. Then D_M(z) = 0 has 2M real and simple roots ζ_{M,1}, ζ_{M,2}, …, which have the following properties:
(1) There are M roots in (0, η_−) and M roots in (η_+, ∞) which interleave as follows with the roots of D_{M−1}(z):

    0 < ζ_{M,1} < ζ_{M−1,1} < ζ_{M,2} < ζ_{M−1,2} < […]

[…] and D_M([…]) > 0 for all […]. The number of roots follows simply by counting.
Fig. 3. Roots of the characteristic equation D_M(z) = 0 vs. M. The parameter values are […] = 1.2, […] = 0, β = 3. The shaded areas indicate negative values of D_M(z).

2.4 The existence of a stable solution
It is the object of this section to investigate the conditions under which a stable solution exists and to show that for a stable S*_M, the boundary values can be obtained from the following system of linear equations:

    Σ_{ℓ=0}^{M−1} p_{0,ℓ} […](ζ_{M,k}) = 0 ,   k = 1, 2, …, M−1.    (28)

Our main result is summarized in

Theorem 2: The system S*_M has a stable solution if and only if the characteristic equation D_M(z) has exactly M−1 roots ζ_{M,1}, ζ_{M,2}, …, ζ_{M,M−1} in (0,1). In this case, the system of linear equations (28) determines positive boundary values {p_{0,ℓ}} up to a constant factor, which follows from normalization.
QUEUING MODEL OF A MULTIPROGRAMMED COMPUTER SYSTEM
Proof: The proof uses probabilistic arguments similar to the ones given in [1].

The following corollaries follow from Theorem 2:

Corollary 1: If ρ_1 < 1 and ρ_2 < 1, then S*_M is stable iff D_M(1) < 0.

Corollary 2: The stability of S*, i.e. the condition ρ_1 < 1 and ρ_2 < 1, is a necessary condition for stability of S*_M. Furthermore, if S* is stable, there exists an M* such that S*_M is stable for M* < M < ∞ and unstable for 0 ≤ M ≤ M*.
The property of stability is a condition on the parameters λ, μ_1, μ_2, {π_{i,j}} and M. We give an explicit form of this condition in

Corollary 3: S*_M is stable if and only if M > […], where

    […]    (29)

The region of stability in the (ρ_1, ρ_2)-plane is depicted in Fig. 4 for various values of M.
Fig. 4. Stability chart for various values of M (M = 1, 2, 4, 8, 16).
But (45) is the solution of an M/M/1 system with queue-dependent rates μ(k), which are defined by

    μ(k) = […]   if k ≤ M
         = […]   if k > M    (46)
Thus for the degenerate system S_M with φ = 0 (shared processors) we find the distribution of the system population identical to the queue size distribution of an equivalent M/M/1 system, whose rate function is obtained from the closed system of Fig. 5. The decomposition of S_M into an open "outer model" (the equivalent M/M/1 queue) and a closed "inner model" is shown schematically in Fig. 6. It is easy to obtain the marginal distribution Pr{X^{(3)} = i} from (45). The marginal distributions Pr{X^{(1)} = i} and Pr{X^{(2)} = j} are found as a weighted sum of the M closed system solutions. The decomposition of a queuing system with blocking into an open outer model and a closed inner model is frequently used in practice to get approximate solutions for more complicated networks with blocking [6]. For the example of S_M, we have now given the precise condition under which such approximations are valid, namely that φ be small or, in other words, that the average number of cycles be large. The accuracy can be estimated from the graphs of E{X^{(3)}} vs. φ which are given in Fig. 7.
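The birth-death solution for an M/M/1 queue with queue-dependent rates μ(k), of the kind referred to in (45)-(46), is p_k ∝ ∏_{j≤k} λ/μ(j). A sketch with invented parameters follows; the rate function below is not the one from (46), whose values are illegible in the source:

```python
def mm1_state_dependent(lam, mu_of_k, kmax):
    """Unnormalized birth-death solution p_k ∝ prod_{j<=k} lam/mu(j),
    truncated at kmax and then normalized."""
    p = [1.0]
    for k in range(1, kmax + 1):
        p.append(p[-1] * lam / mu_of_k(k))
    total = sum(p)
    return [x / total for x in p]

M = 4
mu = lambda k: min(k, M) * 1.0   # rate grows with k, saturates at M servers
p = mm1_state_dependent(lam=0.8, mu_of_k=mu, kmax=50)
mean = sum(k * pk for k, pk in enumerate(p))
print(round(sum(p), 6), round(mean, 3))  # 1.0 0.802
```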
Fig. 6. Schematic representation of a hierarchical decomposition of S_M.
M. REISER and A.G. KONHEIM
Fig. 7. Average queue size of the buffer vs. φ for S_M (M = 4; λM = 1.05 and λM = 1.25). The value for φ = 0 is the one obtained by a hierarchical model.
REFERENCES

[1] A. G. Konheim and M. Reiser, "A Queuing Model with Finite Waiting Room and Blocking", JACM, to appear.
[2] M. F. Neuts, "Two Queues in Series with a Finite Intermediate Waiting Room", J. Appl. Prob. 5, 123 (1968).
[3] B. Avi-Itzhak and M. Yadin, "A Sequence of Two Servers with No Intermediate Queue", Mgmt. Sci. 11, 553 (1965).
[4] J. H. Griesmer and R. D. Jenks, "Experience with an Online Symbolic Mathematics System", Proc. of the ONLINE72 Conf., Brunel Univ., Uxbridge, Middlesex, England (1972). (Also as IBM Report RC 3925, "The SCRATCHPAD System".)
[5] S. S. Lavenberg, "Stability and Maximum Departure Rate of Certain Open Queuing Networks Having Finite Capacity Constraints", IBM Report RJ 1625, 1975.
[6] P. J. Courtois, "Decomposability, Instabilities and Saturation in Multiprogramming Systems", CACM 18, 371 (1975).
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS
Y. Shohat*
AUERBACH Associates Inc.

and

J. C. Strauss
University of Pennsylvania
Philadelphia, Pennsylvania
The HASP Execution Task Monitor (HETM) is an optional feature of the IBM OS/360 operating system. HETM attempts to improve system performance by employing a dynamic dispatching algorithm that balances CPU usage of the various tasks multiprogrammed by the system. The success of HETM has prompted the development of similar algorithms with different performance improvement goals and mechanisms. This paper describes two queuing theory models of the dispatching algorithms. The first is a simple model which gives approximate results, and the second is a more elaborate model which gives exact results. Both models are used in evaluating the effectiveness of various dispatching algorithms. The approximate results obtained by the simple model are verified to be reasonably accurate. Simple queuing theory methods are demonstrated to be useful for the analytical study of computer system performance.

INTRODUCTION

Multiprogrammed computer systems were developed to improve system throughput capability. This development introduced the related problems of allocating system resources and evaluating system performance; any effort to improve system performance by means of better resource allocation introduces the need to evaluate the level of the improvement. Such evaluation is very difficult for multiprogrammed computer systems because of the complex interrelationships existing between system elements.

* This paper is based on a thesis submitted to the Faculty of The Moore School of Electrical Engineering in partial fulfillment of the requirements for the degree of Master of Science in Engineering (for graduate work in Computer and Information Sciences).
Y. SHOHAT and J.C. STRAUSS
This paper uses queuing theory models to investigate dynamic dispatching as a CPU scheduling discipline for a multiprogrammed computer system, where several jobs of different types compete for the allocation of system resources. The paper deals with dynamic dispatching similar to that available through the HASP Execution Task Monitor (HETM), a feature of the IBM OS/360 operating system.

J. C. Strauss [19,20,21] evaluates HETM and develops two variations based on its dispatching algorithm. In reference [20] Strauss employs queuing theory models to investigate the HETM algorithm and its variations. However, the results obtained in [20] are based on simplified assumptions and serve only as a first approximation. This paper develops an exact solution to the problem presented in [20] and evaluates the error caused by the approximations made in [20].

Section 2 reviews the properties of the HASP dynamic dispatching algorithm and presents several variations. Following the presentation of the approximate solution developed in [20], Section 3 develops the exact solution. Results are presented in Section 4, and a comparison is made between the approximate and the exact results.

THE HASP EXECUTION TASK MONITOR AND VARIATIONS

This section reviews in brief the HETM dispatching algorithm and its variations and presents their basic characteristics.
A detailed discussion can be found in [18,19,20,21,23].

HETM ENVIRONMENT

The environment presented here is a job class scheduled environment, like that of the IBM OS/360. In such a system, jobs are assigned to classes external to the system and are selected for initiation on a priority basis from within classes. In IBM-like multiprogramming systems, the CPU execution (dispatching) priority of the tasks which constitute a job can be determined in one of two ways: (1) the dispatching priority is fixed and determined by the class to which the job is assigned (as in the case of standard OS), or (2) the dispatching priority is rearranged periodically according to dynamic changes in task characteristics in order to better utilize the system (as in the case of the HETM). In any case, this task dispatching priority is pre-emptive of all lower priority task CPU execution.

In general, the environment discussed above can be described as in Figure 1: jobs are classed into $M_C$ job classes on the basis of resource requirements, and it is assumed that the classes are equally worthy of system attention.
Jobs are multiprogrammed one from each job class. Jobs from class i require an average of $C_i$ units of CPU time to complete. At time $t_\ell$ ($= \ell\Delta$), there are $N_{i,\ell}$ jobs in the i-th input job class queue.

[] The numbers in brackets in the text indicate references in the Bibliography.
A mapping consists of each job class having a unique ordered priority between 1 and $M_C$, or zero. Classes of priority zero do not execute. For the j-th priority mapping, the system achieves an average CPU usage for the i-th job class of $c_{ij}$. There are $N_{i,\ell}$ jobs in class i at time $t_\ell$. Jobs are classed based on resource and other requirements and may only be multiprogrammed between classes.

Figure 1. Pictorial representation of a job class oriented computer system
The system control mechanism, which is similar to that available in OS/360, involves the specification of the unique internal preemptive CPU priority to be given to a single task from each of the $M_C$ classes over a control interval Δ. This priority ordering is specified every Δ time units on the basis of the observed performance of the system. The specification of the ordered priorities of the $M_C$ classes is termed a priority mapping. There are $M_P$ priority mappings. The system behavior is observed through the job CPU usage $c_{ij}$ attained for a task from the i-th job class when it is multiprogrammed in priority mapping j.
STANDARD OS/360 JOB CLASS SCHEDULED SYSTEM

In a standard OS/360 job class scheduled system, there is no dynamic dispatching. A fixed relationship exists between the job class and the preemptive CPU dispatching priority of tasks from jobs in that class. One intent of this fixed dispatching priority scheme is to allow installations to class jobs based on prior knowledge of CPU-I/O characteristics and thereby obtain high throughput operation. This intention ignores two very important problems: (1) the characteristics of many jobs are not known; and (2) the characteristics of many jobs change dramatically during execution. These problems prompted development of the HASP Execution Task Monitor, HETM, to provide a dynamic dispatching capability while still retaining other good user oriented features of job class scheduling.

HETM DYNAMIC DISPATCHING

A complete description of HETM and its detailed relations to OS dispatching is presented in [23]. HETM periodically rearranges the preemptive CPU dispatching priority of specified tasks in an attempt to more equitably distribute CPU service and maintain system thruput by keeping I/O bound tasks active without unduly harming service to CPU bound tasks.
The set of tasks monitored by HETM ("the dynamic priority group") and the monitoring period ("the control interval") are specified by the installation. The task priority is rearranged on the basis of the recorded task CPU utilization history. The equation employed to determine the CPU utilization history of the i-th task at control interval $\ell$, $h_{i,\ell}$, is:

$$h_{i,\ell+1} = \frac{CPU_{i,\ell} + h_{i,\ell}}{\sum_{t=1}^{N} \left(CPU_{t,\ell} + h_{t,\ell}\right)} \qquad (1)$$

where:
$N$ = number of tasks in the dynamic priority group.
$CPU_{i,\ell}$ = CPU usage of the i-th task during the $\ell$-th control interval.
For new tasks entering the dynamic priority group, the prior history is set to zero and the task is assigned the lowest priority.
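As an illustration, the history update of equation (1) and the inverse-order reordering of the dispatching chain can be sketched in a few lines; the function and task names below are hypothetical, and this is a simplified sketch rather than the actual HASP implementation:

```python
# Illustrative sketch (not IBM's code) of the HETM history update of
# equation (1) and the reordering of the dispatching chain in inverse
# order of the history values.

def hetm_update(history, cpu_usage):
    """One control interval: history and cpu_usage map task -> value."""
    total = sum(cpu_usage[t] + history[t] for t in history)
    return {t: (cpu_usage[t] + history[t]) / total for t in history}

def dispatch_order(history):
    """Tasks that used the CPU least get the highest dispatching priority."""
    return sorted(history, key=lambda t: history[t])

history = {"A": 0.0, "B": 0.0, "C": 0.0}   # new tasks start with zero history
usage = {"A": 0.6, "B": 0.3, "C": 0.1}     # CPU usage in the last interval
history = hetm_update(history, usage)
print(dispatch_order(history))             # I/O-bound task C is dispatched first
```

Note that the normalization in (1) keeps the history values summing to one, so a task's history decays whenever other tasks accumulate CPU usage.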
Low h values indicate that the task has not utilized the CPU, either because it was blocked by I/O or because it was waiting but not activated. High h values indicate that a task has had the opportunity and has utilized the CPU. HETM is activated every control interval. The CPU utilization history values $h_{i,\ell}$ of all tasks in the dynamic priority group are computed, and the OS dispatching chain is reordered in inverse order of the ranked history values. Reference [19] establishes that HETM attempts to equalize the utilization of the competing tasks.
This result can be stated more formally: let $P_j$ denote the probability of mapping j occurring while applying the HETM algorithm; for a system in statistical equilibrium, the standard HETM algorithm will attempt to cause equal average CPU usage by each class, as described by (2).

$$\sum_{j=1}^{M_P} P_j\,c_{ij} = \sum_{j=1}^{M_P} P_j\,c_{kj}, \qquad \forall\, i,k \in [1, M_C] \qquad (2)$$
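The equal-usage constraint (2) can be illustrated numerically. For two classes and two mappings it reduces to a linear equation in $P_1$ (with $P_2 = 1 - P_1$); the $c_{ij}$ values below are hypothetical, not measurements from the paper:

```python
# Illustrative check of constraint (2) for two classes and two mappings:
# find P1 (P2 = 1 - P1) so both classes get the same average CPU usage.
# c_ij = average CPU usage of class i under mapping j (hypothetical values).

def equalizing_mapping_probs(c11, c12, c21, c22):
    """Solve P1*c11 + P2*c12 = P1*c21 + P2*c22 with P1 + P2 = 1."""
    # Linear in P1: P1*(c11 - c21 - c12 + c22) = c22 - c12
    denom = (c11 - c21) + (c22 - c12)
    p1 = (c22 - c12) / denom
    return min(1.0, max(0.0, p1))      # clamped: equality may be infeasible

p1 = equalizing_mapping_probs(c11=0.5, c12=0.25, c21=0.2, c22=0.6)
p2 = 1.0 - p1
usage_class1 = p1 * 0.5 + p2 * 0.25
usage_class2 = p1 * 0.2 + p2 * 0.6
print(round(p1, 3), round(usage_class1, 3), round(usage_class2, 3))
```

With these values both classes end up with the same time-average usage, as (2) requires; the clamp reflects the paper's caveat "if it is possible".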
Reference [19] also establishes that the HETM algorithm can have a marked effect on performance measures related to thruput and turnaround. These observations motivated the development of two other dynamic dispatching algorithms with different optimization goals. These algorithms are based on the tendency of HETM to equalize the "control variable" $CPU_{i,\ell}$ of equation (1) over all job classes in the dynamic priority group, as pointed out in (2).

VARIATION I - EQUAL JOB CLASS TURNAROUND DYNAMIC DISPATCHING

The point of designing a new variation of the HETM algorithm is that thruput as a measure of system performance should be defined not solely in terms of keeping the resources of the system active, but also in terms of the system's ability to process the given workload. Where before thruput was measured by the degree of system resource balance and utilization, it is now defined in terms of the average rate of job completion.

An appropriate definition of thruput for an equal worth job class scheduled system is the time average rate of job completion requiring equal average turnaround for all job classes. For optimization purposes, the equal turnaround restriction can be expressed as the constraint that jobs from different classes are processed at average rates proportional to the number of jobs waiting execution in the different class queues. If $P_j$ denotes the probability of the system operating under priority mapping j while applying the equal job class turnaround dispatching algorithm, the equal job class processing rate constraint is expressed in (3).
can be expressed as the constraint that jobs from different classes are processed at average rates proportional to the number of jobs waiting execution in the dif ferent class queues. If P. denotes the probability of the system operating under priority mapping j , while applying the equal job class turnaround dispatching algorithm, the equal job class processing rate constraint is expressed in (3)
340
Y. SHOHAT and J.C. STRAUSS M
M
.Ρ p. , _P ? JJïTn " S i JJiT¿ Jil N ^ j^l N k C k where:
N
k li , [l.M TVI. 1 K c n J '
c
!
(3)
= the number of jobs of class i to be completed
C. = average CPU time of a class i job c . = average CPU usage of a class i job running in priority mapping j . As indicated previously, the effect of the HETM dynamic dispatching algorithm is to provide equal time average values of its control variable over the classes, if it is possible.
Equation (3) describes a situation where it is desired that the
class proportional CPU usage have equal average values for all job classes. Therefore, if the class proportional CPU usage (CPU CPC,
) defined in (4) 1 ,x
, f ^ x
is employed as the control variables in the HETM algorithm described in equation (1), equal average job class turnaround will be achieved where possible.
Refer
ence [21] establishes the optimality of this algorithm in the sense that it pro vides maximum thruput in addition to equal average job class turnaround. VARIATIONII EQUAL JOB CLASS RESPONSE RATIO DYNAMIC DISPATCHING It is often the case that the goal of VariationI is not realistic.
This would
particularly be the case of large imbalances in the number of jobs in the class queues.
In such a case the control algorithm of VariationI could delay process
ing of a class of small jobs in favor of a class of many large jobs.
While using
a similar approach as [8] which presents the idea of highest response ratio job initiation, reference [20] extends this idea to develop another dynamic dispatch ing algorithm. Using the presented method a new control variable for (1) is defined as follows: The average job class response ratio may be described as: Average class „ „ .. turnaround time Response Ratio » Average class uniprogramming processing time
if
r±
°i Average uniprogramming elapsed time for job class i
(5)
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS
where
341
C. = Average CPU time of a class i job
then
O.T^
= Average uniprogramming elapsed time time of class i job
To specify an equal average class response ratio constraint the denominators of the terms in equation (3) need only be divided by the appropriate class unipro gramming elapsed times.
A simple cancelation of C. yields:
M
M Ρ
Ρ p
Σ ^ fa. Σ j=l
* Ν
j=l
Λ
j fa. ì
where P., c.., N, are defined as before. j ij ι response ratio (CPR. ) defined in (7)
CPR. , =
CPU
'
V i , k e [l,Mc]
(6)
ΝΛ Therefore, if the class proportional
i,x
(7)
Ά,χ
is employed as the control variable in the HETM algorithm of equation (1), equal class response ratio will be achieved where possible. EVALUATING THE DISPATCHING ALGORITHMS USING QUEUING NETWORK MODELS A simple queuing model is developed in reference [20] to investigate the proper ties of the dispatching algorithms presented above.
Queuing theory is widely
used to investigate multiprogramming computer systems.
The most popular type of
models applied for this purpose is the closed queuing network model, first in vestigated by Jackson [13] and by Gordon and Newell [12]. In the last several years, work on this application has produced a variety of models meant to capture important aspects of computer systems.
Arora and Galo [3], Baskett [6], Buzen
[9], Mitrani [15], Moore [16], Tanaka [22] and other apply this type of models to multiprogramming computer systems.
Adiri [1] and Buzen [10] review the applica
tion of queuing network models to computer systems. Three major problems immediately arise when one tries to employ such a model to the problem of dynamic dispatching algorithms.
The first problem emerges from
the fact that the model is closed, which means that the number of tasks in the computer system is assumed to be fixed. not taken into consideration. heavy loaded system.
External queues or task completion are
Such an assumption is reasonable when studying a
However, in order to investigate the relative performance
of the different dispatching algorithms over a wide range of system loading
342
Y. SHOHAT and J.C. STRAUSS
conditions, an open system model in which external queues and task completions are considered, should be employed.
The open system model is of particular impor
tance when evaluating VariationI Equal Job Class Turnaround and VariationII Equal Job Class Response Ratio, since both algorithms employ the length of the different classes external queues. The second problem is that in the closed queuing network models, all tasks are assumed to be indistinguishable in the sense that all have the same CPU or I/O devices service rates.
Clearly, such an assumption does not fit a model for eval
uating dispatching algorithms, since the motivation for developing such algorithms is to improve system thruput using the different CPUI/O characteristics of the different classes.
Therefore, the model required to represent the main effects
of a dispatching algorithm on performance has to account for the differences in CPU and I/O service rates among the different classes. The third problem involves the CPU queue service disciplines.
In the closed
queuing network models, tasks are served on a FCFS basis, while the CPU service discipline control to the dispatching algorithms studied here is of preemptive resume priority dispatching discipline.
Therefore, another modification in the
model is required to accommodate such a discipline. Other authors have treated some of these problems.
Anderson [2] modifies the
closed model to accommodate different service rates for different tasks (thus treating the second problem) and presents an approximation to the preemptive priority dispatching problem (third problem) while still dealing with a closed model.
AviItzhak and Heyman [4] developed an approximate open queuing model
(first problem) while still dealing with indistinguishable tasks.
Baskett et al
[5] presents the most general open model with different classes of customers. Reference [5], however, does not discuss preemptive priority dispatching disci pline (third problem). Reference [20] addresses the problems of open system, differentiation of classes, and preemptive resume priority discipline for a two job class example.
The
solution is presented in the following sections. TWO JOB CLASS EXAMPLE Figure 2 presents a schematic description of the general two class model. of type i (i = ct,ß)
arrive from a Poisson source to the class i queue.
Jobs
Jobs of
type i are characterized by: λ.
»
Average arrival rate.
μ
■
Average service rate in preemptive exponential CPU server.
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS
343
Average service rate in FCFS exponential DTU.
P^
Probability of completion of job after DTU service.
Class i external queue discipline is FCFS.
Job type i enters the system only when
no other job from the same type is being served by the system; so the system is multiprogramming only if there are jobs from both classes in it, otherwise the system is uniprogramming or idle.
—.
CPU
DTU
Queue Arriving Jobs a
Job flow
V
α β Job Completion Figure 2.
Performance observation and control
Two job class system with dynamic dispatching
Internal to the system, a higher pre-emptive CPU priority job denoted as job a, and a lower priority job b, compete for service from the CPU and the DTU. is modeled as an exponential server with average service rate μ.. pletion
The CPU
Following com
of CPU service, a job either begins DTU service immediately or waits in
the DTU queue for completion of the other job being processed by the DTU. is modeled as an exponential server with average service rate δ..
The DTU
Following com
pletion of DTU service, with probability p., job i completes and leaves the system, or, with probability 1-p , moves to the CPU queue and begins a new cycle.
Upon
leaving the system, the job is immediately replaced by another type i job, if there is one waiting for service. A controller (the k
dynamic dispatching algorithm) observes the behavior of the
system through the CPU usage of the two jobs and the lengths of the two jobs ex ternal queues.
Every Δ time units, at the end of each control interval, the con
troller may change the relationship (mapping) between the external job types
(α,β)
344
Y. SHOHAT and J.C. STRAUSS
and the internal priority designations (a,b). MODELS USED IN REFERENCE [20] The general system model is developed as two constituent models:
(1) a closed
model for the system uniprogramming one of the two job types when there are no jobs of the other type to process; and (2) a closed model for the system multipro gramming both a and β jobs.
The results of these models are then synthesized to
an open composite model which is used to discuss the performance effects of the different dispatching algorithms. Uniprogramming Closed Model This model assumes that there is only one type of job, i, (a or β) to be pro cessed.
The job is served by the CPU and then moved immediately to the DTU.
There is no waiting in the CPU or DTU queue.
The job has a probability ρ of
leaving the system after completion DTU service, and a probability 1p. of moving back to the CPU to start a new service cycle.
If the job does leave the system,
it is immediately replaced by another job. Therefore, it is assumed that there is always one job in the system.
The last assumption makes the model, in effect,
a closed one. This model yields the following: Job Uniprogramming CPU and DTU utilizations :
^uiïïTl^
*«.*
»)
μ
±
— DTU , »
T=r S μ 1 í
i α, β
Job Uniprogramming Average Service Rates: R
ui
°¿~T77
i=a ß
( )
'
Multiprogramming Closed Model This model assumes that jobs of both types ( Of and β) ate always present. The model describes the internal system and recognizes two job types:
(1) job a,
with the highest CPU preemptive priority; and (2) job b, with the lowest CPU preemptive priority.
The relation between jobs α and β and the model notation
a and b is determined by the appropriate priority mapping: mapping 1 denote the relation a=ct, b=/3, while mapping 2 denote the relation a=/3, b=a. The probability of the system to be under mapping j (j = 1,2), P., will be established by the long term effect of the controller (dispatching algorithm).
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS A job in the system can be in one of four possible positions:
345 (0) waiting for
CPU service; (1) receiving CPU service; (2) waiting for DTU service, or (3) re ceiving DTU service.
The contents (a, b, x) of the last three positions are used
to represent the state of the system.
(An χ denotes the poisition being vacant.)
In this notation for example, state bxa is the state of job b being processed by the CPU and job a being processed by the DTU. the following states:
The nature of the model suggests
axx, bxa, xba, axb, xab.
Assuming the system is in steady state, the state probabilities (ρ , pr , etc.) axx b χ. ι can be determined by solving the state transition balance equations as in [20] where j denotes the mapping under which the system is operating.
These probabili
ties then may be used to obtain the following: CPU Multiprogramming Utilization:
CPUJ = pj + pJ a axx axb (10)
cml = P^ b
bxa
DTU Multiprogramming Utilization:
DTUJ = PJ. + P¿ a xba bxa (11) 1
J
DTtt? = V , + P . b xab axb
Multiprogramming Effective Service Rate:
R3 = C P U ^ p ^ ea a a a (12) *eb *
W
b"bPb
When the system is operating under the k
dynamic dispatching algorithm, the k th algorithm will establish a steady state probability p. of operating in the j 1 k mapping. Given the p. it is possible to develop system performance measures in 3
terms of the external jobs type a and β for the k
t n
algorithm as follows:
346
Y. SHOHAT and J.C. STRAUSS Job Multiprogramming CPU Utilization: CPU* = F* CPU1 + P^ CPU2 ma 1 a 2 b (I·3)
k k' 1 k 2 CPU . = Ρ, CPUr + Ρ , CPU mp 1 b 2 a Job Multiprogramming DTU Utilization: D T U k = P^ D T U 1 + pij DTU,2 ma 1 a 2 b k
k
1
V
(14)
0
DTUV. = Ρ, DTU;" + Ρ* DTU mp 1 b 2 a Job Multiprogramming Service Rates: 1 2 H* fc ma P Ï 1R ea +P$R 2 eb
(15) R*,, = P^R 1 , + P^R 2 mp 1 eb 2 ea Composite Open Model The above uniprogramming and multiprogramming models are closed models in the sense that they do not account for the effects of empty job queues.
These models
ignore arrivals or completions of jobs and therefore cannot be used to estimate performance measurements such as job class turnaround time and job class response ratio.
These models, however, provide the potential job class service rates of
the system when uniprogramming or multiprogramming, as demonstrated in (9) and (15).
Using these results it is possible to develop the following composite open
model:
The basic idea is that the external behavior of the two queues server de
scribed in Figure 2, can be represented as two separate single queue servers, one for each class.
The rate at which a server processes its own class jobs depends
on whether or not a job from the other class is present in the system.
When there
is no job from the other class, the server processes its own class jobs at a uni programming service rate (9). When there is a job from the other class the server processes its own class jobs at a multiprogramming service rate (15). As a first approximation reference [20] assumes that the equivalent composite _k server for each class is an exponential server with average service rates R and R
n
for the k
dispatching algorithm.
Regardless of the exponentiality assumption, queuing theory gives the utilizations of the two composite servers a s :
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS
347
Job Class Composite Utilization:
λα
k *ca
· »
kk
R ca
*cß *cp
k k
(16)
c/3
These composite server utilizations facilitate the computation of the composite service rates in terms of the multiprogramming model rates of equation (15) and the uniprogramming model rates of equation (9) as follows: Job Class Composite Service Rate
uà (17)
The simultaneous solution of (16) and (17) and the exponentiality assumption facilitates the computation of the average composite class turnaround and the average composite class response ratio as follows:
Job Class Composite Turnaround
wk = L Ca W* = CP
R
λ C
l
"
cß
β
(18)
Job Class Composite Response Ratio: k k — RR K = W K R ca ca ua
k
ί19)
k
Κβ ' "cß \ß The Mapping Probabilities
The performance measures of equations (16-19) involve the computation of the mapping probabilities p. for each of the four dispatching algorithms. J —k —k probabilities are known, they can be used to calculate R (15) to (16-19) yields the desired results.
Once these
and R „ (15). Applying
The sequel develops the mapping prob
abilities for each of the four dispatching algorithms discussed earlier. OS job class priority (K=l) Under standard OS job class priority there is no dynamic dispatching.
CPU
348
Y. SHOHAT and J.C. STRAUSS
priority is directly determined by job type and the system operates in one mapping or the other. Thus, if a has highest priority Ρ 1 1 = 0, Ρ = 1 .
= 1, Ρ
= 0 and if β has highest
priority Ρ
HETM priority control (K=2) As presented in (10), standard HETM forces the average CPU utilization of the two job classes to be equal, if possible, while the system is multiprogramming. By ——2 2 2 2 setting CPU and CPU . of (13) equal, P, and Ρ can be easily determined, mû mp 1 ¿ Equal job class turnaround (K=3) As presented in (3) the equal job class turnaround dispatching algorithm forces, if possible, equal average job class turnaround while the system is multiprogram 3 3 k k It is then possible to solve for P, and P„ for which W = W „. 1 2 ma mp
mine.
Equal job class response ratio (K=4) As presented in (6), equal job class response ratio dispatching forces, if possi ble, the ratio of average job class turnaround to average class uniprogramming processing time to be equal across classes while the system is multiprogramming. For the cylic server model, the uniprogramming processing time of a type i job is 4 the reciprocal of the rate given in (9). Thus the mapping probabilities Ρ and 4 ή _ ¿, _ i P. can be determined by solving W R = W R . 6 2 ' mo; ua mp up EXACT SOLUTION Reference [20] indicates that the density functions characterizing a cylic server model like the one developed here, are more nearly hyperexponential than exponen tial.
Results obtained from a descriptive simulation model [11] however suggests
that the solution developed in [20] gives a reasonable approximation to the real problem.
The problem is solved exactly here to gain a better understanding of
the limitations of the exponentiality assumption. To solve this problem exactly it is necessary to look more carefully into the simple composite models.
Contrary to the approach of [20], the single composite
server is presented here as a cyclic server with several exponential service sta tions in series.
Such a model represents accurately the behavior of a job in the
general model presented in Figure 2.
The following sections develop such a model
and use the results to solve the posed problem. Open Cyclic Server with Several Exponential Service Stations A cyclic server with k stations works in this way: 2, ..., k. queue.
the stations are numbered 1,
Jobs arrive from an external source and buffered into an external
A new job enters the system when the current job in the system terminates,
and starts its service at station 1. the job moves to station 2 and so on.
Upon completion of service at station 1, The job moves from one service station to
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS
349
the following one, until it completes service in station k thus completing one cycle.
From station k (the last station in the cycle) the job departs from the
system, with probability p, or moves to station 1 beginning a new cycle.
It is
further assumed here that the external source is Poisson with average arrival time λ, that all the service stations are exponential with average service rate μ. (i 1, 2, ..., k ) , and that there is no service time dependence among sta tions'. Now denote: t = service time: the time elapsing between the moment a specific s job enters the system and the moment it departs. R
= average service rate:
average number of jobs the system can
complete in a time unit. W
= average turnaround time:
the average time elapsing between the
moment a specific job joins the external queue and the moment it departs the system after completion. 9
= utilization of the open system.
2 Queuing theory yields the following results (E(X) and E(X ) denote the first and second moments of X ) : Average Service Rate:
R
— E(t s )
(20)
Open System Utilization:
ÍI
(2D
Average Turnaround Time: 2, λ E(t')
,
» = = +2 R
(1? )
(22)
Applying these results for the cylic system with exponential service stations yields :
k E 'O S
νΡ Σ ττ i=i η
(23)
344
Y. SHOHAI
350
and the internal priority designati
Y. SHOHAT and ,
and
MODELS USED IN REFERENCE [20]
E(t2) *ψ
The general system model is develop
¿
S
model for the system uniprogramming
Ρ
jobs of the other type to process; gramming both a and β jobs.
Ä*
■ 2(2ρ)
¿
Ρ
i,jl
The re
an open composite model which is us different dispatching algorithms.
Substituting (24) in (22), the turnaround and ρ
Uniprogramming Closed Model This model assumes that there is or cessed.
R
The job is served by the C
pZÜS>) i,T
There is no waiting in the CPU or I Where
R » —¡E— k
leaving the system after completioi
ι
back to the CPU to start a new ser\
Σ—
it is immediately replaced by anotl
i1 "i
is always one job in the system,
equation (25) will be needed later on for
a closed one.
written as:
w , L + M2P) (± R 1Î [R2
This model yields the following: Job Uniprogrammi CPU , = -¡r— ui
M.
Error Estimation The result presented in (25) can be used t
DTU . = ui μ±
nentiality assumption expressed in equatie A cyclic server with k stations of service
Job Uniprogrammi
ρ is:
ȴ
R ui
approximating such a server as a single ex Multiprogramming Closed Model
W as:
This model assumes that jobs of be
W
model describes the internal syste
1 Rλ
k ρμkX
with the highest CPU preemptive f preemptive priority.
The relatie
while the exact solution, W*, obtained fro
a and b is determined by the apprc
w*
relation a=ct, b=/3, while mapping '.
. L ρμ
+
*k(fcfl)(2p) 2pp(ppXk)
of the system to be under mapping long term effect of the controllei
„. Thus:
W* . , λ ,1 k+1. — 1 + jj ( — )
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS
351
clic server with k identical exponential service stations, and a ) for completion, equation (31) yields the following results:
)
The approximate solution W approaches the exact solution W* as p, >n probability, approaches TJT,
thus for a system with many service
> approximation will be good for modeling the behavior of "large jobs,1 |obs with small completion probability (therefore, circulating the :imes). >ressed in tei For the system to be in a steady state situation it can be shown Itive error, ER, defined as:
ER
WW*
(30)
w
kil s E R s M i.
(31)
For this mal range of the relative error, ER, is ΓΤΤ".
This range is growing
However, it is limited by 1 which indicated that the approximation rence [20] is reasonable. d*LM4+M2AW2 0
»iW-ìH
todel
exact solution utilizes most of the results obtained in the approxi ji, the general model is slightly different. the error invi model constituted of two models: (19).
For each job type i there
(1) job type i uniprogramming cyc
Ddel; and (2) job type i multiprogramming cyclic server model.
For
2, a composite model is then synthesized in a way which maintains the probability [cs
Q j t^e
cyclic server model.
The uniprogramming model is identical
resented in the first solution (equations (89)).
The job type i
ning and the job type i composite models, are developed in the follow erver gives t Uniprogramming Model uniprogramming can be modeled using a cyclic server with 4 stations: enoted as CPUQ), CPU, DTU queue (denoted as DTUQ), and the DTU.
In
a multiprogrammed job gets "service" in the CPUQ while waiting the ob to complete CPU service.
After the other job completes its serv
PU, the modeled job moves to the CPU.
After completion of CPU serv
is "served" by the DTUQ while the other job is served by the DTU. Ü is freed, the job moves to the DTU. depart from the system with cycle.
After completing DTU service,
probability p, or move to the CPUQ and
352
Y. SHOHAT and J.C. STRAUSS
The average service rate of the CPU is $\mu_i$ and the average service rate of the DTU is $\delta_i$. The average service rates of the CPUQ and the DTUQ depend on the dispatching algorithm under which the system is operating. Denoting the average CPUQ multiprogramming service rate under the k-th algorithm as $\zeta^k_{mi}$, and the average DTUQ multiprogramming service rate under the k-th algorithm as $\theta^k_{mi}$, the model can be described by Figure 3.

Figure 3. Job type i multiprogramming model (second solution)
The CPU and DTU multiprogramming utilizations are presented in equations (10)-(11) and (13)-(14). The CPUQ and DTUQ utilizations can be calculated in the same way:

Job Multiprogramming CPUQ Utilization: CPUQ^k_mα and CPUQ^k_mβ, given by equation (32).

Job Multiprogramming DTUQ Utilization: DTUQ^k_mα and DTUQ^k_mβ, given by equation (33).
Applying the balance equation technique, ζ^k_mi and θ^k_mi are calculated as:

    ζ^k_mi = (CPU^k_mi / CPUQ^k_mi) μ_i ,    θ^k_mi = (DTU^k_mi / DTUQ^k_mi) δ_i        (34)
AN ANALYTIC MODEL OF DISPATCHING ALGORITHMS
Substituting (34) into the turnaround equation presented in (26) gives the job multiprogramming turnaround W^k_mi as equation (35), where the multiprogramming service rate R^k_mi is given by equation (36). Equation (35) should be used instead of equation (18) when calculating P^k_i and the mapping probabilities, for the equal job turnaround algorithm (k=3) and the equal job response ratio algorithm (k=4).

Job Type i Composite Model

The composite model developed here uses the same idea demonstrated in the first solution.
In the exact solution, however, the single server is expanded into a cyclic server with four service stations: CPU queue, CPU, DTU queue and DTU. This is done to use the same idea presented before; namely, that the CPU queue and DTU queue can be observed as service stations that the job must pass through while being served by the system (**). The average service rates of the CPU and DTU are μ_i and δ_i respectively, while the average service rates of the CPUQ and DTUQ depend on whether or not a job from the other class is present in the system. Denote by ζ^k_ci the job type i average CPUQ composite service rate under the kth algorithm, and by θ^k_ci the job type i average DTUQ composite service rate under the kth algorithm; the composite model is then similar to the model described in Figure 3. Using the CPU and DTU uniprogramming utilizations computed in (8) (the CPUQ and DTUQ uniprogramming utilizations equal 0), the CPUQ, CPU, DTUQ and DTU multiprogramming utilizations computed in (13)-(14) and (32)-(33), and Q^k_cα and Q^k_cβ, the job class

(*) When job type i always has the highest priority, CPUQ_i = 0. In this case the model should be modified to a model with only three service stations: CPU, DTUQ and DTU.
(**) Since the CPU and DTU are exponential service stations, due to the "memoryless" property of exponential servers, the CPUQ and DTUQ are exponential service stations and equation (26) can be used.
composite utilizations, computed in (16), it is possible to compute the composite utilizations of the various service stations:

Job Composite CPUQ Utilization:

    CPUQ^k_cα = Q^k_cβ CPUQ^k_mα ,    CPUQ^k_cβ = Q^k_cα CPUQ^k_mβ        (37)

Job Composite CPU Utilization:

    CPU^k_cα = Q^k_cβ CPU^k_mα + (1 - Q^k_cβ) CPU_uα ,    CPU^k_cβ = Q^k_cα CPU^k_mβ + (1 - Q^k_cα) CPU_uβ        (38)

Job Composite DTUQ Utilization:

    DTUQ^k_cα = Q^k_cβ DTUQ^k_mα ,    DTUQ^k_cβ = Q^k_cα DTUQ^k_mβ        (39)

Job Composite DTU Utilization:

    DTU^k_cα = Q^k_cβ DTU^k_mα + (1 - Q^k_cβ) DTU_uα ,    DTU^k_cβ = Q^k_cα DTU^k_mβ + (1 - Q^k_cα) DTU_uβ        (40)
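The mixing performed by equations (38) and (40) can be illustrated numerically. All values below are invented; the helper simply encodes the weighted-average form Q (multiprogramming utilization) + (1 - Q) (uniprogramming utilization).

```python
def composite(q_other, util_multi, util_uni):
    """Weighted-average (mixture) form used by equations (38) and (40)."""
    return q_other * util_multi + (1.0 - q_other) * util_uni

# Invented values: the other class is present 60% of the time; the CPU
# utilization is 0.45 under multiprogramming and 0.70 under uniprogramming.
cpu_c = composite(0.6, 0.45, 0.70)
# 0.6 * 0.45 + 0.4 * 0.70 = 0.55
```

The composite utilization interpolates between the two regimes according to how often the other job class is present.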
Applying the balance equation technique, ζ^k_ci and θ^k_ci are calculated as (*):

    ζ^k_ci = (CPU^k_ci / CPUQ^k_ci) μ_i ,    θ^k_ci = (DTU^k_ci / DTUQ^k_ci) δ_i        (41)

Substituting (41) into the turnaround equation presented in (26) gives the job composite turnaround as:

(*) See footnote for equation (34).
W^k_ci, given by equation (42), where the composite service rate R^k_ci is given by equation (43). Equation (42) should be used instead of equation (18) when computing the job class composite turnaround and response ratio, needed to evaluate the various dispatching algorithms.

RESULTS

A computer program has been written to compute general performance measurements for the two job class scheduled system presented here. These performance indicators include the mapping probabilities (P^k_i), as well as the composite service rate (R^k_ci), utilization (Q^k_ci), turnaround time (W^k_ci), and response ratio (RR^k_ci). The program computes the measurements for each of the four dispatching algorithms (k = 1, ..., 4), first using the approximate solution presented in reference [20], and then using the exact solution developed here.
The presented results are used to compare the approximate solution with the exact solution. Two examples of different system characteristics are used for comparison purposes (see Table 1). These examples, run with different values of λα and λβ, demonstrate the behavioral characteristics of the relative error of the approximate solution as a function of the demand on the system, and the utilization of the system.

TABLE 1. Two Examples

              Job Type α               Job Type β
    Example   μα    δα    pα           μβ    δβ    pβ
    1         10    20    .5           15    10    .5
    2         10    20    .5           15    10    .3
Tables 2-5 present results obtained for examples 1 and 2, which reveal the following:

(1) For Standard OS Fixed Priority Dispatching (Table 2), the absolute value of the relative error of the turnaround time (|ER(W)|) varies between 0.01 and 0.133 for the various combinations of demand (λα, λβ) and mapping priority (P).

(2) For HETM Dynamic Dispatching (Table 3), the absolute value of ER(W) varies between 0.004 and 0.124.

(3) For Equal Job Class Turnaround (Table 4), the absolute value of ER(W) varies between 0.004 and 0.112.

(4) For Equal Job Class Response Ratio (Table 5), the absolute value of ER(W) varies between 0.006 and 0.106.

A similar range of values (0.0 to 0.14) for the absolute value of the relative error of the turnaround time is found to be typical in more than 100 other runs of other examples which are not recorded here. The fact that the maximum relative error recorded is not more than 14 percent strongly suggests that the approximate solution presented in [20] is reasonably accurate and yields results close to the exact results obtained by the second solution. However, it is interesting to note that the approximate solution shows the dispatching algorithms to be more effective than is evident in the exact solution; that is to say, for the equal job class turnaround dispatching algorithm, the approximate results show W_cα and W_cβ to be closer to each other than in the exact results, and for the equal job class response ratio dispatching algorithm, the approximate results show RR_cα and RR_cβ to be closer to each other than is indicated by the exact results.

Aside from this reservation, the results obtained here strongly support the conclusions reached in reference [20]. These conclusions are, in brief:

(1) The standard HETM algorithm ensures better system performance compared to using fixed priority dispatching with poor information on workload characteristics.

(2) The equal job class turnaround algorithm better balances response to actual workload for heavily loaded systems.

(3) The equal job class response ratio offers an interesting compromise between the HETM and equal turnaround algorithms, while still allowing relative class response to be influenced by the relative class workload demand.

CONCLUSIONS

This paper develops an exact solution to a queuing model for evaluating dynamic dispatching algorithms.
The solution is compared to an approximate solution
TABLE 2. Standard OS Fixed Priority Dispatching (k=1): approximation results, second (exact) solution results, and relative errors |ER(P)| and |ER(W)| for examples 1 and 2 over several (λα, λβ) pairs.
TABLE 3. HETM Dynamic Dispatching (k=2): approximation results, second solution results, and relative errors |ER(P)| and |ER(W)| for examples 1 and 2.
TABLE 4. Equal Job Class Turnaround (k=3): approximation results, second solution results, and relative errors |ER(P)| and |ER(W)| for examples 1 and 2.
TABLE 5. Equal Job Class Response Ratio (k=4): approximation results, second solution results, and relative errors |ER(P)| and |ER(W)| for examples 1 and 2.
developed in reference [20]. Results obtained from the exact solution verify that the approximate results are reasonably accurate, and thus support the major conclusion in reference [20], namely: that the dynamic dispatching algorithms improve system performance, each algorithm according to its stated goals. This paper demonstrates the usefulness of approximate solutions to simple analytic models as a tool for the system designer.

ACKNOWLEDGMENT

The authors wish to acknowledge the patient and capable assistance of Leola Peterson Goodman in the preparation of the manuscript of this paper.

BIBLIOGRAPHY

1. Adiri, I., "Queuing Models for Multiprogrammed Computers", IBM Research Report RC-3802, Yorktown Heights, N.Y., March 30, 1972.
2. Anderson, H.A., "Approximating Pre-emptive Priority Dispatching in a Multiprogrammed Model", IBM J. Res. and Develop., Nov. 1973.
3. Arora, S. and Gallo, A., "The Optimal Organization of Multiprogrammed Multilevel Memory", Proc. ACM SIGOPS Workshop on System Performance Evaluation, New York, April 1971.
4. Avi-Itzhak, B. and Heyman, D.P., "Approximate Queuing Models for Multiprogramming Computer Systems", Operations Research, 21, 6, Nov-Dec 1973.
5. Baskett, F. et al., "Open, Closed and Mixed Networks of Queues with Different Classes of Customers", J. ACM 22, 2, April 1975, pp. 248-260.
6. Baskett, F., "Mathematical Models of Multiprogramming Computer Systems", Ph.D. Thesis, U. of Texas, Austin, Dec. 1970.
7. Bowdon, E.K. Sr., Mamrak, S.K. and Salz, F.R., "Performance Evaluation in Network Computers", Proc. of the Symposium on Simulation of Computer Systems, June 1972.
8. Brinch Hansen, P., "An Analysis of Response Ratio Scheduling", Proc. of IFIP Congress 71, Ljubljana, Yugoslavia, August 1971.
9. Buzen, J., "Analysis of System Bottlenecks Using a Queuing Network Model", Proc. ACM SIGOPS Workshop on System Performance Evaluation, New York, April 1971.
10. Buzen, J.P., "Queuing Network Models of Multiprogramming", Ph.D. Thesis, Harvard University, 1971.
11. Chiang, A.T., "A Simulation Study of Dynamic Dispatching", Master's Thesis, Computer Science Dept., Washington University, St. Louis, Mo., May 1974.
12. Gordon, W.J. and Newell, G.F., "Cyclic Queuing Systems With Exponential Servers", Operations Research 15, 254 (1967).
13. Jackson, J.R., "Networks of Waiting Lines", Operations Research, 5, 518 (1957).
14. Lan, J.C., "A Study of Job Scheduling and Its Interaction with CPU Scheduling", Master's Thesis, The University of Texas at Austin, 1971.
15. Mitrani, I., "Nonpriority Multiprogrammed Systems Under Heavy Demand Conditions - Customer Point of View", J. ACM 19, 3, 1972.
16. Moore, C., "Network Models for Large Scale Time Sharing Systems", Ph.D. Thesis, U. of Michigan, Ann Arbor, April 1971.
17. Northouse, R.A. and Fu, K.S., "Dynamic Scheduling of Large Digital Computer Systems Using Adaptive Control and Clustering Techniques", IEEE Transactions on Systems, Man and Cybernetics, SMC-3, 3, pp. 225-234, May 1973.
18. Shohat, Y., "An Analytic Model of Dynamic Dispatching Algorithms", Master's Thesis, Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, Pa., May 1976.
19. Strauss, J.C., "An Analytic Model of the HASP Execution Task Monitor", Comm. ACM 17, 12 (Dec. 1974), pp. 679-685.
20. Strauss, J.C., "Dynamic Dispatching in Job Class Scheduled System", Proc. AFIPS 1975 NCC, vol. 44, AFIPS Press, Montvale, N.J., pp. 343-350.
21. Strauss, J.C. and Chiang, A.T., "Priority Control for Maximum Thruput with Equal Job Class Turnaround", Proc. of Computer Science and Statistical Symp., Iowa State University, Oct. 1973.
22. Tanaka, H., "An Analysis of On-Line Systems Using Parallel Cyclic Queues", J. Inst. Elec. and Comp. of Japan, 53-C, 10 (1970).
23. The HASP System, Doc. for IBM Type 3 Program, HASP-II, version 3.0, No. 360-D-05.1.014, IBM Corp., Feb. 26, 1971.
24. Wulf, W.A., "Performance Monitors for Multiprogrammed Systems", Proc. of the Second Symposium on Operating Systems Principles, ACM, New York, N.Y., pp. 175-181, 1969.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed.
© North-Holland Publishing Company (1976)

PRACTICAL CONSIDERATIONS IN THE NUMERICAL ANALYSIS OF MARKOVIAN MODELS

William J. STEWART
IRISA, Université de RENNES, 35031 FRANCE
1. Introduction.

Although analytic techniques, such as those of Jackson, [11], Gordon and Newell, [10], etc., are an important tool for the investigation of queueing networks, their applicability is, despite the recent advances in more generalized models (Baskett et al., [1], Buzen, [3], Chandy, [4], etc.) and with approximate techniques (Gaver, [7], Gelenbe, [8,9], Kobayashi, [13]), still rather restrictive. Simulations, on the other hand, are often very expensive and sometimes produce results which are less than satisfactory.

A numerical approach lies somewhere between these two extremes. It permits a wider class of problems to be solved than with analytic techniques, although it does not possess the latter's advantages of parametrization and very low cost. Also, it is considerably less expensive and more accurate than simulation, but is unable to compete against the wealth of detail which may be incorporated into a simulation.
If a system can be modelled by a continuous-time Markov chain, then numerical
techniques may be employed to determine the stationary (or long-run) probability vector of the system, i.e. the vector whose length is equal to the number of states which the system can occupy, and whose i-th component denotes the probability of the system being in state i after a period of time which is sufficiently long to erase the influence of the starting state. From such stationary probabilities, system characteristics like the busy/idle periods of the servers and the probability distribution of queue lengths may be derived.

Numerical techniques have been extensively used by Wallace (Recursive Queue Analyser, RQA-1) and have met with much success. Unfortunately, the numerical solution of Markov chains has not been as widely adopted as its benefits would warrant. It is often the case that the natural and straightforward way to model a system is by a Markov chain and subsequent numerical analysis. Rather than do this, however, systems analysts prefer to distort the model to make it conform to one for which an analytic solution is possible, or, if this fails, to turn immediately to simulation.

Perhaps an important reason why this is so is the fact that the standard techniques sometimes fail to converge to the correct result, and this despite the existence of convergence proofs. Analysts therefore may understandably be wary of such techniques. In the following section, the convergence properties of the standard numerical techniques are examined, and the cases where convergence will be difficult, or even impossible, are exposed. In particular, it will be shown that decomposable, or nearly decomposable, systems result in matrices which are exceptionally difficult to handle with the standard techniques. A new numerical approach, which surmounts the problems created by difficult systems, is then presented.

In the second part of this paper, a complete Fortran software package, MARCA: Markov Chain Analyser, which embodies the new approach for the numerical solution of Markovian networks, is presented. Its purpose is to help a user construct his model, to analyse it, and to produce meaningful results. With such a package it is envisaged that he may simply and inexpensively conduct a wide variety of modelling experiments with only a minimum of knowledge of both Markov chain theory and numerical linear algebraic techniques.
2. Numerical Considerations in Markov Modelling.

Consider a system which is modelled by a continuous-time, homogeneous Markov chain with discrete state space. Let P_i(t) be the probability that the system is in state i at time t; then

    P_i(t+δt) = P_i(t) {1 - Σ_{k≠i} s_ik δt} + {Σ_{k≠i} s_ki P_k(t)} δt

where s_ki is the rate of transition from state k to state i, and n is the total number of states. Let s_ii = -Σ_{k≠i} s_ik; then

    P_i(t+δt) = P_i(t) + {Σ_{k=1..n} s_ki P_k(t)} δt

    lim_{δt→0} [P_i(t+δt) - P_i(t)] / δt = Ṗ_i(t) = Σ_{k=1..n} s_ki P_k(t)

In matrix notation, Ṗ(t) = S^T P(t). At steady state, the rate of change of P(t) is zero, and therefore

    S^T P = 0        (1)

where P(t) is now written as P. From equation (1),

    (S^T Δt + I) P = S^T Δt P + P = P

where Δt is arbitrary, i.e.

    W^T P = P        (2)

where W = S Δt + I.

It is possible to select Δt such that W is a stochastic matrix, which may be regarded as the transition probability matrix for a discrete-time Markov system in which transitions take place at intervals of Δt, Δt being sufficiently small to ensure that the possibility of two changes of state within this interval is negligible. From the method of construction of this matrix, it may be shown that there always exists a unit eigenvalue and that no other eigenvalue exceeds this in modulus. The required vector P is therefore the left-hand eigenvector corresponding to the dominant eigenvalue of the stochastic matrix W.

When numerical techniques are to be used, then because of the nature of the matrices involved (large and very sparse), iterative techniques are preferable to direct methods. This is due to the fact that the application of iterative techniques leaves the matrix unaltered. Consequently a very large saving in core requirements is effected, since the matrices, which would normally be much too large to keep in main memory, may now be stored in a compact form, i.e. the
only information stored is the non-zero elements and the positions of these elements in the matrix. Naturally a compact storage scheme is chosen which not only minimizes the core requirements, but which also permits the matrix to be easily post-multiplied by a vector, since, as we will see, this is the only operation required of the matrix.

Iterative techniques may also result in a decrease in the time requirements, since use may be made of initial approximations to the solution. This is often a very important consideration when a series of tests is being performed, such as the determination of the optimal degree of multiprogramming. The results obtained for a degree of multiprogramming k may be used to obtain an initial approximation to the solution for degree of multiprogramming k+1. This is particularly important when it is realized that as the degree of multiprogramming increases, so also do the size of the matrix and the amount of computation required.

When the dominant eigenvalue and corresponding eigenvector of a matrix are required, the power method is the usual iterative technique employed. This is the numerical technique which has been incorporated into RQA-1 and, apart from simultaneous iteration which is discussed below, is the only iterative method available for the partial eigensolution of a real unsymmetric matrix.

The convergence of the power method may be examined as follows. Let A be a square matrix of order n with eigensolution A x_i = λ_i x_i for i = 1, 2, ..., n, and suppose further that |λ_1| > |λ_2| ≥ |λ_3| ≥ ... ≥ |λ_n|. Consider the iterative scheme y_{k+1} = A y_k, with y_0 arbitrary. Writing y_0 = Σ_i α_i x_i, we have

    y_k = A^k y_0 = Σ_i α_i λ_i^k x_i = λ_1^k { α_1 x_1 + Σ_{i=2..n} α_i (λ_i/λ_1)^k x_i }.

It may be observed that the process converges onto the dominant eigenvector x_1, and that the rate of convergence is dependent on the ratios |λ_i|/|λ_1| for i = 2, 3, ..., n. The smaller these ratios are, the faster the convergence will be. In particular it is the magnitude of the subdominant eigenvalue λ_2 which determines the convergence rate. Thus in the case where the subdominant eigenvalues are close in modulus to the dominant eigenvalue, the rate of convergence will be very slow. Furthermore, if there is more than one eigenvalue of maximum modulus, then convergence will not be achieved.

Note that the analyst in search of a numerical technique for the solution of Markov chains is not obliged to work with eigenvalues and eigenvectors, but may use iterative equation-solving techniques to obtain a solution to the set of linear homogeneous equations given by (1). Iterative methods available are those of Jacobi, of Gauss-Seidel and of successive over-relaxation. However, the same problem concerning the convergence of these methods may also be observed; viz., when the eigenvalues of the iteration matrices of such methods are close to unity, convergence is very slow. A comparison of different numerical techniques for the solution of Markov chains has been given by Stewart, [15].

Consequently, models which give rise to stochastic matrices having more than one eigenvalue of unit modulus are not susceptible to solution by standard numerical techniques. Furthermore, in cases where there is a unique eigenvalue of unit modulus and several eigenvalues close to unity, the rate of convergence will be extremely slow. Such cases arise, respectively, when the matrices are decomposable or nearly decomposable. A stochastic matrix is said to be decomposable if it can be brought by a permutation of its rows and columns to the form, called the normal form of decomposable stochastic matrices, given by:
    W = | U_11                                          |
        |       U_22            ZERO                    |
        |             ...                               |
        |                  U_kk                         |
        |                        U_{k+1,k+1}            |
        | U_m1  U_m2  NON-ZERO          ...     U_mm    |        (3)

Here the diagonal matrices U_ii, i = 1, 2, ..., m, are square, non-zero and non-decomposable; all submatrices to the right of the diagonal matrices are zero, and those to the left of U_ii are zero for i = 1, 2, ..., k, and non-zero for i = k+1, k+2, ..., m. The diagonal submatrices are as follows:

    U_ii, i = 1, 2, ..., k : isolated or essential,
    U_ii, i = k+1, ..., m : transient.
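Before turning to the spectral consequences of this structure, the pieces introduced so far (the stochastic matrix W = S Δt + I of equation (2), compact storage of the non-zeros, and the power iteration) can be pulled together in a short sketch. The 3-state generator S below is invented for illustration; the actual RQA-1 and MARCA storage layouts are not reproduced here.

```python
# Power iteration p <- p W on W = S*dt + I, with W held in compact
# (row, col, value) form so that only non-zero elements are stored and used.

def to_compact(M):
    """Store only the non-zeros of M as (row, col, value) triples."""
    return [(i, j, v) for i, row in enumerate(M)
                      for j, v in enumerate(row) if v != 0.0]

def left_mul_compact(p, entries, n):
    """Row vector times compactly stored matrix: (p W)_j = sum_i p_i W_ij."""
    out = [0.0] * n
    for i, j, w in entries:
        out[j] += p[i] * w
    return out

# Transition-rate matrix S: s_ij is the rate from state i to state j (i != j);
# each diagonal entry is minus the sum of its row's off-diagonal entries.
S = [[-3.0,  2.0,  1.0],
     [ 1.0, -1.0,  0.0],
     [ 0.5,  0.5, -1.0]]
dt = 0.1                                   # small enough that W is stochastic
W = [[S[i][j] * dt + (1.0 if i == j else 0.0) for j in range(3)]
     for i in range(3)]
entries = to_compact(W)

p = [1.0 / 3.0] * 3                        # arbitrary initial approximation
for _ in range(2000):
    q = left_mul_compact(p, entries, 3)
    s = sum(q)
    q = [x / s for x in q]                 # renormalize each iteration
    if max(abs(a - b) for a, b in zip(p, q)) < 1e-13:
        p = q
        break
    p = q

# p converges to the stationary vector (2/9, 5/9, 2/9), which satisfies p S = 0.
residual = [sum(p[k] * S[k][j] for k in range(3)) for j in range(3)]
```

Because only the non-zeros of W are touched, the cost per iteration is proportional to the number of transitions in the model rather than to n squared.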
On application of the theorem of Perron-Frobenius, [14], it may be shown that each of the essential submatrices possesses a unique unit eigenvalue, and that the eigenvalue of largest modulus of the transient submatrices is strictly less than unity; in other words, the decomposable stochastic matrix given by (3) possesses exactly k unit eigenvalues.

However, models which give rise to matrices which are decomposable are extremely rare. Much more common are those whose matrices are nearly decomposable. If the matrix W given by equation (3) is now altered so that the off-diagonal submatrices which are zero become non-zero, then the matrix is no longer decomposable and W will have a unique unit eigenvalue. However, if these off-diagonal elements are small compared to the elements of the diagonal submatrices, then the matrix W is said to be nearly decomposable, since there are only weak interactions among the diagonal submatrices.

Consider, for example, a decomposable stochastic matrix S built from three diagonal submatrices U_1 = A, U_2 = B and U_3 = C, with zero blocks elsewhere except for the non-zero blocks coupling C to A and B. Submatrices A and B are essential and non-decomposable, while submatrix C is transient. The block B, occupying rows and columns 3 and 4, is

    B = | 0.1  0.9 |
        | 0.9  0.1 |

By the theorem of Perron-Frobenius, S possesses two unit eigenvalues, one corresponding to each of the two essential submatrices. Consider the effect of introducing a small non-zero element ε into position (4, 1), and let element (4, 4) be given by 0.1 - ε, so that S remains a stochastic matrix. There now exists a non-zero probability of a transition from B to A; in other words, B has become a transient submatrix. However, since the probability of such a transition is small, B may be considered to be almost essential. In a similar manner, if small transitions are introduced into the other essential submatrix, then no submatrix may be considered as strictly essential, nor the matrix decomposable. However, since the probabilities are small, the submatrices A and B may be considered almost essential, and consequently the matrix S may be considered to be nearly decomposable.

The nearness of a stochastic matrix to decomposability may be determined from its subdominant eigenvalues. The eigenvalues of

    B = | 0.1  0.9 |
        | 0.9  0.1 |

are 1.0 and -0.8, while the eigenvalues of

    B' = | 0.1-ε  0.9 |
         | 0.9    0.1 |

are 1.0 - ε/2 and -0.8 - ε/2, where terms of the order of ε² have been neglected. It may be observed that the smaller the value of ε becomes, the closer the eigenvalues of B' become to those of B. In general, it may be shown that the closer a submatrix is to being essential, the closer its dominant eigenvalue is to unity. Therefore, in a general stochastic matrix S, each eigenvalue which is exactly unity represents an essential submatrix, while eigenvalues which are close to unity indicate submatrices which are nearly essential.

Nearly decomposable stochastic matrices therefore arise whenever the rates of transition in one part of a model differ largely from the rates of transition in another: a relatively frequent occurrence. Models of hierarchical storage schemes, for example, fall into this category. For further information on decomposability and near decomposability, the reader is referred to Courtois, [6].

To obtain solutions for such models, a numerical technique is required whose rate of convergence is independent of the dominant set of eigenvalues. A class of methods which satisfies this criterion is now presented. Simultaneous iteration methods are an extension of the power method, in which iteration is carried out with a number of trial vectors which converge onto the eigenvectors corresponding to the dominant eigenvalues. The method was first proposed by Bauer, [2], who introduced the methods of treppen-iteration and bi-iteration. Subsequent methods are related to Bauer's bi-iteration but improve the rate of convergence by introducing an "interaction analysis" into the iteration cycle. These methods simultaneously produce both left- and right-hand sets of eigenvectors, and have been most highly developed for the real symmetric eigenvalue problem, where the left and right eigenvectors coincide. However, a lop-sided method by Jennings and Stewart, [12], which determines only one set of eigenvectors for a real unsymmetric matrix, and is therefore particularly well suited to the needs of Markovian analysis, has recently proved reliable in numerous tests. The lop-sided simultaneous iteration algorithm will not be considered in detail in this paper; rather, the reader is referred to the reference given above.
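Returning briefly to the nearly decomposable example above: the stated ε-perturbation of the eigenvalues of B is easy to check numerically. The closed-form 2x2 eigenvalue routine below is written out only for this check.

```python
# Numerical check of the perturbation result: the eigenvalues of
# B' = [[0.1 - eps, 0.9], [0.9, 0.1]] drift from those of
# B = [[0.1, 0.9], [0.9, 0.1]] by roughly eps/2.
import math

def eig2(m):
    """Eigenvalues of a real 2x2 matrix with a real spectrum."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    d = math.sqrt(tr * tr / 4.0 - det)
    return (tr / 2.0 + d, tr / 2.0 - d)

B = [[0.1, 0.9], [0.9, 0.1]]
lam_B = eig2(B)                       # (1.0, -0.8)

eps = 0.01
Bp = [[0.1 - eps, 0.9], [0.9, 0.1]]
lam1, lam2 = eig2(Bp)
# lam1 is close to 1 - eps/2 and lam2 to -0.8 - eps/2, up to O(eps^2) terms
```

Shrinking eps moves both eigenvalues of B' back toward those of B, in line with the discussion of near decomposability.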
In general, however, the effect of iterating with several vectors rather than a single one, as is the case with the power method, is that the rate of convergence is no longer dependent upon the subdominant eigenvalue λ_2 but rather on the (m+1)-th and lower eigenvalues λ_{m+1}, λ_{m+2}, ..., where m is the number of trial vectors employed in the analysis. In other words, the error in the initial approximation to the stationary probability vector will diminish by a factor of approximately |λ_{m+1}|/|λ_1| at each iteration, as opposed to |λ_2|/|λ_1|. Note that the eigenvalues are assumed to be ordered so that

    |λ_1| ≥ |λ_2| ≥ |λ_3| ≥ ... ≥ |λ_n| ≥ 0.

The methods of simultaneous iteration in general, and the lop-sided method in particular, can therefore be used to great advantage where many eigenvalues are clustered close to the dominant one. Even in the case where the dominant eigenvalue has multiplicity greater than one, and/or there exists more than one eigenvalue having maximum modulus, simultaneous iteration methods still converge; it is merely necessary to choose m larger than the number of eigenvalues which are close to unity.

While it is true that iterating with m trial vectors requires at least m times the computation per iteration of iterating with a single vector, convergence is usually achieved in considerably less than 1/m times the number of iterations. Also note that in cases where iterating with a single vector is preferable to iterating with several, for example when the eigenvalue λ_1 is well separated from the subdominant eigenvalues, then simply by choosing m = 1 the lop-sided method reverts to the power method. This simultaneous iteration method provides the numerical basis of the software package MARCA, which will now be described.
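The idea can be conveyed on a small example. The sketch below is a simplified subspace-iteration variant with a 2x2 Rayleigh-Ritz ("interaction") step, written for illustration only; it is not the Jennings-Stewart lop-sided algorithm itself. The 4-state matrix is nearly decomposable (two 2-state blocks with 0.01 coupling), which is exactly the case in which the plain power method converges slowly.

```python
# Simultaneous iteration with m = 2 trial row vectors: convergence is then
# governed by |lambda_3|/|lambda_1| rather than |lambda_2|/|lambda_1|.
import math

def left_mul(v, W):
    """Row vector times matrix: (v W)_i = sum_k v_k W[k][i]."""
    n = len(W)
    return [sum(v[k] * W[k][i] for k in range(n)) for i in range(n)]

def orthonormalize(V):
    """Modified Gram-Schmidt on the rows of V."""
    out = []
    for v in V:
        for u in out:
            d = sum(a * b for a, b in zip(v, u))
            v = [a - d * b for a, b in zip(v, u)]
        nrm = math.sqrt(sum(a * a for a in v))
        out.append([a / nrm for a in v])
    return out

# Nearly decomposable: two 2-state blocks, weakly coupled through 0.01 terms.
W = [[0.60, 0.39, 0.01, 0.00],
     [0.49, 0.50, 0.00, 0.01],
     [0.01, 0.00, 0.70, 0.29],
     [0.00, 0.01, 0.39, 0.60]]

V = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]   # two initial trial vectors
for _ in range(100):
    V = orthonormalize([left_mul(v, W) for v in V])

# Interaction (Rayleigh-Ritz) analysis: H[i][j] = (v_i W) . v_j, then take the
# left eigenvector of the 2x2 matrix H belonging to its dominant eigenvalue.
H = [[sum(a * b for a, b in zip(left_mul(vi, W), vj)) for vj in V] for vi in V]
tr = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
lam = tr / 2.0 + math.sqrt(tr * tr / 4.0 - det)     # dominant Ritz value, ~1.0
cands = [[H[1][0], lam - H[0][0]], [lam - H[1][1], H[0][1]]]
c = max(cands, key=lambda w: abs(w[0]) + abs(w[1]))  # better-conditioned form
pi = [c[0] * a + c[1] * b for a, b in zip(V[0], V[1])]
s = sum(pi)
pi = [x / s for x in pi]    # approximate stationary probability vector
```

With m = 2 the trial subspace captures both eigenvalues near unity, so the slow |λ_2|/|λ_1| factor that cripples the single-vector power method no longer limits the accuracy of the extracted stationary vector.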
3. MARCA: Markov Chain Analyser. In modelling experiments it is usual to represent a system as a network of queues and servers whose state at any moment describes the state of the system at that moment. MARCA employs the concept of "balls and buckets". Consider, for example, the case of several programs competing for a set of resources. The resources are effectively on loan to the programs, to be returned when asked for by the operating system, or when the program has finished needing them. In terms of balls and buckets, a unit of resource may be represented as a ball, and a program or resource pool, as a bucket. The movement of balls among
NUMERICAL ANALYSIS OF MARKOVIAN MODELS
the buckets represents the behaviour of the resources as they are deallocated and allocated among the programs. At any instant, the state of the model is determined by the number of balls in each bucket, just as the state of the computer system is, at any instant, given by the amount of resource which each program possesses, and the amount which remains idle in the resource pool. Alternatively, a program may be represented as a ball, and a unit of resource as a bucket.
In this case the number of balls in a bucket represents one more
than the number of programs queueing to use the resource:
one more since one of
the balls represents the program currently using the resource.
Depending on the
formulation of the problem and on the particular results required, other representations are possible.
It is due to this flexibility that the ball and bucket
concept has been chosen to provide the basis of the modelling techniques employed by MARCA.
All systems which are to be analysed by this software package must be
formulated in terms of balls and buckets. The movement of a ball from one bucket to another is called a transition from a source bucket to a destination bucket, and the rate of transition is defined as the rate at which the source bucket loses balls to the destination bucket.
This last quantity will obviously vary from system to system, and from
bucket to bucket. A transition of particular interest is an instantaneous transition, which means, as the name implies, that the actual movement takes no time at all. The use of such a transition is described below. The state of the system is represented in row vector form, the i-th element of which denotes the number of balls in the i-th bucket at the moment of interest. The major objectives for the package were that it should be able to cater for a wide variety of modelling experiments, that it should incorporate the best known methods of analysis available, and that it should be easy to use.
It is
believed that the concept of balls and buckets permits the first criterion to be satisfied, while lop-sided simultaneous iteration satisfies the second.
To make
the package easy to use, it was decided to keep the amount of user information required to an absolute minimum.
In particular, the responsibility for organizing
the eigensolution of the probability matrix is taken by the package itself, so that the user need have no knowledge whatsoever of linear algebraic techniques. The actual information which the user must supply consists of certain model characteristics, which are input as data, and two short Fortran subroutines which he himself must write. The information supplied as data describes the basic configuration of the model.
It consists of the following four items.
(i)
The number of buckets and the number of balls in the model.
(ii)
The maximum number of balls which each of the buckets can possess at any instant.
(iii) An initial state, i.e. an initial value for each of the buckets. This is obtained by assigning to each bucket, a specific value that is within the capacity of that bucket, and such that when the initial values are taken together, they satisfy the constraints of the system:
in other words, the
initial state must be a realizable state of the system. (iv)
A list of all the transitions that can possibly occur among the buckets.
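For illustration, the four items might be set up as follows for a toy two-bucket model (a Python sketch with invented names and values; MARCA's actual input format is defined in the manual [16]):

```python
# Item (i): numbers of buckets and balls; item (ii): per-bucket capacities;
# item (iii): an initial state; item (iv): the possible transitions,
# given as (source bucket, destination bucket) pairs.
n_buckets, n_balls = 2, 3
capacity = [3, 3]
initial_state = (3, 0)
transitions = [(0, 1), (1, 0)]

# A state is realizable when it respects the capacities and conserves balls.
def realizable(state):
    return (sum(state) == n_balls and
            all(0 <= state[i] <= capacity[i] for i in range(n_buckets)))

assert realizable(initial_state)
```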
The first subroutine which must be supplied by the user, is the subroutine RATE, and its purpose is to determine the rate of transition from a given source state to a destination state.
It must necessarily be written by the user himself,
since only he will be aware of what the transitions are and upon what they depend. A certain transition may, for example, depend upon the state the system is currently occupying, or more simply, just upon the number of balls which the source bucket possesses. Usually this routine is very short and can be quickly written by the user.
In any case, the manual for the software package [16] contains
helpful suggestions on how RATE may be efficiently and painlessly programmed.
In
addition several examples are provided. The second routine which the user must write himself is the subroutine INSTANT. This is called each time a destination state is obtained by the subroutine RATE.
Its purpose is to examine this state and determine if an instantaneous
transition occurs from it to any other state.
Instantaneous transitions will
usually occur as the result of certain scheduling policies imposed upon the system and which the user wishes to consider as taking no time at all to implement.
For example, a user may use INSTANT to model the case in which possession
of the CPU is instantaneously taken from a low priority process to be given to a higher priority process which has just requested it. In this case it is known from the state of the system (i.e. the low priority process computing and the high priority process waiting) that an instantaneous transition will occur; further, the state to which the instantaneous transition is destined may be easily determined. In reality, of course, there are no instantaneous transitions, since even the simplest operations take a small but finite amount of time to perform. The advantage of taking such transitions to be of zero duration is that it considerably simplifies the analysis. Note that in many models, instantaneous transitions will not be required, so that RATE is the only routine which the user need
write.
On the other hand, when they are required, the routine is no more difficult to write than RATE, and, as was the case for the latter, helpful suggestions and examples are provided in the manual. When the four items of data are input to the package, they are checked for inconsistencies.
If an error is detected, the package does not immediately
halt, but first verifies the remaining items.
The input data is also output to
the printer. This enables the user to check that the problem solved is the one he intended to solve, and that he has not supplied incorrect data.
Care should
be taken to verify this output since MARCA cannot detect all input errors and inconsistencies.
The output data also serves as a type of file index in the case when more than one problem of a similar nature is being solved. When the checking phase has been successfully completed, all the states of the system are formed and the matrix of transition probabilities constructed. Two one-dimensional arrays are used to store the matrix in compact form; the actual nonzero values of the probabilities are stored by rows in a real array, while the number and position of the elements in each row are stored in an integer array. The list of states and the matrix are constructed as follows. A state, currentstate (originally chosen as the state read from the input), is examined to determine which states it can reach by a one step transition.
For each destination state, newstate, the routine written by the user to determine the rate of transition is entered, and provided it returns a strictly positive transition rate, the subroutine INSTANT is called to see if an instantaneous transition may be made from newstate. If so, the result of the instantaneous transition becomes newstate, so that it is now a single step transition from currentstate. The list of states is now searched to determine if it contains newstate; if not, newstate is entered into the list of states. The rate of transition (that already obtained from RATE) is now stored in the appropriate arrays. When all destination states which emanate from currentstate have been so treated, the next state in the list becomes currentstate and the process is repeated until all possible states have been considered. Finally, to complete the analysis, the matrix is scaled by dividing all the elements by the diagonal element of largest modulus. The matrix is converted to a stochastic matrix by adding it to the unit matrix.
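The construction just described can be rendered as a short sketch (Python rather than MARCA's Fortran; the two-bucket model and its rate function are invented, and the INSTANT step is omitted for brevity):

```python
# Illustrative state-generation sketch for a hypothetical 2-bucket model.
# `rate` plays the role of the user-written RATE subroutine; states are
# tuples of balls per bucket, and each row of the generator is stored
# compactly as a dictionary of nonzero rates.
N = 3  # total number of balls

def rate(src, dst):
    # hypothetical rates: balls move between the two buckets at a rate
    # proportional to the source bucket's contents
    if dst[0] == src[0] - 1 and dst[1] == src[1] + 1:
        return 2.0 * src[0]          # bucket 0 -> bucket 1
    if dst[0] == src[0] + 1 and dst[1] == src[1] - 1:
        return 1.0 * src[1]          # bucket 1 -> bucket 0
    return 0.0

def build_chain(initial):
    states, rows = [initial], []
    k = 0
    while k < len(states):           # treat each listed state as currentstate
        cur = states[k]
        row = {}
        for dst in [(cur[0] - 1, cur[1] + 1), (cur[0] + 1, cur[1] - 1)]:
            if min(dst) < 0 or max(dst) > N:
                continue
            r = rate(cur, dst)
            if r > 0.0:
                if dst not in states:    # enter newstate into the list
                    states.append(dst)
                row[dst] = r
        rows.append(row)
        k += 1
    return states, rows

states, rows = build_chain((N, 0))
# Scale by the diagonal element of largest modulus and add the unit matrix,
# giving a stochastic matrix P = I + Q/c whose left eigenvector for the
# eigenvalue 1 is the stationary vector.
c = max(sum(row.values()) for row in rows)
P = [[(1.0 if i == j else 0.0) +
      (rows[i].get(states[j], 0.0) - (sum(rows[i].values()) if i == j else 0.0)) / c
      for j in range(len(states))] for i in range(len(states))]
```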
The states of the system are printed by the package as and when they are generated (the user may suppress this output if he wishes). If the system is decomposable, all the states may NOT be generated. Suppose that the system is
decomposable into two subsystems A and B, so that no transitions are possible from any state of A to any state of B, and vice versa.
If the initial state
chosen in the input belongs to A, then the only states generated by the package will be the states of the subsystem A.
Even if transitions are possible from
B to A (but not the converse), and the initial state is in A, none of the states of B will be generated.
When the system is known in advance to be decomposable,
it is advisable to treat the different parts separately.
This does not alter the results obtained, it requires less core to hold arrays, and it uses less time to perform the analysis. Once the matrix of transition probabilities has been obtained, the package automatically initiates the lop-sided simultaneous iteration algorithm to determine the long run probability vector. This vector is then analysed to obtain the probability distributions of the different buckets which constitute the model.
For each bucket, the following information is output.
(i)
The probability distribution of balls in the bucket,
(ii)
The average number of balls in the bucket.
(iii) The standard deviation of the distribution. This information provides the user with a quentitive measure of the performance of his system.
By careful examination he may detect in which buckets most of
the balls spend their time.
If this is in buckets representing resources, then
the user may deduce high resource utilization.
Perhaps, however, one or two of
the buckets monopolize the system, possessing for a large proportion of the time all, or most of, the balls. Such a system represents a bottleneck, introduced either by the scheduling algorithm, or by the poor performance of a component vis-à-vis the remainder of the system.
Buckets which for a large proportion of
the time are relatively empty may indicate that insufficient use is being made of the system components they represent. Finally, to complete this section on the software package, it should be pointed out that, although basic models may be analysed almost immediately by the newcomer to the package, it is only with experience that full advantage can be taken of all the facilities incorporated into MARCA.
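The three per-bucket items can be derived from the stationary vector as in the sketch below (Python; the states and probabilities are an invented four-state example, while MARCA performs the equivalent analysis internally):

```python
import math

# Hypothetical stationary result for a 2-bucket, 3-ball model:
# states and their long-run probabilities (numbers are illustrative only).
states = [(3, 0), (2, 1), (1, 2), (0, 3)]
pi = [0.1, 0.2, 0.3, 0.4]

def bucket_stats(bucket):
    dist = {}
    for s, p in zip(states, pi):    # (i) distribution of balls in the bucket
        dist[s[bucket]] = dist.get(s[bucket], 0.0) + p
    mean = sum(k * p for k, p in dist.items())          # (ii) average number
    var = sum((k - mean) ** 2 * p for k, p in dist.items())
    return dist, mean, math.sqrt(var)                   # (iii) standard deviation

dist0, mean0, sd0 = bucket_stats(0)
```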
Conclusions. This paper considered the problem of obtaining, by numerical techniques, the stationary probability vector of computer system models which are based on Markovian concepts. The disadvantages of the standard methods of solution have
been examined, particularly with respect to decomposable and nearly decomposable systems. A new technique, that of simultaneous iteration, which is not subject to the previous difficulties, was then presented.
Finally, a complete software
package, MARCA, for the numerical analysis of Markov chains was described.
It
was shown that the efficiency of the basic techniques employed in the package, its wide applicability, and its ease of use are among its most desirable qualities.
REFERENCES.
1. BASKETT, F., CHANDY, K.M., MUNTZ, R. and PALACIOS, F.G.: Open, Closed and Mixed Networks of Queues with Different Classes of Customers. JACM, Vol. 22, No. 2, April 1975.
2. BAUER, F.L.: Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung algebraischer Eigenwertprobleme. ZAMP, Vol. 8, 1957, pp. 214-235.
3. BUZEN, J.: Analysis of System Bottlenecks using a Queueing Network Model. Proc. ACM-SIGOPS Workshop on System Performance Evaluation, April 1971, pp. 82-103.
4. CHANDY, K.M.: The Analysis and Solutions for General Queueing Networks. Proc. VI Annual Princeton Conference on Information Sciences and Systems, Princeton University, March 1972.
5. COHEN, A.M.: Numerical Analysis. McGraw-Hill Book Company (U.K.), 1973.
6. COURTOIS, P.J.: Decomposability, Instabilities and Saturation in Multiprogramming Systems. CACM, Vol. 18, No. 7, July 1975.
7. GAVER, D.P.: Diffusion Approximation and Models for Certain Congestion Problems. J. Appl. Prob., Vol. 5, 1968, pp. 607-623.
8. GELENBE, E.: On Approximate Computer System Models. JACM, Vol. 22, No. 2, April 1975.
9. GELENBE, E. and PUJOLLE, G.: Probabilistic Models of Computer Systems, Part II. To be published in Acta Informatica.
10. GORDON, W.J. and NEWELL, G.F.: Closed Queueing Systems with Exponential Servers. Operations Research, Vol. 15, No. 2, April 1967, pp. 254-265.
11. JACKSON, J.R.: Jobshop-like Queueing Systems. Management Science, Vol. 10, No. 1, Oct. 1963, pp. 131-142.
12. JENNINGS, A. and STEWART, W.J.: Simultaneous Iteration for Partial Eigensolution of Real Matrices. J. Inst. Maths. Applics., Vol. 15, 1975.
13. KOBAYASHI, H.: Application of the Diffusion Approximation to Queueing Networks, Part I: Equilibrium Queue Distribution. ACM-SIGME Symposium on Measurement and Evaluation, Palo Alto, California, 1973, pp. 54-62.
14. ROMANOVSKY, V.I.: Discrete Markov Chains. Wolters-Noordhoff Pub., 1970.
15. STEWART, W.J.: A Comparison of Numerical Techniques in Markov Modelling. IRISA Publication Interne N° 36, Université de Rennes, 35031 France.
16. STEWART, W.J.: MARCA: Markov Chain Analyser. IRISA Publication Interne N° 45, Université de Rennes, 35031 France.
17. WALLACE, V.L. and ROSENBERG, R.S.: RQA-1, The Recursive Queue Analyser. Systems Engineering Lab. Technical Report N° 2, University of Michigan, Ann Arbor, February 1966.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
WORKING SET DYNAMICS
Hendrik Vantilborgh MBLE Research Laboratory Brussels, Belgium
A model for the interaction between the working set and the reentry rate is developed. This model explains some experimental observations of Rodriguez-Rosell on the dynamic behaviour of the working set size and on how this behaviour depends on the window and page size parameters.
1. INTRODUCTION
Although the working set model of program behaviour has been with us for a long time, little attention has been given to its dynamics; only average or stationary characteristics such as the page reentry rate or the working set size mean and distribution have been investigated. This state of affairs has been empirically remedied by Rodriguez-Rosell when he reported his experiments on the time-behaviour of the working set for a varying window size T and for a varying page size p. In this paper we set up a model for the interaction between the working set size and the page reentry rate; loosely speaking, this model formalizes the following cycle of cause and effect:
working set size decreases
→ page reentry rate increases
→ working set size increases
→ page reentry rate decreases
→ working set size decreases → ...
We shall see how this model can successfully explain the experimental observations made by Rodriguez-Rosell. These observations pertain to the oscillatory pattern in the working set size behaviour, and to the influence on this behaviour of the window and page size. As a consequence of the ensuing better understanding, we shall be able to characterize a reasonably good choice for a value of the window size parameter T. In the last section we explain Rodriguez-Rosell's observations on the coefficient of variation of the working set size distribution and on the dependence of its mean on the page size.
2. A SIMPLE MODEL As in [5] we take the duration of an instruction as time unit. We shall use t to denote the running time, and T to denote the size of the window measured in instructions. Our model is based on the assumption that the dynamic interaction which exists between the working set size w(t,T) and the page reentry rate r(t,T) obeys the following set of rules : (i)
an increasing of r(t,T) tends to increase w(t,T); or more precisely: an increasing of r(t,T) causes w(t,T) either to increase faster if it is already increasing, or to decrease more slowly if it is decreasing;
(ii)
conversely, a decreasing of r(t,T) tends to decrease w(t,T); more precisely : a decreasing of r(t,T) causes w(t,T) either to decrease faster if it is already decreasing, or to increase more slowly if it is increasing;
(iii)-(iv) reciprocally, w(t,T) influences r(t,T) in the same way: an increasing of w(t,T) causes r(t,T) to decrease faster or to increase more slowly, and a decreasing of w(t,T) causes r(t,T) to increase faster or to decrease more slowly. In fact, these rules aim at encapsulating the most important dynamical aspects of the locality property of program behaviour. The fluctuations of the page reentry rate of a program which, in the course of its execution, passes through different localities or phases, reflect the passage of these localities through a working set which is continually attempting to capture new localities while releasing old ones. These rules can be more precisely formulated. To that end, we note first that, with respect to the time-derivatives of w(t,T) and r(t,T), we may distinguish the four situations A, B, C and D defined in table 1 below. Our rules then define for each of these situations the sign of the second time-derivatives of w(t,T) and r(t,T); these signs are also listed in table 1.
situation   dw(t,T)/dt   dr(t,T)/dt   d²w(t,T)/dt²   d²r(t,T)/dt²
A           >0           >0           >0             <0
B           >0           <0           <0             <0
C           <0           <0           <0             >0
D           <0           >0           >0             >0

Table 1.
Figure 1. w(t,T) and r(t,T).
Remark. It is possible to arrive at exactly the same equations for w(t,T) and r(t,T) when starting with a different set of rules governing their concomitant behaviour : (i')-(ii')
when w(t,T) is larger than its mean w(T), r(t,T) decreases, and when w(t,T) is smaller than w(T), r(t,T) increases.
(iii')-(iv') when r(t,T) is larger than its mean r(T), w(t,T) increases, and when r(t,T) is smaller than r(T), w(t,T) decreases. These rules of behaviour describe the following cycle of cause and effect in the interaction between working set size and page reentry rate :
working set size increases
→ working set size becomes larger than its mean
→ page reentry rate decreases
→ page reentry rate becomes smaller than its mean
→ working set size decreases
→ working set size becomes smaller than its mean
→ page reentry rate increases
→ page reentry rate becomes larger than its mean
→ working set size increases → ...
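This cycle can be rendered numerically; the sketch below (Python) assumes the harmonic-oscillator formalization dw/dt = ℓ²(r − r̄), dr/dt = −m²(w − w̄) suggested by these rules, with invented constants and starting values.

```python
import math

# Integrate the assumed pair of equations with fine forward-Euler steps over
# one period 2*pi/(l*m); w and r should return close to their initial values.
l, m = 1.0, 2.0
w_mean, r_mean = 20.0, 5.0
w, r = 25.0, 5.0                       # start with w above its mean
dt, period = 1e-4, 2 * math.pi / (l * m)

t = 0.0
while t < period:
    dw = l * l * (r - r_mean)          # r above its mean pushes w up
    dr = -m * m * (w - w_mean)         # w above its mean pushes r down
    w, r = w + dw * dt, r + dr * dt
    t += dt
# the trajectory is periodic: w oscillates around w_mean, r around r_mean
```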
The hypotheses (i') to (iv') can also easily be formalized by the following system of equations:

dw(t,T)/dt = ℓ² (r(t,T) − r(T)) ,    (1')

and

dr(t,T)/dt = −m² (w(t,T) − w(T)) .    (2')

The integration of this system yields expressions for w(t,T) and r(t,T) which are identical to (3) and (4). Though the second set of rules (i') to (iv') is conceptually simpler than the first set (i) to (iv), it was found a posteriori by inspection of figure 1, it presupposes the existence of a mean value, and it seems to us that it captures less sharply the influences that the variations of w(t,T) and r(t,T) mutually exert and undergo. We have thus explained the general oscillatory behaviour of w(t,T), but it is possible to explain some other and more precise experimental observations reported by Rodriguez-Rosell [5].

3. THE INFLUENCE OF THE WINDOW AND PAGE SIZE
To carry further our investigation, we need to evaluate ν as a function of the window size T and the page size p. It is therefore useful to remark that
lim_{K→∞} (1/K) ∫₀^K (w(t,T) − w(T))² dt = ν .    (6)

This contains the clue to all subsequent results, since we will approximate the variance σ²(T) of the working set size distribution by the integral in (6), which is the mean square error of the deterministic process w(t,T) defined by (5). This yields

ν = σ²(T) .    (7)

3.1. The influence of the window size T on the behaviour of the working set can now be formulated as follows: w(t,T) oscillates around a mean value w(T) with amplitude a = √(2ν) = √2 σ(T).
Both theory [2,6] and experiment [4,5] show that σ(T) reaches a maximum for some T, say T_M. This implies the existence of 3 phases in the behaviour of w(t,T) as T increases:
T < T_M : σ(T) and a increase,
T ≈ T_M : σ(T) and a are stationary,
T > T_M : σ(T) and a decrease.
The above explains Rodriguez-Rosell's observation that, when T increases, the near-equal-amplitude oscillation of the working set size disappears. Rodriguez-Rosell did not observe the first phase, viz. when the amplitude of the oscillation increases, because it occurs for too small values of T. It explains at the same time why Rodriguez-Rosell observed this oscillation to leave only the maximum value: indeed, w(T) tends asymptotically to its maximum value, and the amplitude a tends to 0 with σ(T).
It is shown there that it is licit to analyse a characteristic or a property of a pro gram first at locality level, and then to extend the result of that analysis at the program level by considering a program page reference string as a sequence of reference strings in localities, separated by short and infrequent transitory periods . Incidentally, this pattern can be readily seen in figure 1 of [ 5] ·' if we discard the initial phase, we can discern four localities, from 150 to 400, 400 to 1 050, 1 100 to 1 500 and 1 500 to 2 000 Kinstructions respectively, each locality having its own oscillation mean, amplitude and frequency. In [ 1] it is also shown that the IRM is - except for a few enumerable cases - an acceptable model of internal locality behaviour.
1. This principle also justifies the use of the constants ℓ and m in (1) and (2); the interaction between w(t,T) and r(t,T) is certainly not stationary over the whole program, but, as we learn from [1], a statistical equilibrium within a locality is already reached in the short run.
We are now ready to show that Rodriguez-Rosell's observation that, when the page size is decreased, the relative importance of the peaks decreases, is confirmed by our model. To that end we must investigate
√2 σ(T) / w(T) .
Both σ(T) and w(T) are functions of p, because they depend on the total number of pages, which is, for a given program, inversely proportional to p. Whereas the properties of w(T) are fairly well known, little is known about σ(T), and it is in the IRM only that explicit expressions, required to assess the influence of p, are known for both. At this point, the principle of analysis mentioned above is instrumental: it justifies the analysis of σ(T)/w(T) by the IRM for each separate locality. The expressions for w(T) and σ(T) for an IRM characterized by a page reference distribution {b_i ; i = 1, ..., n}, where n is the total number of pages, are (see [2,6]):

w(T) = n − Σ_{i=1}^{n} (1 − b_i)^T ,    (8)

and

σ²(T) = 2 Σ_{1≤i<j≤n} (1 − b_i − b_j)^T − (n − w(T))(n − 1 − w(T)) ,

while for the model with halved page size the mean working set size becomes

2n − Σ_{j=1}^{2n} (1 − b_j)^T ,
where it is understood that each of the n original pages has a reference probability equal to (b_{2i−1} + b_{2i}), the two terms being the reference probabilities of the two corresponding page halves. This amounts to showing that

Σ_{i=1}^{n} ( (1 − b_{2i−1})^T + (1 − b_{2i})^T ) > 2 Σ_{i=1}^{n} (1 − b_{2i−1} − b_{2i})^T ,

which is easily done by observing that for all i

(1 − b_{2i−1})^T > (1 − b_{2i−1} − b_{2i})^T  and  (1 − b_{2i})^T > (1 − b_{2i−1} − b_{2i})^T .
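These expressions are easy to check numerically. The sketch below (Python) evaluates (8) and the variance for an invented reference distribution, splits each page's probability 60/40 over its two halves (an arbitrary illustrative split), and verifies that halving the page size less than doubles the mean working set size.

```python
def ws_mean(b, T):
    # w_bar(T) = n - sum_i (1 - b_i)^T, equation (8)
    return len(b) - sum((1.0 - bi) ** T for bi in b)

def ws_var(b, T):
    # sigma^2(T) = 2*sum_{i<j}(1 - b_i - b_j)^T - (n - w_bar)(n - 1 - w_bar)
    n, w = len(b), ws_mean(b, T)
    s = sum((1.0 - b[i] - b[j]) ** T
            for i in range(n) for j in range(i + 1, n))
    return 2.0 * s - (n - w) * (n - 1.0 - w)

b = [0.4, 0.3, 0.2, 0.1]                       # invented locality, 4 pages
T = 10
b_half = [x for bi in b for x in (0.6 * bi, 0.4 * bi)]   # 8 half-pages

assert ws_mean(b_half, T) < 2 * ws_mean(b, T)  # mean less than doubles
```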
We refer to Hatfield's excellent exposé [3] on the part played by the page size in virtual memory systems.

4. CONCLUSION
Within the framework of a simple model for the interaction between the working set and the page reentry rate, we have explained some experimental observations of Rodriguez-Rosell. As a result of this investigation, we have obtained a useful characterization of a reasonably good choice for the value of the window size T, an important parameter in a working set space allocation strategy.

ACKNOWLEDGEMENTS
I thank P.J. Courtois for our fruitful discussions, and J. Georges for his penetrating remarks on an earlier version.

REFERENCES
[1] COURTOIS, P.J. and VANTILBORGH, H.: A Decomposable Model of Program Paging Behaviour. Acta Informatica, in print.
[2] DENNING, P.J. and SCHWARTZ, S.C.: Properties of the Working-Set Model. Comm. of the ACM, 15, 3, 191-198 (1972).
[3] HATFIELD, D.J.: Experiments on Page Size, Program Access Patterns, and Virtual Memory Performance. IBM J. Res. Develop., 16, 1, 58-66 (1972).
[4] RODRIGUEZ-ROSELL, J.: The working set behaviour of some programs. Report NA 72.51, Inst. för Informationsbehandling, KTH, Stockholm, Sweden (1972).
[5] RODRIGUEZ-ROSELL, J.: Empirical Working Set Behavior. Comm. of the ACM, 16, 9, 556-560 (1973).
[6] VANTILBORGH, H.: On the working set size distribution and its normal approximation. BIT, 14, 2, 240-251 (1974).
COMPARISON OF GLOBAL MEMORY MANAGEMENT STRATEGIES IN VIRTUAL MEMORY SYSTEMS WITH TWO CLASSES OF PROCESSES
A.L. Schoute Department of Mathematics University of Groningen Netherlands
Global memory management strategies are investigated by means of a stochastic model with two classes of processes. The model consists of a central server network embedded in a queueing system, which represents the memory scheduling mechanism and allows quite general admission rules. An explicit and approximate steady-state solution is derived for a system working under a never-vanishing "workload", which is composed of processes from both classes according to a Bernoulli distribution. Comparisons between different strategies are made on the basis of their effect on the system throughput.
1. Introduction
In modelling multi-process computer systems we not only have to deal with a network of parallel servers (processors), but also with a finite memory, of which the customers (processes) need a share before they can be served by some of the processors (in particular the CPU). This forces us to consider models in which there is a distinction between all of the active processes present in a system and the multi-programming set (i.e. the subset of processes which are memory resident). In modern virtual memory systems a process is "memory resident" if it has a number of memory pages at its disposal, which typically contain only a part of its program and data. Only processes in the multi-programming set can take a share of the (CPU) processing power. Memory scheduling (i.e. the admission and removal policies for the multi-programming set) and page management (i.e. the allocation of pages to members of the set) have a major influence on the way in which all the processes proceed within the system. If we direct our attention to global scheduling strategies, we can make interesting (and realistic) studies by considering stochastic models in which not all processes are statistically identical and/or not all processes are treated equally. Conversely, where a model allows only statistically
identical processes, we are restricted to memory scheduling and paging rules which do not disturb the equality. In this case the only freedom left in relation to memory scheduling, is the control of the degree of multi-programming. We present a stochastic model of a virtual memory system which allows two classes of processes. The restriction to two classes is made so that we still derive an explicit solution. We believe, in spite of this restriction, that the model shows new aspects, which could not be revealed by considering only one class. In section 2 we give an outline of the model and the reasoning behind it. The EMAS-system [1], which applies the working set principle in conjunction with a dynamic classification of processes, has to a large extent served as a basis for this model. The model is also in many respects comparable to that given by Brandwajn [2], but a much more general memory scheduling mechanism is considered. We have solved the equations of our model by approximation using the so-called decomposition and equivalence technique [3]. This technique consists of hierarchically decomposing a network into nested (sub)networks. Firstly one derives a steady-state solution for the sub-network as if it were an independent closed queueing network (without any interaction with the surrounding network). Secondly, one uses these results to solve the enclosing network. This method of solution, which was first applied to computer networks by Courtois [4] is also used extensively by Brandwajn. In cases where the rate at which the state-transitions of the sub-network proceed is some orders of magnitude faster than the rate at which the surrounding network operates, the approximate solution will be very close to the exact solution and is therefore justified. Our model is decomposed into a processing part, in which transitions occur at intervals in the order of 10-50
msec, and into a memory admission
part with transitions at intervals in the order of 0.2 - 2 sec. Although the ratio is not extreme, the decomposition seems reasonable. Both parts are treated separately in sections 3 and 4. In section 3 we make use of the mathematical framework for closed networks with different classes of customers as given by Muntz and Baskett [5]. In the last section (section 5) we discuss a number of cases in which our model is applicable, together with some results.
2. Outline of the model
The model we present consists of a central server network S₁ [6], representing the processing part of the system, embedded in a queueing system S₂, which represents the mechanism that controls the admission of processes to the central server network (see fig. 1). We suppose that the processes in S₁ are memory resident. The admission mechanism therefore coincides with the function of the memory scheduler.
figure 1. a) Network S₁: the central server network, with run queue and CPU, drum queue and drum, disk queue and disk. b) Networks S₁ & S₂: the central server network embedded in the memory scheduler, with memory queue, terminal loop and removal loop.
figure 1 The open queueing system S network S
can again be taken as a part of a closed
containing two alternative paths (fig. lb):
a "removal" loop, which is travelled by processes which have not completed their tasks but are nonetheless removed from memory for some reason. a "terminal" loop, which is take by processes which have completed their tasks and are waiting for user action at a (typewriter) terminal or batchstation. The total configuration which we get, is similar to that studied by Brandwajn in [2], However a number of assumptions about the model are different (e.g. Brandwajn assumes that only processes in a subnetwork of S
share real memory).
A.L. SCHOUTE

The central server network S_1 operates as follows:
Processes served by the CPU can, apart from being removed, be interrupted in two ways: a) by a page-fault interrupt, which results in a request to the drum-processor for a page transfer; b) by an I/O-interrupt, which results in a request to the disk-processor for an I/O-transfer. After the completion of the transfer the process again joins the run queue. We assume that the service discipline of the CPU can be regarded as "processor shared". This means that the processing capacity of the CPU is equally shared amongst the processes in the run queue: if there are n processes in the run queue, each will proceed at 1/n of the processing speed of the CPU. This assumption is crucial in the case of statistically different processes. It would be realistic in time-sharing systems with CPU time-slicing at a rate many times faster than the page-fault and I/O-interrupt rates. The interpretation of the chosen central server network is somewhat arbitrary. If we look at EMAS, we see that both program overlay and data input/output are handled by the same paging mechanism: there appears to be no difference between a page fault and an I/O-interrupt. However, the system uses drum memory for copies of pages which have been accessed in the recent history of the process*) and disk memory to contain all of the files. The effect tends to be the same as in the earlier interpretation: program page-requests will mostly be directed to the drum and file page-requests to the disk. In order to investigate different methods of memory scheduling we ought to specify the effect that the number of memory pages which a process has at its disposal, and the page management, have on the processing behaviour. Program squeezing will affect the rate at which page-fault interrupts occur, the I/O-interrupt rate staying more or less at the same level. The relationship between the mean inter-page-fault time L (also called the life-time) and the number of pages σ can be expressed by means of a so-called life-time function L(σ). Although the life-time function depends on many factors, such as the paging rules and the properties of the program, its global shape is rather characteristic and known from experience (see fig. 2). We have made use of the approximations of:

*) there is a drum working-set in the same way as for main memory.
[figure 2: life-time functions L(σ) (msec) against σ (pages), for (1) Belady & Kuehner with class 1: α = 10^-2, k = 2.5 and class 2: α = 10^-2, k = 2; (2) Chamberlin, Fuller & Liu with class 1: B_1 = 15, C_1 = 15 and class 2: B_2 = 25, C_2 = 50]

1) Belady & Kuehner [7]: L(σ) = α σ^k, where α depends on the program and the scaling, and k depends on the program locality and lies somewhere in the range 1.5 ≤ k ≤ 2.5;

2) Chamberlin, Fuller & Liu [8]: L(σ) = 2B / (1 + (C/σ)^2), where C is the number of pages that provides the process with half of its largest possible life-time and B is the expected inter-page-fault time if the allocation equals C.
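The two approximations above, with the parameter values quoted for figure 2, can be written down directly; a short sketch in our own notation, together with the resulting page-fault rate φ = 1/L(σ):

```python
# Sketch (our notation) of the two life-time approximations and the page-fault
# rate phi = 1/L(sigma). Parameter values are the ones quoted for figure 2.

def L_belady_kuehner(sigma, a=1e-2, k=2.5):
    """Belady & Kuehner: L(sigma) = a * sigma**k, with 1.5 <= k <= 2.5."""
    return a * sigma ** k

def L_chamberlin(sigma, B=15.0, C=15.0):
    """Chamberlin, Fuller & Liu: L(sigma) = 2B / (1 + (C/sigma)**2).
    At sigma = C the life-time is B, half of its asymptotic maximum 2B."""
    return 2.0 * B / (1.0 + (C / sigma) ** 2)

def page_fault_rate(L, sigma):
    """phi = 1 / L(sigma): mean page-fault rate at allocation sigma."""
    return 1.0 / L(sigma)

for sigma in (10, 20, 40, 80):
    print(sigma, L_belady_kuehner(sigma), L_chamberlin(sigma))
```

Both functions are increasing in σ, so squeezing a program (smaller σ) raises its page-fault rate, as stated above.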
These approximations were suggested for fixed partition policies. They still seem reasonable for page allocation policies which make it appear that processes have a fixed partition, but at the same time make use of the fact that in reality the required partition is variable. Such an "illusion technique" is applied in EMAS. Each process has a fixed allocation (maximum working-set size) assigned to it. A process may accumulate main memory up to its maximum working-set size. The number of pages used by a process must fall within this allocation, and in this respect we have a local page allocation policy. The page replacement happens in a special way: after a fixed period of process time ("strobe period") the working set is adjusted by throwing away recently unused pages. In the meantime the working set grows by the addition of missing pages on demand. The illusion consists of the fact that the total real memory is over-estimated, but in such a way that the sum of all actual working sets will almost surely fit in the real memory. So there is also a global aspect in the page allocation policy (owing to the fact that all free pages belong to one pool) which is exploited by the over-allocation. EMAS works with an over-estimate in the order of 20 to 30% (this is, however, partly due to page sharing).

The working of the queueing system S_2 is based on a classification of processes into different classes. This feature is essential in EMAS. In EMAS each class has a maximum working-set size. The memory scheduling is based on a dynamic classification of processes, which proceeds in the following way (after starting in an initial class): when a process tries to exceed its maximum working-set size it is removed from memory and reclassified in another class. A maximum residence time is also associated with each class. If this time allocation is used up then the process is also removed from memory and reclassified. Depending on the working-set size, the process can be assigned to a class with the same or a smaller maximum working-set size. In this very coarse manner the memory allocation of a process can grow and shrink dynamically, but not without the intervention of the memory scheduler. The memory scheduler in EMAS applies the working-set principle: a process is only admitted to memory if there is room for its maximum working-set size (within the over-estimated memory).

In our queueing system S_2 we assume, as in Brandwajn's model, that we have two classes of processes, that the admission is in order of arrival (FIFO), and that the class to which a scheduled process belongs is determined by a sequence of Bernoulli trials (independent of the state of the system). Moreover, we assume that the system S_2 works in a saturated condition, i.e. the memory scheduler will always find a next process. The admission rule
itself can be chosen quite generally, with only a few natural restrictions (see section 4). This makes it possible to compare the effect of different scheduling policies (for example an admission rule based on the working-set principle against an admission rule based on equipartition with some other form of workload control). The Bernoulli trials, according to which the class of the next-to-be-admitted process is chosen, determine in a saturated system the throughput ratios between the different classes. Therefore, if such a choice mechanism is implemented (as in EMAS), it provides an important tool to control the system. The assumptions about the scheduler input place some restrictions on the way in which the system S_2 can be fitted into the total network S, and on the results which we can derive from the total model. The total model seems reasonable if we only consider the removal loop. Such a model is interesting in the following cases: a) the system does indeed work in a saturated condition as far as the terminal loop is concerned, i.e. we may consider this loop as being closed (any process which finishes will be replaced immediately by another one); b) the emphasis is placed on the removal loop and the influence of the terminal loop is neglected (which is an interesting case in relation to EMAS). However, the influence of the terminal loop is interesting just in those cases where for at least one class the saturated condition does not hold. Further approximations are then needed if we are to derive "easy" solutions.
3. Central Server Network S_1

Let us first consider the network S_1 as a closed model with m processes. These m processes are divided amongst c classes, the division being fixed and given by the tuple (m_1, m_2, ..., m_c). We have three service stations: the CPU (stage 0), the drum-processor (stage 1) and the disk-processor (stage 2) (see fig. 3). The service times and the times between page-faults and I/O-requests are taken to be exponentially distributed*) and mutually independent.

*) In theory, any service time distribution with a rational Laplace transform could be modelled by introducing extra (exponential) stages in the network. This is also true in the case of different classes of customers; see Muntz and Baskett [5]. In practice one is forced to consider simple networks in order to limit the size of the state space.
We assume that the drum and disk service rates μ_1 and μ_2 (corresponding to mean service times τ_1 = 1/μ_1 and τ_2 = 1/μ_2) are the same for all classes. The CPU rate μ_0r, which is the sum of the page-fault rate φ_r and the I/O-request rate ψ, depends on the class r. The non-zero transition probabilities q_ij(r) of going from stage i to stage j for a process of class r are given by:

q_01(r) = φ_r / μ_0r,  q_02(r) = ψ / μ_0r,  q_10(r) = q_20(r) = 1    (3.1)

[figure 3: subnetwork S_1 with stage 0 (CPU, "processor shared"), stage 1 (drum-processor) and stage 2 (disk-processor)]

The state of the model is described in terms of the numbers of processes at each stage. Let us denote by n_ir the number of processes from class r at stage i (i = 0,1,2; r = 1,2,...,c), and by n_i = Σ_{r=1..c} n_ir the total number of processes at stage i. We have Σ_{i=0..2} n_ir = m_r and Σ_{i=0..2} n_i = m. According to Muntz and Baskett the stationary probability of state n = (n_01,...,n_0c, n_11,...,n_1c, n_21,...,n_2c) is given by the following product form, if we assume that the service discipline of the CPU, for which the service time 1/μ_0r is class dependent, is "processor shared":

P_m(n) = K_m Π_{i=0..2} [ n_i! Π_{r=1..c} x_ir^{n_ir} / n_ir! ]    (3.2)
The x_ir are a solution of the transition equations:

Σ_{i=0..2} q_ij(r) μ_ir x_ir = μ_jr x_jr,   j = 0,1,2;  r = 1,2,...,c    (3.3)

The subscript m = (m_1, m_2, ..., m_c) indicates the (fixed) number of processes in each of the different classes. K_m is a factor which normalizes the sum of all probabilities to 1. We prefer to write (3.2) in the equivalent form:

P_m(n) = K'_m [ Π_{r=1..c} (m_r; n_0r, n_1r, n_2r) x_0r^{n_0r} x_1r^{n_1r} x_2r^{n_2r} ] / (m; n_0, n_1, n_2)    (3.4)

with the multinomial coefficients

(m; n_0, n_1, n_2) = m! / (n_0! n_1! n_2!),  (m_r; n_0r, n_1r, n_2r) = m_r! / (n_0r! n_1r! n_2r!),  etc.

In our model the homogeneous solution of (3.3) is easy to find:

x_0r = C,  x_1r = C τ_1 φ_r,  x_2r = C τ_2 ψ   (C is an arbitrary constant)

Because x_0r = x_0 and x_2r = x_2 do not depend on the class r, we can simplify (3.4) by considering only states (n_0, n_11, ..., n_1c, n_2). If we sum over all states n with (n_0, n_11, ..., n_1c, n_2) fixed, we obtain:

P(n_0, n_11, ..., n_1c, n_2) = K_m (n_0+n_2; n_0) x_0^{n_0} x_2^{n_2} [ Π_{r=1..c} (m_r; n_1r) x_1r^{n_1r} ] / (m; n_0, n_1, n_2)    (3.6)

with (n_0+n_2; n_0) and (m_r; n_1r) binomial coefficients and n_1 = Σ_r n_1r.
In the case of two classes (c = 2), with x_0 = 1, x_1r = τ_1 φ_r and x_2 = τ_2 ψ, we can derive the result:

P(n_0, n_11, n_12, n_2) = K_(m_1,m_2) (n_0+n_2; n_0) (m_1; n_11) (m_2; n_12) (τ_1 φ_1)^{n_11} (τ_1 φ_2)^{n_12} (τ_2 ψ)^{n_2} / (m; n_0, n_11+n_12, n_2)    (3.7)
This result is used in our model. Let us again consider the general probabilities P_m(n) given by (3.4). The CPU utilization U is given by the sum of the probabilities of all the states n with n_0 > 0:

U(m) = Σ_{ {n | n_0 > 0} } P_m(n)    (3.8)

If the CPU is not idle (n_0 > 0) then the fraction of the processor capacity spent on processes of class r is n_0r / n_0, because of the processor-sharing discipline. Let V_r(m) denote the mean of this fraction for class r. We have:

V_r(m) = E{ n_0r / n_0 | n_0 > 0 } = [ Σ_{ {n | n_0 > 0} } (n_0r / n_0) P_m(n) ] / U(m) = x_0r K_m / ( K_{m^-r} U(m) )    (3.9)

with m^-r = (m_1, m_2, ..., m_r − 1, ..., m_c).

In the special case where x_0r = x_0, x_1r = x_1 and x_2r = x_2 for each class r, i.e. where the processing behaviour is independent of the classes, we have the standard result:

P(n_0, n_1, n_2) = K x_0^{n_0} x_1^{n_1} x_2^{n_2}    (3.10)

and

V_r(m) = m_r / m    (3.11)
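For small populations the quantities U(m) and V_r(m) can also be evaluated by brute force, without the convolution recursions mentioned later. The sketch below (our own code, with illustrative parameter values, not the paper's program) enumerates the detailed states of S_1 under Muntz-Baskett product-form weights, using the traffic-equation solution x_0r = 1, x_1r = tau1*phi_r, x_2r = tau2*psi:

```python
# Brute-force evaluation of U(m) and V_r(m) for the two-class central server
# network S1 by enumerating all product-form states. A sketch only; the
# parameter values below are illustrative assumptions.

from math import factorial

def central_server(m, tau1, tau2, phi, psi):
    """m = (m1, m2); phi = (phi1, phi2). Returns (U, V1, V2): the CPU
    utilisation and the conditional processor fractions per class."""
    x = [(1.0, 1.0),                          # x_0r = 1            (CPU)
         (tau1 * phi[0], tau1 * phi[1]),      # x_1r = tau1 * phi_r (drum)
         (tau2 * psi, tau2 * psi)]            # x_2r = tau2 * psi   (disk)
    Z = busy = 0.0
    V = [0.0, 0.0]
    for n01 in range(m[0] + 1):               # split of class 1 over stages
        for n11 in range(m[0] - n01 + 1):
            for n02 in range(m[1] + 1):       # split of class 2 over stages
                for n12 in range(m[1] - n02 + 1):
                    n = [(n01, n02), (n11, n12),
                         (m[0] - n01 - n11, m[1] - n02 - n12)]
                    w = 1.0                   # product-form weight
                    for i in range(3):
                        w *= factorial(n[i][0] + n[i][1])
                        for r in range(2):
                            w *= x[i][r] ** n[i][r] / factorial(n[i][r])
                    Z += w
                    n0 = n01 + n02
                    if n0 > 0:                # CPU not idle
                        busy += w
                        V[0] += w * n01 / n0
                        V[1] += w * n02 / n0
    return busy / Z, V[0] / busy, V[1] / busy

U, V1, V2 = central_server((2, 2), tau1=0.01, tau2=0.05, phi=(20.0, 5.0), psi=2.0)
print(U, V1, V2)  # V1 + V2 = 1: the fractions are conditional on a busy CPU
```

With equal page-fault rates for both classes the code reproduces (3.11): V_r comes out as m_r/m.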
4. Queueing system S_2

The quantities U(m) and V_r(m), derived in the preceding section, will be used to solve the enclosing system S_2 by approximation. The approximation consists of the fact that we equate the network S_1 with a single server queue with serving rates which are derived from the closed network S_1. We take as the serving rate μ*_r of a process of class r the result:

μ*_r(m) = U(m) V_r(m) / ε_r   (r = 1,2,...,c)    (4.1)

where ε_r denotes the mean execution time per residence period for a process of class r. The memory scheduling proceeds, as we assumed, in a saturated condition, i.e. there is always a non-empty queue of processes waiting for admission. By γ we denote the class of the process at the head of this queue. This process, called the candidate, is the one to be scheduled next. Processes are only admitted at moments when some other process leaves S_1. Depending on the admission rule, there could be zero, one or more processes admitted. Admission occurs in order of arrival, starting with the candidate.

[figure 4: queueing system S_2 enclosing the sub-network S_1]

The states of model S_2 are described by s = (m, γ), where m = (m_1, ..., m_c) as defined in S_1, and γ ∈ {1,2,...,c} denotes the class of the candidate.
Let X be the set of all states s = (m, γ) which are admissible (this depends on the admission rule). If a state transition s' → s is the result of the departure of a process of class r and the admission of processes of the classes a_1, a_2, ..., a_p (p = 0,1,2,...), we speak about the "path":

s' → s^(0) → s^(1) → ... → s^(p) = s   (a_1 = γ')

The admission will stop after the first i for which s^(i) ∈ X, and we assume that such a state will always be reached for every row of waiting processes. The event that the memory scheduler encounters a process of class r as the next candidate is considered to be independent of the state of the model and has the probability λ_r. Therefore, if a transition s' = (m', γ') → s = (m, γ) can occur, it has the probability:

q^r_{s's} = λ_γ [ Π_{j=1..c} λ_j^{m_j − m'_j + δ_jr} ] N^r_{s's}    (4.2)

where δ_jr = 1 if j = r and δ_jr = 0 if j ≠ r; the transition requires a_1 = γ' and m_j − m'_j + δ_jr ≥ 0; and N^r_{s's} is the number of different paths from s' to s when a process of class r leaves. We are interested in the stationary solution p(s) of the Markov process as defined above. The balance equations, which must hold in the stationary case, have the form:

p(s) Σ_{r=1..c} μ_r(s) = Σ_{s'} p(s') Σ_{r=1..c} μ_r(s') q^r_{s's}    (4.3)

[...] (m ≥ m' means m_j ≥ m'_j for each component j).
To put it into words, "consistency" means that if in some configuration m a process of class r can't be admitted, then it can never be admitted in other configurations without compensation. For example, an admission rule based on fixed space requirements for processes of each class will be consistent.

Proof of (4.8): Define the set of states E_r which could be occupied immediately after the departure of a process of class r and before any admission:
E_r = { (m, γ) | (m_1, ..., m_r + 1, ..., m_c, γ) ∈ X }

Consider a path:

s' → s^(0) → s^(1) → ... → s^(p)

Clearly s^(0) ∈ E_r. If i > 0 then a_1 should be equal to γ'. We have s^(i) ∉ E_r for i > 0, because if we suppose that s^(i) ∈ E_r, then there is an s* ∈ X with m* ≥ (m'_1, m'_2, ..., m'_r, ...), which violates the consistency. We can therefore find all paths, for a given s, by considering the transitions in the reversed order, now removing processes until we find an s* ∈ E_r. Because the departure rates are normalized for each s, Σ_{r=1..c} μ_r(s) = 1, we find that the sum of the products corresponding to all possible paths will again be unity. ∎
We shall treat two cases where we can find a product form of p(s) which satisfies (4.9) for every path.

The first (obvious) case is when μ_r(s) = m_r / m. Then

p(s) = c (m; m_1, m_2, ..., m_c) λ_1^{m_1} λ_2^{m_2} ··· λ_c^{m_c}

is a solution.

The second case is given by the following theorem.

Theorem. If we have a Markov process as previously defined, with 2 classes of processes and a consistent admission rule, then the solution of (4.6) is given by:

p(s) = c λ_1^{m_1} λ_2^{m_2} / [ Π_{i=1..m_1} μ_1(i, θ_2(i)) · Π_{j=1..m_2} μ_2(θ_1(j), j) ]    (4.10)

where θ_2(m_1) = max { m_2 | (m_1, m_2) ∈ X } and θ_1(m_2) = max { m_1 | (m_1, m_2) ∈ X }.

Proof. In general it is easy to see that for a consistent admission rule N^r_{s's} ≠ 0 for only one r. Moreover, if there are only 2 classes, we have N^r_{s's} ≤ 1 (if for a path p ≥ 2, then a_2 = a_3 = ... = a_p ≠ r, so there is no choice with 2 classes). We need only show that the relation (4.9) is true for any transition s' → s. Define:

X_1 = { (m_1, m_2, γ) ∈ X | (m_1+1, m_2, γ) ∉ X }
X_2 = { (m_1, m_2, γ) ∈ X | (m_1, m_2+1, γ) ∉ X }
From the definition it follows that (m_1, θ_2(m_1)) ∈ X_2 and (θ_1(m_2), m_2) ∈ X_1. Therefore if s = (m_1, m_2) ∈ X_1 we have p(m_1, m_2) μ_2(m_1, m_2) = λ_2 p(m_1, m_2 − 1), and similarly for the other index. (We have omitted γ where it is irrelevant.) Consider the path from s' → s:

s' → s^(0) → s^(1) → ... → s^(p) = s

Then s^(1), ..., s^(p) ∈ X and a_2 = a_3 = ... = a_p ≠ r, from which it follows that:

p(s^(1)) = μ_{a_2}(s^(2)) ··· μ_{a_p}(s^(p)) p(s)    (4.11)

We have two cases:

r = γ': then s' = s^(0), both factors μ_r(s') and μ_{γ'}(s^(0)) cancel, and (4.9) is equivalent to (4.11).

r ≠ γ': then we always have s' ∈ X_r. From the consistency it follows that s* ∈ X and

p(s') μ_r(s*) = p(s^(1)) μ_{γ'}(s^(0))

which together with (4.11) gives (4.9). ∎
From the theorem we derive our main result: if we restrict our model to two classes and have a consistent admission rule, then the stationary solution of (4.3) is explicitly given by:

p(m_1, m_2, γ) = c λ_1^{m_1} λ_2^{m_2} λ_γ / ( [ Π_{i=1..m_1} μ_1(i, θ_2(i)) · Π_{j=1..m_2} μ_2(θ_1(j), j) ] Σ_{r=1..2} μ*_r(m_1, m_2) )    (4.12)

where

μ*_r(m_1, m_2) = U(m_1, m_2) V_r(m_1, m_2) / ε_r, as given by subnetwork S_1;
μ_r(m_1, m_2) are the normalised μ*_r (4.5);
θ_1 and θ_2 are defined in (4.10);
c is chosen so that the probabilities sum to unity.

We are interested in the following performance measures:

a) the mean CPU utilisation

Ū = Σ_{s∈X} U(m_1, m_2) p(m_1, m_2, γ)    (4.13)

b) the mean page-fault rate

φ̄ = Σ_{s∈X} [ Σ_{r=1..c} U(m_1, m_2) V_r(m_1, m_2) φ_r ] p(m_1, m_2, γ)    (4.14)

c) the mean number of memory-resident processes of each class, and the mean level of multiprogramming

m̄_r = Σ_{s∈X} m_r p(m_1, m_2, γ),  r = 1,2;   m̄ = m̄_1 + m̄_2    (4.15)

d) the mean throughput rate

Λ = Σ_r Λ_r   with   Λ_r = Σ_{s∈X} V_r(m_1, m_2) U(m_1, m_2) (1/ε_r) p(m_1, m_2, γ)    (4.16)

We have in fact (by a steady-state argument) Λ_r = λ_r Λ. The mean throughput rate is directly related to Ū by:

Λ = Ū / (λ_1 ε_1 + λ_2 ε_2)    (4.16a)

e) the mean residence time W_r for each class. According to Little's formula we have:

W_r = m̄_r / Λ_r,   r = 1,2.    (4.17)
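Relations (4.16a) and (4.17) tie the overall measures together. A minimal numeric sketch of the arithmetic (all input values below are invented, merely for illustration):

```python
# Illustrative evaluation of (4.16)-(4.17). Every number here is an assumed
# input, not a value from the paper.

lam = (0.7, 0.3)    # input fractions lambda_1, lambda_2 (sum to 1)
eps = (0.5, 2.0)    # mean execution time per residence period (sec)
Ubar = 0.85         # mean CPU utilisation, (4.13)
mbar = (2.4, 1.1)   # mean numbers of memory-resident processes, (4.15)

# (4.16a): the mean CPU work per completed process is sum_r lam_r * eps_r,
# so the overall throughput rate is
Lam = Ubar / (lam[0] * eps[0] + lam[1] * eps[1])
Lam_r = [l * Lam for l in lam]              # per class: Lam_r = lam_r * Lam

# (4.17), Little's formula: mean residence time per class
W = [mbar[r] / Lam_r[r] for r in range(2)]

print(Lam, Lam_r, W)
```

The first line of output is the throughput in completions per second; the residence times follow directly once the mean resident populations are known.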
5. Applications and Results

It is fairly easy to compute results from the solution of our model. The parameters which determine the probability distribution (3.7) of S_1 are:

τ_1 = mean drum service time,
τ_2 = mean disk service time,
φ_1 = page-fault rate of class 1,
φ_2 = page-fault rate of class 2,
ψ = I/O-request rate.

The CPU utilisation (3.8) and the processor fraction spent on each class (3.9) can be found without computing all the probabilities in (3.7) (see the computational techniques given by Buzen [9]). The page-fault rates φ_r depend upon the life-time function chosen and upon the memory allocation σ_r, where φ_r = 1/L_r(σ_r), r = 1,2,...,c. This allocation could be equal for all classes (σ_r = σ for all r; balanced partitioning) or "tailored" to each class (imbalanced partitioning). For given life-time functions it is theoretically possible to find the tailored allocation for which the page-fault rates are equal ("equi-page-fault allocation"):

φ_r = 1 / L_r(σ_r) = φ   for all r.

Furthermore, the solution of our model S_2 depends upon:

ε_r = mean execution time of class r per residence period,
λ_r = input fraction of class r,
X = a "consistent" set of admissible states.

The solution consists of the set of steady-state probabilities for all the states (m_1, m_2, γ) of S_2 (the distinction between γ = 1 and γ = 2 is of no further use to us, so if both occur we add the probabilities together). By means of these probabilities we can compute the "overall" performance measures as given by (4.13)-(4.17). Note that the throughput ratio between the two classes is specified by the ratio λ_1 : λ_2, which is imposed on the system by the given input. This input could be either "natural" or controlled by a choice mechanism. By controlling the input ratio we can influence the mean response time for processes of different classes within the total system S. Our main goal is to achieve an optimal throughput for given input ratios. This implies that we should optimize the mean CPU utilisation Ū (see 4.16a). Another factor is the overhead caused by page-faulting. Because a low mean page-fault rate usually corresponds to a high CPU utilisation, we have another reason for taking the mean CPU utilisation Ū as our main performance measure. The admission rule, which corresponds to the set X, will of course be directly or indirectly (e.g. via φ_r) related to the memory allocation σ_r and the total available memory M.
A.L. SCHOUTE

Before treating different allocation and admission policies, we first discuss some interesting points which can be derived from the (sub)model S_1 alone, when we examine the CPU utilisation for a fixed population (m_1, m_2) of processes. One question concerns the effect of imbalanced partitioning in the case where all processes have the same program characteristics (equal life-time functions). We have noticed in this case that in normal circumstances imbalanced partitioning is never preferable to balanced partitioning, in contrast to the impression given by Denning & Graham in [10]. Only in extreme cases, where the drum is a notorious bottleneck (as in the example they give on page 938), is there a better CPU utilisation with imbalanced allocation. Our results are in accordance with those of Ghanem, who proved in [11] that in cases where the imbalanced partitioning is advantageous, one should take the extreme partitioning, which means choosing a lower degree of multiprogramming.
The importance of this result is that in the sequel we never need to consider imbalanced allocations within one class in order to improve the system. In the case of different life-time functions, an imbalanced or, to put it positively, a tailored allocation will in general give a better utilisation of the CPU. This is highly dependent upon the life-time functions under consideration. For example, one cannot say in general whether an equipartition (σ_r = M/(m_1 + m_2)) or an equi-page-fault allocation (note that σ_r is then determined by m_1 σ_1 + m_2 σ_2 = M) is the more advantageous in relation to the CPU utilisation. An important factor is the mean page-fault rate, Σ_{r=1..c} V_r(m_1, m_2) φ_r, which must be minimized under the constraint Σ_r m_r σ_r = M. However, we can say that if an imbalanced allocation is applied in an alternating way ("biasing" [7]), irrespective of the program characteristics and uniformly in time, then in most cases this will not give a better CPU utilisation. An improvement of the CPU utilisation will however occur if "biasing" is performed on the basis of equal numbers of page-faults and not on the basis of equal time periods. This is due to the fact that on average the system stays longer in the state with the most favourable allocation (giving the lowest mean page-fault rate). Not only is the CPU utilisation important but also the processor fraction which is spent on each class. In fact one can find that the CPU utilisation stays at a reasonable level due to processes of one class, while the fraction spent on the other class is practically zero (as a result of a high page-fault rate). In the case of the equi-page-fault allocation the fractions are proportional to the numbers of processes in each class (see 3.11).
We can make a crude classification of page allocation policies by considering the following factors:

E: all processes get an equal share of the memory (equipartition), versus
T: processes get a "tailored" share of the memory;

and

I: the allocation is fixed throughout the residence time, versus
II: the allocation may change during the memory residence time.

We will give an admission rule for each of the combinations E-I, T-I, E-II and T-II, and compare the results for some values of the parameters.
If the allocation is fixed throughout the residence time (case I) we will not, in general, reach the optimal CPU utilisation in each state. This is however possible in case II, where the allocation may be adapted at those times at which removal and admission of processes take place. Such a page management policy may have negative repercussions on the paging behaviour of processes, and this must be taken into account by means of the life-time function. Where it is difficult to make an estimate of the effect of these abrupt, but relatively rare, changes (for example changes in the degree of multiprogramming), we proceed with a "best-case" analysis by using the same life-time functions as in case I. We start by comparing the cases which fall under I:

Case E-I: The only possible admission rule consists of imposing a fixed degree of multiprogramming m̄:  X = { (m_1, m_2, γ) | m_1 + m_2 = m̄ }.

Case T-I: In this case we also have little choice. It is sensible to fill the memory as far as possible, i.e. by applying the working-set principle. Thus the set X contains all the states (m_1, m_2, γ) which satisfy the relation:

M − σ_γ < Σ_{r=1..c} m_r σ_r ≤ M    (5.1)
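The admissible set X of (5.1) is finite and can be enumerated directly. A sketch; the allocations and memory size below are illustrative assumptions:

```python
# Enumerate the consistent set X of admissible states for the working-set
# admission rule (5.1) of case T-I. sigma and M are illustrative values.

def admissible(M, sigma, max_m=50):
    """All (m1, m2, gamma) with M - sigma_gamma < m1*s1 + m2*s2 <= M:
    memory is filled so far that the candidate of class gamma no longer fits."""
    X = set()
    for gamma in (1, 2):
        for m1 in range(max_m):
            for m2 in range(max_m):
                used = m1 * sigma[0] + m2 * sigma[1]
                if M - sigma[gamma - 1] < used <= M:
                    X.add((m1, m2, gamma))
    return X

X = admissible(M=100, sigma=(40, 25))
for s in sorted(X):
    print(s)
```

Note that the rule is based on fixed space requirements per class, so it is "consistent" in the sense of section 4, and the explicit solution (4.12) applies.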
An example is given in fig. 5, in which also the possible state transitions are shown. The tailored allocation should be chosen globally. We can choose for example: a) the equi-page-fault allocation, b) the allocation σ_r for which the optimal U is reached if there are processes of class r only, i.e. for [...] (5.2).

[figure 5: the set X of admissible states (m_1, m_2) under rule (5.1), with the possible state transitions]
In fig. 6, Ū is given against the (mean) degree of multiprogramming, for various input ratios and the two sets of life-time assumptions given in fig. 2. The points (m̄, Ū) of case E-I are connected by a smooth line in order to compare them with the points (m̄, Ū) given for case T-I. A non-uniform globally fixed allocation has the drawback of fragmentation: in general there is memory left unallocated. The effect of fragmentation could be significant, especially if the total memory is small. The fragmentation effect will be minimal, as we might expect, if the majority of processes (λ_1 » λ_2) belong to a class with a small working set (σ_1 [...])
For batch jobs Ç|^ is generated so that the number of running batch jobs at the same time is equal to installation parameter BACKGROUND. The load on the computer characterised by shown characteristics is the input to the system model simulating job scheduling, dispatching and processing '. The flowchart of the scheduling and dispatching is shown at Fig. 3. Scheduling is as follows: interactive job is started if the number of processed interactive jobs at the time is less then MOPLIMIT, in the other case job can not be put on the system. For batch jobs, when BACKGROUND is reached and JOBLIMIT-MOPLIMIT is not reached, job is set awaiting, when BACKGROUND is not reached, job is started, otherwise it is not put on the system. For started jobs the percen tage of computer power for every job /Computer Power Index CPI/ is calculated. From this level the system does not differentiate inter active and batch jobs. x/ For details see references 1, 2.
M. BAZEWICZ, A. PETERSEIL
[Fig. 2. Empirical job characteristics: probability distributions of the operational storage required by a program (k words), the processing time of a program, the overhead time, the number of programs per job, and of the interactive jobs]
ADAPTING A TIME-SHARING SYSTEM BY SIMULATION METHODS
Dispatching is as follows: started jobs are processed with computer time-sharing. Every job is given a time slice equal to W_k × SLOTTIME. At the end of every time slice the fair waiting time wf_k = W_k × SLOTTIME / CPI_k is calculated, and for the next time slice the dispatcher chooses the job whose real waiting time exceeds the fair waiting time by the largest amount. When the job to be processed is in the backing storage, it is swapped into the operational storage, and other jobs (only batch ones) are swapped out to the backing storage. In the simulation experiment the mean value of the system reply time related to the processing time of a job (including overhead) is measured both for batch and interactive jobs. It is estimated by
C_I and C_B, for interactive and batch jobs respectively: the mean, over the set JF of jobs finished up to the instant t, of the ratio of the system reply time to the processing time of the job. (Here ξ̂_k^n is the instant at which the n-th program of the k-th job finishes; it is calculated from ξ_k^n by adding the time segments, equal in total to cpt_k^n, disposed in the consecutive time slices ST assigned to k, together with the intervals between the ST.)
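The dispatching rule described above can be sketched as follows. The fair-waiting-time expression W_k × SLOTTIME / CPI_k is our reading of a partly illegible formula in the source, so treat it as an assumption:

```python
# Sketch of the dispatcher: at the end of a time slice, run the job whose
# real waiting time exceeds its fair waiting time by the largest amount.
# The fair waiting time W_k * SLOTTIME / CPI_k is an assumed reconstruction:
# a job entitled to the fraction CPI_k of the machine should wait roughly
# that long between slices of length W_k * SLOTTIME.

SLOTTIME = 0.1  # sec, illustrative

def pick_next(jobs, now):
    """jobs: dicts with 'w' (slice weight), 'cpi' and 'last_ran' (instant of
    the job's last slice). Returns the index of the job to run next."""
    def lag(j):
        fair = j["w"] * SLOTTIME / j["cpi"]  # assumed fair waiting time
        real = now - j["last_ran"]           # real waiting time
        return real - fair
    return max(range(len(jobs)), key=lambda i: lag(jobs[i]))

jobs = [
    {"w": 1, "cpi": 0.5,  "last_ran": 0.0},  # entitled to half the machine
    {"w": 1, "cpi": 0.25, "last_ran": 0.0},
    {"w": 1, "cpi": 0.25, "last_ran": 0.3},  # served recently
]
print(pick_next(jobs, now=0.5))
```

At time 0.5 the first job is chosen: it has been waiting 0.5 sec against a fair wait of only 0.2 sec, the largest excess of the three.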
The weighted sum of these values is called the quality coefficient Q = c_1 C_I + c_2 C_B.
RESULTS OF SIMULATION EXPERIMENT

It can easily be found that the installation parameters and the capacity of resources influence the system behaviour most. Our job is to find such values of these parameters as provide fast service of interactive jobs and also cause batch jobs not to wait too much, viz. to minimize Q. The weights c_1 and c_2 are set to 5 and 1 respectively, which makes the response time for interactive jobs five times more important (C_I and C_B are values of the same range). Because of hardware limitations of the real installation the following values of parameters were set: JL = 15, MP = 10, BC = 4 (one job is allowed to wait); the capacity of the operational storage is 128 k, and the auxiliary storage is practically infinite. Q as a function of MC with ST as the parameter was tested.
[Fig. 3: flowchart of job scheduling and dispatching]
Inside such systems minicomputers can perform specialized functions. The aim of the paper is to investigate the throughput of an arrangement made up of two minicomputers, one dedicated to managing the periphery and carrying out a portion of the computing requirements (front-end function), the other dedicated to carrying out the remainder of the computing requirements (back-end function). Both the front-end and the back-end are accessible via one queue each; the maximum number of jobs circulating in the system equals the maximum number of jobs that can be simultaneously open in the front-end minicomputer, that is, its multiprogramming grade. Every job arriving at the system has first to be processed by the front-end before it may proceed inside, and has again to be processed there before it may leave the system. The queueing discipline is FCFS for the jobs waiting to be serviced by the back-end, whereas the jobs waiting to be serviced by the front-end may have different scheduling, according to whether they have not yet been processed (priority) or have already been processed (preempt-resume). The analysis of the effect of these operating modes represents a further contribution to the study of the CSM (Central Server Model) traffic model.
D. GRILLO, A. PERUGIA
Key assumptions are: closed network, exponential service times, one type of job, scheduling disciplines independent of queue length, no overhead.
II. System Description and Operating Rules

Aside from the applications for which they are just tailored, minicomputers are also becoming used as building modules for computer complexes [1][2][3]. By properly arranging minicomputers, greater flexibility in matching customer requirements in a wide range of situations is expected than conventional computing systems can presently allow. A situation in which the resort to an arrangement of minicomputers could be of advantage is, for example, the case of an information system operating in a real-time environment. A major feature of such a system, as described in [4], should be the implementation of periphery management and computing functions in a set of dedicated minicomputer front-ends, while information retrieval should be carried out by a second set of dedicated minicomputer back-ends. A suitable combination of architecture and control should allow independent and stepwise expansion of the two sets according to the traffic demand, without any change in the system layout and in the control procedures. As a first step toward the analysis of the system behaviour a heavily schematized model has been investigated, which nevertheless gives some insights into the throughput mechanism. The model is characterized by reducing the system to one front-end and one back-end and by assuming that control times play no role in the traffic flow and that no overhead occurs. Both the front-end and the back-end are accessible via one queue each. The emphasis of the investigation is on the impact of queue disciplines on the system throughput, which is taken as a measure of the system efficiency.
FRONT- AND BACK-END MINICOMPUTER ARRANGEMENT
To this end, as an alternative to the discipline FCFS throughout, the possibility is considered of shortening in a differential way the waiting time of jobs in different stages of advancement. The typical pattern of a job is as follows: upon entering the system the job joins the front-end queue, where with probability p0 it becomes the last item or with probability 1-p0 the first one. After being processed by the front-end for the first time, the job joins the back-end queue, where it becomes the last item. When the job leaves the back-end it joins again the front-end queue, where with probability p1 it becomes the last item, or it preempts with probability 1-p1 the job being serviced by the front-end. After being serviced for the second time in the front-end the job leaves the system, and any preempted job will be resumed at exactly the same processing stage where it was interrupted. By setting both probabilities to 1 the scheduling discipline for the front-end becomes FCFS, while by setting both of them to 0 it becomes LCFS preemptive-resume. A continuum of disciplines is realized by varying p0 and p1 independently between 0 and 1.
Before leaving the system a job must be serviced by the front-end, that is, by the same server a job has to pass through first. The analysis of the effect of these operating modes represents a further contribution to the study of the CSM traffic model [4], [5].
In this paper only the extreme cases resulting from the combination of the values 0 and 1 for p0 and p1 are investigated analytically.
For the case 0 <= p0 <= 1 and 0 <= p1 <= 1, with p0 != p1, analytic expressions of the throughput are obtained for low values of the multiprogramming grade by means of a program that operates algebraically; further analysis is based on simulations.
III. The Traffic Model
The traffic model is defined when statements about the system lay-out, the traffic process and the servicing disciplines are made. In what follows, by front- (back-)end subsystem will be understood the front- (back-)end queue and the front- (back-)end server. The lay-out consists of the two subsystems in tandem, as represented in Fig. 1.
For the arrival process the assumption of "closed network" is made [7], i.e. each job leaving the system is immediately replaced by a new job or, in other words, the number of jobs circulating inside the system is kept constant. In the model this number equals the multiprogramming grade of the front-end, MPG. All jobs are of the same type, i.e. they all have the same service distributions, which are assumed to be negative exponential with rates r and 1 (without loss of generality) for the front- and the back-end respectively.
According to the assumption of closed network and to the operating rules described in the preceding Section, each job traces a double-loop trajectory in the system, touching the front-end unit twice, as evidenced in Fig. 1. Jobs can be regarded as belonging to either of two disjoint classes: upon entering the system, from the upper loop, a job belongs to class 0; after leaving the front-end toward the lower loop a job belongs to class 1; after leaving the front-end toward the upper loop a job belongs again to class 0. The back-end subsystem contains only jobs belonging to class 1, whereas the front-end subsystem contains jobs belonging to both classes.
Jobs in the back-end subsystem are queued according to the FCFS discipline. On arrival at the front-end subsystem jobs are queued: according to FCFS with probability p0 (p1) if they belong to class 0 (1); with priority and probability 1-p0 if they belong to class 0; according to preemptive-resume with probability 1-p1 if they belong to class 1.
The sequence of class designations attached to the jobs as they are actually ordered in a subsystem queue, together with the class designation of the job currently being processed in the same subsystem, represents the state of that subsystem.
Fig. 1 - System lay-out: the front-end subsystem and the back-end subsystem in tandem.
As the state of the back-end subsystem adds nothing to the knowledge of the system state beyond what is furnished by the state of the front-end subsystem, the latter fully characterizes the state of the whole system. The space S of system states has dimension

|S| = Sum_{i=0}^{MPG} 2^i    (III-1)

that is, the number of ways of choosing with replacement i elements (the class designations for the job being serviced and for i-1 jobs in queue) out of a population of two elements (class 0 and class 1), i = 0(1)MPG.
Fig. 2 gives a picture of the state space for MPG = 3 together with the transition rates between states. The i-tuple c1 c2 ... ci, i = 1(1)MPG, inside each circle represents the sequence of class designations in the front-end subsystem for, from left to right, the job currently being processed, the first, the second, ..., the (i-1)-th job in the queue; correspondingly the state of the system is represented by {c1 c2 ... ci}. The state "front-end subsystem empty" is represented by {O}.
The throughput of the system is defined as the rate at which jobs exit from the system or, equivalently, the rate at which in the front-end subsystem a job changes from class 1 to class 0. Indicating by

S' = Union {1 c2 ... ci} ,  c2, ..., ci in {0, 1} ,  i = 1(1)MPG    (III-2)

the subspace of all states whose evolution involves the replacement of a job belonging to class 1 with a job belonging to class 0, the expression of the throughput is

throughput_MPG = r Prob(S') = r Sum Prob({1 c2 ... ci}) ,  for all c2, ..., ci ,  i = 1(1)MPG    (III-3)
Fig. 2 - State space for the general case (MPG = 3).
As under steady-state conditions the rate at which a job circles in the upper loop equals the rate of circling in the lower loop, the throughput may also be expressed as

throughput_MPG = r Prob( Union {0 c2 ... ci} ) = r Sum Prob({0 c2 ... ci}) ,  for all c2, ..., ci ,  i = 1(1)MPG    (III-4)

when considering the flow into the back-end subsystem, or as

throughput_MPG = Prob( {O} , Union {c1 c2 ... ci} ) (o) = Prob({O}) + Sum Prob({c1 c2 ... ci}) ,  for all c1, ..., ci ,  i = 1(1)(MPG-1)    (III-5)

when considering the flow out of the back-end subsystem. By combining expressions III-3 and III-4 one also has

throughput_MPG = (r/2) (1 - Prob({O}))    (III-6)

which is useful for computation purposes.
For ease of notation a state {c1 c2 ... ci} will be referred to in what follows as s_ij, i = 1(1)MPG and j = Sum_{k=1}^{i} c_k 2^(i-k), and Prob({c1 c2 ... ci}), i.e. Prob(s_ij), will be shortened to P_ij.
The expressions III-3, III-4 and III-5 of the throughput then become respectively

throughput_MPG = r Sum_{i=1}^{MPG} Sum_{j=2^(i-1)}^{2^i - 1} P_ij    (III-7)

(o) Prob(a, b) indicates in this context Prob(a U b).
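The state space and the s_ij indexing can be enumerated directly. The following is a small sketch; the bit convention for j (c1 as the most significant bit) is our reading of the paper's notation:

```python
from itertools import product

def state_space(mpg):
    """Map each front-end state {c1 c2 ... ci} to its index pair (i, j),
    with j = sum_k c_k * 2**(i-k); (0, 0) stands for the empty state."""
    index = {(): (0, 0)}
    for i in range(1, mpg + 1):
        for cs in product((0, 1), repeat=i):
            j = sum(c * 2 ** (i - 1 - k) for k, c in enumerate(cs))
            index[cs] = (i, j)
    return index

s = state_space(3)
# |S| = sum_{i=0}^{MPG} 2**i = 2**(MPG+1) - 1; for MPG = 3 that is 15
assert len(s) == 2 ** 4 - 1
assert s[(1, 0, 1)] == (3, 5)   # c1 is the most significant bit of j
```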
throughput_MPG = r Sum_{i=1}^{MPG} Sum_{j=0}^{2^(i-1) - 1} P_ij    (III-8)

throughput_MPG = Sum_{i=0}^{MPG-1} Sum_{j=0}^{2^i - 1} P_ij    (III-9)

and expression III-6 becomes

throughput_MPG = (r/2) (1 - P_00) = (r/2) Sum_{i=1}^{MPG} Sum_{j=0}^{2^i - 1} P_ij    (III-10)
when the conventions s_00 = {O} and P_00 = Prob({O}) are made.
The steady-state "global balance" [7] equations are

P_00 = r P_10

(r + d_i) P_ij = j(mod 2) p1 P_{i-1, (j-1)/2} + (j+1)(mod 2) r p0 P_{i, j/2 + 2^(i-1)} + r (1-p0) P_{i, j + 2^(i-1)} + d_i r P_{i+1, j} ,
        for j = 0(1)(2^(i-1) - 1)

(r + d_i) P_ij = j(mod 2) p1 P_{i-1, (j-1)/2} + (1-p1) P_{i-1, j - 2^(i-1)} + (j+1)(mod 2) r p0 P_{i, j/2 + 2^(i-1)} + d_i r P_{i+1, j} ,
        for j = 2^(i-1)(1)(2^i - 1)

i = 1(1)MPG    (III-11)

where d_i = 1 if i < MPG and d_MPG = 0.
together with the normalizing condition

Sum_{i=0}^{MPG} Sum_{j=0}^{2^i - 1} P_ij = 1    (III-12)
from which, by making use of relations VII-8 and VII-9, the asymptotic value of the throughput is given by min(r/2, 1).
Over the whole range of values for the ratio between the service rates, and for each value of the multiprogramming grade, the combination p0 = 0, p1 = 1 is superior to the combination p0 = p1, which in turn is superior to the combination p0 = 1, p1 = 0. Moreover, results collected by a mixed analytical and simulative approach support the conjecture that the throughput given by any combination of p0 and p1 lies between the extreme cases p0 = 0, p1 = 1 and p0 = 1, p1 = 0, and that combinations characterized by p0 < p1 are superior to combinations characterized by p0 > p1.
The combinations p0 = p1 = p are all equivalent for any p, a result which raises the question as to whether the model [8] could be credited with more generality.
The dependence of the system throughput on the scheduling discipline becomes weaker when the multiprogramming grade increases. For values of the multiprogramming grade above ten there is practically no more difference among all possible combinations of p0 and p1, and ever less appreciable gains in the throughput are achieved.
The model considered in this paper is a heavy schematization of the reality. Further study is currently being undertaken to better match the model to the reality and to extend it to the case of many front-ends and many back-ends. First results achieved by simulation seem to indicate that the system behaviour for the case of many front-ends and many back-ends does not significantly differ from that of the system investigated in this paper, provided scale factors are taken into consideration.
References
[1] W.A. Wulf and C.G. Bell, "C.mmp - A Multi-miniprocessor", Fall Joint Computer Conference, 1972.
[2] D.P. Bhandarkar, "Analysis of Memory Interference in Multiprocessors", IEEE Transactions on Computers, Sept. 1975.
[3] W.A. Wulf, R. Levin and C. Pierson, "Overview of the HYDRA Operating System", 5th Symposium on Operating System Principles, Austin, Texas, Nov. 1975.
[4] C.H. Sauer and K.M. Chandy, "Approximate Analysis of Central Server Models", IBM J. Res. Develop., May 1975.
[5] W.M. Chow, "Central Server Model for Multiprogrammed Computer Systems with Different Classes of Jobs", IBM J. Res. Develop., May 1975.
[6] M. Brizzi and G. Cioffi, "Studio di sistemi di elaborazione distribuiti per applicazioni gestionali in tempo reale", Fondazione Ugo Bordoni Interim Report, 1976.
[7] K.M. Chandy, "The Analysis and Solutions for General Queueing Networks", 6th Annual Princeton Conference on Information Sciences and Systems, Princeton University, Mar. 1972.
[8] F. Baskett, K.M. Chandy, R.R. Muntz and F.G. Palacios, "Open, Closed, and Mixed Networks of Queues with Different Classes of Customers", Journal of the Association for Computing Machinery, Apr. 1975.
[9] U. Herzog, L. Woo and K.M. Chandy, "Solution of Queueing Problems by a Recursive Technique", IBM J. Res. Develop., May 1975.
Appendix
The relation

throughput_MPG | (p0=0, p1=1)  >  throughput_MPG | (p0=p1=p) ,  0 <= p <= 1    (A-1)

is verified if

1/Prob({O}) | (p0=0, p1=1)  >  1/Prob({O}) | (p0=p1=p) ,  0 <= p <= 1    (A-2)

since, by III-6, the throughput increases as Prob({O}) decreases. The normalizing condition VII-8 can alternatively be expressed as a sum of binomial coefficients (A-3), and Prob({O}) for p0 = p1 = p is obtained from V-1 and V-2 (A-4). Relation A-2 then reduces to an inequality between sums of binomial coefficients (A-5, A-6), whose validity can easily be verified, which supports statement A-1.
See Fishman (1967). This approach has not been applied in practical simulation studies since the estimation of the rho's is too clumsy.

(iii) Independent, regenerated blocks
In a queuing system the behavior of the system after it has returned to the empty state is independent of past behavior. So we may form blocks (tours, cycles) of customers, the block lengths depending on the return to the idle state. This approach creates independent, identically distributed blocks. In general, Markov systems have this regenerative (renewal) property. The practical problem is to detect whether the system one is simulating has such renewal states. There are also some statistical estimation problems, since the point estimators and confidence intervals do not use straightforward formulas like eq. (1). See Iglehart (1975) and Fishman (1974) for more details. Applications of this approach to the simulation of computer systems can be found in Coppus et al. (1976), Lavenberg and Slutz (1975), and Iglehart (1975, p. 2).

Next we consider the initialization problem. Even if we wish to estimate the steady-state response, the simulated system first has to pass through the transient state. We usually throw away the transient-state observations in approaches (i) and (ii) above. If we replicate runs we throw away observations at the beginning of each run! It is difficult to test statistically whether the transient state is over; see Leroudier and Parent (1976). Rules of thumb do exist, e.g. we might discard observations as long as they seem to increase steadily. Note that Blomqvist (1970) showed that for simple queuing systems the mean square error is minimal if all observations, steady-state and biased transient-state, are averaged. These problems of initialization disappear in approach (iii), since starting in the empty state is perfectly acceptable and permits taking measurements immediately.
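The regenerative approach (iii) can be sketched on an M/M/1 queue (our own illustration, not an example from the text): each busy cycle starting at an arrival to the empty system is an i.i.d. block, so the classical ratio estimator below needs no transient-state deletion.

```python
import random

def regenerative_mean_wait(lam, mu, n_cycles, seed=2):
    """Regenerative (renewal) estimate of the mean wait in an M/M/1
    queue: mean wait = E(wait sum per cycle) / E(customers per cycle),
    where a cycle runs from an arrival at an empty system to the next
    return to the empty state."""
    rng = random.Random(seed)
    total_wait, total_size = 0.0, 0
    for _ in range(n_cycles):
        t, departure = 0.0, 0.0      # times within the current cycle
        while True:
            wait = max(0.0, departure - t)      # Lindley recursion
            total_wait += wait
            total_size += 1
            departure = t + wait + rng.expovariate(mu)
            t += rng.expovariate(lam)
            if t >= departure:       # system empties: the cycle ends
                break
    return total_wait / total_size
```

For lam = 0.5, mu = 1 the true mean wait is lam/(mu*(mu-lam)) = 1, which the estimator approaches without discarding any initial observations.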
Once we have an estimator for the variance of the estimated average response (y), we can perform further statistical analyses. Confidence intervals for the expected value, say eta, based on the Student t-statistic, are of the following type:
STATISTICS AND SIMULATION
P( eta < y + t(alpha; n-1) * s / sqrt(n) ) = 1 - alpha    (3)
The t-statistic can often be applied since it is robust against non-normality. Moreover, if y is an average, then y may be asymptotically normal even if its components (say w) are dependent; see Kleijnen (1975, p. 455). Once we have a confidence interval, it is simple to test a particular null-hypothesis, say H0: eta > eta0. For, if the interval in eq. (3) does not cover the value eta0 of the null-hypothesis, then we reject that null-hypothesis. Confidence intervals of fixed length, say c, can be determined (approximately) by selecting a stochastic sample size

n = { t(alpha; n-1) * s / c }^2    (4)
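Eqs. (3) and (4) in code form, as a minimal sketch; the Student-t quantile is supplied by the caller (e.g. from a table), and all names are illustrative:

```python
from statistics import mean, stdev

def one_sided_upper_bound(y, t_quantile):
    """Eq. (3): upper confidence bound ybar + t * s / sqrt(n)."""
    n = len(y)
    return mean(y) + t_quantile * stdev(y) / n ** 0.5

def fixed_length_sample_size(s, t_quantile, c):
    """Eq. (4): the (stochastic) sample size n = (t*s/c)**2 giving an
    interval of predetermined half-length c; in the sequential approach
    s, and hence n, is recalculated as observations come in."""
    return (t_quantile * s / c) ** 2
```

For example, with the tabulated value t(0.05; 2) = 2.92, three observations [1, 2, 3] give the bound 2 + 2.92/sqrt(3).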
Observe that in eq. (4) the standard deviation s can be recalculated as more observations y become available, the so-called sequential approach. For intervals of predetermined length and sequential tests we refer to Robbins et al. (1967) and Gosh (1970). For additional discussions within a simulation context we refer to Kleijnen (1975) and Robinson (1976).

In practical simulation studies one usually characterizes a whole time path by a single number (usually the mean, but y might also denote a sum or a maximum queue length). For stationary time series a more detailed analysis can be performed through spectral analysis, which shows whether the time series has particular periodicities; see Fishman and Kiviat (1967). In some simulations (e.g. Systems Dynamics) one "analyses" the non-stationary time series intuitively and qualitatively, e.g. one simply determines whether the time series explodes or stabilizes.

3. COMPARING AND RANKING SEVERAL SYSTEMS
In the preceding section we discussed the sample size (the number of observations or runs) for a single system variant, or "population" in statistical jargon. Actually, a simulation is done since we are interested in the performance of alternative systems. In the present section we shall examine sample-size considerations in case we investigate a few systems, e.g. k computer systems with different queuing priorities (k exceeds 2 but is rather small; the case of a great many system variants will be presented in the next section). We distinguish situations with a fixed sample size and with a variable sample size. For a single population this distinction corresponds with eqs. (3) and (4) respectively.

(i) Multiple comparison procedures
The number of observations per population may be fixed because we are doing a pilot experiment or because the available computer time is limited. Suppose we wish to compare some new systems
J . P . C . KLEIJNEN
with a standard system. Analogous to eq. (3) we obtain

P{ eta_i - eta_0 < (y_i - y_0) + t(alpha; v_i) * sqrt( s_i^2/n_i + s_0^2/n_0 ) } = 1 - alpha ,   i = 1, ..., (k-1)    (5)

where the degrees of freedom v_i might be taken as

v_i = min(n_i, n_0) - 1 ,   i = 1, ..., (k-1)    (6)
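Eqs. (5) and (6) can be sketched as follows, already including the Bonferroni alpha/m correction discussed in the text; the t-quantile lookup is passed in by the caller, and all names are our own:

```python
from statistics import mean, stdev

def bonferroni_differences(standard, others, t_quantile_fn, alpha):
    """Bonferroni-corrected one-sided bounds for eta_i - eta_0, eq. (5),
    with alpha replaced by alpha/m, where m = k-1 statements.
    t_quantile_fn(p, df) must return the upper p-point of Student's t
    (from a table or any statistics library)."""
    m = len(others)                       # number of statements
    y0, s0, n0 = mean(standard), stdev(standard), len(standard)
    bounds = []
    for y in others:
        yi, si, ni = mean(y), stdev(y), len(y)
        df = min(ni, n0) - 1              # eq. (6)
        t = t_quantile_fn(alpha / m, df)  # Bonferroni: alpha/m each
        half = t * (si ** 2 / ni + s0 ** 2 / n0) ** 0.5
        bounds.append((yi - y0) + half)
    return bounds
```

Each bound holds with confidence 1 - alpha/m, so all m statements hold jointly with experimentwise confidence at least 1 - alpha.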
Other approaches for v are discussed in Kleijnen (1975, pp. 471-473). However, the point we want to emphasize here is that in each of the (k-1) confidence intervals of eq. (5) we can make an error with probability alpha. Hence the probability that all statements based on a single experiment are correct is no longer (1-alpha). To keep the so-called experimentwise error rate below alpha we may apply "multiple comparison" procedures. The simplest technique continues to use intervals like eqs. (3) and (5) but reduces alpha in t to alpha/m, where m is the number of statements based on a single experiment; the so-called Bonferroni approach. In the example of eq. (5) we then have m = k-1. We can make various types of comparisons among the k population means, e.g. k-1 comparisons with the "standard" mean associated with the existing system (eta_i - eta_0 with i = 1, ..., k-1); all k(k-1)/2 pairwise comparisons (eta_i - eta_i', i != i'); selection of a subset containing the best population, or a subset containing all populations better than the standard population (the populations in the subset may be further studied in additional experiments). We refer to Kleijnen (1975) for a discussion of multiple comparison procedures for various simulation situations, their efficiency and robustness, and types of error rates; also see Miller (1966) and Gupta (1965). Lavenberg and Slutz (1975) applied the Bonferroni approach in their simulation of several computer configurations. In most simulations, however, the experimentwise error rate is ignored.

(ii) Multiple ranking procedures
There are procedures to determine how many independent observations (simulation runs) should be taken from each of the k populations (systems) in order to select the best population. The best population is usually the one with the largest mean. Some procedures have been derived for other selection criteria, e.g. the variance, or for other problem formulations, e.g. a complete ranking of all populations from worst to best.
Most procedures are based on the "indifference zone" approach: the procedure guarantees a correct selection with probability at least P* (or, in the familiar symbols, 1-alpha) only if the best population mean is at least delta* better than
the next best mean, P* and delta* being specified by the experimenter. A recent monograph is Bechhofer et al. (1968); a survey of existing procedures, their efficiency and robustness, and some heuristic procedures is given in Kleijnen (1975). Note that most procedures are sequential. Computer simulation experiments are well suited to sequential schemes since the digital computer operates sequentially. We do not know of any applications of these ranking procedures. In practice one usually bases his selection on point estimates, possibly augmented with individual confidence intervals like eq. (3). If all factors are quantitative then many system variants result, and we have the situation of the next section.

4. DESIGN AND ANALYSIS OF MANY SYSTEMS
Different systems (system variants) have different values or types of system parameters, input variables, and behavioral relationships (operating characteristics). These varying parameters, etc. are called "factors" in statistical design terminology. A quantitative factor can have many values or "levels". A qualitative factor has only a limited number of levels, e.g. 2 levels, namely 2 queuing priority rules. If we want to investigate k factors, factor i having L_i levels, then the number of combinations of factor levels is the product of the L_i, i = 1, ..., k. For example, if there are only 7 factors and each factor is at its minimum number of levels (L_i = 2), then we still have 2^7 = 128 combinations! Therefore we look for an experimental design that specifies a selected number of combinations that will actually be investigated. (Remember that 1 factor combination specifies 1 system, which is simulated over time and results in 1 time path for that particular system.) Besides a selection of the, say, N factor combinations we need a model to interpret the many responses (output data). This model we call a metamodel.
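The combinatorial explosion of factor-level combinations is easy to verify with a small sketch:

```python
from itertools import product
from math import prod

def all_combinations(levels_per_factor):
    """Enumerate every combination of factor levels; the count is the
    product of the L_i, e.g. 7 two-level factors give 2**7 = 128."""
    return list(product(*[range(lv) for lv in levels_per_factor]))

combos = all_combinations([2] * 7)
assert len(combos) == prod([2] * 7) == 128
```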
In practice one starts from a "basic" system configuration, and varies one factor after the other until one's time and energy are exhausted. A few examples of computer studies using experimental designs are provided by Schatzoff and Tillman (1975, p. 254). Applications of regression metamodels, not to simulation experiments but to real-life experiments with computer systems, are given by Schatzoff and Bryant (1973). Compared to the scientific methods of the following subsections, the one-factor-at-a-time approach requires more effort, nevertheless resulting in less result! Interactions among factors cannot be estimated in the "practical" one-factor-at-a-time method. Lack of scientific experimental design is often accompanied by lack of analysis. Many output data are provided, but since no metamodel is available the interpretation is left to the intuition. See also Kleijnen (1975, pp. 289-290), and Schatzoff and Tillman (1975, p. 254).

4.1. THE REGRESSION METAMODEL: MAIN EFFECTS AND INTERACTIONS
Denote the output variable of the simulation by y, the k input factors by x_j (j = 1, ..., k), and the vector of random numbers by r. (Vectors and matrices are denoted by an arrow.) Hence
y = f(x_1, ..., x_k, r)    (7)
where the function f is no simple explicit function but is specified by the computer simulation program. As a "black box" metamodel we propose a regression model that is linear in its parameters beta:

y_i = beta_0 + (beta_1 x_i1 + ... + beta_k x_ik) + (beta_12 x_i1 x_i2 + ... + beta_{k-1,k} x_{i,k-1} x_ik) + (beta_11 x_i1^2 + ... + beta_kk x_ik^2) + (...) + e_i    (8)
where x_ik is the value of input variable k in execution i of the simulation program f; beta_k is the main effect or first-order approximation of the effect of x_k; beta_{k-1,k} is the interaction between the variables k-1 and k; beta_kk is the quadratic effect of variable k; beta_0 is the overall response level. The inaccuracy of the model is the disturbance (noise) term e. The term (...) in eq. (8) suggests that more complicated regression models are possible; e.g., higher-order interactions among more than 2 variables may be introduced. Note that an interaction like beta_jk means that the effect of factor x_k on the response y also depends on the level of factor x_j. If one or more variables x are qualitative, then we use dummy values 0 and 1 for the corresponding x and we also set its quadratic effect to 0, since that effect has no interpretation. For both qualitative and quantitative factors the effects beta can be estimated by least-squares regression analysis. If all factors are qualitative then a subset of regression analysis results, known as analysis of variance (ANOVA). If we doubt the adequacy of the above metamodel then we perform the following test.
The noise e has a variance sigma^2 that can be estimated by duplicating observations y at 1 specific combination of the x's. So if we repeat the simulation of system i n_i times then

s_i^2 = Sum_{g=1}^{n_i} (y_ig - y_i_bar)^2 / (n_i - 1)    (9)
Next consider the "residual mean squares", which is well known in regression analysis:

SS_R = Sum_{i=1}^{N} (y_i - y_i_hat)^2 / (N - q)    (10)

where y_i_hat is the response predicted by the regression model with,
say, q least-squares estimators beta_hat (q = k+1 in a first-order model; see eq. 8). We know that SS_R has the same expected value as sigma_hat^2, provided the regression model is correct; otherwise SS_R is inflated. Hence we can test a possible lack of fit of the metamodel through the F-test in eq. (11); see Kleijnen (1975, pp. 365-367).

F = SS_R / s_e^2 ,  with N - q degrees of freedom in the numerator    (11)
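To make eqs. (8) and (10) concrete, here is a minimal first-order fit on an invented 2^2 factorial data set (all numbers and names are illustrative); with an orthogonal design matrix the least-squares estimates reduce to simple column projections:

```python
def fit_first_order(X, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (eq. 8, first order)
    for an orthogonal design matrix X with a leading column of ones:
    then X'X is diagonal and b_j = (col_j . y) / (col_j . col_j)."""
    n, q = len(X), len(X[0])
    beta = []
    for j in range(q):
        col = [row[j] for row in X]
        beta.append(sum(c * yi for c, yi in zip(col, y)) /
                    sum(c * c for c in col))
    resid = [yi - sum(b * xj for b, xj in zip(beta, row))
             for row, yi in zip(X, y)]
    ss_r = sum(rr * rr for rr in resid) / (n - q)   # eq. (10)
    return beta, ss_r

# 2**2 factorial design with an intercept column of ones
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
y = [1.0, 3.0, 5.0, 7.0]       # exactly y = 4 + 1*x1 + 2*x2
beta, ss_r = fit_first_order(X, y)
```

Since the invented data contain no noise or lack of fit, SS_R comes out zero; on real simulation output SS_R would be compared with the duplication estimate of eq. (9) via the F-test (11).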
Regression or ANOVA has been used in a few simulation studies; for references and a case study see Kleijnen (1975). The above regression models should be distinguished from the following models. When building a simulation one has to choose the degree of detail of representation. More global, aggregated models are always possible. For instance, Eilon and Mathewson (1973) used a global network as a "metamodel" for a detailed queuing model of an airport. The regression metamodel is the ultimate black-box model that can be used to interpret the results of the (more or less detailed) simulation model.
To estimate the q parameters beta in eq. (8) we need to select N combinations x_ij (i = 1, ..., N and j = 1, ..., k), which also determine the cross-products x_ij x_ij', etc. This is the experimental design problem. We discuss this vast problem area in 3 steps, namely screening, further exploration, and optimization.

4.2. SCREENING DESIGNS
If the simulation model has a great many, conceivably important, factors then we start with a preliminary (pilot) investigation to detect the important factors. These factors may form the subject of both sensitivity analysis and optimization. Several screening designs are available.

(i) 2^(k-p) designs
Let us first look at an example. Suppose we investigate 3 factors, each having only 2 levels, denoted by +1 and -1, or briefly + and -. Then a possible selection of only 4 of the 2^3 = 8 possible combinations, i.e. a 2^(3-1) design, is shown in table 1. Note that all columns in table 1 are orthogonal, i.e.

Sum_{i=1}^{N} x_ij x_ij' = 0   if j != j'    (12)
Hence least-squares estimators beta_hat of the main effects of the 3 factors are possible. For, in general
beta_hat = (X'X)^-1 X'y    (13)
where X denotes the matrix with elements x_ij (i = 1, ..., N, and j = 0, ..., q-1), with x_i0 = 1. The variance of the estimators beta_hat follows from the matrix of variances and covariances, say Omega:

Omega_beta_hat = (X'X)^-1 sigma^2    (14)
Combination   x1   x2   x3 = x1*x2
     1         +    +    +
     2         +    -    -
     3         -    +    -
     4         -    -    +
Table 1: A 2^(3-1) design.
It can be proved that these variances are minimal if X is orthogonal; Kleijnen (1975, p. 371). Procedures to construct such orthogonal matrices are provided in the experimental design literature, and results have also been tabulated. In general, in 2^(k-p) designs all k factors are at 2 levels and only a fraction (viz. a 2^-p fraction) is examined. Depending on the size of this fraction, estimates are still possible of main effects and, possibly, of low-order interactions; see Box and Hunter (1961) and subsection 4.3 below. However, if k is high the number of combinations is still too high, for if we have k factors then N exceeds k even if we estimate only the k main effects. In that case other designs are needed.

(ii) Random designs
The combinations of factor levels are randomly selected from among all possible combinations. The number of combinations (N) can be determined independently of the number of factors (k) and levels. Hence N may be chosen even smaller than k; see Satterthwaite (1959). A disadvantage is that this random selection may result in combinations that do not permit "good" estimates of the individual effects. (The degree of orthogonality of the columns in the matrix of independent variables is stochastic instead of being controlled.) Therefore the following type of design was developed.

(iii) Supersaturated designs
The number of combinations (N) is smaller than the number of factors (k), and the combinations are so selected that, given N and k, the estimators of the factor effects are as "good" as possible. (The maximum non-orthogonality of the design columns is minimized.) Booth and Cox (1962) used an iterative computerized procedure to derive these designs for 7 selected (k, N) combinations. If other combinations are desired the following type of design is useful.
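The construction of table 1 and its orthogonality property (12) can be checked mechanically; a small sketch using the generator x3 = x1*x2:

```python
from itertools import product

def design_2_3_1():
    """The 2**(3-1) design of table 1: a half fraction of the 2**3
    factorial with the generator x3 = x1 * x2."""
    return [(x1, x2, x1 * x2) for x1, x2 in product((1, -1), repeat=2)]

def is_orthogonal(design):
    """Eq. (12): every pair of distinct columns has zero inner product."""
    cols = list(zip(*design))
    return all(sum(a * b for a, b in zip(cols[j], cols[jp])) == 0
               for j in range(len(cols)) for jp in range(j + 1, len(cols)))

d = design_2_3_1()
assert len(d) == 4 and is_orthogonal(d)
```

With orthogonal columns, (X'X) in eq. (13) is diagonal, which is exactly why the effect estimates have minimal variance.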
(iv) Group-screening designs
As an illustration suppose we have 9 factors, x1 through x9. Form, say, 3 groups (or group-factors) z1 through z3. Group-factor z1 is said to be at its "high" level (+1) if all its components x1, x2 and x3 are at their "high" level, i.e. at the level where they produce the best result. (As an example consider a queuing system where x1 denotes the number of parallel service stations and x2 denotes the priority rule. Let x2 = +1 mean small-jobs-first and x2 = -1 mean first-come-first-served. Then x1 and x2 at their high level are expected to decrease the waiting time.) We can test the 3 group-factors z1, z2 and z3 in the 2^(3-1) design shown in table 1 above. From the definition of the high level of z1 and the additional assumption that no interactions among the x's exist (which could cancel a main effect), it follows that if z1 has no main effect then all its components (x1, x2, x3) must have no main effects! In that case the 3 factors x1, x2 and x3 can be eliminated from further experimentation. In general, in group-screening the k factors are combined into g groups of factors (g < k)
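The elimination logic of group screening can be sketched as follows. This is purely illustrative: in a real experiment the group effects would be estimated from simulation runs in a design like table 1, not read from a dictionary:

```python
def group_screening(main_effects, groups):
    """One stage of group screening: under the no-interaction assumption
    (and effects of like sign), a group factor shows no effect only if
    all its components have none, so every factor in a null group is
    dropped from further experimentation.
    main_effects maps factor name -> its (assumed known) main effect."""
    kept = []
    for group in groups:
        if any(main_effects[x] != 0 for x in group):
            kept.extend(group)      # group shows an effect: keep members
    return kept

effects = {"x1": 0, "x2": 0, "x3": 0,      # group z1: no effects
           "x4": 2, "x5": 0, "x6": 0,      # group z2: x4 matters
           "x7": 0, "x8": 0, "x9": 1}      # group z3: x9 matters
survivors = group_screening(effects, [["x1", "x2", "x3"],
                                      ["x4", "x5", "x6"],
                                      ["x7", "x8", "x9"]])
assert survivors == ["x4", "x5", "x6", "x7", "x8", "x9"]
```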
Fig. 2 - MODEL STRUCTURE (job arrival, central memory, CPU with I/O-bound and CPU-bound paths, quantum-expires path, I/O console, I/O disk, channels 1 and 2 to the 4 MD-25 disks and 4 tapes, job exit).
V. MINETTI
THE APPROACH
The choice of simulation techniques was based upon the following considerations:
a- simulation readily gives insight into system efficiency and response time;
b- simulation techniques would provide objective and quantitative information on the system's performance as required by tuning;
c- it was important not to interfere with the service being provided by the Computing Center;
d- a simulation would allow the analysis of both present and predicted workloads on present and future system configurations.

MODEL STRUCTURE
The structure of the model is illustrated in Fig. 2. Heavy-line blocks represent the principal resources (central memory, central processor unit, disks and magnetic tapes), while thin-line blocks represent queues; linkage lines show the path of evolution of requests. The model is in line with the real system configuration presented above (slow peripherals have not been introduced, as their operation was estimated to have little effect on the parameters of interest). However, the simulation implementation was based on a slightly simplified structure, according to the previously stated interest in investigating overall system behaviour rather than particular activities. The most important approximations made were the following:
a- The dependence of the time-slice length on the job's characteristics was neglected, assuming its effect would be negligible for our purposes: a fixed value of 270 ms, computed as the arithmetic average between the minimum (40 ms) and maximum (500 ms) slices admitted, was applied.
b- The I/O-bound task queue was suppressed; indeed, preliminary simulation runs revealed that this queue was empty for a large percentage of the time.
c- Disks and magnetic tapes, represented in Fig. 3 as a queue followed by the storage support, were treated so that an access to them resulted in a pure delay time; overall effects were accounted for by evaluating the delay time on the basis of real wait and service times.
The simplified model is shown in Fig. 3.

SIMULATOR STRUCTURE
Queues and interrupts were handled in a very elementary form and require no comments. Other parts of the simulator are briefly described in the following subparagraphs.
a) Data Sets
Data used by the simulator were organized in three data sets: input data, output data and service data sets (see Fig. 4). The input data set contains the workload characteristics as they were obtained from the trace used for accounting purposes. Jobs' and tasks' input data were configured as listed below:

JOB: 1. Arrival time; 2. Priority class; 3. Average memory requested; 4. Activation time; 5. Total CPU time; 6. Average CPU time; 7. I/O disk calls; 8. Total I/O time (disk); 9. Average I/O time (disk); 10. Total I/O time (tape); 11. Exit time; 12. Response time.

TASK: 1. Arrival time; 2. Average memory requested; 3. Total CPU time; 4. Average CPU time; 5. Average CPU time between I/O console calls; 6. I/O disk calls; 7. Total I/O time (disk); 8. Average I/O time; 9. I/O console calls; 10. Average think time; 11. Exit time.
[Fig. 3 - SIMPLIFIED MODEL STRUCTURE: job and task arrivals pass through central memory to the CPU (sliced, I/O-bound and CPU-bound paths); disks, tapes and the I/O console are modelled as delays, up to task and job exit.]
V. MINETTI
[Fig. 4 - DATA SETS USED BY THE SIMULATOR: the trace feeds the input data set; the event manager works with the E-Table, the service data set and the WL-Table, and produces the output data set.]
The average CPU time was computed as: (total CPU time)/(I/O disk calls). The total I/O time (disk) is the sum of all wait and service times related to a job (or a task) for its I/O disk operations. The average I/O time was computed as: (total I/O time)/(I/O disk calls).
The output data set contains the workload characteristics as evaluated by the simulator. This set was so configured:

JOB: 1. Activation time; 2. Total CPU time; 3. I/O disk calls; 4. Total I/O time; 5. Exit time; 6. Response time.

TASK: 1. Total CPU time; 2. I/O disk calls; 3. Total I/O time; 4. I/O console calls; 5. Swap operations; 6. Response time.
The service data set contains additional information necessary for the simulation. This set was so configured:

JOB: 1. Next event (for sliced jobs only); 2. Remaining CPU time (for sliced jobs only).

TASK: 1. Next event; 2. Remaining CPU time; 3. CPU time counter; 4. Time of termination of the last I/O console.
The next event information was used to evaluate whether additional CPU time had to be allocated. The CPU time counter was used to evaluate when an I/O console call had to occur. The response time counter and the time of termination of the last I/O console were used for statistics on the T.S. response time.

b) Workload Table (WL-Table)

The basic structure used for acquiring and collecting data is the WL-Table. Any simulated job (or task) has an identifier name and up to 22 reserved positions in the WL-Table; each position corresponds to one of the above-mentioned input, output and service data sets. Whenever a new job (or task) enters the system, a new entry is added to the WL-Table; consequently its length grows as the simulation proceeds. At the end of the simulation, statistical parameters can be obtained by analyzing the whole output data set.
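The per-call averages that drive the simulator (average CPU time = total CPU time / I/O disk calls, and similarly for the average I/O time) amount to a one-line calculation; a minimal sketch in Python, with invented field names (the actual simulator stored these values in WL-Table positions and was written in FORTRAN IV):

```python
# Sketch: deriving per-call averages from a trace record, as described above.
# Field names are illustrative, not the simulator's own.

def derived_averages(total_cpu_ms, total_io_disk_ms, io_disk_calls):
    """Average CPU time and average I/O time per disk call (ms)."""
    if io_disk_calls == 0:
        return 0.0, 0.0
    return total_cpu_ms / io_disk_calls, total_io_disk_ms / io_disk_calls

# A job that used 5400 ms of CPU and 9000 ms of disk I/O over 30 disk calls:
avg_cpu, avg_io = derived_averages(5400, 9000, 30)
print(avg_cpu, avg_io)  # 180.0 300.0
```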
PERFORMANCE EVALUATION USING A TRACE DRIVEN MODEL
c) Events and Event-Table (E-Table)

The state of the system at a certain time can be defined in terms of the situation of the system resources - free or busy - and of the situation of the queues - number and identity of the requests queued. Following MacDougall (1970), an event can be defined as a change in the state of the system, implied by a transition between operations or activities. Simulation can then be developed by taking care of an event and predicting at what time the consequently generated events will occur; therefore the simulated time is a non-continuous quantity and its increasing values represent, at any time, the occurrence time of the last event scheduled (actual time). The event scheduling algorithm involves the use of an Event-Table (E-Table). In our case an entry of this table contains the name of a job (or task), its next event and the time this event will occur (see Fig. 5).
[Fig. 5 - An entry of the E-Table: NAME OF JOB OR TASK | NEXT EVENT TYPE | NEXT EVENT TIME]
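The E-Table lookup described above reduces to a scan for the entry with the smallest next-event time; a minimal Python sketch (entry names and field names invented, the original is FORTRAN IV):

```python
# Sketch of the event-scheduling scan: each E-Table entry holds a job/task
# name, its next event type and the time that event will occur; the
# scheduler picks the entry whose future time is closest to the actual time.

def next_event(e_table):
    """Return the entry with the smallest next-event time."""
    return min(e_table, key=lambda entry: entry["time"])

e_table = [
    {"name": "JOB3",  "event": "RELEASE CPU",  "time": 1450},
    {"name": "TASK1", "event": "TASK LOADED",  "time": 1320},
    {"name": "JOB7",  "event": "RELEASE DISK", "time": 1500},
]
print(next_event(e_table)["name"])  # TASK1
```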
An event scheduling routine scans the E-Table looking for the next event, namely the event whose future occurrence time is the closest to the actual time. The control is then transferred to an event-routine which performs the operations whose initiation corresponds to the occurrence of that event. Five events were introduced:
1. Job or task arrival
2. Task loaded
3. Release CPU
4. Release disk
5. I/O console termination
A detailed description of the operations involved by each event is presented in Minetti (1975). Fig. 6 shows the main structure of the simulation program. The initialization routine establishes system parameters (memory partition size, time slice and quantum length, termination time, etc.); an event scheduler triggers the simulation by scheduling the arrival of the first job: from that point on, the flow of events is self-maintained until the termination time is reached. The simulator was written in CII Extended FORTRAN IV. Its present size (compiled) is about 73000 bytes (it should be pointed out that no particular effort was devoted to the reduction of its size).

d) CPU Assignment

The time duration of CPU assignments to any job (or task) was determined by generating random samples from exponential distributions. The exponential hypothesis was applied both to jobs and tasks. Actually, a round-robin discipline does not agree well with this hypothesis; however, the difference between the two procedures was reduced by saving the excess of the generated CPU time (with respect to the maximum time allowed in the round-robin discipline) and by using it for the next CPU assignment. Whenever a CPU time has to be generated, a routine accesses the WL-Table to get the value of the average CPU time. This parameter is then used to define the specific exponential distribution. A sample is obtained by generating a random number uniformly distributed in the range 0-1, and using it to derive the CPU time from the distribution.
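The sampling step just described is the inverse-transform method, combined with the excess-saving rule used to reconcile exponential samples with the round-robin maximum. A hedged sketch (the 500 ms cap is taken from the slice limits quoted earlier; the function name and return convention are invented):

```python
import math
import random

# Sketch of the CPU-assignment procedure: draw an exponential sample via the
# inverse transform on a uniform (0,1) variate, cap it at the round-robin
# maximum, and carry the excess over to the next assignment.

ROUND_ROBIN_MAX_MS = 500  # maximum slice admitted (from the text above)

def cpu_assignment(avg_cpu_ms, carried_excess_ms, rng=random.random):
    u = rng()                                     # uniform in (0, 1)
    sample = -avg_cpu_ms * math.log(1.0 - u) + carried_excess_ms
    granted = min(sample, ROUND_ROBIN_MAX_MS)
    return granted, sample - granted              # (granted now, saved excess)

granted, excess = cpu_assignment(avg_cpu_ms=180, carried_excess_ms=0)
assert 0 <= granted <= ROUND_ROBIN_MAX_MS and excess >= 0
```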
The time-scale unit is 1 ms.
[Fig. 6 - MAIN STRUCTURE OF THE SIMULATION PROGRAM]
e) I/O Operations

The simulation of I/O disk operations (as already stated) was performed by introducing a delay in the activity of processes every time an I/O disk call was to be issued. Such delays are constant for a specific job and their value is derived from the input data set. This simplification was imposed by the fact that we could not derive more detailed information from the trace about the distribution of individual service times. Similar considerations apply to the magnetic tape operations. To simulate interactions between a task and a user, I/O console calls are issued. Every task is provided with a CPU time counter which selects the time an I/O console call has to occur: namely, when the CPU time counter reaches the value of the average CPU time between I/O console calls, an I/O console call is issued, the task's activity is delayed for a time corresponding to the average user's think time, the CPU time counter is reset and the task processing restarts.

f) Swaps

To predict the event TASK LOADED the simulator is provided with a list which contains the names of all resident tasks which can be swapped out. The evaluation of the total swapping time is based on the global "swap size" involved by this operation; that is, the total time required for swapping out a 30-page task and for swapping in a 20-page task was computed as the time required to transfer 50 pages.

VALIDATION

To validate the model, simulation runs were driven by traces obtained from portions of a typical working day (6.5 hours). When the traces were derived, the system was running with the following memory partition: 60 pages reserved to interactive processing, 111 pages reserved to batch processing. The average distribution of jobs among the various priority classes was as follows: P class 11%, C class 3%, T class 84%, D class 2%.
Validation was based on a comparison between simulated and actual system performances. Table 1 shows the quantities compared in the validation and the corresponding deviations.
      Characteristic                 average deviation %   maximum deviation %
JOB   I/O disk calls                         5                   n.c.
      total I/O disk time                    5                   n.c.
      total CPU time                         3                   n.c.
      B. response time: - P class           20                   > 100
                        - C class           18                   > 100
                        - T class           14                   > 100
                        - D class           24                   > 100
TASK  I/O disk calls                        14                    17
      total I/O disk time                   14                    17
      total CPU time                         1                     3
      I/O console calls                     11                    16

Table 1. - RESULTS OBTAINED BY COMPARING ACTUAL AND SIMULATED SYSTEM (n.c. = not computed).
The high values of the maximum deviations observed in the B. response time were attributed to the fact that the actual delay time due to some operations, like mounting storage media, was not accounted for by the model. Moreover, treating every job as a single-step activity and neglecting any overhead load certainly influenced the overall deviations. The model was considered to be validated primarily on the basis of the inter-class distribution of jobs. The average deviation (14%) of the B. response time observed in the T class, the most commonly used, might be confidently accepted, especially if the model's approximations are considered. On the contrary, no comparison was possible on the fitting of the T.S. response time, since the trace did not include any data about it. By resorting to empirical data, a rather satisfactory matching was established. A good simulation speed was obtained: the ratio (simulated elapsed time during model execution) / (actual model execution time) was about 108 to 1 (Seaman and Soucy, 1969).

RESULTS

Two memory partitions were investigated by simulation: the first would have allowed the allocation of one task requiring a FORTRAN compilation, while the second would have allowed two similar tasks to be allocated. Quantitatively, such partitions can be defined as follows:
a) 41 pages to time-sharing and 130 pages to batch;
b) 82 pages to time-sharing and 89 pages to batch.
The above partitions were suggested by the fact that many interactive activities consisted of FORTRAN programs which required a compilation step (the FORTRAN compiler is a 41-page task). The results obtained by running the simulator with these partitions showed that in the first case generally satisfactory performances were maintained by the system, while in the second case a very noticeable increase in time-sharing responsiveness was achieved at the expense of jobs, whose response time suffered an unacceptable increase. Table 2 summarizes the results obtained.
        Partition a) 130 pages B. - 41 pages T.S.   Partition b) 89 pages B. - 82 pages T.S.
CLASS   average response time (s)   deviation %     average response time (s)   deviation %
P                 25                  - 30.5                   43                  + 19.4
C               1330                  +  3.7                 1375                  +  7.4
T                414                  -  4.4                  506                  + 16.8
D                459                  - 12.1                 1495                  +186.9
Task            1.07                  +  3.5                  0.6                  - 39.9

Table 2. - RESULTS OBTAINED BY THE SIMULATION (deviations are relative to the results obtained in the validation run).
SUMMARY

Presently, the real memory partition is as in a) above. The results predicted by the simulation have matched the real system behaviour quite satisfactorily. The model, as it now stands, can be improved by representing the I/O disk and tape operations in a more realistic way. Effects of changes in the priority classes and in the scheduling algorithms for the CPU can be observed. Hypothetical workloads can be simulated quite similarly to real workloads, simply by providing the simulator with a corresponding trace.
ACKNOWLEDGEMENTS

The work described in this paper is included in a research activity partially supported by a grant of IBM Italia to the Chair of Electronic Computers.

REFERENCES

Grenander, U., Tsao, R.F. (1972). Quantitative Methods for Evaluating Computer System Performance: a review and proposals, in Statistical Computer Performance Evaluation (Academic Press, New York).
MacDougall, M.H. (1970). Computing Surveys, 2, No. 3, 191.
Minetti, V. (1975). Definizione della Partizione di Memoria per un Sistema di Elaborazione funzionante in Batch e Time-sharing, Tesi di laurea (Università di Genova).
Seaman, P.H., Soucy, R.C. (1969). IBM Systems Journal, 8, No. 4, 264.
Modelling and Performance Evaluation of Computer Systems, E. Gelenbe, ed. © North-Holland Publishing Company (1976)
HARDWARE MEASUREMENT OF CPU ACTIVITIES

H. Schreiber
University of Erlangen-Nürnberg, IMMD III
The instruction stream processed by a computer CPU reflects the amount of functions called for by a certain user environment. It can be characterized by instruction frequencies and sequences. The only way to measure such data in an actual computing center, without any disturbance of the normal operation, is by means of a hardware monitor. Measurements of this type have been performed at the two computing centers of the University of Erlangen-Nürnberg. They are equipped with Control Data CD 3300 and AEG Telefunken TR 440 computer systems. The hardware monitor used ("Zählmonitor II" [1]) was developed and built at the computer science department (IMMD III) of this university.
The intention to compare results measured on computer systems of different manufacturers requires a classification of the instruction set by functional aspects, in order to eliminate architectural characteristics as far as possible. For our classification the Gibson-Mix [2] served as a model. Its thirteen instruction classes were reduced to eight by summing up all discrete fixed point and floating point operations within two classes. This makes comparisons with other data reported in the literature possible.
Instr. Class       CD 3300 Day   CD 3300 Night   TR 440   Gibson   360 Techn. Obj.   360 Techn. Comp.
0 Compare               1,8           0,9          0,2       3,8         7,8               14,1
1 Transfer             47,6          46,7         62,3      49,2        50,5               39,7
2 Branch               38,3          44,7         22,6      16,6        20,7               34,2
3 Fltg. Pnt.            2,6           2,4          1,6      12,2         6,3
4 Fixed Pnt.            2,8           1,6          6,9       9,7         5,5
5 Shift                 3,3           2,1          4,5       6,9         4,4
6 Boole                 3,4           1,4          0,7       1,6         4,0                5,0
7 I/O and others        0,2           0,2          1,2       5,3         1,5                1,5

Table 1: Relative frequencies of instruction classes measured at Erlangen. The Gibson-Mix and data for an IBM 360 system are included.
Table 1 shows the relative frequencies of instructions measured at the CD 3300 and TR 440. For comparison, the Gibson-Mix and data measured for an IBM 360 system [3] are given as well. The results show a surprisingly low use of arithmetic operations, especially floating point operations, and a high rate of branches. These data were produced by a job mix that primarily consisted of compiler code and compiler-generated code. They are representative of the load generated by the user environment of a university computing center.
Correlations between CPU operations are most comprehensively represented in the form of a transition matrix. As an example, data measured for the CD 3300 are shown in Table 2. It is obvious from this table that data transfer and branch type instructions are most frequently used as connecting elements between all other operations.
[Table 2: Transition probabilities of instruction classes for the CD 3300 - an 8 x 8 matrix over the classes 0 Compare, 1 Transfer, 2 Branch, 3 Fltg. Pnt., 4 Fixed Pnt., 5 Shift, 6 Boole, 7 I/O and others; in every row most of the probability mass falls in the Transfer and Branch columns.]
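A transition matrix like Table 2 can drive a synthetic instruction-stream generator, one intended use of such measurements. A minimal sketch; the probabilities below are invented placeholders (the printed values are not reproduced here), with the only constraint that each row sums to 1:

```python
import random

# Sketch: generating a synthetic instruction stream from a class-transition
# matrix in the style of Table 2. Probabilities are illustrative only.

CLASSES = ["Compare", "Transfer", "Branch", "Other"]
P = {
    "Compare":  [0.05, 0.45, 0.45, 0.05],
    "Transfer": [0.02, 0.50, 0.40, 0.08],
    "Branch":   [0.02, 0.43, 0.50, 0.05],
    "Other":    [0.05, 0.50, 0.35, 0.10],
}

def generate(n, start="Transfer", rng=None):
    """Random walk over instruction classes following the transition matrix."""
    rng = rng or random.Random(0)
    stream, current = [], start
    for _ in range(n):
        stream.append(current)
        current = rng.choices(CLASSES, weights=P[current])[0]
    return stream

stream = generate(1000)
# Transfer and Branch dominate the stream, as the measurements show.
```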
These data, among others, will be used to develop and evaluate new computer architectures.

Literature:
[1] R. Klar, H. Schreiber, H.C. Widjaja: Messungen mit dem Zählmonitor II. Arbeitsberichte des IMMD, Bd. 8, Nr. 9, 1975.
[2] J.C. Gibson: The Gibson Mix. IBM internal publication No. TR 00.2043, June 1970.
[3] M.J. Flynn: Trends and problems in computer organizations. Information Processing 74, North-Holland Publishing Company, 1974, pp. 3-10.
AN APPROACH TO THE STRAIGHTFORWARD PRODUCTION OF COMPUTER SYSTEM SIMULATORS
O. Tedone
Abstract: The problem of the straightforward production of simulation models and related simulation programs is dealt with. The approach consists of two steps: 1) departing from technical specifications, the system is described in terms of activities, requests and resources using J.D. Noe's Pro-Nets representation; much effort is made to structure the net so that a Simula 67 simulation program can be easily derived; 2) a program is derived in which net transitions are described by Simula procedures and tokens by Simula objects. This approach has been successfully applied to the production of a simulator for evaluating the performance of the Enquiry Terminal System, an interactive terminal concentrator system implemented on a Selenia GP160 minicomputer.

1. Introduction

Performance analysis is getting more and more important as computer systems become more sophisticated. Many problems arise in evaluating performance; because of the complexity of the systems, simulation methods are generally used to investigate them. The following fundamental steps can be considered in evaluating system performance:
1) Obtain the model. Starting from the system technical specifications, one selects, by means of successive abstractions, what has to be considered and measured in the system and what can be ignored. The system is described in terms of activities, requests and resources using an appropriate method of representation.
2) Produce a simulation program. A program is derived for the model using a simulation language. This program is then exercised with input data representing a given workload, and data are collected in order to gain insight into system behaviour.
3) Analyze results. The results of simulation are analyzed in order to obtain suggestions for modifications in the system to be designed. In this way the design of the system is performed interactively using simulation results.
The problem this paper deals with can be stated as that of reducing the gap between steps 1 and 2. To this purpose, a representation of the system is used which can directly lead to a simulation program and, on the other hand, can be easily used by system designers. In a previous paper by Iazeolla, Martinelli and Tedone [1] the Noe and Nutt E-Net representation [2] was considered. Noe's Pro-Nets representation [3] is considered here. This turns out to be quite satisfactory in representing a system, and the translation of Pro-Nets into a simulation program is very easy and straightforward when the SIMULA 67 simulation language is used.

(This work has been supported by Selenia S.p.A. under Convenzione Selenia-Consiglio Nazionale delle Ricerche. Pisa, Italy; Selenia S.p.A., Roma, Italy.)
2. The modeling approach

In modeling a computer system, different types of processes running in the system have to be considered. Processes are considered of a CPU or I/O type if they mainly require the CPU or an I/O device respectively. Processes of a special type also generally have to be considered in order to represent the environment of the computer system. In deriving the model, a level of abstraction for the system has to be selected which can adequately represent system behaviour as far as the performance aspects are concerned, while neglecting details not directly related to them. In the proposed approach each process is divided into a sequence of activities. Each activity in turn is divided into the following phases:
1) System status analysis and modification. Requests for different types of resources are considered and system status variables are modified accordingly. Requests for activation of new processes are also considered, but they do not come into effect until phase 3 is reached. This phase is supposed to be a zero-time phase.
2) Time consumption. A certain time is generally consumed for the activity of the process in the CPU for a CPU process or in an I/O device for an I/O process. This time also represents the time needed for the actions of phases 1 and 3.
3) Requests dispatching. A final phase is considered where all requests for activation of new processes considered in phase 1 are now examined.
They will immediately produce an activation of a process if this is possible, otherwise they will be stored in a queue and considered when the requested process is available. This phase is also supposed to be a zero-time phase.
All time needed for the three phases is evaluated in phase 1 and consumed in phase 2. After the final phase has been completed, the activity terminates and the process becomes available again or waits for an event in a waiting queue. The opportunity for the above division derives from the necessity of taking into account the preemption mechanism, and the need for the preempted process to be in a definite state whenever an interrupt arrives. If a CPU process is interrupted by an I/O process, its residual CPU time is evaluated and the CPU process is temporarily stored in the ready queue. The first phase of the current activity is supposed to be terminated and, as soon as it is resumed, the CPU process re-enters the time consumption phase in order to spend the residual CPU time.
According to this point of view, every computer system can be described in terms of resources, requests for them, and activities. One can think of the whole computer system as a very simple railway net. Trains (processes) pass through stations (phase 1 or 3) boarding passengers (requests) or letting them get off. The time the train spends going from one station to another (phase 2) is considered to include also the time the train stops at the station. A computer system can be completely represented by using Pro-Nets. They are however used in this paper in an informal way and have sometimes been specialized in order to simplify the subsequent work of producing the simulation program.
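The three-phase activity structure above can be sketched in a few lines; a rough Python analogue with invented names (the paper's actual implementation uses Simula 67 and net transitions, not these classes):

```python
# Sketch of the three-phase activity described above: phase 1 records
# requests (zero simulated time), phase 2 is the only phase that advances
# the clock, phase 3 dispatches the recorded requests (zero simulated time).

class Activity:
    def __init__(self, service_time, requests):
        self.service_time = service_time   # consumed in phase 2
        self.requests = list(requests)     # examined in phase 3

def run_activity(clock, activity, dispatch):
    # Phase 1: status analysis/modification - requests only recorded.
    pending = activity.requests
    # Phase 2: time consumption.
    clock += activity.service_time
    # Phase 3: requests dispatching.
    for req in pending:
        dispatch(req, clock)
    return clock

dispatched = []
clock = run_activity(0.0, Activity(12.5, ["activate:io1"]),
                     lambda r, t: dispatched.append((r, t)))
print(clock, dispatched)  # 12.5 [('activate:io1', 12.5)]
```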
The marking of input and output arcs of a transition, according to which OR/AND firing conditions can be very easily represented, was particularly suitable for representing the activating or blocking mechanism for a process.
In the net representation, tokens are all considered to be of one of the following fundamental types:
1) Process tokens (or simply "p" tokens), which represent a process flowing through the net.
2) Request tokens (or simply "r" tokens), which represent a request for activation of a new process, or a request for a resource.
The data structure associated with an "r" token contains parameters referring to the request, if any. The data structure associated with a "p" token instead consists of a list where requests for different resources and activations of new processes are appended; these are then explored and taken away from the list as the token goes through different transitions. Because of the complexity of the system, it is generally useful to use composite or abstract transitions that summarize the effects of intervening transitions of a more refined net, with their own variables, path choices and time delays. Abstract transitions will also be called macro-transitions.
No general considerations are possible for macro-transitions, and they will be described as they are encountered. In modeling the system, transitions which represent very simple and typical situations are also used. The following types have been considered:
1) Running transitions, denoted as "R" transitions, represent the situation of a process starting to run. Any time the process is available and a request for it arrives, the process is activated.
2) Activating transitions, denoted as "A" transitions, represent the situation where a request for the activation of a new process is sent by the process passing through the transition. "A" transitions act in a way symmetric to "R" transitions. When an "A" transition fires, a request token is dispatched and the activating process continues. When instead an "R" transition fires, the request token is considered, the available process is awakened and it starts running.
3) Holding transitions, denoted as "H" transitions, represent a pure delay for the process passing through.
4) Passivating transitions, denoted as "P" transitions, represent the process going to sleep. It will be resumed and will start running again as soon as an activating request is sent to it.
5) A particular passivating transition is the waiting transition, denoted as "W" transition. Flowing through the transition, the process is passivated and inserted into a waiting queue.
6) Dummy transitions, denoted as "D" transitions, are also introduced in the net to represent the confluence of different token flows. They are generally used for ease of representation. The simultaneous presence of different tokens cannot take place; in any case a simulation mechanism is supposed to exist according to which simultaneous tokens arriving at the same transition are considered in a given order.
The production of a program starting from the net representation of the computer system is quite straightforward if the Simula 67 language is used. The production of the program moves along the following three steps:
1) Each transition is described by a Simula procedure. Some of them are system-defined class SIMULATION procedures. Dummy and Running transitions have no direct translation, but are implicitly taken into account by the simulator.
2) Each token representing a process is described by a Simula process object. Each token representing a request is described by a Simula link object. The body of each Simula process class consists of a sequence of procedure calls representing the transitions that the token will encounter flowing through the net. For each class a data structure is also declared for storage of request parameters.
3) Simulation parameters are read in and initial conditions are stated.
The outlined approach will be used in deriving the simulation model and the corresponding simulation program of the Enquiry Terminal System, which will now be described.
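The net-to-program mapping (transitions as procedures, "p" tokens as process objects carrying a request list, "r" tokens as link objects) can be illustrated outside Simula as well. A rough Python stand-in with invented names, intended only to show the shape of the mapping, not the paper's Simula 67 code:

```python
# Sketch: transitions become procedures, a "p" token carries a list of
# pending requests appended in phase 1, an "r" token carries request
# parameters. All names here are illustrative stand-ins.

class RequestToken:                       # "r" token
    def __init__(self, target, params=None):
        self.target, self.params = target, params

class ProcessToken:                       # "p" token
    def __init__(self, name):
        self.name = name
        self.pending = []                 # requests appended in phase 1

def hold(token, time, log):               # "H" transition: pure delay
    log.append((token.name, "hold", time))

def activate(token, request, log):        # "A" transition: dispatch r-token
    token.pending.append(request)
    log.append((token.name, "request", request.target))

log = []
p = ProcessToken("hosthandler")
activate(p, RequestToken("line"), log)
hold(p, 30, log)
print(log)  # [('hosthandler', 'request', 'line'), ('hosthandler', 'hold', 30)]
```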
3. The ETS system

The system to be analyzed is the Enquiry Terminal System, which connects a number of video terminals to a Univac 1100 computer using a GP-160 Selenia minicomputer, a 16-bit, 32 K-word minicomputer.
Special programs called communication programs are responsible for message exchange between the terminals and the host computer. Communication programs run under a real-time operating system, which is essentially responsible for scheduling and physical I/O handling. In deriving the model, CPU and I/O processors are supposed to function in parallel, as indicated in Fig. 1.
[Fig. 1 - Parallel CPU and I/O processes: input requests enter the system; transmissions go to and from the host; output requests come from the host.]
One I/O process, which simulates the physical I/O, is considered for each terminal, and one for the line connecting the system to the host computer. Only one interrupt, at the end of the I/O process, is considered. The line process takes into account the time for transmission to and from the host computer and also simulates the behaviour of the host computer. As indicated in Fig. 1, input requests (i.e. requests for messages to be transmitted to the host computer) are considered external to the whole system. Output requests (i.e. requests for messages to be transmitted in answer from the host computer to the terminals) are generated inside the host computer. For the sake of simplicity, two independent processes will be considered in the model for each terminal, one responsible for the generation of input requests and the other for the generation of output requests. The system is supposed to run without operator intervention, and transient conditions are not considered.
The supervisor scheduler process has been considered and is responsible for the selection of the next ready process to be activated according to the scheduling discipline, which is a priority one. Communication processes run under an operating system and can be interrupted. When interrupted, processes are inserted in the ready queue in a passive state. A particular process called hosthandler is responsible for the activation of the line process. After a message is sent to the host computer, the hosthandler waits for a response from the host and is in turn activated by the line process. For each terminal there are two "actuator" processes, one responsible for input and the other for output. They are both responsible for the activation of the I/O processes. Another process for the treatment of messages is also considered. It carries out the concentration and acts as a link between the hosthandler and the actuator processes.
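The scheduler's priority discipline over the ready queue, where interrupted processes sit in a passive state until selected, can be sketched with a priority queue. All names and priority values below are invented for illustration; the paper does not specify them:

```python
import heapq

# Sketch of a priority ready queue: the supervisor scheduler selects the
# next ready process by priority (lowest number = highest priority here),
# with FIFO ordering among equal priorities.

class ReadyQueue:
    def __init__(self):
        self._heap, self._seq = [], 0     # seq breaks ties FIFO-fashion

    def insert(self, priority, name):
        heapq.heappush(self._heap, (priority, self._seq, name))
        self._seq += 1

    def select(self):
        """Return the highest-priority ready process."""
        return heapq.heappop(self._heap)[2]

rq = ReadyQueue()
rq.insert(2, "actuator-in-3")
rq.insert(0, "hosthandler")      # interrupted, re-inserted passive
rq.insert(1, "msg-treatment")
print(rq.select())  # hosthandler
```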
4. The net representation

Fig. 2 provides the system net representation. To distinguish process tokens from request tokens, locations where the latter are supposed to arrive are denoted as (r) locations. Many transitions in the net are macro-transitions, which in turn can be described in terms of tokens and simpler transitions. Details of macro-transitions are not reported in this paper. Macro-transitions are indicated with a name that will also be the name of the corresponding SIMULA procedure. Transitions of the types indicated in Section 2 are indicated with a letter and a progressive number. Any transition is supposed to be a zero-time transition except for ENTERCPU, HOSTRESPONSE and all H-type transitions.
Any time an activity of a process terminates and the A transition fires, the scheduler process is activated (L..) and will start running, passing through transition R... The scheduler process enters the SELECT transition in order to select the next process to be activated in the ready queue. The scheduler process will send a request token (L..) to the H transition and will enter a passive phase, passing through transition P... The time spent by the scheduler is considered as part of the total CPU time of the selected process. As soon as it is activated, the new process (L..) enters the MSGHANDLER transition. This is a transition where, according to the particular process, different actions are taken depending on the messages to be exchanged with the host and the status of the terminals. Requests for the activation of new processes are stored in the data structure of the process token passing through, and will be examined when
Fig. 2. The system net representation.

O. TEDONE
the token enters the DISPATCHER transition. The process then enters (L ) the ENTERCPU transition, which simulates the time spent by a process in the CPU; the CPU preemption mechanism is also taken into account. If the process has previously been preempted, it immediately enters (L , L ) the ENTERCPU transition to consume its residual time, without passing through the MSGHANDLER transition. The ENTERCPU transition can also be entered by a preempting process (L ). If an interruptible process is present in the CPU, its residual time is evaluated, and the interrupted process (L ) will enter the TASKINSERT transition (L ) and wait in the ready-queue to be resumed (L ). Any CPU process, after the time to be spent in the CPU has completely elapsed (L ), enters the DISPATCHER transition, where all requests for the activation of new processes are considered. I/O processes also enter the DISPATCHER transition before terminating. In the DISPATCHER transition a different action is taken for each request stored in the data structure of the terminating process. Whenever a request for the activation of a new CPU process is encountered, the request token (L ) enters the TASKENQUEUE transition. Only one transition of this type is indicated in Fig. 2, but one for each process should be considered. If the requested process is idle, a token representing the process (L ) will enter the TASKINSERT transition and then wait in the ready-queue to be considered by the scheduler. Whenever a request for waking up a process waiting in the blocked-queue is encountered, the request token (L7) activates the WAKEUP transition and the awakened process passes into the ready-queue (L17, L ). Requests for I/O process activation, represented by a token in location L , will be considered later. When all the requests in the data structure of the passing token have been considered, the scheduler process is activated (L ) and the current process enters a passive phase (L ). If this process is a CPU process, it will enter the W transition through L , waiting for an event in the blocked-queue (L ), or it will enter the TASKENQUEUE transition (L ); in the latter case it becomes idle if no activation request (L ) is waiting for it, or enters the ready-queue again in order to serve a new request (L , L ). Requests for the activation of an I/O process, as soon as they are dispatched by the DISPATCHER transition, arrive in location L of Fig. 2. Passing through the dummy transition D, the request token enters the DRIVERENQUEUE transition for the selected terminal. If the I/O process corresponding to the terminal is idle (L ,...,L ), it starts running immediately; otherwise the request is properly enqueued. When, on the contrary, the I/O process stops, a token arrives in location L , enters the DRIVERENQUEUE transition (L ,...,L ), and the first waiting request, if any, enters the TERMHANDLER transition. Requests for the activation or waking up of other processes are eventually stored in the data structure of the token. The I/O process token successively enters a HOLD
PRODUCTION OF SIMULATORS
transition simulating the time for I/O operations. After all the time has elapsed, the ENTERCPU transition is entered in a preempting mode, and the DISPATCHER transition will finally analyze the activation requests. In a very similar way, in modelling the communication with the host computer, a line process has been considered, which is activated as soon as a request for communication with the host computer is encountered (L , L ) at the DISPATCHER transition. When activated, the token representing the process (L ) flows through transition R, then through transition H, which represents the time for transmission from the ETS to the host computer. The token then enters the HOSTRESPONSE transition (L ), which represents the behaviour of the host computer as far as communications are concerned; a time representing a polling time is associated with the HOSTRESPONSE transition. The process token then passes to the H transition representing the time for transmission from the host, and finally enters the ENTERCPU transition in a preempting mode (L ). After having passed the DISPATCHER transition, the process token appears again in location L in a passive state. Two further processes, not represented in the figure, are also considered for each terminal, to simulate input and output for the system according to given random distributions. An input request to one terminal is represented by a request token in location L of Fig. 2. A request for the input process of that terminal is associated with the token, and the request is properly enqueued. An output request from the host computer to one terminal is simulated by a new request token (L ) in a message-queue, which will be examined by the line process when it enters the HOSTRESPONSE transition.
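The request-handling step performed in the DISPATCHER transition can be sketched as follows. This is a free Python paraphrase for illustration only: the request kinds, field names and queue layout are all assumptions made for this example, not structures taken from the paper.

```python
from types import SimpleNamespace as NS

def dispatcher(proc, sim):
    """Act on each request stored in the terminating process's token
    (all request kinds and attribute names here are assumed)."""
    for req in proc.requests:
        if req.kind == "activate_cpu":
            # TASKENQUEUE: run the target at once if idle, else queue the request
            target = sim.processes[req.target]
            if target.idle:
                target.idle = False
                sim.ready_queue.append(target)
            else:
                target.pending.append(req)
        elif req.kind == "wakeup":
            # WAKEUP: move the process from the blocked queue to the ready queue
            sim.ready_queue.append(sim.blocked_queue.pop(req.target))
        elif req.kind == "io":
            # handled later by the per-terminal DRIVERENQUEUE transition
            sim.io_requests.append(req)
    proc.requests.clear()

sim = NS(processes={"line": NS(idle=True, pending=[])},
         ready_queue=[], blocked_queue={}, io_requests=[])
p = NS(requests=[NS(kind="activate_cpu", target="line"),
                 NS(kind="io", target="t1")])
dispatcher(p, sim)
print(len(sim.ready_queue), len(sim.io_requests))   # 1 1
```

One action is taken per stored request, and the token's request list is emptied once all requests have been dispatched, mirroring the behaviour described above.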
5. The simulation program

The simulation program is derived as outlined in Section 2. Transitions are described as Simula procedures. Request tokens are described as link class objects, and each object contains the request parameters. Process tokens are instead described as process class objects, where the body of each class consists of a sequence of procedure calls indicating the flow of the process through the net. As an example, the part of the simulation program describing the hosthandler process is reported. Each CPUPROC class has been declared as a Simula process class, and its data structure has been defined.

PROCESS CLASS CPUPROC;
BEGIN
   REF (HEAD) REQUESTQUEUE;
   REAL SAVETIME, RESIDUALTIME, CPUTIME;
   BOOLEAN INTERRUPTED;
   INTEGER PRIORITY, TERMINAL;
   TEXT WAY;
   REQUESTQUEUE :- NEW HEAD;
END;

CPUPROC CLASS HOSTHANDLER;
BEGIN
   WHILE TRUE DO
   BEGIN
      MSGHANDLER;
      RESIDUALTIME := TOTCPUTIME;
CPULOOP:
      ENTERCPU (FALSE);
      IF INTERRUPTED THEN GOTO CPULOOP;
      DISPATCHER;
      ACTIVATE SCHEDULER DELAY 0;
      WAIT (BLOCKEDQUEUE);
   END;
END;
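The Simula pattern used here — a process-class body that loops through procedure calls and suspends itself with WAIT — can be mimicked with a Python generator. The sketch below is only an analogy, with stub names standing in for the paper's transition procedures; none of the identifiers are from the original program.

```python
def hosthandler(env):
    # Generator analog of the Simula HOSTHANDLER body; `env` supplies
    # stand-ins for the transition procedures (names are assumptions).
    while True:
        env.msghandler()
        env.residual_time = env.tot_cpu_time
        while env.enter_cpu(preempt=False):   # CPULOOP: retry while interrupted
            pass
        env.dispatcher()
        env.activate_scheduler()
        yield "WAIT(BLOCKEDQUEUE)"            # suspend, like Simula's WAIT

class StubEnv:
    # Trivial environment: every transition completes at once, no preemption.
    tot_cpu_time = 5.0
    def msghandler(self): pass
    def enter_cpu(self, preempt): return False   # False = not interrupted
    def dispatcher(self): pass
    def activate_scheduler(self): pass

h = hosthandler(StubEnv())
print(next(h))   # WAIT(BLOCKEDQUEUE)
```

Each `yield` corresponds to the point where the Simula process becomes passive; resuming the generator corresponds to the scheduler reactivating the process from the blocked queue.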
The ENTERCPU procedure is also illustrated. Two different types of action are executed according to the way the procedure is called. If it is called in a preempting mode, the process actually running in the CPU, if any, is removed from the CPU; the residual time still to be spent is evaluated, and the process is inserted in the ready-queue in a passive state. The preempting process immediately enters the CPU. If the procedure is called in a non-preempting mode, the CPU is simply occupied and the time starts running for the process.

PROCEDURE ENTERCPU (PREEMPT); BOOLEAN PREEMPT;
BEGIN
   IF PREEMPT THEN
   BEGIN
      IF CPU=/
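The bookkeeping just described — evaluate the residual time of the interrupted process, return it to the ready queue, and let the preempting process take the CPU — might be sketched as follows. This is a free Python paraphrase, not the paper's Simula procedure; all class and field names are assumed for the example.

```python
class Proc:
    def __init__(self, name, cpu_time):
        self.name, self.cpu_time = name, cpu_time
        self.residual = 0.0        # time still owed after a preemption
        self.interrupted = False
        self.finish_at = 0.0       # simulated instant the CPU burst ends

class CPU:
    def __init__(self):
        self.running = None        # process currently holding the CPU

    def enter(self, proc, now, preempt, ready_queue):
        if preempt and self.running is not None:
            old = self.running
            old.residual = old.finish_at - now   # residual time to be spent
            old.interrupted = True
            ready_queue.append(old)              # back to the ready queue, passive
        self.running = proc
        # a previously preempted process consumes only its residual time
        burst = proc.residual if proc.interrupted else proc.cpu_time
        proc.interrupted = False
        proc.finish_at = now + burst

cpu, ready = CPU(), []
host = Proc("hosthandler", 5.0)
cpu.enter(host, 0.0, False, ready)       # non-preempting: simply occupy the CPU
io = Proc("io_t1", 1.0)
cpu.enter(io, 2.0, True, ready)          # preempting entry at t = 2
print(ready[0].name, ready[0].residual)  # hosthandler 3.0
```

On re-entry, the preempted process skips straight to consuming its residual time, matching the behaviour attributed to ENTERCPU in Section 4.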