© Heldermann Verlag ISSN 0940-5151
Economic Quality Control Vol 18 (2003), No. 2, 165 – 194
Availability Formulas and Performance Measures for Separable Degradable Networks
Cornelia Sauer and Hans Daduna
Abstract: We derive explicit formulas for availability and performance measures in complex degradable stochastic networks. The networks consist of interacting service systems (queues) where servers are unreliable and may break down. The breakdown and repair mechanisms are of rather general structure, and may show complex dependencies among the nodes of the networks. The degradable network models investigated incorporate the breakdowns of the servers and their repairs as well as the behavior of the service mechanisms and the customers’ routing into the overall Markovian system description. The service rates may depend on the local loads of the service centers, and furthermore working and repair periods are globally determined by state dependent intensities. Three different rules are applied to handle the customers’ routing connected with nodes in repair status. These rules originate from communication and production theory where they are used to resolve blocking situations. Modeling unreliable networks in this way leads to steady state distributions of product form for the supplemented Markovian state processes. From these first quantities we are able to derive point availabilities for single nodes and for groups of nodes as well as the standard performance measures for service networks. Key words: Performability, reliability, Jackson networks, Gordon-Newell networks, blocking rules, product form solution, point availability, throughput.
1 Introduction
“One of the most difficult tasks in reliability engineering is to analyse dependent components (often referred to as common mode failures). It is difficult to formulate the dependency in a mathematically stringent way . . . ” [1, p. 32]. In parallel to this observation, which is concerned with reliability aspects of systems (e.g. availability), single and multichannel servers with unreliable service channels pose hard problems in queueing theory when explicit and simple-to-use formulas are required. There are early papers in the area of Operations Research which contribute to this field, see e.g. [45], [23], [43], [2] for the single server case and [32] and [34] for the case of a node with parallel servers. These papers mainly deal with performance analysis aspects of the service systems under investigation. In [30] and [31] a two-node network with unreliable nodes is analysed and it is shown that considerable mathematical effort using the theory of complex variables is needed. A systematic study of single node systems with unreliable service channels is
given in [3] considering in parallel the aspects of performance analysis and of availability as far as explicit results and formulas could be derived. Besides, there are two success stories in applied probability, which deal with complex systems and which developed separately.
• The queueing networks’ success story which began with the work of Jackson, Gordon and Newell, the BCMP–group, and Kelly, who provided us up to 1975 with a versatile class of so-called separable networks with explicitly accessible steady state of product form. For recent surveys see [42], [10], [12].
• The success story of reliability theory for complex networks dates back at least to the influential books of Barlow and Proschan [5], [6]. Since then the mathematical theory of availability analysis and maintenance optimization for complex systems has considerably developed, especially driven by the modern theory of stochastic processes, [1]. A compilation of many related topics is contained in [25]. For recent surveys see [1], [8], [9].
Unfortunately it turned out that a unified modeling of queueing networks for a simultaneous investigation of performance and reliability aspects has resisted up to now the derivation of easy-to-apply recipes for systems’ analysis. The occurrence of unreliable servers, respectively transmission channels (“the nodes”), in networks does not allow the elegance of performance analysis methods based on product form solutions. Network models with nodes of degradable performance which we found in the literature are not separable. Nevertheless, over the last twenty years the field of Performability Theory has been developed, starting from the direct needs for unified modeling and investigation of computer systems and networks and telecommunications networks which incorporate performance analysis and reliability analysis, see [29] or [20]. The proceedings of the Fifth International Workshop on Performability Modeling of Computer and Communication Systems [18] contain theoretical studies as well as applications of performability techniques. The experience of research in this field confirmed that the interplay between performance analysis and reliability theory is a matter of structural complexity and computational problems. Indeed both stochastic processes, namely that describing the performance of a system and that describing the availability of a system’s components, are processes in a random environment, defined by the respective other process. These interacting processes, respectively random environments, are usually thought to resist explicit computations of steady states without strong simplifications, as is necessary, e.g., for the reduced work-rate approximation sketched below. Without having access to the steady state distribution of the network, a unified mathematical framework for reliability and performability models is described in [44] using the framework of Markov reward models.
It can be the starting point for developing computational schemes and approximation procedures, see [10] for a recent survey.
From this point of view the aim of our research is to further contribute to bridging the gap between the worlds of performance analysis and reliability theory in connection with performability of queueing networks. A recent survey on how to approximately solve the related problems by means of models introducing the feature of unreliable nodes into queueing network models is given in [11], where four different suggestions to overcome the lack of explicit steady state solutions are recollected and further developed.
• For the reduced work–rate approximation the unreliable server is replaced by a reliable one with a service capacity incorporating the mean down times of the actual server.
• The Poisson approximation replaces the interacting network nodes by nodes in isolation with suitably adjusted Poisson arrivals and then uses the single server results from classical theory.
• The MMPP approximation (Markov–modulated Poisson process [17]) generalizes the latter approach by integrating the non–renewal property of the network flows into the arrival processes of the isolated queues.
• A further modification is the joint–state approximation, where the up/down status of the nodes is controlled by an environment process and then incorporated into the MMPP approximation.
Our approach is different from those described in [11]. We develop new network models which integrate breakdowns of the servers and their repair into the very first Markovian system description. The probability for a breakdown of a server particularly depends on the locally defined load of that node, and the repair process may be state-dependent as well. The main difficulty then is to develop modified routing regimes which regulate the customers’ movements in the network in times of nodes in repair status by forbidding new customers to enter these nodes. We obtain for such network models steady states of product form.
Thus, having the fundamental joint characteristics of queue lengths and up/down status at hand we are able to analyse distributed complex systems with strongly dependent components which interact by the routing customers as well as by common mode failures. In Section 2 we introduce as starting point the classical exponential networks with stochastic routing and state-dependent service rates. In Section 3 we describe the breakdown and repair mechanisms and introduce the candidates for the steady state distributions for the degradable networks described in Section 4. Sections 5 and 6 constitute the paper’s core. After introducing the applied routing rules in case of a server’s breakdown, we state and prove our main results on product form steady states. In Section 7 we derive for prototype systems availability and performance measures for the degradable Jackson networks and subnetworks thereof. Section 8 contains some conclusions and an outlook. As to the notation, throughout the paper we set empty sums to 0, empty products to 1, and $0/0 := 0$ and, finally, $\subset$ means “subset or equal”.
2 Standard separable networks
In this section we summarize definitions and theorems which will be needed in the sequel. The starting point for our development is the class of open networks of single server nodes with state dependent service intensities introduced by Jackson [21].
2.1 Definition (Jackson network)
• A Jackson network is a network of service stations (nodes) numbered $\bar J := \{1, 2, \dots, J\}$. Station $j$ is a single server with infinite waiting room under FCFS regime. Customers in the network are indistinguishable. At node $j$ there is an external Poisson–$\lambda_j$ arrival stream, $\lambda_j \ge 0$. Customers arriving at node $j$ from the outside or from inside of the network request an amount of service time which is exponentially distributed with mean 1. Service at node $j$ is provided with intensity $\mu_j(n_j) > 0$, if there are $n_j > 0$ customers present at node $j$. All service and interarrival times are assumed to be independent.
• Movements of the customers in the network are governed by a Markovian routing mechanism. A customer on leaving node $i$ selects with probability $r(i,j) \ge 0$ to visit node $j$ next, and then enters node $j$ immediately, commencing service if he finds the server idle, otherwise he joins the tail of the queue at node $j$; with probability $r(i,0) \ge 0$ this customer decides to leave the network immediately. Thus for all $i \in \bar J$
\[ \sum_{j=0}^{J} r(i,j) = 1 \tag{1} \]
holds. Given the departure node $i$ the customer’s routing decision is made independently of the network’s history.
• Let $\bar J_0 := \{0, 1, \dots, J\}$ and
\[ \lambda := \sum_{j=1}^{J} \lambda_j, \qquad r(0,j) := \frac{\lambda_j}{\lambda} \tag{2} \]
and $r(0,0) := 0$, then we assume that the matrix $R := (r(i,j) : i, j \in \bar J_0)$ is irreducible.
• Let $X_j(t)$ denote the number of customers present at node $j$ at time $t \ge 0$, either waiting or in service (local queue length at node $j$), then $X(t) := (X_j(t) : j = 1, \dots, J)$ is the joint queue length vector of the network at time $t$ and
\[ X = (X(t) : t \ge 0) \tag{3} \]
is the joint queue length process of the Jackson network with state space $E := \mathbb{N}^J$.
2.2 Theorem ([21]) The joint queue length process $X$ of the Jackson network is a Markov process with transition rates $(q(x,y) : x, y \in E)$ given by: For $i, j \in \bar J$, $i \neq j$, and $x = (n_1, \dots, n_J) \in E$
\[
\begin{aligned}
q(n_1, \dots, n_i, \dots, n_J;\; n_1, \dots, n_i + 1, \dots, n_J) &= \lambda_i, \\
q(n_1, \dots, n_i, \dots, n_J;\; n_1, \dots, n_i - 1, \dots, n_J) &= \mu_i(n_i)\, r(i,0) && \text{if } n_i > 0, \\
q(n_1, \dots, n_i, \dots, n_j, \dots, n_J;\; n_1, \dots, n_i - 1, \dots, n_j + 1, \dots, n_J) &= \mu_i(n_i)\, r(i,j) && \text{if } n_i > 0.
\end{aligned}
\]
Furthermore
\[ q(x,x) = -\sum_{y \in E \setminus \{x\}} q(x,y), \qquad q(x,y) = 0 \ \text{otherwise.} \tag{4} \]
The traffic equation of the network
\[ \eta_j = \lambda_j + \sum_{i=1}^{J} \eta_i\, r(i,j), \qquad j = 1, \dots, J, \tag{5} \]
has a unique solution which we denote by $\eta = (\eta_1, \dots, \eta_J)$. We assume henceforth $X$ to be ergodic, implying that the unique stationary and limiting distribution $\hat\pi$ of $X$ is given by:
\[ \hat\pi(n_1, \dots, n_J) = \frac{1}{K(J)} \prod_{j=1}^{J} \prod_{k=1}^{n_j} \frac{\eta_j}{\mu_j(k)} \]
for $(n_1, \dots, n_J) \in \mathbb{N}^J$ with normalization constant $K(J)$.
2.3 Remark The class of exponential FCFS single servers with state-dependent service rates encompasses the classical exponential FCFS multiservers with state-independent service rates.
The steady state distribution of the closed analogue of the Jackson network was found by Jackson some years later, see [22], and rediscovered by Gordon and Newell [19]. We sketch the relevant results.
2.4 Definition (Gordon-Newell network)
• A Gordon-Newell network consists of a set of single server nodes $\bar J := \{1, 2, \dots, J\}$ as described in Definition 2.1 of the Jackson network without external arrivals and departures, i.e. the probabilities $r(j,0)$ for all $j \in \bar J$ and the total network arrival rate $\lambda$ are set to zero. There are $D > 0$ indistinguishable customers cycling in the network according to an irreducible Markov matrix $R = (r(i,j) : i, j \in \bar J)$. The service times are the same as in the open Jackson network case and the independence assumptions on service times and routing decisions are assumed to hold as well.
• Let $X_j(t)$ denote the number of customers present at node $j$ at time $t \ge 0$, either waiting or in service (local queue length at node $j$). Then $X(t) := (X_j(t) : j \in \bar J)$ is the joint queue length vector of the network at time $t$. We denote by $X = (X(t) : t \ge 0)$ the joint queue length process of the Gordon-Newell network with state space
\[ S(J,D) := \{ (n_1, n_2, \dots, n_J) \in \mathbb{N}^J : n_1 + n_2 + \dots + n_J = D \}. \tag{6} \]
2.5 Theorem ([22], [19]) The joint queue length process $X$ of the Gordon-Newell network is a Markov process with transition rates $(q(x,y) : x, y \in S(J,D))$ given by: For $i, j \in \bar J$, $i \neq j$, and $x = (n_1, \dots, n_J) \in S(J,D)$:
\[ q(n_1, \dots, n_i, \dots, n_j, \dots, n_J;\; n_1, \dots, n_i - 1, \dots, n_j + 1, \dots, n_J) = \mu_i(n_i)\, r(i,j) \quad \text{if } n_i > 0 \tag{7} \]
and
\[ q(x,x) = -\sum_{y \in S(J,D) \setminus \{x\}} q(x,y), \tag{8} \]
\[ q(x,y) = 0 \ \text{otherwise.} \tag{9} \]
The process $X$ is ergodic. Let $\eta = (\eta_1, \dots, \eta_J)$ denote the unique probability solution of the traffic equation
\[ \eta_j = \sum_{i=1}^{J} \eta_i\, r(i,j), \qquad j \in \bar J. \tag{10} \]
The unique stationary and limiting distribution $\check\pi = \check\pi(J,D)$ of $X$ on $S(J,D)$ is
\[ \check\pi(n_1, \dots, n_J) = \frac{1}{G(J,D)} \prod_{j=1}^{J} \prod_{i=1}^{n_j} \frac{\eta_j}{\mu_j(i)} \tag{11} \]
for $(n_1, \dots, n_J) \in S(J,D)$ with $G(J,D)$ as normalization constant.
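For a small closed network the quantities of Theorem 2.5 can be evaluated by brute force. The following is a minimal sketch, not a method taken from the paper: nodes are indexed from 0, the function names (`solve_traffic_closed`, `gordon_newell`, `balance_residual`) and the numerical example are illustrative assumptions, and the traffic equation (10) is solved by a simple fixed-point iteration. The residual function checks the global balance equations built from the rates (7).

```python
from itertools import product

def solve_traffic_closed(R, iterations=5000):
    """Probability solution of eta = eta R, eq. (10), by iterating the
    lazy chain (I + R)/2, which has the same stationary vector as R."""
    J = len(R)
    eta = [1.0 / J] * J
    for _ in range(iterations):
        eta = [0.5 * eta[j] + 0.5 * sum(eta[i] * R[i][j] for i in range(J))
               for j in range(J)]
    total = sum(eta)
    return [e / total for e in eta]

def closed_states(J, D):
    """State space S(J, D) of eq. (6): all queue length vectors summing to D."""
    return [n for n in product(range(D + 1), repeat=J) if sum(n) == D]

def weight(n, eta, mu):
    """Unnormalized product form weight of eq. (11); mu(j, k) is the service
    intensity of node j when k customers are present."""
    w = 1.0
    for j, nj in enumerate(n):
        for k in range(1, nj + 1):
            w *= eta[j] / mu(j, k)
    return w

def gordon_newell(R, mu, D):
    """Return (eta, pi) with pi the normalized distribution of eq. (11)."""
    eta = solve_traffic_closed(R)
    w = {n: weight(n, eta, mu) for n in closed_states(len(R), D)}
    G = sum(w.values())  # normalization constant G(J, D)
    return eta, {n: v / G for n, v in w.items()}

def balance_residual(pi, R, mu):
    """Largest absolute violation of global balance under the rates (7)."""
    J = len(R)
    worst = 0.0
    for x, px in pi.items():
        out = px * sum(mu(i, x[i]) * R[i][j]
                       for i in range(J) if x[i] > 0
                       for j in range(J) if j != i)
        inflow = 0.0
        for j in range(J):
            if x[j] == 0:
                continue
            for i in range(J):
                if i != j:
                    y = list(x)
                    y[i] += 1
                    y[j] -= 1
                    inflow += pi[tuple(y)] * mu(i, y[i]) * R[i][j]
        worst = max(worst, abs(out - inflow))
    return worst

# Hypothetical three-node example with load-dependent service rates.
R = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.3, 0.7, 0.0]]
mu = lambda j, k: (1.0, 2.0, 1.5)[j] * min(k, 2)
eta, pi = gordon_newell(R, mu, D=4)
```

Enumerating $S(J,D)$ is only feasible for small $J$ and $D$; the sketch is meant to make the formula concrete, not to be efficient.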
3 Degradable Nodes and Repair
For the networks described in Section 2 we consider the servers at the nodes to be unreliable, i.e., the nodes may break down. In this case the network’s performance is degraded because a subset of the operating systems which constitute the network is not available. The breakdown events are of rather general structure and may occur in different ways: Nodes may break down as an isolated event or in groups simultaneously and the repair of nodes may end for each node individually or in groups as well. It is not required that those nodes which stopped service simultaneously, return to service at the same time instant. For describing the system’s evolution the state space has to be enlarged for each network process as described below.
Control of breakdowns and repairs:
• Let $\bar K \subset \bar J$ be the set of nodes in down status and $\bar I \subset \bar J \setminus \bar K$, $\bar I \neq \emptyset$, be some subset of nodes in up status. Then the nodes of $\bar I$ break down concurrently with intensity $\alpha^{\bar K}_{\bar I}(n_i : i \in \bar K \cup \bar I)$, if there are $n_i$ customers at node $i$, $i \in \bar K \cup \bar I$. So breakdown of node set $\bar I$ depends on local loads of the corresponding nodes.
• Nodes in down status neither accept new customers nor continue serving the old customers, who have to wait for the server’s return. (At any node $i$ under repair the service intensity $\mu_i(n_i)$ is set to 0.) We will describe below three regimes to handle routing connected with nodes in down status.
• Assume the nodes in $\bar K$ are under repair, $\bar K \subset \bar J$, $\bar K \neq \emptyset$, with $n_i$ customers at node $i$, $i \in \bar K$, which are waiting there for service to be resumed. Then, if $\bar H \subset \bar K$, $\bar H \neq \emptyset$, the nodes of $\bar H$ return from repair as a batch group with intensity $\beta^{\bar K}_{\bar H}(n_i : i \in \bar K)$ and immediately resume their services. Routing then has to be updated again as will be described below.
In this general setting the intensities $\alpha^{\bar K}_{\bar I}(n_i : i \in \bar K \cup \bar I)$ and $\beta^{\bar K}_{\bar H}(n_i : i \in \bar K)$ for occurrence of breakdown and repair events cannot be chosen arbitrarily, but have to meet some constraints. We have found versatile rules for rather general classes of suitable intensities by letting the breakdown and repair events be controlled by a Markov process which may depend on the number of customers momentarily at the nodes in the network. If we represent for each node $j \in \bar J$ its operating, normal status respectively its repair status as state 0 respectively 1 and consider breakdowns as births and repairs as deaths, we are in the range of multidimensional birth–death processes on state space $\{0,1\}^J$, i.e. processes with transition intensities
\[ q(m, m') := \begin{cases} \alpha(m, a), & \text{if } m' = m + a, \\ \beta(m, a), & \text{if } m' = m - a, \end{cases} \qquad \text{for } m, m', a \in \{0,1\}^J. \tag{12} \]
Following Serfozo we assume that these control birth–death processes are reversible to obtain explicit steady states. In [41, Table 1] several types of special birth and death intensities are described in connection with the resulting steady state distribution, which we can apply to our setting. We generalize Serfozo’s versions of (12) to Markov modulated multidimensional birth–death processes where we additionally let the transition intensities depend in a general way on the customer loads at the nodes concerned. We present the details for the rather general case of the fifth entry of [41, Table 1], extending the definition to our systems as follows:
3.1 Definition The intensities for breakdowns, respectively repairs, for $\bar I \neq \emptyset$ are
\[ \alpha^{\bar K}_{\bar I}(n_i : i \in \bar J) := \frac{A(n_i : i \in \bar K \cup \bar I)}{A(n_i : i \in \bar K)} \tag{13} \]
respectively
\[ \beta^{\bar K}_{\bar H}(n_i : i \in \bar J) := \frac{B(n_i : i \in \bar K)}{B(n_i : i \in \bar K \setminus \bar H)} \tag{14} \]
where $A$ and $B$ are nonnegative functions, $A, B : \bigcup_{l \in \bar J_0} \mathbb{N}^l \to \mathbb{R}_+ = [0, \infty)$. Since $\alpha^{\bar K}_{\bar I}(n_i : i \in \bar J)$ and $\beta^{\bar K}_{\bar H}(n_i : i \in \bar J)$ will serve as transition intensities for a Markovian system process they will be assumed henceforth to be finite. For standardization we set $A(n_i : i \in \emptyset) := 1 =: B(n_i : i \in \emptyset)$. It follows, e.g., that for a network in complete up status (all nodes are functioning) the breakdown intensity for node set $\bar I \subset \bar J$, $\bar I \neq \emptyset$, is $\alpha^{\emptyset}_{\bar I}(n_i : i \in \bar I) = A(n_i : i \in \bar I)$ and the return intensity for the complete set of broken down nodes, say $\bar K$, is:
\[ \beta^{\bar K}_{\bar K}(n_i : i \in \bar K) = B(n_i : i \in \bar K). \]
Furthermore the assumptions on the transition intensities imply for example
\[ A(n_i : i \in \bar K) = 0 \implies A(n_i : i \in \bar K \cup \bar I) = 0 \quad \text{for all } \bar K \subset \bar J,\ \bar I \subset \bar J \setminus \bar K,\ \bar I \neq \emptyset, \tag{15} \]
which means that if at a fixed customer state $(n_1, \dots, n_J) \in \mathbb{N}^J$ the common breakdown rate for a set of nodes, $\bar K \subset \bar J$, is zero, then this set $\bar K$ cannot break down simultaneously together with other nodes either.
7 Example (Independent breakdown and repair) If the breakdown and the repair intensities of a single node, say $i$, depend only on the local load, $n_i$, at this node, the functions $A$ and $B$ are assumed to be of product form
\[ A(n_i : i \in \bar I) = \prod_{i \in \bar I} a_i(n_i), \qquad B(n_i : i \in \bar H) = \prod_{i \in \bar H} b_i(n_i), \tag{16} \]
which for all combinations of nodes $\bar K \subset \bar J$ yields
\[ \alpha^{\bar K}_{\bar I}(n_i : i \in \bar J) = \prod_{i \in \bar I} a_i(n_i) \quad \text{with } \bar I \cap \bar K = \emptyset, \tag{17} \]
\[ \beta^{\bar K}_{\bar H}(n_i : i \in \bar J) = \prod_{i \in \bar H} b_i(n_i) \quad \text{with } \bar H \subset \bar K. \tag{18} \]
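The ratio structure of (13) and (14) and its product form special case (16) to (18) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than code from the paper: nodes are indexed from 0, node sets are Python sets, and the helper names (`make_product_form`, `alpha`, `beta`, `up_down_weight`) as well as the rate functions `a`, `b` are hypothetical. It assumes strictly positive denominators; the convention for zero denominators is not handled.

```python
from math import prod, isclose

def make_product_form(rates):
    """Example 7, eq. (16): F(n_i : i in S) = prod over i in S of rates[i](n_i).
    The empty product is 1, matching the standardization A(empty) = B(empty) = 1."""
    def F(S, n):
        return prod(rates[i](n[i]) for i in S)
    return F

def alpha(A, K, I, n):
    """Breakdown intensity (13): nodes in I fail together while K is down."""
    K, I = frozenset(K), frozenset(I)
    assert I and not (I & K), "I must be nonempty and disjoint from K"
    return A(K | I, n) / A(K, n)

def beta(B, K, H, n):
    """Repair intensity (14): nodes in H return together from the down set K."""
    K, H = frozenset(K), frozenset(H)
    assert H and H <= K, "H must be a nonempty subset of K"
    return B(K, n) / B(K - H, n)

def up_down_weight(A, B, K, n):
    """Weight A(K)/B(K) of the down set K at fixed queue lengths n."""
    return A(K, n) / B(K, n)

# Hypothetical local breakdown rates a_i(n_i) and repair rates b_i(n_i).
a = [lambda n: 0.1 * (1 + n), lambda n: 0.2, lambda n: 0.05 * (1 + n)]
b = [lambda n: 1.0, lambda n: 2.0, lambda n: 0.5]
A, B = make_product_form(a), make_product_form(b)
n = (3, 0, 2)  # queue lengths at nodes 0, 1, 2

K, I = {1}, {0, 2}
# Detailed balance between down sets K and K ∪ I at fixed n:
lhs = up_down_weight(A, B, K, n) * alpha(A, K, I, n)
rhs = up_down_weight(A, B, K | I, n) * beta(B, K | I, I, n)
assert isclose(lhs, rhs)
```

The final assertion reflects the reversibility assumption on the control process: at fixed queue lengths, the weights $A/B$ balance each breakdown transition against the corresponding repair transition.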
8 Example The class of breakdown intensities in Definition 3.1 includes the following important special cases:
(a) State independent breakdown and repair Breakdown and repair intensities depend on the nodes but are independent of the customer load in the network. Then the functions $A$ and $B$ only depend on the node set concerned by the breakdown, respectively repair:
\[ A(n_i : i \in \bar I) = A(\bar I), \qquad B(n_i : i \in \bar H) = B(\bar H) \quad \text{for all } \bar I, \bar H \subset \bar J,\ \bar I, \bar H \neq \emptyset. \tag{19} \]
With these assumptions the breakdown/repair process is Markovian on its own on the state space $\mathcal{P}(\bar J)$ of all subsets of $\bar J$. The state description of this process can be equivalently chosen to be $\{0,1\}^J$ and the special intensities selected for this example coincide exactly with those given in [41, (5), Table 1].
(b) Vacation systems A more sophisticated regime is obtained assuming nodes which only break down when they are idle and, therefore, the function $A$ is defined as
\[ A(n_i : i \in \bar I) = A(\bar I) \prod_{i \in \bar I} \delta_{0 n_i} \quad \text{for all } \bar I \subset \bar J,\ \bar I \neq \emptyset, \tag{20} \]
and similarly the function $B$ may take positive values only for (subsets of) nodes which are empty as well:
\[ B(n_i : i \in \bar H) = B(\bar H) \prod_{i \in \bar H} \delta_{0 n_i} \quad \text{for all } \bar H \subset \bar J,\ \bar H \neq \emptyset. \tag{21} \]
The behavior of this system can be considered as a network of queues where empty servers take vacations: Whenever the server becomes idle it waits at most an exponentially distributed time (with parameter A(0)) for an arrival of a new customer. If no customer arrives the server takes a vacation which is exponentially distributed with parameter B(0). When a vacation is ongoing new customers are rejected. When the vacation expires the returned server again waits an exponential A(0) time for new customers. If a new customer arrives, service immediately commences. Otherwise a new vacation is taken by the server, and so on. Such systems for the case of single nodes have found much interest in literature during the last years, see [15] for a review and [26] for more recent developments. 3.2 Remark (a) The other versions of (12) collected by Serfozo [41] can each serve as well as starting point for developing state-dependent breakdown and repair intensities similar to those of Definition 3.1. Applying then the rerouting principles of Sections 5 and 6 leads to theorems completely in parallel to the results proved there: The networks are separable with product form steady states. The product form given in Definition 4.1 has to be modified by only incorporating the steady state distribution of the birth–death process which corresponds to the newly developed breakdown and repair rates. (b) In case (a) of Example 8 the distribution of the time until the first breakdown, when starting in a network state in which all nodes are working, is distributed according to the multivariate exponential Marshall-Olkin distribution, see [27].
4 Steady States for Degradable Networks
There are various strategies in practice to handle breakdowns in case of unreliable machines or transmission channels, and there are various models in literature describing at least approximately these features. Either, customers are assumed to be lost when the
server breaks down, or they are stored at the waiting room, or they are rerouted to functioning nodes to obtain service there. Often these principles are translated into abstract stay–on and rerouting schemes. We adapt some abstract rules to our framework which are usually used to resolve blocking situations. The subsequent theorems show that separable networks are obtained in case of unreliable servers. We first present the probability measures of typical product form. These are the central features for our further development because they will serve as steady states for various network processes.
4.1 Definition (Product form equilibrium states for degradable networks) For describing the breakdown of nodes in Jackson respectively Gordon-Newell networks we have to attach to the state spaces $E := \mathbb{N}^J$ respectively $E := S(J,D)$ of the corresponding network processes $X$ an additional component which carries information on the availability (reliability) behavior of the system described by a process $Y$. We introduce states of the form
\[ (\bar I; n_1, n_2, \dots, n_J) \in \mathcal{P}(\bar J) \times \mathbb{N}^J \tag{22} \]
respectively
\[ (\bar I; n_1, n_2, \dots, n_J) \in \mathcal{P}(\bar J) \times S(J,D) \tag{23} \]
with $\bar I \neq \emptyset$.
The set $\bar I$ contains the nodes under repair. For a node $j \in \bar J \setminus \bar I$ operating in a normal up status there are $n_j \in \mathbb{N}$ customers present and if $n_j > 0$ one of them is under service. For a node $i \in \bar I$ in down status there are $n_i \in \mathbb{N}$ customers waiting for the return of the repaired server. By means of these states we define the Markov processes $\tilde X = (Y, X)$ for the networks on the state spaces
\[ \tilde E = E \cup \left( (\mathcal{P}(\bar J) \setminus \{\emptyset\}) \times \mathbb{N}^J \right) \tag{24} \]
respectively
\[ \tilde E = S(J,D) \cup \left( (\mathcal{P}(\bar J) \setminus \{\emptyset\}) \times S(J,D) \right) \tag{25} \]
with probability measures of product form
\[ \pi(n_1, \dots, n_J) = \frac{1}{C} \prod_{j=1}^{J} \prod_{i=1}^{n_j} \frac{\eta_j}{\mu_j(i)} \tag{26} \]
with $(n_1, \dots, n_J) \in E$ and $(n_1, \dots, n_J) \in S(J,D)$, respectively, and
\[ \pi(\bar I; n_1, n_2, \dots, n_J) = \frac{1}{C}\, \frac{A(n_i : i \in \bar I)}{B(n_i : i \in \bar I)} \prod_{j=1}^{J} \prod_{i=1}^{n_j} \frac{\eta_j}{\mu_j(i)} \tag{27} \]
for $(\bar I; n_1, \dots, n_J) \in \mathcal{P}(\bar J) \times \mathbb{N}^J$ and $(\bar I; n_1, \dots, n_J) \in \mathcal{P}(\bar J) \times S(J,D)$, respectively, with $\bar I \neq \emptyset$,
where $\eta_j$ is the solution of the traffic equation (5) respectively (10) of the standard Jackson respectively Gordon-Newell network and $C$ the normalization constant:
\[ C = \sum_{(n_1, \dots, n_J) \in E} \Bigg( 1 + \sum_{\substack{\bar I \subset \bar J \\ \bar I \neq \emptyset}} \frac{A(n_i : i \in \bar I)}{B(n_i : i \in \bar I)} \Bigg) \prod_{j=1}^{J} \prod_{i=1}^{n_j} \frac{\eta_j}{\mu_j(i)} = C(J) \tag{28} \]
in case of open networks, where we assume $C$ to be finite, and
\[ C = \sum_{(n_1, \dots, n_J) \in S(J,D)} \Bigg( 1 + \sum_{\substack{\bar I \subset \bar J \\ \bar I \neq \emptyset}} \frac{A(n_i : i \in \bar I)}{B(n_i : i \in \bar I)} \Bigg) \prod_{j=1}^{J} \prod_{i=1}^{n_j} \frac{\eta_j}{\mu_j(i)} = C(J,D) \tag{29} \]
in case of closed networks. For a Jackson respectively Gordon-Newell network according to Definition 2.1 respectively Definition 2.4, with control of breakdowns according to Definition 3.1 and with suitably defined routing processes (see the descriptions in Sections 5 and 6 below) these probability measures will serve as uniquely defined stationary and limiting distributions.
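For the closed case the measure (26), (27) with constant (29) lives on a finite state space and can be tabulated directly. The following is a minimal sketch under stated assumptions: nodes are indexed from 0, the traffic solution `eta` is supplied by the caller, $B$ is strictly positive, and all names and numbers (`degradable_distribution`, the rates `a`, `b`, the vector `eta`) are hypothetical choices for illustration.

```python
from itertools import combinations, product
from math import prod

def subsets(J):
    """All subsets of {0, ..., J-1}, including the empty set (all nodes up)."""
    return [frozenset(c) for r in range(J + 1)
            for c in combinations(range(J), r)]

def closed_states(J, D):
    """State space S(J, D): queue length vectors with D customers in total."""
    return [n for n in product(range(D + 1), repeat=J) if sum(n) == D]

def queue_weight(n, eta, mu):
    """Product over nodes j and i = 1..n_j of eta_j / mu_j(i), as in (26)."""
    return prod(eta[j] / mu(j, k)
                for j, nj in enumerate(n) for k in range(1, nj + 1))

def degradable_distribution(eta, mu, D, A, B):
    """Tabulate (26) and (27) on P(J) x S(J, D) and return (pi, C),
    where C is the normalization constant of eq. (29)."""
    J = len(eta)
    w = {(I, n): A(I, n) / B(I, n) * queue_weight(n, eta, mu)
         for I in subsets(J) for n in closed_states(J, D)}
    C = sum(w.values())
    return {s: v / C for s, v in w.items()}, C

# Hypothetical data: product form A and B as in Example 7.
a = [lambda n: 0.1 * (1 + n), lambda n: 0.2, lambda n: 0.05 * (1 + n)]
b = [lambda n: 1.0, lambda n: 2.0, lambda n: 0.5]
A = lambda I, n: prod(a[i](n[i]) for i in I)  # empty product is 1
B = lambda I, n: prod(b[i](n[i]) for i in I)
eta = [0.5, 0.3, 0.2]                         # assumed traffic solution
mu = lambda j, k: (1.0, 2.0, 1.5)[j]          # load-independent service rates

pi, C = degradable_distribution(eta, mu, 3, A, B)
# Stationary probability that no node is under repair.
all_up = sum(p for (I, n), p in pi.items() if not I)
```

Marginals such as `all_up` are obtained by summing the table over the relevant states; the empty set `I` corresponds to the states of (26).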
5 Routing in Case of Breakdowns: Blocking
Next we discuss several routing protocols in case of breakdowns of servers. The first one is derived from a well known principle used to resolve blocking situations in networks with resource constraints. Blocking mechanisms have been developed for stochastic networks in which a customer cannot leave a node after having obtained his service because the waiting capacity of the next destination node is reached. Therefore, the customer cannot join this node and the flow through the network according to the routing matrix is delayed or totally stopped. We give a short survey of the most prominent methods to model the case of blocking. For more detailed reviews of blocking techniques see [36], [37] or the recently appeared book [4].
Blocking Mechanisms
• Blocking after service (bas) Assume that after completing the service at node $i$ at time $t$ a customer chooses the next destination node $j$ according to his routing instruction. If at time $t$ node $j$ is not able to accept further customers, the customer stays and blocks server $i$ until the situation at node $j$ has changed and the customer can enter node $j$. Node $i$ is blocked during this waiting period, i.e., it cannot start serving another customer, who might be waiting in the waiting room. If several nodes are simultaneously blocked by the same node $j$, then the “first blocked, first unblocked” rule usually determines the order in which the nodes will be unblocked when a departure occurs from the destination node $j$.
• Blocking before present service (bbs) A customer at node $i$ selects the following destination node $j$ according to the routing rule before he starts receiving service at node $i$ at time $t$. If node $j$ is full at time $t$, node $i$ immediately becomes blocked. When a departure occurs from node $j$, node $i$ becomes unblocked and service begins. However, as soon as the destination node $j$ becomes full again during the customer’s service at $i$, the service is interrupted and node $i$ becomes blocked again. Depending upon whether the customer at the blocked node $i$ is allowed to occupy the position in front of the server or not, one can distinguish the two cases bbs-server occupied and bbs-server not occupied. Of course this is only meaningful when node $i$ has finite waiting capacity.
• Repetitive service (rs) A customer after being served at node $i$ chooses the next destination node $j$ according to the routing instruction. If node $j$ is full, the customer stays at node $i$ to obtain another service. When this additional service expires the customer either selects his destination node anew according to his routing instruction (rs - random destination) or the previously chosen node $j$ remains his fixed destination. In the latter case, the customer’s service at node $i$ has to be repeated until, at the end of a service, node $j$ is able to receive another customer (rs - fixed destination).
5.1 The Blocking principle RS–RD
We have selected the blocking principle Repetitive Service – Random Destination (rs–rd) to obtain a new routing regime for our networks in cases where there are nodes which are under repair and can neither continue their services nor accept customers. Then customers cannot enter these nodes and are redirected to different nodes in the network, generating a situation which is similar to blocking. The principle rs–rd has the advantage that the service intensity at a blocked node formally need not be changed and is independent of the situation at other nodes. We derive a new routing matrix according to rs–rd for our networks which regulates the customers’ moves between the nodes that are not under repair.
5.1 Definition (rs–rd with reversible routing in exponential networks) Let $\bar I$ be the set of the nodes of a Jackson respectively Gordon-Newell network which are presently under repair. Then the routing probabilities being restricted to nodes with indices from $\bar J_0 \setminus \bar I$ (respectively $\bar J \setminus \bar I$) are defined by
\[ \tilde r^{\bar I}(i,j) = \begin{cases} r(i,j), & i, j \in \bar J_0 \setminus \bar I \ (\text{respectively } \bar J \setminus \bar I),\ i \neq j, \\ r(i,i) + \sum_{k \in \bar I} r(i,k), & i \in \bar J_0 \setminus \bar I \ (\text{respectively } \bar J \setminus \bar I),\ i = j. \end{cases} \tag{30} \]
For the external arrival rates in a Jackson network with (30):
\[ \tilde\lambda^{\bar I}_j = \lambda\, \tilde r^{\bar I}(0,j) = \lambda\, r(0,j) = \lambda_j \quad \text{for nodes } j \in \bar J \setminus \bar I \text{ functioning in up status,} \tag{31} \]
\[ \tilde\lambda^{\bar I}_j = \lambda\, \tilde r^{\bar I}(0,j) = 0 \quad \text{for nodes under repair } j \in \bar I \ (j \neq 0). \tag{32} \]
Note that external arrivals may be rejected with positive probability and cause an immediate departure, because arrivals to nodes under repair are rerouted:
\[ \tilde r^{\bar I}(0,0) = r(0,0) + \sum_{k \in \bar I} r(0,k) = \sum_{k \in \bar I} \frac{\lambda_k}{\lambda} \ge 0. \tag{33} \]
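The rerouting rule (30) is easy to state as a matrix transformation. The sketch below is an illustration under assumptions, not the authors' implementation: index 0 stands for the outside, the helper names (`rs_rd_routing`, `traffic_residual`) are hypothetical, the reversible routing matrix is built from an arbitrary symmetric weight matrix `w`, and the convention $\eta_0 := \lambda$ is used so that the open traffic equation reads $\eta = \eta R$ on $\bar J_0$.

```python
def rs_rd_routing(r, down):
    """Routing probabilities of eq. (30) on the indices not under repair.
    r is the (J+1) x (J+1) routing matrix on {0, ..., J} (0 = outside);
    probability mass directed to down nodes is added to the diagonal."""
    down = set(down)
    assert 0 not in down, "index 0 is the outside, not a node"
    up = [i for i in range(len(r)) if i not in down]
    rt = {(i, j): r[i][j] for i in up for j in up if i != j}
    for i in up:
        rt[(i, i)] = r[i][i] + sum(r[i][k] for k in down)
    return up, rt

def traffic_residual(eta, up, rt):
    """Largest violation of eta_j = sum_i eta_i * rt(i, j) over up indices."""
    return max(abs(eta[j] - sum(eta[i] * rt[(i, j)] for i in up)) for j in up)

# Hypothetical reversible routing: r(i, j) = w[i][j] / sum_k w[i][k] with a
# symmetric w, so that eta_i * r(i, j) = w[i][j] = eta_j * r(j, i).
w = [[0, 2, 1, 1],
     [2, 0, 3, 1],
     [1, 3, 1, 2],
     [1, 1, 2, 0]]
eta = [sum(row) for row in w]  # eta[0] plays the role of lambda
r = [[w[i][j] / eta[i] for j in range(4)] for i in range(4)]

up, rt = rs_rd_routing(r, down={2})
rejected = rt[(0, 0)]          # share of external arrivals rejected, eq. (33)
residual = traffic_residual(eta, up, rt)
```

In this example the original traffic solution, restricted to the nodes in up status, still satisfies the traffic equation of the rerouted network up to rounding; without reversibility of the original routing this need not hold.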
We assume further that the routing matrix of the original process, see Definition 2.1 (respectively Definition 2.4), is reversible:
\[ \eta_j \cdot r(j,i) = \eta_i \cdot r(i,j) \quad \text{for } i, j \in \bar J_0 \tag{34} \]
with $\eta_j$ from (5) (respectively $i, j \in \bar J$ with $\eta_j$ from (10)).
Applying this routing regime results in product form theorems.
5.2 Theorem A Jackson network according to Definition 2.1 with reversible routing and with breakdown and repair intensities given by Definition 3.1 and rerouting according to rs–rd (Definition 5.1) has a stationary distribution of product form given by (26) and (27).
Proof: The steady state equations for the network processes are:
\[
\begin{aligned}
&\pi(n_1, \dots, n_J) \Bigg[ \sum_{j=1}^{J} (1 - \delta_{0 n_j})\, \mu_j(n_j)\, (1 - r(j,j)) + \sum_{j=1}^{J} \lambda_j + \sum_{\substack{\bar I \subset \bar J \\ \bar I \neq \emptyset}} A(n_i : i \in \bar I) \Bigg] \\
&\quad = \sum_{j=1}^{J} (1 - \delta_{0 n_j})\, \pi(n_1, \dots, n_j - 1, \dots, n_J)\, \lambda_j \\
&\qquad + \sum_{j=1}^{J} (1 - \delta_{0 n_j}) \sum_{\substack{i=1 \\ i \neq j}}^{J} \pi(n_1, \dots, n_i + 1, \dots, n_j - 1, \dots, n_J)\, \mu_i(n_i + 1)\, r(i,j) \\
&\qquad + \sum_{j=1}^{J} \pi(n_1, \dots, n_j + 1, \dots, n_J)\, \mu_j(n_j + 1)\, r(j,0) \\
&\qquad + \sum_{\substack{\bar I \subset \bar J \\ \bar I \neq \emptyset}} \pi(\bar I; n_1, n_2, \dots, n_J)\, B(n_i : i \in \bar I)
\qquad \text{for all } (n_1, \dots, n_J) \in \mathbb{N}^J.
\end{aligned}
\tag{35}
\]
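For all $(\bar I; n_1, n_2, \dots, n_J) \in \mathcal{P}(\bar J) \times \mathbb{N}^J$ with $\bar I \neq \emptyset$, we have
\[
\begin{aligned}
&\pi(\bar I; n_1, n_2, \dots, n_J) \Bigg[ \sum_{j \in \bar J \setminus \bar I} (1 - \delta_{0 n_j})\, \mu_j(n_j)\, (1 - \tilde r^{\bar I}(j,j)) + \sum_{j \in \bar J \setminus \bar I} \tilde\lambda^{\bar I}_j \\
&\qquad\qquad + \sum_{\substack{\bar K \subset \bar I \\ \bar K \neq \emptyset}} \frac{B(n_i : i \in \bar I)}{B(n_i : i \in \bar I \setminus \bar K)} + \sum_{\substack{\bar K \subset \bar J \setminus \bar I \\ \bar K \neq \emptyset}} \frac{A(n_i : i \in \bar I \cup \bar K)}{A(n_i : i \in \bar I)} \Bigg] \\
&\quad = \sum_{j \in \bar J \setminus \bar I} (1 - \delta_{0 n_j})\, \pi(\bar I; n_1, n_2, \dots, n_j - 1, \dots, n_J)\, \tilde\lambda^{\bar I}_j \\
&\qquad + \sum_{j \in \bar J \setminus \bar I} (1 - \delta_{0 n_j}) \sum_{\substack{k \in \bar J \setminus \bar I \\ k \neq j}} \pi(\bar I; n_1, \dots, n_k + 1, \dots, n_j - 1, \dots, n_J)\, \mu_k(n_k + 1)\, \tilde r^{\bar I}(k,j) \\
&\qquad + \sum_{j \in \bar J \setminus \bar I} \pi(\bar I; n_1, n_2, \dots, n_j + 1, \dots, n_J)\, \mu_j(n_j + 1)\, \tilde r^{\bar I}(j,0) \\
&\qquad + \sum_{\substack{\bar K \subset \bar I \\ \bar K \neq \emptyset}} \pi(\bar I \setminus \bar K; n_1, n_2, \dots, n_J)\, \frac{A(n_i : i \in \bar I)}{A(n_i : i \in \bar I \setminus \bar K)} \\
&\qquad + \sum_{\substack{\bar K \subset \bar J \setminus \bar I \\ \bar K \neq \emptyset}} \pi(\bar I \cup \bar K; n_1, n_2, \dots, n_J)\, \frac{B(n_i : i \in \bar I \cup \bar K)}{B(n_i : i \in \bar I)}.
\end{aligned}
\tag{36}
\]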
We have to show that the distribution (26) and (27) solves the equations (35) and (36).

In equation (35) the term $\pi(n_1,\dots,n_J)\,A(n_i : i\in\bar I)$ on the left hand side is equal to the term $\pi(\bar I; n_1,\dots,n_J)\,B(n_i : i\in\bar I)$ on the right hand side for each $\bar I\subset\bar J$, $\bar I\neq\emptyset$. Thus, the global stationary equation of the standard Jackson network with $J$ nodes is left over, which is solved by the stationary distribution (26).

In equation (36) the term on the left hand side
\[
\pi(\bar I; n_1,\dots,n_J)\,\frac{B(n_i : i\in\bar I)}{B(n_i : i\in\bar I\setminus\bar K)} = \pi(n_1,\dots,n_J)\,\frac{A(n_i : i\in\bar I)}{B(n_i : i\in\bar I\setminus\bar K)} \tag{37}
\]
is equal to the term on the right hand side
\[
\pi(\bar I\setminus\bar K; n_1,\dots,n_J)\,\frac{A(n_i : i\in\bar I)}{A(n_i : i\in\bar I\setminus\bar K)} = \pi(n_1,\dots,n_J)\,\frac{A(n_i : i\in\bar I)}{B(n_i : i\in\bar I\setminus\bar K)} \tag{38}
\]
for any $\bar K\subset\bar I$, $\bar K\neq\emptyset$. Moreover, the term
\[
\pi(\bar I; n_1,\dots,n_J)\,\frac{A(n_i : i\in\bar I\cup\bar K)}{A(n_i : i\in\bar I)} = \pi(n_1,\dots,n_J)\,\frac{A(n_i : i\in\bar I\cup\bar K)}{B(n_i : i\in\bar I)} \tag{39}
\]
on the left hand side is equal to the term
\[
\pi(\bar I\cup\bar K; n_1,\dots,n_J)\,\frac{B(n_i : i\in\bar I\cup\bar K)}{B(n_i : i\in\bar I)} = \pi(n_1,\dots,n_J)\,\frac{A(n_i : i\in\bar I\cup\bar K)}{B(n_i : i\in\bar I)} \tag{40}
\]
on the right hand side for each $\bar K\subset\bar J\setminus\bar I$, $\bar K\neq\emptyset$. The remainder is the global stationary equation of the standard Jackson network with node set $\bar J\setminus\bar I$, service rates $\mu_j(\cdot)$, external arrival rates $\tilde\lambda^{\bar I}_j$ and routing probabilities $\tilde r^{\bar I}(i,j)$ for $i,j\in\bar J\setminus\bar I$, which is solved by
\[
\pi(\bar I; n_1,\dots,n_J) = \tilde C^{-1}(n_i : i\in\bar I)\prod_{j\in\bar J\setminus\bar I}\ \prod_{i=1}^{n_j}\frac{\tilde\eta^{\bar I}_j}{\mu_j(i)} \tag{41}
\]
where $n_i$, $i\in\bar I$, are constants and $\tilde\eta^{\bar I}$ is the solution of the traffic equation
\[
\tilde\eta^{\bar I}_j = \tilde\lambda^{\bar I}_j + \sum_{i\in\bar J\setminus\bar I}\tilde\eta^{\bar I}_i\,\tilde r^{\bar I}(i,j),\quad j\in\bar J\setminus\bar I. \tag{42}
\]
Therefore, it is sufficient to show that
\[
\tilde\eta^{\bar I}_j = \eta_j \quad\text{for all } j\in\bar J\setminus\bar I. \tag{43}
\]
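Using the definition of the new routing probabilities $\tilde r^{\bar I}(i,j)$ from (30), (43) can be verified:
\[
\begin{aligned}
\tilde\lambda^{\bar I}_j + \sum_{i\in\bar J\setminus\bar I}\eta_i\,\tilde r^{\bar I}(i,j)
&= \lambda_j + \sum_{i\in\bar J\setminus\bar I,\ i\neq j}\eta_i\,r(i,j) + \eta_j\Bigl(r(j,j) + \sum_{i\in\bar I} r(j,i)\Bigr) \\
&= \lambda_j + \sum_{i\in\bar J\setminus\bar I}\eta_i\,r(i,j) + \sum_{i\in\bar I}\eta_j\,r(j,i) \\
&= \eta_j - \sum_{i\in\bar I}\eta_i\,r(i,j) + \sum_{i\in\bar I}\eta_i\,r(i,j) \\
&= \eta_j,\quad j\in\bar J\setminus\bar I,
\end{aligned}
\tag{44}
\]
where the last but one equality follows from the traffic equation (7) together with the premise of reversibility. Finally we define
\[
\tilde C^{-1}(n_i : i\in\bar I) = C^{-1}\,\frac{A(n_i : i\in\bar I)}{B(n_i : i\in\bar I)}\prod_{j\in\bar I}\ \prod_{i=1}^{n_j}\frac{\eta_j}{\mu_j(i)} \tag{45}
\]
to obtain the proposed form for $\pi(\bar I; n_1,\dots,n_J)$.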
•
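The key step (44) can be checked numerically on a small example. The sketch below is illustrative only: the weight matrix `W`, the helper names, and the assumed form of the rs–rd routing (jumps into nodes under repair are added to the diagonal entry, in line with (33)) are assumptions made for this example and are not taken verbatim from Definition 5.1. Node 0 stands for the outside world, so `ETA[0]` plays the role of $\lambda$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical symmetric weights on nodes 0..3 (node 0 = outside world).
# Routing r(i,j) = w(i,j) / sum_k w(i,k) is reversible w.r.t. eta_i = sum_k w(i,k).
W = [[0, 2, 1, 1],
     [2, 0, 3, 1],
     [1, 3, 1, 2],
     [1, 1, 2, 0]]
ROW = [sum(r) for r in W]
R = [[Fraction(w, s) for w in r] for r, s in zip(W, ROW)]
ETA = [Fraction(s) for s in ROW]          # ETA[0] plays the role of lambda


def rs_rd_routing(R, down):
    """Assumed rs-rd rule: jumps into `down` are added to the diagonal entry."""
    up = [i for i in range(len(R)) if i not in down]
    Rt = {(i, j): R[i][j] for i in up for j in up}
    for i in up:
        Rt[i, i] += sum(R[i][k] for k in down)
    return up, Rt


def traffic_equations_hold(R, eta, down):
    """Check eta_j = sum_i eta_i * r~(i,j) on the up nodes (including node 0)."""
    up, Rt = rs_rd_routing(R, down)
    return all(eta[j] == sum(eta[i] * Rt[i, j] for i in up) for j in up)


NODES = [1, 2, 3]
SUBSETS = [set(c) for k in (1, 2) for c in combinations(NODES, k)]
REVERSIBLE_OK = all(traffic_equations_hold(R, ETA, d) for d in SUBSETS)

# Non-reversible comparison: cyclic routing 0 -> 1 -> 2 -> 3 -> 0, eta = (1,1,1,1).
CYCLE = [[Fraction(int(j == (i + 1) % 4)) for j in range(4)] for i in range(4)]
CYCLE_OK = traffic_equations_hold(CYCLE, [Fraction(1)] * 4, {2})
print(REVERSIBLE_OK, CYCLE_OK)
```

In this example the original solution of the traffic equations survives every breakdown pattern under reversible routing, while the cyclic (non-reversible) routing fails the check, which illustrates why reversibility is assumed in Theorem 5.2.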
5.3 Theorem A Gordon-Newell network defined by Definition 2.4 with reversible routing and with breakdown and repair intensities from Definition 3.1 and rerouting according to rs–rd (Definition 5.1) has a stationary distribution of product form given by (26) and (27).

Proof: Theorem 5.3 is proved by setting λ = 0 and r(j, 0) = 0 for all $j\in\bar J$ in the proof of Theorem 5.2. •

In the present setting it cannot be expected that the requirement of reversible routing for the original network (without the unreliability phenomenon) can be removed. Even in the simpler case of introducing control of blocking, reversibility is a standard assumption needed to obtain explicit expressions for the steady state distribution. In the next section, however, routing control mechanisms are introduced that do not need such an assumption for obtaining simple-to-use equilibria.

5.2
Stalling
The next way of handling breakdowns is again parallel to schemes which control blocking, see e.g. [30], [33], [14](Subsection 3.6.2) and can be considered as a global version of blocking before service (bbs). Whenever some node j breaks down and enters its repair status the service is stopped at all the other nodes of the network. Therefore, any movement of customers in the network is stopped. In addition customers wanting to enter the network from the outside are rerouted to an immediate departure. The service at all nodes is not resumed until all broken down nodes are repaired again. Following [40] we call this breakdown control mode stalling.
We assume that servers in up status at nodes where the service is interrupted due to a stalling situation behave as if they were in a hot stand–by mode. This means that these nodes may break down although they are not effectively used and their working capacity is only held ready. Furthermore, the breakdown rates of these nodes, or of collections of them, are the same as if they were working on the frozen population size there.

5.4 Definition (Stalling) If there is a breakdown of either a single node or a group of nodes, then all arrival streams to the network and all service processes at the nodes in up status are completely interrupted and resumed only after all failed nodes are repaired.

With this rerouting principle we obtain the same stationary behavior as in the case of the rs–rd mechanism.

5.5 Theorem Consider a Jackson network (Definition 2.1) with breakdown and repair intensities from Definition 3.1 and rerouting according to stalling (Definition 5.4). Assume that nodes being in up status but not serving because of the stalling regime are in hot stand–by mode, i.e. the breakdown rates are not changed by the stalling regime. Then the Jackson network has a stationary distribution of product form given by (26) and (27).

Proof: With arguments analogous to those in the proof of Theorem 5.2 it can be shown that the stationary distribution from (26) and (27) is a solution of the global balance equations for the network process:
\[
\begin{aligned}
&\pi(n_1,\dots,n_J)\Bigl[\sum_{j=1}^{J}(1-\delta_{0n_j})\,\mu_j(n_j)\,(1-r(j,j)) + \sum_{j=1}^{J}\lambda_j + \sum_{\bar I\subset\bar J,\ \bar I\neq\emptyset} A(n_i : i\in\bar I)\Bigr] \\
&= \sum_{j=1}^{J}(1-\delta_{0n_j})\,\pi(n_1,\dots,n_j-1,\dots,n_J)\,\lambda_j \\
&\quad+ \sum_{j=1}^{J}(1-\delta_{0n_j})\sum_{i=1,\ i\neq j}^{J}\pi(n_1,\dots,n_i+1,\dots,n_j-1,\dots,n_J)\,\mu_i(n_i+1)\,r(i,j) \\
&\quad+ \sum_{j=1}^{J}\pi(n_1,\dots,n_j+1,\dots,n_J)\,\mu_j(n_j+1)\,r(j,0) \\
&\quad+ \sum_{\bar I\subset\bar J,\ \bar I\neq\emptyset}\pi(\bar I; n_1,\dots,n_J)\,B(n_i : i\in\bar I)
\end{aligned}
\tag{46}
\]
for all $(n_1,\dots,n_J)\in\mathbb N^J$, and
\[
\begin{aligned}
&\pi(\bar I; n_1,\dots,n_J)\Bigl[\sum_{\bar K\subset\bar I,\ \bar K\neq\emptyset}\frac{B(n_i : i\in\bar I)}{B(n_i : i\in\bar I\setminus\bar K)} + \sum_{\bar K\subset\bar J\setminus\bar I,\ \bar K\neq\emptyset}\frac{A(n_i : i\in\bar I\cup\bar K)}{A(n_i : i\in\bar I)}\Bigr] \\
&= \sum_{\bar K\subset\bar I,\ \bar K\neq\emptyset}\pi(\bar I\setminus\bar K; n_1,\dots,n_J)\,\frac{A(n_i : i\in\bar I)}{A(n_i : i\in\bar I\setminus\bar K)} \\
&\quad+ \sum_{\bar K\subset\bar J\setminus\bar I,\ \bar K\neq\emptyset}\pi(\bar I\cup\bar K; n_1,\dots,n_J)\,\frac{B(n_i : i\in\bar I\cup\bar K)}{B(n_i : i\in\bar I)}
\end{aligned}
\tag{47}
\]
for all $(\bar I; n_1,\dots,n_J)\in\mathcal P(\bar J)\times\mathbb N^J$ with $\bar I\neq\emptyset$.
•
Note that the occurrence of (47) indicates formally the consequences of the hot stand–by regime for nodes in up status which are interrupted during their services due to breakdowns of other servers. Although no server is active and no arrivals occur, servers nevertheless may break down.

5.6 Theorem Consider a Gordon-Newell network (Definition 2.4) with breakdown and repair intensities from Definition 3.1 and rerouting according to stalling (Definition 5.4). Assume that nodes being in up status but not serving because of the stalling regime are in hot stand–by mode, i.e. the breakdown rates are not changed by the stalling regime. Then the Gordon-Newell network has a stationary distribution of product form given by (26) and (27).

Proof: Theorem 5.6 is proved by setting λ = 0 and r(j, 0) = 0 for all $j\in\bar J$ in the proof of Theorem 5.5. •
6
Routing in Case of Breakdowns: Skipping
The third way to handle breakdowns relies on methods from the theory of taboo probabilities for Markov chains. These methods seem to have been introduced independently several times into the realm of queueing network theory and were used to resolve blocking, see e.g. [16] and [42][Chapter 3.6] (where it is called blocking and rerouting) and the references therein. Skipping was introduced by Schassberger [40] and later on, based on Schassberger's result, it was used in [13] to construct general abstract network processes.

For understanding skipping consider the routing process of the Jackson network (Definition 2.1) as an irreducible, homogeneous Markov chain $V = (V_n : n\in\mathbb N)$ on state space $\bar J_0 := \{0,1,\dots,J\}$ with transition matrix $R = (r(i,j) : i,j\in\bar J_0)$. Assume that customers travelling in the network are not allowed to enter node set $\bar I\subset\bar J$ ($\bar I\neq\emptyset$, $\bar I\neq\bar J$), i.e., for the process $V$ the node set $\bar I$ is a so-called taboo set that has to be skipped. This means in detail the following.

If a customer at node $i\in\bar J_0\setminus\bar I$ selects (with probability $r(i,j)$) for the next jump's destination a node $j\in\bar J_0\setminus\bar I$, the jump is allowed and immediately performed and the customer joins node $j$ for service. If (with probability $r(i,k)$) the customer selects for the next jump's destination a node $k\in\bar I$, he only performs an imaginary jump to that node, spending no time at node $k$, but immediately performs the next jump according to the routing matrix $R$, i.e., with probability $r(k,\ell)$ he selects the successor node $\ell$; if $\ell\in\bar J_0\setminus\bar I$ then the jump is performed and the customer joins node $\ell$ for service, but if $\ell\in\bar I$ the customer has to perform another random choice as if he would depart from $\ell$; and so on.

To formalize the skipping rules, consider an irreducible, homogeneous Markov chain $V$ on a countable state space $Z$ with transition probability matrix $S = (s(z,z') : z,z'\in Z)$. We define the restriction of $V$ to the state space $Z_0\subseteq Z$, such that the taboo set $Z\setminus Z_0$ is finite, as a homogeneous Markov chain $W = (W_n : n\in\mathbb N)$ where
\[
W_0 \equiv V_0 \quad\text{with } h(0,\omega)\equiv 0, \tag{48}
\]
\[
W_n(\omega) := V_{h(n,\omega)}(\omega) \tag{49}
\]
\[
\text{with } h(n,\omega) := \inf\{m\in\mathbb N : m > h(n-1,\omega),\ V_m(\omega)\in Z_0\} \tag{50}
\]
for $n\ge 1$. Note that the process $W$ may start in the taboo set $Z\setminus Z_0$. Let
\[
s_0(z,z') := P(W_1 = z' \mid W_0 = z) \tag{51}
\]
be the probability that the process $V$ enters at state $z'\in Z_0$ the allowed state space $Z_0$ for the first time after the start in $z\in Z$. The matrix $S_0 = (s_0(z,z') : z\in Z,\ z'\in Z_0)$ is stochastic. Applying the First Entrance Method and the strong Markov property we derive from the probabilities $s_0(z,z')$ for $z\in Z$ and $z'\in Z_0$ the transition probabilities in the case of skipping:
\[
s_0(z,z') = s(z,z') + \sum_{z''\in Z\setminus Z_0} s(z,z'')\,s_0(z'',z'). \tag{52}
\]
This follows from
\[
\begin{aligned}
s_0(z,z') &= P(W_1 = z' \mid W_0 = z) \\
&= P(W_1 = z',\ V_1\in Z_0 \mid W_0 = z) + P(W_1 = z',\ V_1\in Z\setminus Z_0 \mid W_0 = z) \\
&= P(V_1 = z' \mid V_0 = z) + \sum_{z''\in Z\setminus Z_0} P(V_1 = z'' \mid W_0 = z)\,P(W_1 = z' \mid W_0 = z,\ V_1 = z'') \\
&= s(z,z') + \sum_{z''\in Z\setminus Z_0} P(V_1 = z'' \mid V_0 = z)\cdot\underbrace{P(W_1 = z' \mid V_1 = z'')}_{=\,P(W_1 = z' \mid W_0 = z'')} \\
&= s(z,z') + \sum_{z''\in Z\setminus Z_0} s(z,z'')\,s_0(z'',z')
\end{aligned}
\tag{53}
\]
for $z\in Z$ and $z'\in Z_0$. It is essential that invariant measures of the process $V$ are invariant measures for $S_0|_{(Z_0\times Z_0)}$, i.e. of $W|_{Z_0}$, as well.
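The skipping rule (52) is a linear system for the rows of $S_0$ that start in the taboo set, so for a finite chain it can be solved directly. The following sketch assumes a hypothetical four-state, non-reversible chain `S` with invariant measure `U = (1, 1, 1, 1)`; the function names `solve`, `skip_matrix` and `restricted_invariant` are made up for this illustration. It solves (52) exactly with fractions and then checks that `U` restricted to the allowed states is invariant for the restricted skipping matrix.

```python
from fractions import Fraction as F


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def skip_matrix(S, allowed):
    """Return {z: [s0(z, z') for z' in allowed]} for all states z, using (52)."""
    n = len(S)
    taboo = [z for z in range(n) if z not in allowed]
    X = []
    if taboo:
        # Rows starting in the taboo set: (I - S_TT) X = S_T,allowed
        A = [[F(int(a == b)) - S[a][b] for b in taboo] for a in taboo]
        B = [[S[a][z] for z in allowed] for a in taboo]
        X = solve(A, B)
    return {z: [S[z][zp] + sum(S[z][t] * X[k][c] for k, t in enumerate(taboo))
                for c, zp in enumerate(allowed)]
            for z in range(n)}


def restricted_invariant(u, S0, allowed):
    """Check u|Z0 * S0|(Z0 x Z0) == u|Z0."""
    return all(u[zp] == sum(u[z] * S0[z][c] for z in allowed)
               for c, zp in enumerate(allowed))


h = F(1, 2)
S = [[F(x) for x in row] for row in
     [[0, h, h, 0], [0, 0, h, h], [h, 0, 0, h], [h, h, 0, 0]]]
U = [F(1)] * 4                 # invariant because S is doubly stochastic
ALLOWED = [0, 1, 3]            # state 2 is taboo
S0 = skip_matrix(S, ALLOWED)
print(S0[0], restricted_invariant(U, S0, ALLOWED))
```

The example chain is deliberately not reversible, which is consistent with the statement that the invariance property of skipping does not rely on reversibility.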
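6.1 Lemma ([40]) Let $u = (u(z) : z\in Z)$ be a vector and $u|_{Z_0} = (u(z) : z\in Z_0)$ its restriction to $Z_0$. Then
\[
uS = u \quad\Rightarrow\quad u|_{Z_0}\,S_0|_{(Z_0\times Z_0)} = u|_{Z_0}. \tag{54}
\]
Proof: Applying the skipping rule (52) yields for all $z\in Z$ and $z'\in Z_0$:
\[
u(z)\,s_0(z,z') = u(z)\,s(z,z') + u(z)\sum_{z''\in Z\setminus Z_0} s(z,z'')\,s_0(z'',z'). \tag{55}
\]
Summation over all $z\in Z$ yields for all $z'\in Z_0$ the following:
\[
\sum_{z\in Z} u(z)\,s_0(z,z') = \underbrace{\sum_{z\in Z} u(z)\,s(z,z')}_{=\,u(z')} + \sum_{z''\in Z\setminus Z_0} s_0(z'',z')\underbrace{\sum_{z\in Z} u(z)\,s(z,z'')}_{=\,u(z'')} \tag{56}
\]
\[
\Leftrightarrow\quad \sum_{z\in Z} u(z)\,s_0(z,z') = u(z') + \sum_{z\in Z\setminus Z_0} u(z)\,s_0(z,z') \tag{57}
\]
\[
\Leftrightarrow\quad u(z') = \sum_{z\in Z_0} u(z)\,s_0(z,z'). \tag{58}
\]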
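•

6.2 Theorem The skipping rules (52) applied to the routing process with taboo set $\bar I$ yield in Jackson (respectively Gordon-Newell) networks new Markov routing matrices
\[
\hat R^{\bar I} = \bigl(\hat r^{\bar I}(i,j) : i,j\in\bar J_0\setminus\bar I\ (\text{respectively } \bar J\setminus\bar I)\bigr) \tag{59}
\]
which are determined by the following set of equations:
\[
\hat r^{\bar I}(j,k) = r(j,k) + \sum_{i\in\bar I} r(j,i)\,\hat r^{\bar I}(i,k),\quad k,j\in\bar J_0\setminus\bar I\ (\text{respectively } \bar J\setminus\bar I), \tag{60}
\]
\[
\hat r^{\bar I}(i,k) = r(i,k) + \sum_{l\in\bar I} r(i,l)\,\hat r^{\bar I}(l,k),\quad i\in\bar I,\ k\in\bar J_0\setminus\bar I\ (\text{respectively } \bar J\setminus\bar I). \tag{61}
\]
The new arrival rates, denoted $\hat\lambda^{\bar I}_j$ for $j\in\bar J$, in Jackson networks are given by:
\[
\hat\lambda^{\bar I}_j = \lambda\,\hat r^{\bar I}(0,j) = \lambda_j + \sum_{i\in\bar I}\lambda_i\,\hat r^{\bar I}(i,j)\quad\text{for } j\in\bar J\setminus\bar I, \tag{62}
\]
\[
\hat\lambda^{\bar I}_k = 0\quad\text{for } k\in\bar I. \tag{63}
\]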
6.3 Remark The same behavior as described above is obtained by setting the service intensities at the nodes under repair to ∞.

The main results in the framework of exponential networks of queues refer also to the generalized product form probabilities (Definition 4.1).

6.4 Theorem For a Jackson network (Definition 2.1) with breakdown and repair intensities from Definition 3.1 and rerouting according to the skipping rule (Definition 6.2) the state process on E has a stationary distribution of product form given by (26) and (27).
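Proof: In analogy to the proof of Theorem 5.2 it is sufficient to show that the solutions of the traffic equations of a standard Jackson network and of a Jackson network with node set $\bar J\setminus\bar I$, service rates $\mu_j(\cdot)$, external arrival rates $\hat\lambda^{\bar I}_j$ from (62) and routing probabilities $\hat r^{\bar I}(i,j)$ from (60) for $i,j\in\bar J\setminus\bar I$ are equal, i.e., $\hat\eta^{\bar I}_j = \eta_j$ for all $j\in\bar J\setminus\bar I$ with $\hat\eta^{\bar I}_j$ being the solution of
\[
\hat\eta^{\bar I}_j = \hat\lambda^{\bar I}_j + \sum_{i\in\bar J\setminus\bar I}\hat\eta^{\bar I}_i\,\hat r^{\bar I}(i,j)\quad\text{for } j\in\bar J\setminus\bar I. \tag{64}
\]
Recall the traffic equation (7) of the Jackson network
\[
\eta_j = \lambda_j + \sum_{i\in\bar J}\eta_i\,r(i,j)\quad\text{for } j\in\bar J. \tag{65}
\]
Summing up the left and right hand side of (65) over all $j\in\bar J$ yields with $r(0,0) = 0$:
\[
\sum_{j\in\bar J}\eta_j = \sum_{j\in\bar J}\lambda_j + \sum_{j\in\bar J}\sum_{i\in\bar J}\eta_i\,r(i,j)
= \lambda\underbrace{\sum_{j\in\bar J} r(0,j)}_{=\,1} + \sum_{i\in\bar J}\eta_i\underbrace{\sum_{j\in\bar J} r(i,j)}_{=\,1-r(i,0)} \tag{66}
\]
\[
\Longleftrightarrow\quad \lambda = \sum_{i\in\bar J}\eta_i\,r(i,0). \tag{67}
\]
With $\lambda = \eta_0$, equations (65) and (67) yield:
\[
\eta_j = \sum_{i\in\bar J_0}\eta_i\,r(i,j)\quad\text{for } j\in\bar J_0, \tag{68}
\]
implying that $\eta = (\eta_0,\eta_1,\dots,\eta_J)$ is an invariant measure for the routing process according to the routing matrix $R$.

It follows from Lemma 6.1 that for all potential groups of taboo nodes, $\bar I$, the accompanying new routing matrix $\hat R^{\bar I} = (\hat r^{\bar I}(i,j) : i,j\in\bar J_0\setminus\bar I)$ fulfills $\eta\hat R^{\bar I} = \eta$ and, thus, we have shown with $\hat\lambda^{\bar I}_j$ from (62):
\[
\eta_j = \hat\lambda^{\bar I}_j + \sum_{i\in\bar J\setminus\bar I}\eta_i\,\hat r^{\bar I}(i,j)\quad\text{for } j\in\bar J\setminus\bar I \tag{69}
\]
or
\[
\hat\eta^{\bar I}_j = \eta_j\quad\text{for all } j\in\bar J\setminus\bar I. \tag{70}
\]
•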
6.5 Theorem For a Gordon-Newell network (Definition 2.4) with breakdown and repair intensities from Definition 3.1 and rerouting according to the skipping rule (Definition 6.2) the state process on E has a stationary distribution of product form given in (26) and (27).

Proof: Theorem 6.5 is proved by setting λ = 0 and r(j, 0) = 0 for all $j\in\bar J$ in the proof of Theorem 6.4. •
7
Computation of Availability and Performance Measures
The network processes $\tilde X = (Y, X)$ with their steady states given in Definition 4.1 include as observables the information on availability (reliability properties) and queue lengths (performance properties). To be more precise, for each model $\tilde X(t) = (Y(t), X(t))$ the very definition of the Markovian state allows us to determine reliability aspects and performance aspects simply by computing marginal distributions and their characteristics. This will be explained by computing point availabilities for subnetworks of a Jackson network and by computing mean queue lengths.

We first consider Jackson networks with breakdown and repair intensities which are independent of the customer load in the network (see Example 8(a)). In this case we obtain rather simple and explicit expressions for the availability and performance measures. Denote by $E(\hat X_k)$ the mean queue length at node $k\in\bar J$ in the stationary standard Jackson network (see Theorem 2.2).

7.1 Proposition Consider a stationary degradable Jackson network (Definition 4.1) with breakdown and repair intensities which only depend on the set of nodes concerned (see Example 8(a)). Then

(a) The stationary joint point availability at time $t\ge 0$ for the subnetwork $\bar K\subset\bar J$ is
\[
A(\bar K)(t) = 1 - \sum_{\bar I\supset\bar K}\pi(\bar I), \tag{71}
\]
where $\pi(\bar I)$ is the probability that exactly the nodes in set $\bar I\subset\bar J$ are under repair, given by
\[
\pi(\bar I) = \Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)^{-1}\frac{A(\bar I)}{B(\bar I)}. \tag{72}
\]
(b) The average queue lengths $E(X_k)$ are the same as in the standard Jackson network, $k\in\bar J$.

Proof: For the normalization constants $K(J)$ (from the standard Jackson network according to Theorem 2.2) and $C(J)$ (from the degradable Jackson network according to Definition 4.1) we have:
\[
\frac{K(J)}{C(J)} = \frac{\sum_{(n_1,\dots,n_J)\in E}\prod_{j=1}^{J}\prod_{i=1}^{n_j}\frac{\eta_j}{\mu_j(i)}}{\Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)\sum_{(n_1,\dots,n_J)\in E}\prod_{j=1}^{J}\prod_{i=1}^{n_j}\frac{\eta_j}{\mu_j(i)}}
= \Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)^{-1}. \tag{73}
\]
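With (73) we obtain the following:
\[
\begin{aligned}
\pi(\bar I) &= \sum_{(n_1,\dots,n_J)\in E}\pi(\bar I; n_1,\dots,n_J)
= C(J)^{-1}\,\frac{A(\bar I)}{B(\bar I)}\sum_{(n_1,\dots,n_J)\in E}\ \prod_{j=1}^{J}\prod_{i=1}^{n_j}\frac{\eta_j}{\mu_j(i)} \\
&= \frac{K(J)}{C(J)}\cdot\frac{A(\bar I)}{B(\bar I)}
= \Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)^{-1}\frac{A(\bar I)}{B(\bar I)}\quad\text{for } \bar I\subset\bar J
\end{aligned}
\tag{74}
\]
and
\[
\begin{aligned}
E(X_k) &= \sum_{(n_1,\dots,n_J)\in E} n_k\Bigl[\pi(n_1,\dots,n_J) + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\pi(\bar K; n_1,\dots,n_J)\Bigr] \\
&= \sum_{(n_1,\dots,n_J)\in E} n_k\cdot C(J)^{-1}\Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)\prod_{j=1}^{J}\prod_{i=1}^{n_j}\frac{\eta_j}{\mu_j(i)} \\
&= \Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)C(J)^{-1}\underbrace{\sum_{(n_1,\dots,n_J)\in E} n_k\cdot\prod_{j=1}^{J}\prod_{i=1}^{n_j}\frac{\eta_j}{\mu_j(i)}}_{=\,K(J)\,E(\hat X_k)} \\
&= E(\hat X_k)\quad\text{for } k\in\bar J.
\end{aligned}
\tag{75}
\]
•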
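Formulas (71) and (72) translate directly into a small computation over all subsets of nodes. The sketch below is a minimal illustration under stated assumptions: the product-type intensities `A(I) = prod(alpha_i)` and `B(I) = prod(beta_i)`, the numerical values, and all function names are hypothetical choices for this example. Formula (71) is implemented as printed, i.e. the sum runs over all sets of failed nodes that contain the subnetwork.

```python
from itertools import combinations
from math import prod


def subsets(nodes):
    """All subsets of `nodes` (including the empty set) as frozensets."""
    return [frozenset(c) for k in range(len(nodes) + 1)
            for c in combinations(nodes, k)]


def breakdown_distribution(nodes, A, B):
    """pi(I) from (72); A and B are set functions with A(empty) = B(empty) = 1."""
    weights = {I: A(I) / B(I) for I in subsets(nodes)}
    total = sum(weights.values())      # 1 + sum over non-empty K of A(K)/B(K)
    return {I: w / total for I, w in weights.items()}


def joint_availability(pi, K):
    """Formula (71) as printed: 1 - sum of pi(I) over all I containing K."""
    K = frozenset(K)
    return 1.0 - sum(p for I, p in pi.items() if K <= I)


# Hypothetical product-type intensities (an assumption for this example).
ALPHA = {1: 0.1, 2: 0.2, 3: 0.05}
BETA = {1: 1.0, 2: 2.0, 3: 0.5}
NODES = sorted(ALPHA)
PI = breakdown_distribution(
    NODES,
    A=lambda I: prod(ALPHA[i] for i in I),
    B=lambda I: prod(BETA[i] for i in I),
)
print(PI[frozenset()], joint_availability(PI, {1}))
```

With product-type intensities the nodes fail and recover independently of each other, so the single-node availability reduces to $1/(1+\alpha_1/\beta_1)$, which gives a quick plausibility check of the implementation.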
Proposition 7.1 states that in case of breakdown and repair rates as given in Example 8(a) the mean value analysis for availability and performance measures can be performed separately. Furthermore, in this case the equilibrium distribution has the following remarkable product structure:
\[
\pi(\bar I; n_1,\dots,n_J) = \pi(\bar I)\,\hat\pi(n_1,\dots,n_J) \tag{76}
\]
for $\bar I\subset\bar J$, $\bar I\neq\emptyset$, $(n_1,\dots,n_J)\in E$, with $\hat\pi(n_1,\dots,n_J)$ as steady state distribution in the standard Jackson network.

As stated in Example 8(a) the marginal process $Y = (Y(t) : t\ge 0)$ which describes the breakdown/repair status of the nodes is a Markov process. Therefore, the result of Proposition 7.1(a) is not surprising, because $\pi(\bar I)$ is the steady state of the multidimensional birth–death process on $\{0,1\}^J$.

On the other hand $Y$ and the joint queue length network process $X$ are definitely dependent, because the transition mechanism of $X$ depends on the actual state of $Y$. Therefore, the result of Proposition 7.1(b) is less intuitive. In fact this result does not hold in case of more general system dynamics, as can be seen from the next Proposition, which says that in case of general breakdown and repair intensities the average queue lengths at the nodes in the degradable networks can differ from the corresponding means in the standard networks.

7.2 Proposition Consider a stationary degradable Jackson network (Definition 4.1) with state-independent service rates $\mu_j$, $j\in\bar J$, and with nodes which can only break down when the server is idle (see Example 8(b)). For $j\in\bar J$ let $K_j := \bigl(1 - \frac{\eta_j}{\mu_j}\bigr)^{-1} > 1$.

(a) The stationary joint point availability at time $t\ge 0$ for the subnetwork $\bar K\subset\bar J$ is
\[
A(\bar K)(t) = 1 - \sum_{\bar I\supset\bar K}\pi(\bar I), \tag{77}
\]
where $\pi(\bar I)$ is the probability that exactly the nodes in set $\bar I\subset\bar J$ are under repair, given by
\[
\pi(\bar I) = \frac{\dfrac{A(\bar I)}{B(\bar I)}}{\prod_{j\in\bar I} K_j + \sum_{\bar H\subset\bar J,\ \bar H\neq\emptyset}\dfrac{A(\bar H)}{B(\bar H)}\,\dfrac{\prod_{j\in\bar J\setminus\bar H} K_j}{\prod_{j\in\bar J\setminus\bar I} K_j}}. \tag{78}
\]
(b) The average queue length $E(X_k)$ for $k\in\bar J$ is given by
\[
E(X_k) = E(\hat X_k)\,\frac{1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\dfrac{A(\bar K)}{B(\bar K)}\,1_{\bar J\setminus\bar K}(k)\Bigl(\prod_{j\in\bar K} K_j\Bigr)^{-1}}{1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\dfrac{A(\bar K)}{B(\bar K)}\Bigl(\prod_{j\in\bar K} K_j\Bigr)^{-1}} \tag{79}
\]
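with indicator function $1_{\bar H}(k)$ equal to 1 if $k\in\bar H$ and equal to 0 otherwise.

Proof: (a) Note that the normalization constant $K(J)$ for the steady state distribution of the standard Jackson network can be written in the form $K(J) = \prod_{j\in\bar J} K_j$, implying:
\[
\begin{aligned}
\pi(\bar I) &= \sum_{(n_1,\dots,n_J)\in E}\pi(\bar I; n_1,\dots,n_J) \\
&= C(J)^{-1}\,\frac{A(\bar I)}{B(\bar I)}\sum_{(n_1,\dots,n_J)\in E}\ \prod_{i\in\bar I}\delta_{0n_i}\prod_{j=1}^{J}\Bigl(\frac{\eta_j}{\mu_j}\Bigr)^{n_j} \\
&= C(J)^{-1}\,\frac{A(\bar I)}{B(\bar I)}\sum_{(n_i : i\in\bar J\setminus\bar I)\in\mathbb N^{|\bar J\setminus\bar I|}}\ \prod_{j\in\bar J\setminus\bar I}\Bigl(\frac{\eta_j}{\mu_j}\Bigr)^{n_j} \\
&= \frac{\dfrac{A(\bar I)}{B(\bar I)}\prod_{j\in\bar J\setminus\bar I} K_j}{\prod_{j\in\bar J} K_j + \sum_{\bar H\subset\bar J,\ \bar H\neq\emptyset}\dfrac{A(\bar H)}{B(\bar H)}\prod_{i\in\bar J\setminus\bar H} K_i} \\
&= \frac{\dfrac{A(\bar I)}{B(\bar I)}}{\prod_{j\in\bar I} K_j + \sum_{\bar H\subset\bar J,\ \bar H\neq\emptyset}\dfrac{A(\bar H)}{B(\bar H)}\,\dfrac{\prod_{j\in\bar J\setminus\bar H} K_j}{\prod_{j\in\bar J\setminus\bar I} K_j}}.
\end{aligned}
\tag{80}
\]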
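(b)
\[
\begin{aligned}
E(X_k) &= \frac{1}{C(J)}\sum_{(n_1,\dots,n_J)\in E} n_k\cdot\prod_{j=1}^{J}\Bigl(\frac{\eta_j}{\mu_j}\Bigr)^{n_j}\Bigl[1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\prod_{i\in\bar K}\delta_{0n_i}\Bigr] \\
&= \frac{1}{C(J)}\Bigl[K(J)\,E(\hat X_k) + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\,1_{\bar J\setminus\bar K}(k)\underbrace{\sum_{(n_i : i\in\bar J\setminus\bar K)\in\mathbb N^{|\bar J\setminus\bar K|}} n_k\prod_{j\in\bar J\setminus\bar K}\Bigl(\frac{\eta_j}{\mu_j}\Bigr)^{n_j}}_{=\,E(\hat X_k)\,\prod_{j\in\bar J\setminus\bar K} K_j}\Bigr] \\
&= \frac{K(J)}{C(J)}\,E(\hat X_k)\Bigl[1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\,1_{\bar J\setminus\bar K}(k)\Bigl(\prod_{j\in\bar K} K_j\Bigr)^{-1}\Bigr] \\
&= E(\hat X_k)\,\frac{1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\dfrac{A(\bar K)}{B(\bar K)}\,1_{\bar J\setminus\bar K}(k)\Bigl(\prod_{j\in\bar K} K_j\Bigr)^{-1}}{1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\dfrac{A(\bar K)}{B(\bar K)}\Bigl(\prod_{j\in\bar K} K_j\Bigr)^{-1}} < E(\hat X_k),
\end{aligned}
\tag{81}
\]
where the evaluation of the inner sums follows by standard summations.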
•
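From 7.2(a) it is obtained that the probability $\pi(\bar J)$ for complete down status, respectively the probability $\pi(\emptyset)$ for complete up status of the network, is smaller, respectively larger, than in case of Proposition 7.1:
\[
\pi(\bar J) = \frac{\dfrac{A(\bar J)}{B(\bar J)}}{K(J) + \sum_{\bar H\subset\bar J,\ \bar H\neq\emptyset}\dfrac{A(\bar H)}{B(\bar H)}\prod_{j\in\bar J\setminus\bar H} K_j}
< \frac{\dfrac{A(\bar J)}{B(\bar J)}}{1 + \sum_{\bar H\subset\bar J,\ \bar H\neq\emptyset}\dfrac{A(\bar H)}{B(\bar H)}} \tag{82}
\]
\[
\pi(\emptyset) = \frac{1}{1 + \sum_{\bar H\subset\bar J,\ \bar H\neq\emptyset}\dfrac{A(\bar H)}{B(\bar H)}\,\dfrac{1}{\prod_{j\in\bar H} K_j}}
> \frac{1}{1 + \sum_{\bar H\subset\bar J,\ \bar H\neq\emptyset}\dfrac{A(\bar H)}{B(\bar H)}} \tag{83}
\]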
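The comparison in (82) and (83) can be reproduced numerically. The sketch below is an illustration under assumptions: the two-node example, the ratio function `ratio(I)` standing for $A(\bar I)/B(\bar I)$, the loads, and the function names are hypothetical. Formula (78) is used in the equivalent form obtained by multiplying numerator and denominator by $\prod_{j\in\bar J\setminus\bar I} K_j$, so that $\pi(\bar I)$ is proportional to $\frac{A(\bar I)}{B(\bar I)}\prod_{j\in\bar J\setminus\bar I} K_j$.

```python
from itertools import combinations
from math import prod


def subsets(nodes):
    """All subsets of `nodes` (including the empty set) as frozensets."""
    return [frozenset(c) for k in range(len(nodes) + 1)
            for c in combinations(nodes, k)]


def pi_load_independent(nodes, ratio):
    """(72): pi(I) proportional to A(I)/B(I)."""
    w = {I: ratio(I) for I in subsets(nodes)}
    total = sum(w.values())
    return {I: x / total for I, x in w.items()}


def pi_idle_breakdown(nodes, ratio, K):
    """(78) rewritten: pi(I) proportional to A(I)/B(I) * prod_{j not in I} K_j."""
    w = {I: ratio(I) * prod(K[j] for j in nodes if j not in I)
         for I in subsets(nodes)}
    total = sum(w.values())
    return {I: x / total for I, x in w.items()}


NODES = [1, 2]
LOAD = {1: 0.5, 2: 0.8}                       # hypothetical eta_j / mu_j
K = {j: 1.0 / (1.0 - rho) for j, rho in LOAD.items()}


def ratio(I):
    """Hypothetical A(I)/B(I): 0.1 per failed node."""
    return 0.1 ** len(I)


P1 = pi_load_independent(NODES, ratio)        # setting of Proposition 7.1
P2 = pi_idle_breakdown(NODES, ratio, K)       # setting of Proposition 7.2
print(P1[frozenset()], P2[frozenset()])
```

In this example the probability of the complete up status is larger and the probability of the complete down status is smaller when breakdowns are restricted to idle servers, in line with (82) and (83).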
This result appears to be plausible since in the setting of Proposition 7.2 a breakdown only occurs in the special situation when the server is idle. In case of Proposition 7.2 the average queue lengths are smaller than in case of Proposition 7.1, which follows intuitively from breakdowns being only possible in the idle state, which will increase the sojourn time for this state. The most important performance measure is possibly the steady state throughput. For open networks without performance reductions due to breakdowns this is simply λ, see Definition 2.1. By simple flow balancing arguments we can derive throughput results for the general case of open degradable networks as well.
7.3 Proposition The throughput, denoted $TH$, in degradable open networks is the effective external arrival rate.

(a) For networks with rerouting under terms of stalling
\[
TH(\text{st}) = \lambda\cdot\pi(\emptyset), \tag{84}
\]
(b) For networks with rerouting under terms of rs–rd
\[
TH(\text{rs-rd}) = \sum_{\bar I\subset\bar J}\pi(\bar I)\cdot\sum_{j\in\bar J\setminus\bar I}\lambda_j, \tag{85}
\]
(c) For networks with rerouting under terms of skipping
\[
TH(\text{sk}) = \sum_{\bar I\subset\bar J}\pi(\bar I)\Bigl[\lambda - \sum_{i\in\bar I}\lambda_i\,\hat r^{\bar I}(i,0)\Bigr], \tag{86}
\]
where $\pi(\bar I)$ is the probability that exactly the nodes in set $\bar I\subset\bar J$ are under repair.

It is intuitively clear that the following relations hold:
\[
TH(\text{st}) < TH(\text{rs-rd}) < TH(\text{sk}) < \lambda. \tag{87}
\]
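The three throughput formulas (84) to (86) can be evaluated side by side once $\pi(\bar I)$ and the routing matrix are known. The following sketch makes several assumptions for illustration: a hypothetical two-node open network with routing matrix `R` (index 0 is the outside world), arrival rates `LAM`, and a load-independent breakdown distribution with $A(\bar I)/B(\bar I) = 0.1^{|\bar I|}$. The exit probabilities $\hat r^{\bar I}(i,0)$ are obtained by simple fixed-point iteration of (61), which is a convenience choice here and not a method prescribed by the text.

```python
from itertools import combinations


def subsets(nodes):
    """All subsets of `nodes` (including the empty set) as frozensets."""
    return [frozenset(c) for k in range(len(nodes) + 1)
            for c in combinations(nodes, k)]


def skip_exit_prob(R, down, iters=500):
    """r^I(i, 0) for i in `down`, by fixed-point iteration of (61) with k = 0."""
    x = {i: 0.0 for i in down}
    for _ in range(iters):
        x = {i: R[i][0] + sum(R[i][l] * x[l] for l in down) for i in down}
    return x


def throughputs(R, lam, pi):
    """Return (TH_st, TH_rs_rd, TH_sk) following (84), (85) and (86)."""
    total = sum(lam.values())
    th_st = total * pi[frozenset()]
    th_rsrd = sum(p * sum(l for j, l in lam.items() if j not in I)
                  for I, p in pi.items())
    th_sk = 0.0
    for I, p in pi.items():
        x = skip_exit_prob(R, I)
        th_sk += p * (total - sum(lam[i] * x[i] for i in I))
    return th_st, th_rsrd, th_sk


# Hypothetical example data (assumptions, not from the paper).
R = [[0.0, 0.5, 0.5],
     [0.2, 0.0, 0.8],
     [0.6, 0.4, 0.0]]
LAM = {1: 1.0, 2: 1.0}                      # lambda_j = lambda * r(0, j), lambda = 2
W = {I: 0.1 ** len(I) for I in subsets([1, 2])}
PI = {I: w / sum(W.values()) for I, w in W.items()}

TH_ST, TH_RSRD, TH_SK = throughputs(R, LAM, PI)
print(TH_ST, TH_RSRD, TH_SK)
```

For this example the computed values respect the ordering (87): stalling loses all arrivals during any breakdown, rs–rd loses only arrivals directed to failed nodes, and skipping loses only those rerouted arrivals that leave the network immediately.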
For computation of throughput in closed networks of exponential servers there are different algorithms available. The next proposition shows that under some additional assumptions these algorithms can also be applied in our framework.

7.4 Proposition The throughput $TH(j)$ at node $j\in\bar J$ in degradable closed networks is given by:
\[
TH(j) = \sum_{\bar I\subset\bar J,\ j\notin\bar I}\ \sum_{(n_1,\dots,n_J)\in S(J,D)}\pi(\bar I; n_1,\dots,n_j,\dots,n_J)\cdot\mu_j(n_j), \tag{88}
\]
implying that for breakdown and repair intensities which are independent of the queue lengths and only depend on the set of nodes concerned (see Example 8(a))
\[
TH(j) = \eta_j\,\frac{G(J,D-1)}{G(J,D)}\sum_{\bar I\subset\bar J,\ j\notin\bar I}\pi(\bar I). \tag{89}
\]
Proof: We have to show only (89).
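\[
\begin{aligned}
TH(j) &= \sum_{\bar I\subset\bar J,\ j\notin\bar I}\frac{A(\bar I)}{B(\bar I)}\sum_{(n_1,\dots,n_J)\in S(J,D)}\frac{\mu_j(n_j)}{C(J,D)}\prod_{k=1}^{J}\prod_{i=1}^{n_k}\frac{\eta_k}{\mu_k(i)} \\
&= \sum_{\bar I\subset\bar J,\ j\notin\bar I}\frac{A(\bar I)}{B(\bar I)}\Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)^{-1}\eta_j\,\frac{\sum_{(n_1,\dots,n_J)\in S(J,D-1)}\prod_{k=1}^{J}\prod_{i=1}^{n_k}\frac{\eta_k}{\mu_k(i)}}{\sum_{(n_1,\dots,n_J)\in S(J,D)}\prod_{k=1}^{J}\prod_{i=1}^{n_k}\frac{\eta_k}{\mu_k(i)}} \\
&= \eta_j\sum_{\bar I\subset\bar J,\ j\notin\bar I}\frac{A(\bar I)}{B(\bar I)}\Bigl(1 + \sum_{\bar K\subset\bar J,\ \bar K\neq\emptyset}\frac{A(\bar K)}{B(\bar K)}\Bigr)^{-1}\frac{G(J,D-1)}{G(J,D)} \\
&= \eta_j\,\frac{G(J,D-1)}{G(J,D)}\sum_{\bar I\subset\bar J,\ j\notin\bar I}\pi(\bar I).
\end{aligned}
\]
•

Note that the expression
\[
\eta_j\,\frac{G(J,D-1)}{G(J,D)} \tag{90}
\]
is the throughput at node j in a Gordon–Newell network without breakdowns according to Definition 2.4.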
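One standard way to obtain the normalization constants $G(J,d)$ for state-independent service rates is Buzen's convolution algorithm. The text only states that different algorithms are available, so the choice of this algorithm, the function names, and the example numbers below are assumptions made for illustration. The sketch computes $G(J,d)$ for $d = 0,\dots,D$ and then evaluates (89), where `up_prob_j` stands for $\sum_{\bar I\subset\bar J,\ j\notin\bar I}\pi(\bar I)$ and has to be supplied separately.

```python
def buzen_G(x, D):
    """G(J, d) for d = 0..D with state-independent rates, x[j] = eta_j / mu_j.

    Uses Buzen's convolution recursion g_j(d) = g_{j-1}(d) + x_j * g_j(d - 1).
    """
    g = [1.0] + [0.0] * D
    for xj in x:
        for d in range(1, D + 1):
            g[d] += xj * g[d - 1]
    return g


def closed_throughput(eta, mu, D, j, up_prob_j):
    """(89): eta_j * G(J, D-1) / G(J, D) * (sum of pi(I) over I not containing j)."""
    g = buzen_G([e / m for e, m in zip(eta, mu)], D)
    return eta[j] * g[D - 1] / g[D] * up_prob_j


# Hypothetical two-node cyclic network with D = 3 customers and unit rates.
G = buzen_G([1.0, 1.0], 3)
TH_NO_BREAKDOWN = closed_throughput([1.0, 1.0], [1.0, 1.0], 3, 0, 1.0)
TH_DEGRADABLE = closed_throughput([1.0, 1.0], [1.0, 1.0], 3, 0, 0.9)
print(G, TH_NO_BREAKDOWN, TH_DEGRADABLE)
```

In the symmetric two-node example all four population vectors are equally likely, node 0 is busy in three of them, and the throughput without breakdowns is therefore 3/4; the degradable value simply scales this by the probability that node 0 is up.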
8
Conclusions
The survey in [11] which we sketched in the introduction elucidates two alternative principles for dealing with the hard problem of investigating simultaneously availability (reliability) and performance aspects in a unified Markovian system process with the aim to obtain explicit and simple-to-use formulas for both features.
• Define a detailed Markov model and then use an approximate solution procedure, for example decomposition methods.
• Find a simple Markov model with explicitly computable steady state distribution.
Important examples for the first method are the described Poisson approximation, the MMPP approximation, and the joint–state approximation, while the reduced work–rate approximation is a prototype for the second method. The advantage of the class of models constructed according to the second principle is that, once the modeling procedure has been performed, no additional ad hoc assumptions for further approximations are needed during the evaluation process.
But in case of the often used reduced work–rate approximation there is a severe drawback: in the modeling process we lose the breakdown events, respectively the reliability status of the network, from the set of observables. To be more precise: in this analytically tractable model for performance analysis the aspects of reliability cannot be considered explicitly.

All the models presented in our paper belong to the class of investigations guided by the second principle, and the way we proceeded with the respective analysis proved to be successful. The advantage of these modeling features is obviously that both aspects, reliability and performance behavior, can be treated explicitly, either separately or in parallel, in the same model. As we have shown by the prototype examples in Section 7, in many cases, which encompass rather general parameter structures of the systems, we are able to decouple mean value analysis concerning availability and classical performance measures of queueing theory. It is obvious how to generalize the computations to more general networks. Clearly this will be increasingly complex but always straightforward.

Some comments on the rerouting principles used may be in order here. Stalling is a common practice, e.g. in production control. If some machine breaks down and must be repaired, the production process is interrupted. Even in some big automobile firms the workers on the assembly line are able to stop the whole production process if something goes wrong and/or the products are of low quality. The blocking principle Repetitive Service – Random Destination (rs–rd) is a well established approximation for some common communication protocols in systems with finite buffers or for ALOHA-type protocols. Skipping can be considered as a versatile rerouting protocol in networks with side constraints. In our opinion it is a promising feature in the realm of queueing network theory.
From the point of view of reliability we decided to describe the nodes' behavior by using binary variables, up or down. Clearly, a more elaborate modeling in the sense of using multistate systems [1][Sect. 2.1.2] to describe locally different modes of degradation of the nodes is desirable. In [35] an approximate product form expression of the steady state for a class of separable queueing networks, in which the servers can operate on different levels of degradation, is derived using the observed property of near-complete decomposability and the consequences thereof. This is a topic of our ongoing research.

In this paper we focussed on exponential Jackson and Gordon-Newell networks. We expect that most of the results of the previous sections will carry over (with additional notational burden) to the case of mixed and non-exponential networks as well. In [38] some preliminary results are developed concerning BCMP networks [7], which allow different classes of customers and new service disciplines while the stationary distribution retains its product form. Further specifications of the breakdown and repair intensities and further routing protocols in case of breakdowns which yield (modified) product form steady states for networks with unreliable servers are subject of ongoing research.
References
[1] Aven, T., Jensen, U. (1999) Stochastic Models in Reliability. Applications of Mathematics Vol. 41, Springer, New York.
[2] Avi-Itzhak, B. and Naor, P. (1963) Some queuing problems with the server station subject to breakdown. Operations Research, 11, 303-320.
[3] Bär, M., Fischer, K. and Hertel, G. (1988) Leistungsfähigkeit, Qualität, Zuverlässigkeit. transpress VEB Verlag für Verkehrswesen, Berlin.
[4] Balsamo, S., De Nitto Persone, V. and Onvural, R. (2001) Analysis of Queueing Networks with Blocking. Kluwer Academic Publishers.
[5] Barlow, R., Proschan, F. (1965) Mathematical Theory of Reliability. Wiley, New York.
[6] Barlow, R., Proschan, F. (1975) Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York.
[7] Baskett, F., Chandy, M., Muntz, R. and Palacios, F.G. (1975) Open, closed and mixed networks of queues with different classes of customers. Journal of the Association for Computing Machinery, 22, 248-260.
[8] Beichelt, F. (1993) Zuverlässigkeits- und Instandhaltungstheorie. Teubner, Stuttgart.
[9] Birolini, A. (1994) Quality and Reliability of Technical Systems. Springer, Berlin.
[10] Bolch, G., Greiner, S., Meer, H. de and Trivedi, K.S. (1998) Queueing networks and Markov chains. John Wiley, New York.
[11] Chakka, R. and Mitrani, I. (1996) Approximate Solutions for Open Networks with Breakdowns and Repairs. In F. P. Kelly, S. Zachary and I. Ziedins, editors, Stochastic Networks - Theory and Applications, Chapter 16, 267-280, Clarendon Press, Oxford.
[12] Daduna, H. (2001) Stochastic networks with product form equilibrium. In D. N. Shanbhag and C. R. Rao, editors, Stochastic Processes: Theory and Methods, Volume 19 of Handbook of Statistics, 11, 309-364.
[13] Daduna, H. and Szekli, R. (1996) A queueing theoretical proof of increasing property of Polya frequency functions. Statistics and Probability Letters, 26, 233-242.
[14] Dijk, N.M. Van (1993) Queueing Networks and Product Forms - A Systems Approach. Wiley, Chichester.
[15] Doshi, B.T. (1990) Single server queues with vacations. In H. Takagi, editor, Stochastic Analysis of Computer and Communications Systems, 217-267, North-Holland, Amsterdam.
[16] Economou, A. and Fakinos, D. (1998) Product form stationary distributions for queueing networks with blocking and rerouting. Queueing Systems and Their Applications, 30, 251-260.
[17] Fischer, W. and Meier-Hellstern, K. (1993) The Markov-modulated Poisson process (MMPP) cookbook. Performance Evaluation, 18, 149-171.
[18] German, R., Luethi, J. and Telek, M. (2001), editors. Proceedings of the Fifth International Workshop on Performability Modeling of Computer and Communication Systems, September 15-16, 2001, Erlangen.
[19] Gordon, W.J. and Newell, G.F. (1967) Closed queueing systems with exponential servers. Operations Research, 15, 254-265.
[20] Haverkort, B.R., Marie, R., Rubino, G. and Trivedi, K. (2001), editors, Performability Modelling - Techniques and Tools. Wiley.
[21] Jackson, J.R. (1957) Networks of waiting lines. Operations Research, 5, 518-521.
[22] Jackson, J.R. (1963) Jobshop-like queueing systems. Management Science, 10, 131-142.
[23] Keilson, J. (1962) Queues subject to service interruption. The Annals of Mathematical Statistics, 33 No. 4, 1314-1322.
[24] Kelly, F.P. (1979) Reversibility and stochastic networks. Wiley, New York.
[25] Krishnaiah, P.R., Rao, C.R. (1988), editors, Handbook of Statistics 7. Quality Control and Reliability. North-Holland, Amsterdam.
[26] Lee, H.W., Ahn, B.Y. and Park, N.I. (2001) Decompositions of the queue length distributions in the MAP/G/1 queue under multiple and single vacations with N-policy. Stochastic Models, 17, 157-190.
[27] Marshall, A.W. and Olkin, I. (1967) A multivariate exponential distribution. Journal of the American Statistical Association, 62, 30-44.
[28] Melamed, B. (1982) Sojourn times in queueing networks. Mathematics of Operations Research, 7, 223-244.
[29] Meyer, J.F. (1984) Performability modeling of distributed real-time systems. In G. Iazeolla, P.J. Courtois and A. Hordijk, editors, Mathematical Computer Performance and Reliability, 361-372, North-Holland, Amsterdam.
[30] Mikou, N. (1988) A two-node Jackson's network subject to breakdowns. Stochastic Models, 4, 523-552.
[31] Mikou, N., Idrissi-Kacimi, O. and Saadi, S. (1995) Two processes interacting only during breakdown: the case where the load is not lost. Queueing Systems, 19, 301-317.
[32] Mitrani, I. and Avi-Itzhak, B. (1968) A many-server queue with service interruptions. Operations Research, 16, 628-638.
[33] Mitrani, I. (1974) Networks of unreliable computers. In E. Gelenbe and R. Mahl, editors, Computer Architectures and Networks, 359-374, North-Holland, Amsterdam.
[34] Mitrani, I. and Wright, P.E. (1994) Routing in the presence of breakdowns. Performance Evaluation, 20, 151-164.
[35] Müller-Clostermann, B. (1988) An approximate product form for a class of degradable queueing networks. Performance Evaluation, 8, 165-172.
[36] Onvural, R.O. (1990) Closed queueing networks with blocking. In H. Takagi, editor, Stochastic Analysis of Computer and Communication Systems, 499-528, North-Holland, Amsterdam.
[37] Perros, H.G. (1990) Approximation algorithms for open queueing networks with blocking. In H. Takagi, editor, Stochastic Analysis of Computer and Communication Systems, 451-498, North-Holland, Amsterdam.
[38] Sauer, C. and Daduna, H. (2003) Separable networks with unreliable servers. In J. Charzinski, R. Lehnert and P. Tran-Gia, editors, Providing QoS in Heterogeneous Environments. Proceedings of the 18th ITC, Berlin. Vol. 5b of Teletraffic Science and Engineering, 821-830, Elsevier Science, Amsterdam.
[39] Schassberger, R. (1973) Warteschlangen. Springer, Wien.
[40] Schassberger, R. (1983) Decomposable stochastic networks: Some observations. In F. Baccelli and G. Fayolle, editors, Modelling and Performance Evaluation Methodology, Volume 60 of Lecture Notes in Control and Information Sciences, chapter IV, 137-150, Springer, Berlin, 1984. Proceedings of the International Seminar, Paris, France, January 24-26.
[41] Serfozo, R.F. (1993) Queueing networks with dependent nodes and concurrent movements. Queueing Systems, 13, 143-182.
[42] Serfozo, R.F. (1999) Introduction to Stochastic Networks. Applications of Mathematics 44, Springer, New York.
[43] Thiruvengadam, K. (1963) Queuing with breakdowns. Operations Research, 11, 62-71.
[44] Trivedi, K.S. and Malhotra, M. (1993) Reliability and performability techniques and tools: A survey. Preprint, Dept. of Electrical Engineering, Duke University, Durham.
[45] White, H. and Christie, L.S. (1958) Queuing with preemptive priorities or with breakdown. Operations Research, 6, 79-95.
Cornelia Sauer
Hans Daduna
University of Hamburg
Department of Mathematics
Center of Mathematical Statistics and Stochastic Processes