PERFORMANCE AND RELIABILITY MODELING USING MARKOV REGENERATIVE STOCHASTIC PETRI NETS

by

Hoon Choi
Department of Computer Science
Duke University

Date:
Approved:

Dr. K. S. Trivedi, Supervisor
Dr. S. Chowdhury
Dr. M. A. Holliday
Dr. O. C. Ibe
Dr. N. C. Strole

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

1993
Copyright 1993 by Hoon Choi. All rights reserved.
ABSTRACT
(Computer Science)

PERFORMANCE AND RELIABILITY MODELING USING MARKOV REGENERATIVE STOCHASTIC PETRI NETS

by

Hoon Choi
Department of Computer Science
Duke University

Date:
Approved:

Dr. K. S. Trivedi, Supervisor
Dr. S. Chowdhury
Dr. M. A. Holliday
Dr. O. C. Ibe
Dr. N. C. Strole

An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

1993
Abstract

Stochastic timed Petri nets are recognized as useful modeling tools for analyzing the performance and reliability of systems. However, one of the problems in using these Petri nets is that the distributions of event times are usually restricted in order to solve the model analytically and numerically rather than by simulation. We introduce a new class of Petri nets called Markov Regenerative Stochastic Petri Nets (MRSPNs) to relieve this restriction. MRSPNs are stochastic timed Petri nets whose underlying stochastic processes are Markov regenerative processes. MRSPNs allow generally distributed timed transitions as well as immediate transitions and exponentially distributed timed transitions. Under the condition that at most one generally distributed timed transition is enabled in each marking, an MRSPN can be solved analytically. We provide a transient (time-dependent) analysis method and a steady state analysis method for this new class of Petri nets. As a specific example of general distributions, we provide the equations for an MRSPN in which the general distributions are restricted to be deterministic; this class of MRSPNs is called the DSPN. We investigate the sensitivity of DSPN models by studying the variation of output performance measures with respect to changes in a system parameter, and we give an algorithm for computing the sensitivity functions in terms of the steady state probabilities of the DSPN. Another problem with using stochastic timed Petri nets is that the size of the state space of the underlying stochastic process tends to be large. To overcome this largeness problem, we propose two approximation models: we approximate client-server systems with the folding method and polling systems with the fixed point iteration method. Several performance measures, such as the mean cycle time, the mean response time, and throughputs, are obtained. As a study of reliability analysis, we introduce a new type of mean time to failure (MTTF), named the conditional MTTF, and provide a method for computing it in a continuous-time Markov chain model of a system.
Acknowledgements

I would like to express my sincere appreciation to my advisor, Dr. Kishor S. Trivedi, for his constant support and encouragement during the course of this work. I am deeply grateful to him. I have always been astonished at his extensive knowledge, his exhaustive energy in pursuing research, and his endless new ideas. I also thank Dr. Vidyadhar G. Kulkarni of the University of North Carolina at Chapel Hill for his help in bringing this research to completion. I wish to thank the other members of my committee, Dr. Shyamal Chowdhury, Dr. Oliver C. Ibe, Dr. Mark A. Holliday and Dr. Norman C. Strole, for their kind advice and helpful suggestions. I also would like to thank my fellow graduate students Vivek Khera, Keehang Kwon, Dimitris Logothetis, Varsha Mainkar, Manish Malhotra, Apratim Purakayastha, Vijay Srinivasan, Lorrie Tomek, Wei Wang, Chang-Yu Wang and Steven P. Woolet for helpful discussions during my study. Special thanks to Drs. Phil F. Chimento, Gianfranco Ciardo, Jogesh Muppala and Anapathur Ramesh, who kindly answered all the questions I had; sharing a learning experience with them was a real pleasure. I am grateful to Patrick Harubin, who educated me about American life and culture, and for his careful review of this dissertation. I must express my thanks to the Electronics and Telecommunications Research Institute, Korea, for giving me the opportunity to study at Duke, and to IBM for awarding me the graduate fellowship which enabled me to concentrate on my research. Above all, this work is dedicated to my family with great love. My parents never lost trust in me and always prayed to God for me. My lovely wife Hyewon, son Youngwu, and daughter Heejoo endured all the difficulties to support my study. All of this made this work possible.
Contents

Abstract
Acknowledgements
List of Figures
List of Tables

1 Introduction
  1.1 Modeling and Analysis
  1.2 Contributions and Organization

2 Background
  2.1 Stochastic Processes
    2.1.1 Markov Processes
    2.1.2 Semi-Markov Processes
    2.1.3 Markov Regenerative Processes
    2.1.4 Markov Reward Processes
  2.2 Stochastic Timed Petri Nets
    2.2.1 Petri Nets
    2.2.2 Stochastic Timed Petri Nets with Exponentially Distributed Firing Times
    2.2.3 Stochastic Timed Petri Nets with Generally Distributed Firing Times
  2.3 Relation of Markov Chains with Stochastic Timed Petri Nets

3 Markov Regenerative Stochastic Petri Nets
  3.1 Introduction
  3.2 Definition
  3.3 Transient Analysis of MRSPNs
    3.3.1 Basic Equations for the Transient Behavior
    3.3.2 Transient Analysis in the Transform Domain
  3.4 Steady State Analysis of MRSPNs
  3.5 Example

4 Modeling of Uniformly Distributed Firing Times
  4.1 Uniform Distribution
  4.2 Analysis of MRSPN for Uniform Distribution

5 Modeling of Deterministic Firing Times
  5.1 Deterministic and Stochastic Petri Nets (DSPNs)
  5.2 Transient Analysis of DSPNs
  5.3 Transient Analysis in the Transform Domain
  5.4 Steady State Analysis of DSPNs

6 Sensitivity Analysis of Deterministic and Stochastic Petri Nets
  6.1 Parametric Sensitivity Analysis
  6.2 Notation
  6.3 Sensitivity of DSPNs
    6.3.1 Sensitivity with respect to Firing Rates or Branching Probabilities
    6.3.2 Sensitivity with respect to Deterministic Firing Times
  6.4 Example: Optimization of a Queue with Vacation

7 Performance Analysis using MRSPNs
  7.1 Client-Server Systems
    7.1.1 Model Definition
    7.1.2 Token Ring Network-Based Client-Server System
    7.1.3 CSMA/CD Network-Based Client-Server System
    7.1.4 Numerical Results
    7.1.5 Sensitivity Analysis
    7.1.6 Accuracy of the Superclient Approximation
    7.1.7 Summary
  7.2 Polling Systems
    7.2.1 Stochastic Reward Net Models of Polling Systems
    7.2.2 Fixed Point Iteration Method
    7.2.3 Approximation of Polling System Performance
    7.2.4 Existence of a Fixed Point
    7.2.5 Summary

8 Reliability Analysis using MRSPNs
  8.1 Introduction
  8.2 Definition of the Conditional Mean Time To Failure
  8.3 Computation of the Conditional MTTF
    8.3.1 Method
    8.3.2 Time Complexity of the Method
  8.4 The Second Moment of The Time To Failure
  8.5 The Cumulative Conditional MTTF
  8.6 Examples
    8.6.1 A Communication Network
    8.6.2 Fault-Tolerant System Architectures

9 Conclusion
  9.1 Summary of the Dissertation
  9.2 Future Research

Bibliography
List of Figures

2.1 A Sample Path of MRGP
2.2 An Example of a Petri net
2.3 SRN Model of 1 Client, 1 Server System
2.4 Reachability Graph of the Example SRN
2.5 CTMC Underlying the Example SRN
2.6 Infinitesimal Generator Matrix of the CTMC
3.1 Examples of MRSPNs and Their Markings
3.2 The Relation between the Reachability Graph and the Reduced Reachability Graph
3.3 MRSPN and Reachability Graphs of M/G/1/2/2 System
3.4 Transient and Steady State Probabilities of M/G/1/2/2 System
4.1 Distribution Function of a Uniformly Distributed Random Variable X ~ UNI(a, b)
6.1 Computation of the Sensitivity Function
6.2 (a) DSPN of the Vacation Queue with Deterministic Service and Vacation Times (b) State Transition Diagram
6.3 Sensitivity Function of pi_1 with respect to h
7.1 A Possible Configuration of Network Relative to Tagged Client
7.2 SRN for Tagged Client Subsystem in Token Ring Network
7.3 SRN for the Superclient Subsystem in Token Ring Network
7.4 SRN for the Server Subsystem in Token Ring Network
7.5 SRN for the Complete System in Token Ring Network (N > 1)
7.6 SRN for Tagged Client Subsystem in CSMA/CD Network
7.7 SRN for the Superclient Subsystem in CSMA/CD Network
7.8 SRN for the Server Subsystem in CSMA/CD Network
7.9 SRN for the CSMA/CD Network-Based System
7.10 Mean Response Times: 5 Client Stations, 4 Kbytes Reply Packet
7.11 Mean Response Times: 5 Client Stations, 8 Kbytes Reply Packet
7.12 Throughput of the Server: 5 Client Stations, 4 Kbytes Reply Packet
7.13 Average Number of Messages at the Server: 5 Client Stations, 4 Kbytes Reply Packet
7.14 Parametric Sensitivity Analysis of CSMA/CD Network-Based System
7.15 Exact SRN Model for the Token Ring Network-Based System with 5 Client Stations
7.16 SRN Model of 5 Nodes, Finite Population, Single Service Polling System
7.17 Approximate Model of a Polling System: Si(N)
7.18 SRN Model for Computing the Mean Delay: Ti(k)
7.19 Algorithm for the Mean Response Time Computation
8.1 A Simple CTMC Model of a System Reliability
8.2 A CTMC with m Absorbing States
8.3 CTMC of the Reconfigurable Duplex System
8.4 Relations between CMTTF_A, MTTF_A, and MTTF_B
8.5 The Modified CTMC of the Reconfigurable Duplex System
8.6 A Markov Model Q and Its Modified CTMC Q'
8.7 System Configuration of an Example Network
8.8 Markov Chain Model of a Network System Reliability
8.9 Architectures of Fault-Tolerant Systems
8.10 MTBHE with respect to CS and CD
List of Tables

7.1 Mean Response Time (Mean Reply Packet Length: 4 Kbytes)
7.2 Mean Response Time (Mean Reply Packet Length: 8 Kbytes)
7.3 State Space and Storage Requirements
7.4 State Space and Storage Requirements
7.5 Mean Response Time When Mean Reply Packet Length is 4 Kbytes
7.6 State Space and Storage Requirements, M = 3
7.7 State Space and Storage Requirements, N = 6
7.8 Execution Time of One-Level Models (in sec)
7.9 State Spaces and Execution Time of Models, N = 5, M = 3
7.10 Mean Response Time at Node 1, M = 3
7.11 Mean Response Time at Node 1, M = 1
8.1 Dependability Measures for the Three Architectures
Chapter 1
Introduction

1.1 Modeling and Analysis

System modeling is a process which abstracts a real-world system into a set of parameters that characterizes the system. With minimal effort and cost, the dependability or performance of the system can then be evaluated analytically over a wide range of system parameters and configurations. This enables the system designer to estimate the behavior of the system at the design stage rather than at the implementation stage. It may also be useful for a customer comparing existing systems: the most favorable product can be selected by evaluating the reliability or performance model of each alternative. Ever since the advent of electronic systems, their modeling and analysis have been a matter of interest and have been carried out extensively. The fast growth of computer applications has led to the development of computer-communication networking, and thus the modeling of network systems such as Local Area Networks (LANs) and the Integrated Services Digital Network (ISDN) has become important.

Techniques for evaluating system dependability and performance include measurement, simulation, and analytic modeling. Measurement is believed to be the most accurate method, but it is only possible after the system has been implemented, and it tends to be expensive. Discrete-event simulation is possible even at the design stage, but it also tends to be expensive, since a large amount of computation time may be needed to obtain statistically significant results. Analytic modeling is an attractive alternative to measurement or simulation, particularly if many different designs need to be evaluated. Analytic models include product-form
queuing network models [52] and (semi-)Markov models [57, 111]. In [68], an approximate performance analysis of an Ethernet-based client-server system is given using product-form models. The analysis uses a network of queues which is parameterized by the results of measurement experiments. However, important interdependencies among system components (e.g., synchronization, blocking, multiple resource possession, etc.) are generally ignored when using product-form queuing network models.

Modern computer-communication systems have various functions and consist of many components. The analysis of these systems is generally difficult because of the various kinds of dependencies between the components and their complex operations. For instance, in a local area network, request A from station i can receive service later than another request B from another station j, even though A is generated before B. This is due to the randomness of the process by which stations gain the right of access to the network, and to the effect of queueing of generated messages. Since various kinds of dependencies in a system can be well captured by Markov models, and since stochastic timed Petri nets [2, 4, 81, 84, 118] provide a convenient means of specification and of automated generation and solution of large Markov models, we use Markov models described in the form of stochastic timed Petri nets to model and evaluate computer-communication systems. Modeling a system is done more easily and efficiently by using a stochastic timed Petri net than by using a Markov model directly: the complex behavior of the system can be concisely represented, and the stochastic timed Petri net model of a system is easier to understand than the Markov model or a probabilistic analysis model.

This approach, however, also has several intrinsic problems. First, the number of states of the Markov model increases very rapidly as the model size increases. Converting a stochastic timed Petri net model into its Markov model manually is almost impossible for a large model, and even with an automated tool, converting and solving the underlying Markov model usually takes a large memory space and a long execution time. Second, the distributions of event times are usually restricted, i.e., they must be all exponential [4, 81] or exponential plus deterministic (constant) [2], in order to use stochastic timed Petri nets and solve the nets analytically and numerically. Due to this limitation, event times that are not exponentially distributed are often assumed to be exponential, or they are sometimes approximated by a phase-type expansion with exponential distributions [11, 35, 85].
For example, a deterministic event time may be approximated by using a series of exponential stages (e.g., 10 of them). However, these approximations not only degrade the accuracy but also result in an explosion of the model size.
1.2 Contributions and Organization

In order to model generally distributed event times, which occur in many practical problems, we suggest a new class of stochastic timed Petri nets called Markov Regenerative Stochastic Petri Nets (MRSPNs). After a brief introduction of some background information in Chapter 2, we give a definition, a transient (time-dependent) analysis method, and a steady state analysis method for this new class of Petri nets in Chapter 3. We show that the underlying stochastic process of an MRSPN is a Markov regenerative process, and we give a sufficient condition for stochastic timed Petri nets to be MRSPNs. For the class of MRSPNs which satisfies this condition, we derive the kernel distributions of the underlying Markov regenerative process, the equations for the steady state behavior, and the equations for the transient behavior. In Chapter 4, we show the simplified equations of the method for the special case of uniformly distributed event times. In Chapter 5, we apply the method to a special case of MRSPNs called Deterministic and Stochastic Petri Nets [2], in which the general distributions of MRSPNs are restricted to be deterministic. We investigate the sensitivity of Deterministic and Stochastic Petri Net models in Chapter 6, where the variation of the output performance measures with respect to changes in a chosen parameter of the model is studied.

In order to overcome the largeness problem, approximation methods which simplify the overall model and yet provide reasonable accuracy must be developed. In Chapter 7, we propose two approximation techniques. We first show the folding method using examples of client-server systems. We model various kinds of dependencies that can occur in the request arrival process at the server using Stochastic Reward Nets (SRNs), a class of stochastic timed Petri nets. We consider both token ring network-based client-server systems and CSMA/CD network-based systems, and we also discuss the system's sensitivity with respect to its parameters. We show the exact model of a token ring based system and compare it with its approximate model, along with the difference in performance measures. We next show the fixed point iteration method through a polling system example. Several performance measures, such as the mean cycle time and the mean response time of a polling system, obtained by the approximation methods are
studied. This approximation method is applicable to both symmetric and asymmetric polling systems. We prove the existence of the numerical solution obtained by the fixed point iteration method. In Chapter 8, we investigate the reliability of systems. Among the various reliability measures, we are particularly interested in the mean time to failure (MTTF) of a system, which is one of the most practical reliability measures. We introduce a new MTTF, named the conditional MTTF, and provide a computational method for it in SRN models. In conclusion, we give a summary of this thesis and suggestions for possible future research in Chapter 9.
Chapter 2
Background

In this chapter, we present a brief introduction to the concepts and the notation for stochastic processes and stochastic timed Petri nets, which are the modeling techniques used in this study. If a system is modeled by a stochastic timed Petri net, the analysis is performed by solving the underlying stochastic process. We illustrate the analysis method of stochastic timed Petri nets through a simple example. More details on these topics can be found in [27, 57, 111].
2.1 Stochastic Processes

Let S be the set of all possible outcomes of a random experiment. The set S is called the sample space of the experiment. A random variable is a function (mapping) from an outcome s ∈ S to a real number. A stochastic process is a family of random variables {X(t) | t ∈ T}, defined on a given probability space and indexed by the parameter t (such as time), where t varies over a parameter set T. The values taken by the random variable X(t) are called states, and the set of all possible values forms the state space, denoted by Ω, of the process. Since a random variable is itself a function on the sample space S, the above family of random variables is in fact a family of functions {X(t, s) | t ∈ T, s ∈ S}. For a fixed t = t1, X(t1, s) is a single random variable as s varies over S. For a fixed sample point s1 ∈ S, X(t, s1) is a function of time t only, called a sample function or a realization of the process. When both t and s are varied, we have the family of random variables constituting a stochastic process.
2.1.1 Markov Processes

A Markov process is a stochastic process that satisfies the following Markov property: for all t, tn, ..., t1, t0 such that t > tn > ... > t1 > t0, and for all x, xn, ..., x1, x0,

    P{X(t) ≤ x | X(tn) = xn, ..., X(t1) = x1, X(t0) = x0} = P{X(t) ≤ x | X(tn) = xn},     (2.1)
which means that the dynamic behavior of a Markov process is such that the probability distributions for its future development depend only on the present state X(tn) and not on how the process arrived in that state (the past history). A Markov chain (MC) is a Markov process with a discrete (finite or countably infinite) state space. If we choose to observe the state of the process at a discrete set of time points, we get a discrete-time Markov chain (DTMC). On the other hand, if we observe it in continuous time, what we get is called a continuous-time Markov chain (CTMC). We review DTMCs first; most of the theory for DTMCs also applies to CTMCs.

Let the random variables X0, X1, ..., Xn, ... represent the successive observations of a system with the Markov property at time steps 0, 1, ..., n, ..., respectively. Then the sequence of random variables {Xn, n ≥ 0} forms a DTMC. The Markov property in this case can be stated as: for all i0, i1, ..., in ∈ ZZ (the set of integers),

    P{Xn = in | Xn-1 = in-1, ..., X1 = i1, X0 = i0} = P{Xn = in | Xn-1 = in-1}.

If P{Xm+n = j | Xm = i} = P{Xn = j | X0 = i}, the DTMC is said to be homogeneous. (We consider only homogeneous Markov chains in this study; hereafter, we use "Markov chains" to mean "homogeneous Markov chains".) Define the one-step transition probability matrix P = [pij] (i, j ∈ Ω) of a homogeneous DTMC such that:

    pij = P{Xn+1 = j | Xn = i} = P{X1 = j | X0 = i}.     (2.2)

The entries of the matrix P satisfy the following properties: for all i, j ∈ Ω,

    0 ≤ pij ≤ 1   and   Σ_{j∈Ω} pij = 1.
Also let P(n) = [pij(n)] be the n-step transition probability matrix of the DTMC, such that for all i, j ∈ Ω,

    pij(n) = P{Xn = j | X0 = i}.

The probability mass function of the random variable X0 over all the states in the state space is called the initial distribution and is specified by the initial probability vector p(0) = (p0(0), p1(0), p2(0), ...). Define p(n) = (p0(n), p1(n), p2(n), ...) to be the vector of probabilities of being in each state of the system at time n after starting from the initial state. The vector p(n) is then the transient solution, or time-dependent solution, of the DTMC and is computed as:

    p(n) = p(0) P(n) = p(0) P^n,   with   Σ_{j∈Ω} pj(n) = 1.     (2.3)
This means that a DTMC is completely described by its one-step transition probability matrix and its initial probability vector.
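As a concrete illustration of Equation (2.3), the following short Python sketch computes the transient solution of a DTMC by repeated multiplication with P. The two-state matrix is a made-up example used only for illustration; it is not a model from this dissertation.

import numpy as np

# Hypothetical one-step transition probability matrix of a two-state DTMC
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
p0 = np.array([1.0, 0.0])          # initial probability vector p(0)

def dtmc_transient(p0, P, n):
    # p(n) = p(0) P^n, obtained by n successive one-step multiplications
    p = p0.copy()
    for _ in range(n):
        p = p @ P
    return p

print(dtmc_transient(p0, P, 10))   # state probabilities after 10 steps (they sum to 1)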
The steady state solution of the DTMC is denoted by the probability vector v = (vj), where v = lim_{n→∞} p(n). In order to introduce the steady state solution method, we need the following definitions. Let fii(n) be the probability that the system returns to state i, starting from state i, after n transitions. A state is called recurrent if Σ_{n=0}^{∞} fii(n) = 1, and transient if Σ_{n=0}^{∞} fii(n) < 1. In other words, after leaving a state i, if there is any chance for the system not to revisit that state, i is a transient state; otherwise it is a recurrent state. For instance, absorbing states are recurrent states. Even if a state i is recurrent, the mean recurrence time can be either finite or infinite: a state i is called positive recurrent if the mean recurrence time is finite, and recurrent-null if it is infinite. Two states i and j that are reachable from each other are said to communicate. If every state communicates with every other state in a finite number of steps, the Markov chain is said to be irreducible. Next we define the periodicity of a Markov chain. For a recurrent state i, pii(n) > 0 for some n ≥ 1. Define the period of state i as the greatest common divisor of the set of integers n such that pii(n) > 0. The state i is called aperiodic if its period is 1, and periodic if the period is greater than 1. It is shown in [43] that if any state of a Markov chain is aperiodic then all the states of the Markov chain are aperiodic, and thus the Markov chain is called aperiodic.

The steady state probability vector v of a positive recurrent, aperiodic and irreducible Markov chain (such a chain is called an ergodic Markov chain and has a limiting distribution, because there exists a unique limiting distribution of Xn as n → ∞; we consider only finite state space, ergodic Markov chains in this study) can be obtained by solving the system of linear equations:

    v = v P,   with   Σ_{j∈Ω} vj = 1.     (2.4)
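The fixed-point equation (2.4) can be solved numerically as a small linear system by replacing one (redundant) balance equation with the normalization condition. The sketch below, with the same hypothetical matrix as before, is meant only to make the computation concrete.

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
n = P.shape[0]

A = P.T - np.eye(n)                # v P = v  <=>  (P^T - I) v^T = 0
A[-1, :] = 1.0                     # replace the last equation by sum_j v_j = 1
b = np.zeros(n)
b[-1] = 1.0
v = np.linalg.solve(A, b)
print(v)                           # steady state probability vector v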
Note that the steady state solution is independent of the initial probability distribution p(0).

We now review continuous-time Markov chains (CTMCs). Let {X(t), t ≥ 0} be a finite state space, ergodic CTMC. Define the infinitesimal generator matrix Q = [qij] of the CTMC, consisting of the direct transition rates (the rates of the exponentially distributed transition times) from state i to j (i ≠ j; i, j ∈ Ω), with the diagonal entries defined as qii = −Σ_{j≠i} qij (refer to Figures 2.5 and 2.6 for examples). If Q does not depend on time t, the CTMC is said to be homogeneous; equivalently, a CTMC is defined to be homogeneous if:

    P{X(t + s) = j | X(s) = i} = P{X(t) = j | X(0) = i}.

Just as a homogeneous DTMC can be completely described by its one-step transition probability matrix P and its initial probability vector p(0), a homogeneous CTMC can be completely described by its infinitesimal generator matrix Q and its initial probability vector. The transient probability vector at time t of a CTMC, p(t), is obtained by solving the system of differential equations:

    dp(t)/dt = p(t) Q,   t > 0,     (2.5)

and the steady state probability vector π = lim_{t→∞} p(t) is obtained by solving the system of linear equations:

    π Q = 0,   with   Σ_{j∈Ω} πj = 1.     (2.6)

The amount of time spent in a state before moving to another state, i.e., the sojourn time of the state, is geometrically distributed in a DTMC and exponentially distributed in a CTMC. A common way of describing the one-step transition probability matrix of a DTMC is to draw a directed graph called the state transition diagram. A node labeled i of the state diagram represents state i of the DTMC, and an arc labeled pij from node i to node j indicates that the one-step transition probability P{X1 = j | X0 = i} = pij. Likewise, a CTMC is commonly described by a state transition (rate) diagram, where a node labeled i represents state i and the arc from node i to j is labeled with the transition rate qij (see Figure 2.5).
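The corresponding computations for a CTMC follow Equations (2.5) and (2.6). The sketch below uses a hypothetical two-state generator matrix (not one of the models in this chapter); the steady state vector is obtained from πQ = 0 with the normalization Σ_j πj = 1, and the transient vector from p(t) = p(0) e^{Qt}, the solution of Equation (2.5).

import numpy as np
from scipy.linalg import expm

lam, mu = 0.5, 2.0
Q = np.array([[-lam,  lam],        # hypothetical generator matrix; rows sum to zero
              [  mu,  -mu]])

n = Q.shape[0]
A = Q.T.copy()                     # pi Q = 0  <=>  Q^T pi^T = 0
A[-1, :] = 1.0                     # normalization: sum_j pi_j = 1
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)

p0 = np.array([1.0, 0.0])
p_t = p0 @ expm(Q * 3.0)           # transient probabilities at t = 3
print(pi, p_t)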
2.1.2 Semi-Markov Processes

The semi-Markov process (SMP) is a generalization of a DTMC in which the times between transitions are allowed to be random variables with general distributions. In a DTMC, the state transitions occur at every unit time step (note that the sojourn time in a state is geometrically distributed). In an SMP, the times between transitions are random variables which depend on the present, and possibly the next, state. The CTMC is a special case of the SMP in which the times between transitions are exponentially distributed random variables that depend only on the present state, due to the memoryless property of the exponential distribution. In order to give a formal definition of the SMP, we need the following definitions. A sequence of bivariate random variables {(Yn, Tn), n ≥ 0} is called a Markov renewal sequence if

1. T0 = 0; for all n > 0, Tn+1 > Tn and Yn ∈ ZZ;

2. for all i, j ∈ Ω,

    P{Yn+1 = j, Tn+1 − Tn ≤ t | Yn = i, Tn, Yn−1, Tn−1, ..., Y0, T0}
      = P{Yn+1 = j, Tn+1 − Tn ≤ t | Yn = i}          (Markov property)
      = P{Y1 = j, T1 ≤ t | Y0 = i}                   (time homogeneity).     (2.7)
Denote the conditional probability in Equation (2.7) by Kij(t). The matrix K(t) = [Kij(t)] is called the kernel. The distribution function of T1 starting from state i is defined as:

    Hi(t) = P{T1 ≤ t | Y0 = i} = Σ_{j∈Ω} Kij(t),   t ≥ 0, i ∈ Ω,     (2.8)

and it is called the sojourn (holding) time distribution of state i of the SMP. Let N(t) = sup{n ≥ 0 : Tn ≤ t}.
We now define the SMP. Consider a Markov renewal sequence {(Yn, Tn), n ≥ 0}. The process {X(t), t ≥ 0} such that

    X(t) = Y_N(t),   t ≥ 0,     (2.9)

is a semi-Markov process with kernel K(t). The process {Yn, n ≥ 0} is a DTMC with one-step transition probability matrix P = K(∞) and is called the embedded Markov chain (EMC) of the SMP {X(t), t ≥ 0}; this follows from the fact that Equation (2.2) is obtained from Equation (2.7) by letting t → ∞. The SMP can be completely specified by the initial distribution p(0) and the kernel matrix K(t).
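To make the role of the kernel concrete, the sketch below evaluates a simple two-state kernel K(t) (a made-up example with exponential entries, not a model from this text) and recovers the embedded Markov chain P = K(∞) and the sojourn time distributions Hi(t) of Equation (2.8).

import numpy as np

def K(t):
    # Hypothetical Markov renewal kernel: from state 0 the process jumps to state 1
    # after an Exp(2) time, and from state 1 back to state 0 after an Exp(1) time.
    return np.array([[0.0,                1 - np.exp(-2 * t)],
                     [1 - np.exp(-1 * t), 0.0               ]])

P = K(1e9)                  # embedded DTMC: P = K(infinity), here [[0, 1], [1, 0]]
H = K(5.0).sum(axis=1)      # H_i(5) = P{T1 <= 5 | Y0 = i}, the sojourn time distributions at t = 5
print(P)
print(H)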
2.1.3 Markov Regenerative Processes

A Markov regenerative process is a generalization of many stochastic processes, including the semi-Markov process (SMP). A stochastic process {Z(t), t ≥ 0} is called a Markov regenerative process (MRGP), also known as a semi-regenerative process, if there exists a Markov renewal sequence {(Yn, Tn), n ≥ 0} of random variables such that all the conditional finite dimensional distributions of {Z(Tn + t), t ≥ 0} given {Z(u), 0 ≤ u ≤ Tn; Yn = i} are the same as those of {Z(t), t ≥ 0} given Y0 = i [20, 67]. This definition implies that

    P{Z(Tn + t) = j | Z(u), 0 ≤ u ≤ Tn; Yn = i} = P{Z(t) = j | Y0 = i}.     (2.10)
In other words, even though the Markov regenerative process {Z(t), t ≥ 0} may not have the Markov property in general, there is a sequence of embedded time points T0, T1, ..., Tn, ... such that the states of the process at those time points (Y0, Y1, ..., Yn, ..., respectively) satisfy the Markov property, i.e., for all i0, i1, ..., in ∈ Ω,

    P{Yn = in | Yn−1 = in−1, ..., Y1 = i1, Y0 = i0} = P{Yn = in | Yn−1 = in−1}.

Therefore it does not matter which states the process {Z(t), t ≥ 0} has visited before reaching Yn at time Tn; the state Z(Tn) is the only information needed for the future development of Z(Tn + t), t ≥ 0. Moreover, the embedded time points {Tn, n ≥ 0} are also regeneration points if the process begins with the same state at these time points, that is,

    {Z(Tn + t), t ≥ 0 | Z(Tn) = in} =d {Z(t), t ≥ 0 | Z(0) = in},

where =d denotes equality in distribution. Like the SMP, the kernel of the MRGP is defined as K(t) = [Kij(t)], where:

    Kij(t) = P{Y1 = j, T1 ≤ t | Y0 = i},   i, j ∈ Ω.

The process {Yn, n ≥ 0} underlying the MRGP {Z(t), t ≥ 0} is the embedded Markov chain, whose one-step transition probability matrix is K(∞), because Equation (2.2) is obtained from Equation (2.7) by letting t → ∞. The process {X(t), t ≥ 0} which is formed from {Z(t), t ≥ 0} according to Equation (2.9) is called the semi-Markov process of the MRGP with kernel K(t).

Figure 2.1 shows a sample path of an MRGP. Starting from state Y0, the MRGP changes its state to s1 at time t1, and to state s2 at time t2 (T0 < t1 < t2 < T1). After staying in s2, the state changes to Y1 at time T1. The behavior of the MRGP after time Tn (starting from state Yn = i) is statistically identical to the behavior after time Tn−1 or Tn−2, ..., or T0, given that the process also starts from state i at that time point. The difference between the MRGP and the SMP is that the SMP does not have state changes between Ti and Ti+1 (in the SMP, the process stays in Yi until Ti+1). The Markov regenerative process is thus a more complex stochastic process than the semi-Markov process.
Figure 2.1: A Sample Path of MRGP
2.1.4 Markov Reward Processes

A Markov reward process [76, 100] is a Markov process with real-valued rewards attached to its states. A real-valued reward rate ri is associated with state i ∈ Ω of a Markov chain and, if the Markov chain stays in state i for a duration Hi, a reward ri Hi is gained. Thus, a Markov reward model is a Markov chain model of a system with rewards. The Markov reward model is frequently used for the combined evaluation of performance and reliability of a degradable fault-tolerant system. The reward rate is an index of the level of performance of the system in that state; for instance, a reward rate could be the queue length at a resource, the number of busy resources in the system, or the throughput of the system in that state. The expected performance measure of interest is then a weighted sum of the state probabilities with the rewards as weights. Let R represent the random variable corresponding to the reward rate in steady state; then the expected reward rate E[R] can be computed as [82]:

    E[R] = Σ_{i∈Ω} ri πi,

where πi is the steady state probability of state i of the Markov chain. We can perform not only steady state analysis but also transient and cumulative transient analysis. The expected value of the reward rate as a function of time, E[R(t)], can be computed as:

    E[R(t)] = Σ_{i∈Ω} ri pi(t),

where pi(t) is the probability of state i at time t. The accumulated reward Y(t) over the interval [0, t) is defined by Y(t) = ∫_0^t R(u) du. The expected value of the accumulated reward, E[Y(t)], can be computed as:

    E[Y(t)] = E[ ∫_0^t R(u) du ] = ∫_0^t E[R(u)] du = Σ_i ri ∫_0^t pi(u) du.

The expected accumulated reward until absorption, E[Y(∞)], can also be computed as:

    E[Y(∞)] = Σ_i ri ∫_0^∞ pi(u) du.
13 Recently, considerable attention has been given to the problem of evaluating the distribution of accumulated reward, Y (x; t) P[Y (t) x]. This is the probability of completing a given amount of useful work, x, within a speci c time interval, t. For more details, see [7, 28].
2.2 Stochastic Timed Petri Nets 2.2.1 Petri nets A Petri net [91, 92] is an abstract, formal model of information ow. It is a powerful method for describing and analyzing the ows of information and the controls in a system. The main use of Petri nets has been in the modeling of systems where events may occur asynchronously or concurrently. The Petri net graph model is a bipartite directed graph whose nodes are divided into two disjoint sets called places and transitions (Figure 2.2). Transitions (denoted by bars) represent events and places (denoted by circles) represent conditions for the events. Directed arcs from places to transitions are called input arcs, and those from transitions to places are called output arcs. Input places of the transition are the set of places which are directly connected to the transition by input arcs and output places are the set of places which are directly connected from the transition by output arcs. Tokens are associated with places, represented by black dots or numbers inside the places. Each place contains non-negative number of tokens. The number of arcs connecting a place to a transition (a transition to a place) is called the multiplicity of that input (output) arc. When the multiplicity of an arc is more than one, a small bar with a number (equal to the multiplicity) is placed next to the arc. The state of a Petri net is de ned by the number of tokens in each place, and is represented by a vector m = (#(P1); #(P2); ; #(Pn)), called a marking, where #(Pi ) is the number of tokens in place i and n is the number of places in the net. A transition is said to be enabled in the current marking if the number of tokens in each input place is at least equal to the the multiplicity of the corresponding input arc. The ring of an enabled transition is an atomic action in which tokens are removed (the number equal to the multiplicity) from the input places of the transition and are added (the number equal to the multiplicity) to the output places of the transition, possibly resulting in a new marking of the Petri net. A marking mj is said to be reachable from a marking mi if, starting from mi , there exists a sequence of transitions whose
14
2 2
Figure 2.2: An Example of a Petri net rings generate mj . A reachability graph can be constructed by connecting a marking mi to a marking mj with a directed arc if the marking mj can result from the ring of some transition enabled in mi . From the given initial marking m0 , a unique reachability graph is obtained for a Petri net. One of the important developments in the theory of Petri nets is the introduction of enhancements that make Petri nets more useful in performance evaluation studies. One such enhancement is stochastic timed Petri nets which associate random ring times (the time that elapses after a transition is enabled until it res) with the transitions. Transitions which have nonzero ring times are called timed transitions, and transitions with zero ring times are called immediate transitions. Many classes of stochastic timed Petri nets have been proposed. They can be categorized according to the type of processing duration ( ring times) that they can handle [115]: 1. discrete-time processing duration 2. exponentially distributed processing duration 3. generally distributed processing duration. Zuberek [118], Razouk and Phelps [95], Molloy [80], Holliday and Vernon [58] formulated classes of stochastic timed Petri nets with discrete-time processing durations. The time-related performance measures for these models are obtained by transforming the Petri nets into a DTMC and then solving the DTMC. Since the state-of-the-art trend is based on using stochastic timed
15 Petri nets with continuous-time processing durations, we only consider the latter two cases.
2.2.2 Stochastic Timed Petri Nets with Exponentially Distributed Firing Times Classes of stochastic timed Petri nets with exponentially distributed processing durations were suggested by Molloy [81], Natkin [84], and Zuberek [117]. Their works evolved into the Stochastic Petri Net (SPN). Ajmone Marsan et al. [4] have extended the SPN to the Generalized Stochastic Petri Net (GSPN) by including zero processing durations. We explain the GSPN in this section. In GSPNs, immediate transitions (denoted by bars) re in zero time once they are enabled, and timed transitions (denoted by thin empty rectangles) re after a random, exponentially distributed time (Figure 2.3). In any marking of a GSPN, several transitions may be simultaneously enabled. The set of such transitions constitute the con icting set of transitions in the marking. In this case, the transition that will re depends on the set , the con icting set of transitions. If consists of only timed transitions, the enabled timed transition tj (j 2 ) res with probability: rj (mi ) X rk (mi ) k2
where rk (mi ) is the ring rate of the transition tk in marking mi . If has one immediate transition, then that immediate transition res regardless of the number of timed transitions in . If has more than one immediate transitions, then only one immediate transition res regardless of the number of timed transitions in . In this case, it is necessary to specify a probability mass function on the set of enabled immediate transitions according to which transition is selected for ring. The inhibitor arc has been also introduced in GSPNs. It connects a place to a transition, and is denoted by a line terminating with a small circle instead of an arrowhead at the transition (e.g., the arc from P2 to s1 in Figure 2.3). A transition which has inhibitor input arcs may re only if each of its ordinary input places contains at least as many tokens as the multiplicity of the input arcs and all of its inhibitor input places contain fewer tokens than the multiplicity of the corresponding inhibitor arc. The tokens in the ordinary input places are removed as this
16 transition res, however the tokens in the inhibitor input places remain untouched. Ciardo, Muppala, and Trivedi [30] introduced the Stochastic Reward Nets (SRNs), extensions to GSPNs, to increase the modeling power of the Petri nets. By associating reward rates with the markings, SRN allows the automated generation of Markov reward models, facilitating the combined evaluation of performance and reliability of degradable fault-tolerant systems. SRN also provides the variable multiplicity arcs (represented as a zigzag sign on the arc) that permit the removal (deposition) of a marking-dependent number of tokens from (to) an input (output) place. This allows ushing all tokens in an input place in that marking and move a possibly dierent number of marking-dependent tokens into an output place. Another added capability is the general marking-dependency which permits the rate or probability of a transition to be a function of the number of tokens in any place of the net. Guards in the SRN allow the ring of a transition to be based on the global structure of the net. Each transition can have an associated boolean guard so that the function is evaluated once the transition meets the basic enabling conditions, i.e., the number of tokens in each of its input places is not less than the multiplicity of corresponding input arc and the number of tokens in each of its inhibitor places is less than the multiplicity of corresponding inhibitor arc and no other transition with priority higher than this transition is enabled. Only after the guard is evaluated to be true, does the transition become nally enabled. Molloy [79] showed that the stochastic process underlying an SPN is a continuous-time Markov chain. SPN markings are in one-to-one correspondence to CTMC states. The sojourn time in each marking(state), mi , is an exponentially distributed random variable with the rate: X
j 2
rj (mi ) : The transition rate from marking mi to mj is obtained as: (mi ; mj ) def =
X
k2 ij
rk (mi )
(2:11)
where ij is the set of transitions enabled by marking mi whose ring generates marking mj . As a result, the in nitesimal generator matrix Q of the CTMC which is underlying the SPN is easily obtained. Likewise, Ajmone Marsan et al. [4] showed that GSPNs are also equivalent to CTMCs. GSPN models are analyzed through the underlying CTMCs. Similarly, SRN models are transformed into Markov reward models (CTMC with reward structure).
17
2.2.3 Stochastic Timed Petri Nets with Generally Distributed Firing Times The exponential assumption has been regarded as one of the main restrictions in the application of SPNs or GSPNs to practical problems. Non-exponentially distributed processing durations often appear in many practical problems. For instance, deterministic times arise while modeling time outs in the communication protocols, log-normal distribution [75] is often used for repair times, and uniform distribution is adopted in Estelle [15]. In an eort to alleviate this restriction, several classes of stochastic timed Petri nets that allow generally distributed processing durations have been proposed. The examples are Extended Stochastic Petri Net (ESPN) [41], Deterministic and Stochastic Petri Net (DSPN) [2], and Markov Regenerative Stochastic Petri Net (MRSPN) [24]. The ESPN allows generally distributed ring times as well as exponentially distributed ones. Under some restrictions, the underlying stochastic process of an ESPN is a semi-Markov process and therefore analytically solvable. If these restrictions are not met for a model, the original proposal was to use discrete-event simulation. The DSPN allows transitions with zero ring times or exponentially distributed or deterministic ring times. As a result, the underlying stochastic process of a DSPN is neither a Markov nor a semi-Markov chain. However, DSPNs are solvable analytically if at most one timed transition with deterministic ring time is enabled concurrently with exponentially distributed timed transitions. A steady state solution method for DSPNs appears in [2, 24, 70], and a transient solution method is given in [24]. Algorithms for parametric sensitivity analysis of the steady state solution are presented in [25]. Furthermore, in [24], the underlying stochastic process for a DSPN is shown to be a Markov regenerative process (MRGP). The MRSPN is a generalization of DSPN that allows the deterministic distributions of a DSPN to become general distributions. The introduction of the MRSPN is the main theme of this dissertation. Details of MRSPNs follow in the next chapter. Stochastic timed Petri nets with generally distributed processing durations are also considered in [34, 54]. These enhancements, however, require discrete-event simulation to obtain the solution. Such simulations tend to be expensive since a large amount of computation time may be needed in order to obtain statistically signi cant results. In this study, we only consider the stochastic timed Petri nets which are solvable analytically-numerically.
18
2.3 Relation of Markov Chains with Stochastic Timed Petri Nets Once a system has been modeled as a stochastic timed Petri net, the next step is to solve the model. In this section, we explain this procedure through an example modeled by a stochastic reward net (SRN). Consider an SRN model in Figure 2.3. The places are labeled as Pi; 1 i 8. The timed P1
t1
P2
1
t2
P3
1
s1
P4 t3
P5
t4
P6 t5
P7 s2
P8 t6
Figure 2.3: SRN Model of 1 Client, 1 Server System transitions (with exponentially distributed ring times) are denoted by thin empty rectangles and are labeled as tj; 1 j 6 along with their ring rates. The immediate transitions are denoted by bars and are labeled as sj; 1 j 2. The number inside a place represents the number of tokens in that place. Let #(Px) denote the number of tokens in place Px, and let m = f#(P1); ; #(P8)g denote the marking of the SRN. The SRN model is solved by the following steps. First, the reachability graph is generated
19 from the SRN model. Second, the reachability graph is converted to a CTMC. Finally, the CTMC is solved analytically-numerically. The reachability graph of Figure 2.4 is obtained from the initial marking of Figure 2.3. The label on a directed edge represents the transition whose ring generates the successor marking. When a previously known marking is obtained the further development is pruned. Thus it can be seen that there are sixteen unique markings. The markings can be classi ed into two types: vanishing markings and tangible markings [4]. A vanishing marking is one in which at least one immediate transition is enabled, and a tangible marking is one in which no immediate transition is enabled. The reachability graph of Figure 2.4 has six vanishing markings: m0, m3, m5, m9, m14 and m15. These are represented by rectangles and the tangible markings are represented by ovals. When the vanishing markings are eliminated by merging them with their successor tangible markings, the resulting model becomes the CTMC. Figure 2.5 depicts the ergodic CTMC which is obtained from the reachability graph of Figure 2.4 corresponding to the SRN of Figure 2.3. In the gure, nodes represent states and arcs represent state transitions. State i is labeled as \(i)" as well as the corresponding marking, for the reference. The state space of the underlying CTMC consists of tangible markings = f m1; m2; m4; m6; m7; m8; m10; m11; m12; m13g. The in nitesimal generator matrix Q = [qij ] of the CTMC, where qij is the rate of transition from state i to state j (i 6= j; i; j 2 ) and qii = ?
X
j;j 6=i
qij so that
X
j
qij = 0, is shown in Figure
2.6. The matrix Q is obtained from the reachability graph using the following fact [4]: Q = QTT + QTV W
(2:12)
where QTT (QTV ) is the matrix of the rates of transitions from the tangible markings to tangible (vanishing) markings. P V T (P V V ) is the matrix of the probabilities of transitions from vanishing to tangible (vanishing) markings and [I ? P V V ]W = P V T [83]. The i-th element of W gives the probability that the rst state visited is tangible marking i after an arbitrary number of state transitions among vanishing markings given that the current state is in vanishing marking. Let i be the steady state probability that the CTMC is in state i as before. The vector
20
Initial Marking m0 = 10100000
s1 m1 = 10010000
t1
t3
m2 = 01010000
m3 = 10000010
t3
s2 m4 = 10000001
m5 = 01000010
s2
t1
m6 = 01000001
m6 = 01000001
t6
t6 m7 = 01100000
t2 m8 = 00011000
t4
t3 m9 = 00001010
m10 = 00010100
s2
t3
m11 = 00001001
t4 m13 = 00000101
t6 m15 = 00100100
m12 = 00000110
t6
m14 = 00101000
t5 m4 = 10000001
s1 m8 = 00011000
s1 m10 = 00010100
Figure 2.4: Reachability Graph of the Example SRN
t6 m0 = 10100000
21
m1 (1)
m2 (2)
m4 (3)
m6 (4)
m12 (9)
m7 (5)
m8 (6)
m10 (7)
m13 (10)
m11 (8)
Figure 2.5: CTMC Underlying the Example SRN 2
Q=
6 6 6 6 6 6 6 6 6 4
?( + ) 0 ?
0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 ?( + ) 0 0 0 0 0 0
0 0 0
0 0 0 0 ? 0 0 ? 0 0 ?( + ) 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 ? 0
0 0 ?( + ) 0 0 0 ? 0
0 0 ?
3 7 7 7 7 7 7 7 7 7 5
Figure 2.6: In nitesimal Generator Matrix of the CTMC is the row vector of these probabilities. Then the system of linear equations: Q = 0;
X
i
i = 1
will provide the required probabilities. Thus, the steady state analysis of a Markov model involves the solution of a system of linear equations. The number of equations (states) can be rather large. However, the in nitesimal generator matrix Q is usually sparse and this can be exploited in solving and storing large Markov models. Hence iterative methods such as GaussSeidel or Successive Overrelaxation (SOR) are used rather than direct methods such as Gaussian elimination. The iteration for SOR3 is k+1 = ![ k+1 U + k L ]D?1 + (1 ? !)k 3 Note that is not a column vector but a row vector.
22 where k+1 is the solution vector at the kth iteration, L is a lower triangular matrix, U is an upper triangular matrix, and D is a diagonal matrix such that Q = D ? L ? U. For ! = 1, the SOR iteration reduces to the Gauss-Seidel iteration. The choice of ! is discussed in [27]. Generally the state space of a Markov model is very large, so the manual generation of the in nitesimal generator matrix is dicult. Several packages have been developed for automatically generating the reachability graph and solving the associated CTMC [23, 30, 36]. In this study we use the SPNP [30] to perform all the steps above to obtain steady state probabilities and rewards. The transient and sensitivity analysis can be carried out with equal ease by using SPNP.
Chapter 3 Markov Regenerative Stochastic Petri Nets 3.1 Introduction The classes of stochastic timed Petri nets that have been proposed for performance and reliability analysis of systems include Stochastic Petri Net (SPN) [81], Generalized Stochastic Petri Net (GSPN) [4], Extended Stochastic Petri Net (ESPN) [41], and Deterministic and Stochastic Petri Net (DSPN) [2]. In the SPN, a transition res after an exponentially distributed amount of time ( ring time) when it is enabled. The GSPN allows transitions with zero ring times or exponentially distributed ring times. The stochastic process underlying an SPN or a GSPN is a continuous-time Markov chain. The ESPN allows generally distributed ring times and exponentially distributed ones. Under some restrictions, the underlying stochastic process of an ESPN is a semi-Markov process. The DSPN allows transitions with zero ring times or exponentially distributed or deterministic ring times. As a result, the underlying stochastic process of a DSPN is neither a Markov nor a semi-Markov chain. However, DSPNs can be solved analytically with a restriction that at most one timed transition with deterministic ring time to be enabled concurrently with exponentially distributed timed transitions. A steady state solution method for DSPNs appears in [2, 24, 70], and a transient solution method is given in [24]. Algorithms for parametric sensitivity analysis of the steady state solution are presented in [25]. Furthermore, in [24], the underlying stochastic process for a DSPN is shown to be a Markov regenerative process (MRGP). In this chapter1 , we introduce a new class of stochastic timed Petri nets by generalizing the 1 This chapter is based on the paper Markov Regenerative Stochastic Petri Nets by H. Choi and
23
24 deterministic ring times of a DSPN to be arbitrary distributed ring times. We show that the underlying stochastic process for this new class of Petri net continues to be MRGP, and hence we call it a Markov Regenerative Stochastic Petri Net (MRSPN). The main contributions of this chapter include: (1) de nition of a new and powerful class of stochastic timed Petri nets that is a superset of SPNs, GSPNs, ESPNs and DSPNs, (2) showing that the underlying stochastic process for the new class of Petri nets is the Markov regenerative process. We show a sucient condition for stochastic timed Petri nets to be MRSPNs. For a class of MRSPNs which satis es this condition, we derive (3) the kernel distributions of the underlying MRGP, (4) the equations for the steady state behavior, and (5) the equations for the transient behavior. The chapter is organized as follows. In Section 3.2, we de ne MRSPN and show that the underlying stochastic process of the MRSPN is a Markov regenerative process. Equations for the transient analysis of MRSPNs are derived in Sections 3.3 along with the computational aspect of the method. The steady state analysis of MRSPNs are derived in Sections 3.4. As an example, the M/G/1/2/2 queueing system is developed in Section 3.5.
3.2 De nition In stochastic timed Petri nets, a random ring time elapses after a transition is enabled until it res. As explained in Section 2.2.1, transitions which have nonzero ring times are called timed transitions and transitions with zero ring times are called immediate transitions. We will call a timed transition whose ring time is exponentially (generally) distributed an EXP (GEN) transition. Let Fd () denote the distribution of the ring time of a GEN transition. Let M(t) be the tangible marking of a stochastic timed Petri net at time t. The right-continuous, piecewise constant, continuous-time stochastic process underlying the stochastic timed Petri net is called marking-process fM(t); t 0g. Study of the marking-process fM(t); t 0g is the thrust of the analysis of the stochastic timed Petri net.
De nition 3.1 A stochastic timed Petri net is called a Markov Regenerative Stochastic Petri Net (MRSPN) if its marking-process fM(t); t 0g is a Markov regenerative process. V. G. Kulkarni and K. S. Trivedi which is to appear in the 16th IFIP W.G. 7.3 Int'l Sym. on Computer Performance Modelling, Measurement and Evaluation (Performance'93), Rome, Italy, Sep. 1993.
25 A formal de nition of Markov regenerative stochastic Petri nets is as follows: MRSPN is a 12-tuple (P ; T ; A? ; A+ ; Ao ; m0; g; >; w; ; F; C ) where P = fp1 ; p2; :::; pjPjg is a nite set of places T = T I [ TE [ TG TI = ft1; t2; :::; tjT j g is a nite set of immediate transitions TE = ft1 ; t2; :::; tjT jg is a nite set of EXP transitions TG = ft1; t2; :::; tjT j g is a nite set of GEN transitions I
E
G
A? : P T INjPj ! IN is the marking-dependent multiplicity of an input arc from p 2 P to t 2 T ( INjPj : the set of markings) A+ : P T INjPj ! IN is the marking-dependent multiplicity of an output arc from t 2 T to p 2 P Ao : P T INjPj ! IN is the marking-dependent multiplicity of an inhibitor arc from p 2 P to t 2 T m0 = (#(p10 ); #(p20 ); ; #(pn01)) is the initial marking g : T INjPj ! ftrue, falseg is the guard for a transition > is a transitive and irre exive relation imposing a priority among transitions w : TI INjPj ! R+ is the weight assigned to the ring of an enabled immediate transition (R+ : non-negative real numbers) : TE INjPj ! R+ is the ring rate of an EXP transition F : TG ! F speci es the distribution of the ring time of a GEN transition (F : the family of general distribution functions) C is the conditions imposed on Petri nets to be MRSPNs. In a marking m, t1 is enabled only if (1) each of its ordinary input places contains at least as many tokens as the multiplicityof the input arcs and all of its inhibitor input places contain fewer tokens than the multiplicity of the corresponding inhibitor arc, and (2) the guard for t1 in the marking m is evaluated to be true, and (3) no other transition t2 exists such that t2 has higher priority while t2 also satis es the condition (1) and (2). The C de nes a sucient condition for the net to be an MRSPN, i.e., the restrictions imposed on the structure of stochastic timed Petri net in order for the underlying marking-processes to be a MRGP. We provide a version
26 of C in De nition 3.2. It is not an easy matter to check whether a given stochastic timed Petri net is an MRSPN. However, if a given Petri net meets C at least, the Petri net is an MRSPN. To check if a given Petri net satis es C or not can be accomplished by the reachability graph analysis. In spite of the restriction, we see that MRSPNs is a very large class of stochastic timed Petri nets that include previously studied classes of Petri nets.
Theorem 3.1 A stochastic timed Petri net is an MRSPN if it is either an SPN or a GSPN or an ESPN 2 or a DSPN.
Proof We know that the marking-process of an SPN or a GSPN is a CTMC (see [81] and [4] respectively), that of an ESPN is an SMP (see [41]), and that of DSPN is an MRGP (see [24]). All these processes are special cases of the MRGP, thus the theorem follows. 2 Consider a class of stochastic timed Petri nets de ned as follows.
De nition 3.2 A stochastic timed Petri net is said to be in PN if
at most one GEN transition is enabled in a marking and
the distribution of the ring time of a GEN transition is marking-independent in the form and the parameters.
The main result of this section is that stochastic timed Petri nets in PN are MRSPNs. For this reason a stochastic timed Petri net in PN will be denoted by MRSPN . Figure 3.1 (a) and (c) shows two example MRSPNs with their reachability graphs (b) and (d), respectively. In the gures, lled rectangles denote GEN transitions and empty rectangles denote EXP transitions. Each transition is associated with the weight of ring (in case the transition is an immediate transition) or the ring rate (for an EXP transition) or the ring time distribution (for a GEN transition). In the reachability graphs, solid arcs denote state transitions by GEN transitions and dotted arcs denote state transitions by EXP transitions. A 4-tuple in each marking of Figure 3.1(b) denotes (#(p1); #(p2); #(p3); #(p4)). Similarly, a 3-tuple in each marking of Figure 3.1(d) denotes (#(p5); #(p6); #(p7)). Transition tj represents the submission of a job by one of two clients. The ring time of this transition is exponentially distributed with the marking-dependent ring rate, i.e., #(p1). The ring time of transition 2 By ESPN, we mean a subset of ESPNs which can be solved analytically.
27
p1 #
p2
p3
ts
tj
p5
p4
td
tv
te
p6
Transition Distribution Parameters tj EXP #(p1) ts Phase Type (; M ) (; ) tv GAM
p7
Transition Distribution Parameters EXP te td DET (c)
(a)
m1
2001
tv 2010
m4
tj ts tj
m2
1101
tv 1110
m5
(b)
tj ts tj
m7
m3
0201
tv 0210
m6
td
100
010
m8
te 101
(d)
m9
Figure 3.1: Examples of MRSPNs and Their Markings
28 ts has phase type distribution, and that of transition tv has Gamma distribution. Transition td takes a xed amount of time to re. When the EXP transition te res, the transition td is disabled. Whenever te res, one token is put back in place p5 making td enabled again. This models the reset of a deterministic timer by a preemptable event which is represented by te. The reduced reachability graph of MRSPN is generated from the reachability graph in the following way. Suppose a timed transition t res in a tangible marking mi and the successor marking is a vanishing marking mk . Then some immediate transition enabled in mk will re and the next successor marking may be either vanishing or tangible. The probability of visiting a tangible marking mj after visiting an arbitrary number (including 0) of vanishing markings given that the current marking is the vanishing marking mk is Wkj . fWkj g is a solution to the system of linear equations [83]: VV
(I ? B )W = B
VT
(3:1)
where B V V (B V T ) is the matrix of one-step transition probabilities from vanishing to vanishing (tangible) markings in the reachability graph. The entire sequence of vanishing markings between mi and mj initiated by the ring of a timed transition t in mi is substituted by the branching probability Wkj in the following way: 1. If t is an EXP transition with the ring rate i , each of the successor tangible markings mj is connected from mi directly with the transition rate: (mi ; mj ) = i Wkj :
(3:2)
2. If t is a GEN transition with the random ring time X, each of the successor tangible markings mj is connected from mi directly with the ring time X. The branching probabilities in this case are kept in the branching-probability matrix = [(i; j)] where (i; j) = Wkj . (Recall that after the ring of t the marking mi changes to mk with probability 1.) (i; j) = P fnext tangible marking is mj j current tangible marking is mi and t resg (3:3) In case there are no vanishing markings between mi and mj , then (i; j) is set to 1. For all other entries, (i; j) is set to 1 if i = j and 0 otherwise.
29 Figure 3.2 illustrates the relation between the reachability graph and the reduced reachability graph. In the gure, tangible markings are represented by ovals and the vanishing markings are represented by rectangles. Thin solid arcs out of rectangles denote state transitions by immediate transitions. Tangible markings m5 throughm8 are directly connected from the tangible marking m1 in the reduced reachability graph. If t1 is an EXP transition with ring rate i , m5 throughm8 are directly connected from m1 with the transition rates: (m1 ; m5 ) = i ; (m1 ; m6 ) = (1 ? )i ; (m1 ; m7) = (1 ? ) i ; (m1 ; m8) = (1 ? )(1 ? )i : If t1 is a GEN transition, the eects of vanishing markings m2 throughm4 are captured by the following branching probabilities: (m1 ; m5) = ; (m1 ; m6) = (1 ? ); (m1 ; m7) = (1 ? ) ; (m1 ; m8) = (1 ? )(1 ? ): We will call a timed transition t to be exclusive3 in a marking m if t is the only transition enabled in m. When a timed transition t is enabled together with another transition t0 , t is called a competitive transition in the marking m if the ring of t0 disables the transition t. If the ring of t0 does not disable transition t, then t is called a concurrent transition in the marking m. Markings m1 ; m2 and m5 of Figure 3.1(b) are the ones in which a GEN transition and an EXP transition are enabled concurrently. Markings m3 and m6 (m4 ) are the ones in which a GEN (EXP) transition is enabled exclusively. Marking m7 of Figure 3.1(d) is the one in which a GEN transition and an EXP transition are enabled competitively. We shall show that, under the condition of PN , the marking-process fM(t); t 0g of MRSPN is a Markov regenerative process (MRGP). We need to introduce the following notation in order to prove this result. Let be the set of all tangible markings of the reduced reachability graph of the MRSPN . Then is also the state space of the marking-process of the MRSPN . For example, = fm1 ; m2; m3 ; m4; m5 ; m6g is the state space of the marking-process of the MRSPN shown in Figure 3.1(a). Consider a sequence fTn ; n 0g, of epochs when the MRSPN is observed. Let T0 = 0 and de ne fTn ; n 0g recursively as follows: Suppose M(Tn +) = m. 3 These de nitions of exclusive, competitive, and concurrent transitions are dierent from the
original de nitions given in [41]. In [41], exclusiveness, competitiveness and concurrentness are de ned in the scope of overall markings of Petri net. For example, a transition t is said to be exclusive if, for every marking in the reduced reachability graph that enables t, the marking should not enable other transitions. We de ne them in the scope of individual marking, because t may be enabled alone in some marking while being enabled with other transitions in some other markings. The restrictive de nitions made in [41] are not appropriate in our context.
30
m1
p1
p4
1000 0000 t1 m2 0100 0000 t3 t2 m3 0010 0000 0001 0000 m4 t6 t5 t4 t7 0000 0010 0000 0100 0000 0001 0000 1000
t7
(b) Reachability Graph
t1 p2
t3
t2 p3
t4 p5
t5 t6 p6
p7
p8
t1 0000 1000
Transition t1 t2 t3 t4 t5 t6 t7
m6
m5
Distribution Parameters (; ) GAM Immediate Immediate 1? Immediate 1? Immediate
Immediate 1? Immediate
m5
m7
m8
m1
1000 0000 t1 t1 t1 0000 0010 0000 0100 0000 0001
m6
m7
m8
(c) Reduced Reachability Graph
(a) MRSPN
Figure 3.2: The Relation between the Reachability Graph and the Reduced Reachability Graph
31 1. If no GEN transition is enabled in state m, de ne Tn+1 to be the rst time after Tn that a state change occurs. If no such time exists, we set Tn+1 = 1. 2. If a GEN transition is enabled in state m, de ne Tn+1 to be the time when the GEN transition res or is disabled. Note that there cannot be more than one GEN transition enabled in state m, and hence the above cases cover all possibilities. With the above de nition of fTn ; n 0g, let Yn = M(Tn +). We now show the following result.
Theorem 3.2 The marking-process fM(t); t 0g of an MRSPN is a Markov regenerative
process, i.e., MRSPN is an MRSPN.
Proof First we shall show that f(Yn; Tn); n 0g embedded in the marking-process of the
MRSPN is a Markov renewal sequence, i.e.,
8 i; j 2 ; P fYn+1 = j; Tn+1 ? Tn t j Yn = i; Tn; Yn?1; Tn?1; :::; Y0; T0g = P fYn+1 = j; Tn+1 ? Tn t j Yn = ig (Markov Property) = P fY1 = j; T1 t j Y0 = ig (Time Homogeneity).
(3.4)
Suppose the past history Y0 ; T0; :::; Yn?1; Tn?1; Yn ; Tn is given and Yn = i. Consider two cases. 1. No GEN transition is enabled, i.e., all the transitions that are enabled in state i at time Tn are EXP transitions. In this case, due to the memoryless property of the exponential random variable, the future of the marking-process depends only on the current state i and does not depend upon the past history or the time index n. 2. Exactly one GEN transition is enabled in state i. There may be other EXP transitions enabled in state i. In this case, Tn+1 is the next time when the GEN transition res or is disabled. The joint distribution of Yn+1 and (Tn+1 ? Tn ) will depend only on the state at time Tn. (That is, the next transition time depends on the destination state Yn+1 and the state Yn+1 will be decided from the current state Yn . Therefore, where to and when to move next is only dependent on Yn at time Tn .) It is thus independent of the past history and the time index n.
32 This proves that f(Yn ; Tn); n 0g is a Markov renewal sequence. Now consider the marking-process from time Tn onwards, namely fM(Tn + t); t 0g. Given the history fM(u); 0 u Tn ; M(Tn+) = ig, it is clear from the above argument that the stochastic behavior of fM(Tn + t); t 0g depends only on M(Tn ) = i. Thus,
fM(Tn + t); t 0 j M(u); 0 u Tn ; M(Tn) = ig =d fM(Tn + t); t 0 j M(Tn) = ig =d fM(t); t 0 j M(0) = ig
8i 2
where =d denotes equality in distribution. This proves that fM(t); t 0g is a Markov regenerative process. 2 Then it follows that the conditional probabilities: Kij (t) = P fY1 = j; T1 t j Y0 = ig
i; j 2 :
(3:5)
forms the kernel of the MRGP, K(t) = [Kij (t)]. As shown in Section 2.1.3, fYn ; n 0g is a DTMC (embedded Markov chain) with transition probability matrix K(1). Also if we let N(t) = supfn 0 : Tn tg, the process fX(t); t 0g such that X(t) = YN (t) ; t 0 is a semi-Markov process (SMP) of the MRSPN with kernel K(t). Using the theory of Markov regenerative process, we carry out the transient analysis and the steady state analysis in the following sections.
3.3 Transient Analysis of MRSPNs 3.3.1 Basic Equations for the Transient Behavior In this section, we derive equations for the transient probabilities of the marking-process fM(t); t 0g. De ne the transition probability Vij (t) = P fM(t) = j j M(0) = Y0 = ig
i; j 2
(3:6)
and let V (t) = [Vij (t)]. Let Kiu Vuj (t) =
t
Z
0
dKiu(x) Vuj (t ? x);
(3:7)
33 and K V (t) be a matrix whose (i; j) element is
X
u
Kiu Vuj (t).
The transient analysis of the marking-process of the MRSPN is based on the following general theorem on MRGPs with kernel K(). We include the proof for completeness.
Theorem 3.3 The transition probability matrix V (t) satis es the following generalized Markov renewal equation:
V (t) = E(t) + K V (t)
(3:8)
where Eij (t) = P fM(t) = j; T1 > t j Y0 = ig.
Proof Conditioning on Y1 and T1 and using the Markov regenerative properties of fM(t); t 0g, we obtain: P fM(t) = j j Y0 = i; Y1 = k; T1 = xg = (
P fM(t) = j j Y0 = i; T1 = xg Vkj (t ? x)
t t j Y0 = ig +
X
k2
XZ
k2 0
t
Vkj (t ? x) dKik (x)
Kik Vkj (t)
= Eij (t) + [K V ]ij (t) which in matrix form is Equation (3.8).
2
Note that Eij (t) describes the behavior of the marking-process between two transition epochs of the EMC, i.e., over the time interval [0; T1). We will call the matrix E(t) the local kernel, opposed to the kernel K(t) which is global in the marking-process. In order to use the above theorem, we need to specify K(t) = [Kij (t)] and E(t) = [Eij (t)] matrices for the MRSPN . We give the following de nitions for this purpose.
34 Consider a state m in . Let G (m) be the set of GEN transitions and E (m) be the set of EXP transitions enabled in state m. For example, G (m1 ) = ftvg; E (m1 ) = ftj g for state m1 of Figure 3.1(b). Consider the following cases. Case 1 : G (m) = ;, i.e., no GEN transition is enabled in m. In this case, de ne: m =
X
n2
(m; n):
(3:9)
If two or more EXP transitions in E (m) lead to marking n after the ring, then (m; n) is the sum of the transition rates of these EXP transitions as shown in Equation (2.11). Given Y0 = m, T1 is exponentially distributed with rate m and M(t) = Y0 for 0 t < T1 . Case 2 : G (m) = fdg, i.e., exactly one GEN transition d is enabled. Suppose Y0 = m. In this case T1 is the time when d res or is disabled due to the ring of a competitive EXP transition. De ne (m) to be the set of all states reachable from m in which the marking-process can spend a non-zero time before the next EMC transition occurs, i.e., during [0; T1). For example, (m1 ) = fm1 ; m2; m3 g for m1 of Figure 3.1(b), and (m7 ) = fm7 g for m7 of Figure 3.1(d). The marking-process during [0; T1) is a CTMC, which is called the subordinated CTMC [70], on state space with the in nitesimal generator matrix Q(m). The generator matrix Q(m) is formed as follows: for any n 2 (m), the rate from n to n0 2 is given by (n; n0), and if n 62 (m), the rates out of a marking n are zeros4 . Next, de ne (m) to be the set of states which is reachable starting from m (not necessarily directly) by ring of a competitive EXP transition. Similarly, de ne (m) to be the set of states reachable by ring of the GEN transition d. For example, (m7 ) = fm8g; (m7 ) = fm9 g for state m7 of Figure 3.1(d) and (m1 ) = fm4 ; m5; m6 g; (m1 ) = ;, for state m1 of Figure 3.1(b). Note that (m) ? (m), whereas (m) . This completes the description of the process in case 2. We now de ne the kernel of an MRSPN in the following theorem. E
G
G
G
E
E
E
G
Theorem 3.4 The kernel K(t) = [Km;n(t)] (m; n 2 ) of the marking-process of the MRSPN is given by: 4 Even though Q(m) is de ned for each m, it does not have to be distinct for each m. For instance, Q(m1) = Q(m2) = Q(m3) for states m1 ; m2 and m3 in Figure 3.1(b).
35 1. for state m such that G (m) = ;, (
Km;n (t) =
0
m = 0 m > 0
(m;n) (1 ? e?m t)
m
(3:10)
2. for state m such that G (m) = fdg, if n 2 (m) but n 62 (m) : E
G
Km;n (t) = [eQ(m)t ]m;n (1 ? Fd (t)) +
t
Z
0
[eQ(m)x ]m;n dFd (x)
(3:11)
if n 62 (m) but n 2 (m) : E
G
Km;n (t) =
X
t
Z
m 2 (m) 0
[eQ(m)x ]m;m dFd (x) (m0 ; n) 0
(3:12)
0
if n 2 (m) and also n 2 (m) : E
G
Km;n (t) = [eQ(m)t ]m;n (1 ? Fd (t)) + +
X
Z
m 2 (m) 0
t
Z
t
0
[eQ(m)x ]m;n dFd (x)
[eQ(m)x ]m;m dFd (x) (m0 ; n) 0
(3.13)
0
if n 62 (m) and also n 62 (m) : E
G
t0 :
Km;n (t) = 0;
(3:14)
Proof As given in Equation (3.5), Km;n (t) is de ned by: Km;n (t) = P fY1 = n; T1 t j Y0 = mg m; n 2 : 1. When no GEN transition is enabled in state m, the ring of any EXP transition in E (m) triggers the state change of the EMC. Hence it is clear that: P fY1 = n; T1 t j Y0 = mg = P fstate transition occurs until t j Y0 = mg P fn is reached by the transition j Y0 = mg n) = (1 ? e?m t ) (m; : m
36 If E (m) = ;, then (m; n) = m = 0. In this case, further state transitions are not possible from the state m, i.e., m is an absorbing state and P fY1 = n; T1 t j Y0 = mg is computed to be 0 for all n 2 . 2. When a GEN transition d is enabled, the EMC state change is triggered either at the time of ring of d or when d is disabled by a competitive EXP transition. Recall that the set (m) de nes the states which are reachable from m by ring of the competitive EXP transitions. Depending on the type of state n, we have the following cases. (a) If n is in (m) but not in (m), then n is only reachable by ring of a competitive EXP transition. Let the ring time X of the GEN transition be x. E
E
G
If 0 t < x, the competitive EXP transitions may have red during [0; t]: P fY1 = n; T1 t j Y0 = mg = P fthe state of the subordinated CTMC is n at time t j Y0 = mg = [eQ(m)t ]m;n n 2 (m) : E
If t x, they can re only up to time x ([eQ(m)x ]m;n ). Thus: (
Km;n (t) =
[eQ(m)t ]m;n [eQ(m)x ]m;n
0t 0
(3:15)
2. for state m such that G (m) = fdg, if n 2 (m) but n 62 (m) : E
G
Pm;n = E[[eQ(m)X ]m;n ]
(3:16)
where E[] denotes the expected value and X is the ring time of d, if n 62 (m) but n 2 (m) : E
G
Pm;n =
X
m 2 (m)
E[[eQ(m)X ]m;m ] (m0; n) 0
(3:17)
0
if n 2 (m) and also n 2 (m) : E
G
Pm;n = E[[eQ(m)X ]m;n] +
X
m 2 (m)
E[[eQ(m)X ]m;m ] (m0 ; n) 0
(3.18)
0
if n 62 (m) and also n 62 (m) : E
G
Pm;n = 0:
(3:19)
38 Proof These results follow from Theorem 3.4 using P = K(1) and E[X] = R01 x dFd (x). 2
Now we derive expressions for the local kernel E(t).
Theorem 3.5 The local kernel E(t) = [Em;n(t)] (m; n 2 ) of the MRSPN is given by: 1. when G (m) = ; : Em;n (t) = m;n e?m t
(3:20)
where m;n is the Kronecker de ned by m;n = 1 if m = n and 0 otherwise, 2. when G (m) = fdg : for n 2 (m), for n 62 (m),
Em;n(t) = [eQ(m)t ]m;n (1 ? Fd (t))
(3:21)
Em;n (t) = 0:
Proof As de ned in Theorem 3.3: Em;n (t) = P fM(t) = n; T1 > t j Y0 = mg which is the state transition probability of the marking-process between two transition epochs of the EMC. Starting with M(0) = m and knowing that T1 > t, the marking-process fM(u); 0 u tg is a CTMC as explained below. 1. Suppose G (m) = ;. Then the ring of any EXP transition triggers the state change of the EMC. The probability of marking-process being in state n at time t (before the EMC state change occurs) given that it entered state m at time 0 is the probability that the marking-process stays at the initial state m until time t, i.e., M(u) = M(0) for all 0 u < T1 . The fM(u); 0 u < T1 j M(0) = mg in this case is a degenerative CTMC that stays in state m. Therefore: P fM(t) = n; T1 > t j Y0 = mg = m;n (1 ? Hm (t)) = m;n e?m t where Hm (t) is de ned in Equation (2.8). 2. Suppose G (m) = fdg. Then the EMC state change is triggered either at the time of ring of d or when d is disabled. Recall that the marking-process stays in a state in
39
(m), starting from m, before the next EMC state change occurs. The marking-process is captured by the subordinated CTMC with generator matrix Q(m). If we let the ring time X of the GEN transition be x, Em;n (t) is the transition probability of this CTMC to state n (n 2 (m)) by time t (t < x). That is, for state n 2 (m) Em;n (t) is evaluated to [eQ(m)t ]m;n . If d is disabled before t by ring of a competitive EXP transition, the next state should be outside of (m) . That is, for state n 62 (m) Em;n (t) = 0. Unconditioning X, we get: Em;n(t) =
Z
t
1
[eQ(m)t ]m;n dFd (x)
= [eQ(m)t ]m;n (1 ? Fd (t)) n 2 (m) :
2
The above two cases cover all the possibilities.
Corollary 3.2 Given the state transition probability matrix V (t) and the initial probability distribution p(0) = (pj (0)), the state probability at time t of the MRSPN is computed by:
pj (t) =
X
i2
i; j 2 :
pi (0) Vij (t)
Proof Follows by conditioning on the initial state Y0.
(3:22)
2
3.3.2 Transient Analysis in the Transform Domain We discuss methods of computing Equation (3.8) in this section. A direct approach may be to solve the system of integral equations:
8 i; j 2 ;
Vij (t) = Eij (t) +
X
Z
k2 0
t
Vkj (t ? x) dKik (x)
(3:23)
which is discussed in detail in [45]. Laplace-Stieltjes transformation can also be employed. De ne the Laplace transform (LT) R of a function f(t) to be f (s) = 01 e?st f(t) dt and the Laplace-Stieltjes transform (LST)
40 R of a function f(t) to be f (s) = 01 e?st df(t): Then LT and the LST satisfy the following
relation5:
f (s) = sf (s) ? f(0) :
(3:24)
R The F (s) = 01 e?st dFd (t) is the LT of the ring time X of d and is known for most of
distributions. For a square matrix A, de ne F (A) to be 01 e?At dFd (t). The LT (LST) of a matrix of functions is de ned to be the matrix of the LTs (LSTs) of elements. Taking LST's on both sides of Equation (3.8), we get: R
V (s) = E (s) + K (s)V (s) ; V (s) = [I ? K (s)]?1E (s) :
(3.25)
Therefore, we can get V (s) once K (s) and E (s) are known. Then V (t) can be obtained either analytically or numerically inverting V (s). We now show that V (t) can be computed this way by deriving the expressions of the K (s); E (s) of the MRSPN .
Theorem 3.6 The LST of K(t) = [Km;n (t)] (m; n 2 ) of the MRSPN is given by: 1. for state m such that G (m) = ; : (s) = (m; n) Km;n s+
(3:26)
m
2. for state m such that G (m) = fdg, if n 2 (m) but n 62 (m) : E
(s) = Km;n
G
s(sI ? Q(m))?1 ? Q(m)(sI ? Q(m))?1 F (sI ? Q(m)) m;n
(3.27)
if n 62 (m) but n 2 (m) : E
G
(s) = Km;n
X
m 2 (m) 0
(sI ? Q(m)) (m0 ; n) Fm;m 0
(3.28)
if n 2 (m) and also n 2 (m) : E
(s) = Km;n
G
s(sI ? Q(m))?1 ? Q(m)(sI ? Q(m))?1 F (sI ? Q(m)) m;n
+
X
m 2 (m) 0
(sI ? Q(m)) (m0 ; n) Fm;m 0
5 This relation holds when the derivative of f() is bounded.
(3.29)
41 if n 62 (m) and also n 62 (m) : E
G
(s) = 0 : Km;n
(3:30)
Proof Equation (3.26), (3.30) are easily obtained from the de nition of LST. Equation (3.27) is obtained using the fact (see Appendix for the proof): t2
Z
t1
e?sx eQx dx = (sI ? Q)?1 f e?(sI ?Q)t1 ? e?(sI ?Q)t2 g
(3.31)
and taking LST's on Equation (3.11): (s) = Im;n + Km;n
?
Z
1
0+
Z
1
0+
[Q(m) e?(sI ?Q(m))t ]m;n dt
[e?(sI ?Q(m))t ]m;n dFd (t) +
Z
1
0+
?
Z
1
0+
[Q(m) e?(sI ?Q(m))t ]m;n Fd (t) dt
[e?(sI ?Q(m))t ]m;n dFd (t)
= [Q(m)(sI ? Q(m))?1 ? Q(m) F (sI ? Q(m))]m;n + Im;n = [Q(m)(sI ? Q(m))?1 ? Q(m) (sI ? Q(m))?1 F (sI ? Q(m))]m;n + Im;n = [s(sI ? Q(m))?1 ? Q(m)(sI ? Q(m))?1 F (sI ? Q(m))g]m;n : Equation (3.28) is obtained from taking LST's on Equation (3.12): (s) = Km;n
=
1
Z
0
e?st
X
m 2 (m)
[eQ(m)t ]m;m dFd (t) (m0 ; n) 0
0
X
Z
m 2 (m) 0
1
[e?(sI ?Q(m))t ]m;m dFd (t) (m0 ; n) 0
0
=
X
m 2 (m) 0
(sI ? Q(m)) (m0 ; n): Fm;m 0
Equation (3.29) is obtained from the results of above two cases.
2
Theorem 3.7 The LST of E(t) = [Em;n(t)] (m; n 2 ) of the MRSPN is given by: 1. for state m such that G (m) = ; : s (s) = m;n Em;n s + m
(3:32)
42 2. for state m such that G (m) = fdg, for n 2 (m) :
(s) = [s(sI ? Q(m))?1 f I ? F (sI ? Q(m)) g]m;n Em;n
(3:33)
(s) = 0: Em;n
for n 62 (m) :
Proof Equation (3.32) is obtained from taking LST on Equation (3.20). Equation (3.33) is obtained from Equation (3.21) by: (s) = Im;n + Em;n
?
Z
1
0+
Z
1
0+
e?st [Q(m) eQ(m)t ]m;n dt
?
Z
1
0+
e?st [Q(m) eQ(m)t ]m;n Fd (t) dt
e?st[eQ(m)t ]m;n dFd (t)
= [Q(m)(sI ? Q(m))?1 ? Q(m) F (sI ? Q(m)) ? F (sI ? Q(m))]m;n + Im;n = [Q(m)(sI ? Q(m))?1 ? fQ(m) (sI ? Q(m))?1 + I gF (sI ? Q(m))]m;n + Im;n = [s(sI ? Q(m))?1 f I ? F (sI ? Q(m)) g]m;n :
2
3.4 Steady State Analysis of MRSPNs In this section, we consider the steady state analysis of an MRSPN whose underlying SMP is nite and ergodic (irreducible, aperiodic and positive recurrent) so that the limiting probability distributions exist. These can be computed using the general theory of MRGP. We brie y review the theory below. De ne m = E[T1 j Y0 = m] ; mn = E[time spent by the marking-process in state n during [0; T1) j Y0 = m] =
Z
0
1
P fM(t) = n; T1 > t j Y0 = mg dt
(3.34)
and the steady state probability vector v = (vj ) of the EMC: v = vP;
X
j 2
vj = 1
43 where P = K(1) is the one-step transition probability matrix of the EMC de ned in Section 3.2. When G (m) = ;, T1 is the time until any of EXP transitions res. Hence, m = E[T1 j Y0 = m] = 1=m where m is as de ned in Equation (3.9). When G (m) = fdg, m is given by: m = E[T1 j Y0 = m] = =
Z
1
X
Z
x
0 n2 (m) 0 Z 1 X n2 (m) 0
[eQ(m)u ]m;n du dFd (x)
Lm;n (x) dFd (x) :
(3.35)
The mn is computed by: mn = Note that m =
X
n2
Z
0
1
P fM(t) = n; T1 > t j Y0 = mgdt =
1
Z
0
Em;n(t) dt :
(3:36)
mn. By changing the order of integration in Equation (3.35), we get
Equation (3.36). When the GEN transition d is enabled concurrently with EXP transitions, then m is the same as E[X], where X is the time to re of the GEN transition. The following theorem [67] describes the steady state probability distribution of the MRSPN . We include the proof for completeness.
Theorem 3.8 Let fM(t); t 0g be an MRGP with embedded Markov renewal sequence f(Yn ; Tn); n 0g. Suppose v = (vj ) is a positive solution to v = vP and the SMP of the MRGP is nite and ergodic. The limiting distribution = (j ) of the state probabilities of the MRGP is given by: X
j = tlim !1 P fM(t) = j j Y0 = mg = where
kX 2
vk kj
k2
vk k
=
X
k2
k kj k
(3:37)
k = Xvk k : vr r r2
Proof
Let V (t) = [Vij (t)], Vij (t) = P fM(t) = j j M(0) = Y0 = ig as in Equation (3.6). From Equation (3.8), we have V (t) = E(t) + K V (t) which takes the form of the generalized
44 Markov renewal equation [20]. The steady state solution V (t) of the generalized Markov renewal equation is known as [20]: Z
lim V (t) t!1 ij
=
1X
vk Ekj (t) dt
0 k2
X
k2
vk k
:
(3:38)
Now, if we let 1fAg = 1 when the event A is true or 0 otherwise: kj = E[time spent by the M(t) in state j during [0; T1) j Y0 = k] = E[ = = =
T1
Z
0 1
Z
0 Z
0 Z
E[
1fM (t)=j g dt j Y0 = k]
1Z u
0
T1
Z
0
P fM(t) = j j Y0 = k; T1 = ug dt dHk(u)
1Z 1
0
t
1fM (t)=j g dt j Y0 = k; T1 = u] dHk (u)
P fM(t) = j j Y0 = k; T1 = ug dHk(u) dt
(changing the order of integration) = =
Z
0 Z
0
1 1
P fM(t) = j; T1 > t j Y0 = kg dt Eij (t) dt :
Substituting this result in Equation (3.38), we get the Equation (3.37).
2
Intuitively j is the fraction of time the marking-process spends in state j and is given by: j =
X
k2
(fraction of time the SMP of the MRSPN spends in state k) (the time the marking-process spends in state j per unit time of the SMP
spent in state k) X = k kj : k k2
45
3.5 Example We illustrate the analysis methods through an example of the M/G/1/2/2 queueing system. The M/G/1/2/2 queueing model represents two customers in the system each of which submits a job at exponentially distributed interval, and one server with generally distributed service times. The size of the queues for arriving jobs is two. Figure 3.3 shows the MRSPN , its reachability graph, and its reduced reachability graph. Each client is assumed to submit a job with an
p1 #
p2
p3
ti
ta
g p4
Transition Distribution Parameters EXP #(p1) ta g (0:5; 1:0) Uniform ti Immediate 1.0 (a) MRSPN
m1
2001
m4
1101
m2
1010
m3
0110
ta ti
m1 g
ta g
(b) Reachability Graph
m2 m3
2001
ta
g
1010
ta
0110
g (c) Reduced Reachability Graph
Figure 3.3: MRSPN and Reachability Graphs of M/G/1/2/2 System
46 interval that is exponentially distributed with rate = 0:5 job/hour, and the service time of a job is assumed to be uniformly distributed in the interval (0:5; 1:0) hours. The # sign above the transition ta indicates the marking-dependent ring rate as explained in Section 3.2. The transition ti is an immediate transition. Dashed (solid) arcs represent the state transitions by EXP (GEN) transitions and dotted arcs represent the state transitions by immediate transitions. We compute the state probability vector p(t) = (pj (t)), (j = 1; 2; 3) at time t with given initial marking m1 = (2001) so that pj (t) = V1j (t). The kernel K(t) = [Kij (t)] (i; j = 1; 2; 3) of this model is given as: 0t
b
8
0
is its second derivative, and f100(t) is 100th approximation sewhere f(t) is pj (t) and f(t) quence of f(t). The plot in Figure 3.4 shows the transient (time-dependent) state probabilities pj (t) over a time interval [0,6] along with the steady state probabilities j which are computed from Theorem 3.8. As expected, the transient state probabilities approach the steady state probabilities as time approaches in nity.
1+
1 p1(t) 2 p2(t) 3 p3(t)
0:8 + +
0:6
3
0:4
0:2
+
3 +
?
+++ ++++++ ++++++ 3 +++3 3++++3 3+++++ 3
???????????????????????????? ? 0 ? ? 1 2 3 4 5 6 0 Time t
Figure 3.4: Transient and Steady State Probabilities of M/G/1/2/2 System
Chapter 4 Modeling of Uniformly Distributed Firing Times In this chapter, we provide a special case of the equations shown in the previous chapter for a state m where a transition d with uniformly distributed ring time over the interval (a; b) is enabled.
4.1 Uniform Distribution A continuous random variable is said to have a uniform distribution UNI (a; b) over the interval (a; b); (b > a > 0) if its probability density function f(t) and the distribution function F(t) are given by: (
f(t) =
1
b?a 0
8
> > >
> > > > > :
[eQ(m)t ]m;n b ? t [eQ(m)t ] + 1 fL (t) ? L (a)g m;n m;n b?a b ? a m;n 1 b ? a fLm;n (b) ? Lm;n (a)g
0tb
(4:3)
50 2. if n 62 (m) but n 2 (m) : E
G
8 > > > > > > >
> > > > > > :
0
1 0 b ? a m 2 (m)fLm;m (t) ? Lm;m (a)g (m ; n) X
0
0
0tb
0
3. if n 2 (m) and also n 2 (m) : E
G
8 > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > :
[eQ(m)t ]m;n b ? t Q(m)t 1 m;n + b ? a fLm;n (t) ? Lm;n (a)g b ? a [e ]X fLm;m (t) ? Lm;m (a)g (m0 ; n) + b ?1 a m 2 (m) 0
0
0tb
0
4. if n 62 (m) and also n 62 (m) : E
G
Km;n (t) = 0;
t0 :
(4:6)
Proof These results are obtained by using the de nition of L(t) and applying Equation (4.1)
2
to Theorem 3.4.
Corollary 4.2 Let X be the ring time of GEN transition which is enabled in state m, and X UNI (a; b). The one-step transition probabilities from state m, Pm;n (m; n 2 ), of the EMC are given by: 1. if n 2 (m) but n 62 (m) : E
G
Pm;n = b ?1 a fLm;n (b) ? Lm;n (a)g
(4:7)
51 2. if n 62 (m) but n 2 (m) : E
G
Pm;n = b ?1 a
X
m 2 (m)
fLm;m (b) ? Lm;m (a)g (m0 ; n) 0
0
(4:8)
0
3. if n 2 (m) and also n 2 (m) : E
G
X Pm;n = b ?1 a fLm;n (b) ? Lm;n (a)g + b ?1 a f(Lm;m (b) ? Lm;m (a)g (m0 ; n) (4.9) m 2 (m) 0
0
0
4. if n 62 (m) and also n 62 (m) : E
G
Pm;n = 0:
(4:10)
Proof These results follow from Corollary 4.1 and the fact P = K(1).
2
Now we derive expression for Em;n(t).
Corollary 4.3 Let X be the ring time of GEN transition which is enabled in state m, and X UNI (a; b). The Em;n (t) (m; n 2 ) of the MRSPN are given by: 1. for n 2 (m), 8 > [eQ(m)t ]m;n 0t > > < (4:11) Em;n (t) = > b ? t [eQ(m)t ]m;n a t b b ? a > > > :
2. for n 62 (m),
0
t>b
Em;n (t) = 0 t 0:
Proof These results are obtained by applying Equation (4.1) to Equation (3.21).
2
The Laplace-Stieltjes transform (LST) of the K(t), E(t) are given as follows.
Corollary 4.4 Let X be the ring time of GEN transition which is enabled in state m, and X UNI (a; b). The LST of Km;n (t) (m; n 2 ) is given by:
52 1. if n 2 (m) but n 62 (m) : E
G
(s) = [ s(sI ? Q(m))?1 ? Km;n
(4.12)
?1 ?(sI ?Q(m))a ?(sI ?Q(m))b fe ?e g ]m;n Q(m)(sI ? Q(m))?1 (sI ?b Q(m)) ?a
2. if n 62 (m) but n 2 (m) : E
G
(s) = 1 Km;n b?a
X
m 2 (m)
[ (sI ? Q(m))?1 fe?(sI ?Q(m))a ? e?(sI ?Q(m))b g ]m;m (m0 ; n) (4:13) 0
0
3. if n 2 (m) and also n 2 (m) : E
G
(s) = [ s(sI ? Q(m))?1 ? Km;n
(4.14)
?1 ?(sI ?Q(m))a ?(sI ?Q(m))b fe ?e g ]m;n Q(m)(sI ? Q(m))?1 (sI ?b Q(m)) ?a X + b ?1 a [ (sI ? Q(m))?1 fe?(sI ?Q(m))a ? e?(sI ?Q(m))b g ]m;m (m0 ; n) m 2 (m) 0
0
4. if n 62 (m) and also n 62 (m) : E
G
(s) = 0; Km;n
t0:
(4:15)
Proof Follows from Theorem 3.6 and the LST of a uniformly distributed random variable: F (sI ? Q(m)) = (sI ? Q(m))?1 (b ? a)?1 fe?(sI ?Q(m))a ? e?(sI ?Q(m))b g:
2
Corollary 4.5 Let X be the ring time of GEN transition which is enabled in state m, and (s) (m; n 2 ) of the MRSPN is given by: X UNI (a; b). The Em;n 1. for n 2 (m), ?1
(s) = [ s(sI ? Q(m))?1 fI ? (sI ? Q(m)) fe?(sI ?Q(m))a ? e?(sI ?Q(m))b gg ]m;n Em;n b?a
2. for n 62 (m),
(s) = 0 Em;n
t 0:
Proof Follows from Theorem 3.7 and the LST of a uniformly distributed random variable. 2
Chapter 5 Modeling of Deterministic Firing Times Constant processing durations often arise in many practical problems. For instance, time-outs or propagation delays of communication networks are naturally associated with constant delays. The analysis method of MRSPN is revisited in this chapter for the case that a GEN transition d is enabled in state m and the ring time of d has a deterministic distribution, i.e., the ring time is a constant. We discuss this case in terms of Deterministic and Stochastic Petri Net (DSPN) in which general distributions are restricted to be deterministic distributions1.
5.1 Deterministic and Stochastic Petri Nets (DSPNs) The Deterministic and Stochastic Petri Net (DSPN) was rst introduced in [2] along with a steady state solution method. An improved numerical algorithm for steady state analysis was presented in [70]. An algorithm for parametric sensitivity analysis of the steady state solution was presented in [25]. DSPNs have been applied for modeling an Ethernet bus LAN [3], a fault-tolerant clocking system [72], and CSMA/CD protocol with deterministic collision resolution time [71]. Following [2], we assume that at most one deterministic transition is allowed to be enabled in each marking and the ring time of a deterministic transition is markingindependent. We will call a timed transition whose ring time is deterministically distributed a
DET transition. An example DSPN is shown in Figure 3.1(c). Figure 3.1(a) is also a DSPN
when transitions ts and tv are DET transitions. 1 This chapter is based on the paper Transient Analysis of Deterministic and Stochastic Petri Nets by H. Choi and V. G. Kulkarni and K. S. Trivedi in the 14th International Conference on Application and Theory of Petri Nets, Chicago, U.S.A., Jun. 1993.
53
54 The marking-process fM(t); t 0g underlying the DSPN is formed by the changes of the tangible markings over the time domain. Let be the state space of the marking-process of the DSPN. Consider a sequence fTn; n 0g, of epochs when the DSPN is observed. Let T0 = 0 and de ne fTn; n 0g recursively as follows: Suppose M(Tn +) = m. 1. If no DET transition is enabled in state m, de ne Tn+1 to be the rst time after Tn that a state change occurs. If no such time exists, we set Tn+1 = 1. 2. If a DET transition is enabled in state m, de ne Tn+1 to be the time when the DET transition res or is disabled. Note that there cannot be more than one DET transition enabled in state m, and hence the above cases cover all possibilities. Let Yn = M(Tn+).
Theorem 5.1 The marking-process fM(t); t 0g of a DSPN is a Markov regenerative process.
Proof
In Theorem 3.2, we have shown that the marking-process of a class of stochastic timed Petri nets with at most one GEN transition enabled in each marking and with the ring time distribution of the GEN transition being marking-independent, is a Markov regenerative process. A DSPN belongs to this class because the deterministic distribution is one of the general distributions. Therefore the marking-process of a DSPN is a Markov regenerative process. 2 Now that the DSPN is shown to be the MRSPN, the solution method for MRSPN can be applied for the DSPN.
5.2 Transient Analysis of DSPNs In this section, we develop K(t); E(t) matrices for computing the transient distribution of the marking-process fM(t); t 0g of the DSPN. De ne the kernel K(t) = [Kij (t)] as in Equation (3.5) and V (t) = [Vij (t)] as in Equation (3.6). As in Equation (3.8), E(t) = [Eij (t)] describes the behavior of the marking-process between two transition epochs of the EMC fYn; n 0g, i.e., over the time interval [0; T1). Consider a state m in . Let G (m) be the set of DET transitions and E (m) be the set of EXP transitions enabled in state m. As in Chapter 3, de ne (m) to be the set of all states reachable from m in which the marking-process can spend a non-zero time before the ring of
55 the DET transition d or competitive EXP transitions enabled in state m, i.e., before the next EMC transition occurs2 . Next, de ne (m) to be the set of states which can be reached starting from m (not necessarily directly) by ring of a competitive EXP transition. Similarly, de ne
(m) to be the set of states reached by ring of the DET transition d. E
G
Using the above notation, we now describe the kernel K(t).
Theorem 5.2 The kernel K(t) = [Km;n (t)] (m; n 2 ) of the marking-process of the DSPN is given by: 1. for state m such that G (m) = ;, (
Km;n (t) =
0
m = 0 m > 0
(m;n) (1 ? e?m t)
m
(5:1)
where (m; n) is the transition rate from marking m to n in the reduced reachability graph, 2. for state m such that G (m) = fdg with being the ring time of transition d, if n 2 (m) but n 62 (m) : E
G
8
t, the marking-process fM(u); 0 u tg is a CTMC whose in nitesimal generator matrix is Q(m). 1. Suppose G (m) = ;. Then the ring of any EXP transition triggers the state change of the EMC. The probability of marking-process being in state n at time t (before the EMC state change occurs) given that it entered state m at time 0 is the probability the marking-process stays at the initial state m until time t, i.e., M(u) = M(0) for all 0 u < T1 . The fM(u); 0 u < T1 j M(0) = mg in this case is a degenerative CTMC that stays in state m. Therefore: P fM(t) = n; T1 > t j Y0 = mg = m;n (1 ? Hm (t)) = m;n e?m t where Hm (t) is de ned in Equation (2.8).
59 2. Suppose G (m) = fdg. Then the EMC state change is triggered either at the time of ring of d or when d is disabled. In this case, we have the set (m) which consists of the states that marking-process can visit starting from m until the next EMC state change occurs. Since the EMC state change occurs after time t, the marking-process at time t will remain in (m) . Therefore,
8 n 62 (m);
Em;n(t) = 0 :
For state n 2 (m), the marking-process is captured by the subordinated CTMC with generator matrix Q(m) which describes the behavior of the marking-process over the time interval [0; T1). The transition probability of this CTMC at time t is: P fM(t) = n; T1 > t j Y0 = mg =
(
[eQ(m)t ]m;n 0
t 1)
90
POA
PCH
1
tc0
PSI PSA
tc2 tc1
PTI 1
PTA tta
tts
PTB
ttb PTT
PTW
ttt
Figure 7.6: SRN for Tagged Client Subsystem in CSMA/CD Network
91 transitions tts, tc0, tc1 and tc2 represent the event that the tagged client is contending for the channel. Transition tc0 represents the event that the tagged client, the superclient (one or more members in the superclient subsystem) and the server, all attempt simultaneously to gain access to the idle channel. Transition tc1 represents the event that both the tagged client and the server attempt to gain access to the idle channel. Transition tc2 represents the event that both the tagged client and the superclient attempt to gain access to the idle channel. All these events result in collision. The only event that leads to the client successfully gaining access right of the channel is tts since it is enabled when only the tagged client attempts to gain access to the idle channel, i.e., there is no PN token in either POA or PSA , and there is a PN token in PTA and one in PCH . The common ring rate of these transitions is . After the ring time of tts elapses, i.e., a collision detection time, a PN token is put in the place PTT which represents the condition that the client has gained access right of the channel. The place PTB represents the condition that the client is in the backo state. The timed transition ttb has a ring rate of , and its ring denotes the end of a backo period. Figure 7.7 shows the SRN for the superclient subsystem. This gure varies from Figure 7.6 in a few ways. First, the timed transitions toa and tob have marking dependent ring rates. The place POB denotes the condition that a client is in the backo state. There is a new set of conditions that can occur when the channel is sensed idle and both the server and the tagged client are idle (or in the backo state). If, under this condition, POA contains two or more PN tokens then the timed transition tc4 is enabled. When it res all the PN tokens in POA are removed and an equivalent number of PN tokens are deposited in POB . (We have used a dierent symbol, the bow-tie ./, to denote the variable marking of two or more PN tokens.) This corresponds to the condition that two or more clients generate requests within the period of vulnerability when both the tagged client and the server are idle. These active clients' requests will then collide and all aected clients will enter the backo state. Figure 7.8 shows the SRN for the server subsystem. Since we have a FIFO queue at the server, we ensure that a new reply cannot be transmitted until after a previous reply has been successfully transmitted. The place PSE represents the condition that a reply has been generated to be sent to a client, and PSA represents the condition that the reply has reached the head of the queue and is ready to be transmitted. The immediate transition s1 is enabled when PSE contains at least one PN token (i.e., a reply is available), PSA contains no PN token (i.e., no reply is currently contending for the channel), PST contains no PN token (i.e., no reply is being
92
N-1
#
POI
toa
POA
POB
tc4 # tob
tos POT
tot
POW
PCH
1
tc0 tc3
PSA
PSI
tc2
PTA
Figure 7.7: SRN for the Superclient Subsystem in CSMA/CD Network
93
POA
N-1
POW
POI
PCH
1
tc0
POH
PSB
tc3 tsb PSI
PSE
PSA
s1
tsa
s2
tss
PST
s3 tst
PSW s4
tc1
PTI
1
PTA PTW
Figure 7.8: SRN for the Server Subsystem in CSMA/CD Network
94 transmitted after gaining access right of the channel), and PSB contains no PN token (i.e., the server is not in a backo state). The places PTW , POH and POW operate in the same manner described in Figure 7.4. Figure 7.9 shows the SRN for the complete system.
7.1.4 Numerical Results We now present the numerical results of solving the SRN models in Section 7.1.2 and 7.1.3 with the following parameters. Consider a network with a channel capacity of 10 Mbits/second and a signal propagation velocity of 2 108 m/second. We assume that the cable length is 2 Km and the request packet length is variable with mean size 1000 bits/packet (which corresponds to 1= = 0:1 ms), and that the reply packet length is also variable with mean 4 Kbytes/packet (which corresponds to 1= = 3:2 ms). For the CSMA/CD network we assume that the collision detection time (including time for collision reinforcement) is variable with mean 1= = 0:01 ms and that the backo time is variable with mean 1= = 0:5 ms. For the token ring network we assume that the mean token length is 3 bytes, which corresponds to a mean walk time 1= = f[0:01=(N + 1)] + 0:0024g ms, where N is the number of client workstations. Finally, we assume that the mean time taken by the server to produce a reply is 1= = 2 ms. We de ne the oered load by = N[(1=) + (1= )]. Let denote the probability that the tagged client is idle, and the mean response time at a client workstation, that is, the mean time that elapses from the instant a request is generated until the reply to the request is received. Then 1= : = (1=) + This gives
= (1 ? )=():
Since we obtain by solving the CTMC of the SRN, we can easily compute . Table 7.1 shows the mean response time (in seconds) of the client-server systems in a token ring network and a CSMA/CD network for dierent values of and N when the mean reply packet length is 4 Kbytes (i.e., 1= = 3:2 ms). Table 7.2 shows the mean response time obtained with a mean reply packet length of 8 Kbytes (i.e., 1= = 6:4 ms). Table 7.3 shows the size of the
95
#
N-1 POI
tc4
POA
toa
POB
#
tob
tos POT
POW
tot
PCH
1
tc0
PSB
POH
tc3 tsb PSI
PSA
tsa
s1
PSE
s2
tss
PST
s3 tst
PSW s4
tc2
tc1
PTI
1
PTA tta
tts
PTB
ttb PTT
ttt
PTW
Figure 7.9: SRN for the CSMA/CD Network-Based System
96 models, the number of states in the CTMCs and the number of nonzero entries in the CTMC matrix. The sizes of the Markov models increase rapidly as the number of stations increases. Observe that when the mean reply packet length is 4 Kbytes and = 0:9; the value of is approximately 30 requests/second when N = 10. In an environment where no reply packet is required subsequent to a successful transmission (i.e., there is no server, hence it is a plain token ring or CSMA/CD network system), the oered load would be 0 = N= = 0:03: The corresponding mean response time (i.e., mean time to gain access right of the channel plus packet transmission time) would be approximately 0.0001 seconds: the time to transmit a request packet, because oered load to the network is so small that the mean time to gain access right is negligible. Thus, the presence of the server signi cantly increases the mean response time. Also, the performance of the system based on CSMA/CD network is better than that of the token ring network based system when the oered load, , is small. As increases, the token ring based system performs better than the CSMA/CD based system because of increased collisions in the latter. This is the same phenomenon that occurs in plain local area network systems, i.e., a CSMA/CD network performs better than a token ring network at a low trac. The crossover point depends on the length of the reply packet: the higher the reply packet length, the later the crossover. This fact is depicted in Figures 7.10 and 7.11 which compare the mean response times of the token ring based system and CSMA/CD based system with 5 stations for the reply packet of length 4 Kbytes and 8 Kbytes respectively. Note that a value of such as = 6:6 in client-server systems corresponds to 0 = 0:2 in a plain token ring or CSMA/CD LAN system, which is a small value of oered load in a practical network. Thus, the range of values of used in Figures 7.10 and 7.11 are of practical interest. Figure 7.12 shows the throughput (the mean number of requests processed per second) of the server for the both systems with respect to the oered load, and Figure 7.13 shows average number of messages at the server which are either being served or waiting for the service. It is easy to con rm that the throughput and the mean number of messages at the server are related with each other. As we expected, the throughput of the client-server system in token ring network gets better than that in CSMA/CD network as the oered load to the network increases. This is because, in CSMA/CD network, the server station experiences more frequent collisions when it tries to send reply packets to client workstations. The delay in the reply packets blocks the client workstations that are waiting for the replies for their requests. As a
97
Table 7.1: Mean Response Time (Mean Reply Packet Length : 4 Kbytes) 0.1 0.3 0.5 0.7 0.9 1.1 1.3 1.5
N = 5 N = 7 N = 10 CSMA/CD Token Ring CSMA/CD Token Ring CSMA/CD Token Ring 0.005789 0.005807 0.005835 0.005860 0.005870 0.005905 0.006824 0.006879 0.007055 0.007129 0.007257 0.007353 0.007916 0.007991 0.008466 0.008561 0.009009 0.009120 0.008989 0.009053 0.009960 0.010024 0.011032 0.011065 0.009988 0.010019 0.011432 0.011424 0.013168 0.013053 0.010889 0.010873 0.012805 0.012710 0.015262 0.014975 0.011683 0.011616 0.014040 0.013859 0.017204 0.016762 0.012376 0.012257 0.015126 0.014870 0.018943 0.018377
Table 7.2: Mean Response Time (Mean Reply Packet Length : 8 Kbytes) 0.1 0.3 0.5 0.7 0.9 1.1 1.3 1.5
N = 5 N = 7 N = 10 CSMA/CD Token Ring CSMA/CD Token Ring CSMA/CD Token Ring 0.009232 0.009264 0.009298 0.009342 0.009350 0.009408 0.010806 0.010967 0.011137 0.011356 0.011423 0.011699 0.012504 0.012818 0.013301 0.013735 0.014078 0.014625 0.014227 0.014672 0.015681 0.016291 0.017269 0.018022 0.015898 0.016436 0.018139 0.018868 0.020839 0.021699 0.017463 0.018062 0.020551 0.021350 0.024563 0.025457 0.018896 0.019530 0.022823 0.023664 0.028222 0.029118 0.020186 0.020838 0.024905 0.025772 0.031656 0.032553
Table 7.3: State Space and Storage Requirements N = 5 N = 7 N = 10 No. of Nonzero No. of Nonzero No. of Nonzero System States Entries States Entries States Entries CSMA/CD 1,478 4,271 5,542 18,132 24,033 86,772 Token Ring 1,880 4,400 7,420 18,536 34,210 90,050
98
0:014 0:013 + 3
0:011 Response Time in seconds 0:01
+ 3
3 + 3 +
0:009 0:008 0:007 0:4
+ 3
+ 3
0:012
+ 3
Ring, 4K 3 CSMA, 4K +
3 + 0:6
0:8
1
1:2 1:4 Oered Load
1:6
1:8
2
Figure 7.10: Mean Response Times : 5 Client Stations, 4 Kbytes Reply Packet
99
0:032 0:03
3 +
3 +
+ 3
+ 3
+ 3
+ 3
9
10
3 +
0:028 Response Time in seconds 0:026
3 +
3 +
0:024 0:022
3 + 2
Ring, 8K 3 CSMA, 8K + 3
4
5
6 7 Oered Load
8
Figure 7.11: Mean Response Times : 5 Client Stations, 8 Kbytes Reply Packet
100
300 250 200 Throu- 150 ghput 100
+ 3
3 +
+ 3
50 0
+ 3
3 +
Ring 3 CSMA +
+ 3 0
3 +
3 3 + + 3 + 3 + 3 +
0:5
1
1:5 Oered Load
2
2:5
3
Figure 7.12: Throughput of the Server : 5 Client Stations, 4 Kbytes Reply Packet result, the overall message generation from the client workstations becomes degraded, hence the throughput does not increase even if the oered load increases.
7.1.5 Sensitivity Analysis Sensitivity is another important factor in system design. It is very useful for bottleneck analysis and optimization of a system. During the design stage of a system, it is usual that the exact values of input parameters for the system model are not known. Parametric sensitivity analysis [44] provides a means of studying the eects of these uncertainties on the output measure of the model. It is performed to determine the parameters to which the model is sensitive and the degree of sensitivity. The rates and probabilities of transitions in an SRN model can be de ned as functions of some independent parameter (e.g., message generation rate or transmission rate etc.) and the eect of variation in this parameter on the output measures can be studied. Let the sensitivity functions be the derivatives of state probabilities with respect to the
101
1:1 1 0:9 3 + + 0:8 3 3 + + 3 Ave. 0:7 + 3 No. 0:6 + 3 of Mess- 0:5 + 3 ages 0:4 + 3 Ring 3 0:3 + CSMA 0:2 + 3 0:1 3 + 0 0 0:5 1 1:5 2 Oered Load
3 +
2:5
3 +
3
Figure 7.13: Average Number of Messages at the Server : 5 Client Stations, 4 Kbytes Reply Packet parameters. By dierentiating Equation (2.6) with respect to , we obtain a system of equations @ Q = ? @Q ; @ @
@i = 0: i @
X
The sensitivity functions @ @ are obtained by solving the above system of equations in much the
same way as Equation (2.6). Here @Q @ is the derivative of the in nitesimal generator matrix with
@ respect to (see Equation (6.5) for the computation of @Q @ ). Once we have @ , the parametric sensitivities of the performance measures of interest can be obtained.
Figure 7.14 shows normalized derivatives of the throughput ( @T @ , where T is throughput and 2 f; ; ; ; ; g), i.e., the increment of the throughput with respect to a fractional change in the parameter values of ; ; ; ; or of the system in CSMA/CD network with 4 Kbytes reply packets. It is interesting to note that, unlike a plain CSMA/CD network, (collision detection rate) and (backo rate) are not the main factors that aect the clientserver system's performance even if the system performance is gradually getting sensitive to as the number of client workstations increases. The main parameters that aect the throughput
102 are rather (request generation rate), (reply transmission rate) and (reply generation rate), in that order. In order to improve the throughput, we have to increase generation rate of request and reply or increase the reply transmission rate (shorter transmission time, smaller reply packets).
7.1.6 Accuracy of the Superclient Approximation The main approximation made in the model of the ring-network based system is the aggregation of multiple clients into a superclient subsystem. Although there is considerable saving that accrues in the storage space and solution time, errors are inevitable. The main loss of delity is that the location-dependency among the N ? 1 clients is not considered in the approximate model. Suppose a PN token has arrived at POS from POP of Figure 7.5. That means the superclient subsystem has just gained the access right. Consider a situation where the rst and last members of the N ? 1 clients group have request messages ready. Then place POA has 2 PN tokens from POI . Since both POA and POS have PN tokens, tos res (the request from the rst client is transmitted). After ring ttp, a PN token returns to POS , and since there still remains one PN token in POA which represents the request from the last client, tos res again. As a result, the request from the last client is served as if it is from the second client. In the real situation, a delay of (N ? 2) (the mean walk time) is needed for the last client to gain the access right. In the case that some intermediate client ahead of the last generates a request, the time to reach the last client becomes even longer. The location of requesting stations in superclient subsystem is not depicted in Figure 7.5. They are served just in FIFO order of request generation. It is obvious that the response time of the last station in real situation becomes much dierent from the approximate model's in this case. Figure 7.15 shows the exact model of a client-server system with 5 clients based on a token ring network. There are 5 copies of tagged-client-like-subsystem. Places PWi (i = 1; :::; 5) represent the condition that a request is waiting for its reply at i-th slot of the FIFO queue from the tail. Thus PW 5 represents a request waiting for a reply at the head of FIFO queue. All PN tokens in PWi proceed to PW (i+1) whenever there are no PN tokens in PW (i+1) . The multiplicity of output arcs from tsi to PW 1 is i. The reason that we put dierent number of tokens in PW 1 is to identify the requesting stations. When the server station nishes transmitting a reply message, a PN token is put into PSW . Depending on i, the number of tokens in PW 5 , only
103
[Figure 7.14 consists of two plots of the normalized throughput derivative ∂T/∂θ versus the number of client workstations (4 to 10).]
Figure 7.14: Parametric Sensitivity Analysis of CSMA/CD Network-Based System
Table 7.4: State Space and Storage Requirements

                    Approximate Model   Exact Model   Ratio
  No. of States           1,880            26,672     7.05 %
  Nonzero Entries         4,400            66,192     6.65 %
Table 7.5: Mean Response Time When Mean Reply Packet Length is 4 Kbytes

          Approximate Model   Exact Model    Error
   0.1        0.005807         0.005807     0.0 %
   0.3        0.006879         0.006878     0.015 %
   0.5        0.007991         0.007989     0.025 %
   0.7        0.009053         0.009049     0.044 %
   0.9        0.010019         0.010013     0.06 %
   1.1        0.010873         0.010864     0.083 %
   1.3        0.011616         0.011604     0.103 %
   1.5        0.012257         0.012242     0.123 %
   2.0        0.013499         0.013477     0.163 %
   4.0        0.015771         0.015732     0.248 %
   8.0        0.016892         0.016839     0.315 %
  16.0        0.017329         0.017264     0.377 %
  20.0        0.017398         0.017330     0.392 %
one of the transitions si is allowed to fire, by associating enabling functions with each si. This implies that the reply the server just sent is for the i-th client. Table 7.4 compares the storage requirements of the approximate and the exact model. The ratio is defined as 100 × (size of approximate model / size of exact model). Table 7.5 shows the errors due to the approximation for the 5-client system with 4-Kbyte reply packets. The error is defined as 100 × (result of approximate model − result of exact model)/(result of exact model). We save about 93 percent of the space and execution time by using the approximate model at the cost of a small error in the results. Though the error increases as the offered load increases, it is very small, less than 1 percent.
Figure 7.15: Exact SRN Model for the Token Ring Network-Based System with 5 Client Stations
7.1.7 Summary

We have considered the use of Stochastic Reward Nets in evaluating the performance of client-server systems consisting of N workstations and one server. The performance evaluation of client-server systems is different from that of plain token ring and CSMA/CD network systems, which are widely studied and reported in the literature, in that the latter only consider the message arrival pattern, the network access scheme and the physical aspects of message transmission, while the former also considers the interdependent behaviors of the clients and the server. We have developed SRN models of the systems based on both the token ring network and the CSMA/CD network. Because these models have a large state space, computer-aided solutions are required. In this study, we used a software package called the Stochastic Petri Net Package (SPNP). Numerical results were obtained for 5-, 7-, and 10-station systems. However, a model with a larger number of stations is easily constructed. It is also easy to extend the model to a system with more than one server. Our results are for systems in which the message transmission time distributions are exponential. However, we can model systems in which these distributions are not exponential but general, as long as the stochastic timed Petri net models satisfy the condition of MRSPNs. One of the major drawbacks of the Stochastic Reward Net method is the fact that the storage requirement can be large, since the state space of the reachability graph is generally large. We have used a superclient aggregation approximation to considerably reduce the size of the state space without incurring significant error. Further use of decomposition to solve even larger system models is currently under investigation.
7.2 Polling Systems

The polling system is a system of multiple queues served by a single server in cyclic order2. Many applications in computer communications are based on the polling model, for instance, the data transfer between remote terminals and a central computer in the 1960s, where each terminal is connected to the computer on a separate transmission line switched in cyclic order among them [26, 104], the data link control protocol of multi-drop lines in the 1970s [37], the token passing schemes in LANs in the 1980s [16, 39, 60], and recent studies of the ISDN. Polling systems have been studied extensively in the literature ([51, 107] provide good surveys). Most of the efforts deal with infinite population, symmetric polling systems where all queues in a polling system have the same characteristics, such as the job arrival time distribution and rate, the service time distribution and rate, and the polling (switchover) time distribution and rate. However, many real systems are asymmetric and the number of users who access a queue is finite. Only a few papers have dealt with finite population polling systems [5, 61, 106]. Ibe and Trivedi [61] analyzed asymmetric finite population polling systems which have exponentially distributed interarrival and service times, for gated, exhaustive and limited service policies. In [106], Takagi studied the asymmetric finite population polling system with general service times and an exhaustive service policy. Modeling polling systems by means of Generalized Stochastic Petri Nets (GSPNs) and analyzing them numerically by solving the underlying continuous-time Markov chain (CTMC) is a recent approach [5, 61]. Modeling a large system such as a polling system is done more easily and efficiently by using GSPNs than by using the Markov model directly. The behavior of the system can be concisely represented by GSPNs. Moreover, the difficulties in analyzing asymmetric polling systems are the same as for symmetric polling systems, and finite population polling systems are modeled easily. The GSPN approach, however, also has some intrinsic problems. The time for an event to occur must be exponentially distributed in GSPNs. But this shortcoming can be overcome by using phase-type expansions of non-exponential distributions [35, 59, 75, 79, 86] or by using the non-exponential distribution directly as discussed in Chapter 3. Next, the underlying Markov chain converted from a GSPN usually has a large state space.
The memory needed for the Markov chain matrix may exceed the memory capacity of computer systems even for relatively small models. Thus, directly or indirectly (via GSPN) applying the Markov chain technique to model real systems is computationally intractable. To overcome the largeness problem, approximation methods which simplify the overall model yet provide reasonable accuracy must be developed. There have been several approximation methods at the Markov chain level [12, 33, 108]. These methods are based on reducing the Markov chain state space of a system to save the cost of solving the chain, and thus can be applied only after the whole Markov chain of the system becomes available. However, generating the Markov chain of a large, complex system is itself expensive. In this study, we suggest approximation methods at the Petri net level, i.e., methods which simplify the Petri net structure of a large system so that the size of the underlying Markov chain becomes considerably smaller. In this way, we avoid generating a huge Markov chain besides reducing the solution cost.

The objective of this study is to present analytic-numeric models of polling system performance. Such models provide a cost-effective means of evaluating different design alternatives. The approach we use is that of Stochastic Reward Nets (SRNs), a class of GSPNs. Since the CTMC that results from the GSPN can become intractably large, we present a fast yet accurate approximate model of the polling systems. The section is organized as follows. In Section 7.2.1, SRN models of the single service polling system are described. The largeness problem in using SRNs is also pointed out. A brief overview of the fixed point iteration method is given in Section 7.2.2. In Section 7.2.3, an approximation method for both symmetric and asymmetric polling systems is proposed and numerical results are presented. Since our approximation method is based on fixed point iteration, the existence of a solution to the fixed point equation is proven in Section 7.2.4.

2 This section is based on the paper "Approximate Performance Models of Polling Systems using Stochastic Petri Nets" by Hoon Choi and Kishor S. Trivedi, presented at the IEEE INFOCOM '92, Florence, Italy, May 1992.
7.2.1 Stochastic Reward Net Models of Polling Systems

One-level Model of Polling Systems

The SRN based models of polling systems were introduced in [61]. Among those models, the asymmetric finite population, single service (limited service policy with limit = 1) polling model is used in this study as an example to show the approximation methods. This is the model of a LAN with network interface units to which multiple user systems connect in order to access the network. A network interface unit of the LAN becomes a node of the network, and thus corresponds to a queue of the polling model. Figure 7.16 shows the model with 5 nodes. This model will be referred to as the one-level model and will be compared with the approximate models developed later in this study. Tokens in places PjI, j = 1, 2, ..., 5, of Figure 7.16 represent the number of potential users that can generate messages to transmit at node j. Place PjI initially contains Mj tokens, the population size of the users at the node. If Mj = 1 for all j, this model corresponds to the single buffer polling system. A token in place PjB represents the condition that a job has been queued at node j, in other words, a message has been generated and waits to be transmitted. A token in PjS represents the condition that the server has arrived at node j, and a token in PjP represents the condition that the server is polling node j. A timed transition is represented by a rectangle and an immediate transition is represented by a bar in this figure. Firing the timed transition taj models the generation of a message. The firing rate of taj is marking-dependent, that is, it is proportional to the number of tokens in PjI. A marking-dependent firing rate is indicated by the pound symbol (#) placed next to the transition in the SRN. Firing the timed transition tpj means the server has arrived after polling node j. When the server arrives at a node, it may find messages waiting (tokens in PjB). If there are messages, the server transmits one message and commences polling the next node. If there is no message waiting, it immediately commences polling the next node (the immediate transition sj fires). The service (message transmission) time at node j is assumed to be exponentially distributed with mean 1/μj, and the polling time of node j is assumed to be exponentially distributed with mean 1/γj. With this SRN model of the polling system, we can compute various performance measures such as the mean cycle time (defined as the mean time to complete one cycle of service to all nodes, that is, the time from the instant that a Petri net token leaves a place PjS to the instant that the token returns to that place), the mean response time (defined as the mean time from the instant a message is generated until it has been transmitted, that is, the time from the instant that a Petri net token is moved into PjB until it returns to place PjI), and the mean queue length (defined as the mean number of users waiting for service in the queue, that is, the mean number of Petri net tokens in PjB).
Figure 7.16: SRN Model of a 5-Node, Finite Population, Single Service Polling System

The mean response time is computed as follows [111]:

    E[Rj] = Mj / Tput(taj) − 1/λj ,    j ∈ {1, 2, ..., N}                     (7.1)

where Rj denotes the response time, λj is the generation rate of one message, and Tput(taj) is the throughput of transition taj of node j. Since Tput(taj) can be obtained by a tool such as SPNP [30], the mean response time is easily computed. The mean queue length is also easily obtained by computing the steady state expected reward rate, using the number of tokens in PjB as the reward rate.
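As a quick illustration of Equation (7.1), the snippet below plugs a hypothetical throughput value into the response time law; all numbers are assumed for illustration only and do not come from the experiments reported here.

```python
# Hypothetical illustration of Equation (7.1): E[Rj] = Mj / Tput(taj) - 1/lambda_j
M_j = 3          # user population at node j (assumed)
lam_j = 0.5      # message generation rate per idle user (assumed)
tput_aj = 1.2    # throughput of transition taj, e.g. as reported by a tool (assumed)

mean_response_time = M_j / tput_aj - 1.0 / lam_j
print(mean_response_time)   # 2.5 - 2.0 = 0.5 time units
```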
Largeness Problem

Markov models for complex systems, such as a polling system, tend to have a large number of states. Even though the specification of the Markov model can be done efficiently with an SRN, generating the Markov model from the Petri net and solving it still requires large storage space and long execution time. Table 7.6 shows the sizes of the CTMCs of the polling models: the number of states of the Markov chain and the number of non-zero entries in the infinitesimal generator matrix. The table is for the one-level model with different numbers of nodes and 3 users at each node. Table 7.7 shows the storage requirements of the model with N = 6 (6 nodes) under different values of M, the user population per node, that is, the maximum number of tokens in place PjI. As we see in these tables, the storage needs increase drastically as M or N increases, and the increase is much faster in M than in N. Table 7.8 compares the execution time (user CPU time) needed to solve one-level models on a Sun 4/280 with 64 Mbytes of main memory and the UNIX operating system. For a 7-node polling system with 3 users at each node, the required user CPU time was 98,874.8 seconds (27 hours, 27 minutes and 55 seconds). The actual running time including system time is much longer than this, possibly several days depending on the (multiuser) system's total load. Since the execution time grows very fast as the model size increases, solving a bigger model, say a 20-node system, would be infeasible [61]. The largeness problem can be relieved to some degree by using a sparse storage technique (as is done in SPNP [30]). Nevertheless, for a large number of nodes or for a large number of users at a node, the Markov chain size will exceed the current limits of the system storage. Thus we need approximation methods to overcome the largeness problem.
Table 7.6: State Space and Storage Requirements, M = 3

  N   CTMC States   Non-Zero Matrix Entries
  2        56              138
  3       336             1080
  4      1792             7104
  5      8960            42240
  6     43008           235008
  7    200704          1247232

Table 7.7: State Space and Storage Requirements, N = 6

  M   CTMC States   Non-Zero Matrix Entries
  1       576             2208
  2      7290            35964
  3     43008           235008

Table 7.8: Execution Time of One-level Models (in sec)

  N    M = 1     M = 2      M = 3
  2      1.1       3.1        7.4
  3      3.3      19.0       70.9
  4     10.7     113.0      656.9
  5     35.3     662.7     5302.8
  6    102.5    3392.1    39296.1
  7    327.2   16198.3    98874.8
7.2.2 Fixed Point Iteration Method

Many models of practical computer or communication systems are impossible to solve exactly even if favorable Markovian assumptions are made. The exact analysis of these systems is often too costly due to the large size of the state space, unless the model has a product form solution. An approach that often produces acceptable results in such cases is to make simplifying assumptions
about the dependencies between system components, isolate subsystems which are solvable, and feed the solutions for those subsystems back into the original model. Each subsystem is analyzed with the remainder of the system represented in a simplified manner. The methods are iterative in that the parameters of the simplified complement of a subsystem are modified after the other subsystems are analyzed. The subsystem is then re-analyzed with the modified parameter values for the remainder of the system. This process is continued until some convergence criterion is satisfied. This relation is well represented by a fixed point equation, that is, an unknown quantity is expressed as a function of itself.
Definition 7.1  Any solution x of the equation x − f(x) = 0, where x is an unknown scalar or vector, that is, any point x in the domain of f for which x = f(x), is a fixed point of f, and the above equation is called a fixed point equation.

Often, there are two or more sets of equations and corresponding sets of parameters, say x and y. The two sets of equations take the form

    y = f(x)   and   x = g(y).

These equations can be reduced to a single set of fixed point equations,

    x = g(f(x))   or   y = f(g(y)).

Applications of the fixed point iteration method can be found in [31, 38, 56, 78, 110]. To solve the fixed point equations, a numerical iterative method, called the fixed point iteration method, is usually used. An iterative method is a method in which we choose an arbitrary x0 and compute a sequence x0, x1, x2, ... recursively, until there is no significant change in the values of x, from a relation of the form

    xi+1 = f(xi)    (i = 0, 1, 2, ...)

where f is defined in some space containing x0 and the range of f lies in that space.
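As a minimal sketch of this reduction, the code below iterates x = g(f(x)) for a made-up pair of scalar functions f and g (they are not the polling-model equations) and stops when successive iterates differ by less than a tolerance ε.

```python
# Minimal fixed point iteration sketch for x = g(f(x)); f and g are
# arbitrary contractive functions chosen only for illustration.
import math

def f(x):           # y = f(x)   (assumed example function)
    return 0.5 * math.cos(x)

def g(y):           # x = g(y)   (assumed example function)
    return 1.0 + 0.5 * y

def fixed_point(h, x0=0.0, eps=1e-9, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_next = h(x)
        if abs(x_next - x) < eps:      # convergence criterion
            return x_next
        x = x_next
    raise RuntimeError("did not converge")

x_star = fixed_point(lambda x: g(f(x)))
print(x_star, abs(x_star - g(f(x_star))))   # residual ~ 0
```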
7.2.3 Approximation of Polling System Performance

We propose an approximation method based on a fixed point iteration. This method works for both symmetric and asymmetric polling systems. We divide the N-node model into 2 groups: a tagged-node submodel and a submodel for the remaining (N − 1) nodes. Since the remaining (N − 1) nodes merely act like a delay with respect to the tagged node, we can approximate the system by substituting the (N − 1)-node submodel with a single transition representing a delay. Then the reduced model becomes a single-node model with server vacation. The mean vacation time is the mean delay for serving the (N − 1) nodes. Figure 7.17 shows this model, labeled Si(N).
Figure 7.17: Approximate Model of a Polling System: Si(N)

The places PiI, PiB, PiS and PiP for node i ∈ {1, 2, ..., N} are the same as those of the one-level model. The place PiV and the transition tdi are for the surrogate delay due to the remaining (N − 1) nodes. In order to obtain the mean response time at node i, we solve the model Si(N) with a proper delay rate, δi, for the transition tdi. Since we do not know how long this delay is, we cannot solve Si(N) directly. Instead, we solve it by an iterative method with the given parameters and an initial guess for δi. Define

    di : the mean delay due to node i
    di(k) : the mean delay due to node i when there are k messages in the queue of that node when the server arrives
    P(PiB = k) : the steady state probability of having k messages in the queue of node i
    Di : the sum of the delays of all nodes except node i
    Tput(t) : the throughput of a transition t.

Then di can be approximated by:

    di = Σ_{k=0}^{Mi} di(k) P(PiB = k)                                        (7.2)

    Di = Σ_{j∈A, j≠i} dj ,    A = {1, 2, ..., N}.                             (7.3)
The di(k) is obtained by computing the mean time to absorption of the model Ti(k) of Figure 7.18 for node i, with different values of k (0 ≤ k ≤ Mi). In the single service polling system, di(k) is the mean polling time plus the mean service time of node i (1/γi + 1/μi) for 1 ≤ k ≤ Mi. P(PiB = k) is computed by solving Si(N) with the δi of the previous iteration step, or with the initial guess for the first iteration. Once we get Di, δi is computed as:

    δi = 1 / Di                                                               (7.4)

since δi represents the mean delay rate due to all nodes other than node i. This δi is used when we solve the model Si(N) with respect to node i at the next iteration step.
Figure 7.18: SRN Model for Computing the Mean Delay: Ti(k)

This method results in a coupled system of N fixed point equations. δi can be represented as δi = fi(d1, d2, ..., di−1, di+1, ..., dN), and di = gi(δi), for some functions fi, gi; therefore we have, for suitable functions Fi:

    δ1 = F1(δ2, δ3, δ4, δ5, ..., δN)
    δ2 = F2(δ1, δ3, δ4, δ5, ..., δN)
    δ3 = F3(δ1, δ2, δ4, δ5, ..., δN)
    ...
    δi = Fi(δ1, δ2, ..., δi−1, δi+1, ..., δN)
    ...
    δN = FN(δ1, δ2, δ3, δ4, ..., δN−1).
 1.  for i = 1 to N
 2.      for k = 0 to Mi
 3.          solve Ti(k) to get di(k)
 4.  δi(1) ← δ0            /* δ0 : initial value */
 5.  repeat
 6.      for i = 1 to N
 7.          solve Si(N) with δi(m) to get P(PiB = k) for each k
 8.          compute di as in Equation (7.2)
 9.      for i = 1 to N
 10.         compute Di as in Equation (7.3)
 11.         compute δi(m+1) as in Equation (7.4)
 12. until (|δi(m+1) − δi(m)| < ε)
 13. compute E[Ri] as in Equation (7.1)
Figure 7.19: Algorithm for the Mean Response Time Computation

This means that δi depends on all the other δj's and, implicitly, on itself. Thus, to find δi iteratively, we need to solve Si(N) and compute δj for each node j (j ∈ {1, 2, ..., N}) at each iteration step. We continue the iteration until there is no significant change in the values of δj from one iteration to the next. Then we compute the mean response time at node i by Equation (7.1). Tput(tai) is obtained by solving Si(N) with the fixed point value of δi. Figure 7.19 summarizes the procedure.
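The sketch below mirrors the structure of the algorithm in Figure 7.19. The SRN solvers are stand-ins: solve_T(i, k) and solve_S(i, delta_i) are hypothetical helper functions that would be implemented with a tool such as SPNP; only the iteration skeleton is intended to be faithful to the procedure above.

```python
# Skeleton of the fixed point iteration of Figure 7.19 (hypothetical helpers).
# solve_T(i, k) -> d_i(k); solve_S(i, delta_i) -> (P(PiB = k) for k = 0..Mi,
# throughput of t_ai).  Both stand in for SRN solutions.
def approximate_response_times(N, M, lam, solve_T, solve_S, delta0=1.0, eps=1e-6):
    d_k = {(i, k): solve_T(i, k) for i in range(N) for k in range(M[i] + 1)}
    delta = [delta0] * N                                    # initial guesses
    while True:
        d = []
        for i in range(N):
            prob, _ = solve_S(i, delta[i])                  # queue length distribution
            d.append(sum(d_k[(i, k)] * prob[k] for k in range(M[i] + 1)))   # Eq. (7.2)
        new_delta = [1.0 / (sum(d) - d[i]) for i in range(N)]               # Eqs. (7.3), (7.4)
        if max(abs(a - b) for a, b in zip(new_delta, delta)) < eps:
            delta = new_delta
            break
        delta = new_delta
    # mean response times via Equation (7.1)
    return [M[i] / solve_S(i, delta[i])[1] - 1.0 / lam[i] for i in range(N)]
```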
Table 7.9: State Spaces and Execution Time of Models, N = 5, M = 3

                          Si(N)    one-level    ratio
  states                    11        8960      0.123 %
  nonzero entries           19       42240      0.045 %
  execution time (sec)    165.5      5302.8     3.121 %

Table 7.9 compares the computation costs of the one-level model and this approximate model for the 5-node, 3-user asymmetric polling system. The execution time, which is the total time to carry out the computations of Figure 7.19, is measured in seconds on a Sun 4/280 computer with 64 Mbytes of main memory and the UNIX operating system. The ratio is defined as (value by model Si(N) / value by one-level model) × 100.
Table 7.10: Mean Response Time at Node 1, M = 3

   ρ     Model Si(N)   one-level model    error
  0.1     0.891402        0.888061       0.376 %
  0.2     0.982582        0.980364       0.226 %
  0.3     1.09088         1.08962        0.116 %
  0.4     1.22294         1.22055        0.196 %
  0.5     1.38478         1.37599        0.639 %
  0.6     1.58487         1.55974        1.611 %
  0.7     1.82989         1.77337        3.187 %
  0.8     2.12901         2.02159        5.314 %
  0.9     2.46943         2.29479        7.610 %

Table 7.11: Mean Response Time at Node 1, M = 1

   ρ′    Model Si(N)   one-level model    error
  0.1     0.831636        0.827575       0.491 %
  0.2     0.849999        0.845293       0.557 %
  0.3     0.869004        0.863147       0.679 %
  0.4     0.889042        0.881536       0.851 %
  0.5     0.909833        0.90018        1.072 %
  0.6     0.93153         0.919218       1.339 %
  0.7     0.953827        0.938353       1.649 %
  0.8     0.977181        0.958006       2.002 %
  0.9     1.00115         0.977939       2.373 %
Table 7.10 compares the mean response time obtained by this approximation method with that from the one-level model at node 1 of the 5-node, 3-user asymmetric system, with asymmetric message generation rates (node 1 generating at twice, and node 5 at half, the rate of nodes 2-4), and with γi = 294.12 and μi = 1.25 messages/msec for all i. The error is defined by 100 × (value by model Si(N) − value by one-level model)/(value by one-level model). The offered load, ρ, is defined as

    ρ = Σ_{i=1}^{N} Mi λi / μi .

Table 7.11 shows the mean response time at node 1 of the same system but when the user population is 1 (Mi = 1 for all i). In the tables, ρ′ is one third of the corresponding value in the M = 3 cases. The approximation method performs better at low offered loads than at high loads, as in the case of the mean cycle time approximation. It is a good approximate model for a LAN of personal computers where the user population of each node is 1. But even as the offered load increases, the errors remain small enough to use the method in practice, considering the huge saving in computation cost. The mean cycle time is the mean time for the server to traverse all the N nodes of the polling system in sequence. Since di is the mean time to traverse node i, di is the mean polling time plus the mean service time at the node. These di (1 ≤ i ≤ N) have already been computed by the fixed point iteration of Equation (7.2). Therefore, the mean cycle time can be obtained by:

    E[C] = Σ_{i=1}^{N} di .
7.2.4 Existence of a Fixed Point

The fixed point equations given in the previous section were solved by iterative methods with initial values. The two most common questions that arise regarding iterative methods are whether a solution exists and whether it is unique. Here we prove the existence of a solution. For simplicity, we show the existence of a fixed point solution for the mean response time of a symmetric system. We use the following theorem for the proof [89].

Theorem 7.1 (Brouwer Fixed Point Theorem)  Let Rn be the n-dimensional space of reals. If there exists a compact, convex set S ⊂ Rn and a continuous function f such that f(x) ∈ S for all x ∈ S, then there exists a solution to the equation f(x) = x.

Consider the Si(N) model (Figure 7.17) of a polling system where λi = λ, μi = μ, γi = γ, δi = δ and Mi = M for all nodes. Define
    d : the mean delay due to one node
    π(t) : the steady state probability that a transition t is enabled.

The mean response time E[R] is computed by:

    E[R] = M/Tput(ta) − 1/λ = M/Tput(ts) − 1/λ = M/(μ π(ts)) − 1/λ            (7.5)

where the last equality comes from the fact that the throughput of a transition is its effective firing rate, i.e., the nominal firing rate multiplied by the probability that the transition is enabled. The probability π(ts) can be obtained by solving the Petri net model when all of the parameters of the model are given. But, since δ is unknown, we obtain π(ts) by fixed point iteration. In Si(N) of Figure 7.17, π(ts) depends on all the parameters, i.e., M, N, λ, μ, γ, and δ. With the given values of the parameters M > 0, N > 1, and λ, μ, γ ∈ (0, ∞), the probability π(ts) becomes a function of δ:

    π(ts) = ψ(δ)                                                              (7.6)

for some function ψ. The δ represents the delay rate due to the (N − 1) nodes:

    δ = 1 / ((N − 1) d),    N > 1.                                            (7.7)

The d is computed from Ti(k) of Figure 7.18:

    d = 1/γ + (1/μ) π(ts).                                                    (7.8)

Applying Equation (7.8) to Equation (7.7), we get:

    δ = (1/(N − 1)) · 1/(1/γ + (1/μ) π(ts)) = g(π(ts)),    N > 1             (7.9)

for a continuous function g such that g(x) ∈ (0, ∞) for all x ∈ [0, 1]. By using Equation (7.8), we can also express the function ψ of Equation (7.6) in terms of the probability π(tp) ∈ [0, 1] (π(tp) is also dependent on δ), which shows that ψ is a continuous function with ψ(δ) ∈ [0, 1] for all δ ∈ (0, ∞). By combining Equation (7.6) and Equation (7.9), we obtain the following fixed point equation:

    π(ts) = ψ(δ) = ψ(g(π(ts))) = f(π(ts))

where f is a continuous function such that f(x) ∈ [0, 1] for all x ∈ [0, 1]. If we let S = [0, 1], S is the set of real numbers between 0 and 1, and S is a compact (closed and bounded) set. S is also a convex set because:

    αx + (1 − α)y = z ∈ S    for all x, y ∈ S, α ∈ [0, 1].

Therefore, from the Brouwer Fixed Point Theorem, there exists a solution to the equation π(ts) = f(π(ts)). The existence of a fixed point value of π(ts) proves that there exists a fixed point value of the mean response time of Equation (7.5).
7.2.5 Summary

In this study, an approximation method for the performance of polling systems has been suggested. The results from the one-level polling model and those from the approximate models were presented. The approximation method saves more than 95 percent of the computation cost without a concomitant loss in accuracy. The method performs very well at low offered loads. Even at higher loads, the accuracy of the mean response time computed by the fixed point iteration method is acceptable considering the saving in computation cost.
Chapter 8

Reliability Analysis using MRSPNs

In this chapter1, we investigate the reliability of systems. Among various reliability measures, we are particularly interested in the mean time to failure (MTTF) of a system, which is one of the most practical reliability measures. We introduce a new type of MTTF named conditional MTTF and provide its computational method in SRN models. The SRN is a subset of MRSPNs, as mentioned in Section 2.2.

1 This chapter is based on the paper Conditional MTTF and its Computation in Markov Reliability Models by H. Choi and K. S. Trivedi, presented at the 1993 Annual Reliability and Maintainability Symposium (RAMS), Atlanta, USA, Jan. 1993.
8.1 Introduction

In many practical situations, there are multiple causes of system failures and accordingly different modes of failure. For instance, failure of a computer system may be due to the failure of the processor, the memory, or the disk subsystem. In a disk subsystem, a failure may result in data destruction which is accompanied by a warning. Alternatively, data may be corrupted without being detected. The latter event is more serious than the former, so we would like the probability of the latter event to be small and the mean time to the occurrence of the latter event to be rather long. In a safety-critical system, the system design should provide a long mean time to an unsafe shutdown, while a short mean time to safe shutdown may be acceptable. In a fault-tolerant system, a failure due to imperfect coverage [13] is less desirable than a failure due to the exhaustion of redundancy.

The transient probability of system failure is easily broken down into its constituent causes and has been reported by many existing reliability modeling tools. While these probabilities can be used to estimate the system's susceptibility to its failure causes, the mean time to failure (MTTF) due to individual causes provides more practical information. The MTTF due to individual failure causes, or the MTTF to individual failure modes (states), is a relatively unexplored topic. Most of the previous studies about MTTF deal with the MTTF to a single group of failure states of a system [18, 19, 99]. The purpose of this study is to formalize the concept of the MTTF to individual failure state(s) satisfying a given condition, and to develop an efficient method of computing it in a finite state space CTMC, which is the underlying stochastic process of an SRN model. We show the reliability model of a system in SRN and its equivalent CTMC model through an example in Section 8.6.1. There have been some studies about MTTF associated with conditions. The MTTF from a given initial state is one example, and the mean residual life at time t [19], which is the expected time to failure given that the system has been operational up to time t, is another. In [64], the conditional expectation is defined as the expected time to failure given that the failure occurs within a specific time range. None of these studies, however, consider the MTTF to individual failure states. In [28], it is observed that the probability of absorbing into a partition of absorbing states from a given initial state may be computed as the accumulated reward until absorption by assigning zero reward rate to the states in the partition and positive reward rate to all other absorbing states. The possibility of computing the expectation and distribution of time given that the process is absorbed in a state with zero reward is also speculated upon. We give the notation and definition of conditional MTTF in Section 8.2. In Section 8.3, a method of computing the conditional MTTF is developed and the time complexity of the method is analyzed. The method of computing higher moments of the time to failure is investigated in Section 8.4. The conditional MTTF can be applied to any reliability model, but it is especially useful for a system which has multiple critical failure states and the conditional MTTF to one of the critical failure states is of particular interest. It is also useful for a system which has multiple non-critical failure states and the conditional MTTF to one of them is required. When a system has both critical and non-critical failure states, it may be required to compute the mean time to failure to a critical failure state disregarding the occurrence of non-critical failures. We discuss this cumulative measure of the time to failure in Section 8.5. In Section 8.6, conditional MTTFs are shown for several example systems.
8.2 Definition of the Conditional Mean Time To Failure

We first define the notation used in the chapter:

    {X(t), t ≥ 0} : a time-homogeneous, finite state, continuous-time Markov chain (CTMC)
    Ω : the state space of the CTMC
    T : the set of all transient states
    AF : the set of all absorbing (failure) states, AF = Ω − T
    A : the set of absorbing states which correspond to the given failure condition (A ⊆ AF)
    Y : a random variable representing the time for the Markov chain to absorb into A
    E[Y] : the expected value of Y
    f(y) : the probability density function of the absorption time
    FA(y) : the distribution of the finite absorption time, FA(y) = P{Y ≤ y and X(y) ∈ A}
    fA(y) : the probability density function of the finite absorption time, fA(y) = d/dy FA(y)
    F(y|A) : the conditional distribution of the time to absorption to A, F(y|A) = P{Y ≤ y | X(y) ∈ A}
    PA(∞) : the probability of absorbing into A, PA(∞) = P{X(∞) ∈ A}
    MTTF|A : the conditional MTTF to A
    CMTTF|A : the cumulative conditional MTTF to A
    1T : a column vector of size |T| with all 1's
    0T : a column vector of size |T| with all 0's.

In a Markov reliability model, failures are represented by absorbing states [111]. In a system with multiple subsystems, failures due to different causes/sources may be modeled either by a single absorbing state or by several different states representing individual failure modes. We assume that individual failure modes are present.
Figure 8.1: A Simple CTMC Model of System Reliability

Consider the simple CTMC reliability model of Figure 8.1, where state s0 leads to state s1 with rate λ1 and to state s2 with rate λ2. In state s0 the system is up, in state s1 the memory is up but the processor is down, and in state s2 the processor is up but the memory is down. Suppose that A = {s1, s2}, i.e., A = AF. Then E[Y] starting from state s0 is:

    E[Y] = MTTF = ∫_0^∞ y f(y) dy = 1/(λ1 + λ2).

Now suppose we are interested in the mean time to reach state s2 only (A = {s2} ⊂ AF), i.e., the mean time for the system to fail due to the memory. Then:

    Y = ∞                 with prob. p1 = λ1/(λ1 + λ2)
    Y = 1/(λ1 + λ2)       with prob. p2 = λ2/(λ1 + λ2),

in the sense of the conditional means given the absorbing state reached. Hence the expected value of Y is E[Y] = ∞ · p1 + (λ1 + λ2)^{-1} · p2 = ∞. As we see in this example, the conventional way of computing the expected time to failure to A does not give us meaningful information when A ⊂ AF. However, suppose we impose the condition that the system has reached A and inquire about the mean time to failure in that case. It can be seen that the expected finite absorption time [67] is:

    E[Y and Y < ∞] = E[Y | Y < ∞] · P{Y < ∞} = E[Y | X(Y+) ∈ A] · PA(∞).       (8.1)

The E[Y | Y < ∞] is the expected value of Y given that Y is finite. It is obvious that Y becomes finite only in case the system absorbs into A. Thus, the expected value of Y under the condition that the system will eventually reach A is the expected value of Y given that it is finite: E[Y | Y < ∞] = E[Y | X(∞) ∈ A]. Hence we can define the conditional mean time to failure by:

    E[Y | X(Y+) ∈ A] = E[Y and Y < ∞] / PA(∞) = (1/PA(∞)) ∫_0^∞ y fA(y) dy     (8.2)

which is the expected time until absorption into the set A of states given that the system eventually absorbs into A. The set A consists of the states that represent the given condition.
r1
r2
r3
r4
rm
Figure 8.2: A CTMC with m Absorbing States Consider a CTMC with m 1 absorbing states as in Figure 8.2. The set A of absorbing states of this CTMC is A = fr1; r2; r3; :::; rmg: The in nitesimal generator matrix Q = [qij ] of the nite state space, time-homogeneous CTMC consists of the direct transition rates from state i to j (i 6= j; i; j 2 ) and qii = ?
X
j;j 6=i
qij . Let the generator matrix Q be partitioned
so that the transient states appear rst followed by states in A, and followed by the remaining absorbing states which are collectively labeled as B. It is clear that qij = 0 (8j 2 ) for a state i in A or B: # " QTT QTA QTB 0 0 : (8:3) Q= 0 0 0 0 Here QTT is the partition of the generator matrix consisting of the transition rates between the states in T, similarly QTB has the transition rates from states in T to states in B. We assume A to have a single state so that QTA is jT j 1 matrix. In case that A consists of multiple absorbing states, we can easily aggregate these states into a single state. Note that the aggregation does not imply an approximate solution. De ne the state probability vector
126 P(t) = [PT (t); PA (t); PB (t)] where Pi(t) is the probability that the system state at time t is in R the set i; i 2 fT; A; B g. De ne the integrals of state probabilities, Li (t) = 0t Pi (u)du, and the corresponding vector L(t) = [LT (t); LA (t); LB (t)]: We (i) compute the absorption probability to A and (ii) obtain the conditional distribution of time to absorption to A in the form of Laplace-Stieltjes transform (LST). From the LST of the distribution, (iii) we compute the conditional MTTF. The absorption probability is the probability that the system absorbs into A with a given initial state distribution: P fX(1) 2 Ag = PA (1): In case that X(0) 2 B, PA (1) is simply 0 and if X(0) 2 A, then PA(1) is 1. Without loss of generality, we assume that the system starts in one of the transient states. In case that A = A , A includes all the absorbing states of the system model resulting in PA(1) = 1. In case that A A , i.e., when B is not empty, there is a possibility for the system to absorb into B, hence PA (1) < 1.
Proposition 1
The absorption probability to A is given by:
PA (1) = T QTA
(8:4)
where T is the solution of the linear system:
T QTT = ?PT (0):
(8:5)
Proof
R Let P (s) = 01 e?st P(t) dt be the Laplace transform (LT) of P(t). Taking the LT of both sides of forward Kolmogorov equation, dP(t) dt = P(t) Q (t > 0), we get:
1
Z
0
e?st dP(t) dt dt =
Z
0
1
e?st P(t) Q dt;
sP (s) ? P(0) = P (s) Q P (s)(sI ? Q) = P(0) :
t>0
127 That is,
"
sIT ? QTT ?QTA ?QTB 0 s 0 [PT (s); PA (s); PB (s)] 0 0 sIB
#
= [PT (0); PA(0); PB (0)] where IT ; IB are identity matrices with dimension jT jjT j, jB jjB j, respectively. By assumption, PA (0) = PB (0) = 0, hence:
and
PT (s) = PT (0)[sIT ? QTT ]?1
(8:6)
?1 PA (s) = PT (0)[sIT ?sQTT ] QTA :
(8:7)
From the nal value theorem of Laplace transform: ?1 PA(1) = tlim !1 PA (t) = slim !0 sPA (s) = ?PT (0)[QTT ] QTA :
The proof is then completed by denoting T = ?PT (0)[QTT ]?1.
2
We note that LT (1), the integration of state probabilities for the transient states up to time 1, is the vector consisting of the expected total time spent by the system in state i (i 2 T) until absorption: LT (1) =
Z
0
1
PT (t)dt = PT (0) = ?PT (0)[QTT ]?1 = T :
Hence, the MTTF of a system is obtained by summing up the components of the vector T [10, 48, 102]: (8:8) MTTF = T 1T : The distribution of time to failure (absorption) to the overall absorbing states ( A ) of a CTMC has been commonly studied. Let the unconditional distribution of time to absorption to A (A A ) be denoted by FA (t), and the conditional distribution of time to absorption to A be denoted by F(tjA).
128 Theorem 8.1
The conditional mean time to failure to A is given by:
Y < 1] = T QTA MTTF A = E[Y and PA(1) T QTA
(8:9)
j
where T is the solution of the linear system:
T QTT = ?T :
(8:10)
Proof
Let PA (s) be the LST of PA (t) so that: P (s) = A
Z
0
1
e?st dPA(t) = sPA (s) ? PA (0) = sPA (s):
Using Equation (8.7), we get:
We have, for t < 1:
PA (s) = PT (0)[sIT ? QTT ]?1QTA :
(8:11)
FA (t) = P fY t and X(t) 2 Ag = PA(t);
(8:12)
and X(1) 2 Ag = PA (t) = FA (t) : (8:13) F(tjA) = P fY t j X(t) 2 Ag = P fY P tfX( 1) 2 Ag PA(1) PA(1) Denoting by F (sjA), the LST of F(tjA), we have: FA(s) = PA (s) = PT (0)[sIT ? QTT ]?1QTA ; PT (0) ?1 F (sjA) = PPA((s) 1) = P (1) [sIT ? QTT ] QTA : A
A
(8:14)
From the LST of the conditional distribution, we can obtain the conditional MTTF to A: dF (sjA) MTTF A = slim ? !0 ds ?1[QTT ]?1QTA ?T [QTT ]?1QTA = : = PT (0)[QTTP] (1 T QTA A ) j
If we denote ?T [QTT ]?1 = T , we get Equation (8.9).
2
129 The Equation (8.9) agrees with Equation (8.13) where the distribution of the conditional time to absorption is obtained from the distribution of the unconditional time to absorption divided (normalized) by the absorption probability. Note that the expected nite absorption time E[Y and Y < 1] is indeed the mean rst passage time to A and this can be simply computed by T QTA . Any linear algebraic system solver can be used to solve Equations (8.5) and (8.10). A direct method like Gaussian elimination may be used for small Markov chains. Iterative methods are more practical for large and sparse models [47, 66, 93]. The matrix QTT is a non-singular, diagonally dominant matrix. Hence the convergence of an iterative method such as Gauss-Seidel or SOR (Successive Overrelaxation) is guaranteed [9].
8.3.2 Time Complexity of the Method Let the number of transient states be n (jT j = n) and the number of non-zero entries in the QTT matrix be . In order to compute the conditional MTTF, we rst compute the absorption probability in Equation (8.4) and the expected nite absorption time. The absorption probability is obtained by solving one system of linear equations for T and then a vector multiplication of size n. The expected nite absorption time is obtained by solving another system of linear Equations (8.10) followed by a vector multiplication of size n. We assume a sparse storage scheme and an iterative method such as Gauss-Seidel or SOR. We need O() scalar multiplications in one iteration step. The number of iterations k required for the convergence of solutions is determined by the speci ed tolerance, " [47]. The total number of scalar multiplications to solve a linear system is then O(k). The number of scalar multiplications needed to multiply 2 vectors of size n is O(n). Hence the overall complexity in terms of scalar multiplications to compute the conditional MTTF is O(2(k + n)).
8.4 The Second Moment of The Time To Failure It is worth noting that the second moment E[Y 2] of the time to failure when A = A (the time to absorption to overall absorbing states) can be computed from T of Equation (8.10), like the rst moment E[Y ] of the time to failure (MTTF) is computed from T as shown in Equation
130 (8.8). By following the argument of Section 8.3, we aggregate all the absorbing states into a single state labeled A. The generator matrix of the CTMC will look like: h Q = Q0TT Q0TA
i
:
Then LST of PA (t), which is LT for Y , is derived as: PA (s) = PT (0)[sIT ? QTT ]?1QTA : The second moment of time to failure to A is: 2 d2PA(s) = 2 T [QTT ]?1[QTT ]?1QTA : E[Y 2 ] = slim ( ? 1) !0 ds2
We have the following relation due to the property of generator matrix, i.e.,
(8:15) X
j 2
qij = 0:
QTT 1T + QTA = 0T : This means QTA = ?QTT 1T , as a result: E[Y 2 ] = ?2 T [QTT ]?1 1T = 2 T 1T = 2
X
i2T
i :
(8:16)
This result agrees with Grassman's [50]. Now that we know the second moment, we can easily compute the variance of the time to failure. This method can be generalized to the case of higher moments of the time to failure.
8.5 The Cumulative Conditional MTTF One useful application of the conditional MTTF is to compute an accumulation of MTTFs until the occurrence of a speci c event. We de ne the cumulative conditional MTTF as the mean time until failure since the system began operational satisfying the given condition and disregarding other possible system failures that do not satisfy the given condition. Recall that the conditional MTTF was the mean time to failure given that the upcoming failure is the one satisfying the condition.
131 An example of the cumulative conditional MTTF is mean time to critical (hazardous, unsafe) failure. Consider a system which has non-critical failure states and a critical failure state [63, 98]. In [98], the authors classi ed failures in a RAID (Redundant Arrays of Inexpensive Disks) system. The recoverable system failure is due to recoverable data errors, and the catastrophic failure is due to unrecoverable data errors or drive failure. In [63], the safe failure state and the unsafe failure state for the recon gurable duplex system are de ned. When the system experiences a non-critical failure and the failure can be detected, the system may be reset back to the initial state without a critical damage such as a loss of work or the corruption of data. Since the non-critical system failures do not incur critical damage, the critical failure is the main concern, and we like to evaluate the mean time until the critical failure occurs. Figure 8.3 shows the CTMC of the recon gurable duplex system in [63]. The system consists of two operating processors. Both processors carry out the same computations in parallel. The outputs of two processors are compared and any discrepancy indicates a fault in one of the processors. Each processor runs self-diagnostic routines that attempt to locate the source of the fault. If the fault is located, the system undergoes recon guration to isolate the faulty processor and operates with the remaining processor. If the fault is not located by the self-diagnostics after it is detected, the system is safely shutdown to be reset. The system is regarded as having unsafely failed if a fault occurs but neither the comparison process nor the self-diagnostics detects it. The probability that the comparison process detects the existence of a fault is CC , and the probability the self-diagnostics locate it is CS . 2(1 ? CC )(1 ? CS )
2
unsafe
2CC (1 ? CS )
safe
2CS
(1 ? CS )
1
CS
Figure 8.3: CTMC of The Recon gurable Duplex System Suppose we want to compute the mean time until the unsafe failure occurs. For simplicity we assume the time to reset the system from the safe failure state is negligible. This assumption
132 however can be easily removed to include a reset time. Let A = funsafeg, B = fsafeg. The cumulative conditional mean time to failure to A denoted by CMTTF A can be represented as the random sum of R copies of MTTF B followed by an MTTF A (Figure 8.4) where R is a random variable for the number of resets, i.e., the number of occurrences of safe failures before the unsafe failure occurs. R is modi ed geometrically distributed with parameter PA(1). Hence: j
j
j
CMTTF A = E[R] MTTF B + MTTF A j
j
j
1) MTTF + MTTF = PPB ((1 B A A ) j
=
j
PB (1) MTTF B + PA(1) MTTF A : PA (1) j
j
Using Equations (8.4) and (8.9): PA (1) MTTF A = T QTA ; j
PB (1) MTTF B = T QTB :
(8.17)
j
Therefore: QTB ) = CMTTF A = T (QPTA(+ 1) A
j
T (?QTT 1T ) PA (1)
1 1 Z 1 y f(y) dy: = PT (1T) = MTTF = PA (1) PA (1) 0 A
fail by B fail by B
0
fail by B
(8:18)
fail by A time
Figure 8.4: Relations between CMTTFjA (), MTTFjA (), MTTFjB ( )
133 Note that Equation (8.18) diers from Equation (8.2) in its probability density function. The CMTTF A is computed using f() whereas MTTF A is computed using fA (). The CMTTF A simply turns out to be MTTF=PA (1). These results hold when the system has one critical (unsafe) failure state as in this example. For more general cases, we next show a method of computing the cumulative conditional MTTF to one of the critical failure states when there is more than one such states. Observe that whenever safe failures occur, the system is reset to its initial state. This behavior can be captured by the modi ed CTMC obtained from the original CTMC by directing state transitions to the safe failure states (in the original CTMC) to the initial state. Figure 8.5 shows the modi ed CTMC based on the original one in Figure 8.3. Whenever the recon gurable duplex system moves to the safe failure state, it is reset to the initial state, the state 2. j
j
"
#
"
QTT + QTB 1B PT (0) Q Q Q = TT TA = 0 0 0 0
0
0
#
QTA : 0
The QTT consists of QTT of the original CTMC plus the newly added components QTB 1B 0
PT (0). The latter are the rates from T to B of the original CTMC which are to be redirected back to T according to the initial probability distribution of the system. The following result describes the method to compute the cumulative conditional MTTF in a Markov model.
Theorem 8.2
Suppose a Markov model has one or more critical failure states and one or more non-critical failure states. The cumulative conditional MTTF to one of the critical failure
134 states in the Markov model is the same as the conditional MTTF to that state in its modi ed model.
Proof
Without loss of generality, consider a CTMC, labeled Q, with two critical failure states A; B and one non-critical failure state C as in Figure 8.6 (a). The corresponding modi ed CTMC Q of the Q is shown in Figure 8.6 (b). The generator matrices of these Markov models are: 0
T
Q=4
QTT QTA QTB QTC 3 0 0 0 0 5; 0 0 0 0 0 0 0 0
"
#
QTT + QTC PT (0) QTA QTB Q = 0 0 0 : 0 0 0 0
The cumulative conditional MTTF in Q to A disregarding the failures to C is obtained by: CMTTF A (Q) = E[R] MTTF C (Q) + MTTF A (Q) j
j
j
) MTTF (Q) + MTTF (Q) = 1 ?PCP(1(1 C A C ) j
j
where R is a modi ed geometric random variable for the number of resets, i.e., the number of failures to C. Using the results derived earlier, we obtain: QTA = PT (0)[QTT ]?1[QTT ]?1QTA MTTF A (Q) = T Q ?PT (0)[QTT ]?1QTA T TA j
QTC = PT (0)[QTT ]?1[QTT ]?1QTC : MTTF C (Q) = T Q ?PT (0)[QTT ]?1QTC T TC j
135 Using the result PC (1) = T QTC , we have: T QTA + CMTTF A (Q) = 1 ?TQTC T QTA T QTC j
TT ]?1[QTT ]?1 QTC + PT (0)[QTT ]?1 [QTT ]?1QTA = P1T?(0)[Q (?PT (0)[QTT ]?1QTC ) (?PT (0)[QTT ]?1QTA )
= kuA suC + sA A
(8.19)
where uA = ?PT (0)[QTT ]?1QTA , uC = ?PT (0)[QTT ]?1QTC , sA = PT (0)[QTT ]?1[QTT ]?1QTA , sC = PT (0)[QTT ]?1 [QTT ]?1QTC and k = 1=(1 + PT (0)[QTT ]?1QTC ) which are all scalars. The conditional MTTF to A in Q is computed by: 0
TC PT (0)]?1[QTT + QTC PT (0)]?1QTA : MTTF A (Q ) = PT (0)[QTT?+P Q(0)[Q + Q P (0)]?1Q 0
T
j
TT
TC T
TA
It can be shown: [QTT +QTC PT (0)]?1 = [QTT fI+[QTT ]?1QTC PT (0)g]?1 = [I+[QTT ]?1QTC PT (0)]?1[QTT ]?1 : We can remove the inversion of sum of matrices by using Sherman-Morrison-Woodbury formula [47] as follows: ]?1QTC PT (0) : [I + [QTT ]?1QTC PT (0)]?1 = I ? 1 +[QPTT(0)[Q ]?1Q T
TT
TC
The PT (0)[QTT ]?1QTC of the denominator is a scalar, ?PC (1). Then MTTF A of Q can be rewritten as: 0
j
?1 TC PT (0)][QTT ]?1[I ? k[QTT ]?1QTC PT (0)][QTT ]?1QTA MTTF A (Q ) = PT (0)[I ? k[QTT?] P Q(0)[I ? k[Q ]?1Q P (0)][Q ]?1Q 0
T
j
TT
TC T
TT
TA
2 C sA + k uA uC sC : = sA + kuA sCu+ +kuku u A
A C
(8.20)
Now we compare Equation (8.19) and Equation (8.20). The product of the numerator of Equation (8.19) and the denominator of Equation (8.20) is: (kuA sC + sA )(uA + kuA uC ) = ku2A sC + uA sA + k2 u2A uC sC + kuA uC sA :
136 The product of the denominator of Equation (8.19) and the numerator of Equation (8.20) is: uA (sA + kuA sC + kuC sA + k2uA uC sC ) = uA sA + ku2A sC + kuA uC sA + k2 u2A uC sC : The cross product of two Equations (8.19) and (8.20) are the same. Therefore, CMTTF A (Q) = MTTF A (Q ) . 0
j
j
2
In the example of the recon gurable duplex system, one can con rm that the conditional MTTF to unsafe failure state in the modi ed CTMC (which is the same as MTTF in this case) is the same as the cumulative conditional MTTF to that state in the original CTMC: MTTF A of Q = CMTTF A of Q 0
j
j
S : = 2 (1 ? C1 ++2C C CS ) (1 ? CS )
8.6 Examples The concept of conditional MTTF is useful in many applications. In a queueing network model with nite size queues, a new arriving job at a node whose buer is full is rejected. If we think of buer over ow at a node as a failure, we can compute the mean time to over ows in the queueing network (MTTF) as well as the mean time to over ow at an individual node (conditional MTTF), and also the mean time to over ow at a speci c node ignoring the over ows at other nodes (cumulative conditional MTTF). In this section, we compute the mean time to failure of a communication network due to failures of each subsystem. We also compare several safety measures for simplex and duplex architectures of fault-tolerant systems.
8.6.1 A Communication Network Consider a communication network shown in Figure 8.7. Suppose there are 2 switches and 1 communication link and 6 user nodes. A switch fails with constant rate = 2:0=year, and the link fails with constant rate = 0:1=year. Since both the switches and the link are crucial for the network operation, the network is regarded to have failed as soon as one of the 2 switches fails or the link fails. A user node can fail with constant rate = 1:0=year, but once it fails
137 node 1
node (m+1) link
node 2
node m
node (m+2)
switch 1
switch 2
node N
Figure 8.7: System Con guration of an Example Network 0 2 2
S 2
6
1 5
L
2 4
U
Figure 8.8: Markov Chain Model of a Network System Reliability it is under repair by a single repair facility with the constant repair rate = 50:0=year. The network is regarded to have failed if 3 or more user nodes have failed at a time. A Markov chain model for this system is shown in Figure 8.8. The set of absorbing states is A = fU; S; Lg and the set of transient states is T = f0; 1; 2g and the initial state is 0. In state 0 all the subsystems are up. In state 1 (2) both the switches and the link are up, but there is 1 (2) failed user node. The state U represents the network failure due to the lack of operational user nodes. The state S (L) represents the network failure due to the switch (link) failure. The generator matrix of this Markov chain is:
138 2 6
Q = 664
0 1 2 ?(6 + 2 + ) 6 0 ?(5 + 2 + + ) 5 0 ?(4 + 2 + + ) 0 0 0 0 0 0 0 0 0
U 0 0 4 0 0 0
S 2 2 2 0 0 0
L 3 7 7 0 75 : 0 0
Suppose we are interested in (i) the MTTF of the network given that the network failure is due to the switch failure and (ii) the MTTF given that all the switches were operating at the time of failure. The MTTF of the system given that the network failure is due to the switch failure is the mean time for the system to absorb into the state S starting from the system's initial state 0. From Q, make the following partitions. "
QTT =
?(6 + 2 + )
" # 6 0 2 ?(5 + 2 + + ) 5 ; QTA = 2 2 ?(4 + 2 + + )
0
#
The initial probability vector is PT (0) = [P0(0); P1(0); P2(0)] = [1; 0; 0]: By solving the system of equations in Equation (8.5), "
[0 ; 1; 2]
?(6 + 2 + ) 0
# 6 0 = [?1; 0; 0] ?(5 + 2 + + ) 5 ?(4 + 2 + + )
we get T = [0:216204; 0:0236733; 0:00203729]: Similarly, by solving Equation (8.10), "
[0 ; 1 ; 2]
?(6 + 2 + ) 0
6 0 ?(5 + 2 + + ) 5 ?(4 + 2 + + )
#
= [?0:216204; ?0:0236733; ?0:00203729] we get T = [0:0517603; 0:0061315; 0:000562733]: Then by applying T ; T into Equation (8.9), we obtain the conditional MTTF to A = fS g. QTA = 0:241633 year MTTF A = T Q T TA j
Let us consider the example (ii). The MTTF given that all the switches were operating at the time of network failure is the mean time for the system to absorb either to U or L (A = fU; Lg).
139 The QTT matrix is the same as in the example (i). Since the condition will be satis ed when any of the two states in A has been reached, we aggregate U and L into one virtual absorbing state w. The transition rate from a transient state to w is the same as the sum of the rate to U and the rate to L. qiw = qiU + qiL (i 2 T) Hence
"
QTA = 4 +
#
:
The initial probability vector is again PT (0) = [P0(0); P1(0); P2(0)] = [1; 0; 0]: By solving the system of equations of Equations (8.5) and (8.10) we get, T = [0:216204; 0:0236733; 0:00203729] T = [0:0517603; 0:0061315; 0:000562733] which are already computed in the example (i). We nally compute the conditional MTTF to A = fU; Lg as: QTA = 0:250347 year: MTTF A = T Q T TA j
Let us compare the MTTF A with MTTF. MTTF to overall absorbing states is computed by Equation (8.8) as: MTTF = 0 + 1 + 2 = 0:241915 year: j
The second moment of the failure time to overall absorbing states is computed as in Equation (8.16) E[Y 2 ] = 2(0 + 1 + 2 ) = 0:11691 (year)2
and accordingly, the variance is computed as V ar[Y ] = E[Y 2 ] ? fE[Y ]g2 = 0:058387 (year)2 : Note that MTTFjfU;Lg is longer than MTTF or MTTFjfS g . This is because the system tends to absorb to L with the rate which is one third smaller (more slowly) than that to S from any of the transient states 0; 1; 2. Even though the system tends to absorb into U fast
140 when it in state 2 (4 = 4), this does not contribute much because the probability of reaching state 2 is very small. MTTFjfS g is less than MTTF. This agrees with our intuition because the system stays at the state 0 most of time before absorption (one can check the 0 is greater than 1 or 2 by order of magnitude.). At the state 0, the system absorbs into A = fS g with rate 2 = 4, or into L with rate = 0:1. That is, the system tends to absorb into A faster. Hence the conditional MTTF to S is smaller than MTTF which is averaged with slower failure times to either U or L.
8.6.2 Fault-Tolerant System Architectures Consider Markov models of commonly used fault-tolerant system architectures of Figure 8.9. We take into account coverage factor de ned as the conditional probability that the system is successfully recovered given that a fault has occurred. Consider the following three architectures. 1. S (Simplex system) 2. D (Duplex system) 3. DS (Duplex system recon gurable to Simplex system) The system S consists of a single processor. When a fault occurs (at the rate ) in the processor, the system will fail. The failure can be classi ed into 2 types. With probability CS , the fault is detected and the system shuts down safely. On the other hand, with probability 1 ? CS , the system does not detect the fault and it experiences an unsafe (critical, hazardous) failure. The probability CS is the coverage factor for a simplex system, it is the probability that the system detects the fault when it has occurred. Figure 8.9 (a) shows the Markov model of this system. In the operational state Oi, i (i 1) processors are operating. The states SF and UF represent safe and unsafe failure states, respectively. The system D consists of 2 identical processors executing the same task in parallel. When a processor generates a fault at the rate , the fault is covered with probability CD and the system is shutdown safely (Figure 8.9 (b)). The coverage factor CD in this case is the probability that the system detects the fault. After the execution of a task, the outputs from 2 processors are compared with each other. Because of the comparison, it is more likely to detect a fault when it has occurred. Thus the probability CD is naturally higher than CS (0 < CS < CD < 1)2 . 2 Even though a probability value may be either 0 or 1, this is almost never the case for a
[Figure 8.9: Architectures of Fault-Tolerant Systems — Markov models of (a) System S, (b) System D, and (c) System DS, with operational states O_i, safe failure state SF, and unsafe failure state UF.]
The system DS (Figure 8.9 (c)) also consists of 2 processors executing the same task in parallel. When a fault occurs, the system suffers an unsafe failure with probability 1 − C_DS. With probability C_DS, the fault is covered and the system is reconfigured into the system S. The coverage factor C_DS is the probability not only of detecting the fault but also of reconfiguring the system so that it keeps operating with the non-faulty processor. Since C_DS is the probability that both events are successful, C_DS is naturally smaller than C_D (0 < C_DS < C_D < 1).
Measures
We compare the three architectures with steady state measures such as the MTTF of the systems, the probability of unsafe failure, the conditional MTTF to unsafe failure, and the mean time between hazardous events (MTBHE) [74]. We note that a comparison of these architectures in terms of P_UF(∞) appears in [46]. MTBHE is obtained by computing the cumulative conditional MTTF to unsafe failure from the time the system began operation. Using the results derived earlier, we compute the measures as shown in Table 8.1. Note that, for the system S, the conditional MTTF to UF is the same as its MTTF.

Table 8.1: Dependability Measures for the Three Architectures

  Measure    | Architecture S   | Architecture D    | Architecture DS
  MTTF       | 1/λ              | 1/(2λ)            | 1/(2λ) + C_DS/λ
  P_UF(∞)    | 1 − C_S          | 1 − C_D           | 1 − C_S C_DS
  MTTF|UF    | 1/λ              | 1/(2λ)            | (1 + 2C_DS − 3C_S C_DS)/(2λ(1 − C_S C_DS))
  MTBHE      | 1/(λ(1 − C_S))   | 1/(2λ(1 − C_D))   | (1 + 2C_DS)/(2λ(1 − C_S C_DS))

One might think that the conditional MTTF to state UF is simply 1/(λ(1 − C_S)) because, by the condition, we know the system will move to state UF; we could then ignore the state transition to state SF. This is not true, because conditioning on the unsafe failure does not mean that the system is guaranteed to fail unsafely. The system can fail either safely or unsafely. The conditional MTTF to the unsafe failure state is the MTTF just in the case that X(∞) = UF. By ignoring the transition to the safe failure state, we obtain the MTBHE instead.
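The DS column of Table 8.1 can be cross-checked by applying the two-linear-solve procedure used earlier in this chapter to the Markov model of Figure 8.9 (c). The Python sketch below is illustrative only: the failure rate lam and the coverage values are assumed sample numbers, and the closed forms in the comments restate the table entries as reconstructed above.

import numpy as np

# Sketch: the system-DS model of Figure 8.9 (c) as an absorbing CTMC.
# lam, C_S and C_DS are illustrative assumptions, not values from the text.
lam, C_S, C_DS = 1.0e-4, 0.95, 0.97

# Transient states: O2, O1.  Absorbing states: SF (safe), UF (unsafe).
Q_TT = np.array([[-2.0 * lam, 2.0 * lam * C_DS],
                 [ 0.0,      -lam             ]])
q_UF = np.array([2.0 * lam * (1.0 - C_DS), lam * (1.0 - C_S)])   # rates into UF

P0     = np.array([1.0, 0.0])                 # the system starts in O2
tau    = np.linalg.solve(Q_TT.T, -P0)
taubar = np.linalg.solve(Q_TT.T, -tau)

mttf    = tau.sum()                           # = (1 + 2 C_DS) / (2 lam)
p_uf    = tau @ q_UF                          # = 1 - C_S C_DS
mttf_uf = (taubar @ q_UF) / p_uf              # = (1 + 2 C_DS - 3 C_S C_DS) / (2 lam (1 - C_S C_DS))
mtbhe   = mttf / p_uf                         # reproduces the Table 8.1 MTBHE entry for DS
print(mttf, p_uf, mttf_uf, mtbhe)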
Comparisons of Reliability and Safety Measures
We compare the three architectures with respect to a reliability measure (MTTF) and safety measures (the probability of unsafe system failure and MTBHE).

Reliability Measure
It is clear that the following relations among the MTTFs hold:
MTTF^S > MTTF^D,
MTTF^D < MTTF^DS,
MTTF^S < MTTF^DS   when C_DS > 0.5.
In order for any system to be dependable, the coverage factor of the system should be reasonably high. Most practical systems have a coverage value bigger than 0.9 [13, 40]. Considering this fact, the following relations hold in most cases:

MTTF^D < MTTF^S < MTTF^DS.    (8.21)
The MTTF of the duplex system is the smallest among the three architectures; it is even worse than that of the simplex system. This is because the rate of fault occurrence from two processors is double that from one processor. We confirm the well known fact that adding more processors does not necessarily improve the reliability of a fault-tolerant system [112].

Safety Measures
We first compare the probabilities of unsafe failure. Since C_S < C_D, we get P_UF^S(∞) = 1 − C_S > 1 − C_D = P_UF^D(∞). From C_D > C_DS and C_S < 1, P_UF^D(∞) = 1 − C_D < 1 − C_S C_DS = P_UF^DS(∞). Similarly, P_UF^S(∞) < P_UF^DS(∞). We finally get the following relations:

P_UF^D(∞) < P_UF^S(∞) < P_UF^DS(∞).    (8.22)
It is interesting to observe that Equations (8.21) and (8.22) give the reverse order among the three architectures. Even though the system DS provides the highest mean time to failure, its probability of failing unsafely is also the highest. This means that the system DS is more prone to fail in an unsafe manner once it fails, even though its expected time to failure is long. On the other hand, even though the mean time to failure of the duplex system is the lowest, its probability of unsafe failure is the smallest. We can conclude that when safety is the most important factor, or when the cost of recovery from an unsafe failure is very high compared with that of a safe failure, the duplex system is the most desirable choice among the three architectures.
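A quick numeric check of the orderings (8.21) and (8.22), using the Table 8.1 expressions (a sketch; the rate and coverage values are illustrative assumptions):

# Check the orderings (8.21) and (8.22) for sample parameter values.
lam, C_S, C_D, C_DS = 1.0e-4, 0.95, 0.99, 0.97

mttf = {"S": 1.0 / lam, "D": 1.0 / (2.0 * lam), "DS": (1.0 + 2.0 * C_DS) / (2.0 * lam)}
p_uf = {"S": 1.0 - C_S, "D": 1.0 - C_D,         "DS": 1.0 - C_S * C_DS}

assert mttf["D"] < mttf["S"] < mttf["DS"]     # Equation (8.21), since C_DS > 0.5
assert p_uf["D"] < p_uf["S"] < p_uf["DS"]     # Equation (8.22)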
Next, consider the MTBHEs of the three architectures. We first compare the MTBHEs of the simplex and the duplex systems. Figure 8.10 shows the relation between the MTBHEs and C_S, C_D (C_S < C_D). MTBHE^S is bigger than MTBHE^D when C_D < (C_S + 1)/2 (region I of the plot), and it is less than MTBHE^D when C_D > (C_S + 1)/2 (region II of the plot). From this plot we know that, for most practical values of C_D (> 0.9), MTBHE^D > MTBHE^S. We now compare the MTBHEs of the systems D and DS. MTBHE^D will be bigger than MTBHE^DS when C_D > C_DS(2 + C_S)/(1 + 2C_DS). Computing MTBHE^D and MTBHE^DS
[Figure 8.10: MTBHE with respect to C_S and C_D. The line C_D = (C_S + 1)/2 divides the region C_D > C_S into region I, where MTBHE^D < MTBHE^S, and region II, where MTBHE^D > MTBHE^S.]
numerically over C_D ∈ (0, 1), C_S ∈ (0, C_D), and C_DS ∈ (0, C_D), we have observed that MTBHE^D > MTBHE^DS most of the time, except when both C_DS and C_S are close to C_D (the difference is less than 0.05) in the range of C_D ≤ 0.95. In most practical cases, when C_D is appreciably larger than C_S, we expect MTBHE^D > MTBHE^DS. It can be concluded from this analysis that safety measures are highly sensitive to coverage values. Under the reasonable assumption that the duplex coverage (C_D) is larger than both the simplex coverage (C_S) and the duplex coverage with reconfiguration (C_DS), the duplex architecture turns out to be safer than the others in terms of the probability of unsafe failure and the mean time between hazardous events, even though its reliability is the worst. In conclusion, many practical problems demand the computation of the mean time to failure for a system with multiple causes of system failure. We have formalized the concept of associating a condition with the computation of the mean time to failure and developed an efficient computation method assuming a Markov reliability model. The method of computing the cumulative conditional mean time to failure and the second or higher moments of the failure time has also been discussed.
Chapter 9
Conclusion

9.1 Summary of the Dissertation
Stochastic timed Petri nets provide a good framework to model the dynamic behavior of concurrent and asynchronous systems. Various dependencies that often occur in distributed systems can be modeled by stochastic timed Petri nets. As a result, modeling of distributed computer-communication systems by stochastic timed Petri nets is a recent trend. We have introduced a new class of stochastic timed Petri nets called MRSPNs which can handle generally distributed event times. We have defined MRSPNs to be stochastic timed Petri nets whose underlying stochastic processes are Markov regenerative processes. We have shown a sufficient condition for stochastic timed Petri nets to be MRSPNs. We also have introduced MRSPN*, a subclass of MRSPNs which satisfies this condition, and have discussed an analytical solution technique for it. We have derived the kernel distributions, the equations for the steady state behavior, and the equations for the transient behavior of this new class of MRSPNs. The transient and steady state analysis of MRSPNs can be done analytically and numerically rather than by simulation. As a specific example of general distributions, we have provided the above equations for uniformly distributed firing times. We also have provided the equations for an MRSPN which has deterministic firing times. This latter class of MRSPN is called the DSPN. We have investigated the sensitivity of DSPN models by studying the variation of output performance with respect to changes of a system parameter. We have shown an algorithm for computing the sensitivity functions in terms of steady state probabilities of the continuous-time process
underlying the DSPN. Once we obtain the sensitivity functions, the sensitivities of the DSPN model output measures can be obtained accordingly. One of the problems in using stochastic Petri nets for real applications is that the size of the underlying Markov chain of a Petri net model tends to be so large as to be computationally intractable. Hence, in order to make the performance analysis of a large system tractable, we have developed approximation methods at the Petri net level that avoid not only the generation of the large state spaces of the underlying stochastic processes but also the computation of the large model.
9.2 Future Research
We observe that the condition given in Definition 3.2 is one of the sufficient conditions for stochastic timed Petri nets to be MRSPNs. More relaxed conditions are possible. For example, we can allow more than one GEN transition to be enabled in a marking. The restriction is that all the GEN transitions enabled in that marking must have started their enabling epochs at the same time point. The underlying stochastic process of a stochastic timed Petri net satisfying this condition is still an MRGP; thus such a stochastic timed Petri net is an MRSPN. We may also be able to identify necessary conditions for Petri nets to be MRSPNs. Efficient numerical methods are crucial for the computation of transient and steady state solutions of large models. The computational method described in this thesis needs a matrix inversion, which is costly for a large matrix. As an alternative, the numerical inversion method coupled with the SOR linear system solution method is recommended. Solving the system of integral equations in Equation (3.23) is another subject; we may be able to obtain the transient solution by partial differential equation solution methods. The study of the sensitivity of MRSPN models is another important subject and is currently being investigated [73]. Approximation of large Petri net models at the Petri net level using the near-independent decomposition method [31] is of interest. The computation of the conditional MTTF in semi-Markov process models and in Markov regenerative process models of system reliability might also be interesting. Some possible applications of reliability or performance modeling using MRSPNs are transient analysis of M/D/1 or MMPP/D/1 queues, communication network models with deterministic
event times (switch-over time, service time, etc.), and a system with multiple components, each with a failure rate, and a single repair facility having a generally distributed service time. MRSPNs can also be applied to the verification and performance evaluation of communication protocols [114], phased-mission analysis [32, 42, 90, 101], and preventive maintenance models [6, 8, 87]. In this thesis, MRSPNs have been shown to be useful in solving models with generally distributed event times, which occur in many practical systems. There is a broad range of applications for these models, and there is much work to be done in this field.
Appendix I : Proof of Equation (3.31)

\begin{align*}
\int_a^b e^{-sx}e^{Qx}\,dx
&= \int_a^b e^{-sx}\sum_{i=0}^{\infty}\frac{Q^i x^i}{i!}\,dx
 = \sum_{i=0}^{\infty}\frac{Q^i}{s^{i+1}}\int_a^b \frac{s^{i+1}x^i e^{-sx}}{i!}\,dx\\
&= \sum_{i=0}^{\infty}\frac{Q^i}{s^{i+1}}
   \left[\Big\{1-\sum_{k=0}^{i}\frac{(sb)^k e^{-sb}}{k!}\Big\}
        -\Big\{1-\sum_{k=0}^{i}\frac{(sa)^k e^{-sa}}{k!}\Big\}\right]\\
&\qquad\text{(by using a property of the Erlang distribution function)}\\
&= \sum_{i=0}^{\infty}\frac{Q^i}{s^{i+1}}
   \Big\{\sum_{k=i+1}^{\infty}\frac{(sb)^k e^{-sb}}{k!}
        -\sum_{k=i+1}^{\infty}\frac{(sa)^k e^{-sa}}{k!}\Big\}\\
&= \sum_{k=1}^{\infty}\sum_{i=0}^{k-1}\frac{Q^i}{s^{i+1}}\,\frac{(sb)^k e^{-sb}}{k!}
  -\sum_{k=1}^{\infty}\sum_{i=0}^{k-1}\frac{Q^i}{s^{i+1}}\,\frac{(sa)^k e^{-sa}}{k!}
   \qquad\text{(by changing the order of summation)}\\
&= \sum_{k=1}^{\infty}\frac{(sb)^k e^{-sb}}{s\,k!}\sum_{i=0}^{k-1}\Big(\frac{Q}{s}\Big)^i
  -\sum_{k=1}^{\infty}\frac{(sa)^k e^{-sa}}{s\,k!}\sum_{i=0}^{k-1}\Big(\frac{Q}{s}\Big)^i\\
&= \sum_{k=1}^{\infty}\frac{(sb)^k e^{-sb}}{k!\,s^k}(sI-Q)^{-1}\big(s^kI-Q^k\big)
  -\sum_{k=1}^{\infty}\frac{(sa)^k e^{-sa}}{k!\,s^k}(sI-Q)^{-1}\big(s^kI-Q^k\big)\\
&= e^{-sb}(sI-Q)^{-1}\sum_{k=1}^{\infty}\Big\{\frac{(sb)^k}{k!}I-\frac{(Qb)^k}{k!}\Big\}
  -e^{-sa}(sI-Q)^{-1}\sum_{k=1}^{\infty}\Big\{\frac{(sa)^k}{k!}I-\frac{(Qa)^k}{k!}\Big\}\\
&= e^{-sb}(sI-Q)^{-1}\big(e^{sb}I-e^{Qb}\big)-e^{-sa}(sI-Q)^{-1}\big(e^{sa}I-e^{Qa}\big)\\
&= (sI-Q)^{-1}\big\{I-e^{-(sI-Q)b}\big\}-(sI-Q)^{-1}\big\{I-e^{-(sI-Q)a}\big\}\\
&= (sI-Q)^{-1}\big\{e^{-(sI-Q)a}-e^{-(sI-Q)b}\big\}.
\end{align*}
□
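The identity can also be spot-checked numerically. The sketch below compares a trapezoidal evaluation of the left-hand side with the closed form on the right, for an arbitrary 2×2 generator matrix Q and arbitrary s, a, b (all values are illustrative assumptions).

import numpy as np
from scipy.linalg import expm

# Numerical spot-check of the identity above (a sketch; Q, s, a, b are arbitrary).
Q = np.array([[-3.0,  3.0],
              [ 1.0, -1.0]])
s, a, b = 0.7, 0.2, 1.5
I = np.eye(2)

# Left-hand side by trapezoidal quadrature of the matrix-valued integrand.
xs = np.linspace(a, b, 4001)
vals = np.array([np.exp(-s * x) * expm(Q * x) for x in xs])
lhs = np.trapz(vals, xs, axis=0)

# Right-hand side from the closed form (sI - Q)^{-1} { e^{-(sI-Q)a} - e^{-(sI-Q)b} }.
M = s * I - Q
rhs = np.linalg.solve(M, expm(-M * a) - expm(-M * b))

print(np.allclose(lhs, rhs))   # expected: True, up to quadrature error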
Appendix II : Abbreviations

CSMA/CD : Carrier Sense Multiple Access protocol with Collision Detection
CTMC : Continuous-Time Markov Chain
DET transition : a timed transition with deterministic firing time
DTMC : Discrete-Time Markov Chain
DSPN : Deterministic and Stochastic Petri Net
EMC : Embedded Markov Chain
ESPN : Extended Stochastic Petri Net
EXP transition : a timed transition with exponentially distributed firing time
FIFO : First In First Out
GEN transition : a timed transition with generally distributed firing time
GSPN : Generalized Stochastic Petri Net
ISDN : Integrated Services Digital Network
LAN : Local Area Network
LT : Laplace Transform
LST : Laplace-Stieltjes Transform
MMPP : Markov Modulated Poisson Process
MRGP : Markov Regenerative Process
MRSPN : Markov Regenerative Stochastic Petri Net
MTBHE : Mean Time Between Hazardous Events
MTTF : Mean Time To Failure
PN : Petri Net
SMP : Semi-Markov Process
SOR : Successive Overrelaxation
SPN : Stochastic Petri Net
SPNP : Stochastic Petri Net Package
SRN : Stochastic Reward Net
System D : Duplex system
System DS : Duplex system reconfigurable to Simplex system
System S : Simplex system
UNI : Uniform distribution
Bibliography [1] ANSI/IEEE Standard 802.5. An American National Standard, IEEE Standards for Local Area Networks: Token Ring Access Method and Physical Layer Speci cations. IEEE Press, 1985. [2] M. Ajmone-Marsan and G. Chiola. On Petri nets with deterministic and exponentially distributed ring times. In Lecture Notes in Computer Science, volume 266, pages 132{ 145. Springer-Verlag, 1987. [3] M. Ajmone-Marsan, G. Chiola, and A. Fumagalli. An accurate performance model of CSMA/CD bus LAN. In Lecture Notes in Computer Science, volume 266, pages 146{161. Springer-Verlag, 1987. [4] M. Ajmone-Marsan, G. Conte, and G. Balbo. A class of generalized stochastic Petri nets for the performance evaluation of multiprocessor systems. ACM Transactions on Computer Systems, 2(2):93{122, May 1984. [5] M. Ajmone-Marsan, S. Donatelli, and F. Neri. GSPN Models of Markovian Multiserver Multiqueue Systems. Performance Evaluation, 11(4):227{240, Nov. 1990. [6] S. L. Albin and S. Chao. Preventive replacement in systems with dependent components. IEEE Transactions on Reliability, 41(2):230{238, Jun. 1992. [7] M. D. Beaudry. Performance related reliability for computer systems. IEEE Transactions on Computers, C-27(6):540{547, June 1978. [8] F. Beichelt and K. Fischer. General failure model applied to preventive maintenance policies. IEEE Transactions on Reliability, 29(1):39{41, Apr. 1980.
151 [9] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. Academic Press, 1979. [10] J. T. Blake, A. L. Reibman, and K. S. Trivedi. Sensitivity analysis of reliability and performance measures for multiprocessor systems. In Proceedings of the 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 177{186, Santa Fe, U.S.A., May, 1988. [11] A. Bobbio and A. Cumani. Discrete state stochastic systems with phase-type distributed transition times. In Proceedings of the AMSE International Conference on Modelling and Simulation, pages 173{192, Athens, 1984. [12] A. Bobbio and K. S. Trivedi. An aggregation technique for the transient analysis of sti Markov chains. IEEE Transactions on Computers, COM-35(9):803{814, Sep. 1986. [13] W. G. Bouricius, W. C. Carter, and P. R. Schneider. Reliability modeling techniques for self-repairing computer systems. In Proceedings of the 24th Annual ACM National Conference, pages 295{309, 1969. [14] M. Bowman, L. L. Perterson, and A. Yeatts. Univers: An attribute-based name server. Software-Practice and Experience, 20(4):403{424, Apr. 1990. [15] S. Budkowski and P. Dembinske. An introduction of Estelle: A speci cation language for distributed systems. Computer Networks and ISDN Systems, 14:3{23, 1987. [16] W. Bux. Local-area subnetworks: A performance comparison. IEEE Transactions on Communications, COM-29(10):1465{1473, Oct. 1981. [17] W. Bux, F. H. Closs, K. Kuemmerle, H. J. Keller, and H.R. Mueller. Architecture and design of a reliable token-ring network. IEEE Journal on Selected Areas in Communications, SAC-1:756{765, Nov. 1983. [18] J. A. Buzacott. Markov approach to nding failure times of repairable systems. IEEE Transactions on Reliability, R-19:128{134, 1970. [19] V. M. Catuneanu and A. N. Mihalache. Reliability Fundamentals. Elsevier, 1989. [20] E. C inlar. Introduction to Stochastic Processes. Prentice-Hall, Englewood Clis, U.S.A., 1975.
152 [21] E. E. Chang, D. Gedye, and R. H. Katz. The design and implementation of a version server for computer-aided design data. Software-Practice and Experience, 19(3):199{222, Mar. 1989. [22] P. F. Chimento and K. S. Trivedi. The completion time of programs on processors subject to failure and repair. IEEE Transactions on Computers, To appear. [23] G. Chiola. A software package for the analysis of generalized stochastic Petri net models. In Proceedings of International Workshop on Timed Petri Nets, pages 136{143, Torino, Italy, Jul. 1-3 1985. [24] H. Choi, V. G. Kulkarni, and K. S. Trivedi. Transient analysis of deterministic and stochastic Petri nets. In Proceedings of The 14th International Conference on Application and Theory of Petri Nets, Chicago, U.S.A., Jun. 21-25 1993, To appear. [25] H. Choi, V. Mainkar, and K. S. Trivedi. Sensitivity analysis of deterministic and stochastic Petri nets. In Proceedings of MASCOTS'93, the International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 271{276, San Diego, USA, Jan 1993. [26] W. Chou. Terminal response time on polled teleprocessing networks. In Computer Networking Symposium Proceedings, pages 1{10, Dec. 13 1978. [27] G. Ciardo, A. Blakemore, Jr. P. F. Chimento, J. K. Muppala, , and K. S. Trivedi. Automated generation and analysis of Markov reward models using stochastic reward nets. In C. Meyer and R. Plemmons, editors, Linear Algebra, Markov Chains and Queuing Models, IMA Volumes in Mathematics and its Applications, Vol. 48. Springer-Verlag, 1993, To appear. [28] G. Ciardo, R. Marie, B. Sericola, and K. S. Trivedi. Performability analysis using semiMarkov reward processes. IEEE Transactions on Computers, C-39(10):1251{1264, 1990. [29] G. Ciardo, J. K. Muppala, and K. S. Trivedi. Analyzing concurrent and fault-tolerant software using stochastic reward nets. Journal of Parallel and Distributed Systems, pages 255{269, 1992. [30] G. Ciardo, J. K. Muppala, and K. S. Trivedi. SPNP: Stochastic Petri Net Package. In Proceedings of 3rd International Workshop on Petri Nets and Performance Models, pages 142{150, Kyoto, Japan, Dec. 1989.
153 [31] G. Ciardo and K. S. Trivedi. A decomposition approach for stochastic Petri net models. In Proceedings of the 4th International Workshop on Petri Nets and Performance Models, pages 74{83, Melbourne, Australia, Dec. 3-5 1991. [32] C. A. Clarotti, S. Contini, and R. Somma. Repairable multiphase systems{Markov and fault-tree approaches for reliability evaluation. In G. Apostolakis, S. Garribba, and G. Volta, editors, Synthesis and analysis methods for safety and reliability studies, pages 45{58. Plenum Press, New York, 1980. [33] P. J. Courtois. Decomposability: Queueing and Computer System Applications. Academic Press, 1977. [34] J. Couvillon, R. Freire, R. Johnson, W. D. Opal, M. A. Qureshi, M. Rai, W. H. Sanders, and J. E. Twedt. Performability modeling with UltraSAN. In Proceedings of the 4th International Workshop on Petri Nets and Performance Models, pages 290{299, Melbourne, Australia, Dec. 3-5 1991. [35] D. R. Cox. A use of complex probabilities in the theory of stochastic processes. In Proceedings of the Cambridge Philosophical Society, volume 51, pages 313{319, 1955. [36] A. Cumani. ESP - A package for the evaluation of stochastic Petri nets with phase-type distributed transition times. In Proceedings of International Workshop on Timed Petri Nets, pages 144{151, Torino, Italy, Jul. 1-3 1985. [37] R. J. Cypser. Communication Architecture for Distributed Systems. Addison-Wesley, 1978. [38] E. de Souza e Silva, S.S. Lavenberg, and R. R. Muntz. A perspective on iterative methods for the approximate analysis of closed queueing networks. In G. Iazeolla, P. J. Courtois, and A. Hordijk, editors, Mathematical Computer Performance and Reliability, pages 225{ 244. North-Holland, 1984. [39] R. C. Dixon, N. C. Strole, and J. D. Markov. A token-ring network for local data communications. IBM Systems Journal, 22(1,2):47{62, 1983. [40] J. B. Dugan and K. S. Trivedi. Coverage modeling for dependability analysis of faulttolerant systems. IEEE Transactions on Computers, 38(6):775{787, 1989.
154 [41] J. B. Dugan, K. S. Trivedi, R. M. Geist, and V. F. Nicola. Extended stochastic Petri nets: Applications and analysis. In E. Gelenbe, editor, Performance '84, pages 507{519. Elsevier Science Publishers B. V. (North-Holland), Amsterdam, Netherlands, 1985. [42] J. D. Esary and H. Ziehms. Reliability analysis of phased missions. In R. E. Barlow, J. B. Fussell, and N. D. Singpurwalla, editors, Reliability and Fault Tree Analysis: Theoretical and Applied Aspects of System Reliability and Safety Assessment, pages 213{236, Philadephia, 1975. SIAM. [43] W. Feller. An Introduction to Probability Theory and Its Applications, volume 2. John Wiley and Sons, Inc., 2 edition, 1971. [44] P. M. Frank. Introduction to System Sensitivity Theory. Academic Press, New York, 1978. [45] R. M. Geist, M. K. Smotherman, K. S. Trivedi, and J. B. Dugan. The reliability of life-critical systems. Acta Informatica, 23:621{642, 1986. [46] T. C. Giras, J. A. Profeta III, D. Bozzolo, C. Y. Choi, and B. W. Johnson. Safety issues in the comparative analysis of redundant architectures. In COMPRAIL 92, Washington DC, U.S.A., Oct. 1992. [47] G. H. Golub and C. F. Van Loan. Matrix Computations. John Hopkins University Press, Baltimore, U.S.A., 2 edition, 1989. [48] A. Goyal, S. Lavenberg, and K. S. Trivedi. Probabilistic modeling of computer system availability. Annals of Operations Research, 8:285{306, March 1987. [49] W. K. Grassman. Transient solution in Markovian queueing systems. Computers and Operations Research, 4:47{56, 1977. [50] W. K. Grassmann. Computational methods in probability theory. In D. P. Heyman and M. J. Sobel, editors, Stochastic Models, pages 199{254. North-Holland, 1990. [51] D. Grillo. Polling mechanism models in communication systems - some application examples. In H. Takagi, editor, Stochastic Analysis of Computer and Communication Systems, pages 659{698. North-Holland, 1990. [52] D. Gross and C. M. Harris. Fundamentals of Queueing Theory. John Wiley & Sons, Inc., second edition, 1985.
155 [53] D. Gross and D. Miller. The randomization technique as a modeling tool and solution procedure for transient Markov processes. Operations Research, 32(2):334{361, 1984. [54] P. J. Haas and G. S Shedler. Stochastic Petri nets with timed and immediate transitions. Communications in Statistics- Stochastic Models, 5(4):563{600, 1989. [55] S. Hauser, R. Mittu, C. Rivera, and G. Thoma. Performance study of a lan based image server. In Proceedings of 10th Annual International Phoenix Conference on Computers and Communications, pages 630{636, Scottsdale, U.S.A., Mar. 27-30 1991. [56] P. Heidelberger and K. S. Trivedi. Analytic queueing models for programs with internal concurrency. IEEE Transactions on Computers, C-32(1):73{82, Jan. 1983. [57] D. P. Heyman and M. J. Sobel. Stochastic Models in Operations Research, Volume I. McGraw-Hill, 1982. [58] M. A. Holliday and M. K. Vernon. A generalized timed Petri net model for performance analysis. IEEE Transactions on Software Engineering, SE-13(12):1297{1310, Dec. 1987. [59] M. C. Hsueh, R. K. Iyer, and K. S. Trivedi. Performability modeling based on real data: A case study. IEEE Transactions on Computers, 37(4):478{484, Apr. 1988. [60] O. C. Ibe and P. Y. Chen. Hybrinet: A hybrid bus and token ring network. Computer Networks and ISDN Systems, 9(3):215{221, Mar. 1985. [61] O. C. Ibe and K. S. Trivedi. Stochastic Petri net models of polling systems. IEEE Journal on Selected Areas in Communications, 8(10):1649{1657, Dec. 1990. [62] D. L. Jagerman. An inversion technique for the Laplace Transforms. Bell System Technical Journal, 61(8):1995{2002, Sep. 1982. [63] B. W. Johnson and J. H. Aylor. Reliability & safety analysis of a fault-tolerant controller. IEEE Transactions on Reliability, 35(4):355{362, Oct. 1986. [64] B. W. Johnson, J. Pet-Edwards, and A. J. Schwab. Conditional expectations in the evaluation of fault-tolerant systems. In Proceedings Annual Reliability and Maintainability Symposium, pages 242{247, Orlando, U.S.A., Jan. 1991. [65] J. Keilson. Markov Chain Models: Rarity and Exponentiality. Springer Verlag, Berlin, 1979.
156 [66] U. Krieger, B. Muller-Clostermann, and M. Sczittnick. Modelling and analysis of communication systems based on computational methods for Markov chains. IEEE Journal on Selected Areas in Communications, 8(9):1630{1648, 1990. [67] V. G. Kulkarni. Lecture Notes on Stochastic Models in Operations Research. University of North Carolina, Chapel Hill, U.S.A., 1990. [68] E. D. Lazowska, J. Zahorjan, D. R. Cheriton, and W. Zwaenepoel. File access performance of diskless workstations. ACM Transactions on Computer Systems, 4(3):238{268, Aug. 1986. [69] P. J. Leach, P. H. Levine, B. P. Douros, J. A. Hamilton, D. L. Nelson, and B. L. Stumpf. The architecture of an integrated local network. IEEE Journal on Selected Areas in Communications, SAC-1(5):842{857, Nov. 1983. [70] C. Lindemann. An improved numerical algorithm for calculating steady-state solutions of deterministic and stochastic Petri net models. In Proceedings of the 4th International Workshop on Petri Nets and Performance Models, pages 176{185, Melbourne, Australia, Dec. 3-5 1991. [71] C. Lindemann. A stochastic performance modeling technique for deterministic medium access schemes. In Proceedings of the Third Workshop on Future Trends of Distributed Computing Systems, pages 346{352, Taipei, Taiwan, Apr. 14-15 1992. [72] M. Lu, D. Zhang, and T. Murata. Analysis of self-stabilizing clock synchronixation by means of stochastic Petri nets. IEEE Transactions on Computers, C-39:597{604, 1990. [73] V. Mainkar, H. Choi, and K. S. Trivedi. Sensitivity analysis of Markov regenerative stochastic Petri nets. In Proceedings of The 5th International Workshop on Petri Nets and Performance Models, Toulouse, France, Oct. 20-22 1993, (submitted). [74] V. Mainkar, K. S. Trivedi, C. Constantinescu, and W. Wang. Safety analysis of some basic multiprocessor architectures, Jul. 1992. Internal Document. [75] M. Malhotra and A. L. Reibman. Selecting and implementing phase approximations for semi-Markov models. To appear in Stochastic Models, 1992. [76] J. F. Meyer. On evaluating the performability of degradable computing systems. IEEE Transactions on Computers, C-29(8):720{731, Aug. 1980.
157 [77] J.G. Mitchell and J. Dion. A comparison of two network-based le servers. Communications of the ACM, 25:233{245, Apr. 1982. [78] I. Mitrani. Fixed-point approximations for distributed systems. In G. Iazeolla, P. J. Courtois, and A. Hordijk, editors, Mathematical Computer Performance and Reliability, pages 245{258. North-Holland, 1984. [79] M. K. Molloy. On the Integration of Delay and Throughput Measures in Distributed Processing Models. PhD thesis, University of California, Los Angeles, Los Angeles, U.S.A., 1981. [80] M. K. Molloy. Discrete time stochastic Petri nets. IEEE Transactions on Software Engineering, SE-11(4):417{423, Apr. 1985. [81] M. K. Molloy. Performance analysis using stochastic Petri nets. IEEE Transactions on Computers, C-31(9):913{917, Sep. 1982. [82] J. K. Muppala and K. S. Trivedi. Composite Performance and Availability Analysis using a Hierarchy of Stochastic Reward Nets. In G. Balbo and G. Serazzi, editors, Computer Performance Evaluation, Modelling Techniques and Tools, pages 335{350. Elsevier, Amsterdam, 1992. [83] J. K. Muppala and K. S. Trivedi. GSPN models: Sensitivity analysis and applications. In Proceedings of the 28th ACM Southeast Region Conference, pages 24{33, Apr. 1990. [84] S. Natkin. Reseaux de Petri Stochastiques. PhD thesis, CNAM-PARIS, Paris, France, 1980. [85] M. F. Neuts. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. John Hopkins University Press, Baltimore, MD, 1981. [86] M. F. Neuts and K. Meier. On the use of phase-type distributions in reliability modelling of systems with two components. Operations Research Spektrum, 2(4):227{234, 1981. [87] D. Nguyen and D. Murthy. Optimal preventive maintenance policies for repairable systems. Operations Research, 29(6):1181{1194, Dec. 1981. [88] W. C. Obi. Error analysis of a Laplace transform inversion procedure. SIAM Journal of Numerical Analysis, 27(2):457{469, Apr. 1990.
158 [89] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, Inc., 1970. [90] A. Pedar and V. V. S. Sarma. Phased-mission analysis for evaluating the eectiveness of aerospace computing-systems. IEEE Transactions on Reliability, 30(5):429{437, 1981. [91] J. L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-Hall, Inc., Englewood Clis, 1981. [92] C. A. Petri. Kommunikation mit Automaten. PhD thesis, University of Bonn, Bonn, Germany, Jan. 1962. [93] B. Philippe, Y. Saad, and W. J. Stewart. Numerical methods in Markov chain modeling. Operations Research, 40(6):1156{1179, Nov-Dec 1992. [94] G. Popek, B. Walker, J. Chow, D. Edwards, C. Kline, G. Rudisin, and G. Thiel. Locus: A network transparent, high reliability distributed system. In Proceedings of the 8th ACM Symposium on Operating Systems Principles, pages 169{177, Dec. 14-16 1981. [95] R. R. Razouk and C. V. Phelps. Performance analysis using timed Petri nets. In Y. Yemini, R. Strom, and S. Yemini, editors, Protocol Speci cation, Testing, and Veri cation, IV, pages 561{576. Elsevier Science Publishers, B.V.,(North-Holland), 1985. [96] A. L. Reibman, R. M. Smith, and K. S. Trivedi. Markov and Markov reward model transient analysis: An overview of numerical approaches. European Journal of Operational Research, 40:257{267, 1989. [97] V. V. S. Sarma, K. S. Trivedi, and A. L. Reibman. Optimization methods in computer system design. In B. Silverman, editor, Handbook of Engineering Design Using Operations Research Methods, pages 683{705. North-Holland, Amsterdam, 1986. [98] M. Schulze, G. Gibson, R. Katz, and D. Patterson. How reliable is a RAID? In Proceedings of Spring COMPCON 89, pages 118{123, San Francisco, U.S.A., Mar. 1989. [99] D. P. Siewiorek and R. S. Swarz. The Theory and Practice of Reliable System Design. Digital Press, Bedford, U.S.A., 1982. [100] R. M. Smith and K. S. Trivedi. A performability analysis of two multiprocessor systems. In Proceedings of IEEE Int. Symp. on Fault-Tolerant Computing, FTCS-17, pages 224{ 229, Pittsburgh, U.S.A., Jul. 1987.
159 [101] M. K. Smotherman. Transient solutions of time-inhomogeneous Markov reward models with discontinuous rates. In W. J. Stewart, editor, Numerical Solution of Markov Chains, pages 385{399. Marcel Dekker, Inc., New York, 1991. [102] W. Stewart and A. Goyal. Matrix methods in large dependability models. Research Report RC-11485, IBM, Nov. 1985. [103] D. Swinehart, G. McDaniel, and D. Boggs. Wfs: A simple shared le system for a distributed environment. In Proceedings of the 7th ACM Symposium on Operating Systems Principles, pages 9{17, Dec. 10-12 1979. [104] J. S. Sykes. Analysis of the communications aspects of an inquiry-response system. In AFIPS Conference Proceedings, 1969 Fall Joint Conference, pages 655{667, Las Vegas, U.S.A., Nov. 18-20 1969. [105] H. Takagi. Analysis of Polling Systems. The MIT Press, 1986. [106] H. Takagi. Analysis of an M/G/1//N queue with server's multiple vacations and exhaustive service, and its application to a polling model. Technical Report 0033, IBM Research, Tokyo Research Laboratory, Tokyo, Japan, 1990. [107] H. Takagi. Queueing analysis of polling models: An update. In H. Takagi, editor, Stochastic Analysis of Computer and Communication Systems, pages 267{318. North-Holland, 1990. [108] Y. Takahashi. Weak D-Markov chain and its application to a queueing network. In G. Iazeolla, P. J. Courtois, and A. Hordijk, editors, Mathematical Computer Performance and Reliability, pages 153{165. North-Holland, 1984. [109] F. A. Tobagi. Multiaccess protocols in packet communication systems. IEEE Transactions on Computers, C-28(4):468{488, Apr. 1980. [110] L. A. Tomek and K. S. Trivedi. Fixed point iteration in availability modeling. In M. Dal Cin, editor, Informatik-Fachberichte, Vol. 283: Fehlertolerierende Rechensysteme, pages 229{240. Springer-Verlag, Berlin, 1991. [111] K. S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications. Prentice-Hall, Inc., Englewood Clis, 1982.
160 [112] K. S. Trivedi, A. S. Sathaye, O. C. Ibe, and R. C. Howe. Should I add a processor ? In Proceedings of the 23rd Annual Hawaii International Conference on System Sciences, pages 214{221, Jan. 1990. [113] A. K. von Mayrhauser and K. S. Trivedi. Computer con guration design to minimize response time. Computer Performance, pages 32{39, Mar. 1982. [114] C. Wang and K. S. Trivedi. Integration of speci cation for modeling and speci cation for system design. In Proceedings of The 14th International Conference on Application and Theory of Petri Nets, Chicago, U.S.A., Jun. 21-25 1993, To appear. [115] R. P. Wiley and R. R. Tenney. Performance analysis of stochastic timed Petri nets. In Proceedings of the 23rd Annual Hawaii International Conference on System Sciences, pages 222{231, Jan. 1990. [116] J. W. Wong and M. H. Ammar. Response time performance of videotex systems. IEEE Journal on Selected Areas in Communications, SAC-4:1174{1180, 1986. [117] W. M. Zuberek. M-timed Petri nets, modeling and performance evaluation of systems. Technical Report 8503, Department of Computer Science, Memorial University of Newfoundland, St. John's, Canada, Feb. 1985. [118] W. M. Zuberek. Timed Petri nets and preliminary performance evaluation. In Proceedings of 7th Annual Symposium on Computer Architecture, pages 88{96, May 1980.
Biography
Hoon Choi was born in Korea on May 19, 1960. He received a B.S.E.E. in computer engineering from Seoul National University, Korea in 1983. From 1983 to 1988, he was with ETRI (Electronics and Telecommunications Research Institute) as a member of technical staff. He worked on the design and implementation of LAN and ISDN systems. He was awarded an Excellent Researcher Award of ETRI in 1984. In 1988, he began graduate studies in the Department of Computer Science at Duke University, Durham, U.S.A. He received an M.S. in computer science from Duke University in 1990. He was awarded an IBM Graduate Fellowship during the 1991-1993 academic years.
Publications
1. Oliver C. Ibe, Hoon Choi and Kishor S. Trivedi, "Performance Evaluation of Client-Server Systems," IEEE Transactions on Parallel and Distributed Systems, To appear.
2. Varsha Mainkar, Hoon Choi and Kishor S. Trivedi, "Sensitivity Analysis of Markov Regenerative Stochastic Petri Nets," Fifth International Workshop on Petri Nets and Performance Models (PNPM'93), Toulouse, France, Oct. 20-23 1993 (Submitted).
3. Hoon Choi, V. G. Kulkarni and K. S. Trivedi, "Markov Regenerative Stochastic Petri Nets," The 16th IFIP W.G. 7.3 Int'l Sym. on Computer Performance Modelling, Measurement and Evaluation (Performance'93), Rome, Italy, Sep. 1993.
4. Hoon Choi, V. G. Kulkarni and K. S. Trivedi, "Transient Analysis of Deterministic and Stochastic Petri Nets," The 14th International Conference on Application and Theory of Petri Nets, Chicago, U.S.A., Jun. 21-25 1993.
5. Hoon Choi, Varsha Mainkar and Kishor S. Trivedi, "Sensitivity Analysis of Deterministic and Stochastic Petri Nets," 1993 International Workshop on Modeling, Analysis and
Simulation of Computer and Telecommunication Systems (MASCOTS'93), San Diego,
USA, Jan. 17-20 1993.
6. Hoon Choi and Kishor S. Trivedi, "Conditional MTTF and its Computation in Markov Reliability Models," 1993 Annual Reliability and Maintainability Symposium (RAMS), Atlanta, USA, Jan. 25-28 1993.
7. Hoon Choi and Kishor S. Trivedi, "Approximate Performance Models of Polling Systems using Stochastic Petri Nets," in Proc. IEEE INFOCOM 92, 11th Annual Joint Conference of the IEEE Computer and Communication Societies, Florence, Italy, May 4-8 1992.
8. Hoon Choi, "Design Specification of the SPNP Client-Server System," LAN System Performance and Analysis Section (G37), IBM RTP, Aug. 1991.
9. Jack V. Briner, Jr. and Hoon Choi, "A Multiprogramming Uniform System Emulator," Unpublished Technical Report, Department of Computer Science, Duke University, Feb. 1989.
10. Hoon Choi, Y.S. Baek, K.S. Lee, N.H. Lee and Y.H. Choi, "Specification and Testbed Implementation for CCITT No. 7 Signalling System in Korea," in Proc. the Ninth International Conference on Computer and Communications (ICCC), Tel Aviv, Israel, Nov. 1988.
11. Hoon Choi, Y. Tscha, N.H. Lee and Y.H. Choi, "Service Protocol Architectures in ISDN," in Proc. of The Conference of Korea Institute of Electronics Engineers, Vol. 9, No. 1, June 1986.
12. Hoon Choi, Y.H. Choi and S.J. Chung, "Design and Implementation of the ETRI-NET File Server," Journal of Korea Information Science Society, Vol. 12, No. 3, Aug. 1985.
13. Hoon Choi, N.H. Lee and Y.H. Choi, "A Study on ISDN Protocol Architecture," in Proc. Korea Institute of Electronics Engineers, Vol. 8, No. 1, June 1985.
14. Hoon Choi, E.H. Kwon, H.Y. Park, Y.S. Moon, Y.H. Choi and S.J. Chung, "Development of Application Softwares on K-LAN," in Proc. The Conference of Korea Information Science Society, Vol. 12, No. 1, Apr. 1985.
15. Hoon Choi and S. L. Min, "Compiling Method for the Modified Portion of a Program," Journal of Korea Information Science Society, Vol. 10, No. 1, Feb. 1983.