Non-Markovian State-Space Models in Dependability Evaluation

Salvatore Distefano^a,∗, Kishor S. Trivedi^b

^a Dipartimento di Elettronica e Informazione, Politecnico di Milano, Via Ponzio 34/35, 20133 Milano, Italy.
^b Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA.

Abstract

The purpose of this paper is to provide an up-to-date treatment of advanced analytic, state-space based techniques to study dependability models with non-exponential distributions. We first provide an overview of different techniques for the solution of non-Markovian state-space based models, including phase-type expansion and a general framework which allows us to deal with renewal, semi-Markov, and Markov-regenerative processes, characterizing them in dependability contexts. In the last part of the paper we illustrate these techniques by means of some examples dealing with common non-exponential reliability behaviors. Our aim is to provide a reference for practising engineers, researchers and students in state-space dependability modeling and evaluation.



∗Corresponding author.
Email addresses: [email protected] (Salvatore Distefano), [email protected] (Kishor S. Trivedi)

Preprint submitted to Quality and Reliability Engineering International, November 22, 2011

Nomenclature¹

r.v.  Random variable
i.i.d.  s-independent and identically distributed
Cdf  Cumulative distribution function
pdf  Probability density function
pmf  Probability mass function
((N)H)C/DTMC  ((Non-)Homogeneous) Continuous/Discrete time Markov chain
O/PDE  Ordinary/Partial differential equation
((A)C/D)PH  ((Acyclic) Continuous/Discrete) Phase type
MRS  Markov renewal sequence
SMP  Semi-Markov process
MRGP  Markov regenerative process
EMC  Embedded Markov chain
I/DFR  Increasing/Decreasing failure rate
N, R  Sets of natural and real numbers
π(t) = [πi(t)]  State probability vector at time t
P = [pij], B = [bij]  Transition probability matrices of a DTMC
Q(t) = [qij(t)]  Infinitesimal generator matrix, where qij(t), i ≠ j, is the transition rate from state i to state j at time t, and qii(t) = −Σ_{j≠i} qij(t)
Ω, Φ  Discrete state space sets
||A||  Supremum norm of a generic matrix A: ||A|| = sup_i Σ_j |aij|
mi[X]  ith non-central moment of the random variable X: mi[X] = E[X^i]
E[X]  Expected value of the random variable X (E[X] = m1[X])
var[X]  Variance of the random variable X (var[X] = E[(X − E[X])²])
σ[X]  Standard deviation of the random variable X (σ[X] = √var[X] = √E[(X − E[X])²])
C[X], C²[X]  Coefficient of variation and squared coefficient of variation of the random variable X (C[X] = σ[X]/E[X], C²[X] = var[X]/(E[X])²)
α = [αi]  Initial state probability vector of the Markov chain associated with a PH distribution
b = [bi]  n-dimensional column vector grouping the probabilities from any state to the absorbing one in a DPH distribution
Sn  Random variable associated with the time instant of the nth event occurrence in Markov renewal theory
Xn  System state at time Sn in Markov renewal theory
Yn = Sn − Sn−1  Random variable associated with the time interval between the (n − 1)st and the nth events in Markov renewal theory; if Xn = j, Yn is called the sojourn time in state j, or the nth sojourn time
(X, S)  Bivariate stochastic process implementing a Markov renewal sequence
K(t) = [Kij(t)]  Kernel matrix of a Markov renewal sequence; in an MRGP it is the global kernel matrix, since it describes the evolution of the process from the Markov regeneration epoch perspective, without describing what happens in between these moments
H(t) = [Hi(t)]  Sojourn time vector, where Hi(t) = Σ_j Kij(t) is the sojourn time distribution in state i
h = [hi]  Mean sojourn time vector, where hi = ∫_0^∞ (1 − Hi(t))dt
v = [vi]  Steady-state probability vector of the embedded Markov chain of an SMP
V(t) = [Vij(t)]  Conditional transition probability matrix of a Markov renewal process
V∼(s)  Laplace transform of V(t)
E(t) = [Eij(t)]  Local kernel of an MRGP: describes the state probabilities of the process during the interval between successive Markov regeneration epochs

¹ The singular and the plural of an acronym are always spelled the same.

1. Introduction

Different types of analytic models can be distinguished depending on the nature of their constitutive elements and solution techniques. The models we consider in this paper are based on state-space methods, due to their flexibility and power in capturing dependence conditions in the system [1, 2, 3]. The state-space approach is very general and can handle more cases in dependability and performance modeling than any other analytic method [4]. It can be used when the component behaviors are statistically independent (s-independent) as well as for systems involving dependencies. The method proceeds by the enumeration of system states, that is, of a collection of variables whose values define the state of the system at a given point in time.

State-space models may be of a deterministic or stochastic nature. Stochastic models are usually the method of choice when modeling dependability, since phenomena involving significant uncertainties and unpredictable variability (inherent in the system or in its inputs) frequently need to be represented. Through the probabilistic approach, the repercussions of such uncertainties on the model solution can be clearly shown. Stochastic models can be further classified as Markovian or non-Markovian. This distinction is based on the joint distribution of the underlying stochastic process and is explored in the remainder of this paper.

The most commonly used stochastic model for performance and reliability analysis is the homogeneous continuous time Markov chain (HCTMC), which assumes that the rates associated with events such as arrivals, service completions, failures, repairs, etc. are all constant in time, so that the resulting transition times are exponentially distributed. It covers a wide range of real-world dependability and performance modeling problems. However, some behaviors of many practical systems cannot be captured by an HCTMC, e.g. time-dependent rates or non-exponential distributions and aging effects.
Although non-homogeneous continuous time Markov chains allow us to represent time-varying rates, they are not able to model regeneration/renewal behaviors, since the time-dependent rates are only allowed to depend on a unique global time [5, 6, 7].

The main contribution of this paper is to discuss and summarize several analytical techniques for modeling and evaluating the dependability of a system. In particular we focus on non-exponential behaviors, showing how to deal with distributions without the memoryless property in dependability models through discrete state-space methods. Due to their power and flexibility, such techniques allow us to model, and consequently to analytically evaluate, dependability behaviors of any kind, both static and dynamic. In this context the paper makes a general

overview of the topic, providing a detailed survey of state-space techniques. Our aim is to show the feasibility and effectiveness of state-space methods by applying them to the dependability evaluation of general but realistic examples, covering several topologies and application areas. The examples are explored in depth, with full results, so that others can make use of these techniques.

Thus, in Section 2 we introduce the basics of the analytical, renewal theory and state-space techniques that can be used in modeling systems with non-exponential distributions. In Section 3 we characterize such techniques in dependability contexts, using some of them in the evaluation of specific, well-known examples of common and practical dependability behaviors, in order to demonstrate their effectiveness and applicability to real cases. An overview of recent enhancements on non-exponential distributions in state-space dependability models is then provided in Section 4. Section 5 closes the paper with some final remarks.

2. Non-Markovian Dependability Modeling

Homogeneous Markov models allow the solution of stochastic problems enjoying the Markov property: the probability of any particular future behavior of the process, when its current state is known exactly, is not altered by additional knowledge concerning its past behavior. A wide range of real dependability and performance modeling problems fall into the class of Markov models. However, some important aspects of system behavior in stochastic models cannot be easily captured through a Markov model. The common characteristic these problems share is that the Markov property does not hold at all time instants (if it holds at all). This category of problems is jointly referred to as non-Markovian.
Non-Markovian models can be analyzed using several approaches:

• Phase-type expansions [8, 9] - Non-exponential distributions in a system model are replaced by a combination of exponentially distributed stages or phases.

• Markov renewal theory [10, 11] - System behavior can be studied by means of some appropriately chosen embedded epochs where the Markov property holds.

• Supplementary variables [12] - The elapsed sojourn time (or residual sojourn time) is described by one or more continuous variables associated with the system states, and the resulting continuous state-space Markov process can be solved by traditional methods.

2.1. Phase Type Expansion

The use of phase type distributions dates back to the pioneering work of Erlang on congestion in telephone systems at the beginning of the last century [13]. His approach (named the method of stages), although simple, was very effective in dealing with non-exponential distributions and has been considerably generalized since then. The method consists in representing a non-exponentially distributed state sojourn time by a combination of stages, each of which is exponentially distributed. The whole process becomes an HMC provided that the description of the system state contains the information as to which stage of the component state duration has been reached. The division into stages is an operational device and may not necessarily have any physical significance, and any distribution with a rational Laplace transform can, in principle, be represented exactly by a phase type expansion.

The major advantage of phase type expansions is that, once a proper stage combination has been found to represent or approximate a distribution, we can solve the resulting HMC even in fairly complex models. By contrast, the application of supplementary variables, semi-Markov processes or Markov regenerative models is very limited in practical problems. The basic phase type expansion technique approximates a non-exponential distribution by connecting dummy stages, with independent and exponential sojourn time distributions, in series or in parallel (or a combination of both). A process with sequential phases (series connection) gives rise to an Erlang or a hypoexponential distribution, depending upon whether or not the phases have identical parameters, respectively. Instead, if a process consists of alternate phases (parallel connection), then the overall distribution is hyperexponential. The basic indicator in selecting one of these distributions to represent a non-exponential distribution is given by the coefficient of variation.
The coefficient of variation of a random variable X, C[X], is a measure of deviation from the exponential distribution and is given by:

C[X] = σ[X]/E[X],    (1)

where σ[X] is the standard deviation of X and E[X] is its expectation. This coefficient varies according to the selected distribution. Important generalizations of the basic stage device are the Coxian [14], Phase Type [9], and Generalized Hyperexponential [15] distributions. Neuts [16, 9, 17, 18] popularized the class of PH distributions, which correspond to the time until absorption in a finite state Markov chain with n transient

states and a single absorbing state labeled (n + 1). Specifically, a Cdf F(t) is CPH if it can be written as:

F(t) = 1 − α e^(Qt) 1,  t ≥ 0,    (2)

where Q = [qij], i, j = 1, ..., n, is the infinitesimal generator matrix (restricted to the n transient states) of the CTMC; α = [αi], i = 1, ..., n, is the vector of initial state probabilities of the CTMC transient states; and 1 is an n-dimensional column vector of all ones. Each component of e^(Qt) 1 corresponds to a continuous phase-type distribution that results from starting at a particular state. Therefore, the Cdf F(t) can be interpreted as a mixture of CPH distributions, that is:

F(t) = Σ_{i=1}^{n} αi [1 − e^(Qt) 1]_i.    (3)
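As a numerical sanity check, Eq. (2) is easy to evaluate with a matrix exponential. The following sketch (Python with NumPy/SciPy; the two-phase hypoexponential with rates 2 and 1 is a hypothetical choice of ours) builds the restricted generator Q and the initial vector α, and compares the resulting CPH Cdf against the closed form of the hypoexponential distribution.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 2-phase hypoexponential: exponential stages with rates 2 and 1 in series.
# Q is the CTMC generator restricted to the two transient states, as in Eq. (2).
Q = np.array([[-2.0,  2.0],
              [ 0.0, -1.0]])
alpha = np.array([1.0, 0.0])  # the chain starts in the first phase
one = np.ones(2)

def cph_cdf(t):
    # F(t) = 1 - alpha * exp(Qt) * 1
    return 1.0 - alpha @ expm(Q * t) @ one

# Closed form of the same hypoexponential: F(t) = 1 - 2e^{-t} + e^{-2t}
t = 1.5
assert abs(cph_cdf(t) - (1.0 - 2.0 * np.exp(-t) + np.exp(-2.0 * t))) < 1e-9
```

The same routine evaluates any CPH distribution once Q and α are given, which is precisely what makes the phase-type approach computationally attractive.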

In the same way, a cumulative probability function F(k) can be expressed as the DPH distribution:

F(k) = 1 − α B^k 1 = 1 − α Σ_{i=0}^{k−1} B^i b,  k ∈ N,    (4)

where B = [bij], i, j = 1, ..., n, is the transition probability matrix restricted to the DTMC transient states; α = [αi], i = 1, ..., n, is the initial state probability vector; and b = [bi], i = 1, ..., n, is the n-dimensional column vector grouping the probabilities of moving from any state to the absorbing one.

The major advantage of using PH distributions is computational: instead of dealing with differential equations, complex variables and numerical integration, they can be handled using matrix methods [15]. A drawback of PH distributions is the non-uniqueness of their representation: many different combinations of defining parameters lead to the same Cdf. This problem can be solved by restricting the CTMC to acyclic multistate homogeneous Markov models, as suggested in [19, 20, 21, 22].

2.2. Markov Renewal Theory

A set of very powerful techniques for the solution of non-Markovian models is based on concepts grouped under the umbrella of Markov renewal theory [10, 11], a collective name that includes Markov Renewal Sequences (MRSs), and two other important classes of stochastic processes with embedded MRSs, named Semi-Markov Processes (SMPs) and Markov Regenerative Processes (MRGPs).

Assume the system we are modeling is described by a stochastic process Z = {Zt; t ≥ 0} taking values in a countable set Φ. Suppose we are interested in a single event related to the system (e.g., when all system components fail). Additionally, assume the times between successive occurrences of this type of event are i.i.d. non-negative r.v. Yn, where Yn is the duration of the time interval between the (n − 1)st and the nth event. Let S0 < S1 < S2 < ... be the time instants of successive events, and let Xn be the system state at time Sn.

Definition 1. In mathematical terms, the bivariate stochastic process (X, S) = {Xn, Sn; n ∈ N} is a Markov renewal sequence with state space Ω ⊆ Φ provided that:

Pr{Xn+1 = j, Sn+1 − Sn ≤ t | Xn = i, ..., X0; Sn, ..., S0} = Pr{Xn+1 = j, Sn+1 − Sn ≤ t | Xn = i},    (5)

with S0 = 0 and Sn+1 ≥ Sn, for all n ∈ N, j ∈ Ω, t ≥ 0.

The random variables Xn and Sn are the state being visited and the time, respectively, of the nth transition. Thus (X, S) is a special case of a bivariate Markov process in which the increments S1 − S0, S2 − S1, ... are all non-negative and are conditionally independent given X0, X1, .... These increments are called the sojourn times; if Xn = j, then Sn+1 − Sn is called the sojourn time in state j, or the nth sojourn time. We will always assume a time-homogeneous MRS; thus, the conditional transition probabilities

Kij(t) = Pr{Xn+1 = j, Sn+1 − Sn ≤ t | Xn = i}    (6)

are independent of n for any i, j ∈ Ω, t ≥ 0. Therefore, we can always write:

Kij(t) = Pr{X1 = j, S1 ≤ t | X0 = i}.    (7)

The matrix of transition probabilities K(t) = [Kij(t)] is called the kernel of the MRS². The stochastic sequence X = {Xn; n ∈ N} keeps track of the successive states visited at Markov renewal moments and forms a discrete-time Markov chain with state space Ω. P = [pij] is the transition probability matrix of this Embedded Markov Chain (EMC), where the one-step transition probabilities are:

pij = Pr{Xn+1 = j | Xn = i} = lim_{t→∞} Kij(t).    (8)

² Note that Kij(t) is a possibly defective distribution function, so that lim_{t→∞} Kij(t) ≤ 1.


2.2.1. Semi-Markov Process

A semi-Markov process (SMP) is a generalization of both continuous and discrete time Markov chains which permits arbitrary sojourn time distribution functions, possibly depending on both the current state and on the state to be visited next. Formally:

Definition 2. A semi-Markov process is the process Y = {Yt; t ≥ 0} defined by

Yt = X_N(t) = Xn,  if Sn ≤ t < Sn+1,    (9)

for t ≥ 0, where N(t) is the counting process of the transitions in [0, t]. From the SMP definition it should be observed that the process only changes state (possibly back to the same state) at the Markov regeneration epochs Sn.

To describe the transient behavior of a semi-Markov model, the Markov renewal equation (15) (explained later in this Section) should be applied. However, to analyze the steady-state behavior or to calculate some expected values of an SMP model, there exists a simpler method called the two-stage method. It describes an SMP model using the matrix P and the vector H(t), where P = [pij]_{n×n} = lim_{t→∞} K(t) is the one-step transition probability matrix of the EMC of the SMP model, and H(t) = [Hi(t)], i = 1, ..., n, with Hi(t) = Σ_j Kij(t) the sojourn time distribution in state i. This method views SMP transitions as taking place in two stages: in the first stage, the system stays in state i for some amount of time, whose mean (the mean sojourn time in state i) is hi = ∫_0^∞ (1 − Hi(t))dt; in the second stage, the system moves to state j with probability pij.

When this method is applied to the steady-state analysis of an SMP, we first calculate the steady-state probability vector of the EMC using the equation v = vP. Given the mean sojourn time vector h = [h1, h2, ..., hn], the steady-state probability vector of the SMP π = [π1, π2, ..., πn] can be written as:

πi = vi hi / Σ_{k=1}^{n} vk hk,    (10)

that is, the ratio between the time spent in state i (vi hi) and the total time spent (Σ_{k=1}^{n} vk hk).
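The two-stage method is straightforward to automate. The sketch below (Python; the three-state EMC matrix P and mean sojourn times h are hypothetical values of ours) computes v from v = vP as a left eigenvector and then applies the weighting of Eq. (10).

```python
import numpy as np

# Hypothetical 3-state SMP: EMC transition matrix P and mean sojourn times h.
P = np.array([[0.0, 0.6, 0.4],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
h = np.array([10.0, 2.0, 5.0])

# Stage 1: steady-state vector of the EMC, v = vP with sum(v) = 1,
# obtained as the left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
v = v / v.sum()

# Stage 2: weight each EMC probability by the mean sojourn time (Eq. 10).
pi = v * h / (v * h).sum()

assert np.allclose(v @ P, v)          # v solves v = vP
assert abs(pi.sum() - 1.0) < 1e-12    # pi is a probability vector
```

Here v = (0.5, 0.3, 0.2), so the long sojourns in state 1 dominate and π1 = 5/6.6 ≈ 0.76 even though the EMC visits state 1 only half of the time.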

2.2.2. Markov Regenerative Process

A stochastic process Z = {Zt; t ≥ 0} with state space Φ is called regenerative [23] if there exist time points at which the process probabilistically restarts itself. Formally:

Definition 3. A stochastic process {Zt; t ≥ 0} is called a regenerative process if there exists a non-negative random variable S1 such that:
1. Pr{S1 = 0} < 1, Pr{S1 < ∞} = 1;
2. {Zt; t ≥ 0} and {Zt+S1; t ≥ 0} are stochastically identical;
3. {Zt+S1; t ≥ 0} is independent of {Zt; 0 ≤ t ≤ S1}.

The random times at which the future of Z becomes a probabilistic replica of itself are named regeneration points for Z. The sequence of such regeneration points identifies a renewal sequence, the embedded renewal sequence of the regenerative process. So, at each renewal epoch or regeneration point, the process Z restarts from scratch or, equivalently, is independent of its past, i.e., its stochastic behavior from such regeneration points onwards is the same as Z had from t = 0. This concept may be weakened by letting the future after a time of regeneration depend also on the state of an MRS at that time, identified as the embedded Markov renewal sequence. We then say that Z is a Markov regenerative process (MRGP) or, alternatively, a semi-regenerative process.

Definition 4. A stochastic process {Zt; t ≥ 0} is a Markov regenerative process iff it exhibits an embedded MRS (X, S) with the additional property that all conditional finite-dimensional distributions of {Zt+Sn; t ≥ 0} given {Zu; 0 ≤ u ≤ Sn, Xn = i, i ∈ Ω ⊆ Φ} are the same as those of {Zt; t ≥ 0} given X0 = i. As a special case, the definition implies that:

Pr{Zt+Sn = j | Zu, 0 ≤ u ≤ Sn, Xn = i} = Pr{Zt = j | X0 = i}.    (11)

This means that the MRGP {Zt; t ≥ 0} does not have the Markov property in general, but there is a sequence of embedded time points (S0, S1, ..., Sn, ...) such that the states (X0, X1, ..., Xn, ...) of the process at these points satisfy the Markov property. It also implies that the future of the process Z from t = Sn onwards depends on the past {Zu, 0 ≤ u ≤ Sn} only through Xn.

2.2.3. Problem Solving Using Markov Renewal Theory

Let Z = {Zt; t ∈ R+} be a stochastic process with discrete state space Φ and embedded MRS (X, S) = {Xn, Sn; n ∈ N} with kernel matrix K(t). For

such a process we can define a matrix V(t) = [Vij(t)] of conditional transition probabilities as:

Vij(t) = Pr{Zt = j | Z0 = i}.    (12)

In many problems involving Markov renewal processes, our primary concern is finding ways to effectively compute Vij(t), since several measures of interest (e.g., reliability and availability) are related to the conditional transition probabilities of the stochastic process. At any instant t, the conditional transition probabilities Vij(t) of Z can be written as [10, 11]:

Vij(t) = Pr{Zt = j, S1 > t | Z0 = i} + Pr{Zt = j, S1 ≤ t | Z0 = i}
       = Pr{Zt = j, S1 > t | Z0 = i} + Σ_{k∈Ω} ∫_0^t dKik(u) Vkj(t − u),    (13)

for all i ∈ Ω, j ∈ Φ, and t ≥ 0. Let E(t) = [Eij(t)] with:

Eij(t) = Pr{Zt = j, S1 > t | Z0 = i};    (14)

then the set of integral equations (13) defines a Markov renewal equation, which can be expressed in matrix form as

V(t) = E(t) + ∫_0^t dK(u) V(t − u),    (15)

where the Lebesgue-Stieltjes integral³ is taken term by term. If the stochastic process Z is an SMP, then E(t) is a diagonal matrix with elements:

Eii(t) = 1 − Hi(t).    (16)

The Markov renewal equation represents a set of coupled Volterra integral equations of the second kind [24] and can be solved in the time domain or in the Laplace-Stieltjes domain. For a discussion of approaches to solve these equations see [25, 26]. To better distinguish the roles of the matrices E(t) and K(t) in the description of the MRGP we use the following terminology:

³ ∫_0^t dK(u) V(t − u) = ∫_0^t k(u) V(t − u) du when K(t) possesses a density function k(t) = dK(t)/dt.

• We call the matrix E(t) the local kernel of the MRGP, since it describes the state probabilities of the process during the interval between successive Markov regeneration epochs; that is, E(t) describes the evolution of the MRGP between two Markov regeneration epochs.

• Since the matrix K(t) describes the evolution of the process from the Markov regeneration epoch perspective, without describing what happens in between these moments, we call it the global kernel of the MRGP.

To summarize, solving problems using Markov renewal theory is a two-step process: first, we need to construct both kernel matrices K(t) and E(t)⁴; then we solve the set of Volterra integral equations for the conditional transition probabilities Vij(t) or for some measures of interest. The construction of the kernel matrices can proceed by reasoning from particular facts to a general conclusion (inductive approach) or from the general to the specific (deductive approach) [27].

We can classify the existing solution methods for the Markov renewal equations into two categories: time domain methods [25] and Laplace-Stieltjes domain methods [28, 29]. One possible time domain solution is based on a discretization approach to numerically evaluate the integrals present in the Markov renewal equation [6, 30, 31, 32]. The integrals are solved using some approximation rule such as the trapezoidal rule, Simpson's rule or other higher order methods. A potential problem with this approach is that it can in general be expensive to compute. Nevertheless, there exist cases where the generalized Markov renewal equation has a simple form and the time-domain solution can be carried out. Another time domain alternative is to construct a system of PDEs using the method of supplementary variables [12]. This method has been considered for steady-state analysis in [33] and subsequently extended to the transient case in [34].
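As an illustration of the discretization approach, the sketch below (Python; the two-state kernel and the step size are hypothetical choices of ours) solves the Markov renewal equation (15) on a uniform grid, replacing the Lebesgue-Stieltjes integral by kernel increments. Exponential sojourn times are deliberately chosen so that the process reduces to a CTMC, whose closed-form transient solution provides a check on the discretization.

```python
import numpy as np

lam, mu = 1.0, 0.5        # rates of the (exponential) sojourn times
T, N = 1.0, 500           # time horizon and number of grid steps
dt = T / N
ts = np.linspace(0.0, T, N + 1)

def K(t):  # global kernel of the 2-state SMP
    return np.array([[0.0, 1.0 - np.exp(-lam * t)],
                     [1.0 - np.exp(-mu * t), 0.0]])

def E(t):  # local kernel: probability of still sitting in the initial state
    return np.diag([np.exp(-lam * t), np.exp(-mu * t)])

# V(n dt) = E(n dt) + sum_{m=1}^{n} [K(m dt) - K((m-1) dt)] V((n-m) dt)
dK = [K(ts[m]) - K(ts[m - 1]) for m in range(1, N + 1)]
V = [np.eye(2)]
for n in range(1, N + 1):
    acc = E(ts[n])
    for m in range(1, n + 1):
        acc = acc + dK[m - 1] @ V[n - m]
    V.append(acc)

# With exponential sojourns the SMP is a CTMC, so V00(T) has a closed form.
exact = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * T)
assert abs(V[N][0, 0] - exact) < 2e-2
```

Each grid step costs O(n) matrix products, which is the quadratic cost alluded to above; the trapezoidal or Simpson's rules mentioned in the text reduce the error for a given grid at the same asymptotic cost.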
An alternative to the direct solution of the Markov renewal equation in the time domain is the use of transform methods [35, 28, 29, 27, 36, 37, 30, 38, 39, 40]. By applying such a method to Equation (15), a linear system in the Laplace domain is obtained. After solving the linear system for V∼(s), transform inversion is required. In very simple cases a closed-form inversion might be possible, but in most cases of interest numerical inversion will be necessary. The transform

⁴ For the case of an SMP, only the global kernel matrix K(t) is necessary.

inversion, however, can encounter numerical difficulties, especially if V∼(s) has poles in the positive half of the complex plane.

For the steady-state analysis of an MRGP, given the local kernel E(t) and the global kernel K(t), we define the following terms:

μm = E[S1 | X0 = m];  αmn = ∫_0^∞ Emn(t) dt;  v = vP with Σ_{i∈Ω} vi = 1;  βk = vk μk / Σ_{r∈Ω} vr μr.

μm is the mean time before the next regeneration epoch, given that the initial regeneration state is m. αmn is the mean sojourn time in state n before the next regeneration epoch, given initial regeneration state m. v = [vi] is the steady-state probability vector of the EMC. With the definitions above, [29] gives the following theorem for steady-state MRGP analysis:

Theorem 1. The limiting probability vector p = [pj] of the state probabilities of the MRGP is given by:

pj = lim_{t→∞} Pr{Z(t) = j} = Σ_{k∈Ω} vk αkj / Σ_{k∈Ω} vk μk = Σ_{k∈Ω} βk αkj / μk.    (17)

The theorem can be interpreted as follows:

pj = Σ_{k∈Ω} (mean fraction of time spent in EMC state k) × (mean time spent in state j in each visit of state k) / (mean time spent in each visit of EMC state k).
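Once v, the α_mn and the μ_m are available, Eq. (17) is a simple weighted average. The sketch below (Python; the two-regeneration-state MRGP numbers are hypothetical values of ours) applies the theorem, exploiting the fact that the rows of [α_mn] must sum to the mean regeneration times μ_m.

```python
import numpy as np

# Hypothetical MRGP: 2 regeneration states (rows) and 3 process states (columns).
v = np.array([0.4, 0.6])            # EMC steady-state vector, v = vP, sum(v) = 1
alpha = np.array([[1.0, 0.5, 0.5],  # alpha[m, n]: mean time spent in state n
                  [0.2, 2.0, 0.8]]) # before the next regeneration, starting from m
mu = alpha.sum(axis=1)              # mu[m] = E[S1 | X0 = m]

p = (v @ alpha) / (v @ mu)          # Eq. (17)

assert abs(p.sum() - 1.0) < 1e-12   # the limiting probabilities sum to one
```

The consistency check works because summing Eq. (17) over j replaces Σ_j α_kj by μ_k, so the numerator and denominator coincide.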

2.3. Supplementary Variables

This method, originally proposed in [12], is the most direct method of solving non-Markovian models. We assign to every non-exponential random variable in the specification of the model an additional variable which keeps track of either the elapsed or the residual time associated with this random variable. The purpose of the added supplementary variables is to include all necessary information about the history of the stochastic process. The resulting Markov process is in continuous time and has a state space which is multidimensional and of mixed type, partly discrete and partly continuous. Since, after the inclusion of the supplementary variables, the stochastic process describing the system behavior satisfies the Markov property, it is possible to derive Chapman-Kolmogorov equations describing the dynamic behavior of such a process. The resultant set of ordinary or partial differential equations can be defined together with boundary conditions and analyzed. The supplementary variables approach has been successfully applied to the solution of queuing models

[41, 42, 43], dependability models [44, 45, 46, 47, 48, 49, 50, 51], and stochastic Petri net models [52, 34, 53, 33, 25, 54, 55]. Details of this approach are beyond the scope of this paper.

3. From Theory to Practice in Dependability Evaluation

The aim of this section is to discuss Markov renewal theory in the dependability context. The failure distribution plays one of the most significant roles in dependability modeling; it is defined on the length of the lifetime of a device. The failure modes of the system under consideration strongly affect the choice of the failure distribution. Ideally, the failure distribution should be estimated from statistical samples of failure times collected through experiments. However, we can never collect a failure sample from a device that has already failed, and, for instance, almost all errors in software systems are poorly reproducible. That is, in practical situations it is difficult to observe and collect enough failure samples to estimate the failure distribution. Instead of directly estimating the failure distribution, it is useful to know the relationship between the failure mechanism and the failure rate function when making a choice of failure distribution.

If the failure distribution F(t) has a density f(t), the failure rate function is defined as r(t) = f(t)/F̄(t), where F̄(t) = 1 − F(t). Intuitively, r(t)dt is the probability that a device of age t fails in the interval (t, t + dt]. In general, failure rate functions can be characterized by aging properties. Typically, there are three classes: increasing, decreasing and constant failure rates, abbreviated as IFR, DFR and CFR, respectively. The IFR property means that the failure rate function is non-decreasing in the age t, and represents age-related degradation and wear-out failures. The DFR property, a failure rate function decreasing in time, results from early failures. The CFR property represents random failures with a constant failure rate.
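The three aging classes are easy to see on the Weibull family, whose failure rate r(t) = (α/β)(t/β)^(α−1) is increasing for shape α > 1, decreasing for α < 1 and constant for α = 1 (the exponential case). A small sketch (Python; the shape and scale values are hypothetical choices of ours):

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    # r(t) = f(t) / (1 - F(t)) for a Weibull(shape, scale) distribution
    return (shape / scale) * (t / scale) ** (shape - 1.0)

t = np.linspace(0.1, 5.0, 50)
ifr = weibull_hazard(t, 2.0, 1.0)   # shape > 1: wear-out failures (IFR)
dfr = weibull_hazard(t, 0.5, 1.0)   # shape < 1: early failures (DFR)
cfr = weibull_hazard(t, 1.0, 1.0)   # shape = 1: exponential (CFR)

assert np.all(np.diff(ifr) > 0) and np.all(np.diff(dfr) < 0)
assert np.allclose(cfr, 1.0)
```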
In reliability engineering, a bathtub curve is also used to describe the lifetime of a device; it consists of three periods in which the failure rate is DFR, CFR and IFR, respectively. On one hand, the CFR property leads to an exponentially distributed failure time and is invoked to model the failure of a complex system in which the system failure results from the failure of any component. On the other hand, the IFR and DFR properties imply non-exponential distributions. In particular, distributions with the IFR property are commonly used to represent the first time to failure of a system in which all components are initially new. Typical non-exponential distributions in dependability modeling are the gamma, Weibull, truncated normal and lognormal distributions, and their use requires the analysis of non-Markovian models.

In order to better explain the application of non-Markovian models in the dependability context, in the following we provide some examples in which we put into practice the techniques described in Section 2. The first application, discussed in Subsection 3.1, is the steady-state availability evaluation, through Markov renewal theory, of a power outage model. Subsection 3.2 then describes the availability evaluation, using an MRGP model, of a parallel system composed of two components, affected by common cause failures and load sharing, and with a single repair facility.

3.1. Power Outage Model


Figure 1: State Space Model of a Power Supply System with Two Types of Outages

Consider the model of a power supply system with two types of outages depicted in Fig. 1. It contains three states: 0 is the operational state, while 1 and 2 are two outage states. Fi(t), i = 1, 2, is the failure time distribution of entering state i, if the other transition outgoing from state 0 is disabled. Gi(t), i = 1, 2, is the repair time distribution when the system is in the outage state i. We want to calculate the steady-state availability of the system.

We first manually calculate the steady-state probability vector using Markov renewal theory. To do so we determine the embedded DTMC of this semi-Markov process. Let the transition times from state 0 to state 1 and to state 2 be represented by the random variables L1 and L2, respectively. The global kernel K(t) is as follows:

K(t) = | 0       K01(t)  K02(t) |
       | K10(t)  0       0      |
       | K20(t)  0       0      |


where:

K01(t) = Pr{L1 ≤ t ∧ L2 > L1} = ∫_0^t (1 − F2(u)) dF1(u),

K02(t) = Pr{L2 ≤ t ∧ L1 > L2} = ∫_0^t (1 − F1(u)) dF2(u),

K10(t) = G1(t), K20(t) = G2(t), and the transition probability matrix of the embedded Markov chain is:

P = K(∞) = | 0  η  1−η |
           | 1  0  0   |
           | 1  0  0   |

where η = ∫_0^∞ F1(u) dF2(u). Solving v = vP, we have:

v0 = 1/2,  v1 = η/2,  v2 = (1 − η)/2.

For the sojourn time distributions:

H0(t) = Pr{L1 ≤ t ∨ L2 ≤ t} = 1 − [1 − F1(t)][1 − F2(t)] = F1(t) + F2(t) − F1(t)F2(t),
H1(t) = G1(t),  H2(t) = G2(t),

while the mean sojourn time in state i is

hi = ∫_0^∞ [1 − Hi(t)] dt.

The steady-state probability vector π = (π0, π1, π2) can be written as:

πi = vi hi / Σ_{k=0}^{2} vk hk,  i = 0, 1, 2.

Now if we assume that L1 and L2 are distributed according to Weibull functions and that the repair times of both state 1 and state 2 are deterministic, i.e.:

F1(t) = 1 − e^(−(t/β1)^α1),  F2(t) = 1 − e^(−(t/β2)^α2),

G1(t) = 0 for t < T1, 1 for t ≥ T1;  G2(t) = 0 for t < T2, 1 for t ≥ T2,

we have:

η = (α2/β2) ∫_0^∞ (1 − e^{−(t/β1)^α1}) (t/β2)^(α2−1) e^{−(t/β2)^α2} dt,

h0 = ∫_0^∞ e^{−(t/β1)^α1} e^{−(t/β2)^α2} dt,   h1 = T1,   h2 = T2.

Since the steady-state availability A∞ corresponds to the probability of being in state 0, i.e. π0, we have:

A∞ = π0 = v0 h0 / (Σ_{k=0}^{2} vk hk) = h0 / (h0 + η T1 + (1 − η) T2).

By substituting the parameter values summarized in Table 1 into this formula and numerically solving the integrals with the Mathematica® tool [56], we obtain: η = 0.770941, h0 = 1483.42 and A∞ = 0.999172. Therefore, with the parameters of Table 1, the system ensures a three-nines steady-state availability.

α1   β1     α2     β2      T1   T2
2    2000   0.75   10000   1    2

Table 1: Parameters of the Example.
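These numbers can be reproduced without Mathematica: the two integrals are smooth and fast-decaying, so a composite Simpson rule over a truncated range suffices. A sketch in plain Python; the truncation point (30000 h) and step count are our choices:

```python
import math

# Parameters from Table 1 (times in hours)
a1, b1 = 2.0, 2000.0      # Weibull shape/scale of L1
a2, b2 = 0.75, 10000.0    # Weibull shape/scale of L2
T1, T2 = 1.0, 2.0         # deterministic repair times

def simpson(f, lo, hi, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += f(lo + k * h) * (4 if k % 2 else 2)
    return s * h / 3.0

S1 = lambda t: math.exp(-((t / b1) ** a1))               # survival of L1
S2 = lambda t: math.exp(-((t / b2) ** a2))               # survival of L2
f1 = lambda t: (a1 / b1) * (t / b1) ** (a1 - 1) * S1(t)  # pdf of L1

# eta = P{L1 < L2} = int_0^inf (1 - F2(u)) dF1(u); the integrand is
# negligible beyond the truncation point.
eta = simpson(lambda u: S2(u) * f1(u), 0.0, 30000.0, 60000)

# h0 = E[min(L1, L2)] = int_0^inf S1(t) S2(t) dt
h0 = simpson(lambda t: S1(t) * S2(t), 0.0, 30000.0, 60000)

A_inf = h0 / (h0 + eta * T1 + (1.0 - eta) * T2)
# eta ~ 0.7709, h0 ~ 1483.4, A_inf ~ 0.99917 (cf. the values quoted above)
print(eta, h0, A_inf)
```

The quadrature agrees with the Mathematica results quoted above to well within the accuracy needed for the availability figure.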

3.2. Parallel system with Load Sharing, Common Cause Failure and Single Repair Facility


Figure 2: MRGP representing the parallel system with single repair facility.

Next consider a system composed of two components sharing the load. The components fail autonomously but, with probability q, a common cause failure

can trigger the failure of the other component. Only a single shared repair facility is available, so repairs follow an FCFS policy: when the repair facility is busy and a second failure occurs, the second component to fail waits in a repair queue until the first component is put back into service. We consider the components' lifetimes exponentially distributed with respective rates λA and λB, and their times to repair generally distributed with Cdfs GA(t) and GB(t), respectively. The load sharing is represented by exploiting proportional hazard models but, since the two components are different, we need to specify two different dependency parameters, δA and δB, proportionally weighting their mutual impact. The state space model representing the overall system is depicted in Fig. 2. In order to take into account the single repair facility, we use an MRGP, graphically characterized by the squares representing states 4 and 5, and by differentiating the arcs as dashed, thin and thick. In such an MRGP, the dashed transitions represent concurrent transitions. A transition tr is said to be concurrent with respect to another transition tr′ if both tr and tr′ can occur in a given state i and the firing of tr does not disable tr′. Otherwise, tr is said to be competitive with tr′, and is represented by a thick arc. Exponential competitive transitions are instead represented by thin arcs.
To explain the MRGP model, we define the stochastic process Z = {Z(t); t ∈ R+} representing the system state at any instant t ≥ 0, where:

Z(t) = 2, if both components are working at time t ≥ 0, sharing the workload;

Z(t) = 1, if component A is being repaired while B is working on the whole load at time t ≥ 0;

Z(t) = 3, if component B is being repaired while A is working on the whole load at time t ≥ 0;

Z(t) = 4, if component A is being repaired while B is waiting for repair at time t ≥ 0, or, after a common cause failure, the repairman has randomly selected component A to be repaired first;

Z(t) = 5, if component B is being repaired while A is waiting for repair at time t ≥ 0, or, after a common cause failure, the repairman has randomly selected component B to be repaired first.

In states 4 and 5, we assume that the repairman randomly chooses the first component to repair, so the probabilities of selecting A or B are the same, both 0.5. The system is in state 2 if both A and B are up (and the repairman is free). Component

A can fail at the rate δAλA, since it is in load sharing with B (whose rate is δBλB), reaching state 1. It will then take the repairman a repair time distributed according to the Cdf GA(t) to bring the system back to state 2. If component B goes down during the repair of component A, the system jumps to state 4 while the repair action on component A continues. Once A is repaired, with B still down, the system is in state 3. B is repaired with a duration generally distributed with Cdf GB(t), but A can fail again in the meantime, moving the system to state 5. As shown in Fig. 2, the EXP transitions from state 1 to state 4 and from state 3 to state 5 are concurrent, since the firings of these two transitions do not disable the enabled general transition representing the corresponding failed component under repair. Such transitions do not correspond to Markov regeneration epochs since they occur while non-exponentially distributed transitions are enabled. So, the stochastic process Z is an MRGP whose EMC is identified by states 1, 2 and 3, while states 4 and 5 do not belong to this EMC; they are therefore represented by squares in order to graphically highlight this difference. Once the MRGP has been identified, the next step, as per Section 2.2, is the construction of both the global and the local kernel matrices K(t) and E(t). By observing the MRGP of Fig. 2, we can identify the structure of the global kernel matrix K(t) as:

       | 0        K1,2(t)  K1,3(t) |
K(t) = | K2,1(t)  0        K2,3(t) |
       | K3,1(t)  K3,2(t)  0       |

To compute the elements of K(t), we start from the first row. The process of determining elements K1,2 and K1,3 is quite similar, so we will only show the computation of K1,2. Let RA, RB be the time-to-repair and LA, LB the time-to-failure r.v. of A and B, respectively.
In order to compute K1,2 we apply the following approach:

K1,2(t) = Pr{Z(S1) = 2, S1 ≤ t | Z(0) = 1}
        = Pr{"repair of A has finished by t and B did not fail during the repair of A"}
        = Pr{RA ≤ t ∧ LB > RA} = ∫_0^t Pr{LB > u} dGA(u) = ∫_0^t e^{−λB u} dGA(u).
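When the repair time is deterministic, as in the numerical study later in this subsection, GA is a unit step at the repair duration R and the Stieltjes integral collapses to evaluating the integrand at the jump: K1,2(t) = e^{−λB R} for t ≥ R and 0 otherwise. A small sketch of this reduction, with illustrative parameter values of ours, cross-checked through a phase-type (Erlang) approximation of the deterministic repair:

```python
import math

lamB = 0.01   # failure rate of B (1/h), illustrative value
R = 0.1       # deterministic repair duration of A (h), illustrative value

# For a deterministic repair, GA is a unit step at R, so the Stieltjes
# integral puts all its mass at u = R:
#   K12(t) = int_0^t e^{-lamB u} dGA(u) = e^{-lamB R} for t >= R, else 0.
def K12(t):
    return math.exp(-lamB * R) if t >= R else 0.0

# Cross-check via an Erlang-k approximation of the deterministic repair:
# the Laplace-Stieltjes transform of an Erlang-k with mean R, evaluated
# at lamB, is (1 + lamB R / k)^(-k), which tends to e^{-lamB R} = K12(inf)
# as k grows.
for k in (1, 10, 100, 1000):
    print(k, (1.0 + lamB * R / k) ** (-k))

assert abs((1.0 + lamB * R / 1000) ** (-1000) - K12(float("inf"))) < 1e-8
```

The converging Erlang column is the phase-type-expansion view discussed earlier: a deterministic duration is the k → ∞ limit of an Erlang-k distribution.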

Following a similar reasoning we obtain K2,1. There are two ways of reaching state 1 from state 2: the first is if A fails before B; the second is if a common cause failure occurs, the repairman chooses to first repair A, and completes that repair

action. The two events are mutually exclusive, so the corresponding probabilities can be added. In mathematical terms we have:

K2,1(t) = Pr{Z(S1) = 1, S1 ≤ t | Z(0) = 2}
= (1 − q) ∫_0^t e^{−δBλB u} d(1 − e^{−δAλA u})
  + (q/2) ∫_0^t e^{−δBλB u} GA(t − u) d(1 − e^{−δAλA u})
  + (q/2) ∫_0^t e^{−δAλA u} GA(t − u) d(1 − e^{−δBλB u})
= (1 − q) δAλA ∫_0^t e^{−(δAλA + δBλB)u} du
  + (q/2)(δAλA + δBλB) ∫_0^t e^{−(δAλA + δBλB)u} GA(t − u) du.

The computation of K2,3 is analogous: we can obtain it by exchanging A and B in the previous formula. The third row is completely symmetrical to the first, so we obtain the corresponding elements from the latter. We finally obtain the following global kernel matrix K(t):

K1,2(t) = ∫_0^t e^{−λB u} dGA(u),    K1,3(t) = ∫_0^t (1 − e^{−λB u}) dGA(u),

K2,1(t) = (1 − q)δAλA ∫_0^t e^{−λLS u} du + (q/2)λLS ∫_0^t e^{−λLS u} GA(t − u) du,

K2,3(t) = (1 − q)δBλB ∫_0^t e^{−λLS u} du + (q/2)λLS ∫_0^t e^{−λLS u} GB(t − u) du,

K3,1(t) = ∫_0^t (1 − e^{−λA u}) dGB(u),    K3,2(t) = ∫_0^t e^{−λA u} dGB(u),

with all the remaining entries equal to 0,

where λLS = δAλA + δBλB. Once K(t) is specified, we need to obtain the local kernel matrix E(t). Since a Markov regenerative process can change state between two consecutive Markov regeneration epochs, we need to capture these changes through the E(t) matrix. Moreover, the cardinality of the state space of Z can be larger than the cardinality of the EMC state space, so E(t) is, in general, a rectangular matrix. In the present example, the EMC has only 3 states while the system has 5 possible states. By a careful examination of the MRGP of Fig. 2, the structure of E(t) is:

       | E1,1(t)  0        0        E1,4(t)  0       |
E(t) = | 0        E2,2(t)  0        E2,4(t)  E2,5(t) |
       | 0        0        E3,3(t)  0        E3,5(t) |

In defining E(t) we start from its square sub-matrix E1(t) = [Ei,j(t)], i, j = 1..3. This is a diagonal matrix: only the Ei,i(t), i = 1..3, are non-zero. E1,1(t) represents the probability of remaining in state 1 until a given time t, i.e.

the probability that the repair of A has not finished at time t and B has not failed until t, thus:

E1,1(t) = Pr{Z(t) = 1, S1 > t | Z(0) = 1} = Pr{RA > t ∧ LB > t} = Pr{RA > t} Pr{LB > t} = [1 − GA(t)] e^{−λB t},

since the two events are independent. E2,2(t) is the probability of remaining in state 2 until t, i.e. the probability that no failure has occurred by t, weighted by the probability 1 − q that the next failure is not a common cause one, and so: E2,2(t) = (1 − q) e^{−λLS t}. In the same way as for E1,1(t) we determine E3,3(t) = [1 − GB(t)] e^{−λA t}. Element E1,4(t) is obtained as the probability of having a failure of component B before time t while the repair of A has not finished by t, and so E1,4(t) = [1 − GA(t)](1 − e^{−λB t}). Applying the same reasoning to the other elements of E(t) we have:

       | [1 − GA(t)] e^{−λB t}  0                   0                      [1 − GA(t)](1 − e^{−λB t})  0                          |
E(t) = | 0                      (1 − q) e^{−λLS t}  0                      (q/2) e^{−λLS t}            (q/2) e^{−λLS t}           |
       | 0                      0                   [1 − GB(t)] e^{−λA t}  0                           [1 − GB(t)](1 − e^{−λA t}) |

where λLS = δAλA + δBλB as above. Once both kernel matrices K(t) and E(t) are specified, the model can be analyzed. The solution is obtained by solving the Markov renewal equation (15). As discussed in Section 2.2.3, a closed-form solution of this equation is often very hard to obtain, and hence it is necessary to use numerical algorithms. A good alternative is to work in the Laplace-Stieltjes transform domain. If we are interested only in the steady state, we can use the simpler technique summarized by Equation (17), also implemented in SHARPE [57]. In our case study we wish to evaluate the impact of common cause failures on the system, parameterizing the analysis of the corresponding model by varying q. We therefore use SHARPE to evaluate the MRGP steady-state availability and study the effect of q.

Component   λ       δ     1/µ
A           0.002   0.3   0.1
B           0.01    0.6   0.1

Table 2: Parameters of the parallel system MRGP.

We assume deterministic time-to-repair distributions, i.e. GA(t) = u(t − 1/µA) and GB(t) = u(t − 1/µB). Table 2 summarizes the numerical values of the parameters used in the computation; times are expressed in hours, rates in hours^−1.
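With these deterministic repairs, both kernels admit quick numerical sanity checks: each row of the EMC matrix P = K(∞) must sum to one, and the first-row occupancies ∫_0^∞ E1,j(t) dt must add up to the repair duration 1/µA, since from state 1 the next regeneration occurs exactly when A's repair completes. A sketch in plain Python with the Table 2 parameters; the value of q is an illustrative choice of ours:

```python
import math

lamA, lamB = 0.002, 0.01   # failure rates (1/h), Table 2
dA, dB = 0.3, 0.6          # load sharing dependency parameters, Table 2
RA = RB = 0.1              # deterministic repair durations 1/mu (h)
q = 0.2                    # illustrative common cause failure probability
lamLS = dA * lamA + dB * lamB

# t -> inf limits of the global kernel entries; for the step Cdfs the
# Stieltjes integrals evaluate the integrand at the jump.
P = [[0.0,
      math.exp(-lamB * RA),
      1.0 - math.exp(-lamB * RA)],
     [(1 - q) * dA * lamA / lamLS + q / 2.0,
      0.0,
      (1 - q) * dB * lamB / lamLS + q / 2.0],
     [1.0 - math.exp(-lamA * RB),
      math.exp(-lamA * RB),
      0.0]]

for row in P:              # a valid EMC: every row sums to one
    assert abs(sum(row) - 1.0) < 1e-12

def simpson(f, lo, hi, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += f(lo + k * h) * (4 if k % 2 else 2)
    return s * h / 3.0

# First-row occupancies: E1,1(t) = e^{-lamB t} and E1,4(t) = 1 - e^{-lamB t}
# for t < RA (repair of A unfinished), both 0 afterwards.
a11 = simpson(lambda t: math.exp(-lamB * t), 0.0, RA, 1000)
a14 = simpson(lambda t: 1.0 - math.exp(-lamB * t), 0.0, RA, 1000)
assert abs((a11 + a14) - RA) < 1e-9   # mean time to regeneration from state 1
print(a11, a14)
```

Analogous checks for the second-row entries (which involve the convolutions with GA and GB) and for the occupancies of the remaining states follow the same pattern.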

With these values component A is more reliable than B, since it has a lower failure rate; on the other hand, B is considered more powerful than A, because the load sharing has a stronger impact on component A, which is more sensitive to workload changes.


Figure 3: Steady state availability of the parallel system varying q.

By analyzing the MRGP for these numerical values we obtain the graph depicted in Fig. 3. It shows how the steady-state availability A∞ of the parallel system varies with the common cause failure probability q. The graph shows a clearly linear trend of A∞ with respect to q. The steady-state availability of the system goes from the maximum A∞ = 0.999687913 when no common cause failures are considered (q = 0), to the minimum A∞ = 0.963846678 reached when the two components always fail together (q = 1).

4. Related Work

Renewal theory has been widely applied in the dependability context. Some references for the application of Markov renewal theory to the solution of performance and reliability/availability models can be found in [35, 27, 36, 25, 37, 30, 38, 58, 40, 59, 60]. One of the most widely and successfully adopted state-space models for the reliability and availability evaluation of non-Markovian systems is the SMP. A good reference on the topic is [61]. The book contains the theoretical notions of Markov renewal processes, characterized and adapted to reliability/availability contexts and problems. It also proposes many examples which aid the understanding of the theory and show how to apply it to concrete physical situations such as:

three-state systems, systems with mixed constant repair time, systems with multiphase repair, systems with non-regenerative states, two-component systems with cold standby, maintenance and Markov renewal shock models. Another book mainly focusing on SMPs is [62]. It aims to give a complete and self-contained presentation of semi-Markov models with finitely many states, first formally providing the theory, revising and extending it in view of solving real-life problems of risk management, and then proposing basic algorithms so that effective numerical results can be obtained. It deals with a larger context that also includes reliability, proposing an interesting semi-Markov model for maintenance systems applied to a classical example given in [63] regarding two machines (computers in the original example) working in parallel, extended with non-exponential distributions. In [64] the authors discuss some techniques for evaluating and optimizing the reliability of multi-state systems. They propose several interesting applications of SMPs to reliability assessment in order to relax the Markov assumption. The work described in [65] deals with semi-Markov models of repairable systems, focusing on the system's interval reliability. An interesting analysis technique is proposed, evaluating both the transient and the steady-state interval availability in the Laplace transform domain. [66] further extends and specifies the technique, introducing two quantities, the joint reliability and availability, as the probability of the system being reliable/available at the two time instants t and t + x. This technique is then applied to power transmission lines in a two-state fluctuating environment. An SMP is also exploited in the evaluation of a non-Markovian UPS system availability in [67]. A similar model, further extending and better specifying the storage model, has been evaluated in [68].
A different application of SMPs in the reliability context is given in [69]: there the author exploits a semi-Markov process for modeling the failure rate of an object and then, applying a system of renewal equations, obtains the object reliability functions for the alternating and the Poisson failure processes, as well as the Laplace-Stieltjes transform of the object reliability function and its mean time to failure. A new mathematical and numerical formulation for non-homogeneous SMPs, described through transition probabilities, is proposed in [70]. The authors claim that their approach is more efficient and requires less computational effort, while keeping the accuracy of the methods available in the literature (i.e. traditional numerical techniques and Monte Carlo simulation). The technique is applied to the reliability evaluation of a simple three-state semi-Markov example and of a more complex downhole optical monitoring system taken from the literature [71], demonstrating the effectiveness of the technique in comparison to the existing ones.

Phase type expansions have also been directly used to model and evaluate the reliability/availability of non-Markovian systems. For example, a repairable two-component system with a shared repairman is studied in depth in [72] by characterizing phase-type sojourn time distributions. A similar idea, adapted to the discrete time domain, is applied in [73], where a standby system is investigated by discrete phase type distributions; the two ideas are then combined in [74], where the reliability of a repairable cold-standby system with phase-type distributions is investigated. More recently, Markov regenerative processes have been used in dependability. Some examples concerning the reliability analysis of power plants and other fault-tolerant systems can be found in [27, 75, 76, 77]. Non-Markovian modeling has also been applied in the software reliability context. In [78], the authors evaluate an analytical model of a software system employing inspection-based preventive maintenance, through a Markov regenerative process with a subordinated semi-Markov reward process. In [79], rejuvenation is modeled in a redundant computer system via a semi-Markov process in order to counteract software aging. Several different configurations are evaluated in terms of availability by varying the rejuvenation policies. Another application of MRGP modeling and analysis can be found in [80], where the availability of Internet-based services as perceived by a Web user is evaluated, capturing the interactions between the service facility and the user and investigating two different online service scenarios: single-user-single-host and single-user-multiple-host. In [81] the authors address the analytical dependability evaluation of phase mission systems by proposing a new methodology for their modeling and evaluation. Even though the technique is based on an MRSPN approach, the solution of the resulting model is developed through a specialization of the MRGP theory.
A phase mission system example is used throughout the paper to exercise the MRSPN approach in the modeling and evaluation of such systems. Further work on phase mission systems modeled by MRGPs is in [82], where the authors address the mission reliability analysis of such systems by deriving several efficient formulations for intraphase and interphase behavior analysis, allowing random phase durations, non-exponentially distributed repair activities and different repair policies. Two examples of phase mission systems are used throughout the paper to exercise the approach: they are modeled by MRGPs and then analyzed in order to evaluate the design parameters, such as failure rate, repair time and redundancy level, and to study the effects that the variation of the system running time has on the mission reliability. Several other examples and applications of MRGPs in the dependability context, and their solution by the SHARPE tool [3], can be found in [57].

A different approach used by several authors to model and evaluate system dependability is to transform/approximate, under specific assumptions, a non-Markovian (semi-Markov) model into a Markov model. In [83], the author develops semi-Markovian models for systems undergoing periodic, random and generally distributed tests and subsequent repair, obtaining equivalent Markovian models for a number of special but frequently met cases in order to solve the former while substantially reducing both the computer storage and time requirements. The work described in [84] presents an analytical approach for the dependability evaluation of non-Markovian discrete state systems with multiple components, containing both stochastic and deterministic processes. In particular, the approach developed is well suited for the dependability assessment of so-called failure delayed systems, i.e., systems where the failures do not have immediate consequences for the users. The evaluation technique applied to the non-Markovian model defines a three-step procedure: first, the analytical expressions for the relevant dependability indices are derived; then, those expressions are reduced to a canonical form by a set of transformations, obtaining Markovian statements; finally, the expressions are evaluated through the application of a symbolic algorithm. Another approach used in system dependability evaluation by non-Markovian state-space models is to combine the latter with other different notations in order to provide more powerful techniques and/or to reduce the complexity of the overall resulting stochastic process.
For example, [85] proposes two different approaches, the maximum likelihood and the Bayesian one, for estimating the parameters included in a semi-Markov three-state reliability model; [71] develops an availability assessment model in which the system dynamics is described via a continuous-time semi-Markovian process, integrated with a Bayesian belief network characterizing the cause-effect relationships among factors influencing the repairman error probability during maintenance, a technique then applied to a real case study concerning mature oil wells. Many examples of state-space models combined with non-state-space models exist [3].

5. Conclusions

Markov models are a well known modeling approach in industrial and academic environments for both practical applications and theoretical research. In fact, the availability of stable and user-friendly tools based on Markov models has largely contributed to the success of such a paradigm as a general purpose, flexible and effective modeling and analysis method. The research devoted to exploiting specific properties and structures of Markov models enabled a rapid increase in the size of the problems that can be effectively handled. New challenging and promising results have recently been obtained in the attempt to overcome the memoryless/exponential distribution assumption of homogeneous Markov models. The possibility of dealing with non-exponential distributions represents a realistic goal in the present state of the art. This goal is pursued by this paper, with specific regard to dependability. In this context the paper makes a general overview of the existing approaches, providing a detailed survey of state-space techniques and methodologies. Our aim is to give proof of the effectiveness and the applicability of non-Markovian techniques by applying them to the dependability evaluation of generic but realistic examples, in which we investigate specific common dependability aspects and behaviors with different techniques. The examples have been studied in depth, evaluating both the transient and the steady-state availability and providing the obtained results, with the hope that more people may utilize these powerful techniques.

References

[1] D. I. Heimann, N. Mittal, K. Trivedi, Availability and reliability modeling for computer systems, in: M. C. Yovits (Ed.), Advances in Computers, Vol. 31, Academic Press, San Diego, CA, USA, 1990, pp. 175–233.

[2] J. K. Muppala, M. Malhotra, K. S. Trivedi, Markov dependability models of complex systems: Analysis techniques, in: S. Özekici (Ed.), Reliability and Maintenance of Complex Systems, Springer, Berlin, Germany, 1996, pp. 442–486.

[3] R. Sahner, K. S. Trivedi, A. Puliafito, Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1995.

[4] B. S. Dhillon, Reliability Engineering in Systems Design and Operation, Van Nostrand Reinhold, New York, NY, USA, 1983.

[5] K. S. Trivedi, R. M. Geist, Decomposition in reliability analysis of fault-tolerant computer systems, IEEE Trans. on Reliab. 32 (1983) 463–468.

[6] R. Geist, M. Smotherman, K. S. Trivedi, J. Bechta Dugan, Reliability analysis of life-critical systems, Acta Informatica 23 (6).

[7] R. M. Geist, M. Smotherman, K. S. Trivedi, J. B. Dugan, The use of Weibull fault processes in modeling fault tolerant systems, AIAA Journal of Guidance, Control, and Dynamics 11 (1988) 91–93.

[8] R. F. Botta, C. M. Harris, W. G. Marchal, Characterizations of generalized hyperexponential distribution functions, Communications in Statistics - Stochastic Models 3 (1) (1987) 115–148.

[9] M. F. Neuts, Renewal process of phase type, Naval Research Logistics Quarterly 25 (3) (1978) 445–454.

[10] E. Çinlar, Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs, NJ, USA, 1975.

[11] V. G. Kulkarni, Modeling and Analysis of Stochastic Systems, Chapman & Hall, London, UK, 1995.

[12] D. R. Cox, The analysis of non-Markovian stochastic processes by the inclusion of supplementary variables, Proceedings of the Cambridge Philosophical Society 51 (3) (1955) 433–441.

[13] E. Brockmeyer, H. L. Halstrøm, A. Jensen, The life and works of A. K. Erlang, Transactions of the Danish Academy of Technical Sciences 2.

[14] D. R. Cox, Use of complex probabilities in the theory of stochastic processes, Proceedings of the Cambridge Philosophical Society 51 (1955) 313–318.

[15] R. F. Botta, C. M. Harris, Approximation with generalized hyperexponential distribution functions: Weak convergence results, Queueing Systems Theory and Applications 1 (2) (1986) 169–190.

[16] M. F. Neuts, Probability distributions of phase type, in: Liber Amicorum Prof. Emeritus H. Florin, University of Louvain, Belgium, 1975, pp. 173–206.

[17] M. F. Neuts, Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach, Johns Hopkins University Press, Baltimore, MD, USA, 1981.

[18] V. Ramaswami, M. F. Neuts, A duality for phase type queues, The Annals of Probability 8 (5) (1980) 974–985.


[19] A. Bobbio, A. Cumani, A. Premoli, O. Saracco, Modelling and identification of non-exponential distributions by homogeneous Markov processes, in: Proceedings of the 6th Advances in Reliability Technology Symposium, Bradford, 1980, pp. 373–392.

[20] A. Bobbio, A. Premoli, O. Saracco, Multi-state homogeneous Markov models in reliability analysis, Microelectronics and Reliability 20 (1980) 875–880.

[21] A. Bobbio, A. Cumani, A Markov approach to wear-out modelling, Microelectronics and Reliability 23 (1) (1983) 113–119.

[22] A. Cumani, On the canonical representation of homogeneous Markov processes modelling failure-time distributions, Microelectronics and Reliability 22 (3) (1982) 583–602.

[23] W. L. Smith, Renewal theory and its ramifications, Journal of the Royal Statistical Society, Series B 20 (2) (1958) 243–302.

[24] C. Fröberg, Introduction to Numerical Analysis, 2nd ed., Addison-Wesley Publishing Company, Reading, MA, USA, 1969.

[25] R. German, D. Logothetis, K. S. Trivedi, Transient analysis of Markov regenerative stochastic Petri nets: A comparison of approaches, in: Proceedings of the 6th International Workshop on Petri Nets and Performance Models - PNPM'95, Durham, NC, USA, 1995, pp. 103–111.

[26] M. Telek, A. Bobbio, L. Jereb, A. Puliafito, K. S. Trivedi, Steady state analysis of Markov regenerative SPN with age memory policy, in: H. Beilner, F. Bause (Eds.), Lecture Notes in Computer Science, Vol. 977, 1995, pp. 165–179.

[27] R. Fricks, M. Telek, A. Puliafito, K. Trivedi, Markov renewal theory applied to performability evaluation, in: K. Bagchi, G. Zobrist (Eds.), State-of-the-Art in Performance Modeling and Simulation. Modeling and Simulation of Advanced Computer Systems: Applications and Systems, Gordon and Breach Publishers, Newark, NJ, USA, 1997, pp. 193–236.

[28] A. Bobbio, M. Telek, Markov regenerative SPN with non-overlapping activity cycles, in: Proceedings of the 1st Annual IEEE International Computer


Performance & Dependability Symposium - IPDS'95, Erlangen, Germany, 1995, pp. 124–133.

[29] H. Choi, V. G. Kulkarni, K. S. Trivedi, Markov regenerative stochastic Petri nets, Performance Evaluation 20 (1994) 337–357.

[30] D. Logothetis, K. Trivedi, Time-dependent behavior of redundant systems with deterministic repair, in: W. J. Stewart (Ed.), Computations with Markov Chains, Kluwer Academic Publishers, Norwell, MA, USA, 1995, pp. 135–150.

[31] M. Smotherman, K. Zemoudeh, A non-homogeneous Markov model for phased-mission reliability analysis, IEEE Transactions on Reliability 38 (5) (1989) 585–590.

[32] M. K. Smotherman, R. M. Geist, Phased mission effectiveness using a non-homogeneous Markov reward model, Reliability Engineering & System Safety 27 (2) (1990) 241–255.

[33] R. German, C. Lindemann, Analysis of stochastic Petri nets by the method of supplementary variables, Performance Evaluation 20 (1–3) (1994) 317–335.

[34] R. German, Transient analysis of deterministic and stochastic Petri nets by the method of supplementary variables, in: Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems Conference - MASCOTS'95, Durham, NC, USA, 1995, pp. 394–398.

[35] A. Bobbio, V. G. Kulkarni, A. Puliafito, M. Telek, K. Trivedi, Preemptive repeat identical transitions in Markov regenerative stochastic Petri nets, in: Proceedings of the 6th International Workshop on Petri Nets and Performance Models - PNPM'95, Durham, NC, USA, 1995, pp. 113–122.

[36] S. Garg, A. Puliafito, M. Telek, K. Trivedi, Analysis of preventive maintenance in transactions based software systems, submitted for publication.

[37] D. Logothetis, K. Trivedi, Dependability evaluation of the double counter-rotating ring with concentrator attachments, ACM/IEEE Transactions on Networks 2 (5) (1994) 520–532.


[38] D. Logothetis, K. Trivedi, The effect of detection and restoration times for error recovery in communication networks, to appear in the Journal of Network and Systems Management.

[39] M. Telek, Some advanced reliability modeling techniques, Ph.D. thesis, Technical University of Budapest, Department of Telecommunications, Budapest, Hungary (1994).

[40] M. Telek, A. Pfening, Performance analysis of Markov regenerative reward models, Performance Evaluation 27 & 28 (1996) 1–18.

[41] F. Baccelli, K. S. Trivedi, Analysis of an M/G/2 standby redundant system, Performance (1983) 457–476.

[42] V. V. Kalashnikov, Mathematical Methods in Queueing Theory, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994.

[43] H. Li, T. Yang, Single-server retrial queue with server vacations and a finite number of input sources, European Journal of Operational Research 85 (1) (1995) 149–160.

[44] J. Cao, Reliability analysis of M/G/1 queueing system with repairable service station of reliability series structure, Microelectronics and Reliability 34 (4) (1994) 721–725.

[45] Y.-M. Chen, T. Fujisawa, H. Osawa, Availability of the system with general repair time distributions and shut-off rules, Microelectronics and Reliability 33 (1) (1993) 13–19.

[46] B. S. Dhillon, O. C. Anude, Common-cause failure analysis of a non-identical unit parallel system with arbitrarily distributed repair times, Microelectronics and Reliability 33 (1) (1993) 87–103.

[47] B. S. Dhillon, O. C. Anude, Income optimization of repairable and redundant system, Microelectronics and Reliability 34 (11) (1994) 1709–1720.

[48] B. S. Dhillon, N. Yang, Availability of a man-machine system with critical and non-critical human error, Microelectronics and Reliability 33 (10) (1993) 1511–1521.


[49] M. N. Gopalan, Dinesh Kumar, On the transient behaviour of a repairable system with a warm standby, Microelectronics and Reliability 36 (4) (1996) 525–532.

[50] W. Shao-Ming, H. Ren, W. De-Jun, Reliability analysis of a repairable system without being repaired "as good as new", Microelectronics and Reliability 34 (2) (1994) 357–360.

[51] C. Singh, R. Billinton, Reliability modelling in systems with non-exponential down time distributions, IEEE Transactions on Power Apparatus and Systems PAS-92 (1973) 790–800.

[52] R. German, New results for the analysis of deterministic and stochastic Petri nets, in: Proceedings of the IEEE International Computer Performance and Dependability Symposium - IPDS'95, Erlangen, Germany, 1995, pp. 114–123.

[53] R. German, C. Kelling, A. Zimmermann, G. Hommel, TimeNET - a toolkit for evaluating non-Markovian stochastic Petri nets, in: Proceedings of the 6th International Workshop on Petri Nets and Performance Models - PNPM'95, Durham, NC, USA, 1995, pp. 210–211.

[54] C. Kelling, R. German, A. Zimmermann, G. Hommel, TimeNET: Evaluation tool for non-Markovian stochastic Petri nets, in: Proceedings of the 1996 IEEE International Computer Performance & Dependability Symposium - IPDS'96, Urbana-Champaign, IL, USA, 1996, p. 62.

[55] C. Lindemann, G. Ciardo, R. German, G. Hommel, Performability modeling of an automated manufacturing system with deterministic and stochastic Petri nets, in: Proceedings of the IEEE International Conference on Robotics and Automation, Atlanta, GA, USA, 1993, pp. 576–581.

[56] Wolfram Research, Wolfram Mathematica® Website URL (June 2009).

[57] W. Xie, Markov Regenerative Process in SHARPE, Master's thesis, Duke University, Department of Electrical and Computer Engineering, Durham, NC, USA (1999).

[58] V. Mainkar, K. Trivedi, Approximate analysis of priority scheduling systems using stochastic reward nets, in: Proceedings of the 13th International

30

Conference on Distributed Computing Systems - ICDCS’93, Pittsburgh, PA, USA, 1993, pp. 466–473. [59] T. Nakagawa, S. Osaki, Markov renewal processes with some nonregeneration points and their applications to reliability theory, Microelectronics Reliability 15 (6) (1976) 633 – 636. [60] Q. Jin, Y. Sugasawa, K. Seya, Probabilistic behavior and reliability analysis for a multi-robot system by applying petri net and markov renewal process theory, Microelectronics and Reliability 29 (6) (1989) 993 – 1001. [61] N. Limnios, G. Oprisan, Semi-Markov Processes and Reliability, Statistics for Industry and Technology, Birkh¨auser, Boston, MA, USA, 2001. [62] J. Janssen, R. Manca, Semi-Markov Risk Models for Finance, Insurance and Reliability, Springer, 2007. [63] R. E. Barlow, F. Proschan, Mathematical Theory of Reliability, Classics in Applied Mathematics, Wiley, New York, 1965. [64] A. Lisnianski, G. Levitin, Multi-State System Reliability - Assessment, Optimization and Applications, Vol. 6 of Series On Quality, Reliability And Engineering Statistics, World Scientific Publishing Co, 2003. [65] A. Csenki, On the interval reliability of systems modelled by finite semimarkov processes, Microelectron. Reliab. 34 (8) (1994) 1319–1335. [66] A. Csenki, Joint interval reliability for markov systems with an application in transmission line reliability, Reliability Engineering & System Safety 92 (6) (2007) 685–696. [67] L. Yin, R. Fricks, K. Trivedi, Application of semi-markov process and ctmc to evaluation of ups system availability, in: Reliability and Maintainability Symposium, 2002. Proceedings. Annual, 2002, pp. 584–591. [68] A. Pievatolo, E. Tironi, I. Valade, Semi-markov processes for power system reliability assessment with application to uninterruptible power supply, Power Systems, IEEE Transactions on 19 (3) (2004) 1326–1333. [69] F. Grabski, The reliability of an object with semi-Markov failure rate, Applied Mathematics and Computation 135 (1) (2003) 1 – 16. 31

[70] M. das Chagas Moura, E. L. Droguett, Mathematical formulation and numerical treatment based on transition frequency densities and quadrature methods for non-homogeneous semi-Markov processes, Reliability Engineering & System Safety 94 (2) (2009) 342–349.
[71] E. L. Droguett, M. das Chagas Moura, C. M. Jacinto, M. F. S. Jr., A semi-Markov model with Bayesian belief network based human error probability for availability assessment of downhole optical monitoring systems, Simulation Modelling Practice and Theory 16 (10) (2008) 1713–1727 (special issue: The Analysis of Complex Systems).
[72] R. Pérez-Ocón, J. E. Ruiz-Castro, Two models for a repairable two-system with phase-type sojourn time distributions, Reliability Engineering & System Safety 84 (3) (2004) 253–260.
[73] D. Montoro-Cazorla, R. Pérez-Ocón, A deteriorating two-system with two repair modes and sojourn times phase-type distributed, Reliability Engineering & System Safety 91 (1) (2006) 1–9.
[74] J. E. Ruiz-Castro, R. Pérez-Ocón, G. Fernández-Villodre, Modelling a reliability system governed by discrete phase-type distributions, Reliability Engineering & System Safety 93 (11) (2008) 1650–1657.
[75] R. Fricks, L. Yin, K. Trivedi, Application of semi-Markov process and CTMC to evaluation of UPS system availability, in: Proceedings of the Annual Reliability and Maintainability Symposium - RAMS'2002, 2002, pp. 584–591.
[76] M. Perman, A. Senegacnik, M. Tuma, Semi-Markov models with an application to power-plant reliability analysis, IEEE Transactions on Reliability 46 (4) (1997) 526–532.
[77] N. Wereley, B. Walker, Approximate semi-Markov chain reliability models, in: Proceedings of the 27th IEEE Conference on Decision and Control, Vol. 3, 1988, pp. 2322–2329.
[78] K. Vaidyanathan, D. Selvamuthu, K. S. Trivedi, Analysis of inspection-based preventive maintenance in operational software systems, in: Proceedings of the IEEE Symposium on Reliable Distributed Systems, 2002, p. 286.

[79] V. P. Koutras, A. N. Platis, Semi-Markov availability modeling of a redundant system with partial and full rejuvenation actions, in: Proceedings of the International Conference on Dependability of Computer Systems, 2008, pp. 127–134.
[80] W. Xie, H. Sun, Y. Cao, K. S. Trivedi, Modeling of user perceived webserver availability, in: Proceedings of the IEEE International Conference on Communications, 2003.
[81] I. Mura, A. Bondavalli, Markov regenerative stochastic Petri nets to model and evaluate phased mission systems dependability, IEEE Transactions on Computers 50 (12) (2001) 1337–1351.
[82] Y.-C. Mo, D. Siewiorek, X.-Z. Yang, Mission reliability analysis of fault-tolerant multiple-phased systems, Reliability Engineering & System Safety 93 (7) (2008) 1036–1046 (special issue: Bayesian Networks in Dependability).
[83] I. A. Papazoglou, Semi-Markovian reliability models for systems with testable components and general test/outage times, Reliability Engineering & System Safety 68 (2) (2000) 121–133.
[84] J. A. Faria, M. A. Matos, An analytical methodology for the dependability evaluation of non-Markovian systems with multiple components, Reliability Engineering & System Safety 74 (2) (2001) 193–210.
[85] A. El-Gohary, Estimations of parameters in a three state reliability semi-Markov model, Applied Mathematics and Computation 154 (2) (2004) 389–403.
