Lectures in Reliability Theory With Applications to Preventive Maintenance

I. Gertsbakh

January 2000


Contents

PREFACE vii

Introduction ix

1 System Reliability 1
  1.1 The System and Its Components 1
  1.2 Independent Components 6
  1.3 Lifetime Distribution 10
  1.4 Exercises 14

2 Parametric Lifetime Distributions 17
  2.1 Poisson Process 17
  2.2 Aging and Failure Rate 24
  2.3 Normal, Lognormal and Weibull Families 29
    2.3.1 Normal and Lognormal Distributions 29
    2.3.2 The Weibull Distribution 32
  2.4 Exercises 39

3 Statistical Inference: Incomplete Data 41
  3.1 Kaplan-Meier Estimator 41
  3.2 Probability Paper 44
  3.3 Censored Data: Parameter Estimation 48
    3.3.1 The Maximum Likelihood Method 48
    3.3.2 Censored and Grouped Data 50
    3.3.3 Censored Weibull and Lognormal Samples 53
    3.3.4 Point and Confidence Estimation 55
    3.3.5 Large-Sample Confidence Intervals 59
  3.4 Exercises 61

4 Preventive Maintenance Models 67
  4.1 Renewal Theory: Basic Facts 67
    4.1.1 Renewal Process 67
    4.1.2 Renewal Reward Process 72
    4.1.3 Alternating Renewal Process 74
  4.2 Principal Maintenance Models 77
    4.2.1 Block Replacement: Cost Criterion 77
    4.2.2 Block Replacement: Availability Criterion 78
    4.2.3 Periodic Group Repair 79
    4.2.4 Minimal Repair 81
    4.2.5 Age Replacement: Cost-Type Criterion 84
    4.2.6 Age Replacement: Availability-Type Criterion 85
  4.3 Qualitative Investigation 86
    4.3.1 The Principal Behavior of η(T) 86
    4.3.2 What Is Better: Age or Block Replacement? 88
    4.3.3 Finding the Optimal Maintenance Period (Age) T* 88
    4.3.4 Contamination of F(t) by Early Failures 89
    4.3.5 Treating Uncertainty in Data 89
    4.3.6 Age Replacement for the IFR Family 91
    4.3.7 Age Replacement with Cost Discounting 93
    4.3.8 Random Choice of Preventive Maintenance Periods 94
  4.4 Opportunistic Maintenance 97
    4.4.1 Introduction 97
    4.4.2 System Description. Coordination of Maintenance Actions. Cost Structure 97
    4.4.3 Optimization Algorithm. Basic Sequence 99
    4.4.4 Discussion. Possible Generalizations 101
  4.5 Exercises 103

5 Maintenance: Parameter Control 109
  5.1 Introduction. System Parameter Control 109
  5.2 Maintenance of a Multiline System 111
  5.3 Two-Stage Failure Model 113
    5.3.1 The Failure Model 113
    5.3.2 Preventive Maintenance. Costs 114
    5.3.3 Some Facts About the Nonhomogeneous Poisson Process 115
    5.3.4 Finding Optimal Location of the PM Points 116
  5.4 Markov-Type Processes With Rewards (Costs) 118
    5.4.1 Markov Chain 118
    5.4.2 Semi-Markov Process 120
    5.4.3 SMP With Rewards (Costs) 121
    5.4.4 Continuous-Time Markov Process 122
  5.5 Opportunistic Replacement of a Two-Component System 124
    5.5.1 General Description 124
    5.5.2 Formal Setting of the Problem 125
    5.5.3 Numerical Example 126
  5.6 Malfunction in an Automatic Machine 129
    5.6.1 Problem Description 129
    5.6.2 Problem Formalization 130
    5.6.3 Numerical Example 132
  5.7 Multidimensional State Description 133
    5.7.1 General Description 133
    5.7.2 The Best Scalarization 134
    5.7.3 Preventive Maintenance: Multidimensional State Description 137

6 Best Time Scale For Age Replacement 141
  6.1 Introduction 141
  6.2 Finding the Optimal Time Scale 143
  6.3 Optimal Time Scale for Age Replacement 144
  6.4 Algorithm for Finding the Optimal Ta 147
    6.4.1 Complete Samples 147
    6.4.2 Right-Censored Samples 148
    6.4.3 Quantal-Type Data: Parametric Approach 149
  6.5 Optimal Age Replacement for Fatigue Data 149

7 Maintenance With Learning 153
  7.1 Information Update: Learning From New Data 153
  7.2 Maintenance Without Information Update 156
  7.3 Rolling Horizon and Information Update 157
  7.4 Actual and "Ideal" Rewards 158
  7.5 Numerical Example 158
  7.6 Exercises 160

Solutions to Exercises 165

Appendix 1: Nonhomogeneous Poisson Process 191
  A.1 Basic Properties 191
  A.2 Intensity Function: Estimation 194

Appendix 2: Covariances and Means of Order Statistics 199

Appendix 3: The Laplace Transform 205

Appendix 4: Probability Paper 207

Appendix 5: Renewal Function 211

References 213

Notation and Abbreviations 217

Index 219

PREFACE

This is a one-semester course in Reliability Theory and Preventive Maintenance given to M.Sc. students of the Industrial Engineering Department of BGU in the 97/98 and 98/99 academic years. Engineering students are mainly interested in the applied part of the theory. The value of preventive maintenance theory lies in the possibility of its implementation, and the latter crucially depends on how we handle statistical reliability data. The very nature of the object of reliability theory -- system lifetime -- makes it extremely difficult to observe much data. The data available are usually incomplete, e.g. heavily censored. Thus the desire to make the course material more applicable led me to include such topics as modeling system lifetime distributions (Chapters 1, 2) and maximum likelihood techniques for lifetime data processing (Chapter 3). A course in Theory of Statistics is a prerequisite for these lectures. Standard courses usually pay very little attention to the techniques needed for our purpose; a short summary of them is given in Chapter 3, including the widely used probability plotting.

Chapter 4 describes the most useful and popular models of preventive maintenance and replacement. Some practical aspects of applying these models are addressed, such as treating uncertainty in the data, the role of data contamination and the opportunistic scheduling of maintenance activities.

Chapter 5 presents maintenance models based on monitoring a "prognostic" parameter. The formal treatment of these models requires some basic facts about Markov-type processes with rewards (costs). In recent years there has been growing interest in maintenance models based on monitoring the process of damage accumulation. Good examples are the works dealing with preventive maintenance of such "nontypical" objects as bridges, concrete structures, pipelines, dams, etc.
The chapter is concluded by considering a general methodology for planning preventive maintenance when a system has a multidimensional state parameter. The main idea is to make the maintenance decisions depend on the value of a one-dimensional system "health index". The material of Chapter 6 is new for a traditional course. It is based on recent works of Kh. Kordonsky and considers the choice of the best time scale for age replacement. It would not be an exaggeration to say that the correct


choice of the time scale is a central issue in any sphere of reliability applications. Chapter 7 shows an example of learning in the process of servicing a system. Several strong assumptions were made to keep the mathematics maximally simple. It is important to demonstrate to the students that the combination of prior knowledge with new data received in the process of decision making is, in fact, a universal phenomenon which may have various useful applications.

It takes me, on average, two weeks in the classroom (3 hours weekly) to deliver the material of one chapter. In addition, I spend some time explaining the most useful procedures of Mathematica needed for the numerical analysis of the theoretical models and for solving the exercises. Getting to "real" numbers and graphs always gives the students a good feeling and develops better intuition and understanding, especially when the material is saturated with statistical notions. The course is concluded by detailed solutions of the exercises, including numerical investigation by means of Mathematica.

Ilya Gertsbakh Beersheva, January 2000.

Introduction

Patrick D.T. O'Connor (1991), page xx, gives the definition of reliability "as the ability to perform a required function under stated conditions for a stated period of time". This definition follows the British standard BS 4778. The same standard defines failure as "the termination of the ability to perform a required function". To make these definitions work and become a basis for developing a quantitative theory, one has to specify what "the required function" is and what constitutes "the termination of the ability" to perform this function.

Consider, for example, a car tire. Certainly, its puncture terminates the ability of the tire to function. This is an example of a "sudden" failure: a sudden failure develops in a very short interval of time. The opposite of a sudden failure is a "gradual" failure. For the same tire, one might specify its proper functioning as the ability to avoid skidding, and its failure as wearout of the tire beyond some prescribed critical level. Typically, a gradual failure is defined in terms of a certain physical parameter associated with the system (component), as the event that this parameter becomes smaller (or larger) than some prescribed critical value.

An important step in the formal development of reliability theory is introducing the notion of lifetime. It is defined as the time elapsed from the "birth" of the system until the appearance of a certain event which we call "failure". For example, the lifetime of a car cooling system is the mileage accumulated by the car from its acquisition until the appearance of the malfunction (failure) in the cooling system which reveals itself as the loss of the ability to cool the car interior. It is important to stress that it is not always clear in which units we should measure the lifetime. For a car, it might be calendar time, mileage or something else. For the cooling system, for example, the "relevant" time might be the number of air conditioner operation hours.
After the lifetime is defined, we are able to define a principal quantitative notion of reliability theory: system (component) reliability. Typically, it is the probability that the lifetime τ will exceed some value t in the already defined time scale. Formally, we call R(t) = P(τ > t) the reliability function. Often the term survival function is used. Note that 1 − R(t) = P(τ ≤ t) = F(t) is the cumulative distribution function (c.d.f.) of the random variable (r.v.) τ. Most operating systems are not of the "one-shot" type, i.e. after failure


appearance, the system is repaired and starts operating again. Therefore, the so-called renewable systems have alternating "up" and "down" periods. An important characterization of the reliability of renewable systems is made by using the notions of interval and stationary availability. Suppose an important mission for which the system is designed lasts time ∆. Those who use the system will be interested first of all in the probability that the system will be up at some time instant t, when the mission begins, and will not fail during the subsequent period of length ∆. Denote this probability by Av(t, ∆). This is the so-called interval availability, an important characteristic, especially for highly expensive military equipment. Very popular is another reliability index, the limiting value of Av(t, ∆) as t → ∞ and ∆ → 0. The resulting index, called stationary availability and denoted Av, is the probability that the system will be up at some remote time instant t. It is worth noting that Av is approximately equal to the ratio of the total up time on [0, t] to t, for large values of t.

A somewhat more general characterization of a system which has up and down periods can be made in terms of the system's average reward accumulated during the time period [0, t]. Dealing with rewards gives certain advantages in the formal treatment. For example, each unit of normal operation gives a certain reward, while each unit of idle time and each failure and its elimination (repair) cost money, i.e. formally result in a negative reward. We are usually interested in maximizing the average reward by varying the maintenance policy. The additive reward structure over time allows us to use such a powerful tool as dynamic programming to achieve the optimum reward. Very often it is more convenient to operate with the average reward per unit time, which is formally defined as the limit, as t → ∞, of the total accumulated reward on [0, t] divided by t.
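As a small numerical illustration of the time-average interpretation of Av, here is a simulation sketch of mine (not from the lectures): the exponential up/down durations and the numeric means are assumptions chosen only for illustration.

```python
import random

def estimate_availability(mean_up, mean_down, n_cycles=20000, seed=1):
    """Estimate stationary availability by simulating alternating
    up/down periods; exponential durations are an assumption here."""
    rng = random.Random(seed)
    up_time = total_time = 0.0
    for _ in range(n_cycles):
        up = rng.expovariate(1.0 / mean_up)      # length of one "up" period
        down = rng.expovariate(1.0 / mean_down)  # repair period that follows
        up_time += up
        total_time += up + down
    return up_time / total_time

# The long-run fraction of "up" time approaches mean_up / (mean_up + mean_down).
est = estimate_availability(mean_up=10.0, mean_down=2.0)
print(est)  # close to 10/12
```

The ratio-of-means limit observed here is exactly the renewal-reward fact that is treated formally in Chapter 4.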
What are the typical problems solved by reliability theory? One direction in reliability theory is called structural reliability. Typically, it studies the connection between the properties of an assembly of components (i.e. a system) and the properties of the components themselves. Suppose, for example, that the component reliability functions and/or their stationary availabilities are known, and assume that the component failures are independent events. Structural reliability theory enables us to evaluate the system reliability and/or availability on the basis of the reliabilities and availabilities of the system components. Chapter 1 is devoted to the basics of structural reliability.

Another circle of problems tackled by reliability theory is the probabilistic modeling of lifetime. Which parametric families of lifetime distributions are suitable for describing random lifetime? How does one connect the physical failure model with the corresponding random mechanism causing failure, and how does one derive from it the analytic form of the lifetime distribution? Answering these questions is of primary importance for all applications of reliability theory. A notion which was in fact introduced by reliability theory and which plays a major role in lifetime modeling is the failure rate (often also termed hazard rate). This notion allows one to connect the probabilistic properties of the lifetime distribution

with the physical characteristics of systems, such as aging and susceptibility to failure. Chapter 2 considers the most useful lifetime distributions in reliability and reviews their principal properties.

The implementation of theoretical models in practical situations depends crucially on our ability to support these models by data. The simplest situation is when we assume the analytic form of the lifetime distribution to be known, but only up to certain parameters. These parameters must be estimated from test or field data. Here is the point where statistical inference issues enter reliability theory. The characteristic feature of statistical inference in reliability is the presence of incomplete observations. If we observe lifetime, then contrary to observations of other physical parameters (e.g. weight, height, voltage, etc.), we can hardly hope to obtain a complete sample. Special methods are used to process incomplete and censored data, among them the method of maximum likelihood. Statistical inference for incomplete data is reviewed in Chapter 3.

The main emphasis of this book is on preventive maintenance (PM) issues. A typical problem of that sort is the choice of the optimal age in the so-called age replacement scheme. Suppose that a system (component) has a known lifetime distribution F(t). Its failure costs C, and its planned repair (so-called preventive maintenance) costs a much smaller amount c, c ≪ C.

A state vector x is called a cut vector if φ(x) = 0. The set C(x) = {i : xi = 0} is then called a cut set. If, in addition, for any y, y > x, φ(y) = 1, then the corresponding cut set is called a minimal cut set, or simply a minimal cut. #

A state vector x is called a path vector if φ(x) = 1. The set A(x) = {i : xi = 1} is then called a path set. If, in addition, for any y, y < x, φ(y) = 0, then the corresponding path set is called a minimal path set, or minimal path. #

A minimal cut set is a minimal set of components whose failure causes the failure of the whole system. If all elements of a path set are "up", then the system is up.
A minimal path is a minimal set of elements whose functioning (i.e. being up) ensures that the system is up. A minimal path set cannot be reduced; it has no redundant elements.

Examples 1.5, 1.6 continued. For Example 1.5, x1 = (1, 1, 1, 0) is a path vector. The corresponding path set is {1, 2, 3}. It is not, however, a minimal path set, because element 2 can be turned down and the system is still up. {1, 3} is a minimal path set. There are three other minimal path sets. Find them! For Example 1.6, there are two minimal path sets, {1, 2} and {3, 4, 5}. The system in Fig. 1.3 has two minimal cuts: {1, 2} and {3, 4}. The set {1, 2, 3} is also a cut set but not a minimal one. The system in Fig. 1.4 has six minimal cuts of the form {i, j}, where i = 1, 2 and j = 3, 4, 5. #

Theorem 1.1: structure function representation.


Let P1, P2, ..., Ps be the minimal path sets of the system. Then

φ(x) = 1 − ∏_{j=1}^{s} (1 − ∏_{i∈Pj} xi) .   (1.1.4)

Let C1, C2, ..., Ck be the minimal cut sets of the system. Then

φ(x) = ∏_{j=1}^{k} (1 − ∏_{i∈Cj} (1 − xi)) .   (1.1.5)

Proof
Assume that there is at least one minimal path set all elements of which are up, say P1. Then ∏_{i∈P1} xi = 1, and this leads to φ(x) = 1. Suppose now that the system is up. Then there must be one minimal path set having all of its elements in the up state. Thus the right-hand side of (1.1.4) is 1. Therefore, φ(x) = 1 if and only if there is one minimal path set having all its elements in the up state. This proves (1.1.4). We omit the proof of (1.1.5), which is similar.

It follows from Theorem 1.1 that any monotone system can be represented in two equivalent ways: as a series connection of parallel subsystems, each being a minimal cut set, or as a parallel connection of series subsystems, each being a minimal path set. Therefore, there are two possibilities to represent the structure function. After corresponding simplifications, they become identical, as the following example shows.

Example 1.5 continued. The structure function given above for the system in Fig. 1.3 is based on the minimal cuts {1, 2} and {3, 4}. The system also has four minimal paths {1, 3}, {1, 4}, {2, 3}, {2, 4}. Thus,

φ(x) = 1 − (1 − x1 x3)(1 − x1 x4)(1 − x2 x3)(1 − x2 x4).

The structure function based on minimal cuts has been presented earlier in Example 1.5. Both formulas produce identical results. To verify that, it is necessary to simplify both expressions. Note that for binary variables, xi^k = xi for any integer k. #

More information on monotone systems and their structure functions can be found in the literature, e.g. in Barlow and Proschan (1975), Chapter 1, and Gertsbakh (1989), Chapter 1.
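The definitions and Theorem 1.1 are easy to check by brute force. The following sketch of mine (not from the text) encodes the structure function of the system of Example 1.5 as read from the discussion above (a parallel pair {1, 2} in series with a parallel pair {3, 4}), enumerates its minimal path and cut sets, and verifies that representations (1.1.4) and (1.1.5) reproduce φ on all 2^4 state vectors.

```python
from itertools import product, combinations

def phi(x):
    # Structure function of the system of Example 1.5 (assumed reading):
    # parallel pair {1,2} in series with parallel pair {3,4}.
    return int((x[0] or x[1]) and (x[2] or x[3]))

n = 4

def vec(up_set):
    # State vector with exactly the components in `up_set` up (0-indexed).
    return tuple(1 if i in up_set else 0 for i in range(n))

def down(cut):
    # State vector with exactly the components in `cut` down.
    return tuple(0 if i in cut else 1 for i in range(n))

all_sets = [set(c) for r in range(n + 1) for c in combinations(range(n), r)]

path_sets = [s for s in all_sets if phi(vec(s)) == 1]
min_paths = [s for s in path_sets if not any(p < s for p in path_sets)]

cut_sets = [s for s in all_sets if phi(down(s)) == 0]
min_cuts = [s for s in cut_sets if not any(c < s for c in cut_sets)]

def phi_paths(x):            # representation (1.1.4)
    prod = 1
    for P in min_paths:
        term = 1
        for i in P:
            term *= x[i]
        prod *= (1 - term)
    return 1 - prod

def phi_cuts(x):             # representation (1.1.5)
    prod = 1
    for C in min_cuts:
        term = 1
        for i in C:
            term *= (1 - x[i])
        prod *= (1 - term)
    return prod

assert all(phi(x) == phi_paths(x) == phi_cuts(x)
           for x in product([0, 1], repeat=n))
print(sorted(sorted(i + 1 for i in s) for s in min_paths))  # [[1,3],[1,4],[2,3],[2,4]]
print(sorted(sorted(i + 1 for i in s) for s in min_cuts))   # [[1,2],[3,4]]
```

The printed minimal path sets also answer the "find the three other minimal paths" question posed in Example 1.5.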

1.2 Independent Components: System Reliability and Stationary Availability

Contrary to Section 1, let us assume now that the state of component i is described by a binary random variable Xi, defined as follows:

P(Xi = 1) = pi,  P(Xi = 0) = 1 − pi,   (1.2.1)

where 1 or 0 correspond to the operational and failure state, respectively. It will be assumed that all components are mutually independent. This implies a considerable formal simplification: for independent components, the joint distribution of X1, X2, ..., Xn is completely determined by the component reliabilities p1, p2, ..., pn. Denote by X = (X1, X2, ..., Xn) the system state vector. It is now a random vector. Correspondingly, the system structure function φ(X) = φ(X1, ..., Xn) becomes a binary random variable: φ(X) = 1 corresponds to the system up state and φ(X) = 0 corresponds to the system down state.

Definition 2.1: system reliability
System reliability r0 is the probability that the system structure function equals 1:

r0 = P(φ(X) = 1) . #   (1.2.2)

Since φ(·) is a binary random variable, the last formula can be written as

r0 = E[φ(X)] .   (1.2.3)

(1.2.3) is a very useful expression, since the operation of taking the expectation E[·] is a very powerful tool for reliability calculations. The following examples show how to compute system reliability via the structure function.

Example 2.1: reliability of a series system
φ(X) = ∏_{i=1}^{n} Xi, and therefore

r0 = E[φ(X)] = ∏_{i=1}^{n} pi . #   (1.2.4)

Example 2.2: parallel system
Here φ(X) = 1 − ∏_{i=1}^{n} (1 − Xi). Thus

r0 = E[φ(X)] = 1 − ∏_{i=1}^{n} (1 − pi) . #   (1.2.5)
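Formula (1.2.3) also suggests a direct, if exponential-time, way to compute r0: enumerate all state vectors and weight φ by the probability of each. A small sketch of mine (the component reliabilities below are assumed values for illustration):

```python
from itertools import product

def expectation(phi, p):
    """E[phi(X)] for independent binary components with P(X_i = 1) = p_i."""
    r = 0.0
    for x in product([0, 1], repeat=len(p)):
        w = 1.0
        for xi, pi in zip(x, p):
            w *= pi if xi else (1 - pi)
        r += w * phi(x)
    return r

p = [0.9, 0.8, 0.7]  # illustrative component reliabilities (assumed values)

# For binary states, min(x) is the series structure function prod x_i,
# and max(x) is the parallel one, 1 - prod(1 - x_i).
series = expectation(lambda x: min(x), p)
parallel = expectation(lambda x: max(x), p)

prod_p = 0.9 * 0.8 * 0.7                           # (1.2.4)
one_minus = 1 - (1 - 0.9) * (1 - 0.8) * (1 - 0.7)  # (1.2.5)
print(round(series, 6), round(parallel, 6))
```

Both enumerated values agree with the closed-form products (1.2.4) and (1.2.5).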

Example 2.3: Series connection of parallel systems (Example 1.5, Section 1).


From the expression for φ(X) it follows immediately that

r0 = E[φ(X)] = [1 − (1 − p1)(1 − p2)][1 − (1 − p3)(1 − p4)] . #   (1.2.6)

Example 2.4: 2-out-of-4 system with identical elements
For this system,

r0 = E[φ(X)] = P(∑_{i=1}^{4} Xi ≥ 2) = ∑_{m=2}^{4} C(4, m) p^m (1 − p)^{4−m} . #   (1.2.7)
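The binomial sum in (1.2.7) is easy to check against brute-force enumeration. This is a sketch of mine with an assumed value p = 0.9:

```python
from itertools import product
from math import comb

def r_2_out_of_4(p):
    """Reliability of a 2-out-of-4 system of identical components,
    formula (1.2.7): sum over m = 2..4 of C(4, m) p^m (1-p)^(4-m)."""
    return sum(comb(4, m) * p**m * (1 - p)**(4 - m) for m in range(2, 5))

def r_enum(p):
    # Brute-force check: the system is up iff at least 2 of 4 components are up.
    total = 0.0
    for x in product([0, 1], repeat=4):
        w = 1.0
        for xi in x:
            w *= p if xi else (1 - p)
        if sum(x) >= 2:
            total += w
    return total

print(round(r_2_out_of_4(0.9), 6))  # prints 0.9963
```

The enumeration and the binomial sum agree for any p, since grouping the 2^4 state vectors by the number of up components yields exactly the binomial weights.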

Let (αi; p) denote the vector p with its i-th component replaced by αi. So, (1i; p) = (p1, ..., pi−1, 1, pi+1, ..., pn).

Theorem 2.1: pivotal decomposition
Let r0 = r(p) = r(p1, ..., pn) be the reliability of a monotone system with independent components. Then

r(p) = pi r(1i; p) + (1 − pi) r(0i; p) .   (1.2.8)

Proof
By definition, r0 = E[φ(X)] = P[φ(X) = 1]. By the formula of total probability,
r0 = pi P(φ(X) = 1 | Xi = 1) + (1 − pi) P(φ(X) = 1 | Xi = 0),
or
r0 = pi P(φ(1i; X) = 1 | Xi = 1) + (1 − pi) P(φ(0i; X) = 1 | Xi = 0).
Since X1, ..., Xn are independent, the last equality takes the form

r0 = pi P(φ(1i; X) = 1) + (1 − pi) P(φ(0i; X) = 1) .   (1.2.9)

(1.2.9) can be rewritten as

r(p) = pi r(1i; p) + (1 − pi) r(0i; p) .   (1.2.10)

Physically, r(1i; p) is the reliability of a system in which the i-th component is replaced by an absolutely reliable one; similarly, r(0i; p) is the reliability of the system in which the i-th component has failed. (1.2.10) is applied until the replacement of components by absolutely reliable and/or failed ones produces a structure whose reliability is easily computable, such as a series-parallel system. Let us demonstrate how to use the pivotal formula to compute the reliability of the bridge structure shown in Fig. 1.5.

Example 2.5: reliability of a bridge structure
The best choice is to pivot around element 3. Suppose component 3 is up. Then the bridge becomes a series connection of two parallel subsystems consisting of elements 1, 2 and 4, 5, respectively. Its reliability is

r(13; p) = [1 − (1 − p1)(1 − p2)][1 − (1 − p4)(1 − p5)] ,

compare with (1.2.6). If 3 is down, then the bridge becomes a parallel connection of two series systems: one with components 1, 4 and the second with components 2, 5. Its reliability is

r(03; p) = 1 − (1 − p1 p4)(1 − p2 p5) .

Therefore, the system reliability is r0 = p3 · r(13; p) + (1 − p3) · r(03; p).


Figure 1.5: Bridge structure

It is easy to finish the calculations. The final result is

r0 = E[φ(X)] = p1 p3 p5 + p2 p3 p4 + p2 p5 + p1 p4 − p1 p2 p3 p5 − p1 p2 p4 p5 − p1 p3 p4 p5 − p1 p2 p3 p4 − p2 p3 p4 p5 + 2 p1 p2 p3 p4 p5 .   (1.2.11)

It is instructive to obtain the same result by using, for example, the minimal paths. The bridge has four minimal path sets: {1, 3, 5}, {2, 3, 4}, {1, 4}, and {2, 5}. Thus the random structure function is

φ(X) = 1 − (1 − X1 X3 X5)(1 − X2 X3 X4)(1 − X2 X5)(1 − X1 X4) .   (1.2.12)

Open the parentheses, simplify the expression using the fact that Xi^k = Xi, and take the expectation. #

More on the use of pivotal decomposition can be found in Gertsbakh (1989) and Barlow (1998) and the references there.

System reliability r0 as defined above is of a purely "static" nature: system operation time is not present at all. One can imagine that a coin is flipped for element i, showing "up" and "down" with probabilities pi and 1 − pi, respectively. Then, for this static experiment, r0 is the probability that the system will be in the "up" state. Another interpretation might be the following. Assume that we fix a certain instant t* on the time axis and pi is the probability that component i is up at t*. Then by (1.2.2), r0 has the meaning of the probability that the system is up at t*. We will show in the next section how this fact leads to an expression for the system lifetime distribution function.

There is another interpretation of the quantity r0 which involves time and is related to the so-called system availability. First, suppose that each component has alternating up and down periods on the time axis. Respectively, the whole system also has alternating up and down periods on the time axis. Fig. 1.6 illustrates this situation for a series system of two components. System up periods are those during which both components are in the up state.
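A recursive implementation of the pivotal decomposition (1.2.8) makes it easy to check (1.2.11) numerically. This is a sketch of mine, not the text's Mathematica code; the component reliabilities are assumed values.

```python
from itertools import product

# Bridge structure: minimal path sets {1,3,5}, {2,3,4}, {1,4}, {2,5}.
MIN_PATHS = [{1, 3, 5}, {2, 3, 4}, {1, 4}, {2, 5}]

def phi(x):
    # x maps component number -> 0/1 state.
    return int(any(all(x[i] for i in P) for P in MIN_PATHS))

def reliability(p, fixed=None):
    """Pivotal decomposition (1.2.8): condition on the state of one
    still-free component until all component states are fixed."""
    fixed = dict(fixed or {})
    free = [i for i in p if i not in fixed]
    if not free:
        return float(phi(fixed))
    i = free[0]
    return (p[i] * reliability(p, {**fixed, i: 1})
            + (1 - p[i]) * reliability(p, {**fixed, i: 0}))

p = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.85, 5: 0.95}  # assumed values

r_pivot = reliability(p)

# Closed form (1.2.11):
p1, p2, p3, p4, p5 = (p[i] for i in range(1, 6))
r_formula = (p1*p3*p5 + p2*p3*p4 + p2*p5 + p1*p4
             - p1*p2*p3*p5 - p1*p2*p4*p5 - p1*p3*p4*p5
             - p1*p2*p3*p4 - p2*p3*p4*p5 + 2*p1*p2*p3*p4*p5)

assert abs(r_pivot - r_formula) < 1e-12
print(round(r_pivot, 6))
```

The recursion simply expands E[φ(X)] over all 2^5 state vectors, so it agrees with (1.2.11) up to floating-point error for any choice of the pi.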


Figure 1.6: The system is up if and only if both components are up

Consider component k and denote by ξj^(k) and ηj^(k), j = 1, 2, 3, ..., the sequential up and down periods for this component. The following quantity is called the stationary availability of component k:

Av^(k) = µ^(k) / (µ^(k) + ν^(k)) ,   (1.2.13)

where µ^(k) = E[ξj^(k)], ν^(k) = E[ηj^(k)].

Av^(k) has the following probabilistic interpretation. Let P^(k)(t) be the probability that component k is up at some time instant t. Then

Av^(k) = lim_{t→∞} P^(k)(t) .   (1.2.14)

One can say that the stationary availability is the probability that the component is up at some remote instant of time (formally, as t → ∞). Another interpretation is the following. Denote by V^(k)(T) the total amount of up time for component k in [0, T]. Then

Av^(k) = lim_{T→∞} E[V^(k)(T)]/T .   (1.2.15)

Now we are able to claim the following. Suppose that we have a formula expressing the "static" up probability r0 of the system as a function of the component up probabilities pi, i.e. r0 = ψ(p1, p2, ..., pn). (Let us remind the reader that the components are assumed to be independent.) Suppose that pk = Av^(k), k = 1, 2, ..., n, i.e. pk equals the stationary availability of the k-th component. Then the system stationary availability Av is equal to

Av = ψ(Av^(1), ..., Av^(n)) .   (1.2.16)

System stationary availability can be interpreted as the limiting probability that the system will be in the up state at a remote time moment. Alternatively, let V(T) be the total amount of up time for the system on [0, T]. Then Av = lim_{T→∞} E[V(T)]/T. The proof of (1.2.15) can be found, e.g., in Barlow and Proschan (1975), Chapter 7. For a series system, Av has the form

Av = ∏_{k=1}^{n} µ^(k) / (µ^(k) + ν^(k)) .   (1.2.17)

This important formula is widely used in reliability practice. System availability will be discussed in Chapter 4 in connection with alternating renewal processes.
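The composition rule (1.2.16) and the series formula (1.2.17) can be illustrated by simulation. In this sketch of mine, two independent components alternate between up and down states (exponential durations and all numeric values are assumptions), and the fraction of remote time points at which both are up is compared with (1.2.17).

```python
import random
from bisect import bisect_right

def switch_times(mean_up, mean_down, horizon, rng):
    """Times at which the component switches state, starting 'up' at 0.
    Exponential up and down durations are a modeling assumption."""
    t, times = 0.0, []
    while t < horizon:
        t += rng.expovariate(1.0 / mean_up)    # end of an up period
        times.append(t)
        t += rng.expovariate(1.0 / mean_down)  # end of a down period
        times.append(t)
    return times

def is_up(times, t):
    # An even number of switches before t means the component is up.
    return bisect_right(times, t) % 2 == 0

rng = random.Random(7)
horizon = 200000.0
comp = [switch_times(10.0, 2.0, horizon, rng),
        switch_times(8.0, 1.0, horizon, rng)]

# Fraction of sampled (remote) time points at which BOTH components are up:
samples = [1000.0 + i * 19.9 for i in range(10000)]
av_sim = sum(all(is_up(c, t) for c in comp) for t in samples) / len(samples)

av_formula = (10.0 / 12.0) * (8.0 / 9.0)  # series formula (1.2.17)
print(round(av_sim, 3), round(av_formula, 3))
```

The product form appears because, for independent components, the series system is up at a remote instant exactly when every component is, so ψ here is the product of the component availabilities.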

1.3 Lifetime Distribution of a System Without Component Renewal

The following properties of system components will be assumed for the exposition in this section:
(i) Component i has lifetime τi with known cumulative distribution function (c.d.f.) Fi(t) = P(τi ≤ t), i = 1, ..., n.
(ii) The τi are independent random variables (r.v.'s), i = 1, ..., n.
(iii) At t = 0, all components are up. A component which fails is not renewed or repaired; it remains in the failure (down) state "forever".

To define system lifetime, we need some extra notation. Let a binary 1/0 r.v. Xi(t) be defined as follows: Xi(t) = 1 if and only if τi > t; if τi ≤ t, then Xi(t) = 0. In other words, Xi(t) equals one as long as component i is up and becomes zero when this component goes down. Let X(t) = (X1(t), ..., Xn(t)) be the component state vector at time t.

Definition 3.1: system lifetime
System lifetime τ is the time until the first entrance of the system into the down (failure) state:

τ = inf[t : φ(X(t)) = 0] . #   (1.3.1)

We define system reliability R(t) as the probability that τ exceeds t: R(t) = P(τ > t). Often R(t) is called the survival probability or survival function. Let r0 = E[φ(X1, ..., Xn)] = ψ(p1, ..., pn) be the "static" reliability of the system, see Definition 2.1. The following theorem says how to find R(t) by means of the function ψ(p1, ..., pn). Denote by Ri(t) the reliability of component i, i.e. Ri(t) = P(τi > t).

Theorem 3.1: system reliability

R(t) = ψ(R1(t), R2(t), ..., Rn(t)) ,   (1.3.2)


where Ri(t) = 1 − Fi(t).

Proof
It follows from the definition of Ri(t) that R(t) is the probability that the system is operating (up) at time instant t. A monotone system consisting of nonrenewable components starts operating at t = 0 in the "up" state, eventually enters the "down" state and remains in it forever after the first entrance. Thus, if the system is "up" at the instant t, it was "up" during the whole time interval [0, t], i.e. R(t) = P(τ > t).

Theorem 3.1 says that the system reliability function is obtained by replacing, in the structure function, the component reliabilities pi by the corresponding reliability functions Ri(t).

Example 3.1: minimum-maximum calculus
For a series system, r0 = ∏_{i=1}^{n} pi. Then by Theorem 3.1,

R(t) = ∏_{i=1}^{n} (1 − Fi(t)) .   (1.3.3)

Let F(t) be the c.d.f. of the system lifetime τ. Then

P(τ ≤ t) = F(t) = 1 − R(t) = 1 − ∏_{i=1}^{n} (1 − Fi(t)) .   (1.3.4)

Let us recall how the formula for the c.d.f. of the minimum τ of n independent r.v.'s τ1, ..., τn is derived. The events {τ > t} and {τ1 > t, ..., τn > t} are equivalent. Thus P(τ > t) = ∏_{i=1}^{n} P(τi > t) = ∏_{i=1}^{n} Ri(t). Therefore, the lifetime of a series system of independent components coincides with the minimum lifetime of its components.

For a parallel system, r0 = 1 − ∏_{i=1}^{n} (1 − pi), and therefore

R(t) = 1 − ∏_{i=1}^{n} Fi(t) .   (1.3.5)

Similarly to the series system, it is easy to show that 1 − R(t) = ∏_{i=1}^n Fi(t) is the formula for the c.d.f. of the maximum of n independent r.v.'s, τ = max(τ1, τ2, ..., τn). In other words, the lifetime of a parallel system coincides with the maximal lifetime of its components. It is now obvious that the lifetime of a series-parallel system can be expressed via the component lifetimes using minimum and maximum operations. Let us consider, for example, the system shown in Fig. 1.7. Let τi ∼ Fi(t). Clearly, the system lifetime is

τ = max(τ5, min(τ1, τ4, max(τ2, τ3))) .    (1.3.6)


Figure 1.7: A series-parallel system

Obviously,

τ* = max(τ2, τ3) ∼ F*(t) = F2(t)F3(t) .    (1.3.7)

Similarly,

τ** = min(τ1, τ4, τ*) ∼ F**(t) = 1 − R1(t)R4(t)R*(t) ,    (1.3.8)

where R*(t) = 1 − F*(t).

Now, τ = max(τ5, τ**), and by (1.3.7) and (1.3.8),

F(t) = F5(t)(1 − R1(t)R4(t)(1 − F2(t)F3(t))) . #    (1.3.9)
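The minimum-maximum calculus above can be checked numerically. The following Python sketch simulates the system of Fig. 1.7 and compares the empirical lifetime c.d.f. with formula (1.3.9); the exponential component lifetimes and the rates λi are an arbitrary illustration, not part of the text.

```python
import random
import math

def system_lifetime(tau):
    # structure of Fig. 1.7: τ = max(τ5, min(τ1, τ4, max(τ2, τ3)))
    return max(tau[5], min(tau[1], tau[4], max(tau[2], tau[3])))

def F_system(t, lam):
    # formula (1.3.9): F(t) = F5(t)·(1 − R1(t)R4(t)(1 − F2(t)F3(t)))
    F = {i: 1 - math.exp(-lam[i] * t) for i in lam}
    R = {i: 1 - F[i] for i in lam}
    return F[5] * (1 - R[1] * R[4] * (1 - F[2] * F[3]))

random.seed(1)
lam = {1: 1.0, 2: 0.5, 3: 0.7, 4: 0.9, 5: 0.3}   # arbitrary rates
t0, n = 1.0, 200_000
hits = sum(
    system_lifetime({i: random.expovariate(lam[i]) for i in lam}) <= t0
    for _ in range(n)
)
print(hits / n, F_system(t0, lam))   # empirical vs. (1.3.9)
```

With 200 000 replications the Monte Carlo estimate agrees with (1.3.9) to within about a percent.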

We have seen that for a series-parallel system it is quite easy to derive an expression for the system lifetime distribution. It turns out that the reliability function of a monotone system with independent components can be approximated from above and from below by the reliability function of a specially chosen series-parallel system. This is summarized in the following statement, see e.g. Barlow and Proschan (1975), Chapter 2.
Theorem 3.2
Let A1, A2, ..., Al be the minimal path sets, and let C1, C2, ..., Cm be the minimal cut sets of the monotone system. Denote by pi the reliability of the i-th component, and by r0 the system reliability. Then

∏_{k=1}^m (1 − ∏_{j∈Ck} (1 − pj)) ≤ r0 ≤ 1 − ∏_{r=1}^l (1 − ∏_{j∈Ar} pj) .    (1.3.10)

This theorem says that r0 is bounded from above by the reliability of a fictitious system (with independent components) which is a parallel connection of series subsystems, each being a minimal path set of the original system. Similarly, the lower bound is the reliability of a system obtained by a series connection of parallel subsystems, each being a minimal cut set of the original system. For example, the "lower system" for a bridge is shown in Fig. 1.8a.


Figure 1.8: The "lower" (a) and the "upper" (b) systems for the bridge

Similarly, Fig. 1.8b shows the "upper system". Note that the elements i and i* are assumed to be independent. Denote the lower bound in (1.3.10) by LB(p1, ..., pn) and the upper bound by UB(p1, ..., pn). Then, using the arguments of Theorem 3.1, we can establish the following:
Corollary 3.1

LB(R1(t), ..., Rn(t)) ≤ R(t) ≤ UB(R1(t), ..., Rn(t)) .    (1.3.11)

In other words, the system reliability function R(t) is bounded from below and from above by the reliability functions of the "lower" and "upper" systems, respectively. In conclusion, let us present a very useful formula for computing the mean value of system lifetime. Suppose that τ is a nonnegative random variable, i.e. P(τ ≥ 0) = 1. Let F(t) be the c.d.f. of τ, i.e. P(τ ≤ t) = F(t). Then

E[τ] = ∫_0^∞ (1 − F(t)) dt .    (1.3.12)
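The path/cut bounds of Theorem 3.2 are easy to evaluate for the bridge structure (elements 1-5, with element 3 as the "bridge"). The sketch below computes the lower bound, the exact reliability via pivotal decomposition around element 3, and the upper bound; the equal component reliabilities are an illustrative choice.

```python
from functools import reduce

def prod(xs):
    return reduce(lambda a, b: a * b, xs, 1.0)

# minimal path and cut sets of the bridge
paths = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]
cuts  = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]

def lower_bound(p):
    # series connection of parallel subsystems, one per minimal cut set
    return prod(1 - prod(1 - p[j] for j in C) for C in cuts)

def upper_bound(p):
    # parallel connection of series subsystems, one per minimal path set
    return 1 - prod(1 - prod(p[j] for j in A) for A in paths)

def exact(p):
    # pivotal decomposition around the bridge element 3
    up = (1 - (1 - p[1]) * (1 - p[2])) * (1 - (1 - p[4]) * (1 - p[5]))
    down = 1 - (1 - p[1] * p[4]) * (1 - p[2] * p[5])
    return p[3] * up + (1 - p[3]) * down

for q in (0.9, 0.95, 0.99):
    p = {i: q for i in range(1, 6)}
    print(q, lower_bound(p), exact(p), upper_bound(p))
```

For all three values of q the printed triples satisfy LB ≤ r0 ≤ UB, as (1.3.10) guarantees; this is essentially Exercise 6 of Section 1.4.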

To prove this formula, integrate ∫_0^∞ (1 − F(t)) dt by parts and use the fact that if E[τ] exists, then lim_{t→∞} t(1 − F(t)) = 0.
Example 3.2
Let τ ∼ F(t) and let T be a constant. Define X = min[τ, T]. Find E[X]. The physical meaning of the r.v. X is the following. Suppose that τ is the system lifetime. We stop the system either at its failure or at "age" T, whichever occurs first. This situation is typical for the so-called age replacement which we will consider later, see Chapter 4. Then X is the random time span during which the system operated before it was stopped.


Note that T can be viewed as a degenerate random variable whose whole probability mass is concentrated at the point t = T. Its c.d.f. is FT(t) = P(T ≤ t) = 0 for t < T, and FT(t) = 1 for t ≥ T. Moreover, T is independent of τ. Then by (1.3.3), the survival function of X is

1 − FX(t) = (1 − F(t)) · (1 − FT(t)) .    (1.3.13)

Since 1 − FX(t) = 0 for t ≥ T,

E[X] = ∫_0^T (1 − F(t)) dt . #    (1.3.14)
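Formula (1.3.14) can be verified numerically. For an exponential lifetime, ∫_0^T e^{−λt} dt = (1 − e^{−λT})/λ in closed form; the sketch below compares this with a trapezoid-rule integration of (1.3.14) and with a direct simulation of min[τ, T]. The rate λ and the stopping age T are arbitrary illustration values.

```python
import random
import math

lam, T = 0.5, 3.0                                # arbitrary rate and stopping age
closed_form = (1 - math.exp(-lam * T)) / lam     # ∫_0^T e^{-λt} dt

# numerical integration of (1.3.14) by the trapezoid rule
n = 10_000
h = T / n
grid = [math.exp(-lam * (i * h)) for i in range(n + 1)]
numeric = h * (sum(grid) - 0.5 * (grid[0] + grid[-1]))

# Monte Carlo estimate of E[min(τ, T)]
random.seed(2)
mc = sum(min(random.expovariate(lam), T) for _ in range(100_000)) / 100_000
print(closed_form, numeric, mc)
```

All three numbers agree closely, illustrating that stopping at age T simply truncates the integral in (1.3.12) at T.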

1.4 Exercises

1. Consider the following system:

All elements are independent. Denote by pi the reliability of element i, i = 1, 2, 3, 4.
a. Find φ(x), the system structure function.
b. Find r0 = E[φ(X)], the system reliability.
c. Find the Barlow-Proschan upper and lower bounds on r0.
d. Suppose that the lifetime τi of element i is exponential, i.e. P(τi ≤ t) = 1 − exp(−λi t). Find the system lifetime distribution function.
e. Suppose that element i is up, on the average, during time µi = 10 + i, and is down, on the average, during time νi = 2 + 0.5i. Periods of up and down times alternate for each element. Find the system stationary availability.
2. Let system reliability be r0 = ψ(p1, p2, ..., pn). The so-called Birnbaum importance index of element i is defined as Ri = ∂ψ/∂pi.
a. Find R3 for the system of Exercise 1. Find the element with the greatest importance for p1 = 0.9, p2 = 0.8, p3 = p4 = 0.5.
b. Use the pivotal decomposition formula (1.2.10) to prove that R(i) = r(1i; p) − r(0i; p).


Hint: Differentiate r(p) with respect to pi.
3. Let (1i, x) be the vector x = (x1, ..., 1, ..., xn), i.e. the i-th coordinate is set to 1. Similarly, (0i, x) is the vector x with the i-th coordinate equal to 0. Prove that φ(x) = xi · φ(1i, x) + (1 − xi) · φ(0i, x).
4. n identical elements, each with exponentially distributed lifetime, are connected in parallel. The mean lifetime of a single element equals 1/λ = 1. Which n provides a mean lifetime of the whole system of at least 2?
5. Let n independent elements be connected in series. The lifetime of the i-th element is τi ∼ Exp(λi). Prove that the system lifetime τ ∼ Exp(∑_{i=1}^n λi).
6. Consider the bridge structure, see Section 2. Assume that all elements have reliability p = 0.9, 0.95, 0.99. Compute the exact reliability and the Barlow-Proschan upper and lower bounds.
7. Consider a parallel and a series system with component reliabilities p1 ≤ p2 ≤ ... ≤ pn. What are, for these systems, the most important components according to Birnbaum's measure?
8. Compute the bridge reliability using the pivotal decomposition formula. Hint: pivot around component 3, see Fig. 1.5.
9. A radar system consists of three identical, independently operating stations. The system is considered "up" if and only if at least two stations are operational. Assume that a station's lifetime c.d.f. is G(t) = 1 − exp[−λt]. After the system enters the down state, it is repaired and brought to the "brand new" (up) state during time trep = 0.1/λ.
a. Find the mean duration of the "up" time.
b. Find the system stationary availability.
10. A system consists of two units, A and B. If A is up and B is down, or vice versa, the system is in the "up" state. Because of a power supply deficiency, simultaneous operation of A and B is impossible, since there is not enough power for both units. Is this system monotone?


Chapter 2

Parametric Lifetime Distributions

Everything should be made as simple as possible, but not simpler. Albert Einstein

2.1 Poisson Process. Exponential and Gamma Distributions

Suppose that we observe certain events that appear at random time instants. For example, the events might be arrivals of buses at the bus terminal, telephone calls addressed to a certain person, or failures of a piece of equipment. To be specific, we shall speak of some particular sort of events, say telephone calls. Denote by N(t) the total number of calls which took place during the interval (0, t]. For each fixed t, N(t) is a random variable, and the family of random variables {N(t), 0 < t < ∞} is termed a counting random process. In this section we restrict our attention to processes with stationary increments: the number of calls in the interval (t1 + s, t2 + s] has the same distribution as the number of calls in any other interval of length (t2 − t1). One very important particular type of these processes is the so-called Poisson process. Denote by ∆i the interval (t, t + ∆i] and let Ai(ki) be the event "the number of calls during ∆i equals ki". Denote by N(h) the number of events in the interval


(t, t + h].
Definition 1.1
{N(t), t > 0} is said to be a Poisson process with rate λ if
(i) For any set of nonoverlapping intervals ∆1, ..., ∆m, and for any collection of integers k1, k2, ..., km, the events A1(k1), ..., Am(km) are jointly independent.
(ii) P(N(h) = 1) = λh + o(h) as h → 0.
(iii) P(N(h) ≥ 2) = o(h) as h → 0. #
Our purpose is to derive formulas for the probabilities Pk(t) that there are exactly k calls in an interval of length t. In addition to the above assumptions (i)-(iii), we postulate also that the probability of having more than zero events in a zero-length interval is zero: Pk(0) = 0, k ≥ 1. Therefore, P0(0) = 1. It follows from Definition 1.1 that P0(h) = 1 − λh + o(h) and thus
P0(t + h) = P(N(t + h) = 0) = P(N(t) = 0, N(t + h) − N(t) = 0) = P(N(t) = 0)P(N(t + h) − N(t) = 0) = P0(t)[1 − λh + o(h)] .
Hence,

(P0(t + h) − P0(t))/h = −λP0(t) + o(h)/h .    (2.1.1)

Now letting h → 0, we obtain the differential equation

P0′(t) = −λP0(t),    (2.1.2)

which has to be solved with the initial condition P0(0) = 1. The solution is

P0(t) = e^{−λt} .    (2.1.3)

Similarly, for n > 0,

Pn(t + h) = Pn(t)P0(h) + Pn−1(t)P1(h) + o(h) .    (2.1.4)

Substituting P0(h) = 1 − λh + o(h) and P1(h) = λh + o(h) into (2.1.4) leads to the equation

Pn′(t) = −λPn(t) + λPn−1(t) .    (2.1.5)

It must be solved under the condition Pn(0) = 0, n > 0. Let us omit the routine solution procedure and provide the answer:

Pn(t) = e^{−λt}(λt)^n / n! .    (2.1.6)

(Verify it!) We have therefore proved the following proposition:
Theorem 1.1
In a Poisson process with parameter λ, the number of calls in an interval of length t has a Poisson distribution with parameter λt.
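Theorem 1.1 can be checked by simulation: a Poisson process is generated by summing independent Exp(λ) gaps (this anticipates Corollary 1.1 below), and the empirical distribution of N(t) is compared with formula (2.1.6). The rate λ, horizon t, and number of trials are arbitrary illustration values.

```python
import random
import math

def poisson_count(lam, t, rng):
    # count events of a rate-λ Poisson process in (0, t] by summing
    # independent Exp(λ) inter-arrival gaps
    n, s = 0, rng.expovariate(lam)
    while s <= t:
        n += 1
        s += rng.expovariate(lam)
    return n

rng = random.Random(3)
lam, t, trials = 2.0, 1.5, 100_000
counts = [poisson_count(lam, t, rng) for _ in range(trials)]

for k in range(5):
    empirical = counts.count(k) / trials
    theory = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)  # (2.1.6)
    print(k, round(empirical, 4), round(theory, 4))
```

The empirical frequencies match the Poisson probabilities with λt = 3 to within sampling error.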


Now is the right time for a short reminder about the Poisson distribution. We say that the r.v. X has a Poisson distribution with parameter µ (denoted X ∼ P(µ)) if P(X = k) = e^{−µ}µ^k / k!, k = 0, 1, 2, .... The mean value of X is E[X] = µ and the variance is Var[X] = µ. The Poisson distribution is closely related to the binomial distribution. Suppose that we have a series of n independent experiments, each of which can be a success (with probability p) or a failure (with probability 1 − p). Then the random variable Y which counts the total number of successes in the n experiments has the so-called binomial distribution:

P(Y = k) = C(n, k) p^k (1 − p)^{n−k} ,    (2.1.7)

where C(n, k) is the binomial coefficient. We will use the notation Y ∼ B(n, p). Now assume that n → ∞ and p → 0 in such a way that n · p = µ. Then it is a matter of simple algebra to show that

P(Y = k) = C(n, k) p^k (1 − p)^{n−k} → e^{−µ}µ^k / k!    (2.1.8)

as n → ∞. One can imagine the Poisson process with rate λ in the following way. Divide the interval [0, t] into M small intervals, each of length ∆ = t/M. Imagine that a "lottery" takes place in each such small interval, the result of which can be either "success" or "failure", independently of the results of all the previous lotteries. Each success is an "event" (a call) in the counting process N0(t), which is the total number of successes in [0, t]. Now let the probability of success be p∆ = ∆ · λ, i.e. proportional to the length of the small interval. Then, as M → ∞,

p∆ · M = (λ · t/M) · M = λt .    (2.1.9)

The number of successes in [0, t] has, for any finite M, a binomial distribution, which in the limit (as M goes to infinity) approaches the Poisson distribution with parameter λt. The above-described discrete "lottery process" is in fact a discrete version of Definition 1.1 (with ∆ = h). Denote by τi the interval between the (i − 1)-st and the i-th call in the Poisson process. (Assume that call number zero appears at t = 0), see Fig. 2.1. Theorem 1.1 implies the following
Corollary 1.1
(i) τ1, τ2, ..., τn, ... are i.i.d. random variables,

P(τi ≤ t) = 1 − e^{−λt} .    (2.1.10)


Figure 2.1: Poisson process. τi ∼ Exp(λ)

(ii) Define Tk = ∑_{i=1}^k τi. Then

P(Tk ≤ t) = 1 − e^{−λt} ∑_{i=0}^{k−1} (λt)^i / i! .    (2.1.11)

The distribution of τi is called exponential and will be denoted as Exp(λ). It plays an extremely important role in reliability theory. The distribution of Tk is called the Gamma distribution and will be denoted as Tk ∼ Gamma(k, λ).
Proof
(i) follows from the fact that P0(t) = e^{−λt}. Indeed, it says that the time to the first call exceeds t with probability P0(t). Therefore, the time to the first call τ1 ∼ Exp(λ). From the definition of the Poisson process it follows that the time τ2 between the first call and the second call has the same distribution and is independent of τ1, etc. To prove (ii), note that the events {Tk > t} and "there are (k − 1) or fewer events in [0, t]" are equivalent. Thus,

P(Tk > t) = e^{−λt} ∑_{i=0}^{k−1} (λt)^i / i! .    (2.1.12)
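Formula (2.1.11) is easy to confirm numerically: simulate Tk as a sum of k independent Exp(λ) variables and compare the fraction of samples below t with the truncated-exponential-series expression. The values of k, λ, and t below are an arbitrary illustration.

```python
import random
import math

def gamma_cdf(k, lam, t):
    # (2.1.11): P(T_k <= t) = 1 - e^{-λt} Σ_{i=0}^{k-1} (λt)^i / i!
    return 1 - math.exp(-lam * t) * sum(
        (lam * t) ** i / math.factorial(i) for i in range(k)
    )

rng = random.Random(4)
k, lam, t, n = 3, 1.0, 2.5, 100_000
hits = sum(
    sum(rng.expovariate(lam) for _ in range(k)) <= t for _ in range(n)
)
print(hits / n, gamma_cdf(k, lam, t))
```

The two printed numbers agree to within Monte Carlo error, confirming the Gamma(k, λ) model for the k-th arrival time.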

The exponential distribution is a particular case of the Gamma distribution: Exp(λ) = Gamma(1, λ). The mean and variance of these distributions are given below. For τ ∼ Exp(λ),

E[τ] = 1/λ;  Var[τ] = 1/λ² .    (2.1.13)

The exponential distribution is the only continuous distribution which possesses the so-called memoryless property. Let τ ∼ Exp(λ). Then P(τ > t) = ∫_t^∞ λe^{−λx} dx = e^{−λt}. Suppose it is known that {τ > t1}. Let us compute the conditional probability

P(τ > t1 + t2 | τ > t1) = P(τ > t1 + t2) / P(τ > t1) = e^{−λ(t1+t2)} / e^{−λt1} = e^{−λt2} ,    (2.1.14)


which equals P(τ > t2). In physical terms, (2.1.14) says that if the "system" has survived past t1, the chances to survive an extra interval of length t2 are the same as if the system were "brand new". To get a better understanding of the exponential distribution, let us consider its discrete version. Suppose that the time axis is divided into small intervals of length ∆. A certain event, say a failure, may appear with probability p in each of these intervals, independently of its appearance in any other interval. Let us define the random variable K as the ordinal number of the ∆-interval in which the failure appears for the first time. Obviously, P(K = n) = (1 − p)^{n−1} p. It is a matter of routine calculation to show that P(K > n) = ∑_{i=n+1}^∞ P(K = i) = (1 − p)^n. Now let us consider the conditional probability

P(K > n + m | K > n) = P(K > n + m) / P(K > n) = (1 − p)^m = P(K > m) .    (2.1.15)

This is exactly the discrete analogue of the memoryless property of the exponential distribution: if the "system" did not fail during the first n intervals, the chances of not failing for at least another m intervals are the same as the chances of a "new" system not failing during the first m intervals. A random variable K which is distributed as P(K = n) = (1 − p)^{n−1} p, n = 1, 2, 3, ..., is said to have a geometric distribution. We will denote it as K ∼ Geom(p). For this random variable,

E[K] = 1/p;  Var[K] = (1 − p)/p² .    (2.1.16)

The geometric distribution is in fact a discrete version of the exponential distribution. Let us consider a time scale in which the lifetime τ is measured in ∆-units: τ = K · ∆. Then P(τ > t) = P(K∆ > t) = P(K > [t/∆]) = (1 − p)^{[t/∆]} = ((1 − p)^{1/p})^{p·[t/∆]} ≈ e^{−at}, where p ≈ a · ∆ and p → 0. So, if the failure probability is approximately proportional to the interval length and tends to zero, the geometric distribution approaches the exponential distribution. From the model of the Gamma distribution, it follows that

E[Tk] = k/λ;  Var[Tk] = k/λ² .    (2.1.17)

The densities of the exponential and Gamma distributions are, respectively:

fτ(t) = λe^{−λt} ,   fTk(t) = λ^k t^{k−1} e^{−λt} / (k − 1)! ,   t > 0 .    (2.1.18)

A useful characteristic of the distribution of a positive random variable is the coefficient of variation (c.v.), defined as

c.v. = √Var[τ] / E[τ] .    (2.1.19)


Figure 2.2: The Gamma densities. All three densities have the same mean 1. The c.v. is 1, 0.5 and 0.35 for curves 1, 2 and 3, respectively

Figure 2.3: The failure appears with the appearance of the k-th shock

The more peaked the density, the smaller the c.v. For the Gamma family (2.1.18),

c.v. = 1/√k .    (2.1.20)

Fig. 2.2 shows the form of the density functions for the Gamma family. The Gamma distribution is quite useful for reliability modeling because it may describe component (system) lifetime in the so-called shock accumulation scheme. Imagine that external shocks arrive according to a Poisson process with parameter λ and that the component can survive no more than k − 1 shocks. Then the appearance of the k-th shock means failure. Therefore, the component lifetime distribution will be Gamma(k, λ).


For "large" k, a normal approximation can be used:

P(∑_{i=1}^k τi ≤ t) = P( (Tk − k/λ) / (√k/λ) ≤ (t − k/λ) / (√k/λ) ) ≈ Φ( (t − k/λ) / (√k/λ) ) .    (2.1.21)

Here Φ(·) is the standard normal c.d.f. Since a normal random variable formally may take negative values while a lifetime is by definition a positive random variable, the use of the normal approximation is justified if its left (negative) tail may be neglected. We recommend using the normal approximation only for k of at least 9 to 12. A useful modification of (2.1.10) is the exponential distribution with location parameter. Its density is

f(t) = λe^{−λ(t−a)} ,  t ≥ a,  and 0 for t < a .    (2.1.22)

In applications a is usually positive, and in reliability theory it is called the threshold parameter. (2.1.22) might reflect the following single-shock failure model. Imagine that there is some latent deterioration process going on in the system, and during the interval [0, a − ∆] the deterioration is comparatively small, so that shocks do not cause system failure. During a relatively short time interval [a − ∆, a], the deterioration progresses rapidly and makes the system susceptible to shocks. Afterwards, a single shock causes failure. The notation τ ∼ Exp(a, λ) is used for the r.v. with density (2.1.22). Since the exponential distribution plays a prominent role in reliability and preventive maintenance theory, we will present two additional models leading to the exponential distribution. The first is a formalized scheme of a "large" renewable system. Suppose that the system consists of n independent components, with n "large". The lifetime of component i is τi, with mean lifetime µi = E[τi]. After a component fails, it is immediately replaced by a statistically identical one. Assume that all components are organized in a series system, so each component failure causes a system failure. Let us consider the ordered sequence of time instants of system failures t1 < t2 < ... < tk < .... This sequence is obtained by positioning all component failure instants on a common time axis. The following remarkable fact was first discovered by Khinchine (1956). If the mean component lifetimes are approximately of the same magnitude, then the intervals between system failures ∆m = (tm − tm−1) behave, for large m and n, approximately as i.i.d. exponentially distributed r.v.'s. The parameter of these r.v.'s approaches Λ = ∑_{i=1}^n µi^{−1}. If τi ∼ Exp(1/µi), then the pooled sequence of failures forms a Poisson process with parameter Λ = ∑_{i=1}^n µi^{−1}. This becomes obvious if we recall that in the pooled process, the probability of an event in the interval [t, t + δ] equals δ∑_{i=1}^n µi^{−1} + o(δ). The above-described fact about arbitrary sequences of r.v.'s τi describes a model of creating a random process which approaches, under appropriate conditions, a Poisson process with a constant event rate.
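A partial numerical check of Khinchine's superposition result is sketched below: many renewal processes with deliberately non-exponential (uniform) lifetimes are pooled, and the long-run event rate of the pooled stream is compared with Λ = ∑ µi^{−1}. This only verifies the rate, not full exponentiality of the gaps; the component means, horizon, and lifetime law are arbitrary illustration choices.

```python
import random

rng = random.Random(5)
n_comp, horizon = 50, 200.0
mu = [1.0 + 0.01 * i for i in range(n_comp)]   # component mean lifetimes

events = []
for m in mu:
    # renewal process with Uniform(0, 2m) lifetimes (mean m), not exponential
    t = rng.uniform(0, 2 * m)
    while t <= horizon:
        events.append(t)
        t += rng.uniform(0, 2 * m)

rate = len(events) / horizon        # pooled event rate
Lambda = sum(1 / m for m in mu)     # Khinchine's limiting rate
print(rate, Lambda)
```

With 50 pooled sources over a long horizon, the observed rate comes out within a few percent of Λ.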


For more details see Barlow and Proschan (1975), Chapter 8, Section 4. Now let us turn to a sum of a random number of i.i.d. random variables. Formally, our object will be the random variable

S = Y1 + Y2 + ... + YK ,    (2.1.23)

where K ∼ Geom(p). It is assumed that K is independent of the i.i.d. r.v.'s Y1, Y2, .... Recall that

P(K = r) = (1 − p)^{r−1} p,  r = 1, 2, 3, ...    (2.1.24)

The r.v. S is created in the following way. Imagine that we have a coin which shows "Head" (failure) with probability p. We flip the coin until a Head appears for the first time. For example, if we obtain the sequence "Tail, Tail, Head", the failure appears on the third trial. The r.v. K counts the number of trials needed to obtain the first Head. After K becomes known and K = k is observed, we put S = Y1 + ... + Yk. Let us give a practical interpretation of (2.1.23). Suppose that the system lifetime is described by a r.v. Yi. At the instant of failure, an emergency unit replaces the failed system. It is capable of working only a short period of time, say 1 hour. During this time the failed system must be diagnosed and repaired. Let 1 − p be the probability that this mission is carried out with success. If the repair succeeds, the emergency unit returns to standby, and the system continues to work. Otherwise, with probability p, the system fails. One hour is considered a negligible time. An interesting fact is the following
Theorem 1.2
If Yi ∼ Exp(1/µ), then

S ∼ Exp(p/µ),    (2.1.25)

i.e. S has an exact exponential distribution with parameter p/µ. We leave the proof for an exercise. What happens if Yi is not an exponential r.v.? It turns out that if p is ”small” then S is ”close” to the exponential distribution with parameter p/E[Yi ]. For more details we refer to Gertsbakh (1989), Chapter 3.
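Theorem 1.2 can be checked by simulation: generate the geometric sum S directly and compare its empirical tail with the Exp(p/µ) survival function e^{−ps/µ}. The values of µ, p, and the tail point s0 are arbitrary illustration choices.

```python
import random
import math

rng = random.Random(6)
mu, p, n = 2.0, 0.25, 100_000

def geometric_sum():
    # flip until the first "Head" (prob. p); add one Exp(1/mu) term per flip
    s = rng.expovariate(1 / mu)
    while rng.random() >= p:
        s += rng.expovariate(1 / mu)
    return s

samples = [geometric_sum() for _ in range(n)]
s0 = 5.0
empirical_tail = sum(x > s0 for x in samples) / n
theory_tail = math.exp(-p * s0 / mu)       # Theorem 1.2: S ~ Exp(p/mu)
print(empirical_tail, theory_tail)
```

The empirical tail agrees with e^{−ps0/µ} to within Monte Carlo error, and the sample mean of S is close to µ/p, as the theorem predicts.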

2.2 Aging and Failure Rate

Let us consider in this section a system (component) which is not renewed at failure. The event "failure" might be defined in many ways, depending on the circumstances. It might be a mechanical breakdown, deterioration beyond a permissible value, the appearance of a certain defect in system performance (overheating, noise), or a decrease of a system performance index below a certain previously established critical level.


For our formal treatment, the particular definition of the failure is not important; what is important is that we fix a certain time scale, define the event "failure", and measure the time τ elapsed from the system "birth" (or beginning of its operation) to system "death" or failure. We assume further that the lifetime, i.e. the time from "birth" to "death", is a continuous positive random variable having density function f(t) and c.d.f. F(t) = P(τ ≤ t). In real life, after the failure, the system is usually repaired and restored, partially or completely, starts its "new life", fails again, etc. In other words, most systems are subject to renewal in the course of their operation. Our analysis in this section deals with nonrenewable systems: we study the properties of the time interval until the appearance of the first failure. Most important and useful for reliability analysis is the conditional probability of system failure in a small interval [t, t + ∆] given that at the instant t the system is "up" ("alive"). Formally, we are interested in the conditional probability

P(t ≤ τ ≤ t + ∆ | τ > t) = P(failure appears in (t, t + ∆)) / P(τ > t) .    (2.2.1)

Since the numerator in (2.2.1) equals approximately f(t) · ∆, we arrive at the following formula:

P(t ≤ τ ≤ t + ∆ | τ > t) ≈ f(t) · ∆ / (1 − F(t)) .    (2.2.2)

In other words, the conditional failure probability in [t, t + ∆], given the system is up at t, is, for small ∆, approximately proportional to the quantity f(t)/(1 − F(t)).
Definition 2.1
The function

h(t) = f(t) / (1 − F(t)) ,    (2.2.3)

defined for all t > 0 such that F(t) < 1, is called the failure or hazard rate. #
In survival analysis, which deals with the probabilistic analysis of lifetimes of biological objects, the notion rate of mortality is used instead of "failure rate". Recalling that the exponential distribution arises as the time to the first event in the Poisson process, and that the probability of the appearance of an event in [t, t + ∆] is proportional to ∆, it should be expected that the failure rate for an exponentially distributed lifetime is constant. Indeed, for f(t) = λe^{−λt} and F(t) = 1 − e^{−λt},

h(t) = f(t) / (1 − F(t)) = λ .    (2.2.4)
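Definition 2.1 is straightforward to evaluate for concrete families. The sketch below computes h(t) = f(t)/(1 − F(t)) for Exp(λ) (constant, per (2.2.4)) and for Gamma(2, λ), whose density and survival function follow from (2.1.11) and (2.1.18); the rate λ and the evaluation points are arbitrary.

```python
import math

def h_exp(t, lam):
    # h(t) = f(t)/(1 - F(t)) for Exp(λ): λe^{-λt} / e^{-λt} = λ
    f = lam * math.exp(-lam * t)
    R = math.exp(-lam * t)
    return f / R

def h_gamma2(t, lam):
    # Gamma(2, λ): f(t) = λ² t e^{-λt}, R(t) = (1 + λt) e^{-λt}
    f = lam ** 2 * t * math.exp(-lam * t)
    R = (1 + lam * t) * math.exp(-lam * t)
    return f / R

lam = 1.5
print([round(h_exp(t, lam), 6) for t in (0.5, 1.0, 2.0)])     # constant λ
rates = [h_gamma2(t, lam) for t in (0.5, 1.0, 2.0, 4.0)]
print(rates)                                                  # increasing
```

The Gamma(2, λ) rate simplifies to λ²t/(1 + λt), a strictly increasing function, previewing the IFR notion introduced next.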


Suppose that we know the system failure rate h(t). Could we reconstruct in a unique way its lifetime density function and/or the lifetime c.d.f.? The answer is "yes", since there is a simple and useful relationship between the survival probability R(t) = 1 − F(t) = P(τ > t) and the failure rate h(t):

P(τ > t) = R(t) = exp( −∫_0^t h(u) du ) .    (2.2.5)

The proof is left for Exercise 9.
Definition 2.2
A random variable τ is of increasing (decreasing) failure rate type if the corresponding failure rate h(t) is an increasing (decreasing) function of t. #
We denote τ ∈ IFR (DFR) or F ∈ IFR (DFR) if h(t) is increasing (decreasing). The exponential random variable is, by the definition, both of IFR and DFR type. It can be proved that h(t) = const is a characteristic property of the exponential distribution among the positive continuous random variables. A phenomenon which is very important for reliability theory in general, and for preventive maintenance in particular, is aging. On an intuitive level, aging means an increase of failure risk as a function of time in use. In that case the failure rate is an appropriate measure of aging. The memoryless exponential distribution reflects a "no aging" situation; a lifetime τ ∈ IFR displays an increasing failure risk and reflects the situation when the object is subject to aging. If τ ∈ DFR, the failure risk decreases in time, and the corresponding system, contrary to aging, becomes "younger". We will show later that mixtures of exponential distributions have this somewhat surprising property. Another approach to the formalization of aging is to consider the conditional survival probability R(x|t), as a function of t, for an interval of fixed length x, [t, t + x], given that the system is alive at t. Obviously,

R(x|t) = P(τ > t + x) / P(τ > t) = R(t + x) / R(t) .    (2.2.6)

If this conditional probability decreases as a function of t, it would be natural to say that the system is aging. It turns out that this type of aging is completely identical to the IFR property. Indeed, using (2.2.5), it is easy to establish that R(x|t) = exp(−∫_t^{t+x} h(u) du). Taking the derivative of R(x|t) with respect to t, we arrive at the following theorem:
Theorem 2.2
R(x|t) is decreasing (increasing) in t if h(t) is increasing (decreasing).
In the case of discrete random variables, the definition of IFR and DFR


should be given in terms of the survival probability R(x|t).
Definition 2.3
Let Xi, i = 1, ..., n, be independent random variables, Xi having density fi(t). We say that Y is a mixture of X1, ..., Xn if its density is fY(t) = ∑_{i=1}^n αi fi(t), where 0 ≤ αi ≤ 1, ∑_{i=1}^n αi = 1. #
Mixtures arise, for example, if the daily production of several similar machines is mixed together. αi is then the relative portion of the i-th machine in the daily production; Xi is the random characteristic of the items produced on the i-th machine, say the random lifetime of a resistor which came out of the i-th machine. An interesting fact which also has important practical implications is stated in the following
Theorem 2.3
A mixture of exponential distributions has a DFR distribution.
Proof
Let Xi ∼ Exp(λi). Obviously, the probability P(Y > t) can be represented in the form P(Y > t) = E[e^{−Λt}], where Λ is a random variable such that P(Λ = λi) = αi. Then the failure rate of Y is

hY(t) = −(∂/∂t)E[e^{−Λt}] / E[e^{−Λt}] = E[Λe^{−Λt}] / E[e^{−Λt}] .

Now consider the derivative of hY(t) with respect to t. The differentiation can be carried out inside the E[·] operator:

dhY(t)/dt = ( −E[e^{−Λt}]E[Λ²e^{−Λt}] + (E[Λe^{−Λt}])² ) / (E[e^{−Λt}])² .

Now define X = e^{−Λt/2} and Z = Λe^{−Λt/2}. Then the numerator in the last expression is E²[XZ] − E[X²]E[Z²]. It is nonpositive by the Cauchy-Schwarz inequality, which proves the theorem. #
Let us now investigate the relationship between the failure rate of the components and the failure rate of the system. It is assumed that the system consists of independent components. The situation is very simple for a series system, as the following theorem states:
Theorem 2.4
The failure rate of a series system is the sum of its component failure rates.
Proof


The reliability of a series system is R(t) = ∏_{i=1}^n Ri(t). Take the logarithm of both sides and differentiate with respect to t. Then we obtain

h(t) = ∑_{i=1}^n hi(t),    (2.2.7)

where hi(t) = fi(t)/Ri(t). It is clear from (2.2.7) that if all components are of IFR (DFR) type, then the series system is of IFR (DFR) type. Historically, the fact established in Theorem 2.4 became an important impulse for the design of reliable systems and for the development of reliability theory. It happened in the era of first-generation computers that the designers and users suddenly became aware that their computers failed extremely often, say once per hour. They realized that it was because the computer consisted of tens of thousands of parts (tubes, relays, etc.) which had failure rates of magnitude 10^{−5} hr^{−1}. As a system, the computer is basically a series system, i.e. the failure of each part means system failure. Thus the system failure rate for the first-generation computers had a magnitude of 10^{−1} hr^{−1}, and the average interval between failures was 1-2 hours. There is an interesting and useful connection between the mean lifetimes of the components of a series system and the mean system lifetime. We state it without proof.
Theorem 2.5
Let µi be the mean lifetime of the i-th component. If all components are of IFR type, then the system mean lifetime µ satisfies the following inequality:

µ > ( ∑_{i=1}^n µi^{−1} )^{−1} .    (2.2.8)
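The additivity (2.2.7) of Theorem 2.4 can be illustrated for a two-component series system with one Exp(λ1) and one Gamma(2, λ2) component, comparing a numerical failure rate −d/dt log R(t) of the system with the sum h1(t) + h2(t); the rates λ1 and λ2 are arbitrary illustration values.

```python
import math

# two independent components in series: Exp(λ1) and Gamma(2, λ2)
l1, l2 = 0.8, 1.5

def R1(t): return math.exp(-l1 * t)
def h1(t): return l1
def R2(t): return (1 + l2 * t) * math.exp(-l2 * t)
def h2(t): return l2 ** 2 * t / (1 + l2 * t)

def h_system(t, eps=1e-6):
    # numerical failure rate of the series system: -d/dt log R(t)
    R = lambda u: R1(u) * R2(u)
    return (math.log(R(t)) - math.log(R(t + eps))) / eps

for t in (0.5, 1.0, 2.0):
    print(round(h_system(t), 4), round(h1(t) + h2(t), 4))
```

Both components here are IFR, so the series system is IFR as well, consistent with the remark after (2.2.7).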

We have seen that the IFR property is inherited by a series system from its components. The situation differs for a parallel system: it may happen that the IFR property of the components is not inherited by the whole system. Exercise 3 demonstrates that a system of two parallel elements with exponential lifetimes may have a nonmonotone failure rate. The following theorem shows that the exponential distribution represents the "worst case" from the reliability point of view. Denote by τ the system lifetime.
Theorem 2.6


If F ∈ IFR and E[τ] = µ, then for 0 ≤ t ≤ µ,

R(t) ≥ e^{−t/µ} .    (2.2.9)

We omit the proof, which can be found in Gertsbakh (1989), p. 63. This theorem has an immediate and very useful implication.
Corollary 2.6
Let r0 = ψ(p1, p2, ..., pn) be the system reliability function. Then for 0 ≤ t ≤ min{µ1, µ2, ..., µn},

R(t) ≥ ψ(e^{−t/µ1}, e^{−t/µ2}, ..., e^{−t/µn}) .    (2.2.10)

Proof
It follows immediately from Theorem 2.6 and the monotone dependence of ψ(p1, ..., pn) on each of the pi. #
This statement says that on the time interval [0, min{µ1, µ2, ..., µn}], the worst-case situation would take place if all system components (organized in a system with the same structure function) were replaced by exponential ones with the same mean lifetimes. We can view the results of the last two theorems as an attempt to extract information about the system lifetime when the available information is the mean lifetime and the behavior of the failure rate. The more we know about the statistical properties of the system, the more accurate our prediction of system lifetime can be. An elegant mathematical theory was developed by Barlow and Marshall (1964) for IFR distributions with known first and second moments. This theory allows obtaining quite accurate bounds on system reliability, which turns out to be very useful for various applications, including preventive maintenance.

2.3 Normal, Lognormal and Weibull Families

2.3.1 Normal and Lognormal Distributions

We have already introduced the normal distribution in the previous section in the context of a damage accumulation process described by the model τ = τ_1 + τ_2 + ... + τ_k, where the τ_i are the intervals between successive events in the Poisson process. We say that τ ∼ N(µ, σ) if

P(τ ≤ t) = Φ((t − µ)/σ) = (1/√(2π)) ∫_{−∞}^{(t−µ)/σ} exp[−v²/2] dv.  (2.3.1)


µ and σ are the mean value and the standard deviation of the r.v. τ, respectively. The support of the normal distribution is the whole axis (−∞, ∞), and therefore this distribution formally cannot represent a positive random variable. However, if σ/µ < 1/3, the negative tail of the normal distribution is negligible, and it may serve as a c.d.f. for a positive random variable. Let us compare, for example, the normal and Gamma distributions for k = 10. By (2.1.20), the c.v. of the Gamma distribution is 1/√10. Thus we might expect that the survival probabilities

R_1(t) = e^{−λt} Σ_{i=0}^{9} (λt)^i/i!   and   R_2(t) = 1 − Φ((t − 10/λ)/(√10/λ))

are close to each other (both distributions have the same means and variances). This is indeed the case, as Fig. 2.4 shows.
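This closeness is easy to check numerically; the following sketch (with λ = 1, so both distributions have mean 10 and variance 10) evaluates both survival functions using only the standard library:

```python
import math

def R_gamma(t, lam=1.0, k=10):
    """Survival of Gamma(k, lam): e^{-lam t} * sum_{i=0}^{k-1} (lam t)^i / i!."""
    return math.exp(-lam * t) * sum((lam * t) ** i / math.factorial(i)
                                    for i in range(k))

def R_normal(t, lam=1.0, k=10):
    """Normal approximation with the same mean k/lam and variance k/lam^2."""
    z = (t - k / lam) / (math.sqrt(k) / lam)
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

for t in (5, 10, 15):
    print(t, round(R_gamma(t), 4), round(R_normal(t), 4))
```

The two curves agree to within a few hundredths over the whole range, which is what Fig. 2.4 displays graphically.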

Figure 2.4: Comparison of Gamma and Normal survival probabilities

The normal random variable is of the IFR type. Fig. 2.5 shows typical curves of the failure rate. The lognormal distribution is defined in the following way. We say that the r.v. Y has a lognormal distribution with parameters µ and σ (denoted Y ∼ logN(µ, σ)) if

P(Y ≤ t) = Φ((log t − µ)/σ).  (2.3.2)

The corresponding density function is

f_Y(t; µ, σ) = (1/(√(2π) σ t)) exp[−(log t − µ)²/(2σ²)], for t > 0.  (2.3.3)

Let τ = log Y. How is τ distributed?

P(τ ≤ t) = P(log Y ≤ t) = P(Y ≤ e^t) = Φ((t − µ)/σ).  (2.3.4)


Figure 2.5: h(t) for the Normal distribution. 1 − for N(µ = 5, σ = 1), 2 − for N(µ = 5, σ = 1.33)

Table 2.1: c.v. for the lognormal distribution
σ     0.1   0.15   0.2    0.25   0.3    0.4    0.5    0.6    0.7
c.v.  0.10  0.151  0.202  0.254  0.307  0.416  0.533  0.658  0.795

We see, therefore, that τ = log Y ∼ N(µ, σ): the logarithm of a lognormal r.v. is a normal random variable. The mean and the variance of the lognormal distribution are as follows:

E[Y] = exp(µ + σ²/2),  (2.3.5)

Var[Y] = (e^{σ²} − 1) · exp(2µ + σ²).  (2.3.6)

For the lognormal distribution, the coefficient of variation is a function of σ only:

c.v. = √(e^{σ²} − 1) ≈ σ for σ < 0.5.  (2.3.7)

Table 2.1 shows how the c.v. depends on σ. The lognormal density is skewed to the right and, for large values of the c.v., has a heavy right tail. The density becomes more symmetrical for small c.v. values, see Fig. 2.6. The three-parameter lognormal distribution has the density

f_Y(t; µ, σ) = (1/(√(2π) σ (t − t_0))) exp[−(log(t − t_0) − µ)²/(2σ²)], for t > t_0,  (2.3.8)
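Formulas (2.3.5)-(2.3.7) are easy to check numerically; the short sketch below reproduces several entries of Table 2.1:

```python
import math

def logn_mean(mu, sigma):
    """Lognormal mean (2.3.5): exp(mu + sigma^2/2)."""
    return math.exp(mu + sigma ** 2 / 2)

def logn_var(mu, sigma):
    """Lognormal variance (2.3.6): (e^{sigma^2} - 1) * exp(2 mu + sigma^2)."""
    return (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)

def logn_cv(sigma):
    """Coefficient of variation (2.3.7): depends on sigma only."""
    return math.sqrt(math.exp(sigma ** 2) - 1)

for s in (0.1, 0.3, 0.5):
    print(s, round(logn_cv(s), 3))   # compare with Table 2.1
```

Note that for small σ the printed c.v. is close to σ itself, as the approximation in (2.3.7) states.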


Figure 2.6: The lognormal density for µ = 2, σ = 0.15 (curve 1), µ = 2, σ = 0.3 (curve 2) and µ = 2, σ = 0.6 (curve 3)

and t_0 is called the threshold of sensitivity. The density (2.3.8) is used when the lifetime data follow the lognormal density for t > t_0 and no failures were observed on [0, t_0]. Investigation of the failure rate h(t) = f_Y(t; µ, σ)/(1 − Φ((log t − µ)/σ)) reveals that h(t) → 0 both as t → 0 and as t → ∞. Thus, the lognormal random variable is neither of IFR nor of DFR type. This distribution may be suitable for describing the lifetime of objects which display the so-called "training effect", see Gertsbakh and Kordonsky (1969), p. 76 for more details. There are several probabilistic models in the reliability literature describing the appearance of the lognormal distribution. An interesting source is the book of Aitchison and Brown (1957). Gertsbakh and Kordonsky (1969) show that the lognormal distribution arises in a model of damage accumulation in which the probability of a single damage occurrence in [t, t + ∆] is ∆ · λ/(1 + t). In fact, the lifetime defined as the time to the appearance of the k-th damage is the time to the k-th event in a nonhomogeneous Poisson process with intensity function λ(t) = λ/(1 + t), see Appendix 1.

2.3.2 The Weibull distribution

Lawless (1983) writes in his review that about half of all papers on statistical methods in reliability in the period 1967–1980 were devoted to the Weibull distribution. The popularity of this distribution lies in the fact that, depending on its parameters, it may describe both IFR and DFR random variables. Besides, a logarithmic transformation of a Weibull random variable produces a r.v. which belongs to the so-called location-scale family, which has several properties very convenient for statistical analysis.


We say that the random variable τ has a Weibull distribution with parameters λ and β if the density function of τ equals, for t > 0,

f(t; λ, β) = λ^β β t^{β−1} exp(−(λt)^β),  (2.3.9)

and zero otherwise. λ > 0 is called the scale parameter, and β > 0 the shape parameter. The Weibull c.d.f. is

F(t; λ, β) = 1 − exp(−(λt)^β) for t > 0,  (2.3.10)

and F(t; λ, β) = 0 for t ≤ 0. The notation τ ∼ W(λ, β) will be used for the Weibull random variable. Obviously, for β = 1, the Weibull family becomes the exponential distribution. The failure rate for the Weibull family is

h(t) = f(t; λ, β)/(1 − F(t; λ, β)) = λ^β β t^{β−1}.  (2.3.11)

h(t) increases in t for β > 1, decreases in t for β < 1 and remains constant for β = 1, see Fig. 2.7.

Figure 2.7: h(t) curves for the Weibull distribution (λ = 1)

The mean value of the Weibull random variable is

E[τ] = λ^{−1} Γ(1 + β^{−1}),  (2.3.12)

where Γ(·) is the Gamma function: Γ(x) = ∫_0^∞ v^{x−1} e^{−v} dv. The variance is also expressed via the Gamma function:

Var[τ] = λ^{−2} [Γ(1 + 2β^{−1}) − (Γ(1 + β^{−1}))²].  (2.3.13)


Table 2.2: The values of Γ[1 + 1/β] and Γ[1 + 2/β]
β     Γ[1 + 1/β]  Γ[1 + 2/β]
0.75  1.19064     4.0122
1.00  1.00000     2.0000
1.25  0.93138     1.42962
1.50  0.90274     1.19064
1.75  0.89062     1.06907
2.00  0.88623     1.00000
2.25  0.88573     0.95801
2.50  0.88726     0.93138
2.75  0.88986     0.91409
3.00  0.89298     0.90274
3.25  0.89633     0.89534
3.50  0.89975     0.89062
3.75  0.90312     0.88776
4.00  0.90640     0.88623
4.25  0.90956     0.88564
4.50  0.91257     0.88573
4.75  0.91574     0.88632
5.00  0.91817     0.88726


Table 2.3: The c.v. for the Weibull family
β     0.5   1  1.5   2     3     4     5
c.v.  2.24  1  0.68  0.52  0.36  0.28  0.23

To simplify the calculations, Table 2.2 gives the values of Γ[1 + 1/β] and Γ[1 + 2/β] for various values of β. For the Weibull distribution, the c.v. depends on β only:

c.v. = [Γ(1 + 2/β)/Γ²(1 + 1/β) − 1]^{0.5}.  (2.3.14)

Table 2.3 shows how the c.v. depends on β. The Weibull densities for various c.v. values are shown in Fig. 2.8.
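A short sketch (standard library only) reproduces entries of Tables 2.2 and 2.3 from (2.3.12)-(2.3.14):

```python
import math

def weibull_mean(lam, beta):
    """Weibull mean (2.3.12): Gamma(1 + 1/beta) / lam."""
    return math.gamma(1 + 1 / beta) / lam

def weibull_var(lam, beta):
    """Weibull variance (2.3.13)."""
    return (math.gamma(1 + 2 / beta) - math.gamma(1 + 1 / beta) ** 2) / lam ** 2

def weibull_cv(beta):
    """Coefficient of variation (2.3.14): depends on beta only."""
    return math.sqrt(math.gamma(1 + 2 / beta) / math.gamma(1 + 1 / beta) ** 2 - 1)

print(round(math.gamma(1 + 1 / 2.0), 5))   # 0.88623, as in Table 2.2
print(round(weibull_cv(2.0), 2))           # 0.52, as in Table 2.3
```

For β = 1 the functions reduce to the exponential case: mean 1/λ and variance 1/λ², consistent with the remark after (2.3.10).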

Figure 2.8: The Weibull densities (mean = 1)

Let τ ∼ W(λ, β) and consider the random variable X = log τ. It has the following c.d.f.:

P(X ≤ t) = F(t) = 1 − exp(−e^{(t−a)/b}), −∞ < t < ∞,  (2.3.15)

where a = − log λ and b = 1/β. The derivation of (2.3.15) is left as an exercise. The c.d.f. (2.3.15) is of location-scale type. This fact is very important for statistical inference about the location and scale parameters.

Definition 3.1 Let F_0(t) be a c.d.f. with d.f. f_0(t). Assume that the c.d.f. of a random variable X depends on the parameters µ and σ, σ > 0: P(X ≤ t) = F(t; µ, σ).


If F(t; µ, σ) = F_0((t − µ)/σ), we say that F belongs to a location-scale family. The corresponding density function is f_0((t − µ)/σ) · σ^{−1}; µ and σ are the location and scale parameters, respectively. #

The two-parameter exponential distribution (1.1.23), the normal distribution and the extreme-value distribution (2.3.15) belong to location-scale families. The notation X ∼ Extr(a, b) will be used to denote a r.v. with c.d.f. (2.3.15). This distribution is known in probability theory as the extreme-value distribution of the third type. For more details see Barlow and Proschan (1975), Chapter 8.

The Weibull family is closed with respect to the minimum-type operation, in the following sense. If the τ_i ∼ W(λ_i, β) are independent r.v.'s and τ_(1) = min_{1≤i≤n} {τ_i}, then τ_(1) ∼ W(λ_0, β), where λ_0 = (Σ_{i=1}^n λ_i^β)^{1/β}. We leave the reader to prove this property as an exercise.

The Weibull distribution arises in a minimum-type scheme in the following way. Assume that there is a sequence of independent random variables {τ_k, k = 1, 2, ...} such that for t → +0

P(τ_k ≤ t) = c t^d (1 + o(1)), c > 0, d > 0, t > 0,  (2.3.16)

i.e. the c.d.f. of τ_k behaves near zero approximately as c t^d. It turns out that the sequence Z_n = a_n min{τ_1, τ_2, ..., τ_n} converges in distribution to W(c^{1/d}, d) as n → ∞ and a_n = n^{1/d}.

Let us consider an arbitrary monotone system with independent and nonrenewable components. Component i has lifetime τ_i ∼ Exp(λ_i). Let us assume that the components are "highly reliable", i.e. formally

λ_i = α · θ_i, α → 0.  (2.3.17)
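The minimum-closure property stated above admits a direct check: P(τ_(1) > t) = Π_i exp(−(λ_i t)^β) = exp(−(λ_0 t)^β) with λ_0 = (Σ λ_i^β)^{1/β}. A minimal numerical verification (parameter values are arbitrary):

```python
import math

def surv_min(t, lams, beta):
    """P(min tau_i > t) for independent tau_i ~ W(lam_i, beta)."""
    return math.prod(math.exp(-(l * t) ** beta) for l in lams)

def surv_weibull(t, lam, beta):
    """Weibull survival function: exp(-(lam t)^beta)."""
    return math.exp(-(lam * t) ** beta)

lams, beta = [0.5, 1.0, 2.0], 1.7
lam0 = sum(l ** beta for l in lams) ** (1 / beta)
for t in (0.3, 1.0, 2.5):
    assert abs(surv_min(t, lams, beta) - surv_weibull(t, lam0, beta)) < 1e-12
print("closure verified, lam0 =", lam0)
```

The identity holds for every t, which is exactly the statement τ_(1) ∼ W(λ_0, β).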

Following Burtin and Pittel (1972), let us show that the lifetime of this system can be approximated by a Weibull distribution. For any state vector x, let G(x) = {i : x_i = 1} and B(x) = {i : x_i = 0}. Denote by D the set of all state vectors which correspond to the system down state, i.e.

D = {x : φ(x) = 0}.  (2.3.18)

The set D can be partitioned into the subsets D_r, D_{r+1}, ..., according to the number of failed elements in the state vectors: D = ∪_{k=r}^n D_k. So, D_r is the collection of all state vectors which have the smallest number r of failed elements. (In other words, r is the size of the smallest cut set in the system.) Let τ be the system lifetime. Let us partition the event "the system is down" into the events "the system is in D_k", k ≥ r. Then

R(t) = 1 − Σ_{k=r}^n Σ_{x∈D_k} [Π_{i∈G(x)} e^{−λ_i t}] · [Π_{i∈B(x)} (1 − e^{−λ_i t})].  (2.3.19)


Indeed, the sum in (2.3.19) is the probability that the system is in one of its down states. For each such state x, the elements of G(x) are up and the elements of B(x) are down. Now expand e^{−λ_i t} = 1 − λ_i t + O(λ_i² t²) = 1 − α θ_i t + O(α²) and substitute it into (2.3.19). The main term in (2.3.19) is determined by the summand with the smallest k = r. After some algebra it follows that

R(t) = P(τ > t) = 1 − α^r t^r g(θ) + O(α^{r+1}) ≈ exp(−α^r t^r g(θ)),  (2.3.20)

where g(·) is the sum of the products of the θ_i over all cut sets of minimal size r:

g(θ) = Σ_{x∈D_r} Π_{i∈B(x)} θ_i.  (2.3.21)

(2.3.20) suggests that the lifetime of a system with highly reliable exponential components approximately follows the Weibull distribution:

P(τ ≤ t) ≈ 1 − exp(−α^r t^r g(θ)).  (2.3.22)

Remark More formally, (2.3.22) states that for any t in a fixed interval [0, t_0], R(t) approaches exp(−α^r t^r g(θ)) as α → 0. #

If there are reasons to believe that the lifetime follows the Weibull pattern only for t ≥ t_0 > 0 and that failure can never occur before t_0, the natural extension of (2.3.10) is the three-parameter Weibull distribution:

P(τ ≤ t) = 1 − exp(−(λ(t − t_0))^β), t ≥ t_0.  (2.3.23)

As in the lognormal case, t_0 is called a threshold or a guaranteed-time parameter. Let us consider an example illustrating the quality of the approximation (2.3.22).

Example 3.1: s − t connectivity of a dodecahedron network
Fig. 2.9 shows a network with 20 nodes and 30 edges called the dodecahedron. The nodes are absolutely reliable. The edges fail independently and their lifetimes are τ ∼ Exp(λ). The network fails if there is no path leading from node 1 ("source") to node 2 ("terminal"). The reliability of such a network is termed the s − t connectivity. We assume that all θ_i = 1 and α is "small". The dodecahedron has two minimal-size minimal cut sets with r = 3 edges. Indeed, node 1 is disconnected from node 2 if the edges {17, 18, 19} or the edges {19, 20, 30} fail. All other cut sets separating the source from the terminal have size greater than r = 3. Let us take t = 1 and consider how good an approximation to the network failure probability is provided by the expression F_approx(1) = 1 − exp[−α³ g(θ)], for


Figure 2.9: The dodecahedron network

Table 2.4: Comparison of exact and approximate reliability
e^{−α}  α        F_approx(1)  F_exact(1)   rel. error in %
0.80    0.22314  0.0219       0.0362       39
0.85    0.16252  0.00854      0.0122       30
0.90    0.10536  0.00234      0.00288      18
0.92    0.08338  0.00116      0.00136      15
0.94    0.06188  0.000474     0.000528     10
0.96    0.04082  0.000136     0.000146     7
0.98    0.02020  1.65·10⁻⁵    1.7·10⁻⁵     3
0.99    0.01005  2.0·10⁻⁶     2.03·10⁻⁶    1.5

various values of α approaching zero. Since there are two minimal cuts of size 3, g(θ) = 2θ³ = 2 by (2.3.21). Thus our approximation is F_approx(1) = 1 − exp[−2α³]. Table 2.4 shows the values of network failure probability by the Burtin-Pittel approximation versus the exact values F_exact(1), for α ranging from 0.22314 to 0.01005. (The F_exact(1) were computed by J.S. Provan using an algorithm based on cut-set enumeration, see Fishman (1995), p. 62.) It is seen from Table 2.4 that for α below 0.05 the Burtin-Pittel approximation is quite satisfactory. #

Estimation of network reliability may be a quite difficult task, especially for large and highly reliable networks. Exact calculation needs special algorithms and software; Monte Carlo simulation is a good alternative, see e.g. Elperin et al (1991). Example 3.1 shows that for very reliable networks, with failure probability less than 10⁻⁴, the Burtin-Pittel approximation provides a reasonably accurate solution with minimal effort.
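The Burtin-Pittel numbers in Table 2.4 are easy to reproduce; a sketch with g(θ) = 2 and r = 3 for the dodecahedron:

```python
import math

def burtin_pittel_F(t, alpha, r, g):
    """Approximate failure probability (2.3.22): 1 - exp(-alpha^r t^r g)."""
    return 1 - math.exp(-(alpha ** r) * (t ** r) * g)

# Dodecahedron s-t connectivity: two minimal cuts of size r = 3,
# all theta_i = 1, hence g(theta) = 2 (Example 3.1).
for e_alpha in (0.90, 0.98):
    alpha = -math.log(e_alpha)
    print(e_alpha, burtin_pittel_F(1.0, alpha, 3, 2.0))
```

The printed values match the F_approx(1) column of Table 2.4 for e^{−α} = 0.90 and 0.98.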

2.4 Exercises

1. One operating unit is supported by four similar units in "cold" standby. When the operating unit fails, one of the standby units takes its place. The system fails when all units have failed. Denote the system lifetime by τ_s. Assume that the lifetime of each unit is τ ∼ Exp(λ). Argue that τ_s ∼ Gamma(5, λ).

2. Let τ ∼ U(0, 1). Find the failure rate h(t).

3. Two exponential components with failure rates λ_1 = 1 and λ_2 = 5 are connected in parallel. Find the expression for the system failure rate. Investigate theoretically and/or numerically the behavior of the system failure rate. Is it monotone?

4. Assume that at time t = 0 we start observing n identical and independent devices, each having an exponential lifetime with parameter λ. Failed devices are not replaced, and the observation stops after observing r, r < n, failures. Denote by T the total time on test (TTT). Prove that TTT ∼ Gamma(r, λ).
Hint. The time to the first failure is ∼ Exp(nλ). The time between the first and the second failure is ∼ Exp((n − 1)λ). Then note that TTT = Σ_{i=1}^r (n − i + 1)(τ_(i) − τ_(i−1)).

5. Suppose that a population with c.d.f. Exp(λ_1 = 1) is contaminated by adding some amount of units with lifetime distributed according to Exp(λ_2 = 5). The lifetime of a randomly chosen unit is distributed according to F(t) = 0.2[1 − e^{−5t}] + 0.8[1 − e^{−t}]. Investigate theoretically or numerically the failure rate and show that it is of the DFR type.

6. Suppose that τ ∼ W(λ, β), E[τ] = 1, Var[τ] = 0.16. Find λ and β. Repeat the same calculations assuming that τ has a lognormal distribution.

7. Element a is connected in series to a parallel group of two identical elements b and c. τ_a ∼ W(λ = 1, β = 2). Elements b and c are exponential with mean lifetime 0.5. Find the analytic form of the system failure rate h(t).

8. For the system in Exercise 7, use Corollary 2.6 to find the lower bound on system reliability. On what interval is this lower bound valid?

9. Prove formula (2.2.5).

10. Prove Theorem 1.2.


Hint: Use the Laplace transform. The Laplace transform of the exponential density is L(s) = E[e^{−sτ}] = ∫_0^∞ e^{−st} λ e^{−λt} dt = λ/(λ + s).

11. Let τ ∼ W(λ, β). Find the c.d.f. of X = log τ. Introduce the new parameters b = β^{−1} and a = − log λ.

12. Let τ_i ∼ W(λ_i, β). Find the c.d.f. of τ = min(τ_1, ..., τ_n).

13. Consider the bridge system shown in Fig. 1.5, Chapter 1. Argue that for it the expression for g(θ) in the Burtin-Pittel approximation is g(θ) = θ_1 θ_2 + θ_4 θ_5.

14. Consider a series system of independent components with failure rates h_1(t), ..., h_n(t). Prove that the system mean lifetime equals E[τ] = ∫_0^∞ exp[−∫_0^t Σ_{i=1}^n h_i(x) dx] dt.
Hint. Use (1.3.12), (2.2.5) and Theorem 2.4.

15. a. Find the c.d.f. of a series system consisting of two independent components. The lifetime of the first is τ_1 ∼ Exp(λ_1) and the lifetime of the second is τ_2 ∼ W(λ_2 = 0.906, β = 4).
b. Find the density function of the system lifetime and investigate it graphically for the λ_1 values 0.5, 1, 2 and 4.

Chapter 3

Statistical Inference From Incomplete Data

Statistics is the science of producing unreliable facts from reliable figures.
Quips and Quotes, p. 765.

He uses statistics as a drunkard uses a lamppost, for support, not for illumination.
Chesterton

3.1 Kaplan-Meier Estimator of the Reliability Function

At the center of statistical inference in reliability theory is the probabilistic characterization of lifetime. To solve any problem in preventive maintenance we eventually need information about the lifetime distribution: estimates of its parameters and/or a nonparametric estimate of the survival probability function. By the very nature of lifetime, obtaining a complete sample of observations is practically impossible. In laboratory testing, we stop the experiment either at a prescribed time or after observing a prescribed number of failed items; otherwise, the experiment becomes too costly and too time-consuming. Thus, for some items the lifetime is censored, i.e. the information about it has the form "the lifetime τ exceeds some value t_c". Suppose we monitor a sample of 100 new-type transmissions installed on


new trucks. It is agreed that a failed transmission will be immediately reported, together with the corresponding mileage. But it may happen that a truck is damaged in an accident and/or the transmission has been dismantled. Then the information received might be the following: the lifetimes t_1, t_2, ..., t_45 of 45 failed transmissions have been observed and recorded; for the remaining 55 transmissions, the information has the form "the lifetime τ_j exceeds some constant L_j", j = 46, 47, ..., 100.

An important case of incomplete information was described and formalized in the paper by Artamanovsky and Kordonsky (1970). After a fatigue failure is discovered in an operating aircraft, the whole fleet is grounded and inspected. The following information is recorded for aircraft i: the total time in the air T_i, and the presence or absence of a specific fatigue crack. So, for the i-th aircraft, it is known only whether the fatigue life τ_i exceeds T_i or not.

This section describes a very useful nonparametric estimation procedure for the survival probability function R(t), i.e. for the probability of survival past time t. This procedure does not assume any parametric framework, i.e. no a priori assumptions are made about the functional form of R(t). It will be assumed that the following information is available. A group of n identical items starts operating at t_0 = 0. During the operation, items may fail and may also be withdrawn (lost) from the follow-up. It is assumed that the failure times are recorded, together with the number of items "alive" just prior to each failure time. Suppose failures occur at the instants 0 < t_1 < t_2 < ... < t_k. Let n_j be the number of items "at risk" (i.e. "alive") just prior to t_j. Denote by w_j the number of items withdrawn from observation in the interval between the (j − 1)-st and the j-th failure, i.e. in the interval (t_{j−1}, t_j), t_0 = 0. Obviously, n_1 = n − w_1, n_2 = n_1 − 1 − w_2, etc.
Kaplan and Meier (1958) suggested their famous nonparametric estimator R̂(t) of the survival probability R(t), which has the following form:

R̂(t) = Π_{i: t_i ≤ t} (1 − 1/n_i).  (3.1.1)

By (3.1.1), R̂(t) is right-continuous. It equals 1 for 0 ≤ t < t_1; if the observation (follow-up) stops at the instant of the k-th failure t_k while some items remain alive, R̂(t) is defined only on the interval [0, t_k].

A heuristic derivation of the estimator (3.1.1) can be made as follows. Suppose our observations of the sample are carried out at the time instants t*_i, i = 1, 2, .... At t*_j we record the number n*_j of items alive just prior to the observation instant. Besides, assume that the number of failures ("deaths") d*_j is known for the interval ∆_j = (t*_{j−1}, t*_j). Therefore, we also know the number w_j of items "lost" or withdrawn in this interval. Let τ be the item lifetime. Denote P(τ > t*_j) = R(t*_j) = P(survival past t*_j). Obviously,

R(t*_j) = P(τ > t*_0) P(τ > t*_1 | τ > t*_0) · · · P(τ > t*_j | τ > t*_{j−1}).  (3.1.2)


Table 3.1: The ABS Test Data
failure number i   mileage t_i   n_i   (n_i − 1)/n_i   R̂(t_i)
1                  3,220         50    0.980           0.980
2                  6,250         49    0.980           0.960
3                  12,660        46    0.978           0.939
4                  15,610        42    0.976           0.916
5                  22,980        39    0.974           0.892
6                  27,570        35    0.971           0.866
7                  30,800        34    0.971           0.841
8                  33,460        30    0.967           0.813
9                  38,500        27    0.963           0.783
10                 41,290        25    0.960           0.752
11                 44,870        20    0.950           0.714
12                 50,070        16    0.938           0.670

Denote p_i = P(τ > t*_i | τ > t*_{i−1}). A natural estimate of p_i is the quantity p̂_i = 1 − d*_i/n̂*_i, where n̂*_i is the average number of items operational during the interval ∆_i. It seems reasonable to put n̂*_j = n*_j − w_j/2. Thus the so-called life-table estimate R̄(t_j) of R(t_j) is obtained as

R̄(t_j) = Π_{i=1}^j p̂_i.  (3.1.3)

The Kaplan-Meier estimator can be viewed as the limiting case of the estimator (3.1.3). Imagine that the lengths of the ∆_j tend to zero and the number of these intervals tends to infinity. Those intervals which do not contain failures contribute to (3.1.1) a factor equal to 1. The only nontrivial contributions come from the intervals which contain failures. For them, assuming that the failure instants are separated from each other, p̂*_i = 1 − 1/n*_i = 1 − 1/n_i. Thus we arrive at the Kaplan-Meier estimator. In the literature it is also termed the product-limit (PL) estimator.

Example 1.1. Estimation of survival probabilities for ABS units
A new experimentally designed automatic braking system (ABS) was installed on 50 cars. At each ABS failure, the mileage of the car was recorded. Besides, some cars were removed from the testing due to failures of other parts or because of accidents. The experiment was designed in such a way that at the instant of ABS failure or car removal, the corresponding mileage was recorded. The data and the computation of R̂(t) are summarized in Table 3.1. #
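The product in (3.1.1) is straightforward to compute; the sketch below reproduces the first rows of Table 3.1:

```python
def kaplan_meier(failures):
    """Product-limit estimate (3.1.1).  `failures` is a list of (t_i, n_i):
    failure time and number of items at risk just prior to t_i."""
    R, out = 1.0, []
    for t, n in failures:
        R *= (n - 1) / n
        out.append((t, R))
    return out

# First rows of the ABS test data (Table 3.1)
abs_data = [(3220, 50), (6250, 49), (12660, 46), (15610, 42)]
for t, R in kaplan_meier(abs_data):
    print(t, round(R, 3))
```

The estimator stays constant between failure times; withdrawals enter only through the risk-set sizes n_i.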


It can be shown that under a rather general random withdrawal mechanism, R̂(t) is an unbiased estimator of R(t). The following formula, known in the literature as Greenwood's formula, gives an estimate of Var[R̂(t)]:

V̂ar[R̂(t)] = [R̂(t)]² · Σ_{i: t_i ≤ t} (n_i(n_i − 1))^{−1}.  (3.1.4)

An approximate (1 − 2α)-confidence interval on the survival probability for a fixed value t can be obtained as

[R̂(t) − z_α · (V̂ar[R̂(t)])^{0.5}, R̂(t) + z_α · (V̂ar[R̂(t)])^{0.5}],  (3.1.5)

where z_α is the (1 − α)-quantile of the standard N(0, 1) normal distribution.
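A sketch of (3.1.4)-(3.1.5) with illustrative numbers (α = 0.05, so z = 1.645 and the interval is a 90% one; the inputs mimic the first four rows of Table 3.1):

```python
import math

def greenwood_ci(R_t, risk_sets, z=1.645):
    """Greenwood's formula (3.1.4) and the normal-approximation
    confidence interval (3.1.5).  `risk_sets` lists the n_i for
    the failure times t_i <= t; the interval is clipped to [0, 1]."""
    var = R_t ** 2 * sum(1.0 / (n * (n - 1)) for n in risk_sets)
    half = z * math.sqrt(var)
    return max(R_t - half, 0.0), min(R_t + half, 1.0)

# Hypothetical: R_hat = 0.916 after 4 failures with n_i = 50, 49, 46, 42
lo, hi = greenwood_ci(0.916, [50, 49, 46, 42])
print(round(lo, 3), round(hi, 3))
```

The clipping to [0, 1] is a practical convention, not part of formula (3.1.5) itself.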

3.2 Probability Paper

In this section we describe a popular graphical technique for analyzing lifetime data which uses so-called probability paper. This technique is usually the first step in lifetime data analysis. By means of probability paper it is possible to check visually how closely the data follow the hypothetical distribution function. Besides, it provides quick estimates of the distribution parameters. Very convenient is also the fact that probability paper is applicable to right-censored samples, a feature which is very important for reliability applications. The use of probability paper is restricted to location-scale families, see Definition 3.1 in Chapter 2. Suppose, therefore, that the c.d.f. F(·) of the lifetime τ is such that

P(τ ≤ t) = F((t − a)/b), b > 0.  (3.2.1)

We assume that F(·) is a continuous and strictly increasing function. Then for any p between 0 and 1, the equation

p = F((t − a)/b)  (3.2.2)

has a single root, denoted t_p.

Definition 2.1: p-quantile
The root t = t_p of (3.2.2) is called the quantile of level p, or the p-quantile. #

The p-quantile has a simple probabilistic meaning: the p-part of the probability lies to the left of t_p:

P(τ ≤ t_p) = p.  (3.2.3)

The median is the 0.5-quantile, and the lower quartile is the 0.25-quantile.


Denoting by F^{−1}(·) the inverse function of F(·), we obtain that

(t_p − a)/b = F^{−1}(p),  (3.2.4)

or

t_p = a + b · F^{−1}(p).  (3.2.5)

This equation shows that for a location-scale family there is a linear relationship between F^{−1}(p) and t_p. This is the basis for constructing probability paper. Let t_(1), t_(2), ..., t_(n) be the ordered observations from the population τ ∼ F((t − a)/b). The sample counterpart of the c.d.f. is called the empirical c.d.f. and denoted F̂_n(t). It is a stepwise function that jumps by 1/n at each t_(i).

Definition 2.2: the empirical distribution function F̂_n(t)
F̂_n(t) = 0 for t < t_(1), F̂_n(t) = 1 for t ≥ t_(n), and

F̂_n(t) = k/n, for t_(k) ≤ t < t_(k+1), k = 1, ..., n − 1. #  (3.2.6)

Let t_p be the p-quantile of F(·). Then it is easy to prove that F̂_n(t_p) → p as n → ∞ (since p is the probability that an observed lifetime is ≤ t_p). Then t_(k) is the empirical value of the p = k/n quantile, and we can write

p̂_k = k/n ≈ F((t_(k) − a)/b),  (3.2.7)

or

t_(k) ≈ a + b · F^{−1}(p̂_k).  (3.2.8)

This relationship is the key for probability plotting: the observed lifetimes t_(k) are plotted against F^{−1}(p̂_k). Since F̂_n(t) changes from (k − 1)/n to k/n at t_(k), it is recommended to plot t_(k) against F^{−1}((k − 0.5)/n), see e.g. Vardeman (1994), pp. 77-78. Afterwards, a straight line is drawn by eye through the plotted points. Closeness of the points to this line is a certain confirmation that the sample belongs to the population with the hypothetical c.d.f. F(t). To facilitate the use of probability paper, we present in Appendix 4 the Weibull and the Normal probability paper, together with the Mathematica codes for producing the paper and plotting on it.

Before turning to a numerical example, let us point out a way of extending the applicability of the plotting procedure described above. Let τ = ψ(X), with ψ a monotonically increasing function, and let the c.d.f. of τ belong to a location-scale family. Then

P(X ≤ y) = P(ψ(X) ≤ ψ(y)) = F((ψ(y) − a)/b).  (3.2.9)


Therefore, the probability plot for a r.v. X can be obtained from the probability plot of τ by changing the time scale from t to ψ(t). Two examples are very important.
(i) Let τ = log X ∼ N(µ, σ²), i.e. X is lognormal: X ∼ logN(µ, σ²). Here ψ(X) = log X. Thus, the probability paper for the lognormal distribution is obtained from the normal paper by replacing the time axis by a logarithmic time scale.
(ii) Let τ ∼ Extr(a, b), i.e. P(τ ≤ t) = 1 − exp(−e^{(t−a)/b}). Then X = e^τ ∼ W(λ, β), with λ = e^{−a}, β = b^{−1}. Therefore, the Weibull paper is obtained from the probability paper for the extreme-value family by a logarithmic change of the time scale.

Example 2.1
Seven braking units were installed on experimental trucks. Five failures were observed in the experiment, at the mileages 2.25, 6.7, 37.6, 85.4 and 110.0 (in thousands of miles). It was decided to stop the test after observing five failures; two braking units survived 110 000 miles. The following table presents the data needed for a probability plot:

i   t_(i)   (i − 0.5)/7
1   2.25    0.071
2   6.7     0.214
3   37.6    0.357
4   85.4    0.50
5   110.0   0.642

Fig. 3.1 shows the 5 points with coordinates (t_(i), F^{−1}((i − 0.5)/7)) plotted on the Weibull paper. They follow a straight line rather well. We presume that the two remaining points, had they been observed, would also be close to the drawn line. We have, therefore, a certain evidence that the underlying c.d.f. is Weibull. It is important to note that we are dealing with an incomplete sample! Note also that in most applications, the left tail of the distribution is of greatest importance. In reliability practice, we will hardly ever be able to observe all order statistics.

The Weibull paper also provides a way of obtaining parameter estimates. The estimation of λ uses a special line which corresponds to the probability 1 − e^{−1} ≈ 0.632. The time value which corresponds to the intersection of the drawn line and this special line is the estimator of 1/λ. In our case, λ̂ = 1/115. To estimate the Weibull shape parameter β, we suggest the following procedure. Read from the graph of the drawn line the time value t_p which corresponds to a certain p-value, say p = 0.35. We see that t_{0.35} ≈ 27. Then use the following formula:

β̂ = log(− log(1 − p)) / log(t_p λ̂).  (3.2.10)


Table 3.2: The fatigue test results
i    N(i)   log10 N(i)
1    0.53   4.724
2    0.65   4.813
3    0.76   4.881
4    0.80   4.905
5    0.87   4.940
6    0.90   4.954
7    0.902  4.955
8    1.02   5.009
9    1.07   5.029
10   1.074  5.031
11   1.09   5.037
12   1.16   5.064
13   1.22   5.085
For our data we obtain β̂ = 0.58. To facilitate the use of Weibull paper, we present in Appendix 4 a Mathematica code for plotting on the Weibull paper. To activate this program, define (i) the observed lifetimes as "tw", see In[1], and (ii) "nobsw", the total number of observations. The program in Appendix 4 is made for the range of observations [1, 1000]. In case the observations fall outside this range, rescale them. For example, if the maximal observation is 100 000 cycles, express the results in hundreds of cycles. #

There is ample literature on using graphical methods for data analysis and estimation. We refer the reader to the Weibull Analysis Handbook (1983) and to Dodson's Weibull Analysis (1994).

Example 2.2: fatigue test data for cyclic bending of a cantilever beam
The following are the results of a fatigue experiment. A cantilever beam made of alloy B-95 was bent with maximal cyclic stress 30 kg/mm² (Kordonsky, Doctoral Dissertation, 1966). 22 specimens were put on test; 13 failed in the interval [0, 1.25 · 10⁵], see Table 3.2 below. N(i) is the lifetime in units of 100 000 cycles. Fig. 3.2 is the plot on Normal paper of the lifetimes of the noncensored observations, obtained by means of the Mathematica program presented in Appendix 4. For our case, we have to supply the program with the following data:
(i) the decimal logarithms of the lifetimes put into the list "tn", the third column of Table 3.2;


(ii) the total number of observations (including censored), "nobsn"; in our case, nobsn = 22;
(iii) the first argument of PlotRange, which should include the minimal and the maximal observed lifetime. We took [4.6, 5.2], see Fig. 3.2.

Fig. 3.2 shows that the dots follow a straight line rather well. Assuming that the censored lifetimes belong to the same population, it seems very plausible that the actual lifetimes follow the lognormal distribution. (Recall that if log τ ∼ N(µ, σ), then τ ∼ logN(µ, σ).) Parameter estimation from the Normal plot is very simple. Draw a straight line L through the plotted points. The abscissa of the intersection of L with the horizontal line marked 0.5 is the estimate of µ. In our case, µ̂ = 5.06. To obtain the estimate of σ, read the abscissa x_0 of the intersection of L with the horizontal line just above the line marked 0.15 (this line corresponds to the probability 0.159): x_0 = 4.89. The estimate of σ is σ̂ = µ̂ − x_0 = 5.06 − 4.89 = 0.17. #

Remark So far we have described the use of probability paper for complete or right-censored samples. Could we make a probability plot for data subject to more complicated censoring? A practical solution would be to use the Kaplan-Meier estimate R̂(t) to produce a substitute for the empirical distribution function. Put F̂(t) = 1 − R̂(t). Suppose that at the point t_i, F̂(t) jumps from p*_i to p**_i. Then calculate p_i = (p*_i + p**_i)/2 and plot (t_i, F^{−1}(p_i)) on the probability paper. In other words, use p_i instead of (i − 0.5)/n. #
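For Example 2.2, the graphical estimates of µ and σ can be mimicked by least squares on normal scores (a sketch; `statistics.NormalDist` supplies Φ^{−1}, and the function name is ours):

```python
from statistics import NormalDist

def normal_ls_fit(ys, n):
    """Fit ordered log-lifetimes on normal paper: y_(k) ~ mu + sigma * z_k,
    where z_k = Phi^{-1}((k - 0.5)/n).  A right-censored tail simply
    shortens the list of observed ys (k still runs from 1)."""
    nd = NormalDist()
    zs = [nd.inv_cdf((k - 0.5) / n) for k in range(1, len(ys) + 1)]
    m = len(zs)
    zbar, ybar = sum(zs) / m, sum(ys) / m
    s = (sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
         / sum((z - zbar) ** 2 for z in zs))
    return ybar - s * zbar, s     # (mu_hat, sigma_hat)

logN = [4.724, 4.813, 4.881, 4.905, 4.940, 4.954, 4.955,
        5.009, 5.029, 5.031, 5.037, 5.064, 5.085]   # Table 3.2
mu, sigma = normal_ls_fit(logN, 22)
print(round(mu, 2), round(sigma, 2))
```

The results, µ̂ ≈ 5.06 and σ̂ ≈ 0.16, are close to the graphical estimates µ̂ = 5.06 and σ̂ = 0.17 obtained in the text.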

3.3 Parameter Estimation for Censored and Grouped Data

3.3.1 The Maximum Likelihood Method

Suppose we observe a sample {t1, t2, ..., tn} from a population with density function f(t; α, β). (For the sake of simplicity, we assume that f(·) depends on two unknown parameters α and β.) The standard maximum likelihood method of estimating the parameters works as follows.
(i) Write the likelihood function

Lik = Lik(\alpha, \beta; t_1, \dots, t_n) = \prod_{i=1}^{n} f(t_i; \alpha, \beta).   (3.3.1)

(ii) Find the values of α and β (depending on the observed sample values) which maximize the likelihood function. These values α̂ = α̂(t1, t2, ..., tn) and β̂ = β̂(t1, t2, ..., tn) are called the maximum likelihood estimates (MLE) of the parameters.


In practice, the MLE are found by taking the logarithm of the likelihood function,

\log Lik = \sum_{i=1}^{n} \log f(t_i; \alpha, \beta),   (3.3.2)

and by solving the following system of equations:

\frac{\partial \log Lik}{\partial \alpha} = 0, \quad \frac{\partial \log Lik}{\partial \beta} = 0.   (3.3.3)

Assume that the solution (α̂, β̂) of (3.3.3) exists, is unique, and corresponds to the maximum of the likelihood function. Then (α̂, β̂) are called the maximum likelihood estimates. Typically, the MLE have good statistical properties, and in some cases they coincide with the optimal estimates.
Remark 1
Finding the MLE as the stationary point via the system of equations (3.3.3) does not work in one important case: when the support of the density function f(t; ·, ·) depends on the parameters. An example is the exponential density with a threshold parameter: f(t; λ, a) = λe^{−λ(t−a)}, t ≥ a. Here the support depends on the parameter a, and the maximum of the likelihood function is not attained at a point where the partial derivatives are zero. But the principle that the MLE should maximize the likelihood function remains valid. We demonstrate later, for the exponential distribution with a threshold parameter, that the MLE can be found directly from the expression for log Lik, see Exercise 4. #
Remark 2
An important property of the MLE method is the so-called invariance. Suppose that we want to find the maximum likelihood estimate ψ̂ of some parametric function ψ(α, β). By definition, ψ̂ is the value of ψ(α, β) which corresponds to the global maximum of the likelihood function. We may be tempted to express the likelihood function as a function of ψ and to maximize it. Instead of doing this tedious work, there is a much easier solution: the MLE of ψ is obtained via the formula

\hat\psi = \psi(\hat\alpha, \hat\beta).   (3.3.4)

For example, the p-quantile of a location-scale family F((t − a)/b) equals tp = a + b · F^{−1}(p). (Verify it!) Suppose that we have the MLE's â and b̂ of a and b, respectively. Then the MLE of the p-quantile tp is

\hat t_p = \hat a + \hat b \cdot F^{-1}(p).   (3.3.5)
#
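As a small numerical illustration of the invariance principle (the parameter values below are invented, not data from the text): for the extreme-value family F0(t) = 1 − exp(−e^t) we have F0^{−1}(p) = log(−log(1 − p)), and the quantile MLE obtained by invariance agrees with the direct Weibull quantile formula after exponentiating.

```python
import math

def extreme_value_quantile(p, a, b):
    # t_p = a + b * F0^{-1}(p), where F0(t) = 1 - exp(-exp(t))
    return a + b * math.log(-math.log(1 - p))

# hypothetical Weibull MLEs and the corresponding location-scale parameters
# of the log-lifetime (lambda = e^{-a}, beta = 1/b)
lam_hat, beta_hat = 0.5, 2.0
a_hat, b_hat = -math.log(lam_hat), 1 / beta_hat

p = 0.1
t_log = extreme_value_quantile(p, a_hat, b_hat)             # quantile of log T
t_weibull = (-math.log(1 - p)) ** (1 / beta_hat) / lam_hat  # direct Weibull quantile
# exp(t_log) equals t_weibull, illustrating (3.3.4)-(3.3.5)
```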

The main advantage of the maximum likelihood method is that it could be easily adjusted to incomplete data, such as censored and/or grouped samples. This we demonstrate in the next subsection.

3.3.2 Maximum Likelihood Function for Censored and Grouped Data

1. Censoring by the k-th order statistic.
The population has the density function f(t; α, β) and the c.d.f. F(t; α, β). The available information is the first k order statistics t(1), t(2), ..., t(k). The only information about the remaining n − k observations is that they exceed t(k). This type of data (termed Type-II censoring) is typical for reliability testing: n devices are put on test, which terminates, because of time or cost restrictions, exactly at the instant of observing the k-th failure. The likelihood function has the following form:

Lik = \prod_{i=1}^{k} f(t_{(i)}; \alpha, \beta) \cdot [1 - F(t_{(k)}; \alpha, \beta)]^{n-k}.   (3.3.6)

Example 3.1: exponential distribution
Let f(t; λ) = λe^{−λt}. Then

\log Lik = k \log \lambda - \lambda \big( t_{(1)} + \dots + t_{(k)} + (n-k) t_{(k)} \big).   (3.3.7)

The expression in parentheses in (3.3.7) is the total time on test, TTT. The equation ∂ log Lik/∂λ = 0 gives

\hat\lambda = k / TTT.   (3.3.8)

It is worth recalling that TTT ∼ Gamma(k, λ), see Exercise 4, Chapter 2. This fact is useful for investigating the statistical properties of λ̂. #

2. Censoring by a constant.
The lifetime data are censored by a fixed constant T (so-called Type-I censoring). A typical example is testing n devices during a fixed period of time [0, T]. Denote the observed lifetimes t(1), t(2), ..., t(k), exactly as in the previous case, with the difference that here k is random and not fixed in advance. To make the maximum likelihood method work, we must observe at least one noncensored lifetime. The likelihood function is similar to the Type-II case:

Lik = \prod_{i=1}^{k} f(t_{(i)}; \alpha, \beta) \cdot [1 - F(T; \alpha, \beta)]^{n-k}.   (3.3.9)
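A sketch of (3.3.7)-(3.3.8) in Python (the data here are invented for illustration):

```python
def exp_mle_type2(order_stats, n):
    """MLE of lambda from a Type-II censored exponential sample:
    order_stats holds the first k order statistics out of n items on test."""
    k = len(order_stats)
    ttt = sum(order_stats) + (n - k) * order_stats[-1]  # total time on test
    return k / ttt

# hypothetical: n = 5 items, test stopped at the 3rd failure
lam_hat = exp_mle_type2([1.0, 2.0, 3.0], n=5)  # TTT = 6 + 2*3 = 12, lam_hat = 0.25
```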

3. Random noninformative censoring.
Suppose that the lifetime τi of the i-th item can be observed only during a random time interval [0, Ti]. The r.v. Ti ∼ G(t) is assumed to be independent of τi. For example, the lifetime of a transmission can be observed on the time interval [0, Ti], where Ti is the actual mileage made by the i-th truck during the warranty period, say one year. If the transmission fails during the first year, we will know the actual value of τi = ti and also that Ti > ti. Otherwise, we observe the truck's one-year mileage Ti = xi and know only that τi exceeds xi. We assume that τi ∼ f(t; α, β), and that the c.d.f. of τi is F(t; α, β). The following assumption is important: the c.d.f. of Ti does not depend on the parameters (α, β), which justifies the name noninformative censoring. To write the likelihood function we need the notation g(t) for the density function of Ti; its c.d.f. is G(t). It is convenient to introduce an indicator random variable δi which equals 1 if τi ≤ Ti, and 0 otherwise. Denote by yi the observed lifetime or the censoring time for the i-th item. Then the contribution of the i-th item to the likelihood function is

Lik_i = \big[ f(y_i; \alpha, \beta)(1 - G(y_i)) \big]^{\delta_i} \cdot \big[ g(y_i)(1 - F(y_i; \alpha, \beta)) \big]^{1-\delta_i}.   (3.3.10)

The full likelihood is Lik = \prod_{i=1}^{n} Lik_i, and the log-likelihood function becomes

\log Lik = \sum_{i=1}^{n} \log \big[ f(y_i; \alpha, \beta)^{\delta_i} (1 - F(y_i; \alpha, \beta))^{1-\delta_i} \big] + H(y_1, \dots, y_n).   (3.3.11)

Here the last term H(·) depends only on g(yi) and G(yi) and does not contain the unknown parameters. Since we proceed by taking partial derivatives of the log-likelihood with respect to α and β, this H-term can simply be omitted.
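For the exponential density f(t; λ) = λe^{−λt}, the sum in (3.3.11) reduces to (Σδi) log λ − λ Σyi, so the MLE has a closed form. A sketch (hypothetical data):

```python
def exp_mle_random_censoring(y, delta):
    """MLE of lambda under random noninformative censoring.
    y[i] = observed lifetime or censoring time, delta[i] = 1 if the failure was
    observed. The log-likelihood (up to the H-term) is
    sum(delta)*log(lam) - lam*sum(y), maximized at the ratio below."""
    return sum(delta) / sum(y)

# hypothetical data: three observed failures, one item censored at y = 2.0
lam_hat = exp_mle_random_censoring([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])  # 3/10 = 0.3
```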

4. Quantal response data.
We already mentioned the work of Artamanovsky and Kordonsky (1970), in which the following important situation was studied. For the i-th item, the information about its lifetime is the following: either the item failed during the interval [0, yi], where yi is a known quantity, or it survived past yi. The exact time of the failure is not known. This type of yes/no data about the failure time is called quantal. It is typical for periodic inspections and for follow-up studies. For the sake of simplicity, let us number the "yes" responses (i.e. the failed items) by the index i running from 1 to r. The nonfailed items have numbers from r + 1 to n (the "no" responses). Then the likelihood function is

Lik = \prod_{i=1}^{r} F(y_i; \alpha, \beta) \cdot \prod_{i=r+1}^{n} \big( 1 - F(y_i; \alpha, \beta) \big).   (3.3.12)

It has been proved in the above cited paper that for location-scale families, the system of likelihood equations for this case has a unique solution if at least one "yes" and one "no" response have been observed.
Example 3.2: testing n items on the interval [0, y]


Consider F(t; λ) = 1 − e^{−λt}, all yi = y; r items have failed on [0, y] and (n − r) survived past y. Here

Lik = (1 - e^{-\lambda y})^{r} e^{-\lambda y (n-r)}, \qquad \log Lik = r \log(1 - e^{-\lambda y}) - \lambda y (n-r).

Setting ∂ log Lik/∂λ = r y e^{−λy}/(1 − e^{−λy}) − y(n − r) = 0 gives \hat\lambda = y^{-1} \log\frac{n}{n-r}. #

5. Grouped data.
Another situation often arising in lifetime testing and/or in follow-up studies is the following: the time axis is divided into nonoverlapping intervals I1 = (t0, t1], I2 = (t1, t2], ..., Ik = (tk−1, tk], Ik+1 = (tk, tk+1), where t0 = 0, tk+1 = ∞. N items are put on test (or on a follow-up) at t = 0, and the information available is that Nj of them have failed in the interval Ij, j = 1, ..., k + 1, with \sum_{j=1}^{k+1} N_j = N. One of the first monographs devoted to parameter estimation for grouped data was by Kulldorf (1961). Assume that the items fail independently and that their lifetime has the c.d.f. F(t; α, β). The likelihood function is

Lik = \prod_{i=1}^{k+1} \big[ F(t_i; \alpha, \beta) - F(t_{i-1}; \alpha, \beta) \big]^{N_i}.   (3.3.13)

Example 3.3: fatigue test data
25 specimens made from a duralumin alloy were tested during 90 million cycles. The following data were recorded: 7 units failed on the interval [0, 45], 13 units failed on the interval (45, 90], and 5 units survived the test. Previous experience says that the lifetime follows the Weibull distribution F(t; λ, β) = 1 − exp[−(λt)^β]. Prior information suggests that the scale parameter λ is in the range 0.008−0.02, and that the shape parameter β lies between 2 and 4. The likelihood function has the following expression:

Lik = \big( 1 - \exp[-(\lambda \cdot 45)^{\beta}] \big)^{7} \cdot \big( \exp[-(\lambda \cdot 45)^{\beta}] - \exp[-(\lambda \cdot 90)^{\beta}] \big)^{13} \cdot \exp[-5(\lambda \cdot 90)^{\beta}].

Let us show how to use Mathematica to find the MLE's λ̂ and β̂. In the printout presented in Fig. 3.3, In[1] defines the loglikelihood function as logLik, and its partial derivatives f1 and f2 with respect to λ and β, respectively. The operator "FindRoot" solves the system (3.3.3), and Out[4] is its solution: λ = 0.0137 and β = 2.29. This operator demands initial values for the variables. As such were taken λ = 0.014 and β = 3, near the middle of the prior range of these parameters. Since a point where both partial derivatives are zero is not necessarily the maximum point, it is highly advisable to make a contour plot of the loglikelihood function. This is carried out by means of the operator "ContourPlot", see In[5]. The plot produced by Out[5] leaves no doubt that λ̂, β̂ are indeed the maximum likelihood estimates. #
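FindRoot is specific to Mathematica, but the same estimates can be recovered in a few lines of Python. With two parameters and three cells, the multinomial likelihood (3.3.13) is maximized when the fitted cell probabilities match the empirical proportions, which for this particular example gives a closed form (a sketch, not the book's code):

```python
import math

# empirical survival proportions: 18/25 survive past 45 cycles, 5/25 past 90
s45, s90 = 18 / 25, 5 / 25
a = -math.log(s45)   # equals (45*lam)**beta at the maximum
b = -math.log(s90)   # equals (90*lam)**beta at the maximum

beta_hat = math.log(b / a) / math.log(2)   # from the ratio (90/45)**beta = b/a
lam_hat = a ** (1 / beta_hat) / 45
# lam_hat is about 0.0137 and beta_hat about 2.29, matching the FindRoot output
```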

3.3.3 Finding Maximum Likelihood Estimates for a Censored Sample: Weibull and Lognormal Distributions

Weibull Distribution
In this subsection we denote by t1, t2, ..., tr the noncensored observations, and by tr+1, ..., tn the censored ones. For example, for Type-II censoring, tr+1, ..., tn are all equal to the largest observed order statistic. It is more convenient to work not with the actual observed (or censored) lifetimes but rather with their logarithms. So, denote

y_i = \log t_i, \quad i = 1, \dots, r, r+1, \dots, n.   (3.3.14)

Recall that if the original observations ti come from the Weibull distribution, then their logarithms have the extreme-value distribution Extr(a, b), where a and b are expressed through the Weibull parameters:

\lambda = e^{-a}, \quad \beta = 1/b.   (3.3.15)

The system of likelihood equations has the following form:

\partial \log Lik / \partial a = -r/b + b^{-1} \sum_{i=1}^{n} e^{(y_i - a)/b} = 0,

\partial \log Lik / \partial b = -r/b - b^{-2} \sum_{i=1}^{r} (y_i - a) + b^{-2} \sum_{i=1}^{n} e^{(y_i - a)/b} (y_i - a) = 0.

From the computational point of view, it is always easier to solve one nonlinear equation instead of a system of two. The first of the above equations allows expressing a as a function of b:

a = b \log\Big( r^{-1} \sum_{i=1}^{n} e^{y_i/b} \Big).   (3.3.16)

Using (3.3.16), we obtain an equation for b only:

\frac{\sum_{i=1}^{n} y_i e^{y_i/b}}{\sum_{i=1}^{n} e^{y_i/b}} = r^{-1} \sum_{i=1}^{r} y_i + b.   (3.3.17)
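For typical data, the left side of (3.3.17) minus the right side is positive for small b and tends to minus infinity for large b, so simple bisection works. A sketch in Python, using the Type-II censored lifetimes of Exercise 2 (Mann and Fertig, 1973) purely as an illustration; no parameter values are asserted here:

```python
import math

def b_residual(b, y, r):
    """Left minus right side of (3.3.17); y holds all n (log-)observations,
    the first r being noncensored."""
    num = sum(yi * math.exp(yi / b) for yi in y)
    den = sum(math.exp(yi / b) for yi in y)
    return num / den - (sum(y[:r]) / r + b)

def solve_b(y, r, lo=0.05, hi=10.0, tol=1e-10):
    # bisection: the residual is positive near b = 0+ and negative for large b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if b_residual(mid, y, r) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Type-II censored sample: r = 10 failures out of n = 13; the 3 censored
# observations are set equal to the largest observed order statistic
t = [0.22, 0.50, 0.88, 1.00, 1.32, 1.33, 1.54, 1.76, 2.50, 3.00]
y = [math.log(v) for v in t] + [math.log(3.00)] * 3
r = 10
b_hat = solve_b(y, r)
a_hat = b_hat * math.log(sum(math.exp(yi / b_hat) for yi in y) / r)  # (3.3.16)
```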


This equation usually has a single root and is easily solved numerically. Denote its solution by b̂; this is the desired MLE of b. Now substitute it into (3.3.16) to obtain â.

Lognormal Distribution
We know that if the logarithms of the actual lifetimes follow the normal distribution, then the lifetimes themselves are lognormal. So, we process the logarithms (yi, i = 1, ..., n) as if they were normally distributed. The procedure for finding the MLE's in the normal case with censored observations is based on a rather efficient iterative scheme, which is a version of the so-called Expectation-Maximization algorithm, see e.g. Lawless (1982), Chapter 5. To describe it, we need some notation. Let

\phi(x) = (\sqrt{2\pi})^{-1} e^{-x^2/2}, \quad R(x) = \int_x^{\infty} \phi(t)\, dt.   (3.3.18)

Define also

U(x) = \phi(x)/R(x); \quad \gamma(x) = U(x)\,(U(x) - x).   (3.3.19)

Denote by µ and σ the parameters of the normal distribution. Define xi = (yi − µ)/σ; wi = yi for i = 1, ..., r, i.e. for the noncensored observations, and wi = µ + σ · U(xi) for i = r + 1, ..., n, i.e. for the censored observations. Now use the following iterative procedure.
Step 0. Define initial estimates of µ and σ. Denote them µ0 and σ0.
Step 1. Use µ0 and σ0 to compute wi from the above formulas, for i = 1, 2, ..., n.
Step 2. Compute µ1 and σ1 from the following equations:

\mu_1 = \sum_{i=1}^{n} w_i / n,   (3.3.20)

and

\sigma_1^2 = \frac{\sum_{i=1}^{n} (w_i - \mu_0)^2}{r + \sum_{i=r+1}^{n} \gamma(x_i)}.   (3.3.21)

Go to Step 1 by setting µ0 := µ1 and σ0 := σ1. Repeat Steps 1, 2 until the procedure converges. The above procedure is easy to implement in Mathematica, which has built-in operators for computing R(x).
We have considered in this section the most useful parameter estimation techniques for incomplete data. This type of statistical inference is probably the most relevant to the implementation of reliability theory and preventive maintenance models in real-life situations.
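The scheme is equally easy to implement elsewhere; R(x) is available through the complementary error function. A sketch of Steps 0-2 in Python (the data below are hypothetical):

```python
import math

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def R(x):
    # upper tail of the standard normal via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

def U(x):
    return phi(x) / R(x)

def gamma(x):
    u = U(x)
    return u * (u - x)

def em_censored_normal(y_obs, y_cens, mu, sigma, iters=200):
    """y_obs: noncensored logs; y_cens: right-censoring points; (mu, sigma): Step 0."""
    n, r = len(y_obs) + len(y_cens), len(y_obs)
    for _ in range(iters):
        xs = [(c - mu) / sigma for c in y_cens]
        w = list(y_obs) + [mu + sigma * U(x) for x in xs]          # Step 1
        mu1 = sum(w) / n                                          # (3.3.20)
        sigma1 = math.sqrt(sum((wi - mu) ** 2 for wi in w)
                           / (r + sum(gamma(x) for x in xs)))     # (3.3.21)
        mu, sigma = mu1, sigma1
    return mu, sigma

# sanity check: with no censored observations the procedure converges to the
# sample mean and the (1/n) standard deviation
mu_hat, s_hat = em_censored_normal([1.0, 2.0, 3.0, 4.0], [], mu=0.0, sigma=1.0)
```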


There is an ample literature on this topic. The reader can find a wealth of information in Lawless (1982) and Gertsbakh (1989). These sources also give numerous references on inference from censored, grouped and quantal-type data.

3.3.4 Point and Confidence Estimation of Location and Scale Parameters Based on Linear Combinations of Order Statistics

The purpose of this subsection is to describe methods of parameter estimation for incomplete data when the sample is drawn from a location-scale family. The parameter estimates are linear combinations of the observed order statistics. These methods are easy to use, and the quality of the corresponding estimators is comparable to that of the maximum likelihood estimators.
Suppose that a random sample X1, X2, ..., Xn is drawn from a population of location-scale type, see Definition 3.1 in Chapter 2. We remind that Xi ∼ F0((t − µ)/σ). Denote by X(i) the i-th order statistic, and let Zi = (Xi − µ)/σ. Obviously, Zi is parameter-free, P(Zi ≤ t) = F0(t). Then one can write the following set of equations:

X_{(i)} = \mu + \sigma m_{(i)} + \sigma (Z_{(i)} - m_{(i)}),   (3.3.22)

where m(i) = E[Z(i)]. The index i in (3.3.22) takes on the ordinal numbers of the observed order statistics. For example, if we observe the second, third and fifth ordered observations, then i = 2, 3 and 5. To simplify the exposition, we assume that we observe the first r order statistics, i.e. we are in a situation of Type-II censoring. Then in (3.3.22), i = 1, 2, ..., r. Since Zi is parameter-free, the values of m(i) can be viewed as known. Thus the set of r equations (3.3.22) contains two unknowns, µ and σ; the term εi = Z(i) − m(i) is a zero-mean random variable. (3.3.22) can be represented in matrix form:

X = D\beta + \sigma\varepsilon,   (3.3.23)

where X′ = (X(1), ..., X(r)) (′ denotes transposition); β is a column vector with components µ, σ; ε is a column vector with components ε1, ..., εr; and D is the following matrix:

D = \begin{pmatrix} 1 & m_{(1)} \\ 1 & m_{(2)} \\ \vdots & \vdots \\ 1 & m_{(r)} \end{pmatrix}

(3.3.23) is a standard linear regression relationship. To make the regression machinery work, we have to compute the variance-covariance matrix

V = \| Cov[\varepsilon_i, \varepsilon_j] \|, \quad i, j = 1, 2, \dots, r.   (3.3.24)


This can be done since the εi's are parameter-free. It is known from regression theory (see e.g. Seber (1977), Chapter 3) that:
(i) The minimal-variance linear unbiased estimators of µ and σ are

(\hat\mu, \hat\sigma)' = (D'V^{-1}D)^{-1} D'V^{-1} X;   (3.3.25)

(ii) The covariance matrix of (µ̂, σ̂) is

\sigma^2 (D'V^{-1}D)^{-1}.   (3.3.26)
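As a deliberately simplified numerical illustration of (3.3.25), the sketch below takes V = I, in which case the formula reduces to ordinary least squares on the order statistics; with the true V-matrix the same formula applies with weighted cross-products. The data are made up so that the noise term is zero and the parameters are recovered exactly:

```python
def ls_location_scale(x_order, m):
    """Least-squares fit of X_(i) = mu + sigma*m_(i), i.e. (3.3.25) with V = I.
    x_order: observed order statistics; m: expected standard order statistics."""
    r = len(m)
    sm, smm = sum(m), sum(mi * mi for mi in m)
    sx = sum(x_order)
    smx = sum(mi * xi for mi, xi in zip(m, x_order))
    det = r * smm - sm * sm          # determinant of D'D
    mu = (smm * sx - sm * smx) / det
    sigma = (r * smx - sm * sx) / det
    return mu, sigma

# exact recovery for noiseless data X_(i) = 5 + 2*m_(i)
mu, sigma = ls_location_scale([3.0, 5.0, 7.0], [-1.0, 0.0, 1.0])  # -> (5.0, 2.0)
```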

µ̂, σ̂ are called the best linear unbiased estimators (BLUE's). As follows from (3.3.25), these estimators have the following general form:

\hat\mu = \sum_{i=1}^{r} a(n, r; i) X_{(i)}, \quad \hat\sigma = \sum_{i=1}^{r} b(n, r; i) X_{(i)}.

Note that the regression method works for any collection of order statistics, not necessarily for Type-II censored samples. If we have at our disposal the order statistics with numbers i1, i2, ..., ir, the same method applies, with obvious changes in the D-matrix and in the V-matrix.
The normal case has been studied extensively, both theoretically and numerically. If we have a sample which, by our assumption, follows the lognormal distribution, we have to process the logarithms of the observations as a sample from the normal distribution. The paper by Sarhan and Greenberg (1962), "The best linear estimates for the parameters of the normal distribution", in "Contributions to Order Statistics", contains tables of the coefficients a(n, r; i), b(n, r; i) for a wide range of n and r values. It considers Type-II censored samples, as well as left- and right-censored samples. We recommend using this source for the normal case. If we have a sample from the Weibull distribution, then the logarithms of the observations follow the so-called Extreme-value distribution, see (2.3.15). BLUE estimators for this case have been investigated by Lieblein and Zelen (1956). To facilitate the numerical work in calculating the BLUE's, we present in Appendix 2 the V-matrices for the normal and the Extreme-value case, for n = 8, 10 and n = 15, together with the mean values of the order statistics m(i).
An interesting theoretical and practical issue is the comparison of the maximum likelihood estimators and the BLUE's. A suitable criterion for comparison is the mean square error (m.s.e.). Recall that the m.s.e. of an estimator θ̂ is defined as

m.s.e.[\hat\theta] = E[(\hat\theta - \theta)^2] = Var[\hat\theta] + (E[\hat\theta] - \theta)^2.   (3.3.27)

For the Extreme-value distribution, both methods provide close results, unless there is heavy censoring; in that case, the MLE's are preferable to the BLUE's.
So far we have demonstrated several methods of obtaining point estimators for the location and scale parameters. Now let us discuss confidence estimation for these parameters. Let µ̂ and σ̂ be the estimators of the location and scale parameters, respectively. These estimators are functions of the observed order statistics X(1), ..., X(r):

\hat\mu = \psi_1(X_{(1)}, \dots, X_{(r)}),   (3.3.28)

\hat\sigma = \psi_2(X_{(1)}, \dots, X_{(r)}).   (3.3.29)

Definition 3.1: equivariance
If for any d and any c > 0 the estimators ψ1, ψ2 satisfy the following properties:

\psi_1(cX_{(1)} + d, \dots, cX_{(r)} + d) = c\,\psi_1(X_{(1)}, \dots, X_{(r)}) + d,   (3.3.30)

\psi_2(cX_{(1)} + d, \dots, cX_{(r)} + d) = c\,\psi_2(X_{(1)}, \dots, X_{(r)}),   (3.3.31)

then they are termed equivariant. #
(3.3.30) means that if all observations Xi undergo the transformation Xi* = cXi + d, then the estimator of the location parameter undergoes the same transformation. (3.3.31) means that under the same transformation, the estimator of the scale parameter is multiplied by the factor c and remains insensitive to d. Let us state without proof that the BLUE's of µ and σ are equivariant.
The following notion plays a central role in confidence estimation.
Definition 3.2: pivotal quantity
Let the c.d.f. of Xi depend on some parameter θ. Then a function Q(X1, X2, ..., Xn; θ) of X1, ..., Xn and θ is called a pivotal quantity if the distribution of Q is parameter-free. #
To show how a pivotal quantity can be used for confidence estimation, let us recall the famous expression

q = \frac{\sqrt{n-1}\,(\bar X - \mu)}{\big( \sum_{i=1}^{n} (X_i - \bar X)^2 / n \big)^{0.5}},   (3.3.32)

where Xi ∼ N(µ, σ), and \bar X = \sum_{i=1}^{n} X_i / n is the sample average. The reader knows from the theory of statistics (see e.g. Devore (1982), p. 259) that q has the so-called t-distribution with n − 1 degrees of freedom, and that the distribution of q does not depend on the unknown parameters. Then a confidence interval for µ is obtained from the following relationship:

P\big( t_\alpha \le q \le t_{1-\alpha} \big) = 1 - 2\alpha,   (3.3.33)


where tα, t1−α are the corresponding quantiles of the t-distribution. Transforming the inequalities inside P(·) in (3.3.33) leads to the following well-known (1 − 2α)-confidence interval:

\Big[ \bar X - \frac{t_{1-\alpha}}{\sqrt{n}} s,\; \bar X - \frac{t_{\alpha}}{\sqrt{n}} s \Big],   (3.3.34)

where s = \big( \sum_{i=1}^{n} (X_i - \bar X)^2 / (n-1) \big)^{0.5}. (By the symmetry of the t-distribution, tα = −t1−α, so the interval is \bar X \pm t_{1-\alpha} s / \sqrt{n}.)
Let GQ(t) be the c.d.f. of the pivotal quantity Q. Then a confidence interval for θ can be obtained by using the relationship

P\big( q_\alpha \le Q(X_1, \dots, X_n; \theta) \le q_{1-\alpha} \big) = 1 - 2\alpha,   (3.3.35)

where qα, q1−α are the corresponding quantiles of GQ(t). The desired confidence interval for θ will be obtained if we succeed in "pivoting", i.e. in solving the inequalities inside P(·) in (3.3.35) with respect to θ. Let us show how this procedure works if we have at our disposal equivariant estimators satisfying (3.3.30), (3.3.31).
Theorem 3.1
Let ψ1 and ψ2 satisfy (3.3.30), (3.3.31), and let tp be the p-quantile of the r.v. Y ∼ F0((t − µ)/σ). Then

A_1 = \frac{\psi_1 - \mu}{\psi_2}, \quad A_2 = \frac{\psi_2}{\sigma}, \quad A_3 = \frac{\psi_1 - \mu}{\sigma}, \quad A_4 = \frac{\psi_1 - t_p}{\psi_2}   (3.3.36)

are pivotal quantities.
Proof
Let Yi = (Xi − µ)/σ be i.i.d., Yi ∼ F0(t). Then Y(i) = (X(i) − µ)/σ, i = 1, ..., n. Obviously, ψ2(Y(1), ..., Y(r)) is a pivotal quantity. It equals ψ2((X(1) − µ)/σ, ..., (X(r) − µ)/σ) = ψ2(X(1), ..., X(r))/σ = A2. Similarly, ψ1(Y(1), ..., Y(r)) is a pivotal quantity; by (3.3.30) it equals (ψ1(X(1), ..., X(r)) − µ)/σ = A3. A1 is a pivotal quantity because A1 = A3/A2. For a location-scale family, tp = µ + σt0p, where F0(t0p) = p. A4 is pivotal because, writing X(i) = µ + σY(i),

A_4 = \frac{\psi_1(X_{(1)}, \dots, X_{(r)}) - t_p}{\psi_2(X_{(1)}, \dots, X_{(r)})} = \frac{\psi_1(\mu + \sigma Y_{(1)}, \dots, \mu + \sigma Y_{(r)}) - \mu - \sigma t_p^0}{\psi_2(\mu + \sigma Y_{(1)}, \dots, \mu + \sigma Y_{(r)})} = \frac{\psi_1(Y_{(1)}, \dots, Y_{(r)}) - t_p^0}{\psi_2(Y_{(1)}, \dots, Y_{(r)})}.

If the quantiles of the c.d.f.'s of the pivotals are known, then the use of Theorem 3.1 for obtaining confidence intervals is straightforward. For example, let A4 ∼ G4(t), with G4(tα) = α and G4(t1−α) = 1 − α. Then

P\Big( t_\alpha \le \frac{\psi_1 - t_p}{\psi_2} \le t_{1-\alpha} \Big) = 1 - 2\alpha.   (3.3.37)


Solving the inequality inside P(·) with respect to tp (assuming ψ2 > 0) produces the following (1 − 2α)-confidence interval for tp:

[\psi_1 - \psi_2 t_{1-\alpha},\; \psi_1 - \psi_2 t_\alpha].   (3.3.38)

The only remaining obstacle is finding the quantiles of the pivotals. The case of the t-distribution is a rare exception, where the c.d.f. of the pivotal is available in closed form. For practical purposes, the following Monte Carlo simulation procedure can be applied for finding the quantiles of the pivotals.
Step 1. Generate a random sample Y1, ..., Yn from the distribution F0(t).
Step 2. Calculate the statistics ψ1, ψ2 and the pivotal quantities A1, ..., A4. Denote by A_j^{(k)}, j = 1, 2, 3, 4, the k-th replica of the respective pivotal.
Repeat Steps 1, 2 N times (say, N = 100 000). Order the samples of A_j^{(k)}, k = 1, 2, ..., N; j = 1, 2, 3, 4. For each j, take as an estimate of the corresponding α-quantile the [N·α]-th ordered observation. For example, for N = 100 000 and α = 0.025, the quantile estimate t0.025 is A_j^{(2500)}, i.e. the 2500-th ordered observation. Similarly, t0.975 = A_j^{(97500)}.
The paper by Elperin and Gertsbakh (1987) demonstrates the implementation of the above method for confidence estimation.
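Steps 1-2 can be sketched as follows (a Python sketch, not the cited implementation; the sample mean serves as the equivariant location estimator ψ1, F0 is taken to be standard normal, and N is kept small to keep the run short):

```python
import math
import random
import statistics

def pivotal_quantiles_A3(n, N, alpha, seed=1):
    """Monte Carlo quantiles of the pivotal A3 = (psi1 - mu)/sigma.
    Sampling Y ~ F0 directly, A3 is simply psi1 applied to the standard sample."""
    rng = random.Random(seed)
    reps = []
    for _ in range(N):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        reps.append(statistics.mean(y))   # psi1 is equivariant, see (3.3.30)
    reps.sort()
    return reps[int(N * alpha)], reps[int(N * (1 - alpha))]

q_lo, q_hi = pivotal_quantiles_A3(n=10, N=4000, alpha=0.025)
# q_lo and q_hi approximate -1.96/sqrt(10) and +1.96/sqrt(10)
```

The same loop, with ψ2 added and the ratios A1, A2, A4 recorded, reproduces the full procedure.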

3.3.5 Large-Sample Maximum Likelihood Confidence Intervals

The loglikelihood function (3.3.2) and the MLE's α̂ and β̂ can be used for obtaining approximate confidence intervals for the parameters. First we describe the modus operandi of this approach; later we discuss the conditions under which it provides good results. Suppose that we have at our disposal the MLE's α̂, β̂.
(i). Compute the following second-order derivatives:

V_{11}(\alpha, \beta) = -\frac{\partial^2 \log Lik}{\partial \alpha^2},   (3.3.39)

V_{12}(\alpha, \beta) = -\frac{\partial^2 \log Lik}{\partial \alpha\, \partial \beta},   (3.3.40)

V_{22}(\alpha, \beta) = -\frac{\partial^2 \log Lik}{\partial \beta^2}.   (3.3.41)

(ii). Substitute the MLE’s into the expressions of Vij . Denote the corresponding


expressions V̂ij = Vij(α̂, β̂). Form the following matrix, called the observed Fisher information matrix:

I_F = \begin{pmatrix} \hat V_{11} & \hat V_{12} \\ \hat V_{12} & \hat V_{22} \end{pmatrix}

This matrix is positive definite.
(iii). Find the inverse of I_F:

I_F^{-1} = \begin{pmatrix} \hat W_{11} & \hat W_{12} \\ \hat W_{12} & \hat W_{22} \end{pmatrix}

(iv). Let zγ be the γ-quantile of the standard normal distribution N(0, 1). Approximate (1 − 2γ)-confidence intervals for α and β are calculated by the following formulas:

for α: \big[ \hat\alpha - z_{1-\gamma}\sqrt{\hat W_{11}},\; \hat\alpha + z_{1-\gamma}\sqrt{\hat W_{11}} \big],   (3.3.42)

for β: \big[ \hat\beta - z_{1-\gamma}\sqrt{\hat W_{22}},\; \hat\beta + z_{1-\gamma}\sqrt{\hat W_{22}} \big].   (3.3.43)

The principal statistical fact behind these formulas is the following. If (i) the likelihood function has the form (3.3.1), where the ti are the observed values of i.i.d. random variables, and (ii) the density function f(t; α, β) satisfies certain regularity conditions, the most critical of which is that the support does not depend on the parameters, then, as n → ∞, the MLE's α̂ and β̂ are asymptotically normal and unbiased, with variance-covariance matrix I_F^{-1}.
This result remains true if the observations are subject to Type-I and Type-II censoring. Under some additional conditions on f(t; α, β), it can be extended to the case of random independent censoring, see Lawless (1982), p. 526.
The crucial point is that the above result is true asymptotically, i.e. for large samples. In reliability practice, it is very unlikely to have a large sample of observed lifetimes. Applying the maximum likelihood method to small and medium-size samples, as well as to samples obtained under arbitrary censoring, may lead to inaccuracies in the actual confidence level. The issue of applicability of asymptotic results to finite samples needs further theoretical and experimental research. We suggest using the above method with caution and treating the results as approximate.
Example 3.3 continued: large-sample confidence intervals for the parameters
The necessary calculations are presented in the Mathematica printout shown in Fig. 3.4, which is a continuation of the printout shown in Fig. 3.3.


In[6] defines the elements of the matrix I_F as the corresponding second-order partial derivatives. These derivatives are calculated by means of the operator "D[...]". Afterwards, the derivatives are numerically evaluated at the MLE's found in Out[4], see Out[6], Out[7], Out[8]. In[9] defines the observed Fisher information matrix and its inverse as "Finv=Inverse[F]", and Out[10] is I_F^{-1}. Out[11] and Out[12] are the estimates of the corresponding standard deviations. We summarize this in the following form:

\hat\lambda = 0.0137 \pm 0.0014, \quad \hat\beta = 2.29 \pm 0.55. #
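The same standard deviations can be reproduced without symbolic differentiation by approximating the second derivatives with central differences. A sketch in Python (loglik as in Example 3.3; the step sizes are ad hoc, and the MLE is written with a few more digits than quoted above):

```python
import math

def loglik(lam, beta):
    s45 = math.exp(-((45 * lam) ** beta))
    s90 = math.exp(-((90 * lam) ** beta))
    return 7 * math.log(1 - s45) + 13 * math.log(s45 - s90) + 5 * math.log(s90)

def observed_info(f, x, h):
    """2x2 negative Hessian of f at x by central differences with steps h."""
    def d2(i, j):
        xp = list(x)
        def fs(si, sj):
            xp[0], xp[1] = x[0], x[1]
            xp[i] += si * h[i]
            xp[j] += sj * h[j]   # for i == j this doubles the step, which the
            return f(*xp)        # four-point formula below handles correctly
        return (fs(1, 1) - fs(1, -1) - fs(-1, 1) + fs(-1, -1)) / (4 * h[i] * h[j])
    return [[-d2(0, 0), -d2(0, 1)], [-d2(0, 1), -d2(1, 1)]]

mle = (0.013674, 2.2928)
V = observed_info(loglik, mle, h=(1e-5, 1e-3))
det = V[0][0] * V[1][1] - V[0][1] ** 2
se_lam = math.sqrt(V[1][1] / det)   # sqrt(W11)
se_beta = math.sqrt(V[0][0] / det)  # sqrt(W22)
```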

3.4 Exercises

1. Suppose τ1 ∼ W(λ, β), with E[τ1] = 1 and Var[τ1] = 0.16. Find λ and β. Suppose that τ2 ∼ logN(µ, σ), with E[τ2] = 1 and Var[τ2] = 0.16. Find µ and σ. Compare the 0.1-quantiles of τ1 and τ2.
2. Mann and Fertig, in their paper in Technometrics (1973), give the following data on the life testing of 13 identical components (in thousands of cycles): 0.22, 0.50, 0.88, 1.00, 1.32, 1.33, 1.54, 1.76, 2.50, 3.00. The test was terminated after the tenth failure.
a. Assume that the lifetime follows the Weibull distribution. Estimate the parameters using Weibull paper. Estimate t0.1.
b. Solve the above problem for the lognormal distribution.
c. Find the MLE of the Weibull parameters λ and β.
3. 10 items were put on test at t = 0 and inspected at t = 2. Four had failed; the rest were operating. Assume that the lifetime follows W(λ = 0.4, β). Find numerically the MLE of β.
4. The lifetime τ ∼ Exp(a, λ), see (2.1.23). Suppose t1, ..., tn are the observed lifetimes. Find the MLE of a and λ.
Hint: Derive from the loglikelihood that â = min(t1, ..., tn).
5. The lifetime of a certain piece of equipment follows the c.d.f. F(t; α, β). n units started to operate at t = 0 and were inspected at t1 = T, t2 = 2T and t3 = 3T. k1 units failed in [0, T], k2 in [T, 2T], and the rest, n − k1 − k2, failed in [2T, 3T]. Write, in general form, the expression for the likelihood function used for finding the MLE of α and β.
6. Replacement of a component on a finite interval [0, T].
a. Suppose a new component started operating at t = 0. When the component fails, it is immediately replaced by a new one. The replacements take place only on the interval [0, T]. The density function of the component lifetime τ is f(t; α, β), and the c.d.f. is F(t; α, β).


Suppose the replacements took place at the instants {ti}, 0 < t1 < t2 < ... < tk < T. Use this information to derive an expression for the likelihood function.
Hint: The following lifetimes were observed: d1 = t1, d2 = t2 − t1, ..., dk = tk − tk−1, and τ > T − tk. Thus the corresponding likelihood function is

Lik = \prod_{i=1}^{k} f(d_i; \alpha, \beta) \cdot [1 - F(T - t_k; \alpha, \beta)].

b. Assume that τ ∼ F(t; β) = 1 − e^{−t^β}. The following data were obtained in the replacement process described in part a: T = 10, t1 = 2, t2 = 3.3, t3 = 5.6, t4 = 7.2, t5 = 8.9. Find the MLE of β.
7. The Hazard Plot
a. If R(t) = P(τ > t) is the survival function, then the expression Λ(t) = − log R(t) is called the cumulative hazard. Prove that if τ ∈ IFR, then the cumulative hazard is a convex function (assume that τ has a differentiable failure rate).
Hint: By (2.2.3), the cumulative hazard equals \Lambda(t) = \int_0^t h(v)\, dv. Differentiate Λ(t) twice, and use the fact that for the IFR family, h′ ≥ 0.
b. The property established in a is used for a graphical investigation of R(t). Plot − log R̂(ti) versus ti, where R̂(t) is the Kaplan-Meier estimator of the survival function. This graph is called the hazard plot. If the hazard plot reveals convexity, this might be evidence that τ ∈ IFR. Construct the hazard plot for the ABS data in Example 1.1.


Figure 3.1: The Weibull plot for Example 2.1


Figure 3.2: The normal plot for Example 2.2


Figure 3.3: Mathematica printout for solving Example 3.3


Figure 3.4: Mathematica printout for the continuation of Example 3.3

Chapter 4

Preventive Maintenance Models Based on Lifetime Distribution

When it is not necessary to change, it is necessary not to change. (Viscount Lucius Cary Falkland)
If it ain't broken, don't fix it. (Saying)

4.1 Basic Facts from Renewal Theory and Reward Processes

4.1.1 Renewal Process

When we choose a preventive maintenance scheme, we are usually interested in selecting the maintenance parameters, e.g. the maintenance period, in the "best" possible way. To do this, we need to compare the expressions for mean (expected) costs or rewards for various maintenance periods. Our first task is to learn how to write the expressions for these costs or rewards. We will need some


basics from renewal theory.
Definition 1.1: counting process
{N(t), t > 0} is said to be a counting process if N(t) represents the total number of "events" on (0, t]. #
Obviously, N(t) is integer-valued; s < t implies N(s) ≤ N(t); N(0) = 0; and N(t) − N(s) is the number of events on (s, t]. Denote by Xn the random time between the (n − 1)-st and the n-th event, see Fig. 4.1. (The zero event takes place at t = 0, and we do not count it.)
Definition 1.2: renewal process
Let {Xi} be i.i.d. nonnegative random variables, Xi ∼ F(t). Then the counting process

N(t) = \max\{ n : S_n = X_1 + X_2 + \dots + X_n \le t \}   (4.1.1)

Figure 4.1: The renewal process Simply speaking, the renewal process counts the number of Xi -intervals on (0, t]. If E[Xi ] = µ, then by the law of large numbers, Sn /n → µ as n → ∞ with probability 1. Let us establish the distribution of N (t). Denote by Fn (t) = P (Sn ≤ t). Denote F1 (t) = F (t), the c.d.f. of Xi . Theorem 1.1 P (N (t) = n) = Fn (t) − Fn+1 (t). Proof N (t) ≥ n ⇔ Sn ≤ t. Then P (N (t) = n) = P (Sn ≤ t) − P (Sn+1 ≤ t). Definition 1.3: renewal function The mean number of events m(t) on (0, t] is called the renewal function: E[N (t)] = m(t). #


Theorem 1.2

m(t) = Σ_{n=1}^∞ Fn(t).   (4.1.2)

Proof
m(t) = Σ_{n=1}^∞ n·P(N(t) = n) = Σ_{n=1}^∞ n·[Fn(t) − Fn+1(t)], which after some algebra equals F1(t) + F2(t) + ... #

Example 1.1: Poisson process
In a Poisson process, Xi ~ Exp(λ), see Chapter 2, Section 1. From the description and the properties of this process we know that the number of "calls" (events) on (0, t] has a Poisson distribution with parameter λt. Therefore, the mean number of events on (0, t] equals λt, and thus m(t) = λt. #

Theorem 1.3
As t → ∞, with probability 1,

N(t)/t → 1/µ.   (4.1.3)

Proof (Ross (1993))
Obviously,

S_{N(t)} ≤ t < S_{N(t)+1},   (4.1.4)

or

S_{N(t)}/N(t) ≤ t/N(t) < S_{N(t)+1}/N(t).   (4.1.5)

As t → ∞, so does N(t). S_{N(t)}/N(t) is an average of N(t) i.i.d. random variables X1, ..., X_{N(t)}, and by the strong law of large numbers, S_{N(t)}/N(t) → µ. Write

S_{N(t)+1}/N(t) = [S_{N(t)+1}/(N(t) + 1)]·[(N(t) + 1)/N(t)].

The first factor tends to µ and the second to 1 as t goes to infinity. Therefore, t/N(t) → µ, or N(t)/t → 1/µ. #

The quantity 1/µ is called the renewal rate.

Theorem 1.4: elementary renewal theorem
As t → ∞,

m(t)/t → 1/µ.   (4.1.6)

We omit the proof.


At first glance, this theorem seems to be a corollary of Theorem 1.3, since the convergence of N(t)/t to 1/µ must imply that E[N(t)]/t also converges to 1/µ. There are, however, some subtle details which complicate the proof; see Ross (1970), p. 40, or Ross (1993), p. 275.

Theorem 1.5
Suppose the interrenewal intervals have mean µ and variance σ². Then, as t → ∞,

m(t) − t/µ → σ²/(2µ²) − 1/2.   (4.1.7)

We omit the proof. It can be found in Gnedenko et al. (1969). The theorem states that m(t) has an asymptote, as shown in Fig. 4.2.

Figure 4.2: The behavior of the renewal function m(t)

Theorem 1.6: asymptotic distribution of N(t)
As t → ∞,

(N(t) − t/µ) / sqrt(tσ²/µ³) → N(0, 1) in distribution.   (4.1.8)

Loosely speaking, N(t) for large t is approximately normal with mean t/µ and variance tσ²/µ³. We omit the proof, which can be found in Feller (1968), Chapter 13. In applications, we often need an explicit expression for the renewal function m(t). Unfortunately, closed-form expressions for m(t) can be found only for the Gamma and normal cases.
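Theorem 1.6 can be illustrated by a small Monte Carlo experiment. The following Python sketch is not from the text; the Uniform(0, 2) interarrival law, the horizon t = 1000 and the seed are arbitrary illustrative choices.

```python
import random

def count_renewals(t, draw):
    """N(t): the number of renewal epochs S_n falling in (0, t]."""
    n, s = 0, draw()
    while s <= t:
        n += 1
        s += draw()
    return n

random.seed(1)
t = 1000.0
# Uniform(0, 2) interarrivals: mu = 1, sigma^2 = 1/3
samples = [count_renewals(t, lambda: random.uniform(0.0, 2.0))
           for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)   # mean near t/mu = 1000, variance near t*sigma^2/mu^3 = 333.3
```

The empirical distribution of `samples` is then close to the normal law with these two moments, in accordance with (4.1.8).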


Example 1.2: m(t) for the Gamma and normal cases
(i) The Gamma distribution. Let Xi ~ Gamma(k, λ). Note that Fn(t) is the n-fold convolution of this distribution. Recalling the model of the Gamma distribution, Fn ~ Gamma(nk, λ). Therefore, denoting F1(t) = Gamma(t; k, λ),

m(t) = Σ_{n=1}^∞ Gamma(t; k·n, λ).   (4.1.9)

(ii) The normal distribution. Assume that the normal distribution is used to describe a nonnegative random variable; this implies that the negative tail may be ignored. Then

m(t) = Σ_{n=1}^∞ Φ((t − nµ)/(σ·sqrt(n))).   (4.1.10)
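For integer k the Gamma (Erlang) c.d.f. has a closed form, so the series (4.1.9) can be summed directly. A minimal Python sketch (the truncation at 50 terms is an assumption that suffices for moderate t):

```python
import math

def erlang_cdf(t, n, lam):
    """C.d.f. of Gamma(n, lam) with integer shape n (Erlang distribution)."""
    if t <= 0:
        return 0.0
    s = sum((lam * t) ** j / math.factorial(j) for j in range(n))
    return 1.0 - math.exp(-lam * t) * s

def renewal_function(t, k, lam, terms=50):
    """m(t) = sum over n >= 1 of F_n(t), F_n ~ Gamma(n*k, lam), cf. (4.1.9)."""
    return sum(erlang_cdf(t, n * k, lam) for n in range(1, terms + 1))

# Sanity check: for k = 1 the process is Poisson and m(t) = lam*t (Example 1.1)
print(renewal_function(2.0, 1, 1.0))   # ≈ 2.0
```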

However, in most cases we are forced to use approximations to the renewal function. It is easy to show that the n-fold convolution always satisfies Fn(t) ≤ [F(t)]^n, where we put F1(t) = F(t). Then it follows from Theorem 1.2 that

F1(t) + ... + Fn(t) ≤ m(t) ≤ F1(t) + ... + Fn(t) + Σ_{i=n+1}^∞ [F(t)]^i.   (4.1.11)

For practical purposes, for small t values, it is enough to take n = 2 in (4.1.11). Then, assuming that F(t) < 1,

F1(t) + F2(t) < m(t) < F1(t) + F2(t) + [F(t)]³/(1 − F(t)).   (4.1.12)

This approximation is reasonably good for t < 0.5·E[Xi]. To facilitate the use of the renewal function in computations related to preventive maintenance, we present in Appendix 5 a table of the renewal function for the Weibull family. The results were obtained by using the approximation m(t) ≈ Σ_{n=1}^5 Fn(t). We used the following recursive formula:

Fn(t) = ∫_0^t f(x)·Fn−1(t − x) dx,

where F1(t) = F(t) = 1 − exp[−t^β], f(t) = dF(t)/dt. The integral has been replaced by a finite sum. The maximal error of the approximation in the table does not exceed 0.001. The following theorem, proved by D. Blackwell, describes the local behavior of the renewal function m(t) as t goes to infinity.

Theorem 1.7 (Blackwell)


Let the c.d.f. of Xi be a continuous function. Then for all a > 0,

lim_{t→∞} [m(t + a) − m(t)] = a/µ.   (4.1.13)

We omit the proof, which can be found in Ross (1970), page 41. From (4.1.13) it follows that lim_{t→∞} [m(t + a) − m(t)]/a = 1/µ, and thus

lim_{a→0} lim_{t→∞} [m(t + a) − m(t)]/a = 1/µ.   (4.1.14)

Suppose it is possible to interchange the limits. Then

lim_{t→∞} dm(t)/dt = lim_{t→∞} m′(t) = 1/µ,   (4.1.15)

i.e. the derivative of the renewal function for "large" t equals the renewal rate. m′(t) is called the renewal density or the renewal rate. Let us give a probabilistic interpretation of the renewal density. Obviously,

m′(t)·Δt ≈ m(t + Δt) − m(t) = E[# of renewals in (t, t + Δt)]
= P(exactly one renewal in (t, t + Δt)) + Σ_{j>1} j·P(j renewals in (t, t + Δt)).

It can be shown (see Gnedenko et al. (1969), Section 2.3) that the sum in the last formula is o(Δt) as Δt → 0. Thus,

m′(t)·Δt ≈ P(exactly one renewal in (t, t + Δt)).   (4.1.16)

4.1.2 Renewal Reward Process

Let us associate with each renewal period a nonnegative random variable Ri called a reward or cost. It will be assumed that {Ri} are i.i.d. random variables with mean value E[Ri] = E[R]. Let us assume that the reward Ri for the i-th renewal period is accounted at the end of this period. Then the total reward accumulated on the interval [0, t] is R(t) = Σ_{i=1}^{N(t)} Ri, since there are N(t) renewal periods on (0, t]. The following theorem is the key to computing the reward per unit time.

Theorem 1.8
(i) With probability 1, as t → ∞,

R(t)/t → E[R]/µ.   (4.1.17)

(ii) As t → ∞,

E[R(t)]/t → E[R]/µ.   (4.1.18)


Proof
(i) Write

R(t)/t = (Σ_{i=1}^{N(t)} Ri)/t = [(Σ_{i=1}^{N(t)} Ri)/N(t)]·[N(t)/t].   (4.1.19)

By the strong law of large numbers, the first fraction on the right-hand side tends to E[R] as t → ∞. By Theorem 1.3, the second ratio tends to 1/µ, which proves (i).
(ii) It might seem strange that here the proof is more difficult than for (i). We omit it and refer the reader to Ross (1993). #

Example 1.3: comparing two replacement strategies
A piece of equipment has two independent parts, a and b. Their lifetimes are r.v.'s Xa and Xb, respectively: Xa ~ F(t) and Xb ~ G(t). Let E[Xa] = µa, E[Xb] = µb. If a part fails, an emergency repair (ER) takes place: the part is immediately replaced by an equivalent new one. The cost of an ER is cER = $1,000, a "large" cost. There is also an option to combine the ER of one part with a preventive repair (PR) of the other one. PR means replacing the part, in the absence of failure, by an equivalent new one. The cost of a combined repair is ccom = cER + Δ, where Δ is relatively small. Assume ccom = $1,030. Two service strategies are suggested: replace each part at its failure only (strategy I); together with an ER of one part, carry out the PR of the other one (strategy II). Strategy II is often called opportunistic. It is decided to compare both strategies by comparing the values of the expected cost per unit time over an infinite time period.

Strategy I. Obviously, the cost per unit time ηI is

ηI = lim_{t→∞} E[the cost over [0, t]]/t = cER·(1/µa + 1/µb).   (4.1.20)

Strategy II. Denote Z = min(Xa, Xb), and let E[Z] = µ0. Then, on the average, every µ0 units of time a cost ccom is paid. Thus the cost per unit time is ηII = ccom/µ0.

Let us compare these strategies for two cases: both Xa and Xb exponential, with parameters λa and λb, respectively; and both Xa and Xb Weibull, with parameters λ = 1, β = 2 for Xa and λ = 2, β = 2 for Xb. In the first case, Z = min(Xa, Xb) ~ Exp(λa + λb), and thus ηI = cER·(λa + λb), while ηII = ccom·(λa + λb). So the conclusion is that for exponential lifetimes, strategy I is always preferable. This is in fact clear without any formal derivation, because for an exponential (nonaging) part a preventive replacement (even at a small extra cost) is just a waste of money.

For the Weibull case the situation is different. First, let us compute the means of Xa and Xb. The easiest way is to use the formula E[X] = ∫_0^∞ (1 − F(t))dt, which gives µa = 0.886 and µb = 0.443. One can use numerical integration in Mathematica. To compute E[Z], use (1.3.4) and (1.3.12):

P(Z > t) = exp[−(λa·t)^β]·exp[−(λb·t)^β] = exp[−(sqrt(5)·t)²].

Now E[Z] = ∫_0^∞ P(Z > t)dt = 0.396, so ηI = $3,390 and ηII = $2,600. Therefore, the opportunistic policy is better than the replace-only-upon-failure policy. #

4.1.3 Alternating Renewal Process. Component Stationary Importance Measure for a Renewable System

Consider a component which has alternating "up" and "down" states. Assume that the up periods are described by a sequence of i.i.d. positive random variables Yj, j = 1, 2, ..., and the down periods by another sequence of positive i.i.d. random variables Xj, j = 1, 2, .... These two sequences are assumed to be independent. It is convenient to assume that the instant t = 0 coincides with the end of an up period, so that the durations of the alternating states form the sequence X1, Y1, X2, Y2, .... Denote Zi = Xi + Yi, i = 1, 2, .... These r.v.'s describe statistically identical operation cycles. At the instants Z1, Z1 + Z2, ..., Z1 + ... + Zk, ..., the component fails. Obviously, the counting process Na(t) = max{n : Sn = Z1 + Z2 + ... + Zn ≤ t} is a renewal process. Denote E[Xj] = ν, E[Yj] = µ. Obviously, ν + µ is the mean interval between renewals in the process Na(t). Let Av(t) be the following ratio:

Av(t) = (total time up on [0, t]) / t.   (4.1.21)

Obviously, the numerator lies between Y1 + ... + Y_{Na(t)} and Y1 + ... + Y_{Na(t)+1}. We are interested in the behavior of Av(t) as t → ∞. Let us divide the numerator and the denominator by Na(t). As t → ∞, Na(t) also goes to infinity. Since (Y1 + ... + Y_{Na(t)})/Na(t) → µ and t/Na(t) → ν + µ (see Theorem 1.3), we arrive at the following result:

lim_{t→∞} Av(t) = µ/(µ + ν) = Av.   (4.1.22)

This limit is called stationary availability. This notion was already introduced in Chapter 1, Section 2. The probabilistic interpretation of Av is the probability that at some remote instant t the component is in the up state. The quantity 1/(µ + ν) is the limit of the renewal density as t → ∞. We call this limit the stationary renewal rate. Recalling the probabilistic interpretation of the renewal density (see above), ∆t/(µ + ν) is the probability that the component fails in the interval (t, t + ∆t), as t tends to infinity. Now suppose that we have a monotone system consisting of n independent renewable components. Each component has alternating up and down intervals.


To describe the renewal process for component j, we use the previously introduced notation and add the index j. So, the stationary availability of component j is denoted by Av(j) and equals µ(j)/(µ(j) + ν(j)). Suppose also that we know the system structure function φ(x1, ..., xn). Recall from Chapter 1 that E[φ(X1, ..., Xn)] = ψ(Av(1), ..., Av(n)) = Av is the system stationary availability, where P(Xj = 1) = Av(j). Av is the probability that the system is in the up state as t → ∞. From time to time the system fails. We are interested in obtaining the relative proportion of system failures coinciding with the failure of component j. Denote by Q(t) the total number of system failures on the interval (0, t] and by Qj(t) the total number of system failures caused by the failure of component j. We call the limiting value of the fraction Qj(t)/Q(t) (for t → ∞) Barlow's stationary index of importance of component j and denote it by Bj:

Bj = lim_{t→∞} Qj(t)/Q(t).   (4.1.23)

Let us consider an example which clarifies the meaning of Bj. Exercise 1 in Chapter 1 considers a series-parallel system of four components: components 2 and 3 are in parallel, both are in series with component 1, and components 1, 2, 3 are in parallel with component 4. Suppose we are interested in system failures caused by the failures of component j = 2. Its stationary renewal rate equals 1/(µ(2) + ν(2)). Not every failure of this component causes a system failure. For example, suppose component 2 fails when 1 and 3 are up and 4 is down. The system will not fail. Similarly, if 1 and 4 are down and component 2 fails, it does not cause the system to fail because the system is already in the down (failure) state. On the contrary, suppose component 2 fails when 3 and 4 are down and 1 is up. Then clearly this failure causes the system failure. More formally, we might say that the failure of component 2 coincides with the system failure if φ(x1, 12, x3, x4) − φ(x1, 02, x3, x4) = 1. Indeed, this difference equals 1 if and only if the first term is 1 and the second is 0, which means that (i) the system with component 2 in the up state is up and (ii) the system with component 2 in the down state is down. Since φ(X) is a binary variable,

P(φ(X1, ..., 1j, ..., Xn) − φ(X1, ..., 0j, ..., Xn) = 1)
= E[φ(X1, ..., 1j, ..., Xn) − φ(X1, ..., 0j, ..., Xn)]
= E[φ(X1, ..., 1j, ..., Xn)] − E[φ(X1, ..., 0j, ..., Xn)]
= ψ(Av(1), ..., 1j, ..., Av(n)) − ψ(Av(1), ..., 0j, ..., Av(n)).   (4.1.24)

The first term in the last line is the stationary availability of the system in which component j is "absolutely reliable", i.e. always up. The second term is the stationary availability of the system with component j always down. Note that the expression in (4.1.24) does not involve the reliability characteristics of component j! Note also that (4.1.24) is Birnbaum's importance measure of component j, defined as ∂ψ/∂pj (see Exercise 2 in Chapter 1). Indeed, by pivoting around component j, one obtains

ψ(Av(1), ..., Av(n)) = Av(j)·ψ(Av(1), ..., 1j, ..., Av(n)) + (1 − Av(j))·ψ(Av(1), ..., 0j, ..., Av(n)).   (4.1.25)

Now

∂ψ(·)/∂Av(j) = ψ(Av(1), ..., 1j, ..., Av(n)) − ψ(Av(1), ..., 0j, ..., Av(n)).   (4.1.26)

It is now clear that Birnbaum's importance measure of component j, applied to the component stationary probabilities pj = Av(j), is the stationary probability that the system is in a "j-critical" state, i.e. in such a state that the failure of component j coincides with the system failure. On a "large" time span [0, t], the mean duration of the time during which the system is j-critical equals t·∂ψ(Av(1), ..., Av(n))/∂Av(j). Now let us consider a small time interval (t, t + Δt) as t → ∞. The event Cj = "component j fails in this interval and the system is in a j-critical state" has the probability

P(Cj) ≈ [∂ψ(Av(1), ..., Av(n))/∂Av(j)]·Δt/(µ(j) + ν(j)).   (4.1.27)

The above reasoning is in fact a heuristic proof of the following theorem by Barlow (1998), Section 8.3.

Theorem 1.9

Bj = lim_{t→∞} Qj(t)/Σ_{i=1}^n Qi(t)
   = [(µ(j) + ν(j))^{−1}·∂ψ(Av(1), ..., Av(n))/∂Av(j)] / [Σ_{i=1}^n (µ(i) + ν(i))^{−1}·∂ψ(Av(1), ..., Av(n))/∂Av(i)].   (4.1.28)

Example 1.4
Compute Barlow's stationary importance measures Bj for the system described in Exercise 1, Chapter 1, for the following data: µ(1) = 1, ν(1) = 0.2; µ(2) = 1, ν(2) = 0.3; µ(3) = 0.8, ν(3) = 0.2; µ(4) = 1.5, ν(4) = 0.5. The stationary availabilities are: Av(1) = 0.833; Av(2) = 0.769; Av(3) = 0.8; Av(4) = 0.75.


The structure function of the system is φ(x1, ..., x4) = 1 − (1 − x1x2)(1 − x1x3)(1 − x4), which simplifies to

φ(x1, ..., x4) = x1x2 + x1x3 − x1x2x3 + x4 − x1x2x4 − x1x3x4 + x1x2x3x4.

Replace xi by Av(i) and take the partial derivatives:
∂ψ(·)/∂Av(1) = Av(2) + Av(3) − Av(2)Av(3) − Av(2)Av(4) − Av(3)Av(4) + Av(2)Av(3)Av(4) = 0.238;
∂ψ(·)/∂Av(2) = Av(1) − Av(1)Av(3) − Av(1)Av(4) + Av(1)Av(3)Av(4) = 0.042;
∂ψ(·)/∂Av(3) = Av(1) − Av(1)Av(2) − Av(1)Av(4) + Av(1)Av(2)Av(4) = 0.048;
∂ψ(·)/∂Av(4) = 1 − Av(1)Av(2) − Av(1)Av(3) + Av(1)Av(2)Av(3) = 0.205.
Substituting these values and the values of (µ(i) + ν(i))^{−1} into (4.1.28), we obtain B1 = 0.52; B2 = 0.08; B3 = 0.13; B4 = 0.27. Thus, 52% of system failures coincide with the failures of component 1. #
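Example 1.4 can be reproduced mechanically: enumerate all component state vectors, compute ψ as the expectation of φ, take the Birnbaum differences, and apply (4.1.28). A Python sketch of this computation:

```python
from itertools import product

def phi(x1, x2, x3, x4):
    """Structure function: (2 || 3) in series with 1, all in parallel with 4."""
    return 1 - (1 - x1 * x2) * (1 - x1 * x3) * (1 - x4)

def psi(p):
    """System availability E[phi] for independent components with P(up) = p[i]."""
    total = 0.0
    for x in product([0, 1], repeat=4):
        pr = 1.0
        for xi, pi in zip(x, p):
            pr *= pi if xi else 1 - pi
        total += phi(*x) * pr
    return total

mu = [1.0, 1.0, 0.8, 1.5]
nu = [0.2, 0.3, 0.2, 0.5]
av = [m / (m + n) for m, n in zip(mu, nu)]

# Birnbaum measures: psi with component j forced up minus forced down
birn = []
for j in range(4):
    up = av[:j] + [1.0] + av[j + 1:]
    dn = av[:j] + [0.0] + av[j + 1:]
    birn.append(psi(up) - psi(dn))

w = [1 / (m + n) for m, n in zip(mu, nu)]       # stationary renewal rates
tot = sum(wj * bj for wj, bj in zip(w, birn))
B = [wj * bj / tot for wj, bj in zip(w, birn)]  # Barlow indices, cf. (4.1.28)
print([round(b, 2) for b in B])   # [0.52, 0.08, 0.13, 0.27]
```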

4.2 Principal Models of Preventive Maintenance

The models we describe now are fundamental for applications of preventive maintenance theory. For each model we will describe the preventive maintenance policy and derive an expression for a cost (reward) criterion characterizing it.

4.2.1 Periodic (Block) Replacement. Cost-type Criterion

A new unit starts operating at t = 0. At each of the time instants T, 2T, 3T, ... the unit is replaced by a new one from the same population. This replacement, termed preventive maintenance (PM), costs c, c < 1. At each failure appearing between the PMs, the unit is also replaced by a new one. This replacement upon failure is called emergency repair (ER), and it costs cER = 1. All replacements last negligible time. The information available is the c.d.f. F(t) of the unit lifetime. Fig. 4.3 explains the block replacement model.

Figure 4.3: Scheme of block replacement


Here the ERs between two adjacent PMs form a renewal process on [0, T]. The mean cost over one period of length T equals E[Ri] = c + m(T)·1, where m(T) is the mean number of ERs on [0, T]. Then the mean cost per unit time is

ηA(T) = (c + m(T))/T.   (4.2.1)

The block replacement is the simplest and most widely used maintenance scheme. A formal complication with the expression for ηA(T) is that it depends on the renewal function, which is difficult to compute. Often the following upper and lower bounds are satisfactory (see (4.1.12)):

(c + F1(T) + F2(T))/T < ηA(T) < (c + F1(T) + F2(T) + [F(T)]³/(1 − F(T)))/T.   (4.2.2)
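The bounds (4.2.2) are easy to evaluate numerically. Here is a sketch for the Weibull c.d.f. F(t) = 1 − exp(−t^β) with β = 2; the grid size and the choice T = 0.5, c = 0.2 are illustrative assumptions, and the two-fold convolution F2 is computed by the trapezoidal rule.

```python
import math

beta = 2.0
F = lambda t: 1 - math.exp(-t ** beta) if t > 0 else 0.0
f = lambda t: beta * t ** (beta - 1) * math.exp(-t ** beta)

def F2(t, steps=2000):
    """Two-fold convolution F2(t) = int_0^t f(x) F(t-x) dx (trapezoidal rule)."""
    h = t / steps
    vals = [f(i * h) * F(t - i * h) for i in range(steps + 1)]
    return h * (vals[0] / 2 + sum(vals[1:-1]) + vals[-1] / 2)

def eta_A_bounds(T, c):
    """Lower/upper bounds for the block replacement cost (c + m(T))/T, cf. (4.2.2)."""
    lo = (c + F(T) + F2(T)) / T
    hi = (c + F(T) + F2(T) + F(T) ** 3 / (1 - F(T))) / T
    return lo, hi

lo, hi = eta_A_bounds(0.5, 0.2)
print(lo, hi)
```

The gap between `lo` and `hi` stays small for T below roughly half the mean lifetime, in line with the remark after (4.1.12).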

Investigation of the expression for ηA(T) will be done later. Obviously, small values of T lead to large costs for frequently made PMs. Large values of T save on PMs but lead to large costs for ERs. It is worth noting that as T → 0, ηA(T) → ∞; if T → ∞, then by Theorem 1.4 m(T) behaves as T/µ and ηA(T) → 1/µ.

Remark: the relevant time scale. It is important to determine what is meant by "time" in the context of preventive maintenance in general, and in the context of block replacement in particular. Time t may have several meanings, depending on the specific circumstances: calendar time, operation time, the number of operation cycles, etc. If the time is defined as calendar time, then the c.d.f. of system lifetime is related to calendar time and the maintenance periods are expressed in calendar units, say hours, days or months. If the relevant time is the operational ("up") time, then the lifetime distribution, preventive maintenance periods, etc. are measured in corresponding units, e.g. operation hours. In many cases, the choice of the relevant time units is obvious. For example, calendar time is a natural choice for continuously operating equipment. Very often there might be several competing "parallel" time scales, such as the calendar time and mileage scales for a car. It is not immediately clear which of these two scales is the "true" one for each particular car system. We will return to the choice of the best time scale in Chapter 6. #

4.2.2 Block Replacement: Availability Criterion

A new unit starts operating at t = 0. The time axis is calendar time, which includes operation and idle time. At each failure, an ER is carried out, which lasts time tER. After the total accumulated operational time reaches T, a PM is carried out, lasting tPM. Both ER and PM completely renew the unit.


After a PM, the process repeats itself, as seen from Fig. 4.4. Typically, tPM is much smaller than tER.

The minimal repair made at age t0 eliminates the failure but leaves the failure rate h(t0) unchanged. An important fact is the following:

Theorem 2.7
Under minimal repair, the mean number of failures on [0, T] is equal to

H(T) = ∫_0^T h(t) dt.   (4.2.6)

We omit the proof which is based on the fact that for minimal repair the failure instants follow a Poisson process with time dependent event rate λ(t) = h(t), see Appendix 1.


In particular, the probability of absence of failures on [0, T] equals e^{−H(T)}. This formula is similar to (2.1.3). If an ER costs cER = 1 and a PM costs cPM = c, then the average cost per unit time is

ηD(T) = (H(T) + c)/T.   (4.2.7)

Example 2.2
Let h(t) = λ. Then ηD(T) = λ + c/T, and obviously the best strategy is T = ∞, i.e. to avoid any PM. This is quite expected, because a constant failure rate characterizes the exponential distribution. Suppose now that h(t) = a·t. (This corresponds to the Weibull distribution with shape parameter β = 2.) It is easy to derive that ηD(T) = aT/2 + c/T. The optimal T* always exists and equals T* = sqrt(2c/a). #
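The optimum of Example 2.2 is easy to confirm numerically; a small Python sketch (the values a = 2 and c = 0.1 are illustrative, not from the text):

```python
import math

def eta_D(T, a, c):
    """Cost per unit time under minimal repair with h(t) = a*t, cf. (4.2.7):
    H(T) = a*T**2/2, so eta_D(T) = a*T/2 + c/T."""
    return a * T / 2 + c / T

a, c = 2.0, 0.1
T_star = math.sqrt(2 * c / a)                # closed-form optimum sqrt(2c/a)
grid = [0.01 * i for i in range(1, 200)]     # crude grid search as a check
T_grid = min(grid, key=lambda T: eta_D(T, a, c))
print(T_star, T_grid)   # both ≈ 0.32
```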

Minimal repair with partial renewal
Complete renewal, as considered above, demands in practical terms either a replacement of the whole component (or system) by a new one, or carrying out a totality of repair actions which would bring each part of the system to its "brand new" state. For example, all worn mechanical parts must be replaced by new ones, etc. This is not always technically feasible. Very often, the periodic repair activity improves the system but does not bring it to the brand new state (we call this "partial renewal"). Now it will be assumed that this partial repair (e.g. lubrication, replacement of badly worn parts, adjustment and tuning of others) made at the instant t = T does not bring the system failure rate h(t) to its initial level h(0). Formally, the behavior of the failure rate on the interval [T, 2T] will not be a copy of its behavior on the interval [0, T]. In the literature, several models of partial renewal have been proposed. Zhang and Jardine (1998) suggest that the partial renewal puts the system failure rate between "bad as old" and "good as new". The paper by Usher et al. (1998) considers a partial renewal which reduces the actual system age by a certain fraction. We suggest the following partial renewal model. On each interval Ik = [T·(k − 1), T·k], the system failure rate will equal the system failure rate on the previous interval Ik−1 multiplied by a "degradation" factor e^α, where α > 0 is an unknown parameter. For example, if the failure rate on I1 is h(t) and ranges between h(0) and h(T), then on the interval I2, after the partial renewal at t = T, the failure rate will be e^α·h(t), ranging between h(0)·e^α and h(T)·e^α. On the third interval I3, the failure rate will vary between e^{2α}·h(0) and e^{2α}·h(T), etc. This implies that the mean number of failures on the interval I1 is H1 = H(T), on I2 it is H2 = e^α·H(T), and so on. On Ik, the mean number of failures will equal Hk = e^{(k−1)α}·H(T).


In our calculations, only the mean numbers of failures Hk on the intervals Ik, k ≥ 1, play a role, and not the failure rates on these intervals. Therefore, we might say that we postulate the following property of the partial renewal: the mean number of failures on the interval Ik after a partial renewal carried out at the instant Tk−1 equals the mean number of failures on the interval Ik−1 multiplied by the degradation factor e^α. We assume further that at the end of the K-th interval the system undergoes a complete renewal. In the above references, this is termed an "overhaul". An overhaul brings the system failure rate to its initial level h(0), i.e. to the "brand new" state. Let us define the following costs: cmin is the cost of a minimal repair; cpr is the partial renewal cost; Cov is the overhaul cost. Simple reasoning leads to the following expression for the cost per unit time:

η(K) = [cmin·H(T)·(1 + e^α + ... + e^{α(K−1)}) + (K − 1)·cpr + Cov]/(K·T).   (4.2.8)

It is assumed that T is given. We are looking for the optimal K* which minimizes η(K). The above formula can be simplified by noting that 1 + e^α + ... + e^{α(K−1)} = (e^{αK} − 1)/(e^α − 1).

Example 2.3 Suppose that T = 1, the failure rate h(t) = 2t, the degradation factor is eα = 1.1. The costs are: cmin = 15 $, cpr = 150 $, and Cov = 1000 $. Below is the graph for η(K) derived by Mathematica. It is seen that the optimal K ∗ = 20.
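Formula (4.2.8) with the data of Example 2.3 can be tabulated directly. The sketch below (in Python rather than Mathematica) shows that the minimum is very flat for K near 20, so neighboring values of K give practically the same cost.

```python
import math

c_min, c_pr, C_ov = 15.0, 150.0, 1000.0
T, H, alpha = 1.0, 1.0, math.log(1.1)   # h(t) = 2t on [0, 1] gives H(T) = 1

def eta(K):
    """Cost per unit time (4.2.8); geometric sum (e^{aK}-1)/(e^a - 1)."""
    s = (math.exp(alpha * K) - 1) / (math.exp(alpha) - 1)
    return (c_min * H * s + (K - 1) * c_pr + C_ov) / (K * T)

K_star = min(range(1, 41), key=eta)
print(K_star, round(eta(K_star), 2))
```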

Figure 4.7: The graph of η(K)

Two quantities appear in (4.2.8): H = H(T) and α. For applications, it is vital to have an easy way of estimating these parameters from available data. Suppose we register the numbers of failures Ni, i = 1, ..., r, on r consecutive intervals I1, I2, ..., Ir, with observed values n1, n2, ..., nr. Theoretically, the mean number of failures Mi on the interval Ii equals Mi = e^{α(i−1)}·H. Then

log Mi = log H + α·(i − 1).   (4.2.9)


We suggest replacing log Mi by log ni and using the linear regression technique for finding a = log H and α. In other words, these parameters must be found by minimizing the expression

S = Σ_{i=1}^r (log ni − a − (i − 1)α)².   (4.2.10)

So far we considered periodic maintenance policies. They are convenient for implementation but the choice of the period T was made without taking into account the previous history of the maintenance, e.g. the actual age of the system. In the next subsection we consider the so-called age replacement policy, which does take into account the system age.
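The least-squares problem (4.2.10) has the standard closed-form solution of simple linear regression. A sketch, using exact (noise-free) counts to check that log H and α are recovered; the numbers H = 2, α = 0.1, r = 6 are illustrative, and real data would of course be integer counts.

```python
import math

# Hypothetical failure counts n_i = H * exp(alpha*(i-1)) with H = 2, alpha = 0.1;
# exact values are used here only to verify that the regression recovers
# the parameters.
H_true, alpha_true, r = 2.0, 0.1, 6
n = [H_true * math.exp(alpha_true * (i - 1)) for i in range(1, r + 1)]

x = [i - 1 for i in range(1, r + 1)]    # regressor in (4.2.9)
y = [math.log(v) for v in n]
xm, ym = sum(x) / r, sum(y) / r
alpha_hat = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
             / sum((xi - xm) ** 2 for xi in x))
a_hat = ym - alpha_hat * xm             # a = log H
H_hat = math.exp(a_hat)
print(alpha_hat, H_hat)   # recovers 0.1 and 2.0
```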

4.2.5 Age Replacement. Cost-type Criterion

At t = 0, a new unit starts operating. Its lifetime c.d.f. is F(t). The unit is replaced when it reaches age T, or at its failure, whichever occurs first. The ER (repair upon failure) costs cER and lasts tER; the PM (preventive replacement at age T) costs cPM and lasts tPM. Denote by f(t) the lifetime density. One renewal period will equal either X = t + tER, if the unit fails in the interval (t, t + dt), t ≤ T, which takes place with probability f(t)dt, or it


will equal X = T + tPM, if the unit reaches age T, which takes place with probability 1 − F(T). Thus, after simple algebra,

E[X] = ∫_0^T (1 − F(x))dx + tER·F(T) + tPM·(1 − F(T)).   (4.2.13)

During one renewal period the unit is operational, on the average, for the time ∫_0^T (1 − F(x))dx. Suppose that the reward equals the operational time. Then the mean reward per unit time is the system stationary availability:

ηF(T) = ∫_0^T (1 − F(x))dx / [∫_0^T (1 − F(x))dx + tER·F(T) + tPM·(1 − F(T))].

It is easy to simplify this expression:

ηF(T) = [1 + (tER·F(T) + tPM·(1 − F(T))) / ∫_0^T (1 − F(x))dx]^{−1}.   (4.2.14)

Thus we see that maximizing the availability is equivalent to minimizing the corresponding cost-type criterion (4.2.11).
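Expression (4.2.14) is easy to evaluate by simple numerical integration. A sketch for an exponential lifetime, where the integral has the closed form (1 − e^{−λT})/λ so that the numerical result can be verified; the repair times tER, tPM are illustrative assumptions.

```python
import math

lam, t_ER, t_PM = 1.0, 0.5, 0.1
F = lambda t: 1 - math.exp(-lam * t)

def avail(T, steps=4000):
    """Stationary availability eta_F(T) of age replacement, cf. (4.2.14)."""
    h = T / steps
    # trapezoidal rule for int_0^T (1 - F(x)) dx
    A = h * sum((2 - F(i * h) - F((i + 1) * h)) / 2 for i in range(steps))
    return 1 / (1 + (t_ER * F(T) + t_PM * (1 - F(T))) / A)

T = 1.0
exact_A = (1 - math.exp(-lam * T)) / lam
exact = 1 / (1 + (t_ER * F(T) + t_PM * (1 - F(T))) / exact_A)
print(avail(T), exact)   # both ≈ 0.642
```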

4.3 Qualitative Investigation of Age and Block Preventive Maintenance

The basic expressions for the average costs are

ηblock(T) = (c + m(T))/T

for the block replacement, and

ηage(T) = (F(T) + c·(1 − F(T))) / ∫_0^T (1 − F(x))dx

for the age replacement.

4.3.1 The principal behavior of ηblock(T) and ηage(T) as functions of T. The role of c and F(t)

Both expressions behave similarly when T → 0: both tend to infinity. As T → ∞, both expressions tend to 1/µ, where µ is the mean lifetime. We proved this fact earlier for block replacement. For age replacement, note that as T tends to infinity the denominator tends to µ and the numerator tends to 1. There are two types of behavior of η(·)(T), as shown in Fig. 4.9.

Figure 4.9: Type I and type II behavior

Type I means that the minimal cost is attained at T = ∞, i.e. the PM should be avoided. It is easy to check that this is the case for F(t) = 1 − e^{−λt}, obviously for both age and block maintenance. Type II behavior means that there is a finite optimal maintenance period and/or optimal age T*. If c > 1 or F(t) has a DFR, we observe type I behavior. Fig. 4.9 suggests a natural measure Q of preventive maintenance efficiency:

Q = η(·)(∞) / min_{T>0} η(·)(T) = 1/(η(·)(T*)·µ).   (4.3.1)

For type I behavior, Q = 1; for type II behavior, Q > 1. The greater Q is, the more efficient is the optimal preventive maintenance. As a general rule, Q increases as c decreases and as the coefficient of variation of the lifetime F(t) decreases. In simple words, preventive maintenance (with a proper choice of T*) is more efficient if the PM costs little in comparison with the cost of an ER, and if the density f(t) is more "peaked" around the mean value. Note that the largest possible value of Q is attained for an "ideal" distribution: F(t) = 0 for t < µ, and F(t) = 1 otherwise. For this case, the optimal T = µ − 0 ("a little less" than µ), and the maximal Q = 1/c. Table 4.1 shows the values of Q and T* for various values of c and of the coefficient of variation of F(t). These calculations were made for block replacement, for the Gamma distribution Gamma(k, λ). The picture for age replacement is similar. Recall that c.v. = 1/sqrt(k). It is seen from Table 4.1 that Q increases, for each c, as the c.v. decreases. Similarly, for each fixed c.v., Q increases as c decreases. Typically, the optimal maintenance period T* increases with the decrease of the c.v. and the increase of c. It is also typical that the optimal T* lies in the range [0.4µ, 0.65µ].

Table 4.1: Q and T* for Block Replacement

         k = 2          k = 4          k = 8          k = 16
c        T*      Q      T*      Q      T*      Q      T*      Q
0.15     0.42µ   1.11   0.4µ    1.72   0.47µ   2.50   0.56µ   3.23
0.35     −       1      0.57µ   1.01   0.58µ   1.28   0.64µ   1.56
0.50     −       1      −       1      −       1      0.68µ   1.15

Table 4.2: Dependence of Qage/Qblock on c and k

k       c = 0.2   c = 0.5
2       1.10      1
4       1.08      1.03
8       1.05      1.14

4.3.2 What is better: age or block replacement?

Suppose we apply age and block replacement to the same system, each in its optimal way. The PM in both schemes costs c; the ER costs 1, also the same for age and block replacement. Let us compare min_{T>0} ηage(T) with min_{T>0} ηblock(T). It turns out that the optimal age replacement is more profitable than the optimal block replacement. To prove this, one needs to involve the notion of optimal Markovian-type strategies; see Gertsbakh (1977), p. 99, for details. Table 4.2 presents a comparison between age and block replacement for the Gamma family and various values of c. This table shows that the age replacement is always better than the block replacement, but the ratio of the efficiencies Qage/Qblock is relatively close to 1. It should be noted that in practice, for the same ER cost, the cost of a PM made at a preplanned time, as in block replacement, will be smaller than the corresponding PM cost for age replacement.

4.3.3 Finding the optimal maintenance period (age) T*

Formally, T* must satisfy the equation

∂η(·)(T)/∂T = 0.   (4.3.2)

Simple formulas are not available for the root of this equation, and it would be a waste of time to try to find T* analytically. I advise tackling the problem numerically. It is very simple using the Mathematica software. Plot the corresponding

η(·)(T) for T < 2µ. Typically, if the optimal T exists, it lies within the interval [0, 2µ]. Locate the optimal T and use the "FindMinimum" operator.

Table 4.3: Efficiency Q as a function of ε

β = 2:
λ       ε = 0.0   ε = 0.1   ε = 0.2
2       1.38      1.28      1.22
4       1.38      1.25      1.13
8       1.38      1.22      1.13

β = 3:
λ       ε = 0.0   ε = 0.1   ε = 0.2
2       1.96      1.64      1.47
4       1.96      1.52      1.39
8       1.96      1.52      1.35
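The same recipe works outside Mathematica. A Python sketch for the age replacement cost with a Gamma(2, 1) lifetime (Erlang, µ = 2) and c = 0.15; these particular values are illustrative, and a grid scan over (0, 2µ] plays the role of "FindMinimum".

```python
import math

c, mu = 0.15, 2.0
F = lambda t: 1 - math.exp(-t) * (1 + t)    # Gamma(2, 1) c.d.f.
A = lambda t: 2 - math.exp(-t) * (t + 2)    # int_0^t (1 - F(x)) dx, closed form

def eta_age(T):
    return (F(T) + c * (1 - F(T))) / A(T)

# scan T over (0, 2*mu], the interval where the optimum typically lies
grid = [0.001 * i for i in range(1, 4001)]
T_star = min(grid, key=eta_age)
print(T_star, eta_age(T_star), 1 / mu)   # optimal cost lies below eta(inf) = 1/mu
```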

4.3.4 Contamination of F(t) by early failures

In practice, the efficiency of the optimal preventive maintenance could be considerably impaired by contaminating the population F (T ) by an exponential population with much smaller mean value. Formally, let us assume that the lifetime distribution has the c.d.f. F (t) =  · F1 (t) + (1 − ) · F0 (t),

(4.3.3)

where F0 is the principal population, with an increasing failure rate. It is contaminated by an addition of ε · 100% of "bad" items having the exponential lifetime F1(t) = 1 − e^{−λt}. Recall that for the c.d.f. F1, no optimal preventive maintenance policy exists. Let us consider a numerical example.
Example 3.1: optimal age replacement for a contaminated Weibull population.
Let us take F1(t) = 1 − e^{−λt} and F0(t) = 1 − e^{−t^β}, with β = 2 and 3. Table 4.3 presents the results of computing Q for various ε and λ values, for c = 0.2. It is seen that even a small contamination of 10% may considerably reduce the efficiency Q. #
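The effect in Example 3.1 is easy to reproduce. A minimal sketch, assuming the mixture (4.3.3) with λ = 2, β = 2 and c = 0.2:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

c, lam, beta = 0.2, 2.0, 2.0

def Q(eps):
    """Efficiency of the optimal age replacement for the contaminated c.d.f. (4.3.3)."""
    S = lambda t: eps * np.exp(-lam * t) + (1 - eps) * np.exp(-t**beta)  # 1 - F(t)
    mu = quad(S, 0, np.inf)[0]                                           # mean lifetime
    eta = lambda T: (1 - (1 - c) * S(T)) / quad(S, 0, T)[0]              # eq. (4.3.5)
    eta_opt = minimize_scalar(eta, bounds=(1e-3, 2 * mu), method="bounded").fun
    return (1 / mu) / eta_opt   # Q = eta(infinity) / eta(T*)

for eps in (0.0, 0.1, 0.2):
    print(eps, round(Q(eps), 2))
```

The computed values reproduce the λ = 2, β = 2 row of Table 4.3: the efficiency drops from about 1.38 at ε = 0 to about 1.22 at ε = 0.2.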

4.3.5 Treating uncertainty in data

When the lifetime distribution F(t) and the maintenance cost c are known exactly, finding the optimal maintenance period (or age) T* is in principle a simple task, even if there are no closed-form expressions for T*. But how should one find the best maintenance policy if there is uncertainty regarding F and c? To be more specific, let us consider an age replacement model with two possibilities for the c.d.f. F(t), e.g. F1 ∼ W(λ1, β1) and F2 ∼ W(λ2, β2). Suppose that these two distributions represent two extremes: the "best" and the "worst" case, respectively. We suggest defining these extremes according to the value of the coefficient of variation, so that F1 has the smallest c.v. and F2 has the largest c.v.


Suppose that the cost c lies between cmin and cmax. Thus we consider four data combinations: data(1) = (F1, cmin); data(2) = (F1, cmax); data(3) = (F2, cmin); data(4) = (F2, cmax). Fig. 4.10 presents the four curves ηage(T; data(i)), i = 1, 2, 3, 4. Our decision consists of choosing an appropriate age T for the PM. How should this be done under uncertainty? One possible solution is the "statistician's approach": assign a certain "probabilistic weight" to each data set and calculate the average cost η̂age(T); then choose the value of T which minimizes η̂age. (The statistician would say that he/she minimizes the average risk.) A reasonable approach is to put equal weights on all four data combinations. Thus we write the expression η̂age(T) =

[ηage(T; data(1)) + ... + ηage(T; data(4))] / 4,

(4.3.4)

and look for T** which minimizes η̂: min_{T>0} η̂age(T) = η̂age(T**). In the statistical literature, η̂age(T**) is called the Bayesian risk (see DeGroot (1970), Section 8.2), and T** is called the Bayesian decision. The Bayesian risk is shown on Fig. 4.10 as a dashed curve with white circles.
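The Bayesian averaging (4.3.4) can be sketched as follows. The two Weibull hypotheses and the cost bounds below are illustrative assumptions, not the data behind Fig. 4.10:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def eta_age(T, S, c):
    """Age-replacement cost rate (4.3.5) for survival function S and PM cost c."""
    return (1 - (1 - c) * S(T)) / quad(S, 0, T)[0]

S1 = lambda t: np.exp(-t**3)        # "best" case: Weibull beta = 3, small c.v.
S2 = lambda t: np.exp(-t**1.5)      # "worst" case: Weibull beta = 1.5, large c.v.
c_min, c_max = 0.1, 0.3
data = [(S1, c_min), (S1, c_max), (S2, c_min), (S2, c_max)]

# equal weights on the four data combinations, eq. (4.3.4)
eta_bayes = lambda T: sum(eta_age(T, S, c) for S, c in data) / 4.0
res = minimize_scalar(eta_bayes, bounds=(0.05, 3.0), method="bounded")
print(f"Bayesian decision T** = {res.x:.3f}, Bayesian risk = {res.fun:.3f}")
```

The minimizer of the averaged curve is the Bayesian decision T**; with unequal "probabilistic weights" only the `sum(...) / 4.0` line changes.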

Figure 4.10: Curve i corresponds to data(i), i = 1, 2, 3, 4


Another approach, which I would call "the pessimist's solution", is the following. Choose for each data set the best age Ti*. Assume that "nature" plays against us and, as by Murphy's Law, the real data will always correspond to the maximal cost for this age. For example, Fig. 4.10 shows that data set 4 is the worst for T2*. The maximal costs are shown by big black dots, and the minimal costs by small black dots. Now choose the data set which provides the minimal value of the maximal costs. For the curves on Fig. 4.10, this is data(4), and the corresponding "best" T is T4*.
Suppose that we have to choose between two extreme cases: the first is an IFR-type lifetime distribution with mean µ, for which the optimal maintenance age T* guarantees ηage(T*) = 0.7/µ; the second is an exponential lifetime distribution with the same mean µ, for which, as we know, the optimal age is infinity and ηage(∞) = 1/µ. The "pessimist's approach" chooses in this situation the option "never do a PM", i.e. formally an infinite replacement age. Somehow, this is consistent with the principle "If it ain't broken, don't fix it".
Another version of the pessimist's approach is the so-called minimax principle. Rappoport (1998), p. 51, says "...that it rests on the assumption that the worst that can possibly happen will happen. Application of the principle amounts to expecting the worst possible outcome of each choice and choosing the alternative of which the worst outcome is the best of all the worst outcomes associated with the alternatives". (See also DeGroot (1970), Section 8.7.) In our case, "the alternatives" are the replacement ages T. The implementation of the minimax principle in our situation means (i) calculating ηmax(T) = max{ηage(T; data(1)), ..., ηage(T; data(4))} and (ii) finding the value T0 which minimizes ηmax(T). Let us illustrate the minimax approach by Fig. 4.11.
The curve η1 corresponds to an exponential distribution with mean µ1 = 1, and the curve η2 to an IFR distribution with mean µ2 = 1/2. The dashed curve is ηmax(T). According to the minimax approach, we should choose the age T0. The pessimist would choose T2*.
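The minimax computation behind Fig. 4.11 can be sketched with two illustrative curves (an exponential lifetime with mean 1, a Weibull with mean 1/2, and PM cost c = 0.2 are assumptions chosen to mimic the figure):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

c = 0.2
S1 = lambda t: np.exp(-t)                 # exponential, mean mu1 = 1
a = 0.5 / 0.8862                          # Weibull scale giving mean mu2 = 1/2
S2 = lambda t: np.exp(-(t / a) ** 2)      # IFR distribution (Weibull, beta = 2)

def eta(T, S):
    """Age-replacement cost rate (4.3.5)."""
    return (1 - (1 - c) * S(T)) / quad(S, 0, T)[0]

eta_max = lambda T: max(eta(T, S1), eta(T, S2))
T0 = minimize_scalar(eta_max, bounds=(0.05, 3.0), method="bounded").x
T2 = minimize_scalar(lambda T: eta(T, S2), bounds=(0.05, 3.0), method="bounded").x
print(f"minimax age T0 = {T0:.3f}, pessimist's age T2* = {T2:.3f}")
```

As on Fig. 4.11, the minimax age T0 lies at the crossing of the two curves and exceeds the pessimist's choice T2*.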

4.3.6 Age replacement for the IFR family

We have already mentioned in Section 2.2 that there are quite satisfactory bounds on 1 − F(t) if F is of IFR type and its first two moments are known. Let us use these bounds for finding the best age replacement policy. Recall that the standard expression for the average cost is

ηage(T) = [F(T) + c · (1 − F(T))] / ∫₀ᵀ (1 − F(x))dx,

or

ηage(T) = [1 − (1 − c) · (1 − F(T))] / ∫₀ᵀ (1 − F(x))dx.    (4.3.5)


Figure 4.11: The dashed curve is max(η1 (T ), η2 (T ))

Let [1 − F(t)]^* and [1 − F(t)]_* be the upper and lower bounds for 1 − F(t), respectively. Substituting the lower bound into (4.3.5), we obtain an upper bound on ηage(T), which we denote φmax(T). Similarly, substituting the upper bound gives the lower bound, denoted φmin(T). Therefore,

φmin(T) ≤ ηage(T) ≤ φmax(T).

For any T > 0, lim_{α→0} α · Cα(T) = C0(T).
Proof
Express Cα(T) from (4.3.7) and multiply it by α:

α · Cα(T) = α · ( ∫₀ᵀ f(x)e^{−αx} dx + c(1 − F(T))e^{−αT} ) / ( 1 − ∫₀ᵀ f(x)e^{−αx} dx − (1 − F(T))e^{−αT} ).

In this fraction, the numerator and the denominator tend to zero as α tends to zero. Use L'Hospital's rule: differentiate the numerator and the denominator with respect to α and set α = 0. The result follows.
The practical conclusion from this theorem is that for small values of the discount factor (which is practically always the case), the cost per unit time C0(T) is approximately proportional to the discounted cost Cα(T). In practical terms this means that it does not matter which of these cost criteria is used for finding the best replacement age T*.
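The limit can be checked numerically. The sketch below evaluates α·Cα(T) from the expression in the proof for a small discount factor and compares it with the undiscounted cost rate C0(T) = ηage(T); the Weibull lifetime and the costs are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import quad

c, T, alpha = 0.2, 0.5, 1e-3
f = lambda x: 2 * x * np.exp(-x**2)        # Weibull density, beta = 2
F = lambda x: 1 - np.exp(-x**2)

# alpha * C_alpha(T), as in the proof above
disc = quad(lambda x: f(x) * np.exp(-alpha * x), 0, T)[0]
num = disc + c * (1 - F(T)) * np.exp(-alpha * T)
den = 1 - disc - (1 - F(T)) * np.exp(-alpha * T)
alpha_C = alpha * num / den

# undiscounted cost per unit time C0(T) = eta_age(T), eq. (4.3.5)
C0 = (F(T) + c * (1 - F(T))) / quad(lambda x: 1 - F(x), 0, T)[0]
print(round(alpha_C, 4), round(C0, 4))   # nearly equal for small alpha
```

Shrinking `alpha` further drives the two printed numbers together, in line with the theorem.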

4.3.8 Random choice of preventive maintenance periods

It was assumed in all the principal maintenance models considered in Section 2 that the maintenance period T (or the maintenance age T) is a nonrandom quantity which we choose in some "optimal" way to optimize the costs or rewards. In practice, however, the maintenance period is always subject to certain random changes. For example, suppose it has been decided to inspect the mechanical equipment of an elevator every half year. The technicians who carry out the maintenance may start the inspection a week later (or a week earlier), depending on their work load in taking care of other elevators. Moreover, the "right" way to specify the inspection time for an elevator is in terms of operation hours. For example, assume that the elevator works, on average, 8 hours daily, and suppose that the best inspection period is 1,500 operation hours, which on average is close to a half-year inspection period. Even if the elevator is inspected exactly every 182 days, the total amount of operation hours between inspections will show considerable variation.


We have two tasks. First, we have to derive an expression for the costs under a random maintenance period. Second, we have to investigate how the presence of randomness influences the expected costs. Suppose that we are able to control the randomness of the maintenance period, i.e. to choose the c.d.f. GT(t) of the maintenance period T. Is it possible to reduce the minimal expected costs by a proper choice of the c.d.f. GT(t)?
It is easy to derive an expression for the costs under a random maintenance period. Assume that the random maintenance period T is independent of the system lifetime and has the density function gT(t). A typical expression for the average cost per unit time for a fixed maintenance period T has the form

η(T) = E[R(T)] / E[L(T)],    (4.3.10)

where the denominator is the expected duration of the renewal period (which of course depends on T), and the numerator is the expected cost per one such renewal period. For example, in the case of age replacement, E[L(T)] = ∫₀ᵀ (1 − F(t))dt and E[R(T)] = cf · F(T) + cpm · (1 − F(T)), where F(·) is the c.d.f. of the system lifetime, and cf, cpm are the costs associated with the ER and the PM, respectively. Now, if T is a random variable itself, the expected duration of one renewal period becomes

L0 = ∫₀^∞ E[L(x)] gT(x) dx.    (4.3.11)

Similarly, the expected cost per one renewal period is now

R0 = ∫₀^∞ E[R(x)] gT(x) dx.    (4.3.12)

Then the expected cost per unit time becomes

η0 = R0 / L0.    (4.3.13)

For example, consider age replacement with T itself random, say uniformly distributed on the interval [tmin, tmax]. Then (the uniform density cancels in the ratio)

η0 = ∫_{tmin}^{tmax} ( cf · F(x) + cpm · (1 − F(x)) ) dx / ∫_{tmin}^{tmax} ( ∫₀ˣ (1 − F(t))dt ) dx.    (4.3.14)

The following theorem demonstrates that no reduction in the expected costs can be obtained by introducing randomness into the maintenance parameter T.
Theorem 3.2
Let min_{T>0} η(T) = η(T*) = E[R(T*)] / E[L(T*)].


Then η0 ≥ η(T*).
Proof
Since

η(x) = E[R(x)] / E[L(x)] ≥ E[R(T*)] / E[L(T*)],

we have

E[R(x)] · E[L(T*)] ≥ E[R(T*)] · E[L(x)].    (4.3.15)

Multiply both sides of (4.3.15) by gT(x) and integrate with respect to x from 0 to ∞. Then

∫₀^∞ E[R(x)] · E[L(T*)] gT(x) dx ≥ ∫₀^∞ E[R(T*)] · E[L(x)] gT(x) dx.    (4.3.16)

But (4.3.16) is equivalent to

E[R(T*)] / E[L(T*)] ≤ ∫₀^∞ E[R(x)] gT(x) dx / ∫₀^∞ E[L(x)] gT(x) dx,    (4.3.17)

which proves the theorem.
Example 2.4 continued
Suppose that the system is maintained according to the minimal repair scheme, see subsection 4.2.4. The failure rate is h(t) = 0.5t, the PM cost is c = 0.2, the failure cost is 1. Find the optimal period of minimal repair. Consider a random choice of the minimal repair period T, T ∼ Exp(1), and compare the costs.
In our case the cost criterion is ηD(T) = (∫₀ᵀ 0.5t dt + 0.2)/T = (0.25T² + 0.2)/T. The optimal period is T* = √(2c/a) = 0.89, where a = 0.5; it corresponds to ηD(T*) ≈ 0.45. The cost criterion for random T equals, according to (4.3.13),

η0 = ( ∫₀^∞ 0.25x² e^{−x} dx + 0.2 ) / ∫₀^∞ x e^{−x} dx.    (4.3.18)

A little algebra gives η0 = 0.7. This is worse than the optimal result 0.45, but not dramatically so. #
Remark
The expression (4.3.10) has the general form η = ∫ f(x)dF(x) / ∫ g(x)dF(x), where F(x) is the c.d.f. of some random variable. Very often this c.d.f. is not known, and only partial information about it is available. For example, we might know only the first two moments (i.e. the mean and the standard deviation) of the r.v. X ∼ F(x). An interesting mathematical theory developed by Stoikova and Kovalenko (1986) (see also Kovalenko et al (1997)) allows obtaining bounds on η in the class of all c.d.f.'s having the given values of the moments. #
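The numbers in Example 2.4 can be checked directly from (4.3.13); a sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# minimal repair: H(T) = 0.25*T**2 expected failures on (0, T], PM cost 0.2
eta_D = lambda T: (0.25 * T**2 + 0.2) / T
T_star = minimize_scalar(eta_D, bounds=(0.01, 5), method="bounded").x

# random period T ~ Exp(1): eta0 = E[0.25*T^2 + 0.2] / E[T], eq. (4.3.13)
g = lambda x: np.exp(-x)                       # Exp(1) density
R0 = quad(lambda x: (0.25 * x**2 + 0.2) * g(x), 0, np.inf)[0]
L0 = quad(lambda x: x * g(x), 0, np.inf)[0]
print(round(T_star, 2), round(eta_D(T_star), 2), round(R0 / L0, 2))
```

Running this gives T* ≈ 0.89 with ηD(T*) ≈ 0.45, and η0 = 0.7 for the randomized period, in agreement with Theorem 3.2: randomizing T cannot reduce the cost rate.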

4.4 Optimal Opportunistic Preventive Maintenance of a Multielement System

4.4.1 Introduction

Preventive maintenance of real-life systems is characterized by three features: the maintenance policy involves many components (elements); most preventive maintenance actions involve several components simultaneously; and the cost of a "joint" preventive action is considerably smaller than the total cost of similar maintenance actions carried out separately. When several components are served simultaneously, the instant chosen for maintenance is hardly "optimal" for every component involved. Nevertheless, there is a gain in cost and in convenience when a group of components simultaneously undergoes checkup, inspection and maintenance. This type of servicing is termed opportunistic. Most complex real-life systems, for example cars and passenger aircraft, in fact have an "opportunistic" maintenance schedule. So far we have considered only one example of opportunistic maintenance, see Section 4.1, Example 1.3. Now we treat it in detail.

4.4.2 System Description. Coordination of Maintenance Actions. Cost Structure

We assume that the whole assembly under service has three levels: the element level, the subsystem level and the system level. It is very helpful to follow the description by checking it against the example on Fig. 4.13. The subsystems are denoted by indices i = 1, 2. The elements of subsystem i are denoted (i, j), j = 1, 2, .... Fig. 4.13 shows that subsystem 1 has elements (1,1), (1,2), and the second subsystem has elements (2,1), (2,2) and (2,3).
To plan and coordinate the maintenance actions, a common time axis t is introduced. It is convenient to think of t not as calendar time but rather as an operational time scale, for example a mileage scale. On this scale, the maintenance actions are carried out simultaneously for groups of elements. To simplify the maintenance scheduling and to allow joint maintenance actions, a common time grid with step ∆ is introduced; one may assume that ∆ = 1. Any preventive maintenance action can be carried out only at a time t* which is a multiple of ∆: t* = k · ∆, where k is an integer.
Assume that the cost of a failure of element (i, j) is aij. The cost of a preventive (i.e. scheduled) maintenance (PM) of element (i, j) consists of the direct cost cij plus set-up costs. The structure of these set-up costs is crucial in our problem. If an element (i, j) undergoes a PM, the cost paid in addition to cij is f0 for involving the whole system, plus fi for involving the subsystem i to which this element belongs. The "opportunistic" nature of the costs is reflected in the following. Suppose


Figure 4.13: Scheme of the opportunistic maintenance

that at a certain instant ∆ · k = t, a maintenance is planned for the elements constituting a group G. Suppose that G involves a set S of subsystems. Then the total cost of servicing the whole group G will be

C(G) = Σ_{(i,j)∈G} cij + Σ_{i∈S} fi + f0.    (4.4.1)

Don't be frightened by this expression. On Fig. 4.13, at time t = 6, a PM is planned for elements (1,1), (1,2), (2,1) and (2,3). This is the group G. The subsystems involved are 1 and 2, so S = {1, 2}. The total PM cost will be c11 + c12 + c21 + c23 + f1 + f2 + f0.
Let us consider an example.
Example 4.6
Write an expression for the costs per unit time for the system shown on Fig. 4.13. Assume that element (i, j) is serviced with period tij, and denote t = (t11, t12, t21, t22, t23). Let t11 = 2, t12 = 3; t21 = 2, t22 = 4, t23 = 6. Each element is repaired according to the minimal repair scheme (see subsection 4.2.4), and the average number of failures on [0, tij] for element (i, j) is Hij(tij). The least common multiple (l.c.m.) of all tij values is 12 (set ∆ = 1). The whole preventive maintenance scheme is shown on Fig. 4.13. The mean costs per


unit time are:

η(t) = (1/12) · [6(H11(2)a11 + c11) + 4(H12(3)a12 + c12)]
     + (1/12) · [6(H21(2)a21 + c21) + 3(H22(4)a22 + c22) + 2(H23(6)a23 + c23)]
     + (8f0 + 8f1 + 6f2)/12.

Let us explain this expression. The component (1,1) is repaired 6 times on the period of 12, the component (1,2) four times, etc. This explains the factors at the Hij(·) terms. Besides, the subsystems 1 and 2 are involved in the preventive maintenance 8 and 6 times, respectively. For example, subsystem 2 is "opened" at t = 2, 4, 6, 8, 10, 12. The total number of maintenance actions on the period (0, 12] is eight. This explains the last term. #
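The counting behind these factors can be checked mechanically; a small sketch for the schedule t = (2, 3, 2, 4, 6):

```python
from math import lcm

periods = {('1','1'): 2, ('1','2'): 3, ('2','1'): 2, ('2','2'): 4, ('2','3'): 6}
d0 = lcm(*periods.values())   # common planning horizon, here 12

# instants at which each element is serviced on (0, d0]
times = {e: set(range(p, d0 + 1, p)) for e, p in periods.items()}
n_repairs = {e: len(ts) for e, ts in times.items()}

# a subsystem is "opened" whenever any of its elements is serviced
sub1 = set().union(*(times[e] for e in times if e[0] == '1'))
sub2 = set().union(*(times[e] for e in times if e[0] == '2'))
system = sub1 | sub2
print(n_repairs, len(sub1), len(sub2), len(system))
```

The output reproduces the factors in the expression above: 6, 4, 6, 3 and 2 repairs for the five elements, and 8, 6 and 8 openings of subsystem 1, subsystem 2 and the whole system, respectively.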

4.4.3 Optimization Algorithm. Basic Sequence

We want to find an optimal vector t which minimizes η(t). A formal description of the optimization algorithm requires the notion of a basic sequence. Suppose that for subsystem i we have a set of preventive maintenance (PM) periods t0 = (ti1, ..., tim). Without loss of generality we may assume that this sequence is ordered, ti1 ≤ ... ≤ tim. By definition, the basic sequence consists of vectors t0, t1, t2, ..., where each next vector is obtained by increasing the minimal component(s) of the previous one by ∆. For example, if t0 = (2, 4, 5), then t1 = (3, 4, 5), t2 = (4, 4, 5), t3 = (5, 5, 5), etc.
The optimization algorithm consists of the following steps.
Step 1: Individual (element) optimization. In this step, an optimal PM period is found for each element separately, without taking into account the subsystem and system set-up costs.
Step 2: Optimization on the subsystem level. Take the optimal PM periods found for the elements of subsystem i in Step 1 and organize them into a vector ti. This will be the initial vector of the basic sequence for this subsystem. The basic sequence for subsystem i is obtained according to the above description. For each vector t0i, t1i, t2i, ... of this basic sequence, calculate the costs per unit time for the i-th subsystem; these costs include the set-up costs fi only. Denote by ti* the best vector for subsystem i.
Step 3: Optimization on the system level. Concatenate all subsystem optimal vectors {ti*} obtained in Step 2 into one vector for the whole system; denote it t00. Construct the corresponding basic sequence. Optimize the total costs


per unit time on the vectors of this sequence. Take into account all set-up costs, including f0. The vector t0* from the "all-system" basic sequence which minimizes the costs is the optimal PM vector.
Example 4.6 continued. Let us implement the above algorithm for the system of Fig. 4.13, for the following data: all aij = 1; c11 = 0.1, c12 = 0.4; c21 = c22 = c23 = 0.2; H11(t) = t², H12(t) = 0.25t², H21(t) = t + t², H22(t) = 0.1t², H23(t) = 0.05t³. The set-up costs are f1 = f2 = 5, f0 = 8.
Step 1: optimization on the element level. For element (i, j), the expression to minimize is

Aij(t) = ( Hij(t) · aij + cij ) / t.    (4.4.2)

Simple calculations show that the optimal periods for the elements are: t11* = 1, t12* = 1; t21* = 1, t22* = 2, t23* = 2.

Step 2: minimization on the subsystem level.
Subsystem 1. The basic sequence is (1, 1), (2, 2), (3, 3), ... We have to minimize the sum of the individual costs plus the contribution of the subsystem 1 set-up cost. Let d1 be the l.c.m. of t11, t12, and let M1(t11, t12) be the number of PM's on the interval (0, d1]. Then add the subsystem 1 set-up cost M1·f1/d1. For example, if t11 = 2, t12 = 3, then d1 = 6 and M1 = 4, since the PM's are made at the instants 2, 3, 4, 6; one should then add 4f1/6. For subsystem 1 we have to minimize the following expression:

B(x, y) = (x² + 0.1)/x + (0.25y² + 0.4)/y + 5/x.

(In our example x = y; subsystem 1 is "opened" every x units of time.) For (1, 1) we have B(1,1) = 6.75; for (2, 2), B(2,2) = 5.25; for (3, 3) we obtain B(3,3) = 5.58, etc. The optimal vector is (2, 2).
Subsystem 2. For subsystem 2, the basic sequence is (1, 2, 2), (2, 2, 2), (3, 3, 3), ... Similarly to subsystem 1, we obtain that the best vector is (2, 2, 2).
Step 3: the system level. The basic sequence for the whole system is (2, 2, 2, 2, 2), (3, 3, 3, 3, 3), (4, 4, 4, 4, 4), ... Use for the minimization the following principal expression:

η(x11, ..., x23) = Σ_{i,j} ( Hij(xij)·aij + cij )/xij + M1·f1/d1 + M2·f2/d2 + M0·f0/d0.    (4.4.3)


An explanation is needed for Mi and di. di is the least common multiple of (xi1, xi2, ...); for example, if xi1 = 2, xi2 = 3, xi3 = 4, then di = 12. Mi is the total number of PM's made to subsystem i on (0, di]; for the above numbers, Mi = 8. d0 is the l.c.m. of (x11, ..., x23), and M0 is the total number of PM's on (0, d0]. In our example the calculations are very easy because all xij are the same within each vector of the basic sequence. The minimal costs are obtained for (3, 3, 3, 3, 3) and are equal to 14.6. #
The main advantage of the above algorithm is that one must investigate only the vectors of the basic sequence. The algorithm provides the exact minimum if (i) the time intervals allowed for PM form a sequence of the type ∆, ∆·k1, ∆·k1·k2, ..., where k1, k2, ... are integers (for example, the sequence 1, 2, 4, 12, 24, ... satisfies this property); and (ii) the Hij(x) are convex functions of x. This requirement is satisfied if the component lifetimes are of IFR type. We believe that in the general case our algorithm provides a solution close to optimal. More details about this algorithm can be found in Gertsbakh (1977), pp. 216-227.
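The Step 3 computation for Example 4.6 can be scripted; the sketch below evaluates (4.4.3) along the system-level basic sequence (small numerical discrepancies with the rounded value quoted in the text may occur):

```python
from math import lcm

H = {('1','1'): lambda t: t**2,      ('1','2'): lambda t: 0.25 * t**2,
     ('2','1'): lambda t: t + t**2,  ('2','2'): lambda t: 0.1 * t**2,
     ('2','3'): lambda t: 0.05 * t**3}
c = {('1','1'): 0.1, ('1','2'): 0.4, ('2','1'): 0.2, ('2','2'): 0.2, ('2','3'): 0.2}
f = {'1': 5, '2': 5}
f0 = 8

def n_openings(periods):
    """Number of distinct service instants on (0, lcm], and that lcm."""
    d = lcm(*periods)
    return len(set().union(*(range(p, d + 1, p) for p in periods))), d

def eta(x):
    """Total cost rate (4.4.3) for the PM vector x = (x11, x12, x21, x22, x23)."""
    keys = [('1','1'), ('1','2'), ('2','1'), ('2','2'), ('2','3')]
    periods = dict(zip(keys, x))
    cost = sum((H[k](periods[k]) + c[k]) / periods[k] for k in keys)   # a_ij = 1
    for i in ('1', '2'):
        M, d = n_openings([periods[k] for k in keys if k[0] == i])
        cost += M * f[i] / d
    M0, d0 = n_openings(list(periods.values()))
    return cost + M0 * f0 / d0

seq = [(k,) * 5 for k in range(2, 7)]          # (2,...,2), (3,...,3), ...
best = min(seq, key=eta)
print(best, round(eta(best), 2))
```

The minimum over the basic sequence is attained at (3, 3, 3, 3, 3), as stated in the example.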

4.4.4 Discussion. Possible Generalizations

1. The additive structure of the PM costs described above might be adequate for servicing a truck: the cost f0 reflects the cost of removing the truck from service and taking it to the service station; f1 is the cost paid every time the engine is inspected, diagnosed and repaired; f2 is the payment for inspecting and testing the braking system, etc. It is reasonable to assume that all these costs add together. A quite different situation is servicing a military aircraft, where the main purpose is to achieve maximal readiness (availability). Instead of costs, it would be wise to look at the durations of servicing the aircraft systems: if the fuel system and the engines are serviced simultaneously, then the true "cost" is the service duration, determined by the longest service time.
2. Optimal planning of maintenance activities has a lot in common with scheduling problems. The following example, suggested by George et al (1979), is a good illustration of how replacement and "classical" scheduling are related to each other. Suppose we have to maintain in proper condition the engines of a two-engine aircraft. Each engine has a limited resource in hours of continuous operation. The following stock of engines is available: 1 engine with a 100-hr resource; 4 engines with a 300-hr resource; 1 engine with a 200-hr resource; 1 engine with a 500-hr resource. The total resource of all engines is 2,000 hrs. A schedule which allows operation of the aircraft during 1,000 flight hours is shown on Fig. 4.14, a. The


Figure 4.14: Two schedules of engine replacement

replacement of one engine or of two engines simultaneously costs c = 1. Thus the cost of the schedule on Fig. 4.14, a is 6. Clearly, this is not the most economic solution: a relatively small enumeration reveals a cheaper schedule, shown on Fig. 4.14, b, whose total cost of engine replacements is 5. Finding an optimal replacement schedule for a fleet of several aircraft may be a quite challenging combinatorial problem. The crucial feature of the engine replacement schedule is that the simultaneous replacement of two engines costs half as much as the two individual replacements together. Cost reduction due to simultaneous performance of several maintenance actions is typical of any opportunistic replacement scheme.

4.5 Exercises

1. You have at your disposal the following data on failures of a compressor unit of a new type of industrial refrigerator: 127 units started operating on January 1, 1990; 8 of them failed during the first two years of operation; a total of 58 failures was registered during the period [1990, 1995].
a. Assume that the compressor's lifetime has a lognormal distribution. Equate t1 = 2 years to the 8/127-quantile, and t2 = 5 years to the 58/127-quantile. Estimate the parameters.
b. Assume that each failure of the compressor costs 3,500 $, and each preventive replacement costs 1,000 $. Would you advise carrying out an age replacement at age T = 3 years?
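For part (a), the quantile matching reduces to two linear equations in µ and σ on the log scale, since ln t_q = µ + σ·z_q for the standard normal quantile z_q. A sketch (the method is from the exercise; the code is illustrative):

```python
import numpy as np
from scipy.stats import norm

p1, p2 = 8 / 127, 58 / 127
t1, t2 = 2.0, 5.0
z1, z2 = norm.ppf(p1), norm.ppf(p2)   # standard normal quantiles

# solve ln t = mu + sigma * z at both quantiles
sigma = (np.log(t2) - np.log(t1)) / (z2 - z1)
mu = np.log(t1) - sigma * z1
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, "
      f"mean = {np.exp(mu + sigma**2 / 2):.2f} years")
```

With these parameter estimates in hand, part (b) reduces to evaluating the age-replacement cost rate (4.3.5) at T = 3 and comparing it with the no-PM cost 3,500/mean.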

2. An equipment for cotton production has two independently failing identical mechanical parts. Two service strategies are suggested: (i) replace each part upon its failure only; (ii) at any failure, replace also the nonfailed part. Each replacement completely renews the corresponding part. An individual replacement costs 1,500 $; replacing the nonfailed part together with the failed part costs 2,000 $. There is uncertainty regarding the lifetime distribution of the parts: the first hypothesis is that τ ∼ Gamma(k = 7, λ = 1), the second that τ ∼ Gamma(k = 4, λ = 0.5). Analyze replacement strategies (i) and (ii) under both hypotheses and make your suggestions regarding the "best" maintenance policy.

3. An age replacement is considered for a bearing unit of a paper production line. The lifetime (in years) is τ ∼ W(λ = 1, β = 4). The cost of an emergency repair (upon failure) lies within the limits ce ∈ [5, 20]; the cost of the PM is cp = 1. Make your suggestion regarding the optimal age of the preventive maintenance.

4. A street lamp is replaced periodically, with the replacement period T being a random variable T ∼ N (µ = 1, σ 2 = 0.1). The lifetime of the lamp is τ ∼ Gamma(k = 4, λ = 1). Each failure costs 100 $, each preventive maintenance costs 5 $. Derive an expression for costs per unit time and evaluate it numerically. Hint: Consider first a fixed period T . The costs per unit time have the form η(T ) = A(T )/T . Argue that for random T , one should take η = E[A(T )]/E[T ],


where the mean is taken with respect to the distribution of T .

5. The lifetime of a mechanical part has a Weibull distribution with mean 2,000 hrs and c.v. = 0.3. It has been decided to carry out an age replacement of the part. The ER takes 50 hrs and completely renews the part; the PM (which also renews the part) lasts 10 hrs. Find the optimal age T* for the age replacement which maximizes the stationary availability.
Hint: Derive an expression for the stationary availability and reduce the problem to finding an optimal age replacement.

6. A system consists of two parts denoted 1 and 2. Each part undergoes a periodic repair. The repair period can be T = ∆, 2∆, 4∆, 8∆, 16∆, (set ∆ = 1). The failures between scheduled repairs are eliminated according to the minimal repair scheme. The failure rates of the parts are : λ1 (t) = 0.1 · t; λ2 (t) = 0.06 · t2 . Each failure costs 10 $, each scheduled repair costs 2 $. The system set-up cost paid at each scheduled repair is f0 = 8. Find an optimal opportunistic replacement schedule for the system.

7. Two heavy trucks of a new type were monitored during 50,000 and 70,000 miles, respectively. Engine failures were observed at 26,000, 32,500 and 43,300 miles in the first truck, and at 19,500, 35,300, 50,200 and 68,700 miles in the second truck.
a. Assume that engine failures occur according to a nonhomogeneous Poisson process (NHPP) with intensity function λ(t) = exp[α + βt]. Estimate α and β using the maximum likelihood method described in Appendix 1.
b. Each failure costs, on average, 1,000 $, and each preventive repair costs between 100 $ and 300 $. Assume that the engines are serviced according to the minimal repair scheme, see subsection 4.2.4, and that the engine failures appear according to an NHPP with intensity function λ(t) = exp[α̂ + β̂t], where α̂ and β̂ are the MLE estimates found in part a. Make your suggestion regarding the optimal minimal repair period.

8. Optimal periodic inspection of a chemical defense equipment. Chemical defense equipment (CDE) is periodically inspected during its storage in order to maintain its readiness. During the inspection, the CDE is checked whether it is capable of neutralizing active ingredients of the chemical weapon in case of a chemical attack. The interval between inspections is T , and the inspection lasts time t0 . The storage-inspection scheme is shown on Fig. 4.15. At the instant t = 0, the CDE


Figure 4.15: Operation - inspection scheme of the CDE

is put into storage. During the interval [T, T + t0], the CDE is inspected. If the CDE failed in the interval [0, T], the inspection reveals the failure, and the CDE is completely renewed toward the end of the inspection period. If there is no failure, the next inspection starts at the instant 2T + t0 and again lasts t0; it will reveal the failure if it appeared after the last inspection, and so on. Assume that the lifetime of the CDE has a known density function f(t). The CDE is available for use ("ready") if it has not failed, and only between inspections (it is not available during the inspections). It is assumed that the "lifetime failure clock" runs in the same way during the inspections and between them, and that if the CDE fails during an inspection, this fact is immediately revealed and the CDE is completely renewed toward the end of that inspection. Our task is to find the optimal value T* of the inspection period T which maximizes the stationary availability of the CDE. Denote by η(T) the stationary availability of the CDE. Investigate η(T) numerically for f(t) = e^{−t} and t0 in the range 0.05 - 0.1.

Hint: η(T) = A(T)/B(T), where A(T) is the mean "ready" time of the CDE during one storage-inspection-renewal cycle, and B(T) is the mean duration of this cycle.

The expression for A(T). Suppose that the CDE fails at the instant x, which lies inside the ”ready” period: x ∈ [k(T +t0 ), k(T +t0 )+T ], where k = 0, 1, 2, .... Then with probability f (x)dx, the ”ready” time equals x − kt0 . If the CDE fails during the inspection, i.e. x ∈ [k(T +t0 )+T, (k +1)(T +t0 )],


then the "ready" time equals (k + 1)T. This leads to the following expression:

A(T) = Σ_{k=0}^∞ ∫_{a_k}^{a_k+T} (x − k·t0) f(x) dx + Σ_{k=0}^∞ [F(a_{k+1}) − F(a_{k+1} − t0)] · (k + 1)T,

where a_k = k(T + t0) and F(t) = ∫₀ᵗ f(x) dx.

The expression for B(T). If the failure appears in the interval [a_k, a_{k+1}], the cycle length equals (k + 1)(T + t0). This leads to the expression

B(T) = Σ_{k=0}^∞ (k + 1)(T + t0) [F(a_{k+1}) − F(a_k)].
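The two series above can be summed directly for the numerical investigation. A sketch for f(t) = e^{−t} and t0 = 0.05 (truncating the sums when the terms become negligible):

```python
import numpy as np
from scipy.integrate import quad

def availability(T, t0=0.05, k_max=200):
    """Stationary availability eta(T) = A(T)/B(T) for an Exp(1) lifetime."""
    f = lambda x: np.exp(-x)
    F = lambda x: 1.0 - np.exp(-x)
    A = B = 0.0
    for k in range(k_max):
        ak = k * (T + t0)                         # a_k = k(T + t0)
        # failure during the "ready" period [a_k, a_k + T]
        A += quad(lambda x: (x - k * t0) * f(x), ak, ak + T)[0]
        # failure during the inspection [a_k + T, a_{k+1}]
        A += (F(ak + T + t0) - F(ak + T)) * (k + 1) * T
        B += (k + 1) * (T + t0) * (F(ak + T + t0) - F(ak))
    return A / B

for T in (0.5, 1.0, 2.0):
    print(T, round(availability(T), 3))
```

Scanning T on a finer grid then locates the maximizing inspection period T*; for this lifetime the availability deteriorates as T grows past the optimum, since a failed CDE sits undetected until the next inspection.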

9. The following problem is discussed by Elsayed (1996), p. 530. A critical component of a complex system fails when its failure mechanism enters one of two stages. Stage one is entered with probability α, and stage two with probability 1 − α. The lifetime of the component in stage i is exponentially distributed with parameter λi, i = 1, 2. Determine the optimal preventive maintenance interval for different values of λ1 and λ2, if the failure cost is cf = 500 $ and the preventive maintenance cost is cp = 100 $. (The author meant a block replacement.) What is, in your opinion, the optimal period of the preventive maintenance?

10. Age replacement with a minimum-type exponential contamination. Consider an age replacement applied to a series system consisting of two independent components. The lifetime of the first component is τ1 ∼ Exp(λ1), and the lifetime of the second component is τ2 ∼ W(λ2 = 1, β = 2). The failure cost is cf = 1, the preventive maintenance cost is cp = 0.2. Find the value of λ1 for which E[τ1] = E[τ2]. Write the expression for the cost functional η(T), where T is the replacement age. Investigate η(T) numerically and graphically, find the optimal T value and the corresponding minimal cost, and compute the efficiency Q of the optimal age replacement. Repeat the calculations for the system which has only one (the second) component. You will obtain a much higher efficiency. Explain this fact.
11. Comparison of optimal age replacement policies. Compare ηage(T) for two distributions, logN(µ, σ) and W(λ, β), which have the same mean, equal to 1, and


the same coefficient of variation c.v. = 0.25. Assume that the failure costs cf = 1, and that the PM costs c = 0.2.

12. Minimal repair with partial renewal. A mechanical equipment is repaired according to the minimal repair scheme with partial renewal, see Section 4.2.4. The equipment is repaired at the instants T0 · k, k = 1, 2, 3, .... The degradation factor equals e^α. The equipment was observed on the first five consecutive intervals and the following numbers of failures were recorded: n1 = 7, n2 = 9, n3 = 12, n4 = 16, n5 = 21. The interval between incomplete renewals is T0 = 1. Estimate the mean number of failures H(1) = ∫_0^1 h(v)dv and the degradation factor e^α. Assume the following costs: cmin = 15$, cpr = 150$, and Cov = 1,000$. Find the optimal period K* for equipment overhaul.


Chapter 5

Preventive Maintenance Based on Parameter Control

Everywhere in life, the question is not what we gain, but what we do.
Carlyle, Essays

5.1 Introduction. System Parameter Control

So far we distinguished two states of the system (component): the up and the down state, or the failure and nonfailure states. The statistical information about the system (component) was related to its lifetime, i.e. to the transition time from the nonfailure ("good") state into the failure ("bad") state. We can picture the state of the system as a binary random process, see Fig. 5.1, a. In this chapter we consider several models of preventive maintenance for which we know enough to distinguish intermediate states between the "completely new" and the "completely failed" system. A good example might be failure development as a damage accumulation model; recall the model leading to the Gamma distribution. Preventive maintenance of a system with many states is an intervention into the damage accumulation process ξt by "pulling it down" from a dangerous state into a "safe" state, see Fig. 5.1, b.


Figure 5.1: a: A binary system; b: Shifting system parameter from state n − 1 to state 0

Mechanical deterioration is a good example of a process in which we can distinguish intermediate states between a "brand new" system and a failed system. The recent paper by Redmont et al (1997) studies the deterioration of concrete constructions (e.g. bridges) and defines twelve stages of failure development, starting from a hairline crack up to large-volume splitting and corrosion. A system with n standby units allows a natural definition of intermediate states ξ(t) = i, i = 0, 1, 2, ..., n, according to the number i of failed standby units. The most advanced preventive maintenance methods are based on a periodic or continuous follow-up of system diagnostic parameters, such as vibration monitoring, output fluid control, etc.; see e.g. the recent book by Williams et al (1998), "Condition-based maintenance and machine diagnostics". We discuss in Section 5.7 the preventive maintenance strategy based on a follow-up of a multidimensional prognostic parameter. Obviously, one might expect that preventive maintenance based on periodic or continuous monitoring of a system diagnostic parameter must be more efficient than a preventive maintenance policy based only on the knowledge of the existence of "good" and "bad" states. There is, however, a price for this higher


efficiency: we have to know the statistical properties of the random process ξ(t) which represents the damage accumulation and/or serves as a prognostic parameter for system failure. Usually this is a more involved task than just describing in statistical terms the transition time from the "brand new" state into the "failure" state.

5.2 Optimal Maintenance of a Multiline System

We consider in this subsection a system consisting of n identical subsystems (lines). Every line operates and fails independently of the others, while the replacement (repair) of a failed line requires that the work of all lines be stopped. During the operation of the system, information is received about the lines' failures. Define the state of the system ξ(t) as the number of failed lines at time t. We assume therefore that we observe ξ(t). It will be assumed also that the failure-free operation time of every line has a known c.d.f. F(t). System maintenance is organized as follows. When the number of failed lines ξ(t) reaches the prescribed number k, 0 < k ≤ n, the system is stopped for repair, all failed lines are repaired, and all nonfailed lines are brought to their "brand new" state. Thus the maintenance policy means shifting ξ(t) from the "dangerous" state ξ = k into the initial state ξ = 0. The real-life prototype of the multiline system are certain types of mass production lines, and it is natural to assume that for them the reward is proportional to the total failure-free operation time. So, we assume that each line gives a unit reward during a unit of its failure-free operation. Denote by t(k) the time needed to repair the system. Our purpose is to find the optimal k which provides the maximal mean reward per unit time. Denote by τi the failure-free operation time of line i, i = 1, 2, ..., n, and by τ(i) the i-th ordered failure instant. The total system operation time during one operation-repair cycle is then

W(k) = nτ(1) + (n − 1)(τ(2) − τ(1)) + ... + (n − k + 1)(τ(k) − τ(k−1)).  (5.2.1)

The length of one operation-repair cycle is

L(k) = τ(k) + t(k).  (5.2.2)

Recalling the theory from Section 4.1, we arrive at the expression for the mean reward per unit time:

v(k) = E[W(k)] / E[L(k)] = E[W(k)] / (E[τ(k)] + t(k)).  (5.2.3)

In order to be able to compare maintenance efficiency for systems with different n values, it is convenient to characterize the efficiency of the maintenance rule by the ratio

χk = v(k)/n.  (5.2.4)


We continue by assuming that each line has a constant failure rate λ. The expression for W(k) is in fact identical to the formula for the TTT (total time on test), see Exercise 4, Chapter 2. Thus W(k) ∼ Gamma(k, λ) and

E[W(k)] = k/λ.  (5.2.5)

For τ ∼ Exp(λ), E[τ(k)] = λ^{−1} Σ_{i=1}^{k} (n − i + 1)^{−1}. After simple algebra we obtain that

χk = (1/n) · k / ( Σ_{i=1}^{k} (n − i + 1)^{−1} + t(k)λ ).  (5.2.6)

Suppose that the repair time of k lines is

t(k) = N + L · k.  (5.2.7)

Consider first the case N = 0. It is easy to show that the denominator of χk increases with k, and therefore the optimal k = 1. In other words, for N = 0, the best policy is to stop the system for the repair immediately after the first failure of a line. Now assume that the repair time is constant and does not depend on k, i.e. t(k) = N . Then the optimal k is usually not equal to one. Fig. 5.2 shows several graphs for the reward per unit time, for various values of n = 4, 6, 8, 12 and N = 0.1, 0.2, 0.3. (We assume that λ = 1). The curves on this figure show that an incorrect choice of k may considerably reduce the reward. For example, for n = 8 and N = 0.2, the best k = 3. It provides a 50% increase of χ in comparison with a wrong choice of k, say k = 1.#
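The curves of Fig. 5.2 are easy to reproduce numerically. Below is a minimal sketch (the function name and the parameter grid are our own choices) that evaluates (5.2.6) for a constant repair time t(k) = N and λ = 1:

```python
def chi(n, k, N, lam=1.0):
    # reward per line per unit time, eq. (5.2.6), with constant repair time t(k) = N
    denom = sum(1.0 / (n - i + 1) for i in range(1, k + 1)) + N * lam
    return (1.0 / n) * k / denom

# reward curve for n = 8 lines and repair time N = 0.2 (one of the cases in Fig. 5.2)
curve = {k: chi(8, k, 0.2) for k in range(1, 9)}
```

For n = 8 and N = 0.2 the curve rises steeply from k = 1 to k = 3, in agreement with the roughly 50% gain quoted above for choosing k = 3 instead of k = 1.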

Remark. The problem considered in this section is very similar to the block replacement of a group of machines, see subsection 4.2.4. The maintenance policy considered there does the following: all machines are renewed after time T has elapsed since the last repair. Thus, the decision to repair was based on time. A similar problem was considered by Okumoto and Elsayed (1983). In this section, the decision to renew the machines is based on observing the number of failed machines ("lines"). An interesting theoretical question is which policy can provide better results. It turns out that the answer is not trivial, and certain advanced theoretical considerations must be involved in answering this question. So, if the machines ("lines") have identically distributed exponential lives, the optimal maintenance policy among all possible policies belongs to the subclass of the so-called Markovian policies. Simply speaking, these are the maintenance rules which are based on observing the process ξ(t). The best maintenance decision belongs to the class "repair after ξ(t) has reached some critical level". The reader can find more on this issue in the paper by Gertsbakh (1984). #


Figure 5.2: Reward for various k and n values

5.3 Preventive Maintenance in a Two-Stage Failure Model

5.3.1 The Failure Model

We distinguish initial failures ("damages") and terminal failures. The initial failures appear according to a nonhomogeneous Poisson process (NHPP) with rate λ(t). This means that an initial failure appears in the interval [t, t + ∆] with probability λ(t)∆ + o(∆) as ∆ → 0. Each initial failure degenerates into a terminal failure after a random time X, see Fig. 5.3. An initial failure which appears at time τi degenerates into a terminal failure after time Xi, i.e. the terminal failure appears at the instant Zi = τi + Xi. We assume that the {Xi} are i.i.d., independent of the τi, and that Xi ∼ F(t).

Figure 5.3: Degeneration of initial failures into terminal failures

5.3.2 Preventive Maintenance. Costs

The whole process of failure appearance and preventive maintenance is considered on a finite time interval [0, T ], see Fig 5.4.

Figure 5.4: Preventive maintenance is scheduled at the instants t1, ..., tk

Preventive maintenance (PM) actions are scheduled at the instants t1, t2, ..., tk, 0 < t1 < t2 < ... < tk < T. The number k of PM's is fixed in advance. Each PM carried out at ti reveals and eliminates all initial failures (damages) existing at the instant ti. Our goal is to find the optimal location of the PM instants t1, ..., tk which would minimize the mean number of terminal failures on [0, T].

5.3.3 Some Facts About the Nonhomogeneous Poisson Process

We will need some facts from the theory of the NHPP; see Appendix 1 for a detailed derivation.
Theorem 3.1 Let ξt be a NHPP with rate λ(t). Denote by vk(t1, t2) the probability that exactly k events appear in the interval [t1, t2]. Then

vk(t1, t2) = exp[−Λ(t1, t2)] · [Λ(t1, t2)]^k / k!,  k = 0, 1, ...,  (5.3.1)

where Λ(t1, t2) = ∫_0^{t2} λ(t)dt − ∫_0^{t1} λ(t)dt = ∫_{t1}^{t2} λ(t)dt. In particular, the mean number of events in [t1, t2] is

E[ξt2 − ξt1 ] = Λ(t1 , t2 ),

(5.3.2)

and the probability of having no events in [t1 , t2 ] is P (ξt2 − ξt1 = 0) = exp[−Λ(t1 , t2 )] .

(5.3.3)

We see from this theorem that the NHPP is characterized by the integral of the event rate, the so-called cumulative event rate

Λ(t) = ∫_0^t λ(v)dv.

The key for our analysis is the following theorem (for the proof see Andronov and Gertsbakh (1972), Andronov (1994)).
Theorem 3.2 The appearance times of terminal failures constitute a NHPP with cumulative event rate ΛF(t) defined as

ΛF(t) = ∫_0^t λ(t − y)F(y)dy.  (5.3.4)

Often, the following equivalent form is more convenient:

ΛF(t) = ∫_0^t λ(y)F(t − y)dy.

To derive it, put y = t − v in (5.3.4). As a corollary, the mean number of terminal failures on [t1, t2] is M[t1, t2] = ΛF(t2) − ΛF(t1),

(5.3.5)

and the probability of no terminal failures on [t1, t2] is P0[t1, t2] = exp[−(ΛF(t2) − ΛF(t1))].  (5.3.6)
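Theorem 3.2 is easy to check by simulation. The sketch below (all rates and the horizon are our own illustrative numbers) takes a constant initial-failure rate λ and Exp(µ)-distributed degeneration times X, for which (5.3.4) gives ΛF(t) = λ(t − (1 − e^{−µt})/µ), and compares this with the simulated mean number of terminal failures on [0, T]:

```python
import math
import random

lam, mu, T = 2.0, 1.0, 3.0   # illustrative rates and horizon
# closed form of eq. (5.3.4) for lambda(t) = lam and F(t) = 1 - exp(-mu*t)
Lambda_F = lam * (T - (1.0 - math.exp(-mu * T)) / mu)

random.seed(1)
runs, total = 20000, 0
for _ in range(runs):
    # homogeneous Poisson stream of initial failures on [0, T]
    t = random.expovariate(lam)
    while t < T:
        # each initial failure becomes terminal after an Exp(mu) delay;
        # count it only if the terminal failure falls inside [0, T]
        if t + random.expovariate(mu) <= T:
            total += 1
        t += random.expovariate(lam)
mean_terminal = total / runs
```

With these parameters ΛF(3) ≈ 4.10, and the simulated mean agrees with it to within Monte Carlo error.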


5.3.4 Finding Optimal Location of the PM Points

Suppose that there is only a single PM at t = tk. Then the mean number of terminal failures on [tk, tk+1 = T] is

M[tk, tk+1] = ∫_{tk}^{tk+1} λ(v)F(tk+1 − v)dv.  (5.3.7)

Indeed, in the absence of the PM, the mean number of failures on [0, tk+1 = T] would be ∫_0^{tk+1} λ(v)F(tk+1 − v)dv. The terminal failures in [tk, tk+1] are of two types: those arising from initial failures "born" in [0, tk] and those arising from initial failures "born" in [tk, tk+1]. The former are eliminated by the PM. This is formally equivalent to putting λ(v) = 0 for v ∈ [0, tk]. Thus we arrive at (5.3.7).
It is instructive to give a heuristic proof of (5.3.7). Divide the interval [tk, tk+1] into a large number of small intervals, each of length ∆. A damage appears in [v, v + ∆] with probability λ(v) · ∆, and with probability F(tk+1 − v) it degenerates into a terminal failure within the interval [tk, tk+1]. Thus the contribution of one ∆-interval is λ(v) · ∆ · F(tk+1 − v), and the sum of all such terms tends to the integral (5.3.7) as ∆ → 0.
Now suppose that m PM's are planned at the instants 0 < t1 < t2 < ... < tm < T. Then the mean number of terminal failures will be

M(t1, ..., tm) = Σ_{j=0}^{m} ∫_{tj}^{tj+1} λ(x)F(tj+1 − x)dx,  (5.3.8)

where t0 = 0, tm+1 = T. Now our problem is to locate the PM points t1, ..., tm optimally in order to minimize M(t1, ..., tm). This is a typical Dynamic Programming problem. Let us recall Bellman's Dynamic Programming principle (Bellman (1957), p. 18): "An optimal policy has the property that whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." The implementation of this principle in our problem is as follows. Denote by N(x, T; s) the minimal mean number of terminal failures in the interval [x, T] if (i) a PM is made at t = x; (ii) s PM points are optimally located in the interval (x, T]. Suppose that we do the first PM in [x, T] at time t = t1, and afterwards follow the optimal strategy in the remaining period (t1, T]. Then we obtain the


following principal recurrence relation for s = 1, 2, ...:

N(x, T; s + 1) = min_{x < t1 ≤ T} { ∫_x^{t1} λ(v)F(t1 − v)dv + N(t1, T; s) }.  (5.3.9)
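The recurrence (5.3.9) is straightforward to implement on a discrete grid of candidate PM instants. The sketch below uses our own illustrative choices: a constant damage rate λ(t) = λ0, Exp(µ) degeneration times, the natural boundary condition N(x, T; 0) = M[x, T], and a uniform grid:

```python
import math

lam0, mu, T = 1.0, 1.0, 2.0          # illustrative data (not from the text)
F = lambda t: 1.0 - math.exp(-mu * t)

def M(a, b, steps=400):
    # mean number of terminal failures on [a, b] after a PM at a, eq. (5.3.7),
    # evaluated by the midpoint rule
    h = (b - a) / steps
    return sum(lam0 * F(b - (a + (i + 0.5) * h)) * h for i in range(steps))

def optimal_pm(m, grid=80):
    # dynamic programming for the recurrence (5.3.9) on a uniform grid
    ts = [T * i / grid for i in range(grid + 1)]
    N = [[M(t, T) for t in ts]]          # s = 0: no further PM after ts[i]
    choice = []
    for s in range(1, m + 1):
        row, ch = [], []
        for i, x in enumerate(ts):
            best, arg = N[s - 1][i], None        # option: leave the extra PM unused
            for j in range(i + 1, grid + 1):
                val = M(x, ts[j]) + N[s - 1][j]
                if val < best:
                    best, arg = val, j
            row.append(best)
            ch.append(arg)
        N.append(row)
        choice.append(ch)
    points, i = [], 0                    # recover the optimal PM instants from x = 0
    for s in range(m, 0, -1):
        j = choice[s - 1][i]
        if j is None:
            break
        points.append(ts[j])
        i = j
    return points, N[m][0]
```

For a single PM (m = 1) and these symmetric data, M[0, t1] + M[t1, T] is convex in t1 and the optimal instant is the midpoint T/2; the mean number of terminal failures drops from M[0, T] ≈ 1.135 to ≈ 0.736.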

Pij^(r) > 0 for all r ≥ s. In other words, there is a positive probability that a particle can reach any state j from any state i in s transitions.
The following principal theorem establishes the limiting behavior of the r-step transition probabilities.
Theorem 4.1
(i) lim_{r→∞} Pij^(r) = πj; the πj are nonnegative, and Σ_{j=1}^{n} πj = 1.
(ii) πj is the unique nonnegative solution of the system

πj = Σ_{i=1}^{n} πi Pij,  j = 1, 2, ..., n;   Σ_{i=1}^{n} πi = 1.  (5.4.5)

The πi are termed stationary or limiting probabilities of the Markov chain. Let us mention several valuable properties of the stationary probabilities. By Theorem 4.1, πi is the limiting probability that the process is in state i at some remote time instant t, formally as t → ∞ (assume that each transition takes one time unit). Suppose that we observe the chain during a large number of transitions N. Let Ki(N) be the number of times the process visited state i. Then the long-run proportion Ki(N)/N → πi as N → ∞. For state j, define mjj to be the mean number of transitions until a Markov chain, starting in state j, returns again to that state. It turns out that

πj = 1/mjj.  (5.4.6)

A heuristic proof is as follows (see e.g. Ross (1993)). The chain spends 1 unit of time in state j at each visit to j. Since, on the average, it visits j once in mjj steps, the limiting probability of this state must be 1/mjj.
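Both properties are easy to illustrate numerically. The sketch below (the 3-state transition matrix is an arbitrary illustrative example, not from the text) finds π by iterating π ← πP, and checks (5.4.6) by simulating the mean return time to state 0:

```python
import random

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]   # an arbitrary ergodic transition matrix

# stationary distribution by iterating pi <- pi * P  (Theorem 4.1)
pi = [1/3, 1/3, 1/3]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

def step(i):
    # sample the next state from row i of P
    u, c = random.random(), 0.0
    for j in range(3):
        c += P[i][j]
        if u < c:
            return j
    return 2

# estimate the mean return time m_00 by a long simulated run from state 0
random.seed(2)
state, steps, returns = 0, 0, 0
for _ in range(200000):
    state = step(state)
    steps += 1
    if state == 0:
        returns += 1
m00 = steps / returns   # average number of transitions between visits to state 0
```

For this matrix the exact solution of (5.4.5) is π = (9/28, 3/7, 1/4), so m00 should be close to 28/9 ≈ 3.11, in agreement with (5.4.6).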


The returns of the Markov chain to any fixed state, say state 1, form a renewal process. The behavior of the chain after its return to state 1, which took place at the time instant K1 , does not depend on the past history for t < K1 . If the returns into 1 take place at the instants K1 , K2 , K3 , ..., then the intervals K2 − K1 , K3 − K2 , ..., etc. are i.i.d. r.v.’s with mean m11 .

5.4.2 Semi-Markov Process

It was assumed that the transitions in the Markov chain take place at the instants t = 1, 2, 3, .... One might imagine that after the transition into state i, the Markov chain is "sitting" in this state i exactly one unit of time, after which the next transition takes place. Let us now consider a very useful generalization of a Markov chain called the Semi-Markov process. In simple words, it is a chain with random time intervals between transitions; these random intervals do not depend on the past trajectory. Suppose that we have the same set of states {1, 2, ..., n} as for the Markov chain and the same probabilistic mechanism governing the change of states. Suppose that our new process starts at t = 0 from state i. The next state will be, as in the Markov chain, the state j chosen with probability Pij. Now introduce the one-step transition time τij. If the next state is j, then the Semi-Markov process (SMP) spends time τij in i and jumps afterwards into j. The SMP will be denoted ζt. The distribution of τij is defined as P(τij ≤ t | transition i ⇒ j) = Fij(t).

(5.4.7)

The trajectory of ζt is constructed as follows. Take the initial state i. Choose randomly the next state according to the distribution Pik, k = 1, 2, ..., n. Suppose the next state is s. Then let the process stay in i a random time τis ∼ Fis(t), after which the process jumps into state s. If the next state after s is q, the process sits in s a random time τsq ∼ Fsq(t) and then jumps into q, etc. Fig. 5.5 shows a trajectory of a SMP. For the further exposition we need to introduce the mean one-step sitting time νi for state i. From the description of ζt it follows that

νi = Σ_{j=1}^{n} Pij ∫_0^∞ x dFij(x).  (5.4.8)

For more information about the SMP see e.g. Ross (1970), Gertsbakh (1977). Denote by Qj the limiting probability of state j. Formally,

Qj = lim_{t→∞} P(ζt = j).  (5.4.9)

Qj equals the long-run proportion of time which the SMP spends in state j:

Qj = lim_{t→∞} (time in j on [0, t]) / t.  (5.4.10)

5.4. MARKOV-TYPE PROCESSES WITH REWARDS (COSTS)

Figure 5.5: Trajectory of a SMP. Transitions occurred at t1, t2, t3, t4, t5 into the states 4, 4, 3, 2 and 4, respectively

Denote by Ljj the mean first-passage (return) time from state j into state j. The following important theorem establishes the connection between Qj, Ljj and πj, see Ross (1970).
Theorem 4.2

Qj = πj νj / Σ_{i=1}^{n} πi νi,  (5.4.11)

Ljj = ( Σ_{i=1}^{n} πi νi ) / πj.  (5.4.12)

Let us present a heuristic proof of this theorem. Consider a large time span [0, T], during which Ki transitions occurred into state i. The SMP spends, on the average, time νi in state i after each such transition. Therefore, T ≈ Σ_{i=1}^{n} Ki νi. Since the total time spent in j is Kj νj, we have Qj ≈ Kj νj / Σ_{i=1}^{n} Ki νi. Now divide the numerator and the denominator by K = Σ_{i=1}^{n} Ki and take into account that Ki/K ≈ πi. This explains the first claim. Suppose that ζt returns into j every Ljj units of time. On the average, after each return the process sits in j for time νj. Thus, Qj ≈ νj/Ljj, which explains the second claim of the theorem.

5.4.3 SMP With Rewards (Costs)

Now let us define rewards or costs associated with the transitions of the SMP. Assume that a reward ψij(τij) is accumulated for a transition i ⇒ j which lasts τij. Then the average reward ωi for a one-step transition from i is obtained by averaging ψij(x) over all possible durations of the one-step transition and over all possible destinations j:

ωi = Σ_{j=1}^{n} Pij ∫_0^∞ ψij(x) dFij(x).  (5.4.13)

Now everything is ready for a formula for the long-run mean reward per unit time.

Theorem 4.3 The mean long-run reward per unit time is

g = Σ_{i=1}^{n} πi ωi / Σ_{i=1}^{n} πi νi.  (5.4.14)

It is instructive to derive this formula heuristically. Consider again a large time span [0, T ]. During this time, Ki transitions occur into i, and after each such transition the SMP stays in this state, on the average, during time νi . Therefore, T ≈ K1 ν1 + K2 ν2 + ... + Kn νn .

(5.4.15)

On the other hand, every visit into state i gives a reward whose mean value is ωi. Therefore, the total accumulated reward on [0, T] is Reward on [0, T] ≈ K1 ω1 + K2 ω2 + ... + Kn ωn.

(5.4.16)

Now

g ≈ Reward on [0, T] / T = ( Σ_{i=1}^{n} Ki ωi / Σ_{j=1}^{n} Kj ) · ( Σ_{i=1}^{n} Ki νi / Σ_{j=1}^{n} Kj )^{−1}.  (5.4.17)

Now note that Ki / Σ_{i=1}^{n} Ki is approximately equal to the stationary probability πi. This formula will be used in the next two sections.
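Formula (5.4.14) can also be verified by direct simulation. The sketch below uses a made-up two-state SMP (transition matrix, deterministic sitting times and rewards are all our own illustrative numbers) and compares the simulated long-run reward rate with g:

```python
import random

P  = [[0.3, 0.7], [0.6, 0.4]]     # illustrative transition probabilities
nu = [1.5, 2.0]                   # deterministic one-step sitting times
w  = [5.0, 3.0]                   # deterministic one-step rewards

# stationary probabilities of the embedded chain: pi0/pi1 = 6/7
pi = [6/13, 7/13]
g = sum(p * r for p, r in zip(pi, w)) / sum(p * t for p, t in zip(pi, nu))  # (5.4.14)

# simulate a long trajectory and accumulate time and reward
random.seed(3)
state, time_total, reward_total = 0, 0.0, 0.0
for _ in range(200000):
    time_total += nu[state]
    reward_total += w[state]
    state = 0 if random.random() < P[state][0] else 1
g_sim = reward_total / time_total
```

With these numbers g = 51/23 ≈ 2.217, and the simulated rate matches it closely.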

5.4.4 Continuous-Time Markov Process

Let ξ(t) be a continuous-time Markov process with a finite number of states 0, 1, 2, ..., N. This means that for all t1, t2, t1 ≤ t2, and any pair of states i, j,

P(ξ(t1 + t2) = j | ξ(t1) = i, any history of ξ(u), u < t1) = P(ξ(t1 + t2) = j | ξ(t1) = i).  (5.4.18)

It is postulated, therefore, that the future behavior of ξ(t) after t1 depends only on its present state i at t1. A convenient way of defining the transition probabilities Pij(t2) = P(ξ(t1 + t2) = j | ξ(t1) = i) is to describe the behaviour of the process on a small time interval.


Let us define the following transition rates λkj: as ∆t → 0,

P(ξ(t + ∆t) = j | ξ(t) = k) = λkj ∆t + o(∆t),  j ≠ k,
P(ξ(t + ∆t) = k | ξ(t) = k) = 1 − Σ_{j≠k} λkj ∆t + o(∆t).  (5.4.19)

Let the initial state at t = 0 be i: P(ξ(0) = i) = 1. We will describe a standard procedure for finding the transition probabilities Pij(t) = P(ξ(t) = j | ξ(0) = i). Let us introduce the following matrix B:

B = ( λ*0  λ01  λ02  ...  λ0N )
    ( λ10  λ*1  λ12  ...  λ1N )
    ( ...  ...  ...  ...  ... )
    ( λN0  λN1  λN2  ...  λ*N )

Here

λ*k = − Σ_{j≠k} λkj,  k = 0, 1, ..., N.  (5.4.20)

Let Pi(t) and P′i(t) be the columns with elements Pi0(t), ..., PiN(t) and P′i0(t), ..., P′iN(t), respectively. The transition probabilities Pij(t) satisfy the following system of differential equations:

P′i(t) = B′ Pi(t),

(5.4.21)

with the initial condition Pik(0) = 0 if k ≠ i, and Pii(0) = 1. B′ is the transposed matrix B. To prove (5.4.21), let us derive the equation for P′ij(t). (5.4.21) says that

P′ij(t) = Σ_{k≠j} Pik(t) λkj + λ*j Pij(t).  (5.4.22)

Let us consider the transition probability from state i into state j during the interval [t, t + ∆t]. To be in state j at t + ∆t, the process ξ(t) either has to be in some state k, k ≠ j, at time t and then jump into state j during the interval [t, t + ∆t], or to be already in state j at t and remain in this state during the interval [t, t + ∆t]. This leads to the following probability balance equation:

Pij(t + ∆t) = Σ_{k≠j} Pik(t) λkj ∆t + (1 − Σ_{k≠j} λjk ∆t) Pij(t) + o(∆t).  (5.4.23)

Now transfer the term Pij (t) from the right to the left, divide by ∆t and set ∆t → 0. We obtain the desired equation (5.4.22).


The method which usually simplifies the analytic solution of the system (5.4.21) is applying the Laplace transform, which leads to a system of linear algebraic equations. Let us denote

πj(s) = ∫_0^∞ e^{−st} Pij(t) dt.  (5.4.24)

Applying the Laplace transform (5.4.24) to each row of the system (5.4.21) and using the formula

∫_0^∞ e^{−st} P′ij(t) dt = s πj(s) − Pij(0),  (5.4.25)

we obtain the system sΠ(s) − Pi(0) = B′ Π(s),

(5.4.26)

where Π(s) is the column consisting of the elements πj(s), j = 0, 1, ..., N. From this follows the system of equations (sI − B′)Π(s) = Pi(0),

(5.4.27)

where I is the identity matrix. This system must be solved with respect to πj(s), j = 0, ..., N, and the functions Pij(t) must be found with the help of the Laplace transform tables, see Appendix 3. For a better understanding of the process ξ(t) defined via the transition rates, let us describe its sample paths. Suppose that ξ(0) = i1. Then ξ(t) sits in i1 a random time τi1 ∼ Exp(−λ*i1), after which it jumps into a state i2 chosen with probability λi1,i2/(−λ*i1). In state i2, the process stays a random time τi2 ∼ Exp(−λ*i2), after which it jumps into a new state i3, etc.
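For a two-state machine (state 0 = up, state 1 = down) with failure rate λ and repair rate µ (illustrative rates below are our own), the system (5.4.21) can be solved in closed form: P00(t) = µ/(λ+µ) + λ/(λ+µ) e^{−(λ+µ)t}. The sketch below integrates (5.4.21) numerically and compares with this solution:

```python
import math

lam, mu = 1.0, 4.0     # illustrative failure and repair rates
t_end, h = 2.0, 1e-4   # horizon and Euler step

# Euler integration of P'_00 = -lam*P00 + mu*P01, P'_01 = lam*P00 - mu*P01,
# which is the system (5.4.21) for this two-state process, P00(0) = 1
p00, p01 = 1.0, 0.0
for _ in range(int(t_end / h)):
    d00 = -lam * p00 + mu * p01
    d01 = lam * p00 - mu * p01
    p00 += h * d00
    p01 += h * d01

exact = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t_end)
```

As t grows, P00(t) tends to µ/(λ+µ) = 0.8, the stationary probability of the up state.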

5.5 Opportunistic Replacement of a Two-Component System

5.5.1 General Description

A system consists of two independent parts (components), named a and b. Both parts operate continuously and fail independently. If a component fails, an inspection reveals it and the failed component is replaced. After replacement, it is as good as new. If one component, a or b, has failed and is replaced, then there are two options with regard to the other component: to replace it too or not. The first option is what we call "an opportunistic replacement". In Section 4.1 we considered an example of an opportunistic replacement in which both components were always replaced upon the failure of any one of them. This policy turned out to be less costly than the policy of replacing each component separately (Example 1.3). A natural extension of this simplified opportunistic replacement is the following: when one of the two components has failed, replace the second one only if its age exceeds some critical value Tmax. We shall assume that the system is periodically inspected, and that an inspection always reveals an existing failure. There might be the following four situations: (1) no failure discovered, no actions taken; (2) one part has failed, the other has not, and only the failed part is replaced; (3) one part has failed, the other has not, and both parts are replaced; (4) both parts are found failed, and both are replaced. Let us introduce the costs associated with these situations. There is a zero cost associated with (1); the cost of (2), the single repair cost, is Cf; the cost of (3) or (4), the joint repair cost, is Cp,f.

5.5.2 Formal Setting of the Problem

The time interval between inspections ∆ is set to be 1. All inspections and repairs last negligible time. The lifetimes of a and b are τa and τb , respectively, and they are independent discrete random variables on a unit-step grid. Let P (τa = i) = pi , i = 1, ..., K; P (τb = j) = qj , j = 1, ..., K .

(5.5.1)

We will need the conditional probabilities that a part fails at "age" k, k > r, given that it has survived age r. Let

P(τa = i | τa > r) = pi^(r), i = r + 1, r + 2, ...;  P(τb = i | τb > r) = qi^(r), i = r + 1, r + 2, ....

Obviously,

pi^(r) = pi / (1 − Σ_{j=1}^{r} pj);   qi^(r) = qi / (1 − Σ_{j=1}^{r} qj).  (5.5.2)

Our opportunistic replacement policy is formulated as follows: if during an inspection it has been revealed that one of the two parts has failed, then replace the second part only if its age exceeds the critical value Tmax = N. The state ζ of the two-part system will be defined as follows. Put ζ = (Ka, Kb), where Ka and Kb are the respective component ages after the inspection and replacement. The consecutive states of the system will be denoted ζ0, ζ1, .... Consider an example. Time t is measured in integers 0, 1, 2, .... Suppose that N = 2. Both parts are brand new at t = 0; the lifetimes of a and b are τa = 1, τb = 4. The system starts from the state ζ0 = (0, 0) at t = 0. (Think of it as if at t = 0 both parts have been replaced.) The first failure, of a, takes place at t = 1, and only a is replaced since the age of b is less than N = 2. So, after the first replacement, the state of the system is ζ1 = (0, 1). Suppose that the part which replaces the failed a-part has lifetime τa* = 4. The next failure takes place at t = 4; this is the failure of b. The age of a is now 3 > N = 2. Thus both components must be replaced, and the state of the system becomes ζ2 = (0, 0). Thus the trajectory of ζj is (0, 0) ⇒ (0, 1) ⇒ (0, 0) ⇒ ....


Suppose that τa = τb = 4. Then the initial state is (0, 0). The first failure takes place at t = 4, both components are replaced, and the next state is again (0, 0). So the trajectory is ζ0 = (0, 0) ⇒ ζ1 = (0, 0) ⇒ .... The chain ζj has the following state space: S = {(0, 0); (0, j), j = 1, ..., N; (i, 0), i = 1, ..., N}. A state (i, i) with i > 0 is impossible: indeed, after a replacement, one or both ages are zero. It is an important fact that the states of the system ζj, j = 0, 1, 2, ..., form a Markov chain. Indeed, the probability of the transition (i1, j1) ⇒ (i2, j2) depends only on the present state (i1, j1) and does not depend on previous states. This is due to the definition of the system state: if the ages of a and b are fixed, the future behavior is determined via the conditional probabilities (5.5.2). Our goal is to obtain an expression for the average long-run costs per unit time. For this purpose we use the technique described in the previous section. In fact, the intervals between transitions from state to state are random, and ζt fits the description of a Semi-Markov process, see subsection 5.4.2. We proceed with a numerical example.

5.5.3 Numerical Example

Below are the data on the components' lifetimes:

i   pi     pi(1)   pi(2)   qi     qi(1)   qi(2)
1   0.08   −       −       0.05   −       −
2   0.14   0.152   −       0.08   0.084   −
3   0.17   0.185   0.218   0.16   0.168   0.184
4   0.17   0.180   0.218   0.30   0.316   0.345
5   0.15   0.160   0.192   0.20   0.211   0.230
6   0.11   0.120   0.141   0.10   0.105   0.115
7   0.18   0.203   0.231   0.11   0.116   0.126
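The pi(1), pi(2), qi(1), qi(2) columns are just (5.5.2) applied to the first two columns. A quick check (the helper name is ours):

```python
p = [0.08, 0.14, 0.17, 0.17, 0.15, 0.11, 0.18]
q = [0.05, 0.08, 0.16, 0.30, 0.20, 0.10, 0.11]

def conditional(dist, r):
    # p_i^(r) = p_i / (1 - sum_{j<=r} p_j), i = r+1, ..., K   (eq. 5.5.2)
    tail = 1.0 - sum(dist[:r])
    return [dist[i] / tail for i in range(r, len(dist))]

p1 = conditional(p, 1)   # column pi(1), i = 2, ..., 7
p2 = conditional(p, 2)   # column pi(2), i = 3, ..., 7
q1 = conditional(q, 1)   # column qi(1), i = 2, ..., 7
```

The recomputed values reproduce the table entries to the three decimals printed.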

It will be assumed that N = 2, and thus the system states are (0,0), (0,1), (0,2), (1,0), (2,0). The most tedious part of this example is calculating the transition probabilities. Let us start with state (0,0). To avoid mistakes, it is advisable to picture the sample space of (τa, τb), see Fig. 5.6. A point with coordinates (r, s) represents the event (τa = r, τb = s), and since the lifetimes of a and b are independent, its probability is pr · qs. If the event A(1,0) takes place, the next state after (0,0) will be (1,0). (Verify it!) The event A(0,1) means a transition into (0,1). The events A(0,2), A(2,0) have a similar meaning. The remaining probability mass corresponds to the transition (0,0) ⇒ (0,0). It is a matter of routine computation to get p00,01 = 0.046; p00,02 = 0.122; p00,10 = 0.076; p00,20 = 0.062; p00,00 = 0.694. Denote by ν00 the mean transition time from state (0,0). If the event


Figure 5.6: The sample space for the transitions from (0,0)

B1 = (τa = τb = 1) ∪ A(1,0) ∪ A(0,1) takes place, then the system stays in (0,0) one time unit. If B2 = (τa = τb = 2) ∪ A(2,0) ∪ A(0,2) takes place, the transition lasts two time units. The events A3, A4, A5, A6 and A7 mean that the transition times are 3, 4, 5, 6 and 7, respectively, see Fig. 5.6. So, ν00 = 1 · P(B1) + 2 · P(B2) + 3 · P(A3) + ... + 7 · P(A7). The result is ν00 = 3.247. Let us show how to compute the transition characteristics from state (2,0). Fig. 5.7 contains the necessary information. The τa axis is now the residual life of a given that it survived past t = 2, see the column pi(2) in the table above. The events A(0,1) and A(0,2) correspond to the transitions into (0,1) and (0,2), respectively. Indeed, suppose that ζi = (2, 0), i.e. b is brand new and a has age 2. ζi+1 = (0, 1) means that after the failure of a, b has age 1. The transition ζi ⇒ ζi+1 can happen if and only if the residual lifetime of a is 1 and the age of b is greater than 1, exactly as defined by the event A(0,1). This gives p20,01 = 0.207; p20,02 = 0.190; p20,00 = 0.603. The events B1 through B5, see Fig. 5.7, correspond to the transition times 1, 2, ..., 5, respectively. Thus we arrive at ν20 = 2.592.


Figure 5.7: The sample space for the transitions from (2,0)

We omit similar calculations for the transitions starting in the states (0,1), (1,0) and (0,2). We obtain eventually the following transition probability matrix P:

state    (0,0)   (0,1)   (1,0)   (2,0)   (0,2)
(0,0)    0.694   0.046   0.076   0.062   0.122
(0,1)    0.719   0       0.077   0.131   0.073
(1,0)    0.653   0.144   0       0.042   0.161
(2,0)    0.603   0.207   0       0       0.190
(0,2)    0.562   0       0.169   0.269   0

By solving the system (5.4.5), see Theorem 4.1, we find the stationary probabilities: π00 = 0.670, π01 = 0.059, π10 = 0.075, π20 = 0.083, π02 = 0.113. The last task is finding the mean one-step costs. Let us assume that a single-component replacement costs Cf = 1,000$, and a two-component replacement costs Cp,f = 1,400$. Suppose that the system state is (0,0). The transition into state (0,0) costs Cp,f. The transitions to any other state cost Cf. Therefore, the mean cost


associated with one transition starting in (0,0) is

ω00 = P((0,0) ⇒ (0,0)) · Cp,f + (1 − P((0,0) ⇒ (0,0))) · Cf .   (5.5.3)

The mean costs for transitions starting from the other states are computed similarly. For example,

ω01 = P((0,1) ⇒ (0,0)) · Cp,f + (1 − P((0,1) ⇒ (0,0))) · Cf .   (5.5.4)

The following table summarizes the information about the stationary probabilities, mean transition times and costs.

state (i,j)   (0,0)   (0,1)   (1,0)   (2,0)   (0,2)
πij           0.670   0.059   0.075   0.083   0.113
νij           3.247   3.112   2.870   2.592   2.321
ωij           1,278   1,288   1,261   1,241   1,225

Now everything is ready to compute the mean costs in $ per unit time. By (5.4.14),

g = (Σ_{(i,j)} πij ωij) / (Σ_{(i,j)} πij νij) = 375.6 .   (5.5.5)

For a comparison, let us calculate the costs for an alternative policy: replace both units at the failure of any one of them. The result is g1 = Cp,f / E[min(τa, τb)] = 431 .#
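As a numerical cross-check of this bookkeeping, the stationary equations π = πP and the cost-rate ratio can be recomputed directly from the matrix and the table above. The sketch below is ours (the power-iteration routine and the variable names are not from the text):

```python
# Transition matrix P over the states (0,0), (0,1), (1,0), (2,0), (0,2),
# copied from the table above; each row sums to 1.
P = [
    [0.694, 0.046, 0.076, 0.062, 0.122],
    [0.719, 0.000, 0.077, 0.131, 0.073],
    [0.653, 0.144, 0.000, 0.042, 0.161],
    [0.603, 0.207, 0.000, 0.000, 0.190],
    [0.562, 0.000, 0.169, 0.269, 0.000],
]

def stationary(P, sweeps=1000):
    # Power iteration: apply pi <- pi P until the vector stops moving.
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(sweeps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

pi = stationary(P)                          # ~ (0.670, 0.059, 0.075, 0.083, 0.114)
nu = [3.247, 3.112, 2.870, 2.592, 2.321]    # mean one-step transition times
om = [1278, 1288, 1261, 1241, 1225]         # mean one-step costs, $
g = sum(p * w for p, w in zip(pi, om)) / sum(p * v for p, v in zip(pi, nu))
```

The stationary vector recovered this way agrees with the values quoted in the text to the rounding of the matrix entries.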

5.6 Discovering a Malfunction in an Automatic Machine

5.6.1 Problem Description

An automatic machine produces one article every time unit. If the machine is in a "good" state, then there is a probability p0 that the produced article is defective. If the machine has a malfunction (i.e. it is in a failure state), this probability is p1, with p1 > p0. The entire time of machine operation is divided into periods during which N articles are produced. It is assumed that a failure can arise, with constant probability γ, at the beginning of each period. The state of the machine is tested at the end of each period by taking a sample of n articles from the batch of N articles produced during this period. All articles in the sample are examined, and if the number X of defective articles is k or more, it is decided that there is a malfunction (k is the prescribed "critical" level). If it turns out that X ≥ k, the machine is stopped for a time mp, during which it is established whether there is a failure or not. If the alarm was false, the machine immediately resumes its operation. If not, an additional time ma is spent to carry out the machine repair, after which the machine starts operation with the initial low defective rate p0.


It is assumed that each good article gives a reward Cg and each defective article leads to a penalty Cd. Our problem is to find the optimal "stopping" rule X ≥ k, k = 1, 2, ..., n, which maximizes the reward per unit time.

5.6.2 Problem Formalization

Define a SMP with the states: ”1” – the machine is operating, there is no malfunction; ”2” – the machine is operating (i.e. producing articles), there is a malfunction; ”3” – the machine is down, it is repaired after a malfunction has been discovered; ”4” – the machine is not operating, but there is no malfunction. From the above description it follows that the transition probabilities P31 = P41 = P23 = 1, see Fig. 5.8.

Figure 5.8: State transition diagram

Let b0(n,k) be the probability that a sample of n articles has k or more defective articles if the machine is in order, and let b1(n,k) be the same probability for a machine which has a malfunction. The number of defective articles in a random sample of size n taken from a batch of size N has a so-called hypergeometric distribution, see e.g. Devore (1982), page 109. For N >> n and for p0 and p1 not too close to zero, which is our case, this distribution can be very well approximated by a binomial, X ~ B(n, p). Then we can write

b0(n,k) = P(X ≥ k; p0) = Σ_{i=k}^{n} C(n,i) p0^i (1 − p0)^{n−i} ,   (5.6.1)

b1(n,k) = P(X ≥ k; p1) = Σ_{i=k}^{n} C(n,i) p1^i (1 − p1)^{n−i} .   (5.6.2)


Let Y be the ordinal number of the period at the beginning of which the malfunction occurs. According to the problem description, Y ~ Geom(γ): P(Y = q) = (1 − γ)^{q−1} · γ, q ≥ 1. It follows that

P(Y > m) = Σ_{q=m+1}^{∞} P(Y = q) = (1 − γ)^m .   (5.6.3)

The transition 1 ⇒ 4 occurs if and only if the alarm signal was received before the malfunction appeared. By (5.6.1), (5.6.3), this happens at the end of the m-th period with probability

p14(m) = [1 − b0(n,k)]^{m−1} · b0(n,k) · (1 − γ)^m .   (5.6.4)

From this it follows that

P14 = Σ_{m=1}^{∞} p14(m) = b0(n,k) · (1 − γ) · [1 − (1 − b0(n,k))(1 − γ)]^{−1} .   (5.6.5)

Obviously, P12 = 1 − P14. Now we have the following transition probability matrix P:

state   1     2     3     4
1       0     P12   0     P14
2       0     0     1     0
3       1     0     0     0
4       1     0     0     0

The system (5.4.5), (π1, ..., π4) = (π1, ..., π4)P, has the following solution: π1 = D, π2 = D · P12, π3 = D · P12, π4 = D · P14, where D = 1/(3 − P14). Now calculate the mean one-step transition times. Obviously, ν3 = mp + ma and ν4 = mp. The machine stays in state 1 a random number of periods Z, and this number exceeds m if and only if there was no malfunction and no false alarm during these m periods. Thus,

P(Z > m) = (1 − b0(n,k))^m · (1 − γ)^m , m ≥ 1 .   (5.6.6)

It is a matter of routine calculation to show that E[Z] = [1 − (1 − b0(n,k)) · (1 − γ)]^{−1}, and therefore the mean one-step sitting time in state 1 is ν1 = N · E[Z]. If the machine is in state 2, the probability of discovering the malfunction after one period is b1(n,k). It is easy to derive from here that the machine stays in state 2, on the average, the time ν2 = N/b1(n,k). It remains to determine the mean one-step rewards. Following the problem description, we obtain

ω1 = ν1 [(1 − p0)Cg − p0 · Cd], ω2 = ν2 [(1 − p1)Cg − p1 · Cd],


ω3 = ω4 = 0. Indeed, an operating machine produces in state 1 a good unit with probability 1 − p0, and a defective one with probability p0. This explains the formula for ω1. The formula for ω2 is similar. Now we have all ingredients for using the formula (5.4.14) for the mean long-run reward per unit time:

g(k) = (Σ_{i=1}^{4} πi ωi) / (Σ_{i=1}^{4} πi νi) .   (5.6.7)

5.6.3 Numerical Example

Let us investigate the formula (5.6.7) for the following values of the parameters involved: γ = 0.1 and N = 100 (on the average, a malfunction appears once per 1,000 articles); mp = 50, ma = 100. The probabilities of producing a defective article are p0 = 0.05 and p1 = 0.2. This means that, on the average, one out of 20 articles is defective when the machine operates normally, and one out of 5 is defective when there is a malfunction. The sample size for checking the machine state is n = 20. The reward is Cg = 1; the penalty for a defective article is Cd = 1. Let us investigate the expression for g(k) numerically. Fig. 5.9 presents the results. The optimal value is k = 5; it guarantees g(5) = 0.76.

Figure 5.9: Reward as a function of the critical number k

Smaller or larger values of k reduce the reward substantially, as is seen from Fig. 5.9. For example, if we take k = 2, there will be too many false alarms; if k = 10, the machine will produce too many defective articles.
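The whole computation of g(k) is easy to script. The sketch below is our own code, not the author's; it implements the binomial tails (5.6.1)-(5.6.2), the stationary solution of Section 5.6.2 and the criterion (5.6.7), with the parameter values of this example as defaults:

```python
from math import comb

def btail(n, k, p):
    # P(X >= k) for X ~ Binomial(n, p), as in (5.6.1)-(5.6.2)
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def g(k, gamma=0.1, N=100, n=20, p0=0.05, p1=0.2, mp=50, ma=100, Cg=1.0, Cd=1.0):
    b0, b1 = btail(n, k, p0), btail(n, k, p1)
    P14 = b0 * (1 - gamma) / (1 - (1 - b0) * (1 - gamma))   # eq. (5.6.5)
    D = 1 / (3 - P14)
    pi = [D, D * (1 - P14), D * (1 - P14), D * P14]         # stationary probabilities
    EZ = 1 / (1 - (1 - b0) * (1 - gamma))                   # mean number of periods in state 1
    nu = [N * EZ, N / b1, mp + ma, mp]                      # mean one-step sojourn times
    om = [nu[0] * ((1 - p0) * Cg - p0 * Cd),                # mean one-step rewards
          nu[1] * ((1 - p1) * Cg - p1 * Cd), 0.0, 0.0]
    return sum(p * w for p, w in zip(pi, om)) / sum(p * v for p, v in zip(pi, nu))

best_k = max(range(1, 16), key=g)   # reward-maximizing critical level
```

With these inputs the maximum of g(k) comes out near k = 5, with a reward per unit time around 0.75, in agreement with Fig. 5.9; the exact optimum may shift by one unit depending on rounding conventions.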

5.7 Preventive Maintenance of Objects With Multidimensional State Description

5.7.1 General Description

We have considered so far in this chapter several preventive maintenance models based on the observation of a state parameter. For example, the number of failed lines in the multiline system was such a parameter. Another example was the state ζt of the two-component system described in Section 5.5. The system degradation in the two-stage failure model (Section 5.3) is reflected by the accumulated number of initial failures. The main difficulty intrinsic to these and similar problems is finding an appropriate stochastic description of the process which reflects the system degradation (deterioration), or the damage accumulation. The situation is relatively simple if this description can be given in terms of a one-dimensional Markov-type process. In practice, however, a situation in which the state of the system (component) is described well enough by a one-dimensional parameter is the exception rather than the rule. More often an adequate description of the state of a technical object is given in terms of some multidimensional process

η(t) = (η1(t), η2(t), ..., ηn(t)) .   (5.7.1)

An important practical issue is how to construct, based on observations of the vector process η(t), an optimal preventive maintenance procedure. There are several cases in which we can in fact reduce the situation to a one-dimensional one.
(a) The processes ηk(t), k = 2, ..., n, are functionally related to the first coordinate η1(t) as ηk(t) = ψk(η1(t)) + ε_t, where ε_t is an observation noise. In this case only η1(t) actually contains useful information.
(b) All ηi(t) are independent processes which can be observed, monitored and controlled independently. For example, in a car the state of the braking system is described by the average wear of the braking disk η1(t), and the state of the engine by the fuel consumption index η2(t). We might assume that there is no connection between these two processes. One can design the maintenance procedures separately for the braking system and for the engine. The only connection between the maintenance actions for these two systems may arise as a result of cost reduction when the two subsystems undergo preventive maintenance simultaneously.
In the general case, however, the above simplifications are not possible, and we have to deal with the rather difficult problem of controlling a multidimensional process. The complications arise as a result of dealing with a multidimensional "critical region", which now replaces the one-dimensional "critical" level. Besides, and this is even more important, we need an adequate probabilistic description of a multidimensional stochastic process. The difficulties in obtaining this description are further aggravated by the fact that we never observe


"pure", undisturbed sample paths of η(t), because of breakages, partial repairs, maintenance interventions into the system, etc. How can we remedy this situation? Taking a closer look at maintenance practice, we will always discover that maintenance decisions are based on a cleverly chosen one-dimensional system (component) "health index". A good car mechanic would take the engine noise as such an index. We would decide to sell our car if the yearly service cost (an integrated parameter!) exceeds some limit. A production line would be stopped for maintenance if the defective rate or a certain physical parameter of the product exceeds some critical value. The works of Hastings (1969), Gertsbakh (1972, 1977), Bailey and Mahon (1975), and the more recent works of Kordonsky and Gertsbakh (1995, 1998) present examples of using one-dimensional parameters as the basis for making maintenance decisions. Summing up, it would be desirable to replace the vector η(t) by a scalar process r(t) = ψ(η1(t), ..., ηn(t)) and to use this new "system health index" as the basis for our maintenance decisions.

5.7.2 The Best Scalarization

The essential feature of the scalar function r(t) must be the ability to discriminate between failed and nonfailed objects. To construct such a function, we will use some ideas of discriminant analysis, see Anderson (1984), Chapter 6, and Gertsbakh (1972, 1977). Let us assume that the whole population of objects can be conventionally divided into two groups: A (the nonfailed or "good") and B (the failed or "unfit"). Such a subdivision cannot always be done easily. Medical doctors usually can distinguish very well between a healthy person and a sick one, and moreover, classify the patient within a specific group according to his/her illness. In technology, such a classification is sometimes very difficult. Indeed, between the states "brand new" and "entirely useless object" there are many intermediate states which are difficult to distinguish. In practice, however, the decision is made on the basis of intuition plus experience and common sense. In our case, we will need to distinguish between two extreme categories of objects: "very good" and "very bad". Thus, we begin by assuming that there is an expert who can classify all objects under consideration into these two groups, A and B. Let NA and NB be the numbers of objects in these two groups, which are characterized by the samples

x_1^A, ..., x_{NA}^A and x_1^B, ..., x_{NB}^B ,   (5.7.2)

where every vector in the sample, x_j^z, j = 1, ..., Nz, z = A, B, is an n-tuple

x_j^z = (x_{j1}^z, ..., x_{jn}^z) .   (5.7.3)

The idea of constructing the "best" scalarization is based on replacing the vector x_j^z by a scalar

r_j^z = Σ_{i=1}^{n} l_i x_{ji}^z ,   (5.7.4)

where the coefficients li are selected in some "optimal" way. To explain this approach, let us give a geometric interpretation to the scalarization (5.7.4). The samples A and B are two clusters of points in the n-dimensional parameter space. Let μ̂A and μ̂B be the n-dimensional vectors of sample mean values for A and B, respectively, and let SA and SB be the respective sample variance-covariance matrices:

μ̂_A^{(i)} = Σ_{m=1}^{NA} x_{mi}^A / NA ,   (5.7.5)

μ̂_B^{(i)} = Σ_{m=1}^{NB} x_{mi}^B / NB , i = 1, 2, ..., n,   (5.7.6)

μ̂_z = (μ̂_z^{(1)}, ..., μ̂_z^{(n)}), z = A, B ,   (5.7.7)

and Sz = ||V_{ij}^z||, i, j = 1, 2, ..., n, z = A, B, where

V_{ij}^z = Nz^{−1} Σ_{m=1}^{Nz} (x_{mi}^z − μ̂_z^{(i)})(x_{mj}^z − μ̂_z^{(j)}) .   (5.7.8)

Linear transformation means replacing each vector observation x_j^z by the scalar Σ_{i=1}^{n} l_i x_{ji}^z. Geometrically it is equivalent to projecting each observation onto some line whose direction is collinear to the vector ℓ = (l1, ..., ln). Fig. 5.10 shows two clusters A and B in a two-dimensional space. Obviously, the projection onto the direction ℓ^{(1)} discriminates between A and B better than does the projection onto ℓ^{(2)}. The fundamental idea of R. Fisher is to choose the direction ℓ in such a way that the ratio of the square of the difference of the projected mean values on the line ℓ over the sum of the variances of the projected samples is maximal. Let us represent Fisher's ratio via the vectors of mean values and the variance-covariance matrices. Suppose that each vector observation is replaced by the projection

y_j^z = l1 x_{j1}^z + ... + ln x_{jn}^z = ℓ x_j^z ,   (5.7.9)

where ℓ is a row-vector, x_j^z is a column-vector, and ℓ x_j^z is their scalar product. The mean value in the projected sample z is

ŷ_z = Σ_{j=1}^{Nz} Σ_{k=1}^{n} l_k x_{jk}^z / Nz = ℓ μ̂_z .   (5.7.10)


Figure 5.10: Scheme illustrating the choice of the "best" scalarization

(μ̂_z is a column-vector). The variance of the projection of both samples together is given by

S = ℓ [ Σ_{z=A,B} Nz^{−1} Σ_{j=1}^{Nz} (x_j^z − μ̂_z)(x_j^z − μ̂_z)′ ] ℓ′ = ℓ (SA + SB) ℓ′ .   (5.7.11)

The notation "′" means transposition. Therefore, Fisher's optimal vector ℓ* maximizes the following expression:

D = (ℓ(μ̂_A − μ̂_B))² / (ℓ(SA + SB)ℓ′) .   (5.7.12)

The following theorem is a well-known fact from linear algebra:

Theorem 7.1 Let SA + SB be a nonsingular matrix. Then D is maximized by the vector ℓ* defined as

ℓ*′ = (SA + SB)^{−1} (μ̂_A − μ̂_B) .   (5.7.13)

The maximal value of D equals

D* = (μ̂_A − μ̂_B)′ (SA + SB)^{−1} (μ̂_A − μ̂_B) .   (5.7.14)
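For illustration, here is a small self-contained sketch of (5.7.13) in two dimensions. The two samples are made-up numbers, not data from the book, and the function names are ours:

```python
# Fisher's best discriminating direction for two 2-D samples,
# following (5.7.13): l* = (SA + SB)^(-1) (muA - muB).

def mean_and_cov(sample):
    # Componentwise means and the (1/N) covariance matrix, as in (5.7.5)-(5.7.8).
    n = len(sample)
    mu = [sum(x[i] for x in sample) / n for i in (0, 1)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in sample) / n
            for j in (0, 1)] for i in (0, 1)]
    return mu, cov

def fisher_direction(A, B):
    muA, SA = mean_and_cov(A)
    muB, SB = mean_and_cov(B)
    S = [[SA[i][j] + SB[i][j] for j in (0, 1)] for i in (0, 1)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    d = [muA[0] - muB[0], muA[1] - muB[1]]
    # 2x2 matrix inverse applied to (muA - muB)
    return [( S[1][1] * d[0] - S[0][1] * d[1]) / det,
            (-S[1][0] * d[0] + S[0][0] * d[1]) / det]

# "Good" and "unfit" objects measured on two parameters (hypothetical numbers):
A = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (1.1, 2.2)]
B = [(2.0, 1.0), (2.2, 1.2), (1.9, 0.8), (2.1, 1.1)]
l = fisher_direction(A, B)
projA = [l[0] * x + l[1] * y for x, y in A]   # projected "good" sample
projB = [l[0] * x + l[1] * y for x, y in B]   # projected "unfit" sample
```

For these two well-separated clusters, every projected A-observation ends up on one side of every projected B-observation, which is exactly the discrimination property the theorem promises.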

It is instructive to compare the result of this theorem with another approach used in mathematical statistics for the classification of objects into two groups. Let the probabilistic properties of populations A and B be described by their


density functions pA(x) and pB(x), respectively. According to the Neyman-Pearson theorem, the best inference about the membership of a given vector x0 in the population A or B must be formed on the basis of the value of the likelihood ratio

r_{A,B}(x0) = pA(x0)/pB(x0) ,   (5.7.15)

or on the value of any monotone function of r_{A,B}. Suppose that pA and pB are multidimensional normal densities with identical (nonsingular) covariance matrices V. Then, using standard manipulations, we obtain that

log r_{A,B}(x0) = x0′ V^{−1}(μA − μB) − (μA + μB)′ V^{−1}(μA − μB)/2 .   (5.7.16)

In this formula, the second term is a constant which does not depend on the actual observation vector x0, and the first term is nothing but the scalar product of the vector x0 and the vector V^{−1}(μA − μB). This vector is analogous to the vector ℓ* in Fisher's approach. Thus there is a close affinity between the Fisher and Neyman-Pearson approaches.

5.7.3 Preventive Maintenance: Multidimensional State Description

Let us describe the procedure of data processing and organizing the preventive maintenance for a system whose state description is multidimensional.
Step 1. Two samples representing "brand new" and "entirely unfit" objects are selected. It is desirable that the first sample consist of well-functioning units which have been operating for a relatively short time. The second sample contains units which have already been in operation for a longer time and, according to expert opinion, would be considered unfit or close to failed. The actual values of the parameters have to be measured for every object in the samples, and samples of type (5.7.2) must be formed. It is not expedient to restrict ourselves a priori in the number of parameters measured (i.e. in the dimension of the vector x_j^z).
Step 2. The vectors μ̂A, μ̂B, the matrices SA, SB, and (SA + SB)^{−1} are calculated, and finally the vector ℓ* defined by (5.7.13). Using this vector, each n-tuple of observations is reduced to a single value y_j^z, see (5.7.9). Now the two one-dimensional samples y_j^A, j = 1, ..., NA, and y_j^B, j = 1, ..., NB, must be investigated in order to decide whether the discrimination between A and B is satisfactory or not. Fig. 5.11 shows two examples of comparing the projected samples. If the situation is similar to the case shown in the right part of Fig. 5.11, we can say that we have succeeded in creating a one-dimensional discrimination function. If not (Fig. 5.11, left), the above discrimination technique cannot be used. We


Figure 5.11: Patterns of bad and good discrimination

suggest the following half-empirical criterion. Let σ̂A and σ̂B be the standard deviations of the projected samples A and B, respectively. Then we assume that the discrimination is good if

|ŷ_A − ŷ_B| > 2.5 (σ̂_A + σ̂_B) .   (5.7.17)

Step 3. Suppose that we are lucky and there is a good discrimination between the "good" and the "bad" objects. Fig. 5.12 shows a possible way of organizing the preventive maintenance. Consider a certain object which at the initial time instant t0 has the vector of parameters η(t0) = (η1(t0), η2(t0), ..., ηn(t0)). Until now, while speaking about discrimination, we have acted as if the parameters were "frozen"; their evolution in time has not been taken into consideration. Now recall that the ηi(t) are random processes. In the parameters' "phase space" shown in Fig. 5.10, the evolution of an object from population A into population B goes along some random path. We are interested only in the projection of the vector η(t) on the line ℓ which was chosen as the line with the best discriminating properties between A and B. Therefore, our maintenance actions are nothing but a control of the scalar process

r(t) = η1(t) l1 + η2(t) l2 + ... + ηn(t) ln .   (5.7.18)

This is justified because in the range where this parameter changes there are clearly expressed zones corresponding to "good" and "bad" objects. Apparently, it is expedient to introduce two levels for r(t), H and h, so that the situation r > H is considered a failure, the state h ≤ r(t) ≤ H as marginal, and the state r(t) < h as the good one. No maintenance actions should be undertaken while the object is in the good zone; preventive maintenance (PM) must be undertaken when the object is discovered in the marginal zone, and an emergency repair (ER) must be made if the process enters the failure zone. PM and ER mean shifting the process from its actual state to some state in the
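The three-zone decision rule just described is trivial to state in code; the threshold names h and H follow the text, while the function itself is our sketch:

```python
def maintenance_action(r, h, H):
    # r < h: good zone, do nothing; h <= r <= H: marginal zone, do
    # preventive maintenance (PM); r > H: failure zone, emergency repair (ER).
    if r < h:
        return "none"
    if r <= H:
        return "PM"
    return "ER"
```

Both PM and ER then reset the process to some value below h, after which the monitoring continues.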


Figure 5.12: Scheme of preventive maintenance based on the control of the scalar parameter r(t).

(r(t) < h)-zone.
Step 4. This step is the investigation and validation of a stochastic model for the behavior of r(t). This is probably the most difficult part of the whole maintenance procedure. First, we will need to observe sample paths of r(t). Very probably, there will be a small number of complete paths and many partial paths, since the system under observation is subjected to changes, repairs, etc. We suggest first of all trying to fit a Markov-type process. If discrete levels are introduced for the process r(t), then in fact our first choice is to fit a Markov chain to the stochastic behaviour of r(t). We will not go into the details of this investigation. Let us mention only that the book by Gertsbakh (1977) presents an example of such a statistical analysis, see Chapter 4, Section 3. The objects were hydraulic pumps used in the aircraft industry. A sample of 33 pumps was examined and classified into the two groups A and B by an expert, with 25 and 8 units, respectively. Measurements of eight physical characteristics were made, including a pressure oscillation index and spectral characteristics of vibrations and noise. The analysis of the Fisher discrimination ratio showed that for the optimal discrimination, the ratio (ŷ_A − ŷ_B)/(σ̂_A + σ̂_B) = 2.52. The range of r(t) was divided into 25 levels, where the levels 1, 2, 3 were defined as "good", the levels r > 16 as failure, and the rest as marginal.


Statistical analysis based on observing 17 pumps during 3,000 hrs of operation made it possible to construct a Markov chain model with 25 states for r(t). Two inspection periods were introduced: ∆ = 200 hrs and ∆ = 400 hrs. The inspection, maintenance and ER costs were defined as cinsp = 1, cPM = 10 and cER = 100, respectively. The optimal policy found prescribes inspecting a pump after 400 hrs if it was found in one of the states 1, 2, ..., 9, and every 200 hrs for the states 10, 11, ..., 16. The maintenance action was a shift from a marginal or failure state to one of the good states. It was also established that the optimal policy is robust with respect to changes in the cost parameters.

Chapter 6

Best Time Scale For Age Replacement

Counting time is not so important as making time count.
James J. Walker

You will never "find" time for anything. If you want time, you must make it.
Charles Buxton

6.1 Introduction

As a rule, the lifetime of any system (component) can be measured and observed in more than one time scale. There are two widely used "parallel" time scales for cars: the calendar time scale and the mileage scale. For an aircraft frame, three time scales are the most popular: the calendar time scale, the number of flight hours (time in the air) and the number of landings/takeoffs. For jet engines, the age is measured in operation hours and in operation cycles. In some cases, the choice of the most "relevant" time scale is obvious. For example, the "true" age of the car body must be measured in calendar time, and not in mileage, because the main aging factor is corrosion, which goes on in the "usual" time. On the contrary, the age of the braking system must be measured in the mileage scale, since the dominating factor is the wear of the braking disks and shoes, which develops as a function of mileage. For an aircraft undercarriage,


the lifetime must be related to the number of operation cycles (the number of landings and takeoffs). There are, however, many cases where the right choice of the most relevant time scale is not so obvious. For example, airframe failure is a result of fatigue cracks in the most heavily stressed joints. The appearance of these cracks depends on both corrosion and fatigue damage accumulation. The corrosion develops in calendar time and in time in the air, and the fatigue cracks may depend on the total time in the air and on the number of heavy-stress cycles which take place at landings and takeoffs. There are attempts to design, by technical means, so-called damage counters which create their own artificial time scale, see e.g. Kordonsky and Gertsbakh (1993). The correct choice of the "best" time scale is of crucial importance for accurate failure prediction and for planning preventive maintenance activities. Let us consider a very simple but instructive example: replacement of a cutting tool on an automatic production line (APL). It is known that the milling cutter is capable of providing the necessary accuracy during, say, 1,000-1,200 operation cycles. (Suppose that the instrument manufacturing process as well as the work of the APL are highly stable, well-controlled processes.) The APL works between 2 and 18 hours daily, depending on the market demand. One operation cycle lasts exactly 10 minutes. On the average, the APL works 10 hours daily and therefore performs, on the average, 600/10 = 60 cycles daily. Suppose that the maintenance manager decides to carry out the preventive replacement of the tool every 16 days. On the "average", this corresponds to 960 operation cycles, which seems to be a reasonable age for the replacement. But obviously, this is not a very clever decision: due to the large variations in the daily working hours, the tools will actually be replaced in the range of 200-1,700 cycles.
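This spread is easy to reproduce by simulation. The sketch below is ours, and it assumes, for illustration only, that the daily workload is uniform on 2-18 hours, at 6 cycles per hour:

```python
import random

random.seed(1)

def cycles_until_calendar_replacement(days=16, cycles_per_hour=6):
    # Accumulated operation cycles over a fixed 16-day calendar interval,
    # with the daily working time drawn uniformly between 2 and 18 hours.
    return sum(cycles_per_hour * random.uniform(2, 18) for _ in range(days))

ages = [cycles_until_calendar_replacement() for _ in range(2000)]
# The mean is close to 960 cycles, but the individual replacement
# ages scatter widely around it.
```

Under this assumed workload distribution the average replacement age is indeed about 960 cycles, while individual tools are replaced hundreds of cycles too early or too late.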
In most cases, either a good new tool will be replaced or, on the contrary, a tool which has already worn out. It would be much better to record the number of operation hours of the APL and to initiate the replacement as soon as the accumulated operation time corresponds to the "critical age" of 960 cycles (160 hours). Using more formal language, there are two time scales for the milling cutter, the operational time and the calendar time, and the relevant time scale is the operational time. Once there are at least two observable principal time scales (say, mileage and operation time), it is always possible to introduce a new time scale by considering a weighted sum of the principal time scales. In this way we arrive at the problem of choosing the best, most appropriate time scale. Kh.B. Kordonsky suggested the following definition of the "best" time scale (see Kordonsky and Gertsbakh (1993)): the best time scale provides the minimal coefficient of variation (c.v.) of the time to failure. There is a good intuitive reason behind this definition. If a system has, in a certain time scale, a lifetime whose c.v. is very small, then failure prediction in this scale will be almost certain, and maintenance actions might be carried


out very efficiently. We have already demonstrated in Chapter 4 that the c.v. is one of the main factors influencing the maintenance efficiency. In the next section we formally define the best (optimal) time scale for the family of time scales obtained as a convex combination (with nonnegative weights) of two principal time scales. In Section 6.3 we consider an example of fatigue test data and demonstrate the difference in the c.v. between various time scales. We introduce a cost criterion for age replacement for an arbitrary time scale and describe a numerical procedure for finding the optimal time scale for age replacement. In Section 6.4 this procedure is applied to the fatigue test data, and it is demonstrated that optimality in terms of the c.v. and in terms of the cost criterion for age replacement practically coincide. Several types of data are considered. For complete samples, we use a nonparametric approach to finding the optimal preventive maintenance policy based on the empirical lifetime data, see also Arunkumar (1972). For censored and quantal-type data, our approach is based on assuming a parametric lifetime distribution model and on using maximum likelihood for parameter estimation.

6.2 Finding the Optimal Time Scale

We need some notation. Time scales are denoted by calligraphic letters L, T, H. Italic capitals L, T, H and small italic letters l, t, h are used to denote random variables and nonrandom variables (constants) in the corresponding time scales, respectively. As soon as there is more than one scale, the question of which time scale is better arises immediately. Intuitively, we prefer the time scale which is able to predict the failure instant more accurately. We suggest the following:

Definition 2.1 Suppose that we have several time scales L1, L2, ..., Lk, and let the lifetimes of a particular system be L1, L2, ..., Lk, respectively. Then we say that the time scale L1 is optimal for this system if the corresponding c.v. is the smallest:

c.v.[L1] = min_{1≤i≤k} {c.v.[Li]} . #   (6.2.1)

Recall that the c.v. is defined as

c.v.[L] = (Var[L])^{1/2} / E[L] .   (6.2.2)

The c.v. defined by (6.2.2) is dimensionless; it is invariant with respect to the choice of the time unit and is relatively easy to estimate, especially if a complete sample is available. When we consider a particular problem for which there is an optimality criterion, the best time scale becomes well defined: it is the time scale which provides the optimal value of this criterion. We demonstrate further an example


for which the optimality in terms of the cost criterion for age replacement and the optimality in terms of the c.v. practically coincide. Suppose that we have two principal observable time scales L and H. Then let us consider the family of time scales

Ta = (1 − a) · L + a · H, a ∈ [0, 1] .   (6.2.3)

(6.2.3) means that if the given system has lifetimes L and H in the time scales L and H, respectively, then the lifetime in the Ta-scale is defined as

Ta = (1 − a) · L + a · H .   (6.2.4)

Theorem 2.1 Let S be a nonsingular variance-covariance matrix of the vector V = (L, H), and let M be a column vector with components (E[L], E[H]). Suppose that both components of the vector G = (g1, g2)′ = S^{−1} · M have the same sign ("′" means transpose). Then the optimal a = a*, providing the minimal c.v., is

a* = g1 / (g1 + g2) .   (6.2.5)

If g1 and g2 have opposite signs, the optimal scale is either L or H, whichever has the smaller c.v.

Proof. Finding the minimum of the c.v. is equivalent to maximizing the ratio (M′G)²/(G′SG), G ≠ 0. It is known, see e.g. Johnson and Wichern (1988), p. 64, that the maximum is attained at G = Const · S^{−1}M. The minimal c.v. is (M′S^{−1}M)^{−1/2}. From here it follows that

a* = g1/(g1 + g2), g1/g2 = (E[H]Var[L] − E[L]Cov[L,H]) / (E[L]Var[H] − E[H]Cov[L,H]) .   (6.2.6)

It is very instructive to have a geometric interpretation of the Ta - scale. This scale is obtained by projecting the realizations of a two-dimensional random variable (L, H) on a direction which is collinear to the vector b = (1 − a, a), and by multiplying them by the norm of this vector. (Recall that (6.2.4) is a scalar product of two vectors (L, H) and (1 − a, a), see also Fig. 6.1 ).
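With Theorem 2.1 in hand, the optimal mixing weight a* can be computed directly from paired observations (L, H). The sketch below applies the closed form (6.2.6) to the 30 (low-load, high-load) pairs of the fatigue data of Table 6.1 in the next section; the grid search is our own cross-check of the closed form:

```python
# (low-load cycles L, high-load cycles H) for the 30 specimens of Table 6.1
data = [
    (256800, 13500), (235800, 11600), (370150, 19250), (335100, 17500),
    (380300, 20000), (153000, 38000), (176200, 44000), (160300, 40000),
    (156000, 39000), (103000, 25000), (84000, 54400), (81000, 52300),
    (90000, 59900), (57000, 37300), (66000, 42700), (32000, 45700),
    (48000, 70400), (42000, 61500), (42000, 60600), (54000, 80400),
    (10000, 37500), (16000, 62700), (12000, 45300), (19000, 72600),
    (11000, 42000), (3000, 53900), (3750, 68550), (4250, 77950),
    (3320, 57950), (2750, 51250),
]
n = len(data)
mL = sum(l for l, h in data) / n
mH = sum(h for l, h in data) / n
vL = sum((l - mL) ** 2 for l, h in data) / n       # (1/N) variances, as in (5.7.8)
vH = sum((h - mH) ** 2 for l, h in data) / n
cLH = sum((l - mL) * (h - mH) for l, h in data) / n

def cv(a):
    # c.v. of the mixed lifetime Ta = (1-a) L + a H
    mean = (1 - a) * mL + a * mH
    var = (1 - a) ** 2 * vL + 2 * a * (1 - a) * cLH + a ** 2 * vH
    return var ** 0.5 / mean

# Closed form (6.2.6), plus a brute-force grid check of the minimum
g1 = mH * vL - mL * cLH
g2 = mL * vH - mH * cLH
a_star = g1 / (g1 + g2)
a_grid = min((i / 1000 for i in range(1001)), key=cv)
```

For these data cv(0) and cv(1) reproduce the empirical values c.v.[L] = 1.106 and c.v.[H] = 0.399 quoted in Example 1.1, and the mixed scale at a* has a still smaller c.v.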

6.3 Optimal Time Scale for Age Replacement: Fatigue Data

Example 1.1: Kordonsky's fatigue test data. A sample of 30 steel specimens was subjected to cyclic bending until fatigue failure. The sample was divided into six groups of size 5, and each group was


Figure 6.1: Ta scale is determined by the vector (1 − a, a) subjected to a two-level periodic loading consisting of 5, 000αj low-load cycles and 5, 000(1 − αj ) high-load cycles. The lifetimes in the number of low- and high-load cycles are given in the table 6.1. below. Denote by L the number of low-load cycles and by H the number of highload cycles. The empirical c.v. in both scales differ significantly : c.v.[L] = 1.106, and c.v.[H] = 0.399 . # Let us proceed with the derivation of a formula for long-run mean costs for age replacement in the Ta -scale. Denote by Fa (t) system lifetime in the time scale (6.2.4). We already know from Chapter 4 the formula for the average long-run costs: ηa (x) =

Fa (x) + (1 − Fa (x)) · d Rx . (1 − Fa (t))dt 0

(6.3.7)

(As usual, the cost of a failure is 1 and the cost of a preventive replacement at age x is d < 1; assume that these costs are measured in $.) The optimal preventive maintenance age is the value x = xa which minimizes ηa(x). In (6.3.7), Fa(t) is the system c.d.f. in the time scale Ta = (1 − a) · L + a · H, which we call the Ta-scale. An important issue is comparing costs per unit time in different time scales. ηa(x) has the dimension of $ per one unit of Ta-time. In order to be able to compare the costs for different scales, we must express them through the same time unit. We will do it by reducing the dimension of the cost criterion to $ per unit of L-time. Suppose, e.g., that in the flight time scale the value of η(x) is $100 per hour, and that in the landing time scale the costs are $300 per landing.


Table 6.1: The number of low-load and high-load cycles to fatigue failure

 i   αj     Low-load   High-load
 1   0.95   256,800     13,500
 2          235,800     11,600
 3          370,150     19,250
 4          335,100     17,500
 5          380,300     20,000
 6   0.80   153,000     38,000
 7          176,200     44,000
 8          160,300     40,000
 9          156,000     39,000
10          103,000     25,000
11   0.60    84,000     54,400
12           81,000     52,300
13           90,000     59,900
14           57,000     37,300
15           66,000     42,700
16   0.40    32,000     45,700
17           48,000     70,400
18           42,000     61,500
19           42,000     60,600
20           54,000     80,400
21   0.20    10,000     37,500
22           16,000     62,700
23           12,000     45,300
24           19,000     72,600
25           11,000     42,000
26   0.05     3,000     53,900
27            3,750     68,550
28            4,250     77,950
29            3,320     57,950
30            2,750     51,250
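The two coefficients of variation quoted in the example are easy to recompute from Table 6.1. A short sketch (it uses the population form of the standard deviation, which is what reproduces the quoted values):

```python
import math

# (low-load cycles L, high-load cycles H) for the 30 specimens of Table 6.1
data = [
    (256800, 13500), (235800, 11600), (370150, 19250), (335100, 17500),
    (380300, 20000), (153000, 38000), (176200, 44000), (160300, 40000),
    (156000, 39000), (103000, 25000), (84000, 54400), (81000, 52300),
    (90000, 59900), (57000, 37300), (66000, 42700), (32000, 45700),
    (48000, 70400), (42000, 61500), (42000, 60600), (54000, 80400),
    (10000, 37500), (16000, 62700), (12000, 45300), (19000, 72600),
    (11000, 42000), (3000, 53900), (3750, 68550), (4250, 77950),
    (3320, 57950), (2750, 51250),
]

def cv(xs):
    """Empirical coefficient of variation: std / mean."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return math.sqrt(var) / m

cv_L = cv([l for l, h in data])
cv_H = cv([h for l, h in data])
print(cv_L, cv_H)   # approximately 1.106 and 0.399
```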

To compare these costs, note that on average one landing corresponds, say, to 4.5 flight hours. Thus $300 per landing = $300 per 4.5 hours = $66.7 per hour. Here we used the fact that the average lifetime in hours m_1 and the average lifetime in landings m_2 are related as m_1 = 4.5 m_2.

In general, let the mean life of the system in the L-scale be E[L] and the mean life in the T_a-scale be E[T_a] = (1 − a)E[L] + aE[H]. Then one unit of T_a-time is equivalent to E[L]/E[T_a] units of L-time. Therefore, to bring the dimension of the expression (6.3.7) to $ per unit of L-time, we must consider the following expression:

$$\gamma_a(x) = \eta_a(x)\cdot E[T_a]/E[L]. \qquad (6.3.8)$$

Note that the factor 1/E[L] is the same for all a and can be omitted when the costs are compared for various a-values. A closer look at the expression (6.3.8) reveals that, up to the factor 1/E[L], γ_a(x) is nothing but the inverse of the previously introduced efficiency Q, see Section 4.3:

$$Q_a(x) = \frac{1}{\eta_a(x)\cdot E[T_a]} = \frac{1}{\eta_a(x)\cdot\int_0^\infty (1 - F_a(t))\,dt}. \qquad (6.3.9)$$

Suppose now that for fixed a, x_a is the optimal age:

$$\min_{x>0}\,\gamma_a(x) = \gamma_a(x_a). \qquad (6.3.10)$$


Then the optimal T_a-scale corresponds to the a = a* which provides the minimal γ_a(x_a):

$$\gamma_{a^*}(x_{a^*}) = \min_{0\le a\le 1}\,\gamma_a(x_a). \qquad (6.3.11)$$

Since η_a(x) has the dimension of $ per unit of T_a-time, and E[T_a] has the dimension of T_a-time, Q_a is dimensionless with respect to any time scale. Q_a is in fact an "internal" efficiency measure which fits any time scale; this is particularly convenient for comparing various time scales with respect to the efficiency of age replacement.

6.4

Algorithm for finding the optimal Ta

6.4.1

Complete samples

Nonparametric approach

Let us adopt the nonparametric approach, in which the c.d.f. F_a(t) in (6.3.7) is replaced by the empirical distribution function (EDF) F̂_a(t). Let l_1, ..., l_m and h_1, ..., h_m be the observed lifetimes in the L- and H-time scales, respectively. Then for fixed a we have a complete sample of T_a-times: t_i = (1 − a)l_i + ah_i, i = 1, 2, ..., m. Assume that the t_i are already ordered: t_1 < t_2 < ... < t_m. Then the EDF is (see Section 3.2)

$$\hat{F}_a(t) = \frac{i}{m},\quad t\in[t_i, t_{i+1}),\ i = 1,\dots,m-1, \qquad (6.4.12)$$

with F̂_a(t) = 0 for t < t_1 and F̂_a(t) = 1 for t ≥ t_m. The nonparametric estimate of (6.3.7) is

$$\hat{\gamma}_a(x) = (E[L])^{-1}\,\frac{[\hat{F}_a(x) + (1 - \hat{F}_a(x))\,d]\cdot\int_0^{t_m}(1 - \hat{F}_a(t))\,dt}{\int_0^x (1 - \hat{F}_a(t))\,dt}. \qquad (6.4.13)$$

The procedure for finding the optimal scale is now as follows. Fix a = a_0 and find the optimal PM age x_0 which minimizes γ̂_{a_0}(x); denote this minimal value by γ̂*_{a_0}. Repeat for a_0 := a_0 + Δa on a grid over [0, 1], and find the a* which provides the minimum of γ̂*_{a_0}:

$$\gamma^* = \min_{0\le a_0\le 1}\,\hat{\gamma}^*_{a_0}. \qquad (6.4.14)$$

It follows from the general theory that, for fixed a, as the sample size m → ∞ the optimal PM age tends with probability 1 to the true optimal PM age in the T_a-scale.
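The grid-search procedure above can be sketched as follows. This is an illustrative implementation under the stated assumption of a complete, ordered sample; the function names are ours, and candidate PM ages are taken at the observed order statistics:

```python
def edf_age_cost(ts, x_idx, d):
    """Nonparametric cost estimate (6.4.13), up to the constant factor
    1/E[L]: age replacement at x = ts[x_idx], EDF from the ordered
    sample ts of T_a-lifetimes."""
    m = len(ts)

    def surv_integral(upper_idx):
        # integral of (1 - EDF) over [0, ts[upper_idx]];
        # the EDF equals i/m on [ts[i-1], ts[i])
        s, prev = 0.0, 0.0
        for i, t in enumerate(ts[:upper_idx + 1]):
            s += (t - prev) * (1.0 - i / m)
            prev = t
        return s

    F_x = (x_idx + 1) / m                        # EDF value at ts[x_idx]
    num = (F_x + (1.0 - F_x) * d) * surv_integral(m - 1)
    return num / surv_integral(x_idx)

def best_scale(ls, hs, d, grid):
    """Search a grid of a-values for the T_a = (1-a)L + aH scale."""
    best = None
    for a in grid:
        ts = sorted((1 - a) * l + a * h for l, h in zip(ls, hs))
        cost = min(edf_age_cost(ts, i, d) for i in range(len(ts)))
        if best is None or cost < best[1]:
            best = (a, cost)
    return best
```

Applied to the data of Table 6.1 with a fine grid, a search of this kind is what produces results like those in Tables 6.2 and 6.3 below.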


Parametric approach

Let us assume that the lifetime T_a has a known parametric form, T_a ∼ F(v; θ), where F(·) has a known functional form and θ is an unknown, possibly multidimensional, parameter. Denote by f(v; θ) the corresponding density function. Our goal is to estimate θ and to find the "best" value of a. We suggest a method based on maximum likelihood, which works as follows.

(I) For each fixed a, write the likelihood function for the observations {(l_i, h_i), i = 1, ..., m}:

$$Lik = \prod_{i=1}^{m} f((1-a)l_i + a h_i;\,\theta). \qquad (6.4.15)$$

(II) Maximize the loglikelihood with respect to θ; let θ̂(a) be the MLE.

(III) Express the coefficient of variation of T_a via θ̂(a):

$$c.v.(a) = \psi(\hat{\theta}(a)). \qquad (6.4.16)$$

(IV) Put a := 0.00(0.01)1.00 and repeat steps (I)-(III). Fix the a = a* providing the minimal value of the c.v. Then the "best" time scale is T_{a*}, and the corresponding c.d.f. is F(v; θ̂; a*).
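Steps (I)-(IV) simplify considerably when, for illustration, F is taken to be a lognormal family: the MLE's of μ and σ are the sample mean and standard deviation of the log-lifetimes, and the c.v. is a monotone function of σ alone. A sketch (the lognormal choice and the names are our assumptions):

```python
import math

def lognormal_cv_scan(ls, hs, grid):
    """For each a on the grid, fit a lognormal to t_i = (1-a)l_i + a*h_i
    by MLE (mean and std of log t_i) and return the (a, sigma, cv)
    minimizing the coefficient of variation."""
    best = None
    for a in grid:
        logs = [math.log((1 - a) * l + a * h) for l, h in zip(ls, hs)]
        mu = sum(logs) / len(logs)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
        cv = math.sqrt(math.exp(sigma ** 2) - 1.0)  # c.v. of a lognormal
        if best is None or cv < best[2]:
            best = (a, sigma, cv)
    return best
```

Minimizing the c.v. is here equivalent to minimizing σ̂(a), which is the device used in Example 4.1 below.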

6.4.2

Right-censored samples

A procedure similar to that described above for complete samples can be extended to the case of right-censored L- and H-times. Suppose that the i-th item was observed until time l_i in the L-scale and was removed from observation afterwards, and that this item survived time h_i in the H-scale. Then obviously the lifetime in T_a is right-censored by t_a(i) = (1 − a)·l_i + a·h_i.

The key idea is to compute the Kaplan-Meier estimate R̂_a(t) of the survival function and to use 1 − R̂_a(t) instead of the empirical distribution function F̂_a(t) in (6.4.13). There is, however, a complication: for censored data the Kaplan-Meier estimate is usually available only on a finite interval [0, T_max], where T_max depends on the largest noncensored observation, see Chapter 3, Section 1. Typically this interval is large enough for finding the optimal replacement age, but it is not clear how to estimate the mean lifetime in T_a. We suggest using a "parametric assumption" for this purpose: plot the data on probability paper, estimate the parameters approximately from the plot which seems to produce the best fit, estimate E[T_a], and plug this estimate into (6.4.13).
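The Kaplan-Meier computation for the censored T_a-times t_a(i) can be sketched as follows (a generic, self-contained implementation; ties are handled by processing all observations at equal times together):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function R(t).
    times: observation times; events: 1 = failure, 0 = right-censored.
    Returns a list of (failure time, survival probability just after it)."""
    pairs = sorted(zip(times, events))
    n = len(pairs)
    surv, out, at_risk, i = 1.0, [], n, 0
    while i < n:
        t = pairs[i][0]
        deaths = sum(1 for tt, e in pairs if tt == t and e == 1)
        removed = sum(1 for tt, e in pairs if tt == t)
        if deaths:
            surv *= (at_risk - deaths) / at_risk
            out.append((t, surv))
        at_risk -= removed
        i += removed
    return out
```

For fixed a one would feed in times = [(1 − a)·l_i + a·h_i] with the corresponding censoring indicators, and then use 1 − R̂_a(t) in (6.4.13) as described above.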


Parametric approach

Here the situation is much easier and resembles the case of complete samples, the only difference being the form of the likelihood function:

$$Lik = \prod_{i=1}^{r} f((1-a)l_i + a h_i;\,\theta)\cdot\prod_{i=r+1}^{m}\big[1 - F((1-a)l_i + a h_i;\,\theta)\big]. \qquad (6.4.17)$$

Here i = 1, ..., r number the complete observations and i = r + 1, ..., m the right-censored ones. The rest of the computation is similar to the procedure of the previous subsection.

6.4.3

Quantal-type data: parametric approach

In real-life situations, reliability data very often come from periodic inspections, so that there are no "complete" observations at all. Below we consider an example in which the data are based on a single inspection. Suppose that an item is examined at some point (l_i, h_i) in the (L, H)-plane; the information received is binary: a failure either exists or does not. Geometrically, if the failure exists, its appearance point has coordinates in the rectangle with vertices (0, 0), (0, h_i), (l_i, 0), (l_i, h_i). In the T_a-time scale this means that the failure appeared before the instant t_a(i) = (1 − a)l_i + ah_i. We are in the situation described in Chapter 3, Section 3 as "quantal response data". Correspondingly, the likelihood function is

$$Lik = \prod_{i=1}^{r} F((1-a)l_i + a h_i;\,\theta)\cdot\prod_{i=r+1}^{m}\big(1 - F((1-a)l_i + a h_i;\,\theta)\big), \qquad (6.4.18)$$

where the items with the "yes"-response are numbered i = 1, ..., r and the items with the "no"-response are numbered i = r + 1, ..., m. The search for the "best" time scale goes exactly as described above, with the obvious change in the likelihood function. The next section presents two examples based on fatigue data.
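As a small illustration of (6.4.18), here is a sketch of the corresponding log-likelihood under the lognormal assumption that Example 4.1 will use (the names and the lognormal choice are ours):

```python
import math

def Phi(z):
    """Standard normal c.d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def quantal_loglik(inspections, responses, a, mu, sigma):
    """Log of the quantal likelihood (6.4.18) with a lognormal
    F(.; mu, sigma) in the T_a-scale.
    inspections: list of (l_i, h_i); responses: 1 = failure found, 0 = not."""
    ll = 0.0
    for (l, h), delta in zip(inspections, responses):
        t = (1.0 - a) * l + a * h
        F = Phi((math.log(t) - mu) / sigma)
        ll += math.log(F) if delta else math.log(1.0 - F)
    return ll
```

Maximizing this function over (μ, σ) for each a on a grid implements the parametric search for the "best" scale with quantal data.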

6.5

Optimal Age Replacement for Fatigue Test Data

Example 3.1 continued. The data of Example 3.1 are plotted in Fig. 6.2. The algorithm described in 6.4.1 was applied with a varying on the grid 0.0(0.02)1.0. Table 6.2 shows the optimal age replacement parameters. For small d = 0.1-0.3, the optimal replacement age coincides with the smallest observation t_1; it increases as d increases, which is typical for age replacement. For d = 0.4-0.5, the optimal age coincides with the third ordered observation t_3. For all values of d the optimal a = 0.88.

Figure 6.2: The fatigue test results on the (L, H)-plane

Most interesting is the fact that this a-value is very close to a* = 0.873, which provides the minimal c.v.! Note that the minimal c.v. equals 0.399, which is considerably smaller than the c.v. in the L- or H-scale. The last column shows that the optimal age replacement provides a considerable increase in efficiency.

How strong is the influence of the parameter a, i.e. of the choice of the time scale, on the efficiency of the age replacement? Table 6.3 gives the answer for d = 0.3. It is seen that the optimal scale with a = 0.88 is considerably more efficient than the better of the two scales L (a = 0) and H (a = 1). #

Example 4.1: fatigue cracks in the aircraft wing joints.

Table 6.2: Optimal age replacement parameters

 d     Optimal a   Optimal age xa    r   Efficiency Q
0.1      0.88         34,200         1       5.60
0.2      0.88         34,200         1       3.01
0.3      0.88         34,200         1       2.06
0.4      0.88         38,280         3       1.59
0.5      0.88         38,280         3       1.32


Table 6.3: Efficiency Q as a function of a

 a    0      0.4    0.6    0.8    0.88   1.0
 Q   1.03   1.04   1.29   1.86   2.06   1.58

Table 6.4: Cracks in the wing joints

Object i   Flight time Li [hrs]   # flights Hi   ΔR   ΔL
 1, 2              30                  15         0    0
 3, 4              40                  20         0    0
 5, 6              60                  40         0    0
 7, 8              80                  60         0    0
 9, 10            170                 129         0    0
11, 12            180                 128         0    0
13, 14            230                 140         0    0
15, 16            250                 140         0    0
17, 18            350                 170         0    0
19, 20            380                 160         0    0
21, 22            640                 290         0    0
23, 24            670                 320         0    0
25, 26            780                 260         4    0
27, 28            840                 420         0    9
29, 30            860                 430         3    5
31, 32            560                 470         3    4
33, 34            960                 360         0    5

Table 6.4 shows the results of inspecting aircraft wing joints. In each inspection, the sizes of the fatigue cracks ΔR and ΔL, in mm, have been recorded (Kordonsky and Gertsbakh 1997); L and R indicate the left and the right joint, respectively. Find the optimal time scale and the optimal age replacement policy.

The likelihood function is

$$Lik = \prod_{i=1}^{34} [F((1-a)l_i + a h_i;\,\theta)]^{\delta_i}\cdot[1 - F((1-a)l_i + a h_i;\,\theta)]^{1-\delta_i}, \qquad (6.5.19)$$

where δ_i = 1 if object i had a crack, and δ_i = 0 otherwise. Our principal assumption is that the lifetime in T_a has a lognormal distribution:

$$P(\tau_a \le x;\,\mu_a, \sigma_a) = \Phi\Big(\frac{\log x - \mu_a}{\sigma_a}\Big). \qquad (6.5.20)$$


For each a on the discrete grid 0.00(0.01)1.00, we found the corresponding MLE's of μ_a and σ_a. For the lognormal distribution, the c.v. depends only on σ_a, see Section 2.3; the minimal c.v., and therefore the "best" scale, corresponds to the minimal σ_a. The results are:

for the L-scale: a = 0, μ_a = 6.64, σ_a = 0.463;
for the H-scale: a = 1, μ_a = 5.87, σ_a = 0.262;
for the "best" scale: a = 0.852, μ_a = 6.036, σ_a = 0.197.

The best time scale is T_0.852 = 0.148·L + 0.852·H. Let d = 0.2, which is a rather high cost for inspection and PM. In the "best" scale, the optimal age for the PM is T*_0.852 = 302, and the efficiency of the optimal age replacement is Q = 1.6.

Figure 6.3: η_a*(T) for Example 4.1

Let us check the probability of crack appearance on the interval [0, 302]. It equals Φ((log 302 − 6.036)/0.197) = Φ(−1.65) ≈ 0.05. Suppose that it is necessary to guarantee a much smaller failure probability p = 0.001. Then the maintenance period x must satisfy the equation 0.001 = Φ((log x − 6.036)/0.197), or x ≈ 230. Fig. 6.3 shows the plot of η_a*(T). Taking T = 230 reduces the efficiency of the optimal age replacement to Q = 1.45. #
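The two probability computations of this example are easy to reproduce (a sketch; the inverse of the c.d.f. is obtained here by bisection):

```python
import math

def Phi(z):
    """Standard normal c.d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 6.036, 0.197          # the MLE's for the best scale T_0.852

# probability of a crack on [0, 302], 302 being the optimal PM age
p302 = Phi((math.log(302) - mu) / sigma)

# age x with failure probability 0.001: bisection on the monotone c.d.f.
lo, hi = 1.0, 302.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if Phi((math.log(mid) - mu) / sigma) < 0.001:
        lo = mid
    else:
        hi = mid
x = 0.5 * (lo + hi)
print(round(p302, 3), x)   # p302 is close to 0.05; x is near the quoted 230
```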

Chapter 7

Preventive Maintenance With Learning

What is all Knowledge too but recorded Experience, and a product of History; of which, therefore, Reasoning and Belief, no less than Action and Passion, are essential materials?
Carlyle, Essays

Learned fools are the greatest fools.
Proverb

7.1

Information Update: Learning From New Data

An important feature was missing in all preventive maintenance (PM) models considered so far: learning. Our decisions regarding the timing and type of maintenance actions were based solely on the information about the system which we had before we actually started implementing the maintenance policy. Denote the information we initially had by Data(0). Then our policy so far was the following: the decisions on PM actions during the whole system life period were determined solely on the basis of Data(0).

In reality, during system operation there is always an influx of information regarding system properties, maintenance costs, etc. The main sources of this


additional information are new laboratory test data, field data coming from similar systems, new information from the producer, observations on the system under service, etc. The right thing to do is to reconsider, in the light of this new knowledge, the decisions which were taken previously on the basis of lesser knowledge. So we arrive at preventive maintenance with learning. The formal framework for PM with learning will be the so-called "rolling horizon" model.

We consider an inspection/renewal policy of a system based on the knowledge of its lifetime distribution. Our initial knowledge, denoted Data(0), leads to the following decision: carry out inspections of the system at the instants {t_1, t_2, t_3, ...}; this sequence maximizes the reward for the given Data(0). We follow this policy only one step, i.e. during the period [0, t_1]. At the instant t_1, we reconsider all the information we have: Data(0) plus all new information received during [0, t_1]. This constitutes our new information, denoted symbolically as Data(1) = Data(0) ∪ data(1). Now we look for the optimal maintenance policy for the remaining time (starting at t_1), given Data(1). Suppose that the "best" inspection times are now {t_2*, t_3*, ...}. Carry out the first inspection at time t_2*, reconsider at t_2* the inspection policy in the light of the new data(2), check the system at the next optimal inspection instant, update the information, and so on.

The information regarding system lifetime consists of two parts: permanent and variable. The permanent part is the list of all feasible lifetime c.d.f.'s:

$$\{F_1(t), F_2(t), \dots, F_m(t)\}. \qquad (7.1.1)$$

The variable part is the collection of prior/posterior probabilities

$$\{p_1, p_2, \dots, p_m\},\quad \sum_{i=1}^{m} p_i = 1. \qquad (7.1.2)$$

These are subject to change, according to the Bayes theorem, depending on the information received during the service and maintenance process of the system. Suppose that at a certain stage of this process the variable information (7.1.2) is {p̂_1, p̂_2, ..., p̂_m}. Then our immediate decision regarding the system lifetime will be based on the assumption that the "true" lifetime has the c.d.f.

$$\hat{F}(t) = \sum_{i=1}^{m} \hat{p}_i F_i(t), \qquad (7.1.3)$$

and that the corresponding lifetime density function is

$$\hat{f}(t) = \sum_{i=1}^{m} \hat{p}_i f_i(t), \qquad (7.1.4)$$

where f_i(t) is the density corresponding to F_i(t).


Now let us describe the mechanism of updating the variable part (7.1.2) of the information. We assume that before the system started its operation, our initial (prior) information was summarized in the form of the prior distribution

$$Data(0) = \{p_i^0,\ i = 1, \dots, m\}. \qquad (7.1.5)$$

Suppose that at a certain instant we receive new information in the form of complete or censored observations of the system lifetime. For example, we receive the following laboratory test data: two system failures, at lifetimes x_1 and x_2, were observed; three other systems did not fail during the testing period [0, x_3].

In order to simplify and make tractable the formal part of our investigation, we postulate the following two properties of the information received in the course of system operation:

(I) The information influx does not depend on the properties of the system under service; in other words, it comes from "external" sources. Denote by data(k), k = 1, 2, 3, ..., the data we receive in the course of system operation at the time instants t_k, where t_1 < t_2 < t_3 < ....

(II) The {data(k), k = 1, 2, 3, ...} consist of observed values of statistically independent random variables.

Let the likelihood function for data(k) be Lik[data(k)|j] if it is assumed that the data are generated by F_j(t) (recall the material of Chapter 3, Section 3 on the likelihood function). Now we apply the Bayes theorem to recompute the prior distribution {p_i^0, i = 1, ..., m} into the posterior distribution on the basis of the new data influx. After observing data(1), the posterior distribution is

$$p_j^1 = \frac{p_j^0\cdot Lik[data(1)|j]}{\sum_{r=1}^{m} p_r^0\cdot Lik[data(1)|r]},\quad j = 1, 2, \dots, m. \qquad (7.1.6)$$

Symbolically, Data(1) = {p_j^1, j = 1, 2, ..., m}. The posterior lifetime c.d.f. after observing data(1) is

$$F^1(t) = F(t|data(1)) = \sum_{j=1}^{m} p_j^1 F_j(t), \qquad (7.1.7)$$

with the corresponding density function

$$f^1(t) \equiv f(t|data(1)) = \sum_{j=1}^{m} p_j^1 f_j(t). \qquad (7.1.8)$$

Suppose the variable information has been updated k times. Denote the corresponding posterior distribution by Data(k) = {p_j^k, j = 1, ..., m}.


After receiving the next portion of information data(k + 1), the posterior distribution is updated as follows:

$$p_j^{k+1} = \frac{p_j^k\cdot Lik[data(k+1)|j]}{\sum_{r=1}^{m} p_r^k\cdot Lik[data(k+1)|r]},\quad j = 1, 2, \dots, m. \qquad (7.1.9)$$

If our posterior distribution is {p_j^{k+1}, j = 1, 2, ..., m}, we act as if the system lifetime has the c.d.f.

$$F^{k+1}(t) = \sum_{i=1}^{m} p_i^{k+1} F_i(t). \qquad (7.1.10)$$

It follows from more advanced statistical theory that if all data come from the "true" c.d.f. F_α(t), and the amount of information goes to infinity, then the sequence of vectors {p_j^k, j = 1, ..., m} converges, as k → ∞, to the vector (0, ..., 0, 1, 0, ..., 0) with the 1 in position α; see e.g. Kullback (1959).
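The update rule (7.1.9) is a one-line computation. A minimal sketch (the two-model likelihood values below are made up for illustration):

```python
def bayes_update(priors, likelihoods):
    """One step of the update rule (7.1.9): the posterior is proportional
    to prior times the likelihood of the newly received data."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# two candidate models; data that is twice as likely under model 1
posterior = bayes_update([0.5, 0.5], [2.0, 1.0])
print(posterior)   # approximately [2/3, 1/3]

# repeated evidence favoring model 1 drives its posterior toward 1,
# illustrating the convergence statement above
p = [0.5, 0.5]
for _ in range(10):
    p = bayes_update(p, [2.0, 1.0])
```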

7.2

Maintenance - Inspection Model With No Information Update

Suppose that according to our present information the system lifetime density function is f(t). We assume that a reward c_1 is accrued for each unit of failure-free operation time, and a penalty c_2 is paid for each unit of operation after a failure has appeared. At t = t_1 the system is inspected and its true state (failed or not) is revealed. If the inspection reveals a failure, the system is repaired and brought to the "brand new" state; the cost of that action is c_f. If the inspection reveals no failure, the system is preventively repaired and also brought to the "brand new" state, for a smaller cost c_p. All rewards received at time t_1 are discounted by a factor (1 − β)^{t_1}.

Suppose that after the inspection no new information has been received. Denote by V(f; t_1) the mean discounted reward on [0, ∞) if the inspection period is t_1 and the lifetime has density f(t). Following the above description,

$$V(f; t_1) = R(f; t_1) + (1-\beta)^{t_1}\,V(f; t_1), \qquad (7.2.1)$$

where R(f; t_1) is the one-step reward:

$$R(f; t_1) = c_1\int_0^{t_1} x f(x)\,dx + c_1 t_1 (1 - F(t_1)) - c_2\int_0^{t_1}(t_1 - x) f(x)\,dx - c_f F(t_1) - c_p (1 - F(t_1)). \qquad (7.2.2)$$

Note that in order to simplify the formal treatment of our problem we adopted a simplified discounting scheme: the rewards received during the first


inspection period [0, t_1] are not discounted; the discounting factor (1 − β)^{t_1} is applied only to the future rewards.

In order to find the best time for the first inspection, let us introduce the maximal discounted reward V(f). It satisfies the following dynamic-programming-type recurrence:

$$V(f) = \max_{t>0}\,[R(f; t) + (1-\beta)^t V(f; t)]. \qquad (7.2.3)$$

Indeed, if we choose the inspection period t, our reward will be R(f; t) for the first step plus the future rewards V(f; t) discounted by the factor (1 − β)^t. Suppose that the maximum in (7.2.3) is attained at t = t*. Then the optimal sequence of inspection times is {k·t*, k = 1, 2, ...}, and V(f) satisfies V(f) = R(f; t*) + (1 − β)^{t*} V(f; t*). Thus V(f; t) = R(f; t)/(1 − (1 − β)^t) and, by the definition of V(f),

$$t^* = \mathrm{argmax}_{t>0}\,\frac{R(f; t)}{1 - (1-\beta)^t}. \qquad (7.2.4)$$

7.3

Rolling Horizon and Information Update

The optimal inspection policy with no information update is periodic, with the period t* found above. Now we implement information updating in the scheduling of the inspection times. We start the process with Data(0) = {p_i^0, i = 1, ..., m}, where p_i^0 is the prior probability that the true lifetime density is f_i(t). We plan the first inspection at t_1*:

$$t_1^* = \mathrm{argmax}_{t>0}\,\frac{R(f^0; t)}{1 - (1-\beta)^t}, \qquad (7.3.1)$$

where f^0(t) is determined by (7.1.5). At t = t_1*, new information in the form of data(1) is received. We update our prior information regarding the lifetime density via the Bayes rule (7.1.6), and plan the next inspection as if the true lifetime density were (in the light of our new knowledge)

$$f^1(t) = \sum_{j=1}^{m} p_j^1 f_j(t). \qquad (7.3.2)$$

Suppose that the second inspection takes place at the instant t_2 = t_1* + t_2*, where t_2* is found from the relationship

$$t_2^* = \mathrm{argmax}_{t>0}\,\frac{R(f^1; t)}{1 - (1-\beta)^t}. \qquad (7.3.3)$$

Update the information again, and proceed in a similar way.


A typical (k + 1)-st step of the above procedure is as follows. Suppose that the (k + 1)-st inspection is carried out at the instant t_{k+1} = t_1* + ... + t_k* + t_{k+1}*, and that the information data(k + 1) is received at t_{k+1}. Then update the information following the Bayes formula (7.1.9), and schedule the next inspection at t_{k+2} = t_{k+1} + t_{k+2}*, where

$$t_{k+2}^* = \mathrm{argmax}_{t>0}\,\frac{R(f^{(k+1)}; t)}{1 - (1-\beta)^t}. \qquad (7.3.4)$$

Here

$$f^{(k+1)}(t) = \sum_{j=1}^{m} p_j^{k+1} f_j(t). \qquad (7.3.5)$$
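The argmax searches in (7.2.4) and (7.3.4) are one-dimensional and easy to carry out numerically. Below is a sketch (grid search with midpoint-rule integration; the grid bounds and step are arbitrary choices, and the density and cost parameters are those of the numerical example at the end of this chapter):

```python
import math

def R(f, F, t1, c1=50.0, c2=10.0, cf=20.0, cp=1.0, n=500):
    """One-step mean reward (7.2.2); the two integrals are evaluated
    with the midpoint rule on n subintervals of [0, t1]."""
    dx = t1 / n
    xs = [dx * (k + 0.5) for k in range(n)]
    i1 = sum(x * f(x) for x in xs) * dx
    i2 = sum((t1 - x) * f(x) for x in xs) * dx
    return (c1 * i1 + c1 * t1 * (1.0 - F(t1))
            - c2 * i2 - cf * F(t1) - cp * (1.0 - F(t1)))

def t_star(f, F, beta=0.1):
    """Optimal inspection period (7.2.4), by grid search over (0, 2]."""
    grid = [0.01 * k for k in range(1, 201)]
    return max(grid, key=lambda t: R(f, F, t) / (1.0 - (1.0 - beta) ** t))

# equal-weight mixture of the three candidate models of the numerical example
f0 = lambda t: (math.exp(-t) + 2.0 * t * math.exp(-t * t)
                + 0.5 * t * math.exp(-0.25 * t * t)) / 3.0
F0 = lambda t: 1.0 - (math.exp(-t) + math.exp(-t * t)
                      + math.exp(-0.25 * t * t)) / 3.0
print(t_star(f0, F0))   # the text reports t1* = 0.256 for this input
```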

7.4

Actual and ”Ideal” Rewards

Suppose that the "true" c.d.f. is F_α(t), where α can in principle be any one of the indices 1, 2, ..., m. Let us derive an expression for the actual total reward. Suppose that using the rolling horizon policy we obtained the sequence of inspection times t_1 = t_1*, t_2 = t_1* + t_2*, ..., t_k = t_1* + ... + t_k*, .... Then the total accumulated actual reward is

$$V_\alpha = R(f_\alpha; t_1^*) + (1-\beta)^{t_1}\,R(f_\alpha; t_2^*) + (1-\beta)^{t_2}\,R(f_\alpha; t_3^*) + \dots, \qquad (7.4.1)$$

where R(f_α; t_1*) is the one-step mean reward on [0, t_1*] computed for the lifetime density f_α(·). It is interesting to compare the actual reward with the maximal reward V_max which would have been received if the true c.d.f. F_α had been known already at t = 0. Obviously,

$$V_{max} = \max_{t>0}\,[R(f_\alpha; t) + (1-\beta)^t V_{max}]. \qquad (7.4.2)$$

Let us characterize the relative efficiency of the rolling horizon policy by the ratio

$$\eta = \frac{V_\alpha}{V_{max}}. \qquad (7.4.3)$$

7.5

Numerical Example

Our initial information regarding the lifetime is the following: F(t) may be any one of three c.d.f.'s:

F_1(x) = 1 − e^{−x}, the exponential distribution;
F_2(x) = 1 − e^{−x^2}, the Weibull distribution with λ = 1, β = 2;
F_3(x) = 1 − e^{−0.25x^2}, the Weibull distribution with λ = 0.5, β = 2.

To get a better feeling for them, note that F_1 has mean 1, F_2 has mean 0.886, and F_3 has mean 1.77. The respective density functions are f_1(x) = e^{−x}, f_2(x) = 2x e^{−x^2}, and f_3(x) = 0.5x e^{−0.25x^2}.

Our Data(0) is summarized in the vector of prior probabilities Data(0) = p^0 = (1/3, 1/3, 1/3); we assign, therefore, equal probabilities to each of the three possibilities. The reward (cost) parameters are c_1 = 50, c_2 = 10, c_f = 20, c_p = 1, and the discount factor is β = 0.1. We start with the lifetime density

$$f^0(t) = \sum_{i=1}^{3} p_i^0 f_i(t) = \frac{1}{3}\big(e^{-t} + 2t e^{-t^2} + 0.5\,t e^{-0.25 t^2}\big). \qquad (7.5.1)$$

The relationship (7.2.4) gives t_1* = 0.256. Suppose that no new information has been received by the instant t_1 = t_1* = 0.256; it is then decided to carry out the next inspection at t_2 = 0.256 + 0.256 = 0.512. Again there is no new information, and new data become available only at the instant t_4 = 4 · 0.256 = 1.024. So data(1), data(2), data(3) were empty, while data(4) has the form of three observed lifetimes: data(4) = {x_1 = 0.904, x_2 = 1.222, x_3 = 0.921}. These are complete observations, so the expression for the likelihood is

$$Lik[data(4)|j] = \prod_{i=1}^{3} f_j(x_i). \qquad (7.5.2)$$

Using the Bayes formula (7.1.6) we recompute the prior probabilities and obtain the updated information p^4 = (0.105, 0.767, 0.128); we write the upper index 4 since this vector is obtained at t_4. Now our starting point is the lifetime density

$$f^4(t) = \sum_{i=1}^{3} p_i^4 f_i(t) = 0.105\,e^{-t} + 0.767\cdot 2t e^{-t^2} + 0.128\cdot 0.5\,t e^{-0.25 t^2}.$$

Using (7.2.4) we get t_5* = 0.219. At t_5 = 1.024 + 0.219 = 1.243 no new data arrive, so we apply the inspection period 0.219 once again; at t_6 = 1.243 + 0.219 = 1.462 we receive a new portion of information consisting of three noncensored observations: data(6) = {x_1 = 0.971, x_2 = 1.149, x_3 = 0.246}. Using the Bayes formula we obtain p^6 = (0.056, 0.931, 0.013). It is decided now that the "true" c.d.f. is F_2, i.e. W(λ = 1, β = 2), and we act further as if the prior probabilities were p = (0, 1, 0). (The data above were in fact randomly generated from the population F_2; there is a probability of about 0.07 that our choice of the "true" lifetime is wrong.)


If the true c.d.f. were F_2, then (7.2.4) would give t* = 0.207, so on the interval [t_6, ∞) the inspection period is constant and equal to 0.207. Suppose we knew from the very beginning that the true lifetime is F_2. Then we would use this period from the start, and by (7.2.3) the mean reward would have been V_max = V(f_2) = max_{t>0}[R(f_2; t) + 0.9^t V(f_2; t)] = 388.3.

Let us compute by (7.4.1) the mean actual reward, which corresponds to the "true" lifetime F_2 and the actual nonoptimal choice of the inspection periods. Calculating R(f_2; t) by (7.2.2) for t = 0.256, 0.219, 0.207, we obtain R(f_2; 0.256) = 10.3, R(f_2; 0.219) = 8.7, R(f_2; 0.207) = 8.3. Calculation by (7.4.1) gives V_α = V_2 = 384.9. This number is surprisingly close to V_max! By (7.4.3), η = 0.99. #
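The posterior p^4 of this example can be reproduced directly from (7.1.6) and (7.5.2). A sketch (with equal priors the factor 1/3 cancels, but it is kept for clarity):

```python
import math

# the three candidate densities of this numerical example
f1 = lambda x: math.exp(-x)
f2 = lambda x: 2.0 * x * math.exp(-x * x)
f3 = lambda x: 0.5 * x * math.exp(-0.25 * x * x)

priors = [1 / 3, 1 / 3, 1 / 3]
data4 = [0.904, 1.222, 0.921]      # data(4): three complete observations

def lik(f):
    """Complete-sample likelihood (7.5.2)."""
    prod = 1.0
    for x in data4:
        prod *= f(x)
    return prod

joint = [p * lik(f) for p, f in zip(priors, (f1, f2, f3))]
posterior = [j / sum(joint) for j in joint]
print([round(p, 3) for p in posterior])   # [0.105, 0.767, 0.128]
```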

7.6

Exercises to Chapters 5,6,7

1. A system has three states, denoted 1, 2 and 3. The transitions between these states occur according to a Markov chain with transition probabilities P_12 = 0.3, P_13 = 0.7; P_21 = 0.1, P_23 = 0.9; P_32 = 1. The system stays in states 1 and 3 exactly one hour, after which a transition occurs; in state 2 the system stays a random time τ ∼ Exp(λ = 0.5). Each transition gives a reward of $1. Find the mean reward per unit time for this system.

2, a. Consider a multiline system as described in Section 2 of Chapter 5. The time to failure τ of a line has an arbitrary c.d.f. F(t) and density function f(t). Assume that for each line a unit of operational time gives a reward c_rew and each unit of idle time gives a negative reward −c_f. The system is totally renewed, for a cost c_rep, after k lines out of n have failed. Derive an expression for the reward per unit time.
Hint. First derive the d.f.'s of the order statistics τ_(1), τ_(2), ..., τ_(n).

2, b. Assume that n = 4, the lifetime of a line has the c.d.f. W(λ = 0.25, β = 1.5), c_rew = 1, c_f = 0.1 and c_rep = 2. Find numerically the optimal k.

3. Ten hydraulic high-pressure hoses were installed on operating aircraft. The hoses were periodically inspected and defined as failed after a microcrack has


been detected. The data are as follows:

 i   hours to failure   flights to failure
 1        3,700                920
 2        3,900              1,020
 3        4,200              1,200
 4        2,700              1,370
 5        3,100              1,540
 6        1,950              1,520
 7        2,100              1,630
 8        2,300              1,660
 9        2,700              1,760
10        2,800              1,840

Consider nonparametric age replacement in the time scales H (hours), F (number of flights), and in the "best" combined scale T_a = (1 − a)H + aF which provides the smallest value of the c.v. Take a = 0.0(0.05)1.00. Assume that the ER cost is c_ER = 1 and the PM cost is d = 0.2. Compare the costs of the optimal age replacement in the scales H, F and in the best combined scale.

4. Consider the following modification of age replacement: replace at age t_p, the p-quantile of the c.d.f. F(t). Write a formula for the cost per unit time. Try to simplify it by assuming that 1 − F(t) changes linearly on [0, t_p].

5. Optimal control of spare parts. A system has one main part and r − 1 spare parts, and has to work during the time period [0, m + 1]. Any part which is put in work for a unit-time period has a constant probability p of failing during this period; once a part has failed it is never recovered. At any instant t = k it is possible to put in work any number of the parts available: for example, if at the instant k there are 5 nonfailed units, then any number i, 1 ≤ i ≤ 5, can be put in work for the period [k, k + 1]. If all parts put in work for a period [s, s + 1] have failed, this is considered a system failure. Find the optimal policy of putting the spare parts to work which minimizes the probability of system failure on the period [0, m + 1].
Hint. Define the random process ξ_k as the number of nonfailed parts at the instant t = k. Put a unit cost on a transition into state 0, and zero cost on all other transitions. Denote by W_i(m) the minimal average cost for a system which has to work m units of time and initially has i nonfailed parts. Write a dynamic-programming-type recurrence relation expressing W_j(m + 1) through W_j(m). Argue that W_j(m) equals the probability of system failure on [0, m]. For more details see Gertsbakh (1977), p. 147.


6. Suppose that p_j^0 is the prior probability that the "true" density is f_j(t), j = 1, 2, ..., m, and that the following data, in the form of a complete sample, have been received: data = {x_1, ..., x_{n−1}, x_n}. Then the posterior probabilities are, according to (7.1.6),

$$p_j^1 = \frac{p_j^0\prod_{i=1}^{n} f_j(x_i)}{\sum_{k=1}^{m} p_k^0\prod_{i=1}^{n} f_k(x_i)},\quad j = 1, \dots, m.$$

Suppose now that the data come in portions: the first portion is data(1) = {x_1, ..., x_{n−1}} and the second is data(2) = {x_n}. Then we can act as follows. First, compute the posterior probabilities from data(1):

$$\hat{p}_j^1 = \frac{p_j^0\prod_{i=1}^{n-1} f_j(x_i)}{\sum_{k=1}^{m} p_k^0\prod_{i=1}^{n-1} f_k(x_i)},\quad j = 1, \dots, m.$$

Second, view these probabilities as priors and recompute the posterior as

$$\hat{p}_j^2 = \frac{\hat{p}_j^1\,f_j(x_n)}{\sum_{k=1}^{m}\hat{p}_k^1\,f_k(x_n)}.$$

Show that p̂_j^2 = p_j^1 for all j = 1, ..., m.

7. Optimal inspection-repair policy of a partially renewed system. A system has three states: 0, 1 and 2. State 0 corresponds to a new system, state 1 is a "dangerous" state, state 2 is the failure state. System evolution in time is described by a continuous-time Markov process ξ(t) with time-dependent transition rates λ_01(t) = t and λ_12(t) = t; all other transition rates equal zero. The system starts operating from state 0. The nearest inspection is planned after time T. If the inspection reveals state 1, ξ(t) is shifted into state 0, and a new inspection is planned after time T. If the inspection reveals failure, the process stops. The following costs are introduced: the inspection cost is c_ins = 1, the repair (shift) cost is c_rep = 2, each unit of time the system spends in the failure state costs B = 5, and the cost of failure is A_f = 10.
a. Derive the transition probabilities for ξ(t);
b. Derive the recurrence relation for the minimal costs, for finding the optimal inspection period T*.
Hint.
Follow the reasoning of subsection 5.4.4 to derive the following system of differential equations for P_0i(t) = P(ξ(t) = i | ξ(0) = 0):

$$P_{00}'(t) = -\lambda_{01}(t)\,P_{00}(t),$$
$$P_{01}'(t) = \lambda_{01}(t)\,P_{00}(t) - \lambda_{12}(t)\,P_{01}(t).$$


Solve this system under the initial conditions P_00(0) = 1, P_01(0) = 0, and note that P_02(t) = 1 − P_00(t) − P_01(t).

8. Two-state continuous-time Markov process. A system has two states, 0 and 1. The transition rate from 0 to 1 is λ; the transition rate from 1 to 0 is μ. The system is in state 0 at time t = 0. Denote by P_0(t), P_1(t) the probabilities that the system is in state 0 or 1, respectively, at time t. Use the Laplace transform technique to find P_0(t) and P_1(t), and investigate their behavior as t → ∞.


CHAPTER 7. MAINTENANCE WITH LEARNING

Solutions to Exercises

The real danger is not that machines will begin to think like men, but that men will begin to think like machines.
Sydney J. Harris

Chapter 1
1, a. There are three minimal paths: {1, 2}, {1, 3}, {4}. By (1.1.4), the structure function is
φ(x) = 1 − (1 − x1x2)(1 − x1x3)(1 − x4).
1, b. Multiply the expressions in the first two brackets and use the fact that x1² = x1. Then
φ(X) = 1 − (1 − X1X2 − X1X3 + X1X2X3)(1 − X4).
Now r0 = E[φ(X)] = 1 − (1 − p1p2 − p1p3 + p1p2p3)(1 − p4) = ψ(p1, p2, p3, p4).
1, c. The upper bound is the reliability of a parallel system with independent paths {1, 2}, {1*, 3}, {4}. By (1.3.10),
φUB(p) = 1 − (1 − p1p2)(1 − p1p3)(1 − p4).
Similarly, the lower bound is the reliability of a series connection of the two parallel systems which correspond to the minimal cuts {1, 4} and {2, 3, 4*}. Thus
φLB(p) = (1 − (1 − p1)(1 − p4))(1 − (1 − p2)(1 − p3)(1 − p4)).
1, d. Substitute pi = e^{−λi t} into the expression for r0 and denote the result by r0(t). This is the system reliability; the lifetime distribution function is F(t) = 1 − r0(t).
1, e. The stationary availability of element i is Ai = µi/(µi + νi). According to the data, A1 = 0.815, A2 = 0.8, A3 = 0.788, A4 = 0.778. By (1.2.12) and (1.2.16), system availability is obtained by substituting Ai for pi in the expression for ψ(p1, ..., p4). The result is Av = 0.951.
2. Differentiate ψ(·) found in 1, b with respect to each pi. The results are:


R1 = (1 − p4)(p2 + p3 − p2p3); R2 = (1 − p4)p1(1 − p3); R3 = (1 − p4)p1(1 − p2); R4 = 1 − p1p2 − p1p3 + p1p2p3.
Substituting the pi values gives R1 = 0.45, R2 = 0.225, R3 = 0.09, R4 = 0.19. Element 1 therefore has the highest importance.
3. Consider any vector x. Its i-th coordinate is either 1 or 0. In the first case the left-hand side is φ(1i, x), which equals the right-hand side; in the second case the left-hand side is φ(0i, x), and again it equals the right-hand side.

4. The mean lifetime of the whole system is 1/n + 1/(n − 1) + ... + 1. The easiest way to obtain this result is the following. The first failure takes place after time τmin = min(τ1, ..., τn), where τi ∼ Exp(λ = 1). From here it follows that τmin ∼ Exp(n) and E[τmin] = 1/n. After the first failure the situation repeats itself, with n replaced by n − 1. Check n = 3: the mean lifetime of the parallel system is 1/3 + 1/2 + 1 = 11/6. Already for n = 4 we have 1/4 + 1/3 + 1/2 + 1 = 25/12. The answer: n ≥ 4.
5. Follows directly from (1.3.4).
6. From the expression just before (1.2.11), the exact reliability equals r0 = 2p² + 2p³ − 5p⁴ + 2p⁵. The upper bound is r* = 1 − (1 − p²)²(1 − p³)², because there are two minimal paths with two elements and two minimal paths with three elements. The lower bound is r_* = [1 − (1 − p)²]²[1 − (1 − p)³]², since there are two minimal cuts of size two and two minimal cuts of size three. The numerical results are:

p      r_*        r0         r*
0.90   0.978141   0.978480   0.997349
0.95   0.994758   0.994781   0.999807
0.99   0.999798   0.999798   1.00000

7. For a series system, ∂r0/∂pi = (∏_{j=1}^n pj)/pi, see (1.2.4). Thus the most important component has the smallest pi, i.e. it is the first one. Similarly, for a parallel system (see (1.2.5)), the most important component has the largest pi, i.e. it is the n-th component.
9. a, b. The radar system is a so-called 2-out-of-3 system. It is operational if either two stations are up and one is down, or all three stations are up. Therefore its reliability is
R(t) = 3(1 − G(t))²G(t) + (1 − G(t))³ = 3e^{−2λt}(1 − e^{−λt}) + e^{−3λt}.
After simple algebra, R(t) = 3e^{−2λt} − 2e^{−3λt}. To obtain the mean up time, use (1.3.12). It gives E[τ] = 5/(6λ). By

(1.2.11), the stationary availability is Av = E[τ]/(E[τ] + trep) = 0.893.
10. Obviously, φ(0, 0) = 0 and φ(1, 0) = φ(0, 1) = 1. Due to the error in designing the power supply system, φ(1, 1) = 0, and thus the system is not monotone; compare with Definition 1.1. Note, e.g., that here {1} is a minimal path set, but {1, 2} is not a path set.
Chapter 2
1. τs = τ1 + ... + τ5, where the τi are the lifetimes of the units. The result follows from Corollary 1.1.
2. By (2.2.3), h(t) = 1/(1 − t) for t ∈ [0, 1), since the density of τ is f(t) = 1 and F(t) = t.
3. The c.d.f. of the system is Fs(t) = (1 − e^{−5t})(1 − e^{−t}), see (1.3.5) in Chapter 1. Let us investigate the failure rate using Mathematica. The printout shows the c.d.f. "F", the density "f", the failure rate "h" and its plot. Obviously, h(t) is not a monotone function. It is easy to establish that lim_{t→0} h(t) = 0 and lim_{t→∞} h(t) = 1.
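The claimed behavior of h(t) can also be verified with a short Python sketch (an alternative to the Mathematica printout; the step size of the numerical derivative is an arbitrary choice):

```python
import math

def F(t):
    # system c.d.f. from the solution of Exercise 3
    return (1 - math.exp(-5*t)) * (1 - math.exp(-t))

def h(t, eps=1e-6):
    # failure rate h = f/(1 - F), with the density f obtained
    # by a central difference
    f = (F(t + eps) - F(t - eps)) / (2*eps)
    return f / (1 - F(t))

# h(t) rises from 0, overshoots 1, then decreases back toward 1
for t in (0.1, 0.5, 2.0, 8.0):
    print(t, h(t))
```

The printed values confirm that h(t) exceeds 1 (near t ≈ 0.5) and then decreases toward its limit 1, so h is not monotone.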

5. Here it is easy to obtain an analytic solution: f(t) = e^{−5t} + 0.8e^{−t}. After simple algebra, h(t) = 1 + 0.8/(0.2 + 0.8e^{4t}). Clearly, h(t) decreases as t increases.

6, a. The coefficient of variation is c.v. = sqrt(Var[τ])/E[τ] = 0.4. Since the solution involves the Gamma function, see (2.3.14), let us use Mathematica. The computation details are seen from the printout. The results are λ = 0.889 and β = 2.696.

6, b. By (2.3.7), sqrt(e^{σ²} − 1) = 0.4. From here it follows that σ = 0.385. To find µ, we have to solve the equation 1 = exp(µ + 0.385²/2). The solution is µ = −0.0741.
7. The group of two elements in parallel has c.d.f. F_{b,c}(t) = (1 − e^{−2t})², see (1.3.5). The failure rate of this group is h_{b,c}(t) = 4(1 − e^{−2t})e^{−2t}/(1 − F_{b,c}(t)). The failure rate of element a is h_a(t) = 2t, see (2.3.11). The whole system has failure rate h_s(t) = h_a(t) + h_{b,c}(t).
8. By (2.3.12), the mean lifetime of a is µ_a = Γ(1.5) = 0.886. So its reliability is at least that of an exponential element with λ_a = 1/0.886. By Corollary 2.6, the lower bound on system reliability is ψ_L(p_a, p_b, p_c) = p_a(1 − (1 − p_b)²), where p_a = exp(−t/µ_a) and p_b = exp(−2t). This bound is valid for t ∈ [0, min(0.886, 0.5)] = [0, 0.5].
9. Substitute h(u) from (2.2.3) into (2.2.5). Represent f(u)du/(1 − F(u)) as −d log(1 − F(u)). Then integrate.
10. Consider the Laplace transform of S:

L(S) = E[e^{−zS}] = Σ_{k=1}^∞ P(K = k) E[e^{−z(Y1+...+YK)} | K = k]
     = Σ_{k=1}^∞ (1 − p)^{k−1} p · E[e^{−z(Y1+...+Yk)}]
     = Σ_{k=1}^∞ (1 − p)^{k−1} p · (E[e^{−zY1}])^k
     = Σ_{k=1}^∞ (1 − p)^{k−1} p · (λ/(λ + z))^k = (after some algebra) = pλ/(z + pλ).

Thus L(S) = pλ/(z + pλ), which means that S ∼ Exp(p/µ), where 1/µ = λ; that is, S is exponential with rate pλ.
11. P(X ≤ t) = P(log τ ≤ t) = P(τ ≤ e^t) = 1 − exp[−(λe^t)^β].

Now substitute λ = e^{−a} and β = 1/b: P(X ≤ t) = 1 − exp[−e^{(t−a)/b}].
12. By (1.3.4), P(τ ≤ t) = 1 − ∏_{i=1}^n e^{−(λi t)^β} = 1 − exp[−(λ0 t)^β], where λ0 = (Σ_{i=1}^n λi^β)^{1/β}.
13. A bridge has two minimal cuts of size 2, {1, 2} and {4, 5}. This explains the expression for g(θ).
15, a. For a series system, F(t) = 1 − exp[−λ1 t − (0.906t)⁴]. The mean of τ2 is Γ[1 + 1/4]/0.906 = 1. The system lifetime density function is
f(t) = exp[−λ1 t − (0.906t)⁴](λ1 + 0.906⁴ · 4t³).
It is difficult to analyze the behavior of f(t) analytically; the graphical investigation provides interesting results, as shown below.
15, b. The printout shows the graphs of f(t) together with the exponential density of τ1 (bold), for various values of λ1. It is seen that for λ1 ≥ 2 the graph of f(t) almost coincides with the exponential density.

When λ1 is large, i.e. when the mean value of the exponential component is small, the exponential becomes the dominant component of the density function, because the Weibull density is close to zero for small values of t. On the contrary, for small λ1 values the Weibull component prevails and f(t) has a clearly expressed mode near its mean value.
Chapter 3
1. According to (3.2.2), the 0.1-quantile of τ1 is the root of the equation 1 − e^{−(λ t_{0.1})^β} = 0.1. After some algebra, t_{0.1} = λ^{−1}(−log 0.9)^{1/β}. Substituting the values of λ and β from the solution of 6, a, Chapter 2, the result is t_{0.1} = 0.488. Denote by q_{0.1} the 0.1-quantile of τ2. It is the root of the equation Φ((log q_{0.1} − µ)/σ) = 0.1. From here (log q_{0.1} − µ)/σ = Φ^{−1}(0.1) = −1.2816. Using the result of Exercise 6, b, Chapter 2, q_{0.1} = exp(µ − 1.2816σ) = 0.567.
2, c. Let us carry out a straightforward maximization of the likelihood function using Mathematica. Let t1, ..., t10 be the observed lifetime values in increasing order. There are three observations censored at t10 = 3. The likelihood function for the Weibull case is, according to (3.3.9),
Lik = ∏_{i=1}^{10} (λ^β β t_i^{β−1} exp[−(λ t_i)^β]) · exp[−3(λ t10)^β].
Simplify this expression by taking its logarithm. The contour plot of "logLik" shows a clear maximum point in the neighborhood of λ = 0.4, β = 1.4. The operator "FindMinimum" applied to the negative of "logLik" finds λ̂ = 0.44 and β̂ = 1.42.
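Both quantiles in the solution of Exercise 1 can be reproduced with a few lines of Python (the parameter values are those found in Exercises 6a and 6b of Chapter 2):

```python
import math

# 0.1-quantile of the Weibull lifetime (lambda = 0.889, beta = 2.696):
lam, beta = 0.889, 2.696
t01 = (1/lam) * (-math.log(0.9))**(1/beta)

# 0.1-quantile of the lognormal lifetime (mu = -0.0741, sigma = 0.385);
# Phi^{-1}(0.1) = -1.2816:
mu, sigma = -0.0741, 0.385
q01 = math.exp(mu - 1.2816*sigma)

print(round(t01, 3), round(q01, 3))  # → 0.488 0.567
```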

3. Here we have quantal-type data. From (3.3.12),
Lik = (1 − e^{−(2λ)^β})⁴ · (e^{−(2λ)^β})⁶.
Substitute λ = 0.4. Then logLik = 4 log(1 − e^{−(0.8)^β}) − 6(0.8)^β. The equation ∂logLik/∂β = 0 leads, after some algebra, to the equation e^{(0.8)^β} − 1 = 1/1.5. Its solution is β̂ = 3.01.
4. The likelihood is Lik = λ^n · exp[−λ Σ_{i=1}^n (ti − a)]. In this expression, a ≤ t(1) = min(t1, t2, ..., tn). Thus the largest possible value for a is t(1), the smallest observed lifetime. It is seen from the expression for Lik that it is maximized by â = t(1). Proceed in the usual way and derive that the MLE of λ is λ̂ = n/Σ_{i=1}^n (ti − â).
5. The likelihood function is
Lik = [F(T; α, β)]^{k1} · [F(2T; α, β) − F(T; α, β)]^{k2} · [1 − F(2T; α, β)]^{n−k1−k2}.
6, b. The likelihood function is

Lik = β⁵ ∏_{i=1}^5 d_i^{β−1} · exp[−Σ_{i=1}^6 d_i^β],

where d6 = T − t6. The printout shows the graph of the logarithm of the likelihood (denoted "logLik") with a maximum near 1.4, see Out[2]. The "FindMinimum" operator applied to −"logLik" finds the MLE β̂ = 1.312, see Out[3].
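The root-finding step in the solution of Exercise 3 above needs no computer algebra system; a bisection sketch in Python (the bracketing interval is an arbitrary choice):

```python
import math

def g(beta):
    # the derivative condition of Exercise 3 reduces to
    # exp(0.8**beta) - 1 = 1/1.5
    return math.exp(0.8**beta) - 1.0 - 1.0/1.5

# g is decreasing in beta; simple bisection on a bracketing interval
lo, hi = 0.1, 10.0
for _ in range(100):
    mid = (lo + hi)/2
    if g(lo)*g(mid) <= 0:
        hi = mid
    else:
        lo = mid
beta_hat = (lo + hi)/2
print(round(beta_hat, 2))  # → 3.01
```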


7, b. The hazard plot is presented in the printout below. It seems plausible that the true hazard plot is a convex function.

Chapter 4
1, a. We have two equations: Φ((log 2 − µ)/σ) = 8/127 = 0.063 and Φ((log 5 − µ)/σ) = 58/127 = 0.457. Since Φ^{−1}(0.063) = −1.53 and Φ^{−1}(0.457) = −0.108, we have log 2 = µ − 1.53σ and log 5 = µ − 0.108σ. The solution is µ̂ = 1.678, σ̂ = 0.644. By (2.3.5) the mean of the lognormal distribution is exp[µ̂ + σ̂²/2] = 6.59.
1, b. By (4.2.11), the cost for age replacement at age T = 3 is

ηage(3) = [3500 · F(3) + 1000 · (1 − F(3))] / ∫_0^3 (1 − F(t))dt.

The denominator of this expression can also be represented (integrating by parts) as

3(1 − F(3)) + ∫_0^3 t f(t)dt,

where f(t) = (sqrt(2π) σ t)^{−1} exp[−(log t − µ)²/(2σ²)] is the lognormal density. F(3) = Φ((log 3 − µ̂)/σ̂) = 0.184, and the result of the integration is 0.4039. Now
ηage(3) = (0.184 · 3500 + 0.816 · 1000)/(3 · 0.816 + 0.4039) = 512.
Suppose that we replace the compressor only at failure. Then we pay, on average, 3,500 $ per 6.59 years, or 531 $ yearly. This is only about 4% higher than the cost of age replacement at T = 3.
2. The mean of Gamma(k = 7, λ = 1) is µ1 = k/λ = 7, see (2.1.17). The mean of Gamma(k = 4, λ = 0.5) is µ2 = 8. Thus strategy (i) costs 2 · 1500/µ1 = 429 $ and 2 · 1500/µ2 = 375 $ for the first and second c.d.f., respectively.
Strategy (ii) means that we replace both parts, on average, once per interval of length µmin = E[min(τ1, τ2)], where the τi are either ∼ Gamma(7, 1) or ∼ Gamma(4, 0.5). For the first distribution, numerical integration shows that µmin = 5.53; for the second, µmin = 5.81. Thus strategy (ii) costs 2000/5.53 = 362 $ and 2000/5.81 = 344 $ for the first and second c.d.f., respectively. Strategy (ii) therefore always costs less than strategy (i).
3. The formula for the average cost of age replacement is (4.2.11). We have to compare ce = 20 with ce = 5. The printout below shows two graphs: ηage(T) for ce = 20 (the upper curve) and for ce = 5 (the lower curve). The choice is between the optimal age for the upper curve, T1* = 0.36, and T2* = 0.54 for the lower curve. The worst-case cost for T1* is about 3.67, while the worst-case cost for T2* is about 4.80. I would prefer T1* = 0.36.
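The computation of ηage(3) in Exercise 1b can be mirrored in Python (Simpson's rule replaces the symbolic integration; Φ is expressed through the error function):

```python
import math

mu, sigma = 1.678, 0.644   # lognormal parameters from 1a

def Phi(x):
    # standard normal c.d.f. via the error function
    return 0.5*(1 + math.erf(x/math.sqrt(2)))

def F(t):
    # lognormal c.d.f.; F(0+) = 0
    if t <= 0:
        return 0.0
    return Phi((math.log(t) - mu)/sigma)

def surv_integral(T, n=2000):
    # Simpson's rule for the denominator  int_0^T (1 - F(t)) dt
    h = T/n
    s = (1 - F(0)) + (1 - F(T))
    for i in range(1, n):
        s += (4 if i % 2 else 2) * (1 - F(i*h))
    return s*h/3

eta3 = (3500*F(3) + 1000*(1 - F(3))) / surv_integral(3)
print(round(eta3))  # → 512
```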


4. We have a block replacement with random period T. The corresponding expression for the cost criterion is

η = ∫_a^b f(T)(100 m(T) + 5)dT / E[T].

Since T ∼ N(µ = 1, σ = 0.1), we may take a = µ − 4σ = 0.6 and b = µ + 4σ = 1.4. The denominator equals µ = 1. Let us use the approximation to m(T) based on the formula given in Example 1.2, Chapter 4: m(T) ≈ Σ_{n=1}^m Gamma(T; nk, λ). It is enough in our case to take m = 4. The computation results are presented in the printout below; "lb" is the approximation to the renewal function from below, "ub" from above. The result is η = 6.96089. The graph shows the plots of ηb(T) with m(T) replaced by the upper and lower bounds, respectively ("UB" and "LB"). It is seen that in the neighborhood of the minimum point (near T* = 1) both curves coincide. It is worth noting that in the absence of periodic replacement the cost would be 100 $/E[τ] = 25, since the mean lifetime of the street lamp is E[τ] = k/λ = 4.

5. First, find β and λ. From (2.3.14) it follows that β = 3.714; from (2.3.12) we obtain λ = 0.000451. The formula for the stationary availability is

Av(T) = ∫_0^T (1 − F(t; λ, β))dt / [ ∫_0^T (1 − F(t; λ, β))dt + 50 F(T; λ, β) + 10(1 − F(T; λ, β)) ].

After simple algebra,

Av(T) = 1 / ( 1 + [50 F(T; λ, β) + 10(1 − F(T; λ, β))] / ∫_0^T (1 − F(t; λ, β))dt ).

The printout shows the investigation of the fraction in the last expression, denoted "Num/Den". Mathematica does not like using the upper integration limit T as a variable in the Plot and FindMinimum operators, but nevertheless carries out the calculations. The optimal replacement age is T* = 1170 hrs, and the availability is Av(T*) = 1/(1 + 0.0118) = 0.988.
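A Python cross-check of the availability computation in Exercise 5 (Simpson's rule is an arbitrary choice for the integral):

```python
import math

lam, beta = 0.000451, 3.714   # Weibull parameters found above

def S(t):
    # survival function 1 - F(t) of the Weibull lifetime
    return math.exp(-(lam*t)**beta)

def integral_S(T, n=2000):
    # Simpson's rule for int_0^T S(t) dt
    h = T/n
    s = S(0) + S(T)
    for i in range(1, n):
        s += (4 if i % 2 else 2)*S(i*h)
    return s*h/3

T = 1170.0
F = 1 - S(T)
ratio = (50*F + 10*(1 - F)) / integral_S(T)   # the "Num/Den" fraction
av = 1/(1 + ratio)
print(round(ratio, 4), round(av, 3))
```

The printed fraction reproduces the value 0.0118 found in the printout, and the availability 0.988.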

6. Step 1: optimization on the element (part) level.
Part 1: The expression for the costs is

A11(T) = [10 · ∫_0^T 0.1t dt + 2]/T = (0.5T² + 2)/T.

It is easy to check that the minimum on the grid T = 1, 2, 4, 8, 16 is attained at T = 2.


Similarly, for Part 2,

A12(T) = [10 · ∫_0^T 0.06t² dt + 2]/T = (0.2T³ + 2)/T,

and the optimal T = 2.
Step 2: optimization on the system level. The basic sequence is (2, 2), (4, 4), (8, 8), (16, 16). The expression for the costs is
η(T) = (0.5T² + 2)/T + (0.2T³ + 2)/T + 8/T.
The last term 8/T reflects the system set-up costs paid at each planned replacement. It is easy to check that the minimum of η(T) is attained at T = 2, i.e. for the first vector of the basic sequence. The conclusion: replace both parts every two units of time. The optimal value of η(T) is 7.8. For comparison, η(1; 2) = 12.3.
7, a. The first step is solving the equation (A.2.9) in Appendix 1. Ttot = T1 + T2, where T1 = 26 + 32.5 + 43.3 = 101.8 (in thousands of miles). Similarly, T2 = 19.5 + 35.3 + 50.2 + 68.7 = 173.7. Here n1 = 3 and n2 = 4 are the numbers of observed engine failures. The expression "eq" (see the printout) is the left-hand side of (A.2.9) (the fraction is transferred to the left). The operator "FindRoot" produces β̂ = 0.0240447. Substituting it into (A.2.11) gives α̂ = −3.68552.
7, b. The relevant expression for the cost criterion is (4.2.7). Its numerator equals

num = 1000 · ∫_0^T exp[α̂ + β̂t]dt + crep = 1000 e^{α̂}(e^{β̂T} − 1)/β̂ + crep,

where crep lies between 100 and 300; the denominator equals T. The expression (4.2.7) for ηD(T) is denoted in the printout as η. The plot shows two curves for the costs as a function of T: the upper curve corresponds to crep = 300, the lower to crep = 100. The optimal repair periods are 16 000 and 25 600 miles for crep = 100 and crep = 300, respectively. If we assume that crep = 100 and use a near-optimal period of 15 000 miles, the worst-case cost would be $50.2 per thousand miles. If we assume that crep = 300 and do a minimal repair every T* = 25 000 miles, the worst-case cost would be $46.4 per thousand miles. The "minimax" reasoning makes the second option, i.e. T* = 25 000 miles, preferable.
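The grid search of Exercise 6 is easy to replicate:

```python
# Cost per unit time for joint replacement every T units (Exercise 6):
# part 1, part 2 and the shared set-up cost of 8 paid at each replacement.
def eta(T):
    return (0.5*T**2 + 2)/T + (0.2*T**3 + 2)/T + 8/T

grid = [1, 2, 4, 8, 16]
best = min(grid, key=eta)
print(best, eta(best))  # → 2 7.8
```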

This is a continuation of the printout for solving Exercise 7.


8. The printout below shows the details of the numerical investigation of η(T) for f(x) = e^{−x}; the c.d.f. is F(x) = 1 − e^{−x}. The calculations were made for k ranging from 0 to 20. A(T) is denoted "A" and "A1", and B(T) "B" and "B1", for t0 = 0.05 and t0 = 0.1, respectively (t0 is denoted "t"). The upper graph shows η(T) = A/B for t0 = 0.05: the optimal T* ≈ 0.27 and the maximal η(T*) ≈ 0.74. The lower graph shows η(T) for t0 = 0.1: the optimal T* ≈ 0.4 and the maximal η(T*) ≈ 0.65. An important observation is that choosing large T values, say T = 0.8 (note that this is 80% of the mean CDE lifetime), may considerably reduce the availability. In my opinion, taking T = 0.33, i.e. in the "middle", would provide near-optimal results for any t0 in the range [0.05, 0.1].

9. The mixture of exponential lifetimes is a DFR distribution, see Chapter 2, Theorem 2.3. Therefore no finite maintenance period can be optimal. The answer is T* = ∞, i.e. never carry out the PM.
10. By (2.3.12), E[τ2] = Γ[1 + 1/2] = 0.886. Thus λ1 should be 1/0.886 = 1.13. The lifetime c.d.f. of the series system is Fs(t) = 1 − e^{−1.13t}e^{−t²}. The expression for η(T) is

η(T) = [Fs(T) + (1 − Fs(T))cp] / ∫_0^T (1 − Fs(t))dt.

This expression is defined in In[93], see the printout. The graph of η(T) is shown in Out[95]. The "FindMinimum" operator in In[96] provides T* = 0.565 and the minimal η(T*) = 1.808. Recall that the efficiency is Q = 1/(E[τs] · η(T*)); Out[98] shows that Q = 1.07, quite a low efficiency.
The graph in Out[100] shows the function η(T) for the system without the exponential component. The optimal T is 0.511, and the optimal η is 0.817. The corresponding efficiency (denoted Q) is now 1.38, considerably higher than the previous value of 1.07. How can this fact be explained? The presence of an exponential component changes the form of the density function near zero and makes it very close to the exponential density. We advise the reader to reexamine Exercise 15 in Chapter 2. The presence of an exponential component with mean life equal to or smaller than the mean life of the aging Weibull component makes Q near 1. We call this phenomenon "exponential minimum-type contamination". It drastically reduces the efficiency of optimal age replacement.
The practical conclusion is the following. If there is a suspicion that the series system contains many exponentially distributed components, it is not advisable to replace the whole system preventively. The PM should be applied only to that part of the system which has clearly expressed aging. In our example this is the second component, whose life follows the Weibull distribution with β = 2.


Below is the continuation of the printout for solving Exercise 10.

11. Check that σ = 0.246, µ = −0.030258 and λ = 0.913, β = 4.5 provide c.v. = 0.25 and mean = 1 for both distributions. Recall that for τ ∼ logN(µ, σ) the c.d.f. is Φ((log t − µ)/σ). The corresponding calculations are carried out in In[134], see the printout below. Note that one has to load the package of continuous distributions and then define the standard normal c.d.f. as NormalDistribution[0,1]. Out[141] presents the graphs of ηage(T) for the lognormal and Weibull cases. The two curves practically coincide, and therefore the maintenance age is T* ≈ 0.6 in both cases.

Below is the printout for Exercise 11.

12. The "FindMinimum" operator applied to S = Σ_{i=1}^5 (log ni − a − α(i − 1))² gives â = 1.93, α̂ = 0.277. The "Plot" operator produces the graph of η(K) shown below. The optimal K* = 5, and the minimal costs are about 500 $ per cycle.


Chapters 5, 6, 7
1. The system of equations for finding the stationary probabilities is (5.4.5). In our case the system is
π1 = 0.1π2; π2 = 0.3π1 + π3; π1 + π2 + π3 = 1.
Its solution is presented in the printout below: π1 = 0.048, π2 = 0.483, π3 = 0.469. (Note that one of the equations of the system π = πP can be deleted; we deleted the third one.) The mean one-step rewards are all equal to 1, and the mean transition times are ν1 = ν2 = 1, ν3 = 2. By (5.4.14), g = 1/(π1 + π2 + 2π3) = 0.681.
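The stationary probabilities and the reward rate g can be checked by direct substitution; a Python sketch of the elimination:

```python
# Solve  pi1 = 0.1 pi2,  pi2 = 0.3 pi1 + pi3,  pi1 + pi2 + pi3 = 1.
# Substituting pi1 = 0.1 pi2 and pi3 = pi2 - 0.3 pi1 = 0.97 pi2 gives
# (0.1 + 1 + 0.97) pi2 = 1.
pi2 = 1/2.07
pi1 = 0.1*pi2
pi3 = 0.97*pi2
g = 1/(pi1 + pi2 + 2*pi3)   # mean transition times nu1 = nu2 = 1, nu3 = 2
print(round(pi1, 3), round(pi2, 3), round(pi3, 3), round(g, 3))
# → 0.048 0.483 0.469 0.681
```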

2, a. The mean length of one operation cycle is E[τ(k)] = µ(k). On this cycle, the mean total "up" time of all lines is
E[W(k)] = nµ(1) + (n − 1)(µ(2) − µ(1)) + ... + (n − k + 1)(µ(k) − µ(k−1)),
see (5.2.1). The mean total operation time of all lines during one cycle is nµ(k), and thus the mean total idle time is E[Idle] = nµ(k) − E[W(k)]. Thus the mean reward per unit time is

χk = [crew · E[W(k)] − cf · E[Idle] − crep] / µ(k).

Let fi(t) be the density of the i-th order statistic:

fi(t) = n!/[(i − 1)!(n − i)!] · [F(t)]^{i−1}[1 − F(t)]^{n−i} f(t).

Its derivation is instructive. The i-th ordered observation lies in the interval (t, t + dt) if (i) a single observation lies in this interval, and (ii) some i − 1 out of the remaining n − 1 observations are less than or equal to t. The probability of (i) is n f(t)dt; the probability of (ii) is (n−1 choose i−1) [F(t)]^{i−1}[1 − F(t)]^{n−i}. To find µ(i), we have to integrate:

µ(i) = ∫_0^∞ t fi(t)dt.

Using the Mathematica operator "NIntegrate" is very convenient for the numerical evaluation of µ(i).
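The role of "NIntegrate" can be played by any quadrature rule. A Python sketch for the exponential case F(t) = 1 − e^{−t} with n = 4 (the Exp(1) choice is illustrative; for Exp(1) the extreme order statistics have the known means 1/4 and 1 + 1/2 + 1/3 + 1/4):

```python
import math

n = 4

def f_order(t, i):
    # density of the i-th order statistic of n i.i.d. Exp(1) lifetimes
    F = 1 - math.exp(-t)
    c = math.factorial(n)/(math.factorial(i-1)*math.factorial(n-i))
    return c * F**(i-1) * (1-F)**(n-i) * math.exp(-t)

def mean_order(i, upper=40.0, steps=4000):
    # Simpson's rule for int_0^upper t f_i(t) dt;
    # upper = 40 plays the role of "infinity"
    h = upper/steps
    s = 0.0 + upper*f_order(upper, i)
    for k in range(1, steps):
        t = k*h
        s += (4 if k % 2 else 2) * t * f_order(t, i)
    return s*h/3

print(round(mean_order(1), 4), round(mean_order(4), 4))
```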

2, b. The printout below shows the computation details. F(t) = 1 − exp[−(0.25t)^β]. The density f(t) is found using the operator "D[F,t]". f1, f2, f3 and f4 are defined according to the formula above, and µ1, µ2, µ3 and µ4 are the mean values of the order statistics; the upper limit 40 serves as "infinity". The printout shows the values of χ1, ..., χ4. The optimal choice is k = 2, which guarantees the highest reward, 2.74. The choice k = 1 is second best, with reward 2.60. The worst choice is k = 4, which would reduce the reward to 1.80.


3. In order to process the data we need to call "DescriptiveStatistics". The data are presented in the form of points (hours, flights) on the (H, F)-plane. The operator "For" investigates the coefficient of variation of the lifetime in a time scale defined as a linear combination of the two scales H (hours) and F (flights): Tdata = (1 − a) · Hdata + a · Fdata, for a = 0, 0.05, ..., 1.00. The output (only the cases a = 0, a = 0.7 and a = 1 are displayed) shows that a = 0.7 provides the smallest c.v. (denoted "cvT"): it equals 0.082, while for the lifetime in the number-of-flights scale it equals 0.204.
To compute the optimal replacement age we use formula (6.4.13). Note that F̂a(t) is a piecewise constant function. For all three scales H, F, T0.7, the optimal replacement age coincides with the smallest observed lifetime. The results are: γ̂ = 0.302 in the H-scale, x* = 1950 hours; γ̂ = 0.314 in the F-scale, x* = 920 flights; γ̂ = 0.229 in the T0.7-scale, x* = 1649. We see, therefore, that using the scale with the smallest c.v. provides a considerable reduction in the mean cost γ.


4. Consider expression (6.3.7). Substitute x = tp and use the fact that F(tp) = p. Then (6.3.7) takes the form

γ(tp) = [p + (1 − p)d] / ∫_0^{tp} (1 − F(x))dx.

The function 1 − F(x) equals 1 at x = 0 and 1 − p at x = tp. Assuming linearity, it is easy to obtain that the integral equals (1 + (1 − p))tp/2 = (2 − p)tp/2. Substituting this into the previous formula gives

γ(tp) = 2(p + (1 − p)d) / [(2 − p)tp].

Carrying out the PM at age tp has the advantage that the failure probability before the PM does not exceed a predetermined value p.
5. Suppose we have k nonfailed parts which are put in work for one unit-time period. Denote by b(k, z) the probability that exactly z out of k fail during this period, z = 0, 1, ..., k. Obviously,

b(k, z) = (k choose z) p^z (1 − p)^{k−z}.

Suppose that we have m + 1 unit-time periods ahead of us and j nonfailed parts. We decide to put in work kj parts, kj ≤ j, after which we follow the optimal policy. The average cost will then be

Ŵj(m + 1) = b(kj, kj) · 1 + Σ_{z=0}^{kj−1} b(kj, z) Wj−z(m).

Indeed, we pay 1 for a failure during the unit interval, which happens when all parts put in work fail during this interval. If z parts fail, z = 0, 1, ..., kj − 1, then with probability b(kj, z) we pay, by the definition of Wi(m), the minimal cost Wj−z(m) during the m remaining intervals, since we start with j − z nonfailed parts.
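The backward recursion described below (compute Wj(1), then Wj(2), and so on) can be sketched in Python; p = 0.3 and r = 5 are illustrative values, not data from the exercise:

```python
from math import comb

p = 0.3                      # illustrative per-period failure probability
r = 5                        # illustrative initial number of nonfailed parts

def b(k, z):
    # probability that exactly z of the k parts put in work fail
    return comb(k, z) * p**z * (1 - p)**(k - z)

def W(horizon):
    # returns {j: minimal failure probability with j parts and
    # `horizon` unit-time periods ahead}
    w = {j: p**j for j in range(1, r + 1)}        # W_j(1) = b(j, j) = p^j
    for _ in range(horizon - 1):
        w = {j: min(b(k, k) + sum(b(k, z) * w[j - z] for z in range(k))
                    for k in range(1, j + 1))
             for j in range(1, r + 1)}
    return w

print(round(W(2)[1], 2))  # → 0.51
```

For instance, W(2)[1] = p + (1 − p)p = 0.51: with a single part and two periods ahead, the system fails either in the first period or, having survived it, in the second.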


Applying the optimality principle, we write

Wj(m + 1) = min_{1 ≤ kj ≤ j} [ b(kj, kj) + Σ_{z=0}^{kj−1} b(kj, z) Wj−z(m) ].

In order to find the optimal policy we apply the "backward motion" algorithm. First find Wj(1) by assuming that there is only one period ahead and j nonfailed parts available, j = 1, ..., r (set Wj(0) = 0). Then, using the values of Wj(1), calculate Wj(2) via the above recurrence relation, etc. Obviously,
Wj(1) = min_{1 ≤ kj ≤ j} p^{kj} = p^j = b(j, j),
i.e. Wj(1) equals the probability that all j parts fail in a unit-length interval. Suppose that we are at t = m − 1 and there are two unit-length periods ahead. Then

Wj(2) = min_{1 ≤ kj ≤ j} [ b(kj, kj) + Σ_{z=0}^{kj−1} b(kj, z) Wj−z(1) ].

The system failure in [m − 1, m + 1] can occur in two ways: either all kj parts put in work at t = m − 1 fail during [m − 1, m], or z parts fail in the first period, z < kj, and the remaining j − z parts fail during [m, m + 1]. Now it is clear that the expression in the brackets equals the probability of system failure during [m − 1, m + 1], calculated under the following conditions: (i) the process starts at t = m − 1 with j nonfailed parts; (ii) the first decision is to put kj parts in work; (iii) the decision at t = m is made optimally. This means that Wj(2) is the minimal probability of a failure in a two-step process. By analogy, it can be shown that Wj(m) gives the desired minimal probability in an m-step process.
6. Substitute p̂_j^1, j = 1, ..., m, into the expression for p̂_j^2. The denominator of p̂_j^1 cancels out, and we arrive at the formula for p_j^1.
7, a. Let the system be in state 0 at t = 0. Consider the interval [t, t + h], where h > 0, h → +0. The system will be in state 0 at t + h if it was in this state at t and no transition into state 1 appeared during [t, t + h]. This leads to the equation
P00(t + h) = P00(t)(1 − λ01(t)h) + o(h).
After simple algebra, it leads to the differential equation

P′00(t) = −λ01(t)P00(t).

Its solution for λ01(t) = t and the initial condition P00(0) = 1 is P00(t) = exp[−t²/2]. This result was expected: the sitting time in state 0 has the Weibull distribution, since the transition rate to 1 (i.e. the "failure" rate) is λ01(t) = t.

Relating the possibility of being in state 1 at time t + h to the possibilities of being in state 0 or in state 1 at time t, we arrive at the equation
P01(t + h) = P01(t)(1 − λ12(t)h) + P00(t)λ01(t)h + o(h).
After simple algebra, letting h → 0, we obtain

P′01(t) = −λ12(t)P01(t) + λ01(t)P00(t),

with the initial condition P01(0) = 0. Check that the solution is P01(t) = (t²/2) exp[−t²/2]. Thus P02(t) = 1 − P00(t) − P01(t) = 1 − e^{−t²/2}(1 + t²/2). Note that P02(t) = P(τ02 ≤ t), where τ02 is the transition time from state 0 to state 2; the corresponding density function f02(t) is the derivative of P02(t).
7, b. Let us derive the recurrence equation for the optimal inspection period. Denote by W0 the minimal costs under the optimal inspection policy, and let T be the length of the first inspection period. Then

W0 = min_T [ cins + ∫_0^T (Af + B(T − x)) f02(x)dx + W0(P01(T) + P00(T)) + crep P01(T) ].

For k > 0,

Pk(t, s + h) = Pk(t, s)P0(t + s, h) + Pk−1(t, s)P1(t + s, h) + o(h),

where P0(t + s, h) = 1 − λ(t + s)h + o(h) and P1(t + s, h) = λ(t + s)h + o(h). Therefore

Pk(t, s + h) = Pk(t, s)[1 − λ(t + s)h] + Pk−1(t, s)λ(t + s)h + o(h),   (A.1.7)

from which it follows that

[Pk(t, s + h) − Pk(t, s)]/h = λ(t + s)[Pk−1(t, s) − Pk(t, s)] + o(1).

In the limit, for any k > 0,

P′k(t, s) = λ(t + s)[Pk−1(t, s) − Pk(t, s)],   (A.1.8)

where the derivative is taken with respect to s.

This system must be solved with the initial condition Pk(t, 0) = 0 for k > 0. It is instructive to demonstrate an efficient way of obtaining the solution. Introduce the following generating function:

F(t, s; x) = Σ_{k=0}^∞ Pk(t, s) x^k.   (A.1.9)

Multiply equation (A.1.8) by x^k and sum from k = 0 to infinity (set P−1(t, s) ≡ 0). Then, after some algebra, we obtain

∂F/∂s = (x − 1)λ(t + s)F,   (A.1.10)


or

∂ log F/∂s = (x − 1)λ(t + s).   (A.1.11)

It now follows from (A.1.11) that

log F(t, s; x) − log F(t, 0; x) = (x − 1) ∫_0^s λ(t + v)dv = (x − 1) ∫_t^{t+s} λ(y)dy = (x − 1)Λ(t, s).   (A.1.12)

Since F(t, 0; x) = P0(t, 0) = 1, the solution of (A.1.12) is

F(t, s; x) = exp[(x − 1)Λ(t, s)] = exp[−Λ(t, s)] Σ_{k=0}^∞ [Λ(t, s)]^k x^k / k!.   (A.1.13)

Comparing this expression with the definition (A.1.9) of F(t, s; x), we see that

Pk(t, s) = exp[−Λ(t, s)] [Λ(t, s)]^k / k!,  k ≥ 0.   (A.1.14)

We see therefore that the number of events in (t, t + s) in a NHPP follows a Poisson distribution with parameter Λ(t, s). For the particular case λ(t) ≡ λ, we arrive at the result (2.1.6) for a Poisson process with rate λ.
Let us consider the distribution of the random intervals between two adjacent events in a NHPP. The situation for a Poisson process with constant rate λ was very simple: the interval between any two adjacent events is τ ∼ Exp(λ). For a NHPP, the distribution of the distance between the event which took place at t* and the next event depends on t*. Suppose that an event took place in a NHPP at the instant t, and denote by τt the interval to the next event.
Corollary 1.1
(i) In a NHPP with intensity function λ(t), the c.d.f. of τt is

P(τt ≤ x) = Ft(x) = 1 − exp[−∫_t^{t+x} λ(v)dv],   (A.1.15)

and the density function of τt is

ft(x) = λ(t + x) · exp[−∫_t^{t+x} λ(v)dv].   (A.1.16)

(ii) The random variable τt does not depend on the history of the NHPP on the interval [0, t].
Proof.


(i) Consider the interval (t, t + x). By (A.1.6),

P0(t, x) = exp[−∫_t^{t+x} λ(v)dv].

If there are no events in (t, t + x), then the interval to the next event exceeds x. This proves (A.1.15), because Ft(x) is the probability of the complementary event. The density function ft(x) is the derivative of Ft(x) with respect to x.
(ii) Follows from the assumptions (i)-(iv) of the NHPP.
For λ(t) ≡ λ, we arrive at the exponential distribution of the interval between adjacent events.
Note that the mean number of events on [0, t] in a NHPP equals

µ[0,t] = Λ(t).   (A.1.17)

This follows directly from the fact that the number of events on [0, t] has a Poisson distribution (A.1.14).
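A NHPP with the Poisson property (A.1.14) can be simulated by "thinning" a homogeneous Poisson process (the standard Lewis–Shedler technique, not described in the text); the intensity λ(t) = t on [0, 2] used below is an illustrative choice:

```python
import random

random.seed(1)

def nhpp_thinning(lam, lam_max, T):
    """Simulate NHPP event times on [0, T] by thinning a rate-lam_max
    homogeneous Poisson process (requires lam(t) <= lam_max on [0, T])."""
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam_max)      # next candidate event
        if t > T:
            return times
        if random.random() <= lam(t)/lam_max:  # accept with prob lam(t)/lam_max
            times.append(t)

# intensity lam(t) = t on [0, 2]; by (A.1.17) the mean event count is
# Lambda(2) = 2
events = nhpp_thinning(lambda t: t, lam_max=2.0, T=2.0)
print(len(events), events[:3])
```

By (A.1.17), the mean number of events on [0, 2] is Λ(2) = ∫_0^2 t dt = 2, which a Monte Carlo average of len(events) over many runs reproduces.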

A.2 Parametric estimation of the intensity function

In this section we consider two important types of intensity function: the so-called log-linear form and the Weibull form.
Log-linear form of the intensity function
Suppose that the intensity function of the NHPP has the form

λ(t) = e^{α+βt},

(A.2.1)

where α and β are unknown parameters. Following Cox and Lewis (1966), we describe the maximum likelihood approach to the estimation of α and β. Suppose we observe a NHPP on the interval [0, T], and suppose that the events took place at the instants 0 < t1 < t2 < ... < tn < T. Then, by Corollary 1.1, the likelihood function has the form

Lik(t1, ..., tn) = f0(t1) · ft1(t2 − t1) · ... · ftn−1(tn − tn−1) · [1 − Ftn(T − tn)].   (A.2.2)

For ft(x) and Ft(x) defined in Corollary 1.1, the likelihood function takes the form

Lik(t1, ..., tn) = ∏_{i=1}^n λ(ti) · exp[−∫_0^T λ(v)dv].   (A.2.3)


Substitute λ(t) from (A.2.1) into (A.2.2) and take the logarithm. Then, after simple algebra,

log Lik(t_1, ..., t_n) = nα + β ∑_{i=1}^n t_i − e^α [e^{βT} − 1]/β.   (A.2.4)

The MLEs of α and β are found as the solution of the maximum likelihood equations

∂ log Lik/∂α = n − e^α (e^{βT} − 1) β^{−1} = 0,   (A.2.5)

and

∂ log Lik/∂β = ∑_{i=1}^n t_i − e^α [−(e^{βT} − 1) β^{−2} + e^{βT} T β^{−1}] = 0.   (A.2.6)

From (A.2.5) it follows that

e^{α̂} = n β̂ / (e^{β̂T} − 1).   (A.2.7)

Substituting e^{α̂} from (A.2.7) into (A.2.6), we obtain, after some algebra, the equation for the MLE of β:

∑_{i=1}^n t_i + n β̂^{−1} = nT / (1 − e^{−β̂T}).   (A.2.8)
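Equation (A.2.8) has no closed-form solution, but it is a one-dimensional root-finding problem. A sketch in Python (standard library only; the event times below are hypothetical and show an increasing trend, so a positive root exists); α̂ is then recovered from (A.2.7):

```python
import math

def loglinear_beta_mle(times, T, lo=1e-8, hi=50.0, iters=200):
    """Solve equation (A.2.8) for beta by bisection:
       sum(t_i) + n/beta = n*T / (1 - exp(-beta*T)).
    Assumes the events show an increasing trend, so the root is positive."""
    n, s = len(times), sum(times)
    g = lambda b: s + n / b - n * T / (1.0 - math.exp(-b * T))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical event times on [0, T], clustered late (increasing intensity).
times, T = [0.50, 0.70, 0.80, 0.90, 0.95], 1.0
beta_hat = loglinear_beta_mle(times, T)
# alpha_hat from (A.2.7): exp(alpha) = n*beta / (exp(beta*T) - 1)
alpha_hat = math.log(len(times) * beta_hat / (math.exp(beta_hat * T) - 1.0))
print(round(beta_hat, 3), round(alpha_hat, 3))
```

For data without a clear trend the root may lie at or near β = 0 (the homogeneous case), and the bracket [lo, hi] must be chosen accordingly.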

Suppose that we have observed several independent realizations of the NHPP with the same intensity function (A.2.1). The data from the i-th realization are as follows: the events took place at the instants t_1^{(i)}, ..., t_{n_i}^{(i)}, and the process was observed on the interval [0, T_i], i = 1, 2, ..., m. The "overall" likelihood function is the product of the m likelihood functions for the separate realizations, and the derivation of the equations for finding the MLEs is similar. We present the final result. The equation for β̂ is

T_tot + (∑_{i=1}^m n_i) β̂^{−1} = (∑_{i=1}^m n_i · ∑_{i=1}^m T_i e^{β̂ T_i}) / (∑_{i=1}^m e^{β̂ T_i} − m),   (A.2.9)

where

T_tot = ∑_{j=1}^{n_1} t_j^{(1)} + ... + ∑_{j=1}^{n_m} t_j^{(m)}.   (A.2.10)

The equation for α̂ takes the form

e^{α̂} = β̂ ∑_{i=1}^m n_i / (∑_{i=1}^m e^{β̂ T_i} − m).   (A.2.11)


It is desirable to have a measure of the variability of α̂ and β̂. We will derive the large-sample maximum likelihood confidence intervals described in Chapter 3, Section 3. First we compute the second-order derivatives of the log-likelihood (multiplied by −1) evaluated at the MLEs α̂ and β̂. They are

V_11(α̂, β̂) = n,   (A.2.12)

V_12(α̂, β̂) = ∑_{i=1}^n t_i,   (A.2.13)

and

V_22(α̂, β̂) = β̂^{−1} [∑_{i=1}^n t_i (β̂T − 2) + nT].   (A.2.14)

The V_ij are the elements of the observed Fisher information matrix I_F. The elements of its inverse I_F^{−1} are

Ŵ_11 = V̂_22 / (V̂_22 n − (∑_{i=1}^n t_i)²),   (A.2.15)

Ŵ_12 = −(V̂_12 / V̂_22) Ŵ_11,   (A.2.16)

Ŵ_22 = n / (V̂_22 n − (∑_{i=1}^n t_i)²).   (A.2.17)

Therefore, the estimates of the standard errors of α̂ and β̂ are

σ̂_α̂ = √Ŵ_11,   σ̂_β̂ = √Ŵ_22.   (A.2.18)
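Formulas (A.2.12)-(A.2.18) are direct arithmetic once β̂ is known. A sketch in Python (standard library only; the event times are hypothetical, and β̂ = 4.0 is used only as a rough illustrative value of the MLE for them):

```python
import math

def loglinear_std_errors(times, T, beta_hat):
    """Standard errors of alpha_hat and beta_hat for the log-linear NHPP,
    via the observed Fisher information, formulas (A.2.12)-(A.2.18)."""
    n, s = len(times), sum(times)
    v11 = n                                               # (A.2.12)
    v12 = s                                               # (A.2.13)
    v22 = (s * (beta_hat * T - 2.0) + n * T) / beta_hat   # (A.2.14)
    det = v11 * v22 - v12 ** 2        # determinant of I_F
    w11 = v22 / det                   # (A.2.15); (A.2.16) gives the covariance
    w22 = v11 / det                   # (A.2.17)   term, not needed for the SEs
    return math.sqrt(w11), math.sqrt(w22)

# Hypothetical data; beta_hat = 4.0 is a rough MLE for them (illustration only).
times, T = [0.50, 0.70, 0.80, 0.90, 0.95], 1.0
se_alpha, se_beta = loglinear_std_errors(times, T, 4.0)
print(round(se_alpha, 3), round(se_beta, 3))   # se_alpha ~ 1.74, se_beta ~ 2.18
```

Approximate confidence intervals then follow by taking each estimate plus or minus a normal quantile times its standard error, as in Chapter 3, Section 3.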

Weibull-form intensity function
Suppose that the intensity function has the form

λ(t) = α^β β t^{β−1}.   (A.2.19)

For this intensity function, the time to the first failure τ_0 ∼ W(α, β), as follows directly from (A.1.15). The log-likelihood now is

log Lik(t_1, ..., t_n) = nβ log α + n log β + (β − 1) ∑_{i=1}^n log t_i − (αT)^β.   (A.2.20)

The MLEs are found as the solution of the following two equations:

∂ log Lik/∂α = nβ/α − β α^{β−1} T^β = 0,   (A.2.21)


and

∂ log Lik/∂β = n log α + n/β + ∑_{i=1}^n log t_i − (αT)^β log(αT) = 0.   (A.2.22)

The solution is

α̂ = n^{1/β̂}/T,   β̂ = (log T − n^{−1} ∑_{i=1}^n log t_i)^{−1}.   (A.2.23)

The elements of the Fisher information matrix are

V_11(α̂, β̂) = n β̂²/α̂²,   (A.2.24)

V_12(α̂, β̂) = β̂ n log(α̂T)/α̂,   (A.2.25)

and

V_22(α̂, β̂) = n/β̂² + n (log(α̂T))².   (A.2.26)

From here it follows that the estimates of the standard errors of α̂ and β̂ are

σ̂_α̂ = √(V_22/(V_11 V_22 − V_12²)),   σ̂_β̂ = √(V_11/(V_11 V_22 − V_12²)).   (A.2.27)

Remark. Suppose that the point process N(t), t ≥ 0, describes failures of a renewable system. In this context, the derivative of the mean number of failures on [0, t] with respect to t,

v(t) = dE[N(t)]/dt,   (A.2.28)

is called the rate of occurrence of failures (ROCOF). This term is very popular in the reliability literature; see e.g. O'Connor (1991) and Crowder et al (1991). Obviously, for a NHPP, the ROCOF equals the intensity function λ(t); see (A.1.16). The ROCOF is often confused with the failure rate h(t) defined earlier in Chapter 2. The principal difference is that the notion of failure rate is defined only for a random variable describing the lifetime of a nonrepairable component (system). In a NHPP, the failure rate of the time to the first failure τ_0 coincides with λ(t) only on the interval [0, τ_0]. #

More information on statistical inference for the NHPP, including simple graphical analysis, can be found in Cox and Lewis (1966) and Crowder et al (1991).
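Because (A.2.23) is closed-form, fitting the Weibull-form intensity is immediate. A sketch in Python (standard library only; the failure times are hypothetical). A useful self-check: (A.2.21) implies (α̂T)^β̂ = n at the MLE.

```python
import math

def weibull_nhpp_mle(times, T):
    """Closed-form MLEs (A.2.23) for the Weibull-form intensity
    lam(t) = alpha**beta * beta * t**(beta - 1), observed on [0, T]."""
    n = len(times)
    beta = 1.0 / (math.log(T) - sum(math.log(t) for t in times) / n)
    alpha = n ** (1.0 / beta) / T
    return alpha, beta

times, T = [0.50, 0.70, 0.80, 0.90, 0.95], 1.0   # hypothetical failure times
alpha_hat, beta_hat = weibull_nhpp_mle(times, T)
print(round(alpha_hat, 3), round(beta_hat, 3))   # approximately 1.584, 3.497
```

The standard errors then follow from (A.2.24)-(A.2.27) by the same arithmetic as in the log-linear case.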


Appendix 2: Covariances and Means of the Order Statistics

Table 1. Covariance matrix of order statistics for n = 8, Z ∼ Extr(0, 1)
(lower triangle; entry in row i, column j is Cov(Z(i), Z(j)))

i\j    1      2      3      4      5      6      7      8
1    1.645
2    0.422  0.464
3    0.262  0.280  0.398
4    0.180  0.193  0.201  0.290
5    0.131  0.140  0.152  0.168  0.232
6    0.097  0.103  0.112  0.124  0.140  0.198
7    0.071  0.076  0.082  0.090  0.102  0.121  0.183
8    0.048  0.052  0.056  0.062  0.070  0.083  0.106  0.200

Table 1, a. Mean values of the order statistics

m(1) = −2.6567   m(2) = −1.5884   m(3) = −1.0111   m(4) = −0.5880
m(5) = −0.2312   m(6) = 0.1029    m(7) = 0.4548    m(8) = 0.9021

Table 2. The covariance matrix of order statistics for n = 10, r = 9, Z ∼ Extr(0, 1)
(lower triangle; entry in row i, column j is Cov(Z(i), Z(j)))

i\j    1      2      3      4      5      6      7      8      9
1    1.645
2    0.436  0.646
3    0.275  0.290  0.397
4    0.193  0.204  0.217  0.287
5    0.144  0.152  0.162  0.174  0.227
6    0.111  0.117  0.124  0.137  0.145  0.190
7    0.086  0.091  0.097  0.104  0.113  0.125  0.166
8    0.067  0.071  0.076  0.081  0.088  0.098  0.111  0.152
9    0.051  0.054  0.058  0.062  0.067  0.074  0.085  0.100  0.149

Table 2, a. Mean values of the order statistics, n = 10

m(1) = −2.800   m(2) = −1.826   m(3) = −1.267   m(4) = −0.868   m(5) = −0.544
m(6) = −0.257   m(7) = −0.012   m(8) = 0.284    m(9) = 0.585    m(10) = 0.990

Table 3. The covariance matrix of order statistics for n = 15, r = 10, Z ∼ Extr(0, 1)
(lower triangle; entry in row i, column j is Cov(Z(i), Z(j)))

i\j    1      2      3      4      5      6      7      8      9      10
1    1.645
2    0.455  0.645
3    0.293  0.303  0.396
4    0.211  0.219  0.227  0.285
5    0.162  0.168  0.174  0.182  0.223
6    0.129  0.134  0.139  0.145  0.152  0.184
7    0.106  0.109  0.114  0.118  0.124  0.130  0.157
8    0.088  0.091  0.094  0.098  0.103  0.108  0.115  0.138
9    0.074  0.076  0.079  0.082  0.086  0.091  0.096  0.103  0.124
10   0.062  0.064  0.067  0.069  0.073  0.077  0.081  0.087  0.093  0.113

Table 3, a. Mean values of the order statistics

m(1) = −3.285   m(2) = −2.250   m(3) = −1.713   m(4) = −1.340   m(5) = −1.048
m(6) = −0.802   m(7) = −0.585   m(8) = −0.387   m(9) = −0.201   m(10) = −0.021

Table 4. The covariance matrix of order statistics for n = 8, Z ∼ N(0, 1)
(lower triangle; entry in row i, column j is Cov(Z(i), Z(j)))

i\j    1      2      3      4      5      6      7      8
1    0.373
2    0.186  0.239
3    0.126  0.163  0.200
4    0.095  0.123  0.152  0.187
5    0.075  0.098  0.121  0.149  0.187
6    0.060  0.079  0.098  0.121  0.152  0.200
7    0.048  0.063  0.079  0.098  0.123  0.163  0.239
8    0.037  0.048  0.060  0.075  0.095  0.126  0.186  0.373

Table 4, a. Mean values of the order statistics

m(1) = −1.4236   m(2) = −0.85224   m(3) = −0.4728   m(4) = −0.1525
m(4+k) = −m(4−k+1), k = 1, 2, 3, 4

Table 5. The covariance matrix of order statistics for n = 10, Z ∼ N(0, 1)
(lower triangle; entry in row i, column j is Cov(Z(i), Z(j)))

i\j    1      2      3      4      5      6      7      8      9      10
1    0.344
2    0.171  0.214
3    0.116  0.147  0.175
4    0.088  0.112  0.134  0.158
5    0.071  0.090  0.108  0.128  0.151
6    0.058  0.074  0.089  0.106  0.126  0.151
7    0.049  0.062  0.075  0.089  0.106  0.128  0.158
8    0.041  0.052  0.063  0.075  0.089  0.108  0.138  0.175
9    0.034  0.043  0.052  0.062  0.074  0.090  0.112  0.147  0.214
10   0.027  0.034  0.041  0.049  0.058  0.071  0.088  0.116  0.171  0.344

Table 5, a. Mean values of the order statistics

m(1) = −1.5388   m(2) = −1.0014   m(3) = −0.6561   m(4) = −0.3758   m(5) = −0.1227
m(5+k) = −m(5−k+1), k = 1, 2, 3, 4, 5

Table 6. The covariance matrix of order statistics for n = 15, r = 10, Z ∼ N(0, 1)
(lower triangle; entry in row i, column j is Cov(Z(i), Z(j)))

i\j    1      2      3      4      5      6      7      8      9      10
1    0.301
2    0.148  0.179
3    0.101  0.122  0.141
4    0.077  0.094  0.108  0.122
5    0.063  0.076  0.088  0.100  0.112
6    0.053  0.064  0.074  0.084  0.095  0.106
7    0.045  0.055  0.064  0.073  0.082  0.091  0.103
8    0.040  0.048  0.056  0.064  0.071  0.080  0.090  0.102
9    0.035  0.043  0.049  0.056  0.063  0.071  0.080  0.090  0.103
10   0.031  0.038  0.044  0.050  0.056  0.063  0.071  0.080  0.091  0.106

Table 6, a. Mean values of the order statistics

m(1) = −1.736   m(2) = −1.248   m(3) = −0.948   m(4) = −0.715
m(5) = −0.516   m(6) = −0.335   m(7) = −0.165   m(8) = −0.000
m(8+k) = −m(8−k), k = 1, 2, ..., 7
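The tabulated values can be spot-checked by simulation. The sketch below (Python standard library only) assumes that Extr(0, 1) denotes the minimum-type extreme-value law with c.d.f. F(z) = 1 − exp(−e^z); under this convention the minimum of n such variables is again extreme-value, shifted by −log n, so for n = 8 its mean is −γ − log 8 ≈ −2.6567 and its variance is π²/6 ≈ 1.645, matching the first entries of Table 1:

```python
import math
import random

def sample_extr(rng):
    """One draw from the minimum-type extreme-value law F(z) = 1 - exp(-e^z)
    (assumed here to be the Extr(0, 1) of the tables), by inversion."""
    return math.log(-math.log(1.0 - rng.random()))

rng = random.Random(1)
reps, n = 40000, 8
mins = [min(sample_extr(rng) for _ in range(n)) for _ in range(reps)]
mean1 = sum(mins) / reps
var1 = sum((z - mean1) ** 2 for z in mins) / (reps - 1)
# Theory: E[Z(1)] = -gamma - log 8 = -2.6567 and Var[Z(1)] = pi^2/6 = 1.645.
print(round(mean1, 2), round(var1, 2))
```

The same Monte Carlo device (sorting each sample and averaging products of centered order statistics) estimates any entry of the covariance tables.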

Appendix 3: The Laplace Transform

The Laplace transform π(s) of a nonnegative function g(t) is defined as

π(s) = ∫_0^∞ e^{−st} g(t) dt.

π(s)                                  g(t)
1/(s + λ)                             e^{−λt}                                  (A.3.1)
µ/s                                   µ                                        (A.3.2)
1/(1 + as)                            a^{−1} e^{−t/a}                          (A.3.3)
1/(s(s − a))                          (e^{at} − 1)/a                           (A.3.4)
1/(s − a)²                            t e^{at}                                 (A.3.5)
1/((s − a)(s − b)),  a ≠ b            (e^{at} − e^{bt})/(a − b)                (A.3.6)
s/(s − a)²                            (1 + at) e^{at}                          (A.3.7)
s/((s − a)(s − b))                    (a e^{at} − b e^{bt})/(a − b)            (A.3.8)
(λ/(λ + s))^k                         λ^k t^{k−1} e^{−λt}/(k − 1)!             (A.3.9)
1/((s − a)(s − b)(s − c))             [(c − b)e^{at} + (a − c)e^{bt} + (b − a)e^{ct}] / [(a − b)(a − c)(c − b)]   (A.3.10)
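Each pair in the table can be verified numerically by truncating the defining integral at a large upper limit. A sketch in Python (standard library only; the parameter values are hypothetical), checking two of the tabulated pairs:

```python
import math

def laplace_num(g, s, upper=60.0, steps=200000):
    """Trapezoid-rule approximation of the integral of exp(-s*t)*g(t)
    over [0, upper]; upper is large enough that the tail is negligible."""
    h = upper / steps
    total = 0.5 * (g(0.0) + math.exp(-s * upper) * g(upper))
    for k in range(1, steps):
        t = k * h
        total += math.exp(-s * t) * g(t)
    return total * h

s, lam, a = 1.5, 2.0, -1.0
# Pair: 1/(s + lambda)  <->  exp(-lambda*t)
err1 = abs(laplace_num(lambda t: math.exp(-lam * t), s) - 1.0 / (s + lam))
# Pair: 1/(s*(s - a))  <->  (exp(a*t) - 1)/a
err2 = abs(laplace_num(lambda t: (math.exp(a * t) - 1.0) / a, s) - 1.0 / (s * (s - a)))
print(err1 < 1e-6, err2 < 1e-6)   # True True
```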


Appendix 4: Probability Paper

1. Normal Probability Paper

Below is the Mathematica code for producing the Normal Paper and for plotting on it. Data processing using normal paper is described in Example 2.2, Section 3.2.

2. Weibull Probability Paper

Below is the Mathematica code for producing the Weibull Paper and for plotting on it. Data processing using Weibull paper is described in Example 2.1, Section 3.2.
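As an illustration of the transformation that normal probability paper implements, here is a sketch in Python (standard library only; the data are synthetic): the ordered observations are plotted against the standard normal quantiles Φ^{−1}((i − 0.5)/n), and for normal data the points lie near a straight line whose slope estimates 1/σ.

```python
import random
from statistics import NormalDist

def normal_paper_points(sample):
    """Coordinates for plotting a sample on normal probability paper:
    x = ordered observations, y = Phi^{-1}((i - 0.5)/n), i = 1, ..., n."""
    xs = sorted(sample)
    n = len(xs)
    ys = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return xs, ys

# For normal data the points fall near the line y = (x - mu)/sigma,
# so the slope of a fitted straight line estimates 1/sigma.
rng = random.Random(3)
mu, sigma = 10.0, 2.0
xs, ys = normal_paper_points([rng.gauss(mu, sigma) for _ in range(500)])
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
print(round(1.0 / slope, 1))   # roughly sigma = 2.0
```

Weibull paper works the same way with the transformation log(−log(1 − p)) applied to the plotting positions and log applied to the observations.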

Appendix 5: Renewal Function

The renewal function m(t) for F(t) = 1 − e^{−t^β}

t      β = 1.5  β = 2   β = 2.5  β = 3   β = 3.5  β = 4
0.05   0.011    0.003   0.001    0.000   0.000    0.000
0.10   0.032    0.010   0.003    0.001   0.000    0.000
0.15   0.060    0.023   0.009    0.004   0.001    0.001
0.20   0.093    0.042   0.019    0.008   0.004    0.002
0.25   0.130    0.066   0.033    0.017   0.008    0.004
0.30   0.172    0.096   0.053    0.029   0.016    0.009
0.35   0.216    0.130   0.078    0.046   0.027    0.016
0.40   0.264    0.170   0.109    0.069   0.044    0.028
0.45   0.314    0.215   0.146    0.098   0.066    0.044
0.50   0.367    0.265   0.189    0.134   0.095    0.067
0.55   0.422    0.318   0.238    0.178   0.132    0.098
0.60   0.477    0.376   0.294    0.229   0.178    0.138
0.65   0.537    0.437   0.355    0.288   0.233    0.189
0.70   0.597    0.502   0.421    0.354   0.297    0.249
0.75   0.657    0.569   0.492    0.427   0.370    0.321
0.80   0.719    0.639   0.568    0.509   0.452    0.404
0.85   0.782    0.711   0.647    0.591   0.541    0.497
0.90   0.845    0.785   0.729    0.680   0.637    0.599
0.95   0.909    0.860   0.814    0.773   0.738    0.707
1.00   0.973    0.936   0.890    0.868   0.842    0.820
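A numerical sketch (Python standard library only) of two independent ways to approximate m(t) for this family: a Monte Carlo estimate of E[N(t)] and a discretized solution of the renewal equation m(t) = F(t) + ∫_0^t m(t − x) dF(x). The two methods serve as a cross-check of each other, independently of any table (shown for β = 2 at t = 1):

```python
import math
import random

def renewal_mc(beta, t_end, reps, rng):
    """Monte Carlo estimate of m(t) = E[N(t)] for i.i.d. inter-renewal
    times with c.d.f. F(t) = 1 - exp(-t**beta)."""
    total = 0
    for _ in range(reps):
        t = 0.0
        while True:
            t += (-math.log(1.0 - rng.random())) ** (1.0 / beta)  # inverse c.d.f.
            if t > t_end:
                break
            total += 1
    return total / reps

def renewal_numeric(beta, t_end, steps=1000):
    """Trapezoid discretization of m(t) = F(t) + integral of m(t - x) dF(x);
    the implicit m(t_k) term in the first panel is moved to the left side."""
    h = t_end / steps
    F = [1.0 - math.exp(-((k * h) ** beta)) for k in range(steps + 1)]
    m = [0.0] * (steps + 1)
    dF1 = F[1] - F[0]
    for k in range(1, steps + 1):
        acc = F[k] + 0.5 * m[k - 1] * dF1
        for j in range(2, k + 1):
            acc += 0.5 * (m[k - j] + m[k - j + 1]) * (F[j] - F[j - 1])
        m[k] = acc / (1.0 - 0.5 * dF1)
    return m[steps]

rng = random.Random(2)
mc = renewal_mc(2.0, 1.0, 100000, rng)
num = renewal_numeric(2.0, 1.0)
print(round(mc, 3), round(num, 3))   # the two estimates agree closely
```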


References

Aitchison, J. and J.A. Brown. 1957. Lognormal Distribution. Cambridge University Press.
Anderson, T.W. 1984. An Introduction to Multivariate Statistical Analysis. 2nd ed. John Wiley & Sons.
Andronov, A.M. 1994. Analysis of Nonstationary Infinite-Linear Queueing System. Autom. Control and Comp. Sci., 28, 28-33.
Andronov, A.M. and I. Gertsbakh. 1972. Optimum Maintenance in a Certain Model of Accumulation of Damages. Eng. Cybern., No. 5, 620-628.
Artamanovsky, A.V. and Kh.B. Kordonsky. 1970. Estimate of Maximum Likelihood for Simplest Grouped Data. Theory of Probab. and Its Applic., 15, 128-132.
Arunkumar, S.A. 1972. Nonparametric Age Replacement. Sankhya, A, 251-256.
Barlow, R.E. 1998. Engineering Reliability. ASA-SIAM Series on Statistics and Applied Probability.
Barlow, R.E. and A. Marshall. 1964. Bounds for Distributions With Monotone Hazard Rate, I and II. Ann. Math. Statist., 35, 1234-1274.
Barlow, R.E. and F. Proschan. 1975. Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, Inc.
Bellman, R. 1957. Dynamic Programming. Princeton Univ. Press, Princeton, N.J.
Birnbaum, Z.W. 1969. On the Importance of Different Components in a Multicomponent System. In Multivariate Analysis - II, ed. P.R. Krishnaiah, 581-592. New York: Academic Press.
Burtin, Yu. and B. Pittel. 1972. Asymptotic Estimates of the Reliability of a Complex System. Eng. Cybern., 10(3), 445-451.
Cox, D.R. and P.A.W. Lewis. 1966. The Statistical Analysis of Series of Events. London: Methuen.
Crowder, M.J., A.C. Kimber, R.L. Smith and T.J. Sweeting. 1991. Statistical Analysis of Reliability Data. Chapman & Hall.
DeGroot, Morris H. 1970. Optimal Statistical Decisions. McGraw-Hill Company.
Devore, Jay L. 1982. Probability and Statistics for Engineering and the Sciences. Brooks/Cole Publishing Company.
Dodson, B. 1994. Weibull Analysis. ASQC Quality Press.
Elperin, T. and I. Gertsbakh. 1987. Maximum Likelihood Estimation in a Weibull Regression Model With Type I Censoring: A Monte Carlo Study. Communications in Statistics, Simulation and Computation, 18, 349-372.
Elperin, T., I. Gertsbakh and M. Lomonosov. 1991. Estimation of Network Reliability Using Graph Evolution Models. IEEE Trans. Reliab., 40, No. 5, 572-581.
Elsayed, E. 1996. Reliability Engineering.
Feller, W. 1968. Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed. New York: John Wiley and Sons.
Fishman, G.S. 1995. Monte Carlo: Concepts, Algorithms and Applications. Springer.
George, L., H. Mahlooji and Po-Wen Hu. 1979. Optimal Replacement and Build Policies. Proceedings 1979 Ann. Reliability and Maintainability Symposium.
Gertsbakh, I. and Kh. Kordonsky. 1969. Models of Failure. Springer-Verlag.
Gertsbakh, I. 1972. Preventive Maintenance of Objects With Multi-Dimensional State Description. Engineering Cybernetics, No. 5, 91-95 (in Russian).
Gertsbakh, I. 1977. Models of Preventive Maintenance. North-Holland.
Gertsbakh, I. 1984. Optimal Group Preventive Maintenance of a System With Observable State Parameter. J. Appl. Prob., 16, 923-925.
Gertsbakh, I. 1989. Statistical Reliability Theory. Marcel Dekker, Inc.
Gertsbakh, I. and Kh. Kordonsky. 1994. The Best Time Scale for Age Replacement. Intern. J. of Reliab., Quality and Safety Engineering, 1, 219-229.
Gnedenko, B.V., et al. 1969. Mathematical Methods in Reliability Theory. New York: Academic Press.
Kaplan, E.L. and P. Meier. 1958. Nonparametric Estimation From Incomplete Observations. JASA, 53, 457-481.
Khinchin, A.Ya. 1956. Streams of Random Events Without Aftereffect. Theory of Probab. and Its Appl., 1, 1-15.
Kordonsky, Kh.B. and I. Gertsbakh. 1993. Choice of the Best Time Scale for System Reliability Analysis. Europ. J. Operat. Res., 65, 235-246.
Kordonsky, Kh.B. and I. Gertsbakh. 1995. System State Monitoring and Lifetime Scales - I and II. Reliab. Engineering and System Safety, 47, 1-14; 49, 145-154.
Kordonsky, Kh.B. and I. Gertsbakh. 1997. Fatigue Crack Monitoring in Parallel Time Scales. Proceedings of ESREL, Lisboa, 1485-1490.
Kovalenko, I.N., N.Yu. Kuznetsov and Ph.A. Pegg. 1997. Mathematical Theory of Reliability of Time-Dependent Systems. Wiley.
Kullback, S. 1959. Information Theory and Statistics. New York: John Wiley and Sons, Inc.
Kulldorf, G. 1961. Estimation from Grouped and Partially Grouped Data. Wiley: New York.
Lawless, J.F. 1982. Statistical Models and Methods for Lifetime Data. Wiley and Sons, Inc.
Lawless, J.F. 1983. Statistical Methods in Reliability. Technometrics, 25, 305-316.
Lieblein, J. and M. Zelen. 1956. Statistical Investigation of the Fatigue Life of Deep Groove Ball Bearings. J. Res. Nat. Bur. Stand., 57, 273-316.
Mann, N.R. 1967. Tables for Obtaining the Best Linear Invariant Estimates of Parameters of the Weibull Distribution. Technometrics, 9, 629-645.
Mann, N.R. 1969. Optimum Estimators for Linear Functions of Location and Scale Parameters. Ann. Math. Stat., 40, 2149-2155.
Mann, N.R. and K.W. Fertig. 1973. Tables for Obtaining Confidence Bounds and Tolerance Bounds Based on Best Linear Invariant Estimates of Parameters of the Extreme Value Distribution. Technometrics, 17, 87-101.
Miller, Rupert G., Jr. 1981. Survival Analysis. John Wiley & Sons.
O'Connor, Patrick D.T. 1991. Practical Reliability Engineering. 3rd ed. John Wiley & Sons.
Okumoto, E. and E.A. Elsayed. 1983. An Optimum Group Maintenance Policy. Naval Res. Logist. Quart., 30, 667-674.
Rappaport, A. 1998. Decision Theory and Decision Behaviour. McMillan Press, Ltd.
Redmont, E.F., et al. 1997. OR Modeling of the Deterioration and Maintenance of Concrete Structures. Europ. J. Operat. Res., 99, 619-631.
Ross, S.M. 1970. Applied Probability Models With Optimization Applications. Holden-Day, San Francisco.
Ross, S.M. 1993. Introduction to Probability Models. 5th ed. Academic Press, New York.
Sarhan, A.E. and B.G. Greenberg. 1962. The Best Linear Estimates for the Parameters of the Normal Distribution. In Contributions to Order Statistics. New York: Wiley.
Seber, G.A.F. 1977. Linear Regression Analysis. John Wiley & Sons.
Stoikova, L.S. and I.N. Kovalenko. 1986. On Certain Extremal Problems of Reliability Theory (Russian). Tekhn. Kibernetica, 6, 19-23.
Taylor, H.M. and S. Karlin. 1984. An Introduction to Stochastic Modeling. Academic Press, Inc.
Usher, John S., et al. 1998. Cost Optimal Preventive Maintenance and Replacement Scheduling. IIE Trans., 30, 1121-1128.
Vardeman, Stephen B. 1994. Statistics for Engineering Problem Solving. PWS Publishing Company, Boston.
Weibull Analysis Handbook. 1983. Wright-Patterson AFB, Ohio 45433, USA.
White, D.J. 1993. Markov Decision Processes. Wiley and Sons, Inc.
Williams, John H., Alan Davies and Paul R. Drake. 1998. Condition-Based Maintenance and Machine Diagnostics. Chapman & Hall.
Wolfram, Stephen. 1999. The Mathematica Book. 4th ed. Wolfram Media/Cambridge University Press.
Zhang, Fan and Andrew K.S. Jardine. 1998. Optimal Maintenance Models With Minimal Repair, Periodic Overhaul and Complete Renewal. IIE Trans., 30, 1109-1119.


Notation and Abbreviations

c.d.f.   Cumulative distribution function
d.f.     Density function
i.i.d.   Independent, identically distributed
r.v.     Random variable
DFR      Decreasing failure rate
ER       Emergency repair
IFR      Increasing failure rate
NHPP     Nonhomogeneous Poisson process
PM       Preventive maintenance
SMP      Semi-Markov process

X ∼ F(x)          The r.v. X has c.d.f. F(x)
X ∼ f(x)          The r.v. X has d.f. f(x)
X ∼ (µ, σ²)       The r.v. X has mean µ and variance σ²
τ ∼ B(n, p)       The r.v. τ has binomial distribution with parameters n and p
τ ∼ Geom(p)       The r.v. τ has geometric distribution with parameter p
τ ∼ Exp(λ)        The r.v. τ has exponential distribution with parameter λ
τ ∼ Exp(a, λ)     The r.v. τ has exponential distribution with parameters a, λ
τ ∼ Extr(a, b)    The r.v. τ has the extreme-value distribution with parameters a, b
τ ∼ Gamma(k, λ)   The r.v. τ has Gamma distribution with parameters k and λ
τ ∼ logN(µ, σ)    The r.v. τ has lognormal distribution with parameters µ, σ
τ ∼ N(µ, σ)       The r.v. τ has normal distribution with parameters µ, σ.
                  If X ∼ N(µ, σ), then P(X ≤ t) = Φ((t − µ)/σ), where Φ(x) = ∫_{−∞}^x (2π)^{−1/2} e^{−v²/2} dv
τ ∼ P(µ)          The r.v. τ has Poisson distribution with parameter µ
τ ∼ U(a, b)       The r.v. τ is uniformly distributed on [a, b]
τ ∼ W(λ, β)       The r.v. τ has Weibull distribution with parameters λ, β
E[X]              The mean (expected) value of the r.v. X


Var[X]            The variance of the r.v. X
c.v.[X]           The coefficient of variation of X, i.e. c.v.[X] = √Var[X]/E[X]
R(t)              = 1 − F(t) = P(τ > t)
t_p               The p-quantile, i.e. P(X ≤ t_p) = p
X_n →d F(t) as n → ∞   The sequence {X_n, n = 1, 2, ...} converges in distribution to the c.d.f. F(t)
λ(t)              The event rate in the NHPP; λ(t) · ∆ ≈ P(one event in (t, t + ∆))
Λ(t)              = ∫_0^t λ(v) dv, the cumulative event rate in the NHPP
h(t)              Failure (hazard) rate; h(t) = f(t)/(1 − F(t))
x < y             where x = (x_1, ..., x_n), y = (y_1, ..., y_n): x_i ≤ y_i for i = 1, ..., n, and there is at least one coordinate j for which x_j < y_j
m(t)              Renewal function
m′(t)             Renewal density, i.e. dm(t)/dt
#                 End of example, definition or remark
log b             Natural logarithm of b
Equation (2.3.5)  The fifth equation of Section 3 in Chapter 2

Index
