Dynamic Reliability Assessment for Multi-State Systems Utilizing ...

7 downloads 146 Views 2MB Size Report
Systems Utilizing System-Level Inspection Data. Yu Liu, Member .... netic head, a critical unit in hard disk drives. Si et al. ..... restore to a better state. Hence, one ...
IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 4, DECEMBER 2015

1287

Dynamic Reliability Assessment for Multi-State Systems Utilizing System-Level Inspection Data Yu Liu, Member, IEEE, Ming J. Zuo, Senior Member, IEEE, Yan-Feng Li, and Hong-Zhong Huang, Member, IEEE

number of components in an MSS

Abstract—Traditional time-based reliability assessment methods evaluate the reliability of a multi-state system (MSS) from a population or a statistical perspective that the reliability of a system is computed purely based upon historical time-to-failure data collected from a large population of identical components or systems. These methods, however, fail to characterize the stochastic behaviors of a specific individual system. In this paper, by utilizing system-level observation history, a dynamic reliability assessment method for MSSs is put forth. The proposed recursive Bayesian formula is able to dynamically update the reliability function of a specific MSS over time by incorporating system-level inspection data. The dynamic reliability function, state probabilities, and remaining useful life distribution of an MSS in residual lifetime are derived for two common cases: the degradation of components follows a homogeneous continuous time Markov process, and a non-homogeneous continuous time Markov process. The effectiveness and accuracy of the proposed method are demonstrated via two numerical examples.

observation history up until state of an MSS at time instant state of component at time instant set containing all the combinations of components' states when a system is at its state number of combinations of components' states at a system state state of component state of component in the th combination of components' states when a system is at state number of states of component a constant transition intensity of component from its state to state for a homogenous Markov model a time-varying transition intensity of component from its state to state at time for a non-homogenous Markov model probability of component staying at its state at time probability of an MSS staying at its state at time performance capacity of a system at its state

Index Terms—Dynamic reliability assessment, multi-state system, remaining useful life, system-level inspection data.

ABBREVIATION AND ACRONYMS MSS

multi-state system

UGF RUL

universal generating function remaining useful life

PDF

probability density function NOTATION performance capacity of an MSS at time instant performance capacity of component at time instant

Manuscript received March 18, 2014; revised June 24, 2014, August 22, 2014, September 28, 2014, and December 13, 2014; accepted March 29, 2015. Date of publication April 21, 2015; date of current version November 25, 2015. This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Natural Science Foundation of China under contract numbers 51375078, 71101017, and 71371042. A shorter version of this paper was presented at PHM 2013 (September 2013, Milan, Italy) and published in its proceedings [46]. Associate Editor: G. Levitin. (Corresponding Author: M. J. Zuo.) Y. Liu, Y. F. Li, and H. Z. Huang are with the School of Mechanical, Electronic, and Industrial Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]; [email protected]; [email protected]). M. J. Zuo is with the School of Mechanical, Electronic, and Industrial Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China, and also with the Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 2G8 Canada (e-mail: mzuo@ualberta. ca). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TR.2015.2418294

probability density function of remaining useful life given observations up to I. INTRODUCTION

E

NGINEERING systems and their components may manifest multiple states from working perfectly to completely failing during their deterioration process [1]. Multi-state system (MSS) reliability theory, which allows for characterizing the deterioration of degraded systems by introducing multiple intermediate states between the aforementioned two extreme states, has been recognized as a more effective tool to appropriately reveal more details of the complicated stochastic behaviors of advanced engineering systems [2], [3]. Many newly developed methods, like the extended decision diagram-based method [4], stochastic processes [5]–[7], universal generating functions (UGFs) [8], [9], recursive algorithms [10], Monte Carlo simulation [11], [12], and stochastic Petri nets [13] have been used to facilitate the reliability and performance evaluation for a variety of MSSs, e.g., manufacturing systems [14], power systems [15], networked systems [12], grid systems [16], spacecraft [17], municipal infrastructure [18], and defense strategy [19].

0018-9529 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1288

Some recent advances in multi-state system reliability analysis and optimization have been reported in [3]. As demonstrated in these studies, with the assistance of MSS reliability tools, the reliability and performance of systems are more accurately assessed and greatly enhanced [2], [3]. However, it is noteworthy that the aforementioned studies are based on traditional time-based reliability models, which compute the reliability of an MSS from the population or statistical perspective. Put another way, state transition intensities or probabilities, the distribution of the sojourn time at a certain state, and many other quantities that characterize stochastic behaviors of a system are derived from historical data collected from a large population of identical systems, and the reliability assessment of a system over time is conducted purely based upon this statistical information [20]. For a specific individual system, if additional useful knowledge or information related to the stochastic (deteriorating) behaviors of the system becomes available, uncertainty associated with the deterioration of this specific system can be further reduced so as to lead a more precise reliability assessment for this system. Bearing this general concept in mind, dynamic reliability herein is defined as the reliability function or model of a specific system that can be updated or modified by collecting additional useful knowledge or information related to the stochastic (deteriorating) behaviors of the system after the system is put into use. In the past decade, many attempts have been made to assess the dynamic reliability of systems. For example, if denotes the failure time of a binary-state system, the survival probability is the dynamic reliability function of a specific individual system given the knowledge (or observation) that the system survives at least up to [21]. In addition to the status of survival, knowledge or information related to a system's stochastic (deteriorating) behavior, like loading (or usage) history [22], [23], internal and external covariance [24], [25], or condition monitoring data [26], [27], can also be utilized to update the reliability function or model of a specific individual system. As the reliability of a system is directly associated with the system's residual useful life, diverse efforts related to remaining useful life estimation or prediction belong to the scope of dynamic reliability assessment. Methods related to remaining useful life prediction can be roughly classified into two categories, i.e., data-driven, and model-based; and the state-of-the-art methods are summarized in [28]–[31]. Most reported works on dynamic reliability assessment focus on the situation where knowledge or condition monitoring information collected from a system or component only reflects, either directly or indirectly, the current condition and future evolution of the system or component itself [26], [27], [32], [33]. For example, Ye et al. [34] developed a Wiener Process model with measurement error to characterize the wear process of a magnetic head, a critical unit in hard disk drives. Si et al. [35], [36] focused on improving the accuracy of the remaining useful life prediction for a single-component system by introducing nonlinear degradation paths or a path-dependent updating strategy. In these cases, condition monitoring data are merely collected from a unit or a component, and can only be used to update the reliability of the monitored unit or component. However, as most engineered systems consist of more than one component,

IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 4, DECEMBER 2015

and the condition and evolution of a system are eventually determined by its components, tracking and utilizing both component-level and system-level knowledge or condition monitoring information during the system operation stage is expected to improve the accuracy of dynamic reliability assessment for this specific individual system. To achieve our long-term goal of integrating hierarchical knowledge and condition monitoring information across various levels of a system to better assess the system dynamic reliability, our focus in this work is to utilize system-level inspection data of a specific individual MSS to narrow down the possible states of its multi-state components and further update the reliability of these components, and then update the reliability function of the system. This issue is very commonly encountered in engineering practices, for example, where components' states are oftentimes unobservable, but the system state can be accurately observed. However, several combinations of components' states may result in the same system state. In this case, identifying the possible state of each component in the system enables engineers to update the dynamic reliability function of this system, and predict the system's further deteriorating behaviors and remaining useful life. In addition, it also allows for maintaining, as early as possible, the seriously degraded components that have significant contributions to system performance capacity, or may cause failure of the entire system. To the best of our knowledge, this issue has never been addressed in literature. In this paper, by utilizing all the system-level observation history, a recursive Bayesian method is put forth to recursively identify the possible states of components, and further update the reliability function of the system. The dynamic reliability function, state probabilities, and remaining useful life distribution of an MSS are also proposed for two common cases: the degradation of components follows a homogeneous continuous time Markov process, and a non-homogeneous continuous time Markov process. Numerical examples are presented to illustrate the effectiveness of the proposed method. The remainder of this paper is organized as follows. Section II introduces the problem we study in this work. The details of the proposed recursive Bayesian method along with the formulation of dynamic reliability function, state probabilities of both the system and its components, and remaining useful life distribution in residual lifetime are elaborated in Section III. Two numerical examples are presented in Section IV to demonstrate the effectiveness of our proposed method, and they are followed by conclusions and remarks in Section V. II. PROBLEM STATEMENTS The MSSs investigated in this paper consist of multiple components. Every component has multiple discrete states that could be distinguished by either performance capacity or level of degradation. For example, consider a power supply system consisting of generating and transmitting facilities, and each generating unit can function at different levels of capacity. Generating units are complex assemblies of many parts. The failure of different parts may lead to situations in which the generating unit continues to operate, but at a reduced capacity [2]. Another example is that, based on the damage level, the condition of a bearing within a gearbox may be classified into

LIU et al.: DYNAMIC RELIABILITY ASSESSMENT FOR MULTI-STATE SYSTEMS UTILIZING SYSTEM-LEVEL INSPECTION DATA

1289

TABLE II THE SYSTEM STATES AND THE ASSOCIATED COMBINATIONS OF COMPONENTS' STATES.

Fig. 1. A water piping system.

TABLE I FLOW TRANSMISSION RATES OF COMPONENTS.

several states from perfectly working to completely failed; for example, we can define the states as normal, moderately damaged, seriously damaged, and completely failed. Similar treatment can be found in many engineering practices, such as manufacturing systems [14], power systems [15], networked systems [12], grid systems [16], spacecraft [17], and municipal infrastructure [18]. The entire system manifests multiple states with different combinations of components' states. Most often, more than one combination of components' states may result in the same system state. A simple water piping system is exemplified here to illustrate this argument. Suppose the water piping system is comprised of three pipes, as shown in Fig. 1. Units #1 and #2 are connected in parallel, and then serially connected with Unit #3. As reported in [2], [3], [37], [38], the deteriorating process of each pipe may be classified into several discrete states, say good, medium, and failed, based on the performance capacity determined from engineering practice. Each discrete state corresponds to an interval of the flow rate of the pipe, and the middle or the average flow rate in that interval is used to represent the performance capacity of the pipe at each discrete state. The flow transmission rates of each component in each state are tabulated in Table I. The performance capacity of the entire system at any time instant is completely determined by the performance ca( ,2,3). Based on the system pacity of components configuration, and the components' performance capacity, one has as the performance capacity of the entire system equals the minimal value between (the sum of the mass flow rates of Units #1 and #2) and (the mass flow rate of Unit #3). The system states distinguished by the flow transmission rates along with the associated combinations of components' states are given in Table II, and this table corresponds to the structure function . As observed in Table II, some system states, e.g., states 5, 4, 3, 2, and 1, are caused by multiple combinations of components' states. In some engineering practices, it might be impossible to observe the condition of components; for example, it is difficult to set up sensors or devices to detect the status of bearings and gears of a gearbox as they are inside the gearbox. But it is very

easy to track and observe the condition of the entire gearbox via critical signals correlated with the system condition, like vibration signals and oil debris. Instead of directly monitoring the condition of the states of components, the condition monitoring information from the system level becomes important to reflect states of components. Once components' states are identified, the reliability function of the entire system can be updated so as to predict the system's future behavior in a more accurate manner. For example, if the system is detected to be at state 4, Unit #1, and Unit #2 must be at state 1, and state 3 respectively. Unit #3, however, could be either at its state 2 or at its state 3, leading to different deterioration patterns of both Unit #3 and the system in residual lifetime. Identifying the current state of Unit #3, therefore, becomes critical to update the reliability function of the system, and this paper fulfills this need. Before introducing the proposed method to assess the dynamic reliability of a monitored MSS, some assumptions used in this work are summarized as follows. 1. An MSS consists of components, and has a finite number of states , ranging from a perfectly working condition to a completely failed one. denotes the total number of states of the MSS; , and represent the best, and the worst states of the MSS respectively. 2. The component in the MSS could have more than two states denoted as , where is the number of possible states of component ; , and are the best, and the worst states of component respectively. Components in a system monotonically degrade from a better state to a worse one. 3. The transition intensities or the deterioration model of component from its state to state ( ) are known in advance. This assumption is practical as data collected from component-level reliability tests for each component can be used to choose a deterioration model, and estimate the unknown parameters in the deterioration model. In this paper, existing parameter estimation methods for multi-state components with a Markov deterioration profile, as reported in [20], [39], can be used to estimate the transition intensities of components.

1290

IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 4, DECEMBER 2015

4. The structure function of an MSS with respect to components' states is exactly known. The state of an MSS at any time instant can be, therefore, determined by the combination of components' states at any time instant as . Tools like the universal generating function (UGF) [2], [9], and minimal path vector [40], provide a computationally efficient way to find out combinations of components' states with respect to the system state. 5. As multiple combinations of components' states may lead to the same system state, one must have . is a set containing all the combinations of components' states when the system is at its state . It can be further represented as , and denotes the total number of combinations of reprecomponents' states at system state . sents the state of component in the th combination of components' states when the system is at state . 6. The states of components of a specific individual MSS at any time instant after the MSS is put into use are unobservable; whereas the system state is observable during the operation stage by utilizing condition monitoring techniques, and it is assumed in this work that the system state can be immediately identified at any inspection time. The observed state directly corresponds to the actual health condition or performance capacity of a specific individual MSS, and it is denoted as at the th inspection time; while consisting of represents a sequence of system states observed at inspection time instants. The inspection interval could be either periodical or non-periodical. 7. Assume that there is no maintenance activity intervening in the deterioration process of an MSS.

where , . Equation (1) represents the probability of the inspected system staying at its state ( ) with the th combination of components' states given the observation history up to . separates the observation history into two portions: the inspection data at time instant , and these before . By utilizing Bayes' rule, (2) the conditional probability of (1) can be further expanded as (3) acts as the at the bottom of the page, where event of (2); is event , whereas , and , which can be separated into the observation history up until , is event of (2). It is obvious that the first term in the numerator of (3) equals one as event contains . The second term in the numerator of (3) can be expanded as

(4) indicates that, at the ( ) th inspecwhere tion time, the system is at state with the th combination of components' states. The denominator of (3) can be written as

III. PROPOSED DYNAMIC RELIABILITY ASSESSMENT METHOD A. Estimation of Current Status , At any inspection time , the current system state along with the observation history of a specific individual system, are available. The objective here is to identify the states of components at the current time instant by using these sequential system-level inspection data. The associated conditional probability is denoted as

(1)

(5)

(3)

LIU et al.: DYNAMIC RELIABILITY ASSESSMENT FOR MULTI-STATE SYSTEMS UTILIZING SYSTEM-LEVEL INSPECTION DATA

By plugging (4) and (5) into (3) and (1), one has (6) at the bottom of the page. It is noted that (6) is a recursive Bayesian formulation in which the probability of the system staying at its state with the th combination of components' states at inspection time is a function of the probability of the system staying at its state with the th combination of components' states at inspection time , and the probability of the system transiting from the state to within the inspection in. Because we have assumed that, at the beginterval ning of the operation stage, the system and its components are in a brand new condition, the initial condition of (6) for the recursive process is set to . With the assumption that a component's degradation can be characterized by a Markov model, the future state which a component will degrade to is -dependent only on the present state, and -independent of the past states. is, therefore, -in. If the dependent with stochastic behaviors of components are also -independent of one another, one has

(7) is the state of component in the th combination where of components' states when the system is at its state . Such expansion is based on the structure function of the system which links each system state with all the possible components' state combinations. Beside the structure function table (e.g., Table II), other reported methods of representing the structure function of multi-state systems, say the multi-valued structure function [2], can also be used to find the one-to-many relation between the system's states and the components' states. Obviously, the state transition probability of the system is transformed into a product of state transition probabilities of components. To compute the probability of state transitions of a component, many well-established stochastic models such as Markov models [2], or Petri nets [13], can be used. We only briefly review two models, namely the homogenous Markov model, and the non-homogenous Markov model, which have been extensively adopted in MSS modeling [2], [41], [42]. 1) Components' Degradation Governed by a Homogenous Markov Model: In this situation, the transition time between any pair of states of a component is assumed to be exponentially

1291

distributed. The probability of component can be derived by solving the corresponding set of Kolmogorov differential equations (8) ( ) is the probability of where component sojourning at its state at time instant . The initial condition is the most important setting here, and it should be , for . Therefore, by solving these differential equations, equates to . 2) Components' Degradation Governed by a Non-Homogenous Markov Model: As transition intensities vary over the age of a component in a non-homogenous Markov model, deriving the probability of a component at a certain state is more challenging than the aforementioned homogenous Markov model [41], [42]. How to solve a non-homogenous Markov model in a computationally efficient manner is outside the scope of this paper, and the formulation given in [41], [42] can be directly used here, with the initial condition , for ( ). For example, if , that is, the component sojourns during time interval , the value in its state of equates to , and can be expressed as

(9) , the probability of component staying If at state at time while staying at state at time is written as

(10)

(6)

1292

IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 4, DECEMBER 2015

By substituting and (7) into (6), the probability of components' states at the current inspection time can be derived by utilizing all of the system-level observation history up until . It should be noted that the results will vary from system to system even if they pass through the same states but with different transition time instants. This transition is the unique feature of our proposed method, which is able to track the deterioration of every specific individual system rather than a population of identical systems. This point will be demonstrated via our numerical examples in Section IV. B. Prediction of Future Status With the knowledge of components' possible states and associated probabilities at the latest inspection time , the behaviors of the components and the specific individual system in the future residual lifetime can be more precisely predicted. This prediction is achieved by considering the component making a transition from the current state to the next. If the system is observed at state at the current inspection time, the probability of the system being in state ( ) at the end of a specified time span can be computed by

and the corresponding mean remaining useful life (RUL) of the system is formulated as

(14) whereas the probability density function of the RUL of the system can be computed by (15) where is the threshold state, and the system getting into a state lower than can be viewed as failure. For example, for the multi-state weighted -out-of- :G system, the state is defined as the lowest state whose utility of all components is greater than or equal to [43]; whereas, for the MSS defined in [2], the state is the lowest state whose performance capacity is greater than or equal to the user demand. In the latter case, the dynamic reliability function can be written as

(16)

(11) is

the

time

elapsed

after

.

The probability can be computed in the same manner as (7), but the only difference worth noting is to replace , and of (7) by , and , respectively. The probability of component staying at its state ( ) for the rest of its lifetime can

where the random quantity denotes the user demand with possible discrete values, denotes the th ( ) possible variable of the random quantity , is the probability takes the value of , and is an indicator function that which takes the value of one if is not less than zero and zero otherwise. Hence, the corresponding mean remaining useful life is expressed as

be computed by (17)

(12) indicates the highest state of where component among all the combinations of components' states when the system is at state at inspection time . Hence, the dynamic reliability function of the system in the future lifetime is expressed as

(13)

Even though only the type of MSS whose reliability is defined as its performance capacity rate being greater than its specified user demand is exemplified here, the proposed method has broad applications to various types of MSSs if the threshold state in (14) can be clearly defined. IV. NUMERICAL EXAMPLES With the aim of validating the effectiveness of the proposed method, two numerical examples are presented in this section. The first example is designed to provide a step-by-step procedure to facilitate the use of the new method in engineering practices. Results from Monte Carlo simulation are provided to validate our method. In the second example, the proposed method is further applied to a hypothetic mechanical system of which state transition intensities vary with the components' age.

LIU et al.: DYNAMIC RELIABILITY ASSESSMENT FOR MULTI-STATE SYSTEMS UTILIZING SYSTEM-LEVEL INSPECTION DATA

TABLE III TRANSITION INTENSITIES OF COMPONENTS WATER PIPING SYSTEM.

IN THE

1293

STUDIED

A. Example 1 – A Water Piping System The illustrative example of a water piping system introduced earlier is presented here to demonstrate the use of the proposed dynamic reliability assessment method. The transition time of a component from its better state to a lower one is assumed to follow an exponential distribution; that is, the deterioration of each component in the system can be characterized by a homogeneous continuous time Markov process. The transition intensities of components are tabulated in Table III. These parameters can be obtained by analyzing the historical transition data collected via component-level reliability tests as reported in [20], [39]. As the deterioration processes of these components are assumed to be -independent, the probability of any component in one of its states can therefore be computed by resolving the corresponding Kolmogorov differential equations shown in (8). The initial condition is set to be and ( ), indicating component is at the best state at . The details of solving the Kolmogorov differential equations can be found in [2], [3], [6], [38]. By solving the corresponding Kolmogorov differential equations, one can get the state probabilities of Unit #1 over time as

The state probabilities of Units #2 and #3 are identical, and are

where for Unit #2, and for Unit #3. The corresponding state probability curves for Units #1, #2, and #3 are plotted in Fig. 2(a)–(c), respectively. One has , and thus the system state probabilities can be derived by using the universal generating function (UGF) (See [2] for the details of UGF). The system state probabilities starting from time zero are depicted in Fig. 3 by the dotted lines with marks, representing the traditional time-based state probability evaluation without taking into account inspection data. Per the proposed method in this work, both the state probabilities of components and systems can be updated dynamically for an individual specific system if the system state could be observed during the lifetime of the system. Suppose a specific system (System #1) is observed in its state 4 at months after being put into use. However, there are two possible combinations of components' states, as shown in Table II, i.e.,

Fig. 2. The state probabilities of components. (a) Unit #1. (b) Unit #2. (c) Unit #3.

Based on (6), the probability is shown in the first equation at the bottom of the next page, where , and . One has as components of the system are all in their best states at the beginning of time. The conditional probability can be computed by multiplying the probabilities of components transiting from one combination to another as (7). Thus, one has the second equation at the bottom of the next page, where can be computed by the aforementioned Markov model. For example, months , as one can observe from Fig. 2(a). By plugging state transition probabilities of all components into (3), one has , , indicating that Unit #3 is more likely in its state 2 than state 3 at this moment. The updated state probabilities of the system in the residual lifetime right after months can be derived via (11). For example, the probability of the system being in state 2 is formulated as

1294

Thus, one has the last equation at the bottom of the page. As we assume that the system is non-repairable, components cannot restore to a better state. Hence, one has . Given that the system state is observed to be 4 at months, the updated state probabilities months are also plotted of the system starting from in Fig. 3. With this inspection data, starting from months, the system may only be in states 1, 2, 3, and 4. The updated state probabilities after are quite different from the state probabilities extending from time 0. For example, is quite different from after , in the interval , in Fig. 3. If state 1 of the system is the failure state, i.e., , the system reliability equals the probability of the system staying at states greater than state 1. The updated reliability curve of the specific system after incorporating the inspection information at is plotted in Fig. 4 by the dashed line. As seen in Fig. 4, the updated reliability of the system is greater than the reliability assessed without utilizing the system-level observation over the time span of [0.8, 2.2] months; whereas the updated reliability possesses a relatively faster declining trend after 2.2 months. By utilizing the ensuing system-level inspection data, the reliability of the specific system can be further updated. Suppose System #1 is further observed in its state 2 at months. The state probabilities of components can be recursively updated by using (6) following the same fashion as our previous demonstration. Hence, one has

IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 4, DECEMBER 2015

It indicates that the Unit #3 of the system has a greater probability in its state 2 than its state 1. The updated state probabilities, and the updated reliability curve of System #1 after , are respectively delineated in Fig. 3, and Fig. 4 by the solid lines with marks. Though different systems from the same population have identical state transition intensities, the dynamic reliability of a specific system may deviate from the population, and that difference may be apparent if system-level observations are obtained and utilized. Suppose System #2 has the exactly same state transition intensities as System #1 studied earlier. However, System #2 is observed to be in state 5 at , months. The corresponding dynamand in state 3 at ically updated state probabilities, and the reliability of System #2 are illustrated in Fig. 5, and Fig. 6 respectively. In Fig. 5, the curves starting from time 0 denote the state probabilities without taking into account inspection data, whereas the curves starting from , and those starting from represent the updated state probabilities by utilizing system-level inspection data at , and respectively. By comparing Systems #1 and #2, one can observe that the updated state probabilities and the reliability of these two systems are different. System #1 has a faster degradation trend than System #2 after . B. Results Validation The Monte Carlo simulation is conducted in this subsection to further verify analytical results of the proposed method for the

LIU et al.: DYNAMIC RELIABILITY ASSESSMENT FOR MULTI-STATE SYSTEMS UTILIZING SYSTEM-LEVEL INSPECTION DATA

1295

Fig. 3. The original and updated state probabilities of System #1.

Fig. 6. The original and updated reliability of System #2 after inspections.

Fig. 4. The original and updated reliability of System #1 after inspections.

Fig. 7. The flowchart of the Monte Carlo simulation.

systems which are not in state 5 at months will be removed from the copies of the generated systems. Assume that the number of the remaining systems that match with a sequence of observations up until is . The probability that the specific system is with a certain combination of components' states can be computed as (19) Fig. 5. The original and updated state probabilities of System #2.

water piping system. The basic flowchart of the Monte Carlo simulation for computing the state probabilities of a specific individual MSS is outlined in Fig. 7. sets of state transition data, i.e., the time instant at which a system transits from one state to another, are first generated for copies of the studied MSSs in accordance with transition intensities of components. Based on the inspection data, the systems that do not match with observations will be eliminated. For example, if a specific individual MSS is observed in its state 5 at months, the

where is the number of realizations of the system whose components' states combination is at . In the remaining lifetime, the state probabilities of the specific system whose observations data is are expressed by (20) is the number of systems which are in state where at . The computation process will continue until there is no additional inspection data for the specific system.

1296

IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 4, DECEMBER 2015

TABLE IV THE SYSTEM STATES WITH RESPECT TO COMBINATIONS OF COMPONENTS' STATES.

Fig. 8. The updated state probabilities from both simulation and the proposed analytical method.

TABLE V TRANSITION INTENSITIES OF COMPONENTS IN MECHANICAL SYSTEM.

THE

HYPOTHETICAL

Fig. 9. The updated reliability from both simulation and the proposed analytical method.

Both simulation and analytical results for a special System #3 are plotted in Fig. 8, and Fig. 9, where the system is observed in state 6, and state 3 at months, and months, respectively. In Fig. 8, the curves starting from time 0 represent the state probabilities (analytical and simulated results) without taking into account inspection data, whereas the curves starting from represent the updated state probabilities (analytical and simulated results) utilizing inspection data obtained at , and the curves starting from are results utilizing inspection data at . is set to 50,000 at initial time . Only some representative state probabilities (states 6, 3, and 1) are plotted in Fig. 8 to avoid the possible confusion due to a huge number of curves. As one can observe from Figs. 8 and 9, the analytical results are exactly identical with the simulation results. However, for the simulation with , it takes around 600 seconds via Matlab R2008 on a PC with an Intel Core(TM) Duo 2 GHz CPU and 4 GB RAM; whereas the proposed analytical method is able to get accurate results within 2 seconds. C. Example 2 – A Hypothetical Mechanical System In this subsection, a numerical example of a hypothetical multi-state mechanical system consisting of three components

is used to further demonstrate the proposed method in the case where state transition intensities vary over time. The health condition of every component can be roughly categorized into three states, namely normal (state 3), moderately damaged (state 2), and seriously damaged (state 1); whereas the health condition of the entire system is classified into six states from perfectly working to completely failed, denoted as state 6 to state 1. The health condition of the system is completely determined by the states of components. The structure function characterizing the relationship between the system state and the combi( , 2, 3) is given in nation of components' states Table IV. As most mechanical components suffer from aging failures, e.g., wear, corrosion, cracking, etc. during their lifetime, it is more appropriate to model the deterioration process of components via a non-homogenous Markov model. Put another way, the state transition intensities of components increase with age. Hence, the state transition intensities of each component are assumed to be a function of the component's age as shown in Table V. To illustrate that the accuracy of system reliability assessment will be improved by the proposed method via incorporating system-level inspection data, we randomly generate a set

LIU et al.: DYNAMIC RELIABILITY ASSESSMENT FOR MULTI-STATE SYSTEMS UTILIZING SYSTEM-LEVEL INSPECTION DATA

TABLE VI TIME INSTANTS OF STATE TRANSITIONS.

TABLE VII THE OBSERVATIONS FROM INSPECTIONS OF THE SPECIFIC SYSTEM.

1297

TABLE VIII THE MSE OF PREDICTED RULS.

working or has failed at each inspection time. In this case, the can traditional survival probability be directly used to compute the dynamic reliability of the system if the system is observed in the working state at inspection time , where is the traditional time-based multi-state system reliability function without taking into account inspection data. 2) Case II: Instead of using the recursive Bayesian formulation to estimate the probability of components' states combinations, we assume that the probability of the system staying at any one of combinations of components' states is the same, i.e., . A loss function is used here to calculate the mean squared errors (MSEs) of the predicted RUL with respect to the actual RUL. The loss function is written as [44] (21)

Fig. 10. The PDFs of predicted remaining useful life at inspection points.

of state transition data for a specific individual system by using the components' transition intensities in Table V with the assumption that all the components are in a completely new condition at month. Based on the state transition times of components, and the relationship between the system's states and components' states, we can get the time instants that the simulated system transits from one combination of components' states to another, as tabulated in Table VI. The specific system is inspected four times (except month) during its lifetime, and the corresponding system states observed at these inspection points are listed in Table VII. By utilizing the system-level observations, the reliability of the specific system can be updated, as well as the distribution of the remaining useful life (RUL). The probability density functions (PDFs) of the remaining useful life of the specific system are plotted in Fig. 10, where both the state transition time and inspection time are also indicated. One can observe that the bounds of the PDFs of RULs become narrower as the system deteriorates from better states to lower ones, indicating a reduction of the predictive uncertainty for the RUL. All the PDFs of RULs encompass the actual RUL (denoted by the dash line) of the specific system. The PDF at an inspection time equal to zero is the result of traditional time-based reliability assessment methods where system-level observation is not taken into account, and it has the widest distribution bound, indicating a great uncertainty in predicting the possible failure time of the system. For the sake of illustrating the effectiveness of the proposed method in terms of predicting the RUL, the results are compared with that of the following two cases. 1) Case I: It is assumed that the exact state of the system is unobservable, but one can only know whether the system is

where is the actual RUL at ; , the same as (15), is the PDF of the predicted RUL at . A smaller value of indicates a better prediction of RUL. The MSEs for the three cases are presented in Table VIII, and the smallest value at each inspection time is highlighted. At month, the MSEs of all the three methods are the same because no inspection data are incorporated at this moment. When taking account of inspection data, as shown in Table VIII, Case I has the largest error among three cases, indicating the predictive capability is the worst. The major reason is that Case I uses the least information (either working or failed) associated with the system's deterioration process, and it leads to the worst prediction of RUL. The inspection data of both the proposed method and Case II contains the information of the state of the system, resulting in a better predictive capability. The proposed method outperforms Case II as it allows for incorporating all the system-level inspection data up until to achieve a more accurate prediction of RUL. V. CLOSURE, AND REMARKS To utilize the system-level observation history of a specific system to improve the accuracy of reliability assessment, a dynamic reliability assessment method for MSSs is developed in this paper. A Bayesian formula is put forth to recursively identify possible states of components at inspection time, further update the reliability function, and predict the remaining useful life of the system. The dynamic reliability function of MSSs and state probabilities of components in the residual lifetime are proposed for two common cases: the degradation of components follows a homogeneous continuous time Markov process, and a non-homogeneous continuous time Markov process. As demonstrated in our numerical examples, by utilizing the system-level observations of a specific system, the reliability function of the system can be dynamically updated so as to provide a more

1298

IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 4, DECEMBER 2015

accurate prediction of the remaining useful life of the specific system than that of traditional time-based reliability assessment methods. Even though only three-component multi-state systems are exemplified in this work, it is worth noting that the proposed method can be straightforwardly applied to various types of multi-state systems (e.g., multi-state network, multi-state weighted -out-of- systems, etc.) with a larger number of components, or a more complicated system configuration (beyond series, parallel, mixed series-parallel) by replacing (7) per the relation between the system's states and components' states. However, it is required that the relation between the system's states and components' states should be deterministic and exactly known as presented in Table II, and Table VI for the two studied examples respectively. There are several important issues to be addressed in our on-going works. First, in this work, we have assumed that the system state is perfectly observable during system operation. However, in some engineering practices, the system state may not be directly inspected, or the observed system state may not directly correspond to the actual performance capacity of the system. For example, a generator could be used to produce only 100 MW electric power while its actual performance capacity is greater than 100 MW. Another example is that condition monitoring systems collecting signals from sensors may be used to indirectly assess the health condition of a system. Noises and errors from condition monitoring may, therefore, lead to imperfect observations of system state or condition, which in turn will impact the accuracy of dynamic reliability assessment. In all these cases, the actual state of a system is partially observed. These situations are considered to be future research topics. Second, only system-level inspection data are considered in the present work. There is no doubt that integrating condition monitoring data collected across various physical levels of a system, say from component-level to system-level, will further improve the accuracy of dynamic reliability assessment. Third, in some engineering practices, the relation between components and systems can only be characterized by a probabilistic way rather than a deterministic model [45]. It is more challenging to conduct dynamic reliability assessment for such circumstances, and this problem will be explored in the future. Fourth, the effectiveness of maintenance activities will be greatly enhanced if the proposed dynamic reliability assessment is taken into account in maintenance scheduling, and it will be demonstrated in our future works. ACKNOWLEDGMENT Comments and suggestions from all reviewers and the Associate Editor are very much appreciated. REFERENCES [1] W. Kuo and M. J. Zuo, Optimal Reliability Modeling: Principles and Applications. Hoboken, NJ, USA: Wiley, 2003. [2] A. Lisnianski and G. Levitin, Multi-State System Reliability Assessment, Optimization, Application. Singapore: World Scientific, 2003. [3] A. Lisnianski, I. Frenkel, and Y. Ding, Multi-State System Reliability Analysis and Optimization for Engineers and Industrial Managers. London, U.K.: Springer, 2010.

[4] A. Shrestha and L. Xing, “A logarithmic binary decision diagram-based method for multistate system analysis,” IEEE Trans. Rel., vol. 57, no. 4, pp. 595–606, 2008. [5] W. Li and H. Pham, “Reliability modeling of multistate degraded systems with multi-competing failures and random shocks,” IEEE Trans. Rel., vol. 54, no. 2, pp. 297–303, 2005. [6] A. Lisnianski, “Extended block diagram method for a multi-state system reliability assessment,” Rel. Eng. Syst. Safety, vol. 92, no. 12, pp. 1601–1607, 2007. [7] Y. W. Liu and K. C. Kapur, “Reliability measures for dynamic multistate nonrepairable systems and their applications to system performance evaluation,” IIE Trans., vol. 38, no. 6, pp. 511–520, 2006. [8] I. A. Ushakov, “Universal generating function,” Soviet J. Comput. Syst. Sci., vol. 24, no. 5, pp. 118–129, 1986. [9] G. Levitin, The Universal Generating Function in Reliability Analysis and Optimization. London, U.K.: Springer, 2005. [10] W. Li and M. J. Zuo, “Reliability evaluation of multi-state weighted -out-of- systems,” Rel. Eng. Syst. Safety, vol. 93, no. 1, pp. 160–167, 2008. [11] E. Zio, M. Marella, and L. Podofillini, “A Monte Carlo simulation approach to the availability assessment of multi-state systems with operational dependencies,” Rel. Eng. Syst. Safety, vol. 92, no. 7, pp. 871–882, 2007. [12] J. E. Ramirez-Marquez and D. W. Coit, “A Monte-Carlo simulation approach for approximating multi-state two-terminal reliability,” Rel. Eng. Syst. Safety, vol. 87, no. 2, pp. 253–264, 2005. [13] Y. F. Li, E. Zio, and Y. H. Lin, “A multistate physics model of component degradation based on stochastic Petri nets and simulation,” IEEE Trans. Rel., vol. 61, no. 4, pp. 921–931, 2012. [14] M. Nourelfath, M. C. Fitouhi, and M. Machani, “An integrated model for production and preventive maintenance planning in multi-state systems,” IEEE Trans. Rel., vol. 59, no. 3, pp. 496–506, 2010. [15] D. Elmakias, New Computational Methods in Power System Reliability. London, U.K.: Springer, 2008. [16] G. Levitin and Y. S. Dai, “Service reliability and performance in grid system with star topology,” Rel. Eng. Syst. Safety, vol. 92, no. 1, pp. 40–46, 2007. [17] J. H. Saleh and J. F. Castet, Spacecraft Reliability and Multi-State Failures a Statistical Approach. West Sussex, U.K.: Wiley, 2011. [18] Y. Ding, M. J. Zuo, Z. Tian, and W. Li, “The hierarchical weighted multi-state k-out-of-n system model and its application for infrastructure management,” IEEE Trans. Rel., vol. 59, no. 3, pp. 593–603, 2010. [19] K. Hausken and G. Levitin, “Minmax defense strategy for complex multi-state systems,” Rel. Eng. Syst. Safety, vol. 94, no. 2, pp. 577–587, 2009. [20] A. Lisnianski, D. Elmakias, D. Laredo, and H. B. Haim, “A multi-state Markov model for a short-term reliability analysis of a power generating unit,” Rel. Eng. Syst. Safety, vol. 98, no. 1, pp. 1–6, 2012. [21] D. Banjevic, “Remaining useful life in theory and practice,” Metrika, vol. 69, no. 2–3, pp. 337–349, 2009. [22] S. Mathew, P. Podgers, V. Eveloy, N. Vichare, and M. Pecht, “A methodology for assessing the remaining life of electronic products,” Int. J. Perform. Eng., vol. 2, no. 4, pp. 383–395, 2006. [23] M. Pecht and J. Gu, “Physics-of-failure-based prognostics for electronic products,” Trans. Inst. Meas. Control, vol. 31, no. 3–4, pp. 309–322, 2009. [24] H. Liao, W. Zhao, and H. Guo, “Predicting remaining useful life of an individual unit using proportional hazards model and logistic regression model,” in Proc. Annu. Reliability and Maintainability Symp., 2006, pp. 127–132. [25] M. Y. You, F. Liu, and G. Meng, “Proportional hazards model for reliability analysis of solder joints under various drop-impact and vibration conditions,” J. Risk Rel., vol. 226, no. 2, pp. 194–202, 2012. [26] N. Z. Gebraeel, M. A. Lawley, R. Li, and J. K. Ryan, “Residual-life distributions from component degradation signals: A Bayesian approach,” IIE Trans., vol. 37, no. 6, pp. 543–557, 2005. [27] W. Wang and A. H. Christer, “Towards a general condition based maintenance model for a stochastic dynamic system,” J. Oper. Res. Soc., vol. 51, no. 2, pp. 145–155, 2000. [28] A. K. S. Jardine, D. Lin, and D. Banjevic, “A review on machinery diagnostics and prognostics implementing condition-based maintenance,” Mech. Syst. Signal Process., vol. 20, no. 7, pp. 1483–1510, 2006. [29] A. Heng, S. Zhang, A. C. C. Tan, and J. Mathew, “Rotating machinery prognostics: State of the art, challenges and opportunities,” Mech. Syst. Signal Process., vol. 23, no. 3, pp. 724–739, 2009.

LIU et al.: DYNAMIC RELIABILITY ASSESSMENT FOR MULTI-STATE SYSTEMS UTILIZING SYSTEM-LEVEL INSPECTION DATA

[30] Y. Peng, M. Dong, and M. J. Zuo, “Current status of machine prognostics in condition-based maintenance: A review,” Int. J. Adv. Manuf. Technol., vol. 50, no. 1–4, pp. 297–313, 2010. [31] X. S. Si, W. Wang, C. H. Hu, and D. H. Zhou, “Remaining useful life estimation-A review on the statistical data driven approaches,” Eur. J. Oper. Res., vol. 213, no. 1, pp. 1–14, 2011. [32] M. Dong and D. He, “A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology,” Mech. Syst. Signal Process., vol. 21, no. 5, pp. 2248–2266, 2007. [33] A. Ghasemi, S. Yacount, and M. S. Ouali, “Evaluating the reliability function and the mean residual life for equipment with unobservable states,” IEEE Trans. Rel., vol. 59, no. 1, pp. 45–54, 2010. [34] Z. S. Ye, Y. Wang, K. L. Tsui, and M. Pecht, “Degradation data analysis using Wiener processes with measurement errors,” IEEE Trans. Rel., vol. 62, no. 4, pp. 772–780, 2013. [35] X. S. Si, W. Wang, M. Y. Chen, C. H. Hu, and D. H. Zhou, “A degradation path-dependent approach for remaining useful life estimation with an exact and closed-form solution,” Eur. J. Oper. Res., vol. 226, no. 1, pp. 53–66, 2013. [36] X. S. Si, W. Wang, C. H. Hu, D. H. Zhou, and M. Pecht, “Remaining useful life estimation based on a nonlinear diffusion degradation process,” IEEE Trans. Rel., vol. 61, no. 1, pp. 50–67, 2012. [37] G. Levitin, The Universal Generating Function for Reliability Assessment and Optimization. London, U.K.: Springer, 2004. [38] C. M. Tan and N. Raghavan, “A framework to practical predictive maintenance modeling for multi-state systems,” Rel. Eng. Syst. Safety, vol. 93, no. 8, pp. 1138–1150, 2008. [39] P. Lin, Y. Liu, X. Zhang, and Z. Huang, “Bayesian parameter estimation for multi-state components,” in Proc. 2013 Int. Conf. Quality, Reliability, Risk, Maintenance, and Safety Engineering, 2013, pp. 198–201. [40] M. J. Zuo, Z. Tian, and H. Z. Huang, “An efficient method for reliability evaluation of multistate networks given all minimal path vectors,” IIE Trans., vol. 39, no. 8, pp. 811–817, 2007. [41] Y. W. Liu and K. Kapur, “Reliability measures for dynamic multistate nonrepairable systems and their applications to system performance evaluation,” IIE Trans., vol. 38, no. 6, pp. 511–520, 2006. [42] Y. Liu and H. Z. Huang, “Optimal replacement policy for multi-state system under imperfect maintenance,” IEEE Trans. Rel., vol. 59, no. 3, pp. 483–495, 2010. [43] W. Li and M. J. Zuo, “Reliability evaluation of multi-state weighted -out-of- systems,” Rel. Eng. Syst. Safety, vol. 93, no. 1, pp. 160–167, 2008. [44] M. J. Carr and W. Wang, “Modeling failure modes for residual life prediction using stochastic filtering theory,” IEEE Trans. Rel., vol. 59, no. 2, pp. 346–355, 2010. [45] A. G. Wilson and A. V. Huzurbazar, “Bayesian networks for multilevel system reliability,” Rel. Eng. Syst. Safety, vol. 92, no. 10, pp. 1413–1420, 2007. [46] Y. Liu, M. J. Zuo, and H. Z. Huang, “Dynamic reliability assessment for multi-state degraded systems,” in Chemical Engineering Trans. (Proc. 2013 Prognostics and System Health Management Conf.), Milan, Italy, Sep. 8–11, 2013, vol. 33, pp. 535–540. Yu Liu (M'14) received his Ph.D. degree in Mechatronics Engineering from the University of Electronic Science and Technology of China in 2010.

1299

He is a Professor in the School of Mechanical, Electronic, and Industrial Engineering, at the University of Electronic Science and Technology of China. He was a Visiting Pre-doctoral Fellow in the Department of Mechanical Engineering at Northwestern University, USA from 2008 to 2010; and a Postdoctoral Research Fellow in the Department of Mechanical Engineering, at the University of Alberta, Canada from 2012 to 2013. He has published over 50 peer-reviewed papers in international journals and conferences. His research interests include system reliability modeling and analysis, maintenance decisions, prognostics and health management, and design under uncertainty.

Ming J. Zuo (S'89-M'89-SM'00) received the Bachelor of Science degree in Agricultural Engineering in 1982 from Shandong Institute of Technology, China; and the Master of Science degree in 1986, and the Ph.D. degree in 1989, both in Industrial Engineering, from Iowa State University, Ames, IA, USA. He is affiliated with the University of Electronic Science and Technology of China, and the Department of Mechanical Engineering at the University of Alberta, Canada. His research interests include system reliability analysis, maintenance modeling and optimization, signal processing, and fault diagnosis. Dr. Zuo is an Associate Editor of the IEEE TRANSACTIONS ON RELIABILITY (2009–2014, 2015-present), Department Editor of IIE Transactions, Regional Editor for North and South American region for International Journal of Strategic Engineering Asset Management, and Editorial Board Member of Reliability Engineering and System Safety, and International Journal of Reliability, Quality and Safety Engineering. He is a Fellow of IIE, Fellow of EIC, and Founding Fellow of International Society of Engineering Asset Management.

Yan-Feng Li received his Ph.D. degree in Mechatronics Engineering from the University of Electronic Science and Technology of China in 2013. He is currently a faculty member of the University of Electronic Science and Technology of China. His research interests include the reliability analysis and evaluation of complex systems, dynamic fault tree modeling, Bayesian networks modeling, and probabilistic inference.

Hong-Zhong Huang (M'06) received a PhD degree in Reliability Engineering from Shanghai Jiaotong University, China He is a Professor of the School of Mechanical, Electronic, and Industrial Engineering, at the University of Electronic Science and Technology of China. He has held visiting appointments at several universities in the USA, Canada, and Asia. He has published 200 journal papers and 5 books in the fields of reliability engineering, optimization design, fuzzy sets theory, and product development. His current research interests include system reliability analysis, warranty, maintenance planning and optimization, and computational intelligence in product design. Prof. Huang is an ISEAM Fellow, a technical committee member of ESRA, a Co-Editor-in-Chief of the International Journal of Reliability and Applications, and an Editorial Board Member of several international journals. He received the William A. J. Golomski Award from the Institute of Industrial Engineers in 2006, and the Best Paper Award of the ICFDM2008, ICMR2011, and QR2MSE2013.

Suggest Documents