International Journal of Production Research, Vol. 43, No. 6, 15 March 2005, 1275–1293
HMMs for diagnostics and prognostics in machining processes

P. BARUAH and R. B. CHINNAM*

Department of Industrial and Manufacturing Engineering, Wayne State University, 4815 Fourth Street, Detroit, MI 48202, USA
(Received 21 October 2004)

Despite considerable advances over the last two decades in sensing instrumentation and information technology infrastructure, monitoring and diagnostics technology has not yet found its place in health management of mainstream machinery and equipment. This is in spite of numerous studies reporting that the expected savings from widespread deployment of condition-based maintenance (CBM) technology would be in the tens of billions of dollars in many industrial sectors as well as in governmental agencies. It turns out that a prerequisite to widespread deployment of CBM technology and practice in industry is cost-efficient and effective diagnostics and prognostics. This paper presents a novel method for employing hidden Markov models (HMMs) for carrying out both diagnostic and prognostic activities for metal cutting tools. The methods employ HMMs for modelling sensor signals emanating from the machine (or features thereof), and in turn, identify the health state of the cutting tool as well as facilitate estimation of remaining useful life. This paper also investigates some of the underlying issues of proper HMM design and training for the express purpose of effective diagnostics and prognostics. The proposed methods were validated on a physical test-bed, a vertical drilling machine. Experimental results are very promising.

Keywords: Diagnostics; Hidden Markov models; Process monitoring; Prognostics; Remaining useful life
1. Introduction

Despite considerable advances in sensing instrumentation, data acquisition hardware, and information technology infrastructure, condition monitoring and diagnostics are still largely reserved for only the most critical system components and have not yet found their place in health management of mainstream machinery and equipment (Kacprzynski 2000). In the case of prognostics, there exist no robust methods for even the most critical system components. [Diagnostics has traditionally been defined as the ability to detect and sometimes isolate a faulted component and/or failure condition. Prognostics build upon the diagnostic assessment and are defined here as the capability to predict the progression of this fault condition to component failure and estimate the remaining useful life (RUL).]

*Corresponding author. E-mail: [email protected]
International Journal of Production Research ISSN 0020-7543 print/ISSN 1366-588X online © 2005 Taylor & Francis Ltd, http://www.tandf.co.uk/journals, DOI: 10.1080/00207540412331327727

Recognizing the inability to prevent costly unscheduled equipment breakdowns through
preventive maintenance (PM) activities and basic condition monitoring methods, there seems to be consensus within different industries that one of the next great opportunities for successful competition in the global market lies in widespread deployment of condition-based maintenance (CBM) technology. For motivation, consider the following examples:

- Maintenance activities at the United States Department of Defense alone cost over $40 billion annually (Department of Defense 2000 Maintenance Policy: Programs and Resources Fact Book 2000).
- Maintenance costs at medium-sized power utility companies exceed operating profit (Geibig 1999).
- The average cost-cutting potential for maintenance activities is a staggering 50% (Theede 1999).
- Annual savings from widespread deployment of CBM technology in the United States alone are estimated at $35 billion (Lee 2003).

A prerequisite to widespread deployment of effective CBM technology and practice in industry is effective diagnostics and prognostics (NIST 1998). CBM entails maintenance of systems and equipment based on an assessment of current and projected condition or health (Lebold 2001). CBM increases system efficiency through elimination of unnecessary maintenance. More importantly, CBM leads to better system availability. The economic ramifications of CBM are manifold, since it affects labour requirements, replacement part costs, and the logistics of scheduling routine maintenance (Carey 2000). Development of effective methods for diagnostics, and in particular prognostics, has been one of the main research thrusts of the CBM community over the last few years. A primary challenge in performing diagnostics for machining processes is the need to achieve a high degree of accuracy in classifying a cutting tool's health state in real-time given some sensory signals.
Industry experience shows that even a few false alarms during the technology evaluation phase can easily lead operators to disable the monitoring and diagnostic modules. Given our current inability to fully explain the changes in these machine states or conditions through mechanistic modelling (i.e. models derived from first principles), there is a need to rely heavily on empirical models developed from sensor signals, often employing sensor fusion (Hall 2001). There are at least two reasons why 'generic' CBM technology based on empirical methods seems promising. First, the cost of developing first-principles-based diagnostic and prognostic algorithms is prohibitive and can only be justified for extremely critical systems; an extremely small fraction of machinery and equipment falls into this category. Secondly, generic technology based on recent advances in statistical and computational intelligence methods lends itself to effective diagnostics, and more importantly, to widespread deployment in a cost-effective manner. The focus of this paper is on the promise of one such family of statistical methods for developing effective CBM technology for tools in machining processes.

A stochastic approach called the hidden Markov model (HMM) has been quite effective in applications such as speech processing and medical diagnostics (Liporace 1982, Rabiner 1989). HMMs are statistical models based on the principles of Markov chains. They were initially introduced in the early 1970s (Baum 1966, 1972) and became very popular in the late 1980s and 1990s (Rabiner 1989). There are two strong reasons behind this. First, the models are very rich in mathematical structure
and hence can form the theoretical basis for use in a wide range of applications. Secondly, the models, when applied properly, have worked very well in several important applications (Rabiner 1989). An article by Olivier Cappé, 'Ten years of HMMs', with well over 300 references, clearly shows the widespread acceptance and successful use of HMMs in different areas (Olivier 2001). They are currently regarded as the 'gold standard' for the extremely difficult and somewhat similar task of speech recognition. An added benefit of employing HMMs is the ease of model interpretation in comparison with so-called 'black box' modelling methods, such as the artificial neural networks often employed in advanced diagnostic models.

There has been limited use of HMMs by the diagnostics community. The literature does suggest the applicability of these types of models for carrying out diagnostics (Carey 2000, Altas 2002, Begg 1999, Ertunc 2000). However, important issues such as the task of selecting an appropriate model from among the different types of HMMs and the task of setting the number of states within the HMM are not well covered in the literature. What is also missing from the literature is proper justification of HMMs for diagnostics (given the assumptions that underlie any HMM). In recent years, numerous extensions to the classic HMM have been proposed (Bilmes 1998a, Bengio and Frasconi 1996, Brand 1996, Fine et al. 1998, Ghahramani and Jordan 1997). The diagnostics community is yet to investigate the advantages of using any of these models. Literature on prognostic methods is extremely limited, but the concept has been gaining importance in recent years. Unlike the numerous methods available for diagnostics, prognostic methods are still in their infancy, and the literature is yet to present a working model for effective prognostics.
The theory presented in this paper, in combination with the case study, shows the successful application of HMMs for carrying out both diagnostic and prognostic activities for machining processes. In particular, the study involves monitoring of drill bits by employing a thrust-force and torque dynamometer. This paper also investigates some of the underlying issues of proper HMM design and training for the purpose of diagnostics that are missing from the literature. The ability to perform prognostics through HMMs is a novel contribution of this paper, which suggests a method for estimating RUL and the associated confidence intervals. The paper is organized as follows: Section 2 provides a brief background on HMMs, while Section 3 presents a framework for carrying out diagnostics and prognostics using HMMs. Section 4 shows the results from a drilling process case study, and Section 5 offers a discussion regarding the limitations of HMMs and their overall suitability for diagnostic and prognostic activities. Finally, Section 6 offers some concluding remarks.
2. Brief background on HMMs

An HMM is a stochastic technique for modelling signals that evolve through a finite number of states. The states are assumed hidden and responsible for producing the observations. In particular, a (first-order) HMM assumes that the system behaviour depends only on the current state. The objective is to characterize the states given the observations. Denoting $X_t$ as the hidden state at time $t$ and $O_t$ as the observation at time $t$, and assuming that there are $K$ possible states, we have $X_t \in \{1,\dots,K\}$. $O_t$ might be a discrete symbol, $O_t \in \{1,\dots,L\}$, or a feature vector from an $L$-dimensional space, $O_t \in \mathbb{R}^L$.

A model is characterized through its parameters. The parameters for a basic (first-order) HMM are the initial state distribution $\pi(i) = P(X_1 = i)$, the transition model $A = \{a_{i,j}\} = P(X_t = j \mid X_{t-1} = i)$, and the observation model $b_j(o_t) = p(O_t = o_t \mid X_t = j)$, the probability of a particular observation vector at time $t$ for state $j$. The complete collection of parameters for all observation distributions is represented by $B = \{b_j(\cdot)\}$. In a basic HMM, these parameters are assumed stationary or time-invariant, i.e. the state transition matrix and the observation models do not change with time. Hence, an HMM can be written as $\mathrm{HMM}(\pi, A, B)$. $\pi(\cdot)$ normally represents a multinomial distribution. The transition model is characterized by a conditional multinomial distribution, where $A$ is a stochastic matrix (each row sums to one). If the observations are discrete symbols, one can represent the observation model as a matrix $B = \{b_i(k)\} = P(O_t = k \mid X_t = i)$. There are many possible observation distributions. When the observations are discrete, the distributions $\{b_i(k)\}$ are mass functions; when the observations are continuous, the distributions are typically specified using a parametric model family. It is common to represent $P(O_t \mid X_t)$ as a Gaussian for observation vectors in $\mathbb{R}^L$ (Murphy 2002):

$$P(O_t = o \mid X_t = i) = \mathcal{N}(o; \mu_i, \Sigma_i) \qquad (1)$$

where $\mathcal{N}(o; \mu_i, \Sigma_i)$ is the Gaussian density with mean $\mu_i$ and covariance $\Sigma_i$ evaluated at $o$:

$$\mathcal{N}(o; \mu, \Sigma) = \frac{1}{(2\pi)^{L/2} |\Sigma|^{1/2}} \exp\!\left\{-\tfrac{1}{2}(o - \mu)' \Sigma^{-1} (o - \mu)\right\} \qquad (2)$$

A more flexible representation is a mixture of $M$ Gaussians:

$$P(O_t = o \mid X_t = i) = \sum_{m=1}^{M} P(M_t = m \mid X_t = i)\, \mathcal{N}(o; \mu_{m,i}, \Sigma_{m,i}) \qquad (3)$$

where $M_t$ is a hidden variable that specifies the mixture component, and $P(M_t = m \mid X_t = i) = c_{i,m}$ is the conditional prior weight of each mixture component. The observations need not be exclusively discrete, continuous, scalar-, or vector-valued. Generally, models for discrete symbols are called discrete HMMs and those that model continuous observations are called continuous HMMs. In conventional designation, $X_t$ is hidden and $O_t$ is observed; the rest of this document follows this notation.

Graphically, any HMM can be depicted in at least three ways (Bilmes 2002). The first view portrays only a directed state-transition graph, as in figure 1(a). This graph depicts only the allowable transitions in the HMM's underlying Markov chain, not the output distributions or the conditional independence properties of the chain. A second HMM view, displayed in figure 1(b), shows how HMMs are one instance of a directed graphical model (DGM; also called a Bayesian network). This view shows the conditional independence property of the HMM and its temporal progression, but the Markov chain topology is unspecified. It is clear that these views show different HMM properties, and combined they reveal most of the information about the model.
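To make the observation model concrete, here is a minimal numpy sketch of the Gaussian density of equation (2) and the mixture-of-Gaussians density of equation (3). The function names are illustrative, not from the paper:

```python
import numpy as np

def gaussian_pdf(o, mu, sigma):
    """Multivariate Gaussian density N(o; mu, sigma), as in equation (2)."""
    L = len(mu)
    diff = o - mu
    quad = diff @ np.linalg.solve(sigma, diff)      # (o - mu)' Sigma^{-1} (o - mu)
    norm = np.sqrt((2 * np.pi) ** L * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

def mixture_obs_density(o, weights, mus, sigmas):
    """Mixture-of-Gaussians observation density b_i(o), as in equation (3).

    weights: mixture weights c_{i,m} (must sum to one)
    mus, sigmas: per-component means and covariances for state i
    """
    return sum(w * gaussian_pdf(o, mu, s)
               for w, mu, s in zip(weights, mus, sigmas))
```

For a one-dimensional standard normal, `gaussian_pdf` at the mean returns $1/\sqrt{2\pi} \approx 0.3989$, and a mixture of identical components returns the same value.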
[Figure 1 appears here: (a) a stochastic finite-state automaton with states S1-S8 and arrows denoting non-zero transition probabilities; (b) the graphical-model view with hidden nodes X_{t-1}, X_t, X_{t+1}, X_{t+2} and observed nodes O_{t-1}, O_t, O_{t+1}, O_{t+2}.]

Figure 1. Graphical illustration of an HMM. (a) Stochastic finite-state automaton view of an HMM. Arrows denote non-zero state transition probabilities. (b) HMM shown as a graphical model.
2.1 HMM learning

An HMM models the joint probability of a collection of random variables $\{O_1,\dots,O_T, X_1,\dots,X_T\}$. Under an HMM, two conditional independence assumptions are made about these random variables that make the associated algorithms tractable. The independence assumptions are:

The $t$th hidden variable, given the $(t-1)$st hidden variable, is independent of all previous variables:

$$P(X_t \mid X_{t-1}, O_{t-1}, \dots, X_1, O_1) = P(X_t \mid X_{t-1}) \qquad (4)$$

The $t$th observation, given the $t$th hidden variable, is independent of all other variables:

$$P(O_t \mid X_T, O_T, X_{T-1}, O_{T-1}, \dots, X_1, O_1) = P(O_t \mid X_t) \qquad (5)$$

Now, learning refers to the estimation of the HMM parameters $\pi$, $A$ and $B$. It is normally carried out through an iterative learning process: a-priori values of $\pi$, $A$ and $B$ are assumed, and observations are presented iteratively to the model for estimation of the parameters. Maximum likelihood is the basic concept behind this estimation procedure. In each iteration, the goal is to maximize the expected data log-likelihood (the logarithm of the probability that the model generated the observation sequence). All the results reported in this paper are based on the use of the Baum-Welch expectation-maximization (EM) algorithm (Poritz 1988). This iterative process continues until the change in log-likelihood falls below some threshold and convergence is declared. The procedure should normally be repeated several times, for the algorithm normally converges to a local best estimate (i.e. a local optimal solution). Here we briefly discuss the EM algorithm for a continuous HMM with a mixture of Gaussians. The two basic problems associated with HMM learning are:

- To find $p(O \mid \lambda)$ for some observation sequence $O = (o_1,\dots,o_T)$. We use the forward-backward (FB) algorithm (described in section 2.1.1) for this purpose, since it is more efficient than the direct evaluation method.
- To find $\lambda^{*} = \arg\max_{\lambda} p(O \mid \lambda)$. The Baum-Welch (also called EM for HMMs) algorithm solves this problem.

2.1.1 FB algorithm: estimating $p(O \mid \lambda)$ from a given $O = (o_1,\dots,o_T)$. $p(O \mid \lambda)$ is obtained using either the forward procedure or the backward procedure of the FB algorithm. The forward procedure consists of recursive estimation of

$$\alpha_i(t) \stackrel{\mathrm{def}}{=} p(O_1 = o_1, \dots, O_t = o_t, X_t = i \mid \lambda),$$
which is the probability of seeing the partial sequence $o_1,\dots,o_t$ and ending up in state $i$ at time $t$. Here $\lambda = (\pi, A, B)$ denotes the complete set of HMM parameters. Recursive estimation of $\alpha_i(t)$ proceeds as follows (Bilmes 2002):

$$\alpha_i(1) = \pi_i\, b_i(o_1)$$
$$\alpha_j(t+1) = \Big[\sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\Big]\, b_j(o_{t+1})$$
$$p(O \mid \lambda) = \sum_{i=1}^{N} \alpha_i(T) \qquad (6)$$

The backward procedure involves recursive estimation of

$$\beta_i(t) \stackrel{\mathrm{def}}{=} p(O_{t+1} = o_{t+1}, \dots, O_T = o_T \mid X_t = i, \lambda),$$

which is the probability of the ending partial sequence $o_{t+1},\dots,o_T$ given that we start in state $i$ at time $t$. $\beta_i(t)$ can be computed efficiently as (Bilmes 1998b):

$$\beta_i(T) = 1$$
$$\beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1)$$
$$p(O \mid \lambda) = \sum_{i=1}^{N} \beta_i(1)\, \pi_i\, b_i(o_1) \qquad (7)$$
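As an illustration, a minimal numpy sketch of the forward procedure of equation (6) follows, with the standard per-step scaling commonly used to avoid numerical underflow on long sequences (the scaling is an implementation detail, not part of equation (6)):

```python
import numpy as np

def forward_loglik(pi, A, B):
    """Forward pass of the FB algorithm, equation (6).

    pi: (N,)   initial state distribution
    A:  (N, N) transition matrix, rows sum to one
    B:  (T, N) B[t, j] = b_j(o_t), per-state observation likelihoods
    Returns log p(O | lambda), accumulated via per-step scaling constants.
    """
    T, N = B.shape
    alpha = pi * B[0]                       # alpha_i(1) = pi_i b_i(o_1)
    loglik = 0.0
    for t in range(1, T):
        c = alpha.sum()                     # scaling constant
        loglik += np.log(c)
        alpha = ((alpha / c) @ A) * B[t]    # alpha_j(t+1) recursion
    return loglik + np.log(alpha.sum())     # log p(O | lambda)
```

As a sanity check, with uniform parameters and constant per-state likelihood $b$, the result reduces to $T \log b$.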
2.1.2 EM for estimation of the HMM parameters $\lambda = (\pi, A, B)$. EM is an optimization algorithm; its goal is to find $\lambda^{*} = \arg\max_{\lambda} p(O \mid \lambda)$. The EM algorithm uses both the forward and backward procedures outlined above. We define the following:

$$\gamma_i(t) \stackrel{\mathrm{def}}{=} p(X_t = i \mid O, \lambda),$$

which is the probability of being in state $i$ at time $t$ given the observation sequence $O$. Using the Markovian conditional independence assumptions, $\gamma_i(t)$ can be expressed in terms of $\alpha_i(t)$ and $\beta_i(t)$ as (Bilmes 1998b):

$$\gamma_i(t) = \frac{\alpha_i(t)\,\beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\,\beta_j(t)} \qquad (8)$$

In particular, when $P(O_t \mid X_t)$ is modelled as a mixture of Gaussians, we define the probability that the $l$th component of the $i$th mixture generated observation $o_t$ as

$$\gamma_{il}(t) = \gamma_i(t)\,\frac{c_{il}\, b_{il}(o_t)}{b_i(o_t)} = p(X_t = i, M_{it} = l \mid O, \lambda) \qquad (9)$$

where $M_{it}$ is a random variable indicating the mixture component at time $t$ for state $i$.

$$\xi_{ij}(t) \stackrel{\mathrm{def}}{=} p(X_t = i, X_{t+1} = j \mid O, \lambda),$$

which is the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$. $\xi_{ij}(t)$ can be expanded as:

$$\xi_{ij}(t) = \frac{\alpha_i(t)\, a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1)}{p(O \mid \lambda)} \qquad (10)$$
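As an illustration, the posterior quantities of equations (8) and (10) can be computed from unscaled forward and backward passes as follows (a sketch adequate for short sequences; production code would add per-step scaling):

```python
import numpy as np

def posteriors(pi, A, B):
    """Compute gamma_i(t), equation (8), and xi_ij(t), equation (10).

    pi: (N,) initial distribution; A: (N, N) transitions;
    B:  (T, N) B[t, j] = b_j(o_t).
    Returns gamma (T, N) and xi (T-1, N, N).
    """
    T, N = B.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))                  # beta_i(T) = 1
    alpha[0] = pi * B[0]
    for t in range(1, T):                   # forward recursion
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    for t in range(T - 2, -1, -1):          # backward recursion
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    pO = alpha[-1].sum()                    # p(O | lambda)
    gamma = alpha * beta / pO               # equation (8)
    # xi[t, i, j] = alpha_i(t) a_ij b_j(o_{t+1}) beta_j(t+1) / p(O | lambda)
    xi = (alpha[:-1, :, None] * A[None]
          * (B[1:] * beta[1:])[:, None, :]) / pO
    return gamma, xi
```

By construction, each `gamma[t]` sums to one, and summing `xi[t]` over the next state recovers `gamma[t]`.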
Given there are $E$ observation sequences, the $e$th being of length $T_e$, the EM updating equations for the estimates of an HMM (i.e. $\pi$, $A$ and $B(c, \mu, \Sigma)$) are (Bilmes 1998b):

$$\bar{\pi}_i = \frac{\sum_{e=1}^{E} \gamma_i^e(1)}{E}$$

$$\bar{a}_{ij} = \frac{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \xi_{ij}^e(t)}{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \gamma_i^e(t)}$$

$$\bar{c}_{il} = \frac{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \gamma_{il}^e(t)}{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \gamma_i^e(t)} \qquad (11)$$

$$\bar{\mu}_{il} = \frac{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \gamma_{il}^e(t)\, o_t^e}{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \gamma_{il}^e(t)}$$

$$\bar{\Sigma}_{il} = \frac{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \gamma_{il}^e(t)\, (o_t^e - \bar{\mu}_{il})(o_t^e - \bar{\mu}_{il})^{T}}{\sum_{e=1}^{E} \sum_{t=1}^{T_e} \gamma_{il}^e(t)}$$
2.2 Training the HMM-based classifier

'Training' a classifier refers to estimation of the parameters of the classifier from sample signals using a supervised or an unsupervised training scheme. We discuss here a supervised training scheme, where the gathered observations (whether discrete symbols or continuous signals) first need to be labelled according to some labelling criterion; they constitute the training data set. For example, if the task of the classifier is to recognize M distinctly different states of a machine (for a given failure mode), the assumption here is that observation sequences are available from each of these machine states. Given these M groups of observation sequences (where each group contains one or more observation sequences), M different HMMs are trained to characterize each group, and in turn, the machine state. These M HMMs collectively constitute the classifier, and their parameters constitute the classifier's parameter set. Therefore, the learning algorithm for the HMM-based classifier is a simple sequential application of the EM algorithm for learning each of the M HMMs.

2.3 HMM-based classifier for detection of health state

The procedure for classification or identification of the machine state from any future (or unlabelled) observation sequence using the M trained HMMs is rather straightforward. Each of the M trained HMMs is presented with the same sequence, and the sequence is classified according to the highest likelihood, i.e. sequence $O$ belongs to the $m$th HMM if $p(O \mid \lambda_m) = \max_{1 \le k \le M} p(O \mid \lambda_k)$. This procedure is illustrated in figure 2. Here, $p(O \mid \lambda)$ is obtained using the forward-backward algorithm discussed in section 2.1.1.

[Figure 2 appears here: a block diagram in which sensory signals pass through perceptual processing to produce an observation sequence, which is scored by HMM_1, ..., HMM_M via log-likelihood computation; a 'select maximum' block outputs the index of the recognized state.]

Figure 2. Block diagram for an HMM-based classifier. The perceptual processing phase produces the feature vectors, and the already trained HMMs are used for indexing them into a particular class.

The choice between a discrete and a continuous HMM for building a classifier depends on whether the observations are discrete or continuous. There are instances when continuous signals are transformed into discrete sequences. Several techniques are available to quantize continuous observations $O_t \in \mathbb{R}^L$ into discrete symbols, $O_t \in \{1,\dots,L\}$, via codebooks, etc. The advantage in doing so is that discrete HMMs (necessary for modelling discrete symbols) do not invoke any assumption regarding the underlying distribution of the observed symbols. Instead, the probability mass function associated with each state is estimated in a non-parametric form. However, such quantization or discretization can lead to a loss of information from the observation sequence (Rabiner 1989). Thus, it might be advantageous to work with continuous HMMs as long as the necessary assumptions regarding the observation densities are not seriously violated.
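The classification rule of section 2.3 amounts to an argmax over per-model log-likelihoods. A minimal sketch follows, in which the scoring callables stand in for trained HMMs combined with the forward algorithm (the function name is illustrative):

```python
import numpy as np

def classify(loglik_fns, O):
    """HMM-based classifier in the spirit of figure 2.

    loglik_fns: one callable per trained HMM; each returns
                log p(O | lambda_m) for the presented sequence,
                e.g. via the forward algorithm of section 2.1.1.
    O: the observation sequence to classify.
    Returns the index of the recognized state and all scores.
    """
    scores = [f(O) for f in loglik_fns]
    return int(np.argmax(scores)), scores
```

For example, with three models scoring a sequence at -10.0, -3.5 and -7.2, the classifier returns index 1 (the second state).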
3. Diagnostics and prognostics using HMMs

3.1 HMMs for diagnostics

As defined earlier, diagnostics involves detection and sometimes isolation of a faulted component and/or failure condition. By definition it lacks the power of forecasting future performance, something fundamental to carrying out effective prognostics. As illustrated in figure 3, a typical diagnostics tool consists of sensors, feature selection algorithms, inference algorithms, and action functions. Given a set of appropriate sensor signals (preferably conditioned), the feature selection process can be rather involved and might utilize time-domain analysis methods, frequency-domain analysis methods (such as the FFT and STFT), as well as joint time-frequency-domain analysis methods (such as wavelets and empirical mode decomposition). The action function uses the results from an inference model, and in turn activates another system, e.g. an alarm.

[Figure 3 appears here: physical process → sensor(s) → feature selection → inference model(s) → action function.]

Figure 3. Components of a typical diagnostics model.

The inference model block is the focus here. HMMs can be successfully employed for inference modelling in several diagnostic systems. The task is to develop trained HMMs that recognize the different states of the machining process. For example, if the goal is to develop a diagnostics model for classifying the state of a drill bit, one would develop multiple HMMs to recognize the different states of interest (such as 'good bit', 'slightly worn out bit', and 'fully worn out bit'). Given a set of labelled observation sequences for each of the M distinct states of interest, the process outlined in section 2 can be employed for building the trained HMMs. Once the HMMs are trained, the process outlined in figure 2 can be employed for state classification.

The performance of any inference model can greatly depend on the quality of the features extracted during the feature extraction process. While 'sophisticated' features can make the task of the inference model easy, this often comes at the expense of time (necessary for feature selection), computational complexity, and cost. The goal is to develop an inference model that is not demanding in terms of feature extraction and yet delivers accurate inferences without a lot of computational complexity (and hardware implementation costs, if appropriate). In that sense, the continuous HMMs employed here for performing diagnostics in a drilling process almost eliminated the need for any feature extraction.

An HMM-based classifier as a diagnostics module can be a generic diagnostic engine for almost any kind of monitored real-world signal. It has been argued in the literature (Bilmes 2002) that only a finite number of states are needed to model real-world signals; here, real-world signals encompass any band-limited non-stationary or non-steady signals. An HMM's Markov chain establishes the temporal evolution of the process's statistical properties. Therefore, an HMM with enough states and a Markov chain with a fast enough average state-change rate can capture all the inherent signal variability.
3.2 HMMs for prognostics

Prognostics build upon the diagnostic assessment and are defined here as the capability to predict the progression of a fault condition to component failure and to estimate the remaining useful life (RUL). Prognostics essentially extend the performance of diagnostic tools by adding 'forecasting' functionality. For example, if the diagnostics process reveals a failure mechanism in action, it is the function of the prognostics process to forecast the future course of the failure mechanism as well as to estimate the RUL under current operating conditions. In this paper, we demonstrate that the information extracted by the HMMs employed for diagnostics can also be utilized for carrying out prognostics.

Let us suppose that M HMMs have been successfully trained to recognize the M distinct sequential states of interest for a failure mechanism. Presentation of temporally ordered (by life/usage) observation sequences from such a process would yield the sorts of log-likelihood trajectories illustrated in figure 4. If HMM_j results in the largest log-likelihood for a given observation sequence O_i, one would declare the process to be in state j. The coordinates, along the life/usage axis, of the points of intersection of the log-likelihood profiles of the different HMMs in figure 4 represent the estimated state-transition time instants. It is these state-transition points that allow us to extend the use of HMMs to prognostics. This process is described below.
[Figure 4 appears here: log-likelihood versus observation sequences (ordered by life/usage) for HMM_1, HMM_2, ..., HMM_M; the crossing points of the profiles, e.g. t_{S1→S2}, mark the estimated state transitions.]

Figure 4. Log-likelihoods resulting from the different HMMs employed by a diagnostics model when presented with temporally ordered (by life/usage) observation sequences.
If the objective of the diagnostics process is to recognize the M distinct states of a cutting tool, then presenting the trained HMM set (i.e. HMM_1, HMM_2, ..., HMM_M) with a temporally ordered observation sequence from a unit, as illustrated in figure 4, will result in a vector of $M-1$ state transition times, denoted by $\mathbf{S} = [t_{S_1\to S_2}, t_{S_2\to S_3}, \dots, t_{S_{M-1}\to S_M}]$. Let us suppose that observation sequences are available from R similar units, collected for the purpose of developing diagnostics and prognostics models. This results in R estimated vectors of state transition times, denoted by $\omega = [\mathbf{S}_1, \mathbf{S}_2, \dots, \mathbf{S}_R]$. The fundamental assumption here is that $\omega$ provides the information necessary to carry out prognostics. The procedure is as follows. We start with the assumption that $\omega$ follows some multivariate distribution. Once the distribution is assessed, the conditional probability distribution of a distinct state transition $t_{S_i\to S_{i+1}}$ given the previous state transition points (i.e. $t_{S_1\to S_2}, \dots, t_{S_{i-1}\to S_i}$) for any 'individual' unit under investigation can be estimated. The process of constructing any necessary confidence intervals is rather straightforward as well, and the process can be iterated recursively to make predictions regarding several sequential state transition times. As expected, the larger the number of state transitions already witnessed for a unit, the tighter the prediction intervals associated with the final states of the unit. Section 4 illustrates this method in more detail using a case study.
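As a sketch of this prognostic step, suppose the vector of transition times is modelled as multivariate normal, with mean and covariance estimated from the R training vectors; the next transition time given the witnessed ones then follows from standard Gaussian conditioning. This distributional choice is our illustrative assumption; the paper prescribes only 'some multivariate distribution':

```python
import numpy as np

def conditional_transition_time(mean, cov, observed):
    """Predict the (k+1)th state-transition time given the first k witnessed
    ones, assuming the transition-time vector is multivariate normal.

    mean: (M-1,) mean of [t_{S1->S2}, ..., t_{SM-1->SM}]
    cov:  (M-1, M-1) covariance matrix
    observed: the first k witnessed transition times for this unit
    Returns the conditional mean and variance of transition k+1.
    """
    k = len(observed)
    m1, m2 = mean[:k], mean[k]
    S11 = cov[:k, :k]          # covariance of witnessed transitions
    S12 = cov[:k, k]           # cross-covariance with the next transition
    w = np.linalg.solve(S11, S12)
    cond_mean = m2 + w @ (np.asarray(observed) - m1)
    cond_var = cov[k, k] - w @ S12
    return cond_mean, cond_var
```

A two-sided prediction interval then follows as `cond_mean ± z * sqrt(cond_var)` (e.g. z = 1.96 for 95%); iterating the conditioning over successive transitions yields predictions, and hence RUL estimates, for the remaining states.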
4. Case study: diagnostics and prognostics in a drilling process

The drilling process, one of the most commonly used machining processes, is selected here as a test-bed for validating the proposed HMM-based diagnostics and prognostics framework. The proposed methods should remain relevant for other machining processes as well. The diagnostics objective here is to assess the health or well-being of the drill bit during the machining process by utilizing thrust-force and torque signals generated by a Transducer Techniques dynamometer. The prognostics objective is to estimate the RUL of the drill bit. In particular, the goal is to assess the condition of the cutting edges of the drill bit, which are normally subjected to gradual wear along the cutting lips and the chisel edge. These wear processes are gradual degradation processes, hence their compatibility with the proposed framework. The literature demonstrates that thrust-force and torque signals have a strong correlation with the condition of the bit (in terms of geometry), hence the selection of these signals. On the contrary, if the goal is to predict
micro-chipping along the cutting lips, thrust-force and torque signals may not be appropriate, for they respond too late; this would certainly prevent us from performing any meaningful prognostics. A more appropriate signal for such catastrophic failures is an acoustic emission signal (which gives out bursts of energy starting at the earliest stages of crack formation and propagation).

The experimental set-up consists of a HAAS VF-1 CNC Machining Center, a workstation with LabVIEW software for signal processing, a Kistler 9257B piezo-dynamometer for measuring thrust-force and torque, and a National Instruments PCI-MIO-16XE-10 card for data acquisition. The experimental set-up is depicted in figure 5. Stainless steel bars with a thickness of 0.25 inches are used as specimens for the tests. The drill bits were high-speed twist drill bits with two flutes, and were operated under the following conditions without any coolant: a feed-rate of 4.5 inches per minute (ipm) and a spindle speed of 800 revolutions per minute (rpm). Fourteen drill bits were used in the experiment. Each drill bit was used until it reached a state of physical failure, either due to excessive wear or due to gross plastic deformation of the tool tip caused by excessive temperature (resulting from excessive wear). The thrust-force and torque data was collected for each hole from the time instant the drill penetrated the work piece through the time instant the drill tip protruded from the other side of the work piece. The data was collected at 250 Hz, considered adequate to capture cutting tool dynamics in terms of thrust-force and torque. For illustrative purposes, the raw data collected from drill bit no. 8 is depicted in figure 6. Since the drill bits were used until they reached a state of total physical failure, the data collected from the very last hole attempted by any given drill bit is corrupt and tends to be quite noisy and unpredictable. Given that, the information collected from the last attempted hole is ignored throughout this study.
Figure 5. Experimental set-up for capturing thrust-force and torque degradation signals from a ¼″ HSS drill bit.

[Figure 6 appears here: plots of actual thrust-force (in Newtons) and torque (in Newton-meters) against time for drill bit no. 8, with traces from hole #1 through hole #23.]

Figure 6. Plots of thrust-force and torque signals collected from drill bit no. 8.
4.1 Preparing data for model building

4.1.1 Defining a training sequence. Throughout this study, a training sequence is defined as one that covers an individual hole. Due to bit wear and non-uniformity of the work piece surface, the actual time necessary to drill a hole varies somewhat from hole to hole, resulting in training sequences of different lengths. Given that the signature from any single hole is non-stationary in nature (as evident from figure 6), care should be exercised in ensuring consistency of the sequences presented to the diagnostic HMMs in terms of 'scope' and 'meaning'. For example, if training sequences are generated from random segments of a hole, one might have limited success training the HMMs, but there exists no guarantee that the models will offer any generalization. This is not so critical if the signals are stationary in nature (such as current signals and vibration signals).

4.1.2 Issue of resolution. The smaller the time window covered by the sequence, the better the resolution (health-state recognition resolution improves during actual implementation). However, this is only possible to the extent that there is enough information in the sequence for potential state recognition; observation sequences that are too short can lead to false alarms or misclassifications. In this study, the assumption is made that a resolution of one hole is acceptable.

4.1.3 Normalization. Thrust-force and torque signal amplitudes are quite different, as is apparent from figure 6. To improve the convergence properties of the EM algorithm used for training the HMMs, the observation sequences are all normalized individually.

4.1.4 Deciding on the number of health states. The mechanics of drilling processes suggest that a typical drill bit (subjected to failure due to gradual wear) evolves through three distinct health states prior to reaching failure. The initial phase, labelled here as 'good', is a short span during which the drill bit wears in.
The second phase, labelled here as 'medium', is the longest span, during which the drill bit is subjected to very gradual wear. The third phase, labelled here as 'bad', is the span during which rapid degradation takes place. The very last phase is defined here as 'worst' and represents the very last hole or two, during which the drill bit is practically ineffective in maintaining any sort of tolerances on the generated hole. We assumed that a typical drill bit on average goes through all these distinct health-states. Hence, M = 4 HMMs need to be trained to recognize these distinct states (labelled here as HMMGood, HMMMedium, HMMBad, and HMMWorst).

4.1.5 Labelling the individual training sequences. In this study, it would have been best to label the individual training sequences (i.e. holes) by employing either a machining expert or a more objective technical measure such as degree of rake wear. However, this would have called for removal of the drill bit from the machine for evaluation after completing every hole. This might still be warranted in a real-world application (to achieve good diagnostics accuracy). However, to improve the pace of experimentation, the following labelling scheme is employed in this study: the first two holes of the drill bit are always labelled good, the three middlemost holes are labelled medium, the second-last and third-last holes are labelled bad, and the last hole is labelled worst.
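The per-hole normalization of section 4.1.3 can be sketched as a simple per-sequence z-scoring of each channel. This is a minimal illustration under our own assumptions (synthetic signal values, hypothetical function name), not the authors' code:

```python
import numpy as np

def normalize_sequence(seq):
    """Z-score normalize one observation sequence (T x D), channel by channel.

    Each hole's thrust-force/torque sequence is scaled independently, as
    section 4.1.3 suggests, to ease EM convergence despite the very
    different raw amplitudes of the two channels.
    """
    seq = np.asarray(seq, dtype=float)
    mean = seq.mean(axis=0)
    std = seq.std(axis=0)
    std[std == 0.0] = 1.0  # guard against a constant channel
    return (seq - mean) / std

# Illustrative two-channel (thrust, torque) sequence of very unequal scale
rng = np.random.default_rng(0)
hole = np.column_stack([200 + 30 * rng.standard_normal(150),   # thrust-like
                        5 + 0.8 * rng.standard_normal(150)])   # torque-like
norm = normalize_sequence(hole)
```

Because every hole is normalized on its own, sequences of different lengths (18 to 55 holes per bit) need no global statistics.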
4.1.6 Preparing training and testing data sets. Of the 14 drill bits tested, data from 10 bits are used for training the HMMs and data from the remaining four bits are used for testing the models. This leads to 2 × 10 = 20 training sequences for HMMGood, 30 for HMMMedium, 20 for HMMBad, and 10 for HMMWorst. Note that these drill bits lasted anywhere between 18 and 55 holes.
4.2 HMM design issues

4.2.1 Choice of HMM type. For the current case study, an ergodic model is most appropriate, for imposing any constraint on the model's freedom (such as the enforcement of a strict left–right state-transition structure) may impair its ability to model the data. While the literature shows that a left–right (LR) model is sometimes appropriate for these kinds of observations, there are some potential difficulties in envisioning the drilling-hole cycle as an LR process. If the drilling cycle were physically an LR process, modelling the observations through an LR-HMM would definitely lead to better generalization. However, the joint scatter plot of thrust-force vs. torque illustrated in figure 7(b) clearly shows that the path followed by the hole exhibits local randomness, meaning that the hidden states of the HMM could potentially be revisited multiple times. Thus, even if the physical phenomenon of the drilling process suggests an LR process, the captured representations do not. In our case study, preliminary analysis revealed that ergodic models in fact offer superior performance to LR models.

There are other types of HMMs besides ergodic HMMs and LR-HMMs. Factorial HMMs, auto-regressive HMMs, hidden filter HMMs, etc., are some special cases of HMMs (Murphy 2002). These models succinctly describe certain classes of signals. In this paper, not all these types of HMMs are investigated; the chief interest has been to investigate the applicability of HMMs as generic diagnostic/prognostic engines.
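The structural difference between the two model types discussed above amounts to the shape of the state-transition matrix: an ergodic HMM allows every state to reach every other, while a strict left–right HMM forbids revisiting a state once it is left. A minimal sketch of the two initializations (illustrative function names of our own, not from the paper):

```python
import numpy as np

def ergodic_transmat(n_states, rng):
    """Fully connected (ergodic) transition matrix: every state is
    reachable from every other; rows are normalized to sum to one."""
    A = rng.random((n_states, n_states))
    return A / A.sum(axis=1, keepdims=True)

def left_right_transmat(n_states, rng):
    """Strict left-right (Bakis) transition matrix: upper-triangular,
    so a state, once left, can never be revisited."""
    A = np.triu(rng.random((n_states, n_states)))
    return A / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
A_erg = ergodic_transmat(5, rng)
A_lr = left_right_transmat(5, rng)
```

The zero entries below the diagonal of the LR matrix stay zero under Baum–Welch re-estimation, which is why the constraint, once imposed, cannot be undone by training; the local randomness seen in figure 7(b) is the reason the unconstrained (ergodic) form is preferred here.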
Figure 7. (a) Normalized thrust-force and torque signals for a particular hole. (b) Joint plot of normalized thrust-force and torque signals during a particular hole. Note the overall shape in spite of local randomness.

4.2.2 The issue of number of HMM states. The number of states within the HMM affects the generalization of the model. While an HMM with too many states may yield high log-likelihoods during the training phase, it most often leads to an over-fitted model, resulting in poor performance during the testing phase. There exist at least
two options for setting the number of states: (a) choose roughly the same number of states that are actually present in a drilling cycle during a typical hole; (b) employ cross-validation methods for optimization of the number of states. Option (a) is obviously more demanding in terms of knowledge of the process mechanics. In the absence of such knowledge, option (b) is our best bet. In this study, a cross-validation process is employed. Figure 8 shows the results from this process. Furthermore, each HMM is restricted to have the same number of states, because comparison of their likelihoods is the basis of classification.

4.2.3 Iterative training. Given the susceptibility of the Baum–Welch EM algorithm to converge toward a locally optimal solution, the HMMs need to be trained multiple times starting from different initial states to ensure a good solution, if not an optimal one.

4.2.4 Sensor fusion. The use of multiple sensors to increase the capability of any intelligent system has received considerable attention in recent years. One of the several advantages of using multiple sensors is improved 'observability'. Given the goal of efficient and effective diagnostics and prognostics, sensor fusion is often the way to go. However, one should carefully test the hypothesis that additional sensors lead to an improvement in overall performance. In the absence of such proof, sensor fusion should be avoided, for it puts an extra burden on the model developer and could negatively affect performance (in terms of both computational complexity and effectiveness). In this study, thrust-force and torque signals are used to characterize the health-state of the bit. While there exists a high degree of correlation between
Figure 8. Illustration of the cross-validation method for deciding on the number of states. Also note the beneficial impact of sensor fusion.
the two (in the order of 85%), they are still complementary. In this study, sensor fusion has led to a 5% improvement in accuracy, as illustrated in figure 8.
4.3 Diagnostics using HMMs

4.3.1 State classification. Once the HMMs are trained, sequence (or hole) classification is performed based on the procedure illustrated in figure 2. The classification accuracy came to 97.5% for the training dataset and 96.9% for the testing dataset. For illustrative purposes, the trained HMMMedium is superimposed onto the joint scatter plot of hole no. 6 of drill bit no. 8, as shown in figure 9. This particular HMM had 10 hidden states, hence the 10 ellipsoids. The location of each ellipse corresponds to the mean vector of the observable bivariate Gaussian density, and the major and minor axes represent the eigenvectors of the covariance matrix. It should be evident from the plot that certain state durations are relatively long and certain others are short. Figure 10 depicts the evolution of the states for the 10 training drill bits from good to worst in the log-likelihood space of three of the HMM models (the fourth HMM is omitted owing to the inability to plot a 4D graph).

4.3.2 Log-likelihood plots. The log-likelihood plots for the four testing drill bits are illustrated in figure 11. Note that, while their overall shapes do not resemble the 'perfect' plots illustrated in figure 4, they do serve the purpose of performing accurate diagnostics (with 97% accuracy).
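The classification rule underlying section 4.3.1 is simply to score a hole's sequence under each trained HMM and pick the state whose model assigns the highest log-likelihood. A minimal pure-NumPy sketch of that decision rule (toy models with diagonal-Gaussian emissions, not the authors' trained models) is:

```python
import numpy as np

def gaussian_loglik(obs, means, variances):
    """Per-frame log density under each state's diagonal-Gaussian emission.
    obs: (T, D); means, variances: (N, D). Returns a (T, N) array."""
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * ((np.log(2.0 * np.pi * variances))[None]
                   + diff ** 2 / variances[None]).sum(axis=2)

def forward_loglik(obs, start_prob, transmat, means, variances):
    """log P(obs | model) via the forward algorithm, run in log space
    (the max-shift keeps the exponentials from underflowing)."""
    frame_ll = gaussian_loglik(obs, means, variances)
    log_alpha = np.log(start_prob) + frame_ll[0]
    for t in range(1, len(obs)):
        m = log_alpha.max()
        log_alpha = np.log(np.exp(log_alpha - m) @ transmat) + m + frame_ll[t]
    m = log_alpha.max()
    return float(m + np.log(np.exp(log_alpha - m).sum()))

def classify(obs, models):
    """Assign the health state whose HMM scores the sequence highest."""
    return int(np.argmax([forward_loglik(obs, *mdl) for mdl in models]))

# Two toy 2-state models with well-separated emission means
A = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
model_good = (pi, A, np.array([[0.0], [1.0]]), np.ones((2, 1)))
model_bad = (pi, A, np.array([[5.0], [6.0]]), np.ones((2, 1)))
obs = np.full((20, 1), 0.5)  # a sequence lying near model_good's means
label = classify(obs, [model_good, model_bad])
```

In the actual study there would be M = 4 such models (HMMGood through HMMWorst), each with the cross-validated number of states, and `obs` would be a normalized thrust-force/torque sequence for one hole.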
4.4 Prognostics based on HMMs

The task here is to first model the estimated vectors of state transition times (i.e. Si) for the drill bit dataset. Given that drill bits are assumed to evolve through four health-states, the state transition time vector is 3D (i.e. [tGood→Medium, tMedium→Bad, tBad→Worst]). Once the underlying distribution is ascertained, the task then is to construct the necessary conditional distributions
Figure 9. A 10-state HMMMedium superimposed onto the joint scatter plot of hole no. 6 of drill bit no. 6.
Figure 10. Evolution of the health-states for training drill bits from good to worst in the log-likelihood space of the three HMM models.

Figure 11. Log-likelihood trajectories for different HMM models of certain drill bits.
for prediction of RUL and the associated confidence intervals. As stated earlier, this study assumes that the state-transition time vector follows a Gaussian distribution.

4.4.1 Test of normality. Multivariate data with small sample sizes are most often assumed to be Gaussian distributed. If the sample size is large, the distribution fit may be investigated rigorously. In our case, the dataset is relatively small, with just 14 observations. A formal method for judging the joint normality of a dataset is based on the squared generalized statistical distances d_j² = (x_j − x̄)ᵀ S⁻¹ (x_j − x̄). If the data were in fact Gaussian, the chi-square plot of the pairs [d²_(j), χ²_p((j − 0.5)/n)] (where d²_(j) are the ordered distances and χ²_p((j − 0.5)/n) is the 100(j − 0.5)/n percentile of the chi-square distribution with p degrees of freedom) is expected to yield a straight line with a slope of 45° passing through the origin (Johnson and Wichern 1982).

4.4.2 Construction of the necessary distributions. The goal here is to model the state-transition time vectors resulting from the diagnostics models. For the current case study, two distributions are being sought. The first distribution attempts to model the joint distribution of the following two state transition
Figure 12. Chi-square plots of ordered distances for assessing normality of the bi-variate and tri-variate state transition distributions (i.e. [tGood→Medium, tBad→Worst] and [tGood→Medium, tMedium→Bad, tBad→Worst]).
times: [tGood→Medium, tBad→Worst]. Thus, in monitoring any future drill bit, once the first state transition is detected, the joint distribution allows us to perform prognostics in the sense that one can now construct the following conditional distribution: f(tBad→Worst | tGood→Medium). The second distribution attempts to model the 3D joint distribution of all the state transition times: [tGood→Medium, tMedium→Bad, tBad→Worst]. This way, once the first two state transitions are detected for any future drill bit, the following conditional distribution can be constructed: f(tBad→Worst | tGood→Medium, tMedium→Bad). One can see that the first, bi-variate, distribution facilitates relatively long-term predictions and the second, tri-variate, distribution facilitates short-term predictions. Given that the second distribution utilizes more information from the unit under investigation, it is expected to lead to an improved estimate of the final state transition (i.e. tBad→Worst). Figure 12 illustrates the chi-square plots of ordered distances for assessing normality of these two distributions. While the plot shows some evidence of systematic deviation from the 45° straight line, given the small sample size, it seems reasonable to analyse the data as if they were multivariate normal.

4.4.3 Prognostics. Our case study involves three state transitions. Normally, the prediction of the final state transition (i.e. tBad→Worst) is of most interest, for it allows us to replace the drill bit just in time to prevent any failure (that can potentially damage the work piece) while allowing us to utilize the full life of the drill bit. For other applications, the choice of the critical state-transition point may be different, and it could be that more than one state transition is of interest. The following outlines the procedure for estimation of the desired conditional distributions for the purpose of prognostics.
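The chi-square plot of section 4.4.1 (figure 12) reduces to pairing the ordered squared Mahalanobis distances with chi-square percentiles. A sketch of that computation, assuming an (n × p) matrix of state-transition-time vectors (the sample data below are synthetic, not the paper's measurements):

```python
import numpy as np
from scipy.stats import chi2

def chi_square_plot_points(X):
    """Ordered squared generalized distances d2_(j) paired with the
    100(j - 0.5)/n chi-square percentiles (p degrees of freedom).

    X: (n, p) matrix of transition-time vectors, one row per drill bit.
    If the data are jointly Gaussian, plotting the two returned arrays
    against each other should give roughly a 45-degree line.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - xbar
    # d_j^2 = (x_j - xbar)^T S^{-1} (x_j - xbar), one value per row
    d2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)
    d2_ordered = np.sort(d2)
    quantiles = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    return d2_ordered, quantiles

# Illustrative stand-in for the 14 tri-variate transition-time vectors
rng = np.random.default_rng(2)
X = rng.standard_normal((14, 3))
d2, q = chi_square_plot_points(X)
```

With the study's 14 bi-variate or tri-variate observations, p would be 2 or 3 and n = 14.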
A basic property of Gaussian distributions is as follows: if X = [X₁; X₂] is distributed as N(μ, Σ) with μ = [μ₁; μ₂], Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂], and Σ₂₂ > 0, then the conditional distribution of X₁, given X₂ = x₂, is also Gaussian, with mean vector μ₁ + Σ₁₂Σ₂₂⁻¹(x₂ − μ₂) and covariance matrix Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁. For the current case study, one can now readily use this property to derive the conditional distributions of interest (i.e. f(tBad→Worst | tGood→Medium) or f(tBad→Worst | tGood→Medium, tMedium→Bad)), which are indicative of RUL.
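The conditioning formula above is a few lines of linear algebra. A sketch, with hypothetical numbers standing in for the mean vector and covariance matrix that would be estimated from the 10 training bits' transition-time vectors (nothing below is taken from the paper's data):

```python
import numpy as np

def conditional_gaussian(mu, Sigma, idx_given, x_given):
    """Condition a joint Gaussian N(mu, Sigma) on the observed components.

    idx_given indexes the observed components (e.g. the already-detected
    transition times); the remaining components are predicted. Returns the
    conditional mean mu1 + S12 S22^-1 (x2 - mu2) and conditional
    covariance S11 - S12 S22^-1 S21.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    idx_pred = np.setdiff1d(np.arange(len(mu)), idx_given)
    S11 = Sigma[np.ix_(idx_pred, idx_pred)]
    S12 = Sigma[np.ix_(idx_pred, idx_given)]
    S22 = Sigma[np.ix_(idx_given, idx_given)]
    gain = S12 @ np.linalg.inv(S22)
    cond_mean = mu[idx_pred] + gain @ (np.asarray(x_given, float) - mu[idx_given])
    cond_cov = S11 - gain @ S12.T     # S21 = S12^T for symmetric Sigma
    return cond_mean, cond_cov

# Hypothetical joint distribution of [tG->M, tM->B, tB->W] (in holes)
mu = np.array([5.0, 30.0, 45.0])
Sigma = np.array([[4.0, 3.0, 2.0],
                  [3.0, 9.0, 5.0],
                  [2.0, 5.0, 16.0]])
# Predict t_Bad->Worst after observing the first two transitions
mean, cov = conditional_gaussian(mu, Sigma, np.array([0, 1]),
                                 np.array([6.0, 33.0]))
```

The square root of the returned conditional variance gives the standard deviation used for the three-standard-deviation prediction bands of figure 13.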
Figure 13. The actual and predicted state transition points for all four drill bits. Squares denote actual state-transition points and circles the predicted. Each plot also shows the three-standard-deviation band for the prediction distribution.
Figure 13 reports the results from employing the above procedure for the current case study. Here, state-transition points from 10 drill bits (the same set used to train the HMMs) are used to construct the distributions, and four drill bits (the testing set for the HMMs) are used to verify the predictions. As expected, the prediction intervals are tighter when tBad→Worst is predicted by employing both tMedium→Bad and tGood→Medium simultaneously rather than just tGood→Medium. Thus, these results demonstrate the ability of HMMs (used for modelling sensor signals) to facilitate both diagnostics and prognostics.
5. Conclusion

This paper successfully demonstrates the value of adopting HMMs for diagnostics and prognostics. This research is a quest for robust empirical methods for diagnostics and prognostics in machining processes that can facilitate a more widespread deployment of CBM technology and practice in industry. Through the experimental study on a vertical drilling machine, we have shown how HMMs are able to parsimoniously represent the most relevant aspects of the sensory signals. It is evident from the diagnostics accuracy that this representation is quite satisfactory. The method is also robust in that the HMMs worked satisfactorily with raw signals with little or no pre-processing. The paper also demonstrates the role of sensor fusion in effective diagnostics. The most significant contribution of this paper is the introduction of a novel method for carrying out prognostics using the HMMs employed for diagnostics. The method allows one to estimate RUL under certain conditions. The prognostics model is driven by a multivariate distribution of the state-transition points generated by the HMMs employed for diagnostics. Thus, the accuracy of the prognostics method depends on the sample size used to construct the model. A Bayesian approach could possibly be used to update the parameters of such a prognostics model as new data become available.
References Atlas, L., Ostendorf, M. and Bernard, G.D., Hidden Markov models for monitoring tool-wear. Proc. ICASSP, 2000, 6, 3887–3890.
Baum, L.E. and Petrie, T., Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat., 1966, 37, 1554–1563.
Baum, L.E., An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities, 1972, 3, 1–8.
Begg, C.D., Merdes, T., Byington, C. and Maynard, K., Dynamic modelling for mechanical diagnostics and prognostics, in Maintenance and Reliability Conference (MARCON 99), 1999.
Bengio, Y. and Frasconi, P., Input/output HMMs for sequence processing. IEEE Trans. Neur. Net., 1996, 7, 1231–1249.
Bilmes, J.A., A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report, University of California, Berkeley, 1998a.
Bilmes, J., Data-driven extensions to HMM statistical dependencies, in International Conference on Spoken Language Processing, 1998b.
Bilmes, J., What HMMs can do. Technical Report UWEETR-2002-03, University of Washington, 2002.
Brand, M., Coupled hidden Markov models for modelling interacting processes. Technical Report 405, MIT Lab for Perceptual Computing, 1996.
Bunks, C. and McCarthy, D., Condition-based maintenance of machines using hidden Markov models. Mech. Syst. Sig. Proc., 2000, 14, 597–612.
Cappé, O., Ten years of HMMs (http://tsi.enst.fr/cappe/docs/hmmbib.html), 2001.
Department of Defense Maintenance Policy, Programs, and Resources, Fact Book 2000 (http://www.acq.osd.mil/log/logistics_materiel_readiness/organizations/mppr/assetts/factbooks/factbook/factbook.pdf).
Ertunc, H.M., Loparo, K.A. and Ocak, H., Tool wear condition monitoring in drilling operations using hidden Markov models (HMMs). Int. J. Mach. Tools Manuf., 2000, 41, 1363–1384.
Fine, S., Singer, Y. and Tishby, N., The hierarchical hidden Markov model: analysis and applications. Mach. Learn., 1998, 32, 41–62.
Geibig, K., Instandhaltungsmanagement im Fokus der Liberalisierung, Optimiertes Instandhaltungsmanagement, EVU, Düsseldorf, 26–27 April 1999.
Ghahramani, Z. and Jordan, M., Factorial hidden Markov models. Mach. Learn., 1997, 29, 245–273.
Hall, D.L. and Llinas, J., Handbook of Multisensor Data Fusion, 2001 (New York: CRC Press).
Johnson, R.A. and Wichern, D.W., Applied Multivariate Statistical Analysis, 1982 (New Jersey: Prentice-Hall).
Kacprzynski, G.J. and Roemer, M.J., Health management strategies for 21st century condition-based maintenance systems, in 13th International Congress on COMADEM, Houston, TX, 3–8 December 2000.
Lebold, M. and Thurston, M., Open standards for condition-based maintenance and prognostic systems, in Maintenance and Reliability Conference, 2001.
Lee, J., Approaching Zero Downtime. The Center for Intelligent Maintenance Systems, Harbor Research Pervasive Internet Report, April 2003.
Liporace, L.A., Maximum likelihood estimation for multivariate observations of Markov sources. IEEE Trans. Inform. Theory, 1982, IT-28, 729–738.
Murphy, K.P., Dynamic Bayesian networks: representation, inference and learning. PhD thesis, University of California, Berkeley, 2002.
NIST-ATP CBM Workshop Report, 1998.
Poritz, A.B., Hidden Markov models: a guided tour. ICASSP, 1988, 1, 7–13.
Rabiner, L.R., A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 1989, 77, 257–285.
Smyth, P., Clustering sequences with hidden Markov models. Adv. Neural Inform. Proc. Syst., 1997, 9, 648–654.
Theede, A., Erfolgreiches Benchmarking in der Instandhaltung. Optimiertes Instandhaltungsmanagement in EVU, Düsseldorf, 26–27 April 1999.
Zhong, S. and Ghosh, J., A unified framework for model-based clustering, in ANNIE-2002, 2002.