Computers in Industry 64 (2013) 214–225
Data quality evaluation and improvement for prognostic modeling using visual assessment based data partitioning method

Yan Chen 1,*, Feibai Zhu, Jay Lee

NSF Center for Intelligent Maintenance Systems, University of Cincinnati, OH, United States
Article history: Received 4 June 2011; received in revised form 25 July 2012; accepted 16 October 2012; available online 11 December 2012.

Keywords: Data quality; Prognostics; Data partitioning; Outlier detection; Bearing health diagnosis

Abstract

When developing Prognostic and Health Management (PHM) applications for manufacturing systems, the acquired data frequently comes with issues that hinder further data analysis. However, there is neither a clear definition of data quality nor an evaluation method to quantify whether acquired data is suitable for prognostic modeling tasks such as failure detection, diagnosis, and prediction. In particular, during data-driven health diagnosis modeling of engineering systems, the acquired data is expected to contain clusters that can be used to differentiate multiple system health conditions. In most cases, once data is acquired, people intuitively believe that it can be clustered into subgroups; this bias can lead to the acceptance of false information in the data. Furthermore, most existing metrics, such as clustering tendency in statistics and clusterability in data mining, evaluate individual data characteristics without considering prognostic modeling. This paper proposes a new method to evaluate and improve data quality for system health diagnosis modeling. The clusters, as the critical data characteristics for modeling multiple system conditions, are first estimated by "visualization" on the dissimilarity spectrum from spectral analysis and then evaluated in terms of their fitness and separation from each other. A visual assessment based outlier detection method, which utilizes the graphic intermediate results of the preceding evaluation, is also proposed to recognize outliers in the data. Finally, a bearing test dataset acquired from a real industrial application is used to demonstrate how the proposed methods evaluate and improve data quality. Published by Elsevier B.V.
1. Introduction

Over the last decade, Prognostics and Health Management (PHM) has emerged as a new engineering discipline focusing on failure detection, prediction, and the management of the health of complex engineering systems [1], and has been implemented in sophisticated applications such as aerospace vehicles, military and defense, the automotive industry, and others [2,3]. Recently, significant effort has been devoted to developing and deploying PHM solutions for manufacturing systems. This growing demand comes from the realization that PHM capabilities can provide vital support to revolutionize logistics and operational readiness [4]. Based on the real-time assessment of digitized information obtained from onboard sensing equipment, PHM-enabled technologies can improve the responsiveness of entire logistics systems and automate system health management.
* Corresponding author at: NSF Center for Intelligent Maintenance Systems, 560 Baldwin Hall, University of Cincinnati, OH, United States. Tel.: +1 513 237 1470; fax: +1 513 556 4647. E-mail addresses:
[email protected] (Y. Chen), Zhufi@mail.uc.edu (F. Zhu),
[email protected] (J. Lee). 1 Present/permanent address: IMS Center, University of Cincinnati, 560 Baldwin Hall, 2600 Clifton Avenue, Cincinnati, OH, United States. 0166-3615/$ – see front matter . Published by Elsevier B.V. http://dx.doi.org/10.1016/j.compind.2012.10.005
Generally, among the approaches to PHM problems, some methods involve accurate qualitative or semi-quantitative system models, while at the other end of the spectrum are methods that require no system modeling and rely only on a large amount of historical process data [5,6] to deliver prior system knowledge. In these data-driven methods, most of the data is obtained from sensors mounted on equipment. The sensor readings augment historical knowledge such as equipment behavior, human expertise, and so on. Features that representatively indicate equipment health are then extracted from the sensor readings through a variety of signal processing and statistical feature extraction techniques. In the feature space, historical system knowledge can be modeled in various forms, including statistical distributions, dynamic models with differential equations describing the health states of the equipment over time, and others [7]. These methods ultimately lead to decisions on the assessment of equipment health states, the identification of root causes of specific faults, and the prediction of future system behavior. During PHM analysis, the historical knowledge carried by the data is systematically fed into a decision-making process by sophisticated
data modeling techniques. However, it is frequently discovered that the collected data is inaccurate, incomplete, or redundant [8]. A large amount of possibly flawed data is collected without considering when or how it will be used in later analysis; in addition, the volume of data produced by today's industry increases steadily because of advanced sensing technologies. Yet current PHM research provides limited techniques to guarantee that collected data carries the historical knowledge needed for decision-making models [2]. Therefore, data quality problems arise in the PHM environment.

1.1. Data quality issues in PHM process

In real-world applications, historical raw data comes with errors, and subsequent data preparation procedures might introduce further deficiencies at the feature level. All of these errors affect the final decision making. A signal collected from sensor readings usually carries random disturbance, noise, and other unknown disturbances whose intrinsic physical principles are not clearly understood. Errors might also be induced during data storage and transfer. Missing observations occur due to temporary loss of sensory response [9] or limitations of the data acquisition system; if missing values occur in a critical signal range, they reduce the representativeness of the information and therefore distort inference based on the entire observation population [10]. In some cases, the installed sensors may not be sufficient to cover critical aspects of the system phenomena of concern, or may contain redundant information. Although experienced engineers can solve some of these problems, when systems become complex, the principles for evaluating and selecting appropriate sensors or feature sets for a particular problem are not transparent. During the initialization stage, numerous sensors are often installed and a variety of features are extracted without considering the consequent effects on PHM modeling. Thus, both feature incompleteness and redundancy issues can be encountered later: incompleteness of features places a burden on later PHM modeling algorithms, while redundant features cause troubles such as unnecessary computation and non-invertible matrices [11]. When historical data is used for PHM modeling, a training dataset is usually established first. The training data is, in fact, a data matrix consisting of multiple features in columns and observations in rows; one observation is also an instance used in later data modeling. The label is a unique identity for every instance and indicates the corresponding system properties at the time the sensor data was collected. However, a frequently raised issue is that many historical records are useless because the context information that identifies these observations is lost: observations from different working conditions can be mixed together without separate identities, and noise instances may be labeled and contaminate the population of good instances.

1.2. Data quality definition for system health diagnosis in PHM

In the larger context of information systems, data quality research investigates its definition, modeling, and control techniques. Data quality is defined to reflect the suitability of data products to meet the needs of the primary data use, often referred to as "fitness for use". Similarly, in the PHM environment, data quality should reflect the suitability of the data to support modeling for the purposes of failure detection, diagnosis, and prediction. In other words, to support PHM decision making, the data
with good quality is that which possesses the corresponding characteristics easily identified by the PHM modeling method. Specifically, for PHM problems including failure prediction, system health diagnosis, and failure detection, there are three main data characteristics of concern: trendability, clusters, and abnormality. In this paper, data quality for health-diagnosis-related modeling is studied. In data-driven health diagnosis, classification modeling techniques are frequently used to estimate the system condition of new observations: a set of new system observations is categorized based on classified historical information. Most classification algorithms first establish a baseline model of the historical data that maps multi-dimensional observations to a single category value; once the model is established, it can predict the category value of any new observation. In this case, historical training data with good quality is supposed to possess clusters, i.e., data structures representing the different system conditions. In reality, once a historical dataset is acquired, people often intuitively believe that it can be clustered into subgroups. This can lead to the acceptance of false information if the dataset has ambiguous partitions because of outliers or signal-error-contaminated instances, or because it is actually not aggregated [12]. Testing the clustering tendency and verifying the number of clusters contained in the dataset before establishing the baseline model should help reduce this bias. This research focuses on evaluating data quality before modeling by checking whether the data tends to cluster and how fit the potential subgroups in the data are. In the presented method, a visual assessment based data partitioning method is employed to clarify the cluster characteristics of the data and their quality in terms of fitness and separation.

2. Related work and objectives

In the literature, numerous research efforts quantify individual data quality issues, and several well-known metrics have been developed. The signal-to-noise ratio (SNR) analyzes how much a signal is corrupted by noise; clustering tendency and clusterability evaluate the data's capability to aggregate into clusters; and data predictability serves trendability analysis [13]. Another area of data quality research focuses on solving data quality issues. The study in [14] demonstrates that feature selection can be an efficient approach for finding a small set of features corresponding to optimal classification performance in partial discharge diagnostic systems. The research in [15] illustrates how a decision tree method can identify the best feature set for bearing failure detection. In a study conducted by [1] for fault detection and diagnosis in engineering systems, features are ranked by their distinguishability (differentiating classes by their overlap), detectability (identifying the smallest failure signature), and identifiability (tracking the similarity of features as they identify a fault mode). For signal processing, various filter techniques are investigated to remove background noise that might mask the true signal expected to indicate system behavior. By filtering the signals, white noise and other known disturbances are excluded so that the signals become representative of system health; the Fourier and wavelet transforms are popular methods [16,17].
If the noise in an individual signal can statistically be considered univariate outliers, then instances with unknown or mislabeled identities in the training data matrix are the equivalent of multivariate outliers for PHM modeling and analysis [18]. For these types of outliers, detection methods can be categorized into three types: statistical methods [19], distance-based methods [20], and density-based methods [21].
Fig. 1. Data quality evaluation and improvement (flowchart: data → feature extraction → feature vectors/training data → data quality check; if yes, classification algorithms and the multiple-failure-conditions data model; if no, filtering).
As mentioned above, although a large number of related studies exist in the areas of signal processing and data mining, there is a lack of a systematic method for data quality evaluation for the purposes of PHM analysis. Unlike independent data characteristic metrics such as the signal-to-noise ratio, linearity, and so on, the data evaluation discussed here is a set of model-dependent procedures and metrics that quantify whether multi-dimensional training data is suitable to support modeling for a specific PHM purpose; it guarantees information sufficiency and accuracy for quantitative modeling. Targeting PHM analysis for system health diagnosis, the objective of this research is to establish a methodology for evaluating and then improving the quality of training data by identifying outlier instances prior to diagnostic classification modeling. In data-driven PHM approaches, baseline modeling of system health diagnosis is equivalent to training an offline classifier that recognizes different system behavior patterns; the classifier can then predict upcoming failures online by comparison with the recognized patterns. However, error and noise in the training data directly reduce the efficiency of the classifier, and it has been shown that removing noise from the training data via the classifier itself may sacrifice even more predictive accuracy. Therefore, as Fig. 1 shows, the basic concept of this research is to use an efficient data partitioning algorithm and a set of metrics to decide whether the data is suitable for classification modeling; if necessary, outlier instances are filtered out. This research is developed under the following assumptions:

1. Targeted training datasets are collected from in-house test-beds or historical databases with identities (labels) corresponding to different failures.
2. A feature space exists in which any subgroup of instances representing one consistent health condition is uniformly distributed as a cluster structure separated from the others.
3. In multi-dimensional training data, an outlier can be a single instance that deviates from a subgroup of the data, or a small subgroup of instances far away from the majority of the data.

In the next section, an overview of the proposed method is introduced. Sections 4 and 5 provide details of the developed data partitioning method and evaluation metrics. In Section 6, an outlier detection method is proposed based on graphic intermediate results of the data quality evaluation. Finally, a bearing failure diagnosis dataset is used to demonstrate the outlier detection and data evaluation methods.

3. Data quality evaluation and improvement for prognostic modeling

In statistics, clustering tendency assessment mainly addresses the problem of deciding whether data shows a predisposition to cluster into natural groups, without identifying the groups
themselves [12]. The study assumes that an arbitrary number of points has three possible spatial arrangements: the points are arranged randomly; they are aggregated, exhibiting mutual attraction; or they are regularly spaced, exhibiting mutual repulsion. Under this assumption, clustering tendency is decided by testing the data's spatial arrangement against randomness within a well-defined sampling window. However, this test provides no details of the data's inherent structure, such as how many clusters there are and whether they are well separated [22]; in most cases, it cannot determine whether the data is qualified to generate the desired subgroups in the subsequent classification analysis. In addition, a suitable sampling window for the spatial randomness test is critical, since an inappropriate choice can mislead the clustering tendency assessment. The other side of the story is cluster validation. A few attempts have been made to establish metrics to validate data clustering. Clusterability, a notion appearing in publications such as [23], determines how much "clustered structure" there is in the data. There are a number of measures of clusterability: center perturbation clusterability [24], worst pair ratio clusterability [25], separability clusterability [23], variance ratio clusterability [26], and clusterability assuming a target clustering [27]. Intuitively, a good clustering is expected to have a larger distance among points in different clusters (separation) and a smaller distance among points in the same cluster (distortion). This intuition has been converted into different forms according to the clustering method. For example, center-based clustering is measured by the summarized ratio of the distance from each point to its closest center, the ratio being smaller when the point surely belongs to the cluster; for loss function based clustering, the loss function itself is built on this intuition and searches for the optimal separation in the data to achieve the best clustering quality. Therefore, instead of simply evaluating the data's clustering tendency without identifying the cluster structures, this study determines data quality in terms of clusterability: a visual assessment based partitioning algorithm is developed to find the clusters in a dataset, and the cluster structures are then evaluated in terms of their fitness and separation. Take multiple-failure modeling of bearings as an example: vibration signals are usually collected from consecutive inspections of a bearing under multiple system health conditions such as outer-race, inner-race, and roller failures. After multiple inspections, the collected data composes a training data matrix in a d-dimensional feature space, and a classifier is trained on this data to predict the working-condition category of new observations. Before establishing the classifier, however, one wants to know whether the collected data has sufficient cluster structure to profile the expected failure conditions in the feature space, and how easily these structures can be recognized. These questions can be answered by simply discovering, or even visualizing, the corresponding characteristics of the training data; consequently, the subsequent design of the classification learning algorithm will be more efficient.
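To make the separation/distortion intuition concrete, here is a minimal sketch (not from the paper; the function name, the use of mean pairwise distances, and the iris example are our illustrative choices) of a single score contrasting between-cluster separation with within-cluster distortion:

```python
# Illustrative sketch: ratio of between-cluster separation to within-cluster
# distortion for a labeled dataset; higher values suggest more cluster structure.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import load_iris

def separation_distortion_ratio(X, labels):
    within, between = [], []
    for k in np.unique(labels):
        members, others = X[labels == k], X[labels != k]
        within.append(cdist(members, members).mean())   # distortion inside cluster k
        between.append(cdist(members, others).mean())   # separation from the rest
    return np.mean(between) / np.mean(within)

iris = load_iris()
print(separation_distortion_ratio(iris.data, iris.target))  # > 1 indicates structure
```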
From the perspective of a subgroup of instances in the dataset, if the instances precisely represent consecutive measurements under one time-invariant failure condition, we assume they are not predisposed to partition further into multiple clusters, because of the data's inner consistency; they should be uniformly distributed in a shaped feature space profiling the specific system health condition. Spatial randomness is therefore used to examine the fitness of each individual cluster structure. In addition, metrics are established to evaluate the separation between clusters: if all subgroups of the data are well separated from each other, there is no need to pursue powerful and sensitive learning algorithms for the subsequent classification modeling. Most training data collected from dynamic system behaviors forms irregular, non-convex cluster structures, and a large body of research shows that distance-based clustering/partitioning algorithms such as k-means have limited ability to perceive such structures.
Fig. 2. Spectral analysis based randomness test.
Therefore, in this study, a nonlinear spectral decomposition technique is utilized to visualize the training data in a low-dimensional space. Fig. 2 illustrates the procedure of data evaluation and improvement. First, based on the new representation of the original dataset from spectral decomposition, a dissimilarity matrix is generated. After reordering the instance sequence by pair-wise dissimilarities, a spectrum of the dissimilarity matrix is created, in which instances close to each other gather as dark blocks along the diagonal; these blocks stand for cluster structures in the training data. A visual assessment based partitioning algorithm on the spectrum image is designed to identify and separate these clusters. Following the partitioning, metrics are designed to quantify the separation and characteristics of these subgroups.

4. Data partitioning by visual assessment and spectral decomposition

4.1. Spectral decomposition

To unveil intrinsic cluster structures, manifold learning based methods handle the nonlinearly distributed and overlapping training sets from different failure modes, a frequent situation in PHM applications, better than classical linear decomposition methods. In the Laplacian Eigenmaps (LE) manifold algorithm [28], the d-dimensional dataset is viewed as an undirected data graph G = (V, E) with node set V and edges E, where every node represents one instance of the dataset. If the graph G is weighted, each edge carries a non-negative weight w_ij ≥ 0. The unnormalized graph Laplacian matrix is defined as L = D - W, where W is the weighted adjacency matrix and D is the degree matrix. The Laplacian eigenmap is computed by spectral decomposition for the eigenvalues and eigenvectors of Ly = λDy; the image of v_i under the embedding into the lower-dimensional space R^m is given by the ordered eigenvectors {y_1(i), y_2(i), ..., y_m(i)}. This decomposition provides significant information about the graph and the distribution of all instances: it has been proven experimentally that the natural inner groups of a dataset are recovered by mapping the original dataset into the space spanned by eigenvectors of the Laplacian matrix [28]. Fig. 3 shows an example of spectral decomposition on the iris dataset. The iris flower dataset is a multivariate dataset introduced by Sir Ronald Aylmer Fisher in 1936 as an example of discriminant analysis; it contains 150 instances in total, 50 from each of three species of iris flowers (Iris setosa, Iris virginica, and Iris versicolor), with four features per instance measuring shape characteristics of the flowers in centimeters. As shown in Fig. 3(b), in the two-dimensional eigenvector space generated by spectral decomposition, two clusters are obvious and the dissimilarities between clusters are dramatized. However, selecting a suitable number of eigenvectors is a general problem for most spectral decomposition based applications. It has been shown that, in the ranked eigenvalue array, the number of repeating eigenvalues with magnitude 1 equals the actual number of potential clusters and is supposed to be selected as the dimension of the new spanning space; however, this principle does not work when clusters are not well separated [29], which leads to an eigenvalue array without repeating values of 1. Here, therefore, the first large drop in the derivative of the eigenvalue array is used to estimate the possible number of partitions in the data.
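As a concrete illustration of the decomposition above, the following sketch builds the unnormalized Laplacian L = D - W from a Gaussian-kernel affinity and solves the generalized eigenproblem Ly = λDy; the kernel and the largest-gap variant of the eigenvalue-drop heuristic are our assumptions, since the paper does not fix them:

```python
# A minimal Laplacian eigenmap sketch with an eigengap-style cluster-count estimate.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import pdist, squareform

def laplacian_eigenmap(X, sigma=1.0, m=2):
    W = np.exp(-squareform(pdist(X)) ** 2 / (2 * sigma ** 2))  # weighted adjacency
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))                                 # degree matrix
    L = D - W                                                  # unnormalized Laplacian
    vals, vecs = eigh(L, D)               # generalized problem L y = lambda D y
    return vals, vecs[:, 1:m + 1]         # skip the trivial constant eigenvector

def estimate_num_partitions(vals):
    # Largest jump in the sorted eigenvalue array, used here as a proxy for the
    # "first large drop in the derivative" heuristic mentioned above.
    return int(np.argmax(np.diff(vals))) + 1
```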
Fig. 3. Spectral decomposition of iris data: (a) raw data (petal length vs. petal width for the three species) and (b) the first two eigenvectors after spectral decomposition.
4.2. Visual assessment based data partitioning

Through spectral decomposition, the original dataset is mapped into a low-dimensional space and, as the iris example shows, the dissimilarity relationships between instances are magnified. This section introduces a method that further creates a clear image for visual assessment of the data's cluster structures: a simple visualized data partitioning method based on a reordered adjacency (dissimilarity) matrix image of the data. As Fig. 4 shows, from the new representation of the data (chart a), a two-dimensional adjacency (dissimilarity) matrix consisting of numerical pair-wise dissimilarity values is calculated and then reordered. Instances with smaller pair-wise dissimilarity (2, 4, 6 or 1, 3, 5 in chart a) are moved together along the index axes in the spectrum (chart c) of the reordered dissimilarity matrix. In the spectrum, a group of instances close to each other forms a red (dark in chart d) block if smaller values are indicated by red. Similarly, as shown in charts b and d, the iris data forms a spectrum with dark blocks along the diagonal, which indicate cluster structures in the original data. The concept of this reordering was first proposed in [30]. The study [31] proposes an enhanced visual assessment method for clustering tendency assessment that uses a genetic algorithm to separate the dark block areas from the rest by finding a binary threshold of the spectrum grayness; this requires rather large computation. Another important disadvantage of the simple binary-threshold method is that it assumes the separations between any two clusters are similar and identifiable by a single threshold, whereas in fact multiple clusters are usually separated at different levels of dissimilarity.
Therefore, instead of searching for a binary threshold indicating the edges of these dark blocks, the proposed algorithm finds partitioning points on the diagonal, which indicate the separations between any two dark blocks, as shown in chart c of Fig. 4. All values of the reordered dissimilarity matrix $\{w_{ij}\}$ are equally scaled into $h$ levels $\{b_k\} = \{b_1, b_2, \ldots, b_h\}$, with $b_h = \max\{w_{ij}\}$ and $0 < b_k \le \max\{w_{ij}\}$; the matrix $A^k$ is defined as:

$$A^k_{ij} = \begin{cases} w_{ij}, & \text{if } w_{ij} \in (0, b_k] \\ 0, & \text{otherwise} \end{cases} \qquad (4\text{-}1)$$

$A^k$ is symmetric ($A^k_{ij} = A^k_{ji}$), and a partitioning point $i$ satisfies the conditions $\sum_{m=i+1}^{n} A^k_{m,i} \le 0 + \varepsilon/2$ and $\sum_{m=1}^{i} A^k_{i+1,m} \le 0 + \varepsilon/2$. Visualizing the search on the dissimilarity spectrum, each search step adds a batch of dissimilarity values: at the beginning, subgroups of the data with even subtle dissimilarity are found; as the search proceeds, some subgroups merge into larger subgroups, while others that remain highly dissimilar from the rest stay separate.
Fig. 4. Visual assessment based data partitioning.
Fig. 5. Partition searching driven by dissimilarity levels: (a) dissimilarity spectrum, (b) partitioning point search steps, (c) step 1 spectrum and (d) step 4 spectrum.
Taking the iris dataset as an example, the dissimilarity values of all pairs of instances are scaled into 20 batches whose dissimilarity levels are indicated by the color bar in Fig. 5(a). After the first step, as shown in Fig. 5(c), six subgroups are identified; every subgroup is one block, and the separation between any two blocks can be recognized by the narrow crossing points on the diagonal. As more dissimilarity values are added to the spectrum, previous subgroups merge and the number of partitioning points decreases. Eventually, after enough dissimilarity values have been put into the spectrum, the number of partitioning points becomes stable and the search ends. The search algorithm includes the following steps (a sketch in code follows the list):

1. Initialization: scale all elements of the dissimilarity matrix W into h equally spaced intervals ordered from small to large.
   a. Generate B = {b_1, b_2, ..., b_h}, where b_k is the array of all elements of W that belong to the kth interval.
   b. Assign an empty set to X (∅ → X); X collects the dissimilarity values that will be put into the spectrum.
   c. Set k = 1.
2. Start the kth search step for partitioning points P on the diagonal of the current VAT spectrum, which draws the matrix A^k.
   a. Collect the dissimilarity values of the kth level (b_k → X).
   b. Generate the matrix A^k with A^k_ij = w_ij if w_ij ∈ X, and 0 otherwise.
   c. Search for partitioning points in the spectrum created by A^k and store them (p → PA_k).
3. If the number of partitioning points found is the same as in the previous two search steps, the search ends; otherwise, increment k and repeat steps 2 and 3.
   a. If PA_k = PA_(k-1) = PA_(k-2), end.
   b. Else k = k + 1.

The search ends when the minimum data quality is reached.
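Below is a compact sketch of the reordering and the level-driven partition-point search; the ordering follows the spirit of VAT [30], while h, eps, and the stopping bookkeeping are illustrative choices of ours:

```python
# Sketch: VAT-style reordering of a dissimilarity matrix R, then a level-by-level
# search for partitioning points on the diagonal (cf. Eq. (4-1)).
import numpy as np

def vat_order(R):
    n = R.shape[0]
    start = int(np.unravel_index(np.argmax(R), R.shape)[0])  # one end of the
    order, remaining = [start], set(range(n)) - {start}      # most dissimilar pair
    while remaining:
        # next object: the unselected instance closest to the selected set
        j = min(remaining, key=lambda c: min(R[c, o] for o in order))
        order.append(j)
        remaining.remove(j)
    return R[np.ix_(order, order)], order

def partition_points(R_ordered, h=20, eps=1e-6):
    n = R_ordered.shape[0]
    levels = np.linspace(R_ordered.min(), R_ordered.max(), h + 1)[1:]
    history = []
    for b in levels:
        A = np.where(R_ordered <= b, R_ordered, 0.0)         # matrix A^k
        # i is a partitioning point if the links across index i are (near) zero
        pts = [i for i in range(n - 1) if A[i + 1:, :i + 1].sum() <= eps]
        history.append(pts)
        if len(history) >= 3 and len(pts) == len(history[-2]) == len(history[-3]):
            break                                            # count is stable: stop
    return history[-1]
```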
5. Data quality evaluation

In Section 4, the proposed method discovered the subgroups of the data using the visualized partitioning algorithm. In this section, the fitness and separation of these data subgroups are evaluated.

5.1. Data clustering tendency assessment

A clustering tendency assessment based on the spatial randomness test is used to examine the fitness of subgroup data: if a partitioned subgroup of instances is uniformly distributed in the feature space, it has no tendency to cluster further and is fit to represent one mode of working condition. To avoid the need to know the sampling window, a hypothesis test [32] is established that detects randomness based on the edge length distribution of minimum spanning trees (MSTs) constructed on the data graph. The Kolmogorov-Smirnov (KS) statistic defines the maximum difference between two cumulative distribution functions (CDFs): the normalized edge length distribution of the MST over the subgroup's data graph, and a reference CDF from a random process. As Fig. 6 shows, the iris data is partitioned into three subgroups marked with circle, square, and star markers.
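Under the stated assumptions, the test could be sketched as follows; the uniform reference sample on the unit hypercube and scipy's two-sample KS test are our choices, and the hull-volume rescaling anticipates Eq. (5-1) below:

```python
# Sketch: spatial randomness test of one subgroup via MST edge lengths and a KS test.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import ConvexHull
from scipy.spatial.distance import pdist, squareform
from scipy.stats import ks_2samp

def mst_edge_lengths(X):
    return minimum_spanning_tree(squareform(pdist(X))).data  # the n-1 edge weights

def randomness_test(X, alpha=0.05, seed=0):
    """H0: the instances are uniformly (randomly) arranged; returns (reject, p)."""
    n, d = X.shape
    ref = np.random.default_rng(seed).uniform(size=(n, d))   # reference sample
    # rescale test edges by the d-th-root ratio of convex-hull volumes (Eq. (5-1))
    scale = (ConvexHull(ref).volume / ConvexHull(X).volume) ** (1.0 / d)
    _, p = ks_2samp(mst_edge_lengths(X) * scale, mst_edge_lengths(ref))
    return p < alpha, p
```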
Fig. 6. Iris data: (a) data plot with estimated partitions (features one to three) and (b) spatial randomness test over each partition (empirical CDFs of the subgroups against the reference CDF F1(x)).
For each of these subgroups, a hypothesis test against randomness is conducted (Fig. 6(b)). The test approximately indicates whether each subgroup is well formed as a cluster, in other words, whether it contains data from only one specific system health condition. Comparison with the given data labels shows that the circle marker group consists of all setosa instances (h0 = 0, randomness accepted); the star marker group contains only part of the virginica instances (h0 = 0); and the square marker group includes part of the virginica instances together with all versicolor instances (h0 = 1). During the test, the reference is generated by pseudorandom values drawn from the standard uniform distribution on the open interval (0, 1); all reference samples are multi-dimensional data instances, consistent with the testing instances. All edge lengths of the MST of the testing sample are first normalized by the volume of the convex hull containing all testing instances, as Eq. (5-1) shows:

$$\hat{s}_{ij} = s_{ij} \left( \frac{\sqrt[d]{V_{\text{reference}}}}{\sqrt[d]{V_{\text{testing}}}} \right) \qquad (5\text{-}1)$$

Here, $V_{\text{reference}}$ and $V_{\text{testing}}$ are the volumes of the convex hulls surrounding the reference data and the testing data, respectively, and $\hat{s}_{ij}$ and $s_{ij}$ are the normalized and non-normalized lengths of the MST edges.

5.2. Inter-cluster separation assessment

In this section, two metrics are established to evaluate the separation between the discovered subgroups. Given a data graph G = (V, E) with node set V = {v_1, ..., v_n} and edge set E = {e_1, ..., e_m}, and W the weighted adjacency matrix of neighboring vertices, for A, B ⊆ V the links between the vertex sets A and B can be quantified as:

$$\text{links}(A, B) = \sum_{i \in A, j \in B} W(i, j) \qquad (5\text{-}2)$$

The k-way normalized cut problem [33] is to minimize the links that escape a cluster relative to the total weights/similarity of the cluster. The optimization objective function is encoded as:

$$\min \text{Ncut}(A_1, A_2, \ldots, A_k) = \min_{\{A_1, A_2, \ldots, A_k\}} \sum_{i=1}^{k} \frac{\text{links}(A_i, \bar{A}_i)}{\sum_{k,h \in A_i} W(k, h)} \qquad (5\text{-}3)$$

where $\bar{A}_i$ is the complement of $A_i$ and $A_i \subseteq V$. Here, the loss function is disassembled at the summation: each element becomes a metric that quantifies the separation between a single partition and the rest of the data. The metric is the ratio of a partition's links with its complement to its inner weights; if the weights indicate dissimilarity, the higher the metric, the further the partition is from the rest:

$$m1_i = \frac{\text{links}(A_i, \bar{A}_i)}{\sum_{k,h \in A_i} W(k, h)} \qquad (5\text{-}4)$$

The second metric is the weakest link of a partition, which is the maximum edge weight over all pairs of points belonging to the same partition divided by the shortest edge between partitions. Contrary to the first metric, a higher value of the second metric means worse separation:

$$m2_i = \frac{\max_{i,j \in A_i}(W_{i,j})}{\min_{i \in A_i, k \in \bar{A}_i}(W_{i,k})} \qquad (5\text{-}5)$$

These two metrics evaluate cluster separation by a loss function based measurement and a linkage based measurement; the former is less sensitive but more locally consistent than the latter. Taking the iris data as an example, the proposed data partitioning method finds three partitions in the training data, as Fig. 6(a) shows, and the two metrics are calculated for every partition (Table 1). Partition 3 is the best: it consists of all 50 instances identified as the first iris species and is well separated from the others in terms of local consistency and overlap. The second partition consists of all instances of the second species plus part of the third (16 instances), so it is the worst by both measurements; its rejection of the randomness hypothesis also confirms the mixture.

6. Outlier detection by visualized data partitioning

Under the outlier assumption made in this research, the targeted outliers include small subgroups of instances with weak connections to the others, and instances that deviate from the majority and lie in the periphery of individual partitions. The following section proposes a method to detect such outliers in training data. The method makes use of two important intermediate graphic results from the preceding data evaluation procedures: the dissimilarity spectrum (DS) and the minimum spanning tree (MST). The dissimilarity spectrum is an image of the visualized adjacency matrix of the representatives of the training data; the minimum spanning tree is a spanning tree established over the instances in each cluster.
Table 1
Iris data structure evaluation.

Data structure (size)      M1    M2    Ratio (M2/M1)  Randomness
Partition 1 (star: 34)     3.39  6.54  1.93           H0 = 0
Partition 2 (square: 66)   2.71  8.09  2.98           H0 = 1
Partition 3 (circle: 50)   5.80  1.48  0.25           H0 = 0
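For reference, here is a minimal sketch of how the two metrics of Eqs. (5-4) and (5-5), as reported in Table 1, could be computed from a dissimilarity/weight matrix W and a partition labeling (a dense W is assumed here for simplicity):

```python
# Sketch: per-partition separation metrics m1 (Eq. (5-4)) and m2 (Eq. (5-5)).
import numpy as np

def separation_metrics(W, labels, cluster):
    inside = labels == cluster
    outside = ~inside
    links = W[np.ix_(inside, outside)].sum()   # links(A_i, complement), Eq. (5-2)
    inner = W[np.ix_(inside, inside)].sum()    # total inner weights of A_i
    m1 = links / inner                         # Eq. (5-4)
    m2 = (W[np.ix_(inside, inside)].max()      # longest within-partition edge
          / W[np.ix_(inside, outside)].min())  # over shortest between-partition edge
    return m1, m2
```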
Fig. 7. Data graph and dissimilarity spectrum of the dataset with complex boundary: (a) data graph, (b) dissimilarity spectrum and (c) the secondary separation in the second cluster of four-class data.
Both the DS and the MST are mainly used to examine the spatial arrangement of the instances. Instead of searching globally for outliers over all training data, the proposed method searches for outliers within each individual cluster structure. A list of top outlier candidates is provided based on the visual assessment of the DS and MST, and the candidates are then ranked by their outlier-ness scores calculated by the local outlier factor (LOF) [21]. Part of the public "four-class" dataset [34] is used to illustrate the developed outlier detection methods, as the data graph in Fig. 7(a) shows. Here, 20 random outliers are seeded into the selected training data. Cluster structures can be found by the proposed partitioning method, as the line in Fig. 7(a) shows. Fig. 7(b) is the DS of the dataset; within each cluster block, the instances near the edge have relatively larger dissimilarity with those in the center of the block (lighter color stands for larger dissimilarity). At the bottom right or top left of each cluster block there are secondary small blocks containing instances that deviate from the main cluster block. Based on Fig. 7(b), if any outliers exist, the instances between the two blocks are the top candidates. As suspected, most of the instances located in the ambiguous areas around the boundary of the two blocks in the spectrum are the seeded random outliers, highlighted as groups G1, G2, and G3; in the data graph of Fig. 7(a), they are also far away from the main clusters. Furthermore, the outlier-ness of the instances in groups G1, G2, and G3 can be measured more precisely by comparing the local density of each instance with that of its neighbors. This method is likely to pick outlier candidate groups out of individual well-formed cluster structures in the data: if the majority of instances in one cluster structure are tightly connected and only a small group deviates from the cluster, the dissimilarity spectrum method finds the group of outliers easily. However, if the outliers are spread individually between different cluster structures and are highly dissimilar from most cluster structures, they are hard to find by this method. Another problem in the method is defining the suspect area containing outliers in the spectrum, i.e., finding the secondary separation within one dark block. Therefore, the local average dissimilarity (LAD) of every instance is defined as $\bar{w}_{i \in C_k} = \frac{1}{n} \sum_{j \in C_k} w_{ij}$, where $w_{ij}$ stands for the dissimilarity between instances i and j. The LADs of all instances in one cluster form an array, $\{\bar{w}_i \mid i = 1, 2, \ldots, \text{ and } i \in C_k\}$, which gives a curve reflecting the instances' tightness with each other. The secondary separation is found by tracking the maximal and minimal values of the first derivative of this curve. Fig. 7(c) shows the zoomed-in spectrum of the second block of Fig. 7(b) and the calculated LAD derivative; the separation is picked out by the highlighted maximal and minimal values. It has also been discovered that the edge lengths {w_ij} of an MST over a cluster of instances follow a distribution that depends on the spatial arrangement of the instances.
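A short sketch of the LAD-based secondary-separation search described above; taking the raw extrema of the first derivative, without any smoothing, is our simplification:

```python
# Sketch: local average dissimilarity (LAD) curve of one VAT-ordered cluster block
# and the secondary-separation candidates from its first derivative.
import numpy as np

def lad_secondary_separation(W_block):
    n = W_block.shape[0]
    lad = W_block.sum(axis=1) / n        # w-bar_i for every instance in the block
    d1 = np.diff(lad)                    # first derivative of the LAD curve
    return int(np.argmax(d1)), int(np.argmin(d1))   # steepest rise and fall
```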
For most subgroups of data whose spatial arrangement is verified as random, the outlier detection problem is equivalent to finding the inconsistent edges whose weights are significantly larger than the average length of their local neighbor edges in the MST edge length distribution. Therefore, given a partition C = {C_1, C_2, C_3, ..., C_n} over the instances, with MST_i constructed over C_i, we define an inconsistent edge w for outlier detection in MST_i as one meeting the following two conditions:

1. w > w_max, where w_max is the upper limit: the T% quantile of edge lengths in MST_i.
2. N1/N2 < B or N1/N2 > 1/B, where, if MST_i breaks at edge w into two branches, N1 and N2 stand for the numbers of nodes on each branch.

The constant B controls the number of instances in the branch connected by the inconsistent edge w, which must be small enough to be suspected as outliers. The constant T varies with the tightness of MST_i: if MST_i spreads with scattered instances, heuristically, the upper limit controlled by T should be larger. In summary, inconsistent edges are the edges that separate a small group or an individual instance from the majority with a larger distance than the others; a sketch of this test follows. For the first cluster in Fig. 7(a), the MST is established as shown in Fig. 8; in the end, only one edge proves to be an inconsistent edge as defined, and it is highlighted in Fig. 8. Apparently, the small group of instances on the left is most likely an outlier group.
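The inconsistent-edge test could be sketched as follows; networkx supplies the MST, and the percentile reading of T as well as the default values of T and B are our assumptions:

```python
# Sketch: find MST edges that are both long (condition 1) and split off a small
# branch (condition 2), marking the detached instances as outlier candidates.
import numpy as np
import networkx as nx

def inconsistent_edges(X, T=90, B=0.1):
    n = len(X)
    G = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            G.add_edge(i, j, weight=float(np.linalg.norm(X[i] - X[j])))
    mst = nx.minimum_spanning_tree(G)
    w_max = np.percentile([d["weight"] for _, _, d in mst.edges(data=True)], T)
    suspects = []
    for u, v, d in list(mst.edges(data=True)):
        if d["weight"] <= w_max:                 # condition 1: w > w_max
            continue
        mst.remove_edge(u, v)                    # break the tree at edge w
        n1 = len(nx.node_connected_component(mst, u))
        mst.add_edge(u, v, **d)
        ratio = n1 / (n - n1)
        if ratio < B or ratio > 1 / B:           # condition 2: N1/N2 extreme
            suspects.append((u, v, d["weight"]))
    return suspects
```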
Fig. 8. Minimum spanning tree over the first cluster of four-class data.
Fig. 9. Bearing failures.
Table 2
The bearing signature frequencies.

Spindle rotation speed            fr
Ball pass frequency, outer race   7.14 fr
Ball pass frequency, inner race   9.88 fr
Ball spin frequency               5.824 fr

7. Case study

Bearings are among the most frequently failing components in rotary machines, and such failures can be disastrous, leading to costly downtime. To prevent these kinds of failures, various bearing condition monitoring techniques have been developed. Generally, bearing failures can be categorized as outer-race failure, roller/cage failure, inner-race failure, and combinations of these, as Fig. 9 shows. A test-bed was therefore set up in the NSF I/UCRC Center for Intelligent Maintenance Systems (IMS) lab to identify different types of bearing failures by a data-driven modeling approach. A single-axis accelerometer is mounted to collect vibration signals at a sampling rate of 50 kHz; the spindle rotation speed is 800 rpm (fr). The bearing characteristic signature frequencies are listed in Table 2. Multiple tests are conducted under seven system health conditions (failure modes f1-f7), as listed in Table 3.

Table 3
The bearing failure conditions.

Failure mode   Condition description
f1             Normal
f2             Rollers
f3             Outer race
f4             Inner race + roller
f5             Outer race + inner race
f6             Outer race + inner race + roller
f7             Outer race + roller

7.1. Initial data quality evaluation

Besides the unknown outliers from the testing, 20 random outliers are seeded into the training data. The training data includes 10 features extracted from the vibration signals: the amplitudes at nine bearing vibration signature frequencies (the 1st, 2nd, and 3rd harmonics of each signature frequency) and the time-domain root mean square (RMS). Similarly, each seeded random outlier instance has 10 features, randomly generated within the range of the corresponding feature. The dissimilarity spectrum of the training data with seeded outliers is shown in Fig. 10, and the data quality evaluation results are listed in Table 4. As the results show, clusters C1, C2, C3, and C4 have poorer quality, quantified by the M2/M1 ratio and the rejection of randomness, than the others; in the dissimilarity spectrum, the first four partitions have blurred edges. Since the first cluster has only 6 instances, which deviate from the majority, they are naturally categorized as outliers.

Fig. 10. Dissimilarity spectrum of bearing training data with outliers.

Table 4
Bearing data quality (with outliers).

Partition index  Failure modes       m1     m2    Ratio  Spatial arrangement (0: random)  Size
C1               Outliers            2.08   5.77  2.78   1                                6
C2               f5 + outlier        4.65   5.17  1.11   1                                20
C3               f4 + f5 + outlier   7.17   3.54  0.49   1                                22
C4               f6 + outlier        6.15   2.85  0.46   1                                23
C5               f7                  15.87  0.78  0.05   0                                17
C6               f2                  21.81  1.85  0.08   1                                17
C7               f3                  51.26  0.70  0.01   0                                17
C8               f1                  93.68  0.17  0.00   0                                17

7.2. Outlier detection

The proposed outlier detection method is then applied to the three "unhealthy" clusters (C2, C3, and C4). Overall, the MST method found 9 of the 20 outliers, and the DS method found 4 more, as Table 5 shows. For partition C2 in Fig. 10, there is no obvious secondary edge marking a suspect outlier area; from this perspective, the MST method is more efficient for C2, finding 6 outliers compared to 3 by the DS method (Table 5). For partition C4, with a small group of instances scattered around the periphery of the cluster, the secondary edge in the dissimilarity spectrum is easily recognized.
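For context, here is a hedged sketch of how the 10-dimensional feature vector of Section 7.1 might be computed from one vibration record; the record-length handling and the spectral interpolation are our assumptions, not details given in the paper:

```python
# Sketch: FFT amplitudes at the 1st-3rd harmonics of the three signature
# frequencies of Table 2, plus the time-domain RMS (10 features in total).
import numpy as np

def bearing_features(signal, fs=50_000, fr=800 / 60.0):   # 800 rpm -> 13.33 Hz
    sig_freqs = [7.14 * fr, 9.88 * fr, 5.824 * fr]        # BPFO, BPFI, BSF
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    feats = [np.interp(h * f, freqs, spectrum)            # amplitude per harmonic
             for f in sig_freqs for h in (1, 2, 3)]
    feats.append(float(np.sqrt(np.mean(np.square(signal)))))  # RMS
    return np.array(feats)
```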
Table 5
Detected outliers by ODS and OMST.

OMST (MST method):
LOF    Failure mode  Partition index
2.79   Outlier       C2
2.33   Outlier       C2
1.26   Outlier       C2
1.02   Outlier       C2
0.89   Outlier       C2
1.72   Outlier       C2
5.37   Outlier       C3
15.52  Outlier       C3
9.21   Outlier       C4

Notes (MST tightness related parameters, T), MST edge kurtosis / MST edge mean: C2: 1.7 / 6.4; C3: 4.7 / 5.2; C4: 3.1 / 7.2.

ODS (DS method):
LOF    Failure mode  Partition index
2.79   Outlier       C2
2.33   Outlier       C2
1.26   Outlier       C2
7.50   f6            C3
6.25   f6            C3
9.21   Outlier       C4
8.43   Outlier       C4
8.33   Outlier       C4
8.14   Outlier       C4
7.67   Outlier       C4
11.00  f7            C4

Notes (LAD array related parameters), standard deviation / kurtosis / mean: C2: 0.07 / 2.25 / 0.22; C3: 0.09 / 5.27 / 0.14; C4: 0.11 / 4.52 / 0.16.

In the MST method, the spread of the MST over a cluster is measured by its edge mean and kurtosis; C4 has a flat edge length distribution and a much higher mean value compared to C2 and C3. For the DS method, since the instances are reordered by their dissimilarity in the spectrum, if the LAD array has a higher standard deviation, as for C4, it is more likely that the instances at the start and end differ greatly from the majority, and the outliers are easily recognized. In summary, 13 outlier candidates are found in C2, C3, and C4; adding the ones in C1, 19 of the 20 outliers are correctly found, plus 3 "false alarm" instances from tests f5 and f6.
In Table 6, the true label of every instance is listed in the "instance label" column, and the "partition index" entries are consistent with the labels in Fig. 10. All candidates found are listed in Table 6 with their LOF scores: 95% of the outliers are recognized within the top 22 potential candidates, whereas, based on the global LOF method alone, only 50% of the outliers can be detected within the top 22 candidates.

7.3. Data quality evaluation before and after outlier removal

After removal of the outliers, nearly all of the training data quality is improved; both metrics for most of the partitions improved more than onefold, as shown in Table 7.
Table 6
Benchmark with standard global LOF.

                 OMST + ODS                                            Standard global LOF
Candidate index  LOF    Instance label  Instance index  Partition     LOF    Instance label  Instance index
1                0.89   Outlier         137             C2            1.85   f5              80
2                1.02   Outlier         136             C2            1.87   Outlier         120
3                1.26   Outlier         132             C2            1.92   f7              111
4                1.72   Outlier         120             C2            2.27   f6              100
5                2.33   Outlier         134             C2            2.31   Outlier         134
6                2.79   Outlier         133             C2            2.35   f2              20
7                5.37   Outlier         127             C3            2.76   Outlier         133
8                6.25   f6              78              C3            2.88   f4              59
9                7.50   f6              76              C3            3.02   f2              18
10               7.67   Outlier         138             C4            3.28   f2              28
11               8.14   Outlier         139             C4            3.65   f6              90
12               8.33   Outlier         128             C4            3.94   f5              78
13               8.43   Outlier         131             C4            4.47   f4              65
14               9.21   Outlier         122             C4            5.35   Outlier         127
15               11.00  f7              98              C4            5.86   f5              75
16               15.52  Outlier         124             C3            7.67   Outlier         138
17               1.36   Outlier         129             C1            8.14   Outlier         139
18               1.36   Outlier         125             C1            8.33   Outlier         128
19               1.36   Outlier         123             C1            8.43   Outlier         131
20               1.36   Outlier         121             C1            9.21   Outlier         122
21               1.36   Outlier         126             C1            11.00  f6              98
22               1.36   Outlier         135             C1            15.50  Outlier         124
Table 7
Data quality improvement by outlier removal.

Data quality with outliers          Data quality without outliers       Improvement ratio
Partition labels     m1     m2      Partition labels  m'1     m'2       (m1 - m'1)/m1  (m'2 - m2)/m2
Outliers             2.08   5.77
f5 + outlier         4.65   5.17    f5                7.62    1.81      2.67           2.18
f4 + f5 + outlier    7.17   3.54    f4 + f5           9.51    1.61      1.04           2.20
f6 + outlier         6.15   2.85    f6 + outlier      10.80   0.54      0.50           5.60
f7                   15.87  0.78    f7                16.32   0.26      1.65           9.81
f2                   21.81  1.85    f2                20.87   1.85      0.31           0.58
f3                   51.26  0.70    f3                49.03   0.70      1.25           1.64
f1                   93.68  0.17    f1                89.65   0.17      0.75           3.02
Fig. 11. Dissimilarity spectra before and after outlier removal.
Every partition is better separated from the others, though only the f1, f3, and f7 conditions have uniformly distributed instances, the same as before outlier removal. The same conclusion can be visualized in Fig. 11.
8. Conclusion and future work

The proposed method systematically evaluates data acquired in industrial applications according to its capability to profile multiple system conditions for the purpose of health diagnosis modeling. Based on the evaluation, outliers in the data can be recognized and the data quality can be quantitatively calculated. The method provides an autonomous technique that can quickly evaluate data acquired from multiple tests or online inspections. The evaluation leads to a better illustration of data sufficiency by discovering the data's inner cluster characteristics, the significance of each partition, and the percentage of outliers; this also facilitates the design of the subsequent prognostic algorithms. Even when the data is discovered to have poor quality, the visualized characteristics support data reconstruction through different signal processing, feature extraction, and selection techniques. In the future, the quantitative evaluation can be improved by testing more datasets from different applications and by benchmarking other clustering methods as well as clusterability metrics. Besides outlier detection, feature extraction and selection methods will be investigated for the purpose of data quality improvement. In summary, the paper proposes a systematic and autonomous method to evaluate acquired data based on a novel visualized data partitioning algorithm together with an outlier detection method. It reduces unnecessary investment in redundant prognostic analysis due to poor-quality datasets. Assured data quality will also improve PHM results, which leads to better decision making and significant cost/time savings. The outcome of this research could potentially impact other fields of research such as data mining, biostatistics, and signal processing.

Acknowledgement

This work was supported by the US National Science Foundation (Award Number: 1031986) as a part of the fundamental research project "A Systematic Methodology for Data Validation and Verification for Prognostics Applications".

References

[1] G.J. Vachtsevanos, F.L. Lewis, M. Roemer, A. Hess, B. Wu, Intelligent Fault Diagnosis and Prognosis for Engineering Systems, Wiley, US, 2006, ISBN: 047172999X.
[2] W. Tianyi, Y. Jianbo, D. Siegel, J. Lee, A similarity-based prognostics approach for remaining useful life estimation of engineered systems, in: International Conference on Prognostics and Health Management 2008, PHM 2008, 2008, pp. 1-6.
[3] S. Das, B.L. Matthews, R. Lawrence, Fleet level anomaly detection of aviation safety data, in: IEEE Conference on Prognostics and Health Management (PHM) 2011, 2011, pp. 1-10.
[4] J. Lee, J. Ni, D. Djurdjanovic, H. Qiu, H. Liao, Intelligent prognostics tools and E-maintenance, Computers in Industry 57 (2006) 476-489.
[5] V. Venkatsubramanian, R. Rengaswamy, K. Yin, S.N. Kavuri, A review of process fault detection and diagnosis. Part I. Quantitative model-based methods, Computers & Chemical Engineering 27 (March) (2003) 293-311.
[6] J.B. Coble, J.W. Hines, Prognostic algorithm categorization with PHM Challenge application, in: International Conference on Prognostics and Health Management 2008, PHM 2008, 2008, pp. 1-11.
[7] D. Djurdjanovic, J. Lee, J. Ni, Watchdog Agent - an infotronics-based prognostics approach for product performance degradation assessment and prediction, Advanced Engineering Informatics 17 (2003) 109-125.
[8] Q. Hai, N. Eklund, N. Iyer, H. Xiao, Evaluation of filtering techniques for aircraft engine condition monitoring and diagnostics, in: International Conference on Prognostics and Health Management 2008, PHM 2008, 2008, pp. 1-8.
[9] M. Torres, E. Bogatin, Signal integrity parameters for health monitoring of digital electronics, in: International Conference on Prognostics and Health Management 2008, PHM 2008, 2008, pp. 1-6.
[10] P.E. McKnight, Missing Data: A Gentle Introduction, The Guilford Press, New York, US, 2007.
[11] I. Lopez, N. Sarigul-Klijn, Distance similarity matrix using ensemble of dimensional data reduction techniques: vibration and aerocoustic case studies, Mechanical Systems and Signal Processing 23 (2009) 2287-2300.
[12] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall Inc., Upper Saddle River, New Jersey, 1988.
[13] M. Kaboudan, A measure of time series' predictability using genetic programming applied to stock returns, Journal of Forecasting 18 (1999) 345-357.
[14] W. Yan, K.F. Goebel, Feature selection for partial discharge diagnosis, Proceedings of SPIE (2005) 166-175.
[15] V. Sugumaran, V. Muralidharan, K.I. Ramachandran, Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing, Mechanical Systems and Signal Processing 21 (2007) 930-942.
[16] R.G. Lyons, Understanding Digital Signal Processing, Prentice Hall PTR, Upper Saddle River, New Jersey, 2004.
[17] J.S. Walker, A Primer on Wavelets and Their Scientific Applications, CRC Press, US, 1999.
[18] G. Williams, R. Baxter, H. Hongxing, S. Hawkins, G. Lifang, A comparative study of RNN for outlier detection in data mining, in: Proceedings of the 2002 IEEE International Conference on Data Mining, ICDM 2002, 2002, pp. 709-712.
[19] P.J. Rousseeuw, K. Van Driessen, A fast algorithm for the minimum covariance determinant estimator, Technometrics (1999) 212-223.
[20] E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, in: Proceedings of the International Conference on Very Large Data Bases, 1998, pp. 392-403.
[21] M.M. Breunig, H.P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, Sigmod Record 29 (2000) 93-104.
[22] A. Banerjee, R.N. Dave, Validating clusters using the Hopkins statistic, in: Proceedings of the 2004 IEEE International Conference on Fuzzy Systems, vol. 1, 2004, pp. 149-153.
[23] O. Rafail, R. Yuval, J.S. Leonard, S. Chaitanya, The effectiveness of Lloyd-type methods for the k-means problem, in: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS '06, 2006, pp. 165-176.
[24] S. Ben-David, N. Eiron, H.U. Simon, The computational complexity of densest region detection, Journal of Computer and System Sciences 64 (2002) 22-47.
[25] S. Epter, M. Krishnamoorthy, M. Zaki, Clusterability detection and initial seed selection in large data sets, in: The International Conference on Knowledge Discovery in Databases, 1999.
[26] B. Zhang, Dependence of clustering algorithm performance on clustered-ness of data, Technical Report 20010417, Hewlett-Packard Labs, 2001.
[27] M.F. Balcan, A. Blum, S. Vempala, A discriminative framework for clustering via similarity functions, in: Proceedings of the 40th Annual ACM, 2008, pp. 671-680.
[28] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, Advances in Neural Information Processing Systems 1 (2002) 585-592.
[29] A.Y. Ng, On spectral clustering: analysis and an algorithm, Advances in Neural Information Processing Systems 2 (2002) 849.
[30] J.C. Bezdek, R.J. Hathaway, VAT: a tool for visual assessment of (cluster) tendency, in: Proceedings of the 2002 International Joint Conference on Neural Networks, IJCNN '02, 2002, pp. 2225-2230.
[31] L. Wang, U. Nguyen, J. Bezdek, C. Leckie, K. Ramamohanarao, Enhanced visual analysis for cluster tendency assessment and data partitioning, Advances in Knowledge Discovery and Data Mining (2010) 16-27.
[32] J. Beardwood, J.H. Halton, J.M. Hammersley, The shortest path through many points, Mathematical Proceedings of the Cambridge Philosophical Society 55 (1959) 299.
[33] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 888-905.
[34] H. Tin Kam, E.M. Kleinberg, Building projectable classifiers of arbitrary complexity, in: Proceedings of the 13th International Conference on Pattern Recognition, vol. 2, 1996, pp. 880-885.
Dr. Yan Chen received the MS degree in mechanical engineering from Shanghai Jiao Tong University, China, and the PhD degree in mechanical engineering from the NSF Industry/University Cooperative Research Center (I/UCRC) on Intelligent Maintenance Systems (IMS, www.imscenter.net) at the University of Cincinnati, Ohio, US. Her research focuses on intelligent prognostics and data mining tools for engineering system failure detection, diagnosis, and prediction, as well as other signal processing techniques. She has been involved in and led many research efforts to develop and deploy prognostic technology in real manufacturing settings in collaboration with global companies including P&G, GE, etc. She has published several refereed papers in these areas, including in prestigious journals and top international conferences, and is a frequent reviewer for various international journals and conferences including Transactions of the Institute of Measurement and Control, the ASME International Manufacturing Science and Engineering Conference, etc.
Mr. Feibai Zhu obtained a Master of Science degree at the University of Cincinnati. He became actively involved in the data mining area approximately two years ago. Previously, he worked on intelligent maintenance systems for industrial applications such as machining centers and QC machines.
Dr. Jay Lee is Ohio Eminent Scholar and L.W. Scott Alter Chair Professor in Advanced Manufacturing at the University of Cincinnati and is founding director of the National Science Foundation (NSF) Industry/University Cooperative Research Center (I/UCRC) on Intelligent Maintenance Systems (IMS, www.imscenter.net). Since its inception in 2001, the Center has been supported by over 70 global companies including P&G, GE Aviation, National Instruments, Boeing, Caterpillar, Siemens, Chevron, Honeywell, Parker Hannifin, Spirit AeroSystems, etc. His current research focuses on dominant innovation tools for product and service design as well as intelligent prognostics tools and smart predictive analytics for equipment reliability assessment and smart product life-cycle management. He also serves as honorary professor and visiting professor for a number of institutions including Shanghai Jiao Tong Univ., City Univ. of Hong Kong, Cranfield Univ. in the UK, Lulea Univ. of Technology in Sweden, Hong Kong PolyU, Xian Jiao Tong Univ., and Harbin Institute of Technology (HIT) in China. He is an editor or associate editor for a number of journals including IEEE Transactions on Industrial Informatics, the Int. Journal on Prognostics & Health Management (IJPHM), the Int. Journal on Service Operations and Informatics, etc. He has authored/co-authored numerous highly influential articles and technical papers in the areas of machinery monitoring and prognostics, E-manufacturing, and intelligent maintenance systems, and holds a number of patents and trademarks. He is a frequently invited speaker and has delivered over 150 invited keynote and plenary speeches at major international conferences. He is a Fellow of ASME and SME, as well as a founding fellow of the International Society of Engineering Asset Management (ISEAM).