Accepted Manuscript

A Data Mining Approach for Fault Diagnosis: An Application of Anomaly Detection Algorithm

Afrooz Purarjomandlangrudi, Amir Hossein Ghapanchi, Mohammad Esmalifalak

PII: S0263-2241(14)00250-4
DOI: http://dx.doi.org/10.1016/j.measurement.2014.05.029
Reference: MEASUR 2876

To appear in: Measurement

Received Date: 17 February 2014
Revised Date: 11 April 2014
Accepted Date: 15 May 2014

Please cite this article as: A. Purarjomandlangrudi, A.H. Ghapanchi, M. Esmalifalak, A Data Mining Approach for Fault Diagnosis: An Application of Anomaly Detection Algorithm, Measurement (2014), doi: http://dx.doi.org/10.1016/j.measurement.2014.05.029


A Data Mining Approach for Fault Diagnosis: An Application of Anomaly Detection Algorithm

Abstract: Rolling-element bearing failures are the most frequent problems in rotating machinery; they can be catastrophic and cause major downtime. Hence, providing advance failure warning and precise fault detection for such components is pivotal and cost-effective. The vast majority of past research has focused on signal processing and spectral analysis for fault diagnostics in rotating components. In this study, a data mining approach using a machine learning technique called Anomaly Detection (AD) is presented. This method employs classification techniques to discriminate between defect examples. Two features, kurtosis and Non-Gaussianity Score (NGS), are extracted to develop anomaly detection algorithms. The performance of the developed algorithms was examined using real data from a test-to-failure bearing experiment. Finally, anomaly detection is compared with a popular method, the Support Vector Machine (SVM), to investigate the sensitivity and accuracy of this approach and its ability to detect anomalies in early stages.

Authors: Afrooz Purarjomandlangrudi1*, Amir Hossein Ghapanchi2, Mohammad Esmalifalak3

1. Science and Engineering Department, Queensland University of Technology (QUT), 2 George St, Brisbane, QLD 4000
2. School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, 4222, Australia
3. Department of Electrical and Computer Engineering, University of Houston, Houston, TX, USA

Key words: data mining, fault diagnosis, machine learning, anomaly detection, support vector machine.

1. Introduction

Low-speed rotating machinery is widely used in heavy industries such as steel pipe production, mining and wind turbine power plants. Rolling element bearing condition monitoring has become the center of attention in recent years, since the majority of rotating machine defects are caused by faulty bearings, and monitoring them offers considerable economic savings and is easy to implement. According to the UK Department of Trade and Industry (DTI), Condition Based Maintenance (CBM) of wind turbine rolling elements has achieved total savings of £1.3 billion per year over the device's lifetime [1]. Condition Monitoring Systems (CMS) or Health Monitoring Systems (HMS) play important roles in organizing condition-based maintenance and repairs (M&R). One means of achieving this objective is to apply an efficient fault diagnosis technique that is sensitive enough to find very minor defects. Bearing faults are one of the foremost causes of failure in rotating mechanical systems (40–50% in wind turbines [2]), because such systems contain one or more bearings to provide smooth rotation with minimal losses, and bearing faults can contribute directly to subsequent problems in other major components. Since the time at which principal failures occur varies for the inner race, outer race, ball, and rolling element, accurate and sensitive maintenance techniques are essential to detect incipient faults in bearings.

The majority of existing works classify fault types on the basis of available fault samples; in practice, however, collecting all types of faulty data from bearing defects is very difficult, if not impossible. This is because some faults occur only rarely and each type of machine has specific failure vibration patterns [3-5]. Some previous studies have overcome the problem by applying data-mining algorithms and machine learning classification technologies, which use a historical database of the system to predict failures. Among the machine learning methods that have been used, artificial neural networks (ANN) have experienced the fastest development over the past few years [6]. Nevertheless, neural networks have some drawbacks, such as structure identification difficulties, Orthogonal Weight Estimators (OWE) learning, local convergence, and poor generalization abilities, since they were originally based on Empirical Risk Minimization (ERM). Support Vector Machines (SVM) offered a better solution to these disadvantages [7], [8] and rapidly became the center of attention in recent research. However, the basic SVM algorithm deals with binary classification problems, and various kinds of SVM fault classification suffer from a heavy computational load, which restricts their use.

The aim of this research is to propose a fault diagnosis method that overcomes the above-mentioned drawbacks, provides higher sensitivity in fault detection and, most importantly, does not need a large historical data set with fault samples. The method is based on anomaly detection: it creates a model of normal data and then attempts to detect deviations from that normal model in the observed data. Machine learning techniques develop algorithms that are able to find patterns in data and adjust program actions according to the training dataset. Hence, the anomaly detection algorithm is able to recognize the majority of new types of intrusion [9, 10]. This method can classify data where generally we only have access to a single class of data, or where a second class of data is under-represented. However, it needs a purely normal data set to train the algorithm: if the training data set includes the effects of intrusions, the algorithm may not recognize future failures and will assume they are normal. This property contributes to diagnosing faults and fatigue in early stages, and because of its inherently high sensitivity the method is extremely rigorous in comparison with previous algorithms.

This paper is organized as follows. Section 2 discusses the existing analysis techniques and feature extraction. Section 3 presents the methodology proposed for fault detection and diagnosis. In Section 4, experimental data from a test rig are used to validate the proposed algorithm. Section 5 explains the implications of the method for both researchers and practitioners, and Section 6 concludes the paper.

2. Analysing techniques

Fault diagnosis and condition monitoring (CM) using vibration analysis are commonly applied in a variety of industrial tasks. In rotating components, vibration signals are generated and can be observed via sensors [11]. These vibration signals provide information on the health state of the rotating component, and relevant features can be extracted from them by signal processing. Various analysis techniques have been applied to the raw vibration data collected from the sensors [12-14]. The techniques normally include time-domain processing, frequency-domain processing and time-frequency techniques. These are the three main classes among the numerous techniques that have been developed for waveform and data analysis and interpretation.

Frequency-domain techniques have previously been employed to show that a localized defect can generate a periodic signal with a singular characteristic frequency [15]. This approach identifies and isolates the added frequency component to diagnose faults. However, it is suitable only when the collected data contain a waveform signal with certain harmonic attributes. Time-domain techniques are primarily derived from the statistical behaviour of signal waveforms. Characteristic features include peak-to-peak amplitude, Root-Mean-Square (RMS) energy, standard deviation, skewness and kurtosis. These statistical features and their probability density functions can change when abnormalities are present in the input data.

2.1. Kurtosis

In this paper we apply one time-domain feature, kurtosis, which is defined as the fourth statistical moment normalized by the standard deviation to the fourth power. It represents a measure of the flattening of the probability density function near the average value. Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3. Distributions that are more outlier-prone than the normal distribution have a kurtosis greater than 3; distributions that are less outlier-prone have a kurtosis less than 3 [16]. The kurtosis value can be calculated using the following equation:

$$ k = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x[i]-\mu\right)^{4}}{\sigma^{4}} \qquad (1) $$

where N is the number of samples, μ is the mean of x[i] and σ is the standard deviation of x[i].

2.2. Non-Gaussianity Score (NGS)

One quantitative measure of how far a data set deviates from Gaussianity is the non-Gaussianity score (NGS). The NGS feature, denoted by the symbol ψ, was developed to measure the non-Gaussianity of a segment of data [17].
To calculate ψ for data x[i], the inverse normal cumulative distribution function (CDF) for the data should first be obtained from the following equations:

(2)

where

(3)

The probability in (2) is required to calculate the values used for plotting the normal probability of the data.

The normal probability plot compares x[i] against the probabilities of a Gaussian data set, which are assigned to g[i] in (3). The measure of non-Gaussianity (ψ) is the rate of deviation of the probability plot of the data from its Gaussian probability plot reference (g).
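The kurtosis definition of Section 2.1 (fourth central moment divided by σ⁴) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `kurtosis`, the use of the population standard deviation, and the synthetic signals are all hypothetical choices made here.

```python
import numpy as np

def kurtosis(x):
    """Kurtosis as the fourth central moment divided by sigma^4."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (an assumption)
    return np.mean((x - mu) ** 4) / sigma ** 4

# Synthetic stand-ins for vibration data (hypothetical, not the test-rig data).
rng = np.random.default_rng(0)
gaussian = rng.normal(size=100_000)   # healthy-looking, Gaussian-like signal
impulsive = gaussian.copy()
impulsive[::1000] += 20.0             # sparse spikes mimic impulsive defects

print("Gaussian signal kurtosis: ", kurtosis(gaussian))
print("Impulsive signal kurtosis:", kurtosis(impulsive))
```

The Gaussian signal should give a value close to 3, while the signal with sparse spikes gives a much larger value, which is why kurtosis is sensitive to impulsive bearing defects.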

3. Proposed method

3.1. Machine learning approaches

For many engineering and science problems there are no direct mathematical solutions. Learning techniques have been used extensively to overcome this problem. Researchers in different fields try to develop algorithms that learn the behavior of a given problem from historical data [18, 19]. Learning algorithms can be used in different applications, such as prediction of future values, clustering, and detection of anomalous behavior in the data [16]. Machine learning is a type of artificial intelligence (AI) that gives systems the ability to learn without being explicitly programmed. This technique develops computer programs that are able to teach themselves and train their algorithms to find patterns in data and adjust program actions accordingly. Support vector machines (SVM) are a supervised learning approach used for classification and regression, and have recently emerged as a popular machine learning method. The SVM training algorithm builds a model from the training examples that can predict which category a new example belongs to. SVMs have been applied in various diagnostic applications, such as bearing faults [20], induction motors [21], machine tools [22, 23], rotating machines [24], etc. Generally, in fault diagnostics SVMs are combined with other feature selection techniques and kernel functions [25]. In this paper both Anomaly Detection (AD) and SVM are applied to the experimental data set in order to compare the two techniques.

3.2. Basic theory of Anomaly Detection

Anomaly detection, also referred to as novelty detection, outlier detection [26], or one-class learning [27], is based on classification techniques and provides the user with the ability to classify data where generally we only have access to a single class of data, or where a second class of data is under-represented.
This method usually consists of two phases: a training phase and a testing phase. In the former, the algorithm is trained on a labeled data set that consists mainly of normal examples; in the latter, the learned algorithm is applied to a new data set. To put it more formally, it discovers a functional mapping from the examples of the training dataset to an unknown probability distribution p(x, y), with the normal samples and the abnormal samples distinguished as in (4):

(4)

Because the AD algorithm is trained on the presumption that anomalous data are not generated by the source of normal data, and that the training set contains a very high percentage of normal data, it flags as anomalous any other example that shows intrusive activity, which leads to good sensitivity towards incipient faults [28]. Different methods, such as statistical anomaly detection, data-mining based methods, and machine learning based techniques, have been proposed for anomaly detection; the last of these is used for machine fault detection. The procedure of the algorithm and the mathematical concepts behind it are discussed in the following sections.

3.2.1. Gaussian distribution

This research follows a supervised learning approach. We assume that labels are available in the dataset and that the vast majority of the examples are "normal" (non-anomalous), but that there are also some examples of the system acting anomalously. To apply anomaly detection, a training data set $\{x^{(1)}, \ldots, x^{(m)}\}$ (where $x^{(j)} \in \mathbb{R}^{n}$) is first selected, and a model is then fitted to the data's distribution. For this purpose, for each feature $x_i$ (i = 1, …, n) the algorithm estimates the Gaussian parameters $\mu_i$ and $\sigma_i^{2}$ that fit the data in the i-th dimension. The Gaussian distribution is given by the following equation, where μ is the mean, σ² controls the variance, and p is the probability density function:

$$ p(x; \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \qquad (5) $$

3.2.2. Estimating parameters for a Gaussian distribution

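The Gaussian parameters $(\mu_i, \sigma_i^{2})$ of the i-th feature are estimated by the following equations, where m is the number of training examples and $x_i^{(j)}$ is the i-th feature of the j-th example. To compute the mean we applied:

$$ \mu_i = \frac{1}{m}\sum_{j=1}^{m} x_i^{(j)} \qquad (6) $$

And for the variance:

$$ \sigma_i^{2} = \frac{1}{m}\sum_{j=1}^{m}\left(x_i^{(j)}-\mu_i\right)^{2} \qquad (7) $$

3.2.3. Selecting the threshold, ε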

Once the Gaussian parameters have been calculated, the algorithm investigates which examples have a very high or a very low probability under this distribution. The low-probability examples are more likely to be the anomalies in the dataset. To decide which examples are anomalies, the algorithm selects the threshold ε using the F1 score on a cross-validation set. The F1 score is computed from the precision (prec) and recall (rec):

$$ F_1 = \frac{2 \cdot prec \cdot rec}{prec + rec} \qquad (8) $$

The prec and rec can be obtained from:

$$ prec = \frac{tp}{tp + fp} \qquad (9) $$

$$ rec = \frac{tp}{tp + fn} \qquad (10) $$

where tp is the number of true positives, meaning that the original label in the dataset says the example is an anomaly and the algorithm has correctly classified it as an anomaly; fp is the number of false positives, where the example is not an anomaly but the algorithm incorrectly classifies it as one; and fn is the number of false negatives, where the dataset says the example is an anomaly while the algorithm wrongly classifies it as not being anomalous. The algorithm then tries several values of ε to find the best one based on the F1 score. Once the best ε is selected, the algorithm finds the examples whose probability falls below the threshold, which are the anomalies of the data set. Here the F1 score is used as the measure of the algorithm's accuracy: a score of 1 means that the algorithm identifies all anomalies in the data set without raising any false alarms.
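The three steps of Sections 3.2.1 to 3.2.3 (fitting a Gaussian per feature, evaluating the density, and choosing ε by F1 score) can be sketched as follows. This is a minimal illustration, not the authors' code. The function names, the number of candidate thresholds, and the synthetic data are hypothetical, and multiplying the per-feature densities (which treats the features as independent) is an assumption that the text does not state explicitly.

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance of the training set (one row per example)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # population variance, i.e. division by m
    return mu, sigma2

def density(X, mu, sigma2):
    """Product of univariate Gaussian densities (assumes independent features)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    p = coef * np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return p.prod(axis=1)

def f1_score(y_true, y_pred):
    """F1 score from true positives, false positives and false negatives."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2.0 * prec * rec / (prec + rec)

def select_threshold(p_val, y_val, n_steps=1000):
    """Scan candidate thresholds and keep the one with the best F1 score."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_val.min(), p_val.max(), n_steps):
        y_pred = (p_val < eps).astype(int)  # low probability => anomaly
        score = f1_score(y_val, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

if __name__ == "__main__":
    # Synthetic two-feature data (hypothetical, for illustration only).
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 2))
    X_val = np.vstack([rng.normal(size=(200, 2)),
                       rng.normal(loc=8.0, size=(10, 2))])
    y_val = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])

    mu, sigma2 = estimate_gaussian(X_train)
    eps, score = select_threshold(density(X_val, mu, sigma2), y_val)
    print("selected epsilon:", eps, "F1 score:", score)
```

A linear scan over candidate thresholds is the simplest choice; because the densities span many orders of magnitude, a logarithmically spaced grid may be a better fit in practice.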

4. Case study

The vibration data employed in this paper were taken from the rolling element bearing data set of the NSF I/UCR Center for Intelligent Maintenance Systems (IMS – www.imscenter.net), collected with support from Rexnord Corp. in Milwaukee, WI [29]. As Figure 1 illustrates, four bearings were installed on a shaft. Vibration data were collected every 10 min for 164 h with a sampling rate of 20 kHz. An outer race defect occurred on bearing 1 at the end of this experiment. Consequently, the data from the horizontal accelerometer of bearing 1 are used here; this bearing has 16 rollers in each row [30].

Fig.1. Bearing test rig and sensor placement

The majority of past studies on bearing fault diagnostics have used simulated or seeded damage, and experiments employing these kinds of faults are not appropriate for detecting natural defects in advance. Since the main objective of this research is to present a method able to detect incipient bearing faults, this data set is entirely appropriate for our purpose. To give a clear perspective of the data patterns in this test-to-failure experiment, the Fast Fourier Transform (FFT) spectrum of the signal for the 8 days before the test rig stopped working is plotted in the following figures, in which the horizontal axes indicate frequency. The figures show that from day 1 to day 4 there are only minor changes in the FFT spectrum, while from day 5 onwards the changes grow towards a major defect.
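A spectrum of the kind described above can be computed with a standard FFT routine. The sketch below is an assumption-laden illustration rather than the processing actually used for Fig. 2: the function name `amplitude_spectrum`, the single-sided amplitude scaling, and the 230 Hz test tone are hypothetical; only the 20 kHz sampling rate comes from the text.

```python
import numpy as np

def amplitude_spectrum(signal, fs):
    """Single-sided amplitude spectrum of a real signal sampled at fs Hz."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) / n
    # Double every bin except DC (and the Nyquist bin when n is even).
    if n % 2 == 0:
        spectrum[1:-1] *= 2.0
    else:
        spectrum[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum

fs = 20_000                                   # sampling rate stated in the text
t = np.arange(fs) / fs                        # one second of samples
tone = 0.5 * np.sin(2.0 * np.pi * 230.0 * t)  # hypothetical 230 Hz component

freqs, spectrum = amplitude_spectrum(tone, fs)
print("dominant frequency (Hz):", freqs[np.argmax(spectrum)])
```

Comparing such spectra from successive days is one way to see a defect frequency emerge, although it presupposes that the defect produces a clear harmonic component.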

Fig.2. FFT spectrum for test to failure experiment

4.1. Model description

In this experiment, 100 captures of normal operating conditions (16 h) were used as the training data set, and the two explanatory parameters discussed above, kurtosis and NGS, were then extracted for each sub-band of the raw data. These parameters are independent of the energy of the signal and have a suitable distribution for machine learning data analysis. The sub-bands have a length of 50 ms and a shift of 25 ms, so that consecutive sub-bands of the dataset overlap.
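The sub-band segmentation can be sketched as follows. At the stated 20 kHz sampling rate, a 50 ms length corresponds to 1000 samples and a 25 ms shift to 500 samples, i.e. 50% overlap. The function names and the ramp test signal are hypothetical, and only the kurtosis feature is computed per sub-band; this is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def split_into_subbands(x, fs, length_s=0.050, shift_s=0.025):
    """Cut a 1-D signal into overlapping segments, one segment per row."""
    x = np.asarray(x, dtype=float)
    frame_len = int(round(length_s * fs))  # 1000 samples at 20 kHz
    hop = int(round(shift_s * fs))         # 500 samples at 20 kHz
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def subband_kurtosis(frames):
    """Kurtosis of each row: fourth central moment divided by variance squared."""
    centered = frames - frames.mean(axis=1, keepdims=True)
    m4 = np.mean(centered ** 4, axis=1)
    var = np.mean(centered ** 2, axis=1)
    return m4 / var ** 2

fs = 20_000
capture = np.arange(fs, dtype=float)       # hypothetical 1 s ramp signal
frames = split_into_subbands(capture, fs)
features = subband_kurtosis(frames)
print(frames.shape, features[:3])
```

Each row of `frames` then yields one feature value, so a one-second capture contributes 39 examples per feature under these settings.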

The data obtained from the feature extraction step are then applied as input to the algorithm. They are processed in the three phases described above: Gaussian distribution, estimating parameters for a Gaussian, and selecting the threshold. When the best ε has been determined, the anomaly detector finds every example whose probability is less than ε. Figure 3 depicts a schematic visualization of the anomalies for fault diagnosis when the defect has occurred. As can be seen in the zoomed section, the anomaly detector is able to recognize bad data from the incipient stage, and all detected anomalies are circled in red. Figure 4 depicts the SVM schematic for the same data set, obtained through a similar algorithm that determines the best kernel parameters for the data set.

Fig.3. Anomaly Detection result visualizing schematic

Fig.4. SVM result visualizing schematic

The results indicate that the anomaly detection technique has greater sensitivity and a 95% detection accuracy. More quantitatively, the F1 scores of both the anomaly detection and SVM methods were calculated and plotted with an additional shift of 10 in the data (Figures 5 and 6). Figure 5 indicates that for the training data set the graph reaches a steady state after 370 examples with an accuracy of 97%, while this rate is 75% for the SVM. Moreover, the anomaly detection graph levels out at 110 samples and 95% accuracy, whereas the corresponding values for the SVM graph are 42 and 82%, respectively. This means that, after sufficient training, anomaly detection achieves greater precision with fewer examples.

Fig.5. Anomaly Detection F1score

Fig.6. SVM F1score

Besides accuracy and sensitivity, fast detection is one of the most important aspects of this method. Figures 7 and 8 display the time at which each technique detected the first anomaly point. Figure 7 has been adopted from the results of past research conducted on the same dataset [31]. In these figures the red dots show the points where the system detects a change in behavior for the first time: for the SVM this is 75 h before the crack makes the machine stop, and for anomaly detection it is 100 h.

Fig.7. SVM output [31]

Fig.8. Anomaly Detection output

5. Discussion

This paper investigates fault diagnostics and the detection of incipient failures and defects in rolling element bearings. The proposed method utilizes machine learning techniques to detect abnormalities in operation. A real bearing vibration data set from the NSF I/UCR Center for Intelligent Maintenance Systems was used to test the performance of the anomaly detection algorithm, and the same data were then applied to an SVM algorithm for comparison. The implications of the results of this work for industrial users and researchers are discussed below.

5.1. Implications for industry practitioners

Bearings are a principal element in many applications that contain rotating components, where they provide relative motion for other moving machine elements. They are therefore widely used in automotive, industrial, marine, and aerospace applications as an effective means of supporting the transmission of power to the load. In industrial applications with low-speed rotating components, such as wind turbines and agricultural equipment, which include gearboxes and bearings, the proposed method can easily be applied to detect faults and fatigue in gear teeth, shafts, bearings and many other elements. To do so, data are collected from the desired part so that the algorithm can be trained on historical data from normal operation and then applied in the online CMS to detect abnormalities in advance. This method neither requires a large database for training nor involves complex and time-consuming computation. It would thus be an efficient and cost-effective way to detect impending problems and allow repairs to be scheduled.

In automotive engineering, where bearings are found in pumps, steering systems, air-conditioning compressors, engine rocker arms, throttle butterfly valves, gearboxes, and transmissions, applying a more accurate and convenient method such as anomaly detection can contribute to improving the condition monitoring of such systems. In marine environments, the bearings in ship diesel engines mainly help reduce friction by converting sliding friction into rolling friction. They are subject to different types of forces due to gas pressure and to the different reciprocating and rotating motions of engine parts. Hence the discussed method can serve as an adequate fault diagnosis technique for better maintenance and for preventing major and costly failures.

5.2. Implications for researchers

For many engineering and science problems, learning techniques have been widely used when there is no direct mathematical solution. Several machine learning methods have been used for fault detection in low-speed rotating components such as bearings and gearboxes. Among these methods, much research has been done on ANN, which has disadvantages such as structure identification difficulties, local convergence, and poor generalization abilities. SVM is another technique that has recently become very popular, but its heavy computational load often makes it impractical for many applications. To overcome these problems, the anomaly detection approach, which is based on classification techniques, is adapted to detect new types of intrusion. This method requires less computation and less training data, while the results indicate higher precision and efficiency. It is also able to detect the first abnormal behavior earlier than the other technique applied.

Regarding future work, applying anomaly detection to fault diagnosis in rolling element bearings provides a platform for the further use of this method in the condition monitoring of other low-speed rotating components, such as gearboxes and shafts. Moreover, other machine learning approaches can be investigated for this purpose, such as inductive logic programming, clustering, reinforcement learning, and similarity and metric learning. Combining these methods with principal components analysis, classification techniques or game theory is a further area of research that may enhance the fault detection and condition monitoring of rotating components.

6. Conclusion

Learning techniques have recently become very popular and widespread, especially in fault diagnostics, and their ability to teach and train themselves opens up a variety of applications. In this paper, a data mining approach using a machine learning technique called anomaly detection is applied to diagnose early defects in wind turbine bearings. This method employs a learning algorithm to separate the anomalies from normal data. In the proposed approach, two features, kurtosis and Non-Gaussianity Score (NGS), are extracted to develop the anomaly detection algorithm. To test the efficiency of the application in bearing fault detection, real data from a test-to-failure bearing experiment were used. This data set was chosen because it provides all the data from the time the bearing worked properly until the defect occurred and made it stop working. The results show that this method is able to detect faults and anomalies in bearing components with higher accuracy and smaller data sets. The results suggest that anomaly detection has an accuracy of 97%, while this rate is 75% for the SVM. They also indicate that this method is able to detect defects at earlier stages than previous methods, which is significant and essential in rolling element condition monitoring.

In this paper, an anomaly detection machine learning method was proposed to diagnose early defects in wind turbine bearings. The proposed approach was tested against the popular and well-known SVM method, using real data from a fault-seeded bearing test. In the proposed approach, two features, kurtosis and NGS, were extracted as part of the developed anomaly detection algorithm. The data used in this paper have a 0.007 inch fault in the inner race, outer race and ball, with loads of 0 and 1 HP applied to the bearing test. The results indicate that anomaly detection learning techniques can achieve higher accuracy with better efficiency for bearing fault diagnosis than the previously applied SVM classifier approach. In addition, it was shown that anomaly detection converges to the optimum value with high accuracy and fewer required data samples.

7. References

[1] A. Davies, Handbook of condition monitoring: techniques and methodology: Springer, 1998.
[2] E. Al Ahmar, V. Choqueuse, M. Benbouzid, Y. Amirat, J. El Assad, R. Karam, et al., "Advanced signal processing techniques for fault detection and diagnosis in a wind turbine induction generator drive train: A comparative study," in Energy Conversion Congress and Exposition (ECCE), 2010 IEEE, 2010, pp. 3576-3581.
[3] S. Fang and W. Zijie, "Rolling bearing fault diagnosis based on wavelet packet and RBF neural network," in Control Conference, 2007. CCC 2007. Chinese, 2007, pp. 451-455.
[4] L. Meng, W. Miao, and W. Chunguang, "Research on SVM Classification Performance in Rolling Bearing Diagnosis," in Intelligent Computation Technology and Automation (ICICTA), 2010 International Conference on, 2010, pp. 132-135.
[5] K. F. Al-Raheem and W. Abdul-Karem, "Rolling bearing fault diagnostics using artificial neural networks based on Laplace wavelet analysis," International Journal of Engineering, Science and Technology, vol. 2, 2010.
[6] N. Cristianini and J. Shawe-Taylor, "An introduction to support vector machines and other kernel-based learning methods," 2004.
[7] V. N. Vapnik, "The nature of statistical learning theory," New York: Springer Verlag, 1999.
[8] J. Cheng, D. Yu, and Y. Yang, "A fault diagnosis approach for gears based on IMF AR model and SVM," EURASIP Journal on Advances in Signal Processing, vol. 2008, p. 647135, 2008.
[9] D. E. Denning, "An intrusion-detection model," Software Engineering, IEEE Transactions on, pp. 222-232, 1987.
[10] N. N. M. Esmalifalak, R. Zheng, and Z. Han, "Detecting Stealthy False Data Injection Using Machine Learning in Smart Grid," in the Proceedings of IEEE GLOBECOM 2013, Atlanta, 2013.
[11] A. K. Jardine, D. Lin, and D. Banjevic, "A review on machinery diagnostics and prognostics implementing condition-based maintenance," Mechanical systems and signal processing, vol. 20, pp. 1483-1510, 2006.
[12] P. McFadden and J. Smith, "Vibration monitoring of rolling element bearings by the high-frequency resonance technique - a review," Tribology international, vol. 17, pp. 3-10, 1984.
[13] R. Heng and M. Nor, "Statistical analysis of sound and vibration signals for monitoring rolling element bearing condition," Applied Acoustics, vol. 53, pp. 211-226, 1998.
[14] G. Luo, D. Osypiw, and M. Irle, "On-line vibration analysis with fast continuous wavelet algorithm for condition monitoring of bearing," Journal of Vibration and Control, vol. 9, pp. 931-947, 2003.
[15] Y. Li, S. Billington, C. Zhang, T. Kurfess, S. Danyluk, and S. Liang, "Adaptive prognostics for rolling element bearing condition," Mechanical systems and signal processing, vol. 13, pp. 103-113, 1999.
[16] G. N. A. Purarjomandlangrudi, M. Esmallifalak, A. C. Tan, "Fault Detection in Wind Turbine: A systematic literature review," Wind engineering, 2013.
[17] H. Ghaemmaghami, D. Dean, S. Sridharan, and I. McCowan, "Noise robust voice activity detection using normal probability testing and time-domain histogram analysis," in Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, 2010, pp. 4470-4473.
[18] V. N. Vapnik, "An overview of statistical learning theory," Neural Networks, IEEE Transactions on, vol. 10, pp. 988-999, 1999.
[19] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction vol. 1: Cambridge Univ Press, 1998.
[20] L. B. Jack and A. K. Nandi, "Fault detection using support vector machine and artificial neural network, augmented by genetic algorithm," Mechanical System and Signal Processing, vol. 16, pp. 373-390, 2002.
[21] S. Pöyhönen, A. Arkkio, P. Jover, and H. Hyötyniemi, "Coupling pairwise support vector machines for fault classification," Control Engineering Practice, vol. 13, pp. 759-769, 2005.
[22] J. Sun, G. Hong, M. Rahman, and Y. Wong, "The application of nonstandard support vector machine in tool condition monitoring system," in Electronic Design, Test and Applications, 2004. DELTA 2004. Second IEEE International Workshop on, 2004, pp. 295-300.
[23] J. Sun, M. Rahman, Y. Wong, and G. Hong, "Multiclassification of tool wear with support vector machine by manufacturing loss consideration," International Journal of Machine Tools and Manufacture, vol. 44, pp. 1179-1187, 2004.
[24] A. Y. D. M. J. Tax, and R. P. W. Duin, "Pump failure determination using support vector data description," Lecture Notes in Computer Science, pp. 415-425, 1999.
[25] F. He and W. Shi, "WPT-SVMs based approach for fault detection of valves in reciprocating pumps," in American Control Conference, 2002. Proceedings of the 2002, 2002, pp. 4566-4570.
[26] S. Marsland, "Novelty detection in learning systems," Neural computing surveys, vol. 3, pp. 157-195, 2003.
[27] K.-R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, "An introduction to kernel-based learning algorithms," Neural Networks, IEEE Transactions on, vol. 12, pp. 181-201, 2001.
[28] S. Kumar and E. H. Spafford, "An application of pattern matching in intrusion detection," 1994.
[29] J. Lee, H. Qiu, G. Yu, and J. Lin, "Bearing data set. IMS, University of Cincinnati, NASA Ames Prognostics Data Repository, Rexnord Technical Services," 2007.
[30] H. Qiu, J. Lee, J. Lin, and G. Yu, "Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics," Journal of Sound and Vibration, pp. 1066-1090, 2006.
[31] D. Fernández-Francos, D. Martínez-Rego, O. Fontenla-Romero, and A. Alonso-Betanzos, "Automatic Bearing Fault Diagnosis Based on One-Class ν-SVM," Computers & Industrial Engineering, 2012.

Highlights:

1. A data mining approach called Anomaly Detection was presented for fault detection.
2. Two features, kurtosis and Non-Gaussianity Score (NGS), are extracted from raw data.
3. Both anomaly detection and SVM techniques are applied to these features.
4. Results show AD has the ability to detect incipient faults sooner than the SVM.
5. AD has higher accuracy and sensitivity than the SVM in fault diagnosis.