energies
Article

Anomaly Detection in Gas Turbine Fuel Systems Using a Sequential Symbolic Method

Fei Li 1, Hongzhi Wang 1,*, Guowen Zhou 2, Daren Yu 2, Jiangzhong Li 1 and Hong Gao 1

1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; [email protected] (F.L.); [email protected] (J.L.); [email protected] (H.G.)
2 School of Energy Science and Engineering, Harbin Institute of Technology, Harbin 150001, China; [email protected] (G.Z.); [email protected] (D.Y.)
* Correspondence: [email protected]; Tel.: +86-131-3675-2809

Academic Editor: Antonio Calvo Hernández
Received: 12 April 2017; Accepted: 14 May 2017; Published: 20 May 2017
Abstract: Anomaly detection plays a significant role in helping gas turbines run reliably and economically. Considering collective anomalous data and both the sensitivity and the robustness of the anomaly detection model, a sequential symbolic anomaly detection method is proposed and applied to the gas turbine fuel system. A structural finite state machine is used to evaluate the posterior probabilities of observing symbolic sequences and the most probable state sequences in which they may be located. Hence an estimation-based model and a decoding-based model are used to identify anomalies in two different ways. Experimental results indicate that both models have ideal overall performance, but the estimation-based model is more robust, whereas the decoding-based model is more accurate, particularly in a certain range of sequence lengths. Therefore, the proposed method can complement existing symbolic dynamic analysis-based anomaly detection methods well, especially in the gas turbine domain.

Keywords: gas turbine fuel system; anomaly detection; symbolic dynamic analysis; time series
1. Introduction

Gas turbine engines, which are among the most sophisticated devices, perform an essential role in industry. Detecting anomalies and eliminating faults during gas turbine maintenance is a great challenge, since the devices always run under variable operating conditions that can make anomalies seem normal. As a consequence, Engine Health Management (EHM) policies are implemented to help gas turbines run in a reliable, safe and efficient state, benefiting both operating economy and security levels [1,2]. In the EHM framework, many works have been devoted to anomaly detection and fault diagnosis in gas turbine engines. Since Urban [3] first became involved in EHM research, many techniques and methods have been subsequently proposed. Previous anomaly detection works mainly fall into two categories: model-based methods and data-driven methods. Model-based methods typically include linear gas path analysis [4–6], nonlinear gas path analysis [7,8], Kalman filters [9,10] and expert systems [11]. Data-driven methods typically include artificial neural networks [12,13], support vector machines [14,15], Bayesian approaches [16,17], genetic algorithms [18] and fuzzy reasoning [19–21]. Previous studies are mainly based on simulation data or continuous observation data. Simulation data are sometimes too simple to reflect actual operating conditions, as real data usually contain many interferences that make anomalous observations appear normal. This is particularly challenging for anomaly detection in gas turbines that operate under sophisticated and severe conditions. There are two possible routes for improving anomaly detection performance for gas turbines: the first is from the perspective of anomalous data and the second is from the perspective of the detection model.

Energies 2017, 10, 724; doi:10.3390/en10050724
On the one hand, anomalies occurring during gas turbine operation usually involve collective anomalies. A collective anomaly is defined as a collection of related data instances that is anomalous with respect to the entire data set [22]. Each single data point does not seem anomalous, but their occurrence together may be considered an anomaly. For example, when a combustion nozzle is damaged, one of the exhaust temperature sensors may record a consistently lower temperature than the other sensors. Collective anomalies have been explored for sequential data [23,24]. Collective anomaly detection has been widely applied in domains other than gas turbines, such as intrusion detection, commercial fraud detection, medical and health monitoring, etc. [22]. In industrial anomaly detection, some structural damage detection methods are applied using statistical models [25], parametric statistical modeling [26], mixtures of models [27], rule-based models [28] and neural-based models [29], which can sensitively detect highly complicated anomalies. However, in gas turbine anomaly detection, these methods share some common shortcomings. For example, data preprocessing steps such as feature selection and dimension reduction are highly complicated when confronting gas turbine observation data, which greatly undermines the operating performance. Furthermore, many interferences included in the data, such as ambient conditions, normal pattern changes and even sensor observation deviations, produce extraneous information for detecting anomalies, usually concealing essential factors that may be critically helpful for detection. For instance, a small degeneration in a gas turbine fuel system may be masked by normal flow changes, thus hindering anomaly detection of a device's early faults. Thus, some studies have focused on fuel system degeneration estimation using multi-objective optimization approaches [30,31], which are helpful for precise anomaly detection in fuel systems.
On the other hand, detection models for gas turbines require both sensitivity and robustness. Sensitivity ensures a higher detection rate and robustness ensures fewer misjudgements. The symbolic dynamic filtering (SDF) method [32] yielded good robustness in anomaly detection in comparison to other methods such as principal component analysis (PCA), ANN and the Bayesian approach [33], as well as suitable accuracy. Gupta et al. [34] and Sarkar et al. [35] presented SDF-based models for detecting faults of gas turbine subsystems and used them to estimate multiple component faults. Sarkar et al. [36] proposed an optimized feature extraction method under the SDF framework [37]. They then applied a symbolic dynamic analysis (SDA)-based method to fault detection in gas turbines. Next they proposed Markov-based analysis of transient data during takeoff rather than quasi-stationary steady-state data, and validated the method by simulation on the NASA Commercial Modular Aero Propulsion System Simulation (C-MAPSS) transient test-case generator. However, current SDF-based models usually adopt simulated data or data generated in laboratories, especially in the gas turbine domain. The performance of these methods remains unconfirmed on real data, for instance from long-operating gas turbine devices that contain many flaws and defects, for which sensors may not always be available for data acquisition. Considering that both solutions for improving the performance of gas turbine anomaly detection have their disadvantages, in this paper we combine the two strategies by building an SDA-based anomaly detection model and processing collective anomalous sequential data, in order to establish more sensitive and robust models and eliminate their intrinsic demerits. We then apply this method to anomaly detection for gas turbine fuel systems.
In this paper, the observation data from an offshore platform gas turbine engine are first partitioned into symbolic sequential data to construct an SDA-based model, a finite state machine, which reflects the texture of the system's operating tendency. Then two methods, an estimation-based model and a decoding-based model, are proposed on the basis of the sequential symbolic reasoning model. One method is more robust and the other is more sensitive. The two methods can be well integrated in different practical scenarios to eliminate irrelevant interferences when detecting anomalies. A comparison between collective anomaly detection, symbolic anomaly detection and our method is presented in Table 1.
Table 1. Comparison between different current methods and ours.

Collective Anomaly Detection
  Strategy: Sequential-based analysis
  Advantages: Detects complicated industrial anomalies; high sensitivity
  Disadvantages: High-dimension data need to be preprocessed; interferences remarkably influence detection

Symbolic Anomaly Detection
  Strategy: Symbolic and semantic-based analysis
  Advantages: High robustness; an easy operating vector representation in anomaly detection
  Disadvantages: Most are based on simulated data or experimental data, lacking complexity and credibility in actual scenarios

Our Method
  Strategy: Sequential and symbolic integrated analysis
  Advantages: Confronts real operating data, guaranteeing both sensitivity and robustness; high-dimension data are converted to simple individual symbol sequences; eliminates interferences
This paper is organized in six sections. In Section 2, preliminary mathematical theories on symbolic dynamic analysis are presented. Then, in Section 3, the data used in this study and the symbol partition are introduced. The finite state machine training and the two anomaly detection models are proposed in Section 4. Experimental results and a comparison between the two models from several perspectives are given in Section 5, and a discussion and conclusions are briefly presented in Section 6.

2. Preliminary Mathematical Theories

2.1. Discrete Markov Model

Consider a time series of states ω(t) of length T, indicated by ω^T = {ω(1), ω(2), ..., ω(T)}. A system can be in the same state at different times and it does not need to reach all possible states at once. A stochastic series of states is generated through Equation (1), which is named the transfer probability:

a_ij = P(ω_j(t + 1) | ω_i(t))   (1)

This represents the conditional probability that the system will transfer to state ω_j at the next time point when it is currently in state ω_i; a_ij is not related to the time. In a Markov model diagram, each discrete state ω_i is denoted as a node and the line which links two nodes is denoted as a transfer probability. A typical Markov model diagram is represented in Figure 1.
Figure 1. Discrete Markov model.
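The "consecutive multiply operation" described above can be sketched in a few lines of Python. The three-state transfer matrix and the starting distribution below are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical transfer matrix (not from the paper):
# A[i, j] = a_ij = P(omega_j(t+1) | omega_i(t)); each row sums to 1.
A = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

def sequence_probability(states, A, p0):
    """Probability of a specific state series: the initial probability
    times the product of the transfer probabilities along the series."""
    p = p0[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

p0 = np.array([1.0, 0.0, 0.0])  # assume the system starts in state 0
print(sequence_probability([0, 0, 1, 2], A, p0))  # 0.8 * 0.1 * 0.2 = 0.016
```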
The system is in state ω(t) at moment t, while the state at moment (t + 1) is a random function related to both the current state and the transfer probability. Therefore, the probability of a specific time series of states is the consecutive product of the transfer probabilities along the series.

2.2. Finite State Machine

Assume that at moment t, a system is in state ω(t) and meanwhile activates a visible symbol υ(t). A specific time series of states would activate a specific series of visible symbols: υ^T = {υ(1), υ(2), ..., υ(T)}. In this Markov model, the state ω(t) is invisible, but the activated visible symbol υ(t) can be observed. We define this model as a finite state machine (FSM), shown in Figure 2. The activation probability of a FSM is defined by Equation (2):
b_jk = P(υ_k(t) | ω_j(t))   (2)
In this equation, we can only observe the symbol υ(t). In Figure 2 it can be seen that the FSM has four invisible states linked with transfer probabilities, each of which can activate three different visible symbols. The FSM is strictly subject to causality, which means that the probability in the future depends conclusively on the current probability.
Figure 2. Finite state machine.
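A minimal simulation sketch of such a machine, with the shape described for Figure 2 (four invisible states, three visible symbols); all probability values below are made up for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical FSM: four hidden states, three visible symbols.
A = np.array([            # A[i, j] = a_ij, state transfer probabilities
    [0.90, 0.04, 0.03, 0.03],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.10, 0.10, 0.10, 0.70],
])
B = np.array([            # B[j, k] = b_jk = P(v_k(t) | omega_j(t))
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
])

def simulate_fsm(A, B, T, s0=0):
    """Walk the hidden chain for T steps; at each step the current state
    activates one visible symbol, which is all an observer ever sees."""
    state, symbols = s0, []
    for _ in range(T):
        symbols.append(int(rng.choice(B.shape[1], p=B[state])))
        state = int(rng.choice(A.shape[0], p=A[state]))
    return symbols

symbols = simulate_fsm(A, B, T=10)  # only the symbol sequence is observable
```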
2.3. Baum-Welch Algorithm

The Baum-Welch algorithm [38] is used in estimation. The transfer probabilities a_ij and activation probabilities b_jk are estimated from a pool of training samples. The normalization premise for a_ij and b_jk is that:

∑_j a_ij = 1, for all i;   ∑_k b_jk = 1, for all j   (3)
k We define a forward recursive equivalence in Equation (4), where αi (t) denotes the probability in state t which has generated t symbols of the sequence aij the is the transfer Weωdefine a forward recursive equivalence in Equation (4), where υi Tt before. denotes probability i at moment probability from state ωi (t − 1) to state ω j (t) and b jk is the probability of activated symbol υk in T in state i at moment t which has generated t symbols of the sequence υ before. a ij is the state ω j (t): ( t and b is the probability of activated symbol transfer probability from state i t 1 to state jk 1 jt=0 α j (t) = (4) k in state j t : ∑i αi (t − 1) aij b jk
We define a backward recursive equivalence in Equation (5) as well, where β_i(t) denotes the probability of being in state ω_i at moment t and generating the symbols of the sequence υ^T from moment t + 1 to
T. a_ij is the transfer probability from state ω_i(t) to state ω_j(t + 1) and b_jk is the probability of the activated symbol υ_k in state ω_j(t + 1):

β_i(t) = 1,  t = T
β_i(t) = ∑_j β_j(t + 1) a_ij b_jk,  otherwise   (5)

The parameters a_ij and b_jk in the above two equations remain unknown, so an expectation-maximization strategy can be used in the estimation. According to Equations (4) and (5), we define the transfer probability γ_ij(t) from state ω_i(t − 1) to state ω_j(t) under the condition of the observed symbol υ_k:

γ_ij(t) = α_i(t − 1) a_ij b_jk β_j(t) / P(υ^T | θ)   (6)
where P(υ^T | θ) is the probability of the symbol sequence υ^T being generated through any possible state sequence. The expected transfer probability through a sequence from state ω_i(t − 1) to state ω_j(t) is ∑_{t=1..T} γ_ij(t) and the total expected transfer probability from state ω_i(t − 1) to any state is ∑_{t=1..T} ∑_k γ_ik(t).

3. Data and Symbolization

In gas turbine monitoring, all operation data observed by sensors are continuous. No observation procedure can acquire several discrete operating conditions automatically. Thus, in this section we focus on a symbol extraction method that can discretely represent different load condition patterns. After symbolization, many irrelevant interferences are dismissed. In this paper, the data source is a SOLAR Titan 130 gas turbine. The parameters used are listed in Table 2 below and an overview of the structure of the turbine fuel system is shown in Figure 3.

Table 2. Monitoring sensors used in symbol extraction.

Sensor No.  Parameter                               Unit
92          Average gas exhaust temperature         °C
84          Average fuel supplement temperature     °C
81          Main gas supplement pressure            kPa
85          Main gas valve demand                   %
98          Average combustion inlet temperature    °C
148         Generator power                         kW
88          Guide vane position                     %
82          Main gas nozzle pressure                kPa
95          Main gas heat value                     kJ/kg
96          Main gas density                        kg/m³
112         Average compressor inlet temperature    °C
Figure 3. Overview of the SOLAR gas turbine fuel system.
Original data contain many uncertainties and ambient disturbances. For example, the average combustion inlet temperature varies between daytime and night, influenced by the atmospheric temperature, which fluctuates over several days. This tendency is shown in Figure 4. Besides, the operating parameters of the gas turbine also have uncertainty, even under the same operating pattern. Figure 5 shows the uncertainty of the average gas exhaust temperature. It suggests that in both scenarios 1 and 2, the average gas exhaust temperature has an uncertainty boundary around the center line. All these factors may influence the device operation and data processing. The disturbances and uncertainties are usually extraneous and redundant, which interferes with the anomaly detection. Therefore, one way to eliminate such interferences is data symbolization.
Figure 4. Disturbance of ambient temperature.
Figure 5. Uncertainty of gas exhaust temperature.
Many strategies for data symbolization or discretization have been proposed and were well discussed in previous papers [39–41]. In summary, two kinds of approaches, splitting and merging, can be used in data symbolization. A feature can be discretized either by splitting the interval of continuous values or by merging adjacent intervals [36]. It is very difficult to apply the splitting method in our study since we cannot properly preset the intervals. Therefore, a simple merging-based strategy is adopted for data symbolization. We apply a clustering method, the K-means (KM) cluster method, to symbol extraction. The main reason is that in a FSM the numbers of hidden states and visible symbols are both finite, and KM starts with a specific cluster boundary and an expected number of clusters. Samples in a cluster are regarded as the same symbol. There are seven clusters that correspond to seven different load conditions, as shown in Table 3, so K is equal to 7 for the KM model.
Table 3. Correspondence list between clusters and labels.

Cluster No.  Label                     Symbol
#0           Flow rise in low load     FRLL
#1           Steady in high load       SHL
#2           Steady in low load        SLL
#3           Fast changing condition   FCC
#4           Flow rise in high load    FRHL
#5           Flow drop in high load    FDHL
#6           Flow drop in low load     FDLL
The purpose of the KM method is to divide the original samples into k clusters, which ensures high similarity of the samples within each cluster and low similarity among different clusters. The main procedures of this method are as follows:

(1) Choose k samples in the dataset as the initial centers of the clusters, and then evaluate the distances between the remaining samples and every cluster center. Each sample is assigned to the cluster to which it is closest.
(2) Renew each cluster center from the nearby samples and reevaluate the distances. Repeat this process until the cluster centers converge.

Generally, distance evaluation uses the Euclidean distance and the convergence judgement is the square error criterion, as shown in Equation (7):

E = ∑_{i=1..k} ∑_{p∈C_i} |p − m_i|²   (7)
where E is the sum error over all samples, p is the position of a sample and m_i is the center of cluster C_i. Iteration terminates when E becomes stable, i.e., its change is less than 1 × 10⁻⁶. Thereby the final clusters convert the samples from continuous variables into discrete visible symbols. In this study, altogether 79,580 samples are divided into seven symbols. The sampling interval is 10 min and the symbol extraction results are shown in Figure 6. The most frequent symbols are Steady in high load (SHL) and Steady in low load (SLL). The intermediate ones are Flow rise in low load (FRLL), Flow rise in high load (FRHL), Flow drop in low load (FDLL) and Flow drop in high load (FDHL). The least frequent symbol is Fast changing condition (FCC). There are 3327 samples labeled as anomalies and the distribution of the abnormal samples is shown in Figure 7. All of the abnormal labels are derived manually from the device operation logs and maintenance records. To simplify our modeling, we classify all kinds of anomalies or defects in the records into one class: anomaly. After symbolization, we use the symbols generated by the KM cluster method to construct a FSM to measure whether a series of symbols is in a normal condition or an abnormal one.
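A minimal sketch of the merging-based symbolization: Lloyd's K-means over toy two-dimensional "sensor" vectors. The paper clusters 11-dimensional observations with K = 7; the data and K below are stand-ins:

```python
import numpy as np

def kmeans_symbolize(X, k, iters=100, tol=1e-6, seed=0):
    """Lloyd's K-means. Each sample's cluster index becomes its discrete
    symbol; iteration stops when the centers move by less than tol."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.abs(new_centers - centers).max() < tol:  # convergence check
            break
        centers = new_centers
    return labels, centers

# Three well-separated toy clusters standing in for load-condition patterns.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(50, 2)) for m in (0.0, 1.0, 2.0)])
labels, centers = kmeans_symbolize(X, k=3)  # cluster labels act as symbols
```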
Figure 6. Distribution of total symbol data.
Figure 7. Distribution of anomaly data.
4. FSM Modeling and Anomaly Detection Methods

The finite state machine, widely applied in symbolic dynamic analysis, has shown great superiority in comparison to other techniques [30,31]. Therefore, the main tasks of establishing a sequential symbolic model for gas turbine fuel system anomaly detection are to build a FSM to estimate posterior probabilities of sequences and to build detection models to detect the abnormality of sequences. Before this, however, the discrete symbols extracted in Section 3 need to be sequenced into series (time series segments) of length T.

For FSM modeling, one aspect of this work is to determine several parameters. There are five states defined in this model: normal state (NS), anomaly state (AS), turbine startup (ST+), turbine shutdown (ST-) and halt state (HS). Actually, anomaly is a hidden state that is usually invisible during operation compared to the other four states. Till now, we have defined the model structure of the FSM, which contains five states and seven visible symbols. At any moment, there are seven possible symbols in a state that could be observed with probability b_jk; a_ij is the probability that a state transfers from one to another. When the operating pattern changes, reflected by symbols or hidden states at a structural level, the device might be experiencing an anomaly that is easier to observe. This thus satisfies the basic idea of anomaly detection: finding patterns in data that do not conform to expected behavior.

Another concern about FSM modeling is how long the time series should be, which is efficiently involved in the transfer and activation probability estimation. Figure 8 shows that different lengths T of segments may lead to different classification label results. A segment is defined as an anomaly only if it contains at least one abnormal sample. This classification suggests that the time series for T = 5 and T = 10 are significantly different.
Figure 8. Time series generation with different lengths. Mark (+) denotes the normal series and mark (−) denotes the abnormal series.
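The segmentation and labeling rule above can be sketched directly; the symbol stream and anomaly flags below are toy values:

```python
def make_segments(symbols, anomaly_flags, T):
    """Slide a length-T window over the symbol stream. A segment is labeled
    '-' (anomalous) if it contains at least one abnormal sample, else '+'.
    N samples yield N - T + 1 segments (so 79,580 samples -> 79,581 - T)."""
    segments = []
    for i in range(len(symbols) - T + 1):
        label = '-' if any(anomaly_flags[i:i + T]) else '+'
        segments.append((symbols[i:i + T], label))
    return segments

symbols = ['SHL', 'SHL', 'FDHL', 'SLL', 'SLL',
           'FRLL', 'SHL', 'SHL', 'SHL', 'SHL']
flags   = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # only sample 3 is abnormal
segs = make_segments(symbols, flags, T=5)
print(len(segs), sum(1 for _, lab in segs if lab == '-'))  # 6 segments, 4 anomalous
```

Every window that overlaps the single abnormal sample is labeled anomalous, which is why small changes in T noticeably change the label distribution.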
It is difficult to define a proper T a priori that can be applied well in both parameter estimation and anomaly detection, so instead we optimize the length T recursively until it achieves the best performance in the anomaly detection section. We declare that in the actual data preprocessing, time series are generated by a sliding window with length T in order to ensure that the size of the time series set matches the original dataset, which means the total number of time series is 79581 − T.

For anomaly detection, two strategies are used: an estimation-based model and a decoding-based model. In the estimation-based model, we use the FSM to calculate the probability of a symbol sequence. In this case, the FSM is built from the training data precluding abnormal samples, so the abnormal sequences contained in the testing data will have very low probabilities under this calculation. In the decoding-based model, we use the FSM to decode the most probable state sequence that generates an observed symbol sequence. If the estimated state sequence contains anomaly states (AS), the symbol sequence is judged to be an anomaly.

According to the aforementioned contents, the schematic of FSM modeling is shown in Figure 9. First, the data are sequenced into a pool of time series by an initial sliding window with length T. Then the data are divided into two parts: training data and testing data. The training data are used to construct a FSM, i.e., to estimate the transfer and activation probabilities of the unknown states and visible symbols. The testing data are used to evaluate the FSM performance. After modeling the FSM, performance is tested through the two anomaly detection strategies, the decoding-based model and the estimation-based model. The length T is updated until the models and the FSM achieve the best performance.
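The sliding-window preprocessing described above can be sketched as follows. This is a toy illustration, not the authors' code: the symbol stream and its length are hypothetical, and only the stride-1 counting rule (N − T windows from N samples, matching 79581 − T in the text) is taken from the paper.

```python
import numpy as np

def sliding_windows(symbols, T):
    """Cut a long symbol stream into overlapping windows of length T.

    With N samples and stride 1 this yields N - T windows, matching the
    paper's count of 79581 - T sequences for its 79581-sample dataset.
    """
    return [symbols[i:i + T] for i in range(len(symbols) - T)]

stream = np.random.randint(0, 7, size=200)  # hypothetical stream over 7 symbols
windows = sliding_windows(stream, T=10)
print(len(windows))  # 190
```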
Figure 9. Schematic of FSM modeling procedures.
4.1. Training a FSM

The main task of training a FSM is to estimate a group of transfer probabilities a_ij and activation probabilities b_jk from a pool of training samples. In this study, the Baum-Welch algorithm [41] is used in the estimation. The probability γ_ij(t) is deduced in Equation (6). Therefore, an estimated transfer probability â_ij is described as:

\hat{a}_{ij} = \frac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{t=1}^{T} \sum_{k} \gamma_{ik}(t)} \quad (8)
Similarly, the estimation of the activation probability b̂_jk is described as:

\hat{b}_{jk} = \frac{\sum_{t=1}^{T} \gamma_{jk}(t)}{\sum_{t=1}^{T} \sum_{l} \gamma_{jl}(t)}, \quad (9)
where the expected activation probability of υ_k(t) on state ω_j(t) is \sum_{t=1}^{T} \gamma_{jk}(t), the total expected activation probability on state ω_j(t) is \sum_{t=1}^{T} \sum_{l} \gamma_{jl}(t), and l indexes the symbols. According to the analysis above, a_ij and b_jk can be gradually approximated by â_ij and b̂_jk through Equations (8) and (9) until convergence. The pseudocode of the estimation algorithm is shown in Algorithm 1 below. The initial a_ij, b_jk are generated randomly. We then evaluate â_ij, b̂_jk using the a_ij, b_jk estimated in the former generation, and repeat this process until the residual between â_ij, b̂_jk and a_ij, b_jk is less than a threshold ε; the optimized a_ij, b_jk are then used in the FSM.
Algorithm 1. Procedures of a_ij, b_jk estimation.
Input: Initial parameters a_ij(0), b_jk(0), training set υ^T, convergence threshold ε, z ← 0
Output: Final estimated FSM parameters â_ij, b̂_jk
1  Loop z ← z + 1
2    Estimate â_ij(z) by a(z − 1), b(z − 1) in Equation (8)
3    Estimate b̂_jk(z) by a(z − 1), b(z − 1) in Equation (9)
4    a_ij(z) ← â_ij(z)
5    b_jk(z) ← b̂_jk(z)
6  Until max_{i,j,k} [ |a_ij(z) − a_ij(z − 1)|, |b_jk(z) − b_jk(z − 1)| ] < ε
7  Return a_ij ← a_ij(z), b_jk ← b_jk(z)
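The loop in Algorithm 1 can be sketched as a standard Baum-Welch iteration. This is a minimal single-sequence sketch, not the authors' implementation: it assumes a uniform initial state distribution, uses scaled forward-backward passes to avoid underflow, and omits numerical guards; all sizes and the training stream below are hypothetical.

```python
import numpy as np

def baum_welch(obs, c, m, eps=1e-4, max_iter=200, seed=0):
    """Iterate Equations (8) and (9) until the largest change in any
    a[i,j] or b[j,k] falls below eps (Algorithm 1's stopping rule)."""
    obs = np.asarray(obs)
    T = len(obs)
    rng = np.random.default_rng(seed)
    a = rng.random((c, c)); a /= a.sum(axis=1, keepdims=True)   # transfer probs a_ij
    b = rng.random((c, m)); b /= b.sum(axis=1, keepdims=True)   # activation probs b_jk
    pi = np.full(c, 1.0 / c)                                    # assumed uniform start
    for _ in range(max_iter):
        # scaled forward pass
        alpha = np.zeros((T, c)); scale = np.zeros(T)
        alpha[0] = pi * b[:, obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
        # scaled backward pass
        beta = np.zeros((T, c)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
        # expected transitions (xi) and state occupancies (gamma)
        xi = np.zeros((c, c))
        for t in range(T - 1):
            xi += alpha[t][:, None] * a * (b[:, obs[t + 1]] * beta[t + 1])[None, :] / scale[t + 1]
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        # re-estimation, Equations (8) and (9)
        a_new = xi / xi.sum(axis=1, keepdims=True)
        b_new = np.vstack([gamma[obs == k].sum(axis=0) for k in range(m)]).T
        b_new /= b_new.sum(axis=1, keepdims=True)
        delta = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if delta < eps:
            break
    return a, b

obs = np.array([0, 1, 2, 1, 0, 2, 2, 1, 0] * 8)  # hypothetical symbol stream
a_hat, b_hat = baum_welch(obs, c=2, m=3)
print(np.round(a_hat, 3))
```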
4.2. Anomaly Detection Based on the Estimation Strategy

An estimation strategy is used for anomaly detection with the FSM in this part, inspired by general anomaly detection approaches. Anomaly detection is defined as finding anomalous patterns in which data do not satisfy expected behavior [22]. The FSM estimation strategy calculates the posterior probabilities of the symbol sequences generated by the FSM, and it can detect anomalies efficiently if the FSM has been built from normal data. Therefore, this strategy conforms to the basic idea of anomaly detection: the FSM is utilized to establish an expected pattern, and the estimation process involves finding the nonconforming sequences, in other words, the anomalies.

Figure 10 illustrates the schematic of anomaly detection using the estimation strategy. In order to establish an expected normal pattern, the training data used in FSM modeling are utterly normal sequences. Parameters a_ij, b_jk are estimated by FSM training, so the intrinsic normal-pattern recognition capability is contained in these parameters; hence, the model estimates probabilities of testing symbol sequences, among which abnormal sequences are included. Probabilities of normal sequences will be much higher than those of abnormal ones, so the detection indicator is actually a preset threshold that judges the patterns of test sequences. The probability of a symbol sequence generated by a FSM can be described as:

P(\upsilon^T) = \sum_{r=1}^{r_{\max}} P(\upsilon^T \mid \omega_r^T)\, P(\omega_r^T), \quad (10)
where r is the index of a state sequence of length T, ω_r^T = {ω(1), ω(2), ..., ω(T)}. If there are, for instance, c different states in this model, the total number of possible state sequences is r_max = c^T. An enormous number of possible state sequences need to be considered to calculate the probability of
a generated symbol sequence υ^T, as shown in Equation (10). The second part of the equation can be described as:

P(\omega_r^T) = \prod_{t=1}^{T} P(\omega(t) \mid \omega(t-1)) \quad (11)
Figure 10. Schematic of anomaly detection using the estimation strategy.
The probability P(ω_r^T) is a continuous, chronological multiplication. Assume that the activation probability of a symbol depends critically on the current state, so that the probability P(υ^T | ω_r^T) can be described as in Equation (12); Equation (13) is then another description of Equation (10):

P(\upsilon^T \mid \omega_r^T) = \prod_{t=1}^{T} P(\upsilon(t) \mid \omega(t)), \quad (12)

P(\upsilon^T) = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} P(\upsilon(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1)). \quad (13)
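As a concreteness check, Equation (13) can be evaluated literally on a toy model by enumerating all c^T state sequences. The transfer/activation values below are purely hypothetical; the point is only the structure of the sum-over-paths and why it becomes infeasible as T grows.

```python
import itertools
import numpy as np

# hypothetical FSM parameters (2 states, 2 symbols), not the paper's trained values
a  = np.array([[0.7, 0.3], [0.4, 0.6]])   # transfer probabilities a_ij
b  = np.array([[0.9, 0.1], [0.2, 0.8]])   # activation probabilities b_jk
pi = np.array([0.5, 0.5])                 # initial state distribution

def prob_brute_force(obs):
    """Equation (13) evaluated literally: sum the path probability over all
    c^T possible state sequences. Feasible only for tiny T."""
    c, T = a.shape[0], len(obs)
    total = 0.0
    for states in itertools.product(range(c), repeat=T):
        p = pi[states[0]] * b[states[0], obs[0]]
        for t in range(1, T):
            p *= a[states[t - 1], states[t]] * b[states[t], obs[t]]
        total += p
    return total

print(prob_brute_force([0, 1]))  # ≈ 0.1915
```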
However, the computing cost of the above equation is O(c^T T), which is too high for this process to finish. There are two alternative methods that can dramatically simplify the computation, a forward algorithm and a backward algorithm, which are depicted by Equations (5) and (6), respectively. The computing cost of these algorithms is both O(c^2 T), that is, c^{T−2} times faster than the original strategy. Algorithm 2 shows the detection indicator based on the forward algorithm. Initialize a_ij, b_jk and the training sequences υ(t) precluding anomalies, with α_j(0) = 1 and t = 0; then update α_j(t) until t = T, so the probability of υ(t) is α_j(T). If the probability is higher than the preset threshold, the symbol sequence is classified in the positive, normal class; otherwise it is classified in the negative, anomalous class.

Algorithm 2. Anomaly detection based on the estimation strategy.
Input: t ← 0, a_ij, b_jk, sequence υ(t), α_j(0) = 1, threshold θ
Output: Classification result Class(υ(t))
1  For t ← t + 1
2    α_j(t) ← b_jk(υ(t)) Σ_{i=1}^{c} α_i(t − 1) a_ij
3  Until t = T
4  Return P(υ(t)) ← α_j(T)
5  If α_j(T) > θ
6    Class(υ(t)) ← Positive
7  Else Class(υ(t)) ← Negative
8  Return Class(υ(t))
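Algorithm 2's forward recursion and threshold test can be sketched as follows. The parameter values are hypothetical placeholders, not the paper's trained FSM; only the recursion and the θ-comparison come from the text.

```python
import numpy as np

# hypothetical FSM parameters (illustrative only)
a  = np.array([[0.7, 0.3], [0.4, 0.6]])   # transfer probabilities a_ij
b  = np.array([[0.9, 0.1], [0.2, 0.8]])   # activation probabilities b_jk
pi = np.array([0.5, 0.5])                 # initial state distribution

def forward_prob(obs, a, b, pi):
    """Forward algorithm: P(obs) in O(c^2 T) rather than the naive O(c^T T)."""
    alpha = pi * b[:, obs[0]]
    for o in obs[1:]:
        # alpha_j(t) = b_jk(o) * sum_i alpha_i(t-1) * a_ij
        alpha = (alpha @ a) * b[:, o]
    return alpha.sum()

def classify(obs, theta):
    """Algorithm 2: normal (Positive) iff the sequence probability exceeds theta."""
    return "Positive" if forward_prob(obs, a, b, pi) > theta else "Negative"

print(forward_prob([0, 1], a, b, pi))  # ≈ 0.1915
```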
There are two points to be mentioned. In the first place, the decision judgement by threshold θ is very simple to use for anomaly detection. This convenience is ascribed to the normal pattern constructed by the FSM: the probabilities estimated from the FSM intuitively simulate the possibilities of the symbol sequences emerging in a real system. Furthermore, the performance of this strategy depends not only on the efficiency of the trained FSM, but also on a proper threshold. Therefore, in the modeling process, we optimize the threshold by traversing different values in order to get the highest overall accuracy on normal and abnormal sequences.

4.3. Anomaly Detection Based on the Decoding Strategy

Compared to the estimation strategy, a FSM can also detect anomalies if it can recognize whether there are hidden anomalous states in sequences. Fortunately, decoding a state sequence is available in a FSM, so another way to detect anomalies is to decode a symbol sequence into a state sequence and then check whether the state sequence contains abnormal states. Unlike the previous estimation strategy, the decoding strategy is essentially an optimization algorithm in which a searching function is used.

Figure 11 illustrates the schematic of anomaly detection using the decoding strategy. In this strategy, the FSM is trained by all sequences, containing both normal and abnormal ones, so the FSM reflects a system that may run in normal or abnormal patterns. After that, the decoding process involves searching for the most probable state sequence that the symbol sequence corresponds to. A greedy-based method is applied in searching, so in each step the most possible state is chosen and added to the path; the final path is the decoded state sequence. Finally, the model judges a symbol sequence by its state sequence, on whether it contains anomaly states.
Figure 11. Schematic of anomaly detection using the decoding strategy.
Algorithm 3 shows the procedures of the decoding-based anomaly detection strategy. Initialize the parameters a_ij, b_jk, the testing sequence υ(t), α_j(0) = 1, and the path. Then, update α_j(t) and, in each moment t, traverse all state candidates; the most possible state in this moment is the one that makes α_j(t) biggest, and this state is added to the path until the end of the sequence. After that, scan the decoded state sequence: if there is at least one anomaly state (AS), the observed sequence is classified in the negative, anomalous class, otherwise it is classified in the positive, normal class.

Compared to the estimation strategy, the decoding strategy has some advantages and disadvantages. First of all, the detection indicator based on the decoding strategy may improve the sensitivity of the anomaly detection system, meaning that it can precisely warn of an anomaly once it occurs. Besides, it helps the system locate the anomaly emerging time in high resolution. For example, a sequence contains 10 samples and the sample interval is 10 min, so the length of the sequence is 100 min. If the sequence is abnormal and the two systems, the estimation-based model and the decoding-based model, both alert, the latter can provide a more precise anomaly occurrence time, for instance, between 20 min and 60 min, but the former can only provide a probability that an anomaly may have occurred. However, this point may also be a disadvantage of the decoding strategy due to its lack of robustness.
This detection strategy is a local optimization algorithm that may reach a local minimum that is not actually the globally optimized solution; it may have searched an utterly different state sequence than the real one. Furthermore, the error rate accumulates with the growth of the searching path, particularly for a longer length T, and the false positive rate may then be very high, meaning that many normal sequences are classified as anomalies. The length of the sequence is therefore a critical factor for detection performance, and it indicates the robustness of the models. This issue will be analyzed in the next section.

Algorithm 3. Anomaly detection based on the decoding strategy.
Input: t ← 0, a_ij, b_jk, υ(t), α_j(0) = 1, path ← {}
Output: Classification result Class(υ(t))
1  For t ← t + 1
2    j ← 1
3    For j ← j + 1
4      α_j(t) ← b_jk(υ(t)) Σ_{i=1}^{c} α_i(t − 1) a_ij
5    Until j = c
6    j0 ← argmax_j α_j(t)
7    Add ω_{j0} to path
8  Until t = T
9  If path contains anomaly state (AS)
10   Class(υ(t)) ← Negative
11 Else Class(υ(t)) ← Positive
12 Return Class(υ(t))
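Algorithm 3's greedy decoding can be sketched as follows. The FSM parameters are hypothetical, and treating state index 1 as the anomaly state (AS) is an assumption made purely for illustration; note this stepwise argmax is a local search and, unlike Viterbi, may miss the globally most probable path.

```python
import numpy as np

# hypothetical FSM parameters; state 1 plays the role of the anomaly state (AS)
a  = np.array([[0.7, 0.3], [0.4, 0.6]])
b  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def greedy_decode(obs, anomaly_states):
    """At each step t, update alpha_j(t) as in the forward pass and greedily
    keep argmax_j alpha_j(t) as the decoded state; flag the sequence as
    anomalous (Negative) if any decoded state is an anomaly state."""
    alpha = pi * b[:, obs[0]]
    path = [int(np.argmax(alpha))]
    for o in obs[1:]:
        alpha = (alpha @ a) * b[:, o]
        path.append(int(np.argmax(alpha)))
    label = "Negative" if any(s in anomaly_states for s in path) else "Positive"
    return path, label

path, label = greedy_decode([0, 1, 1], anomaly_states={1})
print(path, label)  # [0, 1, 1] Negative
```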
5. Experimental Design and Results Analysis

5.1. Performance Evaluation

The data used in this section are from a SOLAR Titan 130 gas turbine power generator on an offshore oil platform. The data list and symbol distribution are shown in Table 2 and Figure 6. Overall, (79581 − T) sequences are used in training and testing by cross validation. The dataset was divided equally and sequentially into ten folds. Each fold was used in turn as testing data, with the other nine as training data, until every fold had served as the test set. Hence the final result consists of a performance average together with a standard deviation. The labels of the samples are grouped into positive (normal) and negative (anomaly). The detection performance is measured by the true positive rate (TP rate) and the true negative rate (TN rate) [42]. Table 4 shows the definition of the confusion matrix, which measures the four possible outcomes.

Table 4. Definition of the confusion matrix.

Actual Status    Detecting Positive      Detecting Negative
Positive         True Positive (TP)      False Negative (FN)
Negative         False Positive (FP)     True Negative (TN)
The TP rate is defined as the ratio of the number of samples correctly classified as positive to the number of samples that are actually positive:

\mathrm{TP\ rate} = \frac{TP}{TP + FN}. \quad (14)
The TN rate represents the ratio of the number of samples correctly classified as negative to the number of actual negative samples:

\mathrm{TN\ rate} = \frac{TN}{TN + FP}. \quad (15)

In this paper, anomaly detection on a gas turbine fuel system is a class-imbalanced problem in that the normal class is much larger than the abnormal class. The performance on imbalanced data can be measured by AUC, which is the area under the Receiver Operating Characteristic (ROC) curve, as shown in Figure 12. The ROC curve is a plot of the FP rate on the X axis versus the TP rate on the Y axis. It shows the differences between the FP rate and the TP rate based on different rules.
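Equations (14) and (15) amount to two one-line ratios over the confusion-matrix counts; a worked example with hypothetical counts:

```python
def rates(tp, fn, fp, tn):
    """TP rate (Eq. 14) and TN rate (Eq. 15) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

# hypothetical counts: 100 actual positives, 100 actual negatives
tp_rate, tn_rate = rates(tp=95, fn=5, fp=10, tn=90)
print(tp_rate, tn_rate)  # 0.95 0.9
```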
Figure 12. Schematic diagram of Area Under ROC Curve (AUC). The surface of the area is the Receiver Operating Characteristic curve and the area of the shadow is the AUC.
5.2. Exemplary Cases and Experimental Results

To illustrate how the detection system works, two exemplary cases are provided. One is a normal sequence. The other is an abnormal sequence belonging to a fuel nozzle valve anomaly, which may cause the fuel flow to fluctuate or drop and in turn cause a load drop. The selected sequences, after normalization, are of length T = 10, with a 10 min sampling interval, for a total of 100 min. The anomalous sequence is shown in Figure 13 as the red sequence, for the original parameters 'Main gas valve demand' and 'Power'. The two sequences, partitioned into symbols, are described in Figure 14. The routine map consists of seven different symbols and 10 time points. The black row is the normal sequence and the red dashed line is the anomalous sequence. Besides, the labeled state sequences of the normal and anomalous cases are {NS, NS, NS, ST-, ST-, NS, ST-, ST-, NS, ST-} and {NS, NS, NS, NS, NS, AS, AS, AS, NS, NS}, respectively.
Figure 13. An exemplary anomaly sequence.
Figure 14. Comparison between a normal sequence and an anomaly sequence in symbolic description.
The results are generated by the trained FSM and the two detection models, the estimation- and decoding-based models. The threshold θ in this model is 0.00743, and the posterior probabilities of the normal and anomalous sequences determined by the estimation-based model are 0.01563 and 0.0009235, respectively. The decoded most probable state sequences calculated by the decoding-based model are {NS, NS, NS, ST-, ST-, ST-, NS, ST-, NS} and {NS, NS, NS, NS, NS, AS, AS, AS, AS, NS}. The anomalous sequence contains AS, yet the normal one does not; as a result, both models have made correct decisions. The performance of the two models in each data group within the cross-validation is given in Table 5. It can be seen from the table that the overall performance of the estimation-based model is better than that of the decoding-based model. Specifically, the estimation-based model's evaluation is better than the other model's in eight groups for TP rate, six groups for TN rate, and eight groups for AUC. Similarly, Tables 6 and 7 show the evaluation of the confusion matrices for the two models. The results illustrate that the estimation-based model outperforms the decoding-based model in overall accuracy as well as deviation. Therefore, it can be concluded that, for T = 10 and θ = 0.00743, the estimation-based model can resolve anomaly detection in gas turbine fuel systems more efficiently. However, as analyzed earlier, the threshold θ and the length of the sequence can drastically influence detection performance. As a result, several impacts need to be further discussed.
Table 5. Performance of the two models in each data group.

Model          Estimation-Based Model           Decoding-Based Model
Measurement    TP Rate   TN Rate   AUC          TP Rate   TN Rate   AUC
1              0.9577    0.8977    0.9277       0.9478    0.9009    0.9244
2              0.9692    0.9052    0.9372       0.954     0.8938    0.9239
3              0.9549    0.906     0.9305       0.9393    0.9032    0.9213
4              0.9617    0.9111    0.9364       0.9336    0.8931    0.9134
5              0.9679    0.8986    0.9333       0.9334    0.9052    0.9193
6              0.9549    0.9043    0.9296       0.9567    0.9089    0.9328
7              0.9574    0.9039    0.9307       0.9445    0.9188    0.9317
8              0.9488    0.9101    0.9295       0.9444    0.8976    0.921
9              0.9699    0.908     0.939        0.9587    0.8861    0.9224
10             0.9442    0.9164    0.9303       0.9487    0.8905    0.9196
Mean value     0.9587    0.9061    0.9324       0.9461    0.8998    0.9229

Table 6. Performance of the estimation-based model.

Actual Status    Detecting Normal    Detecting Anomaly
Normal           0.9587 ± 0.0086     0.0413 ± 0.0086
Anomaly          0.0939 ± 0.0056     0.9061 ± 0.0056
Table 7. Performance of the decoding-based model.

Actual Status    Detecting Normal    Detecting Anomaly
Normal           0.9461 ± 0.0089     0.0539 ± 0.0089
Anomaly          0.1011 ± 0.0097     0.8989 ± 0.0097
5.3. Threshold Determination Strategy
Aside from training a FSM, a core problem of building an estimation-based detection model is to determine a proper threshold θ. For a particular sequence length, we can find a most suitable value that classifies the testing sequences well. Actually, different thresholds yield different classification results, as shown in Figure 15, which is a ROC curve of the estimation-based model with different thresholds. The optimized threshold is one of the points on the curve.

A sequence is judged to be anomalous when its posterior probability is less than θ. When θ = 0, the TN rate, i.e., the proportion of correctly detected anomalous sequences among all anomalies, is 0 initially. The TN rate then increases with the growth of θ, reaching 1 once θ passes a certain point. On the contrary, the TP rate is 1 initially when θ = 0, since all the posterior probabilities are over 0, and then starts to decrease with the growth of θ. This regularity is illustrated in Figure 16. One concern is that both the TP rate and TN rate are important for the model performance, so the most suitable threshold should be found at a synthesized highest place, measured by the average accuracy, that is, the mean value of the TP rate and TN rate. In this experiment, we searched θ in steps of 0.0001 until it reached the peak average accuracy at θ = 0.0074.
Figure 15. ROC curve of the estimation-based model with different thresholds.
Figure 16. Performance change with increasing threshold.
The optimized threshold will drop along with the change of length T because the posterior probabilities are continuously multiplying; the probabilities drop exponentially as T increases. Figure 17 shows the optimized thresholds for different sequence lengths.
Figure 17. Optimized thresholds for different lengths of the sequence.
Figure 17(1) is depicted in Cartesian coordinates and Figure 17(2) is depicted in logarithmic Figure 17(1) is depicted in Cartesian coordinates and Figure 17(2) is depicted in logarithmic coordinates in the Y axis. It can be clearly seen that the has an exponential tendency. coordinates in the Y axis. It can be clearly seen that the θ has an exponential tendency. 5.4. Comparison between the Two Models on Different Length of Sequence 5.4. Comparison between the Two Models on Different Length of Sequence Another main factor that influences the performance of the detection models with the length of Another main factor that influences the performance of the detection models with the length of sequence. The robustness performance of the two models are measured by AUC for different sequence. The robustness performance of the two models are measured by AUC for different sequence sequence lengths. A model with high robustness performance will have a relatively stable AUC with lengths. A model with high robustness performance will have a relatively stable AUC with the growth the growth of length T, and vice versa. of length T, and vice versa. Figures 18 and 19 illustrate the comparison between the estimation-based model and decodingFigures 18 and 19 illustrate the comparison between the estimation-based model and based model in TN rate and TP rate for different lengths of sequence, respectively. With the growth decoding-based model in TN rate and TP rate for different lengths of sequence, respectively. With the of length, the TN rate and TP rate of estimation-based model gradually decrease and then become growth of length, the TN rate and TP rate of estimation-based model gradually decrease and then stable after about T > 60, steadying in 0.82 in TN rate and 0.89 in TP rate. However, the performance become stable after about T > 60, steadying in 0.82 in TN rate and 0.89 in TP rate. 
However, the performance of the decoding-based model differs from that of the estimation-based model. The TN rate of the decoding-based model rises to nearly 1 when the length T > 30, whereas the TP rate decreases drastically when T > 20. The large difference between the two models can be ascribed to several reasons. First, the detection mechanisms of the two models are not the same. The estimation-based model is built upon an FSM constructed from the normal pattern only, while the decoding-based model is built upon an FSM trained on both normal and anomalous sequences. In other words, the estimation-based model concerns only data without anomalies, while the decoding-based model concerns all the data regardless of class. Second, the detection indicator used in the estimation-based model is posterior probabilities, whereas the detection indicator used in the decoding-based model is states.
As the length T grows, the sequences become longer and more complicated to classify using a threshold because there are more single symbols in a sequence. As a result, the performance of the estimation-based model decreases and eventually becomes steady thanks to a suitable threshold and the classification capability of the FSM. The decoding-based model behaves in a more complicated way. As the length T grows, the sequences become longer, but it is easier to find a possible abnormal state. A sequence is judged to be anomalous only if there is at least one abnormal state. The longer the sequence is, the more probable it is that abnormal states will hide, so the TN rate rises as the length grows. On the contrary, it is easy to understand that long, complicated sequences would lead to incorrect judgements by the model: the more states there are to be decoded, the higher the possibility of misjudgement, so the TP rate decreases as T grows.
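To make the two decision rules concrete, here is a minimal sketch in Python. All transition and emission numbers are invented for illustration (the paper's FSMs are trained on real fuel-system symbol sequences), and the normal-pattern model is simplified to a first-order symbol chain:

```python
import math

# Normal-pattern FSM for the estimation-based model: a symbol-transition
# chain fitted to normal data only (symbols 0..2; numbers invented).
P0 = [0.7, 0.2, 0.1]
A_NORM = [[0.7, 0.2, 0.1],
          [0.3, 0.5, 0.2],
          [0.2, 0.3, 0.5]]

def estimation_based(seq, theta):
    """Anomalous if the per-symbol log-probability under the normal FSM
    falls below the threshold theta."""
    logp = math.log(P0[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(A_NORM[prev][cur])
    return logp / len(seq) < theta

# Two-state FSM for the decoding-based model, trained on both normal and
# anomalous data: state 0 = normal, state 1 = abnormal (numbers invented).
STATES = (0, 1)
START = [0.95, 0.05]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # P(symbol | state)

def viterbi(seq):
    """Most probable hidden state path (log-domain Viterbi)."""
    delta = [math.log(START[s] * EMIT[s][seq[0]]) for s in STATES]
    back = []
    for sym in seq[1:]:
        prev_best, new_delta = [], []
        for s in STATES:
            p, q = max((delta[q] + math.log(TRANS[q][s]), q) for q in STATES)
            new_delta.append(p + math.log(EMIT[s][sym]))
            prev_best.append(q)
        delta = new_delta
        back.append(prev_best)
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    return path[::-1]

def decoding_based(seq):
    """Anomalous if at least one decoded state is abnormal."""
    return any(s == 1 for s in viterbi(seq))
```

With these toy numbers, a run of high symbols such as `[0, 2, 2, 2, 0]` is flagged by both rules (at θ = −1.0 for the estimation-based one), while `[0, 0, 1, 0, 0]` passes both.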
Figure 18. Comparison between the estimation-based and decoding-based models on TN rate.
Figure 19. Comparison between the estimation-based and decoding-based models on TP rate.
Based on the points presented above, we can draw the following conclusion: the estimation-based model has better robustness than the decoding-based model, while the decoding-based model may yield better performance in particular intervals. Figure 20 shows the AUC comparison between the estimation-based model and the decoding-based model; AUC is an evaluation measure that reflects the overall classification performance of a model for class-imbalanced problems. The AUC of the estimation-based model gradually decreases along with the TP rate and TN rate as the length T grows. However, the AUC of the decoding-based model rises to a peak value when T = 20 and then starts to drop.
In this figure, we can see that before T = 12, the performance of the estimation-based model is better than that of the decoding-based model. Between T = 12 and 53, the decoding-based model outperforms the estimation-based model. After T > 53, the estimation-based model is much more efficient than the decoding-based model. In conclusion, if we need a short-term anomaly detection system, e.g., with less than a 1 h observation window, the estimation-based model will be satisfactory.
If ae.g., long term detection system is needed, e.g., for 1 h to 8 h, then the system, over 8 h,anomaly an estimation-based model should be used. detection system, e.g.,will overbe 8 h, an estimation-based should be anomaly used. decoding-based model more efficient. For an model ultra-long term detection system, e.g.,
over 8 h, an estimation-based model should be used.
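The recommendation above reduces to a simple selection rule. The helper below is our own summary of it, not code from the paper; the 1 h and 8 h boundaries correspond to the crossover points at T = 12 and T = 53 in Figure 20:

```python
def select_model(window_hours):
    """Choose the detection model for a given observation window.

    Sketch of the paper's recommendation; the function name and the exact
    boundary handling are ours.
    """
    if window_hours < 1:         # short term: estimation-based is stronger
        return "estimation-based"
    if window_hours <= 8:        # 1-8 h: decoding-based peaks in this range
        return "decoding-based"
    return "estimation-based"    # ultra-long term: robustness wins again
```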
Figure 20. Comparison between the estimation-based model and decoding-based model on AUC.
The main reason for the difference between the two models is that in the estimation-based model, the FSM is used to calculate posterior probabilities for normal sequences, so it is trained on completely normal data. When the FSM calculates probabilities for testing sequences that are anomalous, the results are much lower than those for normal data. Therefore, optimizing a proper threshold for classifying the two classes is of great use for efficient model performance. In the decoding-based model, the FSM is used to find the most probable state sequence of an observed sequence. The main task of the FSM is to detect anomalous states efficiently, so it is trained on both normal and abnormal data. Once an anomalous state is detected, the sequence is judged to be an anomaly.

5.5.
Comparison of the Different Models

In this section, four models are compared with our proposed models: the fuzzy logic method (FL) [21], extended Kalman filtering (EKF) [43], the support vector machine (SVM) [14] and the back propagation network (BPN) [12]. Fuzzy set theory and fuzzy logic provide a framework for nonlinear mapping. Fuzzy logic systems have been widely used in engineering applications because of the flexibility they offer designers and their ability to handle uncertainty. Extended Kalman filtering is widely used in state estimation and anomaly detection; it uses measurable parameters to estimate the operating state through state functions and measurement functions for 1-D nonlinear filtering. The support vector machine is an algorithm that emphasizes structural risk minimization theory. An SVM can operate like a linear model to obtain a description of the nonlinear boundary of a dataset using a nonlinear mapping transform.
The perceptron learning rule is built from a hyperplane alone, with a group of weights assigned to each attribute. Data are classified into one class when the weighted sum of the attributes is positive and into another class when the sum is negative. In the back propagation network, attributes are reweighted whenever samples are classified incorrectly, until the classification is correct.

The four models can be divided into two categories: fuzzy logic and extended Kalman filtering are model-based approaches, while the support vector machine and back propagation network are machine learning-based approaches. The estimation-based model (EBFSM) and decoding-based model (DBFSM) use sequence length T = 10. The overall accuracy is measured by AUC. The experimental results are shown in Table 8. It can be seen that the proposed models perform better than any of the compared models, though the four compared models have similar performance.
Concretely, over 10 different datasets, the estimation-based model outperforms the compared models in nine groups and the decoding-based model outperforms the compared models in eight groups. The two proposed models have about 3–5% higher accuracy than any other model.
Table 8. Comparison of different models for overall accuracy.

Dataset      FL       EKF      SVM      BPN      EBFSM    DBFSM
1            0.8842   0.8709   0.9043   0.8823   0.9277   0.9244
2            0.8788   0.9033   0.8922   0.8945   0.9372   0.9239
3            0.9003   0.9120   0.8933   0.9321   0.9305   0.9213
4            0.8679   0.8457   0.9208   0.9011   0.9364   0.9134
5            0.8614   0.8268   0.8822   0.8875   0.9333   0.9193
6            0.9099   0.8897   0.9012   0.9047   0.9296   0.9328
7            0.9071   0.8872   0.9188   0.9301   0.9307   0.9317
8            0.8922   0.9099   0.9189   0.8913   0.9295   0.9210
9            0.8766   0.8962   0.9154   0.8489   0.9390   0.9224
10           0.8999   0.8645   0.8989   0.8689   0.9303   0.9196
Mean value   0.8878   0.8806   0.9046   0.8941   0.9324   0.9229
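The win counts quoted in the text can be cross-checked mechanically against Table 8. The sketch below simply transcribes the table and counts, per dataset, whether each proposed model beats the best of the four baselines:

```python
# Accuracy per dataset, transcribed from Table 8
# (columns: FL, EKF, SVM, BPN, EBFSM, DBFSM).
rows = [
    (0.8842, 0.8709, 0.9043, 0.8823, 0.9277, 0.9244),
    (0.8788, 0.9033, 0.8922, 0.8945, 0.9372, 0.9239),
    (0.9003, 0.9120, 0.8933, 0.9321, 0.9305, 0.9213),
    (0.8679, 0.8457, 0.9208, 0.9011, 0.9364, 0.9134),
    (0.8614, 0.8268, 0.8822, 0.8875, 0.9333, 0.9193),
    (0.9099, 0.8897, 0.9012, 0.9047, 0.9296, 0.9328),
    (0.9071, 0.8872, 0.9188, 0.9301, 0.9307, 0.9317),
    (0.8922, 0.9099, 0.9189, 0.8913, 0.9295, 0.9210),
    (0.8766, 0.8962, 0.9154, 0.8489, 0.9390, 0.9224),
    (0.8999, 0.8645, 0.8989, 0.8689, 0.9303, 0.9196),
]

# Datasets where each proposed model beats the best of the four baselines.
ebfsm_wins = sum(r[4] > max(r[:4]) for r in rows)  # 9 wins for EBFSM
dbfsm_wins = sum(r[5] > max(r[:4]) for r in rows)  # 8 wins for DBFSM
```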
6. Conclusions

The essential issue of anomaly detection is how to detect sensitively and effectively when and how an anomaly happens. Conventional anomaly detection methods for point or collective anomalies are mostly established on continuous real-time sensor observations. Noise, operating fluctuations and ambient conditions may all be contained in raw data, which makes anomalous observations appear normal. The anomalous features often hide in the structural data that reflect the various operating patterns of the device. Thus, we first partitioned the original data into classes by K-means clustering and symbolized each class to construct a sequence-based feature structure which intrinsically represents the operating patterns of a device. Second, we built the core computing unit for anomaly detection, a finite state machine, on a large quantity of training sequence data to estimate posterior probabilities or find the most probable states. Two detection models were generated: an estimation-based model and a decoding-based model. For anomaly detection in the gas turbine fuel system, the two models have their own advantages and weaknesses, concretely concluded as follows:

(1) The estimation-based model has strong robustness, since its performance is highly stable as the sequence length grows. In the decoding-based model, however, the performance varies with the sequence length. Anomalous sequences are easier to detect at longer lengths than at shorter ones, but there will be a higher false alarm rate at longer lengths, which means many normal sequences are misclassified as anomalies.

(2) The decoding-based model can help us locate the point at which an anomaly occurs with high time resolution, which means it can tell us precisely which symbol points are in an anomalous state, rather than only the sequence probabilities produced by the estimation-based model.
The estimation-based model is more suitable for anomaly detection in gas turbine fuel systems when the observation window is less than 1 h or over 8 h, whereas the decoding-based model is more suitable when the observation window is between 1 and 8 h, so this may help people choose the most efficient detection model for different demands.

Further work may concentrate on algorithm optimization and application extension. As described above, the decoding-based model uses a local searching algorithm which may cause high deviation in some circumstances, even though it can operate at a high computing speed, so the algorithm needs to be optimized. We also need to apply this method to other domains of gas turbine anomaly detection, such as the gas path, combustion components, etc. All of these need further attention.

Acknowledgments: This paper was partially supported by National Natural Science Foundation of China (NSFC) grants U1509216 and 61472099, National Sci-Tech Support Plan 2015BAH10F01, the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province LC2016026 and the MOE–Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.

Author Contributions: Fei Li developed the idea, performed the experiments and wrote the draft paper. Hongzhi Wang reviewed and supervised the whole paper. Guowen Zhou was involved in data processing
and experimental analysis. Daren Yu and Jiangzhong Li significantly improved the paper's scientific ideas and helped modify the paper. Hong Gao provided the raw data, was involved in data preprocessing and improved the paper from an algorithmic perspective.

Conflicts of Interest: The authors declare no conflict of interest.
References

1. Volponi, A.J. Gas Turbine Engine Health Management: Past, Present and Future Trends. In Proceedings of the ASME Turbo Expo 2013: Turbine Technical Conference and Exposition, San Antonio, TX, USA, 3–7 June 2013; pp. 433–455.
2. Marinai, L.; Probert, D.; Singh, R. Prospects for Aero Gas-Turbine Diagnostics: A Review. Appl. Energy 2004, 79, 109–126. [CrossRef]
3. Urban, L.A. Gas Path Analysis Applied to Turbine Engine Condition Monitoring. J. Aircr. 1973, 10, 400–406. [CrossRef]
4. Pu, X.; Liu, S.; Jiang, H.; Yu, D. Sparse Bayesian Learning for Gas Path Diagnostics. J. Eng. Gas Turbines Power 2013, 135, 071601. [CrossRef]
5. Doel, D.L. TEMPER—A Gas-Path Analysis Tool for Commercial Jet Engines. J. Eng. Gas Turbines Power 1992, 116, V005T15A013.
6. Gulati, A.; Zedda, M.; Singh, R. Gas Turbine Engine and Sensor Multiple Operating Point Analysis Using Optimization Techniques. In Proceedings of the AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, Las Vegas, NV, USA, 24–28 July 2000; pp. 323–331.
7. Mathioudakis, K. Comparison of Linear and Nonlinear Gas Turbine Performance Diagnostics. J. Eng. Gas Turbines Power 2003, 127, 451–459.
8. Stamatis, A.; Mathioudakis, K.; Papailiou, K.D. Adaptive Simulation of Gas Turbine Performance. J. Eng. Gas Turbines Power 1989, 238, 168–175.
9. Meskin, N.; Naderi, E.; Khorasani, K. Nonlinear Fault Diagnosis of Jet Engines by Using a Multiple Model-Based Approach. J. Eng. Gas Turbines Power 2011, 13, 63–75.
10. Kobayashi, T.; Simon, D.L. Evaluation of an Enhanced Bank of Kalman Filters for In-Flight Aircraft Engine Sensor Fault Diagnostics. J. Eng. Gas Turbines Power 2004, 127, 635–645.
11. Doel, D. The Role for Expert Systems in Commercial Gas Turbine Engine Monitoring. In Proceedings of the ASME 1990 International Gas Turbine and Aeroengine Congress and Exposition, Brussels, Belgium, 11–14 June 1990.
12. Loboda, I.; Feldshteyn, Y.; Ponomaryov, V. Neural Networks for Gas Turbine Fault Identification: Multilayer Perceptron or Radial Basis Network? In Proceedings of the ASME 2011 Turbo Expo: Turbine Technical Conference and Exposition, Vancouver, BC, Canada, 6–10 June 2011.
13. Bettocchi, R.; Pinelli, M.; Spina, P.R.; Venturini, M. Artificial Intelligence for the Diagnostics of Gas Turbines—Part I: Neural Network Approach. J. Eng. Gas Turbines Power 2007, 129, 19–29. [CrossRef]
14. Lee, S.M.; Choi, W.J.; Roh, T.S.; Choi, D.W. A Study on Separate Learning Algorithm Using Support Vector Machine for Defect Diagnostics of Gas Turbine Engine. J. Mech. Sci. Technol. 2008, 22, 2489–2497. [CrossRef]
15. Lee, S.M.; Roh, T.S.; Choi, D.W. Defect Diagnostics of SUAV Gas Turbine Engine Using Hybrid SVM-Artificial Neural Network Method. J. Mech. Sci. Technol. 2009, 23, 559–568. [CrossRef]
16. Romessis, C.; Mathioudakis, K. Bayesian Network Approach for Gas Path Fault Diagnosis. J. Eng. Gas Turbines Power 2004, 128, 691–699.
17. Lee, Y.K.; Mavris, D.N.; Volovoi, V.V.; Yuan, M.; Fisher, T. A Fault Diagnosis Method for Industrial Gas Turbines Using Bayesian Data Analysis. J. Eng. Gas Turbines Power 2010, 132, 041602. [CrossRef]
18. Li, Y.G.; Ghafir, M.F.A.; Wang, L.; Singh, R.; Huang, K.; Feng, X. Non-Linear Multiple Points Gas Turbine Off-Design Performance Adaptation Using a Genetic Algorithm. J. Eng. Gas Turbines Power 2011, 133, 521–532. [CrossRef]
19. Ganguli, R. Application of Fuzzy Logic for Fault Isolation of Jet Engines. J. Eng. Gas Turbines Power 2003, 125, V004T04A006. [CrossRef]
20. Shabanian, M.; Montazeri, M. A Neuro-Fuzzy Online Fault Detection and Diagnosis Algorithm for Nonlinear and Dynamic Systems. Int. J. Control Autom. Syst. 2011, 9, 665. [CrossRef]
21. Dan, M. Fuzzy Logic Estimation Applied to Newton Methods for Gas Turbines. J. Eng. Gas Turbines Power 2007, 129, 787–797.
22. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection: A Survey. ACM Comput. Surv. 2009, 41, 1–58. [CrossRef]
23. Lei, W.; Wang, S.; Zhang, J.; Liu, J.; Yan, Z. Detecting Intrusions Using System Calls: Alternative Data Models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 9–12 May 1999; pp. 133–145.
24. Sun, P.; Chawla, S.; Arunasalam, B. Mining for Outliers in Sequential Databases. In Proceedings of the SIAM International Conference on Data Mining, Bethesda, MD, USA, 20–22 April 2006; pp. 94–106.
25. Manson, G.; Pierce, G.; Worden, K. On the Long-Term Stability of Normal Condition for Damage Detection in a Composite Panel. Key Eng. Mater. 2001, 204–205, 359–370. [CrossRef]
26. Ruotolo, R.; Surace, C. A Statistical Approach to Damage Detection through Vibration Monitoring. BMC Proc. 1997, 3, S25.
27. Hollier, G.; Austin, J. Novelty Detection for Strain-Gauge Degradation Using Maximally Correlated Components. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2002), Bruges, Belgium, 24–26 April 2002; pp. 257–262.
28. Yairi, T.; Kato, Y.; Hori, K. Fault Detection by Mining Association Rules from Housekeeping Data. In Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space, St-Hubert, QC, Canada, 18–22 June 2001.
29. Brotherton, T.; Johnson, T. Anomaly Detection for Advanced Military Aircraft Using Neural Networks. In Proceedings of the Aerospace Conference, Big Sky, MT, USA, 10–17 March 2001.
30. Aminyavari, M.; Mamaghani, A.H.; Shirazi, A.; Najafi, B.; Rinaldi, F. Exergetic, Economic, and Environmental Evaluations and Multi-Objective Optimization of an Internal-Reforming SOFC-Gas Turbine Cycle Coupled with a Rankine Cycle. Appl. Therm. Eng. 2016, 108, 833–846. [CrossRef]
31. Mamaghani, A.H.; Najafi, B.; Casalegno, A.; Rinaldi, F. Predictive Modelling and Adaptive Long-Term Performance Optimization of an HT-PEM Fuel Cell Based Micro Combined Heat and Power (CHP) Plant. Appl. Energy 2016, 192, 519–529. [CrossRef]
32. Ray, A. Symbolic Dynamic Analysis of Complex Systems for Anomaly Detection. Signal Process. 2004, 84, 1115–1130. [CrossRef]
33. Rao, C.; Ray, A.; Sarkar, S.; Yasar, M. Review and Comparative Evaluation of Symbolic Dynamic Filtering for Detection of Anomaly Patterns. Signal Image Video Process. 2009, 3, 101–114. [CrossRef]
34. Gupta, S.; Ray, A.; Sarkar, S.; Yasar, M. Fault Detection and Isolation in Aircraft Gas Turbine Engines. Part 1: Underlying Concept. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2008, 222, 307–318. [CrossRef]
35. Yamaguchi, T.; Mori, Y.; Idota, H. Fault Detection and Isolation in Aircraft Gas Turbine Engines. Part 2: Validation on a Simulation Test Bed. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2008, 222, 319–330.
36. Soumik, S.; Jin, X.; Asok, R. Data-Driven Fault Detection in Aircraft Engines with Noisy Sensor Measurements. J. Eng. Gas Turbines Power 2011, 133, 783–789.
37. Sarkar, S.; Mukherjee, K.; Sarkar, S.; Ray, A. Symbolic Dynamic Analysis of Transient Time Series for Fault Detection in Gas Turbine Engines. J. Dyn. Syst. Meas. Control 2012, 135, 014506. [CrossRef]
38. Baum, L.E.; Petrie, T. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Ann. Math. Stat. 1966, 37, 1554–1563. [CrossRef]
39. Liu, H.; Hussain, F.; Tan, C.L.; Dash, M. Discretization: An Enabling Technique. Data Min. Knowl. Discov. 2002, 6, 393–423. [CrossRef]
40. Tsai, C.J.; Lee, C.I.; Yang, W.P. A Discretization Algorithm Based on Class-Attribute Contingency Coefficient. Inf. Sci. 2008, 178, 714–731. [CrossRef]
41. Gupta, A.; Mehrotra, K.G.; Mohan, C. A Clustering-Based Discretization for Supervised Learning. Stat. Probab. Lett. 2010, 80, 816–824. [CrossRef]
42. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). ACM Sigmod Rec. 2011, 31, 76–77. [CrossRef]
43. Chen, J.; Patton, R.J. Robust Model-Based Fault Diagnosis for Dynamic Systems; Springer: New York, NY, USA, 1999.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).