Fault Detection for Gas Turbine Hot Components Based on a

1 downloads 0 Views 8MB Size Report
17 Aug 2018 - detection method for hot components based on a convolutional .... Convolutional neural network (CNN) are successfully used in ..... AUCs, and MCCs of 10 random runs for the three models are shown in Table 1. ..... Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document ...
energies Article

Fault Detection for Gas Turbine Hot Components Based on a Convolutional Neural Network Jiao Liu 1 , Jinfu Liu 1, *, Daren Yu 1, *, Myeongsu Kang 2 , Weizhong Yan 3 and Michael G. Pecht 2 1 2 3

*

ID

, Zhongqi Wang 1

School of Energy Science and Engineering, Harbin Institute of Technology, Harbin 150001, China; [email protected] (J.L.); [email protected] (Z.W.) Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park, MD 20742, USA; [email protected] (M.K.); [email protected] (M.G.P.) Machine Learning Lab, GE Global Research Center, Niskayuna, NY 12309, USA; [email protected] Correspondence: [email protected] (J.L.); [email protected] (D.Y.)

Received: 4 August 2018; Accepted: 14 August 2018; Published: 17 August 2018

 

Abstract: Gas turbine hot component failures often cause catastrophic consequences. Fault detection can improve the availability and economy of hot components. The exhaust gas temperature (EGT) profile is usually used to monitor the performance of the hot components. The EGT profile is uniform when the hot component is healthy, whereas hot component faults lead to large temperature differences between different EGT values. The EGT profile swirl under different operating and ambient conditions also cause temperature differences. Therefore, the influence of EGT profile swirl on EGT values must be eliminated. To improve the detection sensitivity, this paper develops a fault detection method for hot components based on a convolutional neural network (CNN). This paper demonstrates that a CNN can extract the information between adjacent EGT values and consider the impact of the EGT profile swirl. This paper reveals, in principle, that a CNN is a viable solution for dealing with fault detection for hot components. Based on the distribution characteristics of EGT thermocouples, the circular padding method is developed in the CNN. The sensitivity of the developed method is verified by real-world data. Moreover, the developed method is visualized in detail. The visualization results reveal that the CNN effectively considers the influence of the EGT profile swirl. Keywords: gas turbine; hot component; fault detection; exhaust gas temperature (EGT); convolutional neural network (CNN)

1. Introduction Gas turbines are widely used as the main power source in many areas, such as aircraft, ships, oil and gas applications, and power generation. Reducing maintenance costs and increasing the availability of gas turbines are two essential issues for equipment owners [1,2]. Prognostics and health management (PHM) can solve these problems and ensure that gas turbines run safely and economically [3–5]. The hot components, including the combustion system and turbine, are the critical components of gas turbines. The hot components operate under the adverse environmental conditions of high temperature, high pressure, and high speed. Hot component failures often result in catastrophic accidents and very large economic losses. Fault detection plays an important role in PHM systems, and usually focuses on timely detection of faults and avoiding more serious losses. Therefore, fault detection for gas turbine hot components is particularly significant. Exhaust gas temperature (EGT) provides the most relevant information about gas turbine hot component performance, so it is usually used to monitor gas turbine hot components. EGT is measured

Energies 2018, 11, 2149; doi:10.3390/en11082149

www.mdpi.com/journal/energies

Energies 2018, 11, 2149

2 of 18

by several thermocouples distributed equally in the gas turbine exhaust section. All thermocouple readings are almost identical when the hot components are healthy, whereas some thermocouple readings are variable when the hot components malfunction. Hence, the EGT profile can be used for fault detection. The uniformity of the EGT profile indicates the health status of the hot components, and some typical indicators are presented, such as the temperature difference between EGT and the average EGT value [6,7], temperature difference between the maximum and the minimum EGT [8], maximum EGT [7,9], median EGT [7] and average EGT [7]. Excessive and sudden changes or consistent upward trends in the indicators mean the potential hot component problems. However, in practice, the operators have found this solution to lack adequate sensitivity. In other words, the hot components have been damaged seriously once the alarm is generated [10]. The methods above take advantage of discrepancies between different EGT values to detect faults. However, the hot component fault is not the only reason for EGT discrepancies. Different operating and ambient conditions also cause EGT discrepancies. Hot gas from the combustors is rotated by the turbine blades. The swirl angle changes depending on the operating and ambient conditions. Hence, the EGT profile swirls under different operating and ambient conditions, and the EGT profile swirl results in the EGT discrepancies. At the same time, the EGT profile scales under different operating and ambient conditions, which also causes the EGT discrepancies. It is difficult to detect the faults from these complex interference factors. Obviously, by simply monitoring the EGT discrepancies, the methods above cannot detect the faults as early as possible. Therefore, many researchers have focused on the fault detection method of the hot components based on the EGT model [11–21]. A normality EGT model is built [11–21]. If the estimated EGT based on the model is different from the actual EGT, the abnormalities of the hot components are identified. The more accurate the model, the better the detection performance is. The model may be built either from some prior identification based on a set of training data (data-driven model) or from prior physical knowledge of the gas turbine (physical model). Data-driven models usually use some data mining methods to learn the relationship between the measurable parameters and EGT. Song et al. [11] and Yılmaz [12] evaluated the relationship between EGT and operational parameters based on a multiple linear regression. An artificial neural network (ANN) is also usually used to build the EGT model [13,14]. Tarassenko et al. [13] presented an EGT model of normality based on an ANN and made use of the spatial correlations in the EGT profile. When a fault develops in one of the combustors, there is a local effect on the temperature profile; only a small number of contiguous thermocouples are significantly affected whilst the rest of the profile is largely unchanged. The model of normality was, therefore, constructed by learning the function relating the temperature values from the four thermocouples opposite it. Yan [15,16] took advantage of stacked denoising autoencoder to extract feature from EGT and used extreme learning machines (ELM) to detect abnormalities in combustors. Hierarchical clustering and self-organizing map neural networks were used for gas turbine pre-chamber burnout anomaly detection [17]. Some EGT physical models were also presented. Basseville et al. [18,19] presented an EGT physical model. In this model, the swirl angle was calculated based on gas turbine physical principles. There were some ideal hypotheses, and the size of the gas turbine need to be known. Medina et al. [20] presented an EGT model based on both basis function expansion and the Brayton cycle. The swirl angle was obtained based on an empirical function of the ambient temperature and the power generated, which was provided by the manufacturer. As a universal empirical function, it is insensitive to detect the hot component faults of a specific gas turbine. Liu et al. [21] presented an EGT model and two main factors affecting the EGT were considered, including the operating and ambient conditions, and the structure deviation of different combustors caused by processing and installation errors. However, the EGT profile swirl was not considered. Although many EGT models were presented in previously works, the influence of EGT profile swirl on EGT values has not been well considered. Therefore, to further improve the sensitivity of fault

Energies 2018, 11, 2149

3 of 18

Energies 2018, 11,gas x turbine hot components, the EGT model should be developed by considering 3 ofthe 18 detection for effect of the EGT profile swirl. Convolutional neural successfully used used in in pattern pattern recognition recognition and and image image Convolutional neural network network (CNN) (CNN) are are successfully processing [22–24]. A CNN, to the best of our knowledge, has not been used for gas turbine hot processing [22–24]. A CNN, to the best of our knowledge, has not been used for gas turbine hot componentsPHM PHMapplications. applications. CNN is good at learning features, it is robust to the components AA CNN is good at learning locallocal features, and itand is robust to the feature feature shift, scale, and distortion. The property of a CNN makes it a viable solution for dealing shift, scale, and distortion. The property of a CNN makes it a viable solution for dealing withwith the the EGT profile problem. First, abnormal information the hotcomponent componentisiscontained containedin inseveral several EGT profile problem. First, thethe abnormal information of of the hot adjacentEGT EGTvalues values rather ratherthan thanall all the the EGT adjacent EGT values. values. By By perceiving perceiving local local features features and and sharing sharing weights, weights, a CNN can exactly extract key information from adjacent EGT values. Second, the EGT swirl a CNN can exactly extract key information from adjacent EGT values. Second, the EGT profile profile swirl can be be seen seen as as the the shift shift of of the the EGT EGT profile. profile. Since Since aa CNN CNN can can ensure ensure some some degree degree of of shift shift invariance, invariance, can it can effectively discern the impact of the EGT profile swirl. In this paper, the influence of the the EGT EGT it can effectively discern the impact of the EGT profile swirl. In this paper, the influence of profile swirl on fault detection of the hot components is described. Furthermore, the reason why CNN profile swirl on fault detection of the hot components is described. Furthermore, the reason why is suitable for hotfor components fault detection is analyzed and visualized in detail. According to the CNN is suitable hot components fault detection is analyzed and visualized in detail. According distribution characteristics of EGT thermocouples, the circular padding method of CNN is to the distribution characteristics of EGT thermocouples, the circular padding method of CNN is developed. CNN CNN is is evaluated evaluated in in the the context context of of aa real-world real-world application applicationof ofgas gasturbine turbinehot hot component component developed. fault detection. fault detection. The remainder asas follows. Section 2 introduces the challenges of fault The remainder of ofthis thispaper paperisisorganized organized follows. Section 2 introduces the challenges of detection of gas hot hot components andand emphasizes thethe influence fault detection of turbine gas turbine components emphasizes influenceofofthe theEGT EGTprofile profile swirl. swirl. Section33 describes describesthe thetheoretical theoreticalbackground backgroundof of CNN CNN and and analyzes analyzes the thereason reasonwhy why CNN CNN is is suitable suitable Section for hot component fault detection. Based on the distribution characteristics of gas turbine EGT for hot component fault detection. Based on the distribution characteristics of gas turbine EGT thermocouples, the the circular circular padding padding method method of of CNN CNN is is developed. developed. The The experimental experimental results results and and thermocouples, visualization results are given in Section 4, and Section 5 concludes the paper. visualization results are given in Section 4, and Section 5 concludes the paper.

2. Challenges of Fault Detection for Gas Gas Turbine Turbine Hot Hot Components Components 2. A typical typical gas turbine turbine usually usually consists consists of three components—a compressor, compressor, a combustion combustion system, and a turbine. a widely used parameter to monitor the performance of hot EGT and turbine. EGT EGTisis a widely used parameter to monitor the performance ofcomponents. hot components. is measured by several thermocouples distributed uniformly in in the EGT is measured by several thermocouples distributed uniformly thegas gasturbine turbineexhaust exhaust section. section. Figure 11describes describes circumferential distribution ofcombustors the combustors and thermocouples. As is Figure thethe circumferential distribution of the and thermocouples. As is known, known, the status burning status of the combustors is almost inoperation, the normal and all the the burning of the combustors is almost same in the same normal andoperation, all the thermocouple thermocouple readings almost same. When the hot components are abnormal, some readings are almost same.are When the hot components are abnormal, some thermocouple readings thermocouple readings would bethermocouple different fromreadings. the otherTherefore, thermocouple readings. Therefore,result normal would be different from the other normal hot components in components resultwhile in a uniform profile, while abnormal components give rise to an ahot uniform EGT profile, abnormalEGT hot components give rise to anhot unequal EGT profile, as shown unequal profile, as the shown in Figurediscrepancy 2. The greater the temperature discrepancyreadings, between in FigureEGT 2. The greater temperature between different thermocouple different thermocouple the greater thefaults. possibility of the hot component faults. the greater the possibilityreadings, of the hot component

Figure 1. Combustors and and thermocouples thermocouples distribution. distribution. Figure 1. Combustors

Energies 2018, 11, 2149

4 of 18

Energies 2018, 11, x

4 of 18

Energies 2018, 11, x

4 of 18

Figure 2. Comparisonbetween between normal operation. Figure 2. Comparison normaland andabnormal abnormal operation.

However, the hot component fault is not the unique factor that causes the discrepancy between

However, thevalues. hot component fault is not the unique factor that causes the discrepancy between varying EGT varying EGT values. Figurechanges 2. Comparison between normal andambient abnormalconditions. operation. All the EGT values First, every EGT value with the operating and First, every EGT value changes withand theitoperating andthe ambient conditions. All the EGT values increase or decrease at the same time, is shown that EGT profile scaling under different However, the hot conditions. componentIn fault is not the unique factor this thatfactor causeswas themainly discrepancy between operating and ambient previous work [11–14,21], discussed. In increase or decrease at the same time, and it is shown that the EGT profile scaling under different varying EGT values. this paper, it will also be considered. operating and ambient conditions. In previous work [11–14,21], this factor was mainly discussed. First, every EGT value with under the operating and ambient and conditions. Allconditions. the EGT values Second, the EGT profilechanges also swirls different operating ambient The In this paper, itor will also be considered. increase decrease at the same time, and it is shown that the EGT profile scaling under different combustion system heats a mixture of air and fuel at very high temperature, and the hot gas from the Second, the EGT profile also In swirls under different operating and ambient conditions. operating and ambient conditions. previous workthe [11–14,21], factorby was discussed. combustors drives the turbine to rotate. Naturally, hot gas isthis rotated themainly turbine blades, asIn The combustion heats mixture air and fuel does at very temperature, and the hot gas gas from this paper, itFigure will also be considered. described insystem 3. As aa result, theofthermocouple nothigh measure the temperature of the Second, the EGT profile also swirls under different operating and ambient conditions. The the combustors drives the turbine to rotate. Naturally, the hot gas isbetween rotatedthe by combustor the turbine blades, from the combustor at the same angular position. There is a swirl angle and combustion system heats a mixture of air and fuel at very high temperature, and the hot gas from as described in Figure measuring 3. As a result, the thermocouple not measure the temperature ofthe the gas the thermocouple the temperature of the hotdoes gas from this combustor. The swirl angle combustors drives thesame turbine to rotate. Naturally, the hot is rotated bydifferent the turbine blades, as and depends on the operating and ambient conditions. Figure illustrates EGT profiles from the combustor at the angular position. There is a4gas swirl anglethe between the combustor described in Figure 3. As a result, the thermocouple does not measure the temperature of the gas under differentmeasuring power. The the EGTtemperature profiles are shown after mean to simply eliminate the angle the thermocouple of the hot gasnormalization from this combustor. The swirl fromprofile the combustor at theby same angularand position. There is a swirl angle between the combustor and EGT scaling caused operating ambient conditions. EGT profile a is measured at 121.56 depends onthermocouple the operating and ambient conditions.ofFigure 4gas illustrates the different The EGTswirl profiles under the measuring the temperature the hot from this combustor. angle MW of power, and EGT profile b is measured at 148.92 MW of power. The EGT profile swirls about different power. The EGT profiles are shown after mean normalization to simply eliminate the EGT depends on the operating and ambient Figure 4 illustrates the different profiles one thermocouple when generated power conditions. is reduced from 148.92 MW to 121.56 MW. AsEGT depicted in profileFigure scaling caused by operating ambient conditions. EGT profile a istomeasured at 121.56 under different power. EGT and profiles areisshown afterthe mean normalization simply eliminate the MW 5, the position of The the thermocouples fixed, and EGT profile swirls under the operating EGT profile scaling caused by operating and ambient conditions. EGT profile a is measured at 121.56 of power, and EGT profile b is measured at 148.92 MW of power. The EGT profile swirls about and ambient conditions. It can be seen that the temperature discrepancies between different one MW of power, and EGT profile b is measured at 148.92 of power. The121.56 EGT profile about thermocouple when generated power is reduced from MW 148.92 MW to MW.swirls As depicted in thermocouple readings could change. one thermocouple when generated power is reduced from 148.92 MW to 121.56 MW. As depicted in and Figure 5, the position of the thermocouples is fixed, and the EGT profile swirls under the operating Figure 5, the position of the thermocouples is fixed, and the EGT profile swirls under the operating ambient conditions. It can be seen that the temperature discrepancies between different thermocouple and ambient conditions. It can be seen that the temperature discrepancies between different readings could change. thermocouple readings could change.

Figure 3. Rotation of hot gas in a turbine.

Figure 3. Rotationofofhot hot gas gas in Figure 3. Rotation in aaturbine. turbine.

Energies 2018, 11, 2149

5 of 18

Energies 2018, 11, x

5 of 18

Energies 2018, 11, x

5 of 18

(a ) (a )

(b) (b)

Figure 4. EGT profiles swirl: (a) EGT profiles under different power; and (b) EGT profile b is manually

Figure 4. EGT profiles swirl: (a) EGT profiles under different power; and (b) EGT profile b is manually Figure 4.counter-clockwise EGT profiles swirl:for (a) one EGTthermocouple. profiles under different power; and (b) EGT profile b is manually rotated rotated counter-clockwise forfor one thermocouple. rotated counter-clockwise one thermocouple.

Figure 5. The influence of EGT profile swirl on the EGT. Figure Theinfluence influence of of EGT on on thethe EGT. Figure 5. 5. The EGTprofile profileswirl swirl EGT. In addition, the hot gas from different combustors mixes in the turbine, which reduces the In addition, hot gas from different combustors mixes in thediscrepancies turbine, which the amplitude of thethe EGT discrepancies compared with the temperature at reduces the exit from In addition, the EGT hot discrepancies gas from combustors mixes the turbine,at reduces amplitude of the compared with the temperature discrepancies the exit frombe the the combustors. Accordingly, thedifferent EGT discrepancies caused by thein hot components ’ which faults would the combustors. Accordingly, the EGT discrepancies by the hot components ’ faultsatwould be from amplitude of the EGT discrepancies compared withcaused the temperature discrepancies the exit very small. very small. The false alarm rate (FAR) and the missing alarm rate (MAR) are hot the two most criticalfaults evaluation the combustors. Accordingly, the EGT discrepancies caused by the components’ would be The false alarm rate (FAR) the missing alarm rate (MAR) are the will two high, most critical evaluation indexes of fault detection. Theand detection threshold is small and the FAR while the detection very small. indexes of fault detection. The detection threshold is small and the FAR will high, while the detection threshold is large and the MAR will high, as shown in Figure 6. As is analyzed, EGT discrepancies The false alarm rate (FAR) and the missing alarm rate (MAR) are the two most critical evaluation threshold is large the MAR will high, asHowever, shown indifferent Figure 6.operating As is analyzed, EGT discrepancies can be used as theand fault detection indicator. and ambient conditions can indexes of fault detection. The detection threshold is small and the FAR will high, while the detection can because used as the faultdiscrepancy. detection indicator. However, and ambient conditions can also the EGT The effect of the different operatingoperating and ambient conditions is not only threshold is large and theprofile MARscaling, willThe high, as shown in Figure 6. As is analyzed, EGT discrepancies also cause effect operating ambient reflected inthe theEGT EGTdiscrepancy. but alsoof in the EGT profileand swirl. At theconditions early stageisofnot theonly fault, can bereflected used asinthe fault detection indicator. However, different operating and ambient conditions theof EGT scaling, but alsothan in the profile swirl. Atfactors. the early stage ofa the fault, the influence theprofile faults is even smaller theEGT influence of these To avoid high false can also cause the discrepancy. The the operating and ambient conditions not only the influence ofEGT the faults is even smaller than of the influence these factors. To avoid high false alarm rate, the threshold should be set effect larger. This causes aofhigh missing alarm rate, ai.e., theis fault alarm the threshold should be setalso larger. causes a high missing rate, i.e., theoffault cannot be EGT detected as early as possible. Hence, the EGT challenge of fault detection hot components is fault, reflected in rate, the profile scaling, but in This the profile swirl. Atalarm thefor early stage the cannot be as eliminate early as smaller possible. Hence, the challenge fault foravoid hot components is alarm determining to the effects ofthe different operating anddetection ambientTo conditions on thefalse EGT the influence ofdetected thehow faults is even than influence ofofthese factors. a high determining to eliminate the effects ofcauses different and ambient conditions the EGT profile. rate, the thresholdhow should be set larger. This a operating high missing alarm rate, i.e., theonfault cannot be profile. It should be mentioned that the operating points analyzed in this paper are that the start-up detected as early as possible. Hence, the challenge of fault detection for hot components is determining It should that the operating in this paper aredifferent that theoperating start-up process of thebe gasmentioned turbine is completed and thenpoints the gasanalyzed turbine runs stably under how to eliminate effects different operating and conditions on the EGToperating profile. process of thethe gas turbineof is completed and then the gas ambient turbine runs stably under different and ambient conditions. Itand should be conditions. mentioned that the operating points analyzed in this paper are that the start-up ambient

process of the gas turbine is completed and then the gas turbine runs stably under different operating and ambient conditions.

Energies 2018, 11, 2149

6 of 18

Energies 2018, 11, x

6 of 18

Energies 2018, 11, x

6 of 18

Figure Bayesian decision decision diagram. Figure 6.6.Bayesian diagram.

3. Developed Fault Detection Method for Gas Turbine Hot Components Based On a CNN

Figure Bayesian decision diagram. 3. Developed Fault Detection Method for6.Gas Turbine Hot Components Based On a CNN

3.1. Theoretical Background of a CNN

3. Developed Fault Detection Method for Gas Turbine Hot Components Based On a CNN 3.1. Theoretical Background of a CNN

A CNN is a multi-stage feed-forward artificial neural network that learns a hierarchical feature

Arepresentation CNN is a multi-stage neural network that learns hierarchical 3.1. Theoretical Background offor a CNN mechanismfeed-forward input dataartificial with different levels of abstraction [25].aIn each stage, afeature certain number of feature maps corresponds to a levellevels of abstraction for features. The feature map representation mechanism for input data with different of abstraction [25]. In each stage, a certain A CNN is a multi-stage feed-forward artificial neural network that learns a hierarchical feature consists of several neurons. Feature maps at different stages are connected by operations such as number of feature maps corresponds to adata level of different abstraction foroffeatures. The[25]. feature map consists of representation mechanism for input with levels abstraction In each stage, a convolution, nonlinear activation, and pooling. Figure 7 shows a typical CNN architecture. severalcertain neurons. Feature maps at different stages by operations as convolution, number of feature maps corresponds to aare levelconnected of abstraction for features.such The feature map consists of several neurons. Feature maps at different stages are architecture. connected by operations such as nonlinear activation, and pooling. Figure 7 shows a typical CNN convolution, nonlinear activation, and pooling. Figure 7 shows a typical CNN architecture.

Figure 7. Typical CNN architecture.

A CNN comprises three basic architectural concepts: local receptive fields, shared weights, and Figure 7. 7.Typical architecture. Figure TypicalCNN sub-sampling [26]. These concepts ensure CNN CNN has architecture. some degree of scale, shift, and distortion invariance. A CNN comprises three basic architectural concepts: local receptive fields, shared weights, and The comprises first conceptthree meansbasic each neuron (convolutional kernel) of the convolution layer perceives Asub-sampling CNN architectural local receptive fields, weights, [26]. These concepts ensure CNN concepts: has some degree of scale, shift, andshared distortion only the local area of the input rather than the global input. Due to local receptive fields, the and sub-sampling [26]. These concepts ensure CNN has some degree of scale, shift, and invariance. elementary key features can be extracted, such as corners, end-points, and oriented edges in digit distortion invariance. The first concept means each neuron (convolutional kernel) of the convolution layer perceives recognition. Then these elementary key features are combined by the subsequent layers to detect only theconcept local area of theeach input rather (convolutional than the global kernel) input. Due to local receptive layer fields,perceives the The first means neuron of the convolution higher-order features. elementary key features can be extracted, such as corners, end-points, and oriented edges in digit only the local area indicates of the input rather than theshares global Due toin local The second the convolutional kernel theinput. same weights featurereceptive maps at a fields, recognition. Then these elementary key features areas combined byend-points, the subsequent layers to detect the elementary key features can be extracted, such corners, and oriented edges in certain stage. The input would be scanned sequentially by a signal convolutional kernel that has a higher-order features. digit recognition. Then elementary key isfeatures combined byshifted the subsequent layers to detect local receptive field.these Therefore, if the input shifted, are the output will be by the same amount, The second indicates the convolutional kernel shares the same weights in feature maps at a but will features. be left unchanged otherwise. Mathematically, the j-th output feature map of a convolution higher-order certain stage. The input would be scanned sequentially by a signal convolutional kernel that has a layer is expressed as Equation (1): The second indicates the convolutional shares same in by feature maps at a certain local receptive field. Therefore, if the inputkernel is shifted, thethe output willweights be shifted the same amount, stage. but Thewill input would be scanned sequentially signal convolutional kernel that has a local be left unchanged otherwise. Mathematically, j-th output feature map of a convolution  j by a the y j = f the + wijwill  xi be layer is expressed as Equation (1): is shifted, receptive field. Therefore, if the input but will  b output  shifted by the same amount,(1)





i



be left unchanged otherwise. Mathematically, the j-th output feature map of a convolution layer is  ij i ij y j = and f  b j y+j iswthe ! xj-th (1) expressed as Equation (1): input feature map, x i is the i-th where  output feature map. w is the



 i



ij bias. convolutional kernel between x and y j =y f, and b j +b∑jiswthe ∗ xi The symbol  is the convolutional i y x wij isinthe where is the i-th input feature map, and is theReLU j-th activation output feature map.applied i operation. f ( ) indicates the nonlinear activation function, is usually a i

j

j

(1)

j  wisijthe x i and convolutional kernel between is the bias. feature The symbol convolutional f ( xinput ) = maxfeature ( 0, x ) . map, i.e., i-th whereCNN, xi is the andy y,j and is theb j-th output map. is the convolutional i j j f  operation. indicates the nonlinear activation function, ReLU activation is usually applied a ( ) kernel between x and y , and b is the bias. The symbol ∗ is the convolutional operation. f (·) in indicates f ( x ) = max the nonlinear activation function, ( 0, x ) . ReLU activation is usually applied in a CNN, i.e., f ( x) = max(0, x). CNN, i.e., The third concept, sub-sampling, is commonly known as “pooling”. Pooling obtains a specific feature from every feature map. For example, in Equation (2), global average pooling [27] obtains

j

Energies 2018, 11, x

7 of 18

The third concept, sub-sampling, is commonly known as “pooling”. Pooling obtains a specific 7 of 18 feature from every feature map. For example, in Equation (2), global average pooling [27] obtains the

Energies 2018, 11, 2149

average value of the feature map a , such as a1 , whereas the feature map a is separated into the average value of the feature map a, such as a1 , whereas the feature map a is separated into many many non-overlapping regions, and max-pooling obtains the maximum value of each region, such non-overlapping regions, and max-pooling obtains the maximum value of each region, such as a2 . as a2 . As can be seen, the location of the extracted feature is eliminated after pooling. This is because As can be seen, the location of the extracted feature is eliminated after pooling. This is because the the location is important, less important, onceelementary the elementary key features are detected. This property also location is less once the key features are detected. This property also ensures ensures CNN istorobust to theshifts. feature shifts. CNN is robust the feature 

1 5  a= 2 7

4 1 3 4

 2 " 0 5  , a1 = [3], a1 = 1 7 3

4 3 3 5

4 5

#

(2) (2)

3.2. Applicability of CNNs CNNs in in Gas Gas Turbine Turbine Hot Hot Component Component Fault FaultDetection Detection 3.2. Applicability of According to to thermodynamic thermodynamic theory, theory,the theEGT EGTTT obtained Equation According obtained as as Equation (3):(3): 4 4is is

(

)

h  i k −1 / k 1 −π(t(k−1))/k T4 =TT4 3= 1T3−1η−t 1t − t 

(3) (3)

 t expansion where TT the combustor outlet temperature, efficiency, is the expansion ratio, where the combustor outlet temperature, ηt isturbine efficiency, πt is the ratio, and k is t is turbine 3 3isis the exponent. A gas turbine several combustors. Considering the hot gas andisentropic k is the isentropic exponent. A gas usually turbinecontains usually contains several combustors. Considering the mixing, the T depends on the temperature of each combustor. Then, Equation (3) can be written as 4 the T depends on the temperature of each combustor. Then, Equation (3) can be hot gas mixing, 4 Equation (4): written as Equation (4): h  i   n (k−1)/k T4,i = ∑ T3,jn 1 − ηt 1 − πt g Φit − Φcj (4) k −1 / k T4,ji==1  j =1T3, j 1 − t 1 −  t( )  g ( ti − cj ) (4)   where T3,j is the outlet temperature of the j-th combustor and T4,i is the exhaust gas temperature where T is the outlet temperature of the j-th combustor and T4,i is the exhaust gas temperature measured3, jby the i-th thermocouple. g(·) indicates the influence of each combustor on T4,i . It is usually measuredasby i-th thermocouple. the influence of each combustor () indicates modeled thethe normal distribution, asgshown in Equation (5) and Figure 8 [24,28]. Φc ison theTangular 4,i . It is

(

)

j

t position of the j-th position of the i-th and A and σ are  cj is usually modeled ascombustor, the normal Φ distribution, as shown in Equation (5)thermocouple, and Figure 8 [24,28]. the i is the angular t constant parameters that characterize the influence of the combustor. angular position of the j-th combustor,  is the angular position of the i-th thermocouple, and A

and



i

! of the combustor. are constant parameters that characterize the influence Φ2 g(Φ) = A exp − 22  2σ  g (  ) = A exp  − 2   2 

(5)

(5)

Figure 8. Function Function gg(( Φ )).. Figure

Since the hot gas rotates in the turbine, the thermocouple does not measure the temperature of the gas from the combustor at the same angular position. The swirl angle Φd is used to correct Equation

EnergiesEnergies 2018, 11, 2149 2018, 11, x

8 of 18 8 of 18

Since Energies 2018,the 11, hot x

gas rotates in the turbine, the thermocouple does not measure the temperature 8 of of 18

(4). Φthe related thecombustor operatingatand conditions. theangle expression thetoEGT is shown fromtothe the ambient same angular position.Finally, The swirl used correct  d isfor d is gas Since the hot gas rotates in the turbine, the thermocouple does not measure the temperature of as Equation (6): Equation (4).  d is related to the operating and ambient conditions. Finally, the expression for the  i  the gas from the combustorn at theh same angular angle d is used to correct (position. k−1)/k The swirl t c EGT is shown as Equation T4,i = (6): T 1 − η 1 − π g Φ − Φ − Φ (6) t ∑ 3,j d i j t Equation (4).

and ambient conditions. Finally, the expression for the  d is relatedj=to1 the operating n k −1 / k

(

T4,i =  j =1T3, j 1 −t 1 −  t( EGT is shown as Equation (6):

)

) g (  −  t i

c j

− d )

(6)

  Based on Equation (6), it can be seen that one combustor’s performance can affect several adjacent n k − 1 / k ( ) c Based on Equation (6), it can be seen that one combustor’s affect several  thermocouple readings. For illustration, 9 shows )a gschematic diagram can of the influence T4,i =  j =1T3,Figure t − performance (6) of the j 1 − t (1 −  t j − d )   ( ai schematic adjacent thermocouple readings. For illustration, Figure 9 shows diagram of the influence combustor outlet temperature on thermocouple readings. As can be seen, the abnormal information theBased combustor outlet temperature on thermocouple readings. As can be seen, the abnormal on Equation (6), it can seen that one combustor’s performance canall affect several of theofhot component is contained inbe several adjacent EGT values rather than the EGT values. information of the hot component is contained in several valuesdiagram rather than allinfluence the EGT adjacent thermocouple readings. For illustration, Figure 9adjacent shows aEGT schematic of the As stated earlier, one of the advantages of CNN is the ability to perceive local features. The CNN can values. stated earlier, of the advantages of CNN isreadings. the abilityAs to perceive local the features. The of the As combustor outlet one temperature on thermocouple can be seen, abnormal extract the can keyextract information in several adjacent values. CNN the key information in severalEGT adjacent values. information of the hot component is contained in several EGT adjacent EGT values rather than all the EGT

values. As stated earlier, one of the advantages of CNN is the ability to perceive local features. The CNN can extract the key information in several adjacent EGT values.

Figure Schematic diagram of theofinfluence of the combustor temperature on thermocouple Figure 9. 9. Schematic diagram the influence of theoutlet combustor outlet temperature on readings. readings. thermocouple Figure 9. Schematic diagram of the influence of the combustor outlet temperature on thermocouple

In addition, because of the rotation of the hot gas, the same combustors would affect the different readings. In addition, because rotationand of the hot gas, the same combustors would the different thermocouples when of thethe operating ambient conditions change. That means theaffect EGT profile swirlsInunder different operating andambient ambient conditions. Forcombustors example, hot affect component fault swirls thermocouples when the operating and conditions change. That the means the EGT profile addition, because of the rotation of the hot gas, the same would the different a “cold zone” the EGT profile, as shownconditions in Figure 10. When the ambient thermocouples when in the operating and ambient change. means theand EGT profile undercauses different operating and ambient conditions. For example, theThat hotoperating component fault causes a conditions change, the cold zone would present at a different angular position. The EGT profile swirl swirls under different operating and ambient conditions. For example, the hot component fault “cold zone” in the EGT profile, as shown in Figure 10. When the operating and ambient conditions can be seen as the shift key features. As 10. stated earlier, the CNN and extracts the causes a “cold zone” in of thesome EGTelementary profile, as shown in Figure When the change, the cold zone would present at a different angular position. Theoperating EGT profileambient swirl can be elementary key features from thewould input. present After pooling, the keyangular featureposition. does notThe contain original conditions change, the cold zone at a different EGT the profile swirl seen as the shift of some elementary key features. As stated earlier, the CNN extracts the elementary location information. Theof CNN has the property shift, scale, distortion Therefore, can be seen as the shift some elementary keyoffeatures. As and stated earlier, invariance. the CNN extracts the key features from input. After pooling, the key feature not contain the original location aelementary CNN can effectively consider theinput. effect After of EGT profile swirl. keythe features from the pooling, the key does feature does not contain the original information. The CNN has the property of shift, scale, and distortion invariance. Therefore, a CNN location information. The CNN has the property of shift, scale, and distortion invariance. Therefore, can effectively consider theconsider effect of profile a CNN can effectively theEGT effect of EGTswirl. profile swirl.

Figure 10. Abnormal EGT profiles under different operating and ambient conditions. Figure 10. Abnormal EGT profilesunder under different different operating and ambient conditions. Figure 10. Abnormal EGT profiles operating and ambient conditions.

In general, the faulty information of the hot component is included in the local EGT profile rather than the global EGT profile. The EGT profile swirl reflects the shift of some key features. The three basic architectural ideas, local receptive fields, shared weights, and pooling, ensure that the CNN is

Energies 2018, 2018, 11, 11, xx Energies

of 18 18 99 of

In general, the faulty information of the hot component is included in the local EGT profile rather than the global EGT profile. The EGT profile swirl reflects the shift of some key features. The three Energies 2018, 11, 2149 9 of 18 basic architectural ideas, local receptive fields, shared weights, and pooling, ensure that the CNN is robust to the feature scale, shift, and distortion. Therefore, a CNN is suitable for detecting early faults in robust gas turbine components. to thehot feature scale, shift, and distortion. Therefore, a CNN is suitable for detecting early faults in gas turbine hot components. 3.3. Improved CNN for Gas Turbine Hot Component Fault Detection 3.3. Improved CNN for Gas Turbine Hot Component Fault Detection Convolutions are usually performed after padding the feature maps. Padding means adding Convolutions areofusually performed after theitfeature maps. Padding means adding some values to the edge the input matrix. On thepadding one hand, can save the size of feature maps so some values to the edge of the input matrix. On the one hand, it can save the size of feature maps that it is a useful way to increase the model depth. On the other hand, padding in feature maps can so thatuse it isthe a useful way to increase model depth. On is the other hand, padding in feature maps can better border information ofthe feature maps, which beneficial for the detection performance. better use the border information of feature maps, which is beneficial the detection performance. No-padding means there is no padding operation. A typical paddingfor method in CNNs is zeroNo-padding means there is no padding operation. A typical padding methodas inshown CNNs is padding. In zero-padding, zeros are added to the edge of the input matrix, inzero-padding. Figure 11. InEGT zero-padding, zerosare arecircumferentially added to the edge of the input matrix, as shown in Figure 11. The EGT The thermocouples distributed. Based on the distribution c haracteristics are circumferentially distributed. Based on the EGT of thermocouples EGT thermocouples, the circular-padding method is used to distribution detect faultscharacteristics of gas turbineofhot thermocouples, the circular-padding used to detect faults of gas hot components in components in this paper, as shown inmethod Figure is12. In circular-padding, the turbine last EGT thermocouple this paper, showntoin the Figure 12. of In circular-padding, the last EGT thermocouple areEGT added readings are asadded front the first EGT thermocouple readings, andreadings the first to the front of the first EGT thermocouple readings, and the first EGT thermocouple readings thermocouple readings are added to the back of the last EGT thermocouple readings. For example,are to thethermocouples, back of the last 25th, EGT thermocouple For example, ofare theadded 27 EGTtothermocouples, of added the 27 EGT 26th, and 27threadings. thermocouple readings the front of and 27th thermocouple are addedoperation to the front of thesure firstthat thermocouple reading. the25th, first26th, thermocouple reading. The readings circular-padding makes the information The circular-padding makes sure thatand the the information the firstreadings EGT thermocouple between the first EGT operation thermocouple readings last EGTbetween thermocouple is fully readings and the last EGT thermocouple readings is fully extracted. extracted.

Figure Zero-padding. Figure 11.11. Zero-padding.

Figure 12.12. Circular-padding. Figure Circular-padding.

4. Experiments 4. Experiments 4.1.4.1. Data Description andand Model Parameters Setup Data Description Model Parameters Setup The real operating data sampled once perper minute come from thethe gasgas turbine running forfor three The real operating data sampled once minute come from turbine running three months. The gas turbine is used to generate electricity. The gas turbine has 27 combustors and months. The gas turbine is used to generate electricity. The gas turbine has 27 combustors27 and thermocouples to measure the EGT. The The datadata come from the the realreal operating gasgas turbine, and thethe 27 thermocouples to measure the EGT. come from operating turbine, and measurement equipment areare thethe 27 27 thermocouples distributed equally in in thethe gasgas turbine exhaust measurement equipment thermocouples distributed equally turbine exhaust section. Filtering points corresponding correspondingtotopart part operation condition (speed < 95%) section. Filteringout outthose thosedata data points operation condition (speed < 95%) results in 16,825 samples, of which 16,636 samples are normal and 189 samples are abnormal. The EGT profiles are shown after mean normalization.

Energies 2018, 11, x

10 of 18

results in 16,825 samples, of which 16,636 samples are normal and 189 samples a re abnormal. The 10 of 18 EGT profiles are shown after mean normalization. The CNN architecture is shown as Figure 13. Referring to some well-known CNN architectures [29,30] cross-validation results, four convolution layers, max-pooling layer, one [29,30] global Theand CNN architecture is shown as Figure 13. Referring to some one well-known CNN architectures average pooling layer, and four one fully-connected layer used in this paper. activation and cross-validation results, convolution layers, oneare max-pooling layer, oneThe global averagefunction pooling for the convolution layers is the ReLu function, the The activation function for the fully connected layer, and one fully-connected layer are used in thisand paper. activation function for the convolution layer isisthe function. The size reflects thefor number of thermocouples by one layers thesoftmax ReLu function, and thefilter activation function the fully connected layeraffected is the softmax combustor. Based experience, thenumber filter size set to 1 × 5. Based on theby cross-validation function. The filteron size reflects the ofisthermocouples affected one combustor.results, Based the on filter number is filter 16 in the two the size of the feature map is half that of experience, the sizefirst is set tolayers. 1 × 5. After Basedmax on pooling, the cross-validation results, the filter number is theinoriginal. the filter number set to 32the in the twofeature layers. map For the CNNthat design, this paper 16 the firstThus, two layers. After max is pooling, sizelast of the is half of the original. sets the parameters based empirical learning set to 0.1. Thus, the filter number is on set many to 32 in the last suggestions two layers. [30,31]. For theThe CNN design,rate thisispaper sets The the number of epochs formany learning is 200.suggestions Momentum[30,31]. is usedThe as alearning weight-updating strategy tonumber accelerate parameters based on empirical rate is set to 0.1. The of the training process [32].Momentum Usually, the momentum coefficient is recommended to the be training 0.9. L2 epochs for learning is 200. is used as a weight-updating strategy to accelerate regularization is usedthe asmomentum a strategy tocoefficient overcomeis the overfitting to problem The regularizer decay process [32]. Usually, recommended be 0.9. [32]. L2 regularization is used as set to 0.0001. aparameter strategy toisovercome the overfitting problem [32]. The regularizer decay parameter is set to 0.0001. Energies 2018, 11, 2149

Figure 13. 13. CNN architecture. Figure

4.2. CNN Performance 4.2. CNN Detection Detection Performance ANN and thethe twotwo most commonly used methods for building EGT model previous ANN andELM ELMwere were most commonly used methods for building EGTinmodel in works [13–15]. To evaluate the effectiveness of the CNN for hot component fault detection, the previous works [13–15]. To evaluate the effectiveness of the CNN for hot component fault detection, detection performance waswas compared between the ANN, parameters the detection performance compared between the ANN,ELM, ELM,and andCNN. CNN.The The model model parameters were determined via cross-validation and empirical suggestions. For the ANN design, a three-layer were determined via cross-validation and empirical suggestions. For the ANN design, a three-layer ANN was was used. used. The The neural neural node node number number in in the the hidden hidden layer layer was was 20. 20. The function for for the the ANN The activation activation function hidden layer was the sigmoid function, and the activation function for the output layer was the hidden layer was the sigmoid function, and the activation function for the output layer was the purelin purelin function. The learning rate 0.1. The number of epochs for learning was The 500.ELM The ELM has function. The learning rate was 0.1.was The number of epochs for learning was 500. has one one design parameter, the number of hidden neurons. Referring to [15], the number of hidden design parameter, the number of hidden neurons. Referring to [15], the number of hidden neurons neurons set to 1000. was set towas 1000. The five-fold five-foldcross-validation cross-validation was employed model training validation. The samples The was employed forfor model training and and validation. The samples were were randomly partitioned into five equal sizes subsamples. Of the five subsamples, a single randomly partitioned into five equal sizes subsamples. Of the five subsamples, a single subsample subsample retained as the validation and the remaining fourwere subsamples were used asdata. the was retainedwas as the validation data, and thedata, remaining four subsamples used as the training training data. Thewas training was used to obtain ANN, and The CNN models. The The training data used data to obtain the ANN, ELM,the and CNNELM, models. validation datavalidation was used data was used to evaluate the effectiveness of the three methods. The cross-validation process was to evaluate the effectiveness of the three methods. The cross-validation process was then repeated five then repeated five times, with each of the five subsamples used exactly once as the validation data. times, with each of the five subsamples used exactly once as the validation data. The detection result was the average of the five results. To obtain a more robust comparison, the five-fold cross-validation

Energies 2018, 11, 2149

11 of 18

Energies 2018, 11, x

11 of 18

was run 10 times, each time with different random splitting of five folds of the data. The CNN, result was the average of the five ANNThe anddetection ELM were implemented in Python 3.6.results. To obtain a more robust comparison, the fivefold cross-validation was run 10 times, each time with different random(ROC) splitting of five folds of the In this paper, accuracy (ACC), receiver operating characteristic curves, and Matthews data. The CNN, ANN and ELM were implemented in Python 3.6. correlation coefficient (MCC) were used as the detection performance measure for comparison. MCC is this paper, accuracy (ACC), receiver operating characteristic (ROC) curves, and Matthews defined asInEquation (7) [33]: correlation coefficient (MCC) were used as the detection performance measure for comparison. MCC is defined as Equation (7) [33]: TP × TN − FP × FN

MCC = p

( TN + FN ) × ( TNTP+FP TN)−× FP( TP  FN+ FN ) × ( TP + FP)

MCC =

(TN + FN )  (TN + FP )  (TP + FN )  (TP + FP )

(7)

(7)

Figure 14 shows one five-fold cross-validation comparison of ROCs for the three models. The Figure 14 shows one than five-fold ROCs fora the three models. The CNN model performs better the cross-validation ANN and ELMcomparison models. Toofperform quantitative comparison model better than the ANN andfor ELM models. ToROCs perform a quantitative of theCNN ROCs, theperforms area under ROC curve (AUC) each of the was calculated.comparison The ACC and ROCs, the area under curve (AUC)The for each of the was calculated. The ACC and MCCof forthe different models were ROC also calculated. means andROCs the standard deviations of the ACCs, MCC for different models were also calculated. The means and the standard deviations of the ACCs, AUCs, and MCCs of 10 random runs for the three models are shown in Table 1. As described in AUCs, and MCCs of 10 random runs for the three models are shown in Table 1. As described in Section 4.1, the dataset is seriously imbalanced, so the ACC does not clearly show the advantage Section 4.1, the dataset is seriously imbalanced, so the ACC does not clearly show the advantage of of thethe CNN. However, from the means and standard deviations of MCC and AUC, it can be seen CNN. However, from the means and standard deviations of MCC and AUC, it can be seen that that the CNN model’sdetection detection performance is significantly better ANN and ELM the CNN model’s performance is significantly better than thethan ANNthe model andmodel ELM model. model. The next section discusses why the CNN is better at detecting early faults in gas turbine The next section discusses why the CNN is better at detecting early faults in gas turbine hot hot components in detail. components in detail.

Figure14. 14.ROC ROC comparison. comparison. Figure Table 1. Detection results for different models.

Table 1. Detection results for different models.

Measures Measures Models Models

ANN ANN ELM ELM CNN

CNN 4.3. Detection Visualization

ACC ACC 0.994  0.0016 0.994±0.0009 0.0016 0.992

MCC MCC 0.709  0.0866 0.709 ± 0.0866  0.0736 0.564 0.564 ± 0.0736  0.0347 0.927

AUC 0.808  0.0920 ± 0.0920 0.686 0.808 0.0616 ± 0.0616 0.999 0.686 0.0014

0.998 ± 0.0007

0.927 ± 0.0347

0.999 ± 0.0014

0.992±0.0007 0.0009 0.998

AUC

A CNN is a black box model, and visualization can help us better understand how the CNN 4.3. Detection Visualization works well in the fault detection of gas turbine hot components.

A CNN a black model, visualization can ambient help us conditions. better understand The is EGT profilebox swirls with and different operating and The swirlhow anglethe willCNN worksvary well in the fault detection of gasconditions. turbine hot components. with the operating and ambient This phenomenon is clearly shown in Figure 15. The The EGTcolors profile swirlsdifferent with different operating and ambient conditions. The swirl different indicate EGT values. The higher the temperature, the brighter the angle color. will 15 shows the shift of the EGT profileThis is the same as that is of clearly power. For example, the 15. vary Figure with the operating and trend ambient conditions. phenomenon shown in Figure generated power rises at about 550 min. At the same time, the location of the highest EGT value shifts The different colors indicate different EGT values. The higher the temperature, the brighter the the 15 16th thermocouple the of 14th the generated power drops the color.from Figure shows the shift to trend thethermocouple. EGT profile When is the same as that of power. Fortoexample, the generated power rises at about 550 min. At the same time, the location of the highest EGT value shifts from the 16th thermocouple to the 14th thermocouple. When the generated power drops to

Energies 2018, 11, 2149

12 of 18

Energies 2018, 11, x

12 of 18

Energies 2018, 11, x

12 of 18

the original level,the the location of highest the highest EGT also value alsoback. comes Theoflocation other EGT original level, location of the EGT value comes Theback. location other EGTofvalues values shows a similar phenomenon. It should be noted that the speed and IGV angle remain basically shows a similar phenomenon. It should be noted that the speed and IGV angle remain basically original level, the location of the highest EGT value also comes back. The location of other EGT values constant during time. shows a similar phenomenon. It should be noted that the speed and IGV angle remain basically constant during thisthis time. constant during this time.

Figure profileswirl swirlunder under different power. Figure15. 15.EGT EGT profile different power. Figure 15. EGT profile swirl under different power.

Figure 4 and the specific example of theprofiles EGT profiles under operating Figures 4 and 16Figure show16 theshow specific example of the EGT swirlswirl under operating andand ambient ambientFigure conditions. EGT profile a isthe measured 121.56 MW, profile b isunder measured underand 4 and Figure 16 show specificunder example of the EGTEGT profiles swirl operating conditions. EGT profile a is measured under 121.56 MW, EGT profile b is measured under 148.92 MW, 148.92 MW,conditions. and EGT profile c is measured under 181.87 MW.MW, TheEGT EGTprofile profilebswirls about one ambient EGT profile a is measured under 121.56 is measured under and EGT profile c is measured under 181.87 MW. The EGT profile swirls about one thermocouple when thermocouple when power is increased from 121.56 MW to 148.92 MW, and the EGT profile swirls 148.92 MW, and EGT profile c is measured under 181.87 MW. The EGT profile swirls about one power is increased from MW to 148.92 MW,121.56 and the EGT profile swirls about two thermocouples about two thermocouples when is increased from 121.56 MW toMW, 181.87 MW. EGT profiles thermocouple when121.56 power ispower increased from MW to 148.92 and theThe EGT profile swirls when is increased from 121.56 MW to 181.87 MW. The EGT profiles a, b, and c are all measured a, power b, and c are all measured under normal conditions. EGT profiles a, b, and c are examples to about two thermocouples when power is increased from 121.56 MW to 181.87 MW. The EGT profiles under normal conditions. EGT profiles a, b, and c are examples to illustrate the CNN detection, illustrate thecCNN detection, andunder they are unfolded in Figure 17.profiles The CNN architecture model to a, b, and are all measured normal conditions. EGT a, b, and c areand examples are same as in Section 4.1. CNN and parameters they are unfolded in Figure 17. they The architecture and parameters areand themodel same as illustrate the the CNN detection, and are unfolded in Figure 17.model The CNN architecture parameters are the same as in Section 4.1. in Section 4.1.

(a ) (a )

(b) (b)

Figure 16. EGT profiles swirl: (a) EGT profiles under different power; and (b) EGT profile c is manually forEGT two thermocouples. Figure rotated 16. EGTcounter-clockwise profiles swirl: (a) profiles under different power; and (b) EGT profile c is Figure 16. EGT profiles swirl: (a) EGT profiles under different power; and (b) EGT profile c is manually manually rotated counter-clockwise for two thermocouples.

rotated counter-clockwise for two thermocouples.

Energies 2018, 11, x Energies 2018, 11, 2149

13 of 18 13 of 18

Energies 2018, 11, x

13 of 18

Figure17. 17.Unfolded Unfolded EGT Figure EGT profiles. Figure 17. Unfolded EGTprofiles. profiles.

Figure 18 shows the 16 filters in the first convolution layer. The CNN extracts the features which

Figure 18 shows shows the the 16 16 filters filters in in the first first convolution layer. layer. The The CNN CNN extracts the the features features which which Figure 18 can be used to distinguish betweenthe normal convolution and abnormal classes. After theextracts first convolution layer, can be be used to distinguish between normal and abnormal classes. After the first convolution layer, can used to distinguish between normal and abnormal classes. After the first convolution the feature maps of EGT profiles a, b, and c can be seen in Figure 19. Different filters can be used tolayer, the feature maps of EGT profiles a, b, and c can be seen in Figure 19. Different filters can be used to the feature maps of EGT profiles b, and c can bebyseen in Figure Different filters can be used to perceive different features. Thea, feature extracted the second filter19. is shown as an example. In the perceive different features. The feature feature extracted by the the second second filter is is However, shown as asdue an example. example. In the the firstdifferent EGT profile a, the feature mainly appears at the 22nd thermocouple. to the swirl,In perceive features. The extracted by filter shown an first EGT profile a, the feature mainly appears at the 22nd thermocouple. However, due to the swirl, the feature mainly appears at the 21st thermocouple in the second EGT profile b, and the feature first EGT profile a, the feature mainly appears at the 22nd thermocouple. However, due to the swirl, mainlymainly appears at the at20th thermocouple in theinthird EGTsecond profile c. The features the feature feature mainly appears at 21st thermocouple insecond the EGT profile and thealso feature the appears thethe 21st thermocouple the EGT profile b,other andb, the feature mainly demonstrate similar phenomena. The results show that the effect of EGT profile swirl makes the key mainly appears at the 20th thermocouple in EGT the third profile The other also appears at the 20th thermocouple in the third profileEGT c. The otherc.features also features demonstrate feature appear in phenomena. different locations, and theshow convolution operation successfully perceives the key demonstrate similar The results that the effect of EGT profile swirl makes the key similar phenomena. The results show that the effect of EGT profile swirl makes the key feature appear features. feature appear in different locations, and operation the convolution operation successfully in different locations, and the convolution successfully perceives the key perceives features. the key features.

Figure 18. Filters in the first convolution layer.

Figure the first first convolution convolution layer. Figure 18. 18. Filters Filters in in the layer.

Energies 2018, 11, 2149

14 of 18

Energies 2018, 11, x

14 of 18

Figure 19. Feature map after the first convolution layer. Figure 19. Feature map after the first convolution layer.

After the pooling layer, the location information of the key feature is eliminated. As long as the

After theexists, pooling layer, the location of theFigure key feature is the eliminated. Asafter longthe as the feature it will be reflected in theinformation final feature map. 20 shows feature map global average layer.inInthe thefinal normal EGTmap. profile, some20key features can be perceived. feature exists, it will pooling be reflected feature Figure shows the feature map after the no features been detected the abnormal EGT profilecan d (Figure 21), so However, CNN globalHowever, average pooling layer.have In the normal EGTinprofile, some key features be perceived. successfully judge EGT profile d as an anomaly. Therefore, the CNN can detect a specific feature no features have been detected in the abnormal EGT profile d (Figure 21), so CNN successfully judge wherever it appears in the EGT profile. The specific feature refers to the key feature used to EGT profile d as an anomaly. Therefore, the CNN can detect a specific feature wherever it appears in distinguish between normal and abnormal classes. This is the reason why a CNN can solve the EGT the EGT profile. The specific feature refers to the key feature used to distinguish between normal and profile swirl problem and improve the sensitivity of fault detection of gas turbine hot components. abnormal classes. This is the reason why a CNN can solve the EGT profile swirl problem and improve the sensitivity of fault detection of gas turbine hot components.

Energies 2018, 11, x

15 of 18

Energies 2018, 2018, 11, 2149 Energies 11, Energies 2018, 11,xx

15 of 18 15 15 of of 18 18

Figure Feature map after globalaverage average pooling. Figure Feature map after pooling. Figure 20.20. Feature global average pooling. Figure 20. Featuremap mapafter after global global average pooling.

Figure 21. An abnormal Figure 21.An Anabnormal abnormalEGT EGT profile. Figure 21. EGTprofile. profile. Figure 21. An abnormal EGT profile. 4.4. in 4.4.Improvement Improvement inCircular-Padding Circular-Padding 4.4. Improvement in Circular-Padding 4.4. Improvement in Circular-Padding In In this this section, section,the the detection detection results results of of three three different different padding padding methods methods were were compared. compared.The The In this section, the detection results of three different padding methods were compared. The three three padding methods are: no-padding, zero-padding, and circular-padding. The CNN architecture three padding methods are: no-padding, zero-padding, and circular-padding. The CNN architecture In this section,are: the detection results of three different padding methodsThe were compared. The padding methods no-padding, zero-padding, and circular-padding. CNN architecture and model parameters are the only difference method. A and modelmethods parameters are the same sameas asin in Section Section4.1. 4.1.The The only differenceis isthe the padding padding method. A three padding are: no-padding, zero-padding, and circular-padding. The CNN architecture and model parameters areused the before same in Section 4.1.operation. The onlyACC, difference is the padding method. padding operation every convolution MCC, AUC were padding operationwas was used beforeas every convolution operation. ACC, MCC, and and AUC were used used and model parameters are the same as in Section 4.1. The only difference is the padding method. A A padding operation was usedmeasure. before As every convolution operation. ACC, MCC, andisisAUC as performance 22, in maps useful asthe thedetection detection performance measure. Ascan canbe beseen seenin in Figure Figure 22,padding padding infeature feature maps usefulwere padding operation wasperformance used beforemeasure. every convolution ACC, MCC, and AUC were maps used the is not for hot component usedfor as improving the detection As can beoperation. in Figure 22, padding feature for improving thedetection detectionperformance. performance.Zero-padding Zero-padding isseen notsuitable suitable forgas gasturbine turbine hotin component as the detection performance measure. As can be seen in Figure 22, padding in feature maps is useful fault detection because would the information faultfor detection because would introduce introduce the wrong wrong information on on the border. border. The circularis useful improving theititdetection performance. Zero-padding is the not suitableThe forcirculargas turbine for improving the detection performance. Zero-padding is not suitable for gas turbine hot padding method detects better than the other methods owing to the reasonable addition of the border padding method detects better than the other methods owing to the reasonable addition of the border hot component fault detection because it would introduce the wrong information on component the border. fault detection because it would introduce the wrong information on the border. The information. information. The circular-padding method detects better than the other methods owing to the reasonable circularaddition

padding method detects better than the other methods owing to the reasonable addition of the border of the border information. information.

Figure Figure22. 22.Comparison Comparisonof ofdifferent different padding padding methods. methods.

5. 5.Conclusions Conclusions Figure 22. of methods. In used for of turbine components Figure 22. Comparison Comparison ofdifferent different padding methods. In this thispaper, paper,aamethod method used for fault faultdetection detection ofgas gaspadding turbinehot hot componentsbased basedon on aaCNN CNN is developed. The hot component fault is not the unique factor that causes the discrepancy is developed. The hot component fault is not the unique factor that causes the discrepancybetween between 5. varying varyingEGT EGTvalues. values.Different Differentoperating operatingand andambient ambientconditions conditionscan canalso alsocause causethe theEGT EGTdiscrepancy. discrepancy. 5. Conclusions Conclusions

In In this this paper, paper, aa method method used used for for fault fault detection of gas turbine hot components based based on on a CNN CNN is is developed. developed. The The hot hot component component fault fault is is not not the the unique unique factor factor that that causes causes the the discrepancy discrepancy between between varying varying EGT EGT values. values. Different Different operating operating and andambient ambientconditions conditionscan canalso alsocause causethe theEGT EGTdiscrepancy. discrepancy. To detect hot component faults as early as possible, and improve the detection performance, the key

Energies 2018, 11, 2149

16 of 18

problem is how to eliminate the influence of various factors on the EGT profile. In this paper, the effect of the EGT profile swirl under different operating and ambient conditions is studied in detail. It is found that the abnormal information is contained in several adjacent EGT values rather than the global EGT profile, and the EGT profile swirl reflects the shift of some key features. CNN which has the properties of local perception and shared weights could effectively extract the abnormal information from the EGT profile. Moreover, pooling operation eliminates the location information of the key features. Therefore, CNN is robust to the feature scale, shift, and distortion. These characteristics make CNN suitable for solving the problem caused by EGT profile swirl in the fault detection of gas turbine hot components. To fully extract the information in the EGT profile, according to the EGT thermocouples with circular distribution, this paper develops the circular-padding method in a CNN. In circular-padding, the last EGT thermocouple readings are added to the front of the first EGT thermocouple readings, and the first EGT thermocouple readings are added to the back of the last EGT thermocouple readings. The experiment results on the real-world gas turbine data indicate that the fault detection for gas turbine hot components based on CNN is more accurate than the typical methods. The CNN visualization results explain the reason why CNN is suitable for hot components fault detection. It is proved that the effect of the EGT profile swirl is the key feature which can distinguish between normal and abnormal classes appear in different locations, and the convolution operation successfully perceives the key features. Then by the pooling operation, the position information of the key features is eliminated. Some key features can be perceived in the normal EGT profile, whereas no features can be detected in the abnormal EGT profile. CNN extracts the information between adjacent EGT values and eliminates the impact of EGT profile swirl. The experiment results also show that the circular-padding method is better than other typical padding methods. Author Contributions: Conceptualization, J.L., J.F.L. and D.Y.; Methodology, J.L. and M.K.; Data Curation, W.Y.; Supervision, Z.W. and M.G.P. Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations EGT CNN PHM C0 C1 C2 u α1 β1 α2 β2 FAR MAR yj bj wij xi T3 T4 πt ηt k

Exhaust gas temperature Convolutional neural network Prognostics and health management Static blade inlet velocity Static blade outlet velocity Moving blade outlet velocity Turbine speed Static blade outlet airflow angle Static blade outlet relative velocity angle Moving blade outlet airflow angle Moving blade outlet relative velocity angle False alarm rate Missing alarm rate Output feature map Bias Convolutional kernel Input feature map Combustor outlet temperature Exhaust gas temperature Expansion ratio Turbine efficiency Isentropic exponent

Energies 2018, 11, 2149

Φit Φcj Φd A σ TP TN FP FN ACC ROC MCC AUC

17 of 18

Angular position of the i-th thermocouple Angular position of the j-th combustor Swirl angle of hot gas Amplitude of the function Constant parameter that characterizes the influence of the combustor True positive True negative False positive False negative Accuracy receiver operating characteristic Matthews correlation coefficient Area under ROC curve

References 1. 2. 3.

4. 5. 6. 7.

8. 9. 10. 11. 12. 13. 14. 15.

16.

Zaidan, M.A.; Relan, R.; Mills, A.R.; Harrison, R.F. Prognostics of Gas Turbine Engine: An Integrated Approach. Expert Syst. Appl. 2015, 42, 8472–8483. [CrossRef] Lu, F.; Jiang, C.; Huang, J.; Wang, Y.; You, X. A Novel Data Hierarchical Fusion Method for Gas Turbine Engine Performance Fault Diagnosis. Energies 2016, 9, 828. [CrossRef] Tahan, M.; Tsoutsanis, E.; Muhammad, M.; Karim, Z.A.A. Performance-Based Health Monitoring, Diagnostics and Prognostics for Condition-Based Maintenance of Gas Turbines: A Review. Appl. Energy 2017, 198, 122–144. [CrossRef] Lu, J.; Huang, J.; Lu, F. Sensor Fault Diagnosis for Aero Engine Based on Online Sequential Extreme Learning Machine with Memory Principle. Energies 2017, 10, 39. [CrossRef] Volponi, A.J. Gas Turbine Engine Health Management: Past, Present and Future Trends. J. Eng. Gas. Turbines Power-Trans. ASME 2014, 136, 051201. [CrossRef] General Electric Company. Heavy Duty Gas Turbine Monitoring & Protection; General Electric Company: Las Vegas, NV, USA, 2015. Miller, K.W.; Iasillo, R.J.; Lemmon, M.F. Method of Monitoring for Combustion Anomalies in A Gas Turbomachine and A Gas Turbomachine Including A Combustion Anomaly Detection System. U.S. Patent 20,150,267,591, 24 September 2015. Yu, L.; Jin, J. Study on Deflection Laws of 9FA Gas-Turbine Exhaust Temperature Field. Electr. Power Constr. 2009, 3, 26. Gulen, S.C.; Griffin, P.R.; Paolucci, S. Real-time on-line Performance Diagnostics of Heavy-duty Industrial Gas Turbines. J. Eng. Gas. Turbines Power-Trans. ASME 2002, 124, 910–921. [CrossRef] Liu, J. The Research for Exhaust Temperature Anomaly Detection in Gas Turbine. Master’s Thesis, Harbin Institute of Technology, Harbin, China, 2014. Song, Y.X.; Zhang, K.X.; Shi, Y.S. Research on Aeroengine Performance Parameters Forecast Based on Multiple Linear Regression Forecasting Method. J. Aerosp. Power 2009, 24, 427–431. Yılmaz, I. Evaluation of the relationship between exhaust gas temperature and operational parameters in CFM56-7B engines. Proc. Inst. Mech. Eng. Part G-J. Aerosp. Eng. 2009, 223, 433–440. [CrossRef] Tarassenko, L.; Nairac, A.; Townsend, N.; Buxton, I.; Cowley, Z. Novelty detection for the identification of abnormalities. Int. J. Syst. Sci. 2000, 31, 1427–1439. [CrossRef] Kenyon, A.D.; Catterson, V.M.; McArthur, S.D.J. Development of An Intelligent System for Detection of Exhaust Gas Temperature Anomalies in Gas Turbines. Insight 2010, 52, 419–423. [CrossRef] Yan, W.; Yu, L. On Accurate and Reliable Anomaly Detection for Gas Turbine Combustors: A Deep Learning Approach. In Proceedings of the 2015 Annual Conference of the Prognostics and Health Management Society, Austin, TX, USA, 18–22 October 2015. Yan, W. One-Class Extreme Learning Machines for Gas Turbine Combustor Anomaly Detection. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016.

Energies 2018, 11, 2149

17. 18. 19. 20. 21. 22. 23. 24.

25. 26. 27. 28. 29. 30. 31.

32. 33.

18 of 18

Zhang, Y.; Bingham, C.; Garlick, M.; Gallimore, M. Applied Fault Detection and Diagnosis for Industrial Gas Turbine Systems. Int. J. Auto. Comput. 2017, 14, 463–473. [CrossRef] Basseville, M.; Benveniste, A.; Mathis, G.; Zhang, Q. Monitoring the combustion set of a gas turbine. IFAC Symp. Fault Detect. Superv. Saf. Tech. Process. SAFEPROCESS 1994, 1, 397–402. [CrossRef] Zhang, Q.; Basseville, M.; Benveniste, A. Early warning of slight changes in systems. Automatica 1994, 30, 95–113. [CrossRef] Medina, P.; Saez, D.; Roman, R. On Line Fault Detection and Isolation in Gas Turbine Combustors. In Proceedings of the ASME Turbo Expo 2008: Power for Land, Sea, and Air, Berlin, Germany, 9–13 June 2008. Liu, J.F.; Liu, J.; Wan, J.; Wang, Z.; Yu, D. Early Fault Detection of Hot Components in Gas Turbines. J. Eng. Gas. Turbines Power-Trans. ASME 2017, 139, 021201. Jin, K.H.; Mccann, M.T.; Froustey, E.; Unser, M. Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 2017, 26, 4509–4522. [CrossRef] [PubMed] Dong, C.; Chen, C.L.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [CrossRef] [PubMed] Luong, T.X.; Kim, B.K.; Lee, S.Y. Color Image Processing Based on Nonnegative Matrix Factorization with Convolutional Neural Network. In Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, China, 6–11 July 2014. Liu, Y.; Chen, X.; Peng, H.; Wang, Z. Multi-focus image fusion with a deep convolutional neural network. Inf. Fusion 2017, 36, 191–207. [CrossRef] Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [CrossRef] Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv, 2013, arXiv:1312.4400. Zhang, Q. Contribution à la Surveillance de Procédés Industriels. Ph.D. Thesis, University of Rennes 1, Rennes, France, 1991. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv, 2014, 1409, 1556. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 International Conference on Computer Vision (ICCV15), Santiago, Chile, 11–18 December 2015. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. Matthews, B.W. Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442–451. [CrossRef] © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).