MODEL BASED ACTUATOR CONTROL LOOP FAULT DETECTION

Jérôme Lacaille, Senior Algorithm Expert, Snecma, Moissy Cramayel, France, [email protected]
Rostand Nya Djiki, Hispano Suiza, Moissy Cramayel, France, [email protected]
ABSTRACT
On turbojet compressors, actuators automatically control the variable geometry to optimize the engine efficiency, depending on the flight regime, the external conditions and the command inputs. Located on the actuators, several position sensors track the real movement of the hydraulic jacks. As engine manufacturers we developed an algorithm that detects abnormalities of the control loop. This algorithm analyzes specific state parameters describing the joint behavior of the observed signals. An inline temporal model depending on external exogenous measurements models the sensor position. The parameters of this model capture the relationship between position, command and context measurements. This methodology is used during operational flights for instant abnormality detection, but also on a long-term scale for condition-based maintenance.

NOMENCLATURE
VSV    Variable Stator Vane
FADEC  Full Authority Digital Engine Control
ARX    Auto-Regressive model with eXogenous variables
LVDT   Linear Variable Differential Transformer
AIC    Akaike Information Criterion
FR     Flight Regime
DMD    Demand
WFM    Fuel flow
MSE    Mean Square Error
SPC    Statistical Process Control
INTRODUCTION
The algorithm is based on the assumption that a crude model of the control loop may catch important behavior changes. Very specific models may detect smaller effects but will lack robustness. Moreover, one only seeks local rough tendencies to follow in time. The control process is modeled by a linear autoregressive equation with exogenous inputs (ARX), which corresponds to a rational filter in the frequency domain. Then the algorithm uses statistical control charts on indicators built from functions of those filter parameters. Depending on the different flight regimes (FR), a selection procedure automatically computes, during the engine reception ground tests, the minimal set of external inputs and each of their needed historical horizons. In operational conditions, the embedded code continuously computes, on successive intervals of time, a new set of parameters according to the model type associated with the current FR.
The model, a rational filter, is designed as a set of ratios of polynomials. It is identified by its zeros and poles in the complex plane. The coordinates of those points are stochastically dependent and may not be used directly. Moreover, some specific indicators such as local energy variation may be preferred. A compression algorithm extracts independent indicators and normalizes them into scores of known distributions. Scores are finally monitored with control charts and thresholds that have been predefined or estimated during the engine ground tests. When using this same algorithm for conditional maintenance, a new filter is computed only once for each flight cycle during a predefined standard condition, and the monitoring procedure is applied at flight-cycle time scale. The state of the art in control-loop monitoring for aircraft engine systems is described in Diez-Lledo et al. (2007). The goal of this article is to give some more mathematical elements to achieve an effective realization of such monitoring. Equivalent solutions are well known in other industrial domains, especially in highly instrumented manufacturing processes such as semiconductors. See for example the AEC/APC courses in Lacaille & Zagrebnov (2006) and the FGP (Flexible Golden Pattern) built for SPC purposes in Lacaille & Zagrebnov (2007).
DESCRIPTION
System model
The physical system of variable stator vanes (VSV) that controls the variable geometry of the turbofan engine consists of two hydraulic actuators mechanically linked together. Each actuator position y is measured with an LVDT position sensor (Linear Variable Differential Transformer) and is controlled by a servovalve whose input is denoted u. The command depends on the engine operating conditions and is computed through a feedback loop described in Figure 1. Knowing the cylinder volume V, the piston area A, the fluid density ρ and bulk modulus κ, the equivalent load mass m, the load stiffness K and damping c, as well as some measurement of the external and internal leakage L, it becomes possible to define an approximation of the relation between the command u (with a gain b) and the observed displacement y (equation (1)).

\left[\frac{\rho V m}{\kappa A}\right]\frac{d^3 y}{dt^3}
+ \left[\frac{\rho V c}{\kappa A}+\frac{L m}{A}\right]\frac{d^2 y}{dt^2}
+ \left[A\rho+\frac{\rho V K}{\kappa A}+\frac{L c}{A}\right]\frac{dy}{dt}
+ \left[\frac{L K}{A}\right] y = b u    (1)
This analytical continuous formulation shows that it seems reasonable to approximate the temporal behavior with a discrete and autoregressive stationary process. The numeric regression (see Aoki (1990) or Lacaille (1998) for example) leads to a stochastic equation (2) with quantified time t:

y(t) = \sum_{i=1}^{r_0} a_i\, y(t-i) + \sum_{j=1}^{k}\sum_{i=0}^{r_j} b_{j,i}\, x_j(t-i)    (2)
where the variables x_j regroup all k external inputs, including the command u and other observations related to the system (see x in equation (9)). Let a = (a_0 = 1, -a_i)_{i=1\dots r_0} and b = (b_{j,i})_{j=1\dots k,\; i=0\dots r_j} be the vectorized parameters of equation (2). The vector parameters a and b are called state parameters because they describe the relation between context, command and observed displacement. Classical regression methods are used to estimate the parameters from observations. The first challenge is to identify the order of the system: the integers r_0 through r_k.
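As an illustration, the regression of equation (2) can be set up as an ordinary least-squares problem. The sketch below (our own helper, with NumPy assumed; names and conventions are not from the paper) stacks lagged values of y and of the inputs x_j into a design matrix:

```python
import numpy as np

def fit_arx(y, X, r0, r):
    """Least-squares estimate of the state parameters of equation (2).

    y  : (N,) observed position signal
    X  : (N, k) exogenous inputs (command u and context measurements)
    r0 : autoregressive rank; r : list of the k input ranks r_j
    Returns a = (a_1..a_r0) and b = [arrays b_j of length r_j + 1].
    """
    N, k = len(y), X.shape[1]
    p = max([r0] + r)                          # deepest lag needed
    rows, target = [], []
    for t in range(p, N):
        row = [y[t - i] for i in range(1, r0 + 1)]
        row += [X[t - i, j] for j in range(k) for i in range(r[j] + 1)]
        rows.append(row)
        target.append(y[t])
    theta, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(target), rcond=None)
    a, b, pos = theta[:r0], [], r0
    for j in range(k):
        b.append(theta[pos:pos + r[j] + 1])
        pos += r[j] + 1
    return a, b
```

On a noiseless simulated signal the parameters are recovered exactly; in operational data the estimate is only a crude local state, which is all the monitoring requires.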
Figure 1: Schematic view of the control loop for an actuator and its monitoring algorithm.

Temporal behavior
For our purposes, this temporal regression realizes a crude estimation of the position y(t). The interpretation of the model is better if one stabilizes the process around some stationary point. A different model is estimated for each type of stationary phase belonging to a flight cycle. In general this happens during some specific stable flight regimes. After identification of such flight regimes, a new model is computed periodically and a temporal multivariate signal [a(n), b(n)]_FR is acquired. The time index n corresponds to the periodic observation t(n) = t_0 + nΔt. (The index n may also be a flight number for long-term maintenance purposes.) In practice, for each index n, a new estimate of the parameters is computed on a time interval I_n around date t(n).
METHODOLOGY
The aim of the monitoring is to detect abnormalities of the control loop by observing tendencies of the state parameters [a(n), b(n)]. A crude estimation of those parameters is sufficient because we are more interested in state trends than in the quality of the regression. Moreover, a fine model may easily be over-parameterized, leading to fast adjustments and brutal changes of the model behavior due to non-local variations. To ensure even more robustness to context variation we also exploit external exogenous measurements like local rotation speeds as model inputs (some x_j variables).

Computation of stochastic indicators
Let F_j(w) be the polynomial fraction describing the model:

F_j(w) = \frac{B_j(w)}{A(w)}, \quad\text{with polynomials}\quad
\begin{cases} B_j(w) = b_{j,0} + b_{j,1} w + \dots + b_{j,r_j} w^{r_j} \\ A(w) = 1 - a_1 w - \dots - a_{r_0} w^{r_0} \end{cases}    (3)
Loss ratio: The most evident criterion for testing the adequacy of a model to data is to measure the mean square error (MSE) between the real position y and the model estimation ŷ (see equation (11) for the relation between likelihood and MSE), and to compare this error with the one obtained during ground calibration on the reference filter.
e_0 = \frac{\mathrm{MSE}}{\mathrm{MSE}_0}    (4)
This criterion tests the adjustment accuracy to a model similar in shape (in linearity and sizes) to the reference one. In fact e_0 may be less than 1 if the new model is locally better adjusted than during bench tests. But a high value always means that something is wrong.
Energy variations: Unique characteristics of rational filters are their zeros (roots of B_j) and poles (roots of A). A statistical control chart may follow those values, looking for threshold crossings and trend detection. (Some roots are complex numbers, but all polynomials are real, so each complex root has a conjugate and one follows only real parts and positive imaginary parts.) The experts generally prefer the use of weighted combinations that easily identify physical failures. Below is an example of automatic computation of such an indicator when no expertise is available.
A polynomial is fully represented by all its roots, but observing the control loop through all filter poles and zeros may be excessive. It is somewhat better to monitor the signals around some specific frequency intervals that have a real meaning, and not necessarily over their whole spectrum. Each rational filter is applied to each one of the k exogenous processes. To get rid of bad non-local estimations (away from the current frequency variations of each external process) one focuses the survey on specific frequencies. For each exogenous variable x_j, we compute the power spectral density |X_j(w)|^2 by applying a Fourier transform to the autocorrelation R_{jj}(τ) of the input signal:

|X_j(w)|^2 = \int R_{jj}(\tau)\, e^{-i w \tau}\, d\tau, \quad\text{with}\quad R_{jj}(\tau) = E\left[x_j(t)\, x_j(t-\tau)\right]    (5)
This computation is done once on a wide enough time interval. It will be readjusted only after a maintenance event able to change the system behavior during any specific FR. (Practical computations will use the discrete fast Fourier transform, FFT.) In general such a computation is done at a ground reception test or a maintenance shop visit. Then the current filter output F_j(w) is compared to a reference filter F_j^0(w) that was also learnt during maintenance or reception tests. This leads to a vector e = (e_j)_{j=1\dots k} of aggregate indicators computed by weighted integration of each filter over the main frequency bandwidth of each input:

e_j = \frac{1}{2\pi} \int \left| F_j(w) - F_j^0(w) \right|^2 |X_j(w)|^2\, dw    (6)
Each e_j may be interpreted as a variation of correlation energy for the part of the position signal y affected by the input x_j. In fact F_j(w)\,|X_j(w)|^2 corresponds to the cross power spectral density of the signal x_j and the same signal filtered by F_j (Fourier transform of the cross-correlation). Taking the square of the filter instead gives the power spectral density of each filtered signal. Hence the integral of the difference of squares is effectively a variation of energy and can also be used as another indicator. (For computation efficiency, instead of computing the energy integral, one may focus only on the main response frequencies of each power spectrum |X_j(w)|^2.)
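A minimal sketch of the indicator of equation (6), assuming a crude periodogram estimate of |X_j(w)|^2 and evaluating the rational filters on an FFT grid (function and variable names are ours, not from the paper):

```python
import numpy as np

def spectral_indicator(a, b_j, a0, b0_j, x_j, nfft=512):
    """Indicator e_j of equation (6): energy of the filter change
    F_j - F_j^0, weighted by the input power spectral density |X_j|^2.

    a, b_j   : current AR and input-j coefficients (a excludes a_0 = 1)
    a0, b0_j : reference filter coefficients
    x_j      : a record of the exogenous input, used for the PSD estimate
    """
    w = np.exp(-2j * np.pi * np.arange(nfft) / nfft)    # unit-circle grid
    def F(aa, bb):
        # A(w) = 1 - a1 w - ... - ar w^r ; B(w) = b0 + b1 w + ...
        A = 1.0 - np.polyval(np.r_[aa[::-1], 0.0], w)
        B = np.polyval(bb[::-1], w)
        return B / A
    # crude periodogram estimate of |X_j(w)|^2
    psd = np.abs(np.fft.fft(x_j - x_j.mean(), nfft)) ** 2 / len(x_j)
    diff = np.abs(F(a, b_j) - F(a0, b0_j)) ** 2
    return np.mean(diff * psd)            # (1/2π)∫ ≈ mean over the FFT grid
```

When the current and reference filters coincide the indicator is exactly zero; any drift of the filter inside the input's main frequency band makes it grow.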
Computation of scores
The inputs x may be correlated, but the indicators must be studied as a whole. This is implemented by a global z-score computed from normalized residuals of the cross projection of each indicator on the space spanned by the others. (This algorithm is described in more detail in Lacaille (2009).) Suppose that the indicators are summarized by the vector e. One first builds a regression ê_j of each component e_j on the other ones:

\hat e_j = E\left[ e_j \mid e_i ;\, i \neq j \right] \approx c_{j,0} + \sum_{i \neq j} c_{j,i}\, e_i    (7)
where the conditional expectation is estimated (in this example) by a linear regression. The global z-score is the Mahalanobis distance of the residuals z_j = e_j - \hat e_j. Let z = (z_j)_{j=1\dots k} and compute a mean reference \bar z of z during the test phase, as well as an estimation of the covariance matrix Σ of the residuals. The global score Z^2 is:

Z^2 = (z - \bar z)'\, \Sigma^{-1} (z - \bar z)    (8)
This positive indicator is supposed to have a χ²(k) distribution and detects global behavior abnormalities. Following each local normalized indicator \tilde z_j = (z_j - \bar z_j)/\sigma_j, where σ_j is the jth diagonal element of Σ, helps drill down to the local behavior corresponding to a specific input x_j.
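The score construction of equations (7)-(8) can be sketched as follows (a simplified illustration with our own function name; the paper's actual implementation is in Lacaille (2009)):

```python
import numpy as np

def global_score(E, e_new):
    """Global z-score of equations (7)-(8).

    E     : (P, k) buffer of indicator vectors from the calibration phase
    e_new : (k,) current indicator vector
    Returns Z2 (global score) and the normalized local scores z_tilde.
    """
    P, k = E.shape
    # learn, for each j, the linear regression of e_j on the other components (eq. 7)
    coefs = []
    for j in range(k):
        A = np.c_[np.ones(P), np.delete(E, j, axis=1)]
        c, *_ = np.linalg.lstsq(A, E[:, j], rcond=None)
        coefs.append(c)
    def residuals(M):
        R = np.empty_like(M, dtype=float)
        for j in range(k):
            A = np.c_[np.ones(len(M)), np.delete(M, j, axis=1)]
            R[:, j] = M[:, j] - A @ coefs[j]
        return R
    Z = residuals(E)                       # calibration residuals
    z_bar = Z.mean(axis=0)
    Sigma = np.cov(Z, rowvar=False)
    d = residuals(np.atleast_2d(e_new))[0] - z_bar
    Z2 = float(d @ np.linalg.pinv(Sigma) @ d)      # Mahalanobis distance, eq. (8)
    z_tilde = d / np.sqrt(np.diag(Sigma))
    return Z2, z_tilde
```

The pseudo-inverse guards against a near-singular covariance when the indicators are strongly correlated; an indicator vector close to the calibration cloud yields a small Z², an outlier a large one.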
CALIBRATION
During endurance tests the engine is monitored and the recorded data serve to calibrate the thresholds of the FR detection algorithm, but also the reference values of the system parameters. The sensors used in our case are the rotation speeds of the HP shaft N2 and LP shaft N1, the downstream compressor pressure P (PS3), the fuel flow W (WFM) and the valve command u (DMD). The actuator position output y (LVDT) is read from the controller (FADEC). Hence the input x is the following vector of dimension 5:

x = (N_1, N_2, P, W, u)    (9)
The calibration of an autoregressive model needs two main pretreatment phases: the identification of each flight regime, and the computation of the ranks of system (2) (the sizes r_j for each input signal and the autoregressive rank r_0). Once this is done and the current FR identified, a learning of each regression parameter a_i and b_{j,i} is launched. It involves a classical regression method for ARX processes. Before model rank analysis and calibration, but after the flight regime identification, each input and observation data segment is normalized according to some exogenous context information (like altitude or mean rotation speed) to match standardized measurements (Figure 2). This helps smooth small variations in external conditions (see the CNR algorithm in Lacaille (2009)).
Figure 2: Synoptic of the algorithm calibration process (FR identification → input data normalization → rank identification → reference model calibration).
Flight regimes identification
Figure 4 shows a schematic description of the different stabilized flight phases in terms of HP-shaft speed variations. Transient phases like acceleration, deceleration and reverse are not stationary phases and cannot be used here. The monitoring of the actuators needs some clear stabilization of the engine to help realize the model hypothesis of a stationary stochastic process. Each stationary phase is defined by an almost stable N2 observation. A first computation automatically detects the stable parts (segments) of the RPM signal according to a minimum duration and a maximum tolerance. Once such time segments are identified, a classification code labels each segment as belonging to a specific FR category. The flight regimes are essentially defined by three parameters: the minimum duration, a reference value to classify the segment, and a tolerance. In fact no other measurements are needed here (such as altitude or plane attitude) because one essentially wants to calibrate the algorithm on the ground. After classification of a new time segment, a quality Q_FR is computed as the likelihood of belonging to the corresponding class. A segment that is far from the reference measurement will have a poor quality even if classified as belonging to this FR.

Figure 3: FR identification process (detect stable segments on N2 → classify into FR → compute quality).
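The stable-segment detection and FR classification described above can be sketched as follows (a minimal illustration: the running-mean stability test and the linear quality are our own simplifications, not the exact embedded implementation):

```python
import numpy as np

def stable_segments(n2, min_len, tol):
    """Detect the stable parts of the N2 signal: maximal runs where the
    signal stays within +/- tol of the running segment mean, of length
    at least min_len samples. Returns (start, end) pairs, end exclusive."""
    segments, start = [], 0
    for t in range(1, len(n2) + 1):
        if t == len(n2) or abs(n2[t] - np.mean(n2[start:t])) > tol:
            if t - start >= min_len:
                segments.append((start, t))
            start = t
    return segments

def classify_fr(segment_mean, references, tol):
    """Label a segment with the nearest flight-regime reference value and
    a quality in [0, 1] decreasing with the distance to that reference."""
    dists = {name: abs(segment_mean - ref) for name, ref in references.items()}
    name = min(dists, key=dists.get)
    quality = max(0.0, 1.0 - dists[name] / tol)
    return name, quality
```

On a synthetic N2 trace made of a ground-idle plateau, an acceleration ramp and a take-off plateau, only the two plateaus are detected; the ramp is rejected as non-stationary.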
Figure 4: Stabilized flight regimes (ground-idle, take-off, cruise, descent, reverse, ground-idle) represented according to the N2 shaft rotation speed over time.

Model dimensions computation
For each flight regime a set of ranks (r_j) is estimated. This computation is an optimization of the model according to a maximum likelihood criterion. An adequate criterion able to compare models of different sizes is needed: we use the Akaike (1976) AIC criterion to compensate the log-likelihood, essentially the mean square error (MSE), by the number of estimated parameters d = r_0 - 1 + \sum_{j=0}^{k} r_j.
\mathrm{AIC} = 2d - 2\log(L)    (10)
(The AIC criterion compensates the log-likelihood log(L) for its natural bias.) For a linear model with a Gaussian noise hypothesis, this likelihood can be written in terms of the mean square error:

\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N} \left( y(t) - \hat y(t) \right)^2 \quad\text{and}\quad \log(L) = -N\left(\log(2\pi\,\mathrm{MSE}) + 1\right)    (11)
where N is the number of samples. Minimizing the AIC criterion corresponds to minimizing the mean square error but also the number of estimated parameters. This criterion helps find the good model size, but we also look for a robust model that is usable on new observations. Our methodology exploits a small cross-validation algorithm. To test a model of a given rank r = (r_0, \dots, r_k), where r_0 ≥ 1 and r_j ≥ 0, on a sample of size N, we calibrate a model on a first part of the segment (say ρ = 80% of the whole length) and test it on the rest. The difficulty with cross-validation for temporal models is that we need continuous segments of the initial signal. In our practical application, the estimated position is initialized on the first part (used for calibration) then anticipated on the test part (the signal decomposition into a calibration part and a test part is illustrated in Figure 5). The test gives a likelihood specific to this selection of ranks, but where the mean square error is computed on a part of the signal that was not used for calibration. So the AIC criterion needs to be slightly modified: a normalization is applied on the sample size factor N to keep a constant and coherent value. After suppression of constant parts, an adjusted criterion α_λ(r) with a “sort of regularization” parameter λ is obtained:
\alpha_\lambda(r) = \log(\mathrm{MSE}) + \lambda\, d    (12)
The rank selection procedure is a forward scheme. We begin with r_0 = (1, 0, \dots, 0) and compute the criterion α_0 = α_λ(r_0), which is \log(\sigma^2) ≈ 0 because y was normalized (σ = 1) and the signal is estimated by its mean (μ = 0). Then for each component j of r we test a new rank set defined by adding 1 to the jth rank. This will add a penalization of λ, which is compensated (at the beginning) by the decrease of the estimation error. The next chosen rank set r_1 is defined as the one that minimizes α_λ(r). The series (α_n)_{n=0\dots} decreases until some minimal value is reached (when the error reduction can no longer compensate the increase of the parameter size d). This minimal value corresponds to our selection of a best rank set for system (2).
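The forward scheme can be sketched as below. The cross-validated log-MSE is supplied by the caller through a hook `alpha_of(r)` (a hypothetical helper standing in for the actual ARX calibration); since only increments of the criterion matter for the stopping rule, the penalty is simply λ times the total number of lags:

```python
import numpy as np

def select_ranks(k, alpha_of, lam=0.22, max_rank=10):
    """Forward rank selection for system (2) with the criterion (12).

    alpha_of(r) must return log(MSE) of the cross-validated model for the
    rank set r = [r0, r1, ..., rk].
    """
    r = [1] + [0] * k
    best = alpha_of(r) + lam * sum(r)
    while max(r) < max_rank:
        # try adding one lag to each component and keep the best candidate
        candidates = []
        for j in range(len(r)):
            rj = r.copy()
            rj[j] += 1
            candidates.append((alpha_of(rj) + lam * sum(rj), rj))
        a_new, r_new = min(candidates)
        if a_new >= best:
            break          # the error drop no longer pays for the penalty
        best, r = a_new, r_new
    return r, best
```

With a toy error surface whose MSE stops improving beyond ranks (2, 1), the procedure stops exactly there.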
Figure 5: Signal decomposition in two parts (calibration part and test part) for cross validation.
Once an optimal rank set is chosen, the corresponding model parameters are computed, but this time using the whole calibration signal. This gives the reference models and filters F_j^0(w) used in equation (6).
INLINE MONITORING
During bench tests, indicators are first computed and then some more data is used for the identification of scores. A score is a transformation of the indicators so that they belong to a known statistical distribution (see Azencott (2003) and Azencott (2004)). It is easier to define score thresholds to master false alarm rates, but the score computation needs a small calibration phase.
Figure 6: Monitoring algorithm synoptic (capitalize data buffer → compute local model → compute indicators → capitalize buffer of indicators → inline score calibration → compute score).

Observed data (x_t, y_t) are capitalized in a buffer of predetermined size. This is our signal test segment I_t = {(x_s, y_s); s ∈ [t-N+1, \dots, t]}. At each periodic time t(n), a multiple of a given duration Δt (that essentially depends on computation speed and available CPU power):
1. Test for stability according to N2.
2. Identify the current FR with its classification quality if the segment is stable.
3. Estimate the local model for the optimal set of ranks that was identified for this flight regime.
Finally the new model defined by the filters F_j(n, w) is compared to the reference model according to equations (4) or (6). At the beginning the reference model exists but no observations of the indicators e_j are available. Progressively a new buffer of distances (e(n))_{n=1\dots P} is capitalized. When this buffer reaches a minimal size P, the local regression estimators of the vector e are learnt (equation (7)), the covariance matrix Σ is computed, and the indicators may be transformed into scores z_j(n). Our algorithm also manages outlier segments during this online calibration by removing the worst observations (those with really bad scores) until we actually reach P good scores. Finally the monovariate scores z_j(n) are summarized into a global score Z²(n) of known χ² distribution (equation (8)). The classification quality Q_FR(n) ∈ [0, 1] computed during the FR detection weights this global score to minimize the detection risk when a poor FR identification occurs. (Figure 6 illustrates the monitoring steps, including temporal buffer recording for the inline calibration of scores.)
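The buffering logic of Figure 6 can be sketched as a small driver class. The model, indicator and score computations are injected as callables (hypothetical hooks standing in for the ARX fit and equations (6)-(8); this is our illustration, not the embedded code):

```python
import collections
import numpy as np

class ControlLoopMonitor:
    """Minimal sketch of the inline monitoring loop of Figure 6."""

    def __init__(self, window, p_min, fit, indicators, score):
        self.buffer = collections.deque(maxlen=window)   # segment I_t
        self.e_buffer = []                               # indicator history
        self.p_min = p_min                               # calibration size P
        self.fit, self.indicators, self.score = fit, indicators, score

    def step(self, x_t, y_t, stable, q_fr):
        """One periodic tick: returns the weighted score, or None while
        the data buffer fills up or the score calibration is in progress."""
        self.buffer.append((x_t, y_t))
        if not stable or len(self.buffer) < self.buffer.maxlen:
            return None                   # wait for a full stable segment
        model = self.fit(list(self.buffer))
        e = self.indicators(model)
        self.e_buffer.append(e)
        if len(self.e_buffer) < self.p_min:
            return None                   # still calibrating the scores
        return q_fr * self.score(np.array(self.e_buffer), e)
```

With dummy hooks, the monitor stays silent while the data buffer fills and during score calibration, then emits a score weighted by the FR quality at every subsequent tick.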
Of course, directly using a monovariate global score like the loss ratio e_0 (equation (4)) avoids the need to build any estimator to catch dependencies. Moreover, dividing the MSE by the original value MSE_0, which corresponds to the numerator variance during calibration, automatically normalizes the loss ratio into a standard score (hence Z²(n) = e_0(n)). Thus only the normalization factor MSE_0 needs to be learnt during the first steps of calibration. The final score to follow will directly be the product Q_FR(n) × e_0(n).
RESULTS AND CONCLUSIONS
Once calibrated, the use of the whole algorithm is almost feedforward:
1. Sample periodically the acquisition process.
2. Build the time interval I_n on which a test for stability is done.
3. In case of stability, identify the nearest FR and compute its quality Q_FR(n).
4. Estimate the corresponding ARX model parameters on this interval.
5. Compute the indicators (e_j), the scores (z_j) and finally the global score Z²(n).
6. Send alerts when Q_FR(n) × Z²(n) exceeds some predefined threshold.
In case of a detected abnormality it may be possible to look at each normalized score component \tilde z_j(n) and identify a specific system fault with an adequate classification method. (The classification algorithm we currently use is the FDI described in Lacaille (2009).) The detection thresholds may be either specified by expert knowledge or learnt as specific distribution quantiles (like equivalent 3 or 6-sigma upper bounds) during the online score calibration. This algorithm may be used for any kind of control loop. The VSV is one such system, on which the solution gives good results. The computations below are synthetic presentations of results for simulated VSV-like data.
Rank estimation
How to estimate a “good” λ? A bad choice of the coefficient λ in the rank selection process (see equation (12)) may increase the criterion α at the beginning. At each step the increase of the parameter number makes the cross-validation MSE decrease. Let ν be the decrease rate at a given step p+1, such that

\mathrm{MSE}_{p+1} = \nu \times \mathrm{MSE}_p    (13)
Then, using equation (12), one sees that

\alpha_{p+1} = \alpha_p + \log\nu + \lambda    (14)
The selection procedure stops if α_p does not decrease anymore. In the worst case α_p = α_{p+1}, which gives λ = -\log\nu. Defining the maximal acceptable decrease rate ν before stopping the selection process thus gives an estimation of a corresponding value for λ. For example, if the maximal decrease rate is ν = 80% (the process continues until \mathrm{MSE}_{p+1} ≥ 0.8 \times \mathrm{MSE}_p), we have to set λ = -\log(0.8) ≈ 0.22.
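This relation between the acceptable decrease rate and λ is a one-liner, shown here only to make the arithmetic explicit:

```python
import math

def lam_from_rate(nu):
    """Regularization weight matching a maximal acceptable MSE decrease
    rate nu: at the stopping point alpha_{p+1} = alpha_p, and
    equation (14) gives lambda = -log(nu)."""
    return -math.log(nu)
```

For instance `lam_from_rate(0.8)` gives λ ≈ 0.22, the value used in the example above.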
In Figure 7 the decrease of the criterion α_p is clearly observed until step p = 3, when the increase of the number of parameters compensates the decrease of the estimation error. In that example step 3 was selected.

Figure 7: Process of rank selection (selection criterion α(p) against selection step p).

Statistic process control
A test of the monitoring algorithm was realized by simulation, adding errors from 0 to 10% variation of the signal on either the observation, the command or both. Figure 8 shows the effect of such errors on the loss ratio e_0(n). The red stars define the first date where the perturbation is applied.
Figure 8: Control chart of the loss ratio e_0 = MSE/MSE_0 over time steps n, with error applied on the position, on the command, and on both.
Conclusion
This algorithm is a standard pursuit method. The data monitored is the state of a local model. This is essentially the most complex part of this proposal. Once a model parameter set is defined and a distance between models selected, it is a classical statistical process control (SPC, Lacaille & Zagrebnov (2006)) method. The advantage of such an algorithm is in the way an abnormality is automatically detected as an unusually important distance value. The calibration of the algorithm may be completely automated. However, it will not be able to discriminate between different types of failures. This problem may be addressed otherwise with a specific classification algorithm like in Lacaille (2009). This algorithm is currently tested on civil aircraft engines and is giving acceptable results. The SPC part has to be carefully defined, especially in the use of convenient temporal filters to help detect tendencies, thus opening the way to prognostics and anticipation.

ACKNOWLEDGEMENTS
The physical analysis of the VSV system as well as the identification of influential input data and the definition of failure modes were done by Snecma TMM cell members at Hispano-Suiza. This work is well described in Diez-Lledo et al. (2007), and it is a result of the TATEM (Technologies and Techniques for New Maintenance Concepts) European Project.
REFERENCES
H. Akaike (1976), “Canonical Correlation Analysis of Time Series and the Use of an Information Criterion”, in R. Mehra and K. Lainiotis, Eds., “System Identification: Advances and Case Studies”, Academic Press, New York.
M. Aoki (1990), “State Space Modeling of Time Series”, Springer Verlag, New York.
J. Lacaille (1998), “Synchronization of Multivariate Sensors with an Autoadaptive Neural Method”, Journal of Intelligent and Robotic Systems, vol. 21, pp. 155-165.
A. Mertins (1999), “Signal Analysis: Wavelets, Filter Banks, Time-Frequency Transforms and Applications”, John Wiley & Sons.
R. Azencott (2003), “A Method for Monitoring a System Based on Performance Indicators”, U.S. Patent US6594618, Miriad Technologies.
R. Azencott (2004), “Method for Detecting Anomalies in a Signal”, U.S. Patent US6721445B1, Miriad Technologies.
J. Lacaille, M. Zagrebnov (2006), “A Statistical Approach of Abnormality Detection and Its Applications”, AEC/APC, Denver (CO).
J. Lacaille (2007), “How to Automatically Build Meaningful Indicators from Raw Data”, AEC/APC, Palm Springs.
J. Lacaille, M. Zagrebnov (2007), “An Unsupervised Diagnosis for Process Tool Fault Detection: the Flexible Golden Pattern”, IEEE Transactions on Semiconductor Manufacturing, vol. 20, issue 4, pp. 355-363.
E. Diez-Lledo et al. (2007), “Hydraulic Actuation Loop Degradation Diagnosis and Prognosis”, 1st CEAS European Air and Space Conference.
J. Lacaille (2009), “Standardized Failure Signature for a Turbofan Engine”, IEEE Aerospace Conference, Big Sky, MT.