Ideas for Fault Detection Using Relation Discovery - Halmstad University

4 downloads 7227 Views 1MB Size Report
May 14, 2012 - in limited budget for on-board sensors and the amount of .... to a central server. There, they .... of dedicated sensors is also a problem: neither of.
The 27th annual workshop of the Swedish Artificial Intelligence Society (SAIS), 14–15 May 2012, Örebro, Sweden Published by Linköping University Electronic Press http://www.ep.liu.se/ecp_home/index.en.aspx?issue=071

Ideas for Fault Detection Using Relation Discovery Slawomir Nowaczyk

Rune Prytz

Stefan Byttner

Halmstad University

Volvo Group Trucks Technology

Halmstad University

[email protected]

[email protected]

[email protected]

Abstract Predictive maintenance is becoming more and more important in many industries, especially taking into account the increasing focus on offering uptime guarantees to the customers. However, in automotive industry, there is a limitation on the engineering effort and sensor capabilities available for that purpose. Luckily, it has recently become feasible to analyse large amounts of data on-board vehicles in a timely manner. This allows approaches based on data mining and pattern recognition techniques to augment existing, hand crafted algorithms. Automated deviation detection offers both broader applicability, by virtue of detecting unexpected faults and cross-analysing data from different subsystems, as well as higher sensitivity, due to its ability to take into account specifics of a selected, small set of vehicles used in a particular way under similar conditions. In a project called Redi2Service we work towards developing methods for autonomous and unsupervised relationship discovery, algorithms for detecting deviations within those relationships (both considering different moments in time, and different vehicles in a fleet), as well as ways to correlate those deviations to known and unknown faults. In this paper we present the type of data we are working with, justify why we believe relationships between signals are a good knowledge representation, and show results of early experiments where supervised learning was used to evaluate discovered relations.

1

Introduction

Most industries nowadays are moving towards more sophisticated cyber-physical systems, with new challenges arising from increased software and sys-

tem complexity. Developing and, even more importantly, maintaining those systems requires a significant engineering effort. For commercial ground vehicle operators (such as bus and truck fleet owners), the maintenance strategy is typically reactive, meaning that a fault is fixed only after it has become an issue affecting vehicle’s performance. Uptime guarantees consist in scheduling component maintenance and replacement based on statistical lifetime predictions. The biggest difficulty in moving towards predictive maintenance, in the vehicle industry, lies in limited budget for on-board sensors and the amount of engineering time it takes to develop diagnostic algorithms. Predicting that there is a need for maintenance before something breaks down is virtually impossible to plan during vehicle development cycle, especially if diagnostic algorithms need to handle multiple different kinds of faults, work in a consistent manner on a wide variety of vehicle configurations, as well as for many different types of operation under varying environment conditions. The development costs of fault diagnostics in the classical paradigm will keep growing, with the current trend of increasing number of components in vehicles and stricter requirements on their efficiency. The only solution seems to be augmenting engineering work with automated data analysis. This has been made possible by the introduction of low-cost wireless communication. Data mining can now be performed on-board real vehicles as they are being used. The subsystems that are critical for safety or long-term health of the vehicle will always use, to some degree at least, diagnostic mechanism developed and tested by engineers, but there is a lot of value to be gained from monitoring the state of as many additional subsystems as possible. Our approach is based on unsupervised discovery of relations between various signals that are avail-

1

able on the internal vehicle network. While it is difficult to detect faults by looking at characteristics of signals (such as road speed ) in isolation, the interrelations of connected signals are more indicative of abnormal conditions. The difference between our work and most other approaches lies in the requirement that relation discovery be done completely autonomously. While engineers are often able to propose a large number of “good” relations between various signals, those are typically general enough to hold for all or almost all vehicles. A data mining system we aim to develop, however, should be tightly connected to a particular fleet of vehicles, either by geographic region, vehicle configurations or type of operation. Some of the relations that are useful for detecting faults in long-haul trucks would be inadequate for delivery trucks, for example. Today, it is not feasible to develop specialised diagnostic algorithms for each of those cases, even though they would be very useful. Therefore, the two main benefits of automated deviation detection are broader applicability and higher sensitivity. The former is obtained by no longer requiring an expert to target a particular subsystem, analyse it in isolation, predict possible faults, what their symptoms would be, and which of those symptoms can be guaranteed to never appear during normal operation. Instead, the data mining approach can take a more “complete vehicle” perspective, both cross-checking data from different subsystems, as well as detecting faults nobody has predicted can take place. There are many potential problems that are currently not being monitored, since the cost of hand-crafting diagnosis methods for them is too high. We expect fully automatic methods to be less reliable than engineered and heavily tested diagnostic routines, but they are not intended to replace, but rather supplement them. Higher sensitivity can be achieved because unsupervised deviation detection algorithms can take a fleet-based approach, where “normal operation” can be defined on a much smaller scale. There are very few subsystems in a vehicle where manufacturers can afford to develop a diagnostic method that is specialised (or at least parameterised) for a particular geographical region or a particular usage pattern. In the case of pattern recognition approaches, it is very easy to define “standards” for a particular location or for a particular type of op-

eration — or even for both at the same time, for example by only comparing vehicles within a single company. This can lead to earlier detection of problems, where a fleet owner is alerted that a given symptom is unusual for their vehicles, even though the same symptom would be perfectly normal for another operator. This is becoming more important as new hardware and, especially, software capabilities in vehicles lead to higher customisability of the manufacturers’ offer and mean that “one size fits all” solutions are becoming less desirable. This paper is organised as follows. We briefly present related research in the next section, followed by description of data we are working with in Section 3. We discuss signal relations in Section 4 and ways to discover them in 5. We close by evaluation in Section 6 and conclusions in Section 7.

2

Related research

Automated data mining for vehicle applications has previously been the topic of several papers. An early paper by Kargupta et. al. [4] shows a system architecture for distributed data-mining in vehicles, and discusses the challenges in automating vehicle data analysis. In Zhang et al. [9], being able to do cross-fleet analysis (comparing properties of different vehicles) is shown to benefit rootcause analysis for pre-production diagnostics. In Byttner et. al. [1], a method called COSMO is proposed for distributed search of “interesting relations” (e.g. strong linear correlations that hold for long periods of time) among on-board signals in a fleet of vehicles. The interesting relations can then be monitored over time to enable e.g. deviation detection in specific components. A method based on a similar concept of monitoring correlations (but for a single vehicle instead of a fleet) is shown in D’Silva [2]. In Vachkov [8], the neural gas algorithm is used to model interesting relations for diagnostic of hydraulic excavators. Contrary to our work, however, both the papers by D’Silva and Vachkov assume that the signals which contain the interesting relations are known apriori. In [5], a method for monitoring relations between signals in aircraft engines is presented. Relations are compared across a fleet of planes and flights. Unlike us, however, they focus on discovering relationships that are later evaluated by domain experts.

2

0.4051

1 Feb

0.1490

0.0548 15 Jan 0.0201

2 Jan

0.0074 15 Dec 6 Dec

0.0027

15 Nov

0.0010

0.0003

31 Oct

0.0001 16 Oct 13 Oct

40

50

60

70

80

90

0.0000

Figure 1: Engine Coolant Temperature

3

Description of data

As a part of the Redi2Service project, we have equipped 19 Volvo buses with the hardware capable of collecting data from the internal vehicle network. Our setup has been in place since September 2011, giving us almost half a year worth of data from their operation in western Sweden. The data consists of over 100 signals that are measured with a sampling frequency of 1 Hz. This results in a volume of roughly 10GB of data per week. As an example, in Figure 2 we plot the values of a signal called Vehicle Speed over a half an hour episode. 70 60 50 40 30 20 10 0 13:20

13:25

13:30

13:35

13:40

13:45

13:50

Figure 2: Vehicle Speed

This gives us access to data from a real-world operation of those vehicles, without any artificial modification to either the usage or condition of the buses. On one hand, this makes the data extremely valuable, but at the same time makes it difficult to analyse, since we cannot assume that all the buses are in the perfect initial condition. In fact, we are already seeing symptoms that indicate significant number of technical problems in some of them. The most natural way of analysing such data is by using histograms. This is not the only way,

of course, and there are a number of characteristics that are not captured by a histogram, but this is a good starting point. In the abstract, before bringing human expert knowledge to the table, we are interested in differences in signal characteristics along two axes: between different time periods and between different vehicles. In this paper we will focus on the first one. We need to analyse how does a particular signal change in time, hopefully leading to a discovery of components that are starting in good condition, slowly wear out, until they reach a point where they can be considered “broken” and they start to negatively affect the performance of the whole bus. A promising trend can be seen in Figure 1, where we plot a sequence of histograms for the signal called Engine Coolant Temperature over a period of four months. Each horizontal line corresponds to a set of 20000 data readings, presented as a colour map histogram with logarithmic scale. The bar on the right visualises how the value probabilities are mapped into colours. It is interesting to note that the actual amount of data we obtain from each bus varies significantly in time, according to usage in a given period. Therefore, while the Y axis in Figure 1 represents real time, it is definitely not linear, as indicated by the dates shown. This plot, however, reveals a critical flaw in looking at signals in isolation. To a human expert Figure 1 does not indicate a trend corresponding to a wearing out component, but rather an influence of a well-known external condition: it is simply significantly colder in January than it is in October.

3

4

Signal Relations

One possible way to increase robustness against external influences is to look at relations between different but related signals. Our claim — supported, to some degree at least, both by the data we have been collecting now and by results of earlier experiments — is that there exist a large number of “interesting” relationships between signals and that those are a better predictors of faults than characteristics of individual signals alone. One such example would be relation between signals Oil Temperature and the aforementioned Engine Coolant Temperature, depicted in Figure 3. It is more difficult to visualise such relation across multiple time periods, and so we have decided to only present a scatter plot for October 2011 and for January 2012, each containing 40000 readings. As can be expected due to the basic laws of thermodynamics, there is a strong linear relation between those two signals. The plots are definitely not identical (for example, both signals reach higher values in October), but there is a fundamental structure to the relation that has not changed. Our goal is to capture this in a model. Faults that affect one of the subsystems but not the other would then introduce a systematic shift that would change parameters of that model.

Figure 3: Scatter plot of Oil Temperature against Engine Coolant Temperature, October and January

Of course, relations between signals can be arbitrarily complex, but one aspect of the hardware installed in our buses is it’s capability to perform pattern recognition on-board. Right now we are storing all the data on USB memories and only mining this data off-line, in order to better understand

what we are dealing with. Ultimately, however, the model estimation will be done on the vehicle and only the result of it will be transmitted, wirelessly, to a central server. There, they will be compared with data from the past and from other members of the fleet, and decision will be made whether any deviations are interesting enough to show the user. Due to limited computational power, we have mostly limited ourselves to looking at linear models. We are investigating other solutions, however, since non-linear relations are quite common. An important resource is also a database called Vehicle Service Records, which contains a detailed information about every repair and maintenance operation during the lifetime of a bus. It will allow us to not only inform the user that there is a problem with their vehicle, but also what had to be done to fix it last time similar thing happened.

5

Relation discovery

In general, it is far from trivial to evaluate ideas we have presented in the previous sections. The only true measure is the savings in maintenance expenses once the system is deployed. However, as a start, we have performed an experiment on a Volvo VN780 truck, where we have been collecting values of 21 signals, over 10 driving runs with four different faults injected, as well as 4 runs under normal operating conditions. Each episode lasted approximately four hours, and took place in a controlled environment under a variety of driving situations. The exact details of faults are not important here, but they include clogged of Air Filter and Grill, leaking Charge Air Cooler and partially congested Exhaust Pipe. The method we used for discovering relations consists of three steps. We start with data preprocessing and removing the influence of ambient conditions, but we do not discuss details here, interested readers can find them in [6]. We then proceed to choose the most interesting signals to model, as well as which signals should be used to model them. Finally, we estimate model parameters. The main challenge is to determine which relations exist between signals. We begin by modelling each signal using all other signals as regressors: ! n X 2 ⊤ Ψk = arg min yk (t) − Ψ ϕk (t) (1) Ψ∈Rs−1

4

t=1

Figure 4: Model parameters (Lasso method)1

where s is number of signals, Ψk is a vector of parameter estimates for the model of yk and ϕk is the regressor for yk (i.e. the set of all other signals). Following the LASSO (Least Absolute Shrinkage and Selection Operator) method [7], we use an energy constraint Ck as an upper bound on the sum of absolute values of all parameters for yk : s−1 X ||Ψk,i || < Ck (2) i=0

We gradually increase value of Ck , performing a cross-validation test after each run. Initially, the mean squared error of the model keeps decreasing, but at some point it begins to increase, as it starts to overfit. We then make a decision of whether the MSE is sufficiently low to consider this model to be good enough. Different Ck are optimal for each model, with some never reaching below the chosen MSE threshold and thus being considered uninteresting for further analysis. The second stage consists of finding and removing insignificant model parameters, namely those which are unstable and with low values. To this end, a sequence of estimates for each regressor within a model is collected over a series of time slices. We perform a t-test to find which of those estimates are significant, i.e. which are non-zero. This allows us to remove artificial signal dependencies, leaving only strong relationships. For calculating parameters for the selected models at difference times, we have tested two different approaches. The first is the LASSO method as outlined above, where we split data into a number of time slices, and, for each slice, calculate optimal model parameters. The second method is RLS (Recursive Least Squares) method [3], which recursively calculates the estimates over a sliding window defined by the forgetting factor: 1 previously

published in [6]

Figure 5: Model parameters (RLS method)1

−1 P (0) = δinit I

Θ(0) = Θinit ⊤

e(n) = y(n) − Θ (n − 1)ϕ(n) P (n − 1)ϕ(n) g(n) = λ + ϕ⊤ (n)P (n − 1)ϕ(n)

(3) (4) (5)

P (n) = λ−1 P (n − 1) − g(n)ϕ⊤(n)λ−1 P (n − 1) (6) Θ(n) = Θ(n − 1) + e(n)g(n) (7) The reason we chose those two methods is that LASSO approach allows an estimator to easily adapt to models that are changing in time, at the cost of possible oscillating behaviour if several models are of similar quality. On the other hand, RLS offers very fast convergence, but — due to it’s incremental nature — takes a long time to “catch up” if the underlying relation changes. As an example, Figures 4 and 5 show parameters in “fuel inst = p1 ∗cac in p+p2 ∗in manif t” relation: fuel inst (instantaneous fuel consumption) can be approximated using cac in p (charge air cooler input pressure) and in manif t (input manifold temperature). We plotted estimates obtained by both LASSO and RLS methods using colours to mark which fault was injected during a particular run. From among our four faults, only clogged Air Filter (AF) can be discovered based on the fuel inst relation above. There are other relations that are useful for other faults, of course, but it is difficult to get a clear overview of the complete solution.

6

Evaluation

As shown in the previous section, some faults can be detected reasonable easily. Unfortunately, it does not hold for all of them. Actually, the biggest problem with the experiment as we have done it is that some of the injected faults were easy, while others were very difficult to detect.

5

7

Figure 6: Classification error, Lasso and RLS1

We have decided to use supervised learning to evaluate whether parameters of our models (both from LASSO and RLS estimators) can be useful for detecting faults. We tried three different classifiers: linear regression, support vector machine (SVM) and random forest. Both the forgetting factor (for RLS) and the number of data slices (for LASSO) are parameters for tuning. We have found that less slices and larger forgetting factor gives better signal to noise ratio and a more robust solution. However, they are a lot less sensitive to faults that are only apparent under certain conditions. This is due to the smoothing larger slices and forgetting factor results in. As an example, a partially clogged air filter will only have a visible effect if the engine is running at high power, since this is the only situation when a large air flow is required. We have run the classification task a number of times, varying the time slice size the and forgetting factor. It is easy to see from Figure 6 that choosing too small forgetting factor for RLS is detrimental. On the other hand, the effect of choosing too many data slices is hardly visible. In general, the random forest classifier outperforms both SVM and linear classifier by a pretty large margin. We do not know why this is the case, since we have not investigated the classification itself in great depth. More interestingly, RLS estimator appears to give slightly better results than the LASSO estimator, but it probably is not worth the increased computational complexity. As a final comment, the resulting classification error appears to be rather high, but it is important to take into account that this data set is a very difficult one. There is a lot of different external influences that disturb the “normal” operation of a truck, and the low quality of available sensors result in high levels of noise in the data. The lack of dedicated sensors is also a problem: neither of the four faults we have analysed is being monitored in any way for current in-production vehicles.

Conclusions

In this paper we present a project that we are involved in, developing an unsupervised algorithm for discovering interesting relations between time series of vehicle signal data, to be used for fault detection and predictive maintenance. We present our approach and show initial evaluation, using supervised learning, on the data collected from a Volvo truck during a fault injection experiment. This is a step towards a system that would be able to analyse on-board data on real vehicles and detect anomalies in an autonomous way. Ideas presented here are very much work in progress and there are numerous directions to extend those results. Primarily, we have not really answered the question of how to distinguish “interesting” relations from “uninteresting” ones, especially taking into account that we are looking for those that hold most, but definitely not all, of the time. It is also not quite clear if the supervised classification is the best way of evaluating usefulness of discovered relations. We intend to explore other possibilities, especially those connected to the service records database we have access to. Acknowledgements. This work has been partially funded by grants from VINNOVA and from the Knowledge Foundation.

References [1] S. Byttner, T. R¨ ognvaldsson, and M. Svensson. Consensus self-organized models for fault detection (COSMO). Engineering Applications of Artificial Intelligence, 24(5):833–839, 2011. [2] S.H. D’Silva. Diagnostics based on the statistical correlation of sensors. Technical Report 2008-01-0129, Society of Automotive Engineers (SAE), 2008. [3] Monson H. Hayes. Statistical Digital Signal Processing and Modeling. John Wiley & Sons, Inc., 1996. [4] H. Kargupta et al. VEDAS: A mobile and distributed data stream mining system for real-time vehicle monitoring. In Int. SIAM Data Mining Conference, 2003. [5] J. Lacaille and E. Come. Visual mining and statistics for turbofan engine fleet. In IEEE Aerospace Conf., 2011. [6] Rune Prytz, Slawomir Nowaczyk, and Stefan Byttner. Towards relation discovery for diagnostics. In Proceedings of the First International Workshop on Data Mining for Service and Maintenance, pages 23–27, 2011. [7] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58, 1996. [8] G. Vachkov. Intelligent data analysis for performance evaluation and fault diagnosis in complex systems. In IEEE International Conference on Fuzzy Systems, pages 6322–6329, July 2006. [9] Y. Zhang, G.W. Gantt, et al. Connected vehicle diagnostics and prognostics, concept, and initial practice. IEEE Transactions on Reliability, 58(2), 2009.

6