Kernel PCA in application to leakage detection in drinking water distribution system Adam Nowicki, Michał Grochowski Faculty of Electrical and Control Engineering, Gdansk University of Technology, Narutowicza str. 11/12, 80-233 Gdansk, Poland
[email protected];
[email protected]
Abstract. Monitoring plays an important role in advanced control of complex dynamic systems. Precise information about the system's behaviour, including fault detection, enables efficient control. The proposed method, Kernel Principal Component Analysis (KPCA), a representative of machine learning, takes full advantage of the well-known PCA method and extends its application to the nonlinear case. The paper explains the general idea of KPCA and provides an example of how to utilize it for the fault detection problem. The efficiency of the described method is demonstrated on leakage detection in drinking water systems, which represent complex, distributed, large-scale dynamic systems. Simulations for the town of Chojnice show promising results in detecting and even localising leakages using a limited number of measuring points. Keywords: machine learning, kernel PCA, fault detection, monitoring, water leakage detection.
1 Introduction
Several studies aim at estimating the losses in drinking water distribution systems (DWDS). Even though they differ with respect to the measurement methods and are therefore difficult to compare, the results are always alarming; European Commission studies show that losses can be as high as 50% in certain areas of Europe. The losses can be considered as the difference between the volume of water delivered to the system and the volume of authorized consumption (non-revenue water, NRW). The World Bank estimates the worldwide NRW volume to be 48.6 billion m³/year [1], most of it caused by leakages. The most popular approach to detecting leakages is the acoustic-based method [2-3]; an overview of other methods can be found in [4-5]. The approach presented in this paper makes use of the fact that the appearance of a leakage can be discovered through analysis of flow and pressure measurements.
Water supply networks usually occupy large territories and are often subject to local disturbances which have a limited effect on the remaining part of the network. This motivates building a number of local models rather than a single model of the entire network. The typical quantities measured are flows and pressures. The place where the measurements are taken has an important impact on the efficiency of the monitoring system. In order to reduce the cost of the whole system, it is desirable to deploy the instruments in a concentrated manner, around pipe junctions (called nodes), preferably with as many pipes crossing as possible. Then, for a node with $p$ pipes, $p+1$ measurements are available: $p$ flows and a single pressure. A model of such a node serves as a local model.

A variety of methods for fault detection can be applied to the leakage detection problem. An extensive review of the most common approaches can be found in [6]. A water distribution system is a dynamic, complex, nonlinear system with varying parameters. Clearly, in this case quantitative modelling is a very demanding task, while there is no straightforward solution for a qualitative approach. Moreover, the values of flows and pressure measured at a given node of a real-life network are proven to be highly repeatable on a daily basis, with predictable variations depending on the season. During a leakage the relationship between measurements is disturbed, thus providing a fault symptom. These aspects motivate the choice of a data-driven approach for the problem of leakage detection. This paper presents the results of employing Kernel Principal Component Analysis for this problem. Results of applying other data-driven approaches to leakage detection can be found in [5],[7].
2 Data-driven approach for novelty detection using Kernel PCA
Consider a single row vector of size $1 \times m$ containing a set of measurements taken at the same time, denoted as $x$:

$x = [x_1 \; x_2 \; \cdots \; x_m]$   (1)

This vector belongs to the $m$-dimensional space called the input space $I$. The measured signals are assumed to be chosen so that the vector $x$ determines the operational state of the process, but not necessarily uniquely. It is sufficient that for any two measurements $x_N \in N$, $x_F \in F$:

$x_N \neq x_F$   (2)
where $N$ and $F$ correspond to data collected during the normal and faulty state of the process, respectively. This means that a single vector is capable of carrying a symptom of the fault. When dealing with data-driven methods for fault detection, there are two approaches to determine whether a fault has occurred: novelty detection and classification. This paper presents a solution based on novelty detection, where
the model is built using the data from the set $N$ only: a training set $X$ with a number of data points $n$ significantly larger than the dimension $m$ of the input space and covering all operational states of the process. Then, the task of fault detection can be reduced to the problem of finding a decision boundary in the $m$-dimensional input space that tightly encloses the training set. Hence, when previously unknown data is presented, the fault detection system is able to separate ordinary from novel patterns. If the data follows a continuous linear pattern, PCA is the method of choice. This purely statistical method uses a hypersphere as a decision boundary [8]. Unfortunately, most real-world applications involve dealing with non-linear patterns. A remedy to this might be VQPCA: it uses a number of local PCA models which are built using a Voronoi scheme. However, its application is restricted to cases where the pattern can be approximated with piecewise linear patterns and no determinism is required. A relatively new method that does not suffer from these drawbacks is Kernel PCA, introduced in [9]. It can be considered as a non-linear extension of PCA that combines multivariate analysis and machine learning. Instead of looking for a linear relationship between the variables in the input space, all measurements are mapped into a higher-dimensional feature space $\mathcal{F}$ through a non-linear mapping $\Phi(\cdot)$:

$\Phi(\cdot): I \rightarrow \mathcal{F}, \quad x_i \mapsto \Phi(x_i), \quad i = 1, 2, \dots, n$   (3)

$\langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j)$   (4)
A subspace $\mathcal{F}_q$ is identified within the feature space $\mathcal{F}$ where PCA is used. Linear patterns detected in this space correspond to non-linear patterns in the input space $I$. The desirable size of $\mathcal{F}_q$ is such that it allows capturing only the general pattern of the data; normally $\mathcal{F}_q$ is of a higher dimension than $I$. In a classical approach, operations in high-dimensional spaces yield a considerable workload, since each vector is represented by a large number of coordinates. Kernel PCA, which represents the group of so-called 'kernel methods', solves this problem using 'the kernel trick' described in [10]. For any algorithm which operates exclusively on inner products between data vectors, it is possible to substitute each occurrence of the inner product with its kernel representation. An inner product can be interpreted as a similarity measure between data points: if the angle between two different vectors is small, it means that both data points follow the same linear pattern. The value of $k(x_i, x_j)$ is calculated using a chosen kernel function, which operates on data in the input space $I$ but corresponds to the inner product between data in $\mathcal{F}$, thus allowing linear patterns to be detected there. This means that there is no need to carry out the mapping from the input space into the feature space explicitly. Moreover, neither the coordinates of $\Phi(x)$ nor even the mapping $\Phi(\cdot)$ needs to be known; from this point of view it is the kernel function that defines the mapping $\Phi(\cdot)$. The choice of a proper kernel function can be motivated by specific domain knowledge, which enables incorporating heuristics into the method.
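To make the kernel trick concrete, the following minimal sketch (an illustration added here, not part of the original study) uses a degree-2 polynomial kernel, for which the feature map can still be written down explicitly; it shows that evaluating the kernel in the input space gives the same value as the inner product of the explicitly mapped points.

import numpy as np

# Explicit feature map of a 2-D point for the degree-2 polynomial kernel
# k(x, y) = (x . y)^2; phi and k_poly are illustrative names.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k_poly(x, y):
    # kernel evaluated directly in the input space, no mapping required
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.5])
print(np.dot(phi(x), phi(y)))  # inner product computed in the feature space: 6.25
print(k_poly(x, y))            # the same value obtained by the kernel trick: 6.25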
In order to check if new data follows the pattern discovered in the training set, a mechanism based on the reconstruction error may be used [8],[11]. This solution assumes that the subspace $\mathcal{F}_q$ of the feature space found during training is suitable only for data similar to the training set $X$. This means that for such data, minimal information is lost during the mapping $\Phi_q: I \rightarrow \mathcal{F}_q$, which gives almost the same result as the mapping $\Phi: I \rightarrow \mathcal{F}$, thus not producing a large reconstruction error.
3 Kernel PCA model
Application of the Kernel PCA method for monitoring purposes is a two-step process: in the first step the model is built and then, in the second step, this model is used to determine the status of the system based on the latest measurements. In order to simplify the problem it is assumed that the model is non-adaptive, which means that the training set remains constant and therefore the model is built only once. Let $X$ be the matrix containing the normalized training set with data points $x_i$, $i = 1, 2, \dots, n$, given as row vectors in the input space $I = \mathbb{R}^m$:

$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,m} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,m} \end{bmatrix}$   (5)
Since this is novelty detection, it is assumed that $x_i \in N$ for $i = 1, 2, \dots, n$ (the data set represents only normal operating states). In kernel methods all the information about the data is stored in the kernel matrix $K$, which contains the value of the kernel function $k(x_i, x_j)$ calculated for each pair of vectors from $X$. For the Gaussian kernel function:

$K_{ij} = k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$   (6)

The value of the free parameter $\sigma$ is chosen empirically. Since $K_{ii} = 1$ and $K_{ij} = K_{ji}$, only elements above the diagonal need to be computed, which means that computation of the kernel matrix $K$ of size $n \times n$ requires $\sum_{i=1}^{n-1} i = n(n-1)/2$ evaluations of the kernel function. Classic PCA requires that the data is normalized, i.e. $\sum_{i=1}^{n} x_{i,j} = 0$ for $j = 1, 2, \dots, m$. This is also the case when using Kernel PCA, but since the data is expressed in terms of inner products, the normalization is applied indirectly, through the kernel matrix $K$. Each element of the normalized kernel matrix $\tilde{K}$ can be expressed in terms of $K$:

$\tilde{K}_{ij} = K_{ij} - \frac{1}{n}\sum_{r=1}^{n} K_{ir} - \frac{1}{n}\sum_{r=1}^{n} K_{rj} + \frac{1}{n^2}\sum_{r=1}^{n}\sum_{s=1}^{n} K_{rs}$   (7)

A mapping $\tilde{\Phi} = \Phi - \bar{\Phi}$, which takes into account that the centre of mass $\bar{\Phi}$ of the mapped training set $X$ is moved to the origin of the coordinate system in the feature space $\mathcal{F}$, is associated with the centring procedure given in (7).
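As an illustration of how (6) and (7) can be implemented, the following short numpy sketch (added here for clarity; it is not the authors' code) builds the Gaussian kernel matrix for a normalized training set and centres it according to (7).

import numpy as np

def gaussian_kernel_matrix(X, sigma):
    # X: n x m matrix of normalized training points given as row vectors, cf. (5)-(6)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def centre_kernel_matrix(K):
    # Centring according to (7): the centre of mass of the mapped training set
    # is moved to the origin of the feature space.
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n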
Normally, when applying classical PCA to linear problems, the eigenvectors $v_j$ of the covariance matrix $C = \frac{1}{n} X^T X$ are searched for, since they define the principal components. In Kernel PCA the algorithm is applied to data in the feature space $\mathcal{F}$, so the primal PCA problem would be solved by finding the eigenvectors of $C = \frac{1}{n}\tilde{\Phi}(X)^T \tilde{\Phi}(X)$; however, the mapped data points $\tilde{\Phi}(X)$ are not available explicitly. This problem is solved with a different version of PCA, called dual PCA, which allows the eigenvectors $v_j$ of $\tilde{\Phi}(X)^T \tilde{\Phi}(X)$ to be computed using $\tilde{\Phi}(X)\tilde{\Phi}(X)^T = \tilde{K}$ by:

$v_j = \frac{1}{\sqrt{\lambda_j}} \tilde{\Phi}(X)^T \alpha_j$   (8)

where $\lambda_j$ is the $j$-th eigenvalue associated with the $j$-th eigenvector $\alpha_j$ of $\tilde{K}$, given as a column vector. Although $\tilde{\Phi}(X)$ in (8) is not available explicitly, it will be possible to use $v_j$ in this form later on. It is worth noting that with primal PCA at most $m$ eigenvectors of the covariance matrix $C$ can be evaluated, while in Kernel PCA as many as $n$ eigenvectors spanning an $n$-dimensional subspace of the feature space $\mathcal{F}$ can be evaluated. Since it is always possible to construct a single hyperplane through any $n$ points in an $n$-dimensional space, it is possible to find a linear relationship between all mapped points; however, this might lead to overfitting. Therefore only the $q$ eigenvectors corresponding to the $q$ largest eigenvalues are used, resulting in a subspace $\mathcal{F}_q$ that captures the general pattern in the data. The value of $q$ is chosen empirically. These $q$ eigenvectors are stored as columns of the matrix $A_q$. Having the model defined, it is possible to evaluate a reconstruction error $e(\cdot)$. For a new data point $x_t$, this error, denoted as $e(x_t)$, can be regarded as the squared distance $\|\delta(x_t)\|^2$ between the exact mapping of $x_t$ into the feature space and its projection onto the chosen subspace (Fig. 1). Let $x_{t,PC}$ and $x_{t,PC_q}$ denote the PC scores of $x_t$ associated with the feature space $\mathcal{F}$ and the reduced feature space $\mathcal{F}_q$, respectively. Since the principal components originate from the origin of the coordinate system, using the Pythagorean theorem:

$e(x_t) = \|\delta(x_t)\|^2 = \|x_{t,PC}\|^2 - \|x_{t,PC_q}\|^2$   (9)
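Before turning to how the terms in (9) are evaluated, the training step itself, the dual eigenproblem (8) and the selection of the $q$ leading components, can be sketched in a few lines of numpy; this is an added illustration consistent with the notation above, not the original implementation.

import numpy as np

def kpca_fit(K_tilde, q):
    # Dual PCA, cf. (8): eigendecomposition of the centred kernel matrix.
    # numpy's eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    idx = np.argsort(eigvals)[::-1][:q]   # indices of the q largest eigenvalues
    lambdas = eigvals[idx]                # diagonal of Lambda_q
    alphas = eigvecs[:, idx]              # columns of A_q
    return alphas, lambdas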
The term $\|x_{t,PC}\|^2$ is the squared distance of $\Phi(x_t)$ from the origin of the coordinate system in the feature space and can be calculated using inner products:

$\|x_{t,PC}\|^2 = \|\tilde{\Phi}(x_t)\|^2 = \langle \Phi(x_t), \Phi(x_t) \rangle + \langle \bar{\Phi}, \bar{\Phi} \rangle - 2\langle \Phi(x_t), \bar{\Phi} \rangle = k(x_t, x_t) - \frac{2}{n}\sum_{i=1}^{n} k(x_t, x_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} K_{ij}$   (10)
The PC score $x_{t,PC_q}$ of the test point $x_t$ in the reduced feature space is equal to the projection of its image $\tilde{\Phi}(x_t)$ onto the eigenvectors spanning $\mathcal{F}_q$:
Fig. 1. Geometrical interpretation of the reconstruction error.
$x_{t,PC_q} = \tilde{\Phi}(x_t) V_q = \tilde{\Phi}(x_t)\tilde{\Phi}(X)^T A_q \Lambda_q^{-1/2}$   (11)

where $A_q$ and $\Lambda_q$ contain the $q$ first eigenvectors and eigenvalues of $\tilde{K}$, respectively. The term $\tilde{\Phi}(x_t)\tilde{\Phi}(X)^T$ can be calculated using the kernel function, with a correction that takes into account the centring in the feature space, resulting in the vector $\tilde{K}_t = [\tilde{K}_{t,1} \; \cdots \; \tilde{K}_{t,r} \; \cdots \; \tilde{K}_{t,n}]$:

$\tilde{K}_{t,r} = k(x_t, x_r) - \frac{1}{n}\sum_{i=1}^{n} k(x_t, x_i) - \frac{1}{n}\sum_{i=1}^{n} K_{ir} + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} K_{ij}$   (12)

Using (10) and the combination of (11) and (12), the expression for the reconstruction error $e(x_t)$ in (9) can be calculated. The value of the error is always between zero and a value close to one. An error $e(x_t)$ exceeding some selected maximal value of the reconstruction error $e_{max}$ indicates that the test point $x_t$ is not following the pattern defined by the training set $X$. This means that $e_{max}$ serves as a decision boundary (Fig. 2) that enables the current state of the system to be classified:

$e(x_t) \le e_{max} \Rightarrow x_t \in N$
$e(x_t) > e_{max} \Rightarrow x_t \in F$   (13)
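A compact numpy sketch of the scoring step (9)-(13) is given below; it is an added illustration using the quantities defined above (with the Gaussian kernel, for which $k(x_t, x_t) = 1$), not the authors' implementation.

import numpy as np

def reconstruction_error(x_t, X, K, alphas, lambdas, sigma):
    # kernel values between the test point and the training points
    k_t = np.exp(-np.sum((X - x_t) ** 2, axis=1) / (2.0 * sigma ** 2))
    # centring correction for the test point, cf. (12)
    k_t_c = k_t - k_t.mean() - K.mean(axis=0) + K.mean()
    # squared norm of the centred image of x_t, cf. (10); k(x_t, x_t) = 1 here
    norm_full = 1.0 - 2.0 * k_t.mean() + K.mean()
    # PC scores in the reduced feature space, cf. (11)
    scores = (k_t_c @ alphas) / np.sqrt(lambdas)
    # what is lost by projecting onto the first q components, cf. (9)
    return norm_full - np.sum(scores ** 2)

def is_faulty(x_t, X, K, alphas, lambdas, sigma, e_max):
    # decision rule (13)
    return reconstruction_error(x_t, X, K, alphas, lambdas, sigma) > e_max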
Fig. 2. a) Reconstruction error for a test set containing measurements from the normal state (green) and from a leakage (red). b) The same test set in the input space $I$: data points from the leakage (red) are separated from data points from normal operation (green) by a decision boundary (bold); the model is built from the training set (blue).
4 Chojnice case study
In order to prove the efficiency of the presented method, a number of experiments have been carried out. Each single experiment aimed at answering the following question: "Is it possible to detect a leakage that occurred in node 'A' based entirely on measurements taken at node 'B'?". All the experiments were based on measurements provided by EPANET, with simulations carried out using a calibrated model of a real network. This network has been divided into a number of regions that are partly independent in the sense of leakage discovery, as described later. The training set corresponds to measurements collected every 5 minutes during 6 days, while the test set was collected during the 7th day, with the leakage being simulated in the middle of the day. For each of the monitoring nodes a KPCA model was built, with kernel width $\sigma = 1$ and the size of the reduced feature space $\mathcal{F}_q$ set to $q = 70$. The values of these parameters were chosen as a result of an analysis. Since the data is normalized and each training set has the same size, yielding the same dimension of the full feature space, applying common parameters to all nodes provides a satisfactory outcome; however, this leaves room for further improvements. The third parameter, the maximal allowed reconstruction error $e_{max}$, was chosen so that 99.9% of the training set is considered to be in the normal state.

In order to check the monitoring potential of each node, the results of the following set of experiments were brought together: for a fixed monitored node, leakages of a chosen size were consecutively simulated in adjoining nodes. This provided a monitoring range map of the chosen node. Some nodes present better performance than others. This is caused by the diverse effect of a certain leakage on the measurements. The simulations carried out proved that several factors have a strong influence on the performance of a node:

The position of the monitored node with respect to the leakage node. As explained earlier, a leakage causes a disturbance within a limited range. This range is different for each node, since the compensation of adjoining nodes for the water loss is never the same (Fig. 3). Furthermore, it turns out that large supply pipes have a significant ability to mask the leakage symptom, as they can easily deliver an increased amount of water without affecting their own state. This motivates dividing the network into separate regions with the arrangement of supply pipes taken into account [7].

Size of the leakage. The influence of the leakage on the measured variables is in general proportional to the relative amount of water lost in the leakage. Larger leakages cause stronger disturbances and, as a result, a larger area is affected. This explains why a monitoring node performs much better in case of larger leakages (Fig. 3c).

The time of day. The amount of water flowing through the pipes changes throughout the day, with the least water being supplied at night. Although the pressure is generally lower at this time of day, resulting in less water being lost, it is much easier to observe the leakage as it has a relatively stronger impact on the network. This means that the sensitivity of the monitoring node changes throughout the day.
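The threshold selection mentioned above (choosing $e_{max}$ so that 99.9% of the training set is classified as normal) can be sketched as follows, reusing the hypothetical reconstruction_error helper and the fitted quantities from the earlier listings; this is an added illustration, not the authors' procedure verbatim.

import numpy as np

# reconstruction errors of all training points under the fitted node model
train_errors = np.array([reconstruction_error(x_i, X, K, alphas, lambdas, sigma)
                         for x_i in X])
# e_max chosen so that 99.9% of the training data is classified as normal
e_max = np.percentile(train_errors, 99.9)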
Fig. 3. a) (top) Leakages simulated in different nodes have a different impact on the reconstruction error e(t) evaluated for the monitored node; (bottom) results of the experiment presented on the map. b) A poor (top) and a good (bottom) candidate for a monitoring node; simulated leakage 2 m³/h. c) The range of detection depends on the size of the leakage: 1.5 m³/h (top), 12 m³/h (bottom). Legend: monitoring node; leakages detectable from the monitoring node; leakages undetectable from the monitoring node; not covered by the experiment.
Even though the area covered by the detection system differs for each monitoring node, these areas share some common properties: the area is always continuous and concentrated around the monitoring node in a non-symmetric manner. The monitored nodes should be chosen carefully, as there might be a significant difference in performance between a good candidate and a poor one (Fig. 3b). Setting up a number of monitoring nodes makes it possible to monitor the entire network. Since the area monitored by each node is subject to change depending on the leakage size, the number of required nodes heavily depends on the expected sensitivity of the system: if one needs to detect and precisely localise even small leakages, a large number of monitoring nodes placed close to each other is required. The possibility of detecting leakages only in the close neighbourhood of a monitored node extends the application of the method to localization of the potential fault (Fig. 4).
Fig. 4. The idea of leakage localisation using local models: a) an example of the detection range for three monitored nodes with simulated leakages Q = 2 m³/h; b) monitored nodes marked in blue, place of the simulated leakage marked in red; c) values of the reconstruction error in the monitored nodes for the situation given in b).
If for a single node the current value of the reconstruction error $e(t)$ exceeds $e_{max}$, it indicates that a leakage has occurred in its close neighbourhood. If several nodes report an error at the same time, this suggests that a larger leakage has occurred somewhere in between.
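This localisation heuristic can be expressed as a short decision routine; the sketch below is an added illustration (the node_models structure and helper names are hypothetical), not part of the original system.

def locate_leak(measurements, node_models, thresholds):
    # measurements: dict of node name -> current measurement vector for that node
    # node_models: dict of node name -> (X, K, alphas, lambdas, sigma), cf. earlier sketches
    alarming = [name for name, model in node_models.items()
                if reconstruction_error(measurements[name], *model) > thresholds[name]]
    if not alarming:
        return "no leakage detected"
    if len(alarming) == 1:
        return "leakage suspected in the neighbourhood of node " + alarming[0]
    return "larger leakage suspected between nodes: " + ", ".join(alarming)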
5 Conclusions and future work
The paper describes an approach to detecting leakages in a water distribution system using the Kernel PCA method with a limited number of measurements. The arrangement of the measuring points is determined through simulations and heuristics in order to ensure efficient fault detection abilities of the local KPCA models. By adjusting the number of monitored nodes, one can set the sensitivity of the system so as to maintain an economic level of real losses. The usage of KPCA, instead of conventional PCA, reduces the number of false alarms and prevents model conservatism. The methodology was verified on a calibrated model and data of the town of Chojnice (Northern Poland). At this stage of the research, localisation of the leakages is supervised by a human operator; however, the Authors have obtained promising results completing the process of automatic fault detection and localisation with the usage of Self-Organising Maps. Other faults (such as pump or valve breakdown, or water contamination) can be identified and isolated using this approach. Moreover, the method is rather generic in nature, and hence might be transferred to similar systems, e.g. pipeline systems,
telecommunication systems, power systems, etc., known as network systems. Optimal and adaptive selection of the KPCA model parameters predisposes the method to real-time diagnostic and control systems, e.g. Fault Tolerant Model Predictive Control.
References

1. Thornton, J., Sturm, R., Kunkel, G.: Water Loss Control. The McGraw-Hill Companies, New York (2008)
2. Water Audits and Loss Control Programs - Manual of Water Supply Practices, M36. American Water Works Association (2009)
3. Jin, Y., Yumei, W., Ping, L.: Leak Acoustic Detection in Water Distribution Pipeline. In: Proceedings of the 7th World Congress on Intelligent Control and Automation, pp. 3057-3061 (2008)
4. Xiao-Li, C., Chao-Yuan, J., Si-Yuan, G.: Leakage monitoring and locating method of water supply pipe network. In: Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, pp. 3549-3551 (2008)
5. Mashford, J., Silva, D.D., Marney, D., Burn, S.: An approach to leak detection in pipe networks using analysis of monitored pressure values by support vector machine. In: 2009 Third International Conference on Network and System Security, pp. 534-539 (2009)
6. Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S.: A review of process fault detection and diagnosis. Part I, II, III. Computers and Chemical Engineering, 27, pp. 293-346 (2003)
7. Duzinkiewicz, K., Borowa, A., Mazur, K., Grochowski, M., Brdys, M.A., Jezior, K.: Leakage Detection and Localization in Drinking Water Distribution Networks by Multi-Regional PCA. Studies in Informatics and Control, 17 (2), pp. 135-152 (2008)
8. Jackson, J.E., Mudholkar, G.: Control procedures for residuals associated with principal component analysis. Technometrics, 21, pp. 341-349 (1979)
9. Schölkopf, B., Smola, A.J., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, pp. 1299-1319 (1998)
10. Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25, pp. 821-837 (1964)
11. Hoffmann, H.: Kernel PCA for novelty detection. Pattern Recognition, 40, pp. 863-874 (2007)