Fuzzy Clustering Based Segmentation of Time-Series

Janos Abonyi, Balazs Feil, Sandor Nemeth, and Peter Arva

University of Veszprem, Department of Process Engineering, P.O. Box 158, H-8201, Hungary,
[email protected]
Abstract. The segmentation of time-series is a constrained clustering problem: the data points should be grouped by their similarity, but with the constraint that all points in a cluster must come from successive time points. The changes of the variables of a time-series are usually vague and are not focused on any particular time point, so it is not practical to define crisp bounds for the segments. Although fuzzy clustering algorithms are widely used to group overlapping and vague objects, they cannot be directly applied to time-series segmentation. This paper proposes a clustering algorithm for the simultaneous identification of the fuzzy sets which represent the segments in time and of the local PCA models used to measure the homogeneity of the segments. The algorithm is applied to the monitoring of the production of high-density polyethylene.
1 Introduction
A sequence of N observed data, $x_1, \ldots, x_N$, ordered in time, is called a time-series. Real-life time-series arise in business, the physical, social and behavioral sciences, economics, engineering, etc. Time-series segmentation addresses the following data mining problem: given a time-series T, find a partitioning of T into c segments that are internally homogeneous [1]. Depending on the application, the goal of the segmentation is to locate stable periods of time, to identify change points, or simply to compress the original time-series into a more compact representation [2]. Although in many real-life applications a lot of variables must be simultaneously tracked and monitored, most segmentation algorithms are used for the analysis of only one time-variant variable [3]. The aim of this paper is to develop an algorithm that is able to handle time-varying characteristics of multivariate data: (i) changes in the mean; (ii) changes in the variance; and (iii) changes in the correlation structure among the variables.

Multivariate statistical tools, like Principal Component Analysis (PCA), can be applied to discover such information [4]. PCA maps the data points into a lower-dimensional space, which is useful in the analysis and visualization of correlated high-dimensional data [5]. Linear PCA models have two particularly desirable features: they can be understood in great detail and they are straightforward to implement. Linear PCA can give good prediction results for simple time-series, but can fail in the analysis of historical data with changes in regime or when the relations among the variables are nonlinear. The analysis of such data requires the detection of locally correlated clusters [6]. The proposed algorithm clusters the data to discover the local relationships among the variables, similarly to the mixture of Principal Component Models [4].

The changes of the variables of the time-series are usually vague and are not focused on any particular time point. Therefore, it is not practical to define crisp bounds for the segments. For example, if humans visually analyze historical process data, they use expressions like "this point belongs less to this operating point and more to the other one". A good example of this kind of fuzzy segmentation is how fuzzily the start and the end of early morning are defined. Fuzzy logic is widely used in various applications where the grouping of overlapping and vague elements is necessary [7], and there are many fruitful examples in the literature of the combination of fuzzy logic with time-series analysis tools [2,8,9,10,16].

Time-series segmentation may be viewed as clustering, but with a time-ordered structure. The contribution of this paper is the introduction of a new fuzzy clustering algorithm which can be effectively used to segment large, multivariate time-series. Since the points in a cluster must come from successive time points, the time-coordinate of the data should also be considered. One way to deal with time is to use it as an additional variable and cluster the observations using a distance between cases which incorporates time. Hence, the clustering is based on a distance measure that contains two terms: the first term quantifies how well the data point fits the given segment according to Gaussian fuzzy sets defined in the time domain, while the second term measures how far the data point is from the hyperplane of the PCA model used to measure the homogeneity of the segments.

M.R. Berthold et al. (Eds.): IDA 2003, LNCS 2810, pp. 275–285, 2003. © Springer-Verlag Berlin Heidelberg 2003

The remainder of this paper is organized as follows.
Section 2 defines the time-series segmentation problem and presents the proposed clustering algorithm. Section 3 contains a computational example using historical process data collected during the production of high-density polyethylene. Finally, conclusions are given in Section 4.
2 Clustering Based Time-Series Segmentation

2.1 Time-Series and Segmentations
A time-series $T = \{x_k \mid 1 \le k \le N\}$ is a finite set of N samples labelled by time points $t_1, \ldots, t_N$, where $x_k = [x_{1,k}, x_{2,k}, \ldots, x_{n,k}]^T$. A segment of T is a set of consecutive time points $S(a,b) = \{a \le k \le b\}$, $x_a, x_{a+1}, \ldots, x_b$. The c-segmentation of time-series T is a partition of T into c non-overlapping segments $S_T^c = \{S_i(a_i, b_i) \mid 1 \le i \le c\}$, such that $a_1 = 1$, $b_c = N$, and $a_i = b_{i-1} + 1$. In other words, a c-segmentation splits T into c disjoint time intervals by segment boundaries $s_1 < s_2 < \ldots < s_c$, where $S_i = S(s_{i-1}+1, s_i)$. Usually the goal is to find homogeneous segments from a given time-series. In such a case the segmentation problem can be described as constrained clustering:
data points should be grouped by their similarity, but with the constraint that all points in a cluster must come from successive time points. In order to formalize this goal, a cost function cost(S(a,b)) describing the internal homogeneity of individual segments should be defined. The cost function can be any arbitrary function. For example, in [1,11] the sum of variances of the variables in the segment was defined as cost(S(a,b)). Usually, the cost function cost(S(a,b)) is defined based on the distances between the actual values of the time-series and the values given by a simple function (a constant or linear function, or a polynomial of a higher but limited degree) fitted to the data of each segment. The optimal c-segmentation simultaneously determines the $a_i, b_i$ borders of the segments and the $\theta_i$ parameter vectors of the models of the segments. Hence, the segmentation algorithms minimize the cost of the c-segmentation, which is usually the sum of the costs of the individual segments:

$$\mathrm{cost}(S_T^c) = \sum_{i=1}^{c} \mathrm{cost}(S_i)\,. \tag{1}$$
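For illustration, the cost (1) with the per-segment variance cost of [1,11] can be sketched as follows (a minimal NumPy sketch; the function names and the toy data are our own, not from the paper):

```python
import numpy as np

def segment_cost(X):
    # cost(S(a,b)): sum of variances of the variables within one segment
    return np.sum(np.var(X, axis=0))

def segmentation_cost(X, boundaries):
    # boundaries: segment end indices s_1 < s_2 < ... < s_c = N (0-based slicing)
    cost, start = 0.0, 0
    for end in boundaries:
        cost += segment_cost(X[start:end])
        start = end
    return cost

# toy multivariate series with a mean shift at k = 50
X = np.vstack([np.zeros((50, 2)), np.ones((50, 2))])
print(segmentation_cost(X, [50, 100]))   # boundary at the true change point: zero cost
print(segmentation_cost(X, [30, 100]))   # misplaced boundary: higher cost
```

A dynamic-programming or heuristic segmenter would search over the `boundaries` to minimize this total cost.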
The cost function (1) can be minimized by dynamic programming to find the segments (e.g. [11]). Unfortunately, dynamic programming is computationally intractable for many real data sets. Consequently, heuristic optimization techniques such as greedy top-down or bottom-up techniques are frequently used to find good but suboptimal c-segmentations:

Sliding window: A segment is grown until it exceeds some error bound. The process repeats with the next data point not included in the newly approximated segment. For example, a linear model is fitted on the observed period and the modelling error is analyzed [12].

Top-down method: The time-series is recursively partitioned until some stopping criterion is met [12].

Bottom-up method: Starting from the finest possible approximation, segments are merged until some stopping criterion is met [12].

Search for inflection points: Searching for primitive episodes located between two inflection points [5].

In the following subsection a different approach will be presented based on the clustering of the data.

2.2 Clustering Based Fuzzy Segmentation
When the variances of the segments are minimized during the segmentation, (1) results in the following equation:

$$\mathrm{cost}(S_T^c) = \sum_{i=1}^{c} \sum_{k=s_{i-1}+1}^{s_i} \| x_k - v_i^x \|^2 = \sum_{i=1}^{c} \sum_{k=1}^{N} \beta_i(t_k)\, D_{i,k}^2(v_i^x, x_k)\,, \tag{2}$$
where $D_{i,k}^2(v_i^x, x_k)$ represents the distance between the $x_k$ data point and $v_i^x$, the mean of the variables in the i-th segment (the center of the i-th cluster); and
$\beta_i(t_k) \in \{0,1\}$ stands for the crisp membership of the k-th data point in the i-th segment:

$$\beta_i(t_k) = \begin{cases} 1 & \text{if } s_{i-1} < k \le s_i \\ 0 & \text{otherwise}\,. \end{cases} \tag{3}$$

This equation is well comparable to the typical error measure of standard k-means clustering, but in this case the clusters are limited to being contiguous segments of the time-series instead of Voronoi regions in $R^n$. The changes of the variables of the time-series are usually vague and are not focused on any particular time point. As it is not practical to define crisp bounds for the segments, Gaussian membership functions, $A_i(t_k)$, are used to represent the $\beta_i(t_k) \in [0,1]$ fuzzy segments of the time-series. This choice leads to the following compact formula for the normalized degree of fulfillment of the membership of the k-th observation in the i-th segment:

$$A_i(t_k) = \exp\left( -\frac{1}{2} \frac{(t_k - v_i^t)^2}{\sigma_i^2} \right), \qquad \beta_i(t_k) = \frac{A_i(t_k)}{\sum_{j=1}^{c} A_j(t_k)}\,. \tag{4}$$
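The memberships in (4) are straightforward to compute; a minimal NumPy sketch (the segment centers and widths below are illustrative values, not fitted ones):

```python
import numpy as np

def fuzzy_segment_memberships(t, centers, sigmas):
    # A_i(t_k) = exp(-(t_k - v_i^t)^2 / (2 sigma_i^2)), eq. (4)
    A = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / sigmas[:, None]) ** 2)
    # normalize over the c segments so that sum_i beta_i(t_k) = 1
    return A / A.sum(axis=0)

t = np.arange(100.0)
centers = np.array([20.0, 60.0, 90.0])   # v_i^t, illustrative
sigmas = np.array([10.0, 15.0, 8.0])     # sigma_i, illustrative
beta = fuzzy_segment_memberships(t, centers, sigmas)
print(beta.shape)          # (3, 100): one fuzzy segment per row
print(beta.sum(axis=0))    # each column sums to 1
```

Each time point thus belongs to every segment to some degree, with the highest membership near the segment's center in time.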
For the identification of the $v_i^t$ centers and $\sigma_i^2$ variances of the membership functions, a fuzzy clustering algorithm is introduced. The algorithm, which is similar to the modified Gath-Geva clustering [13], assumes that the data can be effectively modelled as a mixture of multivariate (including time as a variable) Gaussian distributions, so it minimizes the sum of the weighted squared distances between the $z_k = [t_k, x_k^T]^T$ data points and the $\eta_i$ cluster prototypes:

$$J = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{i,k})^m D_{i,k}^2(\eta_i, z_k) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{i,k})^m D_{i,k}^2(v_i^t, t_k)\, D_{i,k}^2(v_i^x, x_k)\,, \tag{5}$$
where $\mu_{i,k}$ represents the degree of membership of the observation $z_k = [t_k, x_k^T]^T$ in the i-th cluster ($i = 1, \ldots, c$), and $m \in [1, \infty)$ is a weighting exponent that determines the fuzziness of the resulting clusters (usually chosen as $m = 2$). As can be seen, the clustering is based on a $D_{i,k}^2(\eta_i, z_k)$ distance measure which consists of two terms. The first one, $D_{i,k}^2(v_i^t, t_k)$, is the distance between the k-th data point and the $v_i^t$ center of the i-th segment in time:

$$\frac{1}{D_{i,k}^2(v_i^t, t_k)} = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{1}{2} \frac{(t_k - v_i^t)^2}{\sigma_i^2} \right), \tag{6}$$
where the center and the standard deviation of the Gaussian function are

$$v_i^t = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m\, t_k}{\sum_{k=1}^{N} (\mu_{i,k})^m}\,, \qquad \sigma_i^2 = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m (t_k - v_i^t)^2}{\sum_{k=1}^{N} (\mu_{i,k})^m}\,. \tag{7}$$
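In code, the update (7) is simply a membership-weighted mean and variance over time (a sketch; the helper name is our own):

```python
import numpy as np

def update_time_prototype(t, mu_i, m=2.0):
    # eq. (7): weighted mean and variance of time with weights (mu_{i,k})^m
    w = mu_i ** m
    v_t = np.sum(w * t) / np.sum(w)
    sigma2 = np.sum(w * (t - v_t) ** 2) / np.sum(w)
    return v_t, sigma2

t = np.arange(10.0)
# crisp memberships covering the first half, for an easy sanity check
mu_i = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
v_t, sigma2 = update_time_prototype(t, mu_i)
print(v_t, sigma2)   # 2.0 2.0 (mean and variance of 0..4)
```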
The second term represents the distance between the cluster prototype and the data in the feature space:

$$\frac{1}{D_{i,k}^2(v_i^x, x_k)} = \frac{\alpha_i \sqrt{\det(\mathbf{A}_i)}}{(2\pi)^{r/2}} \exp\left( -\frac{1}{2} (x_k - v_i^x)^T \mathbf{A}_i (x_k - v_i^x) \right), \tag{8}$$

where $\alpha_i$ represents the a priori probability of the cluster and $v_i^x$ the coordinate of the i-th cluster center in the feature space,

$$\alpha_i = \frac{1}{N} \sum_{k=1}^{N} \mu_{i,k}\,, \qquad v_i^x = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m\, x_k}{\sum_{k=1}^{N} (\mu_{i,k})^m}\,, \tag{9}$$

and $r$ is the rank of the $\mathbf{A}_i$ distance norm corresponding to the i-th cluster. The $\mathbf{A}_i$ distance norm can be defined in many ways. It is wise to scale the variables so that those with greater variability do not dominate the clustering. One can scale by dividing by standard deviations, but a better procedure is to use statistical (Mahalanobis) distance, which also adjusts for the correlations among the variables. In this case $\mathbf{A}_i$ is the inverse of the fuzzy covariance matrix, $\mathbf{A}_i = \mathbf{F}_i^{-1}$, where

$$\mathbf{F}_i = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m (x_k - v_i^x)(x_k - v_i^x)^T}{\sum_{k=1}^{N} (\mu_{i,k})^m}\,. \tag{10}$$
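The fuzzy covariance matrix (10) is a membership-weighted covariance around the cluster center; a sketch (with uniform memberships and the weighted mean as center it reduces to the ordinary biased covariance):

```python
import numpy as np

def fuzzy_covariance(X, mu_i, v_x, m=2.0):
    # eq. (10): membership-weighted covariance around the cluster center v_x
    w = mu_i ** m
    D = X - v_x
    return (w[:, None] * D).T @ D / np.sum(w)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
mu_i = np.full(200, 0.5)                           # uniform memberships, illustrative
v_x = np.average(X, axis=0, weights=mu_i ** 2.0)   # eq. (9) with m = 2
F_i = fuzzy_covariance(X, mu_i, v_x)
print(F_i.shape)   # (3, 3), symmetric positive semi-definite
```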
When the variables are highly correlated, the $\mathbf{F}_i$ covariance matrix can be ill-conditioned and cannot be inverted. Recently, two methods have been worked out to handle this problem [14]. The first method is based on fixing the ratio between the maximal and minimal eigenvalues of the covariance matrix. The second method is based on adding a scaled unity matrix to the calculated covariance matrix. Both methods result in invertible matrices, but neither of them extracts the potential information about the hidden structure of the data.

Principal Component Analysis (PCA) is based on the projection of correlated high-dimensional data onto a hyperplane, which is useful for the visualization and the analysis of the data. This mapping uses only the first few $p$ nonzero eigenvalues and the corresponding eigenvectors of the covariance matrix, decomposed as $\mathbf{F}_i = \mathbf{U}_i \boldsymbol{\Lambda}_i \mathbf{U}_i^T$, where the $\boldsymbol{\Lambda}_i$ matrix includes the eigenvalues of $\mathbf{F}_i$ in its diagonal in decreasing order, and the $\mathbf{U}_i$ matrix includes the corresponding eigenvectors in its columns:

$$y_{i,k} = \boldsymbol{\Lambda}_{i,p}^{-\frac{1}{2}} \mathbf{U}_{i,p}^T x_k\,.$$

When the hyperplane of the PCA model has an adequate number of dimensions, the distance of the data from the hyperplane results only from measurement failures, disturbances and negligible information, so the projection of the data onto this p-dimensional hyperplane does not cause significant reconstruction error:

$$Q_{i,k} = (x_k - \hat{x}_k)^T (x_k - \hat{x}_k) = x_k^T (\mathbf{I} - \mathbf{U}_{i,p} \mathbf{U}_{i,p}^T)\, x_k\,. \tag{11}$$
Although the relationship among the variables can be effectively described by a linear model, in some cases it is possible that the data is distributed around some separated centers in this linear subspace. The Hotelling $T^2$ measure is often used to calculate the distance of the data point from the center in this linear subspace:

$$T_{i,k}^2 = y_{i,k}^T\, y_{i,k}\,. \tag{12}$$

The $T^2$ and $Q$ measures are often used for the monitoring of multivariate systems and for the exploration of the errors and the causes of the errors. Figure 1 shows these measures in the case of two variables and one principal component.
Fig. 1. Distance measures based on the PCA model.
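The two PCA-based measures (11) and (12) can be sketched as follows (a minimal NumPy illustration; the toy data and the choice p = 1 are our own assumptions):

```python
import numpy as np

def pca_measures(F, x, p):
    # eigendecomposition F = U Lambda U^T, eigenvalues sorted decreasing
    lam, U = np.linalg.eigh(F)          # eigh returns ascending order
    lam, U = lam[::-1], U[:, ::-1]
    U_p, lam_p = U[:, :p], lam[:p]
    y = (U_p.T @ x) / np.sqrt(lam_p)             # y = Lambda_p^{-1/2} U_p^T x
    Q = x @ (np.eye(len(x)) - U_p @ U_p.T) @ x   # eq. (11): reconstruction error
    T2 = y @ y                                   # eq. (12): Hotelling T^2
    return Q, T2

# correlated 2-D data: x2 ~ 2*x1, so one principal component suffices
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
X = np.column_stack([x1, 2 * x1 + 0.01 * rng.normal(size=500)])
F = np.cov(X.T)
Q, T2 = pca_measures(F, X[0], p=1)
print(Q)      # small: the point lies close to the PCA hyperplane
Q_off, _ = pca_measures(F, np.array([1.0, -2.0]), p=1)
print(Q_off)  # large: this point violates the correlation structure
```

A point consistent with the learned correlation structure has a small Q even if it is far from the center (large T²), which is exactly the distinction the two homogeneity measures below exploit.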
The main idea of this paper is the usage of these measures as the measure of the homogeneity of the segments:

Fuzzy-PCA-Q method: The distance measure is based on the $Q$ model error. The $\mathbf{A}_i$ distance norm is the inverse of the matrix constructed from the $n-p$ smallest eigenvalues and the corresponding eigenvectors of $\mathbf{F}_i$:

$$\mathbf{A}_i^{-1} = \mathbf{U}_{i,n-p} \boldsymbol{\Lambda}_{i,n-p} \mathbf{U}_{i,n-p}^T\,. \tag{13}$$

Fuzzy-PCA-$T^2$ method: The distance measure is based on the Hotelling $T^2$. $\mathbf{A}_i$ is defined by the largest $p$ eigenvalues and the corresponding eigenvectors of $\mathbf{F}_i$:

$$\mathbf{A}_i^{-1} = \mathbf{U}_{i,p} \boldsymbol{\Lambda}_{i,p} \mathbf{U}_{i,p}^T\,. \tag{14}$$
When $p$ is equal to the rank of $\mathbf{F}_i$, this method is identical to the fuzzy covariance matrix based distance measure, where $\mathbf{F}_i$ is inverted by taking the Moore-Penrose pseudo-inverse.

The optimal parameters of the $\eta_i = \{v_i^x, \mathbf{A}_i, v_i^t, \sigma_i^2, \alpha_i\}$ cluster prototypes are determined by the minimization of the functional (5) subject to the following constraints, which define a fuzzy partitioning space:

$$U = \left\{ [\mu_{i,k}]_{c \times N} \,\middle|\, \mu_{i,k} \in [0,1],\ \forall i,k;\ \sum_{i=1}^{c} \mu_{i,k} = 1,\ \forall k;\ 0 < \sum_{k=1}^{N} \mu_{i,k} < N,\ \forall i \right\}. \tag{15}$$

This constrained minimization is solved by alternating optimization. Choose the number of clusters $c$ and a termination tolerance $\epsilon > 0$, and initialize the $U = [\mu_{i,k}]_{c \times N}$ partition matrix randomly. Repeat for $l = 1, 2, \ldots$

Step 1. Calculate the $\eta_i$ parameters of the clusters by (7), (9), and (13) or (14).

Step 2. Compute the $D_{i,k}^2(\eta_i, z_k)$ distance measures (5) by (6) and (8).

Step 3. Update the partition matrix

$$\mu_{i,k}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( D_{i,k}/D_{j,k} \right)^{2/(m-1)}}\,, \quad 1 \le i \le c,\ 1 \le k \le N\,. \tag{17}$$

until $\|U^{(l)} - U^{(l-1)}\| < \epsilon$.
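The partition-matrix update of Step 3 can be sketched as follows (NumPy; `D2` is assumed to hold the c × N combined squared distances from (5), here filled with toy values):

```python
import numpy as np

def update_partition(D2, m=2.0):
    # eq. (17): mu_{i,k} = 1 / sum_j (D_{i,k}/D_{j,k})^{2/(m-1)}
    # since D2 stores squared distances, the exponent becomes 1/(m-1)
    ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

# toy squared distances for c = 2 clusters and N = 4 points
D2 = np.array([[1.0, 1.0, 4.0, 9.0],
               [4.0, 1.0, 1.0, 1.0]])
U = update_partition(D2)
print(U)   # columns sum to 1; a smaller distance yields a larger membership
```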
3 Application to Process Monitoring
Manual process supervision relies heavily on visual monitoring of characteristic shapes of changes in process variables, especially their trends. Although humans are very good at visually detecting such patterns, this is a difficult problem for control system software. Researchers with different backgrounds, for example from pattern recognition, digital signal processing and data mining, have contributed to the development of process trend analysis and have put emphasis on different aspects of the field. At least the following issues are characteristic of the process control viewpoint: (I.) Methods operate on process measurements or calculated values in time-series which are not necessarily sampled at constant intervals. (II.) Trends have meaning to human experts. (III.) A trend is a pattern or structure in one-dimensional data. (IV.) The application area is in process monitoring, diagnosis and control [3,5,10].

The aim of this example is to show how the proposed algorithm is able to detect meaningful temporal shapes from multivariate historical process data. The monitoring of a medium- and high-density polyethylene (MDPE, HDPE) plant is considered. HDPE is a versatile plastic used for household goods, packaging, car parts and pipe. The plant is operated by TVK Ltd., the largest polymer production company in Hungary (www.tvk.hu). An interesting problem with the process is that about ten product grades must be produced according to market demand. Hence, there is a clear need to minimize the time of changeover, because off-specification product may be produced during the transition. The difficulty of the problem comes from the fact that there are more than
ten process variables to consider. Measurements are available every 15 seconds on the process variables $x_k$: the polymer production intensity ($PE$), the inlet flowrates of hexene ($C6_{in}$), ethylene ($C2_{in}$), hydrogen ($H2_{in}$), the isobutane solvent ($IB_{in}$) and the catalyst ($Kat$), the concentrations of ethylene ($C2$), hexene ($C6$) and hydrogen ($H2$) and of the slurry in the reactor ($slurry$), and the temperature of the reactor ($T$). The section used for this example is a 125-hour period which includes at least three segments: a product transition around the 25th hour, a "calm" period until the 100th hour, and a "stirring" period of 30 hours (see Figure 2).
Fig. 2. Example of a time-series of a grade transition.
Fig. 3. Fuzzy-PCA-Q segmentation: (a) $1/D_{i,k}(v_i^x, x_k)$ distances, (b) fuzzy time-series segments $\beta_i(t_k)$.
Fig. 4. Fuzzy-PCA-$T^2$ segmentation: (a) $1/D_{i,k}(v_i^x, x_k)$ distances, (b) fuzzy time-series segments $\beta_i(t_k)$.
The distance between the data points and the cluster centers can be computed by two methods corresponding to the two distance measures of PCA: the $Q$ reconstruction error and the Hotelling $T^2$. These methods give different results which can be used for different purposes.

The Fuzzy-PCA-Q method is sensitive to the number of principal components. With the increase of $p$ the reconstruction error decreases. In the case of $p = n = 11$ the reconstruction error becomes zero in the whole range of the data set. If $p$ is too small, the reconstruction error will be large for the entire time-series. In these two extreme cases the segmentation is not based on the internal relationships among the variables, so equidistant segments are detected. When the number of latent variables is in the range $p = 3, \ldots, 8$, reliable segments are detected: the algorithm finds the grade transition around the 25th hour (see Figure 3).

The Fuzzy-PCA-$T^2$ method is also sensitive to the number of principal components. Increasing the model dimension does not necessarily decrease the Hotelling $T^2$, because the volume of the fuzzy covariance matrix (the product of the eigenvalues) tends to zero (see (8)). Hence, it is practical to use a mixture of PCA models with similar performance capacity. This can be achieved by defining $p$ based on the desired accuracy (loss of variance) of the PCA models:

$$\sum_{f=1}^{p-1} \mathrm{diag}(\boldsymbol{\Lambda}_{i,f}) \Big/ \sum_{f=1}^{n} \mathrm{diag}(\boldsymbol{\Lambda}_{i,f}) \;<\; q \;\le\; \sum_{f=1}^{p} \mathrm{diag}(\boldsymbol{\Lambda}_{i,f}) \Big/ \sum_{f=1}^{n} \mathrm{diag}(\boldsymbol{\Lambda}_{i,f})\,. \tag{18}$$
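Selecting p by (18) amounts to taking the smallest p whose cumulative eigenvalue ratio reaches the desired accuracy q (a sketch with illustrative eigenvalues):

```python
import numpy as np

def choose_p(eigenvalues, q=0.99):
    # eq. (18): smallest p such that the cumulative variance ratio reaches q
    lam = np.sort(eigenvalues)[::-1]          # decreasing order
    ratio = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(ratio, q) + 1)

lam = np.array([4.0, 3.0, 2.0, 0.5, 0.5])     # illustrative eigenvalues
print(choose_p(lam, q=0.9))    # -> 3
print(choose_p(lam, q=0.99))   # -> 5
```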
As Figure 4 shows, with $q = 0.99$ the algorithm also finds a correct segmentation, one that closely reflects the "drift" of the operating regions (cf. Figure 3).
4 Conclusions
This paper presented a new clustering algorithm for the fuzzy segmentation of multivariate time-series. The algorithm is based on the simultaneous identification of the fuzzy sets which represent the segments in time and of the hyperplanes of local PCA models used to measure the homogeneity of the segments. Two homogeneity measures, corresponding to the two typical applications of PCA models, have been defined. The $Q$ reconstruction error segments the time-series according to the change of the correlation among the variables, while the Hotelling $T^2$ measure segments the time-series based on the drift of the center of the operating region.

The algorithm described here was applied to the monitoring of the production of high-density polyethylene. When the $Q$ reconstruction error was used during the clustering, good performance was achieved when the number of principal components was defined as a parameter of the clusters. On the contrary, when the clustering was based on the $T^2$ distances from the cluster centers, it was advantageous to define the desired reconstruction performance to obtain a good segmentation. The results suggest that the proposed tool can be applied to distinguish typical operational conditions and to analyze product grade transitions. We plan to use the results of the segmentations to design an intelligent query system for typical operating conditions in multivariate historical process databases.

Acknowledgements. The authors would like to acknowledge the support of the Cooperative Research Center (VIKKK) (project 2001-II-1), the Hungarian Ministry of Education (FKFP-0073/2001), and the Hungarian Science Foundation (T037600). Janos Abonyi is grateful for the financial support of the Janos Bolyai Research Fellowship of the Hungarian Academy of Sciences. The support of our industrial partners at TVK Ltd., especially Miklos Nemeth, Lorant Balint and dr. Gabor Nagy, is gratefully acknowledged.
References

1. K. Vasko, H. Toivonen, Estimating the number of segments in time series data using permutation tests, IEEE International Conference on Data Mining (2002) 466–473.
2. M. Last, Y. Klein, A. Kandel, Knowledge discovery in time series databases, IEEE Transactions on Systems, Man, and Cybernetics 31 (1) (2000) 160–169.
3. S. Kivikunnas, Overview of process trend analysis methods and applications, ERUDIT Workshop on Applications in Pulp and Paper Industry (1998) CD ROM.
4. M. E. Tipping, C. M. Bishop, Mixtures of probabilistic principal component analysers, Neural Computation 11 (1999) 443–482.
5. G. Stephanopoulos, C. Han, Intelligent systems in process engineering: A review, Comput. Chem. Engng. 20 (1996) 743–791.
6. K. Chakrabarti, S. Mehrotra, Local dimensionality reduction: A new approach to indexing high dimensional spaces, Proceedings of the 26th VLDB Conference, Cairo, Egypt (2000) P089.
7. J. Abonyi, Fuzzy Model Identification for Control, Birkhauser, Boston, 2003.
8. A. B. Geva, Hierarchical-fuzzy clustering of temporal-patterns and its application for time-series prediction, Pattern Recognition Letters 20 (1999) 1519–1532.
9. J. F. Baldwin, T. P. Martin, J. M. Rossiter, Time series modelling and prediction using fuzzy trend information, Proceedings of the 5th International Conference on Soft Computing and Information Intelligent Systems (1998) 499–502.
10. J. C. Wong, K. McDonald, A. Palazoglu, Classification of process trends based on fuzzified symbolic representation and hidden Markov models, Journal of Process Control 8 (1998) 395–408.
11. J. Himberg, K. Korpiaho, H. Mannila, J. Tikanmaki, H. T. Toivonen, Time-series segmentation for context recognition in mobile devices, IEEE International Conference on Data Mining (ICDM'01), San Jose, California (2001) 203–210.
12. E. Keogh, S. Chu, D. Hart, M. Pazzani, An online algorithm for segmenting time series, IEEE International Conference on Data Mining (2001) http://citeseer.nj.nec.com/keogh01online.html.
13. J. Abonyi, R. Babuska, F. Szeifert, Modified Gath-Geva fuzzy clustering for identification of Takagi-Sugeno fuzzy models, IEEE Transactions on Systems, Man, and Cybernetics 32 (5) (2002) 612–621.
14. R. Babuška, P. J. van der Veen, U. Kaymak, Improved covariance estimation for Gustafson-Kessel clustering, IEEE International Conference on Fuzzy Systems (2002) 1081–1085.
15. F. Hoppner, F. Klawonn, R. Kruse, T. Runkler, Fuzzy Cluster Analysis, Wiley, Chichester, 1999.
16. F. Hoppner, F. Klawonn, Fuzzy clustering of sampled functions, NAFIPS 2000, Atlanta, USA (2000) 251–255.