Robust Singular Spectrum Transform
Yasser Mohammad¹, Toyoaki Nishida²
Nishida-Sumi Laboratory, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Japan
¹ [email protected]  ² [email protected]
Abstract. Change point discovery is a basic algorithm needed in many time series mining applications including rule discovery, motif discovery, causal analysis, etc. Several techniques for change point discovery have been suggested, including wavelet analysis, cosine transforms, CUMSUM, and the Singular Spectrum Transform. Of these methods, the Singular Spectrum Transform (SST) has received much attention because of its generality and because it does not require ad-hoc adjustment for every time series. In this paper we show that traditional SST suffers from two major problems: the need to specify five parameters and the rapid reduction in specificity with increased noise levels. We then define the Robust Singular Spectrum Transform (RSST), which alleviates both of these problems, and compare it to SST using different synthetic and real-world data series.
1 Introduction
Research on the change point (CP) discovery problem has resulted in many techniques including CUMSUM [1], wavelet analysis [2], inflection point search [3], autoregressive modeling [4], the Discrete Cosine Transform, and the Singular Spectrum Transform (SST) [5]. Most of these methods, with the exception of SST, either discover a single kind of change (e.g. CUMSUM discovers only mean shifts), require ad-hoc tuning for every time series (e.g. wavelet analysis), or assume a restricted generation process (e.g. Gaussian mixtures). The main disadvantages of SST, though, are its sensitivity to noise and the need to specify five different parameters. The main idea of SST is to use PCA to discover the degree of 'difference' between the past and future signal pattern around every point in the time series and to use this difference as the change score for that point. Many researchers have suggested improvements to traditional SST, although most of these suggestions targeted increasing the speed of the algorithm rather than its accuracy: [6] introduced online SVD and [7] proposed Krylov subspace learning. [7] also proposed using the angle between the subspaces associated with the major PCA components of the past and the future to calculate the change score at every point. The main problem of this proposal is the assumption that all eigenvectors are equal in importance, which can be inaccurate if the distribution of the top eigenvalues is not nearly uniform (a condition that happens most of the time in our experience with real-world time series). In this paper we propose a different approach that utilizes the information of the eigenvalues as well as the eigenvectors for finding the change score.
This paper defines the Robust Singular Spectrum Transform (RSST) for discovering change points in time series, which reduces the number of required parameters from five to two and dramatically increases the specificity of the traditional SST without decreasing its sensitivity. Like SST, RSST is linear in time and space requirements and adds only a very small constant increase in processing time. Moreover, speedup techniques like the Krylov subspace learning suggested in [7] can be directly utilized with RSST. Extensive comparisons between SST and RSST support the superiority of RSST on both synthetic and real-world data (see Section 4).
2 Singular Spectrum Transform
Moskvina and Zhigljavsky [8] used the singular spectrum analysis technique for change detection. The technique is based on the SVD of a Hankel matrix. As SVD can be applied to almost any kind of matrix, the algorithm can be applied to various types of time series without ad-hoc tuning. The essence of the SST transform is to find, for every point x(i), the difference between a representation of the dynamics of the few points before it (i.e. x(i-p) : x(i)) and the few points after it (i.e. x(i+g) : x(i+f)). This difference is normalized to have a value between zero and one and is named x_s(i). The dynamics of the points before the current point are represented using the Hankel matrix, which is calculated as:

$H(t) = \left[\, seq(t-n),\ \ldots,\ seq(t-1) \,\right]$   (1)

where $seq(t) = \{x(t-w+1),\ \ldots,\ x(t)\}^T$. Singular Value Decomposition (SVD) is then used to find the singular values and vectors of the Hankel matrix by solving:

$H(t) = U(t)\, S(t)\, V(t)^T$   (2)

where $S(i-1,i-1) \ge S(i,i) \ge S(i+1,i+1)$. Only the first l left singular vectors ($U_l(t)$) are kept to represent the past change pattern as the hyperplane they define. [5] showed that this hyperplane encodes the major directions of change in the signal. A similar procedure is used to find the direction of largest change in the dynamics of the future of the signal by concatenating m overlapping windows of size w starting g points after t according to:

$r(t+g) = \{x(t+g),\ \ldots,\ x(t+g+w-1)\}^T$   (3)

$G(t) = \left[\, r(t+g),\ \ldots,\ r(t+g+m-1) \,\right]$   (4)
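To make the construction of equations (1)-(4) concrete, the following Python sketch builds the past and future Hankel matrices and takes the SVD of the past one. It assumes a one-dimensional NumPy array x and valid indices; the helper names are ours, not part of the original algorithm description.

```python
import numpy as np

def hankel_past(x, t, w, n):
    """Past Hankel matrix H(t): the n subsequences of length w ending at
    t-n, ..., t-1 (Eq. 1), each seq(i) = [x(i-w+1), ..., x(i)]^T as a column."""
    return np.column_stack([x[i - w + 1:i + 1] for i in range(t - n, t)])

def hankel_future(x, t, w, m, g=0):
    """Future Hankel matrix G(t): m overlapping windows of length w starting
    g points after t (Eqs. 3-4), one window per column."""
    return np.column_stack([x[t + g + j:t + g + j + w] for j in range(m)])

# Left singular vectors of H(t), ordered by decreasing singular value (Eq. 2):
# U, s, Vt = np.linalg.svd(hankel_past(x, t, w, n), full_matrices=False)
```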
The eigenvector β(t) corresponding to the direction of maximum change in the future of the signal is found by solving:

$G(t)\, G(t)^T u = \mu u$   (5)

$\beta(t) = u_{m}, \quad m = \arg\max_i(\mu_i)$   (6)

If there is no change in the dynamics of the signal, it is expected that β(t) will lie in or very near to the hyperplane represented by $U_l$ (i.e. the directions of maximum change in the past). To quantify the discrepancy between β(t) and $U_l$, we find the normalized projection α(t) of β(t) onto $U_l$:

$\alpha(t) = \dfrac{U_l U_l^T \beta(t)}{\left\| U_l U_l^T \beta(t) \right\|}$   (7)

The change score is then calculated from the cosine of the angle between α(t) and β(t):

$x_s(t) = 1 - \alpha(t)^T \beta(t)$   (8)
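A minimal sketch of the resulting SST change score (equations (5)-(8)) under this reading, reusing the Hankel helpers above: the top left singular vector of G(t) serves as the eigenvector of G(t)G(t)^T with the largest eigenvalue. The function name and interface are illustrative only.

```python
def sst_change_score(x, t, w, n, m, g, l):
    """Sketch of the SST change score at point t (Eqs. 5-8): cosine distance
    between the top future direction beta and its normalized projection onto
    the hyperplane spanned by the first l past left singular vectors."""
    H = hankel_past(x, t, w, n)
    G = hankel_future(x, t, w, m, g)
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Ul = U[:, :l]                               # first l left singular vectors of H(t)
    Uf, _, _ = np.linalg.svd(G, full_matrices=False)
    beta = Uf[:, 0]                             # direction of maximum future change
    proj = Ul @ (Ul.T @ beta)                   # projection onto the past hyperplane
    alpha = proj / np.linalg.norm(proj)         # Eq. 7
    return 1.0 - float(alpha @ beta)            # Eq. 8
```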
The first problem of the SST algorithm is the need to specify five different parameters. [5] has shown that SST is usually robust to wide variations in w, and domain knowledge or visualization can help in finding appropriate values for n and w. Choosing the rest of the SST parameters (i.e. g, m and especially l) is more problematic, as domain knowledge is not very useful in choosing them. One of the contributions of the proposed RSST transform is to automatically determine a sensible value for these three parameters (Section 3). The second problem of SST is the fast degradation of its specificity when the input signal is noisy, especially with a constant or zero background signal, as will be shown in Section 4. The main contribution of the proposed RSST transform is alleviating this limitation, which allows accurate change point discovery under very noisy conditions (Section 4).
3 Robust Singular Spectrum Transform
The Robust Singular Spectrum Transform is proposed in this paper as a solution to the two main problems of SST detailed in the previous section. In SST, the parameter g encodes the delay after which we look for the change in the signal. Changing g affects the results only when g becomes very near to w; once g approaches w, the correlation between the SST transform and the ground-truth change points degrades sharply in the synthetic data we used. This suggests that any value of g ≪ w is enough, and for this reason RSST fixes g at zero. The parameter m serves a similar role to the parameter n, namely deciding how deep we look into the future (respectively the past) for changes in dynamics. As RSST processes past and future sequences symmetrically, m is set to n.
The parameter l is more difficult to choose. SST fixes this parameter to a value specified by the user. In RSST, the value of l(t) is allowed to change from point to point in the time series depending on the complexity of the signal before it. To calculate a sensible value for l, we first sort the singular values of H(t) and find the corner of their accumulated sum (l_inf(t)), i.e. the point at which the tangent to the curve has an angle of π/4. The singular vectors with singular values higher than this value are assumed to be caused by the genuine dynamics of the signal, while the remaining directions encode the effect of noise. This dynamic setting of l reduces the effect of noise on the final results, as will be shown in the following section.
After choosing the parameters g, m and l(t), RSST works in the same way as SST for finding the past and future patterns and calculates H(t), G(t), and U_l(t) similarly. To find a first guess of the change score around every point, RSST utilizes more information from the future Hankel matrix G(t) than SST by using the l_f(t) eigenvectors of G(t)G(t)^T with the highest corresponding eigenvalues (λ_{1:l_f}) rather than only the first eigenvector used in SST. The value of l_f(t) is selected using the same algorithm as that used for l(t):

$G(t)\, G(t)^T u = \lambda u$   (9)

$\beta_i(t) = u_i, \quad i \le l_f(t), \quad \text{where } \lambda_{j-1} \ge \lambda_j \ge \lambda_{j+1} \text{ for } 1 < j < w$   (10)
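The text does not spell out how the corner of the accumulated sum is located numerically, so the following is only one possible reading: normalize both axes of the cumulative-sum curve of the sorted singular values and take the first point whose tangent drops to an angle of π/4 (slope 1). Names and the normalization are our assumptions.

```python
def corner_rank(singular_values):
    """Sketch of the dynamic rank selection: accumulate the sorted singular
    values and take the 'corner' where the tangent of the normalized
    cumulative-sum curve first drops to 45 degrees (slope <= 1)."""
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]  # descending
    cum = np.cumsum(s)
    xs = np.linspace(0.0, 1.0, len(s))        # normalize both axes to [0, 1]
    ys = cum / cum[-1]
    slopes = np.gradient(ys, xs)
    below = np.where(slopes <= 1.0)[0]        # tangent angle <= pi/4
    l = (below[0] + 1) if len(below) else len(s)
    return max(1, l)
```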
Each of these $l_f$ directions is then projected onto the hyperplane defined by $U_l(t)$. The normalized projection of each $\beta_i(t)$ onto this hyperplane is found using:

$\alpha_i(t) = \dfrac{U_l U_l^T \beta_i(t)}{\left\| U_l U_l^T \beta_i(t) \right\|}, \quad i \le l_f(t)$   (11)

The change scores defined by the $\beta_i(t)$s and $\alpha_i(t)$s are then calculated as:

$cs_i(t) = 1 - \alpha_i(t)^T \beta_i(t)$   (12)
The first guess of the change score at the point t is then calculated as the weighted sum of these change scores, where the eigenvalues of $G(t)G(t)^T$ are used as weights:

$\hat{x}(t) = \dfrac{\sum_{i=1}^{l_f} \lambda_i \, cs_i(t)}{\sum_{i=1}^{l_f} \lambda_i}$   (13)
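Putting equations (9)-(13) together, a hedged sketch of the RSST first-guess score, reusing the hankel_past, hankel_future and corner_rank helpers sketched above and recalling that the eigenvalues of G(t)G(t)^T are the squared singular values of G(t); the function name and interface are assumptions.

```python
def rsst_first_guess(x, t, w, n):
    """Sketch of the RSST first-guess score x_hat(t) (Eqs. 9-13): project the
    top l_f future directions onto the top l past directions and combine the
    resulting cosine distances weighted by the future eigenvalues."""
    H = hankel_past(x, t, w, n)
    G = hankel_future(x, t, w, n, g=0)       # RSST fixes g = 0 and m = n
    U, sH, _ = np.linalg.svd(H, full_matrices=False)
    Uf, sG, _ = np.linalg.svd(G, full_matrices=False)
    l = corner_rank(sH)                      # dynamic l(t)
    lf = corner_rank(sG)                     # dynamic l_f(t)
    Ul = U[:, :l]
    lam = sG[:lf] ** 2                       # eigenvalues of G(t) G(t)^T
    score = 0.0
    for i in range(lf):
        beta = Uf[:, i]
        proj = Ul @ (Ul.T @ beta)
        alpha = proj / np.linalg.norm(proj)  # Eq. 11
        score += lam[i] * (1.0 - float(alpha @ beta))  # Eqs. 12-13 numerator
    return score / lam.sum()
```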
After applying the aforementioned steps we get a first estimate $\hat{x}(t)$ of the change score at every point t of the time series. RSST then applies a filtering step to attenuate the effect of noise on the final scores. The main insight of this filter is that the reduction of SST specificity in noisy signals happens in the sections in which noise takes over the original signal in the time series. For uncorrelated white noise, the response of SST in these sections can be modeled as a random fluctuation around a high average. The filter used by RSST discovers the sections in which the average and the variance of $\hat{x}(t)$ remain nearly constant and attenuates them. The guess of the change score at every point is then updated by:

$\tilde{x}(t) = \hat{x}(t) \times \sqrt{\left| \mu_a(t) - \mu_b(t) \right|} \times \sqrt{\left| \sigma_a(t) - \sigma_b(t) \right|}$   (14)

where $\mu_a$ and $\sigma_a$ are the mean and variance of $\hat{x}(t)$ in a subsequence of length w before the point t, while $\mu_b$ and $\sigma_b$ are the mean and variance of $\hat{x}(t)$ in a subsequence of length w after the point t. RSST then keeps only the local maxima of $\tilde{x}(t)$ and normalizes the resulting time series by dividing it by its maximum. This normalized signal represents the final change score of RSST.
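A possible implementation of this filtering and normalization step (equation (14) followed by local-maximum selection); the exact windowing and boundary handling are assumptions.

```python
def rsst_filter(xhat, w):
    """Sketch of the RSST filtering step (Eq. 14): attenuate regions where the
    mean and variance of x_hat stay nearly constant on both sides of a point
    (noise-dominated sections), then keep local maxima and normalize."""
    xhat = np.asarray(xhat, dtype=float)
    out = np.zeros_like(xhat)
    for t in range(w, len(xhat) - w):
        before = xhat[t - w:t]
        after = xhat[t:t + w]
        mu_term = np.sqrt(abs(before.mean() - after.mean()))
        sigma_term = np.sqrt(abs(before.var() - after.var()))
        out[t] = xhat[t] * mu_term * sigma_term
    keep = np.zeros_like(out)
    for t in range(1, len(out) - 1):
        if out[t] >= out[t - 1] and out[t] >= out[t + 1]:
            keep[t] = out[t]
    return keep / keep.max() if keep.max() > 0 else keep
```

As a usage illustration, a full (unoptimized) pipeline under these assumptions would be scores = rsst_filter([rsst_first_guess(x, t, w, n) for t in range(w + n, len(x) - w - n)], w), padded appropriately at the boundaries.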
4 Comparison Between SST and RSST
Fig. 1 shows visually the effect of adding white noise to the signal on the performance of SST and the proposed RSST. For every condition, the original signal, the response of the SST transform, and the response of the RSST transform are shown. Fig. 1.a and Fig. 1.d show the response of both SST and RSST to two changes in dynamics of a data series with a strong background signal and with no background signal, respectively; both algorithms perform well in this condition. Fig. 1.b and Fig. 1.e show the effect of adding uniform random noise of range -0.5%:0.5% of the peak-to-peak (P-P) value of the original signal. Even with this very low noise level, the SST transform gives high responses at all locations in which the noise level is higher than the original signal value, and its performance is especially unacceptable when there is no background signal. The RSST transform, on the other hand, performs much better under this very low noise level. Fig. 1.c and Fig. 1.f show the effect of adding uniform random noise of range -50%:50% of the P-P value of the original signal. Again, the performance of RSST is visually much better than that of SST under this very noisy condition. This difference in performance is quantified in this section by extensive tests using synthetic data with adjustable noise levels and background signal strength.

Fig. 1. The effect of noise level on the performance of the SST and RSST algorithms. For every situation four signals are presented in the following order: original time signal, ground-truth locations of the change points, response of the SST algorithm, response of the RSST algorithm. Cases (a), (b) and (c) represent increasing noise levels when a strong background signal exists. Cases (d), (e) and (f) represent increasing noise levels when no background signal exists.

To compare the performance of SST and RSST in both speed and accuracy, 5760 different synthetic time series with controlled embedded changes were produced by varying various features of the time series. Every time series was composed of a recurring pattern called the background signal, with embedded patterns inserted at random locations in controllable numbers. Uniform noise was added to the time series before applying the transforms. The independent variables in this experiment were the background signal strength (the peak-to-peak value of the background signal divided by the peak-to-peak value of the embedded pattern), the noise level, and the generating processes. The length of the time series was fixed at 8000 points and the length of the embedded pattern was varied from 40 to 400 points. The parameters w and n were fixed at 25 and 10 respectively. Changing w in the range 10 to 60 did not cause any major change in the results, and changing n in the range 5 to 15 also did not cause any major change in the performance; due to lack of space the results of these tests are not reported here, as [5] previously reported similar results for the SST transform. For SST, g and m were set to zero and n respectively (as in RSST). The value of l was set to 3 after experimenting with 300 random samples of the time series and selecting the value that achieved the best performance. For evaluation purposes, we treated the problem as discrete event detection and used specificity and sensitivity to measure the effectiveness of SST and RSST as follows:
1. Sensitivity of Maximum Response (SnMax): the maximum response of the transform in a window of width τ/100 around the true points of change, averaged over all true points of change, where τ is the length of the embedded pattern.
2. Sensitivity of Response Density (SnDen): the average response of the transform in a window of width τ/10 around the true points of change, averaged over all true points of change.
3. Specificity of Maximum Response (SpMax): the maximum response of the transform in all τ/10 windows of the time series excluding the τ/10 windows around the true change points.
4. Specificity of Response Density (SpDen): the average response of the transform at all points of the time series excluding all τ/10 windows around the true change points.
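These four metrics can be read literally as window statistics of the transform response around and away from the true change points. The sketch below computes those raw quantities; how they are mapped onto the specificity and sensitivity values plotted in Fig. 2 is not specified in the text, so the names and integer window half-widths are assumptions.

```python
def evaluate(scores, change_points, tau):
    """Sketch of the four evaluation metrics: 'scores' is the transform
    response, 'change_points' the true change locations, 'tau' the
    embedded-pattern length (window widths tau/100 and tau/10)."""
    scores = np.asarray(scores, dtype=float)
    half_max, half_den = max(1, tau // 200), max(1, tau // 20)

    def window(c, half):
        return scores[max(0, c - half):min(len(scores), c + half)]

    sn_max = np.mean([window(c, half_max).max() for c in change_points])
    sn_den = np.mean([window(c, half_den).mean() for c in change_points])

    outside = np.ones(len(scores), dtype=bool)
    for c in change_points:
        outside[max(0, c - half_den):min(len(scores), c + half_den)] = False
    sp_max = scores[outside].max()    # maximum response away from true changes
    sp_den = scores[outside].mean()   # average response away from true changes
    return sn_max, sn_den, sp_max, sp_den
```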
Fig. 2. Comparison between RSST and SST using synthetic data: (a) effect of background signal strength on the specificity, (b) effect of background signal strength on the sensitivity, (c) effect of the noise level on the specificity, (d) effect of the noise level on the sensitivity.
Fig. 2-a shows the effect of background signal strength on the specificity of both SST and RSST. As the figure shows, the specificity of RSST is always higher than that of SST in terms of Specificity of Response Density (SpDen). This means that RSST is more suitable for applications in which the response around the change points, not only the response at the change points, is important, because RSST concentrates high responses at and around change points rather than at other points in the time series. In terms of Specificity of Maximum Response (SpMax), RSST is also superior to SST at both low and high background signal strengths, with a range between 40% and 70% in our data sets in which both transforms perform equally. The important point here is that RSST is not only superior to SST in terms of specificity using both metrics but is also more stable, especially when the background signal strength is low (≤30%). In these cases SST fails to distinguish the changes resulting from the noise from the changes resulting from the original time series, because the amplitude of the time series is smaller than the amplitude of the noise. RSST can cope with these situations because of its final filtering step. Fig. 2-b shows the effect of background signal strength on the sensitivity of both SST and RSST. As the figure shows, there is no clear superiority of one transform over the other in terms of Sensitivity of Maximum Response, but SST shows a slight superiority in Sensitivity of Response Density. This can be attributed to the reduced specificity of SST, which results in many false high responses, some of which naturally fall very near to the change points and increase the apparent sensitivity of SST. This effect did not appear in the SnMax metric, as the maximum response is not affected by these noisy false responses. Fig. 2-c shows the effect of noise level on the specificity of the two transforms. RSST is clearly superior to SST in terms of specificity at all noise levels, and the rate of degradation of its specificity is lower than that of SST; this superiority is especially clear in the SpDen metric. Fig. 2-d shows the effect of noise level on the sensitivity of the two transforms. Here again SST is slightly superior to RSST due to the effect of noisy false responses near the change points explained earlier.
Fig. 3. Effect of signal complexity on SST and RSST: (a) computation time, (b) number of eigenvectors used from the past and future Hankel matrices.
Fig. 3-a shows the effect of signal complexity on the computation times of SST and RSST. Signal complexity is defined as the product of the noise level and the number of embedded changes. As the figure shows, the computational complexity of both SST and RSST is linear in signal complexity. RSST has a higher slope than SST because, as the noise level increases, the specificity of the basic SST transform decreases, causing multiple sections of noise-generated high responses that require more computation to be eliminated (see Section 3). Fig. 3-b shows the effect of signal complexity on the number of effective singular values used from the past and future Hankel matrices. SST has fixed values for both of these parameters; RSST, on the other hand, chooses them automatically and, as shown in the figure, the number of eigenvectors used is linearly dependent on the signal complexity. This is a desirable feature from both the computational and accuracy points of view, as the size of the matrices used in the projections is kept near the optimal value, which attenuates the effect of noise. This may be one of the reasons for the increased specificity and robustness of RSST.
5 Application to Human-Human Interaction Mining
To test the accuracy of RSST on real-world data, we analyzed the respiration response of 22 participants with ages ranging from 18 to 45 while they were explaining an assembly/disassembly task to a listener who either listened carefully (natural session) or did not attend to the explanation (unnatural session). This led to 44 explanation sessions. For more details on this experiment refer to [9]. The respiration signal of the speaker was sampled at 100 Hz using a Polymate (TEAC Company) device. We then applied RSST and SST to the respiration signal of the instructor.
Fig. 4. Number of Peaks in the Change Points detected by SST and RSST
We calculated the number of peaks per minute (change score > 0.6) in the SST and RSST responses in 11 natural and 11 unnatural sessions. Fig. 4 shows a box plot of this count in the two experimental conditions using SST and RSST. Applying a t-test, no statistically significant difference between the SST responses in the natural and unnatural conditions was found, while the difference in the RSST responses between these conditions was significant with p < 0.001.
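A sketch of the peak-rate statistic and the significance test described here, assuming NumPy/SciPy; the peak definition (local maximum above the 0.6 threshold) and the variable names are our assumptions.

```python
from scipy import stats

def peaks_per_minute(scores, fs=100.0, threshold=0.6):
    """Count local maxima of the change score above the threshold and convert
    to a per-minute rate (fs is the sampling rate in Hz)."""
    scores = np.asarray(scores, dtype=float)
    peaks = [t for t in range(1, len(scores) - 1)
             if scores[t] > threshold
             and scores[t] >= scores[t - 1] and scores[t] >= scores[t + 1]]
    minutes = len(scores) / fs / 60.0
    return len(peaks) / minutes

# With one rate per session collected in two lists:
# t_stat, p_value = stats.ttest_ind(natural_rates, unnatural_rates)
```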
6 Conclusions
This paper presented a novel version of the Singular Spectrum Transform (SST) called the Robust Singular Spectrum Transform (RSST). The main differences between RSST and SST are:
1. fixing the choice of two parameters (g and m);
2. automatically calculating a sensible value for the number of singular vectors representing the past and the future;
3. using multiple future eigenvectors rather than a single one;
4. utilizing a final filtering step to attenuate the effect of noise on the transform.
The paper presented an extensive evaluation of the RSST algorithm using synthetic data, which confirmed its superiority over SST in specificity and robustness, especially in noisy environments and when the background signal strength of the time series is near zero. The last section of the paper applied the proposed transform to mining physiological data in a controlled human-human interaction experiment. The results show a statistically significant difference in the RSST response to the respiration signal of human subjects when interacting with a listener that behaves naturally and when interacting with a listener that behaves in an unnatural manner, while there was no statistically significant difference between these two conditions in the SST response to the same signal. Directions of future research include speeding up the calculation of the RSST transform by approximating the SVD calculation and applying it to generate constraints for motif discovery problems, which can increase both the speed and accuracy of such algorithms. Extending the algorithm to multidimensional data is another direction of future work.
References
1. Basseville, M., Nikiforov, I.V.: Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs, New Jersey (1993)
2. Kadambe, S., Boudreaux-Bartels, G.: Application of the wavelet transform for pitch detection of speech signals. IEEE Transactions on Information Theory 38(2) (Mar 1992) 917-924
3. Hirano, S., Tsumoto, S.: Mining similar temporal patterns in long time-series data and its application to medicine. In: ICDM '02: Proceedings of the 2002 IEEE International Conference on Data Mining, Washington, DC, USA, IEEE Computer Society (2002) 219
4. Gombay, E.: Change detection in autoregressive time series. J. Multivar. Anal. 99(3) (2008) 451-464
5. Ide, T., Inoue, K.: Knowledge discovery from heterogeneous dynamic systems using change-point correlations. In: Proc. SIAM Intl. Conf. Data Mining (2005)
6. Zha, H., Simon, H.D.: On updating problems in latent semantic indexing. SIAM Journal on Scientific Computing 21(2) (1999) 782-791
7. Ide, T., Tsuda, K.: Change-point detection using Krylov subspace learning. In: Proceedings of the SIAM International Conference on Data Mining (2007)
8. Moskvina, V., Zhigljavsky, A.: An algorithm based on singular spectrum analysis for change-point detection. Communications in Statistics - Simulation and Computation 32(4) (2003) 319-352
9. Mohammad, Y., Xu, Y., Matsumura, K., Nishida, T.: The H3R explanation corpus: human-human and base human-robot interaction dataset. In: The Fourth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP 2008) (December 2008)