ROKS 2013
Windowing strategies for on-line multiple kernel regression

Manuel Herrera
BATir - Université libre de Bruxelles
[email protected]

Rajan Filomeno Coelho
BATir - Université libre de Bruxelles
[email protected]
Abstract: This work proposes two on-line learning versions of multiple kernel regression (MKr) that update the current model to a more accurate one, avoiding the computational effort of re-computing the whole process each time new data become available. The first approach relies on sliding windows, a strategy that keeps the size of the kernel matrix constant. The second relies on the so-called "worm" windows: it shrinks the kernel matrix as sliding windows do, but not at every arrival of new data, so as to lose a minimum of information.

Keywords: Multiple kernel regression, on-line learning, windowing methods
1 Introduction

Most kernel-based algorithms cannot operate on-line because of a number of difficulties, such as time and memory complexity (due to the growing kernel matrix) and the need to avoid over-fitting. However, several works have addressed this issue in recent years [1, 6, 7]. A kernel-based recursive least-squares algorithm implementing a fixed-size "sliding-window" technique [5] was proposed by Van Vaerenbergh et al. [7]. We propose a similar methodology for resizing the kernel matrix to support the on-line process of multiple kernel regression (MKr) for mixed variables. The MKr process is summarized by Eq. 1 and Eq. 2, where the kernel matrix used in the regression (Eq. 2) is computed as a combination of kernels (Eq. 1):
$$ \tilde{K}(x_i, x_j) = \sum_{s=1}^{M} \mu_s K_s(x_i, x_j) \qquad (1) $$

$$ \hat{f}(x) = b + \sum_{i=1}^{n} (\alpha_i^{+} - \alpha_i^{-}) \, \tilde{K}(x_i, x) \qquad (2) $$
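As an illustration (not part of the original abstract), the following minimal Python sketch shows how Eq. 1 and Eq. 2 might be evaluated; the Gaussian sub-kernels, the weights mu, and the coefficients alpha_plus, alpha_minus and b are placeholder assumptions, not values from the paper.

import numpy as np

def combined_kernel(x_i, x_j, kernels, mu):
    """Eq. 1: weighted combination of the M sub-kernels on one pair of inputs."""
    return sum(mu_s * k(x_i, x_j) for mu_s, k in zip(mu, kernels))

def predict(x, X, kernels, mu, alpha_plus, alpha_minus, b):
    """Eq. 2: kernel expansion over the n stored training inputs X."""
    return b + sum((alpha_plus[i] - alpha_minus[i]) *
                   combined_kernel(X[i], x, kernels, mu)
                   for i in range(len(X)))

def gaussian_kernel(gamma):
    """Placeholder Gaussian sub-kernel of width controlled by gamma (an assumption)."""
    return lambda a, c: np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(c)) ** 2))

kernels = [gaussian_kernel(0.5), gaussian_kernel(2.0)]  # M = 2 sub-kernels
mu = [0.7, 0.3]  # kernel weights, assumed already tuned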
The aim of the windowing strategies for on-line MKr (see Figure 1) is to improve the performance of the process without increasing the original algorithm's computation time. A new windowing strategy, "worm windows", is proposed in this work. This method has two "expand-shrink" phases: first, the kernel matrix is allowed to grow as long as its size remains workable and no over-fitting issue arises; then, when its size is no longer computationally efficient, the kernel matrix is shrunk back to its original size.

Fig. 1: Online multiple kernel regression.

2 Windowing for on-line MKr

2.1 On-line MKr by sliding windows

The sliding window approach consists in taking only the last N pairs of the stream to perform the multiple kernel regression. When a new observed pair {x_{n+1}, y_{n+1}} arrives, we first down-size the kernel matrix K_j^{(n)} by extracting the contribution of x_{n-N} (see Eq. 3):
$$ \check{K}_j^{(n)} = \begin{pmatrix} K_j^{(n)}(2, 2) & \cdots & K_j^{(n)}(2, N) \\ \vdots & \ddots & \vdots \\ K_j^{(n)}(N, 2) & \cdots & K_j^{(n)}(N, N) \end{pmatrix} \qquad (3) $$
and then we augment the dimension of K_j^{(n)} again by importing the new input x_{n+1}, obtaining the kernel matrix expressed in Eq. 4:
$$ K_j^{(n+1)} = \begin{pmatrix} \check{K}_j^{(n)} & K_j(X^n, x_{n+1}) \\ K_j(x_{n+1}, X^n) & K_j(x_{n+1}, x_{n+1}) + \lambda \end{pmatrix} \qquad (4) $$

where $X^n = (x_{n-N+1}, \ldots, x_n)^T$.
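A non-authoritative sketch of the down-size/up-size step of Eqs. 3 and 4 for one sub-kernel is given below; the function name, the default value of lambda, and the representation of the window X_win as a plain Python list of the N stored inputs are our assumptions.

import numpy as np

def sliding_window_update(K, X_win, x_new, kernel, lam=1e-6):
    """One sliding-window step for a single sub-kernel K_j (Eqs. 3-4)."""
    # Eq. 3: remove the contribution of the oldest sample (first row/column).
    K_shrunk = K[1:, 1:]
    X_kept = X_win[1:]
    # Eq. 4: border the shrunk matrix with evaluations against the new input.
    k_vec = np.array([kernel(x, x_new) for x in X_kept])
    k_diag = kernel(x_new, x_new) + lam  # regularized diagonal entry
    K_next = np.block([[K_shrunk, k_vec[:, None]],
                       [k_vec[None, :], np.array([[k_diag]])]])
    return K_next, X_kept + [x_new]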
Next, the kernel matrices are summed again (see Figure 1) and their weights μ must be updated as well. Since this is a particular case of the weight computation carried out in the batch phase of the overall process, the proposal is to follow a Stochastic Gradient Descent (SGD) [2, 3] algorithm.

2.2 On-line MKr by "worm" windows

The so-called worm window approach consists in augmenting the kernel matrix size as new data become available. A shrink back to the original size is proposed when its performance falls below a certain tolerance limit; only the last n data are then taken into account. The performance of the first, growing phase of the algorithm should be checked after the first iteration, by simulating its computational efficiency with random data and establishing a maximum size. In addition, over-fitting issues should be considered when deciding to shrink the kernel matrix.
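A minimal sketch of this expand-shrink policy follows, under our own assumptions: the maximum size n_max, the shrink target n_orig, and the interface mirror the sliding-window sketch above, and the tolerance-based trigger is reduced here to a simple size check.

import numpy as np

def worm_window_update(K, X_win, x_new, kernel, n_orig, n_max, lam=1e-6):
    """One worm-window step: grow the kernel matrix with every new sample,
    and shrink back to the original window length n_orig only when the
    matrix exceeds the maximum size n_max fixed in the tuning phase."""
    # Expand: border K with the new input (as in Eq. 4, but nothing is discarded).
    k_vec = np.array([kernel(x, x_new) for x in X_win])
    K = np.block([[K, k_vec[:, None]],
                  [k_vec[None, :], np.array([[kernel(x_new, x_new) + lam]])]])
    X_win = X_win + [x_new]
    # Shrink: keep only the n_orig most recent samples once the limit is reached.
    if len(X_win) > n_max:
        K = K[-n_orig:, -n_orig:]
        X_win = X_win[-n_orig:]
    return K, X_win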
The worm window alternative should offer greater stability in its predictions, as a consequence of always considering a number of data equal to or greater than that of sliding windows. On the other hand, the sliding alternative requires less computational effort and relies on a larger proportion of new data. Thus, depending on the nature of the database, its variability, and the targets of the analysis, one of these two options can be chosen for on-line learning.

3 Numerical results

To validate the on-line MKr approaches introduced in this work, a series of analytical benchmarks has been used, along with a structural design test case.

3.1 Analytical test cases

The proposed on-line MKr methods are first validated on a set of three artificial mixed-variable benchmark functions with 5 continuous and 5 discrete variables, adapted from [4]. In all cases, we test 20 updates of 5 elements each. While the sliding window strategy multiplied its RMSE by six along the learning process, the worm window error remained nearly constant.
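For reference, a hedged sketch of the streaming evaluation protocol described above is given below; the model interface with predict and update methods, the stream object, and the rmse helper are all hypothetical, since the paper does not specify an implementation.

import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def online_evaluation(model, stream, n_updates=20, batch_size=5):
    """Replay the Section 3.1 protocol: 20 updates of 5 samples each, recording
    the RMSE on each incoming batch before the model is updated with it."""
    errors = []
    for _ in range(n_updates):
        X_batch, y_batch = stream.next_batch(batch_size)  # hypothetical stream API
        y_hat = [model.predict(x) for x in X_batch]       # hypothetical model API
        errors.append(rmse(y_batch, y_hat))
        model.update(X_batch, y_batch)  # sliding- or worm-window update
    return errors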
3.2 Structural design instance

A structural design example based on a 3D rigid frame is also introduced to illustrate the performance of these on-line MKr methods. The quantity of interest is the total mass of the structure, which is characterized by ten design variables (5 continuous and 5 discrete). Figure 2 shows a comparison between both windowing strategies introduced in this work through the boxplots of their RMSEs.

Fig. 2: RMSE errors of the windowing strategies. Structural design case-study.

These results support the worm window strategy as a more accurate methodology than sliding windows for on-line learning of MKr models. In addition, the computational effort related to worm windows can be controlled by a previous tuning phase.

References

[1] S. C. H. Hoi, R. Jin, P. Zhao, and T. Yang. Online multiple kernel classification. Machine Learning, pages 1–27, 2012. In press.
[2] A. Karatzoglou. Kernel methods: software, algorithms and applications. PhD thesis, 2006.
[3] J. Kivinen, A. Smola, and R. C. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8):2165–2176, 2004.
[4] T. Liao. Improved ant colony optimization algorithms for continuous and mixed discrete-continuous optimization problems. Technical report, CoDE-IRIDIA Dpt., Université libre de Bruxelles, Belgium, 2011.
[5] H. Lin, D. Chiu, Y. Wu, and A. Chen. Mining frequent itemsets from data streams with a time-sensitive sliding window. In SIAM International Conference on Data Mining (SDM 2005), 2005.
[6] M. Martin. On-line support vector machine regression. In 13th European Conference on Machine Learning (ECML 2002), pages 282–294, 2002.
[7] S. Van Vaerenbergh, J. Vía, and I. Santamaría. A sliding-window kernel RLS algorithm and its application to nonlinear channel identification. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), volume V, pages 789–792, 2006.