Variable-mixing parameter quantized kernel robust mixed-norm algorithms for combating impulsive interference
Lu Lu 1, Haiquan Zhao 1*, Badong Chen 2
1 School of Electrical Engineering, Southwest Jiaotong University, Sichuan, People's Republic of China
2 School of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi, People's Republic of China
*E-mail:
[email protected]
Abstract: To overcome performance degradation in impulsive noise environments, a kernel robust mixed-norm (KRMN) algorithm has been presented which no longer requires a Gaussian environment. It incorporates the robust mixed-norm (RMN) algorithm and the kernel method to obtain robustness against impulsive noise. However, it has two major problems: (1) the choice of the mixing parameter in the KRMN algorithm is crucial for obtaining satisfactory performance; (2) the structure of KRMN grows linearly as the iterations proceed, so it has a high computational burden and memory requirement. To solve the parameter selection problem, two variable-mixing parameter KRMN (VPKRMN) algorithms are developed in this paper. Furthermore, a sparsification algorithm, the quantized VPKRMN (QVPKRMN) algorithm, is introduced for nonlinear system identification under impulsive interference. The convergence analysis in the mean square sense is carried out, and the energy conservation relation (ECR) for the QVPKRMN algorithm is established. Simulations in the context of nonlinear system identification under impulsive interference show that the proposed VPKRMN and QVPKRMN algorithms provide superior performance over existing algorithms.

1. Introduction

The kernel method has received increasing attention in the machine learning and adaptive signal processing literature. Several successful applications have been proposed to improve the robustness of nonlinear adaptive filters, such as the support vector machine (SVM) [1], kernel principal component analysis (KPCA) [2], and kernel adaptive filters [3]. The main idea of the kernel method is to transform the input data into a high-dimensional feature space via a reproducing kernel Hilbert space (RKHS) [4,5]. After that, a linear adaptive filter is applied in the feature space. In this direction, Engel et al. proposed a kernel recursive least squares (KRLS) algorithm, whose framework is similar to the structure of a radial basis function (RBF) network [6]. To increase the ability to model nonlinear systems, a KRLS variant was proposed by combining a sliding-window approach with the conventional l2-norm regularization [7]. Later, a kernelized version of the extended recursive least squares (EX-KRLS) algorithm was proposed, focusing on two special cases of the state transition matrix [8]. In [9], using a stochastic gradient approximation in RKHS, the kernel least mean square (KLMS) algorithm was proposed, in which the learning strategy is similar to that of resource-allocating networks (RAN) [10]. As an improvement, the kernel affine projection algorithm (KAPA) was presented with online advantages [11]. Besides, it has been successfully applied in nonlinear active noise cancellation
[3,11] and nonlinear acoustic echo cancellation (NLAEC) [12]. Although the above-mentioned kernel adaptive filters achieve good performance, they are not suitable for online applications, as their structures grow linearly with the number of processed patterns. In the past years, some sparsification techniques that constrain the growth of the network size were proposed [3,6,13,14]. In particular, the quantized KLMS (QKLMS) algorithm has been successfully applied to static function estimation and time series prediction [24]. It has a mechanism to utilize the redundant input data, which helps to achieve better accuracy and a more compact network with fewer centres. Hence, QKLMS offers better accuracy and a more compact network than other sparsification methods.

On the other hand, impulsive noise exists widely in many fields of science. It is well known that impulsive interference tends to have infinite variance. This is the main reason why traditional l2-norm algorithms are not suitable for impulsive interference. Thus, a family of stochastic gradient adaptive filtering algorithms based on other error norms was proposed, such as the least mean absolute third (LMAT) [15,16], least mean fourth (LMF) [17-19], least mean mixed-norm (LMMN) [20-22], and robust mixed-norm (RMN) [23] algorithms. Among these algorithms, the RMN algorithm was obtained as a modification of the LMMN algorithm [7], by replacing the fourth-order error norm with the first-order one. That is, the cost function of RMN is a convex combination of the error norms that underlie the least mean square (LMS) and least absolute difference (LAD) algorithms. Therefore, the RMN algorithm is robust in the presence of impulsive noise.

In view of the above facts, a KRMN algorithm was proposed by deriving the RMN algorithm in RKHS [25]. The combination of the kernel method and the RMN algorithm improves the nonlinear filtering performance under impulsive interference. Regrettably, an unsuitable selection of the mixing parameter degrades the performance of the KRMN algorithm. Therefore, two adaptive update rules are considered to overcome this problem. We name these new algorithms variable-mixing parameter KRMN (VPKRMN) algorithms. Based on the VPKRMN algorithms, we further propose a quantized VPKRMN (QVPKRMN) algorithm to curb the growth of the network and reduce the computational burden. At the same time, the energy conservation relation (ECR) for the proposed QVPKRMN algorithm is derived. Finally, simulation studies for nonlinear system identification demonstrate that the proposed algorithms outperform the LMS, RMN, KLMS, KRMN, and kernel LAD (KLAD) algorithms.

This paper is organized in the following manner. Section 2 introduces a brief description of the kernel method and the KRMN algorithm. In Section 3, two novel VPKRMN algorithms are derived to adaptively update the mixing parameter of KRMN, and a QVPKRMN algorithm is proposed based on the VPKRMN algorithm to lower the network size. In Section 4, the analysis of the convergence property is conducted. In Section 5,
simulation results are provided to show the efficiency of the proposed algorithms. Finally, conclusions are drawn in Section 6.

2. Kernel method and KRMN algorithm

2.1 Kernel method

The kernel method is a useful nonparametric modeling tool for dealing with nonlinearity. The power of this idea is to transform the input data (from the input space U) into a high-dimensional feature space
F using a certain nonlinear mapping, which can be expressed as

$$\varphi : U \rightarrow F \qquad (1)$$
where φ(u) is the feature vector in the kernel method. To apply the kernel method in a linear adaptive filter, the kernel function κ is introduced, which evaluates inner product operations in the feature space without knowing the exact nonlinear mapping. By the construction of the mapping, an important implication in the RKHS is the Mercer theorem [5]:
$$\kappa(u, u') = \varphi^{T}(u)\varphi(u') = \sum_{i=1}^{\infty} \zeta_i \phi_i(u)\phi_i(u') \qquad (2)$$
where ζ_i is the ith nonnegative eigenvalue and φ_i is the corresponding eigenfunction. The eigenvalues and eigenfunctions constitute the feature vector φ:
$$\varphi(u) = [\sqrt{\zeta_1}\,\phi_1(u), \sqrt{\zeta_2}\,\phi_2(u), \ldots]^{T} \qquad (3)$$
It is well known that a Mercer kernel is a continuous, symmetric and positive-definite kernel. The commonly used Gaussian kernel can be expressed as
$$\kappa(u, u') = \exp\left(-h\,\|u - u'\|^{2}\right) \qquad (4)$$
where h is the kernel bandwidth. Using (2) and (4), inner products in the feature space can be calculated through kernel evaluations. Consequently, the output of the adaptive filter can be expressed as an inner product of the transformed test data
φ(u) and training data φ(u_j):

$$f(u) = \sum_{j=1}^{n} a_j \left\langle \varphi(u), \varphi(u_j) \right\rangle \qquad (5)$$
where a_j is the coefficient at discrete time n, and ⟨·,·⟩ denotes the inner product operation.
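To make (4) and (5) concrete, the following minimal sketch (Python/NumPy; the variable names and the toy data are our own illustrative choices, not taken from the paper) evaluates the Gaussian kernel and forms the filter output as a weighted sum of kernel evaluations over stored training inputs.

```python
import numpy as np

def gaussian_kernel(u, v, h=1.0):
    # Gaussian kernel of (4): kappa(u, v) = exp(-h * ||u - v||^2)
    return np.exp(-h * np.sum((np.asarray(u) - np.asarray(v)) ** 2))

def kernel_expansion(u, centers, coeffs, h=1.0):
    # Filter output of (5): f(u) = sum_j a_j <phi(u), phi(u_j)> = sum_j a_j kappa(u, u_j)
    return sum(a * gaussian_kernel(u, c, h) for a, c in zip(coeffs, centers))

# toy usage with arbitrary numbers (purely illustrative)
centers = [np.array([0.1, 0.2]), np.array([0.4, 0.3])]
coeffs = [0.5, -0.2]
print(kernel_expansion(np.array([0.2, 0.25]), centers, coeffs, h=2.0))
```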
2.2 KRMN algorithm

When the desired signal or the input signal is corrupted by impulsive noise, the performance of KLMS degrades. To overcome this problem, the KRMN algorithm was proposed by using the kernel method [25]. The input data of the RMN algorithm are transformed into the RKHS as φ(n), and the weight vector in the feature space is defined as Ω(n), with Ω(1) = 0. The error signal is defined as
$$e(n) = d(n) - \Omega^{T}(n)\varphi(n) \qquad (6)$$
The KRMN algorithm is based on minimization of the mixed-norm error as follows [23,25]:
$$J(n) = \lambda\,E\{e^{2}(n)\} + (1 - \lambda)\,E\{|e(n)|\} \qquad (7)$$
where E{·} is the statistical expectation operator and λ is a constant in the range (0,1). This cost function is a linear combination of the KLMS and KLAD cost functions, that is, a combination of the l2-norm and the l1-norm of the error. By using the steepest descent method, the gradient vector of J(n) with respect to Ω(n) is calculated as
$$\nabla_{\Omega(n)} J(n) = -\left[2\lambda e(n) + (1 - \lambda)\,\mathrm{sign}\{e(n)\}\right]\varphi(n) \qquad (8)$$
where sign{x} denotes the sign function, i.e., sign{x} = 1 if x > 0, sign{x} = 0 if x = 0, and sign{x} = -1 otherwise. Then, the filter weight of the KRMN algorithm is updated iteratively on the new example sequence {φ(n), d(n)}:
$$\Omega(n+1) = \Omega(n) + \mu\left[2\lambda e(n) + (1 - \lambda)\,\mathrm{sign}\{e(n)\}\right]\varphi(n) \qquad (9)$$
Repeatedly applying the weight update formula (9), we have

$$\Omega(n+1) = \mu\sum_{j=1}^{n}\left[2\lambda e(j) + (1 - \lambda)\,\mathrm{sign}\{e(j)\}\right]\varphi(j) \qquad (10)$$
By using the Mercer kernel in (2), the output of the filter can be calculated through kernel evaluations:

$$y(n+1) = \mu\sum_{j=1}^{n}\left[2\lambda e(j) + (1 - \lambda)\,\mathrm{sign}\{e(j)\}\right]\kappa(u(j), u(n+1)) \qquad (11)$$
For simplicity, the coefficient a_j(n+1) is defined as
$$a_j(n+1) = \mu\left[2\lambda e(j) + (1 - \lambda)\,\mathrm{sign}\{e(j)\}\right], \quad j = 1, \ldots, n \qquad (12)$$
and the codebook C(n) is referred to as the set of centres at time n:

$$C(n+1) = [C(n), u(n+1)] \qquad (13)$$
It can be observed that, if the kernel function is chosen as a radial kernel, KRMN produces a growing RBF network by allocating a new kernel unit for every new example with input u(n+1). The main bottleneck of KRMN is that its network size grows with the number of processed data. To overcome this severe drawback, a quantization scheme can be used to curb the growth of the network.
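As a reading aid, the sketch below (Python/NumPy; the step size, the fixed mixing parameter, and the random test data are illustrative assumptions) spells out the KRMN recursion (9)-(13). It also makes the drawback explicit: one centre is allocated per processed sample, so both the memory and the per-sample prediction cost grow linearly.

```python
import numpy as np

def krmn(inputs, desired, mu=0.1, lam=0.5, h=1.0):
    # Sketch of the KRMN recursion (9)-(13): one new centre is allocated per sample,
    # so the network (and the cost of each prediction) grows linearly with n.
    centers, coeffs = [], []
    for u, d in zip(inputs, desired):
        # prediction by kernel evaluations over all stored centres, cf. (11)
        y = sum(a * np.exp(-h * np.sum((u - c) ** 2)) for a, c in zip(coeffs, centers))
        e = d - y                                        # error, cf. (6)
        # coefficient of the newly allocated centre, cf. (12): mixed l2/l1 error term
        coeffs.append(mu * (2.0 * lam * e + (1.0 - lam) * np.sign(e)))
        centers.append(np.asarray(u))                    # codebook update, cf. (13)
    return centers, coeffs

# illustrative call on random data
rng = np.random.default_rng(0)
u_seq = rng.standard_normal((200, 2))
d_seq = np.tanh(u_seq[:, 0]) + 0.1 * rng.standard_normal(200)
centers, coeffs = krmn(u_seq, d_seq, mu=0.2, lam=0.8, h=1.0)
print(len(centers))   # equals the number of processed samples (linear growth)
```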
3. Proposed algorithms

3.1 VPKRMN algorithm

An unsuitable choice of the mixing parameter leads to performance degradation of KRMN. To circumvent this problem, the mixing parameter should be adjusted automatically. Here, we use λ(n) instead of λ to derive the variable-mixing parameter algorithms. First, consider adjusting the mixing parameter
λ(n) at each iteration to minimize the mixed-norm error of the KRMN by means of a stochastic gradient descent with respect to λ(n); we obtain
$$\lambda(n+1) = \lambda(n) - \frac{\partial J(n)}{\partial \lambda(n)} = \lambda(n) - \frac{\partial\left\{\lambda(n)E\{e^{2}(n)\} + [1 - \lambda(n)]E\{|e(n)|\}\right\}}{\partial \lambda(n)} \approx \lambda(n) - \frac{\partial\left\{\lambda(n)e^{2}(n) + [1 - \lambda(n)]|e(n)|\right\}}{\partial \lambda(n)} \qquad (14)$$
where λ(n) is confined to the closed interval [0,1]. Then, we add a scaling factor γ to (14) to control the steepness of J(n). In this way, an adaptive update rule for the KRMN algorithm is obtained, and we name the resulting algorithm VPKRMN-Algorithm 1:

$$\text{VPKRMN-Algorithm 1:}\quad \lambda(n+1) = \lambda(n) + \gamma\left\{E\{|e(n)|\} - E\{e^{2}(n)\}\right\} \approx \lambda(n) + \gamma\left\{|e(n)| - e^{2}(n)\right\} \qquad (15)$$
As can be seen from (15), the mixing parameter is adjusted by comparing the two error norms. When |e(n)| > e²(n), the mixing parameter tends to one and the KLMS term plays the dominant role in the filter. When |e(n)| < e²(n), the mixing parameter tends to zero and the KLAD term plays the dominant role in the filter.
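A minimal sketch of the update (15) is given below (Python), with the clipping of λ(n) to [0,1] required below (14) made explicit; the default value of γ is an arbitrary illustrative choice.

```python
def update_lambda_alg1(lam, e, gamma=0.01):
    # VPKRMN-Algorithm 1, cf. (15): lambda grows when |e| > e^2 (small errors, favour the
    # l2-norm/KLMS term) and shrinks when |e| < e^2 (large, possibly impulsive errors,
    # favour the l1-norm/KLAD term).
    lam = lam + gamma * (abs(e) - e ** 2)
    return min(max(lam, 0.0), 1.0)   # confine lambda(n) to the closed interval [0, 1]
```

Note that |e(n)| > e²(n) exactly when 0 < |e(n)| < 1, so large (e.g., impulsive) errors drive λ(n) towards zero and hence towards the robust l1-norm cost.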
Fig. 1. The cost function J(n) versus e(n) for different mixing parameter settings (λ = 0, 0.2, 0.4, 0.6, 0.8, 1).
The cost function of the VPKRMN is unimodal (see Fig. 1). The unimodal character is preserved for a mixing parameter λ(n) chosen in (0,1), that is, the second term of (15) keeps a very small value. Hence, the steepness-based update rule in VPKRMN-Algorithm 1 is very sensitive to the choice of the parameter γ. To avoid this limitation, a new adaptive update approach is adopted for the mixing parameter of the KRMN algorithm. To avoid confusion, the KRMN algorithm with the following update rule is called VPKRMN-Algorithm 2:
$$\text{VPKRMN-Algorithm 2:}\quad \begin{cases} \lambda(n+1) = \alpha\lambda(n) + \beta p^{2}(n) \\ p(n) = \eta p(n-1) + (1-\eta)\,e(n)e(n-1) \end{cases} \qquad (16)$$
where α and η are constants in the range [0,1]; they are exponential weighting parameters that control the quality of the estimation. The parameter β is a positive constant, and p(n) is a low-pass filtered estimate of e(n)e(n-1). Note that the mixing parameter has a fixed value when α = 1 and β = 0. There are two reasons for using p(n) to update the mixing parameter λ(n). First, the error autocorrelation e(n)e(n-1) is generally a good measure of the proximity to the optimum [26]. Second, the error autocorrelation e(n)e(n-1) distinguishes between two kinds of environment: the impulse-free environment and the impulsive environment. The objective of VPKRMN-Algorithm 2 is to keep λ(n) large when the algorithm is far from the optimum and to decrease λ(n) as it approaches the optimum. A large value of λ(n) makes the first term of the KRMN cost dominant; that is, the l2-norm error provides a more accurate final solution and less misadjustment when there is no impulsive interference. Conversely,
when there is impulsive noise, the value of λ(n) is small, and the l1-norm error offers a fast convergence rate and low misadjustment.

3.2 QVPKRMN algorithm

The QVPKRMN algorithm incorporates the idea of quantization into VPKRMN to provide efficient learning under impulsive interference. In a way, the quantization scheme is similar to sparsification with the NC method [3]; in fact, they have almost the same computational complexity. The main difference between the quantization scheme and the NC method is that the quantization scheme utilizes the redundant data to locally update the coefficient of the closest centre. The quantization method can be summarized as a learning strategy: the input space is quantized, and if the current quantized input has already been assigned a centre, no new centre is added; instead, the coefficient of that centre is updated by merging in the new coefficient [24]. With the feature vector φ(n) in the weight update equation, the algorithm can be expressed as

$$\begin{cases} \Omega(0) = 0 \\ e(n) = d(n) - \Omega^{T}(n-1)\varphi(n) \\ \Omega(n) = \Omega(n-1) + \mu\left[2\lambda(n)e(n) + (1-\lambda(n))\,\mathrm{sign}\{e(n)\}\right]Q[\varphi(n)] \end{cases} \qquad (17)$$
where Q[·] is a quantization operator in the feature space F. Owing to the high dimensionality of the feature space, the quantization is usually performed in the input space U. The learning rule of the QVPKRMN algorithm in U can be given as

$$\begin{cases} f_0 = 0 \\ e(n) = d(n) - f_{n-1}(u(n)) \\ f_n = f_{n-1} + \mu\left[2\lambda(n)e(n) + (1-\lambda(n))\,\mathrm{sign}\{e(n)\}\right]\kappa(Q[u(n)], \cdot) \end{cases} \qquad (18)$$
where Q[·] denotes the quantization operator in the input space U. Throughout this paper, the notation φ_q(n) replaces Q[φ(n)], and u_q(n) = Q[u(n)], respectively. In the following QVPKRMN algorithm, C_j(n-1) is the jth element of the codebook C(n-1), ||·|| denotes the Euclidean norm, and
ε_U is the distance threshold. For the case ε_U = 0, the QVPKRMN algorithm reduces to the VPKRMN algorithm. The proposed QVPKRMN algorithm is summarized in Table 1.

Table 1 The proposed QVPKRMN algorithm

Initialization:
  Choose the step size μ and the kernel bandwidth h;
  a_1(1) = μ[2λ(1)d(1) + (1 - λ(1)) sign(d(1))], C(1) = [u(1)],
  f_1 = a_1(1)κ(u(1), ·), λ(1) = 0.5.
Computation:
  While {u(n), d(n)} (n > 1) is available, do:
  (1) Compute the output of the adaptive filter:
        y(n) = Σ_{j=1}^{size(C(n-1))} a_j(n-1) κ(u(n), C_j(n-1))
  (2) Compute the error: e(n) = d(n) - y(n)
  (3) Compute the distance between u(n) and C(n-1):
        dis(u(n), C(n-1)) = min_{1 ≤ j ≤ size(C(n-1))} ||u(n) - C_j(n-1)||
  (4) If dis(u(n), C(n-1)) ≤ ε_U, keep the codebook unchanged, C(n) = C(n-1), and quantize u(n) to the closest centre by updating the coefficient of that centre:
        a_{j*}(n) = a_{j*}(n-1) + μ[2λ(n)e(n) + (1 - λ(n)) sign{e(n)}],
        where j* = arg min_{1 ≤ j ≤ size(C(n-1))} ||u(n) - C_j(n-1)||;
      otherwise, assign a new centre and a corresponding new coefficient:
        C(n) = [C(n-1), u(n)], a(n) = [a(n-1), μ(2λ(n)e(n) + (1 - λ(n)) sign{e(n)})]
  Then update the mixing parameter using one of the two new update rules:
        Algorithm 1: λ(n+1) = λ(n) + γ{|e(n)| - e²(n)}
        Algorithm 2: λ(n+1) = αλ(n) + βp²(n), p(n) = ηp(n-1) + (1-η)e(n)e(n-1)
  End while
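To summarize Table 1 in compact form, the sketch below (Python/NumPy; the parameter values are illustrative, and Algorithm 1 is arbitrarily chosen here for the λ(n) update) implements the quantization step: when a new input lies within ε_U of an existing centre, only the coefficient of that centre is updated and the codebook does not grow.

```python
import numpy as np

def qvpkrmn(inputs, desired, mu=0.1, gamma=0.01, eps_u=0.5, h=1.0):
    # Sketch of the QVPKRMN algorithm of Table 1, with lambda(n) updated by Algorithm 1.
    inputs = [np.asarray(u) for u in inputs]
    lam = 0.5                                                  # lambda(1) = 0.5
    centers = [inputs[0]]
    coeffs = [mu * (2.0 * lam * desired[0] + (1.0 - lam) * np.sign(desired[0]))]   # a_1(1)
    for u, d in zip(inputs[1:], desired[1:]):
        # (1) filter output via kernel evaluations over the current codebook
        y = sum(a * np.exp(-h * np.sum((u - c) ** 2)) for a, c in zip(coeffs, centers))
        e = d - y                                              # (2) error
        dists = [np.linalg.norm(u - c) for c in centers]       # (3) distances to all centres
        j_star = int(np.argmin(dists))
        delta = mu * (2.0 * lam * e + (1.0 - lam) * np.sign(e))
        if dists[j_star] <= eps_u:
            coeffs[j_star] += delta                            # (4) merge into the closest centre
        else:
            centers.append(u)                                  # otherwise allocate a new centre
            coeffs.append(delta)
        # mixing-parameter update (Algorithm 1), confined to [0, 1]
        lam = min(max(lam + gamma * (abs(e) - e ** 2), 0.0), 1.0)
    return centers, coeffs
```

With ε_U = 0 every input is allocated as a new centre, and the sketch reduces to the VPKRMN recursion, consistent with the remark above Table 1.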
Table 2 Computational complexity

Algorithm   Computation (training)   Memory (training)   Computation (test)   Memory (test)
LMS         O(N)                     O(M)                O(M)                 O(M)
RMN         O(N)                     O(M)                O(M)                 O(M)
KLMS        O(N²)                    O(N)                O(N)                 O(N)
KLAD        O(N²)                    O(N)                O(N)                 O(N)
KRMN        O(N²)                    O(N)                O(N)                 O(N)
QKLMS       O(L²)                    O(L)                O(L)                 O(L)
VPKRMN      O(N²)                    O(N)                O(N)                 O(N)
QVPKRMN     O(L²)                    O(L)                O(L)                 O(L)
Table 2 summarizes the computational complexity of the algorithms, where N is the number of training data, M is the length of the filter, and L (L