A novel kernel correlation coefficient with robustness

1 downloads 0 Views 927KB Size Report
conducted under linear, nonlinear, normal and contam- inated Gaussian ... kernel function · kernel correlation coefficient (KECC) · bivariate normal ... Xu, W., Li, B., Ma, R. et al. J Sign Process ... some non-decreasing transformation due to various sit- uations [20]. ... (N is the sample size), a little slower than PPMCC but much ...
Cite this article as: Xu, W., Li, B., Ma, R. et al. J Sign Process Syst (2016). doi:10.1007/s11265-016-1212-8 Noname manuscript No. (will be inserted by the editor)

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise Weichao Xu * · Baojun Li · Yanzhou Zhou · Yun Zhang

the date of receipt and acceptance should be inserted later

Abstract In this paper, we proposed a new kernel correlation coefficient (KECC), with an emphasis on its robustness against impulsive noise and/or monotonic nonlinear transformations. To gain further insight, we compared KECC with other four correlation coefficients, namely, Pearson’s product moment correlation coefficient (PPMCC), Kendall’s tau (KT), Spearman’s rho (SR) and order statistics correlation coefficient (OSCC). Extensive simulation experiments were conducted under linear, nonlinear, normal and contaminated Gaussian models (CGM) based on seven means of performance evaluation. Theoretical analysis showed that KECC satisfies various desired properties. Numerical results suggest that KECC performs equally well with the optimal PPMCC under the bivariate normal model, and outperforms the others when inpulsive noise and/or nonlinearity exist in the data. Moreover, KECC can detect accurately the time delay of signals corrupted by impulsive noise. Last but not least, KECC runs in linearithmic time, only slightly slower than the fastest PPMCC. The advantages of KECC revealed in this work might shed new light on the topic of correla-

Weichao Xu ( ) School of Automation, Guangdong University of Technology, No. 100 Waihuan Xi Road, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong, P. R. China 510006 E-mail: [email protected] Baojun Li E-mail: [email protected] Yanzhou Zhou E-mail: [email protected] Yun Zhang E-mail: [email protected]

tion analysis, which is important in many areas including signal processing. Keywords association · impulsive noise · Gaussian kernel function · kernel correlation coefficient (KECC) · bivariate normal model · contaminated Gaussian model (CGM).

1 Introduction Interpreted as the strength of statistical relationship between two random variables, correlation coefficients have and will continue to play an important role in signal processing [1, 2, 3, 4, 5, 6, 7, 8, 9]. Qualitatively, correlation coefficients should be large and positive if there is a high probability that large (small) values of one random variable are associated with large (small) values of the other; and it should be large and negative if the direction is reversed, namely, large (small) values of one random variable occur in conjunction with small (large) values of the other [10]. Up to now, a large amount of methods have been proposed to measure the intensity of correlation between random variables in the literature. Among these methods, the most thoroughly studied and widely used are three classical coefficients [11], i.e., Pearson’s product moment correlation coefficient (PPMCC) [12, 13, 14], Kendall’s tau (KT) [15] and Spearman’s rho (SR) [15, 16, 9]. Other methods, with either a longer history such as Pearson’s rank-variate correlation coefficient (PRVCC) [17], or a shorter history such as Gini correlation (GC) [18] and order statistics correlation coefficient (OSCC) [5], have also been investigated recently by the present first author with colleagues [6, 7, 8, 19, 20]. There have many advantages and disadvantages with respect to the correlation coefficients mentioned above.

2

Due to the seminal work of Fisher [?,13], it follows that, when the data follows an exact bivariate normal model, (1) PPMCC is an asymptotically unbiased estimators of the parent correlation, and (2) the variance of PPMCC approaches the Cramer-Rao lower bound as the sample size becomes large [21]. In addition, from the viewpoint of programming, PPMCC has a linear time complexity, which is a desired property for scenarios where real-time is a critical requirement. However, empirical evidences suggest that PPMCC performs poorly when nonlinearity exists in the data collection system [6]. Moreover, theoretical analysis shows that PPMCC is notoriously sensitive to impulsive contamination in the data [16]. Even a single outlier can distort severely the value of PPMCC and hence results in misleading inference in practice [9]. The two rank-based methods, SR and KT, are invariant under monotonic transformations [15], thus often considered as robust alternatives to PPMCC when the data are attenuated by some non-decreasing transformation due to various situations [20]. Besides, SR and KT are also insensitive to impulsive noise, according to the theoretical results established under the contaminated normal model [9, 16]. However, they are only suboptimal under the bivariate normal model, both with an asymptotic relative efficiency to PPMCC being at most 91% [15]. The brand new method OSCC performs equally well as PPMCC under the bivariate normal model [7]; and is better than PPMCC in nonlinear cases [6]. Nevertheless, as illustrated later on, OSCC also su↵ers the drawback of sensitivity to impulsive noise, as PPMCC does. The advantages of PRVCC and GC lie in (1) their mathematical tractability under bivariate normal model, and (2) their robustness against nonlinearity and impulsive noise embedded into one of the two channels [19, 20, 8]. However, they are only suboptimal in the normal cases, in the sense of having larger mean square errors than that of PPMCC when estimating the parent correlation [20, 8]. Moreover, their performance will degrade severely if both channels contain impulsive noise. As to the computational load, OSCC, SR, PRVCC and GC are of linearithmic order, while KT is in quadratic time [6, 19, 20, 8].

Weichao Xu * et al.

4. it runs in linearithmic time, i.e., of order O(N log N ) (N is the sample size), a little slower than PPMCC but much faster than KT and SR; 5. it has small biasedness in both linear and nonlinear scenarios; 6. it is sensitive to changes of degree of association; 7. it outperforms KT and SR in the aspect of robustness against impulsive noise. The rest part of this paper is organized as follows. Section 2 presents the definition and general properties of the novel KECC. In Section 3 are introduced the models of noise and association, as well as performance evaluation methodology, employed to investigate the performances of KECC. In Section 4, we present the experiment results to verify the advantages of KECC. Finally in Section 5, we draw our conclusions on the proposed KECC.

2 Kernel correlation coefficient 2.1 Definition and properties Kernel is usually referred to as a kind of weighting function that widely used in the area of nonparametric statistics [22, 23]. The most popular one is Gaussian kernel function, a kind of symmetric positive definite kernel defined as 1 (⇠1 , ⇠2 , ⌘) , p e 2⇡⌘

⇠2 ) 2

(⇠1 2⌘

2

(1)

where ⇠1 , ⇠2 are two variables, ⌘ is the width of the kernel. N Let {(xi , yi )}i=1 be N data pairs draw from a bivariate distribution. Let ⌫x and ⌫y be the sample medians N N of {xi }i=1 and {yi }i=1 , respectively. Denote by IRx and N IRy the interquartile ranges with respect to {xi }i=1 and N {yi }i=1 . Write ⌘x , 3.53 ⇥ IRx and ⌘y , 3.53 ⇥ IRy . Define

To overcome the problems mentioned above, we propose a novel robust correlation coefficient, termed the kernel correlation coefficient (KECC), which possesses the following advantages:

1 x (x, ⌫x , ⌘x ) , p e 2⇡⌘x

1. it can discriminate positive correlation from negative correlation; 2. it is standardized, i.e., the values are within [ 1, 1]; 3. it performs equally well as PPMCC under the bivariate normal model;

1 y (y, ⌫y , ⌘y ) , p e 2⇡⌘y

(x

(y

Xi , (xi

⌫x )kx (xi , ⌫x , ⌘x )

⌫x )2 2⌘x2

(2)

⌫y )2 2⌘y2

(3)

(4)

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise

Yi , (yi

⌫y )ky (yi , ⌫y , ⌘y ).

(5)

Then, based on PPMCC, our KECC is defined as

rKE , s

1 N 1 N

N P

Xi Y i

i=1

N P

i=1

Xi2 N1

N P

i=1

.

(6)

Yi2

Theorem 1 The KECC defined in (6) has the basic properties of a correlation coefficient, as follows: 1. 1  rKE  +1; 2. rKE is symmetric, i.e., rKE (x, y) = rKE (y, x); 3. rKE (x, y) = ±1 when x and y are in strict linear relationship; 4. rKE (x, y) is shift and scale invariant with respect to both x and y. Proof Property 1 and 2 follow directly from CauchySchwarz inequality and the symmetric structure of (6), respectively. To verify Property 3, assume that y = ax + b, where a > 0 and b are two constants. Under this linear transformation, it follows that ⌫y = a⌫x + b, ⌘y = a⌘x , y = x /a, which means that Yi = Xi and hence rKE = 1 from (6). Similarly, we have rKE = 1 if a < 0. Now we check Property 4 by assuming x0 = ax x + bx , and y 0 = ay y + by , with ax > 0, ay > 0, bx , by being constants. It now follows that ⌘x0 = ax ⌘x , ⌘y0 = ay ⌘y , ⌫x0 = ax ⌫x + bx , ⌫y0 = ay ⌫y + by , x0 = x /ax , y0 = y /ay . Substituting these terms into (4) and (5) gives Xi0 = Xi and Yi0 = Yi , which means rKE (x0 , y 0 ) = rKE (x, y) by (6). 2.2 Robustness analysis of KECC From its definition in (6), it can be asserted that KECC is robust against impulsive noise (outliers with very large variance). The reason is three-fold. Firstly, the sample median, being a robust estimate of the central tendency of the data, is employed instead of the sample mean, which is sensitive to outliers. Secondly, the Gaussian kernel function suppresses the influence of outliers by imposing a smaller weight on sample points with very large values; whereas it has little e↵ect on the majority normal sample points. Thirdly, the kernel width ⌘ depends on the interquartile range of the samples, which is also a robust measure of data dispersion. Theorem 2 below illustrates quantitatively the robustness of KECC against impulsive noise.

3

Theorem 2 Let N (µ1 , µ2 , 12 , 22 , %) be the probability density function (pdf ) of a bivariate Gaussian distribution with means µj , variances j2 , j = 1, 2, and correN lation %. Assume that {(xi , yi )}i=1 in (6) are N data pairs drawn from a contaminated Gaussian population (X , Y), whose pdf is (1 ")N (µX , µY ,

2 X,

2 Y , ⇢)+"N (µX , µY ,

02 X,

02 0 Y,⇢ )

(7)

0 0 where 0  " ⌧ 1, X X and Y Y . Let IRX and IRY be the population interquartile ranges with respect to X and Y, respectively. Write ⌘X , 3.53 ⇥ IRX and ⌘Y , 3.53 ⇥ IRY . Then

⇢ lim E(rKE ) = q ⇡⇢ 2 N !1 1 ⇢ 1 3 0 ( + T ) !1 X S 2

(8)

0 Y !1

where s

s

⌘Y4

(9)

r 2 2 2 ⌘2X + 1 2 ⌘2Y + 1 X Y T = r +r . 2 2 Y X 2 ⌘2 + 1 2 ⌘2 + 1

(10)

S=

2

2 ⌘X

2 X

+

4 ⌘X

4 X

2

⌘Y2

2 Y

+

4 Y

r

Y

X

Proof see Appendix. Remark 1 The contaminated Gaussian model (CGM) defined by (7) is frequently employed to model the impulsive noise in the literature [24, 25]. Specifically, in (7), the first term represents the distribution of the majority of the “normal” data; whereas the second term represents the distribution of a tiny fraction of outliers, with variances vary large compared with those of the majority. 2.3 Other four correlation coefficients For completeness of this paper, we present the definitions of four other correlation coefficients, which are recruited in the comparison studies later on. The associated four estimators for the parent correlation with respect to bivariate normal model are also listed for ease of reference. N Let {(xi , yi )}i=1 be N data pairs draw from a conN tinuous bivariate distribution. Rearranging {(xi , yi )}i=1 in ascending order based on the magnitudes of x, we get N N two new data sequences, x(i) i=1 and y[i] i=1 , respectively, where x(1) < · · · < x(N ) are called the order

4

Weichao Xu * et al.

statistics of x and y[1] , . . . , y[N ] the associated concomitants [26, 27, 28]. Suppose that xj is at the kth posiN tion in the sorted sequence x(i) i=1 , the number k is termed the rank of xj and is denoted by Pi (= k). Similarly, we also define the rank of yi and denote it by Qi . Given these notations, the four coefficients, PPMCC (rP ), KT (rK ), SR (rS ) and OSCC (rO ), are then defined as follows:

rP ,  N X

N X

(xi

x)(yi

(xi

x)

i=1

rK ,

1/2

(yi

y)

sgn(xi

xj )sgn(yi

i=1 j=1

N X

N (N

(Pi

N (N 2

N X

[x(i)

x(N

i=1 N X

Qi ) 2 (13)

1)

i+1) ]y[i]

.

[x(i)

x(N

yj ) (12)

1)

i=1

rS , 1

(11)

2

i=1

N X N X

6

rO ,

N X

(14)

i+1) ]y(i)

N

If {(xi , yi )}i=1 are i.i.d and follow a bivariate Gaussian model with correlation coefficient ⇢, the following four estimators of ⇢ [6] ◆ 1 ⇡rK ⇢ˆK , sin 2 ✓ ◆ 1 ⇢ˆS , 2 sin ⇡rS 6

(15)

are asymptotically unbiased as N large, with variances [7]

⇡ 2 (4 36 ⇡ 2 (1 ⇢2 ) V(ˆ ⇢S ) = V(rS ) 4 V(ˆ ⇢O ) = V(rO ) ' V(rP )

3.1 Model of noise We employ the contaminated Gaussian model (CGM) which is commonly used to generate impulsive noise in the literature [24, 25]. Let N (µ, 2 ) stand for a normal distribution with mean µ and variance . The probability density function (pdf) of CGM is ")N (0,

2 1)

+ "N (0,

2 2)

(18)

2 where 0  " ⌧ 1, 2 1 , N (0, 1 ) represents the pdf of the “normal” (non-impulsive) noise, and N (0, 22 ) represents the pdf of the impulsive noise. So, the total noise variance is

= (1

")2

2 1

+ "2

2 2.

(19)

Specially, the CGM (18) degenerates to the Gaussian model N (0, 12 ) when " = 0. In this work, the values of 4 1 and 2 are set to be 0.1 and 10 , respectively, unless otherwise specified.

3.2 Models of association 3.2.1 Linear model (LM)

(1 ⇢2 )2 N 1 ⇢2 ) V(rK )

V(ˆ ⇢P ) = V(rP ) ' V(ˆ ⇢K ) =

This section introduces the models of noise, linear and nonlinear associations emulating di↵erent scenarios that might be encountered in practice. Several indices are also provided to evaluate the performance of KECC in comparison with other four correlation coefficients, in terms of their abilities of quantifying the associations between two random variables. In all these models, the data are generated with bivariate Gaussian models, and the sample size is set to be 1000.

2



⇢ˆO , rO

(17)

For brevity, we use ⇢⇣ , ⇣ = {KE, P, K, S, O} to denote the five estimators in the sequel.

(1

i=1

⇢ˆP , rP

⇢ˆ , rKE .

3 Models and performance evaluation

y)

i=1

2

where the expressions of V(rK ) and V(rS ) can be found in [9]. Another estimator based on KECC is constructed as

LM is constructed by a bivariate normal model [29] with additive noise from CGM. Specifically, assume that (x0 , y 0 ) ⇠ N (µx , µy , x2 , y2 , ⇢). Then, LM is formulated as (16)

x(i) = x0 (i) + ↵nx (i) y(i) = y 0 (i) + ↵ny (i)

(20)

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise

5

where ⇢ 2 [ 1, 1] characterizes the linear association, ↵ 2 [0, 10] controls the signal-to-noise ratio (SNR), nx (i) and ny (i) are noises generated from (18). Under this LM, five estimators ⇢⇣ are to be computed with respect to various parameter setups as follows. 1. ⇢ = 1 and " = 0. In this case the LM is impulsive noise free. With increasing of ↵, the association between x and y becomes smaller and smaller, which means that ⇢⇣ should have a decreasing relationship with ↵. For a fixed ↵, the greater the magnitude of E(⇢⇣ ), the better the corresponding performance in the context of normal noise robustness. 2. ⇢ = 1, ↵ = 1 and " 0. This case aims at investigating the robustness against impulsive noise. With increasing of ", the fraction of impulsive component is larger and larger, which means that ⇢⇣ should also have a decreasing relationship with ". For a fixed ", the greater magnitude of E(⇢⇣ ), the better its performance in the context of impulsive noise robustness. 3. ⇢ 2 [ 1, 1], ↵ = 1 and " 0. This case is to compare the biasedness as well as the power of discriminating di↵erent underlying ⇢’s under various fractions of impulsive noise.

aspects, under di↵erent models with di↵erent parameter setups. Now let us elaborate the seven evaluation methods one by one.

3.2.2 Nonlinear model (NM)

3.3.4 Sensitivity to changes in ⇢

NM is a nonlinear model used to study the e↵ect of nonlinear transformations to the data. It is constructed based on LM, by imposing monotonic nonlinear transforms to both channels, as

Under LM, we employ another index called sensitivity ratio (SR) [30] to test the sensitivity to changes in ⇢. For this purpose, Fisher’s z-transform of ⇢⇣ , denoted as z⇣

x(i) = T [x0 (i)] + ↵nx (i) y(i) = T [y 0 (i)] + ↵ny (i)

(21)

where the nonlinear increasing transform T assumes two forms, T1 (·) = 2sign(·)(·)2 and T2 (·) = 2 arctan(·), respectively. The parameter ↵ 2 [0, 1] controls the SNR. Under this NM, five estimators ⇢⇣ are to be computed with respect to various parameter setups as follows. 1. ↵ = 0 and ⇢ 2 [ 1, 1]. In this case, the data is only attenuated by the two nonlinear transforms. The closer the value of E(⇢⇣ ) to ⇢, the better the robustness performance against nonlinearity. 2. ↵ > 0, " > 0 and ⇢ 2 [ 1, 1]. In this case, interference comes from both nonlinearity and impulsive noise. Again, the closer the value of E(⇢⇣ ) to ⇢, the better the robustness performance against both nonlinearity and impulsive noise.

3.3.1 Noise robustness Under LM, we firstly fix " = 0 and compare the decreasing rates of E(⇢⇣ ) with the increase of ↵. We further fix ↵ = 1 and compare the decreasing rates of E(⇢⇠ ) with the increase of ". 3.3.2 Deviation degree Under both LM and NM, we compare the degree of deviation from E(⇢⇣ ) to the ideal ⇢ 2 [ 1, 1] in a similar way as in [6]. 3.3.3 Variance Under both LM and NM, we compare the variance V⇣ , V(⇢⇣ ) of all five estimators. The smaller the variance, the better the performance.

z⇣ = tanh

1

⇢⇣ =

1 1 + ⇢⇣ loge 2 1 ⇢⇣

(22)

is employed to preprocess the computed ⇢⇣ before further manipulation. After such transformation, which maps [ 1, 1] to [ 1, 1], the resultant z⇣ follows approximately normal distributions with constant variances (i.e., independent of ⇢) [12, 14]. Given two distinct (1) ⇢1 and ⇢2 (⇢1 < ⇢2 ), we have two sets of coefficients ⇢⇣ (2)

(1)

and ⇢⇣ , and their respective Fisher’s z-transforms, z⇣ (2)

and z⇣ . SR is then defined as (1)

(2)

z⇣ z⇣ SR⇣ = p 2 &1 + &22

(23)

(i)

where z ⇣ and &i denote the mean and standard devia3.3 Performance evaluation The comparative studies with respect to the performance of r⇣ are conducted from the following seven

(i)

tion of z⇣ , respectively, for i = 1, 2. Note that SR measures the ability of to detect the changes of underlying ⇢. Greater values of SR indicate better discriminatory power.

6

Weichao Xu * et al.

3.3.5 Root mean square error (RMSE) The RMSE represents the sample standard deviation of the di↵erence between estimated and true ⇢, which is defined as q RMSE , E [(ˆ ⇢⇣ ⇢)2 ]. (24) It is a measure combining the e↵ects of both the bias and variance. transmit plus y [i]

the ideal echo y [i

⌧0 ]

8 (i 100)2 > < 1000 800 e 0  i < 200 y2 [i] = p > : 40⇡ 0 otherwise

(26)

where the corrupted signal is x[i] = y[i ⌧0 ] + n[i] with n[i] following CGM (18) with " = 0.05, 2 = 104 1 = 1. The time delay ⌧0 is arbitrarily set to be 300. The purpose is to estimate ⌧0 as accurate as possible under various signal to noise ratio SN R , 20 log10 1 . As illustrated in Figure 1, the procedure of estimating ⌧0 includes two steps. The first step is to construct a correlation function that corresponds to ⌧ by each of ⇢⇣ with respect to x[i] and y[i ⌧ ]. The second step is to locate time-shift ⌧ˆ0 corresponding the maximum of the correlation function. 3.3.7 Time Complexity Measurement

⌧0

echo plus noise x[i] = y [i

⌧ 0 ] + n [i ]

We analyze the time complexities of r⇣ in the language of big-oh, in symbol, O(·). We also estimate the relationship between computational loads of r⇣ versus the length of signal from 200 to 2000 with a step N = 200.

⌧0

scanning window y [i

⌧]

4 Numerical results correlation function of time lag ⌧

⌧ˆ0

Fig. 1: Schematic illustration of estimating the timedelay ⌧0 . In the bottom panel, the time-shift ⌧ˆ0 corresponding to the maximum of the correlation function is considered as an estimate of the true time-delay ⌧0 .

This section compares the performance of KECC with those of PPMCC, SR, KT and OSCC under LM and NM, in terms of seven means of evaluation described above. All samples are generated by functions in the Matlab Statistics ToolboxTM . Specifically, the normal samples are generated by mvnrnd, whereas the contaminated normal samples are generated by gmdistribution and random. The number of Monte Carlo trials is set to be 104 unless otherwise specified.

4.1 Comparative study under linear model 3.3.6 Time-delay estimation

4.1.1 Noise robustness

It is often encountered in radar, sonar or communication that we need to estimate the time-delay between a prescribed “clean” signal with a distorted and delayed version corrupted by impulsive noise. In this work we will give two examples of time-delay estimation. In these examples, the prescribed clean signals are a segment of sinusoidal wave or Gaussian wave as follows:

y1 [i] =

8 < :

3 sin 0



10⇡ ⇥ i 200



0  i < 200 otherwise

(25)

Figure 2 shows the results of E(⇢⇣ ) versus ↵ under LM for ⇢ = 1. It is seen from Figure 2(a) that, when the data contains no impulsive noise (" = 0), all five correlation coefficients decrease unanimously with increasing of ↵. That is to say, KECC, PPMCC, OSCC, SR and KT have the same performance under the impulsive noise free scenario. However, as illustrated by Figure 2(b), the curves of PPMCC and OSCC drop rapidly from 1 to 0 with increase of ", meaning their extreme sensitivity to impulsive noise. One the other hand, ⇢KE , ⇢K and ⇢S descend much slowly than ⇢P and ⇢O , suggesting the

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise

0 .8

0 .8

0 .6

0 .6 ⇢⇣

1

⇢⇣

1

7

0 .4 0 .2 0

0 .4

PPMCC SR KT OSCC KECC

0

2

4

6

8

PPMCC SR KT OSCC KECC

0 .2 0

10

0

0.05

0 .1



0.15

0 .2

"

(a)

(b)

1

1

0 .5

0 .5

0

PPMCC SR KT OSCC KECC Ideal

0 .5

1

⇢⇣

⇢⇣

Fig. 2: Noise robustness comparison under LM. (a) Results under the bivariate normal model (" = 0). (b) Results under CGM (" 2 [0, 0.2] with increment 0.02).

1

0 .5

0

0 .5

0

PPMCC SR KT OSCC KECC Ideal

0 .5

1

1

1



1

0 .5

0 .5

1

⇢⇣

⇢⇣

1

PPMCC SR KT OSCC KECC Ideal

0 .5

1

0 .5

0

⇢ (c) " = 0.05

0

0 .5

1

⇢ (b) " = 0.02

(a) " = 0

0

0 .5

0 .5

0

PPMCC SR KT OSCC KECC Ideal

0 .5

1

1

1

0 .5

0

⇢ (d) " = 0.08

Fig. 3: Relations between E(⇢⇠ ) and ⇢ 2 [ 1, 1] under LM.

0 .5

1

8

Weichao Xu * et al. ·10

3

·10

PPMCC SR KT OSCC KECC Ideal

0 .5

0

V⇣

1

V⇣

1

3

1

0 .5

0

PPMCC SR KT OSCC KECC Ideal

0 .5

0 .5

0

1

1

0 .5



3

·10

1

0 .5

1

3

V⇣

1

V⇣

1

PPMCC SR KT OSCC KECC Ideal

0 .5

0

0 .5

⇢ (b) " = 0.02

(a) " = 0 ·10

0

1

0 .5

0

PPMCC SR KT OSCC KECC Ideal

0 .5

0 .5

1

⇢ (c) " = 0.05

0

1

0 .5

0

⇢ (d) " = 0.08

Fig. 4: Relations between V(⇢⇣ ) and ⇢ 2 [ 1, 1] under LM. robustness of the former three against impulsive noise. Based on the dropping speed, we can say that KECC outperforms KT outperforms SR outperforms OSCC and PPMCC. 4.1.2 Comparison of deviation degree Figure 3 demonstrates the relationships of ⇢⇣ versus ⇢ 2 [ 1, 1] with an increment ⇢ = 0.1. Under the linear model LM, we set " 2 {0, 0.02, 0.05, 0.08} to evaluate the robustness against impulsive noise by means of deviation of ⇢⇣ from ⇢. Obviously, the ideal cure is the diagonal line (“Ideal” in Figure 3). Then, the closer the observed curve to the diagonal line, the smaller the degree of deviation, and the better the performance. From Figure 3(a), it is clear that, when there is no impulsive noise in the data (" = 0), 1) 1  ⇢⇣  +1; 2) ⇢⇣ ± 1 as ⇢ ± 1; 3) ⇢⇣ = 0 as ⇢ = 0; 4) ⇢⇣ are all increasing functions of ⇢; and 5) all E(⇢⇣ ) coincide with the diagonal line. Therefore, under LM, all five coefficients performs similarly when the data are free from impulsive noise.

However, as shown in Figure 3(b)-(d), E(⇢P ) and E(⇢O ) are both approximately equal to zero, no matter how small " is. In other words, PPMCC and OSCC lose completely the information of the underlying ⇢ as long as there is impulsive noise in the data. Meanwhile, smaller deviations with respect to KECC, KT and SR are observed, meaning their robustness against impulsive noise under LM. The overall performance in the aspect of impulsive-noise robustness can thus be ordered as ⇢KE > ⇢K > ⇢S > ⇢O ⇡ ⇢P according to the degree of deviation from the diagonal line. 4.1.3 Comparison of variance Figure 4 shows the relationships between the variances V(⇢⇣ ) and ⇢ 2 [ 1, 1] under LM. In this case, we also set " 2 {0, 0.02, 0.05, 0.08}. Note that, the smaller the magnitude, the better the performance. Figure 4(a) illustrates the comparable performances of PPMCC, KECC and OSCC for " = 0, the impulsive noise free case. It is seen that the variances of these three estimators

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise

9

Table 1: SRs Comparison of The Results of LM ⇢1 /⇢2

0/0.1

SRP SRS " = 0.00 SRK SRO SRKE

55.6464 58.8588 63.8231 63.7045

0.1/0.2 0.2/0.3 0.3/0.4 0.4/0.5 0.5/0.6 0.6/0.7 0.7/0.8 0.8/0.9

64.1992 8.3649 5.1784 3.8864 3.2081 2.7974 2.5467 2.4011 2.2959

7.3603 7.7401 8.3548 8.2998

-0.0075 3.6892 8.6609 5.0658 0.0068 -0.0066 7.9006 6.5265

4.6661 4.8339 5.1734 5.1478

3.5419 3.6326 3.8839 3.8615

2.9604 3.0114 3.2071 3.1900

2.6100 2.6387 2.7964 2.7834

2.4031 2.4171 2.5464 2.5375

2.2754 2.2823 2.4008 2.3921

2.1849 2.1870 2.2953 2.2888

0.0092 -0.0023 0.0011 -0.0025 3.1972 2.7667 2.4910 2.2695 3.7107 2.9983 2.6056 2.3299 0.0076 -0.0024 0.0021 -0.0024

-0.0091 2.1587 2.1880 -0.0103

0.0137 -0.0030 2.0448 1.9832 2.0576 1.9871 0.0159 -0.0055

SRP SRS " = 0.02 SRK SRO SRKE

0.0055 3.7479

SRP SRS " = 0.04 SRK SRO SRKE

-0.0010 2.5551 6.0334 -0.0014

0.0023 -0.0035 -0.0019 2.4896 2.3264 2.1600 3.6737 2.8773 2.4448 0.0012 -0.0025 -0.0030

6.1649

5.3174 4.0996 3.3388 2.8602 2.5671 2.3614 2.2371 2.1616

SRP SRS " = 0.06 SRK SRO SRKE

-0.0000 1.9632 4.7868 -0.0006

0.0028 -0.0026 -0.0070 1.9014 1.8509 1.7801 2.8930 2.3552 2.0598 0.0017 -0.0007 -0.0082

5.1594

4.5354 3.6927 3.1225 2.7141 2.4520 2.2734 2.1566 2.0747

SRP SRS " = 0.08 SRK SRO SRKE

-0.0098 1.4830 3.7361 -0.0094

0.0030 1.4589 2.2738 0.0034

4.1688

3.7992 3.2630 2.8289 2.5095 2.3225 2.1281 2.0517 1.9757

4.5905 3.5962 3.0374 2.6644 2.4586 2.2980 2.2135

0.0021 1.4436 1.8752 0.0012

0.0091 2.0202 2.1697 0.0097

0.0040 -0.0108 0.0054 1.6946 1.6304 1.5904 1.8498 1.7170 1.6359 0.0042 -0.0103 0.0044

0.0017 0.0003 1.3943 1.3436 1.6437 1.4915 0.0025 -0.0002

are very close to the theoretical result in the bivariate normal model (“Ideal” curve is drawn from the first formula in (16)). At the same time, the variances with respect to KT and SR are greater than those of PPMCC, KECC and OSCC, which manifests the inferiority of the first two coefficients under the bivariate normal model. From Figure 4(b)-(d), it is observed that, when the data is corrupted with impulsive noise (" > 0), 1) V(⇢KE ) remains the lowest for all "; 2) V(⇢K ) is comparable to V(⇢S ) for |⇢| around zero, and the former is smaller than the latter for ⇢ close to ±1; 3) V(⇢P ) and V(⇢O ) are greater than the other three over a large range of ⇢; and 4) V(⇢O ) is consistently larger than V(⇢P ), although the di↵erence decreases with increase of ". Based on these observations, the overall variance performance can be ordered as ⇢KE > ⇢K > ⇢S > ⇢P > ⇢O . 4.1.4 Comparison of sensitivity ratio Table 1 lists the comparative results of SR defined by (23) for all five coefficients over ⇢ 2 [0, 0.9] under LM. The following phenomena are observed. 1. Under the exact bivariate normal model, i.e., when " = 0, the values of SRP are maximum, followed by SRO by SRKE by SRK by SRS . This is of no surprise due to the optimality of PPMCC in the normal

0.0020 -0.0149 0.0072 -0.0024 1.9230 1.8438 1.7876 1.7531 2.0039 1.8848 1.8056 1.7593 0.0021 -0.0156 0.0085 -0.0026 0.0026 0.0002 1.5637 1.5270 1.5852 1.5336 0.0043 -0.0010

0.0011 -0.0004 0.0021 -0.0013 1.3406 1.3043 1.2857 1.2697 1.4258 1.3490 1.3057 1.2767 0.0012 -0.0007 0.0030 -0.0021

cases. However, the di↵erences between SRP , SRO and SRKE are very small. This means that PPMCC is only slightly better than OSCC and KECC in this scenario. 2. When " > 0, SRKE dominates others in most cases, followed by SRK by SRS by SRO and SRP . 3. When " > 0, SRP and SRO are both around zero, that is, they lose completely the power of discriminating changes of ⇢ when impulsive noise exists in the data. Based on these observations, the overall performance in terms of SR can be ordered as SRKE > SRK > SRS > SRO ⇡ SRP . 4.1.5 Comparison of root mean square error From Table 2, we can observe that 1) for " = 0, RMSEP is the smallest, but only slightly smaller than RMSEO than RMSEKE ; 2) for " > 0, RMSEKE is the smallest, except for some rare cases; 3) for " > 0, RMSEP and RMSEO are approximately the same, but far larger than the other three. The overall performance in terms of RMSE is thus ordered as RMSEKE < RMSEK < RMSES < RMSEO ⇡ RMSEP .

Weichao Xu * et al. 10



-1.0

-0.9

-0.8

-0.7

-0.5

-0.4

-0.3

-0.2

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

Table 2: RMSE of five estimators for " = {0.00, 0.02, 0.04, 0.06, 0.08} under LM -0.6

0.6

0.7

0.8

0.9

1.0

0.0178 0.0173 0.0162 0.0164 0.6010 0.0570 0.0524 0.6011

0.0222 0.0217 0.0204 0.0205 0.5013 0.0512 0.0487 0.5014

0.0256 0.0252 0.0237 0.0239 0.4014 0.0451 0.0439 0.4016

0.0285 0.0283 0.0268 0.0269 0.3013 0.0406 0.0401 0.3015

0.0302 0.0301 0.0286 0.0286 0.2028 0.0366 0.0364 0.2032

0.0319 0.0318 0.0303 0.0303

0.1050 0.0341 0.0341 0.1056

0.0328 0.0328 0.0313 0.0312

0.0307 0.1051 0.2024 0.3017 0.0331 0.0342 0.0364 0.0406 0.0332 0.0342 0.0363 0.0401 0.0323 0.1056 0.2026 0.3017

0.0334 0.0334 0.0319 0.0318

0.4014 0.0454 0.0443 0.4016

0.0282 0.0280 0.0267 0.0267

0.5010 0.0510 0.0486 0.5011

0.0256 0.0252 0.0238 0.0239

0.6008 0.0569 0.0523 0.6009

0.0218 0.0213 0.0200 0.0202

0.7015 0.0630 0.0550 0.7016

0.0179 0.0173 0.0161 0.0163

0.8003 0.0694 0.0559 0.8004

0.0129 0.0124 0.0115 0.0117

0.9004 0.0759 0.0525 0.9006

0.0070 0.0066 0.0060 0.0061

1.0004 0.0822 0.0322 1.0005

0.0000 0.0000 0.0000 0.0000

"

0.0127 0.0122 0.0114 0.0115 0.7010 0.0635 0.0554 0.7010

⇢ˆP 0.0000 0.0060 0.0114 0.0162 0.0203 0.0237 0.0268 0.0285 0.0303 0.0312 0.0318 0.0314 0.0303 0.0288 0.0266 0.0238 0.0200 0.0161 0.0115 0.0060 0.0000

0.0070 0.0066 0.0060 0.0061 0.8005 0.0696 0.0560 0.8006

0.0304 0.0303 0.0288 0.0288

0.0000 0.0000 0.0000 0.0000 0.9007 0.0760 0.0525 0.9008

0.0319 0.0319 0.0303 0.0303

⇢ˆS 0.00 ⇢ˆK ⇢ˆO ⇢ˆKE

1.0004 0.0821 0.0322 1.0005

0.0330 0.0330 0.0314 0.0314

⇢ˆP ⇢ˆS 0.02 ⇢ˆK ⇢ˆO

0.9005 0.1675 0.1210 0.9005

0.8006 0.1512 0.1245 0.8006

0.7006 0.1345 0.1188 0.7006

0.6006 0.1171 0.1081 0.6006

0.5012 0.0999 0.0950 0.5012

0.4015 0.0827 0.0803 0.4016

0.3012 0.0658 0.0648 0.3013

0.2025 0.0503 0.0500 0.2026

0.1047 0.0381 0.0381 0.1049

0.0315 0.0331 0.0331 0.0323

0.1047 0.0384 0.0384 0.1050

0.2027 0.0507 0.0504 0.2028

0.3014 0.0662 0.0652 0.3015

0.4011 0.0828 0.0804 0.4012

0.5012 0.1002 0.0953 0.5013

0.6008 0.1174 0.1084 0.6009

0.7004 0.1345 0.1187 0.7005

0.8006 0.1512 0.1244 0.8006

0.9005 0.1672 0.1208 0.9005

1.0007 0.1831 0.0827 1.0007

0.0603 0.0552 0.0507 0.0466 0.0426 0.0393 0.0366 0.0343 0.0326 0.0318 0.0315 0.0321 0.0329 0.0346 0.0370 0.0395 0.0427 0.0467 0.0507 0.0552 0.0604

1.0005 0.1829 0.0826 1.0005

⇢ˆKE 0.0303 0.0288 0.0281 0.0281 0.0284 0.0291 0.0298 0.0307 0.0311 0.0315 0.0318 0.0318 0.0309 0.0307 0.0298 0.0291 0.0283 0.0279 0.0280 0.0288 0.0303 ⇢ˆP ⇢ˆS 0.04 ⇢ˆK ⇢ˆO ⇢ˆKE

⇢ˆP 1.0006 0.9007 0.8009 0.7009 0.6006 0.5010 0.4008 0.3014 0.2024 0.1048 0.0315 0.1046 0.2023 0.3017 0.4015 0.5009 0.6010 0.7008 0.8006 0.9006 1.0007 ⇢ˆS 0.2761 0.2525 0.2269 0.2009 0.1745 0.1471 0.1196 0.0926 0.0671 0.0443 0.0331 0.0439 0.0667 0.0925 0.1201 0.1469 0.1746 0.2009 0.2268 0.2523 0.2767

0.9003 0.3551 0.2891 0.9003

0.8005 0.3182 0.2805 0.8006

0.7007 0.2810 0.2588 0.7007

0.6009 0.2429 0.2302 0.6010

0.5011 0.2036 0.1968 0.5011

0.4014 0.1650 0.1617 0.4014

0.3018 0.1258 0.1244 0.3018

0.2027 0.0875 0.0870 0.2028

0.1050 0.0521 0.0521 0.1051

0.0316 0.0332 0.0332 0.0320

0.1050 0.0523 0.0523 0.1051

0.2026 0.0876 0.0872 0.2026

0.3017 0.1257 0.1243 0.3018

0.4013 0.1654 0.1620 0.4013

0.5007 0.2038 0.1969 0.5006

0.6009 0.2426 0.2300 0.6009

0.7008 0.2806 0.2585 0.7008

0.8008 0.3183 0.2806 0.8008

0.9005 0.3549 0.2889 0.9005

1.0005 0.3904 0.2458 1.0005

0.1301 0.1177 0.1054 0.0934 0.0817 0.0699 0.0596 0.0489 0.0404 0.0337 0.0315 0.0340 0.0406 0.0489 0.0594 0.0705 0.0816 0.0937 0.1055 0.1178 0.1300

1.0007 0.3903 0.2457 1.0007

0.06 ˆK 0.1470 0.1931 0.1927 0.1808 0.1630 0.1409 0.1166 0.0913 0.0667 0.0443 0.0331 0.0439 0.0663 0.0912 0.1170 0.1407 0.1630 0.1809 0.1927 0.1929 0.1474 ⇢ ⇢ˆO 1.0007 0.9007 0.8008 0.7009 0.6007 0.5010 0.4008 0.3014 0.2025 0.1049 0.0320 0.1048 0.2024 0.3018 0.4015 0.5010 0.6010 0.7008 0.8007 0.9006 1.0007 ⇢ˆKE 0.0901 0.0821 0.0740 0.0663 0.0593 0.0519 0.0457 0.0401 0.0358 0.0327 0.0317 0.0328 0.0354 0.0399 0.0457 0.0523 0.0593 0.0663 0.0740 0.0820 0.0903 ⇢ˆP ⇢ˆS 0.08 ⇢ˆK ⇢ˆO ⇢ˆKE

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise

11

Table 3: Performance comparison of rKE , rK , rS for y[i] being a segment of gaussian wave. SN R

rKE

rK

rS

0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10

300.02±0.48 300.00±0.51 300.00±0.56 300.01±0.60 300.01±0.66 300.01±0.77 299.98±0.89 299.99±1.05 299.98±1.22 300.01±1.48 300.00±1.79

300.00±1.62 300.00±1.84 300.00±2.14 299.98±2.54 300.00±2.95 300.00±3.41 299.98±3.91 299.96±4.77 299.96±5.42 300.01±6.47 299.99±7.51

299.80±12.19 299.74±12.44 299.71±13.01 299.72±13.39 299.81±14.14 299.77±14.75 299.75±15.25 299.74±16.05 299.72±16.29 299.79±17.25 299.72±18.38

Table 4: Performance comparison of rKE , rK , rS for y[i] being a segment of sinusoidal wave. SN R

rKE

rK

rS

0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10

300.00±0.03 300.00±0.05 300.00±0.09 300.01±0.12 300.01±0.17 300.00±0.22 300.01±0.44 299.99±0.50 300.00±1.37 299.99±5.63 300.06±10.79

299.87±0.14 299.84±0.17 299.83±0.20 299.82±0.40 299.81±0.95 299.81±0.67 299.80±2.34 299.79±3.08 299.82±7.50 299.79±18.54 299.91±34.06

299.84±1.33 299.80±1.64 299.82±1.23 299.83±2.06 299.80±2.25 299.81±2.14 299.82±4.17 299.80±5.13 299.82±10.22 299.81±20.41 299.87±37.48

4.1.6 Comparison of time delay estimation Here, we only compare the performance of KECC, KT and SR for estimation of time delay. PPMCC and OSCC are excluded from this study due to their extreme sensitivity to impulsive noise verified above. From Table3 and Table4, we can see that all three coefficients can estimate the time delay accurately, since E(ˆ ⌧0 ) are very close to the true value of ⌧0 . However, from the viewpoint of standard deviation, the performance in terms of time delay estimation is ordered as KECC > KT > SR.

as the other three have similar deviations from the diagonal line; 2. for " = 0.02, as shown in the Figure 5(b), the deviation of E(⇢KE ) from the diagonal line is the smallest, followed by E(⇢K ) by E(⇢S ) by the other two; 3. for " = 0.05, E(⇢O ) and E(⇢P ) are both around zero, irrelevant to the underlying ⇢, that is, OSCC and PPMCC lose completely the information about ⇢. Similar results are also observed from Figure 7 with respect to T2 (·) = arctan(·). The overall performance can thus be ordered as KECC > KT > SR > OSCC and PPMCC.

4.2 Comparative study under nonlinear model 4.2.2 Comparison of variance 4.2.1 Comparison of deviation degree Figures 5 depicts the relationships between ⇢⇣ and ⇢ with respect to T1 (·) = sign(·)(·)2 . It is seen that 1. for " = 0, as shown in the Figure 5(a), E(⇢K ) and E(⇢S ) are both very close to the diagonal line; where-

Figures 6 and 8 present the relationships between variance (V⇣ ) and ⇢ with respect to T1 and T2 . The following are observed. 1. For " = 0, V(⇢P ) is simliar to V(⇢O ), whereas V(⇢K ) is similar to V(⇢S ). For |⇢| around zero, V(⇢KE )
SRK > SRS > SRO ' SRP . This again illustrates the superiority of KECC over other coefficients in terms of discriminating changes of ⇢ when the data is attenuated by both nonlinearity and impulsive noise.

These observations allow us to order the performance, in terms of variance under NM, as KECC > KT > SR > OSCC and PPMCC.

4.2.4 Comparison of root mean square error

4.2.3 Comparison of sensitivity ratio

1. in general, for " = 0,

The results of SR listed in Table 6 under NM show that 1. in general, for " = 0, SRK > SRS > SRKE > SRO > SRP ;

Table 5 summarizes the results of RMSE for " = 0 and " = 0.08, respectively. It is seen that

RMSEK < RMSES < RMSEO < RMSEKE ' SRP ; 2. in general, for " = 0.08, RMSEKE < RMSEK < RMSES < RMSEO ' SRP .

-1.0

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.2

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0.7010 0.1981 0.1775 0.7010

0.6011 0.1716 0.1598 0.6011

0.5009 0.1453 0.1389 0.5009

0.4011 0.1182 0.1150 0.4011

0.3017 0.0915 0.0902 0.3017

0.2026 0.0658 0.0654 0.2026

0.1050 0.0435 0.0435 0.1052

0.0437 0.0376 0.0332 0.0317 0.0333 0.0317 0.0324 0.0329 0.0332 0.0329 0.0316 0.0324 0.0329 0.0333 0.0329 0.0436 0.0375 0.0332 0.0317 0.0333 0.0444 0.0380 0.0333 0.0317 0.0334

0.0379 0.0437 0.0498 0.0541 0.0569 0.0326 0.0320 0.0313 0.0297 0.0283 0.0326 0.0318 0.0312 0.0298 0.0290 0.0378 0.0436 0.0496 0.0538 0.0566 0.0383 0.0443 0.0507 0.0553 0.0583

0.0559 0.0502 0.0369 0.0116 0.0264 0.0240 0.0211 0.0187 0.0281 0.0274 0.0265 0.0194 0.0556 0.0499 0.0366 0.0115 0.0574 0.0515 0.0378 0.0112

⇢ˆP 1.0007 0.9003 0.8008 0.7003 0.6009 0.5010 0.4012 0.3017 0.2024 0.1050 0.0318 0.1052 0.2023 0.3017 0.4016 0.5015 0.6010 0.7008 0.8004 0.9006 1.0000 ⇢ˆS 0.2830 0.2604 0.2352 0.2092 0.1817 0.1538 0.1253 0.0971 0.0695 0.0451 0.0333 0.0450 0.0696 0.0972 0.1256 0.1541 0.1819 0.2091 0.2358 0.2602 0.2828

0.0368 0.0501 0.0558 0.0565 0.0544 0.0496 0.0211 0.0238 0.0263 0.0282 0.0299 0.0311 0.0264 0.0273 0.0281 0.0289 0.0300 0.0311 0.0365 0.0497 0.0555 0.0562 0.0541 0.0495 0.0112 0.0377 0.0514 0.0573 0.0579 0.0556 0.0506

0.0353 0.0450 0.0577 0.0711 0.0847 0.0968 0.1068 0.1113 0.1058 0.0818

0.0436 0.0660 0.0918 0.1181 0.1450 0.1720 0.1984 0.2236 0.2484 0.2718 0.0436 0.0656 0.0905 0.1150 0.1386 0.1601 0.1778 0.1884 0.1867 0.1200 0.1050 0.2024 0.3017 0.4007 0.5010 0.6009 0.7007 0.8007 0.9005 1.0005

0.0315 0.1049 0.2024 0.3016 0.4007 0.5010 0.6009 0.7007 0.8007 0.9005 1.0004

⇢ˆK 0.1687 0.2110 0.2074 0.1930 0.1726 0.1489 0.1228 0.0961 0.0692 0.0450 0.0333 0.0449 0.0693 0.0962 0.1232 0.1492 0.1727 0.1929 0.2080 0.2109 0.1686 ⇢ˆO 1.0006 0.9003 0.8008 0.7003 0.6010 0.5010 0.4012 0.3018 0.2025 0.1051 0.0323 0.1053 0.2024 0.3017 0.4016 0.5015 0.6011 0.7008 0.8003 0.9005 1.0001 ⇢ˆKE 0.0916 0.1072 0.1110 0.1079 0.1004 0.0891 0.0765 0.0621 0.0484 0.0366 0.0318 0.0366 0.0483 0.0622 0.0765 0.0895 0.1003 0.1079 0.1112 0.1073 0.0917

0.08

0.0116 0.0187 0.0194 0.0115

⇢ˆP ⇢ˆS 0.00 ⇢ˆK T2 ⇢ˆO ⇢ˆKE

0.8008 0.2238 0.1885 0.8008

0.0334 0.0334 0.0320 0.0818 0.1059 0.1114 0.1068 0.0968 0.0847 0.0709 0.0575 0.0448 0.0353 0.0317

0.9002 0.2487 0.1870 0.9002

⇢ˆK 0.0021 0.0079 0.0131 0.0178 0.0218 0.0254 0.0282 0.0306 0.0319 0.0329 0.0332 0.0329 0.0321 0.0302 0.0282 0.0252 0.0220 0.0177 0.0130 0.0078 0.0021 ⇢ˆO 0.0002 0.0282 0.0475 0.0590 0.0634 0.0635 0.0585 0.0511 0.0424 0.0351 0.0319 0.0346 0.0424 0.0516 0.0583 0.0630 0.0637 0.0590 0.0474 0.0280 0.0002 ⇢ˆKE 0.0006 0.0463 0.0646 0.0693 0.0673 0.0622 0.0548 0.0469 0.0395 0.0339 0.0318 0.0338 0.0397 0.0470 0.0548 0.0620 0.0675 0.0694 0.0642 0.0462 0.0006

1.0006 0.2718 0.1200 1.0006

T2

-0.9

⇢ˆP 0.0002 0.0305 0.0504 0.0620 0.0663 0.0661 0.0605 0.0526 0.0432 0.0352 0.0317 0.0348 0.0432 0.0530 0.0604 0.0656 0.0666 0.0620 0.0503 0.0303 0.0002 ⇢ˆS 0.0044 0.0088 0.0138 0.0184 0.0223 0.0258 0.0284 0.0307 0.0320 0.0329 0.0332 0.0328 0.0321 0.0303 0.0284 0.0256 0.0224 0.0184 0.0138 0.0086 0.0044



⇢ˆP ⇢ˆS 0.08 ⇢ˆK T1 ⇢ˆO ⇢ˆKE

T1

0.00

"

Table 5: RMSE of five estimators under NM

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise 13

Weichao Xu * et al. 1

1

0 .5

0 .5

0

PPMCC SR KT OSCC KECC Ideal

0 .5

1

1

0 .5

0

0 .5

0

⇢⇣

⇢⇣

14

PPMCC SR KT OSCC KECC Ideal

0 .5

1

1

1

0 .5



0

0 .5

1

⇢ (b) " = 0.08

(a) " = 0

Fig. 7: Relations between E(⇢⇠ ) and ⇢ 2 [ 1, 1] under NM with T2 . ·10

3

·10

PPMCC SR KT OSCC KECC Ideal

0 .5

0

V⇣

1

V⇣

1

3

1

0 .5

0

PPMCC SR KT OSCC KECC Ideal

0 .5

0 .5

1



(a) " = 0

0

1

0 .5

0

0 .5

1

⇢ (b) " = 0.08

Fig. 8: Relations between V(⇢⇣ ) and ⇢ 2 [ 1, 1] under NM with T2 . In other word, when the data is corrupted by both nonlinearity and impulsive noise, KECC outperms the other four coe↵cients once again.

4.3 Comparison of time complexity From their definitions in (6)–(14), it is obvious that (1) PPMCC has a linear time complexity of order O(N ), (2) KT has a quadratic time complexity of O(N 2 ), and (3) KECC, OSCC and SR all have linearithmic time complexity of order O(N log N ). The last assertion comes from the fact that KECC, OSCC and SR all depend on the sorting procedure, whose time complexity is of order O(N log N ). To confirm the above analysis, we estimate the relationship between computation time and signal length N , with N being increased with a step N = 200 from 200 to 2000. All the computational

speed tests were performed in MATLAB2015a in a PC with Inter(R) Core(TM) i7-3770 CPU @ 3.40GHz. The algorithms of r⇣ for each N were run for 2000 times. Figure 9 shows that the computational speed of KECC is just a little slower than that of PPMCC, faster than that of OSCC than that of SR than that of KT.

5 Concluding remarks In this paper, we proposed a new kernel correlation coefficient (KECC), with an emphasis on its robustness against impulsive noise and/or monotonic nonlinear transformations. To uncover its advantages, we compared KECC with other four correlation coefficients, namely, PPMCC, KT, SR and OSCC. Extensive simulation experiments were conducted under linear, nonlinear, normal and contaminated Gaussian models based

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise

15

CPU Time(Logarithnic Scale)[sec]

Table 6: SRs Comparison of The Reulst of NM

6

4

⇢1 /⇢2

0/0.1

0.1/0.2 0.2/0.3 0.3/0.4 0.4/0.5 0.5/0.6 0.6/0.7 0.7/0.8 0.8/0.9

SRP SRS " = 0.00 SRK T1 SRO SRKE

16.4644 21.9799 41.6656 27.1296

6.9904 5.2605 6.1692

3.3570 3.9815

SRP SRS " = 0.08 SRK T1 SRO SRKE

-0.0103 1.9799 5.4408 -0.0102

0.0114 1.9357 2.9857 0.0113

0.0027 -0.0013 -0.0075 -0.0019 1.8684 1.7847 1.7249 1.6369 2.3943 2.0720 1.8844 1.7270 0.0038 -0.0022 -0.0073 -0.0012

SRP SRS " = 0.00 SRK T2 SRO SRKE

28.3324 19.6987 21.7849 28.1929

6.4875 7.0625

2.7335 2.7209

2.4156 2.4041

2.2383 2.2282

SRP SRS " = 0.08 SRK T2 SRO SRKE

-0.0095 2.0127 4.6390 -0.0083

0.0092 -0.0123 0.0113 -0.0018 1.9372 1.8618 1.7640 1.6869 2.8332 2.3049 2.0054 1.8209 0.0093 -0.0136 0.0125 -0.0023

-0.0021 1.6139 1.6873 -0.0010

0.0019 -0.0020 0.0020 1.5670 1.5177 1.4849 1.6052 1.5356 1.4906 0.0023 -0.0027 0.0018

7.6329

4.9347 3.5240 2.8406 2.4246 2.1942 2.0207 1.9242 1.8328

43.0661 5.2705

5.7489

3.3394 4.5273

2.5915 3.5062

2.2526 2.9558

2.0384 2.6027

1.9571 2.3778

1.9116 2.2442

1.9069 2.1779

7.4050 4.6990 3.5945 3.0059 2.6296 2.3927 2.2504 2.1795

2.6116 3.1211

2.2742 2.6654

2.0575 2.3666

1.9707 2.1764

1.9215 2.0584

1.9140 2.0040

0.0043 0.0012 1.6112 1.5741 1.6588 1.5964 0.0043 -0.0003

0.0018 1.5506 1.5577 0.0035

4.4069 3.3188 2.7469 2.4245 2.2002 2.0726 1.9674 1.9179

4.1894 4.4656

3.2232 3.4012

2.7336 2.8646

2.4153 2.5298

2.2383 2.3377

2.1044 2.1960

2.0319 2.1144

7.2421 4.5764 3.4630 2.9021 2.5506 2.3485 2.2011 2.1158

6.4893 28.4623 6.4311

4.1901 4.1614

3.2235 3.2047

2.1049 2.0958

2.0323 2.0254

KECC runs in linearithmic time, only slightly slower than the fastest PPMCC. The advantages of KECC revealed in this work are believed to shed new light on the topic of correlation analysis, which is important in many areas including signal processing.

PPMCC SR KT OSCC KECC

2

Acknowledgment 0

2

500

1,000

1,500

2,000

N

Fig. 9: Results of CPU time versus N 2 [200, 2000] with increment N = 200. A logarithmic scale is used for better visual e↵ect.

on seven means of performance evaluation. Theoretical analysis showed that KECC satisfies various desired properties contained in Theorem 1. Numerical results suggest that (1) under the bivariate normal model, KECC performs equally well with the optimal PPMCC and OSCC, (2) under the contaminated Gaussian model, the behavior of KECC changes only slightly, and outperforms KT and SR, which are well known to be robust against impulsive noise, (3) under monotonic nonlinear models together with impulsive noise, KECC is also the best, (4) KECC can detect accurately the time delay of signals corrupted by impulsive noise, and (5)

This work was jointly supported in part by National Natural Science Foundation of China (Projects 61271380 and U1501251), in part by Guangdong Natural Science Foundation (Projects S2012010009870 and 2014A030313515), in part by Guangzhou Science and Technology Plan (Project 201607010290), and in part by Project Program of Key Laboratory of Guangdong Higher Education Institutes of China under Grant 2013CXZDA015.

Appendix Proof of Theorem 2 Let ⌫X and ⌫Y be the population medians of X and Y, respectively. Then, from [31], we have ⌫ X = µX

⌫ Y = µY .

(27)

Define ⌘X , 3.53IRX ⌘Y , 3.53IRY

(28) (29)

16

Weichao Xu * et al.

X0 , Xe Y , Ye 0

1 (X , Y) 2 (X , Y)

X2 2⌘ 2 X

(30)

Y2 2⌘ 2 Y

(31)

, N (µX , µY ,

2 X, 02 X,

, N (µX , µY ,

(X , Y) , (1

2 Y , ⇢) 02 0 Y,⇢ )

1 (X , Y) + "

")

(32) (33)

2 (X , Y).

N !1

E(X 0 Y 0 )

E(X 02 )E(Y 02 )

E1 (X 0 Y 0 ) =

(34) =

.

1 (X , Y)

=

2⇡

1 p

X

Y

where Q(X , Y) =

1 ⇢2

1

⇢2 )

+ (1 =

✓ ✓

⇢2

1

X

1 2 Q(X ,Y)

e

µX X

Y

µY Y

(X a)2 (Y + 2 2 (1 ⇢ ) X



◆2

Y

(36)

a = µX + ⇢

(Y

µY ) =

Y

µY Y

◆2

2 (X , Y)

=

2⇡

0 X

0 Y

where 1 Q (X , Y) = 1 ⇢0 2 0

02

+ (1 =

✓

⇢ )

Y

a0 ) 2

(X (1



X

(37)

⇢Y.

(38)

02

⇢ )

0 2 X

µY

0 Y

e

2 (X , Y)

in (33),

1 0 2 Q (X ,Y)

(39)

0Y

µY



◆2

0 Y

◆2

0 X 0 Y

(Y

µY ) =

2⇡

+1 ˆ H2 (Y) = Xe

1

(X a)2 2(1 ⇢2 ) 2 X

dX

(45)

2 (X , Y)dX dY

(46)

0 X

1 p 0 1 Y

⇢02

+1 ˆ Y 0e

Y2 2 02 Y

1

H2 (Y)dY (47)

X2 2⌘ 2 X

e

(X a0 )2 2(1 ⇢02 ) 02 X

dX .

(48)

Now we need to evaluate H1 and H2 . It follows that X2 (X a)2 (X + 2 2 = 2⌘X 2(1 ⇢2 ) X

Z)2 + A 2B

B= µY ) 0 2 Y

2

(40)

0 X 0 Y

⇢0 Y.

Z=

2 ⌘X

2 2 ⇢2 ) X a (1 ⇤ 2 + ⌘2 2 (1 ⇢2 ) X X 2 2 ⌘X ⇢2 ) X (1 2 2 2 (1 ⇢ ) X + ⌘X 2 a⌘X 2 + ⌘2 . (1 ⇢2 ) X X

(49)

Y2 2⌘ 2 Y

(X , Y)dX dY

(50) (51) (52)

Then, from (38), (50) and (51), we have (41)

A = B

2 Y



(1

2 2 2 X⇢ Y 2 ⇢2 ) X

+1 ˆ H1 (Y) = Xe

Ye

e

with

+1 ¨ E(X Y ) = Xe

X2 2⌘ 2 X

X2 2⌘ 2 X

0

and hence

0

(44)

and

From (30)–(41), the numerator in (35) becomes 0

H1 (Y)dY

1

1

A= ⇥

with a 0 = µ X + ⇢0

⇢2

Y2 2 2 Y

where

(Y

+

+1 ˆ Y 0e

(43)

1

µX

0 X

+1 ˆ H1 (Y) = Xe

=

Y

⇢02

1

X

1

µY ) 2

For the bivariate normal distribution the pdf is 1 p

2⇡

1 p 1 Y

+1 ¨ E2 (X Y ) = X 0Y 0

2 Y

X

1

1 (X , Y)dX dY

with

0

with X

+1 ¨ X 0Y 0

(35)

Therefore, we need to work out all the three expectation terms in (35). For the bivariate normal distribution 1 (X , Y) in (32), the pdf

(42)

where

When the sample size N ! 1, it follows that lim E(rKE ) = p

")E1 (X 0 Y 0 ) + "E2 (X 0 Y 0 )

= (1

1

(X

2 + ⌘X

Z)2 2B

e

⇤ A 2B

(53)

dX

A novel kernel correlation coefficient with robustness against nonlinear attenuation and impulsive noise +1 ˆ

A 2B

=e

(X

(X

Z)e

Z)2 2B

+ Ze

(X

Z)2 2B

where dX

1

=e =

p

+1 ˆ Ze

A 2B

1

(X

Z)2 2B

dX

A 2B

2⇡BZe Y2 2D 2

= CYe

(54)

where 2 3 ⇢2 ) X ⌘X ⇢ 2⇡(1 p 2 2 ]3 [(1 ⇢2 ) X + ⌘X Y

C=

2 Y [(1

2

D =

2 2 ⇢2 ) X + ⌘X ] . 2 ⇢2 X

(56)

(57)

where 2 Y (1

⇢2 ) +

2 2 Y ⌘X

2 2 X ⌘Y

+

2 2 + ⌘X ⌘Y .

(58)

⇢02 ) +

⇢0

0

E1 (X Y ) =

2⇡

X

p

C p

Y

1

⇢2

+1 ˆ Y 2e

2⇡CJ 3 p = 2⇡ X Y 1 ⇢2 ⌘ 3 ⌘Y3 X Y ⇢ = X p . W3

Y2 2J 2

E(X 02 ) = p

(1 (2 (1 (2

6 ")⌘X

2 X 2 2 4 3 X ⌘X + ⌘X ) ")⌘Y6 Y2 2 2 Y ⌘Y

+ ⌘Y4 )3

Y2 2D 02

(60)

C = D02 =

p

p 2⇡(1 ⇢02 )

02 X

⇢02 )

X !1 0 Y !1

+q

(2

(2

6 "⌘X

02 X

02 2 X ⌘X + "⌘Y6 Y02 02 2 Y ⌘Y

4 )3 ⌘X

+ ⌘Y4 )3

(66)

. (67)

(1

")E1 (X 0 Y 0 ) + "E2 (X 0 Y 0 ) p E(X 02 )E(Y 02 ) ⇢

(68)

where S and T are defined in (9) and (10), respectively. To show that the last term in (68) is approximately equal to ⇢, we need to evaluate ⌘X and ⌘Y contained in 2 S and T . Let 1 (·) and 2 (·) be the cdf of N (µX , X ) 02 and N (µX , X ), respectively. Let qU and qL be the upper and lower quartiles of X , respectively. Since the marginal distribution of X is (1

2 ]3 + ⌘X

02 2 ⇢02 ) X + ⌘X ] . 02 ⇢02 X

(1 (61)

")N (µX ,

2 X)

+ "N (µX ,

02 X ),

02 X

2 ] + ⌘X

1 (qU )

+"

(63)

2 (qU )

=

3 . 4

(69)

0 By the assumption of X X , it follows that 0 < (q ) < (q ), and hence, from (69), 2 U 1 U

(1

1 1 1 1 = 2 + 02 + 02 02 J ⌘Y D Y W0 02 2 ⇢02 ) Y ⌘Y [(1

")

(62)

Write

=

(65)

we have

02 3 0 X ⌘X ⇢

0 [(1 Y 02 Y [(1

dY

E(X 0 Y 0 ) lim E(rKE ) = p N !1 E(X 02 )E(Y 02 ) 0

dY

(59)

Y2 2J 02

(64)

Given (42), (59), (65)–(67) along with the fact that ⇢ 2 [ 1, +1], we arrive at

where 0

+p

=q 2 ( 1 S⇢ + 12 T )3

1

2 2 + ⌘X ⌘Y .

0 3 3 ⌘ ⌘ pY X Y. W 03

=

In a parallel manner, we have H2 (Y) = C 0 Ye

02 2 X ⌘Y

+

0 X

Then 0

02 2 Y ⌘X

Similarly we can also obtain

E(Y 02 ) = q

1 1 1 1 = 2 + 2 + 2 J2 ⌘Y D Y W = 2 2 2 + ⌘2 ] ⌘ [(1 ⇢2 ) X Y Y X 2 X

02 02 X Y (1

A substitution of (60) into (47) gives ˆ +1 C0 0 0 p E2 (X Y ) = Ye 0 0 2⇡ X 1 ⇢02 1 Y p 2⇡C 0 J 03 p = 0 0 2⇡ X 1 ⇢02 Y

(55)

Write

W =

W0 =

=

p

17

")

1 (qU )