Hindawi Publishing Corporation, Discrete Dynamics in Nature and Society, Volume 2016, Article ID 6516258, 6 pages. http://dx.doi.org/10.1155/2016/6516258
Research Article

Indefinite Kernel Network with $l_q$-Norm Regularization

Zhongfeng Qu and Hongwei Sun

School of Mathematical Science, University of Jinan, Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250022, China

Correspondence should be addressed to Hongwei Sun; [email protected]

Received 29 November 2015; Accepted 13 March 2016

Academic Editor: Bo Yang

Copyright © 2016 Z. Qu and H. Sun. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We study the asymptotic properties of the indefinite kernel network with $l_q$-norm regularization. The framework under investigation differs from classical kernel learning: positive semidefiniteness is not required of the kernel function. By a new stepping-stone technique, satisfactory error bounds and learning rates are deduced without any interior cone condition on the input space $X$ or $L_\tau$ condition on the probability measure $\rho_X$.
1. Introduction

Regression learning has been widely applied in economic decision making, engineering, computer science, and especially statistics. To fix ideas, let the input data space $X$ be a compact domain of $\mathbb{R}^n$, let the output data space $Y$ be a subset of $\mathbb{R}$, and let $\rho$ be a Borel probability distribution on $Z = X \times Y$ from which the random variable $(X, Y)$ is drawn. Let $\rho(\cdot \mid x)$ be the conditional distribution for $x \in X$, and let $\rho_X$ be the marginal distribution. The goal of regression learning is to learn the regression function $f_\rho : X \to Y$, given by
\[
f_\rho(x) = \mathbb{E}(Y \mid X = x) = \int_Y y \, d\rho(y \mid x). \tag{1}
\]
As the distribution $\rho$ is unknown, $f_\rho$ cannot be computed directly. In the learning theory framework, one obtains an approximation of $f_\rho$ from a set of observations $\mathbf{z} = \{z_i = (x_i, y_i)\}_{i=1}^m \in Z^m$ drawn independently and identically according to $\rho$. The least squares scheme is the most popular method for the regression problem. The generalization error of any measurable function $f$,
\[
\mathcal{E}(f) = \int_Z \bigl(f(x) - y\bigr)^2 \, d\rho, \tag{2}
\]
is the mean error suffered from the use of $f$ as a model for the process producing $y$ from $x$; see Cucker and Zhou [1]. A simple computation shows that [1]
\[
\mathcal{E}(f_\rho) = \min_f \int_Z \bigl(f(x) - y\bigr)^2 \, d\rho, \qquad
\bigl\| f - f_\rho \bigr\|_{L^2_{\rho_X}}^2 = \mathcal{E}(f) - \mathcal{E}(f_\rho). \tag{3}
\]
Thus, $\mathcal{E}(f) - \mathcal{E}(f_\rho)$ can be used to evaluate how well $f$ approximates $f_\rho$. In order to minimize the expected risk functional, we employ the empirical risk functional
\[
\mathcal{E}_{\mathbf{z}}(f) = \frac{1}{m} \sum_{i=1}^m \bigl(f(x_i) - y_i\bigr)^2. \tag{4}
\]
The empirical risk minimization (ERM) principle is to approximate the function $f_\rho$, which minimizes the risk $\mathcal{E}$, by the function $f^*$ which minimizes the empirical risk (4) over some hypothesis space $\mathcal{H}$. Kernel based machines usually take as hypothesis space a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$ associated with a Mercer kernel $K$; for definitions and properties, see [2]. The well-known regularized least squares regression algorithm is
\[
f_{\mathbf{z},\gamma} = \arg\min_{f \in \mathcal{H}_K} \Bigl\{ \frac{1}{m} \sum_{i=1}^m \bigl(f(x_i) - y_i\bigr)^2 + \gamma \|f\|_K^2 \Bigr\}. \tag{5}
\]
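The solution of (5) has a simple finite-dimensional form: by the representer theorem, the minimizer can be written as $f_{\mathbf{z},\gamma} = \sum_{i=1}^m c_i K(\cdot, x_i)$ with $(\mathbf{K} + \gamma m I)\mathbf{c} = \mathbf{y}$, where $\mathbf{K}$ is the kernel matrix. The following minimal sketch (not from the paper) illustrates this for a Gaussian kernel; the kernel choice, the synthetic data, and the parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Xp, sigma=0.5):
    # Mercer kernel K(x, t) = exp(-|x - t|^2 / (2 sigma^2)) on a 1-D input space
    d = X[:, None] - Xp[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
m, gamma = 60, 1e-2
x = rng.uniform(0, 1, m)                                   # sample from rho_X (assumed uniform)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(m)   # noisy targets, |y| bounded

K = gaussian_kernel(x, x)
# Representer theorem: f_{z,gamma} = sum_i c_i K(., x_i) with (K + gamma*m*I) c = y
c = np.linalg.solve(K + gamma * m * np.eye(m), y)

x_test = np.linspace(0, 1, 200)
f_test = gaussian_kernel(x_test, x) @ c                    # f_{z,gamma} evaluated on a grid
print("training RMSE:", np.sqrt(np.mean((K @ c - y) ** 2)))
```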
The penalty term $\gamma \|f\|_K^2$ with regularization parameter $\gamma > 0$ is introduced to avoid ill-posedness and overfitting. In the literature, a variety of consistency analyses have been carried out, for instance, capacity dependent ones (e.g., [3, 4] and references therein) and capacity independent ones (e.g., [5–8]). In recent years, the coefficient based regularization kernel network (CRKN) has attracted increasing attention:
\[
f_{\mathbf{z}} = \arg\min_{f_\alpha \in \mathcal{H}_{K,\mathbf{x}}} \frac{1}{m} \sum_{i=1}^m \bigl(y_i - f_\alpha(x_i)\bigr)^2 + \lambda\, \Omega(\alpha), \tag{6}
\]
where $\alpha = (\alpha_1, \ldots, \alpha_m) \in \mathbb{R}^m$ and $f_\alpha = \sum_{i=1}^m \alpha_i K(x, x_i)$. In this setting, the hypothesis space $\mathcal{H}_K$ is replaced by a finite dimensional function space:
\[
\mathcal{H}_{K,\mathbf{x}} = \Bigl\{ f_\alpha(x) = \sum_{i=1}^m \alpha_i K(x, x_i) : \alpha = (\alpha_1, \ldots, \alpha_m) \in \mathbb{R}^m,\ m \in \mathbb{N} \Bigr\}, \tag{7}
\]
where $K : X \times X \to \mathbb{R}$ is only required to be continuous and uniformly bounded on $X \times X$ and is called a general kernel. The penalty $\Omega(\alpha)$ is imposed on the coefficients of the function $f \in \mathcal{H}_{K,\mathbf{x}}$.

Some research has been devoted to the mathematical foundation of CRKN. In [9], the framework of error analysis for this coefficient based regularization was proposed. CRKN with the $l^2$-norm regularizer and an indefinite kernel has been studied; error bounds and learning rates were derived through the explicit expression of the solution $f_{\mathbf{z}}$ and integral operator techniques, not only for independent sampling but also for weakly dependent sampling; see [10, 11]. CRKN with the $l^1$-norm regularizer attracts more attention, for it often leads to sparsity of the coefficients $\{\alpha_{\mathbf{z},i}\}_{i=1}^m$ of $f_{\mathbf{z}}$ with a properly chosen regularization parameter $\lambda$. In [12], Xiao and Zhou analyzed coefficient regularization with the $\ell^1$ norm; under a Lipschitz condition on the kernel $K$, a rate of order $O\bigl(m^{-r/(2(1+n))}\bigr)$ was obtained, which is very slow since $n$ is usually large. In [13], under an interior cone condition on the input space $X$ and an $L_\tau$ condition on the probability measure $\rho_X$, a satisfactory error bound and a sharp convergence rate were obtained. The shortcoming of that analysis is that the conditions are too strict, especially the $L_\tau$ condition, since it covers only almost uniform distributions on $X$. To the best of our knowledge, the probability measure $\rho_X$ has no direct relation with the consistency of the algorithms, so we attempt to deduce satisfactory error bounds under more general conditions. To this end, a new stepping-stone technique was introduced into the consistency analysis of CRKN with $l^1$-norm regularization in [14]. In this paper, we adopt the stepping-stone technique to study the $\ell_q$-norm based CRKN with $1 \le q \le 2$, which is defined as
\[
f_{\mathbf{z}} = f_{\alpha_{\mathbf{z}}}, \qquad \text{where } \alpha_{\mathbf{z}} = \arg\min_{\alpha \in \mathbb{R}^m} \frac{1}{m} \sum_{i=1}^m \bigl(y_i - f_\alpha(x_i)\bigr)^2 + \eta\, \|\alpha\|_{q,m}^q, \tag{8}
\]
where $f_\alpha = \sum_{i=1}^m \alpha_i K_{x_i}(\cdot) = \sum_{i=1}^m \alpha_i K(\cdot, x_i)$, and the norm
\[
\|\alpha\|_{q,m} := \Bigl\{ m^{q-1} \sum_{i=1}^m |\alpha_i|^q \Bigr\}^{1/q}. \tag{9}
\]

2. Assumptions and Main Results

Throughout the paper, we always assume that $|y| \le M$ almost surely for some constant $M > 0$. By this assumption, $|f_\rho(x)| \le M$ for any $x \in X$. We also require that the kernel $K \in C^s(X \times X)$ with $s > 0$, and we denote the norm $\|K\|_{C^s(X \times X)}$ by $\kappa$. For the bounded indefinite kernel $K(x, y)$, we consider the Mercer kernel
\[
\widetilde K(x, t) = \int_X K(x, u)\, K(t, u)\, d\rho_X(u). \tag{10}
\]
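Before developing the analysis, the following minimal numerical sketch (not part of the paper) illustrates scheme (8)–(9) in the endpoint case $q = 1$, where $\|\alpha\|_{1,m} = \sum_i |\alpha_i|$, using a general kernel built as a difference of two Gaussians, which need not be positive semidefinite. The kernel, the synthetic data, and the proximal gradient (ISTA) solver with its step size are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def indefinite_kernel(X, Xp):
    # A general (bounded, continuous) kernel: difference of two Gaussians.
    # It is symmetric but need not be positive semidefinite.
    d2 = (X[:, None] - Xp[None, :]) ** 2
    return np.exp(-d2 / (2 * 0.2**2)) - 0.5 * np.exp(-d2 / (2 * 0.6**2))

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(1)
m, eta = 80, 1e-3
x = rng.uniform(0, 1, m)
y = np.cos(3 * x) + 0.1 * rng.standard_normal(m)

K = indefinite_kernel(x, x)                  # design matrix K_ij = K(x_i, x_j)
print("min eigenvalue of the Gram matrix:", np.linalg.eigvalsh(K).min())

# ISTA for the q = 1 case of (8): min_a (1/m)||y - K a||^2 + eta * sum_i |a_i|
alpha = np.zeros(m)
step = 1.0 / (2.0 / m * np.linalg.norm(K, 2) ** 2)   # 1 / Lipschitz constant of the smooth part
for _ in range(2000):
    grad = (2.0 / m) * K.T @ (K @ alpha - y)
    alpha = soft_threshold(alpha - step * grad, step * eta)

print("nonzero coefficients:", np.count_nonzero(np.abs(alpha) > 1e-8), "of", m)
```

For $1 < q \le 2$ the penalty is differentiable away from the origin and a generic smooth solver could be used instead; the $q = 1$ endpoint is shown because it is the sparsity-inducing case emphasized in the text.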
The integral operator associated with a general kernel $T(x, y)$ is defined by
\[
L_T f(x) = \int_X T(x, u)\, f(u)\, d\rho_X(u), \tag{11}
\]
which is a compact operator; in particular, $L_{\widetilde K}$ is a compact positive operator. For the approximation ability of the hypothesis space based on the kernel $K$, we assume that $L_{\widetilde K}^{-\beta/2} f_\rho \in L^2_{\rho_X}(X)$ for some $\beta > 0$. One of the main difficulties in the analysis of CRKN is that the hypothesis space depends on the sample.

Definition 1. Define a Banach space
\[
\mathcal{H}_1 = \Bigl\{ f : f = \sum_{i=1}^\infty \alpha_i K_{t_i},\ \{\alpha_i\} \in l^1,\ \{t_i\} \subset X \Bigr\} \tag{12}
\]
with the norm
\[
\|f\| = \inf \Bigl\{ \sum_{i=1}^\infty |\alpha_i| : f = \sum_{i=1}^\infty \alpha_i K_{t_i},\ \{\alpha_i\} \in l^1,\ \{t_i\} \subset X \Bigr\}. \tag{13}
\]
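Although $K$ itself may be indefinite, the kernel $\widetilde K$ of (10) is positive semidefinite, since any Gram matrix of $\widetilde K$ has the form $\mathbb{E}_{u \sim \rho_X}\bigl[v(u)v(u)^\top\bigr]$ with $v(u) = (K(x_i, u))_i$. The short sketch below (not from the paper) checks this numerically through a Monte Carlo approximation of the integral in (10); the indefinite kernel and the uniform choice of $\rho_X$ are illustrative assumptions.

```python
import numpy as np

def indefinite_kernel(X, Xp):
    # same illustrative general kernel as in the previous sketch
    d2 = (X[:, None] - Xp[None, :]) ** 2
    return np.exp(-d2 / (2 * 0.2**2)) - 0.5 * np.exp(-d2 / (2 * 0.6**2))

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40)          # points at which the Gram matrices are formed
u = rng.uniform(0, 1, 5000)        # Monte Carlo sample from rho_X (assumed uniform)

K_xu = indefinite_kernel(x, u)     # K(x_i, u_j)
K_tilde = K_xu @ K_xu.T / u.size   # Monte Carlo approximation of (10)

print("min eigenvalue of K:      ", np.linalg.eigvalsh(indefinite_kernel(x, x)).min())
print("min eigenvalue of K_tilde:", np.linalg.eigvalsh(K_tilde).min())
```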
The continuity and uniform boundedness of $K$ ensure that $\mathcal{H}_1$ consists of continuous functions. Denote the ball of radius $R$ by $B_R = \{f \in \mathcal{H}_1 : \|f\| \le R\}$. Our estimate of the sample error is conducted through a concentration inequality, which is based on the $l^2$ empirical covering number of $B_R$. The normalized $l^2$ metric $d_2$ on the Euclidean space $\mathbb{R}^k$ is defined by
\[
\bigl(d_2(\alpha, \beta)\bigr)^2 = \frac{1}{k} \sum_{i=1}^k (a_i - b_i)^2, \qquad \text{for any } \alpha = (a_i)_{i=1}^k,\ \beta = (b_i)_{i=1}^k \in \mathbb{R}^k. \tag{14}
\]
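As a small illustration (not from the paper), the sketch below evaluates the normalized metric (14) between the sampled values $S_{\mathbf{x}}(f)$ and $S_{\mathbf{x}}(g)$ of two functions, anticipating the sampling operator defined next; the functions and the sample are arbitrary assumptions.

```python
import numpy as np

def d2(a, b):
    # normalized l2 metric (14) on R^k
    return np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 50)                 # sample x = (x_1, ..., x_k)

f = lambda t: np.sin(2 * np.pi * t)       # two arbitrary continuous functions on X
g = lambda t: np.sin(2 * np.pi * t) + 0.05 * t

print("d2(S_x(f), S_x(g)) =", d2(f(x), g(x)))   # S_x(f) = (f(x_1), ..., f(x_k))
```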
Under the above metric, the covering number of a subset $E$ of $\mathbb{R}^k$ at radius $\epsilon > 0$ is denoted by $\mathcal{N}_2(E, \epsilon)$. Let $\mathcal{F}$ be a set of functions on $X$ and $\mathbf{x} = (x_i)_{i=1}^k \in X^k$. The sampling operator $S_{\mathbf{x}} : \mathcal{F} \to \mathbb{R}^k$ is defined by
$S_{\mathbf{x}}(f) = (f(x_i))_{i=1}^k$ for any $f \in \mathcal{F}$. The $l^2$ empirical covering number of $\mathcal{F}$ is defined by
\[
\mathcal{N}_2(\mathcal{F}, \epsilon) = \sup_{k \in \mathbb{N}} \sup_{\mathbf{x} \in X^k} \mathcal{N}_2\bigl(S_{\mathbf{x}}(\mathcal{F}), \epsilon\bigr). \tag{15}
\]

A bound for the $l^2$ empirical covering number of $B_1$ was derived in [13]: suppose that $K \in C^s(X \times X)$; then
\[
\log \mathcal{N}_2(B_1, \epsilon) \le c_p \Bigl( \frac{1}{\epsilon} \Bigr)^p, \qquad \forall \epsilon > 0, \tag{16}
\]
where $c_p$ is a constant independent of $\epsilon$, and $0 < p < 2$ is defined by
\[
p =
\begin{cases}
\dfrac{2n}{n + 2s}, & \text{when } 0 < s \le 1, \\[2mm]
\dfrac{2n}{n + 2}, & \text{when } 1 < s \le 1 + \dfrac{n}{2}, \\[2mm]
\dfrac{n}{s}, & \text{when } s > 1 + \dfrac{n}{2}.
\end{cases} \tag{17}
\]

Our main conclusion is about the asymptotic convergence rate of the kernel network (8).

Theorem 2. Let $f_{\mathbf{z}}$ be the optimal solution of algorithm (8) with $1 \le q \le 2$. Suppose that $B_1$ satisfies the capacity condition (16) with $0 < p < 2$ and that $L_{\widetilde K}^{-\beta/2} f_\rho \in L^2_{\rho_X}(X)$ for some $\beta > 0$. Taking $\eta = m^{-t}$ such that $0 < t < q/(p+2)$, we have for any $0 < \delta < 1$, with confidence $1 - \delta$, that
\[
\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho) \le \widetilde C\, m^{-t \min\{\beta, 1\}}, \tag{18}
\]
where $\widetilde C$ is a constant independent of $m$.

Corollary 3. Under the assumptions of Theorem 2, suppose that $q = 1$ and $f_\rho \in \mathcal{H}_{\widetilde K}$. Taking $\eta = m^{-1/(2+p)+\epsilon}$ with $1/(2+p) > \epsilon > 0$, for any $0 < \delta < 1$, with confidence $1 - \delta$ there holds
\[
\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho) = O\bigl(m^{-1/(2+p)+\epsilon}\bigr). \tag{19}
\]

The condition $f_\rho \in \mathcal{H}_{\widetilde K}$ is equivalent to $\beta \ge 1$. When the kernel $K \in C^\infty(X \times X)$, the parameter $p$ can be arbitrarily close to 2; thus, our learning rate is almost the optimal rate $1/4$ for CRKN with $l^1$-norm regularization; see [13].
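To make the exponents concrete, the following small sketch (not from the paper) evaluates $p$ from (17) for a few illustrative choices of the input dimension $n$ and kernel smoothness $s$, together with the exponent $1/(2+p)$ that governs the rate in (19); the particular values of $n$ and $s$ are arbitrary.

```python
def capacity_exponent(n: int, s: float) -> float:
    """Exponent p from (17), determined by the dimension n and kernel smoothness s."""
    if 0 < s <= 1:
        return 2 * n / (n + 2 * s)
    if 1 < s <= 1 + n / 2:
        return 2 * n / (n + 2)
    return n / s                      # s > 1 + n/2

for n, s in [(5, 0.5), (5, 2.0), (5, 50.0), (20, 50.0)]:
    p = capacity_exponent(n, s)
    print(f"n={n:2d}, s={s:5.1f}:  p={p:.3f},  rate exponent 1/(2+p)={1/(2+p):.3f}")
```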
3. Rough Error Bounds of $l_q$-Regularization

We propose the following new error decomposition by employing a stepping-stone approach. Consider the regularized kernel network with the $l^2$ norm:
\[
f_{\mathbf{z},\lambda} = f_{\alpha_{\mathbf{z},\lambda}}, \qquad \text{where } \alpha_{\mathbf{z},\lambda} = \arg\min_{\alpha \in \mathbb{R}^m} \frac{1}{m} \sum_{i=1}^m \bigl(y_i - f_\alpha(x_i)\bigr)^2 + \lambda \|\alpha\|_{2,m}^2. \tag{20}
\]
Thus, we have that
\[
\begin{aligned}
\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho)
&\le \mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho) + \eta \|\alpha_{\mathbf{z}}\|_{q,m}^q + \eta \|\alpha_{\mathbf{z},\lambda}\|_{q,m}^q \\
&= \{\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z}})\} + \{\mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_{\mathbf{z},\lambda})\} \\
&\quad + \bigl\{ \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z}}) + \eta \|\alpha_{\mathbf{z}}\|_{q,m}^q - \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) - \eta \|\alpha_{\mathbf{z},\lambda}\|_{q,m}^q \bigr\} + 2\eta \|\alpha_{\mathbf{z},\lambda}\|_{q,m}^q + \mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \\
&\le \{\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z}})\} + \{\mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_{\mathbf{z},\lambda})\} + 2\eta \|\alpha_{\mathbf{z},\lambda}\|_{2,m}^q + \mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho).
\end{aligned} \tag{21}
\]
The second inequality holds by the definition of $f_{\mathbf{z}}$ and $f_{\mathbf{z},\lambda} \in \mathcal{H}_{K,\mathbf{x}}$. Let $g_f(z) = (f(x) - y)^2 - (f_\rho(x) - y)^2$ for $f \in \mathcal{H}_1$. For any $f, \widetilde f \in \mathcal{H}_1$, we have
\[
\{\mathcal{E}(f) - \mathcal{E}_{\mathbf{z}}(f)\} + \{\mathcal{E}_{\mathbf{z}}(\widetilde f) - \mathcal{E}(\widetilde f)\}
= \mathbb{E} g_f - \frac{1}{m} \sum_{i=1}^m g_f(z_i) + \frac{1}{m} \sum_{i=1}^m g_{\widetilde f}(z_i) - \mathbb{E} g_{\widetilde f}. \tag{22}
\]

The following concentration inequality is a special case of Lemma 3 in [13]; here we take $\alpha = 1$.

Lemma 4. Let $\mathcal{F}$ be a class of measurable functions on $Z$. Suppose that there exist constants $b, c > 0$ such that $\|f\|_\infty \le b$ and $\mathbb{E} f^2 \le c\, \mathbb{E} f$ for every $f \in \mathcal{F}$. Moreover, there are some $a > 0$ and $p \in (0, 2)$ such that
\[
\log \mathcal{N}_2(\mathcal{F}, \epsilon) \le a \Bigl( \frac{1}{\epsilon} \Bigr)^p, \qquad \forall \epsilon > 0. \tag{23}
\]
Then, for any $0 < \delta < 1$, with confidence $1 - \delta/2$, there holds, for any $f \in \mathcal{F}$,
\[
\mathbb{E} f - \frac{1}{m} \sum_{i=1}^m f(z_i)
\le \frac{1}{2} \mathbb{E} f + c_p' \bigl(\max\{c, b\}\bigr)^{(2-p)/(2+p)} \Bigl( \frac{a}{m} \Bigr)^{2/(2+p)} + (2c + 18b) \frac{\log(4/\delta)}{m}, \tag{24}
\]
where $c_p'$ is a constant depending only on $p$.
For any $r > 0$, the function set $\mathcal{F}_r$ is defined by $\mathcal{F}_r = \{g_f : f \in B_r\}$. Now we verify that the conditions in Lemma 4 hold. Firstly, $\|g_f\|_\infty \le (3M + \kappa r)^2$ for any $g_f \in \mathcal{F}_r$, and
\[
\mathbb{E} g_f^2 \le (3M + \kappa r)^2\, \mathbb{E} g_f. \tag{25}
\]
Secondly, for any $g_{f_1}, g_{f_2} \in \mathcal{F}_r$,
\[
\bigl\| g_{f_1} - g_{f_2} \bigr\|_\infty \le (\kappa r + 3M)\, \bigl\| f_1 - f_2 \bigr\|_\infty. \tag{26}
\]
Therefore, we have that, for any $\epsilon > 0$,
\[
\log \mathcal{N}_2(\mathcal{F}_r, \epsilon)
\le \log \mathcal{N}_2\Bigl( B_r, \frac{\epsilon}{\kappa r + 3M} \Bigr)
\le \log \mathcal{N}_2\Bigl( B_1, \frac{\epsilon}{r(\kappa r + 3M)} \Bigr)
\le c_p \Bigl( \frac{r(\kappa r + 3M)}{\epsilon} \Bigr)^p. \tag{27}
\]

For any vector $\alpha = (a_1, a_2, \ldots, a_m)$, there holds
\[
\sum_{i=1}^m |a_i| = \|\alpha\|_{1,m} \le \Bigl( \sum_{i=1}^m |a_i|^q \Bigr)^{1/q} \times m^{1-1/q} = \|\alpha\|_{q,m}. \tag{28}
\]
Denote
\[
\mathcal{W}(r) = \bigl\{ \mathbf{z} \in Z^m : \|\alpha_{\mathbf{z}}\|_{q,m} \le r,\ \|\alpha_{\mathbf{z},\lambda}\|_{q,m} \le r \bigr\}. \tag{29}
\]
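A quick numerical sanity check (not from the paper) of the norm comparison (28): with the definition (9), $\|\alpha\|_{q,m}$ equals $m$ times the $q$-th power mean of $(|\alpha_i|)_i$ and is therefore nondecreasing in $q$, so $\|\alpha\|_{1,m} \le \|\alpha\|_{q,m} \le \|\alpha\|_{2,m}$; the random test vector is an arbitrary assumption.

```python
import numpy as np

def norm_qm(a, q):
    # ||alpha||_{q,m} = ( m^{q-1} * sum_i |a_i|^q )^{1/q}, as in (9)
    m = len(a)
    return (m ** (q - 1) * np.sum(np.abs(a) ** q)) ** (1.0 / q)

rng = np.random.default_rng(4)
a = rng.standard_normal(30)

for q in [1.0, 1.3, 1.7, 2.0]:
    print(f"q={q:.1f}:  ||a||_{{q,m}} = {norm_qm(a, q):.4f}")
# The printed values are nondecreasing in q, so ||a||_{1,m} <= ||a||_{q,m} <= ||a||_{2,m}.
```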
For any $\mathbf{z} \in \mathcal{W}(r)$, there holds $f_{\mathbf{z}}, f_{\mathbf{z},\lambda} \in B_r$. Applying (22) and Lemma 4 with $b = c = (3M + \kappa r)^2$ and $a = c_p \bigl(r(\kappa r + 3M)\bigr)^p$, it follows that there is a subset $\widetilde V_r$ of $Z^m$ with measure at most $\delta/2$ such that
\[
\begin{aligned}
\{\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z}})\} + \{\mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_{\mathbf{z},\lambda})\}
&\le \frac{1}{2} \bigl\{ \mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho) + \mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \bigr\} \\
&\quad + 2 \Bigl(1 + \frac{1}{\kappa}\Bigr) c_p'\, c_p^{2/(2+p)} (\kappa r + 3M)^2 \Bigl(\frac{1}{m}\Bigr)^{2/(2+p)} \\
&\quad + 40 (\kappa r + 3M)^2 \frac{\log(4/\delta)}{m}, \qquad \forall \mathbf{z} \in \mathcal{W}(r) \setminus \widetilde V_r.
\end{aligned} \tag{30}
\]
Plugging this estimate into (21) yields that
\[
\begin{aligned}
\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho) + \eta \|\alpha_{\mathbf{z}}\|_{q,m}^q + \eta \|\alpha_{\mathbf{z},\lambda}\|_{q,m}^q
&\le 4 \Bigl(1 + \frac{1}{\kappa}\Bigr) (\kappa + 3)^2 c_p'\, c_p^{2/(2+p)} \max\{r^2, M^2\} \Bigl(\frac{1}{m}\Bigr)^{2/(2+p)} \\
&\quad + 3 \bigl\{ \mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \bigr\} + 80 (\kappa + 3)^2 \max\{r^2, M^2\} \frac{\log(4/\delta)}{m} \\
&\quad + 3\eta \|\alpha_{\mathbf{z},\lambda}\|_{2,m}^q.
\end{aligned} \tag{31}
\]

The following excess generalization error of $f_{\mathbf{z},\lambda}$ was proved in [15].

Proposition 5. Let $f_{\mathbf{z},\lambda}$ be given by (20) and $K \in C^s(X \times X)$ with some $s > 0$. Suppose that $L_{\widetilde K}^{-\beta/2} f_\rho \in L^2_{\rho_X}(X)$ for some $\beta > 0$ and $0 < \lambda \le 1$. Then, for any $\delta \in (0, 1)$, taking $\lambda = m^{-\gamma}$ with $0 < \gamma < 2/(2+p)$, where $p$ is defined by (17), with probability $1 - \delta/2$ there holds
\[
\begin{aligned}
\mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) = \bigl\| f_{\mathbf{z},\lambda} - f_\rho \bigr\|_{L^2_{\rho_X}}^2
&\le \widetilde C_{K,p,\gamma} \Bigl( \log\Bigl( \frac{20\nu(\gamma,p)}{\delta} \Bigr) \Bigr)^{\nu(\gamma,p)} m^{-\min\{\beta,1\}\gamma}, \\
\|\alpha_{\mathbf{z},\lambda}\|_{2,m}
&\le \widetilde C_\gamma \Bigl( \log\Bigl( \frac{20\nu(\gamma,p)}{\delta} \Bigr) \Bigr)^{\nu(\gamma,p)/2} m^{\max\{(1-\beta)\gamma/2,\, 0\}},
\end{aligned} \tag{32}
\]
where $\nu(\gamma,p) = (4 - \gamma(2+p))/(2 - \gamma(2+p))$ and $\widetilde C_{K,p,\gamma}$, $\widetilde C_\gamma$ are constants independent of $\delta$ and $m$.

Combining this estimate with (31) leads to Proposition 6.

Proposition 6. Let $K \in C^s(X \times X)$ with some $s > 0$. Assume that $L_{\widetilde K}^{-\beta/2} f_\rho \in L^2_{\rho_X}(X)$ for some $2 \ge \beta > 0$. Taking $\lambda = m^{-\gamma}$ with $0 < \gamma < 2/(2+p)$ and $0 < \eta, \delta < 1$, for any $r > 0$ there is a subset $V_r$ of $Z^m$ with measure at most $\delta$ such that, for any $\mathbf{z} \in \mathcal{W}(r) \setminus V_r$,
\[
\begin{aligned}
\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho) + \eta \|\alpha_{\mathbf{z}}\|_{q,m}^q + \eta \|\alpha_{\mathbf{z},\lambda}\|_{q,m}^q
&\le \widetilde a \Bigl( m^{-2/(2+p)} + m^{-1} \log\frac{4}{\delta} \Bigr) \max\{r^2, M^2\} \\
&\quad + \widetilde b \Bigl( \log\Bigl( \frac{20\nu(\gamma,p)}{\delta} \Bigr) \Bigr)^{\nu(\gamma,p)} \bigl( \eta\, m^{\max\{(1-\beta)q\gamma/2,\, 0\}} + m^{-\min\{\beta,1\}\gamma} \bigr),
\end{aligned} \tag{33}
\]
where $\widetilde a$ and $\widetilde b$ are constants independent of $\delta$, $m$, and $\eta$.

4. Refined Error Bound by Iteration

In this section, we apply an iteration technique to deduce an error bound. Proposition 6 ensures that, for $\mathbf{z} \in \mathcal{W}(r) \setminus V_r$,
\[
\begin{aligned}
\max\bigl\{ \|\alpha_{\mathbf{z}}\|_{q,m}, \|\alpha_{\mathbf{z},\lambda}\|_{q,m} \bigr\}
&\le \widetilde a^{1/q} \Bigl( \eta^{-1/q} m^{-2/((2+p)q)} + \eta^{-1/q} m^{-1/q} \Bigl(\log\frac{4}{\delta}\Bigr)^{1/q} \Bigr) \max\bigl\{ r^{2/q}, M^{2/q} \bigr\} \\
&\quad + \widetilde b^{1/q} \Bigl( \log\Bigl( \frac{20\nu(\gamma,p)}{\delta} \Bigr) \Bigr)^{\nu(\gamma,p)/q} \bigl( m^{\max\{(1-\beta)\gamma/2,\, 0\}} + \eta^{-1/q} m^{-\min\{\beta,1\}\gamma/q} \bigr).
\end{aligned} \tag{34}
\]
We take $\eta = m^{-t}$ with $0 < t \le q\gamma/2 < q/(2+p)$, which yields that
\[
\begin{aligned}
\max\bigl\{ \|\alpha_{\mathbf{z}}\|_{q,m}, \|\alpha_{\mathbf{z},\lambda}\|_{q,m} \bigr\}
&\le 2 \Bigl( \widetilde a \log\frac{4}{\delta} \Bigr)^{1/q} m^{t/q - 2/((2+p)q)} \max\bigl\{ r^{2/q}, M^{2/q} \bigr\} \\
&\quad + \max\Bigl\{ 2\widetilde b^{1/q} \Bigl( \log\Bigl( \frac{20\nu(\gamma,p)}{\delta} \Bigr) \Bigr)^{\nu(\gamma,p)/q} m^{(\gamma/2)(1-\min\{\beta,1\})},\ M \Bigr\} \\
&=: \frac{1}{2} a_{m,\eta} \max\bigl\{ r^{2/q}, M^{2/q} \bigr\} + b_{m,\eta}.
\end{aligned} \tag{35}
\]
It follows that, for any $r > 0$,
\[
\mathcal{W}(r) \subset \mathcal{W}\bigl( \max\bigl\{ a_{m,\eta} \max\{r^{2/q}, M^{2/q}\},\ b_{m,\eta} \bigr\} \bigr) \cup V_r. \tag{36}
\]
Let $r_j = \max\{ a_{m,\eta} \max\{ r_{j-1}^{2/q}, M^{2/q} \},\ b_{m,\eta} \}$ and $r_0 = \max\{M, M^{2/q}\}\, m^{t/q}$. Hence,
\[
Z^m = \mathcal{W}(r_0) \subset \mathcal{W}(r_1) \cup V_{r_0} \subset \cdots \subset \mathcal{W}(r_J) \cup \Bigl( \bigcup_{j=0}^{J-1} V_{r_j} \Bigr). \tag{37}
\]
For $0 < \eta \le 1$, we have that $r_0 \ge M$ and $r_j \ge b_{m,\eta} \ge M$; thus, $r_j = \max\{ a_{m,\eta} r_{j-1}^{2/q},\ b_{m,\eta} \}$. A simple computation (see [13]) gives that
\[
r_J \le \max\Bigl\{ \widetilde a_{m,\eta}^{(2/q)^J - 1} r_0^{(2/q)^J},\ b_{m,\eta} \bigl( \widetilde a_{m,\eta} b_{m,\eta} \bigr)^{(2/q)^{J-1} - 1} \Bigr\}
\le \mu(\gamma, p, \delta) \max\Bigl\{ \bigl(\widetilde c_{\kappa,p,M}\bigr)^{(2/q)^J},\ \bigl(\widetilde d_{\kappa,p,M}\bigr)^{(2/q)^J} \Bigr\} m^{\widetilde\theta}, \tag{38}
\]
where $\widetilde a_{m,\eta} = a_{m,\eta}^{q/(2-q)}$,
\[
\mu(\gamma, p, \delta) = \Bigl( \log\frac{20\nu(\gamma,p)}{\delta} \Bigr)^{\nu(\gamma,p)/q} \Bigl( \log\frac{4}{\delta} \Bigr)^{1/(2-q)}, \tag{39}
\]
and $\widetilde c_{\kappa,p,M}$ and $\widetilde d_{\kappa,p,M}$ are both constants independent of $m$, $\eta$, and $\delta$. By taking $\gamma$, $t$ such that $0 < \gamma < 2/(2+p)$ and $0 < t \le q\gamma/2$, we can deduce
\[
\widetilde\theta = \frac{\gamma}{2} \bigl( 1 - \min\{\beta, 1\} \bigr) \tag{40}
\]
by choosing $J$ to be
\[
J = \Biggl[ \frac{\ln\Bigl( \dfrac{2/(2+p) - t - \bigl((2-q)\gamma/2\bigr)\bigl(1 - \min\{\beta,1\}\bigr)}{2/(2+p) - t - (2-q)t/q} \Bigr)}{\ln(2/q)} \Biggr] + 1. \tag{41}
\]

Proof of Theorem 2. By taking $\gamma = 2t/q$ and replacing $\delta$ by $\delta/(J+1)$ in Proposition 6 and the above discussion, with confidence $1 - J\delta/(J+1)$ for some $0 < \delta < 1$, the estimate (38) assures that
\[
\max\bigl\{ \|\alpha_{\mathbf{z}}\|_{q,m}, \|\alpha_{\mathbf{z},\lambda}\|_{q,m} \bigr\} \le r_J \le \widetilde c_1\, \mu\Bigl( \gamma, p, \frac{\delta}{J+1} \Bigr) m^{\widetilde\theta}, \tag{42}
\]
where $\widetilde c_1$ is a constant independent of $m$, and $\mu$, $\widetilde\theta$, and $J$ are defined by (39), (40), and (41). Proposition 6 ensures that, with confidence $1 - \delta/(J+1)$,
\[
\begin{aligned}
\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho)
&\le \widetilde a \Bigl( m^{-2/(2+p)} + m^{-1} \log\frac{4(J+1)}{\delta} \Bigr) \max\bigl\{ r_J^2, M^2 \bigr\} \\
&\quad + \widetilde b \Bigl( \log\Bigl( \frac{20(J+1)\nu(\gamma,p)}{\delta} \Bigr) \Bigr)^{\nu(\gamma,p)} \bigl( \eta\, m^{\max\{(1-\beta)q\gamma/2,\, 0\}} + m^{-\min\{\beta,1\}\gamma} \bigr). \tag{43}
\end{aligned}
\]
Combining the above two inequalities completes the proof.
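As a purely numerical illustration of the iteration behind (36)–(38) (not from the paper), the sketch below runs the recursion $r_j = \max\{ a_{m,\eta} r_{j-1}^{2/q}, b_{m,\eta} \}$ for made-up values of $a_{m,\eta}$, $b_{m,\eta}$, and $r_0$; with these values the first branch falls below $b_{m,\eta}$ within a few steps and the radius bound stabilizes, which is the effect that the choice of $J$ in (41) quantifies.

```python
def iterate_radius(a, b, r0, q, steps=10):
    """Run the recursion r_j = max(a * r_{j-1}**(2/q), b) used in the iteration argument."""
    r = r0
    history = [r]
    for _ in range(steps):
        r = max(a * r ** (2.0 / q), b)
        history.append(r)
    return history

# Illustrative (made-up) values standing in for a_{m,eta}, b_{m,eta}, r_0 at a fixed sample size.
for q in [1.0, 1.5, 2.0]:
    hist = iterate_radius(a=0.01, b=1.5, r0=20.0, q=q, steps=6)
    print(f"q={q:.1f}:", [round(r, 3) for r in hist])
```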
Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

Zhongfeng Qu is supported by the Shandong Provincial Scientific Research Foundation for Excellent Young Scientists (no. BS2013SF003). Hongwei Sun is supported by the Natural Science Foundation of Shandong Province, China (no. ZR2014AM010).
References

[1] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, 2007.
[2] N. Aronszajn, "Theory of reproducing kernels," Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
[3] Q. Wu, Y. M. Ying, and D.-X. Zhou, "Learning rates of least-square regularized regression," Foundations of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.
[4] Y.-L. Xu and D.-R. Chen, "Learning rates of regularized regression for exponentially strongly mixing sequence," Journal of Statistical Planning and Inference, vol. 138, no. 7, pp. 2180–2189, 2008.
[5] O. Bousquet and A. Elisseeff, "Stability and generalization," Journal of Machine Learning Research, vol. 2, no. 3, pp. 499–526, 2002.
[6] S. Smale and D.-X. Zhou, "Learning theory estimates via integral operators and their approximations," Constructive Approximation, vol. 26, no. 2, pp. 153–172, 2007.
[7] H. Sun and Q. Wu, "Regularized least square regression with dependent samples," Advances in Computational Mathematics, vol. 32, no. 2, pp. 175–189, 2010.
[8] T. Zhang, "Leave-one-out bounds for kernel methods," Neural Computation, vol. 15, no. 6, pp. 1397–1437, 2003.
[9] Q. Wu and D.-X. Zhou, "Learning with sample dependent hypothesis spaces," Computers & Mathematics with Applications, vol. 56, no. 11, pp. 2896–2907, 2008.
[10] H. Sun and Q. Wu, "Coefficient regularization in least square kernel regression," Applied and Computational Harmonic Analysis, vol. 30, pp. 96–109, 2011.
[11] H. Sun and Q. Wu, "Indefinite kernel network with dependent sampling," Analysis and Applications, vol. 11, no. 5, Article ID 1350020, 2013.
[12] Q. W. Xiao and D. X. Zhou, "Learning by nonsymmetric kernels with data dependent spaces and l1-regularizer," Taiwanese Journal of Mathematics, vol. 14, no. 5, pp. 1821–1836, 2010.
[13] L. Shi, Y.-L. Feng, and D.-X. Zhou, "Concentration estimates for learning with l1-regularizer and data dependent hypothesis spaces," Applied and Computational Harmonic Analysis, vol. 31, no. 2, pp. 286–302, 2011.
[14] H. Sun and Q. Wu, "Sparse representation in kernel machines," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2576–2582, 2015.
[15] L. Shi, "Learning theory estimates for coefficient-based regularized regression," Applied and Computational Harmonic Analysis, vol. 34, no. 2, pp. 252–265, 2013.