Appl. Math. J. Chinese Univ. 2014, 29(2): 151-161

Approximating and learning by Lipschitz kernel on the sphere

CAO Fei-long, WANG Chang-miao

Abstract. This paper investigates some approximation properties and learning rates of the Lipschitz kernel on the sphere. A convergence rate for approximation by shifts of the Lipschitz kernel on the sphere that is faster than $O(n^{-1/2})$ is obtained, where $n$ is the number of parameters needed in the approximation. By means of this approximation, a learning rate of the regularized least squares algorithm with the Lipschitz kernel on the sphere is also deduced.

§1

Introduction

Many applications require the modelling and analysis of signals whose domain is the surface of a sphere. Examples include the study of seismic signals, gravitational phenomena, the hydrogen atom, the solar corona, and medical imaging of the brain. Furthermore, the first-order radial derivative of the spherical Abel-Poisson kernel played a key role in determining the gravitational potential on the surface of the earth by the well-known high-low GPS satellite-to-satellite tracking (see [10], [12]). Meanwhile, in modelling satellite gravity gradiometry, the second-order radial derivative of the spherical Abel-Poisson kernel has been applied as an important tool as well (see [11], [12]). Since the spherical Abel-Poisson kernel is a classical Lipschitz kernel, it is natural to study the properties of spherical Lipschitz kernels further. On the other hand, in the past decades, learning theory has become a popular research subject, and the study of the generalization properties of learning algorithms has been an important issue. One of the basic and significant characteristics in this study is the generalization performance, on which much work has been done (see, e.g., [7], [19], [24], [21], [2], [8], [5], and [6]). There have also been some investigations of learning theory on manifolds (see [4], [3], and [25]). As a special example of a compact Riemannian manifold, the sphere has good properties such as symmetry and homogeneity, which allow us to obtain complete and explicit results in many cases (see [18], [13], and [15]).

Recently, Minh [16] studied the learning rate of the regularized least squares regression algorithm with the spherical polynomial kernel or the Gaussian kernel when the regression function belongs to the reproducing kernel Hilbert spaces associated with these kernels. Li [14] used the spherical polynomial kernel $(1+x\cdot y)^n$ and deduced the same learning rate when the regression function belongs to a more general space, by using the well-known spherical de la Vallée-Poisson operator. The main novelty of this paper lies in approximation and learning with the spherical Lipschitz kernel. In contrast to linear approximation on the sphere, we deduce, by means of a nonlinear approximation method on the sphere, a convergence rate for approximation by shifts of the Lipschitz kernel that is faster than $O(n^{-1/2})$, where $n$ is the number of parameters needed in the approximation. Using this approximation, we also deduce a learning rate of the regularized least squares algorithm with the Lipschitz kernel on the sphere.

The paper is organized as follows. Section 2 introduces the background of learning theory and some basic facts about the sphere. Section 3 gives the approximation capacity of the Lipschitz kernel on the sphere and the main results on the regularization error. Section 4 estimates the sample error and gives the learning rate analysis. Finally, Section 5 concludes the paper.

Received: 2011-08-19. MR Subject Classification: 41A25, 41A05, 41A63. Keywords: Approximation, learning rate, Lipschitz kernel, sphere. Digital Object Identifier (DOI): 10.1007/s11766-014-2912-0. Supported by the National Natural Science Foundation of China (61272023, 91330118).

§2

Preliminaries

Let the input space $X$ be the unit sphere $\mathbb S^d$ embedded in the $(d+1)$-dimensional Euclidean space $\mathbb R^{d+1}$, and let the output space be $Y=\mathbb R$. Let $\rho$ be a probability distribution on $Z=X\times Y$, which admits the decomposition $\rho(x,y)=\rho_X(x)\rho(y|x)$. The error for a function $f:X\to Y$ is defined as
$$\mathcal E(f)=\int_Z(f(x)-y)^2\,d\rho,$$
which is minimized by the regression function (see [9], [21], [8]), defined by
$$f_\rho(x)=\int_Y y\,d\rho(y|x),\quad x\in X,$$
where $\rho(\cdot|x)$ is the conditional probability measure at $x$ induced by $\rho$. Usually, the regression function cannot be constructed exactly, since we are given only a finite, possibly small, set of random examples from $Z$. Let $L^2_{\rho_X}$ be the Hilbert space of square integrable functions on $X$, with norm denoted by $\|\cdot\|_\rho$. Because of the least squares nature of the problem, the error is measured in the weighted $L^2$ metric of $L^2_{\rho_X}$, defined as
$$\|f\|_{L^2_{\rho_X}}=\|f\|_\rho=\Big(\int_X|f(x)|^2\,d\rho_X\Big)^{1/2},$$
where $\rho_X$ is the marginal distribution of $\rho$ on $X$. With the assumption that $f_\rho\in L^2_{\rho_X}$, one can see from [22] that
$$\|f-f_\rho\|_\rho^2=\mathcal E(f)-\mathcal E(f_\rho). \tag{1}$$
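Identity (1) can be checked numerically. The following Python sketch is only an illustration under hypothetical assumptions: a toy distribution on the circle $\mathbb S^1$ with uniform $\rho_X$, regression function $f_\rho(x)=x_1$, additive noise uniform on $[-0.5,0.5]$, and an arbitrary competitor $f(x)=0.5+x_2$. Both Monte Carlo estimates should be close to the exact value $\|f-f_\rho\|_\rho^2=0.25+0.5+0.5=1.25$.

```python
import math
import random


def excess_error_vs_distance(num_samples=200_000, seed=0):
    """Monte Carlo estimates of E(f) - E(f_rho) and ||f - f_rho||_rho^2.

    Hypothetical toy model: x = (cos t, sin t) with t uniform on [0, 2*pi),
    so rho_X is the uniform measure on S^1, and y = f_rho(x) + noise with
    noise uniform on [-0.5, 0.5] (mean zero, independent of x).
    """
    rng = random.Random(seed)

    def f_rho(x):  # regression function E[y | x] of the toy model
        return x[0]

    def f(x):  # an arbitrary competitor function
        return 0.5 + x[1]

    err_f = err_rho = dist_sq = 0.0
    for _ in range(num_samples):
        t = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(t), math.sin(t))
        y = f_rho(x) + rng.uniform(-0.5, 0.5)
        err_f += (f(x) - y) ** 2
        err_rho += (f_rho(x) - y) ** 2
        dist_sq += (f(x) - f_rho(x)) ** 2
    return (err_f - err_rho) / num_samples, dist_sq / num_samples


excess, dist_sq = excess_error_vs_distance()
print(f"E(f) - E(f_rho) ~ {excess:.4f}   ||f - f_rho||_rho^2 ~ {dist_sq:.4f}")
```

The two estimates differ only by the empirical mean of $-2(f-f_\rho)(y-f_\rho)$, which tends to zero because the noise has conditional mean zero; this is exactly why $f_\rho$ minimizes $\mathcal E$.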

The target of the regression problem is to learn the regression function, or to find a good approximation of it, from random samples. The regularized least squares algorithm for the regression problem is a discrete least squares problem associated with a Mercer kernel. Let $K:X\times X\to\mathbb R$ be continuous, symmetric, and positive semidefinite, i.e., for any finite set of distinct points $\{x_1,x_2,\dots,x_m\}\subset X$, the matrix $(K(x_i,x_j))_{i,j=1}^m$ is positive semidefinite. Such a kernel is called a Mercer kernel. The reproducing kernel Hilbert space (RKHS) $H_K$ associated with $K$ is defined (see [1]) to be the closure of the linear span of the set of functions $\{K_x=K(x,\cdot):x\in X\}$ with the inner product $\langle\cdot,\cdot\rangle_K$ satisfying $\langle K_x,K_y\rangle_K=K(x,y)$ and $\langle K_x,f\rangle_K=f(x)$ for all $x\in X$, $f\in H_K$. Let $C(X)$ be the space of continuous functions on $X$ with the norm $\|\cdot\|_\infty$. By the reproducing property and the Cauchy-Schwarz inequality, it is easy to prove that
$$\|f\|_\infty\le\kappa\|f\|_K,\quad\forall f\in H_K,$$
where $\kappa=\sup_{x\in X}\sqrt{K(x,x)}$. As the probability measure $\rho$ is unknown, neither $f_\rho$ nor $\mathcal E(f)$ is computable. All we have in hand are the samples $z=\{(x_i,y_i)\}_{i=1}^m$. In learning theory, one approximates $f_\rho$ using the empirical error $\mathcal E_z$ with respect to the sample $z$:
$$\mathcal E_z(f)=\frac1m\sum_{i=1}^m(f(x_i)-y_i)^2.$$
Hence, the regularized least squares algorithm in $H_K$ is defined as
$$f_{z,\lambda}=\arg\min_{f\in H_K}\Big\{\frac1m\sum_{i=1}^m(f(x_i)-y_i)^2+\lambda\|f\|_K^2\Big\}. \tag{2}$$
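To make (2) concrete: restricting to $f=\sum_i a_iK_{x_i}$ (the form of the minimizer recalled in the next paragraph), the objective becomes $\frac1m\|\mathbf K a-y\|^2+\lambda a^\top\mathbf K a$ with Gram matrix $\mathbf K=(K(x_i,x_j))_{i,j=1}^m$, and it is minimized by the solution of $(\mathbf K+m\lambda I)a=y$. The Python sketch below solves this system under hypothetical choices: the kernel $\phi(t)=e^t$ on $\mathbb S^2$, toy data $y=x_3+\text{noise}$, and $\lambda=10^{-3}$. The helper names (`fit_rls`, `predict`, ...) are illustrative only, and plain Gaussian elimination keeps the example dependency-free.

```python
import math
import random


def phi(t):
    # Hypothetical kernel profile phi(t) = exp(t). On the sphere
    # exp(x . y) = e * exp(-|x - y|^2 / 2), a rescaled Gaussian, so it is a
    # Mercer kernel; it is also Lipschitz on [-1, 1] (alpha = 1, L = e).
    return math.exp(t)


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def random_sphere_point(rng, dim=3):
    # A normalised Gaussian vector is uniformly distributed on S^{dim-1}.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    r = math.sqrt(dot(v, v))
    return [c / r for c in v]


def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_rls(xs, ys, lam):
    """Coefficients a of f_{z,lam} = sum_i a_i phi(x_i . x), solving (K + m*lam*I) a = y."""
    m = len(xs)
    system = [[phi(dot(xi, xj)) + (m * lam if i == j else 0.0) for j, xj in enumerate(xs)]
              for i, xi in enumerate(xs)]
    return solve(system, ys)


def predict(coef, xs, x):
    return sum(a * phi(dot(xi, x)) for a, xi in zip(coef, xs))


rng = random.Random(1)
xs = [random_sphere_point(rng) for _ in range(40)]
ys = [x[2] + rng.uniform(-0.1, 0.1) for x in xs]  # toy data: f_rho(x) = x_3 plus noise
lam = 1e-3
coef = fit_rls(xs, ys, lam)
```

The fitted coefficients also illustrate the bound $\|f\|_\infty\le\kappa\|f\|_K$: here $\|f_{z,\lambda}\|_K^2=a^\top\mathbf K a$ and $\kappa=\sqrt{\phi(1)}=\sqrt e$.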

Here $\lambda>0$ is a constant called the regularization parameter. It usually depends on the sample size $m$, i.e., $\lambda=\lambda(m)$, and it must satisfy $\lim_{m\to\infty}\lambda(m)=0$. According to [7], there exists a unique $f_{z,\lambda}$ in $H_K$ satisfying (2), and it has the form
$$f_{z,\lambda}=\sum_{i=1}^m a_iK_{x_i},$$
where $a_1,a_2,\dots,a_m\in\mathbb R$ are suitable coefficients.

Definition 2.1. A function $f$ defined on $T\subset\mathbb R$ is called a Lipschitz function if
$$|f(t)-f(s)|\le L|t-s|^\alpha,\quad\alpha\in(0,1],\ t,s\in T, \tag{3}$$
where $L$ is a positive constant.

Next, we give the definition of a Lipschitz kernel. This paper focuses on approximation and learning on the unit sphere in $\mathbb R^{d+1}$, defined by

$$\mathbb S^d=\big\{x=(x_1,x_2,\dots,x_{d+1})\in\mathbb R^{d+1}: x_1^2+x_2^2+\cdots+x_{d+1}^2=1\big\}.$$
Since a Mercer kernel $\phi(x,y)$ is usually defined on $X\times X$ ($X\subset\mathbb R^n$), on the sphere we consider kernels of the form $\phi(x,y)=\phi(x\cdot y)$. We call such a kernel a Lipschitz kernel if the univariate function $\phi$ satisfies the Lipschitz property (3) on $[-1,1]$. For a Lipschitz Mercer kernel $\phi$, let $H_\phi$ be the associated RKHS, defined to be the closure of the linear span of the set of functions $\{\phi_x:x\in X\}$, $\phi_x(y)=\phi(x\cdot y)$, with the inner product $\langle\cdot,\cdot\rangle_\phi$ and norm $\|\cdot\|_\phi$. In this article, the hypothesis space for the regularized least squares algorithm is the RKHS $H_\phi$ associated with a Lipschitz Mercer kernel $\phi$.

Definition 2.2. The regularization error for a regularizing function $\tilde f_\lambda\in H_\phi$ is defined as
$$D(\tilde f_\lambda)=\mathcal E(\tilde f_\lambda)-\mathcal E(f_\rho)+\lambda\|\tilde f_\lambda\|_\phi^2.$$

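It is called the regularization error of the scheme (2) when $\tilde f_\lambda=f_\lambda$, where
$$f_\lambda=\arg\min_{f\in H_\phi}\big\{\mathcal E(f)-\mathcal E(f_\rho)+\lambda\|f\|_\phi^2\big\},$$
i.e.,
$$D(f_\lambda)=\mathcal E(f_\lambda)-\mathcal E(f_\rho)+\lambda\|f_\lambda\|_\phi^2.$$
The two terms on the right-hand side of the above equation need not be balanced: $\mathcal E(f_\lambda)-\mathcal E(f_\rho)$ may decay faster than $\lambda\|f_\lambda\|_\phi^2$ does. Taking $\tilde f_\lambda=g_n$, where $g_n\in H_\phi$ is constructed in Section 3, we devote our attention to estimating the approximation error
$$D(g_n)=\mathcal E(g_n)-\mathcal E(f_\rho)+\lambda\|g_n\|_\phi^2$$
for the regression function $f_\rho$. By (1), for any measurable function $f:\mathbb S^d\to\mathbb R$ we have $\mathcal E(f)-\mathcal E(f_\rho)=\|f-f_\rho\|_\rho^2$; in particular, $\mathcal E(f_\lambda)-\mathcal E(f_\rho)=\|f_\lambda-f_\rho\|_\rho^2$.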
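Both kernel properties used in this section can be probed numerically. The Python sketch below takes, as a hypothetical example, the Abel-Poisson kernel on $\mathbb S^2$ mentioned in the introduction, $\phi(t)=\frac{1-h^2}{4\pi(1+h^2-2ht)^{3/2}}$, with the assumed parameter $h=0.5$. It estimates the constant $L$ in (3) for $\alpha=1$ from difference quotients on a grid, compares it with the exact value $\max_{[-1,1]}\phi'=\frac{3h(1+h)}{4\pi(1-h)^4}$, and checks the Mercer property on random points by a Cholesky factorization of the Gram matrix. A grid estimate only bounds $L$ from below, and a successful factorization at finitely many points is evidence for, not a proof of, positive definiteness.

```python
import math
import random


def abel_poisson(t, h=0.5):
    """Abel-Poisson kernel on S^2 as a function of t = x . y, for 0 < h < 1."""
    return (1.0 - h * h) / (4.0 * math.pi * (1.0 + h * h - 2.0 * h * t) ** 1.5)


def holder_constant_estimate(phi, alpha, grid_size=401):
    """max |phi(t) - phi(s)| / |t - s|^alpha over all pairs of a uniform grid on [-1, 1].

    This is a lower estimate of the smallest admissible L in (3).
    """
    ts = [-1.0 + 2.0 * k / (grid_size - 1) for k in range(grid_size)]
    vals = [phi(t) for t in ts]
    best = 0.0
    for i in range(grid_size):
        for j in range(i + 1, grid_size):
            best = max(best, abs(vals[i] - vals[j]) / (ts[j] - ts[i]) ** alpha)
    return best


def cholesky_pivots(a):
    """Pivots of the Cholesky factorisation of a symmetric matrix.

    All pivots are positive exactly when the matrix is (numerically) positive
    definite; the factorisation stops at the first non-positive pivot.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    pivots = []
    for j in range(n):
        d = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        pivots.append(d)
        if d <= 0.0:
            break
        low[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            low[i][j] = (a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))) / low[j][j]
    return pivots


def random_sphere_point(rng):
    # A normalised Gaussian vector is uniformly distributed on S^2.
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in v))
    return [c / r for c in v]


h = 0.5
exact_lipschitz = 3.0 * h * (1.0 + h) / (4.0 * math.pi * (1.0 - h) ** 4)  # phi'(1), the max of phi'
estimate = holder_constant_estimate(abel_poisson, alpha=1.0)

rng = random.Random(2)
points = [random_sphere_point(rng) for _ in range(25)]
gram = [[abel_poisson(sum(a * b for a, b in zip(x, y))) for y in points] for x in points]
pivots = cholesky_pivots(gram)
print(f"L estimate {estimate:.4f} (exact {exact_lipschitz:.4f}); min pivot {min(pivots):.3e}")
```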

§3

Approximation capacity of spherical Lipschitz kernel

In this section, we give the error analysis for the regularization error $D(g_n)$ by constructing a function $g_n$ in terms of the Lipschitz kernel on the sphere to approximate the regression function.

Theorem 3.1. Suppose that $\phi$ is a Lipschitz kernel, and that for any $x\in\mathbb S^d$
$$f(x)=\int_{\mathbb S^d}h(z)\phi(x\cdot z)\,d\omega(z)$$
with $h\in L^2(\mathbb S^d)$. Then there exists
$$g_n(x)=\sum_{j=1}^n c_j\phi(x\cdot x_j)\quad\text{with}\quad\sum_{j=1}^n|c_j|\le\|h\|_{L^1(\mathbb S^d)},\ x_j\in\mathbb S^d, \tag{4}$$
such that
$$\|f-g_n\|_{L^2(\mathbb S^d)}\le Cn^{-\frac12-\frac\alpha d}. \tag{5}$$
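The randomized construction used in the proof below can be simulated on the circle $\mathbb S^1$ ($d=1$). In this Python sketch the kernel profile $\phi(t)=e^t$ (so $\alpha=1$) and the density $h(\psi)=1+0.5\cos3\psi$ are hypothetical choices; $h>0$, so each $\mu_j=h/c_j$ is a probability density on its arc. The arcs $S_j$ have length $2\pi/n$, the $c_j$ are computed exactly, each $t_j$ is drawn from $\mu_j$ by rejection sampling, and the root-mean-square $L^2$ error over 20 draws is reported. For $d=1$ and $\alpha=1$, (5) predicts decay like $n^{-3/2}$.

```python
import math
import random

ALPHA, DIM = 1.0, 1  # phi(t) = exp(t) is Lipschitz with alpha = 1; we work on S^1 (d = 1)


def phi(t):
    return math.exp(t)  # hypothetical Lipschitz kernel profile


def h(psi):
    return 1.0 + 0.5 * math.cos(3.0 * psi)  # hypothetical density, positive so each mu_j >= 0


H_MAX = 1.5  # maximum of h, used for rejection sampling


def h_integral(a, b):
    """Exact integral of h over the arc of angles [a, b]."""
    return (b - a) + (0.5 / 3.0) * (math.sin(3.0 * b) - math.sin(3.0 * a))


def f_target(theta, quad=512):
    """f(x) = int_{S^1} h(psi) phi(x . t) dpsi with x . t = cos(theta - psi), trapezoidal rule."""
    step = 2.0 * math.pi / quad
    return step * sum(h(k * step) * phi(math.cos(theta - k * step)) for k in range(quad))


def construct_gn(n, rng):
    """Randomised construction of the proof: c_j = int_{S_j} h and t_j drawn from mu_j = h / c_j."""
    width = 2.0 * math.pi / n  # arcs with |S_j| = 2*pi/n and diam(S_j) <= 2*pi/n, cf. (6) with d = 1
    coeffs, centres = [], []
    for j in range(n):
        a, b = j * width, (j + 1) * width
        coeffs.append(h_integral(a, b))
        while True:  # rejection sampling from the density proportional to h on [a, b]
            psi = rng.uniform(a, b)
            if rng.uniform(0.0, H_MAX) <= h(psi):
                centres.append(psi)
                break
    return coeffs, centres


def l2_error(coeffs, centres, thetas, f_vals):
    """||f - g_n||_{L^2(S^1)} by the trapezoidal rule on the grid thetas."""
    step = 2.0 * math.pi / len(thetas)
    total = 0.0
    for theta, fv in zip(thetas, f_vals):
        gv = sum(c * phi(math.cos(theta - psi)) for c, psi in zip(coeffs, centres))
        total += (fv - gv) ** 2
    return math.sqrt(step * total)


thetas = [2.0 * math.pi * k / 256 for k in range(256)]
f_vals = [f_target(t) for t in thetas]
rng = random.Random(0)
rms = {}
for n in (8, 16, 32, 64):
    errors = [l2_error(*construct_gn(n, rng), thetas, f_vals) for _ in range(20)]
    rms[n] = math.sqrt(sum(e * e for e in errors) / len(errors))
    print(f"n = {n:2d}   rms L2 error = {rms[n]:.3e}")
print("rate predicted by (5):", -(0.5 + ALPHA / DIM))
```

Theorem 3.1 only asserts that some choice of the $t_j$ attains (5); averaging over random draws mirrors the averaging argument in the proof rather than searching for an optimal $g_n$.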

Proof. It is possible to find bounded measurable sets $S_j$ such that
$$\mathbb S^d=\bigcup_{j=1}^n S_j,\quad S_j\cap S_i=\emptyset\ (i\ne j),\quad |S_j|=O\Big(\frac1n\Big),\quad \operatorname{diam}(S_j)=O\big(n^{-\frac1d}\big), \tag{6}$$
where $|S_j|$ and $\operatorname{diam}(S_j)$ denote the measure and the diameter of $S_j$, respectively. We now define the coefficients
$$c_j=\int_{S_j}h(t)\,d\omega(t)$$
and
$$\mu_j(t)=\begin{cases}\dfrac{1}{c_j}h(t), & t\in S_j,\ c_j\ne0,\\[4pt] 0, & t\in\mathbb S^d\setminus S_j,\ c_j\ne0,\\[4pt] 0, & c_j=0.\end{cases}$$
Then, for $c_j\ne0$,
$$\int_{\mathbb S^d}\mu_j(t)\,d\omega(t)=\int_{S_j}\mu_j(t)\,d\omega(t)=1, \tag{7}$$
and
$$\sum_{j=1}^n c_j\mu_j(t)=h(t). \tag{8}$$

For $\mathbf t=(t_1,\dots,t_n)\in(\mathbb S^d)^n$ write
$$g(x;\mathbf t)=\sum_{j=1}^n c_j\phi(x\cdot t_j),\qquad d\mu(\mathbf t)=\mu_1(t_1)\cdots\mu_n(t_n)\,d\omega(t_1)\cdots d\omega(t_n).$$
In order to prove (5), it is sufficient to prove that
$$\int_{\mathbb S^d}\cdots\int_{\mathbb S^d}\big\|f-g(\cdot\,;\mathbf t)\big\|_{L^2(\mathbb S^d)}^2\,d\mu(\mathbf t)\le C^2n^{-1-\frac{2\alpha}{d}}. \tag{9}$$
Indeed, if $\|f-g(\cdot\,;\mathbf t)\|_{L^2(\mathbb S^d)}^2>C^2n^{-1-\frac{2\alpha}{d}}$ held for all $\{t_j\}_{j=1}^n\subset\mathbb S^d$, then it would follow from (7) that
$$\int_{\mathbb S^d}\cdots\int_{\mathbb S^d}\big\|f-g(\cdot\,;\mathbf t)\big\|_{L^2(\mathbb S^d)}^2\,d\mu(\mathbf t)>\int_{\mathbb S^d}\cdots\int_{\mathbb S^d}C^2n^{-1-\frac{2\alpha}{d}}\,d\mu(\mathbf t)=C^2n^{-1-\frac{2\alpha}{d}},$$
which contradicts (9).

Now we use (7) and (8) to prove (9). By Fubini's theorem and (7),
$$\begin{aligned}
\int\!\!\cdots\!\!\int\big\|f-g(\cdot\,;\mathbf t)\big\|_{L^2(\mathbb S^d)}^2d\mu(\mathbf t)
&=\int_{\mathbb S^d}\Big\{f(x)^2-2f(x)\sum_{j=1}^n c_j\int_{\mathbb S^d}\phi(x\cdot t)\mu_j(t)\,d\omega(t)+\int\!\!\cdots\!\!\int g(x;\mathbf t)^2\,d\mu(\mathbf t)\Big\}d\omega(x)\\
&=\int_{\mathbb S^d}\Big(f(x)-\sum_{j=1}^n c_j\int_{\mathbb S^d}\phi(x\cdot t)\mu_j(t)\,d\omega(t)\Big)^2d\omega(x)\\
&\quad+\int_{\mathbb S^d}\Big\{\int\!\!\cdots\!\!\int g(x;\mathbf t)^2\,d\mu(\mathbf t)-\Big(\sum_{j=1}^n c_j\int_{\mathbb S^d}\phi(x\cdot t)\mu_j(t)\,d\omega(t)\Big)^2\Big\}d\omega(x).
\end{aligned}$$
By (8), $\sum_{j=1}^n c_j\int_{\mathbb S^d}\phi(x\cdot t)\mu_j(t)\,d\omega(t)=\int_{\mathbb S^d}h(t)\phi(x\cdot t)\,d\omega(t)=f(x)$, so the first integral vanishes. Expanding $g(x;\mathbf t)^2=\sum_{j}c_j^2\phi(x\cdot t_j)^2+\sum_{i\ne j}c_ic_j\phi(x\cdot t_i)\phi(x\cdot t_j)$ and using (7), the cross terms with $i\ne j$ cancel in the second integral. Therefore,
$$\begin{aligned}
\int\!\!\cdots\!\!\int\big\|f-g(\cdot\,;\mathbf t)\big\|_{L^2(\mathbb S^d)}^2d\mu(\mathbf t)
&=\sum_{j=1}^n c_j^2\int_{\mathbb S^d}\Big[\int_{S_j}\phi(x\cdot t)^2\mu_j(t)\,d\omega(t)-\Big(\int_{S_j}\phi(x\cdot s)\mu_j(s)\,d\omega(s)\Big)^2\Big]d\omega(x)\\
&=\sum_{j=1}^n c_j^2\int_{\mathbb S^d}\int_{S_j}\mu_j(t)\Big(\phi(x\cdot t)-\int_{S_j}\mu_j(s)\phi(x\cdot s)\,d\omega(s)\Big)^2d\omega(t)\,d\omega(x)\\
&=\sum_{j=1}^n c_j^2\int_{\mathbb S^d}\int_{S_j}\mu_j(t)\Big(\int_{S_j}\mu_j(s)\big(\phi(x\cdot t)-\phi(x\cdot s)\big)\,d\omega(s)\Big)^2d\omega(t)\,d\omega(x).
\end{aligned}$$
It is easy to deduce that
$$\sum_{j=1}^n|c_j|\le\sum_{j=1}^n\int_{S_j}|h(t)|\,d\omega(t)=\|h\|_{L^1(\mathbb S^d)}$$
and, by the Cauchy-Schwarz inequality and (6),
$$\sum_{j=1}^n c_j^2=\sum_{j=1}^n\Big(\int_{S_j}h(t)\,d\omega(t)\Big)^2\le\sum_{j=1}^n|S_j|\int_{S_j}h^2(t)\,d\omega(t)\le\frac Cn\|h\|_{L^2(\mathbb S^d)}^2.$$
Since $\mathbb S^d$ is compact, it follows from (3) and (6) that for arbitrary $x\in\mathbb S^d$ and $t,s\in S_j$,
$$|\phi(x\cdot s)-\phi(x\cdot t)|\le L|x\cdot(s-t)|^\alpha\le L|s-t|^\alpha\le Cn^{-\frac\alpha d},$$
and hence, using (7) once more,
$$\int\!\!\cdots\!\!\int\big\|f-g(\cdot\,;\mathbf t)\big\|_{L^2(\mathbb S^d)}^2d\mu(\mathbf t)\le C\int_{\mathbb S^d}\sum_{j=1}^n c_j^2\Big\{\int_{S_j}\mu_j(t)\,d\omega(t)\Big(\int_{S_j}\mu_j(s)\,d\omega(s)\Big)^2n^{-\frac{2\alpha}{d}}\Big\}d\omega(x)=C|\mathbb S^d|\,n^{-\frac{2\alpha}{d}}\sum_{j=1}^n c_j^2\le Cn^{-1-\frac{2\alpha}{d}}.$$

This completes the proof of Theorem 3.1.

The following lemma estimates the RKHS norm of $g_n$.

Lemma 3.1. Let $g_n$ be defined as in (4), and let $\phi$ be a Lipschitz Mercer kernel on the sphere. Then $\|g_n\|_\phi^2\le C\|h\|_{L^1(\mathbb S^d)}^2$.

Proof. For $g_n(x)=\sum_{i=1}^n c_i\phi(x\cdot x_i)$ there holds
$$\|g_n\|_\phi^2=\Big\langle\sum_{i=1}^n c_i\phi_{x_i},\sum_{j=1}^n c_j\phi_{x_j}\Big\rangle_\phi=\sum_{i=1}^n c_i\sum_{j=1}^n c_j\phi(x_i\cdot x_j).$$
Since $|\phi(t)-\phi(s)|\le L|t-s|^\alpha$ for $t,s\in[-1,1]$ with $\alpha\in(0,1]$, $\phi$ is continuous on $[-1,1]$ and hence bounded: $|\phi(t)|\le|\phi(1)|+2^\alpha L=:C$ for $t\in[-1,1]$. Hence, we get from (4) and the above expansion that
$$\|g_n\|_\phi^2\le C\Big(\sum_{j=1}^n|c_j|\Big)^2\le C\|h\|_{L^1(\mathbb S^d)}^2.$$
This proves Lemma 3.1.

In the following, we give another main result of this section.

Theorem 3.2. Suppose that $\phi$ is a Lipschitz Mercer kernel, and that for any $x\in\mathbb S^d$ the regression function satisfies
$$f_\rho(x)=\int_{\mathbb S^d}h(z)\phi(x\cdot z)\,d\omega(z)$$
with $h\in L^2(\mathbb S^d)$. Then there exist positive constants $C_1$ and $C_2$ such that
$$D(g_n)\le C_1n^{-1-\frac{2\alpha}{d}}+C_2\lambda\|h\|_{L^1(\mathbb S^d)}^2. \tag{10}$$

Proof. Since
$$D(g_n)=\mathcal E(g_n)-\mathcal E(f_\rho)+\lambda\|g_n\|_\phi^2=\|g_n-f_\rho\|_\rho^2+\lambda\|g_n\|_\phi^2,$$
we get from Theorem 3.1 (applied with $f=f_\rho$, using $\|\cdot\|_\rho\le C\|\cdot\|_{L^2(\mathbb S^d)}$, which holds, e.g., when $\rho_X$ has a bounded density with respect to $\omega$) and Lemma 3.1 that
$$D(g_n)\le C_1n^{-1-\frac{2\alpha}{d}}+C_2\lambda\|h\|_{L^1(\mathbb S^d)}^2.$$
The proof of Theorem 3.2 is finished.
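Lemma 3.1 and Theorem 3.2 rest on two elementary computations: the RKHS norm of $g_n$ is the quadratic form $\sum_{i,j}c_ic_j\phi(x_i\cdot x_j)$, bounded by $\sup|\phi|\,(\sum_j|c_j|)^2$, and the bound (10) trades $n^{-1-2\alpha/d}$ off against $\lambda$. The Python sketch below evaluates both with a hypothetical kernel $\phi(t)=e^t$ on $\mathbb S^2$, random hypothetical coefficients and centres, and placeholder constants $C_1=C_2=1$ (the paper does not specify these constants).

```python
import math
import random


def phi(t):
    return math.exp(t)  # hypothetical Lipschitz Mercer kernel profile; sup over [-1, 1] of |phi| is e


def random_sphere_point(rng):
    # A normalised Gaussian vector is uniformly distributed on S^2.
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in v))
    return [c / r for c in v]


def rkhs_norm_sq(coeffs, centres):
    """||g_n||_phi^2 = sum_{i,j} c_i c_j phi(x_i . x_j) for g_n(x) = sum_j c_j phi(x . x_j)."""
    return sum(ci * cj * phi(sum(a * b for a, b in zip(xi, xj)))
               for ci, xi in zip(coeffs, centres)
               for cj, xj in zip(coeffs, centres))


def lemma_31_bound(coeffs, sup_phi=math.e):
    """sup|phi| * (sum_j |c_j|)^2, which is at most sup|phi| * ||h||_{L^1}^2 by (4)."""
    return sup_phi * sum(abs(c) for c in coeffs) ** 2


def theorem_32_rhs(n, lam, h_l1, alpha, d, c1=1.0, c2=1.0):
    """Right-hand side of (10); c1 and c2 are placeholder constants, not values from the paper."""
    return c1 * n ** (-1.0 - 2.0 * alpha / d) + c2 * lam * h_l1 ** 2


rng = random.Random(4)
centres = [random_sphere_point(rng) for _ in range(30)]
coeffs = [rng.uniform(-1.0, 1.0) / 30 for _ in range(30)]
norm_sq = rkhs_norm_sq(coeffs, centres)
print(f"||g_n||^2 = {norm_sq:.4f} <= bound {lemma_31_bound(coeffs):.4f}")
print([theorem_32_rhs(n, 1e-3, 1.0, alpha=1.0, d=2) for n in (10, 100, 1000)])
```

For fixed $\lambda$, increasing $n$ eventually stops helping, since the $\lambda\|h\|_{L^1}^2$ term dominates; this is why $n$ and $\lambda$ are chosen together in Section 4.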

§4

Learning Rates

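In this section, the excess error is bounded by means of integral operator techniques and standard arguments in learning theory. We need the following lemma to prove our main result.

Lemma 4.1. Let $z$ be randomly drawn according to $\rho$ satisfying $|y|\le M$ almost surely. Then, for any $0<\delta<1$, with confidence $1-\delta$ there holds
$$\|f_{z,\lambda}-f_\lambda\|_\rho\le\frac{12\kappa M\log(4/\delta)}{\sqrt{m\lambda}},$$
provided that
$$\lambda\ge\frac{8\kappa^2\log(4/\delta)}{\sqrt m}.$$
The proof of Lemma 4.1 can be found in [20].

Theorem 4.1. Let $f_{z,\lambda}$ be defined by (2). Suppose that $\phi$ is a Lipschitz Mercer kernel and that the assumptions of Lemma 4.1 and Theorem 3.2 hold. Then, for $\lambda=\frac{8\kappa^2\log(4/\delta)}{\sqrt m}$, with confidence at least $1-\delta$ the inequality
$$\|f_{z,\lambda}-f_\rho\|_\rho\le C_3(\log(4/\delta))^{\frac12}m^{-\frac14}$$
holds.

Proof. We can write the excess error as
$$\|f_{z,\lambda}-f_\rho\|_\rho\le\|f_{z,\lambda}-f_\lambda\|_\rho+\|f_\lambda-f_\rho\|_\rho=:S+D.$$
Next, we bound $S$ and $D$, respectively. Since $f_\lambda$ minimizes $\mathcal E(f)-\mathcal E(f_\rho)+\lambda\|f\|_\phi^2$ over $H_\phi$ and $g_n\in H_\phi$,
$$D=\|f_\lambda-f_\rho\|_\rho=\big(\|f_\lambda-f_\rho\|_\rho^2\big)^{\frac12}\le\big(\|f_\lambda-f_\rho\|_\rho^2+\lambda\|f_\lambda\|_\phi^2\big)^{\frac12}\le\big(\|g_n-f_\rho\|_\rho^2+\lambda\|g_n\|_\phi^2\big)^{\frac12}.$$
From Theorem 3.2 we have
$$D\le\Big(C_1n^{-1-\frac{2\alpha}{d}}+C_2\lambda\|h\|_{L^1(\mathbb S^d)}^2\Big)^{\frac12}.$$
From Lemma 4.1 and $\lambda=\frac{8\kappa^2\log(4/\delta)}{\sqrt m}$, we know that
$$S\le C_4(\log(4/\delta))^{\frac12}m^{-\frac14}.$$
Hence,
$$\|f_{z,\lambda}-f_\rho\|_\rho\le C_4(\log(4/\delta))^{\frac12}m^{-\frac14}+\Big(C_1n^{-1-\frac{2\alpha}{d}}+C_2\lambda\|h\|_{L^1(\mathbb S^d)}^2\Big)^{\frac12}.$$
Letting $n=m^{\frac{d}{2d+4\alpha}}$, so that $n^{-1-\frac{2\alpha}{d}}=m^{-\frac12}$, and $\lambda=\frac{8\kappa^2\log(4/\delta)}{\sqrt m}$, we have
$$\|f_{z,\lambda}-f_\rho\|_\rho\le C_3(\log(4/\delta))^{\frac12}m^{-\frac14}.$$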

The proof of Theorem 4.1 is completed.
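The parameter choice at the end of the proof of Theorem 4.1 can be checked directly. This Python sketch verifies with exact rational arithmetic that $n=m^{d/(2d+4\alpha)}$ turns $n^{-1-2\alpha/d}$ into $m^{-1/2}$, and that with $\lambda=8\kappa^2\log(4/\delta)/\sqrt m$ (which meets the condition of Lemma 4.1 with equality) the sample error bound of Lemma 4.1 equals $\frac{12M}{\sqrt8}(\log(4/\delta))^{1/2}m^{-1/4}$. The values of $\kappa$, $M$, and $\delta$ are hypothetical, and $n$ is left real-valued (in practice it would be rounded to an integer).

```python
import math
from fractions import Fraction


def theorem_41_parameters(m, d, alpha, kappa, delta):
    """n = m^{d/(2d + 4 alpha)} and lambda = 8 kappa^2 log(4/delta) / sqrt(m), as in the proof."""
    n = m ** (d / (2 * d + 4 * alpha))
    lam = 8.0 * kappa ** 2 * math.log(4.0 / delta) / math.sqrt(m)
    return n, lam


def approximation_exponent(d, alpha):
    """Exact exponent e with n^{-1 - 2 alpha / d} = m^e when n = m^{d/(2d + 4 alpha)}."""
    d, alpha = Fraction(d), Fraction(alpha)
    return -(1 + 2 * alpha / d) * d / (2 * d + 4 * alpha)


def sample_error_bound(m, lam, kappa, big_m, delta):
    """Right-hand side of Lemma 4.1: 12 kappa M log(4/delta) / sqrt(m * lambda)."""
    return 12.0 * kappa * big_m * math.log(4.0 / delta) / math.sqrt(m * lam)


kappa, big_m, delta = 1.0, 1.0, 0.05  # hypothetical values for illustration
for m in (10**2, 10**4, 10**6):
    n, lam = theorem_41_parameters(m, d=2, alpha=0.5, kappa=kappa, delta=delta)
    s = sample_error_bound(m, lam, kappa, big_m, delta)
    print(f"m = {m:>7}: n ~ {n:.1f}, lambda = {lam:.4f}, S bound = {s:.4f}, "
          f"S * m^(1/4) = {s * m ** 0.25:.4f}")
```

The last column stays constant in $m$, which is the $m^{-1/4}$ behaviour of $S$; the same scaling of $D$ follows because both $n^{-1-2\alpha/d}$ and $\lambda$ are of order $m^{-1/2}$.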

§5

Conclusions

We have studied approximation by shifts of the Lipschitz kernel on the sphere $\mathbb S^d$ for a class of spherical functions in the $L^2$ metric, and obtained the approximation rate $O(n^{-1/2-\alpha/d})$, where $n$ is the number of parameters needed in the approximation, $\alpha$ is the Lipschitz exponent, and $d$ is the dimension of the sphere. Furthermore, using this approximation, we have studied the regularized least squares algorithm in the RKHS associated with the Lipschitz kernel on the sphere, and derived the learning rate $O\big((\log(4/\delta))^{1/2}m^{-1/4}\big)$ by means of integral operator techniques and standard arguments in learning theory.

References
[1] N Aronszajn. Theory of reproducing kernels, Trans Amer Math Soc, 1950, 68: 337-404.
[2] P L Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans Inform Theory, 1998, 44: 525-536.
[3] M Belkin, P Niyogi, V Sindhwani. Manifold regularization: A geometric framework for learning from examples, University of Chicago Computer Science Technical Report TR-2004-06, 2004.
[4] M Belkin, P Niyogi. Semi-supervised learning on Riemannian manifolds, Mach Learn, Special Issue on Clustering, 2004, 56: 209-239.
[5] O Bousquet, A Elisseeff. Stability and generalization, J Mach Learn Res, 2002, 2: 499-526.
[6] D R Chen, Q Wu, Y M Ying, D X Zhou. Support vector machine soft margin classifiers: error analysis, J Mach Learn Res, 2004, 5: 1143-1175.
[7] F Cucker, S Smale. On the mathematical foundations of learning theory, Bull Amer Math Soc, 2001, 39: 1-49.
[8] F Cucker, S Smale. Best choices for regularization parameters in learning theory: On the bias-variance problem, Found Comput Math, 2002, 2: 413-428.
[9] L Devroye, L Györfi, G Lugosi. A Probabilistic Theory of Pattern Recognition, Springer-Verlag, Berlin, Heidelberg, New York, 1996.
[10] W Freeden, T Gervens, M Schreiner. Constructive Approximation on the Sphere, Clarendon Press, Oxford, 1998.
[11] W Freeden, S Pereverzev. Spherical Tikhonov regularization wavelets in satellite gravity gradiometry with random noise, J Geod, 2001, 74: 730-736.
[12] W Freeden, V Michel, H Nutz. Satellite-to-satellite tracking and satellite gravity gradiometry (Advanced techniques for high-resolution geopotential field determination), J Engrg Math, 2002, 43: 19-56.
[13] S Hubbert, T M Morton. A Duchon framework for the sphere, J Approx Theory, 2004, 129: 28-57.
[14] L Q Li. Regularized least square regression with spherical polynomial kernels, Int J Wavelets Multiresolut Inf Process, 2009, 7: 781-801.
[15] H Q Minh. Reproducing kernel Hilbert spaces in learning theory, PhD Thesis in Mathematics, Brown University, 2006.
[16] H Q Minh. Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory, Constr Approx, 2010, 32: 307-338.
[17] C Müller. Spherical Harmonics, Lecture Notes in Mathematics, Vol 17, Springer, Berlin, 1966.
[18] F J Narcowich, X P Sun, J D Ward, H Wendland. Direct and inverse Sobolev error estimates for scattered data interpolation via spherical basis functions, Found Comput Math, 2007, 7: 369-390.
[19] S Smale, D X Zhou. Shannon sampling and function reconstruction from point values, Bull Amer Math Soc, 2004, 41: 279-305.
[20] S Smale, D X Zhou. Learning theory estimates via integral operators and their approximations, Constr Approx, 2007, 26: 153-172.
[21] V Vapnik. Statistical Learning Theory, John Wiley & Sons, 1998.
[22] Q Wu, Y Ying, D X Zhou. Learning rates of least-square regularized regression, Found Comput Math, 2006, 6: 171-192.
[23] K Y Wang, L Q Li. Harmonic Analysis and Approximation on the Unit Sphere, Science Press, Beijing, 2006.
[24] Q Wu, D X Zhou. SVM soft margin classifiers: Linear programming versus quadratic programming, Neural Comput, 2005, 17: 1160-1187.
[25] G B Ye, D X Zhou. Learning and approximation by Gaussians on Riemannian manifolds, Adv Comput Math, 2008, 29: 291-310.

Department of Mathematics, China Jiliang University, Hangzhou 310018, China.