Nov 12, 1993 - DEPARTMENT OF STATISTICS ... statistical applications by ..... kernel, do not expect the software to yield a nice and good looking solution in a ...
DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706
TECHNICAL REPORT NO.908 November 12, 1993
Computing a family of reproducing kernels for statistical applications
by
Christine Thomas-Agnan
Computing a family of reproducing kernels for statistical applications Christine Thomas-Agnan November 12, 1993 Abstract
For an open subset of R, an integer m, and a positive real parameter , the R R Sobolev spaces H m ( ) equipped with the norms: kuk2 = u(t)2dt + 21m u(m) (t)2dt constitute a family of reproducing kernel Hilbert spaces. When is an open interval of the real line, we describe the computation of their reproducing kernels. We derive explicit formulas for these kernels for all values of m in the case of the whole real line, and for m = 1 and m = 2 in the case of a bounded open interval. j
Keywords:
Sobolev spaces, Reproducing kernels, Hilbert spaces.
1 Introduction 1.1 Motivation
It is often the case that statisticians or applied probabilists encounter the theory of reproducing kernel Hilbert spaces (RKHS) in their studies (see for example Weinert(1982)). Each speci c application usually requires the use of an adapted RKHS. Some of them are very well documented in the literature, but we have come across cases where the expression of the kernel function of the space was not available, as for example in Delecroix, Simioni, ThomasAgnan (1992) for the nonparametric estimation of functions under shape restrictions. We believe that the method of derivation as well as the actual expression of these kernels may be of interest to other authors. The rest of this introduction is devoted to the description of the spaces of interest. In section 2, we establish that, in the case of the whole real line, the kernels are Fourier transforms of simple functions and can be obtained through easy contour integration. In section 3, we turn to the case of a bounded open interval. We rst establish the relationship between the kernel on the interval (a; b) associated with and the kernel on the interval (0; 1) associated with another value of this parameter. Then, to determine the kernel on (0; 1), we derive a system of dierential equations of order 2m with initial conditions, and give its solutions for m = 1 and m = 2. The appendix contains a collection of intermediate steps in these computations. 0
This research was partly supported by NSF Grant DMS-9002566
2
1.2 The spaces and their norms Let denote an open subset of jR, and H m( ) denote the classical real Sobolev space, i.e. the set of elements u of L2 ( ) whose weak derivatives u(k) in the sense of generalized functions belong to L2( ) for any integer k between 1 and m. For classical properties of these spaces, we refer the reader to Adams(1975). We just recall here that each element u of H m ( ) is equal almost everywhere to a unique absolutely continuous function on with ordinary derivatives absolutely continuous up to order m ? 1, and last derivative de ned almost everywhere in the ordinary sense and square integrable. We will identify elements of H m ( ) with this unique representor. The classical norms for these spaces are:
kuk2 =
m Z X j =0 x2
(u(j)(t))2dt
(1)
For the particular application we had in mind in Delecroix, Simioni, Thomas-Agnan (1993), the following norms seemed more appropriate:
kuk2
=
Z
u(t)2dt +
1
Z
(m) 2 2m u (t) dt
(2)
They are simpler to interpret as a weighted sum of the L2 norms of u and its last derivative the parameter regulating the balance.The norms de ned by (2) are topologically equivalent to the ones de ned by (1), by virtue of the Sobolev inequalities (see Agmon (1965)), which can be applied to the case of the real line or the case of a bounded open interval. Let us also recall a few facts about reproducing kernel Hilbert space theory. Let Dkx, the derivative functional of order k at x, be de ned by:
u(m),
8x 2 ; 8u 2 H m( ); Dkx(u) = u(k)(x)
(3)
A reproducing kernel Hilbert space is a Hilbert space in which the evaluation functionals D0x are continuous functionals for all x 2 . Let dkx in H m( ) denote the representors of the functionals Dkx, in the sense that:
8x 2 ; 8u 2 H m ( ); Dkx(u) = hu; dkxi
(4)
These representors exist by the Riesz representation theorem as soon as Dkx is continuous. The function K (x; y) = d0x(y) is known as the kernel of the space, and the reader is referred to Aronszajn(1950) for more extensive properties. It is known that in the Sobolev spaces H m ( ), the functionals Dkx are well de ned and continuous if and only if m ? k > 21 (see Wahba and Wendelberger(1980)). The computation of dkx from K is straightforward. Our (a;b) the aim in sections 2 and 3 is the computation of K for H m( ). Let us denote by Km; 1 the kernel function of H m ( ) when kernel function of H m ( ) when = (a; b), and Km;
= jR. 3
2 Case of the whole real line When = jR, the space H m( ) falls into the family of Beppo-Levi spaces described in Thomas-Agnan (1991). It follows from the results of this paper that the kernel is translation 1 (x; t) = K 1 (t ? x), and invariant, which can be writen with a slight abuse of notation:Km; m; is given by: 1 1 (! ) = (5) F Km; 1 + (2 ! )2m
where F denotes the Fourier transform (as de ned in Thomas-Agnan(1991)).Even though the proof can be found in this reference, it is interesting to outline it here in this very simple 1 satis es, for all u 2 H m ( ): example. By de nition, Km; Z 1 Z 1 m 1 (t; x) + 1 (m)(t) @ K 1 (t; x) = u(x) u(t)Km; u (6) 2m ?1 @tm m; ?1 Using the Parseval identity in these two integrals, and the Fourier inversion formula in the 1 (x; t) is solution to the right hand side, one easily concludes that the function fx(t) = Km; following equation: 2m (7) F fx(!) + ( 2! ) F fx(!) = exp(?2i!x) This rst shows that F fx(!) = exp(?2i!x)F f0(!), and therefore that the kernel is translation invariant, and that F f0 is given by (5). 1 can From formula (5) and the properties of Fourier transform, one concludes that Km; 1 by: be expressed in terms of Km; 1 1 (t) = K 1 (t) Km; (8) m;1 1 is a kernel which is familiar to the nonparametric statisticians since it is the \ asympKm; 1 totically equivalent" kernel to smoothing splines of order m. The theory for this equivalence can be found in Silverman(1984)as well as the analytic expression of this kernel for = 1, and m = 1 or 2. After searching the classical mathematical tables, we were unable to nd the formula for this Fourier transform for general m, and were lead to compute it by contour integration. The result is stated in the following proposition, followed by a short description of this contour integration. Proposition 1 1 (t) = Km; 1
mX ?1 exp (? j t j ei 2m +k m ? 2 )
(2m?1)(i 2m +i k m) k=0 2me R 1 exp(2i!x) ?1 1+(2!)2m d! , for x 0, integrate on
(9)
Proof: to compute the integral the boundary of the upper half disc of the complex plane fj z j R; Im(z) 0g, and let R tend to 1. The poles on the upper half plane are 21 exp(i 2m + i km ) for k = 0; ::m ? 1. The integral is then equal to the product of 2i by the sum of the residues of the integrand at these poles,which yields (9). The most frequent cases of application which are m = 1; 2, and 3, are given explicitly in the next corollary and gures 1 and 2 display the graphs of some of these kernels. Low values of correspond to atter kernels. 4
m=2 3
2.5
2
1.5
1
0.5
t 0 -3
-2
-1
0
1
2
3
1 for m = 2 and = 3; 6; 9 Figure 1: Km;
3
m=3
2.5
2
1.5
1
0.5
t 0 -3
-2
-1
0
1
2
3
1 for m = 3 and = 3; 6; 9 Figure 2: Km;
Corollary 1
K11;1(t) = 21 e?jtj
p
t K21;1(t) = 12 e? 2 sin(j t j 22 + 4 ) p t 1 ?j t j ? 1 K3;1(t) = 6 fe + 2e 2 sin(j t j 23 + 6 )g j j p
j j
5
3 Case of a bounded open interval
3.1 Reducing the problem to the unit interval
(0;1) must satisfy, for all u 2 H m (0; 1): By de nition (4), the kernel Km; Z
1 0
Z 1 m (m)(t) @ K (0;1)(t; x) = u(x) (0;1)(t; x)dt + 1 u u(t)Km; 2m @tm m;
0
(10)
After the change of variables t = sb??aa , for s 2 (a; b), and letting v(s) = u( sb??aa ) and y = (a;b): a + (b ? a)x, equation (10) yields for all v 2 Km; (0;1)( s ? a ; y ? a ) 1 ds v(s)Km; + b ?Za b ? a b ? a a 1 b(b ? a)mv(m)(s) @ m K (0;1)( s ? a ; y ? a ) 1 ds = v(y) (11) 2m a @sm m; b ? a b ? a b ? a (a;b) that: It is then easy to conclude by de nition of Km; Z
b
Proposition 2
where = (b ? a).
(a;b)(s; y ) = 1 K (0;1)( s ? a ; y ? a ) Km; b ? a m; b ? a b ? a
(12)
(a;b) to K (0;1) and therefore it is enough to compute the kernels on the This formula relates Km; m; unit interval.
3.2 Computation of Km;;
(0 1)
In the bounded interval case, the kernels are not translation invariant, and it is convenient to introduce the right and left kernels as follows: (0;1)(x; t) = L (t) Km; x = Rx(t)
for for
tx tx
(13)
Because of the fundamental symmetryproperty of reproducing kernels (see Aronszjan(1950)), it is enough to know the left kernel since: for t x; Rx(t) = Lt(x)
(14)
Then let us come back to equation (10) and perform m consecutive integration by parts in the second integral. For any function g with 2m derivatives in an open subinterval (; ), we can write: Z 2m?1 (m) (m) (t)dt = (?1)m u(t)g (2m)(t)dt + X (?1)k+m [u(2m?1?k) g (k)] u ( t ) g k=m
Z
6
Assuming the left and right kernels do have this extra smoothness property, which will be checked a posteriori, we can apply this identity to g = Lx on (; ) = (0; x) and to g = Rx on (; ) = (x; 1), and plug it back to equation (10) to get: Z
+
x
[Lx + (? 21)m L(2x m)](t)dt +
0 2m ?1 X
k=m
m
Z
1
x
[Rx + (? 21)m R(2x m)](t)dt m
(?1)k+m f[u(2m?1?k)L(xk)]x0 + [u(2m?1?k)R(xk)]1xg = u(x)
(15)
Identifying the terms with the right hand side of (10), one gets the following result: Proposition 3 The left and right kernels Lx and Rx are entirely determined by the following
system of dierential equations:
m ( ? 1) Lx + 2m L(2x m) = 0 m Rx + (? 21)m R(2x m) = 0
(16)
with the boundary conditions:
L(xk)(0) = 0 R(xk)(1) = 0
for for
k = m; ::2m ? 1 k = m; ::2m ? 1
(17)
and:
L(xk)(x) ? R(xk)(x) = 0; for k = 0; ::2m ? 2 m ? = (?1) 1 2m; for k = 2m ? 1
(18)
From equations (16), one concludes that Lx and Rx can be expressed as follows:
Lx(t) = Rx(t) =
m X j =1 m X j =1
lj exp( j t) cos(j t) + lj+m exp( j t) sin(j t) rj exp( j t) cos(j t) + rj+m exp( j t) sin(j t)
(19)
where j = sin( 2m + jm ) and j = cos( 2m + jm ). The 4m unknowns lj ; rj ; j = 1; ::2m are entirely determined by the system of 4m linear equations (17), and (18). Specializing now to the case m = 1, and combining with the result of Proposition 2, we give the solution in the next corollary and the system of 4 linear equations in the appendix. Corollary 2 The left kernel corresponding to
for t x;
) K1(a;b ; is given by:
Lx (t) = sinh (b ? a) cosh (b ? x) cosh (t ? a) 7
(20)
This kernel can be found in the case = 1 in Duc-Jacquet(1973). In the case m = 2, we introduce the following four functions in order to express the nal result: p p 2 b1(z) = exp( 2 z) cos( 22 z) p p 2 b2(z) = exp( 2 z) sin( 22 z) p p 2 b3(z) = exp(? 2 z) cos( 22 z) p p 2 b4(z) = exp(? 2 z) sin( 22 z) (21) For the same purpose, we will write the left kernel in the following form:
Lx(t) =
4 4 X X
j =1 k=1
ljk bj (t)bk (x)
(22)
where the 16 coecients ljk are given in the next corollary, and the system of 8 linear equations in the appendix. Corollary 3 The coecients ljk de ning the left kernel of K2(0;;1) through equation (22)are given by: p p p l11 = f? cos(p2 ) + sin(p2 ) + 3 exp(?p 2 ) ? 2g l12 = f? cos(p2 ) ? sin( p2 ) + exp(?p 2 )g l13 = f? cos( p2 ) + 3 sin(p 2 ) ? exp(p 2 ) + 2g l14 = f?3 cos( 2 ) ? sin( 2 ) ? exp( 2 ) + 4g l21 = l12 p p p l22 = fcos( 2p ) ? sin( 2p ) + exp(? p2 ) ? 2g l23 = f? cos(p2 ) + sin(p2 ) + exp(p2 )g l24 = f? cos( p 2 ) ? sin( 2 ) ? exp( 2 ) + 2g l31 = l13 ? 42 p l32 = l23 + 42 p p p l33 = fcos( 2p ) + sin( 2p ) ? 3 exp( p2 ) + 2g l34 = f? cos( p 2 ) + sin( 2 ) + exp( 2 )g l41 = l14 ? 42 p l42 = l24 ? 42 l43 = l34 p p p (23) l44 = f? cos( 2 ) ? sin( 2 ) ? exp( 2 ) + 2g 8
where:
=
p
p
2
p
16(sin2( 22 ) ? sinh2( 22 ))
(24)
The solution of the system of 8 linear equations was obtained with the help of the computer algebra system Maple. Nevertheless, for those who wish to compute their own kernel, do not expect the software to yield a nice and good looking solution in a few minutes. When the coecients of the linear system are, like in our case, functions of x involving products of exponential and trigonometric functions, human intervention still seems to be necessary. The reader will nd the maple code in the appendix. Figure 3 and 4 display the graphs of these kernels in the case m = 2, x = 0:25 and x = 0:5. As before, low values of correspond to atter kernels.
3
2.5
2
1.5
1
0.5
t 0 0
0.2
0.4
0.6
0.8
1
-0.5
[0;1] for m = 2, = 3; 6; 9 and x = 0:25 Figure 3: Km;
9
3
2.5
2
1.5
1
0.5
t 0 0
0.2
0.4
0.6
0.8
1
-0.5
[0;1] for m = 2 and = 3; 6; 9 and x = 0:5 Figure 4: Km;
4 Appendix
4.1 Intermediate steps for m = 1
When m = 1, the left and right kernels are linear combinations of: exp(t) and exp(?t):
Lx(t) = l1 exp(t) + l2 exp(?t) Rx(t) = r1 exp(t) + r2 exp(?t)
(25)
and the linear system de ning these coecients is:
l1 ? l2 r1 exp( ) ? r2 exp(? ) l1 exp(x) + l2 exp(?x) ? r1 exp(x) ? r2 exp(x) (l1 ? r1) exp(x) ? (l2 ? r2) exp(?x)
= = = =
0 0 0
(26)
4.2 Intermediate steps for m = 2
When m = 2, for the purpose of computing the kernel, we write the left and right kernels as follows in terms of the basis functions (21):
Lx(t) = Rx(t) =
4 X
j =1 4 X j =1
10
lj bj (t) rj bj (t)
(27)
and the linear system de ning these coecients is:
l2 ? l4 = 0 ?r1b2( ) + r2b1( ) + r3b4( ) ? r4b3( ) = 0 ?l1 + l2 + l3 + l4 = 0 r1(?b1( ) ? b2( )) + r2(b1( ) ? b2( )) + r3(b3( ) ? b4( )) + r4(b3( ) + b4( )) = 0 (l1 ? r1)b1(x) + (l2 ? r2)b2(x) + (l3 ? r3)b3(x) + (l4 ? r4)b4(x) = 0 ?(l1 ? r1)b2(x) + (l2 ? r2)b1(x) + (l3 ? r3)b4(x) ? (l4 ? r4)b3(x) = 0 (l1 ? r1)(b1(x) + b2(x)) ? (l2 ? r2)(b1(x) ? b2(x)) p ?(l3 ? r3)(b3(x) ? b4(x)) ? (l4 ? r4)(b3(x) + b4(x)) = 2 (l1 ? r1)(b1(x) ? b2(x)) + (l2 ? r2)(b1(x) + b2(x)) ?(l3 ? r3)(b3(x) + b4(x) + (l4 ? r4)(b3(x) ? b4(x)) = 0 (28) It is clear that the last four equations only involves the unknowns lj ? rj ; j = 1; ::4. Therefore one can decompose the calculation in two systems of 4 linear equations. An intermediate result is the expression of these dierences: p p p p 2 2 2 (l1 ? r1) exp( 2 x) = 4 [cos( 2 x) + sin( 22 x)] p p p p 2 2 2 (l2 ? r2) exp( 2 x) = 4 [? cos( 2 x) + sin( 22 x)] p p p p 2 2 2 (l3 ? r3) exp(? 2 x) = 4 [? cos( 2 x) + sin( 22 x)] p p p p 2 2 2 (l4 ? r4) exp(? 2 x) = 4 [? cos( 2 x) ? sin( 22 x)] (29)
4.3 Maple code
writeto(progout); #basic functions: g1:=proc(t) cos(gam*tau*t):end: g2:=proc(t) sin(gam*tau*t): end: f1:=proc(t) exp(gam*tau*t)*g1(t): end: f2:=proc(t) exp(gam*tau*t)*g2(t):end: f3:=proc(t) exp(-gam*tau*t)*g1(t):end: f4:=proc(t) exp(-gam*tau*t)*g2(t):end: 11
#coecients: wl:=array([al,bl,cl,dl]): wr:=array([ar,br,cr,dr]): #constant: gam:=sqrt(2)/2: # rst system: #matrix: A:=array([[g1(x),g2(x),g1(x),g2(x)],[-g2(x),g1(x),g2(x),-g1(x)], [g1(x)+g2(x),-g1(x)+g2(x),-g1(x)+g2(x),-g1(x)-g2(x)], [g1(x)-g2(x),g1(x)+g2(x),-g1(x)-g2(x),g1(x)-g2(x)]]): #right hand side: v:=array([0,0,tau/gam,0]): #solution: with(linalg): w:=linsolve(A,v): #second system: #matrix: B:=array([[0,1,0,-1], [-f2(1),f1(1),f4(1),-f3(1)],[-1,1,1,1], [-f1(1)-f2(1),f1(1)-f2(1),f3(1)-f4(1),f3(1)+f4(1)]]): #right hand side: exp1:=-w[1]*exp(-gam*tau*x)*f2(1)+w[2]*exp(-gam*tau*x)*f1(1) +w[3]*exp(gam*tau*x)*f4(1)-w[4]*exp(gam*tau*x)*f3(1): exp2:=w[1]*exp(-gam*tau*x)*(-f1(1)-f2(1))+w[2]*exp(-gam*tau*x)*(f1(1)-f2(1)) +w[3]*exp(gam*tau*x)*(f3(1)-f4(1))+w[4]*exp(gam*tau*x)*(f3(1)+f4(1)): vl:=array([0,exp1,0,exp2]): #solution: wl:=linsolve(B,vl): #left kernel: lx:=proc(t) wl[1]*f1(t)+wl[2]*f2(t)+wl[3]*f3(t)+wl[4]*f4(t):end; #right kernel: wr:=array([wl[1]-w[1]*exp(-gam*tau*x),wl[2]-w[2]*exp(-gam*tau*x), wl[3]-w[3]*exp(gam*tau*x),wl[4]-w[4]*exp(gam*tau*x)]): rx:=proc(t) wr[1]*f1(t)+wr[2]*f2(t)+wr[3]*f3(t)+wr[4]*f4(t):end; #checking the results: # rst equation: eqn1:=(D@@2)(lx)(0): simplify(eqn1); #equation 2: eqn2:=(D@@2)(rx)(1): simplify(eqn2); #equation 3: eqn3:=(D@@3)(lx)(0): simplify(eqn3); #equation 4: 12
eqn4:=(D@@3)(rx)(1): simplify(eqn4); #equation 5: eqn5:=lx(x)-rx(x): simplify("); #equation 6: eqn6:=(D@@2)(lx)(x)-(D@@2)(rx)(x):simplify("); #equation 7: eqn7:=-(D@@3)(lx)(x)+(D@@3)(rx)(x): simplify("); #equation 8: eqn8:=D(lx)(x)-D(rx)(x): simplify("); # nal result: deno:=det(B): simplify(deno); #coecient1: temp1:=wl[1]*deno: simplify(temp1); #coecient 2: temp2:=wl[2]*deno: simplify(temp2); #coecient 3: temp3:=wl[3]*deno: simplify(temp3); #coecient 4: temp4:=wl[4]*deno: simplify(temp4);
References [1] R. A. Adams,(1975) Sobolev spaces, [Academic press, Inc, Harcourt Brace Jovanovich publishers]. [2] S. Agmon,(1965) \Lectures on elliptic boundary value problems", [D.Van Nostrand, Princeton NJ]. [3] N. Aronszajn,(1950) \Theory of reproducing kernels", Transactions of the AMS, vol 68, pp. 3-404. [4] M. Delecroix, M. Simioni, C. Thomas-Agnan,(1992) \Functional estimation under shape constraints", technical report, vol. 92.22.269, [Universite de Toulouse I, France]. [5] M. Duc-Jacquet,(1973) \Approximation des fonctionnelles lineaires sur les espaces hilbertiens autoreproduisants", [thesis, University of Grenoble, France]. [6] B. Silverman,(1984) \Spline smoothing: the equivalent kernel method, Annals of Statistics, vol 12, pp. 898-916. [7] C. Thomas-Agnan (1991) \Spline functions and stochastic ltering", Annals of Statistics, pp..1512-1527. [8] G. Wahba, J. Wendelberger,(1980) \Some new mathematical methods for variational objective analysis using splines and cross-validation", Monthly Weather Review, pp.1122-1143. 13
[9] H. L. Weinert (1982) \Reproducing kernel Hilbert spaces: applications in statistical signal processing", [Weinert ed, Stroudsburg Pa, Hutchinson Ross Pub Co NY].
14