Kernel Estimation of Mixed Spatial Derivatives of Statistics of Scattered Data

Stephan Küchlin*, Patrick Jenny

Institute of Fluid Dynamics, Swiss Federal Institute of Technology (ETH) Zurich, Sonneggstrasse 3, 8092 Zurich, Switzerland

*Corresponding author. Email address: [email protected] (Stephan Küchlin)
Abstract

We present an approach to directly estimate spatial derivatives of statistics of scattered data, based on the concept of kernel density estimation. We derive the asymptotic bias and variance for various kernel types, as well as explicit expressions for many kernels. Further, we show that the method extends trivially to kernel interpolation. The root mean squared error performance of selected kernels with regard to several estimates is investigated in a Monte Carlo study. Since the kernel approach is completely local, it is simpler to implement than finite difference methods, especially in the context of locally refined computational grids and parallel algorithms based on domain decomposition. The generality of the method makes it suitable for many particle-based numerical simulation algorithms, as well as for data analysis tasks.
1. Introduction

The subject of non-parametric density (derivative) estimation has received much attention in the statistics and, more recently, machine intelligence literature, cf., e.g., Refs. [1, 2]. Given as data $N$ realizations $x^l$, $l = 1, \dots, N$, of a random variable with associated probability density $\varphi(x)$, one is interested in estimating $\varphi(x)$ via the kernel density estimate
$$\widehat{\varphi}(x) = \frac{1}{hN} \sum_{l=1}^{N} K\!\left(\frac{x - x^l}{h}\right),$$
see, e.g., the seminal work by Epanechnikov [3]. Here, we consider a related problem: we seek an estimate of the $\nu$-th order ($\nu = 0, 1, \dots$) mixed spatial derivative of the generalized state space moments $E_\zeta[g(\zeta; x)]$ of a joint probability density $f_{\xi\zeta}(x, c)$ at a given location $x$, i.e., $\partial_{x_1}^{\alpha_1} \partial_{x_2}^{\alpha_2} \cdots \partial_{x_{d_x}}^{\alpha_{d_x}} E_\zeta[g(\zeta; x)]$ with $\sum_{i=1}^{d_x} \alpha_i = \nu$. To this end, our aim is to derive optimal kernels and their respective asymptotic bias and mean squared error expressions for the estimation of various mixed spatial derivatives of the state space moments. The suggested approach may prove useful for particle-based simulation methods ranging from molecular dynamics, direct simulation Monte Carlo and related rarefied gas dynamics simulation algorithms to astrophysics simulations, as well as for many data analysis tasks.
The remainder of the paper is organized as follows: after introducing notation (Section 2), we derive the kernel estimator for mixed spatial derivatives of density weighted generalized state space moments, along with necessary consistency conditions for the kernels (Section 3). Next, we study the convergence properties of the estimator by deriving the relevant asymptotic bias and variance expressions (Section 4), as well as the asymptotically optimal kernel bandwidth (Section 5). Estimation of the unweighted generalized state space moments in terms of the estimates of their density weighted counterparts is considered in Section 6. Furthermore, we show in Section 7 that kernel interpolation of mixed spatial derivatives of a (potentially unknown) function may be considered as a special case of the results obtained for the generalized state space moments. Section 8 deals with the explicit construction of kernel functions: three classes of kernels are derived, and a novel regularization of the associated variational problem is proposed, which generalizes previous results in the literature. A Monte Carlo study (Section 9) illustrates the root mean squared error behavior of various estimates as a function of kernel type and sample size for a relevant test case. Section 10 concludes.

2. Multi-Index Notation

For notational convenience, multi-indices are used throughout. We restate here (with slight modification) the relevant definitions from Ref. [4, Appendix A]: a multi-index is an ordered $n$-tuple $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{N}_0^n$ of integers. For $\alpha \in \mathbb{N}_0^n$, define the length $|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n$ and the factorial $\alpha! = \alpha_1! \alpha_2! \cdots \alpha_n!$, respectively. Addition and subtraction, multiplication by a scalar, and comparison of multi-indices are defined component-wise, i.e., for $\alpha, \beta \in \mathbb{N}_0^n$ and $a \in \mathbb{N}$: $\alpha \pm \beta = (\alpha_1 \pm \beta_1, \alpha_2 \pm \beta_2, \dots, \alpha_n \pm \beta_n)$, $\alpha \leq \beta$ if $(\alpha_1 \leq \beta_1, \alpha_2 \leq \beta_2, \dots, \alpha_n \leq \beta_n)$, and $a\alpha = (a\alpha_1, a\alpha_2, \dots, a\alpha_n)$. Note that the subtraction $\alpha - \beta$ is only defined for $\beta \leq \alpha$. Further, for a vector $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$, define $x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$. The mixed partial derivative operator w.r.t. the coordinates $x$ is abbreviated as $\partial^\alpha = \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_n^{\alpha_n} = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}$. We use the special multi-indices $\mathbf{0} = (0, \dots, 0)$, $\mathbf{1} = (1, \dots, 1)$ and $\delta_i = (\delta_{1i}, \delta_{2i}, \dots, \delta_{ni})$, where $\delta_{ij}$ is the Kronecker delta. Consequently, $|\mathbf{0}| = 0$, $\mathbf{0}! = 1$, $x^{\mathbf{0}} = 1$, $\partial^{\mathbf{0}} f(x) = f(x)$, $|\mathbf{1}| = n$, $\mathbf{1}! = 1$, $x^{\mathbf{1}} = \prod_{j=1}^{n} x_j$ and $x^{\delta_i} = x_i$, where here and in the following, in expressions such as $x^{\mathbf{1}}$, the dimension of the multi-index in the exponent is implied by the dimension of the vector.
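The multi-index conventions above map directly onto code. The following minimal Python sketch (all names are our own, for illustration only, and not part of any established library) implements the operations used throughout the paper:

```python
# Multi-index helpers mirroring Section 2 (illustrative sketch).
from math import factorial
from typing import Sequence, Tuple

MultiIndex = Tuple[int, ...]

def mi_length(alpha: MultiIndex) -> int:
    """|alpha| = alpha_1 + alpha_2 + ... + alpha_n."""
    return sum(alpha)

def mi_factorial(alpha: MultiIndex) -> int:
    """alpha! = alpha_1! alpha_2! ... alpha_n!."""
    out = 1
    for a in alpha:
        out *= factorial(a)
    return out

def mi_leq(alpha: MultiIndex, beta: MultiIndex) -> bool:
    """Component-wise comparison alpha <= beta."""
    return all(a <= b for a, b in zip(alpha, beta))

def mi_power(x: Sequence[float], alpha: MultiIndex) -> float:
    """x^alpha = x_1^alpha_1 x_2^alpha_2 ... x_n^alpha_n."""
    out = 1.0
    for xi, a in zip(x, alpha):
        out *= xi ** a
    return out
```

For example, `mi_length((1, 0, 2)) == 3`, `mi_factorial((1, 0, 2)) == 2`, and `mi_power((2.0, 3.0), (1, 2)) == 18.0`.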
3. Kernel Estimate of Mixed Spatial Derivatives of Generalized State Space Moments

3.1. Definitions

Given data in a $(d_x + d_c)$-dimensional phase space, comprised of the $d_x$-dimensional configuration space with coordinate vector $x \in \mathbb{R}^{d_x}$ and the $d_c$-dimensional state space with coordinate vector $c \in \mathbb{R}^{d_c}$, we consider the case where said data may be assumed to be realizations of the absolutely continuous random vectors $\xi : \Omega \to \mathbb{R}^{d_x}$ and $\zeta : \Omega \to \mathbb{R}^{d_c}$, respectively, where $\Omega$ denotes the sample space. Let $\mathcal{B}(\mathbb{R}^n)$ be the Borel $\sigma$-algebra of subsets of $\mathbb{R}^n$. Define the (non-normalized) phase density $f : \mathbb{R}^{d_x} \times \mathbb{R}^{d_c} \to \mathbb{R}$, such that the probability of finding a data point in the region $B \in \mathcal{B}(\mathbb{R}^{d_x} \times \mathbb{R}^{d_c})$ is given by
$$\Pr(\{\xi, \zeta\} \in B) = \frac{1}{\iint_{\mathbb{R}^{d_x} \times \mathbb{R}^{d_c}} f(x', c')\, dx'^{\mathbf{1}} dc'^{\mathbf{1}}} \iint_{B} f(x', c')\, dx'^{\mathbf{1}} dc'^{\mathbf{1}}, \tag{1}$$
where $dx'^{\mathbf{1}} dc'^{\mathbf{1}}$ denotes $dx_1' dx_2' \cdots dx_{d_x}' dc_1' dc_2' \cdots dc_{d_c}'$. Hence, $f(x, c)$ is non-negative and a Borel function. The phase density may be factored into the (configuration) density $\rho(x)$ and the normalized conditional state probability density $f_\zeta(c, x)$ satisfying $\int_{\mathbb{R}^{d_c}} f_\zeta(c, x)\, dc^{\mathbf{1}} \equiv 1$, i.e.,
$$f(x, c) = \rho(x) f_\zeta(c, x). \tag{2}$$
Define the normalization constant in Eq. (1) as
$$M = \iint_{\mathbb{R}^{d_x} \times \mathbb{R}^{d_c}} f(x', c')\, dx'^{\mathbf{1}} dc'^{\mathbf{1}} = \int_{\mathbb{R}^{d_x}} \rho(x')\, dx'^{\mathbf{1}}. \tag{3}$$
In the case where $\rho(x)$ corresponds to a physical density with units of mass per volume, $M$ is equal to the total mass under consideration. The joint probability density of the random vectors $\{\xi, \zeta\}$ follows as
$$f_{\xi\zeta}(x, c) = \frac{1}{M} f(x, c). \tag{4}$$
Finally, the expectation operator acting on Borel measurable functions $g : \mathbb{R}^{d_x} \times \mathbb{R}^{d_c} \times \mathbb{R}^{d_y} \to \mathbb{R}$ of $\{\xi, \zeta; y\}$, where the vector $y \in \mathbb{R}^{d_y}$ denotes non-random elements in the domain of $g$, is defined as usual:
$$E_{\xi\zeta}[g(\xi, \zeta; y)] = \iint_{\mathbb{R}^{d_x} \times \mathbb{R}^{d_c}} g(x', c', y)\, f_{\xi\zeta}(x', c')\, dx'^{\mathbf{1}} dc'^{\mathbf{1}}. \tag{5}$$
In the following, we only consider functions $g$ for which $E[|g|] < \infty$ and $E[g^2] < \infty$. Further, we are specifically interested in the case where $g$ is not a function of $\xi$, but may depend on the measure space coordinate $x$. Accordingly, we set $y \equiv x$ and $g = g(\zeta; x)$. In this case,
$$E_{\xi\zeta}[g(\zeta; x)] = E_\zeta[g(\zeta; x)] = \int_{\mathbb{R}^{d_c}} g(c', x)\, f_\zeta(c', x)\, dc'^{\mathbf{1}}. \tag{6}$$
Define $F_g : \mathbb{R}^{d_x} \to \mathbb{R}$ as the density weighted, generalized state space moments with spatial dependence, viz.
$$F_g(x) = E_\zeta[\rho(x) g(\zeta; x)] = \int_{\mathbb{R}^{d_c}} \rho(x)\, g(c', x)\, f_\zeta(c', x)\, dc'^{\mathbf{1}} = \int_{\mathbb{R}^{d_c}} g(c', x)\, f(x, c')\, dc'^{\mathbf{1}}. \tag{7}$$
The following sections are concerned with the estimation of the mixed spatial derivatives $\partial^\alpha F_g$ from sample data.
3.2. Derivation of the Kernel Estimate

We proceed by rewriting the integrand in Eq. (7) in terms of a convolution in the measure space of $\xi$ with the limit of a sequence of suitable functions $\phi_h : \mathbb{R}^{d_x} \to \mathbb{R}$ with $h \in \mathbb{R}^{d_x}$, $h_i > 0$, as follows ([5, Ch. 7, Lemma 1]): for $\phi : \mathbb{R}^{d_x} \to \mathbb{R}$ absolutely integrable and appropriately differentiable, with compact support on the $d_x$-dimensional interval $[-1, 1]^{d_x}$, satisfying $\int_{\mathbb{R}^{d_x}} \phi(x)\, dx^{\mathbf{1}} = 1$,
$$g(c', x) f(x, c') = \lim_{h \to 0^+} \int_{\mathbb{R}^{d_x}} g(c', x')\, f(x', c')\, \underbrace{\frac{1}{h^{\mathbf{1}}}\, \phi\big(\mathrm{diag}(h)^{-1}(x - x')\big)}_{\phi_h}\, dx'^{\mathbf{1}}, \tag{8}$$
where $h^{\mathbf{1}} = h^{(1,1,\dots,1)} = \prod_{j=1}^{d_x} h_j$. Note that here and throughout, $[a, b]^n$ denotes the set $\{x \in \mathbb{R}^n : a \leq x_i \leq b,\ i = 1, \dots, n\} \in \mathcal{B}(\mathbb{R}^n)$, i.e., an $n$-dimensional cube with edge length $b - a$. Inserting Eq. (8) into Eq. (7), we find
$$F_g(x) = \lim_{h \to 0^+} \iint_{\mathbb{R}^{d_x} \times \mathbb{R}^{d_c}} g(c', x')\, f(x', c')\, \frac{1}{h^{\mathbf{1}}}\, \phi\big(\mathrm{diag}(h)^{-1}(x - x')\big)\, dx'^{\mathbf{1}} dc'^{\mathbf{1}} = M \lim_{h \to 0^+} E_{\xi\zeta}\Big[g(\zeta, \xi)\, \frac{1}{h^{\mathbf{1}}}\, \phi\big(\mathrm{diag}(h)^{-1}(x - \xi)\big)\Big]. \tag{9}$$
The $\nu$-th order mixed spatial derivative $\partial^\alpha F_g$ of $F_g$, with $|\alpha| = \nu$, follows as
$$\partial^\alpha F_g(x) = M \lim_{h \to 0^+} E_{\xi\zeta}\Big[g(\zeta, \xi)\, \partial^\alpha \frac{1}{h^{\mathbf{1}}}\, \phi\big(\mathrm{diag}(h)^{-1}(x - \xi)\big)\Big] = M \lim_{h \to 0^+} E_{\xi\zeta}\Big[\frac{1}{h^{\alpha + \mathbf{1}}}\, \phi^{(\alpha)}\big(\mathrm{diag}(h)^{-1}(x - \xi)\big)\, g(\zeta, \xi)\Big], \tag{10}$$
where the notation $\phi^{(\alpha)}$ signifies $\frac{\partial^{|\alpha|}}{\partial r_1^{\alpha_1} \partial r_2^{\alpha_2} \cdots \partial r_n^{\alpha_n}} \phi(r)$.
α F (x) of ∂α F (x) may be obtained by fixing h and estimating the expected value An estimate ∂[ g g in Eq. 10 via the weighted average of a finite number N of samples, each with index l, l = 1, . . . , N, weight wl , position xl , and with state vector cl , viz.
α F (x) = ∂[ g
N 1 X
hα+1
wl Kα diag h −1 x − xl g(cl , xl ),
(11)
l=1
where the kernel Kα is the ν-th order mixed partial derivative φ(α) , with |α| = ν, of the function φ, and the weights are assumed to satisfy w > 0, l
N X
wl ≡ M,
l=1
which also implies
and
lim wl = 0,
N→∞
(12)
PN l 2 = 0. Define the random variable η : Ω → R, which depends on the l=1 w 4
random vectors ζ and ξ as follows: 1
−1 (x K diag h − ξ) g(ζ, ξ). (13) α hα+1 The expected value of η may be calculated w.r.t. ξ and ζ, i.e., E η ≡ Eξζ η . hFori all Kα and fξζ considered here, η is absolutely continuous and satisfies E |η| < ∞ and E η2 < ∞. Let {ηi , i : 1 ≤ i ≤ N} be a collection of independent, identically distributed random variables with the same distribution as η. The estimate (11) may be written as a random variable in terms of the ηi as η=
α F (x) = ∂[ g
N X
w l ηl ,
(14)
l=1
with expected value N h i X h i α [ E ∂ Fg (x) = wl E ηl = ME η .
(15)
l=1
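For concreteness, the estimate (11) may be transcribed directly into code. The following Python sketch evaluates $\widehat{\partial^\alpha F_g}$ at a single point; all function and argument names are illustrative assumptions on our part, not part of any established library:

```python
import numpy as np

def estimate_derivative(x, x_s, c_s, w, h, K_alpha, g, alpha):
    """Kernel estimate (11) of the mixed derivative d^alpha F_g at the point x.

    x       : (d_x,) evaluation point
    x_s     : (N, d_x) sample positions x^l
    c_s     : (N, d_c) sample state vectors c^l
    w       : (N,) sample weights w^l with sum(w) == M
    h       : (d_x,) kernel bandwidths
    K_alpha : vectorized kernel mapping (N, d_x) scaled offsets to (N,) values;
              must satisfy the moment conditions (22) for the multi-index alpha
    g       : vectorized state space function g(c, x) returning (N,) values
    alpha   : (d_x,) multi-index of the requested derivative
    """
    y = (x - x_s) / h                                    # diag(h)^{-1} (x - x^l)
    pref = 1.0 / np.prod(h ** (np.asarray(alpha) + 1))   # 1 / h^(alpha + 1)
    return pref * np.sum(w * K_alpha(y) * g(c_s, x_s))
```

Since $K_\alpha$ has compact support on $[-1, 1]^{d_x}$, only samples inside the bandwidth box around $x$ contribute to the sum, which is what makes the estimator completely local.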
Under the assumptions (12), the following strong law of large numbers holds (cf., e.g., [6, Proposition 4.3]):
$$\Pr\Big\{\lim_{N \to \infty} \widehat{\partial^\alpha F_g} = M E[\eta]\Big\} = 1. \tag{16}$$
Accordingly, $M E[\eta] = \partial^\alpha F_g$ is a necessary condition for $\widehat{\partial^\alpha F_g}$ to consistently estimate $\partial^\alpha F_g$. For suitable $K_\alpha$, Eq. (10) shows that this condition is met in the limit $h \to 0^+$. Consequently, we set $h = h(N)$, $h_i > 0$, $i = 1, \dots, d_x$, while requiring
$$\lim_{N \to \infty} h_i = 0, \qquad \text{as well as} \qquad \lim_{N \to \infty} \frac{\sum_{l=1}^{N} (w^l)^2}{h^{2\alpha + \mathbf{1}}} = 0. \tag{17}$$
The $h_i$ will be referred to as kernel bandwidths in the following, and, for dimensional data $x^l$, are assumed to have the same units as the coordinates $x$, so that the kernel may be defined in a dimensionless reference frame. Under the assumptions (12) and (17), it follows that the estimate (11) converges almost surely for suitable $K_\alpha$, i.e.,
$$\Pr\Big\{\lim_{N \to \infty} \widehat{\partial^\alpha F_g} = \partial^\alpha F_g\Big\} = 1. \tag{18}$$
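For illustration, consider equally weighted samples $w^l = M/N$ together with a power-law bandwidth $h_i = h \propto N^{-\beta}$, $\beta > 0$. Then $\sum_{l=1}^{N} (w^l)^2 = M^2/N$ and $h^{2\alpha + \mathbf{1}} = h^{2\nu + d_x}$, so the second condition in (17) reduces to $N h^{d_x + 2\nu} \to \infty$, which holds for any $0 < \beta < 1/(d_x + 2\nu)$.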
3.3. Kernel Consistency Constraints

In the following, we derive constraints on the kernel $K_\alpha$ in order to fulfill $M \lim_{h \to 0^+} E[\eta] = \partial^\alpha F_g$. For $\eta$ as defined in Eq. (13), we obtain
$$M E[\eta] = \frac{1}{h^{\alpha + \mathbf{1}}} \iint_{\mathbb{R}^{d_x} \times \mathbb{R}^{d_c}} K_\alpha\big(\mathrm{diag}(h)^{-1}(x - x')\big)\, g(c', x')\, f(x', c')\, dx'^{\mathbf{1}} dc'^{\mathbf{1}} = \frac{1}{h^{\alpha + \mathbf{1}}} \int_{\mathbb{R}^{d_x}} K_\alpha\big(\mathrm{diag}(h)^{-1}(x - x')\big)\, F_g(x')\, dx'^{\mathbf{1}} = \frac{1}{h^{\alpha}} \int_{\mathbb{R}^{d_x}} K_\alpha(y')\, F_g\big(x - \mathrm{diag}(h)\, y'\big)\, dy'^{\mathbf{1}}, \tag{19}$$
where we have used the substitution $x_i' := x_i - h_i y_i'$, $dx_i' = -h_i\, dy_i'$, and $\lim_{x_i' \to \pm\infty} y_i' = \mp\infty$, $i = 1, \dots, d_x$; and $dy'^{\mathbf{1}}$ denotes $dy_1' dy_2' \cdots dy_{d_x}'$. Replacing $F_g(x - \mathrm{diag}(h)\, y')$ in Eq. (19) by the corresponding multivariate Taylor series around the point $x$ yields
$$\begin{aligned}
M E[\eta] &= \frac{1}{h^{\alpha}} \int_{\mathbb{R}^{d_x}} K_\alpha(y') \sum_{|\gamma| = 0}^{\infty} \frac{(-1)^{|\gamma|}}{\gamma!}\, h^{\gamma} y'^{\gamma}\, \partial^{\gamma} F_g(x)\, dy'^{\mathbf{1}} = \frac{1}{h^{\alpha}} \sum_{|\gamma| = 0}^{\infty} \frac{(-1)^{|\gamma|}}{\gamma!}\, h^{\gamma} \mu_{\gamma}(K_\alpha)\, \partial^{\gamma} F_g(x) \\
&= \frac{1}{\prod_{i=1}^{d_x} h_i^{\alpha_i}} \Bigg[ \mu_{\mathbf{0}}(K_\alpha)\, F_g(x) - \sum_{j_1 = 1}^{d_x} h_{j_1}\, \mu_{\delta_{j_1}}(K_\alpha)\, \partial_{j_1} F_g(x) + \frac{1}{2} \sum_{j_1 = 1}^{d_x} \sum_{j_2 = 1}^{d_x} h_{j_1} h_{j_2}\, \mu_{(\delta_{j_1} + \delta_{j_2})}(K_\alpha)\, \partial_{j_1} \partial_{j_2} F_g(x) \\
&\qquad - \frac{1}{6} \sum_{j_1 = 1}^{d_x} \sum_{j_2 = 1}^{d_x} \sum_{j_3 = 1}^{d_x} h_{j_1} h_{j_2} h_{j_3}\, \mu_{(\delta_{j_1} + \delta_{j_2} + \delta_{j_3})}(K_\alpha)\, \partial_{j_1} \partial_{j_2} \partial_{j_3} F_g(x) + \text{h.o.t.} \Bigg],
\end{aligned} \tag{20}$$
where each term in the series is characterized by the multi-index $\gamma$, and
$$\mu_{\gamma}(K_\alpha) = \int_{\mathbb{R}^{d_x}} y'^{\gamma} K_\alpha(y')\, dy'^{\mathbf{1}} = \int_{\mathbb{R}^{d_x}} y_1'^{\gamma_1} y_2'^{\gamma_2} \cdots y_{d_x}'^{\gamma_{d_x}}\, K_\alpha(y')\, dy'^{\mathbf{1}} \tag{21}$$
are the moments of order $|\gamma|$ of the kernel $K_\alpha$. It follows that $K_\alpha$ should satisfy the following moment conditions to ensure $M \lim_{h \to 0^+} E[\eta] = \partial^\alpha F_g$:
$$\mu_{\gamma}(K_\alpha) = \begin{cases} 0, & \gamma \neq \alpha,\ |\gamma| < k, \\ (-1)^{|\gamma|}\, \gamma!, & \gamma = \alpha, \end{cases} \tag{22}$$
where $k = \nu + 2i$ with $\nu = |\alpha|$, $i \in \mathbb{N}$, is the kernel order (Gasser et al. [7]), defined as the lowest order of the kernel moments greater than $\nu$ for which at least one $\mu_{\gamma : |\gamma| = k}(K_\alpha)$ is non-zero.
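The conditions (22) are straightforward to verify numerically for a candidate kernel. The following Python sketch approximates the moments (21) with a tensor-product midpoint rule on the kernel support $[-1, 1]^{d_x}$; quadrature resolution, tolerance, and all names are illustrative choices of ours:

```python
import itertools
from math import factorial
import numpy as np

def kernel_moment(K_alpha, gamma, d_x, n=200):
    """Midpoint-rule approximation of mu_gamma(K_alpha), Eq. (21).
    Here K_alpha takes a single point y of shape (d_x,)."""
    pts = -1.0 + (2.0 * np.arange(n) + 1.0) / n   # cell midpoints in [-1, 1]
    total = 0.0
    for y in itertools.product(pts, repeat=d_x):
        y = np.asarray(y)
        total += np.prod(y ** np.asarray(gamma)) * K_alpha(y)
    return total * (2.0 / n) ** d_x               # cell volume (2/n)^{d_x}

def check_moment_conditions(K_alpha, alpha, k, d_x, tol=1e-3):
    """Check the consistency conditions (22) for all |gamma| < k."""
    for gamma in itertools.product(range(k), repeat=d_x):
        if sum(gamma) >= k:
            continue
        target = 0.0
        if gamma == tuple(alpha):
            # mu_alpha(K_alpha) = (-1)^{|alpha|} alpha!
            target = (-1) ** sum(alpha) * np.prod([factorial(a) for a in alpha])
        mu = kernel_moment(K_alpha, gamma, d_x)
        assert abs(mu - target) < tol, (gamma, mu, target)
```

For example, the kernel $K_\alpha(y) = -\tfrac{3}{2} y$ on $[-1, 1]$ (cf. the minimum variance kernels of Section 8.1) passes the check for $d_x = 1$, $\alpha = (1,)$, $k = 3$.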
Note that the conditions (22) also imply
$$\mu_{\gamma}(K_\alpha) = 0\ \ \forall \gamma : |\gamma| < |\alpha| \quad \Rightarrow \quad \mu_{\gamma}(K_\alpha) = 0\ \ \forall \gamma : \gamma \not\geq \alpha, \tag{23}$$
since the integration in Eq. (21) over the $i$-th spatial dimension, corresponding to an index with $\gamma_i < \alpha_i$, will always yield zero, due to the conditions (22) implying $\mu_{\gamma_i \delta_i} = 0$. For $K_\alpha$ satisfying the constraints (22), Eq. (20) reads
$$M E[\eta] = \partial^\alpha F_g(x) + \frac{(-1)^k}{h^{\alpha}} \sum_{\substack{|\gamma| = k \\ \gamma > \alpha}} \frac{h^{\gamma}}{\gamma!}\, \mu_{\gamma}(K_\alpha)\, \partial^{\gamma} F_g(x) + o\Bigg\{ \sum_{\substack{|\gamma| = k \\ \gamma > \alpha}} h^{\gamma - \alpha} \Bigg\}, \tag{24}$$
where the summation is over all multi-indices $\gamma$ of length $|\gamma| = k$ satisfying $\gamma > \alpha$, and the "small oh" notation $o\{g(h)\}$ denotes all terms that approach zero faster than the argument $g$ as $h \to 0^+$.

4. Convergence of the Kernel Estimate

To study the convergence of the estimate (11) as $N \to \infty$, we study the following expressions, defined point-wise, i.e., at every position $x$:
$$B\big[\widehat{\partial^\alpha F_g}\big] = E\big[\widehat{\partial^\alpha F_g}\big] - \partial^\alpha F_g \qquad \text{(bias)}, \tag{25}$$
$$\mathrm{Var}\big[\widehat{\partial^\alpha F_g}\big] = E\Big[\Big(\widehat{\partial^\alpha F_g} - E\big[\widehat{\partial^\alpha F_g}\big]\Big)^2\Big] \qquad \text{(variance)}, \tag{26}$$
and
$$\mathrm{MSE}\big[\widehat{\partial^\alpha F_g}\big] = E\Big[\Big(\widehat{\partial^\alpha F_g} - \partial^\alpha F_g\Big)^2\Big] \qquad \text{(mean squared error)}, \tag{27}$$
under the assumption that the kernel $K_\alpha$ satisfies the constraints (22).

4.1. Bias

For a kernel of order $k$, satisfying Eqs. (22), using Eqs. (15) and (24), the point-wise bias of the estimate (11) reads
$$B\big[\widehat{\partial^\alpha F_g}(x)\big] = \frac{(-1)^k}{h^{\alpha}} \sum_{\substack{|\gamma| = k \\ \gamma > \alpha}} \frac{h^{\gamma}}{\gamma!}\, \mu_{\gamma}(K_\alpha)\, \partial^{\gamma} F_g(x) + o\Bigg\{ \sum_{\substack{|\gamma| = k \\ \gamma > \alpha}} h^{\gamma - \alpha} \Bigg\}. \tag{28}$$
Eq. (28) shows that the order of the kernel determines the leading order term in the expansion of the bias. The specific form of the leading bias term may be controlled by imposing additional conditions on the $k$-th order moments, e.g.,
$$\mu_{\gamma : |\gamma| = k}(K_\alpha) = \begin{cases} \mu^{K_\alpha}\, \gamma!, & \gamma = \alpha + (k - \nu)\delta_i,\ i \in \{1, 2, \dots, d_x\}, \\ 0, & \text{else}, \end{cases} \tag{29}$$
with scalar parameter $\mu^{K_\alpha} \in \mathbb{R}$ (cf., e.g., Ref. [8] and references therein). This choice of $\mu_{\gamma}$ ensures that the leading bias term only depends on the mixed partial derivatives $\partial^{\gamma} F_g$ with $\gamma > \alpha$, and equally weights the contribution of each coordinate direction. Other choices are of course possible: for example, the condition
$$\mu_{\gamma : |\gamma| = k}(K_\alpha) = \begin{cases} \mu^{K_\alpha}\, \gamma!\, \frac{|\omega|!}{\omega!}, & \gamma = \alpha + 2\omega, \\ 0, & \text{else}, \end{cases} \tag{30}$$
with multi-index $\omega$, would ensure that the leading bias term is proportional to the total trace of the mixed derivative tensor of order $k - \nu$, $\partial_{j_1} \partial_{j_2} \cdots \partial_{j_{(k-\nu)}}$, $1 \leq j_i \leq d_x$, which, in contrast to the sum of elements with equal indices implied by the constraints (29), is a tensor invariant. We do not investigate alternative choices to Eq. (29) in the following. Under the additional assumption that $K_\alpha$ satisfies the constraints (29), Eq. (28) reads
$$B\big[\widehat{\partial^\alpha F_g}(x)\big] = (-1)^k \mu^{K_\alpha} \sum_{i=1}^{d_x} h^{(k-\nu)\delta_i}\, \partial^{\alpha + (k-\nu)\delta_i} F_g(x) + o\Bigg\{ \sum_{\substack{|\gamma| = k \\ \gamma > \alpha}} h^{\gamma - \alpha} \Bigg\}. \tag{31}$$
For example, inserting $\alpha = \delta_j$, $h_i \equiv h$, $d_x = 3$ and $k = 5$ into Eq. (31) yields
$$B\big[\widehat{\partial_j F_g}(x)\big] = -h^4 \mu^{K_\alpha}\, \partial_j \big(\partial_1^4 + \partial_2^4 + \partial_3^4\big) F_g(x) + o\big\{h^4\big\}. \tag{32}$$
4.2. Variance and Mean Squared Error

Applying the variance operator to Eq. (14) and using Bienaymé's identity for a weighted sum of independent random variables yields
$$\mathrm{Var}\big[\widehat{\partial^\alpha F_g}\big] = \mathrm{Var}\Bigg[\sum_{l=1}^{N} w^l \eta^l\Bigg] = \Big(E[\eta^2] - E[\eta]^2\Big) \sum_{l=1}^{N} (w^l)^2. \tag{33}$$
We use the same change of variables as in Eq. (19) to find the intermediate result
$$M E[\eta^2] = \frac{1}{h^{2(\alpha + \mathbf{1})}} \iint_{\mathbb{R}^{d_x} \times \mathbb{R}^{d_c}} K_\alpha\big(\mathrm{diag}(h)^{-1}(x - x')\big)^2\, g(c', x')^2\, f(x', c')\, dx'^{\mathbf{1}} dc'^{\mathbf{1}} = \frac{1}{h^{2(\alpha + \mathbf{1})}} \int_{\mathbb{R}^{d_x}} K_\alpha\big(\mathrm{diag}(h)^{-1}(x - x')\big)^2\, F_{g^2}(x')\, dx'^{\mathbf{1}} = \frac{1}{h^{2\alpha + \mathbf{1}}} \int_{\mathbb{R}^{d_x}} K_\alpha(y)^2\, F_{g^2}\big(x - \mathrm{diag}(h)\, y\big)\, dy^{\mathbf{1}} = \frac{R(K_\alpha)\, F_{g^2}(x) + o\{1\}}{h^{2\alpha + \mathbf{1}}}, \tag{34}$$
where
$$R(g) = \int_{\mathbb{R}^{d_x}} g(y)^2\, dy^{\mathbf{1}} \tag{35}$$
is defined for any square integrable function $g : \mathbb{R}^{d_x} \to \mathbb{R}$. Inserting Eq. (34) into Eq. (33), the point-wise variance of the estimate (11) follows as
$$\mathrm{Var}\big[\widehat{\partial^\alpha F_g}(x)\big] = \sum_{l=1}^{N} (w^l)^2 \Big(E[\eta^2] - E[\eta]^2\Big) = \sum_{l=1}^{N} (w^l)^2 \Bigg[ \frac{1}{M} \Bigg( \frac{R(K_\alpha)\, F_{g^2}(x)}{h^{2\alpha + \mathbf{1}}} + o\{1\} \Bigg) - \Bigg( \frac{1}{M} \Big( \partial^\alpha F_g(x) + o\{1\} \Big) \Bigg)^2 \Bigg] = \frac{\sum_{l=1}^{N} (w^l)^2}{M h^{2\alpha + \mathbf{1}}}\, R(K_\alpha)\, F_{g^2}(x) + o\Bigg\{ \frac{\sum_{l=1}^{N} (w^l)^2}{h^{2\alpha + \mathbf{1}}} \Bigg\}. \tag{36}$$
By the assumptions (17) on $h$, $\mathrm{Var}\big[\widehat{\partial^\alpha F_g}(x)\big]$ approaches zero as $N \to \infty$. The fastest rate of convergence of $\frac{1}{M} \sum_{l=1}^{N} (w^l)^2 \to 0$ is obtained in the case of equally weighted samples, i.e., with $w^l = w = \frac{M}{N}$, $l = 1, \dots, N$, which implies $\frac{1}{M} \sum_{l=1}^{N} (w^l)^2 = \frac{M}{N}$. The point-wise mean squared error (MSE) of the estimate (11) follows as
$$\mathrm{MSE}\big[\widehat{\partial^\alpha F_g}\big] = \mathrm{Var}\big[\widehat{\partial^\alpha F_g}\big] + B\big[\widehat{\partial^\alpha F_g}\big]^2 = \frac{\sum_{l=1}^{N} (w^l)^2}{M h^{2\alpha + \mathbf{1}}}\, R(K_\alpha)\, F_{g^2}(x) + \frac{1}{h^{2\alpha}} \Bigg( \sum_{\substack{|\gamma| = k \\ \gamma > \alpha}} \frac{h^{\gamma}}{\gamma!}\, \mu_{\gamma}(K_\alpha)\, \partial^{\gamma} F_g(x) \Bigg)^2 + o\Bigg\{ \frac{\sum_{l=1}^{N} (w^l)^2}{h^{2\alpha + \mathbf{1}}} \Bigg\}. \tag{37}$$
5. Asymptotically Optimal Bandwidth

Under the additional assumptions that $h_i = h$, $i = 1, \dots, d_x$, and $w^l = w = M/N$, $l = 1, \dots, N$, and assuming $K_\alpha$ satisfies the conditions (29), the asymptotic values of the point-wise bias, variance and mean squared error, i.e., the leading terms in expressions (28), (36) and (37), read
$$AB\big[\widehat{\partial^\alpha F_g}(x)\big] = (-1)^k h^{(k-\nu)} \mu^{K_\alpha} \sum_{i=1}^{d_x} \partial^{\alpha + (k-\nu)\delta_i} F_g(x), \tag{38}$$
$$AVar\big[\widehat{\partial^\alpha F_g}(x)\big] = \frac{1}{N h^{d_x + 2\nu}}\, R(K_\alpha)\, M F_{g^2}(x), \tag{39}$$
and
$$AMSE\big[\widehat{\partial^\alpha F_g}(x)\big] = \frac{1}{N h^{d_x + 2\nu}}\, R(K_\alpha)\, M F_{g^2}(x) + h^{2(k-\nu)} \big(\mu^{K_\alpha}\big)^2 \Bigg( \sum_{i=1}^{d_x} \partial^{\alpha + (k-\nu)\delta_i} F_g(x) \Bigg)^2, \tag{40}$$
respectively. Since for any kernel $K_\alpha$ satisfying the conditions (22) and (29), the scaled kernel $K_\alpha^{\delta} := \delta^{-(d_x + \nu)} K_\alpha(\delta^{-1} x)$, with $\delta \in \mathbb{R}_{>0}$, also satisfies Eq. (22), as well as Eq. (29) with $\mu^{K_\alpha^{\delta}} = \delta^{k - \nu} \mu^{K_\alpha}$, one may choose a scaling $\delta = \delta_c$ such that $\big(\mu^{K_\alpha^{\delta_c}}\big)^2 = R\big(K_\alpha^{\delta_c}\big) =: T(K_\alpha)$, with $R$ as defined in Eq. (35); such a kernel is referred to as canonical [9]. Specifically, $R\big(K_\alpha^{\delta}\big) = \delta^{-(2\nu + d_x)} R(K_\alpha)$, and hence $\delta_c = \Big(\frac{R(K_\alpha)}{(\mu^{K_\alpha})^2}\Big)^{\frac{1}{2k + d_x}}$. It follows that
$$T(K_\alpha) = \Big( R(K_\alpha)^{k - \nu}\, \big(\mu^{K_\alpha}\big)^{2\nu + d_x} \Big)^{\frac{2}{2k + d_x}}. \tag{41}$$
Note that $T(K_\alpha)$ is invariant to any re-scaling of $K_\alpha$, i.e., $T\big(K_\alpha^{\delta}\big) \equiv T(K_\alpha)$. Eq. (40) may be written in terms of $T(K_\alpha)$ as
$$AMSE\big[\widehat{\partial^\alpha F_g}(x)\big] = T(K_\alpha) \Bigg( \frac{1}{N h^{d_x + 2\nu}}\, M F_{g^2}(x) + h^{2(k-\nu)} \Bigg( \sum_{i=1}^{d_x} \partial^{\alpha + (k-\nu)\delta_i} F_g(x) \Bigg)^2 \Bigg). \tag{42}$$
Assuming $AB\big[\widehat{\partial^\alpha F_g}(x)\big] \neq 0$, differentiating Eq. (42) w.r.t. $h$ and equating to zero formally yields the point-wise AMSE-optimal bandwidth $h_{AMSE}$, i.e.,
$$h_{AMSE}\big(x, N; d_x, k, \nu, F_g, F_{g^2}\big) = \Bigg( \frac{1}{N}\, \frac{(d_x + 2\nu)}{2(k - \nu)}\, \frac{M F_{g^2}(x)}{\Big( \sum_{i=1}^{d_x} \partial^{\alpha + (k-\nu)\delta_i} F_g(x) \Big)^2} \Bigg)^{\frac{1}{2k + d_x}}. \tag{43}$$
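The canonical functional (41) and the optimal bandwidth (43) are simple closed-form expressions. A Python sketch follows; the plug-in quantities must be supplied or estimated by the user, and all names are illustrative:

```python
def canonical_functional(R_K, mu_K, k, nu, d_x):
    """T(K_alpha) of Eq. (41); invariant under the re-scaling K -> K^delta."""
    return (R_K ** (2.0 * (k - nu) / (2 * k + d_x))
            * (mu_K ** 2) ** ((2.0 * nu + d_x) / (2 * k + d_x)))

def h_amse(N, M_F_g2, bias_factor, k, nu, d_x):
    """Point-wise AMSE-optimal bandwidth of Eq. (43).

    M_F_g2      : M * F_{g^2}(x), or a plug-in estimate thereof
    bias_factor : sum_i d^{alpha + (k - nu) delta_i} F_g(x), or a plug-in estimate
    """
    return ((1.0 / N) * (d_x + 2.0 * nu) / (2.0 * (k - nu))
            * M_F_g2 / bias_factor ** 2) ** (1.0 / (2 * k + d_x))
```

In practice, the higher-order derivatives entering `bias_factor` are themselves unknown; a common strategy in the kernel estimation literature is to estimate them with a pilot bandwidth and iterate.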
6. Estimation of Unweighted Generalized Moments

In case the density $\rho$ is known at the sample locations $x^l$, one may set $\tilde{g}(c^l, x^l) := g(c^l, x^l) / \rho(x^l)$, in which case $\widehat{\partial^\alpha F_{\tilde{g}}}(x)$ (Eq. (11)) yields a consistent estimate for the quantity $\partial^\alpha E_\zeta[g(\zeta; x)]$ directly. In the general case, an estimate of $\partial^\alpha E_\zeta[g(\zeta; x)]$ may be obtained by applying the generalized Leibniz rule to expand the mixed derivative $\partial^\alpha F_g$, i.e.,
$$\partial^\alpha F_g(x) = \partial^\alpha E_\zeta[\rho(x) g(\zeta; x)] = \alpha! \sum_{\eta + \omega = \alpha} \frac{\partial^{\eta} \rho(x)}{\eta!}\, \frac{\partial^{\omega} E_\zeta[g(\zeta; x)]}{\omega!}, \tag{44}$$
where the summation is over all multi-indices $\eta, \omega$ for which $\eta + \omega = \alpha$. Rearranging Eq. (44) yields
$$\partial^\alpha E_\zeta[g(\zeta; x)] = \frac{1}{\rho(x)} \Bigg( \partial^\alpha F_g(x) - \alpha! \sum_{\substack{\eta + \omega = \alpha \\ \omega \neq \alpha}} \frac{\partial^{\eta} \rho(x)}{\eta!}\, \frac{\partial^{\omega} E_\zeta[g(\zeta; x)]}{\omega!} \Bigg), \tag{45}$$
where the unknown partial derivatives $\partial^{\eta} \rho$ on the right hand side may be estimated by setting $g \equiv 1$ in Eq. (11) (since $F_1(x) \equiv \rho(x)$), and the unknown partial derivatives $\partial^{\omega} E_\zeta[g(\zeta; x)]$ with $\omega \neq \alpha$, all of which are of order smaller than $|\alpha|$, may be estimated by applying the procedure recursively, as the sketch below illustrates. Obviously, the case $\alpha = \mathbf{0}$ leads to $E_\zeta[g(\zeta; x)] = F_g(x) / F_1(x)$.
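A minimal Python sketch of this recursion follows. It assumes a user-supplied closure `est(beta, weighted)` (our own naming) that returns the kernel estimate (11) of $\partial^{\beta} F_g$ at the fixed evaluation point for `weighted=True`, or of $\partial^{\beta} F_1 = \partial^{\beta} \rho$ for `weighted=False`:

```python
import itertools
from math import factorial

def mi_factorial(alpha):
    out = 1
    for a in alpha:
        out *= factorial(a)
    return out

def d_expectation(alpha, est):
    """Estimate d^alpha E_zeta[g(zeta; x)] at a fixed point x via Eq. (45)."""
    rho = est((0,) * len(alpha), False)        # F_1(x) = rho(x)
    if sum(alpha) == 0:
        return est(alpha, True) / rho          # E[g] = F_g / F_1
    total = est(alpha, True)                   # d^alpha F_g(x)
    # Subtract alpha! * sum over eta + omega = alpha with omega != alpha.
    for omega in itertools.product(*(range(a + 1) for a in alpha)):
        if omega == tuple(alpha):
            continue
        eta = tuple(a - o for a, o in zip(alpha, omega))
        coeff = mi_factorial(alpha) / (mi_factorial(eta) * mi_factorial(omega))
        total -= coeff * est(eta, False) * d_expectation(omega, est)
    return total / rho
```

The recursion terminates because every $\omega \neq \alpha$ in the sum satisfies $|\omega| < |\alpha|$; for efficiency, the lower-order estimates should be memoized rather than recomputed.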
7. Kernel Interpolation

It is worth noting that kernel-based interpolation of a (deterministic but potentially unknown) function $u : x \in \mathbb{R}^{d_x} \to \mathbb{R}$ is directly related to the estimate $\widehat{\partial^\alpha F_g}(x)$ with $g(c, x) \equiv g(x)$, since
$$\partial^\alpha F_g(x) = \partial^\alpha E_\zeta[\rho(x) g(x)] \equiv \partial^\alpha \big(\rho(x) g(x)\big). \tag{46}$$
Given as data $N$ values $u^l := u(x^l)$ at (random) locations $x^l$, $l = 1, \dots, N$, one may directly use the data $u^l$ in Eq. (11) by setting $g(x^l) := u^l$. By the results obtained in Section 3, $\widehat{\partial^\alpha F_g}(x)$ then yields a consistent estimate for $\partial^\alpha (\rho(x) u(x))$. The quantity $\partial^\alpha u(x)$ may, in turn, be estimated via the procedure outlined in Section 6.

8. Explicit Kernel Functions

To derive explicit expressions for $K_\alpha$, we again only consider the case $h_i = h$, $i = 1, \dots, d_x$. Following Gasser et al. [7], we consider two classes: "minimum variance kernels" $K_\alpha^v$, which minimize the asymptotic variance (Eq. (39)), and "optimal kernels" $K_\alpha^o$, which minimize the asymptotic MSE (Eq. (40)). In the following, only continuous kernels with compact support on the $d_x$-dimensional interval $[-1, 1]^{d_x}$ will be considered.

8.1. Minimum Variance Kernels

For a given configuration space dimension $d_x$, mixed partial derivative operator $\partial^\alpha$ of order $|\alpha| = \nu$, and kernel order $k > \nu$, we seek the kernel $K_\alpha^{v,k}$ that minimizes the functional $R\big(K_\alpha^k\big) = \int_{[-1,1]^{d_x}} K_\alpha^k(x)^2\, dx^{\mathbf{1}}$ (cf. Eq. (35)) on the $d_x$-dimensional interval $[-1, 1]^{d_x}$, subject to the moment conditions (22), and no condition on the values of the $k$-th order moments. Using the method of Lagrangian multipliers, this may be stated as the following minimization problem over the square-integrable functions $u : \mathbb{R}^{d_x} \to \mathbb{R}$:
$$K_\alpha^{v,k}(x) = \underset{u}{\mathrm{argmin}} \int_{-1}^{1} \!\cdots\! \int_{-1}^{1} \Bigg( u(x)^2 + \sum_{|\omega| < k} \lambda_{\omega}\, x^{\omega} u(x) \Bigg) dx^{\mathbf{1}} = \underset{u}{\mathrm{argmin}} \int_{[-1,1]^{d_x}} L(x, u(x))\, dx^{\mathbf{1}}. \tag{47}$$
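For $d_x = 1$, the stationarity condition of Eq. (47) implies that the minimizer is a polynomial of degree smaller than $k$ (a linear combination of the constraint functions $x^{\omega}$), so the Lagrange multipliers reduce to a small linear system. The following Python sketch is our own illustration under this assumption, not the paper's exact construction:

```python
import numpy as np
from math import factorial

def min_variance_kernel_1d(nu, k):
    """1D minimum variance kernel K^{v,k}_nu on [-1, 1] (sketch, d_x = 1).

    Minimizing R(u) subject to the moment conditions (22) yields the
    least-norm solution: a polynomial of degree < k whose coefficients
    solve G a = c with Gram matrix G[i, j] = int_{-1}^{1} x^{i+j} dx.
    """
    G = np.array([[2.0 / (i + j + 1) if (i + j) % 2 == 0 else 0.0
                   for j in range(k)] for i in range(k)])
    c = np.zeros(k)
    c[nu] = (-1.0) ** nu * factorial(nu)   # mu_nu(K) = (-1)^nu nu!, Eq. (22)
    a = np.linalg.solve(G, c)
    return np.polynomial.Polynomial(a)     # K(x) = sum_j a[j] x^j on [-1, 1]
```

For instance, `min_variance_kernel_1d(0, 2)` recovers the uniform kernel $K(x) = 1/2$, and `min_variance_kernel_1d(1, 3)` yields $K(x) = -\tfrac{3}{2} x$.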