52nd IEEE Conference on Decision and Control December 10-13, 2013. Florence, Italy

A Particle System for Global Optimization Chi Zhang

Abstract— A particle algorithm for global optimization is presented, based on concepts from optimal transport. The particle flow is constructed by considering a time-stepping variational problem, and the resulting flow is interpreted as a gradient flow with respect to a certain (pseudo-) metric. The construction requires solving an Euler-Lagrange boundary value problem at each time step, and a Galerkin procedure is described to approximate its solution. The algorithm is illustrated by a numerical example.

I. INTRODUCTION

We consider the global optimization problem

  min_{x ∈ R^d} h(x),    (1)

where h : R^d → R is a real-valued, continuously differentiable, bounded function. The objective is to obtain, using a simulation-based method, the global minimizer

  x̄ = arg min_{x ∈ R^d} h(x).

We will assume that such a minimizer exists and is unique.

In this paper, we introduce an interacting particle algorithm based on a gradient-flow construction; cf. [6]. The particle system is comprised of N stochastic processes {X_t^i : 1 ≤ i ≤ N}: the value X_t^i ∈ R^d is the state of the i-th particle at time t. In our approach, the dynamics of the i-th particle are defined by a controlled system:

  dX_t^i = −ε ∇h(X_t^i) dt + (1 − ε) dU_t^i,    (2)

where the initial condition X_0^i is sampled from a given initial distribution p*(x, 0), ε ∈ (0, 1) is a small parameter, and U_t^i is the control input that models the interaction with the population. The motivation comes from 'consensus-type' control used in the convex optimization literature; cf. [3].

We cast the synthesis of the control input as an optimization problem. The optimal control input is obtained via the solution of an associated Euler-Lagrange (E-L) boundary value problem (BVP). In particular, the optimal control input is given by dU_t^i = u(X_t^i, t) dt, where the control function u(x,t) = ∇φ(x,t) is obtained by solving the E-L BVP:

  ∇·( p(x,t) ∇φ(x,t) ) = (h(x) − ĥ) p(x,t),
  ∫ φ(x,t) p(x,t) dx = 0,    (3)

[Footnote: Financial support from the AFOSR grant FA9550-09-1-0190 and the NSF CPS grant 0931416 is gratefully acknowledged. C. Zhang is with the Coordinated Science Laboratory and the Department of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign (UIUC); [email protected].]


where p denotes the distribution of X_t^i, and ĥ := E[h(X_t^i)] = ∫ h(x) p(x,t) dx. In numerical implementations, we approximate ĥ ≈ (1/N) Σ_{i=1}^N h(X_t^i) =: ĥ^(N). Note that the control function needs to be obtained for each value of time t.

The contributions of this paper are as follows:

• Variational formulation. The particle system (2)-(3) is obtained by considering a certain variational problem based on a time-stepping iterative procedure that is introduced in Sec. II. At each time step, the procedure seeks to minimize the expected value of the function h, referred to as the energy. The dynamics of the particle system (2) may thus be regarded as a gradient flow, or a steepest descent, for the energy. It is noted that the particle dynamics are entirely deterministic, without any selection or reproduction.

• Convergence. We show that the particle system solves the optimization problem (1) in the following sense: Suppose p(x, 0) > 0 and the control function u is obtained as a solution of (3); then

  ∫ h(x) p(x,t) dx → h(x̄) as t → ∞.    (4)

This implies that the empirical distribution of the population concentrates at the unique global minimizer x̄.

• Numerical algorithm. We present a Galerkin algorithm to approximate the control function u(x,t). The algorithm is completely adapted to data (that is, it does not require an explicit construction of p(x,t) or computation of derivatives). Certain closed-form approximations for the control function are also presented. The conclusions are illustrated with a numerical example.

We next present a brief review of the relevant literature.

Literature review: Population-based methods, including interacting particle algorithms, are widely used for global optimization. Examples include genetic algorithms, differential evolution, and controlled random search (see the recent survey articles [2], [1]). Whereas these algorithms can lack a model for how the particle distribution evolves, the interacting particle algorithm presented in [8] extends the simulated annealing algorithm via a Feynman-Kac construction with Markov exploration and selection kernels. The selection favors particles with a higher Metropolis ratio (see also [7]). More recently, model-based random search methods (see [13]) have been developed. These methods update the particles according to a prescribed reference distribution (model) whose evolution has the desired convergence properties. Various methods have been proposed based on the choice of the reference distribution and its evolution: i) the


cross entropy method ([9]) uses an optimal importance sampling distribution, ii) the model reference adaptive search in [4] updates the reference distribution via an application of Bayes' recursion, iii) a particle filtering algorithm appears in [12], and iv) the reference model in [10] evolves according to the replicator dynamics, which is closely related to our work. Since the reference distributions may be implicit, particles are often sampled from a parameterized distribution that, at each iteration, approximates (e.g., with respect to the Kullback-Leibler divergence) the reference distribution; cf. [5].

The outline of the remainder of this paper is as follows. The variational time-stepping procedure appears in Sec. II. The equations for the density and the particle system are described in Sec. III and Sec. IV, respectively. Finally, Sec. V provides a numerical example. The proofs appear in the Appendix.

Notation: C^k is used to denote the space of k-times continuously differentiable functions on R^d; C_c^k denotes the subspace of functions in C^k with compact support; L^∞ denotes the space of bounded functions on R^d; L²(R^d; ρ) is used to denote the Hilbert space of functions on R^d that are square-integrable with respect to the density ρ; and H^k(R^d; ρ) is used to denote the Hilbert space of functions whose first k derivatives (defined in the weak sense) are in L²(R^d; ρ). For a function f, ∇f = (∂f/∂x_i) is used to denote the gradient and D²f = (∂²f/∂x_i ∂x_j) is used to denote the Hessian. The derivatives are interpreted in the weak sense.

II. TIME-STEPPING PROCEDURE

The considerations of this section – construction of a gradient flow – are motivated by the optimal transport literature, in particular the work of Otto and co-workers on the variational interpretation of the Fokker-Planck-Kolmogorov equation [6]. The time-stepping scheme involves a sequence of minimization problems in the space of probability densities,

  P := { ρ : R^d → [0, ∞) meas. density | M(ρ) < ∞ },

where M(ρ) := ∫ |x|² ρ(x) dx. We consider a finite time interval [0, T] with an associated discrete-time sequence {0, t_1, t_2, ..., t_L}, with t_L = T, and denote Δt_l := t_l − t_{l−1}. Set ρ_0 = p*_0 ∈ P and inductively define {ρ_l}_{l=1}^L ⊂ P by taking ρ_l ∈ P to minimize the functional

  I_l(ρ) := M(ρ | ρ_{l−1}) + Δt_l ∫ h(x) ρ(x) dx,    (5)

where E(ρ) := ∫ h(x) ρ(x) dx is the energy associated with the objective function h(x). The following choices for the (pseudo-) metric M(ρ | ρ_{l−1}) are considered:

1) Kullback-Leibler (K-L) divergence:

  D(ρ | ρ_{l−1}) = ∫ ρ(x) ln( ρ(x) / ρ_{l−1}(x) ) dx.    (6)

2) Wasserstein distance:

  (1/2) W_2²(ρ | ρ_{l−1}) = inf_{p ∈ P(ρ_{l−1}, ρ)} (1/2) ∫_{R^d × R^d} |x − y|² p(dx dy),    (7)

with P(ρ_{l−1}, ρ) as the set of all probability measures whose first marginal is ρ_{l−1} and second marginal is ρ.

3) Convex combination of the two:

  ε (1/2) W_2²(ρ | ρ_{l−1}) + (1 − ε) D(ρ | ρ_{l−1}),    (8)

where ε ∈ (0, 1) is a small parameter. The third case is ultimately of more interest – the related result follows from combining the results for the first and the second cases.

The sequence of minimizers {ρ_l}_{l=0}^L is used to construct, via interpolation, a density function ρ^(L)(x,t) for t ∈ [0, T]: define ρ^(L)(x,t) by setting

  ρ^(L)(x,t) := ρ_l(x), for t ∈ [t_l, t_{l+1}),    (9)

for l = 0, 1, 2, ..., L − 1. Sec. III is concerned with convergence analysis in the limit Δt_l → 0.

Remark 1: The minimizer of (5) with only the K-L divergence is given by the following recursion, related to the stochastic annealing algorithm:

  ρ_0(x) = p*_0(x) ∈ P,
  ρ_l(x) = ρ_{l−1}(x) exp(−Δt_l h(x)) / ∫ ρ_{l−1}(y) exp(−Δt_l h(y)) dy.    (10)

Indeed, by Jensen's formula,

  I_l(ρ) ≥ − ln( ∫ ρ_{l−1}(y) exp(−Δt_l h(y)) dy ),

with equality if and only if ρ = ρ_l.
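The recursion (10) is easy to check numerically on a grid. The following minimal sketch is not part of the paper; the grid, the time step, the number of steps, and the initial density are assumptions chosen for illustration, using the double-well objective of Sec. V. It shows the mass concentrating near the global minimizer.

```python
import numpy as np

# Minimal sketch of the K-L-only recursion (10): rho_l ∝ rho_{l-1} * exp(-dt * h).
# Grid, time step, number of steps, and initial density are illustrative assumptions.
x = np.linspace(-3.0, 3.0, 601)                  # 1-D grid
dx = x[1] - x[0]
h = -2 * x**2 + x**4 - 0.5 * x                   # double-well objective of Sec. V

rho = np.exp(-0.5 * ((x + 0.8) / 0.5) ** 2)      # initial density ~ N(-0.8, 0.5^2)
rho /= rho.sum() * dx                            # normalize to a probability density

dt, num_steps = 0.05, 200
for _ in range(num_steps):
    rho = rho * np.exp(-dt * h)                  # unnormalized update in (10)
    rho /= rho.sum() * dx                        # renormalize

print("mean of rho after", num_steps, "steps:", (x * rho).sum() * dx)
# The mean approaches the global minimizer x_bar ≈ 1.057 as t = dt * num_steps grows.
```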

Although the optimizer is known for the K-L divergence, a careful look at the first-order optimality equations associated with ρ_l leads to i) the PDE (13) for the evolution of the density (in Sec. III), and ii) a particle algorithm with mean-field control that approximates the density (in Sec. IV). The analysis with different choices of M(ρ | ρ_{l−1}) in (5) yields different algorithms.

Throughout the paper, the following assumptions are made for the prior distribution p*_0 and for the function h:

Assumption A1: The probability density p*_0 ∈ P is of the form p*_0(x) = e^{−G_0(x)}, where G_0 ∈ C², ∇G_0 = O(|x|) as |x| → ∞, and D²G_0 ∈ L^∞.

Assumption A2: The function h ∈ C² with h, ∇h, D²h ∈ L^∞.

III. VARIATIONAL ANALYSIS

A. First order optimality condition

The analysis proceeds by first obtaining the first variation, as described in the following lemma. The proof appears in Appendix A.

Lemma 1: Consider the minimization problem (5) under Assumptions (A1)-(A2). The minimizer ρ_l satisfies the following Euler-Lagrange equation:


(i) If M(ρ | ρ_{l−1}) = D(ρ | ρ_{l−1}), the K-L divergence defined in (6), then ρ_l satisfies

  ∫ ( ∇·ς + ∇ln(ρ_{l−1})·ς − Δt_l ∇h·ς ) ρ_l dx = 0,    (11)

for each vector field ς ∈ L²(R^d; ρ_{l−1}).

(ii) If M(ρ | ρ_{l−1}) = (1/2) W_2²(ρ | ρ_{l−1}), the Wasserstein distance defined in (7), then ρ_l satisfies

  ∫_{R^d × R^d} (y − x)·ς(y) p(dx dy) + Δt_l ∫ ∇h·ς ρ_l dx = 0,    (12)

for each vector field ς ∈ L²(R^d; ρ_{l−1}). Here, p ∈ P is the infimizer in the definition of W_2²(ρ_{l−1} | ρ_l) (see (7)).

We are now prepared to state the main theorem concerning ρ^(L)(x,t), which was obtained from an interpolation of the minimizers {ρ_l}_{l=0}^L. The proof appears in Appendix B.

Theorem 1: Let {ρ_l}_{l=0}^L be the sequence of minimizers of the time-stepping procedure (5) and ρ^(L)(x,t) be the interpolated density defined in (9). Denote by ρ(x,t) the limiting density of ρ^(L)(x,t) as Δt_l → 0 (we assume that such a density exists). Then:

(i) If M = D, the K-L divergence, the density ρ(x,t) is a weak solution of the replicator equation:

  ∂ρ/∂t (x,t) = −(h(x) − ĥ_t) ρ(x,t),    (13)

where ĥ_t := ∫ h(x) ρ(x,t) dx.

(ii) If M = (1/2) W_2², the Wasserstein metric, the density ρ(x,t) is a weak solution of the Fokker-Planck equation,

  ∂ρ/∂t (x,t) = ∇·( ρ(x,t) ∇h(x) ).    (14)

Corollary 1: Suppose M is a convex combination of the K-L divergence and the Wasserstein metric (see (8)) with a fixed ε ∈ (0, 1). Denote p*(x,t) := ρ(x,t), the limiting density described in Theorem 1. Then p*(x,t) is a weak solution of the evolution equation

  ∂p*/∂t (x,t) = ε ∇·( p*(x,t) ∇h(x) ) − (1 − ε)(h(x) − ĥ_t) p*(x,t),    (15)

where now ĥ_t = ∫ h(x) p*(x,t) dx.

B. Convergence

Next we analyze the convergence of the density evolution in (15). The dynamics can be interpreted as the gradient flow of the energy E(p*) with respect to a pseudo-metric which, in this case, is a convex combination of the Wasserstein distance and the K-L divergence. Hence, it is natural to take E(p*) as the Lyapunov function for (15). The related result is summarized in the next theorem; the proof appears in Appendix C.

Theorem 2: Let the density p*(x,t) satisfy (15). Then:

  ∫ h(x) p*(x,t) dx → h(x̄) as t → ∞.    (16)

If the global minimizer x̄ is unique, this implies that the density p*(x,t) asymptotically concentrates at the global minimizer x̄.

IV. PARTICLE SYSTEM

A. Particle dynamics and control architecture

The particle system is comprised of N coupled stochastic processes {X_t^i}_{i=1}^N, whose dynamics evolve according to

  dX_t^i = −ε ∇h(X_t^i) dt + (1 − ε) u(X_t^i, t) dt,    (17)

where the control function u(x,t) ∈ R^d. The initial conditions {X_0^i}_{i=1}^N are i.i.d. and drawn from the initial distribution p*(x, 0). We denote the distribution of X_t^i by p(x,t), and impose an admissibility requirement on the control:

Definition 1 (Admissible control): The control input u(X_t^i, t) is admissible if the random variable u(x,t) satisfies E[|u|²] := E[ Σ_{i=1}^d |u_i(x,t)|² ] < ∞, for all t ∈ [0, T].

Note that there are two types of distributions of interest in our analysis:
1) p*(x,t): the desired distribution, evolving according to (15).
2) p(x,t): the distribution of X_t^i.

The control function u(x,t) is said to be optimal if p* ≡ p. That is, given p*(·, 0) = p(·, 0), our goal is to choose u in the particle dynamics so that these distributions coincide. The forward equation for p*(x,t) is given by (15). The forward equation for p(x,t) is the Fokker-Planck equation for the particle dynamics (17):

  ∂p/∂t (x,t) = ε ∇·( p(x,t) ∇h(x) ) − (1 − ε) ∇·( p(x,t) u(x,t) ).    (18)

B. Optimal control and consistency

The optimal control input is chosen by inspection. In particular, comparison of (15) and (18) suggests the choice u(x,t) = ∇φ(x,t), where φ(x,t) is the solution of the Poisson equation:

  ∇·( p(x,t) ∇φ(x,t) ) = (h(x) − ĥ) p(x,t),
  ∫ φ(x,t) p(x,t) dx = 0.    (19)

The normalization ∫ φ(x,t) p(x,t) dx = 0 is for convenience: if φ° is a solution of the Poisson equation above, we obtain the desired normalization by subtracting its mean.

For the purposes of both analysis and numerics, it is useful to introduce the weak formulation of the BVP (19). Denote

  H_0^1(R^d; p) := { φ ∈ H^1(R^d; p) : ∫ φ p dx = 0 }.

A function φ ∈ H_0^1(R^d; p) is said to be a weak solution of the BVP (19) if

  ∫ ∇φ(x,t)·∇ψ(x) p(x,t) dx = − ∫ (h(x) − ĥ) ψ(x) p(x,t) dx,    (20)

for all ψ ∈ H^1(R^d; p). Denoting E[·] := ∫ · p(x,t) dx, the weak form of the BVP (19) can also be expressed as


  E[∇φ·∇ψ] = −E[(h − ĥ)ψ],  ∀ ψ ∈ H^1(R^d; p).    (21)

This representation is useful for the numerical algorithm described in Sec. IV-C.

The proof of the following consistency theorem appears in Appendix D. The main part of the proof is to show that the control function, obtained by solving the BVP (19), is admissible.

Theorem 3: Consider the two distributions p*(x,t) and p(x,t) evolving according to (15) and (18), respectively. Suppose Assumptions (A1)-(A2) hold, and the control u(x,t) is chosen based on (19). Then, provided p*(·, 0) = p(·, 0), we have for all t ≥ 0,

  p*(·,t) = p(·,t).
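For intuition about what the BVP (19) computes, the following minimal sketch (not from the paper) solves (19) on a one-dimensional grid by finite differences; the grid, the density p, the objective h, and the zero-flux boundary treatment are assumptions chosen for this illustration. The Galerkin procedure of Sec. IV-C below is what the algorithm actually uses, since it avoids constructing p(x,t) explicitly.

```python
import numpy as np

# Illustration only (not from the paper): a finite-difference solve of the Poisson
# equation (19) in one dimension, d/dx( p(x) dphi/dx ) = (h(x) - h_hat) p(x),
# with zero-flux boundaries and the normalization  integral( phi * p ) dx = 0.
# The grid, the density p, and h below are assumptions chosen for this sketch.

x = np.linspace(-3.0, 3.0, 401)
dx = x[1] - x[0]
h = -2 * x**2 + x**4 - 0.5 * x                 # double-well objective from Sec. V
p = np.exp(-0.5 * ((x + 0.8) / 0.5) ** 2)      # Gaussian-like density ~ N(-0.8, 0.5^2)
p /= p.sum() * dx                              # normalize to a probability density

h_hat = (h * p).sum() * dx                     # h_hat = E[h]
rhs = (h - h_hat) * p                          # right-hand side of (19)

n = x.size
A = np.zeros((n, n))
p_half = 0.5 * (p[:-1] + p[1:])                # density at cell interfaces
for j in range(n):
    if j > 0:                                  # flux through the left interface
        A[j, j - 1] += p_half[j - 1] / dx**2
        A[j, j]     -= p_half[j - 1] / dx**2
    if j < n - 1:                              # flux through the right interface
        A[j, j + 1] += p_half[j] / dx**2
        A[j, j]     -= p_half[j] / dx**2
# Missing fluxes at the two domain ends encode the zero-flux boundary condition.

phi = np.linalg.lstsq(A, rhs, rcond=None)[0]   # singular but (nearly) consistent system
phi -= (phi * p).sum() * dx                    # enforce integral( phi * p ) dx = 0
u = np.gradient(phi, dx)                       # control function u = dphi/dx
```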

C. Galerkin method for solving the BVP

In this section, a Galerkin algorithm is described to construct an approximate solution of (19). The time t is fixed. For notational ease, the explicit dependence on time is suppressed (that is, p(x,t) is denoted as p(x), φ(x,t) as φ(x), u(x,t) as u(x), etc.). The function φ is approximated as

  φ(x) = Σ_{k=1}^K α_k ψ_k(x),

where {ψ_k(x)}_{k=1}^K are basis functions. The finite-dimensional approximation of (21) is to choose constants {α_k}_{k=1}^K such that

  Σ_{k=1}^K α_k E[∇ψ_k · ∇ψ] = −E[(h − ĥ)ψ],  ∀ ψ ∈ S,    (22)

where S := span{ψ_1, ψ_2, ..., ψ_K} ⊂ H^1(R^d; p). Denoting [A]_{mk} = E[∇ψ_k · ∇ψ_m], b_k = −E[(h − ĥ)ψ_k], α = (α_1, α_2, ..., α_K)^T, the finite-dimensional approximation (22) is expressed as a linear matrix equation:

  A α = b.    (23)

The matrix A and vector b are approximated by using only the particles:

  [A]_{mk} = E[∇ψ_k · ∇ψ_m] ≈ (1/N) Σ_{i=1}^N ∇ψ_k(X_t^i) · ∇ψ_m(X_t^i),
  b_k = −E[(h − ĥ)ψ_k] ≈ −(1/N) Σ_{i=1}^N (h(X_t^i) − ĥ^(N)) ψ_k(X_t^i),

where recall ĥ ≈ (1/N) Σ_{i=1}^N h(X_t^i) =: ĥ^(N). The important point to note is that the control function can be expressed in terms of certain averages taken over the population.
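A minimal particle-based implementation of (22)-(23) can be sketched as follows. This is an illustrative sketch rather than the author's code; the function and variable names (basis, galerkin_control) are hypothetical, and the basis is the one-dimensional Fourier-type basis reported in Sec. V (note 2π/4 = π/2). The expectations are replaced by particle averages exactly as in the displays above.

```python
import numpy as np

# Sketch of the Galerkin approximation (22)-(23), assuming the 1-D basis
# {x, cos(pi*x/2), sin(pi*x/2)} reported in Sec. V.  All expectations are
# replaced by averages over the particle positions X (shape (N,)).

def basis(X):
    """Return psi_k(X) and their derivatives d/dx psi_k(X), each of shape (N, K)."""
    psi = np.stack([X, np.cos(np.pi * X / 2), np.sin(np.pi * X / 2)], axis=1)
    dpsi = np.stack([np.ones_like(X),
                     -np.pi / 2 * np.sin(np.pi * X / 2),
                     np.pi / 2 * np.cos(np.pi * X / 2)], axis=1)
    return psi, dpsi

def galerkin_control(X, h):
    """Approximate u = grad(phi) at the particle locations (hypothetical helper)."""
    psi, dpsi = basis(X)
    hX = h(X)
    h_hat = hX.mean()                                  # h_hat^(N) in the text
    A = dpsi.T @ dpsi / X.size                         # [A]_mk ≈ E[ psi_k' psi_m' ]
    b = -((hX - h_hat)[:, None] * psi).mean(axis=0)    # b_k ≈ -E[(h - h_hat) psi_k]
    alpha = np.linalg.solve(A, b)                      # A alpha = b, equation (23)
    return dpsi @ alpha                                # u(X^i) = sum_k alpha_k psi_k'(X^i)
```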

V. NUMERICAL EXPERIMENTS

Consider the minimization problem for the double-well function h(x) = −2x² + x⁴ − 0.5x, depicted in Fig. 1(a). The global minimizer is x̄ = 1.057, while x ≈ −1 is a local minimizer. In the numerical simulation, the initial condition of the particles is sampled from the Gaussian distribution N(−0.8, 0.5²). The sample size is N = 1000. The particles are also depicted in Fig. 1(a). Note that there are relatively more particles close to the local minimizer at −1 than to the global minimizer x̄. The control function is approximated by using the Galerkin procedure with the Fourier basis functions

  { x, cos(2πx/4), sin(2πx/4) }.

The parameter ε = 0.02. Fig. 1(b) depicts the convergence of particles to the global minimizer x̄.

Fig. 1. (a) The double-well function and the initial distribution of particles. (b) Particles (scattered vertically at each time) concentrate at the global minimizer quickly.
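The experiment of this section can be reproduced with a short Euler discretization of (17), reusing the galerkin_control sketch given after (23). This is again an illustrative sketch under assumptions: the time step and the simulation horizon are not stated in the paper and are chosen here arbitrarily.

```python
import numpy as np

# Sketch of the experiment of Sec. V: Euler discretization of the particle
# dynamics (17) for the double-well objective.  Step size dt and horizon T
# are assumptions (not reported in the paper).

def h(x):
    return -2 * x**2 + x**4 - 0.5 * x

def grad_h(x):
    return -4 * x + 4 * x**3 - 0.5

rng = np.random.default_rng(0)
N, eps, dt, T = 1000, 0.02, 0.01, 10.0
X = rng.normal(-0.8, 0.5, size=N)                      # initial particles ~ N(-0.8, 0.5^2)

for _ in range(int(T / dt)):
    u = galerkin_control(X, h)                         # control from the sketch after (23)
    X = X + (-eps * grad_h(X) + (1 - eps) * u) * dt    # Euler step of (17)

print("population mean:", X.mean())                    # expected to approach x_bar ≈ 1.057
```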

ACKNOWLEDGEMENT

The author is grateful to Prof. Prashant Mehta and Prof. Sean Meyn for several discussions related to the subject of this paper.

APPENDIX

A. Proof of Lemma 1

We follow the approach proposed in [6] and introduce a flux {Φ_τ}_{τ≥0} to generate the first variation of the minimizer ρ_l for the functional (5): at time τ define ρ_τ := Φ_τ# ρ_l as the push-forward of ρ_l. Φ_τ is generated by a vector field ς : R^d → R^d,

  dΦ_τ/dτ = ς(Φ_τ),  Φ_0(x) = x,    (24)

where for now we assume ς ∈ C_c^1. Thus, for an arbitrary function f,

  ∫ f(z) ρ_τ(z) dz = ∫ f(Φ_τ(x)) ρ_l(x) dx,    (25)
  |DΦ_τ(x)| ρ_τ(Φ_τ(x)) = ρ_l(x),    (26)

where D· denotes the Jacobian of a map and |·| denotes the determinant of a matrix. Denote i(τ) = I_l(ρ_τ); then the first-order optimality condition reads

  d/dτ i(τ) |_{τ=0} = 0.    (27)

We evaluate the left-hand side term by term according to (5).

(i) M(ρ | ρ_{l−1}) = D(ρ | ρ_{l−1}):

  i(τ) = ∫ ρ_τ ln ρ_τ dz − ∫ ρ_τ ln ρ_{l−1} dz + Δt_l ∫ h ρ_τ dz.    (28)

For the first integral, using (25) and (26),

  ∫ ρ_τ(z) ln ρ_τ(z) dz = ∫ ln[ρ_τ(Φ_τ(x))] ρ_l(x) dx = C − ∫ ln[|DΦ_τ(x)|] ρ_l(x) dx,

where C = ∫ ρ_l(x) ln ρ_l(x) dx is independent of τ. Then

  d/dτ |_{τ=0} ∫ ρ_τ(z) ln ρ_τ(z) dz = − d/dτ |_{τ=0} ∫ ln[|DΦ_τ(x)|] ρ_l(x) dx = − ∫ ∇·ς(x) ρ_l(x) dx.

Here the interchange of differentiation and integration is justified because ς has compact support, Φ_τ(x) = Φ_0(x) = x outside this support, and the difference quotient (1/τ)( ln[|DΦ_τ(x)|] − ln[|DΦ_0(x)|] ) converges uniformly to d/dτ ln[|DΦ_τ(x)|] |_{τ=0} = ∇·ς(x), as τ ↓ 0. Calculations for the remaining terms in (28) are similar and thus omitted. On substituting these results into (27) we obtain (11). Finally, to extend the E-L equation to an arbitrary vector field ς in L²(R^d; ρ_{l−1}), it suffices to approximate ς by a sequence of smooth, compactly supported vector fields.

(ii) For the derivation of (12) with M(ρ | ρ_{l−1}) = (1/2) W_2²(ρ | ρ_{l−1}), see [6].

B. Proof of Theorem 1

(i) For the case M(ρ | ρ_{l−1}) = D(ρ | ρ_{l−1}), we begin with some preliminaries. Under Assumption (A1) (see Sec. II), the density ρ_0 is known to satisfy a logarithmic Sobolev inequality for some constant λ_0 > 0. That is, for all functions f on R^d with ∫ f² ρ_0 dx = 1,

  ∫ f² ln f² ρ_0 dx ≤ (1/(2λ_0)) ∫ |∇f|² ρ_0 dx.    [LSI(λ_0)]

Furthermore, the following lemma asserts that all minimizers ρ_l satisfy a uniform logarithmic Sobolev inequality. The proof is omitted.

Lemma 2: For {ρ_l}_{l=0}^L obtained in (10), under Assumptions (A1)-(A2), we have:
1) ρ_l is of the form ρ_l = e^{−G_l}, where G_l ∈ C², ∇G_l = O(|x|) as |x| → ∞, and D²G_l ∈ L^∞.
2) There is λ̄ > 0 such that ρ_l satisfies LSI(λ̄) for all l.
3) If g ∈ L²(R^d; ρ_{l−1}), then g ∈ L²(R^d; ρ_l) with

  ∫ |g|² ρ_l dx ≤ exp(2 Δt_l ‖h‖_∞) ∫ |g|² ρ_{l−1} dx.    (29)

Equation (29) implies that, if ρ_{l−1} ∈ P, then ρ_l ∈ P (take g(x) = x). By induction, ρ_l ∈ P for all l if ρ_0 ∈ P.

Now we derive the result in (i). First, (11) can be compactly written as

  ∫ (ρ_l / ρ_{l−1}) ∇·(ρ_{l−1} ς) dx − Δt_l ∫ ∇h·ς ρ_l dx = 0.    (30)

Given a test function f ∈ C_c^0, f ∈ L²(R^d; ρ_l) for all l ∈ {0, 1, 2, ..., L}, with a uniform bound,

  ‖f‖_{L²(R^d; ρ_l)} ≤ C_0.    (31)

Let ξ_l ∈ L²(R^d; ρ_{l−1}) be the solution of

  ∇·(ρ_{l−1} ξ_l) = (f − f̂_{l−1}) ρ_{l−1},    (32)

where f̂_{l−1} := ∫ f ρ_{l−1} dx. The existence and uniqueness of such a solution is proved in [11], where the following a-priori bound is also shown:

  ∫ |ξ_l|² ρ_{l−1} dx ≤ (const.) ∫ |f − f̂_{l−1}|² ρ_{l−1} dx ≤ C_0.    (33)

Substituting (32) into (30) with ς = ξ_l, we obtain

  f̂_l − f̂_{l−1} = Δt_l ∫ ∇h·ξ_l ρ_l dx.    (34)

On summing,

  f̂_L = f̂_0 + Σ_{l=1}^L Δt_l ∫ ∇h·ξ_l ρ_l dx.    (35)

The integral of the summand is well-defined since ξ_l ∈ L²(R^d; ρ_l) (by Lemma 2-(3)) and ∇h ∈ L^∞. Now let η_l ∈ L²(R^d; ρ_{l−1}) be a solution of

  ∇·(ρ_{l−1} η_l) = ( ∇h·ξ_l − ∫ ∇h·ξ_l ρ_{l−1} dx ) ρ_{l−1}.    (36)

By repeating the arguments now for η_l, the fact that ∇h ∈ L^∞ and relation (33) imply

  ∫ |η_l|² ρ_{l−1} dx ≤ C_1,    (37)

where the constant C_1 is independent of l. Employing (30) once more with ς = η_l,

  Δt_l ∫ ∇h·ξ_l ρ_l dx = Δt_l ∫ ∇h·ξ_l ρ_{l−1} dx + E_l,    (38)

where E_l = ( ∫ ∇h·η_l ρ_l dx ) Δt_l². Integrating by parts and using (32), the first integral on the right-hand side becomes

  ∫ ∇h·ξ_l ρ_{l−1} dx = − ∫ h (f − f̂_{l−1}) ρ_{l−1} dx.    (39)

Substituting (38) and (39) into (35), we have

  f̂_L = f̂_0 − Σ_{l=1}^L Δt_l ∫ h (f − f̂_{l−1}) ρ_{l−1} dx + Σ_{l=1}^L E_l.    (40)

To pass to the limit Δt_l ↓ 0 (i.e., L → ∞), we claim that Σ_{l=1}^L |E_l| → 0. Indeed, the uniform bound in (37), in conjunction with Lemma 2-(3) and the assumption ∇h ∈ L^∞, leads to

  |E_l| ≤ Δt_l² ( ∫ |η_l|² ρ_l dx )^{1/2} ( ∫ |∇h|² ρ_l dx )^{1/2} ≤ C_2 exp(Δt_l ‖h‖_∞) Δt_l²,

for some C_2 independent of l. Now denote

  ρ(x,t) := (const.) exp(−h(x) t) p*_0(x),

where (const.) is some normalization constant (see (10)). Then, as Δt_l ↓ 0, ρ^(L)(x,t) → ρ(x,t) a.s., and by (40),

  f̂_t = f̂_0 − ∫_0^t ∫ h(x)( f(x) − f̂_s ) ρ(x,s) dx ds
      = f̂_0 − ∫_0^t ∫ ( h(x) − ĥ_s ) f(x) ρ(x,s) dx ds,

for all f ∈ C_c^0, showing that ρ(x,t) is a weak solution of the replicator equation (13).

For part (ii), cf. [6].

C. Proof of Theorem 2

Central to the proof is constructing a Lyapunov function that decreases over time. A natural choice is E(p*) = ∫ h p* dx. A direct calculation, using (15), shows:

  d/dt E(p*) = ∫ h (∂p*/∂t) dx
             = ε ∫ h ∇·(p* ∇h) dx − (1 − ε) ∫ h (h − ĥ) p* dx
             = − ∫ p* [ ε|∇h|² + (1 − ε)(h − ĥ)² ] dx,

which is strictly negative for all distributions p* > 0. Moreover, E(p*) is bounded below:

  E(p*) = ∫ h p* dx ≥ ∫ h(x̄) p* dx = h(x̄).

As a consequence, we have E(p*) → h(x̄) as t → ∞.

D. Proof of Theorem 3

If u(·,t) is admissible, the consistency is straightforward: substituting the first equality in (19) into (18) with u(x,t) = ∇φ(x,t), we get precisely the same evolution dynamics as (15). It remains to show uniqueness and admissibility of u(x,t).

By Assumptions (A1)-(A2), Lemma 2 and Theorem 1, we have for all time t, p = e^{−G(x,t)}, where G ∈ C², ∇G = O(|x|) as |x| → ∞, and D²G ∈ L^∞. Moreover, there is some λ > 0 such that p satisfies LSI(λ) uniformly in t. Then p is known to admit a spectral gap (or Poincaré inequality) with some constant λ̃ > 0, and as shown in [11], equation (19) has a unique solution φ ∈ H_0^1(R^d; p) satisfying

  ∫ |∇φ|² p dx ≤ (1/λ̃) ∫ |h − ĥ|² p dx,    (41)

which implies

  E[|u|²] = ∫ |∇φ|² p dx ≤ (1/λ̃) ∫ |h − ĥ|² p dx ≤ (1/λ̃) ∫ |h|² p dx.    (42)

If we assume h ∈ L²(R^d; p*_0), then Lemma 2-(3) and (42) ensure that the control u(x,t) obtained from (19) is admissible.

REFERENCES

[1] M. M. Ali, C. Khompatraporn, and Z. B. Zabinsky. A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. Journal of Global Optimization, 31(4):635–672, 2005.
[2] M. M. Ali and A. Törn. Population set based global optimization algorithms: Some modifications and numerical studies. Computers and Operations Research, 31(10):1703–1725, 2004.
[3] V. S. Borkar, N. Jayakrishnan, and S. Nalli. Manufacturing consent. In Proceedings of the 2010 48th Annual Allerton Conference on Communication, Control, and Computing, pages 1550–1555, September 2010.
[4] J. Hu, M. C. Fu, and S. I. Marcus. A model reference adaptive search method for stochastic global optimization. Communications in Information and Systems, 8(3):245–276, 2008.
[5] J. Hu, Y. Wang, E. Zhou, M. C. Fu, and S. I. Marcus. A survey of some model-based methods for global optimization. In Optimization, Control, and Applications of Stochastic Systems: In Honor of Onésimo Hernández-Lerma, Systems & Control: Foundations & Applications, pages 157–179. Birkhäuser, 2012.
[6] R. Jordan, D. Kinderlehrer, and F. Otto. The variational formulation of the Fokker-Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.
[7] O. Molvalioglu, Z. B. Zabinsky, and W. Kohn. Meta-control of an interacting-particle algorithm for global optimization. Nonlinear Analysis: Hybrid Systems, 4(4):659–671, 2010.
[8] P. Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer, New York, 2004.
[9] R. Rubinstein. The cross-entropy method for combinatorial and continuous optimization. Methodology and Computing in Applied Probability, 1(2):127–190, 1999.
[10] Y. Wang and M. C. Fu. Model-based evolutionary optimization. In Proceedings of the 2010 Winter Simulation Conference, pages 1199–1210, December 2010.
[11] T. Yang, R. S. Laugesen, P. G. Mehta, and S. P. Meyn. Multivariable feedback particle filter. In Proceedings of the IEEE Conference on Decision and Control, pages 4063–4070, December 2012.
[12] E. Zhou, M. C. Fu, and S. I. Marcus. A particle framework for randomized optimization algorithms. In Proceedings of the 2008 Winter Simulation Conference, pages 647–654, December 2008.
[13] M. Zlochin, M. Birattari, N. Meuleau, and M. Dorigo. Model-based search for combinatorial optimization: A critical survey. Annals of Operations Research, 131(1-4):373–395, 2004.
