Adaptive Importance Sampling Strategies - CERMICS

Adaptive Importance Sampling Strategies Tony LELIEVRE, Mathias ROUSSET, Gabriel STOLTZ MICMAC project team, INRIA & CERMICS, Ecole des Ponts et Chauss´ees http://cermics.enpc.fr/∼stoltz/

The metastability problem

Convergence of the Adaptive Biasing Force method [6]

• Applications in computational physics and Bayesian statisitics, when some high dimensional probability measure has to be sampled Z • Measure to be sampled µ(dq) = Z −1 e−βV (q) dq with Z = e−βV (q) dq r D 2 dWt, ensemble averages can be approximated by • For an ergodic dynamics such as dqt = −∇V (qt) dt + β trajectorial averages: Z Z T 1 hAi = A(q) µ(dq) = lim A(qt) dt (1) T →+∞ T 0 D • Although the convergence (1) is theoretically ensured, it can be very slow from a numerical viewpoint • Metastability arises from free-energy barriers, which can have either energetic or entropic origins

• Dynamics

 q    dqt = −∇ V − Ft ◦ ξ − β −1 ln(|∇ξ|−2) (qt) |∇ξ|−2(qt) dt + 2β −1|∇ξ|−1(qt) dWt, Z ξ (t, z) ′  F (z) = E f (X ) ξ(X ) = z = f (q) dψ  t t  t Σ(z)

• Existence and uniqueness of the solution [3]  ! −1  ∇ V − Ft ◦ ξ ψ + β ∇ψ    , ∂tψ = div  −2  |∇ξ|   Z  • Nonlinear PDE on the law of qt: f |∇ξ|−1ψ(t, ·)dσΣ(z)   Σ(z) ′  Z . F (z) =  t   −1ψ(t, ·)dσ  |∇ξ|  Σ(z)  Σ(z)

• Expected longtime limits: Ft(z) → F (z), ψt(q) → ψ∞(q) = e−β(V (q)−F (ξ(q)))

• Proof using entropy estimates. The relative entropy of µ with respect to ν is H(µ|ν) =

Free-energy biased sampling • Consider a function ξ : D → Rm (m ≪ dim(D)) such that ξ(qt) is some slowly evolving degree of freedom (a notion to be precised...) • Marginal equilibrium distribution (m = 1 to simplify)

Σ(z)

e−βV (q) δξ(q)−z (dq) dz = Z −1

Z

e−βV (q)

Σ(z)

dσΣ(z)(dq) |∇ξ(q)|

where the macroscopic and microscopic entropies are respectively Z ξ ξ Em(t) = EM (t) = H ψ (t, ·) ψ∞ , em(t, z) ψ ξ (t, z) dz

dz = e−βF (z) dz,

M

where Σ(z) = q ∈ D ξ(q) = z is a submanifold of D

• Conditional equilibrium distribution

−βV (q) |∇ξ(q)|−1 σ e Σ(z)(dq) ξ ν (dq | z) = Z e−βV |∇ξ|−1 dσΣ(z)

• Mean force

∇F (z) =

f (q) ν ξ (dq | z)

∇ξ ∇ξ · ∇V −1 − β div f= 2 |∇ξ| |∇ξ|2

with

Σ(z)

• Application: entropic barrier 8

8

4

4

x coordinate

x coordinate

y coordinate

y coordinate

3.

−4

0

−4

0.

x coordinate

−8

−6

−2

2

6

8

−8 0.0

x coordinate

5000

10000

15000

20000

ξ ξ em(t, z) = H ν (t, · | z) ν (∞, · | z)

• The overall rate of convergence of the method is limited by the rate of convergence of the projected dynamics, so that the metastability in the ξ direction is removed

• When the potential is biased by the free-energy, V(q) = V (q) − F (ξ(q)), the new marginal distribution is constant = uniform sampling in ξ and the metastability is removed in this direction!

0

dµ dµ. Then ln dν

• The marginal density satisfies a simple diffusion equation ∂tψ ξ = β −1∂zz ψ ξ , therefore EM → 0 • Control of the microscopic entropy when assuming some uniform ergodicity for the dynamics at fixed ξ(q) = z (logarithmic Sobolev inequality)

Σ(z)

Z

E(t) = H(ψ(t, ·) | ψ∞) = EM (t) + Em(t),

−8 0

5000

Time

10000

15000

20000

Time

(1) Potential for which entropic barriers have to be overcome (0 in the region enclosed by the curve, +∞ outside) and (2) associated free energy profile when ξ(x, y) = x. Typical trajectories for a simple Metropolis random walk (3) and a dynamics biased by the free energy (4).

Application to Statistical Physics: Mulitple replica & Selection • Model dimer in a solvent (double-well potential), solvent particles interacting through the purely repulsive potential (truncated Lennard-Jones):  #2 " 12 6 σ σ  (r − σ − w)2 4ε − + ǫ if r ≤ σ, VWCA(r) = Vdimer(r) = h 1 − r r 2  w 0 if r > σ,

• Two energy minima (compact state r = r0 = 21/6σ, stretched state r = r0 + 2w), energy barrier h |q1 − q2| − r0 • Reaction coordinate = dimer bond length: ξ(q) = 2w ξ ∂zz ψt • Implementation with multiple replicas [7] and selection procedure with the fitness function S = c ξ ψt

Adaptive methods: A general framework and consistency results [5] • Bottom line of adaptive methods: add a biasing term, depending on ξ only, and adapt it on-the-fly in order to reach a uniform distribution of ξ(qt).

Bond length

3.0

• The resulting potential is Vt = V − Ft ◦ ξ, and rules to update the bias are needed q • Denote by ψ(t, q) the law of the process dqt = −∇(V − Ft ◦ ξ)(qt) dt + β2 dWt

• General update formula for Adaptive Biasing Potential method [4, 8] using the observed free energy ! Z dσΣ(z) dFt(z) −1 = Ft Fobs(t, z) , Fobs(t, z) = −β ln ψ(t, ·) dt |∇ξ| Σ(z) • General update formula for Adaptive Biasing Force method [1, 2] using the observed mean force Z dΓt(z) = Gt Γobs(t, z) − Γt(z) , Γobs(t, z) = f dψ ξ (t, ·|z) dt Σ(z) Possibly, set Γt(z) = Γobs(t, z) • If some equilibrium is reached and the updating functions Ft and Gt are strictly increasing (with Gt(0) = 0), then F∞ = F + c and Γ∞ = ∇F∞ • Issues with the case m > 1: ABF is not a gradient dynamics

References [1] E. Darve and A. Pohorille, Calculating free energies using average force, J. Chem. Phys. 115(20) (2001) 9169–9183 ´nin and C. Chipot, Overcoming free energy barriers using unconstrained molecular dynamics sim[2] J. He ulations, J. Chem. Phys. 121(7) (2004) 2904–2914 `vre and R. Roux, Existence, uniqueness and convergence of a particle approx[3] B. Jourdain, T. Lelie imation for the adaptive biasing force process, In preparation. [4] A. Laio and M. Parinello, Escaping free-energy minima, Proc. Natl. Acad. Sci. U.S.A 99 (2002) 12562-12566. `vre, M. Rousset and G. Stoltz, Computation of free energy profiles with parallel adaptive [5] T. Lelie dynamics, J. Chem. Phys. 126 (2007) 134111 `vre, M. Rousset and G. Stoltz, Long-time convergence of an adaptive biasing force method, [6] T. Lelie Nonlinearity 21 (2008) 1155-1181

0.5

3.5

0.4

2.5

Escape rate

µξ (dz) = Z −1

Z

the total entropy can be decomposed as

Z

2.0

0.3

0.2

1.5 0.1 1.0

0

50

100

150

0.0 0.0

200

0.1

0.2 Time

Time

0.3

0.4

Left: Dynamics with and without bias. Right: Selection procedure with increasing selection strength c.

Application to Bayesian statistics (Monte-Carlo ABF) • Hidalgo stamp problem: the thickness of Ndata = 485 stamps are measured, and the corresponding histogram is approximated by a mixture of N Gaussians. N × [v 3N −1, • Parameters x n = (q1, . . . , qN −1, µ1, . . . , µN , v1, . . . , vN ) ∈ SN −1 × [µ , µ ] , +∞) ⊂ R max min min o with SN −1 = (q1, . . . , qN −1) 0 ≤ qi ≤ 1, q1 + . . . qN −1 ≤ 1 r N v X PN −1 vi i 2 qi • The corresponding mixture is f (y | x) = exp − (y − µi) , where qN = 1 − i=1 qi 2π 2 i=1

• The likelihood of observing the data yi, 1 ≤ i ≤ Ndata is

Π(y | x) =

NY data

f (yd | x) ∝ e−βVlikelihood

d=1

• Potential V = Vprior + Vlikelihood such that the probability of a given configuration is proportional to exp(−V ) • A simple Metropolis random-walk is metastable • Use a Monte-Carlo ABF dynamics where ξ(x) = q1 is the reaction coordinate. • Principle of the method = update the average force experienced in the q1 direction and obtain the free-energy bias by integrating the approximated mean force

N=3 N=4 N=5 N=6 N=7

[7] P. Raiteri, A. Laio, F. L. Gervasio, C. Micheletti, and M. Parrinello, Efficient reconstruction of complex free energy landscapes by multiple walkers metadynamics, J. Phys. Chem. B 110(8) (2006) 3533–3539 [8] F. G. Wang and D. P. Landau, Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram, Phys. Rev. E 64(5) (2001) 056101

0.05

0.075

0.1 0.125 Thickness

0.15

Left: Histogram of the data, and fit with several gaussian modes. Middle: Biasing potential obtained from a Monte-Carlo ABF dynamics. Right: Evolution of the averages µ1, µ2 and µ3 for a biased dynamics