Bottom line of adaptive methods: add a biasing term, depending on ξ only, and adapt it on-the-fly in order to reach a uniform distribution of ξ(qt). ⢠The resulting ...
Adaptive Importance Sampling Strategies Tony LELIEVRE, Mathias ROUSSET, Gabriel STOLTZ MICMAC project team, INRIA & CERMICS, Ecole des Ponts et Chauss´ees http://cermics.enpc.fr/∼stoltz/
The metastability problem
Convergence of the Adaptive Biasing Force method [6]
• Applications in computational physics and Bayesian statisitics, when some high dimensional probability measure has to be sampled Z • Measure to be sampled µ(dq) = Z −1 e−βV (q) dq with Z = e−βV (q) dq r D 2 dWt, ensemble averages can be approximated by • For an ergodic dynamics such as dqt = −∇V (qt) dt + β trajectorial averages: Z Z T 1 hAi = A(q) µ(dq) = lim A(qt) dt (1) T →+∞ T 0 D • Although the convergence (1) is theoretically ensured, it can be very slow from a numerical viewpoint • Metastability arises from free-energy barriers, which can have either energetic or entropic origins
• Dynamics
q dqt = −∇ V − Ft ◦ ξ − β −1 ln(|∇ξ|−2) (qt) |∇ξ|−2(qt) dt + 2β −1|∇ξ|−1(qt) dWt, Z ξ (t, z) ′ F (z) = E f (X ) ξ(X ) = z = f (q) dψ t t t Σ(z)
• Existence and uniqueness of the solution [3] ! −1 ∇ V − Ft ◦ ξ ψ + β ∇ψ , ∂tψ = div −2 |∇ξ| Z • Nonlinear PDE on the law of qt: f |∇ξ|−1ψ(t, ·)dσΣ(z) Σ(z) ′ Z . F (z) = t −1ψ(t, ·)dσ |∇ξ| Σ(z) Σ(z)
• Expected longtime limits: Ft(z) → F (z), ψt(q) → ψ∞(q) = e−β(V (q)−F (ξ(q)))
• Proof using entropy estimates. The relative entropy of µ with respect to ν is H(µ|ν) =
Free-energy biased sampling • Consider a function ξ : D → Rm (m ≪ dim(D)) such that ξ(qt) is some slowly evolving degree of freedom (a notion to be precised...) • Marginal equilibrium distribution (m = 1 to simplify)
Σ(z)
e−βV (q) δξ(q)−z (dq) dz = Z −1
Z
e−βV (q)
Σ(z)
dσΣ(z)(dq) |∇ξ(q)|
where the macroscopic and microscopic entropies are respectively Z ξ ξ Em(t) = EM (t) = H ψ (t, ·) ψ∞ , em(t, z) ψ ξ (t, z) dz
dz = e−βF (z) dz,
M
where Σ(z) = q ∈ D ξ(q) = z is a submanifold of D
• Conditional equilibrium distribution
−βV (q) |∇ξ(q)|−1 σ e Σ(z)(dq) ξ ν (dq | z) = Z e−βV |∇ξ|−1 dσΣ(z)
• Mean force
∇F (z) =
f (q) ν ξ (dq | z)
∇ξ ∇ξ · ∇V −1 − β div f= 2 |∇ξ| |∇ξ|2
with
Σ(z)
• Application: entropic barrier 8
8
4
4
x coordinate
x coordinate
y coordinate
y coordinate
3.
−4
0
−4
0.
x coordinate
−8
−6
−2
2
6
8
−8 0.0
x coordinate
5000
10000
15000
20000
ξ ξ em(t, z) = H ν (t, · | z) ν (∞, · | z)
• The overall rate of convergence of the method is limited by the rate of convergence of the projected dynamics, so that the metastability in the ξ direction is removed
• When the potential is biased by the free-energy, V(q) = V (q) − F (ξ(q)), the new marginal distribution is constant = uniform sampling in ξ and the metastability is removed in this direction!
0
dµ dµ. Then ln dν
• The marginal density satisfies a simple diffusion equation ∂tψ ξ = β −1∂zz ψ ξ , therefore EM → 0 • Control of the microscopic entropy when assuming some uniform ergodicity for the dynamics at fixed ξ(q) = z (logarithmic Sobolev inequality)
Σ(z)
Z
E(t) = H(ψ(t, ·) | ψ∞) = EM (t) + Em(t),
−8 0
5000
Time
10000
15000
20000
Time
(1) Potential for which entropic barriers have to be overcome (0 in the region enclosed by the curve, +∞ outside) and (2) associated free energy profile when ξ(x, y) = x. Typical trajectories for a simple Metropolis random walk (3) and a dynamics biased by the free energy (4).
Application to Statistical Physics: Mulitple replica & Selection • Model dimer in a solvent (double-well potential), solvent particles interacting through the purely repulsive potential (truncated Lennard-Jones): #2 " 12 6 σ σ (r − σ − w)2 4ε − + ǫ if r ≤ σ, VWCA(r) = Vdimer(r) = h 1 − r r 2 w 0 if r > σ,
• Two energy minima (compact state r = r0 = 21/6σ, stretched state r = r0 + 2w), energy barrier h |q1 − q2| − r0 • Reaction coordinate = dimer bond length: ξ(q) = 2w ξ ∂zz ψt • Implementation with multiple replicas [7] and selection procedure with the fitness function S = c ξ ψt
Adaptive methods: A general framework and consistency results [5] • Bottom line of adaptive methods: add a biasing term, depending on ξ only, and adapt it on-the-fly in order to reach a uniform distribution of ξ(qt).
Bond length
3.0
• The resulting potential is Vt = V − Ft ◦ ξ, and rules to update the bias are needed q • Denote by ψ(t, q) the law of the process dqt = −∇(V − Ft ◦ ξ)(qt) dt + β2 dWt
• General update formula for Adaptive Biasing Potential method [4, 8] using the observed free energy ! Z dσΣ(z) dFt(z) −1 = Ft Fobs(t, z) , Fobs(t, z) = −β ln ψ(t, ·) dt |∇ξ| Σ(z) • General update formula for Adaptive Biasing Force method [1, 2] using the observed mean force Z dΓt(z) = Gt Γobs(t, z) − Γt(z) , Γobs(t, z) = f dψ ξ (t, ·|z) dt Σ(z) Possibly, set Γt(z) = Γobs(t, z) • If some equilibrium is reached and the updating functions Ft and Gt are strictly increasing (with Gt(0) = 0), then F∞ = F + c and Γ∞ = ∇F∞ • Issues with the case m > 1: ABF is not a gradient dynamics
References [1] E. Darve and A. Pohorille, Calculating free energies using average force, J. Chem. Phys. 115(20) (2001) 9169–9183 ´nin and C. Chipot, Overcoming free energy barriers using unconstrained molecular dynamics sim[2] J. He ulations, J. Chem. Phys. 121(7) (2004) 2904–2914 `vre and R. Roux, Existence, uniqueness and convergence of a particle approx[3] B. Jourdain, T. Lelie imation for the adaptive biasing force process, In preparation. [4] A. Laio and M. Parinello, Escaping free-energy minima, Proc. Natl. Acad. Sci. U.S.A 99 (2002) 12562-12566. `vre, M. Rousset and G. Stoltz, Computation of free energy profiles with parallel adaptive [5] T. Lelie dynamics, J. Chem. Phys. 126 (2007) 134111 `vre, M. Rousset and G. Stoltz, Long-time convergence of an adaptive biasing force method, [6] T. Lelie Nonlinearity 21 (2008) 1155-1181
0.5
3.5
0.4
2.5
Escape rate
µξ (dz) = Z −1
Z
the total entropy can be decomposed as
Z
2.0
0.3
0.2
1.5 0.1 1.0
0
50
100
150
0.0 0.0
200
0.1
0.2 Time
Time
0.3
0.4
Left: Dynamics with and without bias. Right: Selection procedure with increasing selection strength c.
Application to Bayesian statistics (Monte-Carlo ABF) • Hidalgo stamp problem: the thickness of Ndata = 485 stamps are measured, and the corresponding histogram is approximated by a mixture of N Gaussians. N × [v 3N −1, • Parameters x n = (q1, . . . , qN −1, µ1, . . . , µN , v1, . . . , vN ) ∈ SN −1 × [µ , µ ] , +∞) ⊂ R max min min o with SN −1 = (q1, . . . , qN −1) 0 ≤ qi ≤ 1, q1 + . . . qN −1 ≤ 1 r N v X PN −1 vi i 2 qi • The corresponding mixture is f (y | x) = exp − (y − µi) , where qN = 1 − i=1 qi 2π 2 i=1
• The likelihood of observing the data yi, 1 ≤ i ≤ Ndata is
Π(y | x) =
NY data
f (yd | x) ∝ e−βVlikelihood
d=1
• Potential V = Vprior + Vlikelihood such that the probability of a given configuration is proportional to exp(−V ) • A simple Metropolis random-walk is metastable • Use a Monte-Carlo ABF dynamics where ξ(x) = q1 is the reaction coordinate. • Principle of the method = update the average force experienced in the q1 direction and obtain the free-energy bias by integrating the approximated mean force
N=3 N=4 N=5 N=6 N=7
[7] P. Raiteri, A. Laio, F. L. Gervasio, C. Micheletti, and M. Parrinello, Efficient reconstruction of complex free energy landscapes by multiple walkers metadynamics, J. Phys. Chem. B 110(8) (2006) 3533–3539 [8] F. G. Wang and D. P. Landau, Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram, Phys. Rev. E 64(5) (2001) 056101
0.05
0.075
0.1 0.125 Thickness
0.15
Left: Histogram of the data, and fit with several gaussian modes. Middle: Biasing potential obtained from a Monte-Carlo ABF dynamics. Right: Evolution of the averages µ1, µ2 and µ3 for a biased dynamics