Prepared for submission to JHEP
An efficient algorithm for numerical computations of continuous densities of states
K. Langfeld^a, B. Lucini^b, R. Pellegrini^c, A. Rago^a
arXiv:1509.08391v1 [hep-lat] 28 Sep 2015
^a Centre for Mathematical Sciences, Plymouth University, Plymouth, PL4 8AA, UK
^b College of Science, Swansea University, Singleton Park, Swansea SA2 8PP, UK
^c School of Physics and Astronomy, University of Edinburgh, Edinburgh EH9 3FD, UK
E-mail:
[email protected],
[email protected],
[email protected],
[email protected]

Abstract: In Wang-Landau type algorithms, Monte-Carlo updates are performed with respect to the density of states, which is iteratively refined during simulations. The partition function and thermodynamic observables are then obtained by standard integration. In this work, our recently introduced method in this class (the LLR approach) is analysed and further developed. Our approach is a histogram free method particularly suited for systems with continuous degrees of freedom giving rise to a continuum density of states, as it is commonly found in Lattice Gauge Theories and in some Statistical Mechanics systems. We show that the method possesses an exponential error suppression that allows us to estimate the density of states over several orders of magnitude with nearly-constant relative precision. We explain how ergodicity issues can be avoided and how expectation values of arbitrary observables can be obtained within this framework. We then demonstrate the method using Compact U(1) Lattice Gauge Theory as a showcase. A thorough study of the algorithm parameter dependence of the results is performed and compared with the analytically expected behaviour. We obtain high precision values for the critical coupling for the phase transition and for the peak value of the specific heat for lattice sizes ranging from $8^4$ to $20^4$. Our results perfectly agree with the reference values reported in the literature, which covers lattice sizes up to $18^4$. Robust results for the $20^4$ volume are obtained for the first time. This latter investigation, which, due to strong metastabilities developed at the pseudo-critical coupling of the system, so far has been out of reach even on supercomputers with importance sampling approaches, has been performed to high accuracy with modest computational resources. This shows the potential of the method for studies of first order phase transitions.
Other situations where the method is expected to be superior to importance sampling techniques are pointed out.
Contents
1 Introduction and motivations
2 Numerical determination of the density of states
  2.1 The density of states
  2.2 The LLR method
  2.3 Observables and convergence with $\delta_E$
  2.4 The numerical algorithm
  2.5 Ergodicity
  2.6 Reweighting with the numerical density of states
3 Application to Compact U(1) Lattice Gauge Theory
  3.1 The model
  3.2 Simulation details
  3.3 Volume dependence of $\log\tilde\rho$ and computational cost of the algorithm
  3.4 Numerical investigation of the phase transition
  3.5 Discretisation effects
4 Discussion, conclusions and future plans
A Reference scale and volume scaling
1 Introduction and motivations
Monte-Carlo methods are widely used in Theoretical Physics, Statistical Mechanics and Condensed Matter (for an overview, see e.g. [1]). Since the inception of the field [2], most of the applications have relied on importance sampling, which allows us to evaluate stochastically with a controllable error multi-dimensional integrals of localised functions. These methods have immediate applications when one needs to compute thermodynamic properties, since statistical averages of (most) observables can be computed efficiently with importance sampling techniques. Similarly, in Lattice Gauge Theories, most quantities of interest can be expressed in the path integral formalism as ensemble averages over a positive-definite (and sharply peaked) measure, which, once again, provide an ideal scenario for applying importance sampling methods. However, there are noticeable cases in which Monte-Carlo importance sampling methods are either very inefficient or produce inherently wrong results for well understood reasons. Among those cases, some of the most relevant situations include systems with a sign problem (see [3] for a recent review), direct computations of free energies (comprising the study of properties of interfaces), systems with strong metastabilities (for instance, a system with
a first order phase transition in the region in which the phases coexist) and systems with a rough free energy landscape. Alternatives to importance sampling techniques do exist, but generally they are less efficient in standard cases and hence their use is limited to ad-hoc situations in which more standard methods are inapplicable. Noticeable exceptions are micro-canonical methods, which have experienced a surge in interest in the past fifteen years. Most of the growing popularity of those methods is due to the work of Wang and Landau [4], which provided an efficient algorithm to access the density of states in a statistical system with a discrete spectrum. Once the density of states is known, the partition function (and from it all thermodynamic properties of the system) can be reconstructed by performing one-dimensional numerical integrals. The generalisation of the Wang-Landau algorithm to systems with a continuum spectrum is far from straightforward [5, 6]. To overcome this limitation, a very promising method, here referred to as the Logarithmic Linear Relaxation (LLR) algorithm, was introduced in [7]. The potentialities of the method were demonstrated in subsequent studies of systems afflicted by a sign problem [8, 9], in the computation of the Polyakov loop probability distribution function in two-colour QCD with heavy quarks at finite density [10] and – rather unexpectedly – even in the determination of thermodynamic properties of systems with a discrete energy spectrum [11]. The main purpose of this work is to discuss in detail some improvements of the original LLR algorithm and to formally prove that expectation values of observables computed with this method converge to the correct result, which fills a gap in the current literature. 
In addition, we apply the algorithm to the study of Compact U(1) Lattice Gauge Theory, a system with severe metastabilities at its first order phase transition point that make the determination of observables near the transition very difficult from a numerical point of view. We find that in the LLR approach correlation times near criticality grow at most quadratically with the volume, as opposed to the exponential growth that one expects with importance sampling methods. This investigation shows the efficiency of the LLR method when dealing with systems having a first order phase transition. These results suggest that the LLR method can be efficient at overcoming numerical metastabilities in other classes of systems with a multi-peaked probability distribution, such as those with rough free energy landscapes (as commonly found, for instance, in models of protein folding or spin glasses). The rest of the paper is organised as follows. In Sect. 2 we cover the formal general aspects of the algorithm. The investigation of Compact U(1) Lattice Gauge Theory is reported in Sect. 3. A critical analysis of our findings, our conclusions and our future plans are presented in Sect. 4. Finally, some technical material is discussed in the appendix. Some preliminary results of this study have already been presented in [12].
2 Numerical determination of the density of states

2.1 The density of states
Owing to formal similarities between the two fields, the approach we are proposing can be applied to both Statistical Mechanics and Lattice Field Theory systems. In order to keep the discussion as general as possible, we shall introduce notations and conventions that can describe simultaneously both cases. We shall consider a system described by the
set of dynamical variables $\phi$, which could represent a set of spin or field variables and are assumed to be continuous. The action (in the field theory case) or the Hamiltonian (for the statistical system) is indicated by $S$ and the coupling (or inverse temperature) by $\beta$. Since the product $\beta S$ is dimensionless, without loss of generality we will take both $S$ and $\beta$ dimensionless. We consider a system with a finite volume $V$, which will be sent to infinity in the final step of our calculations. The finiteness of $V$ in the intermediate steps allows us to define naturally a measure over the variables $\phi$, which we shall call $\mathcal{D}\phi$. Properties of the system can be derived from the function

$$ Z(\beta) = \int \mathcal{D}\phi \; e^{\beta S[\phi]} , $$

which defines the canonical partition function for the statistical system or the path integral in the field theory case. The density of states (which is a function of the value of $S[\phi] = E$) is formally defined by the integral

$$ \rho(E) = \int \mathcal{D}\phi \; \delta\big( S[\phi] - E \big) . \tag{2.1} $$

In terms of $\rho(E)$, $Z$ takes the form

$$ Z(\beta) = \int dE \, \rho(E) \, e^{\beta E} . $$
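The one-dimensional integral above can be sketched numerically as follows. This is an illustrative example rather than code from the paper: the function name `log_partition`, the tabulation of $\ln\rho$ on a grid and the Gaussian toy density used in the usage note are all assumptions. The sum is done in log space because $\rho(E)$ typically spans many orders of magnitude.

```python
import math

def log_partition(log_rho, energies, beta):
    """Return ln Z(beta) = ln of the integral of rho(E) exp(beta E) dE.

    log_rho[i] is ln rho at energies[i]; the integral is approximated by the
    trapezium rule, with the largest exponent factored out to avoid overflow.
    """
    expo = [lr + beta * e for lr, e in zip(log_rho, energies)]
    m = max(expo)
    total = 0.0
    for i in range(len(energies) - 1):
        h = energies[i + 1] - energies[i]
        total += 0.5 * h * (math.exp(expo[i] - m) + math.exp(expo[i + 1] - m))
    return m + math.log(total)
```

For a toy density $\ln\rho(E) = -E^2/2$ tabulated on $[-20, 20]$, the result at $\beta = 1$ should reproduce $\ln Z = \tfrac12\ln(2\pi) + \tfrac12$.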
The vacuum expectation value (or ensemble average) of an observable $O$ which is a function of $E$ can be written as$^1$

$$ \langle O \rangle = \frac{1}{Z(\beta)} \int dE \, O(E) \, \rho(E) \, e^{\beta E} . \tag{2.2} $$

Hence, a numerical determination of $\rho(E)$ would enable us to express $Z$ and $\langle O \rangle$ as numerical integrals of known functions in the single variable $E$. This approach is inherently different from conventional Monte-Carlo calculations, which rely on the concept of importance sampling, i.e. the configurations contributing to the integral are generated with probability

$$ P_\beta(E) = \rho(E) \, e^{\beta E} / Z(\beta) . $$

Owing to this conceptual difference, the method we are proposing can overcome notorious drawbacks of importance sampling techniques.

2.2 The LLR method
We will now detail our approach to the evaluation of the density of states by means of lattice simulations. Our initial assumption is that the density of states is a regular function of the energy that can always be approximated in a finite interval by a suitable functional expansion. If we consider the energy interval $[E_k, E_k + \delta_E]$, under the physically motivated assumption that the density of states is a smooth function in this interval, the logarithm of the latter quantity can be written, using Taylor's theorem, as

$$ \ln\rho(E) = \ln\rho\Big(E_k + \frac{\delta_E}{2}\Big) + \frac{d\ln\rho}{dE}\Big|_{E=E_k+\delta_E/2} \Big(E - E_k - \frac{\delta_E}{2}\Big) + R_k(E) , \tag{2.3} $$

$$ R_k(E) = \frac{1}{2} \frac{d^2\ln\rho}{dE^2}\Big|_{E_k+\delta_E/2} \Big(E - E_k - \frac{\delta_E}{2}\Big)^2 + O(\delta_E^3) . $$

Thereby, for a given action $E$, the integer $k$ is chosen such that

$$ E_k \le E \le E_k + \delta_E , \qquad E_k = E_0 + k\,\delta_E . $$

$^1$ The most general case in which $O(\phi)$ cannot be written as a function of $E$ is discussed in Subsect. 2.3.

Our goal will be to devise a numerical method to calculate the Taylor coefficients

$$ a_k := \frac{d\ln\rho}{dE}\Big|_{E=E_k+\delta_E/2} \tag{2.4} $$
and to reconstruct from these an approximation for the density of states $\rho(E)$. By introducing the intrinsic thermodynamic quantities $T_k$ (temperature) and $c_k$ (specific heat) by

$$ \frac{d\ln\rho}{dE}\Big|_{E=E_k+\delta_E/2} = \frac{1}{T_k} = a_k , \qquad \frac{d^2\ln\rho}{dE^2}\Big|_{E=E_k+\delta_E/2} = -\frac{1}{T_k^2\,c_k}\,\frac{1}{V} , \tag{2.5} $$

we expose the important feature that the target coefficients $a_k$ are independent of the volume, while the correction $R_k(E)$ is of order $\delta_E^2/V$. In all practical applications, $R_k$ will be numerically much smaller than $a_k\delta_E$. For a certain parameter range (i.e., for the correlation length smaller than the lattice size), we can analytically derive this particular volume dependence of the density derivatives. Details are left to the appendix. Using the trapezium rule for integration, we find in particular

$$ \ln\frac{\rho(E_{k+1} + \delta_E/2)}{\rho(E_k + \delta_E/2)} = \int_{E_k+\delta_E/2}^{E_{k+1}+\delta_E/2} \frac{d\ln\rho}{dE}\, dE = \frac{\delta_E}{2}\,[a_k + a_{k+1}] + O(\delta_E^3) . \tag{2.6} $$
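A minimal sketch of how (2.6) can be accumulated from a list of slopes is given below. The function name `log_rho_midpoints` and the list-based interface are illustrative assumptions, not part of the original work.

```python
def log_rho_midpoints(a, delta_e):
    """Accumulate ln[rho(E_k + delta_e/2) / rho(E_0 + delta_e/2)] for all k.

    a[k] is the slope of ln rho at the midpoint of interval k; each step adds
    the trapezium-rule increment (delta_e / 2) * (a[k] + a[k+1]) of eq. (2.6).
    """
    out = [0.0]
    for k in range(len(a) - 1):
        out.append(out[-1] + 0.5 * delta_e * (a[k] + a[k + 1]))
    return out
```

For a toy density with $\ln\rho(E) = -E^2/2$ the slopes are linear in $E$, so the trapezium rule is exact and the accumulated values coincide with the analytic differences.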
Using this equation recursively, we find

$$ \ln\frac{\rho(E_N + \delta_E/2)}{\rho(E_0 + \delta_E/2)} = \frac{a_0}{2}\,\delta_E + \sum_{k=1}^{N-1} a_k\,\delta_E + \frac{a_N}{2}\,\delta_E + O(\delta_E^2) . \tag{2.7} $$
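Note that $N\delta_E = O(1)$. Exponentiating (2.3) and using (2.7), we obtain

$$ \rho(E) = \rho\Big(E_N + \frac{\delta_E}{2}\Big) \exp\Big\{ a_N\,(E - E_N - \delta_E/2) + O(\delta_E^2) \Big\} = \rho_0 \Big( \prod_{k=1}^{N-1} e^{a_k\delta_E} \Big) \exp\Big\{ a_N\,(E - E_N) + O(\delta_E^2) \Big\} , \tag{2.8} $$

where we have defined an overall multiplicative constant by

$$ \rho_0 = \rho\Big(E_0 + \frac{\delta_E}{2}\Big)\, e^{a_0\delta_E/2} . \tag{2.9} $$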
We are now in the position to introduce the piecewise-linear and continuous approximation of the density of states by

$$ \tilde\rho(E) = \rho_0 \Big( \prod_{k=1}^{N-1} e^{a_k\delta_E} \Big)\, e^{a_N (E - E_N)} , \qquad N(E) : E_N \le E < E_{N+1} , \tag{2.10} $$
i.e., $N$ is chosen in such a way that $E_N \le E < E_N + \delta_E$ for a given $E$. With this definition, we obtain the remarkable identity

$$ \rho(E) = \tilde\rho(E)\, \exp\big\{ O(\delta_E^2) \big\} = \tilde\rho(E)\, \big[ 1 + O(\delta_E^2) \big] , \tag{2.11} $$

which we will extensively use below. We will observe that $\rho(E)$ spans many orders of magnitude. The key observation is that our approximation implements exponential error suppression, meaning that $\rho(E)$ can be approximated with nearly-constant relative error even though it may range over thousands of orders of magnitude:

$$ 1 - \frac{\tilde\rho(E)}{\rho(E)} = O\big(\delta_E^2\big) . \tag{2.12} $$
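The $\delta_E^2$ scaling in (2.12) can be checked on a toy density. The sketch below is an illustration under stated assumptions, not the authors' code: it takes $\ln\rho(E) = -E^2/2$, feeds the exact slopes into (2.10) and measures the largest deviation of $\ln\tilde\rho$ from $\ln\rho$. The names `log_rho_tilde` and `max_log_error` are hypothetical, and the first interval ($N = 0$) is handled directly from (2.3).

```python
def log_rho_tilde(E, a, E0, delta_e, log_rho_ref):
    """ln rho~(E) following eq. (2.10); log_rho_ref = ln rho(E0 + delta_e/2)."""
    N = min(int((E - E0) / delta_e), len(a) - 1)   # index of the interval containing E
    if N == 0:
        # first interval: linear expansion around its own midpoint
        return log_rho_ref + a[0] * (E - E0 - 0.5 * delta_e)
    log_rho0 = log_rho_ref + 0.5 * a[0] * delta_e  # eq. (2.9)
    return log_rho0 + delta_e * sum(a[1:N]) + a[N] * (E - E0 - N * delta_e)

def max_log_error(delta_e, E0=0.0, E_max=2.0):
    """Largest |ln rho~ - ln rho| on a fine grid for the toy ln rho = -E^2/2."""
    n_int = round((E_max - E0) / delta_e)
    a = [-(E0 + (k + 0.5) * delta_e) for k in range(n_int)]   # exact slopes
    ref = -0.5 * (E0 + 0.5 * delta_e) ** 2
    worst = 0.0
    for j in range(2001):
        E = E0 + (E_max - E0) * j / 2000
        worst = max(worst, abs(log_rho_tilde(E, a, E0, delta_e, ref) + 0.5 * E * E))
    return worst
```

For this toy case the deviation inside each interval is $(E - E_N - \delta_E/2)^2/2$, so the maximum is $\delta_E^2/8$ and halving $\delta_E$ reduces it by a factor of four.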
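We will now present our method to calculate the coefficients $a_k$. To this aim, we introduce the action restricted and re-weighted expectation values [7] with $a$ being an external variable:

$$ \langle\!\langle W[\phi] \rangle\!\rangle_k (a) = \frac{1}{N_k} \int \mathcal{D}\phi \; \theta_{[E_k,\delta_E]}(S[\phi]) \; W[\phi] \; e^{-a S[\phi]} , \tag{2.13} $$

$$ N_k = \int \mathcal{D}\phi \; \theta_{[E_k,\delta_E]} \; e^{-a S[\phi]} = \int_{E_k}^{E_k+\delta_E} dE \, \rho(E) \, e^{-aE} , \tag{2.14} $$

where we have used (2.1) to express $N_k$ as an ordinary integral. We also introduced the modified Heaviside function

$$ \theta_{[E_k,\delta_E]}(S) = \begin{cases} 1 & \text{for } E_k \le S \le E_k + \delta_E \\ 0 & \text{otherwise} . \end{cases} $$

If the observable only depends on the action, i.e., $W[\phi] = O(S[\phi])$, (2.13) simplifies to

$$ \langle\!\langle O \rangle\!\rangle_k (a) = \frac{1}{N_k} \int_{E_k}^{E_k+\delta_E} dE \, \rho(E) \, O(E) \, e^{-aE} . \tag{2.15} $$

Let us now consider the specific action observable

$$ \Delta E = S - E_k - \frac{\delta_E}{2} , \tag{2.16} $$

and the solution $a$ of the non-linear equation

$$ \langle\!\langle \Delta E \rangle\!\rangle_k (a) = 0 . \tag{2.17} $$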
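Inserting $\rho(E)$ from (2.8) into (2.15) and defining $\Delta a = a_k - a$, we obtain:

$$ \langle\!\langle \Delta E \rangle\!\rangle_k (a) = \frac{\rho(E_k + \delta_E/2) \int_{E_k}^{E_k+\delta_E} dE \, (E - E_k - \delta_E/2) \, e^{\Delta a\,(E - E_k)} \, e^{O(\delta_E^2)}}{\rho(E_k + \delta_E/2) \int_{E_k}^{E_k+\delta_E} dE \, e^{\Delta a\,(E - E_k)} \, e^{O(\delta_E^2)}} = \frac{\int_{E_k}^{E_k+\delta_E} dE \, (E - E_k - \delta_E/2) \, e^{\Delta a\,(E - E_k)}}{\int_{E_k}^{E_k+\delta_E} dE \, e^{\Delta a\,(E - E_k)}} + O\big(\delta_E^2\big) = 0 . \tag{2.18} $$

Let us consider for the moment the function

$$ F(\Delta a) := \frac{\int_{E_k}^{E_k+\delta_E} dE \, (E - E_k - \delta_E/2) \, e^{\Delta a\,(E - E_k)}}{\int_{E_k}^{E_k+\delta_E} dE \, e^{\Delta a\,(E - E_k)}} . $$

It is easy to check that $F$ is monotonic and vanishing for $\Delta a = 0$:

$$ F'(\Delta a) > 0 , \qquad F(\Delta a = 0) = 0 . $$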
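As a supplementary check, which is not spelled out in the original text, the two elementary integrals in $F$ can be carried out explicitly with the substitution $x = E - E_k$. The closed form below makes both properties visible, since $\sinh^2 u > u^2$ for $u \neq 0$:

```latex
F(\Delta a) = \frac{\delta_E}{2}\,\coth\!\Big(\frac{\Delta a\,\delta_E}{2}\Big) - \frac{1}{\Delta a}
            = \frac{\delta_E^2}{12}\,\Delta a\,\Big[\,1 - \frac{(\Delta a\,\delta_E)^2}{60} + \dots\Big] ,
\qquad
F'(\Delta a) = \frac{1}{\Delta a^2} - \frac{\delta_E^2/4}{\sinh^2(\Delta a\,\delta_E/2)} > 0 .
```

The leading coefficient $\delta_E^2/12$ is the variance of a uniform distribution on an interval of width $\delta_E$.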
We therefore conclude for any $\delta_E$ that if (2.18) does have a solution, this solution is unique. For sufficiently small $\delta_E$ there is a solution, and, hence, the only solution is given by:

$$ \langle\!\langle \Delta E \rangle\!\rangle_k (a) = 0 \quad \Leftrightarrow \quad a = \frac{d\ln\rho}{dE}\Big|_{E=E_k+\delta_E/2} + O\big(\delta_E^2\big) . \tag{2.19} $$

The latter equation is at the heart of the LLR algorithm: it details how we can obtain the log-rho derivative by calculating the Monte-Carlo average $\langle\!\langle \Delta E \rangle\!\rangle_k(a)$ (using (2.13)) and solving a non-linear equation, i.e., (2.17). In the following, we will discuss the practical implementation by addressing two questions: (i) How do we solve the non-linear equation? (ii) How do we deal with the statistical uncertainty, since the Monte-Carlo method only provides stochastic estimates for the expectation value $\langle\!\langle \Delta E \rangle\!\rangle_k(a)$?

Let us start with the standard Newton-Raphson method to answer question (i). Starting from an initial guess $a^{(0)}$ for the solution, this method produces a sequence

$$ a^{(0)} \to a^{(1)} \to a^{(2)} \to \dots \to a^{(n)} \to a^{(n+1)} \dots , $$

which converges to the true solution $a_k$. Starting from $a^{(n)}$ for the solution, we would like to derive an equation that generates a value $a^{(n+1)}$ that is even closer to the true solution:

$$ \langle\!\langle \Delta E \rangle\!\rangle_k \big(a^{(n+1)}\big) = \langle\!\langle \Delta E \rangle\!\rangle_k \big(a^{(n)}\big) + \frac{d}{da} \langle\!\langle \Delta E \rangle\!\rangle_k \big(a^{(n)}\big) \, \big( a^{(n+1)} - a^{(n)} \big) = 0 . \tag{2.20} $$

Using the definition of $\langle\!\langle \Delta E \rangle\!\rangle_k (a^{(n+1)})$ in (2.18) with reference to (2.16) and (2.15), we find:

$$ \frac{d}{da} \langle\!\langle \Delta E \rangle\!\rangle_k (a) = - \Big[ \langle\!\langle \Delta E^2 \rangle\!\rangle_k (a) - \langle\!\langle \Delta E \rangle\!\rangle_k^2 (a) \Big] =: - \sigma^2(\Delta E; a) . \tag{2.21} $$

We thus find for the improved solution:

$$ a^{(n+1)} = a^{(n)} + \frac{\langle\!\langle \Delta E \rangle\!\rangle_k (a^{(n)})}{\sigma^2(\Delta E; a^{(n)})} . \tag{2.22} $$
We can convert the Newton-Raphson recursion into a simpler fixed point iteration if we assume that the choice $a^{(n)}$ is sufficiently close to the true value $a_k$ such that

$$ \delta_E \, \big( a^{(n)} - a_k \big) \ll 1 . $$

Without affecting the precision with which the solution $a$ of (2.18) can be obtained, we replace

$$ \sigma^2(\Delta E; a) = \frac{1}{12}\, \delta_E^2 \, \Big[ 1 + O\big(\delta_E \Delta a\big) \Big] \Big[ 1 + O(\delta_E^2) \Big] . \tag{2.23} $$

Hence, the Newton-Raphson iteration is given by

$$ a^{(n+1)} = a^{(n)} + \frac{12}{\delta_E^2} \, \langle\!\langle \Delta E \rangle\!\rangle_k (a^{(n)}) . \tag{2.24} $$
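The fixed point iteration (2.24) can be tried out in a noise-free setting. The sketch below is an illustration and not the authors' implementation: it assumes a density that is exactly exponential on the interval, $\rho(E) \propto e^{a_{\rm true} E}$, for which $\langle\!\langle \Delta E \rangle\!\rangle_k(a)$ is known in closed form. The names `delta_e_mean`, `fixed_point` and `a_true` are hypothetical.

```python
import math

def delta_e_mean(a, a_true, delta_e):
    """Exact <<Delta E>>_k(a) for a density proportional to exp(a_true * E)."""
    d = a_true - a
    y = d * delta_e
    if abs(y) < 1e-6:
        # leading term of the small-argument expansion; avoids 0/0 at the root
        return d * delta_e ** 2 / 12.0
    return 0.5 * delta_e / math.tanh(0.5 * y) - 1.0 / d

def fixed_point(a0, a_true, delta_e, n_iter=20):
    """Iterate eq. (2.24) with the exact expectation value in place of a Monte-Carlo estimate."""
    a = a0
    for _ in range(n_iter):
        a += 12.0 / delta_e ** 2 * delta_e_mean(a, a_true, delta_e)
    return a
```

Starting from `a0 = 0.0` with `a_true = 1.5` and `delta_e = 0.5`, a few iterations are enough to reach the root, because the replacement (2.23) is accurate whenever $\delta_E\,\Delta a$ is small.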
We point out that one fixed point of the above iteration, i.e., $a^{(n+1)} = a^{(n)} = a$, is attained for

$$ \langle\!\langle \Delta E \rangle\!\rangle_k (a) = 0 , $$

which, indeed, is the correct solution. We have already shown that the above equation has only one solution. Hence, if the iteration converges at all, it necessarily converges to the true solution. Note that convergence can always be achieved by a suitable choice of under-relaxation. We here point out that the solution to question (ii) above will involve a particular type of under-relaxation.

Let us address question (ii) now. We have already pointed out that we have only a stochastic estimate for the expectation value $\langle\!\langle \Delta E \rangle\!\rangle_k (a)$, and the convergence of the Newton-Raphson method is necessarily hampered by the inevitable statistical error of the estimator. This problem, however, has already been solved by Robbins and Monro [13]. For completeness, we shall now give a brief presentation of the algorithm. The starting point is the function $M(x)$, and a constant $\alpha$, such that the equation $M(x) = \alpha$ has a unique root at $x = \theta$. $M(x)$ is only available by stochastic estimation using the random variable $N(x)$:

$$ \mathbb{E}[N(x)] = M(x) , $$

with $\mathbb{E}[N(x)]$ being the ensemble average of $N(x)$. The iterative root finding problem is of the type

$$ x_{n+1} = x_n + c_n \, \big( \alpha - N(x_n) \big) , \tag{2.25} $$

where $c_n$ is a sequence of positive step sizes satisfying the requirements

$$ \sum_{n=0}^{\infty} c_n = \infty \qquad \text{and} \qquad \sum_{n=0}^{\infty} c_n^2 < \infty . \tag{2.26} $$
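The recursion (2.25) can be sketched generically as follows. This is a hedged illustration: the function name `robbins_monro`, the callable interface `noisy_m(x, rng)` and the step-size choice $c_n = c/(n+1)$ (one sequence satisfying (2.26)) are assumptions made for the example.

```python
import random

def robbins_monro(noisy_m, alpha, x0, c, n_iter, seed=0):
    """Iterate x_{n+1} = x_n + c_n (alpha - N(x_n)) with c_n = c / (n + 1).

    noisy_m(x, rng) must return a noisy estimate N(x) whose mean is M(x).
    """
    rng = random.Random(seed)
    x = x0
    for n in range(n_iter):
        c_n = c / (n + 1)   # the sum of c_n diverges, the sum of c_n^2 converges
        x += c_n * (alpha - noisy_m(x, rng))
    return x
```

As a usage example, with $M(x) = 2x$, $\alpha = 3$ and unit Gaussian noise, the iterates approach the root $\theta = 1.5$: `robbins_monro(lambda x, rng: 2.0 * x + rng.gauss(0.0, 1.0), 3.0, 0.0, 0.5, 10000)`.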
It is possible to prove that, under certain assumptions [13] on the function $M(x)$, $x_n$ converges in $L^2$, and hence in probability, to the true value $\theta$ as $n \to \infty$. A major advance in understanding the asymptotic properties of this algorithm was the main result of [13]. If we restrict ourselves to the case

$$ c_n = \frac{c}{n} , \tag{2.27} $$

one can prove that $\sqrt{n}\,(x_n - \theta)$ is asymptotically normal with variance

$$ \sigma_x^2 = \frac{c^2 \sigma_\xi^2}{2\,c\,M'(x) - 1} , \tag{2.28} $$

where $\sigma_\xi^2$ is the variance of the noise. Hence, the optimal value of the constant $c$, which minimises the variance, is given by

$$ c = \frac{1}{M'(\theta)} . \tag{2.29} $$

Adapting the Robbins-Monro approach to our root finding iteration in (2.24), we finally obtain an under-relaxed Newton-Raphson iteration

$$ a^{(n+1)} = a^{(n)} + \frac{12}{\delta_E^2 \, (n+1)} \, \langle\!\langle \Delta E \rangle\!\rangle_k (a^{(n)}) , \tag{2.30} $$

which is optimal with respect to the statistical noise during iteration.

2.3 Observables and convergence with $\delta_E$
We have already pointed out that expectation values of observables depending on the action only can be obtained by a simple integral over the density of states (see (2.2)). Here we develop a prescription for determining the values of expectations of more general observables by folding with the numerical density of states and analyse the dependence of the estimate on $\delta_E$. Let us denote a generic observable by $B(\phi)$. Its expectation value is defined by

$$ \langle B[\phi] \rangle = \frac{1}{Z(\beta)} \int \mathcal{D}\phi \; B[\phi] \; e^{\beta S[\phi]} . \tag{2.31} $$
In order to relate to the LLR approach, we break up the latter integration into energy intervals:

$$ \langle B[\phi] \rangle = \frac{1}{Z(\beta)} \sum_i \int \mathcal{D}\phi \; \theta_{[E_i,\delta_E]} \; B[\phi] \; e^{\beta S[\phi]} . \tag{2.32} $$
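Note that $\langle B[\phi] \rangle$ does not depend on $\delta_E$. We can express $\langle B[\phi] \rangle$ in terms of a sum over double-bracket expectation values by choosing

$$ W := B[\phi] \, \exp\{ (\beta + a_i) S[\phi] \} $$

in (2.13). Without any approximation, we find:

$$ \langle B[\phi] \rangle = \frac{1}{Z(\beta)} \sum_i N_i \, e^{a_i E_i} \, \langle\!\langle B[\phi] \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} \rangle\!\rangle (E_i) , \tag{2.33} $$

$$ Z(\beta) = \sum_i N_i \, e^{a_i E_i} \, \langle\!\langle \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} \rangle\!\rangle (E_i) , \tag{2.34} $$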
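where $N_i = N_i(a_i)$ is defined in (2.14). The above result can be further simplified by using (2.11):

$$ N_i \, e^{a_i E_i} = \int_{E_i}^{E_i+\delta_E} dE \, \rho(E) \exp\{ -a_i (E - E_i) \} = e^{O(\delta_E^2)} \int_{E_i}^{E_i+\delta_E} dE \, \tilde\rho(E) \exp\{ -a_i (E - E_i) \} $$

$$ = e^{O(\delta_E^2)} \, \tilde\rho(E_i) \int_{E_i}^{E_i+\delta_E} dE = \delta_E \, \tilde\rho(E_i) \, e^{O(\delta_E^2)} = \delta_E \, \tilde\rho(E_i) \, \big[ 1 + O(\delta_E^2) \big] . \tag{2.35} $$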
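We now define the approximation to $\langle B[\phi] \rangle$ by

$$ \langle B[\phi] \rangle_{\rm app} = \frac{1}{Z(\beta)} \sum_i \delta_E \, \tilde\rho(E_i) \, \langle\!\langle B[\phi] \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} \rangle\!\rangle , \tag{2.36} $$

$$ Z(\beta) := \sum_i \delta_E \, \tilde\rho(E_i) \, \langle\!\langle \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} \rangle\!\rangle . \tag{2.37} $$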
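Since the double-bracket expectation values do not produce a singularity if $\delta_E \to 0$, i.e.,

$$ \lim_{\delta_E \to 0} \langle\!\langle B[\phi] \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} \rangle\!\rangle = \text{finite} , $$

using (2.35), from (2.33) and (2.34) we find that

$$ \langle B[\phi] \rangle = \langle B[\phi] \rangle_{\rm app} + \sum_i O(\delta_E^3) = \langle B[\phi] \rangle_{\rm app} + O(\delta_E^2) . \tag{2.38} $$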
The latter formula together with (2.36) provides access to all types of observables using the LLR method with little more computational resources: once the Robbins-Monro iteration (2.30) has settled for an estimate of the coefficient $a_k$, the Monte-Carlo simulation simply continues to derive estimators for the double-bracket expectation values in (2.36) and (2.37). With the further assumption that the double-bracket expectation values are (semi-)positive, an even better error estimate is produced by our approach:

$$ \langle B[\phi] \rangle = \langle B[\phi] \rangle_{\rm app} + \sum_i O(\delta_E^3) = \langle B[\phi] \rangle_{\rm app} \, \big[ 1 + O(\delta_E^2) \big] . $$
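This implies that the observable $\langle B[\phi] \rangle$ can be calculated with a relative error of order $\delta_E^2$. Indeed, we find from (2.33), (2.34) and (2.35) that

$$ \langle B[\phi] \rangle = \frac{1}{Z(\beta)} \sum_i \delta_E \, \tilde\rho(E_i) \, \langle\!\langle B[\phi] \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} \rangle\!\rangle \times \exp\big\{ O(\delta_E^2) \big\} , \tag{2.39} $$

$$ Z(\beta) := \sum_i \delta_E \, \tilde\rho(E_i) \, \langle\!\langle \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} \rangle\!\rangle . \tag{2.40} $$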
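Thereby, we have used

$$ \sum_i a_i \exp\big\{ c_i \delta_E^2 \big\} \le \sum_i |a_i| \exp\{ c_i \delta_E^2 \} \le \sum_i |a_i| \exp\{ c_{\max} \delta_E^2 \} = \exp\{ c_{\max} \delta_E^2 \} \sum_i a_i = \exp\big\{ O(\delta_E^2) \big\} \times \sum_i a_i . $$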
The assumption of (semi-)positive double expectation values is true for many action observables, and possibly also for Wilson loops, whose re-weighted and action restricted double expectation values might turn out to be positive (as is the case for their standard expectation values). In this case, our method would provide an efficient determination of those quantities. This is important in particular for large Wilson loop expectation values, since they are notoriously difficult to measure with importance sampling methods (see e.g. [14]). We also note that, in order to have an accurate determination of a generic observable, any Monte-Carlo estimate of the double expectation values must be obtained to a good precision dictated by the size of $\delta_E$. A detailed numerical investigation of these and related issues is left to future work.

For the specific case that the observable $B[\phi]$ only depends on the action $S[\phi]$, we circumvent this problem and evaluate the double-expectation values exactly. To this aim, we introduce for the general case $\langle\!\langle W[\phi] \rangle\!\rangle_k$ the generalised density $w_k(E)$ by

$$ \rho(E) \, w_k(E) = \int \mathcal{D}\phi \; \theta_{[E_k,\delta_E]}(S[\phi]) \; W[\phi] \; \delta\big( E - S[\phi] \big) . \tag{2.41} $$

We then point out that if $W[\phi]$ depends on the action only, i.e., $W[\phi] = f(S[\phi])$, we obtain:

$$ w_k(E) = f(E) \, \theta_{[E_k,\delta_E]}(E) . $$

With the definition of the double expectation value (2.13), we find:

$$ \langle\!\langle W[\phi] \rangle\!\rangle_k (a_k) = \frac{\int_{E_k}^{E_k+\delta_E} dE \, \rho(E) \, e^{-a_k E} \, w_k(E)}{\int_{E_k}^{E_k+\delta_E} dE \, \rho(E) \, e^{-a_k E}} . \tag{2.42} $$
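Rather than calculating $\langle\!\langle W[\phi] \rangle\!\rangle_k$ by Monte-Carlo methods, we can analytically evaluate this quantity (up to order $O(\delta_E^2)$). Using the observation that for any smooth ($C^2$) function $g$

$$ \int_{E_k}^{E_k+\delta_E} dE \, g(E) = \delta_E \, g\Big( E_k + \frac{\delta_E}{2} \Big) + O\big( \delta_E^3 \big) , $$

and using this equation for both numerator and denominator of (2.42), we conclude that

$$ \langle\!\langle W[\phi] \rangle\!\rangle_k (a_k) = w_k\Big( E_k + \frac{\delta_E}{2} \Big) + O\big( \delta_E^2 \big) . \tag{2.43} $$

Let us now specialise to the case that is relevant for (2.39) with $B$ depending on the action only:

$$ W[\phi] = b\big( S[\phi] \big) \exp\{ \beta S[\phi] + a_i (S[\phi] - E_i) \} , \qquad w_i(E) = b(E) \exp\{ \beta E + a_i (E - E_i) \} . \tag{2.44} $$

This leaves us with

$$ \langle\!\langle W[\phi] \rangle\!\rangle_i (a_i) = b\Big( E_i + \frac{\delta_E}{2} \Big) \, e^{\beta (E_i + \delta_E/2)} \, e^{a_i \delta_E/2} + O\big( \delta_E^2 \big) . \tag{2.45} $$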
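Inserting (2.43) together with (2.44) into (2.36), we find:

$$ \langle B[\phi] \rangle = \frac{1}{Z(\beta)} \sum_i \delta_E \, \tilde\rho\Big( E_i + \frac{\delta_E}{2} \Big) \, b\Big( E_i + \frac{\delta_E}{2} \Big) \, e^{\beta (E_i + \delta_E/2)} + O\big( \delta_E^2 \big) , \tag{2.46} $$

$$ Z(\beta) = \sum_i \delta_E \, \tilde\rho\Big( E_i + \frac{\delta_E}{2} \Big) \, e^{\beta (E_i + \delta_E/2)} . \tag{2.47} $$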
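The midpoint sums (2.46) and (2.47) can be sketched as below, assuming the slopes $a_i$ are already known. The function name `expectation` and its interface are illustrative assumptions. The midpoint values of $\ln\tilde\rho$ are accumulated with the trapezium increments of (2.6), and the common factor $\delta_E\,\rho(E_0 + \delta_E/2)$ is dropped because it cancels in the ratio.

```python
import math

def expectation(b, a, E0, delta_e, beta):
    """Estimate <B> for an observable b(E) that depends on the action only.

    a[i] is the slope of ln rho at the midpoint of interval i; weights are
    evaluated in log space and normalised by the largest exponent.
    """
    log_rho_mid = [0.0]   # ln rho~ at the midpoints, up to an additive constant
    for i in range(len(a) - 1):
        log_rho_mid.append(log_rho_mid[-1] + 0.5 * delta_e * (a[i] + a[i + 1]))
    mids = [E0 + (i + 0.5) * delta_e for i in range(len(a))]
    expo = [lr + beta * e for lr, e in zip(log_rho_mid, mids)]
    m = max(expo)
    weights = [math.exp(x - m) for x in expo]
    return sum(w * b(e) for w, e in zip(weights, mids)) / sum(weights)
```

For a Gaussian toy density, $\ln\rho(E) = -E^2/2$ with exact slopes on $[-10, 10]$, the estimates at $\beta = 1$ should give $\langle E \rangle = 1$ and $\langle E^2 \rangle = 2$.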
Below, we will numerically test the quality of expectation values obtained by the LLR approach using action observables only, i.e., $B[\phi] = O(S[\phi])$. We will find that we indeed achieve the predicted precision in $\delta_E^2$ for this type of observables (see below Fig. 6).

2.4 The numerical algorithm
So far, we have shown that a piecewise continuous approximation of the density of states that is linear in intervals of sufficiently small amplitude $\delta_E$ allows us to obtain a controlled estimate of averages of observables and that the angular coefficients $a_i$ of the linear approximations can be computed in each interval $i$ using the Robbins-Monro recursion (2.30). Imposing the continuity of $\log\rho(E)$, one can then determine the latter quantity up to an additive constant, which does not play any role in cases in which observables are standard ensemble averages.

The Robbins-Monro recursion can be easily implemented in a numerical algorithm. Ideally, the recurrence would be stopped when a tolerance $\epsilon$ for $a_i$ is reached, i.e. when

$$ \Big| a_i^{(n+1)} - a_i^{(n)} \Big| = \left| \frac{12\, \Delta E_i (a_i^{(n)})}{(n+1)\, \delta_E^2} \right| \le \epsilon , \tag{2.48} $$

with $\epsilon$ (for instance) set to the precision of the computation. When this condition is fulfilled, we can set $a_i = a_i^{(n+1)}$. However, one has to take into account the fact that the computation of $\Delta E_i$ requires an averaging over Monte-Carlo configurations. This brings into play considerations about thermalisation (which has to be taken into account each time we send $a_i^{(n)} \to a_i^{(n+1)}$), the number of measurements used for determining $\Delta E_i$ at fixed $a_i^{(n)}$ and, last but not least, fluctuations of the $a_i^{(n)}$ themselves. Following those considerations, an algorithm based on the Robbins-Monro recursion relation should depend on the following input (tunable) parameters:

• $N_{TH}$, the number of Monte-Carlo updates in the restricted energy interval before starting to measure expectation values;
• $N_{SW}$, the number of iterations used for computing expectation values;
• $N_{RM}$, the number of Robbins-Monro iterations for determining $a_i$;
• $N_B$, the number of final values from the Robbins-Monro iteration subjected to a subsequent bootstrap analysis.
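How these parameters enter the Robbins-Monro recursion (2.30) can be sketched on a toy model. The example below is an illustration under explicit assumptions and not the implementation used in this work: the "system" consists of four real variables with action $S = |\phi|^2$ and a flat measure, for which $\rho(E) \propto E$ and the exact slope is $1/E$. The function name `llr_slope`, the Metropolis step size 0.5 and the default parameter values are hypothetical, and the final bootstrap over $N_B$ repetitions is omitted.

```python
import math
import random

def llr_slope(E_k, delta_e, n_rm=500, n_sw=200, n_th=50, a_start=0.0, seed=1):
    """Estimate a_k for the window [E_k, E_k + delta_e] of the toy action S = |phi|^2."""
    rng = random.Random(seed)
    phi = [math.sqrt(E_k + 0.5 * delta_e), 0.0, 0.0, 0.0]   # start inside the window
    S = sum(x * x for x in phi)
    a = a_start
    for n in range(n_rm):                        # Robbins-Monro iterations
        acc = 0.0
        for sweep in range(n_sw):
            for i in range(4):                   # one restricted Metropolis sweep
                new = phi[i] + 0.5 * rng.uniform(-1.0, 1.0)
                S_new = S - phi[i] ** 2 + new ** 2
                inside = E_k <= S_new <= E_k + delta_e
                # weight exp(-a S); proposals leaving the window are rejected
                if inside and rng.random() < math.exp(min(0.0, -a * (S_new - S))):
                    phi[i], S = new, S_new
            if sweep >= n_th:                    # discard thermalisation sweeps
                acc += S - E_k - 0.5 * delta_e
        mean_dE = acc / (n_sw - n_th)
        a += 12.0 * mean_dE / ((n + 1) * delta_e ** 2)      # eq. (2.30)
    return a
```

For the window $[2, 3]$, i.e. `llr_slope(2.0, 1.0)`, the estimate should fluctuate around $1/2.5 = 0.4$, up to statistical noise and the $O(\delta_E^2)$ correction.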
The version of the LLR method proposed and implemented in this paper is reported in an algorithmic fashion in the box Algorithm 1. This implementation differs from that provided in [7, 8] by the replacement of the originally proposed root-finding procedure based on a deterministic Newton-Raphson like recursion with the Robbins-Monro recursion, which is better suited to the problem of finding zeros of stochastic equations.

Algorithm 1: The LLR method as implemented in this work.
Input: $N_{SW}$, $N_{TH}$, $N_{RM}$, $N_B$
Output: $a_i \;\; \forall i$
for $0 \le i < (E_{\max} - E_{\min})/\delta_E$ do
    Initialise $E_i = E_{\min} + i\,\delta_E$, $a_i^{(0)} = \bar a_i$;
    for $0 \le n < N_{RM}$ do
        for $j \le N_{SW}$ do
            Evolve the whole system with an importance sampling algorithm for one sweep according to the probability distribution $P(E) \propto e^{-a_i^{(n)} E}$, accepting only configurations such that $E_i \le E \le E_i + \delta_E$;
            if $j \ge N_{TH}$ then
                Compute $E^{(j)}$, the value of the energy in the current configuration $j$;
        Compute $\Delta E_i(a_i^{(n)}) = \frac{1}{N_{SW} - N_{TH}} \sum_{j > N_{TH}} E^{(j)} - E_i - \frac{\delta_E}{2}$;
        Compute $a_i^{(n+1)} = a_i^{(n)} + \frac{12\, \Delta E_i(a_i^{(n)})}{(n+1)\, \delta_E^2}$;
    Repeat $N_B$ times to produce $N_B$ candidates $a_i$ for a subsequent bootstrap analysis

Since the $a_i$ are
determined stochastically, a different reiteration of the algorithm with different starting conditions and different random seeds would produce a different value for the same $a_i$. The stochastic nature of the process implies that the distribution of the $a_i$ found in different runs is Gaussian. The generated ensemble of the $a_i$ can then be used to determine the error of the estimate of observables using analysis techniques such as jackknife and bootstrap.

The parameters $E_{\min}$ and $E_{\max}$ depend on the system and on the phenomenon under investigation. In particular, standard thermodynamic considerations on the infinite volume limit imply that if one is interested in a specific range of temperatures and the studied observables can be written as statistical averages with Gaussian fluctuations, it is possible to restrict the range of energies between the energy that is typical of the smallest considered temperature and the energy that is typical of the highest considered temperature. Determining a reasonable value for the amplitude of the energy interval $\delta_E$ and the other tunable parameters $N_{SW}$, $N_{TH}$, $N_{RM}$ and $N_B$ requires a modest amount of experimenting with trial values. In our applications we found that the results were very stable for wide ranges of values of those parameters. Likewise, $\bar a_i$, the initial value for the Robbins-Monro recursion in interval $i$, does not play a crucial role; when required and possible, an initial value close to the expected result can be inferred by inverting $\langle E(\beta) \rangle$, which can be obtained with a quick study using conventional techniques.

The average $\langle\!\langle \dots \rangle\!\rangle$ imposes an update that restricts configurations to those with energies in a specific range. In most of our studies, we have imposed the constraint analytically at the level of the generation of the newly proposed variables, which results in a performance that is comparable with that of the unconstrained system. Using a simple-minded more direct approach, in which one imposes the constraint after the generation of the proposed new variable, we found that in most cases the efficiency of Monte-Carlo algorithms did not drop drastically as a consequence of the restriction, and even for systems like SU(3) (see Ref. [7]) we were able to keep an efficiency of at least 30% and in most cases no less than 50% with respect to the unconstrained system.

Figure 1: Left: For contiguous energy intervals, if a transition between configurations with energy in the same interval requires going through configurations with energy that are outside that interval, the simulation might get trapped in one of the allowed regions (in green). Right: For overlapping energy intervals with replica exchange, the simulation can travel from one allowed region to the other through excursions to the upper interval.

2.5 Ergodicity
Our implementation of the energy restricted average $\langle\!\langle \cdots \rangle\!\rangle$ assumes that the update algorithm is able to generate all configurations with energy in the relevant interval starting from configurations that have energy in the same interval. This assumption might be too strong when the update is local$^2$ in the energy (i.e. each elementary update step changes the energy by a quantity of order one for a system with total energy of order $V$) and there are topological excitations that can create regions with the same energy that are separated by high energy barriers. In these cases, which are rather common in gauge theories and statistical mechanics$^3$, generally in order to go from one acceptable region to the other one has to travel through a region of energies that is forbidden by an energy-restricted update method such as the LLR. Hence, by construction, in such a scenario our algorithm will get trapped in one of the allowed regions. Therefore, the update will not be ergodic.

$^2$ This is for instance the case for the popular heat-bath and Metropolis update schemes.

In order to solve this problem, one can use an adaptation of the replica exchange method [15], as first proposed in [16]. The idea is that instead of dividing the whole energy interval into contiguous sub-intervals overlapping only in one point (in the following simply referred to as contiguous intervals), one can divide it into sub-intervals overlapping in a finite energy region (this case will be referred to as overlapping intervals). With the latter prescription, after a fixed number of iterations of the Robbins-Monro procedure, we can check whether in any pair of overlapping intervals $(I_1, I_2)$ the energy of both the corresponding configurations is in the common region. For pairs fulfilling this condition, we can propose an exchange of the configurations with a Metropolis probability

$$ P_{\rm swap} = \min\Big( 1, \; e^{ \big( a_{I_1}^{(n)} - a_{I_2}^{(n)} \big) \big( E_{C_1} - E_{C_2} \big) } \Big) , \tag{2.49} $$

where $a_{I_1}^{(n)}$ and $a_{I_2}^{(n)}$ are the values of the parameter $a$ at the current $n$-th iteration of the Robbins-Monro procedure in intervals $I_1$ and $I_2$ respectively, and $E_{C_1}$ ($E_{C_2}$) is the value of the energy of the current configuration $C_1$ ($C_2$) of the replica in the interval $I_1$ ($I_2$). If the proposed exchange is accepted, $C_1 \to C_2$ and $C_2 \to C_1$. With repeated exchanges of configurations from neighbour intervals, the system can now travel through all configuration space. A schematic illustration of how this mechanism works is provided in Fig. 1.

As already noticed in [16], the replica exchange step is amenable to parallelisation and hence can be conveniently deployed in calculations on massively parallel computers. Note that the replica exchange step adds another tunable parameter to the algorithm, which is the number $N_{SWAP}$ of configuration swaps during the Monte-Carlo simulation at a given Monte-Carlo step. A modification of the LLR algorithm that incorporates this step can be easily implemented.

2.6 Reweighting with the numerical density of states
[Footnote 3: For instance, in a $d$-dimensional Ising system of size $L^d$, to go from one ground state to the other one needs to create a kink, whose energy grows as $L^{d-1}$.]

In order to screen the approach outlined in subsections 2.2 and 2.3 for ergodicity violations, and to provide an efficient procedure to calculate any observable once an estimate of the density of states has been obtained, we here introduce, as an alternative to the replica exchange method discussed in the previous subsection, an importance sampling algorithm with reweighting with respect to the estimate $\tilde\rho$. This algorithm features short correlation times even near critical points. Consider for instance a system described by the canonical ensemble. We define a modified Boltzmann weight $W_B(E)$ as follows:

$$W_B(E) = \begin{cases} e^{-\beta_1 E + c_1} & \text{for } E < E_{\min} ; \\ 1/\tilde\rho(E) & \text{for } E_{\min} \le E \le E_{\max} ; \\ e^{-\beta_2 E + c_2} & \text{for } E > E_{\max} . \end{cases} \tag{2.50}$$
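Here $E_{\min}$ and $E_{\max}$ are two values of the energy that are far from the typical energy of interest $E$:

$$E_{\min} \ll E \ll E_{\max} . \tag{2.51}$$

If conventional Monte-Carlo simulations can be used for numerical studies of the given system, we can choose $\beta_1$ and $\beta_2$ from the conditions

$$\langle E(\beta_i) \rangle = E_i , \qquad i = 1, 2 , \tag{2.52}$$

where $E_1$ and $E_2$ are understood as $E_{\min}$ and $E_{\max}$ respectively.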
If importance sampling methods are inefficient or unreliable, $\beta_1$ and $\beta_2$ can be chosen to be the micro-canonical $\beta_\mu$ corresponding to the density of states centred at $E_{\min}$ and $E_{\max}$ respectively. These $\beta_\mu$ are outputs of our numerical determination of $\tilde\rho(E)$. The two constants $c_1$ and $c_2$ are determined by requiring continuity of $W_B(E)$ at $E_{\min}$ and at $E_{\max}$:

$$\lim_{E \to E_{\min}^-} W_B(E) = \lim_{E \to E_{\min}^+} W_B(E) \quad \text{and} \quad \lim_{E \to E_{\max}^-} W_B(E) = \lim_{E \to E_{\max}^+} W_B(E) . \tag{2.53}$$
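The construction of Eqs. (2.50) and (2.53) can be illustrated with a minimal Python sketch that works with $\ln W_B(E)$ to avoid overflow. The function name `make_log_weight` and the toy form of $\ln\tilde\rho(E)$ are assumptions made for illustration only; in an actual calculation $\ln\tilde\rho(E)$ would come from the LLR determination.

```python
def make_log_weight(log_rho, e_min, e_max, beta1, beta2):
    """Build ln W_B(E) for the three-branch weight of Eq. (2.50)."""
    # Continuity at E_min and E_max, Eq. (2.53), fixes the two constants:
    # -beta_i * E_i + c_i = -ln rho~(E_i).
    c1 = beta1 * e_min - log_rho(e_min)
    c2 = beta2 * e_max - log_rho(e_max)

    def log_weight(energy):
        if energy < e_min:
            return -beta1 * energy + c1  # canonical tail below E_min
        if energy > e_max:
            return -beta2 * energy + c2  # canonical tail above E_max
        return -log_rho(energy)          # central region: W_B = 1 / rho~

    return log_weight


# Toy stand-in for ln rho~(E): an assumption, not data from the paper.
def toy_log_rho(energy):
    return 10.0 * energy - 0.5 * energy ** 2


E_MIN, E_MAX = 2.0, 8.0
# Micro-canonical slopes d(ln rho~)/dE of the toy model at the two boundaries.
BETA1, BETA2 = 10.0 - E_MIN, 10.0 - E_MAX

log_w = make_log_weight(toy_log_rho, E_MIN, E_MAX, BETA1, BETA2)
```

In this toy setting $\ln\tilde\rho(E) + \ln W_B(E)$ vanishes between `E_MIN` and `E_MAX` and is negative outside, which is the behaviour the weights are designed to produce.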
Let $\rho(E)$ be the correct density of states of the system. If $\tilde\rho(E) = \rho(E)$, then for $E_{\min} \le E \le E_{\max}$

$$\rho(E) W_B(E) = 1 , \tag{2.54}$$

and a Monte-Carlo update with weights $W_B(E)$ drives the system in configuration space following a random walk in the energy. In practice, since $\tilde\rho(E)$ is determined numerically, upon normalisation

$$\rho(E) W_B(E) \simeq 1 , \tag{2.55}$$

and the random walk is only approximate. However, if $\tilde\rho(E)$ is a good approximation of $\rho(E)$, possible free energy barriers and metastabilities of the canonical system can be successfully overcome with the weights (2.50). Values of observables for the canonical ensemble at temperature $T = 1/\beta$ can be obtained using reweighting:

$$\langle O(\beta) \rangle = \frac{\langle O \, e^{-\beta E} \, (W_B(E))^{-1} \rangle_W}{\langle e^{-\beta E} \, (W_B(E))^{-1} \rangle_W} , \tag{2.56}$$
where $\langle \cdot \rangle$ denotes the average over the canonical ensemble and $\langle \cdot \rangle_W$ the average over the modified ensemble defined in (2.50). The weights $W_B(E)$ guarantee ergodic sampling with a small auto-correlation time for the configurations with energies $E$ such that $E_{\min} \le E \le E_{\max}$, while suppressing excursions to energies $E \ll E_{\min}$ and $E \gg E_{\max}$. Hence, as long as for a given $\beta$ of the canonical system the energy of interest $E = \langle E \rangle$ and the energy fluctuation $\Delta E = \sqrt{\langle (E - \langle E \rangle)^2 \rangle}$ are such that

$$E_{\min} \ll \langle E \rangle - \Delta E \quad \text{and} \quad \langle E \rangle + \Delta E \ll E_{\max} , \tag{2.57}$$

the reweighting (2.56) does not present any overlap problem. The role of $E_{\min}$ and $E_{\max}$ is to restrict the approximate random walk to energies that are physically interesting, in order to save computer time. Hence, the choice of $E_{\min}$, $E_{\max}$ and of the corresponding $\beta_1$, $\beta_2$ does not need to be fine-tuned, the only requirement being that the conditions (2.57) hold. These conditions can be verified a posteriori. Obviously, choosing the smallest interval $E_{\max} - E_{\min}$ for which the conditions (2.57) hold optimises the computational time required by the algorithm.

The weights (2.50) can be easily imposed using a Metropolis or a biased Metropolis update [17]. Again, due to the absence of free energy barriers, no ergodicity problems are expected to arise. This can be checked by verifying that in the simulation there are various tunnellings (i.e. round trips) between $E_{\min}$ and $E_{\max}$ and that the frequency histogram of the energy is approximately flat between $E_{\min}$ and $E_{\max}$. Reasonable requirements are to have O(100-1000) tunnellings and a histogram that is flat within 15-20%. These criteria can be used to confirm that the numerically determined $\tilde\rho(E)$ is a good approximation of $\rho(E)$. The flatness of the histogram is not influenced by the $\beta$ of interest in the original multi-canonical simulation. This is particularly important for first order phase transitions, where traditional Monte-Carlo algorithms have a tunnelling time that grows exponentially with the volume of the system. Since the modified ensemble relies on a random walk in energy, the tunnelling time between two fixed energy densities is expected to grow only as the square root of the volume.

This procedure of using a modified ensemble followed by reweighting is inspired by the multi-canonical method [18], the only substantial difference being the recursion relation for determining the weights.
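The reweighting of Eq. (2.56) and the a posteriori checks described above (round trips and histogram flatness) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the thresholds `low` and `high` used to define a round trip, and the binning are hypothetical choices and are not prescribed by the text. The toy data use four energy levels with degeneracies 1, 3, 3, 1, sampled perfectly flat.

```python
import math


def reweighted_average(observables, energies, log_weights, beta):
    """Estimate <O(beta)> from modified-ensemble samples, Eq. (2.56).

    log_weights[i] holds ln W_B(E_i) for the i-th sampled configuration.
    """
    # Exponent of each reweighting factor exp(-beta * E) / W_B(E).
    exponents = [-beta * e - lw for e, lw in zip(energies, log_weights)]
    shift = max(exponents)  # common shift cancels in the ratio, avoids overflow
    factors = [math.exp(x - shift) for x in exponents]
    numerator = sum(o * f for o, f in zip(observables, factors))
    return numerator / sum(factors)


def count_round_trips(energies, low, high):
    """Count round trips between the thresholds `low` and `high`."""
    half_trips = 0
    last = None  # last threshold visited: "low" or "high"
    for e in energies:
        if e <= low:
            side = "low"
        elif e >= high:
            side = "high"
        else:
            continue
        if last is not None and side != last:
            half_trips += 1
        last = side
    return half_trips // 2


def histogram_flatness(energies, e_min, e_max, n_bins):
    """Largest relative deviation of the bin counts from their mean."""
    counts = [0] * n_bins
    width = (e_max - e_min) / n_bins
    for e in energies:
        if e_min <= e <= e_max:
            index = min(int((e - e_min) / width), n_bins - 1)
            counts[index] += 1
    mean = sum(counts) / n_bins
    if mean == 0:
        raise ValueError("no samples inside [e_min, e_max]")
    return max(abs(c - mean) for c in counts) / mean


# Toy data (an assumption): degeneracies 1, 3, 3, 1 with W_B = 1 / rho.
TOY_ENERGIES = [0.0, 1.0, 2.0, 3.0]
TOY_LOG_W = [-math.log(r) for r in (1.0, 3.0, 3.0, 1.0)]
mean_energy = reweighted_average(TOY_ENERGIES, TOY_ENERGIES, TOY_LOG_W, 0.0)
```

With the criteria quoted above, one would require `histogram_flatness` to return at most about 0.15 to 0.20 and `count_round_trips` to return a number of order 100 to 1000.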
For U(1) lattice gauge theory, a multi-canonical update whose weights are determined starting from a Wang-Landau recursion is indeed discussed in [19]. We also note that the procedure used here to restrict the energy interval ergodically between $E_{\min}$ and $E_{\max}$ can easily be implemented also in the replica exchange method analysed in the previous subsection.
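For the replica exchange method referred to above, the swap step with the acceptance probability of Eq. (2.49) can be sketched in Python as follows. The dictionary layout of a replica (keys `"a"`, `"interval"`, `"energy"`, `"config"`) and the function names are hypothetical choices made for this illustration. It is a sketch of a single proposed swap between two neighbouring intervals, not the implementation used in this work.

```python
import math
import random


def swap_probability(a1, a2, e1, e2):
    """Metropolis acceptance probability of Eq. (2.49)."""
    exponent = (a1 - a2) * (e1 - e2)
    # min(1, exp(x)) equals 1 for x >= 0; branching also avoids overflow.
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def in_overlap(energy, interval1, interval2):
    """True if the energy lies in the region common to both intervals."""
    low = max(interval1[0], interval2[0])
    high = min(interval1[1], interval2[1])
    return low <= energy <= high


def try_swap(replica1, replica2, rng=random):
    """Propose an exchange of configurations between two replicas."""
    e1, e2 = replica1["energy"], replica2["energy"]
    i1, i2 = replica1["interval"], replica2["interval"]
    # A swap is proposed only if both energies lie in the common region.
    if not (in_overlap(e1, i1, i2) and in_overlap(e2, i1, i2)):
        return False
    if rng.random() >= swap_probability(replica1["a"], replica2["a"], e1, e2):
        return False
    # Exchange configurations and energies; a and the intervals stay in place.
    replica1["config"], replica2["config"] = replica2["config"], replica1["config"]
    replica1["energy"], replica2["energy"] = e2, e1
    return True


# Toy replicas (assumed values) on the overlapping intervals [0, 6] and [4, 10].
left = {"a": 1.0, "interval": (0.0, 6.0), "energy": 5.5, "config": "C1"}
right = {"a": 0.5, "interval": (4.0, 10.0), "energy": 4.5, "config": "C2"}
accepted = try_swap(left, right)
```

In the toy example the exponent in Eq. (2.49) is positive, so the swap is accepted with probability one and the two replicas exchange their configurations and energies.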
3 Application to Compact U(1) Lattice Gauge Theory

3.1 The model
Compact U(1) Lattice Gauge Theory is the simplest gauge theory based on a Lie group. Its action is given by

$$S = \beta \sum_{x, \mu<\nu} \cos(\theta_{\mu\nu}(x)) , \tag{3.1}$$