Geometric Inequalities for the Eigenvalues of Concentrated Markov Chains
Author: Olivier François
Source: Journal of Applied Probability, Vol. 37, No. 1 (Mar. 2000), pp. 15-28
Published by: Applied Probability Trust
Stable URL: http://www.jstor.org/stable/3215656
J. Appl. Prob. 37, 15-28 (2000). Printed in Israel. © Applied Probability Trust 2000
GEOMETRIC INEQUALITIES FOR THE EIGENVALUES OF CONCENTRATED MARKOV CHAINS

OLIVIER FRANÇOIS,* LMC/IMAG
Abstract

This article describes new estimates for the second largest eigenvalue in absolute value of reversible and ergodic Markov chains on finite state spaces. These estimates apply when the stationary distribution assigns a probability higher than 0.702 to some given state of the chain. Geometric tools are used. The bounds mainly involve the isoperimetric constant of the chain, and hence generalize famous results obtained for the second eigenvalue. Comparison estimates are also established, using the isoperimetric constant of a reference chain. These results apply to the Metropolis-Hastings algorithm in order to solve minimization problems, when the probability of obtaining the solution from the algorithm can be chosen beforehand. For these dynamics, robust bounds are obtained at moderate levels of concentration.

Keywords: Reversible Markov chain; eigenvalues; isoperimetric constant; Metropolis-Hastings dynamics; minimization

AMS 1991 Subject Classification: Primary 60J10; 65C05. Secondary 15A42
1. Introduction

During the past few years, Markov chain Monte Carlo methods have proved to be useful in seeking the absolute minimum of a function H defined on a finite set E. In these methods, an ergodic reversible Markov chain is simulated, for which the stationary distribution tends to concentrate on the absolute minimum. However, Monte Carlo algorithms return approximate solutions, and the quality of the approximation depends on the number N of steps performed by the chain. Most often, the stationary distribution is a Gibbs distribution, which depends on a positive parameter T called temperature:

    π_T(i) = exp(-H(i)/T) / Z_T,   for all i ∈ E,   (1)

where

    Z_T = Σ_{i∈E} exp(-H(i)/T).   (2)
In this article, the function H is assumed to be non-negative and minimal at a unique point i* in E such that H(i*) = 0. Thus, one has

    π_T(i*) → 1   as T → 0.   (3)
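The concentration behaviour (1)-(3) is easy to observe numerically. The following sketch uses a hypothetical energy landscape H on five states (not taken from the paper) and computes the Gibbs distribution at decreasing temperatures; the function name `gibbs` is an illustrative choice.

```python
import math

def gibbs(H, T):
    """Gibbs distribution pi_T(i) = exp(-H(i)/T) / Z_T on range(len(H))."""
    weights = [math.exp(-h / T) for h in H]
    Z = sum(weights)  # normalizing constant Z_T of Equation (2)
    return [w / Z for w in weights]

# Hypothetical energy landscape with unique minimum H(i*) = 0 at i* = 2.
H = [1.0, 0.5, 0.0, 0.8, 1.3]
i_star = H.index(0.0)

for T in [1.0, 0.5, 0.1, 0.05]:
    pi = gibbs(H, T)
    print(f"T = {T:4.2f}   pi_T(i*) = {pi[i_star]:.4f}")
```

As T decreases, the printed mass at i* increases towards 1, which is the content of (3).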
Received 20 August 1997; revision received 13 April 1999.
* Postal address: LMC/IMAG, BP 53, 38041 Grenoble cedex 9, France. Email address: [email protected].
Hence, simulating π_T at low temperatures enables us to solve the minimization problem associated with H with high probability. Let α be a number in the interval (0, 1), say α > 1/2, and assume that T can be fixed so that

    π_T(i*) ≥ α.   (4)

Denote by X_N the state which is returned at the end of the algorithm. For 'large' N, the probability P(X_N = i*) is correctly approximated by π_T(i*). Then, one has

    P(X_N = i*) ≥ α.   (5)
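Continuing the hypothetical example above, a temperature satisfying (4) for a prescribed confidence level α can be found by simply halving T until the Gibbs mass at i* exceeds α. This is a sketch: the halving schedule, the starting temperature, and the function names are illustrative choices, not from the paper.

```python
import math

def gibbs_mass_at_min(H, T, i_star):
    """pi_T(i*) for the Gibbs distribution of Equation (1)."""
    Z = sum(math.exp(-h / T) for h in H)
    return math.exp(-H[i_star] / T) / Z

def temperature_for_confidence(H, i_star, alpha, T0=1.0):
    """Halve T until pi_T(i*) >= alpha, i.e. until Equation (4) holds."""
    T = T0
    while gibbs_mass_at_min(H, T, i_star) < alpha:
        T /= 2.0
    return T

H = [1.0, 0.5, 0.0, 0.8, 1.3]  # same hypothetical landscape, i* = 2
T = temperature_for_confidence(H, i_star=2, alpha=0.9)
print(T, gibbs_mass_at_min(H, T, 2))
```

Any temperature at or below the returned value keeps the confidence level (5) at least α.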
Therefore, the number α may be viewed as a level of the confidence that a user can have in the Monte Carlo minimization procedure. In order to make the method efficient, the distribution of the Markov chain at the final step must be close to the stationary distribution π_T. Then, the main issue consists of finding the number N of steps needed to reach the stationary distribution.

It is well known that the rate of convergence towards the stationary distribution is controlled by the spectral gap of the chain (the second largest eigenvalue in absolute value). This article gives new estimates on the spectral gap of a reversible Markov chain under the hypothesis that Equation (4) is satisfied for α greater than an explicit value (close to 0.701; see Section 2). The approach is geometric. Intuitively, fast convergence is expected when the chain moves quickly to the subsets having large probability under the stationary distribution. This situation corresponds to large values of geometric quantities called isoperimetric constants. References [16, 21] shed light on the role played by these quantities in the control of the second eigenvalue of the chain. This article emphasizes the control which is exerted on the whole spectrum by the isoperimetric constant. It is organized as follows. The main results are stated in Section 2. The proofs are given in Section 3. Among Markov chain Monte Carlo methods, Metropolis-Hastings dynamics are very popular [12, 18]. In Section 4, our bounds are applied to the Metropolis-Hastings dynamics, yielding robust convergence estimates. The paper is concluded by a short discussion, in which the results are compared to those obtained in [15].

2. Background and presentation of the main results

2.1. Previous results

This article considers reversible ergodic Markov chains defined on the finite set

    E = {1, ..., n},   n ≥ 2.   (6)
Let π denote the (common) stationary distribution of these chains. The subscript T is used when π = π_T is a Gibbs distribution at temperature T. The transition matrix is denoted by P = (p(i, j))_{i,j=1,...,n}, or P_T when the stationary distribution is π_T. Also denote

    a(i, j) = π(i) p(i, j),   for all i, j ∈ E.   (7)
Reversibility implies that

    a(i, j) = a(j, i),   for all i, j ∈ E.   (8)
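The detailed-balance identity (7)-(8) can be checked numerically. The sketch below builds a small Metropolis-type matrix for a toy 3-state target distribution (the chain and the numbers are illustrative, not from the paper) and verifies that a(i, j) = π(i)p(i, j) is symmetric.

```python
def metropolis_matrix(pi, q):
    """Metropolis chain for target pi with symmetric proposal matrix q."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = q[i][j] * min(1.0, pi[j] / pi[i])
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P

pi = [0.8, 0.15, 0.05]                                  # toy stationary law
q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # symmetric proposal
P = metropolis_matrix(pi, q)

# a(i, j) = pi(i) p(i, j) must equal a(j, i), as in Equation (8).
a = [[pi[i] * P[i][j] for j in range(3)] for i in range(3)]
assert all(abs(a[i][j] - a[j][i]) < 1e-12 for i in range(3) for j in range(3))
print("detailed balance holds")
```

Symmetry of a is exactly reversibility of P with respect to π; it also implies that π is stationary for P.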
The eigenvalues of P are real, and can be ordered as follows:

    -1 < λ_n ≤ ⋯ ≤ λ_2 < λ_1 = 1.

For i ≠ j, the Metropolis-Hastings transition probabilities at temperature T are

    p_T(i, j) = q(i, j) F(π_T(i)/π_T(j))   if π_T(j) ≥ π_T(i),   (24)

    p_T(i, j) = q(i, j) (π_T(j)/π_T(i)) F(π_T(j)/π_T(i))   otherwise,

and

    p_T(i, i) = 1 - Σ_{j≠i} p_T(i, j),   (25)

where F is an arbitrary function such that 0 < F(x) ≤ 1 for 0 < x ≤ 1. (The standard Metropolis dynamics is obtained for F ≡ 1.) Let α be such that

    v*² < α < 1,
(26)
and

    T ≤ m / log(α(n-1)/(1-α)),   where m = min_{i≠i*} H(i),   (27)

so that π_T(i*) ≥ α. In what follows, (s → t) indicates that (s, t) is an edge of the transition graph.
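The standard Metropolis case F ≡ 1 admits a compact simulation loop. The sketch below is illustrative only: the energy function H, the cycle neighbourhood structure, and the uniform proposal over neighbours are hypothetical choices, not fixed by the text.

```python
import math
import random

def metropolis_step(i, H, neighbors, T, rng):
    """One step of standard Metropolis dynamics (the case F = 1): propose a
    uniform neighbour j, accept with probability min(1, exp(-(H[j]-H[i])/T))."""
    j = rng.choice(neighbors[i])
    if rng.random() < math.exp(-max(0.0, H[j] - H[i]) / T):
        return j
    return i

# Hypothetical instance: a cycle of six states with unique minimum at state 3.
H = [0.9, 0.6, 0.3, 0.0, 0.4, 0.7]
neighbors = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

rng = random.Random(0)
x, visits = 0, [0] * 6
for _ in range(20000):
    x = metropolis_step(x, H, neighbors, T=0.2, rng=rng)
    visits[x] += 1
print("most visited state:", visits.index(max(visits)))
```

At this moderate temperature the occupation counts concentrate on the minimizer, as the Gibbs distribution (1) predicts.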
In comparison with the results obtained from Poincaré's inequalities [15], the main advantage of this approach is its robustness with respect to

    δ = min_{(i→j): H(i) ≠ H(j)} |H(i) - H(j)|.   (29)

When T is fixed and δ is small, the estimates given by [15] may become very inaccurate. In contrast, the previous bound may be applied with small regard to the constant δ.
3. Geometric inequalities for the eigenvalues of a reversible Markov chain
3.1. Cheeger-like estimates

As usual, the space of functions defined on E is endowed with a Hilbertian structure. This space is denoted by L²(π), and

    ⟨f, g⟩ = Σ_{i=1}^n π(i) f(i) g(i),   for all f, g ∈ L²(π).   (30)
The transition matrix P is reversible if and only if P is self-adjoint as an operator of L²(π). In this section, the following result is proved.

Theorem 3.1. Let P be the transition matrix of a reversible Markov chain on E and i* ∈ E be such that π(i*) > v*², where v* is given by Equation (19). Let λ < 1 be an eigenvalue of P. Then we have

    1 - λ² ≥ K*²θ²,   (31)

where

    K* = 1 - √(1 - π(i*)) (1 + √(1 - π(i*))).

Corollary 3.1. Let P be the transition matrix of an ergodic reversible Markov chain on E and i* ∈ E be such that π(i*) > v*². Then we have, for all i, j ∈ E and k ≥ 1,

    |Pᵏ(i, j) - π(j)| ≤ √((1 - π(i)) / (4π(i))) (1 - K*²θ²)^{k/2}.   (32)
Proof. This is an obvious consequence of Theorem 3.1 and Equation (11).

The proof of Theorem 3.1 starts with a lemma on the eigenfunctions of the chain.

Lemma 3.1. Let P be the transition matrix of a time-reversible Markov chain. Let λ < 1 be an eigenvalue of P and f an associated eigenfunction satisfying ⟨f, f⟩ = 1. Then we have, for all i ∈ E,

    Σ_{j=1}^n π(j) |f(j) - [...]

[...] 0.702. This chain is a particular case of the Metropolis algorithm, and it will play a significant role in the proof of our third result. The spectrum of this chain can be entirely described [5]. The constant ρ(P) is given by

    ρ(P) = 1 - 1/n.   (52)
The isoperimetric constant θ is close to 1/n if π(1) is close to 1. The bound obtained in Theorem 3.1 behaves as (1 - 1/n²)^{1/2}. This does not give the correct order of the convergence rate. However, this drawback is well known in the Cheeger-like approach. A more significant example will be studied in Section 4. In this perspective, the comparison results presented in the next section will be useful.

3.2. Comparison of two chains

In this section, the Cheeger-like approach is merged with Poincaré's method used in [6, 4], in order to give new estimates for the eigenvalues of reversible Markov chains. In what follows, P₂ denotes the transition matrix of a reference chain while P₁ denotes the transition matrix of the chain of interest. Both chains are reversible with respect to the same probability distribution π. For all i, j ∈ E, denote by γ_ij a path of P₁ from i to j without loops (irreducibility guarantees the existence of such a path), that is, a sequence i₀ = i, i₁, ..., i_r = j such that p₁(i_k, i_{k+1}) > 0 and i_k ∉ {i₀, ..., i_{k-1}}. An edge of the transition graph between k and ℓ, k ≠ ℓ, is denoted by (k → ℓ), still with respect to P₁. Let Γ be an arbitrary set of paths of P₁ consisting of one path γ_ij for all i < j in E. Assume that Γ is symmetric: if γ_ij ∈ Γ, then Γ also contains γ_ji, which is obtained by reversing the sequence γ_ij. The following result is proved.

Theorem 3.2. Let P₁, P₂ be the transition matrices of reversible Markov chains on E and let i* ∈ E be such that π(i*) > v*², where v* is as in Equation (19). Let λ < 1 be an eigenvalue of P₁. Then we have

    1 - λ² ≥ K*²θ₂²/A²,   (53)
where

    A = max_{(k→ℓ)} [ Σ_{γ_ij ∋ (k→ℓ)} π(i) p₂(i, j) ] / (π(k) p₁(k, ℓ)),   (54)

and θ₂ is the isoperimetric constant associated with the transition matrix P₂.

Proof. Let f be an eigenfunction of P₁ associated with the eigenvalue λ < 1 and satisfying ⟨f, f⟩ = 1. Reordering the elements of E as before, we assume that f²(1) ≤ f²(2) ≤ ⋯
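For small chains, the comparison constant of Equation (54) can be computed by brute force over edges and paths. In the sketch below the chain of interest P₁ is a nearest-neighbour Metropolis chain on four states, the reference chain is p₂(i, j) = π(j), and the paths are monotone segments; these choices, and the ratio form of A used here, are illustrative assumptions consistent with (54) rather than an example from the paper.

```python
pi = [0.82, 0.10, 0.05, 0.03]  # toy stationary law, pi(i*) = 0.82 > 0.702
n = len(pi)

def p1(i, j):
    """Nearest-neighbour Metropolis chain, proposal 1/2 to each neighbour."""
    if abs(i - j) == 1:
        return 0.5 * min(1.0, pi[j] / pi[i])
    return 0.0

def path(i, j):
    """Monotone path i -> j through adjacent states (loop-free in P1)."""
    step = 1 if j > i else -1
    return [(k, k + step) for k in range(i, j, step)]

edges = [(k, l) for k in range(n) for l in range(n) if p1(k, l) > 0]
A = max(
    sum(pi[i] * pi[j]            # pi(i) * p2(i, j), with p2(i, j) = pi(j)
        for i in range(n) for j in range(n)
        if i != j and (k, l) in path(i, j))
    / (pi[k] * p1(k, l))
    for (k, l) in edges
)
print("A =", A)
```

The maximum is attained on the edges adjacent to the heavy state 0, reflecting the bottleneck that any flow out of i* must cross.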
    P( |(1/N) Σ_{k=0}^{N-1} f(X_k) - π(f)| ≥ t ),
where f is a function defined on E, and X_k is the state of the chain at step k. The results in [17, 19] give quantitative bounds for this probability. According to these results, the number of visits to i* before step N can be estimated under stationarity by Nπ(i*), to the extent that N is large compared to (1 - λ₂)⁻¹. In this situation, a user can easily identify the absolute minimum. In view of specific applications (such as hard combinatorial problems), it would be useful to obtain analogous results at lower confidence levels (e.g. 0.5), but this issue deserves further work.

Next we turn to the comparison with the results obtained in [15]. Theorem 4.1 gives a convergence rate towards equilibrium which is roughly 1 - K'e^{-2m/T} (for some K' > 0) as T goes to 0. However, the true order is 1 - Ce^{-m/T}, for some C > 0. In this perspective, the results established by [15] regarding the Metropolis dynamics are more powerful. But the approach of [15] has a weakness. Recall that

    δ = min_{(i→j): H(i) ≠ H(j)} |H(i) - H(j)|.   (86)
In [15], the temperature below which the inequality

    ρ_T ≤ 1 - K″ e^{-m/T}   (87)

holds is

    T* = min(⋯).   (88)
The definitions of the quantities γ_Γ, A and B appearing in (88) are quite long. The interested reader may refer to [15], p. 359. For instance, let δ be very small and T > T*. Then inequality (87) cannot be used, and the estimation of ρ_T obtained by [15] may be inaccurate. In contrast, the constant δ does not significantly affect the probability π_T(i*). Applying Theorem 4.1 to the Metropolis chain leads to the following bound (α > v*²):