Total Variation Bound for Kac’s Random Walk Yunjiang Jiang
arXiv:0905.1539v2 [math.PR] 13 May 2009
May 13, 2009

Abstract

We show that the classical Kac's random walk on $S^{n-1}$ starting from the point mass at $e_1$ mixes in $O(n^5 \log n)$ steps in total variation distance. This improves a previous bound by Diaconis and Saloff-Coste of $O(n^{2n})$.
1 Introduction
Mark Kac proposed the following simplified model of one-dimensional Boltzmann gas dynamics (for historical details, see [3], [4]): For $n$ particles on $\mathbb{R}$, we can represent their velocities $(v_1, \ldots, v_n)$ as a point on the unit sphere $S^{n-1}$, after normalization so that
$$\sum_{i=1}^{n} v_i^2 = 1.$$
Conservation of kinetic energy (assuming 0 potential energy) in the gas dynamics is equivalent to $(v_1(t), \ldots, v_n(t))$ staying on $S^{n-1}$ for all $t \ge 0$. We will not introduce momentum conservation in our model, because that will force the collision to be inelastic (see the second paragraph below). Each time there is a collision, it occurs with probability 1 between two particles, which corresponds to choosing two coordinate directions $x_i, x_j$ and rotating $S^{n-1}$ along the 2-plane $x_i \wedge x_j$ by some angle $\theta$. Notice that $v_i^2 + v_j^2$ stays the same before and after the collision, since the velocities of the other particles are not affected by the collision.

By disregarding the position information of the particles (which have to be confined in some compact domain, for example $S^1$, else they will eventually run off to infinity), each collision occurs between any pair of the particles with equal probability $1/\binom{n}{2}$. The rotation angle $\theta$ can be chosen from some distribution on $[0, 2\pi)$, which physically is a measure of the elasticity of the collision; for example, inelastic collision in $\mathbb{R}$ will correspond to a distribution of $\theta$ that is a delta measure concentrated at $\pi$. In this paper, we will assume that $\theta$ is uniformly distributed on $[0, 2\pi)$. Thus we obtain a discrete time Markov chain on $S^{n-1}$ with transition kernel given by, for $f : S^{n-1} \to \mathbb{R}$ continuous, and $x \in S^{n-1}$,
$$Kf(x) = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \int_0^{2\pi} f(R(i, j; \theta)x)\, \frac{1}{2\pi}\, d\theta \qquad (1)$$
where $R(i, j; \theta)$ denotes the rotation along the oriented $i \wedge j$ plane by the angle $\theta$. By transposing, $K$ defines a map from the set of probability measures on $S^{n-1}$ to itself, since $K(1) = 1$. It is not too hard to see that $U_{n-1}$, the uniform distribution on $S^{n-1}$, is a stationary distribution for $K$: for each summand $K_{i,j}$ (without the factor $1/\binom{n}{2}$ in (1)), we have
$$\begin{aligned}
U_{n-1}(K_{i,j} f) &= \int_{S^{n-1}} (K_{i,j} f)(x)\, U_{n-1}(dx) \\
&= \int_{S^{n-1}} \left( \int_0^{2\pi} f(R(i, j; \theta)x)\, \frac{1}{2\pi}\, d\theta \right) U_{n-1}(dx) \\
&= \frac{1}{2\pi} \int_0^{2\pi} \left( \int_{S^{n-1}} f(x)\, U_{n-1}(R(i, j; \theta)^{-1} dx) \right) d\theta \\
&= \int_{S^{n-1}} f(x)\, U_{n-1}(dx)
\end{aligned}$$
using a change of variable formula and the fact that $U_{n-1}$ is invariant under rotations. This establishes that $U_{n-1} K_{i,j} = U_{n-1}$ for all $i \neq j$. Thus their average satisfies $U_{n-1} K = U_{n-1}$ as well.

We further claim that the Markov chain is aperiodic because once a point is reached, it can be reached again in the next step with positive probability density for any rotation. It is also irreducible since along a sequence of rotations $(i_1 \wedge i_2, \ldots, i_k \wedge i_{k+1})$ that form a connected spanning graph in $K_n$, the complete graph on $n$ vertices, one can transport any point on $S^{n-1}$ to any other point with positive probability density; such sequences of rotations certainly occur with positive probability. Thus by the general theory of Markov chains, we know that with any initial distribution $\mu$ on $S^{n-1}$,
$$\lim_{l \to \infty} \mu K^l(A) - U_{n-1}(A) = 0$$
uniformly in $A \subset S^{n-1}$. This implies total variation convergence. Using the $L^2$ theory of discrete time Markov chains, it can be shown that if the starting distribution $\mu$ is in $L^2(S^{n-1}, U_{n-1})$, then we get the following convergence bound
$$\|\mu K^l - U_{n-1}\|_{TV} < \|\mu - 1\|_{L^2} \left(1 - \frac{1}{2n}\right)^l$$
by the result in [4], which shows that the spectral gap of $K$ is bounded below by $\frac{1}{2n}$. In fact it is given exactly by $\frac{n+2}{2n(n-1)}$ for $n \ge 2$.

If the initial distribution $\mu$ does not have an $L^2$ density with respect to $U_{n-1}$, then the $L^2$ theory above fails. The best result for the rate of convergence when the initial distribution is, say, concentrated at one point is given in [3], where getting within $\epsilon$ of $U_{n-1}$ in total variation norm requires $O(n^{2n} \log \epsilon)$ steps. The $L^2$ theory gives a mixing time of $O(2n \log \epsilon \|\mu\|_{L^2})$. In the following section, however, we will show that by a truncation argument, and triangle inequalities for the total variation distance, one can still essentially use the spectral gap to bound the total variation mixing time by $O(n^5 (\log n \log \epsilon)^3)$, when the walk starts at $e_1$, a fixed point on $S^{n-1}$. It is not clear whether the power on the $\log \epsilon$ factor in the mixing time bound can be reduced to 1. In this sense, the result is not a strict improvement over [3].
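To make the $L^2$ bound above concrete, the following Python sketch compares the exact spectral gap quoted from [4] with the lower bound $\frac{1}{2n}$, and counts how many steps the bound $\|\mu - 1\|_{L^2}(1 - \frac{1}{2n})^l$ needs to drop below a target. The function names and the sample inputs are illustrative assumptions, not part of the paper; the value of $\|\mu - 1\|_{L^2}$ must be supplied by the user.

```python
import math
from fractions import Fraction


def exact_gap(n: int) -> Fraction:
    """Exact spectral gap (n + 2) / (2 n (n - 1)) quoted from [4], for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return Fraction(n + 2, 2 * n * (n - 1))


def gap_lower_bound(n: int) -> Fraction:
    """Lower bound 1 / (2n) on the spectral gap used in the L^2 estimate."""
    return Fraction(1, 2 * n)


def l2_steps(n: int, l2_distance: float, eps: float) -> int:
    """Smallest l with l2_distance * (1 - 1/(2n))**l < eps.

    l2_distance stands for ||mu - 1||_{L^2}; it is an input assumed known.
    """
    rate = 1.0 - 1.0 / (2 * n)
    l = 0
    if l2_distance >= eps:
        # Closed-form starting guess, then adjust upward for strictness.
        l = max(0, math.floor(math.log(eps / l2_distance) / math.log(rate)))
    while l2_distance * rate ** l >= eps:
        l += 1
    return l


if __name__ == "__main__":
    for n in (2, 5, 10, 100):
        print(n, exact_gap(n), gap_lower_bound(n), l2_steps(n, 5.0, 0.01))
```

The step count grows linearly in $n$ for a fixed starting $L^2$ distance, which is why the difficulty for a point-mass start lies entirely in the lack of an $L^2$ density.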
2 Bounding the Total Variation Distance
Next we want to bound the convergence rate of Kac's random walk on $S^{n-1}$ in total variation distance. Recall that the total variation distance between two probability measures $\mu$ and $\nu$ on the same probability space $(S, \mathcal{S})$ is defined by the following variational quantity:
$$\|\mu - \nu\|_{TV} = \sup_{A \in \mathcal{S}} |\mu(A) - \nu(A)|$$
where $\mathcal{S}$ is the $\sigma$-algebra on $S$. Let $A_k$ be the event that by the $k$th step of the walk, every pair of coordinates has been used. Then we have
$$P(A_k^c) := \eta_k < \binom{n}{2} \left(1 - \frac{1}{\binom{n}{2}}\right)^k$$
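The event $A_k$ and the bound on $\eta_k$ can be explored with a small simulation. The sketch below, written with only the Python standard library, runs the Kac walk from $e_1$ (uniform pair, uniform angle, as assumed in the introduction) until every coordinate pair has been used, and evaluates the union bound $\binom{n}{2}(1 - 1/\binom{n}{2})^k$. The helper names are hypothetical and the code is an illustration, not the author's construction.

```python
import math
import random
from itertools import combinations


def kac_step(x, rng):
    """Rotate x in place in a uniformly chosen coordinate plane by a uniform angle.

    Returns the (unordered) pair of coordinates that was used.
    """
    i, j = rng.sample(range(len(x)), 2)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    xi, xj = x[i], x[j]
    x[i] = c * xi - s * xj
    x[j] = s * xi + c * xj
    return (min(i, j), max(i, j))


def union_bound(n, k):
    """Upper bound binom(n,2) * (1 - 1/binom(n,2))**k on P(A_k^c)."""
    pairs = math.comb(n, 2)
    return pairs * (1.0 - 1.0 / pairs) ** k


def steps_until_all_pairs_used(n, rng):
    """Run the walk from e_1 and return (first k at which A_k holds, final point)."""
    x = [1.0] + [0.0] * (n - 1)
    unused = set(combinations(range(n), 2))
    k = 0
    while unused:
        unused.discard(kac_step(x, rng))
        k += 1
    return k, x


if __name__ == "__main__":
    rng = random.Random(0)
    k, x = steps_until_all_pairs_used(6, rng)
    print("pairs covered after", k, "steps; norm^2 =", sum(v * v for v in x))
    print("union bound at that k:", union_bound(6, k))
```

Each rotation preserves the Euclidean norm, so the walk stays on $S^{n-1}$ up to floating-point error.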
Conditioning on this event, we have the following two claims:

1. The density $g := \frac{d\mu'_k}{dU_{n-1}}$ of the resulting distribution $\mu'_k$ of the conditioned random walk with respect to the uniform distribution on $S^{n-1}$ satisfies the following bound
$$g(x) \le \left|\min_{i=1}^{n} x_i\right|^{-n} \left( \sum_{i=1}^{n} (-\log |x_i|)^k \right) C^k \prod_{m=1}^{k} m! \qquad (2)$$
$$\le C^k k^{k^2} \left|\min_{i=1}^{n} x_i\right|^{-n} \left(-\log \left|\min_{i=1}^{n} x_i\right|\right)^k \qquad (3)$$
for some fixed absolute constant $C$.

2. For $k > -n^2 \log n \log \epsilon$, and $\epsilon < \frac{n^{-3/2}}{2}$, the set $H_\epsilon := \{x \in S^{n-1} : |x_i| < \epsilon \text{ for some } i\}$ satisfies the following bound on its probability
$$\mu'_k(H_\epsilon) \le \epsilon^{1/4} \qquad (4)$$
Let us first show how claims 1 and 2 lead to a polynomial time convergence rate for the Kac walk under total variation norm. Let $\mu_k$ be the distribution on $S^{n-1}$ after $k$ steps of the random walk, and let $\mu'_k$ be $\mu_k$ conditional on $A_k$, i.e., for $B \subset S^{n-1}$,
$$\mu'_k(B) = P(\delta_{e_1} R^k \in B \mid A_k)$$
where $R$ is the one step transition kernel of the Kac random walk. Then we have
$$\|\mu'_k - \mu_k\|_{TV} < \eta_k \qquad (5)$$
This holds because
$$\mu'_k(B) < \frac{\mu_k(B)}{1 - \eta_k}$$
which gives $\mu_k(B) > \mu'_k(B) - \eta_k \mu'_k(B)$, hence $\mu'_k(B) - \mu_k(B) < \eta_k$, which shows (5).

Next recall that a Markov kernel is weakly contracting in total variation norm because if $f$ is a bounded continuous function on the state space with $\|f\|_\infty \le 1$ then $Rf(x) = \int R(x, dy) f(y)$ satisfies the same $L^\infty$ bound, hence
$$(\mu R - \nu R)(f) = (\mu - \nu)(Rf) \le \|\mu - \nu\|_{TV}$$
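The weak contraction property is easiest to see on a finite state space. The sketch below is a toy illustration under that assumption (a three-state stochastic matrix, not the Kac kernel): it pushes two distributions through the same kernel and compares total variation distances before and after. All names and the example matrix are made up for illustration.

```python
def tv_distance(mu, nu):
    """Total variation distance sup_A |mu(A) - nu(A)| on a finite space.

    On a finite space the supremum equals half the l^1 distance.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def push_forward(mu, kernel):
    """Return the row vector mu * kernel, where kernel[x][y] = R(x, {y})."""
    size = len(kernel[0])
    return [sum(mu[x] * kernel[x][y] for x in range(len(mu))) for y in range(size)]


if __name__ == "__main__":
    # Hypothetical 3-state stochastic matrix; each row sums to 1.
    kernel = [
        [0.5, 0.5, 0.0],
        [0.25, 0.5, 0.25],
        [0.0, 0.5, 0.5],
    ]
    mu = [1.0, 0.0, 0.0]
    nu = [0.0, 0.0, 1.0]
    before = tv_distance(mu, nu)
    after = tv_distance(push_forward(mu, kernel), push_forward(nu, kernel))
    print(before, after)  # the second value never exceeds the first
```

This is the property that lets the argument replace $\mu_k$ by nearby distributions before applying further steps of the walk.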
Thus by the triangle inequality we just need to bound $\|\mu'_k R^l - U_{n-1}\|_{TV}$ from now on, where $U_{n-1}$ denotes the uniform distribution on $S^{n-1}$, and at the end add $\eta_k$ to the resulting bound. Next we modify $\mu'_k$ to a different distribution $\nu_k$ as follows. We define $\nu_k$ in terms of its density with respect to $U_{n-1}$.
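On the set $H_\epsilon^c$,
$$\frac{d\nu_k}{dU_{n-1}} = \frac{d\mu'_k}{dU_{n-1}}$$
On the set $H_\epsilon$, we let its density be a constant equal to the mass of $H_\epsilon$ under $\mu'_k$ divided by its mass under $U_{n-1}$, which is what is needed for $\nu_k$ to be a probability distribution on $S^{n-1}$; we invoke claim 2 above to get an upper bound on this constant:
$$\frac{d\nu_k}{dU_{n-1}} \equiv \frac{\mu'_k(H_\epsilon)}{U_{n-1}(H_\epsilon)}$$
where the mass $U_{n-1}(H_\epsilon)$ is controlled through the inequality
$$\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-1}{2})} \ge \sqrt{\frac{n-2}{2}}$$
which follows from log convexity of the $\Gamma$ function. The total variation distance between $\mu'_k$ and $\nu_k$ is given simply by their total variation distance over the region $H_\epsilon$, hence we have
$$\|\mu'_k - \nu_k\|_{TV} \le \mu'_k(H_\epsilon) + \frac{n \Gamma(\frac{n}{2})}{\Gamma(\frac{1}{2}) \Gamma(\frac{n-1}{2})}\, \epsilon \qquad (6)$$
$$\le n^{3/2} \epsilon + \epsilon^{1/4} \qquad (7)$$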
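The step from (6) to (7) rests on the size of the ratio $\Gamma(\frac{n}{2}) / \Gamma(\frac{n-1}{2})$. As a numerical sanity check (not a proof), the sketch below evaluates this ratio with `math.lgamma` and compares it against the two-sided estimate $\sqrt{(n-2)/2} \le \Gamma(\frac{n}{2})/\Gamma(\frac{n-1}{2}) \le \sqrt{(n-1)/2}$ that log convexity provides, and checks that the coefficient of $\epsilon$ in (6) stays below $n^{3/2}$. The function names are illustrative assumptions.

```python
import math


def gamma_ratio(n: int) -> float:
    """Gamma(n/2) / Gamma((n-1)/2), computed through log-gamma for stability."""
    return math.exp(math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0))


def cap_coefficient(n: int) -> float:
    """Coefficient n * Gamma(n/2) / (Gamma(1/2) * Gamma((n-1)/2)) of eps in (6)."""
    return n * gamma_ratio(n) / math.sqrt(math.pi)  # Gamma(1/2) = sqrt(pi)


if __name__ == "__main__":
    for n in (2, 3, 10, 100, 1000):
        lower = math.sqrt((n - 2) / 2.0)
        upper = math.sqrt((n - 1) / 2.0)
        print(n, lower, gamma_ratio(n), upper, cap_coefficient(n), n ** 1.5)
```

For large $n$ the ratio behaves like $\sqrt{n/2}$, so the coefficient in (6) is of order $n^{3/2}$, consistent with (7).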
Thus by choosing $\epsilon$ sufficiently small, whose exact value we will determine at the end, we can make sure that $\mu'_k$ and $\nu_k$ are very close in total variation distance. And again by weak contractivity of the Markov kernel, we now simply need to focus on bounding $\|\nu_k R^l - U_{n-1}\|_{TV}$. Since $\nu_k$ has an $L^2$ density with respect to $U_{n-1}$, we can use the spectral gap to bound the rate of convergence. First we bound the $L^2(dU_{n-1})$ distance between $\nu_k$ and $U_{n-1}$:
$$\|\nu_k - U_{n-1}\|_{L^2(dU_{n-1})} = \left( \int_{H_\epsilon} \left| \frac{d\nu_k}{dU_{n-1}} - 1 \right|^2 dU_{n-1} + \int_{H_\epsilon^c} \left| \frac{d\nu_k}{dU_{n-1}} - 1 \right|^2 dU_{n-1} \right)^{1/2} \qquad (8)$$
Let us bound the two integrals separately. For the first integral on the right hand side of (8), we have
$$\int_{H_\epsilon} \left| \frac{d\nu_k}{dU_{n-1}} - 1 \right|^2 dU_{n-1} \le \int_{H_\epsilon} \left( \frac{d\nu_k}{dU_{n-1}} \right)^2 dU_{n-1} + U_{n-1}(H_\epsilon) < 4 \epsilon^{-1/2} \sqrt{\frac{2\pi}{n-2}} < \epsilon^{-3/2} \qquad (9)$$
For the second integral, notice first that $H_\epsilon^c$ is the set of points on $S^{n-1}$ for which all the coordinates are at least $\epsilon$ in absolute value. So claim 1 tells us that the density $\frac{d\nu_k}{dU_{n-1}}$ over this region is bounded above by $\epsilon^{-n}$, from which we immediately get the following bound
$$\int_{H_\epsilon^c} \left| \frac{d\nu_k}{dU_{n-1}} - 1 \right|^2 dU_{n-1} < \epsilon^{-2n} + 1 \qquad (10)$$
Combining (9) and (10), we get, for $\epsilon < \frac{1}{2}$ say, that
$$\|\nu_k - U_{n-1}\|_{L^2(dU_{n-1})} \le 2 \epsilon^{-n}$$
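By the results in [4], we know that the spectral gap of the Kac kernel is $\frac{1}{n}$, so we get
$$\|\nu_k R^l - U_{n-1}\|_{TV} \le \left\| \frac{d\nu_k}{dU_{n-1}} - 1 \right\|_{L^2(dU_{n-1})} \left(1 - \frac{1}{n}\right)^l \le 2 \epsilon^{-n} \left(1 - \frac{1}{n}\right)^l \qquad (11)$$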
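Finally combining (5), (6) and (11), we get
$$\|\delta_{e_1} R^{k+l} - U_{n-1}\|_{TV} \le \binom{n}{2} \left(1 - \frac{1}{\binom{n}{2}}\right)^k + n^{3/2} \epsilon + \epsilon^{1/4} + C^k k^{k^2} |\epsilon|^{-n} (-\log \epsilon)^k \left(1 - \frac{1}{n}\right)^l \qquad (12)$$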
So it remains to minimize the right hand side of (12) with respect to $k$ and $l$. Suppose our target total variation distance is $3\delta$. Then we can simply divide $3\delta$ into three equal parts and bound each summand in (12) by $\delta$. We look at each summand below. Bounding the first summand requires
$$\binom{n}{2} \left(1 - \frac{1}{\binom{n}{2}}\right)^k < \delta$$
which holds as soon as
$$k > \binom{n}{2} (-\log \delta + 2 \log n)$$
So it suffices to take
$$k > n^2 \log n \log \frac{1}{\delta} \qquad (13)$$
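The choice of $k$ for the first summand can be checked numerically. The sketch below computes the threshold $\binom{n}{2}(-\log \delta + 2 \log n)$, rounds it up to the next integer strictly above it, and verifies that the first summand of (12) is then below $\delta$. It relies on the elementary inequality $(1 - 1/N)^k \le e^{-k/N}$; the function names are assumptions made for this illustration.

```python
import math


def first_summand(n: int, k: int) -> float:
    """First term of (12): binom(n,2) * (1 - 1/binom(n,2))**k."""
    pairs = math.comb(n, 2)
    return pairs * (1.0 - 1.0 / pairs) ** k


def sufficient_k(n: int, delta: float) -> int:
    """Smallest integer strictly above binom(n,2) * (-log(delta) + 2 log n)."""
    pairs = math.comb(n, 2)
    threshold = pairs * (-math.log(delta) + 2.0 * math.log(n))
    return math.floor(threshold) + 1


if __name__ == "__main__":
    for n in (3, 10, 50):
        for delta in (1e-1, 1e-3):
            k = sufficient_k(n, delta)
            print(n, delta, k, first_summand(n, k))
```

The threshold is of order $n^2 (\log n + \log \frac{1}{\delta})$, which (13) replaces by the slightly cruder product form.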
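Bounding the second summand, $\epsilon^{1/4} + n^{3/2} \epsilon < \delta$, it suffices to have
$$\epsilon^{1/4} < \delta/2, \qquad n^{3/2} \epsilon < \delta/2$$
which gives
$$\epsilon < \min\left\{ \frac{\delta^4}{16},\ \frac{\delta}{2 n^{3/2}} \right\}$$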
Observe that at step $j - 1$, $j \le k$, the support of the running distribution is a subsphere of $S^{n-1}$. Without loss of generality, we call this subsphere $S^m$. Denote by $u_j, v_j$ the axes that span the plane of rotation $\gamma_j$. The way $\gamma_j$ affects the previous running distribution can be classified into three cases:

1. $u_j, v_j \notin S^m$, in which case the running distribution remains unchanged.
2. $u_j, v_j \in S^m$, in which case the support at step $j$ is still on $S^m$.
3. $u_j \in S^m$, $v_j \notin S^m$, in which case the support of the running distribution grows to be a sphere with 1 dimension higher than $S^m$, denoted without loss of generality $S^{m+1}$, and the density with respect to the uniform distribution on $S^{m+1}$ is bounded by $(u_j^2 + v_j^2)^{-1/2}$ times the previous density bound with respect to $S^m$.

Case 1 is clear because the rotation does not take $S^m$ outside itself and for $\theta \in [0, 2\pi]$, the density at $(x_1, \ldots, x_{m+1}, \ldots, (u_j^2 + v_j^2)^{1/2} \cos \theta, \ldots, (u_j^2 + v_j^2)^{1/2} \sin \theta, \ldots, x_n)$ with respect to $U_m$ only depends on the first $m + 1$ coordinates, which means that it remains the same after averaging over $\theta$ uniformly in $[0, 2\pi]$. To understand case 3, we have the following

Lemma 3.1. Assuming the running density $h(x_1, \ldots, x_{m+1})$ with respect to $U_m$ after step $j - 1$ is bounded by $g(x_1, \ldots, x_{m+1})$, and that without loss of generality $u_j = x_{m+1}$, $v_j = x_{m+2}$, then the new density with respect to $U_{m+1}$ after step $j$ is bounded by
$$\frac{1}{2\pi}\, g(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2})\, (x_{m+1}^2 + x_{m+2}^2)^{-1/2}$$

Proof. Denote the new density with respect to $U_{m+1}$ by $h(x_1, \ldots, x_{m+2})$ with a slight abuse of notation. Then
$$h(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \cos \theta, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \sin \theta)$$
is independent of $\theta$ and in particular equals
$$h(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2}, 0)$$
Furthermore the total contribution of density from $(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \cos \theta, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \sin \theta)$ for all $\theta$ should add up to the previous density at the point $(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2})$. In other words,
$$\int_{\theta=0}^{2\pi} (x_{m+1}^2 + x_{m+2}^2)^{1/2}\, h(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \cos \theta, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \sin \theta)\, d\theta = h(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2})$$
Notice that the factor $(x_{m+1}^2 + x_{m+2}^2)^{1/2}$ accounts for the measure of the circle over which we aggregate. Thus we get
$$\begin{aligned}
h(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \cos \theta, (x_{m+1}^2 + x_{m+2}^2)^{1/2} \sin \theta)
&= \frac{1}{2\pi} (x_{m+1}^2 + x_{m+2}^2)^{-1/2}\, h(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2}) \\
&\le \frac{1}{2\pi} (x_{m+1}^2 + x_{m+2}^2)^{-1/2}\, g(x_1, \ldots, (x_{m+1}^2 + x_{m+2}^2)^{1/2})
\end{aligned}$$
To study case 2, we make the assumption that after step $j - 1$, the density with respect to $U_m$ is bounded by an expression of the form
$$C (a_1^2 + b_1^2)^{-1/2} \cdots (a_{m-1}^2 + b_{m-1}^2)^{-1/2} \left[ (-\log |x_1|)^{j-1} + \ldots + (-\log |x_{m+1}|)^{j-1} \right] \qquad (14)$$
where $C$ is a constant that varies with $j$ and $m$. Here $a_i \neq b_i$ for each $i$ and $(a_1, b_1), \ldots, (a_{m-1}, b_{m-1})$ are pairs in $(x_1, \ldots, x_{m+1})^2$ with the property that no two pairs are the same and each coordinate appears at most twice.
Lemma 3.2. Under the assumption above, if the $j$th rotation is in case 2, then the new density bound takes the form
$$2C (j + 1)!\, (a_1^2 + b_1^2)^{-1/2} \cdots (a_{m-1}^2 + b_{m-1}^2)^{-1/2} \left[ (-\log |x_1|)^{j} + \ldots + (-\log |x_{m+1}|)^{j} \right]$$
with possibly a different sequence of $(a_i, b_i)$ satisfying the same property as before.

Proof. Without loss of generality assume $(u_j, v_j) = (1, 2)$. The new density $h'$ is obtained from the old density $h$ by averaging $h(R(1, 2, \theta)x)$ over $\theta \in [0, 2\pi]$, where $R(1, 2, \theta)x$ denotes the rotation of the vector $x \in S^m$ by angle $\theta$ along $x_1 \wedge x_2$. In formula we have
$$h'(x) = \frac{1}{2\pi} \int_0^{2\pi} h(R(1, 2, \theta)x)\, d\theta \qquad (15)$$
We write the bound (14) as a sum of $m + 1$ terms and consider one of the terms
$$g_i(x) = C (a_1^2 + b_1^2)^{-1/2} \cdots (a_{m-1}^2 + b_{m-1}^2)^{-1/2} (-\log |x_i|)^{j-1}$$
By assumption, at most two of $a_1, b_1, \ldots, a_{m-1}, b_{m-1}$ equal $x_1$ and at most two equal $x_2$. By the circle averaging formula (15), we have
$$g_i'(x) = \frac{1}{2\pi} \int_0^{2\pi} g_i((x_1^2 + x_2^2)^{1/2} \cos \theta, (x_1^2 + x_2^2)^{1/2} \sin \theta, x_3, \ldots, x_{m+1})\, d\theta$$
We shall break the integral into two parts, where the range of integration is over $I_c = [0, \pi/4] \cup [3\pi/4, 5\pi/4] \cup [7\pi/4, 2\pi]$ and its complement $I_s$ in $[0, 2\pi]$ respectively, i.e., the ranges are where $|\cos \theta|$ is close to 1 and $|\sin \theta|$ is close to 1 respectively.

For $\theta \in I_c$, all the factors in $g_i(x)$ of the form $(x_2^2 + x_s^2)^{-1/2}$, that involve $x_2$ but not $x_1$, upon the rotation $R(1, 2, \theta)$ become $((x_1^2 + x_2^2) \sin^2 \theta + x_s^2)^{-1/2}$, which can be bounded above by $\sqrt{2} (x_1^2 + x_2^2 + x_s^2)^{-1/2}$. As for the factors that involve both $x_1$ and $x_2$, i.e., $(x_1^2 + x_2^2)^{-1/2}$, there can be at most one such factor, and it remains the same under the rotation $R(1, 2, \theta)$ since $(x_1^2 + x_2^2) \cos^2 \theta + (x_1^2 + x_2^2) \sin^2 \theta = x_1^2 + x_2^2$. The factors that involve $x_1$ and $x_s$, $s \neq 2$, become $((x_1^2 + x_2^2) \cos^2 \theta + x_s^2)^{-1/2}$, which we can bound as follows: using the fact that $\frac{1}{2}(|a| + |b|) \le (a^2 + b^2)^{1/2} \le |a| + |b|$, we get
$$((x_1^2 + x_2^2) \cos^2 \theta + x_s^2)^{-1/2} \sim ((|x_1| + |x_2|) |\cos \theta| + |x_s|)^{-1}$$
where $a \sim b$ means $b/C \le a \le bC$ for some constant $C$. Here the constant can be taken to be 2.

Finally it is also possible that $i \in \{1, 2\}$, in which case we also have to deal with a $(-\log[(x_1^2 + x_2^2)^{1/2} \cos \theta])^{j-1}$ factor that goes to infinity for $\theta \in I_c$. In fact when $i = 1$, the only factors that have singularities for $\theta \in I_c$ and for the coordinates bounded away from 0 take the following form:
$$((|x_1| + |x_2|) |\cos \theta| + |x_s|)^{-1} ((|x_1| + |x_2|) |\cos \theta| + |x_t|)^{-1} (-\log[(x_1^2 + x_2^2)^{1/2} \cos \theta])^{j-1}$$
where $s \neq t$, or without the $x_t$ factor. In the former case we show that the following integral
$$\frac{1}{2\pi} \int_{\theta \in I_c} ((|x_1| + |x_2|) |\cos \theta| + |x_s|)^{-1} ((|x_1| + |x_2|) |\cos \theta| + |x_t|)^{-1} (-\log[(x_1^2 + x_2^2)^{1/2} \cos \theta])^{j-1}\, d\theta \qquad (16)$$
is bounded by
$$j!\, (x_1^2 + x_2^2)^{-1/2} (x_s^2 + x_t^2)^{-1/2} \left[ (-\log |x_1|)^j + (-\log |x_2|)^j + (-\log |x_s|)^j + (-\log |x_t|)^j \right]$$
whereas in the case $x_t$ is not present, the same bound applies to the expression
$$\frac{1}{2\pi} \int_{\theta \in I_c} ((|x_1| + |x_2|) |\cos \theta| + |x_s|)^{-1} (-\log[(x_1^2 + x_2^2)^{1/2} \cos \theta])^{j-1}\, d\theta \qquad (17)$$
clearly because $((|x_1| + |x_2|) \cos \theta + |x_t|)^{-1} \ge 1$. When $i \neq 1$, the logarithmic singularity will not arise when integrating over $\theta \in I_c$, so it will trail off as a remaining factor of the form $(-\log |x_i|)^{j-1} \le 1 + (-\log |x_i|)^j$. Recall also that we have factors of the form
$$2 (x_1^2 + x_2^2 + x_s^2)^{-1/2} (x_1^2 + x_2^2 + x_t^2)^{-1/2} \qquad (18)$$
coming from the uniform bound on the factors involving $x_2$ but not $x_1$; here $s, t$ are possibly different indices than those appearing in the singular factors. (18) can be trivially bounded above by $2 (x_1^2 + x_s^2)^{-1/2} (x_2^2 + x_t^2)^{-1/2}$. The remaining inverse factors in $g(R(1, 2, \theta)x)$ do not contain $x_1$ or $x_2$, so one can easily check that the inductive hypothesis is satisfied.

The best way to visualize this branching inductive argument is to consider a simple graph on $m + 1$ vertices with degrees bounded above by 2. An edge between $i$ and $j$ represents a factor of the form $(x_i^2 + x_j^2)^{-1/2}$ in the bound on the density. A rotation in the $x_1 \wedge x_2$ plane has the effect of producing two graphs on $m + 1$ vertices. Without loss of generality let us describe one of those two descendant graphs, the one associated with $x_1$: there will be edges $(1, 2)$, $(3, 4)$, $(1, 3)$, and $(2, 4)$ if $x_3$ and $x_4$ were incident to $x_1$ in the previous graph, or simply $(1, 2)$ when $x_1$ only has degree 1. If $x_1$ had degree 0, then it remains isolated in the $x_1$ component of the descendant graph. In the process of this rewiring, some logarithmic factors are also introduced, namely, if $(-\log |x_3|)^{j-1}$ or $(-\log |x_4|)^{j-1}$ was a factor in the bound for the previous step running distribution, then the new bound will have $j! (-\log |x_4|)^j$. If it is a log factor of other coordinates, then the exponent remains the same.

It remains to prove the bound (16), which is given by the following technical lemma.

Lemma 3.3. For $0 \le x_t, x_s$, $0 \le x_1, x_2$,
$$\int_0^1 ((x_1 + x_2)\theta + x_s)^{-1} ((x_1 + x_2)\theta + x_t)^{-1} (-\log[(x_1 + x_2)\theta])^{j-1}\, d\theta \le 4 (j + 1)!\, (x_s + x_t)^{-1} (x_1 + x_2)^{-1} \left[ (-\log x_1)^j + (-\log x_2)^j + (-\log x_s)^j + (-\log x_t)^j \right]$$
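Proof. Without loss of generality, we can assume $x_t \le x_s$. First of all the factor $((x_1 + x_2)\theta + x_s)^{-1}$ can be bounded above by $2 (x_s + x_t)^{-1}$ for $\theta \in [0, 1]$. So it remains to bound the integral of the remaining factors:
$$\begin{aligned}
&\int_0^1 ((x_1 + x_2)\theta + x_t)^{-1} (-\log[(x_1 + x_2)\theta])^{j-1}\, d\theta \\
&\quad \le x_t^{-1} \int_0^{\epsilon} (-\log[(x_1 + x_2)\theta])^{j-1}\, d\theta + (-\log[(x_1 + x_2)\epsilon])^{j-1} \int_{\epsilon}^{1} ((x_1 + x_2)\theta + x_t)^{-1}\, d\theta \\
&\quad \le x_t^{-1}\, j!\, \epsilon\, (-\log[(x_1 + x_2)\epsilon])^{j} + (-\log[(x_1 + x_2)\epsilon])^{j-1} (x_1 + x_2)^{-1} \log\left[ \frac{x_1 + x_2 + x_t}{(x_1 + x_2)\epsilon + x_t} \right] \\
&\quad \le x_t^{-1}\, j!\, \epsilon\, (-\log[(x_1 + x_2)\epsilon])^{j} + (-\log[(x_1 + x_2)\epsilon])^{j-1} (x_1 + x_2)^{-1} (-\log[(x_1 + x_2)\epsilon]) \\
&\quad = x_t^{-1}\, j!\, \epsilon\, (-\log[(x_1 + x_2)\epsilon])^{j} + (-\log[(x_1 + x_2)\epsilon])^{j} (x_1 + x_2)^{-1}
\end{aligned}$$
Taking $\epsilon = x_t$, we obtain the result.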
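As a numerical sanity check on Lemma 3.3 (not a substitute for the proof), the sketch below approximates the left hand side with a midpoint rule and compares it with the right hand side for a few sample inputs. The function names, the quadrature scheme, and the sample values are assumptions chosen for illustration; the inputs are taken in $(0, 1)$ with $x_1 + x_2 < 1$ so that every logarithm in the statement is negative.

```python
import math


def lemma_lhs(x1, x2, xs, xt, j, steps=100_000):
    """Midpoint-rule approximation of the integral on the left of Lemma 3.3."""
    a = x1 + x2
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoints avoid the log singularity at theta = 0
        log_factor = (-math.log(a * t)) ** (j - 1)
        total += log_factor / ((a * t + xs) * (a * t + xt))
    return total * h


def lemma_rhs(x1, x2, xs, xt, j):
    """Right hand side 4 (j+1)! (xs+xt)^-1 (x1+x2)^-1 [sum of (-log x)^j]."""
    logs = sum((-math.log(v)) ** j for v in (x1, x2, xs, xt))
    return 4.0 * math.factorial(j + 1) * logs / ((xs + xt) * (x1 + x2))


if __name__ == "__main__":
    for args in ((0.1, 0.1, 0.1, 0.1, 1), (0.1, 0.1, 0.1, 0.1, 3), (0.3, 0.2, 0.05, 0.4, 2)):
        print(args, lemma_lhs(*args), lemma_rhs(*args))
```

In these samples the right hand side exceeds the left by a wide margin, which suggests the constant $4 (j + 1)!$ is far from sharp for such inputs.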
4 Proof of Claim 2
Here we use the result from [5] that after $k = -n^2 \log n \log \epsilon$ steps the $L^2$ transportation distance between the running distribution of the Kac random walk on $S^{n-1}$ and the uniform distribution $U_{n-1}$ is less than $\epsilon$. So by Hölder's inequality, the $L^1$ transportation distance is also less than $\epsilon$. Now suppose $\mu_k(H_\epsilon) > \epsilon^{1/4}$. We know that the uniform measure $U_{n-1}(H_\epsilon)$ is of the order $\epsilon$; in fact using the marginal density formula for a single coordinate on the unit sphere, one sees that it is bounded above by $n^{3/2} \epsilon$, and similarly $U_{n-1}(H_{2\epsilon}) \le 2 n^{3/2} \epsilon$. So for $\epsilon$ sufficiently smaller than $n^{-3/2}$, we can make sure that
$$(n^{3/2} \epsilon)^{1/4} - 2 \epsilon n^{3/2} > \epsilon^{1/2}.$$
Then in order to transport the excess mass in $H_\epsilon$ under the $k$th running distribution of the random walk to other places on $S^{n-1}$, any transport strategy has to take all the mass outside $H_{2\epsilon}$, which means a distance more than $\epsilon$ has to be traversed for each particle mass. This shows the $L^1$ transportation cost for smoothing out the region $H_\epsilon$ is at least $\epsilon \cdot \epsilon^{1/2}$, which is greater than the total $L^1$ transportation distance, a contradiction.
5 Acknowledgement
I would like to thank my advisor Persi Diaconis for introducing me to the wonderful problem of Kac’s random walk and directing me to the right papers to look at.
References

[1] George E. Andrews, Richard Askey, Ranjan Roy. Special Functions.
[2] Logarithmic derivatives of the Gamma function. scipp.ucsc.edu/ haber/ph116A/psifun.ps
[3] Persi Diaconis, Laurent Saloff-Coste. Bounds for Kac's Master Equation.
[4] E. A. Carlen, M. C. Carvalho, M. Loss. Determination of the Spectral Gap for Kac's Master Equation and Related Stochastic Evolutions.
[5] Roberto Imbuzeiro Oliveira. On the convergence to equilibrium of Kac's random walk on matrices.