arXiv:1311.6180v2 [math.PR] 8 Dec 2013
MODERATE AND LARGE DEVIATIONS FOR ERDŐS-KAC THEOREM

BEHZAD MEHRDAD AND LINGJIONG ZHU

Abstract. The Erdős-Kac theorem is a celebrated result in number theory which says that the number of distinct prime factors of a uniformly chosen random integer satisfies a central limit theorem. In this paper, we establish the large deviations and moderate deviations for this problem in a very general setting, for a wide class of additive functions.
1. Introduction

Let V(n) be a random integer chosen uniformly from {1, 2, ..., n} and let X(n) be the number of distinct primes in the factorization of V(n). A celebrated result by Erdős and Kac [9], [10] says that

(1.1)    \frac{X(n) - \log\log n}{\sqrt{\log\log n}} \to N(0, 1),

as n → ∞. This is a deep extension of the Hardy-Ramanujan theorem (see [13]). In addition, the central limit theorem holds in a more general setting for additive functions. A formal treatment and proofs can be found in e.g. Durrett [7]. In terms of the rate of convergence to the Gaussian distribution, i.e. Berry-Esseen bounds, Rényi and Turán [18] obtained the sharp rate of convergence O(1/\sqrt{\log\log n}). Their proof is based on the analytic theory of Dirichlet series. Recently, Harper [14] used a more probabilistic approach, based on Stein's method, to get an upper bound on the rate of convergence of the order O(\log\log\log n/\sqrt{\log\log n}). In terms of large deviations, Radziwill [17] used an analytic number theory approach to get a series of asymptotic estimates. Féray et al. [12] proved precise large deviations using the mod-Poisson convergence method developed in Kowalski and Nikeghbali [16].

In this paper, we study the large deviation principle and the moderate deviation principle in the sense of Erdős and Kac, rather than the precise deviations studied in Féray et al. [12]. Our large deviations and moderate deviations results are in the sense of Donsker-Varadhan [19], [3], [4], [5], [6]. Our proofs are probabilistic and require only elementary number theory results, in contrast to the analytic number theory approach in Radziwill [17]. Both the large deviations and the moderate deviations in our paper are proved for a much wider class of additive functions than what has been studied in Radziwill [17] and Féray et al. [12].

Date: 8 December 2013. Revised: 8 December 2013.
2000 Mathematics Subject Classification. 60F10, 11N37.
Key words and phrases. Erdős-Kac theorem, additive functions, number of distinct prime factors, large deviations, moderate deviations.
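As a quick numerical illustration of (1.1) (our addition, not part of the original text), the following Python sketch samples uniform integers from {1, ..., n}, counts their distinct prime factors with sympy, and standardizes them; the cutoff and sample size are arbitrary, and the convergence in (1.1) is notoriously slow.

import random
import numpy as np
from sympy import primefactors

# Monte Carlo illustration of (1.1): omega(V(n)) is approximately N(log log n, log log n).
n = 10**7            # arbitrary cutoff; the convergence in (1.1) is only in log log n
samples = 2000       # arbitrary Monte Carlo sample size

loglog = np.log(np.log(n))
omegas = np.array([len(primefactors(random.randint(1, n))) for _ in range(samples)])
standardized = (omegas - loglog) / np.sqrt(loglog)

print("mean of standardized values:", standardized.mean())   # roughly 0 (slow convergence)
print("std  of standardized values:", standardized.std())    # roughly 1 (slow convergence)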
Before we proceed, let us introduce the formal definition of large deviations. A sequence (P_n)_{n\in\mathbb{N}} of probability measures on a topological space X satisfies the large deviation principle with speed b_n and rate function I : X → R if I is nonnegative, lower semicontinuous and for any measurable set A, we have

(1.2)    -\inf_{x\in A^{o}} I(x) \le \liminf_{n\to\infty}\frac{1}{b_n}\log P_n(A) \le \limsup_{n\to\infty}\frac{1}{b_n}\log P_n(A) \le -\inf_{x\in\bar{A}} I(x).

Here, A^{o} is the interior of A and \bar{A} is its closure. We refer to Dembo and Zeitouni [2] or Varadhan [20] for general background on large deviations and their applications.

Let X_1, ..., X_n be a sequence of \mathbb{R}^d-valued i.i.d. random vectors with mean 0 and covariance matrix C that is invertible. Assume that E[e^{\langle\theta, X_1\rangle}] < ∞ for θ in some ball around the origin. For any \sqrt{n} \ll a_n \ll n, a moderate deviation principle says that for any Borel set A,

(1.3)    -\inf_{x\in A^{o}}\frac12\langle x, C^{-1}x\rangle \le \liminf_{n\to\infty}\frac{n}{a_n^2}\log P\Big(\frac{1}{a_n}\sum_{i=1}^{n} X_i \in A\Big) \le \limsup_{n\to\infty}\frac{n}{a_n^2}\log P\Big(\frac{1}{a_n}\sum_{i=1}^{n} X_i \in A\Big) \le -\inf_{x\in\bar{A}}\frac12\langle x, C^{-1}x\rangle.

In other words, (\frac{1}{a_n}\sum_{i=1}^{n} X_i \in \cdot) satisfies a large deviation principle with the speed \frac{a_n^2}{n}.
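For orientation (this worked example is our addition, not part of the original text), take d = 1 and X_i standard normal, so that C = 1. Then \frac{1}{a_n}\sum_{i=1}^{n} X_i is Gaussian with mean 0 and variance n/a_n^2, and for any x > 0,

\frac{n}{a_n^2}\log P\Big(\frac{1}{a_n}\sum_{i=1}^{n} X_i \ge x\Big) = \frac{n}{a_n^2}\log\bar\Phi\Big(\frac{x a_n}{\sqrt{n}}\Big) \to -\frac{x^2}{2},

since \log\bar\Phi(t) \sim -t^2/2 as t → ∞ and x a_n/\sqrt{n} → ∞; this is consistent with the rate function \frac12\langle x, C^{-1}x\rangle = x^2/2 appearing in (1.3).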
The above classical result can be found, for example, in [2]. The moderate deviation principle fills in the gap between the central limit theorem and the large deviation principle. In this paper, we are interested in proving both large deviations and moderate deviations for the Erdős-Kac theorem for a wide class of additive functions.

2. Main Results

Throughout this paper, we let P denote the set of all the prime numbers.

Assumption 1. Let g be a strongly additive function, i.e. g(p^k) = g(p) for all primes p and integers k ≥ 1, and g(mn) = g(m) + g(n) whenever gcd(m, n) = 1. In addition, we assume that there exists a probability measure ρ on R so that
• For any θ ∈ R, \int_{\mathbb{R}} e^{\theta y}\rho(dy) < \infty.
• For any θ ∈ R, \int_{\mathbb{R}} e^{\theta y}\rho_n(dy) \to \int_{\mathbb{R}} e^{\theta y}\rho(dy), where

(2.1)    \rho_n(A) = \frac{\sum_{g(p)\in A,\, p\le n,\, p\in P} \frac{1}{p}}{\sum_{p\le n,\, p\in P}\frac{1}{p}},

for any Borel set A ⊂ R.
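To make Assumption 1 concrete (this numerical sketch is our addition and is only illustrative), the following Python code evaluates ρ_n from (2.1) for the strongly additive function used in Remark 4 (iii) below, namely g(p_k) = λ_1 for odd k and g(p_k) = λ_2 for even k, where p_1 < p_2 < ⋯ are the primes; the two masses approach 1/2, i.e. ρ = \frac12\delta_{\lambda_1} + \frac12\delta_{\lambda_2}, though only at rate 1/\log\log n.

from sympy import primerange

# rho_n from (2.1) for the strongly additive g of Remark 4 (iii):
# g(p_k) = lam1 if k is odd, g(p_k) = lam2 if k is even.
n = 10**6                       # arbitrary cutoff
lam1, lam2 = 1.0, 2.0           # arbitrary values with 0 < lam1 < lam2
primes = list(primerange(2, n + 1))

total = sum(1.0 / p for p in primes)
mass_lam1 = sum(1.0 / p for k, p in enumerate(primes, start=1) if k % 2 == 1) / total
mass_lam2 = 1.0 - mass_lam1

# Both masses tend to 1/2, but only at rate 1/log log n (expect roughly 0.55 and 0.45 here).
print("rho_n({lam1}) =", mass_lam1)
print("rho_n({lam2}) =", mass_lam2)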
Let V(n) be a uniformly chosen random integer from {1, 2, ..., n} and let Z_p = 1 if V(n) is divisible by p and Z_p = 0 otherwise. Then, for any strongly additive function g, we have g(V(n)) = \sum_{p\le n,\, p\in P} g(p)Z_p. We have the following large deviations result.

Theorem 2. Under Assumption 1, (\frac{X(n)}{\log\log n} \in \cdot) satisfies a large deviation principle with speed \log\log n and rate function

(2.2)    I(x) := \sup_{\theta\in\mathbb{R}}\Big\{\theta x - \int_{\mathbb{R}}(e^{\theta y}-1)\rho(dy)\Big\},

where X(n) := g(V(n)).
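As a sanity check on (2.2) (our illustration, not part of the paper), the supremum over θ can be evaluated numerically. The sketch below takes ρ = δ_λ, the point mass arising when g(p) → λ as in Corollary 3 below, and compares the numerical value with the closed form stated there; the specific numbers are arbitrary.

import numpy as np
from scipy.optimize import minimize_scalar

def rate_numeric(x, lam=1.0):
    # Evaluate (2.2) numerically for rho = delta_lam: sup_theta { theta*x - (exp(theta*lam) - 1) }.
    obj = lambda theta: -(theta * x - (np.exp(theta * lam) - 1.0))
    return -minimize_scalar(obj, bounds=(-10, 10), method="bounded").fun

x, lam = 2.5, 1.0                                            # arbitrary test values, x >= 0
closed_form = (x / lam) * np.log(x / lam) - x / lam + 1.0    # the formula of Corollary 3 below
print(rate_numeric(x, lam), closed_form)                     # should agree up to optimizer tolerance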
Corollary 3. Assume that g(p) → λ ∈ (0, ∞) as p → ∞. Then (\frac{X(n)}{\log\log n} \in \cdot) satisfies a large deviation principle with speed \log\log n and rate function

(2.3)    I(x) := \begin{cases} \frac{x}{\lambda}\log\frac{x}{\lambda} - \frac{x}{\lambda} + 1 & \text{if } x \ge 0, \\ +\infty & \text{otherwise.}\end{cases}
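For the reader's convenience (this short verification is our addition), (2.3) follows from (2.2) by direct calculus. Under the assumption of Corollary 3 we have ρ = δ_λ, so that

I(x) = \sup_{\theta\in\mathbb{R}}\big\{\theta x - (e^{\theta\lambda}-1)\big\}.

For x > 0 the maximizer solves x = \lambda e^{\theta\lambda}, i.e. \theta^{*} = \frac{1}{\lambda}\log\frac{x}{\lambda}, and substituting back gives I(x) = \frac{x}{\lambda}\log\frac{x}{\lambda} - \frac{x}{\lambda} + 1. For x < 0 the supremum is +\infty (let θ → -∞), and for x = 0 it equals 1, which agrees with (2.3) under the convention 0\log 0 = 0.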
For the special case g(p) ≡ 1, X(n) is the number of distinct prime factors of a random integer uniformly chosen from {1, 2, ..., n}, and (\frac{X(n)}{\log\log n} \in \cdot) satisfies a large deviation principle with speed \log\log n and rate function

(2.4)    I(x) := \begin{cases} x\log x - x + 1 & \text{if } x \ge 0, \\ +\infty & \text{otherwise.}\end{cases}

Remark 4. (i) If Assumption 1 fails, Theorem 2 may not hold. For example, we can define

(2.5)    g(p) = \begin{cases} \lambda_1 & \alpha_{2k} < p \le \alpha_{2k+1}, \\ \lambda_2 & \alpha_{2k+1} < p \le \alpha_{2k+2},\end{cases}

where 0 < λ_1 < λ_2, k = 0, 1, 2, ..., and the differences α_i − α_{i−1} are sufficiently large so that, for some small δ > 0 and some fixed θ > 0, we have

(2.6)    \frac{1}{\log\log\alpha_n}\sum_{p\in P,\, p\le\alpha_n}\frac{e^{\theta g(p)}-1}{p} \le \lambda_1 + \delta

along the subsequence α_n when n is odd, and

(2.7)    \frac{1}{\log\log\alpha_n}\sum_{p\in P,\, p\le\alpha_n}\frac{e^{\theta g(p)}-1}{p} \ge \lambda_2 - \delta

along the subsequence α_n when n is even. This shows that for any fixed θ > 0, the limit \lim_{n\to\infty}\frac{1}{\log\log n}\log E[e^{\theta X(n)}] does not exist. Therefore there is no large deviation principle as a result of Varadhan's lemma, see e.g. Dembo and Zeitouni [2].

(ii) The assumption that \int_{\mathbb{R}} e^{\theta y}\rho(dy) < \infty for any θ ∈ R basically says that the density of primes p such that g(p) is large should be small. For example, if g(p) grows to infinity as p → ∞, then the scaling of \sum_{p\le n,\, p\in P}\frac{e^{\theta g(p)}-1}{p} will depend on θ, and thus there is no f(n) → ∞ independent of θ so that the limit \lim_{n\to\infty}\frac{1}{f(n)}\sum_{p\le n,\, p\in P}\frac{e^{\theta g(p)}-1}{p} exists. On the other hand, if g(p) → 0 as p → ∞, then e^{\theta g(p)} - 1 \sim \theta g(p) as p → ∞. Now, if also \sum_{p\le n,\, p\in P}\frac{g(p)}{p} \to \infty as n → ∞, then we have

(2.8)    \lim_{n\to\infty}\frac{1}{\sum_{p\le n,\, p\in P}\frac{g(p)}{p}}\sum_{p\le n,\, p\in P}\frac{e^{\theta g(p)}-1}{p} = \theta,

and the rate function for the large deviations is then trivial. Again, if \sum_{p\in P}\frac{g(p)}{p} < \infty, there are no Donsker-Varadhan type large deviations.

(iii) Let p_1 < p_2 < p_3 < \cdots be the ordered sequence of all the prime numbers. A celebrated result in number theory due to Hoheisel [15] says that there exists a constant γ < 1 such that p_{k+1} - p_k < p_k^{\gamma} for sufficiently large k. This shows that

(2.9)    \sum_{p_k\le n}\Big(\frac{1}{p_k} - \frac{1}{p_{k+1}}\Big) = \sum_{p_k\le n}\frac{p_{k+1}-p_k}{p_k p_{k+1}} \le C\sum_{p_k\le n}\frac{1}{p_k^{2-\gamma}}
for some universal constant C, and the series is convergent. Therefore, we have \sum_{p_{2k}\le n}\frac{1}{p_{2k}} \sim \frac12\log\log n and \sum_{p_{2k+1}\le n}\frac{1}{p_{2k+1}} \sim \frac12\log\log n. Let 0 < λ_1 < λ_2 < ∞. Define g(p_k) = λ_1 if k is odd and g(p_k) = λ_2 if k is even. Thus, we have

(2.10)    \lim_{n\to\infty}\frac{1}{\log\log n}\log E\Big[e^{\theta\sum_{p\in P,\, p\le n} g(p)Z_p}\Big] = \frac12(e^{\theta\lambda_1}-1) + \frac12(e^{\theta\lambda_2}-1).

Hence, we conclude that (\frac{X(n)}{\log\log n} \in \cdot) satisfies a large deviation principle with speed \log\log n and rate function

(2.11)    I(x) := \begin{cases} \sup_{\theta\in\mathbb{R}}\big\{\theta x - \frac12(e^{\theta\lambda_1}-1) - \frac12(e^{\theta\lambda_2}-1)\big\} & \text{if } x \ge 0, \\ +\infty & \text{otherwise.}\end{cases}
In some special cases, there is an explicit expression for the rate function. For example, consider λ_2 = 2λ_1 ∈ (0, ∞). In this case, the optimal θ is given by

(2.12)    \theta^{*} = \frac{1}{\lambda_1}\log\left(\frac{-\lambda_1+\sqrt{\lambda_1^2+16\lambda_1 x}}{4\lambda_1}\right).

Hence, for x ≥ 0,

(2.13)    I(x) = \theta^{*} x - \frac12\big(e^{\theta^{*}\lambda_1}-1\big) - \frac12\big(e^{\theta^{*}\lambda_2}-1\big)
             = \frac{x}{\lambda_1}\log\left(\frac{-\lambda_1+\sqrt{\lambda_1^2+16\lambda_1 x}}{4\lambda_1}\right) + 1 - \frac12\left(\frac{-\lambda_1+\sqrt{\lambda_1^2+16\lambda_1 x}}{4\lambda_1}\right) - \frac12\left(\frac{-\lambda_1+\sqrt{\lambda_1^2+16\lambda_1 x}}{4\lambda_1}\right)^{2}.
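As a quick numerical consistency check (our addition, not in the original), the closed form (2.12)-(2.13) can be compared with a direct maximization of θx - \frac12(e^{\theta\lambda_1}-1) - \frac12(e^{2\theta\lambda_1}-1); the values of λ_1 and x below are arbitrary.

import numpy as np
from scipy.optimize import minimize_scalar

lam1 = 1.0          # arbitrary; lam2 = 2*lam1 as in the text
x = 3.0             # arbitrary x >= 0

# closed form (2.12)-(2.13); u = exp(theta* lam1)
u = (-lam1 + np.sqrt(lam1**2 + 16 * lam1 * x)) / (4 * lam1)
theta_star = np.log(u) / lam1
I_closed = theta_star * x + 1.0 - 0.5 * u - 0.5 * u**2

# direct numerical maximization of theta*x - (exp(theta*lam1)-1)/2 - (exp(2*theta*lam1)-1)/2
obj = lambda t: -(t * x - 0.5 * (np.exp(t * lam1) - 1.0) - 0.5 * (np.exp(2.0 * t * lam1) - 1.0))
I_numeric = -minimize_scalar(obj, bounds=(-10, 10), method="bounded").fun

print(I_closed, I_numeric)   # the two values should agree to high accuracy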
Here are some examples in which we can get an explicit expression for the rate function I(x).

Example 5. (i) Assume that ρ(dy) = \frac12\delta_{\lambda_1} + \frac12\delta_{2\lambda_1}. Then, from Remark 4 (iii), for x ≥ 0, the rate function I(x) has the explicit expression given in (2.13).
(ii) Assume that ρ(dy) has a Poisson distribution with parameter λ > 0. Then, we have

(2.14)    \theta x - \int_0^{\infty}(e^{\theta y}-1)\rho(dy) = \theta x + 1 - e^{\lambda(e^{\theta}-1)}.

Differentiating with respect to θ and setting the derivative to zero, we get x = \lambda e^{\theta} e^{\lambda(e^{\theta}-1)}. Then, we get the optimal \theta^{*} = \log\big(\frac{1}{\lambda}W(x e^{\lambda})\big), where W(z) is the Lambert W function defined by z = W(z)e^{W(z)} for any z ∈ C, see e.g. Corless et al. [1]. Hence, for x ≥ 0, we have

(2.15)    I(x) = x\log\Big(\frac{1}{\lambda}W(x e^{\lambda})\Big) + 1 - e^{W(x e^{\lambda})-\lambda}.

(iii) Assume that ρ(dy) has a binomial distribution with parameters n and β. Then, we get \int_0^{\infty} e^{\theta y}\rho(dy) = (1-\beta+\beta e^{\theta})^{n}. Thus, the optimal θ^{*} satisfies x = n(1-\beta+\beta e^{\theta^{*}})^{n-1}\beta e^{\theta^{*}}. For n = 1, I(x) = x\log(x/\beta) + \beta - x. For n = 2,

(2.16)    I(x) = x\log\left(\frac{-(1-\beta)+\sqrt{(1-\beta)^2+2x}}{2\beta}\right) + 1 - \left(\frac{(1-\beta)+\sqrt{(1-\beta)^2+2x}}{2}\right)^{2}.
(iv) Assume that ρ(dy) has a Gaussian distribution with mean 0 and variance 1. Then, we get \int_{-\infty}^{\infty} e^{\theta y}\rho(dy) = e^{\frac12\theta^2}. Thus, the optimal θ^{*} satisfies x = \theta^{*} e^{\frac12\theta_{*}^2}, which is equivalent to x^2 = \theta_{*}^2 e^{\theta_{*}^2}. Therefore, we have \theta^{*} = \sqrt{W(x^2)}, where W(·) is the Lambert W function, and

(2.17)    I(x) = x\sqrt{W(x^2)} + 1 - \frac{x}{\sqrt{W(x^2)}}.
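These closed forms are easy to verify numerically (this sketch is our addition); scipy's implementation of the Lambert W function lets one compare (2.15) and (2.17) with a direct maximization of θx - \int(e^{\theta y}-1)\rho(dy). The test values below are arbitrary.

import numpy as np
from scipy.special import lambertw
from scipy.optimize import minimize_scalar

def I_numeric(x, mgf):
    # Maximize theta*x - (mgf(theta) - 1) numerically, where mgf(theta) = int exp(theta*y) rho(dy).
    obj = lambda t: -(t * x - (mgf(t) - 1.0))
    return -minimize_scalar(obj, bounds=(-10, 10), method="bounded").fun

x = 2.0             # arbitrary test point, x >= 0

# Example 5 (ii): rho = Poisson(lam); closed form (2.15)
lam = 1.5
W = lambertw(x * np.exp(lam)).real
I_poisson = x * np.log(W / lam) + 1.0 - np.exp(W - lam)
print(I_poisson, I_numeric(x, lambda t: np.exp(lam * (np.exp(t) - 1.0))))

# Example 5 (iv): rho = N(0,1); closed form (2.17)
W2 = lambertw(x**2).real
I_gauss = x * np.sqrt(W2) + 1.0 - x / np.sqrt(W2)
print(I_gauss, I_numeric(x, lambda t: np.exp(0.5 * t**2)))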
We conclude this section by stating a moderate deviation principle, which fills in the gap between the central limit theorem and the large deviation principle.

Theorem 6. Let \mu_n := \sum_{p\in P,\, p\le n}\frac{g(p)}{p} and \sigma_n^2 := \sum_{p\in P,\, p\le n}\frac{g(p)^2}{p}. Let a_n be a positive sequence so that \sigma_n \ll a_n \ll \sigma_n^2 as n → ∞. Under Assumption 1, (\frac{X(n)-\mu_n}{a_n} \in \cdot) satisfies a large deviation principle with speed a_n^2/\sigma_n^2 and rate function

(2.18)    J(x) := \frac{x^2}{2}.

3. Proofs
3.1. Proof of Large Deviation Principle. In this section, we will prove Theorem 2. The proof consists of a series of superexponential estimates, i.e. Lemma 7, Lemma 8 and Lemma 9, and an application of the Gärtner-Ellis theorem (see e.g. Chapter 2 in Dembo and Zeitouni [2]).

Write X(n) = \sum_{p\le n,\, p\in P} g(p)Z_p, where Z_p = 1 if V(n) is divisible by p and 0 otherwise. The first step is to show that \sum_{p\le n,\, p\in P} g(p)Z_p can be approximated by \sum_{|g(p)|\le C,\, p\le n,\, p\in P} g(p)Z_p in the following sense.
Lemma 7. For any ε > 0,

(3.1)    \limsup_{C\to\infty}\limsup_{n\to\infty}\frac{1}{\log\log n}\log P\bigg(\bigg|\sum_{|g(p)|>C,\, p\le n,\, p\in P} g(p)Z_p\bigg| \ge \epsilon\log\log n\bigg) = -\infty.

Proof. Note that (3.1) holds if we can prove the following two estimates,

(3.2)    \limsup_{C\to\infty}\limsup_{n\to\infty}\frac{1}{\log\log n}\log P\bigg(\sum_{g(p)>C,\, p\le n,\, p\in P} g(p)Z_p \ge \epsilon\log\log n\bigg) = -\infty,

and

(3.3)    \limsup_{C\to\infty}\limsup_{n\to\infty}\frac{1}{\log\log n}\log P\bigg(\sum_{g(p)<-C,\, p\le n,\, p\in P} g(p)Z_p \le -\epsilon\log\log n\bigg) = -\infty.

Let (Y_p)_{p\in P} be independent Bernoulli random variables with P(Y_p = 1) = \frac1p. For any distinct primes p_1, ..., p_k ≤ n we have E[Z_{p_1}\cdots Z_{p_k}] = \frac1n\lfloor\frac{n}{p_1\cdots p_k}\rfloor \le \frac{1}{p_1\cdots p_k} = E[Y_{p_1}\cdots Y_{p_k}], so the expectation of any polynomial in the Z_p with non-negative coefficients is bounded by the corresponding expectation for the Y_p. For g(p) > C > 0 and any θ > 0, we have g(p)θ > 0 and, by Chebyshev's inequality,

(3.7)    \limsup_{n\to\infty}\frac{1}{\log\log n}\log P\bigg(\sum_{g(p)>C,\, p\le n,\, p\in P} g(p)Z_p \ge \epsilon\log\log n\bigg)
         \le \limsup_{n\to\infty}\frac{1}{\log\log n}\log E\Big[e^{\theta\sum_{g(p)>C,\, p\le n,\, p\in P} g(p)Z_p}\Big] - \theta\epsilon
         \le \limsup_{n\to\infty}\frac{1}{\log\log n}\log E\Big[e^{\theta\sum_{g(p)>C,\, p\le n,\, p\in P} g(p)Y_p}\Big] - \theta\epsilon
         = \int_{y>C}(e^{\theta y}-1)\rho(dy) - \theta\epsilon,

and the limit goes to -\theta\epsilon as C → ∞. By letting θ → ∞, we obtain (3.2). Similarly, for g(p) < -C < 0, by taking θ < 0, we get

(3.8)    \limsup_{n\to\infty}\frac{1}{\log\log n}\log P\bigg(\sum_{g(p)<-C,\, p\le n,\, p\in P} g(p)Z_p \le -\epsilon\log\log n\bigg) \le \int_{y<-C}(e^{\theta y}-1)\rho(dy) + \theta\epsilon,

and (3.3) follows by first letting C → ∞ and then θ → -∞.

Let k_n := n^{1/(\log\log n)^2} and A(n, C) := \{p \in P : k_n \le p \le n,\ |g(p)| \le C\}.

Lemma 8. For any ε > 0 and C > 0,

(3.10)    \limsup_{n\to\infty}\frac{1}{\log\log n}\log P\bigg(\bigg|\sum_{p\in A(n,C)} g(p)Z_p\bigg| \ge \epsilon\log\log n\bigg) = -\infty.

Proof. For any θ > 0,

(3.11)    E\Big[e^{\theta|\sum_{p\in A(n,C)} g(p)Z_p|}\Big] \le E\Big[e^{\theta\sum_{p\in A(n,C)} |g(p)|Z_p}\Big] \le E\Big[e^{\theta C\sum_{p\in A(n,C)} Z_p}\Big] \le E\Big[e^{\theta C\sum_{p\in A(n,C)} Y_p}\Big].
Therefore, for any θ > 0, we have

(3.12)    \log E\Big[e^{\theta|\sum_{p\in A(n,C)} g(p)Z_p|}\Big] \le \log E\Big[e^{\theta C\sum_{p\in A(n,C)} Y_p}\Big] = \sum_{p\in A(n,C)}\log\Big((e^{\theta C}-1)\frac1p + 1\Big) \le (e^{\theta C}-1)\sum_{k_n\le p\le n,\, p\in P}\frac1p.

We also have

(3.13)    \sum_{k_n\le p\le n,\, p\in P}\frac1p \simeq \log\log n - \log\log k_n = \log\log n - \log\frac{\log n}{(\log\log n)^2} = 2\log\log\log n,

and

(3.14)    \frac{2\log\log\log n}{\log\log n} \to 0 \quad\text{as } n\to\infty.

Therefore, by Chebyshev's inequality,

\limsup_{n\to\infty}\frac{1}{\log\log n}\log P\bigg(\bigg|\sum_{p\in A(n,C)} g(p)Z_p\bigg| \ge \epsilon\log\log n\bigg) \le -\epsilon\theta.
This proves (3.10) since it holds for any θ > 0.

Next, let us show that E\big[e^{\theta\sum_{p\le k_n,\, |g(p)|\le C,\, p\in P} g(p)Z_p}\big] can be approximated by E\big[e^{\theta\sum_{p\le k_n,\, |g(p)|\le C,\, p\in P} g(p)Y_p}\big] in an appropriate way.

Lemma 9. For any θ ∈ R,

(3.15)    \lim_{n\to\infty}\frac{1}{\log\log n}\log\Big|E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Z_p}\Big] - E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Y_p}\Big]\Big| = -\infty,

where B(n, C) := \{p : p\le k_n,\ |g(p)|\le C,\ p\in P\}.

Proof. For any θ ∈ R and any fixed constant K > 0,

(3.16)    \Big|E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Z_p}\Big] - E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Y_p}\Big]\Big| \le \sum_{r\le K\log\log n}\frac{|\theta|^r}{r!}\big|E[S_n^r]-E[\tilde S_n^r]\big| + \sum_{r>K\log\log n}\frac{|\theta|^r}{r!}E[S_n^r] + \sum_{r>K\log\log n}\frac{|\theta|^r}{r!}E[\tilde S_n^r],

where

(3.17)    S_n := \sum_{p\in B(n,C)} g(p)Z_p, \qquad \tilde S_n := \sum_{p\in B(n,C)} g(p)Y_p.

We claim that

(3.18)    \big|E[\tilde S_n^r] - E[S_n^r]\big| \le \frac{(Ck_n)^r}{n}.

To see this, notice that

\quad E[\tilde S_n^r] = \sum_{k=1}^{r}\frac{1}{k!}\sum_{\substack{r_1+\cdots+r_k=r\\ r_i\ge 1}}\frac{r!}{r_1!\cdots r_k!}\sum_{p_1,\ldots,p_k\in B(n,C)\ \text{distinct}} g(p_1)^{r_1}\cdots g(p_k)^{r_k}\, E[Y_{p_1}^{r_1}\cdots Y_{p_k}^{r_k}].
We observe that

(3.19)    E[Y_{p_1}^{r_1}\cdots Y_{p_k}^{r_k}] = E[Y_{p_1}\cdots Y_{p_k}] = \frac{1}{p_1\cdots p_k},

which differs from

(3.20)    E[Z_{p_1}^{r_1}\cdots Z_{p_k}^{r_k}] = E[Z_{p_1}\cdots Z_{p_k}] = \frac1n\Big\lfloor\frac{n}{p_1\cdots p_k}\Big\rfloor

by at most \frac1n. Therefore,

(3.21)    \big|E[\tilde S_n^r] - E[S_n^r]\big| \le \sum_{k=1}^{r}\frac{1}{k!}\sum_{\substack{r_1+\cdots+r_k=r\\ r_i\ge 1}}\frac{r!}{r_1!\cdots r_k!}\sum_{p_1,\ldots,p_k\in B(n,C)} C^{r_1+\cdots+r_k}\,\frac1n \le \frac1n\Big(C\sum_{p\in B(n,C)} 1\Big)^{r} \le \frac{(Ck_n)^r}{n}.

Thus, we can bound the first term in (3.16) by

(3.22)    \sum_{r\le K\log\log n}\frac{|\theta|^r}{r!}\big|E[S_n^r]-E[\tilde S_n^r]\big| \le \sum_{r\le K\log\log n}\frac{|\theta|^r}{r!}\frac{(Ck_n)^r}{n}
          \le C_0 e^{-\log n}\sum_{r\le K\log\log n} e^{r(\log|\theta|+\log C+\log k_n+1)-r\log r}
          \le C_0 e^{-\log n}\, K\log\log n\cdot\max_{r\le K\log\log n} e^{r(\log|\theta|+\log C+\log k_n+1)-r\log r}.

Let F(r) := r(\log|\theta|+\log C+\log k_n+1) - r\log r. Since k_n = n^{1/(\log\log n)^2}\gg r, it is straightforward to compute that

(3.23)    F'(r) = (\log|\theta|+\log C+\log k_n+1) - \log r - 1 > 0,

for any r \le K\log\log n. Therefore the maximum of F(r), r \le K\log\log n, is achieved at K\log\log n and, hence,

(3.24)    \sum_{r\le K\log\log n}\frac{|\theta|^r}{r!}\big|E[S_n^r]-E[\tilde S_n^r]\big| \le C_1\log\log n\cdot e^{-\log n}\cdot e^{K\log\log n\,\frac{\log n}{(\log\log n)^2}} \le e^{-\frac12\log n},

for sufficiently large n. The second term in (3.16) is bounded above by the third term:

(3.25)    \sum_{r>K\log\log n}\frac{|\theta|^r}{r!}E[S_n^r] \le \sum_{r>K\log\log n}\frac{|\theta|^r}{r!}E[\tilde S_n^r].

To bound the third term in (3.16), first observe that

(3.26)    E[\tilde S_n^r] = \sum_{p_1, p_2, \ldots, p_r\le k_n} g(p_1)\cdots g(p_r)\,E[Y_{p_1}\cdots Y_{p_r}],
where the sum is over primes p_1, ..., p_r ≤ k_n that may not be distinct. Notice that k_n ≤ n and, for distinct p_1, ..., p_\ell, we have

(3.27)    E[Y_{p_1}\cdots Y_{p_\ell}] = \frac{1}{p_1\cdots p_\ell}.

Therefore, it is not difficult to see that

(3.28)    E[\tilde S_n^r] \le C^r\Bigg(\sum_{p\le n,\, p\in P}\frac1p + \Big(\sum_{p\le n,\, p\in P}\frac1p\Big)^{2} + \cdots + \Big(\sum_{p\le n,\, p\in P}\frac1p\Big)^{r}\Bigg) \le rC^r\Big(\sum_{p\le n,\, p\in P}\frac1p\Big)^{r}.

For r > K\log\log n,

(3.29)    \frac{|\theta|^r}{r!}E[\tilde S_n^r] \simeq e^{r\log|\theta|+r\log C-r\log r+r+\log r+(\log\log\log n)r} \le e^{(\log|\theta|+\log C+1-\log K)r+\log r} \le e^{-\frac12(\log K)r},

for sufficiently large K. Hence, the third term in (3.16) is bounded above by

(3.30)    \sum_{r>K\log\log n} e^{-\frac12(\log K)r} \le C_2 e^{-\frac12(\log K)\log\log n},

where C_2 is a positive constant. The proof is completed by letting K → ∞.
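The elementary comparison behind (3.19)-(3.21), namely that replacing the divisibility indicators Z_p by independent Bernoulli variables Y_p changes each mixed moment by at most 1/n, can be confirmed numerically; this small check is our addition and the parameters are arbitrary.

from itertools import combinations
from sympy import primerange

n = 10_000                                   # arbitrary
primes = list(primerange(2, 50))             # a small set of primes for the check

worst = 0.0
for k in (1, 2, 3):
    for tup in combinations(primes, k):
        prod = 1
        for p in tup:
            prod *= p
        exact = (n // prod) / n              # E[Z_{p_1} ... Z_{p_k}], cf. (3.20)
        bernoulli = 1.0 / prod               # E[Y_{p_1} ... Y_{p_k}], cf. (3.19)
        worst = max(worst, abs(exact - bernoulli))

print(worst, "<=", 1.0 / n)                  # the discrepancy never exceeds 1/n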
Finally, we are ready to prove Theorem 2.

Proof of Theorem 2. Recall that B(n, C) = \{p : p\in P,\ |g(p)|\le C,\ p\le k_n\}. For any θ ∈ R, we have

(3.31)    \log E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Y_p}\Big] = \log\prod_{p\in B(n,C)} E\big[e^{\theta g(p)Y_p}\big] = \sum_{p\in B(n,C)}\log\Big(\frac1p e^{\theta g(p)} + 1 - \frac1p\Big).

For sufficiently large p,

(3.32)    \log\Big(\frac1p e^{\theta g(p)} + 1 - \frac1p\Big) = \frac1p\big(e^{\theta g(p)}-1\big) + O\Big(\frac{1}{p^2}\Big),

uniformly in p ∈ B(n, C). It is well known that \frac{1}{\log\log n}\sum_{p\in P,\, p\le n}\frac1p \to 1 as n → ∞, and we also have

(3.33)    \lim_{n\to\infty}\frac{\log\log k_n}{\log\log n} = \lim_{n\to\infty}\frac{\log\log n - 2\log\log\log n}{\log\log n} = 1.

Therefore, by Assumption 1 and the definitions of ρ_n and ρ,

(3.34)    \lim_{n\to\infty}\frac{1}{\log\log n}\log E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Y_p}\Big] = \int_{-C}^{C}(e^{\theta y}-1)\rho(dy),

provided that ρ({−C}) = ρ({C}) = 0, since convergence of moment generating functions implies weak convergence. Here we can assume that ρ({−C}) = ρ({C}) = 0: since there
are at most countably many atoms of ρ, we can choose a sequence of C tending to infinity so that −C and C are not atoms of ρ. By Lemma 9, we have

(3.35)    \lim_{n\to\infty}\frac{1}{\log\log n}\log E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Z_p}\Big] = \lim_{n\to\infty}\frac{1}{\log\log n}\log E\Big[e^{\theta\sum_{p\in B(n,C)} g(p)Y_p}\Big] = \int_{-C}^{C}(e^{\theta y}-1)\rho(dy).

By the Gärtner-Ellis theorem, see e.g. Dembo and Zeitouni [2], \Big(\frac{\sum_{p\in B(n,C)} g(p)Z_p}{\log\log n} \in \cdot\Big) satisfies a large deviation principle with the rate function

(3.36)    I_C(x) = \sup_{\theta\in\mathbb{R}}\bigg\{\theta x - \int_{-C}^{C}(e^{\theta y}-1)\rho(dy)\bigg\}.

Hence, by the approximation estimates developed in Lemma 7 and Lemma 8, \big(\frac{X(n)}{\log\log n} \in \cdot\big) satisfies a large deviation principle with the rate function

(3.37)    I(x) = \lim_{C\to\infty} I_C(x) = \sup_{\theta\in\mathbb{R}}\bigg\{\theta x - \int_{-\infty}^{\infty}(e^{\theta y}-1)\rho(dy)\bigg\}.
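The limiting scaled cumulant generating function that drives the Gärtner-Ellis argument can also be observed numerically in the prototypical case g ≡ 1, where ρ = δ_1 and \frac{1}{\log\log n}\log E[e^{\theta X(n)}] \to e^{\theta}-1, the limit appearing in (3.34)-(3.35). The following sketch (our addition, purely illustrative) computes the left-hand side exactly for a moderate n by sieving ω(m) for all m ≤ n; since the error is of order 1/\log\log n, only rough agreement should be expected.

import numpy as np

# Exact computation of (1/log log n) log E[exp(theta * omega(V(n)))] for a moderate n,
# to be compared with the limit e^theta - 1 (the case g = 1, rho = delta_1 of (3.34)-(3.35)).
n = 10**6                                    # arbitrary cutoff; log log n is only about 2.6
omega = np.zeros(n + 1, dtype=np.int64)
for p in range(2, n + 1):
    if omega[p] == 0:                        # p is prime: no smaller prime divides it
        omega[p::p] += 1                     # one more distinct prime factor for every multiple of p

loglog = np.log(np.log(n))
for theta in (0.5, 1.0, 1.5):
    scaled = np.log(np.mean(np.exp(theta * omega[1:]))) / loglog
    print(theta, scaled, np.exp(theta) - 1.0)   # agreement only up to an error of order 1/log log n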
3.2. Proof of Moderate Deviation Principle. We conclude this section by giving a proof of the moderate deviation principle.

Proof of Theorem 6. Recall that the Y_p are independent Bernoulli random variables with parameter \frac1p and Z_p = 1 if V(n) is divisible by p and Z_p = 0 otherwise, where V(n) is an integer uniformly distributed on {1, 2, ..., n}. Let

(3.38)    k_n := n^{a_n/\sigma_n^2}.

Similar to the proof of Lemma 7, we can show that, for any ε > 0,

(3.39)    \limsup_{C\to\infty}\limsup_{n\to\infty}\frac{\sigma_n^2}{a_n^2}\log P\bigg(\bigg|\sum_{|g(p)|>C,\, p\le n,\, p\in P} g(p)Z_p\bigg| \ge \epsilon a_n\bigg) = -\infty,

and, similar to the proof of Lemma 8, we can show that, for any ε > 0,

(3.40)    \limsup_{n\to\infty}\frac{\sigma_n^2}{a_n^2}\log P\bigg(\bigg|\sum_{k_n\le p\le n,\, |g(p)|\le C,\, p\in P} g(p)Z_p\bigg| \ge \epsilon a_n\bigg) = -\infty.