Chernoff type bounds for sum of dependent random variables and applications in additive number theory

V. H. Vu



Abstract. We present generalizations of Chernoff's large deviation bound for sums of dependent random variables. These generalizations seem to be very useful in the Erdős probabilistic method. As an illustrative application, we sketch the solution of an old problem of Nathanson [14] concerning thin Waring bases.

1 Introduction

A common problem in additive number theory is to prove that a sequence with certain properties exists. One of the essential ways to obtain an affirmative answer for such a problem is to use the probabilistic method, established by Erdős. To show that a sequence with a property P exists, it suffices to show that a properly defined random sequence satisfies P with positive probability. The power of the probabilistic method has been justified by the fact that in most problems solved by this method, it seems almost impossible to come up with a constructive proof. Quite frequently, the property P requires that for all sufficiently large $n \in \mathbf{N}$, some relation P(n) holds. The general strategy to handle this situation is the following. For each n, one first shows that

∗ Microsoft Research, Microsoft Corporation, Redmond, WA 98052; [email protected]


P(n) fails with a small probability, say s(n). If s(n) is sufficiently small so that $\sum_{n=1}^{\infty} s(n)$ converges, then by the Borel–Cantelli lemma, P(n) holds for all sufficiently large n with probability 1 (see, for instance, [10], Chapter 3).

The main issue in the above argument is to show that for each n, P(n) holds with high probability. One of the key tools one usually uses to achieve this goal is the famous Chernoff bound [2], which provides a large deviation bound for the sum of independent random variables. Chernoff's bound has many variants, and the following (Theorem A.1.14, [1]) seems to best suit the purpose of this paper.

Theorem 1.1 Let $t_1, \dots, t_n$ be independent binary random variables. Consider $Y = \sum_{i=1}^{n} t_i$ and let $\mu$ be the mean of $Y$. Then for any positive constant $\varepsilon$, there is a positive constant $c(\varepsilon)$ such that
$$\Pr(|Y - \mu| \ge \varepsilon\mu) \le 2e^{-c(\varepsilon)\mu}.$$

Let us illustrate the use of Theorem 1.1 by presenting Erdős's solution to one of his favorite problems, Sidon's problem on the existence of thin bases. A subset X of $\mathbf{N}$ is a basis of order k if every sufficiently large number $n \in \mathbf{N}$ can be represented as a sum of k elements of X, and we denote by $R_X^k(n)$ the number of representations of n with respect to X. A basis X is thin if for all sufficiently large n, $R_X^k(n)$ is small. The notion of thin bases was introduced by Sidon, who, sometime in the 1930's, posed the following question to Erdős:

Is there a basis X of order 2 such that $R_X^2(n) = n^{o(1)}$ for all large n?

As Erdős later recalled, he was thrilled by the question, and thought that he would come up with an answer within a few days. It took a little longer; about 20 years later, in 1956, Erdős answered the question in the affirmative by showing that much more is true [4].

Theorem 1.2 There is a subset $X \subset \mathbf{N}$ such that $R_X^2(n) = \Theta(\log n)$ for all sufficiently large n.

Erdős's proof goes roughly as follows. Define a sequence X randomly by choosing each number x to be in X with probability $p_x = c(\log x/x)^{1/2}$, where c is a large positive constant (if x is so small that $c(\log x/x)^{1/2} > 1$, simply set $p_x = 1$). Let $t_x$ be the indicator of the event that x is chosen; it is obvious that the $t_x$'s are independent random variables. The number of representations of n can now be expressed as
$$Y_n = \sum_{i+j=n} t_it_j. \qquad (1)$$

A routine calculation shows that $\mu_n = \mathbf{E}(Y_n) = \Theta(\log n)$. Set $\varepsilon = 1/2$; by increasing c, we can assume that $\mu_n \ge 2\log n/c(\varepsilon)$, where $c(\varepsilon)$ is the constant in Theorem 1.1. Notice that if $i+j = i'+j' = n$, then either $\{i,j\} = \{i',j'\}$ or the two pairs are disjoint. Therefore, the products $t_it_j$ in (1) are independent. Theorem 1.1 thus applies and implies
$$\Pr(|Y_n - \mu_n| \ge \mu_n/2) \le 2e^{-2\log n}. \qquad (2)$$
Since $\sum_{n=1}^{\infty} e^{-2\log n} = \sum_{n=1}^{\infty} n^{-2}$ converges, the proof is complete by the Borel–Cantelli lemma.

It is quite natural to ask whether Theorem 1.2 can be generalized to arbitrary k. Using the above approach, in order to obtain a basis X such that $R_X^k(n) = \Theta(\log n)$, we should set $p_x = cx^{1/k-1}\log^{1/k}x$. Similar to (1), consider
$$Y_n = \sum_{x_1+\dots+x_k=n} t_{x_1}\cdots t_{x_k}. \qquad (3)$$

Although $Y_n$ has the right expectation $\Theta(\log n)$, we face a major problem: for $k > 2$, the products $t_{x_1}\cdots t_{x_k}$ are no longer independent. In fact, a typical number x appears in many ($\Omega(n^{k-2})$) solutions of $x_1 + \dots + x_k = n$. This completely dashes the hope that one can use Theorem 1.1 to conclude the argument. It took a long time to overcome this problem of dependency. In 1990, Erdős and Tetali [8] successfully generalized Theorem 1.2 to arbitrary k.

Theorem 1.3 For any fixed k, there is a subset $X \subset \mathbf{N}$ such that $R_X^k(n) = \Theta(\log n)$ for all sufficiently large n.
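Erdős's random construction is easy to experiment with numerically. The following Python sketch (purely illustrative and not part of the original argument; the helper names and the choice c = 2 are ours) samples X with $\Pr(x \in X) = \min(1, c(\log x/x)^{1/2})$ and counts $Y_n$ for a few values of n; the ratio $Y_n/\log n$ should stay within a bounded range, in line with Theorem 1.2.

```python
import math
import random

def random_basis(N, c=2.0, seed=0):
    """Sample X ⊆ {1,...,N}, including x with probability min(1, c*sqrt(log x / x))."""
    rng = random.Random(seed)
    return {x for x in range(1, N + 1)
            if rng.random() < min(1.0, c * math.sqrt(math.log(max(x, 2)) / x))}

def representations(X, n):
    """Y_n = number of ordered pairs (i, j) with i + j = n and i, j in X."""
    return sum(1 for i in range(1, n) if i in X and (n - i) in X)

X = random_basis(100000)
for n in (20000, 50000, 100000):
    y = representations(X, n)
    print(n, y, round(y / math.log(n), 2))  # last column should stay bounded
```

This only illustrates the typical behavior of one sample, of course; the theorem itself is a statement about almost every sequence drawn from this distribution.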


The heart of the Erdős–Tetali proof is to show that $Y_n$ (as defined in (3)) is sufficiently concentrated, and to do this the authors used a very delicate moment computation. In particular, the lower tail $\Pr(Y_n \le \mu_n/2)$ and upper tail $\Pr(Y_n \ge 3\mu_n/2)$ probabilities need to be treated separately, with rather different techniques.

The purpose of this paper is to introduce several new large deviation bounds, which can replace Theorem 1.1 in the case where the events in the sum are dependent. These bounds are interesting for several reasons. First, they can be seen as extensions of Chernoff's bound to a larger class of functions. Moreover, compared to other classical large deviation results such as Azuma's and Janson's inequalities, our bounds possess certain advantages. Finally, they seem very useful in applications concerning random sequences, and we shall illustrate this with a few applications. The approach based on these bounds, presented in the applications, is very general and easy to adapt to other problems.

Most of the results presented here were proved in detail in other papers [12, 17, 18, 21] (these papers can be found in journals or downloaded from http://msnhomepages.talkcity.com/IvyHall/vanhavu/default.htm). By putting them together we hope to give the reader a more systematic view of our results and a better understanding of the applications.

The rest of the paper is organized as follows. In the next section, we describe our large deviation results. We shall also make a brief comparison between our bounds and classical large deviation results such as Azuma's and Janson's inequalities. In Section 3, we shall use these bounds to give a short proof of a further extension of Theorem 1.3. This section also serves as a prelude to a more difficult application in Section 4, involving thin Waring bases. The main result of that section, among others, settles a 20-year-old question of Nathanson [14].

Notations. Throughout the paper, $\mathbf{N}$ denotes the set of positive integers and $\mathbf{N}^r$ denotes the set of all rth powers. The asymptotic notations $\Theta$, $O$, $o$ are used under the assumption that $n \to \infty$. All logarithms have natural base. $\mathbf{E}(Y)$ denotes the expectation of a random variable Y.


2 Concentration of sums of dependent variables

In this section, we assume that $t_1, \dots, t_n$ are independent binary random variables. In many problems concerning random sequences, the random variable we are interested in can be expressed in the form $\sum_{j=1}^{m} I_j$, where the $I_j$ are indicator random variables, usually products of several atom variables $t_i$. Typically, the same $t_i$ may appear in many $I_j$'s, and therefore the $I_j$'s are not independent. Our results focus on random variables depending on $t_1, \dots, t_n$ which can be expressed as a polynomial with positive coefficients (a random variable depending on $t_1, \dots, t_n$ can be viewed as a function from the n-dimensional unit hypercube, equipped with the product measure, to $\mathbf{R}$). It is clear that this class covers random variables of the above type, since each $I_j$ can be seen as a monomial. It is also clear that all random variables considered in the previous section are of this type. The sum of all $t_i$, considered in Theorem 1.1, is a polynomial of degree 1.

2.1 Preliminaries

Consider a polynomial Y of degree k in $t_1, \dots, t_n$. We say that Y is regular if its coefficients are positive and at most 1. Moreover, Y is homogeneous if the degree of every monomial is the same. Given a (multi-)set A, $\partial_A(Y)$ is the partial derivative of Y with respect to the variables with indices in A. For instance, if $Y = t_1t_2^2$, $A_1 = \{1,2\}$ and $A_2 = \{2,2\}$, then $\partial_{A_1}(Y) = 2t_2$ and $\partial_{A_2}(Y) = 2t_1$, respectively. If A is empty, then $\partial_A(Y) = Y$. $\mathbf{E}_A(Y)$ denotes the expectation of $\partial_A(Y)$. Furthermore, set $\mathbf{E}_j(Y) = \max_{|A| \ge j} \mathbf{E}_A(Y)$, for all $j = 0, 1, \dots, k$.

The results we are about to present follow a general theme: for a positive polynomial Y, if the expectations of its partial derivatives are relatively small, then Y is strongly concentrated. For the intuition behind the results and the general method used to prove them, we refer to [21].
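The definitions above are mechanical enough to encode directly. The following pure-Python sketch (the helper names are ours, for illustration only) represents a polynomial as a list of (coefficient, exponent-dict) monomials and computes $\partial_A(Y)$ and $\mathbf{E}_A(Y) = \mathbf{E}(\partial_A(Y))$ for independent binary atoms with given means; it reproduces the $Y = t_1t_2^2$ example from the text.

```python
def partial(monomials, A):
    """∂_A(Y): differentiate once for each index in the multiset A.
    A polynomial is a list of (coefficient, {index: exponent}) monomials."""
    result = monomials
    for j in A:
        nxt = []
        for coef, exps in result:
            if exps.get(j, 0) > 0:
                e = dict(exps)
                coef = coef * e[j]  # power rule: d/dt_j t_j^e = e * t_j^(e-1)
                e[j] -= 1
                if e[j] == 0:
                    del e[j]
                nxt.append((coef, e))
        result = nxt
    return result

def expectation(monomials, p):
    """E(Y) for independent binary t_i with means p[i]; note E(t_i^e) = p[i] for e >= 1."""
    total = 0.0
    for coef, exps in monomials:
        for i in exps:
            coef = coef * p[i]
        total = total + coef
    return total

Y = [(1, {1: 1, 2: 2})]                     # Y = t_1 * t_2^2, as in the text
assert partial(Y, [1, 2]) == [(2, {2: 1})]  # ∂_{A_1}(Y) = 2 t_2
assert partial(Y, [2, 2]) == [(2, {1: 1})]  # ∂_{A_2}(Y) = 2 t_1
```

With means $p_1 = 0.5$ and $p_2 = 0.1$, for example, `expectation(partial(Y, [1, 2]), {1: 0.5, 2: 0.1})` returns $\mathbf{E}_{A_1}(Y) = 2p_2 = 0.2$.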


2.2 Polynomials with expectation $\Theta(\log n)$

In this subsection, we consider the special case when the polynomial in question has expectation of order log n. This is a typical case in applications concerning random sequences, and the following theorem, proved by the present author in [18], seems to be very useful.

Theorem 2.1 For any positive constants $k, \alpha, \beta, \varepsilon$ there is a constant $Q = Q(k, \varepsilon, \alpha, \beta)$ such that the following holds. If Y is a regular positive homogeneous polynomial of degree k, $n/Q \ge \mathbf{E}(Y) \ge Q\log n$ and $\mathbf{E}(\partial_A(Y)) \le n^{-\alpha}$ for all non-empty sets A of cardinality at most $k-1$, then
$$\Pr\big(|Y - \mathbf{E}(Y)| \ge \varepsilon\mathbf{E}(Y)\big) \le n^{-\beta}.$$

One can interpret this theorem as follows: if Y has expectation $\Theta(\log n)$ and the condition $\mathbf{E}(\partial_A(Y)) \le n^{-\alpha}$ is satisfied, then in a certain range the tail distribution of Y is similar to that of a Poisson distribution. The case $k = 1$ is well known and appears in most probability textbooks. It would be instructive for the reader to compare this theorem with Theorem 1.1.

2.3 General results

Theorem 2.1 is a special case of a more general result. To state it, we assume that our polynomial Y is regular and has degree k, and we need some new definitions. First define a function f as follows: $f(K) = \max\{1, \lfloor (K/k!)^{1/k} \rfloor - 2\}$. Furthermore, let
$$r(k, K, n, \delta) = \frac{n^k\delta}{f(K/2)!}\log^2\frac{1}{\delta} + (\delta/K^8)^{\lfloor \frac{1}{8k}\log\frac{1}{\delta} \rfloor}.$$
Given a regular polynomial Y, define $h(k, K, n, \delta)$ recursively in the following way:
$$h(1, K, n, \delta) = 0,$$
$$h(k, K, n, \delta) = h(k-1, K, n + \lceil \mathbf{E}(Y) \rceil, \delta) + n\,r(k, K, n, \delta).$$

Given a set A, we denote by $\partial^*_AY$ the polynomial obtained from the partial derivative $\partial_AY$ by subtracting its constant coefficient. $\mathbf{E}^*_A(Y)$ is the expectation of $\partial^*_AY$, and we define $\mathbf{E}^*_1(Y) = \max_{|A| \ge 1} \mathbf{E}^*_A(Y)$. Finally, notice that since our atom random variables $t_i$ are binary, every higher power can be reduced to degree 1 (for instance, $t_1^3 = t_1^2 = t_1$). A polynomial is simplified if every monomial is completely reduced. The following theorem is the main theorem of [18].

Theorem 2.2 Let Y be a simplified regular polynomial of degree k. For any positive numbers $\delta, \lambda$ and K satisfying $K \ge 2k$, $\mathbf{E}^*_1(Y) \le \delta \le 1$ and $4kK\lambda \le \mathbf{E}(Y)$,
$$\Pr\big(|Y - \mathbf{E}(Y)| \ge 2\sqrt{\lambda kK\mathbf{E}(Y)}\big) \le 2ke^{-\lambda/4} + h(k, K, n, \delta).$$

The reader might want to try to deduce Theorem 2.1 from Theorem 2.2 as an exercise. One can also deduce from Theorem 2.2 the following corollary, which gives a concentration result for functions with expectation $O(\log n)$.

Corollary 2.3 Assume that Y is a regular positive homogeneous polynomial of degree k and the expectation of Y is at most log n. Assume furthermore that for all A with $1 \le |A| \le k-1$, $\mathbf{E}(\partial_A(Y)) \le n^{-\alpha}$ for some positive constant $\alpha$. Then there are positive constants $c = c(\alpha, k)$ and $d = d(\alpha, k)$ such that for any $0 \le \varepsilon \le 1$,
$$\Pr\big(|Y - \mathbf{E}(Y)| \ge \varepsilon\mathbf{E}(Y)\big) \le de^{-c\varepsilon^2\mathbf{E}(Y)}.$$

Again, the reader is invited to compare this corollary with Theorem 1.1.

To conclude this subsection, let us mention a result of a slightly different flavor. This result was needed in the proof of Theorem 2.2 and has proved useful in several other situations (see [1] or Section 3, for instance). The proof (using a moment calculation) is simple and can be found in Chapter 8 of [1]. Consider a polynomial Y which is a sum of distinct monomials with coefficient 1. Two monomials of Y are disjoint if they do not share any atom variable. At a point t, we say that a monomial is alive if it is not zero. We denote by $\mathrm{Disj}(Y(t))$ the maximum number of pairwise disjoint alive monomials of Y at t.

Proposition 2.4 For Y as above and any positive integer K,
$$\Pr\big(\mathrm{Disj}(Y(t)) \ge K\big) \le \mathbf{E}(Y)^K/K!.$$
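For intuition, the bound is easy to test numerically in the simplest situation, where the monomials of Y are pairwise disjoint by construction, so that Disj(Y(t)) is just the number of alive monomials (a binomial random variable). A minimal Monte Carlo sketch (ours, illustrative only; all names and parameter choices are hypothetical):

```python
import math
import random

def disjoint_case_tail(m, d, p, K, trials=20000, seed=0):
    """Y = sum of m pairwise-disjoint monomials of degree d over Bernoulli(p) atoms.
    Returns (empirical Pr(Disj(Y(t)) >= K), the bound E(Y)^K / K!)."""
    rng = random.Random(seed)
    q = p ** d                # probability that a single monomial is alive
    hits = sum(sum(rng.random() < q for _ in range(m)) >= K for _ in range(trials))
    mu = m * q                # E(Y)
    return hits / trials, mu ** K / math.factorial(K)

emp, bound = disjoint_case_tail(m=50, d=2, p=0.1, K=3)
print(emp, bound)            # the empirical tail should sit below the bound
```

In the general, overlapping case the quantity Disj(Y(t)) is harder to compute, but the proposition's bound depends only on $\mathbf{E}(Y)$ and K.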

2.4 More general results

The conditions of Theorems 2.1 and 2.2 require the expectation of any partial derivative of Y to be much smaller than 1. This is usually satisfied when the polynomial has expectation $O(\mathrm{polylog}\,n)$. However, for polynomials with larger expectations, one cannot always expect this to happen. The following theorem, proved by Kim and the present author [12] (see also [1], Chapter 7), deals with such a situation.

Theorem 2.5 For every positive integer k there are positive constants $a_k$ and $b_k$ depending only on k such that the following holds. For any positive polynomial $Y = Y(t_1, \dots, t_n)$ of degree k, where the $t_i$'s are independent binary random variables,
$$\Pr\big(|Y - \mathbf{E}(Y)| \ge a_k\lambda^{k-1/2}\sqrt{\mathbf{E}_0(Y)\mathbf{E}_1(Y)}\big) \le b_ke^{-\lambda/4 + (k-1)\log n}.$$

Theorem 2.5 is powerful when the ratio $\mathbf{E}(Y)/\mathbf{E}_A(Y)$ is large (for any A). This theorem does not require $\mathbf{E}_A(Y)$ to be bounded by 1. For instance, if $\mathbf{E}(Y)/\mathbf{E}_A(Y) \ge n^{\delta}$ for some positive constant $\delta$, then $\mathbf{E}_0(Y) = \mathbf{E}(Y) \ge \mathbf{E}_1(Y)n^{\delta}$. Thus, one can choose $\lambda = n^{\varepsilon}$ for some positive constant $\varepsilon$ such that the tail $a_k\lambda^{k-1/2}\sqrt{\mathbf{E}_0(Y)\mathbf{E}_1(Y)}$ is still negligible compared to $\mathbf{E}(Y)$. On the other hand, the bound is roughly $e^{-n^{\varepsilon}/4}$. Theorem 2.5 and its variants have several deep applications in combinatorics [19, 20, 13]. The interested reader might want to check [21] for a survey on the subject.

Remark. Theorem 2.5 was stated in [12] and [1] with $\lambda^k$ instead of $\lambda^{k-1/2}$. It is remarked in [12] that the replacement is possible with essentially the same proof. In a certain range of $\lambda$, $\lambda^{k-1/2}$ can be further reduced to $\lambda^{1/2}$ [21].

2.5 Comparison

Two well-known results which are frequently used in the Erdős probabilistic method are Azuma's inequality and Janson's inequality. In the following, we state these results and make a brief comparison with our bounds.

Let us start with Azuma's inequality. We say that a function $Y = Y(t_1, \dots, t_n)$ has Lipschitz coefficient r if changing any $t_i$ changes the value of Y by at most r. The following version of Azuma's inequality follows [1].

Theorem 2.6 If Y has Lipschitz coefficient r, then for any $\lambda > 0$,
$$\Pr(|Y - \mathbf{E}(Y)| \ge r\sqrt{\lambda n}) \le 2e^{-\lambda/2}.$$

The advantage of Azuma's inequality is that it is very general: Y can be any function. On the other hand, restricted to the class of polynomials, our results are frequently stronger, since they guarantee strong concentration under a usually much weaker condition. There are several situations where Y has a large Lipschitz coefficient but the expectations of its partial derivatives are still very small. A concrete example will appear in the next section, when we examine the function $Y_n$ in (3).

Now let us present Janson's inequality. This inequality deals with functions which can be expressed as $\sum_{j=1}^{m} I_j$, where each $I_j$ is the product of some $t_i$'s. We write $I_i \sim I_j$ if the two monomials share a common atom variable. In [11], Janson proved

Theorem 2.7 With Y as above and $\Delta = \sum_{I_i \sim I_j} \mathbf{E}(I_iI_j)$, the following holds:
$$\Pr\big(Y \le (1-\varepsilon)\mathbf{E}(Y)\big) \le e^{-\frac{(\varepsilon\mathbf{E}(Y))^2}{2(\mathbf{E}(Y)+\Delta)}}.$$

The advantage of this theorem is that there is no restriction on the degree of Y. On the other hand, it provides only the lower tail bound. Our results provide bounds in both directions. Moreover, the strength of these bounds is often comparable to what one gets from Janson's inequality.

3 Thin linear bases

In this section, we extend Theorem 1.3 by allowing a representation to be a linear combination with fixed coefficients. Fix k positive integers $a_1, \dots, a_k$ with $\gcd(a_1, \dots, a_k) = 1$. Let $Q_X^k(n)$ be the number of representations of n of the form $n = a_1x_1 + \dots + a_kx_k$, where $x_i \in X$. We shall prove

Theorem 3.1 There is a subset $X \subset \mathbf{N}$ such that $Q_X^k(n) = \Theta(\log n)$ for all sufficiently large n.

The proof of Theorem 3.1 is based essentially on the proof of the more difficult Theorem 4.2 in [17]. However, since the proof of Theorem 3.1 is much simpler, we present it first, in order to give the reader a better understanding of our method. The assumption $\gcd(a_1, \dots, a_k) = 1$ is necessary, for a simple number theoretic reason. Theorem 1.3 follows from Theorem 3.1 by setting all $a_i = 1$.

To start the proof, we use Erdős's idea and define a random set X as follows. For each $x \in \mathbf{N}$, choose x with probability $p_x = cx^{1/k-1}\log^{1/k}x$, where c is a positive constant to be determined. Let $t_x$ be the indicator random variable of this choice; thus, $t_x$ is a $\{0,1\}$ random variable with mean $p_x$. Fix a number n, and let $Q_n$ be the set of all k-tuples $(x_1, \dots, x_k)$, where the $x_i$ are positive integers and $\sum_i a_ix_i = n$. The number of representations of n using elements from the random sequence X can be expressed as a random variable in the following way:
$$Y_n = \sum_{(x_1, \dots, x_k) \in Q_n} t_{x_1}\cdots t_{x_k}. \qquad (4)$$

It is obvious that $Y_n$ is a polynomial of degree k in $t_1, \dots, t_n$. We now show that with probability close to 1, $Y_n$ is $\Theta(\log n)$ for any sufficiently large n. It is easy to show that $\mathbf{E}(Y_n)$ is of the right order, namely log n. Next, we want to make use of Theorem 2.1. The main obstruction here is that $Y_n$, as a polynomial, does have partial derivatives with large expectations, which violate the condition of Theorem 2.1. For instance, consider the representation $a_1K + a_2x_2 + \dots + a_kx_k$, where K is a constant. The partial derivative with respect to $t_{x_2}, \dots, t_{x_k}$ has expectation $p_K = \Theta(1)$. However, we can easily overcome this obstruction by splitting $Y_n$ into two parts, as follows.

Set $a = 0.4$ (0.4 can be any small constant), let $Q_n^{[1]}$ be the subset of $Q_n$ consisting of all tuples whose smallest element is at least $n^a$, and let $Q_n^{[2]} = Q_n \setminus Q_n^{[1]}$. We break $Y_n$ into the sum of two terms corresponding to $Q_n^{[1]}$ and $Q_n^{[2]}$, respectively: $Y_n = Y_n^{[1]} + Y_n^{[2]}$, where
$$Y_n^{[j]} = \sum_{(x_1, \dots, x_k) \in Q_n^{[j]}} t_{x_1}\cdots t_{x_k}.$$

Intuitively, $Y_n^{[1]}$ should be the main part of $Y_n$, since in most solutions of $\sum_{i=1}^{k} a_ix_i = n$, all $x_i = \Theta(n)$. To finish the proof it suffices to show:

(A) $\mathbf{E}(Y_n^{[1]}) = \Theta(\log n)$ and $\Pr\big(|Y_n^{[1]} - \mathbf{E}(Y_n^{[1]})| \ge \mathbf{E}(Y_n^{[1]})/2\big) \le n^{-2}$.

(B) For almost every sequence X, there is a finite number $M(X)$ such that $Y_n^{[2]} \le M(X)$ for all sufficiently large n.

(A) and (B) confirm our intuition. The main part of $Y_n$, which is of order log n, indeed comes from $Y_n^{[1]}$; $Y_n^{[2]}$'s contribution is bounded by a constant.

We apply Theorem 2.1 to verify (A). To do this, we first need the following lemma, which bounds the $\mathbf{E}(\partial_AY_n^{[1]})$'s.

Lemma 3.2 For all non-empty multi-sets A of size at most $k-1$,
$$\mathbf{E}(\partial_AY_n^{[1]}) = O(n^{-a/2k}).$$

Proof. Consider a (multi-)set A of $k-l$ elements $y_1, \dots, y_{k-l}$. For a permutation $\pi \in S_k$ (where $S_k$ denotes the symmetric group on $\{1, 2, \dots, k\}$), let $Q_{n,l,\pi}^{[1]}$ be the set of l-tuples $(x_1, \dots, x_l)$ of positive integers satisfying $x_i \ge n^a$ for all i and
$$\sum_{i=1}^{l} a_{\pi(i)}x_i = n - \sum_{j=1}^{k-l} a_{\pi(l+j)}y_j.$$
A simple consideration shows that
$$\partial_A(Y_n^{[1]}) \le b(k)\sum_{\pi \in S_k}\sum_{(x_1, \dots, x_l) \in Q_{n,l,\pi}^{[1]}} t_{x_1}\cdots t_{x_l},$$
where $b(k)$ is a constant depending on k. By symmetry, it now suffices to verify that
$$\mathbf{E}\Big(\sum_{(x_1, \dots, x_l) \in Q_{n,l,\pi_0}^{[1]}} t_{x_1}\cdots t_{x_l}\Big) = O(n^{-a/2k}),$$
where $\pi_0$ is the identity permutation. Without loss of generality, we can assume that $x_l = \max(x_1, \dots, x_l)$. Set $m = n - \sum_{j=1}^{k-l} a_{l+j}y_j$; since $\sum_{i=1}^{l} a_ix_i = m$ and the $a_i$'s are fixed numbers, it follows that $x_l = \Omega(m/l)$. Using the fact that $\sum_{x=1}^{m} x^{1/k-1} \approx \int_1^m z^{1/k-1}\,dz \approx m^{1/k}$, we have
$$\mathbf{E}\Big(\sum_{(x_1, \dots, x_l) \in Q_{n,l,\pi_0}^{[1]}} t_{x_1}\cdots t_{x_l}\Big) = O\Big(\sum_{\substack{n^a \le \min(x_1, \dots, x_l) \\ a_1x_1 + \dots + a_lx_l = m}} p_{x_1}\cdots p_{x_l}\Big)$$
$$= O(\log n)\sum_{\substack{n^a \le \min(x_1, \dots, x_l) \\ a_1x_1 + \dots + a_lx_l = m}} x_1^{1/k-1}\cdots x_l^{1/k-1}$$
$$= O(\log n)\,O\Big(\Big(\sum_{x=1}^{m} x^{1/k-1}\Big)^{l-1}(m/l)^{1/k-1}\Big)$$
$$= O(\log n)\,O\big(m^{(l-1)/k}(m/l)^{1/k-1}\big)$$
$$= O(\log n)\,O\big(m^{(l-k)/k}\big) = O(n^{-a/2k}),$$
since $k - l \ge 1$ and $m \ge n^a$. This concludes the proof of the lemma. $\Box$

The last step in the previous calculation explains the restriction $\min(x_1, \dots, x_k) \ge n^a$. This assumption guarantees that every partial derivative of $Y_n^{[1]}$ has small expectation.
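The only analytic input in this calculation is the estimate $\sum_{x=1}^{m} x^{1/k-1} \approx \int_1^m z^{1/k-1}\,dz = k(m^{1/k}-1)$, which is easy to sanity-check numerically (an illustration of ours, not part of the proof):

```python
def power_sum(m, k):
    """Sum_{x=1}^{m} x^(1/k - 1), to be compared with the integral k*(m^(1/k) - 1)."""
    e = 1.0 / k - 1.0
    return sum(x ** e for x in range(1, m + 1))

for k in (2, 3, 5):
    m = 10 ** 5
    print(k, power_sum(m, k) / m ** (1.0 / k))  # ratio tends to the constant k as m grows
```

The constant k is, of course, absorbed in the O's of the lemma's computation.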

From the above calculation, it follows immediately (by setting $l = k$ and $m = n$) that $\mathbf{E}(Y_n^{[1]}) = O(\log n)$. Moreover, a straightforward argument shows that if $c \to \infty$, then $\mathbf{E}(Y_n^{[1]})/\log n \to \infty$. Indeed, there are at least $\Theta(n^{k-1})$ k-tuples $(x_1, x_2, \dots, x_k)$ in $Q_n^{[1]}$ where $x_i = \Theta(n)$ for all $i \le k$, where the constants in the $\Theta$'s depend only on k and the $a_i$'s. On the other hand, each such tuple contributes at least $c^kn^{1-k}\log n$ to $\mathbf{E}(Y_n^{[1]})$. Therefore, by increasing c, we can assume that $\mathbf{E}(Y_n^{[1]})$ satisfies the condition of Theorem 2.1. Theorem 2.1 then applies and implies (A).

Before continuing with the proof of (B), let us pause for a moment and show why we could not apply Azuma's inequality to prove (A). The reason is that the Lipschitz coefficient of $Y_n^{[1]}$ is way too large. It is clear that there is a number x which appears in $\Omega(n^{k-2})$ tuples in $Q_n^{[1]}$ (as a matter of fact, many numbers do so). For such an x, changing $t_x$ might, in the worst case, change $Y_n^{[1]}$ by $\Omega(n^{k-2})$. Thus, the Lipschitz coefficient of $Y_n^{[1]}$ is $\Omega(n^{k-2})$. This coefficient is clearly too large for Azuma's inequality to deliver a non-trivial bound.

Now we turn to the proof of (B), which is purely combinatorial. We say that an l-tuple $(x_1, \dots, x_l)$ ($l \le k$) is an l-representation of n if there is a permutation $\pi \in S_k$ such that $\sum_{i=1}^{l} a_{\pi(i)}x_i = n$. For all $l < k$, let $Q_X^l(n)$ be the number of l-representations of n. With essentially the same computation as in the previous lemma, one can show that $\mathbf{E}(Q_X^l(n)) = O(n^{-1/k}\log n) = O(n^{-1/2k})$. Proposition 2.4 then implies that for a sufficiently large constant $M_1$, with probability $1 - O(n^{-2})$, the maximum number of disjoint l-representations of n is at most $M_1$. By the Borel–Cantelli lemma, we conclude that for almost every random sequence X there is a finite number $M_1(X)$ such that for any $l < k$ and all n, the number of disjoint l-representations of n from X is at most $M_1(X)$.

Using a computation similar to the one in the proof of Lemma 3.2, one can deduce that $\mathbf{E}(Y_n^{[2]}) = O(n^{(a-1)/k}\log n) = O(n^{-1/2k})$. Indeed, since $x_1 \le n^a$, instead of $(\sum_{x=1}^{n} x^{1/k-1})^{k-1}$ one can write $(\sum_{x=1}^{n^a} x^{1/k-1})(\sum_{x=1}^{n} x^{1/k-1})^{k-2}$, and the bound follows. So, again by Proposition 2.4 and the Borel–Cantelli lemma, there is a constant $M_2$ such that almost surely, the maximum number of disjoint k-representations of n in $Y_n^{[2]}$ is at most $M_2$ for all large n. From now on, it will be useful to think of $Y_n^{[2]}$ as a family of sets of size k, each corresponding to a representation of n. We say that a sequence X is good if it satisfies the properties described in the last two paragraphs.

To finish the proof, it suffices to show that if X is good, then $Y_n^{[2]}$ is bounded by a constant. This follows directly from a well-known combinatorial result of Erdős and Rado [7], stated below. A collection of sets $A_1, \dots, A_r$ forms a sunflower if the sets have pairwise the same intersection (the $A_i$'s are called petals of the flower). Erdős and Rado showed

Lemma 3.3 If H is a collection of sets of size at most k and $|H| > (r-1)^kk!$, then H contains r sets forming a sunflower.

Set $M(X) = \big(\max(M_1(X)k!, M_2)\big)^kk!$ and assume that n is sufficiently large. It is clear that if $Y_n^{[2]} > M(X)$, then by the Erdős–Rado sunflower lemma, $Y_n^{[2]}$ contains a sunflower with $M_3 = \max(M_1(X)k!, M_2) + 1$ petals. If the intersection of this sunflower is empty, then the petals form a family of $M_3$ disjoint k-representations of n. Otherwise, assume that the intersection consists of $y_1, \dots, y_j$, where $1 \le j \le k-1$. By the pigeon-hole principle, there is a permutation $\pi \in S_k$ such that one can find $M_1(X) + 1$ $(k-j)$-representations of $m = n - \sum_{i=1}^{j} a_{\pi(i)}y_i$ among the sets obtained from the petals by removing their common intersection. These $M_1(X) + 1$ sets are disjoint, by the definition of a sunflower. Therefore, in both cases we obtain a contradiction. $\Box$

Remark. Using the more general Theorems 2.2 or 2.5, one can prove a statement similar to Theorem 3.1 when $Q_X^k(n)$ is required to be $\Theta(g(n))$, for any "reasonable" function $g(n) \gg \log n$ (reasonable here means that one should be able to set $p_x$ so that the expectation of $Q_X^k(n)$ is $g(n)$). In fact, in such a case, one can easily show that $Q_X^k(n)$ is asymptotically $g(n)$, since now in the proof of (A) one can have a deviation tail $o(g(n))$. The much harder direction is to go below log n, and here the probabilistic method seems to reach its limit. In order to prove that a function Y, with probability $1 - n^{-2}$, does not deviate too much from its expectation, it does seem that one needs to assume that the expectation of Y is of order at least log n.
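The sunflower structure used in the proof of (B) is concrete enough to search for by brute force on small families. The following sketch (our illustration only; exponential time, small inputs) looks for r sets whose pairwise intersections all coincide, exactly as in the definition preceding Lemma 3.3:

```python
from itertools import combinations

def find_sunflower(sets, r):
    """Return r sets forming a sunflower (all pairwise intersections equal), or None."""
    sets = [frozenset(s) for s in sets]
    for cand in combinations(sets, r):
        core = cand[0] & cand[1]
        if all((a & b) == core for a, b in combinations(cand, 2)):
            return cand
    return None

family = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}]
petals = find_sunflower(family, 3)  # three petals with common core {1}
```

Note that pairwise disjoint sets also form a sunflower (with empty core), which is precisely the dichotomy exploited in the proof above.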

4 Thin Waring bases

4.1 The problem

The most interesting bases in additive number theory are, perhaps, the sets of all rth powers, for arbitrary $r \ge 2$. The famous Waring conjecture (proved by Hilbert, Hardy–Littlewood, Vinogradov and many others by the beginning of the last century) asserts that for any fixed r and sufficiently large k, $\mathbf{N}^r$, the set of all rth powers, is a basis of order k. These bases are very far from being thin, due to the following deep theorem [16].

Theorem 4.1 For any fixed $r \ge 2$, there is a constant $k(r)$ such that if $k \ge k(r)$ then $R_{\mathbf{N}^r}^k(n)$, the number of representations of n as a sum of k rth powers, satisfies
$$R_{\mathbf{N}^r}^k(n) = \Theta(n^{k/r-1}),$$
for every positive integer n.

It is natural to ask whether $\mathbf{N}^r$ contains a thin subbasis, and one may hope to obtain a result similar to Theorem 1.3. On the other hand, the problem appears much more difficult. For many years, researchers focused on the simpler question of whether $\mathbf{N}^r$ contains a subbasis of small density. This question has been investigated intensively for $r = 2$ [5, 3, 23, 24, 22, 15]. Choi, Erdős and Nathanson proved in [3] that $\mathbf{N}^2$ contains a subbasis X of order 4 with $X(m) \le m^{1/3+\epsilon}$, where $X(m)$ denotes the number of elements of X not exceeding m. Improving this result, Zöllner [23, 24] showed that for any $k \ge 4$ there is a subbasis $X \subset \mathbf{N}^2$ of order k satisfying $X(m) \le m^{1/k+\epsilon}$ for an arbitrary positive constant $\epsilon$. Wirsing [22], sharpening Zöllner's theorem, proved that for any $k \ge 4$ there is a subbasis $X \subset \mathbf{N}^2$ of order k satisfying $X(m) = O(m^{1/k}\log^{1/k}m)$. It is easy to see, via the pigeon-hole principle, that Wirsing's result is best possible, up to the log term. A short proof of Wirsing's result for the case $k = 4$ was given by Spencer in [15].

For $r \ge 3$, much less was known. In 1980, Nathanson [14] proved that $\mathbf{N}^r$ contains a subbasis of density $o(m^{1/r})$. In the same paper, he raised the following question.

Question. Let $r \ge 2$ and k be fixed positive integers, where k is sufficiently large compared to r. What is the smallest density of a subbasis of order k of $\mathbf{N}^r$? Can it be $m^{1/k+o(1)}$?

It is clear that the conjectured density $m^{1/k+o(1)}$ is best possible up to the $o(1)$ term. Very recently, the present author succeeded in proving the extension of Theorem 1.3 to $\mathbf{N}^r$ for arbitrary r [17].

Theorem 4.2 For any fixed $r \ge 2$, there is a constant $k(r)$ such that if $k \ge k(r)$ then $\mathbf{N}^r$ contains a subset X such that
$$R_X^k(n) = \Theta(\log n)$$
for every positive integer $n \ge 2$, and $R_X^k(1) = 1$.

This theorem implies, via the pigeon-hole principle, that X has density $O(m^{1/k}\log^{1/k}m)$, settling Nathanson's question. The proof of Theorem 4.2 uses the framework presented in the previous section and again relies heavily on Theorem 2.1.
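The pigeon-hole step can be spelled out as follows (a routine computation in our notation, not taken verbatim from [17]; constants depending on k are absorbed in the O's). Every k-tuple of elements of X not exceeding m/k represents some $n \le m$, so

```latex
X(m/k)^k \;\le\; \sum_{n \le m} R_X^k(n)
        \;=\; O\Big(\sum_{n \le m} \log n\Big)
        \;=\; O(m \log m),
\qquad\text{hence}\qquad
X(m) = O\big(m^{1/k} \log^{1/k} m\big).
```

The same count, read in the other direction, shows that no subbasis of order k can have density below $\Omega(m^{1/k}\log^{1/k}m)$ once $R_X^k(n) = \Omega(\log n)$, which is why the density in Theorem 4.2 is optimal up to the constant.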

4.2 Sketch of the proof

The proof of Theorem 4.2 follows closely the framework provided in Section 3, but it requires, in addition, a purely number theoretic lemma (Lemma 4.3), which is an extended version of Theorem 4.1 and is interesting in its own right.

To start, we define a random subset of $\mathbf{N}^r$ as follows. Choose, for each $x \in \mathbf{N}$, the number $x^r$ with probability $p_x = cx^{-1+r/k}\log^{1/k}x$, where c is a positive constant to be determined. Again let $t_x$ denote the characteristic random variable representing the choice of $x^r$: $t_x = 1$ if $x^r$ is chosen and 0 otherwise. Similar to (4), the number of representations of n (not counting permutations), restricted to X, can be expressed as follows:
$$Y_n = R_X^k(n) = \sum_{x_1^r+\dots+x_k^r=n}\ \prod_{j=1}^{k} t_{x_j} = Y(t_1, \dots, t_{\lfloor n^{1/r} \rfloor}). \qquad (5)$$

Given the framework presented in the last section, the substantial difficulty remaining is to estimate the expectations of $Y_n$ and its partial derivatives (to be precise, we should consider $Y_n^{[1]}$, where $Y_n^{[1]}$ is defined similarly with a properly chosen parameter a; however, let us put this technicality aside). In the following, we shall focus on the expectation of $Y_n$. Notice that
$$\mathbf{E}(Y_n) = \sum_{x_1^r+\dots+x_k^r=n} c^k\prod_{j=1}^{k} x_j^{-1+r/k}\log^{1/k}x_j.$$

To see that the right-hand side has order log n, one may argue as follows. A typical solution $(x_1, \dots, x_k)$ of the equation $\sum_{j=1}^{k} x_j^r = n$ should satisfy $x_j = \Theta(n^{1/r})$ for all j. Thus, a typical term in the sum is $\Theta(n^{-k/r+1}\log n)$. On the other hand, by Theorem 4.1, the number of terms is $\Theta(n^{k/r-1})$, and we would be done by taking the product. However, there could be many non-typical solutions with a larger contribution. For instance, assume that $x = (x_1, \dots, x_k)$ is a solution where $1 \le x_j \le P_j$ and some of the $P_j$'s are considerably smaller than $n^{1/r}$ (for example, $P_1 = n^{\varepsilon}$ with $\varepsilon \ll 1/r$). The contribution of the term corresponding to x is at least $\Omega(\prod_{j=1}^{k} P_j^{-1+r/k})$, which is significantly larger than the contribution of a typical term. To overcome this problem, we need an upper bound on the number of solutions of the equation $\sum_{j=1}^{k} x_j^r = n$ subject to the additional constraints $x_j \le P_j$, for arbitrary positive integers $P_1, \dots, P_k$. Denote this number by $\mathrm{Root}(P_1, \dots, P_k)$. We proved the following lemma, which generalizes the upper bound in Theorem 4.1.

Lemma 4.3 For a fixed positive integer $r \ge 2$, there exists a constant $k_r$ such that the following holds. For any constant $k \ge k_r$, there is a positive constant $\delta = \delta(r, k)$ such that for every sequence $P_1, \dots, P_k$ of positive integers,
$$\mathrm{Root}(P_1, \dots, P_k) = O\Big(n^{-1}\prod_{j=1}^{k} P_j + \prod_{j=1}^{k} P_j^{1-r/k-\delta}\Big),$$
for all n.

The proof of this lemma requires a sophisticated application of the Hardy–Littlewood circle method and is beyond the scope of this paper. The reader may consult [17] for the full proof.

References

[1] N. Alon and J. Spencer, The Probabilistic Method, Second Edition, Wiley, New York, 2000.

[2] H. Chernoff, A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Stat. 23 (1952), 493-509.

[3] S. L. G. Choi, P. Erdős and M. Nathanson, Lagrange's theorem with N^{1/3} squares, Proc. Am. Math. Soc. 79 (1980), 203-205.

[4] P. Erdős, Problems and results in additive number theory, Colloque sur la Théorie des Nombres (CBRM) (Bruxelles, 1956), 127-137.

[5] P. Erdős and M. Nathanson, Lagrange's theorem and thin subsequences of squares, in J. Gani and V. K. Rohatgi, editors, Contributions to Probability, pp. 3-9, Academic Press, New York, 1981.

[6] P. Erdős and A. Rényi, Additive properties of random sequences of positive integers, Acta Arith. 6 (1960), 83-110.

[7] P. Erdős and R. Rado, Intersection theorems for systems of sets, J. London Math. Soc. 35 (1960), 85-90.

[8] P. Erdős and P. Tetali, Representations of integers as the sum of k terms, Random Structures and Algorithms 1 (1990), 245-261.

[9] P. Erdős, A. Sárközy and T. Sós, Problems and results on additive properties of general sequences III, Studia Sci. Math. Hung. 22 (1987), 53-63.

[10] H. Halberstam and K. F. Roth, Sequences, Springer-Verlag, New York, 1983.

[11] S. Janson, Poisson approximation for large deviations, Random Structures and Algorithms 1 (1990), 221-230.

[12] J. H. Kim and V. H. Vu, Concentration of multivariate polynomials and its applications, Combinatorica 20 (2000), 417-434.

[13] J. H. Kim and V. H. Vu, Small complete arcs in projective planes, submitted.

[14] M. Nathanson, Waring's problem for sets of density zero, pp. 302-310 in Analytic Number Theory, edited by M. Knopp, Lecture Notes in Mathematics 899, Springer-Verlag, 1980.

[15] J. Spencer, Four squares with few squares, pp. 295-297 in D. V. Chudnovsky et al., editors, Number Theory, New York Seminar 1991-1995, Springer.

[16] R. C. Vaughan, The Hardy-Littlewood Method, Cambridge Univ. Press, 1981.

[17] V. H. Vu, On a refinement of Waring's problem, to appear in Duke Math. Journal.

[18] V. H. Vu, On the concentration of multivariate polynomials with small expectation, Random Structures and Algorithms 16 (2000), 344-363.

[19] V. H. Vu, New bounds on nearly perfect matchings in hypergraphs: higher codegrees do help, Random Structures and Algorithms 17 (2000), 29-63.

[20] V. H. Vu, On the list chromatic number of locally sparse graphs, submitted.

[21] V. H. Vu, Concentration of non-Lipschitz functions and applications, in preparation.

[22] E. Wirsing, Thin subbases, Analysis 6 (1986), 285-308.

[23] J. Zöllner, Der Vier-Quadrate-Satz und ein Problem von Erdős und Nathanson, Ph.D. thesis, Johannes Gutenberg-Universität, Mainz, 1984.

[24] J. Zöllner, Über eine Vermutung von Choi, Erdős und Nathanson, Acta Arith. 45 (1985), 211-213.

