Contributions to the Generalized Coupon Collector and LRU Problems

Christian BERTHET, STMicroelectronics, Grenoble, France

Abstract. Based upon inequalities on subset probabilities, proofs of several conjectures on the Generalized Coupon Collector Problem (i.e. CCP with unequal popularity) are presented. We then derive a very simple asymptotic relation between the expectation of the waiting time for a partial collection in the CCP and the miss rate of an LRU cache.

Keywords: Generalized Coupon Collector Problem, LRU, Combinatorial Identities, Subset Probabilities, Inequalities on Subset Probabilities.

Address all correspondence to: BERTHET Christian; E-mail: [email protected]

1. Introduction

It is known that the Coupon Collector Problem (CCP) for a general popularity and a partial collection, on the one hand, and the miss-rate computation of Least Recently Used (LRU) caches, on the other hand, are twin problems [Flajolet92]. The latter problem is also referred to as the Move-to-Front search cost. In the Coupon Collector Problem, a set of N (N>1) distinct objects (coupons, items, ...) is sampled with replacement by a collector, in a way which is independent of all past events. This random process is frequently labelled the Independent Reference Model (IRM). Each drawing produces item 'i' from the reference set of N items with probability p_i, such that Σ_{i=1..N} p_i = 1. The distribution {p_i} is often called the 'popularity'. Also, as in [Boneh97], we use the shorthand EL ('equally likely') to denote a uniform distribution (i.e. p_i = 1/N). The CCP comes down to determining how many trials ('waiting time') are needed before one has collected either all N items (the complete collection) or a number n of distinct items, n < N (a partial collection). Throughout, we assume N > 1, 1 > p_i > 0, and Σ_{i=1..N} p_i = 1. For a subset J of the reference set {1,..,N}, we use uppercase P_J to denote the probability of subset J, i.e. the sum of the probabilities of the elements of J: P_J = Σ_{i∈J} p_i.
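As a concrete illustration of the IRM and of the waiting-time variable studied below, here is a minimal Python sketch (the function names `irm_draw` and `waiting_time` are ours, not from the literature). It samples the process and estimates the mean waiting time of a complete EL collection, which later sections show equals N·H_N:

```python
import random

def irm_draw(p, rng):
    """Draw one item index under the Independent Reference Model (IRM)."""
    return rng.choices(range(len(p)), weights=p, k=1)[0]

def waiting_time(p, n, rng):
    """Number of IRM drawings until n distinct items have been seen (the CCP waiting time)."""
    seen, k = set(), 0
    while len(seen) < n:
        seen.add(irm_draw(p, rng))
        k += 1
    return k

def mean_waiting_time(p, n, trials=20000, seed=0):
    rng = random.Random(seed)
    return sum(waiting_time(p, n, rng) for _ in range(trials)) / trials

# EL (uniform) popularity: the complete-collection expectation is N * H_N
N = 5
H5 = sum(1.0 / i for i in range(1, N + 1))
est = mean_waiting_time([1.0 / N] * N, N)
print(est, N * H5)   # empirical mean vs N*H_N (about 11.42)
```

The Monte Carlo estimate converges to N·H_N as the number of trials grows.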

Christian BERTHET

Page 2

6/16/2017

2. Tn Variable: Waiting Time

(i) Probability Formula
We use the definition of [Boneh97] for the Waiting Time. This variable is "Tj: the number of drawings needed to complete a sub-collection of size j". In the following we use the notation: 'n' is the sub-collection size, out of 'N' possible items in the reference set, and 'k' is the number of drawings. The formula for the probability of this variable (pdf form) was first given by Von Schelling [Schelling54] (using a somewhat different notation), for N ≥ n ≥ 1 and k ≥ 1:
Pr[T_n = k] = Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} P_J^{k−1} (1 − P_J).

Cumulative form (so-called CDF): Pr[T_n ≤ k] = Σ_{0<κ≤k} Pr[T_n = κ], from which the complementary form (CCDF) Pr[T_n > k] = 1 − Pr[T_n ≤ k] is easily computed.

It can be proved that Pr[T_n = k] is always null for k < n. Since E[T_n] = Σ_{k≥0} Pr[T_n > k] = Σ_{k≥0} Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} P_J^k, the T_n expectation is:
E[T_n] = Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} 1/(1 − P_J).
This is the expression given by Flajolet et al. (formula 14a of the reference paper [Flajolet92]) for a partial collection.
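The Von Schelling pdf and the expectation formula are straightforward to implement by enumerating subsets. The following Python sketch (our own check, not part of the paper) verifies that the pdf sums to 1 over k and that its mean matches the subset-sum expectation, for an arbitrary non-EL popularity:

```python
from itertools import combinations
from math import comb

def subset_probs(p, j):
    """All subset probabilities P_J over subsets J of size j."""
    return [sum(c) for c in combinations(p, j)]

def pdf_T(p, n, k):
    """Von Schelling pdf: Pr[T_n = k] for popularity p (len(p) = N)."""
    N = len(p)
    return sum((-1) ** (n - 1 - j) * comb(N - j - 1, N - n)
               * sum(PJ ** (k - 1) * (1 - PJ) for PJ in subset_probs(p, j))
               for j in range(n))

def expect_T(p, n):
    """E[T_n] = sum_j (-1)^(n-1-j) C(N-j-1, N-n) sum_{|J|=j} 1/(1-P_J)."""
    N = len(p)
    return sum((-1) ** (n - 1 - j) * comb(N - j - 1, N - n)
               * sum(1.0 / (1 - PJ) for PJ in subset_probs(p, j))
               for j in range(n))

p = [0.5, 0.25, 0.15, 0.1]          # an arbitrary non-EL popularity
n = 3
mass = sum(pdf_T(p, n, k) for k in range(1, 400))       # truncated tail is negligible
mean = sum(k * pdf_T(p, n, k) for k in range(1, 400))
print(mass, mean, expect_T(p, n))   # mass ~ 1, and the two means agree
```

Subset enumeration costs 2^N terms, so this is only practical for small N; that limitation is exactly why the asymptotic results of Section 6 matter.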

It is direct that Σ_{k≥n} Pr[T_n = k] = 1. For the full collection, the formula is:
E[T_N] = Σ_{j=0..N−1} (−1)^{N−1−j} Σ_{|J|=j} 1/(1 − P_J).

This notation is equivalent to the "Von Schelling notation" using the index change k = N − j:
E[T_n] = Σ_{k=N−n+1..N} (−1)^{n−1−N+k} C(k−1, N−n) Σ_{|J|=N−k} 1/(1 − P_J) = Σ_{k=N−n+1..N} (−1)^{n−1−N+k} C(k−1, N−n) Σ_{|J|=k} 1/P_J,
which appears in [Schelling54], and:
E[T_N] = Σ_{k=1..N} (−1)^{k−1} Σ_{|J|=k} 1/P_J
for the complete collection.

It is easy to see that, for EL and complete collections, since Σ_{|J|=k} 1/P_J = C(N,k)·N/k, we get E[T_N] = N·H_N, where H_N is the Nth harmonic number, thanks to the remarkable equality: Σ_{p=1..m} (−1)^{p−1} C(m,p)·(1/p) = Σ_{p=1..m} 1/p = H_m. The formula is generalized to partial collections of EL probabilities: E[T_n] = N(H_N − H_{N−n}) (Appendix Section 9 gives a possible proof).

(iv) Pdf curves
The following graph shows the pdf probability (decimal log) of a complete collection for N=12, k≥12 and three different popularities: uniform, Zipf, and generalized Zipf (a.k.a. power law) with parameter 0.5.


For k=N (the initial point of each curve) the pdf probability is respectively N!·N^{−N}, H_N^{−N} and H_{N,a}^{−N}·(N!)^{1−a} for the uniform law, the Zipf law and the generalized Zipf law with skewness 'a', where H_N (resp. H_{N,a}) is the Nth (resp. generalized) harmonic number. We verified up to N=100 (so it is conjectured for N>100) that the abscissa of the pdf maximum for an EL distribution is O(N·ln N), i.e. the same trend as the expectation but slightly below, since H_N = ln(N) + γ + O(1/N), where γ is the Euler-Mascheroni constant (γ ≈ 0.5772). For any other power-law distribution, the abscissa of the maximum is not known.

(v) Observation in EL case

Let us consider the CDF expression Pr[T_N ≤ k] extended to the case where k (k > N) is not an integer, i.e. belongs to the continuous domain. We then define Pr[T_N ≤ E[T_N]], i.e. the probability that the waiting time for a complete collection is less than or equal to its expectation. Calculating this expression using the definition of Stirling numbers of the 2nd kind (again assuming extension to the continuous domain, with the first parameter of the Stirling number no longer an integer) produces, for EL, the following figure (blue curve): Pr[T_N ≤ N·H_N], plotted for N up to 1000. It is compared to N·e^{−H_N} (grey curve), whose limit, when N→∞, is e^{−γ} ≈ 0.56146. The limit of Pr[T_N ≤ E[T_N]] is slightly above the e^{−γ} value.

It stands that Pr[T_N ≤ N·H_N] = Σ_{i=0..N} (−1)^i C(N,i) ((N−i)/N)^{N·H_N} can be approximated by Σ_{i=0..N} (−1)^i C(N,i) e^{−i·H_N}, since 1 − i/N ≈ e^{−i/N} when i << N.

3. Wk Variable: Number of Different Items

(i) Probability Formula
The variable W_k is the number of different items observed in the first k drawings. Its pdf follows from:
Pr[W_k = n] = Pr[T_{n+1} > k] − Pr[T_n > k] = Σ_{j=0..n} (−1)^{n−j} C(N−j−1, N−(n+1)) Σ_{|J|=j} P_J^k − Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} P_J^k,
i.e.:
Pr[W_k = n] = Σ_{j=0..n} (−1)^{n−j} C(N−j, N−n) Σ_{|J|=j} P_J^k.
Obviously, Pr[W_k = n] = 0 for n > N and for k < n. Since Pr[W_k ≤ n] = Pr[T_{n+1} > k], n < N, the W_k CDF probability is:
Pr[W_k ≤ n] = Σ_{j=0..n} (−1)^{n−j} C(N−j−1, N−n−1) Σ_{|J|=j} P_J^k.

Noticing that for N ≥ n > 0, Σ_{j=0..n} (−1)^{n−j} C(N−j−1, N−n−1) C(N,j) = 1 (see Appendix 1 in [Berthet17]), the CCDF form is:
Pr[W_k > n] = Σ_{j=0..n} (−1)^{n−j} C(N−j−1, N−n−1) Σ_{|J|=j} (1 − P_J^k).
Let us stress that for the complete collection, Pr[W_k ≤ N] = Σ_{j=0..N} (−1)^{N−j} C(N−j−1, −1) Σ_{|J|=j} P_J^k; the binomial coefficient is 1 for j=N and 0 elsewhere, hence Pr[W_k ≤ N] = 1, or Pr[W_k > N] = 0.

Also: Pr[W_k = N] = Σ_{j=0..N} (−1)^{N−j} C(N−j, 0) Σ_{|J|=j} P_J^k = (−1)^N Σ_{j=0..N} (−1)^j Σ_{|J|=j} P_J^k. As shown in [Berthet17] Corollary 1, this expression is null for k < N.

The expectation is E[W_k] = Σ_{n≥0} Pr[W_k > n] = Σ_{n≥0} Σ_{j=0..n} (−1)^{n−j} C(N−j−1, N−n−1) Σ_{|J|=j} (1 − P_J^k), obtained by commuting the summations. Noting that Σ_{n=j..N−1} (−1)^{n−j} C(N−j−1, N−n−1) = (−1)^{N−j−1} Σ_{u=0..N−j−1} (−1)^u C(N−j−1, u) = 1_{j=N−1}, where 1_A = 1 when A is true else 0, we finally get:
E[W_k] = Σ_{|J|=N−1} (1 − P_J^k) = Σ_{i=1..N} (1 − (1−p_i)^k).
This is formula (26) of [Boneh97].

Remarkably, E[W_k] is also the average Working Set function [Fagin77] used in caching analysis, and this is the reason why we use the same name for the W_k variable.
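The closed form E[W_k] = Σ_i (1 − (1−p_i)^k) is cheap to evaluate and easy to validate by simulation. Below is a small Python check of ours (the names `ws` and `ws_mc` are our own), comparing the formula against a Monte Carlo estimate of the number of distinct items in k IRM drawings:

```python
import random

def ws(p, k):
    """Average working set WS(k) = E[W_k] = sum_i (1 - (1 - p_i)^k)."""
    return sum(1 - (1 - pi) ** k for pi in p)

def ws_mc(p, k, trials=20000, seed=1):
    """Monte Carlo estimate of E[W_k] under IRM."""
    rng = random.Random(seed)
    idx = range(len(p))
    total = 0
    for _ in range(trials):
        total += len(set(rng.choices(idx, weights=p, k=k)))
    return total / trials

p = [0.4, 0.3, 0.2, 0.1]
print(ws(p, 5), ws_mc(p, 5))   # the closed form and the simulation agree
```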

(iv) Summary of Formulas
Notation: n different items among N possible after k trials.

Variable T_n: number of drawings needed to complete a sub-collection of size n (its expectation is also called the waiting time for an n-size partial collection).
Pdf: Pr[T_n = k] = Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} P_J^{k−1} (1 − P_J)
CDF: Pr[T_n ≤ k] = Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} (1 − P_J^k)
CCDF: Pr[T_n > k] = Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} P_J^k
Expectation: E[T_n] = Σ_{j=0..n−1} (−1)^{n−1−j} C(N−j−1, N−n) Σ_{|J|=j} 1/(1 − P_J)

Variable W_k: number of different items observed in the first k drawings (its expectation is also called the WS(k) function, the average 'working set').
Pdf: Pr[W_k = n] = Σ_{j=0..n} (−1)^{n−j} C(N−j, N−n) Σ_{|J|=j} P_J^k
CDF: Pr[W_k ≤ n] = Σ_{j=0..n} (−1)^{n−j} C(N−j−1, N−n−1) Σ_{|J|=j} P_J^k
CCDF: Pr[W_k > n] = Σ_{j=0..n} (−1)^{n−j} C(N−j−1, N−n−1) Σ_{|J|=j} (1 − P_J^k)
Expectation: E[W_k] = Σ_{j=1..N} (1 − (1 − p_j)^k)

Note that all probability expressions, as well as the T_n expectation, can be transformed with a summation index change (N−j).

(v) Wk Recurrence Relation for EL
For an EL distribution, [Read98] gives a recurrence relation for the variable W_k. With his notation, the probability is p_n(r+1, s+1) = [(n−s) p_n(r, s) + (s+1) p_n(r, s+1)]/n, for r = 1, 2, 3, ... and s = 1, ..., min(r, n), which is justified as follows: "to find (s+1) different types of card in (r+1) packets, either we previously had s different types in the first r packets and then (with chance (n−s)/n) found one of the other (n−s) types in the (r+1)th packet, or we already had (s+1) different types in the first r packets and then got one of these types again in the (r+1)th packet (which happens with chance (s+1)/n)". Rewritten with our notation (with k substituted for r, N for n and n for s):
Pr[W_{k+1} = n+1] = ((N−n)/N) Pr[W_k = n] + ((n+1)/N) Pr[W_k = n+1].
This recurrence stems directly from the recurrence on Stirling numbers. Let us note that [Read98] does not distinguish between the two variables, giving the W_k probability in his formula (5) and the T_N expectation in his formula (6). Also, let us mention that, again, there is no such recurrence for non-EL popularities.
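The EL recurrence can be iterated to build the whole Pr[W_k = n] table and checked against the direct inclusion-exclusion pdf specialized to EL (P_J = j/N). A short Python sketch of ours:

```python
from math import comb

def wk_pdf_el(N, kmax):
    """Table P[k][n] = Pr[W_k = n] for an EL popularity, via the Read recurrence."""
    P = [[0.0] * (N + 1) for _ in range(kmax + 1)]
    P[0][0] = 1.0
    for k in range(kmax):
        for n in range(N + 1):
            if P[k][n]:
                P[k + 1][n] += P[k][n] * n / N                # repeat one of the n items seen
                if n < N:
                    P[k + 1][n + 1] += P[k][n] * (N - n) / N  # draw a new item
    return P

def wk_pmf_direct(N, k, n):
    """Direct pdf of W_k specialized to EL: sum_{|J|=j} P_J^k = C(N,j)(j/N)^k."""
    return sum((-1) ** (n - j) * comb(N - j, N - n) * comb(N, j) * (j / N) ** k
               for j in range(n + 1))

N, k = 5, 7
P = wk_pdf_el(N, k)
maxdiff = max(abs(P[k][n] - wk_pmf_direct(N, k, n)) for n in range(N + 1))
print(maxdiff)   # recurrence and closed form agree
```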

(vi) Properties
The relation between the T_n CDF and the W_k pdf for a complete collection of EL popularity extends to any popularity, i.e.:
Pr[W_k = N] = Σ_{j=0..N} (−1)^{N−j} Σ_{|J|=j} P_J^k = 1 − Pr[T_N > k] = Pr[T_N ≤ k].
A consequence of formula (27) of [Boneh97] is: Pr[T_n = k] = Pr[W_{k−1} < n] − Pr[W_k < n], which implies also: Pr[T_{n+1} = k] − Pr[T_n = k] = Pr[W_{k−1} = n] − Pr[W_k = n].
Moreover, Pr[W_k = n] = Pr[T_{n+1} > k] − Pr[T_n > k], n < N, and in particular for k = n: Pr[W_n = n] = Pr[T_{n+1} > n] − Pr[T_n > n] = 1 − Pr[T_n > n] = Pr[T_n ≤ n] = Pr[T_n = n].

(vii) Other Relations
From the previous relations, it follows that: Σ_{q=1..n−1} Pr[W_k = q] = Pr[T_n > k], and, by complementarity: Σ_{q=n..N} Pr[W_k = q] = Pr[T_n ≤ k]. We have also: Σ_{u>k} Pr[T_n = u] = Pr[W_k < n].
Another relation is: Σ_{n=1..N} Pr[T_n = k] = Σ_{n=1..N} Pr[W_k > n−1] − Σ_{n=1..N} Pr[W_{k−1} > n−1] = E[W_k] − E[W_{k−1}], hence:
Σ_{n=1..N} Pr[T_n = k] = Σ_{i=1..N} p_i (1 − p_i)^{k−1}.
This sum is always positive for any k > 0, and for the EL case: Σ_{n=1..N} Pr[T_n = k] = (1 − 1/N)^{k−1}.
A dual relation is: Σ_{k≥0} Pr[W_k < n] = Σ_{k≥0} Pr[T_n > k] = E[T_n], which is equivalent to: Σ_{k≥0} Pr[W_k = n] = E[T_{n+1}] − E[T_n], n < N.
For an EL distribution, this is: Σ_{k≥0} Pr[W_k = n] = N(H_{N−n} − H_{N−n−1}) = N/(N−n).
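The relation Σ_n Pr[T_n = k] = Σ_i p_i(1−p_i)^{k−1}, and its EL specialization (1 − 1/N)^{k−1}, are easy to verify numerically with the Von Schelling pdf; a self-contained Python check of ours:

```python
from itertools import combinations
from math import comb

def pdf_T(p, n, k):
    """Von Schelling pdf Pr[T_n = k]."""
    N = len(p)
    return sum((-1) ** (n - 1 - j) * comb(N - j - 1, N - n)
               * sum(sum(J) ** (k - 1) * (1 - sum(J)) for J in combinations(p, j))
               for j in range(n))

p = [0.45, 0.3, 0.15, 0.1]
N = len(p)
checks = []
for k in (1, 2, 5, 9):
    lhs = sum(pdf_T(p, n, k) for n in range(1, N + 1))
    rhs = sum(pi * (1 - pi) ** (k - 1) for pi in p)
    checks.append(abs(lhs - rhs))
print(max(checks))   # ~0 for all tested k

# EL case: the sum collapses to (1 - 1/N)^(k-1)
el = sum(pdf_T([0.25] * 4, n, 4) for n in range(1, 5))
print(el, 0.75 ** 3)
```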

(viii) Proof of Correctness of CCP Probability
From the T_n pdf formula, we have: Pr[T_1 = k] = 1_{k=1}, and Pr[T_2 = k] = Σ_{i=1..N} p_i^{k−1} (1 − p_i) · 1_{k>1}. Proving that the probability formula is correct in the sense that it is always positive, Pr[T_n = k] > 0 for all 1 ≤ n ≤ N and k ≥ n, is not so obvious. For example, the proof of Pr[T_3 = k] > 0, for all k ≥ 3, is equivalent to proving:
Σ_{|J|=2} P_J^{k−1} (1 − P_J) > (N − 2) Σ_{i=1..N} p_i^{k−1} (1 − p_i),
for any popularity of size N. In the next Section, we prove this relation in the complete case, after introducing a specific quantity that makes expressions somewhat less painful to manipulate.


4. Calculation on Sums of Subset Probabilities

(i) Definition
For a given popularity of size N, we introduce the notation: R_N^k = Σ_{0≤|J|≤N} (−1)^{|J|} P_J^k.

Let us stress that the higher index of R_N^k is a convention and not an exponent. The T_N probability for a complete collection is easily derived:
Pr[T_N > k] = Σ_{|J|=0..N−1} (−1)^{N−1−|J|} P_J^k = 1 − (−1)^N R_N^k, and then: Pr[T_N ≤ k] = (−1)^N R_N^k.

(ii) Properties
It stands that: R_N^0 = R_N^1 = ... = R_N^{N−1} = 0 and R_N^N = (−1)^N N! Π_{1≤j≤N} p_j.
The following property holds: Σ_{0≤|J|≤N} (−1)^{|J|} (a + P_J)^N = R_N^N, for every real a, which stems directly from the binomial development and the nullity of R_N^k when k < N.
In other words, the R_N^k expression can be iteratively obtained if the expression is known for a decremented size of the reference set and a correspondingly decremented exponent. Unfortunately, this does not give a clue for the 'initial condition' of the recurrence. We can now define a recurrence relation on the CCP Waiting Time probability. Let T_{N−1,{l}} be the variable related to the distribution {q_j = p_j/(1 − p_l), j ∈ (1..N), j ≠ l}, defined on the reference set of size (N−1) resulting from the exclusion of element 'l'. Then the CCP probability verifies:
Pr[T_N = k] = Σ_{l=1..N} p_l (1 − p_l)^{k−1} Pr[T_{N−1,{l}} ≤ k−1], for every k > 0.
This implies that, for a complete collection, Pr[T_N = k] is always positive for k ≥ N since, whatever the element 'l', Pr[T_{N−1,{l}} ≤ k−1] > 0. This statement can be easily proven by recurrence on the size N of the distribution set.
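The stated properties of R_N^k (nullity below the diagonal, the value at k = N, and the CDF behaviour of (−1)^N R_N^k) can be confirmed by brute-force subset enumeration. A small Python check of ours:

```python
from itertools import combinations
from math import factorial, prod

def R(p, k):
    """R_N^k = sum over all subsets J of {1..N} of (-1)^|J| * P_J^k."""
    N = len(p)
    return sum((-1) ** j * sum(sum(J) ** k for J in combinations(p, j))
               for j in range(N + 1))

p = [0.4, 0.3, 0.2, 0.1]
N = len(p)
print([R(p, k) for k in range(N)])                    # all numerically zero
print(R(p, N), (-1) ** N * factorial(N) * prod(p))    # R_N^N = (-1)^N N! prod p_j
# Pr[T_N <= k] = (-1)^N R_N^k behaves as a CDF in k:
cdf = [(-1) ** N * R(p, k) for k in range(N, 15)]
print(cdf[0], cdf[-1])
```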

(v) Explicit expressions of R_N^k, k ≥ N
Here we derive explicit expressions of R_N^k for k = N+1, k = N+2 and k = N+3, which are valid for any popularity. As seen before, from the binomial development, we have: R_N^{N+1} = R_N^N · C(N+1, 2) · (1/N).


In Appendix Section 11, using the above recurrence, we prove by induction on the size N of the distribution that: R_N^{N+2} = R_N^N · C(N+2, 3) · (1/(4N)) · (3 + Σ_{i=1..N} p_i²). Using again the binomial development, this leads to: R_N^{N+3} = R_N^N · C(N+3, 4) · (1/(2N)) · (1 + Σ_{i=1..N} p_i²).
So far, alas, we have not found a closed-form expression of R_N^{N+4}. A generalization to any exponent is an open question.
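These three closed forms hold for any popularity, which makes them easy to spot-check numerically against the brute-force definition of R_N^k (our own verification sketch, with an arbitrary popularity):

```python
from itertools import combinations
from math import comb

def R(p, k):
    """R_N^k = sum over all subsets J of (-1)^|J| * P_J^k."""
    N = len(p)
    return sum((-1) ** j * sum(sum(J) ** k for J in combinations(p, j))
               for j in range(N + 1))

p = [0.35, 0.3, 0.2, 0.15]       # an arbitrary popularity
N = len(p)
RN = R(p, N)
S2 = sum(x * x for x in p)
d1 = abs(R(p, N + 1) - RN * comb(N + 1, 2) / N)
d2 = abs(R(p, N + 2) - RN * comb(N + 2, 3) * (3 + S2) / (4 * N))
d3 = abs(R(p, N + 3) - RN * comb(N + 3, 4) * (1 + S2) / (2 * N))
print(d1, d2, d3)                # all ~0
```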

(vi) Case of Uniform Popularity
For a uniform popularity, and replacing the R_N^k notation by EL_N^k, it stands from the definition of Stirling numbers that: EL_N^k = Σ_{0≤j≤N} (−1)^j C(N,j) (j/N)^k = (−1)^N N! S(k,N)/N^k.
A derivation can be obtained using the recurrence relation on Stirling numbers of the 2nd kind, S(n+k, n) − S(n+k−1, n−1) = n·S(n+k−1, n). Then: S(n+k, n) = Σ_{m=1..n} m·S(m+k−1, m).
One can obtain successively (the EL_N^k are then derived):
S(N+1, N) = Σ_{m=1..N} m·S(m, m) = C(N+1, 2)
S(N+2, N) = Σ_{m=1..N} m·S(m+1, m) = Σ_{m=1..N} m·C(m+1, 2) = C(N+2, 3)·(3N+1)/4
S(N+3, N) = Σ_{m=1..N} m·C(m+2, 3)·(3m+1)/4 = C(N+3, 4)·C(N+1, 2)
S(N+4, N) = C(N+4, 5)·(15N³ + 30N² + 5N − 2)/48
S(N+5, N) = C(N+5, 6)·C(N+1, 2)·(3N² + 7N − 2)/8
It appears that S(n+k, n)·C(n+k, k+1)^{−1} is always a polynomial in n of degree k−1, obviously equal to 1 when n=1 and positive when n>1. This was stated by Griffiths [Griffiths12].
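The diagonal formulas above are exact integer identities, so they can be verified with the standard Stirling recurrence (a Python sketch of ours):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def S(n, k):
    """Stirling numbers of the second kind, S(n, k)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * S(n - 1, k) + S(n - 1, k - 1)

ok = all(
    S(N + 1, N) == comb(N + 1, 2)
    and 4 * S(N + 2, N) == comb(N + 2, 3) * (3 * N + 1)
    and S(N + 3, N) == comb(N + 3, 4) * comb(N + 1, 2)
    and 48 * S(N + 4, N) == comb(N + 4, 5) * (15 * N**3 + 30 * N**2 + 5 * N - 2)
    and 8 * S(N + 5, N) == comb(N + 5, 6) * comb(N + 1, 2) * (3 * N**2 + 7 * N - 2)
    for N in range(1, 10)
)
print(ok)   # True
```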


5. Proof of Conjectures

(i) Boneh and Hofri 1997 Conjecture on the appearance of graphs
In the 1997 version (p. 10) of their work, Boneh and Hofri [Boneh97] give the following statement: "We conjecture that the appearance of Fig. 1, where the duration curve lies entirely above the detection curve (except for the first two points: t = 0, 1, where they coincide) is unique to the EL case, and that in all other cases they intersect". 'Duration' is the inverse function of the Waiting Time expectation and 'Detection' is the Working Set expectation divided by N.
A proof of this conjecture on the so-called duration and detection curves can be stated as follows. Each of these curves is a sequence of segments. As stated by the authors, they coincide at (0,0) and (1, 1/N). For x=2, the 'Detection' function is y = (1/N) Σ_{i=1..N} (1 − (1−p_i)²) = (1/N)(2 − Σ_{i=1..N} p_i²), hence its slope on the [1,2] segment is (1/N)(1 − Σ_{i=1..N} p_i²). The inverse of the Waiting Time expectation has coordinates (x = E{C_2}, y = 2/N) with E{C_2} = Σ_{i=1..N} 1/(1−p_i) − (N−1), hence its slope on the [1, E{C_2}] segment is:
(1/N) / (Σ_{i=1..N} 1/(1−p_i) − N) = 1 / (N Σ_{i=1..N} p_i/(1−p_i)).
It is worth noting that in the uniform case, both slopes are equal to (N−1)/N². For a non-uniform case, the conjecture holds if it can be proved that the slope of the 'Detection' function at point x = 1+ is steeper than that of the 'Duration' function. If this is the case, the curves necessarily intersect later because, at the other end of the curve, the 'Detection' function is always strictly below 1, whereas the inverse of the waiting time function ends at the point with coordinates (x = E[T_N], y = 1).
This comes down to showing that (1 − Σ_{i=1..N} p_i²)^{−1} = 1 + Σ p_i² + (Σ p_i²)² + ... is smaller than Σ_{i=1..N} p_i/(1−p_i) = 1 + Σ p_i² + Σ p_i³ + ..., which is true if:
Σ_{i=1..N} p_i^k > (Σ_{i=1..N} p_i²)^{k−1}, for every k > 2.
This relation is proven in Appendix Section 10, Lemma 10.3, and completes the proof of the conjecture.
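The two initial slopes derived above can be compared numerically; the sketch below (ours, for illustration) confirms that for random non-EL popularities the 'detection' slope dominates, with equality in the EL case:

```python
import random

def detection_slope(p):
    """Slope of the 'detection' curve on its first segment [1, 2]."""
    N = len(p)
    return (1 - sum(x * x for x in p)) / N

def duration_slope(p):
    """Slope of the inverse-waiting-time ('duration') curve on [1, E{C2}]."""
    N = len(p)
    return 1 / (N * sum(x / (1 - x) for x in p))

rng = random.Random(42)
ok = []
for _ in range(200):
    N = rng.randint(3, 8)
    w = [rng.random() + 1e-3 for _ in range(N)]
    s = sum(w)
    p = [x / s for x in w]
    ok.append(detection_slope(p) >= duration_slope(p))
print(all(ok))

# EL case: both slopes equal (N-1)/N^2
N = 6
el = [1.0 / N] * N
print(detection_slope(el), duration_slope(el), (N - 1) / N ** 2)
```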

(ii) EL duration and EL detection
Proving that in the EL (uniform) case the duration lies above the detection is equivalent to proving that: 1 − (1 − 1/N)^{N(H_N − H_{N−n})} ≤ n/N, for every n with 0 ≤ n ≤ N, which is obvious for n = 0, 1, and N. For n = 2, it reduces to (1 − 2/N)^{N−1} ≤ (1 − 1/N)^{2N−1}, which can be checked easily using WolframAlpha©. The general case is proven by induction. Assuming the relation is true for n, it holds for (n+1):
(1 − 1/N)^{N(H_N − H_{N−(n+1)})} = (1 − 1/N)^{N(H_N − H_{N−n}) + N/(N−n)} ≥ (1 − n/N)(1 − 1/N)^{N/(N−n)}.
Then it must be proved that: (1 − n/N)(1 − 1/N)^{N/(N−n)} ≥ 1 − (n+1)/N, i.e.:
(1 − 1/N)^N ≥ (1 − 1/(N−n))^{N−n}, for every n with 0 ≤ n < N,
which is true because (1 − 1/x)^x is increasing for x ≥ 1.

(iii) EL and only EL is maximal for Working Set Expectation
Let us consider the distribution q_i = (1 − p_i)/(N − 1), 1 ≤ i ≤ N, where {p_i} is non-EL. Obviously, it verifies Σ_{i=1..N} q_i = 1 and is non-EL. Then, from Σ_{i=1..N} q_i^k > 1/N^{k−1} (see Lemma 10.2 in Appendix Section 10), it holds that: Σ_{i=1..N} ((1−p_i)/(N−1))^k > 1/N^{k−1}, except for k = 0, 1 where equality holds.
Thus: Σ_{i=1..N} (1−p_i)^k > (N−1)^k / N^{k−1}, or Σ_{i=1..N} (1 − (1−p_i)^k) < Σ_{i=1..N} (1 − (1 − 1/N)^k). In other words, compared to all other non-uniform distributions, EL is always maximal regarding the Working Set expectation, for any k > 1.
The relation Σ_{i=1..N} (1−p_i)^k > Σ_{i=1..N} (1 − 1/N)^k has a number of consequences; for example, for k=3: 3 Σ_{i=1..N} p_i² − Σ_{i=1..N} p_i³ > 3/N − 1/N².

(iv) EL and only EL is minimal for Waiting Time Expectation Boneh and Hofri have shown the minimality of EL compared to any other distribution regarding the expectation of the waiting time of a full collection. This result is extended in [Anceaume14] (Theorem 5 page 8) to any partial collection where they show that expectation of any non-EL popularity is higher or equal to that of EL (which does not mean strictly higher). In Appendix Section 12, we use a different argument to show the strict minimality of EL regarding the expectation of Waiting Time Tn for a partial collection.

(v) EL and only EL is minimal for Waiting Time CCDF In Appendix Section 13 we give a proof of minimality of EL w.r.t Waiting Time CCDF, first for a complete collection. We then give the sketch of the proof for a partial collection.


6. Relation between CCP Waiting Time Expectation and LRU Miss Rate

(i) Uniform Distribution
In the case of a uniform distribution (EL), the expectation difference of the Waiting Time for a partial collection of size j is: ΔE[T_j] = E[T_{j+1}] − E[T_j] = N/(N−j), for 0 ≤ j < N. Since the LRU miss rate of a cache of size j is, for EL, MR[j] = (N−j)/N, the relation MR[j] · ΔE[T_j] = 1 holds exactly.

(ii) Non-Uniform Distributions
For a non-EL popularity this exact relation no longer holds: the quantity Σ_{i=1..N} p_i^{k−1} (p_i − Σ_{j=1..N} p_j²) is involved, and we know from Lemma 10.3 (Appendix Section 10) that this expression is null only for an EL distribution, otherwise it is strictly positive for k > 2.
Therefore an extension to an arbitrary popularity of the relation between the expectation difference and the LRU MR would make sense only as an asymptotic approximation when j and N increase. There is a quite old result that defines an asymptotic approximation of the LRU MR which is known to give excellent results for real-life values of cache size and support size.

(iii) Fagin approximation of LRU miss rate
In 1977, an approximation of the LRU miss rate was given by Fagin [Fagin77] under the classical IRM hypothesis. Fagin claims that "in a certain asymptotic sense" the LRU miss rate can be approximated, when the support size N increases, by an expression which (after moving to the continuous domain and assuming that the popularity contains no large p_i, i.e., for all elements of the popularity, ln(1−p_i) ≈ −p_i) we represent as:
MR[j] · (d/dj) WS^{−1}(j) ≈ 1,
where WS^{−1} is the inverse function of the Working Set function and '≈' denotes the "asymptotically close" condition. This approximation fell into oblivion for quite a few years before being recently rediscovered under the "Che approximation" label. Of course it is not an exact calculation of the LRU miss rate, such as the King or Flajolet formulae; however, it is surprisingly precise and very valuable since it is computable on very large supports, which is not the case for the exact formulas. In the next section, we show that there is another asymptotic relation, this time between the two variables of the CCP problem: WS^{−1}(j) ≈ E[T_j]. Therefore, the Fagin approximation can be extended by the following relation between the miss rate of an LRU cache of size j and the Waiting Time expectation for a partial collection of size j, when they are subject to the same popularity and, obviously, the same IRM hypothesis:
MR[j] · ΔE[T_j] ≈ 1.
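As an illustration (our own sketch, not the paper's derivation), for a small support we can compute ΔE[T_j] exactly from the subset-sum formula and compare it with a plain LRU simulation under IRM; the popularity and parameters below are arbitrary:

```python
from itertools import combinations
from math import comb
import random

def expect_T(p, n):
    """E[T_n] via the subset-sum formula (Section 2)."""
    N = len(p)
    return sum((-1) ** (n - 1 - j) * comb(N - j - 1, N - n)
               * sum(1.0 / (1 - sum(J)) for J in combinations(p, j))
               for j in range(n))

def lru_miss_rate(p, size, refs=100000, seed=3):
    """Miss rate of an LRU cache of the given size under IRM (plain simulation)."""
    rng = random.Random(seed)
    cache, misses = [], 0
    idx = list(range(len(p)))
    for x in rng.choices(idx, weights=p, k=refs):
        if x in cache:
            cache.remove(x)
        else:
            misses += 1
            if len(cache) == size:
                cache.pop()        # evict the least recently used item (kept at the end)
        cache.insert(0, x)         # most recently used item at the front
    return misses / refs

p = [0.30, 0.22, 0.16, 0.12, 0.08, 0.06, 0.04, 0.02]
j = 4
delta = expect_T(p, j + 1) - expect_T(p, j)   # Delta E[T_j]
prod_check = lru_miss_rate(p, j) * delta
print(prod_check)   # approximately 1, per MR[j] * Delta E[T_j] ~ 1
```

For the EL case the product is exactly 1, since ΔE[T_j] = N/(N−j) and MR[j] = (N−j)/N.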

(iv) Algebraic proof of WS^{−1}(j) ≈ E[T_j]
Intuitively, the inverse function WS^{−1}(D) is the average size of a window containing D distinct addresses (or items, ...), hence it is asymptotic to the expectation of the CCP waiting time to collect a partial collection of D coupons. It was noted in [Boneh97], equation (57), and using our notation, that, for an EL distribution: WS(E[T_N]) ≈ N(1 − e^{−H_N}), i.e. "the expected number of items detected by the time the collector would expect to finish is extremely close to N, at N − e^{−γ}, with the shortfall essentially independent of N". Note that e^{−γ} is the limit of N·e^{−H_N} when N→∞. In this section, first, we generalize Boneh's observation and prove that this relation on T_N (complete collection) is indeed true for any popularity, i.e.: N > WS(E[T_N]) > N − e^{−γ}, for any popularity.
Then, we prove that a similar relation exists for a partial collection of size j, as long as j is large enough: j > WS(E[T_j]) > j − e^{−γ}, for any popularity. Proofs are given in Appendix Section 14. This leads us to the conclusion that, asymptotically: WS^{−1}(j) ≈ E[T_j].

(v) Application: Derivation of Doumas & Papanicolaou formulas for power-law popularities
The asymptotic approximation E[T_j] ≈ WS^{−1}(j) allows one to recover a result obtained by A. Doumas and V. Papanicolaou [Doumas12] when the popularity distribution is a power law (a.k.a. generalized Zipf law). This is described in Appendix Section 15.


7. Conclusion

In this document, we have shown a novel and very simple asymptotic relation between the expectation difference of the CCP Waiting Time on the one hand, and the miss rate of an LRU cache on the other hand, assuming that both are faced with the same popularity of items and the same IRM hypothesis. Prior to this, we introduced inequalities on subset probabilities, some of which were not previously known. They allow for algebraic proofs of several conjectures on the optimality of EL w.r.t. other popularities.


8. References

[Anceaume14] Anceaume, E., Busnel, Y., & Sericola, B. (2014). New results on a generalized coupon collector problem using Markov chains. arXiv preprint arXiv:1402.5245.
[Berthet16] Berthet, C. (2016). Identity of King and Flajolet & al. Formulae for LRU Miss Rate Exact Computation. arXiv preprint arXiv:1607.01283.
[Berthet17] Berthet, C. (2017). On Von Schelling Formula for the Generalized Coupon Collector Problem. arXiv preprint arXiv:1703.01886.
[Boneh97] Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited: a survey of engineering problems and computational methods. Stochastic Models, 13(1). http://web.cs.wpi.edu/~hofri/CCP.pdf
[Doumas12] Doumas, A. V., & Papanicolaou, V. G. (2012). Asymptotics of the rising moments for the coupon collector's problem. Electron. J. Probab., 18(41), 1-15.
[Erdos61] Erdős, P., & Rényi, A. (1961). On a classical problem of probability theory.
[Fagin77] Fagin, R. (1977). Asymptotic miss ratios over independent references. Journal of Computer and System Sciences, 14(2), 222-250.
[Ferrante12] Ferrante, M., & Frigo, N. (2012). On the expected number of different records in a random sample. arXiv preprint arXiv:1209.4592.
[Ferrante14] Ferrante, M., & Saltalamacchia, M. (2014). The Coupon Collector's Problem. Materials matemàtics, 0001-35.
[Flajolet92] Flajolet, P., Gardy, D., & Thimonier, L. (1992). Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics, 39(3), 207-229. doi:10.1016/0166-218X(92)90177-C, MR 1189469.
[Griffiths12] Griffiths, M. (2012). 96.49 On the diagonals of a Stirling number triangle. The Mathematical Gazette, 96(536), 333-337.
[Read98] Read, K. L. Q. (1998). A lognormal approximation for the collector's problem. The American Statistician, 52(2), 175-180.
[Schelling54] Von Schelling, H. (1954). Coupon collecting for unequal probabilities. The American Mathematical Monthly, 61(5).


9. Appendix: Expectation of partial collection for uniform distributions
We want to prove that, for uniform distributions and a collection of size c among n items:
E[T_c] = Σ_{i=0..c−1} (−1)^{c−1−i} C(n−i−1, n−c) C(n,i) · n/(n−i) = n(H_n − H_{n−c}).
From C(n−i−1, n−c) · C(n,i) = (c/(n−i)) · C(n,c) · C(c−1, c−1−i), one has: E[T_c] = nc · C(n,c) Σ_{i=0..c−1} (−1)^{c−1−i} C(c−1, c−1−i)/(n−i)², or, with a summation index change:
E[T_c] = nc · C(n,c) Σ_{i=0..c−1} C(c−1, i) (−1)^i/(n−c+1+i)².
The known relation ∫_0^1 x^a (1−x)^b dx = a! b!/(a+b+1)!, obtained by iterated integration by parts, is generalized to a variable bound, for a ≥ 0, b ≥ 0:
∫_0^u x^a (1−x)^b dx = Σ_{k=1..b+1} [a! b!/((a+k)! (b+1−k)!)] u^{a+k} (1−u)^{b+1−k}.
Then:
∫_0^1 (1/u)(∫_0^u x^a (1−x)^b dx) du = Σ_{k=1..b+1} [a! b!/((a+k)! (b+1−k)!)] ∫_0^1 u^{a+k−1} (1−u)^{b+1−k} du = [a! b!/(a+b+1)!] Σ_{k=1..b+1} 1/(a+k).
On the other hand, the integral is also obtained by binomial expansion:
∫_0^u x^a (1−x)^b dx = Σ_{i=0..b} C(b,i) (−1)^i ∫_0^u x^{a+i} dx = Σ_{i=0..b} C(b,i) (−1)^i u^{a+i+1}/(a+i+1).
Hence: ∫_0^1 (1/u)(∫_0^u x^a (1−x)^b dx) du = ∫_0^1 Σ_{i=0..b} C(b,i) (−1)^i u^{a+i}/(a+i+1) du = Σ_{i=0..b} C(b,i) (−1)^i/(a+i+1)².
Thus we obtain the remarkable identity:
Σ_{i=0..b} C(b,i) (−1)^i/(a+i+1)² = [a! b!/(a+b+1)!] (H_{a+b+1} − H_a).

Note that, by setting a=0 and b+1=m, the identity reduces to the well-known identity: Σ_{p=1..m} (−1)^{p−1} C(m,p)/p = H_m. And it finally holds that:
E[T_c] = nc · C(n,c) Σ_{i=0..c−1} C(c−1,i) (−1)^i/(n−c+1+i)² = nc · C(n,c) · [(n−c)!(c−1)!/n!] Σ_{k=1..c} 1/(n−c+k) = n(H_n − H_{n−c}).
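Because the "remarkable identity" is an equality of rationals, it can be verified exactly with Python's `fractions` module (a verification sketch of ours):

```python
from fractions import Fraction
from math import comb, factorial

def H(m):
    """Harmonic number as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, m + 1))

def lhs(a, b):
    return sum(Fraction((-1) ** i * comb(b, i), (a + i + 1) ** 2) for i in range(b + 1))

def rhs(a, b):
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1)) * (H(a + b + 1) - H(a))

checks = [(a, b, lhs(a, b) == rhs(a, b)) for a in range(0, 6) for b in range(0, 6)]
print(all(ok for _, _, ok in checks))   # True: identity holds exactly

# special case a=0, b+1=m: sum_{p=1}^m (-1)^(p-1) C(m,p)/p = H_m
m = 9
special = sum(Fraction((-1) ** (q - 1) * comb(m, q), q) for q in range(1, m + 1)) == H(m)
print(special)
```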


10. Appendix: Inequalities on Subset Probabilities We consider a non-EL distribution {pi}, 0 pi 2 > pi 3 .... and consequently: N

N

i =1

i =1

N

1 = ∑ pi > ∑ p i > ∑ pi > .. . 2

3

i =1

(i) lemma 4 of [Anceaume14] We give another proof of a very interesting lemma from [Anceaume14], lemma 4 pg7, which uses a proof of induction on N. 1 Let us divide the set {p1,...,pN} into subsets depending on the sign of  p i −  and note

N  1 1 1 1   U = { pi | p i > } and V = { p i | p i < } . It holds that: ∑  pi −  + ∑  p j −  = 0 , since N N N  j∈V  N i∈U 

all pi not belonging to U or V are equal to 1/N. Summation on U is positive and is the opposite of the summation on V. Also, note U is always non-empty otherwise V is also necessarily empty as well, which contradicts the fact that distribution is non-EL. Then, noting that ∀pi ∈ U , ∀p j ∈ V are such that pi > p j , it stands necessarily that: N 1  1 1  1 1  1  pi −  + ∑  p j −  < 0 , thus: ∑  pi −  < 0 which is [Anceaume14] N p N p N  j∈V j    i∈U i =1 i  i  N 1 result: ∑ >N 2 . This result can be further generalized using the same argument with i =1 pi

∑p

the following lemmas.

(ii) Lemma 10.1 For any non-EL distribution {pi} and any non-decreasing function f(x) on [0,1] such that 1 f(0)≠f(1), then ∑  pl −  f ( pl ) > 0 . Inequality also holds when f verifies f(a)>f(b) for N l  1 < a < 1 . Reciprocally, if f is a non-increasing function f(x) with f(0) ≠f(1), N 1 1 or verifies f(a)0: k pi N N 1 1 < ∑ k , hence: ∑ k > N k +1 . On the other hand, i =1 p i i =1 p i

Proof: Lemma 10.1 applies to f ( pi ) = N

∑ i =1

N 1  1 1 , so: p − < 0 N   ∑ i k k −1 N pi  i =1 p i

Christian BERTHET

Page 21

6/16/2017

N

using a similar argument, it holds that:

∑p i =1

>

k i

1 for k>1. Hence same formula holds N k −1

for any non-EL distribution and any exponent k, except for k=0 and 1 for which the two N

sides are equal. Note it is equivalent to another formulation:

∑ (Np ) i =1

k

i

>N.

N N 1 1 N2 k k . = p = p > N = ∑ ∑∑ ∑∑ ∑ i i k N −1 i =1 1 − p i i =1 k ≥ 0 k ≥ 0 i =1 k ≥0 N N

A direct consequence is: N

And similarly:

N pi N k +1 , this result is proved and used in Theorem 5 = pi > ∑ ∑∑ N −1 i =1 1 − p i i =1 k ≥ 0

of [Anceaume14].  N 2 p >  ∑ pi  ∑ i i =1   i =1 N

(iv) Lemma 10.3

k −1

k

N



N



i =1



l =1



, ∀k > 2

For any distribution it stands that: ∑ pi  pi − ∑ pl 2  = 0 . From that, by separating the set {p1,...,pN} into subsets depending on the sign of N N N  2 2 2  pi − ∑ pi  , let us note U = { p i | p i > ∑ pi } and V = { pi | pi < ∑ p i } . It holds that: i =1 i =1 i =1   N N   2 2 p i  p i − ∑ p l  + ∑ p j  p j − ∑ p l  = 0 , i.e., summation on U is positive and is the ∑ i∈U l =1 l =1   j∈V  

opposite of the summation on V. Then noting that ∀pi ∈ U , ∀p j ∈ V , pi > p j , it stands necessarily that:

∑p i∈U

2

i

N N  2 2 2  p i − ∑ pl  + ∑ p j  p j − ∑ p l  > 0 . l =1 l =1   j∈V   2

N N    N  Hence ∑ pi  pi − ∑ pl 2  > 0 which readily implies ∑ pi 3 >  ∑ pi 2  . i =1 i =1 l =1    i =1  N

2

N

With the same argument it stands for k>2:

∑p i =1

N

∑p i =1

k i

k

N  2  p i − ∑ p l  > 0 . Hence l =1  

 k −1  2 >  ∑ p l  ∑ p l  and using a simple induction argument it leads to  l =1  l =1  N

 N 2 p >  ∑ pi  ∑ i i =1  i =1  N

k −1 i

N

k −1

, ∀k > 2 . This completes the proof of the conjecture.

N   2  + ∑  p j − ∑ pl  < 0 . i∈U l =1 l =1  j∈V   N N N   1 Hence ∑  pi − ∑ pl 2  < 0 which is another proof of: ∑ pl 2 > . N i =1  l =1 l =1 

Let us note also that from similar arguments,

Christian BERTHET



QED.

N

∑  p − ∑ p

Page 22

i

2

l

6/16/2017

N

A more general relation stems from two previous results: Both

1

∑1− p i =1

N

∑ i =1

> i

N2 and N −1

1 >N 2 are special cases, for j=1 and j=N-1, of next Lemma 10.4 with is a pi

generalization to sums of subset probabilities. 1

(v) Lemma 10.4

$$\sum_{|J|=j}\frac{1}{1-P_J} > \sum_{|J|=j}\frac{N}{N-j} = \binom{N}{j}\frac{N}{N-j}, \quad 0<j<N.$$

Note that equality holds when $j=0$. Another formulation is: $\sum_{|J|=j}\frac{1}{P_J} > \binom{N}{j}\frac{N}{j}$, $0<j<N$, with equality for $j=N$.

A proof goes like this: we know ([Berthet17] Appendix 3, Relation 1) that $\sum_{|J|=j}(1-P_J) = \binom{N-1}{j}$, so we can consider the new distribution $\left\{(1-P_J)\binom{N-1}{j}^{-1}\right\}$ of $\binom{N}{j}$ possible elements (one per subset $J$). It is non-EL. And, since for any non-EL distribution $\{p_i\}$ with $N$ elements $\sum_{i=1}^{N}\frac{1}{p_i} > N^2$, then for the new distribution: $\binom{N-1}{j}\sum_{|J|=j}\frac{1}{1-P_J} > \binom{N}{j}^{2}$, which implies for any non-EL distribution: $\sum_{|J|=j}\frac{1}{1-P_J} > \binom{N}{j}\frac{N}{N-j}$, $0<j<N$. QED

This expression can also be derived differently by noting that $\sum_{|J|=j}P_J = \binom{N-1}{j-1} = \sum_{|J|=j}\frac{j}{N}$, $1\le j\le N$, or $\sum_{|J|=j}\left(P_J-\frac{j}{N}\right) = 0$. Using an argument similar to the one used in Lemma 10.1:
$$\sum_{|J|=j;\,P_J>\frac{j}{N}}\left(P_J-\frac{j}{N}\right) + \sum_{|J|=j;\,P_J<\frac{j}{N}}\left(P_J-\frac{j}{N}\right) = 0$$
necessarily implies that $\sum_{|J|=j}P_J\left(P_J-\frac{j}{N}\right) > 0$, so $\sum_{|J|=j}P_J^{2} > \sum_{|J|=j}\left(\frac{j}{N}\right)^{2}$. Similarly, $\sum_{|J|=j}P_J^{k} > \sum_{|J|=j}\left(\frac{j}{N}\right)^{k}$ for $k>1$. Equality stands for $k=0$ and $k=1$. Finally, summation over all $k\ge0$ lends the desired relation. QED

Let us mention the dual relation: $\sum_{|J|=j}\frac{1}{P_J^{\,k}} > \sum_{|J|=j}\left(\frac{N}{j}\right)^{k}$ for all $0<j<N$, $k>0$.


N  N +2  1  2 RNN +2 = RNN    3 + ∑ pi   3  4N  i =1 

11. Appendix: Proof of

Using WolframAlpha©, relation can be directly proved for N=2, 3, 4: R(4,2): Simplify (x^4+(1-x)^4-1)/(2!*x*(1-x)*comb(4,3)*(3+x^2+(1-x)^2)/(4*2)) is -1. R(5,3): Simplify (x^5+y^5+(1-x-y)^5-(x+y)^5-(1-x)^5-(1-y)^5+1)/(3!*x*y*(1-xy)*comb(5,3)*(3+x^2+y^2+(1-x-y)^2)/(4*3)) is 1. R(6,4): simplify (x^6+y^6+z^6+(1-x-y-z)^6-(x+y)^6-(x+z)^6-(y+z)^6-(1-x-y)^6-(1-x-z)^6-(1-y-z)^6+(1x)^6+(1-y)^6+(1-z)^6+(x+y+z)^6-1)/(4!*x*y*z*(1-x-y-z)*comb(6,3)*(3+x^2+y^2+z^2+(1-x-yz)^2)/(4*4)) is -1. N

For the general case, we use the recurrence: RNN + 2 = RNN +1 − ∑ pl (1 − pl ) l =1

N +1

RNN−+11,{l } , ∀k > 0

We assume the induction hypothesis holds on the distribution over (N-1) elements, i.e. the original distribution without element ‘l’, i.e.:  N  p 2  N +1 1  N +1 N −1  i  .   RN −1,{l } = RN −1,{l }   + 3 ∑      3  4( N − 1)  i =1,i ≠l  1 − pl  

Then: R

⇔R

N +2 N

N +2 N

=R

=R

N +1 N

N +1 N

N

− ∑ pl (1 − pl ) l =1

N

+ ∑ (1 − pl ) l =1

2

N +1

R

N −1 N −1,{l }

 N  p 1  N +1    i   ∑   3  4( N − 1)  i =1,i ≠l  1 − pl

 N  p 1  N +1    i R   ∑   3  4 N ( N − 1)  i =1,i ≠l  1 − pl N N

2    + 3    

2    + 3    

 N + 1 N 1   N 2   ∑ pi  + 3(1 − pl )2  RN  2 2 ⋅ 3!   i =1,i ≠l  l =1  N N  1  2 2 2  = RNN +1 1 +  3(1 − pl ) − pl + ∑ pi   ∑ i =1   2 ⋅ 3! l =1  N

⇔ RNN + 2 = RNN +1 + ∑ ⇔ RNN + 2

  1   N   ( N + 2) ∑ pi 2  + 3 N − 6   Hence: RNN + 2 = RNN +1 1 +  2 ⋅ 3!   i =1     And finally: RNN + 2 = RNN +1

Christian BERTHET

N N N +2 2  N +2  1  2  3 + ∑ pi  = RNN    3 + ∑ pi  2 ⋅ 3!   3  4N  i =1 i =1  

Page 24

QED

6/16/2017
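The closed form just proved can also be spot-checked numerically for an arbitrary distribution (a sketch; `R` enumerates all 2^N subsets, so N is kept small):

```python
from itertools import combinations
from math import comb, isclose

def R(p, k):
    """R_N^k = sum over all subsets J of (-1)^{|J|} * P_J^k."""
    return sum((-1) ** j * sum(c) ** k
               for j in range(len(p) + 1) for c in combinations(p, j))

p = [0.4, 0.3, 0.2, 0.1]              # arbitrary distribution, N = 4
N = len(p)
rhs = R(p, N) * comb(N + 2, 3) * (3 + sum(x * x for x in p)) / (4 * N)
identity_ok = isclose(R(p, N + 2), rhs, rel_tol=1e-9)
print(identity_ok)
```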

12. Appendix: Minimality of EL for Waiting Time Expectation of a Partial Collection

A direct consequence of $\sum_{i=1}^{N}\frac{1}{1-p_i} = \sum_{k=0}^{+\infty}\sum_{i=1}^{N}p_i^{k} > \frac{N^2}{N-1}$ (see Appendix Section 10) is:
$$E[T_2] > \frac{2N-1}{N-1} = N(H_N-H_{N-2}).$$
This means that EL, and only EL, is always minimal regarding the waiting time expectation for an incomplete subset of size 2. In order to extend this result to any partial collection size, we use the expression $I_J$, defined in [Berthet16], where $J$ is a subset of size $j$ of $\{1,..,N\}$:
$$I_J = \sum_{\text{permutations }\{i_1,i_2,..,i_j\}\text{ of }J}\frac{p_{i_1}p_{i_2}..p_{i_j}}{(1-p_{i_1})(1-p_{i_1}-p_{i_2})..(1-p_{i_1}-p_{i_2}..-p_{i_j})}.$$
We are interested in the sum of $I_J$ over all subsets $J$ of size $j$ over an N-size support, noted $\sum_{|J|=j}I_J$. The expectation of a partial collection of size $n$ verifies the relation $E[T_n] = \sum_{0\le|J|<n}I_J$. This formulation of the expectation was first expressed, to our knowledge, in [Ferrante12] with the help of conditional probabilities (see also [Ferrante14] and [Berthet16] for a proof of equivalence of this formula to the standard formula). We want to prove that:
$$\sum_{|J|=j}I_J > \frac{N}{N-j}, \text{ for all } j,\ 1\le j<N.$$
Proving the inequality implies that for any non-EL distribution $E[T_n] > N(H_N-H_{N-n})$, and therefore that EL is minimal regarding the waiting time for a partial collection.

This proof is obvious for $j=1$ since $\sum_{|J|=1}I_J = \sum_{i=1}^{N}\frac{p_i}{1-p_i}$ and, using Lemma 10.2 in Appendix Section 10, $\sum_{|J|=1}I_J > \frac{N}{N-1}$. In the general case we note that $\sum_{|J|=j}I_J$ is a sum over all permutations of all subsets of size $j$, then:
$$\sum_{|J|=j}I_J = \sum_{i_1=1}^{N}\sum_{\substack{i_2=1\\ i_2\ne i_1}}^{N}\cdots\sum_{\substack{i_j=1\\ i_j\ne i_1\ne..\ne i_{j-1}}}^{N}\frac{p_{i_1}p_{i_2}..p_{i_j}}{(1-p_{i_1})(1-p_{i_1}-p_{i_2})\cdots(1-p_{i_1}-p_{i_2}-..-p_{i_j})},$$
which actually is [Ferrante12] notation. This expression can be rewritten:
$$\sum_{|J|=j}I_J = \sum_{i_1=1}^{N}\frac{p_{i_1}}{1-p_{i_1}}\left(\sum_{\substack{i_2=1\\ i_2\ne i_1}}^{N}\frac{p_{i_2}}{1-p_{i_1}-p_{i_2}}\left(\cdots\sum_{\substack{i_j=1\\ i_j\ne i_1\ne..\ne i_{j-1}}}^{N}\frac{p_{i_j}}{1-p_{i_1}-p_{i_2}-..-p_{i_j}}\right)\cdots\right).$$

For $j=2$: $\sum_{|J|=2}I_J = \sum_{i=1}^{N}\sum_{j\ne i}\frac{p_ip_j}{(1-p_i)(1-p_i-p_j)} = \sum_{i=1}^{N}\frac{p_i}{1-p_i}\left(\sum_{j\ne i}\frac{p_j}{1-p_i-p_j}\right)$. We know that for a non-EL N-size distribution $\{p_i\}$, $1\le i\le N$: $\sum_{i=1}^{N}\frac{p_i}{1-p_i} > \frac{N}{N-1}$. Let us now consider the set of distributions, one for each value of 'i', $1\le i\le N$, on a (N-1)-size subset: $\left\{q_j = \frac{p_j}{1-p_i}\right\}$, $1\le j\le N$, $j\ne i$. Clearly at least one of those is non-EL, and thus:
$$\sum_{\substack{j=1\\ j\ne i}}^{N}\frac{q_j}{1-q_j} = \sum_{\substack{j=1\\ j\ne i}}^{N}\frac{p_j}{1-p_i-p_j} > \frac{N-1}{N-2}.$$
Finally:
$$\sum_{|J|=2}I_J > \sum_{i=1}^{N}\frac{p_i}{1-p_i}\cdot\frac{N-1}{N-2} > \frac{N}{N-2}.$$

In a way similar to case $j=2$, we consider the distributions on a $(N-(j-1))$-size subset, $\left\{q_{i_j} = \frac{p_{i_j}}{1-p_{i_1}-p_{i_2}-..-p_{i_{j-1}}}\right\}$, leading to $\sum_{i_j}\frac{q_{i_j}}{1-q_{i_j}} > \frac{N-j+1}{N-j}$, and thus:
$$\sum_{\substack{i_j=1\\ i_j\ne i_1\ne..\ne i_{j-1}}}^{N}\frac{p_{i_j}}{1-p_{i_1}-p_{i_2}-..-p_{i_j}} > \frac{N-j+1}{N-j}.$$
Applied iteratively for each index from $i_j$ down to $i_1$, this yields the desired result: $\sum_{|J|=j}I_J > \frac{N}{N-1}\cdot\frac{N-1}{N-2}\cdots\frac{N-j+1}{N-j} = \frac{N}{N-j}$, and consequently the minimality of EL regarding the waiting time of a partial collection. QED
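The objects used in this appendix are easy to evaluate by brute force. The sketch below (an arbitrary example distribution, small N) cross-checks the $I_J$ formulation of $E[T_n]$ against the Von Schelling formula, and illustrates the bound just proved:

```python
from itertools import combinations, permutations
from math import comb

def I(p, J):
    """I_J summed over all orderings of the index subset J (I for the empty set is 1)."""
    total = 0.0
    for perm in permutations(J):
        acc, term = 0.0, 1.0
        for idx in perm:
            acc += p[idx]
            term *= p[idx] / (1.0 - acc)
        total += term
    return total

def von_schelling(p, n):
    """Standard formula for E[T_n], used here as an independent cross-check."""
    N = len(p)
    return sum((-1) ** (n + k - N - 1) * comb(k - 1, N - n)
               * sum(1.0 / sum(p[i] for i in J) for J in combinations(range(N), k))
               for k in range(N - n + 1, N + 1))

p = [0.3, 0.3, 0.2, 0.1, 0.1]         # non-EL, N = 5
N, n = len(p), 3
H = lambda m: sum(1.0 / i for i in range(1, m + 1))

e_ij = sum(I(p, J) for j in range(n) for J in combinations(range(N), j))
formulas_agree = abs(e_ij - von_schelling(p, n)) < 1e-9
el_minimal = e_ij > N * (H(N) - H(N - n))      # EL value is N(H_N - H_{N-n})
bounds_ok = all(sum(I(p, J) for J in combinations(range(N), j)) > N / (N - j)
                for j in range(1, N))
print(formulas_agree, el_minimal, bounds_ok)
```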

13. Appendix: Minimality of EL for Waiting Time CCDF

(i) Comparison to EL regarding CCDF of a Complete Collection

We compare $R_N^{k} = \sum_{0\le|J|\le N}(-1)^{|J|}P_J^{k}$ for non-EL distributions and $EL_N^{k} = \sum_{0\le j\le N}(-1)^{j}\binom{N}{j}\left(\frac{j}{N}\right)^{k}$ for the EL one, for all $k\ge N$ (both are null for $0\le k<N$). Indeed:
$$\Pr[T_N > k] = \sum_{j=0}^{N-1}(-1)^{N-1-j}\sum_{|J|=j}P_J^{k} = (-1)^{N-1}\left(R_N^{k}-(-1)^{N}\right) = 1-(-1)^{N}R_N^{k},$$
and
$$E[T_N] = \sum_{j=0}^{N-1}(-1)^{N-1-j}\sum_{|J|=j}\frac{1}{1-P_J} = \sum_{k\ge0}\left(1-(-1)^{N}R_N^{k}\right).$$
We want to prove that: $\forall k\ge N$, $(-1)^{N}R_N^{k} < (-1)^{N}EL_N^{k}$. For $N=2$ this amounts to $p_1^{k}+p_2^{k} > 2\left(\frac{1}{2}\right)^{k}$, which holds for any non-EL distribution.

With N=3: $\sum_{|J|=j}P_J^{3} = \binom{0}{j-3} + \binom{0}{j-2}\left(3\sum_{l=1}^{3}p_l^{2}-\sum_{l=1}^{3}p_l^{3}\right) + \binom{0}{j-1}\sum_{l=1}^{3}p_l^{3}$, hence
$$(-1)^{3}R_3^{3} = -\sum_{0\le j\le3}(-1)^{j}\sum_{|J|=j}P_J^{3} = \sum_{l=1}^{3}p_l^{3} - \left(3\sum_{l=1}^{3}p_l^{2}-\sum_{l=1}^{3}p_l^{3}\right) + 1 = 1 - 3\sum_{l=1}^{3}p_l^{2} + 2\sum_{l=1}^{3}p_l^{3},$$
using the expression of $\sum_{|J|=j}P_J^{k}$ described in ([Berthet17], Relation 7).

It can be shown using WolframAlpha© that, for any $(x,y)$: $0 < 1-3\sum_{l=1}^{3}p_l^{2}+2\sum_{l=1}^{3}p_l^{3} < (-1)^{3}EL_3^{3}$, and $(-1)^{3}EL_3^{k} = 1-\frac{2^{k}-1}{3^{k-1}}$. Proving $(-1)^{3}R_3^{k} < 1-\frac{2^{k}-1}{3^{k-1}}$ for all $k>3$ is not trivial (WolframAlpha© does not converge). The general proof for $k>N$ and any distribution goes like this:

- Using a decomposition lemma introduced by [Anceaume14] page 2 (see also Relation 4 page 12 in [Berthet17]): $\sum_{|J|=j}P_J^{k+1} = \sum_{l=1}^{n}p_l\sum_{|J|=j,\,l\in J}P_J^{k}$, $1\le j\le n$, $0\le k$; by a direct extension to alternating sums, it stands that:
$$R_N^{k} = \sum_{l=1}^{N}p_l\sum_{1\le j\le N}(-1)^{j}\sum_{|J|=j-1,\,l\notin J}(p_l+P_J)^{k-1}.$$
- Introduction of the $\left\{\frac{p_j}{1-p_l},\ j\in(1..N),\ j\ne l\right\}$ distribution with (N-1) elements:
$$R_N^{k} = \sum_{l=1}^{N}p_l(1-p_l)^{k-1}\sum_{1\le j\le N}(-1)^{j}\sum_{|J|=j-1,\,l\notin J}\left(\frac{p_l}{1-p_l}+\frac{P_J}{1-p_l}\right)^{k-1}.$$
- Summation index change:
$$R_N^{k} = -\sum_{l=1}^{N}p_l(1-p_l)^{k-1}\sum_{0\le j\le N-1}(-1)^{j}\sum_{|J|=j,\,l\notin J}\left(\frac{p_l}{1-p_l}+\frac{P_J}{1-p_l}\right)^{k-1}.$$
- We note $a = \frac{p_l}{1-p_l}$, which gives the binomial development ($k\ge N$):
$$\sum_{0\le j\le N}(-1)^{j}\sum_{|J|=j}(a+P_J)^{k} = \sum_{u=N}^{k}a^{k-u}\binom{k}{u}\sum_{0\le j\le N}(-1)^{j}\sum_{|J|=j}P_J^{u},$$
since $\sum_{0\le|J|\le N}(-1)^{|J|}P_J^{u} = 0$ for $u<N$.

(ii) Comparison to EL regarding CCDF of a Partial Collection

We now want to prove that $\Pr[T_n > k] > \Pr[EL_n > k]$, where $EL$ is a shorthand for the variable of the uniform case. We start with the simple cases $n=2$ and $n=3$. For $n=2$, we have:
$$\Pr[T_2 > k] = \sum_{j=0}^{1}(-1)^{1-j}\binom{N-j-1}{N-2}\sum_{|J|=j}P_J^{k} = \sum_{i=1}^{N}p_i^{k} > \frac{1}{N^{k-1}}$$
thanks to Lemma 10.2 in Appendix Section 10, hence $\Pr[T_2 > k] > \Pr[EL_2 > k]$.

For $n=3$:
$$\Pr[T_3 > k] = \sum_{j=0}^{2}(-1)^{2-j}\binom{N-j-1}{N-3}\sum_{|J|=j}P_J^{k} = -(N-2)\sum_{i=1}^{N}p_i^{k} + \sum_{1\le i<j\le N}(p_i+p_j)^{k}.$$
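The CCDF expressions for n=2 and n=3 can be evaluated directly; the sketch below (an arbitrary example distribution, illustration only) compares them with the uniform case over a range of k:

```python
from itertools import combinations
from math import comb

def ccdf(p, n, k):
    """Pr[T_n > k] via the Von Schelling CCDF used above."""
    N = len(p)
    return sum((-1) ** (n - 1 - j) * comb(N - j - 1, N - n)
               * sum(sum(J) ** k for J in combinations(p, j))
               for j in range(n))

p = [0.4, 0.3, 0.2, 0.1]              # non-EL, N = 4
el = [0.25] * 4                       # uniform (EL) case

dominates = all(ccdf(p, n, k) > ccdf(el, n, k)
                for n in (2, 3) for k in range(n, 15))
print(dominates)
```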

Proving that EL is minimal for this expression is not trivial. WolframAlpha© does not converge even for N=4. As mentioned previously, the sequence of operations detailed in the previous section can be used for the general case (partial collection with n<N).

[...] $WS$ is always upper-bounded by N; finally: $1 > \frac{WS(E[T_N])}{N} > 1-\frac{e^{-\gamma}}{N}$.

Note on Minimal value

Although it is not necessary in our calculations, it is worth mentioning that, surprisingly, the minimum of $\sum_{i=1}^{N}(1-p_i)^{E[T_N]}$ is NOT the EL case when N>2. This is clearly visible on the (N=3, j=3) graph, where the three minima of $\sum_{i=1}^{3}(1-p_i)^{E[T_3]}$ have a different color than the unique EL case. Indeed, the EL expression is $3\left(1-\frac{1}{3}\right)^{3H_3} = 0.322567$, and we found three symmetrical minima at ~0.312403, one of them at (0.417, 0.417, 0.167).

Similarly for N=4, the EL case is 0.363834 and a minimum at 0.326982712 is observed for (0.087, 0.304, 0.304, 0.305).
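The N=3 observation can be reproduced numerically (a sketch; E[T_3] is computed by inclusion-exclusion, and (0.417, 0.417, 0.166) is used as a normalized point near the quoted minimum):

```python
from itertools import combinations

def wt_complete(p):
    """E[T_N] by inclusion-exclusion over non-empty subsets."""
    return sum((-1) ** (j + 1) * sum(1.0 / sum(c) for c in combinations(p, j))
               for j in range(1, len(p) + 1))

def f(p):
    return sum((1.0 - x) ** wt_complete(p) for x in p)

el3 = f([1 / 3] * 3)                   # EL case, ~0.322567 in the text
near_min = f([0.417, 0.417, 0.166])    # near one of the three symmetric minima
not_el_minimal = near_min < el3        # EL is not the minimum
print(round(el3, 6), round(near_min, 6), not_el_minimal)
```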

It is not known whether an analytical solution exists for the minima. Worth also mentioning is that, for a distribution such that all but one element, say x, are equal, $\sum_{i=1}^{N}(1-p_i)^{E[T_N]}$ tends to $e^{-1}$ when x→0, and to $e^{-\gamma}$ when x→1. When N increases to infinity, the abscissa of the minimum gets closer to 0, hence the minimum tends to $e^{-1}$.

Analysis for N=3, j=2

We now consider the case of an incomplete collection. For N=3 and j=2, the following graph of $\sum_{i=1}^{3}(1-p_i)^{E[T_2]}$ is obtained:

Let us note the distribution $\{x,y,z\}$ with $x+y+z=1$. We have:
$$E[T_2] = 1 + \frac{x}{1-x} + \frac{y}{1-y} + \frac{1-x-y}{x+y}, \quad\text{then}\quad [E[T_2]]_{y=0} = 1 + \frac{x}{1-x} + \frac{1-x}{x}.$$
Hence: $\left[(1-x)^{E[T_2]} + (1-y)^{E[T_2]} + (x+y)^{E[T_2]}\right]_{y=0} = (1-x)^{[E[T_2]]_{y=0}} + 1 + x^{[E[T_2]]_{y=0}}$.

Finally: $\sum_{i=1}^{3}(1-p_i)^{E[T_2]} < 1+\frac{1}{e}$, i.e. 1.3678, using the same argument as before.

Let us note that here again, the minimum is 1.0855, and it is not the EL case, which is 1.0886.

Analysis j=2, whatever N

It stands:
$$E[T_2] = \sum_{k=N-1}^{N}(-1)^{k-N+1}\binom{k-1}{N-2}\sum_{|J|=k}\frac{1}{P_J} = -(N-1)+\sum_{i=1}^{N}\frac{1}{1-p_i} = 1+\sum_{i=1}^{N}\frac{p_i}{1-p_i},$$
hence
$$\sum_{i=1}^{N}(1-p_i)^{E[T_2]} = \sum_{i=1}^{N}(1-p_i)^{1+\sum_{j=1}^{N}\frac{p_j}{1-p_j}}.$$
We extend the previous result obtained for N=3 by setting (N-2) items to 0, leading to:
$$\sum_{i=1}^{N}(1-p_i)^{E[T_2]} < N-2+\frac{1}{e}.$$
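A small numerical sweep (illustration only, a few arbitrary trial distributions) is consistent with this j=2 bound:

```python
from math import e

def ws2(p):
    """sum_i (1-p_i)^E[T_2], with E[T_2] = 1 + sum_i p_i/(1-p_i)."""
    et2 = 1.0 + sum(x / (1.0 - x) for x in p)
    return sum((1.0 - x) ** et2 for x in p)

trials = [
    [0.2] * 5,                        # EL, N = 5
    [0.6, 0.1, 0.1, 0.1, 0.1],        # skewed
    [0.96, 0.01, 0.01, 0.01, 0.01],   # near-degenerate
]
bound_ok = all(ws2(p) < len(p) - 2 + 1 / e for p in trials)   # bound N-2+1/e
print(bound_ok)
```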

Generalization

An upper bound of $\sum_{i=1}^{N}(1-p_i)^{E[T_j]}$ is obtained when N-j elements are null and the other elements are bounded by the value obtained for a complete collection. The proof is as follows. Let us use the Von Schelling notation:
$$E[T_j] = \sum_{k=N-j+1}^{N}(-1)^{j+k-N-1}\binom{k-1}{N-j}\sum_{|J|=k}\frac{1}{P_J},$$
and denote by x an element of the reference set; then:
$$E[T_j] = \sum_{k=N-j+1}^{N}(-1)^{j+k-N-1}\binom{k-1}{N-j}\sum_{\substack{|J|=k\\ x\in J}}\frac{1}{P_J} + \sum_{k=N-j+1}^{N-1}(-1)^{j+k-N-1}\binom{k-1}{N-j}\sum_{\substack{|J|=k\\ x\notin J}}\frac{1}{P_J}.$$
Let us assume x is null (subsets J below then range over the N-1 other elements):
$$[E[T_j]]_{x=0} = \sum_{k=N-j+1}^{N}(-1)^{j+k-N-1}\binom{k-1}{N-j}\sum_{|J|=k-1}\frac{1}{P_J} + \sum_{k=N-j+1}^{N-1}(-1)^{j+k-N-1}\binom{k-1}{N-j}\sum_{|J|=k}\frac{1}{P_J}.$$
With a variable change in the first summation (u=k-1) and extending the second summation to k=N-j:
$$[E[T_j]]_{x=0} = \sum_{u=N-j}^{N-1}(-1)^{j+u-N}\binom{u}{N-j}\sum_{|J|=u}\frac{1}{P_J} + \sum_{k=N-j}^{N-1}(-1)^{j+k-N-1}\binom{k-1}{N-j}\sum_{|J|=k}\frac{1}{P_J}.$$
Then:
$$[E[T_j]]_{x=0} = \sum_{k=N-j}^{N-1}(-1)^{j+k-N}\binom{k-1}{N-j-1}\sum_{|J|=k}\frac{1}{P_J}.$$
The RHS is the expression of the waiting time expectation for a partial collection of size j among N-1 elements (after exclusion of element 'x'; the trick is that, since 'x' is a null-weight element, the distribution remains the same for all the other elements).

Then: $\left[\sum_{i=1}^{N}(1-p_i)^{E[T_j]}\right]_{x=0} = 1 + \sum_{i=1}^{N-1}(1-p_i)^{[E[T_j]]_{x=0}}$.

The same reasoning can be applied iteratively for N-j variables and, using the bound obtained for a complete collection (applied to j variables), we finally obtain:
$$N - WS(E[T_j]) = \sum_{i=1}^{N}(1-p_i)^{E[T_j]} < N-j+(j-1)e^{-H_{j-1}}, \quad \forall\{p_i\},\ i\in\{1,..,N\}.$$
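The null-element reduction used in the proof can be illustrated numerically: adding a near-null element to a distribution leaves the Von Schelling expectation, for the same partial size j, essentially unchanged. A sketch with an arbitrary example distribution:

```python
from itertools import combinations
from math import comb

def wt_partial(p, j):
    """Von Schelling: E[T_j] for a distribution p over N = len(p) elements."""
    N = len(p)
    return sum((-1) ** (j + k - N - 1) * comb(k - 1, N - j)
               * sum(1.0 / sum(c) for c in combinations(p, k))
               for k in range(N - j + 1, N + 1))

q = [0.4, 0.3, 0.2, 0.1]                     # base distribution, 4 elements
eps = 1e-9
p = [x * (1 - eps) for x in q] + [eps]       # same shape plus a near-null element

reduction_ok = abs(wt_partial(p, 3) - wt_partial(q, 3)) < 1e-6
print(reduction_ok)
```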

Analysis of particular case N=6

We find the following values for N=6 (setting x, y, z, u, w to 0.00001 and varying them by steps of 0.01, with t=1-x-y-z-u-w), and compare them to the formula $N-j+(j-1)e^{-H_{j-1}}$:

j (N=6) | Max observed | $N-j+(j-1)e^{-H_{j-1}}$      | Min observed
2       | 4.36348      | 4.36788 = $4+e^{-1}$         | 3.98245
3       | 3.44042      | 3.44626                      | 3.02642
4       | 2.47332      | 2.47964                      | 2.07605
5       | 1.49181      | 1.49806 = $1+4e^{-H_4}$      | 1.15416
6       | 0.509713     | 0.509719 = $5e^{-H_5}$       | 0.33358
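The bound column of the table is reproduced directly by the formula (a sketch):

```python
from math import exp

def H(n):
    """n-th harmonic number."""
    return sum(1.0 / i for i in range(1, n + 1))

N = 6
bounds = {j: N - j + (j - 1) * exp(-H(j - 1)) for j in range(2, N + 1)}
expected = {2: 4.36788, 3: 3.44626, 4: 2.47964, 5: 1.49806, 6: 0.509719}
table_ok = all(abs(bounds[j] - expected[j]) < 1e-4 for j in expected)
print(table_ok)
```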

Coming back to the first graph of this section, we see that the function $\frac{1}{j}WS(E[T_j])$ has the following lower bound:
$$\frac{1}{j}WS(E[T_j]) > \frac{1}{j}\left(j-(j-1)e^{-H_{j-1}}\right) = 1-\frac{j-1}{j}e^{-H_{j-1}}.$$
Let us remark first that this bound is independent of N and, when j→∞ (and also N), $\frac{1}{j}WS(E[T_j]) > 1-\frac{e^{-\gamma}}{j}$. Secondly, for j=N: $\frac{1}{N}WS(E[T_N]) > 1-\frac{N-1}{N}e^{-H_{N-1}}$.

Finally, the function is upper-bounded: $1 > \frac{1}{j}WS(E[T_j])$. This is a direct consequence of the Boneh conjecture on the appearance of graphs, which implies that for any popularity there is necessarily a value above which, for any x: $E^{-1}[T_x] > WS(x)$. Then, let $j = E^{-1}[T_x]$, or $x = E[T_j]$; this implies $j > WS(E[T_j])$. In conclusion, for any popularity and for j large enough, it stands that $j > WS(E[T_j]) > j-e^{-\gamma}$ and then: $WS(E[T_j]) \approx j$. QED.

15. Appendix: Waiting Time Expectation of power-law popularities

The Working Set Expectation has a computable expression when the popularity distribution is a power law with parameter 'a' (aka skewness): $p_i = \frac{1}{H_{N,a}\cdot i^{a}}$, where $H_{N,a} = \sum_{n=1}^{N}\frac{1}{n^{a}}$ is the Nth generalized harmonic number. The asymptotic is $\zeta(a)N^{a}\ln N$ when a>1. For a=1, since the Nth harmonic number is $\ln N+\gamma+o(1/N)$, the asymptotic is $H_N N\ln N \approx N\ln^{2}N$. This result is also in [Flajolet92]. Finally, for 0<a<1,
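As a small numerical companion to the power-law setup above (illustration only; exact E[T_N] via inclusion-exclusion, so N is kept small), the waiting time of a complete collection grows with the skewness a and always exceeds the EL value N·H_N, consistently with the minimality results of the previous appendices:

```python
from itertools import combinations

def zipf(N, a):
    """Power-law popularity p_i = 1/(H_{N,a} * i^a)."""
    w = [1.0 / i ** a for i in range(1, N + 1)]
    s = sum(w)
    return [x / s for x in w]

def wt_complete(p):
    """E[T_N] by inclusion-exclusion over non-empty subsets."""
    return sum((-1) ** (j + 1) * sum(1.0 / sum(c) for c in combinations(p, j))
               for j in range(1, len(p) + 1))

N = 8
el_value = N * sum(1.0 / i for i in range(1, N + 1))     # N * H_N (EL case)
e_mild, e_steep = wt_complete(zipf(N, 0.5)), wt_complete(zipf(N, 2.0))
ordering_ok = el_value < e_mild < e_steep
print(ordering_ok)
```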