THE NUMBER OF WINNERS IN A DISCRETE ... - Project Euclid

5 downloads 92 Views 219KB Size Report
Technical University of Vienna. In this tutorial, statistics on the number of people who tie for first place are considered. It is demonstrated that the so-called Rice's ...
The Annals of Applied Probability 1996, Vol. 6, No. 2, 687 ] 694

THE NUMBER OF WINNERS IN A DISCRETE GEOMETRICALLY DISTRIBUTED SAMPLE BY PETER KIRSCHENHOFER

AND

HELMUT PRODINGER

Technical University of Vienna In this tutorial, statistics on the number of people who tie for first place are considered. It is demonstrated that the so-called Rice’s method from the calculus of finite differences is a very convenient tool both to rederive known results as well as to gain new ones with ease.

Let the random variable G be geometrically distributed. That is, P G s k 4 s pq ky 1 , with q s 1 y p. Also, assume that n independent copies are given. Finally, let X count the number of random variables with highest value. A popular realization of this situation is to consider n ‘‘players’’ who independently toss coins until each of them sees the first head. In this interpretation, X is the number of players who gain their respective heads in the very last round of the game, that is, the ‘‘winners’’ of the game. In w 1x the probability distribution of X, the expectation E X of X and the asymptotic behavior of P X s 14 Žprobability of a single winner. for n ª ` were to be determined. In the solution w 2x it was remarked that}surprisingly}this probability does not converge as n ª ` but rather has an oscillating behavior. At the same time, Eisenberg, Stengle and Strang w 5x discussed this problem and related topics, exhibiting the structure of the periodic fluctuation, for which an explicit Fourier expansion was given. Also about this time, Brands, Steutel and Wilms w 4x came independently to roughly the same results. A recent paper by Baryshnikov, Eisenberg and Stengle w 3x deals with the existence of the limiting probability of a tie for first place. In fact, a fluctuating behavior in asymptotic expansions is not at all uncommon. There are numerous results of that type, for example, in the analysis of divide-and-conquer recursions w 7, 8, 16x or digital sums w 6x , that play a prominent role in the probabilistic analysis of algorithms. Our aim in this paper is to some extent tutorial: the asymptotic technique that yields the Fourier expansions of the fluctuating functions very comfortably is called ‘‘Rice’s method’’ Žsee the recent survey w 9x. . In order to convince the reader of the advantages of this method, we will rederive a result on the initially mentioned problem in the sequel, and afterwards present some new results concerning higher moments of distribution as well as the number of persons reaching a specified level beyond the winnerŽs..

Received October 1994; revised November 1995. AMS 1991 subject classification. 60C05. Key words and phrases. Geometric distribution, generating functions, Rice’s method.

687

688

P. KIRSCHENHOFER AND H. PRODINGER

Let us abbreviate Q [ 1rq and L [ log Q. Also, let pm denote the probability pm s P X s m4 , that is, the probability of having m winners Žamongst n players.. Then

ž /Ýq

pm s p m n m

Ž 1.

Ž jy1. m

Ž 1 y q jy1 .

ny m

.

jG1

This follows from the observation that m out of n people have a Žwinning. value j, while the other ones have a smaller value. Now we set N [ n y m, expand the binomial and sum over j to get the alternative formula

ž /

pm s p m n m

Ž 2.

N

Ý ks0

ž /Ž N k

y1 .

1

k

1 y Qykym

.

The key point in analyzing this alternating sum asymptotically is the following lemma. LEMMA 1 Žw 9x. . Let f Ž z . be a function that is analytic on w n 0 ,q `w . Assume that f Ž z . is meromorphic in the whole of C and analytic on V s D `js1 g j , where the g j are concentric circles whose radius tends to `. Let f Ž z . be of polynomial growth on V. Then, for N large enough, N

Ý

Ž 3.

ksn 0

ž /Ž N k

k

y1 . f Ž k . s

Ý Res w N ; z x f Ž z . , z

where Ny 1

w N; z x s

N! G Ž N q 1 . G Ž yz . Ž y1. s z Ž z y 1 . ??? Ž z y N . GŽ N q 1 y z.

and the sum is extended to all poles not on w n 0 ,q `w . The following proposition collects the asymptotic results concerning the distribution of the number of winners among n players. We demonstrate the use of Rice’s method by giving our alternative proof for the asymptotics of the probabilities, mention the Žknown. expectation and derive the Žnew. asymptotics of the variance. PROPOSITION 1. Let X be the random variable ‘‘number of winners among n players’’ as described above. Then pm s P  X s m 4

Ž 4.

Ž 5.

s

1 pm L m

q

pm L

dm Ž log Q n . q O

En s E X s

p 1 q

1

ž / n

,

m fixed, n ª `,

Ž 1 q d 1 Žlog Q n . . q O L

1

ž / n

689

WINNERS IN A GEOMETRICALLY DISTRIBUTED SAMPLE

and Vn s Var X

Ž 6.

s

p 1 q2 L

y

p2 1 q2 L

ž

2

Qj

Ý jG1

Ž Q j q 1.

2

q

1 4

/

q t 1 Ž log Q n . q O

1

ž / n

,

where dm Ž x . and t 1Ž x . are continuous periodic functions of period 1, mean 0 and small amplitude. PROOF. In order to prove Ž4., we apply the lemma Žwith n 0 s 0. to expression Ž2.. In this instance f Ž z . s 1rŽ1 y Qyz ym . has poles at z s ym q x j , with x j s 2 jp irL, and is bounded on concentric circles C j about the origin passing through the points

Ž 2 j q 1. p i

ym "

L

,

j s 1, 2, . . . .

Therefore we only have to consider the residues of

w N; z x

1 1 y Qyzym

at z s ym q x j .

The computation of the residues is simple; Res zs ymq x j s

G Ž N q 1. G Ž m y x j . 1 G Ž N q 1 q m y xj . L

whence we have w after multiplication by the factor p m n ) m instead of N G 1x the formulas

Ž 7.

pm s

1 pm L m!

,

ž / and going back to

G Ž n q 1.

Ý G Ž m y xj . G Ž n q 1 y x . ,

n m

n ) m,

j

jgZ

where}according to the previous remarks}the series stands for the Cauchy principal value. Using Stirling’s formula for the approximation of the G-functions, we find that G Ž n q 1. G Ž n q 1 y xj .

ž

s n xj 1 q O

1

ž // n

,

n ª `,

so that, for n getting large, the series converges to the Fourier expansion of a periodic function in log Q n. Pulling out the term with index j s 0 Žthe ‘‘mean’’. and denoting the remaining periodic fluctuation Žof mean 0.,

dm Ž x . s we gain formula Ž4..

1 m!

Ý G Ž m y x j . e 2 jp i x , j/0

690

P. KIRSCHENHOFER AND H. PRODINGER

It is interesting to note that the alternating sum Ž2. can also be rewritten using the partial fraction decomposition of the meromorphic function 1rŽ1 y q zq m ., namely, 1

Ž 8.

1yq

zqm

s

1 2

1

q

L

Ý jgZ

1 z q m q xj

with x j s

2 jp i L

.

ŽCompare w 11x , 7.10.. Again, the sum stands for the Cauchy principal value. The usual argument to derive Ž8. is to compute the sum of the principal parts of the function and to show that the difference between this sum and the function}which has to be entire}is bounded, and thus a constant. Inserting Ž8. into Ž2., we find N

1 1 k N q pm n Ž y1. m k 2 L

ž / ž /

pm s p m n m

Ý

ks0

ž /

N

Ý Ý jgZ ks0

ž /Ž N k

y1 .

1

k

k q m q xj

.

Now, for x g C R  yN, . . . , 04 , N

Ý ks0

ž /Ž N k

y1 .

1

k

kqx

s

G Ž N q 1. G Ž x . GŽ N q 1 q x.

s

N!

Ž N q x . Ž N q x y 1 . ??? x

Žcompare w 10x , Ž5.41.., so that Žremembering N s n y m. pm s

Ž 9. s

pm 2 pm 2

dn , m q dn , m q

1 pm L m! 1 pm L m!

n!

Ý jgZ Ž m q x j . ??? Ž n q x j . G Ž n q 1.

Ý G Ž m y xj . G Ž n q 1 y x . . j

jgZ

In order to get Ž5., we observe that En s QŽ p 1 y pdn, 1 . as was already reported in w 2x . Let us now engage in the proof of Ž6.. For this we compute the second factorial moment Mn for n G 2: Mn s

Ý

m Ž m y 1 . pm

mG2

s n Ž n y 1.

ž

Ý mG2

Ž 10.

s

s

n Ž n y 1. p 2 q2 n Ž n y 1. q

/

ny m ny2 p m Ý q jm Ž 1 y q j . my2 jG0

Ý q 2 j Ž1 y q j .

ny 2

jG1

p 2 ny2

Ý

2

s 2 Q 2 p2 y 2

ks0

p2 q2

ž

1 k ny2 Ž y1. kq2 k Q y1

/

dn , 2 s 2 Q 2 Ž p 2 y p 2dn , 2 . .

WINNERS IN A GEOMETRICALLY DISTRIBUTED SAMPLE

691

The variance is obtained in the usual way by computing Vn s Mn q En y En2 . Hence

Ž 11.

Vn s

p2 1 q

2

L

q

p 1 q L

y

p2 1 q

2

2

L

q t Ž log Q n . q O

1

ž / n

.

This time

t Ž x. s

p2 2 q2 L

d2 Ž x . q

p 1 q L

d 1Ž x . y

p2 1 q 2 L2

d 12 Ž x .

has Žintegral. mean different from 0! While this quantity is quite small, it can be extracted using the methods described in w 12x , and we find the alternative formula Ž6. of the proposition. I There is a nice way to derive the explicit forms of the expectation and the second factorial moment, using Žprobability. generating functions. Let the coefficient of z k in FnŽ z . denote the probability that n players produce k winners. We get the following recursion: n

Ž 12.

Fn Ž z . s

ž /

n ny k k p q Fk Ž z . q p n z n , k

Ý ks1

n G 1.

It is convenient to set F0 Ž z . s 1. This recursion is almost self-explanatory. When, at a certain level, the remaining players all fail, we label each of them by a ‘‘ z’’ and leave the recursion Žequivalently we might think of z as the probability of an event independent of the game.. The expectation En is obtained via FnX Ž1.; therefore n

En s

Ý ks1

ž /

n ny k k p q Ek q np n , k

n G 1.

Defining the exponential generating function EŽ z . s Ý n G 1 En z nrn!, we obtain E Ž z . s e p z E Ž qz . q pze p z . Using the ‘‘Poisson transform’’

ˆŽ z . s eyz E Ž z . s E

ˆn E

Ý nG1

zn n!

this simplifies to

ˆŽ z . s EˆŽ qz . q pzeyq z . E Equating coefficients we see that, for n G 1,

ˆn s E

np q

Ž y1.

ny1

qn 1 y qn

,

692

P. KIRSCHENHOFER AND H. PRODINGER

and furthermore, for n G 1, n

En s

ž /

n ˆ Ek s k

Ý ks1

s

np q

ny1

Ý ks0

ž

n

Ý ks1

1 ky 1 n kp Ž y1. k k q Q y1

ž /

1 k ny1 , Ž y1. kq 1 k Q y1

/

which coincides with the formula in w 2x . For the second factorial moment we differentiate twice and evaluate at z s 1. An almost identical computation gives the same expression that we obtained already. This approach was used by Knuth in w 15x under the name ‘‘binomial transform’’ and subsequently used by many people Žsee, e.g., w 17x. . Finally, we want to produce some additional new results which shed some additional light on the original question about the number of winners. Assume that the winners have reached the level j. We are interested in the number of players who reached the level j y d, where d is a parameter. Ž d s 0 is the case that was just considered.. Call the random variable in question X d . Then we have the following proposition. PROPOSITION 2. Let X d denote the random variable ‘‘number of players who reach level d below the winners.’’ Then

Ž 13.

P Xd s m d 4 s

Ý Ý jGdq1

ž

= Ž pq jy1 .

n m0 , . . . , m d , m m0

/

??? Ž pq jydy1 .

md

Ž 1 y q jydy1 .

m

,

where the first sum runs over all m 0 G 1, m1 G 0, . . . , m dy1 G 0 and m s n y m 0 y ??? ym d ,

Ž 14.

EnŽ d . s

1

p2

L q

dq1

q

1

p2

L q

dq1

d 1 Ž log Q n . q O

1

ž / n

,

n ª `,

and VnŽ d . s

Ž 15.

p2 q

2 dq2

1 L

Ž 1 q q dq1 . y

q t 1Ž d . Ž log Q n . q O

1

ž / n

p4 q

2 dq2

1 L

ž

2

Ý jG1

Qj

Ž Q j q 1.

2

q

1 4

/

.

PROOF. The formula for the probabilities is self-explanatory w compare the comments on formula Ž1.x . The expected value is just the sum of m d times

693

WINNERS IN A GEOMETRICALLY DISTRIBUTED SAMPLE

this quantity. We get EnŽ d . s n

pq jydy1

Ý

Ž pq jy1 q ??? qpq jydy1 q 1 y q jydy1 .

ny 1

jGdq1

y Ž pq jy2 q ??? qpq jydy1 q 1 y q jydy1 . q jydy1 Ž 1 y q j .

Ý

s pn

ny 1

y Ž 1 y q jy1 .

ny 1

ny1

jGdq1

s

p q

2

dq 1

n

Ý

q j Ž1 y q j .

ny 1

y

jGd

p q

ny1

n Ž1 y q d .

.

This formula holds only for d G 1. Since p ny 1 EnŽ0. s En s n Ý q j Ž 1 y q j . , q jG1 we find

Ž 16.

p2

p

EnŽ d . s

q

EnŽ0. y d

q

dy1

n dq1

Ý

q j Ž1 y q j .

ny 1

y

js1

p q

n Ž1 y q d .

ny1

,

and, because the extra terms are exponentially small, we can use the asymptotic result for EnŽ0. and have p EnŽ d . s dq1 p 1 q exponentially small terms in n, n G 2, q from which Ž14. is immediate using Ž4.. The second factorial moment MnŽ d . for d G 1 is computed analogously as Žwith MnŽ0. s Mn . MnŽ d . s n Ž n y 1 .

2

Ý Ž pq jydy1 . Ž1 y q j .

ny2

y Ž 1 y q jy1 .

ny2

jGdq1

Ž 17.

s s

p

q

3

n n y 1. 2 dq2 Ž

p q2 d

Ý

q 2 j Ž1 y q j .

ny2

jGd

y

p2 q

2

n Ž n y 1. Ž1 y q d .

MnŽ0. q exponentially small terms in n,

ny2

n ª `.

Thus formula Ž15. gives us the asymptotics for the variance. I REFERENCES w 1x E3436 ŽPROBLEM . Ž1991.. Amer. Math. Monthly 99 366. w 2x E3436 ŽSOLUTION. Ž1994.. Amer. Math. Monthly 101 78]80. w 3x BARYSHNIKOV, Y., EISENBERG, B. and STENGLE, G. Ž1995.. A necessary and sufficient condition for the existence of the limiting probability of a tie for first place. Statist. Probab. Lett. 23 203]209. w 4x BRANDS, J. J. A. M., STEUTEL, F. W. and WILMS, R. J. G. Ž1994.. On the number of maxima in a discrete sample. Statist. Probab. Lett. 20 209]217. w 5x EISENBERG, B., STENGLE, G. and STRANG, G. Ž1993.. The asymptotic probability of a tie for first place. Ann. Appl. Probab. 3 731]745.

694

P. KIRSCHENHOFER AND H. PRODINGER

w 6x FLAJOLET, P., GRABNER, P., KIRSCHENHOFER, P., PRODINGER, H. and TICHY, R. F. Ž1994.. Mellin transforms and asymptotics: digital sums. Theoret. Comput. Sci. 123 291]314. w 7x FLAJOLET, P. and RICHMOND, L. B. Ž1992.. Generalized digital trees and their differencedifferential equations. Random Structures and Algorithms 3 305]320. w 8x FLAJOLET, P. and SEDGEWICK, R. Ž1986.. Digital search trees revisited. SIAM J. Comput. 15 748]767. w 9x FLAJOLET, P. and SEDGEWICK, R. Ž1995.. Mellin transforms and asymptotics: finite differences and Rice’s integrals. Theoret. Comput. Sci. 144 101]124. w 10x GRAHAM, R., KNUTH, D. E. and PATASHNIK, O. Ž1989.. Concrete Mathematics. Addison-Wesley, Reading, MA. w 11x HENRICI, P. Ž1977.. Applied and Computational Complex Analysis 1. Wiley, New York. w 12x KIRSCHENHOFER, P. and PRODINGER, H. Ž1991.. On some applications of formulae of Ramanujan in the analysis of algorithms. Mathematika 38 14]33. w 13x KIRSCHENHOFER, P. and PRODINGER, H. Ž1993.. A result in order statistics related to probabilistic counting. Computing 51 15]27. w 14x KIRSCHENHOFER, P., PRODINGER, H. and SZPANKOWSKI, W. Ž1989.. On the variance of the external pathlength in a symmetric digital trie. Discrete Appl. Math. 25 129]143. w 15x KNUTH, D. E. Ž1973.. The Art of Computer Programming 3. Addison-Wesley, Reading, MA. w 16x MAHMOUD, H. M. Ž1992.. Evolution of Random Search Trees. Wiley, New York. w 17x PRODINGER, H. and SZPANKOWSKI, W. Ž1993.. A note on binomial recurrences arising in the analysis of algorithms Žletter to the editor.. Inform. Process. Lett. 46 309]311. w 18x SZPANKOWSKI, W. Ž1988.. The evaluation of an alternative sum with applications to the analysis of some data structures. Inform. Process. Lett. 28 13]19. INSTITUT FUR ¨ ALGEBRA UND DISKRETE MATHEMATIK TECHNISCHE UNIVERSITAT ¨ WIEN WIEDNER HAUPTSTRASSE 8-10 r 118 A-1040 WIEN AUSTRIA E-MAIL : [email protected] [email protected]

Suggest Documents