Asymptotic Probabilities in a Sequential Urn Scheme Related to the Matchbox Problem

W. Stadje
Abstract. Generalizing the classical Banach matchbox problem, we consider the process of removing two types of `items' from a `pile' with selection probabilities for the type of the next item to be removed depending on the current numbers of remaining items, and thus changing sequentially. Under various conditions on the probability $p_{n_1,n_2}$ that the next removal will take away an item of type I, given that $n_1$ and $n_2$ are the current numbers of items of the two types, we derive asymptotic formulas (as the initial pile size tends to infinity) for the probability that the items of type I are completely removed first and for the number of items left. In some special cases we also obtain explicit results.
1 Introduction

In this paper we study certain extensions of the Banach matchbox problem. As an introductory example, consider two interacting species of initial sizes $N_1$ and $N_2$ living in the same area. We suppose that at any time the probability that the next death will occur in species 1 depends on the current size of both species. Various causes could influence these probabilities: the species may compete for food and living space, they may be hunted by the same or different predators, possibly with varying intensities, etc. A large species may be an easier target, but its members can perhaps also help each other, deprive the other species of its food supply or expel it from preferred hiding places, or the predators may gradually become selective and develop preferences depending on the two population sizes. These and various other factors could play a role in changing the elimination probabilities. We assume that no births occur in either species and are interested in the probability that the first species will be extinct before the second one, and in the remaining size of the surviving species. We will study the asymptotic behavior of these quantities for large initial population sizes.

As a model, consider two piles I and II of items that are successively taken away with removal probabilities depending on the current pile sizes. The aim is to determine the distribution of the number of remaining items at the time when the first pile is emptied (or at the time when this is noticed). In the original formulation of the matchbox problem the removal probabilities are constant, say equal to $p \in (0,1)$ for pile I. The case when $p = 1/2$ and both piles contain the same number of items has been treated in Feller's book [11, pp. 166, 170 and 238] and by Holst [16] and Goczyla [12].
The problem is so natural that over the years various extensions and related questions have been considered, for example, in the context of paired comparisons (Maisel [18], Uppuluri and Blot [27], Menon and Indira [20], Groeneveld and Arnold [13], Nagaraja and Chan [21], Stadje [25], Sigrist [24]), inverse sampling (Harris [14]), file storage (Mendelson et al. [19], Goczyla [12]), or Berg's medicine bottle problem [4].
Asymptotic considerations in these papers are restricted to the residual pile size in the case of constant $p_n$ and $n_1 = n_2 \to \infty$. Stirzaker [26] and Knuth [17] have suggested an interesting extension of the matchbox problem. Stirzaker discusses several practical situations in which sequentially varying probabilities are realistic and, following Knuth, treats in detail the special case when the larger pile is always chosen with probability $p$ and the smaller one with probability $1 - p$. Other important applications using sequentially changing selection probabilities are to consistency checking in training sets (Ben-David and Jagerman [2]) and selective sampling (Chesson [9]).

Let $n = (n_1, n_2) \in \mathbb{N}^2$ and denote by $p_n$ the probability that the next item will be chosen from pile I, given that piles I and II still contain $n_1$ and $n_2$ items, respectively. Under various conditions on the double sequence $(p_n)_{n \in \mathbb{N}^2}$ we will derive asymptotic results on the probability that pile II is emptied first, given that one starts with piles of sizes $N_1$ and $N_2$.

In Section 2 the case of an `in-built' tendency of $p_n$ toward a fixed probability $r \in (0,1)$ will be treated. If the current pile sizes satisfy $n_1/(n_1+n_2) \ge r$, the further elimination of items from pile I is slowed down by imposing an upper bound on the difference $p_n - r$; similarly, if $n_1/(n_1+n_2) < r$, a certain minimal rate of removals from pile I is ensured by bounding $r - p_n$ from above (see (2.1)-(2.2)). For initial pile sizes $N_1, N_2$ let $P(N_1,N_2)$ be the probability that pile II will be emptied first, and let $N = N_1 + N_2 \to \infty$. We prove that there is a sequence $\delta_N$ of order $O(N^{1-\eta})$ for a certain $\eta > 0$ such that $P(N_1,N_2) \to 1$ if $N_1 \ge Nr + \delta_N$, and $P(N_1,N_2) \to 0$ if $N_1 \le Nr - \delta_N$.

In Section 3 we study the case when $p_n = \beta_{n_2}/(\alpha_{n_1} + \beta_{n_2})$ for polynomially growing sequences $\alpha_n$ and $\beta_n$. This is a wide class of sequences for which $p_n$ (asymptotically) increases in $n_2$ and decreases in $n_1$.
If we set $\alpha_n \equiv a$ and $\beta_n \equiv b$ for some $a, b > 0$, we also obtain the case of constant $p_n$ ($p_n \equiv b/(a+b)$). Let us illustrate the main results of Section 3 in the case $\alpha_n = an$, $\beta_n = bn$ for some constants $a, b > 0$; that is, $p_n = bn_2/(an_1 + bn_2)$. In the following we let the initial pile sizes $N_1, N_2$ tend to infinity such that $N_1/N_2$ remains bounded away from zero and from infinity. Define $X^{(1)}_{N_1,N_2}$ ($X^{(2)}_{N_1,N_2}$) to be the size of pile I (II) at the moment the first of the piles is emptied. In particular, $X^{(1)}_{N_1,N_2} = 0$ if and only if pile I is removed first.

Result 1. If $aN_1^2 - bN_2^2 = o(N_1^{3/2})$, then for every $\eta > 0$,
$$P\big(X^{(1)}_{N_1,N_2} \in [N_1^{3/4-\eta}, N_1^{3/4+\eta}]\big) \to 1/2, \qquad P\big(X^{(2)}_{N_1,N_2} \in [N_2^{3/4-\eta}, N_2^{3/4+\eta}]\big) \to 1/2.$$

Result 2. If $N_1^{-3/2}|aN_1^2 - bN_2^2|$ remains bounded away from zero and from infinity, then for every $\eta > 0$,
$$P\big(0 < X^{(1)}_{N_1,N_2} \le x\big) \sim \Phi\left(\frac{3^{1/2}\,(aN_1^2 - bN_2^2)}{2N_1^{3/2}\,[a^2 + b^2(N_2/N_1)^3]^{1/2}}\right)$$
provided that $x \ge N_1^{3/4+\eta}$. Here, and in the following, $\Phi(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} e^{-u^2/2}\,du$ is the standard normal distribution function.

Result 3. If $N_1^{-3/2}(aN_1^2 - bN_2^2) \to \infty$, then $P(N_1,N_2) \to 1$. If, additionally, $aN_1^2/bN_2^2 \to c > 1$, then
$$N_1^{1/2-\eta}\,\big|N_1^{-1}X^{(1)}_{N_1,N_2} - (1 - c^{-1})^{1/2}\big| \to 0 \quad \text{in probability for every } \eta > 0.$$
In general, the behavior of the removal process depends on that of $\sum_{n=1}^{N_1}\alpha_n - \sum_{n=1}^{N_2}\beta_n$. If this quantity grows slowly (quickly), we have $P(N_1,N_2) \to 1/2$ ($\to 1$), while if it increases at a certain "moderate" rate, $P(N_1,N_2)$ can be approximated by means of the normal distribution function. Using the precise results in Section 3 we also determine the asymptotic behavior of the residual pile size.

In predator-prey and competing species models of the Lotka-Volterra type the evolution of the population sizes is usually modeled as a continuous-time birth and death process (see, e.g., Hitchcock [15], Billard [7] and Ridler-Rowe [22]). The case of pure death processes in which the species can only decrease in number because of starvation, overcrowding, predators, or removal in some form has also received much attention (Severo [23], Billard [5,6], Billard and Kryscio [8]). These papers focus on the solution of the pertaining Kolmogorov forward equations, i.e. on the exact computation of the transition probabilities. We are interested in extinction probabilities and the residual population size; their asymptotic properties as the initial population size tends to infinity have not been treated before. Note that in this context the continuous-time structure can be neglected. If $\lambda_n$ and $\mu_n$ denote the death rates of the two species, the corresponding removal probabilities are given by $p_n = \lambda_n/(\lambda_n + \mu_n)$. In particular, if $\lambda_n$ and $\mu_n$ are of the `product form' $\lambda_n = \lambda_{n_1,1}\lambda_{n_2,2}\,\rho_n$, $\mu_n = \mu_{n_1,1}\mu_{n_2,2}\,\rho_n$ with the same $\rho_n > 0$ in both formulas, we can apply the results of Section 3, where we have to set $\alpha_n = \mu_{n,1}/\lambda_{n,1}$ and $\beta_n = \lambda_{n,2}/\mu_{n,2}$. Several models in Billard and Kryscio [8] are of this type and satisfy the conditions in Section 3.

Finally, in Section 4 we consider in detail the case of proportional removal probabilities and also derive some new results on the matchbox problem of Stirzaker [26] and Knuth [17].
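For small pile sizes the probability $P(N_1,N_2)$ described in the Introduction can be computed exactly by conditioning on the type of the next removal. The following is a minimal sketch, not part of the original paper: the function names `prob_pile2_first` and `linear_rates` are hypothetical, and the boundary values (probability 1 once pile II is empty, 0 once pile I is empty) are the natural reading of "pile II is emptied first".

```python
def prob_pile2_first(N1, N2, p):
    """Probability that pile II is emptied first, starting from sizes (N1, N2).

    p(n1, n2) is the probability that the next removal comes from pile I.
    """
    # P[n1][n2]: pile II already empty -> 1, pile I already empty -> 0.
    P = [[0.0] * (N2 + 1) for _ in range(N1 + 1)]
    for n1 in range(1, N1 + 1):
        P[n1][0] = 1.0
        for n2 in range(1, N2 + 1):
            q = p(n1, n2)
            # Remove from pile I with probability q, from pile II otherwise.
            P[n1][n2] = q * P[n1 - 1][n2] + (1.0 - q) * P[n1][n2 - 1]
    return P[N1][N2]


def linear_rates(a, b):
    """Removal probability p_n = b*n2 / (a*n1 + b*n2), i.e. alpha_n = a*n, beta_n = b*n."""
    return lambda n1, n2: b * n2 / (a * n1 + b * n2)


if __name__ == "__main__":
    # Balanced start with a = b: symmetric situation.
    print(prob_pile2_first(50, 50, linear_rates(1.0, 1.0)))
    # Pile I clearly larger: pile II should almost surely be emptied first.
    print(prob_pile2_first(60, 30, linear_rates(1.0, 1.0)))
```

With $a = b$ and $N_1 = N_2$ the scheme is symmetric in the two piles, so the computed value should be $1/2$ up to rounding, in line with Result 1; the second call illustrates the regime of Result 3.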
2 Removals with a trend

We assume in this section that there are constants $r \in (0,1)$, $s \in [0,1)$, $\gamma \in (0,1]$ and $\varepsilon > 0$ such that
$$p_n \le r + s\Big(\frac{n_1}{n_1+n_2} - r\Big) + O\big((n_1+n_2)^{-\gamma-\varepsilon}\big) \quad \text{if } n_1/(n_1+n_2) \in [r,1], \tag{2.1}$$
$$p_n \ge r - s\Big(r - \frac{n_1}{n_1+n_2}\Big) - O\big((n_1+n_2)^{-\gamma-\varepsilon}\big) \quad \text{if } n_1/(n_1+n_2) \in [0,r]. \tag{2.2}$$

[Figure 1. Regions for $p_n$; horizontal axis $n_1/(n_1+n_2)$, with the line $g(x)$ and the level $r$ marked.]

The shaded areas in Figure 1 show where the points $(n_1/(n_1+n_2), p_n)$ are assumed to lie (except for the O-terms). Relations (2.1) and (2.2) enforce a drift of $p_n$ toward $r$. Let $g(x) = r + s(x - r)$, $x \in [0,1]$. Then, except for the O-terms, (2.1)-(2.2) are tantamount to
$$p_n \le (\ge)\; g(n_1/(n_1+n_2)) \quad \text{if } n_1/(n_1+n_2) \ge (\le)\; r.$$
Due to the drift toward $r$, $P_N(N_1)$ will be seen to be close to 0 if $N_1$ is a bit smaller than $Nr$ and close to 1 if $N_1$ exceeds $Nr$ by a certain amount $\delta_N$ which can be chosen of order $O(N^{1-\eta})$ for some $\eta > 0$. Thus for the initial size $N_1$ of pile I the fraction $r$ of the total size $N$ of both piles is a threshold value.

Theorem 1. Let (2.1)-(2.2) be satisfied. Let $a = \max[1/2, 1-\gamma, s]$ and let $\delta_N$ be a sequence satisfying
$$\delta_N/N^a \to \infty \ \text{ if } a \ne s, \qquad \delta_N/\big(N^s(\ln N)^{1/2}\big) \to \infty \ \text{ if } a = s.$$
Then
$$\lim_{N\to\infty} P_N(Nr + \delta_N) = 1 \tag{2.3}$$
$$\lim_{N\to\infty} P_N(Nr - \delta_N) = 0. \tag{2.4}$$
Proof. We consider an auxiliary non-homogeneous Markov chain $Y_0 = 0, Y_1, Y_2, \ldots$ for which the conditional distribution of $Y_{N+1}$ given $Y_N$ is given by
$$P_{Y_{N+1}|Y_N} = p_{(Y_N,\,N-Y_N+1)}\,\varepsilon_{Y_N+1} + \big(1 - p_{(Y_N,\,N-Y_N+1)}\big)\,\varepsilon_{Y_N}, \tag{2.5}$$
where $\varepsilon_y$ denotes the point mass at $y$. Then clearly,
$$P(Y_{N+1} \le n) = P(Y_N \le n-1) + P(Y_N = Y_{N+1} = n) = p_{(n,N-n+1)}P(Y_N \le n-1) + \big(1 - p_{(n,N-n+1)}\big)P(Y_N \le n). \tag{2.6}$$
Since
$$P_{N+1}(n) = p_{(n,N-n+1)}P_N(n-1) + \big(1 - p_{(n,N-n+1)}\big)P_N(n), \tag{2.7}$$
the distribution functions of $(Y_N)_{N \ge 0}$ satisfy the same recursion as $(P_N(\cdot))_{N \ge 0}$, and $P_0(0) = P(Y_0 = 0) = 1$. Hence, $P_N(\cdot)$ is the distribution function of $Y_N$. Let $\sigma_N = E((Y_N - Nr)^2)$. By Chebyshev's inequality,
$$P_N(Nr - \delta_N) + 1 - P_N(Nr + \delta_N) \le P(|Y_N - Nr| \ge \delta_N) \le \sigma_N/\delta_N^2. \tag{2.8}$$
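The recursion (2.7) can be iterated numerically to see the threshold behaviour of Theorem 1. The sketch below is only an illustration under stated assumptions: the names `absorption_row` and `drift_probability` are hypothetical, the boundary values $P_N(0) = 0$ and $P_N(N) = 1$ for $N \ge 1$ are assumed (they correspond to the reading of $P_N(n)$ as the probability that pile II is emptied first), and $p_n$ is taken to lie exactly on the line $g$, so that (2.1)-(2.2) hold with vanishing O-terms.

```python
def absorption_row(N, p):
    """Return [P_N(0), ..., P_N(N)] by iterating recursion (2.7).

    p(n1, n2) is the probability that the next removal comes from pile I.
    Assumed boundary values: P_M(0) = 0 and P_M(M) = 1 for every M >= 1.
    """
    row = [1.0]                        # P_0(0) = 1
    for M in range(N):                 # build P_{M+1}(.) from P_M(.)
        new = [0.0] * (M + 2)          # new[0] = P_{M+1}(0) = 0
        new[M + 1] = 1.0               # P_{M+1}(M+1) = 1
        for n in range(1, M + 1):
            q = p(n, M - n + 1)        # p_{(n, M-n+1)} as in (2.7)
            new[n] = q * row[n - 1] + (1.0 - q) * row[n]
        row = new
    return row


def drift_probability(r, s):
    """p_n = g(n1/(n1+n2)) with g(x) = r + s*(x - r)."""
    return lambda n1, n2: r + s * (n1 / (n1 + n2) - r)


if __name__ == "__main__":
    row = absorption_row(400, drift_probability(0.5, 0.2))
    for n1 in (140, 200, 260):
        print(n1, row[n1])
```

For $N = 400$ and $r = 1/2$ the printed values move from near 0 (at $N_1 = 140$) through $1/2$ (at $N_1 = Nr = 200$, by symmetry) to near 1 (at $N_1 = 260$).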
Thus we have to find a suitable estimate for $\sigma_N$. Note that
$$\sigma_{N+1} = E\big(E((Y_{N+1} - (N+1)r)^2 \mid Y_N)\big) = E\Big[p_{(Y_N,N-Y_N+1)}\,(Y_N + 1 - (N+1)r)^2 + \big(1 - p_{(Y_N,N-Y_N+1)}\big)(Y_N - (N+1)r)^2\Big]$$
$$\le E((Y_N - Nr)^2) + 2E\big((p_{(Y_N,N-Y_N+1)} - r)(Y_N - Nr)\big) + \max[r^2, (1-r)^2]. \tag{2.9}$$
From (2.9) and the assumptions (2.1)-(2.2) we can conclude that
$$\sigma_{N+1} \le \sigma_N\Big(1 + \frac{2s}{N}\Big) + E(|Y_N - Nr|)\,O(N^{-\gamma-\varepsilon}) + O(1). \tag{2.10}$$
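All O-terms in this proof refer to $N \to \infty$. From (2.10) and the simple inequality $E(|Y_N - Nr|) \le N^{1-\gamma} + N^{\gamma-1}E((Y_N - Nr)^2)$ we obtain
$$\sigma_{N+1} \le \sigma_N\Big(1 + \frac{2s}{N} + O(N^{-1-\varepsilon})\Big) + O(N^{\kappa}), \tag{2.11}$$
where $\kappa = \max[0, 1 - 2\gamma]$. Let $\omega_N = N^{-2s}\sigma_N$. Then by (2.11),
$$\omega_{N+1} \le \omega_N\Big(1 + \frac{1}{N}\Big)^{-2s}\Big(1 + \frac{2s}{N} + O(N^{-1-\varepsilon})\Big) + O(N^{\kappa-2s}) = \omega_N\big(1 + O(N^{-\varepsilon'})\big) + O(N^{\kappa-2s}), \tag{2.12}$$
where $\varepsilon' = \min[1+\varepsilon, 2] > 1$. It follows from (2.12) that for all $N > N_0 \ge 1$
$$\omega_N - \omega_{N_0} = \sum_{j=N_0+1}^{N}(\omega_j - \omega_{j-1}) \le \Omega_{N,N_0}\sum_{j=N_0+1}^{N}O(j^{-\varepsilon'}) + \sum_{j=N_0+1}^{N}O(j^{\kappa-2s}), \tag{2.13}$$
where $\Omega_{N,N_0} = \max(\omega_{N_0}, \ldots, \omega_N)$. Now choose $N_0$ large enough to ensure that the first sum on the right-hand side of (2.13), $\sum_{j=N_0+1}^{N}O(j^{-\varepsilon'})$, is smaller than $1/2$ for all $N > N_0$. Then
$$\Omega_{N,N_0} \le \omega_{N_0} + \tfrac{1}{2}\Omega_{N,N_0} + \sum_{j=N_0+1}^{N}\max[0, O(j^{\kappa-2s})],$$
so that for all $N > N_0$
$$\omega_N \le \Omega_{N,N_0} \le 2\omega_{N_0} + 2\sum_{j=N_0+1}^{N}\max[0, O(j^{\kappa-2s})]. \tag{2.14}$$
Hence $\omega_N = O(1 + N^{\kappa-2s+1})$ if $\kappa \ne 2s - 1$, and $\omega_N = O(\ln N)$ if $\kappa = 2s - 1$. This yields for $\sigma_N = N^{2s}\omega_N$ the relation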
$0$ and $d, h \ge 0$ such that
$$\alpha_n = an^d + O(n^{d-1}), \quad \text{as } n \to \infty \tag{3.1}$$
$$\beta_n = bn^h + O(n^{h-1}), \quad \text{as } n \to \infty. \tag{3.2}$$
For definiteness we assume that $d \ge h$. Moreover, we set $\alpha_0 = \beta_0 = 0$. We study the random variable $X_n$ describing the final size of the remaining pile, given that the other pile has just been emptied and the initial sizes have been $n_1$ for pile I and $n_2$ for pile II. Formally, set
$X_n = (j, 0)$ (or $X_n = (0, j)$) if $j$ of the $n_1$ items of pile I (of the $n_2$ items of pile II) are left at the time the last item from pile II (I) is being removed. The probabilities of interest are
$$Q_n(x) = P(X_n = (j,0) \text{ for some } j \le x), \qquad R_n(x) = P(X_n = (0,j) \text{ for some } j \le x).$$
Note that $Q_n(n_1)$ ($R_n(n_2)$) is the probability that pile II (I) will be completely removed first; in particular, $Q_n(n_1) + R_n(n_2) = 1$ and $Q_n(n_1) = P_{n_1+n_2}(n_1)$. It will be convenient to consider continuous functions $\alpha(t)$, $\beta(t)$ on $[0,\infty)$, defined as follows: $\alpha(t)$ and $\beta(t)$ are affine-linear on every interval $[n, n+1]$ and $\alpha(n) = \alpha_n$, $\beta(n) = \beta_n$, $n \in \mathbb{Z}_+$. Then we set
$$A(x) = \int_0^x \alpha(t)\,dt, \qquad B(x) = \int_0^x \beta(t)\,dt.$$
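Because $\alpha(t)$ is affine-linear between consecutive integers, $A(x)$ can be evaluated exactly by the trapezoid rule. The following sketch (the name `interpolated_integral` is hypothetical, not from the paper) computes $A(x)$ or $B(x)$ for any sequence supplied as a Python function with value 0 at index 0.

```python
import math


def interpolated_integral(seq, x):
    """Integral over [0, x] of the piecewise-linear interpolant of seq(0), seq(1), ..."""
    n = math.floor(x)
    # On each full unit interval the trapezoid rule is exact for an affine integrand.
    total = sum((seq(j) + seq(j + 1)) / 2 for j in range(n))
    frac = x - n
    if frac > 0:
        left = seq(n)
        right = left + frac * (seq(n + 1) - left)   # interpolated value at x
        total += frac * (left + right) / 2
    return total


if __name__ == "__main__":
    alpha = lambda j: 2 * j          # alpha_n = a*n with a = 2, so A(x) = x**2
    print(interpolated_integral(alpha, 3))
    print(interpolated_integral(alpha, 2.5))
```

At integer arguments the integral differs from the partial sum only by half the last term, $A(n) = \sum_{j=1}^{n}\alpha_j - \alpha_n/2$ (using $\alpha_0 = 0$), which is why $A$ and $B$ can stand in for the sums of $\alpha_j$ and $\beta_l$ in the asymptotic statements.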
The following theorem, the central result of this section, gives asymptotic estimates for $Q_n(x)$ and $R_n(x)$. We need the sums
$$\Gamma^{(m)}_{n,k} = \sum_{j=k_1}^{n_1}\alpha_j^m + (-1)^m\sum_{l=k_2}^{n_2}\beta_l^m, \quad m \in \mathbb{N}.$$
Further let $\nu_n = \alpha_{n_1} + \beta_{n_2}$, $\Gamma^{(m)}_n = \Gamma^{(m)}_{n,0}$, $\Gamma_{n,k} = \Gamma^{(1)}_{n,k}$, $\Gamma_n = \Gamma_{n,0}$. The relation $a \sim b$ between variable quantities $a$ and $b$ always means that $a/b$ tends to 1.

Theorem 2. Assume that (3.1), (3.2) hold and that $n$ tends to infinity such that
$$0 < 1/K < n_1^{d+1}/n_2^{h+1} < K < \infty \tag{3.3}$$
for some constant $K > 0$ and
$$\Gamma_n = \Gamma_{n,0} = o(n_1^{d+(2/3)}). \tag{3.4}$$
Then if $x = o(n_1^{(3d+2)/(3d+3)-\eta})$ for some $\eta > 0$, we have
$$Q_n(x) \sim \Phi\big((A(x) - \Gamma_n)/(\Gamma^{(2)}_n)^{1/2}\big) + \Phi\big(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big) - 1; \tag{3.5}$$
if $x = o(n_2^{(3h+2)/(3h+3)-\eta})$ for some $\eta > 0$, it follows that
$$R_n(x) \sim \Phi\big((B(x) + \Gamma_n)/(\Gamma^{(2)}_n)^{1/2}\big) - \Phi\big(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big). \tag{3.6}$$

Following the proof of Theorem 2 we will present three corollaries in which the three subcases (a) $|\Gamma_n| = o(n_1^{d+(1/2)})$, (b) $n_1^{d+(1/2)} = o(|\Gamma_n|)$, (c) $1/K < |\Gamma_n|/n_1^{d+(1/2)} < K < \infty$ are treated separately. In case (a) we find that $Q_n(n_1) \sim R_n(n_2) \sim 1/2$ for large $n$ (recall that $Q_n(n_1)$ is the probability that pile II will be removed first) and that the size of the remaining pile is roughly of order $n_1^{(2d+1)/(2d+2)}$ or $n_2^{(2h+1)/(2h+2)}$, respectively. In case (b), assuming for the moment that $\Gamma_n > 0$, one has $Q_n(n_1) \to 1$, and Corollary 2 also gives a growth condition on $x$ under which the probability that pile I will survive with less than $x$ items is close to 1. In case (c) we arrive at asymptotic probabilities $Q_n(n_1)$ and $R_n(n_2)$ which are different from 0, 1/2 and 1 and can be approximated by the normal distribution.
For the proof we need the auxiliary sequence $b_{n,k}$ defined recursively by
$$b_{n,k} = p_n\,b_{(n_1-1,n_2),k} + q_n\,b_{(n_1,n_2-1),k}, \quad n > k \tag{3.7}$$
$$b_{k,k} = 1, \quad k \ge (1,1) \tag{3.8}$$
$$b_{n,k} = 0, \quad n \not\ge k \tag{3.9}$$
$$b_{(n,0),(k,0)} = b_{(0,n),(0,k)} = 0. \tag{3.10}$$
Here $q_n = 1 - p_n$ and $\ge$ is the natural ordering on $\mathbb{Z}_+^2$, so that $n \not\ge k$ means that $n_i < k_i$ for some $i \in \{1,2\}$, and we set $0 = (0,0)$. It is clear that $b_{n,(k,0)}$ is the probability that at the moment when the last item from pile II is removed, the remaining number of items in pile I is equal to $k$, given that $n_1, n_2$ are the initial pile sizes; $b_{n,(0,k)}$ is interpreted similarly. In the following two lemmas we give an integral representation and an asymptotic formula for $b_{n,k}$. Let
$$H_{n,k}(t) = \Big[\prod_{j=k_1}^{n_1}(1 - \alpha_j it)\prod_{l=k_2}^{n_2}(1 + \beta_l it)\Big]^{-1}, \quad n = (n_1,n_2),\ k = (k_1,k_2). \tag{3.11}$$
Lemma 1. The solution of (3.7)-(3.10) is given by
$$b_{n,k} = \frac{\alpha_{k_1} + \beta_{k_2}}{2\pi}\int_{-\infty}^{\infty}H_{n,k}(t)\,dt, \quad n > k. \tag{3.12}$$

Proof. It is not difficult to check that $H_{n,k}(t)$, considered as a function of $n$ and $k$, is a solution of (3.7) for every fixed value of $t$. Hence the integral $\lambda\int_{-\infty}^{\infty}H_{n,k}(t)\,dt$ (where $\lambda \in \mathbb{C}$ is arbitrary) is a solution of (3.7). A straightforward computation (using standard calculus of residues) then shows that $b_{n,k}$, as defined in (3.12), satisfies the boundary conditions. □

Lemma 2. Let $n$ tend to infinity such that (3.3) holds and assume that $k$ and $\Gamma_{n,k}$ satisfy
$$k_1 = o\big(n_1^{1-(3d+3)^{-1}-\eta}\big), \qquad k_2 = o\big(n_2^{1-(3h+3)^{-1}-\eta}\big) \tag{3.13}$$
$$\Gamma_{n,k} = o(n_1^{d+(2/3)}) \tag{3.14}$$
for some $\eta > 0$. Then
$$b_{n,k} \sim \big(\nu_k/(2\pi\Gamma^{(2)}_{n,k})^{1/2}\big)\exp\big(-\Gamma_{n,k}^2/2\Gamma^{(2)}_{n,k}\big). \tag{3.15}$$
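Proof. To simplify the notation, we will drop the subscripts $n$ and $k$ so that $H = H_{n,k}$, $\nu = \nu_n$, $\Gamma^{(m)} = \Gamma^{(m)}_{n,k}$, $\Gamma = \Gamma^{(1)}_{n,k}$. We will need the asymptotic relations
$$(\Gamma^{(2)})^{-1}\Gamma\nu \to 0 \tag{3.16}$$
$$\nu^2/(\Gamma^{(2)})^{1-\varepsilon} \to 0 \quad \text{for every } \varepsilon \in \big(0, (2d+1)^{-1}\big) \tag{3.17}$$
$$\nu\Gamma^3/(\Gamma^{(2)})^2 \to 0. \tag{3.18}$$
Indeed, (3.16)-(3.18) follow from $\nu = an_1^d + bn_2^h + O(n_1^{d-1} + n_2^{h-1})$ and the easily proved inequality $\Gamma^{(2)} \ge C_1n_1^{2d+1} + C_2n_2^{2h+1}$ for some $C_1, C_2 > 0$. We have to analyze the function $H$. Let $G(t) = \ln H(-it)$. By (3.11),
$$G'(t) = \sum_{j=k_1}^{n_1}\frac{\alpha_j}{1 - \alpha_jt} - \sum_{l=k_2}^{n_2}\frac{\beta_l}{1 + \beta_lt}. \tag{3.19}$$
The right-hand side of (3.19) tends to $-\infty$ as $t \downarrow -\beta_{n_2}^{-1}$, and to $\infty$ as $t \uparrow \alpha_{n_1}^{-1}$. Thus, $\frac{d}{dt}(\ln H)(-it)$ has a root $t_0 \in (-\beta_{n_2}^{-1}, \alpha_{n_1}^{-1})$. If there is not only one real root, we choose $t_0$ so as to minimize its absolute value. Expanding the right-hand side of (3.19) in ascending powers of $t$ yields
$$G'(t) = \Gamma + \sum_{m=1}^{\infty}\Gamma^{(m+1)}t^m, \quad |t| < \min[\beta_{n_2}^{-1}, \alpha_{n_1}^{-1}]. \tag{3.20}$$
If $\Gamma = 0$ it follows that $t_0 = 0$. Clearly, the root $t_0$ of (3.20) can be considered as an analytic function of $\Gamma$ so that we can write $t_0 = \sum_{m=1}^{\infty}\zeta_m\Gamma^m$, if $|\Gamma|$ is sufficiently small. From the relation
$$\sum_{m=0}^{\infty}\Gamma^{(m+1)}\Big(\sum_{k=1}^{\infty}\zeta_k\Gamma^k\Big)^m \equiv 0 \tag{3.21}$$
we can calculate recursively the coefficients $\zeta_k$ by expanding (3.21) in ascending powers of $\Gamma$. One finds that $\zeta_1 = -1/\Gamma^{(2)}$, $\zeta_2 = -\Gamma^{(3)}/(\Gamma^{(2)})^3$, and so on. By the definition of $\Gamma^{(m)}$ it is easily seen that
$$|\Gamma^{(m)}| \le (\alpha_{n_1}^{m-2} + \beta_{n_2}^{m-2})\Gamma^{(2)} = O(\nu^{m-2}\Gamma^{(2)}). \tag{3.22}$$
Using (3.22) and a simple induction argument it follows that
$$\zeta_m = O\big(\nu^{m-1}(\Gamma^{(2)})^{-m}\big). \tag{3.23}$$
From (3.22) and (3.16) we conclude that
$$t_0 = o(1/\nu). \tag{3.24}$$
Next we write the integral in (3.12) as
$$\int_{-\infty}^{\infty}H(t)\,dt = \int_{-\infty}^{\infty}H(t - it_0)\,dt. \tag{3.25}$$
We will show now that asymptotically this integral depends only on the behavior of $H$ in the vicinity of $-it_0$. First we have to notice that
$$\gamma_m := \frac{d^m}{dt^m}G(t_0) \sim (m-1)!\,\Gamma^{(m)}, \quad m \ge 2; \tag{3.26}$$
this follows by taking derivatives in (3.20) and utilizing (3.24). Now expanding the function $t \mapsto G(t_0 - it) = \ln H(t - it_0)$ in a Taylor series in the interval $|t| \le \gamma_2^{-(1-\eta)/2}$ yields
$$\int_{-\gamma_2^{-(1-\eta)/2}}^{\gamma_2^{-(1-\eta)/2}}H(t - it_0)\,dt = \int_{-\gamma_2^{-(1-\eta)/2}}^{\gamma_2^{-(1-\eta)/2}}\exp\Big(\ln H(-it_0) + \sum_{m=2}^{\infty}\frac{\gamma_m}{m!}(-it)^m\Big)dt$$
$$= \gamma_2^{-1/2}H(-it_0)\int_{-\tilde\gamma}^{\tilde\gamma}\exp\Big(-\frac{u^2}{2} + \frac{\gamma_3}{3!\,\gamma_2^{3/2}}(-iu)^3 + O\Big(\frac{\gamma_4}{\gamma_2^2}u^4\Big)\Big)du$$
$$\sim \gamma_2^{-1/2}H(-it_0)\int_{-\infty}^{\infty}\exp(-u^2/2)\,du = (2\pi/\gamma_2)^{1/2}H(-it_0), \tag{3.27}$$
where $\tilde\gamma = \gamma_2^{1/2}\gamma_2^{-(1-\eta)/2} = \gamma_2^{\eta/2} \to \infty$. Let us estimate the other contributions to the integral in (3.25). We may assume that $\alpha_j \ge 1$ and $\beta_l \ge 1$ for $j, l \ge 1$ (otherwise we can carry out a suitable change of variables in (3.25)). For the integrals over $[1,\infty)$ and $[\gamma_2^{-(1-\eta)/2}, 1)$ we get the following estimates (3.28) and (3.29):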
1
jH (t ? it0)j dt = =
Z 1
Z 1 "Y n1
1 Z 1
1
j =k1
2
n1 Y
4
l2t2 + (1 ? lt0)2
2j tj1 ? j t0j
l=k2 n2 Y
2j 2?(1?) + (1 ? j t0)2
= H (?it0)
n2 Y
dt
3?1=2
2 ltj1 + lt0j5
jH (t ? it0)j dt
j =k1
# ?1=2
dt
l=max(k2 ;1) j =max(k1 ;1) ? ( n ? k + n ? k ) = 2 ? 1 1 1 2 2 O 2 (n1 ? k1 + n2 ? k2) H (?it0) ;
2?(1?)=2 n1 Y
2j t2 + (1 ? j t0)2
n2 Y
l22?(1?) + (1 ? lt0)2
(3.28)
?1=2
l=k2 n2 Y ?1=2 ? (1?) ? (1?) ? 2 2 ? 2 2 +1 j (1 ? j t0) 2 +1 l (1 ? lt0) 2 : j =k1 l=k2
Y n1
(3.29)
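By (3.24), (3.26) and (3.17), we have
$$\alpha_j^2(1 - \alpha_jt_0)^{-2}\gamma_2^{-(1-\eta)} = O\big(\nu^2(\Gamma^{(2)})^{-(1-\eta)}\big) = o(1), \qquad \beta_l^2(1 + \beta_lt_0)^{-2}\gamma_2^{-(1-\eta)} = o(1).$$
Now we can use the inequality $2^{x/2} \le x + 1$ for $0 \le x \le 1$ and conclude from (3.29) that
$$\int_{\gamma_2^{-(1-\eta)/2}}^{1}|H(t - it_0)|\,dt = O\Big(H(-it_0)\,2^{-\gamma_2^{-(1-\eta)}\Gamma^{(2)}/4}\Big) = O\Big(H(-it_0)\,2^{-C(\Gamma^{(2)})^{\eta}}\Big) \tag{3.30}$$
for some $C > 0$. Estimates similar to (3.28) and (3.30) also hold for the corresponding integrals over $(-\infty, -1]$ and $(-1, -\gamma_2^{-(1-\eta)/2}]$. Using these relations and (3.27) yields
$$\int_{-\infty}^{\infty}H(t)\,dt \sim (2\pi/\gamma_2)^{1/2}H(-it_0). \tag{3.31}$$
We can expand $\ln H(-it_0)$ in ascending powers of $\Gamma$. From (3.20) it follows that
$$\ln H(-it_0) = \sum_{m=0}^{\infty}\frac{\Gamma^{(m+1)}}{m+1}t_0^{m+1} = t_0\Gamma + \sum_{m=1}^{\infty}\frac{\Gamma^{(m+1)}}{m+1}t_0^{m+1} = \sum_{m=1}^{\infty}r_m\Gamma^{m+1}.$$
Using the chain rule we find the coefficients $r_m$:
$$r_m = \frac{1}{(m+1)!}\Big[\frac{d^{m+1}}{d\Gamma^{m+1}}\big(\ln H(-it_0)\big)\Big]_{\Gamma=0} = \frac{1}{(m+1)!}\Big[\frac{d^m}{d\Gamma^m}\Big(t_0 + \Gamma\frac{dt_0}{d\Gamma} + \sum_{j=1}^{\infty}\Gamma^{(j+1)}t_0^j\frac{dt_0}{d\Gamma}\Big)\Big]_{\Gamma=0} = \frac{1}{(m+1)!}\Big[\frac{d^mt_0}{d\Gamma^m}\Big]_{\Gamma=0} = \frac{\zeta_m}{m+1}.$$
Thus,
$$H(-it_0) = \exp\Big(\frac{\zeta_1\Gamma^2}{2} + \sum_{m=2}^{\infty}\frac{\zeta_m}{m+1}\Gamma^{m+1}\Big). \tag{3.32}$$
But $\zeta_1 = -1/\Gamma^{(2)}$ and, by (3.23), (3.18) and (3.17),
$$\zeta_m\Gamma^{m+1} = O\big(\nu^{m-1}(\Gamma^{(2)})^{-m}\Gamma^{m+1}\big) = o\big((\nu^2/\Gamma^{(2)})^{(m-2)/3}\big) = o\big((\Gamma^{(2)})^{-\varepsilon(m-2)/3}\big),$$
so that (3.32) entails
$$H(-it_0) \sim \exp(-\Gamma^2/2\Gamma^{(2)}). \tag{3.33}$$
Inserting (3.33) in (3.31) and recalling (3.12) and $\gamma_2 \sim \Gamma^{(2)}$, we arrive at (3.15). □

Proof of Theorem 2. Setting $k_1 = k$, $k_2 = 0$ in Lemma 2 we can conclude that $Q_n(x)$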
$$\sim \sum_{k \le x}\alpha(k)\,(2\pi\Gamma^{(2)}_n)^{-1/2}\exp\{-(\Gamma_n - A(k))^2/2\Gamma^{(2)}_n\}, \tag{3.34}$$
provided that for $k \le x$
$$\Gamma_{n,(k,0)} \sim \Gamma_n - A(k) = o(n_1^{d+(2/3)}) \tag{3.35}$$
and
$$\Gamma^{(2)}_{n,(k,0)} \sim \Gamma^{(2)}_n. \tag{3.36}$$
Relation (3.35) follows from $\Gamma_n = o(n_1^{d+(2/3)})$ (by (3.4)) and
$$A(k) = O(k^{d+1}) = o\big(n_1^{[(3d+2)/(3d+3)-\eta](d+1)}\big) = o(n_1^{d+(2/3)}).$$
To see (3.36), note that $\Gamma^{(2)}_{n,(k,0)} \sim \Gamma^{(2)}_n - \int_0^k\alpha(t)^2\,dt$ and that for $k \le x$,
$$\int_0^k\alpha(t)^2\,dt = O(k^{2d+1}) = o(n_1^{2d+1}),$$
while $\Gamma^{(2)}_n \ge Cn_1^{2d+1}$ for some $C > 0$. Using the Euler-MacLaurin sum formula we find that, under our conditions,
$$Q_n(x) \sim \int_0^x\alpha(u)\,(2\pi\Gamma^{(2)}_n)^{-1/2}\exp\{-(\Gamma_n - A(u))^2/2\Gamma^{(2)}_n\}\,du = \int_{-\Gamma_n}^{A(x)-\Gamma_n}(2\pi\Gamma^{(2)}_n)^{-1/2}\exp\{-y^2/2\Gamma^{(2)}_n\}\,dy$$
$$= \Phi\big((A(x) - \Gamma_n)/(\Gamma^{(2)}_n)^{1/2}\big) - \Phi\big(-\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big).$$
This proves (3.5). The analogous relation (3.6) for $R_n(x)$ is shown similarly. □
Corollary 1. If we strengthen condition (3.4) and assume that
$$\Gamma_n = o(n_1^{d+(1/2)}), \tag{3.37}$$
it follows that for every $\eta > 0$,
$$Q_n\big(n_1^{(2d+1)/(2d+2)-\eta}\big) \to 0 \quad \text{and} \quad Q_n\big(n_1^{(2d+1)/(2d+2)+\eta}\big) \to 1/2 \tag{3.38}$$
$$R_n\big(n_2^{(2h+1)/(2h+2)-\eta}\big) \to 0 \quad \text{and} \quad R_n\big(n_2^{(2h+1)/(2h+2)+\eta}\big) \to 1/2. \tag{3.39}$$

Proof. Relation (3.37) entails $\Gamma_n = o((\Gamma^{(2)}_n)^{1/2})$, so that
$$Q_n(x) \sim \Phi\big(A(x)/(\Gamma^{(2)}_n)^{1/2}\big) - \tfrac12, \quad \text{if } x = o\big(n_1^{(3d+2)/(3d+3)-\eta}\big)$$
and
$$R_n(x) \sim \Phi\big(B(x)/(\Gamma^{(2)}_n)^{1/2}\big) - \tfrac12, \quad \text{if } x = o\big(n_2^{(3h+2)/(3h+3)-\eta}\big).$$
Since $A(x)$ and $B(x)$ are of order $x^{d+1}$ and $x^{h+1}$, respectively, and $\Gamma^{(2)}_n$ is of order $C_1n_1^{2d+1} + C_2n_2^{2h+1}$, the relations (3.38) and (3.39) follow immediately. □

Corollary 2. If we assume in Theorem 2 additionally that $|\Gamma_n| \to \infty$ such that
$$n_1^{d+(1/2)} = o(\Gamma_n), \tag{3.40}$$
then
$$\Gamma_n \to \infty \implies R_n(n_2) \to 0,\ Q_n(n_1) \to 1; \qquad \Gamma_n \to -\infty \implies R_n(n_2) \to 1,\ Q_n(n_1) \to 0. \tag{3.41}$$
Furthermore, we have $Q_n(x) \to 1$ if $\Gamma_n > 0$ and $x = x_n \to \infty$ such that
$$\Gamma^{(2)}_n = o\big((A(x) - \Gamma_n)^2\big). \tag{3.42}$$

Proof. Under the additional assumption (3.40), $\Phi(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}) \to 1$, so that (3.41) follows from (3.6). Moreover, (3.42) implies that $\Phi((A(x) - \Gamma_n)/(\Gamma^{(2)}_n)^{1/2}) \to 1$, so that $Q_n(x) \to 1$ is a consequence of (3.5). □

Corollary 3. Suppose that $|\Gamma_n|/n_1^{d+(1/2)}$ remains bounded away from zero and from infinity. Then for every $\eta > 0$,
$$Q_n(x) \sim \Phi\big(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big) \quad \text{if } x \ge n_1^{((2d+1)/(2d+2))+\eta} \tag{3.43}$$
$$R_n(y) \sim 1 - \Phi\big(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big) \quad \text{if } y \ge n_2^{((2h+1)/(2h+2))+\eta}. \tag{3.44}$$
In particular,
$$Q_n(n_1) = 1 - R_n(n_2) \sim \Phi\big(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big). \tag{3.45}$$

Proof. For $x \ge n_1^{((2d+1)/(2d+2))+\eta}$ we have
$$A(x) \ge A\big(n_1^{((2d+1)/(2d+2))+\eta}\big) \sim \frac{a}{d+1}\,n_1^{d+(1/2)+\eta'}$$
for some $\eta' > 0$. Therefore, under the new assumption on $\Gamma_n$ we have $(A(x) - \Gamma_n)/(\Gamma^{(2)}_n)^{1/2} \to \infty$ for those $x$. Thus if we additionally assume that $x = o(n_1^{((3d+2)/(3d+3))-\eta})$, we can conclude (3.43) from Theorem 2 (cf. (3.5)). Similarly, (3.6) implies that (3.44) is valid under the additional condition $y = o(n_2^{((3h+2)/(3h+3))-\eta})$. But if $x \ge n_1^{((2d+1)/(2d+2))+\eta}$, it now follows that
$$(1 + o(1))\,\Phi\big(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big) \sim 1 - R_n(n_2) = Q_n(n_1) \ge Q_n(x) \ge Q_n\big(n_1^{((2d+1)/(2d+2))+\eta}\big) \sim \Phi\big(\Gamma_n/(\Gamma^{(2)}_n)^{1/2}\big),$$
so that the extra condition restricting the growth of $x$ is superfluous. The same argument allows us to drop the additional assumption on $y$; (3.43) and (3.44) are proved. □

Under the conditions of Corollary 3, we have thus found limits for $Q_n(x)$ which are different from 0, 1/2, and 1 and for which a normal approximation can be used.

Now let us consider a situation in which $\Gamma_n$ is not of smaller order than $n_1^{d+(2/3)}$, as assumed in Theorem 2 and its corollaries. Suppose, additionally to the growth conditions (3.1), (3.2) and (3.3) on $n_1$ and $n_2$, that $d = h$ and that
$$c_n = \sum_{j=1}^{n_1}\alpha_j\Big/\sum_{l=1}^{n_2}\beta_l \to c \ne 1. \tag{3.46}$$
Without loss of generality we may assume that $c > 1$. In view of (3.1) and (3.2), assumption (3.46) is tantamount to $(n_1/n_2)^{d+1} \to bc/a$. Let $X^{(1)}_n$ be the first component of $X_n$. It is clear that if (3.46) holds with $c > 1$, the process terminates at a point $(j,0)$ with probability close to 1. Obviously, (3.46) implies that $\Gamma_n \sim (1 - c^{-1})\,an_1^{d+1}/(d+1)$. We show that $n_1^{-1}X^{(1)}_n$, the fraction remaining from the initial size of pile I, converges in probability to $(1 - c^{-1})^{1/(d+1)}$. The following theorem also gives some information about the rate of convergence.

Theorem 3. Under the above assumptions, we have for every $\eta > 0$,
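$$n_1^{(1/2)-\eta}\,\big|n_1^{-1}X^{(1)}_n - (1 - c^{-1})^{1/(d+1)}\big| \xrightarrow{P} 0. \tag{3.47}$$

Proof. As we want to apply Lemma 2, we have to ensure that
$$\Gamma_{n,(k,0)} = \Gamma_n - A(k) + O(n_1^d) \tag{3.48}$$
satisfies $\Gamma_{n,(k,0)} = o(n_1^{d+(2/3)})$. Thus, $k$ has to be chosen such that $A(k) = \Gamma_n + o(n_1^{d+(2/3)})$, which is tantamount to
$$(a/(d+1))\,k^{d+1} = \Gamma_n + o(n_1^{d+(2/3)}). \tag{3.49}$$
We also need $\Gamma^{(2)}_{n,(k,0)}$. A short computation shows that
$$\Gamma^{(2)}_{n,(k,0)} = \Gamma^{(2)}_n - \int_0^k\alpha(t)^2\,dt + O(n_1^{2d}) = \Gamma^{(2)}_n - \frac{a^2}{2d+1}k^{2d+1} + O(n_1^{2d}) = \tilde\Gamma + o(n_1^{2d+(2/3)}),$$
where $\tilde\Gamma \sim Cn_1^{2d+1}$ for some constant $C > 0$. Using Lemma 2 we obtain
$$b_{n,(k,0)} \sim \alpha(k)\,(2\pi\tilde\Gamma)^{-1/2}\exp\{-(\Gamma_n - A(k))^2/2\tilde\Gamma\}$$
for values of $k$ satisfying (3.48). Therefore, if $s$ and $t$ are positive numbers that are both of order $o(n_1^{d+(2/3)})$, we find as in the proof of Theorem 2 that
$$P\big(X^{(1)}_n \in \{k \in \mathbb{N} \mid A(k) \in (\Gamma_n - s, \Gamma_n + t]\}\big) = \sum_{k:\,A(k)\in(\Gamma_n-s,\Gamma_n+t]}b_{n,(k,0)} \sim \int_{\Gamma_n-s}^{\Gamma_n+t}(2\pi\tilde\Gamma)^{-1/2}\exp\{-(\Gamma_n - x)^2/2\tilde\Gamma\}\,dx$$
$$= \Phi(s/\tilde\Gamma^{1/2}) + \Phi(t/\tilde\Gamma^{1/2}) - 1. \tag{3.50}$$
Now let us set $s = t = n_1^{d+(1/2)+\eta}$, where $\eta > 0$ is arbitrary. Then $\Phi(s/\tilde\Gamma^{1/2}) = \Phi(t/\tilde\Gamma^{1/2}) \sim \Phi(n_1^{\eta}C^{-1/2}) \to 1$, so that the right-hand side of (3.50) tends to 1. Note that
$$\Gamma_n = \frac{a}{d+1}(1 - c^{-1})\,n_1^{d+1} + O(n_1^d), \qquad A(k) = \frac{a}{d+1}k^{d+1} + O(k^d).$$
Thus, the inequality $|\Gamma_n - A(k)| \le s$ implies the relation
$$n_1^{(1/2)-\eta'}\,\Big|\frac{k}{n_1} - (1 - c^{-1})^{1/(d+1)}\Big| = o(1) \quad \text{for every } \eta' > \eta.$$
But by our choice of $s$ and $t$, the probability of the event $|\Gamma_n - A(X^{(1)}_n)| \le s$ tends to 1. Hence, for every $\eta' > \eta > 0$ and $\varepsilon > 0$ we have
$$\liminf P\Big(n_1^{(1/2)-\eta'}\,\big|n_1^{-1}X^{(1)}_n - (1 - c^{-1})^{1/(d+1)}\big| \le \varepsilon\Big) \ge \liminf P\big(|\Gamma_n - A(X^{(1)}_n)| \le s\big) = 1.$$
□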
4 Some special cases

In the Introduction we have mentioned the example $\alpha_n = an$, $\beta_n = bn$ for two constants $a, b > 0$. Since in this case $d = h = 1$ and, by a short calculation, $\Gamma_n \sim (an_1^2 - bn_2^2)/2$, $\Gamma^{(2)}_n \sim (a^2n_1^3 + b^2n_2^3)/3$, the results presented there follow from the corollaries and Theorem 3.

Let us also consider the classical case of constant $p_n \equiv p \in (0,1)$. Define $\varepsilon_n = (n_1/n_2) - p(1-p)^{-1}$. The following straightforward consequences of the results of Section 3 are new:

(1) If $\varepsilon_n = o(n_2^{-1/2})$, then $Q_n(n_1^{(1/2)+\eta}) \to 1/2$, $Q_n(n_1^{(1/2)-\eta}) \to 0$, $R_n(n_2^{(1/2)+\eta}) \to 1/2$, $R_n(n_2^{(1/2)-\eta}) \to 0$ for every $\eta > 0$.
(2) If $n_2^{1/2}\varepsilon_n \to d \in (0,\infty)$, then $Q_n(n_1) \to \Phi(d(1-p)/p^{1/2})$.
(3) If $n_2^{1/2}\varepsilon_n \to \infty$, then $Q_n(n_1) \to 1$.
(4) If $n_1/n_2 \to cp/(1-p)$ for some $c > 1$, then $n_1^{(1/2)-\eta}(n_1^{-1}X^{(1)}_n - 1 + c^{-1}) \to 0$ in probability for every $\eta > 0$.

Our next example concerns proportional removal probabilities. Let
$$p_n = 1 - n_1/(n_1 + n_2 - 1), \quad n_1 + n_2 > 1 \tag{4.1}$$
and $p_n = 1/2$ for $n_1 + n_2 \le 1$. Note that if $\tilde p_{n_1,n_2} = n_2/(n_1 + n_2)$, the corresponding probability $\tilde P_N(N_1)$ that pile II is emptied first is given by $\tilde P_N(N_1) = P_{N+1}(N_1)$, so that this case is also covered. The corresponding Markov chain $Y_N$, as defined in Section 2, has a simple transition law. It is given by $Y_0 \equiv 0$; $Y_1 = 0$ or $1$ each with probability 1/2, $Y_2 \equiv 1$ and
$$P(Y_{N+1} = m+1 \mid Y_N = m) = 1 - \frac{m}{N} = 1 - P(Y_{N+1} = m \mid Y_N = m), \quad N \ge 2.$$
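The transition law above determines the distribution of $Y_N$ for every $N \ge 2$, and it can be tabulated exactly with rational arithmetic. The sketch below is an illustration with hypothetical function names; it starts from $Y_2 \equiv 1$, uses the step-up probability $1 - m/N$, and compares the result with the closed form stated as (4.5) in this section. The mean $N/2$ and variance $N/12$ checked at the end are the centering and scaling constants that appear in Theorem 6.

```python
from fractions import Fraction
from math import comb, factorial


def chain_distribution(N):
    """Exact law of Y_N for N >= 2, as a dict {m: P(Y_N = m)}."""
    dist = {1: Fraction(1)}                    # Y_2 is identically 1
    for M in range(2, N):                      # build the law of Y_{M+1} from Y_M
        nxt = {}
        for m, pr in dist.items():
            up = Fraction(M - m, M)            # P(Y_{M+1} = m+1 | Y_M = m) = 1 - m/M
            stay = 1 - up
            if up:
                nxt[m + 1] = nxt.get(m + 1, Fraction(0)) + pr * up
            if stay:
                nxt[m] = nxt.get(m, Fraction(0)) + pr * stay
        dist = nxt
    return dist


def closed_form(N, m):
    """Formula (4.5): (1/(N-1)!) * sum_{j<m} (-1)^j * C(N, j) * (m - j)^(N-1)."""
    total = sum((-1) ** j * comb(N, j) * (m - j) ** (N - 1) for j in range(m))
    return Fraction(total, factorial(N - 1))


if __name__ == "__main__":
    N = 10
    dist = chain_distribution(N)
    mean = sum(m * pr for m, pr in dist.items())
    var = sum((m - mean) ** 2 * pr for m, pr in dist.items())
    print(mean, var)
    print(all(dist[m] == closed_form(N, m) for m in dist))
```

Exact fractions are used so that the agreement with (4.5) is an identity rather than a floating-point coincidence.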
We set $\psi_1(\theta) = \theta$ and introduce the generating function $\psi_N(\theta) = E(\theta^{Y_N})$, $N \ge 2$, of $Y_N$.

Theorem 4. Let $R(\theta,x) = \sum_{N=1}^{\infty}\psi_N(\theta)x^{N-1}$ for $|\theta|, |x| < 1$. Then
$$R(\theta,x) = \frac{1-\theta}{1 - \theta e^{x(1-\theta)}} - 1 + \theta. \tag{4.2}$$

Proof. Use the recursion
$$P(Y_{N+1} = m) = \frac{N+1-m}{N}P(Y_N = m-1) + \frac{m}{N}P(Y_N = m), \quad m = 1, \ldots, N \tag{4.3}$$
to find that
$$\psi_{N+1}(\theta) = \theta\psi_N(\theta) + \frac{\theta(1-\theta)}{N}\psi_N'(\theta) \quad \text{for } \theta \in \mathbb{R},\ N \ge 1. \tag{4.4}$$
Summing (4.4) we find that $R(\theta,x)$ satisfies the partial differential equation
$$\theta R(\theta,x) + \theta(1-\theta)\frac{\partial R}{\partial\theta}(\theta,x) - (1 - \theta x)\frac{\partial R}{\partial x}(\theta,x) = 0$$
and the boundary condition $R(\theta,0) = \theta$. The solution is given by (4.2). □

Theorem 5. For $N \ge 2$ the distribution, the probability-generating function and the characteristic function $\chi_N(\tau) = E(e^{i\tau Y_N})$ of $Y_N$ are given by
$$P(Y_N = m) = \frac{1}{(N-1)!}\sum_{j=0}^{m-1}(-1)^j\binom{N}{j}(m-j)^{N-1}, \quad m = 1, \ldots, N-1 \tag{4.5}$$
$$\psi_N(\theta) = \frac{(1-\theta)^N}{(N-1)!}\sum_{j=1}^{\infty}j^{N-1}\theta^j, \quad |\theta| < 1 \tag{4.6}$$
$$\chi_N(\tau) = \frac{(-1)^{N-1}}{(N-1)!}\,e^{iN\tau/2}\sin^N(\tau/2)\,\frac{d^{N-1}}{dx^{N-1}}\cot x\Big|_{x=\tau/2}, \quad \tau \in \mathbb{R}. \tag{4.7}$$

Proof. Equation (4.6) follows from the power series expansion of the right-hand side of (4.2). Next, (4.5) follows from expanding (4.6) in ascending powers of $\theta$. From (4.4) it can be deduced that $\chi_{N+1}(\tau) = e^{i\tau}\chi_N(\tau) + iN^{-1}(e^{i\tau} - 1)\chi_N'(\tau)$, $\tau \in \mathbb{R}$. The solution of this recursion, starting from $\chi_1(\tau) = e^{i\tau/2}\cos(\tau/2)$, is given by (4.7). □

Now we use (4.7) to prove a central limit theorem of the Berry-Esseen type for $P_N(\cdot)$.

Theorem 6. For the sequence $p_n$ given by (4.1) there is a constant $K > 0$ such that
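$$\sup_{x\in\mathbb{R}}\Big|P_N\Big(\frac{N}{2} + (N/12)^{1/2}x\Big) - \Phi(x)\Big| \le KN^{-1/2}, \quad N \in \mathbb{N}. \tag{4.8}$$

Proof. Consider $f_N(\tau) = \chi_N(\tau)e^{-iN\tau/2}$. Using (4.7) and taking derivatives in the series expansion
$$x\cot x = \sum_{n=0}^{\infty}(-1)^n\frac{2^{2n}B_{2n}}{(2n)!}x^{2n}, \quad |x| < \pi,$$
a straightforward computation yields
$$f_N(\tau) = \Big(\frac{\sin(\tau/2)}{\tau/2}\Big)^N\big[1 + (-1)^Nh_N(\tau)\big], \quad |\tau| < 2\pi \tag{4.9}$$
where we have set
$$h_N(\tau) = \begin{cases}\dfrac{1}{(2j)!}\displaystyle\sum_{n=1}^{\infty}\dfrac{|B_{2(j+n)}|}{(2n-1)!\,2(j+n)}\,\tau^{2(j+n)}, & \text{if } N = 2j+1,\\[2ex] \dfrac{1}{(2j+1)!}\displaystyle\sum_{n=0}^{\infty}\dfrac{|B_{2(j+n+1)}|}{(2n)!\,2(j+n+1)}\,\tau^{2(j+n+1)}, & \text{if } N = 2j+2.\end{cases} \tag{4.10}$$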
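The Taylor expansion of $\varphi_N(\tau) = ((2/\tau)\sin(\tau/2))^N$ shows that there is an $a > 0$ such that
$$\varphi_N\big((12/N)^{1/2}\tau\big) = \exp\Big(-\frac{\tau^2}{2} + O(\tau^4N^{-1})\Big) \le e^{-a\tau^2} \quad \text{for all } \tau \in [0, \pi(N/12)^{1/2}]. \tag{4.11}$$
Thus if $\tau \in [0, \pi(N/12)^{1/2}]$, the monotonicity of $h_N$ and (4.11) imply that
$$\big|f_N((12/N)^{1/2}\tau) - e^{-\tau^2/2}\big| \le \big|\varphi_N((12/N)^{1/2}\tau) - e^{-\tau^2/2}\big| + \big|\varphi_N((12/N)^{1/2}\tau)\big|\,h_N((12/N)^{1/2}\tau) \le e^{-a\tau^2} + e^{-\tau^2/2} + h_N(\pi)e^{-a\tau^2}. \tag{4.12}$$
Let us next find an upper bound for $h_N(\pi)$. For odd $N$ it follows from (4.7) that $f_N(\tau)$ is a polynomial in $\cos(\tau/2)$, so that $f_N(\pi) = 0$. By (4.9), this implies that $h_N(\pi) = 1$ for odd $N$. Now let $N = 2j+2$. Denote the coefficients in (4.10) by $\gamma^{(N)}_n$ so that
$$h_{2j+1}(\tau) = \sum_{n=1}^{\infty}\gamma^{(2j+1)}_n\tau^{2(j+n)}, \qquad h_{2j+2}(\tau) = \sum_{n=0}^{\infty}\gamma^{(2j+2)}_n\tau^{2(j+n+1)}.$$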
Using the famous identity $B_{2k} = (-1)^{k-1}(2k)!\,2^{1-2k}\pi^{-2k}\sum_{n=1}^{\infty}n^{-2k}$, we obtain
$$\frac{\gamma^{(2j+2)}_{n+1}\pi^{2(j+n+2)}}{\gamma^{(2j+2)}_n\pi^{2(j+n+1)}} = \frac{(2j+1)!\,(2n)!\,2(j+n+1)\,|B_{2(j+n+2)}|\,\pi^{2(j+n+2)}}{(2j+1)!\,(2(n+1))!\,2(j+n+2)\,|B_{2(j+n+1)}|\,\pi^{2(j+n+1)}}$$
$$= \frac{(j+n+1)(2j+2n+3)}{2(2n+1)(2n+2)}\;\sum_{k=1}^{\infty}k^{-2(j+n+2)}\Big/\sum_{k=1}^{\infty}k^{-2(j+n+1)} \le \frac{(j+n+2)(2j+2n+3)}{2(2n+1)(2n+2)}. \tag{4.13}$$
For $n > 2j$ the right-hand side of (4.13) is bounded by $3/4$. Further, note that $\gamma^{(2j+2)}_n = (2n+1)\gamma^{(2j+1)}_{n+1}/(2j+1)$. Applying (4.13) we find that
h2j+2() =
2j (2j +2) P 2(j+n+1) +
n=1
n
1 P n=2j +1
2(n j+2)2(j+n+1)
1 n (j + m + 2)(2j + 2m + 3) Q 2n + 1 (2j+1) 2(j+n+1) P n+1 + 2(2jj+1+2)2(3j+2) 2(2m + 1)(2m + 2) n=1 2j + 1 n=2j +1 m=2j +1
1 (2j +1) P n 2(j+n) +
2j P
n=1
1 P
k !
3 4
k=1
4j + 3 (2j+1) 6j+4 2j + 1 2j+2
$$\le h_{2j+1}(\pi) + 9h_{2j+1}(\pi) = 10. \tag{4.14}$$
Furthermore, if $0 < \tau < N^{1/4}$,
$$(N/\tau^4)\,h_N\big((12/N)^{1/2}\tau\big) = o(1), \quad \text{as } N \to \infty. \tag{4.15}$$
Relation (4.15) can be seen as follows. Without loss in generality let us consider the case N = 2j + 1; in the range 2 (0; N 1=4) we have = (12=N )1=2 = o(1). It is easy to check that jB2(j+n)j K (2(j + n))! 1 2(j +n)?1 2(j +n) (2j )!(2n ? 1)! 2 (2j )! (2n ? 1)! 2j 2n n j ? 2( j + n ) K2(2) n 1+ j 1 + n < K3 n(e=2)2(j+n): Here (and in the following) K1; K2; : : : are constants. We can thus conclude that (N= 4)hN ((12=N )1=2 ) K3
1 NP n 2(j+n) 2(j +n) ( e= 2 )) 4 n=1 2(j + n)
1
K4N ?1 P (e=2))2(j+n) 2(j+n?2) = o(N ?1 ); 0 < < N 1=4: n=1
Now we use Esseen's smoothing inequality (Chung [10, p. 208]) for $f_N$, considered as the characteristic function of the distribution function $x \mapsto P_N\big(\frac{N}{2} + (N/12)^{1/2}x\big)$, $x \in \mathbb{R}$:
$$\sup_{x\in\mathbb{R}}\Big|P_N\Big(\frac{N}{2} + \Big(\frac{N}{12}\Big)^{1/2}x\Big) - \Phi(x)\Big| \le C\inf_{T>0}\Big[\int_0^T\tau^{-1}\big|f_N((12/N)^{1/2}\tau) - e^{-\tau^2/2}\big|\,d\tau + \frac{1}{T}\Big]$$
$$\le C\Big[\int_0^{\pi(N/12)^{1/2}}\tau^{-1}\Big(\big|\varphi_N((\tfrac{12}{N})^{1/2}\tau) - e^{-\tau^2/2}\big| + \big|\varphi_N((\tfrac{12}{N})^{1/2}\tau)\big|\,h_N((\tfrac{12}{N})^{1/2}\tau)\Big)d\tau + \frac{1}{\pi}\Big(\frac{12}{N}\Big)^{1/2}\Big]$$
$$\le K_5\Big[N^{-1}\int_0^{N^{1/4}}\tau^3e^{-\tau^2/2}\,d\tau + \int_{N^{1/4}}^{\pi(N/12)^{1/2}}\big(e^{-a\tau^2} + e^{-\tau^2/2}\big)\,d\tau + N^{-1/2}\Big] \le KN^{-1/2}.$$
For the second inequality we have set $T = \pi(N/12)^{1/2}$ and used (4.15) and (4.11) for the range $\tau \in [0, N^{1/4}]$ and (4.12), (4.14) and $h_{2j+1}(\pi) = 1$ for the range $\tau \in [N^{1/4}, \pi(N/12)^{1/2}]$. The Theorem is proved. □

Finally, let us return to the problem of transparent matchboxes introduced by Knuth [17] and Stirzaker [26]: Matches are taken successively from two boxes (each containing initially $n$ matches) according to the following rule. (a) If one box contains more matches than the other one, the next match is taken from this box with probability $p$. (b) If both boxes contain the same number of matches, both boxes are chosen with equal probability 1/2. Let $X_n$ be defined as in Section 3 for $n = (n,n)$ and let $U_n$ be its non-zero component (which is equal to the number of matches left when one of the boxes is emptied). For the generating functions of the $U_n$'s Stirzaker [26] derives the neat formula
$$\sum_{n=1}^{\infty}\sum_{k=1}^{n}P(U_n = k)\,x^nu^k = \frac{pqxu}{(p - \varphi(pqx))(q - \varphi(pqx)u)}, \quad |x| < 1,\ |u| \le 1,$$
where $q = 1 - p$ and $\varphi(x) = \big(1 - (1 - 4x)^{1/2}\big)/2$. Now consider the asymptotic distribution of $U_n$. Clearly, the cases of negative, zero and positive drift have to be distinguished.

Case 1. $0 < p < q$. For this case Stirzaker [26] shows that
\[
\lim_{n \to \infty} P(U_n = k) = p^{-1}(q - p)(p/q)^{k}, \qquad k \ge 1.
\]

Case 2. $p > q > 0$. If there is an upward drift we can apply Theorem 1 of Bender [3] on the asymptotic normality of double sequences of non-negative numbers. All conditions of this theorem are satisfied for the sequence $a_n(k) = P(U_n = k)$. The function $r$ needed in Bender's result is in our case given as the unique positive root $x = r(s)$ of the equation
\[
1 - q^{-1} \varphi(pqx)\, e^{s} = 0 \tag{4.16}
\]
in a neighbourhood of $x_0 = r(0) = 1$. Solving (4.16) for $x$ yields
\[
r(s) = \big[1 - (1 - 2qe^{-s})^{2}\big]/(4pq) = p^{-1}\big(e^{-s} - qe^{-2s}\big) \qquad \text{for } s > \ln(2q),
\]
and we have
\[
r(0) = 1, \qquad r'(0) = (2q - 1)/p, \qquad r''(0) = (1 - 4q)/p.
\]
It follows that
\[
\Big(U_n + n\,\frac{r'(0)}{r(0)}\Big) \Big/ \Big[ n^{1/2} \Big( \Big(\frac{r'(0)}{r(0)}\Big)^{2} - \frac{r''(0)}{r(0)} \Big)^{1/2} \Big]
\]
is asymptotically $N(0,1)$-distributed; after simplification we obtain
\[
\big(pU_n - (p - q)n\big)/(nq)^{1/2} \xrightarrow{\;D\;} N(0,1).
\]

Case 3. $p = q = 1/2$. Then if $(S_n)_{n \ge 1}$ denotes the symmetric random walk with steps $\pm 1$ and $(S'_n)_{n \ge 1}$ is the corresponding random walk with reflection at 0, we have for $k \ge 2$
\[
P(U_n = k) = \tfrac{1}{2}\, P(S'_{2n-k-1} = k - 1) = P(S_{2n-k-1} = k - 1) = \frac{1}{2^{2n-k-1}} \binom{2n-k-1}{n-1}, \tag{4.17}
\]
where we have used the reflection principle for the second equation. Let $s > 0$ and let $k_n$ be an arbitrary sequence of integers such that $k_n/n^{1/2} \to s$. Using Stirling's formula for the right-hand side of (4.17) it is easy to see that
\[
P(U_n = k_n) \sim (\pi n)^{-1/2} \exp(-s^{2}/4), \qquad \text{as } n \to \infty, \tag{4.18}
\]
in the sense that the ratio of both sides tends to 1. By standard arguments as in the classical proof of the Central Limit Theorem of de Moivre and Laplace it follows from (4.18) that
\[
P\big(U_n/n^{1/2} \le s\big) \to \int_0^{s} \pi^{-1/2} e^{-x^{2}/4}\, dx \qquad \text{for all } s \ge 0.
\]
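The three cases can be cross-checked numerically for moderate $n$. The following Python sketch is not part of the paper; it computes the exact distribution of $U_n$ by propagating probabilities over the pairs of box contents. It assumes that in rule (a) the probability $p$ belongs to the box holding fewer matches, which is the reading under which $p < q$ gives the bounded geometric limit of Case 1 and $p > q$ gives the linear growth of Case 2. All function names are illustrative.

```python
from fractions import Fraction
from math import comb


def leftover_distribution(n, p):
    """Exact law of U_n as a dict {k: P(U_n = k)}.

    ASSUMPTION (an interpretation of rule (a), not stated verbatim in the
    text): when the boxes differ, the match is taken from the box holding
    FEWER matches with probability p and from the fuller box with
    probability q = 1 - p.
    """
    q = 1 - p
    level = {(n, n): 1}  # states (a, b) with a >= b >= 1, all with the same a + b
    dist = {}
    while level:
        nxt = {}
        for (a, b), w in level.items():
            if a == b:
                # Tie: either box with probability 1/2; by symmetry both
                # choices lead to the same unordered state (a, a - 1).
                moves = [((a, b - 1), w)]
            else:
                moves = [((a, b - 1), w * p), ((a - 1, b), w * q)]
            for (c, d), v in moves:
                if d == 0:  # one box has just been emptied, c matches remain
                    dist[c] = dist.get(c, 0) + v
                else:
                    nxt[(c, d)] = nxt.get((c, d), 0) + v
        level = nxt
    return dist


def symmetric_formula(n, k):
    """Right-hand side of (4.17): 2^{-(2n-k-1)} * C(2n-k-1, n-1)."""
    m = 2 * n - k - 1
    return Fraction(comb(m, n - 1), 2 ** m)


def geometric_limit(p, k):
    """Case 1 limit p^{-1} (q - p) (p/q)^k for 0 < p < q."""
    q = 1 - p
    return (q - p) / p * (p / q) ** k


def mean_and_variance(dist):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(k * w for k, w in dist.items())
    var = sum((k - mean) ** 2 * w for k, w in dist.items())
    return mean, var


if __name__ == "__main__":
    n = 200
    # Case 3: exact comparison with the binomial expression.
    sym = leftover_distribution(10, Fraction(1, 2))
    print(all(sym[k] == symmetric_formula(10, k) for k in range(1, 11)))
    # Case 1: finite-n probabilities next to the geometric limit.
    low = leftover_distribution(n, 0.25)
    print([(k, low[k], geometric_limit(0.25, k)) for k in (1, 2, 3)])
    # Case 2: mean and variance next to n(p - q)/p and nq/p^2, the centring
    # and scaling implied by (pU_n - (p - q)n)/(nq)^{1/2}.
    p = 0.75
    mean, var = mean_and_variance(leftover_distribution(n, p))
    print(mean, n * (2 * p - 1) / p, var, n * (1 - p) / p ** 2)
```

Passing `Fraction` values for `p` keeps the arithmetic exact, which suits the comparison with the binomial expression in Case 3; floats are faster for the larger `n` used to illustrate Cases 1 and 2. The comparison for $p \ne 1/2$ only illustrates the limits at one finite $n$ and says nothing about rates of convergence.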
References
1. Anderson, K., Sobel, M. and Uppuluri, V.R.R. (1982) Quota fulfilment times. Canad. J. Statist. 10, 73-88.
2. Ben-David, A. and Jagerman, D.L. (1993) A non-deterministic approach towards conflict resolution in inconsistent training sets. Preprint.
3. Bender, E.A. (1973) Central and local limit theorems applied to asymptotic enumeration. J. Combinatorial Theory (A) 15, 91-111.
4. Berg, S. (1987) A variant of Banach's match box problem. Ann. Discr. Math. 33, 1-8.
5. Billard, L. (1974) Competition between two species. Stoch. Proc. Appl. 2, 391-398.
6. Billard, L. (1977) On Lotka-Volterra predator prey models. J. Appl. Prob. 14, 375-381.
7. Billard, L. (1981) Generalized two-dimensional birth and death processes and some applications. J. Appl. Prob. 18, 335-347.
8. Billard, L. and Kryscio, R.J. (1977) The transition probabilities of a bounded bivariate pure death process. Math. Biosciences 37, 205-221.
9. Chesson, J. (1976) A non-central multivariate hypergeometric distribution arising from biased sampling with application to selective predation. J. Appl. Prob. 13, 795-797.
10. Chung, K.L. (1968) A Course in Probability Theory. Harcourt, Brace & World, New York etc.
11. Feller, W. (1968) An Introduction to Probability Theory and Its Applications, Vol. 1. 3rd ed., Wiley, New York.
12. Goczyla, K. (1986) The generalized Banach match-box problem. Acta Appl. Math. 5, 27-36.
13. Groeneveld, R.A. and Arnold, B.C. (1984) Limit laws in the best of 2n - 1 Bernoulli trials. Nav. Res. Logistics Quarterly 31, 275-281.
14. Harris, B. (1971) Generalized Banach match box problems and asymptotic distributions for inverse multinomial sampling. Bull. Inst. Internat. Statist. 44, 202-206.
15. Hitchcock, S.E. (1986) Extinction probabilities in predator-prey models. J. Appl. Prob. 23, 1-13.
16. Holst, L. (1989) A note on Banach's match box problem. Statist. Prob. Letters 8, 441-443.
17. Knuth, D. (1984) The toilet paper problem. Amer. Math. Monthly 91, 465-470.
18. Maisel, H. (1966) Best k of 2k - 1 comparisons. J. Amer. Statist. Ass. 61, 329-344.
19. Mendelson, H., Pliskin, J.S. and Yechiali, U. (1980) A stochastic allocation problem. Oper. Res. 28, 687-693.
20. Menon, V.V. and Indira, N.K. (1983) On the asymptotic normality of the number of replications of a paired comparison. J. Appl. Prob. 20, 554-562.
21. Nagaraja, H.N. and Chan, W.T. (1989) On the number of games played in the best of (2n - 1) series. Nav. Res. Logistics 36, 297-310.
22. Ridler-Rowe, C.J. (1978) On competition between two species. J. Appl. Prob. 15, 457-465.
23. Severo (1969) A recursion theorem on solving differential-difference equations and applications to some stochastic processes. J. Appl. Prob. 6, 673-681.
24. Sigrist, K. (1989) n-point, win-by-k games. J. Appl. Prob. 26, 807-814.
25. Stadje, W. (1986) A note on the Neyman-Pearson fundamental lemma. Methods Oper. Res. 53, 661-670.
26. Stirzaker, D. (1988) A generalization of the matchbox problem. Math. Sci. 13, 104-114.
27. Uppuluri, V.R.R. and Blot, W.J. (1974) Asymptotic properties of the number of replications of a paired comparison. J. Appl. Prob. 11, 43-52.