Statistical Papers 46, 411-432 (2005)
Statistical Papers © Springer-Verlag 2005
On binomial and circular binomial distributions of order k for l-overlapping success runs of length k

Frosso S. Makri and Andreas N. Philippou
Department of Mathematics, University of Patras, 26500 Patras, Greece
Received: June 17, 2003; revised version: January 27, 2004
The number of l-overlapping success runs of length k in n trials, which was introduced and studied recently, is reconsidered here in the Bernoulli case, and two exact formulas are derived for its probability distribution function, in terms of multinomial and binomial coefficients, respectively. A recurrence relation for this distribution, as well as its mean, is also obtained. Furthermore, the number of l-overlapping success runs of length k in n Bernoulli trials arranged on a circle is considered for the first time, and its probability distribution function and mean are derived. Finally, the latter distribution is related to the former, two open problems regarding limiting distributions are stated, and numerical illustrations are given in two tables. All results are new, and they unify and extend several results of various authors on binomial and circular binomial distributions of order k.
Keywords and phrases: Binomial distributions of order k, circular, success runs, nonoverlapping, overlapping, l-overlapping, recurrence relations, occupancy problem.

1. Introduction

Let $N_{n,k}$ denote the number of nonoverlapping success runs of length k ($k \ge 1$) in n ($n \ge 1$) independent trials with success probability p ($0 < p < 1$). The distribution of $N_{n,k}$ is known as the binomial distribution of order k with parameter vector (n, p). The asymptotic normality of a normalized version of $N_{n,k}$ was first established by von Mises (see Feller (1968, p. 324), where a simpler proof is presented). The exact distribution of $N_{n,k}$ was derived by Hirano (1986) and Philippou and Makri (1986). Since then, several papers have appeared on the binomial distribution of order k and its applications, especially in system reliability (see, e.g., Philippou (1986), Aki and Hirano (1988), Godbole
(1990), Papastavridis (1990), Hirano and Aki (1993), Antzoulakos and Chadjiconstantinidis (2001) and Balakrishnan and Koutras (2002)). See also Eryilmaz (2003) for the distribution and expectation of the number of success runs in nonhomogeneous Markov dependent trials. A different type of binomial distribution of order k, called type II, was introduced and studied by Ling (1988) as the distribution of the number $M_{n,k}$ of overlapping success runs of length k in a sequence of n Bernoulli trials (see also Hirano et al. (1991)). When the trials are ordered on a circle, two circular binomial distributions of order k have been introduced and studied by Makri and Philippou (1994) (see also Charalambides (1994), Koutras et al. (1994, 1995) and Makri and Philippou (1996)). Recently, Aki and Hirano (2000) introduced a generalized counting scheme, called l-overlapping, which includes the nonoverlapping and the overlapping schemes as special cases; here l is a nonnegative integer less than k. The number of l-overlapping success runs of length k is the number of success runs of length k each of which may overlap the previously enumerated success run of length k in at most l trials. For l = 0 and l = k - 1, the nonoverlapping and overlapping cases are obtained, respectively. For example, assume that n = 15 trials, numbered from 1 to 15, are performed and that we get the outcomes SSSSSSFSSSSFSSS, where S denotes success and F denotes failure of a specific trial. Then the nonoverlapping success runs of length k = 4 are the outcomes of trials 1 2 3 4 and 8 9 10 11; the overlapping success runs of length 4, in the sense of Ling, are 1 2 3 4, 2 3 4 5, 3 4 5 6 and 8 9 10 11; and the 2-overlapping success runs of length 4 are 1 2 3 4, 3 4 5 6 and 8 9 10 11.
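The counting rule in this example can be sketched in Python. This is an illustration of the definition, not code from the paper, and the function name `count_l_overlapping` is hypothetical. The scan keeps a streak of usable successes; after a run of length k is counted, only its last l successes may be reused by the next run, so a maximal success run of length L >= k contributes 1 + floor((L - k)/(k - l)) runs.

```python
def count_l_overlapping(outcomes: str, k: int, l: int) -> int:
    """Number of l-overlapping success runs of length k in a linear 'S'/'F' sequence."""
    if not 0 <= l < k:
        raise ValueError("need 0 <= l < k")
    count = 0
    streak = 0  # successes usable by the run currently being built
    for outcome in outcomes:
        if outcome == "S":
            streak += 1
            if streak == k:
                count += 1
                # the last l successes of the counted run may start the next one
                streak = l
        else:
            streak = 0  # a failure breaks every run in progress
    return count


outcomes = "SSSSSSFSSSSFSSS"
for l in (0, 3, 2):
    print(l, count_l_overlapping(outcomes, 4, l))  # prints 0 2, then 3 4, then 2 3
```

The three printed counts reproduce the nonoverlapping (2), overlapping (4) and 2-overlapping (3) counts listed in the example.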
Aki and Hirano (2000) introduced a generalized binomial distribution of order k and investigated some of its properties. Han and Aki (2000) derived a recurrence for the probability generating function of the number of l-overlapping success runs in the case of n independent trials, as well as in the case of a higher order Markov chain of length n. See also Antzoulakos (2003) for a unified approach to waiting times and numbers of appearances of runs.

Let us now assume that the outcomes are arranged on a circle. Then, if there is at least one F in the sequence, we start counting from the first S following an F, and if there is no F, we can start counting from any S in the sequence. If the above 15 outcomes are arranged on a circle, then the 2-overlapping success runs of length 4 are 8 9 10 11, 13 14 15 1, 15 1 2 3 and 2 3 4 5.

In the present paper, in Section 2, we derive two alternative formulas for the probability distribution function of the random variable $N_{n,k,l}$, representing the number of l-overlapping success runs of length k ($k \ge 1$) in n ($n \ge 1$) independent trials with success probability p ($0 < p < 1$) (see Theorem 2.1 and Theorem 2.2). We also derive the mean of $N_{n,k,l}$ (see Proposition 2.1). In Section 3, we introduce a new circular binomial distribution of order k as the distribution of the random variable $N^{c}_{n,k,l}$, representing the number of l-overlapping success runs of length k in n independent trials ordered on a circle, and we derive its mean (see Theorem 3.1 and Proposition 3.1). In Section 4, we establish a recurrence relation for the probability distribution of $N_{n,k,l}$ (see Theorem 4.1) and we relate the two distributions by a recurrence relation (see Theorem 4.2). The usefulness of the recurrences for calculating the respective probabilities is illustrated in Table 1. Finally, in Section 5 we refer to the known limiting distribution of $N_{n,k,0}$ and …
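The circular counting rule described above can be sketched the same way. This is our reading of the rule, not code from the paper, and the function names are hypothetical. When the sequence contains at least one F, it is rotated so that it ends with an F; every circular success run then appears as one contiguous linear run starting at the first S after an F, and the linear count applies. The all-success case is deliberately left out of this sketch.

```python
def count_l_overlapping(outcomes: str, k: int, l: int) -> int:
    """l-overlapping success runs of length k in a linear 'S'/'F' sequence."""
    count, streak = 0, 0
    for outcome in outcomes:
        streak = streak + 1 if outcome == "S" else 0
        if streak == k:
            count, streak = count + 1, l  # next run may reuse the last l successes
    return count


def count_circular(outcomes: str, k: int, l: int) -> int:
    """Circular count, assuming the sequence contains at least one 'F'."""
    last_f = outcomes.rfind("F")
    if last_f == -1:
        raise ValueError("all-success sequences are not handled in this sketch")
    # Rotate so the sequence ends with an F: each circular success run becomes
    # a single linear run that is counted from its first S.
    rotated = outcomes[last_f + 1:] + outcomes[:last_f + 1]
    return count_l_overlapping(rotated, k, l)


print(count_circular("SSSSSSFSSSSFSSS", k=4, l=2))  # 4
```

For the 15 outcomes of the example, the run 13 14 15 1 2 3 4 5 6 of length 9 contributes three 2-overlapping runs and 8 9 10 11 contributes one, giving the four runs listed above.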
2. On binomial distribution of order k for l-overlapping success runs of length k
In this section we reconsider the number of l-overlapping success runs of length k in n Bernoulli trials, which was first studied by Aki and Hirano (2000) and Han and Aki (2000), and we derive its probability distribution function in terms of multinomial as well as binomial coefficients, together with its mean.

THEOREM 2.1.
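Let $N_{n,k,l}$ be a random variable (rv) denoting the number of l-overlapping success runs of length k ($0 \le l \le k-1$, $k \ge 1$) in n ($n \ge 1$) independent trials with success probability p ($0 < p < 1$), and let $q = 1 - p$. Then, for $n \le k-1$,
$$P(N_{n,k,l} = 0) = 1;$$
for $n = k$,
$$P(N_{n,k,l} = 0) = 1 - p^k \quad\text{and}\quad P(N_{n,k,l} = 1) = p^k;$$
and for $n \ge k+1$ and $x = 0, 1, \ldots, \lfloor (n-l)/(k-l) \rfloor$,
$$P(N_{n,k,l} = x) = p^n \sum_{s=0}^{n} \sum \binom{x_1 + \cdots + x_n}{x_1, \ldots, x_n} \left(\frac{q}{p}\right)^{x_1 + \cdots + x_n},$$
where the inner summation is over all nonnegative integers $x_1, \ldots, x_n$ satisfying the conditions
$$\sum_{j=1}^{n} j x_j = n - s \quad\text{and}\quad \sum_{i=1}^{\lfloor (n-1-l)/(k-l) \rfloor} i \sum_{j=1}^{m_{i,k}} x_{i(k-l)+l+j} = x - c_s,$$
$m_{i,k} = \min\{k-l,\ n-l-i(k-l)\}$, and $c_s$ is the number of l-overlapping success runs of length k in s consecutive successes, i.e. $c_s = \lfloor (s-l)/(k-l) \rfloor$ for $s \ge k$ and $c_s = 0$ otherwise.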
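PROOF. A typical element of the event $(N_{n,k,l} = x)$ is an arrangement
$$a_1 a_2 \cdots a_{x_1+\cdots+x_n} \underbrace{S \cdots S}_{s}$$
such that $x_r$ of the a's are of the type
$$e_r = \underbrace{S \cdots S}_{r-1} F, \quad r = 1, \ldots, n,$$
and there are $x_1 + \cdots + x_k$ e's each of which includes no success run of length k, $x_{k+1} + \cdots + x_{2k-l}$ e's each of which includes one l-overlapping success run of length k, $x_{2k-l+1} + \cdots + x_{3k-2l}$ e's each of which includes two l-overlapping success runs of length k, and so on. Generally, i l-overlapping success runs of length k are included in each of the
$$x_{ik-(i-1)l+1} + \cdots + x_{(i+1)k-il} = x_{i(k-l)+l+1} + \cdots + x_{i(k-l)+l+(k-l)}$$
e's, $i = 1, \ldots, \lfloor (n-1-l)/(k-l) \rfloor$, while the final s successes contain $c_s$ l-overlapping success runs of length k. Thus, the nonnegative integers $x_1, \ldots, x_n$ have to satisfy the conditions
$$x_1 + 2x_2 + \cdots + n x_n = n - s, \tag{1}$$
$$\sum_{i=1}^{\lfloor (n-1-l)/(k-l) \rfloor} i \sum_{j=1}^{m_{i,k}} x_{i(k-l)+l+j} = x - c_s. \tag{2}$$
Each such arrangement contains $x_1 + \cdots + x_n$ failures and $n - (x_1 + \cdots + x_n)$ successes, so it has probability $p^n (q/p)^{x_1+\cdots+x_n}$, and the number of distinct arrangements of the a's is the multinomial coefficient $\binom{x_1+\cdots+x_n}{x_1, \ldots, x_n}$. Hence, for $n \ge k+1$ and $x = 0, 1, \ldots, \lfloor (n-l)/(k-l) \rfloor$,
$$P(N_{n,k,l} = x) = p^n \sum_{s=0}^{n} \sum \binom{x_1+\cdots+x_n}{x_1, \ldots, x_n} \left(\frac{q}{p}\right)^{x_1+\cdots+x_n},$$
where the inner summation is over $x_1, \ldots, x_n$ satisfying conditions (1) and (2). For $n \le k$, $P(N_{n,k,l} = x)$ follows from the definition of the rv. This completes the proof. □

For l = 0, Theorem 2.1 provides a new formula for the probability distribution of the number of nonoverlapping success runs of length k in n Bernoulli trials, which is an alternative to the one given by Hirano (1986) and Philippou and Makri (1986).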
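The block decomposition behind Theorem 2.1 can be checked numerically. The Python sketch below is our illustration, not code from the paper, and all function names are hypothetical. It enumerates the block counts $x_r$ through partitions of $n - s$, weights each configuration by the multinomial coefficient times $p^n (q/p)^{x_1+\cdots+x_n}$, and compares the result with brute-force enumeration of all $2^n$ sequences in exact rational arithmetic. We take $c_s = \lfloor (s-l)/(k-l) \rfloor$ for $s \ge k$ and 0 otherwise.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def runs_in_streak(length: int, k: int, l: int) -> int:
    """c_s: l-overlapping runs of length k inside `length` consecutive successes."""
    return 0 if length < k else 1 + (length - k) // (k - l)


def partitions(total: int, largest: int):
    """Partitions of `total` into parts <= `largest`, as non-increasing lists."""
    if total == 0:
        yield []
        return
    for part in range(min(total, largest), 0, -1):
        for rest in partitions(total - part, part):
            yield [part] + rest


def dist_by_blocks(n: int, k: int, l: int, p: Fraction) -> dict:
    """Distribution of N_{n,k,l} from blocks e_r = S^(r-1) F followed by a final run S^s."""
    q = 1 - p
    dist = Counter()
    for s in range(n + 1):
        for parts in partitions(n - s, n):
            mult = Counter(parts)        # mult[r] plays the role of x_r
            blocks = len(parts)          # x_1 + ... + x_n, the number of F's
            arrangements = factorial(blocks)
            for x_r in mult.values():    # multinomial coefficient
                arrangements //= factorial(x_r)
            x = runs_in_streak(s, k, l) + sum(
                x_r * runs_in_streak(r - 1, k, l) for r, x_r in mult.items())
            # p^n (q/p)^(x_1+...+x_n), written without division
            dist[x] += arrangements * p ** (n - blocks) * q ** blocks
    return dict(dist)


def dist_brute_force(n: int, k: int, l: int, p: Fraction) -> dict:
    """Exact distribution by enumerating all 2^n sequences."""
    dist = Counter()
    for seq in product("SF", repeat=n):
        x, streak = 0, 0
        for outcome in seq:
            streak = streak + 1 if outcome == "S" else 0
            if streak == k:
                x, streak = x + 1, l
        dist[x] += p ** seq.count("S") * (1 - p) ** seq.count("F")
    return dict(dist)


p = Fraction(1, 3)
print(dist_by_blocks(7, 3, 1, p) == dist_brute_force(7, 3, 1, p))  # True
```

Exact fractions are used so that the comparison is an equality test rather than a floating-point tolerance check.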
For l = k - 1, Theorem 2.1 reduces to Theorem 3.2 of Ling (1988), and for 1 ≤ l ≤ k - 2 it provides new probability distributions. Since $N_{n,k,0}$ ($N_{n,k,k-1}$) is distributed as binomial of order k, type I (type II), with parameter vector (n, p), denoted by $B_{k,I}(n,p)$ ($B_{k,II}(n,p)$), we introduce the following definition.
DEFINITION 2.1. A rv X is said to be distributed as binomial of order k, in the l-overlapping case, with parameter vector (n, p), to be denoted by $B_{k,l}(n,p)$, if its probability distribution function is given by Theorem 2.1. Obviously, $B_{k,0}(n,p) = B_{k,I}(n,p)$ and $B_{k,k-1}(n,p) = B_{k,II}(n,p)$.

In the sequel, an alternative exact formula for $P(N_{n,k,l} = x)$ is derived in terms of binomial coefficients. We first state a preliminary lemma.

LEMMA 2.1. The number of possible ways of distributing n identical balls into m different urns such that the maximum allowed number of balls in any one urn is r is given by
$$C(n, m, r) = \sum_{j=0}^{\lfloor n/(r+1) \rfloor} (-1)^j \binom{m}{j} \binom{n - j(r+1) + m - 1}{m - 1}$$
(see Riordan 1964, p. 104). It is noted that $C(0, m, r)$ is considered equal to 1.

THEOREM 2.2. Let $N_{n,k,l}$ be as in Theorem 2.1. Then,
(a)
$$P(N_{n,k,l} = 0) = \sum_{y=\lfloor n/k \rfloor}^{n} p^{n-y} q^{y}\, C(n-y, y+1, k-1),$$
and (b) for $x = 1, \ldots, \lfloor (n-l)/(k-l) \rfloor$,
$$P(N_{n,k,l} = x) = \sum_{y=\lfloor (n+xl)/k \rfloor - x}^{n-k-(x-1)(k-l)} p^{n-y} q^{y} \sum_{i=1}^{x} \binom{y+1}{i} \binom{x-1}{i-1} \sum_{\beta_i=m_i}^{M_i} C(\beta_i, y+1-i, k-1)\, C(\alpha_i - \beta_i, i, k-l-1),$$
where $\alpha_i = n - y - ik - (x-i)(k-l)$, $m_i = \max\{0,\ \alpha_i - i(k-l-1)\}$ and $M_i = \min\{\alpha_i,\ (k-1)(y+1-i)\}$.
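Lemma 2.1 and Theorem 2.2 translate directly into code. The Python sketch below is our illustration with hypothetical function names: `C` implements the inclusion-exclusion count of Lemma 2.1, with the convention that zero urns admit only zero balls, and `pmf_theorem_2_2` evaluates parts (a) and (b) in exact fractions so that they can be compared with brute-force enumeration for small n. Because $\binom{y+1}{i}$ vanishes for $i > y + 1$, the sketch stops the i-sum at $\min\{x, y+1\}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def C(n: int, m: int, r: int) -> int:
    """Lemma 2.1: ways to put n identical balls into m urns, at most r balls per urn."""
    if n < 0:
        return 0
    if m == 0:
        return 1 if n == 0 else 0  # with no urns only the empty distribution exists
    return sum(
        (-1) ** j * comb(m, j) * comb(n - j * (r + 1) + m - 1, m - 1)
        for j in range(n // (r + 1) + 1)
    )


def pmf_theorem_2_2(n: int, k: int, l: int, x: int, p: Fraction) -> Fraction:
    """P(N_{n,k,l} = x) from parts (a) and (b) of Theorem 2.2."""
    q = 1 - p
    if x == 0:  # part (a)
        return sum(p ** (n - y) * q ** y * C(n - y, y + 1, k - 1)
                   for y in range(n // k, n + 1))
    total = Fraction(0)  # part (b)
    for y in range((n + x * l) // k - x, n - k - (x - 1) * (k - l) + 1):
        ways = 0
        for i in range(1, min(x, y + 1) + 1):
            alpha = n - y - i * k - (x - i) * (k - l)
            lo = max(0, alpha - i * (k - l - 1))
            hi = min(alpha, (k - 1) * (y + 1 - i))
            ways += comb(y + 1, i) * comb(x - 1, i - 1) * sum(
                C(beta, y + 1 - i, k - 1) * C(alpha - beta, i, k - l - 1)
                for beta in range(lo, hi + 1)
            )
        total += p ** (n - y) * q ** y * ways
    return total


def brute_force_pmf(n: int, k: int, l: int, p: Fraction) -> dict:
    """Exact distribution of N_{n,k,l} by enumerating all 2^n sequences."""
    dist = Counter()
    for seq in product("SF", repeat=n):
        x, streak = 0, 0
        for outcome in seq:
            streak = streak + 1 if outcome == "S" else 0
            if streak == k:
                x, streak = x + 1, l
        dist[x] += p ** seq.count("S") * (1 - p) ** seq.count("F")
    return dict(dist)


p = Fraction(1, 3)
print(pmf_theorem_2_2(6, 3, 1, 2, p) == brute_force_pmf(6, 3, 1, p)[2])
```

When $\alpha_i < 0$ the β-range is empty, so such terms contribute nothing, matching the occupancy argument in the proof.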
PROOF. (a) Consider the event $(N_{n,k,l} = 0, Y_n = y)$, where $Y_n$ denotes the number of failures in the n trials. A typical element of this event is a sequence
$$SS\cdots S\,F\,SS\cdots S\,F \cdots F\,SS\cdots S$$
of y failures and n - y successes in which at most k - 1 consecutive successes appear. The probability of any such sequence is $q^y p^{n-y}$, and the number of such sequences is $C(n-y, y+1, k-1)$ by Lemma 2.1, since the y failures create y + 1 cells and $C(n-y, y+1, k-1)$ is the number of ways of distributing n - y balls (S's) into y + 1 cells so that each cell contains at most k - 1 balls. Therefore,
$$P(N_{n,k,l} = 0) = \sum_{y} P(N_{n,k,l} = 0, Y_n = y) = \sum_{y=\lfloor n/k \rfloor}^{n} C(n-y, y+1, k-1)\, p^{n-y} q^{y}.$$
We now proceed to prove (b). Consider the events $A_j$ = {at least k successes are contained in the j-th urn}, $j = 1, 2, \ldots, y+1$, and $A = \bigcap_{j \notin \{j_1, \ldots, j_i\}} A_j'$, where $\{j_1, \ldots, j_i\}$ is a subset of $\{1, 2, \ldots, y+1\}$ and $A_j'$ denotes the complement of $A_j$. We observe that, for $1 \le i \le \min\{y+1, \lfloor (n-y)/k \rfloor\}$, every element of the event $(N_{n,k,l} = x, Y_n = y, A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_i} \cap A)$ is a sequence
$$SS\cdots S\,F\,SS\cdots S\,F \cdots F\,SS\cdots S$$
with y failures and n - y successes such that x l-overlapping success runs of length k appear, which are contained in the $j_1$-th, $j_2$-th, …, $j_i$-th urns among the y + 1 urns created by the y failures, and no other urn contains more than k - 1 successes. Therefore,
$$P(N_{n,k,l} = x) = \sum_{y} \sum_{i} \sum_{j_1, \ldots, j_i} P(N_{n,k,l} = x, Y_n = y, A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_i} \cap A). \tag{*}$$
It is clear that every element of the event $(N_{n,k,l} = x, Y_n = y, A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_i} \cap A)$ has probability $q^y p^{n-y}$. So, in order to evaluate its probability, we count its elements by considering the corresponding occupancy problem. We start by placing k balls (S's) into each of the $j_1$-th, $j_2$-th, …, $j_i$-th urns, and we continue by distributing x - i blocks, each consisting of k - l balls, into the same urns without any restrictions. It is well known that this can be accomplished in $\binom{x-1}{i-1}$ possible ways.
Now, there are $\alpha_i = n - y - ik - (x-i)(k-l)$ remaining balls to be placed into the y + 1 urns under the following restrictions: every one of the above i specified urns (the $j_1$-th, $j_2$-th, …, $j_i$-th) is allowed to receive no more than k - l - 1 of these balls, and every one of the remaining y + 1 - i urns is allowed to contain no more than k - 1 balls. If $\beta_i$ of the $\alpha_i$ balls are distributed in the remaining y + 1 - i urns, then $\alpha_i - \beta_i$ are to be placed in the i specified urns. According to Lemma 2.1, the distribution of the $\beta_i$ balls can be accomplished in $C(\beta_i, y+1-i, k-1)$ different ways. For every distribution of the $\beta_i$ balls into the y + 1 - i urns, there are $C(\alpha_i - \beta_i, i, k-l-1)$ different ways of distributing the remaining $\alpha_i - \beta_i$ balls into the i specified urns. Observing that $\max\{0, \alpha_i - i(k-l-1)\} \le \beta_i \le \min\{\alpha_i, (k-1)(y+1-i)\}$, we conclude that the total number of ways of distributing the $\alpha_i$ balls into the y + 1 urns under the above restrictions is given by
$$\sum_{\beta_i=m_i}^{M_i} C(\beta_i, y+1-i, k-1)\, C(\alpha_i - \beta_i, i, k-l-1),$$
so that the number of elements of the event $(N_{n,k,l} = x, Y_n = y, A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_i} \cap A)$ is
$$\binom{x-1}{i-1} \sum_{\beta_i=m_i}^{M_i} C(\beta_i, y+1-i, k-1)\, C(\alpha_i - \beta_i, i, k-l-1).$$
Therefore,
$$P(N_{n,k,l} = x, Y_n = y, A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_i} \cap A) = p^{n-y} q^{y} \binom{x-1}{i-1} \sum_{\beta_i=m_i}^{M_i} C(\beta_i, y+1-i, k-1)\, C(\alpha_i - \beta_i, i, k-l-1),$$
and the result follows from (*) by noting that there are $\binom{y+1}{i}$ i-combinations of the set $\{1, 2, \ldots, y+1\}$ and that $\lfloor (n+xl)/k \rfloor - x \le y \le n - k - (x-1)(k-l)$. □

PROPOSITION 2.1. Let $N_{n,k,l}$ be a rv as in Theorem 2.1. Then, for $n \le k-1$, $E(N_{n,k,l}) = 0$, and for $n \ge k > l \ge 0$,
$$E(N_{n,k,l}) = p^{l} \sum_{j=1}^{\lfloor (n-l)/(k-l) \rfloor} \{1 + (1-p)[n - l - j(k-l)]\}\, p^{j(k-l)}.$$
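Proposition 2.1 can be checked against direct enumeration. In the Python sketch below (our illustration, with hypothetical function names), `mean_proposition_2_1` evaluates the closed form in exact fractions and `mean_brute_force` averages the l-overlapping count over all $2^n$ outcome sequences.

```python
from fractions import Fraction
from itertools import product


def mean_proposition_2_1(n: int, k: int, l: int, p: Fraction) -> Fraction:
    """E(N_{n,k,l}) from the closed form of Proposition 2.1."""
    if n <= k - 1:
        return Fraction(0)
    return p ** l * sum(
        (1 + (1 - p) * (n - l - j * (k - l))) * p ** (j * (k - l))
        for j in range(1, (n - l) // (k - l) + 1)
    )


def mean_brute_force(n: int, k: int, l: int, p: Fraction) -> Fraction:
    """E(N_{n,k,l}) by averaging the l-overlapping count over all 2^n sequences."""
    total = Fraction(0)
    for seq in product("SF", repeat=n):
        x, streak = 0, 0
        for outcome in seq:
            streak = streak + 1 if outcome == "S" else 0
            if streak == k:
                x, streak = x + 1, l  # next run may reuse the last l successes
        total += x * p ** seq.count("S") * (1 - p) ** seq.count("F")
    return total


p = Fraction(1, 2)
print(mean_proposition_2_1(4, 2, 0, p), mean_brute_force(4, 2, 0, p))  # 9/16 9/16
```

For l = k - 1 the closed form collapses to $(n-k+1)p^k$, the familiar mean of the overlapping count, which is a quick sanity check on the formula.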
PROOF. Let $X_1, \ldots, X_n$ be independent rvs with probability distribution $P(X_i = x) = p^x (1-p)^{1-x}$, $x = 0, 1$. For $i = 1, k-l+1, 2(k-l)+1, \ldots, \lfloor (n-k)/(k-l) \rfloor (k-l)+1$, let … 1, and for $i = 2, \ldots, n-k+1$ and j …
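… in n ($n \ge 1$) independent trials with success probability p ($0 < p < 1$) ordered circularly. Then:

(a) for $n \le k-1$, $P(N^{c}_{n,k,l} = 0) = 1$;

(b) for $n = k$, $P(N^{c}_{n,k,l} = 0) = 1 - p^k$ and $P(N^{c}_{n,k,l} = \lfloor k/(k-l) \rfloor) = p^k$;

(c) for $n = k+1$, $P(N^{c}_{n,k,l} = 0) = 1 - (k+1) q p^k - p^{k+1}$ and
$$P(N^{c}_{n,k,l} = x) = p^{k+1} \delta_{x, \lfloor (k+1)/(k-l) \rfloor} + (k+1) q p^k \delta_{x,1}, \quad x = 1, \lfloor (k+1)/(k-l) \rfloor,$$
where $\delta_{x,y}$ is the Kronecker delta;

(d) for $n \ge k+2$ and $x = 0, 1, \ldots, \lfloor (n-1-l)/(k-l) \rfloor, \lfloor n/(k-l) \rfloor$,
$$P(N^{c}_{n,k,l} = x) = q p^{n-1} \sum_{s=1}^{M_x} s \sum \binom{x_1 + \cdots + x_{n-1}}{x_1, \ldots, x_{n-1}} \left(\frac{q}{p}\right)^{x_1 + \cdots + x_{n-1}} + p^n \delta_{x, \lfloor n/(k-l) \rfloor},$$
where the inner summation is over all nonnegative integers $x_1, \ldots, x_{n-1}$ satisfying the conditions
$$\sum_{j=1}^{n-1} j x_j = n - s \quad\text{and}\quad \sum_{i=1}^{\lfloor (n-2-l)/(k-l) \rfloor} i \sum_{j=1}^{m_{i,k}} x_{ik-(i-1)l+j} = x - c_{s-1},$$
$c_{s-1}$ is the number of l-overlapping success runs of length k in s - 1 consecutive successes, $M_x = \min\{x(k-l)+k,\ n\}$, and $m_{i,k}$ is as in Theorem 2.1.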
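Part (c) and the sum in part (d) can be checked against direct enumeration. The Python sketch below is our illustration under stated assumptions: the function names are hypothetical, and the all-success sequence, which carries the separate $p^n$ term, is left out. A circular sequence with at least one F is rotated so that it ends with an F and then counted linearly; the $q p^{n-1}$ sum of (d) is evaluated over partitions of $n - s$, where s is the length of the block $S^{s-1}F$ covering trial 1 and the factor s counts the positions of trial 1 inside that block. The sketch sums s over 1, …, n, which contains the range $s \le M_x$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def runs_in_streak(length: int, k: int, l: int) -> int:
    """l-overlapping runs of length k in `length` consecutive successes."""
    return 0 if length < k else 1 + (length - k) // (k - l)


def circular_count(seq: str, k: int, l: int) -> int:
    """Circular count for a sequence with at least one 'F' (rotated to end in F)."""
    cut = seq.rindex("F") + 1
    rotated = seq[cut:] + seq[:cut]
    count, streak = 0, 0
    for outcome in rotated:
        streak = streak + 1 if outcome == "S" else 0
        if streak == k:
            count, streak = count + 1, l
    return count


def partitions(total: int, largest: int):
    """Partitions of `total` into parts <= `largest`, as non-increasing lists."""
    if total == 0:
        yield []
        return
    for part in range(min(total, largest), 0, -1):
        for rest in partitions(total - part, part):
            yield [part] + rest


def theorem_3_1d_sum(n: int, k: int, l: int, p: Fraction) -> dict:
    """The q p^(n-1) part of Theorem 3.1(d), i.e. sequences with at least one F."""
    q = 1 - p
    dist = Counter()
    for s in range(1, n + 1):  # length of the block S^(s-1) F that covers trial 1
        for parts in partitions(n - s, n - 1):
            mult = Counter(parts)  # mult[r] plays the role of x_r
            blocks = len(parts)
            arrangements = factorial(blocks)
            for x_r in mult.values():
                arrangements //= factorial(x_r)
            x = runs_in_streak(s - 1, k, l) + sum(
                x_r * runs_in_streak(r - 1, k, l) for r, x_r in mult.items())
            # s choices for where trial 1 sits inside the covering block
            dist[x] += s * arrangements * q * p ** (n - 1 - blocks) * q ** blocks
    return dict(dist)


def brute_force_with_failure(n: int, k: int, l: int, p: Fraction) -> dict:
    """Same quantity by enumerating every sequence that contains an F."""
    dist = Counter()
    for seq in map("".join, product("SF", repeat=n)):
        if "F" in seq:
            dist[circular_count(seq, k, l)] += p ** seq.count("S") * (1 - p) ** seq.count("F")
    return dict(dist)


k, l, p = 3, 1, Fraction(1, 4)
q = 1 - p
# Theorem 3.1(c), n = k + 1: the all-success sequence has N >= 1, so it adds nothing to N = 0
print(brute_force_with_failure(k + 1, k, l, p)[0] == 1 - (k + 1) * q * p ** k - p ** (k + 1))  # True
```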
PROOF. Obviously, (a), (b) and (c) of the theorem hold for $n \le k-1$, $n = k$ and $n = k+1$, respectively. For $n \ge k+2$, we first observe that for …
…
$$P(N_{n,k,l} = 0) = q \sum_{j=0}^{k-1} p^{j} P(N_{n-1-j,k,l} = 0).$$
PROOF. Let $x = 0, 1, \ldots, \lfloor (n-l)/(k-l) \rfloor$ and $m_{\ldots,j}$ be as in the lemma. (a) Let $n \ge k+1$ and $x = 1, \ldots, \lfloor (n-l)/(k-l) \rfloor$. For $j = 0, \ldots, n-1$, we define the events $A_j$ = "j S's precede the first F in the sequence of n Bernoulli trials" and B = "there is no F in the sequence of n Bernoulli trials", so that
$$(N_{n,k,l} = x) = \bigcup_{j=0}^{n-1} \left[(N_{n,k,l} = x) \cap A_j\right] \cup \left[(N_{n,k,l} = x) \cap B\right].$$
Obviously, $(N_{n,k,l} = x) \cap B = \emptyset$ for $x \ne \lfloor (n-l)/(k-l) \rfloor$. Then, since $A_j$ ($j = 0, 1, \ldots, n-1$) and B are disjoint events, we have
$$P(N_{n,k,l} = x) = \sum_{j=0}^{n-1} P[(N_{n,k,l} = x) \mid A_j]\, P(A_j) + P[(N_{n,k,l} \ldots$$