THE NUMBER OF DISTINCT ADJACENT PAIRS IN GEOMETRICALLY DISTRIBUTED WORDS
arXiv:1806.04962v1 [math.CO] 13 Jun 2018
MARGARET ARCHIBALD$^1$, AUBREY BLECHER, CHARLOTTE BRENNAN$^1$, AND ARNOLD KNOPFMACHER$^1$

Abstract. A sequence of geometric random variables of length $n$ is a sequence of $n$ independent and identically distributed geometric random variables $(\Gamma_1, \Gamma_2, \ldots, \Gamma_n)$ where $P(\Gamma_j = i) = pq^{i-1}$ for $1 \le j \le n$ with $p + q = 1$. We study the number of distinct adjacent two letter patterns in such sequences. Initially we directly count the number of distinct pairs in words of short length. Because of the rapid growth of the number of word patterns we change our approach to this problem by obtaining an expression for the expected number of distinct pairs in words of length $n$. We also obtain the asymptotics for the expected number as $n \to \infty$.

Date: June 14, 2018.
1991 Mathematics Subject Classification. Primary: 05A15, Secondary: 05A05.
Key words and phrases. geometric random variables, pairs, generating function, asymptotics.
$^1$ This material is based upon work supported by the National Research Foundation under grant numbers 89147, 86329, 81021 respectively.

1. Introduction

A sequence of geometric random variables of length $n$ is a sequence of $n$ independent and identically distributed geometric random variables $(\Gamma_1, \Gamma_2, \ldots, \Gamma_n)$ where
$$P(\Gamma_j = i) = pq^{i-1} \quad \text{for } 1 \le j \le n, \text{ with } p + q = 1.$$
The smaller the value of $q$, the greater the prevalence of smaller numbers in the sequence. So $p$ is the probability of obtaining the letter 1, where $0 \le p \le 1$, and $pq^{i-1}$ is the probability of obtaining the letter $i$ in any given position of the word. In this paper a word will mean a geometrically distributed random word. Some recent references on sequences of geometric random variables are [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. These sequences are also called geometrically distributed random words.

In a given word of length $n$, we study the number of distinct adjacent two letter patterns (or pairs). This is a generalisation of the problem of distinct values previously studied in [4] by two of the current authors and subsequently in [12]. A pair is made up of two consecutive letters $\Gamma_i\Gamma_{i+1}$ for $i = 1$ to $n - 1$. So for example the word 12412413 is a sequence of geometric random variables of length $n = 8$. It is made up of four distinct letters, 1 to 4, and it has four distinct pairs, namely 12, 24, 41, and 13.

In Section 2 we count the expected number of distinct pairs in short words. Thereafter, in Section 3, we tackle the problem of the expected number of distinct pairs in a randomly generated word of length $n$. Here is an outline of our approach: We write a formula for the probability of the first (leftmost) occurrence of the pattern $ii$ in a word of arbitrary length. This is dependent on the position $(k, k + 1)$ in the word where the first occurrence happens. We use this to write a recurrence for this probability in terms of some earlier cases of $k$ and
we solve this recursion. By summing these solutions over $k$ we obtain the probability $P_{i,i}$ of such occurrence in a word of arbitrary length $n$. We then duplicate this procedure for the pair $ij$ where $i \ne j$.

2. A direct count for the number of distinct pairs in small words

We illustrate the method in the case of words of length four and then five.

2.1. Words of length four. There are precisely 15 patterns of length four; these are shown in the first column of Table 1. They correspond to the restricted growth functions of length four. Their number equals the number of set partitions of the set $\{1, 2, 3, 4\}$, i.e., the Bell number $B(4) = 15$, see [18]. The sum of the probabilities of obtaining these patterns is 1 as expected. In the last column, we have split the 15 words into five types: A to E. The type depends on the number of distinct letters making up the word and their multiplicity. For example words of type C are made up of two distinct letters: three letters of one kind and one of the other kind. Words of the same type occur with equal probability.

Notation: We write $\sum_{a,b,c,d,e}$ to mean the sum over all values of $a$, $b$, $c$, $d$ and $e$ where these values are all distinct from each other.

Now define $S_A(a)$, $S_B(a, b)$, $S_C(a, b)$, $S_D(a, b, c)$ and $S_E(a, b, c, d)$ to be the sums that, after multiplication by the common factor $p^4/q^4$, give the probability of occurrence of words of type A, B, C, D and E respectively. Thus
$$S_E(a,b,c,d) := \sum_{a,b,c,d} q^{a+b+c+d} = 4! \sum_{a=1}^{\infty}\sum_{b>a}\sum_{c>b}\sum_{d>c} q^{a+b+c+d} = \frac{24q^{10}}{(1-q)^4(1+q)^2(1+q^2)(1+q+q^2)}.$$
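For words of type D, we use the inclusion/exclusion principle to obtain the probability
$$
\begin{aligned}
S_D(a,b,c) :={}& \sum_{a,b,c} q^{2a+b+c}\\
={}& \sum_{a=1}^{\infty}\sum_{b=1}^{\infty}\sum_{c=1}^{\infty} q^{2a+b+c} - \sum_{a=1}^{\infty}\sum_{b=1}^{\infty} q^{2a+b+b} - \sum_{a=1}^{\infty}\sum_{b=1}^{\infty} q^{2a+b+a} - \sum_{a=1}^{\infty}\sum_{c=1}^{\infty} q^{2a+a+c} + 2\sum_{a=1}^{\infty} q^{4a}\\
={}& \frac{2q^{7}(1+2q+3q^{2})}{(1-q)^{3}(1+q)^{2}(1+q^{2})(1+q+q^{2})}.
\end{aligned}
$$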
Similarly
$$S_B(a,b) := \sum_{a,b} q^{2a+2b} = \frac{2q^{6}}{(1-q^{2})^{2}(1+q^{2})}, \tag{2.1}$$
$$S_C(a,b) := \sum_{a,b} q^{3a+b} = \frac{q^{5}(1+q+2q^{2})}{1-q^{3}-q^{4}+q^{7}}, \tag{2.2}$$
and
$$S_A(a) := \sum_{a\ge 1} q^{4a} = \frac{q^{4}}{1-q^{4}}.$$
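The closed forms for $S_A$ to $S_E$ can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names `distinct_sum` and `closed_forms` and the truncation level `N` are assumptions made for illustration. It sums $q^{w_1 a_1 + w_2 a_2 + \cdots}$ over distinct indices up to `N` and compares the result with the rational expressions stated in this subsection.

```python
from itertools import permutations

def distinct_sum(q, weights, N=20):
    """Truncated sum of q**(w1*a1 + w2*a2 + ...) over pairwise distinct a_i in 1..N.

    N is a truncation level chosen for this sketch; the neglected tail is
    tiny for moderate q.
    """
    total = 0.0
    for idx in permutations(range(1, N + 1), len(weights)):
        total += q ** sum(w * a for w, a in zip(weights, idx))
    return total

def closed_forms(q):
    """Closed forms for S_A .. S_E as stated in Subsection 2.1."""
    common = (1 + q) ** 2 * (1 + q**2) * (1 + q + q**2)
    return {
        "A": q**4 / (1 - q**4),
        "B": 2 * q**6 / ((1 - q**2) ** 2 * (1 + q**2)),
        "C": q**5 * (1 + q + 2 * q**2) / (1 - q**3 - q**4 + q**7),
        "D": 2 * q**7 * (1 + 2 * q + 3 * q**2) / ((1 - q) ** 3 * common),
        "E": 24 * q**10 / ((1 - q) ** 4 * common),
    }

# Exponent weights of each type: A = q^(4a), B = q^(2a+2b), C = q^(3a+b), ...
WEIGHTS = {"A": (4,), "B": (2, 2), "C": (3, 1), "D": (2, 1, 1), "E": (1, 1, 1, 1)}

if __name__ == "__main__":
    q = 0.3
    exact = closed_forms(q)
    for name, w in WEIGHTS.items():
        print(name, distinct_sum(q, w), exact[name])
```

Since the 15 patterns consist of one word of type A, three of type B, four of type C, six of type D and one of type E, the weighted total $\frac{p^4}{q^4}(S_A + 3S_B + 4S_C + 6S_D + S_E)$ should equal 1, which gives a second consistency check.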
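| Pattern | Number of distinct pairs | Probability of this pattern | Type |
|---|---|---|---|
| aaaa | 1 | $\frac{p^4}{q^4}\sum_{a} q^{4a}$ | A |
| abab | 2 | $\frac{p^4}{q^4}\sum_{a,b} q^{2a+2b}$ | B |
| aaab | 2 | $\frac{p^4}{q^4}\sum_{a,b} q^{3a+b}$ | C |
| abbb | 2 | $\frac{p^4}{q^4}\sum_{a,b} q^{a+3b}$ | C |
| abaa | 3 | $\frac{p^4}{q^4}\sum_{a,b} q^{3a+b}$ | C |
| aaba | 3 | $\frac{p^4}{q^4}\sum_{a,b} q^{3a+b}$ | C |
| aabb | 3 | $\frac{p^4}{q^4}\sum_{a,b} q^{2a+2b}$ | B |
| abba | 3 | $\frac{p^4}{q^4}\sum_{a,b} q^{2a+2b}$ | B |
| aabc | 3 | $\frac{p^4}{q^4}\sum_{a,b,c} q^{2a+b+c}$ | D |
| abac | 3 | $\frac{p^4}{q^4}\sum_{a,b,c} q^{2a+b+c}$ | D |
| abbc | 3 | $\frac{p^4}{q^4}\sum_{a,b,c} q^{a+2b+c}$ | D |
| abca | 3 | $\frac{p^4}{q^4}\sum_{a,b,c} q^{2a+b+c}$ | D |
| abcb | 3 | $\frac{p^4}{q^4}\sum_{a,b,c} q^{a+2b+c}$ | D |
| abcc | 3 | $\frac{p^4}{q^4}\sum_{a,b,c} q^{a+b+2c}$ | D |
| abcd | 3 | $\frac{p^4}{q^4}\sum_{a,b,c,d} q^{a+b+c+d}$ | E |

Table 1. Probabilities for all the patterns in a four letter word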
Let the probability of obtaining $\ell$ distinct pairs in a random word of length four be $P_4(\ell)$. By reading off from Table 1 we have
$$P_4(1) = \frac{p^4}{q^4} S_A = \frac{(1-q)^4}{1-q^4},$$
$$P_4(2) = \frac{p^4}{q^4}(S_B + 2S_C),$$
$$P_4(3) = \frac{p^4}{q^4}(2S_B + 2S_C + 6S_D + S_E).$$
By multiplying the number of distinct pairs (the numbers in column two) and the corresponding probabilities (the expressions in column three) and summing over all cases, we obtain the expected number of distinct pairs in four letter words. We will denote this by $F_4$. Thus
$$F_4 = \frac{p^4}{q^4}\bigl(S_A + 2(S_B + 2S_C) + 3(2S_B + 2S_C + 6S_D + S_E)\bigr) = \frac{1 + 9q + 15q^2 + 20q^3 + 17q^4 + 11q^5 - q^6}{(1+q)^2(1+q^2)(1+q+q^2)}. \tag{2.3}$$
We show in Figure 1 the plot of the number of distinct pairs against $q$ where $0 < q < 1$.
[Figure 1. A plot of the number of distinct pairs against q]
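As an independent check of (2.3), the expected number of distinct pairs can also be computed by brute force. The sketch below is an illustration only: the helper names and the truncation of the alphabet to `max_letter` letters are assumptions, not part of the paper. It also reproduces the count of four distinct pairs in the example word 12412413 from the Introduction.

```python
from itertools import product

def distinct_pairs(word):
    """Number of distinct adjacent two-letter patterns in a word (string or tuple)."""
    return len(set(zip(word, word[1:])))

def expected_pairs_bruteforce(n, q, max_letter=20):
    """Expected number of distinct pairs in a geometric word of length n.

    The alphabet is truncated to 1..max_letter (an assumption of this sketch),
    so the result is slightly below the true value; the gap is of order
    q**max_letter.
    """
    p = 1.0 - q
    prob = [0.0] + [p * q ** (i - 1) for i in range(1, max_letter + 1)]
    total = 0.0
    for word in product(range(1, max_letter + 1), repeat=n):
        weight = 1.0
        for letter in word:
            weight *= prob[letter]
        total += weight * distinct_pairs(word)
    return total

def F4(q):
    """Closed form (2.3) for words of length four."""
    num = 1 + 9*q + 15*q**2 + 20*q**3 + 17*q**4 + 11*q**5 - q**6
    den = (1 + q) ** 2 * (1 + q**2) * (1 + q + q**2)
    return num / den

if __name__ == "__main__":
    print(distinct_pairs("12412413"))
    print(expected_pairs_bruteforce(4, 0.3), F4(0.3))
```

The enumeration grows like `max_letter**n`, which is why this approach is only practical for very short words.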
Note that as $q \to 0$, $F_4 \to 1$, since in the limit we have a word consisting only of ones which has one distinct pair. Also, in the case $q \to 1$ the number of distinct pairs tends to 3, since now all three pairs are distinct with probability 1.

2.2. Words of length five. In this case the number of words is equal to the number of restricted growth functions of length 5, which is the Bell number $B(5) = 52$. We provide in Table 2 a condensed version of Table 1 for five letter patterns, where the words of the same type (and hence probability) are combined.
| Patterns | Number of distinct pairs | Probability of this pattern |
|---|---|---|
| aaaaa | 1 | $\frac{p^5}{q^5}\sum_{a} q^{5a}$ |
| aaaab, abbbb | 2 | $\frac{p^5}{q^5}\sum_{a,b} q^{4a+b}$ |
| ababa | 2 | $\frac{p^5}{q^5}\sum_{a,b} q^{3a+2b}$ |
| aaaba, aabaa, abaaa | 3 | $\frac{p^5}{q^5}\sum_{a,b} q^{4a+b}$ |
| aaabb, aabab, aabbb, abaab, ababb, abbab, abbba | 3 | $\frac{p^5}{q^5}\sum_{a,b} q^{3a+2b}$ |
| aaabc, abbbc, abccc | 3 | $\frac{p^5}{q^5}\sum_{a,b,c} q^{3a+b+c}$ |
| ababc, abcab, abcbc | 3 | $\frac{p^5}{q^5}\sum_{a,b,c} q^{2a+2b+c}$ |
| aabac, aabca, abaac, abaca, abbcb, abcaa, abcbb | 4 | $\frac{p^5}{q^5}\sum_{a,b,c} q^{3a+b+c}$ |
| aabba, abbaa | 4 | $\frac{p^5}{q^5}\sum_{a,b} q^{3a+2b}$ |
| aabbc, aabcb, aabcc, abacb, abacc, abbac, abbca, abbcc, abcac, abcba, abcca, abccb | 4 | $\frac{p^5}{q^5}\sum_{a,b,c} q^{2a+2b+c}$ |
| aabcd, abacd, abbcd, abcad, abcbd, abccd, abcda, abcdb, abcdc, abcdd | 4 | $\frac{p^5}{q^5}\sum_{a,b,c,d} q^{2a+b+c+d}$ |
| abcde | 4 | $\frac{p^5}{q^5}\sum_{a,b,c,d,e} q^{a+b+c+d+e}$ |

Table 2. Probabilities for all the patterns in a five letter word
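The pattern counts behind Tables 1 and 2 can be reproduced by enumerating restricted growth words directly. The following sketch is illustrative only (the function names are hypothetical): it generates every pattern of length $n$ over the letters a, b, c, ... and tallies how many patterns have each number of distinct adjacent pairs.

```python
from collections import Counter

def restricted_growth_words(n):
    """Yield all restricted growth words of length n over 'a', 'b', 'c', ...

    Each new letter is either one already used or the next unused letter,
    so the words are in bijection with set partitions of {1, ..., n}.
    """
    def extend(prefix, used):
        if len(prefix) == n:
            yield "".join(prefix)
            return
        for k in range(used + 1):  # k == used introduces a new letter
            prefix.append(chr(ord("a") + k))
            yield from extend(prefix, max(used, k + 1))
            prefix.pop()
    yield from extend([], 0)

def pair_count_distribution(n):
    """Map: number of distinct adjacent pairs -> number of patterns of length n."""
    return Counter(len(set(zip(w, w[1:]))) for w in restricted_growth_words(n))

if __name__ == "__main__":
    for n in (4, 5):
        dist = pair_count_distribution(n)
        print(n, sum(dist.values()), dict(sorted(dist.items())))
```

For $n = 4$ this gives 15 patterns and for $n = 5$ it gives 52, matching the Bell numbers quoted in the text; the per-count tallies agree with the row groupings of the two tables.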
As before, we denote the probabilities for patterns of length five by $P_5(\ell)$ where $\ell$ indicates the number of distinct pairs. Then
$$P_5(1) = \frac{p^5}{q^5}\sum_{a=1}^{\infty} q^{5a},$$
$$P_5(2) = 2\,\frac{p^5}{q^5}\sum_{a,b} q^{4a+b} + \frac{p^5}{q^5}\sum_{a,b} q^{3a+2b},$$
$$P_5(3) = 3\,\frac{p^5}{q^5}\sum_{a,b} q^{4a+b} + 7\,\frac{p^5}{q^5}\sum_{a,b} q^{3a+2b} + 3\,\frac{p^5}{q^5}\sum_{a,b,c} q^{3a+b+c} + 3\,\frac{p^5}{q^5}\sum_{a,b,c} q^{2a+2b+c},$$
$$P_5(4) = 7\,\frac{p^5}{q^5}\sum_{a,b,c} q^{3a+b+c} + 2\,\frac{p^5}{q^5}\sum_{a,b} q^{3a+2b} + 12\,\frac{p^5}{q^5}\sum_{a,b,c} q^{2a+2b+c} + 10\,\frac{p^5}{q^5}\sum_{a,b,c,d} q^{2a+b+c+d} + \frac{p^5}{q^5}\sum_{a,b,c,d,e} q^{a+b+c+d+e}.$$
Again as expected the sum of the probabilities of the 52 cases is 1, and
$$F_5 = P_5(1) + 2P_5(2) + 3P_5(3) + 4P_5(4) = \frac{1 + 12q + 30q^2 + 61q^3 + 83q^4 + 102q^5 + 83q^6 + 65q^7 + 34q^8 + 12q^9 - 3q^{10}}{(1+q)^2(1+q^2)(1+q+q^2)(1+q+q^2+q^3+q^4)}.$$
In view of the rapid growth of the Bell numbers, computing the expected number of distinct pairs by this method becomes impractical for much larger $n$.

3. The expected number of distinct pairs

In this section, we find an expression for the probability of obtaining a distinct pair. We split this problem into two cases: i) the $ii$ case, and ii) the $ij$ case where $i \ne j$. Let $P_{i,i}(n)$ and $P_{i,j}(n)$ be the probabilities of getting words of length $n$ which have a pair $ii$ and $ij$ (where $i \ne j$) respectively somewhere in position 1 up to position $n - 1$.

Notation: To simplify the equations we will use the following: $a := pq^{i-1}$, $b := \sqrt{1 + 2a - 3a^2}$, $c := p^2 q^{i+j-2}$, and $d := \sqrt{1 - 4c}$.

We have the following results:

Proposition 1.
$$P_{i,i}(n) = 1 + \frac{2^{-n-1}\bigl((1-b+a)(1-b-a)^n - (1+b+a)(1+b-a)^n\bigr)}{b}$$
(see Subsection 3.1 for proof).

Proposition 2.
$$P_{i,j}(n) = \frac{d(1-d)^n - (1-d)^n - (d+1)^n + c\bigl(4(1-d)^n + 4(d+1)^n - 2^{n+3}\bigr) - d(d+1)^n + 2^{n+1}}{2^{n+1}d^2}$$
(see Subsection 3.2 for proof).

Denote the expected number of distinct pairs in a word of length $n$ by $E(n)$. It is obtained by summing $P_{i,i}(n)$ over all $i$ and $P_{i,j}(n)$ over all $i$ and $j$ for $i \ne j$. Thus
$$E(n) = \sum_{i=1}^{\infty} P_{i,i}(n) + \sum_{i,j\ \mathrm{distinct}} P_{i,j}(n).$$
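The two propositions and the formula for $E(n)$ can be tested against the direct count of Section 2. The sketch below is only a numerical illustration under stated assumptions: the function names are hypothetical, and the infinite sums over $i$ and $j$ are truncated at `max_letter`. For $n = 4$ the result should agree with the closed form (2.3) for $F_4$, and for $n = 2$ it should equal 1 because a two letter word has exactly one pair.

```python
import math

def P_ii(n, i, q):
    """Proposition 1: probability that the pair ii occurs in a word of length n."""
    a = (1 - q) * q ** (i - 1)
    b = math.sqrt(1 + 2 * a - 3 * a * a)
    bracket = (1 - b + a) * (1 - b - a) ** n - (1 + b + a) * (1 + b - a) ** n
    return 1 + bracket / (b * 2 ** (n + 1))

def P_ij(n, i, j, q):
    """Proposition 2: probability that the pair ij (i != j) occurs."""
    c = (1 - q) ** 2 * q ** (i + j - 2)
    d = math.sqrt(1 - 4 * c)
    numerator = (d * (1 - d) ** n - (1 - d) ** n - (d + 1) ** n
                 + c * (4 * (1 - d) ** n + 4 * (d + 1) ** n - 2 ** (n + 3))
                 - d * (d + 1) ** n + 2 ** (n + 1))
    return numerator / (2 ** (n + 1) * d * d)

def expected_pairs(n, q, max_letter=60):
    """E(n) with both sums truncated at max_letter (an assumption of this sketch)."""
    letters = range(1, max_letter + 1)
    equal = sum(P_ii(n, i, q) for i in letters)
    unequal = sum(P_ij(n, i, j, q) for i in letters for j in letters if i != j)
    return equal + unequal

def F4(q):
    """Closed form (2.3) from Section 2."""
    num = 1 + 9*q + 15*q**2 + 20*q**3 + 17*q**4 + 11*q**5 - q**6
    return num / ((1 + q) ** 2 * (1 + q**2) * (1 + q + q**2))

if __name__ == "__main__":
    print(expected_pairs(4, 0.5), F4(0.5))
```

The literal powers `2 ** (n + 1)` and `(1 + b - a) ** n` are fine for short words; for large $n$ one would rescale by $2^n$ inside the power to avoid overflow.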
3.1. Probability of the occurrence of the sequence ii in geometric random words of length n. Here we prove Proposition 1. We first define $g(k)$ to be the probability of the first (left-most) occurrence of the pattern $ii$ at position $(k, k + 1)$ in a geometric random word of arbitrary length. Because we require $i$ in the first two positions of the word (and anything thereafter), $g(1) = a^2$. Also $g(2) = (1 - a)a^2$ because there is no $i$ in position 1 followed by two $i$'s. Likewise $g(3) = (1 - a)a^2$ because here there is no $i$ in position 2 followed by two $i$'s (and anything in all other positions). Using a similar logic we obtain for $k \ge 4$
$$g(k) = \bigl(1 - g(1) - g(2) - \cdots - g(k-3)\bigr)(1-a)a^2 = g(k-1) - (1-a)a^2 g(k-3).$$
The solution of this recursion with its initial conditions is
$$g(k) = \frac{2^{-k}a^2\bigl((1+b-a)^k - (1-b-a)^k\bigr)}{b}. \tag{3.1}$$
Finally sum on $k$, i.e., $P_{i,i}(n) = \sum_{k=1}^{n-1} g(k)$, to get Proposition 1. As a check, $\sum_{k\ge 1} g(k) = 1$, which is expected since $ii$ should occur at some point in an infinite geometrically distributed random word.

3.2. Probability of the occurrence of the sequence ij, for i ≠ j, in geometric random words of length n. Next, we prove Proposition 2. Let $f(k)$ be the probability of the first (leftmost) occurrence of the pair $ij$ where $i \ne j$ in positions $(k, k + 1)$ of a word. We obtain $f(1) = f(2) = p^2 q^{i+j-2}$ because in both cases we require $ij$ in the relevant positions of the word (and anything else in all other positions). Also for $k \ge 2$ we have
$$f(k) = \left(1 - \sum_{r=1}^{k-2} f(r)\right)p^2q^{i+j-2} = f(k-1) - c\,f(k-2)$$
because $1 - \sum_{r=1}^{k-2} f(r)$ in this equation represents the probability of no $ij$ pair before its occurrence in positions $(k, k + 1)$. The solution of this latter recursion subject to the two initial conditions is
$$f(k) = \frac{2^{-k}c\bigl((1+d)^k - (1-d)^k\bigr)}{d}. \tag{3.2}$$
Checking for the probability of $ij$ in infinite words, we once again obtain the expected
$$\sum_{k\ge 1} f(k) = 1.$$
Thus, as before
$$P_{i,j}(n) = \sum_{k=1}^{n-1} f(k) = \frac{d(1-d)^n - (1-d)^n - (d+1)^n + c\bigl(4(1-d)^n + 4(d+1)^n - 2^{n+3}\bigr) - d(d+1)^n + 2^{n+1}}{2^{n+1}d^2},$$
which is our required result.

4. Counting pairs arising from adjacent equal parts

In this section we look more closely at the $ii$ case first seen in Section 3. Let $S_n = \sum_{i=1}^{\infty} P_{i,i}(n)$. As before $a := (1-q)q^{i-1}$ and $b := \sqrt{1 + 2a - 3a^2}$ and the complete sum is
$$S_n := \sum_{i=1}^{\infty}\left(1 + \frac{(1-a-b)^n(1+a-b) - (1-a+b)^n(1+a+b)}{b\,2^{n+1}}\right) \tag{4.1}$$
which after some rearrangement is given by
$$S_n = \sum_{i=1}^{\infty}\left(1 + \frac{(1+a)\bigl((1-a-b)^n - (1-a+b)^n\bigr) - b\bigl((1-a-b)^n + (1-a+b)^n\bigr)}{b\,2^{n+1}}\right). \tag{4.2}$$
In keeping with Section 2 for words of length four and five we show that $S_n$ can be expressed as a rational function of $q$.

Proposition 3. The expected number of distinct pairs consisting of two equal letters is
$$
\begin{aligned}
S_n ={}& -\frac{1}{2^n}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}(-1)^k 3^r\binom{n}{2j-1}\binom{j-1}{r}\binom{n-j}{k}\left(\frac{(1-q)^{k+r}}{1-q^{k+r}} + \frac{(1-q)^{k+r+1}}{1-q^{k+r+1}}\right)\\
&- \frac{1}{2^n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\sum_{k=1}^{n-m}\sum_{r=0}^{m}(-1)^k 3^r\binom{n}{2m}\binom{m}{r}\binom{n-m}{k}\frac{(1-q)^{k+r}}{1-q^{k+r}}
- \frac{n}{2}\sum_{r=1}^{\lfloor\frac{n}{2}\rfloor}\frac{3^r\binom{n-r}{r}(1-q)^r}{4^r(n-r)(1-q^r)}\\
&- \frac{1}{2^n}\sum_{r=1}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n-r-1}{r}2^{n-2r-1}3^r\left(\frac{(1-q)^r}{1-q^r} + \frac{(1-q)^{r+1}}{1-q^{r+1}}\right) - \frac{1}{2}.
\end{aligned}
$$
(The $j = 0$ term vanishes since $\binom{n}{-1} = 0$.)
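Proposition 3 can be compared numerically with the defining sum $S_n = \sum_i P_{i,i}(n)$. The sketch below makes several assumptions for illustration: the function names are hypothetical, the sum over $i$ is truncated at `max_letter`, and the $j = 0$ term of the first triple sum is skipped because $\binom{n}{-1} = 0$. It uses the fact that $\sum_{i\ge 1}\bigl((1-q)q^{i-1}\bigr)^s = (1-q)^s/(1-q^s)$ for $s \ge 1$.

```python
from math import comb, sqrt

def A(s, q):
    """Geometric sum over i >= 1 of ((1-q) q^(i-1))**s, valid for s >= 1."""
    return (1 - q) ** s / (1 - q ** s)

def S_prop3(n, q):
    """Rational-function expression of Proposition 3 (j = 0 term omitted, it is zero)."""
    total = 0.0
    for j in range(1, (n + 1) // 2 + 1):
        for k in range(1, n - j + 1):
            for r in range(j):
                coeff = (-1) ** k * 3 ** r * comb(n, 2*j - 1) * comb(j - 1, r) * comb(n - j, k)
                total -= coeff * (A(k + r, q) + A(k + r + 1, q)) / 2 ** n
    for m in range(n // 2 + 1):
        for k in range(1, n - m + 1):
            for r in range(m + 1):
                coeff = (-1) ** k * 3 ** r * comb(n, 2*m) * comb(m, r) * comb(n - m, k)
                total -= coeff * A(k + r, q) / 2 ** n
    for r in range(1, n // 2 + 1):
        total -= n / 2 * 3 ** r * comb(n - r, r) * A(r, q) / (4 ** r * (n - r))
    for r in range(1, (n - 1) // 2 + 1):
        total -= comb(n - r - 1, r) * 2 ** (n - 2*r - 1) * 3 ** r * (A(r, q) + A(r + 1, q)) / 2 ** n
    return total - 0.5

def S_direct(n, q, max_letter=200):
    """Truncated sum over i of P_{i,i}(n), with P_{i,i}(n) taken from (4.1)."""
    total = 0.0
    for i in range(1, max_letter + 1):
        a = (1 - q) * q ** (i - 1)
        b = sqrt(1 + 2 * a - 3 * a * a)
        r_minus = ((1 - a - b) / 2) ** n   # powers rescaled by 2**n
        r_plus = ((1 - a + b) / 2) ** n
        total += 1 + ((1 + a - b) * r_minus - (1 + a + b) * r_plus) / (2 * b)
    return total

if __name__ == "__main__":
    for n in range(2, 8):
        print(n, S_prop3(n, 0.5), S_direct(n, 0.5))
```

For $n = 2$ both expressions reduce to $(1-q)/(1+q)$, the probability that the two letters of the word are equal.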
We prove Proposition 3 by first splitting the sum in (4.2) into odd and even cases of the terms in the binomial expansions. The sum on $j$ is for odd cases and the sum on $m$ is for the even cases. Thus
$$S_n = \sum_{i=1}^{\infty}\left[1 + \frac{1}{b\,2^n}\left(-(1+a)\sum_{j=1}^{\lfloor\frac{n+1}{2}\rfloor}\binom{n}{2j-1}(1-a)^{n-2j+1}b^{2j-1} - b\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}(1-a)^{n-2m}b^{2m}\right)\right]. \tag{4.3}$$
Now consider the odd cases only, i.e., the sum on $j$ with half of the 1 from the first term:
$$S_{\mathrm{odd}}(n) = \sum_{i=1}^{\infty}\left(\frac{1}{2} - \frac{1+a}{2^n b}\sum_{j=1}^{\lfloor\frac{n+1}{2}\rfloor}\binom{n}{2j-1}(1-a)^{n-2j+1}b^{2j-1}\right).$$
Next, replace $a$ and $b$ with their definitions given earlier to obtain for $S_{\mathrm{odd}}$
$$\sum_{i=1}^{\infty}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\left(\frac{1}{2\bigl(\lfloor\frac{n+1}{2}\rfloor+1\bigr)} - \frac{1+(1-q)q^{i-1}}{2^n}\binom{n}{2j-1}\bigl(1-(1-q)q^{i-1}\bigr)^{n-j}\bigl(1+3(1-q)q^{i-1}\bigr)^{j-1}\right).$$
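After expanding the two binomial expressions, we get
$$
\begin{aligned}
\sum_{i=1}^{\infty}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\Biggl(&\frac{1}{2\bigl(\lfloor\frac{n+1}{2}\rfloor+1\bigr)} - \frac{1}{2^n}\binom{n}{2j-1}\sum_{k=0}^{n-j}\sum_{r=0}^{j-1}\binom{j-1}{r}\binom{n-j}{k}(-1)^k(1-q)^{k+r}3^r q^{(r+k)(i-1)}\\
&- \frac{1-q}{2^n}\binom{n}{2j-1}\sum_{k=0}^{n-j}\sum_{r=0}^{j-1}\binom{j-1}{r}\binom{n-j}{k}(-1)^k(1-q)^{k+r}3^r q^{(r+k+1)(i-1)}\Biggr).
\end{aligned}
$$
We split up the $k$ sum because we cannot sum the geometric series on $i$ when $k = r = 0$, thus
$$
\begin{aligned}
S_{\mathrm{odd}}(n) ={}& \sum_{i=1}^{\infty}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\left(\frac{1}{2\bigl(\lfloor\frac{n+1}{2}\rfloor+1\bigr)} - \frac{\binom{n}{2j-1}}{2^n}\right)\\
&- \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\frac{\binom{n}{2j-1}}{2^n}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}\binom{j-1}{r}\binom{n-j}{k}\frac{3^r(1-q)^{k+r}(-1)^k}{1-q^{r+k}}\\
&- \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\frac{\binom{n}{2j-1}}{2^n}\sum_{r=1}^{j-1}\binom{j-1}{r}\frac{3^r(1-q)^r}{1-q^r}\\
&- \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}(1-q)\frac{\binom{n}{2j-1}}{2^n}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}\binom{j-1}{r}\binom{n-j}{k}\frac{3^r(1-q)^{k+r}(-1)^k}{1-q^{r+k+1}}\\
&- \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}(1-q)\frac{\binom{n}{2j-1}}{2^n}\sum_{r=1}^{j-1}\binom{j-1}{r}\frac{3^r(1-q)^r}{1-q^{r+1}}
- \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\frac{\binom{n}{2j-1}}{2^n}.
\end{aligned}
$$
Note that the first two terms are zero, i.e.,
$$\sum_{i=1}^{\infty}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\left(\frac{1}{2\bigl(\lfloor\frac{n+1}{2}\rfloor+1\bigr)} - \frac{\binom{n}{2j-1}}{2^n}\right) = 0.$$
Thus
$$
\begin{aligned}
S_{\mathrm{odd}}(n) ={}& -\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\frac{\binom{n}{2j-1}}{2^n}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}\binom{j-1}{r}\binom{n-j}{k}\frac{3^r(1-q)^{k+r}(-1)^k}{1-q^{r+k}}
- \frac{1}{2^n}\sum_{j=1}^{\lfloor\frac{n+1}{2}\rfloor-1}\binom{n}{2j+1}\sum_{r=1}^{j}\binom{j}{r}\frac{3^r(1-q)^r}{1-q^r}\\
&- \frac{1}{2^n}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}\binom{n}{2j-1}\binom{j-1}{r}\binom{n-j}{k}(1-q)(-1)^k\frac{3^r(1-q)^{k+r}}{1-q^{r+k+1}}\\
&- \frac{1}{2^n}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\sum_{r=1}^{j-1}\binom{n}{2j-1}\binom{j-1}{r}\frac{3^r(1-q)^{r+1}}{1-q^{r+1}}
- \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\frac{\binom{n}{2j-1}}{2^n}.
\end{aligned}
$$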
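Next, consider the even terms (i.e., the sum on $m$), together with $\frac{1}{2}$ from the initial 1 in (4.3). We will call this $S_{\mathrm{even}}(n)$:
$$S_{\mathrm{even}}(n) = \sum_{i=1}^{\infty}\left(\frac{1}{2} - \frac{1}{2^n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}(1-a)^{n-2m}b^{2m}\right).$$
Using the same procedure (expand the binomials and split the sum on $k$) we obtain:
$$
\begin{aligned}
S_{\mathrm{even}}(n) ={}& \sum_{i=1}^{\infty}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\left(\frac{1}{2\bigl(\lfloor\frac{n}{2}\rfloor+1\bigr)} - \frac{\binom{n}{2m}}{2^n}\right)
- \sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\frac{\binom{n}{2m}}{2^n}\sum_{k=1}^{n-m}\sum_{r=0}^{m}\binom{m}{r}\binom{n-m}{k}\frac{3^r(1-q)^{k+r}(-1)^k}{1-q^{r+k}}\\
&- \frac{1}{2^n}\sum_{r=1}^{\lfloor\frac{n}{2}\rfloor}\frac{(1-q)^r 3^r}{1-q^r}\sum_{m=r}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\binom{m}{r}.
\end{aligned}
$$
In this case again the first two terms are zero:
$$\sum_{i=1}^{\infty}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\left(\frac{1}{2\bigl(\lfloor\frac{n}{2}\rfloor+1\bigr)} - \frac{\binom{n}{2m}}{2^n}\right) = 0.$$
The $m$-sum in the final term simplifies via the snake-oil method, see [20], to
$$\sum_{m=r}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\binom{m}{r} = \frac{2^{-1+n-2r}\,n\binom{n-r}{r}}{n-r}.$$
Thus we have
$$S_{\mathrm{even}}(n) = \frac{1}{2^n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\sum_{k=1}^{n-m}\sum_{r=0}^{m}\binom{n}{2m}\binom{m}{r}\binom{n-m}{k}(-1)^{k+1}\frac{3^r(1-q)^{k+r}}{1-q^{r+k}} - \frac{n}{2}\sum_{r=1}^{\lfloor\frac{n}{2}\rfloor}\frac{3^r(1-q)^r\binom{n-r}{r}}{4^r(1-q^r)(n-r)}.$$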
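So finally, bringing the odd and even cases together we get
$$
\begin{aligned}
S_n ={}& S_{\mathrm{odd}}(n) + S_{\mathrm{even}}(n)\\
={}& \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\frac{\binom{n}{2j-1}}{2^n}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}\binom{j-1}{r}\binom{n-j}{k}(-1)^{k+1}\frac{3^r(1-q)^{k+r}}{1-q^{r+k}}
- \frac{1}{2^n}\sum_{j=1}^{\lfloor\frac{n+1}{2}\rfloor-1}\binom{n}{2j+1}\sum_{r=1}^{j}\binom{j}{r}\frac{3^r(1-q)^r}{1-q^r}\\
&+ \frac{1}{2^n}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}\binom{n}{2j-1}\binom{j-1}{r}\binom{n-j}{k}(1-q)(-1)^{k+1}\frac{3^r(1-q)^{k+r}}{1-q^{r+k+1}}\\
&- \frac{1}{2^n}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\sum_{r=1}^{j-1}\binom{n}{2j-1}\binom{j-1}{r}\frac{3^r(1-q)^{r+1}}{1-q^{r+1}}
- \sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\frac{\binom{n}{2j-1}}{2^n}\\
&+ \frac{1}{2^n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\sum_{k=1}^{n-m}\sum_{r=0}^{m}\binom{n}{2m}\binom{m}{r}\binom{n-m}{k}(-1)^{k+1}\frac{3^r(1-q)^{k+r}}{1-q^{r+k}}
- \frac{n}{2}\sum_{r=1}^{\lfloor\frac{n}{2}\rfloor}\frac{3^r(1-q)^r\binom{n-r}{r}}{4^r(1-q^r)(n-r)}.
\end{aligned}
$$
Simplifying, we obtain
$$
\begin{aligned}
S_n ={}& \frac{1}{2^n}\sum_{j=0}^{\lfloor\frac{n+1}{2}\rfloor}\sum_{k=1}^{n-j}\sum_{r=0}^{j-1}\binom{n}{2j-1}\binom{j-1}{r}\binom{n-j}{k}(-1)^{k+1}3^r\left(\frac{(1-q)^{k+r}}{1-q^{k+r}} + \frac{(1-q)^{k+r+1}}{1-q^{k+r+1}}\right)\\
&+ \frac{1}{2^n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\sum_{k=1}^{n-m}\sum_{r=0}^{m}\binom{n}{2m}\binom{m}{r}\binom{n-m}{k}(-1)^{k+1}3^r\frac{(1-q)^{k+r}}{1-q^{k+r}}
- \frac{n}{2}\sum_{r=1}^{\lfloor\frac{n}{2}\rfloor}\frac{3^r\binom{n-r}{r}(1-q)^r}{4^r(n-r)(1-q^r)}\\
&- \frac{1}{2^n}\sum_{j=1}^{\lfloor\frac{n+1}{2}\rfloor}\sum_{r=1}^{j-1}\binom{n}{2j-1}\binom{j-1}{r}3^r\left(\frac{(1-q)^r}{1-q^r} + \frac{(1-q)^{r+1}}{1-q^{r+1}}\right) - \frac{1}{2}.
\end{aligned}
$$
We proceed by using the snake-oil method again to prove the identity
$$\sum_{2j-1\le n}\binom{n}{2j-1}\binom{j-1}{r} = 2^{n-2r-1}\binom{n-r-1}{r},$$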
leading to the result in Proposition 3.

5. Asymptotics for the number of distinct pairs with equal adjacent parts

In this section we calculate an asymptotic expression for the number of distinct pairs with equal adjacent parts for words of length $n$ as $n \to \infty$. We have the following result:

Theorem 1. The number of distinct pairs that consist of adjacent equal parts is asymptotic, as $n \to \infty$, to
$$-\frac{1}{2}\log_q n + \frac{1}{2} - \frac{\gamma + 2\log(1-q)}{2\log q} + \frac{1}{2\log q}\sum_{k\ne 0}\Gamma(\chi_k)e^{2k\pi i x},$$
where, as the residue computation at the end of this section shows, $x = \log_q\bigl(n(1-q)^2\bigr)$ and $\chi_k = \frac{2k\pi i}{\log Q}$ with $Q = \frac{1}{q}$.

We use the same expressions for $a$ and $b$ as in the previous section, i.e., $a := (1-q)q^{i-1}$ and $b := \sqrt{1 + 2a - 3a^2}$. Consider $S_n$ from (4.2):
$$S_n = \sum_{i=1}^{\infty}\left(1 + \frac{(1+a)\bigl((1-a-b)^n - (1-a+b)^n\bigr) - b\bigl((1-a-b)^n + (1-a+b)^n\bigr)}{b\,2^{n+1}}\right).$$
The contributions from the terms $(1-a-b)^n = O(a^n)$ are asymptotically negligible as $n \to \infty$, so we get
$$S_n \sim \sum_{i=1}^{\infty}\left(1 - \frac{1+a+b}{b\,2^{n+1}}(1-a+b)^n\right).$$
Proceed by making a substitution, $c_1 := b - 1$, thus we have
$$S_n \sim \sum_{i=1}^{\infty}\left(1 - \frac{1+a+b}{2b}\left(1 - \frac{a}{2} + \frac{c_1}{2}\right)^n\right).$$
Using the exponential approximation $(1-x)^n \sim e^{-nx}$ as $x \to 0$, see for example [6, 19],$^1$ we have
$$S_n \sim \sum_{i=1}^{\infty}\left(1 - \frac{\frac{1+a}{b} + 1}{2}\,e^{-n\left(\frac{a}{2} - \frac{c_1}{2}\right)}\right). \tag{5.1}$$
By expanding $\sqrt{1 + 2a - 3a^2}$ around $a = 0$, we obtain an approximation for $b$ which we shall call $b_2$; thus we let
$$b \sim b_2 := 1 + a - 2a^2 = 1 + (1-q)q^{i-1} - 2(1-q)^2q^{2i-2}$$
and
$$c_1 \sim c_2 := b_2 - 1 = (1-q)q^{i-1} - 2(1-q)^2q^{2i-2}$$
so that
$$\frac{a}{2} - \frac{c_1}{2} \sim \frac{a}{2} - \frac{c_2}{2} = (1-q)^2q^{-2+2i}. \tag{5.2}$$
Next, we use the approximation $\frac{1+a}{b} \sim \frac{1+a}{b_2} = 1 + O(q^i)$ in (5.1) to get
$$S_n \sim \sum_{i=1}^{\infty}\left(1 - e^{-n\left(\frac{a}{2} - \frac{c_2}{2}\right)}\right).$$
Finally, we use (5.2) to obtain
$$S_n \sim \sum_{i=1}^{\infty}\left(1 - e^{-n(1-q)^2q^{-2+2i}}\right).$$
We then use a Mellin transform, see [6, 19], to approximate the sum. The Mellin transform of $f(x)$ will be denoted by $\mathcal{M}(f(x))$ or $f^*(s)$ where
$$\mathcal{M}(f(x)) = \int_0^{\infty} f(x)x^{s-1}\,dx$$
defined over the strip $s \in \langle -u, -v\rangle$.

We want to find the Mellin transform of $S(x) := \sum_{i=1}^{\infty}\bigl(1 - e^{-x(1-q)^2q^{-2+2i}}\bigr)$. We know that $\mathcal{M}(1 - e^{-x}) = -\Gamma(s)$ for the strip $s \in \langle -1, 0\rangle$. By the linearity property, we have
$$
\begin{aligned}
\mathcal{M}(S(x)) &= \sum_{i=1}^{\infty}\mathcal{M}\bigl(1 - e^{-x(1-q)^2q^{-2+2i}}\bigr)\\
&= -\sum_{i=1}^{\infty}\bigl((1-q)^2q^{2i-2}\bigr)^{-s}\Gamma(s) \qquad \text{for the strip } s \in \langle -1, 0\rangle\\
&= -(1-q)^{-2s}q^{2s}\sum_{i=1}^{\infty}q^{-2is}\,\Gamma(s)\\
&= -\frac{1}{(1-q)^{2s}(1-q^{-2s})}\Gamma(s).
\end{aligned}
$$

$^1$ In all the calculations that follow, the errors introduced by using asymptotic equivalences in approximations are $o(1)$ as $n \to \infty$.
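We now use the Mellin inversion formula to return to the original function $S(x)$, where
$$S(x) := \frac{1}{2\pi i}\int_{\alpha - i\infty}^{\alpha + i\infty} S^*(s)x^{-s}\,ds$$
and $\alpha$ is any value in the strip $\langle -1, 0\rangle$; we shall use $\alpha = -\frac{1}{2}$. Thus
$$S(x) = \frac{1}{2\pi i}\int_{-\frac{1}{2} - i\infty}^{-\frac{1}{2} + i\infty}\frac{-\Gamma(s)}{(1-q)^{2s}(1-q^{-2s})}x^{-s}\,ds.$$
Since $n \to \infty$, i.e., $x \to \infty$, we shift the line of integration to the right, and compute the negative residues. There is a double pole at $s = 0$, and a simple pole at $s = \chi_k = \frac{2k\pi i}{\log Q}$ for all $k \ne 0$ where $Q = \frac{1}{q}$. For $s = 0$, we have a negative residue of
$$-\frac{1}{2\log q}\log n + \frac{1}{2} - \frac{\gamma + 2\log(1-q)}{2\log q} = \frac{1}{2}\log_Q n + \frac{1}{2} - \frac{\gamma + 2\log(1-q)}{2\log q},$$
where $\gamma$ is Euler's constant. For $s = \chi_k$, we have a negative residue of
$$\frac{(1-q)^{\frac{4\pi i k}{\log q}}\,n^{-\chi_k}\,\Gamma(\chi_k)}{2\log q}.$$
Summing the residues at $\chi_k$ for $k \in \mathbb{Z}\setminus\{0\}$, we get the fluctuations
$$
\begin{aligned}
\delta\bigl(\log_q(n(1-q)^2)\bigr) &= \frac{1}{2\log q}\sum_{k\ne 0}(1-q)^{\frac{4k\pi i}{\log q}}\,n^{\frac{2k\pi i}{\log q}}\,\Gamma(\chi_k)\\
&= \sum_{k\ne 0}\frac{\Gamma(\chi_k)}{2\log q}\bigl(n(1-q)^2\bigr)^{\frac{2k\pi i}{\log q}}\\
&= \frac{1}{2\log q}\sum_{k\ne 0}\Gamma(\chi_k)e^{2k\pi i\log_q(n(1-q)^2)}.
\end{aligned}
$$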
This gives our result in Theorem 1.
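A rough numerical sanity check of Theorem 1 is sketched below. It is an illustration under stated assumptions, not part of the paper: the function names are hypothetical, the sum over $i$ is truncated at `max_letter`, and the oscillating term is dropped (for $q = 1/2$ its amplitude is of order $10^{-6}$ because $|\Gamma(\chi_k)|$ is tiny). The exact value $\sum_i P_{i,i}(n)$ from Proposition 1 is compared with the non-oscillating part $\frac{1}{2}\log_Q n + \frac{1}{2} - \frac{\gamma + 2\log(1-q)}{2\log q}$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def P_ii(n, i, q):
    """Proposition 1, with powers written as ((1 +/- b - a)/2)**n to avoid overflow."""
    a = (1 - q) * q ** (i - 1)
    b = math.sqrt(1 + 2 * a - 3 * a * a)
    r_plus = ((1 + b - a) / 2) ** n
    r_minus = ((1 - b - a) / 2) ** n
    return 1 + ((1 - b + a) * r_minus - (1 + b + a) * r_plus) / (2 * b)

def S_exact(n, q, max_letter=200):
    """Sum over i of P_{i,i}(n), truncated at max_letter (assumption of this sketch)."""
    return sum(P_ii(n, i, q) for i in range(1, max_letter + 1))

def S_main_term(n, q):
    """Non-oscillating part of Theorem 1 (fluctuations omitted)."""
    log_q = math.log(q)
    return (-0.5 * math.log(n) / log_q + 0.5
            - (EULER_GAMMA + 2 * math.log(1 - q)) / (2 * log_q))

if __name__ == "__main__":
    q = 0.5
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, S_exact(n, q), S_main_term(n, q))
```

The gap between the two columns shrinks slowly as $n$ grows, which is consistent with the $o(1)$ error mentioned in the footnote of this section.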
6. Counting pairs arising from adjacent non-equal parts

Here we find the number of pairs arising from adjacent non-equal parts in words of length $n$. We have

Proposition 4. The number of distinct pairs with non-equal parts is
$$T_n = n\sum_{k=1}^{\lfloor\frac{n}{2}\rfloor}\frac{(-1)^{k-1}(1-q)^{2k}q^k\binom{n-k}{k}}{(1-q^k)^2(1+q^k)(n-k)} + \sum_{k=1}^{\lfloor\frac{n-1}{2}\rfloor}\frac{(-1)^{k-1}(1-q)^{2k}q^k\binom{n-k-1}{k}}{(1-q^k)^2(1+q^k)}.$$

Again, we use the shorthand notation $c = (1-q)^2q^{i+j-2}$ and $d = \sqrt{1 - 4c}$ where $i$ and $j$ are the sizes of the parts making up the distinct pair with non-equal parts. Let
$$T_n := 2\sum_{i=1}^{\infty}\sum_{j=1}^{i-1} t_n,$$
where the factor of two allows for $j > i$ (compare (6.2)), and
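where
$$
\begin{aligned}
t_n :={}& \sum_{k=1}^{n-1}\frac{2^{-k}c\bigl((1+d)^k - (1-d)^k\bigr)}{d}\\
={}& -\frac{1}{d^2}\Bigl(2^{-1-n}d^2\bigl[(1-d)^n + (1+d)^n\bigr] + 2^{-1-n}d\bigl[(1+d)^n - (1-d)^n\bigr] - d^2\Bigr)
\end{aligned}
$$
which yields (after splitting into three terms)
$$t_n = -2^{-1-n}\bigl[(1-d)^n + (1+d)^n\bigr] + \frac{2^{-1-n}\bigl[(1-d)^n - (1+d)^n\bigr]}{d} + 1.$$
Now we use the binomial theorem to separate the odd and even cases as in Section 4. The even cases survive in the first term whereas the odd cases survive in the second term. Thus
$$t_n = -2^{-1-n}\left(\sum_{s=0}^{n}\binom{n}{s}(-d)^s + \sum_{s=0}^{n}\binom{n}{s}d^s\right) + \frac{2^{-1-n}\left(\sum_{s=0}^{n}\binom{n}{s}(-d)^s - \sum_{s=0}^{n}\binom{n}{s}d^s\right)}{d} + 1.$$
Putting the corresponding sums together gives
$$t_n = -2^{-n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}d^{2m} - 2^{-n}\sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}d^{2s} + 1.$$
Expanding the binomial expression again, since $d^2 = 1 - 4c$, we get
$$t_n = -2^{-n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\left(\sum_{k=1}^{m}\binom{m}{k}(-4c)^k + 1\right) - 2^{-n}\sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\left(\sum_{k=1}^{s}\binom{s}{k}(-4c)^k + 1\right) + 1.$$
Now multiply out the brackets to obtain
$$
\begin{aligned}
t_n ={}& -2^{-n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\sum_{k=1}^{m}\binom{m}{k}(-4c)^k - 2^{-n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\\
&- 2^{-n}\sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\sum_{k=1}^{s}\binom{s}{k}(-4c)^k - 2^{-n}\sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1} + 1.
\end{aligned}
$$
We note that
$$-2^{-n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m} - 2^{-n}\sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1} + 1 = 0.$$
So that $t_n$ simplifies further to
$$t_n = -2^{-n}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\sum_{k=1}^{m}\binom{m}{k}(-4c)^k - 2^{-n}\sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\sum_{k=1}^{s}\binom{s}{k}(-4c)^k. \tag{6.1}$$
Now introduce the sum on $i$ and $j$ so that the entire function is $T_n := 2\sum_{i=1}^{\infty}\sum_{j=1}^{i-1} t_n$; i.e.,
$$T_n = 2\sum_{i=1}^{\infty}\sum_{j=1}^{i-1}\sum_{k=1}^{n-1}\frac{2^{-k}c\bigl((1+d)^k - (1-d)^k\bigr)}{d}. \tag{6.2}$$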
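In order to sum the expression (6.1) on $i$ and $j$ we substitute $c = p^2q^{i+j-2}$. The factor of two is to allow for $j > i$. Therefore
$$
\begin{aligned}
T_n = -2^{1-n}\Biggl(&\sum_{i=1}^{\infty}\sum_{j=1}^{i-1}\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\sum_{k=1}^{m}\binom{m}{k}\bigl(-4(1-q)^2q^{i+j-2}\bigr)^k\\
&+ \sum_{i=1}^{\infty}\sum_{j=1}^{i-1}\sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\sum_{k=1}^{s}\binom{s}{k}\bigl(-4(1-q)^2q^{i+j-2}\bigr)^k\Biggr).
\end{aligned}
$$
In order to be able to sum on $i$ and $j$ we interchange the order of summation:
$$
\begin{aligned}
T_n = -2^{1-n}\Biggl(&\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\sum_{k=1}^{m}\binom{m}{k}(-4)^k(1-q)^{2k}\sum_{i=1}^{\infty}\sum_{j=1}^{i-1}q^{k(i+j-2)}\\
&+ \sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\sum_{k=1}^{s}\binom{s}{k}(-4)^k(1-q)^{2k}\sum_{i=1}^{\infty}\sum_{j=1}^{i-1}q^{k(i+j-2)}\Biggr).
\end{aligned}
$$
Evaluating the double sum on $i$ and $j$ gives
$$
\begin{aligned}
T_n = -2^{1-n}\Biggl(&\sum_{m=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\sum_{k=1}^{m}\binom{m}{k}(-4)^k(1-q)^{2k}\frac{q^k}{(1-q^k)^2(1+q^k)}\\
&+ \sum_{s=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\sum_{k=1}^{s}\binom{s}{k}(-4)^k(1-q)^{2k}\frac{q^k}{(1-q^k)^2(1+q^k)}\Biggr).
\end{aligned}
$$
Now interchange the sum on $k$ and the sum on $m$ (respectively $s$) to obtain
$$
T_n = -2^{1-n}\left(\sum_{k=1}^{\lfloor\frac{n}{2}\rfloor}\frac{(-4)^k(1-q)^{2k}q^k}{(1-q^k)^2(1+q^k)}\sum_{m=k}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\binom{m}{k} + \sum_{k=1}^{\lfloor\frac{n-1}{2}\rfloor}\frac{(-4)^k(1-q)^{2k}q^k}{(1-q^k)^2(1+q^k)}\sum_{s=k}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\binom{s}{k}\right).
$$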
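Proceed using the following two identities:
$$\sum_{m=k}^{\lfloor\frac{n}{2}\rfloor}\binom{n}{2m}\binom{m}{k} = \frac{2^{-1-2k+n}\,n\binom{n-k}{k}}{n-k}$$
and
$$\sum_{s=k}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n}{2s+1}\binom{s}{k} = 2^{-1-2k+n}\binom{n-k-1}{k},$$
to obtain Proposition 4.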
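The two binomial identities and Proposition 4 itself can be checked numerically. The following sketch is illustrative only: the function names are hypothetical and the double sum over $i > j$ is truncated at `max_letter`. The identities are tested in exact integer arithmetic after clearing the powers of two, and $T_n$ from Proposition 4 is compared with the defining sum (6.2).

```python
from math import comb, sqrt

def identity_even_holds(n, k):
    """sum_m C(n,2m) C(m,k) = 2^(n-1-2k) n C(n-k,k) / (n-k), checked in integers."""
    lhs = sum(comb(n, 2 * m) * comb(m, k) for m in range(k, n // 2 + 1))
    return lhs * (n - k) * 4 ** k == 2 ** (n - 1) * n * comb(n - k, k)

def identity_odd_holds(n, k):
    """sum_s C(n,2s+1) C(s,k) = 2^(n-1-2k) C(n-k-1,k), checked in integers."""
    lhs = sum(comb(n, 2 * s + 1) * comb(s, k) for s in range(k, (n - 1) // 2 + 1))
    return lhs * 4 ** k == 2 ** (n - 1) * comb(n - k - 1, k)

def T_prop4(n, q):
    """Proposition 4."""
    def w(k):
        return (-1) ** (k - 1) * (1 - q) ** (2 * k) * q ** k / ((1 - q**k) ** 2 * (1 + q**k))
    first = n * sum(w(k) * comb(n - k, k) / (n - k) for k in range(1, n // 2 + 1))
    second = sum(w(k) * comb(n - k - 1, k) for k in range(1, (n - 1) // 2 + 1))
    return first + second

def T_direct(n, q, max_letter=80):
    """Definition (6.2), with i and j truncated at max_letter (assumption of this sketch)."""
    total = 0.0
    for i in range(1, max_letter + 1):
        for j in range(1, i):
            c = (1 - q) ** 2 * q ** (i + j - 2)
            d = sqrt(1 - 4 * c)
            s_plus, s_minus = (1 + d) / 2, (1 - d) / 2
            total += sum(c * (s_plus**k - s_minus**k) / d for k in range(1, n))
    return 2 * total

if __name__ == "__main__":
    for n in range(2, 9):
        print(n, T_prop4(n, 0.5), T_direct(n, 0.5))
```

For $n = 2$ both expressions reduce to $2q/(1+q)$, the probability that the two letters of the word differ.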
7. Asymptotics for the number of distinct pairs with non-equal adjacent parts

We prove the following result:

Theorem 2. The number of distinct pairs with non-equal adjacent parts is asymptotic to
$$
\begin{aligned}
&\frac{1}{2}(\log_q n)^2 + \frac{(2\gamma + 4\log p - \log q)\log_q n}{2\log q}
+ \frac{6\gamma^2 + \pi^2 + 24\log^2 p + 12\log p\,(2\gamma - \log q) - 6\gamma\log q - (\log q)^2}{12(\log q)^2}\\
&+ \sum_{k\ne 0}\frac{3(1-q)^{-2\chi_k}x^{-\chi_k}\Gamma(\chi_k)}{2\log q} + \sum_{k\ne 0}\frac{(1-q)^{-2\chi_k}x^{-\chi_k}\Gamma(\chi_k)}{2\log q},
\end{aligned}
$$
as $n \to \infty$.

We start with the simplified expression for $P_{i,j}(n)$ obtained from Proposition 2 in Section 3:
$$P_{i,j}(n) = -\frac{2^{-1-n}}{d^2}\Bigl(-2^{1+n} + (1-d)^{n+1} + (1+d)^{n+1} + \bigl[2^{3+n} - 4(1-d)^n - 4(1+d)^n\bigr]c\Bigr).$$
In order to approximate this as $n \to \infty$ we use the fact that $d$ is close to 1, so that $1 - d$ is negligible. Thus
$$P_{i,j}(n) \sim -\frac{2^{-1-n}}{d^2}\Bigl(-2^{1+n} + (1+d)^{n+1} + \bigl(2^{3+n} - 4(1+d)^n\bigr)c\Bigr).$$
Since $4c = 1 - d^2$, this simplifies to
$$P_{i,j}(n) \sim 1 - 2^{-1-n}(1+d)^n\left(1 + \frac{1}{d}\right)$$
or, using the fact that $1 + \frac{1}{d}$ is close to 2,
$$P_{i,j}(n) \sim 1 - 2^{-n}(1+d)^n.$$
We then sum $P_{i,j}(n)$ over $i$ and $j$ to obtain
$$P(n) = 2\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty}\bigl(1 - 2^{-n}(1+d)^n\bigr) = 2\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty}\left(1 - \left(1 + \frac{d-1}{2}\right)^n\right).$$
Again we use the exponential approximation $(1-x)^n \sim e^{-nx}$ as $x \to 0$, see [6], to obtain
$$P(n) \sim 2\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty}\left(1 - e^{n\frac{d-1}{2}}\right) \sim 2\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty}\bigl(1 - e^{-nc}\bigr)$$
since $d = \sqrt{1 - 4c} \sim 1 - 2c$. To obtain an asymptotic approximation we apply, as before, the Mellin transform, where the formulae can be found in the previous section. We start with
$$P(n) \sim 2\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty}\left(1 - e^{-np^2q^{i+j-2}}\right).$$
So that by the scaling rule
$$\mathcal{M}\bigl(1 - e^{-xp^2q^{i+j-2}}\bigr) = -(p^2q^{i+j-2})^{-s}\Gamma(s)$$
and
$$
\begin{aligned}
\mathcal{M}\left(2\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty}\bigl(1 - e^{-xp^2q^{i+j-2}}\bigr)\right)
&= -2\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty}(p^2q^{i+j-2})^{-s}\Gamma(s) \qquad \text{by linearity}\\
&= -2\left(\frac{p^2}{q^2}\right)^{-s}\Gamma(s)\sum_{i=1}^{\infty}(q^i)^{-s}\sum_{j=i+1}^{\infty}(q^j)^{-s}\\
&= -2\left(\frac{q}{p}\right)^{2s}\Gamma(s)\sum_{i=1}^{\infty}(q^{-s})^i\,\frac{q^{-s(i+1)}}{1-q^{-s}}\\
&= -2\left(\frac{q}{p}\right)^{2s}\Gamma(s)\,\frac{q^{-s}}{1-q^{-s}}\sum_{i=1}^{\infty}(q^{-2s})^i\\
&= -\frac{2\Gamma(s)}{p^{2s}q^{s}(1-q^{-s})(1-q^{-2s})}.
\end{aligned}
$$
To use the inversion formula we need the negative residues of $-\frac{2\Gamma(s)}{p^{2s}q^{s}(1-q^{-s})(1-q^{-2s})}\,x^{-s}$. We have a triple pole at $s = 0$ and a double pole at $s = \chi_k$ for $k \ne 0$. The contribution for the triple pole is
$$\frac{1}{2}(\log_q n)^2 + \frac{(2\gamma + 4\log p - \log q)\log_q n}{2\log q} + \frac{6\gamma^2 + \pi^2 + 24\log^2 p + 12\log p\,(2\gamma - \log q) - 6\gamma\log q - (\log q)^2}{12(\log q)^2}.$$
The main term $\frac{1}{2}(\log_Q n)^2$ is as expected, since the number of distinct values in a random geometric sequence is $\sim \log_Q n$, see [4], and hence the number of possible distinct pairs is approximately $\log_Q^2 n$. The contribution from the double poles for all $k \ne 0$ is obtained from the expansion
$$\frac{1}{(1-q^{-s})(1-q^{-2s})} = \frac{1}{(1-e^{-s\log q})(1-e^{-2s\log q})} \sim \frac{1}{\bigl(1-(1-s\log q+\frac{s^2}{2}\log^2 q)\bigr)\bigl(1-(1-2s\log q+2s^2\log^2 q)\bigr)}.$$
The contribution from the double poles at $\chi_k$ is
$$\sum_{k\ne 0}\frac{3\Gamma(\chi_k)}{2p^{2\chi_k}\log q}\,e^{2k\pi i\log_Q n}.$$
But there are also simple poles at other values of $s$ where $q^{2s} = 1$; i.e., $s = \frac{\pi i k}{\log q}$, where the even cases have already been considered, so we only consider the odd values $s = \frac{\pi i}{\log q}(2m-1)$ for $k = 2m - 1$. The residues contribute
$$\sum_{m\in\mathbb{Z}}\frac{\Gamma(\hat{\chi}_m)}{2p^{2\hat{\chi}_m}\log q}\,e^{(2m-1)\pi i\log_Q n}$$
where $\hat{\chi}_m = \frac{(2m-1)\pi i}{\log q}$. We therefore obtain the fluctuations for the even cases $\chi_k = \frac{2\pi i k}{\log q}$,
$$\sum_{k\ne 0}\frac{3(1-q)^{-2\chi_k}x^{-\chi_k}\Gamma(\chi_k)}{2\log q},$$
and for the odd cases $\chi_k = \frac{\pi i(2k-1)}{\log q}$,
$$\sum_{k\ne 0}\frac{(1-q)^{-2\chi_k}x^{-\chi_k}\Gamma(\chi_k)}{2\log q}.$$
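As a rough numerical sanity check of the non-oscillating part of Theorem 2, the sketch below compares the exact value $2\sum_{i>j} P_{i,j}(n)$ with the three main terms coming from the triple pole. Everything here is an illustration under stated assumptions: the function names are hypothetical, the sums over $i$ and $j$ are truncated at `max_letter`, the fluctuation sums are omitted (for $q = 1/2$ they are expected to be of order $10^{-3}$), and $t_n$ is taken in the three-term form of Section 6 with the powers rescaled by $2^n$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def T_exact(n, q, max_letter=120):
    """2 * sum over i > j of P_{i,j}(n), truncated at max_letter."""
    total = 0.0
    for i in range(1, max_letter + 1):
        for j in range(1, i):
            c = (1 - q) ** 2 * q ** (i + j - 2)
            d = math.sqrt(1 - 4 * c)
            s_plus = ((1 + d) / 2) ** n    # 2**(-n) * (1 + d)**n
            s_minus = ((1 - d) / 2) ** n   # 2**(-n) * (1 - d)**n
            # t_n = -2^(-1-n)[(1-d)^n + (1+d)^n] + 2^(-1-n)[(1-d)^n - (1+d)^n]/d + 1
            total += 1 - 0.5 * (s_minus + s_plus) + 0.5 * (s_minus - s_plus) / d
    return 2 * total

def T_main_terms(n, q):
    """Triple-pole contribution in Theorem 2 (fluctuation sums omitted)."""
    g = EULER_GAMMA
    lq = math.log(q)
    lp = math.log(1 - q)
    log_q_n = math.log(n) / lq
    constant = (6 * g**2 + math.pi**2 + 24 * lp**2 + 12 * lp * (2 * g - lq)
                - 6 * g * lq - lq**2) / (12 * lq**2)
    return 0.5 * log_q_n**2 + (2 * g + 4 * lp - lq) * log_q_n / (2 * lq) + constant

if __name__ == "__main__":
    q = 0.5
    for n in (10**3, 10**4, 10**5):
        print(n, T_exact(n, q), T_main_terms(n, q))
```

The remaining small gap between the two columns is where the oscillating terms and the $o(1)$ approximation errors live; this sketch does not attempt to evaluate them.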
This finally yields the required result.

References

[1] M. Archibald, A. Blecher, C. Brennan and A. Knopfmacher, Descents following maximal values in samples of geometric random variables, Statistics and Probability Letters, 97, 229–240, (2015).
[2] M. Archibald, A. Blecher, C. Brennan, A. Knopfmacher and H. Prodinger, Geometric random variables: Descents following maxima, Statistics and Probability Letters, 124, 140–147, (2017).
[3] M. Archibald and A. Knopfmacher, The largest missing value in a sample of geometric random variables, Combinatorics, Probability and Computing, 23, 670–685, special issue dedicated to Philippe Flajolet, (2014).
[4] M. Archibald, A. Knopfmacher and H. Prodinger, The number of distinct values in a geometrically distributed sample, European Journal of Combinatorics, 27, 1059–1081, (2006).
[5] C. Brennan and A. Knopfmacher, Descent variation in samples of geometric random variables, Discrete Mathematics and Theoretical Computer Science, 15:2, 1–12, (2013).
[6] P. Flajolet and R. Sedgewick, Analytic Combinatorics, Cambridge University Press, 2009.
[7] M. Fuchs and M. Javanian, Limit behavior of maxima in geometric words representing set partitions, Applicable Analysis and Discrete Mathematics, 9:2, 313–331, (2015).
[8] M. Fuchs and H. Prodinger, Words with a generalized restricted growth property, Indagationes Mathematicae (special issue in memory of N. G. de Bruijn), 24:4, 124–133, (2013).
[9] R. Kalpathy and M. D. Ward, On a Leader Election Algorithm: Truncated Geometric Case Study, Statistics and Probability Letters, 87, 40–47, (2014).
[10] P. Kirschenhofer and H. Prodinger, The number of winners in a discrete geometrically distributed sample, Annals of Applied Probability, 6, 687–694, (1996).
[11] G. Louchard, The Asymmetric leader election algorithm: number of survivors near the end of the game, Quaestiones Mathematicae, 1–19, (2015).
[12] G. Louchard, H. Prodinger and M. D. Ward, The number of distinct values of some multiplicity in sequences of geometrically distributed random variables, In 2005 International Conference on Analysis of Algorithms, Discrete Mathematics and Theoretical Computer Science, 231–256, (2005).
[13] G. Louchard and H. Prodinger, On gaps and unoccupied urns in sequences of geometrically distributed random variables, Discrete Mathematics, 308, 1538–1562, (2008).
[14] G. Louchard and M. D. Ward, The Truncated Geometric Election Algorithm: Duration of the Election, Statistics and Probability Letters, 101, 40–48, (2015).
[15] H. Prodinger, Combinatorics of geometrically distributed random variables: Left-to-right maxima, Discrete Mathematics, 153, 253–270, (1996).
[16] H. Prodinger, The box parameter for words and permutations, Central European Journal of Mathematics, 12(1), 167–174, (2014).
[17] L.-L. Cristea and H. Prodinger, The visibility parameter for words and permutations, Central European Journal of Mathematics, 11, 283–295, (2013).
[18] R. Stanley, Enumerative combinatorics, Wadsworth and Brooks/Cole, ISBN: 0-534-06546-5, Advanced books and software, Monterey, California, 1986.
[19] W. Szpankowski, Average Case Analysis of Algorithms on Sequences, John Wiley and Sons, New York, 2001.
[20] H. Wilf, Generatingfunctionology, ISBN: 0-12-751956-4, Academic Press, Inc., United Kingdom, 1994.

M. Archibald, A. Blecher, C. Brennan, A. Knopfmacher: The John Knopfmacher Centre for Applicable Analysis and Number Theory, School of Mathematics, University of the Witwatersrand, Private Bag 3, Wits 2050, Johannesburg, South Africa