ADVANCES IN APPLIED MATHEMATICS 19, 378-414 (1997)
ARTICLE NO. AM970555
Measures of Distinctness for Random Partitions and Compositions of an Integer

H.-K. Hwang* and Y.-N. Yeh†

Institute of Mathematics, Academia Sinica, Nankang, Taipei, Taiwan, Republic of China

Received February 15, 1997; accepted March 15, 1997
* E-mail: [email protected]. † E-mail: [email protected].
0196-8858/97 $25.00. Copyright © 1997 by Academic Press. All rights of reproduction in any form reserved.

1. INTRODUCTION

Besides their intrinsic interest, partitions and compositions of integers are often used as theoretical models for evolutionary processes in different contexts: statistical mechanics, theory of quantum strings, population biology, nonparametric statistics, etc.; cf. [1, 4, 8, 10, 12, 30, 49, 54]. Also, parameters in partitions often have natural interpretations in terms of characters in symmetric groups; cf. [15, 47]. Thus properties (statistical, algebraic, analytic, ...) of these objects have received constant attention in the literature.

In many situations, the notion of "degree of distinctness" naturally arises. The classical birthday paradox states that one needs at least 23 people for two of them to share a birthday with probability greater than 1/2, assuming all birth dates to be equally likely; cf. [16]. The coupon collector problem is similar: what is the expected number of coupons one needs to gather before a full collection, under suitable probability assumptions on the issuing of the coupons? In applications in which only the first product (element, particle, ...) is "expensive" and the "cost" of the remaining reproductions is negligible, the study of the measures of distinctness becomes meaningful and important. The number of distinct outcomes in a sequence of multinomial trials (the classical occupancy problem) has wide applications; see, for example, Knuth [34], Johnson and Kotz [28], Kolchin et al. [35], Arató and Benczúr [5], and Vitter and Chen [50]. The number of distinct sites visited by a random walk plays an important role
in a number of applications in physics and chemistry; see Larralde and Weiss [36] and the references therein. Finally, since distinct irreducible factors over finite fields are important in most algorithms for factorizing polynomials, it is also of interest to investigate the measures of distinctness (in order to determine the complexity of the algorithms); cf. [34, 46, 18].

Such a metric notion is useful and widely used in different fields. In number theory, the number of prime factors (with or without multiplicity) has long been used as a measure of compositeness of an integer; cf. [25, 40]. In algorithmic theory, the introduction of the measures of presortedness, like inversions, runs, total variations, etc., for sorting problems is well justified by its practical applications; cf. [37]. In information theory, the Shannon entropy is the standard measure of information of a source code. The study of measures of association for data is an important issue in many quantitative problems arising in diverse disciplines such as political science, psychology, and sociology, and in rank statistics, Kendall's $\tau$ and Spearman's $\rho$ are commonly used measures of association (or disarray) of data; cf. [11, 22, 29]. In probability theory, the Kolmogorov distance and the total variation distance between two distributions are frequently used measures of closeness.

This paper is concerned with problems of the following type: Given a random (under a suitable probability model) partition or composition, study quantitatively the measures of the degree of distinctness of its parts.

Erdős and Lehner [14] were the first, from a probabilistic point of view, to study, in their classical and concise paper, the number and the sum of distinct parts in partitions (into positive integers). More precisely, if

$$P:\ r_1\cdot 1 + r_2\cdot 2 + \cdots + r_n\cdot n = n;\qquad r_j \ge 0,\ j = 1, 2, \dots, n,$$

denotes a partition of $n$, they considered the two quantities

$$\sum_{1\le j\le n} 1_{\{r_j\ge 1\}} \qquad\text{and}\qquad \sum_{1\le j\le n} j\cdot 1_{\{r_j\ge 1\}}, \tag{1}$$
where $1_A = 1$ if property $A$ holds and $1_A = 0$ otherwise. These two typical quantities measure to some extent the distinctness of the parts in $P$. Their results state that a randomly given partition of $n$ has about $\sqrt{6n}/\pi$ distinct parts whose sum is asymptotic to $6n/\pi^2$, where a uniform probability measure on the set of partitions of $n$ is assigned.¹

¹ The constant $6/\pi^2$ is also the density of square-free positive integers and the probability that two integers should be prime to each other; cf. [25, Chap. XVIII].

Only quite recently did these measures receive further attention. Wilf [55] studied the number of distinct components in general (decomposable) combinatorial structures. In particular, he rederived the result of Erdős and Lehner for the mean number of distinct summands in a random partition. Central limit theorems for the number of distinct summands in partitions were derived by Goh and Schmutz [21]; see also Schmutz [45]. Local limit theorems were studied by Hwang [26].

The corresponding problems for compositions are, unlike most other ones, more complicated and were first treated by Knopfmacher and Mays [31, 32]. They derived some combinatorial properties of the number and sum of distinct parts using an elementary approach. Another paper by Richmond and Knopfmacher [44] is also interesting since the results there further reveal the intricacy of the composition structure when studied from a "distinct" viewpoint. See also Warlimont [51] for a multiplicative counterpart.

A stochastic model closely related to integer composition is the one studied by Chen [9], who considered the number of distinct values assumed by a (finite) sequence of discrete random variables with total sum $n$. Although limit theorems for the number of distinct parts in general compositions can be formulated in this model with a suitably chosen distribution, his general limit theorems are not useful for our purposes.

The purpose of this paper is twofold. First, the measures of distinctness for partitions and compositions are studied in a more general framework: we develop general methods for deriving generating functions for measures in partitions and in different kinds of compositions (ordinary, cyclic, and branching) with parts belonging to any specific subset $\Lambda$ of positive integers. Each of these types of compositions has its own operational characteristic and specific analytic properties. Next, we use probabilistic and analytic methods to investigate in detail the asymptotic behavior of a general weighted sum, including the number and sum of distinct parts in a random partition and composition as special cases.
Our results reveal, in particular, a general phenomenon of "logarithmic transfer" from the asymptotic behavior of the given counting sequence to that of the mean number of distinct summands. Also of special interest is the periodic oscillation in the asymptotic expansion of the number of distinct summands in compositions (but not in partitions), a result demanding further structural interpretations. Some extensions will be briefly discussed in Section 5.

Notation. Throughout this paper, we denote by $\Lambda = \{\lambda_j\}_{j\ge 1}$ an infinite sequence of positive integers such that $1 \le \lambda_1 < \lambda_2 < \cdots$. The generating function of $\Lambda$ is denoted by $\Lambda(z) = \sum_{j\ge 1} z^{\lambda_j}$. The symbol $[z^n] f(z)$ represents the coefficient of $z^n$ in the Taylor expansion of $f(z)$. The Vinogradov symbol $\ll$ is used as a synonym of Landau's $O(\cdot)$ symbol. All unspecified limits (including $O$, $o$, $\sim$, $\approx$, and $\ll$) are taken to be $n \to \infty$. The symbol $w(x)$ denotes a certain real-valued weight function. We use $\Lambda$-partition or $\Lambda$-composition to mean a partition or composition of an integer into parts $\lambda_j$. The symbol $\varepsilon$ always denotes an arbitrarily small but fixed quantity whose value may vary from one occurrence to another.
2. MEASURES OF DISTINCTNESS

Let

$$P = P(n):\ r_1\cdot\lambda_1 + r_2\cdot\lambda_2 + \cdots = n;\qquad r_j \ge 0,\ j = 1, 2, \dots, \tag{2}$$

denote a $\Lambda$-partition or $\Lambda$-composition of $n$. Consider the following ways to measure the degree of distinctness of $P$ generalizing (1):

$$X_n(w) := \sum_{j\ge 1} w(\lambda_j)\cdot 1_{\{r_j\ge 1\}} \qquad (w \in \mathbb{R}). \tag{3}$$
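As a concrete reading of (3), the following minimal sketch evaluates the measure for a single partition or composition given as a list of parts. The function name `distinctness` and its arguments are illustrative choices, not notation from the paper.

```python
from typing import Callable, Iterable

def distinctness(parts: Iterable[int], w: Callable[[int], float]) -> float:
    """Evaluate X_n(w): add w(part) once for each distinct part."""
    return sum(w(p) for p in set(parts))

# The composition 4 = 1 + 2 + 1 has the distinct parts 1 and 2.
example = [1, 2, 1]
number_of_distinct_parts = distinctness(example, lambda x: 1)  # weight w(x) = 1
sum_of_distinct_parts = distinctness(example, lambda x: x)     # weight w(x) = x
```

Multiplicities are discarded by `set(parts)`, which mirrors the indicator of the event that a part occurs at least once.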
The general measure $X_n(w)$ is a random variable when $P(n)$ is chosen uniformly at random from the set of $\Lambda$-partitions (or $\Lambda$-compositions) of $n$. Thus $X_n$ is a weighted sum of dependent Bernoulli random variables. The following probabilistic approach is especially useful for studying mean values of measures of the form (3). By the linearity of expectation, we have

$$\mathbb{E}(X_n) = \sum_{j\ge 1} w(\lambda_j)\,\mathbb{P}(r_j \ge 1) = \sum_{j\ge 1} w(\lambda_j)\,\mathbb{P}(\lambda_j \text{ appears in } P).$$

Let $p_\Lambda(n)$ and $c_\Lambda(n)$ denote, respectively, the number of $\Lambda$-partitions and $\Lambda$-compositions of $n$:

$$p_\Lambda(n) = [z^n] \prod_{j\ge 1} \bigl(1 - z^{\lambda_j}\bigr)^{-1}, \qquad c_\Lambda(n) = [z^n] \frac{1}{1 - \Lambda(z)}.$$

Then

$$\mathbb{E}(X_n) = \sum_{j\ge 1} w(\lambda_j)\, \frac{p_\Lambda(n - \lambda_j)}{p_\Lambda(n)},$$

and

$$\mathbb{E}(X_n) = \sum_{j\ge 1} w(\lambda_j) \left(1 - \frac{[z^n]\bigl(1 - \Lambda(z) + z^{\lambda_j}\bigr)^{-1}}{c_\Lambda(n)}\right),$$

respectively. These expressions are useful for further asymptotics on $\mathbb{E}(X_n)$.
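The two mean-value formulas can be checked numerically for small $n$. The sketch below assumes that the parts range over all positive integers, so that there are $2^{n-1}$ compositions of $n$; the function names are hypothetical, and the quadratic-time recurrences are chosen for readability rather than speed.

```python
from fractions import Fraction
from math import pi, sqrt

def partition_counts(n_max):
    """p[m] = number of partitions of m into positive integers, 0 <= m <= n_max."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for m in range(part, n_max + 1):
            p[m] += p[m - part]
    return p

def mean_partitions(n, w):
    """E(X_n) = sum_j w(j) p(n - j) / p(n) for a uniformly random partition of n."""
    p = partition_counts(n)
    return sum(Fraction(w(j)) * p[n - j] for j in range(1, n + 1)) / p[n]

def compositions_avoiding(n, k):
    """Number of compositions of n with no part equal to k."""
    a = [1] + [0] * n
    for m in range(1, n + 1):
        a[m] = sum(a[m - j] for j in range(1, m + 1) if j != k)
    return a[n]

def mean_compositions(n, w):
    """E(X_n) = sum_j w(j) (1 - (compositions avoiding j) / c(n)), with c(n) = 2^(n-1)."""
    c_n = 2 ** (n - 1)
    return sum(
        Fraction(w(j)) * (1 - Fraction(compositions_avoiding(n, j), c_n))
        for j in range(1, n + 1)
    )

# Ratio of the exact mean number of distinct parts to the Erdős-Lehner estimate.
ratio = float(mean_partitions(200, lambda x: 1)) / (sqrt(6 * 200) / pi)
```

Using `Fraction` keeps the means exact, so they can be compared directly with a brute-force enumeration; `ratio` gives a rough feel for how close the estimate $\sqrt{6n}/\pi$ already is at $n = 200$.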
In what follows, when $w(x) = x^\alpha$, we write $\mathbb{E}(X_n^{(\alpha)})$. Such an approach is, however, rather limited. Thus we shall derive the corresponding multivariate generating functions.

Besides $X_n$, one may also consider, for example, the following measures:

1. the number of odd (or even) parts or, in general, the number of parts that are $h$ mod $k$, $k \ge 2$;
2. the number of distinct pairs of consecutive summands or the number of distinct "patterns" (with or without overlaps);
3. the total variation of parts, defined as the sum of the differences of all pairs of summands;
4. the order statistics of the parts (this being similar to order statistics of random variables);
5. the greatest common divisor or the least common multiple of the parts; and
6. other statistics on the parts like the number of inversions, runs, peaks, left-to-right maxima, etc.

Some of these quantities are difficult to work with. For other statistics, see Diaconis et al. [12].
3. COMPOSITIONS

In this section, we first derive generating functions for compositions with a given number of distinct parts; then we consider in detail the asymptotics of the mean measure $\mathbb{E}(X_n^{(\alpha)})$ when $\Lambda = \mathbb{Z}^+$ for all possible values of $\alpha$. The case of general $\Lambda$ is considered in Section 3.3. Finally, we conclude with cyclic and branching compositions.

3.1. Generating Functions

Let $u_0 = 1$ and

$$C(z; u_0, u_1, u_2, \dots) = 1 + \sum_{n\ge 1} z^n \sum_{\substack{r_1\lambda_1 + r_2\lambda_2 + \cdots = n \\ r_l \ge 0}} \ \prod_{j\ge 1} u_{r_j}^{w(\lambda_j)},$$

where the inner summation runs over all compositions of $n$ into parts $\lambda_j$. If $w$ assumes real values, we need to keep the $u_j$'s away from the origin to avoid possible ambiguity. (For computational purposes, one may take $u_j = e^{i\theta_j}$.)
THEOREM 1. The generating function $C$ satisfies

$$C(z; u_0, u_1, u_2, \dots) = \int_0^\infty e^{-t} \prod_{j\ge 1} \left(1 + \sum_{l\ge 1} \frac{u_l^{w(\lambda_j)}\, t^l z^{l\lambda_j}}{l!}\right) dt. \tag{4}$$
Proof. The presence of an infinite product in an integral indicates the basic principle: first consider the unordered counterpart and then incorporate the enumerating factor into the generating function. To each partition of $n$ of the type

$$P = P(n):\ s_1\cdot\lambda_{i_1} + s_2\cdot\lambda_{i_2} + \cdots + s_m\lambda_{i_m} = n;\quad s_j \ge 1,\ j = 1, 2, \dots, m;\quad \lambda_{i_1} < \lambda_{i_2} < \cdots < \lambda_{i_m}, \tag{5}$$

there correspond

$$\frac{(s_1 + \cdots + s_m)!}{s_1!\, s_2! \cdots s_m!}$$

compositions. Now we have

$$\prod_{j\ge 1} \left(1 + \sum_{l\ge 1} \frac{u_l^{w(\lambda_j)}\, t^l z^{l\lambda_j}}{l!}\right) = 1 + \sum_{n\ge 1} z^n \sum t^{s_1 + \cdots + s_m} \prod_{1\le j\le m} \frac{u_{s_j}^{w(\lambda_{i_j})}}{s_j!},$$
where the inner sum on the right-hand side is extended over all partitions of $n$ of the form (5). Multiplying both sides by $e^{-t}$ and integrating from $0$ to $\infty$ gives the factor $(s_1 + \cdots + s_m)!$, since $\int_0^\infty e^{-t} t^k\, dt = k!$. This completes the proof.

Let $e_h(z) = \sum_{0\le j\le h} z^j/j!$. From (4), we obtain the following special cases. By taking $w(x) = 1$ and $u_l = u$ for $l \ge h$, $h \ge 1$ (and $u_l = 1$ for $l < h$),

$$\int_0^\infty e^{-t} \prod_{j\ge 1} \Bigl(e_{h-1}(t z^{\lambda_j}) + u \bigl(e^{t z^{\lambda_j}} - e_{h-1}(t z^{\lambda_j})\bigr)\Bigr)\, dt,$$

where $u$ "marks" the number of parts occurring $\ge h$ times, and, by taking $w(x) = x$ and $u_l = u$ for $l \ge h$,

$$\int_0^\infty e^{-t} \prod_{j\ge 1} \Bigl(e_{h-1}(t z^{\lambda_j}) + u^{\lambda_j} \bigl(e^{t z^{\lambda_j}} - e_{h-1}(t z^{\lambda_j})\bigr)\Bigr)\, dt,$$

where $u$ "marks" the sum of those parts with frequencies $\ge h$.
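The counting principle behind Theorem 1, that each partition with multiplicities $s_1, \dots, s_m$ accounts for $(s_1 + \cdots + s_m)!/(s_1! \cdots s_m!)$ compositions, can be checked by brute force. The sketch below is only an illustration for small $n$ with parts ranging over all positive integers; all function names are hypothetical.

```python
from collections import Counter
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def compositions(n):
    """Yield the compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def distinct_count_via_partitions(n):
    """Distribution of the number of distinct parts over compositions of n,
    obtained by weighting each partition with its multinomial coefficient."""
    dist = Counter()
    for p in partitions(n):
        multiplicities = Counter(p)            # s_1, ..., s_m
        orderings = factorial(len(p))          # (s_1 + ... + s_m)!
        for s in multiplicities.values():
            orderings //= factorial(s)         # divide by s_1! ... s_m!
        dist[len(multiplicities)] += orderings
    return dist

def distinct_count_brute_force(n):
    """The same distribution by direct enumeration of all compositions."""
    return Counter(len(set(c)) for c in compositions(n))
```

Both functions return a `Counter` mapping the number of distinct parts to the number of compositions of $n$ with that many distinct parts, so the two results can be compared with `==`.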
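Thus the expected number of parts with frequencies $\ge h$ in a random composition of $n$ is given by

$$\frac{1}{c_\Lambda(n)} [z^n] \sum_{j\ge 1} \left(\frac{1}{1 - \Lambda(z)} - \sum_{0\le l\le h-1} \frac{z^{l\lambda_j}}{\bigl(1 - \Lambda(z) + z^{\lambda_j}\bigr)^{l+1}}\right) = \frac{1}{c_\Lambda(n)} [z^n] \sum_{j\ge 1} \frac{z^{h\lambda_j}}{\bigl(1 - \Lambda(z)\bigr)\bigl(1 - \Lambda(z) + z^{\lambda_j}\bigr)^{h}}.$$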
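The closed form on the right-hand side can be evaluated with truncated power series. The following sketch assumes that the parts range over all positive integers, so that $1 - \Lambda(z) = 1 - z - z^2 - \cdots$ and $c_\Lambda(n) = 2^{n-1}$; the helper names `mul`, `inverse`, and `mean_frequent_parts` are illustrative.

```python
def mul(a, b, n):
    """Product of two power series (coefficient lists), truncated at degree n."""
    out = [0] * (n + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[: n + 1 - i]):
                out[i + j] += ai * bj
    return out

def inverse(a, n):
    """Reciprocal of a power series with constant term 1, truncated at degree n."""
    inv = [1] + [0] * n
    for m in range(1, n + 1):
        inv[m] = -sum(a[i] * inv[m - i] for i in range(1, m + 1))
    return inv

def mean_frequent_parts(n, h):
    """Expected number of parts occurring at least h times in a random composition
    of n, read off from sum_j z^(h j) / ((1 - L(z)) (1 - L(z) + z^j)^h)."""
    one_minus_L = [1] + [-1] * n             # 1 - z - z^2 - ... up to degree n
    inv_base = inverse(one_minus_L, n)       # 1 / (1 - L(z))
    total = 0
    for j in range(1, n // h + 1):           # z^(h j) must have degree <= n
        shifted = one_minus_L[:]
        shifted[j] += 1                      # 1 - L(z) + z^j
        inv_shifted = inverse(shifted, n)
        term = inv_base
        for _ in range(h):
            term = mul(term, inv_shifted, n)
        total += term[n - h * j]             # multiplying by z^(h j) shifts the index
    return total / 2 ** (n - 1)
```

With `h = 1` the function returns the mean number of distinct parts, so it can be cross-checked against any other computation of that mean.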
3.2. Asymptotics of the Mean Measure

Let us first consider the case $\Lambda = \mathbb{Z}^+$ in some detail for two reasons. First, this is a special case for which we can derive rather precise expressions for $\mathbb{E}(X_n^{(\alpha)})$. The problem becomes more involved for general $\Lambda$ and requires different analysis. In particular, there is a logarithmic transfer from the asymptotic behavior of $\sum_{1\le j\le n} j^\alpha$ to that of $\mathbb{E}(X_n^{(\alpha)})$. Next, it is surprising that a certain fluctuating phenomenon appears in the resulting formula. This seems rather unexpected, and a possible structural interpretation requires further investigation.

THEOREM 2. As $n \to \infty$, the expectation of $X_n^{(\alpha)}$ satisfies the following.

(i) If $\alpha < -1$,

$$\mathbb{E}(X_n^{(\alpha)}) = \zeta(-\alpha) + O\bigl((\log n)^{\alpha+1}\bigr); \tag{6}$$

(ii) if $\alpha = -1$,

$$\mathbb{E}(X_n^{(\alpha)}) = \log\log n - \log\log 2 + O\bigl((\log n)^{-1}\bigr); \tag{7}$$

(iii) if $\alpha > -1$,

$$\mathbb{E}(X_n^{(\alpha)}) = \left(\frac{\log n}{\log 2}\right)^{\alpha+1} \left(\frac{1}{\alpha+1} + \sum_{1\le l\le \lfloor\alpha\rfloor+2} \frac{\alpha(\alpha-1)\cdots(\alpha-l+1)}{(\log n)^{l}} \bigl(e_l - \varpi_l(\log_2 n)\bigr)\right) + \zeta(-\alpha) + O\bigl((\log n)^{-1}\bigr), \tag{8}$$

where $e_l = [s^l]\, 2^{-s}\, \Gamma(1-s)$, and

$$\varpi_l(u) = \sum_{j\in\mathbb{Z}\setminus\{0\}} e_{l,j}\, \Gamma(-\chi_j)\, e^{2j\pi i u} \qquad (\chi_j = 2j\pi i/\log 2),$$

with $e_{l,j} = [s^l]\, 2^{-s}\, \Gamma(-s-\chi_j)/\Gamma(-\chi_j)$.
The error term in (8) can be replaced by $O\bigl(n^{-1}(\log n)^{\alpha+1}\bigr)$ when $\alpha$ is a nonnegative integer. Note also that

$$\frac{1}{\alpha+1}\left(\frac{\log n}{\log 2}\right)^{\alpha+1} \to \log\log n - \log\log 2,$$
as $\alpha \to -1$. Observe that the first term on the right-hand side of (8) may be roughly derived as

$$\mathbb{E}(X_n^{(\alpha)}) \approx 1^\alpha + \cdots + \bigl(\mathbb{E}(X_n^{(0)})\bigr)^{\alpha} \sim \frac{1}{\alpha+1}\bigl(\mathbb{E}(X_n^{(0)})\bigr)^{\alpha+1}.$$

This heuristic does not, however, apply for partitions; see Theorem 7.

COROLLARY 1. The number and the sum of distinct parts in a random composition of $n$ satisfy

$$\mathbb{E}(X_n^{(0)}) = \log_2 n - \frac{3}{2} + \frac{\gamma}{\log 2} - p_1(\log_2 n) + O\bigl(n^{-1}\log n\bigr),$$

$$\mathbb{E}(X_n^{(1)}) = \frac{(\log n)^2}{2(\log 2)^2} + p_2(\log_2 n)\log n + p_3(\log_2 n) + O\bigl(n^{-1}(\log n)^2\bigr),$$

respectively, where $\gamma$ denotes Euler's constant and the $p_j$'s are periodic functions of period 1 whose Fourier series can be written as

$$p_1(u) = \frac{\varpi_0(u)}{\log 2}, \qquad p_2(u) = \frac{\gamma - \log 2}{(\log 2)^2} - \frac{\varpi_0(u)}{(\log 2)^2},$$

$$p_3(u) = \frac{6\gamma^2 - 12\gamma\log 2 + 5(\log 2)^2 + \pi^2}{12(\log 2)^2} + \frac{1}{\log 2} \sum_{k\in\mathbb{Z}\setminus\{0\}} \left(1 + \frac{\psi(\chi_k)}{\log 2}\right) \Gamma(\chi_k)\, e^{2k\pi i u},$$

$\psi$ being the logarithmic derivative of the Gamma function.

Note that the expected value of the number of parts, counted with multiplicity, in a random composition of $n$ is equal to $(n+1)/2$. Therefore, the preceding results imply roughly that almost all compositions of $n$ have many small parts with large multiplicities.

A closely related quantity is the largest summand in a random composition of $n$, whose mean value is given by

$$2^{1-n} [z^n] \sum_{k\ge 1} \left(\frac{1-z}{1-2z} - \frac{1-z}{1-2z+z^k}\right)$$

and has asymptotically the same behavior as $\mathbb{E}(X_n^{(0)})$; see [24, 39].
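The first formula of Corollary 1 can be compared with exact values. The sketch below computes the mean number of distinct parts exactly, using the fact that the compositions of $n$ with no part equal to $k$ are counted by $[z^n](1-z)/(1-2z+z^k-z^{k+1})$, and sets it against the non-oscillating terms $\log_2 n - 3/2 + \gamma/\log 2$. The function names and the hard-coded value of Euler's constant are assumptions made for this illustration.

```python
from fractions import Fraction
from math import log

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded for the sketch

def avoiding(n, k):
    """[z^n] (1 - z) / (1 - 2z + z^k - z^(k+1)): compositions of n with no part k."""
    a = [0] * (n + 1)
    for m in range(n + 1):
        val = (1 if m == 0 else 0) - (1 if m == 1 else 0)  # numerator 1 - z
        if m >= 1:
            val += 2 * a[m - 1]
        if m >= k:
            val -= a[m - k]
        if m >= k + 1:
            val += a[m - k - 1]
        a[m] = val
    return a[n]

def exact_mean_distinct(n):
    """Exact mean number of distinct parts in a uniformly random composition of n."""
    total = sum(2 ** (n - 1) - avoiding(n, k) for k in range(1, n + 1))
    return Fraction(total, 2 ** (n - 1))

def leading_terms(n):
    """log_2 n - 3/2 + gamma / log 2, the non-oscillating part of Corollary 1."""
    return log(n, 2) - 1.5 + EULER_GAMMA / log(2)

# Difference between the exact mean and the leading terms at a moderate n.
gap = float(exact_mean_distinct(400)) - leading_terms(400)
```

The quantity `gap` collects the periodic term $p_1$ and the $O(n^{-1}\log n)$ remainder; both should be small at this size, so the exact mean and the leading terms are expected to agree closely.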
Proof of Theorem 2. By definition, $\mathbb{E}(X_n^{(\alpha)}) = 2^{1-n} [z^n] F(z)$, where

$$F(z) = \sum_{k\ge 1} k^\alpha \left(\frac{1-z}{1-2z} - \frac{1-z}{1-2z+z^k-z^{k+1}}\right). \tag{9}$$

We note that generating functions of the form (9) (with $\alpha = 0$) were encountered in different contexts; see Knuth [33] and Gourdon and Prodinger [23]. Our approach, which is an extension of theirs, yields more precise results. Set $f_k(z) = 1 - 2z + z^k - z^{k+1}$. Then for $n \ge 1$,

$$\mathbb{E}(X_n^{(\alpha)}) = \sum_{1\le k\le n} k^\alpha \mu_{n,k},$$

where

$$\mu_{n,k} = 2^{1-n} [z^n] \left(\frac{1-z}{1-2z} - \frac{1-z}{f_k(z)}\right) = 1 - 2^{1-n} [z^n] \frac{1-z}{f_k(z)}.$$

By Rouché's theorem, each $f_k(z)$ has a unique zero in the unit circle (by considering