ELSEVIER
Statistics & Probability Letters 20 (1994) 155-162

A duality theorem for solving multiple-player multivariate hypergeometric problems

Milton Sobel a,*, Krzysztof Frankowski b

a Department of Statistics and Applied Probability, UCSB, Santa Barbara CA 93106, USA
b Department of Computer Science, University of Minnesota, Mpls MN 55455, USA

Received November 1992; revised October 1993
Abstract

In multi-player problems several samples are taken from a finite multi-categoried population without any replacement. Of interest are joint probabilities for each sample and each category. A duality theorem proved in this paper allows the exchange of players and categories and simplifies many such problems.

Key words: Multivariate hypergeometric probabilities; Duality theorem; Multi-player problems; Multi-sample problems; Random partitioning for search problems; Counting square matrices to solve probability problems
0. Introduction
This paper deals with the study of dualities in multiple-player problems when sampling is carried out without replacement, both between observations and between players. Since each player takes exactly one sample, this could be regarded as a multi-sample problem. These dualities are quite useful in solving joint probability problems dealing with several players when observations can fall into distinctly different categories. In each case of duality we find two apparently unrelated events, and show that as a result of duality they have equal probability. In many cases this equality enables us to reduce a j-player problem to a 1-player problem, which is much easier to solve.

For example, consider dealing out at random 4 cards to each of two players from the usual deck of 52 cards. We ask for the probability that each of the two players gets at least one spade. Since sampling is without replacement, this probability can be computed by elementary methods as a double sum P of the form
$$P = \sum_{x=1}^{4} \sum_{y=1}^{\min(4,\,13-x)} \frac{\binom{13}{x}\binom{39}{4-x}}{\binom{52}{4}} \cdot \frac{\binom{13-x}{y}\binom{35+x}{4-y}}{\binom{48}{4}} = \frac{2\,134\,661}{4\,502\,365} \approx 0.474120. \qquad (1)$$

* Corresponding author.
0167-7152/94/$7.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0167-7152(93)E0163-N
Consider now a different event, which we call the dual of the above. In this event one player takes 13 cards and we ask for the probability that he will obtain at least 1 card in each of two specified ranks (say aces and kings). The probability of this new event (say Q) is given by

$$Q = \sum_{x=1}^{4} \sum_{y=1}^{4} \frac{\binom{4}{x}\binom{4}{y}\binom{44}{13-x-y}}{\binom{52}{13}}, \qquad (2)$$

which is exactly the same as the result in (1). This is a simple example of our duality.

In this paper we stress the concept of a multi-player (or multi-sample) problem, which has received hardly any attention in the past literature. Most scholars seem to think that multi-player problems are trivial extensions of single-player problems. This is true for problems with replacement, but we hope to show in this paper that it is far from true for problems without replacement. In fact it appears that these problems form a class that deserves more study and research in its own right. An example of an interesting multiple-player problem that is extremely easy to state but difficult to solve is given (with solution) in Section 4.

In the literature few authors appear to have studied these problems; one exception is Borel and Chéron (1955), who accepted the challenge to study multi-player problems as they arise in the game of bridge; the duality found and studied in this paper would have been useful to them, had they known about it. Even the well known encyclopedic book of David and Barton (1962) has little to say about multi-player problems, except that pages 15 and 16 do concern the different hands that arise in a bridge game; they quote the same book by Borel and Chéron (1955) that we have quoted. The potential for application, however, is very much wider than the game of bridge or any game of chance, since the concept also applies to any multiple-sampling problem whenever the event of interest is a joint probability associated with all of the individual samples and each category.
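The introductory card example can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the two double sums are written directly from the descriptions of the events P (two players, 4 cards each, at least one spade each) and Q (one player, 13 cards, at least one ace and one king).

```python
from fractions import Fraction
from math import comb

def p_two_players_each_get_a_spade():
    """Two players, 4 cards each: both hold at least one spade."""
    total = Fraction(0)
    for x in range(1, 5):                        # spades held by the first player
        for y in range(1, min(4, 13 - x) + 1):   # spades held by the second player
            first = Fraction(comb(13, x) * comb(39, 4 - x), comb(52, 4))
            # After the first hand, 13 - x spades and 35 + x non-spades remain.
            second = Fraction(comb(13 - x, y) * comb(35 + x, 4 - y), comb(48, 4))
            total += first * second
    return total

def q_one_player_gets_ace_and_king():
    """One player, 13 cards: at least one ace and at least one king."""
    total = Fraction(0)
    for x in range(1, 5):        # aces in the hand
        for y in range(1, 5):    # kings in the hand
            total += Fraction(comb(4, x) * comb(4, y) * comb(44, 13 - x - y),
                              comb(52, 13))
    return total

P = p_two_players_each_get_a_spade()
Q = q_one_player_gets_ace_and_king()
print(P, Q, float(P))
```

Working with `Fraction` rather than floating point makes the equality of the two events an exact statement instead of a numerical coincidence.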
1. Joint tail probabilities for multi-player problems
In all problems considered in this paper sampling is carried out without replacement (either simultaneously by all players or by one player after another in any fixed order). Let N denote the total population size (assumed to be finite) and suppose the structure consists of b categories with a common size M in each category and a remainder of N - bM ≥ 0, which we call the 'sink'. Let n denote the common sample size taken by each player and let j denote the number of players. A typical event of interest is that each player has at least r observations in each of the b categories. The probability of this event is denoted by $H_j I^{(b)}_{M,N}(r, n)$, which may, of course, be zero. The duality of P and Q in (1) and (2) can be written in our notation as

$$H_2 I^{(1)}_{13,52}(1, 4) = H_1 I^{(2)}_{4,52}(1, 13). \qquad (3)$$
We note that there is a double interchange of the pair (j, b) and of the pair (M, n) without affecting the probability of the events involved, namely that

$$H_j I^{(b)}_{M,N}(r, n) = H_b I^{(j)}_{n,N}(r, M), \qquad (4)$$
which we prove below. The same result also holds if we replace I (which signifies at least) by J (which signifies fewer than) on both sides of the equality (4). We shall not repeat our proof when J is used in place of I. The symbols I and J were extensively used with the same meaning in connection with Dirichlet integrals and the multinomial and negative multinomial distributions in Sobel et al. (1977, 1985). Some generalizations will be considered in Sections 2 and 6. Further examples illustrating the duality are given below after the proof.
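2. Proof of duality

We now consider the proof of our duality for arbitrary j and b. The proof is easier to follow if we use, as done below, products of multinomials instead of binomials. We write the joint probability of at least r in each of b categories for the γth player (γ = 1, 2, ..., j), each conditioned on all the previous players (using the convention that any sum from 1 to 0 is zero). Using $\alpha_{\gamma 1}, \alpha_{\gamma 2}, \ldots, \alpha_{\gamma b}$ as indices for the γth player, conditioning on the γ - 1 previous players and combining all these for γ = 1, 2, ..., j, we can write, using a product of multinomials,

$$H_j I^{(b)}_{M,N}(r, n) = \sum \binom{N - jn}{M - \sum_{\gamma=1}^{j}\alpha_{\gamma 1},\; M - \sum_{\gamma=1}^{j}\alpha_{\gamma 2},\; \ldots,\; M - \sum_{\gamma=1}^{j}\alpha_{\gamma b}} \prod_{\gamma=1}^{j} \binom{n}{\alpha_{\gamma 1}, \alpha_{\gamma 2}, \ldots, \alpha_{\gamma b}} \Bigg/ \binom{N}{M, M, \ldots, M}. \qquad (5)$$

Here each multinomial coefficient carries an implied final lower entry equal to the top entry minus the sum of the listed entries. The limits of all the bj indices ($\alpha_{\gamma\beta}$) run from a common lower limit r and the upper limits are all 'natural'. Using (5) we can write our double interchange (DI) dual in the form

$$H_b I^{(j)}_{n,N}(r, M) = \sum \binom{N - bM}{n - \sum_{\beta=1}^{b}\alpha_{1\beta},\; n - \sum_{\beta=1}^{b}\alpha_{2\beta},\; \ldots,\; n - \sum_{\beta=1}^{b}\alpha_{j\beta}} \prod_{\beta=1}^{b} \binom{M}{\alpha_{1\beta}, \alpha_{2\beta}, \ldots, \alpha_{j\beta}} \Bigg/ \binom{N}{n, n, \ldots, n}. \qquad (6)$$

By combining $\alpha_{11}!\,\alpha_{21}! \cdots \alpha_{jb}!$ and exchanging n! from the denominator with M! in the numerator, etc., we have term by term equality; therefore the proof is complete.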
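The term-by-term equality can be checked numerically on a small case. The sketch below is illustrative only: `multinomial`, `term_5`, `H_at_least` and `terms_match` are hypothetical helper names, the multinomial is taken with an implied remainder entry, and the dual summand is obtained by applying the same formula to the transposed index matrix with M and n exchanged.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial(top, parts):
    """top! / (parts! * remainder!), remainder implied; 0 if infeasible."""
    rest = top - sum(parts)
    if rest < 0 or min(parts, default=0) < 0:
        return 0
    value = factorial(top) // factorial(rest)
    for p in parts:
        value //= factorial(p)
    return value

def term_5(alpha, M, n, N):
    """One summand of the product-of-multinomials form.

    alpha[g][c] = number of items of category c held by player g.
    """
    j, b = len(alpha), len(alpha[0])
    left = [M - sum(alpha[g][c] for g in range(j)) for c in range(b)]
    value = multinomial(N - j * n, left)       # categories among undealt items
    for row in alpha:
        value *= multinomial(n, row)           # categories within each hand
    return Fraction(value, multinomial(N, [M] * b))

def index_matrices(j, b, M, r, n):
    """All j x b matrices with entries from r up to min(M, n)."""
    for flat in product(range(r, min(M, n) + 1), repeat=j * b):
        yield [list(flat[g * b:(g + 1) * b]) for g in range(j)]

def H_at_least(j, b, M, N, r, n):
    """P(each of j players holds at least r items in each of b categories)."""
    return sum((term_5(a, M, n, N) for a in index_matrices(j, b, M, r, n)),
               Fraction(0))

def terms_match(j, b, M, N, r, n):
    """True if every summand equals the dual summand for the transposed matrix."""
    for alpha in index_matrices(j, b, M, r, n):
        transposed = [list(col) for col in zip(*alpha)]
        if term_5(alpha, M, n, N) != term_5(transposed, n, M, N):
            return False
    return True

print(terms_match(2, 3, 3, 12, 1, 4))
print(H_at_least(2, 3, 3, 12, 1, 4), H_at_least(3, 2, 4, 12, 1, 3))
```

The small parameters (N = 12, two players with 4 items each, three categories of size 3) are arbitrary choices that keep the enumeration to a few hundred matrices.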
Comments.
(1) Since our proof is term by term, it follows that not only does the duality hold for I (at least) and J (fewer than) but also for equality in the parameter denoted by r.
(2) Define sample excess as the N - jn cards which are not dealt out to any player. Our duality interchanges the sink in one problem with the sample excess in the dual problem.
(3) It is interesting and useful to note that the proof is unchanged if we use vectors for M and n, namely $M = (M_1, M_2, \ldots, M_b)$ and $n = (n_1, n_2, \ldots, n_j)$, signifying that the batch sizes for each of the b categories need not be equal and the sample sizes for each of the j players also need not be equal.
(4) For the special cases j = 1 or b = 1 there is no interchange of the indices $\alpha_{\gamma\beta}$ needed and hence in this case the common lower limit r also can be extended to a vector, namely for one player we have $r = (r_1, r_2, \ldots, r_b)$ or for many players and one batch we have $r = (r_1, r_2, \ldots, r_j)$. We have to be careful not to assume that our result holds in general for unequal r-components when min(j, b) > 1. However, it can also be seen from the proof above that if the unequal r-components are properly switched, the result will still hold. We illustrate this point by specific examples below.

We now consider Problem 2.1 with three different ways of exchanging r-values, only one of which is correct; call these Problems 2.1A, 2.1B and 2.1C.

Problem 2.1. For two (specified) players with 4 cards each from the usual deck of 52 cards, what is the probability that each player has at least 1 spade and at least 2 clubs? Find the correct dual among the Problems 2.1A, 2.1B and 2.1C below. In our notation the original problem is $H_2 I^{(2)}_{13,52}(1, 2, 4)$.

Problem 2.1A. Each of two players takes 13 cards; find the probability that each player has at least 1 ace and at least 2 kings.

Problem 2.1B. Same, except that we want the probability that one player has at least 1 ace and at least 2 kings AND that the other player has at least 2 aces and at least 1 king.
Problem 2.1C. Same, except that we want the probability that one player has at least 1 ace and at least 1 king AND that the other player has at least 2 aces and at least 2 kings.

Using multiple summation, the exact answers for these 4 problems, respectively, turn out to be

$$\frac{331\,903}{22\,511\,825}, \quad \frac{201\,344}{22\,511\,825}, \quad \frac{3\,212\,066}{202\,606\,425}, \quad \frac{331\,903}{22\,511\,825}, \qquad (7)$$

and we see that Problem 2.1C is the correct dual of Problem 2.1. In matrix notation we can write the 'at least' events in the form

$$\begin{array}{l|cc} & \text{Spades} & \text{Clubs} \\ \hline \text{Player 1} & 1 & 2 \\ \text{Player 2} & 1 & 2 \end{array} \qquad\qquad \begin{array}{l|cc} & \text{Aces} & \text{Kings} \\ \hline \text{Player 1} & 1 & 1 \\ \text{Player 2} & 2 & 2 \end{array} \qquad (8)$$

and we note that these two matrices are the transpose of each other; this is entirely consistent with our duality result.

3. The special case Min(j, b) = 1
We saw in the above proof that no interchange was necessary when Min(j, b) = 1 and hence we can use the duality and the resulting probability equality with vector values of r, say $\mathbf{r}$. Without loss of generality assume that b = 1, so that we can write $\mathbf{r} = (r_1, r_2, \ldots, r_j)$ with integer components not necessarily equal. The duality states that

$$H_j I_{M,N}(\mathbf{r}, n) = H I^{(j)}_{n,N}(\mathbf{r}, M), \qquad (9)$$
where the subscript 1 (and superscript 1) are understood. The left-hand side of (9) is the probability that the ith player has at least $r_i$ observations in the single category, and the right-hand side is the probability that one player has at least $r_i$ observations in the ith category (i = 1, 2, ..., j). Below we consider a few examples. We will call a sample 'sparse in a given stratum' if fewer than u observations come from that stratum and we call it 'crowded in a given stratum' if at least v observations come from that stratum (0 < u < v ≤ M).

Example 3.1. Suppose each of four players gets 4 cards and a pair of players (say, North and South) are specified. What is the probability that North is crowded in spades and South is sparse in spades if u = 2 and v = 3 define sparse and crowded, respectively? We use the symbol $H_2 IJ$ for 2 players, where I is associated with North and J with South, but the number of categories (or strata) is still one, since we are dealing with only one suit, namely spades. Hence the desired probability is $H_2 IJ_{13,52}(3, 2, 4)$. By our duality this is equal to $H\,IJ_{4,52}(3, 2, 13)$, which is the probability that one player (with 13 cards) has at least 3 aces and fewer than 2 kings; note that since we now have only one player, the I and J must now refer to 2 different categories, so that b = 2. The common value is therefore

$$H\,IJ_{4,52}(3, 2, 13) = \sum_{i=3}^{4} \sum_{j=0}^{1} \binom{4}{i}\binom{4}{j}\binom{44}{13-i-j} \Bigg/ \binom{52}{13} \approx 0.03572487. \qquad (10)$$

Another example deals with maximum suit sizes.
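The common value in Example 3.1 can be reproduced from either side of the duality. The sketch below uses hypothetical function names; the two-player sum is written by conditioning South's hand on North's, which is an assumption about how one would evaluate that side directly rather than a formula quoted from the paper.

```python
from fractions import Fraction
from math import comb

def north_crowded_south_sparse():
    """Two players, 4 cards each: North has >= 3 spades, South has < 2 spades."""
    total = Fraction(0)
    for i in range(3, 5):        # spades held by North
        for k in range(0, 2):    # spades held by South
            north = Fraction(comb(13, i) * comb(39, 4 - i), comb(52, 4))
            # After North's hand, 13 - i spades and 35 + i non-spades remain.
            south = Fraction(comb(13 - i, k) * comb(35 + i, 4 - k), comb(48, 4))
            total += north * south
    return total

def one_player_aces_and_kings():
    """One player, 13 cards: at least 3 aces and fewer than 2 kings."""
    total = Fraction(0)
    for i in range(3, 5):        # aces
        for k in range(0, 2):    # kings
            total += Fraction(comb(4, i) * comb(4, k) * comb(44, 13 - i - k),
                              comb(52, 13))
    return total

two_player_value = north_crowded_south_sparse()
one_player_value = one_player_aces_and_kings()
print(float(two_player_value), float(one_player_value))
```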
Example 3.2. What is the distribution of the maximum, Max, of the 4 spade suit sizes if four players are each dealt 13 cards as in a bridge game? In our notation,

$$H_4 J_{13,52}(x + 1, 13) = P(\mathrm{Max} \le x) \qquad (11)$$

is the probability that each of the four players has at most x spades. By our duality this equals $H J^{(4)}_{13,52}(x + 1, 13)$, which is the probability that for one player the maximum suit size over all four suits is at most x. It follows that these two distributions are the same for all x (4 ≤ x ≤ 13). In this case our duality gives us the equality of the two entire (apparently unrelated) distributions. The common table for these two is (to 8 decimals)

x     P(Max ≤ x)       x     P(Max ≤ x)
4     0.35080525       9     0.99998317
5     0.79420185       10    0.99999963
6     0.95967871       11    1.00000000
7     0.99494511       12    1.00000000
8     0.99961272       13    1.00000000

We have considered above, from the point of view of all four bridge players, the distribution of the maximum suit size where the maximum is over j suits, only for j = 1. For j = 2, 3 and 4 the values in our notation would be, respectively, $H_4 J^{(2)}_{13,52}(x + 1, 13)$, $H_4 J^{(3)}_{13,52}(x + 1, 13)$ and $H_4 J^{(4)}_{13,52}(x + 1, 13)$, where again the range of x is from 4 to 13. The last one of these 3 is self-dual and will be discussed again in the next section.
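The single-player side of (11) is straightforward to tabulate. The following sketch (the function name is hypothetical) sums the multivariate hypergeometric probabilities of all suit patterns of a 13-card hand whose largest suit has at most x cards.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_max_suit_at_most(x):
    """P(largest suit in one 13-card bridge hand has at most x cards)."""
    top = min(x, 13)
    ways = 0
    for a, b, c in product(range(top + 1), repeat=3):
        d = 13 - a - b - c                 # fourth suit is determined by the others
        if 0 <= d <= top:
            ways += comb(13, a) * comb(13, b) * comb(13, c) * comb(13, d)
    return Fraction(ways, comb(52, 13))

for x in range(4, 14):
    print(x, f"{float(prob_max_suit_at_most(x)):.8f}")
```

By the duality the same numbers describe the largest spade holding among the four players of a bridge deal.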
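4. Self-dual problems and counting square matrices

In Section 3 we noted that $H_4 J^{(4)}_{13,52}(x + 1, 13)$ is self-dual. Let $C_4(\boldsymbol{\alpha}; x, 13)$ denote a permutation count of all different square matrices of size 4 × 4 for a fixed set $\boldsymbol{\alpha}$ of integer elements $\alpha_{\beta\gamma}$ (β, γ = 1, ..., 4) with each element at most x and with all line sums (i.e., both rows and columns) equal to 13. Let $\mathcal{A}$ denote the set of all such 4 × 4 matrices which are not isomorphic to each other (i.e., pairwise non-isomorphic). If we regard the four columns as suits and the four rows as players (or vice versa), then the relation of these counts to our probability problem is easily seen to be

$$H_4 J^{(4)}_{13,52}(x + 1, 13) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} C_4(\boldsymbol{\alpha}; x, 13) \prod_{\gamma=1}^{4} \binom{13}{\alpha_{1\gamma}, \alpha_{2\gamma}, \alpha_{3\gamma}, \alpha_{4\gamma}} \Bigg/ \binom{52}{13, 13, 13, 13}.$$

A similar result holds for I(x, ·) replacing J(x + 1, ·). Thus for the probability that every one of the 16 suit sizes $s_j \ge 3$ (an I problem), there is only one matrix in $\mathcal{A}$, namely,

$$\begin{pmatrix} 4 & 3 & 3 & 3 \\ 3 & 4 & 3 & 3 \\ 3 & 3 & 4 & 3 \\ 3 & 3 & 3 & 4 \end{pmatrix} \qquad (12)$$

with a permutation count of 4! = 24 and hence the result for this problem is

$$H_4 I^{(4)}_{13,52}(3, 13) = 4! \left[ \binom{13}{4}\binom{9}{3}\binom{6}{3} \right]^4 \Bigg/ \binom{52}{13}\binom{39}{13}\binom{26}{13} \approx 0.00093142. \qquad (13)$$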
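The matrix-counting argument behind (13) can be sketched as follows. This is an illustration under stated assumptions, with hypothetical names: it enumerates every 4 × 4 matrix with entries in {3, 4} (the only values compatible with entries of at least 3 and line sums of 13), keeps those with all row and column sums equal to 13, and weights each by the number of deals that realize it.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def deals_for(matrix):
    """Number of bridge deals whose players-by-suits table of sizes is `matrix`."""
    ways = 1
    for column in zip(*matrix):      # one suit at a time: split its 13 cards
        ways *= factorial(13)
        for size in column:
            ways //= factorial(size)
    return ways

total_deals = factorial(52) // factorial(13) ** 4

matrices = []
for flat in product((3, 4), repeat=16):
    rows = [flat[4 * i:4 * i + 4] for i in range(4)]
    rows_ok = all(sum(r) == 13 for r in rows)
    cols_ok = all(sum(c) == 13 for c in zip(*rows))
    if rows_ok and cols_ok:
        matrices.append(rows)

prob_all_suits_at_least_3 = Fraction(sum(deals_for(m) for m in matrices),
                                     total_deals)
print(len(matrices), float(prob_all_suits_at_least_3))
```

The enumeration finds the 24 row and column permutations of the single matrix class described in the text, each contributing the same number of deals.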
Clearly, $H_4 J^{(4)}_{13,52}(14, 13) = 1$ and by an easy combinatorial argument we also have

$$H_4 J^{(4)}_{13,52}(13, 13) = 1 - \left[ \frac{1!\binom{4}{1}^2}{\binom{52}{13}} - \frac{2!\binom{4}{2}^2}{\binom{52}{13}\binom{39}{13}} + \cdots - \frac{4!\binom{4}{4}^2}{\binom{52}{13}\binom{39}{13}\binom{26}{13}\binom{13}{13}} \right] \approx 0.999999999974804; \qquad (14)$$

the bracket in (14) is the chance that at least one player has 13 cards of the same suit.

Another self-dual problem is the following: If 13 players are each dealt 4 cards from the usual deck of 52 cards, what is the probability that no player gets any pair? In this case we are interested in the number of 13 × 13 binary matrices (i.e., with 0, 1 elements only) with all line sums equal to 4. Here we associate the columns with the 13 ranks and the rows with 13 players (or vice versa) and if $C_{13}(1, 4)$ is the count then the relation to our probability problem is

$$H_{13} J^{(13)}_{4,52}(2, 4) = \frac{C_{13}(1, 4)\,(4!)^{26}}{52!} \approx 0.00732663. \qquad (15)$$

The first 4 decimals (namely 0.0073) can also be obtained by studying the smoothness of the ratio P(j + 1)/P(j) for 1 ≤ j ≤ 4 and extrapolating. The count $C_{13}(1, 4)$ was found by McKay (1983) to be exactly

$$C_{13}(1, 4) = 769,237,071,909,157,579,108,571,190,000. \qquad (16)$$
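Both numerical values in this part of the section follow from exact integer arithmetic. In the sketch below the variable names are hypothetical, the matrix count is simply the number quoted from McKay (1983), and the bracket of (14) is evaluated as an inclusion-exclusion sum over k players each holding a complete suit.

```python
from fractions import Fraction
from math import comb, factorial

# Count of 13 x 13 binary matrices with all line sums 4, as quoted in the text.
C13 = 769_237_071_909_157_579_108_571_190_000

# Probability that none of 13 players (4 cards each) holds a pair.
no_pair = Fraction(C13 * factorial(4) ** 26, factorial(52))

# Chance that at least one of 4 bridge players holds a complete suit.
bracket = Fraction(0)
denominator = 1
for k in range(1, 5):
    denominator *= comb(52 - 13 * (k - 1), 13)       # k-th hand fixed in turn
    sign = (-1) ** (k + 1)
    bracket += sign * Fraction(factorial(k) * comb(4, k) ** 2, denominator)

print(float(no_pair), float(bracket), float(1 - bracket))
```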
Another interesting self-dual problem was studied by Borel and Chéron (1955): If 4 players are dealt 13 cards each, what is the probability that each player is balanced (i.e., has at least 2 cards in each suit)? In our notation this is $H_4 I^{(4)}_{13,52}(2, 13)$; this was computed by Borel and Chéron (1955, page 72) to be 0.20628056, but not by counting square matrices.
5. Relation of the duality to a partition problem or searching for specials

In the finite set of N given objects there is randomly distributed a small set of size S ≪ N, which we refer to as specials, or successes or defectives or rarities, depending on the application. We remove a subset of size bM ≤ N from the N units and randomly partition the subset into b batches of common size M, referring to the remainder (if any) of size N - bM as the partition excess. (The case of unequal values of M can also be included using vector M but we omit this at present.) Let $X_1, X_2, \ldots, X_b$ denote the random number of specials (which we may be searching for) in each of the b batches. We are interested in the joint distribution of the $X_i$, e.g., in the probability that each of the b batches contains fewer than r specials; denote this by $PJ^{(b)}_{M,N}(r, S)$. Similarly for 'at least' we use $PI^{(b)}_{M,N}(r, S)$. The above partition problem is clearly equivalent to a multiple-sampling or multi-player problem with b players, each taking M observations. In this format there is only one category of specials and a sink of size N - S, and N remains the same. Hence

$$PJ^{(b)}_{M,N}(r, S) = H_b J^{(1)}_{S,N}(r, M), \qquad (17)$$

and it would be desirable in any search problem to have a high probability (say 0.95 or 0.99) of separating the rare items so that each batch has at most 1; here 'at most one' means that r = 2. The problem is that if b is large then both sides of (17) might be time consuming to compute; here is where the dual comes to the rescue. From the duality of this paper we can replace both members of (17) by a single-player problem, i.e.,

$$PJ^{(b)}_{M,N}(r, S) = H_b J^{(1)}_{S,N}(r, M) = H_1 J^{(b)}_{M,N}(r, S), \qquad (18)$$

and the last expression (with 1 player) is generally much easier to work with.
Example 5.1. A school estimates that it has 4 cases of HIV positive among 100 children at the same grade level. If they are randomly divided into 5 groups of 20 each, what is the chance that each class will have at most 1 of these cases? Using the duality, we have

$$PJ^{(5)}_{20,100}(2, 4) = H_5 J^{(1)}_{4,100}(2, 20) = H_1 J^{(5)}_{20,100}(2, 4) \approx 0.20401788. \qquad (19)$$
Example 5.2. A seed box has b varieties of seeds all mixed up, each variety with a given common frequency M. Each of j farmers draws out n ≥ br seeds at random from the total of N = bM and we ask for the smallest value of n such that the probability that each farmer gets at least r of each variety is at least 0.95 (or 0.99); in our notation we have to find n such that $H_j I^{(b)}_{M,N}(r, n) \ge 0.95$ (or 0.99). For j = 3, b = 2, r = 1, M = 50 and N = bM = 100, we obtain

$$H_3 I^{(2)}_{50,100}(1, 7) = H_2 I^{(3)}_{7,100}(1, 50) \approx 0.96308069 \ge 0.95 > H_2 I^{(3)}_{6,100}(1, 50), \qquad (20)$$

$$H_3 I^{(2)}_{50,100}(1, 9) = H_2 I^{(3)}_{9,100}(1, 50) \approx 0.99212615 \ge 0.99 > H_2 I^{(3)}_{8,100}(1, 50). \qquad (21)$$
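The search for the smallest sample size in Example 5.2 can be sketched by evaluating the three-farmer form directly. The code below is an illustration with hypothetical names: it conditions each farmer's sample on what the previous farmers removed, and it is specialised to b = 2 varieties with r = 1, so a sample qualifies exactly when it is not all of one variety.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def prob_every_farmer_gets_both(n, farmers=3, per_variety=50):
    """P(each farmer's n seeds contain at least one seed of each of 2 varieties)."""
    @lru_cache(maxsize=None)
    def go(left_a, left_b, remaining):
        if remaining == 0:
            return Fraction(1)
        pool = comb(left_a + left_b, n)        # all samples for the next farmer
        total = Fraction(0)
        for a in range(1, n):                  # a seeds of variety A, n - a of B
            ways = comb(left_a, a) * comb(left_b, n - a)
            if ways:
                total += Fraction(ways, pool) * go(left_a - a,
                                                   left_b - (n - a),
                                                   remaining - 1)
        return total
    return go(per_variety, per_variety, farmers)

def smallest_sample_size(level):
    """Smallest n with success probability at least `level` (a Fraction)."""
    n = 2
    while prob_every_farmer_gets_both(n) < level:
        n += 1
    return n

print(smallest_sample_size(Fraction(95, 100)),
      smallest_sample_size(Fraction(99, 100)))
print(float(prob_every_farmer_gets_both(7)),
      float(prob_every_farmer_gets_both(9)))
```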
Thus to reach confidence level 0.95 (resp. 0.99) that each of 3 farmers has at least 1 of each of the b = 2 varieties, it is necessary that each farmer takes a sample of size at least 7 (resp. 9).

Example 5.3. In an application in genetics the search was for special DNA segments. It was known that there are exactly 20 specials among 1000 strings and the problem was to determine the number of batches b to use in partitioning at random these 1000 strings so that with specified high probability each batch would contain at most 3 specials, i.e., fewer than 4. In our notation, using our duality, this is

$$PJ^{(b)}_{1000/b,\,1000}(4, 20) = H_b J^{(1)}_{20,1000}(4, 1000/b) = H_1 J^{(b)}_{1000/b,\,1000}(4, 20). \qquad (22)$$

The answers (to 6 decimal places) are given as a function of b in the following table:

b                               25         40         50         100        200        250
PJ^(b)_{1000/b,1000}(4, 20)     0.838931   0.955352   0.977079   0.997728   0.999885   0.999971     (23)

Thus for confidence level 0.99 we should be using between 50 and 100 batches, but if we do not want any partition excess, then we need to use b = 100 batches of size 10 each. The easiest way to compute the entries in the above table is to use the last expression in (22).
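The single-player form of the partition problem lends itself to a short polynomial-style computation. The sketch below is one possible way to evaluate it, not a method taken from the paper, and the function name is hypothetical: it counts the ways of choosing the S specials so that fewer than r fall in each batch, building the count one batch at a time, with the partition excess handled as an unrestricted group.

```python
from fractions import Fraction
from math import comb

def prob_every_batch_below(r, specials, batch_size, batches, population):
    """P(each of `batches` batches contains fewer than r of the specials)."""
    sink = population - batches * batch_size       # partition excess, if any
    # ways[k] = number of ways to place k specials in the groups handled so far
    ways = [comb(sink, k) for k in range(specials + 1)]
    per_batch = [comb(batch_size, k) for k in range(r)]   # 0 .. r-1 per batch
    for _ in range(batches):
        nxt = [0] * (specials + 1)
        for have, w in enumerate(ways):
            if w:
                for k, extra in enumerate(per_batch):
                    if have + k <= specials:
                        nxt[have + k] += w * extra
        ways = nxt
    return Fraction(ways[specials], comb(population, specials))

# Example 5.1: 4 cases among 100 children, 5 groups of 20, at most 1 per group.
example_5_1 = prob_every_batch_below(2, 4, 20, 5, 100)

# Example 5.3: 20 specials among 1000 strings, at most 3 per batch.
table_5_3 = {b: float(prob_every_batch_below(4, 20, 1000 // b, b, 1000))
             for b in (25, 40, 50, 100, 200, 250)}

print(float(example_5_1))
for b, value in table_5_3.items():
    print(b, f"{value:.6f}")
```

The work grows only linearly in the number of batches, which is the practical advantage of the single-player form over a direct b-player sum.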
6. Directions of related future research
(1) Since the duality studied is based on a term by term equality after some number of index interchanges, there is good reason to believe that the duality and consequent simplification of problems can also be extended to waiting time problems in the multi-player multivariate hypergeometric setting.

(2) The possibility of using vector values $\mathbf{r}$ in place of r has already been mentioned above, but since the γth one of the j players can have at least (or fewer than) $r_{\gamma\beta}$ observations from the βth category (β = 1, 2, ..., b), there is in the full generalization a matrix $\{r_{\gamma\beta}\}$ of r-values. We claim that the duality theorem still holds if we replace this matrix $\{r_{\gamma\beta}\}$ by its transpose in the dual result. This is a fairly evident conclusion of our result but has not been explicitly stated above.

(3) We also have not yet investigated whether any similar dualities apply to other distributions. For example, since the hypergeometric distribution is a special case of generalized hypergeometric families and/or generalized Polya families, it would be surprising if by considering multi-sampling problems we could not find dualities there also. The reduction of a multiple-player problem to a single-player problem should prove to be useful in many different settings.

(4) The negation of an event in the case of j = 1 player is quite simple and for the case of I can be written for arbitrary b, M, N, r and n as

$$H_1 I^{(b)}_{M,N}(r, n) = \sum_{\alpha=0}^{b} (-1)^{\alpha} \binom{b}{\alpha} H_1 J^{(\alpha)}_{M,N}(r, n). \qquad (24)$$

This is especially useful for small r (say, r = 1). The analogous result for j ≥ 2 players and b ≥ 2 becomes quite complicated and laborious to compute even for the case of j = 2 players, since the simple expression in (24) is not valid for j ≥ 2. Further research in this area would be useful to make the negation operation more straightforward and simpler to use.

(5) It would be interesting to see if and exactly how the Borel constant mentioned at the end of Section 4, as well as other such problems (not necessarily self-dual), can be obtained by counting matrices (not necessarily square).
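Items (2) and (4) can both be checked by brute force on small card examples. The sketch below is illustrative, with hypothetical names (`joint_prob`, `at_least`): it sums the product-of-multinomials probability over all count matrices whose entries lie in prescribed ranges, then compares a two-player problem with unequal lower limits against its dual with the transposed limit matrix, and compares the two sides of the inclusion-exclusion identity (24) for one player.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def multinomial(top, parts):
    """top! / (parts! * remainder!), remainder implied; 0 if infeasible."""
    rest = top - sum(parts)
    if rest < 0 or min(parts, default=0) < 0:
        return 0
    value = factorial(top) // factorial(rest)
    for p in parts:
        value //= factorial(p)
    return value

def joint_prob(allowed, M, n, N):
    """P(count of player g in category c lies in allowed[g][c] for all g, c)."""
    j, b = len(allowed), len(allowed[0])
    ways = 0
    for flat in product(*(cell for row in allowed for cell in row)):
        alpha = [flat[g * b:(g + 1) * b] for g in range(j)]
        col_sums = [sum(alpha[g][c] for g in range(j)) for c in range(b)]
        term = multinomial(N - j * n, [M - s for s in col_sums])
        for row in alpha:
            term *= multinomial(n, row)
        ways += term
    return Fraction(ways, multinomial(N, [M] * b))

def at_least(limits, M, n):
    """Allowed ranges for 'at least limits[g][c]' events."""
    return [[range(r, min(M, n) + 1) for r in row] for row in limits]

# Item (2): lower limits 1 and 2 for two players; the dual uses the transpose.
R = [[1, 2], [1, 2]]
R_transposed = [list(col) for col in zip(*R)]
original = joint_prob(at_least(R, 13, 4), 13, 4, 52)
dual = joint_prob(at_least(R_transposed, 4, 13), 4, 13, 52)
untransposed = joint_prob(at_least(R, 4, 13), 4, 13, 52)

# Item (4): identity (24) for one player, b = 2 ranks of size 4, r = 1, n = 13.
lhs = joint_prob([[range(1, 5)] * 2], 4, 13, 52)
rhs = sum((-1) ** a * comb(2, a) * joint_prob([[range(0, 1)] * a], 4, 13, 52)
          for a in range(3))

print(original == dual, original == untransposed, lhs == rhs)
```

In the `rhs` sum the categories not listed are absorbed into the sink, which is why each term is evaluated with only `a` categories.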
Acknowledgements

The authors wish to thank Prof. M. Watkins (Dept. of Mathematics) of Syracuse University for suggesting that the 13-player poker problem in Section 4 can be solved by counting the appropriate square matrices and Prof. B.D. McKay (1983) for providing us in (16) with the exact count that we needed. Further thanks are due to several members of the Mathematics Dept. at the University of Minnesota for various conversations about this paper, namely to Profs. J. Baxter, B. Fristedt, N. Jain, J. Rosenthal and D. Stanton; it was the latter who put us in touch with Prof. McKay and other worldwide specialists in combinatorics. The authors also wish to acknowledge the effort of Prof. J. Rosenthal to derive an alternative combinatorial proof of our duality and we hope that it will be subsequently published.
References

Borel, E. and A. Chéron (1955), Théorie Mathématique du Bridge (à la Portée de Tous) (Gauthier-Villars, Paris) (in French).
David, F.N. and D.E. Barton (1962), Combinatorial Chance (Charles Griffin & Co. Ltd., London).
McKay, B.D. (1983), Applications of a technique for labelled enumeration, Congr. Numer. 40, 207-221.
Sobel, M., V.R.R. Uppuluri and K. Frankowski (1977), Dirichlet Distribution-Type 1, Selected Tables in Mathematical Statistics, Vol. 4, published jointly by the IMS and AMS, Providence, Rhode Island.
Sobel, M., V.R.R. Uppuluri and K. Frankowski (1985), Dirichlet Integrals of Type 2 and their Application, Selected Tables in Mathematical Statistics, Vol. 9, published jointly by the IMS and AMS, Providence, Rhode Island.