Theoretical Population Biology 57, 265-271 (2000)
doi:10.1006/tpbi.2000.1452, available online at http://www.idealibrary.com

On Estimating the Heterozygosity and Polymorphism Information Content Value

Sanjay Shete, Hemant Tiwari, and Robert C. Elston

Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University, 2500 MetroHealth Drive, Cleveland, Ohio 44109-1998

Received February 25, 1999

The polymorphism information content (PIC) value is commonly used in genetics as a measure of polymorphism for a marker locus used in linkage analysis. In this communication we derive the uniformly minimum variance unbiased estimator of PIC along with its exact variance. We also calculate the exact variance of the maximum likelihood estimator of PIC, which is an asymptotically unbiased estimator. In order to find this variance we derive a recursive formula to calculate the moments of any polynomial in a set of variables that are multinomially distributed. © 2000 Academic Press

Key Words: polymorphism information content; uniformly minimum variance unbiased estimator; maximum likelihood estimator; multinomial distribution; heterozygosity.

1. INTRODUCTION

One of the goals of human genetics is to map and identify the genes that are responsible for particular traits of interest. There are a large number of marker loci whose position, and the order in which they appear along the chromosome, are known. In linkage analysis (Ott, 1991), we try to determine which marker loci have alleles that cosegregate with alleles at a disease locus. A marker's usefulness for this purpose depends on the number of alleles it has and their corresponding relative frequencies. Qualitatively, a marker is called polymorphic if it has at least two alleles and its most frequent allele in the population has a frequency of at most 99%. Quantitatively, the degree of polymorphism is commonly measured by two distinct quantities. One is known as heterozygosity, and its unbiased estimator and variance formula are well known (Nei and Roychoudhury, 1974). Another measure of polymorphism is the polymorphism information content (PIC) value (Botstein et al., 1980), which was originally defined for a codominant marker used in a linkage study of a rare dominant disease but has more recently been shown to be relevant regardless of the mode of disease inheritance (Guo and Elston, 1999). In this paper we derive the uniformly minimum variance unbiased (UMVU) estimator of PIC. We also give the exact variance of this estimator, as well as that of the maximum likelihood estimator, which can then be used to derive an approximate confidence interval for PIC.

2. METHODS

Consider a randomly mating population and assume the genotype frequencies are in Hardy-Weinberg equilibrium. Assume that there are m codominant alleles segregating at a marker locus. Let p_i be the frequency of the i-th allele in the population, and \hat{p}_i its unbiased estimate based on a random sample of n alleles, for i = 1, ..., m. Heterozygosity is defined as the probability that a random individual chosen from the population is heterozygous at a locus, and in a randomly mating population it is given by

    H = 1 - \sum_i p_i^2,    (2.1)

and the corresponding sample heterozygosity is given by

    \hat{H} = 1 - \sum_i \hat{p}_i^2,    (2.2)

which is the maximum likelihood estimator of H. The expected value of the sample heterozygosity is 1 - \sum_i p_i^2 - (1 - \sum_i p_i^2)/n, or H - H/n, since E(\hat{p}_i^2) = p_i^2 + (p_i - p_i^2)/n. Therefore \hat{H} is not an unbiased estimator of H, though it is asymptotically unbiased. An unbiased estimator of H is given by

    \tilde{H} = \frac{n}{n-1} \Big( 1 - \sum_i \hat{p}_i^2 \Big).    (2.3)

Nei and Roychoudhury (1974) gave the variance of the maximum likelihood estimator of heterozygosity as

    Var(\hat{H}) = \frac{2(n-1)}{n^3} \Big[ (3-2n) \Big( \sum_i p_i^2 \Big)^2 + 2(n-2) \sum_i p_i^3 + \sum_i p_i^2 \Big].    (2.4)

From the relationship between \hat{H} and \tilde{H} it can easily be seen that the variance of the unbiased estimator is

    Var(\tilde{H}) = \frac{2}{n(n-1)} \Big[ (3-2n) \Big( \sum_i p_i^2 \Big)^2 + 2(n-2) \sum_i p_i^3 + \sum_i p_i^2 \Big].    (2.5)
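The estimators (2.2)-(2.3) and the exact variance formulas (2.4)-(2.5) are straightforward to compute. The following is a minimal sketch; the function names are ours, not the authors':

```python
def h_hat(counts):
    """MLE of heterozygosity (2.2) from allele counts X_i (n sampled alleles)."""
    n = sum(counts)
    return 1.0 - sum((x / n) ** 2 for x in counts)

def h_tilde(counts):
    """Unbiased (UMVU) estimator (2.3): (n/(n-1)) * (1 - sum p_hat_i^2)."""
    n = sum(counts)
    return n / (n - 1) * h_hat(counts)

def var_h_hat(n, p):
    """Exact variance (2.4) of the MLE, from the true allele frequencies p."""
    s2 = sum(q ** 2 for q in p)
    s3 = sum(q ** 3 for q in p)
    return 2 * (n - 1) / n ** 3 * ((3 - 2 * n) * s2 ** 2 + 2 * (n - 2) * s3 + s2)

def var_h_tilde(n, p):
    """Exact variance (2.5) of the unbiased estimator."""
    s2 = sum(q ** 2 for q in p)
    s3 = sum(q ** 3 for q in p)
    return 2 / (n * (n - 1)) * ((3 - 2 * n) * s2 ** 2 + 2 * (n - 2) * s3 + s2)
```

Note that Var(\tilde{H}) = (n/(n-1))^2 Var(\hat{H}), which the two variance functions reproduce.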

The PIC value is defined as the probability that a given marker genotype of an offspring of an affected parent will allow deduction of the parental genotype at the marker locus, and was shown by Botstein et al. (1980) to be

    PIC = 1 - \sum_i p_i^2 - \sum_{i \ne j} p_i^2 p_j^2.    (2.6)

Throughout this paper the indices of the summations appearing in the formulas are all distinct, and from now on we shall omit putting these indices under the summation sign.

Let X_i be the number of i-th alleles in the sample of n alleles chosen at random from the population, where \sum X_i = n. Then the random variables X_i have a multinomial distribution. In the next theorem we propose a uniformly minimum variance unbiased (UMVU) estimator of PIC as defined in (2.6).

Theorem 2.1. Define

    \widehat{PIC} = 1 - \sum \frac{X_i(X_i-1)}{n(n-1)} - \sum \frac{X_i(X_i-1) X_j(X_j-1)}{n(n-1)(n-2)(n-3)}.    (2.7)

Then \widehat{PIC} is a UMVU estimator of PIC.

Proof. First we show that \widehat{PIC} is an unbiased estimator of PIC. We use the notation a^{(b)} = a(a-1) \cdots (a-b+1) for the b-th descending factorial of a. Then we can rewrite \widehat{PIC}, defined in (2.7), as

    \widehat{PIC} = 1 - \sum \frac{X_i^{(2)}}{n^{(2)}} - \sum \frac{X_i^{(2)} X_j^{(2)}}{n^{(4)}}.

Consider the expected value of \widehat{PIC}, given by

    E[\widehat{PIC}] = 1 - \sum \frac{E[X_i^{(2)}]}{n^{(2)}} - \sum \frac{E[X_i^{(2)} X_j^{(2)}]}{n^{(4)}}.    (2.8)

From Lemma A.1, stated in the Appendix, we have

    E[X_i^{(2)}] = n^{(2)} p_i^2   and   E[X_i^{(2)} X_j^{(2)}] = n^{(4)} p_i^2 p_j^2.

Substituting the above terms into (2.8), we obtain

    E[\widehat{PIC}] = 1 - \sum p_i^2 - \sum p_i^2 p_j^2 = PIC.

Hence \widehat{PIC} is an unbiased estimator of PIC as defined in (2.6). Also note that (X_1, ..., X_{m-1}) is a complete sufficient statistic (see Casella and Berger (1990)). Since \widehat{PIC} is a function of a complete sufficient statistic and is unbiased, it follows from the Rao-Blackwell-Lehmann-Scheffe theorem that \widehat{PIC} is the unique UMVU estimator (see Theorem 7.3.5 of Casella and Berger (1990)). ∎

Remark. Since \tilde{H} given in (2.3) is also a function of a complete sufficient statistic and is an unbiased estimate of heterozygosity, it follows from the Rao-Blackwell-Lehmann-Scheffe theorem that \tilde{H} is the unique UMVU estimator of heterozygosity.
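As a quick numerical check of (2.6)-(2.7), here is a minimal sketch (names ours) of the UMVU estimator computed from allele counts, together with the population value:

```python
def falling(a, b):
    """Descending factorial a^(b) = a(a-1)...(a-b+1)."""
    out = 1
    for t in range(b):
        out *= a - t
    return out

def pic_umvu(counts):
    """UMVU estimator (2.7); requires a sample of n >= 4 alleles."""
    n = sum(counts)
    s1 = sum(falling(x, 2) for x in counts) / falling(n, 2)
    s2 = sum(falling(x, 2) * falling(y, 2)
             for i, x in enumerate(counts)
             for j, y in enumerate(counts) if i != j)
    return 1.0 - s1 - s2 / falling(n, 4)

def pic_true(p):
    """Population PIC (2.6); sum_{i != j} p_i^2 p_j^2 = (sum p_i^2)^2 - sum p_i^4."""
    s = sum(q * q for q in p)
    return 1.0 - s - (s * s - sum(q ** 4 for q in p))
```

Averaging pic_umvu over all multinomial outcomes reproduces pic_true exactly, which is the unbiasedness shown in the proof above.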

Next we calculate the exact variance of the UMVU estimator of PIC defined in (2.6):

    Var(\widehat{PIC}) = Var\Big( 1 - \sum \frac{X_i^{(2)}}{n^{(2)}} - \sum \frac{X_i^{(2)} X_j^{(2)}}{n^{(4)}} \Big)

    = Var\Big( \sum \frac{X_i^{(2)}}{n^{(2)}} \Big) + Var\Big( \sum \frac{X_i^{(2)} X_j^{(2)}}{n^{(4)}} \Big) + 2 Cov\Big( \sum \frac{X_i^{(2)}}{n^{(2)}}, \sum \frac{X_i^{(2)} X_j^{(2)}}{n^{(4)}} \Big)

    = \frac{1}{(n^{(2)})^2} \Big[ E\Big( \sum X_i^{(2)} \Big)^2 - \Big( E \sum X_i^{(2)} \Big)^2 \Big]
      + \frac{1}{(n^{(4)})^2} \Big[ E\Big( \sum X_i^{(2)} X_j^{(2)} \Big)^2 - \Big( E \sum X_i^{(2)} X_j^{(2)} \Big)^2 \Big]
      + \frac{2}{n^{(2)} n^{(4)}} \Big[ E\Big( \sum X_i^{(2)} \sum X_i^{(2)} X_j^{(2)} \Big) - E\Big( \sum X_i^{(2)} \Big) E\Big( \sum X_i^{(2)} X_j^{(2)} \Big) \Big]

    = \frac{1}{(n^{(2)})^2} \Big[ \sum \big( EX_i^{(4)} + 4 EX_i^{(3)} + 2 EX_i^{(2)} \big) + \sum EX_i^{(2)} X_j^{(2)} - \Big( \sum EX_i^{(2)} \Big)^2 \Big]
      + \frac{1}{(n^{(4)})^2} \Big[ 2 \sum \big( EX_i^{(4)} X_j^{(4)} + 4 EX_i^{(4)} X_j^{(3)} + 4 EX_i^{(3)} X_j^{(4)} + 2 EX_i^{(4)} X_j^{(2)} + 2 EX_i^{(2)} X_j^{(4)}
            + 16 EX_i^{(3)} X_j^{(3)} + 8 EX_i^{(3)} X_j^{(2)} + 8 EX_i^{(2)} X_j^{(3)} + 4 EX_i^{(2)} X_j^{(2)} \big)
            + 4 \sum \big( EX_i^{(4)} X_j^{(2)} X_k^{(2)} + 4 EX_i^{(3)} X_j^{(2)} X_k^{(2)} + 2 EX_i^{(2)} X_j^{(2)} X_k^{(2)} \big)
            + \sum EX_i^{(2)} X_j^{(2)} X_k^{(2)} X_l^{(2)} - \Big( \sum EX_i^{(2)} X_j^{(2)} \Big)^2 \Big]
      + \frac{2}{n^{(2)} n^{(4)}} \Big[ 2 \sum \big( EX_i^{(4)} X_j^{(2)} + 4 EX_i^{(3)} X_j^{(2)} + 2 EX_i^{(2)} X_j^{(2)} \big)
            + \sum EX_i^{(2)} X_j^{(2)} X_k^{(2)} - \Big( \sum EX_i^{(2)} \Big) \Big( \sum EX_i^{(2)} X_j^{(2)} \Big) \Big],    (2.9)

where we have used the identity (X_i^{(2)})^2 = X_i^{(4)} + 4X_i^{(3)} + 2X_i^{(2)}. We can use Lemma A.1 in the Appendix to obtain the mixed factorial moments required in (2.9). For instance,

    EX_i^{(4)} = n^{(4)} p_i^4,
    EX_i^{(4)} X_j^{(3)} = n^{(7)} p_i^4 p_j^3,
    EX_i^{(3)} X_j^{(2)} X_k^{(2)} = n^{(7)} p_i^3 p_j^2 p_k^2,
    EX_i^{(2)} X_j^{(2)} X_k^{(2)} X_l^{(2)} = n^{(8)} p_i^2 p_j^2 p_k^2 p_l^2.
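Since every mixed factorial moment reduces, by Lemma A.1, to n^{(sum of orders)} times a product of powers of the p_i, expression (2.9) can be evaluated exactly. The following sketch (function names ours) does this by brute-force summation over distinct index tuples:

```python
from itertools import permutations
from math import prod

def falling(a, b):
    """Descending factorial a^(b)."""
    out = 1
    for t in range(b):
        out *= a - t
    return out

def msum(n, p, orders):
    """Sum over ordered tuples of DISTINCT indices of the mixed factorial
    moment E[X_{i1}^(a1) ... X_{ir}^(ar)] = n^(a1+...+ar) p_{i1}^a1 ...
    (Lemma A.1)."""
    return falling(n, sum(orders)) * sum(
        prod(p[i] ** a for i, a in zip(idx, orders))
        for idx in permutations(range(len(p)), len(orders)))

def var_pic_umvu(n, p):
    """Exact variance (2.9) of the UMVU estimator of PIC."""
    n2, n4 = falling(n, 2), falling(n, 4)
    E2, E3, E4 = (msum(n, p, [k]) for k in (2, 3, 4))
    E22, E32, E42, E33, E43, E44 = (
        msum(n, p, list(o))
        for o in ((2, 2), (3, 2), (4, 2), (3, 3), (4, 3), (4, 4)))
    E222, E322, E422 = (msum(n, p, [k, 2, 2]) for k in (2, 3, 4))
    E2222 = msum(n, p, [2, 2, 2, 2])
    # variance of sum X_i^(2) / n^(2); uses (X^(2))^2 = X^(4) + 4X^(3) + 2X^(2)
    b1 = E4 + 4 * E3 + 2 * E2 + E22 - E2 ** 2
    # variance of sum_{i != j} X_i^(2) X_j^(2) / n^(4); the three groups come
    # from pairs of index pairs sharing two, one, or no indices (E43 = E34
    # etc. under the symmetric sum, so symmetric terms are merged)
    b2 = (2 * (E44 + 8 * E43 + 4 * E42 + 16 * E33 + 16 * E32 + 4 * E22)
          + 4 * (E422 + 4 * E322 + 2 * E222) + E2222 - E22 ** 2)
    # covariance term
    b3 = 2 * (E42 + 4 * E32 + 2 * E22) + E222 - E2 * E22
    return b1 / n2 ** 2 + b2 / n4 ** 2 + 2 * b3 / (n2 * n4)
```

For a single allele (m = 1) the estimator is degenerate and the formula returns zero, and for small n and m the result agrees with direct enumeration of all multinomial outcomes.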

Hence, using these mixed factorial moments in (2.9), we obtain the variance of the UMVU estimator of PIC.

Next, we calculate the exact variance of the maximum likelihood estimator (MLE) of PIC, which is not an unbiased estimator but is asymptotically unbiased. The MLE of the PIC value is given by

    \widetilde{PIC} = 1 - \sum \Big( \frac{X_i}{n} \Big)^2 - \sum \Big( \frac{X_i}{n} \Big)^2 \Big( \frac{X_j}{n} \Big)^2.    (2.10)


Consider the expectation of \widetilde{PIC}:

    E[\widetilde{PIC}] = 1 - \frac{1}{n^2} \sum EX_i^2 - \frac{1}{n^4} \sum EX_i^2 X_j^2

    = 1 - \frac{1}{n} - \frac{n^{(2)}}{n^2} \sum p_i^2 - \frac{n^{(4)}}{n^4} \sum p_i^2 p_j^2 - 2 \frac{n^{(3)}}{n^4} \sum p_i^2 p_j - \frac{n^{(2)}}{n^4} \sum p_i p_j,

since EX_i^2 = n^{(2)} p_i^2 + n p_i and EX_i^2 X_j^2 = n^{(4)} p_i^2 p_j^2 + n^{(3)} p_i^2 p_j + n^{(3)} p_i p_j^2 + n^{(2)} p_i p_j. Hence \widetilde{PIC} is not an unbiased estimator of PIC, though it is asymptotically unbiased. We calculate the variance of \widetilde{PIC} as follows:

    Var(\widetilde{PIC}) = Var\Big( 1 - \sum \frac{X_i^2}{n^2} - \sum \frac{X_i^2 X_j^2}{n^4} \Big)

    = \frac{1}{n^4} Var\Big( \sum X_i^2 \Big) + \frac{1}{n^8} Var\Big( \sum X_i^2 X_j^2 \Big) + \frac{2}{n^6} Cov\Big( \sum X_i^2, \sum X_i^2 X_j^2 \Big)

    = \frac{1}{n^4} \Big[ \sum EX_i^4 + \sum EX_i^2 X_j^2 - \Big( \sum EX_i^2 \Big)^2 \Big]
      + \frac{1}{n^8} \Big[ 2 \sum EX_i^4 X_j^4 + 4 \sum EX_i^4 X_j^2 X_k^2 + \sum EX_i^2 X_j^2 X_k^2 X_l^2 - \Big( \sum EX_i^2 X_j^2 \Big)^2 \Big]
      + \frac{2}{n^6} \Big[ 2 \sum EX_i^4 X_j^2 + \sum EX_i^2 X_j^2 X_k^2 - \Big( \sum EX_i^2 \Big) \Big( \sum EX_i^2 X_j^2 \Big) \Big].    (2.11)

In order to calculate the mixed moments required in (2.11) we have derived a recursive formula, which is given in the Appendix as a theorem. Using this result (see (A.2)) we can calculate all the mixed moments required in (2.11). For instance,

    EX_i^4 = p_i [n + 7 n^{(2)} p_i + 6 n^{(3)} p_i^2 + n^{(4)} p_i^3],
    EX_i^4 X_j^2 = p_i p_j [n^{(2)} + n^{(3)} (7 p_i + p_j) + n^{(4)} (7 p_i p_j + 6 p_i^2) + n^{(5)} (6 p_i^2 p_j + p_i^3) + n^{(6)} p_i^3 p_j],
    EX_i^2 X_j^2 X_k^2 = p_i p_j p_k [n^{(3)} + n^{(4)} (p_i + p_j + p_k) + n^{(5)} (p_i p_j + p_i p_k + p_k p_j) + n^{(6)} p_i p_j p_k].

Similarly the other mixed moments can be obtained and used in (2.11) to obtain the exact variance of the MLE of PIC.
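The mixed raw moments above can also be generated mechanically by expanding powers into descending factorials via Stirling numbers of the second kind and then applying Lemma A.1; this is an alternative to the recursion (A.2). A sketch (names ours), used here to evaluate the exact mean of the plug-in estimator (2.10):

```python
from itertools import permutations, product
from math import prod

def falling(a, b):
    out = 1
    for t in range(b):
        out *= a - t
    return out

def stirling2(j, k):
    """Stirling numbers of the second kind S(j, k): x^j = sum_k S(j,k) x^(k)."""
    if k == 0:
        return 1 if j == 0 else 0
    if j == 0:
        return 0
    return k * stirling2(j - 1, k) + stirling2(j - 1, k - 1)

def raw_moment(n, p, idx, pows):
    """E[prod_t X_{idx[t]}^pows[t]] for distinct multinomial cells idx."""
    total = 0.0
    for ks in product(*(range(j + 1) for j in pows)):
        coef = prod(stirling2(j, k) for j, k in zip(pows, ks))
        if coef:
            total += coef * falling(n, sum(ks)) * prod(
                p[i] ** k for i, k in zip(idx, ks))
    return total

def mle_pic_mean(n, p):
    """Exact E[PIC_tilde] for the plug-in estimator (2.10)."""
    m = len(p)
    e = 1.0
    e -= sum(raw_moment(n, p, (i,), (2,)) for i in range(m)) / n ** 2
    e -= sum(raw_moment(n, p, ij, (2, 2))
             for ij in permutations(range(m), 2)) / n ** 4
    return e
```

The same raw_moment helper supplies every term of (2.11), so the exact variance of the MLE can be assembled in the same way.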

3. DISCUSSION

We have given expressions for the UMVU estimators of the heterozygosity and the PIC values, along with their exact variances. We have also given expressions for the exact variances of the maximum likelihood estimators. Tables I-IV each include, in the second and third columns, the values of heterozygosity obtained using (2.1) and of PIC using (2.6) for 2-10 equifrequent alleles. We simulated observations from a multinomial distribution with parameters n and p_1 = ... = p_m = 1/m. Here n is the sample size, which is the number of independent alleles sampled (i.e., n/2 persons under random mating are sampled). For the purpose of simulation we set n = 20, 50, 100, and 200. For each value of n we carried out 1000 simulations, and the following quantities were computed from them: averages of \widehat{PIC}, \widetilde{PIC}, \tilde{H}, and \hat{H}, and estimates of the standard deviations of these estimators. These values are reported in Table I for n = 20, in Table II for n = 50, in Table III for n = 100, and in Table IV for n = 200. The values in the last four columns of Tables I-IV, \sigma_{\widehat{PIC}}, \sigma_{\widetilde{PIC}}, \sigma_{\tilde{H}}, and \sigma_{\hat{H}}, are obtained using expressions (2.9), (2.11), (2.5), and (2.4), respectively. From Tables I-IV, for small sample sizes we observe that the unbiased estimator \widehat{PIC} is always closer to the true PIC value than \widetilde{PIC}, which is not unbiased. In the large sample case both estimators are close to the PIC value, with almost identical standard errors. Also, note that \widetilde{PIC} consistently underestimates PIC. We therefore recommend in general that \widehat{PIC} be used to

TABLE I
Values of PIC and Heterozygosity with Their Estimates and Standard Errors, for Equifrequent Alleles, Based on 1000 Samples: Sample Size n = 20

(PIC^ = UMVU estimator, PIC~ = MLE, H~ = unbiased, H^ = MLE; s(.) = simulated standard deviation; σ(.) = exact standard deviation from (2.9), (2.11), (2.5), (2.4).)

m    PIC     H      PIC^    PIC~    H~      H^      s(PIC^) s(PIC~) s(H~)  s(H^)  σ(PIC^) σ(PIC~) σ(H~)  σ(H^)
2    0.3750  0.5000 0.3747  0.3612  0.4993  0.4743  0.0183  0.0198  0.0365 0.0346 0.0182  0.0197  0.0363 0.0345
3    0.5926  0.6667 0.5922  0.5585  0.6660  0.6327  0.0353  0.0334  0.0354 0.0336 0.0348  0.0328  0.0342 0.0325
4    0.7031  0.7500 0.7017  0.6593  0.7488  0.7114  0.0369  0.0347  0.0326 0.0309 0.0357  0.0335  0.0314 0.0298
5    0.7680  0.8000 0.7686  0.7217  0.8004  0.7604  0.0329  0.0311  0.0285 0.0271 0.0338  0.0320  0.0290 0.0276
6    0.8102  0.8333 0.8097  0.7606  0.8331  0.7915  0.0320  0.0305  0.0271 0.0258 0.0317  0.0301  0.0270 0.0257
7    0.8397  0.8571 0.8397  0.7891  0.8571  0.8143  0.0283  0.0271  0.0243 0.0231 0.0296  0.0284  0.0254 0.0241
8    0.8613  0.8750 0.8611  0.8097  0.8748  0.8311  0.0282  0.0272  0.0243 0.0231 0.0278  0.0268  0.0240 0.0228
9    0.8779  0.8889 0.8774  0.8254  0.8885  0.8441  0.0259  0.0251  0.0225 0.0213 0.0262  0.0254  0.0228 0.0217
10   0.8910  0.9000 0.8913  0.8389  0.9002  0.8552  0.0243  0.0236  0.0213 0.0202 0.0248  0.0242  0.0218 0.0207

TABLE II
Values of PIC and Heterozygosity with Their Estimates and Standard Errors, for Equifrequent Alleles, Based on 1000 Samples: Sample Size n = 50

(PIC^ = UMVU estimator, PIC~ = MLE, H~ = unbiased, H^ = MLE; s(.) = simulated standard deviation; σ(.) = exact standard deviation from (2.9), (2.11), (2.5), (2.4).)

m    PIC     H      PIC^    PIC~    H~      H^      s(PIC^) s(PIC~) s(H~)  s(H^)  σ(PIC^) σ(PIC~) σ(H~)  σ(H^)
2    0.3750  0.5000 0.3749  0.3698  0.4998  0.4898  0.0074  0.0077  0.0149 0.0146 0.0071  0.0074  0.0143 0.0140
3    0.5926  0.6667 0.5929  0.5795  0.6670  0.6537  0.0132  0.0129  0.0132 0.0129 0.0136  0.0133  0.0135 0.0132
4    0.7031  0.7500 0.7028  0.6858  0.7497  0.7347  0.0147  0.0143  0.0129 0.0127 0.0140  0.0136  0.0124 0.0121
5    0.7680  0.8000 0.7677  0.7491  0.7997  0.7837  0.0131  0.0128  0.0112 0.0110 0.0133  0.0130  0.0114 0.0112
6    0.8102  0.8333 0.8105  0.7910  0.8336  0.8170  0.0119  0.0117  0.0102 0.0100 0.0124  0.0122  0.0106 0.0104
7    0.8397  0.8571 0.8396  0.8195  0.8571  0.8399  0.0118  0.0116  0.0101 0.0099 0.0116  0.0115  0.0100 0.0098
8    0.8613  0.8750 0.8612  0.8408  0.8749  0.8574  0.0111  0.0110  0.0096 0.0094 0.0109  0.0108  0.0094 0.0093
9    0.8779  0.8889 0.8776  0.8571  0.8887  0.8709  0.0105  0.0104  0.0091 0.0089 0.0103  0.0102  0.0090 0.0088
10   0.8910  0.9000 0.8911  0.8704  0.9001  0.8821  0.0100  0.0099  0.0088 0.0086 0.0098  0.0097  0.0086 0.0084

TABLE III
Values of PIC and Heterozygosity with Their Estimates and Standard Errors, for Equifrequent Alleles, Based on 1000 Samples: Sample Size n = 100

(PIC^ = UMVU estimator, PIC~ = MLE, H~ = unbiased, H^ = MLE; s(.) = simulated standard deviation; σ(.) = exact standard deviation from (2.9), (2.11), (2.5), (2.4).)

m    PIC     H      PIC^    PIC~    H~      H^      s(PIC^) s(PIC~) s(H~)  s(H^)  σ(PIC^) σ(PIC~) σ(H~)  σ(H^)
2    0.3750  0.5000 0.3750  0.3724  0.5000  0.4950  0.0035  0.0036  0.0071 0.0070 0.0036  0.0036  0.0071 0.0070
3    0.5926  0.6667 0.5930  0.5863  0.6671  0.6604  0.0063  0.0063  0.0064 0.0063 0.0067  0.0067  0.0067 0.0066
4    0.7031  0.7500 0.7031  0.6946  0.7500  0.7425  0.0070  0.0070  0.0062 0.0062 0.0069  0.0069  0.0062 0.0061
5    0.7680  0.8000 0.7676  0.7583  0.7997  0.7917  0.0067  0.0066  0.0057 0.0057 0.0066  0.0065  0.0057 0.0056
6    0.8102  0.8333 0.8103  0.8006  0.8334  0.8251  0.0057  0.0057  0.0049 0.0049 0.0062  0.0061  0.0053 0.0052
7    0.8397  0.8571 0.8398  0.8298  0.8573  0.8487  0.0053  0.0053  0.0046 0.0045 0.0058  0.0057  0.0050 0.0049
8    0.8613  0.8750 0.8616  0.8514  0.8752  0.8664  0.0053  0.0052  0.0046 0.0045 0.0054  0.0054  0.0047 0.0047
9    0.8779  0.8889 0.8779  0.8677  0.8889  0.8800  0.0050  0.0050  0.0044 0.0043 0.0051  0.0051  0.0045 0.0044
10   0.8910  0.9000 0.8908  0.8805  0.8998  0.8908  0.0049  0.0049  0.0043 0.0042 0.0049  0.0048  0.0043 0.0042

TABLE IV
Values of PIC and Heterozygosity with Their Estimates and Standard Errors, for Equifrequent Alleles, Based on 1000 Samples: Sample Size n = 200

(PIC^ = UMVU estimator, PIC~ = MLE, H~ = unbiased, H^ = MLE; s(.) = simulated standard deviation; σ(.) = exact standard deviation from (2.9), (2.11), (2.5), (2.4).)

m    PIC     H      PIC^    PIC~    H~      H^      s(PIC^) s(PIC~) s(H~)  s(H^)  σ(PIC^) σ(PIC~) σ(H~)  σ(H^)
2    0.3750  0.5000 0.3750  0.3737  0.5000  0.4975  0.0018  0.0018  0.0036 0.0036 0.0018  0.0018  0.0035 0.0035
3    0.5926  0.6667 0.5926  0.5892  0.6666  0.6633  0.0034  0.0034  0.0034 0.0034 0.0033  0.0033  0.0033 0.0033
4    0.7031  0.7500 0.7031  0.6988  0.7499  0.7462  0.0035  0.0034  0.0031 0.0031 0.0035  0.0034  0.0031 0.0031
5    0.7680  0.8000 0.7682  0.7635  0.8002  0.7962  0.0033  0.0033  0.0028 0.0028 0.0033  0.0033  0.0028 0.0028
6    0.8102  0.8333 0.8103  0.8054  0.8334  0.8292  0.0031  0.0031  0.0026 0.0026 0.0031  0.0031  0.0026 0.0026
7    0.8397  0.8571 0.8396  0.8346  0.8571  0.8528  0.0029  0.0029  0.0025 0.0025 0.0029  0.0029  0.0025 0.0025
8    0.8613  0.8750 0.8613  0.8562  0.8750  0.8706  0.0027  0.0027  0.0023 0.0023 0.0027  0.0027  0.0023 0.0023
9    0.8779  0.8889 0.8780  0.8729  0.8889  0.8845  0.0025  0.0025  0.0022 0.0022 0.0026  0.0026  0.0022 0.0022
10   0.8910  0.9000 0.8909  0.8857  0.8999  0.8954  0.0025  0.0025  0.0022 0.0022 0.0024  0.0024  0.0021 0.0021

estimate PIC, and that its variance be estimated by substituting sample values of the allele frequencies into (2.9) to obtain the standard error of the estimate. For large samples we can use \widehat{PIC} ± 1.96 \hat{\sigma}_{\widehat{PIC}} as an approximate 95% confidence interval for PIC. Finally, we observe that the standard deviations of \widehat{PIC} and \widetilde{PIC} are maximized when the number of equifrequent alleles is 4, irrespective of the sample size, whereas in the case of heterozygosity the standard deviations of the estimators of H decrease as the number of alleles increases. A computer program that performs this calculation is obtainable by ftp at http://darwin.cwru.edu/pic.
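The simulation design above is easy to reproduce in outline. The following is a hypothetical re-implementation (ours, not the authors' distributed program) for equifrequent alleles, comparing the averages of the UMVU and plug-in estimators with the true PIC value:

```python
import random

def falling(a, b):
    out = 1
    for t in range(b):
        out *= a - t
    return out

def pic_umvu(x):
    """UMVU estimator (2.7) from allele counts."""
    n = sum(x)
    s1 = sum(falling(v, 2) for v in x) / falling(n, 2)
    s2 = sum(falling(a, 2) * falling(b, 2)
             for i, a in enumerate(x) for j, b in enumerate(x) if i != j)
    return 1 - s1 - s2 / falling(n, 4)

def pic_plugin(x):
    """Plug-in MLE (2.10) from allele counts."""
    n = sum(x)
    s = sum((v / n) ** 2 for v in x)
    return 1 - s - (s * s - sum((v / n) ** 4 for v in x))

def simulate(n, m, reps, seed=1):
    """Average both PIC estimators over reps multinomial samples of n
    alleles drawn from m equifrequent alleles."""
    rng = random.Random(seed)
    tot_u = tot_p = 0.0
    for _ in range(reps):
        counts = [0] * m
        for _ in range(n):          # draw each allele uniformly
            counts[rng.randrange(m)] += 1
        tot_u += pic_umvu(counts)
        tot_p += pic_plugin(counts)
    return tot_u / reps, tot_p / reps
```

For m equifrequent alleles the true value is PIC = 1 - 1/m - (m-1)/m^3 (e.g., 0.7031 for m = 4), and with n = 20 the run reproduces the pattern of Table I: the plug-in average falls noticeably below this value while the UMVU average stays close to it.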

APPENDIX

Lemma A.1. For a multinomial distribution the mixed factorial moment is given by (Johnson et al., 1997, p. 33)

    E[X_1^{(r_1)} \cdots X_m^{(r_m)}] = \sum x_1^{(r_1)} \cdots x_m^{(r_m)} \binom{n}{x_1, ..., x_m} \prod p_i^{x_i} = n^{(\sum r_i)} \prod p_i^{r_i},    (A.1)

where the sum is over x_i = 0, ..., n for i = 1, ..., m, with \sum x_i = n and \sum p_i = 1.

Next we give a recursive formula to find the mixed moments of a multinomial distribution. Using this recursion one can compute mixed moments of higher order in terms of mixed moments of lower order.

Theorem A.1. Let the random variables X_1, ..., X_m have a multinomial distribution with joint probability mass function

    f(x_1, ..., x_m; n, p_1, ..., p_m) = \binom{n}{x_1, ..., x_m} \prod p_i^{x_i}

for x_i = 0, ..., n and i = 1, ..., m, where \sum x_i = n and \sum p_i = 1. Assume that r \le m and that the indices i_1, ..., i_r are all distinct. Then

    E(X_{i_1}^{j_1} \cdots X_{i_r}^{j_r}) = n^{(r)} p_{i_1} \cdots p_{i_r} \sum_{k_1=0}^{j_1-1} \cdots \sum_{k_r=0}^{j_r-1} \binom{j_1-1}{k_1} \cdots \binom{j_r-1}{k_r} E(Y_{i_1}^{k_1} \cdots Y_{i_r}^{k_r}),    (A.2)

where (Y_{i_1}, ..., Y_{i_{r+1}}) is distributed as Multinomial(n-r, p_{i_1}, ..., p_{i_{r+1}}), Y_{i_{r+1}} = n - r - Y_{i_1} - \cdots - Y_{i_r}, p_{i_{r+1}} = 1 - p_{i_1} - \cdots - p_{i_r}, and j_1, ..., j_r \ge 1.

Proof. We use the notation x_{i_{r+1}} = n - x_{i_1} - \cdots - x_{i_r}. Then, by the definition of expectation,

    E(X_{i_1}^{j_1} \cdots X_{i_r}^{j_r})
    = \sum_{x_{i_1}=0}^{n} \cdots \sum_{x_{i_r}=0}^{n} x_{i_1}^{j_1} \cdots x_{i_r}^{j_r} \binom{n}{x_{i_1}, ..., x_{i_r}, x_{i_{r+1}}} p_{i_1}^{x_{i_1}} \cdots p_{i_r}^{x_{i_r}} p_{i_{r+1}}^{x_{i_{r+1}}}
    = \sum_{x_{i_1}=1} \cdots \sum_{x_{i_r}=1} x_{i_1}^{j_1-1} \cdots x_{i_r}^{j_r-1} \frac{n!}{(x_{i_1}-1)! \cdots (x_{i_r}-1)! \, x_{i_{r+1}}!} p_{i_1}^{x_{i_1}} \cdots p_{i_r}^{x_{i_r}} p_{i_{r+1}}^{x_{i_{r+1}}}
    = n^{(r)} p_{i_1} \cdots p_{i_r} \sum_{x_{i_1}=1} \cdots \sum_{x_{i_r}=1} x_{i_1}^{j_1-1} \cdots x_{i_r}^{j_r-1} \frac{(n-r)!}{(x_{i_1}-1)! \cdots (x_{i_r}-1)! \, x_{i_{r+1}}!} p_{i_1}^{x_{i_1}-1} \cdots p_{i_r}^{x_{i_r}-1} p_{i_{r+1}}^{x_{i_{r+1}}}.

Let y_{i_1} = x_{i_1} - 1, ..., y_{i_r} = x_{i_r} - 1, and y_{i_{r+1}} = x_{i_{r+1}} = n - r - y_{i_1} - \cdots - y_{i_r}; then

    E(X_{i_1}^{j_1} \cdots X_{i_r}^{j_r})
    = n^{(r)} p_{i_1} \cdots p_{i_r} \sum_{y_{i_1}=0} \cdots \sum_{y_{i_r}=0} (1+y_{i_1})^{j_1-1} \cdots (1+y_{i_r})^{j_r-1} \binom{n-r}{y_{i_1}, ..., y_{i_r}, y_{i_{r+1}}} p_{i_1}^{y_{i_1}} \cdots p_{i_r}^{y_{i_r}} p_{i_{r+1}}^{y_{i_{r+1}}}
    = n^{(r)} p_{i_1} \cdots p_{i_r} \sum_{k_1=0}^{j_1-1} \cdots \sum_{k_r=0}^{j_r-1} \binom{j_1-1}{k_1} \cdots \binom{j_r-1}{k_r} \Big[ \sum_{y_{i_1}} \cdots \sum_{y_{i_r}} y_{i_1}^{k_1} \cdots y_{i_r}^{k_r} \binom{n-r}{y_{i_1}, ..., y_{i_r}, y_{i_{r+1}}} p_{i_1}^{y_{i_1}} \cdots p_{i_{r+1}}^{y_{i_{r+1}}} \Big]
    = n^{(r)} p_{i_1} \cdots p_{i_r} \sum_{k_1=0}^{j_1-1} \cdots \sum_{k_r=0}^{j_r-1} \binom{j_1-1}{k_1} \cdots \binom{j_r-1}{k_r} E(Y_{i_1}^{k_1} \cdots Y_{i_r}^{k_r}),

where (Y_{i_1}, ..., Y_{i_{r+1}}) is distributed as Multinomial(n-r, p_{i_1}, ..., p_{i_{r+1}}), Y_{i_{r+1}} = n - r - Y_{i_1} - \cdots - Y_{i_r}, and p_{i_{r+1}} = 1 - p_{i_1} - \cdots - p_{i_r}. ∎

ACKNOWLEDGMENTS

The study was supported in part by U.S. Public Health Service Research Grants GM 28353 from the National Institute of General Medical Sciences and HL 55055 from the National Heart, Lung, and Blood Institute, by Resource Grant RR03655 from the National Center for Research Resources, and by the Ernest Gallo Clinic and Research Center.
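The recursion (A.2) translates directly into code. A sketch (ours, not the authors' program), in which the remainder cell p_{i_{r+1}} is left implicit and coordinates whose power has reached zero are dropped, so the recursion terminates because the total power strictly decreases at each step:

```python
from itertools import product
from math import comb

def mixed_moment(n, probs, pows):
    """E[X_1^pows[0] ... X_r^pows[r-1]] for r distinct multinomial cells with
    probabilities probs (r <= m); implements the recursion (A.2)."""
    act = [(j, q) for j, q in zip(pows, probs) if j > 0]
    if not act:
        return 1.0
    r = len(act)
    pref = 1.0
    for t in range(r):              # n^(r), descending factorial
        pref *= n - t
    for _, q in act:                # p_{i1} ... p_{ir}
        pref *= q
    total = 0.0
    for ks in product(*(range(j) for j, _ in act)):   # k_t = 0 .. j_t - 1
        coef = 1
        for (j, _), k in zip(act, ks):
            coef *= comb(j - 1, k)
        # E(Y^k) with Y ~ Multinomial(n - r, same cell probabilities)
        total += coef * mixed_moment(n - r, [q for _, q in act], list(ks))
    return pref * total
```

For instance, mixed_moment(n, [p], [2]) reproduces E[X^2] = n^{(2)}p^2 + np for a single binomial cell, and mixed_moment(n, [p_i, p_j], [2, 2]) reproduces the formula for EX_i^2 X_j^2 quoted in Section 2.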

REFERENCES

n&r y y y } p i i1 } } } p i ir p i ir+1 1 r r+1 y i1 , ..., y ir , y ir+1

\

+

j1 &1

jr &1

=n (r)p i1 } } } p ir : } } } : k1 =0

_

kr =0

} : } } } : y ki 1 } } } y ki r r

1

yi

1

yi

r

y

y

y

1

r

r+1

_ p i i1 } } } p i ir p i ir+1

&

\

j 1 &1 j r &1 }}} k1 kr

+ \ +

n&r

\ y , ..., y , y + i1

ir

ir+1

Botstein, D., White, R. L., Skolnick, M., and Davis, R. W. 1980. Construction of a genetic linkage map in man using restriction fragment length polymorphisms, Am. J. Hum. Genet. 32, 314-331.
Casella, G., and Berger, R. L. 1990. "Statistical Inference," Duxbury, N. Scituate, MA.
Guo, X., and Elston, R. C. 1999. Linkage information content of polymorphic genetic markers, Hum. Hered. 49, 112-118.
Johnson, N. L., Kotz, S., and Balakrishnan, N. 1997. "Discrete Multivariate Distributions," Wiley, New York.
Nei, M., and Roychoudhury, A. K. 1974. Sampling variances of heterozygosity and genetic distance, Genetics 76, 379-390.
Ott, J. 1991. "Analysis of Human Genetic Linkage," Johns Hopkins Univ. Press, Baltimore, MD.
