Original Paper Hum Hered 1998;48:67–81
Laura C. Lazzeroni (a), Kenneth Lange (b)
(a) Division of Biostatistics and Department of Genetics, Stanford University, Stanford, Calif., and (b) Departments of Biostatistics and Mathematics, University of Michigan, Ann Arbor, Mich., USA
Received: May 22, 1997 Revised: September 17, 1997 Accepted: September 25, 1997
A Conditional Inference Framework for Extending the Transmission/Disequilibrium Test
Key Words: Permutation tests, Exact tests, Multiple testing, p-values, Linkage disequilibrium
Abstract
The transmission/disequilibrium test (TDT) of Terwilliger and Ott [Hum Hered 1992;42:337–346] and Spielman et al. [Am J Hum Genet 1993;52:506–516] is widely used to detect linkage and/or association between a genetically influenced disease and the alleles of a codominant marker locus. The TDT was specifically designed to avoid the spurious population associations produced by ethnic stratification of a sample of affected people. In this paper, we describe permutation extensions of the TDT that share this advantage. Our conditional inference framework permits extensions to multiple alleles, multiple loci, unaffected siblings, and genotypic rather than allelic associations. In the case of multiple loci, the conditional perspective provides a straightforward correction for multiple tests that can be substantially more powerful than the standard Bonferroni correction.
Research supported in part by USPHS grant GM53275 (L.C.L. and K.L.) and NSF grant DMS-9510516 (L.C.L.).
Correspondence: Dr. Laura Lazzeroni, HRP Redwood Bldg. T101D, Stanford University, Stanford, CA 94305-5405 (USA)
© 1998 S. Karger AG, Basel

1. Introduction
The transmission/disequilibrium test (TDT) of Terwilliger and Ott [1] and Spielman et al. [2] detects linkage disequilibrium or causative association between the alleles of a codominant marker locus and a genetically influenced disease. Its originators specifically designed the TDT to circumvent the spurious population associations produced by ethnic stratification. The TDT achieves this goal by taking into account the marker genotypes of the parents of affected individuals as well as the marker genotypes of the affected individuals themselves. The parental alleles not passed to an affected child serve as controls for the parental alleles passed to the child. Under the null hypothesis of the TDT, parents transmit their marker alleles to an affected child in the
usual Mendelian fashion independently of the child’s disease status. An advantage of the TDT is that it mandates no specific genetic model for disease transmission.

In the current paper, we describe a comprehensive conditional inference framework for the TDT that includes several interesting generalizations. The conditional perspective clearly explains why a test for symmetry such as the TDT is appropriate as a test for transmission disequilibrium, which we define as a lack of independence. This perspective also leads to permutation tests that provide straightforward extensions to multiple alleles, multiple marker loci, unaffected siblings of affected children, and genotypic rather than allelic associations. In the case of multiple marker loci, the conditional perspective provides a permutation-based correction for multiple testing that is less conservative and can be substantially more powerful than the standard Bonferroni correction.

In generalizing the TDT, we condition all analyses on observed parental genotypes. This tactic eliminates the possibility of detecting spurious associations due to ethnic stratification. In the original TDT setting of a single biallelic marker, such conditioning is implicit. Conditioning also eliminates nuisance parameters such as allele frequencies. This leads to exact permutation tests that approximate p-values by Monte Carlo simulation and avoid possibly dubious large-sample approximations. Avoiding large-sample approximations is particularly helpful for markers with multiple alleles because in this case the contingency tables encountered in the TDT can contain many cells with small expected counts. While computationally intensive, Monte Carlo simulation can approximate p-values with sufficient accuracy for practical purposes.

Bickeböller and Clerget-Darpoux [3] originally proposed basing generalizations of the TDT to multiple alleles on the conditional
distribution. Other authors [4–6] have followed suit using permutation procedures to evaluate conditional tests. The conditional framework used in this paper is both simpler and more general than the unconditional framework previously used to justify these conditional tests [3, 5]. The unconditional framework requires a parametric model for linkage disequilibrium, recombination and penetrance. In addition, it invokes strong assumptions such as random mating that call into question the applicability of the TDT with ethnic stratification. In contrast, this paper shows that the conditional framework easily justifies the use of conditional tests without direct reference to the underlying unconditional setting. Otherwise controversial results then follow immediately, even when restrictions such as random mating are dropped. Although the conditional framework is more intellectually satisfying, it is not a panacea. Important issues such as power and efficient model-based statistics require consideration of the unconditional setting. To summarize our remaining agenda, Section 2 reviews the TDT for a single biallelic locus and introduces the conditional inference framework. Section 3 carries out the TDT generalizations just mentioned. Section 4 addresses corrections for multiple tests and Monte Carlo approximation of p-values. Finally, section 5 discusses the implications and limitations of the TDT methodology.
2. The TDT at a Single Biallelic Locus
2.1. The Original TDT
We define a marker locus to be in ‘transmission equilibrium’ with a disease if the disease status of a child is independent of which of the parental alleles are transmitted to the child at the marker locus. In contrast, a marker locus is in linkage equilibrium with a
disease if the disease status of a person is independent of his or her marker genotype. In other words, transmission disequilibrium describes associations within nuclear families conditional on the genotypes of the parents. In contrast, linkage disequilibrium describes associations within populations ignoring parental genotypes. Note that we will use the term ‘linkage disequilibrium’ only when the marker locus is linked to, but distinct from, an underlying disease-predisposing locus. Linkage disequilibrium occurs because each disease gene mutation happens by chance on a unique chromosomal background. The closer a marker locus is to the disease locus, the less likely it is that the original marker allele associated with a mutation will have been displaced by recombination. Linkage between a marker locus and a disease locus does not necessarily imply linkage disequilibrium. If a sufficient number of recombination events occur between the marker and disease loci, or if the marker alleles originally associated with disease mutations occur in proportion to the population marker allele frequencies, then linkage equilibrium will hold. Transmission disequilibrium obviously occurs when one of the marker alleles plays a direct role in the disease process. Transmission disequilibrium also occurs when the marker locus is in linkage disequilibrium with a disease-predisposing locus. Linkage disequilibrium favors certain phase combinations of marker and disease alleles on parental chromosomes. Because the loci are linked, these combinations will be preferentially transmitted from parent to child. In contrast, at a marker locus unlinked to any disease-predisposing locus, marker alleles associated with the disease in the parental generation through ethnic stratification are not preferentially transmitted to children. Thus, conditioning on parental genotypes eliminates the error of
confounding other types of population association with transmission disequilibrium.

When linkage equilibrium holds at a linked marker locus, the marker and disease alleles on each parental chromosome are independent. Because linkage equilibrium is preserved by recombination, any given child will be in transmission equilibrium. However, siblings taken collectively fail to be in transmission equilibrium because siblings of like disease status tend to share marker alleles. The TDT is not designed to exploit the lack of independence among the marker alleles transmitted to multiple affected children of the same parent. As long as all parents are heterozygous, the TDT gains little from the use of affected siblings beyond what it would gain from a proportional increase in the sample size of single affected children. (Of course, the occurrence of multiple affected members of a sibship does argue for genetic as opposed to sporadic explanations of the disease segregating within the sibship.)

Fortunately, a lack of linkage disequilibrium is not a barrier to detecting linkage. Standard pedigree-based methods accomplish precisely this task by counting contemporary recombination and nonrecombination events. In contrast, the TDT is designed to detect a deficit of historical as opposed to contemporary recombination events. A strongly positive TDT result suggests that the tested marker is a disease-predisposing gene or closely linked to such a gene. When there is prior evidence of population association between the marker and the disease, the TDT can distinguish these two possibilities from spurious association due to population stratification. When there is prior evidence of linkage, the TDT can still detect transmission disequilibrium if it exists. A strongly negative TDT result based on a sizable amount of data does not resolve the question of whether the marker is linked to a disease-predisposing gene but does suggest that the marker is not causative.

To test transmission equilibrium, the original TDT for a single biallelic locus uses data from the parents of a sample of affected individuals as shown in table 1. Cell count $n_{i/j \to i}$ represents the number of times parents of genotype i/j transmit allele i to an affected child. When both parents and the child share the heterozygous genotype 1/2, it is impossible to determine which parent contributed which allele to the child. However, it is still possible to correctly enter one parent in each of the two off-diagonal cells of the table. If a parent has multiple affected children, he or she is counted in the table once for each child.

Table 1. Contingency table for the original TDT

                   Not transmitted
Transmitted        Allele 1             Allele 2
Allele 1           $n_{1/1 \to 1}$      $n_{1/2 \to 1}$
Allele 2           $n_{1/2 \to 2}$      $n_{2/2 \to 2}$

Let $p_{i/j \to i}$ be the probability corresponding to the cell containing count $n_{i/j \to i}$. In the TDT one tests the null hypothesis of symmetry $H_0: p_{1/2 \to 1} = p_{1/2 \to 2}$ against the alternative $H_a: p_{1/2 \to 1} \neq p_{1/2 \to 2}$. Assuming segregation distortion does not operate at the marker locus, the alternative $H_a$ of asymmetry occurs when the marker locus is in transmission disequilibrium with the disease. To test whether $p_{1/2 \to 1} = p_{1/2 \to 2}$, the TDT applies a standard statistical procedure known as McNemar’s test [1]. If we let $t_1 = n_{1/2 \to 1}$ and $t_2 = n_{1/2 \to 2}$ for notational simplicity, then McNemar’s statistic can be written as

$$T_{\mathrm{McNemar}} = \frac{(t_1 - t_2)^2}{t_1 + t_2}.$$
Observe that $T_{\mathrm{McNemar}}$ reasonably ignores the diagonal entries of table 1; these provide no information relevant to transmission equilibrium. McNemar’s test is often derived in terms of the multinomial distribution of the four cell counts of table 1 [7]. It is equally valid to derive the test using the conditional distribution of the two off-diagonal cell counts given the total number of transmission events $t_1 + t_2$ from heterozygous parents. Under the null hypothesis, each pair consisting of an affected child and a heterozygous parent constitutes an independent trial in which allele 1 is passed with probability $\frac{1}{2}$ to the affected child. Conditional on the total number of trials, $t_1$ is binomially distributed with success probability $\frac{1}{2}$, mean $E(t_1) = \frac{t_1 + t_2}{2}$, and variance $\mathrm{Var}(t_1) = \frac{t_1 + t_2}{4}$. McNemar’s test statistic can be written as the squared standardized residual

$$T_{\mathrm{McNemar}} = \frac{\left(t_1 - \frac{t_1 + t_2}{2}\right)^2}{\frac{t_1 + t_2}{4}} = \frac{[t_1 - E(t_1)]^2}{\mathrm{Var}(t_1)}. \tag{1}$$
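The two forms of the statistic and the conditional binomial distribution of $t_1$ can be checked numerically. The following is a minimal Python sketch, not code from the paper; the function names are hypothetical, and the two-sided exact p-value is one common convention (it sums the binomial probabilities of all outcomes at least as far from $n/2$ as the observed $t_1$).

```python
from math import comb


def tdt_mcnemar(t1: int, t2: int) -> float:
    """McNemar/TDT statistic (t1 - t2)^2 / (t1 + t2) from the off-diagonal counts."""
    n = t1 + t2
    if n == 0:
        raise ValueError("no transmissions from heterozygous parents")
    return (t1 - t2) ** 2 / n


def squared_standardized_residual(t1: int, t2: int) -> float:
    """Equation (1): [t1 - E(t1)]^2 / Var(t1) with E(t1) = n/2 and Var(t1) = n/4."""
    n = t1 + t2
    return (t1 - n / 2) ** 2 / (n / 4)


def exact_binomial_pvalue(t1: int, t2: int) -> float:
    """Two-sided exact p-value Pr(|T - n/2| >= |t1 - n/2|) for T ~ Binomial(n, 1/2)."""
    n = t1 + t2
    observed = abs(2 * t1 - n)
    favorable = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= observed)
    return favorable / 2 ** n


# Hypothetical counts: 30 transmissions of allele 1 and 10 of allele 2.
print(tdt_mcnemar(30, 10), squared_standardized_residual(30, 10))
print(exact_binomial_pvalue(30, 10))
```

With these illustrative counts both forms of the statistic give the same value, as the algebra in equation (1) requires.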
The representation (1) makes it clear that $T_{\mathrm{McNemar}}$ follows an approximate $\chi^2_1$ distribution with one degree of freedom. The $\chi^2_1$ approximation is quite good in view of the null success probability of $\frac{1}{2}$. For very small samples, one can compare the value of $t_1$ directly to the binomial distribution.

2.2. General Framework
We now describe more formally why McNemar’s test for asymmetry is appropriate as a test of transmission disequilibrium, which we have defined in terms of a lack of independence. Let G be the genotype of a parent, H the allele the parent transmits to one of his/her children, and A the affected status of the child. The null hypothesis of the TDT entails the conditional independence of H and
A given G. In symbols, this independence is expressed as

$$\Pr(H \cap A \mid G) = \Pr(H \mid G)\,\Pr(A \mid G). \tag{2}$$

In view of the identity $\Pr(H \cap A \mid G) = \Pr(H \mid A \cap G)\,\Pr(A \mid G)$, condition (2) is equivalent to

$$\Pr(H \mid A \cap G) = \Pr(H \mid G). \tag{3}$$

Note that the conditional independence condition (3) is not equivalent to the marginal independence condition

$$\Pr(H \mid A) = \Pr(H).$$

In the presence of ethnic stratification, certain marker genotypes will occur more frequently among affected children and their parents. Any dependence between G and A can induce a marginal dependence between H and A.

In testing condition (3), the TDT relies on three simplifying genetic assumptions in order to completely specify the transmission probability $\Pr(H \mid G)$ when affected status is ignored. These assumptions are, in fact, simply a restatement of Mendel’s laws. First, the maternal and paternal marker alleles transmitted to a child are independent given the two parental genotypes. Thus, each parent of a child can be entered separately in the table. Second, the alleles a parent transmits to different children are independent given the parent’s genotype. Thus, the TDT can take into account multiple affected children per sibship. Third, $\Pr(H \mid G)$ is symmetric in the transmitted and untransmitted alleles for each parent-child pair. If $G \setminus H$ denotes the event that the parent transmits his/her other allele to the child, then the symmetry condition reads

$$\Pr(H \mid G) = \Pr(G \setminus H \mid G) = \frac{1}{2}, \tag{4}$$

assuming the two parental alleles are different. In the two-allele case, condition (4) amounts to $p_{1/2 \to 1} = p_{1/2 \to 2}$. This symmetry condition implicitly assumes no segregation distortion or zygote viability differences at the marker locus.

It is instructive to note here the difference between the conditional inference framework and the unconditional framework advocated by Bickeböller and Clerget-Darpoux [3] and Kaplan et al. [5] as justification for conditional tests. Within the conditional framework, the null distribution ignores disease status and can be derived by appeal to simple Mendelian transmission. In particular, the three transmission premises mentioned above immediately imply the independence of the two parental contributions to any child without invoking any model for penetrance, recombination, and linkage disequilibrium and without assuming random mating.

As correctly noted by Bickeböller and Clerget-Darpoux [3], assortative mating and ethnic stratification can introduce dependencies between parental genotypes and subsequently transmitted alleles when inference is not conditioned on parental genotypes. This complication is one reason to prefer tests based on the conditional distribution. Because population association in the parental generation has no effect upon $\Pr(H \mid G)$, it cannot artificially distort the null distribution. In extending the TDT, we retain this advantage by continuing to condition on parental genotypes.

Even when spurious population associations are not a problem, conditioning on parental genotypes eliminates nuisance parameters such as allele frequencies that can otherwise make test results more variable. In addition, low cell counts lead to poor performance of the large-sample approximations suggested by the unconditional multinomial distribution. In contrast, exact p-values can in principle be evaluated to any desired degree of accuracy by taking a large enough Monte Carlo sample. The price paid for these advantages is lost information contained in parental genotypes.
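The gap between conditional independence (3) and marginal independence can be made concrete with a small exact calculation. The sketch below is illustrative only: the two strata, their genotype frequencies, and their disease risks are invented numbers, and the helper names are hypothetical. Transmission is Mendelian in both strata and disease status never depends on the transmitted allele, so condition (3) holds by construction, yet the marginal probabilities differ because both G and A depend on the stratum.

```python
from fractions import Fraction as F

# Hypothetical strata: (population weight, Pr(child affected), parental genotype distribution).
STRATA = [
    (F(1, 2), F(1, 10), {("1", "1"): F(7, 10), ("1", "2"): F(3, 10)}),
    (F(1, 2), F(1, 100), {("2", "2"): F(7, 10), ("1", "2"): F(3, 10)}),
]
HET = ("1", "2")


def joint_distribution():
    """Joint law of (G, H, A); each allele copy of G is transmitted with probability 1/2."""
    dist = {}
    for weight, p_affected, genotypes in STRATA:
        for g, p_g in genotypes.items():
            for h in g:
                for affected, p_a in ((True, p_affected), (False, 1 - p_affected)):
                    key = (g, h, affected)
                    dist[key] = dist.get(key, F(0)) + weight * p_g * F(1, 2) * p_a
    return dist


def prob(dist, event):
    return sum((p for (g, h, a), p in dist.items() if event(g, h, a)), F(0))


def cond_prob(dist, event, given):
    both = prob(dist, lambda g, h, a: event(g, h, a) and given(g, h, a))
    return both / prob(dist, given)


def transmits_1(g, h, a):
    return h == "1"


DIST = joint_distribution()
p_h = prob(DIST, transmits_1)                                             # Pr(H = 1)
p_h_given_a = cond_prob(DIST, transmits_1, lambda g, h, a: a)             # Pr(H = 1 | A)
p_h_given_g = cond_prob(DIST, transmits_1, lambda g, h, a: g == HET)      # Pr(H = 1 | G = 1/2)
p_h_given_ag = cond_prob(DIST, transmits_1, lambda g, h, a: a and g == HET)
print(p_h, p_h_given_a, p_h_given_g, p_h_given_ag)
```

In this invented population, allele 1 looks strongly associated with the disease when parental genotypes are ignored, while transmissions from heterozygous parents remain exactly symmetric.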
3. Extensions to More General Data
3.1. Multiple Alleles
In extending the TDT to $k > 2$ alleles, we again omit the genes transmitted by uninformative homozygous parents and base inferences on the number

$$t_i = \sum_{\{j:\, j \neq i\}} n_{i/j \to i}$$

of i alleles transmitted by heterozygous parents for each allele type i. The number $t_i$ is contrasted with the number

$$c_i = \sum_{\{j:\, j \neq i\}} n_{i/j \to j}$$

of type i control alleles that are not transmitted. The data relevant to the TDT are summarized in table 2, which is a collapsed version of the $k \times k$ table formed from the transmission counts $n_{i/j \to i}$.

Table 2. Multiallelic TDT data

                   Allele 1       Allele 2       ...    Allele k
Transmitted        $t_1$          $t_2$          ...    $t_k$
Not transmitted    $c_1$          $c_2$          ...    $c_k$
Total              $t_1 + c_1$    $t_2 + c_2$    ...    $t_k + c_k$

Under the null hypothesis $H_0$ of transmission equilibrium, the transmission probabilities $p_{i/j \to i}$ and $p_{i/j \to j}$ coincide for each heterozygous parental genotype i/j. Furthermore, conditional on all parental genotypes, each $t_i$ is binomially distributed with $t_i + c_i$ independent trials and success probability $\frac{1}{2}$. Thus,

$$E(t_i) = \frac{t_i + c_i}{2}, \qquad \mathrm{Var}(t_i) = \frac{t_i + c_i}{4}.$$

The $t_i$ are not independent. If $i \neq j$, then

$$\mathrm{Cov}(t_i, t_j) = \mathrm{Cov}(n_{i/j \to i}, n_{i/j \to j}) = -\mathrm{Var}(n_{i/j \to i}) = -\frac{n_{i/j \to i} + n_{i/j \to j}}{4}.$$

One reasonable statistic for testing transmission disequilibrium is

$$\mathrm{TDT}_k = \sum_{i=1}^{k} \frac{[t_i - E(t_i)]^2}{\mathrm{Var}(t_i)} = \sum_{i=1}^{k} \frac{(t_i - c_i)^2}{t_i + c_i}.$$
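The collapse from transmission counts to the marginal counts of table 2, and the resulting $\mathrm{TDT}_k$ statistic, can be sketched in a few lines of Python. This is an illustrative sketch under assumed conventions, not the authors' software: the dictionary layout `n[(i, j)]` (allele i transmitted, allele j not transmitted) and the function names are hypothetical, and the counts in the usage example are invented.

```python
from collections import defaultdict


def marginal_counts(n):
    """Collapse n[(i, j)] = n_{i/j -> i} into the t_i and c_i of table 2."""
    t = defaultdict(int)
    c = defaultdict(int)
    for (i, j), count in n.items():
        if i == j:
            continue  # homozygous parents are uninformative and are omitted
        t[i] += count  # allele i was transmitted
        c[j] += count  # allele j served as the untransmitted control
    alleles = sorted(set(t) | set(c))
    return {a: t[a] for a in alleles}, {a: c[a] for a in alleles}


def tdt_k(n):
    """TDT_k = sum_i (t_i - c_i)^2 / (t_i + c_i) over alleles carried by heterozygous parents."""
    t, c = marginal_counts(n)
    return sum((t[a] - c[a]) ** 2 / (t[a] + c[a]) for a in t if t[a] + c[a] > 0)


# Invented three-allele example.
counts = {(1, 2): 8, (2, 1): 4, (1, 3): 6, (3, 1): 2, (2, 3): 5, (3, 2): 5}
t, c = marginal_counts(counts)
print(t, c, tdt_k(counts))
```

Applied to a biallelic table, the same function returns twice the McNemar statistic, since $t_1 = c_2$ and $t_2 = c_1$ in that case.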
The ith term of the summation defining $\mathrm{TDT}_k$ is the squared standardized residual of $t_i$. For biallelic markers, $\mathrm{TDT}_2$ reduces to twice the original TDT and is approximately twice a $\chi^2_1$ variable. Although the sum of k independent $\chi^2_1$ terms is approximately $\chi^2_k$, the terms here are not independent. Consequently, the distribution of $\mathrm{TDT}_k$ for general k depends upon the covariances of the cell counts.

Spielman and Ewens [8] suggest the null distribution of $\frac{k-1}{k}\mathrm{TDT}_k$ is approximately $\chi^2_{k-1}$ in some circumstances. However, the following example of a stratified population due to Schaid [9] argues against this approximation. Suppose that one stratum carries alleles 1 and 2 at the marker locus, and the other carries alleles 3 and 4. Mating occurs only between members of the same stratum. Thus, the $\mathrm{TDT}_4$ statistic reduces to

$$\mathrm{TDT}_4 = 2\,\frac{(n_{1/2 \to 1} - n_{1/2 \to 2})^2}{n_{1/2 \to 1} + n_{1/2 \to 2}} + 2\,\frac{(n_{3/4 \to 3} - n_{3/4 \to 4})^2}{n_{3/4 \to 3} + n_{3/4 \to 4}}$$

and consequently approximates two times a $\chi^2_2$ random variable. It follows that $\frac{4-1}{4}\mathrm{TDT}_4 \approx \frac{3}{2}\chi^2_2$, which has expectation 3 and variance 9. In contrast, a $\chi^2_3$ random variable has expectation 3 and variance 6. The reduced variance of the $\chi^2_3$ approximation in this case leads to a high frequency of false-positive results. For example, a nominal 5% test rejects when the statistic exceeds the $\chi^2_3$ critical value of about 7.81, yet $\Pr(\frac{3}{2}\chi^2_2 > 7.81) = e^{-7.81/3} \approx 0.074$. Of course, less extreme patterns of population association might be more compatible with the $\chi^2_3$ approximation.

Using the Monte Carlo sampling scheme described in Section 4.1, it is possible to estimate not only the distribution function of $\mathrm{TDT}_k$ but also the distribution functions of other statistics designed to be powerful against particular alternatives. For example, if we expect only a single unknown allele to be associated with the disease, we can use

$$\mathrm{TDT}^{a}_{\max} = \max_i \frac{(t_i - c_i)^2}{t_i + c_i}$$

as proposed by Morris et al. [6]. When the association of the allele with the disease is expected to be in a particular direction, the sign of the standardized residual can be retained as in

$$\mathrm{TDT}^{b}_{\max} = \max_i \frac{t_i - c_i}{\sqrt{t_i + c_i}}.$$

Testing the maximum deviation can be more powerful than testing the deviation of each allele separately and applying the Bonferroni correction for multiple comparisons.

Unfortunately, when some alleles are carried by only a few parents in a sample, TDT statistics for multiple alleles can behave poorly. For example, when $t_i + c_i = 4$, the null probability that $\frac{(t_i - c_i)^2}{t_i + c_i}$ attains its maximum value of 4 is 0.125. Due to the discreteness of the statistic, such alleles can never provide strong evidence of transmission disequilibrium. By contrast, when $t_i + c_i$ is large, the normal approximation says that $\frac{(t_i - c_i)^2}{t_i + c_i} \geq 3.84$ occurs with null probability 0.05. Thus, the power of the multiple allele TDT can be adversely affected by the presence of rare alleles that mask the evidence of common alleles.

An alternative is to compute a p-value for each allele at a locus under its null binomial distribution, and to use the minimum such p-value as a test statistic. Suppose that allele i is observed to have $T_i = t_i$. Let the p-value of allele i be $p_i(t_i)$. If the association is anticipated to be in a positive direction, then

$$p_i(t_i) = \Pr(T_i \geq t_i).$$
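The maximum-type statistics and the one-sided per-allele p-values can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, `t` and `c` are assumed to be dictionaries keyed by allele with identical keys, and the counts in the usage example are invented.

```python
from math import comb, sqrt


def tdt_max_a(t, c):
    """max_i (t_i - c_i)^2 / (t_i + c_i), the two-sided maximum statistic."""
    return max((t[i] - c[i]) ** 2 / (t[i] + c[i]) for i in t if t[i] + c[i] > 0)


def tdt_max_b(t, c):
    """max_i (t_i - c_i) / sqrt(t_i + c_i), which retains the sign of the residual."""
    return max((t[i] - c[i]) / sqrt(t[i] + c[i]) for i in t if t[i] + c[i] > 0)


def upper_tail_pvalue(ti, ci):
    """One-sided p-value Pr(T_i >= t_i) for T_i ~ Binomial(t_i + c_i, 1/2)."""
    n = ti + ci
    return sum(comb(n, k) for k in range(ti, n + 1)) / 2 ** n


def min_pvalue(t, c):
    """Smallest one-sided p-value over the alleles at a locus."""
    return min(upper_tail_pvalue(t[i], c[i]) for i in t)


# Invented marginal counts for three alleles.
t = {1: 14, 2: 9, 3: 7}
c = {1: 6, 2: 13, 3: 11}
print(tdt_max_a(t, c), tdt_max_b(t, c), min_pvalue(t, c))

# Discreteness with four informative transmissions: the extreme outcomes
# t_i = 0 and t_i = 4 each have null probability 1/16.
print(upper_tail_pvalue(4, 0) + (1 - upper_tail_pvalue(1, 3)))
```

The last line reproduces the null probability of 0.125 quoted above for an allele with $t_i + c_i = 4$, which is why such an allele cannot reach conventional significance on its own.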
Analogous to $\mathrm{TDT}^{b}_{\max}$, the statistic $\min_i p_i(T_i)$ reflects the greatest evidence against transmission equilibrium of any allele at the locus. A similar statistic, analogous to $\mathrm{TDT}^{a}_{\max}$, can be constructed based on two-sided p-values. Lastly, if $p_i(t_i)$ is a two-sided p-value, the statistic $\sum_i \ln p_i(t_i)$ is analogous to $\mathrm{TDT}_k$ and reflects the evidence for disequilibrium aggregated across alleles. Section 4 contains a more rigorous discussion of multiple p-values in the context of TDT tests at multiple marker loci.

3.2. Unaffected and Affected Siblings
Unaffected individuals also contain information about transmission disequilibrium. Based on the heterozygous parents of affected children, let $t^a_i$ (rather than $t_i$) be the number of transmitted alleles of type i, and let $c^a_i$ (rather than $c_i$) be the number of untransmitted alleles of type i. Define $t^u_i$ and $c^u_i$ similarly for the unaffected children. Under the null hypothesis, $t^a_i$ and $t^u_i$ are independent binomial variables. When transmission disequilibrium holds, the expected deviation of $t^u_i$ from its null expectation $\frac{t^u_i + c^u_i}{2}$ must be in the opposite direction from the expected deviation of $t^a_i$ from its null expectation $\frac{t^a_i + c^a_i}{2}$. This suggests using statistics based on the standardized residual

$$\frac{t^a_i - t^u_i - E(t^a_i - t^u_i)}{\sqrt{\mathrm{Var}(t^a_i - t^u_i)}} = \frac{t^a_i - t^u_i - \left(\frac{t^a_i + c^a_i}{2} - \frac{t^u_i + c^u_i}{2}\right)}{\sqrt{\frac{t^a_i + c^a_i}{4} + \frac{t^u_i + c^u_i}{4}}} = \frac{t^a_i + c^u_i - (t^u_i + c^a_i)}{\sqrt{t^a_i + c^u_i + t^u_i + c^a_i}}$$
for the contrast $t^a_i - t^u_i$. In words, we pool data from unaffected children with data from affected children, reversing the roles of the transmitted and untransmitted alleles for the unaffected children. If we redefine $t_i = t^a_i + c^u_i$ and $c_i = c^a_i + t^u_i$, then we can apply the statistics described in Section 3.1.

In practice, we suggest using only those unaffected children that are likely to provide the strongest contrast with the affected children. Ordinarily, these would be the unaffected siblings of affected children, particularly those unaffected siblings beyond the age of onset for the disease in question. If penetrance is low, it is probably safest to exclude unaffected children and concentrate solely on affected children.

It is important to bear in mind that transmission equilibrium can only hold for multiple siblings if there is no linkage between the marker locus and a disease locus. In the presence of linkage, transmission disequilibrium as defined in Section 2 appears as non-independent transmission to the various children. Thus, including more than one child per family makes the TDT a test of linkage. Allele-sharing statistics are likely to have greater power to detect linkage than TDT statistics if the marker is closely linked to a disease gene and there are many multiplex families in the sample. In contrast, TDT statistics are likely to have greater power if there is strong linkage disequilibrium and many simplex families in the sample.

One can design a multiple-sibling permutation test for linkage disequilibrium that is insensitive to linkage. To achieve this goal, only two possible permutations are permitted per sibship. The first permutation corresponds to the observed marker types in the family. The second equally likely permutation reassigns to each child the allele not transmitted to that child. Since both parental genotypes are known, this second permutation has
no impact on allele-sharing statistics. Inclusion of the second permutation obviously replicates the TDT permutation procedure in simplex families.

3.3. Genotype Statistics
Some authors [1, 10] have proposed the use of statistics based on genotypes as an alternative to those based on alleles. Such statistics contrast the genotype transmitted to a child with a hypothetical control genotype composed of the untransmitted alleles from each parent. For example, in a parent-child trio, suppose the mother’s genotype is 1/2, the father’s genotype is 3/4, and the child’s genotype is 1/3. The child’s genotype contributes to a count statistic $t_{1/3}$ for the transmitted genotype 1/3. The control genotype is 2/4. If we take this point of view and condition inference upon the pair consisting of the observed genotype and the control genotype, then the counts of transmitted genotypes are binomial and fall within the original TDT framework. Goodness-of-fit statistics can be constructed from the transmission counts $t_{i/j}$ as suggested in Section 3.1.

We prefer to condition on parental genotypes. Under the null hypothesis, the three Mendelian transmission assumptions discussed in Section 2.2 imply that the child’s genotype is a multinomial variable with four equiprobable but possibly nondistinct outcomes. In our simple example, the control genotypes now include 1/4 and 2/3 in addition to 2/4. When a child and both parents share the same genotype, say 1/2, the trio provides no information under the binomial distribution because the control genotype is the same as the transmitted genotype. However, such children contribute useful information under the multinomial distribution because the homozygous genotypes 1/1 and 2/2 are valid control genotypes. We omit parent-child trios where both parents are homozygous.
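The multinomial null distribution of a child's genotype given both parental genotypes is easy to enumerate. The sketch below is one illustrative way to do so, with hypothetical function names; it also accumulates the null mean and variance of a genotype count $t_{i/j}$ across trios, under the additional assumption (made here for illustration) that the trios contribute independent indicator variables.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def child_genotype_distribution(mother, father):
    """Null distribution of the child's unordered genotype: four equiprobable
    (possibly nondistinct) combinations of one maternal and one paternal allele."""
    outcomes = Counter(tuple(sorted((m, f))) for m, f in product(mother, father))
    return {genotype: Fraction(n, 4) for genotype, n in outcomes.items()}


def null_mean_variance(trios, genotype):
    """Null mean and variance of the count of children with the given genotype,
    summing Bernoulli means p and variances p(1 - p) over trios assumed independent."""
    mean = Fraction(0)
    variance = Fraction(0)
    for mother, father in trios:
        p = child_genotype_distribution(mother, father).get(genotype, Fraction(0))
        mean += p
        variance += p * (1 - p)
    return mean, variance


# The example from the text: mother 1/2, father 3/4.
print(child_genotype_distribution((1, 2), (3, 4)))
# Both parents 1/2: the homozygous genotypes 1/1 and 2/2 are valid control genotypes.
print(child_genotype_distribution((1, 2), (1, 2)))
```

A standardized residual for each genotype count then follows by subtracting the null mean from the observed count and dividing by the square root of the null variance.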
Considering each parent-child trio in turn, one can clearly calculate the mean and variance and hence the standardized residual of each transmission count $t_{i/j}$. Details are left to the reader.

3.4. Choosing a Test Statistic
Other statistics [3, 5], including ones that do not condition on parental genotypes [11, 12] and ones that are derived from explicit models of inheritance [9, 13], can be evaluated under the permutation distribution. The value of model-based statistics lies in their power against specific alternative hypotheses. If the permutation distribution is used to evaluate all test statistics, then the conditional inference framework guarantees that the resulting p-values are exact except for Monte Carlo error. In particular, evaluating p-values by permutation allays our concerns about the large-sample approximations made in coming to grips with unconditional distributions.

Freed from the worry of properly evaluating p-values, our attention can turn to issues of power. Although it is pointless to search for a uniformly most powerful statistic within such a large space of alternative hypotheses, some rules of thumb can be stated. Based on simulation studies, Kaplan et al. [5] suggest that statistics obtained from large, sparse contingency tables are usually less powerful than those based on small tables. For example, they show that the complete-symmetry statistic that explicitly compares each pair of counts $n_{i/j \to i}$ and $n_{i/j \to j}$ [3] is less powerful than statistics based on the collapsed marginal counts in table 2. Cleves et al. [4] agree with this general conclusion even though they successfully construct a counterexample favoring the complete-symmetry test. In the simulations of Kaplan et al. [5], three statistics based on the marginal table all have roughly equivalent power. These statistics are $\mathrm{TDT}_k$, a similar statistic including homozygous parents, and a
likelihood ratio statistic. The moral from these simulations is that the choice of statistic is perhaps less important than the set of counts on which the statistic is based. It should be noted that the permutation procedures used in these papers differ from ours. Kaplan et al. [5] use a smaller, but also valid sample space. Cleves et al. [4] use a Markov chain Monte Carlo procedure.

Table size also comes into play with genotype-based statistics since distinct genotypes always outnumber distinct alleles. For multiallelic markers, the simulations cited above agree that genotypic test statistics are usually less powerful than allelic test statistics, especially as the number of alleles grows. One of the primary virtues of our treatment of multiple siblings is that table size is not increased by pooling unaffected with affected siblings. However, as Cleves et al. [4] point out, the use of multiple siblings introduces other issues related to power. In particular, ascertaining on the basis of multiple affected individuals decreases the chance that cases are sporadic. Finally, as we have already mentioned, the relative power of allele sharing and TDT-like statistics depends not only on sibship sizes but also on the strength of linkage versus linkage disequilibrium.

3.5. Multiple Loci
Often, it is desirable to apply some variant of the TDT to haplotype data from many markers. Since multiple tests are being conducted simultaneously, it is likely that a number of false positives will occur by chance alone unless an appropriate correction is made. When each locus is tested at some nominal significance level, the overall probability of incorrectly declaring transmission disequilibrium at one or more loci will be greater than the nominal rate.

The conditional inference framework provides a solution to this multiple testing problem that is less conservative than the standard Bonferroni correction for non-independent tests. The general idea is to exploit the fact that the test statistics are coupled (coexist) on a common probability space [14]. On this space, the marginal distribution of each test statistic is preserved. The joint distribution of the test statistics can be used to correct p-values and to achieve the desired overall significance level.

Let H now represent the marker haplotype transmitted to a child by one of his or her parents. As in the single-locus setting, H splits off from the parent’s observed multilocus genotype G an untransmitted haplotype. Let $G \setminus H$ denote both this untransmitted haplotype and the event that it is transmitted to the child. Because of the possibility of recombination, the haplotype pair H and $G \setminus H$ can differ from the haplotypes actually present in the parent. However, it is true that $\Pr(H \mid G) = \Pr(G \setminus H \mid G)$. If disease transmission is independent of marker transmission at all marker loci, then $\Pr(H \mid A \cap G) = \Pr(H \mid G)$, just as in the single-locus setting.

Consequently, we now condition not only on the parental genotype but also on the complementary haplotype pair determined by the haplotype transmitted to the child. This further conditioning does not change the marginal distribution of any single-locus test statistic, but it does eliminate nuisance parameters connected with recombination. Furthermore, it ensures that the two parental haplotypes transmitted to the child are independent and that the haplotypes transmitted by a parent to different children are independent.

4. Multiple P-Values
In this section we review p-values and describe appropriate corrections when simultaneous TDT tests are conducted on haplotype data. After demonstrating some helpful properties of the resulting adjusted multiple p-values, we discuss the interpretation of the adjustments in some typical genetics problems.

Suppose a TDT test is conducted at marker locus i with result $T_i = t_i$ for one of the statistics $T_i$ described above. To be specific in the following discussion, we assume that large values of $T_i$ favor transmission disequilibrium. Let the p-value of the observed value $t_i$ at locus i be

$$p_i(t_i) = \Pr(T_i \geq t_i \mid H_{0i}),$$
where the probability is calculated under the null hypothesis $H_0^i$ of transmission equilibrium between locus $i$ and the disease. As a function of the observed value $t_i$ of the random variable $T_i$, the p-value $p_i(T_i)$ is itself a random variable. p-values possess a straightforward relationship to significance levels and hypothesis testing; namely, for any preselected significance level $\alpha$, the null hypothesis $H_0^i$ is rejected when $p_i(t_i) \le \alpha$ and accepted when $p_i(t_i) > \alpha$.

In conducting multiple tests, we are faced with the problem of adjusting the marginal p-values of the separate tests to reflect the fact that one or more of the marginal p-values can appear strikingly low by chance alone. A major virtue of p-values is that they put the evidence from different tests on an equal footing. If a statistic $T$ is continuously distributed, then its p-value $p(T)$ is uniformly distributed on $[0,1]$ under the null hypothesis. This suggests that multiple tests be compared through their p-values. Our strategy is to adjust the marginal p-value $p_i(t_i)$ at locus $i$ according to the probability that a p-value at any locus is at least as small by chance alone. This defines a set of adjusted p-values, each of which describes the evidence against transmission equilibrium at its respective locus within the context of multiple tests.
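To make the link between p-values and significance levels concrete, the following minimal Python sketch treats a deliberately simple case. It assumes a biallelic marker with n heterozygous parents, so that under the null hypothesis the number T of transmissions of a given allele is Binomial(n, 1/2), and it uses T itself as a one-sided statistic. This setup and the function names are illustrative assumptions, not the statistics used in this paper. The sketch shows that the p-value of a discrete statistic is a discrete random variable and that the level-α test can have actual size below α.

```python
from math import comb


def null_pmf(n):
    """Null distribution of T ~ Binomial(n, 1/2), as a list indexed by t."""
    return [comb(n, t) / 2 ** n for t in range(n + 1)]


def p_value(t, n):
    """One-sided p-value Pr(T >= t | H0); large T favors disequilibrium."""
    return sum(null_pmf(n)[t:])


def rejection_probability(alpha, n):
    """Pr[p(T) <= alpha | H0], the actual size of the level-alpha test."""
    pmf = null_pmf(n)
    return sum(pmf[t] for t in range(n + 1) if p_value(t, n) <= alpha)


if __name__ == "__main__":
    n = 10  # hypothetical number of heterozygous parents
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, rejection_probability(alpha, n))
```

With n = 10 and α = 0.05, the test rejects only when T ≥ 9, so its actual size is 11/1024, or about 0.011. This is the kind of discreteness that motivates working with exact or simulated null distributions rather than continuous approximations.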
Let $H_0$ denote the combined null hypothesis that the separate null hypotheses $H_0^1, \ldots, H_0^n$ at $n$ different loci are simultaneously true. Corresponding to the p-value $p_i(t_i) = p$ at locus $i$, we define the adjusted p-value

$$\tilde p(p) = \Pr\Big[\min_{1 \le j \le n} p_j(T_j) \le p \;\Big|\; H_0\Big]. \tag{5}$$

This is just the distribution function of the statistic $\min_{1 \le j \le n} p_j(T_j)$ under $H_0$ evaluated at the point $p$. The adjusted p-value $\tilde p[p_i(t_i)]$ depends on the observations at locus $i$ but not on the observations at other loci. Unlike sequential multiple testing strategies that would base inferences at one locus on data previously observed at other loci, our adjusted p-values provide a standard comparison across all loci. Each adjusted p-value measures the evidence for transmission disequilibrium at a particular locus on a common scale.

Hypothesis tests at a preselected overall significance level $\alpha$ can be constructed as follows: For each locus $i$ such that $\tilde p[p_i(t_i)] > \alpha$, the null hypothesis $H_0^i$ of transmission equilibrium at locus $i$ is accepted; for each locus such that $\tilde p[p_i(t_i)] \le \alpha$, the null hypothesis $H_0^i$ is rejected. Let $S$ be the set of loci in transmission equilibrium. A type I error occurs whenever transmission equilibrium is rejected for one or more loci in $S$. The adjusted p-values yield a set of tests for which the probability of a type I error is less than or equal to $\alpha$, no matter what loci constitute the null set $S$. In fact, the type I error probability satisfies

$$\begin{aligned}
\Pr\Big\{\min_{j \in S} \tilde p[p_j(T_j)] \le \alpha \;\Big|\; H_0^j \text{ for } j \in S\Big\}
&= \Pr\Big\{\min_{j \in S} \tilde p[p_j(T_j)] \le \alpha \;\Big|\; H_0\Big\} \\
&\le \Pr\Big\{\tilde p\Big[\min_{1 \le j \le n} p_j(T_j)\Big] \le \alpha \;\Big|\; H_0\Big\} \\
&\le \alpha,
\end{aligned}$$

where the first inequality follows from the monotonicity of $\tilde p$. The second inequality is an application to $\min_{1 \le j \le n} p_j(T_j)$ of the well-known inequality $\Pr[F(X) \le \alpha] \le \alpha$ satisfied by any random variable $X$ with distribution function $F(x)$ [15]. Using the similar inequality $\Pr[p_j(T_j) \le \alpha] \le \alpha$, it follows from definition (5) that

$$\tilde p[p_i(t_i)] \le \sum_{j=1}^{n} \Pr[p_j(T_j) \le p_i(t_i)] \le \sum_{j=1}^{n} p_i(t_i) = n\,p_i(t_i). \tag{6}$$

In the other direction

$$\tilde p[p_i(t_i)] \ge \Pr[p_i(T_i) \le p_i(t_i)] = p_i(t_i) \tag{7}$$

is also clear from (5) and the analog of $\Pr[F(X) \le F(x)] = F(x)$ for p-values [15]. Thus, the adjusted p-values $\tilde p[p_i(t_i)]$ are bounded below by the marginal p-values $p_i(t_i)$ and above by the usual Bonferroni correction $n\,p_i(t_i)$ for the dependent test statistics $T_1 = t_1, \ldots, T_n = t_n$. Consequently, the statistic $\tilde p[p_i(T_i)]$ always rejects $H_0^i$ whenever the Bonferroni statistic $n\,p_i(T_i)$ rejects $H_0^i$.

In some cases, the p-value $\tilde p[p_i(t_i)]$ can be substantially less than the Bonferroni correction $n\,p_i(t_i)$. For example, suppose that we conduct tests at two completely linked loci, and the test at the second locus replicates that at the first locus. Then $p_1(t_1) = p_2(t_2)$ and the Bonferroni correction at each locus $i$ is double $p_i(t_i)$. However, the adjusted p-values attached to both $T_1$ and $T_2$ coincide with the original unadjusted p-values. Said slightly differently, the statistic $\tilde p[p_i(T_i)]$ is better than the Bonferroni statistic $n\,p_i(T_i)$ in the sense of having more power to reject the null hypothesis while still maintaining the overall type I error rate at the nominal significance level.

As an aside, an alternative to $\tilde p[p_i(t_i)]$ is to use

$$\Pr\Big(\max_{1 \le j \le n} T_j \ge t_i \;\Big|\; H_0\Big) \tag{8}$$
as an adjusted p-value. This second approach combines test statistics directly without reference to their marginal p-values and is easier to evaluate. However, this approach fails to provide a standard scale for the evidence at various loci and can yield corrected p-values that are even greater than those given by the Bonferroni approach. As an example, consider two independent tests with statistics $T_1$ and $T_2$ uniformly distributed on the $(0,1)$ and $(0,2)$ intervals, respectively. For $0 \le t \le 1$, $p_1(t) = 1 - t$ and $p_2(t) = 1 - \tfrac{1}{2}t$. Transmission equilibrium can never be rejected for the first locus using correction (8) because it is bounded by

$$\Pr\Big(\max_{1 \le j \le n} T_j \ge t_1 \;\Big|\; H_0\Big) = 1 - [1 - p_1(t_1)][1 - p_2(t_1)] = 1 - \frac{t_1^2}{2} \ge \frac{1}{2}.$$
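The two-uniform example can be checked numerically. The Python sketch below evaluates correction (8) at the first locus and compares it with the adjusted p-value of definition (5). For two independent tests with continuously distributed statistics, definition (5) reduces to $1 - (1 - p)^2$, since the two p-values are independent and uniform. The function names, the seed, and the choice $t_1 = 0.99$ are illustrative assumptions.

```python
import random


def p1(t):
    """Marginal p-value of T1 ~ Uniform(0, 1): Pr(T1 >= t) for 0 <= t <= 1."""
    return 1.0 - t


def p2(t):
    """Marginal p-value of T2 ~ Uniform(0, 2): Pr(T2 >= t) for 0 <= t <= 2."""
    return 1.0 - t / 2.0


def max_stat_correction(t1):
    """Correction (8) at locus 1: Pr(max(T1, T2) >= t1) for independent tests."""
    return 1.0 - (1.0 - p1(t1)) * (1.0 - p2(t1))


def min_p_adjustment(p):
    """Definition (5) for two independent continuous tests: Pr(min p_j <= p)."""
    return 1.0 - (1.0 - p) ** 2


def simulate(t1, m=200_000, seed=1):
    """Monte Carlo estimates of correction (8) and of the min-p adjustment."""
    rng = random.Random(seed)
    hits_max = hits_min_p = 0
    for _ in range(m):
        x1, x2 = rng.uniform(0.0, 1.0), rng.uniform(0.0, 2.0)
        hits_max += max(x1, x2) >= t1
        hits_min_p += min(p1(x1), p2(x2)) <= p1(t1)
    return hits_max / m, hits_min_p / m


if __name__ == "__main__":
    t1 = 0.99  # strong evidence at locus 1: marginal p-value 0.01
    print("correction (8):", max_stat_correction(t1))
    print("min-p adjusted:", min_p_adjustment(p1(t1)))
    print("Bonferroni    :", 2 * p1(t1))
    print("simulated     :", simulate(t1))
```

At $t_1 = 0.99$ the marginal p-value is 0.01, the min-p adjustment is 0.0199, the Bonferroni value is 0.02, and correction (8) is about 0.51.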
In this example, correcting for the test at the second locus hides the evidence from the first locus because we combine the statistics without first using the marginal p-values to place them on a common scale.

The multiple p-values $\tilde p[p_i(t_i)]$ are particularly suited to a genome scan or tests at several candidate loci because they yield a separate, comparable measure of the evidence for transmission disequilibrium at each locus. Suppose that of three tested loci only the third is truly in transmission disequilibrium. Then rejecting the null hypothesis for either or both of the first two loci is a type I error. If we accept the null hypothesis for loci at which $\tilde p[p_i(t_i)] > 0.01$, the adjusted p-values guarantee that at least 99% of the time the null hypothesis will be accepted at the first two loci. In fact, the probability of no type I error is likely to be slightly better because the adjusted p-value unnecessarily protects against rejecting the null hypothesis at the third locus, just in case it is also in transmission equilibrium. When the null hypothesis is true at all
loci, the multiple p-values correspond to exact nonconservative statements of type I error probabilities. It is important to remember that p-values in general measure evidence, not disequilibrium. If one locus has a lower p-value than another, it may simply reflect greater heterozygosity at that locus among the parents in the sample.

Sometimes, it is useful to reach a single decision concerning a set of loci. For example, we may test transmission equilibrium at several closely linked loci in order to reach a general decision about that region. Obviously, moderate disequilibrium at several neighboring loci is more suggestive of a nearby causal gene than moderate disequilibrium at several scattered loci, which might simply be due to random variation. Simple multilocus statistics can be obtained by treating multiple markers as a single locus, which will then have a higher heterozygosity than any constituent locus. In other settings, more sophisticated multilocus statistics may be appropriate. The null hypothesis corresponding to a multilocus statistic is that all constituent loci are in transmission equilibrium. Rejecting the null hypothesis indicates transmission disequilibrium at one or more such loci. The permutation approach is valid for evaluating multilocus test statistics. In fact, the statistics can even depend on overlapping sets of loci since the procedure automatically adjusts for the dependence structure of the tests. Of course, unnecessarily conducting many tests is to be avoided as it makes the multiple-testing adjustment stricter and consequently reduces the power of any one test.

Linkage disequilibrium is also used for fine-mapping of disease genes once linkage within a region has been established. Because the TDT conditions out relevant information contained in the parental genotype distribution, we prefer other methods for fine mapping. While the TDT controls against spurious associations, these are less of an issue once linkage has been established. Furthermore, permutations conducted under the null hypothesis are inappropriate for the analysis of transmission disequilibrium when the null hypothesis is rejected. As an alternative to the TDT, see Lazzeroni [16] for a gene localization method that uses haplotype data.

4.1. Monte Carlo Simulation

We now outline the Monte Carlo estimation of p-values under the combined null hypothesis $H_0$ of transmission equilibrium at all marker loci. Prior to the commencement of the Monte Carlo simulation, each child is assigned a maternal haplotype and a paternal haplotype. For the moment, we assume that this can be done unambiguously. As mentioned earlier, loci at which the parent is homozygous contain no useful information and should be treated as missing in both the observed and complementary haplotypes.

If there are $c$ children in the sample, then there are $2c$ parent-child pairs. Each pair $r$ involves a parental multilocus genotype $g_r$ that is partitioned into the child's observed haplotype $h_r$ and the complementary untransmitted haplotype $g_r \setminus h_r$. Let $X_{obs}$ collectively denote the observed genotypes of the $c$ children at all marker loci. We view $X_{obs}$ as one possible outcome from the random experiment that assigns to each child one of the two available maternal haplotypes and one of the two available paternal haplotypes. We can create an independent replicate $X$ of $X_{obs}$ by uniformly resampling for each pair $r$ the haplotype $h_r$ or $g_r \setminus h_r$ to be transmitted to the child. Because we condition on the complementary haplotype pair derived from the haplotype transmitted to the child, we eliminate all effects of recombination from the resulting Monte Carlo distribution. Once this permutation procedure is done, we construct from $X$ the count $t_{ij}$ of the number of times allele $j$ is
transmitted at locus $i$. From the counts for the various alleles, we recompute the value $T_i$ of our test statistic at each locus $i$.

Creating $m$ independent replicates of $X_{obs}$ produces $m$ independent sets of test statistics. Replicate $k$ yields the $n$ correlated statistics $T_{k1}, \ldots, T_{kn}$ for the $n$ different loci. At locus $i$, the $m$ independent statistics $T_{1i}, \ldots, T_{mi}$ provide the empirical estimate $\frac{1}{m} \sum_{k=1}^{m} 1\{T_{ki} \ge t\}$ of the p-value function $p_i(t)$ of test statistic $T_i$. This yields an estimate of the marginal p-value $p_i(t_i)$ for the observed data $X_{obs}$ and estimates of the marginal p-values for the replicates as well. Once we estimate all relevant marginal p-values empirically, we can calculate $\min_{1 \le j \le n} p_j(T_{kj})$ for each replicate $k$. We then empirically estimate the adjusted p-values $\tilde p[p_i(t_i)]$ for each locus $i$ of $X_{obs}$ by

$$\tilde p[p_i(t_i)] = \frac{1}{m} \sum_{k=1}^{m} 1\Big\{\min_{1 \le j \le n} p_j(T_{kj}) \le p_i(t_i)\Big\}.$$
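The estimator above can be sketched in a few lines of Python. The sketch assumes the replicate statistics have already been computed and stored as a list of m rows with n entries each; the function names and the sorting-based implementation are our own illustrative choices, not part of the original procedure. Marginal p-values are estimated as upper-tail fractions among the replicates, exactly as in the text, and the adjusted p-value at each locus is the fraction of replicates whose smallest marginal p-value is no larger than the observed marginal p-value.

```python
from bisect import bisect_left, bisect_right


def upper_tail_fraction(sorted_values, t):
    """Fraction of values >= t, given the values sorted in increasing order."""
    return (len(sorted_values) - bisect_left(sorted_values, t)) / len(sorted_values)


def adjusted_p_values(observed, replicates):
    """Estimate marginal and min-p adjusted p-values.

    observed:   list of n observed statistics t_1..t_n (large values are extreme).
    replicates: list of m rows; replicates[k][i] is the statistic T_ki.
    Returns (marginal, adjusted), two lists of length n.
    """
    m, n = len(replicates), len(observed)
    # Sorted null replicates of each locus, used for every marginal p-value.
    columns = [sorted(row[i] for row in replicates) for i in range(n)]
    marginal = [upper_tail_fraction(columns[i], observed[i]) for i in range(n)]
    # Smallest estimated marginal p-value within each replicate.
    min_p = sorted(
        min(upper_tail_fraction(columns[j], row[j]) for j in range(n))
        for row in replicates
    )
    # Fraction of replicates whose minimum p-value is <= the observed p-value.
    adjusted = [bisect_right(min_p, p) / m for p in marginal]
    return marginal, adjusted


if __name__ == "__main__":
    # Two hypothetical, completely linked loci: identical statistics per replicate.
    linked = [[x, x] for x in ((37 * k) % 211 for k in range(200))]
    print(adjusted_p_values([180, 180], linked))
```

In the completely linked case of the usage example, the adjusted p-values equal the marginal ones, whereas a Bonferroni correction would double them. By construction, the estimates also satisfy the empirical analogs of bounds (6) and (7).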
If the child and both parents share the same heterozygous genotype at a locus, then it is impossible to distinguish the maternal and paternal alleles transmitted to the child. When such ambiguity occurs, we randomly assign phase in the child at that locus. Monte Carlo simulation is then used to evaluate the marginal and multiple p-values given the entire configuration of known and randomly assigned phases. Random phase assignment does not change marginal results at any single locus, but it may slightly affect the correlation of test statistics from linked loci and their resulting multiple p-values. To assess the impact of random phase assignment on multiple p-values within a given data set, the entire simulation can be rerun several times with independently assigned phases. The most conservative multiple p-values encountered, which naturally satisfy inequality (6), can then be taken as reasonable upper bounds on the true multiple p-values.
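Returning to the resampling step that generates each replicate in this section, a minimal Python sketch is given below. It assumes that phase has already been assigned, that each parent-child pair is stored as a tuple holding the transmitted haplotype and its complement, and that loci at which the parent is homozygous are coded as `None` in both haplotypes. The data layout and function names are hypothetical. Whole haplotypes are resampled as units, which is what preserves the dependence among linked loci.

```python
import random
from collections import Counter


def resample_transmissions(pairs, rng):
    """Draw one replicate under the null hypothesis.

    pairs: list of (h_r, complement_r); each haplotype is a tuple of alleles,
           one per locus, with None where the parent is homozygous (missing).
    For every pair, either haplotype is transmitted with probability 1/2.
    """
    return [pair[rng.randrange(2)] for pair in pairs]


def allele_counts(transmitted, n_loci):
    """counts[i][j] = number of times allele j is transmitted at locus i."""
    counts = [Counter() for _ in range(n_loci)]
    for haplotype in transmitted:
        for i, allele in enumerate(haplotype):
            if allele is not None:  # skip uninformative (homozygous) loci
                counts[i][allele] += 1
    return counts


if __name__ == "__main__":
    # Three hypothetical parent-child pairs typed at two loci.
    pairs = [((1, 2), (2, 1)), ((1, None), (3, None)), ((2, 1), (1, 3))]
    rng = random.Random(0)
    replicate = resample_transmissions(pairs, rng)
    print(replicate)
    print(allele_counts(replicate, n_loci=2))
```

The per-locus counts returned by `allele_counts` are the inputs from which a test statistic would be recomputed for each replicate. To gauge the effect of random phase assignment, the same routine could be rerun on several independently phased versions of the data.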
5. Discussion

Most permutation tests have the advantages of simplicity, robustness, and practicality. In particular, they avoid dubious large-sample approximations by using Monte Carlo simulation to approximate p-values. In the TDT setting, the natural permutation procedures condition on parental genotypes. This action guards against falsely declaring transmission disequilibrium when marker-disease associations arise from ethnic stratification. Conditioning on parental genotypes also eliminates nuisance parameters such as allele frequencies.

In the current paper, we have described permutation generalizations of the TDT to multiple alleles, multiple markers, unaffected siblings, and genotypic association rather than allelic association. These extensions broaden the scope and power of the TDT without sacrificing its advantages. Extending the TDT to multiple alleles removes the most annoying restriction of the original TDT. It is clearly wasteful of genetic data to lump the alleles of a multiallelic marker to reduce it to a biallelic marker.

In extending the TDT to multiple loci, we have suggested a method that performs better than the standard Bonferroni correction for multiple tests. This method, which depends on the joint null distribution of the various test statistics, can in principle be applied to a large number of marker loci, regardless of whether the loci are linked or unlinked. Westfall and Wolfinger [17] suggest similar methods of combining multiple tests and point out their superiority in handling the kind of discrete distributions that are encountered in genetic applications.

Our third extension of the TDT attempts to exploit the contrast between affected children and their unaffected siblings. This extension should be helpful in discerning the protective effects of certain alleles. Late age of
onset, low penetrance, and a large sporadic rate argue against using unaffected siblings.

Our fourth extension of the TDT to genotypic association rather than allelic association is primarily intended for diseases that act recessively. This extension generates larger and sparser tables than the allele-based TDT. Although p-values can still be approximated by Monte Carlo simulation, the underlying test statistics unfortunately become more discretely distributed.

Sham and Curtis [11] and Spielman and Ewens [8] discuss how biases can be introduced into the TDT by using data from one parent when the other parent is unavailable. Their rules about when such data are acceptable apply to our tests as well. If a parent provides valid data at only some loci, the remaining data for the parent should be entered as missing.

In closing, we would like to voice our concurrence with the conclusions of Risch and Merikangas [18] about the relative utility of association methods versus linkage methods in the positional cloning of genes for common diseases. As more human DNA sequence data accumulate and more human genes are classified by developmental stage of expression and tissues of expression, association methods will claim a larger role. It is important that association tests be conducted in a manner that eliminates spurious ethnic association. The TDT is designed to accomplish just this purpose. Anything that enhances its power is a welcome addition.
Acknowledgment

We thank Michael Boehnke and an anonymous referee for helpful suggestions for improving the clarity of this paper.
References

1 Terwilliger JD, Ott J: A haplotype-based 'haplotype relative risk' approach to detecting allelic associations. Hum Hered 1992;42:337–346.
2 Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506–516.
3 Bickeböller H, Clerget-Darpoux F: Statistical properties of the allelic and genotypic transmission/disequilibrium test for multiallelic markers. Genet Epidemiol 1995;12:865–870.
4 Cleves MA, Olson JM, Jacobs KB: Exact transmission/disequilibrium tests with multiallelic markers. Genet Epidemiol 1997;14:337–347.
5 Kaplan NL, Martin ER, Weir BS: Power studies for the transmission/disequilibrium tests with multiple alleles. Am J Hum Genet 1997;60:691–702.
6 Morris AP, Curnow RN, Whittaker JC: Randomization tests of disease-marker associations. Ann Hum Genet 1997;61:49–60.
7 Ewens WJ, Spielman RS: The transmission/disequilibrium test: History, subdivision, and admixture. Am J Hum Genet 1995;57:455–464.
8 Spielman RS, Ewens WJ: The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 1996;59:983–989.
9 Schaid DJ: General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol 1996;13:423–449.
10 Rubenstein P, Walker M, Carpenter C, Carrier C, Krassner J, Falk CT, Ginsburg F: Genetics of HLA disease associations. The use of the haplotype relative risk (HRR) and the 'haplo-delta' (Dh) estimates in juvenile diabetes from three racial groups. Hum Immunol 1981;3:384.
11 Sham PC, Curtis D: An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 1995;59:323–336.
12 Thomson G: Mapping disease genes: Family-based association studies. Am J Hum Genet 1995;57:487–498.
13 Schaid DJ, Sommer SS: Comparison of statistics for candidate-gene association studies using cases and parents. Am J Hum Genet 1993;55:402–409.
14 Lindvall T: Lectures on the Coupling Method. New York, Wiley, 1992.
15 Angus J: The probability integral transform and related results. SIAM Rev 1994;36:652–654.
16 Lazzeroni L: Linkage disequilibrium and gene mapping: An empirical least squares approach. Am J Hum Genet 1998, in press.
17 Westfall PH, Wolfinger RD: Multiple tests with discrete distributions. Am Stat 1997;51:3–8.
18 Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science 1996;273:1516–1517.