Update section. Short communication. 153. Repeated sequences as genetic markers in pooled tissue samples. R.P. Hellens, T.H.N. Ellis, D. Lee and L. Turner.
Plant Molecular Biology 22: 153-157, 1993. © 1993 Kluwer Academic Publishers. Printed in Belgium.
Repeated sequences as genetic markers in pooled tissue samples
R.P. Hellens, T.H.N. Ellis, D. Lee and L. Turner
John Innes Institute, Colney Lane, Norwich NR4 7UH, UK
Received 28 July 1992; accepted in revised form 20 January 1993
Key words: linkage, DNA mixtures, repeated sequences, pea, Pisum sativum
Abstract
We show, using the PDR1 element of pea, that dispersed repeated sequences of moderate copy number can be used simply and efficiently to generate markers linked to a trait of interest. Inspection of hybridization patterns of repeated sequences to DNA mixtures of pooled genotypes is a sensitive way of detecting such markers. The large number of bands in tracks of digests of these mixtures allows the simultaneous sampling of loci at many places in the genome, and the many unlinked loci serve as internal controls. It is also shown that intensity ratios calculated from these band differences can be used to give a rough estimate of linkage distance.
The use of molecular markers in genetic analysis has become commonplace. In many such analyses it is desirable to find a molecular marker which maps close to a trait of interest. The problem this presents is one of scale: it is necessary to scan many markers against many genotypes, or recombinant types, in order to obtain the required information. To simplify this task, strategies have been developed which limit the number of recombinant types assayed; these are the use of near-isogenic lines [7, 10] and of pools of recombinant types from existing mapping populations [3, 8]. The screening strategies employed have in turn been improved in efficiency by replacing hybridization assays with PCR-based markers. We have proposed that the use of dispersed repeated sequences as a source of molecular markers can combine the advantages of hybridization assays with a PCR approach [5], and a
similar PCR strategy has been proposed by Rogowsky et al. [9]. Here we suggest that the use of dispersed repeated sequences in the analysis of pooled recombinant types can be an efficient strategy for obtaining markers linked to a trait of interest, because many markers can be screened at the same time in hybridization assays. A pea RFLP map has been derived from analysis of the patterns of segregation of markers among recombinant inbred (RI) lines derived from the cross JI281 × JI399 [1]. Recombinant inbred progeny from a related cross, JI15 × JI399, have been used in further analysis; the F2 from which these RIs are derived has been described previously [6]. This material has been used to test how robust the pooling of samples by genotype could be for the assignment of linkage. This latter cross segregates for the wrinkled-seed marker rb [6]; consequently individual RI seeds
could be assigned to the RbRb or rbrb class before sowing. This was undertaken for the F9 generation, so we expect a small (1/512) misclassification of the RbRb class when the genotype is Rbrb. This error has been ignored. A single leaflet was taken from each of 38 rbrb and each of 38 RbRb plants from a recombinant inbred population of 95 RI lines; these leaves were combined according to their genotypic class. The sample size sets the minimum possible error on the estimate of the frequency of recombinant types. In general, the standard error of the estimate of any proportion (p) in a large population, of which n are sampled, cannot be less than $(pq/n)^{1/2}$, where q = 1 - p. DNA was prepared from these two collections of leaflets; the leaflets chosen were approximately the same size, but no other precautions were taken to ensure that equivalent weights of tissue were taken from each plant. Digests of these two DNA preparations, together with digests of DNA from the parental lines, were probed with two RFLP markers, cDNA 53 and pJC 2-7, which mapped approximately 4 and 16 map units, respectively, from rb in the cross JI281 × JI399 [1]. The hybridization patterns obtained in this experiment are shown in Fig. 1. The association of these probes with the rb locus is obvious, and a densitometric analysis enabled an estimate to be made of the degree of linkage between these markers (Annex 1). The results of this analysis for the rb - pJC 2-7 interval compare favourably with conventional segregation analysis of the same cross JI281 × JI399 (24.1 ± 1.6 vs. 15.7 ± 5.1 map units). The results of this simple experiment suggest that DNA preparations from crude pools of tissue can provide a sensitive basis for the detection of distantly linked markers. The error associated with this estimate is a measurement error and does not take account of the error in the estimate of the proportion of recombinant types from a population of this size.
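As a rough illustration of the sampling bound quoted above, the following Python sketch evaluates the lower limit on the standard error of a proportion. The function name and the example values are assumptions made for illustration only; in particular, taking n as the 76 pooled plants (38 rbrb plus 38 RbRb) and p = 0.25 is a hypothetical choice, not a figure reported in the text.

```python
import math


def min_standard_error(p: float, n: int) -> float:
    """Lower bound (pq/n)^(1/2) on the standard error of a proportion p
    estimated from n sampled individuals, with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    q = 1.0 - p
    return math.sqrt(p * q / n)


# Hypothetical example: n = 76 pooled plants, p = 0.25 (illustrative value).
se = min_standard_error(0.25, 76)
print(f"minimum standard error of p: {se:.3f}")
```

The bound shrinks only with the square root of n, so a pool four times larger would halve this contribution to the error.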
The final error of the estimate should combine these two errors.
Fig. 1. Analysis of markers linked to rb in pooled samples. The tracks are Eco RI digests of the indicated line or pool. (A) is probed with cDNA 53 and (B) is the same filter probed with pJC 2-7 [5]. Conditions for gel electrophoresis and Southern blot preparation have been described in detail by Lee et al. [5].
Fig. 2. Eco RI digests of pooled recombinant types from the progeny of the cross JI281 × JI399, pooled on the basis of the scores for the DR4 locus, have been probed with pJC 5-5bTSphI (probe A). DR4+ is the left hand track and DR4- is the right hand track. The gel was 40 cm long and 3 mm thick, 0.8% w/v agarose (Sigma type I, low EEO), in 1x TAE (40 mM Tris-acetate pH 7.9, 5 mM sodium acetate, 1 mM EDTA) at 0.75 V/cm for ca. 112 h. A similar analysis is shown for the rb pooled samples used in Fig. 1. A densitometric trace of the relevant region of this blot is shown. The areas under peaks 1 and 2 were compared. Peak 2 is a reference peak, and is not allelic to peak 1. Densitometry was performed using a Chromoscan 3 densitometer.
To test whether multiple-locus markers are able to detect linkage in pooled samples, Eco RI digests of individual RI line DNAs from the cross JI281 × JI399 were combined on the basis of the score for the marker DR4, which is a dispersed repeat marker linked to rb. For the cross JI281 × JI399 we have the map: rb - 3.9 mu - cDNA 53 - 14.3 mu - pJC 2-7 - 16.6 mu - DR4 - 6.2 mu - DR10 [1]. The marker DR4 was identified as a difference in Hind III digest patterns using probe B (Fig. 2) of the copia-like element PDR1 [1, 5]. The test hybridization was to Eco RI digests with probe A (Fig. 2). A close pair of Eco RI sites in PDR1 separates these two probes and also the fragments which would hybridize to them. The probe B detects about 50 bands in an Eco RI digest of pea genomic DNA, and many of these bands differ between lines; consequently the pattern of hybridization to mixtures of genotypes is very complex, because these combine allelic types from both parents. The hybridization pattern which resulted is shown in Fig. 2, and several clear differences between the pools can be seen, suggesting that a dispersed repeat sequence probe can detect nonrandom contributions of alleles from several loci even when many hybridizing fragments are on the gel. The mixture prepared from pooled tissue of the
cross JI15 × JI399 was also hybridized with probe A, and the result is shown together with a densitometric trace in Fig. 2. DR4 is the closest locus (mapped in the cross JI281 × JI399) which could be detected by this probe. It is at a distance of about 35 map units [1]. The densitometric analysis suggests that there is a locus showing about 30% recombination with respect to rb segregating in the cross JI15 × JI399 detected by the PDR1 probe. As rb is at the end of a linkage group, we suggest that this detected difference is a candidate for an allele of one of the band differences seen in the DR4 pool from JI281 × JI399 (Fig. 2). These experiments show that dispersed repeated sequence probes can be used in the rapid assignment of molecular markers linked to various traits using pooled mixtures of tissue. The analysis of pools of recombinant inbred lines allows the detection of linkage between quite distant markers. The procedure we have adopted can detect linked markers at distances greater than 20 map units; thus each marker covers at least 40 map units. The pea genome has a total map length of about 1600 map units and so requires 40 well-placed markers to enable detection of linkage to any single locus. Randomly placed markers will need to be about three times as abundant for complete coverage, implying that a fairly small number of probes like PDR1 are required. A corollary of this is that the analysis of pooled DNA samples does not necessarily provide markers tightly linked to a trait of interest. If a marker which detects more than one locus is used in the analysis of a segregating population, then it is not necessarily clear which locus, as defined in another cross, is being followed. However, the linkage relationships of such loci contribute useful information. It is possible to determine which marker is likely to be involved using a pooling procedure and looking for linked markers. For example, a pool such as shown in Fig.
2 could be used to identify the individual DR locus by virtue of the hybridization pattern to a linked unique marker such as can be detected by pJC 2-7. An alternative is to use a process of elimination; for example, in the cross JI281 × JI399
only one of the two rRNA gene loci present in peas segregates as a length variant, so we wished to find a restriction enzyme site polymorphism which identified the other. Various digests were performed of the two parental lines and pooled samples of the recombinant inbreds. The recombinant inbreds were pooled according to the rRNA gene locus which had been scored. Differences in the rDNA pattern between the parents which did not distinguish the pools (data not shown) were thus identified as markers for the unscored locus; i.e. they are rDNA differences which cannot be accounted for by the scored segregation. For segregation analysis in recombinant inbred populations such as ours, dispersed repeated sequences can be scored directly [5]. For breeding programmes, discrete single-locus markers are more useful. These can be generated as previously described [5] by conventional cloning or inverse PCR. The experience of others with F2 populations suggests that our strategy can have wide application, and the work of Flavell et al. [2] suggests that copia-like element probes could be used in exactly this way in a wide range of plant species. Recombinant inbred lines have about twice the sensitivity of F2 populations for closely linked markers, but this advantage is eroded with increasing distance between the markers. The ratio of band intensities changes from P/(1-P) (or p : 1/2; see Annex 1, eq. 1) for recombinant inbred lines to p/(2-p) for F2 bulked samples. The threefold difference in this ratio for unlinked markers reflects the difference between 1:1 segregation for RI lines and 1:3 segregation for F2 individuals. This can be adjusted by controlled loading or with reference to unlinked markers; thus the advantage for recombinant inbreds is at best 1 : 1 1/3. These considerations suggest that dispersed repeated sequence probes could be used successfully in F2 populations as well as the recombinant inbreds we have exploited here.
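The two band-intensity ratios quoted above can be tabulated side by side. The Python sketch below is only an illustration: the function names are hypothetical, the expressions are taken directly from the text (P/(1-P) with P = 2p/(1+2p) for recombinant inbreds, p/(2-p) for F2 bulks), and the normalization by the unlinked value (p = 0.5) is one possible way of expressing the "reference to unlinked markers" adjustment.

```python
def ri_band_ratio(p: float) -> float:
    """Band-intensity ratio P/(1-P) for pooled recombinant inbred lines,
    where P = 2p/(1+2p) (Annex 1, eq. 1). Algebraically this equals 2p."""
    P = 2.0 * p / (1.0 + 2.0 * p)
    return P / (1.0 - P)


def f2_band_ratio(p: float) -> float:
    """Band-intensity ratio p/(2-p) quoted in the text for F2 bulked samples."""
    return p / (2.0 - p)


ri_unlinked = ri_band_ratio(0.5)  # 1:1 segregation in RI lines
f2_unlinked = f2_band_ratio(0.5)  # 1:3 segregation in F2 individuals

print("p     RI ratio  F2 ratio  RI/unlinked  F2/unlinked")
for p in (0.05, 0.1, 0.2, 0.3, 0.5):
    ri = ri_band_ratio(p)
    f2 = f2_band_ratio(p)
    print(f"{p:<5} {ri:8.3f}  {f2:8.3f}  {ri / ri_unlinked:11.3f}  {f2 / f2_unlinked:11.3f}")
```

For small p the raw ratios differ by a factor approaching four, and after normalization by a factor approaching 4/3, which is the 1 : 1 1/3 figure given in the text.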
Acknowledgements The John Innes Institute is supported by a grant in aid from the Agricultural and Food Research Council. This work was supported by the Agricultural Genetics Company. We are grateful to Prof. D.R. Davies for many discussions and his comments on this manuscript.
References
1. Ellis THN, Turner L, Hellens RP, Lee D, Harker CL, Enard C, Domoney C, Davies DR: Linkage maps in pea. Genetics 130: 649-663 (1992).
2. Flavell AJ, Smith DB, Kumar A: Extreme heterogeneity of Ty1-copia group retrotransposons in plants. Mol Gen Genet 231: 233-242 (1992).
3. Giovannoni J, Wing RA, Ganal MW, Tanksley SD: Isolation of molecular markers from specific chromosomal intervals using DNA pools from existing mapping populations. Nucl Acids Res 19: 6553-6558 (1991).
4. Haldane JBS, Waddington CH: Inbreeding and linkage. Genetics 16: 357-374 (1931).
5. Lee D, Ellis THN, Turner L, Hellens RP, Cleary WG: A copia-like element in Pisum demonstrates the uses of dispersed repeated sequences in genetic analysis. Plant Mol Biol 15: 707-722 (1990).
6. Lee D, Turner L, Davies DR, Ellis THN: An RFLP marker for rb in pea. Theor Appl Genet 75: 362-365 (1988).
7. Martin GB, Williams JGK, Tanksley SD: Rapid identification of markers linked to a Pseudomonas resistance gene in tomato by using random primers and near-isogenic lines. Proc Natl Acad Sci USA 88: 2336-2340 (1991).
8. Michelmore RW, Paran I, Kesseli RV: Identification of markers linked to disease resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc Natl Acad Sci USA 88: 9828-9832 (1991).
9. Rogowsky PM, Shepherd KW, Langridge P: Polymerase chain reaction based mapping of rye involving repeated DNA sequences. Genome 35: 621-626 (1992).
10. Young ND, Zamir D, Ganal MW, Tanksley SD: Use of isogenic lines and simultaneous probing to identify DNA markers tightly linked to the Tm-2a gene in tomato. Genetics 120: 579-589 (1988).
Annex 1. Theoretical considerations
Assuming that an RFLP marker is linked to a gene a such that the frequency of recombination between them in a single meiosis is p, then for RI lines [4] the proportion of lines for which the parental alleles are recombined is P, where:
$$P = \frac{2p}{1+2p} \tag{1}$$
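Equation 1 and its algebraic inverse, p = P / (2(1 - P)), can be written as a short Python sketch. The function names are hypothetical and chosen here only for illustration.

```python
def ri_recombinant_fraction(p: float) -> float:
    """Proportion P of recombinant inbred lines carrying recombined parental
    alleles, given the single-meiosis recombination frequency p (eq. 1)."""
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 0.5]")
    return 2.0 * p / (1.0 + 2.0 * p)


def meiotic_recombination(P: float) -> float:
    """Invert eq. 1: recover p from the observed recombinant fraction P."""
    if not 0.0 <= P <= 0.5:
        raise ValueError("P must lie in [0, 0.5]")
    return P / (2.0 * (1.0 - P))


for p in (0.01, 0.05, 0.1, 0.2, 0.5):
    P = ri_recombinant_fraction(p)
    print(f"p = {p:.2f} -> P = {P:.3f} -> back to p = {meiotic_recombination(P):.2f}")
```

For small p the recombinant fraction among RI lines is close to 2p, and for unlinked loci (p = 0.5) it reaches 0.5.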
Hybridization patterns to pooled collections of each genotype will reflect the proportion of each RFLP marker type in the mixed DNA samples, and this will be seen as differences in band intensity:
Band     Parental types     Relative amount in recombinant inbreds
         AA       aa        AA pool        aa pool
1        +        -         (1 - P)        P
2        -        +         P              (1 - P)
Ratio (band 2 : band 1)     P/(1 - P)      (1 - P)/P
The ratio of ratios of band intensities is R, where:
$$R = \frac{P^{2}}{(1-P)^{2}} \tag{2}$$
$$R = 4p^{2} \tag{3}$$
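Equations 2 and 3 suggest how a recombination frequency might be read off four densitometric peak areas. The Python sketch below assumes that `a_ij` denotes the area of band i in genotype pool j (the convention used in the error estimates of this annex), that both bands are allelic as in the table above, and that the areas are hypothetical numbers chosen for illustration; the function names are not from the original.

```python
import math


def ratio_of_ratios(a11: float, a21: float, a12: float, a22: float) -> float:
    """R = (A21/A11) / (A22/A12): the band 2 : band 1 ratio in pool 1
    divided by the same ratio in pool 2."""
    if min(a11, a21, a12, a22) <= 0.0:
        raise ValueError("peak areas must be positive")
    return (a21 / a11) / (a22 / a12)


def recombination_from_ratio(R: float) -> float:
    """Solve R = 4 p^2 (eq. 3) for the recombination frequency p."""
    if R < 0.0:
        raise ValueError("R must be non-negative")
    return math.sqrt(R) / 2.0


# Hypothetical peak areas (arbitrary units), for illustration only.
R = ratio_of_ratios(a11=100.0, a21=25.0, a12=25.0, a22=100.0)
p = recombination_from_ratio(R)
print(f"R = {R:.4f}, estimated p = {p:.3f}")  # R = 0.0625, estimated p = 0.125
```

Because each track contributes a within-track ratio, a uniform difference in loading between the two pools cancels out of R.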
Taking the ratio of ratios controls for any intrinsic difference in the intensity of bands 1 and 2 and for the loading of tracks corresponding to the two pools, and enables a direct estimate of error to be made. If there is approximately a 10% error in the measurement of any peak area, the theory of errors suggests that there will be a 40% error in the estimation of R, but a 20% error in the estimate of p. Note that, assuming a 10% measurement error, if p has a value which is more than three standard error units below that expected for unlinked markers we can calculate an inequality: if there is a 10% error in measurement of each of the areas then the standard error of p is 0.2p (20%, q.v.), so for a deviation greater than three standard errors from the expected (unlinked) value of 0.5 we have:
$$0.5 > p + 0.6p \tag{4}$$
$$p < 0.31 \tag{5}$$
This suggests that distant linkages may be measurable by a densitometric procedure.
Estimates of errors
In R, from measured error in peak areas, where $A_{ij}$ is the area of band i in genotype pool j:
$$R = \frac{A_{21}/A_{11}}{A_{22}/A_{12}} = \frac{A_{21}A_{12}}{A_{11}A_{22}}$$
$$\log R = \log A_{21} + \log A_{12} - \log A_{11} - \log A_{22}$$
so:
$$\frac{dR}{R} = \frac{dA_{21}}{A_{21}} + \frac{dA_{12}}{A_{12}} - \frac{dA_{11}}{A_{11}} - \frac{dA_{22}}{A_{22}}$$
$$\frac{\Delta R}{R} = \pm\frac{\Delta A_{21}}{A_{21}} \pm \frac{\Delta A_{12}}{A_{12}} \mp \frac{\Delta A_{11}}{A_{11}} \mp \frac{\Delta A_{22}}{A_{22}}$$
When all $\Delta A_{ij}/A_{ij}$ are the same and approximately 10%:
$$\frac{\Delta R}{R} = \pm 4\,\frac{\Delta A_{ij}}{A_{ij}} \approx 40\%$$
In p, from R:
$$R = 4p^{2}$$
$$\log R = 2\log p + \log 4$$
so:
$$\frac{dR}{R} = 2\,\frac{dp}{p}, \qquad \frac{\Delta R}{R} = 2\,\frac{\Delta p}{p}$$
If $\Delta R/R \approx 40\%$, then $\Delta p/p \approx 20\%$.
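The error estimates above, together with inequalities 4 and 5, can be checked numerically. The following Python sketch is an illustration under the same worst-case assumption used in the annex (relative errors in the four peak areas simply add); the function names and the default of three standard errors are choices made here, not part of the original analysis.

```python
def relative_error_in_R(area_errors: list[float]) -> float:
    """Worst-case relative error in R: the relative errors of the four
    peak areas add, since R is a product and quotient of those areas."""
    if len(area_errors) != 4:
        raise ValueError("expected relative errors for four peak areas")
    return sum(abs(e) for e in area_errors)


def relative_error_in_p(rel_error_R: float) -> float:
    """Because R = 4 p^2, dR/R = 2 dp/p, so the relative error halves."""
    return rel_error_R / 2.0


def largest_detectable_p(rel_error_p: float, n_se: float = 3.0,
                         unlinked: float = 0.5) -> float:
    """Largest p lying more than n_se standard errors below the unlinked
    value, i.e. the solution of unlinked > p + n_se * rel_error_p * p."""
    return unlinked / (1.0 + n_se * rel_error_p)


err_R = relative_error_in_R([0.10, 0.10, 0.10, 0.10])
err_p = relative_error_in_p(err_R)
limit = largest_detectable_p(err_p)
print(f"error in R: {err_R:.0%}, error in p: {err_p:.0%}, p < {limit:.2f}")
```

Halving the measurement error on each peak area to 5% would, on the same assumptions, raise the limit to p < 0.5/1.3, or about 0.38.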