Stock identification of coho salmon (Oncorhynchus kisutch) using minisatellite DNA variation Can. J. Fish. Aquat. Sci. Downloaded from www.nrcresearchpress.com by Fisheries and Oceans on 08/14/15 For personal use only.
Kristina M. Miller, Ruth E. Withler, and Terry D. Beacham
Abstract: The minisatellite DNA probe Ssa1 was used to survey variation in 19 British Columbia, 1 Yukon, 1 Alaskan, and 1 Russian populations of coho salmon (Oncorhynchus kisutch). Hybridization of Ssa1 to DNA restricted with five restriction enzymes revealed 2–12 restriction fragments derived from a single locus. Geographically, although no marked regional definition was found within stocks from northern British Columbia, the lower mainland, the Fraser River, and Vancouver Island, genetic variation between individual stocks was high. Fish were correctly identified to stock of origin with an accuracy of 45% using discriminant analysis and a jackknifed classification procedure with a 22-stock baseline. Estimated mixture proportions for the contributing stock in test samples comprising a single stock averaged 100% using a fixed baseline of 22 stocks (i.e., no error), and 89% when the baselines were resampled. Further, the stock composition of simulated mixed fishery samples containing up to seven southern British Columbia stocks was estimated within an average of 1.4% of the actual mixing portions, on the basis of a resampled baseline.
Received November 4, 1994. Accepted July 20, 1995. J12621
K.M. Miller,1 R.E. Withler, and T.D. Beacham. Department of Fisheries and Oceans, Biological Sciences Branch, Pacific Biological Station, Nanaimo, BC V9R 5K6, Canada.
1 Author to whom all correspondence should be addressed. e-mail: [email protected]
Can. J. Fish. Aquat. Sci. 53: 181–195 (1996). Printed in Canada / Imprimé au Canada

Introduction

Accurate estimation of stock composition is important in the management of fisheries directed at Pacific salmon. A variety of techniques have been used to estimate stock composition in Pacific salmon, but for coho salmon (Oncorhynchus kisutch), coded wire tagging (Jefferts et al. 1963) has been the main method (Shaul and Clark 1990). Some variation at enzyme loci has been observed (Bartley et al. 1992; Wehrhahn and Powell 1987; Utter et al. 1973), but this variation has not been systematically applied to estimate stock composition. Of the five Pacific salmon species, coho salmon show the least amount of genetic variation at enzyme loci (Wehrhahn and Powell 1987). As a consequence, the use of electrophoretic data to distinguish regions or stocks of coho salmon in mixed population fishery samples has had only limited success (Milner 1993). However, a few studies showed small levels of genetic variation between regions at specific loci. Utter et al. (1973) showed that genetic variation exists in transferrin alleles between stocks from the Columbia River and the Washington coast that could be useful in the regional identification of coho salmon. In addition, Wehrhahn and Powell (1987) found frequency differences in Pgi-3 and Ldh-4 alleles between fish from the lower coastal mainland of British Columbia (B.C.) and Vancouver Island. However, although these studies suggest that some local regional variation exists, variation at the level of individual populations has not been found.

Because of the relatively low level of variation found at enzyme loci, we chose to examine variation at the DNA level using highly polymorphic variable number of tandem repeat (VNTR) loci. VNTR loci that contain tandemly repeated arrays of 10–75 nucleotides comprise a class of repetitive DNA in the nuclear genome called minisatellite DNA (Jarman and Wells 1989). The extreme polymorphism found at minisatellite loci results from allelic differences in the numbers of repeats (Jeffreys et al. 1985). One way to examine minisatellite loci for variation is through the use of minisatellite DNA probes hybridized to genomic DNA digested with restriction enzymes. When hybridized at low stringency, these minisatellite DNA probes often recognize multiple loci, leading to the complex banding patterns that have been referred to as DNA fingerprints (Jeffreys et al. 1985).

We utilized a minisatellite DNA probe, Ssa1, derived from Atlantic salmon (Salmo salar) by Bentzen et al. (1993). Ssa1 hybridizes to one or two loci in salmon. In this study, we examine nuclear DNA variation in populations of coho salmon at the Ssa1 minisatellite locus. The goal of the study was to determine the usefulness of minisatellite DNA in the identification of coho salmon to specific stocks or regions. The ability to identify individual fish to stock and the accuracy of estimated stock compositions are examined.

Fig. 1. Location of coho salmon stocks used in DNA analysis. Stocks include (1) west Kamchatka, Russia, (2) Yukon, (3) Bristol Bay, Alaska, (4) Stikine, (5) Pallant, (6) Toboggan, (7) Kitimat, (8) Atnarko, (9) Robertson, (10) Nitinat, (11) Quinsam, (12) Puntledge, (13) Big Qualicum, (14) Nanaimo, (15) Shawnigan, (16) Tenderfoot, (17) Capilano, (18) Chehalis, (19) Chilliwack, (20) Coldwater, (21) Salmon, and (22) Eagle.
Materials and methods

DNA samples and extraction

Liver or blood samples were taken from approximately 50 spawning adult coho salmon at each of 22 locations in British Columbia, the Yukon, Alaska, and Russia during 1987–1992 (Fig. 1 and Table 1). Locations were chosen to represent a wide range of populations that act as discrete breeding units or stocks (according to the definition of Ricker 1972), although the “stock” status of these populations has not yet been confirmed by estimates of rates of migration between populations. Samples were stored frozen at –20°C or stored in 95% ethanol at 4°C until DNA extraction. For ethanol-preserved samples, ethanol was removed by centrifugation prior to DNA extraction. Approximately 0.3 g of liver tissue shaved from liver samples was placed in 15 mL of proteinase K lysis buffer (10 mM Tris-HCl pH 8.0, 10 mM ethylenediaminetetraacetic acid (EDTA) pH 8.0, 1% SDS, 0.2 µg/mL proteinase K) and incubated at 37°C overnight. Alternatively, 250 µL of blood (ethanol removed) was used. After overnight digestion, proteins were precipitated by the addition of 10 mL of 5 M NaCl, mixed briefly, and centrifuged at 12 000 × g for 20 min. Supernatant was extracted once with phenol–chloroform (1:1) to further remove proteins and excess salt. DNA was precipitated in 2.5 volumes of 95% ethanol and incubated at –70°C for a period ranging from 1 h to overnight. High molecular weight DNA was removed from the ethanol with a glass rod and placed in Eppendorf tubes containing 70% ethanol. DNA was then pelleted by centrifugation, air dried at room temperature, and resuspended in TE buffer (10 mM Tris-Cl, 1 mM EDTA, pH 8.0).

Southern blots and hybridization

Approximately 4.5 µg of genomic DNA from each fish was digested with each of five restriction enzymes (3 units/µg): HaeIII, HinfI, RsaI, MboI, and AluI. After a 3- to 5-h incubation at 37°C, DNA samples were loaded into 0.7% agarose gels. Each agarose gel contained 30 wells; 25 wells were loaded with genomic DNA, 1 of which was a control fish run on every gel, and 5 wells were loaded with lambda DNA digested with HindIII as a molecular weight standard. Most gels contained individuals from two to four populations, and each population was run on numerous gels. Each gel contained samples digested with the same restriction enzyme. DNA was size fractionated by electrophoresis of the 0.7% agarose gels run in TBE buffer (89 mM Tris-borate, 89 mM boric acid, 2 mM EDTA) at 35 V for 16 h. Gels were then Southern blotted (Southern 1975) and UV cross-linked.

The minisatellite probe Ssa1 (Bentzen et al. 1993; Bentzen and Wright 1993) was used to assay variation at VNTR loci of coho salmon. The Ssa1 probe is composed of a number of 16 base pair repeats and detects variation in the number of tandemly arranged copies of the repeat at a single locus in coho salmon (see family analysis). Ssa1 was labelled with [32P]dATP using random primer extensions (Feinberg and Vogelstein 1983).
Membranes were placed in rotisserie hybridization oven bottles (4–6 membranes per bottle; Robbins Scientific) and prehybridized overnight at 65°C in 20 mL of prehybridization solution (7.0% SDS, 1.0 mM Na2EDTA (pH 8.0), 1.0% bovine serum albumin, 0.263 M Na2HPO4). Prehybridization was followed by the addition of new hybridization solution, the denatured Ssa1 probe, and 10 µg of denatured sheared calf thymus DNA, followed by another overnight incubation at 65°C. After hybridization, membranes were washed twice for 15 min at room temperature in 2× SSC (300 mM NaCl, 30 mM trisodium citrate), 0.1% SDS, once for 30 min at 65°C in 1× SSC, 0.1% SDS, and once for 15 min at room temperature in 1× SSC. Membranes were then exposed to X-ray film (Kodak XOMAT-AR) for 1–3 days at –70°C.

DNA blot analysis and binning

Autoradiographs were scanned using a Kodak high-resolution two-dimensional charge coupled device (CCD) camera with a 512 × 512 pixel density. Bioimage software (Millipore Corp. Imaging Systems, Ann Arbor, Mich.) was used to analyze the blots. The lambda size markers, which were present in five lanes across the blots, were used to form a size grid across the blots upon which bands between 2 and 23 kilobase pairs (kbp) in size were measured. Within this range, Ssa1 resolved between 2 and 12 polymorphic bands with each of the five restriction enzymes used. In addition, there were four to six
Table 1. Stocks analyzed for variation at VNTR loci recognized by the minisatellite probe Ssa1.

Stock                  Region            N
Atnarko (At)           Northern B.C.     69
Kitimat (Ki)           Northern B.C.     69
Pallant (Pa)           Northern B.C.     83
Toboggan (To)          Northern B.C.     18
Stikine (St)           Northern B.C.      6
Robertson (Ro)         WCVI              71
Nitinat (Ni)           WCVI              42
Quinsam (Qu)           ECVI              98
Puntledge (Pu)         ECVI              40
Big Qualicum (BQ)      ECVI              73
Nanaimo (Na)           ECVI              24
Shawnigan (Sh)         ECVI              37
Tenderfoot (Te)        Lower mainland    53
Capilano (Ca)          Lower mainland    35
Coldwater (Co)         Upper Fraser      29
Salmon (Sa)            Upper Fraser      17
Eagle (Ea)             Upper Fraser      45
Chehalis (Che)         Lower Fraser      30
Chilliwack (Chi)       Lower Fraser      69
Bristol Bay (Br)       North of B.C.     21
Yukon (Yu)             North of B.C.     17
West Kamchatka (Ka)    Russia            29

Note: The regional location of each of the stocks and numbers of individuals are shown as well as the stock abbreviations in parentheses. WCVI, west coast of Vancouver Island; ECVI, east coast of Vancouver Island. N is the number of individuals sampled in each stock.
bands below 2.0 kbp in size that were not scored because of variation in their resolution between blots.

Because of the nearly continuous distribution of molecular weights sampled, and the large number of individuals and populations examined on multiple gels, an estimate of band size measurement error was made following Galbraith et al. (1991). Control fish that contained bands with sizes that spanned the 2- to 23-kbp range were run on each blot, with a different control fish for each enzyme. A total of 41 different molecular weight (MW) bands from all control fish combined were used to calculate the mean and standard deviation (SD) of MW (in kilobase pairs) over all gels. A linear regression of the natural logarithm of the standard deviation of MW (σMW) on mean MW (µMW) was used to interpolate standard deviations for bands with different molecular weights between 2 and 23 kbp. The regression equation was

$$\ln \sigma_{MW} = 0.151\,\mu_{MW} - 3.96 \qquad (r^2 = 0.80,\ P < 0.001,\ N = 41)$$

The regression equation was used to establish molecular weight size intervals, or bins, that were 5 SD wide, over the 2- to 23-kbp size range in which bands were measured. A total of 41 contiguous bins was established by assuming that a bin width of 5 SD (±2.5 SD) would account for approximately 99% of the measurement error according to the gel set-up employed in this study (Gill et al. 1990; Galbraith et al. 1991). Bin widths increased with molecular weight, ranging from 0.2 kbp in the 2.0- to 2.2-kbp bin size interval to 3 kbp in the 20- to 23-kbp bin size interval, hence the positive correlation between µMW and lnσMW.
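As an illustration only, the way bin widths follow from the regression can be sketched in Python. The sketch assumes that each bin is 5 SD wide with the SD evaluated at the bin's lower edge; the text does not state how the widths were anchored, so the sketch is not expected to reproduce the 41 bins used in the study. The function names are hypothetical.

```python
import bisect
import math

SLOPE = 0.151      # regression slope reported in the text
INTERCEPT = -3.96  # regression intercept reported in the text


def sd_of_band(mw_kbp):
    """Interpolated measurement SD (kbp) for a band of the given size."""
    return math.exp(SLOPE * mw_kbp + INTERCEPT)


def build_bin_edges(lo=2.0, hi=23.0, n_sd=5.0):
    """Contiguous bin edges from lo to hi.

    Assumption (not from the paper): each bin's width is n_sd times the
    SD evaluated at that bin's lower edge; the last bin is cut off at hi.
    """
    edges = [lo]
    while edges[-1] < hi:
        width = n_sd * sd_of_band(edges[-1])
        edges.append(min(edges[-1] + width, hi))
    return edges


def bin_index(mw_kbp, edges):
    """Index of the bin holding a band, or None if it lies outside the range."""
    if not (edges[0] <= mw_kbp < edges[-1]):
        return None
    return bisect.bisect_right(edges, mw_kbp) - 1


edges = build_bin_edges()
print(len(edges) - 1, "bins; first width", round(edges[1] - edges[0], 3), "kbp")
```

Because the bins are contiguous, a band measured close to an edge can fall on either side of it in different lanes, which is the behaviour discussed in the binning text that follows.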
Bands from individual fish digested with each enzyme were binned into each of the 41 molecular weight size intervals. Each fish produced 2–12 bands, and consequently most of the bins contained zero or one band, and none contained more than two bands. Because of the measurement error and the contiguous molecular weight size intervals used, it was possible for homologous bands in different individuals to fall into more than one bin when the MW was close to the class limits of the bins. However, Gill et al. (1990) showed that this binning procedure did not differ significantly from that using a sliding window routine, which bases bin intervals on band frequency data. The sliding window routine was not possible with the present data set because the relatively large number of restriction fragments and extreme polymorphism detected by Ssa1 resulted in a nearly continuous distribution of molecular weights. Hence, nearly all of the 41 bins contained a band from at least one specimen for each of the five restriction enzymes examined.

Statistical analysis

Family data

A pedigree study was conducted to establish the Mendelian inheritance of minisatellite bands and the number of loci recognized by Ssa1. Three male and three female coho salmon from a domesticated strain of Kitimat River origin were crossed to produce three full-sib families. Sixteen to 22 offspring from each family were analyzed using MboI, HinfI, AluI, and RsaI restriction enzymes. The use of four enzymes was sufficient to establish the Mendelian inheritance of Ssa1 and the number of loci recognized by Ssa1; thus, HaeIII was not included in the analysis. Mendelian inheritance was analyzed using the χ² statistic.
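For readers who want to see the arithmetic, the χ² goodness-of-fit statistic can be sketched as follows, using the progeny counts reported in Table 3. The sketch assumes equal expected frequencies for every progeny genotype class (1:1 or 1:1:1:1), which is the Mendelian expectation for these crosses; the function name is hypothetical, and the critical values are the standard tabulated χ² values at α = 0.05.

```python
def chi_square_equal_classes(observed):
    """Chi-square statistic against equal expected counts in every class."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Progeny genotype counts as reported in Table 3.
families = {
    "M1F1 (HinfI)": [9, 9],      # AC, BC
    "M3F5": [3, 3, 4, 6],        # AC, AD, CE, DE
    "M4F2": [4, 5, 6, 4],        # AF, AG, CF, CG
}

# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 3: 7.815}

for name, counts in families.items():
    stat = chi_square_equal_classes(counts)
    df = len(counts) - 1
    verdict = "consistent with" if stat < CRITICAL_05[df] else "departs from"
    print(f"{name}: chi2 = {stat:.3f}, df = {df}, {verdict} Mendelian ratios")
```

With samples of 16 to 19 progeny the expected counts per class are below 5, so the statistic should be read as a rough check rather than an exact test.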
Interannual data

Interannual variation of binned data was examined for six stocks with individuals collected over 2–4 years each, including Kitimat (1987, 1988, 1991, 1992), Big Qualicum (1988, 1992), Quinsam (1987, 1988, 1992), Pallant (1987, 1992), Robertson Creek (1988, 1992), and Chilliwack (1988, 1992). The χ² test accompanied by Monte Carlo simulations of the distribution of the χ² statistic was used to estimate the probability of obtaining a value equal to or greater than that observed (Roff and Bentzen 1989). One hundred to 1000 randomizations of the original data were used in the program MONTE of the REAP software package (McElroy et al. 1992). Restriction enzymes, each with 41 bins of input data, were analyzed separately. Because of the large number of tests made (6 stocks × 5 enzymes = 30 tests), significance levels were adjusted for the number of tests carried out using the method of Hochberg (1988), as suggested by Lessios (1992). In this method, tests are ordered by decreasing probability value. If the highest probability is greater than 0.05, that test is not significant and the next highest probability is compared, with the significance level divided by the number of tests already examined plus one. When a test is significant at the adjusted significance level, testing stops, and it and all subsequent tests are deemed significant.
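The step-up adjustment described above can be sketched in a few lines of Python. This is a minimal illustration of the procedure as worded in the text, with a hypothetical function name and made-up probability values; it is not the code used in the study.

```python
def hochberg(p_values, alpha=0.05):
    """Flag tests as significant using the step-up procedure described above.

    Tests are visited from the largest probability to the smallest. The
    k-th test visited (k = 1, 2, ...) is compared with alpha / k. At the
    first test that passes, it and all tests with smaller probabilities
    are declared significant.
    """
    order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
    significant = [False] * len(p_values)
    for already_examined, idx in enumerate(order):
        if p_values[idx] <= alpha / (already_examined + 1):
            for j in order[already_examined:]:
                significant[j] = True
            break
    return significant


# Hypothetical probabilities for four tests (not data from the study).
example = [0.20, 0.03, 0.015, 0.001]
print(hochberg(example))
```

In the example, 0.20 fails against 0.05 and 0.03 fails against 0.025, but 0.015 passes against 0.05/3, so the last two tests are flagged.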
Interstock data

Interstock variation in binned data was statistically analyzed for (i) the significant contributions of individual bins to interstock variation, (ii) the general distances between stocks to assess geographic trends in variation, (iii) the ability to identify individual fish to stock, and (iv) the ability to estimate stock composition of mixtures.

The contribution of individual bins to significant variations between stocks was examined using the G test statistic in SAS (SAS Institute Inc. 1989) for each of the 41 bins present in each of the five restriction enzyme data sets. For the rest of the statistical procedures, the five restriction enzyme data sets were combined into a master data set containing 22 populations and 205 bins. Thus, for each individual fish sampled (975 in all), the data set consisted of 205 variables (bins) with a value of 0, 1, or 2 in each (representing the number of bands in each bin).

A generalized squared distance (D²; Pimentel 1979) was calculated from mean bin counts over all 205 bins for each pair of populations. The matrix of distances was then clustered using the neighbor-joining algorithm (NEIGHBOR in PHYLIP; Saitou and Nei 1987; Felsenstein 1990) and a dendrogram was developed.

The ability to classify individual fish into stocks was examined using linear discriminant analysis with the DISCRIM procedure in SAS (Pimentel 1979). The level of accuracy in assigning individual fish to their stock of origin was assessed using the jackknife procedure.

The ability to estimate stock composition from samples taken from a mixed stock fishery was examined using a maximum likelihood model (Fournier et al. 1984). The maximum likelihood model uses probability distributions for each of a number of independent characters to define the probability of choosing a fish with a particular set of characteristics.
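The logic of a jackknifed classification can be illustrated with a small Python sketch. To keep the example short, a nearest-centroid rule with squared Euclidean distance is swapped in for the linear discriminant functions of the DISCRIM procedure, so the sketch shows only the leave-one-out step, not the classifier used in the study. The stock names and bin counts are invented.

```python
def centroid(rows):
    """Mean bin count per variable over a list of fish."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def jackknife_accuracy(samples):
    """Leave-one-out accuracy: each fish is classified using stock
    centroids computed without that fish."""
    correct = 0
    total = 0
    for stock, rows in samples.items():
        for i, fish in enumerate(rows):
            centroids = {}
            for other, other_rows in samples.items():
                # Drop the held-out fish from its own stock only.
                kept = rows[:i] + rows[i + 1:] if other == stock else other_rows
                centroids[other] = centroid(kept)
            predicted = min(centroids, key=lambda s: squared_distance(fish, centroids[s]))
            correct += predicted == stock
            total += 1
    return correct / total


# Invented bin counts (0, 1, or 2 bands per bin) for two hypothetical stocks.
toy = {
    "stock_a": [[1, 0, 0, 2], [1, 0, 1, 2], [1, 0, 0, 1]],
    "stock_b": [[0, 2, 1, 0], [0, 1, 1, 0], [0, 2, 0, 0]],
}
print(jackknife_accuracy(toy))
```

Holding each fish out before computing the centroids prevents a fish from contributing to the rule that classifies it, which is why jackknifed accuracies are lower and more realistic than resubstitution accuracies.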
Principal component analysis (PRINCOMP procedure in SAS on the correlation matrix) was used to incorporate the data from all five enzymes into 205 uncorrelated variables (principal components) that accounted for 100% of the observed variation. Each principal component is a linear combination of the original 205 variables, which have been standardized by subtracting the corresponding overall mean and dividing by the corresponding standard deviation over all stocks. The input to the maximum likelihood model required discrete frequency counts for each variable, and as outlined by Beacham et al. (1995), the continuous distributions of the principal component scores were represented as 12-bin histograms. Each stock was then characterized as an input matrix of 205 rows (principal components) and 12 columns (counts of principal component scores in each of the 12 bins).

The accuracy of the maximum likelihood model in the estimation of stock compositions of coho salmon was examined with both fixed and resampled (bootstrapped) baselines containing all 22 stocks. Fixed baseline data sets assume that the baseline data represent the stocks perfectly. More realistically, resampled baseline data sets emulate the randomness involved in the collection of baseline data under the assumption that the baseline samples are representative, random samples of the stocks. Single stock simulated fishery (test) samples of 50 fish were generated randomly with replacement, and 25 different single stock mixtures were analyzed with both fixed and resampled baseline data sets, with the mean and standard deviation of the estimated stock composition determined. In addition, stock composition was estimated for simulated
Table 2. Restriction fragment sizes (kb) for alleles present in three family crosses of coho salmon when analyzed with four restriction enzymes.
Allele A B C D E F G
MboI
HinfI
AluI
RsaI
2.21+6.05+12.69 2.21+6.05+12.69 3.93+5.16+5.88+9.04+13.88 2.38+3.67+5.77+6.25+11.77 7.70 5.27+5.75+6.18 9.21+20.96
4.99+14.55 4.57+14.55 23.1 2.47+3.52+4.95+11.25
2.40+7.80+15.88 2.40+7.80+15.88 2.35+3.23+3.67+7.57+>23.1 2.08+3.74+6.07+6.80+16.97 3.45 2.35+3.50+6.07+8.10+>23.1 2.80+6.60+9.10+14.00
Table 3. Genotypes of three families of coho salmon used in the pedigree analysis.

Family   Parental genotypes               Progeny genotypes
M1F1     Male 1 (CC) × female 1 (AB)*     9 AC, 9 BC
         Male 1 (CC) × female 1 (AA)      18 AC
M3F5     Male 3 (AE) × female 5 (CD)      3 AC, 3 AD, 4 CE, 6 DE
M4F2     Male 4 (AC) × female 2 (FG)      4 AF, 5 AG, 6 CF, 4 CG

*Only HinfI resolved the B allele.

Fig. 2. Portions of autoradiographs of coho salmon DNA restricted with MboI (A), RsaI (B), or HinfI (C) and hybridized with Ssa1. Portions of each autoradiograph were included to illustrate variation observed. Molecular weights are given in kilobase pairs.
multistock fishery samples containing stocks from specific regions, such as the Strait of Georgia, northern British Columbia, and the Fraser River.
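The idea behind the maximum likelihood estimate of stock composition can be sketched in Python. The sketch treats characters as independent, as stated above, and fits the mixing proportions with an expectation-maximization (EM) iteration. EM is an assumption made here for illustration, since the text does not say how the likelihood of Fournier et al. (1984) was maximized. The baseline frequencies, stock names, and fish are invented, with two characters of three categories each standing in for the 205 principal components and 12 histogram bins.

```python
def stock_likelihood(fish, freqs):
    """P(fish | stock): product of category frequencies over independent characters."""
    p = 1.0
    for character, category in enumerate(fish):
        p *= freqs[character][category]
    return p


def estimate_proportions(mixture, baseline, n_iter=200):
    """EM estimate of the proportion of each baseline stock in a mixture.

    Assumes every fish has a non-zero likelihood under at least one stock.
    """
    stocks = list(baseline)
    like = [[stock_likelihood(f, baseline[s]) for s in stocks] for f in mixture]
    theta = [1.0 / len(stocks)] * len(stocks)
    for _ in range(n_iter):
        totals = [0.0] * len(stocks)
        for row in like:
            denom = sum(t * l for t, l in zip(theta, row))
            for j, (t, l) in enumerate(zip(theta, row)):
                totals[j] += t * l / denom  # posterior share of stock j for this fish
        theta = [t / len(mixture) for t in totals]
    return dict(zip(stocks, theta))


# Invented baseline: per stock, one frequency list per character.
baseline = {
    "stock_a": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    "stock_b": [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]],
}
# Invented mixture: 7 fish typical of stock_a, 3 typical of stock_b.
mixture = [[0, 0]] * 7 + [[2, 2]] * 3
print(estimate_proportions(mixture, baseline))
```

With a fixed baseline the frequencies above would be used as given; a resampled baseline would redraw the baseline fish with replacement before each estimate, so that sampling error in the baseline is carried into the estimated proportions.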
Results

Banding patterns of Ssa1

In a given individual, the number of bands between 2 and 23 kbp in size varies from 2 to 12 with different restriction enzymes (Fig. 2). Further, a variable number of bands (4–6) smaller than 2.0 kbp are present; these were not scored in the present study because of variation in their resolution between blots.

Family analysis

Pedigree analysis of three coho families revealed that the Ssa1 probe detected a single highly polymorphic locus with diploid inheritance. Each allele consisted of between one and six restriction fragments (Table 2). The multiple fragments constituting a single allele displayed complete linkage (i.e., complete cotransmission, or lack of cotransmission, to progeny) and Mendelian inheritance (transmission to 50% of progeny from heterozygous parents) (Table 3). For instance, in the cross between male 3 and female 5, analysis of the inheritance of banding patterns between parents and offspring revealed that four alleles were resolved by HinfI (Fig. 3). The female contributed allele C, with five bands (