Springer 2006
Conservation Genetics (2006) 7:971–982 DOI 10.1007/s10592-006-9122-0
Identification and characterization of microsatellites for striped bass from repeat-enriched libraries Caird Rexroad1,*, Roger Vallejo1, Issa Coulibaly1, Charlene Couch2, Amber Garber2, Mark Westerman3 & Craig Sullivan2 1
USDA/ARS/NCCCWA, 11861 Leetown Road, Kearneysville, West Virginia 25430, USA; 2Department of Zoology, North Carolina State University, Raleigh, North Carolina 27695, USA; 3Kent SeaTech Corporation, 11125 Flintkote Avenue, Suite J, San Diego, California 92121, USA (*Corresponding author: Phone: +1-304-724-8340, ext. 2129; Fax: +1-304-725-0351; E-mail:
[email protected])
Received 21 December 2005; accepted 17 January 2006
Key words: breeding, genetics, heterozygosity, microsatellite, striped bass
Abstract Striped bass (Morone saxatilis) is economically important in the US due to its value as an aquaculture species and in supporting commercial and recreational fisheries, especially those off the Atlantic coast and in the Gulf of Mexico. Modern strategies for managing fishery populations and aquaculture broodstocks employ the use of molecular genetic markers to identify individuals, assign parentage, and characterize population genetic structure and levels of inbreeding and migration. As part of a collaborative effort to utilize molecular genetic technologies in striped bass breeding programs we generated microsatellite markers for use in population genetic studies, broodstock selection and management strategies, and the construction of a genetic map. We developed 345 new microsatellite markers for striped bass, a subset (n=71) of which was characterized by genotyping samples from two striped bass broodstock populations to evaluate marker polymorphism, percent heterozygosity, Hardy–Weinberg equilibrium (HWE), linkage disequilibrium (LD) and utility for population genetic studies.
Striped bass (Morone saxatilis) is economically important in the US due to its value as an aquaculture species and in supporting commercial and recreational fisheries, especially those off the Atlantic coast and in the Gulf of Mexico. Modern strategies for managing fishery populations and aquaculture broodstocks employ the use of molecular genetic markers to identify individuals, assign parentage, and characterize population genetic structure and levels of inbreeding and migration. Several studies have utilized molecular genetic markers to characterize wild populations and migration patterns of this anadromous fish (Wirgin et al. 1991, 1993, 1997; Leclerc et al. 1996; Bielawski and Pumo 1997; Han et al. 2000; Roy et al. 2000; Ross et al. 2004; Brown et al. 2005).
As part of a collaborative effort to employ molecular genetic technologies in striped bass breeding programs we generated microsatellite markers for use in population genetic studies, broodstock selection and management strategies, and the construction of a genetic map. We developed 345 new microsatellite markers for striped bass, a subset (n=71) of which was characterized by genotyping samples from two striped bass broodstock populations to evaluate marker polymorphism, percent heterozygosity, Hardy–Weinberg equilibrium (HWE), linkage disequilibrium (LD) and utility for population genetic studies. Four repeat-enriched libraries were constructed by Genetic Identification Services (GIS; Chatsworth, CA, USA) to identify AAG, ATG, TACA,
972 and TAGA repeats in striped bass. The libraries were constructed from genomic DNA using protocols which enrich for fragments containing a motif of interest (Peacock et al. 2002). Single pass sequencing using the ABI Prism Big Dye Terminator cycle sequencing kit (ABI, Foster City, CA, USA) with a universal M13 forward primer was conducted on 408 clones from each of the four libraries. Sequences were trimmed for quality and vector sequences removed using Phred and Crossmatch (Ewing et al. 1998). All sequences in the data set and all other Morone saxatilis microsatellites from GenBank were analyzed for redundancy using Sequencher (Gene Codes Corporation, Ann Arbor, Michigan). Novel unique sequences were candidates for primer design using Oligo 6.0 (Rychlik and Rhoads 1989). An M13 tail was added to each forward primer for genotyping (Boutin-Ganache et al. 2001). Primer pairs were optimized for PCR by varying annealing temperatures and MgCl2 concentrations. Optimizations were conducted in a 12 ll reaction volume containing 12.5 ng DNA, 1.5–2.5 mM MgCl2, 1.0 lM of each primer, 200 lM of dNTPs, 1 manufacturer’s reaction buffer, and 0.5 Units Taq DNA polymerase. Thermal cycling parameters consisted of an initial denaturation at 95 C for 15 min followed by 30 cycles of 95 C for 1 min, annealing temperature for 45 s, 72 C extension for 45 s and a final extension at 72 C for 10 min. PCR products were visualized on agarose gels after staining with ethidium bromide. Primers were designed for 398 microsatellites, 345 of which were successfully optimized for PCR (see Appendix).
Marker development efficiency was 16.9% for AAG, 18.6% for ATG, 12.5% for TACA, and 22.5% for TAGA. Specific marker development efficiency (markers containing the enriched-repeat) was 8% for AAG, 15.9% for ATG, 7% for TACA, and 20.3% for TAGA (see Table 1). A subset of 71 microsatellites was randomly chosen for further characterization by genotyping striped bass from University of Maryland broodstock (UMD, n=14) and from North Carolina State University broodstock (NCSU, n=10). DNA was isolated from fin clips using the phenol– chloroform method (Sambrook and Russell 2001). Optimized PCR reaction conditions for each marker were used with the addition of 0.1 lM of fluorescently labeled (FAMTM, HEXTM, or NEDTM) M13 primer. Samples were electrophoresed on an ABI Prism 3730 DNA Analyzer and output files were analyzed using GeneMapper software version 3.7. Of the 71 microsatellite markers genotyped, 13 were monomorphic in both broodstock populations and an additional five were monomorphic in the NCSU population. Five markers consistently produced more than two fragments, suggesting the presence of PCR artifacts or duplication of the locus within the genome (MSM1373, MSM1374, MSM1382, MSM1402, and MSM1447). Of these, amplification of MSM1402 produced two sets of fragments separated by ‡125 base pairs in all individuals and therefore was treated as two separate loci for population analyses (MSM1402a and MSM1402b). Locus name, GenBank accession
Table 1. Discovery of microsatellites from repeat-enriched libraries Library
AAG ATG TACA TAGA GISd Totals a
# Clones sequenced
408 408 408 408 1632
# Markers developed
69 76 51 92 57 345
# Matcheda
33 65 29 83 210
# Perfect repeatsb
6 18 11 50 12 97
Unmatchedc Di
Tri
26 8 21 6 12 73
7 1 2 16 26
Tetra
Other
2 2 1 1 29 35
1
1
Number of markers whose repeat matched the enriched repeat for that library. Number of markers whose repeat matched the enriched repeat and are perfect. c Number of markers whose repeat did not match the enriched repeat. (Di, Tri and Tetra designations represent di-, tri- or tetranucleotide repeat types). d Number of markers identified from two or more of the GIS libraries. b
973 Table 2. Summary data for 51 microsatellite loci for striped bass P-valuec
Locus GenBank
Repeat
Forward primer, Reverse primer
TAe
MgCl2
Size range
# Alleles
MSM1334a BV678387 MSM1335a BV678388 MSM1355a BV678319 MSM1357a BV678321 MSM1359 BV678323 MSM1362a BV678325 MSM1375a BV678410 MSM1394 BV678429 MSM1402a BV678437 MSM1402ba MSM1424a BV678456 MSM1427a BV678459 MSM1429a BV678461 MSM1437a BV678469 MSM1442a BV678474 MSM1443a BV678475 MSM1445a BV678477 MSM1451 BV678483 MSM1453a BV678484 MSM1491a BV678519 MSM1501a BV678529 MSM1520a BV678547 MSM1524a BV678550 MSM1526a BV678552 MSM1536a BV678561 MSM1547a BV678569 MSM1555a BV678575
(ATAG)24
F-aaaagttctgcacatggatcg R-gtccttgccctccagttacag F-gcctgtttgccctcatatagc R-gatgtttccagccttttacgc F-atcctgactgcatccactagccta R-tgtgtggcccaatactagactgac F-gtacctgacacgcataatg R-ccaagcaaaccgtttagtg F-accatgcaccttaggacacta R-ttcatgcccatttagacacca F-atccactttgctgctgttaca R-gcagcacgtacgcagtacaca F-gtgtcggatgatgactctt R-ccactacacccctcacgtt F-cagtgtaggcaatcggagtt R-tctccctctttgcatgtcca F-cgcagctagctaacgagtgtatc R-ggtcgttcaaggaaaaaggttgg
62
2.0
301–365
10
69
0.00025d
58
2.0
238–248
4
52
0.00221d
62
2.0
304–353
10
91
0.57967
58
2.0
242–282
11
91
0.00147d
62
2.0
361–381
6
81
0.04037d
62
2.0
343–396
11
75
0.06761
58
2.0
206–267
7
69
0.33174
58
2.0
188–206
4
36
0.03061d
62
2.0
194–200
10.00
69
0.00256d
62
2.0
325–373 151–157
3.00 2
61 16
0.38964 0.03442d
62
2.0
341–381
7
58
0.00009d
62
2.0
247–359
17
82
0.00095d
62
2.0
180–196
4
29
62
2.0
320–336
2
41
62
2.0
288–303
4
21
0d
58
2.0
136–140
3
33
0d
58
2.0
182–184
2
100
0.00002d
58
2.0
183–202
5
75
0.00044d
58
2.0
196–208
5
66
0.00039d
62
2.0
179–182
2
25
0.00003d
62
2.0
172–222
15
100
0.00943d
62
2.0
337–345
3
75
0.92657
58
2.0
159–176
8
91
0.00039d
62
2.0
254–296
10
91
0.00767d
58
2.0
280–353
15
78
62
2.0
323–360
6
29
(CAT)7(CAC)7 (CA)20 (GATA)21 (ATCT)17 (ATCT)23 (CT)14 (GAA)12 (CTT)12
(AAG)13b (GGT)8(TCT)14b (GAA)40b (CTT)8b (GGAT)8b (TA)8(TG)13 (GAT)9 (GAT)8(GT)21b (GAT)12b (TGA)9 (CAT)11 (CA)31 (TACA)13 (GT)26 (GT)29 (TATG)14 (TATC)16
F-tagtcagccaagggagacgag R-ctccgcttcctcacgcatcat F-cttacctggttcattgtc R-cagaaaggaggcgataac F-ctacaactttgttggacga R-gcagtgggaagtggacttt F-tcgacagatctgcgctcta R-gtgaaggtgcggttggaca F-gagaatgaaaggagggtggatta R-ctttctcctgcctgccagttgta F-ccagagtgaagaccaggtagt R-agagggaagaagtggaacgtg F-gagagagctggatgcagaatgat R-ctgacgcgacagcttgtgtatga F-gattagccatgtgcttag R-cctctcctttacgtggtg F-gcttgttctgtgccctatctc R-gcaaagggacggcaaatttag F-gcatgcggatgaataactgtc R-ttcgagggcatctgaaccttg F-gtcagagggaataataacatgacc R-agttcagttcgagagtaatgcacg F-tctgcgtctcccatactgt R-ttcaagcgttccggttctc F-agcaccttcatggtacccaag R-cgaacgcaaccgcagtaatgt F-gactctgttgcatttaggt R-ggctgtattccatttaggc F-ccatatgagccaggtaaagta R-gaaacacaaacacgcacatct F-atgcaagcttaaccctgtt R-ccctctgctggctgacata F-cctgtttacctgctagtccac R-agtgaagttagcccggtctga
% Het
0d 0.01658d
0d 0.4197
974 Table 2. (Continued)
a
Locus GenBank
Repeat
Forward primer, Reverse primer
TAe
MgCl2
Size range
# Alleles
% Het
P-valuec
MSM1556a BV678576 MSM1557a BV678577 MSM1558a BV678578 MSM1559a BV678579 MSM1560a BV678580 MSM1577a BV678596 MSM1578a BV678597 MSM1579a BV678598 MSM1584 BV678601 MSM1587 BV678604 MSM1589 BV678606 MSM1591a BV678608 MSM1592 BV678609 MSM1598 BV678613 MSM1602 BV678616 MSM1604 BV678618 MSM1617 BV678627 MSM1625 BV678633 MSM1626a BV678634 MSM1628a BV678636 MSM1634 BV678641 MSM1638 BV678644 MSM1645a BV678651
(AGAT)26
F-taccagaagaacaccctagac R-tatgtcctgctactgcgattc F-gtggatcagtgagcgactatgtat R-tgcactgcttactgtagcttccgt F-gggagttctttggtcaagg R-tttctcgatcaccgccact F-agctgacctcctgtgctatct R-agaagcgagagcgtcaagtga F-ttagagggctaatgagtacag R-cacagacccaaactgatactg F-gaaagagggcatcataatacagca R-caaaacaaagagagcgcggtctaa F-gggcaagttgacaggtagaca R-gttagcctcattcgccccaat F-atacagtgtcacccggtactat R-ggacagttatccctgcactatg F-ctctaaggctaatggggttac R-ctgttctgcgttttagttgga F-gaggggtcaagacatttaaccat R-tgtgtcgatgtccttgtcctacg F-tgaataccttcctgtagcct R-catccatttgcaccgtttac F-gatggatggctttcctaccta R-agccacacgattatgaccacc F-cggcactggataaagttac R-tacaatttccctcgggatg F-gacaaatcctctgatgaac R-ctcagcgtctgttaccgta F-gggaatgtgctaaacttggaatag R-tgggattactgggcagatagttga F-attagtctgtggataccgctgga R-acagggctgcagtggaaggtaag F-agagtgggggaagaagtcgttt R-gctgaactctggtccaaggcaa F-aagcttccatatagtgcaccc R-tgtccgagtttgcctgatctc F-cgcaactacgtttcgttaccata R-cagaagccaggcagactgcatac F-aatcccacatggagttgtag R-ccagaccaataaaacgtccc F-gagtgtaatgctgctgctacaact R-tgatttctaacaacagccaacgtg F-acagtgacacacgcttatatgc R-cgttgcccatttcctcttacat F-caatgcaccactcttatac R-ggaacacagccatcattag
62
2.0
211–293
18
73
0d
62
2.0
356–404
10
75
0.03645d
62
2.0
325–373
12
75
0.00488d
62
2.0
320–376
11
91
0.02089d
62
2.0
333–364
8
73
0.44623
52
2.5
335–380
9
83
0.00225d
62
2.0
263–289
8
7
0.00115d
62
2.0
269–286
5
87
0.01464d
62
2.0
251–303
11
95
0.02886d
52
2.5
300–353
9
1
0.00451d
52
2.5
236–309
10
73
62
2.0
293–361
11
78
62
2.0
172–237
12
81
52
2.5
356–388
9
93
52
2.5
214–267
11
72
0d
62
2.0
188–246
10
94
0d
62
2.0
361–403
14
9
0d
62
2.0
197–258
10
66
62
2.0
280–353
15
78
0d
62
2.0
172–221
11
83
0d
62
2.0
168–205
8
65
0d
62
2.0
278–327
10
8
62
2.0
226–259
8
7
(AGAT)11 (TAGA)22 (AGAT)27 (AGAT)19 (CTAT)18 (AGAT)12 (ATCT)11 (ATCT)19 (ATCT)19 (TAGA)22 (TAGA)28 (TAGA)33 (AGAT)20 (CTAT)23 (ATCT)32 (ATCT)21 (ATCT)20 (TAGA)34 (TAGA)16 (CTAT)12 (ATAG)28 (CTAT)16
Marker was used for FST determination. Imperfect repeat. c Exact P-value using a Markov chain method (50 batches with 1000 iterations per batch). d Loci are statistically significant for differentiating two populations. e TA is the PCR annealing temperature in degree Celsius. b
0d 0.00091d 0d 0.07568
0.00029d
0.00012d 0d
975 number, PCR annealing temperatures and MgCl2 concentrations, observed allele size ranges, number of alleles and percent heterozygosity are reported in Table 2. Arlequin version 2.000 software (Schneider et al. 2000) was used to estimate Wright’s F-statistics (FST) and to evaluate HWE and LD for the marker loci for each population. Thirty-five loci (missing genotype data