Identification and characterization of microsatellites for ... - Springer Link

Springer 2006

Conservation Genetics (2006) 7:971–982 DOI 10.1007/s10592-006-9122-0

Identification and characterization of microsatellites for striped bass from repeat-enriched libraries Caird Rexroad1,*, Roger Vallejo1, Issa Coulibaly1, Charlene Couch2, Amber Garber2, Mark Westerman3 & Craig Sullivan2 1

USDA/ARS/NCCCWA, 11861 Leetown Road, Kearneysville, West Virginia 25430, USA; 2Department of Zoology, North Carolina State University, Raleigh, North Carolina 27695, USA; 3Kent SeaTech Corporation, 11125 Flintkote Avenue, Suite J, San Diego, California 92121, USA (*Corresponding author: Phone: +1-304-724-8340, ext. 2129; Fax: +1-304-725-0351; E-mail: [email protected])

Received 21 December 2005; accepted 17 January 2006

Key words: breeding, genetics, heterozygosity, microsatellite, striped bass

Abstract Striped bass (Morone saxatilis) is economically important in the US due to its value as an aquaculture species and in supporting commercial and recreational fisheries, especially those off the Atlantic coast and in the Gulf of Mexico. Modern strategies for managing fishery populations and aquaculture broodstocks employ the use of molecular genetic markers to identify individuals, assign parentage, and characterize population genetic structure and levels of inbreeding and migration. As part of a collaborative effort to utilize molecular genetic technologies in striped bass breeding programs we generated microsatellite markers for use in population genetic studies, broodstock selection and management strategies, and the construction of a genetic map. We developed 345 new microsatellite markers for striped bass, a subset (n=71) of which was characterized by genotyping samples from two striped bass broodstock populations to evaluate marker polymorphism, percent heterozygosity, Hardy–Weinberg equilibrium (HWE), linkage disequilibrium (LD) and utility for population genetic studies.

Striped bass (Morone saxatilis) is economically important in the US due to its value as an aquaculture species and in supporting commercial and recreational fisheries, especially those off the Atlantic coast and in the Gulf of Mexico. Modern strategies for managing fishery populations and aquaculture broodstocks employ the use of molecular genetic markers to identify individuals, assign parentage, and characterize population genetic structure and levels of inbreeding and migration. Several studies have utilized molecular genetic markers to characterize wild populations and migration patterns of this anadromous fish (Wirgin et al. 1991, 1993, 1997; Leclerc et al. 1996; Bielawski and Pumo 1997; Han et al. 2000; Roy et al. 2000; Ross et al. 2004; Brown et al. 2005).

As part of a collaborative effort to employ molecular genetic technologies in striped bass breeding programs we generated microsatellite markers for use in population genetic studies, broodstock selection and management strategies, and the construction of a genetic map. We developed 345 new microsatellite markers for striped bass, a subset (n=71) of which was characterized by genotyping samples from two striped bass broodstock populations to evaluate marker polymorphism, percent heterozygosity, Hardy–Weinberg equilibrium (HWE), linkage disequilibrium (LD) and utility for population genetic studies. Four repeat-enriched libraries were constructed by Genetic Identification Services (GIS; Chatsworth, CA, USA) to identify AAG, ATG, TACA,

972 and TAGA repeats in striped bass. The libraries were constructed from genomic DNA using protocols which enrich for fragments containing a motif of interest (Peacock et al. 2002). Single pass sequencing using the ABI Prism Big Dye Terminator cycle sequencing kit (ABI, Foster City, CA, USA) with a universal M13 forward primer was conducted on 408 clones from each of the four libraries. Sequences were trimmed for quality and vector sequences removed using Phred and Crossmatch (Ewing et al. 1998). All sequences in the data set and all other Morone saxatilis microsatellites from GenBank were analyzed for redundancy using Sequencher (Gene Codes Corporation, Ann Arbor, Michigan). Novel unique sequences were candidates for primer design using Oligo 6.0 (Rychlik and Rhoads 1989). An M13 tail was added to each forward primer for genotyping (Boutin-Ganache et al. 2001). Primer pairs were optimized for PCR by varying annealing temperatures and MgCl2 concentrations. Optimizations were conducted in a 12 ll reaction volume containing 12.5 ng DNA, 1.5–2.5 mM MgCl2, 1.0 lM of each primer, 200 lM of dNTPs, 1 manufacturer’s reaction buffer, and 0.5 Units Taq DNA polymerase. Thermal cycling parameters consisted of an initial denaturation at 95 C for 15 min followed by 30 cycles of 95 C for 1 min, annealing temperature for 45 s, 72 C extension for 45 s and a final extension at 72 C for 10 min. PCR products were visualized on agarose gels after staining with ethidium bromide. Primers were designed for 398 microsatellites, 345 of which were successfully optimized for PCR (see Appendix).

Marker development efficiency was 16.9% for AAG, 18.6% for ATG, 12.5% for TACA, and 22.5% for TAGA. Specific marker development efficiency (markers containing the enriched-repeat) was 8% for AAG, 15.9% for ATG, 7% for TACA, and 20.3% for TAGA (see Table 1). A subset of 71 microsatellites was randomly chosen for further characterization by genotyping striped bass from University of Maryland broodstock (UMD, n=14) and from North Carolina State University broodstock (NCSU, n=10). DNA was isolated from fin clips using the phenol– chloroform method (Sambrook and Russell 2001). Optimized PCR reaction conditions for each marker were used with the addition of 0.1 lM of fluorescently labeled (FAMTM, HEXTM, or NEDTM) M13 primer. Samples were electrophoresed on an ABI Prism 3730 DNA Analyzer and output files were analyzed using GeneMapper software version 3.7. Of the 71 microsatellite markers genotyped, 13 were monomorphic in both broodstock populations and an additional five were monomorphic in the NCSU population. Five markers consistently produced more than two fragments, suggesting the presence of PCR artifacts or duplication of the locus within the genome (MSM1373, MSM1374, MSM1382, MSM1402, and MSM1447). Of these, amplification of MSM1402 produced two sets of fragments separated by ‡125 base pairs in all individuals and therefore was treated as two separate loci for population analyses (MSM1402a and MSM1402b). Locus name, GenBank accession

Table 1. Discovery of microsatellites from repeat-enriched libraries Library

AAG ATG TACA TAGA GISd Totals a

# Clones sequenced

408 408 408 408 1632

# Markers developed

69 76 51 92 57 345

# Matcheda

33 65 29 83 210

# Perfect repeatsb

6 18 11 50 12 97

Unmatchedc Di

Tri

26 8 21 6 12 73

7 1 2 16 26

Tetra

Other

2 2 1 1 29 35

1

1

Number of markers whose repeat matched the enriched repeat for that library. Number of markers whose repeat matched the enriched repeat and are perfect. c Number of markers whose repeat did not match the enriched repeat. (Di, Tri and Tetra designations represent di-, tri- or tetranucleotide repeat types). d Number of markers identified from two or more of the GIS libraries. b

973 Table 2. Summary data for 51 microsatellite loci for striped bass P-valuec

Locus GenBank

Repeat

Forward primer, Reverse primer

TAe

MgCl2

Size range

# Alleles

MSM1334a BV678387 MSM1335a BV678388 MSM1355a BV678319 MSM1357a BV678321 MSM1359 BV678323 MSM1362a BV678325 MSM1375a BV678410 MSM1394 BV678429 MSM1402a BV678437 MSM1402ba MSM1424a BV678456 MSM1427a BV678459 MSM1429a BV678461 MSM1437a BV678469 MSM1442a BV678474 MSM1443a BV678475 MSM1445a BV678477 MSM1451 BV678483 MSM1453a BV678484 MSM1491a BV678519 MSM1501a BV678529 MSM1520a BV678547 MSM1524a BV678550 MSM1526a BV678552 MSM1536a BV678561 MSM1547a BV678569 MSM1555a BV678575

(ATAG)24

F-aaaagttctgcacatggatcg R-gtccttgccctccagttacag F-gcctgtttgccctcatatagc R-gatgtttccagccttttacgc F-atcctgactgcatccactagccta R-tgtgtggcccaatactagactgac F-gtacctgacacgcataatg R-ccaagcaaaccgtttagtg F-accatgcaccttaggacacta R-ttcatgcccatttagacacca F-atccactttgctgctgttaca R-gcagcacgtacgcagtacaca F-gtgtcggatgatgactctt R-ccactacacccctcacgtt F-cagtgtaggcaatcggagtt R-tctccctctttgcatgtcca F-cgcagctagctaacgagtgtatc R-ggtcgttcaaggaaaaaggttgg

62

2.0

301–365

10

69

0.00025d

58

2.0

238–248

4

52

0.00221d

62

2.0

304–353

10

91

0.57967

58

2.0

242–282

11

91

0.00147d

62

2.0

361–381

6

81

0.04037d

62

2.0

343–396

11

75

0.06761

58

2.0

206–267

7

69

0.33174

58

2.0

188–206

4

36

0.03061d

62

2.0

194–200

10.00

69

0.00256d

62

2.0

325–373 151–157

3.00 2

61 16

0.38964 0.03442d

62

2.0

341–381

7

58

0.00009d

62

2.0

247–359

17

82

0.00095d

62

2.0

180–196

4

29

62

2.0

320–336

2

41

62

2.0

288–303

4

21

0d

58

2.0

136–140

3

33

0d

58

2.0

182–184

2

100

0.00002d

58

2.0

183–202

5

75

0.00044d

58

2.0

196–208

5

66

0.00039d

62

2.0

179–182

2

25

0.00003d

62

2.0

172–222

15

100

0.00943d

62

2.0

337–345

3

75

0.92657

58

2.0

159–176

8

91

0.00039d

62

2.0

254–296

10

91

0.00767d

58

2.0

280–353

15

78

62

2.0

323–360

6

29

(CAT)7(CAC)7 (CA)20 (GATA)21 (ATCT)17 (ATCT)23 (CT)14 (GAA)12 (CTT)12

(AAG)13b (GGT)8(TCT)14b (GAA)40b (CTT)8b (GGAT)8b (TA)8(TG)13 (GAT)9 (GAT)8(GT)21b (GAT)12b (TGA)9 (CAT)11 (CA)31 (TACA)13 (GT)26 (GT)29 (TATG)14 (TATC)16

F-tagtcagccaagggagacgag R-ctccgcttcctcacgcatcat F-cttacctggttcattgtc R-cagaaaggaggcgataac F-ctacaactttgttggacga R-gcagtgggaagtggacttt F-tcgacagatctgcgctcta R-gtgaaggtgcggttggaca F-gagaatgaaaggagggtggatta R-ctttctcctgcctgccagttgta F-ccagagtgaagaccaggtagt R-agagggaagaagtggaacgtg F-gagagagctggatgcagaatgat R-ctgacgcgacagcttgtgtatga F-gattagccatgtgcttag R-cctctcctttacgtggtg F-gcttgttctgtgccctatctc R-gcaaagggacggcaaatttag F-gcatgcggatgaataactgtc R-ttcgagggcatctgaaccttg F-gtcagagggaataataacatgacc R-agttcagttcgagagtaatgcacg F-tctgcgtctcccatactgt R-ttcaagcgttccggttctc F-agcaccttcatggtacccaag R-cgaacgcaaccgcagtaatgt F-gactctgttgcatttaggt R-ggctgtattccatttaggc F-ccatatgagccaggtaaagta R-gaaacacaaacacgcacatct F-atgcaagcttaaccctgtt R-ccctctgctggctgacata F-cctgtttacctgctagtccac R-agtgaagttagcccggtctga

% Het

0d 0.01658d

0d 0.4197

974 Table 2. (Continued)

a

Locus GenBank

Repeat

Forward primer, Reverse primer

TAe

MgCl2

Size range

# Alleles

% Het

P-valuec

MSM1556a BV678576 MSM1557a BV678577 MSM1558a BV678578 MSM1559a BV678579 MSM1560a BV678580 MSM1577a BV678596 MSM1578a BV678597 MSM1579a BV678598 MSM1584 BV678601 MSM1587 BV678604 MSM1589 BV678606 MSM1591a BV678608 MSM1592 BV678609 MSM1598 BV678613 MSM1602 BV678616 MSM1604 BV678618 MSM1617 BV678627 MSM1625 BV678633 MSM1626a BV678634 MSM1628a BV678636 MSM1634 BV678641 MSM1638 BV678644 MSM1645a BV678651

(AGAT)26

F-taccagaagaacaccctagac R-tatgtcctgctactgcgattc F-gtggatcagtgagcgactatgtat R-tgcactgcttactgtagcttccgt F-gggagttctttggtcaagg R-tttctcgatcaccgccact F-agctgacctcctgtgctatct R-agaagcgagagcgtcaagtga F-ttagagggctaatgagtacag R-cacagacccaaactgatactg F-gaaagagggcatcataatacagca R-caaaacaaagagagcgcggtctaa F-gggcaagttgacaggtagaca R-gttagcctcattcgccccaat F-atacagtgtcacccggtactat R-ggacagttatccctgcactatg F-ctctaaggctaatggggttac R-ctgttctgcgttttagttgga F-gaggggtcaagacatttaaccat R-tgtgtcgatgtccttgtcctacg F-tgaataccttcctgtagcct R-catccatttgcaccgtttac F-gatggatggctttcctaccta R-agccacacgattatgaccacc F-cggcactggataaagttac R-tacaatttccctcgggatg F-gacaaatcctctgatgaac R-ctcagcgtctgttaccgta F-gggaatgtgctaaacttggaatag R-tgggattactgggcagatagttga F-attagtctgtggataccgctgga R-acagggctgcagtggaaggtaag F-agagtgggggaagaagtcgttt R-gctgaactctggtccaaggcaa F-aagcttccatatagtgcaccc R-tgtccgagtttgcctgatctc F-cgcaactacgtttcgttaccata R-cagaagccaggcagactgcatac F-aatcccacatggagttgtag R-ccagaccaataaaacgtccc F-gagtgtaatgctgctgctacaact R-tgatttctaacaacagccaacgtg F-acagtgacacacgcttatatgc R-cgttgcccatttcctcttacat F-caatgcaccactcttatac R-ggaacacagccatcattag

62

2.0

211–293

18

73

0d

62

2.0

356–404

10

75

0.03645d

62

2.0

325–373

12

75

0.00488d

62

2.0

320–376

11

91

0.02089d

62

2.0

333–364

8

73

0.44623

52

2.5

335–380

9

83

0.00225d

62

2.0

263–289

8

7

0.00115d

62

2.0

269–286

5

87

0.01464d

62

2.0

251–303

11

95

0.02886d

52

2.5

300–353

9

1

0.00451d

52

2.5

236–309

10

73

62

2.0

293–361

11

78

62

2.0

172–237

12

81

52

2.5

356–388

9

93

52

2.5

214–267

11

72

0d

62

2.0

188–246

10

94

0d

62

2.0

361–403

14

9

0d

62

2.0

197–258

10

66

62

2.0

280–353

15

78

0d

62

2.0

172–221

11

83

0d

62

2.0

168–205

8

65

0d

62

2.0

278–327

10

8

62

2.0

226–259

8

7

(AGAT)11 (TAGA)22 (AGAT)27 (AGAT)19 (CTAT)18 (AGAT)12 (ATCT)11 (ATCT)19 (ATCT)19 (TAGA)22 (TAGA)28 (TAGA)33 (AGAT)20 (CTAT)23 (ATCT)32 (ATCT)21 (ATCT)20 (TAGA)34 (TAGA)16 (CTAT)12 (ATAG)28 (CTAT)16

Marker was used for FST determination. Imperfect repeat. c Exact P-value using a Markov chain method (50 batches with 1000 iterations per batch). d Loci are statistically significant for differentiating two populations. e TA is the PCR annealing temperature in degree Celsius. b

0d 0.00091d 0d 0.07568

0.00029d

0.00012d 0d

975 number, PCR annealing temperatures and MgCl2 concentrations, observed allele size ranges, number of alleles and percent heterozygosity are reported in Table 2. Arlequin version 2.000 software (Schneider et al. 2000) was used to estimate Wright’s F-statistics (FST) and to evaluate HWE and LD for the marker loci for each population. Thirty-five loci (missing genotype data