Mol Breeding DOI 10.1007/s11032-013-9970-5
Development of single nucleotide polymorphisms in Phaseolus vulgaris and related Phaseolus spp.
D. Goretti • E. Bitocchi • E. Bellucci • M. Rodriguez • D. Rau • T. Gioia • G. Attene • P. McClean • L. Nanni • R. Papa
Received: 15 April 2013 / Accepted: 30 September 2013
© Springer Science+Business Media Dordrecht 2013
Abstract In this study, new single nucleotide polymorphism (SNP) markers were developed for common bean (Phaseolus vulgaris L.) and related Phaseolus species. The applied strategy presents new and interesting aspects, such as the choice of accessions used, which was aimed at capturing a large portion of the genetic diversity present in the common bean, with particular focus on wild and domesticated materials from Mesoamerica and the identification of loci for sequencing. Indeed, the primer pairs for 34 loci were designed with the main strategy being to search for
D. Goretti and E. Bitocchi have contributed equally to this study.
Electronic supplementary material The online version of this article (doi:10.1007/s11032-013-9970-5) contains supplementary material, which is available to authorized users.
D. Goretti • E. Bitocchi • E. Bellucci • L. Nanni • R. Papa (corresponding author), Dipartimento di Scienze Agrarie, Alimentari ed Ambientali, Università Politecnica delle Marche, Via Brecce Bianche, 60131 Ancona, Italy; e-mail: [email protected]
M. Rodriguez • D. Rau • G. Attene, Dipartimento di Agraria, Università degli Studi di Sassari, Via de Nicola, 07100 Sassari, Italy
M. Rodriguez • G. Attene, Centro per la Conservazione e la Valorizzazione della Biodiversità Vegetale (CBV), Università degli Studi di Sassari, 07041 Surigheddu, Alghero, Italy
single-copy orthologous genes among the legumes (for use in other legume species and comparative analyses). The 10 remaining loci were selected as being near to domestication quantitative trait loci or detected as putatively under selection during domestication in previous studies. To provide an efficient and inexpensive genotyping platform for geneticists and breeders, we used sequence data to develop 60 new SNP markers for KASPar assay genotyping. The same sample was also genotyped with SNP markers developed for common bean in other studies for the same assay. This allowed testing for systematic bias according to the criteria chosen to select the genotypes in which the genetic diversity is surveyed during SNP discovery. Finally, we show that most of the SNP markers worked well in a set of accessions of other species belonging to the Phaseolus genus. The genetic resources developed
T. Gioia, Scuola di Scienze Agrarie, Forestali, Alimentari ed Ambientali, Università degli Studi della Basilicata, Via dell’Ateneo Lucano 10, 85100 Potenza, Italy
P. McClean, Department of Plant Sciences, North Dakota State University, 166 Loftsgard Hall, Fargo, ND 58105, USA
R. Papa, Consiglio per la Ricerca e la Sperimentazione in Agricoltura, Cereal Research Centre (CRA-CER), S.S. 16, Km 675, 71122 Foggia, Italy
will be very useful not only for breeding, but also for biodiversity conservation management and evolutionary studies on legumes.
Keywords Sanger sequencing • KASPar genotyping • Phaseolus • SNP • Genetic diversity
Introduction
Different molecular markers are being developed for the assessment of genetic diversity in plant species. These represent important tools for plant genetic research. Recently, scientists and geneticists have focused on the use of single nucleotide polymorphisms (SNPs), as these have many advantages over other molecular markers. SNPs are bi-allelic and co-dominant, and show a simpler mutational model, while being less susceptible to homoplasy than other markers such as microsatellites (Schlötterer 2004). Their higher frequency in the genome compared to other molecular markers compensates for their lower polymorphism rate (Kruglyak 1997). SNPs are useful for both basic and applied research, such as evolutionary studies, analysis of genome structure, genome-wide association mapping, and integration of genetic maps, which represent useful tools for plant breeding (Rafalski 2002; Kumar et al. 2012). Furthermore, because of recent progress in sequencing, large numbers of SNPs can be detected in a short time and at relatively low cost. Currently, there are several high-throughput genotyping platforms for SNP detection (Kumar et al. 2012). KASPar assays appear to be a versatile genome-wide SNP genotyping system that has desirable features, in that the assays (1) are flexible in terms of the number of SNPs (the assay runs as a standard polymerase chain reaction [PCR]) and genotypes to be used; (2) are rapid for data acquisition; (3) are relatively inexpensive; and (4) result in negligible mismatch levels compared to Sanger sequencing. However, KASPar assays require previous knowledge of the sequence and conserved regions (~40 bp) downstream and upstream of the SNPs. Generally, SNP markers are developed from a small subset of germplasm, and are then used in arrays for genotyping of larger numbers of individuals. This strategy of data generation through SNPs is rapid and cheap, although it also has some problems, as it can
lead to ‘ascertainment bias’ (Clark et al. 2005). This is because a set of markers identified in a certain sample can bias genetic analyses carried out in populations with different allele frequencies. Several studies have demonstrated that SNP detection can introduce bias into various population-genetics parameters, such as nucleotide diversity and linkage disequilibrium (Kuhner et al. 2000; Nielsen 2000; Schlotterer and Harr 2002; Akey et al. 2003; Moragues et al. 2010; Frascaroli et al. 2013). Although ascertainment bias is important in all SNP studies, it becomes crucial when SNPs are developed for species with complex (or unknown) population structures. Indeed, when populations are highly structured, the variation present is less likely to be included in all of the subpopulations, and incorrect inferences can arise (for review, see Rosenblum and Novembre 2007). The common bean (P. vulgaris L.) is the most important legume for human consumption throughout the world, and is one of the most important staple foods in developing countries. From a nutritional point of view, although they are poor in methionine, beans are an excellent source of minerals, especially iron and zinc, and they also contain proteins that are characterized by sulfur-based essential amino acids, such as lysine; nutritionally, they thus complement cereals (Broughton et al. 2003). P. vulgaris has an evolutionary history that is almost unique among crops, in that it is characterized by two main geographically distinct and partially isolated gene pools (Mesoamerica and Andes) that originated prior to domestication. This led to the occurrence of two independent domestication events (Bitocchi et al. 2012, 2013). A third gene pool is localized between Peru and Ecuador (Debouck et al. 1993), and includes only the wild forms of P. vulgaris that are characterized by a specific seed protein (phaseolin type I; Kami et al. 1995). 
Several studies have reported the identification of SNPs in the common bean and their use in linkage map construction and in diversity and synteny analyses. Ramirez et al. (2005) developed expressed sequence tag (EST) libraries of the Mesoamerican genotype Negro Jamapa 81 and the Andean genotype G19833, which represent the first sizeable resource of SNPs in common bean. On the basis of these ESTs, Galeano et al. (2009a, b) developed single-strand conformation polymorphism markers. Gaitán-Solís et al. (2008) identified 239 SNPs by investigating the nucleotide diversity of 47 fragments of DNA from 10
P. vulgaris accessions. Cleaved amplified polymorphic sequence (CAPS) and derived CAPS (dCAPS) approaches have been used to generate gene-based SNP markers from the Andean genotype JaloEEP558 and the Mesoamerican genotype BAT93 (McConnell et al. 2010). By using the same genotypes, Hyten et al. (2010) designed and validated non-genic SNPs, using a reduced representation library from multiple rounds of nested digestions, with sequencing carried out by 454 pyrosequencing and Solexa technologies. Recently, Cortés et al. (2011) used data from the literature (Ramirez et al. 2005; Galeano et al. 2009a, b; Hyten et al. 2010) to develop 94 SNP markers for KASPar genotyping. Galeano et al. (2012) identified about 170 new intron-SNPs from amplification of 313 intronic regions in the DOR364, BAT477 and G19833 genotypes, while Blair et al. (2013) constructed a Golden Gate assay for 768 SNPs (based on sequence data from about 1,400 gene fragments of JaloEEP558 and BAT93 genotypes). In the present study, we developed SNP markers for both evolutionary and applied common bean research, by (1) exploiting our knowledge of the level and organization of the genetic diversity of common bean germplasm to select a panel of accessions for SNP discovery that captures the genetic diversity of the species; (2) investigating the nucleotide diversity in gene fragments (mostly single-copy orthologous genes among legumes) through Sanger sequencing; (3) using sequence data to develop SNP markers for KASPar assays (a high-throughput platform); (4) evaluating ascertainment bias in the population genetics estimates computed with the SNP set developed in this study and the set identified by Cortés et al. (2011) using a different sampling strategy; and (5) applying these SNP markers to other Phaseolus species.
Materials and methods
Plant materials
A panel of 22 accessions of P. vulgaris was used for the SNP discovery. These accessions had already been characterized with different types of molecular markers and nucleotide data in previous studies (Rossi et al. 2009; Bitocchi et al. 2012, 2013; Desiderio et al. 2013). A major outcome highlighted by these studies was that the Mesoamerican gene pool is characterized
by the highest level of genetic diversity, and thus 18 of the selected accessions are representatives of the Mesoamerican gene pool. The remaining four genotypes included one wild accession and two domesticated accessions from the Andean gene pool, and one accession from northern Peru that is characterized by the ancestral type I phaseolin. The seeds were provided by the United States Department of Agriculture Western Regional Plant Introduction Station (Pullman, WA, USA), the International Centre of Tropical Agriculture in Colombia, and Prof. Paul Gepts, Department of Plant Sciences, Section of Crop and Ecosystem Sciences, UC Davis, CA, USA. A complete list of the accessions studied, along with their ‘passport’ information, is given in Supporting Information Table S1. The SNPs identified in P. vulgaris were genotyped in a set of 37 wild accessions from 10 different Phaseolus species (see Supporting Information Table S2).
Primer design, PCR and sequencing
Identification of loci through homology with the soybean genome
The main strategy was to detect single-copy orthologous genes between legume species and to design primers in conserved regions for the amplification of gene-based fragments. We used 102 gene fragments previously amplified in soybean (Zhu et al. 2003; Hyten et al. 2006) to search for homologous regions in Phaseolus spp. BLASTn (Altschul et al. 1997) analyses were carried out against the NCBI/GenBank (http://www.ncbi.nlm.nih.gov/blast/) nucleotide collection (nr/nt), EST, and genomic survey sequence (GSS) databases. As coding regions are generally more conserved, exon-derived primers were preferred, so as to minimize PCR failure. Whenever possible, noncoding regions were included, because of their high genetic variation. PRIMER3 software (Rozen and Skaletsky 2000) was used for primer design. The annealing temperature (optimized to 58 °C through primer design), G/C content, and secondary structures were checked using PERLPRIMER v1.1 (Marshall 2004).
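As a rough illustration of the kind of primer checks described above, the following Python sketch computes the G/C content and an approximate melting temperature for a primer. The primer sequence is hypothetical (it is not one of the AN-Pv primers), and the Wallace rule (2 °C per A/T, 4 °C per G/C) is used only as a simple stand-in; it is not the thermodynamic model applied by PRIMER3 or PERLPRIMER.

```python
def gc_content(primer: str) -> float:
    """Return the G/C percentage of a primer sequence."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) with the Wallace rule."""
    seq = primer.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Hypothetical 20-mer used only for illustration.
example_primer = "ATGGCTCAAGTTCCTGATGC"
print(f"GC = {gc_content(example_primer):.1f} %")
print(f"Tm (Wallace rule) = {wallace_tm(example_primer)} deg C")
```

The Wallace rule is only reasonable for short oligonucleotides, so a nearest-neighbour model would be preferable for the longer KASPar oligonucleotides.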
The code ‘AN-Pv’ followed by a number was used to label the primer pairs developed. Genomic DNA of each accession was extracted from young leaves of a single, greenhouse-grown plant using the miniprep extraction method (Doyle and Doyle
1987). PCR analyses were performed using 25 ng genomic DNA template and the following reagents: 0.25 µM each of the forward and reverse primers, 200 µM of each dNTP, 2 mM MgCl2, 1× Taq polymerase buffer, 1 unit Taq DNA polymerase (Jena Bioscience, Germany), and sterile double-distilled H2O, to a final volume of 50 µl. Amplifications were carried out with a 9700 Thermal Cycler (Perkin-Elmer Applied Biosystems) with an initial denaturation of 2 min at 95 °C followed by 28 cycles of 1 min at 95 °C, 1 min at 58 °C, and 2 min at 72 °C, plus 10 min of final extension at 72 °C. The amplification products were visualized following 1.5 % agarose gel electrophoresis, with ethidium bromide staining. The PCR products were purified using GFX PCR DNA and Gel Band Purification kits (GE Healthcare, Buckinghamshire, UK), according to the manufacturer’s instructions. A single strand sequencing reaction was performed using Big Dye Terminator Cycle Sequencing Ready Reaction kits (Applied Biosystems, Foster City, CA, USA). The products were resolved on an ABI Prism 3100-Avant Automated Sequencer (Applied Biosystems). For three loci for which the amplification was more difficult, the samples were sequenced on both strands, using forward and reverse primers in the cycle sequencing reaction. The sequence data were then analyzed using Pregap4 and Gap4 of the Staden software package (http://staden.sourceforge.net/). The Pregap4 modules were used to prepare the sequence data for assembly (quality analysis). Gap4 was used for the final sequence assembly of the Pregap4 output files (normal shotgun assembly). The fragments were resequenced if there was any ambiguity as to which allele was present. The sequences are accessible through GenBank (GenBank accession numbers KF695425–KF696358 and Supplementary File 1). The chromosomal locations of the loci in the P. vulgaris genome were identified through electronic mapping based on synteny with the Glycine max genome.
The complete list of these loci is given in Supporting Information Table S3, along with the primer sequences, gene functions, chromosomal locations, and soybean and Phaseolus spp. reference sequences used for primer design.
Other loci
A further 10 loci were analyzed in this study (Supporting Information Table S3), seven of which
were from the literature: Leg443 (Hougaard et al. 2008), g510 and g523 (McConnell et al. 2010), and four sequence-tagged site (STS) loci, gssE18, gssE19, gssE20, and gssE28, developed by Bellucci (2006) using amplified fragment length polymorphism (AFLP) data (Papa et al. 2007), with the aim of identifying molecular markers that could be used to tag the genomic areas of high divergence between wild and domesticated P. vulgaris forms. The primer pairs for AN-PvCO, AN-TGA, and AN-DNAJ were developed in this study. In particular, AN-PvCO is homologous to the CONSTANS (CO) gene in Arabidopsis, which is involved in flowering time (Putterill et al. 1995; Samach et al. 2000; Hayama and Coupland 2004). It has been assumed that this gene has a similar role in Phaseolus, although there has been no scientific evidence of this to date (Kwak et al. 2008). The TGA gene encodes a TGA-type basic leucine-zipper protein, while the DNAJ gene encodes a DNAJ heat-shock family protein. The PCR conditions and sequencing procedures for these additional loci were the same as those reported for loci derived from soybean genes, with the only difference being the annealing temperature, which was specific for each locus, as shown in Supporting Information Table S3.
SNP marker design for KASPar genotyping
A further 10 loci were selected from McConnell et al. (2010) (Supporting Information Table S4) because they are located near domestication-related QTLs, and they were sequenced in a subsample of six P. vulgaris accessions (see Supporting Information Table S1; GenBank accession numbers KF696330–KF696358 and Supplementary File 1). Editing and alignment of the sequences were performed as described for the AN-Pv loci. The data obtained from the Sanger sequencing were used to design a set of primers for the amplification of SNP markers for the KASPar genotyping platform (KBioscience, Hoddesdon, UK; http://www.kbioscience.co.uk).
This uses a technique based on allele-specific oligo-extension and energy-transfer-based detection. A total of 60 SNP markers (hereafter defined as SNPsetA) were developed based on sequence data from loci described in the present study. The primer design was performed using PrimerPicker software (KBiosciences); the output provides sequence information for two
allele-specific oligonucleotides of about 40 bp in length and one common oligonucleotide of about 20 bp in length. All of these are standard unmodified and unlabelled oligonucleotides. Detailed information on all of the markers is given in Supporting Information Table S5. In common bean, 94 SNP markers were developed for the same genotyping technology by Cortés et al. (2011); thus, we decided to also use these SNPs (hereafter defined as SNPsetB) to genotype the 22 P. vulgaris accessions and 37 accessions that included 10 different Phaseolus species. The genotyping was performed by KBioscience (Hoddesdon, UK). Duplicate samples and negative controls (water and blank whole-genome amplification reactions) were incorporated to validate reproducibility. The protocol used is provided by KBiosciences (http://www.lgcgenomics.com/genotyping/).
Genetic diversity analyses
Sanger sequencing data
The sequence alignment and editing for each of the gene fragments successfully amplified in the sample of 22 P. vulgaris accessions were carried out using MUSCLE version 3.7 (Edgar 2004), and BioEdit version 7.0.9.0 (Hall 1999). BLASTn and BLASTp (Altschul et al. 1997) analyses were carried out against the NCBI/GenBank database to identify homologies with Phaseolus spp. and Glycine spp. nucleotide and peptide sequences, respectively. This allowed the identification of exons, introns, 3′ untranslated region (UTR) and 5′ UTR for 40 out of the 44 loci considered. Considering the whole sequence and the coding and noncoding regions separately, the following diversity estimates were computed: V (number of variable sites), Pi (parsimony informative sites), S (singleton variable sites), H (number of haplotypes), Hd (haplotype diversity; Nei 1987), π (Tajima 1983), and θ (Watterson 1975). Insertion/deletions (indels) were not included in the analyses.
These estimates were computed for the whole sample and for the following groups: Mesoamerican accessions, and Mesoamerican wild (MW) and domesticated (MD) forms. Molecular population genetics analyses were conducted using DnaSP version 5.10.01 (Librado and Rozas 2009). The loss of nucleotide diversity in the domesticated versus wild populations in Mesoamerica was computed using the statistic
$L_\pi = 1 - (\pi_{MD}/\pi_{MW})$ (Vigouroux et al. 2002), where $\pi_{MD}$ and $\pi_{MW}$ are the nucleotide diversities in the domesticated and wild populations, respectively. Using concatenated sequences, an unrooted neighbor-joining (NJ) tree based on Kimura two-parameter distances was developed to investigate the relationships between the P. vulgaris accessions. The relative support for each node was tested using the bootstrap method, with 1,000 replicates, in MEGA version 4 (Tamura et al. 2007).
SNPs from the KASPar genotyping
The SNP data obtained by the KASPar genotyping were used to analyze the level of genetic diversity in the whole sample and in the Mesoamerican, MW, and MD populations of the common bean. The following statistics were computed: average numbers of observed alleles per locus (Na), effective numbers of alleles per locus (Ne; Kimura and Crow 1964) and unbiased expected heterozygosity (He; Nei 1978). The loss of genetic diversity in the domesticated versus wild populations in Mesoamerica was computed using the statistic $\Delta H = 1 - (He_{MD}/He_{MW})$ (Vigouroux et al. 2002), where $He_{MD}$ and $He_{MW}$ are the genetic diversities in the domesticated and wild populations, respectively. The genetic relationships between the accessions were investigated using principal component analysis (PCA). All of the analyses were performed using Genalex 6.41 (Peakall and Smouse 2006). The differences between the populations for the genetic diversity estimates were tested using Wilcoxon–Kruskal–Wallis nonparametric tests (Sokal and Rohlf 1995).
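The per-locus statistics and the two loss-of-diversity ratios defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the GenAlEx or DnaSP implementation: the function names are hypothetical, missing calls are assumed to be encoded as None, and the small-sample correction is written in terms of the number of sampled allele copies n (an assumption about how the Nei 1978 correction is applied to these data).

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return (frequencies, n) for one locus, ignoring missing calls (None)."""
    called = [a for a in alleles if a is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called alleles at this locus")
    counts = Counter(called)
    return {allele: c / n for allele, c in counts.items()}, n


def effective_alleles(freqs):
    """Effective number of alleles, Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def unbiased_he(freqs, n):
    """Expected heterozygosity with an n / (n - 1) small-sample correction."""
    if n < 2:
        raise ValueError("need at least two allele copies")
    return (n / (n - 1)) * (1.0 - sum(p * p for p in freqs.values()))


def diversity_loss(domesticated, wild):
    """Loss of diversity, 1 - (domesticated / wild); used for L_pi and Delta H."""
    return 1.0 - domesticated / wild


# Toy locus: eight allele copies, six 'A' and two 'G' (hypothetical data).
freqs, n = allele_frequencies(["A"] * 6 + ["G"] * 2)
print(f"Ne = {effective_alleles(freqs):.2f}, He = {unbiased_he(freqs, n):.2f}")
```

Because both ratios share the same form, a single `diversity_loss` function serves for $L_\pi$ (with nucleotide diversities as input) and for $\Delta H$ (with expected heterozygosities as input).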
Results
Sanger sequencing data
We developed a set of primer pairs that were useful for the amplification of gene fragments in P. vulgaris through Sanger sequencing. We used 102 soybean gene fragments (Zhu et al. 2003; Hyten et al. 2006) to search for homologous regions in Phaseolus spp., with the aim of identifying single-copy orthologous genes between legume species, and then designing primers in conserved regions to amplify gene-based fragments. Seventy-eight soybean gene fragments had homologs in the Phaseolus genus, and, of these, 69 showed
homology with P. vulgaris accessions. Eight loci were redundant in Phaseolus spp., and were thus excluded. Seventy loci were selected for primer design. Out of the 70 designed primer pairs, 40 loci were successfully amplified (with a single, clearly amplified band) in all or most of the 22 P. vulgaris accessions. The primer pairs that gave no amplification or multiple products (43 %) were not examined further. Six loci were excluded because the amplification resulted from two or more homologous regions. Thus, we obtained successful amplifications and good quality sequencing for 34 primer pairs, which corresponded to 33 % of the original soybean gene-fragment set (Hyten et al. 2006). Three other primer pairs were designed to sequence gene fragments putatively influenced by selection during domestication (AN-PvCO, AN-TGA, and AN-DNAJ). For the same reason, seven loci were chosen from the literature: the Leg443 locus (Hougaard et al. 2008), the g510 and g523 loci (McConnell et al. 2010), and the four STS loci (Bellucci 2006). A total of 44 gene fragments were successfully amplified and sequenced in a sample of 22 P. vulgaris accessions, most of which were of Mesoamerican origin, with three accessions from the Andean gene pool and one from that of northern Peru. The sequenced region for each locus encompasses between 133 bp (AN-Pv41) and 728 bp (AN-Pv68), with a mean of ~400 bp per locus. Overall we sequenced ~17.2 kb per accession. A total of 304 SNPs were identified, with a mean of 6.9 SNPs per locus and a frequency of one SNP every 60 bp (Table 1; Supporting Information Table S6). Two loci (AN-Pv9 and AN-Pv55) were monomorphic among the 22 P. vulgaris accessions. The mean number of haplotypes per locus was 4.7, with a mean haplotype diversity of 0.54 (Table 1). Without considering the monomorphic loci, the nucleotide diversity (π) per locus ranged from 0.33 × 10⁻³ (gssE20) to 13.44 × 10⁻³ (AN-Pv48), with a mean value of 4.73 × 10⁻³ (Supporting Information Table S6).
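The percentages and the SNP spacing quoted above follow from simple ratios, which the short Python sketch below reproduces. The counts are those given in the text; the choice of 18,232 bp (the upper bound of the sequence range in Table 1) as the denominator for SNP spacing is an assumption, since the text does not state which length was used.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def bp_per_snp(length_bp: int, n_snps: int) -> float:
    """Mean spacing between SNPs, in base pairs."""
    return length_bp / n_snps


# Counts reported in the text.
soybean_fragments = 102
designed_pairs = 70
amplified_pairs = 40
retained_pairs = 34

failed = percent(designed_pairs - amplified_pairs, designed_pairs)
retained = percent(retained_pairs, soybean_fragments)

# Assumption: upper bound of the Table 1 sequence range as total length.
spacing = bp_per_snp(18_232, 304)

print(f"primer pairs not examined further: {failed:.0f} %")
print(f"retained share of the soybean set: {retained:.0f} %")
print(f"one SNP every {spacing:.0f} bp")
```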
The structures (exons, introns, 5′-UTRs and 3′-UTRs) of 40 out of the 44 loci were identified by BLAST analyses. The four fragments with unknown structure were the AN-Pv48, Leg443, gssE19 and gssE28 loci. Thirty-seven loci included exon regions, while 32 included noncoding regions (introns and/or 5′-UTRs and 3′-UTRs). The genetic diversity statistics computed separately for the coding and noncoding regions for each characterized locus are reported in Supporting
Information Tables S7 and S8, respectively. Taking into account the sequence length whose structure was known (~16.6 kb), 63 % was represented by exons, 30 % by introns, and 7 % by 5′-UTRs and 3′-UTRs (Table 1). A total of 135 SNPs were identified in exons, 61 % of which were synonymous substitutions, while 130 SNPs were identified in noncoding regions. The SNP frequency was higher in the noncoding regions (one SNP every 46 bp) than in the coding regions (one SNP every 77 bp), as was the genetic diversity (π = 7.23 × 10⁻³ and 3.70 × 10⁻³, respectively) (Table 1). The genetic diversity statistics for each locus were computed not only for the whole sample, but also for the Mesoamerican, MW, and MD populations (Supporting Information Tables S6, S7, S8). Two further loci were monomorphic in the Mesoamerican population (AN-Pv32 and gssE20). When the Andean and northern Peru accessions were excluded, the Mesoamerican population showed a total of 240 SNPs (one SNP every 76 bp), and a level of genetic diversity (π) of 5.00 × 10⁻³ (Table 1). The MW population showed higher values for all of the genetic diversity estimates (V, H, Hd, π, θ) than the MD population for the whole sequence, and for the coding and noncoding regions (Table 1). Considering the whole sequence and excluding the four monomorphic loci in the Mesoamerican population, 13 loci were monomorphic in the MD population, while only one was monomorphic in the MW (Supporting Information Table S6). The loss of nucleotide diversity (Lπ) of the MD relative to the MW was 0.47, 0.45, and 0.53 for the whole sequence and the coding and noncoding regions, respectively. Figure 1a shows the NJ tree that was obtained using the concatenated sequences of the 44 loci. The distance between the main common bean gene pools (Mesoamerica and Andes) was confirmed, along with a clear subdivision of the MW and MD accessions. The only exception was a MW accession (G22837) that appeared to be more closely related to the MD accessions.
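To make the two nucleotide diversity measures reported above concrete, the following Python sketch computes π (mean pairwise differences per site; Tajima 1983) and Watterson's θ per site (Watterson 1975) for a toy alignment. It is a minimal illustration, not the DnaSP procedure: the sequences are hypothetical, and the functions assume an alignment of equal-length sequences with no gaps or ambiguous bases (in keeping with the exclusion of indels from the analyses).

```python
from itertools import combinations


def _check_alignment(seqs):
    """Require at least two sequences of identical, non-zero length."""
    if len(seqs) < 2 or len(seqs[0]) == 0:
        raise ValueError("need at least two non-empty sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")


def segregating_sites(seqs):
    """Count alignment columns that show more than one nucleotide."""
    _check_alignment(seqs)
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per site."""
    _check_alignment(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def watterson_theta(seqs):
    """theta per site: S / (a_n * L), with a_n = sum of 1/i for i = 1..n-1."""
    _check_alignment(seqs)
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / (a_n * len(seqs[0]))


# Hypothetical four-sequence alignment, 10 bp long.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print(f"S = {segregating_sites(toy)}")
print(f"pi = {nucleotide_diversity(toy):.4f}")
print(f"theta = {watterson_theta(toy):.4f}")
```

Multiplying either estimate by 1,000 gives values on the × 10⁻³ scale used in Table 1.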
Table 1 Summary of the estimated genetic diversity present in a set of 22 P. vulgaris accessions and in their subsets (Mesoamerican combined, Mesoamerican wild, Mesoamerican domesticated), based on whole sequences, coding regions, and noncoding regions (introns, 3′-UTR, and 5′-UTR), separately

| Region | No. of loci | N (a) | Range (bp) (b) | V (b) | S (b) | Pi (b) | Syn (b) | Nonsyn (b) | H (a) | Hd (a) | π × 10⁻³ (a) | θ × 10⁻³ (a) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a) Whole sample | | | | | | | | | | | | |
| Whole sequence | 44 | 21.6 | 18,045–18,232 | 304 | 85 | 219 | / | / | 4.7 | 0.54 | 4.73 | 4.80 |
| Coding regions | 37 | 21.6 | 10,417 | 135 | 40 | 95 | 82 | 52 | 3.6 | 0.43 | 3.70 | 3.44 |
| Noncoding: introns | 27 | 21.6 | 4,795–4,965 | 115 | 40 | 75 | / | / | 3.3 | 0.38 | 7.05 | 7.26 |
| Noncoding: 5′-UTR | 2 | 21.5 | 564–565 | 9 | 1 | 8 | / | / | 4.5 | 0.59 | 4.41 | 4.31 |
| Noncoding: 3′-UTR | 6 | 22.0 | 638–652 | 11 | 2 | 9 | / | / | 2.3 | 0.25 | 5.75 | 4.23 |
| Noncoding: overall | 32 | 21.6 | 5,997–6,182 | 135 | 43 | 92 | / | / | 3.4 | 0.41 | 7.23 | 7.12 |
| b) Mesoamerican P. vulgaris accessions | | | | | | | | | | | | |
| Whole sequence | 44 | 17.6 | 18,045–18,232 | 240 | 66 | 174 | / | / | 3.8 | 0.47 | 5.00 | 4.15 |
| Coding regions | 37 | 17.6 | 10,417 | 105 | 31 | 74 | 65 | 39 | 3.0 | 0.38 | 3.16 | 2.90 |
| Noncoding: introns | 27 | 17.6 | 4,795–4,965 | 85 | 29 | 56 | / | / | 2.8 | 0.33 | 6.31 | 6.09 |
| Noncoding: 5′-UTR | 2 | 17.5 | 564–565 | 6 | 1 | 5 | / | / | 3.0 | 0.45 | 2.99 | 3.04 |
| Noncoding: 3′-UTR | 6 | 18.0 | 638–652 | 11 | 3 | 8 | / | / | 2.3 | 0.23 | 4.39 | 4.48 |
| Noncoding: overall | 32 | 17.6 | 5,997–6,182 | 102 | 33 | 69 | / | / | 2.8 | 0.35 | 6.27 | 6.06 |
| c) Mesoamerican wild (MW) P. vulgaris accessions | | | | | | | | | | | | |
| Whole sequence | 44 | 8.8 | 18,045–18,232 | 216 | 99 | 117 | / | / | 3.3 | 0.55 | 4.81 | 4.81 |
| Coding regions | 37 | 8.8 | 10,417 | 93 | 40 | 53 | 60 | 32 | 2.6 | 0.43 | 3.51 | 3.36 |
| Noncoding: introns | 27 | 8.8 | 4,795–4,965 | 75 | 35 | 40 | / | / | 2.5 | 0.40 | 7.49 | 7.09 |
| Noncoding: 5′-UTR | 2 | 8.5 | 564–565 | 6 | 1 | 5 | / | / | 3.0 | 0.63 | 4.68 | 3.94 |
| Noncoding: 3′-UTR | 6 | 9.0 | 638–652 | 10 | 5 | 5 | / | / | 2.0 | 0.20 | 3.35 | 4.58 |
| Noncoding: overall | 32 | 8.8 | 5,997–6,182 | 91 | 41 | 50 | / | / | 2.6 | 0.41 | 7.15 | 6.96 |
| d) Mesoamerican domesticated (MD) P. vulgaris accessions | | | | | | | | | | | | |
| Whole sequence | 44 | 8.8 | 18,045–18,232 | 121 | 41 | 80 | / | / | 1.9 | 0.26 | 2.54 | 2.55 |
| Coding regions | 37 | 8.8 | 10,417 | 57 | 21 | 36 | 34 | 23 | 1.7 | 0.22 | 1.93 | 2.02 |
| Noncoding: introns | 27 | 8.8 | 4,795–4,965 | 33 | 9 | 24 | / | / | 1.4 | 0.15 | 3.02 | 2.79 |
| Noncoding: 5′-UTR | 2 | 9.0 | 564–565 | 0 | 0 | 0 | / | / | 1.0 | 0.00 | 0.00 | 0.00 |
| Noncoding: 3′-UTR | 6 | 9.0 | 638–652 | 7 | 4 | 3 | / | / | 1.7 | 0.16 | 4.52 | 3.91 |
| Noncoding: overall | 32 | 8.8 | 5,997–6,182 | 40 | 13 | 27 | / | / | 1.5 | 0.15 | 3.35 | 3.05 |

(a) Average estimate among loci: N sample size, H number of haplotypes, Hd haplotype diversity; π × 10⁻³ and θ × 10⁻³, two measures of nucleotide diversity from Tajima (1983) and Watterson (1975), respectively
(b) Sum of the single locus estimates: range (bp), sequence length (bp); V variable sites, S singleton variable sites, Pi parsimony informative variable sites, Syn total number of synonymous changes, Nonsyn total number of replacement changes

Comparison between Sanger and KASPar genotyping
From the sequence data obtained by Sanger sequencing of the 44 loci in the 22 P. vulgaris accessions, 40 SNP markers were developed for KASPar genotyping. By excluding missing data for both KASPar genotyping and Sanger sequencing (where no sequence data was available for some accessions), the percentage of mismatches between these two different genotyping tools was very low (0.8 %). A further 20 SNP markers were developed from the sequencing of a further 10 loci in a subsample of six P.
vulgaris accessions. Three mismatches were found for two SNP loci (J_g117_562 and B_g2113_57), which made them monomorphic in the 22 P. vulgaris accessions. When Sanger sequencing was used, the J_g117_562 SNP marker was polymorphic between Mesoamerican and the two Andean domesticated
Fig. 1 a Unrooted neighbor-joining (NJ) bootstrap tree inferred from the concatenated sequence data; b PCA performed using SNPsetA (55 SNP markers, with <5 % missing data). Each accession is represented by a circle (wild accessions) or square (domesticated accessions). MW Mesoamerican wild (blue), MD Mesoamerican domesticated (cyan), AW Andean wild (red), AD Andean domesticated (pink), PhI type I phaseolin (northern Peru, yellow); black circles, nodes for which the bootstrap values are >50 %
accessions, while B_g2113_57 was polymorphic only for one MW accession (G12979).
KASPar genotyping of the 22 P. vulgaris accessions
In the whole sample, 58 out of the 60 SNP markers were polymorphic and biallelic, and within the Mesoamerican population, 47 were polymorphic; the remaining 11 were polymorphic between the Mesoamerican, type I phaseolin, and/or Andean accessions. Out of the 47 polymorphic SNP markers in the Mesoamerican population, 42 remained so also in the MW, while only 18 remained so in the MD (Supporting Information Figure S1a). Considering the three Andean accessions, 49 of the 58 SNP markers were
monomorphic, while 11 were polymorphic (Supporting Information Figure S1a). Of the 60 SNP markers, five (J_Pv54_GB_134, B_g683_141, J_Pv30_B_131, J_g117_501, J_g1378_506) showed >5 % missing data and were not considered for the genetic diversity analyses. The genetic diversity estimates are reported in Table 2. There was a significant difference (P < 0.0001; Wilcoxon–Kruskal–Wallis nonparametric test) between the genetic diversity estimates of MW (Na = 1.67, Ne = 1.41, He = 0.27) and MD (Na = 1.27, Ne = 1.14, He = 0.10). The loss of genetic diversity (ΔH) of MD compared to MW was 0.63 (Table 2). The relationships between all of the individuals considered were investigated by PCA (Fig. 1b). The first (PC1) and second (PC2) principal components explained 50.1 and 19.1 % of the total molecular variance, respectively. The analysis separated the MW from the MD accessions, with the only exception being MW accession G22837, which grouped within MD, as was also highlighted by the NJ tree obtained with the Sanger sequencing data. Moreover, a larger dispersal was seen for the MW accessions than the MD accessions. The Andean accessions were more distantly related, with the northern Peru (type I phaseolin) accession being intermediate. Twenty out of the 22 accessions of the common bean were also genotyped by KASPar assays for the 94 SNPs that were developed by Cortés et al. (2011); three SNPs did not give valid results (BSNP39_357, BSNP62_901, DREB2A_246). Among the remaining 91 SNPs, four were monomorphic (BSNP37_2521, BSNP_18_C2556, BSNP50_132, BSNP7_212). Nine of the SNPs showed >5 % missing data and were not considered for the genetic diversity analyses (BARCPV-0004483, BSNP_22_C2574, BSNP_29_C2625, BSNP32_601, BSNP59_363, BSNP78_929, BARCPV-0004216, BSNP83_291, BSNP51_597). Eighty-seven out of the 91 SNP markers were polymorphic and biallelic in the 20 P. vulgaris accessions (Supporting Information Figure S1b).
Within the Mesoamerican population, 61 SNPs were polymorphic, while the remaining 26 were monomorphic. Considering MW and MD, there were 55 and 26 polymorphic SNPs, respectively (Supporting Information Figure S1b). Genetic diversity estimates were computed for the 60 new SNPs identified in this study (SNPsetA) and for the 94 SNPs (SNPsetB) from Cortés et al. (2011), both separately and combined (Table 3). There were
Table 2 Genetic diversity estimates computed using the SNP markers developed for KASPar genotyping (KBioscience)

Population      No. of genotypes   No. of SNP markers (a)   N      Na     Ne     He
Whole sample    22                 55                       21.9   1.96   1.49   0.31
Mesoamerican    18                 55                       17.9   1.76   1.36   0.24
MW              9                  55                       8.9    1.67   1.41   0.27
MD              9                  55                       9.0    1.27   1.14   0.10

(a) Number of SNP markers with <5 % missing data; N sample size, Na average number of alleles, Ne average effective number of alleles, He average unbiased expected heterozygosity
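The per-locus quantities averaged in Table 2 can be illustrated with a minimal Python sketch. It assumes the standard definitions associated with the cited references: the effective number of alleles as the reciprocal of the expected homozygosity (Kimura and Crow 1964) and the unbiased expected heterozygosity with the 2N/(2N − 1) correction (Nei 1978). The genotype encoding (two-letter diploid calls, None for a missing call) and the function names are hypothetical, and the software settings actually used for Table 2 are not restated here.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (N, Na, Ne, uHe) for a single SNP marker.

    genotypes: list of two-letter diploid calls such as "AA", "AG", "GG";
    None marks a missing call (hypothetical encoding for this sketch).
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)                      # N: individuals with a valid call
    if n == 0:
        raise ValueError("no valid genotype calls for this marker")
    total = 2 * n                       # number of gene copies sampled
    counts = Counter("".join(calls))    # allele counts over all gene copies
    freqs = [c / total for c in counts.values()]
    hom = sum(p * p for p in freqs)     # expected homozygosity, sum of p_i^2
    na = len(freqs)                     # Na: number of alleles observed
    ne = 1.0 / hom                      # Ne: effective number of alleles
    uhe = (total / (total - 1)) * (1.0 - hom)  # unbiased He
    return n, na, ne, uhe


def mean_stats(loci):
    """Average N, Na, Ne and uHe over a list of markers."""
    rows = [locus_stats(g) for g in loci]
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(4))


# Two invented markers: one polymorphic with a missing call, one monomorphic.
example = [["AA", "AG", "GG", None], ["CC", "CC", "CC", "CC"]]
print(mean_stats(example))
```

Averaging over markers is why N in Table 2 can be fractional (for example 21.9 for 22 genotypes): markers with a missing call contribute a smaller per-locus sample size.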
Table 3 Genetic diversity estimates computed using the SNP markers developed for the KASPar assay in the present study (SNPsetA), from Cortés et al. (2011) (SNPsetB), and considering all of the SNPs together for the 20 shared P. vulgaris accessions

SNP set            No. of genotypes   No. of SNP markers (a)   N      Na     Ne     He     DH
SNPsetA            20                 56                       19.9   1.96   1.49   0.31   /
SNPsetB            20                 82                       19.6   1.95   1.56   0.34   /
All SNPs           20                 138                      19.7   1.96   1.53   0.33   /

SNPsetA
  Mesoamerican     17                 56                       16.9   1.77   1.39   0.25   /
  MW               9                  56                       8.9    1.68   1.42   0.27
  MD               8                  56                       8.0    1.27   1.13   0.10   0.65

SNPsetB
  Mesoamerican     17                 82                       16.7   1.70   1.45   0.27   /
  MW               9                  82                       8.9    1.63   1.40   0.26
  MD               8                  82                       7.8    1.28   1.18   0.12   0.54

All SNPs
  Mesoamerican     17                 138                      16.8   1.72   1.42   0.26   /
  MW               9                  138                      8.9    1.65   1.40   0.26
  MD               8                  138                      7.9    1.28   1.16   0.11   0.58

(a) Number of SNP markers with <5 % missing data; N sample size, Na number of alleles, Ne effective number of alleles, He unbiased expected heterozygosity, DH loss of nucleotide diversity in MD versus MW
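The DH column of Table 3 can be approximated from the He values in the same table. The sketch below assumes that the loss of diversity is the proportional reduction 1 − He(MD)/He(MW); this definition is an assumption here, chosen because it reproduces the reported values closely. The function name is hypothetical.

```python
def diversity_loss(he_wild, he_domesticated):
    """Proportional loss of diversity, assumed to be 1 - He_dom / He_wild."""
    if he_wild <= 0:
        raise ValueError("he_wild must be positive")
    return 1.0 - he_domesticated / he_wild


# Rounded (He of MW, He of MD) pairs as printed in Table 3.
table3_he = {
    "SNPsetA": (0.27, 0.10),
    "SNPsetB": (0.26, 0.12),
    "All SNPs": (0.26, 0.11),
}

losses = {name: round(diversity_loss(mw, md), 2)
          for name, (mw, md) in table3_he.items()}
print(losses)
```

With these rounded inputs the sketch gives 0.63, 0.54, and 0.58. The SNPsetB and combined values match Table 3, while the SNPsetA value differs slightly from the reported 0.65, presumably because the table was computed from unrounded He estimates.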
no significant differences between the estimates obtained with the different SNP sets, although there was a slightly higher genetic diversity (for both Ne and He estimates) in MW for SNPsetA than for SNPsetB, and vice versa for MD; similarly, the loss of genetic diversity (DH) estimates were higher for SNPsetA (0.65) than for SNPsetB (0.54) (Table 3). The PCA results were consistent between the two sets, although a clearly larger dispersal was seen in the MW population for SNPsetA compared to SNPsetB (Supporting Information Figure S2).

KASPar genotyping of the other Phaseolus spp. accessions

KASPar assays were also performed for the 37 further accessions of 10 different species belonging to the
Phaseolus genus. The SNP markers that were polymorphic and those with high levels of missing data in the whole sample of these Phaseolus spp., and in the different groups defined on the basis of their phylogenetic relationship with P. vulgaris, are reported in Supporting Information Tables S9 and S10. The phylogeny of the Phaseolus genus reported by Delgado-Salinas et al. (2006) was considered. A summary of the number of polymorphic loci and the loci with <5 and <50 % missing data for the whole sample, within each Phaseolus species, and for the above-mentioned partition, is reported in Table 4. By excluding the SNP markers with >50 % missing data, 50 (83 %) of SNPsetA and 75 (82 %) of SNPsetB markers also worked well in these different species. There were 18 (30 %) and 23 (25 %) polymorphic loci for SNPsetA and SNPsetB, respectively (Table 4).
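The counting behind this summary can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to build Table 4: the genotype encoding (two-letter calls, None for a failed call), the function names, and the choice to count polymorphism over all valid calls regardless of the missing-data level are hypothetical. The percentages use the 60 SNPsetA markers and the 91 SNPsetB markers that gave valid results.

```python
def summarize_markers(calls_by_marker, thresholds=(0.05, 0.50)):
    """Count polymorphic markers and markers below each missing-data threshold.

    calls_by_marker: dict mapping a marker name to a list of genotype calls,
    one per accession, with None for a failed call (hypothetical encoding).
    """
    below = {t: 0 for t in thresholds}
    polymorphic = 0
    for calls in calls_by_marker.values():
        missing = sum(c is None for c in calls) / len(calls)
        for t in thresholds:
            if missing < t:
                below[t] += 1
        # Polymorphic: more than one allele appears among the valid calls.
        alleles = {a for c in calls if c is not None for a in c}
        if len(alleles) > 1:
            polymorphic += 1
    return polymorphic, below


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Invented toy data: three markers scored on four accessions.
toy = {
    "m1": ["AA", "GG", "AA", "AA"],
    "m2": ["CC", None, "CC", "CC"],
    "m3": [None, None, None, "TT"],
}
print(summarize_markers(toy))

# Percentages quoted in the text, recomputed from the counts.
print(percent(50, 60), percent(75, 91), percent(18, 60), percent(23, 91))
```

Note that with only two accessions per species, as for most species in Table 4, a single failed call already gives 50 % missing data, so the <5 % criterion amounts to requiring a complete set of calls.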
Table 4 Number of SNP markers that are polymorphic and with <5 and <50 % missing data for the whole Phaseolus spp. sample, within each species and in the different groups defined on the basis of their phylogenetic relationships with P. vulgaris

                                                  No. of polymorphic SNPs   SNPsetA, No. of SNPs   SNPsetB, No. of SNPs
Phaseolus species              No. of genotypes   SNPsetA     SNPsetB       <5 %      <50 %        <5 %      <50 %
Phaseolus spp.                 37                 18          23            4         50           9         75
Clade B (a) (vulgaris group)
  P. coccineus                 17                 3           7             41        54           46        82
  P. acutifolius               4                  3           5             19        54           40        74
Clade B_1 (a)                  4                  7           1             34        49           59        69
  P. angustissimus             2                  3           0             46        50           63        72
  P. filiformis                2                  1           0             36        46           63        69
Clade B_2 (a)                  6                  5           6             24        41           40        67
  P. leptostachyus             2                  0           0             34        44           60        69
  P. micranthus                2                  0           0             31        40           57        68
  P. lunatus                   2                  1           0             35        43           53        69
Clade A (a)                    6                  4           6             21        42           34        60
  P. parvulus                  2                  0           0             29        40           50        65
  P. grayanus                  2                  0           1             35        42           53        62
  P. hintonii                  2                  1           1             34        41           47        62

(a) Clade B includes P. vulgaris and its more closely related Phaseolus species; Clade A includes Phaseolus species more distantly related to P. vulgaris. Group rows (bold in the original) indicate main group subdivisions
Discussion

In the present study, we have developed a new set of SNP markers for P. vulgaris and other Phaseolus spp. The starting point for this SNP detection process was the strategy applied for the selection of accessions to be used. Indeed, the choice of the panel of initial accessions is crucial in capturing the genetic diversity that is characteristic of a species and in limiting the ascertainment bias in subsequent studies (Clark et al. 2005). The main aspect that was considered was the very complex population structure characteristic of P. vulgaris. In particular, the wild germplasm is characterized by a higher level of genetic diversity than the domesticated materials (through reductions in diversity imposed by founder effects and selection at the target loci, due to the domestication process and subsequently to modern plant breeding), and the Mesoamerican gene pool is characterized by a higher level of genetic diversity than the Andean gene pool (Bitocchi et al. 2012, 2013). Thus, a set of 22 genotypes was chosen that were mostly of
Mesoamerican origin, with 50 % as wild materials. Moreover, to have a representation of all of the gene pools, accessions from the Andes (both wild and domesticated forms) and northern Peru were also included. The availability of molecular characterization of about 200 P. vulgaris accessions that are representative of the population structure of the species (Rossi et al. 2009; Bitocchi et al. 2012, 2013; Desiderio et al. 2013) allowed the building of a core collection that maximizes the genetic variability. To date, the sampling strategy and the accession set chosen in the present study represent a novelty among studies that are aimed at developing SNP markers in P. vulgaris. Indeed, in the literature, SNP detection in the common bean has mainly been based on domesticated material (cultivar or landrace) and on comparisons of sequence data from two P. vulgaris accessions from the two gene pools (Ramirez et al. 2005; Galeano et al. 2009a, b; Hyten et al. 2010; Cortés et al. 2011; Blair et al. 2013). Gaitán-Solís et al. (2008) used a small set of 10 accessions of P. vulgaris that represented both of the main gene pools (Mesoamerican and Andean), six
of which were domesticated (two from Mesoamerica and four from the Andes), and four of which were wild materials (two from Peru and two from Mexico).

The first outcome of the present study was the identification of a new set of primer pairs that allowed the successful amplification and Sanger sequencing of 37 gene fragments in P. vulgaris, 34 of which were identified as single-copy orthologous genes between legume species. This strategy was chosen in order to develop markers that can also be used in other legume species and for comparative analysis. Moreover, we included loci that have putatively undergone selection during the domestication of the common bean. These probably have important roles in the control of variation at the phenotypic level between wild and domesticated forms, such as the loci located near domestication-related QTLs and the STS markers developed by Bellucci (2006) on the basis of AFLP data from Papa et al. (2007).

We found one SNP every 60 bp. As expected, this frequency increased for noncoding regions (~6.2 kbp; one SNP every 46 bp) and decreased for coding regions (~10.4 kbp; one SNP every 77 bp). The SNP density depends not only on whether coding or noncoding regions are being considered, but also on the number and genetic diversity of the genotypes being assessed, and to a large extent on the species (mating system, autogamous or allogamous). Indeed, because of the population structure, the SNP frequency decreased when considering only a single subpopulation: the Mesoamerican population has one SNP every 76, 61, and 99 bp for the whole sequence and the noncoding and coding regions, respectively. Another important aspect to consider is the level of diversity of the initial sample. Indeed, if we compare the MW and MD subpopulations, the SNP frequency was almost twice as high in the MW (all, 1:84; noncoding, 1:68; coding, 1:112) as in the MD (all, 1:151; noncoding, 1:155; coding, 1:183).
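The SNP frequencies quoted above are ratios of sequence length to the number of SNPs, so the overall figure can be checked against the noncoding and coding figures. The short Python sketch below does this using the approximate region lengths given in the text (about 6.2 and 10.4 kbp); the function names are hypothetical, and because the lengths are rounded the result is only indicative.

```python
def snps_from_density(length_bp, bp_per_snp):
    """Approximate number of SNPs implied by 'one SNP every bp_per_snp bp'."""
    return length_bp / bp_per_snp


def bp_per_snp(length_bp, n_snps):
    """Average spacing between SNPs, in bp."""
    return length_bp / n_snps


noncoding_bp, coding_bp = 6200, 10400   # approximate lengths from the text

n_noncoding = snps_from_density(noncoding_bp, 46)   # about 135 SNPs
n_coding = snps_from_density(coding_bp, 77)         # about 135 SNPs

# Overall spacing implied by the two region-specific figures.
overall = bp_per_snp(noncoding_bp + coding_bp, n_noncoding + n_coding)
print(round(overall, 1))

# Ratio of MW to MD SNP frequency over the whole sequence (1:84 vs 1:151).
mw_to_md = 151 / 84
print(round(mw_to_md, 2))
```

The implied overall spacing is a little over 61 bp, in line with the reported one SNP every 60 bp given the rounded lengths, and the MW frequency is about 1.8 times that of the MD, which corresponds to "almost twice as high".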
As previously mentioned, it is difficult to make comparisons with SNP frequency data reported in other studies for P. vulgaris; however, as a rough guide, by analyzing sequence data from 400 gene fragments in the Mesoamerican BAT93 and Andean JaloEEP558 accessions, McConnell et al. (2010) reported a frequency of one SNP every 380 bp; the same value was reported by Ramirez et al. (2005), on the basis of EST data of the Andean G19833 and the Mesoamerican Negro Jamapa 81 lines. In contrast to these studies, and more
consistent with the results obtained in the present study, Gaitán-Solís et al. (2008) reported one SNP every 88 bp through screening the nucleotide diversity of 47 fragments of DNA from 10 accessions (including also four wild genotypes) from both of the main gene pools.

The second outcome of the present study was the development of a flexible, cost-effective, and efficient SNP-genotyping platform, and a KASPar assay for 60 new SNP markers, for use in common bean studies for different purposes, such as molecular breeding. The results of the genetic diversity analyses carried out on the 22 P. vulgaris accessions with both sequence and SNP data are consistent with the latest findings on common bean evolution (Rossi et al. 2009; Bitocchi et al. 2012, 2013; Desiderio et al. 2013). In particular, there was a clear and evident distinction between the Andean and Mesoamerican accessions, with the northern Peru accession being intermediate (NJ tree for Sanger data, and PCA for KASPar data). Focusing on the Mesoamerican gene pool, for which more genotypes were analyzed, the significantly higher diversity of the MW than the MD population, and the monophyletic NJ tree, indicated a single domestication event in Mesoamerica. In the light of these outcomes, our sampling strategy and the SNPs identified appear to be useful for highlighting the evolution of the common bean, and thus they will be very useful for different studies in the common bean.

Another interesting topic of the present study was the comparison of the population genetics parameters estimated on the same sample using our SNP set (SNPsetA) and that developed for the same technology (KASPar) by Cortés et al. (2011) (SNPsetB). This allowed us to investigate the differences in the genetic diversity estimates between these two sets of SNPs that were developed using different criteria to select the initial set of accessions for SNP discovery. Indeed, in contrast to the present study, Cortés et al. 
(2011) identified SNP markers through the screening of sequence data present in the literature from two comparisons, both between two P. vulgaris domesticated accessions, one of Andean and the other of Mesoamerican origin (Ramirez et al. 2005; Galeano et al. 2009a, b; Hyten et al. 2010). There were no significant differences in the genetic diversity estimates between these two sets of SNPs as computed for the whole sample, for the Mesoamerican population, and for the MW and MD populations. This indicates
that searching for polymorphisms in the common bean by screening for the genetic diversity of genotypes from the two main different gene pools is nearly equivalent to screening the wild materials for the same purpose; thus, ascertainment bias appears to be a limited issue in the common bean when applying each of the sampling strategies described here. This is probably due to the particular evolutionary history of the species, with the genetic differentiation of the two gene pools that arose in ancient times (prior to domestication) capturing the polymorphisms present in the wild forms of the species. However, a certain bias was detected, even though this was not statistically supported; indeed, comparing the MW and MD, we observed that our SNPs (SNPsetA) showed slightly higher diversity for the MW and lower for the MD when compared to the Cortés et al. (2011) SNPs (SNPsetB). This difference is clearer when considering the estimation of the loss of diversity in the MD due to the domestication process (SNPsetA: 0.65; SNPsetB: 0.54). Thus, to avoid ascertainment bias when developing SNPs in P. vulgaris, we suggest that genotypes from all of the gene pools characteristic of the species and wild materials should be included, to provide a more precise picture of the genetic diversity that is characteristic of this species, so as to use these markers in different basic and applied plant genetic studies.

Finally, another important outcome of this study was that most of the SNP markers (83 %, from both SNPsetA and SNPsetB) also worked well in accessions from other Phaseolus species. Even though there are two studies that focused on the identification of orthologous genes between legumes and the development of primer pairs to amplify and sequence parts of them (Choi et al. 2006; Hougaard et al. 2008), our study is the first to date that reports the development of SNP markers that are detectable with a high-throughput genotyping assay for different legume species. 
Thus, these SNP markers will be very useful for studies in different Phaseolus species and for comparative analyses.

Acknowledgments

Goretti D. and Bitocchi E. made equal contributions to this study, and so should be considered joint first authors. This study was supported by grants from the Italian Government (MIUR; Grant Number 20083PFSXA_001, PRIN Project 2008), the Università Politecnica delle Marche (2012–2013) and the Marche Region (Grant Number L.R.37/ 99 art. 2lett.I—PARDGR 247/10-DDPF98/CSI10).
References

Akey JM, Zhang K, Xiong M, Jin L (2003) The effect of single nucleotide polymorphism identification strategies on estimates of linkage disequilibrium. Mol Biol Evol 20:232–242
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Bellucci E (2006) Development of molecular markers and analysis of a BAC library for evolutionary genomics studies in common bean (Phaseolus vulgaris L). PhD Thesis, Università Politecnica delle Marche, Ancona, Italy
Bitocchi E, Nanni L, Bellucci E, Rossi M, Giardini A, Spagnoletti Zeuli P, Logozzo G, Stougaard J, McClean P, Attene G, Papa R (2012) Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data. Proc Natl Acad Sci USA 109:E788–E796
Bitocchi E, Bellucci E, Giardini A, Rau D, Rodriguez M, Biagetti E, Santilocchi R, Spagnoletti Zeuli P, Gioia T, Logozzo G, Attene G, Nanni L, Papa R (2013) Molecular analysis of the parallel domestication of the common bean in Mesoamerica and the Andes. New Phytol 197:300–313
Blair MW, Cortés AJ, Penmetsa RV, Farmer A, Carrasquilla-Garcia N, Cook DR (2013) A high-throughput SNP marker system for parental polymorphism screening, and diversity analysis in common bean (Phaseolus vulgaris L.). Theor Appl Genet 126:535–548
Broughton WJ, Hernandez G, Blair MW, Beebe S, Gepts P, Vanderleyden J (2003) Beans (Phaseolus spp.) – model food legumes. Plant Soil 252:55–128
Choi HK, Luckow MA, Doyle J, Cook DR (2006) Development of nuclear gene-derived molecular markers linked to legume genetic maps. Mol Gen Genomics 276:56–70
Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:1496–1502
Cortés AJ, Chavarro MC, Blair MW (2011) SNP marker diversity in common bean (Phaseolus vulgaris L.). Theor Appl Genet 123:827–845
Debouck DG, Toro O, Paredes OM, Johnson WC, Gepts P (1993) Genetic diversity and ecological distribution of Phaseolus vulgaris in northwestern South America. Econ Bot 47:408–423
Delgado-Salinas A, Bibler R, Lavin M (2006) Phylogeny of the genus Phaseolus (Leguminosae): a recent diversification in an ancient landscape. Syst Bot 31:779–791
Desiderio F, Bitocchi E, Bellucci E, Rau D, Rodriguez M, Attene G, Papa R, Nanni L (2013) Chloroplast microsatellite diversity in Phaseolus vulgaris. Front Plant Sci 3:312
Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull 19:11–15
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
Frascaroli E, Schrag TA, Melchinger AE (2013) Genetic diversity analysis of elite European maize (Zea mays L.) inbred lines using AFLP, SSR, and SNP markers reveals ascertainment bias for a subset of SNPs. Theor Appl Genet 126:133–141
Gaitán-Solís E, Choi IY, Quigley C, Cregan P, Tohme J (2008) Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system. Plant Genome 1:125–134
Galeano CH, Fernandez AC, Gomez M, Blair MW (2009a) Single strand conformation polymorphism based SNP and Indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.). BMC Genomics 10:629
Galeano CH, Gomez M, Rodriguez LM, Blair MW (2009b) CEL I nuclease digestion for SNP discovery and marker development in common bean (Phaseolus vulgaris L.). Crop Sci 49:381–394
Galeano CH, Cortés AJ, Fernández AC, Soler A, Franco-Herrera N, Macunde G, Vanderleyden J, Blair MW (2012) Gene-based single nucleotide polymorphism markers for genetic and association mapping in common bean. BMC Genet 13:48
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98
Hayama R, Coupland G (2004) The molecular basis of diversity in the photoperiodic flowering responses of Arabidopsis and Rice. Plant Physiol 135:677–684
Hougaard BK, Heegaard Madsen L, Sandal N, Marcio de Carvalho M, Fredslund J, Schauser L, Nielsen AM, Rohde T, Sato S, Tabata S, Bertioli DJ, Stougaard J (2008) Legume anchor markers link syntenic regions between Phaseolus vulgaris, Lotus japonicus, Medicago truncatula and Arachis. Genetics 179:2299–2312
Hyten D, Song Q, Zhu Y, Choi IY, Nelson RL, Costa JM, Specht JE, Shoemaker RC, Cregan PB (2006) Impacts of genetic bottlenecks on soybean genome diversity. Proc Natl Acad Sci USA 103:16666–16671
Hyten DL, Song Q, Fickus EW, Quigley CV, Lim JS, Choi IY, Hwang EY, Pastor-Corrales M, Cregan PB (2010) High-throughput SNP discovery and assay development in common bean. BMC Genomics 11:475
Kami J, Becerra-Velásquez V, Debouck DG, Gepts P (1995) Identification of presumed ancestral DNA sequences of phaseolin in Phaseolus vulgaris. Proc Natl Acad Sci USA 92:1101–1104
Kimura M, Crow JF (1964) The number of alleles that can be maintained in a finite population. Genetics 49:725–738
Kruglyak L (1997) The use of a genetic map of biallelic markers in linkage studies. Nat Genet 17:21–24
Kuhner MK, Beerli P, Yamato J, Felsenstein J (2000) Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156:439–447
Kumar S, Banks TW, Cloutier S (2012) SNP discovery through next-generation sequencing and its applications. Int J Plant Genomics 2012:1–15
Kwak M, Velasco D, Gepts P (2008) Mapping homologous sequences for determinacy and photoperiod sensitivity in common bean (Phaseolus vulgaris). J Hered 99:283–291
Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451–1452
Marshall O (2004) PerlPrimer: cross-platform, graphical primer design for standard, bisulphite and real-time PCR. Bioinformatics 20:2471–2472
McConnell M, Mamidi S, Lee R, Chikara S, Rossi M, Papa R, McClean P (2010) Syntenic relationships among legumes revealed using a gene-based genetic linkage map of common bean (Phaseolus vulgaris L.). Theor Appl Genet 121:1103–1116
Moragues M, Comadran J, Waugh R, Milne I, Flavell AJ, Russell JR (2010) Effects of ascertainment bias and marker number on estimations of barley diversity from high-throughput SNP genotype data. Theor Appl Genet 120:1525–1534
Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583–590
Nielsen R (2000) Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154:931–942
Papa R, Bellucci E, Rossi M, Leonardi S, Rau D, Gepts P, Nanni L, Attene G (2007) Tagging the signatures of domestication in common bean (Phaseolus vulgaris) by means of pooled DNA samples. Ann Bot 100:1039–1051
Peakall R, Smouse PE (2006) Genalex 6: genetic analysis in Excel. Population genetic software for teaching and research. Mol Ecol Notes 6:288–295
Putterill J, Robson F, Lee K, Simon R, Coupland G (1995) The CONSTANS gene of Arabidopsis promotes flowering and encodes a protein showing similarities to zinc finger transcription factors. Cell 80:847–857
Rafalski A (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol 5:94–100
Ramirez M, Graham MA, Blanco-Lopez L, Silvente S, Medrano-Soto A, Blair MW, Hernández G, Vance CP, Lara M (2005) Sequencing and analysis of common bean ESTs. Building a foundation for functional genomics. Plant Physiol 137:1211–1227
Rosenblum EB, Novembre J (2007) Ascertainment bias in spatially structured populations: a case study in the eastern fence lizard. J Hered 98:331–336
Rossi M, Bitocchi E, Bellucci E, Nanni L, Rau D, Attene G, Papa R (2009) Linkage disequilibrium and population structure in wild and domesticated populations of Phaseolus vulgaris L. Evol Appl 2:504–522
Rozen S, Skaletsky HJ (2000) Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics methods and protocols: methods in molecular biology. Humana Press, Totowa, NJ, pp 365–386
Samach A, Onouchi H, Gold SE, Ditta GS, Schwarz-Sommer Z, Yanofsky MF, Coupland G (2000) Distinct roles of CONSTANS target genes in reproductive development of Arabidopsis. Science 288:1613–1616
Schlötterer C (2004) The evolution of molecular markers – just a matter of fashion? Nat Rev Genet 5:63–69
Schlötterer C, Harr B (2002) Single nucleotide polymorphisms derived from ancestral populations show no evidence for biased diversity estimates in Drosophila melanogaster. Mol Ecol 11:947–950
Sokal RR, Rohlf FJ (1995) Biometry: the principles and practice of statistics in biological research. WH Freeman and Co, New York
Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460
Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: molecular evolutionary genetics analysis (MEGA) software, version 4.0. Mol Biol Evol 24:1596–1599
Vigouroux Y, McMullen M, Hittinger CT, Houchins K, Schulz L, Kresovich S, Matsuoka Y, Doebley J (2002) Identifying genes of agronomic importance in maize by screening microsatellites for evidence of selection during domestication. Proc Natl Acad Sci USA 99:9650–9655
Watterson GA (1975) On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7:256–276
Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB (2003) Single-nucleotide polymorphisms in soybean. Genetics 163:1123–1134