The Relationship Between Microsatellite Polymorphism and Recombination Hot Spots in the Human Genome

Mikael Brandström,*,1 Andrew T. Bagshaw,† Neil J. Gemmell,†,2 and Hans Ellegren*

*Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden; and †School of Biological Sciences, University of Canterbury, Christchurch, New Zealand

Although previous studies have failed to detect an association between microsatellite polymorphism and broadscale recombination rates in the human genome, there are several possible reasons why such a relationship could exist. For instance, there might be a direct link if recombination is mutagenic to microsatellite sequences or if polymorphic microsatellites act as recombination signals. Alternatively, recombination could exert an indirect effect by uncoupling of natural selection at linked loci, promoting polymorphism. As recombination is concentrated in narrow hotspot regions in the human genome, we investigated the relationship between microsatellite polymorphism and recombination hot spots. By using data from a common allele frequency database, we found several polymorphism estimates to be similar for hot spots and the genomic average. However, this is likely explained by an ascertainment bias because markers with high polymorphism information content are usually selected for genotyping in human populations and pedigrees. In contrast, by using an unbiased set of shotgun sequence data, we found an excess of microsatellite polymorphism in recombination hot spots of 14%. However, when other genomic variables are taken into account in a generalized model and using wavelet analysis, the effect is no longer detectable and the only firm predictor of microsatellite polymorphism is the incidence of SNPs and indels. One possible neutral explanation for these observations is that there is a common denominator affecting the local rate of mutation in unique as well as in repetitive DNA, for example, base composition.

1 Present address: Department of Forest Mycology and Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
2 Present address: Otago School of Medical Sciences, University of Otago, Dunedin, New Zealand

Mol. Biol. Evol. 25(12):2579–2587. 2008 doi:10.1093/molbev/msn201 Advance Access publication September 15, 2008 © The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Introduction

Microsatellites, or simple sequence repeats, are stretches of DNA sequence that consist of direct tandem repeats of short (1–5 bp) motifs (Tautz et al. 1986). There is an estimated density of one microsatellite every 2 kb in the human genome (International Human Genome Sequencing Consortium 2001), whereas there are examples of both higher and lower densities in other eukaryotic genomes (Katti et al. 2001; Dieringer and Schlötterer 2003). Ever since the realization that simple repeats represent a ubiquitous and significant part of the genome, it has been debated whether microsatellites simply represent "junk" DNA sequences or if they have some function (Li et al. 2004; Kashi and King 2006). Most investigators have considered them to represent neutral markers that, with the exception of expanded trinucleotide repeats causing human disease (Pearson et al. 2005), generally do not contribute to phenotypic variation. However, there have recently been frequent reports of associations between microsatellite alleles and various phenotypic traits (Hammock and Young 2005), supporting an old idea that microsatellites affect gene regulation (Hamada et al. 1984). The significance of these observations is nevertheless unclear because other cis-acting elements, contained within microsatellite-defined haplotypes, might be responsible for the observed associations.

Relevant in the context of microsatellite function is some evidence that points to an association with genetic recombination. An association between microsatellite occurrence and regions of high recombination rate has been noted in yeast (Bagshaw et al. 2008) and, to a lesser extent, in mammals (Kong et al. 2002; Jensen-Seaman et al. 2004; Myers et al. 2005). What biological processes primarily drive this association is a matter of current debate. In the yeast Saccharomyces cerevisiae, site-specific modification of interchromosomal meiotic recombination activity levels has been shown for microsatellites including poly-A (Schultes and Szostak 1991), poly-AC (Treco and Arnheim 1986; Gendrel et al. 2000), and pentanucleotide (Kirkpatrick et al. 1999) repeat arrays, and numerous studies have shown that microsatellites can stimulate recombination between plasmids (Bullock et al. 1986; Murphy and Stringer 1986; Wahls et al. 1990; Napierala et al. 2002, 2004). Evidence that recombination can act to mutate microsatellites has also been found in S. cerevisiae (Gendrel et al. 2000), and aberrant recombination has been implicated in the extreme microsatellite instability seen in some genetic diseases (Jakupciak and Wells 2000; Mirkin 2006).

The study of the relationship between microsatellite polymorphism and recombination is relevant to this debate because if recombination drives microsatellite evolution through a mutagenic effect, recombination rate should be correlated with microsatellite variability. Such a result would also be relevant for the use of microsatellites as genetic markers, which is currently limited by a lack of understanding of the factors underlying the variance in mutability among loci (Buschiazzo and Gemmell 2006). Apparently inconsistent with a substantial role of recombination, microsatellite variation has been found not to correlate with broadscale recombination rate in humans (Payseur and Nachman 2000; Huang et al. 2002). However, it has now been established that meiotic recombination is concentrated in narrow "hot spots" of a few kilobases, separated by 50–100 kb areas of low recombination (Jeffreys et al. 1998; McVean et al. 2004; Myers et al. 2005).
If recombination and microsatellite variation are somehow associated, we might expect to see the strongest signatures of such an association in recombination hot spots. Another reason for asking whether microsatellite polymorphism and recombination hot spots are associated is that our understanding of how recombination shapes DNA sequence diversity in general is far from complete. A fine-scale association with recombination was recently reported for sequence diversity on a single human chromosome (Spencer et al. 2006), but the extent to which this was caused by mutagenic activity, or recombination acting to decouple the effects of natural selection on linked neutral markers, remains questionable. Some evidence has suggested the importance of GC-biased single-base mutations, possibly resulting from aberrant gene conversion activity at recombination hot spots (Spencer et al. 2006; Duret and Arndt 2008). However, a positive relationship between insertion/deletion mutations, such as occur commonly in microsatellites, and recombination hot spots would suggest that other processes are also important.

Here, we investigate the relationship between microsatellite polymorphism and recombination in the human genome by using data on the location of recombination hot spots, microsatellite allele frequencies in human populations, and unbiased data on microsatellite polymorphism from shotgun sequencing.

Materials and Methods

Recombination Hot Spots

We used the locations of recombination hot spots in the human genome reported by Myers et al. (2005). The coordinates given for the recombination hot spots in the original report are based on the NCBI34 version of the human genome. Because other data sets used in our study were based on NCBI35 or NCBI36, we used the liftOver utility from UCSC Genome Bioinformatics (http://genome.ucsc.edu/cgi-bin/hgLiftOver) to translate the hotspot coordinates between assemblies. Twenty-nine of 78,533 hot spots were lost in these translations.

Microsatellite Polymorphisms

Two sources of microsatellite polymorphism data were used. We downloaded the tables from the ALlele FREquency Database (ALFRED) (Rajeevan et al. 2003), from which data were filtered to only include those loci which consisted of perfect (uninterrupted) microsatellites and had information on allele length. Based on this, we obtained three different estimates of the level of mutability for microsatellite loci. First, we determined the difference in length between the longest and shortest allele (allele span), a variable that can be expected to correlate positively with mutation rate as more allelic states are generated at high mutation frequencies. Second, based on similar arguments, we noted at each locus the number of segregating alleles. Third, we calculated the expected heterozygosity ($h = 1 - \sum_i x_i^2$, where $x_i$ is the frequency of the $i$th allele), which also should reflect the underlying mutation rate. All three estimates of mutability were first calculated for each population, then averaged over the populations available in ALFRED for each locus. Finally, the statistic was averaged over hot spots.

The second source of data was length polymorphisms identified by Mills et al. (2006) based on large shotgun sequencing efforts. These insertions and deletions had been classified as representing length variation in tandem repeats, or unique sequence, by Tandem Repeats Finder (Benson 1999).
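The three mutability estimates described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names and the input layout (a mapping from allele length in bp to allele frequency, one mapping per population) are assumptions made for illustration, and the frequencies are made up.

```python
from statistics import mean

def locus_stats(allele_freqs):
    """Return (allele span in bp, number of alleles, expected heterozygosity)
    for one locus in one population.

    allele_freqs maps allele length (bp) -> frequency; frequencies sum to 1.
    """
    present = {length: f for length, f in allele_freqs.items() if f > 0}
    span = max(present) - min(present)               # longest minus shortest allele
    n_alleles = len(present)                         # number of segregating alleles
    h = 1.0 - sum(f * f for f in present.values())   # h = 1 - sum(x_i^2)
    return span, n_alleles, h

def locus_average(populations):
    """Average each statistic over the populations typed at one locus."""
    per_population = [locus_stats(p) for p in populations]
    return tuple(mean(column) for column in zip(*per_population))

# Hypothetical frequencies for one dinucleotide locus in two populations.
pop_a = {120: 0.5, 122: 0.25, 126: 0.25}
pop_b = {120: 0.5, 124: 0.5}
print(locus_average([pop_a, pop_b]))
```

Averaging per population first, and only then across populations, mirrors the order given in the text; the final averaging over hot spots would be one further mean over loci.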

Microsatellite Abundance

The human genome assembly NCBI34, except chromosome Y, was screened for microsatellites using a modified version of the repeat-finding program Sputnik (Morgante et al. 2002) set to only report perfect microsatellite repeats with a total length longer than 6 bp, with the additional requirement of containing at least two repeat units.
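To make the screening criteria concrete, the following Python sketch scans a sequence for perfect tandem repeats. It is a stand-in written for illustration, not Sputnik and not the modified version used here. It assumes motifs of 1–5 bp (the definition given in the Introduction), reads "longer than 6 bp" as at least 7 bp, and reports a repeat only at its shortest period; the function name and output layout are hypothetical.

```python
def find_perfect_microsatellites(seq, max_motif=5, min_total_len=7, min_units=2):
    """Return sorted (start, end, motif, whole_units) tuples for perfect
    tandem repeats in seq, using half-open coordinates [start, end)."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i < n - k:
            if seq[i] != seq[i + k] or seq[i] not in "ACGT":
                i += 1
                continue
            # Extend while each base equals the base k positions ahead,
            # i.e. while the sequence keeps an exact period of k.
            j = i
            while j < n - k and seq[j] == seq[j + k] and seq[j] in "ACGT":
                j += 1
            length = j - i + k
            motif = seq[i:i + k]
            # Skip motifs that are repeats of a shorter unit (e.g. "AA",
            # "ACAC"); such runs are reported at the shorter period instead.
            primitive = not any(
                k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
            )
            if primitive and length >= min_total_len and length // k >= min_units:
                hits.append((i, j + k, motif, length // k))
            i = j + 1
    return sorted(hits)

# Made-up test sequence: an (AC)4 repeat and a (T)7 run.
example = "GTACACACACGTTTTTTTC"
print(find_perfect_microsatellites(example))
```

Scanning each period separately avoids the missed hits that a single left-to-right regular expression can produce when a short repeat overlaps the start of a longer one.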

Statistics

To test if properties of microsatellites in hot spots differed from random expectation, we performed resampling tests. The statistic of interest was first calculated for hotspot regions. This value was compared with a null distribution of the statistic obtained from 1,000 resampled sets of regions. In each resampling round, nonoverlapping regions with the same lengths as the known recombination hot spots were randomly drawn from the genome following a uniform distribution. The number and lengths of regions for each chromosome were retained in the random sample. This was done to focus the analysis on finer scale variation and to avoid letting chromosome-scale heterogeneities in, for example, recombination rate or base composition affect the results.

To analyze the connection between microsatellite polymorphism and recombination, we used a generalized linear model with a Poisson error distribution with no restriction on dispersion (family = quasi-Poisson in R, link = log) at the 1-kb scale (for the sake of consistency with the wavelet analysis described below). The quasi-Poisson method relaxes the assumption that the response conforms to a Poisson distribution, that is, that the variance equals the mean. In this generalized model, microsatellite polymorphism (the response variable) was predicted simultaneously by recombination, GC content, coding sequence content (taken from the UCSC Genome Browser; http://genome.ucsc.edu), microsatellite occurrence, SNP density, and indel density. This approach effectively estimates the effect of recombination while controlling for the effects of the other predictors.

For larger scales, we used a correlation of detail wavelet-transformed data as described by Spencer et al. (2006). The wavelet transform requires data for all subregions within an analysis block, with the number of subregions being an integer power of 2. We used a subregion size of 1 kb, at which level all data were averaged.
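The resampling test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the procedure actually run: the statistic is taken to be the number of markers falling inside the regions, overlaps are avoided by simple redrawing (adequate only when regions cover a small fraction of a chromosome), the P value is one-sided, and all names and data are hypothetical.

```python
import random
from bisect import bisect_left

def draw_regions(chrom_len, lengths, rng):
    """Place one region per requested length uniformly at random on a
    chromosome, redrawing any region that overlaps one already placed."""
    placed = []
    for length in lengths:
        while True:
            start = rng.randrange(chrom_len - length + 1)
            end = start + length
            if all(end <= s or start >= e for s, e in placed):
                placed.append((start, end))
                break
    return placed

def count_markers(regions, positions):
    """Count markers (sorted positions) inside half-open regions [start, end)."""
    return sum(bisect_left(positions, e) - bisect_left(positions, s)
               for s, e in regions)

def resampling_test(hotspots, chrom_lens, markers, rounds=1000, seed=1):
    """hotspots: {chrom: [(start, end), ...]}; markers: {chrom: sorted positions}."""
    rng = random.Random(seed)
    observed = sum(count_markers(regs, markers[c]) for c, regs in hotspots.items())
    null = []
    for _ in range(rounds):
        total = 0
        for c, regs in hotspots.items():
            # Keep the number and lengths of regions per chromosome.
            lengths = [e - s for s, e in regs]
            total += count_markers(draw_regions(chrom_lens[c], lengths, rng), markers[c])
        null.append(total)
    # One-sided empirical P value: share of null draws at least as large.
    p_upper = sum(v >= observed for v in null) / rounds
    return observed, null, p_upper

# Toy, made-up data: one chromosome, two hot spots, six markers.
hotspots = {"chr1": [(1000, 3000), (50000, 52000)]}
chrom_lens = {"chr1": 100000}
markers = {"chr1": [1500, 2000, 2500, 50500, 51000, 70000]}
observed, null, p = resampling_test(hotspots, chrom_lens, markers)
print(observed, p)
```

Confidence limits for the genomic expectation can be read off the sorted null values, for example the 2.5th and 97.5th percentiles.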
Throughout the human genome, we analyzed a total of 37 regions of $2^{15}$ kb (about 32.8 Mb), for which data about recombination (derived from Myers et al. 2005), microsatellites, SNP and indel polymorphism (Mills et al. 2006), coding sequence (UCSC Genome Browser), as well as GC content were available for all 1-kb regions with no gaps. These included samples from 16 different chromosomes and a reasonable selection of short arm, long arm, centromere-proximal and telomere-proximal regions. We selected a region size of $2^{15}$ kb in preference to other possible sizes for three reasons. Firstly, it was employed in the study by Spencer et al. (2006), on which our wavelet analysis method was based. Secondly, only eight regions of $2^{16}$ kb would have been possible with the data available due to the need in wavelet analysis for contiguity in all variables. Thirdly, although smaller regions would have enabled more replication and greater overall coverage of the genome, the power of wavelet analysis to detect broadscale correlations declines with decreasing size of studied regions. The use of 37 replicate regions had the advantage of allowing some evaluation of regional effects, and overall significance was calculated by combining the P values in cases where the correlations were consistently positive or consistently negative across all regions. This was done by Stouffer’s method, also known as Stouffer’s z-test (Stouffer et al. 1949), which is a simple way to combine probabilities from multiple independent results to obtain a statistic for the overall probability of the combination of observations. The wavelet correlation was performed for regions between 2 kb and 1024 kb in size (in increasing powers of 2).

[Figure 1: four density plots (Y axes: density). Panel A, allele span (bp); panel B, mean number of alleles; panel C, mean heterozygosity; panel D, number of markers.]

FIG. 1. Mutability estimates (based on allele frequency data) and marker density of microsatellites in recombination hot spots compared with genomic expectations. Ninety-five percent CIs were estimated using a resampling test with 1,000 replicates. Solid curve indicates null distribution from resampling. Dashed vertical lines indicate the 95% CI around the resampling median. Solid vertical line indicates the value observed in recombination hot spots. A is allele span, B is mean number of alleles, C is mean heterozygosity and D is number of markers. In D, the observation differs significantly from the null distribution (P = 0.027).
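Stouffer’s method, used above to combine P values across regions, can be written in a few lines of Python. The sketch below uses the unweighted form and assumes one-sided P values all taken in the same direction; how the per-region P values were converted before combination is not stated in the text, so that step is an assumption, as are the function name and the example values.

```python
from statistics import NormalDist

def stouffer(p_values):
    """Combine independent one-sided P values with Stouffer's z-test.

    Each P value is converted to a standard normal deviate; the sum of the
    deviates divided by sqrt(k) is again standard normal under the null.
    Returns (combined z, combined one-sided P value).
    """
    nd = NormalDist()
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]   # requires 0 < p < 1
    z = sum(z_scores) / len(z_scores) ** 0.5
    return z, 1.0 - nd.cdf(z)

# Made-up example: four regions, each only marginally significant.
z, p = stouffer([0.05, 0.05, 0.05, 0.05])
print(round(z, 3), p)
```

The example shows why consistency across regions matters: several individually marginal results in the same direction combine into a much smaller overall P value.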

Results

To analyze the relationship between recombination and microsatellite evolution, we used the ALFRED database, which contains microsatellite allele frequency information from population surveys. From this, we extracted polymorphism data from 282 microsatellite loci spread across the human genome. Forty of these loci are located within recombination hot spots. This is more than would be expected by chance (27 would be expected; resampling test, P = 0.023), showing an overrepresentation of polymorphic microsatellites in recombination hot spots. However, none of the three mutability estimates (allele span, number of alleles, and heterozygosity) of microsatellites in recombination hot spots deviates significantly from random expectations (fig. 1a–c). This is unexpected at first glance, at least for models invoking a role of recombination in causing microsatellite mutations.

However, the failure to demonstrate increased mutability/polymorphism in recombination hot spots could be due to a methodological artifact. Markers contained within allele frequency databases are likely to initially have been selected for their polymorphism information content, that is, they have been chosen for population screening or linkage mapping on the basis of known high heterozygosity. Such ascertainment bias (Ellegren et al. 1995), if it exists, could mask an underlying difference in microsatellite characteristics between recombination hot spots and other regions of the genome. Specifically, if highly polymorphic markers from across the genome are selected for use, it is expected that they show similar patterns of mutability/polymorphism. This could explain not only the uniformity in variability estimates of microsatellites in hot spots and non–hot spots but also the observation of an overrepresentation of polymorphic microsatellites in hot spots (fig. 1d). If simple repeat loci with high polymorphism are more common in hot spots than in the rest of the genome, selection for variable markers also selects for markers in hot spots.

To overcome the problem of using a set of microsatellite markers potentially affected by an ascertainment bias, we turned to a set of about 44,000 length polymorphisms within tandem repeats (Mills et al. 2006) identified from shotgun sequencing initiatives. Although this approach comes with the limitation of only estimating the proportion of microsatellites that is polymorphic in a draw of two alleles, its advantage is that it provides polymorphism (mutability) data for a random set of markers. Note that sparse shotgun sequencing typically means that only one additional allele from the population is sequenced for each locus. This may either represent the same allelic state as the genome reference or a different allelic variant. Accordingly, any one locus can only be scored as homozygous (the same allele length seen twice) or heterozygous (two different length variants seen). Using these data, we find that there are significantly more polymorphic microsatellites in recombination hot spots than in other genomic regions of the same window size (P < 0.001, table 1). Moreover, we also find that the proportion of microsatellites that is polymorphic in recombination hot spots is significantly higher, by 14.3% (95% confidence interval [CI] 11.4–17.2%), than the expected mean frequency for the genome as a whole (table 1). An excess of similar magnitude was also found for SNPs and indels in hot spots, 20.9% (95% CI 14.9–27.5%) for SNPs and 16.5% (95% CI 14.1–19.1%) for indels. This observation offers a direct link between microsatellite polymorphism and recombination, although it cannot address their causal relationship.

Table 1 Polymorphism Data (Based on Overlapping Shotgun Sequences) for Microsatellites, SNPs, and Indels in Recombination Hot Spots Compared with Genomic Expectations

                                                          Genomic Expectation
                                             Hot Spots   Lower 95% CI   Point Estimate   Upper 95% CI   P Value
Number of polymorphic microsatellites           15,059         12,095           12,172         12,363    <0.001
Proportion of microsatellites polymorphic       0.0136         0.0116           0.0119         0.0122    <0.001
Number of SNPs                                 442,015        346,640          355,125        384,772    <0.001
Number of indels                                22,218         18,653           19,051         19,477    <0.001

NOTE. Ninety-five percent CIs were estimated using a resampling test with 1,000 replicates.

To address whether recombination has a causal effect on microsatellite polymorphism, we used a generalized linear model of microsatellite polymorphism as predicted by recombination. To isolate the effect of recombination, GC content, microsatellite occurrence, gene content, SNP frequency, and indel frequency were included as predictors in the model. For consistency with the wavelet analysis (see below), the model was fitted to data for 1-kb windows in 37 different $2^{15}$ kb regions. However, for all but a limited number of the 37 regions, recombination fails to predict microsatellite polymorphism. After correction for multiple testing, no region remains statistically significant (table 2). The only consistent predictor of microsatellite polymorphism in this analysis is SNP density.

Without any clear evidence for a connection between recombination and microsatellite polymorphism at the 1-kb scale, we turned to wider scales. Using the detail wavelet transformation of microsatellite occurrence and recombination as described by Spencer et al. (2006), the correlation between microsatellite polymorphism and recombination was tested at scales between 1 and 1,024 kb. This transformation lets us study the effects on different scales independently of each other. A significant positive association was found in a few cases at the 256 kb level (fig. 2), although based on the large number of tests performed (370 scale–region combinations) this is what would be expected by pure chance. Hence, after correcting for other genomic variables, we are no longer able to detect an excess of microsatellite polymorphism in recombination hot spots.
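The scale-by-scale correlation of wavelet detail coefficients can be illustrated with the Python sketch below. It assumes a Haar wavelet and Kendall’s tau-b, and labels each level by the number of 1-kb bins a coefficient spans; these choices, the function names, and the toy series are assumptions made for illustration and are not taken from Spencer et al. (2006) or from the analysis reported here.

```python
from itertools import combinations
from math import sqrt

def haar_details(x):
    """Return Haar detail coefficients per level, finest scale first."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of 2")
    details, smooth = [], list(x)
    while len(smooth) > 1:
        pairs = list(zip(smooth[0::2], smooth[1::2]))
        details.append([(a - b) / sqrt(2) for a, b in pairs])   # local contrast
        smooth = [(a + b) / sqrt(2) for a, b in pairs]          # local average
    return details

def kendall_tau_b(u, v):
    """Kendall's rank correlation (tau-b) by direct pair counting, O(n^2)."""
    conc = disc = tie_u = tie_v = 0
    for (u1, v1), (u2, v2) in combinations(list(zip(u, v)), 2):
        du, dv = u1 - u2, v1 - v2
        if du * dv > 0:
            conc += 1
        elif du * dv < 0:
            disc += 1
        elif du == 0 and dv != 0:
            tie_u += 1
        elif dv == 0 and du != 0:
            tie_v += 1
    denom = sqrt((conc + disc + tie_u) * (conc + disc + tie_v))
    return (conc - disc) / denom if denom else float("nan")

def wavelet_correlations(x, y, min_coeffs=4):
    """Return (scale in bins, tau) for each level with enough coefficients."""
    out = []
    levels = zip(haar_details(x), haar_details(y))
    for level, (dx, dy) in enumerate(levels, start=1):
        if len(dx) >= min_coeffs:
            out.append((2 ** level, kendall_tau_b(dx, dy)))
    return out

# Made-up series of 16 bins; y is a linear function of x.
x = [1, 4, 2, 8, 5, 7, 3, 9, 2, 6, 1, 5, 4, 4, 7, 3]
y = [2 * v + 1 for v in x]
print(wavelet_correlations(x, y))
```

Because detail coefficients at one level are differences between neighbouring averages, a correlation at that level reflects co-variation at that scale only, which is what allows scales to be examined independently.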

Table 2 Results from Generalized Linear Model Analyses Predicting Polymorphic Microsatellites in 1-kb Regions of the Human Genome

                   Estimated Coefficient    t-Test
Variable           Mean      SEM            Mean    SEM      Pr(T > |t|)    # Significant pos (neg)
Recombination      0.0113    0.0125         1.01    0.186    inc            2 (0)
Gene content       0.0010    0.0006         1.54    0.158    inc            0 (3)
GC content         1.14      0.676          1.73    0.285    inc            0 (9)
Microsatellites    0.432     0.122          3.73    0.234    inc            28 (0)
SNPs               0.178     0.0138         12.9    0.670    <10^-300       35 (0)
Indels             0.313     0.0561         5.74    0.332    <10^-245       34 (0)

NOTE. The rightmost column shows numbers of $2^{15}$ kb regions (out of 37 tested) with a significant positive (pos) or negative (neg) effect of the predictor (P < 0.01 by t-test). Overall significance was calculated by Stouffer’s method (Stouffer et al. 1949) in cases where the direction of correlation was consistent across all regions, and "inc" indicates inconsistency in the direction of correlation, that is, some regions showed negative effects and others positive effects. SEM, standard error of the mean.

FIG. 2. Wavelet correlations between polymorphic microsatellite fraction and recombination rate. Pairwise Kendall’s rank correlations between wavelet decompositions of microsatellite polymorphic fraction and recombination rate for the 37 regions of the human genome with values for each variable for $2^{15}$ kb contiguous blocks. Scale (kb) is on the X axes and correlation coefficient is on the Y axes. Significant correlations (P < 0.01) are flagged with a red cross. Approximate locations of each region are given, and where they are within 10 Mb of a centromere or telomere they are labeled as near to that feature.

Discussion

In this study, we addressed the levels of genetic variability at human microsatellite loci in relation to recombination hot spots. The most significant finding was that microsatellite variability, like SNP and indel density, is considerably higher in hot spots compared with the genomic average. However, when correcting for other genomic variables, this effect is no longer detectable. In the latter analysis, the frequency of SNPs and indels is the only significant predictor of microsatellite polymorphism. There are at least two possible explanations for these observations. One is that microsatellite variability and recombination are unrelated and that the higher incidence of polymorphic microsatellites in hot spots suggested by our initial analysis is mainly due to secondary correlates. However, we cannot exclude the alternative possibility that there indeed is a true relationship between microsatellite variability and recombination but that we do not have enough power to detect this relationship with a generalized model. In the following, we therefore discuss different scenarios compatible with either of these explanations, beginning with possible causes of a link between microsatellite polymorphism and recombination hot spots (polymorphic microsatellites promote recombination, recombination is mutagenic to microsatellites, and recombination reduces the effect of selection at linked sites).

A Causal Link between Microsatellite Polymorphism and Recombination?

A direct link between microsatellites and recombination would be that simple repeats somehow trigger crossing-over. Although double-strand breaks (DSBs), the initial stage of meiotic recombination, show no clear sequence preference in eukaryotes (Nishant and Rao 2006), distal sequences are involved in regulating them (Wu and Lichten 1995). For instance, deletion of a 14-bp poly-A tract reduced activity of the yeast ARG4 hot spot by 75% (Schultes and Szostak 1991) despite the fact that DSBs avoid poly-A (de Massy et al. 1995). Because only a handful of hot spots have been examined in any detail with respect to mechanism, it remains an open possibility that microsatellites could potentiate recombination in at least a subset of such regions. The ubiquitous distribution of microsatellites does not rule this out because hot spots are regulated by heterogeneous, complex, and multileveled processes, including distal and local sequences (Petes 2001).
Moreover, it may be that different microsatellite motifs vary in the extent to which they trigger recombination. This idea gains some support from the observation of Myers et al. (2005) that certain microsatellite motifs are overrepresented in hot spots. An interesting yet speculative possibility is that length variation as such introduces an instability between sister chromatids that increases the likelihood of recombination events (Kayser et al. 2006). On the other hand, although the high heterozygosity of microsatellite loci means that they frequently show length variation, there are other genomic regions in which length variation involves longer sequence tracts and where a possible heterozygote instability effect would be more plausible. Minisatellites (Jeffreys et al. 1999) and segmental duplications (Sharp et al. 2005) are two such examples.

A Causal Link between Recombination and Microsatellite Polymorphism?

A possible explanation of high levels of microsatellite variability in recombination hot spots is that recombination is mutagenic to microsatellites, promoting the generation and maintenance of polymorphic simple repeats in tracts of high recombination. This is supported by some experimental data in Escherichia coli (Hashem et al. 2004) and S. cerevisiae (Gendrel et al. 2000). Instability of trinucleotide repeats implicated in human disease is generated in part by massive expansions caused by homologous recombination (Jakupciak and Wells 2000). However, the observation that microsatellite mutation rates do not differ between markers on the nonrecombining Y chromosome and on autosomes (Kayser et al. 2000) is not compatible with a strong mutagenic effect of meiotic recombination. There is substantial support for the conclusion that replication slippage is the main, if not the sole, cause of microsatellite mutation (Ellegren 2004). In theory, there may be properties of hot spots that increase microsatellite mutability without the direct involvement of recombination. One possibility is replication pausing, which may be causally involved in at least a subset of recombination hot spots (Petes 2001) and which experimental evidence in vitro has linked to microsatellite mutation (Fouche et al. 2006). However, the evidence for recombination being mutagenic to microsatellites remains weak.

The Role of Natural Selection on Polymorphism Levels at Linked Loci

It has been noted, in many organisms, that the level of intraspecific genetic diversity and the local rate of recombination are positively correlated (Begun and Aquadro 1992; Nachman 2001). Such correlation is compatible with a role of natural selection in shaping patterns of nucleotide variation at a local scale. This could either be because of background selection, which implies the removal from the population of deleterious mutations (Charlesworth 1994), or hitchhiking with positively selected alleles (Maynard Smith and Haigh 1974). Both these phenomena have the effect of reducing Ne of the local genomic region, which in turn reduces genetic diversity. The range of the effect is expected to be determined by the local recombination rate; high recombination decouples neutral polymorphisms from nearby selected sites, increasing the likelihood of genetic variation being maintained. However, previous studies have not found an association between microsatellite variability and local recombination rates in humans (Payseur and Nachman 2000; Yu et al. 2001; Huang et al. 2002). Similarly, it is not seen in mouse (Thomas et al. 2005). In theory, this could be due to a low density of targets for selection near microsatellites in low recombining regions; however, this seems unlikely given the ubiquitous distribution of microsatellites in the mammalian genome. A more plausible explanation is the usually high mutation rate of microsatellites, which should counterbalance the effects of long-range hitchhiking in low recombining regions (Slatkin 1995), in particular, if selective sweeps are relatively infrequent (Wiehe 1998). In accordance with this interpretation, in Drosophila melanogaster where microsatellite mutation rates are several orders of magnitude lower than in humans (Schlötterer et al. 1998), recombination and microsatellite polymorphism are positively correlated (Schug et al. 1998).


The fact that we, in contrast to previous work, find a rather strong association between recombination and microsatellite diversity (without controlling for other variables) can potentially be explained by the fact that recombination is highly clustered in the human genome (Myers et al. 2005). It may be that with relatively homogeneous rates of recombination in non–hot spot regions and with the high mutation rates of microsatellites, recombination is a poor predictor of microsatellite variability for most parts of the genome. It is also possible that previous microsatellite studies have been limited by an ascertainment bias (Ellegren et al. 1995), in which data mainly come from microsatellite markers selected to have a uniformly high degree of polymorphism.

A Neutral Explanation of Polymorphism at Microsatellite and Other Loci, Not Invoking Recombination Similar to other studies (Begun and Aquadro 1992; Nachman 2001), we find higher levels of unique sequence (SNP, indels) variation in recombination hot spots. After correcting for variance in divergence, the correlation between nucleotide diversity and recombination has been found to be weak (Hellman et al. 2005) or even absent (Hellman et al. 2003), suggesting recombination to be mutagenic. An analysis with higher resolution on local recombination rates, including information on recombination hot spots, found recombination to exert an influence on nucleotide diversity only at very small scales (less than a few kb), but with no corresponding effect on divergence (Spencer et al. 2006). However, this too could be compatible with a neutral model of recombination being mutagenic; the fine-scale recombination landscape of humans and chimpanzee differs considerably (Ptak et al. 2005) which is likely to obscure detection of the possible mutagenic effect of recombination (Yi and Li 2005). Importantly, our failure to detect a significant effect of recombination hot spots on microsatellite variability after controlling for SNP and indel density could indicate a neutral model not related to recombination. The rate of point mutation, as well as of short insertions and deletions, vary considerable across the human genome (Ellegren et al. 2003; Spencer et al. 2006; Tanay and Siggia 2008). The causes of this variation are not fully resolved although they are likely to include factors such as base composition and chromatin structure. As SNP and indel density were found to predict microsatellite polymorphism, this would suggest that there are intrinsic factors affecting the stability of several types of DNA sequences in different chromosomal regions. 
Given that the mechanism of mutation clearly differs among single nucleotide substitutions, indels, and microsatellites, any such intrinsic factor would be most easily conceived to exert its effect through some very general influence on DNA. Base composition, by affecting the stability of the helix, could potentially play such a general role in overall mutation rates. It is noteworthy in this context that base composition has recently been shown to affect the relative stability of different microsatellite motifs (Brandström and Ellegren 2008).
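The confounding argument above can be illustrated with a small synthetic simulation. The sketch below is not the analysis used in this study (which relied on a generalized model and wavelet analysis) and uses no real data. It assumes a hypothetical regional "mutability" factor that raises both SNP/indel density and microsatellite polymorphism and that is also associated with hot-spot status. Partial correlation, computed from regression residuals, stands in here for "controlling for" SNP density. All function names and parameter values are illustrative assumptions.

```python
import random
import statistics as st


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = st.fmean(x), st.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    mx, my = st.fmean(x), st.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    return pearson(residuals(x, z), residuals(y, z))


def simulate(n=5000, seed=1):
    """Synthetic genomic windows; every parameter is an assumption."""
    rng = random.Random(seed)
    hot, snp, msat = [], [], []
    for _ in range(n):
        m = rng.gauss(0, 1)  # hypothetical regional mutability
        # Assumption: hot spots are more likely in mutable regions,
        # but have no direct effect on microsatellites.
        hot.append(1.0 if m + rng.gauss(0, 1) > 1.0 else 0.0)
        snp.append(m + rng.gauss(0, 0.3))   # SNP/indel density proxy
        msat.append(m + rng.gauss(0, 1.0))  # microsatellite polymorphism
    return hot, snp, msat


hot, snp, msat = simulate()
raw = pearson(hot, msat)            # hot spot vs. microsatellite polymorphism
adj = partial_corr(hot, msat, snp)  # same, controlling for SNP density
print(f"raw r = {raw:.3f}, partial r given SNP density = {adj:.3f}")
```

Under these assumptions the raw correlation is clearly positive, whereas the partial correlation is much closer to zero. This is the pattern expected if a shared regional factor, rather than recombination itself, drives the association. Because the simulated SNP density is only a noisy proxy for mutability, a small residual correlation can remain.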

Acknowledgments

We thank the UPPMAX cluster at Uppsala University for computer time. Financial support was obtained from the Swedish Research Council (H.E.) and the Royal Society of New Zealand Marsden Fund grant UOC202 (A.T.B. and N.J.G.).

Literature Cited

Bagshaw AT, Pitt JP, Gemmell NJ. 2008. High frequency of microsatellites in S. cerevisiae meiotic recombination hotspots. BMC Genomics. 9:49.

Begun DJ, Aquadro CF. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature. 356:519–520.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Brandström M, Ellegren H. 2008. Genome-wide analysis of microsatellite polymorphism in chicken circumventing the ascertainment bias. Genome Res. 18:881–887.

Bullock P, Miller J, Botchan M. 1986. Effects of poly[d(pGpT)d(pApC)] and poly[d(pCpG)d(pCpG)] repeats on homologous recombination in somatic cells. Mol Cell Biol. 6:3948–3953.

Buschiazzo E, Gemmell NJ. 2006. The rise, fall and renaissance of microsatellites in eukaryotic genomes. Bioessays. 28:1040–1050.

Charlesworth B. 1994. The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet Res. 63:213–227.

de Massy B, Rocco V, Nicolas A. 1995. The nucleotide mapping of DNA double-strand breaks at the CYS3 initiation site of meiotic recombination in Saccharomyces cerevisiae. EMBO J. 14:4589–4598.

Dieringer D, Schlötterer C. 2003. Two distinct modes of microsatellite mutation processes: evidence from the complete genomic sequences of nine species. Genome Res. 13:2242–2251.

Duret L, Arndt PF. 2008. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 4:e1000071.

Ellegren H. 2004. Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 5:435–445.

Ellegren H, Primmer CR, Sheldon BC. 1995. Microsatellite ‘evolution’: directionality or bias? Nat Genet. 11:360–362.
Ellegren H, Smith NG, Webster MT. 2003. Mutation rate variation in the mammalian genome. Curr Opin Genet Dev. 13:562–568.

Fouche N, Ozgur S, Roy D, Griffith JD. 2006. Replication fork regression in repetitive DNAs. Nucleic Acids Res. 34:6044–6050.

Gendrel CG, Boulet A, Dutreix M. 2000. (CA/GT)(n) microsatellites affect homologous recombination during yeast meiosis. Genes Dev. 14:1261–1268.

Hamada H, Seidman M, Howard BH, Gorman CM. 1984. Enhanced gene expression by the poly(dT-dG)poly(dC-dA) sequence. Mol Cell Biol. 4:2622–2630.

Hammock EA, Young LJ. 2005. Microsatellite instability generates diversity in brain and sociobehavioral traits. Science. 308:1630–1634.

Hashem VI, Rosche WA, Sinden RR. 2004. Genetic recombination destabilizes (CTG)n(CAG)n repeats in E. coli. Mutat Res. 554:95–109.

Hellmann I, Ebersberger I, Ptak SE, Paabo S, Przeworski M. 2003. A neutral explanation for the correlation of diversity with recombination rates in humans. Am J Hum Genet. 72:1527–1535.


Hellmann I, Prufer K, Ji H, Zody MC, Paabo S, Ptak SE. 2005. Why do human diversity levels vary at a megabase scale? Genome Res. 15:1222–1231.

Huang QY, Xu FH, Shen H, Deng HY, Liu YJ, Liu YZ, Li JL, Recker RR, Deng HW. 2002. Mutation patterns at dinucleotide microsatellite loci in humans. Am J Hum Genet. 70:625–634.

International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature. 409:860–921.

Jakupciak JP, Wells RD. 2000. Genetic instabilities of triplet repeat sequences by recombination. IUBMB Life. 50:355–359.

Jeffreys AJ, Barber R, Bois P, et al. (15 co-authors). 1999. Human minisatellites, repeat DNA instability and meiotic recombination. Electrophoresis. 20:1665–1675.

Jeffreys AJ, Murray J, Neumann R. 1998. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Mol Cell. 2:267–273.

Jensen-Seaman MI, Furey TS, Payseur BA, et al. (42 co-authors). 2004. Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14:528–538.

Kashi Y, King DG. 2006. Simple sequence repeats as advantageous mutators in evolution. Trends Genet. 22:253–259.

Katti MV, Ranjekar PK, Gupta VS. 2001. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol. 18:1161–1167.

Kayser M, Roewer L, Hedman M, et al. (14 co-authors). 2000. Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs. Am J Hum Genet. 66:1580–1588.

Kayser M, Vowles EJ, Kappei D, Amos W. 2006. Microsatellite length differences between humans and chimpanzees at autosomal loci are not found at equivalent haploid Y chromosomal loci. Genetics. 173:2179–2186.

Kirkpatrick DT, Wang YH, Dominska M, Griffith JD, Petes TD. 1999. Control of meiotic recombination and gene expression in yeast by a simple repetitive DNA sequence that excludes nucleosomes. Mol Cell Biol. 19:7661–7671.

Kong A, Gudbjartsson DF, Sainz J, et al. (16 co-authors). 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.

Li YC, Korol AB, Fahima T, Nevo E. 2004. Microsatellites within genes: structure, function, and evolution. Mol Biol Evol. 21:991–1007.

Maynard Smith J, Haigh J. 1974. The hitch-hiking effect of a favourable gene. Genet Res. 23:23–35.

McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P. 2004. The fine-scale structure of recombination rate variation in the human genome. Science. 304:581–584.

Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, Devine SE. 2006. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 16:1182–1190.

Mirkin SM. 2006. DNA structures, repeat expansions and human hereditary disorders. Curr Opin Struct Biol. 16:351–358.

Morgante M, Hanafey M, Powell W. 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 30:194–200.

Murphy KE, Stringer JR. 1986. RecA independent recombination of poly[d(GT)-d(CA)] in pBR322. Nucleic Acids Res. 14:7325–7340.

Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science. 310:321–324.

Nachman MW. 2001. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17:481–485.

Napierala M, Dere R, Vetcher A, Wells RD. 2004. Structure-dependent recombination hot spot activity of GAATTC sequences from intron 1 of the Friedreich’s ataxia gene. J Biol Chem. 279:6444–6454.

Napierala M, Parniewski P, Pluciennik A, Wells RD. 2002. Long CTGCAG repeat sequences markedly stimulate intramolecular recombination. J Biol Chem. 277:34087–34100.

Nishant KT, Rao MR. 2006. Molecular features of meiotic recombination hot spots. Bioessays. 28:45–56.

Payseur BA, Nachman MW. 2000. Microsatellite variation and recombination rate in the human genome. Genetics. 156:1285–1298.

Pearson CE, Nichol Edamura K, Cleary JD. 2005. Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet. 6:729–742.

Petes TD. 2001. Meiotic recombination hot spots and cold spots. Nat Rev Genet. 2:360–369.

Ptak SE, Hinds DA, Koehler K, Nickel B, Patil N, Ballinger DG, Przeworski M, Frazer KA, Paabo S. 2005. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet. 37:429–434.

Rajeevan H, Osier MV, Cheung KH, et al. (13 co-authors). 2003. ALFRED: the ALelle FREquency database. Update. Nucleic Acids Res. 31:270–271.

Schlötterer C, Ritter R, Harr B, Brem G. 1998. High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele-specific mutation rates. Mol Biol Evol. 15:1269–1274.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. Genetica. 102–103:359–367.

Schultes NP, Szostak JW. 1991. A poly(dA.dT) tract is a component of the recombination initiation site at the ARG4 locus in Saccharomyces cerevisiae. Mol Cell Biol. 11:322–328.

Sharp AJ, Locke DP, McGrath SD, et al. (14 co-authors). 2005. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 77:78–88.

Slatkin M. 1995. Hitchhiking and associative overdominance at a microsatellite locus. Mol Biol Evol. 12:473–480.

Spencer CC, Deloukas P, Hunt S, Mullikin J, Myers S, Silverman B, Donnelly P, Bentley D, McVean G. 2006. The influence of recombination on human genetic diversity. PLoS Genet. 2:e148.

Stouffer SA, Suchman EA, Devinney LC, Star SA, Williams RMJ. 1949. The American Soldier, vol 1: adjustment during army life. Princeton (NJ): Princeton University Press.

Tanay A, Siggia ED. 2008. Sequence context affects the rate of short insertions and deletions in flies and primates. Genome Biol. 9:R37.

Tautz D, Trick M, Dover GA. 1986. Cryptic simplicity in DNA is a major source of genetic variation. Nature. 322:652–656.

Thomas M, Ihle S, Ravagarimanana I, Kraechter S, Wiehe T, Tautz D. 2005. Microsatellite variability in wild populations of the house mouse is not influenced by differences in chromosomal recombination rates. Biol J Linn Soc. 84:629–635.

Treco D, Arnheim N. 1986. The evolutionarily conserved repetitive sequence d(TGAC)n promotes reciprocal exchange and generates unusual recombinant tetrads during yeast meiosis. Mol Cell Biol. 6:3934–3947.


Wahls WP, Wallace LJ, Moore PD. 1990. The Z-DNA motif d(TG)30 promotes reception of information during gene conversion events while stimulating homologous recombination in human cells in culture. Mol Cell Biol. 10:785–793.

Wiehe T. 1998. The effect of selective sweeps on the variance of the allele distribution of a linked multiallele locus: hitchhiking of microsatellites. Theor Popul Biol. 53:272–283.

Wu TC, Lichten M. 1995. Factors that affect the location and frequency of meiosis-induced double-strand breaks in Saccharomyces cerevisiae. Genetics. 140:55–66.

Yi S, Li WH. 2005. Molecular evolution of recombination hotspots and highly recombining pseudoautosomal regions in hominoids. Mol Biol Evol. 22:1223–1230.

Yu A, Zhao C, Fan Y, et al. (11 co-authors). 2001. Comparison of human genetic and sequence-based physical maps. Nature. 409:951–953.

Diethard Tautz, Associate Editor

Accepted September 1, 2008