REVIEW ARTICLE
The role of variable DNA tandem repeats in bacterial adaptation
Kai Zhou, Abram Aertsen & Chris W. Michiels
Department of Microbial and Molecular Systems (M²S), Faculty of Bioscience Engineering, Laboratory of Food Microbiology and Leuven Food Science and Nutrition Research Centre (LFoRCe), KU Leuven, Leuven, Belgium
Correspondence: Chris W. Michiels, Laboratory of Food Microbiology, Kasteelpark Arenberg 23, B-3001 Leuven, Belgium. Tel.: +32 16 321578; fax: +32 16 321960
Received 17 April 2013; revised 13 July 2013; accepted 26 July 2013. Final version published online 28 August 2013.
DOI: 10.1111/1574-6976.12036
Editor: Grzegorz Wegrzyn
MICROBIOLOGY REVIEWS
Keywords: polymorphic DNA; contingency loci; host–pathogen interaction; stress tolerance; evolution; phase variation.
Abstract
DNA tandem repeats (TRs), also designated satellite DNA, are inter- or intragenic nucleotide sequences that are repeated two or more times in a head-to-tail manner. Because TR tracts are prone to strand-slippage replication and recombination events that cause the TR copy number to increase or decrease, loci containing TRs are hypermutable. An increasing number of examples illustrate that bacteria can exploit this instability of TRs to reversibly shut down or modulate the function of specific genes, allowing them to adapt to changing environments on short evolutionary time scales without an increased overall mutation rate. In this review, we discuss the prevalence and distribution of inter- and intragenic TRs in bacteria and the mechanisms of their instability. In addition, we review evidence demonstrating a role of TR variations in bacterial adaptation strategies, ranging from immune evasion and tissue tropism to the modulation of environmental stress tolerance. However, although bioinformatic analysis reveals that most bacterial genomes contain from a few to several dozen intra- and intergenic TRs, only a small fraction of these have been functionally studied to date.
Introduction
To cope with rapidly changing environmental conditions and ensure their survival, unicellular organisms have evolved a plethora of adaptation strategies (Aertsen & Michiels, 2004, 2005). Most of these strategies are based on transient alterations in gene expression in response to stressful conditions; well-known examples include the SOS response (controlled by RecA and LexA), the general stress response (regulated by the sigma factor RpoS), the stringent response (mediated by pppGpp and ppGpp), and the heat shock response (mainly controlled by RpoH; Massey & Buckling, 2002; Foster, 2005; Saint-Ruf & Matic, 2006; Foster, 2007; Jolivet-Gougeon et al., 2011). In addition, adaptation can stem from the acquisition of stochastic mutations that alter the genotype and become positively selected and fixed in a population if they confer a beneficial phenotype (Rando & Verstrepen, 2007). However, an important drawback of the latter strategy is that random mutations are more often deleterious than beneficial. Interestingly, neither the type nor the frequency of mutational events is randomly distributed over the genome, and some DNA sequences have evolved into mutational hotspots that drive the variability of genes whose activity can affect the adaptive potential of their host. One such type of sequence, very abundant in prokaryotic and eukaryotic genomes, is the tandem repeat (TR), a major class of direct DNA repeats. Although TRs were at first considered junk DNA without any biological function, studies of the human genome have revealed that some of these repetitive sequences are hypermutable and cause diseases such as fragile X syndrome, spinobulbar muscular atrophy, and Huntington disease (Hannan, 2010). In addition, accumulating evidence points to a potential role of TRs as engines of genetic variability and bacterial adaptation. In this review, we therefore focus on the identification and distribution of TRs in bacteria, the mechanisms behind their variability, and their biological significance in bacterial adaptation.

FEMS Microbiol Rev 38 (2014) 119–141
© 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved
Identification and distribution of TRs in bacterial genomes
Definition of TRs
TRs are nucleotide sequences that are directly repeated in a head-to-tail manner. Depending on how well the repeated sequence is conserved, TRs are classified as identical (perfect) TRs or degenerate (imperfect) TRs (Fig. 1). Furthermore, TRs are commonly classified into three categories according to their repeat unit size, although there is no consensus definition of these categories (Richard et al., 2008). Repeats with unit sizes of 1–9, 10–100, and > 100 bp are termed microsatellites, minisatellites, and macrosatellites, respectively (Lopes et al., 2006). The term ‘satellite DNA’ originally referred to the very large arrays of tandemly repeated noncoding DNA (often hundreds of copies) that are characteristic of large eukaryotic genomes, but, in the context of bacterial genomes, it is also used to include small and intragenic TRs.

In silico identification of TRs
The increasing availability of genome sequences and specialized bioinformatics software greatly facilitates the genome-wide search for and identification of TR loci, which is a prerequisite for understanding their distribution, predicting their function, and tracking their evolution. A variety of algorithms have been developed for detecting TRs, but it is important to be aware that these may differ in their ability to detect different types of TRs (Merkel & Gemmell, 2008; Treangen et al., 2009; Kajava, 2012). Hence, the choice of search tool should be determined by the TR type of interest, or several algorithms should be used in parallel when a wide screen for TRs is performed. Furthermore, parameter settings (e.g. alignment weights, definition of repeats, and threshold scores) can strongly affect the outcome in terms of the number and consensus motifs of the TRs detected (Lim et al., 2013). In particular, imperfect TRs are still commonly difficult to detect (Leclercq et al., 2007; Schaper et al., 2012). Several algorithms are freely available online, such as Tandem Repeats Finder (Benson, 1999) and IMEx (Mudunuri & Nagarajaram, 2007). In addition, several databases of annotated TRs in prokaryotes have been established, such as TRs DB (http://minisatellites.u-psud.fr), PSSRDb (http://pssrdb.cdfd.org.in) and MICAS (http://180.149.48.108/micas/index.php). In the next section, we review the major findings of some recent in silico studies of the genomic distribution of TRs in bacteria.

The distribution of TRs in bacterial genomes
[Fig. 1: (a) Identical/perfect TRs, unit sequence 100% conserved (e.g. AGCTG AGCTG AGCTG AGCTG AGCTG); degenerate/imperfect TRs, unit sequence < 100% conserved (e.g. AGCTG TGCTG AGGTG AGCTG AGCTC). (b) Microsatellite (unit size 1–9 bp), minisatellite (unit size 10–100 bp), macrosatellite (unit size > 100 bp).]
Fig. 1. Schematic representation of different types of TRs. (a) Different conservation of repeat unit sequence. (b) Different sizes of repeat unit. Space between repeat units has only been introduced to improve visual clarity of the figure.

The analysis of TRs in bacterial genomes has so far mainly focused on microsatellites with a unit size of 1–6 bp, also termed ‘simple sequence repeats (SSRs)’. A number of general observations regarding the distribution of SSRs can be made. First, SSRs are less abundant in bacteria than in eukaryotes (Schlotterer et al., 2006). Nevertheless, the number of SSRs is orders of magnitude higher than that of other repeat types (i.e. minisatellites and macrosatellites) in the genomes of most, if not all, bacteria. This is not unexpected, because SSR counts include even mononucleotide trimers (e.g. AAA), which account for about 70% of all SSRs. While SSRs are generally believed to contribute to genome polymorphism and the adaptive potential of bacteria (Kassai-Jager et al., 2008), the contribution of very short SSRs such as mononucleotide trimers is probably limited. In fact, a rough threshold for the minimum number of TR units (4–9) has been noted, below which an SSR is unlikely to mutate or be variable (Lai & Sun, 2003; Dettman & Taylor, 2004; Kelkar et al., 2010). Intriguingly, heptameric repeats were found to be overrepresented in most prokaryotes, and it was hypothesized that the seemingly preferred 7-bp repeat unit length might relate to the size of the DNA segment that interacts with the active site of the DNA polymerase, thus facilitating polymerase slippage (Mrazek et al., 2007). A remarkable feature of SSRs is their highly diverse distribution across species, even closely related ones, which may indicate that they are subject to rapid evolutionary change (Yang et al., 2003; Mrazek, 2006; Kassai-Jager et al., 2008). Analysis of more than 300 prokaryotic genomes showed that the distribution of SSRs varied with bacterial species, genome size, and G + C content (Mrazek et al., 2007). More specifically, SSRs with small motifs (1–4 bp) are more abundant in small genomes, particularly in host-adapted pathogens with reduced genomes (< 2 Mb) and low G + C content (< 40%), such as Mycoplasma and Haemophilus spp. (Moxon et al., 2006; Treangen et al., 2009). In contrast, SSRs with larger motifs (5–11 bp) are more frequent in nonpathogens and opportunistic pathogens with large genomes (> 4 Mb) and high G + C content (> 60%), such as Burkholderia and Anabaena spp.
Based on this observation, it was hypothesized that the differential representation of SSRs in bacteria may correlate with pathogenicity, but more work is needed to corroborate this. Another interesting observation is that some relatively large bacterial genomes (e.g. Pseudomonas aeruginosa, c. 5 Mb) have fewer SSRs than would be predicted from their genome properties, but harbor comparatively more two-component sensor transducers. In contrast, some host-adapted pathogens with small genomes (e.g. Haemophilus influenzae, Neisseria meningitidis, and Helicobacter pylori) have comparatively more SSRs but fewer two-component sensor transducers (Moxon et al., 2006). Thus, it seems that environmental adaptability in host-adapted pathogens depends primarily on SSR variation, whereas in opportunistic pathogens with a more versatile lifestyle it depends primarily on two-component sensor transducers.
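To make the definitions of Fig. 1 and the copy-number threshold discussed above concrete, the following is a minimal Python sketch of a naive scan for perfect SSRs. It is not how Tandem Repeats Finder or IMEx work and, unlike those tools, it ignores degenerate (imperfect) repeats. The function names, the default maximum unit size of 6 bp, and the default minimum of four copies (the lower end of the 4–9 threshold cited above) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    start: int   # 0-based position of the first repeat unit
    unit: str    # repeat unit (motif)
    copies: int  # number of complete, identical copies

    @property
    def length(self) -> int:
        return len(self.unit) * self.copies


def size_class(unit_len: int) -> str:
    """Size classes as given in the text (Lopes et al., 2006); no consensus exists."""
    if unit_len <= 9:
        return "microsatellite"
    if unit_len <= 100:
        return "minisatellite"
    return "macrosatellite"


def is_primitive(unit: str) -> bool:
    """False if the unit is itself a repeat of a shorter motif (e.g. 'ATAT' = 2 x 'AT')."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_trs(seq: str, max_unit: int = 6, min_copies: int = 4) -> list[TandemRepeat]:
    """Naive scan for perfect (identical) TRs; degenerate repeats are not detected."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            # Extend while the next k bases are an exact copy of the unit.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit):
                hits.append(TandemRepeat(i, unit, copies))
                i += copies * k  # continue scanning after the tract
            else:
                i += 1
    return sorted(hits, key=lambda tr: (tr.start, len(tr.unit)))
```

For example, on the sequence `"TT" + "CAG" * 5 + "T" + "A" * 8 + "CT"` the scan reports a CAG tract of five copies starting at position 2 and a poly-A tract of eight copies starting at position 18, both microsatellites. Discarding non-primitive units avoids reporting the same tract twice (e.g. as A×8 and AA×4), which mirrors, in a very simplified way, the redundancy problem that real TR finders must handle.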
Closer examination of the SSR distribution across the genome shows significant differences between coding and noncoding regions. Because bacterial genomes are more compact than those of eukaryotes, they contain comparatively more intragenic than intergenic SSRs. For example, in Escherichia coli K-12, 79.5% of SSRs are located in coding regions (Gur-Arie et al., 2000), whereas in the genome of the Japanese pufferfish (Fugu rubripes), only 11.6% of SSRs are intragenic (Edwards et al., 1998). Generally, long mono- and dinucleotide SSRs are excluded from coding regions, probably because they are more likely to rearrange and cause frameshift mutations in genes (Coenye & Vandamme, 2005; Ackermann & Chao, 2006; Orsi et al., 2010; Lin & Kussell, 2012). In contrast, SSRs whose unit size is a multiple of three nucleotides (3, 6, 9 …) are overrepresented in open reading frames (ORFs), because their expansion or contraction does not disrupt the reading frame (Mrazek et al., 2007). However, exceptions have been reported. For example, tetranucleotide SSRs of H. influenzae are found exclusively in ORFs, which is consistent with their role in phase variation (Power et al., 2009). An interesting situation exists in the mycoplasmas: long trinucleotide repeats are overrepresented in Mycoplasma genitalium, Mycoplasma gallisepticum, and Mycoplasma hyopneumoniae, but they occur mainly in intergenic regions in the former two species and in coding regions in the latter (Mrazek, 2006). This difference in distribution is also reflected in different functional roles. In M. gallisepticum, the most prominent trinucleotide TRs are the GAA repeats in the 5′ untranslated region of the 42–70 vlpA adhesin gene paralogs present in each strain, which regulate vlpA gene expression (Glew et al., 1998, 2000; Liu et al., 2000; Papazisi et al., 2003). In contrast, M. hyopneumoniae trinucleotide repeats are found mostly within hypothetical ORFs, but also in some adhesins, and their contraction or expansion results in variable amino acid repeats that are believed to play a role in protein–protein interaction or adhesion (Mrazek, 2006). A more detailed study of the occurrence of intragenic TRs in 44 bacteria and archaea revealed additional features (Lin & Kussell, 2012). Intragenic SSRs were found more frequently near the termini (5′ and 3′ ends) of ORFs than in the middle, which most likely stems from biophysical constraints of protein structure. In addition, SSR-induced frameshifts at the 3′ end are less harmful than those elsewhere, because most of the upstream coding region is not affected. Nevertheless, SSRs were overrepresented at the 5′ end of ORFs in pathogens, probably because this allows SSR-induced frameshifts to act as an ON/OFF switch for these ORFs, which can be advantageous for pathogens because it facilitates rapid adaptation of populations. Similar observations had already been made for some intragenic mononucleotide repeats at the 5′ end of genes (van Passel & Ochman, 2007; Janulczyk et al., 2010; Orsi et al., 2010).

However, it remains unclear, and is often difficult to prove, whether this type of distribution bias of intragenic SSRs is linked to selection pressures in bacteria. An argument in favor of such a link is that intragenic SSRs show a preference for certain categories of genes. In both Gram-negative (e.g. Haemophilus and Helicobacter) and Gram-positive (e.g. Streptococcus) pathogens, SSR-associated genes frequently encode virulence factors, cell surface components, and restriction–modification enzymes (van Belkum, 1999; Moxon et al., 2006; Guo & Mrazek, 2008; Power et al., 2009; Janulczyk et al., 2010). On the other hand, several intragenic SSRs with numerous repeat copies and a unit size that is not a multiple of three are also found in housekeeping genes whose products are essential for important cellular processes, such as cell division, energy production, and DNA replication and repair (Guo & Mrazek, 2008). Rearrangements of such TRs that disrupt the reading frame would be expected to be detrimental or even lethal for the cell, and it remains unclear why these TRs have been maintained during evolution. Intergenic SSRs also show a nonrandom distribution, being found more frequently in the immediate vicinity of genes than at distant positions. For example, intergenic SSRs of E. coli K-12 are concentrated in a region up to 200 bp from the start codon, which contains proximal regulatory elements of gene expression (Gur-Arie et al., 2000). Another study showed that, in most cases, intergenic SSRs with numerous copies are located upstream of the first gene in prokaryotic operons (Guo & Mrazek, 2008). Together, both studies point to a potential role of intergenic SSRs in the regulation of gene expression.
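The reading-frame argument above reduces to modular arithmetic: changing an intragenic tract with unit length u by Δ units shifts the downstream reading frame by (u·Δ) mod 3 nucleotides. The Python sketch below makes this explicit. The function names are hypothetical, and treating the reference allele as translated ("ON") and every frameshifted allele as silenced ("OFF") is a deliberate simplification that ignores alternative start codons and ribosomal frameshifting.

```python
def frame_offset(unit_len: int, copies: int, ref_copies: int) -> int:
    """Shift (0, 1 or 2 nt) of the downstream reading frame relative to a reference allele."""
    # Python's % always returns a non-negative result for a positive modulus,
    # so contractions (negative changes) are handled correctly.
    return (unit_len * (copies - ref_copies)) % 3


def translation_state(unit_len: int, copies: int, ref_copies: int) -> str:
    # Simplifying assumption: the reference allele is in frame ("ON") and any
    # frameshift silences the ORF ("OFF").
    return "ON" if frame_offset(unit_len, copies, ref_copies) == 0 else "OFF"
```

For a trinucleotide tract, every copy-number change keeps the frame (`translation_state(3, 10, 7)` is `"ON"`), which is why such repeats are tolerated inside ORFs. For a tetranucleotide tract with an in-frame reference of 20 copies, `[translation_state(4, n, 20) for n in range(18, 24)]` gives `['OFF', 'OFF', 'ON', 'OFF', 'OFF', 'ON']`: only every third copy number restores the frame, which is the basis of the ON/OFF switching described for SSRs near the 5′ end of ORFs.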
The variability of TRs
The variability of TRs is thought to be one of the drivers of genomic plasticity. Regions containing TRs are potentially hypermutable through contraction (deletion) or expansion (insertion) of TR units, and mutation frequencies of up to 10⁻¹ have been reported in bacteria (Rando & Verstrepen, 2007). Not surprisingly, the polymorphisms found in TR loci provide a foundation for DNA genotyping approaches such as variable-number TR typing or multilocus variable-number TR analysis, which are commonly applied for pathogen typing (Lindstedt, 2005, 2011; reviewed in Chiou, 2010).

The molecular mechanisms of TR variation
Based on extensive studies in both plasmid-based and chromosomal systems, two nonexclusive mechanisms, replication slippage and recombination, are currently widely accepted to explain TR variation (Pearson et al., 2005; Bichara et al., 2006; Gemayel et al., 2010). For the slippage mechanism, several models have been proposed, the most common being strand-slippage mispairing (SSM), also called DNA slippage or polymerase slippage (Kornberg et al., 1964; Streisinger et al., 1966). This model proposes that TR rearrangements result from strand slippage during DNA replication. The process is initiated by the formation of a bulge of unpaired repeats on either the template or the nascent strand. A bulge on the template strand results in a TR contraction (deletion) in the newly synthesized DNA, whereas a bulge on the nascent strand results in a TR expansion (insertion) (Fig. 2). Substantial evidence currently supports the involvement of replication in TR instability in bacteria. In E. coli, for example, trinucleotide repeat tracts in a plasmid or in the chromosome were dramatically destabilized in a dnaQ49 mutant. The dnaQ gene encodes the 3′–5′ exonucleolytic ε subunit of DNA polymerase III, which is involved in proofreading, and it was therefore suggested that this mutant failed to correctly remove slipped structures in the TR tracts during replication (Iyer et al., 2000; Zahra et al., 2007). Moreover, mutations in the α subunit (encoded by dnaE) of the DNA polymerase III holoenzyme, the γ and τ subunits of the clamp-loading complex (encoded by dnaX), and the β clamp (encoded by dnaN) have also been shown to increase the instability of microsatellites and tandemly repeated DNA sequences (reviewed in Bichara et al., 2006). In general, the effect of such replication defects on TR rearrangements supports the replication slippage mechanism, because a prerequisite of this mechanism is that DNA replication is stalled by the secondary loop structures formed by TRs.
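The outcome rule of the SSM model (template bulge gives contraction, nascent-strand bulge gives expansion) can be written down directly. The Python sketch below does so and adds a toy random walk of copy number along one cell lineage. The slippage probability, the restriction to one-unit slips, and the assumption that both strands are equally likely to bulge are illustrative choices, not values from the studies cited above; the function names are hypothetical.

```python
import random


def slipped_copy_number(n_units: int, bulge_strand: str, bulge_units: int = 1) -> int:
    """Copy number in the daughter strand after one strand-slippage mispairing event."""
    if not 1 <= bulge_units < n_units:
        raise ValueError("a bulge needs 1 <= bulge_units < n_units (some units must stay paired)")
    if bulge_strand == "template":
        return n_units - bulge_units  # looped-out template units are not copied: contraction
    if bulge_strand == "nascent":
        return n_units + bulge_units  # looped-out nascent units are copied again: expansion
    raise ValueError("bulge_strand must be 'template' or 'nascent'")


def simulate_lineage(n_units: int, generations: int, slip_prob: float, seed: int = 0) -> list[int]:
    """Toy random walk of TR copy number along a single cell lineage."""
    rng = random.Random(seed)
    history = [n_units]
    for _ in range(generations):
        # Assumptions: at most one 1-unit slip per replication, both strands equally
        # likely, and no slippage once only a single unit is left (nothing to mispair).
        if n_units > 1 and rng.random() < slip_prob:
            n_units = slipped_copy_number(n_units, rng.choice(["template", "nascent"]))
        history.append(n_units)
    return history
```

For instance, `slipped_copy_number(10, "template")` returns 9 and `slipped_copy_number(10, "nascent", 2)` returns 12. The simulation is only meant to show how repeated, reversible slippage events let a population sample neighboring copy numbers; it does not model the copy-number dependence of slippage rates discussed later.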
Notably, SSM is not only widely accepted as accounting for most SSR variation; it is also invoked to explain the genesis of TRs from nonrepetitive DNA (Levinson & Gutman, 1987; Waite et al., 2003; Lindbäck et al., 2011). Besides SSM, recombination is considered another mechanism of TR instability, which could explain phenomena that cannot be explained by slippage. Generally, recombination is more important for the rearrangement of TRs with a large unit size, whereas SSM is the dominant mechanism underlying variation of TRs with a small unit size (Bi & Liu, 1996; Richard & Pâques, 2000; Bzymek & Lovett, 2001; Rocha, 2003; Gemayel et al., 2010). Both homologous (RecA-dependent) and illegitimate (RecA-independent) recombination can be involved. Several models have been proposed for the recombination mechanism, such as unequal crossover and intramolecular recombination (Fig. 3), and evidence for the involvement of recombination in TR variation is accumulating. For example, it has been suggested that double-strand DNA break (DSB) repair can induce TR instability via homologous recombination in prokaryotes and eukaryotes (Hebert & Wells, 2005; reviewed in Richard et al., 2008; Malkova & Haber, 2012), and mechanistic models explaining this process have been elaborated. One such model is the DSB repair slippage model, which combines the double Holliday junction intermediate pathway with the strand-slippage model (Richard et al., 2008). Alternatively, the synthesis-dependent strand annealing pathway also contributes to TR rearrangements mediated by DSB repair (Pâques et al., 1998, 2001; Richard et al., 1999; Richard & Pâques, 2000). This model has been proposed to explain TR rearrangements that are rarely associated with crossover events (Pâques et al., 1998, 2001; Richard et al., 1999).

[Fig. 2: partially replicated TR tract (5′/3′ strands) undergoing replication, strand dissociation, and mispairing, leading to contraction or expansion.]
Fig. 2. Diagram illustrating the replication slippage mechanism of TR rearrangement. Repeat units are shown as blocks on nascent (light green) and template (dark green) DNA strands. Shown is a partially replicated TR region undergoing transient dissociation and mispairing, resulting in a bulge on the template or the nascent strand and leading to deletion or insertion of TRs, respectively. Space between repeat units has only been introduced to improve visual clarity of the figure.

Factors influencing TR rearrangement frequencies
While TRs are generally intrinsically prone to contraction or expansion, the actual frequency of these events can vary widely depending on both intrinsic (structural) and extrinsic (environmental) factors. Regarding intrinsic TR features, several studies have established a positive correlation between TR copy number and rearrangement frequency. Through in silico analysis of the genomes of multiple strains of 42 fully sequenced prokaryotic species, Lin & Kussell (2012) observed that the variability of three types of SSRs (monomeric, dimeric, and trimeric repeats) increased dramatically with the number of repeat units. In another study, an exponential relation between the number of repeat units and rearrangement frequency was observed in a comparison of 30 artificially constructed TRs (unit lengths of 2, 10, and 20 nucleotides; 2–50 units; sequence conservation between 62.5% and 100%; Legendre et al., 2007). This may also explain why long TRs are relatively uncommon (Lai & Sun, 2003). Similar findings have been reported in several other studies (Goldstein & Clark, 1995; Brinkmann et al., 1998; Lai & Sun, 2003; Vogler et al., 2006). Likewise, a positive relationship is generally found between TR mutation frequency and the size of the repeat unit (Sia et al., 1997; Schug et al., 1998; Eckert & Hile, 2009; Bayliss et al., 2012), as well as the degree of conservation between repeats (Legendre et al., 2007). In addition, the GC content of the TR is an important determinant of stability, because repeated sequences are prone to form diverse non-B DNA structures (e.g. hairpins and triplexes) whose stability depends on base composition; these structures may cause pausing of the DNA polymerase and replication fork collapse, and in turn necessitate intervention of the repair and recombination machinery to reinitiate replication (Wells et al., 2005; Choudhary & Trivedi, 2010). The orientation of the repeats with respect to the direction of replication can also affect the mutation rate, because TRs are more prone to form secondary structures in one orientation than in the other (Hebert et al., 2004), and replication fidelity differs between the leading and the lagging strand (Gawel et al., 2002). Intriguingly, based on a theoretical model, Lai & Sun (2003) suggested that expansion occurs more frequently in short microsatellites, whereas contraction occurs more frequently in long ones, implying that the direction of rearrangement may depend on repeat length. However, this prediction has not yet been validated experimentally.

Aside from intrinsic factors, extrinsic environmental conditions may also affect TR rearrangement frequencies. For instance, several TRs used in a multilocus TR typing scheme for E. coli showed enhanced variation in E. coli O157:H7 at increased growth temperature and upon starvation, but not upon irradiation (Cooley et al., 2010). Likewise, in sclB of Streptococcus pyogenes, which encodes a collagen-like surface protein, TR variations occurred during growth in fresh human blood but not in culture medium (Rasmussen & Björk, 2001). However, the mechanisms by which environmental stresses affect the TR mutation frequency remain poorly understood.

[Fig. 3: (a) unequal crossover between two misaligned TR-containing molecules, yielding an expansion and a contraction product; (b) intramolecular recombination, yielding contraction.]
Fig. 3. Diagram illustrating the recombination mechanism of TR rearrangement. DNA molecules with repeat units represented as blocks are shown in different colors. (a) Model of unequal crossover. When repeat units of two different DNA molecules misalign, unequal crossover can occur, resulting in repeat expansion in one crossover product and repeat contraction in the other. (b) Model of intramolecular recombination. When recombination occurs between repeat units within a DNA molecule, two products are generated that have undergone a repeat contraction.
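The copy-number bookkeeping of the two recombination models in Fig. 3 can be sketched in a few lines of Python. Here a crossover point is described by the number of repeat units lying before it in each tract; this coordinate convention and the function names are assumptions made for illustration. The sketch tracks only unit counts, not the strand-exchange chemistry.

```python
def unequal_crossover(a_units: int, b_units: int, i: int, j: int) -> tuple[int, int]:
    """Copy numbers of the two products of a crossover after unit i of tract A and unit j of tract B.

    i == j is an in-register (equal) exchange; i != j means the tracts were misaligned.
    """
    if not (0 <= i <= a_units and 0 <= j <= b_units):
        raise ValueError("crossover points must lie within the tracts")
    # Product 1: the first i units of A joined to the last (b_units - j) units of B.
    # Product 2: the first j units of B joined to the last (a_units - i) units of A.
    return i + (b_units - j), j + (a_units - i)


def intramolecular_recombination(n_units: int, i: int, j: int) -> int:
    """Units left in the tract after recombination between unit boundaries i < j of the same molecule."""
    if not 0 <= i < j <= n_units:
        raise ValueError("need 0 <= i < j <= n_units")
    return n_units - (j - i)  # the j - i units between the two sites are looped out of the tract
```

For two 10-unit tracts misaligned by two units, `unequal_crossover(10, 10, 6, 4)` returns `(12, 8)`: one product gains exactly what the other loses, so the total stays at 20, as in Fig. 3a. An in-register exchange such as `unequal_crossover(10, 10, 5, 5)` leaves both tracts at 10 units, and `intramolecular_recombination(10, 2, 5)` leaves 7 units, matching the contraction in Fig. 3b.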
The phenotypic impact of intergenic TR variations in bacteria
Accumulating evidence indicates that rearrangements of intergenic TRs can confer transcriptional evolvability (Jansen et al., 2012). More specifically, SSRs positioned as cis-regulatory elements around the promoter region can induce phase variation (i.e. stochastic, high-frequency, reversible switching of genotype and/or phenotype) by modulating the transcription of the corresponding genes. Most studies indicate that the intergenic SSRs involved in phase variation, with the exception of mononucleotide SSRs, tend to be A/T rich, which makes them prone to melting and SSM. In this section, we review examples of this mechanism of phase variation according to the position of intergenic TRs relative to the transcriptional start site (Fig. 4; Table 1).
[Fig. 4: promoter region showing the UAS, the −35 and −10 boxes recognized by RNA polymerase, +1, SD, ATG and the ORF, with TR regions A (upstream of −35), B (between −35 and −10) and C (between +1 and ATG) marked.]
Fig. 4. Scheme showing possible positions of intergenic TRs in a standard promoter region that can cause phase variation. Indicated are an ORF, a promoter with −10 and −35 signatures that are recognized by RNA polymerase (RNA pol), an upstream regulatory sequence to which activators or repressors can bind (UAS), the transcription initiation site (+1), a Shine–Dalgarno sequence for ribosome binding (SD), and a translation start codon (ATG). Repeats in regions A and B (double-headed arrows) can modulate gene expression by affecting transcription initiation (see ‘TRs upstream of the −35 site of the promoter’ and ‘TRs between the −35 and −10 sites of the promoter’), whereas repeats in region C operate via as yet unidentified means (see ‘TRs between transcriptional start site and ORF’; adapted from van der Woude & Baumler, 2004).
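The positional classes of Fig. 4 (and of Table 1) can be expressed as a simple interval test, which is roughly what one would do when annotating intergenic TRs found in a genome scan. The Python sketch below assumes hypothetical half-open [start, end) coordinates relative to the transcription start site; the `Promoter` type, the region labels, and the example coordinates are illustrative assumptions, not data from the cited studies. The case of TRs between two separate transcription start sites (Table 1) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Promoter:
    # Hypothetical half-open [start, end) coordinates on the sense strand.
    box35: tuple[int, int]  # -35 hexamer
    box10: tuple[int, int]  # -10 hexamer
    tss: int                # transcription start site (+1)
    atg: int                # first base of the start codon

    @property
    def spacer_bp(self) -> int:
        """Length of the spacer between the -35 and -10 hexamers."""
        return self.box10[0] - self.box35[1]


def classify_tr(p: Promoter, start: int, end: int) -> str:
    """Assign a TR spanning [start, end) to region A, B or C of Fig. 4, or 'other'."""
    if end <= p.box35[0]:
        return "A"  # upstream of -35: may alter binding of regulatory proteins
    if start >= p.box35[1] and end <= p.box10[0]:
        return "B"  # inside the -35/-10 spacer: alters promoter element spacing
    if start >= p.tss and end <= p.atg:
        return "C"  # between +1 and the ORF: mechanism less clear
    return "other"  # overlaps a promoter element or lies elsewhere
```

With `p = Promoter(box35=(-35, -29), box10=(-12, -6), tss=1, atg=40)`, the spacer is 17 bp, the near-optimal distance mentioned in the next section; a TR spanning (-60, -40) falls in region A, one spanning (-27, -15) in region B, one spanning (10, 25) in region C, and one overlapping the −35 hexamer, such as (-32, -20), is reported as `"other"`.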
TRs upstream of the −35 site of the promoter
When TRs are present upstream of the −35 site, the repeat copy number is expected to affect the binding of transcription factors and thus modulate gene expression. This has been reported in the pathogen N. meningitidis, where expression of the nadA gene (encoding a protein that functions as an invasin and adhesin) is regulated by the number of tetrameric repeats (TAAA) upstream of the RNA polymerase binding site, resulting in three significantly distinct transcriptional levels (high, intermediate, and low). The frequency of phase variation between these levels has been estimated at c. 4.4 × 10⁻⁴. Remarkably, nadA transcription shows a periodic rather than a monotonic relation to the number of repeat units: the transcription level varies in the order low–high–intermediate–low–high for 4–5–6–7–8 repeat copies and again for 9–10–11–12–13 repeats. Further mechanistic studies indicated that variation of the tetrameric repeats affects the binding of the transcriptional regulator integration host factor to the nadA promoter (Martin et al., 2003, 2005). More recently, Metruccio et al. (2009) discovered that, depending on the number of TAAA repeats, a novel repressor (NadR) prevents transcription of nadA by binding two operators flanking the variable tetrameric repeat tract on both sides. It was therefore proposed that altering the spacing between these two operators by varying the number of TAAA repeats may affect the ability of NadR to repress nadA expression (Metruccio et al., 2009).

TRs between the −35 and −10 sites of the promoter
The spacing between the −35 and −10 sites of a standard promoter is critical for the binding efficiency of RNA polymerase (Fig. 4), and the optimal distance in bacteria is around 17 bp. Consequently, when TRs are located in that region, copy number changes are expected to modulate the transcription level of the genes in the transcriptional unit. An example of this is found in the host-adapted pathogen H. influenzae, which adheres to human epithelial cells with the help of LKP fimbriae (also called long, thick pili). These fimbriae are encoded by the hif gene cluster and are important for H. influenzae infection at different stages (van Ham et al., 1993). Phase variation in the expression of these fimbriae is mediated by a string of dinucleotide repeats between the −35 and −10 sites within the overlapping but divergent promoter regions of the hifA and hifB genes: TR copy numbers of 9, 10, and 11 result in no, high, and low expression of both genes, respectively. Another intriguing example is the FetA protein of Neisseria gonorrhoeae, an iron-repressible protein that functions as a ferric enterobactin receptor. FetA expression exhibits extremely rapid phase variation (switching frequency up to 1.3% per generation), correlating with polymorphism of a poly-C tract between the −10 and −35 regions of the fetA promoter. Different lengths of the poly-C tract result in either high or low expression.
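The copy-number effects reported for nadA and hifA/hifB can be captured as small lookup functions, and the switching frequency reported for FetA lends itself to a back-of-the-envelope calculation. The Python sketch below encodes only the values given in the text and deliberately refuses to extrapolate beyond them. Assuming that the dinucleotide tract lies entirely inside the −35/−10 spacer (so that each unit shifts the spacing by 2 bp), and ignoring back-switching and selection in the switching estimate, are simplifying assumptions; all names are hypothetical.

```python
# Copy-number/expression relationships as reported in the text; values outside
# the reported ranges are deliberately not extrapolated.
NADA_CYCLE = ("low", "high", "intermediate", "low", "high")  # TAAA copies 4, 5, 6, 7, 8
HIF_EXPRESSION = {9: "none", 10: "high", 11: "low"}           # dinucleotide copies, hifA/hifB


def nada_level(taaa_copies: int) -> str:
    """nadA transcription level; the pattern repeats with a period of five TAAA units (20 bp)."""
    if not 4 <= taaa_copies <= 13:
        raise ValueError("only 4-13 TAAA copies are covered by the reported data")
    return NADA_CYCLE[(taaa_copies - 4) % 5]


def spacer_shift_bp(unit_len: int, copies: int, ref_copies: int) -> int:
    """Change in -35/-10 spacing, assuming the whole TR lies inside the spacer."""
    return unit_len * (copies - ref_copies)


def switched_fraction(rate_per_generation: float, generations: int) -> float:
    """Fraction of lineages with at least one switch, ignoring back-switching and selection."""
    return 1 - (1 - rate_per_generation) ** generations
```

Here `nada_level(9)` is `"low"` and `nada_level(10)` is `"high"`, reproducing the second cycle of the periodic pattern. Relative to the high-expression hif allele with 10 repeats, `spacer_shift_bp(2, 9, 10)` is −2 bp and `spacer_shift_bp(2, 11, 10)` is +2 bp, consistent with both neighboring alleles moving the promoter elements away from their best spacing. At the FetA switching frequency of 1.3% per generation, `switched_fraction(0.013, 10)` is about 0.12, i.e. roughly one lineage in eight would have switched at least once within ten generations under these assumptions.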
Table 1. Overview of mechanisms by which intergenic TRs can modulate gene expression

Location of TRs | Mechanism | References
Upstream of the −35 site | Affects transcription initiation by modifying the binding affinity of regulatory proteins | Miller et al. (1987), Martin et al. (2003, 2005), Metruccio et al. (2009)
Between the −35 and −10 sites | Affects transcription initiation by altering the distance between promoter elements | Willems et al. (1990), Yogev et al. (1991), van Ham et al. (1993), Sarkari et al. (1994), Carson et al. (2000), van der Ende et al. (2000)
Between transcriptional start site and ORF | May modify the binding affinity of regulatory proteins or mRNA stability | Lafontaine et al. (2001), Attia & Hansen (2006)
Between two separate transcription start sites | Unknown | Dawid et al. (1999)

ORF, open reading frame; TR, tandem repeat.
It was suggested that phase variation of FetA reflects a balance between the advantage of iron scavenging, on the one hand, and evasion of the host immune response (given the immunogenicity of FetA), on the other (Carson et al., 2000). A more complex situation is found for the PorA outer membrane protein of N. meningitidis. Not only is expression of the porA gene stochastically modulated by two variable homopolymeric tracts (poly-G and poly-T) between the −35 and −10 sites of the promoter region, but there is also a variable poly-A tract within the porA coding region. Both sites are believed to contribute to evasion of the host immune response to this protein and to explain the poor efficacy observed for PorA-based vaccines (van der Ende et al., 2000). Further examples include the genes encoding the Vlp lipoproteins of Mycoplasma hyorhinis (Yogev et al., 1991), the fimbrial subunits of Bordetella pertussis (Willems et al., 1990), and the outer membrane protein Opc of N. meningitidis (Sarkari et al., 1994). It can therefore be concluded that variable SSRs within the −35 to −10 region represent a widespread strategy of phase variation by modulated gene expression in pathogens.

TRs between transcriptional start site and ORF
While intergenic TR-dependent phase variation in bacteria mostly belongs to the two classes described above, some cases are mediated by TRs located between the transcriptional start site and the ORF. In the Gram-negative pathogen Moraxella catarrhalis, the UspA1 protein functions as an adhesin that mediates binding to human epithelial cells (Lafontaine et al., 2000). Its expression was shown to be phase variable and correlated with adherence capacity. Sequence analysis revealed that a variable homopolymeric poly-G tract between the transcriptional start site and the start codon is responsible for the variability in uspA1 expression. Stable mRNA and strong expression of UspA1 were detected with a 10-bp G tract, whereas truncated mRNA and weak expression occurred with a 9-bp G tract. Based on these observations, it was proposed that alterations in the poly-G tract affect the binding efficiency of transcriptional regulators and/or the stability of the uspA1 mRNA (Lafontaine et al., 2001). A similar case was recently uncovered for another outer membrane protein of M. catarrhalis, UspA2, which is involved in serum resistance and vitronectin binding (Attia & Hansen, 2006). A tetrameric (AGAT) repeat tract located between the uspA2 transcription start site and the start codon was found to be highly variable among M. catarrhalis isolates. Moreover, the expression level of UspA2 increased with repeat copy number, and with it the serum resistance and vitronectin-binding capacity of the cells. This is possibly because the number of AGAT repeats affects the secondary structure of the uspA2 mRNA transcript.
TRs between two separate transcription start sites
This special example of phase switching induced by variable intergenic TRs has so far only been described in H. influenzae (Dawid et al., 1999). HMW1 and HMW2 are two adhesins of H. influenzae that exhibit different cellular binding specificities and are encoded by two separate chromosomal loci, hmw1AC and hmw2AC, respectively. The two genes share 80% similarity, suggesting that they can be considered alleles (Barenkamp & Leininger, 1992; Ecevit et al., 2004). Promoter analysis revealed two transcription start sites (P1 and P2) within the upstream region of each gene and a heptameric (ATCTTTC) TR array between P1 and P2 of both genes. Variation in the number of TR copies has been confirmed in H. influenzae isolates from patients with chronic obstructive pulmonary disease (Dawid et al., 1999; Cholon et al., 2008). These variations can affect mRNA synthesis and thereby the expression of the corresponding proteins. An increasing number of TR units resulted in a gradual decrease in specific mRNA synthesis and protein expression, and vice versa. However, the underlying molecular mechanism remains obscure.
The phenotypical impact of intragenic TRs in bacteria
There is abundant evidence that, besides intergenic TRs, intragenic TRs can also trigger phase variation (Table 2). The underlying mechanisms, however, depend on the nature of the TR. More specifically, if the TR unit size is not a multiple of three, rearrangements can cause frameshift mutations that result in ON–OFF phase variation. In comparison, phase variation induced by TRs whose unit size is a multiple of three is more complex and is probably related to specific structural and functional alterations of the corresponding proteins (Gemayel et al., 2010). In this section, we review studies on the phenotypical impact of intragenic TRs, grouped according to the functional class of the genes in which they are located.
Cell surface structural genes with TRs
It has been noted that TRs are most abundant in genes whose products are either exposed on the cell surface or involved in the biogenesis of cell surface structures. Examples include lipopolysaccharides (LPS), adhesins, pili, fimbriae, and capsules (Moxon et al., 1994; Jordan et al., 2003; Verstrepen et al., 2004; Gibbons & Rokas, 2009; Janulczyk et al., 2010; Jerome et al., 2011). Extensive studies in different organisms and with different cell surface genes support the notion that stochastic TR-based switching contributes to the rapid generation of diversity in surface structures. In pathogens, this diversity can serve as a mechanism for escaping the immune response and/or for determining tissue tropism. Some examples are reviewed in more detail below.
Table 2. Examples of intragenic TRs causing phase switching in bacteria
Affected moiety | Bacterial species | Gene(s) or operon | Repeat motif* (5′–3′) | References
Adhesin | Haemophilus influenzae | cha | 56 bp | Sheets & St Geme (2011)
 | Helicobacter pylori | sabA | CT | Goodwin et al. (2008)
 | Neisseria gonorrhoeae | opa | CTCTT | Murphy et al. (1989)
 | Mycoplasma hominis | vaa | A | Zhang & Wise (1997)
 | Legionella pneumophila | lcl | 45 bp† | Vandersmissen et al. (2010)
Capsule | Streptococcus pneumoniae | cps15bM | TA | van Selm et al. (2003)
 | Streptococcus pneumoniae | cap8E | 223 bp | Waite et al. (2003)
 | Streptococcus pneumoniae | tts | 22 bp | Waite et al. (2003)
 | Neisseria meningitidis | siaD | C | Hammerschmidt et al. (1996b)
 | Escherichia coli | neuO | AAGACTC | Deszo et al. (2005)
Effector | Xanthomonas campestris | avrBs3 | 102 bp | Herbers et al. (1992), Kay et al. (2007)
Fe binding | Haemophilus influenzae | hgp | CCAA | Jin et al. (1999), Ren et al. (1999)
 | Neisseria meningitidis | hpuA, hmbR | G | Lewis et al. (1999), Richardson & Stojiljkovic (1999)
Flagellin | Campylobacter coli | flhA | T | Park et al. (2000)
Lipoprotein | Mycoplasma hyorhinis | vlp | 24/26 bp† | Rosengarten & Wise (1991)
 | Mycoplasma bovis | vspA | 18/24 bp† | Lysnyansky et al. (1996)
 | Mycoplasma pulmonis | vsa | 34 bp | Bhugra et al. (1995)
Lipopolysaccharides | Haemophilus influenzae | losA | CGAGCATA | Erwin et al. (2006)
 | Haemophilus influenzae | lic1, lic2A, lic3A | CAAT | Hosking et al. (1999), Dixon et al. (2007)
 | Haemophilus influenzae | lex2, oafA | GCAA | Griffin et al. (2003), Fox et al. (2005)
 | Neisseria meningitidis | lgtA, C, D | G | Yang & Gotschlich (1996), Jennings et al. (1999)
 | Helicobacter pylori | futA, futB | C & 21 bp‡ | Appelmelk et al. (1999), Nilsson et al. (2008)
Metabolism | Escherichia coli | xylB | G | Funchain et al. (2000)
Mismatch repair | Escherichia coli | mutL | CTGGCG | Shaver & Sniegowski (2003)
 | Salmonella Typhimurium | mutL | GCTGGC | Chen et al. (2010)
Outer membrane protein | Neisseria meningitidis | porA | G | van der Ende et al. (2000)
 | Group B streptococci | bca | 246 bp | Gravekamp et al. (1996, 1997)
 | Mycoplasma fermentans | p78 | A | Theiss & Wise (1997)
Inner membrane protein | Escherichia coli | tolA | 15–18 bp† | Zhou et al. (2012a, b)
Peroxiredoxin | Escherichia coli | ahpC | TCT | Ritz et al. (2001)
Pilus | Neisseria gonorrhoeae | pilC | G | Jonsson et al. (1991, 1992)
 | Legionella pneumophila | fimV | 18 bp† | Coil & Anne (2010)
Regulator | Listeria monocytogenes | ctsR | GGT | Karatzas et al. (2003, 2005)
 | Listeria monocytogenes | prfA | CAGGAGT | Lindbäck et al. (2011)
R-M§ system I | Haemophilus influenzae | hsdM | GACGA | Zaleski et al. (2005)
 | Neisseria gonorrhoeae | hsdS | G | Adamczyk-Poplawska et al. (2011)
R-M§ system III | Haemophilus influenzae | modA | AGTC | Srikhanta et al. (2005)
 | Neisseria spp. | modA | AGCC | Srikhanta et al. (2009)
 | Neisseria spp. | modB | CCCAA | Srikhanta et al. (2009)
 | Neisseria meningitidis | modD | ACCGA | Seib et al. (2011)
 | Helicobacter pylori | res | C | de Vries et al. (2002)
 | Helicobacter pylori | modH | G | Srikhanta et al. (2011)
 | Pasteurella haemolytica | mod | CACAG | Ryan & Lo (1999)
R-M§ system IV | Escherichia coli | mrr | G | Tesfazgi Mebrhatu et al. (2011)
TR, tandem repeat. *The motif sequences of microsatellites (≤ 9 bp) are listed; for longer TRs, only the length is given. † Different repeat unit lengths or sequences have been reported for this TR locus. ‡ Two TR loci exist at different positions in the same gene. § Restriction–modification system.
TRs within LPS biosynthesis genes
LPS is a complex macromolecule of Gram-negative bacteria. It is composed of three distinct parts (i.e. lipid A, core oligosaccharide, and O-antigen side chains), and a large number of genes are involved in its synthesis and export. As one of the major cell surface antigens, LPS is often implicated in cell adhesion and virulence. Furthermore, LPS is an essential structural component of the outer membrane that forms the outer shell of the Gram-negative cell. It therefore also determines bacterial resistance to a variety of toxic chemicals, including some antibiotics and xenobiotics.

A notable feature of the LPS of some pathogens is the extensive intra- and interstrain heterogeneity of the glycoform structure (i.e. the moieties comprising the core and O side chain sugars). This heterogeneity is mainly due to incomplete biosynthesis during the stepwise assembly of the sugar residues, which results from the phase-variable expression of the corresponding biosynthesis genes (Schweda et al., 2007). Typically, the phase variability of these genes derives from intragenic TR tracts whose unit length is not a multiple of three and which vary stochastically in copy number. In some pathogens, this type of stochastic variation occurs in more than one LPS synthesis gene, turning phase switching into a combinatorial process.

Unlike the LPS of most Gram-negative bacteria, that of H. influenzae lacks O-antigen side chains composed of repeating sugar units. Phase variation in H. influenzae LPS biosynthesis therefore occurs mainly through reversible expression switching of a subset of TR-containing genes. These genes are involved in the addition of core sugars (e.g. glucose, galactose, and sialic acid) to the conserved tri-heptose backbone, and in the addition of phosphorylcholine or acyl groups to these core sugars (Moxon et al., 2006; Fig. 5a). They include the lic1 locus (lic1A, lic1B, lic1C, and lic1D), lic2A, lic3A, lgtC, lex2, and oafA, which contain variable tetrameric repeat tracts (reviewed in Schweda et al., 2007). Rearrangements in these tetrameric repeat tracts can individually turn expression of each gene on or off. The result is a modified LPS molecule at the single-cell level and a repertoire of different LPS epitopes across the population (Fig. 5b; Moxon et al., 1994; Bayliss et al., 2001; Moxon et al., 2006). Remarkably, all the phase-variable genes involved in H. influenzae LPS biosynthesis also add structural elements that are directly relevant to virulence (Schweda et al., 2007).
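The combinatorial character of LPS phase variation can be made concrete by enumerating ON/OFF states. The sketch below uses the three loci and the residues they add as listed in the Fig. 5 caption. It assumes, as a simplification, that each locus switches independently and always adds its residue when ON; real substrate dependencies between the transferases are not modeled. The names `LOCUS_RESIDUE` and `lps_variants` are hypothetical.

```python
from itertools import product

# Residue added by each phase-variable locus, as listed in the Fig. 5 caption.
LOCUS_RESIDUE = {"lic1": "PC", "lic2A": "Gal", "lic3A": "NeuAc"}


def lps_variants(loci: dict[str, str] = LOCUS_RESIDUE) -> dict[tuple, frozenset]:
    """Map every ON/OFF combination of the loci to the set of residues added.

    Simplifying assumption: each locus switches independently and adds its
    residue whenever it is ON; substrate dependencies are ignored.
    """
    names = list(loci)
    variants = {}
    for states in product((True, False), repeat=len(names)):
        genotype = tuple(zip(names, states))
        variants[genotype] = frozenset(loci[name] for name, on in genotype if on)
    return variants


if __name__ == "__main__":
    for genotype, residues in lps_variants().items():
        label = ", ".join(f"{name} {'ON' if on else 'OFF'}" for name, on in genotype)
        print(f"{label}: {sorted(residues) or 'no variable residues'}")
```

Under this independence assumption, n phase-variable loci give 2^n genotypes. Each additional TR-containing locus therefore doubles the potential epitope repertoire of the population.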
Interestingly, searching for genes harboring tetrameric TRs has proven to be a successful strategy for identifying novel LPS biosynthesis genes in H. influenzae genomes (Hood et al., 1996; Fox et al., 2005). Similar mechanisms generating LPS diversity have also been described in N. meningitidis. There, ON/OFF variation in expression of genes of the lgt family (encoding LPS biosynthesis functions) is mediated by stochastic expansion or contraction of their intragenic homopolymeric tracts. The corresponding LPS structures have so far been classified into 12 immunotypes (Jennings et al., 1999; Berrington et al., 2002; Bayliss et al., 2008).

The O-chain of H. pylori LPS can be fucosylated, thereby generating Lewis antigens that mimic human blood group antigens and mediate immune evasion (reviewed in Kusters et al., 2006). Lewis antigens of H. pylori are subject to reversible, high-frequency phase variation. One mechanism is slipped-strand mispairing of poly-C tracts within three fucosyltransferase genes (futA, futB, and futC; Appelmelk et al., 1999; Wang et al., 1999). In addition, a unique TR region with a 21-bp unit size was recently uncovered in the 3′ end of futA and futB, but not in futC. Strikingly, copy number variations of the 21-bp TR tract in futA and futB did not alter the reading frame, yet they could affect fucosyltransferase activity. The copy number of the 21-bp TRs correlated with the number of O-antigen units being fucosylated, and the addition of one repeat unit led to the addition of one N-acetyl-β-lactosamine (LacNAc) unit in the O-antigen polysaccharide (Nilsson et al., 2006, 2008). These studies indicate that the variability of TRs in futA, futB, and futC increases antigen diversity and population heterogeneity. This variability thereby supports the adaptability of H. pylori to fluctuating conditions in the gastric mucosa.
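The genome-mining strategy mentioned above, scanning sequences for tetranucleotide TRs, can be sketched with a regular expression. One group captures a repeat unit, and a backreference demands further exact copies of it.

This is a minimal illustration under stated assumptions:
- It finds only perfect repeats.
- The default thresholds (`unit_len=4`, `min_copies=5`) are arbitrary.
- `find_perfect_trs` is a hypothetical helper, not a reconstruction of the searches reported by Hood et al. (1996) or Fox et al. (2005).

Dedicated tools such as Tandem Repeats Finder also detect imperfect repeats.

```python
import re


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AAAA')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_perfect_trs(seq: str, unit_len: int = 4,
                     min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for each perfect TR tract."""
    if unit_len < 1 or min_copies < 2:
        raise ValueError("unit_len must be >= 1 and min_copies >= 2")
    # ([ACGT]{u}) captures one unit; \1{k,} requires at least k further exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        # Skip tracts that are really shorter repeats (homopolymers, dinucleotides).
        if is_primitive(motif):
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence with six copies of a CAAT unit, as in the lic genes.
    print(find_perfect_trs("GG" + "CAAT" * 6 + "TTG"))
```

Filtering out non-primitive motifs keeps a long poly-A run from being reported as an "AAAA" tetramer. Homopolymeric tracts such as those in the lgt genes would be searched for separately, with `unit_len=1`.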
[Fig. 5 image: (a) LPS structure of Haemophilus influenzae strain Rd (lipid A, Kdo, tri-heptose backbone with Glc, Gal, NeuAc, PC and PEtn substituents; positions of lic1A, lic2A and lic3A additions marked). (b) ON/OFF states of lic1, lic2A and lic3A defining LPS Types I–IV.]
Fig. 5. Structural variations of LPS molecules induced by TR rearrangements in LPS biosynthesis genes. (a) Representation of one possible structure of the LPS from Haemophilus influenzae strain Rd. The conserved tri-heptose backbone is highlighted in bold. Three phase-variable lic genes involved in the addition of specific components to the backbone are shown. Kdo, 2-keto-3-deoxyoctulosonic acid; Hep, L-glycero-D-mannoheptose; Glc, D-glucose; Gal, D-galactose; NeuAc, N-acetylneuraminic acid; PEtn, phosphoethanolamine; PC, phosphorylcholine. (b) Scheme showing some of the possible LPS types generated on the cell surface of H. influenzae Rd by the combinatorial effect of three LPS synthesis genes with phase-variable expression. lic2A is responsible for adding a galactose (Gal); lic3A, for adding sialic acid (NeuAc); and lic1, for adding phosphorylcholine (PC). The repeat tracts are shown by hatched boxes in the gray arrow representation of each gene. The TRs in each of these genes are subject to variation, which can cause a frameshift and thus block expression (indicated by ON or OFF). Types I–IV represent microbial cells with different LPS antigens depending on lic1, lic3A, and lic2A gene expression (adapted from Moxon et al., 2006).
TRs within capsule biosynthesis genes
As one of the outermost structures on the bacterial surface, the capsule may completely conceal other antigenic surface molecules or be co-exposed with them. Capsules are thought to be important for bacterial pathogenicity. Production of a capsule provides some pathogenic bacteria with resistance to phagocytic and complement-mediated killing and, at the same time, affects bacterial attachment to host cells. Consequently, the ability to regulate capsule expression might give pathogens a selective advantage in coping with host immune responses during different stages of infection.

For example, acapsulate variants of N. meningitidis serogroup B show much higher adherence to and invasion of epithelial cells than their capsulated progenitors. Such variants can be generated at high frequency by SSM of a poly-C tract in the polysialyltransferase gene siaD (Hammerschmidt et al., 1996a, b; Spinosa et al., 2007). On the other hand, meningococcal isolates from the blood of meningitis patients are almost always capsulated, whereas capsulated and noncapsulated strains typically coexist in the nose or throat of healthy carriers. Because phase variation of siaD is reversible, it was proposed that infection is initiated by acapsulate strains. Capsule biosynthesis would then be reactivated at a later stage of infection, allowing N. meningitidis to resist the host immune system and to cause disease (i.e. sepsis and meningitis). However, other findings indicate that the presence of a capsule does not necessarily preclude invasiveness. The role of the capsule and of capsule phase switching in different stages of meningococcal infection may therefore be more subtle and strain dependent (Spinosa et al., 2007; Bartley et al., 2013).

Besides binary ON/OFF switching, some bacteria can also modulate the composition of their capsule by TR-dependent phase switching. This is exemplified by the polysialic acid capsule (K1 antigen) of E. coli K1. It can be modified through phase-variable expression of a sialyl O-acetyltransferase activity, which alters its immunogenicity and susceptibility to glycosidases. The phase-variable acetylation is driven by a heptanucleotide TR tract (AAGACTC; copy number typically 14–39) within the O-acetyltransferase gene neuO (Deszo et al., 2005). Gain or loss of a number of repeat units that is not a multiple of three disrupts the neuO reading frame and thereby abolishes NeuO expression (Deszo et al., 2005). Functional analysis furthermore revealed enhanced desiccation resistance but reduced biofilm formation in E. coli K1 with active NeuO. This suggests that phase-variable neuO expression has not only a role in host interaction but also a more subtle ecological impact (Mordhorst et al., 2009). Interestingly, each set of three repeat units encodes a protein structure designated the poly(w) motif. NeuO enzymatic activity was found to increase with the number of poly(w) motifs, which may favor the maintenance of high repeat copy numbers in the population (Bergfeld et al., 2007).
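Whether a change in repeat copy number disrupts a reading frame depends only on the net length change: unit length × number of units gained or lost. If this product is divisible by three, the downstream frame is preserved; otherwise it is shifted by 1 or 2 nt. The sketch below encodes that rule with hypothetical function names. It is checked against the neuO heptamer discussed above, the 21-bp futA/futB unit, and the GACGA pentamer of hsdM described later. It deliberately ignores stop codons that a new frame may or may not reach, and any effects on protein function.

```python
def frame_offset(unit_len: int, delta_copies: int) -> int:
    """Downstream frame offset (0, 1 or 2 nt) after gaining (delta > 0)
    or losing (delta < 0) whole repeat units."""
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    # Python's % returns a non-negative result for a positive divisor,
    # so contractions (negative deltas) are handled correctly.
    return (unit_len * delta_copies) % 3


def is_in_frame(unit_len: int, delta_copies: int) -> bool:
    """True if the reading frame downstream of the tract is preserved."""
    return frame_offset(unit_len, delta_copies) == 0


def codon_change(unit_len: int, delta_copies: int) -> int:
    """Codons gained (positive) or lost (negative) for an in-frame change."""
    if not is_in_frame(unit_len, delta_copies):
        raise ValueError("copy-number change causes a frameshift")
    return unit_len * delta_copies // 3


if __name__ == "__main__":
    examples = [
        ("neuO AAGACTC", 7, 1),
        ("neuO AAGACTC", 7, -2),
        ("neuO AAGACTC", 7, 3),
        ("futA 21-bp unit", 21, 1),
        ("hsdM GACGA", 5, -1),
        ("poly-G tract", 1, 1),
    ]
    for label, unit, delta in examples:
        status = "in frame" if is_in_frame(unit, delta) else "frameshift"
        print(f"{label}: {delta:+d} unit(s) -> {status}")
```

For a heptamer such as AAGACTC, only changes of three units (or multiples thereof) keep the frame, adding or removing seven codons. Every 21-bp unit change is in frame, consistent with the observation that futA/futB tract variation alters activity without disrupting the protein.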
Another intriguing aspect of this capsular acetylation is that the neuO gene resides in a lambdoid prophage termed ‘CUS-3’. Mitomycin treatment of E. coli K1 can induce the release of this phage, which specifically infects K1 antigen-expressing bacteria (Deszo et al., 2005; King et al., 2007). This suggests that variant neuO alleles can be redistributed among K1-encapsulated bacteria via horizontal transfer. Indeed, CUS-3/neuO has been found in several serotypes, such as O18 and O45 (King et al., 2007). On the other hand, although the receptor for CUS-3 is polysialic acid, superinfection was not prevented by NeuO-mediated acetylation (Vimr & Steenbergen, 2006; King et al., 2007). Notably, sialic acid is also a common modification of LPS in numerous pathogens. This suggests that O-acetylation may contribute more widely to the structural variation of polysaccharide epitopes (see also TRs within LPS biosynthesis genes).

Mini- and macrosatellites within capsule biosynthesis genes of Gram-positive pathogens can also mediate capsule diversity. For example, noncapsulated serotype 3 Streptococcus pneumoniae strains were shown to carry an out-of-frame perfect tandem duplication in one of the capsule biosynthesis genes (i.e. cap3A). Based on the sequence and length of the duplication, at least seven different cap3A alleles were found in different nonencapsulated strains. Analysis of the phase reversion frequency (OFF to ON) induced by TR contractions revealed a positive correlation between the frequency of reversion and the length of the duplication (Waite et al., 2001). A similar mechanism of capsular phase variation was also demonstrated in serotypes 8 and 37 of S. pneumoniae. It correlates with a 223-bp perfect tandem duplication in cap8E (serotype 8) and a 22-bp perfect tandem duplication in tts (serotype 37) (Waite et al., 2003).
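Reversible switching of this kind (siaD, cap3A) can be illustrated with a minimal two-state model. The model assumes no selection: ON and OFF cells grow equally well. In it, the per-generation ON→OFF and OFF→ON rates alone set the long-run fraction of ON cells, rate_on / (rate_off + rate_on). The rates used below are hypothetical and not taken from the cited studies. The function names are likewise illustrative.

```python
def on_fraction_trajectory(f_on: float, rate_off: float, rate_on: float,
                           generations: int) -> list[float]:
    """Fraction of ON cells after each generation of reversible phase variation.

    rate_off: probability per cell per generation of switching ON -> OFF.
    rate_on:  probability per cell per generation of reverting OFF -> ON.
    Assumes no selection: ON and OFF cells grow equally well.
    """
    for value in (f_on, rate_off, rate_on):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions and rates must lie in [0, 1]")
    trajectory = [f_on]
    for _ in range(generations):
        # ON cells that stay ON, plus OFF cells that revert to ON.
        f_on = f_on * (1.0 - rate_off) + (1.0 - f_on) * rate_on
        trajectory.append(f_on)
    return trajectory


def equilibrium_on_fraction(rate_off: float, rate_on: float) -> float:
    """Fixed point of the recursion above: rate_on / (rate_off + rate_on)."""
    if rate_off + rate_on == 0:
        raise ValueError("at least one switching rate must be positive")
    return rate_on / (rate_off + rate_on)


if __name__ == "__main__":
    # Hypothetical rates: reversion (OFF -> ON) tenfold rarer than loss of expression.
    r_off, r_on = 1e-2, 1e-3
    traj = on_fraction_trajectory(1.0, r_off, r_on, generations=2000)
    print(f"ON fraction after 2000 generations: {traj[-1]:.4f}")
    print(f"predicted equilibrium: {equilibrium_on_fraction(r_off, r_on):.4f}")
```

When rate_on is much smaller than rate_off, the equilibrium ON fraction is roughly proportional to rate_on. In this simple picture, a higher reversion frequency, such as that associated with longer cap3A duplications, would raise the share of capsulated cells. In a host, selection for or against the capsule would normally dominate this drift.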
TRs within adhesin-associated genes
Adhesins mediate bacterial attachment to and subsequent colonization of host tissues, but they also act as surface antigens (Bayliss et al., 2001). The opacity (Opa) proteins of Neisseria spp. constitute a family of closely related but size-variable outer membrane proteins. They enhance adherence to epithelial cells, leukocytes, and phagocytic cells (reviewed in Sadarangani et al., 2011), and they not only determine host and tissue specificity but also facilitate efficient cellular invasion (Carbonnelle et al., 2009). Neisseria spp. strains have 3–11 opa genes, whose phase-variable expression is modulated by a pentanucleotide TR tract (CTCTT) in their coding regions or by intergenic recombination. As a result, a vast array of Opa variants with differing molecular specificities can be generated, allowing Neisseria both to alter its tissue tropism and to escape the host immune system. In fact, it has been argued that the different arrays of Opa variants in clinical (disease) and carriage (nondisease) isolates of N. meningitidis may result from immune selection pressure (Callaghan et al., 2006, 2008; Sadarangani et al., 2011).

Similarly, the adhesins of H. pylori are categorized into two main subgroups, Hop (Helicobacter outer membrane porins) and Hor (Hop-related). For some, but not all, of these adhesins, the corresponding host cell receptors have been identified (reviewed in Backert et al., 2011). For example, BabA binds to the Lewis B (Leᵇ) antigen expressed on the gastric mucosa (Ilver et al., 1998). SabA binds to glycosphingolipids of host cells that display a sialyl-dimeric Lewis X (sialyl-Leˣ) antigen (Mahdavi et al., 2002). Interestingly, expression of some of these adhesins (i.e. BabB, HopZ, SabA, OipA) is subject to ON/OFF phase variation due to variable dinucleotide (CT) tracts within the corresponding genes. As a consequence, the adhesin expression patterns generated by stochastic phase variation can affect not only adhesion efficiency but also tissue tropism (Yamaoka et al., 2002; Backert et al., 2011). Additionally, the resulting repertoire of phase-variable adhesins may help H. pylori to escape the host immune system.

A final example in this category is the Eap protein of the Gram-positive pathogen Staphylococcus aureus. Eap is a multifunctional adhesin with a variable TR tract in which the repeat units are not identical in size (93–110 aa). A minimum of two repeat units in the eap gene is required for Eap to cause agglutination, adherence, and cellular invasion by S. aureus. Furthermore, these capacities are significantly enhanced as the repeat copy number increases from 2 to 5, suggesting that TR copy number expansion in eap supports host adaptation of S. aureus (Hussain et al., 2008).
TRs within iron (heme) acquisition genes
Free iron is typically limited in the host because it is sequestered by iron-binding proteins (e.g. hemoglobin, transferrin, and lactoferrin). Iron acquisition mechanisms are therefore considered indispensable for bacterial pathogenicity, and pathogens have evolved many different strategies to adapt to fluctuating iron concentrations. For example, pathogens can extract iron from the heme groups of iron-binding proteins via surface receptors. In H. influenzae, a family of hemoglobin-binding and hemoglobin–haptoglobin-binding proteins (Hgp) is known to mediate heme scavenging (Ren et al., 1998; Jin et al., 1999; Morton et al., 1999). Individual strains of H. influenzae have 1–4 hgp genes, and knockout analysis has confirmed that these genes are indispensable virulence determinants in invasive disease (Seale et al., 2006). Interestingly, each hgp gene contains a CCAA tetranucleotide repeat tract. This is reminiscent of the H. influenzae LPS biosynthesis genes that harbor a CAAT repeat tract (Moxon et al., 2006; see TRs within LPS biosynthesis genes). Phase variation induced by repeat rearrangements within the hgp genes has been observed, but its biological significance remains obscure. A similar case exists in N. meningitidis, where iron acquisition is modulated by variable poly-G tracts in the hpuA and hmbR genes, which are involved in the production of two hemoglobin receptors (Lewis et al., 1999; Richardson & Stojiljkovic, 1999). More recently, it was revealed that 91% of pathogenic N. meningitidis isolates, but only 71% of commensal isolates, have at least one receptor in the ON state. This suggests that expression of hemoglobin receptor(s) is important for the systemic spread of meningococci (Tauseef et al., 2011).
TRs within genes involved in restriction–modification systems
In addition to their role as drivers of genetic variation, reviewed above, TRs can also drive epigenetic variation when they affect restriction–modification (R-M) systems. Currently, TR-dependent phase variation has been documented only in type I and type III R-M systems.

Type I R-M systems generally consist of three subunits (S, M, and R), which together form a holoenzyme with both methylation and restriction activity. In H. influenzae, the type I R-M system HindI functions as the main defense system against the entry of foreign DNA (Glover & Piekarowicz, 1972; Piekarowicz et al., 1974). Phase variation of HindI is driven by a GACGA pentanucleotide repeat tract in the methyltransferase subunit gene (hsdM). An hsdM allele with four repeat units, encoding a full-length HsdM protein, was associated with resistance to phage HP1c1 infection, whereas gain or loss of one repeat unit resulted in phage sensitivity (Zaleski et al., 2005). Interestingly, phase switching of phage susceptibility could also be conferred independently by LPS alterations induced by the variable tetranucleotide repeat tract of the lic2A gene. This confirmed the role of Lic2A-modified LPS as the receptor of phage HP1c1 (Zaleski et al., 2005; see also TRs within LPS biosynthesis genes). Moreover, the frequencies of these phase variations in phage susceptibility are affected by Dam (deoxyadenosine methyltransferase) activity, although the mechanism is not yet clear (Zaleski et al., 2005).

Another example is the NgoAV type I R-M system of N. gonorrhoeae, which is encoded by four genes: hsdMNgoAV, hsdRNgoAV, hsdSNgoAV1, and hsdSNgoAV2. The product of hsdSNgoAV1 is responsible for specific recognition of the target site, whereas hsdSNgoAV2 is nonfunctional. It was postulated that hsdSNgoAV1 and hsdSNgoAV2 encode truncated proteins derived from an intact hsdS locus. In this view, the locus was interrupted by a frameshift mutation that created a stop codon between hsdSNgoAV1 and hsdSNgoAV2 (Piekarowicz et al., 2001). A recent study uncovered a variable poly-G tract within the 3′ end of hsdSNgoAV1 in N. gonorrhoeae. Loss of a guanine in that tract restores the fusion of the hsdSNgoAV1 and hsdSNgoAV2 genes, generating a new HsdSNgoAVΔ protein responsible for a novel NgoAV R-M system termed ‘NgoAVΔ’. The NgoAVΔ system has a modified DNA recognition specificity and thereby confers altered susceptibility to various phages (Adamczyk-Poplawska et al., 2011).

Type III R-M systems consist of only two subunits, the methyltransferase (encoded by a mod gene) and the restriction endonuclease (encoded by a res gene). In host-adapted pathogens, ON/OFF phase variation of these systems has been reported due to variable TRs, with a unit length that is not a multiple of three, within either the mod or the res gene (Ryan & Lo, 1999; De Bolle et al., 2000; de Vries et al., 2002; Srikhanta et al., 2005, 2009, 2011; also see review Srikhanta et al., 2010). Interestingly, TR-induced phase variation in the mod gene has been shown to affect the expression of a number of genes, referred to as a phase-variable regulon or phasevarion (Srikhanta et al., 2005, 2010). In H. influenzae strain Rd, switching off mod expression by a rearrangement in a tetrameric repeat tract down-regulated nine other genes and up-regulated seven (Srikhanta et al., 2005). Similarly, N. gonorrhoeae formed significantly thicker biofilms when its mod gene was in the OFF state, which may indirectly increase its resistance to external stresses.
Additionally, a mod-ON phenotype resulted in an increased ability to associate with human cervical epithelial (pex) cells, whereas the mod-OFF configuration enhanced the ability to invade pex cells and survive within them (Srikhanta et al., 2009). Further studies will be needed to clarify whether these phenotypes relate directly to mod deficiency itself or to altered expression of one or more members of the mod phasevarion. Most N. meningitidis and N. gonorrhoeae strains have a second phase-variable methyltransferase gene (Srikhanta et al., 2009), and some even have a third (Seib et al., 2011). It has been proposed that the combinatorial use of different phasevarions may contribute to further phenotypic variability (Seib et al., 2011). Furthermore, Fox et al. (2007) suggested that the apparent evolution of this type III R-M system into an epigenetic mechanism for controlling gene expression has resulted in loss of the DNA restriction function in some strains.

In this context, it is also interesting to note the existence of solitary type IV restriction endonucleases that have no cognate methyltransferase. These peculiar enzymes are specific for methylated DNA. It has therefore been proposed that they might ward off the lateral acquisition of methyltransferases that could disturb the cell’s epigenetic regulation (Fukuda et al., 2008; Tesfazgi Mebrhatu et al., 2011). In turn, phase variability of methyltransferase functions might help these enzymes avoid detection while they become established in a new host (Tesfazgi Mebrhatu et al., 2011).
TRs within transcription activator-like effectors
Gram-negative plant-pathogenic bacteria of the genus Xanthomonas can infect a broad spectrum of plants. A common feature of their infection process is the injection of virulence proteins (termed effectors) into host cells, mostly by means of a type III secretion system. The type
III effectors are currently classified into nearly 40 groups based on sequence similarity and biochemical activity, the largest being the AvrBs3/PthA family, also known as the transcription activator-like effector (TALE) family (reviewed in White et al., 2009). The most conspicuous feature of TALE genes is the variation in their central domain, which mostly consists of 15.5–19.5 nearly identical TRs with a 102-bp unit (Gürlebeck et al., 2006; Mak et al., 2013; Fig. 6). A recent study revealed that AvrBs3 can determine bacterial fitness in plants during infection (Kay et al., 2007). In pepper plants, a set of upa (up-regulated by AvrBs3) genes has been identified as AvrBs3 targets. Among them is upa20, the key regulator of the plant cell hypertrophy phenotype. AvrBs3 acts as a transcription factor by binding to a conserved promoter element (UPA box) of upa20, resulting in its up-regulation. Binding was shown to be mediated by the TR region of AvrBs3, suggesting that the repeats act as a DNA-binding motif (Kay et al., 2007). More specifically, the number of base pairs in the UPA box was found to closely match the TR copy number in AvrBs3.
[Fig. 6 image not reproduced. Panel (a): AvrBs3 domains (TTSS secretion signal, DNA-binding TR domain with repeats 0 to 17.5, nuclear localization signal, transcriptional activation domain); RVDs of repeats 1–17.5: HD NG NS NG NI NI NI HD HD NG NS NS HD HD HD NG HD NG; sequence of repeat 9 (34 aa, RVD at positions 12/13): LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG. Panel (b): UPA box TATATAAACCTNNCCCTCT upstream of a UPA gene.]
Fig. 6. Domain organization and molecular function of Xanthomonas TALE AvrBs3. (a) The AvrBs3 functional domains shown include a type III secretion system (TTSS) signal sequence (dark blue), a central DNA-binding TR domain, a nuclear localization signal (purple), and a transcriptional activation domain (olive green). In this case, the TR domain comprises 17.5 imperfect repeat units, each consisting of 34 amino acids (aa). The unit numbered zero (red, dashed rectangle) is not a true repeat because it has a different aa sequence, but it also contributes to DNA binding. Each repeat binds to one base in the target sequence. The binding specificity of each repeat is determined by aa 12 and 13 (the repeat variable di-residue, RVD), which are displayed in repeat-specific colors. The complete aa sequence of one repeat (no. 9) is shown with its RVD highlighted in green. (b) The bacterial type III secretion system injects AvrBs3 into the plant cell. AvrBs3 is then targeted to the nucleus, where its TR domain binds to a specific DNA sequence known as the UPA box. The consensus UPA box closely matches the binding specificity of the AvrBs3 TR region, as determined by the different TR units and their RVDs (aa 12 and 13). AvrBs3 binding activates transcription of several UPA genes (adapted from Mak et al., 2013).
© 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved
FEMS Microbiol Rev 38 (2014) 119–141
Variable tandem repeats and bacterial adaptation
An elegant study demonstrated that each repeat unit of AvrBs3 binds to one base pair of the UPA box. It further showed that the base recognition specificity of each repeat is determined by two hypervariable amino acids (residues 12 and 13 of each repeat unit), known as repeat variable di-residues (RVDs; Boch et al., 2009). Moreover, a minimum of 6.5 TRs in AvrBs3 is necessary for activating upa gene expression (Boch et al., 2009; Moscou & Bogdanove, 2009; Scholze & Boch, 2011; Fig. 6). More recently, the structural basis for this sequence-specific recognition was uncovered by crystallographic analysis of an artificially engineered TAL effector in complex with its target DNA (Deng et al., 2012). AvrBs3-dependent modulation of plant signaling pathways causes enlargement of mesophyll cells and hypertrophy of the infected tissue. This might help the bacteria to proliferate and escape from infection sites, thereby facilitating bacterial spread (Kay et al., 2007; Boch & Bonas, 2010). Because binding depends on this repeat-to-base correspondence, TR variations in the avrBs3 gene can preclude activation via the UPA box, resulting in a failure to induce the hypersensitive response in plants (Kay et al., 2007). While the above studies highlight the biological importance of TALEs as major virulence determinants, TALEs also open interesting avenues for biotechnological applications. For instance, it should be possible to engineer broader pathogen resistance in crops by combining several UPA boxes in the promoter of plant resistance genes such as Bs3. This would render these transgenic plants resistant to infection by bacteria delivering matching TAL effectors (Boch & Bonas, 2010; Scholze & Boch, 2011). Alternatively, the TR region of TAL effectors can be tailored to recognize and bind a predefined DNA sequence, thereby activating the expression of downstream target genes (Morbitzer et al., 2010).
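The one-repeat-one-base code described above can be illustrated with a short Python sketch. It translates an ordered list of RVDs into a predicted target pattern and scores candidate DNA against that pattern. This is a simplified illustration, not the published recognition model.

The RVD-to-base table covers only the five RVDs relevant here. It uses the preferences commonly reported by Boch et al. (2009) and Moscou & Bogdanove (2009), written as IUPAC ambiguity codes:

- NI→A
- HD→C
- NG→T
- NN→G or A
- NS→any base

The T preceding the RVD-specified bases (position 0) is also an assumption taken from that literature. The RVD list is the AvrBs3 series shown in Fig. 6a. The promoter sequences are hypothetical, and only one DNA strand is scanned.

```python
"""Toy sketch of the TALE repeat code: one RVD specifies one base."""

# Assumed RVD base preferences, written as IUPAC symbols (R = A/G, N = any base).
RVD_TO_IUPAC = {"NI": "A", "HD": "C", "NG": "T", "NN": "R", "NS": "N"}
# DNA bases accepted by each IUPAC symbol used above.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def predict_target(rvds, t_at_zero=True):
    """Translate RVDs (N- to C-terminal order) into a 5'->3' target pattern."""
    core = "".join(RVD_TO_IUPAC[rvd] for rvd in rvds)
    # Assumption: repeat 0 prefers a T immediately 5' of the RVD-specified bases.
    return ("T" if t_at_zero else "") + core


def mismatches(pattern, site):
    """Count positions where the DNA base is not accepted by the pattern symbol."""
    if len(pattern) != len(site):
        raise ValueError("pattern and site must have equal length")
    return sum(base not in IUPAC_BASES[sym] for sym, base in zip(pattern, site.upper()))


def best_site(pattern, dna):
    """Return (mismatches, start) of the best-matching window on the given strand only."""
    width = len(pattern)
    if len(dna) < width:
        raise ValueError("sequence is shorter than the pattern")
    return min((mismatches(pattern, dna[i:i + width]), i) for i in range(len(dna) - width + 1))


# RVDs of AvrBs3 repeats 1-17.5 as depicted in Fig. 6a.
AVRBS3_RVDS = "HD NG NS NG NI NI NI HD HD NG NS NS HD HD HD NG HD NG".split()

if __name__ == "__main__":
    pattern = predict_target(AVRBS3_RVDS)
    print(pattern)
    promoter = "GGGG" + "TCTGTAAACCTAGCCCTCT" + "AAAA"  # hypothetical promoter fragment
    print(best_site(pattern, promoter))
```

Simple mismatch counting treats every RVD-base pairing as equally strong, which is a simplification of real TALE binding.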
Moreover, engineered TAL effectors can be fused with endonuclease domains to generate TAL effector nucleases (TALENs). These can introduce cuts or double-strand breaks in or near specific chromosomal sequences, targeting these loci for mutagenesis or recombinational repair, with potential applications in gene therapy (Bogdanove & Voytas, 2011; Muñoz Bodnar et al., 2013).
TRs within stress response genes
Regulation of stress response genes is one of the most common strategies employed by bacteria to cope with stress. TRs have been identified in a number of stress response genes (Rocha et al., 2002), and some studies have addressed the role of these repeats in modulating a stress response. For example, the gene encoding the CtsR regulator (class III stress gene repressor) of Listeria monocytogenes carries a triplet repeat (GGT) tract with three copies. Stochastic deletion of one triplet in the ctsR gene results in an inactive CtsR repressor, leading to expression of the clp genes (Karatzas et al., 2003, 2005). This alteration confers increased resistance to high hydrostatic pressure, heat, acid, and H2O2, but attenuates virulence in L. monocytogenes (Karatzas & Bennik, 2002). Another example is mutL, which is involved in mismatch repair in Salmonella Typhimurium and E. coli. The functional allele of mutL carries three tandem copies of a hexanucleotide in the region encoding the ATP-binding pocket of the protein. Spontaneous loss of one repeat unit, resulting in a mutator phenotype due to MutL deficiency, has been observed in long-term cultures of both E. coli and S. Typhimurium (Shaver & Sniegowski, 2003; Chen et al., 2010). Expansions of this TR region from 3 to 4 units and from 2 to 3 units have also been observed. The latter restores the three-unit functional allele and caused reversion of the mutator phenotype (Shaver & Sniegowski, 2003; Chen et al., 2010; Le Bars et al., 2013). This genetic switching of MutL may serve as a strategy to control the balance between genetic stability and mutability, and thus as an element controlling bacterial evolution. Interestingly, not only mutL but also many other genes involved in DNA repair in E. coli (e.g. mutT, mutY, mutS, dinJ, and ruvC) harbor SSRs, further underscoring the putative regulatory role of TRs in the stress response (Rocha et al., 2002). Some membrane proteins are essential for membrane integrity and thus for tolerance to a variety of toxic chemicals and other stresses. One such protein, present in many Gram-negative bacteria, is TolA. In E. coli, TolA harbors a variable TR tract composed of 8–16 imperfect copies of a 15- to 18-bp repeat unit (Levengood et al., 1991; Zhou et al., 2012b). TR variations in tolA occurred at a frequency of at least 6.9 × 10⁻⁵ in a clonal wild-type population of E. coli MG1655. These variations were shown to modulate stress tolerance, the most pronounced TR-dependent phenotype being deoxycholic acid tolerance (Zhou et al., 2012a).
However, the precise molecular mechanism underlying this phenotypic variation remains unclear. A peculiar case is the triplet repeat (TCT) in E. coli ahpC. Expansion of this TR tract by one unit converts the AhpC protein from a peroxidase into a disulfide reductase. This was demonstrated by the ability of the altered protein to restore normal growth of a mutant lacking thioredoxin reductase and glutathione reductase (Ritz et al., 2001). To our knowledge, this is the only example of an intragenic TR variation generating a truly novel function in bacteria.
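All of the intragenic repeats discussed in this section have unit lengths that are multiples of three:

- the GGT triplet in ctsR
- the hexanucleotide in mutL
- the 15- to 18-bp units in tolA
- the TCT triplet in ahpC

Gaining or losing whole units therefore adds or removes whole amino acids without shifting the reading frame. By contrast, units whose length is not a multiple of three (e.g. mononucleotide or tetranucleotide tracts) cause frameshifts. The Python sketch below makes this arithmetic explicit and counts perfect tandem copies at a locus. The example sequence is hypothetical, not the real ctsR sequence. Because only perfect copies are counted, imperfect repeats such as those in TolA would need a dedicated tool such as Tandem Repeats Finder (Benson, 1999).

```python
"""Sketch: count perfect tandem copies and classify copy-number changes in a coding sequence."""


def count_tandem_copies(seq, unit, start):
    """Number of perfect head-to-tail copies of `unit` beginning at index `start`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    copies = 0
    # A slice running past the end is shorter than `unit`, so the loop always stops.
    while seq[start + copies * k : start + (copies + 1) * k] == unit:
        copies += 1
    return copies


def coding_effect(unit_len, delta_copies):
    """Effect of gaining (+) or losing (-) whole repeat units inside an open reading frame."""
    if delta_copies == 0:
        return "no change"
    indel = unit_len * delta_copies
    if indel % 3 == 0:
        change = "added" if indel > 0 else "removed"
        return f"in-frame: {abs(indel) // 3} codon(s) {change}"
    return "frameshift"


if __name__ == "__main__":
    locus = "ATG" + "GGT" * 3 + "TAA"  # hypothetical three-copy GGT tract
    print(count_tandem_copies(locus, "GGT", 3))
    for unit_len, delta in [(3, -1), (6, -1), (6, +1), (15, +1), (1, -1), (4, +1)]:
        print(unit_len, delta, coding_effect(unit_len, delta))
```

The frame rule is the same for any unit length. This is why triplet and hexanucleotide changes tend to alter proteins by small in-frame steps, while changes in other unit lengths tend to act as ON/OFF switches.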
Conclusions and outlook: TRs and bacterial evolution
The generation of mutations is the basis of evolution in bacteria and all other living organisms. Under changing environmental conditions, the evolution of better adapted descendants may be essential for survival of the species, and an increased mutation rate could therefore confer higher survival chances. On the other hand, generating random mutations throughout the genome also imposes a substantial burden, because most mutations are deleterious. To be successful, bacteria therefore have to maintain a balance between genome plasticity and stability. One way to do this is to control the spatial distribution of mutations over the genome. Bacteria can achieve this by evolving highly mutable sequences in loci (known as contingency loci) that are needed for a flexible response to environmental conditions or stresses. This applies especially to conditions that cannot be detected by conventional bacterial sensors (e.g. phages or host receptors; Fonville et al., 2011). TRs and their variability provide a paradigm for this strategy of adaptability. The formation of TRs is thought to be a random process based on replication slippage (Levinson & Gutman, 1987), but only some of the TRs formed are believed to be maintained during evolution. Generally, variable TRs tend to be located in flexible genes involved in the biosynthesis of surface structures and in adhesion and (for pathogens) invasion. They are occasionally also found in genes encoding critical cellular functions such as DNA replication (Moxon et al., 2006; Guo & Mrazek, 2008). The association between TRs and cell surface structures is thought to allow populations to anticipate changes in the environment and thereby enhance their survival (Moxon et al., 1994). This is particularly common and critical for pathogens such as H. influenzae and H. pylori, which must cope with complex environments using limited genetic information (i.e. a reduced genome; Razin et al., 1998). As such, TR-dependent phase variation can be regarded as a strategy for bacterial adaptation that is complementary to conventional mutations (e.g. single nucleotide polymorphisms) but has some distinct features. First, its hypermutability (typical mutation rates of 10⁻²–10⁻⁵ per generation) can be advantageous for adaptation on a short time scale. A theoretical model has shown that TRs mediating stochastic switching can evolve and be maintained under a wide range of alternating selection regimens (Palmer et al., 2013). Second, TR rearrangements, both local and combinatorial, can effect alterations at both the transcriptional and the translational level. These result in either binary switching (‘ON’ and ‘OFF’) or gradual control, which facilitates fine-tuned adaptation of bacterial fitness under stress. Further, TR-dependent mechanisms operate not only in classic genetic but also in epigenetic pathways. They act from the single locus up to the more global level of phasevarions (phase-variable regulons), underscoring the power of TR-mediated regulation in bacterial adaptation.
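To show what per-generation switching rates in the 10⁻²–10⁻⁵ range imply for a population, here is a minimal two-state (ON/OFF) sketch in Python. It assumes switching is fitness-neutral, with a constant ON→OFF rate a and OFF→ON rate b. Both are assumptions for illustration, since real phase-variable loci are usually under selection.

Under these assumptions, the ON fraction obeys f′ = f(1 − a) + (1 − f)b. This recursion relaxes geometrically, by a factor (1 − a − b) per generation, towards the equilibrium f* = b/(a + b). The sketch checks the direct iteration against this closed form and computes how many generations it takes to halve the distance to equilibrium.

```python
"""Two-state (ON/OFF) phase-variation sketch with fitness-neutral switching."""
import math


def _check_rates(a, b):
    if not (0 < a + b < 1):
        raise ValueError("this sketch assumes 0 < a + b < 1")


def iterate_fraction_on(f0, a, b, generations):
    """Iterate f' = f(1 - a) + (1 - f) b; a = ON->OFF rate, b = OFF->ON rate."""
    _check_rates(a, b)
    f = f0
    for _ in range(generations):
        f = f * (1 - a) + (1 - f) * b
    return f


def closed_form_fraction_on(f0, a, b, generations):
    """Exact solution of the same recursion: f_n = f* + (f0 - f*)(1 - a - b)^n."""
    _check_rates(a, b)
    f_star = b / (a + b)
    return f_star + (f0 - f_star) * (1 - a - b) ** generations


def relaxation_half_life(a, b):
    """Generations needed to halve the distance from the equilibrium ON fraction."""
    _check_rates(a, b)
    return math.log(2) / -math.log(1 - a - b)


if __name__ == "__main__":
    for rate in (1e-2, 1e-3, 1e-4, 1e-5):  # symmetric switching, a = b = rate
        print(rate, round(relaxation_half_life(rate, rate)))
```

The half-life is close to ln 2/(a + b). In this neutral picture, a locus switching at 10⁻² per generation remixes a population within tens of generations, whereas one switching at 10⁻⁵ needs tens of thousands.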
K. Zhou et al.
Although rapid TR-dependent phase switching facilitates bacterial adaptation on a short time scale, bacteria probably do not need to maintain hypermutable regions in the absence of selection pressure. Such regions carry costs in replication fidelity and metabolic energy. An interesting but largely unanswered question is how bacteria optimize the mutation rate of TRs in phase-variable genes in a fluctuating environment. Some intrinsic features of the TR sequence (e.g. repeat unit conservation and copy number) and cellular functions (e.g. DNA replication, recombination, and repair) are known to affect the frequency of TR-dependent phase variation. These may be used for this purpose (Bayliss, 2009). On the other hand, some studies have shown that the TR rearrangement frequency is modulated by environmental stress (Kanbashi et al., 1997; Jackson & Loeb, 2000; Rasmussen & Björk, 2001; Srikhanta et al., 2009; Cooley et al., 2010). In general, however, it remains unclear how environmental signals are transduced to modulate the switching rate of TR-containing genes. In conclusion, TRs confer local sequence flexibility on bacterial genomes, thereby allowing targeted mutation and evolution. The genotypic and phenotypic variations modulated by TRs are rapid and coordinated, and they support the generation of substantial biological diversity on a short time scale. Despite a certain metabolic cost, ‘prepared genomes’ (i.e. genomes with TR-based contingency loci; Caporale, 1999) have higher adaptability and thus a fitness advantage in frequently fluctuating environments.
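The trade-off raised above (switching costs versus readiness for change) can be explored with a deliberately simple toy model in Python. It is our own illustrative assumption, not the model of Palmer et al. (2013) or any other cited study. The model assumes the following:

- A lineage has two phenotypes, ON and OFF.
- Two environments alternate every 20 generations, with A favouring ON and B favouring OFF.
- The favoured phenotype multiplies 2-fold per generation and the disfavoured one 0.5-fold.
- Each cell switches phenotype at a symmetric rate mu per generation.

The function returns the natural-log growth of the lineage over several environmental cycles. Populations are renormalized every generation so that the numbers stay finite.

```python
"""Toy model: does stochastic ON/OFF switching pay off when the environment alternates?"""
import math


def log_growth(mu, period=20, cycles=4, fit_good=2.0, fit_bad=0.5):
    """Natural-log growth of a lineage switching phenotype at rate mu per generation.

    Environments A (favouring ON) and B (favouring OFF) alternate every `period`
    generations; the lineage starts fully ON at the beginning of environment A.
    """
    on, off = 1.0, 0.0
    total_log = 0.0
    for gen in range(2 * period * cycles):
        in_a = (gen // period) % 2 == 0
        g_on, g_off = (fit_good, fit_bad) if in_a else (fit_bad, fit_good)
        grown_on, grown_off = on * g_on, off * g_off   # selection
        on = grown_on * (1 - mu) + grown_off * mu      # cells staying ON plus OFF->ON switchers
        off = grown_off * (1 - mu) + grown_on * mu     # cells staying OFF plus ON->OFF switchers
        size = on + off
        total_log += math.log(size)                    # accumulate growth in log space
        on, off = on / size, off / size                # renormalize to avoid overflow
    return total_log


if __name__ == "__main__":
    for mu in (0.0, 1e-5, 1e-3, 0.05, 0.5):
        print(f"mu={mu:g}: log growth {log_growth(mu):.1f}")
```

In this setting, a non-switching lineage gains nothing over complete cycles, because growth in A is exactly cancelled by decline in B. A lineage that switches at every opportunity (mu = 0.5) grows slowly. An intermediate rate does far better than either. Where the optimum lies depends entirely on the assumed fitness values and period, so the sketch only illustrates the qualitative argument.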
Acknowledgements
This work was supported by the Research Foundation – Flanders (FWO – Vlaanderen; Research project G.0289.06) and by the KU Leuven Research Fund (project METH/07/03).
References
Ackermann M & Chao L (2006) DNA sequences shaped by selection for stability. PLoS Genet 2: e22. Adamczyk-Poplawska M, Lower M & Piekarowicz A (2011) Deletion of one nucleotide within the homonucleotide tract present in the hsdS gene alters the DNA sequence specificity of type I restriction-modification system NgoAV. J Bacteriol 193: 6750–9675. Aertsen A & Michiels CW (2004) Stress and how bacteria cope with death and survival. Crit Rev Microbiol 30: 263–273. Aertsen A & Michiels CW (2005) Diversify or die: generation of diversity in response to stress. Crit Rev Microbiol 31: 69–78. Appelmelk BJ, Martin SL, Monteiro MA et al. (1999) Phase variation in Helicobacter pylori lipopolysaccharide due to
changes in the lengths of poly(C) tracts in alpha3-fucosyltransferase genes. Infect Immun 67: 5361–5366. Attia AS & Hansen EJ (2006) A conserved tetranucleotide repeat is necessary for wild-type expression of the Moraxella catarrhalis UspA2 protein. J Bacteriol 188: 7840–7852. Backert S, Clyne M & Tegtmeyer N (2011) Molecular mechanisms of gastric epithelial cell adhesion and injection of CagA by Helicobacter pylori. Cell Commun Signal 9: 28. Barenkamp SJ & Leininger E (1992) Cloning, expression, and DNA sequence analysis of genes encoding nontypeable Haemophilus influenzae high-molecular-weight surface-exposed proteins related to filamentous hemagglutinin of Bordetella pertussis. Infect Immun 60: 1302–1313. Bartley SN, Tzeng YL, Heel K et al. (2013) Attachment and invasion of Neisseria meningitidis to host cells is related to surface hydrophobicity, bacterial cell size and capsule. PLoS ONE 8: e55798. Bayliss CD (2009) Determinants of phase variation rate and the fitness implications of differing rates for bacterial pathogens and commensals. FEMS Microbiol Rev 33: 504–520. Bayliss CD, Field D & Moxon ER (2001) The simple sequence contingency loci of Haemophilus influenzae and Neisseria meningitidis. J Clin Invest 107: 657–662. Bayliss CD, Hoe JC, Makepeace K, Martin P, Hood DW & Moxon ER (2008) Neisseria meningitidis escape from the bactericidal activity of a monoclonal antibody is mediated by phase variation of lgtG and enhanced by a mutator phenotype. Infect Immun 76: 5038–5048. Bayliss CD, Bidmos FA, Anjum A et al. (2012) Phase variable genes of Campylobacter jejuni exhibit high mutation rates and specific mutational patterns but mutability is not the major determinant of population structure during host colonization. Nucleic Acids Res 40: 5876–5889. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27: 573–580. 
Bergfeld AK, Claus H, Vogel U & Mühlenhoff M (2007) Biochemical characterization of the polysialic acid specific O-acetyltransferase NeuO of Escherichia coli K1. J Biol Chem 282: 22217–22227. Berrington AW, Tan YC, Srikhanta Y, Kuipers B, van der Ley P, Peak IR & Jennings MP (2002) Phase variation in meningococcal lipooligosaccharide biosynthesis genes. FEMS Immunol Med Microbiol 34: 267–275. Bhugra B, Voelker LL, Zou N, Yu H & Dybvig K (1995) Mechanism of antigenic variation in Mycoplasma pulmonis: interwoven, site-specific DNA inversions. Mol Microbiol 18: 703–714. Bi X & Liu LF (1996) A replicational model for DNA recombination between direct repeats. J Mol Biol 256: 849–858. Bichara M, Wagner J & Lambert IB (2006) Mechanisms of tandem repeat instability in bacteria. Mutat Res 598: 144–163.
Boch J & Bonas U (2010) Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu Rev Phytopathol 48: 419–436. Boch J, Scholze H, Schornack S, Landgraf A, Hahn S, Kay S, Lahaye T, Nickstadt A & Bonas U (2009) Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326: 1509–1512. Bogdanove AJ & Voytas DF (2011) TAL effectors: customizable proteins for DNA targeting. Science 333: 1843–1846. Brinkmann B, Klintschar M, Neuhuber F, Hühne J & Rolf B (1998) Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am J Hum Genet 62: 1408–1415. Bzymek M & Lovett ST (2001) Instability of repetitive DNA sequences: the role of replication in multiple mechanisms. P Natl Acad Sci USA 98: 8319–8325. Callaghan MJ, Jolley KA & Maiden MC (2006) Opacity-associated adhesin repertoire in hyperinvasive Neisseria meningitidis. Infect Immun 74: 5085–5094. Callaghan MJ, Buckee CO, Jolley KA, Kriz P, Maiden MC & Gupta S (2008) The effect of immune selection on the structure of the meningococcal Opa protein repertoire. PLoS Pathog 4: e1000020. Caporale LH (1999) Chance favors the prepared genome. Ann NY Acad Sci 870: 1–21. Carbonnelle E, Hill DJ, Morand P, Griffiths NJ, Bourdoulous S, Murillo I, Nassif X & Virji M (2009) Meningococcal interactions with the host. Vaccine 27(suppl 2): B78–B89. Carson SD, Stone B, Beucher M, Fu J & Sparling PF (2000) Phase variation of the gonococcal siderophore receptor FetA. Mol Microbiol 36: 585–593. Chen F, Liu WQ, Eisenstark A, Johnston RN, Liu GR & Liu SL (2010) Multiple genetic switches spontaneously modulating bacterial mutability. BMC Evol Biol 10: 277. Chiou CS (2010) Multilocus variable-number tandem repeat analysis as a molecular tool for subtyping and phylogenetic analysis of bacterial pathogens. Expert Rev Mol Diagn 10: 5–7.
Cholon DM, Cutter D, Richardson SK, Sethi S, Murphy TF, Look DC & St Geme III JW (2008) Serial isolates of persistent Haemophilus influenzae in patients with chronic obstructive pulmonary disease express diminishing quantities of the HMW1 and HMW2 adhesins. Infect Immun 76: 4463–4468. Choudhary OP & Trivedi S (2010) Microsatellite or simple sequence repeat (SSR) instability depends on repeat characteristics during replication and repair. J Cell Mol Biol 8: 21–34. Coenye T & Vandamme P (2005) Characterization of mononucleotide repeats in sequenced prokaryotic genomes. DNA Res 12: 221–233. Coil DA & Anne J (2010) The role of fimV and the importance of its tandem repeat copy number in twitching motility, pigment production, and morphology in Legionella pneumophila. Arch Microbiol 192: 625–631.
Cooley MB, Carychao D, Nguyen K, Whitehand L & Mandrell R (2010) Effects of environmental stress on stability of tandem repeats in Escherichia coli O157:H7. Appl Environ Microbiol 76: 3398–3400. Dawid S, Barenkamp SJ & St Geme III JW (1999) Variation in expression of the Haemophilus influenzae HMW adhesins: a prokaryotic system reminiscent of eukaryotes. P Natl Acad Sci USA 96: 1077–1082. De Bolle X, Bayliss CD, Field D, van de Ven T, Saunders NJ, Hood DW & Moxon ER (2000) The length of a tetranucleotide repeat tract in Haemophilus influenzae determines the phase variation rate of a gene with homology to type III DNA methyltransferases. Mol Microbiol 35: 211–222. de Vries N, Duinsbergen D, Kuipers EJ, Pot RG, Wiesenekker P, Penn CW, van Vliet AH, Vandenbroucke-Grauls CM & Kusters JG (2002) Transcriptional phase variation of a type III restriction-modification system in Helicobacter pylori. J Bacteriol 184: 6615–6623. Deng D, Yan C, Pan X, Mahfouz M, Wang J, Zhu JK, Shi Y & Yan N (2012) Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 335: 720–723. Deszo EL, Steenbergen SM, Freedberg DI & Vimr ER (2005) Escherichia coli K1 polysialic acid O-acetyltransferase gene, neuO, and the mechanism of capsule form variation involving a mobile contingency locus. P Natl Acad Sci USA 102: 5564–5569. Dettman JR & Taylor JW (2004) Mutation and evolution of microsatellite loci in Neurospora. Genetics 168: 1231–1248. Dixon K, Bayliss CD, Makepeace K, Moxon ER & Hood DW (2007) Identification of the functional initiation codons of a phase-variable gene of Haemophilus influenzae, lic2A, with the potential for differential expression. J Bacteriol 189: 511–521. Ecevit IZ, McCrea KW, Pettigrew MM, Sen A, Marrs CF & Gilsdorf JR (2004) Prevalence of the hifBC, hmw1A, hmw2A, hmwC, and hia genes in Haemophilus influenzae isolates. J Clin Microbiol 42: 3065–3072. 
Eckert KA & Hile SE (2009) Every microsatellite is different: intrinsic DNA features dictate mutagenesis of common microsatellites present in the human genome. Mol Carcinog 48: 379–388. Edwards YJ, Elgar G, Clark MS & Bishop MJ (1998) The identification and characterization of microsatellites in the compact genome of the Japanese pufferfish, Fugu rubripes: perspectives in functional and comparative genomic analyses. J Mol Biol 278: 843–854. Erwin AL, Bonthuis PJ, Geelhood JL, Nelson KL, McCrea KW, Gilsdorf JR & Smith AL (2006) Heterogeneity in tandem octanucleotides within Haemophilus influenzae lipopolysaccharide biosynthetic gene losA affects serum resistance. Infect Immun 74: 3408–3414. Fonville NC, Ward RM & Mittelman D (2011) Stress-induced modulators of repeat instability and genome evolution. J Mol Microbiol Biotechnol 21: 36–44.
Foster PL (2005) Stress responses and genetic variation in bacteria. Mutat Res 569: 3–11. Foster PL (2007) Stress-induced mutagenesis in bacteria. Crit Rev Biochem Mol Biol 42: 373–397. Fox KL, Yildirim HH, Deadman ME, Schweda EK, Moxon ER & Hood DW (2005) Novel lipopolysaccharide biosynthetic genes containing tetranucleotide repeats in Haemophilus influenzae, identification of a gene for adding O-acetyl groups. Mol Microbiol 58: 207–216. Fox KL, Srikhanta YN & Jennings MP (2007) Phase variable type III restriction-modification systems of host-adapted bacterial pathogens. Mol Microbiol 65: 1375–1379. Fukuda E, Kaminska KH, Bujnicki JM & Kobayashi I (2008) Cell death upon epigenetic genome methylation: a novel function of methyl-specific deoxyribonucleases. Genome Biol 9: R163. Funchain P, Yeung A, Stewart JL, Lin R, Slupska MM & Miller JH (2000) The consequences of growth of a mutator strain of Escherichia coli as measured by loss of function among multiple gene targets and loss of fitness. Genetics 154: 959–970. Gawel D, Jonczyk P, Bialoskorska M, Schaaper RM & Fijalkowska IJ (2002) Asymmetry of frameshift mutagenesis during leading and lagging-strand replication in Escherichia coli. Mutat Res 501: 129–136. Gemayel R, Vinces MD, Legendre M & Verstrepen KJ (2010) Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annu Rev Genet 44: 445–477. Gibbons JG & Rokas A (2009) Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes. Mol Biol Evol 26: 591–602. Glew MD, Baseggio N, Markham PF, Browning GF & Walker ID (1998) Expression of the pMGA genes of Mycoplasma gallisepticum is controlled by variation in the GAA trinucleotide repeat lengths within the 5′ noncoding regions. Infect Immun 66: 5833–5841. Glew MD, Browning GF, Markham PF & Walker ID (2000) pMGA phenotypic variation in Mycoplasma gallisepticum occurs in vivo and is mediated by trinucleotide repeat length variation. 
Infect Immun 68: 6027–6033. Glover SW & Piekarowicz A (1972) Host specificity of DNA in Haemophilus influenzae: restriction and modification in strain Rd. Biochem Biophys Res Commun 46: 1610–1617. Goldstein DB & Clark AG (1995) Microsatellite variation in North American populations of Drosophila melanogaster. Nucleic Acids Res 23: 3882–3886. Goodwin AC, Weinberger DM, Ford CB, Nelson JC, Snider JD, Hall JD, Paules CI, Peek Jr RM & Forsyth MH (2008) Expression of the Helicobacter pylori adhesin SabA is controlled via phase variation and the ArsRS signal transduction system. Microbiology 154: 2231–2240. Gravekamp C, Horensky DS, Michel JL & Madoff LC (1996) Variation in repeat number within the alpha C protein of group B streptococci alters antigenicity and protective epitopes. Infect Immun 64: 3576–3583. Gravekamp C, Kasper DL, Michel JL, Kling DE, Carey V & Madoff LC (1997) Immunogenicity and protective efficacy
of the alpha C protein of group B streptococci are inversely related to the number of repeats. Infect Immun 65: 5216–5221. Griffin R, Cox AD, Makepeace K, Richards JC, Moxon ER & Hood DW (2003) The role of lex2 in lipopolysaccharide biosynthesis in Haemophilus influenzae strains RM7004 and RM153. Microbiology 149: 3165–3175. Guo X & Mrazek J (2008) Long simple sequence repeats in host-adapted pathogens localize near genes encoding antigens, housekeeping genes, and pseudogenes. J Mol Evol 67: 497–509. Gur-Arie R, Cohen CJ, Eitan Y, Shelef L, Hallerman EM & Kashi Y (2000) Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res 10: 62–71. Gürlebeck D, Thieme F & Bonas U (2006) Type III effector proteins from the plant pathogen Xanthomonas and their role in the interaction with the host plant. J Plant Physiol 163: 233–255. Hammerschmidt S, Hilse R, van Putten JP, Gerardy-Schahn R, Unkmeir A & Frosch M (1996a) Modulation of cell surface sialic acid expression in Neisseria meningitidis via a transposable genetic element. EMBO J 15: 192–198. Hammerschmidt S, Müller A, Sillmann H, Mühlenhoff M, Borrow R, Fox A, van Putten J, Zollinger WD, Gerardy-Schahn R & Frosch M (1996b) Capsule phase variation in Neisseria meningitidis serogroup B by slipped-strand mispairing in the polysialyltransferase gene (siaD): correlation with bacterial invasion and the outbreak of meningococcal disease. Mol Microbiol 20: 1211–1220. Hannan A (2010) TRPing up the genome: tandem repeat polymorphisms as dynamic sources of genetic variability in health and disease. Discov Med 10: 314–321. Hebert ML & Wells RD (2005) Roles of double-strand breaks, nicks, and gaps in stimulating deletions of CTG.CAG repeats by intramolecular DNA repair. J Mol Biol 353: 961–979. Hebert ML, Spitz LA & Wells RD (2004) DNA double-strand breaks induce deletion of CTG.CAG repeats in an orientation-dependent manner in Escherichia coli. J Mol Biol 336: 655–672.
Herbers K, Conrads-Strauch J & Bonas U (1992) Race-specificity of plant resistance to bacterial spot disease determined by repetitive motifs in a bacterial avirulence protein. Nature 356: 172–174. Hood DW, Deadman ME, Jennings MP, Bisercic M, Fleischmann RD, Venter JC & Moxon ER (1996) DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc Natl Acad Sci USA 93: 11121–11125. Hosking SL, Craig JE & High NJ (1999) Phase variation of lic1A, lic2A and lic3A in colonization of the nasopharynx, bloodstream and cerebrospinal fluid by Haemophilus influenzae type b. Microbiology 145: 3005–3011. Hussain M, Haggar A, Peters G, Chhatwal GS, Herrmann M, Flock JI & Sinha B (2008) More than one tandem repeat
domain of the extracellular adherence protein of Staphylococcus aureus is required for aggregation, adherence, and host cell invasion but not for leukocyte activation. Infect Immun 76: 5615–5623. Ilver D, Arnqvist A, Ogren J et al. (1998) Helicobacter pylori adhesin binding fucosylated histo-blood group antigens revealed by retagging. Science 279: 373–377. Iyer RR, Pluciennik A, Rosche WA, Sinden RR & Wells RD (2000) DNA polymerase III proofreading mutants enhance the expansion and deletion of triplet repeat sequences in Escherichia coli. J Biol Chem 275: 2174–2184. Jackson AL & Loeb LA (2000) Microsatellite instability induced by hydrogen peroxide in Escherichia coli. Mutat Res 447: 187–198. Jansen A, Gemayel R & Verstrepen KJ (2012) Unstable microsatellite repeats facilitate rapid evolution of coding and regulatory sequences. Genome Dyn 7: 108–125. Janulczyk R, Masignani V, Maione D, Tettelin H, Grandi G & Telford JL (2010) Simple sequence repeats and genome plasticity in Streptococcus agalactiae. J Bacteriol 192: 3990–4000. Jennings MP, Srikhanta YN, Moxon ER, Kramer M, Poolman JT, Kuipers B & van der Ley P (1999) The genetic basis of the phase variation repertoire of lipopolysaccharide immunotypes in Neisseria meningitidis. Microbiology 145: 3013–3021. Jerome JP, Bell JA, Plovanich-Jones AE, Barrick JE, Brown CT & Mansfield LS (2011) Standing genetic variation in contingency loci drives the rapid adaptation of Campylobacter jejuni to a novel host. PLoS ONE 6: e16399. Jin H, Ren Z, Whitby PW, Morton DJ & Stull TL (1999) Characterization of hgpA, a gene encoding a haemoglobin/ haemoglobin-haptoglobin-binding protein of Haemophilus influenzae. Microbiology 145: 905–914. Jolivet-Gougeon A, Kovacs B, Le Gall-David S, Le Bars H, Bousarghin L, Bonnaure-Mallet M, Lobel B, Guille F, Soussy CJ & Tenke P (2011) Bacterial hypermutation: clinical implications. J Med Microbiol 60: 563–573. 
Jonsson AB, Nyberg G & Normark S (1991) Phase variation of gonococcal pili by frameshift mutation in pilC, a novel gene for pilus assembly. EMBO J 10: 477–488. Jonsson AB, Pfeifer J & Normark S (1992) Neisseria gonorrhoeae PilC expression provides a selective mechanism for structural diversity of pili. P Natl Acad Sci USA 89: 3204–3208. Jordan P, Snyder LA & Saunders NJ (2003) Diversity in coding tandem repeats in related Neisseria spp. BMC Microbiol 3: 23. Kajava AV (2012) Tandem repeats in proteins: from sequence to structure. J Struct Biol 179: 279–288. Kanbashi K, Wang X, Komura J, Ono T & Yamamoto K (1997) Frameshifts, base substitutions and minute deletions constitute X ray-induced mutations in the endogenous tonB gene of Escherichia coli K12. Mutat Res 385: 259–267. Karatzas KA & Bennik MH (2002) Characterization of a Listeria monocytogenes Scott A isolate with high tolerance
towards high hydrostatic pressure. Appl Environ Microbiol 68: 3183–3189. Karatzas KA, Wouters JA, Gahan CG, Hill C, Abee T & Bennik MH (2003) The CtsR regulator of Listeria monocytogenes contains a variant glycine repeat region that affects piezotolerance, stress resistance, motility and virulence. Mol Microbiol 49: 1227–1238. Karatzas KA, Valdramidis VP & Wells-Bennik MH (2005) Contingency locus in ctsR of Listeria monocytogenes Scott A: a strategy for occurrence of abundant piezotolerant isolates within clonal populations. Appl Environ Microbiol 71: 8390–8396. Kassai-Jager E, Ortutay C, Tóth G, Vellai T & Gaspari Z (2008) Distribution and evolution of short tandem repeats in closely related bacterial genomes. Gene 410: 18–25. Kay S, Hahn S, Marois E, Hause G & Bonas U (2007) A bacterial effector acts as a plant transcription factor and induces a cell size regulator. Science 318: 648–651. Kelkar YD, Strubczewski N, Hile SE, Chiaromonte F, Eckert KA & Makova KD (2010) What is a microsatellite: a computational and experimental definition based upon repeat mutational behavior at A/T and GT/AC repeats. Genome Biol Evol 2: 620–635. King MR, Vimr RP, Steenbergen SM, Spanjaard L, Plunkett G III, Blattner FR & Vimr ER (2007) Escherichia coli K1-specific bacteriophage CUS-3 distribution and function in phase-variable capsular polysialic acid O acetylation. J Bacteriol 189: 6447–6456. Kornberg A, Bertsch LL, Jackson JF & Khorana HG (1964) Enzymatic synthesis of deoxyribonucleic acid, XVI. oligonucleotides as templates and the mechanism of their replication. Proc Natl Acad Sci USA 51: 315–323. Kusters JG, van Vliet AHM & Kuipers EJ (2006) Pathogenesis of Helicobacter pylori infection. Clin Microbiol Rev 19: 449–490. Lafontaine ER, Cope LD, Aebi C, Latimer JL, McCracken GH & Hansen EJ (2000) The UspA1 protein and a second type of UspA2 protein mediate adherence of Moraxella catarrhalis to human epithelial cells in vitro. J Bacteriol 182: 1364–1373.
Lafontaine ER, Wagner NJ & Hansen EJ (2001) Expression of the Moraxella catarrhalis UspA1 protein undergoes phase variation and is regulated at the transcriptional level. J Bacteriol 183: 1540–1551. Lai Y & Sun F (2003) The relationship between microsatellite slippage mutation rate and the number of repeat units. Mol Biol Evol 20: 2123–2131. Le Bars H, Bousarghin L, Bonnaure-Mallet M & Jolivet-Gougeon A (2013) Role of a short tandem leucine/ arginine repeat in strong mutator phenotype acquisition in a clinical isolate of Salmonella Typhimurium. FEMS Microbiol Lett 338: 101–106. Leclercq S, Rivals E & Jarne P (2007) Detecting microsatellites within genomes: significant variation among algorithms. BMC Bioinformatics 8: 125. Legendre M, Pochet N, Pak T & Verstrepen KJ (2007) Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Res 17: 1787–1796.
Levengood SK, Beyer WF Jr & Webster RE (1991) TolA: a membrane protein involved in colicin uptake contains an extended helical region. P Natl Acad Sci USA 88: 5939–5943. Levinson G & Gutman GA (1987) Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4: 203–221. Lewis LA, Gipson M, Hartman K, Ownbey T, Vaughn J & Dyer DW (1999) Phase variation of HpuAB and HmbR, two distinct haemoglobin receptors of Neisseria meningitidis DNM2. Mol Microbiol 32: 977–989. Lim KG, Kwoh CK, Hsu LY & Wirawan A (2013) Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance. Brief Bioinform 14: 67–81. Lin WH & Kussell E (2012) Evolutionary pressures on simple sequence repeats in prokaryotic coding regions. Nucleic Acids Res 40: 2399–2413. Lindbäck T, Secic I & Rørvik LM (2011) A contingency locus in prfA in a Listeria monocytogenes subgroup allows reactivation of the PrfA virulence regulator during infection in mice. Appl Environ Microbiol 77: 3478–3483. Lindstedt BA (2005) Multiple-locus variable number tandem repeats analysis for genetic fingerprinting of pathogenic bacteria. Electrophoresis 26: 2567–2582. Lindstedt BA (2011) Genotyping of selected bacterial enteropathogens in Norway. Int J Med Microbiol 301: 648–653. Liu L, Dybvig K, Panangala VS, van Santen VL & French CT (2000) GAA trinucleotide repeat region regulates M9/pMGA gene expression in Mycoplasma gallisepticum. Infect Immun 68: 871–876. Lopes J, Ribeyre C & Nicolas A (2006) Complex minisatellite rearrangements generated in the total or partial absence of Rad27/hFEN1 activity occur in a single generation and are Rad51 and Rad52 dependent. Mol Cell Biol 26: 6675–6689. Lysnyansky I, Rosengarten R & Yogev D (1996) Phenotypic switching of variable surface lipoproteins in Mycoplasma bovis involves high-frequency chromosomal rearrangements. J Bacteriol 178: 5395–5401. Mahdavi J, Sonden B, Hurtig M et al.
(2002) Helicobacter pylori SabA adhesin in persistent infection and chronic inflammation. Science 297: 573–578. Mak ANS, Bradley P, Bogdanove AJ & Stoddard BL (2013) Tal effectors: function, structure, engineering and applications. Curr Opin Struct Biol 23: 93–99. Malkova A & Haber JE (2012) Mutations arising during repair of chromosome breaks. Annu Rev Genet 46: 455–473. Martin P, van de Ven T, Mouchel N, Jeffries AC, Hood DW & Moxon ER (2003) Experimentally revised repertoire of putative contingency loci in Neisseria meningitidis strain MC58: evidence for a novel mechanism of phase variation. Mol Microbiol 50: 245–257. Martin P, Makepeace K, Hill SA, Hood DW & Moxon ER (2005) Microsatellite instability regulates transcription factor binding and gene expression. P Natl Acad Sci USA 102: 3800–3804.
FEMS Microbiol Rev 38 (2014) 119–141
Variable tandem repeats and bacterial adaptation
Massey RC & Buckling A (2002) Environmental regulation of mutation rates at specific sites. Trends Microbiol 10: 580–584.
Merkel A & Gemmell N (2008) Detecting short tandem repeats from genome data: opening the software black box. Brief Bioinform 9: 355–366.
Metruccio MM, Pigozzi E, Roncarati D, Berlanda Scorza F, Norais N, Hill SA, Scarlato V & Delany I (2009) A novel phase variation mechanism in the meningococcus driven by a ligand-responsive repressor and differential spacing of distal promoter elements. PLoS Pathog 5: e1000710.
Miller VL, Taylor RK & Mekalanos JJ (1987) Cholera toxin transcriptional activator toxR is a transmembrane DNA binding protein. Cell 48: 271–279.
Morbitzer R, Römer P, Boch J & Lahaye T (2010) Regulation of selected genome loci using de novo-engineered transcription activator-like effector (TALE)-type transcription factors. P Natl Acad Sci USA 107: 21617–21622.
Mordhorst IL, Claus H, Ewers C et al. (2009) O-acetyltransferase gene neuO is segregated according to phylogenetic background and contributes to environmental desiccation resistance in Escherichia coli K1. Environ Microbiol 11: 3154–3165.
Morton DJ, Whitby PW, Jin H, Ren Z & Stull TL (1999) Effect of multiple mutations in the hemoglobin- and hemoglobin-haptoglobin-binding proteins, HgpA, HgpB, and HgpC, of Haemophilus influenzae type b. Infect Immun 67: 2729–2739.
Moscou MJ & Bogdanove AJ (2009) A simple cipher governs DNA recognition by TAL effectors. Science 326: 1501.
Moxon ER, Rainey PB, Nowak MA & Lenski RE (1994) Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr Biol 4: 24–33.
Moxon R, Bayliss C & Hood D (2006) Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu Rev Genet 40: 307–333.
Mrazek J (2006) Analysis of distribution indicates diverse functions of simple sequence repeats in Mycoplasma genomes. Mol Biol Evol 23: 1370–1385.
Mrazek J, Guo X & Shah A (2007) Simple sequence repeats in prokaryotic genomes. P Natl Acad Sci USA 104: 8472–8477.
Mudunuri SB & Nagarajaram HA (2007) IMEx: imperfect microsatellite extractor. Bioinformatics 23: 1181–1187.
Muñoz Bodnar A, Bernal A, Szurek B & López CE (2013) Tell me a tale of TALEs. Mol Biotechnol 53: 228–235.
Murphy GL, Connell TD, Barritt DS, Koomey M & Cannon JG (1989) Phase variation of gonococcal protein II: regulation of gene expression by slipped-strand mispairing of a repetitive DNA sequence. Cell 56: 539–547.
Nilsson C, Skoglund A, Moran AP, Annuk H, Engstrand L & Normark S (2006) An enzymatic ruler modulates Lewis antigen glycosylation of Helicobacter pylori LPS during persistent infection. P Natl Acad Sci USA 103: 2863–2868.
Nilsson C, Skoglund A, Moran AP, Annuk H, Engstrand L & Normark S (2008) Lipopolysaccharide diversity evolving in Helicobacter pylori communities through genetic modifications in fucosyltransferases. PLoS ONE 3: e3811.
Orsi RH, Bowen BM & Wiedmann M (2010) Homopolymeric tracts represent a general regulatory mechanism in prokaryotes. BMC Genomics 11: 102.
Palmer ME, Lipsitch M, Moxon ER & Bayliss CD (2013) Broad conditions favor the evolution of phase-variable loci. mBio 4: e00430–12.
Papazisi L, Gorton TS, Kutish G, Markham PF, Browning GF, Nguyen DK, Swartzell S, Madan A, Mahairas G & Geary SJ (2003) The complete genome sequence of the avian pathogen Mycoplasma gallisepticum strain R(low). Microbiology 149: 2307–2316.
Pâques F, Leung WY & Haber JE (1998) Expansions and contractions in a tandem repeat induced by double-strand break repair. Mol Cell Biol 18: 2045–2054.
Pâques F, Richard GF & Haber JE (2001) Expansions and contractions in 36-bp minisatellites by gene conversion in yeast. Genetics 158: 155–166.
Park SF, Purdy D & Leach S (2000) Localized reversible frameshift mutation in the flhA gene confers phase variability to flagellin gene expression in Campylobacter coli. J Bacteriol 182: 207–210.
Pearson CE, Nichol Edamura K & Cleary JD (2005) Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet 6: 729–742.
Piekarowicz A, Brzeziński R & Kauc L (1974) Host specificity of DNA in Haemophilus influenzae: the in vivo action of the restriction endonucleases on phage and bacterial DNA. Acta Microbiol Pol A 27: 51–65.
Piekarowicz A, Klyz A, Kwiatek A & Stein DC (2001) Analysis of type I restriction modification systems in the Neisseriaceae: genetic organization and properties of the gene products. Mol Microbiol 41: 1199–1210.
Power PM, Sweetman WA, Gallacher NJ, Woodhall MR, Kumar GA, Moxon ER & Hood DW (2009) Simple sequence repeats in Haemophilus influenzae. Infect Genet Evol 9: 216–228.
Rando OJ & Verstrepen KJ (2007) Timescales of genetic and epigenetic inheritance. Cell 128: 655–668.
Rasmussen M & Björk L (2001) Unique regulation of SclB – a novel collagen-like surface protein of Streptococcus pyogenes. Mol Microbiol 40: 1427–1438.
Razin S, Yogev D & Naot Y (1998) Molecular biology and pathogenicity of mycoplasmas. Microbiol Mol Biol Rev 62: 1094–1156.
Ren Z, Jin H, Morton DJ & Stull TL (1998) hgpB, a gene encoding a second Haemophilus influenzae hemoglobin- and hemoglobin-haptoglobin-binding protein. Infect Immun 66: 4733–4741.
Ren Z, Jin H, Whitby PW, Morton DJ & Stull TL (1999) Role of CCAA nucleotide repeats in regulation of hemoglobin and hemoglobin-haptoglobin binding protein genes of Haemophilus influenzae. J Bacteriol 181: 5865–5870.
Richard GF & Pâques F (2000) Mini- and microsatellite expansions: the recombination connection. EMBO Rep 1: 122–126.
Richard GF, Dujon B & Haber JE (1999) Double-strand break repair can lead to high frequencies of deletions within short CAG/CTG trinucleotide repeats. Mol Gen Genet 261: 871–882.
Richard GF, Kerrest A & Dujon B (2008) Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 72: 686–727.
Richardson AR & Stojiljkovic I (1999) HmbR, a hemoglobin-binding outer membrane protein of Neisseria meningitidis, undergoes phase variation. J Bacteriol 181: 2067–2074.
Ritz D, Lim J, Reynolds CM, Poole LB & Beckwith J (2001) Conversion of a peroxiredoxin into a disulfide reductase by a triplet repeat expansion. Science 294: 158–160.
Rocha EP (2003) An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction. Genome Res 13: 1123–1132.
Rocha EP, Matic I & Taddei F (2002) Over-representation of repeats in stress response genes: a strategy to increase versatility under stressful conditions? Nucleic Acids Res 30: 1886–1894.
Rosengarten R & Wise KS (1991) The Vlp system of Mycoplasma hyorhinis: combinatorial expression of distinct size variant lipoproteins generating high-frequency surface antigenic variation. J Bacteriol 173: 4782–4793.
Ryan KA & Lo RY (1999) Characterization of a CACAG pentanucleotide repeat in Pasteurella haemolytica and its possible role in modulation of a novel type III restriction-modification system. Nucleic Acids Res 27: 1505–1511.
Sadarangani M, Pollard AJ & Gray-Owen SD (2011) Opa proteins and CEACAMs: pathways of immune engagement for pathogenic Neisseria. FEMS Microbiol Rev 35: 498–514.
Saint-Ruf C & Matic I (2006) Environmental tuning of mutation rates. Environ Microbiol 8: 193–199.
Sarkari J, Pandit N, Moxon ER & Achtman M (1994) Variable expression of the Opc outer membrane protein in Neisseria meningitidis is caused by size variation of a promoter containing poly-cytidine. Mol Microbiol 13: 207–217.
Schaper E, Kajava AV, Hauser A & Anisimova M (2012) Repeat or not repeat? Statistical validation of tandem repeat prediction in genomic sequences. Nucleic Acids Res 40: 10005–10017.
Schlotterer C, Imhof M, Wang H, Nolte V & Harr B (2006) Low abundance of Escherichia coli microsatellites is associated with an extremely low mutation rate. J Evol Biol 19: 1671–1676.
Scholze H & Boch J (2011) TAL effectors are remote controls for gene activation. Curr Opin Microbiol 14: 47–53.
Schug MD, Hutter CM, Wetterstrand KA, Gaudette MS, Mackay TF & Aquadro CF (1998) The mutation rates of di-, tri- and tetranucleotide repeats in Drosophila melanogaster. Mol Biol Evol 15: 1751–1760.
Schweda EKH, Richards JC, Hood DW & Moxon ER (2007) Expression and structural diversity of the lipopolysaccharide of Haemophilus influenzae: implication in virulence. Int J Med Microbiol 297: 297–306.
Seale TW, Morton DJ, Whitby PW, Wolf R, Kosanke SD, VanWagoner TM & Stull TL (2006) Complex role of hemoglobin and hemoglobin-haptoglobin binding proteins in Haemophilus influenzae virulence in the infant rat model of invasive infection. Infect Immun 74: 6213–6225.
Seib KL, Pigozzi E, Muzzi A, Gawthorne JA, Delany I, Jennings MP & Rappuoli R (2011) A novel epigenetic regulator associated with the hypervirulent Neisseria meningitidis clonal complex 41/44. FASEB J 25: 3622–3633.
Shaver AC & Sniegowski PD (2003) Spontaneously arising mutL mutators in evolving Escherichia coli populations are the result of changes in repeat length. J Bacteriol 185: 6076–6082.
Sheets AJ & St Geme III JW (2011) Adhesive activity of the Haemophilus cryptic genospecies Cha autotransporter is modulated by variation in tandem peptide repeats. J Bacteriol 193: 329–339.
Sia EA, Kokoska RJ, Dominska M, Greenwell P & Petes TD (1997) Microsatellite instability in yeast: dependence on repeat unit size and DNA mismatch repair genes. Mol Cell Biol 17: 2851–2858.
Spinosa MR, Progida C, Tala A, Cogli L, Alifano P & Bucci C (2007) The Neisseria meningitidis capsule is important for intracellular survival in human cells. Infect Immun 75: 3594–3603.
Srikhanta YN, Maguire TL, Stacey KJ, Grimmond SM & Jennings MP (2005) The phasevarion: a genetic system controlling coordinated, random switching of expression of multiple genes. P Natl Acad Sci USA 102: 5547–5551.
Srikhanta YN, Dowideit SJ, Edwards JL et al. (2009) Phasevarions mediate random switching of gene expression in pathogenic Neisseria. PLoS Pathog 5: e1000400.
Srikhanta YN, Fox KL & Jennings MP (2010) The phasevarion: phase variation of type III DNA methyltransferases controls coordinated switching in multiple genes. Nat Rev Microbiol 8: 196–206.
Srikhanta YN, Gorrell RJ, Steen JA, Gawthorne JA, Kwok T, Grimmond SM, Robins-Browne RM & Jennings MP (2011) Phasevarion mediated epigenetic gene regulation in Helicobacter pylori. PLoS ONE 6: e27569.
Streisinger G, Okada Y, Emrich J, Newton J, Tsugita A, Terzaghi E & Inouye M (1966) Frameshift mutations and the genetic code. Cold Spring Harb Symp Quant Biol 31: 77–84.
Tauseef I, Harrison OB, Wooldridge KG et al. (2011) Influence of the combination and phase variation status of the haemoglobin receptors HmbR and HpuAB on meningococcal virulence. Microbiology 157: 1446–1456.
Tesfazgi Mebrhatu M, Wywial E, Ghosh A, Michiels CW, Lindner AB, Taddei F, Bujnicki JM, Van Melderen L & Aertsen A (2011) Evidence for an evolutionary antagonism between Mrr and Type III modification systems. Nucleic Acids Res 39: 5991–6001.
Theiss P & Wise KS (1997) Localized frameshift mutation generates selective, high-frequency phase variation of a surface lipoprotein encoded by a mycoplasma ABC transporter operon. J Bacteriol 179: 4013–4022.
Treangen TJ, Abraham AL, Touchon M & Rocha EP (2009) Genesis, effects and fates of repeats in prokaryotic genomes. FEMS Microbiol Rev 33: 539–571.
van Belkum A (1999) Short sequence repeats in microbial pathogenesis and evolution. Cell Mol Life Sci 56: 729–734.
van der Ende A, Hopman CT & Dankert J (2000) Multiple mechanisms of phase variation of PorA in Neisseria meningitidis. Infect Immun 68: 6685–6690.
van der Woude MW & Baumler AJ (2004) Phase and antigenic variation in bacteria. Clin Microbiol Rev 17: 581–611.
van Ham SM, van Alphen L, Mooi FR & van Putten JP (1993) Phase variation of H. influenzae fimbriae: transcriptional control of two divergent genes through a variable combined promoter region. Cell 73: 1187–1196.
van Passel MW & Ochman H (2007) Selection on the genic location of disruptive elements. Trends Genet 23: 601–604.
van Selm S, van Cann LM, Kolkman MA, van der Zeijst BA & van Putten JP (2003) Genetic basis for the structural difference between Streptococcus pneumoniae serotype 15B and 15C capsular polysaccharides. Infect Immun 71: 6192–6198.
Vandersmissen L, De Buck E, Saels V, Coil DA & Anne J (2010) A Legionella pneumophila collagen-like protein encoded by a gene with a variable number of tandem repeats is involved in the adherence and invasion of host cells. FEMS Microbiol Lett 306: 168–176.
Verstrepen KJ, Reynolds TB & Fink GR (2004) Origins of variation in the fungal cell surface. Nat Rev Microbiol 2: 533–540.
Vimr E & Steenbergen SM (2006) Mobile contingency locus controlling Escherichia coli K1 polysialic acid capsule acetylation. Mol Microbiol 60: 828–837.
Vogler AJ, Keys C, Nemoto Y, Colman RE, Jay Z & Keim P (2006) Effect of repeat copy number on variable-number tandem repeat mutations in Escherichia coli O157:H7. J Bacteriol 188: 4253–4263.
Waite RD, Struthers JK & Dowson CG (2001) Spontaneous sequence duplication within an open reading frame of the pneumococcal type 3 capsule locus causes high-frequency phase variation. Mol Microbiol 42: 1223–1232.
Waite RD, Penfold DW, Struthers JK & Dowson CG (2003) Spontaneous sequence duplications within capsule genes cap8E and tts control phase variation in Streptococcus pneumoniae serotypes 8 and 37. Microbiology 149: 497–504.
Wang G, Rasko DA, Sherburne R & Taylor DE (1999) Molecular genetic basis for the variable expression of Lewis Y antigen in Helicobacter pylori: analysis of the alpha (1,2) fucosyltransferase gene. Mol Microbiol 31: 1265–1274.
Wells RD, Dere R, Hebert ML, Napierala M & Son LS (2005) Advances in mechanisms of genetic instability related to hereditary neurological diseases. Nucleic Acids Res 33: 3785–3798.
White FF, Potnis N, Jones JB & Koebnik R (2009) The type III effectors of Xanthomonas. Mol Plant Pathol 10: 749–766.
Willems R, Paul A, van der Heide HG, ter Avest AR & Mooi FR (1990) Fimbrial phase variation in Bordetella pertussis: a novel mechanism for transcriptional regulation. EMBO J 9: 2803–2809.
Yamaoka Y, Kita M, Kodama T, Imamura S, Ohno T, Sawai N, Ishimaru A, Imanishi J & Graham DY (2002) Helicobacter pylori infection in mice: role of outer membrane proteins in colonization and inflammation. Gastroenterology 123: 1992–2004.
Yang QL & Gotschlich EC (1996) Variation of gonococcal lipooligosaccharide structure is due to alterations in poly-G tracts in lgt genes encoding glycosyl transferases. J Exp Med 183: 323–327.
Yang J, Wang J, Chen L, Yu J, Dong J, Yao ZJ, Shen Y, Jin Q & Chen R (2003) Identification and characterization of simple sequence repeats in the genomes of Shigella species. Gene 322: 85–92.
Yogev D, Rosengarten R, Watson-McKown R & Wise KS (1991) Molecular basis of Mycoplasma surface antigenic variation: a novel set of divergent genes undergo spontaneous mutation of periodic coding regions and 5′ regulatory sequences. EMBO J 10: 4069–4079.
Zahra R, Blackwood JK, Sales J & Leach DR (2007) Proofreading and secondary structure processing determine the orientation dependence of CAG·CTG trinucleotide repeat instability in Escherichia coli. Genetics 176: 27–41.
Zaleski P, Wojciechowski M & Piekarowicz A (2005) The role of Dam methylation in phase variation of Haemophilus influenzae genes involved in defence against phage infection. Microbiology 151: 3361–3369.
Zhang Q & Wise KS (1997) Localized reversible frameshift mutation in an adhesin gene confers a phase-variable adherence phenotype in mycoplasma. Mol Microbiol 25: 859–869.
Zhou K, Michiels CW & Aertsen A (2012a) Variation of intragenic tandem repeat tract of tolA modulates Escherichia coli stress tolerance. PLoS ONE 7: e47766.
Zhou K, Vanoirbeek K, Aertsen A & Michiels CW (2012b) Variability of the tandem repeat region of the Escherichia coli tolA gene. Res Microbiol 163: 316–322.