© Springer-Verlag 1999
Mol Gen Genet (1999) 262: 633–642
ORIGINAL PAPER
M. Jean · J. Pelletier · M. Hilpert · F. Belzile · R. Kunze
Isolation and characterization of AtMLH1, a MutL homologue from Arabidopsis thaliana
Received: 8 June 1999 / Accepted: 16 September 1999
Abstract DNA mismatch repair systems play an essential role in the maintenance of genetic information in living organisms and are also implicated in genetic recombination and genome stability. Using degenerate primers, we have cloned the first plant homologue of the E. coli MutL gene, which we have called AtMLH1 for Arabidopsis thaliana MutL-homologue 1. AtMLH1 is present as a single-copy gene in the Arabidopsis genome and is located on the top arm of chromosome 4. Sequence analysis revealed that the product of this gene shows extensive sequence homology with other eukaryotic MLH1 proteins. As mlh1-deficient lines would be useful for studying the biological function of this gene, several populations that had been mutagenized using T-DNA and transposon insertions were screened to identify such mutants. One line that carries a T-DNA insertion in the promoter region of the AtMLH1 gene was isolated. Surprisingly, although the insertion occurred only 80 bp upstream of the putative transcription start site, Northern analyses revealed very low but similar amounts of AtMLH1 transcript in both the wild type and the T-DNA insertion lines. RT-PCR analyses suggest, however, that transcription is initiated further upstream in the insertion line and that the T-DNA may supply this novel initiation site. Finally, no increase in microsatellite instability, a phenotype often associated with mutations in mismatch repair genes, was observed in plants homozygous for this insertion.
Communicated by H. Saedler
M. Jean · J. Pelletier · F. Belzile
Département de Phytologie, Pavillon Charles-Eugène Marchand, Université Laval, Ste-Foy, Québec, Canada G1K 7P4
M. Hilpert · R. Kunze (✉)
Institut für Genetik und Mikrobiologie, Universität München, Maria-Ward-Strasse 1a, 80638 München, Germany
E-mail: [email protected]
Tel.: +49-89-21806188; Fax: +49-89-21806160
Key words DNA mismatch repair · MutL homologue (MLH) · T-DNA insertion · Microsatellite · Reverse genetics
Introduction
The mismatch repair (MMR) system plays an essential role in the maintenance of genetic information in living organisms (Modrich and Lahue 1996). Its well-characterized anti-mutator activity is mainly responsible for the correction of errors in DNA replication such as mismatched bases and small insertions/deletions; the latter are thought to originate from slippage of the DNA polymerase during replication of microsatellites (Streisinger et al. 1966). In the absence of a functional MMR system, elevated rates of mutation are seen, and this is particularly true in the case of simple repetitive sequences such as microsatellites. In humans, microsatellite instabilities are frequently documented in patients suffering from hereditary non-polyposis colon cancer, a disease usually associated with defects in MMR genes (reviewed in Kolodner 1995).
The best-characterized MMR system is the E. coli MutHLS system. As suggested by its name, the MutH, MutL and MutS genes constitute the key components of this system. Functional studies have shed light on the roles of these proteins (Modrich and Lahue 1996). MutS recognizes and binds to mismatched DNA, whereas MutH codes for an endonuclease which introduces a nick in the newly synthesized strand, thereby initiating the removal and replacement of the mismatched base. As for MutL, its role is less clear and it is thought to facilitate the recruitment of MutH to the MutS-mismatch complex.
In eukaryotes, a more complex situation is observed. In yeast, for example, six homologues of MutS (MSH1–6) and four homologues of MutL (PMS1, MLH1–3) have been identified (Kolodner 1996; Modrich and Lahue 1996). Of these, however, only three MutS homologues (MSH2, MSH3 and MSH6) and three MutL
homologues (PMS1, MLH1 and MLH3) are involved in DNA mismatch repair in the nucleus. Whereas in bacteria homodimers of MutS and MutL interact to perform mismatch repair, the functional eukaryotic MMR complex is composed of heterodimers containing two MutS (MSH2/MSH3 or MSH2/MSH6) and two MutL (MLH1/PMS1 or MLH1/MLH3) homologues (Kolodner 1996; Flores-Rozas and Kolodner 1998).
Despite its importance in the DNA mismatch repair process, very little is known of the function of the MutL protein and its eukaryotic homologues. It has been hypothesized that MutL acts as a molecular interface between MutS and MutH (Modrich 1991; Sancar and Hearst 1993). Three highly conserved motifs (KELVEN, DNGxG, GFRGEAL), all located in the N-terminal region of the protein, are thought to be essential for the MMR activity of the various MutL homologues (Bergerat et al. 1997; Pang et al. 1997). Recent work by Pang et al. (1997) supports this hypothesis by demonstrating that regions of the yeast MLH1p and PMS1p containing these motifs are essential for MMR activity. They also defined domains of yeast MLH1p and PMS1p which are required for dimerization. In MLH1p, residues 501–756 define a PMS1p-interacting domain, while in PMS1p, residues 692–904 define an MLH1p-interacting domain. Without these, MLH1p and PMS1p are unable to dimerize and this leads to a lack of MMR activity.
Unique among the various MutL and MutS homologues with known mismatch repair activity, MLH1 has also been shown to promote crossing over during meiosis (Hunter and Borts 1997). This difference in function between MLH1 and the other MutL and MutS homologues is further illustrated by the different meiotic phenotypes observed in various knock-out mice (Baker et al. 1995, 1996; De Wind et al. 1995; Reitmair et al. 1995; Edelmann et al. 1996).
It has been proposed that since MLH1 is required during crossing over to promote the creation of short heteroduplex tracts, it could help to prevent or correct the over-replication of recombination intermediates, thus allowing the formation of Holliday junctions (Hunter and Borts 1997). As MLH1 per se has neither DNA-binding capacity nor endonuclease activity, it may play a role in crossing over by mediating the interaction between a heteroduplex-binding protein, such as a meiosis-specific MutS homologue, and an endonuclease, such as a putative MutH homologue, thus acting analogously to the postulated "matchmaking" function of MutL in mismatch repair. The MutS homologues MSH4 and MSH5, although not involved in mismatch repair, are required for normal meiotic behavior, and their products have been found to act as a heterodimeric complex (Ross-Macdonald and Roeder 1994; Hollingsworth et al. 1995; Pochart et al. 1997). Since MLH1 has been shown to act in the same pathway as MSH4 (Hunter and Borts 1997), and thus presumably MSH5, one could speculate that this MSH4/5 heterodimer represents such a meiosis-specific heteroduplex-binding complex.
To gain insight into the process of genetic recombination in plants, both homologous and homeologous, we have initiated studies aimed at cloning and characterizing components of the Arabidopsis MMR system. In the work reported here, we describe the isolation and initial characterization of the Arabidopsis MLH1 gene, the first MutL homologue identified in plants.
Materials and methods
Plant materials and growth conditions
Seeds of the various ecotypes used in the present study were obtained from the Arabidopsis Biological Resource Center (ABRC) and from Dr. Farhah Assaad (University of Munich). They were sown and germinated using standard procedures.
DNA analyses
DNA was extracted from leaves or young seedlings by a modified CTAB method (Dubois et al. 1998) or with a rapid microextraction protocol (Edwards et al. 1991). Phage and plasmid DNAs were prepared using the Wizard Plus Miniprep kit (Promega).
RNA analyses
Whole plants without roots were frozen in liquid N2 and ground to a fine powder for subsequent mRNA extraction. Total Arabidopsis RNA was isolated on a small scale for RT-PCR analysis, or on a larger scale for subsequent purification of poly(A)+ RNA to be used in Northern experiments. For RNA minipreps, RNA was extracted from approximately 100 mg of tissue powder using the RNeasy Plant Mini Kit (Qiagen). For large-scale preparations, 1–2 g of tissue was collected, and mRNA was prepared as described in Pawlowski et al. (1994) with minor modifications. Northern analysis was done essentially as described by Kunze et al. (1987).
Cloning fragments of an Arabidopsis MutL homologue using degenerate primers
In the Belzile laboratory, PCR amplifications were performed with two degenerate primers (PMS1 and PMS2; Table 1) using an annealing temperature of 50 °C and genomic DNA from Arabidopsis Landsberg erecta (Ler) as template. The resulting products were separated on agarose gels, purified, digested with SacI and BglII and ligated into the SacI + BglII-digested pBS KS(+) vector (Stratagene). A predominant 330-bp product (PMS330) was sequenced and its predicted protein sequence showed a high degree of homology to various MutL homologues. In the Kunze laboratory, first-strand cDNA synthesis was done on 3 µg of mRNA from Arabidopsis Niederzenz (Nd-0) using pd(N)6 primers.
Individual PCR amplifications were performed on aliquots of the cDNA using all possible combinations of five degenerate forward primers (A1, A2, B1, B2, B3) and two degenerate reverse primers (C1, C2). The resulting products were separated on agarose gels. Bands of the expected sizes (A + C primers: ≈265 bp; B + C primers: ≈225 bp) were eluted, reamplified, digested with XbaI and PaeI and ligated into the XbaI + PaeI-digested pT7T3 vector (Amersham Pharmacia Biotech). Sequencing identified one clone (patMLH-5b) that carried an insertion with strong homology to MutL homologues.
Isolation of AtMLH1 cDNA clones
The inserts from patMLH-5b and PMS330 were used as probes to screen a newly prepared Arabidopsis Niederzenz (Nd-0) cDNA
Table 1 List of primers
Primer        Sequence (5′ → 3′)
A1            gctctagATHGCNGCNGGNGAA
A2            gctctagATHGCNGCNGGNGAG
B1            gctctagAARGARCTNGTNGANAA
B2            gctctagAARGARTTRGTNGANAA
B3            gctctagAARGARYTNATHGARAA
C1            caagcaTGCYTCNCCNCGRAANCC
C2            caagcaTGCYTCNCCYCTRAANCC
LB102A        GATGCAATCGATATCAGCCAATTTTAGAC
MLH1          CAGCCTCGACGCCGATTCAAGTTCC
MLH2          CATTGAACTCAGAGAGAACAAATCCTC
MLH4          AGTTCTTCTACAATCAGCGGTTTC
MLH5          GCAGCTGTCAACTCCAGCAATAAG
MLH8          CTCTCCATCTCTTGTTGGTGTTGTC
MLH9          CATCTCCGCCGTAAGAGACGAATC
MLH-F1577     AGCTTAGCGATCCAGCCCCTTTGTCAGA
MLH-F38       AGGAAGAATCTCCGGCGACGACGATT
MLH-F-48      CAGACCCAATTTGTATCTGGCGCGAGG
MLH-R2320     TGGTTTAACCGTAAACTGGTTCCACTT
MLH-R28       CCGTCGTCGGAGACTTGAATGAGT
MLH-R903      TTCAATGGCTCTTTTTAAGGCAGAGCATT
MLH-R974      TCCCGTGGCAAATTGATTGACATGTAGACA
PMS1          ACCGCTGTTAAGGAGCTCGTNGARAA
PMS2          AGAGATCTCAGAGCCTCICCICKRAANCC
RB16843       GCTCAGGATCCGATTGTCGTTTCCCGCCTT
library, cloned in λZAP Express (Stratagene), as well as the Arabidopsis Columbia (Col-0) cDNA library CD4-15 obtained from the ABRC. Upon screening approximately 10⁶ plaques we isolated one positive clone from each library. The inserts of these clones were excised in vivo, yielding the plasmids patMLH10a-3 and pMLH3-19, respectively, and transformed into E. coli XL1-Blue MRF′. The cDNA inserts were sequenced using insertion-specific and flanking primers.
RACE, primer extension and RT-PCR analyses
5′-RACE (Rapid Amplification of cDNA Ends) was performed on Nd-0 mRNA using primer MLH-R28 and the Marathon cDNA Amplification Kit (Clontech). In addition, primer extension analyses were performed on Ler total RNA with a First-Strand cDNA Synthesis kit (Amersham Pharmacia Biotech) using MLH9 primer end-labelled with [γ-32P]. First-strand cDNA for RT-PCR was synthesized from 5 µg of total RNA from Col-0 or CS-15463 (see below) with p(dT)20 or MLH-R974 as primer and Superscript II reverse transcriptase (Gibco), as recommended by the manufacturer. PCR was performed using 25% of the resulting single-stranded cDNA.
Isolation of a genomic clone and mapping of the AtMLH1 gene
To isolate a genomic clone, we screened an Arabidopsis Col-0 library consisting of sheared genomic DNA ligated into the EcoRI site of the λGEM11 vector (prepared by R. Mulligan and kindly provided by John McDowell), as well as the Arabidopsis Ler DNA library CD4-8 (obtained from the ABRC). Positive clones were isolated from both libraries. They were digested with either EcoRI or SacI, and the insertion fragments were subcloned into pBS KS(–) (Stratagene). During sequence analysis of these fragments, the sequence of a BAC clone that contains the complete Col-0 AtMLH1 gene was published in GenBank (BAC T8A17, GenBank Accession No. AF072897). The location of the AtMLH1 gene in the Arabidopsis genome was assessed with a CAPS marker developed as follows.
A 2.3-kb fragment from the 5′ end of the AtMLH1 gene was amplified with the AtMLH1-specific primers MLH1 and MLH5, using Ler and Col-0 DNAs as templates. Upon sequencing, a polymorphic SspI site was found which allowed us to discriminate between the Ler and Col alleles of AtMLH1. Eighty-eight lines of the Lister and Dean RI mapping population (CS-1899, ABRC) were scored for the CAPS marker and cosegregation analyses were performed with the MAPMAKER program (Lander et al. 1987) using the segregation data available for these lines from the AtDB Web site (http://genome-www.stanford.edu/Arabidopsis/).
Isolation of a line carrying an insertion near the AtMLH1 gene
Four Arabidopsis populations were screened for the presence of a line carrying an insertion in the AtMLH1 gene: the Feldmann collection of 6000 T-lines mutagenized with T-DNA (CD5-7, obtained from ABRC), the Thomas Jack collection of 4900 T-DNA lines (CD6-7, ABRC), the AMAZE collection of 4590 En-1 lines (Baumann et al. 1998; obtained from Dr. Ellen Wisman of the Max-Planck-Institut für Züchtungsforschung in Cologne) and the Dubois collection of 507 Ac lines (Dubois et al. 1998). These populations were available as DNA pools and corresponding subpools. A single insertion line was detected in the Feldmann collection as described below. In the first screening step, DNA from the largest pools available was tested to identify lines containing an insertion in or near the AtMLH1 gene. PCR amplifications were performed with a combination of two AtMLH1-specific primers (MLH2 and MLH4; Fig. 1), and two primers located at the extremities of the T-DNA [LB102A and RB16843, McKinney et al. (1995)] oriented so as to drive DNA synthesis outward from the insert. The PCR products were separated on 1.5% agarose gels and Southern analysis was performed using as probe the PCR product generated from the AtMLH1 cDNA clone with the primers MLH1 and MLH4 (see Fig. 1). When a positive pool was observed, these screening steps were repeated with the corresponding subpools to identify the positive one.
Seeds of the ten lines corresponding to the positive subpool CS15470 from the Feldmann collection were germinated and DNA was extracted from them. The single Arabidopsis Wassilewskija (Ws) line carrying a T-DNA insertion in the AtMLH1 gene was identified by PCR as described above, except that separate reactions were carried out with each possible primer pair to determine the combination that gave rise to the positive PCR signal. The products obtained with MLH2/LB102A and MLH8/LB102A primers were purified and sequenced. Seven seedlings from the positive line CS-15463 were transplanted into soil and grown for seed production. DNA was extracted from each plant and genotyped using the primers LB102A, MLH-R903, MLH2 and MLH8. One heterozygous individual was selected and sixteen progeny from this line were genotyped. From these, five lines homozygous for the insertion and five homozygous lines that lacked it (designated as first-generation homozygotes) were selected for further analysis. Four or five progeny from each of these ten lines were grown (for a total of 48 individuals, designated as second-generation homozygotes) and their DNA was extracted for microsatellite analysis (see below).
Microsatellite analysis of the T-DNA insertion line CS-15463
Six microsatellite loci, CA72, NGA172, ATHHANKA, ATTS0191, ATTS0262 and ATTS0392 (Bell and Ecker 1994; Loridon et al. 1998), were analyzed in 10 first-generation homozygous lines (five homozygous for the insertion and five that lacked it) derived from the T-DNA insertion line CS-15463, as well as in 48 of their progeny (second-generation homozygotes, see above). Microsatellites were amplified and labelled in two consecutive steps. The resulting PCR products were electrophoresed on 6% polyacrylamide/8 M urea gels. Gels were dried and exposed to X-ray films for 18–72 h without intensifying screens.
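The degenerate primers listed in Table 1 are written with IUPAC ambiguity codes, so each entry stands for a mixture of oligonucleotides rather than a single sequence. As a minimal sketch that is not part of the original methods, the following Python snippet counts how many distinct sequences a degenerate primer represents and, for small pools, lists them. The function names `degeneracy` and `expand` are hypothetical, and treating inosine (I, used in primer PMS2) as a single residue rather than as a four-way ambiguity is an assumption made for counting purposes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
# Assumption: inosine ("I") is counted as one residue, because it is
# synthesized as a single species rather than as a mixture of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every sequence in the pool (only sensible for small pools)."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    # Sequences copied from Table 1; lower-case tails carry the cloning sites.
    primers = {
        "A1": "gctctagATHGCNGCNGGNGAA",
        "C1": "caagcaTGCYTCNCCNCGRAANCC",
        "PMS1": "ACCGCTGTTAAGGAGCTCGTNGARAA",
    }
    for name, seq in primers.items():
        print(name, degeneracy(seq))
```

For primer A1, for instance, the ambiguous positions are one H (three bases) and three N (four bases each), giving 3 × 4 × 4 × 4 = 192 species. Highly degenerate pools of this kind are one reason a low annealing temperature such as the 50 °C used here may be chosen, although the source does not state the rationale.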
Fig. 1A, B Structure of the Arabidopsis AtMLH1 gene. A Top Arabidopsis AtMLH1 genomic DNA. Exons are represented by stippled boxes, introns by open boxes and untranslated sequences by lines. Restriction sites are designated as B (BamHI), H (HindIII), S (SacI) and X (XbaI). The polymorphic SspI site is indicated by an asterisk. Arrows denote the position and orientation of primers used in this study. I indicates the position of the degenerate forward primers, and II that of the degenerate reverse primers. The other denominations refer to the series of MLH primers (see Table 1 and Materials and methods). Primers and the 3′ extremity of the genomic region are not drawn to scale. Bottom AtMLH1 mRNA. The AtMLH1 fragments used as probes for Southern analysis are represented by the dark lines A (obtained by PCR with the degenerate primers) and B (the PCR product generated from the cDNA clone pMLH3-19 with primers MLH1 and MLH4). B Schematic representation of the distribution of the three Arabidopsis thaliana MutL homologous genes on chromosome 4
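The CAPS assay built on the polymorphic SspI site marked in Fig. 1A relies on one allele being cut by the enzyme and the other not. The idea can be illustrated in silico with a short Python sketch. SspI recognizes AATATT and cuts in the middle of the site to leave blunt ends; everything else here is assumed for illustration: the two toy allele sequences are invented and are not the AtMLH1 sequence, the function name `digest` is hypothetical, and modelling the polymorphism as an extra A in the Ler allele is only one possible direction for the 1-bp insertion/deletion.

```python
def digest(seq: str, site: str = "AATATT", cut_offset: int = 3) -> list[int]:
    """Return fragment lengths after cutting a linear DNA at every site.

    cut_offset is the number of bases of the recognition site that stay on
    the left-hand fragment (3 for SspI, which cuts AAT^ATT).
    """
    seq = seq.upper()
    cut_positions = []
    start = 0
    while True:
        hit = seq.find(site, start)
        if hit == -1:
            break
        cut_positions.append(hit + cut_offset)
        start = hit + 1
    bounds = [0] + cut_positions + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Invented toy amplicons (NOT real AtMLH1 sequence). The "Ler-like" allele
# carries one extra base that completes an SspI site; the "Col-like" allele
# lacks it and therefore stays uncut.
COL_LIKE = "GCGCGCGCGC" + "AATTT" + "CGCGCGCGCGCGCGC"
LER_LIKE = "GCGCGCGCGC" + "AATATT" + "CGCGCGCGCGCGCGC"

if __name__ == "__main__":
    print("Col-like fragments:", digest(COL_LIKE))
    print("Ler-like fragments:", digest(LER_LIKE))
```

On a gel, the uncut allele would run as a single band and the cut allele as two smaller bands, which is what allows each recombinant inbred line to be scored for the parental allele it carries.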
Results
Isolation and molecular characterization of a MutL homologue from Arabidopsis
To clone an Arabidopsis homologue of MutL, degenerate oligonucleotide primers were designed based on the N-terminal amino-acid motifs KELVEN and GFRGEAL, two of the most conserved regions of previously characterized MutL homologues. With these primers, a 331-bp PCR product was successfully amplified from Arabidopsis Ler genomic DNA, and a 233-bp product was obtained from Nd-0 mRNA. These fragments were cloned and sequence analysis confirmed that they originated from an Arabidopsis thaliana MutL homologue, which we named AtMLH1. The difference in length between these two products was accounted for by the presence of a 98-bp intron specific to Arabidopsis genomic DNA (Fig. 1A). The AtMLH1 fragments isolated by degenerate PCR amplification were used to screen cDNA libraries (Nd-0 and Col). From the Nd-0 library, a 2-kb cDNA clone was isolated (named patMLH10a-3) that lacks the 5′ end but includes a polyA tail. From the Col library, a 2.3-kb cDNA clone was isolated (pMLH3-19) that has a longer 5′ end, though not long enough to include the ATG start codon, and lacks a polyA tail. PCR was used to test for the presence of a full-length clone in five additional cDNA libraries (CD4-6, CD4-7, CD4-13, CD4-14 and CD4-16; from ABRC) but none were found. 5′-RACE and primer extension analyses were used to extend the sequences of patMLH10a-3 and pMLH3-19, respectively. At the 5′ end we obtained a product which extends 2 bp beyond an in-frame ATG codon and, at the 3′ end, a polyA tail was reached. The complete sequence of the Col allele of the AtMLH1 gene and the corresponding protein was deposited in GenBank (Accession No. AJ012747). The putative translation start site is preceded in the genomic sequence by stop codons in all three reading frames but no other ATG; it thus constitutes a prime candidate for the authentic AtMLH1 start codon. If initiation occurs at this in-frame ATG codon (see Fig. 4B), the AtMLH1 ORF is predicted to be 2211 bp long and encodes a 737-residue protein. Primer extension experiments indicate the presence of two major
transcription initiation sites at 19 bp and 15 bp, respectively, upstream of the putative translation start codon, with the latter accounting for the majority of the primer extension products (data not shown). Putative TATAA and CCAAT boxes are located 108 bp and 45 bp, respectively, upstream from the ATG codon. Sequence homology searches in GenBank identified a short Arabidopsis EST clone that may correspond to AtMLH1 (EST No. AA605335). In addition, a genomic sequence recently released by the Arabidopsis Genome Initiative (AGI) contained the AtMLH1 locus (BAC clone T8A17, GenBank Accession No. AF072897). Two other putative MutL homologues were found on searching the AGI data: AtPMS1 (A. Alou and F. Belzile, unpublished; BAC clone T14P8) and AtMLHx (BAC clone F8D20) (Fig. 1B). Both of these are more closely related to the PMS family of eukaryotic MutL homologues (see below). Southern analysis revealed that a probe specific for the highly conserved N-terminal domain of the AtMLH1 gene hybridizes to a single fragment in genomic DNA digested with each single and double enzyme combination tested (NcoI, BamHI, EcoRI, HindIII and SacI) (data not shown). This suggests that a single copy of the AtMLH1 gene is present in the Arabidopsis genome. A CAPS marker was developed based on a 1-bp insertion/deletion, located in the 7th intron, that creates an SspI restriction site in ecotype Landsberg erecta DNA (Fig. 1A). The segregation of this CAPS marker was studied in 88 individuals from the Lister and Dean RI mapping population. This allowed the AtMLH1 gene to be genetically mapped to the top arm of chromosome 4, between markers m506 and pCITf3, at a map position corresponding to the physical location of the BAC clone that contains the AtMLH1 gene.
Comparison of our cDNA sequence with this genomic sequence revealed that AtMLH1 is 3932 bp long and consists of 16 exons (size ranging from 70 to 270 bp) and 15 introns (ranging in size from 80 to 222 bp; total intron length is 1718 bp) (Fig. 1A).
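The sizes quoted for the AtMLH1 gene can be cross-checked against one another with simple arithmetic. The short Python sketch below does this using only figures given in the text. One point is an inference rather than something the source states: the 3932-bp gene length is assumed to run from the start codon through the stop codon, so that the exonic portion equals the ORF plus one 3-bp stop codon. The function name `check_gene_structure` is hypothetical.

```python
# Figures quoted in the text.
PROTEIN_RESIDUES = 737
ORF_BP = 2211
GENE_BP = 3932
TOTAL_INTRON_BP = 1718
N_EXONS, N_INTRONS = 16, 15
EXON_RANGE_BP = (70, 270)
INTRON_RANGE_BP = (80, 222)
GENOMIC_PCR_BP, CDNA_PCR_BP = 331, 233   # degenerate-primer products
REPORTED_INTRON_BP = 98

# Assumption (not stated in the source): the gene length includes the stop codon.
STOP_CODON_BP = 3


def check_gene_structure() -> tuple[int, dict[str, bool]]:
    """Return the exonic length and a dict of consistency checks."""
    exon_total = GENE_BP - TOTAL_INTRON_BP
    checks = {
        # 737 codons x 3 bp should equal the stated ORF length.
        "orf_matches_protein": ORF_BP == 3 * PROTEIN_RESIDUES,
        # Exonic sequence = ORF + stop codon, under the assumption above.
        "exons_equal_orf_plus_stop": exon_total == ORF_BP + STOP_CODON_BP,
        # Totals must be reachable with the stated per-exon and per-intron sizes.
        "exon_total_within_bounds":
            N_EXONS * EXON_RANGE_BP[0] <= exon_total <= N_EXONS * EXON_RANGE_BP[1],
        "intron_total_within_bounds":
            N_INTRONS * INTRON_RANGE_BP[0] <= TOTAL_INTRON_BP
            <= N_INTRONS * INTRON_RANGE_BP[1],
        # Genomic vs cDNA PCR product sizes should differ by the intron length.
        "pcr_difference_is_intron":
            GENOMIC_PCR_BP - CDNA_PCR_BP == REPORTED_INTRON_BP,
    }
    return exon_total, checks


if __name__ == "__main__":
    exon_total, checks = check_gene_structure()
    print("exonic bp:", exon_total)
    for name, ok in checks.items():
        print(name, ok)
```

Under the stated assumption the numbers are mutually consistent: 3932 − 1718 = 2214 bp of exonic sequence, which is the 2211-bp ORF plus a 3-bp stop codon.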
AtMLH1 is highly homologous to eukaryotic MutL homologues
Sequence alignment of the AtMLH1 protein with MutL-homologous proteins from a wide variety of organisms revealed a high degree of homology throughout the N-terminal half of the proteins, whereas the C-terminal portions are less well conserved, with the exception of the very last residues. This is illustrated by the alignment of AtMLH1 with the MLH1 proteins from yeast, rat, human and Drosophila melanogaster (Fig. 2). Although highly conserved motifs are found along the entire length of these peptide sequences, three hypervariable regions are notable (boxed in Fig. 2). Within these regions, major variations in length and sequence occur. For example, the second hypervariable region is more than three times longer in yeast, human and rat (97, 103 and 105 amino acids, respectively) than in Arabidopsis and Drosophila (30 and 36 residues, respectively). Similarly, the third hypervariable region is about twice as long in Arabidopsis (57 amino acids) as in yeast, rat and human (31, 29 and 29 residues, respectively) and almost four times as long as in Drosophila (15 amino acids). Sequence similarity is also minimal in these regions, as demonstrated by a comparison of the human and rat MLH1 proteins. Although these two proteins display a very high degree of overall identity (87% total; 93% if the three hypervariable regions are excluded), statistical analyses confirm that sequence divergence between human and rat is significantly higher in these hypervariable regions than in other parts of the protein (χ² = 8.54, 116.66 and 17.87 for the first, second and third hypervariable regions, respectively).
The most highly conserved stretch (approximately 160 N-terminal residues) of various MLH1 proteins was aligned and used to produce a hypothetical phylogenetic tree (Fig. 3). The MLH proteins can be roughly sorted into three distinct groups. The prokaryotic proteins from eubacteria cluster in group c (Fig. 3), with the proteins from the Gram-positive B. subtilis and the two thermophilic species (A. pyrophilus and T. maritima) being more similar to the eukaryotic MLH proteins. These latter fall into two groups, the PMS and MLH families. AtMLH1 clearly belongs to the MLH group of proteins and is most similar to a putative MLH protein from maize (M. Hilpert and R. Kunze, unpublished). The other two putative MLH proteins from Arabidopsis (AtPMS1 and AtMLHx) apparently belong to the PMS family. Interestingly, however, AtMLHx appears in a somewhat intermediate position between the PMS and MLH proteins. Overall, the position of these proteins in the cladogram may indicate that the plant MLH proteins diverged from an ancestral eukaryotic MLH protein before the yeast, insect and mammalian proteins separated into distinct branches.
Functional characterization of AtMLH1
To enable us to carry out a functional characterization of the AtMLH1 gene, we searched for mlh1 mutants generated by insertional mutagenesis (either by T-DNA or transposon insertion). In order to isolate lines carrying an insertion in the AtMLH1 gene, numerous collections of T-DNA or transposon lines were screened by PCR with a combination of AtMLH1-specific and T-DNA-specific or transposon-specific primers, using the strategy described in Materials and methods. One putative mlh1 insertion mutant line, CS-15463, was isolated from the Feldmann collection of 6000 T-DNA lines. PCR analysis of seven individuals from this line confirmed that it segregates for a T-DNA insertion in the 5′ region of the AtMLH1 gene (Fig. 4A). Sequence analysis revealed that this T-DNA is inserted 80 bp upstream of the putative transcription start site and that the integration process destroyed a 15-bp segment of plant DNA sequence that includes a putative TATA box
(Fig. 4B). The 5′-most T-DNA border (a in Fig. 4A) displays a precise junction with the plant DNA (data not shown), whereas at the 3′-most junction that faces the AtMLH1 5′-end (b in Fig. 4A), an imprecise insertion event occurred, as evidenced by the presence of 21 bp of filler DNA. The T-DNA insertion terminates with two left-border sequences, thus suggesting the presence of an end-to-end (RB-RB) tandem insertion of T-DNA. The origin of the sequenced PCR fragments from the AtMLH1 locus and the presence of the T-DNA in the promoter region were verified by Southern analysis (Fig. 4C). Genomic DNA from homozygous wild-type
Fig. 2 Amino-acid sequence alignment of the MLH1 proteins from various species. Amino acids are designated according to the standard one-letter code, with dots representing missing amino acids. Amino acids that are not conserved between the MLH1 proteins from Rattus norvegicus (RnMLH1) and Homo sapiens (hMLH1) are indicated by asterisks. The MLH1 homologues from the other species are designated as DmMLH1 (Drosophila melanogaster), AtMLH1 (Arabidopsis thaliana) and ScMLH1 (Saccharomyces cerevisiae). Amino acids that are identical in all species compared are highlighted by black boxes and those that are conserved in at least three species are stippled. The three hypervariable regions (see Results) are boxed
Fig. 3 Phylogenetic tree based on the N-terminal regions of the MLH1 protein from various species. The extents of aligned regions are indicated in parentheses, except for ZmMLH1 whose N-terminus is not known. The MLH1-homologous proteins from the different species are designated as follows: Mm, Mus musculus; Rn, Rattus norvegicus; h, Homo sapiens; Dm, Drosophila melanogaster; At, Arabidopsis thaliana; Zm, Zea mays; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; Ec, Escherichia coli; St, Salmonella typhimurium; Hi, Haemophilus influenzae; Bs, Bacillus subtilis; Ap, Aquifex pyrophilus; Tm, Thermotoga maritima. The alignment and dendrogram were generated by ClustalW. A value of 0.1 corresponds to a difference of approximately 10% between two sequences
Fig. 4A–C Characterization of the T-DNA insertion in line CS-15463. A Hypothetical structure of the T-DNA insertion in the AtMLH1 promoter in line CS-15463 (not drawn to scale). LB, T-DNA left border; RB, T-DNA right border; a and b indicate segments amplified by PCR and sequenced; m denotes the DNA fragment from the AtMLH1 5′-end used as probe for Southern analysis in C. Restriction sites: H, HindIII; X, XbaI. B Genomic sequences found in A. thaliana ecotype Columbia (Col-0) and line CS-15463 and deduced amino-acid sequence of the AtMLH1 protein. The transcription start sites mapped by primer extension and 5′-RACE experiments are marked by arrows. The stop codons in all three reading frames closest to the putative AtMLH1 translation initiation codon are boxed. Putative TATA and CAAT boxes are stippled. The T-DNA left border sequence is given in italics. The 15 nucleotides deleted from the plant DNA in the T-DNA insertion line CS-15463 are underlined; they are replaced by a 21-bp sequence of unknown origin (lower case letters). C Southern analysis of HindIII (left) and XbaI (right) digests of wild type (WT) and CS-15463 (15463) genomic DNA. The blots were hybridized with probe m from the 5′ end of AtMLH1 (A)
and the CS-15463 insertion line was digested with HindIII or XbaI, blotted and hybridized with a probe consisting of the 537-bp stretch flanking the 3′-most T-DNA left border (region b in Fig. 4A). As predicted from the genomic sequence (Fig. 4A), a single 3-kb band hybridized in the HindIII digest of wild-type DNA, and a single 3.6-kb band in the XbaI digest. In line CS-15463, a 5.0-kb (HindIII) or 5.3-kb (XbaI) band hybridizes with the MLH1 probe. Homozygous CS-15463 individuals appear developmentally and morphologically normal and are fertile. To determine whether the T-DNA insertion impairs the expression of the AtMLH1 gene, RT-PCR and Northern analyses were performed. For PCR amplification of cDNA from mutant and wild-type plants, primers were chosen that allow an unambiguous distinction between genomic DNA and cDNA. Primers MLH-F1577 and MLH-R2320 (Fig. 1) amplify a 744-bp product from cDNA and, due to the presence of four introns, a 1211-bp product from genomic DNA. Using this approach, similar amounts of the 744-bp product were amplified from wild-type and mutant cDNA, suggesting that the T-DNA insertion does not significantly affect transcription and splicing near the 3′ end of the gene (data not shown). This conclusion was corroborated by Northern analysis. In poly(A)+ RNA from wild-type and mutant plants alike, a single, extremely rare 2.5-kb transcript hybridizes with the AtMLH1 probe. No difference in size or abundance of AtMLH1 mRNA was observed between these plants (Fig. 5A). Although these experiments reveal that the T-DNA insertion line is not a knock-out mutant, the resolution is too low to exclude the possibility that the T-DNA causes subtle changes close to the transcription initiation site, such as incorrect splicing. We therefore performed cDNA amplifications at the very 5′ end of the gene with a sense primer located between the mapped start sites and the T-DNA insert, and an antisense primer in exon 6 (primers MLH-F-48 and MLH-R903, respectively, Fig. 1).
Whereas almost no product could be amplified from wild-type cDNA, a strong band appeared in the mutant (Fig. 5B). Because the contaminating genomic DNA amplified equally well in both cDNA preparations, the PCR itself was evidently successful in both cases. If, however, a sense primer located 38 bp downstream of the ATG start codon was used in the reaction (MLH-F38), similar amounts of PCR product were obtained with wild-type and mutant cDNAs. We thus conclude that the primer extension analyses on wild-type RNA indeed mapped the predominant 5′ transcription start sites. In the CS-15463 mutant the preferred start sites are apparently located further upstream, suggesting that the left border of the Feldmann T-DNA could have a weak promoter activity. With respect to translation initiation, this is without consequence, however, as there are no additional ATGs located between the T-DNA and the AtMLH1 coding region (Fig. 4B). To investigate mismatch repair activity directly in this line, we examined six microsatellite loci. Microsatellite instability is often considered to be diagnostic for
Fig. 5A, B Analysis of AtMLH1 transcripts in Col wild type and the CS-15463 mutant. A Northern analysis with 2 lg of poly(A)+ RNA from Col-0 (+) and the T-DNA insertion line CS-15463 (M). As hybridization controls aliquots (1 pg, 5 pg, and 10 pg) of the unlabeled, denatured 4.1-kb AtMLH1 DNA fragment, ampli®ed from genomic DNA using primers MLH-F38 and MLH-R2320, were used. B RT-PCR ampli®cation of the 5¢ end of the AtMHL1 transcript. Aliquots (5 lg) of total RNA from Col-0 (+) and CS15463 (M) were reverse transcribed and PCR reactions were performed on 20% of the resulting cDNA with the primers indicated above the lanes. DNA, genomic Col-0 DNA used as control for the PCR reaction. The lanes marked kb contained a size marker (kb ladder)
impaired mismatch repair activity. The stability of six microsatellites (CA72, NGA172, ATHHANKA, ATTS0191, ATTS0262 and ATTS0392) was thus studied in homozygous Ws/Ws and CS-15463/CS-15463 individuals derived from a single heterozygous CS-15463 plant. Instability was observed for two (NGA172 and ATHHANKA) of the six microsatellites. A novel allele was observed in two dierent CS-15463/CS-15463 homozygotes, while no new alleles were detected in any of the ®ve Ws/Ws homozygotes (data not shown). The stability of these microsatellites was further investigated in a second set of 48 second-generation homozygotes (24 Ws/ Ws and 24 CS-15463/CS-15463; see Materials and methods). In this case, four to ®ve self progeny from each of the ten ®rst-generation homozygotes (®ve Ws/Ws and ®ve CS-15463/CS-15463) were examined. Microsatellite instability was observed both in plants carrying the TDNA insertion and in some that did not (data not shown). This suggests that the observed instability does
641
not correlate with the presence of the T-DNA insert. It was also noted that the particular instability detected in the ®rst-generation plants was usually not present in their progeny, thus suggesting that the observed microsatellite polymorphisms originated from somatic mutations which were not transmitted to ospring.
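As a rough check on the first-generation counts given above (a novel allele in two of the five CS-15463/CS-15463 homozygotes versus none of the five Ws/Ws homozygotes), a Fisher exact test can be sketched as follows. No statistical test is reported in this study; the choice of test, the reading of the counts as a 2 × 2 table and the function name are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: CS-15463/CS-15463, Ws/Ws; columns: novel allele seen, not seen.
# Counts are an assumed reading of the first-generation results in the text.
p_value = fisher_exact_two_sided(2, 3, 0, 5)
print(f"two-sided p = {p_value:.3f}")
```

With groups of this size the two-sided p-value is about 0.44, so these counts alone would not indicate an association between the T-DNA insertion and microsatellite instability.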
Discussion

During the past several years, the components of the DNA mismatch repair apparatus of prokaryotes, yeast and mammals have been isolated and characterized in great detail. Interest in the mammalian MMR system was promoted by the discovery that MMR defects are responsible for certain kinds of familial cancer, for example hereditary nonpolyposis colorectal cancer (HNPCC) (Fishel et al. 1993; Leach et al. 1993). It turned out that the eukaryotic MMR system is far more complex than that of bacteria, and that, in addition to post-replicative mismatch repair, it is involved in the stabilization of simple-sequence repeats (microsatellites) (Strand et al. 1993), alkylation tolerance (Jones and Wagner 1981; Kat et al. 1993; Koi et al. 1994), homology-dependent regulation of mitotic recombination (Datta et al. 1997), chromosome synapsis and segregation in meiosis (Baker et al. 1995; Edelmann et al. 1996), and prevention of homologous recombination between diverged sequences (De Wind et al. 1995).

Very little is known about the MMR system in plants. To date, four MutS homologues (AtMSH2, AtMSH3, AtMSH6-1 and AtMSH6-2) have been reported (Culligan and Hays 1997; Ade et al. 1999) and, as has generally been the case in other eukaryotes, this suggests that plants possess MSH gene families whereas prokaryotes rely on a single gene. In the case of MutL, a similar picture is emerging. The yeast S. cerevisiae, whose genome has been completely sequenced, has a total of four MutL homologues (Flores-Rozas and Kolodner 1998). In humans, three homologues (MLH1, PMS1 and PMS2) have so far been cloned and characterized (Bronner et al. 1994; Nicolaides et al. 1994; Papadopoulos et al. 1994). Here, we report the first plant homologue of MutL. Sequence comparisons indicate that this gene belongs to the MLH class of MutL homologues and has accordingly been named AtMLH1.
Clear evidence for this classification is provided by the fact that Arabidopsis MLH1 shows more homology to its human counterpart (56% similarity) than to the putative Arabidopsis PMS1 (38% similarity). Preliminary sequence characterization of two other putative MutL homologues from Arabidopsis (AtPMS1 and AtMLHx) suggests that, as in humans, these belong to the PMS class of MutL homologues.

An interesting feature which emerged from the sequence comparisons is the presence of hypervariable regions in the C-terminal half of eukaryotic MLH1 proteins. These regions of very low sequence homology differ greatly in length and separate well-conserved domains. From a functional point of view, it is worth
noting that the PMS1-interacting domain of the yeast MLH1, as defined by Pang et al. (1997), begins at the very end of the second hypervariable region and extends almost to the end of the protein. Hence, the bulk of this functionally defined domain consists mostly of conserved residues. It is tempting to speculate that the conserved region bordered by hypervariable regions 2 and 3 might be sufficient for MLH1 to interact with PMS1. It seems plausible that these hypervariable regions simply act as spacers between otherwise conserved and functionally important domains whose exact roles remain to be defined more precisely.

Despite the obvious similarities between AtMLH1 and its counterparts in other eukaryotes, definite proof that AtMLH1 plays a role in MMR in Arabidopsis can only be obtained through mutant analysis. In mice, a mutation in the Mlh1 gene not only leads to an increase in mutation rate but also disrupts the meiotic process, resulting in both male and female sterility (Baker et al. 1996). With the aim of studying the function(s) of mismatch repair proteins in plants, we set out to isolate an insertion mutant for the AtMLH1 gene. A number of collections of insertion lines were screened, and one line was found in which a T-DNA is located 80 bp upstream of the AtMLH1 transcription start site, right within a sequence motif that resembles a canonical TATA box. As there have been many reports of insertions in 5′ and 3′ non-coding regions that produce mutant phenotypes (reviewed in Azpiroz-Leehan and Feldmann 1997), we expected AtMLH1 to be disrupted in this insertion line. Northern analysis, however, revealed transcripts of the same size and abundance in both insertion and wild-type lines, suggesting that the T-DNA insertion in the promoter did not significantly affect or alter AtMLH1 expression. RT-PCR analysis of the 5′ terminus of these transcripts indicated that transcription in the insertion line is initiated further upstream than in wild type.
This suggests that the T-DNA insertion disrupts the normal transcription initiation signals but that the T-DNA itself or the rearranged sequences at the junction between the T-DNA and the AtMLH1 promoter provide alternative initiation signals. Since the AtMLH1 promoter has a very weak activity (evidenced by the very low abundance of AtMLH1 mRNA), as is often found for housekeeping genes, a weak alternative promoter seems to provide normal and sufficient levels of transcript. Furthermore, the transcript in the mutant appears to be normally processed at both extremities. Accordingly, we would expect that functional protein is produced in adequate amounts, and this is consistent with the finding that simple-sequence repeats are not destabilized and no obvious phenotypic feature (including fertility) distinguishes the insertion line from wild type. Therefore, additional screening efforts or alternative strategies, such as the use of dominant negative alleles (Pang et al. 1997), will be needed to analyze the function of this gene.

Acknowledgements This work was supported in part by grant No. Ku715/7-1 from the Deutsche Forschungsgemeinschaft. This work was also supported by a research grant from the Natural Sciences and Engineering Research Council of Canada to F. Belzile.
References

Ade J, Belzile F, Philippe H, Doutriaux MP (1999) Four mismatch repair paralogues coexist in Arabidopsis thaliana: AtMSH2, AtMSH3, AtMSH6-1 and AtMSH6-2. Mol Gen Genet 262:239–249
Azpiroz-Leehan R, Feldmann KA (1997) T-DNA insertion mutagenesis in Arabidopsis: going back and forth. Trends Genet 13:152–156
Baker SM, Bronner CE, Zhang L, Plug AW, Robatzek M, Warren G, Elliott EA, Yu JA, Ashley T, Arnheim N, Flavell RA, Liskay RM (1995) Male mice defective in the DNA mismatch repair gene PMS2 exhibit abnormal chromosome synapsis in meiosis. Cell 82:309–319
Baker SM, Plug AW, Prolla TA, Bronner CE, Harris AC, Yao X, Christie DM, Monell C, Arnheim N, Bradley A, Ashley T, Liskay RM (1996) Involvement of mouse Mlh1 in DNA mismatch repair and meiotic crossing over. Nature Genet 13:336–342
Baumann E, Lewald J, Saedler H, Schulz B, Wisman E (1998) Successful PCR-based reverse genetic screen using an En-1-mutagenised Arabidopsis thaliana population generated via single-seed descent. Theor Appl Genet 97:729–734
Bell CJ, Ecker JR (1994) Assignment of 30 microsatellite loci to the linkage map of Arabidopsis. Genomics 19:137–144
Bergerat A, de Massy B, Gadelle D, Varoutas PC, Nicolas A, Forterre P (1997) An atypical topoisomerase II from Archaea with implications for meiotic recombination. Nature 386:414–417
Bronner CE, Baker SM, Morrison PT, Warren G, Smith LG, Lescoe MK, Kane M, Earabino C, Lipford J, Lindblom A, Tannergard P, Bollag RJ, Godwin AR, Ward DC, Nordenskjöld M, Fishel R, Kolodner R, Liskay RM (1994) Mutation in the DNA mismatch repair gene homologue hMLH1 is associated with hereditary non-polyposis colon cancer. Nature 368:258–261
Culligan KM, Hays JB (1997) DNA mismatch repair in plants: an Arabidopsis thaliana gene that predicts a protein belonging to the MSH2 subfamily of eukaryotic MutS homologs. Plant Physiol 115:833–839
Datta A, Hendrix M, Lipsitch M, Jinks-Robertson S (1997) Dual roles for DNA sequence identity and the mismatch repair system in the regulation of mitotic crossing-over in yeast. Proc Natl Acad Sci USA 94:9757–9762
De Wind N, Dekker M, Berns A, Radman M, te Riele H (1995) Inactivation of the mouse Msh2 gene results in mismatch repair deficiency, methylation tolerance, hyperrecombination, and predisposition to cancer. Cell 82:321–330
Dubois P, Cutler S, Belzile F (1998) Regional insertional mutagenesis on chromosome III of Arabidopsis thaliana using the maize Ac element. Plant J 13:141–151
Edelmann W, Cohen PE, Kane M, Lau K, Morrow B, Bennett S, Umar A, Kunkel T, Cattoretti G, Chaganti R, Pollard JW, Kolodner RD, Kucherlapati R (1996) Meiotic pachytene arrest in MLH1-deficient mice. Cell 85:1125–1134
Edwards K, Johnstone C, Thompson C (1991) A simple and rapid method for the preparation of plant genomic DNA for PCR analysis. Nucl Acids Res 19:1349
Fishel R, Leskoe MK, Rao MRS, Copeland NG, Jenkins NA, Garber J, Kane M, Kolodner R (1993) The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell 75:1027–1038
Flores-Rozas H, Kolodner RD (1998) The Saccharomyces cerevisiae MLH3 gene functions in MSH3-dependent suppression of frameshift mutations. Proc Natl Acad Sci USA 95:12404–12409
Hollingsworth NM, Ponte L, Hasley C (1995) MSH5, a novel MutS homolog, facilitates meiotic reciprocal recombination between homologs in Saccharomyces cerevisiae but not mismatch repair. Genes Dev 9:1728–1739
Hunter N, Borts R (1997) Mlh1 is unique among mismatch repair proteins in its ability to promote crossing-over during meiosis. Genes Dev 11:1573–1582
Jones M, Wagner R (1981) N-Methyl-N′-nitro-N-nitrosoguanidine sensitivity of E. coli mutants deficient in DNA methylation and mismatch repair. Mol Gen Genet 184:562–563
Kat A, Thilly WG, Fang W-H, Longley MJ, Li G-M, Modrich P (1993) An alkylation-tolerant, mutator human cell line is deficient in strand-specific mismatch repair. Proc Natl Acad Sci USA 90:6424–6428
Koi M, Umar A, Chauhan DP, Cherian SP, Carethers JM, Kunkel TA, Boland CR (1994) Human chromosome 3 corrects mismatch repair deficiency and microsatellite instability and reduces N-methyl-N′-nitro-N-nitrosoguanidine tolerance in colon tumor cells with homozygous hMLH1 mutation (published erratum appears in Cancer Res 55:201). Cancer Res 54:4308–4312
Kolodner RD (1995) Mismatch repair: mechanisms and relation to cancer susceptibility. Trends Biochem Sci 20:397–401
Kolodner R (1996) Biochemistry and genetics of eukaryotic mismatch repair. Genes Dev 10:1433–1442
Kunze R, Stochaj U, Laufs J, Starlinger P (1987) Transcription of transposable element Activator (Ac) of Zea mays L. EMBO J 6:1555–1563
Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SL, Newburg L (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1:174–181
Leach FS, et al (1993) Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer. Cell 75:1215–1225
Loridon K, Cournoyer B, Goubely C, Depeiges A, Picard G (1998) Length polymorphism and allele structure of trinucleotide microsatellites in natural accessions of Arabidopsis thaliana. Theor Appl Genet 97:591–604
McKinney EC, Ali N, Traut A, Feldmann KA, Belostotsky DA, McDowell JM, Meagher RB (1995) Sequence-based identification of T-DNA insertion mutations in Arabidopsis: actin mutants act2-1 and act4-1. Plant J 8:613–622
Modrich P (1991) Mechanisms and biological effects of mismatch repair. Annu Rev Genet 25:229–253
Modrich P, Lahue R (1996) Mismatch repair in replication fidelity, genetic recombination, and cancer biology. Annu Rev Biochem 65:101–133
Nicolaides NC, et al (1994) Mutations in two PMS homologues in hereditary nonpolyposis colon cancer. Nature 371:75–80
Pang QS, Prolla TA, Liskay RM (1997) Functional domains of the Saccharomyces cerevisiae Mlh1p and Pms1p DNA mismatch repair proteins and their relevance to human hereditary nonpolyposis colorectal cancer-associated mutations. Mol Cell Biol 17:4465–4473
Papadopoulos N, et al (1994) Mutation of a mutL homolog in hereditary colon cancer. Science 263:1625–1629
Pawlowski K, Kunze R, de Vries S, Bisseling T (1994) Isolation of total, poly(A) and polysomal RNA from plant tissues. In: Gelvin SB, Schilperoort RA (eds) Plant molecular biology manual. Kluwer Academic Publishers, Dordrecht, pp D5/1–13
Pochart P, Woltering D, Hollingsworth NM (1997) Conserved properties between functionally distinct MutS homologs in yeast. J Biol Chem 272:30345–30349
Reitmair AH, et al (1995) Msh2 deficient mice are viable and susceptible to lymphoid tumors. Nature Genet 11:64–70
Ross-Macdonald P, Roeder GS (1994) Mutation of meiosis-specific MutS homolog decreases crossing over but not mismatch correction. Cell 79:1069–1080
Sancar A, Hearst JE (1993) Molecular matchmakers. Science 259:1415–1420
Strand M, Prolla TA, Liskay RM, Petes TD (1993) Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature 365:274–276
Streisinger G, Okada Y, Emrich J, Newton J, Tsugita A, Terzah E, Inoye M (1966) Frameshift mutations and the genetic code. Cold Spring Harbor Symp Quant Biol 31:77–84