MOLECULAR AND CELLULAR BIOLOGY, Aug. 1985, p. 2123-2130
0270-7306/85/082123-08$02.00/0
Copyright © 1985, American Society for Microbiology
Vol. 5, No. 8
Sequence Organization and Developmental Expression of an Interspersed, Repetitive Element and Associated Single-Copy DNA Sequences in Dictyostelium discoideum
ALAN R. KIMMEL¹* AND RICHARD A. FIRTEL²
Laboratory of Cellular and Developmental Biology, National Institute of Arthritis, Diabetes, and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20205,¹ and Department of Biology, University of California, San Diego, La Jolla, California 92093²
Received 19 February 1985/Accepted 21 May 1985
We have examined the genomic organization and developmental expression pattern of a short, transcribed, interspersed repeat element and its associated single-copy sequences. We have previously shown that 1% of the polyadenylated [poly(A)+] RNA from vegetative cells contains sequences that hybridize to this repeat. The complementary RNA is heterogeneous in size, and 90% of its mass hybridizes to single-copy DNA. In this study, we examined a series of genomic DNAs and cDNAs derived from poly(A)+ RNAs which are complementary to the repeat. Comparisons of sequence data from various genomic and cDNA clones indicated that (AAC)n · (GTT)n is the common sequence element. The tandem repeat occurred in ~100 short segments (~35 to 150 base pairs) per haploid genome interspersed with single-copy DNA. Probes from regions adjacent to this element hybridized to unique restriction fragments on DNA blots and unique poly(A)+ RNA species on RNA blots. The (AAC)n · (GTT)n sequence was asymmetrically transcribed with only (AAC)n sequences represented in RNA. The repeat was localized within the transcribed regions of several genes and 70 base pairs 5' to the transcription initiation site of another gene. Individual (AAC)n-containing RNAs exhibited a developmental pattern of expression suggestive of the coordinate expression of many AAC gene family members. The possible function of the M4 AAC repeat in relation to its developmental pattern of expression is discussed. We suggest that the M4 AAC repeat may represent a recognition sequence involved in the regulation of coordinate developmental expression of a set of single-copy genes.
Large fractions of the genomes of many eucaryotes are composed of short, repeated segments of DNA which are interspersed with single-copy sequences (see reference 3). Many of these repeats are associated with single-copy-derived sequences in heterogeneous nuclear RNA (hnRNA) (5). Recently, several very short repeat sequences have been implicated in the coordinate expression of adjacent genes (4). In addition, sequences which lie 5' to or within transcribed regions may enhance gene expression, often in a manner specific to the species, the cell, or the developmental stage of the cell (7, 12).
Dictyostelium discoideum is an excellent system for analyzing the relationship of repetitive elements to developmental gene expression. D. discoideum grows vegetatively as single cells but can be synchronously induced to enter a defined developmental cycle resulting in the formation of a multicellular organism containing specific cell types. Terminal differentiation occurs within ~24 h.
We have previously described the organization and expression of the M4 repeat sequence in D. discoideum (8, 11). It is found in ~100 copies in the D. discoideum genome and is present on ~1% of polyadenylated [poly(A)+] RNA from vegetative cells; this RNA population is heterogeneous in size, and ~90% of its mass is complementary to single-copy sequences. We have now examined several genomic and cDNA clones which contain the M4 repeat. From comparative DNA sequence analyses of these clones, we now show that the common repetitive element is (AAC)n · (GTT)n. This sequence is asymmetrically represented in RNA. Hybridization of the repeat and adjacent single-copy sequences to RNA isolated from various developmental stages suggests that there is a coordinate increase in expression of many members of this gene family during early development.
MATERIALS AND METHODS
Growth and development of cells. D. discoideum axenic strain Ax-3 and wild-type strains NC-4 and MP-G were grown on bacteria and developed on filters (8).
Isolation of RNA and DNA. Total, cytoplasmic, and nuclear poly(A)+ RNA and DNA were prepared, and RNA was isolated from developing cells at 0, 5, 10, and 15 h into the developmental cycle as described previously (8). RNA was separated by electrophoresis under denaturing conditions with formaldehyde.
Generation of deletions and recombinant DNAs. M4 plasmids and subclones have been described previously (8). HpaI digestion of M4-4, followed by BAL 31 digestion, yielded a set of deletions near (AAC)n · (GTT)n. These deletions were recovered by blunt-end recircularization and transformation. Homopolymeric (dAAC)n · (dGTT)n was very briefly digested with BAL 31 and blunt-end ligated into the HincII site of M13-derivative mp9. The polarity of the (AAC)n · (GTT)n sequence in mp9 was identified by cross-hybridization and sequencing techniques so that AAC- or GTT-specific probes could be generated.
RNA and DNA labeling. RNA was labeled in vivo with Na₃³²PO₄ (8). DNA restriction fragments were labeled in vitro by nick translation. M13 single-stranded DNA was labeled in vitro with a partially double-stranded probe made by using a hybridization probe primer. 5' and 3' labeling of restriction sites has been detailed previously (9).
* Corresponding author.
KIMMEL AND FIRTEL
MOL. CELL. BIOL.
FIG. 1. Map of restriction enzyme sites, transcription units, and deletions in the M4 plasmid. The four major restriction fragments are indicated. The two transcription units arranged in opposite polarities are defined by the thickest lines. The two open boxes within the transcription unit from band 4 to band 3 are intervening sequences. The locations of the TATA box (A), the oligo(dT) stretch (B), and the (AAC)n for this gene are indicated. Five deletions within band 4 are shown. The M4-4 parental insert is 451 bp in length; nucleotide 1 begins at the HindIII site (see Fig. 3). Δ1 extends from nucleotide 143 to nucleotide 183, Δ2 extends from 127 to 183, Δ3 extends from 112 to 326, Δ4 extends from 92 to 322, and Δ5 extends from 56 to 271.
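The deletion coordinates in this legend allow a quick consistency check. The following Python sketch is illustrative only: it assumes the listed endpoints are inclusive, 1-based nucleotide positions (the legend does not say), and the names `DELETIONS`, `deletion_length`, and `common_deleted_region` are hypothetical. It computes the size of each deletion and the interval removed by all five.

```python
# Deletion endpoints as listed in the Fig. 1 legend. Treating them as
# inclusive, 1-based positions in the 451-bp M4-4 insert is an assumption.
INSERT_LENGTH = 451
DELETIONS = {
    "delta1": (143, 183),
    "delta2": (127, 183),
    "delta3": (112, 326),
    "delta4": (92, 322),
    "delta5": (56, 271),
}


def deletion_length(start, end):
    """Number of nucleotides removed, counting both endpoints."""
    return end - start + 1


def common_deleted_region(deletions):
    """Interval removed by every deletion, or None if there is none."""
    start = max(s for s, _ in deletions.values())
    end = min(e for _, e in deletions.values())
    return (start, end) if start <= end else None


if __name__ == "__main__":
    for name, (start, end) in DELETIONS.items():
        removed = deletion_length(start, end)
        print(name, removed, "nt removed;", INSERT_LENGTH - removed, "nt retained")
    print("removed by all five:", common_deleted_region(DELETIONS))
```

Under these assumptions, the only interval missing from all five deletions is nucleotides 143 to 183, which is the smallest deletion itself.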
Hybridization. DNA blots were hybridized in 0.24 M phosphate buffer at 58°C and washed at the same temperature in 0.06 M phosphate buffer. RNA hybridizations were performed in 50% formamide-0.12 M phosphate buffer-3× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate) at 37°C. Filters were washed at 37°C in 50% formamide-0.12 M phosphate buffer-2× SSC.
RESULTS
Genomic organization of M4 DNA. We have previously isolated a D. discoideum genomic recombinant plasmid denoted M4 (Fig. 1) (8, 11). M4 is predominantly single copy; however, a small fraction of the insert hybridizes to D. discoideum DNA (in excess), with kinetics indicating that it is repeated 50 to 100 times in the genome. Moreover, M4 hybridizes to >50 different restriction fragments on genomic DNA blots. We showed that the repeat sequence is located within band 4, a 451-base-pair (bp) HindIII-EcoRI restriction fragment (Fig. 1), whereas the remaining ~4 kilobases (kb) of M4 DNA is single copy as assayed by DNA excess and Southern DNA blot hybridization.
M4 DNA contains two divergent transcription units (8, 9; Kimmel and Carlisle, submitted for publication). One is located on two entirely single-copy restriction fragments, bands 1 and 2 (Fig. 1), and encodes a moderately abundant (0.1% of the total vegetative mRNA) 0.9-kb mRNA. The second transcription unit codes for a 1.0-kb mRNA found at a low abundance level in vegetative cells (~0.01% of the total vegetative mRNA). The start of the protein-coding region lies within band 4, which contains the M4 repeat sequence. We have identified two small, very A+T-rich intervening sequences within this gene (9). These sequences are transcribed as part of a 1.2-kb nuclear RNA precursor and are removed during maturation to yield the 1.0-kb polysomal mRNA.
We wished to locate the transcription initiation site of the M4 band 4 mRNA and to identify and map the M4 repeat sequence. Band 4 was subcloned into pBR322 to produce
plasmid M4-4. Using the Berk-Sharp S1 nuclease technique, we mapped the 5' ends of the mature mRNA and the mRNA precursor (Fig. 2). The M4-4 plasmid was linearized with EcoRI, 5' end labeled, and then digested with HindIII. The small, labeled band 4 insert was strand separated and purified. The labeled fragment was hybridized to poly(A)+ cRNA and hnRNA, and the products were digested with S1 nuclease. The digested hybrids were separated electrophoretically adjacent to a DNA sequencing ladder of the same fragment (Fig. 2). As shown, the mRNA and the hnRNA yielded identical products; the transcription initiation site was located within the sequence of M4-4 (Fig. 3).
Approximately 30 bp 5' to the transcription start site is a sequence (labeled A in Fig. 3) homologous to the TATA box. Located between the TATA box and the transcription initiation site is an oligodeoxythymidylate [oligo(dT)] stretch (labeled B in Fig. 3). Previously, we have shown that both the TATA box and oligo(dT) are found at the 5' ends of all protein-coding genes thus far examined in D. discoideum, and we have suggested that this structure forms part of the D. discoideum promoter (10). Immediately 5' to this region is the sequence AACAACAACACAACAACAACAACAACAACAAC [labeled (AAC)n in Fig. 3]. As we will discuss, approximately 100 (AAC)n blocks are present in the D. discoideum genome, and the repeat is present on a large number of poly(A)+ cRNAs. The sequence is asymmetrically expressed, and in most genes associated with the (AAC)n repeat, the (AAC)n sequences are on the mRNA (see below). The M4-4 mRNA does not contain the repeat, although the (AAC)n sequences lie upstream from the 5' terminus in the same polarity as transcription.
(AAC)n sequences are repeated in the D. discoideum genome. To define the limits of the M4 repeat, we constructed and analyzed a series of deletions of the M4 band 4 insert. The M4-4 plasmid was linearized with HpaI (Fig. 1) and digested with BAL 31 nuclease for various times, and deletions were rescued by circularization and transformation into Escherichia coli. The resulting plasmids were screened
VOL. 5, 1985
DEVELOPMENTALLY REGULATED REPEATS
by hybridization to D. discoideum genomic DNA blots to determine whether they retained a D. discoideum repetitive component (see below). Five deletions are described here by sequence analyses and DNA and RNA hybridization studies. The regions deleted are indicated as part of the M4 maps in Fig. 1, and the exact nucleotide positions deleted are listed in the legend to Fig. 1. In Δ1 and Δ2, essentially only the (AAC)n was deleted; in Δ3, Δ4, and Δ5, the (AAC)n, the TATA box, the oligo(dT) stretch, and the 5' transcription initiation site were deleted (Fig. 1 and 3). Plasmid M4-4 and
the corresponding deletions Δ1 through Δ5 were radiolabeled and hybridized to blots of D. discoideum DNA. The parental M4-4 hybridized to >50 restriction fragments, whereas all deletions hybridized to only a single band (Fig. 4), the same genomic restriction fragment to which M4-1 hybridized (Fig. 1). These results suggest that the (AAC)n sequence is the repetitive component of M4.
To examine this further, we hybridized a D. discoideum DNA blot with a labeled (dAAC)n · (dGTT)n synthetic homopolymer. This hybridization yielded a pattern identical to that of M4-4 (Fig. 4). We also used the (dAAC)n · (dGTT)n probe to screen genomic clones that hybridize to M4-4. All of the genomic clones which hybridized to M4-4 also hybridized to (dAAC)n · (dGTT)n. Furthermore, M4-4 and (dAAC)n · (dGTT)n hybridized with identical patterns when these clones were digested with various restriction enzymes and hybridized on DNA blots. Also, cDNA clones were identified and shown to hybridize to both M4-4 and (dAAC)n · (dGTT)n probes with identical patterns. Several genomic and cDNA clones were sequenced and shown to contain AAC sequences. Table 1 indicates the length variability and degree of homology observed for some genomic and cDNA clones.
(AAC)n sequences are expressed at high levels. Previous results indicated that the M4 repeat was present on ~1% of D. discoideum vegetative poly(A)+ RNAs (8). These were compound mRNAs containing both repeat and single-copy sequences. To prove that the (AAC)n sequence is associated with these messages, vegetative cell poly(A)+ RNA was labeled in vivo and hybridized to filters containing M4-1, M4-3, M4-4, and the M4-4 deletions. Approximately 1% of the poly(A)+ RNA hybridized to the M4-4 sequence (Table 2). The values for M4-1 and M4-3 were ~0.1 and ~0.02%, respectively, consistent with the previous results (8). The M4-4 deletions that lack the (AAC)n repeat hybridized to ≤0.03% of the RNA.
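The arithmetic behind this comparison can be sketched in a few lines of Python. This is an illustration rather than the authors' calculation: the values are read from Table 2 on the assumption that they are listed in the same order as the DNAs, and treating the mean deletion signal as the repeat-independent background is a simplification introduced here. The names `M4_4_PERCENT`, `DELETION_PERCENTS`, and `repeat_dependent_fraction` are hypothetical.

```python
# Percent of in vivo-labeled vegetative poly(A)+ RNA hybridizing to each DNA,
# read from Table 2 in the order the rows are listed (assumed assignment).
M4_4_PERCENT = 1.1
DELETION_PERCENTS = [0.03, 0.01, 0.03, 0.02, 0.01]  # deletions 1 through 5


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def repeat_dependent_fraction(parent, deletions):
    """Share of the parental signal that is lost when the repeat is deleted."""
    return (parent - mean(deletions)) / parent


if __name__ == "__main__":
    print("mean deletion signal (%):", round(mean(DELETION_PERCENTS), 3))
    fraction = repeat_dependent_fraction(M4_4_PERCENT, DELETION_PERCENTS)
    print("repeat-dependent share of M4-4 signal:", round(fraction, 3))
```

With these values, the deletions retain about 0.02% on average, so roughly 98% of the M4-4 signal depends on the presence of the repeat.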
This indicates that the (AAC)n repeat is responsible for the high level of hybridization observed with M4-4. The low level of hybridization that remained was the result of hybridization to the low-abundance mRNA transcript derived from bands 4 and 3 (Fig. 1 and 5).
M4-4 and its deletions were also hybridized to RNA blots of vegetative poly(A)+ RNA (Fig. 5). M4-4 hybridized to a heterogeneous population of RNAs, indicating that a family of mRNAs shares this element. The (dAAC)n · (dGTT)n probe exhibited an indistinguishable pattern of hybridization. Deletions lacking the (AAC)n repeat, however, hybridized to only a single RNA species. To examine the transcription of both strands of the repeat, we prepared single-stranded (AAC)n- or (GTT)n-specific probes and hybridized them to RNA blots (Fig. 5). (GTT)n hybridized to a heterogeneous RNA population, whereas no hybridization to (AAC)n was detected.
Developmental expression of the M4 (AAC)n repeat. We monitored the expression of the (AAC)n sequences during D. discoideum development. A Northern blot containing poly(A)+ RNA purified from various developmental stages was hybridized with an M4-4 probe. The repeat hybridized with a heterogeneous pattern at all stages. Approximately 1% of the poly(A)+ RNA from vegetative cells contained M4 AAC sequences, but as development proceeded, this level of hybridization increased significantly (Fig. 6A). Because of the differences in hybridization intensity, it was not clear whether the banding patterns were identical in each lane. Therefore, to examine whether repeat-containing transcripts actually showed a similar pattern of regulation, we hybridized developmental RNA blots containing
FIG. 2. S1 mapping of the 5' ends of M4-4 and hnRNA. Band M4-4 (Fig. 1) was 5' labeled at the EcoRI site, strand separated, and purified. The labeled fragment was hybridized to poly(A)+ RNA from cytoplasm and nuclei. The nuclear RNA preparation contains high-molecular-weight mRNA precursors with intervening sequences (see reference 9). The hybrids were digested with S1 nuclease and separated by gel electrophoresis adjacent to a Maxam-Gilbert (13) sequencing ladder of the same restriction fragment. Shown are the poly(A)+ cRNA hybrid (lane a), the nuclear poly(A)+ RNA hybrid (lane b), and the Maxam-Gilbert reactions (lanes G, A, T, and C). A very short exposure reveals one band in lane a of the exact molecular weight of the band in lane b.
[Fig. 3 sequence panel: nucleotide sequence of M4 band 4, numbered at 50-nucleotide intervals from 50 to 450, with the (AAC)n block, A, B, the 17S homology, and the translation start marked.]
FIG. 3. DNA sequence of M4 band 4. M4-4 was sequenced by the Maxam-Gilbert technique (13) from restriction sites HindIII, HpaI, HinfI (not shown), and EcoRI to generate a completely overlapping unambiguous sequence. The lower line is the coding strand. The TATA box (A), the oligo(dT) stretch (B), and the (AAC)n are indicated. The start site and direction of transcription are shown, as are the region of homology with the 3' end of the 17S rRNA and the AUG initiating codon sequence. An AUG sequence preceding the initiation codon is found at nucleotide 354 and is followed by in-frame stop codons.
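As an illustration of how the (AAC)n block marked in this sequence can be located computationally, the Python sketch below scans for tandem runs of a trinucleotide and shows that the opposite strand of an (AAC)n run reads (GTT)n. It is a minimal example, not part of the original analysis: the input is the 32-nucleotide repeat quoted in the text, and the names `tandem_runs` and `reverse_complement` and the two-unit minimum are assumptions made for this example.

```python
import re

# The (AAC)n block lying immediately 5' to the M4-4 promoter region,
# as quoted in the Results text.
M4_4_REPEAT = "AACAACAACACAACAACAACAACAACAACAAC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tandem_runs(seq, unit="AAC", min_units=2):
    """Return (start, end, n_units) for each perfect tandem run of `unit`.

    Coordinates are 0-based and end-exclusive; `min_units` is an arbitrary
    threshold chosen for this illustration.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_units))
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    print(tandem_runs(M4_4_REPEAT, "AAC"))
    print(tandem_runs(reverse_complement(M4_4_REPEAT), "GTT"))
```

On the quoted sequence this finds two perfect runs, of three and seven AAC units, separated by a single imperfect AC; the reverse complement gives the mirror-image (GTT)n runs.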
amounts of poly(A)+ RNA that were adjusted to give approximately equal levels of M4-4 hybridization (Fig. 6B). These results indicate that many of the distinguishable RNA bands show the same relative changes in their levels of developmental expression. However, it is not possible to conclude from these data that all transcripts behave in a parallel manner. We thus began analyses of other individual members of the M4 family.
cDNA clones containing (AAC)n. The M4-4 AAC repeat was used to screen a cDNA library (14), and several clones were isolated. Two cDNAs were studied in more detail to examine their organization and patterns of expression. These plasmids, pc(AAC)-1 and pc(AAC)-2, were mapped, and fragments were chosen for subcloning and hybridization to D. discoideum genomic DNA blots (Fig. 7A). Fragment R (repeat) of each plasmid hybridized to M4-4 and (dAAC)n · (dGTT)n and gave patterns identical to that observed for M4-4 when each was hybridized to D. discoideum genomic DNA blots (Fig. 7B). Regions of these inserts were sequenced. Fragment R from each clone contained (AAC)n sequences (Fig. 7; Table 1). When fragments SC (single copy) were hybridized to D. discoideum DNA blots, only one fragment was detected, indicating that these fragments are single copy and are derived from different regions of the genome. The RNAs complementary to these two cDNA clones are therefore compound transcripts containing both repetitive and single-copy sequences.
The polarity and relative positions of the (AAC)n · (GTT)n
FIG. 4. DNA blot hybridization of M4 repeat and M4-4 deletions. D. discoideum DNA was digested with EcoRI, HaeIII, and HapII, electrophoresed, and blotted onto nitrocellulose. Identical DNA blots were hybridized to the in vitro-labeled M4-4 insert, the (dAAC)n · (dGTT)n homopolymer [(aac/gtt)n], the M4-1 fragment, and various M4-4 deletions (Δ1 through Δ5; see Fig. 1).
TABLE 1. Length variability and degree of homology within the AAC repeat family (a)
Length (nucleotides)b
Region Region
M4-4
36 (12) 42 (14) 42 (14) 57 (19) 18 (6) 147 (49)
pc(AAC)-1 pc(AAC)-2
Dd(AAC)-X Dd(AAC)-Y Dd(AAC)-Z
51(17) 63 (21)
(a) M4-4, Dd(AAC)-X, Dd(AAC)-Y, and Dd(AAC)-Z are derived from genomic DNA sequences, and pc(AAC)-1 and pc(AAC)-2 are derived from cDNA sequences. pc(AAC)-1 and pc(AAC)-2 each contain two closely linked AAC clusters in the same polarity.
(b) Numbers in parentheses indicate numbers of AAC repeats.
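The relationship between the two numbers in each table entry can be checked directly: every listed length should equal three nucleotides per AAC repeat. The short Python sketch below does this for the length/repeat pairs as they appear in the table, without assigning them to particular clones; the names `TABLE1_ENTRIES` and `is_whole_number_of_units` are hypothetical.

```python
# (length in nucleotides, number of AAC repeats) pairs as listed in Table 1.
# The pairs are not assigned to individual clones here.
TABLE1_ENTRIES = [
    (36, 12), (42, 14), (42, 14), (57, 19),
    (18, 6), (147, 49), (51, 17), (63, 21),
]

UNIT_LENGTH = len("AAC")


def is_whole_number_of_units(length, repeats, unit_length=UNIT_LENGTH):
    """True if `length` is exactly `repeats` copies of the repeat unit."""
    return length == repeats * unit_length


if __name__ == "__main__":
    for length, repeats in TABLE1_ENTRIES:
        print(length, repeats, is_whole_number_of_units(length, repeats))
    lengths = [length for length, _ in TABLE1_ENTRIES]
    print("shortest:", min(lengths), "longest:", max(lengths))
```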
sequences on pc(AAC)-1 and pc(AAC)-2 were determined by hybridizing strand-specific probes derived from the single-copy fragments to poly(A)+ RNA isolated from cells developed for 15 h. Hybridization of 3'-end-labeled probes to mRNA would place the M4 AAC repeat 5' to the single-copy region, whereas if the 5'-end-labeled probes hybridized to mRNA, the repeat would lie toward the 3' end of the cDNA. For both cDNAs, only the 3' probes hybridized (Fig. 7C). The repeat-containing fragment must therefore lie 5' to the single-copy sequences examined and contain (AAC)n sequences on the RNA transcripts. Only one size of poly(A)+ RNA is derived from pc(AAC)-2, whereas the
DNA          Hybridization (%)
M4-4         1.1
M4-4 Δ1      0.03
M4-4 Δ2      0.01
M4-4 Δ3      0.03
M4-4 Δ4      0.02
M4-4 Δ5      0.01
M4-3         0.02
M4-1         0.11
(a) Samples (3 µg) of the indicated M4 DNAs were immobilized on nitrocellulose and individually hybridized to in vivo 32P-labeled poly(A)+ RNA from vegetative cells. Control filters with pBR322 DNA retained