Vol. 260, No. 8, Issue of April 25, pp. 5055-5060, 1985 Printed in U.S.A.
THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1985 by The American Society of Biological Chemists, Inc.
The Human α-Fetoprotein Gene
SEQUENCE ORGANIZATION AND THE 5' FLANKING REGION*
(Received for publication, June 13, 1984)
Masaharu Sakai‡§, Tomonori Morinaga§, Yoshio Urano§, Kazutada Watanabe§, Thomas G. Wegmann¶, and Taiki Tamaoki§‖
From the §Department of Medical Biochemistry, University of Calgary, Calgary, Alberta, Canada T2N 4N1 and the ¶Department of Immunology and Medical Research Council Group on Immunoregulation, University of Alberta, Edmonton, Alberta, Canada T6G 2H7
The human α-fetoprotein (AFP) gene was isolated in three overlapping clones in bacteriophage λ vectors and its sequence organization analyzed by restriction endonuclease mapping and nucleotide sequencing. The human AFP gene is about 20 kilobase pairs long and contains 15 exons and 14 introns. The overall organization of the human AFP gene is similar to that of the mouse AFP gene, with all but two exons showing identical sizes. Nucleotide sequences at all exon/intron junctions display similarity to the consensus boundary sequence (Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383), with the GT-AG rule applied to the splicing point. The cap site maps 44 nucleotides upstream from the translation initiation site. The "TATA box" is located 27 nucleotides upstream from the putative cap site and is flanked by sequences with dyad symmetry. The TATA box can thus be placed in the loop portion of a possible stem-loop structure formed by intrastrand base-pairing. Other characteristic nucleotide sequences in the 5' flanking region include a CCAAC pentamer, a 14-base pair (bp) enhancer-like sequence, and a 9-bp sequence homologous to the glucocorticoid responsive element. A long (90 bp) direct repeat and several alternating purine/pyrimidine sequences are also present in the 5' flanking region. A 736-bp sequence of the 5' flanking region adjacent to the cap site of the human AFP gene shows a 61% similarity with the corresponding region of the mouse AFP gene. There are two Alu family sequences and two poly(dT-dG) repeats in the human AFP gene that show different distribution patterns from those in the mouse AFP gene.
* This work was supported by the National Cancer Institute of Canada, the Medical Research Council of Canada, and the Alberta Heritage Savings Trust Fund. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ Present address: Department of Biochemistry, Faculty of Medicine, University of Tokyo, Tokyo, Japan.
‖ Research associate of the National Cancer Institute of Canada. To whom reprint requests should be addressed.
¹ The abbreviations used are: AFP, α-fetoprotein; bp, base pairs; kb, kilobase pairs.
α-Fetoprotein (AFP¹) is a major serum protein during fetal life (1-3). Interest in the AFP gene stems from the fact that it is developmentally regulated and often re-expressed at high levels in hepatocarcinoma and teratocarcinoma patients (4, 5). In addition, its similarities to albumin in certain physicochemical and functional properties (6, 7) have stimulated interest in the molecular and evolutionary relationship between these genes and in the control of their expression.
The mouse AFP gene has been analyzed in detail (8). It is about 20 kb long, containing 15 exons and 14 introns. It is 13.5 kb downstream from the albumin gene (9), which also has 15 exons of similar sizes (8, 10). Further analysis of the size and nucleotide sequence of individual exons has established a correlation between the gene structure and the three-domain protein structure (10). This, coupled with a significant homology of amino acid sequences between AFP and albumin, supports the concept that these genes are derived from a common ancestral gene (11, 12).
We have previously determined the complete nucleotide sequence of human AFP mRNA and deduced the amino acid sequence (13). This report concerns the isolation and analysis of genomic clones of human AFP. In comparison to the mouse AFP gene, the human AFP gene shows several distinguishing characteristics along with a general similarity. These studies provide the basis for a more complete understanding of the molecular evolution of the AFP gene and its relation to the albumin gene. A preliminary account describing general features of the human AFP gene has been published elsewhere (14).
EXPERIMENTAL PROCEDURES
Gene Libraries-A Charon 20 library was constructed as follows. The DNA from a human B lymphoblastoid cell line, 6.1.6 (15), was completely digested by HindIII, and 10-15-kb fragments were recovered after electrophoresis of the digest on a 0.8% agarose gel. The right and left arms of Charon 20 were isolated from a HindIII digest of Charon 20 DNA, ligated with the above DNA fragments using T4 DNA ligase (Bethesda Research Laboratories), and packaged in vitro according to Hohn (16). A Charon 4A library, constructed from a partial EcoRI digest of human fibroblast DNA (17), was generously provided by Dr. O. S. Smithies, University of Wisconsin. Human gene libraries were screened with 32P-labeled human AFP cDNA clones, pHAF6 and pHAF7 (13), according to Benton and Davis (18).
Restriction Nuclease Mapping and Nucleotide Sequencing-Digestion of DNA with restriction endonucleases (Bethesda Research Laboratories, New England Biolabs, and Boehringer Mannheim, Canada) was carried out under the conditions recommended by the suppliers. Appropriate fragments of genomic clones were subcloned into plasmid pBR322 for further restriction analysis and nucleotide sequencing. The fragments containing exons were detected by Southern blot analysis (19) using cDNA clones as probes. The 5' ends of DNA fragments were labeled with [γ-32P]ATP using T4 polynucleotide kinase (Bethesda Research Laboratories). The 3' ends of DNA fragments were labeled with appropriate [α-32P]deoxynucleotide triphosphates using the large fragment of Escherichia coli
FIG. 1. The human AFP gene. The black boxes represent exons and the open areas indicate introns and flanking sequences. Three genomic clones, Ch.HAF1, Ch.HAF2, and Ch.HAF3, are drawn under the region from which they were derived. E, EcoRI; B, BamHI; H, HindIII. Alu repeats and (dT-dG)n repeats are marked on the map. Scale, 0 to 20 kb; exons are numbered 1 to 15.
TABLE I
The sizes of exons and introns and the nucleotide sequences at their junctions in the human AFP gene
Column groups: Exon (No., Size, 3' Junction); Intron (5' Junction, Size, 3' Junction); Exon (5' Junction, No.)
129 G GGA ATA 52 GCT GAC CT 133 ACC CAG 212 ATG AAC AA 133 ACA AAG 98 CAA GCC AT 7 130 GAT GGG GTAACA a AG GCA TTG 215 AAA GGA 9 133 GTAAGT 10GC AAT CAA 9a GCG 11 GGA 139 12 GTAAGA 224 GA CAA AAG 13 14 TTC TAC 55 GTAACAAG 15 1 2 3 4 5 6
133
GTGAGA GTAAGT GTGAGT GTAAGG GTATCA GTAAGT GTGAGG
820 960 2200 1600 950 1550 2050 1610 590
TTCCAG TTTCAG TTATAG TCACAG ATCTAG CCCTAG TTGCAG TTTCAG TTTCAG TTTCAG AACCAG
CT TCC ATA G GCT ACC CTA CCT A TTC ATT GCA GCA A ACT GTT GAA AAA T TTT GTT GAA GAA G TTT CTC GCT12 GAC CTC G TTT 14 CAA GGA GGG15 AAG
GTATGT GTGAGT GTACAT
DNA polymerase I (Bethesda Research Laboratories and Boehringer Mannheim, Canada). The nucleotide sequence analysis was done according to Maxam and Gilbert (20).
S1 Nuclease Protection Mapping-The location of the cap site of AFP mRNA was determined by S1 nuclease protection mapping according to Berk and Sharp (21). A HincII-HinfI fragment (108 bp) of an AFP genomic clone corresponding to the 5' end of human AFP mRNA was labeled at the 5' end, denatured, and hybridized with partially purified human AFP mRNA. Hybrids were digested with S1 nuclease (Miles Laboratories) and analyzed on 10% polyacrylamide gels containing 8 M urea.
Detection of Repetitive Sequences-Each of the human AFP genomic clones was digested with EcoRI and/or HindIII and subjected to Southern blot hybridization. To detect all highly repeated sequences in the genomic clone, the total human DNA was used as a probe. Alu sequences were detected using a plasmid containing human Alu sequence (kindly provided by Dr. T. Taniguchi, Cancer Institute, Tokyo, Japan) as a probe. (dT-dG)n sequences were detected using 32P-labeled synthetic poly(dT-dG)·(dC-dA) (Boehringer Mannheim, Canada) as a probe.
300
1560 TCACAG 1180 1280 339
TTTCAG TTGCAG
No.
2 3 4 5 6 7
a 9 10 11
13
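The GT-AG rule invoked for the junctions in Table I can be checked mechanically. The sketch below is illustrative only: the two lists are the intron 5' and 3' junction hexamers as they appear in the extracted table, the pairing of individual 5' and 3' junctions into rows is not asserted, and the function names are hypothetical.

```python
# Intron junction hexamers copied from Table I as extracted.
# Assumption: each list is checked on its own; row pairing is not reconstructed.
DONOR_SITES = ["GTGAGA", "GTAAGT", "GTGAGT", "GTAAGG",
               "GTATCA", "GTAAGT", "GTGAGG"]
ACCEPTOR_SITES = ["TTCCAG", "TTTCAG", "TTATAG", "TCACAG",
                  "ATCTAG", "CCCTAG", "TTGCAG"]


def starts_with_gt(intron_5prime: str) -> bool:
    """True if the intron's 5' end begins with the GT dinucleotide."""
    return intron_5prime.upper().startswith("GT")


def ends_with_ag(intron_3prime: str) -> bool:
    """True if the intron's 3' end terminates with the AG dinucleotide."""
    return intron_3prime.upper().endswith("AG")


if __name__ == "__main__":
    print(all(starts_with_gt(s) for s in DONOR_SITES))
    print(all(ends_with_ag(s) for s in ACCEPTOR_SITES))
```

Keeping the two checks separate avoids implying a row assignment that the extracted table no longer supports.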
RESULTS AND DISCUSSION
Isolation of Genomic AFP Clones-The Charon 4A library was screened with pHAF6, a cDNA clone containing 962 nucleotides of AFP mRNA coding for the COOH-terminal amino acids but no 3' noncoding sequence (13). This resulted in the isolation of a genomic clone, Ch.HAF1, which contained the 3' half of the AFP gene (Fig. 1).
Ch.HAF2, a genomic clone containing a middle region of the AFP gene (Fig. 1), was isolated from the Charon 20 library by screening with pHAF7, a cDNA clone containing the 5' sequence of AFP mRNA (13).
Ch.HAF3, a genomic clone containing the 5' region of the AFP gene, was isolated from the Charon 4A library by screening with a 2.5-kb fragment at the 5' end of Ch.HAF2 released by digestion with BamHI (see Fig. 1).
The overlapping portions of the above three clones showed identical restriction maps. The EcoRI sites shown in Fig. 1 are in general agreement with those reported by Minghetti et al. (22).
Identification of Exons and Introns-The AFP mRNA coding sequence in the genomic clones was detected initially by Southern blot analysis of various restriction digests using AFP cDNA clones as probes. Those fragments showing positive hybridization were further analyzed for nucleotide sequence by the Maxam-Gilbert method. This resulted in the identification of 15 exons and the construction of the sequence organization of the AFP gene (Fig. 1). No gross discrepancies were observed between the proposed structure and the Southern blots of the human genomic DNA. The nucleotide sequences of the exons are identical with that of the human cDNA (13), except for two nucleotides at positions 902 and 1631. Both nucleotides occupy the third position of codons and thus do not affect amino acid assignments.
All exons in the human AFP gene, except for exons 3 and 15, have the same number of nucleotides as the corresponding exons in the mouse AFP gene. Exon 3 of the human AFP gene is 12 nucleotides longer than that of the mouse AFP gene, accounting for the larger size of human AFP reported earlier (13). Exon 15 consists of the 3' noncoding sequence, which tends to diverge more rapidly than the coding sequence within the same species of mRNA. Similarly, a considerable divergence was observed between several introns of the human and mouse AFP genes (10). Instability of introns has been reported with other genes, such as globin (23, 24), albumin (25), and ovalbumin (26).
The presence of the same number of exons with similar sizes in the human and mouse AFP genes indicates that these genes resemble each other in their overall structure. Tilghman and her associates (8, 10) have shown that the internal 12 exons (exons 3-14) of the mouse AFP gene can be grouped into three sets of four exons based on their sizes. This tripartite structure forms the basis of the three-domain structure of the mouse AFP molecule. Essentially, the same conclusions can be made with respect to the human AFP gene. These
A.
a
b
c
B.
-50 -7:
Hi ncI I -4? -3: 0 ACCAACAAAAGGTTACTAGT~AACAGGCATTGCCTGAAAAGAGTATAAAA -6
1
-2 0
Exon 1
*
-1 0
10
20
30
GAATTTCAGCATGATTTTCCATATTGTGC~TCCACCACTGCCAATAAC~ 60
Hi n f I
50
40
70
A A T A A C T A G ~ A A C CATG AAG TGG GTG GAA ~ C ATT A TTT T Y A Met Lys Trp Val G l u Ser Ile Phe Leu 80
10 0
90
CTA
110
ACT
ATT TTC CTA AAT TTT GAA TCC AGA ACA CTG CAT Ile Phe Leu Leu Asn Phe Thr Glu Ser Arg Thr Leu His y o
13 0
AGA AAT GAA TAT GGA ATA G ~ T G A G A A A T T Arg Asn Glu Thr Gly Ile
C. Exon 14
ATTTTCAG GGA CAA AAA CTG ATT TCA AAA ACT CGT GCT GCT Gly Gln L y s Leu Ile Ser Lys Thr Arg Ala Ala TTG GGA GTT TAA A T T A C T T C A G G T A A C A A A A C A T T C A G A C A A G C C T G A Leu Gly Val Ter
T
ATACAATGTTGTTTCTCCAGAAATATCAATCCATAATGAGATAGATC~.~GAG
4
GAGTGCCATTAATTCTCTTAAAAATACATGGAATTCAAAAAAAAGTTTATTT
TAAAACACTTGAACAAAATTACGCACACAATTGTTAAATTAGTGGCTCAACT ATGCAAAATCCTTTTTGGTTATTTAAAAGACTTCAACAAATGCTATCAGAAG ACTTTCCTACGTATCCAATATTTCTCTGATATAAAATAATAGAACCAGTTAC TT~CTGCACCTATTAGTTTAATTAGTATTTAATATATTTTTGCTCATATTG~
Exon15
AG G G G A A G A G A A G A C A A A A C G A G T C T T C A T T C G G T G T G A A C T T T T C T C T T T AATTTTATCTGATTTAACACTTTTTGTGAATTAATGAAATGATAAAGACTTT v
TATGTGAGATTTCCTTATCACAGAAATAAAATATCTCCAAATG TTTCCTTT TCCAAGTTTGCTTATTTATGAAAAGTTATCGATATTTCTTTGGTTTTGTATA
CCATTGTCTGAAG
FIG. 2. The nucleotide sequence of the 5' and 3' termini of the human AFP gene. A, S1 nuclease mapping of the cap site of AFP mRNA. A HincII/HinfI fragment (-49 to +57) containing the first exon and the 5' flanking sequence (see B) was terminally labeled on the anticoding strand, hybridized to partially purified human AFP mRNA (28), and digested with S1 nuclease. The S1 nuclease-resistant fragments were sized on a 10% polyacrylamide gel (lane c). The same HincII/HinfI fragment was subjected to chemical degradation (20) at thymine and cytosine residues (lane a) and adenine and guanine residues (lane b). B, the nucleotide and amino acid sequences of the first exon (underlined). The star indicates the putative cap site. C, the nucleotide and amino acid sequences of the last two exons (underlined). The inverted triangle indicates the poly(A)-addition site.
general features are also found in the mouse albumin gene (8), and it is therefore likely that the human albumin gene will show a similar structure when it is analyzed.
Inspection of nucleotide sequences at exon/intron boundaries showed that all introns begin with GT at the 5' end and terminate with AG at the 3' end, conforming to the "GT-AG rule" (27) (Table I). Junction sequences in the exons are similar to the consensus sequence proposed by Breathnach and Chambon (27).
Identification of the Cap Site and the Termination Site of Human AFP mRNA-The location of the mRNA cap site was determined by S1 nuclease mapping (21). A HinfI/HincII
A -710
-720
. vwvvvvvvvv
-730
-690
"610
-670
-660
hAFP
AATGATGCACCTG--ACCCAC-~ATAMGACACATGTGCAAGCTC
mAFP
ACCCATGCACCTGTGACATACATATGGM~ATTCTTTGGGCTCATCAGGTTTTGTGCTG
* * * * * * * ** *
-600
**
**
-590
** ******
* * * *********
-I10
- 5 -7503 0
- -556400
** * *
***.e.*
-550
hAFP
. vwvvvvvv TAAGTTTTCTATGTTI;AGCCATACATCGCATATTAAATACTTTM~TGTACCTT~TTGACATACATATTAAGTG~MGTGTT
mAFP
TAAGTTTTCTATGTTMACCAGATGCGATACACTA~TAMATMMTATAC----TTGACCGA---------TG------GTT
-650
-510
-620
-640
.*.
*********.*******
-520
******
-500
-570
-610
wvvvvvvvvvvvv
-560
-490
-600
"580
..***
****** ***
**
-470
-410
-460
-490
-520
"150
4..
-510
-500
*.
. L
hAFP
TCTGAGCTAAACAATGACAACATTATCAAGCAATCAAGCAATGATAATTTGAAATGAA~TATTATTCTGCAACTTAGGGACMGTCAT
mAF P
T-TGAGCGAAATMTAACTGGATRA"--TCAAGMAT-ATATCCACTMTGMT-----AGCCTG--.~CT-----AC---"--
**,.** *.*
*** **
-450
* * * * * * * * **.
****
-420
-430
-440
**.
.**.***
.**
-410
-400
hAFP mAFP
.
-3bO
- 3 6 0- 3 8 0
-390
mAFP
CTAGCATATGGTGTGCATTTTATTATTTTCAAAAAGAGTCTGCTACCTTTTCTTT
**.***
** * * * * * * * - 3- 3 57 00
** ***
**
-360
********* **
-340
** ****
- 3 3 0- 3 1 0
* * *****.
*****.*** -300
-320
-210
-310
-270
-250
-260
***..
hAFP
ATGGCTTCAT-TAA"CTTAATTTGAGAGAAATTAATTATTCTGCMCTTAGGGACAAGTCATCTCTTTGAATATTCTGTAGTT
mAFP
ATGGCTATATCTATGTCTTATGTTGAGATGAGATGAATGAATTATTCTTCAG----GGGA-AA-TMTCTATTTGAACAGTTT--AGAT
******
****
** **
******
-230
-2.40
*** ********* * * -26 0
-270
-280
-290
**** **
-250
- 2 1-01 7 0
- 2 2 0-160
-24 0
- 2-0101 0
****
**
t
-2 3 0
-220
-190
hAFP
TGAGGAGAATATTTGTTATATTTGCAAAATAAAATAAGTTTGCAAGTTTTTTTTTTCTGCCCCAAAGAGCTCTGTGTCCTTGAA
mAFP
GGTGAAGAACATTTG---------CAGCAT---------TTGCAAGATTTTTTT--CCACTCTGAAGTGGTCTTTGTCCTTGAA
* * * * ****.
**
-210
-140
-130
-330
vvvvvvv vvvvvvv CTGGCATATGATAGGCATTTAATAGTTT--TAAAGAATTMTGTATTTAGATGMTTGCATACCMATCTGCTGTCTTTTCTTT
-320
0
.
-370
hAFP
-110
-200
-130
-150
* * * * * * * .******
*t
***
*******..*
ttt
.......
-190
-120
t
-16
-110
-1 7 0
0
-80
hAFP
CATAAAATACAAATAACCGCTATGCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGGMArACCATAAAGTAAChGAT
mAFP
CATAGGATACAAGTGACCCCTGCTCTGTTAATTATTGGCAAATTGCCTAACTTCAACGTAAGG~~TAG-----AGTCATATGT
-140
****
.*****
t
********.**********
*** * *
-150
ATACCAACAAAAGGTTACTAGTTAACAGGCATTGCCTGAAAAGA -60
mAFP
***
-50
"20
+* ****t*t*t*t*********~* ** ** TTGCTCACTGAAGGTTACTAGTTAACAGGCATCCCTTAAACAGG -50
-60
t
1
L
-70
t
-80
-90
vvvvv
hAFP
****** **********
**
GGACTTCAGCAGCACTGCTCGAAACA -20
-40
-10
[Fig. 3B: restriction endonuclease map and sequencing strategy, drawn 5' to 3'; scale bar, 100 bp.]
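The legend to Fig. 3 gives the weights used to compare the two flanking sequences: -1 for a matched pair, +1 for a mismatched pair, and k + 2 for a gap of k nucleotides. The following sketch scores an alignment that has already been made under those weights and reports the fraction of identical columns. It is not the algorithm of Waterman et al. (48) itself, which searches for the optimal alignment; the function names, the toy sequences, and the choice of denominator for percent identity (all alignment columns) are assumptions made for illustration.

```python
def score_alignment(a: str, b: str, gap: str = "-") -> int:
    """Score two gapped, equal-length strings: -1 match, +1 mismatch,
    and k + 2 for each maximal run of k gap characters in one sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    run_owner = None  # sequence holding the current gap run ("a", "b", or None)
    run_len = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            raise ValueError("a column cannot be a gap in both sequences")
        owner = "a" if x == gap else "b" if y == gap else None
        if owner != run_owner and run_len:
            score += run_len + 2  # close the previous gap run
            run_len = 0
        run_owner = owner
        if owner is not None:
            run_len += 1
        else:
            score += -1 if x == y else 1
    if run_len:
        score += run_len + 2  # gap run reaching the end of the alignment
    return score


def percent_identity(a: str, b: str) -> float:
    """Identical columns as a percentage of all alignment columns (assumed)."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Toy sequences, not taken from the AFP gene.
    print(score_alignment("ACGT--A", "ACCTGGA"))
    print(round(percent_identity("ACGT--A", "ACCTGGA"), 1))
```

Under this distance-style weighting a lower score indicates a better alignment, and a single long gap costs less than several short ones of the same total length.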
FIG. 3. The 5' flanking sequence of the human AFP gene. A, comparison between the human and mouse AFP genes. The 5' flanking sequence of the human gene (hAFP) and that of the mouse AFP gene (mAFP) (30) were compared for similarity using the algorithm of Waterman et al. (48), with the following weights: -1 for a matched pair, +1 for a mismatched pair, and k + 2 for a gap, where k is the number of nucleotides in the gap. Direct and inverted repeats were located using the matrix program of Novotny (49). *, identical nucleotides; -,
CGCCACGGGTCGGGGGTTAGCCGGGCGTGATGGGCGCCTGTAG TCCCAGCACTTTGGGAGCCGAGGCGAACGAACACACTGAGGTCAGGAG ATCGAGACCATCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAA
ATACAAAAAAATTAGCCGGGCGTGATGGTGGGCGCCTGTAGTCCCA GCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCTGG~C
GGAGCAGCAGTCAGCCCAGATTGCGCCACTGCACTCCCGCCTGGGC
CACAGAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAGGAT
FIG. 4. The Alu sequence in intron 4 of the human AFP gene. The nucleotides identical to the human Alu consensus sequence (43) are underlined.
genomic fragment (see Fig. 2B) was terminally labeled, hybridized to partially purified human AFP mRNA (28), and digested with S1 nuclease. The S1 nuclease-resistant fragments were sized on a polyacrylamide gel by comparison with chemical degradation products of the same DNA preparation (Fig. 2A). The results showed that the first A of the ATATTG sequence 44 nucleotides upstream from the AUG translation initiation codon is the most likely nucleotide to be capped in the human AFP mRNA (Fig. 2B). This cap site is at exactly the same position as has been reported for the mouse AFP gene (10).
The translation termination codon, TAA, in exon 14 (Fig. 2C), and the poly(A)-addition signal, AATAAA (29), 14 nucleotides upstream from the polyadenylation site in exon 15 (Fig. 2C), were identified by comparison with the nucleotide sequence of pHAF2 (13, 28).
The 5' Flanking Region of the Human AFP Gene-Using a computer we compared the 736-bp 5' flanking sequence of the human AFP gene for its homology with the mouse AFP gene reported by Scott and Tilghman (30). A large variety of optimal alignments were obtained at the level of 61% similarity by introducing deletions primarily into the mouse AFP gene. One example of such an alignment is shown in Fig. 3.
A TATAAA sequence (the Goldberg-Hogness box (31) or "TATA box") is present 27 nucleotides upstream from the cap site of human AFP mRNA (Fig. 3). A CCAAC pentamer is located at position -69. This sequence appears to be a derivative of another conserved sequence, CCAAT, commonly found about 80 nucleotides upstream from the cap site of many eukaryotic genes (23). It is noteworthy that the mouse AFP gene contains the TATA box at -30 (10), but lacks a CCAAT pentamer at about -80, although variant pentamers are found further upstream (30).
The 10 nucleotides from -8 to -17 from the cap site of the human AFP gene are complementary to the nucleotides from -32 to -43 with two unpaired nucleotides. This inverted repeat potentially forms a stem-loop structure with the TATA box located at the center of the loop. In the case of the mouse AFP gene, no corresponding stem-loop structure can be formed. Whether or not such a structure enhances the role of the TATA sequence as a transcription initiation signal (32) remains to be studied. Two additional inverted repeats in the 5' flanking region of the human AFP gene include 7-bp sequences at -289/-295 and -301/-307 and 9-bp sequences at -527/-535 and -539/-547.
There is a long direct repeat in the 5' flanking region of the human AFP gene. A 90-bp stretch at -443/-532 is homologous to a 93-bp stretch at -214/-306, with 12 nucleotides mismatched. In the mouse AFP gene, the presence of such a long direct repeat is less clear, although one copy may exist in the region corresponding to -214/-306 of the human AFP gene.
The 14-bp sequence from -87 to -100 is similar to enhancer sequences found in viruses (see Ref. 33) and in mouse immunoglobulin heavy chain genes (34-37). An enhancer-like sequence is also present in the 5' flanking sequence of the mouse AFP gene, as shown in Fig. 3.
The 9-bp sequence from -427 to -435, AGTCTGAAT, is a 1-bp variant of the glucocorticoid responsive sequence (38), but oriented in a reverse direction. It has been shown that the administration of dexamethasone suppresses transcription of the AFP gene in the liver of newborn rats (39) and mice (40). In the mouse AFP gene a similar sequence is present at -423/-433 in the anti-coding strand (30). Based on the similarities of their sequences and locations, it is possible that expression of the human AFP gene is also modulated by glucocorticoid hormones.
The significance of the 61% overall similarity between the 5' flanking sequence of the human AFP gene and that of the mouse AFP gene described above is difficult to assess at this time. An extensive sequence similarity in the 5' flanking region has been observed in the rat and human insulin genes (41, 42). The conserved regions of the insulin genes contain transcriptional control elements as assayed by expression of the chloramphenicol acetyltransferase gene linked to them (42). These regions have also been shown to be sensitive to DNase I digestion (42). The observed similarity of the 5' flanking sequences of the human and mouse AFP genes may also be associated with the preservation of functionally significant sequences. Scott and Tilghman (30), using a mouse AFP minigene linked to SV40 expression vectors, have demonstrated promoter function of the TATA sequence.
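The inverted repeats described for the 5' flanking region are pairs of segments in which one segment can base-pair with the other on the same strand. A minimal way to test a candidate pair is to compare one arm with the reverse complement of the other and count the positions that fail to pair. The sketch below assumes equal-length arms and uses toy sequences rather than the AFP sequence; the function names are hypothetical, and this is not the matrix program of Novotny (49).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stem_mismatches(left_arm: str, right_arm: str) -> int:
    """Count unpaired positions when right_arm folds back onto left_arm.

    Both arms are read 5' to 3' on the same strand. A perfect inverted
    repeat gives 0.
    """
    if len(left_arm) != len(right_arm):
        raise ValueError("arms must have equal length in this sketch")
    target = reverse_complement(left_arm)
    return sum(1 for x, y in zip(target, right_arm.upper()) if x != y)


if __name__ == "__main__":
    # Toy arms, not taken from the AFP gene.
    print(reverse_complement("GATTACA"))
    print(stem_mismatches("GATTACA", "TGTAATC"))
    print(stem_mismatches("GATTACA", "TGTTATC"))
```

Arms of unequal length, such as a 10-nucleotide segment pairing with a 12-nucleotide segment that carries two unpaired nucleotides, would need an alignment step before counting and are outside this sketch.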
Further studies using other in vitro and in vivo transcription systems may define additional functional elements.
Repetitive Sequences-The presence of Alu family sequences was examined by probing human AFP genomic clones with 32P-labeled total human DNA or a cloned Alu sequence. Alu family sequences were found to be present in intron 4 and the 3' flanking region (Fig. 1). The sequence in intron 4 showed a characteristic dimer structure with more than a 90% similarity to the human Alu consensus sequence (43) (Fig. 4). By contrast, the second Alu sequence situated about 1.3 kb downstream from the 3' end of the AFP gene was monomeric in structure (data not shown). In the mouse AFP gene, Alu-like repeats have been documented in intron 1 (30, 44).
The presence of alternating dT and dG sequences was detected by hybridization with a 32P-labeled poly(dT-dG)·(dC-dA) probe. A 7-residue stretch, (dT-dG)7, was found in intron 3, and a second repeat of unknown size was detected about 3 kb downstream from exon 15 (Fig. 1). Alternating dT and dG sequences have been reported to be present in two introns of the rat albumin gene (25) and in the intergenic region between the δ- and β-globin genes of humans (45). In the latter case, the dT-dG repeat has been postulated to serve as a recognition signal for gene conversion or unequal recombination events. These sequences can potentially form the left-handed configuration (Z DNA) (46) and may participate in the control of gene expression. The dT-dG repeats have been shown to be widespread in eukaryote genomes (47), and it is possible that the mouse AFP gene also contains such repeats, although this has not been reported so far.
It must be emphasized that the biological significance of the nucleotide segments discussed above is speculative at present, and their possible roles in transcription of the AFP gene can only be defined by functional assays.
We have recently analyzed the 5' flanking region of the AFP gene cloned from a human hepatoma cell line that produces a high level of AFP. Only 4 out of the 950 nucleotides determined were different from those of the genomic clone reported here. This suggests that expression of the AFP gene in the hepatoma does not involve major alterations in the primary structure of the 5' flanking region.
gap; V, alternating purine/pyrimidine sequences; --, inverted repeat; 0, putative enhancer sequence; V, "CCAAT" pentamer. The TATA sequence is boxed. B, sequencing strategy and restriction endonuclease map of a human AFP genomic subclone containing the first exon and the 5' flanking sequence. The dark box represents the first exon. The arrows indicate the direction and extent of sequence analysis by the Maxam and Gilbert method (20). Most determinations were performed at least twice. B, BstNI; D, DdeI; Hc, HincII; Hd, HindIII; Hf, HinfI; S, Sau3A. The stars indicate the putative cap sites.
Acknowledgments-We thank Geoff Protheroe, Barry Millott, Chris Sherwood, and Marc Nixon for excellent technical assistance and David Pot for computer analysis of the nucleotide sequences.
REFERENCES
1. Abelev, G. I. (1974) Transplant. Rev. 20, 3-37
2. Uriel, J. (1979) Adv. Cancer Res. 29, 127-174
3. Stillman, D., and Sell, S. (1979) in Methods in Cancer Research (Fishman, W. H., and Bush, H., eds) Vol. 25, pp. 135-168, Academic Press, New York
4. Belanger, L., Baril, P., Guertin, M., Gingras, M.-C., Gourdeau, H., Anderson, A., Hamel, D., and Boucher, J.-M. (1983) in Adv. Enz. Regul. 21, 73-99
5. Tamaoki, T., and Fausto, N. (1984) in Recombinant DNA and Cell Proliferation (Stein, G., and Stein, J., eds) pp. 145-168, Academic Press, New York
6. Hirai, H., Nishi, S., Watabe, H., and Tsukada, Y. (1973) Gann Monogr. Cancer Res. 14, 19-34
7. Berde, C. B., Nagai, M., and Deutsch, H. F. (1979) J. Biol. Chem. 254, 12609-12614
8. Kioussis, D., Eiferman, F., van de Rijn, P., Gorin, M. B., Ingram, R. S., and Tilghman, S. M. (1981) J. Biol. Chem. 256, 1960-1967
9. Ingram, R., Scott, R. W., and Tilghman, S. M. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 4694-4698
10. Eiferman, F. A., Young, P. R., Scott, R. W., and Tilghman, S. M. (1981) Nature 294, 713-718
11. Gorin, M. B., Cooper, D. L., Eiferman, F., van de Rijn, P., and Tilghman, S. M. (1981) J. Biol. Chem. 256, 1954-1959
12. Law, S. W., and Dugaiczyk, A. (1981) Nature 291, 201-205
13. Morinaga, T., Sakai, M., Wegmann, T. G., and Tamaoki, T. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 4604-4608
14. Tamaoki, T., Morinaga, T., Sakai, M., Protheroe, G., Urano, Y., and Wegmann, T. G. (1983) Ann. N. Y. Acad. Sci. 417, 13-20
15. Gladstone, P., and Pious, D. (1978) Nature 271, 459-461
16. Hohn, B. (1979) Methods Enzymol. 68, 299-309
17. Slightom, J. L., Blechl, A. E., and Smithies, O. (1980) Cell 21, 627-638
18. Benton, W. D., and Davis, R. W. (1977) Science 196, 180-182
19. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517
20. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
21. Berk, A. J., and Sharp, P. A. (1977) Cell 12, 721-732
22. Minghetti, P. P., Harper, M. E., Alpert, E., and Dugaiczyk, A. (1983) Ann. N. Y. Acad. Sci. 417, 1-12
23. Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C., and Proudfoot, N. J. (1980) Cell 21, 653-668
24. Nishioka, Y., and Leder, P. (1979) Cell 18, 875-882
25. Sargent, T. D., Jagodzinski, L. L., Yang, M., and Bonner, J. (1981) Mol. Cell. Biol. 1, 871-883
26. Heilig, R., Perrin, F., Gannon, F., Mandel, J. L., and Chambon, P. (1980) Cell 20, 625-637
27. Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383
28. Morinaga, T., Sakai, M., Wegmann, T. G., and Tamaoki, T. (1982) Oncodev. Biol. Med. 3, 301-303
29. Proudfoot, N. J., and Brownlee, G. G. (1976) Nature 263, 211-214
30. Scott, R. W., and Tilghman, S. M. (1983) Mol. Cell. Biol. 3, 1295-1309
31. Goldberg, M. L. (1979) Ph.D. thesis, Stanford University
32. Lilley, D. M. J. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 6468-6472
33. Khoury, G., and Gruss, P. (1983) Cell 33, 313-314
34. Gillies, S. D., Morrison, S. L., Oi, V. T., and Tonegawa, S. (1983) Cell 33, 717-728
35. Banerji, J., Olson, L., and Schaffner, W. (1983) Cell 33, 729-740
36. Queen, C., and Baltimore, D. (1983) Cell 33, 741-748
37. Emorine, L., Kuehl, M., Weir, L., Leder, P., and Max, E. E. (1983) Nature 304, 447-449
38. Schmid, W., Scherer, G., Danesch, U., Zentgraf, H., Matthias, P., Strange, C. M., Rowekamp, W., and Schutz, G. (1982) EMBO J. 10, 1287-1293
39. Belanger, L., Frain, M., Baril, P., Gingras, M.-C., Bartkowiak, J., and Sala-Trepat, J. (1981) Biochemistry 20, 6665-6672
40. Commer, P., Schwartz, C., Tracy, S., Tamaoki, T., and Chiu, J.-F. (1979) Biochem. Biophys. Res. Commun. 89, 1294-1299
41. Bell, G. I., Pictet, R. L., Rutter, W. J., Cordell, B., Tischer, E., and Goodman, H. M. (1980) Nature 284, 26-32
42. Walker, M. D., Edlund, T., Boulet, A. M., and Rutter, W. J. (1983) Nature 306, 557-561
43. Jelinek, W. R., and Schmid, C. W. (1982) Annu. Rev. Biochem. 51, 813-844
44. Young, P. R., Scott, R. W., Hamer, D. H., and Tilghman, S. M. (1982) Nucleic Acids Res. 10, 3099-3116
45. Miesfeld, R., Krystal, M., and Arnheim, N. (1981) Nucleic Acids Res. 9, 5931-5947
46. Wang, A. H.-J., Quigley, G. J., Kolpak, F. J., Crawford, J. L., van Boom, J. H., van der Marel, G., and Rich, A. (1979) Nature 282, 680-686
47. Hamada, H., Petrino, M. G., and Kakunaga, T. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 6465-6469
48. Waterman, M. S., Smith, T. F., and Beyer, W. A. (1976) Adv. Math. 20, 367-387
49. Novotny, J. (1982) Nucleic Acids Res. 10, 127-131