Proc. Natl. Acad. Sci. USA Vol. 82, pp. 3771-3775, June 1985 Genetics
Cloning of cDNAs for human aldehyde dehydrogenases 1 and 2 (cDNA expression library/synthetic oligodeoxynucleotide probe/isozymes)
LILY C. HSU*, KENZABURO TANI*, TOSHINOBU FUJIYOSHI*, KOTOKU KURACHI†, AND AKIRA YOSHIDA**
*Department of Biochemical Genetics, Beckman Research Institute of the City of Hope, Duarte, CA 91010; and †Department of Biochemistry, University of Washington, Seattle, WA 98195
Communicated by Arno G. Motulsky, February 7, 1985
ABSTRACT Partial cDNA clones encoding human cytosolic aldehyde dehydrogenase (ALDH1) and mitochondrial aldehyde dehydrogenase (ALDH2) were isolated from a human liver cDNA library constructed in phage λgt11. The expression library was screened by using rabbit antibodies against ALDH1 and ALDH2. Positive clones thus obtained were subsequently screened with mixed synthetic oligonucleotides compatible with peptide sequences of ALDH1 and ALDH2. One of the positive clones for ALDH1 contained an insert of 1.6 kilobase pairs (kbp). The insert encoded 340 amino acid residues and had a 3' noncoding region of 538 bp and a poly(A) segment. The amino acid sequence deduced from the cDNA sequence coincided with the reported amino acid sequence of human ALDH1 [Hempel, J., von Bahr-Lindström, H. & Jörnvall, H. (1984) Eur. J. Biochem. 141, 21-35], except that valine at position 161 in the previous amino acid sequence study was found to be isoleucine in the deduced sequence. Since the amino acid sequence of ALDH2 was unknown, 33 tryptic peptides of human ALDH2 were isolated and sequenced. Based on the amino acid sequence data thus obtained, a mixed oligonucleotide probe was prepared. Two positive clones, λALDH2-21 and λALDH2-36, contained the same insert of 1.2 kbp. Another clone, λALDH2-22, contained an insert of 1.3 kbp. These two inserts contained an overlap region of 0.9 kbp. The combined cDNA contained a sequence that encodes 399 amino acid residues, a chain-termination codon, a 3' untranslated region of 403 bp, and a poly(A) segment. The deduced amino acid sequence was compatible with the amino acid sequences of the tryptic peptides. The degree of homology between human ALDH1 and ALDH2 is 66% for the coding regions of their cDNAs and 69% at the protein level. No significant homology was found in their 3' untranslated regions.
Liver aldehyde dehydrogenase (ALDH; aldehyde:NAD+ oxidoreductase, EC 1.2.1.3) is considered to play a major role in alcohol metabolism. Two major and several minor isozymes exist in the livers of mammals, including man. One of the major isozymes, ALDH1 (or E1), is of cytosolic origin, associated with a low Km for NAD and a high Km for acetaldehyde, and strongly inactivated by disulfiram. Another major isozyme, ALDH2 (or E2), is of mitochondrial origin, associated with a high Km for NAD and a low Km for acetaldehyde, and insensitive to disulfiram (1-6).
Racial differences in these two isozymes have been found between Caucasians and Orientals. All Caucasians examined thus far have both ALDH1 and ALDH2 in their livers (commonly designated "usual"). In contrast, ≈50% of Orientals have only the ALDH1 isozyme and are missing the ALDH2 isozyme (commonly designated "atypical") (7, 8). The atypical Oriental livers, however, contain a defective enzyme, with diminished activity, that is immunologically related to ALDH2 (6, 9, 10). More recently, another abnormality, an absence of active ALDH1 and instead the presence of an enzymatically inactive protein, was found in some Orientals (11). A very high incidence (50-80%) of acute alcohol intoxication in Orientals in comparison to Caucasians (about 10%) could be attributed to genetic differences in the ALDH isozymes (7).
Both ALDH1 and ALDH2 are tetrameric forms (2-5), and the two isozymes do not contain a common subunit (6). The amino acid sequences of human and horse ALDH1 were recently reported (12, 13), but the sequence of ALDH2 is unknown. In this paper, we report the isolation and characterization of cDNA clones for human ALDH1 and ALDH2. The amino acid sequences of the two isozymes were deduced from their cDNA sequences and compared.
MATERIALS AND METHODS
Sequence Analysis of ALDH2. Human liver ALDH2 was purified to homogeneity from a liver autopsy sample from a Caucasian with the usual phenotype, as described (6). The tryptic peptides were isolated either by peptide mapping or by reversed-phase HPLC (10). Amino acid sequences of the peptides were determined by manual Edman degradation (14).
Preparation of Radioactive Oligonucleotide Probes. Two types of mixed icosamers, corresponding to ALDH1 amino acid sequence, and mixed tetradecamers, corresponding to ALDH2 amino acid sequence, were synthesized by a solid-phase phosphotriester method (see Fig. 1). The chemically synthesized probe was labeled at the 5' end with [γ-32P]ATP (2000-5000 Ci/mmol, 1 Ci = 37 GBq; ICN) and T4 polynucleotide kinase (Bethesda Research Laboratories) by the standard method (15).
Screening of Human Liver cDNA Expression Library with Antibody. Rabbit antibody against homogeneous ALDH1 and that against ALDH2 were partially purified through (NH4)2SO4 precipitation and DEAE-cellulose chromatography as described (6). The human liver cDNA library, constructed by inserting the cDNA copies of poly(A)+ mRNA from human liver into the EcoRI site of bacteriophage vector λgt11 via synthetic linkers (16), was provided by S. L. C. Woo (Howard Hughes Medical Institute, Houston, TX). The cDNA library was screened by the antibody probe method described in a previous paper (17).
Identification of Fusion Protein from Lysates of Induced Recombinant Lysogens. The method used for isolation of fusion protein is essentially identical to that described previously (17). Fusion protein was detected with anti-ALDH1 or anti-ALDH2 antibody.
Analysis of Recombinant λgt11 Inserts by Southern Blot Hybridization with Oligonucleotide Probes. The DNA preparations were separated by electrophoresis in agarose gels and transferred to nitrocellulose filters (15). Hybridization overnight with 5'-end-labeled oligonucleotide mixture (10^6 cpm/ml) was at 56°C for the ALDH1 probes and at 38°C for the ALDH2 probes. The filters were subsequently washed three times in 0.9 M NaCl/0.09 M sodium citrate, pH 7, at the hybridization temperature for 15 min, dried at room temperature, and autoradiographed for 2 days at -70°C with two intensifying screens.
Subcloning of Phage Inserts and Preparation of Cloned DNA. The individual insert cDNA was ligated to the EcoRI-digested pUC13 vector. Competent Escherichia coli TB1 cells were transformed with the ligated DNA by the calcium chloride procedure (15). Plasmid DNA was prepared from the transformed cells grown in a large-scale liquid culture and was purified by gradient centrifugation in cesium chloride (15).
Restriction Endonuclease Maps. Restriction mapping of the cDNA insert of λgt11 was performed by single or double endonuclease digests of the recombinant phage or the purified insert DNA. Restriction enzymes (Bethesda Research Laboratories and Boehringer Mannheim) were used under the conditions recommended by the suppliers.
DNA Sequence Analysis. The restriction fragments were subcloned in phage M13mp18 and -mp19. DNA sequence was determined by the dideoxynucleotide chain-termination method (18).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Abbreviations: ALDH, aldehyde dehydrogenase; ADH, alcohol dehydrogenase; bp, base pair(s).
*To whom reprint requests should be addressed.
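The idea of a mixed oligonucleotide probe can be illustrated with a short Python sketch. It is not the authors' procedure: the helper names (`codons_for`, `mixed_probe`, `reverse_complement`) are hypothetical, the standard genetic code is assumed, and the final codon is truncated so that the five-residue ALDH2 peptide Gly-Asn-Pro-Phe-Asp yields 14-mers. Under these assumptions the enumeration gives 64 sequences, matching the size of the tetradecamer mixture reported in this paper.

```python
from itertools import product

# Standard genetic code in the usual compact TCAG ordering (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_for(aa):
    """Return every DNA codon that encodes the one-letter amino acid `aa`."""
    return sorted(codon for codon, residue in CODON_TABLE.items() if residue == aa)


def mixed_probe(peptide, length):
    """Enumerate the distinct coding-strand oligos of `length` nt for `peptide`."""
    choices = [codons_for(aa) for aa in peptide]
    # Truncating to `length` drops the last wobble base; the set removes duplicates.
    return sorted({"".join(combo)[:length] for combo in product(*choices)})


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Gly-Asn-Pro-Phe-Asp in one-letter code, as 14-mers.
coding = mixed_probe("GNPFD", 14)
probes = [reverse_complement(s) for s in coding]  # strands complementary to mRNA
print(len(probes))  # 64
```

The degeneracy is simply the product of the codon counts at each position (4 x 2 x 4 x 2, with the truncated Asp codon contributing a single GA dinucleotide), which is why choosing a peptide rich in two-codon amino acids keeps the mixture small.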
RESULTS AND DISCUSSION
Amino Acid Sequences of Tryptic Peptides of ALDH2. To obtain structural information, which is essential for synthesis of an oligonucleotide probe and confirmation of cloned cDNA, we determined amino acid sequences of tryptic peptides of ALDH2 (Table 1). Although some of these peptides isolated by peptide mapping or by HPLC were contaminated with other peptides, their sequences can still be aligned, because of the difference in amounts of major peptides and minor contaminant peptides. From the amino acid sequence data obtained, a peptide Gly-Asn-Pro-Phe-Asp was selected as a probe site, and a mixture of 64 different tetradecadeoxynucleotides encoding this sequence was prepared (Fig. 1).
Table 1. Amino acid sequences of tryptic peptides obtained from human liver ALDH2
Peak or spot  Amino acid sequence
Peptides from HPLC
III           Ala-Val-Lys; Tyr-His-Gly-Lys [5]
VII           Ser-Val-Ala-Arg [16]; Met-His-Gly-Lys
VIII          Val-Pro-Glx-Lys [34]; Ser-Tyr-Thr-Arg; Met-Asn-Ala-Ser-His-Arg
IX            Cys-Leu-Arg [3]
XII           Leu-Leu-Asn-Arg
XX            Tyr-Tyr-Ala-Gly-Trp-Ala-Asp-Lys [4]; Gly-Thr-Leu-Glu-Leu-Glu-Val-Asx-Lys
XXI           Val-Val-Gly-Asx-Pro-Phe-Asx-Ser-Lys [19]; Leu-Ala-Asx-Leu-Ile-Glx-
XXIII         Ile-Leu-Gly-Tyr-Ile-Asn-Thr-Gly-Lys [21]
XXX           Ala-Ala-Phe-Pro-Thr-Gly-Ser-Pro-Ala-; Lys-Thr-Glx-Glx-Leu-Val-Asx-Leu-Arg
XXXIII        Thr-Ile-Pro-Ile-Asp-Gly-Asx-Phe-Phe-Ser-Tyr- [6]
XXXIV         His-Glu-Pro-Val-Gly-Val-Cys-Gly-Gln-Ile-Ile- [7]; Thr-Phe-Val-Glx-Glx-Asp-Ile-Tyr-Asp-Glu-Phe-Val- [15]
XXXVII        Glu-Ala-Gly-Phe-Pro-Pro-Gly-Val-Val-Asn-Ile-Val-Pro- [10]
XXXVIII       Glu-Glu-Ile-Phe-Gly-Pro-Val-Met-Glx-Ile-Leu- [25]; Ser-Pro-Asn-Ile-Ile-Met-Ser-Asp-Ala-Asp-Met- [14]
XLV           Ala-Asx-Tyr-Leu-Ser- [30]
Peptides from peptide map
II            Met-Asn-Ala-Ser-His-Arg
III           Met-Ser-Gly-Ser-Gly-Arg [31]
IV            Leu-Gly-Pro-Ala-Leu-Ala-Thr-Gly-Asx-Val-Val-Val- [8]; Asp-Leu-Asp- [29]
V             Val-Thr-Leu-Glu-Leu-Gly-Gly-Lys [13]; Gly-Tyr-Phe-Ile-Glx-Pro-Thr-Val-Phe-Gly-Asx-Val- [24]
VI            Val-Val-Gly-Asn-Pro-Phe-Asp-Ser- [19]
VII           Val-Pro-Gln-Lys [34]
VIII          Leu-Leu-Cys-Gly-Gly-Gly-Ile-Ala-Ala-Asp- [23]
              Val-Ala-Phe-Thr-Gly-Ser-Thr-Glx- [11]
The sequences in bold type are compatible with the sequence deduced from cDNA; the bracketed numbers correspond to the singly underlined sequences in Fig. 5.
[Fig. 1 diagram: ALDH1 probe I, peptide H2N-Glu-Phe-Ala-His-His-Gly-Val-COOH; ALDH1 probe II, peptide H2N-Gln-Gly-Gln-Cys-Cys-Ile-Ala-COOH; ALDH2 probe, peptide H2N-Gly-Asn-Pro-Phe-Asp-COOH; each peptide is shown with its mRNA sequence (5' to 3') and the complementary cDNA probe (3' to 5').]
FIG. 1. Synthetic oligonucleotides used as probes. The ALDH1 probes each consisted of 64 different icosamers corresponding to amino acid sequences of a tryptic peptide of ALDH1 which has been implicated in the disulfiram binding site (12, 19). The ALDH2 probe consisted of 64 different tetradecamers corresponding to amino acid sequence of a tryptic peptide obtained from human ALDH2. N, all four possible deoxynucleotides.
Screening of cDNA Clones for ALDH1 and ALDH2 with Antibody Probes. The human cDNA library was screened successively four times with either anti-ALDH1 antibody or anti-ALDH2 antibody, neither of which cross-reacts with the host E. coli and λgt11 phage proteins. Five clones that gave a strong signal with anti-ALDH1 but no or a weak signal with anti-ALDH2 were isolated. Two clones, λALDH1-1 and λALDH1-2, which had longer inserts than the others, were selected for the examination of fusion proteins. NaDodSO4/PAGE (not shown) revealed that the molecular sizes of the fusion proteins that reacted with anti-ALDH1 antibody were about 40 kDa (for λALDH1-1) and 20 kDa (for λALDH1-2) larger than E. coli β-galactosidase.
A total of 39 clones that gave a strong signal with anti-ALDH2 but no or a weak signal with anti-ALDH1 were isolated. Three clones, λALDH2-21, λALDH2-22, and λALDH2-36, were selected for examination of fusion proteins. NaDodSO4/PAGE of the lysates prepared from the induced recombinant lysogens indicated that the molecular size of the fusion proteins that reacted with anti-ALDH2 antibody was about 40 kDa larger than E. coli β-galactosidase (Fig. 2), suggesting that the recombinants contained cDNA for human ALDH2.
FIG. 2. Detection of fusion protein by NaDodSO4/PAGE and immunoblotting. Proteins accumulating in induced lysogens containing λgt11, λALDH2-21, λALDH2-22, and λALDH2-36, respectively, were compared. (A) Gel stained with Coomassie blue. (B) Replica nitrocellulose filter stained with anti-ALDH2 antibody and peroxidase-conjugated goat antiserum against rabbit IgG. Lane 1: E. coli β-galactosidase (a), bovine serum albumin (b), ALDH1 (c), ALDH2 (d), and egg albumin (e). Lane 2: lysate from host E. coli BNN103. Lanes 3-6: lysates from BNN103 lysogenized with λgt11, λALDH2-21, λALDH2-22, and λALDH2-36, respectively.
Southern Blotting Analysis and Subcloning of the ALDH1 and ALDH2 Recombinant Clones. DNA from λALDH1-1 was digested with EcoRI and subjected to Southern blot hybridization with ALDH1 oligonucleotide probes. The 1.6-kilobase-pair (kbp) insert hybridized strongly with the synthetic probes (not shown). The λALDH1-1 insert was subcloned in vector pUC13 (Fig. 3A). DNA prepared from the subclone was digested with Pst I or doubly digested with Pst I and EcoRI and subjected to Southern blot hybridization. The probe site was located on a 0.94-kbp Pst I/EcoRI fragment (Fig. 3 B-D). The inserts of all three λALDH2 DNAs hybridized with the ALDH2 oligonucleotide probe (not shown). Insert size of λALDH2-21 and -36 was estimated as about 1.2 kbp, and that of λALDH2-22 was 1.3 kbp.
FIG. 3. Agarose gel electrophoresis and Southern blot hybridization of ALDH1 recombinant DNA. (A) Agarose gel stained with ethidium bromide. Lanes: 1, EcoRI-digested λALDH1-1 DNA; 2, EcoRI-digested plasmid pUC13 DNA with λALDH1-1 insert; 3, HindIII-digested wild-type λ phage DNA. (B-D) Agarose gel stained with ethidium bromide (B) and autoradiograms of replica nitrocellulose filters hybridized with ALDH1 probes I (C) and II (D), respectively. Lanes: a, HindIII-digested wild-type λ DNA; b, Pst I/EcoRI-digested plasmid DNA with λALDH1-1 insert; c, Pst I-digested plasmid DNA with λALDH1-1 insert.
Restriction Endonuclease Maps and Nucleotide Sequences. The restriction map of the EcoRI insert of λALDH1-1 is shown in Fig. 4. Cleavage sites for Sst I, Kpn I, Sma I, BamHI, Xba I, Sal I, and Sph I were not found in the insert. The strategy used for the sequence determination of λALDH1-1 is also outlined in Fig. 4. The nucleotide sequence and the deduced amino acid sequence are shown in Fig. 5. The cDNA sequence was verified by the data generated from both strands. It contains a coding sequence for 340 amino acid residues, a 3' noncoding region of 538 bp, and a poly(A) segment. The amino acid sequence deduced from the nucleotide sequence exactly coincides with the reported amino acid sequence from position 161 to the COOH-terminal position 500 of human ALDH1, except for one position (12). Based on our sequence data, the amino acid at position 161 should be isoleucine, whereas it was reported as valine in the protein sequence study (12). Although the possibility of errors in the sequence determination cannot be ruled out, the discrepancy could be due to
a substitution, G → A, which occurred during the ligation steps, or to genetic polymorphism at the ALDH1 locus in man.
The restriction endonuclease cleavage maps of the three clones for ALDH2 are shown in Fig. 4. The cDNA insert of λALDH2-22 contained a region of 0.9 kbp that overlapped with the insert of λALDH2-21 and -36; together, the inserts covered 1.6 kbp. Cleavage sites for HincII, Sma I, Pst I, HindIII, and Xba I were detected. Six Sau3AI sites were also located in the clones (Fig. 4). No site for BamHI, Acc I, Kpn I, Sal I, Sst I, or Sph I was found in the clones. The synthetic oligonucleotide probe site was located between the Sma I and Sau3AI sites.
The strategy employed to determine the sequences of the cDNA inserts of λALDH2-21, -22, and -36 is outlined in Fig. 4. The DNA sequences were verified by the data generated from both strands. The combined sequence of cDNA for ALDH2 derived from the clones is given in Fig. 5. The existence of a poly(A) segment at the 3' end of the combined cDNA suggested that it contained the 3' untranslated region of ALDH2 mRNA. Two of the three possible reading frames encountered in-phase termination codons at positions 105 and 215, respectively, from the 5' end of the combined sequence. The remaining reading frame encodes 399 amino acid residues before the stop codon TAA is encountered; this stop codon is followed by a 3' noncoding region of 403 nucleotides. The noncoding region does not contain the -A-A-T-A-A-A- sequence which is considered to be important for the addition of the poly(A) tail to the 3' end of mRNA (20). However, a comparable sequence, -A-T-T-A-A-A-, is located 20 bases upstream from the 3' end of the region (Fig. 5).
FIG. 4. Restriction maps of cDNA insert of λALDH1 and that of λALDH2 and sequence determination strategy. Horizontal arrows indicate the direction and extent of sequencing. Restriction endonuclease cleavage sites: A, Ava II; H, HindIII; Hc, HincII; Hp, Hpa II; P, Pst I; S, Sau3AI; Sm, Sma I; T, Taq I; and X, Xba I. Thick lines indicate the coding regions.
FIG. 5. Nucleotide sequences of cDNAs and deduced partial amino acid sequences for human ALDH1 and ALDH2. The singly underlined amino acid sequences are compatible with the ALDH2 tryptic peptides analyzed; the numbers immediately below these regions correspond to those of the tryptic peptides listed in Table 1. The doubly underlined region corresponds to the tryptic peptide which includes an amino acid substitution found in the atypical Oriental ALDH2² (10). Arrows indicate the substitution sites in the atypical gene and enzyme (10). The synthetic probe sites are boxed. Numbers at right indicate amino acid residue numbers in the partial ALDH2 sequence.
The cDNA includes the synthetic probe site. The deduced 399 amino acid sequence contained 21 regions that were compatible with the amino acid sequences of ALDH2 tryptic peptides (Table 1 and Fig. 5), strong evidence that the cDNA we obtained is for human ALDH2 isozyme. The deduced amino acid sequence at positions 12-25 from the COOH terminus is similar to that of a tryptic peptide that has been implicated in the abnormality of the atypical Oriental ALDH2 molecule (10). In atypical Oriental ALDH2², the amino acid at position 14 from the COOH terminus is lysine instead of the usual glutamic acid, and this single amino acid substitution seems to have resulted in a drastic reduction of the enzyme activity (10). There are some discrepancies between the previous amino acid sequence data (10) and the amino acid sequence deduced from the nucleotide sequence in this part of the molecule. The previous amino acid sequence data included errors, presumably due to contamination in the peptide samples and decomposition of tyrosine during acid hydrolysis.
Amino acid sequence data available for ALDH2 have been limited to sequences of two tryptic peptides of horse ALDH2, one of 15 residues and the other of 23 residues (21, 22). The subunit molecular weight of human ALDH2 was estimated to be 52,600 (6). The deduced COOH-terminal sequence of 399 amino acids thus accounts for >80% of human ALDH2.
A comparison of the 399 residues of ALDH2 and the reported amino acid sequence of human ALDH1 (12) indicated 69% homology between the two human isozymes (Fig. 5). Sixty-five out of 123 substitutions are compatible with single-base changes. The degree of homology (about 69% at the amino acid level and 66% at the coding nucleotide level) between human ALDH1 and ALDH2 is lower than the homology between human ALDH1 and horse ALDH1, which is estimated to be 91% (13). This finding is compatible with early evolutionary divergence of the cytosolic and mitochondrial isozymes.
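The homology figures quoted above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the calculation assumes an ungapped comparison over the 399 deduced ALDH2 residues, which is what the numbers in the text imply; it is an illustration, not a record of the computer analysis the authors used.

```python
def percent_identity(aligned_positions, substitutions):
    """Percentage of aligned positions that carry the same residue."""
    if aligned_positions <= 0 or not 0 <= substitutions <= aligned_positions:
        raise ValueError("substitutions must lie between 0 and aligned_positions")
    return 100.0 * (aligned_positions - substitutions) / aligned_positions


def single_base_change(codon_a, codon_b):
    """True when two codons of equal length differ at exactly one base."""
    if len(codon_a) != len(codon_b):
        raise ValueError("codons must have the same length")
    return sum(x != y for x, y in zip(codon_a, codon_b)) == 1


# Figures taken from the text: 399 compared residues, 123 substitutions,
# 65 of which are compatible with single-base changes.
identity = percent_identity(399, 123)
single_base_share = 100.0 * 65 / 123

print(f"{identity:.1f}% identical residues")               # 69.2% identical residues
print(f"{single_base_share:.1f}% single-base compatible")  # 52.8% single-base compatible
```

So 276 of 399 positions are identical, which rounds to the 69% stated, and roughly half of the replacements could have arisen from one nucleotide change each.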
It has been reported that homology between the pig mitochondrial and cytosolic aspartate aminotransferases is about 50% at the protein level (23). Thus, the rate of divergence of mitochondrial and cytosolic isozymes appears to differ substantially among enzymes.
ALDH1 is strongly inactivated by disulfiram, whereas ALDH2 is resistant to this agent (2, 4). A cysteine residue at position 302 from the NH2 terminus (position 199 from the COOH terminus) has been implicated in the disulfiram binding site of ALDH1 (12, 19). A cysteine residue is in the corresponding position in ALDH2 (position 200 in Fig. 5). However, the isoleucine residue next to this cysteine in ALDH1 is replaced by a cysteine in ALDH2; the corresponding sequences are -Gly-Gln-Cys-Cys-Ile-Ala- for ALDH1 and -Gly-Gln-Cys-Cys-Cys-Ala- for ALDH2 (see Fig. 5).
The present study, together with the previous amino acid substitution study (10), provides the exact nucleotide sequences for the usual ALDH2¹ and the atypical ALDH2² genes: -GAA-GTG-AAA-ACT-GTC-ACA- in the region for ALDH2¹, and AAA instead of GAA for ALDH2² (Fig. 5, arrows). Use of two synthetic oligodeoxynucleotide probes corresponding to these sequences would make it possible to determine genotypes of individuals by Southern hybridization analysis of DNA from their peripheral blood cells, as previously accomplished in other cases (24, 25).
Chromosomal assignments for the ALDH1 and ALDH2 loci in man are not yet known. Use of the cloned cDNAs should allow these assignments to be made.
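The relationship between the two allelic sequences quoted above can be checked with a small Python sketch. The two 18-nucleotide strings are taken from the text; everything else (the standard genetic code table, the helper names `translate` and `differences`) is an assumption introduced for illustration and does not describe the probes or hybridization conditions an actual genotyping experiment would use.

```python
# Standard genetic code in the usual compact TCAG ordering (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA string, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def differences(seq_a, seq_b):
    """List (index, a, b) for every position where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


USUAL = "GAAGTGAAAACTGTCACA"      # -GAA-GTG-AAA-ACT-GTC-ACA- (usual allele)
ATYPICAL = "AAA" + USUAL[3:]      # AAA replaces GAA (atypical allele)

print(translate(USUAL))                 # EVKTVT (Glu-Val-Lys-Thr-Val-Thr)
print(translate(ATYPICAL))              # KVKTVT (Lys-Val-Lys-Thr-Val-Thr)
print(differences(USUAL, ATYPICAL))     # [(0, 'G', 'A')]
```

A single G to A difference at the first base of the codon accounts for the Glu to Lys replacement, which is why a pair of oligonucleotides differing at that one position could in principle distinguish the two alleles.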
Racial differences were also observed in alcohol dehydrogenase (ADH). The majority of Orientals have the "variant" ADH2² gene, producing the atypical enzyme which exhibits 100 times higher specific activity than the wild-type enzyme (26). The structural difference between the two types of the enzyme has been determined (26). Recently, a full-length cDNA for human ADH2 was cloned and sequenced, and the nucleotide difference between the wild-type ADH2¹ and variant ADH2² genes became apparent (17). Determination of the organization of ALDH and ADH genes may shed light on the possible relationship between the polymorphism of these loci and alcohol sensitivity and/or alcoholism in Caucasians and Orientals.
We are indebted to Dr. Savio L. C. Woo for allowing us to use the human liver cDNA library. We also thank Dr. G. L. Forrest for providing the pUC13 vector, Dr. T. O. Baldwin for providing E. coli strain TB1, and Dr. B. Simmer and Mr. T. Hankapiller for computer analysis. This work was supported by Public Health Service Grants HL-29515 and AA05763.
1. Crow, K. E., Kitson, T. M., McGibbon, A. K. H. & Batt, R. D. (1974) Biochim. Biophys. Acta 350, 121-128.
2. Eckfeldt, J., Mope, L., Takio, K. & Yonetani, T. (1976) J. Biol. Chem. 251, 236-240.
3. Eckfeldt, J. & Yonetani, T. (1976) Arch. Biochem. Biophys. 175, 717-722.
4. Greenfield, N. J. & Pietruszko, R. (1977) Biochim. Biophys. Acta 483, 35-45.
5. Kitabatake, N., Sasaki, R. & Chiba, H. (1981) J. Biochem. (Tokyo) 89, 1223-1229.
6. Ikawa, M., Impraim, C. C., Wang, G. & Yoshida, A. (1983) J. Biol. Chem. 258, 6282-6287.
7. Goedde, H. W., Harada, S. & Agarwal, D. P. (1979) Hum. Genet. 51, 331-334.
8. Teng, Y.-S. (1981) Biochem. Genet. 19, 107-114.
9. Impraim, C. C., Wang, G. & Yoshida, A. (1982) Am. J. Hum. Genet. 34, 837-841.
10. Yoshida, A., Huang, I.-Y. & Ikawa, M. (1984) Proc. Natl. Acad. Sci. USA 81, 258-261.
11. Yoshida, A., Wang, G. & Dave, V. (1983) Am. J. Hum. Genet. 35, 1117-1125.
12. Hempel, J., von Bahr-Lindström, H. & Jörnvall, H. (1984) Eur. J. Biochem. 141, 21-35.
13. von Bahr-Lindström, H., Hempel, J. & Jörnvall, H. (1984) Eur. J. Biochem. 141, 37-42.
14. Huang, I.-Y., Rubinfien, E. & Yoshida, A. (1980) J. Biol. Chem. 255, 6408-6411.
15. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).
16. Young, R. A. & Davis, R. W. (1983) Proc. Natl. Acad. Sci. USA 80, 1194-1198.
17. Ikuta, T., Fujiyoshi, T., Kurachi, K. & Yoshida, A. (1985) Proc. Natl. Acad. Sci. USA 82, 2703-2707.
18. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
19. Hempel, J., Pietruszko, R., Fietzek, P. & Jörnvall, H. (1982) Biochemistry 21, 6834-6838.
20. Proudfoot, N. J. & Brownlee, G. G. (1976) Nature (London) 263, 211-214.
21. von Bahr-Lindström, H., Sohn, S., Woenckhaus, C., Jeck, R. & Jörnvall, H. (1981) Eur. J. Biochem. 117, 521-526.
22. Hempel, J., von Bahr-Lindström, H. & Jörnvall, H. (1983) Pharmacol., Biochem. Behav. 18, 117-121.
23. Kagamiyama, H., Sakakibara, R., Tanase, S., Morino, Y. & Wada, H. (1980) J. Biol. Chem. 255, 6153-6159.
24. Conner, B. J., Reyes, A. A., Morin, C., Itakura, K., Teplitz, R. L. & Wallace, R. B. (1983) Proc. Natl. Acad. Sci. USA 80, 278-282.
25. Kidd, V. J., Wallace, R. B., Itakura, K. & Woo, S. L. C. (1983) Nature (London) 304, 230-234.
26. Yoshida, A., Impraim, C. C. & Huang, I.-Y. (1981) J. Biol. Chem. 256, 12430-12436.