Human Gastric Cathepsin E Gene - The Journal of Biological Chemistry

Vol. 267, No. 3, Issue of January 25, pp. 1609-1614, 1992. Printed in U.S.A.

THE JOURNAL OF BIOLOGICAL CHEMISTRY

© 1992 by The American Society for Biochemistry and Molecular Biology, Inc.

Human Gastric Cathepsin E Gene

MULTIPLE TRANSCRIPTS RESULT FROM ALTERNATIVE POLYADENYLATION OF THE PRIMARY TRANSCRIPTS OF A SINGLE GENE LOCUS AT 1q31-q32*

(Received for publication, May 21, 1991)

Takeshi Azuma‡§, Wanguo Liu§, Douglas J. Vander Laan§, Anne M. Bowcock¶, and R. Thomas Taggart§‖

From the §Department of Molecular Biology and Genetics, Wayne State University School of Medicine, Detroit, Michigan 48201, the ‡Department of Preventive Medicine, Kyoto Prefectural University of Medicine, Kyoto 602, Japan, and the ¶Department of Pediatrics, The University of Texas Southwestern Medical Center, Dallas, Texas 75235

Genomic clones containing portions of the human cathepsin E (CTSE) gene were isolated from cosmid and λ recombinant libraries. The coding regions, the 5'- and 3'-untranslated regions, and the exon-intron boundaries of the CTSE gene were identified by sequence and hybridization analysis. The size and placement of the nine exons found in the 17.5-kilobase CTSE gene were highly conserved relative to other aspartic proteinases and provided additional evidence that these proteinases are derived from a common ancestral gene. Segregation and linkage analysis of two informative restriction fragment length polymorphisms (MspI and DraI) indicated that there is a single human CTSE locus located at chromosome 1q31-q32 which is closely linked to the renin gene. Three CTSE transcripts (3.6, 2.6, and 2.1 kilobases) were identified in gastric fundic and antral mucosa poly(A+) RNA, and these appeared identical in size and relative abundance to those contained in poly(A+) RNA from cultured gastric adenocarcinoma cell lines containing CTSE. Sequence analysis of cDNA clones and comparison with the 3'-flanking untranslated region in genomic clones provided evidence that alternative polyadenylation of the primary transcript resulted in the 2.6- and 2.1-kilobase transcripts, which constituted greater than 95% of CTSE transcripts found in the stomach.

* This work was supported in part by a grant from the Center for Molecular Biology at Wayne State University, Detroit, Michigan and Grant-in-Aid 02857109 for encouragement of young scientists from the Ministry of Education, Science, and Culture, Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M8284 7.

‖ To whom correspondence should be addressed: Dept. of Molecular Biology and Genetics, 3216 Scott Hall, Wayne State University School of Medicine, 540 East Canfield, Detroit, MI 48201. Tel.: 313-577-5753.

¹ The abbreviations used are: CTSE, cathepsin E; PGA, pepsinogen A; PGC, pepsinogen C; CTSD, cathepsin D; CR1, complement component-C3b receptor; CR2, complement component-C3d receptor; RFLP(s), restriction fragment length polymorphism(s); bp, base pair(s); kb, kilobase(s); SDS, sodium dodecyl sulfate; lod, logarithm of the odds for linkage scores.

Cathepsin E (CTSE)¹ is one of four immunologically distinct groups of aspartic proteinases found in human gastric mucosa (1). In addition to CTSE, the gastric aspartic proteinases include pepsinogen A (PGA), pepsinogen C (PGC), and cathepsin D (CTSD). Unlike PGA and PGC, CTSE is an intracellular proteinase found in highest concentration in the superficial epithelial cells of the stomach (2-4). CTSE has also been localized to several lymphoid-associated tissues, including thymus, spleen, macrophages, and polymorphonuclear lymphocytes (5-15). The function of this proteinase is not known; however, several findings about CTSE relative to other aspartic proteinases suggest that it has an intracellular rather than extracellular function, i.e. a role in polypeptide processing or gastric mucosal protection. The inherent problems for characterization of CTSE have involved difficulty in purification of the protein and the inability to assay its activity distinct from other aspartic proteinases. The lower concentration of this proteinase compared with that of the pepsinogens also contributed to the relatively slow progress toward characterization of the enzyme. Recently, we reported the isolation and sequence analysis of a series of CTSE cDNA clones from a gastric adenocarcinoma cDNA library using a set of complementary 18-base oligonucleotide probes specific for the conserved first active site region of all known aspartic proteinases (16). Sequence analysis revealed an 1188-bp open reading frame that exhibited 59% sequence identity with human PGA. The predicted CTSE amino acid sequence included a 379-residue proenzyme (Mr = 40,883) and a 17-residue signal peptide. Three transcripts (3.6, 2.6, and 2.1 kb) were identified in poly(A+) RNA isolated from a gastric adenocarcinoma cell line producing the enzyme; these were absent in a nonproducing subclone. The multiple transcripts were a unique finding among aspartic proteinases and potentially reflected the existence of multiple genes, rearranged genes, alternative initiation of transcription, alternative splicing, or alternative polyadenylation.
Since CTSE has been demonstrated in 54% of gastric cancers (4), we sought to determine the origins and distribution of the multiple transcripts in poly(A+) RNA from both normal stomach and gastric adenocarcinoma cell lines. In the present study, we performed detailed sequence analysis of two partially overlapping CTSE recombinant clones isolated from two human genomic libraries. The structure of the 17.5-kb gene included nine exons that were highly conserved relative to other known aspartic proteinases both with regard to size and location within the derived protein sequence. Segregation and linkage analysis of two restriction fragment length polymorphisms (RFLPs) provided evidence that there is a single locus located at chromosome 1q31-q32 which is closely linked to renin. CTSE transcripts in normal gastric fundic and antral mucosa appeared identical in size and relative concentrations to those previously found in cultured gastric adenocarcinoma cell lines producing CTSE. Analysis of cDNA clones which contained different 3'-untranslated regions and comparison with the corresponding region of the genomic clones demonstrated that the two most abundant CTSE transcripts (2.6 and 2.1 kb) resulted from alternative polyadenylation of the primary CTSE transcript.

FIG. 1. Genomic structure and restriction map of the human CTSE gene. Restriction sites: B, BamHI; Bg, BglII; E, EcoRI; H, HindIII; Hp, HpaII; and T, TaqI. Solid boxes indicate the size and positions of the nine exons. White boxes include the exon number, the length of the coding sequence, and the enzyme used to map the exon within each EcoRI fragment. SMP1 was a cosmid clone containing 5'-untranslated region and exons 1-8; SMP13 was a bacteriophage λ clone containing exons 4-9 and 3'-untranslated region.

EXPERIMENTAL PROCEDURES

Identification and Analysis of Recombinant Cosmid and λ Genomic Clones-High molecular weight DNA was purified from human lymphocytes, partially digested with MboI, and fractionated by sucrose density gradient centrifugation. The 30-40-kb genomic DNA fragments were recovered and cloned into the BamHI site of the cosmid vector pWE15 (Stratagene). The cosmid library was transfected into Escherichia coli strain NM554 after in vitro packaging and screened with a human CTSE cDNA probe (AGS 402; Ref. 16). Nitrocellulose filters containing denatured cosmid DNA were prehybridized at 42 °C for 3 h in 50% formamide, 25 mM phosphate (pH 6.5), 500 µg/ml sonicated salmon testes DNA, and 5 × Denhardt's solution. Filters were hybridized overnight in a solution containing 50% formamide, 10% dextran sulfate, 20 mM sodium phosphate (pH 6.5), 250 µg/ml sonicated salmon testes DNA, 2 × Denhardt's solution, and CTSE cDNA AGS 402 probe (1 × 10^6 cpm/ml) labeled with [32P]dCTP by the random primer method of Feinberg and Vogelstein (17). Filters were rinsed several times at room temperature in a solution containing 0.1 × SSC and 0.05% SDS to remove excess hybridization solution and then washed at 55-62 °C for 1 h in the same solution to provide for specific detection of the CTSE sequences. Positive clones were picked and rescreened at low colony density to purify clones. Cosmid DNA was purified from overnight culture by a modified alkaline lysis procedure (18). A human bacteriophage λ library was screened with an EcoRI fragment (SMP1-6) derived from the SMP1 cosmid clone which contained exons 7 and 8 (Fig. 1). The library was constructed by cloning partial Sau3AI-digested human DNA into the BamHI site of an EMBL3 vector (19). Positive clones were purified by replating at low density on E. coli LE392. Filters containing denatured bacteriophage were prehybridized and hybridized using the conditions described above except that the SMP1-6 probe was used. The washing conditions were as described above.
Preparation of phage DNA followed standard procedures. Restriction maps were constructed using double digests and partial digests. Digested DNA was separated on 0.5 or 1.2% agarose gels and transferred to nitrocellulose following the procedure of Southern (20). The blots were hybridized with human CTSE cDNA probes, oligonucleotide probes specific for coding sequences, or oligonucleotide primers flanking the vector cloning regions (T3 and T7, Stratagene). Restriction fragments were subcloned into M13mp18, pUC18, or Bluescript plasmid vectors for sequence analysis. Dideoxy chain termination sequencing procedures employing 35S-dATP were performed on both strands of denatured plasmid inserts using T7 DNA polymerase and oligonucleotides specific for the regions flanking the cloning site and for CTSE coding sequences (21).

Northern Analysis-RNA was purified from human gastric fundic and antral mucosa following the procedure of Rall et al. (22). One gram of pulverized tissue was homogenized for 1 min at room temperature in 10 ml of lysis buffer containing 5 M guanidinium thiocyanate, 100 mM Tris-HCl (pH 7.5), 10 mM EDTA (pH 8.0), and 14% (v/v) β-mercaptoethanol. After dissolution, the homogenate was clarified by centrifugation (10,000 × g, 4 °C, 10 min). The RNA was precipitated (16-24 h, 4 °C) by adding 5.5 volumes of 4 M LiCl. The RNA and DNA were then pelleted by centrifugation (10,000 × g, 4 °C, 30 min). The pellet was mechanically resuspended in 5 ml of a solution of 3 M LiCl containing 4 M urea and then mixed vigorously for 1 min. The volume of the suspension was adjusted to 30 ml, and the precipitate was collected by centrifugation as described above. Each RNA pellet was finally dissolved in 2.5 ml of 1% SDS. RNA was purified by sequential extraction of the aqueous phase with phenol:chloroform:isoamyl alcohol (25:24:1, v/v) and chloroform:isoamyl alcohol (24:1, v/v), followed by precipitation with ethanol. Poly(A+) RNA was purified by chromatography of RNA on oligo(dT)-cellulose (23). Poly(A+) RNA isolated from gastric fundic and antral mucosa was resuspended in 10 mM potassium phosphate (pH 7.5), 1 M glyoxal, and 50% dimethyl sulfoxide and subjected to electrophoresis in 1% agarose gels (24). Denatured DNA size standards and RNA were transferred to nitrocellulose and hybridized to a random primer [32P]dCTP-labeled CTSE cDNA probe (AGS 402; Ref. 16) which contained the entire CTSE coding sequence and portions of the 5'- and 3'-flanking sequence.

Isolation and Characterization of cDNA Recombinant Clones-CTSE clones were isolated from a gastric adenocarcinoma cDNA library which employed the ZAP II vector (Stratagene) as described previously (16). The phage were packaged and recombinants were selected by plating on E. coli strain XL-1 (Stratagene). Phage were purified by successive rounds of plating and hybridization and were converted to the corresponding plasmid form utilizing the plasmid excision procedure provided by the manufacturer (Stratagene; Bluescript II).
Southern Blot Analysis-Genomic DNA was purified from whole blood, digested with restriction enzymes following the manufacturers' conditions, and subjected to electrophoretic separation in agarose gels. The following enzymes did not detect RFLPs when screened against a panel of six unrelated Caucasians: AccI, AluI, ApaI, BstEII, BstNI, ApaLI, AvaI, BamHI, BanII, BclI, BglI, BglII, Bsp1286, EcoRI, EcoRV, HaeII, HaeIII, HgiAI, HincII, HindIII, HinfI, HphI, MboI, NdeI, ScaI, SstI, StuI, SspI, PstI, PvuII, TaqI, XbaI, and XmnI. Genomic DNA was denatured and transferred to nitrocellulose (20) and prehybridized at 42 °C for 3 h in 50% formamide, 25 mM sodium phosphate (pH 6.5), 500 µg/ml sonicated salmon testes DNA, and 5 × Denhardt's solution. Filters were hybridized overnight in a solution containing 50% formamide, 10% dextran sulfate, 20 mM sodium phosphate (pH 6.5), 250 µg/ml sonicated salmon testes DNA, 2 × Denhardt's solution, and a CTSE cDNA probe AGS 402 (1 × 10^6 cpm/ml) labeled with [32P]dCTP by the random primer method as described previously (16). Filters were rinsed several times at room temperature in a solution containing 0.1 × SSC and 0.05% SDS to remove excess hybridization solution and then washed at 55-62 °C for 1 h in the same solution for detection of the CTSE sequences.

Linkage Analysis-The DraI RFLP detected with the CTSE cDNA clone AGS 402 was typed on the forty large reference families available through the Centre d'Etude du Polymorphisme Humain (CEPH) Collaboration, Paris, France. Linkage analysis between CTSE and a set of 10 previously mapped chromosome 1 loci was performed with the LINKAGE package and the database of genotypes for these loci provided by the CEPH database (Version IV; Refs. 25-28).

RESULTS AND DISCUSSION
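The two-point lod scores referred to under "Linkage Analysis" can be illustrated with a minimal sketch. It assumes the simplest textbook case of phase-known, fully informative meioses, in which the likelihood of observing k recombinants among n meioses at recombination fraction θ is proportional to θ^k (1 - θ)^(n - k). This is not the algorithm of the LINKAGE package, which evaluates pedigree likelihoods with unknown phase; the function names and the counts in the usage note are hypothetical and are not data from this study.

```python
import math


def lod_score(theta: float, n_meioses: int, n_recombinants: int) -> float:
    """Two-point lod score Z(theta) for phase-known, fully informative meioses.

    Z(theta) = log10[ L(theta) / L(0.5) ], with L(theta) proportional to
    theta**k * (1 - theta)**(n - k). Hypothetical helper for illustration only.
    """
    n, k = n_meioses, n_recombinants
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        # A single observed recombinant excludes theta = 0 (Z = -infinity).
        return -math.inf if k > 0 else n * math.log10(2.0)
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(n_meioses: int, n_recombinants: int):
    """Grid search (step 0.001) for the theta that maximizes the lod score."""
    grid = [i / 1000 for i in range(501)]
    best = max(grid, key=lambda t: lod_score(t, n_meioses, n_recombinants))
    return best, lod_score(best, n_meioses, n_recombinants)
```

For example, `max_lod(50, 3)` (hypothetical counts) returns the recombination fraction with the highest lod score together with that score, and `lod_score(0.0, 10, 1)` returns negative infinity because one recombinant is incompatible with θ = 0.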

Isolation and Characterization of Genomic Clones-Seven recombinant clones containing a portion of the CTSE gene were identified in the cosmid library using a CTSE cDNA probe. Restriction enzyme digestion and hybridization analysis with cDNA and oligonucleotide probes for different coding sequences indicated that the seven clones were identical. The clones were missing a portion of the coding sequence corresponding to exon 9 and the 3'-untranslated region. We were not successful in the identification of any CTSE clones in an earlier attempt which employed a different cosmid library (pCV105; Ref. 29). These results raised the possibility that the CTSE gene might contain sequences that were unstable or difficult to isolate from a recombinant library employing a cosmid vector. An EcoRI fragment of one CTSE cosmid clone that contained exons 7 and 8 (SMP1; Fig. 1) was used to identify two genomic clones that contained an overlapping portion of the CTSE gene from a bacteriophage λ library constructed with the EMBL3 vector (19). Restriction analysis and alignment of the cosmids and bacteriophage λ clones (SMP13 and SMP15) were performed using oligonucleotide probes for the human CTSE coding sequence. Sequence analysis was performed with one cosmid clone and one λ clone (SMP1 and SMP13, respectively; Fig. 1).

The CTSE gene structure consisted of 9 exons and occupied 17.5 kb (Fig. 2). The positions of the exon-intron junctions were determined by comparison with the human CTSE cDNA clones reported previously (16). The coding sequence identified by analysis of the genomic clones was identical to the composite cDNA sequence. The cosmid clone (SMP1) contained 5'-untranslated region and exons 1-8, and the bacteriophage clone (SMP13) contained exons 4-9 and 3'-untranslated region (Fig. 1). The sequence of the 5'-untranslated region of the CTSE gene was different from that found in other aspartic proteinases (Fig. 2). At positions -114 to -119 the sequence TATCAT was identified as homologous to the consensus TATA box. The position of the CTSE TATA box was different from that found in other aspartic proteinases: TATATAA at -82 for PGA (30), TATAAAA at -93 for PGC (18, 31), and TATAAA at -29 for renin (32-34). The hexamer sequence, "AATCTT," found in the promoter regions of the human PGA and PGC was not found in the CTSE gene. There were several short palindromic sequences in the 5'-untranslated region as indicated in Fig. 2. The differences found in the CTSE 5'-untranslated region when compared with other aspartic proteinases were expected based upon the differential expression of CTSE and pepsinogens in gastric and extragastric tissues.

The 17.5-kb CTSE gene (Fig. 2) was larger than other human aspartic proteinase genes (PGA, 9.4 kb; PGC, 10.7 kb; and renin, 11.7 kb). During initial characterization of the clones, we employed the alignment of the polypeptide sequences to select oligonucleotide primers for sequencing and mapping of the CTSE exons (35, 36). We compared the predicted amino acid sequences of human CTSE and other human aspartic proteinases after alignment for maximal identity (Fig. 3). The highest degree of identity was obtained with PGA, 59% identity in the nucleotide sequence and 53% identity in amino acid sequence. The location of each exon-intron junction was conserved relative to the maximal alignment of the sequences in PGA and PGC (Fig. 3). Although the structure of the CTSD gene has not yet appeared, sequence analysis of CTSD genomic clones indicates that it also has nine exons with placement of exon-intron junctions conserved relative to that found in other aspartic proteinases.² The highest degree of identity with other aspartic proteinases included the regions surrounding the two active site aspartic groups (exons 3 and 7; Fig. 3). The primary and tertiary structures of aspartic proteinases all have a similar bilobal structure that accommodates these two catalytically important and strictly conserved aspartyl residues in the active site region. The CTSE gene structure provided additional evidence for the hypothesis that the aspartic proteinases are derived from a common ancestral gene (37, 38).

² J. Chirgwin, personal communication.

FIG. 2. Nucleotide sequence of the coding regions and the 5'- and 3'-untranslated regions of the human CTSE gene. The predicted amino acid sequence is shown above the corresponding nucleotides. A 17-residue signal peptide (residues -17 to -1, underlined) was predicted upon comparison with the signal peptides of other aspartic proteinases. The positions of the two active site aspartyl groups are indicated with stars at residues 79 and 264, respectively. The TATA box, TATCAT, within the promoter region (-114 to -119) and the polyadenylation signal sequence, AATAAA, in the 3'-untranslated region are underlined. The polyadenylation sites detected in cDNA clones (Table II) are indicated by vertical arrowheads (residues 2109 and 2537, respectively). In the 5'-untranslated region several short sequences (CAGGG, AGAGGA, GAGAAG, GGGAAAG, CTGGGCA, and GCCCTC) were repeated and three pairs of palindromic sequences (CAGGG and CCCTG, CTTGGG and CCCAAG, AGAGGA and TCCTCT) were noted between residues -304 and -18.
[Sequence listing not reproduced. Gap sizes marked between the exon sequences, in order of appearance: 0.74, (illegible), 1.15, 5.0, 2.0, 1.2, 0.32, and 3.8 kb.]

FIG. 3. Comparison of human CTSE, PGA, and PGC amino acid sequences. The predicted amino acid sequences of CTSE (upper line), PGA (middle line), and PGC (lower line) are displayed with the single letter code after alignment for maximal homology (35, 36). The numbering refers to the sequence of CTSE. Regions of identity are indicated (*) between the aligned sequences. The positions of the active site aspartyl groups are indicated with vertical arrowheads. The locations of the eight intron-exon junctions are indicated with vertical arrows. Six of seven Cys residues in the CTSE sequence were located in conserved locations of aspartyl proteinases that form intrachain disulfide bonds (residues 92, 97, 255, 259, 297, and 334). A unique Cys residue located at position 43 within the putative activation peptide is postulated to form an interchain disulfide bond and thereby determine the native dimer form of the proenzyme.
[Alignment not reproduced.]

FIG. 4. Southern blot analysis of genomic DNA from five unrelated individuals digested with MspI. The nitrocellulose blot was hybridized with a CTSE cDNA probe (AGS 402). Individuals 1 and 5 are homozygous for the smaller fragment allele (0.73 kb), individuals 2 and 4 are homozygous for the larger fragment allele (1.18 kb), and individual 3 is heterozygous for the RFLP.

FIG. 5. Southern blot analysis of genomic DNA from eight unrelated individuals digested with DraI. The nitrocellulose blot was hybridized with a CTSE cDNA probe (AGS 402). Individuals 1 and 2 are homozygous for the larger fragment allele (2.8 kb), individuals 6 and 8 are homozygous for the smaller fragment allele (2.0 kb), and individuals 3-5 and 7 are heterozygous for the RFLP.

Segregation and Linkage Analysis of CTSE Gene-RFLPs were detected by MspI and DraI digestion of genomic DNA followed by hybridization with a CTSE cDNA probe (Figs. 4 and 5). The polymorphisms exhibited Southern hybridization patterns consistent with that expected for codominant expression of two alleles at a single gene polymorphism (18, 39). Segregation analysis of 10 informative families including heterozygous and homozygous individuals for the MspI polymorphism indicated that CTSE was a single gene locus. The allelic frequencies of the MspI alleles (0.73 and 1.18 kb) were determined in 35 unrelated individuals (0.76 and 0.24, respectively). The DraI polymorphism exhibited two alleles (Fig. 5; 2.0 and 2.8 kb) and also exhibited codominant segregation in the CEPH families that were analyzed for linkage with chromosome 1 markers (28). Each of these RFLPs will be useful for linkage studies of markers located in the region. Two-point linkage analysis provided significant lod (logarithm of the odds for linkage) scores between CTSE and five genes and five DNA markers located on chromosome 1 as shown in Table I. No recombination was observed between CTSE and
Human Cathepsin E Gene

1613

TABLE I Lad scores for linkage of CTSE to chromosome1 loci in CEPH families Two-point linkage analysis of the CTSEpolymorphism with 10 genes and DNA segments located on chromosome 1 (DlS#; Ref. 28). Lod scores (2)a t individual recombination fractions (0) reflect the logarithm of the odds for linkage a t a specific value of recombination and can vary from -m to +m. Lod scores of -2.0 and +3.0 are generally accepted as values excluding linkage (1OO:l odds for nonlinkage) or indicating linkage (1OOO:l odds for linkage), respectively. Recombination between CTSE andeach of the chromosome 1markers was observed (lod score of -m at 0 = 0) except for the two complement receptor genes (CR1 and CR2). The recombination fraction with the highest associated lod score ( 2 )is the maximum likelihood estimate, 0, of the recombination fraction (43). REN, renin; DAF, complement decay-accelerating factor; SNRPE, small nuclear ribonucleoprotein polypeptide E. Gene/ locus

DlS65 DlS53 DlS52 REN CR1 CR2 DAF SNRPE DlS58 DlS70

Recombination fraction (0) 0

0.01

0.05

0.10

0.20

0.30

0.40

6

2

-m

3.02 5.78 1.20 9.48 3.26 6.01 5.67 3.33 5.25 2.59

5.10 6.55 3.46 11.15 3.07 6.10 8.55 3.67 5.35 4.71

5.38 6.33 3.96 10.89 2.81 5.67 8.64 3.50 4.88 5.07

4.58 5.15 3.63 8.92 2.24 4.42 6.87 2.78 3.59 4.46

3.09 3.50 2.61 6.10 1.61 2.88 4.32 1.81 2.20 3.15

1.27 1.56 1.27 2.80 0.87 1.22 1.79 0.73 0.98 1.47

0.09 0.05 0.12 0.06 0.00 0.03 0.08 0.05 0.03 0.10

5.38 6.55 3.98 11.18 3.31 6.16 8.76 3.67 5.43 5.07

“m “m “m

3.31 5.31 -m

-m -m

-m

two complement receptor genes (CR1 and CR2). This places CTSE at 1q31-q32, since we previously localized the CTSE gene to human chromosome band 1q31 by in situ hybridization to metaphase chromosomes (40). Despite the tight linkage of CTSE and the complement receptor genes, no linkage disequilibrium was detected between CTSE/DraI and the SacI RFLP of CR1 (r = 0.09, χ² = 1.17, 1 degree of freedom) or between CTSE/DraI and CR2 (r = 0.09, χ² = 1.02, 1 d.f.). Six percent recombination was detected between CTSE and renin, and the one-lod (logarithm of the odds for linkage scores) unit confidence interval of the linkage was 0.016-0.15 (27).

Characterization of CTSE Transcripts-In our previous report, three RNA transcripts (3.6, 2.6, and 2.1 kb) were identified in the poly(A+) RNA isolated from a cultured gastric adenocarcinoma cell line that produced CTSE (16, 41, 42). The multiple CTSE transcripts were a unique finding for aspartic proteinases and potentially may have reflected a distribution specific for gastric adenocarcinoma. We therefore attempted to determine the origin and diversity of CTSE transcripts in poly(A+) RNA from human gastric fundic and antral mucosa. The distribution and size of CTSE transcripts observed in fundic and antral gastric mucosa samples were indistinguishable from those found in the gastric adenocarcinoma cell line (Fig. 6).

Our previous analysis of eight CTSE cDNA clones revealed a composite sequence (2158 bp) which included an open reading frame of 1188 bases, along with portions of the 5'- (49 bp) and 3'-untranslated (921 bp) regions (Table II). A polyadenylation signal sequence, AATAAA, was identified 337 nucleotides downstream from the stop codon (Fig. 2). In order to determine the origin of the multiple transcripts we performed analysis of additional gastric adenocarcinoma CTSE cDNA clones. We screened CTSE cDNA clones containing poly(A) tails and found one (AGS 452) that contained a larger 3'-untranslated region (residues 1189-2537) when compared with the composite sequence (Table II; residues 1189-2109). Two different positions of the poly(A) tail were noted among the clones analyzed: 582 residues (AGS 435) and 1002 residues (AGS 452) from the polyadenylation signal (AATAAA). The location of the polyadenylation site was determined by comparison between the 3'-untranslated regions of the cDNA sequence and the corresponding gene sequence. We estimated by analysis of Northern blots that the 2.1- and 2.6-kb transcripts comprised greater than 95% of the total (data not shown). Based upon these findings we concluded that the two major CTSE transcripts (2.6 and 2.1 kb) result from alternative polyadenylation. Primer extension analysis of poly(A+) RNA did not reveal any heterogeneity in the 5'-untranslated region (data not shown). The origin of the largest transcript (3.6 kb) remains to be determined. The 3.6-kb transcript may reflect alternative splicing or alternative polyadenylation of the primary transcript; however, additional analysis will be necessary to determine the content of this transcript.

FIG. 6. Northern analysis of poly(A+) RNA isolated from human gastric fundic mucosa (slot 1) and antral mucosa (slot 2) after electrophoresis of denatured RNA in 1% agarose. The sizes of hybridizing transcripts to cDNA probes for human CTSE are indicated in kilobases. Three transcripts (3.6, 2.6, and 2.1 kb) were detected in both gastric mucosal regions.

TABLE II
Composition of individual CTSE cDNA clones relative to the composite CTSE sequence previously reported (16)
Assignment of residue numbers was relative to the coding sequence.

Clone        Length   5'-Flanking   Coding               3'-Flanking
Compositeᵃ   2158     -49 to -1     1-1188               1189-2109
435ᵇ         2003     -36 to -1     1-785,ᶜ 928-1188     1189-2109
452ᵇ         2550     -13 to -1     1-1188               1189-2537

ᵃ Based upon the analysis of eight cDNA clones (16).
ᵇ Contained a poly(A) tail.
ᶜ Contained an internal 142-residue deletion.

Acknowledgments-We wish to thank Chiping Qian for technical assistance. Dr. Graeme Bell at the University of Chicago Howard Hughes Medical Institute generously provided assistance in the identification of the bacteriophage genomic clones.

REFERENCES
1. Samloff, I. M., Taggart, R. T., Shiraishi, T., Branch, T., Reid, W. A., Heath, R., Lewis, R. W., Valler, M. J., and Kay, J. (1987) Gastroenterology 93, 77-84


2. Samloff, I. M. (1969) Gastroenterology 57, 659-669
3. Samloff, I. M. (1982) Gastroenterology 82, 26-33
4. Shiraishi, T., Samloff, I. M., Taggart, R. T., and Stemmermann, G. N. (1988) Dig. Dis. Sci. 33, 1466-1472
5. Tarasova, N. I., Szecsi, P. B., and Foltmann, B. (1986) Biochim. Biophys. Acta 880, 96-100
6. Jupp, R. A., Richards, A. D., Kay, J., Dunn, B. M., Wyckoff, J. B., Samloff, I. M., and Yamamoto, K. (1988) Biochem. J. 254, 895-898
7. Yamamoto, K., Katsuda, N., and Kato, K. (1978) Eur. J. Biochem. 92, 499-508
8. Muto, N., Arai, K. M., and Tani, S. (1983) Biochim. Biophys. Acta 746, 61-69
9. Muto, N., and Tani, S. (1985) Biochim. Biophys. Acta 843, 114-122
10. Puizdar, V., Lapresle, C., and Turk, V. (1985) FEBS Lett. 185, 236-238
11. Yonezawa, S., Tanaka, T., Muto, N., and Tani, S. (1987) Biochem. Biophys. Res. Commun. 144, 1251-1256
12. Yonezawa, S., Tanaka, T., and Miyauchi, T. (1987) Arch. Biochem. Biophys. 256, 499-508
13. Muto, N., Yamamoto, M., and Tani, S. (1987) J. Biochem. (Tokyo) 101, 1069-1075
14. Yamamoto, K., Ueno, E., Uemura, H., and Kato, Y. (1987) Biochem. Biophys. Res. Commun. 148, 267-272
15. Muto, N., Yamamoto, M., Tani, S., and Yonezawa, S. (1988) J. Biochem. (Tokyo) 103, 629-632
16. Azuma, T., Pals, G., Mohandas, T. K., Couvreur, J. M., and Taggart, R. T. (1989) J. Biol. Chem. 264, 16748-16753
17. Feinberg, A. P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13
18. Pals, G., Azuma, T., Mohandas, T. K., Bell, G. I., Bacon, J. A., Samloff, I. M., Walz, D. A., Barr, P. J., and Taggart, R. T. (1989) Genomics 4, 137-145
19. Frischauf, A. M., Lehrach, H., Poustka, A., and Murray, N. (1983) J. Mol. Biol. 170, 827
20. Southern, E. (1975) J. Mol. Biol. 98, 503-517
21. Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H., and Roe, B. A. (1980) J. Mol. Biol. 143, 161-178
22. Rall, L. B., Scott, J., and Bell, G. I. (1987) Methods Enzymol. 147, 239-248
23. Aviv, H., and Leder, P. (1972) Proc. Natl. Acad. Sci. U. S. A. 69, 1408-1412
24. Williams, J. G., and Mason, P. J. (1985) in Nucleic Acid Hybridization (Hames, B. D., and Higgins, S. J., eds) pp. 139-160, IRL Press, Washington, D. C.
25. Lathrop, G., Lalouel, J. M., Julier, C., and Ott, J. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 3443-3446
26. Lathrop, G., Lalouel, J. M., Julier, C., and Ott, J. (1985) Am. J. Hum. Genet. 37, 482-498
27. Conneally, P. M., Edwards, J. G., Kidd, K. K., Lalouel, J. M., Morton, N. E., Ott, J., and White, R. (1985) Cytogenet. Cell Genet. 40, 356-359
28. Dracopoli, N. C., O'Connell, P., Elsner, T. I., Lalouel, J. M., White, R. L., Buetow, K. H., Nishimura, D. V., Murray, J. C., Helms, C., Mishra, S. K., Donis-Keller, H., Hall, J. M., Lee, M. K., King, M.-C., Attwood, J., Morton, N. E., Robson, E. B., Mahtani, M., Willard, H. F., Royle, N. J., Patel, I., Jeffreys, A. J., Verga, V., Jenkins, T., Weber, J. L., Mitchell, A. L., and Bale, A. E. (1991) Genomics 9, 686-700
29. Lau, Y. F., and Kan, Y. W. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 5225-5229
30. Sogawa, K., Fujii-Kuriyama, Y., Mizukami, Y., Ichihara, Y., and Takahashi, K. (1983) J. Biol. Chem. 258, 5306-5311
31. Hayano, T., Sogawa, K., Ichihara, Y., Fujii-Kuriyama, Y., and Takahashi, K. (1988) J. Biol. Chem. 263, 1382-1385
32. Hobart, P. M., Fogliano, M., O'Conner, B. A., Schaefer, I. M., and Chirgwin, J. M. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 5026-5030
33. Hardman, J. A., Hort, Y. J., Catanzaro, D. F., Tellam, J. T., Baxter, J. D., Morris, B. J., and Shine, J. (1984) DNA (N. Y.) 3, 457-468
34. Miyazaki, H., Fukamizu, A., Hirose, A., Hayashi, T., Hori, H., Ohkubo, H., Nakanishi, S., and Murakami, K. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 5999-6003
35. Wilbur, W. J., and Lipman, D. J. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 726-730
36. Doolittle, R. F. (1986) Of Urfs and Orfs, pp. 3-60, University Science Books, Mill Valley, CA
37. Pearl, L., and Blundell, T. (1984) FEBS Lett. 174, 96-101
38. Blundell, T., Sibanda, B. L., and Pearl, L. (1983) Nature 304, 273-275
39. Johnson, M. P., Azuma, T., Boudi, F. H., and Taggart, R. T. (1989) Nucleic Acids Res. 17, 10147
40. Couvreur, J. M., Azuma, T., Miller, D. A., Rocchi, M., Mohandas, T. K., Boudi, F. A., and Taggart, R. T. (1990) Cytogenet. Cell Genet. 53, 137-139
41. Barranco, S. C., Townsend, C. M., Quraishi, M. A., Burger, N. L., Nevill, H. C., Howell, K. H., and Boerwinkle, W. R. (1983) Invest. New Drugs 1, 117-127
42. Barranco, S. C., Townsend, C. M., Casartelli, C., Macik, B. G., Burger, N. L., Boerwinkle, W. R., and Gourley, W. K. (1983) Cancer Res. 43, 1703-1709
43. Ott, J. (1985) Analysis of Human Genetic Linkage, 1st Ed., The Johns Hopkins University Press, Baltimore
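
The clone coordinates reported in Table II can be cross-checked with a little arithmetic. The sketch below is an illustration only and is not part of the original analysis: the names `segment_length`, `clone_length`, and `CLONES` are hypothetical, and the coordinates are transcribed from Table II on the assumption that each range is inclusive. It sums the segment lengths of each clone, compares them with the reported clone lengths, and derives the size of the internal deletion in AGS 435 and the difference in 3'-untranslated length between AGS 435 and AGS 452.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers


def segment_length(start: int, end: int) -> int:
    """Length of an inclusive residue range (assumes it does not span 0)."""
    return end - start + 1


def clone_length(segments: List[Segment]) -> int:
    """Total number of residues covered by a list of inclusive ranges."""
    return sum(segment_length(start, end) for start, end in segments)


# Coordinates transcribed from Table II; the second item is the reported length.
CLONES: Dict[str, Tuple[List[Segment], int]] = {
    "composite": ([(-49, -1), (1, 1188), (1189, 2109)], 2158),
    "AGS 435": ([(-36, -1), (1, 785), (928, 1188), (1189, 2109)], 2003),
    "AGS 452": ([(-13, -1), (1, 1188), (1189, 2537)], 2550),
}

for name, (segments, reported) in CLONES.items():
    print(f"{name}: computed {clone_length(segments)}, reported {reported}")

# Residues missing between coding positions 785 and 928 in AGS 435.
deletion_435 = 928 - 785 - 1

# Difference in 3'-untranslated length between the two poly(A)-tailed clones.
utr3_difference = segment_length(1189, 2537) - segment_length(1189, 2109)
print(f"deletion in AGS 435: {deletion_435} residues")
print(f"3'-UTR length difference: {utr3_difference} residues")
```

Under these assumptions the computed lengths agree with the lengths listed in Table II. The 3'-untranslated regions of the two clones differ by roughly 0.4 kb, which is of the same order as the size difference between the 2.1- and 2.6-kb transcripts; the comparison is approximate because poly(A) tail lengths are not included.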