Primary Structure of the Human Heparan Sulfate Proteoglycan from ...

THE JOURNAL OF BIOLOGICAL CHEMISTRY

Vol. 267, No. 12, Issue of April 25, pp. 8544-8557, 1992. Printed in U.S.A.

© 1992 by The American Society for Biochemistry and Molecular Biology, Inc.

Primary Structure of the Human Heparan Sulfate Proteoglycan from Basement Membrane (HSPG2/Perlecan)

A CHIMERIC MOLECULE WITH MULTIPLE DOMAINS HOMOLOGOUS TO THE LOW DENSITY LIPOPROTEIN RECEPTOR, LAMININ, NEURAL CELL ADHESION MOLECULES, AND EPIDERMAL GROWTH FACTOR*

(Received for publication, December 31, 1991)

Alan D. Murdoch, George R. Dodge, Isabelle Cohen, Rocky S. Tuan‡, and Renato V. Iozzo§

From the Department of Pathology and Cell Biology and the Jefferson Cancer Institute and the ‡Departments of Orthopaedic Surgery and of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, Pennsylvania 19107

We have determined the complete nucleotide and deduced amino acid sequence of the major protein core of the human heparan sulfate proteoglycan HSPG2/perlecan of basement membranes. Eighteen overlapping cDNA clones comprise 14.35 kilobase pairs (kb) of contiguous sequence with an open reading frame of 13.2 kb. The mature protein core, without the signal peptide of 21 amino acids, has a Mr of 466,564. This large protein is composed of multiple modules homologous to the receptor of low density lipoprotein, laminin, neural cell adhesion molecules, and epidermal growth factor. Domain I, near the amino terminus, appears unique for the proteoglycan since it shares no significant homology with any other proteins. It contains three Ser-Gly-Asp sequences that could act as attachment sites for heparan sulfate glycosaminoglycans. Domain II is highly homologous to the LDL receptor and contains four repeats with perfect conservation of all 6 consecutive cysteines. Next is domain III, which shares homology to the short arm of laminin A chain and contains four cysteine-rich regions intercalated among three globular domains. Domain IV, the largest module with >2000 residues, contains 21 repeats of the immunoglobulin type as found in neural cell adhesion molecule. Near the beginning of this domain, there is a stretch of 29 hydrophobic amino acids which could allow the molecule to interact with the plasma membrane. Domain V, similar to the carboxyl-terminal globular G-domain of laminin A and to the related protein merosin, contains three globular regions and four EGF-like repeats. In situ hybridization and immunoenzymatic studies show a close association of this gene product with a variety of cells involved in the assembly of basement membranes, in addition to being localized within the stromal elements of various connective tissues. Our studies show that this proteoglycan is present in all vascularized tissues and suggest that this unique molecule has evolved from the utilization of modular structures with adhesive and growth regulatory properties.

Tissue homeostasis and the remodeling of organs during development, repair and neoplastic growth are influenced by specific interactions between the cellular elements and the surrounding extracellular matrix. Pivotal roles are played by proteoglycans, which are some of the most complex and multivalent molecules present in mammalian tissues (1). Our laboratory has extensively investigated the biosynthesis, post-translational modifications, and cellular expression of the major heparan sulfate proteoglycan from human colon and colon carcinoma cells (2-7). When colon carcinoma cells are cultured as monolayers, they synthesize a unique proteoglycan which is closely associated with the plasma membrane and the pericellular microenvironment (2-4). This proteoglycan contains a protein core of ~400 kDa (4), which is highly glycosylated with both O-linked oligosaccharides and numerous heparan sulfate side chains (5), some of which are totally unsulfated (6). One interesting feature of the colon carcinoma cell proteoglycan is its post-translational modification with myristate and palmitate, two long-chain fatty acids that add to the complexity of this macromolecule and could be involved in membrane targeting and hydrophobic interactions (6). The microvillar surface of colon carcinoma cells reacts intensely with a murine polyclonal antiserum (2) which was originally raised against the proteoglycan isolated from the basement membrane-producing EHS¹ tumor (8). Subsequent studies (4) revealed several immunological and structural features shared by the human and murine proteoglycan species. They include: (i) similar size (~400 kDa) of the protein core following heparitinase digestion; (ii) ability of the anti-EHS antiserum to immunoprecipitate a human ~400-kDa precursor protein; (iii) evidence for a precursor-product relationship between the ~400-kDa and the fully glycosylated proteoglycan using pulse-chase experiments; and (iv) detection by Western blotting of a similar product which was present only following heparitinase digestion of the immunoprecipitated proteoglycan (5).

We have recently isolated and characterized two overlapping cDNA clones from a human colon library (9). These human clones were ~85% homologous, at both the nucleotide and amino acid level, to the murine clones encoding a laminin-like domain of the EHS tumor protein core (10). Using human/rodent somatic cell hybrids and Southern blotting, we also localized the human HSPG2 gene to the telomeric region of the short arm of chromosome 1, a finding that has been confirmed by other laboratories (11, 12).

In this report, we describe the complete primary structure of HSPG2 protein core. This complex macromolecule of 467 kDa combines repeating modules homologous to the LDL receptor, laminin A chain, N-CAM, and EGF. The human HSPG2 differs from the mouse in that it contains a much larger N-CAM region, it lacks the cell-binding sequence RGD, and it contains at least six potential glycosaminoglycan attachment sites. In situ hybridization and immunoenzymatic studies showed a close association of this gene product with a variety of cells involved in the assembly of basement membranes, in addition to being found within the stromal elements of various connective tissues. The results suggest that this multidomain proteoglycan is expressed by nearly all the vascularized tissues and that it may acquire a tissue-specific function by alternative splicing of the various domains or by post-translational modifications. It is likely that this chimeric molecule has evolved from the utilization of modular structures with adhesive and growth regulatory properties.

* This work was supported by National Institutes of Health Grants CA-39481 and CA-47282 (to R. V. I.) and HD-15822 and USDA 88-37200-3746 (to R. S. T.) and by a Wellcome Trust Research Travel Grant (to A. D. M.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M85289.

§ Recipient of a Faculty Research Award FRA-376 from the American Cancer Society. To whom correspondence and reprint requests should be addressed: Dept. of Pathology and Cell Biology, Rm. 249, Jefferson Alumni Hall, Thomas Jefferson University, 1020 Locust St., Philadelphia, PA 19107. Tel.: 215-955-2208; Fax: 215-923-2218.

¹ The abbreviations used are: EHS, Engelbreth-Holm-Swarm; LDL, low density lipoprotein; N-CAM, neural cell adhesion molecule; EGF, epidermal growth factor; kb, kilobase pair(s); "perlecan," this term has been recently assigned to the mouse species because of its beaded appearance on electron microscopy of isolated spread molecules (21). Because we have no such information regarding the human species, we refer to it as HSPG2, the official name given by the Human Genome Mapping Nomenclature Workshop Committee (9).

EXPERIMENTAL PROCEDURES

Materials-All the reagents were of molecular biology grade. Radionucleotides [³²P]dCTP (~3000 Ci/mmol) and [³⁵S]dATP (~1000 Ci/mmol) were obtained from Amersham Corp.

cDNA Libraries and Screening Strategy-Four different human cDNA libraries were screened to isolate the clones which together comprise the entire cDNA sequence for the protein core of the human heparan sulfate proteoglycan HSPG2: 1) a randomly primed colon λgt11 cDNA library (Clontech); 2) an oligo(dT)/randomly primed cDNA library prepared from mRNA of a fibroblast cell line (ATCC, CRL 1262) kindly provided by Dr. Mon-Li Chu; 3) an oligo(dT)/randomly primed cDNA library prepared from mRNA of a human amnion cell line referred to as WISH cells (ATCC, CCL 25); and 4) an oligo(dT)/randomly primed cDNA library from human keratinocyte mRNA (both kindly provided by Dr. Jouni Uitto). The fibroblast and WISH cell cDNA libraries were cloned into the EcoRI site of λ ZAP II (Clontech), whereas the colon and keratinocyte cell cDNA libraries were in conventional λgt11. The fibroblast and WISH libraries were screened extensively, whereas the keratinocyte library was used only for the 5' region of the cDNA and found to be negative. The first human clone (HS-1), a 1.1-kb insert encoding a portion of the laminin-like domain of the HSPG2 protein core, has been characterized previously (9). The HS-1 insert was labeled by the random priming method (13) and used to screen at least 10⁶ recombinant phage from the various libraries. Subsequent clones were obtained by performing repetitive screenings with polymerase chain reaction-generated probes of 150-200 base pairs that were determined from the most 5'- or 3'-ends of the new clones. Isolated plaques remaining positive after tertiary or quaternary screenings were "rescued" by the automatic excision process (14), which resulted in a pBluescript SK plasmid containing the cDNA insert. The procedure for "rescue" was followed as outlined by the manufacturer (Stratagene). The positive clones identified in the colon λgt11 library were subcloned into the EcoRI site of pGEM-3Z (Promega) or pBluescript (Stratagene).

DNA Sequencing and Computer Analysis-Plasmids were sequenced by a modified dideoxynucleotide chain termination method (15) using either polylinker primers T3 and T7 or synthetic oligonucleotide primers (9). At least 4 kb of the 5' cDNA sequence was also confirmed by comparison with the exonic sequence of two human cosmid clones we have recently isolated and partially characterized.² Ambiguities were resolved by modifying sequencing reactions or electrophoretic conditions to enhance DNA sequences proximal or distal to the primers. Alignment of nucleotide sequences and comparisons with EMBL and NBRF data bases were performed utilizing the programs contained in PC/GENE release 6.6 (Intelligenetics) including CD-ROM data base (release 5) and the FASTA and MULTALIN programs contained in the GCG package of the Jefferson Cancer Institute.

² I. Cohen and R. V. Iozzo, manuscript in preparation.

In Situ Hybridization and Immunoenzymatic Staining-Gene expression of HSPG2 was determined using previously described protocols (16, 17). A number of human tissues, including placenta, colon, skin, uterus, prostate, bladder, ovary, and skeletal muscle, were fixed and processed as described before (17). The HS-1 plasmid insert was biotin-labeled by nick translation (18) and visualized histochemically with a streptavidin-alkaline phosphatase procedure (16, 17). For immunoenzymatic staining, frozen sections of various surgically obtained human tissues were reacted with a monoclonal antibody HS42 directed against the basement membrane proteoglycan from human placenta (19), an antibody that recognizes the human HSPG2 (see "Results"). Immunoenzymatic labelling of HS42 monoclonal antibody was achieved by using immune complexes of alkaline phosphatase and anti-alkaline phosphatase monoclonal antibodies. Amplification of the signal was obtained by using two layers of bridging antibodies and anti-phosphatase antibodies (20). Bound phosphatase molecules were visualized using new fuchsin as substrate. Additional experimental details are provided in the text and the legends to figures.

RESULTS

Isolation and Characterization of Overlapping cDNA Clones-Initial screening of a human colon cDNA library with the HS-1 insert (9) gave a number of relatively small clones which overlapped to a great extent. To circumvent this problem, we screened three other libraries from fibroblast, amnion cell, and keratinocyte, as well as a human lymphocyte genomic (cosmid) library.² Over 100 clones were obtained and 18 were fully sequenced to provide the complete nucleotide sequence of HSPG2 protein core. A schematic representation of the overlapping clones is shown in Fig. 1. Clones 1 and 2 are the two inserts previously described by us (9) and their sequence was identical in all the 5' and 3' overlapping cDNAs. Subsequent clones were obtained by rescreening several human libraries with 5' or 3' fragments generated by polymerase chain reaction. This strategy yielded a number of overlapping clones with several new clones that significantly extended 5' and 3' (Fig. 1).

FIG. 1. Schematic representation of the overlapping cDNA clones encoding the 467-kDa protein core of human heparan sulfate proteoglycan HSPG2. The 18 cDNA clones are represented by thin lines in the upper portion of the figure. Clones 1 and 2 (labeled by a star) were characterized previously (9). The dotted line in clone 185 indicates a recombinant sequence which likely occurred during plasmid rescue. The full-length cDNA is represented by the filled bar with the start codon of the first methionine (ATG) and the stop codon (TAG) indicated. The untranslated 5' and 3' regions are shown by the unfilled bar. The size in kilobases (kb) is shown at the bottom. For additional details, see Fig. 2.

Nucleotide and Deduced Amino Acid Sequence-The complete nucleotide and deduced amino acid sequences are shown in Fig. 2. The entire sequence contains 14,356 nucleotides with an open reading frame of 13,173 nucleotides and a deduced precursor protein of 4391 amino acids (Mr = 468,789). The sequence starts with 80 base pairs of a 5'-untranslated region which is enriched in guanine and cytosine (GC = 89%). The first methionine is followed by a highly hydrophobic segment of 20 amino acids that constitutes the signal peptide. The position of the first methionine and the amino acid sequence of the signal peptide are very similar to those of the murine species (21). Prediction of the eukaryotic secretory signal sequence gives a potential cleavage site between residues 21 and 22, with valine as the first amino acid, the same amino acid found in the mouse sequence (21). This cleavage site conforms to the -3, -1 rule of von Heijne (22). The mature protein core, without the signal peptide of 21 amino acids, is composed of 4370 residues which encode a protein with a calculated Mr of 466,564. Some of the physicochemical parameters of this protein, as predicted by computer analysis, include an isoelectric point of 6.0 and a net charge of -78.6 at pH 7.0. The protein core contains 187 cysteines, which are for the most part conserved between the human and the murine species (21). There are 10 potential N-glycosylation sites, which are randomly distributed along the protein core, and a zinc finger motif, between residues 761 and 785, at the same location as that found in the mouse (21). The human protein core contains a total of 53 Ser-Gly repeats, putative attachment sites for glycosaminoglycans. These SG repeats are distributed randomly throughout the sequence. However, a cluster of three Ser-Gly-Asp (SGD) sequences is found only in the amino-terminal region of the protein, between residues 65 and 78 (Fig. 2).
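The isoelectric point and net charge quoted above were obtained with the PC/GENE package, whose parameters are not given in this section. As a rough illustration of how such values follow from amino acid composition alone, the sketch below sums Henderson-Hasselbalch partial charges over the ionizable side chains and the two termini, then locates the pH of zero net charge by bisection. The pKa table, the function names, and the toy peptide are all assumptions made for illustration, so the output should not be expected to reproduce the published figures.

```python
# Assumed textbook-style pKa values; published tables differ slightly,
# and these are NOT the values used by PC/GENE.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.0}


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge of a peptide at a given pH."""
    # Free amino terminus (positive) and free carboxyl terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA["N_term"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA["C_term"] - ph))
    for residue in seq.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH at which net_charge crosses zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge falls monotonically as pH rises.
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical acidic peptide, not taken from the HSPG2 sequence.
toy_peptide = "DDEE"
toy_charge = net_charge(toy_peptide, 7.0)
toy_pi = isoelectric_point(toy_peptide)
```

Applied to a full-length sequence, the same two functions would give a composition-based estimate comparable in kind, though not necessarily in value, to the figures reported by the authors.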
These SGD repeats, which are flanked by hydrophobic residues, correspond at least in part to the proposed consensus sequence E/D-GSG-E/D found in the proteoglycan versican (23). These sequences and surrounding amino acids are fully conserved in the murine species (21) and have also been found in the protein cores of mouse (24) and human (25) syndecan, in glypican (26), in the NG2 proteoglycan (27), and in collagen type IX (28). The human protein core contains three additional sequences of Ser-Gly-X-Gly, a consensus sequence previously established for a group of small proteoglycans (29). One such sequence is located at residue 2995 and the other two near the carboxyl-terminal region of the protein core, at residues 3933 and 4179, respectively (Fig. 2). Interestingly, the sequences of two peptides of 18 amino acids each derived from the human HSPG protein core of fibroblasts (30) were identical to our deduced amino acid sequence contained within residues 1379-1398 and residues 2841-2860, respectively (Fig. 2). Furthermore, glutamic acid was the amino acid preceding these two peptide sequences, in agreement with the fact that these two peptides were generated by V8 protease (30), which cleaves after a glutamic or aspartic acid residue. Similar sequences were also found in a cDNA encoding the basement membrane proteoglycan from the EHS tumor (10) and a human fibrosarcoma cell line (12). Two Leu-Arg-Glu (LRE) sequences, potential mediators of motor-neuron attachment (31), were found near the carboxyl terminus (Fig. 2). In contrast to the murine protein core, which contains the cell-binding motif Arg-Gly-Asp (RGD) (32), the human protein core contained no such sequence. Three different computer programs (33-35) contained in the PC/GENE package predicted one to four hydrophobic stretches in addition to the signal peptide (Table I). Specifically, one region, contained between residues 2007 and 2034, was predicted as a transmembrane domain by all three methods (33-35).
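The short motifs surveyed above (Ser-Gly, Ser-Gly-Asp, Ser-Gly-X-Gly, Leu-Arg-Glu, Arg-Gly-Asp, and potential N-glycosylation sites) can each be located by a simple pattern search over the single-letter sequence. The following is a minimal sketch of such a scan, not the procedure the authors used; the function name, the pattern table, and the toy sequence are hypothetical, and the N-glycosylation pattern assumes the usual Asn-X-Ser/Thr sequon with X other than proline.

```python
import re

# Patterns in single-letter code; "." stands for any residue.
MOTIF_PATTERNS = {
    "SG": r"SG",             # Ser-Gly dipeptide
    "SGD": r"SGD",           # Ser-Gly-Asp tripeptide
    "SGXG": r"SG.G",         # Ser-Gly-X-Gly tetrapeptide
    "LRE": r"LRE",           # Leu-Arg-Glu
    "RGD": r"RGD",           # Arg-Gly-Asp cell-binding motif
    "N-glyc": r"N[^P][ST]",  # assumed sequon: Asn-X-Ser/Thr, X not Pro
}


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of every motif occurrence."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead keeps overlapping occurrences from being skipped.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits


# Hypothetical toy sequence for illustration only; it is NOT the HSPG2 sequence.
toy_sequence = "AEDSGDDLGSGDLGSGDFNFTRSGAGLREK"
toy_hits = find_motifs(toy_sequence)
```

Run against the full deduced sequence, a scan of this kind would yield positions that could be compared with the motif locations marked in Fig. 2.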
If this is correct, the protein core of the HSPG2 proteoglycan could be intercalated in the plasma membrane via this hydrophobic region. Future experiments need to establish whether this hydrophobic domain is utilized by different cells and whether it may be required for intracellular transport and membrane targeting. It is noteworthy, however, that the HSPG2 from colon carcinoma cells binds avidly to hydrophobic matrices and requires relatively high concentrations of detergent to be displaced from octyl-Sepharose (36). At the 3'-end of the molecule, there is a stop codon (TAG) followed by 1.1 kb of 3'-untranslated region (Fig. 2). A typical polyadenylation signal AATAAA is separated from the poly(A) tail by 16 nucleotides.

Structural Model and Internal Repeats: A Chimeric Molecule with Multiple Modular Units-Extensive computer analyses revealed several interesting features of this protein core. The most salient is the assembly of multiple domains with striking homology to other extracellular matrix and adhesive proteins and the presence of several internally repeated structures. A schematic representation of this model reveals five discrete domains (Fig. 3). Domain I contains a cluster of three heparan sulfate chains and appears unique for the HSPG2. Domain II is homologous to the LDL receptor. Domain IIa contains one IgG repeat. Domain III is homologous to the short arm of the laminin A chain. Domain IV contains 21 IgG repeats as in N-CAM, whereas domain V is structurally similar to the G domain of laminin A chain (i.e. the carboxyl globular domain of laminin long arm). The presence of the multiple internal repeats in domains II to V is demonstrated by homology plot analysis (37) (Fig. 4). Details of the various domains will be presented individually in the following sections.

Domain I: Unique to Heparan Sulfate Proteoglycan (Amino Acids 22-193)-Immediately after the signal peptide, there is a domain of 172 amino acids that contains the three glycosaminoglycan attachment sites (SGD) described above.
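The homology plot analysis cited above (37) is not described in detail in this section. As a minimal sketch of the general idea behind detecting internal repeats, the code below compares every window of a sequence with every later window and records the pairs that share enough identical positions; repeated modules then appear as runs of hits at a constant offset. The window size, the identity threshold, the function name, and the toy sequence are assumptions for illustration and are not the parameters used to produce Fig. 4.

```python
def self_dotplot(seq: str, window: int = 8, min_identity: int = 8) -> list[tuple[int, int]]:
    """Return 0-based (i, j) pairs, i < j, whose windows share
    at least min_identity identical positions."""
    hits = []
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            identical = sum(
                a == b for a, b in zip(seq[i:i + window], seq[j:j + window])
            )
            if identical >= min_identity:
                hits.append((i, j))
    return hits


# Hypothetical toy sequence: one 8-residue unit repeated after a 4-residue spacer.
toy_sequence = "CPAGEFQC" + "AAAA" + "CPAGEFQC"
toy_hits = self_dotplot(toy_sequence)
# The offset j - i of each hit gives the spacing between repeat copies.
repeat_offsets = sorted({j - i for i, j in toy_hits})
```

Lowering `min_identity` relative to `window` would tolerate substitutions between repeat copies, at the cost of more background hits; the quadratic loop is adequate for a sketch but would be slow on a sequence of several thousand residues.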
In contrast to the other domains, domain I does not contain any internal repeats, lacks cysteines, and is highly enriched in acidic residues. Consequently, domain I has an estimated isoelectric point of 4.04 and a net charge of -14.71 at pH 7.0. Extensive computer analysis of the various data banks utilizing the FASTA program (38) did not reveal significant homology with any other protein. Therefore, this domain appears to be specific for the HSPG2 protein core. The uniqueness of this domain has also been found in the murine species (21).

Domain II: Homology with the LDL Receptor (Amino Acids 194-403)-The second discrete domain of 210 amino acids contains four modular units (Fig. 5, A and B). These cysteine-rich repeats consist of about 40 residues each and exhibit striking homology to the LDL receptor (39) and to related proteins such as the glycoprotein of the Heyman nephritis antigen (40) and the human complement component C9 (41). Between the first and the second cysteine repeat there is a 45-residue-long segment enriched in proline with no homology to other proteins, as in the murine species (21). The four repeated subdomains have 6 conserved cysteines and several acidic and hydrophobic amino acids as in the LDL receptor (42). Alignment with the murine protein core sequence (Fig. 5B) reveals complete conservation of the cysteine residues and of several adjacent amino acids. Of particular interest is the conservation of the sequence DGSDE, an amino acid segment that mediates the binding of LDL to its receptor (39, 42). Immediately distal to the LDL receptor domain, there is a region of 101 residues (404-504) with one isolated IgG-like structure homologous to the repeats found in N-CAM and the immunoglobulin gene superfamily (43). This region will be


FIG. 2. Complete nucleotide and deduced amino acid sequence of human HSPG2 protein core. Shown are the nucleotide sequence (top line) and the deduced amino acid sequence in single letter code (bottom line). An open reading frame of 4391 amino acids comprises a mature (lacking the signal peptide) protein core of ~467 kDa. The signal peptide of 21 amino acids is shaded. The three amino-terminally located Ser-Gly-Asp (SGD) tripeptides, possible attachment sites for heparan sulfate chains, are doubly underlined. Three more carboxyl-located Ser-Gly-X-Gly (SGXG) tetrapeptides, proposed attachment sites for glycosaminoglycans, are shaded and double-underlined. The other Ser-Gly (SG) dipeptides are underlined. Potential N-glycosylation sites are noted by a bracket. The two stretches of amino acids which are identical to previously sequenced peptides from a human fibroblast HSPG (30) are underlined and printed in bold. A highly hydrophobic region, which may act as a transmembrane domain, is shaded and underlined (see also Table I). Two Leu-Arg-Glu (LRE) tripeptides, which can mediate motor neuron attachment (31), are shaded. The stop (TAG) codon is marked with three asterisks. The polyadenylation signal AATAAA, which is located 16 base pairs upstream of the poly(A) tail, is double-underlined and printed in bold.
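The pairing of a nucleotide line with a single-letter amino acid line in Fig. 2 follows from reading the open reading frame three bases at a time. The sketch below, offered as an illustration rather than as the authors' software, translates a DNA string with the standard genetic code and then checks the length figures quoted in the text. The 30-nucleotide fragment is read from the Domain I region of the figure as printed and should be treated as illustrative; the function and variable names are hypothetical, and the 3'-untranslated length is computed under the assumption that the 13,173-nucleotide open reading frame excludes the stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Illustrative fragment from the region carrying the SGD tripeptides.
fragment = "GGCAGTGGGGACCTGGGCAGCGGGGACTTC"
peptide = translate(fragment)

# Length bookkeeping with the numbers stated in the text.
TOTAL_NT, UTR5_NT, ORF_NT, SIGNAL_AA = 14356, 80, 13173, 21
precursor_aa = ORF_NT // 3                    # codons in the open reading frame
mature_aa = precursor_aa - SIGNAL_AA          # residues left after signal cleavage
utr3_nt = TOTAL_NT - UTR5_NT - ORF_NT - 3     # assumes the stop codon is not in ORF_NT
```

Under these assumptions the remainder comes to about 1.1 kb, which is in line with the 3'-untranslated length given in the text.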

Molecular Structure of HSPG2 Protein Core

FIG. 2-continued. [Sequence panels: nucleotides 4581-14356 and deduced amino acid residues 1501-4391.]

TABLE I
Prediction of hydrophobic domains in the protein core of HSPG2 proteoglycan
Prediction of transmembrane domains was obtained using the programs Raoargos (33), Helixmem (34), and Soap (35) contained in the PC/GENE package program. The settings for the various estimates were the same as in the original methods for predicting transmembrane hydrophobic domains. The signal peptide is consistently predicted by the three methods. The cleavage occurs before the valine according to the -3,-1 rule of von Heijne (22). Notice that only one other domain, centered in the middle of the protein core between amino acids 2010 and 2026, is recognized as a possible transmembrane domain by all three methods. The 17 amino acids predicted by the three methods are boldface.

Residues     Sequence                        Classification              Ref.
2-23         WRAPGALLLALLLHGRLLAVT           Signal peptide              33-35
515-530      SAACLPCFCFGITSVC                Transmembrane helix         33, 34
1491-1511    LLIRATFSSVPLVASISAVSL           Transmembrane helix         33
2007-2034    IATLLIPAITTADAGFYLCVATSPAGTA    Transmembrane helix         33
2007-2027    IATLLIPAITTADAGFYLCVA           Transmembrane multimeric    34
2010-2026    LLIPAITTADAGFYLCV               Transmembrane segment       35
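The consensus region noted in the Table I legend is simply the overlap of the three ranges predicted for the mid-molecule hydrophobic stretch. The sketch below makes that overlap explicit and adds a mean hydropathy score for the 17-residue segment. The names are hypothetical, and the Kyte-Doolittle scale is swapped in here as a familiar hydropathy measure; it is not one of the three programs cited in the table.

```python
# Ranges predicted for the mid-molecule hydrophobic region, as read
# from Table I (method names and reference numbers from the legend).
predictions = {
    "Raoargos (33)": (2007, 2034),
    "Helixmem (34)": (2007, 2027),
    "Soap (35)": (2010, 2026),
}

def overlap(ranges):
    """Intersection of inclusive (start, end) residue ranges, or None."""
    start = max(s for s, _ in ranges)
    end = min(e for _, e in ranges)
    return (start, end) if start <= end else None

consensus = overlap(list(predictions.values()))
consensus_length = consensus[1] - consensus[0] + 1

# Kyte-Doolittle hydropathy values. Using this scale is an assumption
# of the sketch; the paper relied on Raoargos, Helixmem, and Soap.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over a peptide string."""
    return sum(KD[aa] for aa in peptide) / len(peptide)

segment = "LLIPAITTADAGFYLCV"  # residues 2010-2026 as listed in Table I
score = mean_hydropathy(segment)
```

The overlap comes out as residues 2010-2026, which is 17 residues long, in agreement with the legend. The positive mean hydropathy is consistent with a hydrophobic segment but does not by itself establish a membrane-spanning helix.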

FIG. 3. Molecular model of the human HSPG2 protein core. This chimeric molecule is composed of five discrete domains. Domain I is specific for HSPG2 and contains a cluster of three glycosaminoglycans (dotted lines) attached to SGD sequences which are only found in this region. Domain II is highly homologous to the LDL receptor and contains four cysteine-rich repeats. Domain IIa contains one IgG-like repeat. Domain III is homologous to the short arm of laminin, with four cysteine-rich domains intercalated among three (a, b, and c) globular domains. Domain IV comprises 21 consecutive IgG-like repeats as found in N-CAM. This region differs the most from the published murine sequence where only 14 IgG-like repeats are present (21). Domain V is similar to the structure observed in the carboxyl globular domain (G-domain) of laminin A chain. It contains four EGF-like cysteine-rich repeats intercalated among three globular repeats (a, b, and c). The dotted lines in domains IV and V represent additional glycosaminoglycan chains possibly bound to SGXG sequences (23). In addition, there are several motifs, such as ten potential N-glycosylation sites dispersed throughout the molecule and a hydrophobic region in the middle of the molecule, which are not represented for clarity. For additional details, see Fig. 2 and the text.

FIG. 4. Homology plot analysis of HSPG2 protein core reveals multiple internal repeats. The complete peptide sequence shown in Fig. 2 was subjected to homology plot analysis using the DotPlot program (GCG package) to illustrate the presence of internal repeats. The window setting was 30, with a stringency of 20. A single dot, thus, indicates that >20 residues match in a searching area of 30 amino acids. The horizontal bar at the bottom indicates the various domains as in Fig. 3. Domain I, HSPG2-specific; domain II, LDL receptor-like; domain III, laminin A short arm-like; domain IV, N-CAM-like; domain V, laminin A globular domain-like.

discussed below because there are 21 additional repeats of this nature found in domain IV.

Domain III: Homology with the Short Arm of Laminin A Chain (Amino Acids 505-1676)-Homology plot analysis between the laminin A chain (44, 45) and the HSPG2 sequence shows several areas of high homology between domain III and laminin (Fig. 6A). Domain III can be further divided into seven discrete subdomains: three cysteine-free globular domains (designated IIIa, IIIb, and IIIc on Fig. 3) and four cysteine-rich repeats (Fig. 6B). The globular domains are 85% identical to those described in the murine sequence (21) and are ≈30% homologous to domain IV of the short arm of the laminin chains (44-46). The cysteine-rich repeats are 30-35% homologous to those found in domains III and V of the laminin short arms (44-46). Alignment of these repeats (Fig. 6B) shows that a full repeat contains eight conserved cysteine and several glycine residues that are likely to be involved in the bending of the loops as described for the laminin molecule (44-46). As in laminin, not all the human HSPG2 repeats contain all 8 cysteines. The first cysteine-rich subdomain (Fig. 6B) contains only two half-repeats, the second and third subdomains contain three complete and two half-repeats, whereas the fourth subdomain contains one full and one half-repeat. The similarity in the amino acid sequence and the conservation of the cysteine-rich regions between laminin A chain and the HSPG2 protein core suggest that they may have originated from a common ancestor at similar evolutionary time points as previously proposed (21).

Domain IV: Homology with the Immunoglobulin Repeats of N-CAM (Amino Acids 1677-3686)-Domain IV is the largest domain of the HSPG2 molecule, comprising 2010 residues with 21 consecutive repeats homologous to the immunoglobulin repeats of N-CAM (43). Therefore, this protein core is the gene product with the largest number of IgG-like repeats thus far described. Alignment of the 22 repeats (Fig. 7), including the isolated repeat in domain IIa (see Fig. 3), reveals near complete conservation not only of the cysteine but also of glycine and tryptophan, typically found in members of the immunoglobulin gene superfamily (43). In contrast, the murine species has only 14 IgG repeats, which makes the mouse protein core ≈67 kDa smaller than the human (21). The presence of this longer polypeptide in the human species suggests that there may be alternative splicing in this region and that the human species may assume a more extended configuration. A possible glycosaminoglycan attachment sequence is found in the 14th repeat (see Fig. 3).

Domain V: Homology with the Carboxyl-terminal G-domain of Laminin A Chain and EGF (Amino Acids 3687-4391)-The terminal module of 705 residues comprises seven discrete subdomains: three globular regions (Va, Vb, and Vc, see Fig. 3) and two duplicate repeats similar to the epidermal growth factor (47). The globular subdomains exhibit a high degree of similarity (~33%) to the globular carboxyl end G-domain of

FIG. 5. Homology plot analysis of HSPG2 protein core and the LDL receptor (A) and amino acid alignment between the human and murine species (B). The complete amino acid sequence of human LDL receptor was compared with the first 860 residues of human HSPG2 sequence (A). Homology plot analysis was done using the DotPlot program (GCG package) with window settings of 30 and stringency of 15. A single dot, thus, indicates that >15 residues match in a searching area of 30 amino acids. In B, the four sequences in domain II that are homologous to the LDL receptor are compared with the mouse sequence (21). All the 6 cysteines (shaded) are fully conserved. The consensus sequence (42) is shown at the bottom. The sequence DGSDE, a proposed binding site for LDL, is underlined. Gaps introduced to optimize alignment are shown as dashes.
Panel B repeats (residues): 1, human 194-234, mouse 195-235; 2, human 281-319, mouse 281-319; 3, human 320-359, mouse 320-359; 4, human 360-403, mouse 360-403.
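The homology plots in Figs. 4-6 rest on one rule stated in the legends: a dot is drawn where more than a set number of residues (the stringency) match within a fixed window. A minimal sketch of that rule follows. It assumes plain identity counting, as the legends describe; the actual GCG implementation may score matches differently, and the function name and toy sequences are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=30, stringency=15):
    """Return (i, j) 0-based window starts where more than `stringency`
    of the `window` aligned residues are identical."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches > stringency:
                dots.append((i, j))
    return dots

# Toy example with a small window so the result can be checked by hand.
# Both strings are hypothetical, not HSPG2 or LDL receptor sequence.
repeat = "CGSGECV"
seq_a = "AAAA" + repeat + "TTTT"
seq_b = "PP" + repeat + "KK" + repeat
dots = dot_plot(seq_a, seq_b, window=7, stringency=6)
```

In the toy case the single copy of the repeat in `seq_a` produces one dot for each of the two copies in `seq_b`. Comparing a sequence with itself in the same way yields the off-diagonal runs that mark internal repeats, as in Fig. 4.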

laminin A (44, 45) and to merosin, a laminin A homologue (48). These subdomains may fold into globular structures as found in laminin. The EGF-like repeats are composed of about 40 amino acids, each exhibiting a perfect conservation of all 6 cysteines (Fig. 8). The amino acid alignment follows a consensus sequence for the EGF type 1 repeat (47) and contains several conserved glycines that may be involved in folding. The second and fourth EGF repeats contain an SGXG sequence which may be substituted with glycosaminoglycan chains. It is not clear whether the EGF motifs can act as growth modulators or whether they have lost their original cell signaling abilities and may serve as mere "spacers" between more functionally active domains.

Detection of HSPG2 mRNA by in Situ Hybridization-To determine the precise cellular localization of HSPG2 gene expression, a number of human tissues were fixed in a manner that preserves mRNA (16) and processed for in situ hybridization using the biotin-labeled HS-1 insert (9). Specific signal, seen as a dark purple chromogen by Nomarski optics, is detected in the syncytiotrophoblasts of placenta (Fig. 9B) and within the endothelial cells of the fetal circulation (Fig. 9C). In contrast, biotin-labeled pBR322 plasmid without any human insert gives no appreciable signal (Fig. 9D). The presence


of the HSPG2 transcript was detected in generally all epithelial and endothelial cells of all the tissues analyzed, including those derived from colon, prostate, uterus, ovary, and skin (not shown). The presence of HSPG2 transcript in endothelial cells is in close agreement with previous studies which have shown a single transcript of 12-14 kb in RNA derived from human endothelial cells, human fibroblasts, colon tissue, colon carcinoma cells, and liver (9). In summary, the in situ hybridization studies reported above indicate a wide expression of the HSPG2 gene among human vascularized tissues.

Tissue Distribution of HSPG2 Protein Core as Detected by Monoclonal Antibody and Immunoenzymatic Staining-To investigate in more detail the expression of this proteoglycan, frozen sections of various human tissues, both benign and malignant, were reacted with a monoclonal antibody HS42 (19). This antibody was raised against the protein core of the human placenta proteoglycan, a large heparan/dermatan sulfate hybrid proteoglycan that binds fibronectin and is localized to the basement membrane (19). Because HS42 reacts strongly with the human HSPG2 proteoglycan and stains the colon carcinoma cells intensely (not shown), experiments were performed to investigate in detail the distribution of HSPG2 in various human organs. The technique used was

FIG. 6. Homology plot analysis of HSPG2 protein core and the laminin A chain (A) and amino acid alignment of cysteine repeats in domain III (B). The first 2600 residues of human laminin A sequence (45) were compared with the corresponding amino acid residues of human HSPG2 (A). Homology plot analysis was done using the DotPlot program (GCG package) with window setting of 30 and a stringency of 15. A single dot, thus, indicates that >15 residues match in a searching area of 30 amino acids. In B, the four cysteine repeats of domain III are aligned. Gaps introduced to optimize alignment are shown as dashes.
Panel B repeats (residues): 1, 505-530; 2, 731-933; 3, 1126-1334; 4, 1530-1670.

FIG. 7. Alignment of the 22 immunoglobulin-like repeats in domains IIa and IV of the human HSPG2 protein core. One IgG-like repeat is present in domain IIa (see Fig. 3), and 21 consecutive repeats are present in domain IV. The individual repeats vary between 85 and 100 amino acids and have extensive conservation of the cysteine (C), glycine (G), and tryptophan (W) residues, which are all shaded. Gaps introduced to optimize alignment are shown as dashes.
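As a rough consistency check on the reported size difference between the human and murine cores (≈67 kDa, with 21 versus 14 IgG-like repeats in domain IV), the number of extra repeats can be multiplied by the repeat length and a mean residue mass. The sketch below is an estimate only. It assumes the mean residue mass equals the mature core M_r (466,564) divided by its length (4391 - 21 residues); the variable names are hypothetical.

```python
MATURE_CORE_MR = 466_564         # mature protein core M_r reported in the paper
ORF_RESIDUES = 4391              # open reading frame length (Fig. 2)
SIGNAL_PEPTIDE = 21              # residues absent from the mature core

# Assumption: one average residue mass applies across the whole core.
mean_residue_mass = MATURE_CORE_MR / (ORF_RESIDUES - SIGNAL_PEPTIDE)

extra_repeats = 21 - 14          # human vs. murine IgG-like repeats, domain IV
repeat_length_range = (85, 100)  # residues per repeat (Fig. 7)

# Lower and upper estimates of the extra mass, in kDa.
low_kda, high_kda = (
    extra_repeats * n * mean_residue_mass / 1000 for n in repeat_length_range
)
```

Under these assumptions the seven additional repeats account for roughly 64-75 kDa, a range that brackets the ≈67 kDa difference cited from the murine sequence (21).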

FIG. 8. Alignment of the four epidermal growth factor-like repeats in domain V. The four EGF-like cysteine-rich domains are aligned with each other and with the consensus sequence for EGF type 1 repeats (47). Notice that there is perfect conservation of the 6 cysteine residues (shaded and numbered 1-6). The disulfide bonds are expected to form as shown: Cys1-3, Cys2-4, and Cys5-6 (47). The conserved glycines are underlined. Gaps introduced to optimize alignment are indicated as dashes.
Repeats (amino acids): 1, 3849-3888; 2, 3889-3929; 3, 4109-4147; 4, 4148-4184.
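The disulfide pattern expected for EGF type 1 repeats pairs the first cysteine with the third, the second with the fourth, and the fifth with the sixth. A small sketch of how that pairing maps onto residue positions in an aligned repeat follows; the function name and the toy repeat are hypothetical and are not one of the four HSPG2 sequences.

```python
def egf_disulfide_pairs(repeat):
    """Pair the six cysteines of an EGF-like repeat as C1-C3, C2-C4, C5-C6.

    Returns 1-based residue positions within `repeat` after alignment
    gaps ('-') are removed. Raises ValueError unless exactly six
    cysteines are present.
    """
    residues = repeat.replace("-", "")
    cys = [i + 1 for i, aa in enumerate(residues) if aa == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    return [(cys[0], cys[2]), (cys[1], cys[3]), (cys[4], cys[5])]

# Hypothetical toy repeat with alignment gaps.
toy_repeat = "ACEPN--PCLHGGTCQGTRCLCLPGFSGPRCQ"
pairs = egf_disulfide_pairs(toy_repeat)
```

The check on the cysteine count mirrors the point made in the legend: the pairing is only meaningful for repeats in which all 6 cysteines are conserved.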

highly sensitive, both because the antibody was strongly reactive at >1:1000 dilution, thus notably reducing the background, and because the signal was amplified by bridging antibodies (20). The results show a marked labeling along the basement membranes of human glomeruli (Fig. 10A), proximal and distal renal tubules (Fig. 10B), smooth muscle cells

FIG. 9. Localization of HSPG2 proteoglycan transcript by in situ hybridization. A, light microscopic view of human term placenta stained with hematoxylin and eosin. Notice a chorionic villus with well developed fetal vessels lined by syncytiotrophoblasts. B and C, sequential sections from the same block viewed by Nomarski differential interference contrast optics. Notice the presence of HSPG2 transcript (arrowheads) in the syncytiotrophoblasts and endothelial cells, respectively. D, control section where HSPG2 probe was replaced with an irrelevant plasmid. For in situ hybridization, freshly obtained tissues were fixed at -20 °C in modified Carnoy's solution, sequentially dehydrated in ethanol, impregnated with amyl acetate/paraffin, and embedded in Paraplast. Eight-µm-thick sections were digested with proteinase K, refixed in formaldehyde, and denatured with formamide prior to hybridization (17). The human HS-1 insert (9) was biotin-labeled by nick translation in the presence of biotin-dUTP (18). [3H]dATP was routinely used to monitor incorporation, whereas pBR322 plasmid was labeled as control. Routinely, hybridization conditions included 5 µg/ml of biotinylated probe which was denatured at 65 °C for 5 min before incubation, 7% dextran sulfate, 24% formamide, 37 °C, 48 h. After rinsing with 12.5% formamide, hybridization was visualized histochemically with streptavidin-alkaline phosphatase (Enzo Biochemicals) using 5-bromo-4-chloro-3-indolyl phosphate and nitro blue tetrazolium as substrates. The hybridization signals were detected as dark purple precipitates on the sections (B and C). (Bar in A = 10 µm).

of colonic tunica muscularis (Fig. 10C), arterioles and venules (Fig. 10D), and ovarian epithelium (Fig. 10E). A striking degree of reactivity is observed in the ovarian stromal cells (Fig. 10E). In contrast, in papillary carcinoma of the ovary (Fig. 10F), the HSPG2 epitopes are primarily located along the basement membranes of the neoplastic cells and blood vessels. In normal colonic mucosa (Fig. 10G), the proteoglycan is localized along the basement membrane of the epithelium and blood vessel cells. The apparent reactivity of the apical portion of the epithelial cells (Fig. 10G) is likely due to the endogenous alkaline phosphatase, since it is also present in the control sections lacking monoclonal antibody (not shown). In colon carcinoma (Fig. 10H), the proteoglycan is localized not only along the basement membrane of the tumor cells (multiple arrowheads), but also in the fibrovascular tumor stroma (single arrowhead). Intense proteoglycan reactivity is also found in the syncytiotrophoblasts and the endothelial basement membrane of the fetal vessels (not shown) of term human placenta. These results are in close agreement with the in situ hybridization studies reported above. Finally, we found immunoenzymatic reactivity along the basement membrane of skin, prostatic glands, and perisinusoidal region of liver (not shown), in agreement with a previous report (19).

Taken together, these results indicate a ubiquitous distribution of HSPG2 proteoglycan throughout the vascularized tissues and suggest that a significant contribution to the genesis of this proteoglycan is brought by stromal cells which previously have been thought to be incapable of synthesizing basement membrane constituents.

DISCUSSION

In the present study we report the complete sequence of the human HSPG2 protein core, whose partial sequence and chromosomal assignment were initially reported by this laboratory (9). This proteoglycan is one of the largest gene products in human tissues with a mature protein core of ~467 kDa. If we include the numerous post-translational modifications such as six potential glycosaminoglycan side chains of ~30 kDa each (2), the large number of O-linked oligosaccharides (3), 10 potential N-linked oligosaccharides, and two or more long-chain fatty acids (6), the complete proteoglycan could reach the size of 850 kDa as we originally estimated (2). One of the most fascinating features of this gene product is its elaborate structure which clearly is a result of the assembly


Molecular Structure of HSPG2 Protein Core

FIG. 10. Gallery of light micrographs of various human tissues following immunoenzymatic staining with monoclonal antibody against the protein core of HSPG2. A, renal cortex showing intense staining of glomerular (arrowhead) and tubular basement membranes. B, higher magnification showing intense staining of tubular basement membrane (arrowheads). C, cross-section of tunica muscularis of colon showing diffuse cellular positivity in smooth muscles, whereas the nerve plexus is essentially negative. D, vessels in perirenal fibroadipose tissue. E, ovary. F, papillary carcinoma of the ovary. G, colonic mucosa. Notice the intense staining of the epithelial basement membrane (arrowheads). The reactivity present in the apical portions of the epithelial cells is likely due to the endogenous alkaline phosphatase; it was present also in control sections (not shown). H, human colon carcinoma. Notice the presence of immunoreactive proteoglycan along the basement membrane (multiple arrowheads) and in the fibrovascular tumor stroma (single arrowhead). Frozen sections of freshly obtained surgical specimens were reacted with HS42 monoclonal antibody (19) from ascites fluid at a 1:1000 dilution, followed by rabbit anti-mouse bridging antibodies. Immunoenzymatic labeling of the monoclonal was achieved by using immune complexes of alkaline phosphatase and anti-alkaline phosphatase monoclonal antibodies. Signal enhancement was achieved by using one or two additional layers of bridging antibody and anti-phosphatase antibody (20). (A, C, and E, × 210; B, D, and F, × 450; H, × 600).


of modular units. These protein modules are involved in the control of lipoprotein metabolism, in adhesion of cells to substratum, in the interactions between cells and matrix, and in the control of cellular growth. The existence of repeated modules in vertebrate proteins is now well recognized (49) and allows a more efficient utilization of functional domains by providing synergism of functional units (49-51). At the genomic level, their molecular organization is characterized by a series of exons with the same phase at their intron-exon boundaries, which allows divergent evolution of a primordial gene by either duplication or exon shuffling (49-51). Below, we will briefly discuss the various modules and we will attempt to generate a comprehensive understanding of this complex proteoglycan.

Domain I: A Unique Region Carrying Three Heparan Sulfate Chains-This 172-amino acid-long domain appears to be unique for the HSPG2, since an extensive computer search of >32,000 protein sequences using the FASTA program revealed no significant similarity with any other protein. This domain contains a cluster of three SGD sequences which are exclusively found in this region and are fully conserved in the murine species (21). Two of these sequences conform to the consensus sequence ED-GSG-ED proposed previously for attachment of glycosaminoglycans (23) and also found in three different integral membrane proteoglycans, namely syndecan (24, 25), glypican (26), and the NG2 proteoglycan (27). Although syndecan is a hybrid proteoglycan and contains both heparan and chondroitin sulfate chains, glypican contains only heparan sulfate chains and NG2 proteoglycan only chondroitin sulfate chains. It is presently unclear whether the glycosaminoglycan specificity derives from and/or cell- tissue-specific enzymes. The location of three HS chains in the amino terminus of the human and mouse proteoglycans (21) supports electron microscopic data which have shown in the murine EHS tumor a cluster of HS chains at one end of the molecule (52-54). In addition, the human HSPG2 contains three potential HS-attachment sites SGXG, one in domain IV and two in domain V. This consensus sequence has been observed in a number of small chondroitin/dermatan sulfate-containing proteoglycans (29). In domain IV, the related sequence DSGE occurs nine times, five times more than in the mouse (21). However, most of these tetrapeptides lack the acidic-X-S-G-acidic motif necessary for glycosaminoglycan binding (23). Near the carboxyl terminus, there are two additional tetrapeptides, EGSG and GSGE, which partially fulfill the consensus sequence described above.
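The motif counts discussed above (SGD, SGXG, DSGE, and the acidic-X-S-G-acidic pattern) amount to simple pattern searches over a one-letter amino acid sequence. The following is a minimal sketch of such a scan, not the procedure used in this study: the regular-expression encodings, the function name `find_motifs`, and the toy peptide `TOY` are all illustrative assumptions, and the toy peptide is not the HSPG2 sequence.

```python
import re

# Motifs named in the text, encoded as regular expressions.
# These encodings are assumptions of this sketch (acidic = Asp or Glu, X = any residue).
MOTIFS = {
    "SGD": r"SGD",                          # Ser-Gly-Asp tripeptide
    "SGXG": r"SG.G",                        # Ser-Gly-X-Gly
    "DSGE": r"DSGE",                        # tetrapeptide discussed for domain IV
    "acidic-X-S-G-acidic": r"[DE].SG[DE]",  # acidic residues flanking X-S-G
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [1-based start positions]}, including overlapping hits."""
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead reports matches that overlap one another.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() + 1 for m in regex.finditer(sequence)]
    return hits


# Hypothetical toy peptide for illustration only; NOT the HSPG2 sequence.
TOY = "MAEDGSGDLLSGAGKDSGEAAEGSGE"

if __name__ == "__main__":
    for name, positions in find_motifs(TOY).items():
        print(name, positions)
```

Applied to a real protein core sequence, the same scan would list candidate attachment sites per domain once domain boundaries are known; a match only marks a candidate site, since actual glycosaminoglycan substitution has to be shown experimentally.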


Structural Similarities between Proteoglycans from Basement Membranes and the Attachment of Diverse Glycosaminoglycans-There is compelling evidence that, in contrast to the smaller proteoglycan species (55-56), the high molecular weight proteoglycans made by different basement membrane-producing tissues share significant structural and immunological homology (57, 58). For example, similar protein cores of ~400 kDa are found in human colon carcinoma cells (2), EHS tumor (8, 52, 53), rat yolk sac tumor (59), human placenta (19), bovine endothelial cells (60), human lung fibroblasts (30), human fibrosarcoma cells (12), and calf lens epithelial cells (61). The evidence published so far suggests that some of these related protein cores can be substituted with either heparan or chondroitin/dermatan sulfate and that in some circumstances, the proteoglycan may carry both types of chains. For example, it has been shown that both EHS tumor tissue (62) and a cell line recently established from the EHS tumor (63) contain hybrid proteoglycans. It has also been proposed (52, 62) that two forms of the basement membrane HSPG derived from the EHS tumor tissue may be distinct gene products, whereas others (64) have demonstrated a common origin of the two species from proteolytic processing of the same precursor. The results of our study indicate the likely possibility of extensive post-translational modifications of the protein core which could include both heparan and chondroitin sulfate chains. These chains would be localized at the opposite ends of the protein core and could result in similar behavior on analytical chromatography or ultracentrifugation if the cleavage sites of the protein core were close to the glycosaminoglycan chains. Accordingly, a high-density proteoglycan, composed of three chains attached to a partially cleaved protein core, could derive from either the amino or the carboxyl terminus. A plausible scenario which could explain a number of conflicting published results is that a common precursor protein undergoes tissue- and cell-specific processing to generate multiple forms of proteoglycans.

Domain II: A Structural Module with Homology to the LDL Receptor-The presence of a molecular domain with features typically found in proteins that mediate nutritional uptake and its proximity to the heparan sulfate-containing region is quite interesting. One possibility is that the two domains may coordinately influence LDL metabolism. This view derives from the observations that heparin, a related glycosaminoglycan, displaces LDL from its receptor (39), that heparan sulfate binds LDL (65), and that the LDL receptor-like domain of HSPG2 is the region directly involved in the interaction with the LDL (42). Of particular interest is the presence of DGSDE, a sequence that is fully conserved in the mouse (21) and that has been proposed to represent the specific site of interaction between the receptor and LDL (42). Future studies need to establish whether HSPG2, either as a basement membrane or as an integral membrane proteoglycan, is directly involved in the metabolism of LDL.

Domain III: Homology to the Short Arm of Laminin A Chain-The laminin-like nature of the HSPG2 has been known since the original report of the partial cDNA sequence of the murine species (10) and was later confirmed by the first human clones published (9, 12). Obviously, the similarity in structure to the short arm of laminin A, with both globular repeats and cysteine-rich repeats, suggests that HSPG2 shares a number of functions with laminin, such as cell adhesion and growth. The human HSPG2, however, lacks both RGD and YIGSR sequences, two peptides that have been involved in cellular binding and differentiation (32, 66, 67). Laminin is the first matrix protein to be detectable at early stages of differentiation and its synthesis and deposition correlate with epithelial development (68). Recent studies (69) have shown that cysteine-rich peptides derived from the region of laminin which is homologous to domain III of HSPG2 promote cell division. Interestingly, these peptides induce cell growth that is comparable in dose response and time dependence with that of EGF (69). Thus, a possible function of HSPG2 domain III could be growth-promoting activity during development and repair. Finally, domain III may bind fibronectin as shown for the proteoglycan of human placenta (19) and lung fibroblasts (70). Because of similar protein core size, tissue distribution, immunological cross-reactivity, similar size mRNA species, and same peptide sequences (see Fig. 2), we believe that these proteoglycans are identical to those described in colon carcinoma and colon (2, 9) and fibrosarcoma cells (12). The HSPG from human lung fibroblast matrix (30) contains two major glycosaminoglycan-free peptides of ~110 and 62 kDa which bind avidly (Kd = 2 nM) to fibronectin (70). This suggests the existence of stable complexes in vivo (70) and is in agreement with the data from the human placenta proteoglycan (19). Our findings suggest that domain III could be involved in mediating these proteoglycan-fibronectin interactions.

Domain IV: A Large Module Composed of Reiterative IgG-like Subdomains as in N-CAM-This module is by far the largest assembly of IgG-like repeats found so far, with 21 consecutive units spanning 2010 amino acids which are homologous to the IgG-like repeats found in N-CAM (43, 71). It is interesting that there are seven additional IgG repeats in the human as compared with the murine species, which contains only 14 repeats in domain IV. This may derive from alternative splicing of the mRNA. The module present in the immunoglobulin gene superfamily, to which the HSPG2 domain IV clearly belongs, is one of the best studied. It has been shown how this module can be adapted to bind a variety of ligands by changing the length of the variable polypeptide loops attached to a stable β-sheet core structure (72). It is possible that these IgG repeats are involved in homophilic binding as proposed for the N-CAM (43, 71). In addition, domain IV contains a sequence of high hydrophobicity and three independent methods (33-35) predict a transmembrane domain (see Table I and Fig. 2), followed by a potential cleavage site. This opens the possibility that this gene product may be intercalated or tightly bound to the plasma membrane at least during some stages of synthesis and translocation. These findings are in agreement with the relatively high hydrophobicity of the human proteoglycan (36).

Domain V: A Carboxyl-terminal Module Analogous to the G-domain of Laminin A Chain and EGF-The terminal domain is similar to the large, carboxyl-terminal G-domain of laminin A chain (45, 46) and to the related glycoprotein merosin (48). It differs, however, in two respects: (i) the presence of three globular domains instead of five and (ii) the unique presence of four EGF-like repeats. The globular domains could be involved in both homotypic and heterotypic interactions, as has been shown for the EHS proteoglycan (53, 58). Synthetic peptides from the G-domain of laminin A promote cell adhesion and neurite outgrowth and interact with heparin and the β1 integrin subunit (73). Therefore, plausible functional roles of the globular regions of domain V would be their direct involvement in the assembly and maintenance of basement membranes, and adhesion of epithelial cells. The four EGF repeats are very interesting features of this domain, since all the 6 cysteines are fully conserved, in contrast to the murine species in which only 5 cysteines are present in the first two EGF repeats (21). The prototype EGF causes pleiotropic proliferative and developmental effects



which are mediated by a specific cell surface receptor endowed with tyrosine-kinase activity (74). The secondary structure of human EGF in solution has been recently elucidated (75). Accordingly, the four EGF-like motifs in HSPG2 conform to the type 1 consensus present in a number of proteins, including the human tissue plasminogen activator, transforming growth factor α, and laminin (47). If the disulfide bonds among the 6 cysteines follow the same pattern as in human EGF, then domain V would contain four "finger-like" structures which together would provide maximal binding affinity. It is an appealing concept that the EGF motifs may exert vectorial growth-promoting activity on opposing cells as proposed for some of the EGF subdomains of laminin (77). Finally, domain V contains two Leu-Arg-Glu (LRE) tripeptides that are present in S-laminin, a homologue of laminin concentrated in the basement membranes of motor nerve terminals and muscle fibers at the neuromuscular junction (31, 76). Neurons bind to LRE-containing peptides and soluble LRE tripeptides block attachment of neurons to S-laminin (31). The presence of two LRE, in contrast to the murine species in which only one is present (21), suggests an important role for HSPG2 in neurite outgrowth.

Conclusions-The molecular data presented in this paper have unraveled the structural complexity of the major proteoglycan from human basement membranes and other extracellular matrices. This composite, multidomain gene product appears to be evolutionarily related to molecules involved in fundamental cellular processes such as nutrient binding and delivery, mitogenesis, and the attachment/detachment of cells. This nondiffusible chimeric macromolecule with intrinsic growth-promoting and cell-adhesive properties could be utilized by tissues during embryonic development, repair, and growth. Our findings provide evidence for permissive structural hierarchies, in which the level of contact, growth regulation, and binding between HSPG2 and surrounding molecules would be provided by several site-specific interactions and collectively affect the cellular processes and their microenvironment.

Acknowledgments-We thank J' for his continuous support, Drs. M.-L. Chu and J. Uitto for providing cDNA libraries, Dr. M. Isemura for the monoclonal antibody HS42, Dr. R. Schwarting for help with the immunoenzymatic studies, and N. Hacobian, M. Naso, and K. Shepley for excellent technical assistance. We thank also the members of the Computer Facility of the Jefferson Cancer Institute.

REFERENCES
1. Ruoslahti, E. (1988) Annu. Rev. Cell Biol. 4, 229-255
2. Iozzo, R. V. (1984) J. Cell Biol. 99, 403-417
3. Iozzo, R. V., and Clark, C. C. (1987) J. Biol. Chem. 262, 11188-11199
4. Iozzo, R. V., and Hassell, J. R. (1989) Arch. Biochem. Biophys. 269, 239-249
5. Iozzo, R. V. (1989) J. Biol. Chem. 264, 2690-2699
6. Iozzo, R. V., Kovalszky, I., Hacobian, N., Schick, P. K., Ellingson, J. S., and Dodge, G. R. (1990) J. Biol. Chem. 265, 19980-19989
7. Dodge, G. R., Kovalszky, I., Hassell, J. R., and Iozzo, R. V. (1990) J. Biol. Chem. 265, 18023-18029
8. Hassell, J. R., Robey, P. G., Barrach, H.-J., Wilczek, J., Rennard, S. I., and Martin, G. R. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 4494-4498
9. Dodge, G. R., Kovalszky, I., Chu, M.-L., Hassell, J. R., McBride, O. W., Yi, H. F., and Iozzo, R. V. (1991) Genomics 10, 673-680
10. Noonan, D. M., Horigan, E. A., Ledbetter, S. R., Vogeli, G., Sasaki, M., Yamada, Y., and Hassell, J. R. (1988) J. Biol. Chem. 263, 16379-16387
11. Wintle, R. F., Kisilevsky, R., Noonan, D., and Duncan, A. M. V. (1990) Cytogen. Cell Genet. 54, 60-61
12. Kallunki, P., Eddy, R. L., Byers, M. G., Kestila, M., Shows, T. B., and Tryggvason, K. (1991) Genomics 11, 389-396
13. Feinberg, A. P., and Vogelstein, B. (1984) Anal. Biochem. 137, 266-267
14. Short, J. M., Fernandez, J. M., Huse, W. D., and Sorge, J. (1988) Nucleic Acids Res. 16, 7583-7600
15. Chen, E. Y., and Seeburg, P. H. (1985) DNA (N. Y.) 4, 165-170
16. Tuan, R. S., Lamb, B. T., and Jesinkey, C. B. (1988) Differentiation 37, 198-204
17. McDonald, S. A., and Tuan, R. S. (1989) Dev. Biol. 133, 221-234
18. Langer, P., Waldrop, A., and Ward, D. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 6633-6637
19. Isemura, M., Sato, N., Yamaguchi, Y., Aikawa, J., Munakata, H., Hayashi, N., Yosizawa, Z., Nakamura, T., Kubota, A., Arakawa, M., and Hsu, C.-C. (1987) J. Biol. Chem. 262, 8926-8933
20. Schwarting, R., Gerdes, J., Durkop, H., Falini, B., Pileri, S., and Stein, H. (1989) Blood 74, 1678-1689
21. Noonan, D. M., Fulle, A., Valente, P., Cai, S., Horigan, E., Sasaki, M., Yamada, Y., and Hassell, J. R. (1991) J. Biol. Chem. 266, 22939-22947
22. von Heijne, G. (1986) Nucleic Acids Res. 14, 4683-4690
23. Zimmermann, D. R., and Ruoslahti, E. (1989) EMBO J. 8, 2975-2981
24. Saunders, S., Jalkanen, M., O'Farrell, S., and Bernfield, M. (1989) J. Cell Biol. 108, 1547-1556
25. Mali, M., Jaakkola, P., Arvilommi, A.-M., and Jalkanen, M. (1990) J. Biol. Chem. 265, 6884-6889
26. David, G., Lories, V., Decock, B., Marynen, P., Cassiman, J.-J., and Van Den Berghe, H. (1991) J. Cell Biol. 111, 3165-3176
27. Nishiyama, A., Dahlin, K. J., Prince, J. T., Johnstone, S. R., and Stallcup, W. B. (1991) J. Cell Biol. 114, 359-371
28. Huber, S., Winterhalter, K. H., and Vaughan, L. (1988) J. Biol. Chem. 263, 752-756
29. Bourdon, M. A., Krusius, T., Campbell, S., Schwartz, N. B., and Ruoslahti, E. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 3194-3198
30. Heremans, A., Van Der Schueren, B., De Cock, B., Paulsson, M., Cassiman, J.-J., Van Den Berghe, H., and David, G. (1989) J. Cell Biol. 109, 3199-3211
31. Hunter, D. D., Porter, B. E., Bulock, J. W., Adams, S. P., Merlie, J. P., and Sanes, J. R. (1989) Cell 59, 905-913
32. Ruoslahti, E., and Pierschbacher, M. D. (1987) Science 238, 491-497
33. Rao, M. J. K., and Argos, P. (1986) Biochim. Biophys. Acta 869, 197-214
34. Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. (1984) J. Mol. Biol. 179, 125-142
35. Klein, P., Kanehisa, M., and DeLisi, C. (1985) Biochim. Biophys. Acta 815, 468-476
36. Iozzo, R. V. (1988) J. Cell. Biochem. 37, 61-78
37. Maizel, J. V., and Lenk, R. P. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7665-7669
38. Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2444-2448
39. Yamamoto, T., Davis, G. G., Brown, M. S., Schneider, W. J., Casey, M. L., Goldstein, J. L., and Russell, D. W. (1984) Cell 39, 27-38
40. Raychowdhury, R., Niles, J. L., McCluskey, R. T., and Smith, J. A. (1984) Science 244, 1163-1165
41. DiScipio, R. G., Gehring, M. R., Podack, E. R., Kan, C. C., Hugli, T. E., and Fey, G. H. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 7298-7302
42. Sudhof, T. C., Goldstein, J. L., Brown, M. S., and Russell, D. W. (1985) Science 228, 815-822
43. Cunningham, B. A., Hemperly, J. J., Murray, B. A., Prediger, E. A., Brackenbury, R., and Edelman, G. M. (1987) Science 236, 799-806
44. Sasaki, M., Kleinman, H. K., Huber, H., Deutzmann, R., and Yamada, Y. (1988) J. Biol. Chem. 263, 16536-16544
45. Haaparanta, T., Uitto, J., Ruoslahti, E., and Engvall, E. (1991) Matrix 11, 151-160
46. Sasaki, M., and Yamada, Y. (1987) J. Biol. Chem. 262, 17111-17117
47. Appella, E., Weber, I. T., and Blasi, F. (1988) FEBS Lett. 231, 1-4
48. Ehrig, K., Leivo, I., Argraves, W. S., Ruoslahti, E., and Engvall, E. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 3264-3268
49. Doolittle, R. F. (1989) Trends Biochem. Sci. 14, 244-245
50. Patthy, L. (1987) FEBS Lett. 214, 1-7
51. Baron, M., Norman, D. G., and Campbell, I. D. (1991) Trends Biochem. Sci. 16, 13-17
52. Paulsson, M., Yurchenco, P. D., Ruben, G. C., Engel, J., and Timpl, R. (1987) J. Mol. Biol. 197, 297-313
53. Yurchenco, P. D., Cheng, Y.-S., and Ruben, G. C. (1987) J. Biol. Chem. 262, 17668-17676
54. Laurie, G. W., Inoue, S., Bing, J. T., and Hassell, J. R. (1988) Am. J. Anat. 181, 320-326
55. Kanwar, Y. S., Hascall, V. C., and Farquhar, M. G. (1981) J. Cell Biol. 90, 527-532
56. Soroka, C. J., and Farquhar, M. G. (1991) J. Cell Biol. 113, 1231-1241
57. Hassell, J. R., Kimura, J. H., and Hascall, V. C. (1986) Annu. Rev. Biochem. 55, 539-567
58. Yurchenco, P. D., and Schittny, J. C. (1990) FASEB J. 4, 1577-1590
59. Wewer, U. M., Albrectsen, R., and Hassell, J. R. (1985) Differentiation 30, 61-67
60. Saku, T., and Furthmayr, H. (1989) J. Biol. Chem. 264, 3514-3523
61. Mohan, P. S., and Spiro, R. G. (1991) J. Biol. Chem. 266, 8567-8575
62. Kato, M., Koike, Y., Ito, Y., Suzuki, S., and Kimata, K. (1987) J. Biol. Chem. 262, 7180-7188
63. Danielson, K. G., Martinez-Hernandez, A., Hassell, J. R., and Iozzo, R. V. (1992) Matrix 12, 22-35
64. Klein, D. J., Brown, D. M., Oegema, T. R., Brenchley, P. E., Anderson, J. C., Dickinson, M. A. J., Horigan, E. A., and Hassell, J. R. (1988) J. Cell Biol. 106, 963-970
65. Cardin, A. D., and Weintraub, H. J. R. (1989) Arteriosclerosis 9, 21-32
66. Graf, J., Iwamoto, Y., Sasaki, M., Martin, G. R., Kleinman, H. K., Robey, F. A., and Yamada, Y. (1987) Cell 48, 989-996
67. Grant, D. S., Tashiro, K.-I., Segui-Real, B., Yamada, Y., Martin, G. R., and Kleinman, H. K. (1989) Cell 58, 933-943
68. Timpl, R., and Dziadek, M. (1986) Int. Rev. Exp. Pathol. 29, 1-112
69. Panayotou, G., End, P., Aumailley, M., Timpl, R., and Engel, J. (1989) Cell 56, 93-101
70. Heremans, A., De Cock, B., Cassiman, J.-J., Van den Berghe, H., and David, G. (1990) J. Biol. Chem. 265, 8716-8724
71. Rutishauser, U., Acheson, A., Hall, A. K., Mann, D. M., and Sunshine, J. (1988) Science 240, 53-57
72. Williams, A. F., and Barclay, A. N. (1988) Annu. Rev. Immunol. 6, 381-405
73. Skubitz, A. P. N., Letourneau, P. C., Wayner, E., and Furcht, L. (1991) J. Cell Biol. 115, 1137-1148
74. Carpenter, G., and Cohen, S. (1979) Annu. Rev. Biochem. 48, 193-216
75. Cooke, R. M., Wilkinson, A. J., Baron, M., Pastore, A., Tappin, M. J., Campbell, I. D., Gregory, H., and Sheard, B. (1987) Nature 327, 339-341
76. Hunter, D. D., Shah, V., Merlie, J. P., and Sanes, J. R. (1989) Nature 338, 229-233
77. Engel, J. (1991) Int. J. Biol. Macromol. 13, 147-151