Vol. 267, No. 12, Issue of April 25, pp. 8544-8557, 1992
Printed in U.S.A.
THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 1992 by The American Society for Biochemistry and Molecular Biology, Inc.
Primary Structure of the Human Heparan Sulfate Proteoglycan from Basement Membrane (HSPG2/Perlecan)
A CHIMERIC MOLECULE WITH MULTIPLE DOMAINS HOMOLOGOUS TO THE LOW DENSITY LIPOPROTEIN RECEPTOR, LAMININ, NEURAL CELL ADHESION MOLECULES, AND EPIDERMAL GROWTH FACTOR*
(Received for publication, December 31, 1991)
Alan D. Murdoch, George R. Dodge, Isabelle Cohen, Rocky S. Tuan‡, and Renato V. Iozzo§
From the Department of Pathology and Cell Biology and the Jefferson Cancer Institute and the ‡Departments of Orthopaedic Surgery and of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, Pennsylvania 19107
We have determined the complete nucleotide and deduced amino acid sequence of the major protein core of the human heparan sulfate proteoglycan HSPG2/perlecan of basement membranes. Eighteen overlapping cDNA clones comprise 14.35 kilobase pairs (kb) of contiguous sequence with an open reading frame of 13.2 kb. The mature protein core, without the signal peptide of 21 amino acids, has an Mr of 466,564. This large protein is composed of multiple modules homologous to the receptor of low density lipoprotein, laminin, neural cell adhesion molecules, and epidermal growth factor. Domain I, near the amino terminus, appears unique to the proteoglycan since it shares no significant homology with any other protein. It contains three Ser-Gly-Asp sequences that could act as attachment sites for heparan sulfate glycosaminoglycans. Domain II is highly homologous to the LDL receptor and contains four repeats with perfect conservation of all 6 consecutive cysteines. Next is domain III, which shares homology with the short arm of the laminin A chain and contains four cysteine-rich regions intercalated among three globular domains. Domain IV, the largest module with >2000 residues, contains 21 repeats of the immunoglobulin type as found in the neural cell adhesion molecule. Near the beginning of this domain, there is a stretch of 29 hydrophobic amino acids which could allow the molecule to interact with the plasma membrane. Domain V, similar to the carboxyl-terminal globular G-domain of laminin A and to the related protein merosin, contains three globular regions and four EGF-like repeats. In situ hybridization and immunoenzymatic studies show a close association of this gene product with a variety of cells involved in the assembly of basement membranes, in addition to its localization within the stromal elements of various connective tissues. Our studies show that this proteoglycan is present in all vascularized tissues and suggest that this unique molecule has evolved from the utilization of modular structures with adhesive and growth regulatory properties.

* This work was supported by National Institutes of Health Grants CA-39481 and CA-47282 (to R. V. I.) and HD-15822 and USDA 88-37200-3746 (to R. S. T.) and by a Wellcome Trust Research Travel Grant (to A. D. M.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M85289.
§ Recipient of a Faculty Research Award FRA-376 from the American Cancer Society. To whom correspondence and reprint requests should be addressed: Dept. of Pathology and Cell Biology, Rm. 249, Jefferson Alumni Hall, Thomas Jefferson University, 1020 Locust St., Philadelphia, PA 19107. Tel.: 215-955-2208; Fax: 215-923-2218.
¹ The abbreviations used are: EHS, Engelbreth-Holm-Swarm; LDL, low density lipoprotein; N-CAM, neural cell adhesion molecule; EGF, epidermal growth factor; kb, kilobase pair(s). The term "perlecan" has recently been assigned to the mouse species because of its beaded appearance on electron microscopy of isolated spread molecules (21). Because we have no such information regarding the human species, we refer to it as HSPG2, the official name given by the Human Genome Mapping Nomenclature Workshop Committee (9).

Tissue homeostasis and the remodeling of organs during development, repair, and neoplastic growth are influenced by specific interactions between the cellular elements and the surrounding extracellular matrix. Pivotal roles are played by proteoglycans, which are some of the most complex and multivalent molecules present in mammalian tissues (1). Our laboratory has extensively investigated the biosynthesis, post-translational modifications, and cellular expression of the major heparan sulfate proteoglycan from human colon and colon carcinoma cells (2-7). When colon carcinoma cells are cultured as monolayers, they synthesize a unique proteoglycan which is closely associated with the plasma membrane and the pericellular microenvironment (2-4). This proteoglycan contains a protein core of ~400 kDa (4), which is highly glycosylated with both O-linked oligosaccharides and numerous heparan sulfate side chains (5), some of which are totally unsulfated (6). One interesting feature of the colon carcinoma cell proteoglycan is its post-translational modification with myristate and palmitate, two long-chain fatty acids that add to the complexity of this macromolecule and could be involved in membrane targeting and hydrophobic interactions (6). The microvillar surface of colon carcinoma cells reacts intensely with a murine polyclonal antiserum (2) which was originally raised against the proteoglycan isolated from the basement membrane-producing EHS¹ tumor (8). Subsequent studies (4) revealed several immunological and structural features shared by the human and murine proteoglycan species. They include: (i) a similar size (~400 kDa) of the protein core following heparitinase digestion; (ii) the ability of the anti-EHS antiserum to immunoprecipitate a human ~400-kDa precursor protein; (iii) evidence from pulse-chase experiments for a precursor-product relationship between the ~400-kDa protein and the fully glycosylated proteoglycan; and (iv) detection by Western blotting of a similar product which was present only following heparitinase digestion of the immunoprecipitated proteoglycan (5).

We have recently isolated and characterized two overlapping cDNA clones from a human colon library (9). These human clones were ~85% homologous, at both the nucleotide and amino acid levels, to the murine clones encoding a laminin-like domain of the EHS tumor protein core (10). Using human/rodent somatic cell hybrids and Southern blotting, we also localized the human HSPG2 gene to the telomeric region of the short arm of chromosome 1, a finding that has been confirmed by other laboratories (11, 12). In this report, we describe the complete primary structure of the HSPG2 protein core. This complex macromolecule of 467 kDa combines repeating modules homologous to the LDL receptor, laminin A chain, N-CAM, and EGF. The human HSPG2 differs from the mouse protein in that it contains a much larger N-CAM region, lacks the cell-binding sequence RGD, and contains at least six potential glycosaminoglycan attachment sites. In situ hybridization and immunoenzymatic studies showed a close association of this gene product with a variety of cells involved in the assembly of basement membranes, in addition to its presence within the stromal elements of various connective tissues. The results suggest that this multidomain proteoglycan is expressed by nearly all vascularized tissues and that it may acquire a tissue-specific function by alternative splicing of the various domains or by post-translational modifications. It is likely that this chimeric molecule has evolved from the utilization of modular structures with adhesive and growth regulatory properties.

EXPERIMENTAL PROCEDURES
Materials. All reagents were of molecular biology grade. The radionucleotides [32P]dCTP (~3000 Ci/mmol) and [35S]dATP (~1000 Ci/mmol) were obtained from Amersham Corp.

cDNA Libraries and Screening Strategy. Four different human cDNA libraries were screened to isolate the clones that together comprise the entire cDNA sequence for the protein core of the human heparan sulfate proteoglycan HSPG2: 1) a randomly primed colon λgt11 cDNA library (Clontech); 2) an oligo(dT)/randomly primed cDNA library prepared from mRNA of a fibroblast cell line (ATCC, CRL 1262), kindly provided by Dr. Mon-Li Chu; 3) an oligo(dT)/randomly primed cDNA library prepared from mRNA of a human amnion cell line referred to as WISH cells (ATCC, CCL 25); and 4) an oligo(dT)/randomly primed cDNA library from human keratinocyte mRNA (libraries 3 and 4 kindly provided by Dr. Jouni Uitto). The fibroblast and WISH cell cDNA libraries were cloned into the EcoRI site of λ ZAP II (Clontech), whereas the colon and keratinocyte cDNA libraries were in conventional λgt11. The fibroblast and WISH libraries were screened extensively, whereas the keratinocyte library was used only for the 5' region of the cDNA and was found to be negative. The first human clone (HS-1), a 1.1-kb insert encoding a portion of the laminin-like domain of the HSPG2 protein core, has been characterized previously (9). The HS-1 insert was labeled by the random priming method (13) and used to screen at least 10⁶ recombinant phage from the various libraries. Subsequent clones were obtained by repeated screenings with polymerase chain reaction-generated probes of 150-200 base pairs derived from the 5'-most or 3'-most ends of the new clones. Isolated plaques remaining positive after tertiary or quaternary screenings were "rescued" by the automatic excision process (14), which yielded a pBluescript SK plasmid containing the cDNA insert. The rescue procedure was performed as outlined by the manufacturer (Stratagene).
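The end-probe strategy above is essentially a walk along the transcript: each new clone supplies a terminal probe for the next round of screening. The Python below is a minimal sketch of the 3'-ward walk only (the 5'-ward walk is symmetric), not the authors' procedure. It assumes a hypothetical helper `walk_3prime` and models hybridization as exact substring matching of the probe, which real plaque hybridization is not.

```python
import random


def walk_3prime(start_clone: str, library: list[str], probe_len: int = 150) -> str:
    """Extend a contig 3'-ward by repeated end-probe screening (toy model).

    Hybridization is modeled as exact substring matching of the probe.
    """
    contig = start_clone
    while True:
        # Probe from the 3'-most end of the current contig (150-200 bp in the text).
        probe = contig[-probe_len:]
        # Sequence lying 3' of the probe in every clone that "hybridizes" to it.
        extensions = [
            clone[clone.index(probe) + len(probe):]
            for clone in library
            if probe in clone
        ]
        best = max(extensions, key=len, default="")
        if not best:
            return contig  # no positive clone reaches further 3'
        contig += best


# Usage: a synthetic 1-kb "transcript" cut into three overlapping clones.
rng = random.Random(0)
transcript = "".join(rng.choice("ACGT") for _ in range(1000))
library = [transcript[0:400], transcript[200:700], transcript[500:1000]]
print(walk_3prime(library[0], library) == transcript)  # True
```

With 150-bp probes and clones overlapping by 200 bp, the walk reassembles the synthetic transcript in two extension rounds. Clones whose overlap is shorter than the probe would not be picked up by this toy model.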
The positive clones identified in the colon λgt11 library were subcloned into the EcoRI site of pGEM-3Z (Promega) or pBluescript (Stratagene).

DNA Sequencing and Computer Analysis. Plasmids were sequenced by a modified dideoxynucleotide chain termination method (15) using either the polylinker primers T3 and T7 or synthetic oligonucleotide primers (9). At least 4 kb of the 5' cDNA sequence was also confirmed by comparison with the exonic sequence of two human cosmid clones we have recently isolated and partially characterized.² Ambiguities were resolved by modifying the sequencing reactions or the electrophoretic conditions to improve the reading of DNA sequences proximal or distal to the primers. Alignment of nucleotide sequences and comparisons with the EMBL and NBRF data bases were performed with the programs contained in PC/GENE release 6.6 (IntelliGenetics), including the CD-ROM data base (release 5), and with the FASTA and MULTALIN programs contained in the GCG package of the Jefferson Cancer Institute.

² I. Cohen and R. V. Iozzo, manuscript in preparation.

In Situ Hybridization and Immunoenzymatic Staining. Gene expression of HSPG2 was determined using previously described protocols (16, 17). A number of human tissues, including placenta, colon, skin, uterus, prostate, bladder, ovary, and skeletal muscle, were fixed and processed as described before (17). The HS-1 plasmid insert was biotin-labeled by nick translation (18) and visualized histochemically with a streptavidin-alkaline phosphatase procedure (16, 17). For immunoenzymatic staining, frozen sections of various surgically obtained human tissues were reacted with monoclonal antibody HS42, directed against the basement membrane proteoglycan from human placenta (19), an antibody that recognizes human HSPG2 (see "Results"). Immunoenzymatic labeling of the HS42 monoclonal antibody was achieved with immune complexes of alkaline phosphatase and monoclonal anti-alkaline phosphatase antibodies. The signal was amplified with two layers of bridging antibodies and anti-phosphatase antibodies (20). Bound phosphatase molecules were visualized using new fuchsin as substrate. Additional experimental details are provided in the text and the figure legends.

RESULTS
Isolation and Characterization of Overlapping cDNA Clones. Initial screening of a human colon cDNA library with the HS-1 insert (9) gave a number of relatively small clones that overlapped to a great extent. To circumvent this problem, we screened three other libraries, from fibroblasts, amnion cells, and keratinocytes, as well as a human lymphocyte genomic (cosmid) library.² Over 100 clones were obtained, and 18 were fully sequenced to provide the complete nucleotide sequence of the HSPG2 protein core. A schematic representation of the overlapping clones is shown in Fig. 1. Clones 1 and 2 are the two inserts previously described by us (9), and their sequences were identical to those of all the overlapping 5' and 3' cDNAs. Subsequent clones were obtained by rescreening several human libraries with 5' or 3' fragments generated by polymerase chain reaction. This strategy yielded a number of overlapping clones, several of which significantly extended the sequence in the 5' and 3' directions (Fig. 1).

FIG. 1. Schematic representation of the overlapping cDNA clones encoding the 467-kDa protein core of human heparan sulfate proteoglycan HSPG2. The 18 cDNA clones are represented by thin lines in the upper portion of the figure. Clones 1 and 2 (labeled by a star) were characterized previously (9). The dotted line in clone 185 indicates a recombinant sequence which likely occurred during plasmid rescue. The full-length cDNA is represented by the filled bar, with the start codon of the first methionine (ATG) and the stop codon (TAG) indicated. The untranslated 5' and 3' regions are shown by the unfilled bar. The size in kilobases (kb) is shown at the bottom. For additional details, see Fig. 2.

Nucleotide and Deduced Amino Acid Sequence. The complete nucleotide and deduced amino acid sequences are shown in Fig. 2. The entire sequence contains 14,356 nucleotides, with an open reading frame of 13,173 nucleotides and a deduced precursor protein of 4391 amino acids (Mr = 468,789). The sequence starts with 80 base pairs of a 5'-untranslated region that is enriched in guanine and cytosine (GC = 89%). The first methionine is followed by a highly hydrophobic segment of 20 amino acids that constitutes the signal peptide. The position of the first methionine and the amino acid sequence of the signal peptide are very similar to those of the murine species (21). Prediction of the eukaryotic secretory signal sequence gives a potential cleavage site between residues 21 and 22, with valine as the first amino acid of the mature protein, the same amino acid found in the mouse sequence (21). This cleavage site conforms to the (-3,-1) rule of von Heijne (22). The mature protein core, without the signal peptide of 21 amino acids, is composed of 4370 residues, which encode a protein with a calculated Mr of 466,564. Some of the physicochemical parameters of this protein, as predicted by computer analysis, include an isoelectric point of 6.0 and a net charge of -78.6 at pH 7.0. The protein core contains 187 cysteines, which are for the most part conserved between the human and the murine species (21). There are 10 potential N-glycosylation sites, scattered along the protein core, and a zinc finger motif between residues 761 and 785, at the same location as that found in the mouse (21). The human protein core contains a total of 53 Ser-Gly dipeptides, putative attachment sites for glycosaminoglycans. These SG dipeptides are scattered throughout the sequence. However, a cluster of three Ser-Gly-Asp (SGD) tripeptides is found only in the amino-terminal region of the protein, between residues 65 and 78 (Fig. 2).
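Counts such as the Ser-Gly dipeptides, the SGD cluster, and the potential N-glycosylation sites come from scanning the deduced sequence for short motifs. The Python below sketches such a scan. It assumes the usual Asn-X-Ser/Thr (X not Pro) rule for N-glycosylation sequons, and `find_motifs` is a hypothetical helper; the paper does not state the exact rules its software applied, so counts from this function need not match the published ones.

```python
import re

# Zero-width lookaheads so that overlapping motifs are all reported.
MOTIFS = {
    "SG": r"(?=SG)",             # Ser-Gly dipeptide
    "SGD": r"(?=SGD)",           # Ser-Gly-Asp tripeptide
    "SGXG": r"(?=SG.G)",         # Ser-Gly-X-Gly tetrapeptide
    "N-glyc": r"(?=N[^P][ST])",  # Asn-X-Ser/Thr sequon, X not Pro (assumed rule)
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif in a one-letter sequence."""
    seq = protein.upper()
    return {name: [m.start() + 1 for m in re.finditer(pattern, seq)]
            for name, pattern in MOTIFS.items()}


print(find_motifs("MSGDSGDNASNPTSGAG"))
# {'SG': [2, 5, 14], 'SGD': [2, 5], 'SGXG': [14], 'N-glyc': [8]}
```

In the toy peptide, the N-P-T triplet at positions 11-13 is skipped because of the proline rule, while N-A-S at positions 8-10 is reported.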
These SGD tripeptides, which are flanked by hydrophobic residues, correspond at least in part to the proposed consensus sequence E/D-GSG-E/D found in the proteoglycan versican (23). These sequences and the surrounding amino acids are fully conserved in the murine species (21) and have also been found in the protein cores of mouse (24) and human (25) syndecan, in glypican (26), in the NG2 proteoglycan (27), and in collagen type IX (28). The human protein core contains three additional Ser-Gly-X-Gly sequences, a consensus sequence previously established for a group of small proteoglycans (29). One such sequence is located at residue 2995, and the other two lie near the carboxyl-terminal region of the protein core, at residues 3933 and 4179 (Fig. 2).

Interestingly, the sequences of two peptides of 18 amino acids each, derived from the human HSPG protein core of fibroblasts (30), were identical to our deduced amino acid sequence within residues 1379-1398 and residues 2841-2860, respectively (Fig. 2). Furthermore, glutamic acid was the amino acid preceding each of these two peptide sequences, in agreement with the fact that the peptides were generated by V8 protease (30), which cleaves after glutamic or aspartic acid residues. Similar sequences were also found in a cDNA encoding the basement membrane proteoglycan from the EHS tumor (10) and in a human fibrosarcoma cell line (12). Two Leu-Arg-Glu (LRE) sequences, potential mediators of motor neuron attachment (31), were found near the carboxyl terminus (Fig. 2). In contrast to the murine protein core, which contains the cell-binding motif Arg-Gly-Asp (RGD) (32), the human protein core contains no such sequence.

Three different computer programs (33-35) contained in the PC/GENE package predicted one to four hydrophobic stretches in addition to the signal peptide (Table I). Specifically, one region, between residues 2007 and 2034, was predicted to be a transmembrane domain by all three methods (33-35).
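As an illustration of how a hydrophobic stretch can be flagged, the sketch below averages the Kyte-Doolittle hydropathy scale over a sliding window. This scale is not necessarily one of the methods (33-35), and the 19-residue window, the 1.6 threshold, and the helper name `hydrophobic_segments` are assumptions chosen for illustration, not the parameters behind Table I.

```python
# Kyte-Doolittle hydropathy values per residue (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge sliding windows whose mean hydropathy exceeds `threshold`.

    Returns 1-based inclusive (start, end) spans. Unknown residue letters
    (e.g. X) raise KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    segments: list[tuple[int, int]] = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                # Overlaps or abuts the previous window: extend that span.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Usage: 25 leucines (positions 11-35) flanked by aspartates.
print(hydrophobic_segments("D" * 10 + "L" * 25 + "D" * 10))  # [(6, 40)]
```

The reported span (6-40) is wider than the leucine run itself because windows containing up to five aspartates still average above the threshold; boundaries from window-based predictors are therefore approximate, which is one reason different programs can disagree on the number of hydrophobic stretches.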
If the region between residues 2007 and 2034 is indeed a transmembrane segment, the protein core of the HSPG2 proteoglycan could be intercalated in the plasma membrane via this hydrophobic region. Future experiments need to establish whether this hydrophobic domain is utilized by different cells and whether it may be required for intracellular transport and membrane targeting. It is noteworthy, however, that the HSPG2 from colon carcinoma cells binds avidly to hydrophobic matrices and requires relatively high concentrations of detergent to be displaced from octyl-Sepharose (36). At the 3'-end of the molecule, there is a stop codon (TAG) followed by 1.1 kb of 3'-untranslated region (Fig. 2). A typical polyadenylation signal, AATAAA, is separated from the poly(A) tail by 16 nucleotides.

Structural Model and Internal Repeats: A Chimeric Molecule with Multiple Modular Units. Extensive computer analyses revealed several interesting features of this protein core. The most salient is the assembly of multiple domains with striking homology to other extracellular matrix and adhesive proteins, together with the presence of several internally repeated structures. A schematic representation of this model reveals five discrete domains (Fig. 3). Domain I contains a cluster of three heparan sulfate chains and appears unique to HSPG2. Domain II is homologous to the LDL receptor. Domain IIa contains one IgG repeat. Domain III is homologous to the short arm of the laminin A chain. Domain IV contains 21 IgG repeats as in N-CAM, whereas domain V is structurally similar to the G domain of the laminin A chain (i.e. the carboxyl-terminal globular domain of the laminin long arm). The multiple internal repeats in domains II to V are demonstrated by homology plot analysis (37) (Fig. 4). The various domains are described individually in the following sections.

Domain I: Unique to Heparan Sulfate Proteoglycan (Amino Acids 22-193). Immediately after the signal peptide, there is a domain of 172 amino acids that contains the three glycosaminoglycan attachment sites (SGD) described above.
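Internal repeats such as those that homology plot analysis (37) reveals in domains II to V show up as diagonals parallel to the main diagonal when a sequence is compared with itself. The method of reference (37) is not detailed in the text, so the sketch below is a generic windowed identity dot plot; the helper `dot_plot` and its `window` and `min_matches` parameters are hypothetical choices for illustration. A repeat-free segment, such as domain I is reported to be, would be expected to show little beyond the main diagonal at a stringent setting.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 10,
             min_matches: int = 6) -> set[tuple[int, int]]:
    """Return 1-based (i, j) starts of window pairs with >= min_matches identities."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues along the diagonal segment starting at (i, j).
            matches = sum(a == b for a, b in zip(seq_a[i:i + window], seq_b[j:j + window]))
            if matches >= min_matches:
                hits.add((i + 1, j + 1))
    return hits


# Usage: a 10-residue unit, a unique spacer, then the same unit again.
unit = "ACDEFGHIKL"
seq = unit + "MNPQRSTVWY" + unit
off_diagonal = sorted((i, j) for i, j in dot_plot(seq, seq, window=10, min_matches=10) if i != j)
print(off_diagonal)  # [(1, 21), (21, 1)]
```

This brute-force version scales with the square of the sequence length times the window, so for a 4391-residue precursor it is slow in pure Python; it is meant only to show how off-diagonal hits expose repeated modules.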
In contrast to the other domains, domain I does not contain any internal repeats, lacks cysteines, and is highly enriched in acidic residues. Consequently, domain I has an estimated isoelectric point of 4.04 and a net charge of -14.71 at pH 7.0. Extensive searches of the various data banks with the FASTA program (38) did not reveal significant homology with any other protein. Therefore, this domain appears to be specific for the HSPG2 protein core. This domain is also unique in the murine species (21).

Domain II: Homology with the LDL Receptor (Amino Acids 194-403). The second discrete domain, of 210 amino acids, contains four modular units (Fig. 5, A and B). These cysteine-rich repeats consist of about 40 residues each and exhibit striking homology to the LDL receptor (39) and to related proteins such as the glycoprotein of the Heymann nephritis antigen (40) and the human complement component C9 (41). Between the first and the second cysteine repeats there is a 45-residue segment enriched in proline with no homology to other proteins, as in the murine species (21). The four repeated subdomains have 6 conserved cysteines and several acidic and hydrophobic amino acids, as in the LDL receptor (42). Alignment with the murine protein core sequence (Fig. 5B) reveals complete conservation of the cysteine residues and of several adjacent amino acids. Of particular interest is the conservation of the sequence DGSDE, an amino acid segment that mediates the binding of LDL to its receptor (39, 42). Immediately distal to the LDL receptor domain, there is a region of 101 residues (404-504) with one isolated IgG-like structure homologous to the repeats found in N-CAM and the immunoglobulin gene superfamily (43). This region will be
380
23 1 51
TCGTACCTTTClWTWlWGTAWIlGClGGCTWCAGCATCTWIGGAGACGACClGGGCAGTGGGGACCTGGGCAGCGGGGACTTC~GATGGTTTATTTCCWGCCClGCTGMTTTCACTCGCTCCATCGAGTA~GCCCTWIGClG S Y L S D D E V R L A D S I - D L G - L G ~ F O M V V F R A L V N F T R S I E Y S P P L
381 101
G A G W T G W I G G C T C C A W W G T T T C G A W G G T G T C C ~ G G C T G T G G T A ~ C A C G C T G W G l C G ~ G T A C l T ~ T T C C C G W G A C C A G G T T G T C A G T G l G G T G T l W I T ~ G ~ G C T G W T G G C T G G G T T T T T G T G G A G C T C W530 TGlG E D A G S R E F R E V S E A V V D T L E S E V L K I P G D O V V S V V F I K E L D G U V F V E L D V 150
531 151
GGCTCGGMGCGMTGCGGATGGTGCTCAGATTCAGWWTGCTGCTCAGGGTCATCTCCAGCGGCTCTGTGGCCTCCTACGTCACCTCTCCCCAGGGATTCCAGTTCCWCGCCTGGGCACAGTGCCCCAGTTCC~GAGCCTGCACG 680 G S E C N A D G A O I O E R L L R V I S ~ S V A S Y V T S P O G F O F R R L G T V P O F P R A C T 200
681 201
GAGGCCWGTTTGCCTCCCACAGCTACMTGAGTGTGTGGCCClGGAGTATCGCTGTGACCGGCGGCCCWCTGCAGGGACATGTCTGATWGCTCMTTGTGAGGAGCWIGTCCTGGGTATCAGCCCCACATTCTCTCTCCTTGTGWG E A E F A C H S Y N E C V A L E Y R C D R R P D C R D M S D E L N C E E P V L G I S P T F S L L V
830 250
E
980 300
ACGACATCTTTACCGCCCCGGCCAGAWCAACCATCATGCGACAGCCACCAGTCACCCACGCTCCTCAGCCCCTGCTTCCCGGTTCCGTCAGGCCCCTGCCCTGTGGGCCC~GWGGCCGCATGCCGCMTGGGWICTGCATCCCCAW 831 25 1 T T S L P P R P E T T I M R Q P P V T H A P O P L L P G S V R P L P C G P P E A A C R N G H C I P R
981 301
100
GACTACCTCTCCWCGGACAGWGGACTGCGAGCACGGCAGCWTGAGCTA~CTGTGGCCCCCCGCCACCCTGTGAGCCCMCGAGTTCCCCTGCGGGAATGGACATTGTGCCClCMGCTGTGGCGCTGCGATGGTGACTTTGACTGl D Y l C D G O E D C E D G S D E L D C G P P P P C E P N E F P C G N G H C A L K L U R C D G D f D C
1130 350 1280 400
1131 351 1281 401
TTTGCCTGCATCCCCCCCCAGGTGGTGACACCTCCCCGGGAGTCCATCCAGGCTTCCCGGGGCCAGACAGTGACCTTCACCTGCGTGGCCATTGGCGTCCCCACCCCCATCATCMTTGWGGCTCAACTGGGGCCACATCCCCTCTWIT F G C R P P P V V T P P R E S I O A S R G O T V T F T C V A I G V P T P I I N U R L N U G H I P S H
1430 450
1431 451
CCCAGGGTGACAGTCACCAGCGAGGGTGGCCGTGGCACACTGATCATCCGTGATGTGMGGAGTCAGACCAGGGTGCCTACACCTGTGAGGCCATGAACGCCCGGGGCATGGTGTTTGGCATTCCTGACGGTGTCCTTGAGCTCGTCCCA P R V T V T S E C G R G T L I I R D V K E S D O G A Y T C E A M N A R G M V F G I P D G V L E L V
P
1580 500
1581 501
CAACGAGGCCCCTGCCCTGACGGCCACTTCTACCTGGAGCACAGCGCCGCCTGCCTGCCCTGCTTCTGCTTTGGCATCACCAGCGTGTGCCAGAGCACCCGtCGCTTCCGGWCCAGATCAGGCTGCGCTTTGACCMCCCGATGACTTC O R G P C P D G H F Y L E H S A A C L P C F C F G I T S V C O S T R R F R D O I R L R F D O P D D F
1730 550
1731 551
MGGGTCTCAATCTCACMTGCCTGCGCAGCCCGGCACGCCACCCCTCTCCTCCACGCAGCTGCAGATCGACCCATCCCTGCACGAGTTCCAGCTAGTAGACCTGTCCCGCCGCTTCCTCGTCCACGACTCCTTCTGGGCTCTGCCTGM K G V N V T M P A O P G T P P L S S T O L O l D P S L H E F O L V D L S R R F L V H D S F U A L P E I
1880 600 2030 650
1881 601
2180
2031 65 1
CCCACCCMCCTGCTCCTCTGCWIGCCCCAGGTCCAGTTCTCTGAGGAGCACTGGGTCCATGAGTCTGGCCGGCCGGTGCAGCGCGCGGAGCTGCTGCAGGTGCTGWIWGCCTGWGGCCGTGCTCATCCAGACCGTGTACMCACC P T O P G A L N O R P V P F S E E H U V H E ~ R P V O R A E L L O V L O S L E A V L l O T V Y N
2181 701
AAGATGGCTAGCG~GGWCTTAGCWCATCGCCATGGATACCACCGTCACCCATGCCACCAGCCATGGCCGTGCCCACAGTGTGGAGGAGTGCAGATGCCCCATTGGCTATTCTGGCTTGTCCTGCGAGAGCTGTGATGCCCACTTCACl 2330 K W A S V G L S D I A R D T T V T H A T S H G R A H S V E E C R C P ~ G Y ~ L S C E S C D A H F T750
2331 7.51
CGGGTGCCTGCTGGCCCCTACCTGGCCACCTGCTCTGGTTGCAGTTGCMTGGCCATGCCAGClCCTCTCACCCTGTGTATGCCCACTGCCTGAATTGCCAGCACMCACGGAGGCGCCACAGTGCMCMGTGCMGGCTGGCTTCTlT R V P G G P V L G T C S t C S C N G H A S S C D P V Y G H C L N C O H N T E G P O C N K C K A G
2481 801
CGCWCCCCATGMGCCCACCGCCACTTCCTGCCGGCCCTGCCCTTGCCCATACATCGATGCCTCCCGCAGATTCTCAGACACTTGCTTCCTGGACACGGATGGCCMGCCACATGTGACGCCTGTGCCCCAGGCTACACTGGCCGCCGC G D A M K A T A T S C R P C P C P Y I D A S R R F S D T C F L D T D G O A T C D A C A P G Y T G R R
263 1 851
TGTCAGAGCTCTCCCCCCGGATACGAGGGCMCCCCATCCAGCCCGGCGGGAAGTGCAGGCCCGTCAACCAGGAGATTGTGCGCTGTGACGAGCGTGGCAGCATGGGGACCTCCGGGGAGGCCTGCCGCTGTMGMCMTGTGGTGGGG C E S C A P G Y E G N P I O P G G K C R P V N O E I V R C D E R G S M G T ~ E A C R C K N N V
V
G 900
2781 901
CGCTTGTCCMTGMTGTGCTWCGGCTCTTTCCACCTGAGTACCCG~CCCC~TGGCTGCCTCAAGTGCTTCTGCATGGGTGTCAGTCGCCACTGCACWIGCTCTTCATGWGCCGTGCCCAGTTGCATGGGGCCTCTGAGGAGCCT R L C N E C A D G S F H L S T R N P D G C L K C F C M G V S R H C T S S S U S R A O L H G A S E E P
2930 950
2931 95 1
GGTCACTTCAGCCTGACCMCGCCGCMGCACCCACACCACCMCWGGGCATCTTClCCCCCACGCCCGGGGMCTGGWTTCTCCTCCTTCCACAGACTCTTATCTGGACCCTACTTCTGWGCCTCCCTTCACGCTTCCTGGGGWC G H F S L T N A A S T H T T N E G I F S P T P G E L G F S S F H R L L ~ P Y F U S L P S R F L G
3081 1001
MGGTWCCTCCTATGWGGAWGCTGCGCTTCACAGTGACCCAWGGTCCCAGCCGGGCTCCAWICCCCTGCACGGGCAGCCGTTGGTGGTGCTGCMGGTMCMCATCATCCTAGAGCACCATGTGGCCWIGWGCCCAGCCCCGGC K V T S V G G E L R F T V T P R S O P G S T P L H G O P L V V l O G N N I I L E H H V A O E P S P G
3230 1050
3231 1051
CAGCCCAGCACCTTCATTGTGCCTTTCCGGGAGCMGCATGGCAGCGGCCCGATGGGCAGCCAGCCACACGGGAGCACCTGCTGATGGCACTGGCAGGCATCGACACCCTCCTWTCCGAGCATCCTACGCCCAGCAGCCCGCTGAGAGC P P S T F l V P F R E P A U O R P D G Q P A T R E H L L M A L A G l D T L L l R A S Y A O P P A E
3380 llD0
3381 1101
AGGGTCTCTGGCATCAGCATGGACGTGGCTGTGCCCWGG~CCGGCCAGGaCCCCGCGCTGGMGTGGMCAGTGCTCCTGCCCACCCGGGTACCGTGGGCCGTCCTGCCAGGACTGTGA~CAGGClACACACGCACGCCCAGTGGC R V ~ I S M D V A V P E E T G O D P A L E V E O C S C P P G Y R G P S C O D C D T G Y T R T
3530 1150 P ~
3531 1151
CTCTACCTCGGTACCTGTGMCGCTGCAGCTGCCATGGCCACTCAGAGGCCTGCGAGCCAGMACAGGTGCCTGCCAGGGCTGCCAGCATCACACGGAGGGCCClCGGTGlGAGCAGTGCCAGCCAGWTACTACGGGGACGCCCAGCGG L Y L C T C E R C S C H G H S E A C E P E T G A C O G C O H H T E G P R C E P C O P G Y Y G D A O R
3680
3681 1201
GGGACACCACAGGACTGCCAGCTGTGCCCCTGCTACGGAGACCCTGCTGCCGGCCAGGCTGCCCACACTTGTTTTCTGGACACAGACGGCCACCCCACCTGTGATGCGTGCTCCCCAGGCCACAGTGGGCGTCACTGTGAGAGGTGCGCC G T P O D C O L C P C Y G D P A A G O A A H T C F L D T D G H P T C D A C S P G H ~ R H C E R C A
3830 1250
3831 1251
CCTGCCTACTATGGWIACCCCAGCCAGGGCCAGCCATGCCAWGAGACAGCCAGGTGCCAGGGCCCATAGGCTGCMCTGTGACCCCCAACGCAGCGTCAGCAGCCAGTGTGAlGCTGCTGGTCAGTGCCAGTGCMGGCCCAGGTAGM P G Y Y C N P S P G O P C O R D S O V P G P l G C N C D P O G S V S S O C D A A G O C O C K A O V E
3980 1300
F
T 700
2480 800
~
2630 850 2780
3080 O 1000
S
3981 1301
1200
4130 1350 4280
4131 1351
CGGCACTTCCMGGCTTTCCCCTGGTGMCCCACAGCGMACAGCCGCCTGACAGGAGAATTCACTCTGGMCCCGTGCCCGAGGGTGCCCAGCTCTCTTTTGGCAACTTTGCCCMCTCGGCCATGAGTCCTTCTACTCGCAGCTGCCG G D F O G F A L V N P O R N S R L T G E F T V E P V P E G A ~ L S F G ~ F A P L G H E S F V U O
L 1400 P
4281 1401
GAWCATACCAGGCAWCMGGTGGCGGCCTACGGTGGGMGTTGCWTACACCCTCTCCTACACAGCAGGCCCACAGGGCAGCCCACTCTCGGACCCCGATGlGCAGATCACGGGCMCMCATCATGCTAGTGGCCTCCCAGCCAGCG E T Y O C D K V A A Y G G K L R Y T L S Y T A G P O G S P L S D P D V O I T G N N I M L V A S O P A
4430 1450
4431 1451
CTGCAGGGCCCACAGAGGAGGAGCTACGAGATCATGTTCCGAGAGGMTTCTGGCGCCGGCCCGATGGGCAGCCGGCCACACGCGAGCACCTCCTGATGGCACTGGCCGACCTGGATGAGCTCCTGATCCGGGCCACGTTCTCCTCCGTG 4580
L
O
G
P
E
R
R
S
Y
E
I
M
F
R
E
E
F
U
R
R
P
D
G
~
P
A
T
R
E
H
L
L
~
A
L
A
D
L
D
E
L
L
I
R
A
T
FIG. 2. Complete nucleotide and deduced amino acid sequence of human HSPGS protein core. Shown are the nucleotide sequence (top line) and the deduced amino acid sequence in single letter code (bottom line). An open reading frame of 4391 amino acids comprises a mature (lacking the signal peptide) protein core of ~ 4 6 kDa. 7 The signal peptide of 21 amino acids is shaded. The threeamino-terminally located Ser-Gly-Asp (SGD) tripeptides, possible attachment sites for heparan sulfate chains, aredoubly underlined. Three more carboxyl-located Ser-GlyX-Gly (SGXG) tetrapeptides,proposed attachment sites for glycosaminoglycans,are shaded and double-underlined. The other Ser-Gly (SG) dipeptides are underlined. Potential N-glycosylation sites are noted by a bracket. The two stretches of amino acids which are identical to previously sequenced peptides from a human fibroblast HSPG (30) are underlined and printed in bold. A highly hydrophobic region, which may act as a transmembrane domain, is shaded and underlined (see also Table I). Two Leu-Arg-Glu (LRE) tripeptides, which can mediate motor neuron attachment (311,are shuded. The stop (TAG) codon is marked with three asterisks. The polyadenylation signal AATAAA, which is located 16 base pairs upstream of the poly(A) tail, is double-underlined and printed in bold.
F
S
S 1500 V
Molecular Structure of HSPG2 ProteinCore 4581 1501
[Fig. 2 (continued): nucleotide sequence (nucleotides 4581-14356) and deduced amino acid sequence (residues 1501-4391) of the HSPG2 protein core, ending with the TAG stop codon, the 3'-untranslated region, and the poly(A) tail.]
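The sizes quoted in the Fig. 2 caption and in the domain headings are simple residue arithmetic. The sketch below re-derives them under stated assumptions: residue 1 is the initiator methionine, the coding length counts the stop codon, domain I is taken to span residues 22-193 (inferred from the 21-residue signal peptide and the first LDL receptor-like repeat starting at residue 194 in Fig. 5B), and the mean residue mass is simply the mature Mr divided by the residue count. All constant and function names are illustrative.

```python
# Figures taken from the text; residue 1 is assumed to be the initiator Met.
ORF_RESIDUES = 4391      # open reading frame, amino acids (Fig. 2)
SIGNAL_PEPTIDE = 21      # residues removed from the precursor
MATURE_MR = 466_564      # Mr of the mature protein core

# Inclusive residue ranges; domain I (22-193) is an assumption inferred from
# the end of the signal peptide and the start of domain II repeat 1 (194).
DOMAINS = {
    "I": (22, 193),
    "III": (505, 1676),
    "IV": (1677, 3686),
    "V": (3687, 4391),
}


def span_length(start: int, end: int) -> int:
    """Number of residues in the inclusive range start..end."""
    return end - start + 1


def coding_nucleotides(residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode `residues` codons, optionally plus a stop codon."""
    return 3 * (residues + (1 if include_stop else 0))


mature_residues = ORF_RESIDUES - SIGNAL_PEPTIDE
mean_residue_mass = MATURE_MR / mature_residues  # average Da per residue
domain_lengths = {name: span_length(*bounds) for name, bounds in DOMAINS.items()}

if __name__ == "__main__":
    print(f"coding sequence: {coding_nucleotides(ORF_RESIDUES)} nt")
    print(f"mature core: {mature_residues} residues")
    print(f"mean residue mass: {mean_residue_mass:.1f} Da")
    for name, length in domain_lengths.items():
        print(f"domain {name}: {length} residues")
```

The coding length comes out at 13,176 nucleotides (about 13.2 kb), the mature core at 4370 residues, and domains I, III, IV, and V at 172, 1172, 2010, and 705 residues, consistent with the figures given in the text.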
TABLE I
Prediction of hydrophobic domains in the protein core of HSPG2 proteoglycan
Prediction of transmembrane domains was obtained using the programs Raoargos (33), Helixmem (34), and Soap (35) contained in the PC/GENE package program. The settings for the various estimates were the same as in the original methods for predicting transmembrane hydrophobic domains. The signal peptide is consistently predicted by the three methods. The cleavage occurs before the valine according to the -3,-1 rule of von Heijne (22). Notice that only one other domain, centered in the middle of the protein core between amino acids 2010 and 2026, is recognized as a possible transmembrane domain by all three methods. The 17 amino acids predicted by all three methods correspond to residues 2010-2026 (LLIPAITTADAGFYLCV).

Residues     Sequence                        Classification             Ref.
2-23         WRAPGALLLALLLHGRLLAVT           Signal peptide             33-35
515-530      SAACLPCFCFGITSVC                Transmembrane helix        33, 34
1491-1511    LLIRATFSSVPLVASISAVSL           Transmembrane helix        33
2007-2034    IATLLIPAITTADAGFYLCVATSPAGTA    Transmembrane helix        33
2007-2027    IATLLIPAITTADAGFYLCVA           Transmembrane multimeric   34
2010-2026    LLIPAITTADAGFYLCV               Transmembrane segment      35
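Table I relies on three PC/GENE predictors that are not reproduced here. As a simpler, swapped-in illustration of the same idea, the sketch below scores the 2007-2034 segment with the Kyte-Doolittle hydropathy scale over a sliding window. This is not one of the methods used for Table I, and the 17-residue window is an assumption chosen only to match the length of the segment all three methods agree on. A commonly cited Kyte-Doolittle rule of thumb flags 19-residue windows averaging above about 1.6 as candidate membrane-spanning segments, so scores near that value should be read as suggestive rather than decisive.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

SEGMENT_START = 2007                       # residue number of SEGMENT[0] (Table I)
SEGMENT = "IATLLIPAITTADAGFYLCVATSPAGTA"   # residues 2007-2034
WINDOW = 17                                # assumed window length


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle score of a peptide."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_windows(seq: str, window: int) -> list[tuple[int, float]]:
    """(0-based offset, mean score) for every complete window along seq."""
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [(i, mean_hydropathy(seq[i:i + window]))
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    for offset, score in hydropathy_windows(SEGMENT, WINDOW):
        first = SEGMENT_START + offset
        print(f"residues {first}-{first + WINDOW - 1}: {score:+.2f}")
```

The window starting at offset 3 covers residues 2010-2026 and averages about +1.59, close to the conventional threshold, which fits the table's cautious "possible transmembrane domain" wording.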
[Fig. 4: dot-plot self-comparison of the HSPG2 sequence (axes in residues), with the domain bar I-V shown beneath.]
FIG. 4. Homology plot analysis of HSPG2 protein core reveals multiple internal repeats. The complete peptide sequence shown in Fig. 2 was subjected to homology plot analysis using the Dot-Plot program (GCG package) to illustrate the presence of internal repeats. The window setting was 30, with a stringency of 20. A single dot, thus, indicates that >20 residues match in a searching area of 30 amino acids. The horizontal bar at the bottom indicates the various domains as in Fig. 3. Domain I, HSPG2-specific; domain II, LDL receptor-like; domain III, laminin A short arm-like; domain IV, N-CAM-like; domain V, laminin A globular domain-like.
FIG. 3. Molecular model of the human HSPG2 protein core. This chimeric molecule is composed of five discrete domains. Domain I is specific for HSPG2 and contains a cluster of three glycosaminoglycans (dotted lines) attached to SGD sequences which are only found in this region. Domain II is highly homologous to the LDL receptor and contains four cysteine-rich repeats. Domain IIa contains one IgG-like repeat. Domain III is homologous to the short arm of laminin, with four cysteine-rich domains intercalated among three (a, b, and c) globular domains. Domain IV comprises 21 consecutive IgG-like repeats as found in N-CAM. This region differs the most from the published murine sequence, where only 14 IgG-like repeats are present (21). Domain V is similar to the structure observed in the carboxyl globular domain (G-domain) of laminin A chain. It contains four EGF-like cysteine-rich repeats intercalated among three globular repeats (a, b, and c). The dotted lines in domains IV and V represent additional glycosaminoglycan chains possibly bound to SGXG sequences (23). In addition, there are several motifs, such as ten potential N-glycosylation sites dispersed throughout the molecule and a hydrophobic region in the middle of the molecule, which are not represented for clarity. For additional details, see Fig. 2 and the text.
discussed below because there are 21 additional repeats of this nature found in domain IV.
Domain III: Homology with the Short Arm of Laminin A Chain (Amino Acids 505-1676)-Homology plot analysis between the laminin A chain (44, 45) and the HSPG2 sequence shows several areas of high homology between domain III and laminin (Fig. 6A). Domain III can be further divided into seven discrete subdomains: three cysteine-free globular domains (designated IIIa, IIIb, and IIIc in Fig. 3) and four cysteine-rich repeats (Fig. 6B). The globular domains are 85% identical to those described in the murine sequence (21) and are ~30% homologous to domain IV of the short arm of the laminin chains (44-46). The cysteine-rich repeats are 30-35% homologous to those found in domains III and V of the laminin short arms (44-46). Alignment of these repeats (Fig. 6B) shows that a full repeat contains eight conserved cysteines and several glycine residues that are likely to be involved in the bending of the loops, as described for the laminin molecule (44-46). As in laminin, not all the human HSPG2 repeats contain all 8 cysteines. The first cysteine-rich subdomain (Fig. 6B) contains only two half-repeats, the second and third subdomains contain three complete and two half-repeats, whereas the fourth subdomain contains one full and one half-repeat. The similarity in the amino acid sequence and the conservation of the cysteine-rich regions between laminin A chain and the HSPG2 protein core suggest that they may have originated from a common ancestor at similar evolutionary time points, as previously proposed (21).
Domain IV: Homology with the Immunoglobulin Repeats of N-CAM (Amino Acids 1677-3686)-Domain IV is the largest domain of the HSPG2 molecule, comprising 2010 residues with 21 consecutive repeats homologous to the immunoglobulin repeats of N-CAM (43). Therefore, this protein core is the gene product with the largest number of IgG-like repeats thus far described. Alignment of the 22 repeats (Fig. 7), including the isolated repeat in domain IIa (see Fig. 3), reveals near complete conservation not only of the cysteine but also of the glycine and tryptophan residues typically found in members of the immunoglobulin gene superfamily (43). In contrast, the murine species has only 14 IgG repeats, which makes the mouse protein core ~67 kDa smaller than the human (21). The presence of this longer polypeptide in the human species suggests that there may be alternative splicing in this region and that the human species may assume a more extended configuration. A possible glycosaminoglycan attachment sequence is found in the 14th repeat (see Fig. 3).
Domain V: Homology with the Carboxyl-terminal G-domain of Laminin A Chain and EGF (Amino Acids 3687-4391)-The terminal module of 705 residues comprises seven discrete subdomains: three globular regions (Va, Vb, and Vc, see Fig. 3) and two duplicate repeats similar to the epidermal growth factor (47). The globular subdomains exhibit a high degree of similarity (~33%) to the globular carboxyl end G-domain of
[Fig. 5A: dot plot of the LDL receptor (residue) against HSPG2 (residue).]
[Fig. 5B: alignment of the four LDL receptor-like repeats of domain II, human versus mouse: repeat 1, human 194-234 and mouse 195-235; repeat 2, residues 281-319 in both species; repeat 3, residues 320-359 in both; repeat 4, residues 360-403 in both; LDL receptor consensus (42) at the bottom.]
FIG. 5. Homology plot analysis of HSPG2 protein core and the LDL receptor (A) and amino acid alignment between the human and murine species (B). The complete amino acid sequence of human LDL receptor was compared with the first 860 residues of human HSPG2 sequence (A). Homology plot analysis was done using the DotPlot program (GCG package) with window settings of 30 and stringency of 15. A single dot, thus, indicates that >15 residues match in a searching area of 30 amino acids. In B, the four sequences in domain II that are homologous to the LDL receptor are compared with the mouse sequence (21). All the 6 cysteines (shaded) are fully conserved. The consensus sequence (42) is shown at the bottom. The sequence DGSDE, a proposed binding site for LDL, is underlined. Gaps introduced to optimize alignment are shown as dashes.
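The dot-plot captions (Figs. 4-6) define a dot by a window size and a stringency: a dot is drawn where more than `stringency` of the `window` aligned residues are identical. The sketch below implements that identity-count criterion directly by sliding a window along every diagonal of the comparison matrix. GCG's own programs may score windows differently (for instance with a residue comparison table), so this is an approximation of the criterion described in the captions, not a reimplementation of the package; the function name is hypothetical.

```python
def dot_plot(a: str, b: str, window: int = 30, stringency: int = 15) -> list[tuple[int, int]]:
    """Return 0-based (i, j) window starts where a[i:i+window] and b[j:j+window]
    share more than `stringency` identical residues at aligned positions."""
    n, m = len(a), len(b)
    dots = []
    # Every diagonal is fixed by offset = i - j; walk each one with a running count.
    for offset in range(-(m - window), n - window + 1):
        i0, j0 = max(offset, 0), max(-offset, 0)
        length = min(n - i0, m - j0)
        if length < window:
            continue
        same = [a[i0 + k] == b[j0 + k] for k in range(length)]
        count = sum(same[:window])          # identities in the first window
        for k in range(length - window + 1):
            if k:
                # Slide by one: add the entering position, drop the leaving one.
                count += same[k + window - 1] - same[k - 1]
            if count > stringency:
                dots.append((i0 + k, j0 + k))
    return sorted(dots)


if __name__ == "__main__":
    # A tandem duplication shows up as off-diagonal dots.
    print(dot_plot("XYZXYZ", "XYZXYZ", window=3, stringency=2))
```

For a self-comparison like Fig. 4, the call would be `dot_plot(seq, seq, window=30, stringency=20)`; runs of off-diagonal dots then mark internal repeats. The running count keeps the cost proportional to len(a) × len(b) instead of also multiplying by the window size.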
laminin A (44, 45) and to merosin, a laminin A homologue (48). These subdomains may fold into globular structures as found in laminin. The EGF-like repeats are composed of about 40 amino acids, each exhibiting a perfect conservation of all 6 cysteines (Fig. 8). The amino acid alignment follows a consensus sequence for the EGF type 1 repeat (47) and contains several conserved glycines that may be involved in folding. The second and fourth EGF repeats contain an SGXG sequence which may be substituted with glycosaminoglycan chains. It is not clear whether the EGF motifs can act as growth modulators or whether they have lost their original cell signaling abilities and serve as mere "spacers" between more functionally active domains.
Detection of HSPG2 mRNA by in Situ Hybridization-To determine the precise cellular localization of HSPG2 gene expression, a number of human tissues were fixed in a manner that preserves mRNA (16) and processed for in situ hybridization using the biotin-labeled HS-1 insert (9). Specific signal, seen as a dark purple chromogen by Nomarski optics, is detected in the syncytiotrophoblasts of placenta (Fig. 9B) and within the endothelial cells of the fetal circulation (Fig. 9C). In contrast, biotin-labeled pBR322 plasmid without any human insert gives no appreciable signal (Fig. 9D). The presence
of the HSPG2 transcript was generally detected in all epithelial and endothelial cells of the tissues analyzed, including those derived from colon, prostate, uterus, ovary, and skin (not shown). The presence of HSPG2 transcript in endothelial cells is in close agreement with previous studies which have shown a single transcript of 12-14 kb in RNA derived from human endothelial cells, human fibroblasts, colon tissue, colon carcinoma cells, and liver (9). In summary, the in situ hybridization studies reported above indicate a wide expression of the HSPG2 gene among human vascularized tissues.
Tissue Distribution of HSPG2 Protein Core as Detected by Monoclonal Antibody and Immunoenzymatic Staining-To investigate in more detail the expression of this proteoglycan, frozen sections of various human tissues, both benign and malignant, were reacted with a monoclonal antibody, HS42 (19). This antibody was raised against the protein core of the human placenta proteoglycan, a large heparan/dermatan sulfate hybrid proteoglycan that binds fibronectin and is localized to the basement membrane (19). Because HS42 reacts strongly with the human HSPG2 proteoglycan and stains the colon carcinoma cells intensely (not shown), experiments were performed to investigate in detail the distribution of HSPG2 in various human organs. The technique used was
[Fig. 6A: dot plot of the laminin A chain (residue) against HSPG2 (residue).]
FIG. 6. Homology plot analysis of HSPG2 protein core and the laminin A chain (A) and amino acid alignment of cysteine repeats in domain III (B). The first 2600 residues of human laminin A sequence (45) were compared with the corresponding amino acid residues of human HSPG2 (A). Homology plot analysis was done using the DotPlot program (GCG package) with a window setting of 30 and a stringency of 15. A single dot, thus, indicates that >15 residues match in a searching area of 30 amino acids. In B, the four cysteine repeats of domain III are aligned. Gaps introduced to optimize alignment are shown as dashes.
[Fig. 6B: alignment of the four cysteine-rich repeats of domain III: repeat 1 (residues 505-530), repeat 2 (731-933), repeat 3 (1126-1334), and repeat 4 (1530-1670).]
[Fig. 7: alignment of the 22 IgG-like repeats, one from domain IIa and 21 from domain IV.]
[Fig. 8 (panel labels): EGF-like repeats at residues 3849-3888, 3889-3929, 4109-4147, and 4148-4184, aligned with the consensus sequence (47).]
FIG. 7. Alignment of the 22 immunoglobulin-like repeats in domains IIa and IV of the human HSPG2 protein core. One IgG-like repeat is present in domain IIa (see Fig. 3), and 21 consecutive repeats are present in domain IV. The individual repeats vary between 85 and 100 amino acids and have extensive conservation of the cysteine (C), glycine (G), and tryptophan (W) residues, which are all shaded. Gaps introduced to optimize alignment are shown as dashes.
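The text attributes the ~67 kDa difference between the human and mouse protein cores to the extra IgG-like repeats in domain IV (21 versus 14). A back-of-the-envelope check, assuming the 85-100-residue repeat length from the Fig. 7 caption and the mean residue mass of the mature human core (466,564 Da / 4370 residues), shows whether that range brackets 67 kDa. Glycosylation is ignored, and the constant and function names are illustrative.

```python
# Assumed inputs: mean residue mass of the mature human core and the
# repeat counts and repeat-length range quoted in the text and Fig. 7.
MEAN_RESIDUE_MASS_DA = 466_564 / 4370   # Da per residue
HUMAN_IG_REPEATS = 21
MOUSE_IG_REPEATS = 14


def extra_repeat_mass_kda(repeat_length: int) -> float:
    """Polypeptide mass (kDa) of the repeats present in human but not mouse domain IV."""
    extra_repeats = HUMAN_IG_REPEATS - MOUSE_IG_REPEATS
    return extra_repeats * repeat_length * MEAN_RESIDUE_MASS_DA / 1000


low = extra_repeat_mass_kda(85)    # shortest repeat length in Fig. 7
high = extra_repeat_mass_kda(100)  # longest repeat length in Fig. 7

if __name__ == "__main__":
    extra = HUMAN_IG_REPEATS - MOUSE_IG_REPEATS
    print(f"{extra} extra repeats: {low:.1f}-{high:.1f} kDa (text: ~67 kDa)")
```

The estimate spans roughly 63.5-74.7 kDa, so the quoted ~67 kDa is consistent with seven extra repeats; working backward, 67 kDa corresponds to about 90 residues per repeat.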
FIG. 8. Alignment of the four epidermal growth factor-like repeats in domain V. The four EGF-like cysteine-rich domains are aligned with each other and with the consensus sequence for EGF type 1 repeats (47). Notice that there is perfect conservation of the 6 cysteine residues (shaded and numbered 1-6). The disulfide bonds are expected to form as shown: Cys1-Cys3, Cys2-Cys4, and Cys5-Cys6 (47). The conserved glycines are underlined. Gaps introduced to optimize alignment are indicated as dashes.
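The Fig. 8 caption assigns EGF-type disulfides purely by cysteine order (Cys1-Cys3, Cys2-Cys4, Cys5-Cys6). The sketch below applies that rule to any sequence with exactly six cysteines. Because the HSPG2 repeat sequences in Fig. 8 are not legible in this copy, the example uses mature human EGF, whose disulfides Cys6-Cys20, Cys14-Cys31, and Cys33-Cys42 follow the same pattern; the function names are hypothetical.

```python
# Canonical EGF connectivity, expressed as ordinal cysteine numbers (1-6).
EGF_PAIRING = ((1, 3), (2, 4), (5, 6))

# Mature human EGF (53 residues), used only as a worked example.
HUMAN_EGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"


def cysteine_positions(seq: str) -> list[int]:
    """1-based positions of every cysteine in seq."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "C"]


def egf_disulfides(seq: str) -> list[tuple[int, int]]:
    """Pair cysteines as Cys1-Cys3, Cys2-Cys4, Cys5-Cys6 (positions are 1-based)."""
    cys = cysteine_positions(seq)
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    return [(cys[a - 1], cys[b - 1]) for a, b in EGF_PAIRING]


if __name__ == "__main__":
    print(egf_disulfides(HUMAN_EGF))
```

The rule only holds for domains with the canonical six cysteines, which is why the function refuses other counts instead of guessing a pairing.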
highly sensitive, both because the antibody was strongly reactive at >1:1000 dilution, thus notably reducing the background, and because the signal was amplified by bridging antibodies (20). The results show a marked labeling along the basement membranes of human glomeruli (Fig. 10A), proximal and distal renal tubules (Fig. 10B), smooth muscle cells
FIG. 9. Localization of HSPG2 proteoglycan transcript by in situ hybridization. A, light microscopic view of human term placenta stained with hematoxylin and eosin. Notice a chorionic villus with well developed fetal vessels lined by syncytiotrophoblasts. B and C, sequential sections from the same block viewed by Nomarski differential interference contrast optics. Notice the presence of HSPG2 transcript (arrowheads) in the syncytiotrophoblasts and endothelial cells, respectively. D, control section where HSPG2 probe was replaced with an irrelevant plasmid. For in situ hybridization, freshly obtained tissues were fixed at -20 °C in modified Carnoy's solution, sequentially dehydrated in ethanol, impregnated with amyl acetate/paraffin, and embedded in Paraplast. Eight-µm-thick sections were digested with proteinase K, refixed in formaldehyde, and denatured with formamide prior to hybridization (17). The human HS-1 insert (9) was biotin-labeled by nick translation in the presence of biotin-dUTP (18). [3H]dATP was routinely used to monitor incorporation, whereas pBR322 plasmid was labeled as control. Routinely, hybridization conditions included 5 µg/ml of biotinylated probe, which was denatured at 65 °C for 5 min before incubation, 7% dextran sulfate, 24% formamide, 37 °C, 48 h. After rinsing with 12.5% formamide, hybridization was visualized histochemically with streptavidin-alkaline phosphatase (Enzo Biochemicals) using 5-bromo-4-chloro-3-indolyl phosphate and nitro blue tetrazolium as substrates. The hybridization signals were detected as dark purple precipitates on the sections (B and C). (Bar in A = 10 µm.)
of colonic tunica muscularis (Fig. 10C), arterioles and venules (Fig. 10D), and ovarian epithelium (Fig. 10E). A striking degree of reactivity is observed in the ovarian stromal cells (Fig. 10E). In contrast, in papillary carcinoma of the ovary (Fig. 10F), the HSPG2 epitopes are primarily located along the basement membranes of the neoplastic cells and blood vessels. In normal colonic mucosa (Fig. 10G), the proteoglycan is localized along the basement membrane of the epithelium and blood vessel cells. The apparent reactivity of the apical portion of the epithelial cells (Fig. 10G) is likely due to endogenous alkaline phosphatase, since it is also present in the control sections lacking monoclonal antibody (not shown). In colon carcinoma (Fig. 10H), the proteoglycan is localized not only along the basement membrane of the tumor cells (multiple arrowheads), but also in the fibrovascular tumor stroma (single arrowhead). Intense proteoglycan reactivity is also found in the syncytiotrophoblasts and the endothelial basement membrane of the fetal vessels (not shown) of term human placenta. These results are in close agreement with the in situ hybridization studies reported above. Finally, we found immunoenzymatic reactivity along the basement membrane of skin, prostatic glands, and the perisinusoidal region of liver (not shown), in agreement with a previous report (19). Taken together, these results indicate a ubiquitous distribution of HSPG2 proteoglycan throughout the vascularized tissues and suggest that a significant contribution to the genesis of this proteoglycan is made by stromal cells, which previously have been thought to be incapable of synthesizing basement membrane constituents.
DISCUSSION
In the present study we report the complete sequence of the human HSPG2 protein core, whose partial sequence and chromosomal assignment were initially reported by this laboratory (9). This proteoglycan is one of the largest gene products in human tissues, with a mature protein core of ~467 kDa. If we include the numerous post-translational modifications, such as six potential glycosaminoglycan side chains of <30 kDa each (2), the large number of O-linked oligosaccharides (3), 10 potential N-linked oligosaccharides, and two or more long-chain fatty acids (6), the complete proteoglycan could reach the size of 850 kDa, as we originally estimated (2). One of the most fascinating features of this gene product is its elaborate structure, which clearly is a result of the assembly
FIG. 10. Gallery of light micrographs of various human tissues following immunoenzymatic staining with monoclonal antibody against the protein core of HSPG2. A, renal cortex showing intense staining of glomerular (arrowhead) and tubular basement membranes. B, higher magnification showing intense staining of tubular basement membrane (arrowheads). C, cross-section of tunica muscularis of colon showing diffuse cellular positivity in smooth muscles, whereas the nerve plexus is essentially negative. D, vessels in perirenal fibroadipose tissue. E, ovary. F, papillary carcinoma of the ovary. G, colonic mucosa. Notice the intense staining of the epithelial basement membrane (arrowheads). The reactivity present in the apical portions of the epithelial cells is likely due to endogenous alkaline phosphatase; it was also present in control sections (not shown). H, human colon carcinoma. Notice the presence of immunoreactive proteoglycan along the basement membrane (multiple arrowheads) and in the fibrovascular tumor stroma (single arrowhead). Frozen sections of freshly obtained surgical specimens were reacted with HS42 monoclonal antibody (19) from ascites fluid at a 1:1000 dilution, followed by rabbit anti-mouse bridging antibodies. Immunoenzymatic labeling of the monoclonal was achieved by using immune complexes of alkaline phosphatase and anti-alkaline phosphatase monoclonal antibodies. Signal enhancement was achieved by using one or two additional layers of bridging antibody and anti-phosphatase antibody (20). (A, C, and E = ×210; B, D, and F = ×450; H = ×600.)
1
of modular units. These protein modules are involved in the control of lipoprotein metabolism, in adhesion of cells to substratum, in theinteractions between cellsand matrix, and in the control of cellular growth. The existence of repeated modules in vertebrate proteins is now well recognized (49) and allows a more efficientutilization of functional domains by providing synergism of functional units (49-51). At the genomic level, their molecular organization is characterized by a series of exons with the same phase at their intron-exon boundaries, which allows divergent evolution of a primordial gene by either duplication or exon shuffling (49-51). Below, we will briefly discussthe various modulesand we will attempt to generate a comprehensive understanding of this complex proteoglycan. Domain I: A Unique Region Carrying ThreeHeparan Sulfate Chains-This172-aminoacid-longdomain appears to be unique forthe HSPG2, since an extensive computersearch of >32,000 protein sequences using the FASTA program revealdd no significant similarity with any other protein. This domain contains a cluster of three SGD sequences whichare exclusively foundin this region and arefully conservedin the murine species (21). Two of these sequences conformto the consensus sequence ED-GSG-ED proposed previously for
attachment of glycosaminoglycans (23), which is also found in three different integral membrane proteoglycans: syndecan (24, 25), glypican (26), and the NG2 proteoglycan (27). Whereas syndecan is a hybrid proteoglycan containing both heparan and chondroitin sulfate chains, glypican contains only heparan sulfate chains and the NG2 proteoglycan only chondroitin sulfate chains. It is presently unclear whether the glycosaminoglycan specificity derives from … and/or from cell- and tissue-specific enzymes. The location of three HS chains at the amino terminus of the human and mouse proteoglycans (21) supports electron microscopic data showing, in the murine EHS tumor, a cluster of HS chains at one end of the molecule (52-54). In addition, human HSPG2 contains three potential HS-attachment sites of the form SGXG, one in domain IV and two in domain V. This consensus sequence has been observed in a number of small chondroitin/dermatan sulfate-containing proteoglycans (29). In domain IV, the related sequence DSGE occurs nine times, five times more than in the mouse (21). However, most of these tetrapeptides lack the acidic-X-S-G-acidic motif necessary for glycosaminoglycan attachment (23). Near the carboxyl terminus, there are two additional tetrapeptides, EGSG and GSGE, which partially fulfill the consensus sequence described above.
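The consensus-sequence arguments above amount to scanning the deduced protein sequence for short motifs. As an illustration only, the Python sketch below encodes the motifs named in this section as simplified regular expressions (an assumption; in particular, acidic-X-S-G-acidic is rendered as `[DE].SG[DE]`) and reports every 1-based start position, including overlapping matches. The helper `find_motifs` and the toy peptide are hypothetical; neither is taken from HSPG2 or from the analysis described in this paper.

```python
import re

# Simplified regular-expression encodings of the motifs discussed in the text.
# These encodings are illustrative assumptions, not published definitions.
MOTIFS = {
    "SGD": "SGD",
    "SGXG": "SG.G",                        # X = any residue
    "DSGE": "DSGE",
    "acidic-X-S-G-acidic": "[DE].SG[DE]",  # D or E flanking X-S-G
    "LRE": "LRE",
    "RGD": "RGD",
    "YIGSR": "YIGSR",
}


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return the 1-based start position of every match of each motif in seq.

    A zero-width lookahead is used so that overlapping matches are all
    reported (a plain re.finditer would skip a match starting inside the
    previous one).
    """
    seq = seq.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Hypothetical toy peptide, not a segment of HSPG2.
    toy = "AEGSGEASGDLRESGAGDSGEK"
    for name, positions in find_motifs(toy).items():
        print(f"{name:>20}: {positions}")
```

On this toy peptide the two acidic-X-S-G-acidic matches (starting at positions 2 and 6) share residue 6; a non-overlapping search would report only the first, which is why the lookahead is used.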
Structural Similarities between Proteoglycans from Basement Membranes and the Attachment of Diverse Glycosaminoglycans. There is compelling evidence that, in contrast to the smaller proteoglycan species (55, 56), the high molecular weight proteoglycans made by different basement membrane-producing tissues share significant structural and immunological homology (57, 58). For example, similar protein cores of ~400 kDa are found in human colon carcinoma cells (2), EHS tumor (8, 52, 53), rat yolk sac tumor (59), human placenta (19), bovine endothelial cells (60), human lung fibroblasts (30), human fibrosarcoma cells (12), and calf lens epithelial cells (61). The evidence published so far suggests that some of these related protein cores can be substituted with either heparan or chondroitin/dermatan sulfate and that, in some circumstances, the proteoglycan may carry both types of chains. For example, it has been shown that both EHS tumor tissue (62) and a cell line recently established from the EHS tumor (63) contain hybrid proteoglycans. It has also been proposed (52, 62) that two forms of the basement membrane HSPG derived from EHS tumor tissue may be distinct gene products, whereas others (64) have demonstrated a common origin of the two species by proteolytic processing of the same precursor. Our results indicate that the protein core likely undergoes extensive post-translational modification, which could include the addition of both heparan and chondroitin sulfate chains. These chains would be localized at opposite ends of the protein core and could behave similarly on analytical chromatography or ultracentrifugation if the cleavage sites of the protein core were close to the glycosaminoglycan chains. Accordingly, a high-density proteoglycan, composed of three chains attached to a partially cleaved protein core, could derive from either the amino or the carboxyl terminus. A plausible scenario that could explain a number of conflicting published results is that a common precursor protein undergoes tissue- and cell-specific processing to generate multiple forms of proteoglycans.

Domain II: A Structural Module with Homology to the LDL Receptor. The presence of a molecular domain with features typically found in proteins that mediate nutritional uptake, and its proximity to the heparan sulfate-containing region, is quite interesting. One possibility is that the two domains may coordinately influence LDL metabolism. This view derives from the observations that heparin, a related glycosaminoglycan, displaces LDL from its receptor (39), that heparan sulfate binds LDL (65), and that the LDL receptor-like domain of HSPG2 is the region directly involved in the interaction with LDL (42). Of particular interest is the presence of DGSDE, a sequence that is fully conserved in the mouse (21) and that has been proposed to represent the specific site of interaction between the receptor and LDL (42). Future studies need to establish whether HSPG2, either as a basement membrane proteoglycan or as an integral membrane proteoglycan, is directly involved in the metabolism of LDL.

Domain III: Homology to the Short Arm of Laminin A Chain. The laminin-like nature of HSPG2 has been known since the original report of the partial cDNA sequence of the murine species (10) and was later confirmed by the first human clones published (9, 12). The structural similarity to the short arm of laminin A, with both globular repeats and cysteine-rich repeats, suggests that HSPG2 shares a number of functions with laminin, such as cell adhesion and growth. Human HSPG2, however, lacks both the RGD and YIGSR sequences, two peptides that have been implicated in cellular binding and differentiation (32, 66, 67). Laminin is the first matrix protein detectable at early stages of differentiation, and its synthesis and deposition correlate with epithelial development (68). Recent studies (69) have shown that cysteine-rich peptides derived from the region of laminin homologous to domain III of HSPG2 promote cell division. Interestingly, these peptides induce cell growth comparable in dose response and time dependence to that induced by EGF (69). Thus, one possible function of HSPG2 domain III could be growth-promoting activity during development and repair. Finally, domain III may bind fibronectin, as shown for the proteoglycans of human placenta (19) and lung fibroblasts (70). Because of their similar protein core size, tissue distribution, immunological cross-reactivity, similarly sized mRNA species, and shared peptide sequences (see Fig. 2), we believe that these proteoglycans are identical to those described in colon carcinoma and colon (2, 9) and in fibrosarcoma cells (12). The HSPG from human lung fibroblast matrix (30) contains two major glycosaminoglycan-free peptides of ~110 and 62 kDa, which bind avidly (Kd = 2 nM) to fibronectin (70). This suggests the existence of stable complexes in vivo (70) and is in agreement with the data from the human placenta proteoglycan (19). Our findings suggest that domain III could be involved in mediating these proteoglycan-fibronectin interactions.

Domain IV: A Large Module Composed of Reiterative IgG-like Subdomains as in N-CAM. This module is by far the largest assembly of IgG-like repeats found so far, with 21 consecutive units spanning 2010 amino acids that are homologous to the IgG-like repeats found in N-CAM (43, 71). It is interesting that the human protein has seven more IgG repeats than the murine species, which contains only 14 repeats in domain IV. This may derive from alternative splicing of the mRNA. The immunoglobulin gene superfamily, to which HSPG2 domain IV clearly belongs, is one of the best-studied module families. It has been shown how this module can be adapted to bind a variety of ligands by changing the length of the variable polypeptide loops attached to a stable β-sheet core structure (72). It is possible that these IgG repeats are involved in homophilic binding, as proposed for N-CAM (43, 71). In addition, domain IV contains a sequence of high hydrophobicity, and three independent methods (33-35) predict a transmembrane domain (see Table I and Fig. 2), followed by a potential cleavage site. This opens the possibility that this gene product may be intercalated in, or tightly bound to, the plasma membrane, at least during some stages of synthesis and translocation. These findings are in agreement with the relatively high hydrophobicity of the human proteoglycan (36).

Domain V: A Carboxyl-terminal Module Analogous to the G-domain of Laminin A Chain and EGF. The terminal domain is similar to the large, carboxyl-terminal G-domain of laminin A chain (45, 46) and to the related glycoprotein merosin (48). It differs, however, in two respects: (i) the presence of three globular domains instead of five and (ii) the unique presence of four EGF-like repeats. The globular domains could be involved in both homotypic and heterotypic interactions, as has been shown for the EHS proteoglycan (53, 58). Synthetic peptides from the G-domain of laminin A promote cell adhesion and neurite outgrowth and interact with heparin and the β1 integrin subunit (73). Therefore, plausible functional roles of the globular regions of domain V would be their direct involvement in the assembly and maintenance of basement membranes and in the adhesion of epithelial cells. The four EGF repeats are very interesting features of this domain, since all 6 cysteines are fully conserved, in contrast to the murine species, in which only 5 cysteines are present in the first two EGF repeats (21). The prototype EGF causes pleiotropic proliferative and developmental effects
which are mediated by a specific cell surface receptor endowed with tyrosine kinase activity (74). The secondary structure of human EGF in solution has recently been elucidated (75). Accordingly, the four EGF-like motifs in HSPG2 conform to the type 1 consensus present in a number of proteins, including human tissue plasminogen activator, transforming growth factor α, and laminin (47). If the disulfide bonds among the 6 cysteines follow the same pattern as in human EGF, then domain V would contain four "finger-like" structures, which together would provide maximal binding affinity. It is an appealing concept that the EGF motifs may exert vectorial growth-promoting activity on opposing cells, as proposed for some of the EGF subdomains of laminin (77). Finally, domain V contains two Leu-Arg-Glu (LRE) tripeptides, a sequence present in S-laminin, a homologue of laminin concentrated in the basement membranes of motor nerve terminals and muscle fibers at the neuromuscular junction (31, 76). Neurons bind to LRE-containing peptides, and soluble LRE tripeptides block attachment of neurons to S-laminin (31). The presence of two LRE sequences, in contrast to the murine species in which only one is present (21), suggests an important role for HSPG2 in neurite outgrowth.

Conclusions. The molecular data presented in this paper have unraveled the structural complexity of the major proteoglycan from human basement membranes and other extracellular matrices. This composite, multidomain gene product appears to be evolutionarily related to molecules involved in fundamental cellular processes such as nutrient binding and delivery, mitogenesis, and the attachment/detachment of cells.
This nondiffusible chimeric macromolecule with intrinsic growth-promoting and cell-adhesive properties could be utilized by tissues during embryonic development, repair, and growth. Our findings provide evidence for permissive structural hierarchies, in which the level of contact, growth regulation, and binding between HSPG2 and surrounding molecules would be provided by several site-specific interactions that collectively affect cellular processes and their microenvironment.

Acknowledgments. We thank J. … for his continuous support, Drs. M.-L. Chu and J. Uitto for providing cDNA libraries, Dr. M. Isemura for the monoclonal antibody HS42, Dr. R. Schwarting for help with the immunoenzymatic studies, and N. Hacobian, M. Naso, and K. Shepley for excellent technical assistance. We also thank the members of the Computer Facility of the Jefferson Cancer Institute.
REFERENCES

1. Ruoslahti, E. (1988) Annu. Rev. Cell Biol. 4, 229-255
2. Iozzo, R. V. (1984) J. Cell Biol. 99, 403-417
3. Iozzo, R. V., and Clark, C. C. (1987) J. Biol. Chem. 262, 11188-11199
4. Iozzo, R. V., and Hassell, J. R. (1989) Arch. Biochem. Biophys. 269, 239-249
5. Iozzo, R. V. (1989) J. Biol. Chem. 264, 2690-2699
6. Iozzo, R. V., Kovalszky, I., Hacobian, N., Schick, P. K., Ellingson, J. S., and Dodge, G. R. (1990) J. Biol. Chem. 265, 19980-19989
7. Dodge, G. R., Kovalszky, I., Hassell, J. R., and Iozzo, R. V. (1990) J. Biol. Chem. 265, 18023-18029
8. Hassell, J. R., Robey, P. G., Barrach, H.-J., Wilczek, J., Rennard, S. I., and Martin, G. R. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 4494-4498
9. Dodge, G. R., Kovalszky, I., Chu, M.-L., Hassell, J. R., McBride, O. W., Yi, H. F., and Iozzo, R. V. (1991) Genomics 10, 673-680
10. Noonan, D. M., Horigan, E. A., Ledbetter, S. R., Vogeli, G., Sasaki, M., Yamada, Y., and Hassell, J. R. (1988) J. Biol. Chem. 263, 16379-16387
11. Wintle, R. F., Kisilevsky, R., Noonan, D., and Duncan, A. M. V. (1990) Cytogen. Cell Genet. 54, 60-61
12. Kallunki, P., Eddy, R. L., Byers, M. G., Kestila, M., Shows, T. B., and Tryggvason, K. (1991) Genomics 11, 389-396
13. Feinberg, A. P., and Vogelstein, B. (1984) Anal. Biochem. 137, 266-267
14. Short, J. M., Fernandez, J. M., Sorge, J. A., and Huse, W. D. (1988) Nucleic Acids Res. 16, 7583-7600
15. Chen, E. Y., and Seeburg, P. H. (1985) DNA (N Y) 4, 165-170
16. Tuan, R. S., Lamb, B. T., and Jesinkey, C. B. (1988) Differentiation 37, 198-204
17. McDonald, S. A., and Tuan, R. S. (1989) Dev. Biol. 133, 221-234
18. Langer, P., Waldrop, A., and Ward, D. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 6633-6637
19. Isemura, M., Sato, N., Yamaguchi, Y., Aikawa, J., Munakata, H., Hayashi, N., Yosizawa, Z., Nakamura, T., Kubota, A., Arakawa, M., and Hsu, C.-C. (1987) J. Biol. Chem. 262, 8926-8933
20. Schwarting, R., Gerdes, J., Durkop, H., Falini, B., Pileri, S., and Stein, H. (1989) Blood 74, 1678-1689
21. Noonan, D. M., Fulle, A., Valente, P., Cai, S., Horigan, E., Sasaki, M., Yamada, Y., and Hassell, J. R. (1991) J. Biol. Chem. 266, 22939-22947
22. von Heijne, G. (1986) Nucleic Acids Res. 14, 4683-4690
23. Zimmermann, D. R., and Ruoslahti, E. (1989) EMBO J. 8, 2975-2981
24. Saunders, S., Jalkanen, M., O'Farrell, S., and Bernfield, M. (1989) J. Cell Biol. 108, 1547-1556
25. Mali, M., Jaakkola, P., Arvilommi, A.-M., and Jalkanen, M. (1990) J. Biol. Chem. 265, 6884-6889
26. David, G., Lories, V., Decock, B., Marynen, P., Cassiman, J.-J., and Van Den Berghe, H. (1991) J. Cell Biol. 111, 3165-3176
27. Nishiyama, A., Dahlin, K. J., Prince, J. T., Johnstone, S. R., and Stallcup, W. B. (1991) J. Cell Biol. 114, 359-371
28. Huber, S., Winterhalter, K. H., and Vaughan, L. (1988) J. Biol. Chem. 263, 752-756
29. Bourdon, M. A., Krusius, T., Campbell, S., Schwartz, N. B., and Ruoslahti, E. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 3194-3198
30. Heremans, A., Van Der Schueren, B., De Cock, B., Paulsson, M., Cassiman, J.-J., Van Den Berghe, H., and David, G. (1989) J. Cell Biol. 109, 3199-3211
31. Hunter, D. D., Porter, B. E., Bulock, J. W., Adams, S. P., Merlie, J. P., and Sanes, J. R. (1989) Cell 59, 905-913
32. Ruoslahti, E., and Pierschbacher, M. D. (1987) Science 238, 491-497
33. Rao, M. J. K., and Argos, P. (1986) Biochim. Biophys. Acta 869, 197-214
34. Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. (1984) J. Mol. Biol. 179, 125-142
35. Klein, P., Kanehisa, M., and DeLisi, C. (1985) Biochim. Biophys. Acta 815, 468-476
36. Iozzo, R. V. (1988) J. Cell. Biochem. 37, 61-78
37. Maizel, J. V., and Lenk, R. P. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7665-7669
38. Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2444-2448
39. Yamamoto, T., Davis, C. G., Brown, M. S., Schneider, W. J., Casey, M. L., Goldstein, J. L., and Russell, D. W. (1984) Cell 39, 27-38
40. Raychowdhury, R., Niles, J. L., McCluskey, R. T., and Smith, J. A. (1989) Science 244, 1163-1165
41. DiScipio, R. G., Gehring, M. R., Podack, E. R., Kan, C. C., Hugli, T. E., and Fey, G. H. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 7298-7302
42. Sudhof, T. C., Goldstein, J. L., Brown, M. S., and Russell, D. W. (1985) Science 228, 815-822
43. Cunningham, B. A., Hemperly, J. J., Murray, B. A., Prediger, E. A., Brackenbury, R., and Edelman, G. M. (1987) Science 236, 799-806
44. Sasaki, M., Kleinman, H. K., Huber, H., Deutzmann, R., and Yamada, Y. (1988) J. Biol. Chem. 263, 16536-16544
45. Haaparanta, T., Uitto, J., Ruoslahti, E., and Engvall, E. (1991) Matrix 11, 151-160
46. Sasaki, M., and Yamada, Y. (1987) J. Biol. Chem. 262, 17111-17117
47. Appella, E., Weber, I. T., and Blasi, F. (1988) FEBS Lett. 231, 1-4
48. Ehrig, K., Leivo, I., Argraves, W. S., Ruoslahti, E., and Engvall, E. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 3264-3268
49. Doolittle, R. F. (1989) Trends Biochem. Sci. 14, 244-245
50. Patthy, L. (1987) FEBS Lett. 214, 1-7
51. Baron, M., Norman, D. G., and Campbell, I. D. (1991) Trends Biochem. Sci. 16, 13-17
52. Paulsson, M., Yurchenco, P. D., Ruben, G. C., Engel, J., and Timpl, R. (1987) J. Mol. Biol. 197, 297-313
53. Yurchenco, P. D., Cheng, Y.-S., and Ruben, G. C. (1987) J. Biol. Chem. 262, 17668-17676
54. Laurie, G. W., Inoue, S., Bing, J. T., and Hassell, J. R. (1988) Am. J. Anat. 181, 320-326
55. Kanwar, Y. S., Hascall, V. C., and Farquhar, M. G. (1981) J. Cell Biol. 90, 527-532
56. Soroka, C. J., and Farquhar, M. G. (1991) J. Cell Biol. 113, 1231-1241
57. Hassell, J. R., Kimura, J. H., and Hascall, V. C. (1986) Annu. Rev. Biochem. 55, 539-567
58. Yurchenco, P. D., and Schittny, J. C. (1990) FASEB J. 4, 1577-1590
59. Wewer, U. M., Albrechtsen, R., and Hassell, J. R. (1985) Differentiation 30, 61-67
60. Saku, T., and Furthmayr, H. (1989) J. Biol. Chem. 264, 3514-3523
61. Mohan, P. S., and Spiro, R. G. (1991) J. Biol. Chem. 266, 8567-8575
62. Kato, M., Koike, Y., Ito, Y., Suzuki, S., and Kimata, K. (1987) J. Biol. Chem. 262, 7180-7188
63. Danielson, K. G., Martinez-Hernandez, A., Hassell, J. R., and Iozzo, R. V. (1992) Matrix 12, 22-35
64. Klein, D. J., Brown, D. M., Oegema, T. R., Brenchley, P. E., Anderson, J. C., Dickinson, M. A. J., Horigan, E. A., and Hassell, J. R. (1988) J. Cell Biol. 106, 963-970
65. Cardin, A. D., and Weintraub, H. J. R. (1989) Arteriosclerosis 9, 21-32
66. Graf, J., Iwamoto, Y., Sasaki, M., Martin, G. R., Kleinman, H. K., Robey, F. A., and Yamada, Y. (1987) Cell 48, 989-996
67. Grant, D. S., Tashiro, K.-I., Segui-Real, B., Yamada, Y., Martin, G. R., and Kleinman, H. K. (1989) Cell 58, 933-943
68. Timpl, R., and Dziadek, M. (1986) Int. Rev. Exp. Pathol. 29, 1-112
69. Panayotou, G., End, P., Aumailley, M., Timpl, R., and Engel, J. (1989) Cell 56, 93-101
70. Heremans, A., De Cock, B., Cassiman, J.-J., Van den Berghe, H., and David, G. (1990) J. Biol. Chem. 265, 8716-8724
71. Rutishauser, U., Acheson, A., Hall, A. K., Mann, D. M., and Sunshine, J. (1988) Science 240, 53-57
72. Williams, A. F., and Barclay, A. N. (1988) Annu. Rev. Immunol. 6, 381-405
73. Skubitz, A. P. N., Letourneau, P. C., Wayner, E., and Furcht, L. (1991) J. Cell Biol. 115, 1137-1148
74. Carpenter, G., and Cohen, S. (1979) Annu. Rev. Biochem. 48, 193-216
75. Cooke, R. M., Wilkinson, A. J., Baron, M., Pastore, A., Tappin, M. J., Campbell, I. D., Gregory, H., and Sheard, B. (1987) Nature 327, 339-341
76. Hunter, D. D., Shah, V., Merlie, J. P., and Sanes, J. R. (1989) Nature 338, 229-233
77. Engel, J. (1991) Int. J. Biol. Macromol. 13, 147-151