Human α1(XIII) Collagen Gene - The Journal of Biological Chemistry

THE JOURNAL OF BIOLOGICAL CHEMISTRY
Vol. 266, No. 26, Issue of September 15, pp. 17713-17719, 1991
© 1991 by The American Society for Biochemistry and Molecular Biology, Inc.
Printed in U.S.A.

Human α1(XIII) Collagen Gene
MULTIPLE FORMS OF THE GENE TRANSCRIPTS ARE GENERATED THROUGH COMPLEX ALTERNATIVE SPLICING OF SEVERAL SHORT EXONS*

(Received for publication, April 1, 1991)

Liisa Tikka, Outi Elomaa, Taina Pihlajaniemi‡, and Karl Tryggvason§

From the Biocenter and Department of Biochemistry, University of Oulu, SF-90570 Oulu and ‡Biocenter and Department of Medical Biochemistry, University of Oulu, SF-90220 Oulu, Finland

The structure of the gene for the human α1(XIII) collagen chain (COL13A1) was determined from genomic clones spanning 140,000 base pairs (bp), including about 3,000 bp of the 5'-end-flanking region and 6,000 bp of the 3'-end-flanking region. The gene was shown to contain 39 exons. There were eight exons of 27 bp, five of 36 bp, four of 54 bp, three of 45 bp, and two of 42 bp. The rest of the exons coding for translated sequences had sizes varying between 24 and 153 bp. The genomic clones did not contain exons 3 and 4 whose sizes could, however, be estimated from cDNA clones. S1 nuclease mapping and primer extension analyses indicated five closely spaced initiation sites of transcription. Sequencing of the 5'-end-flanking region did not reveal a typical TATA box but a four times repeated TATTTAT sequence that may serve as true TATA boxes. Two CCAAT boxes were found starting at positions -13 and -194, and furthermore, the promoter region contains two GC boxes. Previous studies on α1(XIII) collagen cDNA and genomic clones showed that the primary transcript undergoes complex alternative splicing generating at least four different forms of mRNAs. The present work demonstrated that sequences of seven exons are alternatively used. These exons contain sequences coding for pure collagenous regions, pure noncollagenous regions, and an exon coding for a junction of a collagenous and noncollagenous domain.

The heterogeneous family of collagenous proteins constitutes the major structural supportive elements of the extracellular matrix. The collagen molecules are composed of three α chains which are either identical or homologous products of two or three distinct genes. Characteristically, each of the polypeptide chains contains one or more regions with a repeating Gly-X-Y-sequence which participate in the formation of the triple-helical domain of each collagen molecule.

The fibrillar collagens (types I, II, III, V, and XI) constitute the major group of collagens (1-3). They are characterized by Mr ≥ 95,000 α chains which form a long continuous triple-helical domain of repeating Gly-X-Y-sequence and two terminal propeptides. These collagens assemble into fibrils in the various extracellular tissues where they are expressed. The fibrillar collagens are also characterized by their highly conserved gene structure: the number and size of Gly-X-Y-repeat coding exons has been strictly maintained through evolution with few exceptions (4). Most of these exons are 54 bp or multiples thereof, a size which has been proposed to be the ancestral coding unit from which the triple-helical domain has evolved (5). The basement membrane (type IV) collagen forms a distinct group of nonfibrillar collagens with large α chains, Mr ≥ 95,000, but the triple helix is frequently interrupted by short noncollagenous sequences (1, 6, 7). The genes coding for these discontinuous triple helices do not contain the 54-bp¹ exon pattern, and apparently they have a different evolutionary pathway (8-11).

Collagen types VI, VIII, IX, X, XII, and XIII are heterologous nonfibrillar collagens, which can be characterized by one or more triple-helical domains considerably shorter than those in fibrillar collagens (12-15). Types IX and XII are structurally related collagens (14) of which the former has been shown to be associated with the surface of collagen fibrils (16). The genes for the type IX collagen α1 and α2 chains have a similar exon organization, although they vary considerably in size (14, 17, 18). Several 54-bp exons are clustered in the central part of the chick α2(IX) gene, and exons with varying sizes are present in the terminal parts of the gene. Also, the partially characterized α1 chain gene for the related type XII collagen does not follow the 54-bp pattern for Gly-X-Y-repeat coding exons in the 3'-end of the gene (19). The α1(X) collagen gene is unique in that the entire triple-helical domain (460 amino acid residues) is encoded by a single exon (20).

Lately, additional heterogeneity of collagens has been reported by the discovery of collagen α chains which are generated by alternative splicing and the use of alternative transcription initiation sites. The first collagen of this kind to be described was type XIII collagen (15, 21, 22). The complete primary structure of the human α1(XIII) chain has been deduced from cDNA clones (15, 21). The calculated size of the α1(XIII) chain is only about Mr 62,000-65,000. The chain contains three collagenous domains, COL-1, COL-2, and COL-3 and four non-collagenous domains, NC-1, NC-2, NC-3, and NC-4 numbered beginning from the amino terminus (15, 21). The structure of the molecule, therefore, resembles that of type IX and type XII collagens. The biological function of type XIII collagen is unknown, but in situ hybridization studies have shown high level of α1(XIII) mRNA expression in the epidermis, hair follicles, and nail root cells of the skin and in the intestinal mucosal layer of human fetus (23). Sequence analyses of cDNA clones and S1 nuclease mapping indicate that this collagen type exhibits wide heterogeneity at the mRNA level resulting from alternative splicing at several sites of the primary transcript. The regions involved in the differential splicing include short sequences (36-87 nucleotides) which code for portions of COL-1, NC-2, COL-3, and NC-4 domains of the α1(XIII) chain leaving the central part of the polypeptide invariant (15). We have reported previously the characterization of the 3'-end of the gene and localized the gene to the q22 region of chromosome 10 (22, 24, 25). The alternative splicing of three exons (36, 45, and 87 bp) was shown to be responsible for the absence of these sequences in some of the overlapping cDNA clones (22).

More recently, type II collagen has also been shown to undergo alternative splicing which dictates the presence or absence of a cysteine-rich portion of the amino-terminal propeptide (26). Furthermore, in the α3 chain of type VI collagen, two of the nine amino-terminal noncollagenous repetitive subdomains are involved in an alternative splicing process (27). Alternative splicing also generates multiple α2(VI) mRNAs differing at their carboxyl termini (28). The use of alternative transcription start sites about 20 kb apart together with alternative usage of coding sequences has been reported for the α1(IX) collagen gene (29).

In the present study we have determined the structure of the α1(XIII) chain gene. The exon organization of this gene explains the existence of multiple forms of transcripts. Presently, seven of the exons characterized have been shown to undergo alternative splicing.

* This work was supported in part by grants from the Academy of Finland, Finland's Cultural Foundation, and Finland's Cancer Institute. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M68974-M69010.

§ To whom correspondence should be addressed: Biocenter and Department of Biochemistry, University of Oulu, SF-90570 Oulu, Finland. Tel.: 358-81-352361; Fax: 358-81-352356.

¹ The abbreviations used are: bp, base pair(s); kb, kilobase(s); Pipes, piperazine-N,N'-bis(2-ethanesulfonic acid).

EXPERIMENTAL PROCEDURES

Genomic Clones-Genomic libraries made with human genomic DNA cloned in bacteriophage EMBL-3 (Clontech) or Charon 4A (kindly provided by Dr. Ed Fritsch, Genentech) and a cosmid library (30) constructed in the pCV108 vector (kindly provided by Dr. Yun-Fai Chris Lau, Howard Hughes Medical Institute, San Francisco) were initially screened with the human α1(XIII) chain cDNA clone E-26 (15) or a 500-bp EcoRI-BamHI 5' restriction fragment from the same clone. The probes were nick-translated in the presence of [³²P]dCTP and screening was performed as described (31). The 5'-end of the gene was isolated using an end-labeled synthetic 40-mer (nucleotides 31-70 in E-26, see Ref. 15) as a probe. Isolated clones were characterized by restriction mapping and subfragments were cloned into pBR322, pUC18/19, and/or M13 mp18/19 for further characterization and nucleotide sequencing.

DNA Sequencing-DNA sequences were determined by the dideoxy chain termination method (32) using single-stranded M13 or double-stranded pUC and pBR322 clones as templates. In the sequencing reactions, either DNA polymerase large fragment (Klenow), "Sequenase," or Taq polymerase (U. S. Biochemicals Corp.) were used. For priming, universal primers specific for the vectors or synthetic primers specific for the exon to be analyzed or exon flanking sequences were used. The synthetic oligonucleotides were synthesized in an Applied Biosystems DNA synthesizer.

S1 Nuclease Protection and Primer Extension-Total RNA from HT-1080 cells was isolated by the guanidine thiocyanate phenol-chloroform extraction (33). The poly(A)-enriched RNA was prepared by oligo(dT)-cellulose chromatography. For the S1 nuclease protection assay we used an 840-bp-long AvaII-EagI fragment 5'-labeled at the EagI site. The S1 nuclease protection assay was performed as described (34). Briefly, the double-stranded probe was hybridized with 2 μg of poly(A)+ RNA in the presence of 80% formamide, 40 mM Pipes, pH 6.4, 400 mM NaCl, and 1 mM EDTA at 60 °C for 4 h after which digestion with 1,000 units/ml of S1 nuclease (Bethesda Research Laboratories) was performed at 37 °C for 30 min. The protected fragments were analyzed on a 5% polyacrylamide sequencing gel. For primer extension, 2 μg of the poly(A)+ RNA was hybridized with 70,000 cpm (about 5 pmol) of 5'-end-labeled 17-mer oligonucleotide (nucleotides 83-99 in Fig. 4) as described (35). The extension reaction was carried out at 37 °C for 40 min, and the reaction product was analyzed on a 6% polyacrylamide sequencing gel.

RESULTS

Isolation of Genomic Clones-We have previously described two cosmid clones, cosD2 and cosD3, which cover about 60 kb of the 3'-end of the human α1(XIII) chain gene and determined the size of the eleven most 3' exons (22). In order to isolate genomic clones for the remaining 5'-end of the gene, cDNA clone E-26 which covers almost all of the about 2.6-kb mRNA (15) was used as probe to screen genomic λ phage and cosmid libraries. Two overlapping λ clones, CL-37 and λH-21, and a cosmid clone, cos-61, which overlapped with the cosmid clone D3 and which also contained the regions covered by λH-21 and CL-37 clones were isolated (Fig. 1). Since an oligonucleotide probe specific for the 5'-end of the E-26 cDNA clone (nucleotides 31-70, Ref. 15) did not hybridize with cos-61, a 500-bp 5' fragment of the cDNA clone was used for screening of a library resulting in the isolation of cosmid clone cos-31. This clone spanned 32,000 bp, but it contained only one EcoRI restriction fragment of about 6 kb hybridizing with the E-26 cDNA. Only one exon (exon 2) was shown to be present in this large cos-31 clone. In order to isolate the extreme 5'-end of the gene, an oligonucleotide corresponding to the most 5' sequence of the E-26 cDNA (see above) was further used to screen a genomic λ phage library. This resulted in the isolation of a λ phage clone CL-412 which turned out to contain the first exon and the promoter region of the α1(XIII) collagen gene, but it did not overlap with cos-31. Together, the clones isolated cover about 140,000 bp of genomic DNA, including about 3,000 bp of the 5'-end-flanking region, 130,000 bp of the structural gene, and 5,000 bp of the

FIG. 1. Diagram of the human α1(XIII) collagen gene. Top, intervening sequences are indicated as a horizontal line with the approximate location of 37 exons indicated by vertical bars. Two regions not covered by the genomic clones and containing exons 3 and 4 are indicated. Middle, alignment of genomic clones with EcoRI restriction sites indicated by vertical bars. Bottom line, scale in kilobases (kb).


[Figure 2: exon/intron boundary sequences for exons 1-39 with exon sizes and encoded domains; the sequence panel is not recoverable from the extracted text.]

FIG. 2. Nucleotide sequences at exon/intron boundaries in the α1(XIII) collagen gene. Exon sequences are indicated by uppercase letters and flanking intron sequences by lowercase letters. The deduced amino acid sequence, numbered as in Ref. 15, is indicated below the exon sequence. The sequence encoded by exon 13 is alternative to the sequence encoded by exon 12 and the numbering of amino acids goes uninterrupted through exons 11, 12, and 14. Size of each exon is indicated as well as the domain it encodes. The sequence of the 3'-untranslated region of exon 39 is shown including 50 bp downstream of the second polyadenylation signal ataaa (underlined). Exons with interruptions (*) in the Gly-X-Y repeat sequence and exons with a split codon are indicated.
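The exon-size figures reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original analysis: it takes the sizes stated in the text for the 28 exons that code solely for triple-helical sequence and separates those that are whole multiples of the 9-bp Gly-X-Y unit from those that are not. The names `REPEATED`, `UNIQUE`, and `split_by_triplet_unit` are hypothetical.

```python
# Sizes (bp) of the 28 exons coding solely for triple-helical sequence,
# as stated in the text: repeated sizes (size -> count) plus seven unique sizes.
REPEATED = {27: 7, 36: 5, 54: 4, 45: 3, 42: 2}
UNIQUE = [33, 35, 51, 63, 81, 87, 153]


def collagenous_exon_sizes():
    """Expand the size/count summary into a sorted list of exon sizes."""
    sizes = []
    for size, count in REPEATED.items():
        sizes.extend([size] * count)
    sizes.extend(UNIQUE)
    return sorted(sizes)


def split_by_triplet_unit(sizes, unit=9):
    """Separate sizes that are multiples of one Gly-X-Y unit (9 bp) from the rest."""
    multiples = [s for s in sizes if s % unit == 0]
    others = [s for s in sizes if s % unit != 0]
    return multiples, others


sizes = collagenous_exon_sizes()
multiples, others = split_by_triplet_unit(sizes)
print(len(sizes), len(multiples), others)  # 28 22 [33, 35, 42, 42, 51, 87]
```

Under these stated sizes, six exons are not multiples of 9 bp, which agrees in number with the six exceptions (exons 3, 5, 18, 21, 27, and 30) named in the text.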

[Figure 3: genomic sequences of exons 26 and 28 aligned with the cDNA sequence of Ref. 15; the sequence panel is not recoverable from the extracted text.]

FIG. 3. The nucleotide sequences of exons 26 and 28. The encoded amino acid sequence is indicated at the top of the nucleotide sequence and is numbered as in Ref. 15. Below the exon sequence, the cDNA sequence of Ref. 15 is indicated at the sites of difference. Accordingly, the altered amino acids are shown below.
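The amino acid consequences of the base differences described for exons 26 and 28 follow directly from the standard genetic code. The sketch below is a hedged illustration: it assumes the affected leucine codons are of the CTN family (the exact codons are not legible in the extracted figure) and uses a deliberately partial codon table. `CODON_TABLE`, `substitute`, and `effect` are hypothetical names.

```python
# Partial standard codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "ATG": "Met", "ATT": "Ile",
}


def substitute(codon, index, base):
    """Return the codon with the base at the 0-based index replaced."""
    return codon[:index] + base + codon[index + 1:]


def effect(codon, index, base):
    """Return (original amino acid, new amino acid) for one substitution."""
    changed = substitute(codon, index, base)
    return CODON_TABLE[codon], CODON_TABLE[changed]


# T -> C at the second position turns every CTN leucine codon into proline.
leu_to_pro = [effect(c, 1, "C") for c in ("CTT", "CTC", "CTA", "CTG")]

# G -> T at the third position turns ATG (Met) into ATT (Ile).
met_to_ile = effect("ATG", 2, "T")

print(leu_to_pro)
print(met_to_ile)
```

Leucine is also encoded by TTA and TTG, where a T to C change at the second position would give serine rather than proline, so the reported Leu to Pro changes are consistent with CTN codons in the cDNA.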


[Figure 4: sequence panel covering positions -1081 to +561; not recoverable from the extracted text.]

FIG. 4. Nucleotide sequence of the 5'-end and -flanking region of the human α1(XIII) collagen gene. Exon sequences are indicated by uppercase letters and intron and 5'-flanking sequences by lowercase letters. Five tentative transcription initiation sites as determined by S1 nuclease and one detected by primer extension are shown with the most 5' site defined as +1. Two in-frame ATG codons for methionine are underlined, and the amino acid sequence encoded by the first exon is shown by the one-letter code. The most 5'-end sequence determined from cDNA clones (15) is highlighted by shading. The sequence of the primer used for extension is underlined with an arrow. The four TATA box-like sequences TATTTAT are underlined, and two CCAAT boxes and two GC boxes are indicated.
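A search for the promoter elements named in the legend (TATTTAT, CCAAT, and the GC box GGGCGG) can be sketched as a simple overlapping string scan. This is a minimal illustration under stated assumptions, not the method used in the paper: the sequence `toy` is an invented example, not the gene's actual flanking sequence, and `find_all` and `MOTIFS` are hypothetical names.

```python
def find_all(sequence, motif):
    """Return 0-based start positions of motif, including overlapping hits."""
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        # Advance by one base so partially overlapping repeats are found.
        start = sequence.find(motif, start + 1)
    return hits


MOTIFS = {"TATA-like": "TATTTAT", "CCAAT box": "CCAAT", "GC box": "GGGCGG"}

# Invented toy sequence (an assumption for illustration only).
toy = "gggcggaaccaatgctatttatttatcc"

hits = {name: find_all(toy, motif) for name, motif in MOTIFS.items()}
print(hits)  # {'TATA-like': [15, 19], 'CCAAT box': [8], 'GC box': [0]}
```

Stepping the search by a single base is what allows partially overlapping TATTTAT repeats to be reported. The scan covers one strand only; a fuller search would also test the reverse complement of each motif.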

3'-end-flanking region (Fig. 1). However, the clones are not completely overlapping so that CL-412 and cos-31 on one hand, and cos-31 and cos-61 on the other, do not overlap. Exhaustive screening of several genomic libraries did not yield clones spanning the two gaps. Therefore, the exact size of the gene could not be determined.

Exon/Intron Structure-The isolated clones were characterized by restriction mapping and Southern hybridization analysis using the cDNA clones E-12, E-3, and E-26 (15). These clones have several sequence differences which are presumably due to alternatively spliced transcripts (see Fig. 6). Genomic clone fragments containing exon sequences were subcloned, and the exon sizes and sequences around the intron boundaries were determined by nucleotide sequencing. The genomic clones studied here contained all coding sequences present in the cDNA clones except for a 71-bp segment near the 5'-end of the cDNA. The sequence of 37 exons was determined (see Fig. 2). The present data suggest that the 71-bp coding sequence not present in the genomic clones is contained in two separate exons (exons 3 and 4), which are 35 and 36 bp, respectively (see below). Accordingly, the human α1(XIII) gene contains 39 exons. All but one exon begin with a complete codon for an amino acid as only the codon for the first glycine residue of the triple-helical domain (COL-1) is split in exons 2 and 3. The sizes of exons coding for the translated region vary between 24 and 153 bp. Only two exons are larger than 100 bp, exons 14 and 28 being 108 and 153 bp, respectively. However, most of the exons are 54 bp or smaller. There are 28 exons that code solely for parts of the three triple-helical domains. Seven of them consist of 27 bp, five of 36 bp, four of 54 bp, three of 45 bp, and two of 42 bp. Seven of the exons encoding the collagenous domain have unique exon sizes of 33, 35, 51, 63, 81, 87, and 153 bp. The COL-1 domain of the protein is encoded by exon 3 (35 bp) that starts with the second nucleotide of the codon for glycine, exons 4 and 5 (36 and 51 bp, respectively), exon 6 (36 bp), and exons 7-11 which are 27 bp each. Exon 11 does not code solely for a Gly-X-Y-repeat sequence but it is a "junction exon" coding for parts of both the COL-1 and NC-2 domains. The NC-2 domain is encoded partially by the junction exons 11 and 14 and by exons 12 or 13 that are both alternatively spliced. COL-2 is encoded by exons 14-24. The NC-3 domain is encoded by a junction exon (exon 24) and exon 25. COL-3 is encoded by exons 26 through 37, the latter being a junction exon. Part of exon 37 and all of exon 38 code for the NC-4 domain. The last exon of the gene starts with the translation stop codon and codes solely for the 3'-end-untranslated region.

The nucleotide sequences of exons 26 and 28 differed from those recently published for the corresponding sequences of α1(XIII) cDNA clones (15). Altogether, nine base differences were observed which all change the encoded amino acid. In exon 26, 5 cytidines observed instead of thymidines change the leucine codons in the cDNA to proline. Additionally, 1 thymidine was observed instead of a guanosine which changes the codon of methionine to isoleucine. In exon 28, 3 cytidines instead of thymidines were observed which all change the codon of leucine to proline (Fig. 3). Recently, re-examination of the cDNA sequences has verified that the sequences determined here are present also in the cDNA clones.²

² T. Pihlajaniemi, personal communication.

Primary Structure of the 5'-End and 5'-Flanking Region and Determination of Transcription Start Sites-Genomic λ clone CL-412 which hybridized with an oligonucleotide probe specific to the most 5' region of the cDNA clone E-26 was mapped with restriction enzymes, and a 2.4-kb BglII-HindIII fragment containing the sequences corresponding to the 5'-end region of the cDNA clone was analyzed by determining a part (1642 bp) of its sequence (Fig. 4). This sequence contained the extreme 166 bp present in the 5'-end of the cDNA clone, extended about 1.2 kb upstream and about 250 bp downstream into the first intron of the gene. The sequence of the 5'-end of the previously characterized cDNA clone E-26 (15) revealed an in-frame codon for methionine that was predicted to be the translation start site. However, the E-26 clone extended 120 bp upstream from this site as an open reading frame (15) leaving the possibility for another translation initiation codon 5' of the isolated clone. The sequence determined here revealed a second in-frame ATG codon for methionine located 249 nucleotides upstream from the first one.

S1 nuclease mapping and primer extension analyses were carried out to determine the true transcription start site. In the S1 nuclease experiment, an 840-bp AvaII-EagI fragment 5'-labeled at the EagI site was hybridized to poly(A)-rich RNA from HT-1080 cells (Fig. 5). Five protected fragments of 185-207 nucleotides were detected after S1 digestion indicating five potential initiation sites for transcription. Two of these sites were downstream from the more 5' ATG codon, one was within that codon and two were located only a few nucleotides upstream of this methionine codon (Fig. 4). These data indicate that the downstream ATG codon is the authentic translation start codon. In contrast to the S1 nuclease mapping, the primer extension experiment yielded only one product corresponding to the shortest protected fragment in the S1 nuclease assay. Taken together, these data suggest that multiple transcription start sites are used within a stretch of about 26 nucleotides (the most 5' of which was defined +1).

The nucleotide sequence immediately upstream of the five potential initiation sites for transcription did not reveal typical TATA or CCAAT boxes located at the expected locations of about -30 and -70, respectively. However, the sequence TATTTAT is repeated, partially overlapping, four times (Fig. 4). Two CCAAT boxes were found in the promoter region starting at locations -13 and -194. Furthermore, two GGGCGG sequences that can serve as binding sites for the nuclear transcription factor Sp1 (36) were observed (nucleotides -275 to -280 and +105 to +110).
Exons Involved in Alternative Splicing-The α1(XIII) cDNA clones (E-26, E-3, E-12) represent the three different forms of transcripts in endothelial cells among a total of eleven different cDNA clones characterized so far (15). These clones, together with a previously characterized cDNA clone (HT-125) isolated from an HT-1080-cell library (21), show differences in their coding sequence in five separate regions. These differences consist of short segments termed I, II, II', III, IV, and V (87, 57, 66, 45, 36, and 87 bp, respectively) that are either present or absent in the cDNA clones as depicted in Fig. 6. Characterization of the α1(XIII) gene indicates that the 87-bp segment I is likely to be encoded by two exons (exons 4 and 5). A 51-bp sequence from the 3'-end of this segment was present in exon 5 as determined by nucleotide sequencing of one of the genomic clones analyzed in this study (Fig. 2). The rest of the 87-bp segment is probably encoded by a separate 36-bp exon (exon 4) which was not present in

FIG. 5. Determination of transcription start sites. A, in the S1 nuclease assay, an 840-bp AvaII-EagI fragment was hybridized to poly(A)-rich RNA of HT-1080 cells, and the protected fragments were analyzed on a 5% sequencing gel. The arrow indicates the major protected fragment. B, the same RNA was used in the primer extension experiment, and the extension products were analyzed on a 6% sequencing gel. The extension product is indicated by an arrow. The lanes to the left of lanes A and B contain sequencing products of known length as molecular size markers. C, partial restriction map of a 2.4-kb BglII-HindIII fragment containing the promoter region. B denotes BglII; A, AvaII; P, PstI; E, EagI; H, HindIII. The probe used in the S1 nuclease assay and the primer 40-1 used in the primer extension assay are illustrated. The region covered by the cDNA clones is shown as a shaded box. The detected fragments in both assays are indicated schematically by a hatched line, and the transcription start site is indicated by +1.

nl(XIl1) Geneexons 1

10

5

FIG. 6. Schematic alignment of exons in the al(XII1) collagen gene with cDNA clones HT-125 (21), E26,E-3, and E-12 (15) that represent different mRNA isoforms expressed in endothelial and HT-1080 cells. The exons are separated by vertical lines and are numbered beginning from the 5'-end. The four non-collagenous domains (NC-I-NC-4) andthe three collagenousdomains (COL-I-COL3 ) encoded by the cDNA clones are indicated. The alternative segments I, 11, 11', 111, IV, and V areindicated, and their size is given in base pairs.



FIG. 7. Nucleotide sequences of 5'-flanking regions of alternatively spliced exons in the α1(XIII) collagen gene. The 3' intron-exon boundary is indicated by bold letters. Potential lariat branch point sequences (PyrNPyrTPurAPyr) are underlined.
[Sequence panels shown for introns 4, 11, 28, 32, and 36.]

the genomic clones studied. Accordingly, exon 3, which was also not present in the genomic clones, is 35 bp, which corresponds to the sequence in the cDNA between exons 2 and 4. The existence of exons 3 and 4 was thus deduced from the information available on the various cDNA clones, as no genomic clones containing these exons were found despite screening of several genomic libraries.

Segments II and II' in the cDNA (Fig. 6) represent sequences of 57 and 66 bp, respectively, which are never present simultaneously in any of the eleven cDNA clones characterized so far (15). Each of the regions II, II', III, IV, and V, which are present in only some of the α1(XIII) mRNAs, corresponds to a single exon in the gene (Fig. 6). Two different alternative splicing pathways are presumably used: exon skipping (exons 29, 33, and 37 for segments III, IV, and V, respectively) and mutually exclusive splicing (exons 12 and 13 for segments II and II').

Nucleotide sequences in the 3' regions of introns preceding the alternatively spliced exons 5, 12, 29, 33, and 37 are shown in Fig. 7. In intron 4, the ag sequence at the exon-intron boundary is preceded by a pyrimidine-rich (86%) segment of 274 nucleotides. Possible branch point regions following the consensus sequence PyrNPyrTPurAPyr (where Pyr is a pyrimidine, Pur is a purine, and N is any nucleotide) (37) could be identified at a distance of 20-50 nucleotides upstream of the intron-exon boundary.

DISCUSSION

The present work provides the entire exon pattern of the human type XIII collagen α1 chain gene (COL13A1) and demonstrates that the gene is unusually large for a collagen gene. We have isolated and mapped λ phage and cosmid clones which cover over 140,000 bp of genomic DNA spanning 3,000 bp of a 5'-flanking region, 130,000 bp of the structural gene, and 5,000 bp of a 3'-flanking region. The clones were not completely overlapping and lacked a part of the first intron as well as a region containing exons 3 and 4. Therefore, the exact size of the gene could not be determined. The α1(XIII) collagen gene is considerably larger than the genes for the α chains of fibrillar collagens, such as the human α1(I) gene, which is 18 kb (38), and the chick α2(I) gene, which is 39 kb (39). In contrast, the human gene for the basement membrane type IV collagen α1 chain has been shown to be more than 100,000 bp in size (10). In spite of its large size, the α1(XIII) gene transcript and the translated products are considerably smaller than those of the fibrillar collagen and type IV collagen genes. The α1(XIII) chain mRNA is only about 2.3 kb, so that the coding sequences represent less than 1.8% of the entire gene, whereas the α1(I) chain mRNA is about 5.5 kb and represents about 30% of the total gene sequence.

The human α1(XIII) gene was shown to contain 39 exons. Seven of the 28 exons encoding collagenous Gly-X-Y-repeat sequence were 27 bp, the rest being between 36 and 153 bp. Four of these exons were 54 bp, the size typically found in the genes for fibrillar collagens. It has been hypothesized that the sequences coding for Gly-X-Y-repeats in the fibrillar collagen genes have evolved by a tandem duplication of a 54-bp ancestor element (5). This rule clearly does not apply to the α1(XIII) gene, which must have evolved along a different pathway. With the exception of exons 3, 5, 18, 21, 27, and 30, all the Gly-X-Y-repeat coding exons are multiples of 9 bp, which corresponds to a single Gly-X-Y triplet. Exon 3 (35 bp) begins with the second nucleotide of the first glycine codon of the triple-helical domain. Exon 5 (51 bp), exon 18 (33 bp), exon 21 (87 bp), and exon 30 (42 bp) each encode a short imperfection in the Gly-X-Y-repeat sequence in the form of lacking one glycine residue or an amino acid in the X or Y position. Therefore, it appears that many of the Gly-X-Y-coding exons in the α1(XIII) gene have evolved by a quite random multiplication of a single 9-bp Gly-X-Y coding element and that exons 5, 18, 21, and 30 have each lost a single codon from 54-, 36-, 90-, and 45-bp exons, respectively. Furthermore, there seems to have been a preference for triplication of the 9-bp ancestor element, which has resulted in the formation of seven 27-bp exons and in the generation of four 54-bp exons and one 81-bp exon which could have evolved, respectively, by duplication and triplication of a 27-bp exon. The preference for 27-bp exons is particularly prominent in the 5'-end of the gene. Extensive size variety of Gly-X-Y coding exons has also been reported for the genes for nonfibrillar collagens of types IV, VI, IX, X, and XII (4, 9-11, 14).

Analysis of the 5'-end and -flanking region of the gene revealed some highly unusual features of the α1(XIII) collagen gene. First of all, S1 nuclease mapping indicated the presence of five different, but closely spaced, initiation sites for transcription between 237 and 259 nucleotides upstream of the putative ATG translation initiation codon. The five potential transcription start sites were preceded by four, partially overlapping, TATTTAT repeats starting at 28-40 nucleotides upstream of each site. A TATTTAT sequence represents a functional TATA box in the Simian virus 40 (SV40) early promoter (40). The TATTTAT sequences in the α1(XIII) gene promoter are likely to be true TATA boxes, which is supported by the location of these sequences around position -30 bp. The presence of the multiple TATA box-like elements may account for the presence of several transcription initiation sites. We could only detect one of the transcription sites by primer extension analysis. We surmise that this is due to secondary structure formation at the 5'-end of the mRNA, since the corresponding region of the gene was extremely difficult to sequence. The functionality of the TATTTAT sequences remains to be determined in transfection studies.

A second feature of the promoter was the absence of a CCAAT box at the characteristic location around -70 to -100. Instead, we found an inverted CCAAT sequence partially within one of the potential TATA box sequences (see Fig. 4) and, additionally, a second CCAAT box starting at position -194. The biological function of these sequences has not yet been studied. It will be of particular interest to explore whether or how this promoter structure affects the formation of variant transcripts generated by alternative splicing.

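The branch point search described for Fig. 7 (consensus PyrNPyrTPurAPyr located 20-50 nucleotides upstream of the intron-exon boundary) can be sketched as a pattern match. This is a minimal illustration, not the procedure used in the study: the function name, the convention of measuring distance from the branch adenosine (sixth position of the motif), and the example sequence are all assumptions introduced here.

```python
import re

# PyrNPyrTPurAPyr with Pyr = C or T, Pur = A or G, N = any nucleotide.
# The lookahead lets overlapping candidate motifs be reported as well.
BRANCH_POINT = re.compile(r"(?=([CT][ACGT][CT]T[AG]A[CT]))")


def find_branch_points(intron_3prime, min_dist=20, max_dist=50):
    """Return (distance, motif) pairs for consensus matches.

    intron_3prime is the 3' part of an intron, ending with its last
    nucleotide (the g of the ag dinucleotide). Distance is counted, as an
    assumption of this sketch, from the branch adenosine to the
    intron-exon boundary.
    """
    seq = intron_3prime.upper()
    hits = []
    for match in BRANCH_POINT.finditer(seq):
        a_index = match.start() + 5      # 0-based index of the branch A
        distance = len(seq) - a_index    # nucleotides from A to the boundary
        if min_dist <= distance <= max_dist:
            hits.append((distance, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical intron end, not a sequence from the COL13A1 gene.
    example = "g" * 10 + "ctctgac" + "t" * 23 + "ag"
    print(find_branch_points(example))
```

The distance window is exposed as parameters because the text gives only the observed range, so other conventions for counting the distance can be tried without changing the pattern.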

Alternative splicing has been reported for a large number of genes (41). Among those are the collagen genes COL2A1 (26), COL6A2 (28), COL6A3 (27, 42), and COL9A1 (29) in addition to the COL13A1 studied here (15, 21, 22). The alternative splicing event has been found to involve Gly-X-Y-repeat sequences only in the case of the α1(XIII) collagen chain. The present study demonstrates that at least seven exons can undergo alternative splicing, of which exons 4 and 5 are spliced out together based on current cDNA data (15). The exons involved contain sequences coding for pure collagenous regions (exons 4, 5, 29, and 33), pure noncollagenous regions (exons 12 and 13), and an exon coding for the junction of a collagenous (COL-3) and noncollagenous (NC-4) domain (exon 37) (see Fig. 5). Semiquantitative analysis of the number of different clones isolated from an endothelial cell cDNA library has demonstrated that alternatively used exons 4, 5, 13, 29, 33, and 37 are less frequently present in processed mRNAs of this cell line (15). However, there is a heterogeneity of alternatively spliced α1(XIII) transcripts in this cell line, as at least three combinations of the alternatively spliced exons exist at the mRNA level. Since four different types of cDNA clones generated through alternative splicing have been observed in only fourteen individual cDNA clones from two different cell lines, the occurrence of additional alternative splicing events cannot be excluded in those cell lines or in cells from other sources.

The alternative splicing of type XIII collagen involves variation of the length of two of the three collagenous domains and, additionally, exchange of short interconnecting noncollagenous sequences. The biological significance of this complex alternative splicing pattern is still obscure. None of the variable regions contains any known sequences of potential biological significance such as Arg-Gly-Asp that may be important for integrin binding (43, 44) or Asn-X-Ser/Thr that are important for attachment of carbohydrates. The temporal and spatial distribution of different COL13A1 gene transcripts is also still unknown. This would require studies with highly sensitive probes specific for the alternatively used exons. Yet another intriguing question about type XIII collagen is how triple-helical molecules are formed when α1(XIII) chains of different lengths assemble within the same cell. This could lead to the formation of heterotrimers in which Gly-X-Y-containing segments are located within a noncollagenous domain. These aspects could be approached by double or triple transfection systems with expression vectors containing the different forms of full-length cDNAs.

REFERENCES
1. Miller, E. J. (1985) Ann. N. Y. Acad. Sci. 460, 1-13
2. Fessler, J. H., and Fessler, L. J. (1987) in Structure and Function of Collagen Types (Mayne, R., and Burgeson, R. E., eds) pp. 81-103, Academic Press, Inc., Orlando, FL
3. Eyre, D. (1987) in Structure and Function of Collagen Types (Mayne, R., and Burgeson, R. E., eds) pp. 261-281, Academic Press, Inc., Orlando, FL
4. Vuorio, E., and de Crombrugghe, B. (1990) Annu. Rev. Biochem. 59, 837-872
5. Yamada, Y., Avvidimento, V. E., Mudryj, M., Ohkubo, H., and Vogeli, G. (1980) Cell 22, 887-892
6. Kuhn, K., Glanville, R. W., Babel, W., Qian, R.-Q., Dieringer, H., Voss, T., Siebold, B., Oberbaumer, I., Schwarz, U., and Yamada, Y. (1985) Ann. N. Y. Acad. Sci. 460, 14-24
7. Hostikka, S. L., and Tryggvason, K. (1988) J. Biol. Chem. 263, 19488-19493
8. Kurkinen, M., Bernard, M. P., Barlow, D. P., and Chow, L. T. (1985) Nature 317, 177-179
9. Soininen, R., Tikka, L., Chow, L., Pihlajaniemi, T., Kurkinen, M., Prockop, D. J., Boyd, C. D., and Tryggvason, K. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 1568-1572
10. Soininen, R., Huotari, M., Ganguly, A., Prockop, D. J., and Tryggvason, K. (1989) J. Biol. Chem. 264, 13565-13571
11. Hostikka, S. L., and Tryggvason, K. (1987) FEBS Lett. 224, 297-305
12. Chu, M.-L., Pan, T.-C., Conway, D., Saitta, B., Stokes, D., Kuo, H.-J., Glanville, R. W., Timpl, R., Mann, K., and Deutzmann, R. (1990) Ann. N. Y. Acad. Sci. 580, 55-63
13. Yamaguchi, N., Benya, P. D., van der Rest, M., and Ninomiya, Y. (1989) J. Biol. Chem. 264, 16022-16029
14. Ninomiya, Y., Costagnola, P., Gerecke, D., Gordon, M., Jacenko, O., LuValle, P., McCarthy, M., Muragaki, Y., Nishimura, I., Oh, S., Rosenblum, N., Sato, N., Sugrue, S., Taylor, R., Vasios, G., Yamaguchi, N., and Olsen, B. R. (1990) in Extracellular Matrix Genes (Sandell, L. J., and Boyd, C. O., eds) pp. 79-113, Academic Press Inc., Orlando, FL
15. Pihlajaniemi, T., and Tamminen, M. (1990) J. Biol. Chem. 265, 16922-16928
16. Vaughan, L., Mendler, M., Huber, S., Bruckner, P., Winterhalter, K. H., Irwin, M. I., and Mayne, R. (1988) J. Cell Biol. 106, 991-997
17. Vasios, G., Nishimura, I., Konomi, H., van der Rest, M., Ninomiya, Y., and Olsen, B. R. (1988) J. Biol. Chem. 263, 2324-2329
18. Lozono, G., Ninomiya, Y., Thompson, H., and Olsen, B. R. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 4050-4054
19. Gordon, M., Gerecke, D. R., Dublet, B., van der Rest, M., and Olsen, B. R. (1989) J. Biol. Chem. 264, 19772-19778
20. Ninomiya, Y., Gordon, M., van der Rest, M., Schmid, T., Linsenmayer, T., and Olsen, B. R. (1986) J. Biol. Chem. 261, 5041-5050
21. Pihlajaniemi, T., Myllylä, R., Seyer, J., Kurkinen, M., and Prockop, D. J. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 940-944
22. Tikka, L., Pihlajaniemi, T., Henttu, P., Prockop, D. J., and Tryggvason, K. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 7491-7495
23. Sandberg, M., Tamminen, M., Hirvonen, H., Vuorio, E., and Pihlajaniemi, T. (1989) J. Cell Biol. 109, 1371-1379
24. Shows, T. B., Tikka, L., Byers, M. G., Eddy, R. L., Haley, L. L., Henry, W. M., Prockop, D. J., and Tryggvason, K. (1989) Genomics 5, 128-133
25. Pajunen, L., Tamminen, M., Solomon, E., and Pihlajaniemi, T. (1989) Cytogenet. Cell Genet. 52, 190-193
26. Ryan, M. C., and Sandell, L. J. (1990) J. Biol. Chem. 265, 10334-10339
27. Chu, M.-L., Zhang, R.-Z., Pan, T., Stoker, D., Conway, D., Kuo, H.-J., Glanville, R., Mayer, U., Mann, K., Deutzmann, R., and Timpl, R. (1990) EMBO J. 9, 385-393
28. Saitta, B., Stokes, D. G., Vissing, H., Timpl, R., and Chu, M.-L. (1990) J. Biol. Chem. 265, 6473-6480
29. Nishimura, I., Muragaki, Y., and Olsen, B. R. (1989) J. Biol. Chem. 264, 20033-20041
30. Lau, Y. F., and Kan, Y. W. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 5225-5229
31. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
32. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
33. Chomczynski, P., and Sacchi, N. (1987) Anal. Biochem. 162, 156-159
34. Favaloro, J., Treisman, R., and Kamen, R. (1980) Methods Enzymol. 65, 718-749
35. Bennet, V. D., Weiss, I. M., and Adams, S. L. (1989) J. Biol. Chem. 264, 8402-8409
36. Dynan, W. S., and Tjian, R. (1983) Cell 32, 669-680
37. Padgett, R. A., Grabowski, P. J., Konarska, M. M., Seiler, S., and Sharp, P. A. (1986) Annu. Rev. Biochem. 55, 1119-1150
38. Chu, M.-L., de Wet, W., Bernard, M., Ding, J.-F., Morabito, M., Myers, J., Williams, C., and Ramirez, F. (1984) Nature 310, 337-340
39. Boedtker, H., Finer, M., and Aho, S. (1985) Ann. N. Y. Acad. Sci. 460, 85-116
40. Laimins, L. A., Khoury, G., Gorman, C., Howard, B., and Gruss, P. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 6453-6457
41. Smith, C. W. J., Patton, J. G., and Nadal-Ginard, B. (1989) Annu. Rev. Genet. 23, 527-577
42. Doliana, R., Bonaldo, P., and Colombatti, A. (1990) J. Cell Biol. 111, 2197-2205
43. Ruoslahti, E., and Pierschbacher, M. D. (1986) Cell 44, 517-518
44. Pierschbacher, M. D., and Ruoslahti, E. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 5985-5988