Biochemical Society Transactions, Vol. 16
625th Meeting, held at the Charing Cross and Westminster Medical School, 16-18 December 1987
Biochemical Aspects of Skin
Organized and Edited by L. C. Archard (Charing Cross and Westminster Medical School, London)
Collagen gene structure
RAYMOND DALGLEISH
Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, U.K.
Synopsis
The collagens of vertebrates may be divided into three groups according to chain size and whether or not the helical domains are continuous. Present evidence suggests that, at least within one of these groups, similarity between collagens is reflected in the organization of the genes that encode them. Early evidence suggested that collagen genes evolved on the basis of exons which are multiples of a primordial building block of 54 bp, separated by much larger introns. This model of collagen gene evolution is contradicted by the recent discovery of a collagen gene with a single long open reading frame.
Introduction
Abbreviation used: CAT, chloramphenicol acetyltransferase.
The collagens are a complex family of large structural proteins, found extensively in connective tissues throughout the body, which are characterized by a repeating protein sequence motif of Gly-Xaa-Yaa (where Xaa and Yaa are often proline or hydroxyproline) (Miller & Gay, 1987). Collagen molecules consist of three left-handed polypeptide helices, known as α-chains, which are interwound into a right-handed helix. According to collagen type, the α-chains may be identical or dissimilar. A minimum of 12 collagen types are recognized on the basis of features including protein sequence, tissue distribution, macromolecular organization and chain constitution. While most chain types have been defined using classical biochemical and histochemical methods, a recent trend has been for new types to be defined initially in terms of cloned cDNA sequences. Collagen types IX and XII were discovered in this way (Ninomiya & Olsen, 1984; Gordon et al., 1987).
In an attempt to bring some order to the discussion of collagens, Miller (1985) proposed that they be classified into three groups based on chain size and whether or not they contain long uninterrupted helical domains. Group 1 (the fibrillar collagens) is the most extensively studied, with chain sizes of Mr 95 000 or greater and uninterrupted helical domains of around 300 nm. This group comprises collagen types I, II, III, V and K (also known as XI), which have structural properties allowing extensive lateral aggregation. Evidence concerning the first four types suggests that the structure of their corresponding genes is conserved and this will be discussed further. Group 2 (non-fibril forming collagens) consists of collagen types with chains in excess of Mr 95 000, but with several interruptions within the helical domains. This group comprises types IV, VI, VII and VIII, with gene structure information limited, at present, to type IV collagen. Group 3 (short-chain collagens) comprises collagen types IX, X and XII, whose constituent chains are less than Mr 95 000.
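
The three-group scheme described above amounts to a two-question decision rule, which can be sketched in Python as follows. The function name `miller_group` and its two arguments are assumptions made for illustration; only the Mr 95 000 threshold and the helical-continuity criterion are taken from the text, and the treatment of a chain of exactly Mr 95 000 with an interrupted helix is a choice made here, not something the text specifies.

```python
# Illustrative sketch only: the names and structure are assumptions made for
# this example; the threshold and the two criteria come from the text.
MR_THRESHOLD = 95_000  # chain size (Mr) separating groups 1 and 2 from group 3


def miller_group(chain_mr: int, helix_uninterrupted: bool) -> int:
    """Return the group number (1, 2 or 3) for a collagen chain.

    chain_mr            -- relative molecular mass of the chain
    helix_uninterrupted -- True if the helical domain is long and continuous
    """
    if chain_mr < MR_THRESHOLD:
        return 3  # short-chain collagens
    if helix_uninterrupted:
        return 1  # fibrillar collagens
    return 2      # large chains with interruptions in the helical domains
```

For example, with hypothetical chain sizes, `miller_group(100_000, True)` gives 1, `miller_group(100_000, False)` gives 2 and `miller_group(60_000, False)` gives 3.
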
Exon arrangement
The first collagen cDNAs, coding for the α2(I) and α1(I) chains of type I collagen (group 1), were isolated from chicken in the late 1970s (Lehrach et al., 1978, 1979). The isolation of genomic clones for the α2(I) chain for sheep and chicken followed rapidly (Boyd et al., 1980; Ohkubo et al., 1980) with R-loop analyses revealing an extremely complex organization of introns and exons. Sequencing of a number of randomly chosen exons, encoding Gly-Xaa-Yaa repeats, indicated that the gene consisted predominantly of 54 bp exons, each of which encoded exactly 18 amino acids and with each exon beginning at the start of a glycine codon (Yamada et al., 1980). Any exceptions to this were still multiples of 9 bp, namely 45, 99, 108 and 162 bp. A model was proposed that 54 bp was the primordial building block from which collagen genes evolved and this view remained over the years with the isolation of many of the other genes encoding chains for group 1 collagens. The genes for the fibrillar collagens are typically divided into 50 or more exons and it is notable that the size and coding potential of exons is highly conserved between different group 1 collagen genes (Yamada et al., 1984; Chu et al., 1984; Weil et al., 1987). In spite of extensive conservation between these genes in terms of exon organization, there is little evidence for conservation within corresponding introns. Indeed, there is considerable variation in intron length from gene to gene, resulting in overall lengths that vary from 18 kb in the case of COL1A1, the gene encoding the α1(I) chain of type I collagen (Barsh et al., 1984), to 35 kb in the case of COL1A2, the α2(I) chain gene (Dickson et al., 1985). Collagens have both N- and C-terminal globular propeptides which aid in the assembly of the triple helix and which are subsequently cleaved off.
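
The exon-size arithmetic quoted for the helical region can be checked directly. The short Python sketch below, whose helper and variable names are assumptions made for illustration, confirms that a 54 bp exon corresponds to 18 codons (six Gly-Xaa-Yaa repeats) and that each of the exceptional sizes listed above is a whole multiple of 9 bp.

```python
from typing import Optional

# Illustrative check of the exon sizes quoted in the text; the helper and
# variable names are assumptions made for this sketch.
CODON_BP = 3                # one amino acid is encoded by 3 bp
TRIPLET_BP = 3 * CODON_BP   # one Gly-Xaa-Yaa repeat is encoded by 9 bp


def triplets_encoded(exon_bp: int) -> Optional[int]:
    """Return the number of whole Gly-Xaa-Yaa repeats encoded by an exon,
    or None if the exon length is not a multiple of 9 bp."""
    if exon_bp % TRIPLET_BP != 0:
        return None
    return exon_bp // TRIPLET_BP


# Exon sizes quoted in the text for the helical region.
helical_exon_sizes = [54, 45, 99, 108, 162]

for size in helical_exon_sizes:
    # Print exon length, amino acids encoded and Gly-Xaa-Yaa repeats encoded.
    print(size, size // CODON_BP, triplets_encoded(size))
```

Of the exceptional sizes, 108 and 162 bp are also whole multiples of 54 bp, whereas 45 and 99 bp are each 9 bp short of one.
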
The exons encoding these propeptides do not follow the 'multiples of 9 bp' rule of the helical region exons, though their structure and organization are still extensively conserved. The group 1 collagens must form into highly ordered, staggered bundles of fibrils and systematic evolution of genes from simple building blocks would ensure the formation of a structure compatible with these constraints. An alternative view is that introns were somehow introduced into a gene that had previously consisted of a single continuous open reading frame. Such a change would help ensure that unequal crossing over between homologous regions of a gene at meiosis would be reduced and any lengthening or shortening of the α-chain avoided.
The isolation and characterization of type IV (group 2) and type IX (group 3) collagen genes was to change that view (Kurkinen et al., 1985; Lozano et al., 1985). Exons with sizes of 64, 71, 78, 123, 147 and 182 bp were found. Not only did this run contrary to the 'multiples of 9 bp' rule, but exon-intron boundaries were found to interrupt codons. This lack of conservation of gene structure across all collagen types is said to reflect different constraints on protein conformation in those collagen types which do not form ordered fibrils.
The gene encoding type X collagen is also 'atypical' in that the α-chain and N- and C-terminal propeptides are encoded in a single long open reading frame uninterrupted by introns (Ninomiya et al., 1986). Type X collagen isolated from three different species has α-chains with a conserved size of 45 kDa, suggesting that the lack of introns has not resulted in major rearrangements of the genes. In view of the very different gene structure of type X collagen as compared with those of types IX and XII, it must be considered whether the criterion of chain length is sufficient to define group 3 and whether a subdivision of this group is necessary.
Single or multiple genes?
Early evidence from Northern blotting suggested the possibility of multiple copies of collagen genes since cloned cDNA probes revealed multiple mRNAs. Either there were single genes with multiple transcripts or multiple genes each with its own transcript. To resolve this, a formal count of gene copy number was made for the human α2(I) collagen gene (Dalgleish et al., 1982), revealing that it was present in a single copy. Subsequent detailed analyses of the 3' end of the chicken and human α2(I) gene explained the multiple transcripts as being due to the presence of more than one polyadenylation signal (Aho et al., 1983; Myers et al., 1983). There is, in fact, no evidence for duplicated copies of any of the collagen genes and multiple polyadenylation signals are now known to be a common feature.
Controlling elements
Two main approaches have been used to determine the position and nature of elements that are important in the control of expression of collagen genes. The first of these is DNA sequence analysis, which can reveal conserved regions and possible secondary structures. A conserved DNA sequence within the C-terminal propeptide coding regions of a number of collagen genes was noted by Yamada et al. (1983a), though no function has yet been ascribed to it. Analysis of the 5' end of the α1(I), α2(I) and α1(III) collagen genes from chicken has revealed that the initiator codon in each case lies within a region of conserved DNA sequence with the potential to form a hairpin structure (Yamada et al., 1983b). This sequence is not found in other genes and so may be specific to collagen genes, possibly playing some role in translational control though, so far, no factor has been identified that interacts with the hairpin.
A more direct approach to the identification of important controlling elements in collagen genes is to study their expression both in vivo and in vitro. In practice, this is usually accomplished by fusing promoter regions from collagen genes to the bacterial gene for chloramphenicol acetyltransferase (CAT). These hybrid constructs are used to produce transgenic mice or are introduced into cultured cells where the levels of CAT are easily assayed. By expressing hybrid gene constructs in transgenic mice, it has been demonstrated that all the information required for tissue- and stage-specific expression of the mouse α2(I) collagen gene lies within the region -2000 to +54 of the gene (Khillan et al., 1986). Study of this same gene, expressed in cultured fibroblastic and lymphoid cells, has identified an enhancer lying in the first intron of the gene which is active in the former but not the latter cell type (Rossi & de Crombrugghe, 1987).
Sequence variation
Nucleotide changes in collagen genes which result in amino acid substitutions are mostly catastrophic, generally resulting in severe heritable connective tissue disorders such as osteogenesis imperfecta and the Ehlers-Danlos syndromes (Tsipouras & Ramirez, 1987). However, in this laboratory, we have recently identified a human type III collagen variant by comparing the sequences of cDNAs characterized in this and other laboratories (B. S. Mankoo and R. Dalgleish, unpublished work).
Chromosomal location
The genes encoding the human collagens are dispersed in the genome with assignments made to chromosomes 2, 7, 12, 13, 17 and 21. In the case of type IV collagen, the two genes encoding the α1(IV) and α2(IV) chains are known to map within a 400 kb region on chromosome 13. Interestingly, genes for three different collagen chains [α1(III), α2(V) and α3(VI)] and for two other connective tissue components, elastin and fibronectin, all map to the end of the long arm of chromosome 2.

Aho, S., Tate, V. & Boedtker, H. (1983) Nucleic Acids Res. 11, 5443-5450
Barsh, G. S., Roush, C. L. & Gelinas, R. E. (1984) J. Biol. Chem. 259, 14906-14913
Boyd, C. D., Tolstoshev, P., Schafer, M. P., Trapnell, B. C., Coon, H. C., Kretschmer, P. J., Nienhuis, A. W. & Crystal, R. G. (1980) J. Biol. Chem. 255, 3212-3220
Chu, M.-L., de Wet, W., Bernard, M., Ding, J.-F., Morabito, M., Myers, J., Williams, C. & Ramirez, F. (1984) Nature (London) 310, 337-340
Dalgleish, R., Trapnell, B. C., Crystal, R. G. & Tolstoshev, P. (1982) J. Biol. Chem. 257, 13816-13822
Dickson, L. E., de Wet, W., Di Liberto, M., Weil, D. & Ramirez, F. (1985) Nucleic Acids Res. 13, 3427-3438
Gordon, M. K., Gerecke, D. R. & Olsen, B. R. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 6040-6044
Khillan, J. S., Schmidt, A., Overbeek, P. A., de Crombrugghe, B. & Westphal, H. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 725-729
Kurkinen, M., Bernard, M. P., Barlow, D. & Chow, L. T. (1985) Nature (London) 311, 177-179
Lehrach, H., Frischauf, A. M., Hanahan, D., Wozney, J., Fuller, F., Crkvenjakov, R., Boedtker, H. & Doty, P. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 5417-5421
Lehrach, H., Frischauf, A. M., Hanahan, D., Wozney, J., Fuller, F. & Boedtker, H. (1979) Biochemistry 18, 3146-3152
Lozano, G., Ninomiya, Y., Thompson, H. & Olsen, B. R. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 4050-4054
Miller, E. J. (1985) Ann. N.Y. Acad. Sci. 460, 1-13
Miller, E. J. & Gay, S. (1987) Methods Enzymol. 144, 3-41
Myers, J. C., Dickson, L. A., de Wet, W. J., Bernard, M. P., Chu, M.-L., Di Liberto, M., Pepe, G., Sangiorgi, F. O. & Ramirez, F. (1983) J. Biol. Chem. 258, 10128-10135
Ninomiya, Y. & Olsen, B. R. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 3014-3018
Ninomiya, Y., Gordon, M., van der Rest, M., Schmid, T., Linsenmeyer, T. & Olsen, B. R. (1986) J. Biol. Chem. 261, 5041-5050
Ohkubo, H., Vogeli, G., Mudryj, M., Avvedimento, V. E., Sullivan, M., Pastan, I. & de Crombrugghe, B. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 7059-7063
Rossi, P. & de Crombrugghe, B. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 5590-5594
Tsipouras, P. & Ramirez, F. (1987) J. Med. Genet. 24, 2-8
Weil, D., Bernard, M., Gargano, S. & Ramirez, F. (1987) Nucleic Acids Res. 15, 181-198
Yamada, Y., Avvedimento, V. E., Mudryj, M., Ohkubo, H., Vogeli, G., Irani, M., Pastan, I. & de Crombrugghe, B. (1980) Cell 22, 887-892
Yamada, Y., Kuhn, K. & de Crombrugghe, B. (1983a) Nucleic Acids Res. 11, 2733-2744
Yamada, Y., Mudryj, M. & de Crombrugghe, B. (1983b) J. Biol. Chem. 258, 14914-14919
Yamada, Y., Liau, G., Mudryj, M., Obici, S. & de Crombrugghe, B. (1984) Nature (London) 310, 333-337
Received 15 April 1988
The collagens of skin: structure and assembly
MICHAEL E. GRANT and SHIRLEY AYAD
Department of Biochemistry and Molecular Biology, School of Biological Sciences, University of Manchester, Manchester M13 9PT, U.K.
The major physiological function of the skin is to serve as a barrier between the body and the outside world and the major component of this barrier is collagen, which represents over 70% of the dry weight of skin. The collagen fibres provide the structural framework underlying the epidermis and their interactions with other connective tissue macromolecules such as glycoproteins, proteoglycans and elastin provide the strength and flexibility of the tissue such that maximum mobility and adaptability are permitted. The major portion of the collagen is, of course, to be found in the dermis where dermal fibroblasts produce the collagen molecules which assemble into fibres of high tensile strength. Such collagen fibres exhibit the classical cross-striated pattern with a 67 nm periodicity when stained and visualized in the electron microscope. In addition, other collagenous molecules in skin are also present in connective tissue elements such as reticulin fibres, basement membranes, anchoring fibrils, microfibrillar components and small blood vessels. This brief review is intended to provide a synopsis of recent advances in our understanding of the structure and assembly of the collagenous structures found in the skin and factors controlling their synthesis.
Genetic diversity of collagen structure
The term collagen is now recognized to encompass a family of genetically distinct proteins each having characteristic structural features which give rise to specific supramolecular structures in the extracellular matrix. Over the last 10-15 years, 12 different genetic types of collagen (designated I-XII) have been described in avian and mammalian connective tissues (Kuhn, 1986; Gordon et al., 1987) and other newly identified collagenous molecules remain to be characterized and added to the list. Much has been learned about the monomeric forms of these collagen types and their assembly, but in many instances, we still remain quite ignorant about the organization and function of these components in the extracellular matrix. Nevertheless, it is becoming possible to classify the different collagen types into three groups on the basis of the supramolecular structures they assume in the extracellular matrix, namely fibres, microfibrils/microfilaments or non-fibrillar meshworks. Sequence analysis of cDNA and genomic DNA clones is aiding in these studies, for structure and function must inevitably be related to the primary amino acid sequences encoded by the genes for the various collagen types.
The genetically distinct collagens exhibit a significant degree of tissue specificity, e.g. type IV is only found in basement membranes, type II is cartilage specific and type X localizes exclusively in calcifying cartilage (Kuhn, 1986). In contrast, the major interstitial fibrillar collagen types, I and III, have a very widespread distribution, as does the microfibrillar type VI collagen (Rauterberg et al., 1986; Trueb et al., 1987), which occurs in cartilaginous (Ayad et al., 1987) as well as non-cartilaginous tissues. In the skin, at least six different collagen types are found (Table 1) and although types IV, V, VI and VII are quantitatively minor collagens, they undoubtedly have crucial roles in maintaining skin architecture and integrity. All these molecules possess properties which allow them to be defined as collagens, although our concept of collagens as rigid rod-like molecules which self-assemble into fibres exhibiting a regular periodicity has had to be modified in recent years.
Definition of collagen
Collagen molecules are composed of three polypeptides, termed α-chains, that are wound into a characteristic triple helix as a consequence of their unique primary structures in which glycine occurs at every third residue of the α-chain. Thus, the helical sequences have a repeating -Gly-Xaa-Yaa- structure in which X is frequently proline and Y often 4-hydroxyproline. Only the small hydrogen atom side-group of glycine will fit into the centre of the helix, and if the glycine is not present in the Gly-Xaa-Yaa sequence, the triple helix is interrupted. In the helical sequences glycine must account for 33% of the amino acids present and the imino acids, proline and 4-hydroxyproline, normally contribute a further 20-25%. Collagens also contain hydroxylysine residues which can participate in covalent cross-links (Eyre et al., 1984), as well as providing sites for attachment of galactose and/or glucosylgalactose saccharide units (Kivirikko & Myllyla, 1984). Another amino acid residue unique to collagens is 3-hydroxyproline, found only in the sequence -Gly-3Hyp-4Hyp-, which may occur up to 10-12 times in certain α-chains (Kivirikko & Myllyla, 1984). It is noteworthy that some other proteins, such as acetylcholinesterase and C1q, also contain short triple-helical collagenous sequences rich in glycine, hydroxyproline and hydroxylysine, but their lack of a structural role in the extracellular matrix excludes them from the general definition of a collagen (Reid, 1982).
The individual α-chains, which are each the product of different genes, are identified by an arabic numeral, followed by a roman numeral of the collagen type. Thus, type III collagen, which is comprised of only one type of α-chain, has the molecular configuration [α1(III)]3. Other collagen types occur as heterotrimers such that the two genetically distinct α-chains of the commonest of all collagens, type I, are present in the ratio 2:1, giving rise to a type I helix of composition [α1(I)]2α2(I). However, small amounts of the homotrimer [α1(I)]3 have also been detected in skin (Uitto, 1979). The possibility of three different α-chains in a helix is exemplified by type VI collagen and within some types, for example, collagen type V, evidence for more than one heterotrimer has been obtained (Table 1). In all instances the collagens are synthesized as soluble precursors which undergo a series of post-translational