Crystal structure of the receptor-binding domain of adenovirus type 5 fiber protein at 1.7 Å resolution
Di Xia1, Lynda J Henry2, Robert D Gerard2 and Johann Deisenhofer1*
1Howard Hughes Medical Institute and 2Departments of Biochemistry and Internal Medicine, The University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Boulevard, Dallas, TX 75235-9050, USA
Background: Adenoviral infection begins with the binding of virion to the surface of host cells. Specific attachment is achieved through interactions between host-cell receptors and the adenovirus fiber protein and is mediated by the globular carboxy-terminal domain of the adenovirus fiber protein, termed the carboxy-terminal knob domain.
Results: The crystal structure of the carboxy-terminal knob domain of the adenovirus type 5 (Ad5) fiber protein has been determined at 1.7 Å resolution. Each knob monomer forms an eight-stranded antiparallel β-sandwich structure. In the crystal lattice, the knob monomers form closely interacting trimers which possess a deep surface depression centered around the three-fold molecular symmetry axis and three symmetry-related valleys.
Conclusions: The amino acid residues lining the wall of the central surface depression and the three symmetry-related floors of the valleys are strictly conserved in the knob domains of Ad5 and adenovirus type 2 (Ad2) fiber proteins, which share the same cellular receptor. The β-sandwich structure of the knob monomer demonstrates a unique folding topology which is different from that of other known antiparallel β-sandwich structures. The large buried surface area and numerous polar interactions in the trimer indicate that this form of the knob protein is predominant in solution, suggesting a possible assembly pathway for the native fiber protein.
Structure 15 December 1994, 2:1259-1270 Key words: adenovirus fiber protein, adenovirus type 5, receptor-binding protein, virus-host interaction, viral assembly
Introduction
Adenoviruses are non-enveloped icosahedral double-stranded DNA viruses whose structures and modes of replication are well characterized [1]. Besides being major pathogenic agents which lead to numerous infectious diseases, adenoviruses are also invaluable as model systems in molecular biology [1]. Recently, adenoviruses have attracted special interest as some of the most effective viral vectors for gene therapy [2,3]. Such vectors are characterized by high efficiency of gene transfer, relatively large DNA capacity (7-8 kb), and an ability to infect a wide range of cell types.

Under the electron microscope, an adenovirus particle resembles a space satellite with protruding antennae [1,4,5]. The virion contains at least 11 structural proteins and one double-stranded linear genomic DNA molecule. The viral capsid comprises at least six different polypeptides, including 240 copies of the trimeric hexon (polypeptide II) and 12 copies each of the pentameric penton (polypeptide III) base and trimeric fiber [6].

Adenoviral infection begins with the specific attachment of the virus to host-cell receptors followed by internalization of the viral particle through receptor-mediated endocytosis [4,7-9]. The binding of cell-surface receptors to the viral fiber protein is both strong (Kd [...] M) and specific. Adenoviruses of different subgroups recognize different protein molecules on the cell surface. The identity of these host-cell primary receptors and the mechanism of recognition at the atomic level are unknown. Adenovirus type 2 (Ad2) requires additional host-cell receptors for infection [10]. These secondary receptors are cell-surface integrins, which interact with the RGD (one-letter amino acid code) sequences on the penton base located at the vertices of the icosahedral virus [10].

The fiber proteins, protruding outward from the vertices of the icosahedral viral particle, can be divided into three structural domains [11]. The amino-terminal tail of the fiber protein is attached non-covalently to the penton base at each vertex. The carboxy-terminal segment of the fiber protein folds into a globular 'knob' domain, which is necessary and sufficient for virion binding to host cells [12,13]. Between these two terminal domains, the fiber protein forms a long shaft whose length varies among virus serotypes. The amino acid sequence of the shaft is characterized by repeating motifs of ~15 residues which share a common pattern of hydrophobicity [14-17]. Adenovirus fiber proteins vary significantly in length among different serotypes, ranging from six repeating units in the shaft of Ad3 [18] to 22 repeats in Ad5 and Ad2 [11]. Early models described fibers as dimers [11,18], but more recent data support the trimeric models [6,19-21].

In this paper, we describe the crystal structure of the carboxy-terminal receptor-binding knob domain of the Ad5 fiber protein determined at 1.7 Å resolution, and discuss its functional implications.
*Corresponding author. © Current Biology Ltd ISSN 0969-2126
Structure 1994, Vol 2 No 12
Results and discussion
Structure determination
The knob domain of the Ad5 fiber protein has been expressed in and purified from Escherichia coli [12]. It includes 196 amino acid residues (residues 386-581 of the intact fiber protein) and has a molecular weight of 21 279 Da. The expressed knob protein forms trimers in solution and can compete efficiently with Ad5 for receptor binding. Antibodies to the Ad5 knob domain efficiently neutralize Ad5 viral infectivity. Crystals of the knob protein can be grown consistently using a microseeding technique [22]. These crystals diffract X-rays to [...]

Table 1. Statistics for the refined model.
R-factor (a), reflections with |F| > 2σ(|F|)        0.162
Rms deviation of bond length from ideal value        0.011 Å
Rms deviation of bond angle from ideal value         1.6°
Rms deviation of improper angles from planarity      1.5°
Mean coordinate error estimated from Luzzati plot    0.14 Å
Rms coordinate error estimated from SIGMAA plot      0.17 Å
Mean main chain B-factor                             18.3 Å²
Mean side chain B-factor                             21.2 Å²
(a) The R-factor is defined as $R = \sum \bigl| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \bigr| / \sum |F_{\mathrm{obs}}|$, where $|F_{\mathrm{obs}}|$ is the observed structure-factor amplitude and $|F_{\mathrm{calc}}|$ is the calculated structure-factor amplitude. This number was obtained for the 17 633 unique reflections with |F| > 2σ(|F|) in the resolution range 12.0-1.7 Å, which were used for refinement.
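The R-factor definition given with Table 1 can be illustrated with a minimal Python sketch. The function name r_factor, the optional sigma cutoff argument and the sample amplitudes are assumptions made for illustration only; they are not data or code from this study, and the sketch assumes that observed and calculated amplitudes are already on a common scale.

```python
def r_factor(f_obs, f_calc, sigma=None, cutoff=2.0):
    """Return R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    If sigma is given, reflections with |Fobs| <= cutoff * sigma are skipped,
    mirroring the |F| > 2 sigma(|F|) selection quoted for Table 1.
    """
    numerator = 0.0
    denominator = 0.0
    for i, (fo, fc) in enumerate(zip(f_obs, f_calc)):
        fo, fc = abs(fo), abs(fc)
        if sigma is not None and fo <= cutoff * sigma[i]:
            continue  # weak reflection, excluded from the sums
        numerator += abs(fo - fc)
        denominator += fo
    if denominator == 0.0:
        raise ValueError("no reflections passed the sigma cutoff")
    return numerator / denominator


# Hypothetical amplitudes for illustration only (not data from the paper).
f_obs = [100.0, 50.0, 20.0, 4.0]
f_calc = [90.0, 55.0, 22.0, 9.0]
sigmas = [5.0, 4.0, 3.0, 3.0]

# The last reflection (4.0 <= 2 * 3.0) is rejected by the cutoff.
print(round(r_factor(f_obs, f_calc, sigmas), 3))
```

With the cutoff applied, the weakest reflection drops out of both sums, which is why the value differs from the one obtained when sigma is omitted.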
The knob structure was determined by the multiple isomorphous replacement (MIR) method at 3 Å resolution, and subsequently refined at 1.7 Å resolution. The crystallographic R-factor of the refined model is 16.2% for 17 633 unique reflections with structure-factor amplitudes greater than two standard deviations in the resolution range 12 to 1.7 Å. The refined atomic model contains 151 water molecules, which cover almost the entire surface of the molecule. The mean coordinate error estimated from a Luzzati plot [23] is 0.14 Å, and the root mean square (rms) coordinate error calculated from a SIGMAA plot [24] is 0.17 Å. A Ramachandran plot using PROCHECK [25] shows that the dihedral angles of peptide backbone atoms occupy energetically favorable regions. Table 1 gives statistics on the geometry of the refined model. Electron density is clearly defined for most amino acid residues of the expressed knob domain of Ad5 fiber protein. Exceptions are residues 386-392 at the amino terminus, for which no electron density can be found; residues 393-397 and 540-546 have weak and discontinuous electron density. An example of the electron density at 1.7 Å resolution is given in Fig. 1.

The refined knob model has a mean B-value of 18.3 Å² for all main-chain atoms and 21.2 Å² for all side-chain atoms. Mean B-values for main-chain and side-chain atoms of each residue are shown in Fig. 2. The B-values for residues in the β-sheets are much lower than average, indicating a well ordered, compact core structure. The B-values for residues at the amino terminus and in some surface loops are significantly higher, indicating flexibility in these regions of the structure. For example, residues 538-546 (part of the HI loop, see below) have a mean B-value of 50 Å². The two cysteine residues, Cys411 and Cys428, near the amino terminus are in their free thiol form and both are accessible to heavy-atom compounds.

Structure of the knob monomer
The monomer structure is an eight-stranded antiparallel β-sandwich (Figs 3a and 3b). The β-sheet which consists of β-strands G, H, I and D is exposed to solvent and presumably faces the cellular receptor; hence it is called the 'R-sheet'. The other β-sheet, composed of β-strands J, C, B and A, is partially buried in the trimeric structure and is closer to the virion; it is named the 'V-sheet'. The two small β-strands E and F are attached to the V-sheet, but are considered as part of the DG loop (see below). As in most known β-sandwich structures, residues from both β-sheets are packed together to form a hydrophobic core. The directions of the β-strands in the R-sheet and those in the V-sheet differ by an angle of ~30°.

The β-strands represent only 35% of the knob monomer structure; the remaining 65% comprise turns
Fig. 1. Stereoview of electron density around the three cysteines Cys428, Cys1428 and Cys2428, which are related by the three-fold molecular symmetry axis. The electron density is calculated at 1.7 Å resolution and contoured at 1.2 standard deviations. The residue numbering of one monomer follows that in the fiber sequence; residue numbers in the other two monomers are increased by 1000 and 2000 respectively. The three cysteines are readily accessible to heavy-atom compounds. The distance between each pair of thiol groups is 4.1 Å, and together the thiols form the floor of the central surface depression.
and loops connecting the β-strands. The six prominent loops with lengths between 8 and 55 residues are labeled AB, CD, DG, GH, HI and IJ. The DG loop consists of 55 residues including the short β-strands E and F (residues 479-486), which are attached to the V-sheet (Fig. 4). The DG loop contributes to the hydrophobic core of the knob monomer with the predominantly hydrophobic residues 494-505, which intercalate between β-sheets at one edge of the structure (the lower edge in Figs 3a and 3b). The larger gap between the β-sheets at this edge gives the β-sandwich a tetrahedron-shaped structure, which is necessary for the assembly of the knob trimer.

The reverse turn is another abundant structural motif in the high-resolution knob structure. Nearly 25% of the residues are involved in such motifs. All of the 13 observed reverse turns (Table 2) have been classified based on the nomenclature proposed by Wilmot and Thornton [26]. The second turn in Table 2 has two prolines, both of which are in the trans configuration. Because of the limited conformational freedom for the alanine residue immediately amino-terminal to Pro407, no hydrogen bonds are formed between the residues in the first and fourth positions. The sixth turn in Table 2 contains a cis-proline in the third position. Hydrogen bonding occurs between residues in the second and third positions instead of in the first and fourth positions.

Hydrogen-bonding interactions within the molecule are extensive. Nearly all possible hydrogen-bonding donors and acceptors are properly paired. The hydrogen-bonding pattern for the β-sandwich motif of the molecule is drawn schematically in Fig. 4, which also shows the connectivity between β-strands. Out of the hydrogen-bonding interactions identified, 39 hydrogen bonds and 11 water-mediated hydrogen-bonding interactions are not involved in defining the β-sandwich motif of the molecule.
Fig. 2. Mean B-factors of knob domain residues: (a) side-chain atoms and (b) main-chain atoms. The boxes marked A-J indicate the position of β-strands. Two regions with very high B-factors, at the amino terminus and in the loop between β-strands H and I, are observed.

Although the antiparallel β-sandwich folding of the knob monomer is similar to that of other antiparallel β-sandwich structures, such as immunoglobulin light-chain domains [27] or subunits of Cu,Zn-superoxide dismutase (SOD) [28], the connectivity between β-strands for the knob molecule is unlike that in any other known structure. We probed the Brookhaven Protein Data Bank (PDB) with the knob structure using the program WHATIF [29], and found no structure with matching topology. Thornton and colleagues [30,31] recently identified five unique topologies for antiparallel β-structures in the PDB; these unique topologies, together with that for the knob monomer, are shown in Fig. 5. This figure also shows representative structures for these unique topologies: human rhinovirus coat protein VP1 (for jelly-roll topology); human Fab New light-chain constant and variable domains; SOD; and tomato bushy stunt virus P domain. Clearly, the knob monomer is different from any of these. The structure of the hexon [32], a major capsid component of the adenovirus, consists of two antiparallel β-barrel motifs. Both adopt the jelly-roll topology found in most viral capsid proteins.

Superpositions of the knob structure with many known antiparallel β-sheet structures were performed using the programs WHATIF and HOMOLOGY [33]. The closest structural similarity was found between the knob monomer and SOD [28], an enzyme present in all aerobic organisms. The monomer of the enzyme has 151 residues, of which 55 can be superimposed onto the knob monomer with an rms deviation of 2.75 Å. The matching segments include strands from both β-sheets. However, the folding topologies of the two proteins are different (Fig. 5), they share no obvious primary sequence homology, and the functional form of SOD is a dimer. For these reasons an evolutionary relationship between the two proteins is unlikely. The superpositions of viral coat proteins such as the rhinovirus 14 VP1 and VP2 [34] with the knob monomer give rms deviations of 2.64 Å and 4.09 Å for 41 and 50 residues respectively. The matching segments include strands from only one of the β-sheets of the viral coat proteins, because these proteins form antiparallel β-barrel structures whose β-strands are almost parallel to each other.
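The rms deviations quoted for the superpositions are root mean square distances over matched atom pairs. The following Python sketch shows only that final calculation, assuming the two coordinate sets have already been superimposed and paired; it does not reproduce the matching or fitting performed by WHATIF or HOMOLOGY. The function name rmsd and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    The coordinate sets are assumed to be already superimposed and paired.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical matched positions in Å, for illustration only.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
print(rmsd(set_a, set_b))
```

Because the value depends on how many residues are matched, an rms deviation is only meaningful together with the number of superimposed residues, as reported in the text.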
Fig. 3. Structure of the knob monomer. (a) Stereo ribbon diagram of the knob monomer, showing the eight-stranded antiparallel β-sandwich fold. β-Strands are represented as ribbons in green, and coils and loops are shown in yellow. The β-strands G, H, I and D comprise the R-sheet, and β-strands J, C, B, A, E and F the V-sheet. The diagram was drawn using the program RIBBONS [54]. (b) Stereoview of the Cα-trace of the knob monomer with every tenth residue labeled.
Fig. 4. Hydrogen-bonding pattern for the β-sandwich of the knob monomer. Square boxes identify residues that are involved in hydrogen bonding in the β-sandwich motif. The thick black lines denote hydrogen bonds between residues. The V-sheet (β-strands J, C, B, A, E and F) and the R-sheet (β-strands G, H, I and D) are shown schematically.
Structure of the knob trimer
The knob monomers form homotrimers both in solution and in the crystal lattice [12]. The crystal structure of the knob reveals that the overall shape of the trimer resembles a three-bladed propeller with a surface depression around the three-fold molecular symmetry axis (Fig. 6a). The R-sheets of the three monomers are the faces of the blades. The trimer is ~62 Å in diameter and ~40 Å high. The interactions between the monomers are extensive and include hydrogen bonds, salt bridges, van der Waals contacts and water-mediated charge interactions. In total there are 39 inter-subunit salt bridges and hydrogen bonds (Table 3) and 51 water-mediated hydrogen-bonding interactions (Table 4) between monomers. This network of interactions contributes to the stability of the trimer. In addition, 18 van der Waals contacts are formed between a pair of monomers.

The V-sheets play a significant role in the trimerization by providing many contacts between monomers (Fig. 6b), especially through the J strands, which are in close contact with each other. Most of the inter-subunit interactions are made between β-strands J and G, as shown in Tables 3 and 4. The only strands involved in the trimerization from the R-sheets are the G strands, which are close to the three-fold molecular axis and participate in hydrogen-bonding interactions with each other and with adjacent J strands (Tables 3 and 4). Therefore, these two regions of the structure are critical for trimer formation.
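The three-fold molecular symmetry can be illustrated with a short Python sketch. It assumes, hypothetically, a coordinate frame whose z axis coincides with the three-fold axis, so that rotating an atom by 120° and 240° generates its two symmetry mates. Using the 4.1 Å thiol-thiol distance quoted in the legend of Fig. 1, each sulfur would lie 4.1/√3 ≈ 2.37 Å from the axis if the three atoms sit at the same height. The function names and the choice of frame are illustrative assumptions and are not part of the structure determination.

```python
import math


def rotate_about_z(point, degrees):
    """Rotate an (x, y, z) point about the z axis by the given angle."""
    x, y, z = point
    a = math.radians(degrees)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def threefold_copies(point):
    """Return the point and its two mates related by a three-fold z axis."""
    return [rotate_about_z(point, k * 120.0) for k in range(3)]


pair_distance = 4.1                      # Å, from the legend of Fig. 1
radius = pair_distance / math.sqrt(3.0)  # distance of each atom from the axis
copies = threefold_copies((radius, 0.0, 0.0))

# Distances between the three symmetry-related positions.
pairs = [(0, 1), (1, 2), (0, 2)]
distances = [math.dist(copies[i], copies[j]) for i, j in pairs]
print([round(d, 3) for d in distances])
```

The three positions form an equilateral triangle, so every pairwise distance equals the radius multiplied by √3.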
Table 2. Reverse turns in the knob monomer structure.

No.  From  To   Pos. 1  Pos. 2  Pos. 3  Pos. 4  φ, ψ (2) (°)  φ, ψ (3) (°)  Type of turn   D (Å) (a)
1    397   400  Asp     Lys     Leu     Thr     -71, -4       -103, 5       [...]          3.02
2    404   407  Thr     Pro     Ala     Pro     -72, -20      -137, 73      [...]          4.57 (b)
3    415   418  Ala     Glu     Lys     Asp     -52, 135      42, 44        [...]          3.28
4    428   431  Cys     Gly     Ser     Gln     81, -123      -74, 7        [...]          3.60
5    443   446  Gly     Ser     Leu     Ala     -61, -16      -109, 2       [...]          3.17
6    445   448  Leu     Ala     Pro     Ile     -139, 80      -87, 167      [...] cis-Pro  6.54 (c)
7    449   452  Ser     Gly     Thr     Val     -79, -6       -90, -7       [...]          3.84
8    482   485  Asn     Gly     Asp     Leu     64, -123      -103, 11      [...]          2.83
9    494   497  Ala     Val     Gly     Phe     -56, -29      -68, -11      [...]          2.85
10   512   515  Ala     Lys     Ser     Asn     -57, -29      -76, -1       [...]          3.03
11   525   528  Asp     Lys     Thr     Lys     -54, -22      -89, 0        [...]          2.90
12   558   561  Trp     Ser     Gly     His     -51, 132      91, 1         [...]          2.95
13   563   566  Trp     Ile     Asn     Glu     -56, 133      81, 1         [...]          3.08

(a) Hydrogen-bonding distance is defined as being between CO of the residue in the first position and NH of the residue in the fourth position unless otherwise indicated. Hydrogen bonds were selected from X-PLOR followed by manual editing with the distance of the H atom from the acceptor < 3.0 Å and an angle of deviation from linearity not > 30°.
(b) This turn has two trans-prolines. There is no hydrogen bonding between the residue in the first position and that in the fourth position. However, the distance between CO of the residue in the first position and NH of the residue in the third position is 3.46 Å.
(c) This reverse turn contains a cis-proline. The distance listed in the table is between NH of the residue in the first position and CO of the residue in the fourth position. These groups are pointing away from each other, and no hydrogen bond is formed between them. However, the NH of the residue in the second position and the CO of the residue in the third position are 3.37 Å apart, allowing a hydrogen bond to form between them.
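The geometric criteria in the footnote to Table 2 (H-to-acceptor distance below 3.0 Å and a deviation from linearity of at most 30°) can be sketched as a simple test. This is an illustration only, not the X-PLOR selection used in the study. It assumes that "deviation from linearity" means 180° minus the donor-H-acceptor angle; that interpretation, the function name is_hydrogen_bond and the sample coordinates are assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hydrogen_bond(donor, hydrogen, acceptor, max_h_acc=3.0, max_dev=30.0):
    """Apply the two cutoffs: H...A distance < max_h_acc (Å) and
    (180 - angle D-H...A) <= max_dev (degrees)."""
    h_to_acc = _sub(acceptor, hydrogen)
    h_to_don = _sub(donor, hydrogen)
    d = _norm(h_to_acc)
    if d >= max_h_acc:
        return False
    cos_angle = sum(p * q for p, q in zip(h_to_don, h_to_acc)) / (_norm(h_to_don) * d)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    angle = math.degrees(math.acos(cos_angle))  # donor-H...acceptor angle
    return (180.0 - angle) <= max_dev


# Hypothetical coordinates in Å: donor at the origin, H 1 Å along x.
donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, (3.0, 0.0, 0.0)))        # linear, 2.0 Å
print(is_hydrogen_bond(donor, hydrogen, (4.5, 0.0, 0.0)))        # too long
print(is_hydrogen_bond(donor, hydrogen, (2.0, 1.7320508, 0.0)))  # bent by 60°
```

Only the first geometry satisfies both cutoffs; the second fails the distance test and the third fails the angle test.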
Table 3. Inter-subunit salt bridges and hydrogen bonding in the molecular asymmetric unit.

Hydrogen bond donor                  Hydrogen bond acceptor              D (Å)
Location (a)  Residue       Group    Group   Residue   Location (a)
cG            Lys1513 (b)   Nζ       O       Pro405    AB                2.87 (c)
B             Thr?26        Oγ1      Oε1     Gln1431   C                 2.93
EF            Ser1430       Oγ       Oγ1     Thr426    B                 3.21
EF            Ser430        Oγ       Oγ1     Thr2400   A                 2.51
DG            Asn500        Nδ2      Oδ1     Asp2484   EF                2.87
E             Arg481        Nη2      Oε2     Glu1581   ct                3.05 (d)
cG            Lys513        Nζ       Oγ1     Thr2422   B                 3.03
J             Ser2572       Oγ       O       Lys513    cG                2.72
J             Ser2572       N        O       Asn515    G                 2.78
J             Thr574        Oγ1      Oγ      Ser1576   J                 2.67
J             Ser576        N        Oγ1     Thr2574   J                 3.07
E             Arg2481       Nη2      OT (e)  Glu581    ct                2.82 (d)

(a) The symbols are defined as: cG: close to G strand; B: B strand; EF: between E and F strands; DG: between D and G strands; J: J strand; E: E strand; AB: between A and B strands; C: C strand; A: A strand; ct: carboxy-terminal.
(b) Residue numbering of one monomer follows that in the fiber sequence. Residue numbers in the other two monomers are increased by 1000 and 2000.
(c) Hydrogen bonds were selected from X-PLOR followed by manual editing with the distance from the H atom to the acceptor being < 3.0 Å and an angle deviating from linearity by not > 30°.
(d) Salt bridges.
(e) Carboxy-terminal oxygen atom.
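The residue numbering convention used in Table 3 (and in the legend of Fig. 1), in which residue numbers of the second and third monomers are increased by 1000 and 2000, can be decoded mechanically. The following Python sketch is a hypothetical helper, not part of the original analysis; the function names and the range check against residues 386-581 of the expressed knob domain are assumptions made for illustration.

```python
KNOB_FIRST, KNOB_LAST = 386, 581  # residue range of the expressed knob domain


def split_residue_number(number):
    """Return (monomer_index, fiber_residue_number) for a trimer residue number.

    Monomer 0 keeps the fiber numbering; monomers 1 and 2 carry offsets of
    1000 and 2000 respectively.
    """
    monomer, residue = divmod(number, 1000)
    if monomer not in (0, 1, 2) or not KNOB_FIRST <= residue <= KNOB_LAST:
        raise ValueError(f"{number} is not a valid knob trimer residue number")
    return monomer, residue


def is_inter_subunit(donor_number, acceptor_number):
    """True if the two residues belong to different monomers."""
    return split_residue_number(donor_number)[0] != split_residue_number(acceptor_number)[0]


print(split_residue_number(1513))   # (1, 513)
print(split_residue_number(2572))   # (2, 572)
print(is_inter_subunit(1513, 405))  # True
```

For example, the first row of Table 3 pairs Lys1513 with Pro405, that is, residue 513 of the second monomer with residue 405 of the first.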
The loss of accessible surface area in the trimeric knob upon monomer association totals ~1950 Å², which is typical for most oligomeric proteins [35]. Consequently, the trimeric state of the knob protein should also be predominant in solution; this assumption is further supported by a gel-filtration analysis of Ad5 [12] and a cross-linking study of Ad2 [13]. Loops shape the outer surface of the trimer but contribute very little to the trimerization of the knob protein.
The knob trimer has a deep surface depression which presumably faces the cellular receptor and is centered on the three-fold molecular symmetry axis (Figs 7 and 8). This depression has a diameter of ~24 Å at the surface and reaches as a narrowing channel ~15 Å deep into the knob trimer. Three cysteines, one from each monomer, approach the three-fold axis at the bottom of the depression and are in van der Waals contact with each other (Fig. 1). With the exception of Lys526, which is in the depression, and Lys528, which sits on the rim of the depression, the residues lining the wall of the depression are mostly uncharged and hydrophilic. No hydrophobic residues are found in the depression (Fig. 8).

Other important features on the surface of the knob are the valleys formed by the R-sheets and the HI loops. As expected, the residues that are exposed to solvent in the valley are mostly hydrophilic, and the hydrophobic residues have their side chains buried in the interior of the molecule. The HI loop comprises 13 residues, all of which are hydrophilic. The high B-factors of residues 538-546 in the HI loop indicate flexibility.

Fiber assembly
The recombinant knob includes 15 residues of the 22nd repeating unit of the shaft domain, but the first seven residues (residues 386-392) do not have significant electron density. This could be attributable to either a lack of crystallographic three-fold symmetry in the shaft region, or to mobility of the amino terminus. The amino termini of the monomers in the ordered structure (residue 393 and symmetry-equivalent residues) are close to the three-fold molecular symmetry axis in a position where the carboxy-terminal end of the shaft domain is expected to be in the intact fiber protein (Fig. 6b). On the basis of the characteristic repeating motifs in the fiber sequence, a triple-helical model was proposed for the shaft of Ad2 [21]. This model is consistent with the shaft length observed by electron microscopy, and its Fourier transformation matches the low-resolution X-ray diffraction data of Ad2 fiber crystals. Unfortunately, the disorder in the amino terminus of the knob structure prevents the drawing of further conclusions on the structure of the shaft.

Fig. 5. Diagram of unique topologies for β-structures. The connectivity of the knob monomer is shown together with other unique topologies for antiparallel β-structures identified in the PDB [30,31]. Ribbon diagrams of the corresponding structures are also shown. For clarity, the carboxy-terminal and/or amino-terminal residues for some molecules have been excluded from the drawing. The ribbon diagrams were drawn with MOLSCRIPT [55]. The connectivity between β-strands for the knob molecule is different from that of the others as shown by this diagram. HRV, human rhinovirus; Ic, light-chain constant domain; Iv, light-chain variable domain; TBSV, tomato bushy stunt virus; SOD, Cu,Zn-superoxide dismutase.
The trimeric knob structure with its large subunit contact areas and many polar interactions probably facilitates the trimerization of the fiber protein. Although the assembly pathway for the fiber protein is not known, the fact that the knob domain is capable of self-trimerization leads us to assume that the assembly pathway of the fiber protein may start with folding of the knob domains, followed by their association into trimers. This, in turn, may trigger the trimerization of the shaft. This view is supported by the observations that the deletion of the carboxy-terminal portion of the fiber protein leads to disrupted fiber proteins, and that the 40 residues at the carboxyl terminus of the fiber protein are particularly important for trimerization [36]. These residues lie in the knob structure region where most inter-subunit interactions are found (Tables 3 and 4). Additionally, fiber assembly lags behind fiber protein synthesis, suggesting that the entire fiber protein must be synthesized before trimerization [37].
Table 4. Inter-subunit water-mediated hydrogen bonds in the molecular asymmetric unit.
Water    Residue (a)    Group    Location (b)    D (Å)    Comments
Structure conservation and putative receptor-binding sites
The sequences of the knob domains from seven different adenovirus fiber proteins [14-18,38] were aligned as shown in Fig. 9. Most of the conserved residues occur in the β-sandwich motif, and most of the surface loops are variable. We therefore expect that the secondary structures for knob domains of different serotypes are similar. Sequence conservation in the V-sheet is much higher than in the R-sheet. As is evident in the knob structure, the V-sheets play an important structural role in the trimerization of monomers by providing interactions at subunit interfaces. The role of R-sheets is more functional because they presumably interact with different cellular receptors for different subgroups of adenoviruses.

The primary sequence is most conserved in the amino-terminal and carboxy-terminal regions of the knob domain. The high sequence homology among the various species in β-strand A suggests that this strand may play an essential role in maintaining a common transition between the shaft and knob domains. The positioning of this β-strand may be crucial for the correct folding of the shaft domain. The carboxy-terminal β-strand J is also highly conserved. Deletion studies in Ad2 have shown that this region is critical for knob trimerization [36]. Neighboring J strands from the three monomers are in close contact with each other in the crystal structure. A third highly conserved region is the hydrophobic stretch of residues located in the DG loop between residues 494 and 505. This stretch of residues sits in the groove between the two β-sheets and is part of the hydrophobic core.

Adenovirus serotypes Ad5 and Ad2 bind to the same cellular receptor [1]. The sequences of the entire fiber proteins of the two serotypes are 69% identical, whereas the sequence identity for the knob domains is only 60%. As demonstrated in Fig. 9, the sequence differences between Ad2 and Ad5 are unevenly distributed within the knob structure.
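Percent sequence identity of the kind quoted for Ad5 and Ad2 can be computed from an alignment in a few lines. The Python sketch below is an illustration under stated assumptions: it counts identical residues over aligned columns in which neither sequence has a gap, which is only one of several conventions in use, and the text does not state which convention was applied. The function name percent_identity and the two short sequences are hypothetical and are not taken from Fig. 9.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, for illustration only.
aligned_a = "ACDEFGHIKL"
aligned_b = "ACDQFG-IKL"
print(round(percent_identity(aligned_a, aligned_b), 1))
```

Applying the same function to restricted column ranges, for example only the columns covering the knob domain, would give region-specific identities of the kind compared in the text.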
High sequence variability in the surface loops probably contributes to the serological differences between Ad5 and Ad2. Regions with strictly conserved sequences in the surface depression around the three-fold molecular symmetry axis and on the floors of the valleys formed by the R-sheets and HI loops are candidates for binding sites for the cellular receptor (Fig. 7). We presume that those surface residues conserved between Ad5 and Ad2 define the receptor-binding specificity of the knob domains for different adenovirus subgroups. The sequence alignment of adenovirus serotypes Ad3 and Ad7 supports this view. Ad3 and Ad7 bind to the same cellular receptor, which is different from that bound by Ad5 and Ad2 [39]. Many of the residues which are identical in Ad3 and Ad7, but different from those in Ad5 and Ad2, are located in the central surface depressions and valleys (Fig. 7).

Two possible receptor-binding modes can therefore be postulated. In the first, the cellular receptor binds to the central surface depression and each trimeric knob binds to one cellular receptor. This scenario is similar to other known virus-receptor interactions, such as rhinovirus 14 binding to intercellular adhesion molecule 1 (ICAM-1) [40]. The other possible mode of binding to cellular receptors would utilize the floors of the valleys formed by the R-sheets and the HI loops. In this case, three receptor-binding sites would be available for each trimeric knob, similar to the binding of tumor necrosis factor to its cellular receptor [41,42]. The flexible HI loop may become ordered upon receptor binding to stabilize the receptor-ligand complex. Indeed, the very low dissociation constant for binding of the knob domain to receptors may reflect simultaneous binding of all three virus attachment sites.

Table 4, comment: Water 2679 and water 2682 are 2.99 Å apart.
Table 4, footnotes: (a) Residues interacting with the water. (b) The symbols are defined as for Table 3. (c) Hydrogen bonds were selected from X-PLOR followed by manual editing with the distance from the H atom to the acceptor being < 3.0 Å and an angle deviating from linearity by not > 30°. (d) Residue numbering of one monomer follows that in the fiber sequence. Residue numbers in the other two monomers are increased by 1000 and 2000.
Fig. 6. Trimeric structure of the knob. (a) Stereo ribbon diagram of the knob trimer, showing the putative receptor-binding surface viewed down the three-fold molecular symmetry axis. The individual monomers are colored red, purple and green. The overall shape of the trimer resembles a three-bladed propeller, with a central surface depression and three valleys formed by the symmetry-related R-sheets and HI loops. (b) Stereo ribbon diagram of the knob trimer viewed from the virus surface, showing the contacts between monomers. The amino termini of the three monomers converge at the three-fold molecular axis. In the intact fiber protein, this marks the end of the shaft region.
Biological implications
Fig. 7. Space-filling model of the trimeric knob protein, showing the deep surface depression centered on the three-fold axis, and the valleys formed by the R-sheets and HI loops. Red: amino acid residues of Ad5 that differ from those of Ad2. Yellow: residues identical in Ad5 and Ad2. Green: residues identical in Ad5 and Ad2 but different from residues that are identical in Ad7 and Ad3. The main conserved regions are in the central surface depression around the three-fold symmetry axis and on the floors of the valleys. This diagram was produced with the program INSIGHT II (Biosym Technologies Inc., 9685 Scranton Road, San Diego, CA 92121-2777, USA).
Adenoviral infection remains a threat to the world population, causing diseases as diverse as pneumonia, conjunctivitis, cystitis and diarrhea. These infections often become fatal to patients who are immunocompromized [43]. Any viral infection starts with recognition of the host cell by a virus, which is achieved through specialized proteins on the viral surface which can bind to surface receptors of the host cell. The knob domain of the fiber protein from adenovirus type 5 is one such receptor-binding protein subunit. The crystal structure of this domain reveals a trimeric organization with each subdomain folded into two functionally distinct -sheets. The V-sheet is highly conserved among different adenovirus serotypes and mainly provides contact surfaces in the formation of the trimer. The R-sheet is more variable in sequence and may play a role in virus-receptor interactions. The atomic structure also unveils two prominent surface features of the knob trimer: a central depression and three symmetry-related valleys. Sequence conservation among fiber proteins from different adenovirus serotypes that share the same cellular receptor indicates that these features are likely to be receptor-binding sites. The human adenovirus family consists of several subgroups of viruses. Within each subgroup, viruses tend to share many common properties,
Receptor-binding domain of adenovirus type 5 fiber protein Xia et al.
Fig. 8. Surface profile around the central depression and the residues lining the wall of the depression of the knob trimer. The profile is calculated as the distance from the molecular surface to a reference plane which is placed in front of the knob molecule normal to the three-fold molecular axis. Red color indicates the smallest, and dark blue color the largest, distance from the molecular surface to the reference plane. The key shows the main distance ranges and their corresponding colors. The depression is -15 A deep and residues in the depression are mostly hydrophilic. The diagram was prepared using the program Roadmap [56].
Fig. 9. Sequence alignment of the knob domains of seven different adenovirus fiber species. Ad2 and Ad5 belong to the same subgroup and share the same cellular receptor [1]. Ad3 and Ad7 are also human adenoviruses, but bind to a cellular receptor different from that for Ad2 and Ad5 [1]. One of the two variants of Ad40 and Ad41 fiber proteins is shown. CAV is a canine adenovirus. The shaded boxes show the conserved residues among the knob sequences from different adenovirus species, and the open boxes indicate the β-strands in the structure of the Ad5 knob protein. Most β-strands show higher sequence conservation than the loop regions of the structure. The sequence contains three highly conserved regions: the amino-terminal region, residues 400-407; the carboxy-terminal region, residues 573-580; and the region between residues 494 and 505, which participates in the formation of the hydrophobic core of the monomer structure.
such as binding to common host-cell types. The sequence comparison of different types of adenovirus fiber protein suggests an overall similarity in the structure of the knob domain of these proteins. The atomic structure of the knob domain of adenovirus type 5 will help to identify the cellular receptors, to understand how adenovirus binds to the host-cell receptor and to determine how specificity is achieved for different adenovirus subgroups. This understanding may ultimately lead to the development of anti-viral drugs which interfere with host-cell recognition, and might additionally aid the engineering of improved replication-defective adenovirus vectors for use in gene-transfer studies and human gene therapy.
Materials and methods
Crystallization and heavy-atom derivative preparation
Knob protein expression and purification followed the procedure described previously [12]. Knob protein in TE buffer (20 mM Tris, pH 8.0, 1 mM EDTA) was concentrated using a Centricon 10 device. The protein concentration was measured with the BioRad Protein Assay Kit using bovine serum albumin as a standard. The hanging-drop vapor-diffusion method was used to crystallize the knob protein [22]. Protein concentrations used in the crystallization were in the range 7-10 mg ml⁻¹. The reservoir solution contained 2% (v/v) polyethylene glycol 400, 30% ammonium sulphate and 0.1 M imidazole, pH 7.5; the hanging drop consisted of 4 μl of protein solution mixed with 4 μl of the reservoir solution. Showers of tiny crystals were obtained overnight at room temperature. Large single crystals of dimensions >0.5 mm were obtained with microseeding techniques [22]. Tiny crystal seeds were first diluted 10⁸ times before being introduced into fresh hanging drops. The introduction of heavy-atom solutions directly into crystal-containing hanging drops almost always cracked the crystals, because they were extremely sensitive to osmotic shock. Therefore, a drop of heavy-atom solution was placed beside the crystal-bearing hanging drop to equilibrate with the reservoir solution for at least 24 h before the two drops were mixed together. The final concentration of heavy atom was 5-10 mM.

Data collection
X-ray diffraction intensity data were measured using a Xuong-Hamlin multiwire area detector system (Mark II) and CuKα radiation from a Rigaku RU 200 X-ray generator. Data frames were processed with the XDS program package [44,45]. Integrated intensities were merged and scaled with the PROTEIN package [46]. The knob crystals have the symmetry of space group P2₁3 and a unit cell dimension of 86.4 Å, with one monomer per asymmetric unit. The Matthews coefficient of the crystal is 2.56 Å³ Da⁻¹ [47].

Structure determination and refinement
Four heavy-atom derivatives, namely Hg(Ac)₂, p-chloromercuriphenyl sulfonic acid, Thimerosal and HgCl₂, gave significant heavy-atom signals both in difference Patterson maps and in difference-Patterson-based vector verification procedures [48]. The heavy-atom parameters were refined using PROTEIN [46]. Table 5 provides statistics on the native and derivative data sets. The MIR phases, calculated with all four derivatives, had an overall figure of merit of 0.67 in the resolution range of 20-3 Å. The resulting electron-density map could be traced unambiguously except for a few loop regions between β-strands. An
Table 5. Statistics on native and heavy atom derivative data sets.

Derivatives (columns per derivative: <δF>/<F>(c), PP(d), RCullis(e))

Resolution     Hg(Ac)2               THIM(a)               p-CMP(b)              HgCl2                 Figure of
range (Å)      <δF>/<F> PP   RCullis <δF>/<F> PP   RCullis <δF>/<F> PP   RCullis <δF>/<F> PP   RCullis merit
80.00-11.71    0.29     3.04 0.262   0.27     1.16 0.663   0.31     3.06 0.364   0.20     1.64 0.526   0.77
11.71-8.28     0.27     3.35 0.447   0.25     1.43 0.819   0.24     4.76 0.339   0.19     1.64 0.606   0.85
8.28-6.40      0.28     3.50 0.469   0.25     1.68 0.543   0.25     2.45 0.606   0.18     2.58 0.399   0.86
6.40-5.22      0.26     3.0; 0.495   0.21     1.79 0.530   0.21     2.52 0.462   0.14     2.72 0.470   0.85
5.22-4.40      0.23     1.70 0.827   0.17     1.28 0.739   0.18     2.04 0.603   0.11     1.85 0.580   0.77
4.40-3.81      0.23     1.48 0.664   0.17     1.06 0.923   0.20     1.74 0.558   0.11     1.49 0.722   0.70
3.81-3.36      0.25     1.20 0.926   0.19     0.88 0.849   0.22     1.86 0.584   0.13     1.09 0.987   0.62
3.36-3.00      0.26     1.19 0.925   0.20     0.91 1.008   0.24     1.64 0.675   0.14     1.00 0.898   0.51
average        0.25     2.31 0.672   0.20     1.27 0.761   0.22     2.51 0.552   0.13     1.75 0.661   0.67

Data statistics                    Hg(Ac)2   THIM(a)   p-CMP(b)   HgCl2    Native
Total no. of measurements          15460     8477      15386      15779    113369
No. of unique reflections          3841      3405      3850       3922     18182
Rmerge(f) (I > 2σ(I))              4.36      6.89      4.54       4.05     6.31
No. of heavy atom binding sites    3         2         3          3        -

Native data set by resolution shell

Resolution     Unique        %
range (Å)      reflections   complete
80.00-3.54     2671          95.6
3.54-2.81      2537          95.0
2.81-2.45      2457          92.3
2.45-2.23      2376          90.1
2.23-2.07      2266          85.8
2.07-1.95      2069          78.1
1.95-1.85      1682          64.4
1.85-1.77      1267          48.2
1.77-1.70      842           31.9
(a) Thimerosal.
(b) p-Chloromercuriphenyl sulfonic acid.
(c) Mean differences in structure-factor amplitudes between derivative and native data sets over mean native structure-factor amplitudes.
(d) Phasing power is defined as <fH>/residual, where <fH> is the rms of the structure-factor contribution in amplitudes from heavy atoms in the derivative, and residual is the rms of the lack of closure error.
(e) RCullis = Σ||FPH ± FP| - |fH|| / Σ|FPH ± FP|, where |FPH| is the structure-factor amplitude for the heavy atom derivative, |FP| is the structure-factor amplitude for the native crystal, and |fH| is the amplitude of the calculated structure-factor contribution of heavy atoms in the derivative. The summation is over all centric reflections in a given resolution shell.
(f) Rmerge is defined as Σ|Ihi - <Ih>| / ΣIhi, where Ihi is the ith observation of a reflection with Miller index h, and <Ih> is the mean for all measured Ih's and Friedel pairs. The summation is carried out over all observations with I > 2σ(I).
For each derivative data set, only one crystal was used to complete the data collection. The native data set was collected using six crystals.
atomic model was built into the MIR map using the graphics program O [49]. Four fragments of the knob atomic model were refined using the program X-PLOR [50]. The refinement was carried out with conventional energy minimization. Phases obtained from the partial structure model were combined with MIR phases using the program SIGMAA [24,51]. Further model building proceeded with better electron density and the four fragments were pieced together. This model was refined and phases were extended to higher resolution using the program X-PLOR [50] in two separate stages. In the first stage, the phases were extended to 2.5 Å resolution in three cycles of restrained positional refinement only, with intervening manual modification of the model. After each cycle of refinement, the calculated phases were combined with the MIR phases using the program SIGMAA. The second stage of refinement extended the phases from 2.5 Å to 1.7 Å resolution. The standard protocol of simulated annealing with slow cooling was applied in the refinement [52]. The initial temperature was set to 4000 K for each cycle, except for the last two cycles, where starting temperatures of 3000 K and 2500 K, respectively, were used. Phase combination was not used at this stage of refinement and only 2Fo-Fc maps were calculated. The refinement was carried out with 90% of the data as the working data set, except for the last cycle of refinement. Fig. 10 shows the improvement of Rfree and Rwork as the refinement progressed. Individual B-factor refinement was introduced at the fourth cycle. Throughout the refinement, stringent stereochemical restraints were applied [53]. A total of 151 water molecules were modeled into the electron density during the last two cycles of the refinement.
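The crystal parameters quoted under Data collection (cubic cell edge 86.4 Å, space group P2₁3, one monomer per asymmetric unit, Matthews coefficient 2.56 Å³ Da⁻¹) can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis. It assumes 12 asymmetric units per cell for P2₁3 (taken from standard space-group tables, not from the text) and uses the common approximation that the protein occupies about 1.23 Å³ per dalton; all function and constant names are hypothetical.

```python
# Consistency check on the reported crystal parameters (illustrative sketch).
CELL_EDGE_A = 86.4      # cubic unit cell edge in angstroms (from the text)
MATTHEWS_VM = 2.56      # Matthews coefficient in A^3 per Da (from the text)
MONOMERS_PER_ASU = 1    # from the text
ASU_PER_CELL = 12       # assumption: general positions in P2(1)3, not stated in the text


def cell_volume(edge):
    """Volume of a cubic unit cell in cubic angstroms."""
    return edge ** 3


def implied_monomer_mass(edge, vm, asu_per_cell, monomers_per_asu):
    """Invert V_M = V_cell / (n_molecules * mass) to get the mass in Da."""
    n_molecules = asu_per_cell * monomers_per_asu
    return cell_volume(edge) / (vm * n_molecules)


def solvent_fraction(vm, protein_volume_per_da=1.23):
    """Approximate solvent fraction; 1.23 A^3/Da is an assumed typical value."""
    return 1.0 - protein_volume_per_da / vm


mass = implied_monomer_mass(CELL_EDGE_A, MATTHEWS_VM, ASU_PER_CELL, MONOMERS_PER_ASU)
print(f"cell volume: {cell_volume(CELL_EDGE_A):.0f} A^3")
print(f"implied monomer mass: {mass / 1000:.1f} kDa")
print(f"approximate solvent fraction: {solvent_fraction(MATTHEWS_VM):.2f}")
```

Under these assumptions the reported values imply a monomer mass of roughly 21 kDa and a solvent content of about half the crystal volume.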
[Plot: Rfree and Rwork against the number of refinement cycles (0-10); the stage at which water molecules were added is marked "Add Water".]
Fig. 10. Simultaneous improvement of Rfree and Rwork for the refinement of the knob structure. During the course of refinement, 10% of randomly selected reflections in the resolution range of 12-1.7 Å were set aside as an independent check for the improvement of the refinement [57]. Cycle 1 was done at a resolution of 3.0 Å; cycle 2, 2.7 Å; cycle 3, 2.5 Å; cycle 4, 2.3 Å; cycle 5, 2.1 Å; cycle 6, 1.9 Å; cycle 7, 1.8 Å; cycles 8 and 9, 1.7 Å. All data were included for the last cycle of refinement.
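The cross-validation idea behind Rfree [57] can be illustrated with a short sketch: a random 10% of reflections is held out as a test set, and the conventional R-factor, R = Σ||Fo| - |Fc|| / Σ|Fo|, is evaluated separately on the working and test sets. This is an illustration only, not the X-PLOR procedure used by the authors. The amplitudes are synthetic toy values and all names are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_work_test(n_reflections, test_fraction=0.10, seed=0):
    """Randomly assign reflection indices to a working set and a test set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_test = round(n_reflections * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Synthetic amplitudes (toy data, not taken from the paper): the "calculated"
# values deviate from the "observed" ones by up to 20%.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [f * rng.uniform(0.8, 1.2) for f in f_obs]

work, test = split_work_test(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
print(f"working set: {len(work)} reflections, Rwork = {r_work:.3f}")
print(f"test set:    {len(test)} reflections, Rfree = {r_free:.3f}")
```

In this toy case no model is fitted to the working set, so the two values are similar. In a real refinement the model is optimized against the working set only, and Rfree serves as the independent check.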
Acknowledgements: The authors thank Drs Sekhar S Boddupalli, Charles A Hasemann, KG Ravichandran, and Chyung-Ru Wang for helpful discussions, Ms Barbara S Smith for help in the laboratory, Ms Dorothee B Staber for assistance in preparing the manuscript, and the referees for constructive comments.
References
1. Philipson, L. (1983). Structure and assembly of adenoviruses. In Current Topics in Microbiology and Immunology. pp. 1-42, Springer-Verlag, Berlin, Heidelberg.
2. Gerard, R.D. & Meidell, R.S. (1993). Adenovirus-mediated gene transfer. Trends Cardiovasc. Med. 3, 171-177.
3. Rosenfeld, M., et al., & Crystal, R.G. (1991). Adenovirus-mediated transfer of a recombinant α1-antitrypsin gene to the lung epithelium in vivo. Science 252, 431-434.
4. Morgan, C., Rosenkranz, H.S. & Mednis, B. (1969). Structure and development of viruses as observed in the electron microscope. X. Entry and uncoating of adenovirus. J. Virol. 4, 777-796.
5. Stewart, P.L., Burnett, R.M., Cyrklaff, M. & Fuller, S.D. (1991). Image reconstruction reveals the complex molecular organization of adenovirus. Cell 67, 145-154.
6. van Oostrum, J. & Burnett, R.M. (1985). Molecular composition of the adenovirus type 2 virion. J. Virol. 56, 439-448.
7. Chardonnet, Y. & Dales, S. (1970). Early events in the interaction of adenoviruses with HeLa cells. I. Penetration of type 5 and intracellular release of the DNA genome. Virology 40, 462-477.
8. FitzGerald, D.J.P., Padmanabhan, R., Pastan, I. & Willingham, M.C. (1978). Adenovirus-induced release of epidermal growth factor and Pseudomonas toxin into the cytosol of KB cells during receptor-mediated endocytosis. Cell 32, 400-402.
9. Seth, P., FitzGerald, D., Willingham, M.C. & Pastan, I. (1986). Pathway of adenovirus into cells. In Virus Attachment and Entry into Cells. (Crowell, R.L. & Lonberg-Holm, K., eds), pp. 191-195, American Society for Microbiology, Washington, D.C.
10. Wickham, T.J., Mathias, P., Cheresh, D.A. & Nemerow, G.R. (1993). Integrins αvβ3 and αvβ5 promote adenovirus internalization but not virus attachment. Cell 73, 309-319.
11. Green, N.M., Wrigley, N.C., Russell, W.C., Martin, S.R. & McLachlan, A.D. (1983). Evidence for a repeating cross β-sheet structure in the adenovirus fibre. EMBO J. 2, 1357-1365.
12. Henry, L., Xia, D., Wilke, M., Deisenhofer, J. & Gerard, R.D. (1994). Characterization of the knob domain of the adenovirus type 5 fiber protein expressed in Escherichia coli. J. Virol. 68, 5239-5246.
13. Louis, N., Fender, P., Barge, A., Kitts, P. & Chroboczek, J. (1994). Cell-binding domain of adenovirus serotype 2 fiber. J. Virol. 68, 4104-4106.
14. Dragulev, B.P., Sira, S., Abouhaidar, M.C. & Campbell, J.B. (1991). Sequence analysis of putative E3 and fiber genomic regions of two strains of canine adenovirus type 1. Virology 183, 298-305.
15. Kidd, A.H. & Erasmus, M.J. (1989). Sequence characterization of the adenovirus 40 fiber gene. Virology 172, 134-144.
16. Kidd, A.H., Chroboczek, J., Cusack, S. & Ruigrok, R.W.H. (1993). Adenovirus type 40 virions contain two distinct fibers. Virology 192, 73-84.
17. Kidd, A.H., Erasmus, M.J. & Tiemessen, C.T. (1990). Fiber sequence heterogeneity in subgroup F adenoviruses. Virology 179, 139-150.
18. Signas, C., Akusjarvi, G. & Pettersson, U. (1985). Adenovirus 3 fiber polypeptide gene: implications for the structure of the fiber protein. J. Virol. 53, 672-678.
19. Ruigrok, R.W.H., Barge, A., Albiges-Rizo, C. & Dayan, S. (1990). Structure of adenovirus fibre II. Morphology of single fibres. J. Mol. Biol. 215, 589-596.
20. Albiges-Rizo, C., Barge, A., Ruigrok, R.W.H., Timmins, P.A. & Chroboczek, J. (1991). Human adenovirus serotype 3 fiber protein. J. Biol. Chem. 266, 3961-3967.
21. Stouten, P.F.W., Sander, C., Ruigrok, R.W.H. & Cusack, S. (1992). New triple-helical model for the shaft of the adenovirus fiber. J. Mol. Biol. 226, 1073-1084.
22. McPherson, A. (1982). Preparation and Analysis of Protein Crystals. John Wiley and Sons Inc., New York.
23. Luzzati, V. (1952). Traitement statistique des erreurs dans la détermination des structures cristallines. Acta Crystallogr. 5, 802-810.
24. Read, R.J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr. A 42, 140-149.
25. Laskowski, R.A., MacArthur, M.W., Moss, D.S. & Thornton, J.M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26, 283-291.
26. Wilmot, C.M. & Thornton, J.M. (1990). β-turns and their distortions: a proposed new nomenclature. Protein Eng. 3, 479-493.
27. Saul, F.A., Amzel, L.M. & Poljak, R.J. (1978). Preliminary refinement and structural analysis of the Fab fragment from human immunoglobulin New at 2.0 Å resolution. J. Biol. Chem. 253, 585-597.
28. Tainer, J.A., Getzoff, E.D., Beem, K.M., Richardson, J.S. & Richardson, D.C. (1982). Determination and analysis of the 2 Å structure of Cu, Zn superoxide dismutase. J. Mol. Biol. 180, 181-217.
29. Vriend, G. & Sander, C. (1991). Detection of common three-dimensional substructures in proteins. Proteins 11, 52-58.
30. Woolfson, D.N., Evans, P.A., Hutchinson, E.G. & Thornton, J.M. (1993). Topological and stereochemical restrictions in β-sandwich protein structures. Protein Eng. 6, 461-470.
31. Orengo, C.A., Flores, T.P., Taylor, W.R. & Thornton, J.M. (1993). Identification and classification of protein fold families. Protein Eng. 6, 485-500.
32. Roberts, M.M., White, J.L., Grutter, M.G. & Burnett, R.M. (1986). Three-dimensional structure of the adenovirus major coat protein hexon. Science 232, 1148-1151.
33. Rao, S.T. & Rossmann, M.G. (1973). Comparison of supersecondary structures in proteins. J. Mol. Biol. 76, 241-256.
34. Rossmann, M.G., et al., & Vriend, G. (1985). Structure of a human common cold virus and functional relationship to other picornaviruses. Nature 317, 145-153.
35. Janin, J., Miller, S. & Chothia, C. (1988). Surface, subunit interfaces and interior of oligomeric proteins. J. Mol. Biol. 204, 155-164.
36. Novelli, A. & Boulanger, P.A. (1991). Deletion analysis of functional domains in baculovirus-expressed adenovirus type 2 fiber. Virology 185, 365-376.
37. Novelli, A. & Boulanger, P.A. (1991). Assembly of adenovirus type 2 fiber synthesized in cell-free translation system. J. Biol. Chem. 266, 9299-9303.
38. Chroboczek, J. & Jacrot, B. (1987). The sequence of adenovirus fiber: similarities and differences between serotypes 2 and 5. Virology 161, 549-554.
39. Defer, C., Belin, M., Caillet-Boudin, M. & Boulanger, P. (1990). Human adenovirus-host cell interactions: comparative study with members of subgroups B and C. J. Virol. 64, 3661-3673.
40. Olson, N.H., et al., & Rossmann, M.G. (1993). Structure of a human rhinovirus complexed with its receptor molecule. Proc. Natl. Acad. Sci. USA 90, 507-511.
41. Banner, D.W., et al., & Lesslauer, W. (1993). Crystal structure of the soluble human 55 kd TNF receptor-human TNFβ complex: implications for TNF receptor activity. Cell 73, 431-447.
42. Sprang, S.R. & Eck, M.J. (1992). The three-dimensional structure of TNF. In Tumor Necrosis Factors: The Molecules and Their Emerging Role in Medicine. (Beutler, B., ed), pp. 11-31, Raven Press, Ltd., New York.
43. Hierholzer, J.C. (1992). Adenoviruses in the immunocompromised host. Clin. Microbiol. Rev. 5, 262-274.
44. Kabsch, W. (1988). Evaluation of single-crystal X-ray diffraction data from a position-sensitive-detector. J. Appl. Crystallogr. 21, 237-247.
45. Kabsch, W. (1988). Automatic indexing of rotation diffraction pattern. J. Appl. Crystallogr. 21, 67-71.
46. Steigemann, W. (1993). PROTEIN Version 3.2, A Progress Report. American Crystallographic Association (series 2) 21, 29.
47. Matthews, B.W. (1968). Solvent content of protein crystals. J. Mol. Biol. 33, 491-494.
48. Mighell, A.D. & Jacobson, R.A. (1963). Analysis of three-dimensional Patterson maps using vector verification. Acta Crystallogr. 16, 443-445.
49. Bazan, J.F. (1990). Structural design and molecular evolution of a cytokine receptor superfamily. Proc. Natl. Acad. Sci. USA 87, 6934-6938.
50. Brünger, A.T., Kuriyan, J. & Karplus, M. (1987). Crystallographic R factor refinement by molecular dynamics. Science 235, 458-460.
51. Read, R.J. (1990). Structure-factor probabilities for related structures. Acta Crystallogr. A 46, 900-912.
52. Brünger, A.T., Krukowski, A. & Erickson, J.W. (1990). Slow-cooling protocols for crystallographic refinement by simulated annealing. Acta Crystallogr. A 46, 585-593.
53. Engh, R.A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr. A 47, 392-400.
54. Carson, M. (1987). Ribbon models of macromolecules. J. Mol. Graphics 5, 103-106.
55. Kraulis, P.J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946-950.
56. Chapman, M.S. (1993). Mapping the surface properties of macromolecules. Prot. Sci. 2, 459-469.
57. Brünger, A.T. (1992). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355, 472-475.

Received: 26 Sep 1994; revisions requested: 17 Oct 1994; revisions received: 28 Oct 1994. Accepted: 3 Nov 1994.