Nuclear Magnetic Resonance and Dynamic Characterization of the Intrinsically Disordered HIV-1 Tat Protein
BY Shaheen Shojania
A Thesis Submitted to the Faculty of Graduate Studies in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY Department of Chemistry University of Manitoba © July 30, 2007
Abstract
The HIV-1 transactivator of transcription (Tat) is a protein essential for both viral gene expression and virus replication. Tat is an RNA-binding protein that, in cooperation with host cell factors cyclin T1 and cyclin-dependent kinase 9, regulates transcription elongation. Tat also interacts with numerous other intracellular and extracellular proteins, and is implicated in a number of pathogenic processes. The Tat protein is encoded by two exons and is 101 residues in length. The first exon encodes a 72-residue molecule that activates transcription with the same proficiency as the full-length protein. The physicochemical properties of Tat make it a particularly challenging target for structural studies: Tat contains seven cysteine residues, six of which are essential for transactivation, and is highly susceptible to oxidative cross-linking and aggregation. In addition, a basic segment (residues 48-57) gives the protein a high net positive charge of +12 at pH 7, endowing it with a high affinity for anionic polymers and surfaces. In order to study the structure of Tat, both alone and in complex with partner molecules, we have developed a system for the bacterial expression and purification of polyhistidine-tagged and isotopically enriched (in 15N and 15N/13C) recombinant HIV-1 Tat1−72 (BH10 isolate) that yields large amounts of protein. These preparations have facilitated the assignment of 95% of the non-proline backbone resonances using heteronuclear 3-dimensional nuclear magnetic resonance (NMR) spectroscopy. Analysis by mass spectrometry and NMR demonstrates that the cysteine-rich Tat protein is unambiguously reduced and monomeric in aqueous solution at pH 4. NMR chemical shifts and coupling constants suggest that it exists in a disordered conformation. Line broadening and multiple peaks in the cysteine-rich and core regions suggest that transient folding occurs in two of the five sequence domains. NMR 15N-relaxation parameters were measured and analysed by spectral density and model-free approaches, both confirming the lack of structure throughout the length of the molecule. The absence of a fixed conformation and the observation of fast dynamics are consistent with the ability of the Tat protein to interact with a wide variety of proteins and nucleic acids, lending further support to the concept that Tat exists as an intrinsically disordered protein.
For Pamela. There are no words to describe my sense of gratitude and love for her friendship, love and support.
Science is a wonderful thing if one does not have to earn one’s living at it. Albert Einstein
Contents
List of Figures . . . ix
List of Tables . . . xiv
Copyrighted Material . . . xvi
Acknowledgments . . . xviii
Abbreviations . . . xxi
1 Introduction . . . 1
1.1 Intrinsically Disordered Proteins . . . 1
1.1.1 The Origin of the Structure-Function Paradigm . . . 2
1.1.2 Discrepancies in the Structure-Function Paradigm . . . 4
1.1.3 Discovery of Intrinsic Disorder . . . 5
1.2 Classifications of Disorder and the Protein Trinity . . . 6
1.3 Intrinsic Disorder and Function . . . 9
1.3.1 Protein-Chameleon . . . 16
1.4 Disorder Prediction . . . 17
1.5 Detecting and Characterizing Disorder . . . 19
1.6 The Human Immunodeficiency Virus . . . 25
1.7 The HIV-1 Trans-Activator of Transcription . . . 31
1.8 NMR Investigation of the Structure and Dynamics of Tat . . . 37
2 Spectral Densities, Relaxation and Dynamics in Nuclear Magnetic Resonance Spectroscopy . . . 39
2.1 Semi-Classical Description of Relaxation . . . 40
2.1.1 The Master Equation of Relaxation . . . 41
2.1.2 The Master Equation in Operator Form . . . 45
2.1.3 Time Evolution of a Physical Variable . . . 53
2.2 Relaxation and Dipolar Coupling . . . 56
2.2.1 Unlike Spins . . . 57
2.2.2 Longitudinal Relaxation . . . 61
2.2.3 Transverse Relaxation . . . 73
2.2.4 Orientational Spectral Densities and Spherical Harmonics . . . 80
2.3 Chemical Shift Anisotropy . . . 87
2.4 The Steady-State Heteronuclear Nuclear Overhauser Effect . . . 98
2.5 Lipari-Szabo Model-Free Formalism . . . 103
2.6 Relaxation in the Rotating Frame . . . 107
3 Materials and Methods . . . 119
3.1 Plasmid construction . . . 119
3.2 Expression of unlabelled His-tagged Tat1−72 . . . 120
3.3 Expression of 13C/15N-His-tagged Tat1−72 . . . 121
3.4 Purification of His-tagged Tat1−72 . . . 123
3.5 MALDI-TOF-MS . . . 124
3.6 NMR Sample Preparation . . . 125
3.7 NMR HSQC Acquisition . . . 125
3.8 NMR Backbone Assignments . . . 126
3.9 NMR Relaxation Measurements . . . 129
3.10 Relaxation Data Analysis . . . 130
3.11 pH and Hydrogen Exchange . . . 141
4 Results . . . 143
4.1 Protein Expression and Purification . . . 143
4.2 Monomer Identification: MALDI-TOF-MS . . . 144
4.3 NMR Spectroscopy and Resonance Assignments . . . 145
4.4 Chemical Shifts and 3JHNHα Coupling Constants . . . 156
4.5 NMR Relaxation . . . 162
4.6 Spectral Density Mapping . . . 178
4.7 Model-Free Analysis . . . 187
4.8 pH Effects . . . 197
4.9 Disorder Predictions . . . 208
5 Discussion . . . 213
5.1 Protein Expression and Purification . . . 213
5.2 NMR Spectroscopy and Backbone Assignment . . . 216
5.3 Chemical Shifts and Coupling Constants . . . 217
5.4 NMR Relaxation . . . 219
5.5 Reduced Spectral Density Mapping . . . 222
5.5.1 J(0.87ωH) . . . 225
5.5.2 J(ωN) . . . 226
5.5.3 Jeff(0) . . . 228
5.6 Model-Free Analysis . . . 229
5.7 pH Effects . . . 234
5.8 Disorder Predictions . . . 237
6 Conclusions . . . 239
Bibliography . . . 242
Appendices . . . 264
A Resonance Assignments for His-tagged Tat1−72 . . . 264
B Model-Free Parameter Estimates for His-tagged Tat1−72 . . . 273
List of Figures
1.1 Variation and classification of levels of protein disorder . . . 7
1.1 continued . . . 8
1.2 The Protein Trinity . . . 9
1.3 Classification scheme for IDP Function . . . 11
1.4 A Protein-Chameleon . . . 17
1.5 NMR of the Folded and Unfolded state of drkN SH3 . . . 24
1.6 General features of the HIV-1 virion . . . 27
1.7 Open reading frames of the HIV genome . . . 28
1.8 General features of the HIV life-cycle . . . 30
1.9 The HIV-1 Tat sequence encoded by exon 1 . . . 33
1.10 The Tat-TAK-TAR association . . . 35
2.1 Energy level diagram showing transition frequencies for the two spin-1/2 system . . . 59
2.2 Energy level diagram showing transition probabilities for the two spin-1/2 system . . . 99
2.3 Magnetic field vectors in the rotating frame resulting from a selective spin-lock for a nucleus with Larmor frequency ω0 . . . 110
4.1 Amino acid sequence of His-tagged Tat1−72 . . . 144
4.2 MALDI-TOF-MS identification of monomeric unlabelled His-tagged Tat1−72 . . . 145
4.3 Amide backbone regions of 1H/15N-HSQC spectrum for naturally abundant 15N in unlabelled His-tagged Tat1−72 . . . 147
4.4 1H/15N-HSQC resonance assignments of His-tagged Tat1−72 . . . 148
4.4 continued . . . 149
4.5 Intensity profile of 1H/15N-HSQC backbone resonances for His-tagged Tat1−72 . . . 151
4.6 Strip plots from HN(CA)CO and HNCACB spectra of His-tagged Tat1−72 . . . 153
4.7 1H/15N-HSQC resonance assignments of His-tagged Tat1−72 . . . 155
4.8 Difference plots for His-tagged Tat1−72 chemical shifts and 3JHNHα coupling constants from the random coil . . . 157
4.8 continued . . . 158
4.8 continued . . . 159
4.8 continued . . . 160
4.9 THRIFTY estimation of the extended disordered state of His-tagged Tat1−72 . . . 161
4.10 Sample spectra for the steady state heteronuclear 1H-15N NOE of His-tagged Tat1−72 . . . 163
4.10 continued . . . 164
4.11 Relaxation measurements of the His-tagged Tat1−72 protein at pH 4.1 and 293 K, determined at 14.1 T and 18.8 T field strengths . . . 165
4.11 continued . . . 166
4.12 Sample spectra for T1 and T1ρ relaxation series for His-tagged Tat1−72 . . . 168
4.12 continued . . . 169
4.13 Sample fits for T1 of Gly-68 measured at 14.1 T and 18.8 T field strengths . . . 170
4.13 continued . . . 171
4.14 Sample fits for T1ρ of Gly-68 measured at 14.1 T and 18.8 T field strengths . . . 172
4.14 continued . . . 173
4.15 Transverse relaxation rates (R2) for His-tagged Tat1−72 determined at 14.1 T field strength . . . 174
4.16 Sample spectra for T2 relaxation series for His-tagged Tat1−72 at 14.1 T field strength . . . 175
4.17 Sample fit for T2 of Gly-68 measured at 14.1 T field strength . . . 176
4.18 Reduced spectral density mapping of motions for His-tagged Tat1−72 at pH 4.1 and 293 K . . . 179
4.18 continued . . . 180
4.19 Field dependent conformational exchange rates for His-tagged Tat1−72 at pH 4.1 and 293 K . . . 183
4.20 Jeff(0) spectral density maps determined for His-tagged Tat1−72 at 14.1 T and 18.8 T field strengths separately and combined . . . 185
4.20 continued . . . 186
4.21 Model-free parameter estimates using Model 2 (Rf = 0.227) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields . . . 190
4.21 continued . . . 191
4.22 Model-free parameter estimates using Model 3 (Rf = 0.136) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields . . . 192
4.22 continued . . . 193
4.23 Model-free parameter estimates using Model 7 (Rf = 0.098) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields . . . 195
4.23 continued . . . 196
4.24 Variation in chemical shift and intensity of 1H/15N-HSQC with increasing pH . . . 198
4.25 Predicted amide hydrogen exchange rates for His-tagged Tat1−72 . . . 199
4.25 continued . . . 200
4.25 continued . . . 201
4.26 Variation in absolute peak heights with increasing pH for observed glycine residues . . . 202
4.26 continued . . . 203
4.26 continued . . . 204
4.27 Variation in absolute peak heights with increasing pH for selected serine and threonine residues . . . 205
4.27 continued . . . 206
4.27 continued . . . 207
4.28 Decrease in calculated net charge with increasing pH for His-tagged Tat1−72 . . . 208
4.29 DisProt disorder predictions of amino acid sequence for His-tagged Tat1−72 . . . 210
4.29 continued . . . 211
4.30 PONDR disorder predictions for the His-tagged Tat1−72 amino acid sequence . . . 211
4.31 RONN disorder predictions for the His-tagged Tat1−72 amino acid sequence . . . 212
4.32 IUPred disorder predictions for the His-tagged Tat1−72 amino acid sequence . . . 212
5.1 Variation in the theoretical relaxation rates and steady-state heteronuclear NOE with overall rotational correlation time . . . 224
B.1 Single Field model-free parameter estimates and Monte Carlo error estimates using Model 2 with relaxation data collected at 600 MHz field strength . . . 274
B.1 continued . . . 275
B.2 Single Field model-free parameter estimates and Monte Carlo error estimates using Model 2 with relaxation data collected at 18.8 T field strength . . . 276
B.2 continued . . . 277
List of Tables
1.1 Intrinsically disordered proteins (IDPs) and their functions . . . 12
1.2 Moonlighting Proteins . . . 16
1.3 Predictors of protein disorder . . . 19
2.1 Tensor Operators for the Dipolar Interaction . . . 59
2.2 Commutation Relations . . . 60
2.3 Tensor Operators for the CSA Interaction . . . 90
3.1 M9 Minimal Medium ingredients . . . 122
3.2 Protein purification buffers . . . 124
3.3 Acquisition parameters for the NMR experiments . . . 127
3.4 Models tested using Lipari-Szabo and Cole-Cole model-free methods . . . 140
4.1 Range and average values for the reduced spectral density mapping of His-tagged Tat1−72 at pH 4.1 and 293 K . . . 181
4.2 R-factors and mean Akaike Information Criterion values for model-free estimates of dynamics parameters . . . 188
A.1 Resonance assignments of Histidine-tagged Tat1−72 . . . 264
A.2 Additional assignments of resonances from 1H/15N-HSQC of 13C/15N labelled Histidine-tagged Tat1−72 . . . 270
List of Copyrighted Material
The following material has been reproduced or adapted with permission of the copyright holder or author:
• Figure 1.1 on page 7: adapted from Journal of Molecular Recognition, 18(5):343–384, V. N. Uversky, C. J. Oldfield, and A. K. Dunker, “Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling”, Figure 1. Different levels of order and disorder, Copyright (2005) with permission of V. N. Uversky.
• Figure 1.3 on page 11: adapted from FEBS Letters, 579(15):3346–3354, P. Tompa, “The interplay between structure and function in intrinsically unstructured proteins”, Figure 1. Functional classification scheme of IUPs, Copyright (2005) with permission of P. Tompa.
• Table 1.1 on page 12: adapted from Trends in Biochemical Sciences, 27(10):527–533, P. Tompa, “Intrinsically unstructured proteins”, Table 1. Intrinsically unstructured proteins (IUPs) and domains, Copyright (2002); FEBS Letters, 579(15):3346–3354, P. Tompa, “The interplay between structure and function in intrinsically unstructured proteins”, Table 1. Functional classification of IUPs, Copyright (2005) with permission of P. Tompa.
• Table 1.2 on page 16: adapted from Trends in Biochemical Sciences, 30(9):484–489, P. Tompa, “Structural disorder throws new light on moonlighting”, Table 1. Examples of disordered moonlighting proteins, Copyright (2005) with permission of P. Tompa.
• Figure 1.4 on page 17: reproduced from Journal of Molecular Recognition, 18(5):343–384, V. N. Uversky, C. J. Oldfield, and A. K. Dunker, “Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling”, Figure 13. Being a protein-chameleon, Copyright (2005) with permission of V. N. Uversky.
• Table 1.3 on page 19: reproduced from Journal of Molecular Recognition, 18(5):343–384, V. N. Uversky, C. J. Oldfield, and A. K. Dunker, “Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling”, Table 1. Protein disorder predictors, Copyright (2005) with permission of V. N. Uversky.
• Figure 1.5 on page 24: reproduced from Biochemistry, 36(9):2390–2402, N. A. Farrow, O. Zhang, J. D. Forman-Kay, and L. E. Kay, “Characterization of the backbone dynamics of folded and denatured states of an SH3 domain”, Supplementary Figure S1. Copyright (1997) with permission of L. E. Kay.
• Figure 1.10 on page 35: reproduced from J. Mol. Biol., 293(2):235–254, J. Karn, “Tackling Tat”, Figure 4. Recognition of Tar RNA by Tat and TAK, Copyright (1999), with permission from Elsevier and J. Karn.
Acknowledgments
This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the University of Manitoba; it was initiated with funding from the Medical Research Council of Canada and the Manitoba Health Research Council. Funding for the 600 MHz spectrometer at the University of Manitoba was made possible by the Canada Foundation for Innovation.
I would like to thank the following people for making this research and thesis possible:
• Joe O’Neil for giving me the opportunity to work on this project, advice, guidance, patience, support, humour, and kindness;
• James Peeling, Scott Kroeker, and Frank Hruska for patience as my committee throughout this long process;
• Anthony S. Secco, Hyman D. Gesser, and Arthur Chow for giving me my first experiences of research at the University of Manitoba in my undergraduate and graduate career;
• Leo Spyracopoulos (University of Alberta) for many helpful discussions, introducing me to NMR and structural biology, giving me the direction I needed, and for providing the Mathematica notebooks for the single field spectral density and Lipari-Szabo calculations on which all of my subsequent notebooks were based;
• Kaveh Shojania for a lifetime of support, encouragement and advice (some of which I followed);
• Ted Schaefer for giving me an idea of what it was all about;
• Vincent C. Chen and Hélène Perreault (University of Manitoba) for mass spectral data collection and analysis;
• Gillian D. Henry (Tufts University) for all of her work in constructing the Tat expression plasmid;
• Kirk Marat for assistance and training at the NMR facility at the University of Manitoba;
• Ryan McKay for the acquisition of NMR data on the 800 MHz spectrometer at NANUC (University of Alberta) and for helpful advice on data acquisition;
• Frank Delaglio (NIH) for use of some of his unreleased NMRPrime scripts to help in the assignment of the Tat protein;
• Lucio Frydman (Weizmann Institute of Science), Pei Zhou and Brian Coggins (Duke University), Thomas Szyperski (SUNY, Buffalo), Ray Freeman (Jesus College, Cambridge) and Eriks Kupče (Varian Inc.) for providing many pulse sequences that I tried in attempting to overcome acquisition problems;
• Markus Heller (University of British Columbia) for many helpful discussions on NMR and his assistance and advice in preparing this thesis;
• the LaTeX community;
• Lawrence MacIntosh of the Biochemistry Department at the University of British Columbia for providing me with an office and the support of his research group during the writing of this thesis;
• Walter Englander (University of Pennsylvania) for providing the Excel files for calculating hydrogen exchange rates;
• Richard Sparling of the Department of Microbiology, University of Manitoba and the members of his lab for the use of their glove bag and degassing equipment;
• Mark Berjanski (University of Alberta) for verifying some of my conclusions;
• Thach N. Vo for finally running the gels that I would never do;
• Julian Saba (University of Montreal) for providing me with the motivational words of wisdom that got me through the last few years;
• Jamie Galka for allowing me someone to vent with and for always providing me with a good laugh;
• the students, faculty and staff of the Department of Chemistry of the University of Manitoba that I have had a chance to know;
• The University of Manitoba and the Faculty of Graduate Studies for funding;
• my parents for love and support all of my life;
• Pamela, Evan and Elliot for their patience, love and support.
Abbreviations
4EPB
eukaryotic translation initiation factor 4E binding protein
6×His
hexahistidine
AIC
Akaike information criterion
AIDS
acquired immunodeficiency syndrome
BBB
blood-brain barrier
βME
β-mercaptoethanol
BSA
bovine serum albumin
CA
capsid protein
CC
Cole-Cole
CD
circular dichroism
CDK9
cyclin-dependent kinase 9
CFTR
cystic fibrosis transmembrane conductance regulator
cm
centimetre
CNS
central nervous system
CPMG
Carr-Purcell-Meiboom-Gill
CREB
cAMP response element binding protein
CSA
chemical shift anisotropy
CSI
chemical shift index
CTD
carboxy terminal domain
CaMKIV
Ca2+ /calmodulin-dependent protein kinase IV
Cdk
cyclin-dependent kinase
Da
dalton
DD
dipole-dipole
deg
degrees
DHPR
dihydro-pyridine receptor
dmol
decimole
DNA
deoxyribonucleic acid
DNase I
deoxyribonuclease I
drkN SH3
N-terminal SH3 domain from the adapter protein drk
Dsp
desiccation stress protein
DSS
2,2-dimethyl-2-silapentane-5-sulfonate
DTT
dithiothreitol
E. coli
Escherichia coli
EBD
entropic bristle domain
EBV-SM
Epstein-Barr Virus nuclear protein BS-MLF1
EMBL
European Molecular Biology Laboratory
FG
Phenylalanine-Glycine
FlgM
flagellar anti-σ factor
g
gram
g
acceleration due to gravity
Gag
group-specific antigen
Gdn-HCl
guanidine hydrochloride
gp120
glycoprotein 120
gp41
glycoprotein 41
HAD
HIV-associated dementia
HAT
histone acetyl transferase
His-tag
hexahistidine affinity tag
HIV
human immunodeficiency virus
HIVE
HIV-associated encephalitis
HSQC
heteronuclear single quantum coherence
Hexim1
hexamethylene bisacetamide-inducible protein 1
HX
hydrogen exchange
Hz
hertz
IDP
intrinsically disordered protein
ILK
integrin-linked kinase
IN
integrase
INEPT
insensitive nuclei enhanced by polarization transfer
IPTG
isopropyl-β-D-thiogalactopyranoside
J
joule
K
kelvin
kDa
kilodalton
kHz
kilohertz
kV
kilovolt
L
litre
LS
Lipari-Szabo
LS(ext)
Lipari-Szabo extended
LTR
long terminal repeat
M
molar
MA
matrix complex
MALDI
matrix-assisted laser desorption-ionization
MAP2
microtubule-associated protein 2
MARK
microtubule-affinity regulating kinase
MDM2
mouse double minute 2
MDa
megadalton
MES
2-(N-morpholino)ethanesulfonic acid
MHC
major histocompatibility complex
MHz
megahertz
mL
millilitre
mM
millimolar
mRNA
messenger RNA
mg
milligram
ms
millisecond
MS
mass-spectrometry
µg
microgram
µL
microlitre
µM
micromolar
µs
microsecond
MW
molecular weight
m/z
mass-to-charge ratio
N-TEF
negative transcription elongation factor
NACP
non-A beta component of Alzheimer’s disease amyloid plaque
NC
nucleocapsid protein
NCBI
National Center for Biotechnology Information
Nef
negative factor
NIAID
National Institute of Allergy and Infectious Diseases
NIH
National Institutes of Health
nM
nanomolar
NMR
nuclear magnetic resonance
NOE
nuclear Overhauser effect
NPCs
nuclear pore complexes
ns
nanosecond
Nup
nuclear porin
P-TEFb
positive transcription elongation factor b
PCAF
p300/CBP-associated factor
PCR
polymerase chain reaction
PDB
Protein Data Bank
PEVK
Proline, Glutamate, Valine and Lysine rich region
pI
isoelectric point
PIAS1
protein inhibitor of activated STAT1
PKA
cAMP-dependent protein kinase
PP1
protein phosphatase 1
ppm
parts-per-million
PR
protease
PRP
proline-rich protein
ps
picosecond
RGD
Arginine-Glycine-Aspartate
RNA
ribonucleic acid
RNAPII
RNA Polymerase II
RNase I
ribonuclease I
RPA
replication protein A
RT
reverse transcriptase
Rev
regulator of expression of virion proteins
s
second
SAXS
small angle X-ray scattering
SD
standard deviation
SDS-PAGE
sodium dodecyl sulfate polyacrylamide gel electrophoresis
SNAP-25
syntaxin and synaptosomal-associated protein of 25 kDa
SPE
solid phase extraction
STAT
signal transducer and activator of transcription
SU
surface unit complex
SW
sweep width
T
temperature
T
tesla
TAD
transactivator domain
TAF
TATA-box associated factors
TAK
Tat-associated kinases
TAR
trans-activation response
TB
Terrific Broth
TCEP
tris(2-carboxyethyl) phosphine
TCP
tris(2-cyanoethyl)phosphine
TFA
trifluoro-acetic acid
THP
tris(hydroxypropyl)phosphine
TM
transmembrane complex
TOF
time-of-flight
Tat
trans-activator of transcription
Tris-HCl
tris(hydroxymethyl) aminomethane hydrochloride
UCU
uridine-cytidine-uridine
UV
ultraviolet
Vif
viral infectivity factor
Vpr
viral protein R
Vpu
viral protein U
WH2
Wiskott–Aldrich syndrome protein homology domain 2
Chapter 1
Introduction
1.1 Intrinsically Disordered Proteins
For most of the last century, it was generally accepted that a protein must adopt a well defined tertiary structure to achieve its functional native state, and that proteins and protein domains that lacked secondary structural motifs were without function. The idea that a well defined three-dimensional structure was a prerequisite for protein function came to be referred to as the structure-function paradigm. Over the last 20 years, there has been increasing evidence of proteins that exist in partially folded, unfolded and molten globule states and that have functional importance. These observations, along with the increasing numbers of proteins that are being discovered to be intrinsically disordered or partially folded in proteomics and bioinformatics research, have led to a re-assessment of the structure-function paradigm [1].
In this section a brief introduction to the origin of the structure-function paradigm, along with some of the arguments for its re-assessment, will be presented. Examples of key cellular processes illustrating the functional importance of the disordered state will be described along with the functional classification of disordered proteins.
1.1.1 The Origin of the Structure-Function Paradigm
The origin of the structure-function paradigm is not clearly understood. However, a detailed review of the literature by Dunker et al. [2] outlines some of the pivotal work that led to the development of the structure-function idea.
• Schloss und Schlüssel (lock and key): Emil Fischer (1894) observed that extracellular extracts of beer yeast, containing invertase, hydrolyzed α-glucosides but not β-glucosides, while emulsin hydrolyzed the β’s but not the α’s [3]. The translation (by Lemieux and Spohr) [4] of the conclusion to these observations stated: “To use an image, I would say that the enzyme and glucoside have to fit each other like a lock and key in order to exert a chemical effect on each other”.
• Hsien Wu (1931) hypothesized that denaturation corresponded to protein unfolding rather than to chemical alteration of the protein. Wu proposed that denaturation involved a transition from a compact ordered structure to a more flexible disordered structure and resulted in exposure of the amino acid side-chains to the solvent [2, 5]. Wu’s work was not well known at the time but its importance was realized later following the independent work of Mirsky and Pauling [6].
• Mirsky and Pauling (1936) published a survey on the structure of the native, denatured and coagulated states of proteins. Their review compiled the following observations [6]: loss of pepsin activity correlated with the amount of protein denatured; acid, alkali and urea all increased the viscosity of protein solutions and denatured the proteins without aggregation; many native proteins form crystals while denatured proteins do not crystallize; exposure of sulfhydryls and other side-chain groups is typically accompanied by denaturation. “The characteristic specific properties of native proteins we attribute to their uniquely defined configurations. The denatured protein molecule we consider to be characterized by the absence of a uniquely defined configuration.” “It is evident that with loss of the uniquely defined configuration there would be loss of the specific properties of the native protein”. [6]
The following decades provided numerous studies that identified loss of function upon denaturation of proteins and formed the foundation for the view that ordered structure is a necessary condition for protein function. The identification of the α-helix and β-strand in 1951 by Pauling et al. [7–9] provided the structural units that were attributed to biological activity. In 1959, Kauzmann published an extensive review on protein denaturation [10, 11] in which the idea of the hydrophobic effect as the governing force in protein folding was outlined. This idea of hydrophobicity took time to take root, but inevitably became widely used as the explanation for structural containment and biological function [11]. By the 1960s, when the atomic resolution structures of myoglobin [12] and lysozyme [13] had been determined, it was already generally accepted that a necessary condition for protein function was a specific folded 3D structure [2]. The disruption of hydrogen bonds in denatured proteins would result in a loss of function. However, studies such as these say nothing about proteins that lack well folded structure in the absence of denaturant.
The above studies of protein structure, denaturation and relationships to function may have played a key role in the development of the structure-function paradigm, but perhaps also influential was the discovery of the double-helix as the structure of DNA [14, 15].
Watson and Crick’s 1953 papers proposing a structural model for DNA established the basis for the transfer of genetic information which later became the central dogma of molecular biology. To have such a complex set of biological phenomena explained by a structural model must have had great influence on the biological community. The more than 42,000 protein structures now available have obscured alternatives to the structure-function paradigm. However, most of these structures are similar to each other (only 1028 unique folds) and have recognisable sequence similarity to only a small fraction of the proteins in nature [2].
1.1.2 Discrepancies in the Structure-Function Paradigm
Karush reported in 1950 [2, 16] that, contrary to the behaviour of every other native protein known at the time, serum albumin demonstrated an outstanding capacity for the formation of reversible, high-affinity complexes with a variety of ions and molecules of diverse configurations. With arguments similar to Fischer’s [2, 3] for the lock-and-key, Karush inferred that the binding sites of albumin assume a large number of configurations in equilibrium with each other and of similar energy. In the presence of an anion, the configuration adopted is the one that is stabilized through specific interactions with that anion (allowing the anion to interact with appropriate residues of the polypeptide). In other words, upon interaction with the anion, the best configuration is adopted from albumin’s structural ensemble. Karush referred to this phenomenon as ‘configurational adaptability’.

In 1958, Koshland independently proposed a concept very similar to configurational adaptability, which was later called the ‘induced-fit’ theory [2, 17]. In his examination of enzyme reactivity and specificity, Koshland [17] postulated that: “(a) a precise orientation of catalytic groups is required for enzyme action; (b) the substrate may cause an appreciable change in the three-dimensional relationship of the amino acids at the active site; and (c) the changes in protein structure caused by a substrate will bring the catalytic groups into the proper orientation for reaction, whereas a non-substrate will not”. However, Koshland did not propose a mechanism for ‘induced-fit’ and left open the question of whether binding induces the adoption of a conformation or whether the conformation is selected as the best fit from an ensemble of structures in equilibrium.

In 1978, Bennett and Steitz reported evidence of significant domain movement with glucose-induced conformational changes in yeast hexokinase [2, 18]. In this study, the authors proposed two possible functional roles for the flexibility and conformational change: as an “embracing” mechanism to surround the substrate, or as a discriminating mechanism against water as a substrate [18].
1.1.3 Discovery of Intrinsic Disorder
According to the Protein Data Bank (PDB) [19] there were, by 1978, only 41 protein structures known. At least two of these structures showed that certain segments of a protein, known to be essential for the function, yielded no apparent electron density [2, 20, 21]. The absence of electron density in protein structures can be the result of failure to solve the phase problem, crystal defects, or proteolytic degradation during purification. However, the most common reason for missing electron density is that the unobserved atom, side-chain, residue or region fails to scatter X-rays coherently due to differences in atomic position (disorder) from one protein to the next in the crystal [2]. Also in 1978, Aviles et al., using nuclear magnetic resonance (NMR), noted disorder in the highly charged functional tail of histone H5 [22]. By the 1990s, NMR had revealed functional proteins that lacked any identifiable structure (disordered from end to end) [23–25]. Unlike X-ray diffraction, where the absence of electron density may indicate disorder, NMR evidence for disorder is observable through chemical shift dispersion, peak widths, relaxation times and heteronuclear nuclear Overhauser effects (NOEs) [26, 27].
1.2 Classifications of Disorder and the Protein Trinity
Before introducing the classification scheme for protein disorder, it is prudent to introduce and clarify the terminology. Historically, the term ‘natively denatured’ [28] was introduced to distinguish extremely flexible proteins from normal globular proteins. Slightly earlier, the term ‘rheomorphic’ was used as an alternative to the term random coil [29, 30], which originated from the study of polymers. The term ‘natively unfolded’ [31] was introduced to describe a non-compact protein state that lacked secondary structure under physiological conditions. More recently, the term ‘intrinsically unstructured’ [1] has been used frequently to describe proteins and protein domains that lack secondary structure. Uversky et al. [29] outline the various combinations of these terms used to describe proteins that do not possess a rigid 3D structure. The authors proposed the term ‘intrinsically disordered’ as a means of acknowledging that a disordered protein is not without structure, but exists as an ensemble of interconverting structures [29]; neither is it ‘unfolded’, since this term implies that under some set of physiological conditions or circumstances the protein would fold, which is not necessarily true. In addition to proposing the appropriate language for describing disordered proteins, Uversky et al. [29] propose a classification scheme for the various levels of disorder found in proteins, outlined in Figure 1.1. This classification scheme also accounts for the ‘molten globule’ state of proteins (partially disordered but still collapsed) proposed by Ohgushi and Wada [32].
Figure 1.1: Variation and classification of levels of protein disorder: (a) no disorder; (b) disordered termini; (c) disordered linker; (d) disordered loop; (e) disordered domain; (f) disordered protein with some residual structure; (g) wholly disordered, mostly collapsed protein; and (h) wholly disordered, extended protein. Adapted from [29] with permission of V. N. Uversky.

The identification of the molten globule state of proteins [32] resulted in the reworking of the two-state model of protein folding [33–35]. Although it took some time to be generally accepted, it has come to be understood that not all globular proteins undergo a cooperative transition from the unfolded state (U) to the folded or native state (N) without any stable intermediates. Ptitsyn and Uversky [36] proposed that proteins along the protein folding landscape may exist in stable molten globule states. In a subsequent study of β-lactamase, Ptitsyn and Uversky [37] identified a fourth state which they termed ‘partially folded’, although it has since become known as the pre-molten globule state [38] because it is less compact than the molten globule state, but more compact than the completely unfolded state.

The identification of the molten globule as a thermodynamically stable state [36] led to a re-evaluation of the structure-function paradigm by Dunker and Obradovic, who proposed the protein trinity hypothesis (Figure 1.2) [39]. According to Dunker and Obradovic, native proteins can be in one of three states: the ordered (folded) state, the liquid-like collapsed-disordered state (molten globule), and the extended-disordered state (random coil). Function may arise in any one of these three states or from transitions between the states [2, 39]. Uversky later extended this hypothesis to include the so-called pre-molten globule state, and referred to the extended scheme as the protein quartet [40].
[Figure 1.2 diagram; state labels: Ordered (folded), Collapsed (molten globule), Extended (random coil)]
Figure 1.2: The Protein Trinity of native and functional states of proteins [2, 39]. Proteins may exist and function in the ordered, collapsed-disordered, or extended-disordered states as well as in transitions between these states.
1.3 Intrinsic Disorder and Function
A literature review of intrinsically disordered proteins (IDPs) and of disordered protein domains with identified function listed 90 proteins involved in 28 distinct functions [41]. It has been suggested that these functional roles can be divided into six broad categories: entropic chains, effectors, scavengers, assemblers, chaperones and display sites [42–44]. In all of the categories except the entropic chains, interactions between the disordered segment and its target typically result in some degree of disorder-to-order transition [42].

Entropic chains are a unique category of IDPs that directly contradicts the old structure-function paradigm, because their function requires disorder. Entropic chains can be categorized as linkers/spacers, bristles, and brushes. In all cases, the disordered segment remains disordered as it functions [2, 41–46]. The entropic chains can serve multiple purposes. In some cases, the disordered protein segment serves as a linker or spacer between ordered domains in a multi-domain protein. The linker/spacer then serves to regulate the distance between the adjacent domains and enables conformational freedom in orientational searches [42, 43]. An example of this type of entropic chain is found in replication protein A (RPA), where the N-terminal 108 residues form a five-stranded β-barrel capped by two small helices, followed by a 60-residue flexible linker to its DNA-binding domain [47].

The entropic bristles, or brushes, operate via steric repulsion or excluded-volume effects. In this manner, entropic bristle domains (EBDs) can regulate pores, channels, or active sites by rapidly adopting many conformations, and can restrict the entrance until the EBD has been modified (e.g., by phosphorylation) [45]. An example of this sort of gating is found in nuclear pore complexes (NPCs), where it is performed by nucleoporins (Nups) with large phenylalanine-glycine (FG) repeats [48]. In the case of the entropic brush, the excluded-volume principle operates in a similar manner to control the spacing of larger proteins or complexes. For example, neurofilament separation is regulated by EBD sidearms along the core filament. The thermally driven motions of the EBDs give each filament a much larger effective volume [49]. The spacing of the filaments in bundling is therefore regulated by these thermally driven motions, which can in turn be regulated by the amount of phosphorylation of the EBDs [50].

The effector, scavenger, assembler, chaperone and display-site classes of protein are classified according to their degree of disorder-to-order transition and the nature of their binding interaction. Figure 1.3 shows how the intrinsically disordered proteins are separated into the appropriate functional class according to whether the protein continues to move freely through its conformational space (no disorder-to-order transition) or whether it undergoes some disorder-to-order transition upon binding [43]. Those proteins involved in target binding are further separated depending on whether the interaction is permanent or transient [43]. 
Table 1.1, adapted from [42] and [43], contains a limited set of examples of each class of IDP.
[Figure 1.3 diagram, shown here as an outline]
IDP
  entropic chain: functions directly through disorder, as a spring, bristle, or linker
  molecular recognition
    transient binding
      display sites: sites of posttranslational modification
      chaperones: assist the folding of RNA or protein
    permanent binding
      effectors: modulate the activity of a partner molecule
      assemblers: assemble complexes or target activity
      scavengers: store and/or neutralize small ligands
Figure 1.3: IDP functional classification relates directly to their ability to move freely through a large conformational space (entropic chains), or to the lifetime of binding to their target. Adapted from [43] with permission of P. Tompa.
Table 1.1: Examples of intrinsically disordered proteins (IDPs) and domains with target/partner (if applicable) and function.³ Adapted from [42] and [43] with permission of P. Tompa.

IDP (protein/domain) | Target/partner | Function/action

Entropic chains
Microtubule-associated protein 2 (MAP2) projection domain | Not applicable | Entropic bristle (spacing in microtubule architecture)
Titin PEVK domain | Not applicable | Entropic spring (passive contractile force in muscle)
SNAP-25 linker region | Not applicable | Flexible spacer/linker of binding domains

Effectors
Calpastatin | Ca2+-activated protease (calpain) | Inhibitor of calpain in Ca2+ signalling
p21/27 | Cyclin-dependent kinases | Kip/Cip class inhibitors in cell cycle regulation
4EBP1, 2, 3 | Eukaryotic translation initiation factor (eIF4E) | Inhibitor of translation initiation
Securin | Separase | Inhibitor of chromosome separation before anaphase in mitosis
FlgM | Sigma 28 transcription factor | Inhibitor of flagellin-specific gene expression in bacteria
Stathmin | Tubulin dimers | Microtubule disassembly, catastrophe

Scavengers
Thymosins (proTα) | Zn2+, histone | Not reported
Caseins | Calcium phosphate | Nanocluster formation, inhibition of precipitation in milk
Salivary proline-rich protein (PRP) | Tannin | Binding/neutralization of polyphenolic plant compounds
Desiccation stress protein (Dsp) 16 | Water | Retention of water to prevent desiccation of plants

Assemblers
MAP2 microtubule-binding domain | Tubulin dimers | Microtubule polymerization, bundling
Caldesmon | Ca2+ calmodulin, F-actin, myosin, tropomyosin | Actin polymerization, bundling
Bob1 | Oct1 transcription factor, Igκ promoter, TAFII105 | B-cell-specific expression of immunoglobulin genes
λ phage N protein | mRNA, NusA, RNA Pol II | Transcription anti-termination
SIBLING proteins | Integrin, complement factor H, CD44, fibronectin | Assembly of bone extracellular matrix
Fibronectin receptor (MSCRAMM) D1-D4 | Fibronectin | Adherence to extracellular matrix of host in bacterial invasion
CREB transactivator domain (TAD) | TATA-box-associated factors (TAFs), CREB-binding protein | Assembly of transcription preinitiation complex

Display sites
CREB TAD | Protein kinases (e.g. PKA, CaMKIV) | Regulation by phosphorylation
MAP2 microtubule-binding domain | Protein kinases (e.g. PKA, MARK) | Regulation by phosphorylation
Bcl-2 antiapoptotic protein (24–93) | Proteases (e.g. caspase) | In vivo proteolysis site

Chaperones
α-Synuclein (NACP) | | Protein chaperone
Casein | | Protein chaperone
Nucleocapsid protein 7/9 | | RNA chaperone
Ribosomal S12 | | RNA chaperone
Prion protein N-terminal domain | | RNA chaperone

³ Abbreviations: PEVK, Pro, Glu, Val and Lys rich region; SNAP-25, syntaxin and synaptosomal-associated protein of 25 kDa; 4EBP, eukaryotic translation initiation factor 4E binding protein; CREB, cAMP response element binding protein; PKA, cAMP-dependent protein kinase; CaMKIV, Ca2+/calmodulin-dependent protein kinase IV; MARK, microtubule-affinity regulating kinase; NACP, non-A beta component of Alzheimer’s disease amyloid plaque.
One advantage of the non-entropic chain IDPs is their ability to bind multiple partners and to have multiple functions. This binding ‘promiscuity’, which can modulate the activity of different targets, has been observed for several IDPs [2,51,52]. These proteins, commonly referred to as moonlighting proteins, have also been found to have opposing effects on the same target [51]. Some examples of moonlighting proteins are listed in Table 1.2 from [51].
Table 1.2: A selection of disordered moonlighting proteins with known opposing function. Adapted from [51] with permission of P. Tompa.

Proteinᵃ | One (inhibiting) function | Another (activating) function
Calpastatin | Inhibition of calpain | Activation of calpain
CFTR (R domain) | Inhibition of CFTR | Activation of CFTR
DHPR (peptide C) | Inhibition of RyR | Activation of RyR
EBV-SM | Down-regulation of intron-containing mRNA | Up-regulation of intron-less mRNA
MDM2 (180–298) | Down-regulation of p21Cip1 | Activation of estrogen receptor α
p21Cip1 and p27Kip1 | Inhibition of Cdk | Activation of Cdk
PIAS1 (392–541) | Inhibition of activated STAT | Activation of p53
I-2 | Inhibition of PP1 | Activation of PP1
Ribosomal L5 | Inhibition of MDM2 ubiquitin ligase | Activation (chaperoning) of ribosome
Securin | Inhibition of separase | Activation (chaperoning) of separase
Thymosin-β4 (WH2 domain) | Sequestration of G-actin | Activation of actin polymerization, ILK kinase

ᵃ Abbreviations: CFTR, cystic fibrosis transmembrane conductance regulator; DHPR, dihydropyridine receptor; EBV-SM, Epstein-Barr Virus nuclear protein BS-MLF1; MDM2, mouse double minute 2; Cdk, cyclin-dependent kinase; STAT, signal transducer and activator of transcription; PIAS1, protein inhibitor of activated STAT1; PP1, protein phosphatase 1; WH2, Wiskott–Aldrich syndrome protein homology domain 2; ILK, integrin-linked kinase.
1.3.1 Protein-Chameleon
An interesting example of the multiple functional roles of disordered proteins is the presynaptic protein α-synuclein, whose aggregation and fibrillation are implicated in the development of Parkinson’s disease [29, 53]. α-Synuclein can adopt several completely different structures depending on its environment. Its conformational plasticity allows it to be substantially disordered, adopt a partly folded (amyloidogenic) conformation, fold into either α-helical or β-sheet species (both monomeric and oligomeric), and form aggregates with several different morphologies (spheres, doughnuts, amorphous, or amyloid-like fibrils) [29]. Figure 1.4, taken from [29], illustrates the many forms of this unusual protein.

Figure 1.4: A protein-chameleon, α-synuclein is able to adopt several completely different structures depending on its environment. Reproduced from [29] with permission of V. N. Uversky.

1.4 Disorder Prediction
A number of algorithms have been developed in recent years to predict the disordered segments of proteins based on amino acid properties (such as charge, hydropathy, secondary structure propensity, and flexibility index) and their frequency of occurrence throughout the protein [54–61]. Low sequence complexity is often an indicator of protein disorder (i.e., low variability of the 20 amino acids within a segment of the protein and repetition of amino acids) [29]. Disordered proteins often exhibit a compositional bias against bulky or nonpolar amino acids (i.e., low content of Val, Leu, Ile, Met, Phe, Trp, and Tyr) [62]. Because a high content of polar or charged residues tends to favour disorder, a higher proportion of Gln, Ser, Pro, Glu and Lys is usually observed [29]. Gly and Ala are often found in higher proportions because their small side chains favour flexibility [62]. Table 1.3 lists several of these algorithms and their sites for web-server disorder predictions, some of which have been used to analyze protein sequence databases of entire genomes. Sequence analysis using the DISOPRED2 algorithm showed that disordered segments of 30 or more consecutive residues occur in 2.0% of archaean, 4.2% of eubacterial and 33.0% of eukaryotic proteins [63].
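The compositional bias described above can be illustrated with a short calculation. The following is only a minimal sketch, not one of the published predictors listed in Table 1.3: it counts the fraction of residues falling into the three groups named in the text. The function name `composition_bias`, the group names, and the ten-residue test peptide are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Residue groups named in the text (one-letter codes); the group names are illustrative.
BULKY_NONPOLAR = frozenset("VLIMFWY")  # Val, Leu, Ile, Met, Phe, Trp, Tyr
POLAR_CHARGED = frozenset("QSPEK")     # Gln, Ser, Pro, Glu, Lys
SMALL_FLEXIBLE = frozenset("GA")       # Gly, Ala


def composition_bias(sequence):
    """Return the fraction of residues in each group for a one-letter sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must contain at least one residue")
    counts = Counter(seq)
    total = len(seq)

    def fraction(group):
        # Counter returns 0 for residues that do not occur in the sequence.
        return sum(counts[aa] for aa in group) / total

    return {
        "bulky_nonpolar": fraction(BULKY_NONPOLAR),
        "polar_charged": fraction(POLAR_CHARGED),
        "small_flexible": fraction(SMALL_FLEXIBLE),
    }


if __name__ == "__main__":
    # Made-up ten-residue peptide, used only to exercise the function.
    print(composition_bias("MKKSPEEQGA"))
```

A low `bulky_nonpolar` fraction together with high `polar_charged` and `small_flexible` fractions is consistent with the bias described in the text, but such raw fractions are only a crude indicator; the predictors in Table 1.3 combine several sequence properties.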
Table 1.3: Predictors of protein disorder. Adapted from [29] with permission from V. N. Uversky.

Predictor | Web address | Reference
Charge-hydropathy plot | www.pondr.com/ | [64]
DisEMBL | http://dis.embl.de | [65]
DISOPRED | http://bioinf.cs.ucl.ac.uk/disopred/ | [63, 66, 67]
DISpro | www.ics.uci.edu/~baldig/diso.html | no literature ref.
DRIPpred | http://sbcweb.pdc.kth.se/cgi-bin/maccallr/disorder/submit.pl | no literature ref.ᵃ
FoldIndex© | http://bioportal.weizmann.ac.il/fldbin/findex | [68]
GlobPlot | http://globplot.embl.de/ | [65, 69]
IUPred | http://iupred.enzim.hu/ | [54, 55]
NORSp | http://cubic.bioc.columbia.edu/services/NORSp/ | [70]
PONDR®ᵇ | www.pondr.com/ | [39, 56–58, 71]
PreLink | http://genomics.eu.org/ | [72]
RONN | www.strubi.ox.ac.uk/RONN | [61]
SEGᶜ | http://mendel.imp.univie.ac.at/METHODS/seg.server.html/ | [73]

ᵃ A description of the DRIPpred algorithm can be found at www.forcasp.org/paper2127.html
ᵇ PONDR® is a family of ID predictors, which includes VL-XT and VL3
ᶜ SEG is a predictor of low-sequence-complexity regions
1.5 Detecting and Characterizing Disorder
Detecting the presence of disorder in a protein can be achieved by a variety of experimental methods. Some methods are sensitive to the presence of disorder but do not yield any residue-specific information on the location of the disorder. This section will briefly describe some of the more common methods used for detecting and characterising protein disorder. Disorder detection is in no way limited to the methods described herein.
X-ray Crystallography
The absence of electron density in single-crystal X-ray diffraction can be the result of failure of a region of the molecule to scatter X-rays coherently due to differences in the atomic positions from one protein to the next in the crystal. This absence of electron density often indicates regions of disorder in the structure. Additional experiments are needed to verify this conclusion. It is possible that a well-folded domain could be ‘wobbly’ in that the whole domain moves as a rigid structure, so the domain has different positions from one protein to the next in the crystal [2, 55, 74]. These wobbly domains would also result in an absence of electron density for the whole domain. In addition, the missing electron density could be the result of a crystal defect or proteolysis during purification [2].
Circular Dichroism Spectropolarimetry
Circular dichroism (CD) spectropolarimetry provides global structural characterization of proteins in solution. CD is based on the differential absorption of left- and right-circularly polarized light, and is reported in terms of the difference in the electric field vectors, ∆E, for the left- and right-circularly polarized light (E_L and E_R respectively) or in degrees of ellipticity (θ) [75]. The ellipticity is defined as the arctangent of the ratio of the difference and the sum of the two electric field vectors (defining the minor and major axes of an ellipse) [75] as

\[ \tan\theta = \frac{\Delta E}{E_L + E_R} \tag{1.1} \]

The ellipticity values are generally normalized and reported as molar ellipticity, [θ], in units of deg·cm²·dmol⁻¹. Far-ultraviolet (far-UV) CD (240–180 nm range) is sensitive to the symmetry in the peptide bond environment and is therefore a fast method to characterize the content of secondary structure (α-helix, β-sheet, β-turn, and disordered) in a protein [2, 76]. Far-UV CD can characterize proteins with α-helices by a large positive band at 193 nm and negative bands at 208 and 222 nm; β-sheets and turns by a positive band at 193 nm and a negative band at 218 nm; and disorder (coil) by a negative band at 195 nm and very low ellipticity above 210 nm [75]. In the near-UV region (320–260 nm), CD is sensitive to the environment of the aromatic side-chains (Phe, Tyr, and Trp) and can give tertiary structural information [76]. However, CD in both the near- and far-UV ranges gives only global information (i.e., the fraction of the protein that is helical, sheet or coil, but not which residues are involved in the structured segments).
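As a worked illustration of Equation 1.1 and of the normalization to molar ellipticity, the sketch below computes θ from the two field amplitudes and converts a measured ellipticity to [θ]. Two points are assumptions not stated in the text: the sign convention ∆E = E_R − E_L, and the commonly used conversion [θ] = θ(mdeg) / (10 · c · l), with c in mol/L and l in cm, which yields deg·cm²·dmol⁻¹. The function names are hypothetical.

```python
import math


def ellipticity_deg(e_left, e_right):
    """Ellipticity in degrees from Eq. 1.1: tan(theta) = dE / (E_L + E_R).

    Assumes the sign convention dE = E_R - E_L (not specified in the text).
    """
    delta_e = e_right - e_left
    return math.degrees(math.atan(delta_e / (e_left + e_right)))


def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """Molar ellipticity in deg*cm^2/dmol from an observed ellipticity.

    theta_mdeg: observed ellipticity in millidegrees
    conc_molar: sample concentration in mol/L
    path_cm:    optical path length in cm
    """
    if conc_molar <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm)


if __name__ == "__main__":
    # Equal amplitudes give no ellipticity (linearly polarized light).
    print(ellipticity_deg(1.0, 1.0))
    # Illustrative numbers only: -10 mdeg, 10 micromolar sample, 1 mm cell.
    print(molar_ellipticity(-10.0, 1e-5, 0.1))
```

Dividing the molar ellipticity by the number of peptide bonds would give a mean residue ellipticity, which is the quantity usually compared between proteins of different length.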
Protease Digestion
Susceptibility to protease degradation is a useful indicator of flexible and exposed regions of proteins [2, 76, 77]. Protease attack of peptide bonds can occur only in exposed, flexible regions of the protein that are 8 to 10 or more residues long; digestion therefore identifies the disordered segments connecting well-folded domains, as well as intrinsically disordered domains [76–79]. This method is particularly useful in determining the cause of missing electron density in X-ray structures and in discriminating disordered protein segments from wobbly domains [2]. The wobbly domains will remain as ordered structures once cleaved from the whole protein at their flexible linkers, whereas a disordered segment will be further digested into smaller fragments [2]. Proteolysis therefore provides a means of mapping the regions of disorder across the protein sequence and complements other biophysical methods of analysis.
Small-Angle X-Ray Scattering
Small-angle X-ray scattering (SAXS) is a useful method for solution studies of flexible, low-compactness macromolecules in the kDa–MDa range [75, 80]. SAXS provides structural data (at low resolution) that enable estimation of the size of the molecule via its radius of gyration (R_g) and its degree of extension via the maximal intramolecular distance parameter, D_MAX [81]. SAXS provides overall features (nanometre scale) of the molecule (size, tertiary and quaternary folds) but does not yield atomic resolution details [80]. For dilute solutions of monodispersed non-interacting particles, the scattering of X-rays is given by the intensity relation under the Guinier approximation for small angles [82, 83]. In short, s is the modulus of the scattering vector (the momentum transfer) and is given by

\[ s = \frac{4\pi \sin\theta}{\lambda} \tag{1.2} \]
where 2θ is the total scattering angle and λ is the wavelength of the incident X-rays [84, 85]. For small momentum transfer, the scattering intensity is

\[ I(s) \approx I(0) \exp\left( -\frac{s^{2} R_g^{2}}{3} \right) \tag{1.3} \]
where I(0) denotes the intensity of forward scattering. Thus, for sR_g < 1, a plot of the natural logarithm of the scattering intensity against s², the square of the modulus of the scattering vector (the Guinier plot), will have a slope of −R_g²/3 and an intercept of ln(I(0)) [86]. Larger values of R_g indicate disorder or lack of compactness [2]. From the I(0) term, the molecular weight of the molecule can be inferred through the relation

\[ I(0) = \kappa c \, \Delta\rho^{2} (MW)^{2} \tag{1.4} \]
where κ is a constant of proportionality determined from the measurement of a standard sample, c is the concentration, ∆ρ is the average electron density contrast (difference in electron density between the macromolecule and the solvent) and MW is the molecular weight [84]. The parameter D_MAX is not obtained in such a straightforward manner and its description is beyond the scope of this thesis. Briefly, the scattering intensity can be described by a distribution function, p(r), for the pairwise intramolecular atomic distances r, which can be obtained from an indirect Fourier transformation of the scattering profile [85, 86]. In this manner, the scattering intensity is described as

\[ I(s) = \int_{0}^{D_{MAX}} p(r) \, \frac{\sin(sr)}{sr} \, dr \tag{1.5} \]

Plotting s²I(s) against s, referred to as a Kratky plot, will indicate the degree of compactness in the molecule. A typical globular protein will exhibit a bell-shaped distribution, whereas a disordered protein will have no clear maximum [77].
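The Guinier analysis described above amounts to a straight-line fit: taking the logarithm of Equation 1.3 gives ln I(s) = ln I(0) − (R_g²/3)s². The following is a minimal sketch of that fit using an ordinary least-squares line, together with the Kratky transform s²I(s). It is not the procedure of any particular SAXS package; the function names and the synthetic data (R_g = 20 and I(0) = 100 in arbitrary, mutually consistent units) are assumptions made for illustration.

```python
import math


def scattering_vector(two_theta_rad, wavelength):
    """Modulus s of the scattering vector (Eq. 1.2); 2*theta is the total scattering angle."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength


def guinier_intensity(s, rg, i0):
    """Guinier approximation of the scattering intensity (Eq. 1.3)."""
    return i0 * math.exp(-(s * rg) ** 2 / 3.0)


def guinier_fit(s_values, intensities):
    """Fit ln I against s^2 by least squares and return (Rg, I0).

    Only points satisfying s*Rg < 1 should be passed in; that check is
    left to the caller because Rg is not known in advance.
    """
    x = [s * s for s in s_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals -Rg^2 / 3
    intercept = mean_y - slope * mean_x  # equals ln I(0)
    if slope >= 0:
        raise ValueError("non-negative slope: data are not in the Guinier regime")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def kratky(s_values, intensities):
    """Return the Kratky ordinate s^2 * I(s) for each data point."""
    return [s * s * i for s, i in zip(s_values, intensities)]


if __name__ == "__main__":
    # Synthetic, noise-free data in the range s*Rg < 1.
    s = [0.005 * k for k in range(1, 10)]
    intensity = [guinier_intensity(v, rg=20.0, i0=100.0) for v in s]
    print(guinier_fit(s, intensity))
```

With real data the fit range has to be chosen iteratively, since the condition sR_g < 1 depends on the fitted R_g, and experimental noise and aggregation both distort the low-s region.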
Nuclear Magnetic Resonance Spectroscopy
Nuclear magnetic resonance (NMR) spectroscopy is uniquely applicable to the determination of three-dimensional biomolecular structures at the atomic level in solution [87]. NMR is also able to provide structural and dynamic details of flexible and disordered solution-state proteins at the atomic level [88]. The ¹H one-dimensional NMR spectrum can indicate disorder through the amount of dispersion of the resonances and the widths of the lines [88]. Poorly dispersed signals in the proton dimension indicate disorder. Isotope enrichment of the protein samples in ¹⁵N and ¹³C allows multidimensional experiments to be performed that take advantage of the increased dispersion in the ¹⁵N and ¹³C dimensions of two- and three-dimensional experiments. High magnetic field spectrometers also provide increased dispersion in the signals and, in combination with isotope labelling, can overcome the serious crowding and overlap in resonances [89]. Figure 1.5 shows representative ¹H and ¹⁵N correlation spectra of the N-terminal SH3 domain of the protein drk (drkN SH3) in the folded and unfolded states [90] and demonstrates the reduction in the dispersion of resonances (particularly the protons) between the folded and unfolded states of the protein.
Figure 1.5: ¹H-¹⁵N correlation spectra from the first time point in a T1 relaxation series (t = 0.011 s) of (a) the folded state of drkN SH3 and (b) the guanidinium chloride unfolded state of the drkN SH3 domain. Spectra were recorded at 14 °C on a 600 MHz spectrometer. In (b), the peak at 7.3 and 115 ppm corresponds to the signal arising from guanidinium chloride. Reproduced from [90] with permission of L. E. Kay.
One of the main advantages of NMR spectroscopy is the ability to obtain detailed dynamic information on atomic motions (particularly of the ¹⁵N and ¹³C nuclei) over a range of timescales that spans 12 orders of magnitude (picosecond–second) [91]. However, NMR is hampered by the fact that it requires sample concentrations in the 1–2 mM range. This concentration condition can be a serious problem in the study of the disordered state since disordered proteins are prone to aggregation at these concentrations [88]. Thus, preparing and maintaining a monodisperse sample may be difficult.

The characterisation of the disordered states of proteins is not limited to the aforementioned methods, but they are the most common. For a more in-depth review of detecting and characterising disordered proteins, the reader is directed to [77] and [92]. The remaining chapters of this thesis describe NMR relaxation and analysis of the disordered state of the HIV-1 Tat protein.
1.6 The Human Immunodeficiency Virus
Despite some disagreements in naming and credit for discovery, human immunodeficiency virus (HIV) has been understood for more than 20 years to be the causative agent of acquired immunodeficiency syndrome (AIDS) [93–95]. HIV belongs to the lentivirus genus which is a family of retroviruses that are distinguished by their cone-shaped capsid core [96]. Retroviruses are defined by their RNA genomes (single strand) which are reverse-transcribed into DNA and then integrated into the host DNA (the provirus) of the infected cells [97]. The major cells infected and depleted by the HIV are the CD4+ T-lymphocytes which play a critical role in the immune response [98]. The mature HIV virion, depicted in Figure 1.6, is comprised of a lipid bilayer that is derived from its host cell. The lipid bilayer also contains some host cell membrane proteins. The surface of the virion is covered with glycoproteins (gp) which are involved in binding to host cell surface receptors, the most common of which is the immunoglobulinlike protein CD4 [99]. The surface unit (SU) is made of a trimeric complex of glycoprotein
120 (gp120) and is attached to the virion through a transmembrane (TM) trimeric complex of glycoprotein 41 (gp41) [100,101]. The transmembrane complex may also aid in the fusion of the virion and host cell initiated through an N-terminal glycine-rich peptide [99]. Beneath the lipid bilayer is the matrix (MA) complex of proteins (p17). The conical capsid is a complex of approximately 1500 capsid (CA) proteins (p24) [102]. The capsid contains the viral genome (two single strands of RNA) along with several viral enzymes necessary for early replication steps [103]. The inner core contains the viral enzyme reverse transcriptase (RT or p51) which processes the viral RNA into viral DNA inside the host cell; integrase (IN or p31) that inserts the viral DNA into the host DNA in the nucleus; nucleocapsid (NC or p9) which functions to deliver unspliced RNA for assembly of new virions; protease (PR) which cleaves viral polyproteins into their functional units; and negative factor (Nef) which is primarily involved in the down regulation of CD4 surface expression (increases the rate of CD4 endocytosis and degradation by lysosomes [104]) from the host cell but it may also serve to enhance the envelope (Env) incorporation into new virions and facilitate budding and release [99].
Figure 1.6: The characteristic features of the HIV-1 virion showing the conical capsid comprised of ∼1500 copies of the capsid protein (CA or p24). The core contains the viral diploid single-strand RNA [103], nucleocapsid protein (NC or p9), protease (PR), negative factor (Nef), integrase (IN or p31) and reverse transcriptase (RT or p51). The capsid core is enclosed in a protein matrix (MA or p17). The matrix is enveloped by a lipid bilayer derived from the host cell along with some host cell proteins. The surface unit (SU) is comprised of trimers of glycoprotein 120 (gp120) which are anchored to the envelope via the transmembrane (TM) complex consisting of a trimer of glycoprotein 41 (gp41).
There are nine open reading frames in the HIV-1 genome [99] as depicted in Figure 1.7. The group-specific antigen (gag) gene encodes a polyprotein (Gag) that contains the major structural components of the virus core (matrix, capsid, and nucleocapsid). The pol gene encodes a polyprotein (Pol) containing reverse transcriptase, integrase and protease. Protease (also contained in the mature virion) cleaves the Gag and Gag-Pol polyproteins into the individual protein units [96]. The envelope (env) gene encodes the Env proteins glycoprotein 120 and glycoprotein 41 which make up the surface unit and transmembrane complexes [99]. The six additional reading frames are the genes for the regulatory proteins trans-activator of transcription (Tat) and regulator of expression of virion proteins (Rev), and the accessory proteins: viral infectivity factor (Vif), viral protein U (Vpu), viral protein R (Vpr), and negative factor (Nef) [96].
Figure 1.7: Open reading frames of the HIV genome. The HIV long terminal repeat (LTR) has an inducible promoter [105] followed by the genes for: group-specific antigen (gag) encoding a polyprotein containing the major structural components of the matrix, capsid and nucleocapsid complexes; polymerase (pol) encoding another polyprotein containing the viral enzymes protease, reverse transcriptase and integrase; viral infectivity factor (vif); viral protein R (vpr); viral protein U (vpu); envelope (env) encoding the surface and transmembrane glycoproteins; and the regulatory proteins trans-activator of transcription (tat) and regulator of expression of virion proteins (rev). Both of the regulatory proteins are encoded by two exons.
The general features of the HIV life-cycle are depicted in Figure 1.8 for infection of a CD4+ T-lymphocyte. Upon gp120 recognition and binding to the cell surface receptor (in this case CD4), the virion attaches itself to the cell. Additional interactions between host cell surface chemokine receptors (CXCR4 or CCR5) induce a conformational change in the CD4 receptor that allows for fusion of the viral envelope with the cell plasma membrane [96,106]. The fusion and release of the viral core are not well understood processes. Once the diploid viral RNA is released into the cell, it is processed by the viral enzyme reverse transcriptase to double-stranded DNA. A pre-integration complex results from the association of the viral DNA with a complex that contains at least integrase, matrix protein, and reverse transcriptase [96]. The pre-integration complex crosses the nuclear membrane and is integrated into the host DNA by integrase and becomes the provirus.
Figure 1.8: The general features of the HIV life-cycle upon infection of a CD4+ T-lymphocyte.
[Stages labelled in the figure: recognition and binding; fusion/penetration; reverse transcription; pre-integration complex; integration; expression; ribosomal translation; assembly; budding.]
Expression of the virus results in a number of mRNAs of varying length. These viral RNAs fall into three categories: unspliced, partially spliced and multiply spliced [106]. The full length, unspliced RNAs can exit the nucleus for translation, or assemble at the cell membrane for packaging into a new virion. Initially, multiply-spliced mRNAs are transported to the cytoplasm and translated into the viral regulatory proteins Tat and Rev along with the accessory protein Nef. The Rev protein ensures that full length viral transcripts leave the nucleus and enter the cytoplasm for Gag and Gag-Pol synthesis and assembly of new virions [96]. The viral regulatory protein Tat is essential for virus expression and its function is described in more detail in the next section.
1.7 The HIV-1 Trans-Activator of Transcription
The HIV-1 trans-activator of transcription (Tat) is a small regulatory protein essential to the viral life cycle. Tat is a 101-residue protein that is encoded by two exons and is expressed during the early stages of viral infection [107]. In addition to its role as a transcriptional regulator of HIV gene expression, Tat has been implicated in a number of extracellular activities including supporting endothelial cell proliferation (contributing to development of Kaposi's sarcoma) [108–110], inducing apoptosis of T cells [111], inducing cell death of neurons [112, 113], decreasing expression of tight junction proteins [114], disrupting the blood-brain barrier (BBB) [115], and inducing oxidative stress [113, 116]. Tat may also be involved in derepression of heterochromatin, in transcription initiation [117], and in reverse transcription [118]. Absence of Tat and low levels of CDK9 and cyclin T1 in resting CD4+ T-cells are all implicated in HIV-1 latency [119]. In general, the pathological activities of Tat contribute to both immune and non-immune dysfunction resulting in an overall increase in the impact of the viral infection. A major event in the progression of HIV is neuronal damage despite the fact that
neurons cannot be infected with the virus [120]. Tat can be released from infected cells within the central nervous system (CNS) (microglia and astrocytes) [121] and is able to cross the BBB [115] resulting in apoptosis of neurons. Two of the resulting central nervous system pathologies are HIV-associated dementia (HAD) and HIV-associated encephalitis (HIVE) [121,122]. Several HIV proteins have been implicated in neural dysfunction including (but not limited to) gp120, gp41, Rev, Nef and Tat [123]. The Tat amino acid sequence has a low overall hydrophobicity and a high net positive charge, and analyses by several disorder prediction algorithms suggest that it is intrinsically disordered with a possible folding nucleus between residues 42 and 75; early CD spectropolarimetry experiments suggested a lack of secondary structure [124]. The first tat exon defines amino acids 1–72 (shown in Figure 1.9) that encompass an acidic and proline-rich N-terminus (1–21), a cysteine-rich region (22–37), a core (38–47), a basic region (48–57), and a Gln-rich segment (58–72) [125]; it activates transcription with the same proficiency as the full-length protein [126–129]. Residues 1–24 form the binding site for the KIX domain of the co-activator and acetyltransferase CBP (CREB-response element binding protein) [124]. Cyclin T1 is thought to interact with the Cys-rich region of Tat [130]; mutation of any one of 6 of the 7 Cys residues results in loss of transactivation [126]. The end of the Cys-rich region and the core are involved in mitochondrial apoptosis of bystander non-infected cells through their ability to bind tubulin and prevent its depolymerization [131]. The basic region is important for TAR RNA binding (see below) [132] and nuclear localization; the segment between Tyr-47 and Arg-57 has been used to transport a large variety of materials including proteins, DNA, drugs, imaging agents, liposomes, and nanoparticles across cell and nuclear membranes [133].
The Gln-rich region has been implicated in mitochondrial apoptosis of T-cells [134].
Pro-rich              Cys-rich         core        basic       Gln-rich
MEPVDPRLEPWKHPGSQPKTA CTNCYCKKCCFHCQVC FITKALGISYG RKKRRQRRRPP QGSQTHQVSLSKQ
[residues 73–101: exon 2 segment]
Figure 1.9: The HIV-1 Tat sequence (BH10 isolate) encoded by exon 1. The 72-residue segment encompasses an N-terminal proline-rich region (1–21) containing the only three acidic residues, a cysteine-rich region (22–37), a core (38–47), a basic region (48–57), and a Gln-rich segment (58–72). Residues 73–101 are encoded by exon 2. The 72-residue segment encoded by exon 1 activates transcription with the same proficiency as the full-length protein.
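The composition quoted for the exon-1 sequence can be checked directly from the string printed in Figure 1.9. The following is a minimal sketch, not part of the original analysis: the names `TAT_1_72`, `REGIONS`, `region` and `composition_summary` are hypothetical, and the charge estimate is a crude count of Lys/Arg minus Asp/Glu that ignores histidines, the chain termini and any affinity tag.

```python
# Tat(1-72) sequence as printed in Figure 1.9 (BH10 isolate).
TAT_1_72 = (
    "MEPVDPRLEPWKHPGSQPKTA"  # as grouped in the figure
    "CTNCYCKKCCFHCQVC"
    "FITKALGISYG"
    "RKKRRQRRRPP"
    "QGSQTHQVSLSKQ"
)

# Region boundaries (1-based, inclusive) as quoted in the figure caption.
REGIONS = {
    "Pro-rich": (1, 21),
    "Cys-rich": (22, 37),
    "core": (38, 47),
    "basic": (48, 57),
    "Gln-rich": (58, 72),
}


def region(name):
    """Return the residues of a named region using the caption's numbering."""
    start, end = REGIONS[name]
    return TAT_1_72[start - 1:end]


def composition_summary(seq):
    """Count cysteines and charged side chains (His and termini are ignored)."""
    n_acidic = sum(seq.count(aa) for aa in "DE")
    n_basic = sum(seq.count(aa) for aa in "KR")
    return {
        "length": len(seq),
        "Cys": seq.count("C"),
        "acidic": n_acidic,
        "basic": n_basic,
        "net_KR_minus_DE": n_basic - n_acidic,
    }


summary = composition_summary(TAT_1_72)
```

Under these assumptions the count reproduces the seven cysteines and three acidic residues mentioned in the text, and the simple Lys/Arg minus Asp/Glu difference agrees with the net charge of +12 quoted in the abstract.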
The second tat exon defines residues 73–101 and includes an RGD motif that may mediate Tat binding to cell surface integrins [135]. The function of the second exon-encoded polypeptide has, thus far, been difficult to determine [118,129]. Studies have shown that the peptide encoded by the second exon is involved in repressing expression of major histocompatibility complex (MHC) class I molecules, whose presence at the cell surface serves as a target for cytotoxic T lymphocytes [136–138]. This repressive function of the exon-2 encoded peptide may contribute to HIV-infected cells escaping an immune response [136,137]. There are several laboratory strains of the Tat protein with 86 residues that may originate from the HXB2 strain (subtype B) commonly found in Europe and North America [139]. These 86-residue variants are not found in natural viral isolates [140]. It has been suggested that the 86-residue form of Tat was a consequence of tissue culture passaging and that a single nucleotide correction in the laboratory genomes yielded the expected 101-residue protein from the Tat coding frame [141]. During transcription of the HIV viral DNA, RNA Polymerase II (RNAP II) is halted as a result of binding to negative transcription elongation factor (N-TEF), leading to
prematurely terminated transcripts that may include the tat message [126]. It has also been suggested that the early HIV proteins, Nef, Tat and Rev, may result from transcription of non-integrated viral DNA [142]. Regardless of Tat's origin, following translation it is transported from the cytoplasm into the nucleus where it binds to a stable, nuclease-resistant, stem-loop structure referred to as the trans-activation response (TAR) element. The TAR element is located downstream of the long terminal repeat (LTR) and spans nucleotides +1 to +59 of the nascent RNA [143]. Tat stimulates elongation of full-length transcripts by recruiting the positive transcription elongation factor b (P-TEFb), a hetero-dimeric complex of a regulatory cyclin T and cyclin-dependent kinase 9 (CDK9). Upon formation of the P-TEFb/Tat-TAR complex, CDK9 is brought into close proximity to the carboxy-terminal domain (CTD) of RNAP II. CDK9 can then hyperphosphorylate the CTD of RNAP II, the components of N-TEF, and the transcription elongation factor Spt5 [144–146]. Recent results suggest that Tat activates P-TEFb by displacing Hexim1 (hexamethylene bisacetamide-inducible protein 1) from its cyclin T1 binding site [147] and that the affinity of the Tat-cyclin T1-CDK9 complex for TAR is regulated through Tat acetylation by histone acetyl transferase (HAT) [148, 149]. Tat binds directly to TAR, as depicted in Figure 1.10, through electrostatic interactions between its basic arginine-rich region and the negatively charged phosphates at a stem-loop UCU-bulge (uridine 23-cytidine 24-uridine 25) of the RNA; the complex has a dissociation constant of Kd = 12 nM [150]. The two base pairs immediately above the TAR bulge (G26:C39 and A27:U38) are also believed to be critical for Tat recognition, and the two base pairs below the bulge (A22:U40 and G21:C41) also contribute to the binding affinity [151]. Phosphates at positions 22, 23 and 40 on the RNA are also critical for Tat binding interactions [152]. The basic Arg-rich and Gln-rich regions of Tat govern the binding affinity of Tat to TAR RNA, but it is the core region that seems to control the specificity of Tat for the TAR element [153, 154].
Figure 1.10: The Tat-TAK-TAR complex. The regulatory complex formed by recognition of the TAR stem-loop bulge by Tat and the Tat-associated kinases (TAK). Tat recognition primarily involves interactions between the Arg-rich region of its basic domain and the phosphates of the UCU bulge in the TAR element of the RNA. The Tat-cyclin T1 interaction may involve cysteine residues in both proteins through coordination with zinc ions. Reproduced from J. Mol. Biol., 293(2) pp. 235–254, J. Karn, “Tackling Tat”, Copyright (1999), with permission from Elsevier and J. Karn.
There have been several attempts to determine solution conformations of Tat and its segments, both alone and in complexes. Most of these studies suffered from poor resolution in homonuclear ¹H NMR experiments on unlabelled protein. However, ¹H NMR spectroscopy and molecular dynamics simulations suggested that Tat₁₋₈₆ (Z-variant) forms condensed domains encompassing the core and Gln-rich regions, whereas the basic and Cys-rich regions were found to be highly flexible at pH 6.3 under reducing conditions [155]. In a model of the 87-residue Tat Mal protein at pH 4.5 under oxidizing conditions, the N-terminal Trp-11 forms a hydrophobic core through interactions with Phe-38 and Tyr-47 [156]. The basic region is in an extended conformation and the Cys-rich region contains β-turns; an α-helix is found in the Gln-rich segment. A low-resolution, globular conformation with some flexible segments (particularly in the basic region) was deduced for ¹³Cα-Gly-labelled synthetic Tat₁₋₈₆ (Bru) at pH 4.5, in the absence of reducing agents [157]. An oxidized Tendamistat-Tat₁₋₃₇ fusion protein showed multiple conformations with some evidence of helicity in the Cys-rich region (20–33) at pH 3.5 [158]. A fusion protein consisting of the activation domain from the unrelated Equine Infectious Anemia Virus and Tat₄₈₋₅₇ showed high helical content in the basic domain by NMR spectroscopy and CD [159]. There have also been several studies of Tat fragments in complex with TAR RNA, mainly focusing on the conformation of TAR [152,160–163]. NMR spectroscopy suggested a conformational change in Tat₃₂₋₇₂, in the region of Gly-42 and Gly-44, upon binding to TAR [162]. ¹H NMR also showed that Tat₄₆₋₅₅, acetylated at Lys-50, is bound in an extended conformation to the bromodomain of p300/CBP-associated factor (PCAF), a HAT transcriptional coactivator [149]. CD spectra suggested the possibility of a conformational change in Tat₁₋₈₆ upon binding to the KIX domain of CBP [124]. ¹⁵N NMR relaxation measurements showed that Tat₄₇₋₅₈ becomes slightly more ordered on binding heparin [164], while CD studies of overlapping peptide fragments suggested that the most flexible regions of Tat are those that are adjacent to the basic region [165].
1.8 NMR Investigation of the Structure and Dynamics of Tat
In order to more clearly define the role of Tat in regulating transcription, as well as its extracellular activities, the determination of its molecular structure is critically important. However, with its low amino acid sequence complexity, low overall hydrophobicity, and high net positive charge, Tat has all of the indicators of intrinsic disorder. Previous homonuclear NMR studies of Tat have shown that amide proton chemical shifts of the protein are within the range characteristic of disordered proteins [156–158]. In order to gain a greater understanding of Tat and its multifaceted activities, one should observe the behaviour of the protein in solution with and without its many binding partners. High-resolution multidimensional heteronuclear NMR will afford the greatest amount of information on the structural and dynamic properties of Tat in solution. However, it has thus far been difficult to obtain a monodisperse solution, in particular a monomeric solution, of the protein at concentrations amenable to NMR, because the readily oxidized Cys-rich region of the protein leads to mixtures of soluble aggregates. The high net positive charge of the protein also poses difficulties in that it causes the protein to stick to many charged surfaces including glass and polyanionic species in cell lysates (DNA and RNA). In order to study Tat both alone and in the presence of binding partners, isotopic labelling of the protein is necessary to resolve the complicated and crowded spectra. Isotopic enrichment of one protein in a complex will allow filtering of NMR signals and will reduce the complexity of the spectra and their analysis. To this end, one of the goals of this project was to develop a protocol for biological expression and purification of isotopically enriched Tat (in ¹³C and ¹⁵N) at yields amenable for study by NMR. The resulting protein samples were also required to be monodisperse and preferably monomeric. With isotopic enrichment, ¹H NMR resonances of a disordered protein can be more rigorously assigned. Finally,
with the resonance assignments in hand, another goal of this research was to characterize the structure and dynamics of the protein by multinuclear NMR spectroscopy. This thesis presents a protocol for the bacterial expression of isotopically enriched recombinant Tat (residues 1–72) for structural and dynamics studies by NMR. This protocol has been used to prepare NMR-quality samples of Tat₁₋₇₂ in an unambiguously reduced and monomeric state for the assignment of the protein backbone resonances [166]. These preparations have permitted the amide backbone ¹⁵N-relaxation rates and steady-state heteronuclear ¹H-¹⁵N NOEs to be measured for most residues in the protein and used to gain insight into the molecular motions of this intrinsically disordered protein.
Chapter 2
Spectral Densities, Relaxation and Dynamics in Nuclear Magnetic Resonance Spectroscopy

Preface
The following treatment of the relaxation of spins in nuclear magnetic resonance (NMR) spectroscopy is based primarily on the description in Abragam (1961) [167]. However, several aspects of the Abragam description are modified to be consistent with the work of others published subsequent to Abragam’s pivotal text. Some of the notation has also been changed to avoid confusion with other standard notation schemes (e.g. Abragam’s interaction frame is denoted by “∗”, which also denotes the complex conjugate in many mathematical texts). Some additional texts are noted during the course of this treatment as they provide more appropriate descriptions of some aspects of the discussion. This treatment is intended to provide the reader with a detailed description of the development of the relaxation equations used in most heteronuclear NMR studies to provide dynamics information and ultimately the data for Model-Free estimation of dynamics parameters.
2.1 Semi-Classical Description of Relaxation
The theory of relaxation has four descriptions of varying complexity [168]: (i) the phenomenological Bloch equations, where relaxation is described in terms of a first-order rate process that returns the magnetization to equilibrium; (ii) second-order perturbation theory, where longitudinal relaxation rates account for the transition probabilities between distinct eigenstates caused by coupling of the nuclei to the lattice; (iii) semi-classical relaxation, where the lattice, with a large number of degrees of freedom, is considered to be a continuous distribution of lattice states [167]; (iv) a full quantum mechanical treatment of the lattice (the most fundamental description), which becomes necessary at very low temperatures where only a fraction of the number of degrees of freedom of the lattice are excited [167]. In the semi-classical approach to describing nuclear relaxation, the spin system is treated quantum mechanically and the surroundings (lattice) are treated classically. A drawback of this treatment is that the spin system evolves toward a final state in which the energy levels of the spin system are equally populated. Equivalently, the semi-classical theory is formally correct only for an infinite Boltzmann spin temperature; at finite temperatures a correction to the theory is required to ensure that the spin system relaxes toward an equilibrium in which populations are described by a Boltzmann distribution. The completely quantum mechanical description of spin relaxation does not suffer from the problems associated with predicting the system reaching proper equilibrium, but is consequently far more complicated in its computation and therefore beyond the scope of this treatment.
2.1.1 The Master Equation of Relaxation
In the semi-classical theory of spin relaxation, the Hamiltonian for the system is written as the sum of a deterministic quantum mechanical Hamiltonian, $H_{det}(t)$, that acts only on the spin system and a stochastic Hamiltonian, $H_1(t)$, that couples the spin system to the lattice:

$$H(t) = H_{det}(t) + H_1(t) = H_0 + H_{rf}(t) + H_1(t) \tag{2.1}$$

where $H_0$ represents the Zeeman and scalar coupling Hamiltonians and $H_{rf}(t)$ is the Hamiltonian for any applied radio frequency fields. The equation describing the evolution of the density operator is given by

$$\frac{d\sigma}{dt} = -\imath[H(t), \sigma(t)] \tag{2.2}$$

The Hamiltonians $H_{rf}(t)$ and $H_1(t)$ are time-dependent perturbations acting on the main time-independent Hamiltonian $H_0$. The explicit influence of $H_0$ can be removed by transforming (2.2) into the interaction representation, where every operator $Q$ is replaced by

$$\tilde{Q} = e^{\imath H_0 t}\, Q\, e^{-\imath H_0 t} \tag{2.3}$$

The interaction representation is a unitary transformation of each operator by $U(t) = e^{\imath H_0 t}$ and $U^\dagger(t) = e^{-\imath H_0 t}$ (adjoint of $U(t)$), and

$$\frac{d}{dt}U(t) = \imath H_0 e^{\imath H_0 t} = \imath H_0 U(t) \tag{2.4}$$

Then, $\tilde\sigma = U\sigma(t)U^\dagger$ and $\tilde H_1 = U H_1(t) U^\dagger$. Consider a system in the absence of an rf field, where the Hamiltonian is of the form $H(t) = H_0 + H_1(t)$ and

$$\frac{d\sigma}{dt} = -\imath[H_0 + H_1(t), \sigma(t)] \tag{2.5}$$

In the interaction frame the evolution of the density operator is described by:

$$\frac{d\tilde\sigma}{dt} = \frac{d}{dt}\left(U\sigma(t)U^\dagger\right) \tag{2.6}$$

Substitution of equation (2.5) into the expansion of (2.6) yields

$$\begin{aligned}
\frac{d\tilde\sigma}{dt} &= \frac{d}{dt}\left(U\sigma(t)U^\dagger\right) = \frac{dU}{dt}\sigma U^\dagger + U\frac{d\sigma}{dt}U^\dagger + U\sigma\frac{dU^\dagger}{dt} \\
&= \imath U H_0 \sigma U^\dagger - \imath U[H_0 + H_1(t), \sigma]U^\dagger - \imath U\sigma H_0 U^\dagger \\
&= -\imath U\{-H_0\sigma + H_0\sigma + H_1\sigma - \sigma H_0 - \sigma H_1 + \sigma H_0\}U^\dagger \\
&= -\imath U\{H_1\sigma - \sigma H_1\}U^\dagger \\
&= -\imath U[H_1, \sigma]U^\dagger \\
&= -\imath[U H_1 U^\dagger, U\sigma U^\dagger] \\
&= -\imath[\tilde H_1, \tilde\sigma]
\end{aligned} \tag{2.7}$$

The equation in (2.7) can be solved by successive approximation up to the second order as follows:

$$\begin{aligned}
\frac{d}{dt'}\tilde\sigma(t') &= -\imath[\tilde H_1(t'), \tilde\sigma(t')] \\
\int_0^t d\tilde\sigma(t') &= -\imath\int_0^t [\tilde H_1(t'), \tilde\sigma(t')]\,dt'
\end{aligned} \tag{2.8}$$
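$$\tilde\sigma(t) = \tilde\sigma(0) - \imath\int_0^t [\tilde H_1(t'), \tilde\sigma(t')]\,dt' \tag{2.9}$$

or equivalently,

$$\tilde\sigma(t') = \tilde\sigma(0) - \imath\int_0^{t'} [\tilde H_1(t''), \tilde\sigma(t'')]\,dt'' \tag{2.10}$$

Substitution of (2.10) into (2.9) yields

$$\begin{aligned}
\tilde\sigma(t) &= \tilde\sigma(0) - \imath\int_0^t \left[\tilde H_1(t'),\ \tilde\sigma(0) - \imath\int_0^{t'} [\tilde H_1(t''), \tilde\sigma(t'')]\,dt''\right]dt' \\
&= \tilde\sigma(0) - \imath\int_0^t [\tilde H_1(t'), \tilde\sigma(0)]\,dt' + \imath^2\int_0^t\left\{\int_0^{t'} [\tilde H_1(t'), [\tilde H_1(t''), \tilde\sigma(t'')]]\,dt''\right\}dt'
\end{aligned} \tag{2.11}$$

Again, (2.11) can be rewritten as

$$\tilde\sigma(t'') = \tilde\sigma(0) - \imath\int_0^{t''} [\tilde H_1(t'''), \tilde\sigma(0)]\,dt''' - \int_0^{t''}\left\{\int_0^{t'''} [\tilde H_1(t'''), [\tilde H_1(t''''), \tilde\sigma(t'''')]]\,dt''''\right\}dt''' \tag{2.12}$$

Repeating the above procedure and substituting (2.12) back into (2.11) leads to

$$\tilde\sigma(t) = \tilde\sigma(0) - \imath\int_0^t [\tilde H_1(t'), \tilde\sigma(0)]\,dt' - \int_0^t\left\{\int_0^{t'} [\tilde H_1(t'), [\tilde H_1(t''), \tilde\sigma(0)]]\,dt''\right\}dt' + \text{higher order terms} \tag{2.13}$$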
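The quality of the second-order truncation in (2.13) can be illustrated numerically for a weak perturbation. The sketch below is not taken from the thesis: it assumes a single spin-1/2 with $H_0 = \omega_0 I_z$ and a static, weak $H_1 = \epsilon I_x$ (arbitrary illustrative values `OMEGA0` and `EPS`), obtains the exact interaction-frame density operator from the lab-frame propagator, and compares it with the first- and second-order expressions evaluated by midpoint quadrature.

```python
import numpy as np

# Spin-1/2 operators (matrix representation assumed for illustration).
Ix = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Iz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)


def comm(a, b):
    """Commutator [a, b]."""
    return a @ b - b @ a


def exp_minus_iHt(H, t):
    """exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * w * t)) @ v.conj().T


OMEGA0 = 2.0   # illustrative Larmor frequency (arbitrary units)
EPS = 0.05     # illustrative weak perturbation strength
H0 = OMEGA0 * Iz
H1 = EPS * Ix


def H1_tilde(t):
    """Perturbation in the interaction frame, U H1 U^dagger with U = exp(+i H0 t)."""
    U = exp_minus_iHt(H0, -t)
    return U @ H1 @ U.conj().T


def exact_sigma_tilde(sigma0, t):
    """Exact interaction-frame density operator from the lab-frame solution of (2.5)."""
    P = exp_minus_iHt(H0 + H1, t)
    U = exp_minus_iHt(H0, -t)
    return U @ (P @ sigma0 @ P.conj().T) @ U.conj().T


def perturbative_sigma_tilde(sigma0, t, n=400):
    """First- and second-order truncations of (2.13) by midpoint quadrature."""
    h = t / n
    first = np.zeros((2, 2), dtype=complex)    # running integral of [H1~(t'), sigma0]
    second = np.zeros((2, 2), dtype=complex)   # nested double-commutator integral
    for k in range(n):
        Hk = H1_tilde((k + 0.5) * h)
        c = comm(Hk, sigma0)
        # inner integral up to the midpoint of the current cell
        second += comm(Hk, first + 0.5 * h * c) * h
        first += c * h
    return sigma0 - 1j * first, sigma0 - 1j * first - second


sigma0 = Iz.copy()
exact = exact_sigma_tilde(sigma0, 1.0)
order1, order2 = perturbative_sigma_tilde(sigma0, 1.0)
err1 = np.linalg.norm(exact - order1)
err2 = np.linalg.norm(exact - order2)
```

For these assumed parameters the second-order expression should track the exact result considerably more closely than the first-order one, which is the behaviour the truncation relies on when the perturbation is weak.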
If the higher order terms are dropped and $\tilde\sigma(t)$ is truncated to a second order approximation, a differential equation can be obtained by differentiating (2.13) with respect to $t$:

$$\frac{d\tilde\sigma(t)}{dt} = -\imath[\tilde H_1(t), \tilde\sigma(0)] - \int_0^t [\tilde H_1(t), [\tilde H_1(t''), \tilde\sigma(0)]]\,dt'' \tag{2.14}$$

Applying the change of variable $\tau = t - t''$ to (2.14) leads to

$$\frac{d\tilde\sigma(t)}{dt} = -\imath[\tilde H_1(t), \tilde\sigma(0)] - \int_0^t [\tilde H_1(t), [\tilde H_1(t-\tau), \tilde\sigma(0)]]\,d\tau \tag{2.15}$$
Remark 1. The Hamiltonian $H_1(t)$ is a random function with vanishing average value ($\overline{H_1(t)} = \overline{\tilde H_1(t)} = 0$).

Remark 2. Since $H_1(t)$ is a random operator, then so is $\tilde\sigma(t)$ of (2.13). The observable behaviour of a statistical ensemble will be described by an average density operator $\overline{\tilde\sigma}$ which obeys an equation generated by taking the ensemble average on both sides of (2.15) over all the random Hamiltonians $H_1(t)$.

To obtain the corresponding equation for the evolution of the density operator in a macroscopic sample, both sides of (2.15) must be averaged over the ensemble of subsystems. The ensemble average is performed under the following assumptions [167, 169]:

(i) The ensemble average of $\tilde H_1(t)$ vanishes, $\overline{\tilde H_1(t)} = 0$. Any non-vanishing components of $\tilde H_1(t)$ after averaging over the ensemble can be included with $H_0$.

(ii) The ensemble averages of $\tilde H_1(t)$ and $\tilde\sigma(0)$ can be calculated independently of each other.

Remark 3. In liquids, the characteristic correlation time, $\tau_c$, for $\tilde H_1(t)$ is much shorter than $t$; it is on the order of the rotational diffusion correlation time for the molecule ($10^{-12}$ to $10^{-18}$ s) [169].

(iii) Given the assumption in (ii) it is permissible to replace $\tilde\sigma(0)$ with $\tilde\sigma(t)$ in (2.15).

(iv) The upper limit of integration in (2.15) can be extended to $+\infty$.

(v) The higher order terms that would have been in (2.15) had the expression in (2.13) not been truncated to a second order approximation can be neglected.

The result is

$$\frac{d\tilde\sigma(t)}{dt} = -\int_0^\infty \overline{[\tilde H_1(t), [\tilde H_1(t-\tau), \tilde\sigma(t)]]}\,d\tau \tag{2.16}$$

since

$$-\imath\,\overline{[\tilde H_1(t), \tilde\sigma(0)]} = -\imath[\overline{\tilde H_1(t)}, \tilde\sigma(0)] = -\imath[0, \tilde\sigma(0)] = 0$$
and where $d\tilde\sigma(t)/dt$ is an ensemble average (overbar omitted).

Remark 4. $\tilde\sigma$ will henceforth stand for the average density matrix.

The semi-classical treatment of the coupling of the spin system to the lattice as a random perturbation should be corrected by replacing $\tilde\sigma(t)$ with $\tilde\sigma(t) - \tilde\sigma_0$, where

$$\tilde\sigma_0 = \sigma_0 = \frac{e^{-\hbar H_0/kT}}{\mathrm{tr}\{e^{-\hbar H_0/kT}\}} \tag{2.17}$$

is the equilibrium density operator, tr denotes the trace, $T$ is the absolute temperature, and $k$ is the Boltzmann constant. Replacing $\tilde\sigma(t)$ with $\tilde\sigma(t) - \tilde\sigma_0$ ensures that the spin system relaxes toward thermal equilibrium populations rather than to a distribution where the states are equally populated. The resulting differential equation is then

$$\frac{d\tilde\sigma(t)}{dt} = -\int_0^\infty \overline{[\tilde H_1(t), [\tilde H_1(t-\tau), \tilde\sigma(t) - \tilde\sigma_0]]}\,d\tau \tag{2.18}$$
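To give a sense of scale for the equilibrium density operator in (2.17), the sketch below evaluates it for a single spin-1/2 with $H_0 = \omega_0 I_z$. The field (a 600 MHz Larmor frequency) and temperature (298 K) are illustrative assumptions rather than values taken from this chapter, and the function name `equilibrium_density` is hypothetical.

```python
import numpy as np

HBAR = 1.054571817e-34   # reduced Planck constant, J s
KB = 1.380649e-23        # Boltzmann constant, J / K


def equilibrium_density(omega0, temperature):
    """sigma_0 of (2.17) for one spin-1/2 with H0 = omega0 * Iz (rad/s)."""
    m = np.array([0.5, -0.5])                       # eigenvalues of Iz
    weights = np.exp(-HBAR * omega0 * m / (KB * temperature))
    return np.diag(weights / weights.sum())         # normalised so tr = 1


# Illustrative values only: 600 MHz Larmor frequency at 298 K.
omega0 = 2 * np.pi * 600e6
sigma0 = equilibrium_density(omega0, 298.0)
polarization = abs(sigma0[0, 0] - sigma0[1, 1])
```

Under these assumptions the population difference is of order $10^{-5}$, which is why the equal-population limit of the uncorrected semi-classical theory is numerically close to, but not the same as, true thermal equilibrium.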
Remark 5. Relaxation rate constants for the density matrix elements $\sigma_{ij}$ are on the order of $R_{ij} = \overline{H_1^2(t)}\,\tau_c$. The equation in (2.18) is valid on the time scale $\tau_c$
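$$\left.\sqrt{\tfrac{2}{3}}\,F^{(0)} I_z - \tfrac{1}{2}F^{(1)} I_+ + \tfrac{1}{2}F^{(-1)} I_-\right) \tag{2.149}$$

Table 2.3: Tensor Operators for the CSA Interaction

| $m$ | $A_2^{(m)}$ | $A_2^{(m)*} = A_2^{(-m)}$ | $F_2^{(\pm m)}$ | $\omega^{(m)}$ |
|---|---|---|---|---|
| 0 | $\sqrt{\tfrac{2}{3}}\,I_z$ | $\sqrt{\tfrac{2}{3}}\,I_z$ | $\sqrt{\tfrac{3}{2}}\,(3\cos^2\theta - 1)$ | $0$ |
| 1 | $-\tfrac{1}{2}I_+$ | $+\tfrac{1}{2}I_-$ | $\mp 3\sin\theta\cos\theta\,e^{\pm\imath\varphi}$ | $\omega_I$ |
| 2 | $0$ | $0$ | $\tfrac{3}{2}\sin^2\theta\,e^{\pm 2\imath\varphi}$ | $2\omega_I$ |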
Using the same arguments used for deriving the master equation for dipolar relaxation in (2.58), the unperturbed Hamiltonian can be given as $H_o = \omega_I I_z$ and the tensor operator in the interaction representation is written as

$$\tilde A^{(m)} = e^{\imath H_o t} A^{(m)} e^{-\imath H_o t} = e^{\imath m\omega_I t} A^{(m)} \tag{2.150}$$

The master equation in (2.58) for the evolution of some physical variable represented by an operator $Q$ can then be written as

$$\frac{d\langle\tilde Q\rangle}{dt} = -\frac{1}{2}\xi_{CSA}^2 \sum_m\sum_p j^{(m)}(m\omega_I)\left(\left\langle\left[A_p^{(-m)}, \left[A_p^{(m)}, Q\right]\right]\right\rangle - \left\langle\left[A_p^{(-m)}, \left[A_p^{(m)}, Q\right]\right]\right\rangle_o\right) \tag{2.151}$$

where the CSA constant $\xi_{CSA}$ is analogous to the dipolar constant $\alpha$ used previously except it has been factored out of the equation for simplicity.

$$\xi_{CSA} = \omega_I\frac{\sigma_{\parallel} - \sigma_{\perp}}{3} = \omega_I\frac{\Delta\sigma}{3} \tag{2.152}$$

Making further simplifications using the relations in (2.123)-(2.125), along with the fact that there is only one value of $p$ for each value of $m$ so that summation over $p$ is no longer required, (2.151) can be rewritten as

$$\frac{d\langle\tilde Q\rangle}{dt} = -\frac{1}{2}\xi_{CSA}^2 \sum_m |F^{(m)}|^2 J^{(m)}(m\omega_I)\left(\left\langle\left[A^{(-m)}, \left[A^{(m)}, Q\right]\right]\right\rangle - \left\langle\left[A^{(-m)}, \left[A^{(m)}, Q\right]\right]\right\rangle_o\right) \tag{2.153}$$

In a manner completely analogous to the development of the dipolar relaxation superoperator, the time evolution of the physical variable $Q$ may be written as

$$\frac{d\langle\tilde Q\rangle}{dt} = -\left(\langle A_Q\rangle - \langle A_Q\rangle_o\right) \tag{2.154}$$

where

$$\langle A_Q\rangle = \frac{1}{2}\xi_{CSA}^2 \sum_m |F^{(m)}|^2 J^{(m)}(m\omega_I)\left\langle\left[A^{(-m)}, \left[A^{(m)}, Q\right]\right]\right\rangle \tag{2.155}$$
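In the case of longitudinal relaxation, where $Q = I_z$, the double commutators evaluate as follows:

$$\begin{aligned}
\left[A^{(-0)}, \left[A^{(0)}, I_z\right]\right] &= \left[A^{(0)}, \left[A^{(-0)}, I_z\right]\right] \\
&= \left[\sqrt{\tfrac{2}{3}}\,I_z, \left[\sqrt{\tfrac{2}{3}}\,I_z, I_z\right]\right] \\
&= \tfrac{2}{3}\left[I_z, \left[I_z, I_z\right]\right] \\
&= 0
\end{aligned} \tag{2.156}$$

$$\begin{aligned}
\left[A^{(-1)}, \left[A^{(1)}, I_z\right]\right] &= \left[\tfrac{1}{2}I_-, \left[-\tfrac{1}{2}I_+, I_z\right]\right] \\
&= \tfrac{1}{4}\left[I_-, \left[-I_+, I_z\right]\right] \\
&= \tfrac{1}{4}\left[I_-, I_+\right] \\
&= \tfrac{1}{4}(-2I_z) \\
&= -\tfrac{1}{2}I_z
\end{aligned} \tag{2.157}$$

$$\begin{aligned}
\left[A^{(1)}, \left[A^{(-1)}, I_z\right]\right] &= \left[-\tfrac{1}{2}I_+, \left[\tfrac{1}{2}I_-, I_z\right]\right] \\
&= \tfrac{1}{4}\left[-I_+, \left[I_-, I_z\right]\right] \\
&= -\tfrac{1}{4}\left[I_+, I_-\right] \\
&= -\tfrac{1}{4}(2I_z) \\
&= -\tfrac{1}{2}I_z
\end{aligned} \tag{2.158}$$

Then, (2.155) becomes

$$\begin{aligned}
\langle A_{I_z}\rangle_{CSA} &= \frac{1}{2}\xi_{CSA}^2 |F^{(1)}|^2 J^{(1)}(\omega_I)\left\langle -\tfrac{1}{2}I_z - \tfrac{1}{2}I_z\right\rangle \\
&= -\frac{1}{2}\xi_{CSA}^2 |F^{(1)}|^2 J^{(1)}(\omega_I)\langle I_z\rangle
\end{aligned} \tag{2.159}$$

which in turn allows the longitudinal relaxation rate to be obtained from

$$\begin{aligned}
\frac{d\langle I_z\rangle}{dt} &= -\frac{1}{T_1^{CSA}}\left(\langle I_z\rangle - \langle I_z\rangle_o\right) \\
&= -\frac{1}{T_1^{CSA}}\left(\langle I_z\rangle - I_z^o\right)
\end{aligned} \tag{2.160}$$

where

$$R_1^{CSA} = \frac{1}{T_1^{CSA}} = -\frac{1}{2}\xi_{CSA}^2 |F^{(1)}|^2 J^{(1)}(\omega_I) \tag{2.161}$$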
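If the term $|F^{(1)}|^2$ is evaluated as done previously with (2.126), then

$$\begin{aligned}
|F^{(1)}|^2 &= \overline{F^{(1)}F^{(-1)}} \\
&= \int_0^\pi \frac{1}{2}\sin\theta\, F^{(1)}F^{(-1)}\,d\theta \\
&= \int_0^\pi \frac{1}{2}\sin\theta\left(3\sin\theta\cos\theta\,e^{-\imath\varphi}\right)\left(-3\sin\theta\cos\theta\,e^{\imath\varphi}\right)d\theta \\
&= -\frac{9}{2}\int_0^\pi \sin\theta\,\sin^2\theta\cos^2\theta\,e^{-\imath\varphi+\imath\varphi}\,d\theta \\
&= -\frac{9}{2}\int_0^\pi \sin^3\theta\cos^2\theta\,d\theta \\
&= -\frac{9}{2}\left[\frac{1}{16}\left(\frac{1}{5}\cos 5\theta - \frac{1}{3}\cos 3\theta - 2\cos\theta\right)\right]_0^\pi \\
&= -\frac{9}{2}\left(\frac{1}{16}\cdot\frac{64}{15}\right) \\
&= -\frac{6}{5}
\end{aligned} \tag{2.162}$$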
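Substitution of the result from (2.162) into (2.161) yields

$$\begin{aligned}
R_1^{CSA} &= -\frac{1}{2}\xi_{CSA}^2 |F^{(1)}|^2 J^{(1)}(\omega_I) \\
&= -\frac{1}{2}\xi_{CSA}^2\left(-\frac{6}{5}\right)J^{(1)}(\omega_I) \\
&= -\frac{1}{2}\frac{\Delta\sigma^2\omega_I^2}{9}\left(-\frac{6}{5}\right)J^{(1)}(\omega_I) \\
&= \frac{1}{5}\frac{\Delta\sigma^2\omega_I^2}{3}J^{(1)}(\omega_I) \\
&= c^2 J(\omega_I)
\end{aligned} \tag{2.163}$$

where the factor 1/5 has been absorbed into the orientational spectral density function defined in (2.135) and the constant $c$ is given by

$$c = \frac{\left(\sigma_{\parallel} - \sigma_{\perp}\right)\omega_I}{\sqrt{3}} = \frac{\Delta\sigma\,\omega_I}{\sqrt{3}} \tag{2.164}$$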
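Using analogous reasoning, a relation for the contribution of the CSA interaction to transverse relaxation may be obtained by using $Q = I_+$ or $I_-$. The double commutators of (2.155) evaluate as:

$$
\begin{aligned}
[A^{(-0)}, [A^{(0)}, I_+]] = [A^{(0)}, [A^{(-0)}, I_+]]
&= \left[\sqrt{\tfrac{2}{3}}\, I_z, \left[\sqrt{\tfrac{2}{3}}\, I_z, I_+\right]\right] \\
&= \tfrac{2}{3}\,[I_z, [I_z, I_+]] \\
&= \tfrac{2}{3}\,[I_z, I_+] \\
&= \tfrac{2}{3}\, I_+
\end{aligned}
\tag{2.165}
$$

$$
\begin{aligned}
[A^{(-1)}, [A^{(1)}, I_+]] &= \left[\tfrac{1}{2} I_-, \left[-\tfrac{1}{2} I_+, I_+\right]\right] \\
&= \tfrac{1}{4}\,[I_-, [-I_+, I_+]] \\
&= 0
\end{aligned}
\tag{2.166}
$$

$$
\begin{aligned}
[A^{(1)}, [A^{(-1)}, I_+]] &= \left[-\tfrac{1}{2} I_+, \left[\tfrac{1}{2} I_-, I_+\right]\right] \\
&= \tfrac{1}{4}\,[-I_+, [I_-, I_+]] \\
&= -\tfrac{1}{4}\,[I_+, -2 I_z] \\
&= \tfrac{1}{2}\,[I_+, I_z] \\
&= -\tfrac{1}{2}\, I_+
\end{aligned}
\tag{2.167}
$$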
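The double commutators used for both the longitudinal ($Q = I_z$) and transverse ($Q = I_+$) cases can be checked numerically. The following is a minimal sketch, assuming the spin-1/2 matrix representation in the basis {α, β} (in units of ħ) and the CSA spin operators $A^{(0)} = \sqrt{2/3}\,I_z$, $A^{(1)} = -\tfrac{1}{2}I_+$ and $A^{(-1)} = \tfrac{1}{2}I_-$ as they appear in the commutators above. The helper names (`comm`, `double_comm`, and so on) are illustrative only.

```python
# Spin-1/2 operators as 2x2 matrices in the basis {alpha, beta}, units of hbar.

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    """Scalar multiple c * a."""
    return [[c * x for x in row] for row in a]

def close(a, b, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def double_comm(a, b, q):
    """Double commutator [a, [b, q]]."""
    return comm(a, comm(b, q))

IZ = [[0.5, 0.0], [0.0, -0.5]]
IP = [[0.0, 1.0], [0.0, 0.0]]   # raising operator I+
IM = [[0.0, 0.0], [1.0, 0.0]]   # lowering operator I-
ZERO = [[0.0, 0.0], [0.0, 0.0]]

A0 = scale((2.0 / 3.0) ** 0.5, IZ)   # A^(0)  = sqrt(2/3) Iz
AP1 = scale(-0.5, IP)                # A^(1)  = -1/2 I+
AM1 = scale(0.5, IM)                 # A^(-1) = +1/2 I-

# Longitudinal case, Q = Iz: expected 0, -Iz/2, -Iz/2
long_terms = [double_comm(A0, A0, IZ),
              double_comm(AM1, AP1, IZ),
              double_comm(AP1, AM1, IZ)]

# Transverse case, Q = I+: expected (2/3) I+, 0, -I+/2
trans_terms = [double_comm(A0, A0, IP),
               double_comm(AM1, AP1, IP),
               double_comm(AP1, AM1, IP)]
```

Summing the three longitudinal terms gives $-I_z$, which is the operator factor that leads to (2.159).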
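With the double commutators evaluated for each value of $m$, (2.155) may be rewritten as

$$
\langle \mathcal{A}_{I_+} \rangle_{CSA} = \frac{1}{2}\,\xi_{CSA}^2 \left( \frac{2}{3}\, |F^{(0)}|^2 J^{(0)}(0) - \frac{1}{2}\, |F^{(-1)}|^2 J^{(1)}(\omega_I) \right) \langle I_+ \rangle
\tag{2.168}
$$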
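Since the value of $|F^{(-1)}|^2 = |F^{(1)}|^2$ has already been determined in (2.162), it is only necessary to evaluate $|F^{(0)}|^2$ using the relation in (2.126).

$$
\begin{aligned}
|F^{(0)}|^2 &= \int_0^{\pi} \frac{1}{2} \sin\theta\, |F^{(0)}|^2\, d\theta \\
&= \int_0^{\pi} \frac{1}{2} \sin\theta \left( \sqrt{\tfrac{3}{2}} \left( 3\cos^2\theta - 1 \right) \right) \left( \sqrt{\tfrac{3}{2}} \left( 3\cos^2\theta - 1 \right) \right) d\theta \\
&= \frac{1}{2} \cdot \frac{3}{2} \int_0^{\pi} \sin\theta \left( 3\cos^2\theta - 1 \right)^2 d\theta \\
&= \frac{3}{4} \int_0^{\pi} \left( \sin\theta - 6 \sin\theta \cos^2\theta + 9 \sin\theta \cos^4\theta \right) d\theta \\
&= \frac{3}{4} \left[ (-\cos\theta) - 2\left( -\cos^3\theta \right) + 9 \left( -\frac{1}{5} \cos^5\theta \right) \right]_0^{\pi} \\
&= \frac{3}{4} \left( \frac{8}{5} \right) \\
&= \frac{6}{5}
\end{aligned}
\tag{2.169}
$$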
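The two angular averages in (2.162) and (2.169) can be cross-checked by numerical quadrature. The sketch below is illustrative only: it uses a composite Simpson rule (the function `simpson` is a helper written for this example, not part of the original treatment) and takes the spatial functions exactly as they appear in the integrands above, with the azimuthal phase factors already cancelled.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def f1_product(theta):
    """F^(1) F^(-1) = (3 sin cos e^{-i phi})(-3 sin cos e^{+i phi}) = -9 sin^2 cos^2."""
    return -9.0 * math.sin(theta) ** 2 * math.cos(theta) ** 2

def f0_squared(theta):
    """|F^(0)|^2 = (3/2)(3 cos^2(theta) - 1)^2."""
    return 1.5 * (3.0 * math.cos(theta) ** 2 - 1.0) ** 2

# Averages weighted by (1/2) sin(theta) over 0..pi, as in (2.162) and (2.169)
avg_f1 = simpson(lambda t: 0.5 * math.sin(t) * f1_product(t), 0.0, math.pi)
avg_f0 = simpson(lambda t: 0.5 * math.sin(t) * f0_squared(t), 0.0, math.pi)
```

Under these assumptions `avg_f1` should agree with $-6/5$ and `avg_f0` with $6/5$ to within the quadrature error.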
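With the evaluations of $|F^{(0)}|^2$ and $|F^{(1)}|^2$ from (2.169) and (2.162) respectively, equation (2.168) becomes

$$
\begin{aligned}
\langle \mathcal{A}_{I_+} \rangle_{CSA} &= \frac{1}{2}\,\xi_{CSA}^2 \left( \frac{2}{3} \left( \frac{6}{5} \right) J^{(0)}(0) - \frac{1}{2} \left( -\frac{6}{5} \right) J^{(1)}(\omega_I) \right) \langle I_+ \rangle \\
&= \frac{1}{2}\, \frac{\Delta\sigma^2 \omega_I^2}{9} \left( \frac{4}{5}\, J^{(0)}(0) + \frac{3}{5}\, J^{(1)}(\omega_I) \right) \langle I_+ \rangle
\end{aligned}
\tag{2.170}
$$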
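Since the equilibrium value $\langle I_+ \rangle_o$ vanishes, the CSA contribution to transverse relaxation may be written as

$$
\begin{aligned}
\frac{d\langle I_+ \rangle}{dt} &= -\frac{1}{T_2^{CSA}} \left( \langle I_+ \rangle - \langle I_+ \rangle_o \right) \\
&= -\frac{1}{T_2^{CSA}}\, \langle I_+ \rangle
\end{aligned}
\tag{2.171}
$$

where

$$
\begin{aligned}
R_2^{CSA} = \frac{1}{T_2^{CSA}} &= \frac{1}{2}\, \frac{\Delta\sigma^2 \omega_I^2}{9} \left( \frac{4}{5}\, J^{(0)}(0) + \frac{3}{5}\, J^{(1)}(\omega_I) \right) \\
&= \frac{1}{2}\, \frac{\Delta\sigma^2 \omega_I^2}{9} \left( 4J(0) + 3J(\omega_I) \right) \\
&= \frac{c^2}{6} \left( 4J(0) + 3J(\omega_I) \right)
\end{aligned}
\tag{2.172}
$$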
Finally, the CSA contribution may be used in (2.141) for $\rho_I^*$, as the “miscellaneous” contribution term, to obtain the total longitudinal and transverse relaxation rates. Therefore, from equations (2.136) and (2.163) the total longitudinal relaxation rate is

$$
\begin{aligned}
R_1^{II} &= R_{1(DIPOLAR)}^{II} + R_{1(CSA)}^{II} \\
&= \frac{d^2}{4} \left[ J(\omega_I - \omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) \right] + c^2 J(\omega_I)
\end{aligned}
\tag{2.173}
$$

and from (2.139) and (2.172) the total transverse relaxation rate is

$$
\begin{aligned}
R_2^{II} &= R_{2(DIPOLAR)}^{II} + R_{2(CSA)}^{II} \\
&= \frac{d^2}{8} \left[ 4J(0) + J(\omega_I - \omega_S) + 6J(\omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) \right] + \frac{c^2}{6} \left[ 4J(0) + 3J(\omega_I) \right]
\end{aligned}
\tag{2.174}
$$

with constants $d$ and $c$ defined by (2.132) and (2.164) respectively. The corresponding equation for the total longitudinal relaxation rate of the S spin is similarly obtained from equations (2.138) and (2.163) as

$$
\begin{aligned}
R_1^{SS} &= R_{1(DIPOLAR)}^{SS} + R_{1(CSA)}^{SS} \\
&= \frac{d^2}{4} \left[ J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I + \omega_S) \right] + c^2 J(\omega_S)
\end{aligned}
\tag{2.175}
$$

and from equations (2.140) and (2.172) the total transverse relaxation rate of the S spin is

$$
\begin{aligned}
R_2^{SS} &= R_{2(DIPOLAR)}^{SS} + R_{2(CSA)}^{SS} \\
&= \frac{d^2}{8} \left[ 4J(0) + J(\omega_I - \omega_S) + 6J(\omega_I) + 3J(\omega_S) + 6J(\omega_I + \omega_S) \right] + \frac{c^2}{6} \left[ 4J(0) + 3J(\omega_S) \right]
\end{aligned}
\tag{2.176}
$$

The cross relaxation rate is not affected by CSA interactions and is given by equation (2.137).
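As an illustration of how (2.173) and (2.174) are used in practice, the following sketch evaluates the total rates for spin I from a user-supplied spectral density. It is a sketch under stated assumptions: the arguments `d2` and `c2` stand for $d^2$ and $c^2$ from (2.132) and (2.164) and must be supplied by the caller, the function names are hypothetical, and the rigid-sphere form of $J(\omega)$ from (2.135) is used only as an example spectral density.

```python
def j_rigid(omega, tau_c):
    """Rigid-sphere spectral density J(w) = (2/5) tau_c / (1 + w^2 tau_c^2)."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)

def r1_total(J, w_i, w_s, d2, c2):
    """Total longitudinal rate of spin I: dipolar plus CSA terms, as in (2.173)."""
    dipolar = d2 / 4.0 * (J(w_i - w_s) + 3.0 * J(w_i) + 6.0 * J(w_i + w_s))
    csa = c2 * J(w_i)
    return dipolar + csa

def r2_total(J, w_i, w_s, d2, c2):
    """Total transverse rate of spin I: dipolar plus CSA terms, as in (2.174)."""
    dipolar = d2 / 8.0 * (4.0 * J(0.0) + J(w_i - w_s) + 6.0 * J(w_s)
                          + 3.0 * J(w_i) + 6.0 * J(w_i + w_s))
    csa = c2 / 6.0 * (4.0 * J(0.0) + 3.0 * J(w_i))
    return dipolar + csa

# Example call with placeholder values (not taken from the text):
# frequencies in rad/s, correlation time in s, d2 and c2 in arbitrary units.
J = lambda w: j_rigid(w, 5.0e-9)
r1_example = r1_total(J, 3.8e8, 3.8e9, d2=1.0, c2=0.1)
r2_example = r2_total(J, 3.8e8, 3.8e9, d2=1.0, c2=0.1)
```

The rates for spin S, (2.175) and (2.176), follow by exchanging the roles of `w_i` and `w_s` in the calls. Passing the spectral density as a callable keeps the rate expressions independent of the motional model.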
2.4 The Steady-State Heteronuclear Nuclear Overhauser Effect
Without delving too deeply into the origin of the nuclear Overhauser effect (NOE), or the derivation of the relaxation rates in terms of the transition probabilities and the Solomon equations, an expression may be derived for the steady-state NOE for the two spin-1/2 system considered thus far. Since the expressions for the auto- and cross-relaxation rates have already been derived in terms of spectral density functions, all of the relations necessary to obtain an expression for the steady-state heteronuclear NOE are already present. For a complete description of the origin and derivation of the NOE, the reader is referred to the extensive explanation in [171], on which this discussion is based. In an effort to avoid the complete derivation of the NOE enhancement, some preliminary points must be made without justification [171].

(i) The intensity of a resonance in an NMR spectrum is directly proportional to the population differences between the energy levels involved in the transition.

(ii) The rate at which these populations return to their equilibrium values following a perturbation is determined by a transition probability W. In the present treatment, these rates have instead been described by the frequency of the transition and the corresponding spectral density function describing the motion at that frequency. One could arrive at the same results by describing the evolution of the spin operators in terms of transition probabilities and populations.

(iii) For dipolar relaxation, the transition probability is dependent on (among other factors) the strength of the local field fluctuating at the frequency of the transition. This local field is the field at the site of one dipole due to the presence of the other dipole.

(iv) The frequency corresponding to the transition is proportional to the energy difference between the two states (Bohr frequency condition).
[Figure 2.2: energy level diagram for the four states αα, αβ, βα and ββ, with the transitions labelled $W_{1I}$, $W_{1S}$, $W_{0IS}$ and $W_{2IS}$.]

Figure 2.2: Energy level diagram showing transition probabilities (W) for spin eigenstates α and β. The $W_{1I}$ and $W_{1S}$ probabilities are associated with single-quantum transitions of spins I and S. The probabilities $W_{0IS}$ and $W_{2IS}$ are for zero-quantum transitions (‘flip-flops’) and double-quantum transitions (‘flip-flips’), respectively. Only single-quantum transitions are considered ‘allowed’ transitions. The zero- and double-quantum transitions occur via cross-relaxation.

It is assumed that the two spins in the system are close enough in space that their dipole-dipole interaction is appreciable. In other words, the spins are dipole-dipole coupled
but not necessarily scalar coupled. It is further assumed that these spins are part of a rigid molecule tumbling isotropically. From the energy level diagram in Figure 2.2 (which is the same as that of Figure 2.1 except that the transition frequencies have been replaced by transition probabilities) it is seen that there are two transitions that involve simultaneous flips of both spins. These are the zero-quantum (αβ ↔ βα) and double-quantum (αα ↔ ββ) transitions, with transition probabilities $W_{0IS}$ and $W_{2IS}$ respectively. These transitions are central to the NOE enhancement because they allow the saturation of spin S to affect the intensity of spin I. The zero- and double-quantum transitions are both referred to as cross-relaxation pathways.

Remark 23. The zero- and double-quantum transitions are forbidden in the conventional sense and thus cannot be directly excited by an rf-pulse to give an NMR signal. However, the transitions are not forbidden in terms of relaxation mechanisms. The selection rules that govern the interactions of the spins with the lattice differ from those which apply to the interaction with the external oscillating field.

Definition. The NOE enhancement, $f_I\{S\}$, is defined as the fractional change in the intensity of I on saturating S and is given by

$$
f_I\{S\} = \frac{I - I^o}{I^o}
\tag{2.177}
$$

where $I^o$ is the equilibrium intensity of I.

The intensity of I is proportional to the sum of the population differences of the energy levels involved in the transition.

$$
I \propto (N_{\alpha\alpha} - N_{\beta\alpha}) + (N_{\alpha\beta} - N_{\beta\beta})
\tag{2.178}
$$

The intensity of S can be obtained similarly from

$$
S \propto (N_{\alpha\alpha} - N_{\alpha\beta}) + (N_{\beta\alpha} - N_{\beta\beta})
\tag{2.179}
$$

At thermal equilibrium, the intensities $I^o$ and $S^o$ are related by

$$
\frac{I^o}{S^o} = \frac{\gamma_I}{\gamma_S}
\tag{2.180}
$$
When spin S is saturated, the populations of the αα and αβ levels are equalized; the populations of the ββ and βα levels are similarly equalized. By equalizing these sets of populations, the ββ and αβ populations are increased and consequently the αα and βα populations are decreased. The single-quantum transitions $W_{1I}$ and $W_{1S}$ only produce independent spin-lattice relaxation of spins I and S respectively. However, if the double-quantum transition ($W_{2IS}$) occurs, it will act to restore the αα and ββ populations to their equilibrium values, decreasing the ββ and increasing the αα populations. The net result is an increase in the population differences $(N_{\alpha\alpha} - N_{\beta\alpha})$ and $(N_{\alpha\beta} - N_{\beta\beta})$, which increases the intensity of the I resonance (i.e., a positive NOE enhancement of the I signal). By analogous arguments, it can be shown that the zero-quantum transition ($W_{0IS}$) decreases the intensity of I upon saturation of S and thus gives rise to negative NOE enhancements.

The intensities of I and S are proportional to $I_z$ and $S_z$ respectively, immediately prior to the observe rf-pulse. Consequently, the vectors $I_z$ and $S_z$ will also be proportional to the population differences between the states. If one were to go through the derivation of the Solomon equations, one would arrive at the time evolution of the $I_z$ and $S_z$ vectors in terms of transition probabilities to obtain the relation [171]

$$
\frac{d\langle I_z \rangle}{dt} = -\left( 2W_{1I} + W_{2IS} + W_{0IS} \right) \left( \langle I_z \rangle - I_z^o \right) - \left( W_{2IS} - W_{0IS} \right) \left( \langle S_z \rangle - S_z^o \right)
\tag{2.181}
$$
Notice the similarity to the expression for longitudinal relaxation of spin I in (2.93). Here the relaxation is described in terms of the transition probabilities rather than the spectral densities of motions, but the end result is the same:

$$
\frac{d\langle I_z \rangle}{dt} = -R_1^{II} \left( \langle I_z \rangle - I_z^o \right) - R_1^{IS} \left( \langle S_z \rangle - S_z^o \right)
\tag{2.182}
$$
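If S is saturated with a weak rf-pulse (so as to avoid perturbing I) for a period of time $t$ such that $t \gg 1/R_1^{II}$ and $1/R_1^{SS}$, then the populations across the S transitions become equalized and the I spin evolves to a steady-state value $\langle I_z \rangle_{ss}$. Under these conditions,

$$
\frac{d\langle I_z \rangle_{ss}}{dt} = 0, \qquad \langle S_z \rangle = 0
\tag{2.183}
$$

$$
\begin{aligned}
\frac{d\langle I_z \rangle_{ss}}{dt} = -R_1^{II} \left( \langle I_z \rangle_{ss} - I_z^o \right) - R_1^{IS} \left( 0 - S_z^o \right) &= 0 \\
-R_1^{II} \left( \langle I_z \rangle_{ss} - I_z^o \right) &= R_1^{IS} \left( -S_z^o \right) \\
R_1^{II} \left( \langle I_z \rangle_{ss} - I_z^o \right) &= R_1^{IS} \left( S_z^o \right) \\
\frac{\langle I_z \rangle_{ss} - I_z^o}{S_z^o} &= \frac{R_1^{IS}}{R_1^{II}}
\end{aligned}
\tag{2.184}
$$

Using the relation in (2.180) we have $S_z^o = (\gamma_S/\gamma_I)\, I_z^o$, and upon substitution into (2.184) we obtain

$$
\begin{aligned}
\frac{\langle I_z \rangle_{ss} - I_z^o}{(\gamma_S/\gamma_I)\, I_z^o} &= \frac{R_1^{IS}}{R_1^{II}} \\
\frac{\langle I_z \rangle_{ss} - I_z^o}{I_z^o} &= \frac{\gamma_S}{\gamma_I}\, \frac{R_1^{IS}}{R_1^{II}} = f_I\{S\}
\end{aligned}
\tag{2.185}
$$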
Although the nature of the auto-relaxation parameter $R_1^{II}$ has not been specified at this point, it refers to the dipolar relaxation parameter as described by the transition probabilities $(2W_{1I} + W_{2IS} + W_{0IS})$. However, (2.185) could be easily modified to include a “miscellaneous” relaxation contribution such as the CSA interaction. Thus, upon substitution of (2.173) and (2.137) into (2.185), the steady-state NOE is obtained in the form

$$
\begin{aligned}
f_I\{S\} = \frac{\langle I_z \rangle_{ss} - I_z^o}{I_z^o}
&= \frac{\gamma_S}{\gamma_I}\, \frac{R_1^{IS}}{R_{1(DIPOLAR)}^{II} + R_{1(CSA)}^{II}} \\
&= \frac{\gamma_S}{\gamma_I}\, \frac{\frac{d^2}{4} \left[ -J(\omega_I - \omega_S) + 6J(\omega_I + \omega_S) \right]}{\frac{d^2}{4} \left[ J(\omega_I - \omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) \right] + c^2 J(\omega_I)} \\
&= \frac{\gamma_S}{\gamma_I}\, \frac{-J(\omega_I - \omega_S) + 6J(\omega_I + \omega_S)}{J(\omega_I - \omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) + \frac{4c^2}{d^2} J(\omega_I)}
\end{aligned}
\tag{2.186}
$$

2.5 Lipari-Szabo Model-Free Formalism
Recall from the definition of the power spectral density function in (2.45), which describes the contribution to orientational dynamics of the molecular motions with frequency components in the $\omega$ to $\omega + d\omega$ range, that

$$
j^{(q)}(\omega) = \mathrm{Re} \left\{ \int_{-\infty}^{\infty} \overline{F_k^{(-q)}(t)\, F_k^{(q)}(t + \tau)}\; e^{-\imath\omega\tau}\, d\tau \right\}
\tag{2.187}
$$

For relaxation in isotropic liquids at the high temperature limit,

$$
j^{(q)}(\omega) = (-1)^q j^{(0)}(\omega) \equiv (-1)^q j(\omega)
\tag{2.188}
$$
where $j(\omega)$ is the auto-spectral density function [169]. The consequence of (2.188) is that only one auto-spectral density needs to be calculated. As mentioned in section 2.2.4, the spatial functions $F_2^{(q)}$ arise from tensor operators of rank $k = 2$ and may then be expressed in terms of spherical harmonics.

$$
F_2^{(0)} = c_0(t)\, Y_2^0[\Omega(t)]
\tag{2.189}
$$
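where $\Omega(t)$ represents the time variation of the polar angles $\theta(t)$ and $\varphi(t)$ in the laboratory reference frame which define the orientation of the unit vector involved in the interaction (i.e. in the direction of the internuclear bond vector connecting spins I and S for the dipolar interaction). With (2.189) the auto-spectral density $j(\omega)$ can then be expressed as

$$
\begin{aligned}
j(\omega) &= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} \overline{c_0(t)\, c_0(t + \tau)\, Y_2^0[\Omega(t)]\, Y_2^0[\Omega(t + \tau)]}\; e^{-\imath\omega\tau}\, d\tau \right\} \\
&= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} C(\tau)\, e^{-\imath\omega\tau}\, d\tau \right\}
\end{aligned}
\tag{2.190}
$$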
where the stochastic correlation function $C(\tau)$ has been introduced and is defined [169] as

$$
C(\tau) = \overline{c_0(t)\, c_0(t + \tau)\, Y_2^0[\Omega(t)]\, Y_2^0[\Omega(t + \tau)]}
\tag{2.191}
$$
For a rigid spherical molecule undergoing Brownian rotational motion, $c_0(t) = c_0$ and the auto-spectral density function [169] can be described by the orientational spectral density function introduced in (2.135)

$$
j(\omega) = d^2 J(\omega)
\tag{2.192}
$$

where $d$ is the constant from (2.132) and is equal to $c_0$. The corresponding orientational correlation function is defined as

$$
C_0(\tau) = 4\pi\, \overline{Y_2^0[\Omega(t)]\, Y_2^0[\Omega(t + \tau)]}
\tag{2.193}
$$
where the spherical harmonics defined in (2.122) are being used. Again, it is assumed that the correlation function takes the form of $e^{-|\tau|/\tau_c}$ as was done in (2.133). However, in this case the normalization factor $1/(2k + 1)$ (related to the rank $k$ of the tensor or the order of the spherical harmonic) must be applied to the spherical harmonics in equation (2.122) (see footnote 2). Then for $k = 2$, the orientational correlation function is

$$
C_0(\tau) = \frac{1}{5}\, e^{-|\tau|/\tau_c}
\tag{2.194}
$$

Upon Fourier transformation the same result as in (2.135) is obtained:

$$
J(\omega) = \frac{2}{5}\, \frac{\tau_c}{1 + \omega^2 \tau_c^2}
$$
Since proteins are not rigid spheres but contain internal dynamics in addition to the overall rotational correlation of the molecule, a description of the internal dynamics is required. If the overall motion of the molecule is isotropic and the internal motions differ from the overall motions by at least two orders of magnitude, then the stochastic correlation function is separable and can be written as

$$
C(\tau) = C_O(\tau)\, C_I(\tau)
\tag{2.195}
$$

In other words, the overall correlation function $C_O(\tau)$ and the internal correlation function $C_I(\tau)$ are said to be stochastically independent. The overall correlation function is the same as the function defined in (2.194). However, the internal correlation function does not present itself so straightforwardly. The reason for this is that, with the overall correlation function, an idealized spherical top tumbling isotropically in the laboratory reference frame was assumed. In the case of internal motions, some description of each of the transition sites within the molecule (a model) is required. However, if the internal motions could be expressed analogously to the overall motion as something with similar exponential character, this would lead to a summation over all sites in the molecule

$$
C_I(t) = \sum_i a_i\, e^{-t/\tau_i}
\tag{2.196}
$$

Footnote 2: The additional factor $1/(2k + 1)$ is required due to the fact that Lipari and Szabo did not use the normalization factor $\sqrt{(2k + 1)/4\pi}$ in their definition of the spherical harmonics $Y_k^q(\theta, \varphi)$ as was done in (2.122).
The length of this expansion and the magnitudes of the amplitudes of motion ($a_i$) and correlation times ($\tau_i$) depend on the nature of the motion and require a model description. However, it is possible to infer some properties of the internal correlation function that are model independent (or model-free) if it is assumed that the internal motions are on a faster time-scale than the overall motions. The first inference that may be made is that at $t = 0$ the normalized correlation function $C_I(0) = 1$. The second inference is that for long times ($t = \infty$) the internal correlation function is $C_I(\infty) = S^2$, where $S$ is the generalized order parameter. $S^2$ describes the model-independent behaviour of the internal correlation function $C_I(t)$.

When the internal motion of the molecule is completely unrestricted, the internuclear vector will sample all possible orientations with equal probability; the internal correlation function will go to zero and $S^2 = 0$. If, on the other hand, the internal motion is completely restricted as in a rigid molecule, then the function of the orientational probability will vanish for all orientations not equal to the $t = 0$ orientation. In this case, $C_I(t) = C_I(0) = 1$ and $S^2 = 1$.

In the Lipari-Szabo model-free formalism, the internal correlation function is approximated by a single exponential with correlation time $\tau_e$ that decays toward $S^2$ as $t \to \infty$ [175, 176]. The Lipari-Szabo internal correlation function is then

$$
C_I(t) = S^2 + (1 - S^2)\, e^{-t/\tau_e}
\tag{2.197}
$$
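With this approximation to the internal correlation function, the total correlation function can be expressed as the product of the overall and internal correlation functions as in (2.195) to obtain the auto-spectral density function

$$
\begin{aligned}
j(\omega) &= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} C(\tau)\, e^{-\imath\omega\tau}\, d\tau \right\} \\
&= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} C_O(\tau)\, C_I(\tau)\, e^{-\imath\omega\tau}\, d\tau \right\} \\
&= \frac{2}{5} \left[ \frac{S^2 \tau_c}{1 + \omega^2 \tau_c^2} + \frac{(1 - S^2)\, \tau}{1 + \omega^2 \tau^2} \right]
\end{aligned}
\tag{2.198}
$$

where (2.194) has been used for the overall correlation function and

$$
\frac{1}{\tau} = \frac{1}{\tau_c} + \frac{1}{\tau_e}
\tag{2.199}
$$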
The Lipari-Szabo formalism thus results in a description of the motion of the protein in terms of the spatial restriction of internal motion ($S^2$), the overall correlation time $\tau_c$ and the effective internal correlation time $\tau_e$. However, the formalism hinges on the assumption that the overall and internal motions are stochastically independent, with the internal motions being on a faster timescale than the overall motion.
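To make the role of the three model-free parameters concrete, the sketch below evaluates the Lipari-Szabo spectral density of (2.198) and (2.199) and feeds it into the steady-state NOE expression (2.186). This is an illustrative sketch only: the function and argument names are hypothetical, the gyromagnetic ratio $\gamma_S/\gamma_I$ and the ratio $c^2/d^2$ are passed in by the caller rather than computed, and the spectral density is supplied to the NOE function as a callable.

```python
def j_model_free(omega, s2, tau_c, tau_e):
    """Lipari-Szabo spectral density, (2.198), with 1/tau = 1/tau_c + 1/tau_e.

    s2 is the squared generalized order parameter; tau_c and tau_e must be > 0.
    """
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)

def noe_enhancement(J, w_i, w_s, gamma_s_over_gamma_i, c2_over_d2=0.0):
    """Steady-state NOE enhancement f_I{S} in the final form of (2.186)."""
    numerator = -J(w_i - w_s) + 6.0 * J(w_i + w_s)
    denominator = (J(w_i - w_s) + 3.0 * J(w_i) + 6.0 * J(w_i + w_s)
                   + 4.0 * c2_over_d2 * J(w_i))
    return gamma_s_over_gamma_i * numerator / denominator

# Example with placeholder parameters (not taken from the text)
J = lambda w: j_model_free(w, s2=0.8, tau_c=5.0e-9, tau_e=5.0e-11)
f_example = noe_enhancement(J, 3.8e8, 3.8e9, gamma_s_over_gamma_i=-9.87,
                            c2_over_d2=0.1)
```

With `s2 = 1` the internal term vanishes and the function reduces to the rigid-sphere $J(\omega)$ of (2.135). In the extreme narrowing limit with no CSA term, where $J$ is effectively constant, the enhancement reduces to $\tfrac{1}{2}\gamma_S/\gamma_I$.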
2.6 Relaxation in the Rotating Frame
Remark 24. The following development of relaxation in the rotating frame is not as complete as the previous development of the longitudinal and transverse relaxation rates (Sections 2.2 and 2.3). In this section many of the results from the previous sections are utilized to simplify the development of the rotating frame relaxation rates. Even with the aid of the results of the previous sections, the following ‘skeleton outline’ of the development of the rotating frame relaxation rates is still lengthy.
In the previous sections it has been shown how heteronuclear relaxation rates probe dynamics, over a wide time-scale, through the behaviour of the spectral densities. These spectral densities provide measurements of the motional behaviour of proteins in solution and can be obtained from relaxation rates of protonated $^{15}$N (in particular the backbone amide), which are dominated by dipole-dipole interactions between the nitrogen and its attached proton. Traditionally, protein dynamics are investigated through measurements of the longitudinal ($R_1$) and transverse ($R_2$) relaxation rates and the steady-state heteronuclear NOE [177]. The transverse relaxation rate is usually obtained through a Carr-Purcell-Meiboom-Gill (CPMG) sequence [178], where the dependence of the $R_2$ rate on the CPMG delay probes conformational exchange processes on the order of $10^3$ to $10^4$ Hz [179].

Alternatively, the transverse relaxation rate can be obtained through on-resonance spin-lock based sequences [180], where the relaxation rate in the rotating frame (strictly, a doubly tilted rotating frame) is measured. In this situation, the measurement of the spin-lock relaxation rate, $R_{1\rho}$, as a function of the spin-lock amplitude results in an equivalent probe of the spin-spin relaxation. The range of exchange rates observed by the spin-lock relaxation rate is extended if the measurements are made along an effective field tilted away from the static field axis. The spin-lock relaxation rate, $R_{1\rho}$, is thus measured along the effective field and as such is referred to as longitudinal relaxation in the rotating frame. Without going into the same level of detail in describing the derivation of the relaxation rates in terms of spectral densities, it will be shown that the on-resonance $R_{1\rho}$ rate is equivalent to the transverse relaxation rate with some added benefits. For a more complete description of $^{15}$N longitudinal relaxation in the rotating frame, see [180–182], from which this discussion is derived.

In order to obtain an expression for the relaxation rate in the rotating frame in terms of spectral densities, the laboratory frame must first be transformed to an appropriate interaction frame in which the magnetic field terms from the Hamiltonian perturbation have vanished. To begin, it is necessary to revisit the expression of the laboratory frame Hamiltonian in (2.1)

$$
\mathcal{H}(t) = \mathcal{H}_0 + \mathcal{H}_{rf}(t) + \mathcal{H}_1(t)
$$

where $\mathcal{H}_0$ is the Zeeman term, $\mathcal{H}_{rf}(t)$ is the time-dependent Hamiltonian of an applied rf-field and $\mathcal{H}_1(t)$ is also a time-dependent perturbation to the main Hamiltonian. In the previous sections describing dipolar relaxation, it was assumed that there was no external rf-field, but here that simplification is not possible. The rotating frame is usually defined with its Z-axis parallel to the static field $B_0$ (i.e., coincident with the laboratory z-axis). A spin-lock field $B_1$ is applied perpendicular to the static field $B_0$ (see Figure 2.3). For the current discussion, consider the spin-lock to be applied selectively such that it affects only the S spins (in this case $^{15}$N). The $\mathcal{H}_{rf}(t)$ term is then given by

$$
\mathcal{H}_{rf}(t) = \omega_1 \left( S_x \cos(\omega_0 t) + S_y \sin(\omega_0 t) \right)
\tag{2.200}
$$

where $\omega_1 = \gamma_S B_1$ and $\omega_0$ is the carrier frequency. The resultant effective field, $B_{eff}$, is then tipped away from the static field by an angle $\beta$ and has an x-component of $-\omega_1/\gamma_S$ and a z-component of $-(\omega_S - \omega_0)/\gamma_S$, where $\omega_S$ is the Larmor frequency of the S spin.
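[Figure 2.3: vector diagram with axes x, y and z, showing the static field $B_0$ along z, the z-component $-(\omega_S - \omega_0)/\gamma_S$, the spin-lock field $B_1 = -\omega_1/\gamma_S$ along x, and the effective field $B_{eff}$ at an angle $\beta$ from z.]

Figure 2.3: Effective magnetic field vector ($B_{eff}$) in the rotating frame resulting from an applied spin-lock ($B_1$) perpendicular to the static field. The effective field is tilted away from the static field vector ($B_0$) by an angle $\beta$.

The tip angle $\beta$ is then given by

$$
\tan\beta = \frac{\omega_1}{\omega_S - \omega_0}
\tag{2.201}
$$

and the frequency of the effective field is then

$$
\omega_e = \gamma_S B_{eff} = \sqrt{\omega_1^2 + (\omega_S - \omega_0)^2}
\tag{2.202}
$$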
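A short numerical illustration of (2.201) and (2.202) follows. It is a minimal sketch with a hypothetical function name; all frequencies are angular frequencies in the same units, and `atan2` is used so that the on-resonance case $\omega_S = \omega_0$ gives $\beta = \pi/2$ without a division by zero.

```python
import math

def effective_field(omega_1, omega_s, omega_0):
    """Return (beta, omega_e) for a spin-lock of amplitude omega_1.

    beta    : tilt angle of the effective field from the static field, (2.201)
    omega_e : effective field frequency, (2.202)
    """
    offset = omega_s - omega_0               # resonance offset of the S spin
    beta = math.atan2(omega_1, offset)       # tan(beta) = omega_1 / offset
    omega_e = math.hypot(omega_1, offset)    # sqrt(omega_1^2 + offset^2)
    return beta, omega_e

# On resonance the effective field lies along the spin-lock axis
beta_on, omega_e_on = effective_field(2000.0, 0.0, 0.0)

# Off resonance the effective field is tilted toward the static field axis
beta_off, omega_e_off = effective_field(3.0, 4.0, 0.0)
```

In the on-resonance call the tilt angle is $\pi/2$ and $\omega_e$ equals $\omega_1$, which is the condition used later to reduce the rotating frame rate to its on-resonance form.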
To obtain a representation of the Hamiltonian in a reference frame in which the applied fields have vanished, three successive rotational transformations must be carried out on those spins experiencing the spin-lock. Conventional transformation to the rotating frame will only remove the dependence on the static field $B_0$. In order to remove the dependence on the spin-lock field $B_1$, a transformation to a doubly rotating frame is required, which results in an expression for the Hamiltonian in which only the local perturbing operators remain.

Using the same expressions for the Zeeman Hamiltonian and tensor operators as in (2.63) to (2.66), the laboratory frame Hamiltonian is defined with the additional spin-lock Hamiltonian term given by (2.200) with $\omega_0 = \omega_S$. If the rotations are considered in terms of the Euler angles $\alpha$, $\beta$ and $\gamma$, then the first rotation is through an angle $\alpha(t) = \omega_S t$ about the laboratory z-axis. Hence $\{x, y, z\} \to \{X, Y, Z\}$, which is the conventional transformation to the rotating frame with the laboratory z-axis coincident with the rotating frame Z-axis. The second transformation is through an angle $\beta$ about the new Y-axis such that $\{X, Y, Z\} \to \{X^*, Y^*, Z^*\}$. This rotation effectively tips the Z-axis to be coincident with $B_{eff}$. Then a third rotation through an angle $\gamma(t) = \omega_e t$ about the $Z^*$-axis results in the final doubly tilted rotating frame, where $\{X^*, Y^*, Z^*\} \to \{x', y', z'\}$. Thus, the rotations can be summarized by the rotational operator $U^S$

$$
U^S = e^{\imath\omega_e t S_z}\, e^{\imath\beta S_y}\, e^{\imath\omega_S t S_z}
\tag{2.203}
$$

An additional rotational operator $U^I$ is also applied to the Hamiltonian, which corresponds to the rotation through an angle $\delta(t) = \omega_I t$ about the laboratory z-axis such that

$$
U^I = e^{\imath\omega_I t I_z}
\tag{2.204}
$$

The rotation by the operator in (2.204) corresponds to the single rotation of the I spin (proton). The I spin is not affected by the spin-lock and therefore undergoes no additional rotations. Therefore, the combined rotational operator for the I and S spins is of the form

$$
\begin{aligned}
U = U^S U^I &= e^{\imath\omega_e t S_z}\, e^{\imath\beta S_y}\, e^{\imath\omega_S t S_z}\, e^{\imath\omega_I t I_z} \\
&= e^{\imath\omega_e t S_z}\, e^{\imath\beta S_y}\, e^{\imath(\omega_I I_z + \omega_S S_z)t}
\end{aligned}
\tag{2.205}
$$

Hence, the Hamiltonian will transform according to

$$
\mathcal{H}' = U \mathcal{H} U^{\dagger}
\tag{2.206}
$$
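The development of the dipolar relaxation rate is essentially the same as that given in Section 2.2.2 except that the tensor operators (the $A^{(q)}$'s of (2.61)) need to be transformed into the doubly tilted rotating frame. To obtain the transformed tensor operators resulting from a rotational transformation $R$ about the y-axis by an angle $\beta$ on spin S, the following relations are used [168, 182–184]:

$$
R_y^S(\beta) =
\begin{pmatrix}
\cos\beta & 0 & -\sin\beta \\
0 & 1 & 0 \\
\sin\beta & 0 & \cos\beta
\end{pmatrix}
\tag{2.207}
$$

$$
R_y^S(\beta)\, S_z\, R_y^{S\dagger}(\beta) = \sin\beta\, S_x + \cos\beta\, S_z
\tag{2.208}
$$

$$
R_y^S(\beta)\, S_{\pm}\, R_y^{S\dagger}(\beta) = \cos\beta\, S_x \pm \imath S_y - \sin\beta\, S_z
\tag{2.209}
$$

$$
S_x = \frac{S_+ + S_-}{2}
\tag{2.210}
$$

$$
\imath S_y = \frac{S_+ - S_-}{2}
\tag{2.211}
$$

$$
\sin^4\left( \frac{\beta}{2} \right) = \left( \frac{\cos\beta - 1}{2} \right)^2
\tag{2.212}
$$

$$
\cos^4\left( \frac{\beta}{2} \right) = \left( \frac{\cos\beta + 1}{2} \right)^2
\tag{2.213}
$$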
The spin tensor operators $A^{(q)}$ in (2.61) can be transformed [185] to $B^{(q)}$ according to

$$
B^{(q)} = R_y^S(\beta)\, A^{(q)}\, R_y^{S\dagger}(\beta)
\tag{2.214}
$$

The end results for the $B^{(q)}$'s, after a lot of simplifying, are expressions that are linear combinations of the $A_p^{(q)}$'s.
†
B (0) = RyS (β)A(0) RyS (β)
7 1 ! cos β − 1 " 6 6 7 1 (−1) (1) (2) (−2) + − sin β A0 + A0 A0 + A0 = 3 3 2 "6 ! 7 7 6 cos β − 1 1 (0) (−0) (1) (−1) + − sin β A1 + A1 A1 + A1 2 6 (0) cos βA0
(2.215)
†
B (1) = RyS (β)A(1) RyS (β) ! ! " " cos β + 1 cos β − 1 3 (0) (1) (1) (−1) A0 + A0 + A0 + cos βA1 = 2 2 2
(2.216)
(0)
(2)
+ sin βA0 + 3 sin βA1
B
(−1)
=
!
+
(−2) sin βA0
cos β + 1 2
"
+
(−1) A0
+
!
cos β − 1 2
"
3 (0) (1) (−1) A0 + A0 + cos βA1 2
(2.217)
(0) 3 sin βA1
†
B (2) = RyS (β)A(2) RyS (β) ! " " ! cos β + 1 cos β − 1 1 (2) (0) (1) = A0 + 3 A1 − sin βA1 2 2 2
(2.218)
B
(−2)
=
!
cos β + 1 2
"
(−2) A0
+3
!
cos β − 1 2
"
(0)
A1 −
1 (−1) sin βA1 2
(2.219)
Now we re-express (2.58) in terms of the transformed spin tensor operators:

$$
\frac{d\langle \tilde{Q} \rangle}{dt} = -\frac{1}{2} \sum_q \sum_p j^{(q)}(\omega_p^{(q)}) \left( \langle [B_p^{(-q)}, [B_p^{(q)}, Q]] \rangle - \langle [B_p^{(-q)}, [B_p^{(q)}, Q]] \rangle_0 \right)
\tag{2.220}
$$

For longitudinal relaxation in the rotating frame, the differential equation for the relaxation of spin S by dipolar interaction with spin I is obtained with $Q = S_z$. As such, analogous to equation (2.84) we have

$$
\frac{d\langle S_z \rangle_{\rho}}{dt} = -\left( \langle \mathcal{B}_z^S \rangle - \langle \mathcal{B}_z^S \rangle_o \right)
\tag{2.221}
$$

and analogous to equation (2.85), the relaxation superoperator is given by

$$
\mathcal{B}_z^S = \frac{1}{2} \sum_q \sum_p j^{(q)}(\omega_p^{(q)})\, [B_p^{(-q)}, [B_p^{(q)}, S_z]]
\tag{2.222}
$$

which has an expectation value given by

$$
\langle \mathcal{B}_z^S \rangle = \frac{1}{2} \sum_q \sum_p j^{(q)}(\omega_p^{(q)})\, \langle [B_p^{(-q)}, [B_p^{(q)}, S_z]] \rangle
\tag{2.223}
$$
Remark 25. Note in (2.221) that the notation for the relaxation superoperator has been changed from $\mathcal{A}$ to $\mathcal{B}$ to denote the fact that the operator is for the doubly tilted rotating frame. For this reason, the subscript $\rho$ has also been added to $d\langle S_z \rangle/dt$ to avoid confusion with (2.99).

Since the expressions for the transformed spin tensor operators are simply linear combinations of the $A_p^{(q)}$'s used previously in evaluating the double commutators with $Q = I_z$, it is not necessary to re-evaluate all of the double commutators. Instead, it is simply a matter of interchanging the I and S terms and including the coefficients of the linear combinations.
Therefore, as in Section 2.2, the relaxation superoperator BzS may be written as 2BzS
! " 8 29 4α2 α2 β 2 (0) 4 j (0) (ωI − ωS + ωe )(Sz + Iz ) = sin βj (ωe ) Iz Sz + sin 9 18 2 ! " ! " 8 9 α2 β β (0) 2 4 4 + j (ωI − ωS − ωe )(Sz − Iz ) + 4α cos j (1) (ωS + ωe ) Iz2 Sz cos 18 2 2 ! " 2 8 9 β α + 4α2 sin4 j (1) (ωs − ωe ) Iz2 Sz + sin2 βj (1) (ωI + ωe )(Sz + Iz ) 2 2 2 2 α α sin2 βj (1) (ωI − ωe )(Sz − Iz ) + cos4 βj (2) (ωI + ωS + ωe )(Sz + Iz ) + 2 2 ! " α2 β + sin4 j (2) (ωI + ωS − ωe )(Sz − Iz ) (2.224) 4 2
Recalling that the terms j (q) (ω) = |F (q) |2 J (q) (ω) as defined in (2.128) to (2.130), that I(I + 1) = S(S + 1) = 3/4, and noting that the spatial functions F (q) are invariant under the transformation to the doubly tilted rotating frame, (2.224) may be simplified to / ! " γI2 γS2 !2 6 µo 7 1 2 (0) 1 4 β = sin βJ (ωe ) + sin J (0) (ωI − ωS + ωe ) 5r6 4π 2 4 2 ! " ! " ! " β 1 4 β 3 3 4 β 4 (0) (1) + sin J (ωI − ωS − ωe ) + cos J (ωS + ωe ) + sin J (1) (ωS − ωe ) 4 2 4 2 4 2 ! " 3 2 (1) 3 3 2 (1) β 4 + sin βJ (ωI + ωe ) + sin βJ (ωI − ωe ) + cos J (2) (ωI + ωS + ωe ) 8 8 2 2 ! " 0 3 4 β (2) + sin J (ωI + ωS − ωe ) 'Sz ( 2 2 / ! " ! " 1 γI2 γS2 !2 6 µo 7 1 4 β β (0) 4 J (ωI − ωS + ωe ) − cos J (0) (ωI − ωS − ωe ) + sin 6 5r 4π 4 2 4 2 ! " 3 3 β 3 + sin2 βJ (0) (ωI + ωe ) + sin2 βJ (0) (ωI − ωe ) + cos4 J (2) (ωI + ωS + ωe ) 8 8 2 2 0 ! " 3 4 β (2) − sin J (ωI + ωS − ωe ) (2.225) 2 2 'BzS (
The relations in (2.134) and (2.135) are applied to obtain the expression

$$
J(\omega) = \frac{1}{5}\, J^{(q)}(\omega)
\tag{2.226}
$$
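Using (2.226) along with the definition of $d$ from (2.132) and some rearranging of terms, the longitudinal relaxation rate equation for the rotating frame may be expressed, analogous to (2.93), as

$$
\begin{aligned}
\frac{d\langle S_z \rangle_{\rho}}{dt} &= -\left( \langle \mathcal{B}_z^S \rangle - \langle \mathcal{B}_z^S \rangle_o \right) \\
&= -\frac{1}{T_{1\rho}^{SS}} \left( \langle S_z \rangle - \langle S_z \rangle_o \right) - \frac{1}{T_{1\rho}^{SI}} \left( \langle I_z \rangle - \langle I_z \rangle_o \right)
\end{aligned}
\tag{2.227}
$$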
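where the rotating frame longitudinal auto-relaxation rate is given by

$$
\begin{aligned}
R_{1\rho}^{SS} = \frac{1}{T_{1\rho}^{SS}} = \frac{d^2}{8} \Big\{ & 4 \sin^2\beta\, J(\omega_e) + 2 \left[ \sin^4\left( \tfrac{\beta}{2} \right) J(\omega_I - \omega_S + \omega_e) + \cos^4\left( \tfrac{\beta}{2} \right) J(\omega_I - \omega_S - \omega_e) \right] \\
& + 6 \left[ \cos^4\left( \tfrac{\beta}{2} \right) J(\omega_S + \omega_e) + \sin^4\left( \tfrac{\beta}{2} \right) J(\omega_S - \omega_e) \right] + 3 \sin^2\beta \left[ J(\omega_I + \omega_e) + J(\omega_I - \omega_e) \right] \\
& + 12 \left[ \cos^4\left( \tfrac{\beta}{2} \right) J(\omega_I + \omega_S + \omega_e) + \sin^4\left( \tfrac{\beta}{2} \right) J(\omega_I + \omega_S - \omega_e) \right] \Big\}
\end{aligned}
\tag{2.228}
$$

and the cross-relaxation rate is

$$
\begin{aligned}
R_{1\rho}^{SI} = \frac{1}{T_{1\rho}^{SI}} = \frac{d^2}{8} \Big\{ & 2 \left[ \sin^4\left( \tfrac{\beta}{2} \right) J(\omega_I - \omega_S + \omega_e) - \cos^4\left( \tfrac{\beta}{2} \right) J(\omega_I - \omega_S - \omega_e) \right] \\
& + 3 \sin^2\beta \left[ J(\omega_I + \omega_e) + J(\omega_I - \omega_e) \right] \\
& + 12 \left[ \cos^4\left( \tfrac{\beta}{2} \right) J(\omega_I + \omega_S + \omega_e) + \sin^4\left( \tfrac{\beta}{2} \right) J(\omega_I + \omega_S - \omega_e) \right] \Big\}
\end{aligned}
\tag{2.229}
$$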
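In the on-resonance case, where the tilt angle $\beta = \pi/2$, (2.228) reduces to

$$
\begin{aligned}
R_{1\rho}^{SS} = \frac{d^2}{8} \Big\{ & 4J(\omega_e) + 2 \left[ \tfrac{1}{4} J(\omega_I - \omega_S + \omega_e) + \tfrac{1}{4} J(\omega_I - \omega_S - \omega_e) \right] + 6 \left[ \tfrac{1}{4} J(\omega_S + \omega_e) + \tfrac{1}{4} J(\omega_S - \omega_e) \right] \\
& + 3 \left[ J(\omega_I + \omega_e) + J(\omega_I - \omega_e) \right] + 12 \left[ \tfrac{1}{4} J(\omega_I + \omega_S + \omega_e) + \tfrac{1}{4} J(\omega_I + \omega_S - \omega_e) \right] \Big\}
\end{aligned}
\tag{2.230}
$$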
Since $\omega_e$ is in the kHz range (compared to the MHz range for $\omega_I$ and $\omega_S$), the following approximations may be made:

$$
J(\omega_I \pm \omega_S \pm \omega_e) \approx J(\omega_I \pm \omega_S)
\tag{2.231a}
$$

$$
J(\omega_I \pm \omega_e) \approx J(\omega_I)
\tag{2.231b}
$$

$$
J(\omega_S \pm \omega_e) \approx J(\omega_S)
\tag{2.231c}
$$
With these approximations, (2.230) further reduces to

$$
R_{1\rho}^{SS} = \frac{d^2}{8} \left\{ 4J(\omega_e) + J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I) + 6J(\omega_I + \omega_S) \right\}
\tag{2.232}
$$
Except for the fact that the rate equation in (2.232) contains a term for $J(\omega_e)$ rather than $J(0)$, the equation is identical to that of the laboratory frame $R_2^{SS}$ for the dipolar interaction in equation (2.140). If the additional contribution for the CSA interaction is considered in the on-resonance case, then there will be an additional term corresponding to

$$
R_{1\rho(CSA)}^{SS} = \frac{c^2}{6} \left\{ 4J(\omega_e) + \frac{3}{2} \left[ J(\omega_S + \omega_e) + J(\omega_S - \omega_e) \right] \right\}
\tag{2.233}
$$

where the constant $c$ is the same as that defined in (2.164). Using the approximations in (2.231c), (2.233) reduces to

$$
R_{1\rho(CSA)}^{SS} = \frac{c^2}{6} \left\{ 4J(\omega_e) + 3J(\omega_S) \right\}
\tag{2.234}
$$
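Hence the longitudinal relaxation rate in the rotating frame for the on-resonance condition is

$$
\begin{aligned}
R_{1\rho}^{SS} &= R_{1\rho(DIPOLAR)}^{SS} + R_{1\rho(CSA)}^{SS} \\
&= \frac{d^2}{8} \left\{ 4J(\omega_e) + J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I) + 6J(\omega_I + \omega_S) \right\} + \frac{c^2}{6} \left\{ 4J(\omega_e) + 3J(\omega_S) \right\}
\end{aligned}
\tag{2.235}
$$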
Therefore, for the on-resonance case with the tilt angle $\beta$ at 90°, the longitudinal relaxation rate in the rotating frame, $R_{1\rho}$, is identical to the transverse relaxation rate $R_2$ in the laboratory frame except for the low frequency term, which is now $J(\omega_e)$ rather than $J(0)$. Consequently, measurements of $R_{1\rho}$ can be used as a substitute for $R_2$ in the Lipari-Szabo model-free estimation of dynamics parameters.

Remark 26. The term for the CSA contribution to the relaxation rate could have been developed in the same way as the dipolar contribution was, but for simplicity it is just given. This development of $R_{1\rho}$ is lengthy enough as it is.
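The reduction from the general rotating frame rate (2.228) to the on-resonance approximation (2.232) can be illustrated numerically. The sketch below is a hedged example rather than a definitive implementation: the function names are hypothetical, `d2` stands for $d^2$ from (2.132) and is supplied by the caller, and the rigid-sphere $J(\omega)$ of (2.135) is assumed purely as a convenient test spectral density.

```python
import math

def j_rigid(omega, tau_c):
    """Rigid-sphere spectral density J(w) = (2/5) tau_c / (1 + w^2 tau_c^2)."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)

def r1rho_dipolar(J, w_i, w_s, w_e, beta, d2):
    """Dipolar rotating frame auto-relaxation rate of spin S, as in (2.228)."""
    s2 = math.sin(beta) ** 2
    s4 = math.sin(beta / 2.0) ** 4
    c4 = math.cos(beta / 2.0) ** 4
    return d2 / 8.0 * (
        4.0 * s2 * J(w_e)
        + 2.0 * (s4 * J(w_i - w_s + w_e) + c4 * J(w_i - w_s - w_e))
        + 6.0 * (c4 * J(w_s + w_e) + s4 * J(w_s - w_e))
        + 3.0 * s2 * (J(w_i + w_e) + J(w_i - w_e))
        + 12.0 * (c4 * J(w_i + w_s + w_e) + s4 * J(w_i + w_s - w_e))
    )

def r1rho_on_resonance(J, w_i, w_s, w_e, d2):
    """On-resonance approximation, as in (2.232)."""
    return d2 / 8.0 * (4.0 * J(w_e) + J(w_i - w_s) + 3.0 * J(w_s)
                       + 6.0 * J(w_i) + 6.0 * J(w_i + w_s))

# Placeholder values: Larmor frequencies ~1e8 to 1e9 rad/s, spin-lock ~1e4 rad/s
J = lambda w: j_rigid(w, 5.0e-9)
w_i, w_s, w_e = 3.77e9, 3.82e8, 1.26e4

general = r1rho_dipolar(J, w_i, w_s, w_e, math.pi / 2.0, 1.0)
approximate = r1rho_on_resonance(J, w_i, w_s, w_e, 1.0)
```

Because $\omega_e$ is several orders of magnitude smaller than $\omega_I$ and $\omega_S$, the two values should agree closely. Setting `beta = 0` and `w_e = 0` in `r1rho_dipolar` recovers the dipolar part of the laboratory frame longitudinal rate (2.175), which is a useful consistency check on the angular coefficients.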
Chapter 3 Materials and Methods

3.1 Plasmid construction
The Tat expression vector used throughout this work was constructed by Gillian Henry from the E. coli codon-optimized exon 1 tat gene (residues 1-72 of the HIV-1 BH10 isolate) contained in pSV2tat72, obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH from Dr. Alan Frankel [186]. A brief outline of the development of the expression vector is as follows. The tat gene was amplified by the polymerase chain reaction (PCR) using pSV2tat72 as template and the following forward (Nde I) and reverse (Bgl II) primers: 5’-ATGATCGTCATATGGAACCGGTCGACCCGCGT-3’ and 5’-CCGGGAGATCTTCACTGTTTAGACAGAGAAACCTGGTGGGTC-3’. The PCR-amplified DNA was then ligated into pUC18 (Pierce, Milwaukee, WI) that had been opened with Sma I. The insert was DNA sequenced and the resulting plasmid is referred to as pUC18tat. The tat exon 1 gene from pUC18tat was removed using Nde I and Bgl II and the purified fragment ligated into pET28b(+) (Novagen, Madison, WI) that had been opened with Nde I and BamH I. The expression vector was verified using the PCR primers for sequencing. The pET28tat plasmid was transformed into NovaBlue cells (Novagen, Madison, WI) for plasmid storage and into E. coli BL21(DE3)pLysS cells for protein expression with an N-terminal hexahistidine segment (His-tag) and thrombin cleavage site that adds 20 residues to the 72 residue protein.
3.2 Expression of unlabelled His-tagged Tat1−72
Initial experiments were designed to test the expression system and were done using non-labelling conditions for the over-expression of Tat. The following expression protocol was developed to increase the protein yield and simplify the procedure to allow convenient production of significant amounts of protein for use in NMR experiments. Transformed cells from a 100 µL glycerol stock were grown in 50 mL of Terrific Broth (TB) (Sigma, St. Louis, MO) supplemented with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin for 16 hours at 37 °C in a rotary shaker. A 10 mL aliquot of the overgrown culture was then added to 1 L of pre-incubated (37 °C) TB (with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin) in a 2 L baffled flask. Cell growth was monitored by optical density measurements at 600 nm until the measured reading was 0.8. Expression was then initiated by induction with 60 mg of isopropyl-β-D-thiogalactopyranoside (IPTG) (Sigma, St. Louis, MO). This level of IPTG (∼0.25 mM) was chosen following experiments that showed an increasing yield of expressed protein upon reduction of the IPTG concentration. The standard starting point for induction of lac-repressor regulated promoters is 1 mM [187]. In some experiments the IPTG concentration was reduced to as low as 0.1 mM (see Chapter 5 for further details on optimization). Cells were allowed to express for 5 hours before the cell culture was put on ice for 15 minutes to halt the protein expression. The cells were then collected by centrifugation at 2,600×g for 15 minutes, sealed in bottles under an argon atmosphere prior to freezing in liquid nitrogen, and stored at -72 °C.
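As a quick check of the quoted induction level, the mass of IPTG added can be converted to a molar concentration. The sketch below assumes a molar mass of 238.3 g/mol for IPTG (a value not given in the text) and a culture volume of exactly 1 L; the function name is illustrative only.

```python
# Assumed molar mass of IPTG (C9H18O5S) in g/mol; not stated in the text.
IPTG_MOLAR_MASS = 238.30


def millimolar(mass_mg: float, volume_l: float, molar_mass: float) -> float:
    """Return the concentration in mM of mass_mg (milligrams) in volume_l (litres)."""
    # mg divided by g/mol gives mmol; dividing by litres gives mmol/L (mM).
    return mass_mg / molar_mass / volume_l


# 60 mg of IPTG in a 1 L culture, as used in the unlabelled protocol.
iptg_mM = millimolar(60.0, 1.0, IPTG_MOLAR_MASS)
print(f"IPTG concentration: {iptg_mM:.2f} mM")
```

Under these assumptions the result is close to the approximately 0.25 mM quoted above.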
3.3 Expression of 13C/15N-His-tagged Tat1−72
The following expression protocol was modified from published methods [188] to reduce the consumption of isotopically-labelled ingredients. As with the unlabelled protein expression, cell growth was initiated from a 100 µL glycerol stock of the pET28tat-transformed cells into 50 mL of TB (with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin) and grown for approximately 15 hours. Four 10 mL aliquots of the 50 mL overgrown cell culture were then used to inoculate 4×2 L baffled flasks each containing 1 L of pre-incubated (37 °C) TB (with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin). Cells were grown at 37 °C in a rotary shaker; growth was halted when the optical density of each flask reached 0.6-0.9. Flasks were submerged in crushed ice for 15 minutes to halt cell growth and then cells were collected by centrifugation at 2,600×g at 4 °C for 15 minutes. Cell pellets were re-suspended in 40 mL of M9 salts solution (see Table 3.1) to wash away residual rich media, pooled, and then centrifuged again at 2,600×g for 15 minutes. The single pooled pellet was then re-suspended in 10 mL of the M9 wash solution and then added to 1 L of pre-incubated (37 °C) M9 minimal medium with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin. The M9 medium was adapted from [189] and contained 0.7 g of 15NH4Cl and 2 g of 13C6-glucose (Cambridge Isotope Laboratories Inc., Andover, MA) and was supplemented with vitamins and micronutrients (see Table 3.1). The cells were allowed to adjust to the new medium for 15 minutes and then over-expression was induced upon addition of 240 mg of IPTG. Cell expression was stopped after 5 hours and cells were harvested by centrifugation at 2,600×g at 4 °C. Cell pellets were re-suspended with 10 mL of M9 wash solution per pellet, pooled, and centrifuged at 2,600×g for 15 minutes. The supernatant was removed and the bottle sealed in an argon atmosphere prior to freezing in liquid nitrogen for storage at -72 °C.
Table 3.1: M9 minimal medium ingredients, adapted from [189].

Component                  Concentration (mM)
KH2PO4                     22
Na2HPO4                    42
15NH4Cl                    12.8
MgSO4                      2
CaCl2                      0.01
NaCl                       8.5
FeSO4                      0.01
U-13C6-glucose             10.7
(NH4)6Mo7O24               3×10−6
H3BO3                      4×10−4
CoCl2                      3×10−5
CuSO4                      1×10−5
MnCl2                      8×10−5
ZnSO4                      1×10−5
Choline chloride           2.9×10−3
Folic acid                 1.1×10−3
Pantothenic acid           2.1×10−3
Nicotinamide               4.1×10−3
Myo-inositol               5.5×10−3
Pyridoxal hydrochloride    2.4×10−3
Thiamin hydrochloride      1.5×10−3
Riboflavin                 1.4×10−4
Biotin                     4.1×10−3
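The millimolar values listed in Table 3.1 for the labelled nitrogen and carbon sources follow from the masses added per litre of M9 medium (0.7 g of 15NH4Cl and 2 g of 13C6-glucose). A minimal sketch of that arithmetic, assuming molar masses of about 54.48 g/mol for 15NH4Cl and 186.11 g/mol for U-13C6-glucose (these molar masses are not given in the text):

```python
# Assumed molar masses (g/mol) of the isotopically enriched compounds.
MOLAR_MASS_15NH4CL = 54.48
MOLAR_MASS_13C6_GLUCOSE = 186.11


def millimolar(mass_mg: float, volume_l: float, molar_mass: float) -> float:
    """Return the concentration in mM of mass_mg (milligrams) in volume_l (litres)."""
    return mass_mg / molar_mass / volume_l


# Masses per litre of M9 medium as stated in Section 3.3.
nh4cl_mM = millimolar(700.0, 1.0, MOLAR_MASS_15NH4CL)
glucose_mM = millimolar(2000.0, 1.0, MOLAR_MASS_13C6_GLUCOSE)
print(f"15NH4Cl: {nh4cl_mM:.1f} mM, 13C6-glucose: {glucose_mM:.1f} mM")
```

With these assumed molar masses the computed concentrations agree with the table entries to within rounding.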
3.4 Purification of His-tagged Tat1−72
Cell lysis was achieved by two freeze-thaw cycles, each with a 30 minute incubation period at room temperature following complete thawing of the pellet. DNase I and RNase I (Sigma, St. Louis, MO) were added to the lysate (200 µg of each) and incubated at 37 °C for 30 minutes. A 100 mL aliquot of extraction buffer (see Table 3.2) was added to the lysate and the mixture was microprobe-sonicated (twice at 35% power with 30 second bursts and 30 seconds between bursts) using a Fisher Sonic Dismembrator Model 300 (Fisher Scientific, Norcross, GA). The lysate was then centrifuged at 17,000×g for 30 minutes, and the supernatant was poured over a 4 mL bed of Talon™ (cobalt-Superflow™) metal affinity resin (Clontech, Palo Alto, CA) in a 10 mL polypropylene gravity flow column (QIAGEN Inc., Mississauga, ON). Because of the expectation of higher yields of unlabelled protein, the extract was usually divided into two identical portions to avoid saturating the cobalt metal affinity resin. The resin was pre-equilibrated with the extraction buffer prior to introduction of the extract. The resin was washed with 20 mL of additional extraction buffer followed by 30 mL of wash buffer (see Table 3.2). Tat protein was released from the cobalt column with the elution buffer (see Table 3.2) and 10×1 mL fractions were collected. The fractions were pooled and serially dialysed against 1 L of degassed acetate buffer at pH 3 at concentrations of 0.1 M, 0.05 M, and 0.01 M (approximately 6 hours each). A final dialysis was done against degassed water for 4 hours. Each of the dialysis buffers was sealed under an argon atmosphere. A 1 mL aliquot was removed from the dialysate for near-ultraviolet (near-UV) absorbance analysis and mass spectrometric analysis; the remainder of the dialysate was frozen and freeze-dried.
Table 3.2: Protein purification buffers.

Buffer       pH    Composition
Extraction   7.2   6 M guanidine hydrochloride (Gdn-HCl); 100 mM sodium phosphate; 10 mM tris(hydroxymethyl)aminomethane hydrochloride (Tris-HCl); 10 mM tris(2-carboxyethyl)phosphine (TCEP)
Wash         6.4   6 M Gdn-HCl; 50 mM sodium phosphate; 10 mM TCEP
Elution      4     6 M Gdn-HCl; 50 mM sodium acetate; 10 mM TCEP

3.5 MALDI-TOF-MS
To assess the purity of the protein sample and identify the Tat monomer, Vincent Chen from the Hélène Perreault lab at the University of Manitoba prepared samples for matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MALDI-TOF-MS). A 10 µL aliquot of the dialysate (from the unlabelled Tat purification) in aqueous solution was subjected to solid phase extraction (SPE) to remove unwanted salts and buffers using a Millipore C18 ZipTip™ (Billerica, MA) following the manufacturer's recommended protocol. The SPE-treated samples were concentrated by aspirating the SPE tip with 2 µL of 50:50 acetonitrile/water with 0.1% trifluoroacetic acid (TFA). Samples were then mixed with 2 µL of sinapinic acid (3,5-dimethoxy-4-hydroxycinnamic acid) matrix solution (Sigma, St. Louis, MO) saturated in water and transferred to a Bruker Scout™ (Billerica, MA) 384 stainless steel target. Mass spectrometric analysis was performed on a Bruker Biflex™ IV MALDI-TOF instrument operated in positive, linear mode with acceleration potentials of 21 kV and 17 kV for lenses 1 and 2, respectively. The instrument was externally calibrated with the [M+H]+ and [M+2H]2+ ions of bovine serum albumin (BSA) (m/z 66431, m/z 33215) and myoglobin (m/z 16952.62, m/z 8476.81).
3.6 NMR Sample Preparation
Freeze-dried Tat protein was dissolved in 600 µL of degassed buffer containing 50 mM acetate-d4/ammonium hydroxide, 20 mM 2-(N-morpholino)ethanesulfonic acid (MES) (only in the 13C/15N-labelled sample), 80 µM sodium sulfite, 0.02% sodium azide and 5% D2O. The resulting protein solutions were at pH 4 (unlabelled) and pH 4.1 (13C/15N-labelled). The samples were put into 5 mm (535-PP) NMR tubes (Wilmad-Labglass, Buena, NJ) that had been purged with argon gas for 15 minutes and the dissolved protein was added to the sample tube under an argon atmosphere. The NMR tube caps were then sealed with Teflon® tape (DuPont, Wilmington, Delaware). The final protein concentration in the NMR tube was 1.5 mM (unlabelled) and 1 mM (13C/15N-labelled) Tat.
3.7 NMR HSQC Acquisition
1H/15N heteronuclear single quantum coherence (HSQC) spectra of the 92-residue His-tagged Tat1−72 were acquired, for both the unlabelled protein (using the natural abundance of the 15N isotope for the indirect dimension) and the 13C/15N-labelled protein, on a 600 MHz Varian INOVA spectrometer (14.1 tesla field strength) equipped with a triple resonance probehead at 20.2 °C, using the standard gradient sensitivity-enhanced HSQC Varian BioPack pulse sequence [190]. The NMR probe temperature was calibrated with methanol [169] and spectra were processed with NMRPipe [191]. HSQC experiments were collected with 2048 complex points in the direct dimension for both samples. For the indirect 15N dimension, 256 and 128 complex points were collected for the 13C/15N-labelled and the unlabelled samples, respectively. Sweep widths in both experiments were 12 ppm in the direct and 36 ppm in the indirect dimensions. A total of 192 transients was collected on the unlabelled Tat protein whereas 32 transients were collected for the 13C/15N-labelled protein. Spectra were apodized using a squared cosine bell function, zero filled to twice (13C/15N-labelled) or four times (unlabelled) the data set size, and linear predicted (forward-backward with eight prediction coefficients) prior to Fourier transformation in the indirect dimension. The dimensions of the resulting processed data sets were 4096×1024 points for both 1H/15N-HSQC experiments. Non-linear line shape fitting was performed on the peaks in the spectrum of the unlabelled Tat sample and the noise was subtracted from the result. The HSQC pulse sequence was sensitivity-enhanced and used gradients for coherence selection and water suppression [178]. Radiation damping was suppressed with a water flip-back pulse (1.42 ms). 15N decoupling during acquisition was done using the WALTZ-16 sequence [192] at a frequency of 7.2 kHz. 1H chemical shifts were referenced to the water signal that resonates 4.82 ppm from 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) at 293 K [169]. 15N and 13C referencing were done indirectly relative to DSS as recommended [193].

3.8 NMR Backbone Assignments
All backbone assignment experiments for the 92-residue 13C/15N-labelled His-tagged Tat1−72 were done on a 600 MHz Varian INOVA spectrometer (14.1 T) equipped with a triple resonance probe head at 20.2 °C, using standard Varian BioPack pulse sequences [190, 194–198] (see Table 3.3). The NMR probe was calibrated with methanol [169] and all spectra were processed with NMRPipe [191]. Spectra were apodized using a squared cosine bell function, zero filled to twice the data set size, and linear predicted (forward-backward with eight prediction coefficients) prior to Fourier transformation. The dimensions of the resulting processed data sets were 4096×1024 for the 1H/15N-HSQC experiment and 2048×256×128 for all 3-dimensional experiments. The pulse sequences used are sensitivity-enhanced (with the exception of the HNHA experiment) and use gradients for coherence selection and water suppression [178]. Radiation damping was suppressed with a water flip-back pulse (1.42 ms). 15N decoupling during acquisition was done using the WALTZ-16 sequence [192] at a frequency of 7.2 kHz. 1H chemical shifts were referenced to the water signal that resonates 4.821 ppm from 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) at 293 K [169]. 15N and 13C
referencing were done indirectly relative to DSS as recommended [193].

Table 3.3: Acquisition parameters for the NMR experiments.

Experiment                  Scansᵃ   Complex points   SW[1H] (ppm)   SW[13C] (ppm)   SW[15N] (ppm)   Field (tesla)
1H/15N-HSQCᵇ [190]          192      2048×128         12             -               36              14.1
1H/15N-HSQC [190]           32       2048×256         12             -               36              14.1
HNCACB [194]                16       1024×64×32       12             70              30              14.1
CBCA(CO)NH [195]            8        1024×64×32       10             70              24              14.1
HNCO [196]                  8        1024×64×32       10             8               24              14.1
HN(CA)CO [197]              8        1024×64×32       10             8               24              14.1
HNCA [195, 196, 199, 200]   16       1024×64×30       20             20              20              14.1
HNHA [198]                  8        1024×64×32       10             10              30              14.1
T1 [178, 201]               8        2048×256         10             -               24              14.1
T2 [178, 201]               8        2048×256         10             -               24              14.1
T1ρ [202]                   8        2048×256         10             -               24              14.1
NOE [178]                   32       2048×256         10             -               24              14.1
T1 [178, 201]               4        672×256          15             -               26              18.8
T1ρ [202]                   4        672×256          15             -               26              18.8
NOE [178]                   16       672×256          15             -               26              18.8

ᵃ All HSQC and 3D experiments were acquired with a 0.7 s post-acquisition relaxation delay. Relaxation experiments (T1, T2, T1ρ, and NOE) used a 5 s relaxation delay. The saturation period for the NOE experiments was 5 s.
ᵇ Experiment carried out using the natural abundance of the 15N isotope in the unlabelled Tat sample.
Chemical shift differences from a random coil were determined according to the method of Schwarzinger et al. [203] in which experimentally-derived random coil chemical shifts from model pentapeptides (Ac-G-G-X-G-G-NH2) under denaturing conditions [204] were subtracted from the observed 1H, 13C, and 15N chemical shifts for His-tagged Tat1−72. The random coil values were corrected for local sequence effects because the amide nitrogen, amide proton and carbonyl carbon chemical shifts are very sensitive to the local amino acid sequence (the Cα and Hα are less sensitive). The correction for the effects of the neighbouring residues [203] is

$$\delta_{corrected}(i) = \delta_{rc}(i) + \Delta\delta(i-1) + \Delta\delta(i+1) + \Delta\delta(i-2) + \Delta\delta(i+2) \tag{3.1}$$

where δcorrected is the sequence-corrected random coil chemical shift for the residue at position i in the sequence, δrc is the experimentally derived random coil chemical shift for residue i in the pentapeptide, and the Δδ terms are experimentally determined correction factors for the two residues preceding and following residue i. These corrections were applied to the amide HN, amide N, and C' as well as the Cα and Hα (since correction factors were available) chemical shifts. No sequence-dependent correction factors were available for the Cβ chemical shift. The random coil values in [204] and the correction factors in [203] were determined at 293 K.

3JHNHα coupling constants were determined to a first approximation from the ratio of the intensities of the cross- and diagonal-peaks in the HNHA experiment [198]. This approximation assumes that the lineshapes of the cross- and diagonal-peaks are identical. The 3JHNHα coupling constants are then obtained (ignoring relaxation effects) [169] from the relation

$$\frac{I_{cross}}{I_{diag}} = -\tan^2\left(\pi\,{}^{3}J_{H^{N}H^{\alpha}}\,(2\delta_2)\right) \tag{3.2}$$

where Icross and Idiag are the intensities of the cross- and diagonal-peaks respectively, and 2δ2 is the re-phasing period, set to 12.5 ms [198]. As the above approximation does not consider the effects of longitudinal relaxation of the Hα proton (R1 of Hα) during the 2δ2 period, the results obtained under this approximation are likely underestimated by 5-10% [169]. The Δ3JHNHα values were calculated by subtracting the sequence-corrected COIL values reported in [205] from the above approximation for the 3JHNHα coupling constants.¹
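Equation (3.2) can be inverted to give the coupling constant directly from the two peak intensities. The following is a minimal sketch of that inversion; the function name and the example intensities are hypothetical, and no correction is applied for the 5-10% underestimate mentioned above.

```python
import math


def j_from_hnha(i_cross: float, i_diag: float, rephasing_s: float = 12.5e-3) -> float:
    """Estimate 3J(HN-Ha) in Hz from HNHA cross- and diagonal-peak intensities.

    Inverts I_cross / I_diag = -tan^2(pi * J * 2*delta_2), where rephasing_s
    is the full re-phasing period 2*delta_2 in seconds.
    """
    ratio = -i_cross / i_diag
    if ratio < 0:
        # Cross- and diagonal-peaks must have opposite signs for eq. (3.2) to hold.
        raise ValueError("cross- and diagonal-peak intensities must have opposite signs")
    return math.atan(math.sqrt(ratio)) / (math.pi * rephasing_s)


# Hypothetical intensities, for illustration only.
coupling_hz = j_from_hnha(i_cross=-8.0, i_diag=100.0)
print(f"3J(HN-Ha) estimate: {coupling_hz:.2f} Hz")
```

The arctangent form is only unambiguous while the argument of the tangent stays below π/2, which holds for typical coupling constants at a 12.5 ms re-phasing period.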
¹ Sequence-dependent effects on COIL 3JHNHα coupling constants are classified according to whether the preceding residue is β-branched or aromatic (L-type: Phe, His, Ile, Thr, Val, Trp, and Tyr) or is any other residue (S-type).

3.9 NMR Relaxation Measurements
15N-relaxation data were collected on both 15N-labelled and 13C/15N-labelled His-tagged Tat1−72 on a Varian INOVA 600 MHz spectrometer (14.1 T field) at the University of Manitoba and on a Varian INOVA 800 MHz spectrometer (18.8 T field) at the University of Alberta (NANUC) with triple resonance probe heads at 20.2 °C, using Varian BioPack pulse sequences [178, 201, 202]. Cross-peak intensities were measured as peak heights. Spectra were processed with NMRPipe [191], which was also used to fit the relaxation data to two-parameter exponential decays. The errors in the relaxation rates were calculated using the signal-to-noise ratios of the individual peaks and the fits of the data to the decays. Duplicate measurements were made to verify the error estimates. A total of nine data sets were acquired to obtain longitudinal relaxation rates (R1) using relaxation delays of 0, 50, 100, 250, 500, 1000, 1500, 3000, and 4000 ms. Measurements for longitudinal relaxation rates in the rotating frame (R1ρ) were made with eight data sets using spin-lock times of 30, 60, 90, 120, 150, 180, 210, and 240 ms. The 15N spin-lock continuous-wave frequency for the R1ρ relaxation experiments was 1.5 kHz, with 90° pulse lengths of 166.755 µs and 125.029 µs for the 14.1 T and 18.8 T fields respectively. The R1ρ measurements were corrected for offset from the carrier using the measured R1 values as described in reference [202]. The peaks at the outer edges of the spectra required correction by less than 10%. Transverse relaxation rate (R2) measurements were done at 600 MHz only and with Carr-Purcell-Meiboom-Gill (CPMG) [178] times of 30, 60, 90, 120, 150, 180, 210, and 240 ms. The R1 and R1ρ data were acquired using 4 transients whereas 8 transients were collected for the R2 experiments; the post-acquisition relaxation delay was 5 s. Data collected at 600 MHz were 2048×256 complex points with SW[1H]=10 ppm and SW[15N]=24 ppm. The steady-state 1H-15N NOE values were obtained from ratios of peak heights from experiments with (INOE) and without (InoNOE) saturation of the protons for 5 s at the beginning of the experiment. The heteronuclear NOE values were then obtained from (INOE − InoNOE)/InoNOE. The spectra were acquired with 32 transients, a 5 s relaxation delay, and the same resolution as in the R1 and R1ρ experiments. Water suppression was achieved through the use of gradients to select for the 15N-1H coherence [178]. Data collection at 800 MHz was done exactly as at 600 MHz, but the resolution for the experiments was 672×256 complex points with SW[1H]=15 ppm and SW[15N]=26 ppm and NOE experiments were done with 16 transients.
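The two-parameter exponential decay fit and the NOE calculation described above can be illustrated with a short sketch. This is not the NMRPipe procedure used in this work: it assumes a simple log-linear least-squares fit of I(t) = I0·exp(−R·t) to noise-free synthetic peak heights, and all function names and intensity values are hypothetical.

```python
import math


def fit_decay(times_s, intensities):
    """Fit I(t) = I0 * exp(-R * t) by linear least squares on ln(I).

    Returns (I0, R) with R in s^-1. All intensities must be positive.
    """
    logs = [math.log(v) for v in intensities]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def heteronuclear_noe(i_noe: float, i_no_noe: float) -> float:
    """NOE value as defined in the text: (I_NOE - I_noNOE) / I_noNOE."""
    return (i_noe - i_no_noe) / i_no_noe


# Spin-lock times from the R1rho experiments (30-240 ms), in seconds.
spin_lock_s = [0.030, 0.060, 0.090, 0.120, 0.150, 0.180, 0.210, 0.240]
# Synthetic peak heights for a hypothetical residue with R1rho = 4 s^-1.
heights = [100.0 * math.exp(-4.0 * t) for t in spin_lock_s]

i0, r1rho = fit_decay(spin_lock_s, heights)
print(f"I0 = {i0:.1f}, R1rho = {r1rho:.2f} s^-1")
```

A log-linear fit weights noisy low-intensity points more heavily than a direct non-linear fit would, so it is best read as an illustration of the model rather than as a substitute for the fitting used to produce the reported rates.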
3.10 Relaxation Data Analysis
The measurement of NMR relaxation rates provides a window on protein dynamics over a broad range of timescales: 15N longitudinal (R1), transverse (R2), rotating-frame (R1ρ), and heteronuclear cross-relaxation (contained in the NOE) rates are sensitive to dynamics on the picosecond-nanosecond timescales, and R2 and R1ρ can also be sensitive to conformational exchange (Rex) on the microsecond to millisecond timescales. The equations relating the macroscopic rates of relaxation (Rx) to the values of the spectral density of motions (J) at the nuclear spin transition frequencies (ω) were given by Abragam [167] and are summarized as follows (see Sections 2.2, 2.3 and 2.4):
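$$R_1 = \frac{d^2}{4}\left[J(\omega_H - \omega_N) + 3J(\omega_N) + 6J(\omega_H + \omega_N)\right] + c^2 J(\omega_N) \tag{3.3}$$

$$R_2 = R_{1\rho} = \frac{d^2}{8}\left[4J(0) + J(\omega_H - \omega_N) + 6J(\omega_H) + 3J(\omega_N) + 6J(\omega_H + \omega_N)\right] + \frac{c^2}{6}\left[4J(0) + 3J(\omega_N)\right] + R_{ex} \tag{3.4}$$

$$NOE = \frac{\gamma_H}{\gamma_N}\,\frac{d^2}{4}\left[\frac{6J(\omega_H + \omega_N) - J(\omega_H - \omega_N)}{R_1}\right] \tag{3.5}$$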
The constants d and c in equations (3.3)-(3.10) are defined from (2.132) and (2.164) as

$$d = \left(\frac{\mu_0}{4\pi}\right)\frac{\hbar\,\gamma_H\gamma_N}{r_{NH}^{3}}, \qquad c = \frac{\Delta\sigma\,\omega_N}{\sqrt{3}}$$

where
• μ0 = 4π × 10−7 kg·m·s−2·A−2 is the permeability constant of free space;
• γH = 2.68 × 10^8 rad·s−1·T−1 is the proton gyromagnetic ratio;
• γN = −2.71 × 10^7 rad·s−1·T−1 is the gyromagnetic ratio of 15N;
• rNH = 102 pm is the proton-nitrogen internuclear separation [206];
• Δσ = −172 ppm is the difference between the parallel and perpendicular components of the 15N chemical shift tensor [206];
• ħ = 1.05 × 10−34 J·s is Planck's constant divided by 2π.

Since equations (3.3)-(3.5) involve spectral density functions at five distinct frequencies, it is not possible to evaluate the system of relaxation equations with the limited data set of only three relaxation experiments. At least two additional relaxation equations (and corresponding data sets) would be necessary to unambiguously evaluate the spectral densities at these five frequencies. Peng and Wagner originally proposed full spectral density mapping [181, 207] using a number of relaxation experiments equal to the number of distinct frequencies of the spectral density function plus an additional experiment to account for the conformational exchange contribution (Rex) to R1ρ or R2. However, it was later found that, using the methods of Farrow et al. [207, 208], it is possible to reduce the complexity of the system by combining the three high frequency spectral densities into a single spectral density function for J(ωH) and incorporating the exchange contribution (if present) into an effective J(0) estimate such that
$$J_{eff}(0) = J(0) + \lambda R_{ex} \tag{3.6}$$

where the constant λ is defined as

$$\lambda = \frac{6}{3d^2 + 4c^2} \tag{3.7}$$
The result is a system of three equations with spectral density functions at only three frequencies. For this reduced spectral density mapping approach, equations (3.3)-(3.5) can be approximated as follows [91, 208, 209]:
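$$R_1 = \frac{d^2}{4}\left[3J(\omega_N) + 7J(\beta_1\omega_H)\right] + c^2 J(\omega_N) \tag{3.8}$$

$$R_2 = R_{1\rho} = \frac{d^2}{8}\left[4J_{eff}(0) + 3J(\omega_N) + 13J(\beta_2\omega_H)\right] + \frac{c^2}{6}\left[4J_{eff}(0) + 3J(\omega_N)\right] \tag{3.9}$$

$$NOE = \frac{\gamma_H}{\gamma_N}\,\frac{d^2}{4}\left[\frac{5J(\beta_3\omega_H)}{R_1}\right] \tag{3.10}$$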
where β1 = 0.921, β2 = 0.955 and β3 = 0.87. The reduced spectral density approximations in equations (3.8)–(3.10) result in solutions for Jeff(0), J(ωN) and J(β3ωH). The J(βiωH) term can be approximated in several ways, but for these analyses it has been approximated according to reference [91] by

$$J(\beta_i\omega_H) = \left(\frac{\beta_3}{\beta_i}\right)^{2} J(\beta_3\omega_H) \tag{3.11}$$

The solution [210] to the system of equations in (3.8)-(3.10) is then

$$J_{eff}(0) = \frac{1}{3d^2 + 4c^2}\left[6R_{1\rho} - R_1\left(3 + \frac{18}{5}\,\frac{\gamma_N}{\gamma_H}\,NOE\right)\right] \tag{3.12}$$

$$J(\omega_N) = \frac{4R_1}{3d^2 + 4c^2}\left[1 - \frac{7}{5}\,\frac{\gamma_N}{\gamma_H}\,NOE\right] \tag{3.13}$$

$$J(0.87\omega_H) = \frac{4}{5d^2}\,\frac{\gamma_N}{\gamma_H}\,R_1\,NOE \tag{3.14}$$
Note that R2 and R1ρ are determined by the same combination of spectral density values as long as the 15N spin-lock is on resonance for all spins [181] (see Section 2.6). J(0.87ωH) is determined from equation (3.10), and J(0.921ωH) and J(0.955ωH) are calculated directly from it using the assumption that at high frequency J(ω) ∝ 1/ω². One advantage of measuring R1ρ is that, in contrast to R2, contributions from conformational exchange are minimized (Rex ∼ 0) as long as the nitrogen carrier is placed on resonance and the spin-lock power is sufficiently high [211–213]. In the event that conformational exchange contributions are significant, Jeff(0) should be interpreted as a combination of slow motions (i.e., molecular tumbling) and conformational exchange on the µs-ms timescale. The reduced spectral density approach thus allows a direct calculation of J(ωN) and Jeff(0) (strictly, Jeff(ωe), where ωe is the magnitude of the effective field in the presence of the spin-lock, expressed as a frequency) from the measured relaxation rates and steady-state NOE. Uncertainties in the spectral densities were determined by repeating the calculations 500 times, using the standard deviations of the NMR measurements and Monte Carlo methods to generate a normal distribution as described in [214, 215].
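The Monte Carlo error estimate described above can be sketched generically: each measured value is resampled from a normal distribution defined by its standard deviation, the derived quantity is recomputed, and the spread of the results is taken as its uncertainty. The sketch below is an assumption-laden illustration, not the procedure of [214, 215]; the derived quantity is a placeholder and all names and numbers are hypothetical.

```python
import random
import statistics


def monte_carlo(func, means, sds, n_trials=500, seed=None):
    """Propagate measurement errors through func by normal resampling.

    means and sds are the measured values and their standard deviations;
    returns the mean and sample standard deviation of func over n_trials draws.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        draw = [rng.gauss(m, s) for m, s in zip(means, sds)]
        results.append(func(*draw))
    return statistics.mean(results), statistics.stdev(results)


def toy_quantity(r1, r1rho):
    """Placeholder derived quantity; a real analysis would call eqs (3.12)-(3.14)."""
    return r1rho - 0.5 * r1


# Hypothetical rates (s^-1) and standard deviations, 500 trials as in the text.
mc_mean, mc_sd = monte_carlo(toy_quantity, means=[1.7, 4.0], sds=[0.05, 0.10], seed=1)
print(f"{mc_mean:.3f} +/- {mc_sd:.3f}")
```

For this linear placeholder the analytical standard deviation is sqrt(0.10² + 0.025²), so the simulated spread can be checked against it; for the non-linear spectral density expressions no such closed form is generally available, which is the reason for simulating.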
The calculations were done using a Mathematica 5.0 notebook that I modified from the original form (written and provided by Leo Spyracopoulos [216]), using the program's built-in simulated annealing protocol [217]; statistical analyses were done with the program JMP IN 5.1 (SAS Institute Inc., Cary, NC).

Relaxation measurements were done at two fields to permit finer mapping of the spectral density and, more specifically, to test the assumptions inherent in the reduced spectral density analysis. In addition, since Rex scales with the square of the applied magnetic field, it is possible to determine the contribution of Rex to equation (3.4) by measuring relaxation parameters at two fields. Thus, Rex and Jeff(0) values were calculated from the relaxation measurements at 600 MHz and 800 MHz as described in [209] using the following relations:

$$J_{eff}(0) = \frac{1}{\beta}\left[\left\{R_{1\rho}^{800} - \kappa R_{1\rho}^{600}\right\} - \frac{3d^2}{8}\left\{J(\omega_N^{800}) - \kappa J(\omega_N^{600})\right\} - \frac{c_{800}^2}{2}\left\{J(\omega_N^{800}) - J(\omega_N^{600})\right\} - \frac{13d^2}{8}\left\{J(0.96\omega_H^{800}) - \kappa J(0.96\omega_H^{600})\right\}\right] \tag{3.15}$$

$$R_{ex} = R_{1\rho}^{600} - \left(\frac{d^2}{2} + \frac{2c_{600}^2}{3}\right)J_{eff}(0) - \left(\frac{3d^2}{8} + \frac{c_{600}^2}{2}\right)J(\omega_N^{600}) - \frac{13d^2}{8}J(0.96\omega_H^{600}) \tag{3.16}$$

where the fields are denoted by their proton Larmor frequency in the superscripts and subscripts, $\kappa = (\omega_H^{800}/\omega_H^{600})^2$, $\beta = (d^2/2)(1 - \kappa)$, d is defined as above since it is field independent, and $c_i$ is the constant c from equations (3.8) and (3.9) evaluated with ωN for field i. In this analysis, the longitudinal relaxation rates in the rotating frame ($R_{1\rho}^{600}$ and $R_{1\rho}^{800}$)
have been used instead of the transverse relaxation rates (used by Farrow et al. [209]) in equations (3.15) and (3.16).

The R1ρ relaxation data were modelled by assuming that the effect of its neighbours (j) on the correlation time of a residue (i) decreases exponentially as the distance from the residue increases, as first described in [211]:

$$R_{1\rho}(i) = R_{1\rho}^{int}\sum_{j=1}^{N} V\exp\left(-\frac{|i-j|}{L}\right) \tag{3.17}$$

where $R_{1\rho}^{int}$ is an intrinsic residue relaxation rate, N is the length of the polypeptide, V is the residue molecular volume [218], and L is the persistence length of the polypeptide in residues.

A different solution to equations (3.3)-(3.5) was proposed by Lipari and Szabo [175, 176], who derived a simplified spectral density function J(ω)LS on the assumption that global molecular reorientation (τc) and fast internal motions (τe) are stochastically uncorrelated [219]:

$$J(\omega)_{LS} = \frac{2}{5}\left[\frac{S^2\tau_c}{1 + (\omega\tau_c)^2} + \frac{(1 - S^2)\tau}{1 + (\omega\tau)^2}\right] \tag{3.18}$$

where 1/τ = 1/τc + 1/τe. The Lipari-Szabo model-free spectral density reduces the number of unknown parameters in equations (3.3)-(3.5) to three: S², the square of the generalized order parameter, which indicates the degree of spatial freedom of the internal motion; τc, the global rotational correlation time for molecular reorientation; and τe, the effective internal rotational correlation time, which is related to both the amplitude and the rate of internal motion. The separability of internal and overall dynamics is questionable for a random coil polymer, but comparisons of the Lipari-Szabo parameters to those obtained for other folded and unfolded proteins can be informative. Relaxation data were analysed using the approach developed by Schurr et al. [220] in which all three Lipari-Szabo parameters are optimized for each residue individually, as this is reported to provide a significantly better fit to the NMR data [220]. The analysis was initially carried out using the simple model in (3.18), but additional models were tested using variations of the extended model-free approach [221] and the Cole-Cole distribution [222–224].
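The simple model-free spectral density in equation (3.18) is straightforward to evaluate numerically. The sketch below is illustrative only; the function name and parameter values are hypothetical, and τe = 0 is treated as the limit in which the internal-motion term vanishes.

```python
def j_lipari_szabo(omega: float, s2: float, tau_c: float, tau_e: float) -> float:
    """Lipari-Szabo spectral density of eq. (3.18).

    omega is in rad/s; tau_c and tau_e are in seconds. With tau_e == 0 the
    internal-motion term is dropped (tau -> 0).
    """
    tau = 0.0 if tau_e == 0 else 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


# Hypothetical parameters: S2 = 0.5, tau_c = 3 ns, tau_e = 50 ps.
omega_n = 2.71e7 * 14.1  # magnitude of the 15N Larmor frequency at 14.1 T, rad/s
print(j_lipari_szabo(0.0, 0.5, 3e-9, 50e-12))
print(j_lipari_szabo(omega_n, 0.5, 3e-9, 50e-12))
```

Setting S² = 1 removes the internal term entirely and leaves the single Lorentzian of a rigid tumbling molecule, which gives a simple limiting case for checking an implementation.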
The extended model-free or two-timescale method proposed by Clore et al. [221] separates the correlation time for internal motions, τe, into fast (τf) and slow (τs) components. In this work [221], it was found that the time evolution of the internal reorientational correlation function probed by NMR, CI(t) in (2.197), was non-exponential when the slow motions were not at the extreme narrowing limit. The proposed solution to describing this behaviour is an expression for the internal correlation function of the form

$$C_I(t) = S^2 + A_f e^{-t/\tau_f} + A_s e^{-t/\tau_s} \tag{3.19}$$

with

$$S^2 + A_f + A_s = 1 \tag{3.20}$$

If τf and τs differ by at least one order of magnitude, then CI(t) will tend towards an intermediate plateau before reaching a final plateau at S². Clore et al. [221] suggest that, with such a separation of timescales, the term 1 − Af could be interpreted as the generalized order parameter for fast motions, denoted S²f. If it is then assumed that the fast motions are axially symmetric and independent of the slow motions, the generalized order parameter can be decomposed into two independent components as

$$S^2 = S_f^2 S_s^2 \tag{3.21}$$
The extended model-free spectral density² can then be expressed as

$$\begin{aligned} J(\omega)_{ext} &= \frac{2}{5}\left[\frac{S^2\tau_c}{1 + (\omega\tau_c)^2} + \frac{(S_f^2 - S^2)\tau_s'}{1 + (\omega\tau_s')^2} + \frac{(1 - S_f^2)\tau_f'}{1 + (\omega\tau_f')^2}\right] \\ &= \frac{2}{5}\left[\frac{S_f^2 S_s^2\tau_c}{1 + (\omega\tau_c)^2} + \frac{S_f^2(1 - S_s^2)\tau_s'}{1 + (\omega\tau_s')^2} + \frac{(1 - S_f^2)\tau_f'}{1 + (\omega\tau_f')^2}\right] \end{aligned} \tag{3.22}$$

where S²s is the generalized order parameter for slow motions (equivalent to 1 − As), 1/τs' = 1/τc + 1/τs and 1/τf' = 1/τc + 1/τf. The extended model-free approach necessitates the use of multiple field measurements or measurements of more than three relaxation rates at the same field (although it is the former that is most often done), since the three relaxation relations in equations (3.3)-(3.5) are not sufficient alone to make estimates of more than three dynamics parameters.

² Note that the relation for the extended model-free spectral density described here differs from that in [221] by a factor of 2/5. The factor has been included here to be consistent with the way the spectral density function has been defined in Chapter 2.

Another approach suggests that for unfolded or disordered proteins, a single local overall rotational correlation time, τc, is not an appropriate description of the dynamics [223–225]. Disordered or denatured proteins consist of an ensemble of rapidly converting conformational states at the nanosecond timescale, and the dynamics at each residue should reflect that ensemble of conformations. The individual residues along the disordered protein may be more appropriately described by a statistical distribution of correlation times on the nanosecond timescale [223–225]. There have been two approaches to this modification: one is to assume that the distribution of correlation times is Lorentzian [225], and the other is to assume that the correlation times follow the Cole-Cole distribution (see below) [222–224]. For the relaxation data in the present analysis, the Cole-Cole distribution was chosen to estimate the overall rotational correlation times as it was more easily implemented in calculations due to its similarity to the standard Lipari-Szabo spectral density. The Cole-Cole distribution function [222–224] is defined as
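$$F(s) = \frac{1}{2\pi}\,\frac{\sin(\varepsilon\pi)}{\cosh(\varepsilon s) + \cos(\varepsilon\pi)} \tag{3.23}$$

where s = ln(τc/τ0), τ0 is the centre of the distribution and ε defines the width of the distribution (0 < ε < 1). The resulting spectral density function based on the Cole-Cole distribution is applied to the model-free formalism to obtain the Cole-Cole spectral density function [223, 224]

$$J_{CC}(\omega) = \frac{2}{5}\left[\frac{S^2\,\omega^{\varepsilon-1}\tau_0^{\varepsilon}\sin\left(\frac{\pi}{2}\varepsilon\right)}{1 + (\omega\tau_0)^{2\varepsilon} + 2(\omega\tau_0)^{\varepsilon}\cos\left(\frac{\pi}{2}\varepsilon\right)} + \frac{(1 - S^2)\tau}{1 + (\omega\tau)^2}\right] \tag{3.24}$$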
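Equation (3.24) can be evaluated with a few lines of code. The sketch below is illustrative and rests on one assumption that is not stated explicitly in the text: the effective time τ in the internal-motion term is taken as 1/τ = 1/τ0 + 1/τe, by analogy with equation (3.18). The function name and parameter values are hypothetical, and ω must be greater than zero because of the ω^(ε−1) factor.

```python
import math


def j_cole_cole(omega: float, s2: float, tau0: float, tau_e: float, eps: float) -> float:
    """Cole-Cole model-free spectral density of eq. (3.24), for omega > 0.

    Assumes 1/tau = 1/tau0 + 1/tau_e for the internal-motion term.
    """
    tau = 1.0 / (1.0 / tau0 + 1.0 / tau_e)
    x = (omega * tau0) ** eps
    half_pi_eps = 0.5 * math.pi * eps
    overall = (s2 * omega ** (eps - 1.0) * tau0 ** eps * math.sin(half_pi_eps)
               / (1.0 + x * x + 2.0 * x * math.cos(half_pi_eps)))
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


# Hypothetical parameters: S2 = 0.5, tau0 = 3 ns, tau_e = 50 ps.
omega_n = 2.71e7 * 14.1  # magnitude of the 15N Larmor frequency at 14.1 T, rad/s
for eps in (1.0, 0.8, 0.6):
    print(eps, j_cole_cole(omega_n, 0.5, 3e-9, 50e-12, eps))
```

Evaluating the function at ε = 1 and comparing the result with the single-Lorentzian form of equation (3.18) is a useful sanity check on an implementation.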
The distribution width is 1 − ε and τ0 is the centre of the distribution. In the event that ε = 1 (i.e., zero width), the Cole-Cole spectral density equation reduces to the Lipari-Szabo relation in equation (3.18) and τ0 becomes equivalent to τc.

Using a program, written with Mathematica 5.0 and based on the single field versions in reference [216], a series of models were tested that utilized the single-field relaxation data alone, as well as two-field data. For the extended and Cole-Cole models, the two-field data were required due to the number of parameters. In several of the tested models, an additional parameter was added to the transverse relaxation rate in equation (3.4), corresponding to the conformational exchange rate [226]:

$$R_2 = R_{1\rho} = \frac{d^2}{8}\left[4J(0) + J(\omega_H - \omega_N) + 6J(\omega_H) + 3J(\omega_N) + 6J(\omega_H + \omega_N)\right] + \frac{c^2}{6}\left[4J(0) + 3J(\omega_N)\right] + R_{ex} \tag{3.25}$$

Rex is the field-dependent exchange rate and is defined as [91]

$$R_{ex} = \Phi_{ex} B_0^2 \tag{3.26}$$

where Φex is the field-independent contribution to the exchange rate. The errors in the Lipari-Szabo and Cole-Cole parameters were determined by Monte Carlo analysis as described above for the spectral density analysis, except that only 100 points were calculated [214, 215]. The tested models (Table 3.4) were evaluated based on both R-factors (Rf) [227] and the Akaike information criterion (AIC) as described in [228], which is based on the χ² test statistic and the number of parameters being optimized. The form of the χ² error function is taken from [229] and is defined as

$$\chi^2 = \sum_{j}^{n}\sum_{i}^{N}\left[\left(\frac{R_{1(i,j)}(calc) - R_{1(i,j)}(exp)}{\delta R_{1(i,j)}}\right)^{2} + \left(\frac{R_{1\rho(i,j)}(calc) - R_{1\rho(i,j)}(exp)}{\delta R_{1\rho(i,j)}}\right)^{2} + \left(\frac{NOE_{(i,j)}(calc) - NOE_{(i,j)}(exp)}{\delta NOE_{(i,j)}}\right)^{2}\right] \tag{3.27}$$

where i is the residue index, N is the number of residues, j is the field index and n is the number of fields. The terms exp and calc refer to the experimental and back-calculated (from model estimates) values for the relaxation parameters, respectively. δX (X being either R1, R1ρ or NOE) is the estimated error in the relaxation parameter, either experimental or estimated from Monte Carlo simulation. The AIC value for a given model is then χ² + 2p, where p is the number of parameters being optimized.
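The χ² error function of equation (3.27) and the AIC defined above reduce to a few lines of code once the experimental and back-calculated values are arranged as matching lists. This is a minimal sketch with hypothetical names and numbers; the lists are assumed to be flattened over residues, fields and the three observables (R1, R1ρ, NOE).

```python
def chi_squared(calc, exp, err):
    """Sum of squared, error-weighted residuals, as in eq. (3.27).

    calc, exp and err are equal-length sequences of back-calculated values,
    experimental values and estimated errors.
    """
    return sum(((c - e) / s) ** 2 for c, e, s in zip(calc, exp, err))


def aic(chi2: float, n_params: int) -> float:
    """Akaike information criterion in the form used in the text: chi^2 + 2p."""
    return chi2 + 2 * n_params


# Hypothetical R1, R1rho and NOE values for one residue at one field.
experimental = [1.70, 4.00, -0.60]
back_calculated = [1.68, 4.10, -0.55]
errors = [0.05, 0.10, 0.05]

chi2 = chi_squared(back_calculated, experimental, errors)
print(f"chi2 = {chi2:.2f}, AIC (3 parameters) = {aic(chi2, 3):.2f}")
```

Because the penalty term grows with the number of optimized parameters, a model with more parameters is preferred only if it lowers χ² by more than twice the number of parameters added.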
Table 3.4: Models tested using Lipari-Szabo and Cole-Cole model-free methods

| Model (a) | Model type | Optimised parameters | Fixed parameters | Field |
|---|---|---|---|---|
| model 1 | LS (b) | S², τc | τe = 0; Rex = 0 | 14.1 T; 18.8 T; 14.1 T, 18.8 T |
| model 2 | LS | S², τc, τe | Rex = 0 | 14.1 T; 18.8 T; 14.1 T, 18.8 T |
| model 3 | LS | S², τc, τe, Rex | – | 14.1 T, 18.8 T |
| model 4 | LS(ext) (c) | S²f, S²s, τc, τs | τf = 0; Rex = 0 | 14.1 T, 18.8 T |
| model 5 | LS(ext) | S²f, S²s, τc, τs, τf | Rex = 0 | 14.1 T, 18.8 T |
| model 6 | LS(ext) | S²f, S²s, τc, τs, τf, Rex | – | 14.1 T, 18.8 T |
| model 7 | CC (d) | S², τ0, τe, ε | Rex = 0 | 14.1 T, 18.8 T |
| model 8 | CC | S², τ0, τe, ε, Rex | – | 14.1 T, 18.8 T |

(a) Models 1 and 2 can be used with relaxation measurements at a single field but models 3–8 require measurements from at least two fields.
(b) LS denotes the per residue model-free variation of the Lipari-Szabo spectral density proposed by Schurr et al. [220]
(c) LS(ext) denotes model-free estimates using the Clore et al. [221] variation of the Lipari-Szabo spectral density for fast and slow internal motions in eq. (3.22)
(d) CC denotes model-free estimation using the Cole-Cole spectral density function proposed by Buevich et al. [223, 224] given in eq. (3.24)

Models were selected initially based on R-factors (Rf) such that models with reasonably low Rf values (0.15 or less) were then used for determining the Monte Carlo error estimates. The best model was then selected from a reduced set of models with (Rf < 0.15) based on the mean AIC ± the standard deviation for the model.
3.11 pH and Hydrogen Exchange
The dialysate from a ¹³C/¹⁵N-labelled His-tagged Tat1−72 preparation was separated into two equal portions and freeze-dried separately. One half of the freeze-dried protein was dissolved in 550 µL of a degassed aqueous solution of 80 µM sodium sulfite, 0.02% sodium azide and 5% D2O (i.e., no buffer). The resulting protein solution was at pH 3.3 and was then added under an argon atmosphere to an NMR sample tube. Subsequently, a ¹H/¹⁵N-HSQC spectrum was collected under the same conditions as described previously. Following the acquisition of the HSQC spectrum, a 50 µL aliquot of 0.6 M degassed MES buffer at pH 6 was added to the sample tube under an argon atmosphere. The pH of the resulting solution was quickly measured under ambient atmosphere and then the sample was degassed. The resulting solution was at pH 5.3. A second ¹H/¹⁵N-HSQC spectrum was collected and a small aliquot of degassed 1.0 M sodium hydroxide was added. The pH of the resulting solution, measured under ambient atmosphere, was 5.8, and the sample was then degassed. Following acquisition of a ¹H/¹⁵N-HSQC spectrum of the pH 5.8 protein solution, another small aliquot of degassed 1.0 M sodium hydroxide was added; the new pH was measured (under ambient atmosphere) to be 6.7 and the sample was degassed. A final ¹H/¹⁵N-HSQC spectrum was collected. At pH 6.7, the protein solution was at the limit of the effective buffering range of MES (pH 5.5-6.7). All pH measurements were done under ambient atmospheric conditions as quickly as possible and the sample was degassed immediately following the pH measurement in order to minimize the chance of oxidation of the protein.

The resulting ¹H/¹⁵N-HSQC spectra (at pH 3.3, 5.3, 5.8 and 6.7) were analysed along with the pH 4.1 spectra (described previously) and the peak heights and widths were tabulated. In order to determine if the observed losses in peak intensity with increasing pH are a result of the increasing rates of hydrogen exchange with the water solvent, the theoretical hydrogen exchange rates for unfolded Tat at pH values of 3.3, 4.1, 5.3, 5.8 and 6.7 were calculated taking into account the nearest neighbour inductive and steric effects [230]. Predicted hydrogen exchange rates were determined using a Microsoft Excel spreadsheet provided by Walter Englander, University of Pennsylvania School of Medicine (HX2.med.upenn.edu). The spreadsheet determines the intrinsic hydrogen exchange rates for a protein in a fully opened (unprotected) conformation at any pH and temperature, and includes the influence of neighbouring side-chains.
Chapter 4 Results

4.1 Protein Expression and Purification
Growth of E. coli BL21(DE3)pLysS cells containing pET28tat in TB typically yielded about 10 g of cells (wet weight) per litre of TB medium. Yields of E. coli are reduced by about half when the cells are grown in 1 L of ¹³C/¹⁵N labelling medium (M9) using cells from 4 L of TB, as described in Chapter 3. Typically, in both unlabelled and labelled protein purifications, the protein dialysate is free of visible precipitate. UV absorbance measurements of the protein dialysate at 280 nm (calculated ε280 = 9090 M⁻¹ cm⁻¹ [231]) were used to determine protein yields of His-tagged Tat. Typically, up to 20 mg of unlabelled protein and 15 mg of ¹³C/¹⁵N-Tat1−72 are recovered from cells grown in 1 L of TB and minimal medium, respectively. Attempts to remove the hexahistidine (6×His) affinity tag by thrombin cleavage, followed by re-purification on the metal affinity column, resulted in a significant loss of protein. Possible reasons for the problems associated with thrombin cleavage are that the protein contains a potential internal thrombin cleavage site (see Fig 4.1) between Lys-61 and Ala-62 [232] and that the protein contains a possible thrombin inhibitory segment Arg-Pro-Pro (residues 76–78) [233]. NMR analysis [166] (below) suggests that there is no interaction between the affinity tag and any other segment of the protein, and this has been generally found to be the case for a large number of proteins containing polyhistidine purification tags [234].
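The yield estimate from the absorbance at 280 nm follows the Beer-Lambert law, A = ε·c·l. The Python sketch below shows the arithmetic under stated assumptions: a 1 cm path length, the calculated extinction coefficient given above, and the calculated molecular weight of the unlabelled His-tagged monomer quoted in Section 4.2 (10,509.076 Da). The absorbance reading and the dialysate volume are invented for illustration and are not measurements from this work.

```python
EPSILON_280 = 9090.0    # M^-1 cm^-1, calculated extinction coefficient from the text
MOLAR_MASS = 10509.076  # g/mol, calculated MW of unlabelled His-tagged Tat(1-72)


def protein_mass_mg(a280, volume_ml, path_cm=1.0, dilution=1.0):
    """Protein mass (mg) in a sample, from its absorbance at 280 nm.

    a280      absorbance of the (possibly diluted) sample
    volume_ml volume of the undiluted sample in mL
    path_cm   cuvette path length in cm (assumed 1 cm by default)
    dilution  dilution factor applied before the reading
    """
    molar_conc = a280 * dilution / (EPSILON_280 * path_cm)  # mol/L
    mg_per_ml = molar_conc * MOLAR_MASS                     # g/L equals mg/mL
    return mg_per_ml * volume_ml


# Hypothetical reading: A280 = 0.50 for 30 mL of undiluted dialysate.
mass = protein_mass_mg(0.50, 30.0)
```

For an isotopically labelled preparation the molar mass would be higher than the unlabelled value used here, so the constant would need to be adjusted accordingly.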
  1  MGSSHHHHHH SSGLVPRGSH  20
 21  MEPVDPRLEP WKHPGSQPKT  40
 41  ACTNCYCKKC CFHCQVCFIT  60
 61  KALGISYGRK KRRQRRRPPQ  80
 81  GSQTHQVSLS KQ          92

Figure 4.1: Amino acid sequence of His-tagged Tat1−72. The affinity tag residues (1–20) are shown in normal face and Tat residues (21–92) are shown in bold face.
4.2 Monomer Identification: MALDI-TOF-MS
As indicated in Figure 4.2, mass spectrometry, and in particular MALDI-MS, is an effective approach to ascertaining both the purity and the oligomeric state of the protein. A significant advantage of MALDI-MS over sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is that the former method can maintain the protein at a low pH where the cysteine residues are protonated and unreactive. The MALDI-TOF mass spectrum shown in Figure 4.2 indicates that there is one predominant peak at 10,519.8 Da corresponding to the [M+H]+ species for the unlabelled His-tagged Tat monomer (calculated MW 10,509.076 Da). A second, less intense peak at 5256.5 Da is likely the [M+2H]2+ peak. These two peaks corresponding to the monomeric Tat protein make up 89% of the total intensity of the non-matrix related peaks. The additional weak peaks at 7376.4, 21028.7 and 31345.6 Da are most likely the [2M+3H]3+, [2M+H]+, and [3M+H]+ species, respectively. Similar peaks are often observed in MS and are usually ascribed to non-covalent protein oligomer formation mediated by interactions between basic residues (Arg, Lys, and His) and acidic residues (Asp and Glu) in proteins [235]. The low intensity of these peaks in the present spectrum may be explained by the high net positive charge on Tat at low pH, suggesting that there is minimal Coulombic attraction between the proteins.

[Figure 4.2 plot: relative intensity (0–100) versus m/z (0–100000), with peaks labelled [M+H]+, [M+2H]2+, [2M+3H]3+, [2M+H]+ and [3M+H]+.]
Figure 4.2: MALDI-TOF-MS identification of monomeric unlabelled His-tagged Tat1−72
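For comparison with the observed peaks, the theoretical m/z of a protonated species [nM+zH]z+ can be computed from the calculated monomer mass as (n·M + z·m_p)/z. The Python sketch below does this for the species named above. The proton mass constant and the helper name are assumptions of this example, and the calculation ignores isotope distribution, adducts and instrument calibration.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton
MONOMER_MASS = 10509.076  # Da, calculated MW of unlabelled His-tagged Tat(1-72)


def theoretical_mz(monomer_mass, n_monomers, charge):
    """m/z of the ion [nM + zH]^z+ for a given monomer mass."""
    return (n_monomers * monomer_mass + charge * PROTON_MASS) / charge


# (number of monomers, charge) for each species named in the text.
species = {
    "[M+H]+": (1, 1),
    "[M+2H]2+": (1, 2),
    "[2M+3H]3+": (2, 3),
    "[2M+H]+": (2, 1),
    "[3M+H]+": (3, 1),
}
theoretical = {
    label: theoretical_mz(MONOMER_MASS, n, z) for label, (n, z) in species.items()
}
```

The resulting dictionary can be tabulated next to the observed peak positions to judge how well each assignment fits.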
4.3 NMR Spectroscopy and Resonance Assignments
Dissolution of freeze-dried protein dialysate at pH 4 usually yielded solutions free of visible precipitate and free of suspended material based on the near UV absorption spectra. The natural abundance ¹H/¹⁵N-HSQC spectrum of unlabelled Tat1−72 shown in Figure 4.3 shows 64 of the 83 observable amide backbone resonances (non-proline and non-N-terminal) as well as 9 peaks corresponding to the Arg, Gln, and Asn side chain resonances. The ¹H/¹⁵N-HSQC spectrum of the ¹³C/¹⁵N-labelled Tat1−72 in Figure 4.4 shows the same peaks as in the unlabelled protein and backbone resonances missing from the unlabelled sample, as well as some additional weaker resonances that correspond to backbone residues that are undergoing slow conformational exchange (ms-s range). In general, both samples show cross-peaks regionally clustered in a manner typical of denatured or disordered proteins: a Gly region, a Ser/Thr region and a region containing the rest of the backbone amides [204]. The spectral dispersion of the resonances is also typical for proteins lacking regular secondary structure in that all backbone resonances lie within a 1.1 ppm range in the ¹H dimension and within 20 ppm in the ¹⁵N dimension [236, 237].
Figure 4.3: Amide backbone region of a ¹H/¹⁵N-HSQC spectrum (192 scans) for naturally abundant ¹⁵N in unlabelled His-tagged Tat1−72 acquired on a Varian INOVA 600 MHz spectrometer at pH 4.1 and 293 K. [Plot axes: HN (ppm) versus N (ppm).]
Figure 4.4: (a) ¹H/¹⁵N-HSQC spectrum of ¹³C/¹⁵N-labelled His-tagged Tat1−72 at pH 4.1 and 293 K recorded on a Varian INOVA 600 MHz spectrometer. Backbone amide region with assignment of 80 of the 83 non-proline and non-N-terminal resonances (side-chain Asn and Gln NH2 resonances are outlined with a solid ellipse and the side-chain amide resonances of Arg are outlined with a dashed ellipse). Inset region shows the three peaks associated with the side chain of the single Trp residue (Trp-31). (b) Expanded region of (a) in dashed rectangle. Cys residues in (a) and (b) are shown in bold face. [Plot axes: HN (ppm) versus N (ppm); cross-peaks are labelled with residue assignments.]
The observed chemical shifts of the cross-peaks do not differ significantly between the unlabelled (Fig. 4.3) and ¹³C/¹⁵N-labelled (Fig. 4.4(a)) samples, indicating that the proteins are in the same conformational state. Peaks that are missing in the spectrum of the unlabelled protein correspond to those that are of relatively weak intensity in the ¹³C/¹⁵N-labelled protein and are therefore absent due to the sensitivity limitations of the natural abundance experiment. Many of the missing peaks in the natural abundance HSQC spectrum correspond to amide backbone resonances in the Cys-rich and core regions of the protein [166]. These resonances are the weakest in the spectrum and in some cases are associated with multiple cross-peaks observed in the HSQC spectrum of the ¹³C/¹⁵N-labelled Tat. The multiple cross-peaks may indicate conformational exchange on the µs-ms timescale in these regions and hence transient structure formation, which may only become stabilized in the presence of zinc ions or on binding to TAR, cyclin T1, or other binding partners. In addition to the chemical shift dispersion of resonances, the ¹H/¹⁵N-HSQC spectra of Tat in Figs. 4.3 and 4.4 show that the peaks exhibit a range of intensities, with nearly all the weak and medium intensity cross-peaks falling in the sequence between Cys-47 and Leu-63, as seen in the intensity profile depicted in Figure 4.5. The weakness in intensity in this range of the sequence suggests that this region of the protein is likely undergoing conformational exchange on the ms-µs timescale, indicating possible transient structure formation.
Figure 4.5: Relative intensities of the amide backbone resonances from a ¹H/¹⁵N-HSQC spectrum of His-tagged Tat1−72 at pH 4.1 and 293 K. The solid horizontal line denotes the mean relative intensity of 0.42 (s.d.=±0.26). [Plot axes: relative intensity (0–1) versus residue number.]

The ¹H/¹⁵N-HSQC spectrum in Figure 4.4 reveals 40 more cross-peaks (mostly very weak) than can be accounted for by the backbone and side-chain atoms of a 92-residue protein. Seventeen residues have multiple cross-peaks which have been sequentially assigned (indicated with the designation ‘a’ in Table A.1, Appendix A); some of the peaks were assigned to amino acid identity only, and some could not be unambiguously assigned (see Table A.2 in Appendix A). Thirteen of the unambiguously assigned minor resonances fall in the region spanned by residues Cys-45 to Arg-69. One example of the multiplicity of cross-peaks is shown for the single Trp at position 31 (see Fig. 4.4, inset), which exhibits one strong and two weaker side-chain indole amine cross-peaks. The Trp is preceded by a Pro, so one of the two minor peaks could arise from Trp bonded to a Pro cis-peptide bond isomer, but this is unlikely (see Ch. 5). Another possible explanation is that some of the minor peaks are due to the presence of minor amounts of oxidized Cys residues that are unobserved in the lower-resolution three-dimensional experiments.

Another interesting example of peak multiplicity is Gly-64, which exhibits two amide cross-peaks of approximately equal intensity (Figures 4.3 and 4.4). The nearest proline to Gly-64 is separated from it by 14 residues, ruling out cis-trans proline isomerism as an explanation of the peak multiplicity. This suggests that some segments of the reduced, monomeric Tat protein exist in multiple conformations that are in slow equilibrium on the chemical shift time scale (ms-s). In the case of Gly-64, the two resonances have comparable intensity, suggesting equal populations of two conformers, whereas in many other cases one resonance is significantly more intense than those arising from alternate conformers, suggesting one dominant conformer and minor alternates. Gly-64 is located between Leu and the β-branched Ile and it is possible that steric crowding could locally restrict the dynamics of the Gly amide. Since the intensities of the duplicate peaks vary across the sequence, the populated conformers most likely arise from local interactions, as expected in a disordered protein.

Sequential assignments of ¹HN, ¹⁵N, C’, Cα, Hα, and Cβ resonances were done entirely with 3D heteronuclear triple resonance experiments that use one- and two-bond scalar couplings to connect the atoms [169]. These experiments take advantage of the comparatively wide chemical shift dispersions of ¹⁵N and ¹³C resonances in unfolded proteins [238, 239]. All the backbone resonances were sequentially assigned except for Met-1, Met-21, Phe-52, Phe-58, Arg-77, and Pro-78. The Met-1 amino and Gly-2 amide protons exchange too rapidly to be observed. Resonances from Phe-52/58 could be assigned to residue type only, because they are both preceded by weak Cys resonances. Arg-77 and Pro-78 are part of the difficult sequence Arg-Arg-Arg-Pro-Pro and could not be unambiguously sequentially assigned. Of the assigned Pro resonances, all but one have Cβ chemical shifts characteristic of the trans peptide. The Cβ shifts of Pro-38 are outside the canonical chemical shifts for the trans configuration [204] but are nearer to the trans than they are to the cis configuration. As an example, parts of the HN(CA)CO and HNCACB experiments used for the backbone assignment of residues Gln-83–Leu-89 are shown in Figure 4.6. The assignments are listed in Table A.1 in Appendix A.
Figure 4.6: Strip plots extracted from three-dimensional, amide-detected heteronuclear NMR experiments for backbone assignment. Inter- and intra-residual correlations are obtained from (a) an HN(CA)CO [197] spectrum correlating HN(i) and N(i) with C’(i) and C’(i-1) resonances; and (b) an HNCACB [194] spectrum correlating HN(i) and N(i) with Cα(i), Cα(i-1), Cβ(i), and Cβ(i-1) resonances. A segment of the His-tagged Tat1−72 at pH 4.1 and 293 K is shown depicting connectivity between residues 83–89. Correlations in (a) are shown with long dashed lines; in (b) the Cα correlations are connected with solid lines and correlations of Cβ are connected with short dashed lines. Both experiments were recorded on a Varian INOVA 600 MHz spectrometer. [Strip plot axes: HN (ppm) and N (ppm) versus (a) CO (ppm) and (b) Cα/Cβ (ppm).]

The assignments of the Cys residues (shown in bold-face in Fig. 4.4) are particularly informative as they confirm that all of the Cys residues are reduced; all of the Cα and Cβ chemical shifts shown in Figure 4.7, observed in the three-dimensional HNCACB [194] and CBCA(CO)NH [195] spectra, are in the range of the random coil chemical shifts of reduced cysteine (58.6 ppm and 28.3 ppm) [204], differing significantly from those of oxidized cysteine (55.6 ppm and 41.2 ppm) [204] involved in disulfide bond formation. The chemical shifts of the Cys residues thus confirm the findings from the MALDI-TOF-MS analysis that the protein is in the reduced monomeric state and that the weak peaks in the mass spectrum of Figure 4.2 most likely indicate the presence of non-covalent oligomers formed during the MS analysis.
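The comparison of Cys Cα/Cβ shifts with the reduced and oxidized reference values can be sketched in a few lines of Python. The nearest-reference rule used here is a simplification chosen for this illustration, not the procedure followed in this work, and the two example shift pairs are hypothetical rather than values from Table A.1.

```python
# Random coil reference shifts (C-alpha, C-beta) in ppm, as quoted in the text [204].
REDUCED_CYS = (58.6, 28.3)
OXIDIZED_CYS = (55.6, 41.2)


def classify_cys(c_alpha, c_beta):
    """Label a Cys as 'reduced' or 'oxidized' by its nearest reference shift pair."""

    def distance(reference):
        return ((c_alpha - reference[0]) ** 2 + (c_beta - reference[1]) ** 2) ** 0.5

    if distance(REDUCED_CYS) < distance(OXIDIZED_CYS):
        return "reduced"
    return "oxidized"


# Hypothetical shift pairs for illustration only.
state_a = classify_cys(58.2, 28.0)
state_b = classify_cys(55.9, 40.5)
```

Because the Cβ shifts of the two oxidation states differ by about 13 ppm, the Cβ value dominates the classification in practice.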
Figure 4.7: Strip plots for cysteine residues (Cys-42, Cys-45, Cys-47, Cys-50, Cys-51, Cys-54 and Cys-57) from 3D HN-detected HNCACB [194] and CBCA(CO)NH [195] spectra of the ¹³C/¹⁵N-labelled His-tagged Tat1−72. The HNCACB spectrum correlates each amide HN(i) with its attached N(i) and the Cα and Cβ of the (i) and (i-1) residues. The corresponding strips from the CBCA(CO)NH spectrum correlate each amide HN(i) with its attached N(i) and the Cα and Cβ of the preceding (i-1) residue only. Both spectra were recorded on the same sample at pH 4.1 and 293 K on a Varian INOVA 600 MHz spectrometer. [Strip plot axes: ¹H (ppm) and ¹⁵N (ppm) versus ¹³C (ppm), with Cα and Cβ peaks of the (i) and (i-1) residues labelled.]
4.4 Chemical Shifts and ³JHNHα Coupling Constants
The NMR chemical shift is a sensitive indicator of conformation, and assignment of backbone chemical shifts permits an analysis of secondary structure by comparison to random coil values corrected for local sequence effects [203, 240]. Consensus multinuclear (C’, Cα, Cβ, and Hα) chemical shift indexing (CSI) [241] (data not shown) suggests that the reduced Tat protein at pH 4.1 exists in a random coil conformation. Only three residues (Cys-54, Gln-55, and Val-56) indicate a tendency towards α-helical conformation, but since one turn of an α-helix consists of 3.6 amino acids, at least four consecutive residues are required for identification of an α-helix [7]. It is also possible that these three residues constitute a short ‘turn’ or a nascent helix [203]. However, the consensus CSI calculations are not corrected for local sequence effects. Examination of the individual chemical shift difference plots shown in Figure 4.8, corrected for the sequence effects on the chemical shifts [203, 204], indicates that a majority of the resonances are within the random coil range, and that rarely are there more than 3 consecutive resonances in either the α-helix or β-sheet chemical shift ranges. However, among the HN (Fig. 4.8(a)) and Hα (Fig. 4.8(d)) resonances there appears to be a slight weighting of the conformations toward the α-helix, the most consistent classification being for the segment around Glu-29. Unlike some other denatured proteins [211, 242, 243], there is less evidence of a tendency to the β-sheet conformation, perhaps because of a lack of hydrophobic β-branched amino acids in Tat1−72 (two Ile and four Val).

The conclusions based on the chemical shift differences are supported by the ³JHNHα measurements (uncorrected for Hα relaxation), which are all in the range of 5.5–7.1 Hz with a mean value of 6.7 Hz, characteristic of unfolded molecules [244]. The results are shown in Figure 4.8(g), in which the differences of the measured values from random coil values corrected for sequence effects of the preceding residue (β-branched or aromatic) according to Penkett et al. [205] and Smith et al. [244] are presented (Gly and Pro residues are omitted). They show that the entire polypeptide is undergoing rapid sampling of the α-helix and β-sheet regions of Ramachandran space with a slight preference for the α-helix in most segments.
Figure 4.8: Chemical shift difference plots of: (a) HN, (b) N, (c) C’, (d) Hα, (e) Cα and (f) Cβ. (g) Difference plot of the ³JHNHα coupling constant from random coil coupling constant (corrected for sequence effects of the preceding residue). The random coil values for HN, N, C’, Cα, and Hα have been adjusted for sequence dependence [203] (correction factors for Cβ are unavailable). Reference lines in plots (a)–(f) correspond to thresholds where chemical shift differences reflect secondary structure formation. The plot ranges in (a)–(f) correspond to two standard deviations from the mean value determined from the chemical shift tables in the BioMagResBank database (URL: www.bmrb.wisc.edu). [Plot axes: chemical shift difference (ppm) in (a)–(f), or ³JHNHα difference (Hz) in (g), versus residue number, with α and β threshold lines.]
Since there is a lack of long range homonuclear ¹H-¹H NOEs for Tat, it is not possible to obtain a high resolution structure of the protein. However, it is possible to obtain an initial representation of the extended structure of Tat using the THRIFTY web server [245], which generates a PDB structure file of atomic coordinates based on the measured backbone chemical shifts. The resulting PDB structure file can provide an estimate or ‘snapshot’ of the state of the protein through comparison of its chemical shifts to those observed for other proteins in the PDB. A space-filled model based on the THRIFTY generated PDB file for His-tagged Tat1−72 is shown in Figure 4.9. The resulting model shows Tat to be in an extended disordered state containing a few short turns at residues Pro-30 to Lys-39 and Cys-42 to Lys-48.

Figure 4.9: Space-filled model of the extended disordered form of His-tagged Tat1−72 at pH 4.1 and 293 K, determined using the backbone chemical shifts and the THRIFTY web server [245] generated PDB structure file (left-to-right N-terminal to C-terminal). Regions of the protein are coloured according to: His-tag (residues 1-20) in light grey; acidic or Pro-rich region (residues 21-41) in red with the exception of Glu-29 (dark grey) and Trp-31 (olive); Cys-rich region (residues 42-57) in yellow; core (residues 58-67) in purple; basic region (residues 68-77) in blue; and C-terminal Gln-rich region (residues 78-92) in orange. Model generated using MacPyMol molecular graphics system, version 0.99 [246].
4.5 NMR Relaxation
Relaxation data (R1, R1ρ and heteronuclear steady-state NOEs) were measured for 64/83 observable (non-proline and non-N-terminal) resonances at 600 MHz (60/83 at 800 MHz). Sample spectra for the saturation and no-saturation steady-state heteronuclear NOE are shown for His-tagged Tat1−72 in Figure 4.10. The steady-state ¹H-¹⁵N NOE values were obtained (as described in Section 3.9) from ratios of peak heights from experiments with (INOE) and without (InoNOE) saturation of the protons for 5 s at the beginning of the experiment. The heteronuclear NOE values were then obtained from (INOE − InoNOE)/InoNOE. As indicated in Figure 4.11(a), the steady-state heteronuclear NOEs measured at 600 MHz and 800 MHz exhibit a relatively featureless, flattened bell-shaped variation with amino acid sequence, as expected for an unfolded protein [247]. The observed NOEs range from −3.3 (−2.6) to −0.60 (−0.41) with mean values of −1.27 (−0.933) at 600 MHz (and 800 MHz). For comparison, an average NOE of about −0.2 is observed for several folded proteins with similar lengths of polypeptide chain [248–250]. The more negative NOE values for Tat indicate much less restricted dynamics on the ns-ps timescales than for folded proteins. The ends of Tat exhibit the most negative values, indicative of faster dynamics, and the values gradually increase away from the C-terminus whereas the increase away from the N-terminus is steeper. Significant deviations from the average values are observed for Thr-43, Lys-61 and Ala-62.
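The NOE expression above is a simple ratio of peak heights, sketched here in Python. The function name and the peak heights are hypothetical; they are chosen only so that the result matches the order of magnitude of the values reported for Tat.

```python
def heteronuclear_noe(i_noe, i_no_noe):
    """Steady-state NOE from peak heights with (i_noe) and without (i_no_noe)
    proton saturation, using (I_NOE - I_noNOE) / I_noNOE as in the text."""
    if i_no_noe == 0:
        raise ValueError("reference peak height (no saturation) must be non-zero")
    return (i_noe - i_no_noe) / i_no_noe


# Hypothetical peak heights in arbitrary units: the saturated peak is inverted.
noe_value = heteronuclear_noe(i_noe=-0.27, i_no_noe=1.00)
```

With this definition, a peak that is unchanged by saturation gives an NOE of zero, and a peak that inverts on saturation gives a value below −1.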
Figure 4.10: Sample spectra for the steady state heteronuclear ¹H-¹⁵N NOE of His-tagged Tat1−72 at pH 4.1 and 293 K. (a) no saturation period (noNOE) and (b) 5 s saturation period (NOE). Spectra were recorded on a Varian INOVA 600 MHz spectrometer. [Plot axes: HN (ppm) versus N (ppm).]
Figure 4.11: Relaxation measurements of the His-tagged Tat1−72 protein at pH 4.1 and 293 K, determined at 14.1 T and 18.8 T field strengths for: (a) heteronuclear NOE, (b) longitudinal relaxation, R1, (c) rotating-frame longitudinal relaxation, R1ρ, and (d) R1ρ data at 14.1 T field strength plotted along with the predicted behaviour for a random-coil polymer of uniform composition (solid line) and the variation in relaxation when residue contributions are weighted by residue volume (dashed line) [211, 218]. [Plot axes: (a) ¹H-¹⁵N NOE, (b) R1 (s⁻¹), (c) and (d) R1ρ (s⁻¹), each versus residue number.]
The R1 measurements show a similar bell-shaped profile with two notable features (Fig. 4.11(b)). The R1 values range from 0.75 s⁻¹ (0.89 s⁻¹) to 1.91 s⁻¹ (1.92 s⁻¹) with mean values of 1.45 s⁻¹ (1.43 s⁻¹) at 14.1 T (at 18.8 T) field strength. Several folded proteins of similar size show slightly higher average R1 values on the order of 1.5 s⁻¹ at 14.1 and 17.6 T field strengths [249, 251]. The slower relaxation in Tat indicates a shorter rotational correlation time and faster dynamics on the ns-ps timescale than for a folded protein. Similar to the NOE values, the R1 rates decline near the ends of the protein, and more steeply at the N-terminus than the C-terminus. The lowest rates, apart from the termini, are found in the segment connecting the Cys-rich and basic regions, between Thr-60 and Ser-66, and suggest fast dynamics there. Examples of the T1 and T1ρ relaxation series spectra are shown in Figure 4.12, along with examples of the exponential fits of the relaxation times for Gly-68 in Figures 4.13 and 4.14.
Figure 4.12: Sample spectra for (a) T1 (50 ms relaxation time) and (b) T1ρ (30 ms relaxation time) relaxation series for His-tagged Tat1−72 at pH 4.1 and 293 K. Spectra recorded on Varian INOVA 600 MHz spectrometer. [Plot axes: HN (ppm) versus N (ppm).]

Figure 4.13: Sample fits for T1 of Gly-68 measured at (a) 14.1 T (T1 = 658 ± 1 ms) and (b) 18.8 T (T1 = 652 ± 2 ms) field strengths. [Plot axes: amplitude (0–1) versus relaxation time (0–4000 ms).]

Figure 4.14: Sample fits for T1ρ of Gly-68 measured at (a) 14.1 T (T1ρ = 392 ± 1 ms) and (b) 18.8 T (T1ρ = 363 ± 1 ms) field strengths. [Plot axes: amplitude (0–1) versus relaxation time (0–250 ms).]
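The relaxation times in Figures 4.13 and 4.14 come from fitting a single exponential decay to peak amplitudes measured at a series of relaxation delays. The Python sketch below shows one simple way to do such a fit, by linear least squares on the logarithm of the amplitude. This log-linear approach is an assumption of the example (it requires strictly positive amplitudes and treats noise differently from a direct nonlinear fit) and is not necessarily the fitting procedure used in this work. The delays and amplitudes are synthetic, generated from a decay with T = 658 ms.

```python
import math


def fit_exponential_decay(delays_ms, amplitudes):
    """Fit A(t) = A0 * exp(-t / T) by least squares on ln(A).

    Returns (A0, T) with T in the same units as the delays.
    All amplitudes must be strictly positive.
    """
    xs = list(delays_ms)
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -1 / T
    intercept = mean_y - slope * mean_x  # equals ln(A0)
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free data for illustration only.
delays = [50, 250, 500, 1000, 2000, 4000]
synthetic = [math.exp(-t / 658.0) for t in delays]
a0, time_constant_ms = fit_exponential_decay(delays, synthetic)
rate_per_s = 1000.0 / time_constant_ms  # relaxation rate R = 1 / T, in s^-1
```

The last line converts the fitted time constant to a rate, which is the form in which the relaxation data are reported in this section.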
The rotating frame longitudinal relaxation rates (R1ρ) measured for Tat at 14.1 T and 18.8 T fields are plotted in Figure 4.11(c). The R1ρ rates range from 1.5 s⁻¹ (1.3 s⁻¹) to 5.9 s⁻¹ (7.2 s⁻¹) with mean values of 3.26 s⁻¹ (3.29 s⁻¹) at 14.1 T (at 18.8 T) field strength. The R2 rates measured at 14.1 T (Fig. 4.15) range from 1.6 s⁻¹ to 7.1 s⁻¹ with an average value of 3.5 s⁻¹. The differences between the R2 and the R1ρ rates presumably arise from contributions to the former from slow conformational exchange. An example of the T2 relaxation series is shown in Figure 4.16 and the exponential fit of the data for Gly-68 is shown in Figure 4.17.

Figure 4.15: Transverse relaxation rates (R2) for His-tagged Tat1−72 at pH 4.1 and 293 K determined at 14.1 T field strength. [Plot axes: R2 (s⁻¹) versus residue number.]

Figure 4.16: Sample spectrum for T2 (50 ms relaxation time) relaxation series for His-tagged Tat1−72 at pH 4.1 and 293 K. Spectrum recorded on a Varian INOVA 600 MHz spectrometer (14.1 T field). [Plot axes: HN (ppm) versus N (ppm).]

Figure 4.17: Sample fit for T2 of Gly-68 measured at 600 MHz (T2 = 323 ± 6 ms). [Plot axes: amplitude (0–1) versus relaxation time (0–250 ms).]
In general, low R1ρ and R2 measurements indicate unrestricted fast dynamics whereas high values suggest restricted fast dynamics and possible contributions from slow conformational exchange [250]. In folded proteins of similar length to Tat the R2 values in the absence of exchange are on the order of 8 s−1 [249]. The low R1ρ and R2 values and the negative NOE values measured for Tat indicate large amplitude fluctuations on the ns-ps timescale characteristic of a random coil-like conformation. The R1ρ relaxation data obtained at the 14.1 T field were fit to equation (3.17) in which the influence of neighbouring residues is modelled as a decaying exponential [211]. The flattened, bell-shaped solid curve (Fig. 4.11(d)) shows the behaviour predicted for a randomcoil polymer of uniform composition. The dashed line shows the variation in relaxation when residue contributions are weighted by residue volume [218]. Although a number of individual residues deviate from the volume-weighted model, overall the theoretical line follows the data fairly closely and is a much better fit than the uniform polymer model. The minima in the model correspond mainly to the small flexible residues Gly, Ala, and Ser whereas the maxima are found at the positions of Trp, Arg, and Lys. In contrast to some other applications of this model to denatured proteins [211, 252, 253] there are no obvious regions with large positive deviations from the theoretical curve, further evidence that reduced Tat1−72 at pH 4.1 is predominantly disordered. The segment from Pro-23–Pro-38 contains five prolines and the measured R1ρ values for most residues in this region are greater than the calculated ones. This suggests that the prolines restrict dynamics on the ms-µs timescale and stiffen the backbone in this region. In the C-terminus, from residue 60 onwards, the measured values generally fall below the calculated ones suggesting greater flexibility in this region of the molecule. 
One exception is the high value for Gly-64, suggesting restricted motion and slow exchange at this position [253]. One final observation is that the region of the protein spanning residues 45-60 (Cys-rich region and core) contains the fewest dynamics measurements. This is because the peak intensities in this region are low. The largest number of assigned minor peaks is also found in this segment (see Table A.2 in Appendix A), supporting the suggestion that some residues in this segment undergo slow conformational exchange and are the most likely sites of folding nuclei.
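The contrast between the uniform and the weighted random-coil curves can be illustrated with a short sketch. Equation (3.17) is not reproduced in this section, so the functional form below is an assumption: each residue's rate is taken as a sum of exponentially decaying contributions from all residues in the chain, optionally weighted per residue. The names `r_int`, `persistence` and `weights`, and their default values, are hypothetical and are not fitted values from this work.

```python
import math

def segmental_profile(n_residues, r_int=0.2, persistence=7.0, weights=None):
    """Predicted relaxation rate for each residue of a random-coil chain.

    Assumed form (not taken from the thesis):
        R(i) = r_int * sum_j w_j * exp(-|i - j| / persistence)
    With all weights equal to 1 this gives the flattened bell-shaped curve
    of a uniform polymer; unequal weights mimic residue-specific models.
    """
    if weights is None:
        weights = [1.0] * n_residues
    profile = []
    for i in range(n_residues):
        total = sum(w * math.exp(-abs(i - j) / persistence)
                    for j, w in enumerate(weights))
        profile.append(r_int * total)
    return profile

# Uniform chain of 92 residues (the length of the His-tagged construct).
uniform = segmental_profile(92)

# Hypothetical weighting: one bulky residue at index 10 counts double.
bulky = [1.0] * 92
bulky[10] = 2.0
weighted = segmental_profile(92, weights=bulky)
```

In this sketch the chain ends have fewer neighbours and therefore lower predicted rates, which is the origin of the bell shape; a heavier weight raises the curve locally.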
4.6 Spectral Density Mapping
The relaxation measurements carried out at two field strengths allowed the mapping of the spectral density functions at five frequencies: 0, 61, 81, 522 and 696 MHz. The frequencies 61 and 81 MHz are the 15N Larmor frequencies at the 14.1 T and 18.8 T magnetic fields, and 522 and 696 MHz are 0.87 times the corresponding 1H Larmor frequencies. The spectral density functions at these frequencies are plotted in Figure 4.18. The high-frequency values make a small, relatively uniform contribution to the relaxation across the sequence, except at the N-terminus, where a significant increase in high frequency motions is observed for the first 10 residues at 522 MHz (Fig. 4.18(a)).
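As an illustration of how the three spectral density values at one field follow from R1, R2 (or R1ρ) and the NOE, the sketch below uses the widely quoted reduced spectral density mapping expressions. The equations of Chapter 3 are not restated here, so the exact form, the N-H bond length (1.02 Å) and the 15N chemical shift anisotropy (−160 ppm) are assumptions, and the input rates are hypothetical values rather than measured data.

```python
import math

MU0 = 4e-7 * math.pi        # vacuum permeability (T m / A)
H_PLANCK = 6.62607015e-34   # Planck constant (J s)
GAMMA_H = 2.6752218744e8    # 1H gyromagnetic ratio (rad s^-1 T^-1)
GAMMA_N = -2.7126e7         # 15N gyromagnetic ratio (rad s^-1 T^-1)
R_NH = 1.02e-10             # assumed N-H bond length (m)
CSA = -160e-6               # assumed 15N chemical shift anisotropy

def reduced_spectral_density(r1, r2, noe, b0):
    """Return (J(0), J(wN), J(0.87wH)) in s/rad from rates in s^-1.

    b0 is the static field in tesla. Any exchange contribution to r2
    is not removed, so J(0) is an effective value.
    """
    d = MU0 * H_PLANCK * GAMMA_N * GAMMA_H / (8 * math.pi ** 2 * R_NH ** 3)
    c = GAMMA_N * b0 * CSA / math.sqrt(3)
    sigma = r1 * (noe - 1.0) * GAMMA_N / GAMMA_H   # cross-relaxation rate
    denom = 3 * d ** 2 + 4 * c ** 2
    j_h = 4 * sigma / (5 * d ** 2)
    j_n = (4 * r1 - 5 * sigma) / denom
    j_0 = (6 * r2 - 3 * r1 - 2.72 * sigma) / denom
    return j_0, j_n, j_h

# Hypothetical rates for one residue at 14.1 T (600 MHz).
j0, jn, jh = reduced_spectral_density(r1=1.5, r2=4.0, noe=-0.2, b0=14.1)
j0_ns, jn_ns, jh_ns = (x * 1e9 for x in (j0, jn, jh))   # ns/rad
```

With inputs of this size the three values fall in the same ranges as the backbone averages reported in Table 4.1, which is a useful sanity check on units.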
Figure 4.18: Reduced spectral density mapping of motions for His-tagged Tat1−72 at pH 4.1 and 293 K, plotted against residue number: (a) J(0.87ωH) (ns/rad) at 522 MHz and 696 MHz, based on estimation of J(0.87ωH); (b) J(ωN) (ns/rad) at 61 MHz and 81 MHz; (c) 0 MHz effective spectral density, Jeff(0)F (ns/rad), calculated using the measurements from two fields according to the method of Farrow et al. described in [209]. Average values of the backbone spectral densities are listed in Table 4.1.
Table 4.1: Means with standard deviations and maximum and minimum values of the reduced spectral density mapping for backbone amides of His-tagged Tat1−72 at pH 4.1 and 293 K at: 0, 61, 81, 522, and 696 MHz. Averages correspond to the 58 residues that are common to relaxation measurements at both 600 and 800 MHz.

J(ω)            µ±s.d. (ns/rad)    Max. (ns/rad)    Min. (ns/rad)
Jeff(0)F (a)    1.05±0.48          2.1              0.44
Jeff(0) (b)     0.71±0.30          1.59             0.28
J(61)           0.27±0.04          0.35             0.12
J(81)           0.22±0.03          0.30             0.13
J(522)          0.031±0.008        0.068            0.017
J(696)          0.023±0.006        0.049            0.011

(a) calculated using two fields by the method of Farrow et al. [90].
(b) mean residue Jeff(0) from averaging the 600 and 800 MHz solutions for Jeff(0) in the reduced spectral density mapping in equations (3.12)–(3.14).
The spectral density profiles at 61 MHz and 81 MHz are highly similar and show some variation with sequence (Fig. 4.18(b)). The ends of the protein, residues Lys-61 to Leu-63, and Thr-43 exhibit the smallest contributions at mid-frequencies. Interestingly, in the acid-denatured state of apomyoglobin, maxima in buried surface area correlate weakly with maxima in the J(ωN) plot [254], suggesting that J(ωN) is sensitive to formation of transient folding nuclei. The smaller values of J(ωN) in Tat near Lys-61 and at the termini suggest less restricted motion in these regions. The low frequency spectral densities cover a wider range of values but also contain the highest levels of error in comparison to the values calculated at high frequencies (Fig. 4.18(c)). The most notable feature is a local maximum in slow motions centred at the 6×His affinity tag. There are also less well-defined peaks in the proline-rich region (residues 21-41) and in the basic region (residues 68-77). The Cys-rich segment (residues 42-57) contains the fewest measurements, and they are associated with some of the largest errors. These errors arise from the weak peak intensities in this region of the protein, possibly indicating the presence of conformational exchange. In order to estimate the contribution to relaxation from conformational exchange, Jeff(0) (Fig. 4.18(c)) and Rex were calculated using equations (3.15) and (3.16) from Farrow et al. [90, 209] using relaxation data measured at 14.1 T and 18.8 T field strengths. The conformational exchange rates (Rex) determined by this method are plotted in Figure 4.19. These conformational exchange rates are field dependent and were calculated relative to the lower field of 14.1 T (νH = 600 MHz). The mean exchange rate for all residues observed is 1 s−1 with a maximum value of 3.3 s−1; 49 of the 58 values measured are within one standard deviation of the mean.
Thus, for most residues the contribution to R1ρ relaxation from conformational exchange is minor. Moreover, small Rex cannot be measured accurately using this approach when the data are measured at two similar field strengths (see Ch. 5) [90].
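The field dependence of Rex referred to above can be made concrete with a small sketch. It assumes the usual fast-exchange form, in which Rex is proportional to the square of the 15N Larmor frequency, Rex = Φex·ωN²; equations (3.15) and (3.16) are not restated in this section, so this form and the function names are assumptions for illustration only.

```python
GAMMA_N_ABS = 2.7126e7   # magnitude of the 15N gyromagnetic ratio (rad s^-1 T^-1)

def rex_from_phi(phi_ex, b0_tesla):
    """Field dependent exchange rate (s^-1) from a field independent
    parameter phi_ex (s/rad^2), assuming Rex = phi_ex * omega_N**2."""
    omega_n = GAMMA_N_ABS * b0_tesla
    return phi_ex * omega_n ** 2

def scale_rex(rex_ref, field_ref, field_new):
    """Rescale an exchange rate between two fields (same units for both
    fields), again assuming a quadratic field dependence."""
    return rex_ref * (field_new / field_ref) ** 2

# A hypothetical phi_ex of 2e-17 s/rad^2 evaluated at 14.1 T.
rex_600 = rex_from_phi(2e-17, 14.1)

# The same contribution expressed relative to the 18.8 T field.
rex_800 = scale_rex(rex_600, 14.1, 18.8)
```

Under this assumption a contribution quoted relative to 14.1 T grows by a factor of (18.8/14.1)², about 1.78, at the higher field, which is why the reference field has to be stated.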
Figure 4.19: Field dependent conformational exchange rates, Rex (s−1), relative to the 14.1 T field, plotted against residue number for His-tagged Tat1−72 at pH 4.1 and 293 K, determined from the method of Farrow et al. [209] using equation (3.16).
Unlike the results reported in [90] for the 59-residue N-terminal SH3 domain from the adapter protein drk (drkN SH3), where many of the calculated conformational exchange rates are negative (contrary to theory), only one residue for Tat was found to have a negative conformational exchange rate (His-5 of the affinity tag). The explanation for the discrepancy from theory in [90] was that the two field strengths used to measure their T2 times differed by only a factor of 1.2 (using data measured at 11.7 T and 14.1 T) and the resulting difference between T2 times was relatively small. The authors note that the calculation of Jeff(0) using two fields with equation (3.15) resulted in a four-fold increase in the errors compared to the errors of Jeff(0) determined from single field calculations. A similar increase in the errors of Jeff(0) determined from the two-field equation (3.15), relative to the single field measurements in Figures 4.20(b) and 4.20(c), is found for Tat. Similar to the observations of drkN SH3 in [90, 209], the increased error observed for Jeff(0) of Tat is likely the result of error propagation in the calculation as well as similarity in the field strengths used for the measurements (they differ by a factor of 1.333). Consequently, the calculation of Rex using the Jeff(0) values from equation (3.15) should be judged cautiously. The low frequency spectral density values for His-tagged Tat1−72 determined by equation (3.15) are plotted in Figure 4.20 along with the values determined from the single field solutions to equations (3.8)-(3.10) and the mean Jeff(0) values from the two single field measurements. In light of the above discussion on the similarity of the relaxation data when the field strengths are similar, the similarity in the range of values for the R1ρ and R2 data measured at 600 MHz (Figs. 4.11(d) and 4.15) implies that contributions from slow conformational exchange on the ms-µs timescale are not significant for the Tat protein under these conditions. Therefore, the low frequency spectral density values, Jeff(0): (i) do not differ significantly between the calculations from the two separate fields, as a result of the small to negligible contribution of Rex and the small difference in the field strengths; and (ii) are likely a very close approximation to the actual J(0) values.
Figure 4.20: Jeff(0) spectral density maps (ns/rad, plotted against residue number) determined for His-tagged Tat1−72 at 14.1 T (νH = 600 MHz) and 18.8 T (νH = 800 MHz) field strengths, calculated for each field separately and using combined data: (a) Jeff(0)F calculated by the method of Farrow et al. [209]; (b) Jeff(0)600 calculated from 14.1 T field strength data; (c) Jeff(0)800 calculated from 18.8 T field strength data; (d) mean value, Jeff(0), calculated using data from both 14.1 T and 18.8 T field strengths.
4.7 Model-Free Analysis
To obtain further insight into the dynamics of Tat, the relaxation data were fit using Lipari-Szabo and Cole-Cole model-free methods. The R-factors (Rf) and AIC values for the eight tested models (listed in Table 3.4) are shown in Table 4.2. Only Models 7 and 8, which use the Cole-Cole distribution model, result in Rf values of less than 0.1. As the Cole-Cole analysis [223, 224] is not widely used, it may be helpful to describe the Lipari-Szabo results first, despite the fact that their Rf values are higher than those for the two Cole-Cole distribution models. The Monte Carlo error estimates were determined for Lipari-Szabo Model 2 using the data from the 600 MHz and 800 MHz measurements independently and combined (2nd, 4th and 6th entries in Table 3.4). Model 3, which includes estimates of the conformational exchange rate, can only be used when the two data sets are combined (since only three data sets were obtained at each field).
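Model selection by AIC can be sketched as follows. The definition of the AIC used in this work is given in Chapter 3 and is not restated here; the form χ² + 2k (k fitted parameters) is a common choice for least-squares fits and is an assumption in this sketch. The mean AIC values are those listed in Table 4.2 for the fits that use both fields.

```python
def aic(chi_squared, n_parameters):
    """Akaike information criterion for a least-squares fit,
    assuming the common form AIC = chi^2 + 2k."""
    return chi_squared + 2 * n_parameters

# Mean AIC per model for the combined 600 and 800 MHz data (Table 4.2).
mean_aic_two_fields = {
    1: 9795.57, 2: 3959.18, 3: 818.647, 4: 2418.22,
    5: 5305.13, 6: 21316.9, 7: 321.115, 8: 319.896,
}

# Rank models from lowest (preferred) to highest mean AIC.
ranking = sorted(mean_aic_two_fields, key=mean_aic_two_fields.get)
best_model = ranking[0]
```

The two Cole-Cole models (7 and 8) rank first by this criterion and are nearly indistinguishable from each other, which agrees with the ordering by Rf.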
Table 4.2: Individual and total R-factors (Rf) along with mean Akaike information criterion (AIC) values for the Lipari-Szabo (Models 1 and 2), Lipari-Szabo extended (Models 4-6) and Cole-Cole distribution (Models 7 and 8) methods listed in Table 3.4.

Model   Rf[R1]   Rf[R1ρ]   Rf[NOE]   Rf      Field(s)            Mean[AIC]   SD[AIC]
1       0.088    0.499     0.400     0.454   600 MHz             4883.59     3915.86
2       0.065    0.298     0.109     0.262   600 MHz             1336.04     2998.99
1       0.031    0.257     0.545     0.268   800 MHz             1790.39     2000.84
2       0.025    0.121     0.142     0.115   800 MHz             184.097     759.199
1       0.110    0.507     0.211     0.453   600 MHz, 800 MHz    9795.57     8310.55
2       0.098    0.249     0.154     0.227   600 MHz, 800 MHz    3959.18     6922.11
3       0.051    0.146     0.135     0.136   600 MHz, 800 MHz    818.647     847.911
4       0.068    0.194     0.143     0.178   600 MHz, 800 MHz    2418.22     6134.28
5       0.244    0.356     0.141     0.329   600 MHz, 800 MHz    5305.13     13643.1
6       0.303    0.254     0.143     0.254   600 MHz, 800 MHz    21316.9     86664.7
7       0.049    0.099     0.138     0.098   600 MHz, 800 MHz    321.115     398.497
8       0.051    0.097     0.131     0.097   600 MHz, 800 MHz    319.896     401.147
The results of the Lipari-Szabo model-free analyses of the relaxation measurements are shown in Figure 4.21 (Model 2) and Figure 4.22 (Model 3), and in Appendix B (Model 2). The following observations were made in all analyses: In general, the field dependent Rex contributions (determined relative to the 600 MHz field) are less than 2 s−1 for most residues, the largest value being 3.3 s−1 . From the Lipari-Szabo based models (Models 2 and 3) the average S2 values are 0.58 and 0.50 for calculations with and without Rex , respectively. These results are very similar to analyses of relaxation data in other unfolded proteins where order parameters in the range of 0.4-0.6 were determined [90,223]. The 6×His affinity tag contains the residues with the highest order parameters (Figures 4.21(a) and 4.22(a)) but also contains some of the highest errors for all of the model parameter estimates. The His-tag region also exhibited the highest R2 values (Fig. 4.15), suggesting that accounting for conformational exchange or hydrogen exchange, not detectable by R1ρ , may improve the analysis. The τc values show a slight bell-shaped variation with sequence (Figs. 4.21(b) and 4.22(b)) with the correlation times at the protein termini being smaller than in the centre. Furthermore, the average τc values (1.9 and 3.6 ns, calculated with and without Rex respectively), are barely a factor of 10 greater than the average τe (Figs. 4.21(c) and 4.22(c)) values (0.14 and 0.19 ns) and in some cases the errors in the internal and overall correlation times overlap. This lack of stochastic independence in the correlation functions supports the notion that the HIV-1 Tat1−72 protein exists in a disordered or random coil-like conformation in which there is no clear separation of internal and overall rotational correlation times [247].
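The weak separation between overall and internal correlation times can be seen directly in the standard Lipari-Szabo spectral density. The sketch below uses the textbook two-term form; whether it matches Model 2 of Table 3.4 in every detail is an assumption. The inputs are the sequence averages quoted above (S2 = 0.50, τc = 3.6 ns, τe = 0.19 ns, calculated without Rex).

```python
def lipari_szabo_j(omega, s2, tau_c, tau_e):
    """Lipari-Szabo spectral density (s/rad).

    J(w) = (2/5) * [ S2*tau_c/(1 + w^2 tau_c^2)
                     + (1 - S2)*tau/(1 + w^2 tau^2) ],
    with 1/tau = 1/tau_c + 1/tau_e. Times are in seconds, omega in rad/s.
    """
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)

# Sequence-average parameters from the Model 2 analysis.
s2_mean, tau_c_mean, tau_e_mean = 0.50, 3.6e-9, 0.19e-9

j_zero = lipari_szabo_j(0.0, s2_mean, tau_c_mean, tau_e_mean)
separation = tau_c_mean / tau_e_mean   # ratio of the two time scales
```

For a folded protein the ratio of the two time scales is typically several orders of magnitude; here it is less than 20, so the two terms are not cleanly separable.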
Figure 4.21: Model-free parameter estimates using Model 2 (Rf = 0.227) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields (600 and 800 MHz) using 58 residues common to both data sets. Residues Ser-4, His-5 and Ser-11 were omitted as outliers of the parameter estimates as they failed to converge to a solution. Panels are plotted against residue number: (a) generalized order parameters S2; (b) local overall rotational correlation times τc (ns); (c) internal correlation times τe (ps). The sequence mean values of the estimates are indicated by the solid lines.
Figure 4.22: Model-free parameter estimates using Model 3 (Rf = 0.136) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields (14.1 T and 18.8 T) using 58 residues common to both data sets. No residues were omitted as outliers. Panels are plotted against residue number: (a) generalized order parameters S2; (b) local rotational correlation times τc (ns); (c) internal rotational correlation times τe (ps); (d) field independent conformational exchange parameters Φex (s/rad2, plotted in units of 10−17 s/rad2). The sequence mean values of the estimates are indicated by the solid lines.
The results of the Cole-Cole model-free analyses using Model 7 (Table 3.4) are shown in Figure 4.23. Compared to the Lipari-Szabo analyses there is a slight increase in the average order parameter to 0.63 as a result of the very high S2 values in the affinity tag. In general, there are fewer residues with high S2 values across the sequence (Fig. 4.23(a)), but several residues at the N-terminal end of the protein in the His-tag region have values of S2 =1. It is notable that these residues also have the highest errors in all of the estimated parameters. The mean local rotational correlation time (τ0 ) shows a significant decrease in its range (0.26-4.09 ns) as seen in Figure 4.23(b) compared to the Lipari-Szabo models (0.39-21.80 ns for model 2 and 0.28-9.07 ns for model 3), with an average value of 1.21 ns. The rotational correlation time for internal motions (τe ) is also lower in its range (0-384 ps) (Fig. 4.23(c)), and mean value (0.1 ns). Here again, the two correlation times are barely a factor of 10 different which further demonstrates the difficulty of separating the two modes of motion within a disordered protein. Thus, the use of a distribution of local overall correlation times does not appear to have greatly improved the model because most of the distribution width parameter estimates (ε) yield narrow distributions.
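The effect of the distribution width parameter can be illustrated with the Cole-Cole spectral density in one commonly quoted form. The expression used for Models 7 and 8 is defined in Chapter 3 and is not restated here, so the form and normalisation below are assumptions; they are chosen so that ε = 1 (an infinitely narrow distribution) reduces to the single Lorentzian of the Lipari-Szabo overall term.

```python
import math

def cole_cole_term(omega, tau0, eps):
    """Cole-Cole replacement for tau/(1 + w^2 tau^2), valid for omega > 0.

    tau0 is the distribution mean correlation time (s) and eps the width
    parameter (0 < eps <= 1); eps = 1 recovers a single Lorentzian.
    """
    x = (omega * tau0) ** eps
    angle = (1.0 - eps) * math.pi / 2.0
    return (math.cos(angle) * x
            / (omega * (1.0 + 2.0 * x * math.sin(angle) + x * x)))

def lorentzian(omega, tau):
    return tau / (1.0 + (omega * tau) ** 2)

omega_n = 2.7126e7 * 14.1   # 15N Larmor frequency at 14.1 T (rad/s)
tau0 = 1.21e-9              # mean tau0 reported for Model 7

narrow = cole_cole_term(omega_n, tau0, 1.0)
broad = cole_cole_term(omega_n, tau0, 0.8)   # hypothetical broader distribution
```

When the fitted ε values are close to 1, as reported above, the Cole-Cole term is nearly indistinguishable from the Lorentzian, which is consistent with the limited improvement over the Lipari-Szabo description.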
Figure 4.23: Model-free parameter estimates using Model 7 (Rf = 0.098) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields (14.1 T and 18.8 T) using 58 residues common to both data sets. Residue His-8 was omitted as an outlier of the parameter estimates as it failed to converge to a solution. Panels are plotted against residue number: (a) generalized order parameters S2; (b) distribution mean local rotational correlation times τ0 (ns); (c) internal rotational correlation times τe (ps); (d) Cole-Cole distribution width parameters ε. The sequence mean values of the estimates are indicated by the solid lines.
4.8 pH Effects
The NMR samples used for the sequential assignment and relaxation analyses of Tat were stable for over 1 year at pH 4.1. Figure 4.24 shows the effects of increasing pH on the 1H/15N-HSQC spectrum of Tat1−72. The most obvious result is an overall reduction of cross-peak intensities. Fast hydrogen exchange with water might account for this because exchange rates greater than 10³ s−1 will result in the loss of signal intensity from chemical exchange line broadening. To determine if the observed losses are a result of the increasing rates of hydrogen exchange, the theoretical hydrogen exchange rates for unfolded Tat at various pH values were calculated, taking into account the nearest neighbour inductive and steric effects [230] (Fig. 4.25). The calculated hydrogen exchange rates for His-tagged Tat are predicted to increase on the order of 250-fold between pH 3.3 and 5.8. At pH 5.8, the calculated hydrogen exchange rates range from 0.0015 s−1 for Gln-92 to 63.0 s−1 for Gly-2, and it is likely that for many of the peaks the intensity loss is attributable to rapid hydrogen exchange. For example, the histidines in the affinity tag and the Cys residues are predicted to be the fastest exchanging amides, and their cross-peaks diminish early in the pH titration. In these cases, the loss of cross-peaks from the spectra is indirect evidence that a residue is not involved in a stable, folded conformation. However, detailed analysis of the peaks shows that hydrogen exchange alone cannot explain all the peak heights. For example, Thr-40 is predicted to exchange slowly, yet its cross-peak disappears early in the titration (Fig. 4.24). These results suggest that some cross-peaks may lose intensity because of the development of local conformations that are in intermediate exchange on the µs-ms timescale, as observed in the molten globules of other proteins [255].
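The size of the predicted pH effect can be checked with a minimal sketch of acid- and base-catalysed amide exchange. The rate constants and pKw below are illustrative placeholders of a realistic order of magnitude for an unstructured peptide near 293 K; they are assumptions, not the residue-specific values of [230], and the water-catalysed term is omitted for simplicity.

```python
def hx_rate(ph, log_ka=1.6, log_kb=10.0, pkw=14.17):
    """Amide hydrogen exchange rate (min^-1) from acid and base catalysis.

    k = kA*[H+] + kB*[OH-], with log10 rate constants in M^-1 min^-1.
    All default values are illustrative assumptions.
    """
    acid = 10 ** (log_ka - ph)
    base = 10 ** (log_kb - (pkw - ph))
    return acid + base

def ph_of_minimum(log_ka=1.6, log_kb=10.0, pkw=14.17):
    """pH at which the acid and base terms are equal (rate minimum)."""
    return 0.5 * (pkw + log_ka - log_kb)

fold_increase = hx_rate(5.8) / hx_rate(3.3)
ph_min = ph_of_minimum()
```

With these placeholder constants the rate rises by a few hundred-fold between pH 3.3 and 5.8, the same order as the roughly 250-fold increase quoted above, and the minimum lies near pH 3.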
Figure 4.24: Two-dimensional 1H/15N-HSQC spectra of Tat1−72 at 293 K observed at pH 3.3 (red), pH 4.1 (yellow), pH 5.3 (green), pH 5.8 (blue) and pH 6.7 (violet). Axes: 1H (ppm) and 15N (ppm). Labelled cross-peaks: G13, G18, G35, G64, G68, G81, S3, S19, S36, S82, T40, T43 and T84. All samples are approximately 1 mM and were obtained from a single expression/purification. Each spectrum was collected with 32 transients, 2048×256 complex points, and sweep widths of 10 ppm in F2 (1H) and 24 ppm in F1 (15N).
Figure 4.25: Predicted amide hydrogen exchange rates (kex, min−1), plotted against residue number, for His-tagged Tat1−72 at 293 K for pH values (a) 3.3, (b) 4.1, (c) 5.3, (d) 5.8, and (e) 6.7 using the method of Bai et al. [230].
Several resonances in Figure 4.24 remain resolved with the increase in pH and were examined more closely. These resonances lie in the glycine and serine/threonine regions of the HSQC spectra and include Gly-13, Gly-18, Ser-19, Gly-35, Ser-36, Thr-40, Thr-43, Gly-64, Gly-68, Gly-81, Ser-82, and Thr-84. The profiles of the absolute peak heights of each of these residues with increasing pH are given in Figures 4.26 and 4.27. In each profile it is clear that the maximum intensity is achieved at pH 4.1. This pH is slightly higher than the pH at which the hydrogen exchange rate is at a minimum in model compounds. The plot in Figure 4.28 shows the calculated variation in the net charge of His-tagged Tat1−72 with increasing pH. The net charge calculations were determined using the European Molecular Biology Laboratory (EMBL) Isoelectric Point Service (http://www.emblheidelberg.de/cgi/pi-wrapper.pl). The pI (the pH for which the protein is neutral) for this Tat sequence was determined to be 10.43.
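A net charge curve of this kind can be reproduced in outline from the counts of ionizable groups and the Henderson-Hasselbalch relation. The sketch below is not the EMBL service's algorithm: the pKa values are approximate textbook numbers, and the composition in `example_counts` is hypothetical rather than the actual His-tagged Tat1−72 composition, so the resulting pI is not expected to equal 10.43.

```python
# Approximate textbook pKa values (assumed; other sets differ slightly).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(ph, counts):
    """Net charge at a given pH from counts of ionizable groups."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        # fraction of the basic group that is protonated (+1)
        charge += counts.get(group, 0) / (1.0 + 10 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        # fraction of the acidic group that is deprotonated (-1)
        charge -= counts.get(group, 0) / (1.0 + 10 ** (pka - ph))
    return charge

def isoelectric_point(counts, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection on pH; the net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(mid, counts) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical composition of a small, basic, Cys-rich protein.
example_counts = {"Nterm": 1, "Cterm": 1, "K": 10, "R": 8, "H": 9,
                  "D": 2, "E": 3, "C": 7, "Y": 1}

charges = [net_charge(ph, example_counts) for ph in (3.3, 4.1, 5.3, 5.8, 6.7)]
pi_estimate = isoelectric_point(example_counts)
```

Evaluating the function at the five pH values used for the HSQC spectra shows the steady loss of positive charge as the histidines titrate.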
Figure 4.26: Variation in absolute peak heights with increasing pH for observed glycine residues from 1H/15N HSQC spectra measured at 293 K for pH values 3.3, 4.1, 5.3, 5.8, and 6.7: (a) Gly-13; (b) Gly-18; (c) Gly-35; (d) Gly-64; (e) Gly-68; (f) Gly-81. Noise estimates in the spectra varied in the 1 × 10⁵ to 3 × 10⁵ range.
Figure 4.27: Variation in absolute peak heights with increasing pH for selected serine and threonine residues from 1H/15N HSQC spectra measured at 293 K at pH values 3.3, 4.1, 5.3, 5.8, and 6.7: (a) Ser-19; (b) Ser-36; (c) Ser-82; (d) Thr-40; (e) Thr-43; (f) Thr-84. Noise estimates in the spectra varied in the 1 × 10⁵ to 3 × 10⁵ range.
Figure 4.28: Decrease in calculated net charge with increasing pH for His-tagged Tat1−72. The net charge was determined using the EMBL Isoelectric Point Service (http://www.emblheidelberg.de/cgi/pi-wrapper.pl). Filled circles correspond to the calculated net charge for the protein sequence at increments of 0.5 pH units. Open circles correspond to the predicted net charge of the protein at pH values used for the HSQC measurements in Fig. 4.24 from a cubic spline interpolation of the EMBL calculations.
4.9 Disorder Predictions
Several programs exist to search protein sequences for regions of disorder (DisEMBL, PONDR, FoldIndex, RONN, IUPred, DISOPRED and DisProt). Four of these programs: PONDR [56–58], RONN [61], DisProt [59, 60] and IUPred [54, 55], were tested with the sequence of His-tagged Tat1−72 to compare their predictions with observations of the protein’s flexibility measured by NMR spectroscopy.
The DisProt neural network-based predictions shown in Figure 4.29 ((a) to (c) in order of increasing algorithm complexity) predict that the protein is essentially completely disordered with only a slight suggestion of structure in the vicinity of Lys-49. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence. The prediction scores using the VL3 algorithm (Fig. 4.29(a)) for Lys-48, Lys-49 and Cys-50 are 0.474, 0.482, and 0.491 respectively. The VL3H and VL3E algorithms both predict all residues are disordered with scores > 0.5 , but there is a tendency to order in the region of Lys-48 to Cys-50. The disorder predictions produced by PONDR (Fig. 4.30), RONN (Fig. 4.31) and IUPred (Fig. 4.32) all indicate an ordered region exists in the segment containing the Cys-rich region of the protein although the width of the ordered segment varies among the predictions. The PONDR scores (Fig. 4.30) also indicate a short, ordered segment between Val-87 and Lys-91.
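The 0.5 threshold used by all four predictors can be applied mechanically to a per-residue score list. The helper below is a minimal sketch (the function name is hypothetical); the three central scores are the VL3 values quoted above for Lys-48, Lys-49 and Cys-50, while the two flanking scores are invented placeholders above the threshold.

```python
def ordered_segments(scores, threshold=0.5, first_residue=1):
    """Return (start, end) residue numbers of runs scoring below threshold,
    i.e. runs predicted to be ordered."""
    segments = []
    start = None
    for offset, score in enumerate(scores):
        residue = first_residue + offset
        if score < threshold and start is None:
            start = residue                      # a new ordered run begins
        elif score >= threshold and start is not None:
            segments.append((start, residue - 1))  # the run just ended
            start = None
    if start is not None:                        # run extends to the last residue
        segments.append((start, first_residue + len(scores) - 1))
    return segments

# Residues 47-51: flanking values (0.52, 0.51) are hypothetical.
vl3_window = [0.52, 0.474, 0.482, 0.491, 0.51]
segments = ordered_segments(vl3_window, first_residue=47)
```

Applied to a full score profile, the same function gives the ordered segments that differ in width between PONDR, RONN and IUPred.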
Figure 4.29: DisProt [59, 60] disorder predictions (score plotted against residue number) for the His-tagged Tat1−72 amino acid sequence using the algorithms: (a) VL3, (b) VL3H, and (c) VL3E. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.
Figure 4.30: PONDR [56–58] disorder predictions (score plotted against residue number) for the His-tagged Tat1−72 amino acid sequence. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.
Figure 4.31: RONN [61] disorder predictions (score plotted against residue number) for the His-tagged Tat1−72 amino acid sequence. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.
Figure 4.32: IUPred [54, 55] disorder predictions (disorder tendency plotted against residue number) for the His-tagged Tat1−72 amino acid sequence. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.
Chapter 5 Discussion
5.1 Protein Expression and Purification
The Tat1−72 gene was cloned into pET28 by G. Henry, to enable protein expression with an N-terminal, thrombin-cleavable, 6×His purification tag. Metal affinity chromatography has been used to rapidly purify many proteins to greater than 95% purity in a single step; rapid purification was considered important for the highly oxidation-prone Tat protein. The ability to elute the protein from the resin at low pH also helped prevent disulphide bond formation. A number of approaches to increasing Tat1−72 expression have been attempted in the past [256, 257]. In this case, the pSV2tat72 vector [186] was chosen to be the source of the tat gene because it is codon-optimized for expression in E. coli. As Tat has well described cytotoxic effects, the pET28 plasmid was chosen for expression because of the stringent control over the lacUV5 promoter owing to the presence of the lac repressor (lacI) [258]; the plasmid is also less easily lost from the cell because of the kanamycin resistance gene. The choice of a pLysS-containing host was also made with the toxicity of Tat in mind but has
the added advantage of more facile cell lysis by the endogenous T7 lysozyme. Several reducing agents (β-mercaptoethanol, dithiothreitol, tris(2-carboxyethyl)phosphine, tris(2-cyanoethyl)phosphine, tris(hydroxypropyl)phosphine, and sodium sulphite) were tested for the production of reduced, monomeric Tat1−72. Tris(2-carboxyethyl)phosphine (TCEP), in the presence of 6 M guanidine, was found to be the most effective. TCEP is both a stronger reducing agent and effective over a wider pH range (1.5-8.8) than thiol reducing agents such as β-mercaptoethanol (βME) and dithiothreitol (DTT) [259]. Its use permitted the entire purification, from cell lysis at neutral pH to elution from the cobalt resin at pH 4, to be done in a strongly reducing environment. This is not possible with thiol reducing agents, as neither DTT (pKa = 9.2, 10.1) nor βME (pKa = 9.6) is an effective reducing agent at low pH [260]. According to the manufacturers (Clontech and QIAGEN), DTT is not compatible with metal affinity resins, but we observed no incompatibility between TCEP and immobilized Co2+ and Ni2+. βME can be used at concentrations up to 20 mM in metal affinity chromatography (according to the manual from QIAGEN), but its effectiveness as a reducing agent is much less than that of DTT or TCEP, especially at lower pH values. One problem with TCEP is that it has 3 negative charges at neutral pH and precipitates the highly basic Tat protein. To overcome this problem, 6 M guanidine was included in the extraction buffer, with subsequent removal of guanidine and TCEP together at low pH by dialysis. Tris(2-cyanoethyl)phosphine (TCP) is also a strong reducing agent and has added advantages: it is neutral, it does not precipitate Tat, and it can access less solvent-exposed sulfhydryls. Unfortunately, TCP is less soluble and significantly less stable than TCEP, oxidizing more readily in air.
Tris(hydroxypropyl)phosphine (THP) was also investigated as a reducing agent as it is miscible with water and neutral. However, THP is a viscous liquid and very reactive. To deal with the short lifetime of the reducing agent, attempts were made to degas and seal protein samples containing THP in the NMR tubes. However, it was found that the viscosity of THP-aqueous Tat mixtures significantly increased the protein
NMR line-widths. Another reducing agent that was tested was sodium sulphite. A small amount was added to the NMR sample at low pH; it has the advantage of being invisible to 1H NMR spectroscopy but is only a mild reducing agent in comparison to TCEP. The use of 6 M guanidine throughout the purification has several advantages over alternatives such as 8 M urea with or without 1-2 M NaCl. In comparison to urea, guanidine solutions consistently yielded the highest amounts of soluble Tat in the initial cell lysates. Presumably, high concentrations of guanidine encourage dissociation of Tat from DNA, RNA, and other anionic molecules. Furthermore, removal of the guanidine during the washing steps, while the protein was bound to the metal affinity resin, resulted in very slow elution of the protein from the resin, suggesting that the Arg-rich basic domain of the protein can interact with the nitrilotriacetic acid groups of the Sepharose resin that have lost their coordinated metal ion. In the presence of 6 M guanidine at pH 4, 70% of the Tat protein eluted from the immobilized metal resin in the first two 1 mL fractions (based on UV absorbance measurements with spectroscopic grade guanidine hydrogen chloride), whereas in the absence of guanidine the same fraction of protein eluted in approximately 40 mL. Purification with denaturant present at every step, followed by its removal during dialysis, was found to be the most efficient way to obtain large quantities of reduced, monomeric protein. Removal of the denaturant and reducing agent by dialysis necessitated the use of low pH buffers to maintain the cysteine thiol groups in their unreactive protonated state. Degassing the dialysis and NMR buffers as well as maintaining an argon atmosphere further reduces the possibility of oxidation of the protein. TCEP was not used as a reducing agent in the NMR samples because it apparently precipitates Tat at neutral pH.
Instead, rigorous degassing of the sample buffer and addition of a mild reducing agent (sodium sulphite) allowed preparation of NMR samples that were stable for more than 6 months.
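The argument that thiols are unreactive at low pH can be quantified with the Henderson-Hasselbalch relation, since the reactive species is the deprotonated thiolate. The sketch below uses the pKa values quoted above for βME (9.6) and for the first ionization of DTT (9.2); the function name is hypothetical, and treating reactivity as proportional to the thiolate fraction is a simplifying assumption.

```python
def thiolate_fraction(ph, pka):
    """Fraction of a thiol present as the reactive thiolate anion."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

# Thiol reducing agents at the elution and dialysis pH of about 4.
bme_at_ph4 = thiolate_fraction(4.0, 9.6)
dtt_at_ph4 = thiolate_fraction(4.0, 9.2)

# The same agents near neutral pH for comparison.
bme_at_ph7 = thiolate_fraction(7.0, 9.6)
```

At pH 4 only a few parts per million of either thiol is deprotonated, roughly a thousand-fold less than at pH 7. The same reasoning applies to the cysteine side chains of Tat, which is why low pH buffers protect the protein from disulphide formation.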
5.2 NMR Spectroscopy and Backbone Assignment
The spectral dispersion of the resonances shown in Figure 4.3 is typical for disordered proteins. The clustering of resonances into three regions is also expected for disordered proteins. The narrowness of the dispersion range in both 1H and 15N dimensions is indicative of disorder and is virtually identical to the dispersion of resonances for strongly denatured ubiquitin [236]. The resonance line-widths are broad relative to those of the native state of ubiquitin (MW = 8565 Da, τc = 4.1 ns), which has 1H and 15N linewidths of 6-9 Hz and 3 Hz respectively [169]. The 1H and 15N linewidths of Tat have mean values of 15±5 Hz and 6±1 Hz, suggesting possible conformational exchange on the intermediate NMR time scale (µs-ms range). Line broadening may also result from hydrogen exchange with the water solvent. However, as these Tat samples are at low pH (∼4) where the rate of hydrogen exchange is near its minimum [261, 262], it is not likely that exchange with the solvent is the major cause of the line broadening. Hence, the broad lines observed for Tat are most likely the result of conformational exchange in the µs-ms range as one would expect for a protein that lacks regular secondary structural elements. The intensity profile depicted in Figure 4.5 shows that the weakest resonances lie within the Cys-rich (residues 42-57) and core (residues 58-67) regions of the protein. Several residues (see Tables A.1 and A.2 in Appendix A) are observed to have multiple resonances in slow exchange. Many of these additional resonances are associated with residues in the Cys-rich and core regions of Tat. The decreased intensity of resonances within these regions is the result of the intensity being split between multiple signals in slow exchange or of line broadening of the residues in intermediate exchange. The conformational exchange on the µs-ms timescale in these regions may indicate regions of transient structure formation, which may only become stabilized in the presence of zinc ions, binding to TAR, cyclin T1, or other binding partners. The absence of additional unassigned peaks in the NMR spectrum of the unlabelled
Tat (Fig. 4.3(b)) indicates a high level of purity since other unlabelled proteins in the sample would produce signals through their 15N natural abundance and confirms the MALDI-TOF-MS analysis. The backbone assignment described in [166] resulted in unambiguous assignment of 80 of the 83 observable (non-proline and non-N-terminal) amide resonances. The assignments of the Cys residues are particularly informative as they confirm that all of the Cys residues are reduced; all of the Cα and Cβ chemical shifts (shown in Fig. 4.7) observed in the 3-dimensional HNCACB [194] spectrum are in the range of the random coil chemical shifts of reduced cysteine rather than oxidized cysteine involved in disulfide bond formation. The chemical shift resonances for the Cys residues also confirm the findings from the MALDI-TOF-MS analysis that the protein is in the reduced monomeric state and that the weak peaks in Figure 4.2 most likely indicate the presence of non-covalent oligomers formed during the MS analysis. Due to the narrow dispersion and overlap of signals in the proton dimension, many experiments were required for the sequential assignments of the protein backbone. All of the experiments used (listed in Table 3.3) required 3-dimensional HN detection for the assignment of protein backbone resonances. These experiments utilize the increased dispersion in the 15N and 13C dimensions to resolve the clustered regions of the 1H/15N-HSQC spectra. However, in order to resolve some of the clustered resonances, the experiments required higher resolution than used for folded proteins of comparable size. The sequential assignment of the backbone was further complicated by the presence of multiple slow-exchange resonances which needed to be assigned as well.
5.3 Chemical Shifts and Coupling Constants
The use of the chemical shift as an indicator of secondary structure has been applied to the analysis of proteins under denaturing conditions to identify regions of residual structure in
the protein [254, 263–265]. In some cases, non-native residual structure is identified that may indicate transient intermediates along the protein folding pathway [254, 263]. In the case of intrinsically disordered proteins, which lack identifiable structural elements at high resolution, the chemical shift provides a means of identifying residual structure that may be characterized as a helix, loop or sheet. The CSI procedure introduced by Wishart and Sykes [241] has been applied to identify the secondary structural elements in folded proteins. However, for the study of unfolded, denatured, or disordered proteins, variations in the procedure have been adopted to account for local sequence effects [203, 204, 252] on the reference shifts for the random coil. Consistent deviations over three or four residues from the sequence-corrected random coil values may serve as indicators of residual structure in these proteins. In the absence of the sequence correction to the random coil chemical shifts, small chemical shift variations along a protein sequence may be too subtle to reveal conformational preferences along the backbone. For His-tagged Tat1−72 (Fig. 4.8), the majority of the chemical shift differences from the sequence-corrected random coil shifts [203, 204] lie within the bounds of the random coil conformation. Because no sequence-dependent corrections were available for the Cβ random coil chemical shifts, these differences are less informative than the other difference plots. The difference plots for the HN, Cα and Hα chemical shifts, as well as the 3J(HN-Hα) coupling constants, show a slight preference for helical conformation in the vicinity of Glu-29 but there is no uninterrupted segment of 3 or more residues defining a helical domain. The difference plot for the C’ shifts (Fig. 4.8(c)) indicates a slight preference for the β conformation in the Cys-rich region (residues 42-57) although this cannot be confirmed from any of the other difference plots.
The 3J(HN-Hα) coupling constants do not verify this observation as many of the Hα shifts in this region were absent in the HNHA experiment used to measure the coupling constants. The region with a weak preference for helical conformations near Glu-29 is very close
to the single Trp residue at position 31 (noted in Fig. 4.9). Studies of the unfolded state of the drkN SH3 domain observed non-native burial of the Trp indole ring at the centre of a hydrophobic cluster [169, 191]. The Trp indole H-N resonance in the HSQC spectrum (Fig. 4.4 inset) shows multiple signals, one strong and two weak cross-peaks. These cross-peaks may be the result of cis-trans proline isomerization from the preceding Pro residue or perhaps arise from slow conformational exchange between the open and buried Trp state. Isomerization at Pro-30 would more likely affect its preceding residue, Glu-29 (noted in Fig. 4.9), and isomerization at Pro-34 is unlikely to influence chemical shift deviations three residues away at Trp-31. It is possible that interactions between the indole ring of Trp-31 and the charged imidazole ring of His-33 account for some restriction in the flexibility and weak helical conformational preference in that region. The presence of proline at positions 30 and 34 may also prevent stabilization of this segment as a helix [266–268]. The segment that shows a slight tendency toward the β conformation spans residues 44-60, which includes most of the Cys-rich region (residues 42-57) of the protein. Although the chemical shift difference plot for the C’ shifts does not reveal any segment of 3 or more residues outside the random coil range, all of the residues in this region have values closer to β-sheet than to α-helix. This Cys-rich region can also be seen in Figure 4.9 as the yellow region deviating from the simple extended structure.
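The criterion used above, an uninterrupted segment of 3 or more residues lying outside the random coil range, can be sketched in a few lines of Python. This is only an illustration: the function name `find_runs`, the threshold of 0.7 ppm, and the example values are hypothetical and are not taken from the data in Figure 4.8.

```python
def find_runs(deviations, threshold, min_length=3):
    """Return (first, last, sign) index triples for runs of at least
    min_length consecutive values beyond +/- threshold with the same sign."""
    runs = []
    start, sign = 0, 0
    # A trailing 0.0 sentinel closes a run that reaches the end of the list.
    for i, dev in enumerate(list(deviations) + [0.0]):
        current = 1 if dev > threshold else -1 if dev < -threshold else 0
        if current != sign:
            if sign != 0 and i - start >= min_length:
                runs.append((start, i - 1, sign))
            start, sign = i, current
    return runs


# Hypothetical C-alpha shift differences (ppm), not data from this work.
example = [0.1, 0.8, 0.9, 1.0, 0.2, -0.8, -0.9, 0.1, 0.9, 0.9]
print(find_runs(example, threshold=0.7))  # [(1, 3, 1)]
```

The indices returned are zero-based positions in the input list; only the first run qualifies because the other two excursions span just two residues each.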
5.4 NMR Relaxation
Despite the relatively poor chemical shift dispersion in the NMR spectra of Tat, relaxation data were obtained for 77% and 72% of the observable resonances (non-proline and non-N-terminal) at the 600 MHz and 800 MHz spectrometer frequencies respectively. An unfortunate consequence of reduced spectrometer time at the higher frequency was the loss of resolution preventing identification of some closely clustered resonances.
The lack of any significant variation in the steady-state heteronuclear NOEs (Fig. 4.11(a)), coupled with their negative values, is a good indicator of the degree of uniform disorder (or less restricted dynamics) throughout the protein backbone at the ns-ps timescale. The mean values for the NOE at 600 and 800 MHz are -1.27±0.46 and -0.93±0.33 respectively. These values are consistent with NOEs obtained by Farrow et al. [90] for the guanidine-denatured state of the 59 residue drkN SH3 domain which had mean NOEs of -1.41±0.37 (at 500 MHz) and -1.20±0.31 (at 600 MHz). The larger negative NOE values for the unfolded state of drkN SH3 indicate less restricted motions at the ns-ps timescale in contrast to observations on the folded state of drkN SH3 where the mean NOEs were found to be -0.39±0.09 (at 500 MHz) and -0.36±0.19 (at 600 MHz). The less negative mean values as well as the narrower variation in the values across the sequence indicate a greater degree of restricted motions in the folded state of drkN SH3. The longitudinal relaxation rates (R1) observed for Tat (Fig. 4.11(b)) have mean values of 1.5±0.2 and 1.4±0.2 s⁻¹ at 600 and 800 MHz respectively. These R1 values for Tat are similar to those observed for the drkN SH3 denatured state (1.6±0.2 s⁻¹ at 500 MHz and 1.5±0.2 s⁻¹ at 600 MHz). The folded state of drkN SH3 has mean R1 values of 2.5±0.3 s⁻¹ (500 MHz) and 2.2±0.2 s⁻¹ (600 MHz). The slightly lower R1 values for the unfolded state of drkN SH3 and those observed for Tat compared to the folded state of drkN SH3 imply slower relaxation and hence shorter rotational correlation times (τc) and faster dynamics at the ns-ps timescale. The rotating frame longitudinal relaxation rates (R1ρ) measured at both 600 and 800 MHz (Fig. 4.11(c)) have means of 3.3±1.1 and 3.5±1.4 s⁻¹ respectively.
These rates show more variation across the sequence than the R1 values, and have local maxima in the neighbourhood of the end of the Cys-rich region (Cys-57 to Ile-59), in the middle of the Pro-rich region (near the residues adjacent to Pro-30), and near the hexahistidine segment of the affinity tag (His-5 to His-10). These same regions are found to have higher values
of transverse relaxation rates (R2 ) measured at 600 MHz (Fig. 4.15). The increased rates in these regions, relative to the neighbouring residues, are likely indicative of contributions from slow conformational exchange. The mean transverse relaxation rates (R2 ) from the 600 MHz field measurements were found to be 3.8±1.3 s−1 . The increase in the average R2 rate over that of the R1ρ is likely the result of the increased sensitivity of the R2 experiment to conformational exchange. Both the R1ρ and R2 means are slightly higher than the values observed with the urea denatured state of drkN SH3 (approximately 3.0±0.9 s−1 at both 500 and 600 MHz) [90]. The folded state of drkN SH3 has mean transverse rates of approximately 6.0±0.8 s−1 . Again the slower rates observed for Tat are indicative of faster dynamics at the ms-µs timescale. It is worth noting that the errors in both R2 and R1ρ measurements for Tat are roughly a factor of 10 and 4, respectively, larger than the errors observed for the R1 measurements. There are several reasons for this increase in the error. In general, the transverse rates are faster than the longitudinal relaxation rates. Consequently, the signal for the resonance will decay faster and reduce the signal-to-noise of the peak thereby making measurement of the intensity higher in error. Additionally, the pulse sequence for both the R2 [178, 201] and R1ρ [202] experiments contain many pulses on the
15N channel for the CPMG (in R2) and spin-lock (in R1ρ) that introduce errors due to magnetic field inhomogeneities. These same pulses on the 15N channel may also introduce coil/sample heating which will affect both
the position and intensity of the resonance resulting from the rotational correlation time decreasing as the viscosity of the solvent decreases with increasing temperature [184] as well as deterioration of the lock signal. The heating of the sample and coil as the relaxation delay increases limits the range of relaxation delays available for both the R2 and R1ρ experiments (relaxation delays should be ≤ 250 ms). This limit on the maximum relaxation delay results in a much smaller range of sampling times compared to the R1 experiments, which sample relaxation delays between 0 and 4 seconds so that the decay to zero is observed.
This is illustrated in the sample fits for the T1 (Fig. 4.13), T1ρ (Fig. 4.14) and T2 (Fig. 4.17) data of Gly-68 that show the narrow range of data sampled in the T1ρ and T2 measurements compared to the T1 measurements (since the decay to zero is not observed).
5.5 Reduced Spectral Density Mapping
The original implementation of spectral density mapping by Peng and Wagner [181] used an expanded set of six relaxation experiments in order to evaluate the spectral density function at the five critical frequencies for a single field strength: J(0), J(ωN), J(ωH − ωN), J(ωH), and J(ωH + ωN) as well as the contribution from conformational exchange on the µs-ms timescale. The reduced spectral density method [90, 181, 213] avoids the collection of six relaxation rates by replacing J(ωH) and J(ωH ± ωN) with a single high frequency spectral density function [181] and combining the contribution from conformational exchange into an effective low frequency spectral density, Jeff(0), defined in equation (3.6).

$$J_{\mathrm{red}}^{\mathrm{hifreq}} = J(\omega_H + \omega_N) = J(\omega_H - \omega_N) = J(\omega_H) \qquad (5.1)$$
The above assumption is deemed reasonable because J(ω) is relatively flat at the frequencies ωH and ωH ± ωN, and because the heteronuclear cross-relaxation rates become smaller with increasing field strength [181]; the latter observation is true for Tat but the former has not been verified. The reduced spectral density approach thus allows for the mapping of the spectral densities at three frequencies, Jeff(0), J(ωN), and J(0.87ωH), using only three relaxation data sets which for convenience are chosen to be the R1, R1ρ and the steady-state heteronuclear NOE since they are also used for the model-free formalism. If relaxation data are collected at two field strengths, then the spectral density is mapped at five frequencies corresponding to the 0 and the Larmor frequencies of the 15N and 1H spins at each field.
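As a rough numerical illustration of the mapping, the sketch below uses the reduced spectral density expressions in the form commonly attributed to Farrow et al. [90]. It is an assumption that these match equations (3.8)-(3.10) term for term; the N-H bond length (1.02 Å), the 15N chemical shift anisotropy (-160 ppm), and the function name `reduced_spectral_density` are likewise illustrative choices, not values confirmed by this chapter. The measured R1ρ is passed in place of R2, as in the text.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T^2 m^3 J^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1 (1H)
GAMMA_N = -2.7116e7          # rad s^-1 T^-1 (15N)
R_NH = 1.02e-10              # m, assumed N-H bond length
CSA = -160e-6                # assumed 15N chemical shift anisotropy


def reduced_spectral_density(r1, r2, noe, b0):
    """Return (J_eff(0), J(omega_N), J(0.87 omega_H)) in s/rad.

    r1, r2 are rates in s^-1 (r2 may be the measured R1rho),
    noe is the steady-state heteronuclear NOE, b0 the field in tesla.
    """
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH**3   # dipolar constant
    c = GAMMA_N * b0 * CSA / math.sqrt(3.0)                 # CSA constant
    sigma = r1 * (noe - 1.0) * GAMMA_N / GAMMA_H            # cross-relaxation rate
    denom = 3.0 * d**2 + 4.0 * c**2
    j_high = 4.0 * sigma / (5.0 * d**2)
    j_n = (4.0 * r1 - 5.0 * sigma) / denom
    j_0 = (6.0 * r2 - 3.0 * r1 - 2.72 * sigma) / denom
    return j_0, j_n, j_high


# Mean values quoted in Section 5.4 for the 600 MHz (14.1 T) data.
j0, jn, jh = reduced_spectral_density(r1=1.5, r2=3.3, noe=-1.27, b0=14.1)
print(f"J_eff(0) = {j0 * 1e9:.2f} ns/rad")
print(f"J(wN)    = {jn * 1e9:.2f} ns/rad")
print(f"J(0.87wH) = {jh * 1e12:.0f} ps/rad")
```

Under these assumed constants the sequence-averaged inputs give a low frequency value on the order of 0.7 ns/rad; a per-residue analysis would apply the same function to each residue's rates.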
Figure 5.1(a) shows the plots of the longitudinal and transverse relaxation rates and the steady-state heteronuclear NOE determined as a function of the overall rotational correlation time of the molecule from evaluation of the relaxation parameters in equations (2.173), (2.174) (assuming Rex = 0) and (2.186) using the orientational spectral density function in (2.135) for an isotropically tumbling rigid body where τc represents the overall rotational correlation time [269]. The plots were calculated with respect to a 14.1 T magnetic field. Figure 5.1(b) shows the variation in the corresponding spectral density functions with the overall rotational correlation time from evaluation of (2.135) at frequencies 0, 61, and 600 MHz. Both plots in Fig. 5.1 are presented in a logarithmic scale except for the NOE (right y-axis). For a protein the size of His-tagged Tat1−72 (10.5 kDa) the theoretical rotational correlation time according to the Stokes-Einstein-Debye equation [261, 270] for a spherical body is given by

$$\tau_m = \frac{\eta V}{kT} \qquad (5.2)$$
where η is the viscosity of the sample (1.014 mPa·s), V is the hydrodynamic volume (1.2715 × 10⁻²⁶ m³), k is the Boltzmann constant (1.38066 × 10⁻²³ J·K⁻¹), and T is the temperature (293 K). The calculated overall rotational correlation time for His-tagged Tat1−72 using equation (5.2) is 3.19 ns.
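The arithmetic behind this estimate can be checked directly. The short Python sketch below evaluates equation (5.2) with the values quoted above; the function name `sed_correlation_time` is introduced here only for illustration.

```python
def sed_correlation_time(viscosity, volume, temperature, k_boltzmann=1.38066e-23):
    """Stokes-Einstein-Debye correlation time (s) for a sphere: eta * V / (k * T)."""
    return viscosity * volume / (k_boltzmann * temperature)


# Values quoted in the text: eta in Pa s, V in m^3, T in K.
tau_m = sed_correlation_time(viscosity=1.014e-3, volume=1.2715e-26, temperature=293.0)
print(f"{tau_m * 1e9:.2f} ns")  # prints "3.19 ns"
```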
[Figure 5.1 plots. Panel (a): Log10(Ri) for R1 and R2 (left axis) and the NOE (right axis) against -Log10(τm). Panel (b): Log10[J(ω)] for J(0), J(ωN), and J(ωH) against -Log10(τm).]
Figure 5.1: (a) Variation in the theoretical relaxation rates and steady-state heteronuclear NOE with overall rotational correlation time from evaluation of equations (2.173), (2.174), and (2.186) assuming Rex = 0 and using the orientational spectral density function defined in equation (2.135) relative to a 14.1 T field; (b) Variation in the orientational spectral density function evaluated at zero, ωN and ωH frequencies relative to a 14.1 T field with the overall rotational correlation time (τm).
The plots of the spectral density function (Fig. 5.1(b)) evaluated at 0 and ωN (61 MHz) are clearly colinear in the ps-ns range but begin to diverge near 10 ns. Given the estimated correlation time from the Stokes-Einstein-Debye equation of 3.19 ns for His-tagged Tat1−72, the J(0) and J(ωN) motions will be correlated for the 14.1 T and 18.8 T fields. Conversely, the spectral density function at low- and mid-frequencies will be anticorrelated with the J(ωH) [271]. The integral of the spectral density function J(ω) is constant over the entire frequency range [184]. For motionally restricted amide bond vectors, the greatest contributions to the spectral density function will come from the low-frequency components and the high-frequency contributions will be minor [272]. Highly mobile N-H vectors will have the greatest contribution to the spectral density function from the high-frequency components, J(ωH), and the low- and mid-frequency contributions will decrease. Consequently, intramolecular motions increase the values of J(ω) at high frequency but decrease its magnitude at low and mid-frequencies for proteins in the small- to medium-molecular-weight range [271, 272]. With these characteristics in mind, the interpretation of the reduced spectral density mapping allows for the following observations (Sections 5.5.1-5.5.3) on the dynamics of the His-tagged Tat1−72 protein.
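The behaviour of the rigid-body curves in Figure 5.1(b) can be reproduced with a short sketch. It assumes that equation (2.135) has the usual single-Lorentzian form J(ω) = (2/5) τc / (1 + ω²τc²); the 2/5 normalization and the function name `j_rigid` are assumptions made here, and the normalization cancels in the ratio that is printed.

```python
import math


def j_rigid(freq_hz, tau_c):
    """Spectral density (s/rad) of an isotropically tumbling rigid body."""
    omega = 2.0 * math.pi * freq_hz
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)


# Ratio J(omega_N)/J(0) at 61 MHz for a range of correlation times.
for tau_c in (1e-11, 1e-10, 1e-9, 3.19e-9, 1e-8):
    ratio = j_rigid(61e6, tau_c) / j_rigid(0.0, tau_c)
    print(f"tau_c = {tau_c:.2e} s   J(wN)/J(0) = {ratio:.3f}")
```

A ratio close to one means the two curves coincide; the ratio falls as ωN·τc approaches and then exceeds unity.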
5.5.1 J(0.87ωH)
The high frequency spectral density plots (Fig. 4.18(a)) show very little variation in range over the length of the sequence except in the region of the hexahistidine affinity tag. There is a significant increase in the high-frequency contributions in the N-terminal His-tag (residues 4-10) observed at 522 MHz. The C-terminal region of the protein does not show any significant increase in high-frequency components. This observation is in contrast to the low pH and urea-unfolded state of apomyoglobin in which minima in J(0.87ωH ) plots correspond to maxima in buried surface area in the folded protein [217] suggesting that hydrophobic
interactions persist even in 8 M urea and low pH. Interestingly, the minima in the high frequency spectral densities are less apparent in the acid-unfolded state of apomyoglobin [254] but the maxima in the J(ωN ) plot still correlate weakly with the maxima in buried surface area—implying that the spectral density at mid-frequency is more sensitive to residual structure. The lack of definition in the J(522) and J(696) plots for reduced Tat1−72 implies a lack of formation of any residual structure at pH 4.1.
The range of values for the
high frequency spectral density corresponds well to the range observed for the guanidine denatured state of drkN SH3 [90], acid-denatured apomyoglobin [254], low pH/urea denatured apomyoglobin [181], and the natively disordered pro-peptide of subtilisin [223]. By comparison, the folded state of drkN SH3 shows the J(0.87ωH ) values to be roughly half of those observed for the unfolded state [90]—indicating a greater degree in the restriction of the motions probed at these high frequencies.
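The frequencies quoted in this chapter (61 and 81 MHz for ωN, 522 and 696 MHz for 0.87ωH) follow directly from the two proton frequencies. A minimal sketch, assuming standard gyromagnetic ratios and using the hypothetical helper `mapped_frequencies`:

```python
# |gamma_N| / gamma_H from standard gyromagnetic ratios (about 0.1014).
GAMMA_RATIO_N_H = 2.7116e7 / 2.6752218744e8


def mapped_frequencies(proton_mhz):
    """Frequencies (MHz) sampled by reduced spectral density mapping at one field."""
    return {
        "0": 0.0,
        "omega_N": proton_mhz * GAMMA_RATIO_N_H,
        "0.87 omega_H": 0.87 * proton_mhz,
    }


for field in (600.0, 800.0):
    freqs = mapped_frequencies(field)
    print(field, {name: round(value, 1) for name, value in freqs.items()})
```

Because the zero frequency is shared, two fields give five distinct frequencies in total.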
5.5.2 J(ωN)
More variation is observed in the range of values for J(ωN) probed at 61 and 81 MHz (Fig. 4.18(b)). The spectral density function at mid-frequencies is sensitive to motions on the ns-ps timescale. In contrast to the high frequency values for small- and medium-sized proteins, the increased motion in the protein backbone is marked by a reduction in the spectral density at mid-frequencies [271]. The J(ωN) plots (in Fig. 4.18(b)) show that the termini have the lowest contribution to the spectral density at mid-frequencies. These residues are those involved in the slowest relaxation rates and most negative NOEs (Fig. 4.11) and hence are the regions of greatest flexibility. The increased motion at the ends of the protein is common for both folded and unfolded proteins and is often termed end effects [242, 273]. Residues Asn-44, Lys-61 and Ala-62 were also found to have significantly
faster motions—on the order of the sequence termini. Schwarzinger et al. [252] in the studies of low pH/urea denatured apomyoglobin noted that high proportions of glycine and alanine were present in the most flexible regions of the protein and suggested that the Gly/Ala rich segments serve as flexible “molecular hinges”. The region of Tat following the Cys-rich region contains one alanine and two glycines (Ala-62, Gly-64 and Gly-68). Interestingly, Gly-64 is one of the residues that is observed as two distinct peaks of approximately equal intensity in the HSQC spectra of Tat (Fig. 4.3 and 4.4), while Ala-62 and Leu-63 each have four distinct peaks of varying intensity (the most intense peak is labelled in Fig. 4.3 and 4.4 and additional resonances are tabulated in Table A.2). At the other end of the Cys-rich region there is only a single alanine residue (Ala-41) within the 10 residues preceding the Cys-rich region. The small side-chains of alanine and glycine residues flanking the Cys-rich region may provide a higher degree of mobility at the ends of the Cys-rich region and thereby allow more rapid sampling of conformational space and facilitate the rapid reorientation of the cysteines in the presence of Zn2+ ions or a binding partner (cyclin T1) involved in transcriptional regulation [105, 147, 274]. The reduced spectral density mapping of the acid-unfolded [254] and to a lesser extent the low pH/urea denatured [252] state of apomyoglobin showed local maxima of J(ωN ) correlated with maxima in the average area buried upon folding. As there is no folded form of the Tat protein to compare buried surface area, it can only be said that three residues exceed one standard deviation of the average value of the J(ωN ) data at both fields: Glu-29, Trp-31 and Lys-32—perhaps indicating the burial of the indole ring of Trp-31 in a hydrophobic cluster.
5.5.3 Jeff(0)
The errors in the R1ρ measurements noted previously become quite significant in the calculation of the low frequency spectral density, Jeff(0), using equation (3.15) from the method of Farrow et al. [90] and shown in Figure 4.20(a). The measured values for R1ρ are involved only in the analytical solution of Jeff(0) in the reduced spectral density mapping from equations (3.8)-(3.10). The R1ρ errors propagate in the estimation of Jeff(0) using equation (3.15) such that they result in errors in Jeff(0) that are roughly 4 times the errors in the two estimates of Jeff(0) using the 600 and 800 MHz data sets separately. With this in mind it is more useful to use the mean residue zero frequency spectral density (Jeff(0) in Fig. 4.20(d)) to assess the slow motions of the protein. The overall average in the low frequency spectral density is 0.7±0.3 ns/rad. Residues with significantly larger values of Jeff(0) compared to their neighbouring residues can indicate regions of conformational exchange on the µs-ms timescale. There are eight residues which exceed one standard deviation of the mean in Figure 4.20(d): Asp-25, Leu-28, Glu-29, Trp-31, His-33, Gly-35, Ala-41, and Val-56 (residues Leu-28, Glu-29, Trp-31 actually exceed 2×s.d.). Of these residues, only Glu-29 and Trp-31 agree with the spectral density mapping at high and mid-frequencies in terms of restriction in the dynamics and may perhaps indicate conformational preferences. According to the chemical shift difference analysis (Fig. 4.8), some residues in this region do exceed the limits of the random coil range. However, it should be kept in mind that, even though exceeding 2×s.d. of the mean, all Jeff(0) values still fall in the range of motions observed for unfolded or partially unfolded proteins [90, 223, 252, 254, 264, 272, 275–277].
The fact that two residues in a three residue segment (Glu-Pro-Trp) have spectral density values consistent with more restricted dynamics does not necessarily imply that they are involved in transient structure formation as the actual values of all the spectral densities are still in ranges comparable to disordered or denatured proteins. However, based on the chemical
shift differences, it may not be unreasonable to suggest that there may be some tendency of this segment to exist in α-helical conformations.
5.6 Model-Free Analysis
The model-free formalism is often employed to interpret NMR relaxation parameters in terms of the motions of folded globular proteins. However, in the case of denatured or disordered proteins, the Lipari-Szabo approach [175, 176] is quite limited due to the underlying assumption of the separability of the internal and overall motions. Some success has been achieved in the modelling of denatured or disordered proteins using variations of the model-free method such as the Cole-Cole [223, 224] and Lorentzian [225] distribution models in which the model-free spectral density is based on a distribution of local overall correlation times. The assumption of the separability of the internal motions from the distributions of overall motions is however still present in these approaches. Interpretation of the relaxation data, on partially or fully disordered proteins, in terms of the Lipari-Szabo formalism [175, 176] (or variations thereof [221, 223–225]) is complicated by several factors including the following [269]:
• The disordered state of the protein exists as an ensemble of conformations in fast exchange on the NMR timescale and the measured relaxation parameters therefore represent a population weighted average of the ensemble through chemical shift averaging.
• The shape of any one of the conformations in the ensemble could be anisotropic.
• The disordered protein is not likely to be fully extended and behave as a rigid rotor and may therefore preclude a description of molecular reorientation in terms of an overall rotational correlation time.
• Some disordered proteins (or partially disordered proteins) exist as an ensemble of conformations in intermediate exchange on the NMR timescale, diminishing the intensities of the NMR resonances.
With regard to the fact that a disordered protein exists as an ensemble of rapidly converting conformations, it can be assumed that conformational averaging should ‘smooth’ static anisotropy resulting in an isotropic average [269, 278]. The generalized order parameter S2 in equation (2.198) from the original Lipari-Szabo formalism [175, 176] describes the amplitude of the motions of the internuclear vector (the 1H-15N amide bond vector in this case) on a timescale faster than overall tumbling [279].
The order parameter, according to the assumptions of Lipari and Szabo, is defined such that 0 ≤ S2 ≤ 1 with S2 = 1 representing complete restriction of internal motion and S2 = 0 representing relaxation in which the internuclear vector is completely dominated by internal motions. The 1 − S2 term in equation (2.198) represents the extent of orientational motion that is lost due to the internal motions as opposed to rotational diffusion [279]. The flexibility of a protein backbone is therefore reflected in the magnitude of the order parameter. As flexibility in a protein often changes as a result of binding interactions, changes in the order parameter may provide a useful indicator of regions where binding of a flexible region of the protein to a target (protein binding partner, nucleic acid, etc.) occurs. The generalized order parameter could therefore reflect changes in the dynamics of the Cys-rich region of Tat (residues 42-57) in the presence of Zn2+ and cyclin T1, or the basic region (residues 68-77) in the presence of TAR. Fast internal motions described by the effective internal correlation times (τe) are associated with small amplitude librations resulting from restriction of the N-H bond vector. Internal motions faster than 100 ps are associated with order parameters in the 0.7 to 1.0 range [272, 280]. Regions of consecutive residues in the protein where the internal motions
of the N-H bond vector are fast relative to the rest of the segment may indicate transient structure or centres of early folding events. Conversely, slow internal motions are associated with large amplitude fluctuations of the N-H bond vector in regions of increased flexibility (lower order parameters). Extremely slow effective internal correlation times are associated with completely unrestricted motions and here S2 tends toward zero. The simple model-free formalism as originally posed by Lipari and Szabo [175, 176] assumes isotropic molecular tumbling (τc) on the nanosecond timescale and fast internal motions (τe) with characteristic correlation times of less than 100 ps [175, 176, 281]. However, if the internal motions are much faster than the overall tumbling (with τc/τe ≫ 100), then the Lipari-Szabo spectral density function defined in equation (3.18) becomes insensitive to the timescale of internal motions but is still sensitive to the degree of restriction [210]. In such a case, a simplified spectral density equation could be used in which the effective correlation time of internal motions, τe, is assumed to be zero and only S2 and τc are optimized. Conversely, if the overall motions and internal motions occur on similar timescales (τc/τe ≪ 100), then the motion cannot be considered isotropic in the Lipari-Szabo sense; the timescales of motion are less clearly separated and cannot be considered to be uncorrelated. The extended model-free equations, as in equation (3.22), suggested by Clore et al. [221] describe internal motions on two uncorrelated timescales (slow and fast) in which an order of magnitude difference exists between τs and τf. However, the slow timescale motions often approach the timescale of overall motion and there is not a well defined separation between overall tumbling and internal motion [210].
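The insensitivity to τe when the timescales are well separated can be seen numerically. The sketch below assumes that equation (3.18) has the standard Lipari-Szabo form, J(ω) = (2/5)[S2 τc/(1 + ω²τc²) + (1 − S2) τ/(1 + ω²τ²)] with 1/τ = 1/τc + 1/τe; the function name `j_model_free` and the parameter values (τc = 3 ns, S2 = 0.5) are illustrative and are not fitted values from this work.

```python
import math


def j_model_free(omega, s2, tau_c, tau_e):
    """Simple Lipari-Szabo spectral density (s/rad); tau_e = 0 drops the internal term."""
    tau = tau_c * tau_e / (tau_c + tau_e) if tau_e > 0.0 else 0.0
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


omega_n = 2.0 * math.pi * 60.8e6   # 15N Larmor frequency at 14.1 T, rad/s
tau_c, s2 = 3.0e-9, 0.5            # hypothetical values for illustration

for tau_e in (0.0, 1e-12, 5e-12, 1e-10, 1e-9):
    print(f"tau_e = {tau_e:.0e} s   J(wN) = {j_model_free(omega_n, s2, tau_c, tau_e):.3e}")
```

For τe of a few picoseconds the result is nearly indistinguishable from the τe = 0 case, whereas a τe approaching τc changes J(ωN) substantially, which is the regime where the separation of timescales breaks down.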
In the case of the model-free estimation of dynamics parameters for the His-tagged Tat1−72 , the timescales of internal and overall motion are not well separated. As such, it is difficult to make definitive conclusions about the motions of the residues along the chain
with any model-free method. The simplest method that best fits the data is Model 7 using the Cole-Cole distribution of local rotational tumbling and no conformational exchange. Addition of a conformational exchange term to the model (as in Model 8) results in only a slight reduction in the Rf values, but increases both the number of parameters estimated and the value of mean[AIC]+sd[AIC] for the overall fit (see Table 4.2). The difference between Models 7 and 8 is very slight in terms of selection criteria, but as the sum of the χ2 values across the protein sequence is less than the sum of the 95% confidence limit critical values, χ2 (0.95), it is reasonable to choose model 7 to represent the data since it is the simpler of the two models. However, it should be noted that, despite the fact that N ( i
χ2i
100 ps leaving barely one order of magnitude separating the timescales of the two types of motion. The low average value of τe stems from the result of τe = 0 for 14 residues. Not surprisingly, each of these 14 residues corresponds to a failure in the χ2i < χ2(0.95)i condition, suggesting that a different model should be used. Alexandrescu and Shortle [269] observed that failures to meet the χ2i < χ2(0.95)i condition in the two-timescale model
suggested estimations should be made with a model for a different number of timescales of motion (in that case they increased the number of timescales to use the extended model-free approach of Clore et al. [221]). The model-free formalism has several pitfalls when applied to the dynamics of a disordered or partially disordered protein and is not widely used for this purpose. Despite some successes in parameter estimation with variations in the approach, such as the distribution of overall correlation times [223–225], it is unlikely that there exists a clear separation in the timescales of the internal and overall motions. However, the successes in using the modified model-free approach in studying the pro-peptide of subtilisin [223, 224] and the partially unfolded domain 2 of annexin I [225] suggested that some attempt should be made to extract dynamic parameters. Unfortunately, none of the models tested provided estimates with uniform significance across the sequence. It may be worthwhile to consider each residue within a disordered protein, or within a region of disorder in a protein, as an independent tumbling body and to apply the model selection criteria on a per-residue basis. It is difficult to justify treating each residue as a lone amino acid in solution since there are no studies in the literature referring to such a case. However, if a protein is truly disordered, then there is little reason to assume that one model would represent the entire protein.
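The per-residue model selection suggested above can be sketched as follows. This is a minimal illustration, assuming the common chi-square form of the Akaike criterion, AIC = χ² + 2k, which may differ from the definition used for Table 4.2. The model labels, parameter counts and χ² values are hypothetical and invented for illustration; they are not results from this work.

```python
def aic(chi2, n_params):
    """Akaike information criterion in its chi-square form, AIC = chi2 + 2k."""
    return chi2 + 2 * n_params

def select_model(fits):
    """Return the name of the model with the lowest AIC for one residue.

    fits maps a model name to a (chi2, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))

# Hypothetical fits: adding an exchange term costs one extra parameter.
residue_fits = {
    "residue_a": {"no_exchange": (3.2, 3), "with_exchange": (2.9, 4)},
    "residue_b": {"no_exchange": (12.5, 3), "with_exchange": (4.1, 4)},
}

# Choose a model independently for every residue.
choices = {res: select_model(fits) for res, fits in residue_fits.items()}
print(choices)
```

In this sketch a small drop in χ² does not justify the extra exchange parameter, whereas a large drop does, so different residues can end up with different models.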
5.7 pH Effects
The HSQC spectra for Tat over the pH range from 3.3 to 6.7 (Fig. 4.24) demonstrate a general decrease in the intensities of the cross-peaks. The chemical shifts of the resonances for most residues remain relatively unchanged over the same range of pH values. Exceptions are found for Ser-12, Leu-14, Glu-29, Gly-35, Lys-48, Lys-61 and Lys-91, where small changes in the ¹H and ¹⁵N chemical shifts are observed. The absence of increased dispersion in the resonances as the pH is raised supports the suggestion that the disordered state of Tat
persists at physiological pH. There are several possible explanations for the variation in the intensities of the NMR signals in the HSQC spectra as the pH of the Tat samples increases. One possibility is that the loss of signal intensity is due to amide hydrogen exchange with the solvent water, which is most pronounced in the affinity tag (residues 1–20) and Cys-rich (residues 42–57) regions of the protein, where the predicted hydrogen exchange rates are highest (Fig. 4.25). The sensitivity of these regions to hydrogen exchange may result from saturation transfer effects with the water signal, leading to resonances within these regions vanishing rapidly as the pH exceeds 4. The susceptibility of resonances from the affinity tag and cysteine-rich region to hydrogen exchange is indirect evidence that these regions are not involved in the hydrogen bonding that would be present in stable folded conformations and would protect them from hydrogen exchange. The significantly higher rates found in the affinity tag (in particular for the consecutive histidines) are in part responsible for the generally low intensity of affinity tag resonances at or near physiological pH, a general benefit of the tag in NMR spectroscopy. In fact, at pH 6.7 the predicted hydrogen exchange rates for His-6 to His-10 are greater than the amide proton coupling constant (JNH = 94 Hz). When kex is much greater than JNH, the lifetime of the state is too short to be observed in the INEPT period (10.6 ms) and there will be a loss of coherence resulting in no signal [283, 284]. Only one other residue, Cys-54, has a hydrogen exchange rate that exceeds JNH at pH 6.7. A possible mechanism for the observed general loss of signal intensity throughout the protein with increasing pH is the inefficient application of the water “flip-back” pulse prior to acquisition. The flip-back pulse is a selective 90° pulse which returns the bulk of the water magnetization to the +z-axis [285].
Any remaining transverse solvent magnetization is then dephased by the gradient pulses [169]. However, for labile amide protons exchanging with
the water, this would lead to saturation transfer effects from the dephased water, resulting in loss of N-H intensity. As the predicted rates of hydrogen exchange for Tat are highest within the affinity tag and Cys-rich regions, these regions would be most affected by saturation transfer between the exchanging amide protons and water, which may account for their rapid disappearance as the pH is raised. A second possible explanation of the loss of signal intensity that cannot be ignored is the formation of oxidized protein as the pH is raised and the cysteines are less likely to be in a protonated state. The increased sizes of oxidized multimers would result in significant differences in their molecular rotational correlation times compared to the monomer and changes in the chemical environments of many of the residues. Such a complex mixture (dimers, trimers, tetramers, ..., icosamers) would result in many weak signals among those of the monomer that would not likely be detectable unless there were some uniformity in the multimerisation (i.e., all dimers or all trimers, etc.). It is worth noting that the intensities of the HSQC resonances in Figure 4.24 are highest for the pH 4.1 sample, which is slightly higher than the pH at which the hydrogen exchange rate is at a minimum in model compounds. For model compounds in water, the logarithm of the hydrogen exchange rate reaches a minimum at approximately pH 3 (pHmin) [261, 262]. The value of pHmin reflects the ratio between the acid- and base-catalysed exchange rates [286]. For peptides and proteins, deviations from the pHmin of 3.1 observed for model compounds [262, 287] result from unequal effects on the acid- and base-catalysed rate constants: higher ka and lower kb result in an elevated pHmin (> 3) [287]. Such deviations of proteins from model compound behaviour are due to sequence-dependent inductive contributions to exchange as well as electrostatic contributions [262].
In solvent-accessible regions of the protein, there may also be local pH effects dependent on the net charge of the protein and the ionic strength of the solution [287]. Additional deviations in pHmin result from protein structure, although those are not likely to be significant in the case of Tat in the present study.
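The dependence of pHmin on the acid- and base-catalysed rate constants discussed in this section can be made concrete with a short sketch. It assumes the usual two-term rate law kex = ka[H+] + kb[OH−], with the water-catalysed term neglected and pKw taken as 14; under that assumption the minimum lies at pHmin = ½(pKw + log10(ka/kb)). The rate constants below are hypothetical values chosen so that the reference case gives pHmin = 3, and the function names are illustrative. The last lines check that the 10.6 ms INEPT period quoted above equals 1/JNH for JNH = 94 Hz.

```python
import math

PKW = 14.0  # assumed ionisation constant exponent of water

def k_ex(ph, k_acid, k_base, pkw=PKW):
    """Acid- plus base-catalysed exchange rate (s^-1); water term neglected."""
    return k_acid * 10.0 ** (-ph) + k_base * 10.0 ** (ph - pkw)

def ph_min(k_acid, k_base, pkw=PKW):
    """pH at which k_ex is smallest, from d(k_ex)/d(pH) = 0."""
    return 0.5 * (pkw + math.log10(k_acid / k_base))

# Hypothetical second-order rate constants (M^-1 s^-1), for illustration only.
reference = ph_min(1.0, 1.0e8)   # chosen so that the minimum falls at pH 3
shifted = ph_min(10.0, 1.0e7)    # higher k_acid and lower k_base
print(f"pHmin reference = {reference:.2f}, shifted = {shifted:.2f}")

J_NH = 94.0                      # Hz, value quoted in the text
INEPT_PERIOD_MS = 1.0e3 / J_NH   # 1/J in milliseconds
print(f"1/J_NH = {INEPT_PERIOD_MS:.1f} ms")

def exceeds_coupling(rate, j=J_NH):
    """True when an exchange rate (s^-1) is larger than the coupling constant."""
    return rate > j
```

Raising ka or lowering kb moves the minimum to higher pH, in line with the elevated pHmin described for peptides and proteins.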
5.8 Disorder Predictions
The disorder-predicting algorithms tested with the sequence for His-tagged Tat1−72 all found the sequence to be predominantly disordered. The algorithms for RONN (Fig. 4.31), PONDR (Fig. 4.30), IUPred (Fig. 4.32), and DisProt VL3 (Fig. 4.29(a)) predict some degree of order in the cysteine-rich region of the protein (the width of the ordered segment predicted by these methods varies). In all of these cases the prediction methods are likely weighting the sequence toward order because of the presence of several cysteine residues, which would be expected to be involved in intramolecular disulfide bond formation or to be coordinated with a metal ion (zinc finger-like). These prediction algorithms do not allow for forced predictions of the protein in a reduced state and therefore assume that each cysteine has the potential to be involved in a disulfide bond. In general, the prediction algorithms agree with the inferences made from the reduced spectral density mapping about the general lack of restriction in the dynamics and motions of the amide backbone of the protein. Although the sequential assignment of the protein shows that all of the cysteine residues are in the reduced state, the relaxation data for the cysteine residues are limited due to low signal-to-noise. These residues are likely undergoing slow conformational exchange on the NMR timescale, resulting in several cross-peaks of varying intensity being observed for a single residue. Despite the absence of some cysteine residues from the spectral density mapping, the plots in Figures 4.18 and 4.20 show that the cysteine-rich region has varying degrees of motional freedom, but spectral density values for all residues fall within the ranges observed for other disordered or partially disordered proteins [90, 223, 252, 254, 264, 272, 275–277]. Thus, the predictions for sequence disorder throughout the protein from the DisProt program (Fig. 4.29) agree more closely with the reduced spectral density data, with the predictions from the VL3H and VL3E algorithms showing the best agreement.
Chapter 6 Conclusions

We have developed both an efficient method for the bacterial over-expression of Tat1−72 uniformly labelled with ¹⁵N and ¹³C and a rapid purification protocol based on a hexahistidine affinity tag and metal affinity chromatography. This expression and purification system yields on the order of 20 mg of the uniformly labelled protein per litre of labelling medium. Both MALDI-TOF-MS and NMR spectroscopy have shown that the resulting protein is unambiguously reduced and monomeric in solution at pH 4. This expression system provided sufficient protein for detailed structural and dynamic analysis of Tat [166] and may be used to study its interaction with potential binding partners using heteronuclear NMR methods. This is the first example of a uniformly ¹⁵N/¹³C-labelled Tat and paves the way for a number of potential studies of Tat interactions with host cell proteins and TAR. Intrinsically disordered proteins are classified into the eight categories proposed by Uversky et al. [29] based on their level of disorder (see Fig. 1.1). Measurement of both static and dynamic multinuclear NMR parameters shows that Tat1−72 exists predominantly in a wholly disordered extended (random coil) conformation at pH 4. However, multinuclear NMR has also revealed evidence for multiple backbone conformations mainly, but not
exclusively, in the Cys-rich region and core. Possible origins of the minor cross-peaks include: cis-trans proline peptide isomerization, minor Cys oxidation, and multiple conformers in slow equilibrium. The multiplicity of some peaks in the spectra, together with broadened peaks and the changes in peak intensity as a function of pH, suggest that the Cys-rich and core regions form transiently stabilized structures at acidic and neutral pH. The present results are pertinent to Tat’s interactions with intracellular binding partners such as cyclin T1 that are expected to encounter only reduced Tat in the intracellular environment. Cyclin T1 likely recognizes the transiently stabilized structure that forms in the Cys-rich region of Tat. Furthermore, the affinity of Tat for the loop region of TAR is greatly increased by interaction with cyclin T1 suggesting binding-induced folding [288], another feature of intrinsically disordered proteins. Multinuclear NMR is likely to be of value in determining the structures of complexes of Tat and its interaction partners. Finally, there is considerable interest in developing a Tat vaccine [139] based on the presence of Tat antibodies in HIV-1-infected individuals who are long-term non-progressors to AIDS. The antibodies raised against oxidized protein putatively recognize conformational epitopes suggesting that, at neutral pH, parts of the protein exist in a stable conformation. The present dynamics analysis suggests that the most likely region to fold is the Cys-rich region and the formation of disulphide bonds could stabilize local structure there. However, the high positive charge density, lack of hydrophobic residues, and our dynamics analysis suggest that the remainder of the protein is unlikely to form a stable conformation even at neutral pH. Given the results of this research, there are many directions to take future studies. 
Of key importance is our efficient method to obtain the monomeric and isotopically enriched samples for any number of solution-state NMR experiments. One area that needs to be studied is the structural and dynamic behaviour of Tat at pH values closer to physiological conditions. The loss of NMR resonance intensity at higher pH values may require the presence of zinc ions, both to retrieve lost signals corresponding to the Cys-rich region and to prevent intermolecular disulfide cross-links. Even in the presence of zinc, the protein is not likely to adopt a stable conformation, but the coordination with zinc may change the rate of conformational exchange within the Cys-rich and core regions where signal intensity was observed to be weak. Zinc-bound Tat may provide a convenient way of studying monomeric Tat without resorting to harsh reducing conditions at higher pH. NMR relaxation studies of Tat at higher pH would provide a reference frame for the dynamics of Tat prior to any future binding studies, and provide much needed information on the dynamic behaviour of Tat in solution to gain understanding of its role in HIV regulation and other pathogenic effects. In addition to the zinc studies, NMR investigation of Tat bound to a peptide fragment corresponding to the cyclin T1 binding domain would provide detail on the structural and functional aspects of this key regulatory complex. Addition of TAR RNA to the Tat–co-factor complex would provide greater insight into the role of Tat. However, the study of such a large complex will present new difficulties in terms of NMR signal intensity due to the slower tumbling of the complex in solution relative to Tat alone. Another aspect of the aforementioned studies would be the inclusion of the full 101-residue protein. Although the 72-residue protein encoded by the first tat exon appears to govern its transcriptional activities, the remaining 29 residues must be of significance to the virus life cycle since they are conserved in all natural HIV isolates. Thus far, the full-length protein has been ignored in structural studies. In addition to providing additional epitopes for developing Tat vaccines across many HIV subtypes, the presence of residues 73–101 may have an important effect on some of the non-transcriptional activities of Tat that need to be addressed.
Bibliography
[1] Wright, P. E.; Dyson, H. J. J. Mol. Biol. 1999, 293, 321–331.
[2] Dunker, A. K.; Lawson, J. D.; Brown, C. J.; Williams, R. M.; Romero, P.; Oh, J. S.; Oldfield, C. J.; Campen, A. M.; Ratliff, C. M.; Hipps, K. W. J. Mol. Graphics Modell. 2001, 19, 26–59.
[3] Fischer, E. Ber. Dt. Chem. Ges. 1894, 27, 2985–2993.
[4] Lemieux, R. U.; Spohr, U. Adv. Carbohydrate Chem. Biochem. 1994, 50, 1–20.
[5] Wu, H. Chinese J. Physiol. 1931, 1, 219–234.
[6] Mirsky, A. E.; Pauling, L. Proc. Natl. Acad. Sci. USA 1936, 22, 439–447.
[7] Pauling, L.; Corey, R. B.; Branson, H. R. Proc. Natl. Acad. Sci. USA 1951, 37, 205–211.
[8] Pauling, L.; Corey, R. B. Proc. Natl. Acad. Sci. USA 1951, 37, 729–740.
[9] Pauling, L.; Corey, R. B. Proc. Natl. Acad. Sci. USA 1951, 37, 251–256.
[10] Kauzmann, W. Adv. Protein Chem. 1959, 14, 1–63.
[11] Tanford, C. Protein Sci. 1997, 6, 1358–1366.
[12] Kendrew, J. C.; Dickerson, R. E.; Strandberg, B. E.; Hart, R. G.; Davies, D. R.; Phillips, D. C.; Shore, V. C. Nature 1960, 185, 422–427.
[13] Blake, C. C. F.; Koenig, D. F.; Mair, G. A.; North, A. C. T.; Phillips, D. C.; Sarma, V. R. Nature 1965, 206, 757–761.
[14] Watson, J. D.; Crick, F. H. C. Nature 1953, 171, 737–738.
[15] Watson, J. D.; Crick, F. H. C. Nature 1953, 171, 964–967.
[16] Karush, F. J. Am. Chem. Soc. 1950, 72, 2705–2713.
[17] Koshland Jr., D. E. Proc. Natl. Acad. Sci. USA 1958, 44, 98–104.
[18] Bennett, W. S.; Steitz, T. A. Proc. Natl. Acad. Sci. USA 1978, 75, 4848–4852.
[19] Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235–242.
[20] Bloomer, A. C.; Champness, J. N.; Bricogne, G.; Staden, R.; Klug, A. Nature 1978, 276, 362–368.
[21] Bode, W.; Schwager, P.; Huber, R. J. Mol. Biol. 1978, 118, 99–112.
[22] Aviles, F. J.; Chapman, G. E.; Kneale, G. G.; Crane-Robinson, C.; Bradbury, E. M. Eur. J. Biochem. 1978, 88, 363–371.
[23] Kriwacki, R. W.; Hengst, L.; Tennant, L.; Reed, S. I.; Wright, P. E. Proc. Natl. Acad. Sci. USA 1996, 93, 11504–11509.
[24] Daughdrill, G. W.; Chadsey, M. S.; Karlinsey, J. E.; Hughes, K. T.; Dahlquist, F. W. Nat. Struct. Biol. 1997, 4, 285–291.
[25] Fletcher, C. M.; Wagner, G. Protein Sci. 1998, 7, 1639–1642.
[26] Shortle, D. Adv. Protein Chem. 2002, 62, 1–23.
[27] Bracken, C. J. Mol. Graphics Modell. 2001, 19, 3–12.
[28] Schweers, O.; Schonbrunn-Hanebeck, E.; Marx, A.; Mandelkow, E. J. Biol. Chem. 1994, 269, 24290–24297.
[29] Uversky, V. N.; Oldfield, C. J.; Dunker, A. K. J. Mol. Recognit. 2005, 18, 343–384.
[30] Holt, C.; Sawyer, L. J. Chem. Soc. Faraday Trans. 1993, 89, 2683–2692.
[31] Weinreb, P. H.; Zhen, W.; Poon, A. W.; Conway, K. A.; Lansbury, P. T. Biochemistry 1996, 35, 13709–13715.
[32] Ohgushi, M.; Wada, A. FEBS Lett. 1983, 164, 21–24.
[33] Creighton, T. E. Proc. Natl. Acad. Sci. USA 1988, 85, 5082–5086.
[34] Jackson, S.; Fersht, A. R. Biochemistry 1991, 30, 10428–10435.
[35] Zwanzig, R. Proc. Natl. Acad. Sci. USA 1997, 94, 148–150.
[36] Ptitsyn, O. B.; Uversky, V. N. FEBS Lett. 1994, 341, 15–18.
[37] Uversky, V. N.; Ptitsyn, O. B. Biochemistry 1994, 33, 2782–2791.
[38] Uversky, V. N.; Ptitsyn, O. B. J. Mol. Biol. 1996, 255, 215–228.
[39] Dunker, A. K.; Obradovic, Z. Nat. Biotech. 2001, 19, 805–806.
[40] Uversky, V. N. Protein Sci. 2002, 11, 739–756.
[41] Radivojac, P.; Iakoucheva, L. M.; Oldfield, C. J.; Obradovic, Z.; Uversky, V. N.; Dunker, A. K. Biophys. J. 2007, 92, 1439–1456.
[42] Tompa, P. Trends Biochem. Sci. 2002, 27, 527–533.
[43] Tompa, P. FEBS Lett. 2005, 579, 3346–3354.
[44] Tompa, P.; Csermely, P. FASEB J. 2004, 18, 1169–1175.
[45] Hoh, J. H. Proteins 1998, 32, 223–228.
[46] Rout, M. P.; Aitchison, J. D.; Magnasco, M. O.; Chait, B. T. Trends Cell Biol. 2003, 13, 622–628.
[47] Kim, T.-A.; Avraham, H. K.; Koh, Y.-H.; Jiang, S.; Park, I.-W.; Avraham, S. J. Immunol. 2003, 170, 2629–2637.
[48] Denning, D. P.; Patel, S. S.; Uversky, V.; Fink, A. L.; Rexach, M. Proc. Natl. Acad. Sci. USA 2003, 100, 2450–2455.
[49] Brown, H. G.; Hoh, J. H. Biochemistry 1997, 36, 15035–15040.
[50] Mukhopadhyay, R.; Kumar, S.; Hoh, J. H. BioEssays 2004, 26, 1017–1025.
[51] Tompa, P.; Szasz, C.; Buday, L. Trends Biochem. Sci. 2005, 30, 484–489.
[52] Radivojac, P.; Vucetic, S.; O’Connor, T. R.; Uversky, V. N.; Obradovic, Z.; Dunker, A. K. Proteins 2006, 63, 398–410.
[53] Bussell, R., Jr.; Eliezer, D. J. Biol. Chem. 2001, 276, 45996–46003.
[54] Dosztanyi, Z.; Csizmok, V.; Tompa, P.; Simon, I. Bioinformatics 2005, 21, 3433–3434.
[55] Dosztanyi, Z.; Csizmok, V.; Tompa, P.; Simon, I. J. Mol. Biol. 2005, 347, 827–839.
[56] Li, X.; Romero, P.; Rani, M.; Dunker, A. K.; Obradovic, Z. Genome Inform. 1999, 10, 30–40.
[57] Romero, P.; Obradovic, Z.; Li, X.; Garner, E.; Brown, C.; Dunker, A. K. Proteins 2001, 42, 38–48.
[58] Romero, P.; Obradovic, Z.; Dunker, A. K. Genome Inform. 1997, 8, 110–124.
[59] Obradovic, Z.; Peng, K.; Vucetic, S.; Radivojac, P.; Brown, C. J.; Dunker, A. K. Proteins 2003, 53(S6), 566–72.
[60] Peng, K.; Vucetic, S.; Radivojac, P.; Brown, C. J.; Dunker, A. K.; Obradovic, Z. J. Bioinform. Comput. Biol. 2005, 3, 35–60.
[61] Yang, Z. R.; Thomson, R.; McNeil, P.; Esnouf, R. M. Bioinformatics 2005, 21, 3369–3376.
[62] Dyson, H. J.; Wright, P. E. Nat. Rev. Mol. Cell Biol. 2005, 6, 197–208.
[63] Ward, J. J.; Sodhi, J. S.; McGuffin, L. J.; Buxton, B. F.; Jones, D. T. J. Mol. Biol. 2004, 337, 635–645.
[64] Uversky, V. N.; Gillespie, J. R.; Fink, A. L. Proteins 2000, 41, 415–427.
[65] Linding, R.; Jensen, L. J.; Diella, F.; Bork, P.; Gibson, T. J.; Russell, R. B. Structure 2003, 11, 1453–1459.
[66] Jones, D. T.; Ward, J. J. Proteins 2003, 53, 573–578.
[67] Ward, J. J.; McGuffin, L. J.; Bryson, K.; Buxton, B. F.; Jones, D. T. Bioinformatics 2004, 20, 2138–2139.
[68] Prilusky, J.; Felder, C. E.; Zeev-Ben-Mordehai, T.; Rydberg, E. H.; Man, O.; Beckmann, J. S.; Silman, I.; Sussman, J. L. Bioinformatics 2005, 21, 3435–3438.
[69] Linding, R.; Russell, R. B.; Neduva, V.; Gibson, T. J. Nucleic Acids Res. 2003, 31, 3701–3708.
[70] Liu, J.; Rost, B. Nucleic Acids Res. 2003, 31, 3833–3835.
[71] Dunker, A. K.; Garner, E.; Guilliot, S.; Romero, P.; Albrecht, K.; Hart, J.; Obradovic, Z.; Kissinger, C.; Villafranca, J. E. Pac. Symp. Biocomput. 1998, 3, 471–482.
[72] Coeytaux, K.; Poupon, A. Bioinformatics 2005, 21, 1891–1900.
[73] Wootton, J. C. Comput. Chem. 1994, 18, 269–285.
[74] Vucetic, S.; Obradovic, Z.; Vacic, V.; Radivojac, P.; Peng, K.; Iakoucheva, L. M.; Cortese, M. S.; Lawson, J. D.; Brown, C. J.; Sikes, J. G.; Newton, C. D.; Dunker, A. K. Bioinformatics 2005, 21, 137–140.
[75] Greenfield, N. J. Nat. Prot. 2007, 1, 2876–2890.
[76] Price, N. C. Biotechnol. Appl. Biochem. 2000, 31, 29–40.
[77] Receveur-Bréchot, V.; Bourhis, J.-M.; Uversky, V. N.; Canard, B.; Longhi, S. Proteins 2006, 62, 24–45.
[78] Tsai, C.-J.; de Laureto, P. P.; Fontana, A.; Nussinov, R. Protein Sci. 2002, 11, 1753–1770.
[79] Fontana, A.; de Laureto, P. P.; Spolaore, B.; Frare, E.; Picotti, P.; Zambonin, M. Acta Biochim. Pol. 2004, 59, 299–321.
[80] Svergun, D. I.; Koch, M. H. J. Curr. Opin. Struct. Biol. 2002, 12, 654–660.
[81] Longhi, S.; Receveur-Brechot, V.; Karlin, D.; Johansson, K.; Darbon, H.; Bhella, D.; Yeo, R.; Finet, S.; Canard, B. J. Biol. Chem. 2003, 278, 18638–18648.
[82] Vachette, P.; Svergun, D. Small-angle X-ray scattering by solutions of biological macromolecules. In Structure and Dynamics of Biomolecules; Fanchon, E.; Geissler, E.; Hodeau, J.-L.; Regnard, J.-R.; Timmins, P. A., Eds.; Oxford University Press: New York, 2000.
[83] Svergun, D. I.; Koch, M. H. J. Rep. Prog. Phys. 2003, 66, 1735–1782.
[84] Lipfert, J.; Doniach, S. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 307–327.
[85] Koch, M. H. J.; Vachette, P.; Svergun, D. I. Q. Rev. Biophys. 2003, 36, 147–227.
[86] Doniach, S. Chem. Rev. 2001, 101, 1763–1778.
[87] Wüthrich, K. Angew. Chem. Int. Ed. Engl. 2003, 42, 3340–3363.
[88] Chatterjee, A.; Kumar, A.; Chugh, J.; Srivastava, S.; Bhavesh, N. S.; Hosur, R. V. J. Chem. Sci. 2005, 117, 3–21.
[89] Dyson, H. J.; Wright, P. E. Chem. Rev. 2004, 104, 3607–3622.
[90] Farrow, N. A.; Zhang, O.; Forman-Kay, J. D.; Kay, L. E. Biochemistry 1997, 36, 2390–2402.
[91] Palmer, A. G. Chem. Rev. 2004, 104, 3623–3640.
[92] Mittag, T.; Forman-Kay, J. D. Curr. Opin. Struct. Biol. 2007, 17, 3–14.
[93] Barre-Sinoussi, F.; Chermann, J. C.; Rey, F.; Nugeyre, M. T.; Chamaret, S.; Gruest, J.; Dauguet, C.; Axler-Blin, C.; Vezinet-Brun, F.; Rouzioux, C.; Rozenbaum, W.; Montagnier, L. Science 1983, 220, 868–871.
[94] Popovic, M.; Sarngadharan, M. G.; Read, E.; Gallo, R. C. Science 1984, 224, 497–500.
[95] Coffin, J.; Haase, A.; Levy, J. A.; Montagnier, L.; Oroszlan, S.; Teich, N.; Temin, H.; Toyoshima, K.; Varmus, H.; Vogt, P.; Weiss, R. A. Nature 1986, 321, 10.
[96] Turner, B. G.; Summers, M. F. J. Mol. Biol. 1999, 285, 1–32.
[97] Cullen, B. FASEB J. 1991, 5, 2361–2368.
[98] Kingsman, S. M.; Kingsman, A. J. Eur. J. Biochem. 1996, 240, 491–507.
[99] Frankel, A. D.; Young, J. A. T. Annu. Rev. Biochem. 1998, 67, 1–25.
[100] Kwong, P. D.; Wyatt, R.; Robinson, J.; Sweet, R. W.; Sodroski, J.; Hendrickson, W. A. Nature 1998, 393, 648–659.
[101] Zwick, M. B.; Saphire, E. O.; Burton, D. R. Nat. Med. 2004, 10, 133–134.
[102] Garzon, M. T.; Lidon-Moya, M. C.; Barrera, F. N.; Prieto, A.; Gomez, J.; Mateu, M. G.; Neira, J. L. Protein Sci. 2004, 13, 1512–1523.
[103] Haseltine, W. FASEB J. 1991, 5, 2349–2360.
[104] Aiken, C.; Konner, J.; Landau, N. R.; Lenburg, M. E.; Trono, D. Cell 1994, 76, 853–864.
[105] Karn, J. J. Mol. Biol. 1999, 293, 235–254.
[106] Freed, E. O. Somat. Cell Mol. Genet. 2001, 26, 13–33.
[107] Liang, C.; Wainberg, M. A. AIDS Rev. 2002, 4, 41–49.
[108] Ensoli, B.; Barillari, G.; Salahuddin, S. Z.; Gallo, R. C.; Wong-Staal, F. Nature 1990, 345, 84–6.
[109] Albini, A.; Benelli, R.; Presta, M.; Rusnati, M.; Ziche, M.; Rubartelli, A.; Paglialunga, G.; Bussolino, F.; Noonan, D. Oncogene 1996, 12, 289–297.
[110] Albini, A.; Soldi, R.; Giunciuglio, D.; Giraudo, E.; Benelli, R.; Primo, L.; Noonan, D.; Salio, M.; Camussi, G.; Rockl, W.; Bussolino, F. Nat. Med. 1996, 2, 1371–1375.
[111] Goldstein, G. Nat. Med. 1996, 2, 960–964.
[112] Nath, A.; Psooy, K.; Martin, C.; Knudsen, B.; Magnuson, D. S.; Haughey, N.; Geiger, J. D. J. Virol. 1996, 70, 1475–1480.
[113] Pocernich, C. B.; Sultana, R.; Mohmmad-Abdul, H.; Nath, A.; Butterfield, D. A. Brain Res. Rev. 2005, 50, 14–26.
[114] András, I. E.; Pu, H.; Deli, M. A.; Nath, A.; Hennig, B.; Toborek, M. J. Neurosci. Res. 2003, 74, 255–265.
[115] Banks, W. A.; Robinson, S. M.; Nath, A. Exp. Neurol. 2005, 193, 218–227.
[116] Westendorp, M. O.; Shatrov, V. A.; Schulze-Osthoff, K.; Frank, R.; Kraft, M.; Los, M.; Krammer, P. H.; Droge, W.; Lehmann, V. EMBO J. 1995, 14, 546–554.
[117] Pumfery, A.; Deng, L.; Maddukuri, A.; de la Fuente, C.; Li, H.; Wade, J. D.; Lambert, P.; Kumar, A.; Kashanchi, F. Curr. HIV Res. 2003, 1, 343–362.
[118] Guo, X.; Kameoka, M.; Wei, X.; Roques, B.; Gotte, M.; Liang, C.; Wainberg, M. A. Virology 2003, 307, 154–163.
[119] Lassen, K.; Han, Y.; Zhou, Y.; Siliciano, J.; Siliciano, R. F. Trends Mol. Med. 2004, 10, 525–531.
[120] Kaul, M.; Garden, G. A.; Lipton, S. A. Nature 2001, 410, 988–994.
[121] King, J. E.; Eugenin, E. A.; Buckner, C. M.; Berman, J. W. Microbes Infect. 2006, 8, 1347–1357.
[122] Toborek, M.; Lee, Y. W.; Flora, G.; Pu, H.; András, I. E.; Wylegala, E.; Hennig, B.; Nath, A. Cell. Mol. Neurobiol. 2005, 25, 181–199.
[123] Nath, A.; Geiger, J. Prog. Neurobiol. 1998, 54, 19–33.
[124] Vendel, A. C.; Lumb, K. J. Biochemistry 2003, 42, 910–916.
[125] Derse, D.; Carvalho, M.; Carroll, R.; Peterlin, B. M. J. Virol. 1991, 65, 7012–7015.
[126] Jeang, K.-T.; Xiao, H.; Rich, E. A. J. Biol. Chem. 1999, 274, 28837–28840.
[127] Kuppuswamy, M.; Subramanian, T.; Srinivasan, A.; Chinnadurai, G. Nucleic Acids Res. 1989, 17, 3551–3561.
[128] Garcia, J. A.; Harrich, D.; Pearson, L.; Mitsuyasu, R.; Gaynor, R. B. EMBO J. 1988, 7, 3143–3147.
[129] Smith, S. M.; Pentlicky, S.; Klase, Z.; Singh, M.; Neuveut, C.; Lu, C. Y.; Reitz, M. S.; Yarchoan, R.; Marx, P. A.; Jeang, K. T. J. Biol. Chem. 2003, 278, 44816–44825.
[130] Bieniasz, P. D.; Grdina, T. A.; Bogerd, H. P.; Cullen, B. R. EMBO J. 1998, 17, 7056–7065.
[131] Chen, D.; Wang, M.; Zhou, S.; Zhou, Q. EMBO J. 2002, 21, 6801–6810.
[132] Weeks, K. M.; Ampe, C.; Schultz, S. C.; Steitz, T. A.; Crothers, D. M. Science 1990, 249, 1281–1285.
[133] Gupta, B.; Levchenko, T. S.; Torchilin, V. P. Adv. Drug Deliv. Rev. 2005, 57, 637–651.
[134] Campbell, G. R.; Pasquier, E.; Watkins, J.; Bourgarel-Rey, V.; Peyrot, V.; Esquieu, D.; Barbier, P.; de Mareuil, J.; Braguer, D.; Kaleebu, P.; Yirrell, D. L.; Loret, E. P. J. Biol. Chem. 2004, 279, 48197–48204.
[135] Avraham, H. K.; Jiang, S.; Lee, T. H.; Prakash, O.; Avraham, S. J. Immunol. 2004, 173, 6228–6233.
[136] Weissman, J. D.; Brown, J. A.; Howcroft, T. K.; Hwang, J.; Chawla, A.; Roche, P. A.; Schiltz, L.; Nakatani, Y.; Singer, D. S. Proc. Natl. Acad. Sci. USA 1998, 95, 11601–11606.
[137] Carroll, I. R.; Wang, J.; Howcroft, T. K.; Singer, D. S. Mol. Immunol. 1998, 35, 1171–1178.
[138] Howcroft, T.; Strebel, K.; Martin, M.; Singer, D. Science 1993, 260, 1320–1322.
[139] Opi, S.; Péloponèse, J.-M.; Esquieu, D.; Watkins, J.; Campbell, G.; De Mareuil, J.; Jeang, K. T.; Yirrell, D. L.; Kaleebu, P.; Loret, E. P. Vaccine 2004, 22, 3105–3111.
[140] Jeang, K.-T. HIV-1 Tat: Structure and Function. In Human Retroviruses and AIDS 1996: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences; Myers, G.; Korber, B. T.; Foley, B. T.; Jeang, K.-T.; Mellors, J. W.; Wain-Hobson, S., Eds.; Los Alamos National Laboratories: Los Alamos, 1996.
[141] Neuveut, C.; Jeang, K. T. J. Virol. 1996, 70, 5572–5581.
[142] Wu, Y.; Marsh, J. W. Microbes Infect. 2003, 5, 1023–1027.
[143] Berkhout, B.; Silverman, R. H.; Jeang, K. T. Cell 1989, 59, 273–82.
[144] Yamaguchi, Y.; Takagi, T.; Wada, T.; Yano, K.; Furuya, A.; Sugimoto, S.; Hasegawa, J.; Handa, H. Cell 1999, 97, 41–51.
[145] Bourgeois, C. F.; Kim, Y. K.; Churcher, M. J.; West, M. J.; Karn, J. Mol. Cell. Biol. 2002, 22, 1079–1093.
[146] Kim, Y. K.; Bourgeois, C. F.; Isel, C.; Churcher, M. J.; Karn, J. Mol. Cell. Biol. 2002, 22, 4622–4637.
[147] Schulte, A.; Czudnochowski, N.; Barboric, M.; Schonichen, A.; Blazek, D.; Peterlin, B. M.; Geyer, M. J. Biol. Chem. 2005, 280, 24968–24977.
[148] Bannwarth, S.; Gatignol, A. Curr. HIV Res. 2005, 3, 61–71.
[149] Mujtaba, S.; He, Y.; Zeng, L.; Farooq, A.; Carlson, J. E.; Ott, M.; Verdin, E.; Zhou, M. M. Mol. Cell. Biol. 2002, 9, 575–586.
[150] Dingwall, C.; Ernberg, I.; Gait, M. J.; Green, S. M.; Heaphy, S.; Karn, J.; Lowe, A. D.; Singh, M.; Skinner, M. A. EMBO J. 1990, 9, 4145–4153.
[151] Pritchard, C. E.; Grasby, J. A.; Hamy, F.; Zacharek, A. M.; Singh, M.; Karn, J.; Gait, M. J. Nucleic Acids Res. 1994, 22, 2592–2600.
[152] Aboul-ela, F.; Karn, J.; Varani, G. J. Mol. Biol. 1995, 253, 313–332.
[153] Churcher, M. J.; Lamont, C.; Hamy, F.; Dingwall, C.; Green, S. M.; Lowe, A. D.; Butler, J. G.; Gait, M. J.; Karn, J. J. Mol. Biol. 1993, 230, 90–110.
[154] Rana, T. M.; Jeang, K. T. Arch. Biochem. Biophys. 1999, 365, 175–85.
[155] Bayer, P.; Kraft, M.; Ejchart, A.; Westendorp, M.; Frank, R.; Rosch, P. J. Mol. Biol. 1995, 247, 529–535.
[156] Gregoire, C.; Péloponèse, J.-M.; Esquieu, D.; Opi, S.; Campbell, G.; Solomiac, M.; Lebrun, E.; Lebreton, J.; Loret, E. P. Biopolymers 2001, 62, 324–335.
[157] Peloponese, J.-M. et al. C. R. Acad. Sci., Ser. III 2000, 323, 883–894.
[158] Freund, J.; Vertesy, L.; Koller, K. P.; Wolber, V.; Heintz, D.; Kalbitzer, H. R. J. Mol. Biol. 1995, 250, 672–688.
[159] Puglisi, J. D.; Tan, R.; Calnan, B. J.; Frankel, A. D.; Williamson, J. R. Science 1992, 257, 76–80.
[160] Long, K. S.; Crothers, D. M. Biochemistry 1999, 38, 10059–10069.
[161] Seewald, M. J.; Metzger, A. U.; Willbold, D.; Rosch, P.; Sticht, H. J. Biomol. Struct. Dyn. 1998, 16, 683–692.
[162] Metzger, A. U.; Bayer, P.; Willbold, D.; Hoffmann, S.; Frank, R. W.; Goody, R. S.; Rosch, P. Biochem. Biophys. Res. Commun. 1997, 241, 31–36.
[163] Greenbaum, N. L. Structure 1996, 4, 5–9.
[164] Hakansson, S.; Caffrey, M. Biochemistry 2003, 42, 8999–9006.
[165] Gregoire, C. J.; Loret, E. P. J. Biol. Chem. 1996, 271, 22641–22646.
[166] Shojania, S.; O’Neil, J. D. J. Biol. Chem. 2006, 281, 8347–8356.
[167] Abragam, A. The Principles of Nuclear Magnetism; Clarendon Press: Oxford, 1961.
[168] Ernst, R. R.; Bodenhausen, G.; Wokaun, A. Principles of Nuclear Magnetic Resonance in One and Two Dimensions; Clarendon Press: Oxford, 7th ed.; 2003.
[169] Cavanagh, J.; Fairbrother, W. J.; Palmer, A. G.; Skelton, N. J. Protein NMR Spectroscopy: Principles and Practice; Academic Press: San Diego, 1996.
[170] Goldman, M. Quantum Description of High Resolution NMR in Liquids; Oxford University Press: New York, 1991.
[171] Neuhaus, D.; Williamson, M. P. The Nuclear Overhauser Effect in Structural and Conformational Analysis; John Wiley and Sons: New York, 2nd ed.; 2000.
[172] Seaborn, J. B. Hypergeometric Functions and Their Applications; Springer-Verlag: London, 1991.
[173] McQuarrie, D. A. Quantum Chemistry; University Science Books: Mill Valley, 1983.
[174] Harris, R. K. Nuclear Magnetic Resonance Spectroscopy; Longman: London, 1986.
[175] Lipari, G.; Szabo, A. J. Am. Chem. Soc. 1982, 104, 4546–4559.
[176] Lipari, G.; Szabo, A. J. Am. Chem. Soc. 1982, 104, 4559–4570.
[177] Kay, L. E.; Torchia, D. A.; Bax, A. Biochemistry 1989, 28, 8972–8979.
[178] Farrow, N. A.; Muhandiram, R.; Singer, A. U.; Pascal, S. M.; Kay, C. M.; Gish, G.; Shoelson, S. E.; Pawson, T.; Forman-Kay, J. D.; Kay, L. E. Biochemistry 1994, 33, 5984–6003.
[179] Orekhov, V. Y.; Pervushin, K. V.; Korzhnev, D. M.; Arseniev, A. S. J. Biomol. NMR 1995, 6, 113–122.
[180] Peng, J. W.; Thanabal, V.; Wagner, G. J. Magn. Reson. 1991, 94, 82–100.
[181] Peng, J. W.; Wagner, G. Biochemistry 1992, 31, 8571–8586.
[182] Luginbuhl, P.; Wüthrich, K. Prog. Nucl. Magn. Reson. Spectrosc. 2002, 40, 199–247.
[183] Levitt, M. H. Spin Dynamics: Basics of Nuclear Magnetic Resonance; John Wiley and Sons, Ltd.: West Sussex, England, 1st ed.; 2001.
[184] Korzhnev, D. M.; Billeter, M.; Arseniev, A. S.; Orekhov, V. Y. Prog. Nucl. Magn. Reson. Spectrosc. 2001, 38, 197–266.
[185] Kelly, S. W.; Sholl, C. A. J. Phys.: Condens. Matter 1992, 4, 3317–3330.
[186] Frankel, A. D.; Pabo, C. O. Cell 1988, 55, 1189–1193.
[187] Sambrook, J.; Russell, D. W. Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 3rd ed.; 2001. [188] Marley, J.; Lu, M.; Bracken, C. J. Biomol. NMR 2001, 20, 71–75. [189] Neidhardt, F. C.; Bloch, P. L.; Smith, D. F. J. Bacteriol. 1974, 119, 736–747. [190] Kay, L. E.; Keifer, P.; Saarinen, T. J. Am. Chem. Soc. 1992, 114, 10663–10665. [191] Delaglio, F.; Grzesiek, S.; Vuister, G. W.; Zhu, G.; Pfeifer, J.; Bax, A. J. Biomol. NMR 1995, 6, 277–293. [192] Shaka, A. J.; Keeler, J.; Frenkiel, T.; Freeman, R. J. Magn. Reson. 1983, 52, 335–338. [193] Wishart, D. S.; Bigam, C. G.; Yao, J.; Abildgaard, F.; Dyson, H. J.; Oldfield, E.; Markley, J. L.; Sykes, B. D. J. Biomol. NMR 1995, 6, 135–140. [194] Wittekind, M.; Mueller, L. J. Magn. Reson., Ser. B 1993, 101, 201–205. [195] Grzesiek, S.; Bax, A. J. Am. Chem. Soc. 1992, 114, 6291–6293. [196] Ikura, M.; Kay, L. E.; Bax, A. Biochemistry 1990, 29, 4659–4667. [197] Yamazaki, T.;
Lee, W.;
Revington, M.;
Mattiello, D. L.;
Dahlquist, F. W.;
Arrowsmith, C. H.; Kay, L. E. J. Am. Chem. Soc. 1994, 116, 6464–6465. [198] Vuister, G. W.; Bax, A. J. Am. Chem. Soc. 1993, 115, 7772–7777. [199] Muhandiram, D. R.; Kay, L. E. J. Magn. Reson., Ser. B 1994, 103, 203–216. [200] Kay, L. E.; Xu, G. Y.; Yamazaki, T. J. Magn. Reson., Ser. A 1994, 109, 129–133. [201] Kay, L. E.; Nicholson, L. K.; Delaglio, F.; Bax, A.; Torchia, D. A. J. Magn. Reson. 1992, 97, 359–375. 256
[202] Habazettl, J.; Myers, L. C.; Yuan, F.; Verdine, G. L.; Wagner, G. Biochemistry 1996, 35, 9335–9348. [203] Schwarzinger, S.; Kroon, G. J.; Foss, T. R.; Chung, J.; Wright, P. E.; Dyson, H. J. J. Am. Chem. Soc. 2001, 123, 2970–2978. [204] Schwarzinger, S.; Kroon, G. J.; Foss, T. R.; Wright, P. E.; Dyson, H. J. J. Biomol. NMR 2000, 18, 43–48. [205] Penkett, C. J.;
Redfield, C.;
Dodd, I.;
Hubbard, J.;
McBay, D. L.;
Mossakowska, D. E.; Smith, R. A. G.; Dobson, C. M.; Smith, L. J. J. Mol. Biol. 1997, 274, 152–159. [206] Palmer, A. G. Annu. Rev. Biophys Biomol. Struct. 2001, 30, 129–155. [207] Peng, J. W.; Wagner, G. J. Magn. Reson. 1992, 98, 308–332. [208] Farrow, N. A.; Zhang, O.; Forman-Kay, J. D.; Kay, L. E. Biochemistry 1995, 34, 868–878. [209] Farrow, N. A.; Zhang, O.; Szabo, A.; Torchia, D. A.; Kay, L. E. J. Biomol. NMR 1995, 6, 153–162. [210] Jarymowycz, V. A.; Stone, M. J. Chem. Rev. 2006, 106, 1624–1671. [211] Schwalbe, H.; Fiebig, K. M.; Buck, M.; Jones, J. A.; Grimshaw, S. B.; Spencer, A.; Glaser, S. J.; Smith, L. J.; Dobson, C. M. Biochemistry 1997, 36, 8977–8991. [212] Szyperski, T.; Luginbuhl, P.; Otting, G.; Guntert, P.; W¨ uthrich, K. J. Biomol. NMR 1993, 3, 151–164. [213] Lefevre, J. F.; Dayie, K. T.; Peng, J. W.; Wagner, G. Biochemistry 1996, 35, 2674–2686. 257
[214] Palmer, A. G.; Rance, M.; Wright, P. E. J. Am. Chem. Soc. 1991, 113, 4371–4380. [215] Mandel, A. M.; Akke, M.; Palmer, A. G. J. Mol. Biol. 1995, 246, 144–163. [216] Spyracopoulos, L. J. Biomol. NMR 2006, 36, 215–224. [217] Wolfram Research, Inc., Mathematica; Version 5.0 Wolfram Research, Inc.: Champaign, IL, 2004. [218] Creighton, T. E. Proteins : structures and molecular properties; W.H. Freeman: New York, 2nd ed ed.; 1993. [219] Andrec, M.; Montelione, G. T.; Levy, R. M. J. Magn. Reson. 1999, 139, 408–421. [220] Schurr, J. M.; Babcock, H. P.; Fujimoto, B. S. J. Magn. Reson., Ser. B 1994, 105, 211–224. [221] Clore, G. M.; Szabo, A.; Bax, A.; Kay, L. E.; Driscoll, P. C.; Gronenborn, A. M. J. Am. Chem. Soc. 1990, 112, 4989–4991. [222] Cole, K. S.; Cole, R. H. J. Chem. Phys. 1941, 9, 341–351. [223] Buevich, A. V.; Shinde, U. P.; Inouye, M.; Baum, J. J. Biomol. NMR 2001, 20, 233–249. [224] Buevich, A. V.; Baum, J. J. Am. Chem. Soc. 1999, 121, 8671–8672. [225] Ochsenbein, F.; Neumann, J.-M.; Guittet, E.; Heijenoort, C. V. Protein Sci. 2002, 11, 957–964. [226] Peng, J. W.; Wagner, G. Biochemistry 1995, 34, 16733–16752. [227] Laskowski, R. A.; MacArthurt, M. W.; Thornton, J. M. Curr. Opin. Struct. Biol. 1998, 8, 631–639. 258
[228] d’Auvergne, E. J.; Gooley, P. R. J. Biomol. NMR 2003, 25, 25–39. [229] Ochsenbein, F.;
Guerois, R.;
Neumann, J.-M.;
Sanson, A.;
Guittet, E.;
van
Heijenoort, C. J. Biomol. NMR 2001, 19, 3–18. [230] Bai, Y.; Milne, J. S.; Mayne, L.; Englander, S. W. Proteins 1993, 17, 75–86. [231] Gill, S. C.; von Hippel, P. H. Anal. Biochem. 1989, 182, 319–326. [232] Harris, J. L.; Backes, B. J.; Leonetti, F.; Mahrus, S.; Ellman, J. A.; Craik, C. S. Proc. Natl. Acad. Sci. USA 2000, 97, 7754–7759. [233] Hasan, A. A. K.; Amenta, S.; Schmaier, A. H. Circulation 1996, 94, 517–528. [234] Edwards, A. M.; Arrowsmith, C. H.; Christendat, D.; Dharamsi, A.; Friesen, J. D.; Greenblatt, J. F.; Vedadi, M. Nat. Struct. Biol. 2000, 7, 970–972. [235] Vertes, A.;
Benscura, A.;
Sadeghi, M.;
Wu, X. Adduct formation and energy
redistribution in UV and IR matrix-assisted laser desorption. In Proceedings of the Society of Photo-optical Instrumentation Engineers: Laser plasma generation and diagnostics, Vol. 3935; Haglund, R. F.; Wood, R. F., Eds.; SPIE: Bellingham, 2000. [236] Peti, W.; Smith, L. J.; Redfield, C.; Schwalbe, H. J. Biomol. NMR 2001, 19, 153–165. [237] Dyson, H. J.; Wright, P. E. Nat. Struct. Biol. 1998, 5, 499–503. [238] Yao, J.; Dyson, H. J.; Wright, P. E. FEBS Lett. 1997, 419, 285–289. [239] Zhang, O.; Forman-Kay, J. D.; Shortle, D.; Kay, L. E. J. Biomol. NMR 1997, 9, 181–200. [240] Wishart, D. S.; Sykes, B. D.; Richards, F. M. J. Mol. Biol. 1991, 222, 311–333. [241] Wishart, D. S.; Sykes, B. D. J. Biomol. NMR 1994, 4, 171–180. 259
[242] Frank, M. K.; Clore, G. M.; Gronenborn, A. M. Protein Sci. 1995, 4, 2605–2615. [243] Zhang, O.; Forman-Kay, J. D. Biochemistry 1995, 34, 6784–6794. [244] Smith, L. J.; Bolin, K. A.; Schwalbe, H.; MacArthur, M. W.; Thornton, J. M.; Dobson, C. M. J. Mol. Biol. 1996, 255, 494–506. [245] Zhang, H.;
Leung, A.;
Wishart, D. “The THRIFTY web server, version 1.0”,
http://redpoll.pharmacy.ualberta.ca/thrifty/, University of Alberta, Edmonton, 2005. [246] DeLano, W. “MacPyMOL: A PyMOL-based Molecular Graphics Application for MacOS X”, http://www.pymol.org, DeLano Scientific LLC, San Francisco, 2005. [247] Hu, Y.; Macinnis, J. M.; Cherayil, B. J.; Fleming, G. R.; Freed, K. F.; Perico, A. J. Chem. Phys. 1990, 93, 822–836. [248] Ulrich, D. L.; Kojetin, D.; Bassler, B. L.; Cavanagh, J.; Loria, J. P. J. Mol. Biol. 2005, 347, 297–307. [249] Thormann, T.; Soroka, V.; Nielbo, S.; Berezin, V.; Bock, E.; Poulsen, F. M. Biochemistry 2004, 43, 10364–10369. [250] Bhavesh, N. S.; Sinha, R.; Mohan, P. M.; Hosur, R. V. J. Biol. Chem. 2003, 278, 19980–19985. [251] Otzen, D. E.; Miron, S.; Akke, M.; Oliveberg, M. Biochemistry 2004, 43, 12964– 12978. [252] Schwarzinger, S.; Wright, P. E.; Dyson, H. J. Biochemistry 2002, 41, 12681–12686. [253] Platt, G. W.; McParland, V. J.; Kalverda, A. P.; Homans, S. W.; Radford, S. E. J. Mol. Biol. 2005, 346, 279–294.
260
[254] Yao, J.; Chung, J.; Eliezer, D.; Wright, P. E.; Dyson, H. J. Biochemistry 2001, 40, 3561–71. [255] Redfield, C. Methods 2004, 34, 121–132. [256] Magnuson, D. S. K.; Knudsen, B. E.; Geiger, J. D.; Brownstone, R. M.; Nath, A. Ann. Neurol. 1995, 37, 373–380. [257] Hakansson, S.; Jacobs, A.; Caffrey, M. Protein Sci. 2001, 10, 2138–2139. [258] Studier, F. W.; Rosenberg, A. H.; Dunn, J. J.; Dubendorff, J. W. Methods Enzymol. 1990, 185, 60–89. [259] Han, J. C.; Han, G. Y. Anal. Biochem. 1994, 220, 5–10. [260] Gough, J. D.; R. H., J. W.; Donofrio, A. E.; Lees, W. J. J. Am. Chem. Soc. 2002, 124, 3885–3892. [261] Wüthrich, K. NMR of proteins and nucleic acids; Wiley: New York, 1986. [262] Eriksson, M. A.; Härd, T.; Nilsson, L. Biophys. J. 1995, 69, 329–339. [263] Teilum, K.; Kragelund, B. B.; Poulsen, F. M. J. Mol. Biol. 2002, 324, 349–357. [264] Garcia, P.; Serrano, L.; Durand, D.; Rico, M.; Bruix, M. Protein Sci. 2001, 10, 1100–1112. [265] Bhavesh, N. S.; Juneja, J.; Udgaonkar, J. B.; Hosur, R. V. Protein Sci. 2004, 13, 3085–3091. [266] Gray, T. M.; Arnoys, E. J.; Blankespoor, S.; Born, T.; Jagar, R.; Everman, R.; Plowman, D.; Stair, A.; Zhang, D. Protein Sci. 1996, 5, 742–751. [267] MacArthur, M. W.; Thornton, J. M. J. Mol. Biol. 1991, 218, 397–412. 261
[268] Suh, J.-Y.; Lee, Y.-T.; Park, C.-B.; Lee, K.-H.; Kim, S.-C.; Choi, B.-S. Eur. J. Biochem. 1999, 266, 665–674. [269] Alexandrescu, A. T.; Shortle, D. J. Mol. Biol. 1994, 242, 527–546. [270] Lavalette, D.; Tetreau, C.; Tourbez, M.; Blouquit, Y. Biophys. J. 1999, 76, 2744– 2751. [271] Dayie, K. T.; Wagner, G.; Lefevre, J.-F. Annu. Rev. Phys. Chem. 1996, 47, 243–282. [272] Buck, M.; Schwalbe, H.; Dobson, C. M. J. Mol. Biol. 1996, 257, 669–683. [273] Bai, Y.; Chung, J.; Dyson, H. J.; Wright, P. E. Protein Sci. 2001, 10, 1056–1066. [274] Garber, M. E.; Wei, P.; KewalRamani, V. N.; Mayall, T. P.; Herrmann, C. H.; Rice, A. P.; Littman, D. R.; Jones, K. A. Genes Dev. 1998, 12, 3512–3527. [275] Kelly, G. P.; Muskett, F. W.; Whitford, D. Eur. J. Biochem. 1997, 245, 349–354. [276] Cao, W.; Bracken, C.; Kallenbach, N. R.; Lu, M. Protein Sci. 2004, 13, 177–189. [277] Daughdrill, G. W.; Vise, P. D.; Zhou, H.; Yang, X.; Yu, W.-F.; Tasayco, M. L.; Lowry, D. F. J. Biomol. Struct. Dyn. 2004, 21, 663–670. [278] Torchia, D. A.; Lyerla, J. R.; Quattrone, A. J. Biochemistry 1975, 14, 887-900. [279] Goodman, J. L.; Pagel, M. D.; Stone, M. J. J. Mol. Biol. 2000, 295, 963–978. [280] Clore, G. M.; Driscoll, P. C.; Wingfield, P. T.; Gronenborn, A. M. Biochemistry 1990, 29, 7387–7401. [281] Korzhnev, D. M.; Orekhov, V. Y.; Arseniev, A. S. J. Magn. Reson. 1997, 127, 184–191. [282] Chen, J.; Brooks, C. L.; Wright, P. E. J. Biomol. NMR 2004, 29, 243–257. 262
[283] Henry, G. D.; Sykes, B. D. J. Magn. Reson. Ser. B 1993, 102, 193–200. [284] Koide, S.; Jahnke, W.; Wright, P. E. J. Biomol. NMR 1995, 6, 306–312. [285] Palmer, A. G.; Massi, F. Chem. Rev. 2006, 106, 1700–1719. [286] Dempsey, C. E. Prog. Nucl. Magn. Reson. Spectrosc. 2001, 39, 135–170. [287] Matthew, J.; Richards, F. J. Biol. Chem. 1983, 258, 3039–3044. [288] Wei, P.; Garber, M. E.; Fang, S. M.; Fischer, W. H.; Jones, K. A. Cell 1998, 92, 451–62.
263
Appendix A Resonance Assignments for His-tagged Tat1−72

Table A.1: Resonance assignments of Histidine-tagged Tat1−72 determined at pH 4.1 and 293 K

Position  Residue Type  HN (ppm)  N (ppm)   Cα (ppm)  Cβ (ppm)  C’ (ppm)  Hα (ppm)
1         MET (M)       -         -         -         -         -         -
2         GLY (G)       -         -         -         -         -         -
3         SER (S)       8.706     115.583   58.720    64.670    174.542   4.521
4         SER (S)       8.508     117.930   58.825    64.393    174.398   4.389
5         HIS (H)       8.582     120.104   55.764    29.599    174.101   4.66
6         HIS (H)       8.551     119.261   55.772    29.851    174.109   4.642
7         HIS (H)       8.733     120.061   55.913    29.941    174.126   4.666
8         HIS (H)       8.803     120.545   56.009    29.730    174.109   4.667
9         HIS (H)       8.813     121.030   56.040    29.755    174.076   4.658
10        HIS (H)       8.793     121.694   56.068    30.060    174.133   4.707
11        SER (S)       8.596     118.846   59.008    64.521    174.424   -
12        SER (S)       8.600     118.500   58.394    63.935    174.857   4.483
13        GLY (G)       8.449     110.714   45.918    -         173.750   3.953
14        LEU (L)       8.142     121.815   55.668    43.111    177.196   4.356
15 a      VAL (V)       8.224     123.400   60.486    33.390    174.409   4.396
16        PRO (P)       -         -         63.699    32.340    176.972   -
17        ARG (R)       8.529     122.210   56.824    31.519    177.037   -
18        GLY (G)       8.513     110.532   45.950    -         174.104   3.984
19        SER (S)       8.208     115.439   58.824    64.529    174.197   4.42
20        HIS (H)       8.626     120.212   56.037    29.640    174.036   4.705
21        MET (M)       -         -         -         -         -         -
22        GLU (E)       8.551     124.631   54.666    31.068    173.814   4.306
23        PRO (P)       -         -         62.917    32.303    176.667   -
24 a      VAL (V)       8.223     120.681   62.657    33.578    175.602   -
25 a      ASP (D)       8.428     126.118   52.331    42.062    175.127   -
26        PRO (P)       -         -         63.699    33.551    176.906   -
27        ARG (R)       8.550     121.984   56.618    30.072    176.346   4.324
28        LEU (L)       7.883     120.144   55.564    42.895    176.983   4.268
29        GLU (E)       7.965     120.480   54.695    29.759    174.334   4.134
30        PRO (P)       -         -         64.394    32.967    177.573   -
31        TRP (W)       8.399     118.809   57.559    30.792    176.842   4.132
32        LYS (K)       8.416     121.884   56.000    33.723    175.776   4.455
33        HIS (H)       8.098     119.158   53.735    29.243    172.236   4.849
34        PRO (P)       -         -         63.900    32.933    177.600   -
35        GLY (G)       8.638     109.933   45.959    -         174.188   4.006
36        SER (S)       8.314     115.643   59.210    64.409    174.832   4.428
37        GLN (Q)       8.568     122.214   56.618    30.072    176.176   4.398
38        PRO (P)       -         -         62.916    33.649    176.478   -
39        LYS (K)       8.359     119.721   56.884    33.702    174.289   -
40        THR (T)       8.097     115.159   62.291    70.542    174.150   4.304
41        ALA (A)       8.393     126.505   53.110    20.015    177.519   4.368
42        CYS (C)       8.419     119.011   59.177    28.745    174.909   4.572
43        THR (T)       8.289     116.541   62.583    70.306    174.211   4.41
44        ASN (N)       8.432     120.995   53.951    39.487    174.999   -
45 a      CYS (C)       8.254     119.256   59.241    28.689    174.227   -
46        TYR (Y)       8.307     122.671   57.958    36.497    175.523   -
47        CYS (C)       8.119     121.208   58.887    28.768    173.745   4.388
48        LYS (K)       7.607     122.380   56.818    33.682    175.552   4.153
49        LYS (K)       8.332     123.080   57.127    33.734    176.583   -
50        CYS (C)       8.405     120.658   59.147    28.809    173.635   -
51 a      CYS (C)       8.384     121.717   59.120    28.819    174.928   -
52        PHE (F)       8.381     123.552   58.265    40.414    174.110   4.549
53        HIS (H)       8.426     120.817   53.951    28.311    174.036   4.685
54 a      CYS (C)       8.416     121.317   58.792    29.286    172.737   -
55 a      GLN (Q)       8.177     126.982   58.149    31.060    172.164   -
56        VAL (V)       8.357     122.442   62.991    33.699    176.058   -
57 a      CYS (C)       8.664     123.713   58.757    28.794    173.638   4.517
58        PHE (F)       8.576     124.503   58.649    40.433    175.247   4.598
59 a      ILE (I)       8.193     123.581   61.567    39.719    175.932   4.234
60 a      THR (T)       8.331     120.290   62.526    70.468    174.160   4.31
61        LYS (K)       8.536     125.652   56.852    33.819    175.879   4.277
62 a      ALA (A)       8.332     125.550   53.115    19.884    177.733   4.31
63 a      LEU (L)       8.282     122.012   53.035    43.177    178.001   4.291
64 a      GLY (G)       8.394     109.454   46.064    -         174.092   3.925
65        ILE (I)       7.978     120.084   61.858    39.541    176.267   4.135
66 a      SER (S)       8.335     119.400   58.695    64.365    174.223   4.433
67        TYR (Y)       8.258     122.864   58.847    39.496    176.454   4.53
68 a      GLY (G)       8.361     110.077   46.148    -         174.137   3.887
69 a      ARG (R)       8.163     120.671   56.893    31.444    176.538   4.3
70        LYS (K)       8.339     122.588   56.995    33.717    176.058   -
71        LYS (K)       8.360     122.962   57.127    33.734    176.583   4.266
72        ARG (R)       8.516     123.710   54.846    29.890    174.224   4.27
73        ARG (R)       8.536     123.441   56.685    31.684    176.000   4.302
74        GLN (Q)       8.546     122.755   56.364    30.662    176.176   4.333
75        ARG (R)       8.497     122.547   56.364    30.155    175.836   -
76        ARG (R)       8.556     123.736   56.685    31.684    176.112   -
77        ARG (R)       -         -         -         -         -         -
78        PRO (P)       -         -         -         -         -         -
79        PRO (P)       -         -         63.651    32.675    176.979   -
80        GLN (Q)       8.580     121.074   56.712    30.468    176.616   -
81        GLY (G)       8.531     110.671   45.950    -         -         -
82        SER (S)       8.235     115.555   58.916    64.529    174.357   4.345
83        GLN (Q)       8.450     123.127   54.366    29.763    174.224   4.583
84        THR (T)       8.151     114.849   62.559    70.371    174.342   4.262
85        HIS (H)       8.549     120.642   55.834    29.771    174.099   4.708
86        GLN (Q)       8.484     122.253   56.261    31.444    175.965   4.35
87        VAL (V)       8.358     122.331   62.991    33.699    176.058   4.151
88        SER (S)       8.456     119.760   58.644    64.388    174.476   4.475
89        LEU (L)       8.444     125.073   55.836    43.067    177.434   4.389
90        SER (S)       8.308     117.186   58.833    64.325    173.486   4.426
91 a      LYS (K)       8.349     123.790   57.046    33.704    175.646   4.334
92        GLN (Q)       8.060     126.607   57.879    31.130    172.329   4.169

Values whose row is not recoverable from the page layout: 43.418 and 178.270 (positions 1-5); 4.837 (positions 24-41); 4.185 (positions 42-59); 4.309 and 174.217 (positions 78-92).

Table A.2: Additional peak assignments from 1H/15N-HSQC of 13C/15N labelled Histidine-tagged Tat1−72 determined at pH 4.1 and 293 K

Residue Type  Assignment  HN (ppm)  N (ppm)
GLY           G64 b       8.354     109.262
GLY           G           8.504     110.149
GLY           G68 b       8.406     110.216
GLY           G           8.524     110.288
SER           S           8.289     116.541
CYS           C57 b       7.467     117.651
CYS           C45 b       8.157     119.191
THR           T60 b       8.196     119.256
SER           S66 b       8.342     119.406
LYS           K           8.371     119.722
HIS           H5/20       8.61      120.007
CYS           C57 c       8.415     120.048
ARG           R69 b       8.493     120.181
THR           T60 c       8.34      120.291
VAL           V24 b       8.313     120.531
THR           T           8.61      120.644
CYS           C54 b       8.373     120.683
ARG           R           8.559     120.962
CYS           C51 b       8.306     121.554
LEU           L63 b       8.638     121.766
LEU           L63 c       8.242     121.78
LEU           L63 d       8.307     122.169
GLN           Q           8.591     122.302
LYS           K           8.384     122.592
TYR/PHE       Y/F         8.272     122.733
TYR/PHE       Y/F         8.302     123.079
GLN           Q           8.45      123.127
GLN           Q55 b       8.592     123.257
CYS           C           8.385     123.268
GLN           Q55 c       8.609     123.349
VAL           V15 b       8.234     123.399
ILE           I59 b       8.181     123.581
CYS           C57 d       8.659     123.711
LYS           K           8.342     124.364
ALA           A62 b       8.311     125.386
ASP           D25 b       8.367     125.458
ALA           A62 c       8.368     125.892
ALA           A62 d       8.395     126.169
GLN           Q           8.073     126.612
LYS           K91 b       8.034     127.997

Appendix B Model-Free Parameter Estimates for His-tagged Tat1−72

Figure B.1: Model-free parameter estimates using Model 2 (Rf = 0.262) with errors determined from 100 Monte Carlo sets, using relaxation data (R1, R1ρ and NOE) collected at a single 14.1 T field for 63 residues. Residue S4 was omitted as an outlier of the parameter estimates. Each panel is plotted against residue number: (a) generalized order parameters S²; (b) local rotational correlation times τc (ns); (c) internal correlation times τe (ps). The sequence mean values of the estimates are indicated by the solid lines.

Figure B.2: Model-free parameter estimates using Model 2 (Rf = 0.115) with errors determined from 100 Monte Carlo sets, using relaxation data (R1, R1ρ and NOE) collected at a single 18.8 T field for 60 residues. No residues were omitted as outliers of the parameter estimates. Each panel is plotted against residue number: (a) generalized order parameters S²; (b) local rotational correlation times τc (ns); (c) internal correlation times τe (ps). The sequence mean values of the estimates are indicated by the solid lines.
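
The three quantities plotted in Figures B.1 and B.2 (S², τc and τe) are the parameters of the Lipari and Szabo model-free spectral density. As a minimal sketch of how they combine, the function below evaluates that spectral density for one residue. It assumes the commonly used 2/5 normalization and treats τc as a per-residue (local) correlation time; the function name and the numerical inputs are illustrative only and are not taken from the fits reported here.

```python
import math


def model_free_j(omega, s2, tau_c, tau_e):
    """Lipari-Szabo spectral density J(omega).

    omega : angular frequency in rad/s
    s2    : generalized order parameter S^2 (0 to 1)
    tau_c : local rotational correlation time in seconds
    tau_e : internal correlation time in seconds
    The 2/5 prefactor is an assumed normalization convention.
    """
    # Effective correlation time: 1/tau = 1/tau_c + 1/tau_e
    tau = tau_c * tau_e / (tau_c + tau_e)
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


if __name__ == "__main__":
    # Hypothetical inputs: S^2 = 0.5, tau_c = 3 ns, tau_e = 100 ps,
    # evaluated at zero frequency and near the 15N frequency at 14.1 T.
    omega_n = 2.0 * math.pi * 60.8e6  # rad/s, approximate
    for omega in (0.0, omega_n):
        print(omega, model_free_j(omega, 0.5, 3e-9, 100e-12))
```

With S² = 1 the internal term vanishes and J(0) reduces to (2/5)τc, which is a convenient check on any implementation.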